# 📘 **Add OCR Layer – Windows Context Menu Script**

This script adds an OCR text layer to any PDF using **OCRmyPDF**. When a PDF already contains text, it asks you how to proceed.

It integrates directly into the Windows right-click menu, so you can right-click any PDF → **Add OCR Layer**.

- - -
# ✅ **Features**

* ✔ Right-click any PDF to run OCR
* ✔ Detects:
  * Tagged PDFs
  * PDFs with pre-existing OCR
* ✔ Prompts you when text already exists:
  * **R** → `--redo-ocr` (best for mixed raster/vector pages)
  * **F** → `--force-ocr` (rasterize every page and replace all text)
  * **S** → Skip OCR
* ✔ Writes a new file with `_ocr` appended to the original name (e.g. `report.pdf` → `report_ocr.pdf`)
* ✔ Works even when OCRmyPDF returns ambiguous exit codes
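The R/F/S choices above map onto OCRmyPDF command-line options. The following is a minimal Python sketch, not the provided batch script. It assumes the script calls `ocrmypdf [flags] input output` and names the output by appending `_ocr` to the file stem. `CHOICE_FLAGS`, `output_path`, and `build_command` are hypothetical names.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical table mirroring the script's prompt. --redo-ocr and --force-ocr
# are real OCRmyPDF options; None marks "S", which skips OCR entirely.
CHOICE_FLAGS = {
    "R": ["--redo-ocr"],   # redo OCR on raster content, keep real text
    "F": ["--force-ocr"],  # rasterize every page and OCR it again
    "S": None,             # skip OCR
}


def output_path(pdf: Path) -> Path:
    """Return the sibling output file, e.g. report.pdf -> report_ocr.pdf."""
    return pdf.with_name(f"{pdf.stem}_ocr.pdf")


def build_command(pdf: Path, choice: str) -> list[str] | None:
    """Return the ocrmypdf argument list for a prompt answer, or None to skip."""
    key = choice.strip().upper()
    if key not in CHOICE_FLAGS:
        raise ValueError(f"expected R, F or S, got {choice!r}")
    flags = CHOICE_FLAGS[key]
    if flags is None:
        return None
    # OCRmyPDF's CLI takes options first, then the input and output files.
    return ["ocrmypdf", *flags, str(pdf), str(output_path(pdf))]


if __name__ == "__main__":
    print(build_command(Path("report.pdf"), "r"))
    # ['ocrmypdf', '--redo-ocr', 'report.pdf', 'report_ocr.pdf']
```

Because the flags live in a single table, you can add another option later without changing the prompt logic.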
- - -
# 📦 **Installation Guide**

## 1\. Install Python

Install Python 3.11 or later:

[https://www.python.org/downloads/](https://www.python.org/downloads/)

During installation, be sure to check:

☑ **Add python.exe to PATH**

- - -
## 2\. Install OCRmyPDF

Open **Command Prompt** (Win+R → `cmd`) and run:

```
pip install ocrmypdf
```

OCRmyPDF also depends on external programs that `pip` does not install. The main ones are Ghostscript and Tesseract (steps 3 and 4).

- - -
## 3\. Install Ghostscript

Ghostscript is required for rasterizing pages. If you use Chocolatey, run this from an administrator Command Prompt:

```
choco install ghostscript
```

Or download it manually:

[https://ghostscript.com/releases/index.html](https://ghostscript.com/releases/index.html)

- - -
## 4\. Install Tesseract

OCRmyPDF does not include its own OCR engine. It calls Tesseract, so Tesseract must be installed and on PATH:

```
choco install tesseract
```

Or install it manually from the UB Mannheim builds.
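After steps 1 to 4, you can confirm that everything is reachable with a small Python check that looks up each tool on PATH. This is a sketch and is not part of the script. The executable names it searches for (`gswin64c`, `gswin32c`, `gs`, `tesseract`, `ocrmypdf`) are the usual ones, but they are assumptions about your installation.

```python
from __future__ import annotations

import shutil
import sys
from importlib import metadata

# Assumed executable names: Ghostscript's Windows console binary is
# gswin64c (64-bit) or gswin32c (32-bit); "gs" is the name elsewhere.
TOOLS = {
    "Ghostscript": ["gswin64c", "gswin32c", "gs"],
    "Tesseract": ["tesseract"],
    "ocrmypdf command": ["ocrmypdf"],
}


def find_tool(candidates: list[str]) -> str | None:
    """Return the full path of the first candidate found on PATH, else None."""
    for name in candidates:
        found = shutil.which(name)
        if found:
            return found
    return None


def check_environment() -> dict[str, str | None]:
    """Map each requirement to a version or path; None means missing or too old."""
    report: dict[str, str | None] = {}
    version = ".".join(str(part) for part in sys.version_info[:3])
    report["Python >= 3.11"] = version if sys.version_info >= (3, 11) else None
    try:
        report["ocrmypdf package"] = metadata.version("ocrmypdf")
    except metadata.PackageNotFoundError:
        report["ocrmypdf package"] = None
    for label, candidates in TOOLS.items():
        report[label] = find_tool(candidates)
    return report


if __name__ == "__main__":
    for label, status in check_environment().items():
        print(f"{label:<18} {status or 'MISSING'}")
```

Save it under any name, for example a hypothetical `check_ocr_deps.py`, and run `python check_ocr_deps.py`. Any line marked `MISSING` points to the step to revisit.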
- - -
## 5\. Copy the Script

Save the provided batch script as:

```
C:\Tools\add_ocr_layer.bat
```

You may place it anywhere, but avoid folders that sync to the cloud. If you use a different location, change the path in the registry command in step 6 to match.

- - -
## 6\. Add “Add OCR Layer” to the Right-Click Menu

### Automated (recommended)

Create a `.reg` file with this content:

```
Windows Registry Editor Version 5.00

[HKEY_CLASSES_ROOT\*\shell\Add OCR Layer]
@="Add OCR Layer"

[HKEY_CLASSES_ROOT\*\shell\Add OCR Layer\command]
@="\"C:\\Tools\\add_ocr_layer.bat\" \"%1\""
```

Double-click the file and confirm the prompts to import it. The `*` key adds the entry to the menu for every file type. To show it only for PDFs, replace `*\shell` with `SystemFileAssociations\.pdf\shell` in both key paths.
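If you keep the script somewhere other than `C:\Tools`, you must escape every backslash and quote in the command value, which is easy to get wrong by hand. The following minimal Python sketch generates the same `.reg` text for any script path. The helper names are hypothetical.

```python
from pathlib import PureWindowsPath

MENU_LABEL = "Add OCR Layer"
DEFAULT_ROOT = r"HKEY_CLASSES_ROOT\*\shell"


def reg_escape(value: str) -> str:
    """Escape a string for a quoted .reg value: backslashes first, then quotes."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def make_reg(script: str, key_root: str = DEFAULT_ROOT) -> str:
    """Build .reg text that adds the menu entry for the given batch script."""
    key = f"{key_root}\\{MENU_LABEL}"
    # PureWindowsPath normalizes forward slashes to backslashes on any OS.
    command = f'"{PureWindowsPath(script)}" "%1"'
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        f'@="{MENU_LABEL}"',
        "",
        f"[{key}\\command]",
        f'@="{reg_escape(command)}"',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(make_reg(r"C:\Tools\add_ocr_layer.bat"))
```

For example, `make_reg(r"D:\Scripts\add_ocr_layer.bat")` produces the file for a different location. Passing `key_root=r"HKEY_CLASSES_ROOT\SystemFileAssociations\.pdf\shell"` limits the entry to PDFs. To match the UTF-16, CRLF format that Regedit uses for its own exports, write the result with `open("add_ocr_layer.reg", "w", encoding="utf-16", newline="\r\n")`.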
### Manual (if needed)

Open Registry Editor (Win+R → `regedit`) and navigate to:

```
Computer\HKEY_CLASSES_ROOT\*\shell\
```

Create the key `Add OCR Layer`.

Inside it, create the key `command`.

Set the `(Default)` value of `command` to:

```
"C:\Tools\add_ocr_layer.bat" "%1"
```

- - -
# ▶️ **Usage**

### **Right-click any PDF → Add OCR Layer**

The script will:

1. Show the file path
2. Run OCRmyPDF
3. Detect whether the pages already contain text
4. If text is found, prompt:

   ```
   Choose how to proceed:
   R = Use --redo-ocr (raster areas only)
   F = Use --force-ocr (overwrite all text)
   S = Skip OCR
   ```

5. Save the OCR’d file as:

   ```
   original_filename_ocr.pdf
   ```
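The detect-prompt-retry flow above can be sketched in Python. This is not the provided batch script. It is only an illustration under stated assumptions: exit code 6 is OCRmyPDF's `already_done_ocr` code, and treating exit code 2 as a possible TaggedPDFError is an assumption. `add_ocr_layer`, `run`, and `ask` are hypothetical names. `run` and `ask` are passed in so you can exercise the logic without OCRmyPDF installed.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable

# Exit code 6 is OCRmyPDF's ExitCode.already_done_ocr (a page already has text).
# Assumption: a TaggedPDFError surfaces as exit code 2 (input-file error), so 2
# also triggers the prompt; other input-file errors would trigger it too.
PROMPT_CODES = {6, 2}
RETRY_FLAGS = {"R": "--redo-ocr", "F": "--force-ocr"}
PROMPT = (
    "Choose how to proceed:\n"
    "R = Use --redo-ocr (raster areas only)\n"
    "F = Use --force-ocr (overwrite all text)\n"
    "S = Skip OCR\n> "
)


def add_ocr_layer(
    pdf: Path,
    run: Callable[[list[str]], int],
    ask: Callable[[str], str],
) -> int:
    """Run OCR once; if the PDF already has text, prompt for R/F/S and retry.

    `run` executes a command and returns its exit code; `ask` shows a prompt
    and returns the user's answer. Returns the exit code of the last run.
    """
    out = pdf.with_name(f"{pdf.stem}_ocr.pdf")
    code = run(["ocrmypdf", str(pdf), str(out)])
    if code not in PROMPT_CODES:
        return code  # 0 on success, or an unrelated error
    while True:
        choice = ask(PROMPT).strip().upper()
        if choice == "S":
            return code  # skipped: report the original exit code
        if choice in RETRY_FLAGS:
            return run(["ocrmypdf", RETRY_FLAGS[choice], str(pdf), str(out)])
        # Any other answer: ask again.
```

In real use, you might call `add_ocr_layer(Path(sys.argv[1]), lambda cmd: subprocess.run(cmd).returncode, input)`. Here `sys.argv[1]` receives the `%1` path from the registry command.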
- - -
# ⚠ Troubleshooting

### **Ghostscript not found (‘gs’ missing)**

Install it via Chocolatey:

```
choco install ghostscript
```

Or add Ghostscript’s `bin` folder to PATH manually. On Windows, the executable in that folder is named `gswin64c.exe` rather than `gs`.

- - -
### **OCRmyPDF not found**

Ensure the Python `Scripts` folder is on your PATH, for example:

```
C:\Users\<you>\AppData\Local\Programs\Python\Python312\Scripts\
```

Adjust `Python312` to match your installed version.
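To see which `Scripts` folder your interpreter actually uses, and whether it is on PATH, you can ask `sysconfig` directly instead of guessing the path. The following short Python check is a sketch. `scripts_dirs` and `on_path` are hypothetical helper names.

```python
from __future__ import annotations

import os
import sysconfig
from pathlib import Path


def scripts_dirs() -> list[Path]:
    """Folders where pip puts console scripts (such as ocrmypdf.exe) for this Python."""
    dirs = [Path(sysconfig.get_path("scripts"))]
    # `pip install --user` uses a separate scheme: "nt_user" on Windows.
    user_scheme = f"{os.name}_user"
    if user_scheme in sysconfig.get_scheme_names():
        dirs.append(Path(sysconfig.get_path("scripts", user_scheme)))
    return dirs


def on_path(directory: Path, path_value: str | None = None) -> bool:
    """True if `directory` is listed in PATH, ignoring trailing separators (and case, on Windows)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normcase(os.path.normpath(str(directory)))
    return any(
        os.path.normcase(os.path.normpath(entry)) == target
        for entry in path_value.split(os.pathsep)
        if entry
    )


if __name__ == "__main__":
    for folder in scripts_dirs():
        status = "on PATH" if on_path(folder) else "NOT on PATH"
        print(f"{folder}  ({status})")
```

The script lists both the interpreter's own folder and the per-user folder used by `pip install --user`. Add whichever one contains `ocrmypdf.exe` to PATH.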
- - -
### **TaggedPDFError appears and OCR stops**

The script detects this error automatically and offers the R/F/S choices instead of stopping.

- - -
# 🧪 Tested On

* Windows 10
* Windows 11
* Python 3.12
* OCRmyPDF 15.x
* Ghostscript 10.x
* Tesseract 5.x