# 📘 **Add OCR Layer: Windows Context Menu Script**
This script adds an OCR text layer to any PDF using **OCRmyPDF**, with smart handling for PDFs that already contain text.
It integrates directly into the Windows right-click menu, so you can right-click any PDF → **Add OCR Layer**.
- - -
# ✅ **Features**
* ✔ Right-click any PDF to run OCR
* ✔ Detects:
  * Tagged PDFs
  * PDFs with pre-existing OCR
* ✔ Prompts user when text already exists:
  * **R** → `--redo-ocr` (best for mixed raster/vector)
  * **F** → `--force-ocr` (overwrite all text)
  * **S** → Skip OCR
* ✔ Produces a new file with `_ocr.pdf` appended
* ✔ Works even when OCRmyPDF returns ambiguous exit codes
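
The prompt choices and the `_ocr.pdf` naming listed above can be sketched in Python, which is already installed for OCRmyPDF. This is a minimal illustration, not the batch script itself: the names `CHOICE_FLAGS`, `output_path`, and `build_command` are hypothetical, and only the `--redo-ocr` and `--force-ocr` flags come from OCRmyPDF.

```python
from pathlib import Path
from typing import Optional

# Menu letters from the prompt mapped to OCRmyPDF flags; None means "skip OCR".
CHOICE_FLAGS = {"R": "--redo-ocr", "F": "--force-ocr", "S": None}


def output_path(pdf: Path) -> Path:
    """Return the sibling path with `_ocr` appended to the file stem."""
    return pdf.with_name(f"{pdf.stem}_ocr.pdf")


def build_command(pdf: Path, choice: str = "") -> Optional[list[str]]:
    """Build the ocrmypdf argument list, or return None when the user skips.

    An empty choice means a plain first run with no override flag.
    """
    args = ["ocrmypdf"]
    if choice:
        flag = CHOICE_FLAGS[choice.strip().upper()]
        if flag is None:
            return None
        args.append(flag)
    return args + [str(pdf), str(output_path(pdf))]
```

For example, `build_command(Path("scan.pdf"), "R")` yields the arguments for `ocrmypdf --redo-ocr scan.pdf scan_ocr.pdf`, while choice `"S"` returns `None` so the caller can exit without touching the file.
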
- - -
# 📦 **Installation Guide**
## 1\. Install Python
Install Python 3.11 or later:
[https://www.python.org/downloads/](https://www.python.org/downloads/)

In the installer, be sure to check **Add python.exe to PATH**.
- - -
## 2\. Install OCRmyPDF
Open **Command Prompt** (Win+R → `cmd`) and install:

`pip install ocrmypdf`

OCRmyPDF relies on several external tools, which the following steps install.
- - -
## 3\. Install Ghostscript
Required for rasterizing pages. Install it with Chocolatey:

`choco install ghostscript`

Or download manually:
[https://ghostscript.com/releases/index.html](https://ghostscript.com/releases/index.html)
- - -
## 4\. Install Tesseract
OCRmyPDF does not bundle an OCR engine. It calls Tesseract to recognize text, so Tesseract must be installed:

`choco install tesseract`

Or install manually from the UB Mannheim builds.
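
After steps 2 to 4, it can help to confirm that each tool is actually visible on PATH before wiring up the context menu. The sketch below is one way to do that with the standard library. The executable names are assumptions about a default install: the Ghostscript console binary on Windows is normally `gswin64c` (`gswin32c` on 32-bit installs), and `TOOLS` and `missing_tools` are hypothetical names.

```python
import shutil

# Display name -> executable names to try, in order.
# These names are assumptions about a default install.
TOOLS = {
    "OCRmyPDF": ("ocrmypdf",),
    "Ghostscript": ("gswin64c", "gswin32c", "gs"),
    "Tesseract": ("tesseract",),
}


def missing_tools(which=shutil.which) -> list[str]:
    """Return the names of tools for which no candidate executable is on PATH."""
    return [
        name
        for name, candidates in TOOLS.items()
        if not any(which(exe) for exe in candidates)
    ]
```

Running `print(missing_tools() or "all tools found")` in a new Command Prompt window shows which installs still need attention. The `which` parameter exists only so the lookup can be swapped out when testing.
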
- - -
## 5\. Copy the Script
Save the provided batch script as:

`C:\Tools\add_ocr_layer.bat`

(You may place it anywhere, but avoid locations that sync to the cloud.)
- - -
## 6\. Add “Add OCR Layer” to Right-Click Menu
### Automated (recommended)
Create a `.reg` file with this content:

    Windows Registry Editor Version 5.00

    [HKEY_CLASSES_ROOT\*\shell\Add OCR Layer]
    @="Add OCR Layer"

    [HKEY_CLASSES_ROOT\*\shell\Add OCR Layer\command]
    @="\"C:\\Tools\\add_ocr_layer.bat\" \"%1\""

Double-click the file to merge it into the registry. Because the key sits under `*`, the menu entry appears for every file type, not only PDFs.
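
If the script lives somewhere other than `C:\Tools`, the escaping in the `.reg` file is easy to get wrong: inside a `.reg` string value, backslashes are doubled and inner double quotes are prefixed with a backslash. The following sketch generates the file text for an arbitrary script path. `reg_file_text` is a hypothetical helper, not part of this project.

```python
def reg_file_text(script_path: str, label: str = "Add OCR Layer") -> str:
    """Return .reg file content that registers `script_path` under `label`.

    Backslashes in the path are doubled and the inner quotes around the
    path and %1 are escaped, as .reg string values require.
    """
    escaped_path = script_path.replace("\\", "\\\\")
    key = "HKEY_CLASSES_ROOT\\*\\shell\\" + label
    command = '\\"' + escaped_path + '\\" \\"%1\\"'
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        "[" + key + "]",
        '@="' + label + '"',
        "",
        "[" + key + "\\command]",
        '@="' + command + '"',
        "",
    ]
    return "\n".join(lines)
```

Calling `print(reg_file_text(r"C:\Tools\add_ocr_layer.bat"))` reproduces the registry entries shown in this section, and passing a different path yields the matching escaped form.
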
### Manual (if needed)
In Registry Editor (`regedit`):
1. Navigate to `Computer\HKEY_CLASSES_ROOT\*\shell\`
2. Create a key named `Add OCR Layer`
3. Inside it, create a key named `command`
4. Set the default value of `command` to `"C:\Tools\add_ocr_layer.bat" "%1"`
- - -
# ▶️ **Usage**
### **Right-click any PDF → Add OCR Layer**
The script will:
1. Show the file path
2. Run OCRmyPDF
3. Detect if pages contain text
4. If text is found, it will prompt:

       Choose how to proceed:
       R = Use --redo-ocr (raster areas only)
       F = Use --force-ocr (overwrite all text)
       S = Skip OCR

5. Your OCR'd file will be saved as `original_filename_ocr.pdf`
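
The steps above amount to a try-then-retry flow. The sketch below is one way to express it in Python, not the batch script's actual code. It assumes the prompt is triggered by exit code 6, which OCRmyPDF documents as "the file already appears to contain text". How the script recognizes Tagged PDFs is not shown in this README, so that case is left out. The names `add_ocr_layer`, `run_ocrmypdf`, `ask`, and `run` are hypothetical.

```python
import subprocess
from typing import Callable, Optional

# Exit code OCRmyPDF documents for "the file already appears to contain text".
ALREADY_HAS_TEXT = 6
FLAGS = {"R": "--redo-ocr", "F": "--force-ocr"}


def run_ocrmypdf(args: list[str]) -> int:
    """Run the ocrmypdf CLI with `args` and return its exit code."""
    return subprocess.run(["ocrmypdf", *args]).returncode


def add_ocr_layer(
    src: str,
    dst: str,
    ask: Callable[[], str],
    run: Callable[[list[str]], int] = run_ocrmypdf,
) -> Optional[int]:
    """Try a plain run; if text already exists, ask R/F/S and retry once.

    Returns the final exit code, or None when the user chooses to skip.
    """
    code = run([src, dst])
    if code != ALREADY_HAS_TEXT:
        return code
    flag = FLAGS.get(ask().strip().upper())
    if flag is None:  # "S" or anything unrecognized: leave the file alone
        return None
    return run([flag, src, dst])
```

A call such as `add_ocr_layer("scan.pdf", "scan_ocr.pdf", ask=lambda: input("R/F/S? "))` mirrors the interactive flow. Passing `ask` and `run` as parameters keeps the decision logic separate from the console and the subprocess call.
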
- - -
# ⚠ Troubleshooting
### **Ghostscript not found (gs missing)**
Install via Chocolatey:

`choco install ghostscript`

Or add Ghostscript's `bin` folder to PATH manually.
- - -
### **OCRmyPDF not found**
Ensure the Python `Scripts` folder is in PATH, for example:

`C:\Users\<you>\AppData\Local\Programs\Python\Python312\Scripts\`
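
The exact folder depends on where Python was installed, so rather than guessing the path you can ask Python for it. This is a small sketch using the standard `sysconfig` module. `scripts_dir` and `is_on_path` are hypothetical helper names, and a per-user `pip install --user` places scripts in a different folder that this check does not cover.

```python
import os
import sysconfig


def scripts_dir() -> str:
    """Return the directory where this Python installs console scripts."""
    return sysconfig.get_path("scripts")


def is_on_path(directory: str, path_value: str) -> bool:
    """Check whether `directory` is one of the entries in a PATH-style string."""
    wanted = os.path.normcase(os.path.normpath(directory))
    entries = (e for e in path_value.split(os.pathsep) if e)
    return any(os.path.normcase(os.path.normpath(e)) == wanted for e in entries)
```

For example, `is_on_path(scripts_dir(), os.environ.get("PATH", ""))` returns `False` when the folder still needs to be added to PATH.
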
- - -
### **TaggedPDFError appears and OCR stops**
The script handles this error automatically: instead of stopping, it offers the **R** / **F** / **S** choices described under Usage.
- - -
# 🧪 Tested On
* Windows 10
* Windows 11
* Python 3.12
* OCRmyPDF 15.x
* Ghostscript 10.x
* Tesseract 5.x