Extract text from video frames with Tesseract OCR

Screen recordings, scanned documents on video, slide decks, tutorial captures — they're full of text the vision model can read but text-only LLMs can't. `peepshow ./demo.mp4 --ocr` auto-detects Tesseract on PATH and attaches recognised text to every frame in the JSON emit. Pair with the Notion / Obsidian / SQLite sinks for a searchable archive.

Steps

Install peepshow + Tesseract

Tesseract is available from Homebrew (macOS) and apt (Debian/Ubuntu).

```
npm install -g peepshow
brew install tesseract  # or: sudo apt install tesseract-ocr
```

Run with --ocr

Tesseract is auto-detected on PATH. If it isn't found, peepshow soft-fails: it skips the whole OCR pass with a hint instead of failing the run.

```
peepshow ./demo.mp4 --ocr
```
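The detection and soft-fail behaviour described above can be approximated with the stock `tesseract` CLI. This is a rough sketch, not peepshow's implementation: the `ocr_frames` function, the directory of pre-extracted PNG frames, and the message wording are all illustrative assumptions. It skips the whole pass when `tesseract` is not on PATH and logs and skips any single frame the engine fails on.

```shell
# Hypothetical sketch of an OCR pass with peepshow-style soft-failing.
# Assumes frames were already extracted as PNG files into one directory.
# Not peepshow's actual code; names and messages are illustrative.
ocr_frames() {
  dir=$1
  lang=${2:-eng}   # Tesseract language spec, e.g. eng or eng+spa

  # Whole-pass soft-fail: no engine on PATH means skip OCR, not abort.
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "ocr: skipped (tesseract not found on PATH)" >&2
    return 0
  fi

  for frame in "$dir"/*.png; do
    [ -e "$frame" ] || continue   # glob matched no files
    # Per-frame soft-fail: an engine error on one frame is logged, then skipped.
    if text=$(tesseract "$frame" stdout -l "$lang" 2>/dev/null); then
      # Emit "filename<TAB>text" with the text flattened onto one line.
      printf '%s\t%s\n' "$(basename "$frame")" "$(printf '%s' "$text" | tr '\n' ' ')"
    else
      echo "ocr: failed on $frame, continuing" >&2
    fi
  done
}

# Usage: ocr_frames ./frames eng+spa > ocr.tsv
```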
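
Pick a language

The default is `eng`. For frames that mix languages, join Tesseract language codes with `+`. Each code needs its traineddata pack installed (e.g. `apt install tesseract-ocr-spa`, or `brew install tesseract-lang` for all packs).

```
peepshow ./demo.mp4 --ocr --ocr-lang eng+spa
```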
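A missing traineddata pack is easy to miss until a long run finishes. The hypothetical helper below, `missing_langs` (not part of peepshow or Tesseract), checks a `+`-joined spec against the output of `tesseract --list-langs` and prints any codes that aren't installed. It takes the listing as an argument, so the parsing works the same whether the listing comes from Tesseract or from a saved file.

```shell
# Hypothetical helper: print each language in a Tesseract spec such as
# "eng+spa" that does not appear in a `tesseract --list-langs` listing.
# The listing starts with a header line, so only exact whole-line
# matches count as installed languages.
missing_langs() {
  spec=$1
  listing=$2
  # Split the spec on "+" and test each code separately.
  printf '%s\n' "$spec" | tr '+' '\n' | while IFS= read -r code; do
    [ -n "$code" ] || continue
    if ! printf '%s\n' "$listing" | grep -qxF "$code"; then
      printf '%s\n' "$code"
    fi
  done
}

# Usage (older Tesseract versions print the list on stderr, hence 2>&1):
#   missing=$(missing_langs eng+spa "$(tesseract --list-langs 2>&1)")
#   [ -z "$missing" ] || echo "missing traineddata: $missing"
```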

Feed to your LLM or archive

Each frame's recognised text lands in an `ocr.text` field in the JSON emit and in every sink.

```
peepshow ./demo.mp4 --ocr --sink sqlite
sqlite3 ~/.peepshow/sinks/sqlite/peepshow.db "SELECT ocr_text FROM frames WHERE ocr_text LIKE '%login%'"
```
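Splicing an arbitrary search term straight into that SQL string breaks as soon as the term contains a single quote. A safer pattern is to double embedded quotes before building the query. `sql_quote`, `search_ocr`, and the `PEEPSHOW_DB` variable are hypothetical helpers; the database path, `frames` table, and `ocr_text` column are taken from the example query above rather than from a documented schema.

```shell
# Hypothetical helpers for searching the peepshow SQLite sink.
# DB path, `frames` table and `ocr_text` column come from the example
# query above; adjust them if your sink's schema differs.
PEEPSHOW_DB=${PEEPSHOW_DB:-$HOME/.peepshow/sinks/sqlite/peepshow.db}

# Wrap a value in single quotes, doubling any embedded single quotes
# (standard SQL string-literal escaping).
sql_quote() {
  printf "'%s'" "$(printf '%s' "$1" | sed "s/'/''/g")"
}

# Print the OCR text of every frame whose text contains the search term.
# Note: % and _ in the term still act as LIKE wildcards.
search_ocr() {
  sqlite3 "$PEEPSHOW_DB" \
    "SELECT ocr_text FROM frames WHERE ocr_text LIKE $(sql_quote "%$1%");"
}

# Usage: search_ocr "can't log in"
```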

Why it works

Tesseract is the most widely deployed open-source OCR engine. peepshow wraps it without making it a package dependency: it auto-detects the binary via `which`, soft-fails per frame on engine errors, and soft-fails the whole pass if Tesseract is missing. A top-level `OcrInfo` summary reports whether OCR was applied or skipped, the language used, and frame counts. The same OCR output also drives peepshow's burned-caption heuristic, which looks for rolling captions across 3 or more consecutive frames.
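The page only says the burned-caption heuristic works over 3+ consecutive frames, so the following is not peepshow's heuristic. It is a minimal run-length sketch of the counting step such a check needs, assuming a hypothetical input format of one frame's OCR text per line (a blank line meaning no text was found). `longest_text_run` and `looks_captioned` are illustrative names.

```shell
# Hypothetical sketch, not peepshow's heuristic: given one frame's OCR
# text per input line (blank line = no text found), print the length of
# the longest run of consecutive frames that all contain text.
longest_text_run() {
  awk '
    NF > 0 { run++; if (run > best) best = run; next }
    { run = 0 }
    END { print best + 0 }
  '
}

# Succeed when the longest text run reaches the threshold (default 3).
looks_captioned() {
  [ "$(longest_text_run)" -ge "${1:-3}" ]
}
```

A real caption detector would also compare where the text sits in the frame and how it changes between frames; this sketch only counts consecutive frames with text.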

When it helps

Screen recordings — extract UI labels, error messages, code visible on screen.
Scanned documents on video (academic recordings, lectures, legal evidence).
Subtitled videos whose captions are burned into the picture rather than carried as a separate (soft) subtitle track.
Searchable video archives — full-text search over what was visible, not just spoken.

Pitfalls

Tesseract isn't installed by default on most systems: `brew install tesseract` (macOS) / `apt install tesseract-ocr` (Linux). If it's missing, peepshow soft-fails with a hint.
The default page segmentation mode is PSM 3 (fully automatic page segmentation, no OSD). Override it with `--ocr-psm` for specialist content, e.g. PSM 6 for a single uniform block of text.
Per-frame OCR is CPU-bound: each frame can cost seconds, which adds up on long videos sampled at high frame counts. Pair with `--max` to cap the frame count.
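Before settling on an `--ocr-psm` value, it can help to run Tesseract directly on one representative frame and compare modes side by side. `compare_psm` is a hypothetical helper; it relies only on the standard `tesseract` CLI (`--psm` and the `stdout` output target).

```shell
# Hypothetical helper: OCR one image under several page segmentation
# modes so their output can be compared before choosing --ocr-psm.
# Usage: compare_psm frame.png 3 6 11
compare_psm() {
  img=$1
  shift
  [ -r "$img" ] || { echo "compare_psm: cannot read $img" >&2; return 1; }
  for psm in "$@"; do
    echo "=== PSM $psm ==="
    tesseract "$img" stdout --psm "$psm" 2>/dev/null \
      || echo "(tesseract failed for PSM $psm)"
  done
}
```

One way to grab a frame to test on is ffmpeg, e.g. `ffmpeg -ss 30 -i demo.mp4 -frames:v 1 frame.png`, then `compare_psm frame.png 3 6 11`. PSM 11 (sparse text) is often worth including for UI-heavy screen recordings.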
