peepshow/ how-to/ ocr-video-frames

Reel #H-17 Video frames → recognised text → LLM

Extract text from video frames with Tesseract OCR

Screen recordings, scanned documents on video, slide decks, tutorial captures: all are full of text that a vision model can read but a text-only LLM can't. `peepshow ./demo.mp4 --ocr` auto-detects Tesseract on PATH and attaches the recognised text to every frame in the JSON emit. Pair it with the Notion / Obsidian / SQLite sinks for a searchable archive.

Steps

  1. Install peepshow + Tesseract

    Tesseract is available from Homebrew (macOS) and apt (Linux).

    npm install -g peepshow
    brew install tesseract  # or: apt install tesseract-ocr
  2. Run with --ocr

    peepshow auto-detects the `tesseract` binary on PATH. If it is missing, the whole OCR pass soft-fails.

    peepshow ./demo.mp4 --ocr
  3. Pick a language

    The default is `eng`. Join language codes with `+` for multilingual frames.

    peepshow ./demo.mp4 --ocr --ocr-lang eng+spa
  4. Feed to your LLM or archive

    The frame-level `ocr.text` field lands in the JSON emit and in every sink.

    peepshow ./demo.mp4 --ocr --sink sqlite
    sqlite3 ~/.peepshow/sinks/sqlite/peepshow.db "SELECT ocr_text FROM frames WHERE ocr_text LIKE '%login%'"
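
Steps 1 to 3 can be verified before a long run. The sketch below is a hypothetical pre-flight helper, not part of peepshow: `check_ocr_ready` is an illustrative name. It relies only on `command -v` and Tesseract's own `--list-langs` flag to confirm that the binary is on PATH and that every code in a `+`-joined language list is installed.

```shell
# Hypothetical pre-flight helper (not part of peepshow).
# Usage: check_ocr_ready "eng+spa"
# Returns 0 when tesseract is on PATH and every requested language is installed.
check_ocr_ready() {
  local langs="${1:-eng}" installed missing=0 lang

  # Same idea as an auto-detect on PATH, done by hand.
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "tesseract not found on PATH" >&2
    return 1
  fi

  # --list-langs prints a header line, then one language code per line.
  # Some builds write the list to stderr, so capture both streams.
  installed="$(tesseract --list-langs 2>&1)"

  # Split "eng+spa" on "+" and look for each code as a whole line.
  for lang in $(printf '%s' "$langs" | tr '+' ' '); do
    if ! printf '%s\n' "$installed" | grep -qxF -- "$lang"; then
      echo "missing language data: $lang" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Chaining it in front of the real command, for example `check_ocr_ready eng+spa && peepshow ./demo.mp4 --ocr --ocr-lang eng+spa`, surfaces a missing language pack before any frames are processed.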

Why it works

Tesseract is the most widely deployed open-source OCR engine. peepshow wraps it without adding it as a dependency: it auto-detects the binary via `which`, soft-fails per frame on engine errors, and soft-fails the whole pass if the binary is missing. The top-level `OcrInfo` summary reports applied/skipped status, the language used, and frame counts. The same OCR output also drives peepshow's burned-caption heuristic, which detects rolling captions in 3+ consecutive frames.
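
The detect-then-soft-fail pattern can be illustrated with a plain shell loop. This is a sketch of the pattern only, not peepshow's implementation: `ocr_frames` is a hypothetical helper, the summary line format is invented for illustration, and the frames are assumed to exist already as PNG files in one directory. It calls the real Tesseract CLI (`tesseract IMAGE stdout -l LANG`), skips the whole pass if the binary is missing, and counts a frame as failed instead of aborting when the engine returns an error.

```shell
# Hypothetical sketch of the soft-fail pattern (not peepshow's code).
# Usage: ocr_frames FRAME_DIR [LANG]
# Writes FRAME.txt next to each FRAME.png and prints one summary line.
ocr_frames() {
  local dir="$1" lang="${2:-eng}" ok=0 failed=0 frame out

  # Whole-pass soft failure: report "skipped" and return success
  # so the caller can carry on without OCR.
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "ocr=skipped reason=tesseract-missing"
    return 0
  fi

  for frame in "$dir"/*.png; do
    [ -e "$frame" ] || continue   # the glob matched no PNG files
    out="${frame%.png}.txt"
    # Per-frame soft failure: a failing frame is counted, not fatal.
    if tesseract "$frame" stdout -l "$lang" > "$out" 2>/dev/null; then
      ok=$((ok + 1))
    else
      rm -f "$out"
      failed=$((failed + 1))
    fi
  done
  echo "ocr=applied lang=$lang ok=$ok failed=$failed"
}
```

Returning success in the missing-binary case is the design choice that makes the failure "soft": the rest of a pipeline keeps running, and the summary line records that OCR was skipped.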

When it helps

  • Screen recordings — extract UI labels, error messages, code visible on screen.
  • Scanned documents on video (academic recordings, lectures, legal evidence).
  • Subtitled videos where the captions are burned in rather than soft.
  • Searchable video archives — full-text search over what was visible, not just spoken.

Pitfalls

  • Tesseract isn't on every system. Install it with `brew install tesseract` (macOS) or `apt install tesseract-ocr` (Linux). peepshow soft-fails with a hint if it is missing.
  • The default page segmentation mode is PSM 3 ('automatic page segmentation, no OSD'). Override it with `--ocr-psm` for specialist content (e.g. PSM 6 for a single uniform block of text).
  • Per-frame OCR is CPU-bound and can add seconds per frame, so long videos with many frames get slow. Pair with `--max` to cap the frame count.
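
Before settling on an `--ocr-psm` value for a whole video, it can help to try a few modes on one representative frame with Tesseract directly. The helper below is a hypothetical sketch (`compare_psm` is an illustrative name, and the frame is assumed to be exported already as an image file). It uses only the Tesseract CLI's real `--psm` option and reports how many characters each mode recognised.

```shell
# Hypothetical helper: try several page segmentation modes on one frame.
# Usage: compare_psm frame.png 3 6 11
# Prints "psm=<mode> chars=<count>" per mode. More characters is not always
# better, so read the recognised text too before choosing a mode.
compare_psm() {
  local img="$1" psm chars
  shift
  for psm in "$@"; do
    # wc -c pads its output on some systems, so strip the spaces.
    chars="$(tesseract "$img" stdout --psm "$psm" 2>/dev/null | wc -c | tr -d ' ')"
    echo "psm=$psm chars=$chars"
  done
}
```

A single frame can be exported with a tool such as ffmpeg (`ffmpeg -ss 5 -i demo.mp4 -frames:v 1 frame.png`), after which `compare_psm frame.png 3 6` gives a quick side-by-side before the full, CPU-bound pass.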
