peepshow + DeepSeek — OCR-grade vision — peepshow brings the timeline

DeepSeek has no native video. peepshow bridges.

DeepSeek's vision is image-only. peepshow turns video + animated formats into the frame timeline DeepSeek already accepts.

Only path to video on DeepSeek-OCR. DeepSeek-OCR accepts image inputs only, as does any earlier DeepSeek-VL2 deployment. peepshow turns video into that shape.
Built for text-in-frame. DeepSeek-OCR is tuned for documents, screen captures, slide decks. peepshow's scene-change extractor lands exactly on the frames where text changes.
Open weights. DeepSeek-OCR ships with open weights on HuggingFace. Run on-prem with vLLM or transformers. peepshow pipeline unchanged.
Animated GIF / APNG / WebP. DeepSeek-OCR's image adapter expects JPEG / PNG. peepshow flattens animated formats into still frames, which is useful for animated UI walkthroughs.
Cost-bounded. Self-hosted open-weights inference is paid for in compute rather than per token. peepshow caps the frame count so VRAM use stays predictable.
Reuse the bundle on DeepSeek text models. Once extracted, feed the transcript to DeepSeek-V3 / R1 for downstream reasoning.

Token-cost math (worked examples)

Clip	Native upload	peepshow + DeepSeek
30s product demo (peepshow)	—	~2.5K (6 frames + transcript)
10-minute lecture (peepshow)	—	~6K (20 scene frames + transcript)
1-hour CCTV reel (peepshow)	—	~10K (30 motion frames + sparse transcript)
3-hour conference (peepshow + chunked)	—	~28K (60 scene frames + chaptered transcript)

DeepSeek-OCR is open-weight — billed in VRAM-seconds rather than $/token if you self-host. Numbers above are token-equivalent for context-budget planning.
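For context-budget planning, the arithmetic behind rows like these can be sketched as a small estimator. This is an illustration under stated assumptions: `tokensPerFrame` and `transcriptTokens` are hypothetical inputs you would measure on your own deployment, and neither peepshow nor DeepSeek is known to report them in this form. The example numbers are made up and chosen only so the total lands on the ~2.5K row.

```javascript
// Rough context-budget planner. The per-frame and transcript token counts are
// caller-supplied assumptions, not values reported by peepshow or DeepSeek.
function estimateTokens({ frames, tokensPerFrame, transcriptTokens }) {
  return frames * tokensPerFrame + transcriptTokens;
}

// Largest frame count that still fits once the transcript has been paid for.
function maxFramesForBudget({ budgetTokens, tokensPerFrame, transcriptTokens }) {
  const remaining = budgetTokens - transcriptTokens;
  if (remaining <= 0) return 0;
  return Math.floor(remaining / tokensPerFrame);
}

// Illustrative numbers only: 6 frames at 300 tokens each plus a 700-token transcript.
const example = estimateTokens({ frames: 6, tokensPerFrame: 300, transcriptTokens: 700 });
console.log(example); // 2500
```

Working backwards with `maxFramesForBudget` gives a value you could pass to `--max` when a fixed context window is the constraint.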

Install (CLI)

npm install -g peepshow

# Self-host DeepSeek-OCR via vLLM:
pip install vllm
python -m vllm.entrypoints.openai.api_server --model deepseek-ai/DeepSeek-OCR

peepshow ./demo.mp4 --emit json > run.json

Install (DeepSeek API directly, no CLI)

Calling the DeepSeek API from your own code? Run peepshow first, then feed the JSON manifest in as multimodal parts:

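# Frames + transcript → DeepSeek-OCR via OpenAI-compatible endpoint
# Requires the openai package (npm install openai); --input-type=module enables import and top-level await
node --input-type=module -e '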
  import OpenAI from "openai";
  import { readFileSync } from "node:fs";
  const run = JSON.parse(readFileSync("run.json", "utf8"));
  const content = [
    { type: "text", text: "Transcribe and summarise this clip." },
    ...run.frames.map(f => ({
      type: "image_url",
      image_url: { url: "data:image/jpeg;base64," + readFileSync(f.path).toString("base64") }
    })),
    { type: "text", text: "Transcript:\n" + (run.transcript?.text ?? "") }
  ];
  const client = new OpenAI({ baseURL: "http://127.0.0.1:8000/v1", apiKey: "none" });
  const r = await client.chat.completions.create({
    model: "deepseek-ai/DeepSeek-OCR",
    messages: [{ role: "user", content }]
  });
  console.log(r.choices[0].message.content);
'
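For long clips, such as the chunked 3-hour row in the token-cost table, sending every frame in one request may not fit the context window. A minimal sketch of batching follows, assuming the manifest shape used above (`run.frames` as an array of objects with a `path`). The helper names `chunkFrames` and `buildBatchContent`, the prompt wording, and the `toDataUrl` callback are hypothetical and not part of peepshow.

```javascript
// Split a manifest's frames into request-sized batches.
function chunkFrames(frames, maxPerRequest) {
  if (!Number.isInteger(maxPerRequest) || maxPerRequest < 1) {
    throw new RangeError("maxPerRequest must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < frames.length; i += maxPerRequest) {
    batches.push(frames.slice(i, i + maxPerRequest));
  }
  return batches;
}

// Build one multimodal content array per batch.
// toDataUrl is supplied by the caller, e.g. the base64 data-URL encoding shown above.
function buildBatchContent(batch, index, total, toDataUrl) {
  return [
    { type: "text", text: `Part ${index + 1} of ${total}. Transcribe the text in these frames.` },
    ...batch.map(f => ({ type: "image_url", image_url: { url: toDataUrl(f.path) } })),
  ];
}
```

Each content array can then be sent as its own `chat.completions.create` call, with the per-part answers concatenated afterwards.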

Animated GIF / APNG / WebP — peepshow's killer move on DeepSeek

DeepSeek-OCR's image adapter reads JPEG / PNG. peepshow normalises animated GIF / APNG / WebP into a flat JPEG sequence — handy for screen-recording GIFs where the text changes per frame.

peepshow ./meme.gif         # animated GIF → frame timeline
peepshow ./tutorial.apng    # animated PNG → frames
peepshow ./loop.webp        # animated WebP → frames

Frame strategy presets

peepshow picks scene-change frames by default. For DeepSeek specifically, these presets are worth knowing:

--strategy scene --max 16	Slide decks / screen recordings: scene detection lands on slide transitions.
--strategy scene --max 30	Document review / OCR-heavy footage: more frames, higher recall on text changes.
--strategy fps --fps 1 --max 30	Steady-cadence sampling for sport, broadcast, gameplay (non-OCR use).
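If you launch peepshow from a script, the presets can be kept in one place. The sketch below copies the flags from the list above; the preset keys (`slides`, `documents`, `cadence`) and the `peepshowArgs` helper are hypothetical names, and combining a preset with `--emit json` in a single invocation is an assumption.

```javascript
// Preset flags copied from the list above. The keys are hypothetical labels.
const PRESETS = {
  slides: ["--strategy", "scene", "--max", "16"],
  documents: ["--strategy", "scene", "--max", "30"],
  cadence: ["--strategy", "fps", "--fps", "1", "--max", "30"],
};

// Build the argument vector for one run (assumes presets combine with --emit json).
function peepshowArgs(input, preset) {
  if (!Object.prototype.hasOwnProperty.call(PRESETS, preset)) {
    throw new Error(`unknown preset: ${preset}`);
  }
  return [input, ...PRESETS[preset], "--emit", "json"];
}

console.log(["peepshow", ...peepshowArgs("./demo.mp4", "slides")].join(" "));
// peepshow ./demo.mp4 --strategy scene --max 16 --emit json
```

Returning an array rather than a single string lets the arguments go straight to `child_process.spawn` without shell quoting.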

All 95 sinks still fire

Same CLI = same sinks. Push frames to SQLite, embed captions into Chroma, mirror to S3, drop a thumbnail in Slack, file a GitHub issue with the offending frame attached, all from one DeepSeek run. Browse the full sink catalogue.

Report + LLM analysis loop

Every run also writes a self-contained report.html + manifest.json next to the frames (see the Report page). When DeepSeek consumes the frames, the analysis flows back into the report — whoever opens it next sees the model's understanding without re-running the prompt.

echo '{"summary":"<DeepSeek summary>","provider":"deepseek-ocr"}' \
  | peepshow report annotate "<outputDir>"
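Model summaries often contain quotes and apostrophes, which are awkward to pass through a shell `echo`. A minimal Node sketch that builds the same two-field payload with `JSON.stringify` and pipes it to the command on stdin is shown here. The helper names are hypothetical, and the sketch assumes `peepshow report annotate` accepts exactly the `summary` and `provider` fields shown above.

```javascript
// Build the annotation payload; JSON.stringify escapes quotes in the summary.
function buildAnnotation(summary, provider = "deepseek-ocr") {
  return JSON.stringify({ summary, provider });
}

// Pipe the payload to `peepshow report annotate <outputDir>` on stdin.
// Assumes the peepshow CLI is on PATH.
async function annotateReport(outputDir, summary) {
  const { spawnSync } = await import("node:child_process");
  const result = spawnSync("peepshow", ["report", "annotate", outputDir], {
    input: buildAnnotation(summary),
    encoding: "utf8",
  });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(result.stderr || `peepshow exited with status ${result.status}`);
  }
  return result.stdout;
}
```

Calling `annotateReport(outputDir, r.choices[0].message.content)` right after the completion request would write the model's answer into the report in the same script.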

When to skip peepshow + use DeepSeek direct

Source is already a small set of images / a single document scan.
Running DeepSeek-V3 / R1 text-only — no vision capability.
Need streaming-frame OCR on a live feed (peepshow is one-shot).

For everything beyond those edge cases, peepshow is the bridge: video + animated formats + transcript → DeepSeek reads them as images + text.

DeepSeek-OCR — DeepSeek-OCR reads images with OCR precision. peepshow extracts the frame timeline so it reads video as a sequence of stills.
