DeepSeek has no native video. peepshow bridges.
DeepSeek's vision is image-only. peepshow turns video + animated formats into the frame timeline DeepSeek already accepts.
- Only path to video on DeepSeek-OCR. DeepSeek-OCR accepts image inputs only. peepshow turns video into that shape, and the same holds for any earlier DeepSeek-VL2 deployment.
- Built for text-in-frame. DeepSeek-OCR is tuned for documents, screen captures, slide decks. peepshow's scene-change extractor lands exactly on the frames where text changes.
- Open weights. DeepSeek-OCR ships with open weights on HuggingFace. Run on-prem with vLLM or transformers. peepshow pipeline unchanged.
- Animated GIF / APNG / WebP. Adapter expects JPEG / PNG. peepshow flattens animated formats — useful for animated UI walkthroughs.
- Cost-bounded. Open-weights inference = compute cost. peepshow caps frame count so VRAM use stays predictable.
- Reuse the bundle on DeepSeek text models. Once extracted, feed the transcript to DeepSeek-V3 / R1 for downstream reasoning.
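The cost-bounded point above comes down to capping how many frames reach the model. Below is a minimal sketch of applying such a cap downstream of peepshow, assuming you already hold an array of frames in timeline order. `capFrames` is a hypothetical helper, not part of peepshow, and even sampling is used here only as a stand-in for whatever selection peepshow performs internally.

```javascript
// Hypothetical helper (not a peepshow API): keep at most `max` frames,
// spread evenly across the timeline, always keeping the first and last frame.
function capFrames(frames, max) {
  if (!Number.isInteger(max) || max < 1) {
    throw new RangeError("max must be a positive integer");
  }
  if (frames.length <= max) return frames.slice(); // already within budget
  if (max === 1) return [frames[0]];
  // step > 1 here, so the rounded indices below are all distinct.
  const step = (frames.length - 1) / (max - 1);
  return Array.from({ length: max }, (_, i) => frames[Math.round(i * step)]);
}

// Ten frames capped to four keeps indices 0, 3, 6 and 9.
console.log(capFrames([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 4));
```

Because the cap is fixed, the worst-case number of images per request (and so the memory the vision encoder needs) is known before the run starts.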
Token-cost math (worked examples)
| Clip | Native upload | peepshow + DeepSeek |
|---|---|---|
| 30s product demo (peepshow) | — | ~2.5K (6 frames + transcript) |
| 10-minute lecture (peepshow) | — | ~6K (20 scene frames + transcript) |
| 1-hour CCTV reel (peepshow) | — | ~10K (30 motion frames + sparse transcript) |
| 3-hour conference (peepshow + chunked) | — | ~28K (60 scene frames + chaptered transcript) |
DeepSeek-OCR is open-weight — billed in VRAM-seconds rather than $/token if you self-host. Numbers above are token-equivalent for context-budget planning.
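For context-budget planning, the figures in the table can be thought of as frame tokens plus transcript tokens. The sketch below shows that arithmetic only. `estimateTokens`, the per-frame token count and the characters-per-token ratio are all assumptions for illustration, not values published by peepshow or DeepSeek, so measure them on your own deployment before relying on the result.

```javascript
// Rough context-budget estimate: frames * tokens-per-frame + transcript tokens.
// tokensPerFrame and charsPerToken are assumed inputs, not measured constants.
function estimateTokens({ frameCount, tokensPerFrame, transcriptChars = 0, charsPerToken = 4 }) {
  const frameTokens = frameCount * tokensPerFrame;
  const transcriptTokens = Math.ceil(transcriptChars / charsPerToken);
  return { frameTokens, transcriptTokens, total: frameTokens + transcriptTokens };
}

// Illustrative inputs only: 6 frames at an assumed 256 tokens each,
// plus a 2,000-character transcript at an assumed 4 characters per token.
const estimate = estimateTokens({ frameCount: 6, tokensPerFrame: 256, transcriptChars: 2000 });
console.log(estimate); // { frameTokens: 1536, transcriptTokens: 500, total: 2036 }
```

Keeping the two terms separate makes it easier to see whether frames or transcript dominate the budget for a given clip.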
Install (CLI)
npm install -g peepshow
# Self-host DeepSeek-OCR via vLLM:
pip install vllm
python -m vllm.entrypoints.openai.api_server --model deepseek-ai/DeepSeek-OCR
peepshow ./demo.mp4 --emit json > run.json
Install (DeepSeek API directly, no CLI)
Calling the DeepSeek API from your own code? Run peepshow first, then feed the JSON manifest in as multimodal parts:
# Frames + transcript → DeepSeek-OCR via OpenAI-compatible endpoint
# --input-type=module makes node -e treat the script as ESM (import statements, top-level await)
node --input-type=module -e '
import OpenAI from "openai";
import { readFileSync } from "node:fs";

const run = JSON.parse(readFileSync("run.json", "utf8"));

// One text prompt, one image part per extracted frame, then the transcript.
const content = [
  { type: "text", text: "Transcribe and summarise this clip." },
  ...run.frames.map(f => ({
    type: "image_url",
    image_url: { url: "data:image/jpeg;base64," + readFileSync(f.path).toString("base64") }
  })),
  { type: "text", text: "Transcript:\n" + (run.transcript?.text ?? "") }
];

// Points at the local vLLM server started above; vLLM ignores the API key.
const client = new OpenAI({ baseURL: "http://127.0.0.1:8000/v1", apiKey: "none" });
const r = await client.chat.completions.create({
  model: "deepseek-ai/DeepSeek-OCR",
  messages: [{ role: "user", content }]
});
console.log(r.choices[0].message.content);
'
Animated GIF / APNG / WebP: peepshow's killer move on DeepSeek
DeepSeek-OCR's image adapter reads JPEG / PNG. peepshow normalises animated GIF / APNG / WebP into a flat JPEG sequence — handy for screen-recording GIFs where the text changes per frame.
peepshow ./meme.gif # animated GIF → frame timeline
peepshow ./tutorial.apng # animated PNG → frames
peepshow ./loop.webp # animated WebP → frames
Frame strategy presets
peepshow picks scene-change frames by default. For DeepSeek specifically, these presets are worth knowing:
| Preset | Use case |
|---|---|
| --strategy scene --max 16 | Slide decks / screen recordings. Scene detection lands on slide transitions. |
| --strategy scene --max 30 | Document review / OCR-heavy footage. More frames, higher recall on text changes. |
| --strategy fps --fps 1 --max 30 | Steady-cadence sampling for sport, broadcast, gameplay (non-OCR use). |
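If you drive peepshow from a script, the presets above can be kept in one lookup so every job uses the same flags. This is a minimal sketch: the flag values are copied from the presets above, while the keys `slides`, `documents` and `steady`, and the `peepshowArgs` helper, are hypothetical names. It also assumes the strategy flags can be combined with `--emit json` from the install section.

```javascript
// Flag values come from the presets above; the key names are hypothetical labels.
const PRESETS = {
  slides: ["--strategy", "scene", "--max", "16"],
  documents: ["--strategy", "scene", "--max", "30"],
  steady: ["--strategy", "fps", "--fps", "1", "--max", "30"],
};

// Build an argument vector for the peepshow CLI.
// Assumption: strategy flags and --emit json can be passed together.
function peepshowArgs(input, kind) {
  if (!Object.prototype.hasOwnProperty.call(PRESETS, kind)) {
    throw new Error(`unknown preset: ${kind}`);
  }
  return [input, ...PRESETS[kind], "--emit", "json"];
}

console.log(peepshowArgs("./demo.mp4", "slides").join(" "));
// ./demo.mp4 --strategy scene --max 16 --emit json
```

Returning an array rather than a single string means the result can be handed to `execFile` from `node:child_process` without shell quoting concerns.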
All 95 sinks still fire
Same CLI = same sinks. Push frames to SQLite, embed captions into Chroma, mirror to S3, drop a thumbnail in Slack, file a GitHub issue with the offending frame attached — all from one DeepSeek run. Browse the full sink catalogue →.
Report + LLM analysis loop
Every run also writes a self-contained report.html + manifest.json next to the frames (see the Report page). When DeepSeek consumes the frames, the analysis flows back into the report — whoever opens it next sees the model's understanding without re-running the prompt.
echo '{"summary":"<DeepSeek summary>","provider":"deepseek-ocr"}' \
| peepshow report annotate "<outputDir>"
When to skip peepshow + use DeepSeek direct
- Source is already a small set of images / a single document scan.
- Running DeepSeek-V3 / R1 text-only — no vision capability.
- Need streaming-frame OCR on a live feed (peepshow is one-shot).
For everything beyond those edge cases, peepshow is the bridge: video + animated formats + transcript → DeepSeek reads them as images + text.