Qwen already reads video. Why peepshow?
Qwen accepts video natively. peepshow is not a competing video-understanding stack; it is a control plane that sits in front of native video to handle the cases native upload doesn't.
- Works on every Qwen3-VL size. Native video lands on Qwen3-VL 235B-A22B; the 30B-A3B and dense checkpoints want a frame sequence for anything but the shortest clips. peepshow gives every size the same shape.
- 256K context isn't free. Qwen3-VL's bigger window encourages longer uploads — token spend still scales linearly with clip seconds on native video. peepshow trims to N frames so the budget stays flat.
- Animated GIF / APNG / WebP. Qwen3-VL adapters don't decode animated formats end-to-end. peepshow normalises them to JPEG sequences.
- Self-host friendly. Run Qwen3-VL under vLLM, SGLang, Ollama, or LM Studio. peepshow's pipeline is identical across all of them.
- Same bundle for fine-tuned variants. Use peepshow to prep frame data for Qwen3-VL fine-tunes — no special export, the JSON manifest is a stable contract.
- DashScope + open weights. Alibaba-hosted Qwen3-VL and the open-weight checkpoints share the same image input shape — extract once, swap endpoints.
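As a rough illustration of "extract once, swap endpoints", the sketch below builds two request bodies from the same frames. It assumes the manifest fields used in the Ollama example on this page (`run.frames[].path`, `run.transcript.text`) and that the frames are already loaded as base64 JPEG strings. The helper names `ollamaBody` and `openAiBody` are hypothetical, and the OpenAI-style message shape is an assumption about what vLLM, SGLang, LM Studio, and DashScope's compatible mode accept, so check each server's documentation before relying on it.

```javascript
// Sketch only: helper names are hypothetical, not part of peepshow.
// `frames` is an array of base64-encoded JPEG strings, e.g. produced by
// run.frames.map(f => readFileSync(f.path).toString("base64")).

function promptFor(run) {
  // Same prompt text as the Ollama example on this page.
  return "Summarise this clip. Transcript: " + (run.transcript?.text ?? "");
}

// Ollama /api/chat shape: images travel as a base64 array on the message.
function ollamaBody(run, frames, model = "qwen3-vl") {
  return {
    model,
    stream: false,
    messages: [{ role: "user", content: promptFor(run), images: frames }],
  };
}

// OpenAI-style chat shape (assumed for OpenAI-compatible servers):
// one text part followed by one image_url part per frame, as data URLs.
function openAiBody(run, frames, model) {
  return {
    model, // placeholder: use whatever model id your endpoint exposes
    messages: [{
      role: "user",
      content: [
        { type: "text", text: promptFor(run) },
        ...frames.map((b64) => ({
          type: "image_url",
          image_url: { url: `data:image/jpeg;base64,${b64}` },
        })),
      ],
    }],
  };
}

// Tiny demo with stand-in data instead of real frames.
const demoRun = { transcript: { text: "hello" } };
const demoFrames = ["QUJD", "REVG"];
console.log(ollamaBody(demoRun, demoFrames).messages[0].images.length);
console.log(openAiBody(demoRun, demoFrames, "your-model-id").messages[0].content.length);
```

Only the body builder changes between endpoints; the extraction step and the manifest stay the same.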
Token-cost math (worked examples)
| Clip | Native upload | peepshow + Qwen |
|---|---|---|
| 30s demo (Qwen3-VL 235B native) | ~4.2K tokens | ~3K (peepshow 6 frames + transcript) |
| 10-min lecture (Qwen3-VL 235B native, often refuses) | ~84K tokens | ~7K (peepshow 20 frames + transcript) |
| 10-min lecture (Qwen3-VL 30B-A3B, image-mode) | n/a | ~7K (peepshow 20 frames + transcript) |
| 1-hour CCTV (any Qwen3-VL size) | (no native support past clip cap) | ~12K (peepshow 30 frames + sparse transcript) |
Qwen3-VL's native video path has clip-length and frame-rate ceilings, and it refuses many longer clips outright. peepshow sidesteps those ceilings by sending a bounded frame sequence instead of the clip itself.
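The native column in the table implies a flat rate: ~4.2K tokens for 30 s and ~84K tokens for 600 s both work out to about 140 tokens per second. The sketch below reproduces that arithmetic. The peepshow figures are copied from the table rather than derived, and the one-hour native figure is only an extrapolation of the same rate (the table lists no native support for that clip).

```javascript
// Native rate implied by the table: 4200 tokens / 30 s = 140 tokens per second.
const NATIVE_TOKENS_PER_SECOND = 4200 / 30;

// Linear estimate of native-upload token spend for a clip of the given length.
function nativeTokens(seconds) {
  return Math.round(seconds * NATIVE_TOKENS_PER_SECOND);
}

// peepshow totals are the approximate figures from the table above.
const rows = [
  { clip: "30s demo", seconds: 30, peepshow: 3000 },
  { clip: "10-min lecture", seconds: 600, peepshow: 7000 },
  // Extrapolated: the table gives no native figure for this clip.
  { clip: "1-hour CCTV", seconds: 3600, peepshow: 12000 },
];

for (const row of rows) {
  const native = nativeTokens(row.seconds);
  const ratio = (native / row.peepshow).toFixed(1);
  console.log(`${row.clip}: native ~${native}, peepshow ~${row.peepshow}, ratio ${ratio}x`);
}
```

At that rate a one-hour clip would need roughly 504K tokens, about twice the 256K window, which is consistent with the table showing no native path for it.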
Install (CLI)
npm install -g peepshow
# Self-host via Ollama:
ollama pull qwen3-vl
# OR DashScope:
export DASHSCOPE_API_KEY=...
peepshow ./demo.mp4 --emit json > run.json
Install (Qwen API directly, no CLI)
Calling the Qwen API from your own code? Run peepshow first, then feed the JSON manifest in as multimodal parts:
# Pipe frames + transcript to Qwen3-VL via Ollama
# --input-type=module lets the inline script use import and top-level await
node --input-type=module -e '
import { readFileSync } from "node:fs";

// Load the manifest written by: peepshow ./demo.mp4 --emit json > run.json
const run = JSON.parse(readFileSync("run.json", "utf8"));

const r = await fetch("http://127.0.0.1:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "qwen3-vl",
    messages: [{
      role: "user",
      content: "Summarise this clip. Transcript: " + (run.transcript?.text ?? ""),
      // Ollama expects each image as a base64 string
      images: run.frames.map(f => readFileSync(f.path).toString("base64"))
    }],
    stream: false
  })
});
console.log((await r.json()).message.content);
'
Animated GIF / APNG / WebP: peepshow's killer move on Qwen
Qwen3-VL accepts JPEG / PNG. Animated GIFs / APNGs / WebPs need flattening — peepshow does it automatically.
peepshow ./meme.gif # animated GIF → frame timeline
peepshow ./tutorial.apng # animated PNG → frames
peepshow ./loop.webp # animated WebP → frames
Frame strategy presets
peepshow picks scene-change frames by default. For Qwen specifically, these presets are worth knowing:
--strategy scene --max 12 --resize 512
Qwen3-VL dense / 30B-A3B at small VRAM: keep frames lean.
--strategy scene --max 32 --resize 1024
Qwen3-VL 235B-A22B: exploit the 256K context with more scene coverage.
--strategy fps --fps 1 --max 30
Mirror Qwen3-VL's native video sampler for like-for-like comparison runs.
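If you script runs across several checkpoints, the presets above can be picked programmatically. This is a minimal sketch: the flag values are copied from the presets, but the helper names, the model-name matching, and the idea of combining a preset with `--emit json` in one invocation are assumptions, not documented peepshow behaviour.

```javascript
// Flag values copied from the presets above; preset keys are hypothetical labels.
const PRESETS = {
  lean: ["--strategy", "scene", "--max", "12", "--resize", "512"],
  wide: ["--strategy", "scene", "--max", "32", "--resize", "1024"],
  nativeLike: ["--strategy", "fps", "--fps", "1", "--max", "30"],
};

// Choose a preset from a model name. Matching on "235b" is an assumed naming
// convention; adjust it to however your deployment names its checkpoints.
function presetFor(model, { compareWithNative = false } = {}) {
  if (compareWithNative) return PRESETS.nativeLike;
  return /235b/i.test(model) ? PRESETS.wide : PRESETS.lean;
}

// Build an argument list for the peepshow CLI (assumes these flags combine).
function peepshowArgs(input, model, options) {
  return [input, ...presetFor(model, options), "--emit", "json"];
}

console.log(["peepshow", ...peepshowArgs("./demo.mp4", "qwen3-vl-30b-a3b")].join(" "));
```

Returning an argument array rather than a command string keeps the result safe to hand to a process spawner without shell quoting.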
All 95 sinks still fire
Same CLI = same sinks. Push frames to SQLite, embed captions into Chroma, mirror to S3, drop a thumbnail in Slack, file a GitHub issue with the offending frame attached — all from one Qwen run. Browse the full sink catalogue →.
Report + LLM analysis loop
Every run also writes a self-contained report.html + manifest.json next to the frames (see the Report page). When Qwen consumes the frames, the analysis flows back into the report — whoever opens it next sees the model's understanding without re-running the prompt.
echo '{"summary":"<Qwen summary>","provider":"qwen3-vl-235b-a22b"}' \
  | peepshow report annotate "<outputDir>"
When to skip peepshow + use Qwen direct
- Clip is under 30s and you're on Qwen3-VL 235B-A22B with the native video path.
- Running Qwen3 text-only (no -VL suffix) — no vision capability.
- Need streaming inference on a live video feed.
Native video is excellent for those cases. peepshow earns its place when any of these apply: the clip is long, the source is an animated format, the run must be cost-bounded or auditable, you target multiple models, you run local-first, or you fan out to other systems.