peepshow + Qwen — Native video on the big size; peepshow keeps cost…

Qwen already reads video. Why peepshow?

Qwen accepts video natively. peepshow is not a competing video-understanding stack; it is a control plane that sits in front of native video and handles the cases native upload doesn't.

Works on every Qwen3-VL size. Native video lands on Qwen3-VL 235B-A22B; the 30B-A3B and dense checkpoints want a frame sequence for anything but the shortest clips. peepshow gives every size the same shape.
256K context isn't free. Qwen3-VL's bigger window encourages longer uploads, but token spend still scales linearly with clip seconds on native video. peepshow trims to N frames so the budget stays flat.
Animated GIF / APNG / WebP. Qwen3-VL adapters don't decode animated formats end-to-end. peepshow normalises them to JPEG sequences.
Self-host friendly. Run Qwen3-VL under vLLM, SGLang, Ollama, or LM Studio. peepshow's pipeline is identical across all of them.
Same bundle for fine-tuned variants. Use peepshow to prep frame data for Qwen3-VL fine-tunes — no special export, the JSON manifest is a stable contract.
DashScope + open weights. Alibaba-hosted Qwen3-VL and the open-weight checkpoints share the same image input shape — extract once, swap endpoints.

Token-cost math (worked examples)

Clip	Native upload	peepshow + Qwen
30s demo (Qwen3-VL 235B native)	~4.2K tokens	~3K (peepshow 6 frames + transcript)
10-min lecture (Qwen3-VL 235B native, often refuses)	~84K tokens	~7K (peepshow 20 frames + transcript)
10-min lecture (Qwen3-VL 30B-A3B, image-mode)	n/a	~7K (peepshow 20 frames + transcript)
1-hour CCTV (any Qwen3-VL size)	(no native support past clip cap)	~12K (peepshow 30 frames + sparse transcript)

Qwen3-VL's native video path has clip-length and frame-rate ceilings and refuses many longer clips outright. peepshow sidesteps those ceilings by sending a bounded set of frames plus a transcript instead.
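The native-upload column in the table is consistent with a simple linear model, which the sketch below reproduces. The rate is derived from the 30s row (~4.2K tokens), not from any official Qwen formula, and the helper names are hypothetical; the peepshow totals are copied from the table rather than computed.

```javascript
// Assumption: native video tokens scale linearly with clip length.
// The rate is inferred from the table's 30s row (~4.2K tokens).
const NATIVE_30S_TOKENS = 4200;
const tokensPerSecond = NATIVE_30S_TOKENS / 30; // 140 tokens per second

// Estimated native-upload tokens for a clip of the given length.
function nativeTokens(seconds) {
  return Math.round(seconds * tokensPerSecond);
}

// How many times cheaper the peepshow bundle is than native upload.
function savings(nativeTok, peepshowTok) {
  return nativeTok / peepshowTok;
}

const lecture = nativeTokens(600);            // 10-minute lecture
console.log(lecture);                         // 84000, matching the ~84K row
console.log(savings(lecture, 7000));          // 12
console.log(savings(4200, 3000).toFixed(1));  // 1.4
```

On these figures the saving is modest for a 30s demo (about 1.4x) and grows to roughly 12x for the 10-minute lecture, because the native cost grows with duration while the frame budget does not.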

Install (CLI)

npm install -g peepshow

# Self-host via Ollama:
ollama pull qwen3-vl
# OR DashScope:
export DASHSCOPE_API_KEY=...

peepshow ./demo.mp4 --emit json > run.json

Install (Qwen API directly, no CLI)

Calling the Qwen API from your own code? Run peepshow first, then feed the JSON manifest in as multimodal parts:

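# Pipe frames + transcript to Qwen3-VL via Ollama (Node 18+ for global fetch)
# --input-type=module lets the inline script use import and top-level await
node --input-type=module -e '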
  import { readFileSync } from "node:fs";
  const run = JSON.parse(readFileSync("run.json", "utf8"));
  const r = await fetch("http://127.0.0.1:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen3-vl",
      messages: [{
        role: "user",
        content: "Summarise this clip. Transcript: " + (run.transcript?.text ?? ""),
        images: run.frames.map(f => readFileSync(f.path).toString("base64"))
      }],
      stream: false
    })
  });
  console.log((await r.json()).message.content);
'
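The call above uses Ollama's own `images` field. vLLM, SGLang, and LM Studio expose OpenAI-compatible chat endpoints, which take images as `image_url` content parts instead. The following is a minimal sketch of building both message shapes from the same frames. The helper names are hypothetical, and the inputs assume the frames have already been read from the manifest's `frames[].path` entries and base64-encoded as JPEG.

```javascript
// Hypothetical helpers: build one user message per endpoint style from the same frames.
// `framesB64` is an array of base64-encoded JPEG strings; `transcript` is plain text.

function buildPrompt(transcript) {
  return "Summarise this clip. Transcript: " + (transcript ?? "");
}

// Ollama /api/chat style: text in `content`, raw base64 strings in `images`.
function ollamaMessage(framesB64, transcript) {
  return { role: "user", content: buildPrompt(transcript), images: framesB64 };
}

// OpenAI-compatible style: one text part, then one image_url part per frame,
// each carrying the frame as a data URL.
function openAiMessage(framesB64, transcript) {
  return {
    role: "user",
    content: [
      { type: "text", text: buildPrompt(transcript) },
      ...framesB64.map((b64) => ({
        type: "image_url",
        image_url: { url: "data:image/jpeg;base64," + b64 },
      })),
    ],
  };
}

// Two placeholder frames produce one text part plus two image parts.
const demo = openAiMessage(["AAAA", "BBBB"], "hello");
console.log(demo.content.length); // 3
```

Only the message wrapper changes between endpoints; the extracted frames and transcript stay the same, which is what makes swapping endpoints cheap.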

Animated GIF / APNG / WebP — peepshow's killer move on Qwen

Qwen3-VL accepts JPEG / PNG. Animated GIFs / APNGs / WebPs need flattening — peepshow does it automatically.

peepshow ./meme.gif         # animated GIF → frame timeline
peepshow ./tutorial.apng    # animated PNG → frames
peepshow ./loop.webp        # animated WebP → frames

Frame strategy presets

peepshow picks scene-change frames by default. For Qwen specifically, these presets are worth knowing:

--strategy scene --max 12 --resize 512	Qwen3-VL dense / 30B-A3B at small VRAM: keep frames lean.
--strategy scene --max 32 --resize 1024	Qwen3-VL 235B-A22B: exploit the 256K context with more scene coverage.
--strategy fps --fps 1 --max 30	Mirror Qwen3-VL's native video sampler for like-for-like comparison runs.
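If several Qwen3-VL variants are in rotation, the preset choice can be scripted. The sketch below maps a model name to the flags listed above; `presetFor` and `commandLine` are hypothetical helpers, and matching on "235b" in the model name is an assumption about how your checkpoints are named.

```javascript
// Flags copied from the presets above, stored as argument arrays.
const PRESETS = {
  small: ["--strategy", "scene", "--max", "12", "--resize", "512"],
  large: ["--strategy", "scene", "--max", "32", "--resize", "1024"],
  nativeCompare: ["--strategy", "fps", "--fps", "1", "--max", "30"],
};

// Pick a preset for a model name.
// Assumption: the large MoE checkpoint has "235b" somewhere in its name.
function presetFor(model, { compareWithNative = false } = {}) {
  if (compareWithNative) return PRESETS.nativeCompare;
  return /235b/i.test(model) ? PRESETS.large : PRESETS.small;
}

// Assemble the full command line for a given input file.
function commandLine(file, model, opts) {
  return ["peepshow", file, ...presetFor(model, opts)].join(" ");
}

console.log(commandLine("./demo.mp4", "qwen3-vl-235b-a22b"));
// peepshow ./demo.mp4 --strategy scene --max 32 --resize 1024
```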

All 95 sinks still fire

Same CLI = same sinks. Push frames to SQLite, embed captions into Chroma, mirror to S3, drop a thumbnail in Slack, file a GitHub issue with the offending frame attached: all from one Qwen run. Browse the full sink catalogue.

Report + LLM analysis loop

Every run also writes a self-contained report.html + manifest.json next to the frames (see the Report page). When Qwen consumes the frames, the analysis flows back into the report — whoever opens it next sees the model's understanding without re-running the prompt.

echo '{"summary":"<summary from Qwen>","provider":"qwen3-vl-235b-a22b"}' \
  | peepshow report annotate "<outputDir>"
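Hand-writing JSON inside shell quotes gets fragile once the model's summary contains quotes, apostrophes, or newlines. One way around that, sketched below, is to let `JSON.stringify` do the escaping and write the payload to stdout. The `summary` and `provider` keys are the ones used in the command above; the helper name and the placeholder summary are hypothetical.

```javascript
// Build the annotation payload with JSON.stringify so quotes, apostrophes,
// and newlines in the summary are escaped correctly.
function annotationPayload(summary, provider) {
  return JSON.stringify({ summary, provider });
}

// Placeholder summary; in practice this would be the text returned by Qwen.
const payload = annotationPayload(
  'Qwen\'s summary: a "demo" clip.',
  "qwen3-vl-235b-a22b"
);

// Write to stdout so the script can be piped into the annotate command.
process.stdout.write(payload + "\n");
```

Saved as, say, annotate.js, the script's output can be piped the same way as the echo above: node annotate.js | peepshow report annotate "<outputDir>".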

When to skip peepshow + use Qwen direct

Clip is under 30s and you're on Qwen3-VL 235B-A22B with the native video path.
Running Qwen3 text-only (no -VL suffix) — no vision capability.
Need streaming inference on a live video feed.

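Native video is excellent for those cases. peepshow earns its place when any of these apply: the clip is long, the format is animated, cost must be bounded, the run must be auditable, several models are being compared, the setup is local-first, or the results fan out to other systems.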

Qwen3-VL 235B-A22B — Qwen3-VL 235B reads short video natively. peepshow normalises long clips and bridges the smaller sizes — same artifact, every variant.
