
Reel #M-07 Alibaba Qwen3-VL

peepshow for models / qwen

Qwen3-VL 235B-A22B

Qwen3-VL 235B reads short video natively. peepshow normalises long clips and bridges the smaller sizes: same artifact, every variant.

Qwen3-VL ships as a 235B-A22B MoE and a 30B-A3B MoE, plus dense checkpoints. The 235B reads short video natively with a 256K context window. Long clips, smaller variants, and animated formats all still want a frame timeline. peepshow extracts that timeline so every Qwen3-VL size handles arbitrary video at a predictable cost.

Qwen already reads video. Why peepshow?

Qwen models accept video natively. peepshow is not a competing video-understanding stack; it's a control plane that sits in front of native video to handle the cases native upload doesn't.

  • Works on every Qwen3-VL size. Native video lands on Qwen3-VL 235B-A22B; the 30B-A3B and dense checkpoints want a frame sequence for anything but the shortest clips. peepshow gives every size the same input shape.
  • 256K context isn't free. Qwen3-VL's bigger window encourages longer uploads, but on native video token spend still scales linearly with clip length. peepshow trims to N frames so the budget stays flat.
  • Animated GIF / APNG / WebP. Qwen3-VL adapters don't decode animated formats end-to-end. peepshow normalises them to JPEG sequences.
  • Self-host friendly. Run Qwen3-VL under vLLM, SGLang, Ollama, or LM Studio. peepshow's pipeline is identical across all of them.
  • Same bundle for fine-tuned variants. Use peepshow to prep frame data for Qwen3-VL fine-tunes. No special export is needed; the JSON manifest is a stable contract.
  • DashScope + open weights. Alibaba-hosted Qwen3-VL and the open-weight checkpoints share the same image input shape — extract once, swap endpoints.

Token-cost math (worked examples)

| Clip | Native upload | peepshow + Qwen |
|---|---|---|
| 30s demo (Qwen3-VL 235B, native) | ~4.2K tokens | ~3K tokens (6 frames + transcript) |
| 10-min lecture (Qwen3-VL 235B, native; often refuses) | ~84K tokens | ~7K tokens (20 frames + transcript) |
| 10-min lecture (Qwen3-VL 30B-A3B, image mode) | n/a | ~7K tokens (20 frames + transcript) |
| 1-hour CCTV (any Qwen3-VL size) | n/a (no native support past the clip cap) | ~12K tokens (30 frames + sparse transcript) |

Qwen3-VL's native video path has clip-length and frame-rate ceilings and refuses many longer clips outright. peepshow works at any clip length because it sends a capped set of frames rather than the video itself.
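The native-upload column scales linearly with clip length: the 30s row implies about 4,200 / 30 = 140 tokens per second, which reproduces the ~84K figure for 10 minutes. A minimal sketch of that arithmetic, assuming the 140 tokens/s rate derived from the table holds for other clips (real rates depend on resolution and sampling fps) and taking "256K" as 256 × 1024 tokens:

```javascript
// Rough native-video token estimate, derived from the worked-examples table.
// Assumption: native spend is linear in clip length at the rate implied by
// the 30 s row (~4.2K tokens / 30 s). Real rates vary with resolution and fps.
const NATIVE_TOKENS_PER_SECOND = 4200 / 30; // 140

// Assumption: the "256K" context window is read as 256 * 1024 tokens.
const QWEN3_VL_CONTEXT_TOKENS = 256 * 1024;

function nativeVideoTokens(clipSeconds) {
  return NATIVE_TOKENS_PER_SECOND * clipSeconds;
}

function fitsInContext(clipSeconds) {
  return nativeVideoTokens(clipSeconds) <= QWEN3_VL_CONTEXT_TOKENS;
}

const clips = [["30s demo", 30], ["10-min lecture", 600], ["1-hour CCTV", 3600]];
for (const [label, seconds] of clips) {
  const verdict = fitsInContext(seconds) ? "fits in 256K" : "exceeds 256K";
  console.log(`${label}: ~${nativeVideoTokens(seconds)} native tokens, ${verdict}`);
}
```

At that rate a 1-hour clip would need roughly 504K tokens, about twice the window, whereas the peepshow column stays in the low thousands because the frame count is capped rather than tied to duration.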

Install (CLI)

npm install -g peepshow

# Self-host via Ollama:
ollama pull qwen3-vl
# OR DashScope:
export DASHSCOPE_API_KEY=...

peepshow ./demo.mp4 --emit json > run.json

Calling the Qwen API directly

Calling the Qwen API from your own code? Run peepshow first, then feed the JSON manifest in as multimodal parts:

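# Pipe frames + transcript to Qwen3-VL via Ollama (Node 18+ for global fetch).
# --input-type=module lets the eval string use import and top-level await.
node --input-type=module -e '
  import { readFileSync } from "node:fs";
  // run.json is the manifest written by: peepshow ./demo.mp4 --emit json
  const run = JSON.parse(readFileSync("run.json", "utf8"));
  const r = await fetch("http://127.0.0.1:11434/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "qwen3-vl",
      messages: [{
        role: "user",
        content: "Summarise this clip. Transcript: " + (run.transcript?.text ?? ""),
        // Ollama takes images as base64 strings on the message itself
        images: run.frames.map(f => readFileSync(f.path).toString("base64"))
      }],
      stream: false
    })
  });
  if (!r.ok) throw new Error("Ollama returned HTTP " + r.status);
  console.log((await r.json()).message.content);
'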
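The same manifest can also go to OpenAI-compatible servers instead of Ollama's native /api/chat: vLLM, SGLang, and LM Studio expose /v1/chat/completions, and DashScope offers an OpenAI-compatible mode. A minimal sketch that turns frames into chat-completions content parts, assuming the run.json fields used in the Ollama example (frames[].path, transcript.text) and that frames are JPEGs; the helper names are hypothetical, and base64 strings are passed in so the functions stay pure:

```javascript
// Convert a peepshow run into OpenAI-style chat-completions content parts.
// Assumptions: frames are JPEG (hence image/jpeg), and the caller has already
// read each frame file and base64-encoded it (e.g. readFileSync(...).toString("base64")).
function toChatContent(transcriptText, frameBase64List, prompt = "Summarise this clip.") {
  const parts = [{ type: "text", text: `${prompt} Transcript: ${transcriptText ?? ""}` }];
  for (const b64 of frameBase64List) {
    // Data URLs avoid hosting the frames anywhere reachable by the server.
    parts.push({ type: "image_url", image_url: { url: `data:image/jpeg;base64,${b64}` } });
  }
  return parts;
}

// Build the request body; only the model ID and base URL change between endpoints.
function chatCompletionsBody(model, content) {
  return { model, messages: [{ role: "user", content }], stream: false };
}
```

POST JSON.stringify(body) to `${baseUrl}/chat/completions` (adding an `Authorization: Bearer ${process.env.DASHSCOPE_API_KEY}` header for DashScope) and read the reply from `choices[0].message.content`. The model ID differs per endpoint, so check each provider's model list rather than reusing the Ollama tag.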

Animated GIF / APNG / WebP — peepshow's killer move on Qwen

Qwen3-VL takes still images such as JPEG and PNG. Animated GIF, APNG, and WebP files need flattening into individual frames first; peepshow does it automatically.

peepshow ./meme.gif         # animated GIF → frame timeline
peepshow ./tutorial.apng    # animated PNG → frames
peepshow ./loop.webp        # animated WebP → frames
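Whether a PNG or WebP actually needs flattening can be read from the container itself: an APNG carries an acTL chunk before its first IDAT, and an animated WebP carries an ANIM chunk in its RIFF container. A minimal sketch of that check, illustrative only and not peepshow's decoder; GIF is left out because it needs block-level parsing to count image descriptors:

```javascript
// Illustrative animation check for PNG and WebP buffers (not peepshow's code).
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// APNG: an `acTL` chunk must appear before the first `IDAT`.
function isAnimatedPng(buf) {
  if (buf.length < 8 || !buf.subarray(0, 8).equals(PNG_SIGNATURE)) return false;
  let offset = 8;
  while (offset + 8 <= buf.length) {
    const length = buf.readUInt32BE(offset);
    const type = buf.toString("latin1", offset + 4, offset + 8);
    if (type === "acTL") return true;
    if (type === "IDAT" || type === "IEND") return false;
    offset += 12 + length; // 4 length + 4 type + data + 4 CRC
  }
  return false;
}

// WebP: "RIFF" <size> "WEBP", then chunks; animated files carry an `ANIM` chunk.
function isAnimatedWebp(buf) {
  if (buf.length < 12) return false;
  if (buf.toString("latin1", 0, 4) !== "RIFF" || buf.toString("latin1", 8, 12) !== "WEBP") return false;
  let offset = 12;
  while (offset + 8 <= buf.length) {
    const fourcc = buf.toString("latin1", offset, offset + 4);
    const size = buf.readUInt32LE(offset + 4);
    if (fourcc === "ANIM") return true;
    offset += 8 + size + (size % 2); // chunk payloads are padded to an even length
  }
  return false;
}
```

Usage: `isAnimatedPng(readFileSync("./tutorial.apng"))` with `readFileSync` from node:fs. A static PNG or WebP can go to Qwen3-VL as-is; only the animated ones need a frame timeline.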

Frame strategy presets

peepshow picks scene-change frames by default. For Qwen specifically, these presets are worth knowing:

  • --strategy scene --max 12 --resize 512 (Qwen3-VL dense / 30B-A3B at small VRAM): keeps frames lean.
  • --strategy scene --max 32 --resize 1024 (Qwen3-VL 235B-A22B): exploits the 256K context with more scene coverage.
  • --strategy fps --fps 1 --max 30: mirrors Qwen3-VL's native video sampler for like-for-like comparison runs.
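The scene strategy keeps the frames where the picture changes most, up to the --max cap. A minimal sketch of that general idea, assuming per-frame change scores have already been computed (for example, mean absolute pixel difference from the previous frame); the function name is hypothetical, and peepshow's real detector and tie-breaking may differ:

```javascript
// Illustrative scene-change frame picker (not peepshow's actual detector).
// changeScores[i] = how much frame i differs from frame i - 1 (frame 0 has no predecessor).
// Returns up to `max` frame indices in chronological order: the opening frame
// plus the frames with the largest scene changes.
function pickSceneFrames(changeScores, max) {
  if (changeScores.length === 0 || max <= 0) return [];
  const topChanges = changeScores
    .map((score, index) => ({ score, index }))
    .slice(1) // frame 0 is always kept as the opening shot
    .sort((a, b) => b.score - a.score || a.index - b.index) // biggest change first, earliest on ties
    .slice(0, max - 1)
    .map((c) => c.index);
  return [0, ...topChanges].sort((a, b) => a - b);
}

console.log(pickSceneFrames([0, 5, 1, 9, 2, 8], 3));
```

Raising --max from 12 to 32 in this model just admits more of the lower-ranked changes, which is why the larger preset buys scene coverage rather than different frames.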

All 95 sinks still fire

Same CLI, same sinks. Push frames to SQLite, embed captions into Chroma, mirror to S3, drop a thumbnail in Slack, or file a GitHub issue with the offending frame attached, all from one Qwen run. Browse the full sink catalogue for the rest.

Report + LLM analysis loop

Every run also writes a self-contained report.html + manifest.json next to the frames (see the Report page). Once Qwen has analysed the frames, write its analysis back into the report with peepshow report annotate; whoever opens the report next sees the model's understanding without re-running the prompt.

# Keep apostrophes out of the single-quoted JSON, or the shell string ends early.
echo '{"summary":"<summary from Qwen>","provider":"qwen3-vl-235b-a22b"}' \
  | peepshow report annotate "<outputDir>"
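Model summaries routinely contain apostrophes and quotes, which break a hand-written echo. A minimal sketch that builds the annotation payload with JSON.stringify instead, assuming only the summary and provider fields shown in the echo example; the script name annotate.js and the helper name are hypothetical:

```javascript
// Build the annotation payload for `peepshow report annotate` with
// JSON.stringify, so quotes or apostrophes in the summary stay valid JSON.
// Assumption: only the `summary` and `provider` fields from the echo example are used.
function buildAnnotation(summary, provider = "qwen3-vl-235b-a22b") {
  return JSON.stringify({ summary, provider });
}

// Hypothetical CLI wrapper: prints the payload when a summary is passed as argv[2].
const summaryArg = process.argv[2];
if (summaryArg !== undefined) {
  process.stdout.write(buildAnnotation(summaryArg) + "\n");
}
```

Usage: `node annotate.js "$SUMMARY" | peepshow report annotate "<outputDir>"`, where `$SUMMARY` holds the model's reply. Double-quoting the shell variable passes the text through untouched.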

When to skip peepshow and use Qwen directly

  • Clip is under 30s and you're on Qwen3-VL 235B-A22B with the native video path.
  • Running Qwen3 text-only (no -VL suffix) — no vision capability.
  • Need streaming inference on a live video feed.

Native video is excellent for those cases. peepshow earns its place when the job is any of: long, in an animated format, cost-bounded, auditable, multi-model, local-first, or fanned out to other systems.
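The checklist above can be encoded as a small routing helper, for example to choose between native upload and a peepshow run in a batch script. A minimal sketch, assuming illustrative field names and reading "long" as 30 s or more (the cutoff in the first skip rule); it is not part of peepshow:

```javascript
// Hypothetical routing helper mirroring the skip list and the "earns its place" list.
// Field names are illustrative; "long" is assumed to mean 30 s or more.
function shouldUsePeepshow(job) {
  // Hard skips: a text-only Qwen3 model, or a live feed needing streaming inference.
  if (!job.vision || job.liveStream) return false;

  // Any one of these needs makes the frame pipeline worthwhile.
  const needs = [
    job.clipSeconds >= 30,
    job.animatedFormat,
    job.costBounded,
    job.auditable,
    job.multiModel,
    job.localFirst,
    job.fanOut,
  ];
  if (needs.some(Boolean)) return true;

  // Short clip with no other need: the skip list only names 235B-A22B's native path;
  // other sizes get the frame sequence.
  return job.variant !== "235B-A22B";
}
```

Hard constraints are checked first so that a text-only model or a live feed never routes to peepshow, however many of the other needs apply.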