Gemini already reads video. Why peepshow?
Gemini accepts video natively. peepshow is not a competing video understanding stack — it's a control plane that sits in front of native video to handle the cases native upload doesn't.
- Token-cost ceiling on long footage. Native video bills ~258 tokens / second. A one-hour clip ≈ 930K input tokens. peepshow extracts 30 scene-change frames + transcript, model spends a fraction.
- Animated GIF / APNG / animated WebP. The Gemini File API rejects most animated images as video. peepshow normalises them to a flat frame sequence.
- Audio split-out. `whisper.cpp` on PATH → plain transcript text. Cheaper, often more accurate on dialogue than raw video upload.
- Auditable frames. See exactly which stills the model saw. Cache, replay, diff. Native sampling is opaque.
- Local-first. No File API upload, no quota, no PII leaving the machine.
- Sinks. Gemini won't push frames to Notion / Slack / SQL / S3. peepshow does — same artifact, 95 destinations.
- Cross-model portability. Same frame bundle feeds Gemini + Claude + GPT + local Ollama. No re-upload.
Token-cost math (worked examples)
| Clip | Native upload | peepshow + Gemini |
|---|---|---|
| 30s product demo | ~7.7K tokens | ~6K (6 frames + 200-word transcript) |
| 10-minute lecture | ~155K tokens | ~14K (20 scene frames + transcript) |
| 1-hour CCTV reel | ~930K tokens | ~22K (30 motion frames + sparse transcript) |
| 3-hour conference recording | ~2.8M tokens (over 1M context limit) | ~45K (60 scene frames + transcript chapters) |
Numbers are approximate. The native column assumes Gemini's published default video rate (~258 tok/s at standard quality). On the peepshow side, frame size depends on resolution + JPEG quality, and transcript length depends on speech density.
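The native-upload column can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the ~258 tok/s rate and the 1M-token context limit quoted in this section. The function names `nativeVideoTokens` and `compare` are hypothetical helpers, not part of peepshow, and the peepshow-side figure is an input you supply, because it varies with frame size and transcript length.

```javascript
// Assumption: ~258 tokens per second of native video, the rate quoted in this section.
const NATIVE_TOKENS_PER_SECOND = 258;
// Assumption: a 1M-token context window, as in the 3-hour row of the table.
const CONTEXT_LIMIT_TOKENS = 1_000_000;

// Estimated input tokens for uploading a clip of the given length natively.
function nativeVideoTokens(durationSeconds) {
  return Math.round(durationSeconds * NATIVE_TOKENS_PER_SECOND);
}

// Compare native upload against a peepshow estimate that the caller provides.
function compare(durationSeconds, peepshowTokens) {
  const native = nativeVideoTokens(durationSeconds);
  return {
    native,
    fitsInContext: native <= CONTEXT_LIMIT_TOKENS,
    savingsRatio: native / peepshowTokens,
  };
}

// 1-hour reel from the table, with the table's ~22K peepshow estimate.
console.log(compare(3600, 22_000));
```

Passing `10800` (three hours) yields a native estimate above the context limit, which matches the last row of the table.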
Install (Gemini CLI / agent)
npm install -g peepshow
git clone --depth 1 https://github.com/t0mtaylor/peepshow.git
cd peepshow
gemini --extension . # registers gemini-extension.json + GEMINI.md
Full agent reference: peepshow for Gemini. The CLI itself works in any shell; the agent integration is one of many entry points.
Install (Gemini API directly, no CLI)
Calling the Gemini API from your own code? Run peepshow first, then feed the JSON manifest in as multimodal parts:
# 1. Extract
peepshow ./demo.mp4 --emit json > run.json
# 2. Hand the frames + transcript to Gemini
node --input-type=module -e '
import { GoogleGenAI } from "@google/genai";
import { readFileSync } from "node:fs";
const run = JSON.parse(readFileSync("run.json", "utf8"));
const parts = [
{ text: "Summarise this clip." },
...run.frames.map(f => ({
inlineData: { mimeType: "image/jpeg", data: readFileSync(f.path).toString("base64") }
})),
{ text: "Transcript:\n" + (run.transcript?.text ?? "") }
];
const ai = new GoogleGenAI({});
const r = await ai.models.generateContent({ model: "gemini-2.5-flash", contents: parts });
console.log(r.text);
'
Animated GIF / APNG / WebP: peepshow's killer move on Gemini
The Gemini File API expects video containers. Animated PNGs, animated WebPs, and most GIFs come back as still images — you lose the motion. peepshow probes the source, treats it as a frame sequence, and emits the timeline as ordinary JPEGs Gemini will gladly read.
peepshow ./meme.gif # animated GIF → frame timeline
peepshow ./tutorial.apng # animated PNG → frames
peepshow ./loop.webp # animated WebP → frames
Frame strategy presets
peepshow picks scene-change frames by default. For Gemini specifically, these presets are worth knowing:
--strategy scene --max 20
Default. Best for narrative clips where information density varies: talks, demos, recordings.
--strategy fps --fps 1 --max 60
Mirror Gemini's internal 1fps sampler for like-for-like comparison runs or for footage with steady motion.
--strategy scene --max 12 --dedup perceptual
Aggressive trim for CCTV / long-form static footage. Drops near-duplicates before the model sees them.
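To make "drops near-duplicates" concrete, here is a generic sketch of perceptual deduplication using an average hash and a Hamming-distance threshold. It is an illustration of the general technique only, not peepshow's actual implementation. The function names, the 64-value grayscale `pixels` arrays, and the threshold of 5 are all assumptions made for the example.

```javascript
// Average hash: one bit per pixel, set when the pixel is brighter than the mean.
function averageHash(pixels) {
  const mean = pixels.reduce((sum, p) => sum + p, 0) / pixels.length;
  return pixels.map((p) => (p > mean ? 1 : 0));
}

// Number of bit positions where two equal-length hashes differ.
function hammingDistance(a, b) {
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) distance++;
  }
  return distance;
}

// Keep a frame only if its hash differs enough from the last kept frame.
// maxDistance is a hypothetical threshold chosen for this example.
function dedupFrames(frames, maxDistance = 5) {
  const kept = [];
  let lastHash = null;
  for (const frame of frames) {
    const hash = averageHash(frame.pixels);
    if (lastHash === null || hammingDistance(hash, lastHash) > maxDistance) {
      kept.push(frame);
      lastHash = hash;
    }
  }
  return kept;
}

// Synthetic 8x8 grayscale thumbnails standing in for downscaled frames.
const leftBright = Array.from({ length: 64 }, (_, i) => (i % 8 < 4 ? 200 : 20));
const leftBrightNoisy = leftBright.map((p) => p + 5); // same scene, slight exposure shift
const topBright = Array.from({ length: 64 }, (_, i) => (i < 32 ? 200 : 20));

const frames = [
  { path: "frame-001.jpg", pixels: leftBright },
  { path: "frame-002.jpg", pixels: leftBrightNoisy },
  { path: "frame-003.jpg", pixels: topBright },
];

const kept = dedupFrames(frames);
console.log(kept.map((f) => f.path)); // [ 'frame-001.jpg', 'frame-003.jpg' ]
```

Comparing each frame against the last kept frame, rather than against its immediate predecessor, stops slow drift in static footage from letting a long run of near-identical frames through.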
All 95 sinks still fire
Same CLI = same sinks. Push frames to SQLite, embed captions into Chroma, mirror to S3, drop a thumbnail in Slack, file a GitHub issue with the offending frame attached — all from one Gemini run. Browse the full sink catalogue →.
Report + LLM analysis loop
Every run also writes a self-contained report.html + manifest.json next to the frames (see the Report page). When Gemini consumes the frames, the analysis flows back into the report — whoever opens it next sees the model's understanding without re-running the prompt.
echo '{"summary":"<Gemini summary>","provider":"gemini-2.5-flash"}' \
| peepshow report annotate "<outputDir>"
When to skip peepshow + use Gemini direct
- Clip is under 60 seconds and you need audio + motion fidelity (lip-sync, sport, music).
- One-shot prototype where token cost doesn't matter.
- No need to persist frames anywhere.
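Native video is excellent for those cases. peepshow earns its place when any of these apply: the footage is long, the source is an animated image format, the token budget is capped, or you need auditable frames, multi-model reuse, local-first processing, or fan-out to other systems.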