Where peepshow fits
Six deployment targets · CLI first · zero glue
peepshow is CLI-first: anything that can spawn a child process can use it. The intended use cases break down into six shapes, from a drag-and-drop Claude Code plugin to a server-side multi-LLM pre-processor.
Claude Code plugin
Drag a video into the prompt; the UserPromptSubmit hook auto-invokes /peepshow:slides. Native skills, statusline badge, all built-in sinks.
Cursor / Windsurf / Cline / Codex / Gemini
Native rules files in .cursor/rules/, .windsurf/rules/, .clinerules/, .codex/hooks.json, gemini-extension.json. Each agent picks peepshow up automatically once it's on PATH.
Generic CLI / Aider / Continue / `llm`
Pipe video bytes on stdin, JSON on stdout, fan-out to sinks. Works inside any shell pipeline, CI job, cron task, Makefile target. Snippets for aider, Continue, Cody, Zed AI, Copilot CLI in docs/INTEGRATIONS.md.
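Because the contract is bytes in and JSON out, a host program only needs one child-process call. The sketch below is a minimal TypeScript illustration, not peepshow's own API: `runJsonTool` is a hypothetical helper, and the arguments that make peepshow read from stdin are left to the caller because they are not spelled out here (docs/INTEGRATIONS.md is the place to check).

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical helper: run any CLI that reads bytes on stdin and prints one
// JSON document on stdout, then return the parsed result.
function runJsonTool(command: string, args: string[], input: Uint8Array): unknown {
  const result = spawnSync(command, args, {
    input,
    encoding: "utf8",
    maxBuffer: 64 * 1024 * 1024, // a manifest may exceed the 1 MiB default
  });
  if (result.error) throw result.error; // e.g. the command is not on PATH
  if (result.status !== 0) {
    throw new Error(`${command} exited with ${result.status}: ${result.stderr}`);
  }
  return JSON.parse(result.stdout);
}
```

A caller would pass `"peepshow"` as the command along with whatever flags its documentation specifies, and treat the returned value as untyped until it has been validated.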
Electron desktop AI client
Drop-target a video onto a BrowserWindow, pre-process locally in the main process, only forward distilled JSON to the cloud LLM. Frames + transcripts stay on disk.
Server-side AI portal
Node service ingests user uploads, runs peepshow once, fans the JSON manifest out to multiple LLMs (Claude, GPT, Gemini, local). Cuts upload bandwidth + per-token cost.
peepshow serve
Local HTTP server browses run history, streams frames + audio, manages auto-sinks. Loopback by default; non-loopback bind requires a token.
Single-user, zero deps, node:http.
Who it's for
LLMs can read images. Your footage is a sequence. Whatever's in the video — a bug, a break-in, a lecture, an exploit — peepshow turns it into still frames an LLM already knows how to reason about. Five audiences shape the defaults:
QA + dev video repros
Screen-recording a flicker, a designer sending a 12-second Loom, a user uploading the .mov with the frame that breaks everything. peepshow turns the clip into scene-aware stills so the model sees the bug frame-by-frame.
peepshow ./bug-repro.mov --strategy scene --max 12
CCTV & surveillance
Hours of footage → minutes of signal
A night of camera footage is hours of nothing followed by twelve seconds that matter. Scene detect flags motion; perceptual-hash dedup drops the static near-duplicates; the SQLite sink archives the timeline.
Motion-only keyframes · searchable archive · LLM Q&A.
Researchers & students
Lectures, fieldwork, microscopy
Lecture captures, fieldwork timelapses, documentary clips, microscopy. The LLM can read a slide, a phase change, a titration colour shift — peepshow picks frames that change so notes are reasoned about, not transcribed.
Slide-by-slide · Obsidian sink · markdown emit.
Security research
CVE repros + exploit PoCs
Evidence is only useful if reviewable. peepshow extracts the frames where state changes so an analyst, a report, or an LLM can cite them directly — frame-accurate, no re-watching at 1×.
XML emit · GitHub Issues sink · severity tags.
Accessibility
Screen-reader friendly video
Video content is a wall to users on screen readers. peepshow converts the visual track into per-scene stills so an LLM can describe each moment — alt text that reflects the whole story, not a single thumbnail.
Scene alt text · webhook fan-out · deterministic.
Common patterns
- Pre-process locally, send less to the cloud. The Electron and server-side patterns both extract on-machine first, so only the compact JSON manifest (frames + transcript + tags) crosses the wire — never the video bytes.
- One extraction, many sinks. Every sink reads the same JSON contract from stdin. Fan a single run out to SQL + vector DB + chat + observability without re-extracting.
- Headless service mode. Pass `--no-index --no-report` when running inside a stateless service so peepshow doesn't write to `~/.peepshow/`.
- Token budget control. Pair with caveman via `--emit caveman` for ultra-compressed LLM payloads.
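The "one extraction, many sinks" pattern can be sketched as follows. This is an illustration under stated assumptions rather than peepshow's implementation: the `Manifest` fields are a guess based on the "frames + transcript + tags" description above, and `fanOut` and `Sink` are hypothetical names. Real sinks read the JSON from stdin; here each sink is modelled as a function that receives the same JSON text.

```typescript
// Hypothetical manifest shape; the real JSON contract may differ.
interface Manifest {
  frames: string[]; // paths of extracted stills
  transcript: string;
  tags: string[];
}

// A sink receives the serialised manifest, as it would on stdin.
type Sink = (manifestJson: string) => void;

// Serialise once, then hand the identical JSON text to every sink.
// Returns the names of sinks that threw, so one failure does not block the rest.
function fanOut(manifest: Manifest, sinks: { [name: string]: Sink }): string[] {
  const json = JSON.stringify(manifest);
  const failed: string[] = [];
  for (const name of Object.keys(sinks)) {
    try {
      sinks[name](json);
    } catch {
      failed.push(name);
    }
  }
  return failed;
}
```

Serialising once is the point of the design: every consumer sees byte-identical input, and adding a destination never triggers a second extraction.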
peepshow + Gemini (and other native-video models)
Gemini 2.x reads video natively via the File API. So does GPT-4o on short clips, and so will most frontier models. peepshow doesn't compete with that — it sits in front of it. Native video is great for short clips where token budget is irrelevant; peepshow is the control plane for everything else.
- Token-cost ceiling. Native video bills ~258 tokens per second of footage. A one-hour clip ≈ 930K tokens. peepshow trims it to 30 scene-aware frames + a transcript — predictable budget, same answer for most questions.
- Scene-change frames beat 1fps sampling. ffmpeg picks the moments where something actually changed. Higher signal per token than the model's internal uniform sampler.
- Animated GIF / APNG / animated WebP. The Gemini File API rejects most of these as video. peepshow normalises them to a flat frame sequence the model will accept.
- Audio split out, transcribed locally. `whisper.cpp` on PATH → plain transcript text. Frames + transcript reach the model as two cheap inputs instead of one expensive video upload, and are often more accurate on dialogue too.
- Determinism + audit. You can see exactly which frames the model saw, cache them, replay them, diff them. Native video sampling is opaque.
- Local-first. No File API upload, no quota, no PII leaving the box. Frames stay under `~/.peepshow/` unless you opt in to a remote sink.
- Sinks fan-out. Gemini won't push frames to Notion, Slack, SQL, S3. peepshow does: the same extracted artifact powers every downstream pipeline.
- Cross-agent portability. One frame bundle feeds Gemini + Claude + GPT + local models. No re-upload, no vendor lock, byte-identical inputs across runs.
- Pre-filter long footage. Hour-long surveillance, all-day timelapse, multi-hour lecture — scene-detect + perceptual-hash dedup collapse it to the frames that matter before the model ever sees them.
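To make the perceptual-hash dedup step concrete, here is a minimal sketch of the general technique: compare each frame's hash with the last kept frame and drop it when the two differ by only a few bits. The hash algorithm and threshold peepshow actually uses are not stated here, so the hex-string hash format, the `Frame` shape, and the function names are all assumptions.

```typescript
// Number of set bits for each hex digit 0..f.
const BITS_PER_NIBBLE = [0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4];

// Hamming distance between two equal-length hex-encoded hashes.
function hammingDistance(a: string, b: string): number {
  if (a.length !== b.length) throw new Error("hash lengths differ");
  let distance = 0;
  for (let i = 0; i < a.length; i++) {
    const diff = parseInt(a.charAt(i), 16) ^ parseInt(b.charAt(i), 16);
    distance += BITS_PER_NIBBLE[diff];
  }
  return distance;
}

// Hypothetical frame record: a timestamp plus a hex-encoded perceptual hash.
interface Frame {
  timestamp: number;
  hash: string;
}

// Keep a frame only if it differs from the last kept frame by more than
// `threshold` bits, so a run of near-identical frames collapses to its first member.
function dropNearDuplicates(frames: Frame[], threshold: number): Frame[] {
  const kept: Frame[] = [];
  for (const frame of frames) {
    const last = kept[kept.length - 1];
    if (last === undefined || hammingDistance(last.hash, frame.hash) > threshold) {
      kept.push(frame);
    }
  }
  return kept;
}
```

Comparing against the last kept frame rather than the immediately preceding one keeps slow drift from slipping through one small step at a time.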
The native multimodal API and peepshow are complementary: use Gemini direct for a 30-second clip in a single turn; use peepshow when the video is long, the same artifact needs to reach more than one model, or the frames need to live somewhere durable.
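The token-cost figures above can be checked with a few lines of arithmetic. The per-second rate is the one quoted in this section; the per-frame cost is an assumption made only for illustration (one still treated as costing about the same as one second of native video), and transcript tokens are left out.

```typescript
const NATIVE_TOKENS_PER_SECOND = 258; // rate quoted above for native video

// Assumption for illustration only: one still frame costs about as much as
// one second of native video. Transcript tokens are not counted.
const ASSUMED_TOKENS_PER_FRAME = 258;

function nativeVideoTokens(durationSeconds: number): number {
  return durationSeconds * NATIVE_TOKENS_PER_SECOND;
}

function frameBundleTokens(frameCount: number): number {
  return frameCount * ASSUMED_TOKENS_PER_FRAME;
}

const oneHour = nativeVideoTokens(60 * 60); // 928800, the ~930K quoted above
const thirtyFrames = frameBundleTokens(30); // 7740 under the per-frame assumption
```

Under that assumption the frame bundle is roughly two orders of magnitude smaller than the native upload, and the gap widens linearly with clip length because the frame count is capped while the native cost is not.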
Use it from Gemini CLI (3 steps)
peepshow already ships a Gemini CLI extension — gemini-extension.json + GEMINI.md at the repo root. No skill file needed; Gemini picks it up as a custom tool.
- Install peepshow.
  npm install -g peepshow
- Register the extension with Gemini CLI.
  git clone --depth 1 https://github.com/t0mtaylor/peepshow.git
  cd peepshow
  gemini --extension .  # or copy gemini-extension.json + GEMINI.md into your global Gemini extensions dir
- Ask Gemini about a video.
  user: summarise demo.mp4
  gemini: (invokes peepshow tool → reads scene frames + transcript → answers)
The same flow works for long lectures, surveillance clips, and animated GIFs: Gemini gets a frame timeline instead of paying per-second video tokens. Every sink you have configured (out of the 71 available) fires automatically, so the run also lands wherever you've pointed it (Notion, Slack, SQLite, S3, …). Full reference: peepshow for Gemini CLI.
Privacy & telemetry
The CLI sends an anonymous run beacon by default (version + OS family + outcome — no paths, no payload). Opt out with peepshow config set telemetry off, PEEPSHOW_TELEMETRY=0, or DO_NOT_TRACK=1. Full details in the privacy policy and docs/PRIVACY.md.
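A wrapper or CI script that wants to confirm an opt-out is in effect could mirror the three switches like this. It is a hypothetical check, not peepshow's own logic: `telemetryOptedOut` is an invented name, `configTelemetry` stands in for the value stored by `peepshow config set telemetry off`, and the sketch assumes that any single opt-out is sufficient, as the text above implies.

```typescript
// Hypothetical check mirroring the three documented opt-outs.
// Any one of them is treated as sufficient to disable the beacon.
function telemetryOptedOut(
  env: { [name: string]: string | undefined },
  configTelemetry?: string
): boolean {
  return (
    configTelemetry === "off" ||
    env["PEEPSHOW_TELEMETRY"] === "0" ||
    env["DO_NOT_TRACK"] === "1"
  );
}
```

In a Node script this would be called as `telemetryOptedOut(process.env)`, optionally passing the configured value as the second argument.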