Where peepshow fits

Six deployment targets · CLI first · zero glue

peepshow is CLI-first: anything that can spawn a child process can use it. Its intended uses break down into six shapes, from a drag-and-drop Claude Code plugin to a server-side multi-LLM pre-processor.

A — agents

Claude Code plugin

Drag a video into the prompt; the UserPromptSubmit hook auto-invokes /peepshow:slides. Native skills, statusline badge, all built-in sinks.
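Claude Code runs a UserPromptSubmit hook with a JSON payload on stdin that includes the submitted `prompt`. On exit code 0, whatever the hook prints to stdout is added to the conversation as context. The sketch below covers only the detection half and rests on simple assumptions. A video is recognised by file extension alone, and `findVideoPaths`, `hookContext`, and the extension list are hypothetical names, not peepshow's shipped hook:

```typescript
// Hypothetical sketch: crude extension-based detection, not peepshow's real matcher.
const VIDEO_EXTENSIONS = ["mp4", "mov", "webm", "mkv", "avi", "gif", "apng", "webp"];

function findVideoPaths(prompt: string): string[] {
  const pattern = new RegExp(
    "(?:^|\\s)" + // start of prompt, or whitespace before the path
      "((?:[~./\\w-]+/)?" + // optional directory part, e.g. ./ or ~/clips/
      "[\\w.-]+\\.(?:" + VIDEO_EXTENSIONS.join("|") + "))" + // file name + video extension
      "(?=[\\s,;:!?]|$)", // followed by whitespace, light punctuation, or end of prompt
    "gi",
  );
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(prompt)) !== null) {
    found.push(match[1]);
  }
  return found;
}

// The two UserPromptSubmit payload fields this sketch reads.
interface UserPromptSubmitInput {
  hook_event_name: string;
  prompt: string;
}

// Text a hook would print to stdout; Claude Code adds it to the context on exit code 0.
function hookContext(input: UserPromptSubmitInput): string {
  if (input.hook_event_name !== "UserPromptSubmit") return "";
  const videos = findVideoPaths(input.prompt);
  if (videos.length === 0) return "";
  return `Video(s) detected: ${videos.join(", ")}. Consider running /peepshow:slides.`;
}
```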

The reference integration.

B — agents

Cursor / Windsurf / Cline / Codex / Gemini

Native rules files in .cursor/rules/, .windsurf/rules/, .clinerules/, .codex/hooks.json, gemini-extension.json. Each agent picks peepshow up automatically once it's on PATH.
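"Once it's on PATH" means the agent can resolve the bare command `peepshow` the same way a shell does. Here is a minimal, POSIX-oriented sketch of that lookup. `findOnPath` is a hypothetical helper; on Windows you would also have to try each `PATHEXT` suffix:

```typescript
import { accessSync, constants, statSync } from "node:fs";
import { delimiter, join } from "node:path";

// Returns the first executable regular file named `command` on PATH, or null.
function findOnPath(command: string, pathEnv: string = process.env.PATH ?? ""): string | null {
  for (const dir of pathEnv.split(delimiter)) {
    if (dir === "") continue; // skip empty PATH entries
    const candidate = join(dir, command);
    try {
      if (statSync(candidate).isFile()) {
        accessSync(candidate, constants.X_OK); // throws if the file is not executable
        return candidate;
      }
    } catch {
      // missing or not executable in this directory; try the next one
    }
  }
  return null;
}
```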

Eight agent manifests ship in-tree.

C — CLI

Generic CLI / Aider / Continue / `llm`

Pipe video bytes on stdin, JSON on stdout, fan-out to sinks. Works inside any shell pipeline, CI job, cron task, Makefile target. Snippets for aider, Continue, Cody, Zed AI, Copilot CLI in docs/INTEGRATIONS.md.
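The contract is bytes in and JSON out, so any runtime that can spawn a process can drive it. Here is a hedged TypeScript sketch using Node's `spawnSync`. `runJsonTool` is a hypothetical helper, and it assumes the tool prints a single JSON document on stdout and exits non-zero on failure:

```typescript
import { spawnSync } from "node:child_process";

// Runs a CLI that reads bytes on stdin and prints one JSON document on stdout.
function runJsonTool(command: string, args: string[], input: Buffer): unknown {
  const result = spawnSync(command, args, {
    input, // written to the child's stdin, then stdin is closed
    maxBuffer: 64 * 1024 * 1024, // allow large manifests on stdout
  });
  if (result.error) throw result.error; // e.g. ENOENT when the command is not on PATH
  if (result.status !== 0) {
    throw new Error(`${command} exited ${result.status}: ${result.stderr.toString()}`);
  }
  return JSON.parse(result.stdout.toString("utf8"));
}
```

A call might look like `runJsonTool("peepshow", args, readFileSync("bug-repro.mov"))`. Here `args` stands for whichever flags make peepshow read from stdin, which are not shown. See docs/INTEGRATIONS.md for the exact invocation.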

Zero glue. Zero install ceremony.

D — desktop

Electron desktop AI client

Drop a video onto a BrowserWindow, pre-process it locally in the main process, and forward only the distilled JSON to the cloud LLM. Frames + transcripts stay on disk.

Local-first context, cloud-first reasoning.

E — server

Server-side AI portal

A Node service ingests user uploads, runs peepshow once, and fans the JSON manifest out to multiple LLMs (Claude, GPT, Gemini, local). This cuts upload bandwidth and per-token cost.
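"One extract, many models" amounts to a concurrent fan-out where each model's errors stay isolated. The sketch below assumes each vendor SDK is wrapped behind a hypothetical `ModelClient` interface; no real SDK calls are shown:

```typescript
// Hypothetical adapter: wrap each vendor SDK (Claude, GPT, Gemini, local) behind this.
interface ModelClient {
  name: string;
  ask(manifestJson: string, question: string): Promise<string>;
}

interface FanOutResult {
  model: string;
  ok: boolean;
  answer?: string;
  error?: string;
}

// Serialises the manifest once, then queries every model concurrently.
// A failing model is recorded in its own result instead of rejecting the whole batch.
async function fanOut(manifest: unknown, question: string, clients: ModelClient[]): Promise<FanOutResult[]> {
  const payload = JSON.stringify(manifest);
  return Promise.all(
    clients.map(async (client): Promise<FanOutResult> => {
      try {
        return { model: client.name, ok: true, answer: await client.ask(payload, question) };
      } catch (err) {
        return { model: client.name, ok: false, error: err instanceof Error ? err.message : String(err) };
      }
    }),
  );
}
```

Serialising once gives every model byte-identical input. Collecting failures per model means one quota error does not sink the whole request.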

One extract, many models.

F — dashboard

`peepshow serve`

A local HTTP server for browsing run history, streaming frames + audio, and managing auto-sinks. It binds to loopback by default; binding to a non-loopback address requires a token.
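Here is a minimal sketch of that bind policy using only Node built-ins. `isLoopback`, `checkBind`, and `tokenMatches` are hypothetical names, and this is not peepshow's server code:

```typescript
import { isIP } from "node:net";
import { timingSafeEqual } from "node:crypto";

// True for hosts that only accept connections from this machine.
// Simplified: expanded or IPv4-mapped IPv6 forms are not normalised.
function isLoopback(host: string): boolean {
  if (host === "localhost") return true;
  if (isIP(host) === 4) return host.startsWith("127.");
  if (isIP(host) === 6) return host === "::1";
  return false;
}

// Refuses a non-loopback bind unless an access token is configured.
function checkBind(host: string, token: string | undefined): void {
  if (!isLoopback(host) && !token) {
    throw new Error(`refusing to bind ${host} without a token`);
  }
}

// Constant-time comparison of a presented token against the configured one.
function tokenMatches(presented: string, expected: string): boolean {
  const a = Buffer.from(presented);
  const b = Buffer.from(expected);
  return a.length === b.length && timingSafeEqual(a, b);
}
```

A server would call `checkBind(host, token)` before `server.listen(port, host)`. When bound publicly, it would also check every request with `tokenMatches`.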

Single-user, zero deps, node:http.

Who it's for

LLMs can read images. Your footage is a sequence. Whatever's in the video — a bug, a break-in, a lecture, an exploit — peepshow turns it into still frames an LLM already knows how to reason about. Five audiences shape the defaults:

Developers

QA + dev video repros

A screen recording of a flicker, a 12-second Loom from a designer, a user's .mov containing the one frame that breaks everything: peepshow turns the clip into scene-aware stills so the model sees the bug frame by frame.

```
peepshow ./bug-repro.mov --strategy scene --max 12
```

CCTV & surveillance

Hours of footage → minutes of signal

Overnight camera footage is hours of nothing followed by twelve seconds that matter. Scene detection flags motion, perceptual-hash dedup drops the static near-duplicates, and the SQLite sink archives the timeline.
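Perceptual-hash dedup can be illustrated with a difference hash (dHash). Shrink each frame to a 9×8 grayscale thumbnail and record, for each pixel, whether it is darker than its right-hand neighbour. Frames whose 64-bit hashes differ in only a few bits count as near-duplicates. This is a generic technique chosen for illustration. peepshow's actual hash and the `threshold` of 5 bits are assumptions, and the downscaling step is not shown:

```typescript
// Assumes each frame is already downscaled to a 9-column x 8-row grayscale thumbnail.
const W = 9;
const H = 8;

// 64-bit dHash: bit is 1 when a pixel is darker than the pixel to its right.
function dHash(gray: ArrayLike<number>): number[] {
  if (gray.length !== W * H) throw new Error(`expected ${W * H} pixels, got ${gray.length}`);
  const bits: number[] = [];
  for (let y = 0; y < H; y++) {
    for (let x = 0; x < W - 1; x++) {
      bits.push(gray[y * W + x] < gray[y * W + x + 1] ? 1 : 0);
    }
  }
  return bits;
}

// Number of differing bits between two hashes.
function hamming(a: number[], b: number[]): number {
  let d = 0;
  for (let i = 0; i < a.length; i++) if (a[i] !== b[i]) d++;
  return d;
}

// Returns indices of frames that differ from the last kept frame by more than `threshold` bits.
function dedupFrames(frames: ArrayLike<number>[], threshold = 5): number[] {
  const kept: number[] = [];
  let last: number[] | null = null;
  for (let i = 0; i < frames.length; i++) {
    const h = dHash(frames[i]);
    if (last === null || hamming(h, last) > threshold) {
      kept.push(i);
      last = h;
    }
  }
  return kept;
}
```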

Motion-only keyframes · searchable archive · LLM Q&A.

Researchers & students

Lectures, fieldwork, microscopy

Lecture captures, fieldwork timelapses, documentary clips, microscopy. An LLM can read a slide, a phase change, or a titration colour shift; peepshow picks the frames where something changes, so the model reasons about the material instead of merely transcribing it.

Slide-by-slide · Obsidian sink · markdown emit.

Security research

CVE repros + exploit PoCs

Evidence is only useful if reviewable. peepshow extracts the frames where state changes so an analyst, a report, or an LLM can cite them directly — frame-accurate, no re-watching at 1×.

XML emit · GitHub Issues sink · severity tags.

Accessibility

Screen-reader friendly video

Video content is a wall to users on screen readers. peepshow converts the visual track into per-scene stills so an LLM can describe each moment — alt text that reflects the whole story, not a single thumbnail.

Scene alt text · webhook fan-out · deterministic.

Common patterns

Pre-process locally, send less to the cloud. The Electron and server-side patterns both extract on-machine first, so only the compact JSON manifest (frames + transcript + tags) crosses the wire — never the video bytes.
One extraction, many sinks. Every sink reads the same JSON contract from stdin. Fan a single run out to SQL + vector DB + chat + observability without re-extracting.
Headless service mode. Pass --no-index --no-report when running inside a stateless service so peepshow doesn't write to ~/.peepshow/.
Token budget control. Pair with caveman via --emit caveman for ultra-compressed LLM payloads.
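In this model, a sink is just a program that reads the manifest from stdin. Below is a TypeScript sketch of the input side. The `Manifest` shape is hypothetical and the real JSON contract may use different field names, so treat `parseManifest` as a template rather than a schema:

```typescript
// Hypothetical manifest shape; check peepshow's docs for the real JSON contract.
interface Frame {
  path: string;
  timestamp: number;
}
interface Manifest {
  frames: Frame[];
  transcript?: string;
}

// Minimal structural check so a sink fails loudly on malformed input.
function parseManifest(text: string): Manifest {
  const data = JSON.parse(text) as Partial<Manifest>;
  if (!Array.isArray(data.frames)) throw new Error("manifest.frames must be an array");
  for (const f of data.frames) {
    if (f === null || typeof f !== "object" || typeof f.path !== "string" || typeof f.timestamp !== "number") {
      throw new Error("each frame needs a string path and a numeric timestamp");
    }
  }
  return data as Manifest;
}

// Collects all of stdin as UTF-8 text.
function readStdin(): Promise<string> {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];
    process.stdin.on("data", (c: Buffer) => chunks.push(c));
    process.stdin.on("end", () => resolve(Buffer.concat(chunks).toString("utf8")));
    process.stdin.on("error", reject);
  });
}
```

Wired up, a sink's entry point would be `readStdin().then(parseManifest).then(writeSomewhere)`. Here `writeSomewhere` is a placeholder for your SQL, vector-DB, or chat client.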

peepshow + Gemini (and other native-video models)

Gemini 2.x reads video natively via the File API. So does GPT-4o on short clips, and more frontier models are likely to follow. peepshow doesn't compete with that; it sits in front of it. Native video is great for short clips where token budget is irrelevant; peepshow is the control plane for everything else.

Token-cost ceiling. Native video bills ~258 tokens per second of footage, so a one-hour clip (3,600 s) costs ≈ 930K tokens. peepshow trims it to 30 scene-aware frames + a transcript: a predictable budget, and the same answer for most questions.
Scene-change frames beat 1fps sampling. ffmpeg picks the moments where something actually changed. Higher signal per token than the model's internal uniform sampler.
Animated GIF / APNG / animated WebP. The Gemini File API rejects most of these as video. peepshow normalises them to a flat frame sequence the model will accept.
Audio split out, transcribed locally. whisper.cpp on PATH → plain transcript text. Frames + transcript reach the model as two cheap inputs instead of one expensive video upload — often more accurate on dialogue too.
Determinism + audit. You can see exactly which frames the model saw, cache them, replay them, diff them. Native video sampling is opaque.
Local-first. No File API upload, no quota, no PII leaving the box. Frames stay under ~/.peepshow/ unless you opt in to a remote sink.
Sinks fan-out. Gemini won't push frames to Notion, Slack, SQL, S3. peepshow does — same extracted artifact powers every downstream pipeline.
Cross-agent portability. One frame bundle feeds Gemini + Claude + GPT + local models. No re-upload, no vendor lock, byte-identical inputs across runs.
Pre-filter long footage. Hour-long surveillance, all-day timelapse, multi-hour lecture — scene-detect + perceptual-hash dedup collapse it to the frames that matter before the model ever sees them.
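The scene-change and local-transcription items correspond to standard ffmpeg and whisper.cpp invocations. The sketch below builds the argument lists for `spawn`/`spawnSync`. The 0.3 scene threshold, the 30-frame cap, and the model path are assumptions, and peepshow's real pipeline may use different flags:

```typescript
// Frames whose ffmpeg scene score exceeds `threshold` (0..1), capped at `max` images.
function sceneFrameArgs(input: string, outPattern: string, threshold = 0.3, max = 30): string[] {
  return [
    "-i", input,
    "-vf", `select='gt(scene,${threshold})'`, // quotes are parsed by ffmpeg's filtergraph parser
    "-fps_mode", "vfr", // write only the selected frames (older ffmpeg: -vsync vfr)
    "-frames:v", String(max),
    outPattern, // e.g. "frames/%04d.png"
  ];
}

// whisper.cpp expects 16 kHz, mono, 16-bit PCM WAV input.
function audioForWhisperArgs(input: string, wavOut: string): string[] {
  return ["-i", input, "-vn", "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wavOut];
}

// whisper.cpp CLI args (`whisper-cli` in recent builds, `main` in older ones); writes `${outPrefix}.txt`.
function whisperArgs(model: string, wav: string, outPrefix: string): string[] {
  return ["-m", model, "-f", wav, "-otxt", "-of", outPrefix];
}
```

`-frames:v` keeps the first `max` frames that pass the filter, not the `max` most distinct ones. Ranking frames by distinctness would need a second pass.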

The native multimodal API and peepshow are complementary: use Gemini direct for a 30-second clip in a single turn; use peepshow when the video is long, the same artifact needs to reach more than one model, or the frames need to live somewhere durable.
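The cost figure and the rule of thumb above fit in a few lines of code. In this sketch, the 258 tokens/second rate is the figure quoted above. The per-frame cost assumes each still bills as one 258-token image, and the 30-second cutoff is one reading of "a 30-second clip." All three are adjustable parameters, not published constants:

```typescript
const NATIVE_TOKENS_PER_SECOND = 258; // native video rate quoted in the list above
const TOKENS_PER_FRAME = 258; // assumption: each still bills as one small image

function nativeVideoTokens(durationSec: number): number {
  return Math.round(durationSec * NATIVE_TOKENS_PER_SECOND);
}

function frameBundleTokens(frames: number, transcriptTokens: number): number {
  return frames * TOKENS_PER_FRAME + transcriptTokens;
}

interface Job {
  durationSec: number;
  modelCount: number; // how many models need the same input
  needsDurableFrames: boolean; // frames must be stored, cached, or replayed
}

// Short, single-model, throwaway clips go native; everything else goes through peepshow.
function route(job: Job): "native" | "peepshow" {
  const short = job.durationSec <= 30;
  return short && job.modelCount === 1 && !job.needsDurableFrames ? "native" : "peepshow";
}
```

For the one-hour clip, `nativeVideoTokens(3600)` is 928,800 tokens. By comparison, 30 frames come to 7,740 tokens before the transcript is added.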

Use it from Gemini CLI (3 steps)

peepshow already ships a Gemini CLI extension — gemini-extension.json + GEMINI.md at the repo root. No skill file needed; Gemini picks it up as a custom tool.

Install peepshow.
```
npm install -g peepshow
```

Register the extension with Gemini CLI.

```
git clone --depth 1 https://github.com/t0mtaylor/peepshow.git
cd peepshow
gemini --extension .   # or copy gemini-extension.json + GEMINI.md into your global Gemini extensions dir
```

Ask Gemini about a video.

```
user:   summarise demo.mp4
gemini: (invokes peepshow tool → reads scene frames + transcript → answers)
```

The same flow works for long lectures, surveillance clips, and animated GIFs: Gemini gets a frame timeline instead of paying per-second video tokens. Whichever of the 71 sinks you've configured fire automatically, so the run also lands wherever you've set up (Notion, Slack, SQLite, S3, …). Full reference: peepshow for Gemini CLI.

Privacy & telemetry

The CLI sends an anonymous run beacon by default (version + OS family + outcome — no paths, no payload). Opt out with peepshow config set telemetry off, PEEPSHOW_TELEMETRY=0, or DO_NOT_TRACK=1. Full details in the privacy policy and docs/PRIVACY.md.
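Scripts that wrap the CLI may want to respect the same switches; the three opt-outs can be checked as shown below. This is a hypothetical sketch, not peepshow's implementation. It treats any non-empty `DO_NOT_TRACK` value other than `0` as an opt-out, a deliberately broader choice than the documented `=1`:

```typescript
// Hypothetical mirror of the documented opt-outs; `configSetting` is the stored telemetry value.
function telemetryEnabled(env: Record<string, string | undefined>, configSetting?: string): boolean {
  if (configSetting === "off") return false; // peepshow config set telemetry off
  if (env.PEEPSHOW_TELEMETRY === "0") return false; // PEEPSHOW_TELEMETRY=0
  const dnt = env.DO_NOT_TRACK;
  if (dnt !== undefined && dnt !== "" && dnt !== "0") return false; // DO_NOT_TRACK=1 (or any truthy value)
  return true; // default: anonymous beacon on
}
```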
