Local LLMs have no native video input. peepshow bridges the gap.
Local LLMs' vision is image-only. peepshow turns video and animated formats into the frame timeline Local LLMs already accept.
- Fully offline. peepshow + whisper.cpp + Ollama = video → answer without any network hop. No cloud, no API key, no upload.
- Cap visual VRAM. Local vision models choke on long video. peepshow trims the input to N frames so any 7B / 11B multimodal model fits in 12GB VRAM.
- Animated GIF / APNG / WebP. Most local vision adapters only accept JPEG/PNG. peepshow flattens animated formats.
- Cheap audio transcript. whisper.cpp ships standalone and peepshow auto-detects it. Frames and transcript text reach the model as low-cost inputs.
- Vendor-agnostic. Same peepshow bundle feeds Ollama, LM Studio, llama.cpp, Jan, OpenWebUI, GPT4All. Pick or switch models freely.
- No telemetry leakage. Run with `PEEPSHOW_TELEMETRY=0` for a hard offline pipeline.
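As a rough illustration of the vendor-agnostic point, the sketch below shapes one set of frames for two request styles: Ollama's `/api/chat` (an `images` array of base64 strings) and the OpenAI-compatible chat format that servers such as LM Studio and the llama.cpp server expose. The helper names `toOllamaMessage` and `toOpenAIMessage` are hypothetical and not part of peepshow, and the frames are assumed to be JPEGs that are already base64-encoded.

```javascript
// Hypothetical helpers (not part of peepshow): shape one set of base64 JPEG
// frames for two common local-inference request formats.

// Ollama /api/chat: plain text content plus an `images` array of base64 strings.
function toOllamaMessage(prompt, framesBase64) {
  return { role: "user", content: prompt, images: framesBase64 };
}

// OpenAI-compatible chat format: content is a list of typed parts,
// with each image passed as a data URI. JPEG is assumed here.
function toOpenAIMessage(prompt, framesBase64) {
  return {
    role: "user",
    content: [
      { type: "text", text: prompt },
      ...framesBase64.map((b64) => ({
        type: "image_url",
        image_url: { url: `data:image/jpeg;base64,${b64}` },
      })),
    ],
  };
}

const frames = ["AAAA", "BBBB"]; // placeholder base64 payloads
console.log(JSON.stringify(toOllamaMessage("Summarise this clip.", frames)));
console.log(JSON.stringify(toOpenAIMessage("Summarise this clip.", frames)));
```

Only the message shape differs between the two; the frames themselves stay the same, which is what makes switching backends cheap.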
Token-cost math (worked examples)
| Clip + model | Native upload | peepshow + Local LLMs |
|---|---|---|
| 30s clip + Llama 3.2 11B Vision | — | ~3GB VRAM peak, ~2s/frame on M3 Max |
| 10-min clip + Qwen2.5-VL 7B | — | ~8GB VRAM, ~80s total for 20 frames |
| 1-hour CCTV + Pixtral 12B | — | ~14GB VRAM, ~3min for 30 frames |
| 3-hour conference + LLaVA 13B | — | Chunk into 5-min segments; ~18GB VRAM peak |
Numbers were measured on Apple Silicon (M3 Max, MPS backend); x86 CUDA performance varies by GPU. peepshow itself adds <2s overhead per clip, so ffmpeg + whisper.cpp dominate the runtime.
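The arithmetic behind the table can be checked with a short sketch. It derives the per-frame latency implied by the two rows that report a total time, and the number of segments needed for the chunked 3-hour case. The inputs are the approximate figures from the table ("~3min" is taken as 180 seconds); the function names are illustrative only.

```javascript
// Approximate figures copied from the table rows that report a total time.
const rows = [
  { clip: "10-min clip + Qwen2.5-VL 7B", totalSeconds: 80, frames: 20 },
  { clip: "1-hour CCTV + Pixtral 12B", totalSeconds: 180, frames: 30 },
];

// Average inference time per frame implied by a row.
function secondsPerFrame(row) {
  return row.totalSeconds / row.frames;
}

// Number of fixed-length segments needed to cover a clip of the given length.
function segmentCount(durationMinutes, segmentMinutes) {
  return Math.ceil(durationMinutes / segmentMinutes);
}

for (const row of rows) {
  console.log(`${row.clip}: ~${secondsPerFrame(row)}s/frame`);
}
console.log(`3-hour conference: ${segmentCount(180, 5)} segments of 5 min`);
```

This works out to roughly 4s/frame and 6s/frame for the two timed rows, and 36 five-minute segments for the 3-hour conference.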
Install (CLI)
# 1. Install peepshow + whisper.cpp + Ollama
npm install -g peepshow
brew install whisper-cpp ollama
ollama pull llama3.2-vision
# 2. Extract video → frames + transcript
peepshow ./demo.mp4 --emit json > run.json

Install (Local LLMs API directly, no CLI)
Calling the Local LLMs API from your own code? Run peepshow first, then feed the JSON manifest in as multimodal parts:
# Pipe frames to Ollama directly.
# --input-type=module makes node treat the -e script as an ES module (needed for import and top-level await).
node --input-type=module -e '
import { readFileSync } from "node:fs";
const run = JSON.parse(readFileSync("run.json", "utf8"));
const r = await fetch("http://127.0.0.1:11434/api/chat", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    model: "llama3.2-vision",
    messages: [{
      role: "user",
      content: "Summarise this clip. Transcript: " + (run.transcript?.text ?? ""),
      // Ollama expects each image as a base64 string
      images: run.frames.map(f => readFileSync(f.path).toString("base64"))
    }],
    stream: false
  })
});
console.log((await r.json()).message.content);
'

Animated GIF / APNG / WebP: peepshow's killer move on Local LLMs
Most local vision adapters reject animated images outright. peepshow normalises GIF / APNG / WebP into JPEG sequences so any local multimodal model reads them.
peepshow ./meme.gif # animated GIF → frame timeline
peepshow ./tutorial.apng # animated PNG → frames
peepshow ./loop.webp # animated WebP → frames

Frame strategy presets
peepshow picks scene-change frames by default. For Local LLMs specifically, these presets are worth knowing:
- `--strategy scene --max 12`: Lean preset for 8GB VRAM cards. Keeps Llama 3.2 11B Vision comfortable.
- `--strategy scene --max 6 --resize 512`: Tight VRAM budget, with half-res frames and a 6-frame max. Works on 6GB cards.
- `--strategy fps --fps 0.5 --max 30`: Steady-cadence sampling for long static content. Pairs with chunked inference.
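One way to apply the two scene presets above is to choose between them from the available VRAM. The sketch below is a hypothetical helper, not part of peepshow; the cut-offs (6GB and 8GB) come from the preset descriptions, while using the 12-frame preset for anything above 8GB is an assumption.

```javascript
// Hypothetical helper (not part of peepshow): map a VRAM budget in GB to the
// flags of one of the scene presets listed above.
function presetForVram(vramGb) {
  if (vramGb < 6) return null; // no preset is documented below 6GB
  if (vramGb < 8) return ["--strategy", "scene", "--max", "6", "--resize", "512"];
  // Assumption: cards with 8GB or more use the lean 12-frame preset.
  return ["--strategy", "scene", "--max", "12"];
}

// Build the argument list for a peepshow run; returned as an array so it can
// be passed to a process spawner without shell quoting issues.
function buildArgs(inputPath, vramGb) {
  const preset = presetForVram(vramGb);
  if (preset === null) {
    throw new Error(`No documented preset for ${vramGb}GB of VRAM`);
  }
  return [inputPath, ...preset];
}

console.log(["peepshow", ...buildArgs("./demo.mp4", 6)].join(" "));
console.log(["peepshow", ...buildArgs("./demo.mp4", 8)].join(" "));
```

The steady-cadence `fps` preset is left out because it depends on the content (long, static footage) rather than on the VRAM budget.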
All 95 sinks still fire
Same CLI = same sinks. Push frames to SQLite, embed captions into Chroma, mirror to S3, drop a thumbnail in Slack, file a GitHub issue with the offending frame attached — all from one Local LLMs run. Browse the full sink catalogue →.
Report + LLM analysis loop
Every run also writes a self-contained report.html + manifest.json next to the frames (see the Report page). When Local LLMs consumes the frames, the analysis flows back into the report — whoever opens it next sees the model's understanding without re-running the prompt.
echo '{"summary":"<summary from Local LLMs>","provider":"llama3.2-vision"}' \
| peepshow report annotate "<outputDir>"

When to skip peepshow + use Local LLMs direct
- Footage is already a small set of stills.
- Running a non-vision model — text-only LLMs don't read frames.
- Need realtime streaming inference — peepshow is one-shot, not streaming.
For everything beyond those edge cases, peepshow is the bridge: video + animated formats + transcript go in, and Local LLMs read them as images + text.