GPT-4o / GPT-5 has no native video. peepshow bridges.
GPT-4o / GPT-5's vision is image-only. peepshow turns video + animated formats into the frame timeline GPT-4o / GPT-5 already accepts.
- Only path to video on GPT. OpenAI's Files + Vision APIs accept images. peepshow turns video into the input shape GPT already understands.
- Pair with Whisper API. peepshow's transcription provider chain includes OpenAI Whisper — run extraction + transcription on the same vendor in one call.
- Files API sink ships in-tree. `peepshow-sink-openai-files` uploads frames directly to OpenAI Files; reference them by file ID (`file_id`) in subsequent Responses calls.
- Animated GIF / APNG / WebP. GPT vision treats these as static images. peepshow flattens them into a JPEG sequence.
- Token cost is predictable: N frames × roughly 85 to 170 tokens each, depending on the `detail` setting. Native video would arrive uncapped; peepshow caps it at the frame count you choose.
- Reasoning models work too. o4 / o5 reasoning models accept image content. peepshow's frame bundle feeds reasoning runs identically to chat runs.
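As a rough illustration of the Files sink bullet above, the sketch below turns uploaded file IDs into Responses API content parts. The `input_image` part with a `file_id` field is part of the OpenAI Responses API; the manifest shape used here (`frames[].fileId`, `transcript.text`) and the helper name `buildContentFromFileIds` are assumptions for illustration, not peepshow's documented schema.

```javascript
// Hypothetical manifest shape (an assumption, not peepshow's documented schema):
//   { frames: [{ fileId: "file-..." }], transcript: { text: "..." } }
function buildContentFromFileIds(run, prompt, detail = "low") {
  // Keep only frames that actually carry an uploaded file ID.
  const images = run.frames
    .filter((f) => typeof f.fileId === "string")
    .map((f) => ({ type: "input_image", file_id: f.fileId, detail }));

  const parts = [{ type: "input_text", text: prompt }, ...images];

  // Append the transcript only when one is present.
  const transcript = run.transcript?.text;
  if (transcript) {
    parts.push({ type: "input_text", text: "Transcript:\n" + transcript });
  }
  return parts;
}

// Demo data: two uploaded frames and one local-only frame that gets skipped.
const demoRun = {
  frames: [{ fileId: "file-aaa" }, { fileId: "file-bbb" }, { path: "frame-3.jpg" }],
  transcript: { text: "Hello." },
};
const content = buildContentFromFileIds(demoRun, "Summarise this clip.");
console.log(JSON.stringify(content, null, 2));
```

The resulting array can be passed as `input: [{ role: "user", content }]` to `client.responses.create`, for chat and reasoning models alike.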
Token-cost math (worked examples)
| Clip | Native upload | peepshow + GPT-4o / GPT-5 |
|---|---|---|
| 30s product demo (peepshow) | — | ~3K (6 frames at ~170 tok each + transcript) |
| 10-minute lecture (peepshow) | — | ~6K (20 scene frames + transcript) |
| 1-hour CCTV reel (peepshow) | — | ~10K (30 motion frames + sparse transcript) |
| 3-hour conference (peepshow + chunked) | — | ~28K (60 scene frames + chaptered transcript) |
Per-image token cost uses OpenAI's high-detail vision pricing (~170 tokens per 512×512 tile). With `detail: low` the cost drops to a flat ~85 tokens per image.
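To check a budget before a run, the per-image arithmetic can be sketched as below. This follows OpenAI's published image-token formula for GPT-4o (fit within 2048×2048, shrink the shortest side to 768 px, then 170 tokens per 512 px tile plus an 85-token base per image, which the per-tile figure above leaves out). Other models may price images differently. The function names and the `width` / `height` frame fields are hypothetical, and not upscaling small frames is an assumption.

```javascript
// Sketch of OpenAI's published GPT-4o image-token formula (not peepshow code).
function estimateImageTokens(width, height, detail = "high") {
  if (detail === "low") return 85; // flat cost, independent of image size

  let w = width;
  let h = height;

  // Step 1: fit inside a 2048x2048 square, keeping the aspect ratio.
  const fit = Math.min(1, 2048 / Math.max(w, h));
  w = Math.round(w * fit);
  h = Math.round(h * fit);

  // Step 2: shrink so the shortest side is at most 768 px.
  // Assumption: smaller images are left as-is rather than upscaled.
  const shrink = Math.min(1, 768 / Math.min(w, h));
  w = Math.round(w * shrink);
  h = Math.round(h * shrink);

  // Step 3: 170 tokens per 512 px tile, plus an 85-token base per image.
  const tiles = Math.ceil(w / 512) * Math.ceil(h / 512);
  return 85 + 170 * tiles;
}

// Sum the image cost over all frames and add a transcript estimate.
function estimateRunTokens(frames, detail = "high", transcriptTokens = 0) {
  const imageTokens = frames.reduce(
    (sum, f) => sum + estimateImageTokens(f.width, f.height, detail),
    0
  );
  return imageTokens + transcriptTokens;
}

// Six single-tile frames, as in the 30s product demo row.
const sixFrames = Array.from({ length: 6 }, () => ({ width: 512, height: 512 }));
console.log(estimateRunTokens(sixFrames, "high"));
console.log(estimateRunTokens(sixFrames, "low"));
```

Larger frames span more tiles, so downscaling frames before upload is the main lever for keeping high-detail runs near the figures in the table.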
Install (CLI)
npm install -g peepshow
# Run peepshow with the OpenAI Files sink:
export OPENAI_API_KEY=sk-...
peepshow ./demo.mp4 --sink openai-files
# → frames uploaded to OpenAI Files, file-ids returned in the manifest.
Install (GPT-4o / GPT-5 API directly, no CLI)
Calling the GPT-4o / GPT-5 API from your own code? Run peepshow first, then feed the JSON manifest in as multimodal parts:
# 1. Extract
peepshow ./demo.mp4 --emit json > run.json
# 2. Hand the frames + transcript to GPT
# --input-type=module lets the inline script use `import` and top-level await
node --input-type=module -e '
import OpenAI from "openai";
import { readFileSync } from "node:fs";
const run = JSON.parse(readFileSync("run.json", "utf8"));
const content = [
{ type: "input_text", text: "Summarise this clip." },
...run.frames.map(f => ({
type: "input_image",
image_url: "data:image/jpeg;base64," + readFileSync(f.path).toString("base64"),
detail: "high"
})),
{ type: "input_text", text: "Transcript:\n" + (run.transcript?.text ?? "") }
];
const client = new OpenAI();
const r = await client.responses.create({
model: "gpt-4o",
input: [{ role: "user", content }]
});
console.log(r.output_text);
'
Animated GIF / APNG / WebP: peepshow's killer move on GPT-4o / GPT-5
GPT's vision endpoint reads only the first frame of an animated image. peepshow extracts every motion frame from animated GIF / APNG / WebP, so GPT sees the whole loop.
peepshow ./meme.gif # animated GIF → frame timeline
peepshow ./tutorial.apng # animated PNG → frames
peepshow ./loop.webp # animated WebP → frames
Frame strategy presets
peepshow picks scene-change frames by default. For GPT-4o / GPT-5 specifically, these presets are worth knowing:
- `--strategy scene --max 24`: Default. 24 frames at high detail ≈ 4K tokens. Good for narrative video.
- `--strategy scene --max 60 --detail low`: Long content with low-detail vision (flat 85 tok/image). 60 frames ≈ 5.1K tokens, which leaves most of the context window free.
- `--strategy fps --fps 1 --max 30 --sink openai-files`: Cache frames to OpenAI Files for reuse across multiple Responses calls (RAG-style).
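Going the other way, from a token budget to a `--max` value, is simple division. The sketch below uses the flat per-frame figures quoted in the presets (170 tokens at high detail, 85 at low). The helper name `maxFramesForBudget` and the `transcriptTokens` parameter are hypothetical, and frames larger than one tile will cost more than these flat figures suggest.

```javascript
// Flat per-frame estimates taken from the presets above (approximations).
const TOKENS_PER_FRAME = { high: 170, low: 85 };

// Largest frame count that fits the budget once the transcript is reserved.
function maxFramesForBudget(tokenBudget, detail = "high", transcriptTokens = 0) {
  const perFrame = TOKENS_PER_FRAME[detail];
  if (!perFrame) {
    throw new RangeError('detail must be "high" or "low"');
  }
  const available = tokenBudget - transcriptTokens;
  return Math.max(0, Math.floor(available / perFrame));
}

console.log(maxFramesForBudget(4080, "high")); // 24, the default preset
console.log(maxFramesForBudget(5100, "low")); // 60, the long-content preset
```

The result can be passed straight to `--max`, for example `--strategy scene --max 60 --detail low`.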
All 95 sinks still fire
Same CLI = same sinks. Push frames to SQLite, embed captions into Chroma, mirror to S3, drop a thumbnail in Slack, file a GitHub issue with the offending frame attached — all from one GPT-4o / GPT-5 run. Browse the full sink catalogue →.
Report + LLM analysis loop
Every run also writes a self-contained report.html + manifest.json next to the frames (see the Report page). When GPT-4o / GPT-5 consumes the frames, the analysis flows back into the report — whoever opens it next sees the model's understanding without re-running the prompt.
echo '{"summary":"<summary from GPT-4o / GPT-5>","provider":"gpt-4o"}' \
| peepshow report annotate "<outputDir>"
When to skip peepshow + use GPT-4o / GPT-5 direct
- OpenAI ships native video in the future and your clip is short.
- You only need a single frame at a known timestamp.
- Audio-only — use Whisper API directly, no frames needed.
For everything beyond those edge cases, peepshow is the bridge: video + animated formats + transcript → GPT-4o / GPT-5 reads them as images + text.