Grok has no native video. peepshow bridges the gap.
Grok's vision is image-only. peepshow turns video + animated formats into the frame timeline Grok already accepts.
- Only path to video on Grok. Grok's chat endpoint accepts image URLs / base64. peepshow turns video into that shape.
- X / Twitter posting workflows. Pair with the Slack / Webhook sinks to fan run output into X posts via Grok.
- Animated GIF / APNG / WebP. Grok vision treats these as static. peepshow flattens to JPEG sequence.
- Predictable token cost. N × per-image price; peepshow picks N. Native video would be unbounded — peepshow caps it.
- Reasoning + chat models share input shape. Frame bundle works whether Grok runs in chat or reasoning mode.
- Cross-model portability. Same artifact feeds Grok + Gemini + Claude + GPT — no re-extract.
Token-cost math (worked examples)
| Clip | Native upload | peepshow + Grok (approx. input tokens) |
|---|---|---|
| 30s product demo | n/a (no native video) | ~3K (6 frames + 200-word transcript) |
| 10-minute lecture | n/a (no native video) | ~6K (20 scene frames + transcript) |
| 1-hour CCTV reel | n/a (no native video) | ~10K (30 motion frames + sparse transcript) |
| 3-hour podcast video | n/a (no native video) | ~28K (60 scene frames + chaptered transcript) |
Per-image token cost uses xAI's published vision pricing tier. Numbers approximate — actual cost depends on detail level and frame resolution.
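The arithmetic behind these estimates can be sketched as frame count times per-image tokens, plus transcript tokens. The helper name and its default rates are illustrative assumptions: 450 tokens per image and 1.3 tokens per word are placeholders, not xAI's published figures, so substitute the values for your model and detail level.

```javascript
// Rough input-token estimate for one peepshow + Grok request.
// ASSUMPTION: tokensPerImage and tokensPerWord are placeholder rates,
// not values taken from xAI or peepshow documentation.
function estimateInputTokens({ frames, transcriptWords, tokensPerImage = 450, tokensPerWord = 1.3 }) {
  const imageTokens = frames * tokensPerImage;                    // N x per-image cost
  const textTokens = Math.round(transcriptWords * tokensPerWord); // transcript cost
  return imageTokens + textTokens;
}

// 30s product demo: 6 frames + 200-word transcript.
console.log(estimateInputTokens({ frames: 6, transcriptWords: 200 }));
```

Because the frame count is fixed before the request is sent, the image share of the bill is known up front; only the transcript length varies with the clip.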
Install (CLI)
npm install -g peepshow
# Set Grok credentials:
export XAI_API_KEY=xai-...
# Run extraction:
peepshow ./demo.mp4 --emit json > run.json
Install (Grok API directly, no CLI)
Calling the Grok API from your own code? Run peepshow first, then feed the JSON manifest in as multimodal parts:
# Hand frames + transcript to Grok
# (--input-type=module lets node -e use import and top-level await)
node --input-type=module -e '
import OpenAI from "openai";
import { readFileSync } from "node:fs";

// Load the manifest written by the extraction step (run.json).
const run = JSON.parse(readFileSync("run.json", "utf8"));

// One text prompt, one base64 image part per frame, then the transcript.
const content = [
  { type: "text", text: "Summarise this clip." },
  ...run.frames.map(f => ({
    type: "image_url",
    image_url: { url: "data:image/jpeg;base64," + readFileSync(f.path).toString("base64") }
  })),
  { type: "text", text: "Transcript:\n" + (run.transcript?.text ?? "") }
];

// The xAI endpoint is OpenAI-compatible, so the OpenAI SDK works with a custom baseURL.
const client = new OpenAI({ apiKey: process.env.XAI_API_KEY, baseURL: "https://api.x.ai/v1" });
const r = await client.chat.completions.create({
  model: "grok-4-vision",
  messages: [{ role: "user", content }]
});
console.log(r.choices[0].message.content);
'
Animated GIF / APNG / WebP: peepshow's killer move on Grok
Grok's vision adapter handles animated images as a single still. peepshow extracts every motion frame from animated GIF / APNG / WebP so Grok sees the full loop.
peepshow ./meme.gif # animated GIF → frame timeline
peepshow ./tutorial.apng # animated PNG → frames
peepshow ./loop.webp # animated WebP → frames
Frame strategy presets
peepshow picks scene-change frames by default. For Grok specifically, these presets are worth knowing:
- --strategy scene --max 16: Default for Grok. 16 frames keeps context lean while still capturing scene changes.
- --strategy scene --max 30 --dedup perceptual: Long-form podcasts / interviews. Drops near-duplicates so Grok doesn't pay for static talking-heads.
- --strategy fps --fps 1 --max 30: Steady-cadence sampling for sport, gameplay, broadcast.
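If peepshow is driven from a script, the presets above can be kept in one lookup so each run names its intent rather than repeating flags. This is a minimal sketch: the preset keys and the peepshowArgs helper are hypothetical, and it assumes the strategy flags can be combined with --emit json as in the install example.

```javascript
// Flag sets copied from the presets above; the key names are made up for this sketch.
const PRESETS = {
  default: ["--strategy", "scene", "--max", "16"],
  longform: ["--strategy", "scene", "--max", "30", "--dedup", "perceptual"],
  cadence: ["--strategy", "fps", "--fps", "1", "--max", "30"],
};

// Hypothetical helper: builds the argument list for one peepshow run.
// ASSUMPTION: strategy flags and --emit json can be used together.
function peepshowArgs(file, preset = "default") {
  const flags = PRESETS[preset];
  if (!flags) throw new Error(`Unknown preset: ${preset}`);
  return [file, ...flags, "--emit", "json"];
}

console.log(["peepshow", ...peepshowArgs("./podcast.mp4", "longform")].join(" "));
```

The returned array can be handed to Node's child_process.execFileSync("peepshow", args) so that file names with spaces are passed through without shell quoting.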
All 95 sinks still fire
Same CLI = same sinks. Push frames to SQLite, embed captions into Chroma, mirror to S3, drop a thumbnail in Slack, file a GitHub issue with the offending frame attached — all from one Grok run. Browse the full sink catalogue →.
Report + LLM analysis loop
Every run also writes a self-contained report.html + manifest.json next to the frames (see the Report page). When Grok consumes the frames, the analysis flows back into the report — whoever opens it next sees the model's understanding without re-running the prompt.
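Model summaries often contain quotes, apostrophes, and newlines, which are awkward to escape by hand in a shell string. One way around that, sketched here, is to build the payload with JSON.stringify and pipe the script's output into the annotate command. The "summary" and "provider" fields are the ones this section uses; buildAnnotation and the file name annotate.js are hypothetical.

```javascript
// Hypothetical helper: serialises the annotation payload.
// JSON.stringify escapes quotes and newlines in the summary text.
function buildAnnotation(summary, provider = "grok-4-vision") {
  return JSON.stringify({ summary, provider });
}

// Example summary containing both an apostrophe and double quotes.
console.log(buildAnnotation('Grok\'s take: a "before/after" product demo.'));
```

Saved as annotate.js, this could be run as: node annotate.js | peepshow report annotate "<outputDir>". For a short summary without quotes, the same payload can be piped straight from the shell: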
echo '{"summary":"<summary from Grok>","provider":"grok-4-vision"}' \
| peepshow report annotate "<outputDir>"
When to skip peepshow + use Grok direct
- Source is already a single image.
- Doing X-realtime trend extraction — peepshow is per-clip, not stream.
- Audio-only — Grok's transcript handling doesn't need frames.
For everything beyond those edge cases, peepshow is the bridge: video + animated formats + transcript → Grok reads them as images + text.