Reel #M-05 xAI Grok vision

Grok 4 Vision

Grok vision is image input only. peepshow turns video + animated formats into the frame timeline Grok ingests.

xAI Grok's multimodal endpoint reads images, not video containers. peepshow extracts scene-change frames and a transcript so Grok reads any clip as a sequence of images — same shape the API already accepts.

Grok has no native video. peepshow bridges.

Grok's vision is image-only. peepshow turns video + animated formats into the frame timeline Grok already accepts.

  • Only path to video on Grok. Grok's chat endpoint accepts image URLs / base64. peepshow turns video into that shape.
  • X / Twitter posting workflows. Pair with the Slack / Webhook sinks to fan run output into X posts via Grok.
  • Animated GIF / APNG / WebP. Grok vision treats these as static. peepshow flattens to JPEG sequence.
  • Predictable token cost. N × per-image price; peepshow picks N. Native video would be unbounded — peepshow caps it.
  • Reasoning + chat models share input shape. Frame bundle works whether Grok runs in chat or extended-thinking mode.
  • Cross-model portability. Same artifact feeds Grok + Gemini + Claude + GPT — no re-extract.

Token-cost math (worked examples)

Clip | Native upload | peepshow + Grok
30s product demo | (peepshow) | ~3K (6 frames + 200-word transcript)
10-minute lecture | (peepshow) | ~6K (20 scene frames + transcript)
1-hour CCTV reel | (peepshow) | ~10K (30 motion frames + sparse transcript)
3-hour podcast video | (peepshow) | ~28K (60 scene frames + chaptered transcript)

Per-image token cost uses xAI's published vision pricing tier. Numbers are approximate; actual cost depends on detail level and frame resolution.
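The totals in the table follow the N × per-image shape described earlier, plus the transcript. A minimal sketch of that arithmetic is below. The `tokensPerImage` and `tokensPerWord` values are placeholder assumptions, not xAI or peepshow figures; the 450 used in the example was picked only because it lands in the same range as the table's ~3K row. Substitute the numbers from xAI's current pricing documentation.

```javascript
// Rough token estimate for a peepshow bundle sent to Grok.
// tokensPerImage and tokensPerWord are placeholder assumptions, not xAI figures.
function estimateTokens({ frames, transcriptWords, tokensPerImage, tokensPerWord = 1.3 }) {
  if (!Number.isInteger(frames) || frames < 0) {
    throw new RangeError("frames must be a non-negative integer");
  }
  const imageTokens = frames * tokensPerImage;
  const textTokens = Math.round(transcriptWords * tokensPerWord);
  return { imageTokens, textTokens, total: imageTokens + textTokens };
}

// 30s product demo from the table: 6 frames + 200-word transcript.
const demo = estimateTokens({ frames: 6, transcriptWords: 200, tokensPerImage: 450 });
console.log(demo);
```

Because the frame count is an explicit input, the estimate scales linearly with whatever `--max` value a run uses.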

Install (CLI)

npm install -g peepshow

# Set Grok credentials:
export XAI_API_KEY=xai-...

# Run extraction:
peepshow ./demo.mp4 --emit json > run.json

Install (Grok API directly, no CLI)

Calling the Grok API from your own code? Run peepshow first, then feed the JSON manifest in as multimodal parts:

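# Hand frames + transcript to Grok.
# Needs the openai package installed in the current directory (npm install openai).
# --input-type=module makes node -e accept import statements and top-level await.
node --input-type=module -e '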
  import OpenAI from "openai";
  import { readFileSync } from "node:fs";
  const run = JSON.parse(readFileSync("run.json", "utf8"));
  const content = [
    { type: "text", text: "Summarise this clip." },
    ...run.frames.map(f => ({
      type: "image_url",
      image_url: { url: "data:image/jpeg;base64," + readFileSync(f.path).toString("base64") }
    })),
    { type: "text", text: "Transcript:\n" + (run.transcript?.text ?? "") }
  ];
  const client = new OpenAI({ apiKey: process.env.XAI_API_KEY, baseURL: "https://api.x.ai/v1" });
  const r = await client.chat.completions.create({
    model: "grok-4-vision",
    messages: [{ role: "user", content }]
  });
  console.log(r.choices[0].message.content);
'
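For use inside a larger program, the same message-building step can be pulled into a function. This is a sketch, not peepshow's API: `buildContent` is a hypothetical helper, and it assumes the manifest exposes `frames[].path` and `transcript.text` as the one-liner above does. The file reader is passed in so the function can be tried without any frames on disk; in real use, pass `readFileSync` from `node:fs`.

```javascript
// Build the user-message content array from a peepshow manifest.
// Assumes run.frames[].path and run.transcript.text, as in the one-liner above.
// readFile is injected: pass fs.readFileSync in real use.
function buildContent(run, prompt, readFile) {
  if (!Array.isArray(run.frames) || run.frames.length === 0) {
    throw new Error("manifest has no frames");
  }
  const images = run.frames.map((f) => ({
    type: "image_url",
    image_url: { url: "data:image/jpeg;base64," + readFile(f.path).toString("base64") },
  }));
  const parts = [{ type: "text", text: prompt }, ...images];
  // Skip the transcript part entirely when the clip has no speech.
  const transcript = run.transcript?.text?.trim();
  if (transcript) parts.push({ type: "text", text: "Transcript:\n" + transcript });
  return parts;
}

// Try it with an in-memory reader so no files are needed.
const fakeRead = (path) => Buffer.from("bytes-of-" + path);
const parts = buildContent(
  { frames: [{ path: "f1.jpg" }, { path: "f2.jpg" }], transcript: { text: "hello" } },
  "Summarise this clip.",
  fakeRead
);
console.log(parts.map((p) => p.type));
```

Failing early on an empty frame list avoids sending a text-only request when extraction produced nothing.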

Animated GIF / APNG / WebP — peepshow's killer move on Grok

Grok's vision adapter handles animated images as a single still. peepshow extracts every motion frame from animated GIF / APNG / WebP so Grok sees the full loop.

peepshow ./meme.gif         # animated GIF → frame timeline
peepshow ./tutorial.apng    # animated PNG → frames
peepshow ./loop.webp        # animated WebP → frames

Frame strategy presets

peepshow picks scene-change frames by default. For Grok specifically, these presets are worth knowing:

  • --strategy scene --max 16: Default for Grok. 16 frames keeps context lean while still capturing scene changes.
  • --strategy scene --max 30 --dedup perceptual: Long-form podcasts / interviews. Drops near-duplicates so Grok doesn't pay for static talking-heads.
  • --strategy fps --fps 1 --max 30: Steady-cadence sampling for sport, gameplay, broadcast.
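The page does not say how `--dedup perceptual` measures similarity. One common technique is to compare 64-bit perceptual hashes by Hamming distance and drop frames that sit too close to the last one kept. The sketch below illustrates that general idea only; the `hash` field, the hex encoding, and the `maxDistance` threshold are all assumptions for illustration, not peepshow internals.

```javascript
// Count differing bits between two 64-bit hashes given as 16-char hex strings.
function hammingDistance(hexA, hexB) {
  let diff = BigInt("0x" + hexA) ^ BigInt("0x" + hexB);
  let bits = 0;
  while (diff > 0n) {
    bits += Number(diff & 1n);
    diff >>= 1n;
  }
  return bits;
}

// Keep a frame only if it differs enough from the last frame that was kept.
function dedupFrames(frames, maxDistance = 5) {
  const kept = [];
  for (const frame of frames) {
    const last = kept[kept.length - 1];
    if (!last || hammingDistance(last.hash, frame.hash) > maxDistance) kept.push(frame);
  }
  return kept;
}

const frames = [
  { path: "f1.jpg", hash: "ffff0000ffff0000" },
  { path: "f2.jpg", hash: "ffff0000ffff0001" }, // 1 bit away from f1: near-duplicate
  { path: "f3.jpg", hash: "0000ffff0000ffff" }, // all 64 bits differ: new scene
];
console.log(dedupFrames(frames).map((f) => f.path));
```

Comparing against the last kept frame, rather than the previous frame, stops a slow drift across a static talking-head shot from slipping through one small step at a time.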

All 95 sinks still fire

Same CLI = same sinks. Push frames to SQLite, embed captions into Chroma, mirror to S3, drop a thumbnail in Slack, file a GitHub issue with the offending frame attached, all from one Grok run. Browse the full sink catalogue.

Report + LLM analysis loop

Every run also writes a self-contained report.html + manifest.json next to the frames (see the Report page). When Grok consumes the frames, the analysis flows back into the report — whoever opens it next sees the model's understanding without re-running the prompt.

# Keep apostrophes out of the single-quoted JSON, or the shell ends the string early.
echo '{"summary":"<Grok summary>","provider":"grok-4-vision"}' \
  | peepshow report annotate "<outputDir>"
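Hand-written JSON inside shell quotes breaks easily once a real summary contains quotes or apostrophes. A small sketch that serialises the payload with `JSON.stringify` instead is shown below. `buildAnnotation` is a hypothetical helper; it uses only the `summary` and `provider` fields that appear in the command above and assumes nothing else about what `peepshow report annotate` accepts.

```javascript
// Serialise a model summary into the JSON shape piped to `peepshow report annotate`.
// Only the two fields shown in the command above are used.
function buildAnnotation(summary, provider = "grok-4-vision") {
  if (typeof summary !== "string" || summary.trim() === "") {
    throw new TypeError("summary must be a non-empty string");
  }
  return JSON.stringify({ summary: summary.trim(), provider });
}

// A summary with both an apostrophe and double quotes survives intact.
const payload = buildAnnotation('Grok\'s take: the demo shows a "one-click" export.');
console.log(payload);
```

Saved as a script (say, a hypothetical annotate.js), its output can be piped straight into the annotate command in place of the echo.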

When to skip peepshow + use Grok direct

  • Source is already a single image.
  • Real-time trend extraction on X. peepshow works per clip, not on live streams.
  • Audio-only — Grok's transcript handling doesn't need frames.

For everything beyond those edge cases, peepshow is the bridge: video + animated formats + transcript → Grok reads them as images + text.