Pixtral and Mistral Large 3 read images. peepshow extracts the frame timeline that lets them reason about video.

Mistral's vision models (Pixtral Large, Pixtral 12B, Mistral Large 3) accept images, not video. peepshow extracts scene-change frames plus a transcript, so any Mistral multimodal endpoint can read a video as the image sequence it already understands.

Mistral has no native video input. peepshow bridges the gap.

Mistral's vision is image-only. peepshow turns video + animated formats into the frame timeline Mistral already accepts.

  • Only path to video on Pixtral / Mistral Large 3. Mistral's vision API accepts image_url parts. peepshow turns video into that shape.
  • EU data residency. Mistral hosts in Europe. Pair with peepshow's local-first extraction to keep PII off non-EU clouds.
  • Open-weights variants. Pixtral 12B has open weights — run it on-prem with vLLM or llama.cpp. peepshow's pipeline doesn't change.
  • Animated GIF / APNG / WebP. Mistral vision treats these as a static image. peepshow extracts the full motion.
  • Predictable token cost. Cost is roughly N × the per-image vision price, plus the transcript text. peepshow lets you pick N (default ~20).
  • Same bundle on Le Chat / API / open weights. Extract once, feed any Mistral endpoint.

Token-cost math (worked examples)

Clip | Native upload | peepshow + Mistral
30s product demo | not supported | ~3K tokens (6 frames + transcript)
10-minute lecture | not supported | ~7K tokens (20 scene frames + transcript)
1-hour CCTV reel | not supported | ~12K tokens (30 motion frames + sparse transcript)
3-hour conference | not supported | ~32K tokens, chunked (60 scene frames + chaptered transcript)

Per-image cost varies by model: Pixtral Large uses ~1100 tokens for a 1024×1024 frame, and Pixtral 12B is cheaper.
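To see how frame count and resolution drive the totals, here is a rough estimator. It is a sketch under one stated assumption: image tokens follow a patch grid plus one break/end token per patch row, the scheme published for Pixtral 12B with 16px patches. The effective patch size for Pixtral Large or Mistral Large 3 is a parameter to check against Mistral's documentation, and models may downscale large images before tokenizing, so pass the size the model actually sees. `imageTokens` and `estimateRunTokens` are hypothetical helper names.

```javascript
// Rough prompt-size estimator for a peepshow run sent to a Pixtral-style model.
// ASSUMPTION: one token per image patch plus one break/end token per patch row
// (the Pixtral 12B scheme, 16px patches). Other models may use a different
// effective patch size; treat `patch` as a parameter to verify.

/** Tokens for one image of width x height pixels at a given patch size. */
function imageTokens(width, height, patch = 16) {
  const cols = Math.ceil(width / patch);
  const rows = Math.ceil(height / patch);
  return rows * cols + rows; // patch tokens + one break/end token per row
}

/** N same-size frames plus text (prompt + transcript) tokens. */
function estimateRunTokens({ frames, width, height, patch = 16, textTokens = 0 }) {
  return frames * imageTokens(width, height, patch) + textTokens;
}

console.log(imageTokens(1024, 1024)); // 4160
console.log(imageTokens(512, 512));   // 1056
console.log(estimateRunTokens({ frames: 6, width: 512, height: 512, textTokens: 500 })); // 6836
```

Under this formula, halving the frame edge cuts image tokens roughly fourfold, which is why `--resize` matters as much as `--max` when you budget a run.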

Install (CLI)

npm install -g peepshow

# Set Mistral credentials:
export MISTRAL_API_KEY=...

# Run extraction:
peepshow ./demo.mp4 --emit json > run.json

Use from your own code (Mistral API)

Calling the Mistral API from your own code? Run peepshow first, then pass the frames and transcript from its JSON output to the API as multimodal content parts:

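# Hand frames + transcript to Pixtral.
# Needs the SDK installed locally: npm install @mistralai/mistralai
# --input-type=module lets the inline script use import and top-level await.
node --input-type=module -e '
  import { Mistral } from "@mistralai/mistralai";
  import { readFileSync } from "node:fs";

  // run.json comes from: peepshow ./demo.mp4 --emit json > run.json
  const run = JSON.parse(readFileSync("run.json", "utf8"));

  // Instruction first, then one image part per extracted frame, then the transcript.
  const content = [
    { type: "text", text: "Summarise this clip." },
    ...run.frames.map(f => ({
      type: "image_url",
      // Each frame is sent inline as a base64 data URL (assumes JPEG frame files).
      imageUrl: "data:image/jpeg;base64," + readFileSync(f.path).toString("base64")
    })),
    { type: "text", text: "Transcript:\n" + (run.transcript?.text ?? "") }
  ];

  const client = new Mistral({ apiKey: process.env.MISTRAL_API_KEY });
  const r = await client.chat.complete({
    model: "pixtral-large-latest",
    messages: [{ role: "user", content }]
  });
  console.log(r.choices[0].message.content);
'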
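The 3-hour row in the cost table is sent chunked. One way to do that in your own code, sketched with a hypothetical `chunkFrames` helper: split `run.frames` into fixed-size batches and send each batch as its own request. The batch size here is an illustrative choice, not a peepshow or Mistral setting; check Mistral's current per-request image limits before picking one.

```javascript
// Hypothetical helper: split a frame list into batches so each Mistral
// request carries at most `size` images. Order is preserved.
function chunkFrames(frames, size) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < frames.length; i += size) {
    batches.push(frames.slice(i, i + size));
  }
  return batches;
}

// Example: 60 scene frames (the 3-hour conference row) in batches of 16.
const frames = Array.from({ length: 60 }, (_, i) => ({ path: `frame-${i}.jpg` }));
const batches = chunkFrames(frames, 16);
console.log(batches.map(b => b.length)); // [ 16, 16, 16, 12 ]
```

Each batch can reuse the content-building code above; one option is a final text-only request that merges the per-batch answers into a single summary.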

Animated GIF / APNG / WebP — peepshow's killer move on Mistral

Pixtral and Mistral Large 3 read animated images as a single frame. peepshow extracts the full motion sequence so the model sees what's actually happening.

peepshow ./meme.gif         # animated GIF → frame timeline
peepshow ./tutorial.apng    # animated PNG → frames
peepshow ./loop.webp        # animated WebP → frames
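Whether a GIF is animated at all is visible in its block structure: every image descriptor is one frame. The sketch below walks a GIF87a/GIF89a byte buffer with Node's built-in `Buffer` and counts frames, which is a cheap check before deciding to route a file through peepshow. It is an illustration of the GIF format, not peepshow's implementation; `countGifFrames` is a hypothetical name, and APNG and WebP use different containers that it does not handle.

```javascript
// Count frames in a GIF by walking its blocks (GIF87a/GIF89a layout).
function countGifFrames(buf) {
  const sig = buf.toString("ascii", 0, 6);
  if (sig !== "GIF87a" && sig !== "GIF89a") throw new Error("not a GIF");

  let pos = 13; // 6-byte header + 7-byte logical screen descriptor
  const screenFlags = buf[10];
  if (screenFlags & 0x80) pos += 3 * (1 << ((screenFlags & 0x07) + 1)); // global color table

  // Skip a chain of data sub-blocks (length byte + payload) ending in a 0 byte.
  const skipSubBlocks = () => {
    while (pos < buf.length && buf[pos] !== 0) pos += buf[pos] + 1;
    pos += 1; // block terminator
  };

  let frames = 0;
  while (pos < buf.length) {
    const introducer = buf[pos];
    if (introducer === 0x3b) break; // trailer: end of file
    if (introducer === 0x21) {
      // Extension block: introducer, label, then sub-blocks.
      pos += 2;
      skipSubBlocks();
    } else if (introducer === 0x2c) {
      // Image descriptor: one frame. 10 bytes, packed flags in the last byte.
      frames += 1;
      const imgFlags = buf[pos + 9];
      pos += 10;
      if (imgFlags & 0x80) pos += 3 * (1 << ((imgFlags & 0x07) + 1)); // local color table
      pos += 1; // LZW minimum code size
      skipSubBlocks(); // compressed image data
    } else {
      throw new Error(`unexpected block 0x${introducer.toString(16)} at ${pos}`);
    }
  }
  return frames;
}
```

Read a file with `fs.readFileSync("./meme.gif")` and pass the resulting Buffer in; any count above 1 means a single-image upload would drop motion.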

Frame strategy presets

peepshow picks scene-change frames by default. For Mistral specifically, these presets are worth knowing:

  • --strategy scene --max 16: recommended default for Pixtral Large; keeps cost lean.
  • --strategy scene --max 30 --resize 1024: for Pixtral 12B (cheaper per image); 30 frames at 1024px fit the budget.
  • --strategy fps --fps 0.5 --max 24: for steady-motion content; predictable cadence of one frame every 2 seconds.
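For intuition about what a scene strategy with a `--max` cap does, here is a minimal sketch of one common approach: compare each frame with the last kept frame by mean absolute luma difference, keep it when the difference crosses a threshold, then thin the result evenly if it exceeds the cap. This is an illustrative technique, not peepshow's actual algorithm; frames are assumed to be pre-decoded, same-size grayscale arrays, and `pickSceneFrames` plus its `threshold` default are hypothetical.

```javascript
// Mean absolute difference between two same-length grayscale frames (0-255).
function meanAbsDiff(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
  return sum / a.length;
}

// Return indices of frames that start a new "scene", capped at `max`.
function pickSceneFrames(frames, { threshold = 30, max = 16 } = {}) {
  if (frames.length === 0) return [];
  const kept = [0]; // always keep the first frame
  for (let i = 1; i < frames.length; i++) {
    const last = frames[kept[kept.length - 1]];
    if (meanAbsDiff(frames[i], last) >= threshold) kept.push(i);
  }
  if (kept.length <= max) return kept;
  // Too many cuts: subsample evenly so the cap holds and coverage spans the clip.
  const step = kept.length / max;
  return Array.from({ length: max }, (_, k) => kept[Math.floor(k * step)]);
}

const dark = new Uint8Array(64).fill(10);
const bright = new Uint8Array(64).fill(200);
const clip = [dark, dark, bright, bright, dark];
console.log(pickSceneFrames(clip)); // [ 0, 2, 4 ]
```

Comparing against the last kept frame, rather than the previous frame, stops a slow fade from being missed entirely. An fps strategy skips the comparison and samples on a fixed clock instead.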

All 95 sinks still fire

Same CLI = same sinks. Push frames to SQLite, embed captions into Chroma, mirror to S3, drop a thumbnail in Slack, file a GitHub issue with the offending frame attached, all from one Mistral run. See the full sink catalogue.

Report + LLM analysis loop

Every run also writes a self-contained report.html + manifest.json next to the frames (see the Report page). When Mistral consumes the frames, the analysis flows back into the report — whoever opens it next sees the model's understanding without re-running the prompt.

echo '{"summary":"<summary from Mistral>","provider":"pixtral-large-latest"}' \
  | peepshow report annotate "<outputDir>"
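Model output often contains apostrophes, quotes and newlines, which break a hand-written `echo '...'` payload. A sketch of the same step from Node: build the payload with `JSON.stringify` and hand it to the documented `peepshow report annotate` command on stdin. The field names mirror the echo example; `buildAnnotation` and `annotateReport` are hypothetical helpers, and the spawn function is injected (pass `spawnSync` from `node:child_process`) so the payload logic can be exercised without peepshow installed.

```javascript
// Build the annotation payload; JSON.stringify escapes quotes and newlines.
function buildAnnotation(summary, provider) {
  return JSON.stringify({ summary, provider });
}

// Run `peepshow report annotate <outputDir>` with the payload on stdin.
// `spawn` is expected to behave like node:child_process spawnSync.
function annotateReport(spawn, outputDir, summary, provider = "pixtral-large-latest") {
  const res = spawn("peepshow", ["report", "annotate", outputDir], {
    input: buildAnnotation(summary, provider), // written to the child's stdin
    encoding: "utf8",
  });
  if (res.error) throw res.error; // e.g. peepshow not on PATH
  if (res.status !== 0) throw new Error(`peepshow exited ${res.status}: ${res.stderr}`);
  return res.stdout;
}
```

With the Mistral response from the API example, a call would look like `annotateReport(spawnSync, outDir, r.choices[0].message.content)`.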

When to skip peepshow + use Mistral direct

  • Source is already a single image.
  • Running a text-only model (Mistral 7B, or Mistral Small before 3.1): no vision capability.
  • Need EU-only inference with no extraction step (rare).

For everything beyond those edge cases, peepshow is the bridge: video + animated formats + transcript → Mistral reads them as images + text.