IBM Granite 4.1 Vision has no native video input. peepshow bridges the gap.
IBM Granite 4.1 Vision reads images, not video. peepshow turns video and animated formats into the frame timeline the model already accepts.
- Only path to video on Granite. peepshow's frame timeline is how video reaches the model, in the same shape whether you call watsonx.ai or self-host.
- Tuned for charts, tables, slides. Scene-change extraction lands on the frames where on-screen document structure changes, which are exactly the inputs Granite was trained to extract from.
- OpenAI-compatible on watsonx.ai. Granite's hosted endpoint takes `image_url` content parts, so peepshow's JSON manifest drops in with a few lines of glue code.
- Open weights, regulated-industry friendly. Apache 2.0 weights on Hugging Face mean the same pipeline runs on-prem behind a vLLM deployment with zero data egress.
- 4B VRAM footprint. At FP16, Granite 4.1 Vision 4B needs about 8 GB for its weights alone (4B parameters × 2 bytes), so budget more than 8 GB of VRAM once activations and image tokens are added. peepshow trims frame count so even Granite Mini stays comfortable on a single L4 / 4090.
- Animated GIF / APNG / WebP. Granite's vision adapter reads JPEG / PNG. peepshow flattens animated formats into still frames, which is useful for animated UI walkthroughs and product demos.
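A model's weight memory follows directly from parameter count times bytes per parameter. The sketch below is a back-of-envelope check under that assumption only: it ignores activations, KV cache, and image tokens, so treat the result as a floor, not a fit guarantee. `weightMemoryGB` and the bytes-per-dtype table are illustrative helpers, not part of peepshow or any Granite tooling.

```javascript
// Back-of-envelope VRAM for model weights only (illustrative sketch).
// Activations, KV cache, and image tokens all add on top of this floor.
const BYTES_PER_PARAM = { fp32: 4, fp16: 2, bf16: 2, int8: 1 };

/** Weight memory in decimal GB (1e9 bytes) for a model of paramsBillions parameters. */
function weightMemoryGB(paramsBillions, dtype) {
  const bytes = BYTES_PER_PARAM[dtype];
  if (bytes === undefined) throw new Error(`unknown dtype: ${dtype}`);
  // (paramsBillions × 1e9 params) × bytes / 1e9 bytes-per-GB: the 1e9 factors cancel.
  return paramsBillions * bytes;
}

for (const dtype of ["fp16", "bf16", "int8"]) {
  console.log(`4B @ ${dtype}: ~${weightMemoryGB(4, dtype)} GB of weights`);
}
```

At FP16 the 4B model lands on 8 GB before any runtime overhead, which is why a 24 GB card such as an L4 or RTX 4090 leaves room for the frames.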
Token-cost math (worked examples)
| Clip | Native video upload | peepshow + IBM Granite 4.1 Vision (approx. input tokens) |
|---|---|---|
| 30s product demo | not supported | ~2.5K (6 frames + transcript) |
| 10-min slide deck recording | not supported | ~7K (20 scene frames + transcript) |
| 1-hour compliance training | not supported | ~14K (30 motion frames + sparse transcript) |
| 3-hour conference recording (chunked) | not supported | ~38K (60 scene frames + chaptered transcript) |
On watsonx.ai, Granite 4.1 Vision bills per input image (≈ 1.4K tokens per 768×768 frame) plus text context. Self-hosting the open weights turns cost into GPU time (VRAM-seconds) instead of tokens. On document footage, scene-change extraction usually selects far fewer frames than naïve fixed-fps sampling, so total image cost drops.
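Because billing is per image, frame count drives cost. A minimal sketch, assuming a flat per-frame rate (the ≈ 1.4K-per-768×768 figure quoted above) and a hypothetical 5-minute recording with a 12-frame scene-change budget, compares that budget against naïve 1 fps sampling. `fixedFpsFrames` and `imageTokens` are illustrative helpers, and transcript tokens are left out.

```javascript
// Back-of-envelope image-token budget: fixed-fps sampling vs a scene-change frame cap.
// Assumption: flat cost per frame; substitute the rate your deployment reports.

/** Frames produced by sampling a clip at a fixed rate. */
function fixedFpsFrames(durationSec, fps) {
  return Math.ceil(durationSec * fps);
}

/** Image tokens for a number of frames at a flat per-frame rate. */
function imageTokens(frames, tokensPerFrame) {
  return frames * tokensPerFrame;
}

// Hypothetical 5-minute screen recording.
const perFrame = 1400;                   // ≈ 1.4K tokens per 768×768 frame
const naive = fixedFpsFrames(5 * 60, 1); // 1 fps sampling
const scene = 12;                        // assumed scene-change budget

console.log(`fixed 1 fps : ${naive} frames, ~${imageTokens(naive, perFrame)} image tokens`);
console.log(`scene-change: ${scene} frames, ~${imageTokens(scene, perFrame)} image tokens`);
console.log(`reduction   : ${naive / scene}x fewer images`);
```

The reduction factor depends only on frame counts, so it holds even if your frames are billed at a different size than 768×768.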
Install (CLI)
```shell
npm install -g peepshow
# watsonx.ai (OpenAI-compatible):
export WATSONX_API_KEY=...
export WATSONX_PROJECT_ID=...
# OR self-host via vLLM (open weights on Hugging Face):
pip install vllm
python -m vllm.entrypoints.openai.api_server --model ibm-granite/granite-vision-3.2-2b
# Extract frames + transcript into a JSON manifest
peepshow ./demo.mp4 --emit json > run.json
```
Install (IBM Granite 4.1 Vision API directly, no CLI)
Calling the IBM Granite 4.1 Vision API from your own code? Run peepshow first, then feed the JSON manifest in as multimodal parts:
```shell
# Hand frames + transcript to Granite 4.1 Vision (OpenAI-compatible on watsonx.ai).
# Needs the openai npm package resolvable from this directory (npm install openai).
# --input-type=module lets the inline script use import and top-level await.
node --input-type=module -e '
import OpenAI from "openai";
import { readFileSync } from "node:fs";

// peepshow manifest: frames[].path on disk, optional transcript.text
const run = JSON.parse(readFileSync("run.json", "utf8"));

// One user turn: the instruction, every frame as a base64 data URL, then the transcript
const content = [
  { type: "text", text: "Extract the charts, tables, and any structured text from this clip." },
  ...run.frames.map(f => ({
    type: "image_url",
    image_url: { url: "data:image/jpeg;base64," + readFileSync(f.path).toString("base64") }
  })),
  { type: "text", text: "Transcript:\n" + (run.transcript?.text ?? "") }
];

// Self-hosted vLLM instead: baseURL http://localhost:8000/v1, any non-empty apiKey,
// and model ibm-granite/granite-vision-3.2-2b
const client = new OpenAI({
  apiKey: process.env.WATSONX_API_KEY,
  baseURL: "https://us-south.ml.cloud.ibm.com/ml/v1",
  defaultHeaders: { "X-Watsonx-Project-Id": process.env.WATSONX_PROJECT_ID }
});

const r = await client.chat.completions.create({
  model: "ibm/granite-vision-3-2-2b",
  messages: [{ role: "user", content }]
});
console.log(r.choices[0].message.content);
'
```
Animated GIF / APNG / WebP: peepshow's killer move on IBM Granite 4.1 Vision
Granite's vision adapter reads JPEG / PNG. Animated GIFs, APNGs, and animated WebPs must be flattened into still frames before a document VLM can read them. peepshow does this automatically, so animated product tours and screen-capture walkthroughs reach Granite as an ordered frame sequence rather than a single still.
```shell
peepshow ./meme.gif       # animated GIF → frame timeline
peepshow ./tutorial.apng  # animated PNG → frames
peepshow ./loop.webp      # animated WebP → frames
```
Frame strategy presets
peepshow picks scene-change frames by default. For IBM Granite 4.1 Vision specifically, these presets are worth knowing:
- `--strategy scene --max 16`: default for slide decks / training material; scene detection lands on slide transitions.
- `--strategy scene --max 24 --resize 1024`: document review / table extraction; larger 1024 px frames so Granite's OCR-grade vision keeps fine print.
- `--strategy scene --max 8 --resize 768`: Granite Mini on edge hardware; lean frame budget at native input resolution.
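Scene-change selection with a frame cap can be sketched generically: score how much each sampled frame differs from the previous one, keep frames whose score clears a threshold, and when more frames qualify than the cap allows, keep the strongest cuts in time order. This illustrates the general technique under assumed inputs (precomputed per-frame difference scores in [0, 1], a hypothetical `selectSceneFrames` helper and threshold). It is not peepshow's implementation, nor a definition of what `--max` does internally.

```javascript
// Generic scene-change frame selection with a cap (illustrative sketch).
// diffs[i] = visual difference between sampled frame i and frame i-1, in [0, 1]
// (e.g. mean absolute pixel difference); diffs[0] is ignored.
// Returns indices of selected frames in time order.
function selectSceneFrames(diffs, { threshold = 0.3, max = 16 } = {}) {
  // Frame 0 always opens the first scene, so it outranks every cut.
  const candidates = [{ index: 0, score: Infinity }];
  for (let i = 1; i < diffs.length; i++) {
    if (diffs[i] >= threshold) candidates.push({ index: i, score: diffs[i] });
  }
  // Over budget: keep the strongest cuts, then restore chronological order.
  return candidates
    .sort((a, b) => b.score - a.score)
    .slice(0, max)
    .map((c) => c.index)
    .sort((a, b) => a - b);
}

// Six-slide deck sampled at 1 fps: big jumps at slide transitions, small noise elsewhere.
const diffs = [0, 0.02, 0.01, 0.85, 0.03, 0.02, 0.9, 0.01, 0.6, 0.02, 0.75, 0.01, 0.4];
console.log(selectSceneFrames(diffs, { max: 16 })); // → [0, 3, 6, 8, 10, 12]
console.log(selectSceneFrames(diffs, { max: 3 }));  // → [0, 3, 6]
```

Lowering the cap keeps the opening frame plus the sharpest transitions, trading coverage for a smaller image budget.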
All 95 sinks still fire
Same CLI = same sinks. Push frames to SQLite, embed captions into Chroma, mirror to S3, drop a thumbnail in Slack, file a GitHub issue with the offending frame attached — all from one IBM Granite 4.1 Vision run. Browse the full sink catalogue →.
Report + LLM analysis loop
Every run also writes a self-contained report.html + manifest.json next to the frames (see the Report page). After IBM Granite 4.1 Vision reads the frames, pipe its analysis back with `peepshow report annotate`; it lands in the report, so whoever opens it next sees the model's understanding without re-running the prompt.
```shell
# Write the model summary back into this run's report
echo '{"summary":"<summary from IBM Granite 4.1 Vision>","provider":"ibm/granite-vision-3-2-2b"}' \
  | peepshow report annotate "<outputDir>"
```
When to skip peepshow and call IBM Granite 4.1 Vision directly
- Source is already a single image (a scanned form or one rendered PDF page): call Granite directly with that image.
- You're running Granite 4.1 Code or a text-only Granite 4.1 model: there is no vision input, so frames won't help.
- Need streaming OCR on a live document feed (peepshow is one-shot).
For everything beyond those edge cases, peepshow is the bridge: video + animated formats + transcript → IBM Granite 4.1 Vision reads them as images + text.