Cohere Command A Vision has no native video. peepshow bridges the gap.
Cohere Command A Vision accepts still images only. peepshow turns video and animated formats into the frame timeline the model already accepts.
- 20-image cap matches peepshow's defaults. Command A Vision accepts up to 20 images per request — peepshow's default frame budget lands inside that limit with room for prompt + transcript.
- OCR-grade vision needs OCR-grade frames. Scene-change extraction lands exactly on the frames where text or layout changes — slide decks, screen captures, document walkthroughs.
- OpenAI-compatible shape. Cohere's vision endpoint takes `image_url` parts. peepshow's JSON manifest drops in with one line of glue code.
- 128K context, predictable spend. The large context is for transcript + reasoning, not for paying per-second native-video bills. peepshow keeps the visual budget bounded.
- Animated GIF / APNG / WebP. Cohere reads still images. peepshow flattens animated formats into a JPEG sequence.
- Enterprise audit trail. Same auditable frame bundle peepshow gives every model — useful when compliance wants to see exactly what the LLM saw.
Token-cost math (worked examples)
| Clip | Native upload | peepshow + Cohere Command A Vision |
|---|---|---|
| 30s product demo (peepshow) | — | ~3K (6 frames + transcript) |
| 10-minute lecture (peepshow) | — | ~7K (20 frames — at the per-request cap — + transcript) |
| 1-hour CCTV reel (peepshow, chunked 3×) | — | ~22K (3 × 20-frame batches + sparse transcript) |
| 3-hour conference (peepshow + chunked) | — | ~64K (9 batches of 20 frames + chaptered transcript) — fits 128K ctx |
Cohere bills per image (high-detail) plus context. The numbers are approximate, estimated against published Command A Vision pricing tiers. The 20-image cap forces chunking on longer clips; peepshow does the chunking deterministically.
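The batch counts in the table follow directly from the 20-image cap: any clip that yields more than 20 frames has to be split across requests. A minimal sketch of that arithmetic, assuming the frames arrive as an ordered array. `chunkFrames` and `MAX_IMAGES_PER_REQUEST` are illustrative names, not part of peepshow's API, and peepshow's own chunking may work differently.

```javascript
// Illustrative only: chunkFrames is a hypothetical helper, not a peepshow API.
const MAX_IMAGES_PER_REQUEST = 20; // per-request image cap stated above

// Split an ordered frame list into consecutive batches of at most `size` items.
function chunkFrames(frames, size = MAX_IMAGES_PER_REQUEST) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < frames.length; i += size) {
    batches.push(frames.slice(i, i + size));
  }
  return batches;
}

// 60 frames (the 1-hour row) split into 3 batches of 20.
const sixty = Array.from({ length: 60 }, (_, i) => `frame-${i}.jpg`);
console.log(chunkFrames(sixty).length); // 3
```

Slicing consecutively keeps each batch in timeline order, so per-batch summaries can be concatenated without re-sorting.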
Install (CLI)
npm install -g peepshow
# Set Cohere credentials:
export COHERE_API_KEY=co-...
# Run extraction (defaults sit nicely inside the 20-image cap):
peepshow ./demo.mp4 --emit json > run.json
Install (Cohere Command A Vision API directly, no CLI)
Calling the Cohere Command A Vision API from your own code? Run peepshow first, then feed the JSON manifest in as multimodal parts:
# Hand frames + transcript to Cohere Command A Vision (OpenAI-compatible)
# Requires the openai package in the current directory: npm install openai
# --input-type=module lets node -e accept ESM imports and top-level await.
node --input-type=module -e '
import OpenAI from "openai";
import { readFileSync } from "node:fs";

// Manifest written by: peepshow ./demo.mp4 --emit json > run.json
const run = JSON.parse(readFileSync("run.json", "utf8"));

// Prompt first, then up to 20 frames as base64 data URLs, then the transcript.
const content = [
  { type: "text", text: "Summarise this clip; extract any visible text." },
  ...run.frames.slice(0, 20).map(f => ({
    type: "image_url",
    image_url: { url: "data:image/jpeg;base64," + readFileSync(f.path).toString("base64") }
  })),
  { type: "text", text: "Transcript:\n" + (run.transcript?.text ?? "") }
];

// Point the OpenAI client at the Cohere compatibility endpoint.
const client = new OpenAI({
  apiKey: process.env.COHERE_API_KEY,
  baseURL: "https://api.cohere.com/compatibility/v1",
});

const r = await client.chat.completions.create({
  model: "command-a-vision-07-2025",
  messages: [{ role: "user", content }]
});
console.log(r.choices[0].message.content);
'
Animated GIF / APNG / WebP: peepshow's killer move on Cohere Command A Vision
Command A Vision reads still images only. peepshow normalises animated GIF / APNG / WebP into a JPEG sequence, capped at 20 frames so the per-request limit is respected without manual pruning.
peepshow ./meme.gif # animated GIF → frame timeline
peepshow ./tutorial.apng # animated PNG → frames
peepshow ./loop.webp # animated WebP → frames
Frame strategy presets
peepshow picks scene-change frames by default. For Cohere Command A Vision specifically, these presets are worth knowing:
- `--strategy scene --max 20`: Default. Exactly at the Cohere per-request image cap. One call covers the whole clip.
- `--strategy scene --max 12 --dedup perceptual`: Static / talking-head footage. Drop near-duplicates so the 20-frame budget covers more meaningful change.
- `--strategy fps --fps 0.5 --max 20`: Steady-motion content. Predictable cadence, still inside the cap.
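When peepshow is driven from a script, it can help to keep the three presets in one lookup table so the flags are not retyped per run. A small sketch under that assumption: the flag values are copied from the presets above, while `PRESETS`, `peepshowArgs`, and the preset keys (`default`, `static`, `steady`) are hypothetical names chosen for illustration.

```javascript
// Illustrative only: PRESETS and peepshowArgs are hypothetical names.
// The flag values are copied from the presets listed above.
const PRESETS = new Map([
  ["default", ["--strategy", "scene", "--max", "20"]],
  ["static", ["--strategy", "scene", "--max", "12", "--dedup", "perceptual"]],
  ["steady", ["--strategy", "fps", "--fps", "0.5", "--max", "20"]],
]);

// Build the argument list for one run: input path first, then the preset flags.
function peepshowArgs(inputPath, preset = "default") {
  const flags = PRESETS.get(preset);
  if (!flags) {
    throw new Error(`unknown preset: ${preset}`);
  }
  return [inputPath, ...flags];
}

// Print the full command line for the talking-head preset.
console.log(["peepshow", ...peepshowArgs("./demo.mp4", "static")].join(" "));
```

Returning an array rather than a single string means the arguments can be passed to a process spawner without any shell quoting.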
All 95 sinks still fire
Same CLI = same sinks. Push frames to SQLite, embed captions into Chroma, mirror to S3, drop a thumbnail in Slack, file a GitHub issue with the offending frame attached, all from one Cohere Command A Vision run. Browse the full sink catalogue.
Report + LLM analysis loop
Every run also writes a self-contained report.html + manifest.json next to the frames (see the Report page). When Cohere Command A Vision consumes the frames, the analysis flows back into the report — whoever opens it next sees the model's understanding without re-running the prompt.
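Model output often contains quotes, apostrophes, and newlines, which are awkward to paste into a shell string by hand. A minimal sketch of building the annotation payload in code instead, assuming the two-field JSON shape (`summary`, `provider`) used by the annotate command on this page. `buildAnnotation` is a hypothetical helper, and whether peepshow accepts other fields is not covered here.

```javascript
// Illustrative only: buildAnnotation is a hypothetical helper.
// The "summary" and "provider" fields mirror the annotate example on this page.
function buildAnnotation(summary, provider = "command-a-vision-07-2025") {
  if (typeof summary !== "string" || summary.trim() === "") {
    throw new TypeError("summary must be a non-empty string");
  }
  // JSON.stringify escapes quotes and newlines, so model output stays valid JSON.
  return JSON.stringify({ summary: summary.trim(), provider });
}

// A summary containing both quote styles still serialises to one valid JSON line.
console.log(buildAnnotation('The presenter says "hello", then it\'s over.'));
```

The resulting string can be written to stdout and piped into the annotate command without any manual escaping.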
echo '{"summary":"<summary from Cohere Command A Vision>","provider":"command-a-vision-07-2025"}' \
| peepshow report annotate "<outputDir>"
When to skip peepshow + use Cohere Command A Vision directly
- Source is already a single document scan — call Cohere with the one image.
- Running Command R+ text-only — no vision capability.
- Need to exceed 20 images in one request (peepshow chunks instead).
For everything beyond those edge cases, peepshow is the bridge: video + animated formats + transcript → Cohere Command A Vision reads them as images + text.