peepshow for every LLM
peepshow + your LLM
Native video, image-only vision, fully-offline local models — peepshow plugs in front of every multimodal LLM. Pick the one you use:
peepshow + Gemini
Gemini reads video natively, but peepshow sits in front for long footage, animated GIF/APNG/WebP, audit, audio split-out, and cross-model frame bundles.
> 1 min · animated formats · cost-capped · audit · multi-model
Anthropic
peepshow + Claude
Claude has no native video input. peepshow extracts scene frames + audio transcript so Opus / Sonnet / Haiku can reason about video as a timeline of stills.
any video · any animated format · any clip length · Claude Code first-class
OpenAI
peepshow + GPT-4o / GPT-5
GPT-4o / GPT-5 have no native video input. peepshow extracts scene frames + transcript so OpenAI's vision models can reason about video as a frame timeline.
any video · any animated format · Whisper API integration · Files API push
Self-hosted
peepshow + Local LLMs
Run peepshow + a local multimodal LLM (Ollama, LM Studio, llama.cpp) for fully offline video understanding. Llama 3.2 Vision, Qwen2.5-VL, Pixtral, LLaVA all supported.
offline · PII-sensitive · long clips · zero-budget · self-hosted
xAI
peepshow + Grok
Grok has no native video input. peepshow extracts scene frames + audio transcript so Grok 4 / 3 / 2 Vision can reason about video as a frame timeline.
any video · animated formats · X-native context · token-bounded
Mistral AI
peepshow + Mistral
Mistral has no native video input. peepshow extracts scene frames + transcript so Pixtral / Mistral Large 3 can reason about video as a frame timeline.
any video · EU-hosted · animated formats · open-weights too
Alibaba
peepshow + Qwen
Qwen3-VL 235B-A22B and 30B-A3B improve native video on the large size but still cap clip length. peepshow keeps every Qwen-VL size working at predictable cost.
small Qwen sizes · long video · token-bounded · vLLM / Ollama / DashScope
DeepSeek
peepshow + DeepSeek
DeepSeek-OCR (latest, supersedes VL2) has no native video input. peepshow extracts scene frames + transcript so DeepSeek's OCR-grade VLM reads video as a frame timeline.
any video · screen recordings · slides · OCR-heavy footage · open-weights
Cohere
peepshow + Cohere Command A Vision
Cohere Command A Vision has no native video input. peepshow extracts scene frames + transcript so its 20-image / 128K-ctx endpoint reads video as a frame timeline.
enterprise · OCR-heavy footage · 20-frame batches · 128K context · audit
Cohere
peepshow + Cohere Aya Vision
Cohere Aya Vision (multilingual research VLM, 23+ languages, image-only) has no native video input. peepshow extracts scene frames + transcript so it reads video as a frame timeline in any language.
multilingual · non-English OCR · 23+ languages · research · UI / document footage
NVIDIA
peepshow + NVIDIA Nemotron 3 Nano Omni
NVIDIA Nemotron 3 Nano Omni (30B MoE, vision + audio + text) has no native video container input. peepshow extracts frames + audio so it reads video as the multimodal bundle it expects.
long video · open-weights omni · NIM / OpenRouter · vision + audio in one call
Microsoft
peepshow + Phi-4 Multimodal
Microsoft Phi-4-multimodal / Phi-4-reasoning-vision have no native video. peepshow extracts frames + transcript so small Phi-4 VLMs handle video on edge / laptop / Azure AI Foundry.
edge · laptop · Jetson · Azure AI Foundry · NIM · 12GB VRAM and under
Reka AI
peepshow + Reka
Reka Core / Flash / Edge accept video + audio + image natively. peepshow caps cost on long clips, fixes animated formats, and keeps the artifact portable.
> 1 min · animated formats · cost-capped · audit · cross-model portability
IBM
peepshow + IBM Granite 4.1 Vision
IBM Granite 4.1 Vision (4B, document-grade, OpenAI-compatible on watsonx.ai) has no native video input. peepshow extracts scene frames + transcript so it reads video as a frame timeline.
regulated · watsonx.ai · document footage · 4B VRAM budget · open weights · audit
Google Research
peepshow + SigLIP 2
SigLIP 2 is an embedding model — frames in, vectors out. peepshow extracts the frame timeline so SigLIP can pre-index video before vectors reach Chroma / Qdrant / Pinecone.
vector search pre-index · zero-shot retrieval · open-weights embedder · 768-D frame vectors
How peepshow plugs into each
peepshow is one CLI. Every model gets the same extracted artifact — frames as JPEGs, transcript as text, metadata as JSON. The model-specific pages above cover token math, install snippets, frame-preset recommendations, and when to skip peepshow.
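As a rough illustration of consuming that artifact, here is a minimal Python sketch. The directory layout (`frames/*.jpg`, `transcript.txt`, `metadata.json`) and the helper names `load_artifact` and `batch_frames` are assumptions made for this example, not peepshow's documented output or API; check the actual output of your peepshow run before relying on them. The batching helper shows how a frame timeline might be split for an endpoint with a per-request image cap, such as the 20-image limit mentioned for Cohere Command A Vision.

```python
import json
from pathlib import Path
from typing import Iterator, List


def load_artifact(root: Path) -> dict:
    """Read frames, transcript, and metadata from an ASSUMED directory layout.

    The folder and file names below are hypothetical placeholders.
    """
    frames = sorted((root / "frames").glob("*.jpg"))  # hypothetical folder name
    transcript_path = root / "transcript.txt"          # hypothetical file name
    metadata_path = root / "metadata.json"             # hypothetical file name

    transcript = ""
    if transcript_path.exists():
        transcript = transcript_path.read_text(encoding="utf-8")

    metadata = {}
    if metadata_path.exists():
        metadata = json.loads(metadata_path.read_text(encoding="utf-8"))

    return {"frames": frames, "transcript": transcript, "metadata": metadata}


def batch_frames(frames: List[Path], max_images: int) -> Iterator[List[Path]]:
    """Yield consecutive slices of at most max_images frames, in timeline order."""
    if max_images < 1:
        raise ValueError("max_images must be at least 1")
    for start in range(0, len(frames), max_images):
        yield frames[start:start + max_images]
```

With 45 extracted frames and a cap of 20 images per request, `batch_frames` yields three batches of 20, 20, and 5 frames. Each batch can then be sent alongside the transcript text to whichever model you use.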