peepshow for every LLM

peepshow + your LLM

Native video, image-only vision, fully-offline local models — peepshow plugs in front of every multimodal LLM. Pick the one you use:

Google

peepshow + Gemini

Gemini reads video natively, but peepshow sits in front for long footage, animated GIF/APNG/WebP, audit, audio split-out, and cross-model frame bundles.

clips over 1 min · animated formats · cost-capped · audit · multi-model

Anthropic

peepshow + Claude

Claude has no native video input. peepshow extracts scene frames + audio transcript so Opus / Sonnet / Haiku can reason about video as a timeline of stills.

any video · any animated format · any clip length · Claude Code first-class

OpenAI

peepshow + GPT-4o / GPT-5

GPT-4o / GPT-5 have no native video input. peepshow extracts scene frames + transcript so OpenAI's vision models can reason about video as a frame timeline.

any video · any animated format · Whisper API integration · Files API push

Self-hosted

peepshow + Local LLMs

Run peepshow + a local multimodal LLM (Ollama, LM Studio, llama.cpp) for fully offline video understanding. Llama 3.2 Vision, Qwen2.5-VL, Pixtral, LLaVA all supported.

offline · PII-sensitive · long clips · zero-budget · self-hosted

xAI

peepshow + Grok

Grok has no native video input. peepshow extracts scene frames + audio transcript so Grok 4 / 3 / 2 Vision can reason about video as a frame timeline.

any video · animated formats · X-native context · token-bounded

Mistral AI

peepshow + Mistral

Mistral has no native video input. peepshow extracts scene frames + transcript so Pixtral / Mistral Large 3 can reason about video as a frame timeline.

any video · EU-hosted · animated formats · open-weights too

Alibaba

peepshow + Qwen

Qwen3-VL (235B-A22B and 30B-A3B) improves native video support on the larger size but still caps clip length. peepshow keeps every Qwen-VL size working at predictable cost.

small Qwen sizes · long video · token-bounded · vLLM / Ollama / DashScope

DeepSeek

peepshow + DeepSeek

DeepSeek-OCR (latest, supersedes VL2) has no native video input. peepshow extracts scene frames + transcript so DeepSeek's OCR-grade VLM reads video as a frame timeline.

any video · screen recordings · slides · OCR-heavy footage · open-weights

Cohere

peepshow + Cohere Command A Vision

Cohere Command A Vision has no native video input. peepshow extracts scene frames + transcript so its 20-image / 128K-ctx endpoint reads video as a frame timeline.

enterprise · OCR-heavy footage · 20-frame batches · 128K context · audit

Cohere

peepshow + Cohere Aya Vision

Cohere Aya Vision (multilingual research VLM, 23+ languages, image-only) has no native video input. peepshow extracts scene frames + transcript so it reads video as a frame timeline in any language.

multilingual · non-English OCR · 23+ languages · research · UI / document footage

NVIDIA

peepshow + NVIDIA Nemotron 3 Nano Omni

NVIDIA Nemotron 3 Nano Omni (30B MoE, vision + audio + text) has no native video container input. peepshow extracts frames + audio so it reads video as the multimodal bundle it expects.

long video · open-weights omni · NIM / OpenRouter · vision + audio in one call

Microsoft

peepshow + Phi-4 Multimodal

Microsoft Phi-4-multimodal / Phi-4-reasoning-vision have no native video input. peepshow extracts frames + transcript so small Phi-4 VLMs handle video on edge / laptop / Azure AI Foundry.

edge · laptop · Jetson · Azure AI Foundry · NIM · 12GB VRAM and under

Reka AI

peepshow + Reka

Reka Core / Flash / Edge accept video + audio + image natively. peepshow caps cost on long clips, fixes animated formats, and keeps the artifact portable.

clips over 1 min · animated formats · cost-capped · audit · cross-model portability

IBM

peepshow + IBM Granite 4.1 Vision

IBM Granite 4.1 Vision (4B, document-grade, OpenAI-compatible on watsonx.ai) has no native video input. peepshow extracts scene frames + transcript so it reads video as a frame timeline.

regulated · watsonx.ai · document footage · 4B VRAM budget · open weights · audit

Google Research

peepshow + SigLIP 2

SigLIP 2 is an embedding model — frames in, vectors out. peepshow extracts the frame timeline so SigLIP can pre-index video before vectors reach Chroma / Qdrant / Pinecone.

vector search pre-index · zero-shot retrieval · open-weights embedder · 768-D frame vectors

How peepshow plugs into each

peepshow is one CLI. Every model gets the same extracted artifact — frames as JPEGs, transcript as text, metadata as JSON. The model-specific pages above cover token math, install snippets, frame-preset recommendations, and when to skip peepshow.
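
As a rough illustration of what "the same extracted artifact" can mean downstream, here is a minimal Python sketch that reads a bundle of JPEG frames, a text transcript, and JSON metadata, then splits the frames into batches for endpoints with a per-request image cap (such as the 20-image limit mentioned for Cohere Command A Vision). The directory layout (`frames/*.jpg`, `transcript.txt`, `metadata.json`) and the names `Artifact`, `load_artifact`, and `batch_frames` are assumptions made for this example, not peepshow's documented output format or API.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Iterator


@dataclass
class Artifact:
    frames: list[Path]  # JPEG stills, sorted by filename
    transcript: str     # plain-text audio transcript
    metadata: dict      # parsed JSON metadata


def load_artifact(root: Path) -> Artifact:
    """Load a frame/transcript/metadata bundle from disk.

    ASSUMPTION: the layout below is hypothetical and only illustrates the
    "frames as JPEGs, transcript as text, metadata as JSON" idea.
    """
    frames = sorted((root / "frames").glob("*.jpg"))
    transcript = (root / "transcript.txt").read_text(encoding="utf-8")
    metadata = json.loads((root / "metadata.json").read_text(encoding="utf-8"))
    return Artifact(frames=frames, transcript=transcript, metadata=metadata)


def batch_frames(frames: list[Path], max_images: int) -> Iterator[list[Path]]:
    """Yield consecutive groups of at most max_images frames, in timeline order."""
    if max_images < 1:
        raise ValueError("max_images must be at least 1")
    for start in range(0, len(frames), max_images):
        yield frames[start:start + max_images]
```

With a 20-image cap, a 45-frame bundle would go out as three requests of 20, 20, and 5 frames. The transcript and metadata can accompany each request so every batch keeps its place on the timeline.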