NVIDIA Nemotron 3 Nano Omni has no native video input. peepshow bridges the gap.
Nemotron 3 Nano Omni's vision input is image-only. peepshow turns video and animated image formats into the frame timeline the model already accepts.
- Omni input shape, fed properly. Nemotron 3 Nano Omni accepts vision, audio and text together. peepshow's manifest carries the frames and the source audio path, so you can feed both in one call instead of running each modality separately.
- Open weights via Hugging Face. Run Nemotron 3 Nano Omni on-prem with NVIDIA NIM or Hugging Face transformers; peepshow's side of the pipeline is identical to the hosted route.
- OpenRouter ready. OpenRouter exposes Nemotron 3 Nano Omni through an OpenAI-compatible API, so peepshow's JSON drops in as standard `image_url` parts.
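- 30B MoE = low per-token compute. Only ~6B parameters are active per token, which keeps inference fast, but all 30B weights must still fit in memory, so a 24 GB card like the 4090 needs a quantized checkpoint. peepshow trims the frame count so an L40S / 4090 keeps up.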
- Animated GIF / APNG / WebP. The vision adapter expects still images; peepshow flattens animated formats into frames.
- Audio split-out is optional. Send the raw audio to Nemotron's omni input, or run peepshow's whisper.cpp pass to get a text transcript and save the tokens the raw audio would cost.
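Trimming the frame count ultimately means putting a hard cap on how many images go into one request. peepshow's own trimming logic isn't described on this page, so the helper below is a hypothetical sketch (the name `trimEvenly` is ours, not peepshow's): it caps an already-extracted frame list at `max` entries by keeping evenly spaced frames, with the first and last always included.

```javascript
// Hypothetical helper, NOT peepshow's internal selection logic:
// cap a frame list at `max` entries by keeping evenly spaced frames.
// The first and last frames are always kept when max >= 2.
function trimEvenly(frames, max) {
  if (max <= 0) return [];
  if (frames.length <= max) return frames.slice(); // already under the cap
  if (max === 1) return [frames[0]];
  // Spacing > 1 whenever frames.length > max, so rounded indices never repeat.
  const step = (frames.length - 1) / (max - 1);
  const picked = [];
  for (let i = 0; i < max; i++) picked.push(frames[Math.round(i * step)]);
  return picked;
}

// Ten frames capped at four: indices 0, 3, 6, 9.
console.log(trimEvenly([...Array(10).keys()], 4));
```

Applied to a manifest, `run.frames = trimEvenly(run.frames, 12)` before building the request keeps image tokens under a fixed ceiling. Uniform spacing ignores scene changes, so it is the crudest option, but it gives a guaranteed upper bound.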
Token-cost math (worked examples)
| Clip | Native video upload | peepshow + NVIDIA Nemotron 3 Nano Omni (approx. tokens) |
|---|---|---|
| 30s product demo (peepshow + audio) | not supported | ~5K (6 frames + raw audio + transcript) |
| 10-minute lecture (peepshow + audio) | not supported | ~16K (20 scene frames + audio + transcript) |
| 1-hour CCTV reel (peepshow, no raw audio) | not supported | ~10K (30 motion frames + sparse transcript) |
| 3-hour conference (peepshow + chunked audio) | not supported | ~40K (60 scene frames + chaptered audio + transcript) |
Nemotron 3 Nano Omni is open-weight: self-hosted cost is measured in GPU (VRAM) seconds, while hosted cost is per token on NIM or OpenRouter. Audio adds tokens, but typically fewer than covering the same content with extra frames.
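The figures in the table depend on per-frame and per-second token rates that this page does not state. A back-of-envelope estimator makes the arithmetic explicit; every rate in it is a placeholder assumption, not a published Nemotron figure, and should be replaced with numbers measured on your own deployment (for example from `r.usage.prompt_tokens` on a real response).

```javascript
// Back-of-envelope prompt-size estimate for one omni request.
// Every rate passed in is a PLACEHOLDER assumption, not a published Nemotron figure.
function estimateTokens({
  frames,                   // number of image parts
  tokensPerFrame,           // assumed tokens per image (depends on resolution)
  audioSeconds = 0,         // seconds of raw audio sent (0 = transcript only)
  tokensPerAudioSecond = 0, // assumed audio tokens per second
  transcriptTokens = 0,     // size of the whisper.cpp transcript
  promptTokens = 0,         // your instruction text
}) {
  const imageTokens = frames * tokensPerFrame;
  const audioTokens = Math.ceil(audioSeconds * tokensPerAudioSecond);
  const total = imageTokens + audioTokens + transcriptTokens + promptTokens;
  return { imageTokens, audioTokens, total };
}

// Example with made-up rates: 6 frames, 30 s of audio, short transcript.
const est = estimateTokens({
  frames: 6, tokensPerFrame: 256,
  audioSeconds: 30, tokensPerAudioSecond: 10,
  transcriptTokens: 100, promptTokens: 20,
});
console.log(est); // { imageTokens: 1536, audioTokens: 300, total: 1956 }
```

The structure of the sum stays the same whatever the real rates are; setting `audioSeconds` to 0 models the transcript-only route.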
Install (CLI)
```bash
npm install -g peepshow

# Self-host via NVIDIA NIM (Docker; NIM containers expect NGC_API_KEY in the environment):
docker run --gpus all -e NGC_API_KEY -p 8000:8000 \
  nvcr.io/nim/nvidia/nemotron-3-nano-omni:latest

# OR use OpenRouter:
export OPENROUTER_API_KEY=sk-or-...

# Either way, extract frames + the JSON manifest:
peepshow ./demo.mp4 --emit json > run.json
```

Install (NVIDIA Nemotron 3 Nano Omni API directly)
Calling the NVIDIA Nemotron 3 Nano Omni API from your own code? Run peepshow first, then feed the JSON manifest in as multimodal parts:
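```bash
# Hand frames + audio to Nemotron 3 Nano Omni (OpenAI-compatible via NIM / OpenRouter).
# Requires: npm install openai
# --input-type=module lets the inline script use import and top-level await.
node --input-type=module -e '
import OpenAI from "openai";
import { readFileSync } from "node:fs";

// Manifest written by: peepshow ./demo.mp4 --emit json > run.json
const run = JSON.parse(readFileSync("run.json", "utf8"));

const content = [
  { type: "text", text: "Summarise this clip using both the frames and the audio." },
  // One image part per extracted frame (assumes JPEG frames).
  ...run.frames.map(f => ({
    type: "image_url",
    image_url: { url: "data:image/jpeg;base64," + readFileSync(f.path).toString("base64") }
  })),
  // Raw audio part, only if the manifest has an audio track (assumes WAV).
  ...(run.audio ? [{
    type: "input_audio",
    input_audio: { data: readFileSync(run.audio.path).toString("base64"), format: "wav" }
  }] : []),
  // whisper.cpp transcript as a text fallback, if one was produced.
  { type: "text", text: "Transcript fallback:\n" + (run.transcript?.text ?? "") }
];

// OpenRouter shown here. For a self-hosted NIM, point baseURL at
// http://localhost:8000/v1 instead (the served model name may differ).
const client = new OpenAI({
  apiKey: process.env.OPENROUTER_API_KEY,
  baseURL: "https://openrouter.ai/api/v1",
});

const r = await client.chat.completions.create({
  model: "nvidia/nemotron-3-nano-omni",
  messages: [{ role: "user", content }]
});
console.log(r.choices[0].message.content);
'
```

Animated GIF / APNG / WebP: peepshow's killer move on NVIDIA Nemotron 3 Nano Omni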
Nemotron 3 Nano Omni's vision adapter reads JPEG / PNG stills. peepshow flattens animated GIF / APNG / WebP into a frame sequence, so animated content reaches the omni stack as ordinary still frames.
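Whether a file needs flattening at all comes down to its container structure. peepshow's own format detection isn't documented here; as a sketch of the underlying idea, the hypothetical `isAnimated` helper below (not a peepshow API) walks each container: an APNG carries an `acTL` chunk before its first `IDAT`, an animated WebP carries an `ANIM` chunk, and an animated GIF contains more than one image descriptor.

```javascript
// Hypothetical helper, NOT peepshow's implementation: report whether an
// image buffer is animated by walking its container structure.

// PNG: 8-byte signature, then chunks of [length u32 BE][type][data][CRC].
// An APNG carries an "acTL" chunk before its first IDAT.
function pngIsAnimated(buf) {
  let off = 8;
  while (off + 8 <= buf.length) {
    const len = buf.readUInt32BE(off);
    const type = buf.toString("latin1", off + 4, off + 8);
    if (type === "acTL") return true;
    if (type === "IDAT" || type === "IEND") return false;
    off += 12 + len;
  }
  return false;
}

// WebP: "RIFF" <size> "WEBP", then chunks of [FourCC][size u32 LE][data],
// each padded to an even length. Animated files contain an "ANIM" chunk.
function webpIsAnimated(buf) {
  let off = 12;
  while (off + 8 <= buf.length) {
    const fourcc = buf.toString("latin1", off, off + 4);
    const size = buf.readUInt32LE(off + 4);
    if (fourcc === "ANIM") return true;
    off += 8 + size + (size & 1);
  }
  return false;
}

// GIF: count image descriptors (0x2C) by skipping extensions and image data.
function gifFrameCount(buf) {
  let off = 13; // 6-byte header + 7-byte logical screen descriptor
  const flags = buf[10];
  if (flags & 0x80) off += 3 * (1 << ((flags & 0x07) + 1)); // global colour table
  const skipSubBlocks = () => {
    while (off < buf.length && buf[off] !== 0) off += buf[off] + 1;
    off += 1; // zero-length block terminator
  };
  let frames = 0;
  while (off < buf.length) {
    const marker = buf[off];
    if (marker === 0x3b) break; // trailer
    if (marker === 0x21) {
      off += 2; // extension introducer + label
      skipSubBlocks();
    } else if (marker === 0x2c) {
      frames += 1;
      const local = buf[off + 9]; // packed field of the image descriptor
      off += 10;
      if (local & 0x80) off += 3 * (1 << ((local & 0x07) + 1)); // local colour table
      off += 1; // LZW minimum code size
      skipSubBlocks();
    } else {
      break; // unexpected byte: stop rather than guess
    }
  }
  return frames;
}

function isAnimated(buf) {
  if (buf.length >= 8 && buf.readUInt32BE(0) === 0x89504e47) return pngIsAnimated(buf);
  if (buf.toString("latin1", 0, 4) === "RIFF" && buf.toString("latin1", 8, 12) === "WEBP") {
    return webpIsAnimated(buf);
  }
  if (buf.toString("latin1", 0, 3) === "GIF") return gifFrameCount(buf) > 1;
  return false; // JPEG and anything else: treat as a still image
}
```

Called as `isAnimated(readFileSync("./meme.gif"))`, it returns `true` only for inputs that would need flattening; static PNG and JPEG frames can go to the model unchanged.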
```bash
peepshow ./meme.gif       # animated GIF → frame timeline
peepshow ./tutorial.apng  # animated PNG → frames
peepshow ./loop.webp      # animated WebP → frames
```

Frame strategy presets
peepshow picks scene-change frames by default. For NVIDIA Nemotron 3 Nano Omni specifically, these presets are worth knowing:
| Preset | When to use |
|---|---|
| `--strategy scene --max 20` | Default. Leaves headroom for the audio modality in the same call. |
| `--strategy scene --max 30 --emit json` | Long static content (CCTV / lecture). Combine with the audio path for an omni call that still fits in VRAM. |
| `--strategy fps --fps 1 --max 24` | Steady-motion sport / gameplay with synced audio. |
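How peepshow scores scene changes isn't specified on this page. The sketch below shows the generic technique a `--strategy scene --max N` style selector implies: score each frame by its mean absolute pixel difference from the previous one, keep the first frame plus the `N - 1` largest changes above a threshold, and return them in playback order. `pickSceneFrames`, `meanAbsDiff` and the default threshold of 12 are hypothetical, and frames are assumed to be equal-length grayscale `Uint8Array`s.

```javascript
// Hypothetical scene-change picker, NOT peepshow's algorithm.
// Each frame is a grayscale Uint8Array; all frames have the same length.
function meanAbsDiff(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += Math.abs(a[i] - b[i]);
  return sum / a.length;
}

// Keep frame 0 plus the (max - 1) biggest changes above `threshold`,
// returned as indices in playback order.
function pickSceneFrames(frames, max, threshold = 12) {
  if (frames.length === 0 || max <= 0) return [];
  const candidates = [];
  for (let i = 1; i < frames.length; i++) {
    const score = meanAbsDiff(frames[i - 1], frames[i]);
    if (score > threshold) candidates.push({ index: i, score });
  }
  candidates.sort((x, y) => y.score - x.score); // largest change first
  const chosen = [0, ...candidates.slice(0, max - 1).map((c) => c.index)];
  return chosen.sort((x, y) => x - y);
}

// Two cuts (frame 2 and frame 4) in a five-frame clip.
const flat = (v) => new Uint8Array(16).fill(v);
console.log(pickSceneFrames([flat(0), flat(0), flat(200), flat(200), flat(50)], 3));
```

Lowering `max` drops the weakest cuts first, which is why a smaller cap leaves room for audio without losing the most visible scene changes.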
All 95 sinks still fire
Same CLI, same sinks: push frames to SQLite, embed captions into Chroma, mirror to S3, drop a thumbnail in Slack, or file a GitHub issue with the offending frame attached, all from one NVIDIA Nemotron 3 Nano Omni run. See the full sink catalogue for every option.
Report + LLM analysis loop
Every run also writes a self-contained report.html + manifest.json next to the frames (see the Report page). When NVIDIA Nemotron 3 Nano Omni consumes the frames, pipe its analysis back into the report with `peepshow report annotate`; whoever opens the report next sees the model's understanding without re-running the prompt.
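When the model call already happens in Node, the hand-off can skip shell quoting entirely. A minimal sketch, assuming `peepshow report annotate <outputDir>` reads a `{"summary", "provider"}` JSON payload on stdin, as in the `echo ... | peepshow report annotate` pipeline on this page; `buildAnnotation` and `annotateReport` are hypothetical helper names, not part of peepshow.

```javascript
const { spawn } = require("node:child_process");

// Build the annotation payload: the model summary plus the provider id.
// JSON.stringify escapes quotes, apostrophes and newlines for us.
function buildAnnotation(summary, provider = "nvidia/nemotron-3-nano-omni") {
  return JSON.stringify({ summary, provider });
}

// Pipe the payload into `peepshow report annotate <outputDir>` on stdin.
// Resolves on exit code 0, rejects on spawn errors or a non-zero exit.
function annotateReport(outputDir, summary) {
  return new Promise((resolve, reject) => {
    const child = spawn("peepshow", ["report", "annotate", outputDir], {
      stdio: ["pipe", "inherit", "inherit"],
    });
    child.on("error", reject);
    child.on("close", (code) =>
      code === 0 ? resolve() : reject(new Error(`peepshow exited with code ${code}`)),
    );
    child.stdin.end(buildAnnotation(summary));
  });
}
```

After the chat completion, `await annotateReport(outputDir, r.choices[0].message.content)` closes the loop. Building the payload with `JSON.stringify` avoids the classic failure of a hand-written `echo '...'` string, where an apostrophe in the summary ends the shell quote early.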
```bash
echo '{"summary":"<summary from NVIDIA Nemotron 3 Nano Omni>","provider":"nvidia/nemotron-3-nano-omni"}' \
  | peepshow report annotate "<outputDir>"
```

When to skip peepshow + use NVIDIA Nemotron 3 Nano Omni direct
- The source is already an image + audio pair: call Nemotron directly.
- You are running the text-only Nemotron 3 Nano (no Omni suffix), which has no vision input.
- You need realtime streaming omni inference; peepshow processes a file in one shot.
For everything beyond those edge cases, peepshow is the bridge: video and animated formats become frames, audio passes through or becomes a transcript, and NVIDIA Nemotron 3 Nano Omni reads the result as images, audio and text.