Steps
- Install peepshow + yt-dlp + whisper.cpp
All three are stand-alone CLIs.
    npm install -g peepshow
    brew install yt-dlp whisper-cpp
- Download the YouTube clip
yt-dlp downloads the source video. Use `-f` to pick a small format for fast extraction. The file extension follows the selected format, so confirm the result is `clip.mp4` before the next step. Quote the URL so the shell does not treat `?` as a glob character.
    yt-dlp -f 'best[height<=480]' -o 'clip.%(ext)s' 'https://www.youtube.com/watch?v=...'
- Run peepshow
peepshow extracts scene-change frames and a whisper.cpp transcript automatically.
    peepshow ./clip.mp4 --emit json > run.json
- Feed to your LLM
Drag the frame folder into Claude / ChatGPT, or pipe the JSON to the API.
    peepshow ./clip.mp4 --sink obsidian # or --sink notion / --sink slack / etc.
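The download and extraction steps above can be chained in a small script. This is a minimal sketch, not part of peepshow: `missing_tools` and `clip_to_json` are hypothetical helper names, and the script assumes, as the steps do, that the selected format downloads as `clip.mp4`. It only reuses the yt-dlp and peepshow invocations already shown.

```shell
#!/usr/bin/env bash
# Sketch only: missing_tools and clip_to_json are hypothetical helper names.

# Print each named command that is not on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Download one URL and write peepshow's JSON output to a file (default: run.json).
clip_to_json() {
  local url="$1" out="${2:-run.json}" missing
  missing="$(missing_tools yt-dlp peepshow | tr '\n' ' ')"
  if [ -n "$missing" ]; then
    printf 'missing tools: %s\n' "$missing" >&2
    return 1
  fi
  # Same format selector and output template as in the steps above.
  yt-dlp -f 'best[height<=480]' -o 'clip.%(ext)s' "$url" || return 1
  # Assumption: the selected format was saved as clip.mp4.
  peepshow ./clip.mp4 --emit json > "$out"
}
```

After sourcing the file, `clip_to_json 'https://www.youtube.com/watch?v=...'` would run both steps and stop early if either tool is not installed.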
Why it works
YouTube videos are the canonical case for video → LLM. Native video upload to Gemini works but burns tokens proportional to clip length. peepshow's frame-extraction + whisper.cpp transcript bundle stays under a few thousand tokens for most clips — even hour-long lectures.
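To check the size of a given run rather than rely on a general figure, a rough estimate can be taken from the byte count of the JSON. The sketch below assumes the common rule of thumb of about 4 characters per token for English text. It is not an exact tokenizer, it covers only the text in the file, and `estimate_tokens` is a hypothetical helper name.

```shell
# Rough token estimate for a text file: byte count divided by 4, rounded up.
# The 4-bytes-per-token ratio is a rule of thumb for English text, not a tokenizer.
estimate_tokens() {
  local bytes
  # The arithmetic expansion also strips the padding some wc implementations emit.
  bytes=$(( $(wc -c < "$1") ))
  echo $(( (bytes + 3) / 4 ))
}
```

For example, `estimate_tokens run.json` prints an approximate token count for the file produced in the steps above.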
When it helps
- Lecture summaries and chapter generation.
- Tutorial walkthroughs where you need both visuals and dialogue.
- Compliance / review of long-form content (podcasts, conference talks).
Pitfalls
- YouTube rate-limits yt-dlp traffic, so don't bulk-extract without spacing requests (yt-dlp's `--sleep-interval` option can add the delay).
- Some YouTube videos are age-gated / region-locked; yt-dlp will refuse without cookies (pass them with `--cookies` or `--cookies-from-browser`).
- Live streams need a separate flow — yt-dlp doesn't tail by default.
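For the rate-limiting pitfall, one way to space requests is a loop that pauses between downloads. This is a minimal sketch under stated assumptions: `fetch_spaced`, `DOWNLOADER`, and `DELAY` are hypothetical names, the 10-second default is arbitrary, and the URL list is a plain text file with one URL per line. yt-dlp's own `--sleep-interval` and `--max-sleep-interval` options are an alternative when a single yt-dlp invocation handles the whole batch.

```shell
# Sketch: fetch URLs from a file one at a time, pausing between requests.
# fetch_spaced, DOWNLOADER and DELAY are hypothetical names for this example.
fetch_spaced() {
  local list="$1" delay="${DELAY:-10}" first=1 url
  while IFS= read -r url || [ -n "$url" ]; do
    # Skip blank lines and lines starting with '#'.
    case "$url" in ''|'#'*) continue ;; esac
    # Sleep before every download except the first.
    [ "$first" -eq 1 ] || sleep "$delay"
    first=0
    # </dev/null keeps the downloader from consuming the URL list on stdin.
    ${DOWNLOADER:-yt-dlp} -f 'best[height<=480]' -o '%(id)s.%(ext)s' "$url" </dev/null
  done < "$list"
}
```

Calling `DELAY=30 fetch_spaced urls.txt` would download each listed video with a 30-second gap, naming files by video ID so they do not overwrite one another.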