Reel #H-21 Who said what — labelled segments per speaker

Get per-speaker transcript segments with WhisperX or Deepgram

A transcript that says 'OK so the plan is X' tells you far less than one that says 'Alice: OK so the plan is X. Bob: And then we…'. peepshow's `--diarise` flag adds a speaker label to each transcript segment and a top-level speakers summary to the transcript. It runs locally through WhisperX when WhisperX is detected, or in the cloud through Deepgram or AssemblyAI.

Steps

  1. Install peepshow + WhisperX (local) OR set a cloud API key

    For local diarisation, run `pip install whisperx` and export a Hugging Face token. The cloud paths reuse the API keys you already have configured for transcription.

    npm install -g peepshow
    pip install whisperx
    export HF_TOKEN=hf_...  # required for pyannote model download
  2. Run with --diarise

    peepshow uses the WhisperX path when `whisperx` is on your PATH. To diarise with a cloud provider instead, select it with `--transcribe` (step 3).

    peepshow ./meeting.mp4 --diarise
  3. Pick a cloud provider explicitly

    peepshow passes the request through to Deepgram's `?diarize=true` query parameter or AssemblyAI's `speaker_labels: true` request option.

    peepshow ./meeting.mp4 --diarise --transcribe deepgram
    peepshow ./meeting.mp4 --diarise --transcribe assemblyai
  4. Cap speaker count

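    If you know how many people are on the recording, an upper bound narrows the diariser's clustering step. It typically converges faster and is less likely to split one voice into extra speakers.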

    peepshow ./call.mp4 --diarise --max-speakers 3
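Before a first local run, it can help to confirm which path peepshow is likely to take. The script below is a hypothetical preflight check, not a peepshow command: it tests only the two local requirements named on this page (`whisperx` on PATH and `HF_TOKEN` set for the first-run pyannote download). It does not inspect cloud API keys, since this page does not name their variables.

```shell
#!/usr/bin/env bash
# Hypothetical preflight for `peepshow --diarise` (not part of peepshow).
# Returns 0 only when the local WhisperX path looks ready: whisperx is on
# PATH and HF_TOKEN is set for the first-run pyannote model download.

diarise_preflight() {
  local status=0
  if command -v whisperx >/dev/null 2>&1; then
    echo "local: whisperx found at $(command -v whisperx)"
    if [ -n "${HF_TOKEN:-}" ]; then
      echo "local: HF_TOKEN is set"
    else
      echo "local: HF_TOKEN is not set; pyannote models cannot be downloaded on first run"
      status=1
    fi
  else
    echo "local: whisperx is not on PATH; pick a cloud provider with --transcribe"
    status=1
  fi
  return "$status"
}

diarise_preflight || echo "local diarisation is not ready"
```

A non-zero result does not mean `--diarise` will fail outright. It only means the local path is not fully set up, so a cloud provider selected with `--transcribe` is the alternative.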

Why it works

Diarisation is the missing piece of meeting and interview transcription. WhisperX (local) builds on pyannote.audio: it needs a Hugging Face token to download the pyannote models, then runs entirely offline. Deepgram and AssemblyAI both offer server-side diarisation behind a single request flag. peepshow's `--diarise` routes through the existing transcription provider chain, so you opt in once and the matching path runs based on what's available.
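To show what that single flag looks like at the provider level, here is a sketch of the raw requests that the step 3 pass-through corresponds to. Deepgram's `diarize=true` query parameter on `/v1/listen` and AssemblyAI's `speaker_labels` field on `/v2/transcript` come from those providers' public APIs. The helper names and the `DEEPGRAM_API_KEY`, `ASSEMBLYAI_API_KEY` and `AUDIO_URL` variables are illustrative assumptions, not what peepshow reads. The curl calls are left as comments so the script runs without network access.

```shell
#!/usr/bin/env bash
# Sketch of the provider-side diarisation switches. Helper names and the
# DEEPGRAM_API_KEY / ASSEMBLYAI_API_KEY / AUDIO_URL variables are hypothetical.

# Deepgram: diarisation is a query parameter on the pre-recorded endpoint.
deepgram_listen_url() {
  printf '%s\n' 'https://api.deepgram.com/v1/listen?diarize=true'
}

# AssemblyAI: diarisation is a boolean field in the transcript request body.
# $1 is a URL AssemblyAI can fetch (for example one returned by /v2/upload).
# Assumes $1 has no double quotes or backslashes: no JSON escaping is done.
assemblyai_transcript_body() {
  printf '{"audio_url": "%s", "speaker_labels": true}\n' "$1"
}

echo "Deepgram:   POST $(deepgram_listen_url)"
echo "AssemblyAI: POST https://api.assemblyai.com/v2/transcript"
echo "            body $(assemblyai_transcript_body 'https://example.com/meeting.mp4')"

# Real calls, commented out so this runs offline:
# curl -s -X POST "$(deepgram_listen_url)" \
#   -H "Authorization: Token $DEEPGRAM_API_KEY" \
#   -H "Content-Type: audio/mp4" \
#   --data-binary @./meeting.mp4
# curl -s -X POST https://api.assemblyai.com/v2/transcript \
#   -H "authorization: $ASSEMBLYAI_API_KEY" \
#   -H "content-type: application/json" \
#   -d "$(assemblyai_transcript_body "$AUDIO_URL")"
```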

When it helps

  • Meetings / standups / interviews — know who said what without re-listening.
  • Podcast post-production — auto-tag segments by host.
  • Court / legal interviews where speaker attribution matters.
  • Customer-support call review — separate agent vs caller.

Pitfalls

  • WhisperX needs a Hugging Face token (`HF_TOKEN`) to download pyannote models on first run.
  • Providers don't agree on speaker ID formats: WhisperX uses `SPEAKER_00`-style labels, Deepgram uses numeric IDs, and AssemblyAI uses `A`/`B`/`C`. peepshow keeps each provider's native scheme, so don't compare labels across providers.
  • Diarisation accuracy drops on overlapping speech and very short utterances.
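Because each provider returns its own label format, post-processing that handles more than one provider needs a common format first. The sketch below is a hypothetical helper, not a peepshow feature. It maps the three schemes listed above to zero-based `S<n>` labels, and the `S<n>` scheme is an arbitrary choice. It only aligns the format: diarisers assign IDs per recording, so `S0` from WhisperX and `S0` from Deepgram are not guaranteed to be the same person.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map native speaker IDs to one S<n> format.
# Aligns label FORMAT only; S0 from one provider or run is not
# necessarily the same person as S0 from another.

normalise_speaker() {
  local id=$1 n
  case "$id" in
    SPEAKER_*)  # WhisperX / pyannote style: SPEAKER_00, SPEAKER_01, ...
      n=${id#SPEAKER_} ;;
    [ABCDEFGHIJKLMNOPQRSTUVWXYZ])  # AssemblyAI style: single letter A, B, C, ...
      # printf "'X" yields the character code; 65 is ASCII 'A'.
      n=$(( $(printf '%d' "'$id") - 65 )) ;;
    *)  # Deepgram style: 0, 1, 2, ...
      n=$id ;;
  esac
  # Reject anything that did not reduce to a non-negative integer.
  case "$n" in
    ''|*[!0-9]*)
      echo "unrecognised speaker id: $id" >&2
      return 1 ;;
  esac
  # 10# forces base 10 so zero-padded values like "08" parse correctly.
  printf 'S%d\n' "$((10#$n))"
}

# Prints S0, S1, S0, S1, S0, S1 for these six IDs.
for id in SPEAKER_00 SPEAKER_01 0 1 A B; do
  printf '%-10s -> %s\n' "$id" "$(normalise_speaker "$id")"
done
```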
