Get per-speaker transcript segments with WhisperX or Deepgram

A transcript that says 'OK so the plan is X' tells you a lot less than one that says 'Alice: OK so the plan is X. Bob: And then we…'. peepshow's `--diarise` flag attaches per-segment speaker labels and a top-level speakers summary to the transcript. It runs locally through WhisperX when that is detected, or in the cloud through Deepgram or AssemblyAI.

Steps

Install peepshow + WhisperX (local) OR set a cloud API key
For local diarisation, install WhisperX with `pip install whisperx` and export a Hugging Face token. The cloud paths use your existing API keys.
```
npm install -g peepshow
pip install whisperx
export HF_TOKEN=hf_...  # required for pyannote model download
```
Run with --diarise
The WhisperX path is used if `whisperx` is on your PATH; cloud providers follow the `--transcribe` selection.
```
peepshow ./meeting.mp4 --diarise
```
Pick a cloud provider explicitly
`--diarise` is passed through as Deepgram's `diarize=true` query parameter or AssemblyAI's `speaker_labels: true` option.
```
peepshow ./meeting.mp4 --diarise --transcribe deepgram
peepshow ./meeting.mp4 --diarise --transcribe assemblyai
```
Cap speaker count
Setting an upper bound on the number of speakers helps the diariser converge faster.
```
peepshow ./call.mp4 --diarise --max-speakers 3
```
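
Because the local path depends on both `whisperx` being on PATH and `HF_TOKEN` being set, a quick pre-flight check can show which path to expect before a long run. The function below is a minimal sketch: `diarise_preflight` is a hypothetical helper, not part of peepshow, and its messages and return codes are illustrative assumptions.

```sh
# Hypothetical helper (not shipped with peepshow): report whether the
# local WhisperX diarisation path looks usable on this machine.
diarise_preflight() {
  # The local path is only considered when whisperx is on PATH.
  if ! command -v whisperx >/dev/null 2>&1; then
    echo "whisperx not on PATH: local path unavailable, use a cloud provider"
    return 1
  fi
  # pyannote models need a Hugging Face token on first download.
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "whisperx found, but HF_TOKEN is unset: pyannote download may fail"
    return 2
  fi
  echo "whisperx found and HF_TOKEN set: local path should be available"
}
```

Call `diarise_preflight` before `peepshow ./meeting.mp4 --diarise`. A non-zero return only means the local path is unlikely to work; the cloud commands above are unaffected.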

Why it works

Diarisation is the missing piece of meeting and interview transcription. WhisperX (local) ships pyannote.audio under the hood: it needs a Hugging Face token to download the pyannote models, then runs entirely offline. Deepgram and AssemblyAI both offer server-side diarisation behind a single flag. peepshow's `--diarise` routes through the existing transcription provider chain, so you opt in once and the right path fires based on what's available.

When it helps

Meetings / standups / interviews — know who said what without re-listening.
Podcast post-production — auto-tag segments by host.
Court / legal interviews where speaker attribution matters.
Customer-support call review — separate agent vs caller.

Pitfalls

WhisperX needs a Hugging Face token (`HF_TOKEN`) to download pyannote models on first run.
Providers don't agree on speaker ID formats: WhisperX uses `SPEAKER_00`, Deepgram uses numeric IDs, and AssemblyAI uses `A`/`B`/`C`. peepshow keeps each provider's native scheme, so labels change if you switch providers.
Diarisation accuracy drops on overlapping speech and very short utterances.
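
Because speaker labels arrive in the provider's native scheme, scripts that handle transcripts from more than one provider may want a single label format. The function below is a rough sketch of such a mapping, based only on the three ID styles listed above. `normalise_speaker` and the `speaker_<n>` output format are hypothetical, not something peepshow provides.

```sh
# Hypothetical helper: map a provider-native speaker ID to "speaker_<n>",
# counting from zero. Unrecognised IDs are echoed unchanged with status 1.
normalise_speaker() {
  id=$1
  case "$id" in
    SPEAKER_*)
      # WhisperX style, e.g. SPEAKER_00: drop the prefix and leading zeros.
      n=$(printf '%s' "${id#SPEAKER_}" | sed 's/^0*//')
      echo "speaker_${n:-0}"
      ;;
    [A-Z])
      # AssemblyAI style, e.g. A, B, C: convert the letter to its offset from A.
      code=$(printf '%d' "'$id")
      echo "speaker_$((code - 65))"
      ;;
    ''|*[!0-9]*)
      # Empty or not purely numeric: pass through untouched.
      echo "$id"
      return 1
      ;;
    *)
      # Deepgram style: already a plain number.
      echo "speaker_$id"
      ;;
  esac
}
```

This only unifies the label format. `speaker_0` from one provider is not guaranteed to be the same person as `speaker_0` from another, since each diariser assigns IDs independently.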
