peepshow/ sinks/ clickhouse

Reel #75 OLAP archive

peepshow sink / clickhouse

ClickHouseColumnar OLAP archive — one row per run, JSONEachRow over HTTP.

Insert each peepshow run into a ClickHouse table via the HTTP interface. Auto-creates the schema on first write; designed for high-cardinality run archives + ad-hoc analytics.

drop · process · clickhouse

What it does

[ClickHouse](https://clickhouse.com) is the column-store database of choice for petabyte-scale event analytics. This sink writes one row per peepshow run via the HTTP interface using `FORMAT JSONEachRow`, with a simple `MergeTree` schema (`run_id`, `title`, `frames`, `duration`, `transcript`, `thumbnail_url`, `created_at`) that auto-creates on the first run. Pair with `clickhouse-client` or any BI tool to slice run history by duration, transcript content, or container tags at interactive latencies.

When to reach for it

  • Centralise every video an agent has watched across thousands of runs — ClickHouse stays fast where Postgres slows down
  • Run aggregate analytics (avg frames per video, transcript word counts, runs per day) without ETL
  • Join run metadata against existing ClickHouse event streams for product analytics correlation

Install

npm i -g peepshow

Use it

CLICKHOUSE_URL="http://localhost:8123" \
peepshow ./demo.mp4 --sink clickhouse

Make it automatic

Register the sink once — every run fires it afterward. Scope by --when so it only runs for matching videos.

peepshow sinks add clickhouse
peepshow sinks add clickhouse --when extension=mp4,mov
peepshow sinks add clickhouse --when path=/Volumes/Work/

Configuration

  • CLICKHOUSE_URL Base URL of the ClickHouse HTTP interface, e.g. `http://localhost:8123`. required
  • CLICKHOUSE_USER Username. Default `default`.
  • CLICKHOUSE_PASSWORD Password. Sent via the `X-ClickHouse-Key` header.
  • CLICKHOUSE_DATABASE Database name. Default `default`.
  • CLICKHOUSE_TABLE Table name. Default `peepshow_runs`. Auto-created on first write.
  • PEEPSHOW_FRAME_BASE_URL When set, the first frame URL is written to the `thumbnail_url` column.

Use with an LLM agent

Every peepshow sink reads its config from env vars and receives a single JSON payload on stdin. An LLM agent (Claude Code, Cursor, Windsurf, Gemini, Codex) can drive the ClickHouse sink automatically when three things are true:

  • the env vars below are exported in the agent's shell (or a project .env it can load),
  • the peepshow CLI is on PATH — install with npm i -g peepshow,
  • a peepshow auto-sink is registered for the run (optional but recommended — makes invocation zero-argument).

1. Set the environment

# Add to ~/.zshrc, ~/.bashrc, or a project .env the agent can load
export CLICKHOUSE_URL="..."

2. Register as an auto-sink

peepshow sinks add clickhouse
peepshow sinks add clickhouse --when extension=mp4,mov

3. Example LLM session

You → drop a .mov into Claude Code.

Claude → auto-invokes /peepshow:slides ./clip.mov. peepshow extracts frames + audio, the ClickHouse sink forwards the run to the configured database. Claude replies with a summary and a link to the created record.

The transcript rides along in the payload whenever the audio pass transcribes successfully.

Write your own

A sink is any executable that reads the --emit json payload on stdin. Shell, Node, Python, Go — the spec's in docs/PLUGINS.md. Register persistent ones with peepshow sinks add-cmd 'your-command'.