Reel #86 Cloud data warehouse

peepshow sink / bigquery

BigQuery

Stream each run into a BigQuery table (Google's data warehouse) via the insertAll API.

Stream one row per peepshow run into a Google BigQuery table via the `tabledata.insertAll` REST endpoint. Uses an OAuth2 access token (Bearer) — service-account or user credentials both work.

drop · process · bigquery

What it does

[BigQuery](https://cloud.google.com/bigquery) is Google's petabyte-scale serverless data warehouse. This sink streams one row per peepshow run into a BigQuery table via the [`tabledata.insertAll`](https://cloud.google.com/bigquery/docs/reference/rest/v2/tabledata/insertAll) REST endpoint.

Auth is an OAuth2 access token sent as a Bearer header, typically generated with `gcloud auth print-access-token` (user credentials) or `gcloud auth print-access-token --impersonate-service-account=...` (service account). Token refresh is the caller's responsibility: the sink fails fast on a 401 so a CI job can retry cleanly with a fresh token.

The destination table must already exist. The expected schema, which maps directly onto a `CREATE TABLE` statement, is:

  • run_id STRING
  • title STRING
  • frames INT64
  • duration FLOAT64
  • transcript STRING
  • thumbnail_url STRING
  • strategy STRING
  • tags STRING
  • created_at TIMESTAMP
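The schema listed above can be turned into DDL mechanically. The following is a minimal sketch, not part of peepshow itself: the helper name `create_table_ddl` is hypothetical, the column names and types are copied from the schema above, and the default table name `peepshow_runs` comes from the Configuration section. It assumes every column is nullable, which is BigQuery's default when no `NOT NULL` constraint is given.

```python
# Column names and types copied from the schema documented above.
SCHEMA = [
    ("run_id", "STRING"),
    ("title", "STRING"),
    ("frames", "INT64"),
    ("duration", "FLOAT64"),
    ("transcript", "STRING"),
    ("thumbnail_url", "STRING"),
    ("strategy", "STRING"),
    ("tags", "STRING"),
    ("created_at", "TIMESTAMP"),
]


def create_table_ddl(project, dataset, table="peepshow_runs"):
    """Return a BigQuery CREATE TABLE statement for the peepshow schema."""
    columns = ",\n".join(f"  {name} {type_}" for name, type_ in SCHEMA)
    return (
        f"CREATE TABLE IF NOT EXISTS `{project}.{dataset}.{table}` (\n"
        f"{columns}\n"
        ");"
    )


if __name__ == "__main__":
    # Placeholder project and dataset ids; substitute your own.
    print(create_table_ddl("my-project", "peepshow"))
```

The printed statement can be pasted into the BigQuery console or passed to whatever SQL client you already use.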

When to reach for it

  • Pipe peepshow runs into the same BigQuery dataset your product analytics lives in
  • Build a Looker Studio (formerly Data Studio) dashboard over peepshow run history
  • Hand a service-account access token to a CI job that records every QA video into a shared warehouse

Install

npm i -g peepshow

Use it

BIGQUERY_PROJECT="my-project" \
BIGQUERY_DATASET="peepshow" \
BIGQUERY_ACCESS_TOKEN="$(gcloud auth print-access-token)" \
peepshow ./demo.mp4 --sink bigquery
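The command above mints the token once, so a long-lived job will eventually hit an expired token. Because token refresh is left to the caller, a CI wrapper might mint a fresh token before each attempt. The sketch below assumes the peepshow process exits non-zero when the sink fails (the docs say it fails fast on a 401 but do not state the exit code), and it retries on any failure, not only on a 401. The function names are hypothetical.

```python
import os
import subprocess


def mint_token():
    """Mint an access token with the gcloud command recommended above."""
    result = subprocess.run(
        ["gcloud", "auth", "print-access-token"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.strip()


def run_peepshow(video, token):
    """Run peepshow with the BigQuery sink and return its exit code."""
    env = dict(os.environ, BIGQUERY_ACCESS_TOKEN=token)
    return subprocess.run(["peepshow", video, "--sink", "bigquery"], env=env).returncode


def record_run(video, attempts=2, mint=mint_token, run=run_peepshow):
    """Try up to `attempts` times, minting a fresh token before each try.

    Assumption: a non-zero exit code means the sink failed.
    """
    for _ in range(attempts):
        if run(video, mint()) == 0:
            return True
    return False
```

`mint` and `run` are injectable so the retry loop can be exercised without gcloud or peepshow installed.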

Make it automatic

Register the sink once — every run fires it afterward. Scope by --when so it only runs for matching videos.

peepshow sinks add bigquery
peepshow sinks add bigquery --when extension=mp4,mov
peepshow sinks add bigquery --when path=/Volumes/Work/

Configuration

  • BIGQUERY_PROJECT (required): GCP project id that owns the BigQuery dataset.
  • BIGQUERY_DATASET (required): BigQuery dataset id (case-sensitive).
  • BIGQUERY_TABLE (optional): Table name within the dataset. Default `peepshow_runs`. Must exist with a compatible schema (see docs).
  • BIGQUERY_ACCESS_TOKEN (required): OAuth2 access token (Bearer). Generate with `gcloud auth print-access-token` or equivalent. Token refresh is the caller's responsibility.
  • PEEPSHOW_FRAME_BASE_URL (optional): When set, the URL of the first frame is written to the `thumbnail_url` field.
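To show how these variables map onto a `tabledata.insertAll` call, here is a rough sketch that builds the HTTP request without sending it. It is an illustration of the public REST endpoint, not the sink's actual source. The helper name `build_insert_request` is hypothetical, and using `run_id` as the `insertId` (BigQuery's best-effort de-duplication key) is an assumption about what a sensible client would do.

```python
import json
import urllib.request


def build_insert_request(env, row):
    """Build (but do not send) an insertAll request for a single row."""
    project = env["BIGQUERY_PROJECT"]
    dataset = env["BIGQUERY_DATASET"]
    table = env.get("BIGQUERY_TABLE", "peepshow_runs")  # documented default
    url = (
        "https://bigquery.googleapis.com/bigquery/v2"
        f"/projects/{project}/datasets/{dataset}/tables/{table}/insertAll"
    )
    # Assumption: run_id doubles as insertId for best-effort de-duplication.
    body = {"rows": [{"insertId": row["run_id"], "json": row}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {env['BIGQUERY_ACCESS_TOKEN']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. An expired token surfaces as an `HTTPError` with code 401, while row-level problems come back in the `insertErrors` field of an otherwise successful response, so a client needs to check both.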

Use with an LLM agent

Every peepshow sink reads its config from env vars and receives a single JSON payload on stdin. An LLM agent (Claude Code, Cursor, Windsurf, Gemini, Codex) can drive the BigQuery sink automatically when three things are true:

  • the env vars below are exported in the agent's shell (or a project .env it can load),
  • the peepshow CLI is on PATH — install with npm i -g peepshow,
  • a peepshow auto-sink is registered for the run (optional but recommended — makes invocation zero-argument).

1. Set the environment

# Add to ~/.zshrc, ~/.bashrc, or a project .env the agent can load
export BIGQUERY_PROJECT="..."
export BIGQUERY_DATASET="..."
export BIGQUERY_ACCESS_TOKEN="..."
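Before handing control to an agent, it can help to confirm that the required variables are exported and the CLI is on PATH. The following is a small preflight sketch under those assumptions. The function names are hypothetical and the script is not shipped with peepshow.

```python
import os
import shutil

# The three variables marked "required" in the Configuration section.
REQUIRED_VARS = ("BIGQUERY_PROJECT", "BIGQUERY_DATASET", "BIGQUERY_ACCESS_TOKEN")


def missing_vars(env):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def preflight(env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = [f"{name} is not set" for name in missing_vars(env)]
    if shutil.which("peepshow") is None:
        problems.append("peepshow CLI not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```

An agent can run this first and stop with a clear message instead of failing partway through a run.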

2. Register as an auto-sink

peepshow sinks add bigquery
peepshow sinks add bigquery --when extension=mp4,mov

3. Example LLM session

You → drop a .mov into Claude Code.

Claude → auto-invokes /peepshow:slides ./clip.mov. peepshow extracts frames + audio, and the BigQuery sink streams the run as one row into the configured table. Claude replies with a summary and a link to the created record.

The transcript rides along in the payload whenever the audio pass transcribes successfully.

Write your own

A sink is any executable that reads the --emit json payload on stdin. Shell, Node, Python, Go — the spec's in docs/PLUGINS.md. Register persistent ones with peepshow sinks add-cmd 'your-command'.
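As an illustration of the stdin contract described above, here is a minimal Python sink sketch. The payload field names `run_id` and `title` are assumptions borrowed from the BigQuery schema on this page. The authoritative payload shape is whatever docs/PLUGINS.md specifies, so check there before relying on any field.

```python
import json
import sys


def summarize(payload):
    """Format a one-line summary from the run payload.

    Assumption: the payload carries `run_id` and `title` keys.
    """
    run_id = payload.get("run_id", "<unknown>")
    title = payload.get("title", "<untitled>")
    return f"peepshow run {run_id}: {title}"


def main():
    """Read one JSON payload from stdin and print its summary."""
    raw = sys.stdin.read()
    if not raw.strip():
        print("no payload on stdin", file=sys.stderr)
        return
    print(summarize(json.loads(raw)))


if __name__ == "__main__":
    main()
```

Saved as, say, `my_sink.py` (a hypothetical filename), it could be registered with `peepshow sinks add-cmd 'python3 my_sink.py'`.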