Reel #94 Lakehouse warehouse

peepshow sink / databricks

Databricks

Insert each run into a Databricks Delta table via the Statement Execution API.

Writes one row per peepshow run to a Delta table through a Databricks SQL warehouse, using `POST /api/2.0/sql/statements/`. Auto-creates the Delta table on first write; subsequent runs append. Works against AWS, Azure, and GCP workspaces.

drop · process · databricks

What it does

[Databricks](https://www.databricks.com/) is the unified lakehouse platform — Delta Lake storage plus a SQL warehouse on top. This sink writes one row per peepshow run to a Delta table via the [SQL Statement Execution API](https://docs.databricks.com/api/workspace/statementexecution) (`POST /api/2.0/sql/statements/`). Auth is a [personal access token (PAT)](https://docs.databricks.com/en/dev-tools/auth/pat.html) sent as a Bearer header — no JDBC driver, no SQL Connector for Python to install, no key-pair signing. The first write auto-creates the table with the standard peepshow schema (`run_id · title · frames · duration · transcript · thumbnail_url · strategy · tags · created_at`) using `CREATE TABLE IF NOT EXISTS ... USING DELTA`; subsequent runs append. The warehouse must already exist — `DATABRICKS_WAREHOUSE_ID` points the sink at it.
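
The request described above can be sketched roughly as follows. This is an illustration and not the sink's actual source: the column types, the `wait_timeout` value, the use of named parameter markers, and the helper names (`create_table_sql`, `insert_sql`, `build_request`) are all assumptions. Only the endpoint, the Bearer header, the column names, and the `CREATE TABLE IF NOT EXISTS ... USING DELTA` form come from the description.

```python
import json
import urllib.request

# Column names come from the documented schema; the SQL types are assumptions.
COLUMNS = {
    "run_id": "STRING",
    "title": "STRING",
    "frames": "INT",
    "duration": "DOUBLE",
    "transcript": "STRING",
    "thumbnail_url": "STRING",
    "strategy": "STRING",
    "tags": "STRING",
    "created_at": "STRING",
}


def create_table_sql(table: str) -> str:
    """Statement for the first write: create the Delta table if it is missing."""
    cols = ", ".join(f"{name} {sql_type}" for name, sql_type in COLUMNS.items())
    return f"CREATE TABLE IF NOT EXISTS {table} ({cols}) USING DELTA"


def insert_sql(table: str) -> str:
    """Append statement using named parameter markers (:run_id, :title, ...)."""
    names = ", ".join(COLUMNS)
    markers = ", ".join(f":{name}" for name in COLUMNS)
    return f"INSERT INTO {table} ({names}) VALUES ({markers})"


def build_request(base_url, token, warehouse_id, statement, row=None):
    """Build (but do not send) a POST to the Statement Execution API."""
    body = {
        "warehouse_id": warehouse_id,
        "statement": statement,
        "wait_timeout": "30s",  # assumed value; wait synchronously for the result
    }
    if row is not None:
        params = []
        for name, sql_type in COLUMNS.items():
            param = {"name": name, "type": sql_type}
            if row.get(name) is not None:
                # Parameter values travel as strings; an omitted value means NULL.
                param["value"] = str(row[name])
            params.append(param)
        body["parameters"] = params
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/2.0/sql/statements/",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Binding values through `parameters` instead of string interpolation keeps transcripts with quotes or newlines from breaking the statement.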

When to reach for it

  • Pipe peepshow runs into the same Databricks workspace your product analytics already lives in
  • Build a Databricks SQL dashboard or AI/BI Genie space over run history without an ETL layer
  • Hand a service-principal PAT to a CI job that records every QA video into a shared lakehouse

Install

npm i -g peepshow

Use it

DATABRICKS_URL="https://abc-123.cloud.databricks.com" \
DATABRICKS_TOKEN="$(< ~/.databricks-pat)" \
DATABRICKS_WAREHOUSE_ID="abcd1234efgh5678" \
peepshow ./demo.mp4 --sink databricks

Make it automatic

Register the sink once — every run fires it afterward. Scope by --when so it only runs for matching videos.

peepshow sinks add databricks
peepshow sinks add databricks --when extension=mp4,mov
peepshow sinks add databricks --when path=/Volumes/Work/

Configuration

  • DATABRICKS_URL Workspace URL, e.g. `https://abc-123.cloud.databricks.com` (AWS), `https://adb-…azuredatabricks.net` (Azure), or your GCP workspace host. required
  • DATABRICKS_TOKEN Personal access token (Bearer). Generate under User Settings → Developer → Access tokens. Use a service-principal PAT for CI. required
  • DATABRICKS_WAREHOUSE_ID SQL warehouse id — the compute that runs the statement. Copy from the SQL warehouse details page. required
  • DATABRICKS_CATALOG Unity Catalog catalog. Default `main`.
  • DATABRICKS_SCHEMA Schema (a.k.a. database) within the catalog. Default `default`.
  • DATABRICKS_TABLE Table name. Default `peepshow_runs`. Auto-created on first write.
  • PEEPSHOW_FRAME_BASE_URL When set, the first frame URL is written to the `thumbnail_url` column.
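
As a rough sketch of how these variables could resolve into a target table, the snippet below checks the three required values, applies the documented defaults, and builds a backtick-quoted three-part name. The helper names (`load_config`, `quote_ident`, `table_name`) are hypothetical, and the quoting step is an assumption about reasonable practice, not a statement of what the sink does internally.

```python
import os

REQUIRED = ("DATABRICKS_URL", "DATABRICKS_TOKEN", "DATABRICKS_WAREHOUSE_ID")
DEFAULTS = {
    "DATABRICKS_CATALOG": "main",
    "DATABRICKS_SCHEMA": "default",
    "DATABRICKS_TABLE": "peepshow_runs",
}


def load_config(env=None):
    """Read sink settings from a mapping (os.environ when none is given)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing required env vars: " + ", ".join(missing))
    cfg = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        # Unset or empty optional variables fall back to the documented default.
        cfg[name] = env.get(name) or default
    return cfg


def quote_ident(name):
    """Backtick-quote a SQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def table_name(cfg):
    """Fully qualified catalog.schema.table name for the configured target."""
    keys = ("DATABRICKS_CATALOG", "DATABRICKS_SCHEMA", "DATABRICKS_TABLE")
    return ".".join(quote_ident(cfg[key]) for key in keys)
```

With only the three required variables set, `table_name(load_config())` resolves to the `main` catalog, `default` schema, and `peepshow_runs` table.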

Use with an LLM agent

Every peepshow sink reads its config from env vars and receives a single JSON payload on stdin. An LLM agent (Claude Code, Cursor, Windsurf, Gemini, Codex) can drive the Databricks sink automatically once the following are in place:

  • the env vars below are exported in the agent's shell (or a project .env it can load),
  • the peepshow CLI is on PATH — install with npm i -g peepshow,
  • a peepshow auto-sink is registered for the run (optional but recommended — makes invocation zero-argument).

1. Set the environment

# Add to ~/.zshrc, ~/.bashrc, or a project .env the agent can load
export DATABRICKS_URL="..."
export DATABRICKS_TOKEN="..."
export DATABRICKS_WAREHOUSE_ID="..."

2. Register as an auto-sink

peepshow sinks add databricks
peepshow sinks add databricks --when extension=mp4,mov

3. Example LLM session

You → drop a .mov into Claude Code.

Claude → auto-invokes /peepshow:slides ./clip.mov. peepshow extracts frames + audio, and the Databricks sink inserts the run as a row in the configured Delta table. Claude replies with a summary and a link to the created record.

The transcript rides along in the payload whenever the audio pass transcribes successfully.

Write your own

A sink is any executable that reads the --emit json payload on stdin. Shell, Node, Python, Go — the spec's in docs/PLUGINS.md. Register persistent ones with peepshow sinks add-cmd 'your-command'.
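
A minimal custom sink in Python might look like the sketch below. The payload field names (`title`, `frames`) are assumptions made for illustration; docs/PLUGINS.md is the authority on the real payload shape. The function names are hypothetical too.

```python
import json
import sys


def summarize(raw_json: str) -> str:
    """Turn the JSON payload text into a one-line summary."""
    payload = json.loads(raw_json)
    # "title" and "frames" are assumed field names; check docs/PLUGINS.md.
    title = payload.get("title", "(untitled)")
    frames = payload.get("frames", [])
    count = len(frames) if isinstance(frames, list) else frames
    return f"{title}: {count} frame(s)"


def run_sink(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read the whole payload from stdin and write the summary to stdout."""
    stdout.write(summarize(stdin.read()) + "\n")

# As a standalone script, finish the file with: run_sink()
```

Saved as an executable file with the final `run_sink()` call added, a script like this could be registered through `peepshow sinks add-cmd` in the same way as any other command.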