npm - @revos/cli - Versions diffs - 0.2.0 → 0.2.2 - Mend

@revos/cli 0.2.0 → 0.2.2


Files changed (152)

package/dist/templates/skills/create-cubes/references/stripe-entities.md ADDED

@@ -0,0 +1,114 @@
+# Stripe Entities Reference
+## Table naming
+Airbyte syncs Stripe tables with a configurable prefix (default: `stripe_`).
+Inspect the BigQuery dataset to confirm:
+```sql
+SELECT table_name FROM `<dataset>.INFORMATION_SCHEMA.TABLES`
+WHERE table_name LIKE '%customers%' OR table_name LIKE '%invoices%'
+ORDER BY table_name LIMIT 20;
+```
+Throughout this document `<prefix>` is a placeholder for that prefix.
+---
+## Primary entities
+| Cube name               | BigQuery table          | PK   | Notes                            |
+| ----------------------- | ----------------------- | ---- | -------------------------------- |
+| `<prefix>customers`     | `<prefix>customers`     | `id` | `name` is display name           |
+| `<prefix>subscriptions` | `<prefix>subscriptions` | `id` | FK `customer` (→ customers.id)   |
+| `<prefix>invoices`      | `<prefix>invoices`      | `id` | FK `customer`, FK `subscription` |
+---
+## Relationship graph
+```
+customers ──< subscriptions ──< invoices
+    └────────────────────────< invoices
+```
+- customer → subscriptions: `one_to_many` via `subscriptions.customer = customers.id`
+- customer → invoices: `one_to_many` via `invoices.customer = customers.id`
+- subscription → invoices: `one_to_many` via `invoices.subscription = subscriptions.id`
+- subscription → latest_invoice: `many_to_one` via `subscriptions.latest_invoice = latest_invoice.id`
+---
+## Standard cube definitions
+### customers
+```yaml
+name: <prefix>customers
+sql_table: "`<dataset>.<prefix>customers`"
+joins:
+  <prefix>subscriptions:
+    relationship: one_to_many
+    sql: "${CUBE}.id = ${<prefix>subscriptions.customer}"
+  <prefix>invoices:
+    relationship: one_to_many
+    sql: "${CUBE}.id = ${<prefix>invoices.customer}"
+```
+### subscriptions
+```yaml
+name: <prefix>subscriptions
+sql_table: "`<dataset>.<prefix>subscriptions`"
+joins:
+  <prefix>customers:
+    relationship: many_to_one
+    sql: "${CUBE}.customer = ${<prefix>customers.id}"
+  <prefix>invoices:
+    relationship: one_to_many
+    sql: "${CUBE}.id = ${<prefix>invoices.subscription}"
+  <prefix>latest_invoice:
+    relationship: many_to_one
+    sql: "${CUBE}.latest_invoice = ${<prefix>latest_invoice.id}"
+```
+### invoices
+```yaml
+name: <prefix>invoices
+sql_table: "`<dataset>.<prefix>invoices`"
+joins:
+  <prefix>customers:
+    relationship: many_to_one
+    sql: "${CUBE}.customer = ${<prefix>customers.id}"
+  <prefix>subscriptions:
+    relationship: many_to_one
+    sql: "${CUBE}.subscription = ${<prefix>subscriptions.id}"
+```
+---
+## Special cube: latest_invoice
+`latest_invoice` is an alias for the `invoices` table (public: false) used
+exclusively for the `subscriptions.latest_invoice` FK join. Needed because
+Cube.js does not support two joins to the same table under the same cube name.
+```yaml
+name: <prefix>latest_invoice
+sql_table: "`<dataset>.<prefix>invoices`"
+public: false
+joins:
+  <prefix>subscriptions:
+    relationship: one_to_many
+    sql: "${CUBE}.id = ${<prefix>subscriptions.latest_invoice}"
+```
+---
+## Common pitfalls
+1. **`latest_invoice` must be a separate cube** — subscriptions needs both `invoices` (for all invoices) and `latest_invoice` (for the most recent one). Same physical table, different cube names.
+2. **FK column names without suffix** — `subscriptions.customer` is the raw Stripe customer ID (not `customer_id`). Same for `invoices.subscription` and `invoices.customer`. Check actual column names in INFORMATION_SCHEMA.
+3. **Stripe IDs are strings** — all IDs start with a prefix (`cus_`, `sub_`, `in_`, etc.). No casting needed.
+4. **Timestamps** — Stripe tables from Airbyte use `_airbyte_extracted_at` as the sync timestamp. Use it for `refresh_key`.
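The following is not part of the published package. It is a minimal sketch, assuming the join declarations above are transcribed by hand into a Python dict, of how one might check that every pair of cubes declares mutually inverse relationships (`one_to_many` on one side, `many_to_one` on the other). The names `JOINS`, `INVERSE`, and `mismatched_joins` are hypothetical.

```python
# Join declarations transcribed from the cube definitions above,
# with the <prefix> placeholder omitted for readability.
JOINS = {
    ("customers", "subscriptions"): "one_to_many",
    ("customers", "invoices"): "one_to_many",
    ("subscriptions", "customers"): "many_to_one",
    ("subscriptions", "invoices"): "one_to_many",
    ("subscriptions", "latest_invoice"): "many_to_one",
    ("invoices", "customers"): "many_to_one",
    ("invoices", "subscriptions"): "many_to_one",
    ("latest_invoice", "subscriptions"): "one_to_many",
}

INVERSE = {"one_to_many": "many_to_one", "many_to_one": "one_to_many"}


def mismatched_joins(joins):
    """Return (source, target) pairs whose reverse join is not the inverse relationship."""
    problems = []
    for (src, dst), relationship in joins.items():
        reverse = joins.get((dst, src))
        # A missing reverse join is allowed; only a contradictory one is reported.
        if reverse is not None and reverse != INVERSE[relationship]:
            problems.append((src, dst))
    return problems
```

Calling `mismatched_joins(JOINS)` on the declarations as written returns an empty list, which suggests the eight joins in this file are consistent with each other.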

package/dist/templates/skills/create-dbt-transformations/SKILL.md CHANGED

@@ -1,11 +1,11 @@
 ---
 name: create-dbt-transformations
-description: Create new dbt transformations (bronze/silver/gold models) in the RevOS dbt project. Use when asked to create a dbt model, build a transformation, add a new layer model, declare a raw source, or register a new Airbyte-ingested table. Covers dbt project conventions, sources, materialization, schema.yml, and validation commands.
+description: Create new dbt transformations (silver/gold models) in the RevOS dbt project. Use when asked to create a dbt model, build a transformation, add a new layer model, declare a raw source, or register a new raw table. Bronze is source-declarations only — no SQL files. Covers dbt project conventions, sources, materialization, schema.yml, and validation commands.
 ---
 # Create dbt Transformations
-Use this skill to generate SQL models, declare sources, update `schema.yml`, and validate models with `revos dbt run` / `revos dbt test`.
+Use this skill to generate SQL models, declare sources, update `schema.yml`, and validate models with `dbt run` / `dbt test`.
 For BigQuery exploration (listing datasets, inspecting raw tables, previewing rows, null rates), load the `explore-lakehouse` skill. If that skill is not installed, fall back to:
@@ -22,32 +22,35 @@ Warn the user: "The `explore-lakehouse` skill is not installed — using `bq sho
 ## Layer Conventions
 - **gold** — business-ready models exposed for reporting or downstream consumption.
-- **silver** — cleaned, deduplicated, type-conformed intermediates.
-- **bronze** — thin views over raw source data. References sources via `{{ source() }}`.
+- **silver** — cleaned, deduplicated, type-conformed intermediates. Lowest SQL layer; reads raw data via `{{ source('bronze', '<table>') }}`.
+- **bronze** — **not a SQL layer**. Holds only `dbt/models/bronze/schema.yml`, which declares raw tables as dbt sources. No `.sql` files belong under `dbt/models/bronze/`.
 When layer is not obvious from context, ask (see Checkpoint 1).
 ## Sources (bronze layer)
-Raw tables ingested by Airbyte are not dbt models. Declare them as dbt sources so bronze models can reference them with `{{ source() }}`.
+Raw tables loaded into the warehouse by your ingestion pipeline are not dbt models. Declare them as dbt sources so silver models can reference them with `{{ source() }}`.
 Sources are declared in `dbt/models/bronze/schema.yml` under a `sources:` block using `schema` (the BigQuery dataset):
 ```yaml
 sources:
-  - name: raw
+  - name: bronze
     schema: "{{ env_var('REVOS_BQ_DATASET') }}"
     tables:
       - name: hubspot_contacts
 ```
-Reference in bronze SQL:
+Reference in silver SQL:
 ```sql
-SELECT * FROM {{ source('raw', 'hubspot_contacts') }}
+-- dbt/models/silver/silver_hubspot_contacts.sql
+SELECT * FROM {{ source('bronze', 'hubspot_contacts') }}
 ```
-See [schema-conventions.md](references/schema-conventions.md) for the full declaration pattern alongside `models:`.
+`{{ source('bronze', 'hubspot_contacts') }}` resolves to `${REVOS_BQ_DATASET}.hubspot_contacts` — the same dataset where raw tables live — so silver has direct access without a bronze SQL view in between.
+See [schema-conventions.md](references/schema-conventions.md) for the full declaration pattern.
 ## Materialization
@@ -63,24 +66,24 @@ Materialized table lives at: `$REVOS_BQ_DATASET.<model_name>`
 **When to use `{{ ref() }}` vs. `{{ source() }}`:**
-| Context                             | Use                              |
-| ----------------------------------- | -------------------------------- |
-| dbt SQL → other dbt model           | `{{ ref('<model>') }}`           |
-| dbt SQL → raw source table (bronze) | `{{ source('raw', '<table>') }}` |
+| Context                                            | Use                                 |
+| -------------------------------------------------- | ----------------------------------- |
+| dbt SQL → other dbt model                          | `{{ ref('<model>') }}`              |
+| dbt SQL → raw table (silver reading from `bronze`) | `{{ source('bronze', '<table>') }}` |
+Silver is the lowest SQL layer — `{{ source('bronze', ...) }}` is used in silver only. Gold reads from silver via `{{ ref() }}`. There are no SQL files in `dbt/models/bronze/`.
 Always declare raw tables as sources before referencing them. Do not use bare fully qualified names — that bypasses dbt's dependency graph and source freshness tracking.
 ## Standard dbt Commands
-Always use the `revos` wrapper:
 ```bash
-revos dbt parse                               # validate syntax (no warehouse)
-revos dbt compile --select <model>            # resolve refs, produce compiled SQL
-revos dbt run --select <model>                # execute against warehouse
-revos dbt test --select <model>               # run tests
-revos dbt build --select <model>              # run + test
-revos dbt build --select path:models/<layer>  # entire layer
+dbt parse                               # validate syntax (no warehouse)
+dbt compile --select <model>            # resolve refs, produce compiled SQL
+dbt run --select <model>                # execute against warehouse
+dbt test --select <model>               # run tests
+dbt build --select <model>              # run + test
+dbt build --select path:models/<layer>  # entire layer
 ```
 ---
@@ -91,16 +94,16 @@ revos dbt build --select path:models/<layer>  # entire layer
 For each transformation (one at a time — do not batch):
-1. Determine the target layer (Checkpoint 1 if unclear).
+1. Determine the target layer — **silver** or **gold** only (Checkpoint 1 if unclear). Refuse bronze SQL models (see Checkpoint 4).
 2. Determine the model name.
 3. Check if that model already exists (Checkpoint 2 if yes).
 4. Gather source data and transformation logic. For bridge models, apply the bridge template ([sql-templates.md](references/sql-templates.md)).
-5. For bronze models: check if required sources are declared in `dbt/models/bronze/schema.yml`; add them if missing.
-6. Generate `dbt/models/<layer>/<model_name>.sql`.
+5. If the model reads raw data, ensure each raw table is declared under the `bronze` source in `dbt/models/bronze/schema.yml`; add it if missing.
+6. Generate `dbt/models/<silver|gold>/<model_name>.sql`. **Never** generate `.sql` files under `dbt/models/bronze/`.
 7. Detect the primary key (Checkpoint 3 if ambiguous).
 8. Add model entry to `dbt/models/<layer>/schema.yml` with PK and FK tests. See [schema-conventions.md](references/schema-conventions.md).
-9. Run `revos dbt run --select <model_name>` and report result.
-10. Run `revos dbt test --select <model_name>` and report result.
+9. Run `dbt run --select <model_name>` and report result.
+10. Run `dbt test --select <model_name>` and report result.
 11. Summarize (see Final Response Format).
 For multiple transformations in one request: repeat steps 1–11 per model in order.
@@ -117,8 +120,9 @@ Ask if the layer is not obvious:
 Which layer should this transformation live in?
 - gold: business-ready, exposed for reporting or downstream consumption
-- silver: cleaned/intermediate, shared across downstream uses
-- bronze: close-to-source view over raw data, references sources
+- silver: cleaned/intermediate, reads raw via `{{ source('bronze', ...) }}`
+(bronze is not a SQL layer — it only holds `schema.yml` source declarations.)
 ```
 Layer is obvious when the user explicitly names it.
@@ -150,6 +154,21 @@ I could not unambiguously detect the primary key. Candidates:
 Which column(s) should be the primary key?
 ```
+### Checkpoint 4: Bronze SQL Model Refused
+If the user explicitly asks to create a bronze SQL model:
+```text
+Bronze is not a SQL layer in this project — it only holds source
+declarations in `dbt/models/bronze/schema.yml`. Silver reads raw data
+directly via `{{ source('bronze', '<raw_table>') }}`.
+Would you like to create this as a silver model instead?
+```
+Do not generate any file under `dbt/models/bronze/` other than
+`schema.yml`.
 ---
 ## Primary Key Detection
@@ -169,12 +188,22 @@ If none produce a clear answer → Checkpoint 3.
 A column is a FK candidate if it matches `<entity>_id` where `<entity>` ≠ model's own entity, is not part of the PK, and is not nullable by design. Add `not_null` test only (no `relationships` tests by default).
+## Timestamp Column Propagation (Gold Models)
+Every gold model **must** propagate at least one timestamp column so downstream cubes can use SQL-based `refresh_key` (see `create-cubes` skill). Priority:
+1. An ingestion-time column on the raw table (e.g. Airbyte writes `_airbyte_extracted_at`) — propagate when present.
+2. `updated_at` / `modified_at` — CDC-friendly streams.
+3. `created_at` — insert-only fact tables.
+If the upstream source has none of these, document it in a SQL comment: `-- no timestamp column available from source`.
 ## SQL File Generation
 See [sql-templates.md](references/sql-templates.md) for:
-- Bronze model template using `{{ source() }}`
-- Standard silver/gold model template
+- Standard silver model template (reads raw via `{{ source('bronze', ...) }}`)
+- Standard gold model template (reads silver via `{{ ref() }}`)
 - Bridge model (JSON array) template with concrete example
 - Bridge model naming convention and SQL content rules
@@ -193,7 +222,7 @@ See [edge-cases.md](references/edge-cases.md) for: missing SQL details, missing
 ```text
 Created dbt transformation: <model_name>
-Layer:           <bronze | silver | gold>
+Layer:           <silver | gold>
 File:            dbt/models/<layer>/<model_name>.sql
 Materialization: <inherited: table | overridden: <type>>
 Primary key:     <pk_column>  (or composite: <col_1>, <col_2>)
@@ -206,8 +235,8 @@ Tests:
 - not_null on <fk>: added
 Validation:
-- revos dbt run:  passed | failed
-- revos dbt test: passed | failed
+- dbt run:  passed | failed
+- dbt test: passed | failed
 Physical table after run:
 `<resolved_dataset>.<model_name>`
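The following is not part of the published package. The timestamp priority added under "Timestamp Column Propagation (Gold Models)" can be sketched as a small lookup, assuming the upstream column names are already known (for example from `INFORMATION_SCHEMA`). `_airbyte_extracted_at` stands in for the ingestion-time column; other loaders may use a different name, and `pick_timestamp_column` is a hypothetical helper.

```python
from typing import Iterable, Optional

# Priority order from the skill text: ingestion-time column first,
# then updated_at / modified_at, then created_at.
# "_airbyte_extracted_at" is the Airbyte example; adjust for other loaders.
TIMESTAMP_PRIORITY = (
    "_airbyte_extracted_at",
    "updated_at",
    "modified_at",
    "created_at",
)


def pick_timestamp_column(columns: Iterable[str]) -> Optional[str]:
    """Return the highest-priority timestamp column present, or None if there is none."""
    available = set(columns)
    for candidate in TIMESTAMP_PRIORITY:
        if candidate in available:
            return candidate
    return None
```

A `None` result corresponds to the case where the skill asks for the SQL comment `-- no timestamp column available from source`.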

package/dist/templates/skills/create-dbt-transformations/references/edge-cases.md CHANGED

@@ -26,15 +26,33 @@ The transformation you described references `<missing_model>`, which does not
 exist in dbt/models/. Should I create that model first?
 ```
-## Source is a raw Airbyte table not yet declared as a dbt source
+## Source is a raw table not yet declared as a dbt source
-Declare it as a source in `dbt/models/bronze/schema.yml` first (see [schema-conventions.md](schema-conventions.md)), then reference it with `{{ source('raw', '<table>') }}` in the bronze model SQL. Do not use fully qualified BigQuery names directly — that bypasses dbt's dependency graph and source freshness tracking.
+Declare it under `sources: - name: bronze` in `dbt/models/bronze/schema.yml`
+first (see [schema-conventions.md](schema-conventions.md)), then reference it
+with `{{ source('bronze', '<table>') }}` in the silver model SQL. Do not use
+fully qualified BigQuery names directly — that bypasses dbt's dependency
+graph and source freshness tracking.
+## User asks to create a bronze SQL model
+Refuse and redirect:
+```text
+Bronze is not a SQL layer in this project — `dbt/models/bronze/` only
+contains `schema.yml` declaring raw tables as sources. Silver reads raw
+data directly via `{{ source('bronze', '<raw_table>') }}`.
+Should I create this as a silver model instead?
+```
+Do not generate any file under `dbt/models/bronze/` other than `schema.yml`.
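The following is not part of the published package. The rule above (only `schema.yml` may exist under `dbt/models/bronze/`) could be enforced with a path check before any file is written. This is a sketch under that assumption; `is_allowed_model_file` is a hypothetical name.

```python
from pathlib import PurePosixPath

BRONZE_DIR = PurePosixPath("dbt/models/bronze")


def is_allowed_model_file(path: str) -> bool:
    """Reject any file under dbt/models/bronze/ except schema.yml itself."""
    candidate = PurePosixPath(path)
    if BRONZE_DIR in candidate.parents:
        # Inside the bronze directory: only the source declarations file is allowed.
        return candidate == BRONZE_DIR / "schema.yml"
    # Paths outside bronze (silver, gold) are not restricted by this rule.
    return True
```

For example, `is_allowed_model_file("dbt/models/bronze/bronze_hubspot_contacts.sql")` is `False`, which is the situation where the skill redirects the user to a silver model.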
 ## run fails
 1. Show the error verbatim — do not paraphrase warehouse errors.
 2. Offer to fix the SQL based on the error message.
-3. Do not proceed to `revos dbt test` until run succeeds.
+3. Do not proceed to `dbt test` until run succeeds.
 ## test fails

package/dist/templates/skills/create-dbt-transformations/references/schema-conventions.md CHANGED

@@ -10,9 +10,14 @@
 ---
-Each layer has one shared `schema.yml` at `dbt/models/<layer>/schema.yml`. Append new models; do not create per-model files.
+Each SQL layer (silver, gold) has one shared `schema.yml` at
+`dbt/models/<layer>/schema.yml`. Append new models; do not create per-model
+files.
-If the file does not exist, create it with:
+The bronze directory is **not** a SQL layer — its `schema.yml` contains only
+source declarations, no `models:` block.
+If a layer's `schema.yml` does not exist, create it with:
 ```yaml
 version: 2
@@ -22,7 +27,9 @@ models:
 ## Declaring Sources (bronze layer)
-Raw tables must be declared as dbt sources before they can be referenced with `{{ source() }}`. Sources live in `dbt/models/bronze/schema.yml` under a `sources:` block alongside the `models:` block.
+`dbt/models/bronze/schema.yml` is the only file in `dbt/models/bronze/`. It
+declares raw tables as dbt sources so that silver models can reference them
+with `{{ source('bronze', '<table>') }}`.
 `schema` maps to the BigQuery dataset (`REVOS_BQ_DATASET`):
@@ -30,23 +37,30 @@ Raw tables must be declared as dbt sources before they can be referenced with `{
 version: 2
 sources:
-  - name: raw
+  - name: bronze
     schema: "{{ env_var('REVOS_BQ_DATASET') }}"
     tables:
       - name: hubspot_contacts
       - name: hubspot_deals
       - name: stripe_charges
+```
+The corresponding silver model entry lives in `dbt/models/silver/schema.yml`:
+```yaml
+version: 2
 models:
-  - name: bronze_hubspot_contacts
+  - name: silver_hubspot_contacts
     ...
 ```
 Rules:
-- Use `raw` as the source name for all Airbyte-ingested tables.
-- Each raw table referenced in bronze SQL needs a corresponding entry under `tables:`.
+- Use `bronze` as the source name for all raw tables.
+- Each raw table referenced in silver SQL needs a corresponding entry under `tables:`.
 - If the source block already exists, append to the `tables:` list only.
+- Do **not** add a `models:` block to `dbt/models/bronze/schema.yml` — bronze contains source declarations only.
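The following is not part of the published package. The rules above can be sketched against an already-parsed `schema.yml`, assuming it has been loaded into a plain Python dict by whatever YAML library the project uses. `add_bronze_table` is a hypothetical helper: it appends to the existing `bronze` source's `tables:` list, creates the source block only when it is missing, and never adds a `models:` key.

```python
def add_bronze_table(schema: dict, table: str) -> dict:
    """Register a raw table under the `bronze` source, without duplicating entries."""
    sources = schema.setdefault("sources", [])
    bronze = next((s for s in sources if s.get("name") == "bronze"), None)
    if bronze is None:
        # Source block does not exist yet: create it with the dataset env var
        # shown in the declaration pattern above.
        bronze = {
            "name": "bronze",
            "schema": "{{ env_var('REVOS_BQ_DATASET') }}",
            "tables": [],
        }
        sources.append(bronze)
    tables = bronze.setdefault("tables", [])
    if all(entry.get("name") != table for entry in tables):
        tables.append({"name": table})
    return schema
```

Calling it twice with the same table name leaves a single entry, so it is safe to run before every silver model that reads raw data.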
 ## Standard Model Entry

package/dist/templates/skills/create-dbt-transformations/references/sql-templates.md CHANGED

@@ -1,61 +1,74 @@
 # SQL Templates
-## Bronze Model (source reference)
+## Bronze: no SQL
-Bronze models read raw Airbyte-ingested tables via `{{ source() }}`:
+Bronze is **not** a SQL layer. `dbt/models/bronze/` contains only `schema.yml`,
+which declares raw tables as dbt sources. See
+[schema-conventions.md](schema-conventions.md). Silver models read raw data
+directly via `{{ source('bronze', '<raw_table_name>') }}`.
+## Silver Model (reads raw via source)
 ```sql
 SELECT
   <pk_column>,
   <business_columns>,
-  _airbyte_extracted_at
-FROM {{ source('raw', '<raw_table_name>') }}
+  <ingestion_timestamp_column>
+FROM {{ source('bronze', '<raw_table_name>') }}
+WHERE <filtering_conditions>
 ```
-Ensure the source is declared in `dbt/models/bronze/schema.yml` before using it (see `references/schema-conventions.md`).
+The raw table must be declared under `sources: - name: bronze` in
+`dbt/models/bronze/schema.yml` first (see
+[schema-conventions.md](schema-conventions.md)).
-## Standard Silver / Gold Model
+## Gold Model (reads silver via ref)
 ```sql
 SELECT
   <pk_column>,
   <business_columns>,
-  _airbyte_extracted_at
-FROM {{ ref('<source_model>') }}
+  <ingestion_timestamp_column>
+FROM {{ ref('<silver_model>') }}
 WHERE <filtering_conditions>
 ```
 ## Bridge Model (JSON Array)
-When unpacking a JSON array into a many-to-many bridge table:
+When unpacking a JSON array into a many-to-many bridge table, read from the
+silver model that owns the array column:
 ```sql
 SELECT DISTINCT
   d.id              AS <entity_a>_id,
   <entity_b>_id,
-  d._airbyte_extracted_at
-FROM {{ ref('<source_model>') }} d,
+  d.<ingestion_timestamp_column>
+FROM {{ ref('<silver_source_model>') }} d,
 UNNEST(JSON_VALUE_ARRAY(d.<json_array_column>)) AS <entity_b>_id
 WHERE d.<json_array_column> IS NOT NULL
 ```
-Concrete example (`gold_deals_companies.sql`, unpacking `companies` array on `hubspot_deals`):
+Concrete example (`gold_deals_companies.sql`, unpacking the `companies` array
+on `silver_hubspot_deals`):
 ```sql
 SELECT DISTINCT
   d.id                      AS deal_id,
   company_id,
   d._airbyte_extracted_at
-FROM {{ ref('hubspot_deals') }} d,
+FROM {{ ref('silver_hubspot_deals') }} d,
 UNNEST(JSON_VALUE_ARRAY(d.companies)) AS company_id
 WHERE d.companies IS NOT NULL
 ```
+If the silver model for the upstream entity does not exist yet, create it
+first (see edge case: missing upstream model in `edge-cases.md`).
 Notes:
-1. `SELECT DISTINCT` — a single source row can produce duplicate combinations under some Airbyte sync patterns.
+1. `SELECT DISTINCT` — a single source row can produce duplicate combinations under some sync patterns.
 2. `WHERE d.<json_array_column> IS NOT NULL` is required — `UNNEST(JSON_VALUE_ARRAY(NULL))` is unsafe.
-3. `_airbyte_extracted_at` is preserved for downstream freshness checks.
+3. Preserve the ingestion timestamp column from upstream for downstream freshness checks.
 4. Composite PK: `(<entity_a>_id, <entity_b>_id)`.
 ## Bridge Model Naming
@@ -66,8 +79,9 @@ Examples: `gold_deals_companies`, `gold_deals_contacts`, `gold_companies_contact
 ## SQL Content Rules
-1. No `{{ config(materialized=...) }}` unless user asks to override the layer default.
-2. `{{ source('raw', '<table>') }}` for raw source tables in bronze models.
-3. `{{ ref('<model>') }}` for references to other dbt models.
-4. Named CTEs for non-trivial logic, explicit column lists where practical.
-5. Preserve `_airbyte_extracted_at` from Airbyte-ingested sources.
+1. No `{{ config(materialized=...) }}` unless the user asks to override the layer default.
+2. `{{ source('bronze', '<table>') }}` for raw tables — used **only** in silver models.
+3. `{{ ref('<model>') }}` for references to other dbt models (gold reads silver this way).
+4. Never write `.sql` files in `dbt/models/bronze/`.
+5. Named CTEs for non-trivial logic, explicit column lists where practical.
+6. Preserve the ingestion timestamp column from raw (e.g. `_airbyte_extracted_at` if Airbyte loaded it) when present.
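The following is not part of the published package. To illustrate what the bridge template computes, here is a minimal Python analogue of the `UNNEST(JSON_VALUE_ARRAY(...))` query, assuming each row is a dict whose array column holds a JSON-encoded array of scalar IDs or `None`. It omits the timestamp column for brevity, and `unpack_bridge` is a hypothetical name; it is an approximation of the semantics, not the SQL that runs in BigQuery.

```python
import json


def unpack_bridge(rows, array_column):
    """Return sorted, de-duplicated (row id, element) pairs from a JSON array column."""
    pairs = set()  # the set plays the role of SELECT DISTINCT
    for row in rows:
        raw = row.get(array_column)
        if raw is None:
            # Mirrors `WHERE d.<json_array_column> IS NOT NULL`.
            continue
        for value in json.loads(raw):
            # Elements are kept as strings, as in the SQL template's output.
            pairs.add((row["id"], str(value)))
    return sorted(pairs)
```

Each resulting pair corresponds to one row of the bridge table, keyed by the composite `(<entity_a>_id, <entity_b>_id)`.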

package/dist/templates/skills/explore-lakehouse/SKILL.md CHANGED

@@ -4,7 +4,7 @@ description: >
   Inspect the RevOS BigQuery lakehouse: list datasets and tables, introspect table schemas
   and column types, preview sample rows, assess data layers (bronze/silver/gold), and check
   data completeness and null rates. Required companion skill for create-dbt-transformations
-  and create-semantic-model — load before generating dbt models or semantic overlays to
+  and create-cubes — load before generating dbt models or cube definitions to
   introspect warehouse columns and types. Use when asked to: explore the lakehouse, list
   BigQuery tables, inspect a table schema, preview data, check raw source tables, assess data
   quality, check null rates, understand available data, or perform BigQuery schema introspection.
@@ -85,9 +85,13 @@ FROM \`$GOOGLE_CLOUD_PROJECT.$REVOS_BQ_DATASET.<table>\`
 ### "What's in my database?" / general overview
 1. List tables in the org's dataset: `bq ls $REVOS_BQ_DATASET`
-2. Infer the data source and domain from table name prefixes (e.g. `salesforce_*`, `stripe_*`, `hubspot_*`)
-3. Group tables by source/domain
-4. Return: sources found, table count per source, table types (TABLE/VIEW), one-line description per group
+2. If the dataset is empty (no tables), tell the user:
+   - They can add data sources by running `revos sources create` to open the RevOS UI
+   - They can view existing sources with `revos sources list`
+   - Stop here — no further exploration is possible without data
+3. Infer the data source and domain from table name prefixes (e.g. `salesforce_*`, `stripe_*`, `hubspot_*`)
+4. Group tables by source/domain
+5. Return: sources found, table count per source, table types (TABLE/VIEW), one-line description per group
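The following is not part of the published package. The grouping in steps 3 to 5 can be sketched as follows, assuming the table names have already been collected from `bq ls` and that the source is identified by the text before the first underscore. `group_tables_by_source` is a hypothetical helper.

```python
from collections import defaultdict


def group_tables_by_source(table_names):
    """Group table names by the prefix before the first underscore."""
    groups = defaultdict(list)
    for name in table_names:
        # Tables without an underscore form their own single-table group.
        prefix = name.split("_", 1)[0]
        groups[prefix].append(name)
    return dict(groups)
```

An empty result corresponds to the empty-dataset case in step 2, where the skill stops and points the user to `revos sources create`.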
 ### "What layer is this data?" / bronze–silver–gold assessment

package/dist/templates/skills/load-sample-data/SKILL.md ADDED

@@ -0,0 +1,119 @@
+---
+name: load-sample-data
+description: >
+  Populate a BigQuery dataset with sample data from public datasets using bq cp.
+  Use when asked to: load sample data, populate the lakehouse, add demo data, seed the dataset,
+  get started with sample tables, or when the user needs data to explore.
+---
+# Load Sample Data
+Copy sample tables from `bigquery-public-data` into the user's dataset using `bq cp`. All copied tables are prefixed with `sample_` so they can be easily identified and cleaned up later.
+## Environment
+Verify env vars before running any command. If either is empty, ask the user to set it.
+```bash
+echo "Project: $GOOGLE_CLOUD_PROJECT"
+echo "Dataset: $REVOS_BQ_DATASET"
+```
+- `$GOOGLE_CLOUD_PROJECT` — BQ project ID
+- `$REVOS_BQ_DATASET` — target dataset
+## Step 1: Check Existing Tables
+```bash
+bq ls $REVOS_BQ_DATASET
+```
+If tables exist, list them. They will be relevant in Step 3 for collision handling.
+## Step 2: Present Sample Dataset Catalog
+```text
+Available sample datasets:
+1. thelook_ecommerce (default) — B2C e-commerce data
+   Tables: sample_users, sample_orders, sample_order_items, sample_products, sample_events, sample_inventory_items, sample_distribution_centers
+   Rows: ~100K users, ~300K orders
+   Good for: customer analytics, purchase funnels, product performance
+2. google_analytics_sample — Web analytics session data
+   Tables: sample_ga_sessions (single table, one day snapshot)
+   Rows: ~900K sessions
+   Good for: web traffic analysis, user behavior, channel attribution
+3. austin_bikeshare — Bikeshare trip and station data
+   Tables: sample_bikeshare_trips, sample_bikeshare_stations
+   Rows: ~1.3M trips
+   Good for: geospatial analysis, demand forecasting, utilization metrics
+```
+## Step 3: Copy Tables
+If any table names from the chosen sample dataset collide with existing tables (from Step 1), ask the user whether to **overwrite** or **skip** each collision before copying. Skip collisions the user chose not to overwrite.
+### thelook_ecommerce
+```bash
+for table in users orders order_items products events inventory_items distribution_centers; do
+  echo "Copying sample_$table..."
+  bq cp -f bigquery-public-data:thelook_ecommerce.$table \
+    $GOOGLE_CLOUD_PROJECT:$REVOS_BQ_DATASET.sample_$table
+done
+```
+### google_analytics_sample
+The `ga_sessions` table is date-sharded. Copy a representative day:
+```bash
+echo "Copying sample_ga_sessions..."
+bq cp -f bigquery-public-data:google_analytics_sample.ga_sessions_20170801 \
+  $GOOGLE_CLOUD_PROJECT:$REVOS_BQ_DATASET.sample_ga_sessions
+```
+### austin_bikeshare
+```bash
+for table in bikeshare_trips bikeshare_stations; do
+  echo "Copying sample_$table..."
+  bq cp -f bigquery-public-data:austin_bikeshare.$table \
+    $GOOGLE_CLOUD_PROJECT:$REVOS_BQ_DATASET.sample_$table
+done
+```
+## Step 4: Verify
+```bash
+for table in <copied_tables>; do
+  echo -n "$table: "
+  bq query --nouse_legacy_sql --format=csv \
+    "SELECT COUNT(*) FROM \`$GOOGLE_CLOUD_PROJECT.$REVOS_BQ_DATASET.$table\`" 2>/dev/null | tail -1
+done
+```
+## Final Response
+```text
+Sample data loaded into $REVOS_BQ_DATASET.
+Source: bigquery-public-data:<dataset_name>
+Tables copied:
+- <table_1>: <row_count> rows
+- <table_2>: <row_count> rows
+Next steps:
+- Run "explore lakehouse" to inspect the data
+- Run "create dbt transformations" to build bronze/silver/gold models
+- Run "create cube" to generate Cube.dev semantic models
+```
+## Rules
+- Use `bq cp -f` (not `CREATE TABLE AS SELECT`) — faster, no query costs, preserves schema. The `-f` flag is required to avoid cross-region confirmation prompts that block non-interactive shells.
+- Show progress for each table being copied.
+- Report any failures clearly with the `bq` error message.
+- Always prefix destination tables with `sample_` — this allows easy identification and cleanup of sample data.
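The following is not part of the published package. The collision handling in Step 3 can be sketched as a function that only builds the `bq cp -f` argument lists and leaves execution to the caller. It assumes the existing table names come from Step 1 and that the user's overwrite choices are passed in explicitly; `build_copy_commands` and its parameters are hypothetical.

```python
def build_copy_commands(project, dataset, source_dataset, tables,
                        existing=(), overwrite=()):
    """Build `bq cp -f` argument lists, skipping collisions the user did not approve."""
    commands = []
    for table in tables:
        destination = f"sample_{table}"  # destination tables always get the sample_ prefix
        if destination in existing and destination not in overwrite:
            # Collision the user chose to skip.
            continue
        commands.append([
            "bq", "cp", "-f",
            f"bigquery-public-data:{source_dataset}.{table}",
            f"{project}:{dataset}.{destination}",
        ])
    return commands
```

Each returned list could then be passed to a process runner one at a time, which keeps per-table progress and error reporting straightforward.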