npm - @revos/cli - Versions diffs - 0.2.1 → 0.2.2 - Mend

@revos/cli 0.2.1 → 0.2.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (157)

package/dist/templates/skills/{create-semantic-model → create-cubes}/SKILL.md RENAMED

@@ -1,12 +1,12 @@
 ---
-name: create-semantic-model
+name: create-cubes
 description: >
-  Create semantic models (Cube.dev cubes) from existing RevOS dbt gold models.
+  Create first-class Cube.dev cube definitions from existing RevOS dbt gold models.
   Use when asked to: build a semantic layer, create cubes, generate Cube definitions from dbt,
-  create a semantic overlay, or create a semantic model from gold models.
+  define cube files, or create a semantic model from gold models.
 ---
-# Create Semantic Model
+# Create Cube
 ## Skill Dependencies
@@ -27,6 +27,8 @@ load the `explore-lakehouse` skill on demand.
 Expose existing dbt gold models as queryable Cube.dev semantic models without manually writing YAML boilerplate. Gold models may be tables or views.
+Each cube is a **complete, standalone definition** stored in `cubes/`. There is no patching or merging — what is in the file is what gets deployed.
 This skill does not build gold models. If a needed gold model is missing, hand off to `create-dbt-transformations`.
 ---
@@ -38,13 +40,13 @@ Strip the `gold_` prefix for cube names and file names. Keep `gold_` in `sql_tab
 ```text
 gold SQL file:     dbt/models/gold/gold_hubspot_companies.sql
 BigQuery table:    gold_hubspot_companies
-overlay file:      semantic/hubspot_companies.yml
+cube file:         cubes/hubspot_companies.yml
 cube name:         hubspot_companies
 join reference:    ${hubspot_companies}
 sql_table:         "`<dataset>.gold_hubspot_companies`"
 ```
-Same rule for bridge cubes: `gold_deals_companies` -> cube name `deals_companies`, file `deals_companies.yml`.
+Same rule for bridge cubes: `gold_deals_companies` -> cube name `deals_companies`, file `cubes/deals_companies.yml`.
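The naming rule above can be sketched as a small helper. This is only an illustration, not part of the package: the function name `cube_targets` and the dataset name `analytics` used in the usage note are hypothetical.

```python
from pathlib import PurePosixPath

GOLD_PREFIX = "gold_"


def cube_targets(gold_sql_path: str, dataset: str) -> dict:
    """Derive cube name, file path, join reference and sql_table from a gold model path."""
    # The dbt model file name (without .sql) is also the BigQuery table name.
    table = PurePosixPath(gold_sql_path).stem
    if not table.startswith(GOLD_PREFIX):
        raise ValueError(f"not a gold model: {gold_sql_path}")
    # Strip the prefix for the cube name and file name only.
    cube_name = table[len(GOLD_PREFIX):]
    return {
        "cube_name": cube_name,
        "cube_file": f"cubes/{cube_name}.yml",
        "join_reference": "${" + cube_name + "}",
        # sql_table keeps the gold_ prefix and is wrapped in backticks.
        "sql_table": f"`{dataset}.{table}`",
    }
```

For example, `cube_targets("dbt/models/gold/gold_deals_companies.sql", "analytics")` yields the cube name `deals_companies` and the file `cubes/deals_companies.yml`, while `sql_table` still points at `gold_deals_companies`.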
 ## Cube `sql_table` Reference
@@ -76,7 +78,7 @@ When a many-to-many relationship is detected and no suitable bridge model exists
 ### Checkpoint 3: Relationship Confirmation
-Present validated relationships with join directions, cardinality, and match rates. Ask the user to confirm before generating overlays. Do not present unvalidated joins as confirmed — mark them as `validation pending`.
+Present validated relationships with join directions, cardinality, and match rates. Ask the user to confirm before generating cube files. Do not present unvalidated joins as confirmed — mark them as `validation pending`.
 ### Checkpoint 4: Measures Confirmation
@@ -94,7 +96,7 @@ Follow these phases in order. Do not skip ahead.
 1. Discover gold models via `find dbt/models/gold -name "*.sql"`.
 2. If none exist, stop and tell the user to create gold models first via `create-dbt-transformations`.
-3. Inspect 1-2 existing overlays in `semantic/` to detect conventions (`extends:`, `public:`, `refresh_key` style). Apply detected conventions to new overlays. Always use flat single-cube YAML (never `cubes:` or `views:` root).
+3. Inspect 1-2 existing cube files in `cubes/` to detect conventions (`extends:`, `public:`, `refresh_key` style). Apply detected conventions to new cubes. Always use flat single-cube YAML (never `cubes:` or `views:` root).
 4. If the user named a specific model, find it. If not found, stop.
 5. Otherwise list all discovered gold models and ask which should participate (Checkpoint 1).
 6. Keep the full discovered list available for connector search in Phase 3.
@@ -231,18 +233,18 @@ See [references/cube-examples.md](references/cube-examples.md) for type mapping
 ---
-## Phase 8: Generate Cube Semantic Overlays
+## Phase 8: Generate Cube Files
-Create Cube.dev YAML files in `semantic/`. Follow the existing style detected in Phase 1.
+Create Cube.dev YAML files in `cubes/`. Follow the existing style detected in Phase 1.
 Key rules:
-1. **One cube per file, flat YAML.** Each overlay file contains a single cube starting with `name:` at the root level. Never wrap with `cubes:` or `views:` at the root.
+1. **One cube per file, flat YAML.** Each cube file contains a single cube starting with `name:` at the root level. Never wrap with `cubes:` or `views:` at the root.
 2. File name = cube `name` (no `gold_` prefix) + `.yml`.
 3. `sql_table` uses fully qualified BigQuery reference with `gold_` prefix.
 4. Every confirmed relationship gets joins in both directions.
 5. Bridge/junction cubes use `public: false`.
-6. Every overlay **must** include a SQL-based `refresh_key`. Use `SELECT MAX(<timestamp_col>)` with columns in this priority: `_airbyte_extracted_at` (present on all Airbyte sources), `updated_at`/`modified_at` (CDC streams), `created_at` (insert-only facts). Only use `every: <interval>` as absolute last resort when **no timestamp column exists in the table** — add a YAML comment explaining why (e.g. `# no timestamp column available`).
+6. Every cube **must** include a SQL-based `refresh_key`. Use `SELECT MAX(<timestamp_col>)` with columns in this priority: `_airbyte_extracted_at` (present on all Airbyte sources), `updated_at`/`modified_at` (CDC streams), `created_at` (insert-only facts). Only use `every: <interval>` as absolute last resort when **no timestamp column exists in the table** — add a YAML comment explaining why (e.g. `# no timestamp column available`).
 7. `refresh_key.sql` references the same table as `sql_table`.
 8. Tag unvalidated joins with `# UNVALIDATED: <reason>`.
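The column priority in rule 6 could be expressed roughly as follows. This is a sketch under assumptions: `build_refresh_key` and its parameters are hypothetical names, and the column list is assumed to come from a prior schema inspection.

```python
from typing import Iterable

# Priority order from rule 6; earlier entries win.
TIMESTAMP_PRIORITY = (
    "_airbyte_extracted_at",
    "updated_at",
    "modified_at",
    "created_at",
)


def build_refresh_key(columns: Iterable[str], table_ref: str, interval: str) -> dict:
    """Return a refresh_key mapping for a cube backed by `table_ref`."""
    available = set(columns)
    for col in TIMESTAMP_PRIORITY:
        if col in available:
            # SQL-based key; references the same table as sql_table (rule 7).
            return {"sql": f"SELECT MAX({col}) FROM `{table_ref}`"}
    # Last resort: the caller should add a YAML comment explaining why.
    return {"every": interval}
```

When the fallback branch is taken, the generated file still needs the explanatory comment (for example `# no timestamp column available`), which a plain mapping cannot carry.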
@@ -254,14 +256,14 @@ See [references/cube-examples.md](references/cube-examples.md) for canonical sta
 1. If `create-dbt-transformations` was invoked (bridge model), it already validated dbt models. Otherwise run `dbt parse`.
 2. Verify physical tables exist in BigQuery: `bq show <dataset>.<table_name>`. If missing, document as pending.
-3. Verify generated overlays match conventions: flat YAML, correct naming, correct `sql_table`, all dimensions present, `refresh_key` included, joins in both directions.
+3. Verify generated cube files match conventions: flat YAML, correct naming, correct `sql_table`, all dimensions present, `refresh_key` included, joins in both directions.
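Some of the convention checks in step 3 can be automated on an already-parsed cube file. The sketch below assumes the YAML has been loaded into a plain dict by whatever parser the project uses; `check_cube` is a hypothetical helper. It covers only the per-file checks, so dimension coverage and join symmetry, which need more than one file, are left out.

```python
def check_cube(cube: dict, file_name: str) -> list:
    """Return a list of convention violations for one parsed cube file."""
    problems = []
    # Flat single-cube YAML: no cubes:/views: wrapper at the root.
    if "cubes" in cube or "views" in cube:
        problems.append("root must be a flat single cube, not `cubes:` or `views:`")
    name = cube.get("name", "")
    if not name:
        problems.append("missing `name`")
    elif name.startswith("gold_"):
        problems.append("cube name must not keep the `gold_` prefix")
    if file_name != f"{name}.yml":
        problems.append("file name must be cube name + `.yml`")
    # sql_table keeps the gold_ prefix; derived cubes (sql:) are skipped here.
    sql_table = cube.get("sql_table", "")
    if sql_table and f"gold_{name}" not in sql_table:
        problems.append("`sql_table` must reference the `gold_`-prefixed table")
    if "refresh_key" not in cube:
        problems.append("missing `refresh_key`")
    return problems
```

An empty result means the file passed these checks, not that the cube is valid in Cube.dev.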
 ---
 ## Final Response Format
 ```text
-Created semantic model draft.
+Created cube definitions.
 Selected gold models:
 - dbt/models/gold/<gold_model_1>.sql
@@ -272,9 +274,9 @@ Approved connector models:
 Bridge/support models created (via create-dbt-transformations):
 - dbt/models/gold/<bridge_model>.sql
-Semantic overlays:
-- semantic/<entity_1>.yml         (cube name: <entity_1>)
-- semantic/<bridge_entity>.yml    (cube name: <bridge_entity>, public: false)
+Cube files:
+- cubes/<entity_1>.yml         (cube name: <entity_1>)
+- cubes/<bridge_entity>.yml    (cube name: <bridge_entity>, public: false)
 Validated relationships:
 - <entity_a>.<key> -> <entity_b>.<key> (<relationship_type>)
@@ -299,7 +301,7 @@ Pending items:
 - <pending_item>
 Next step:
-  revos overlays push -d ./semantic
+  revos apply
 ```
 If validation is incomplete, say exactly what remains pending.

package/dist/templates/skills/create-cubes/references/bq-pk-fk-conventions.md ADDED

@@ -0,0 +1,183 @@
+# BigQuery PK / FK Conventions
+Conventions and patterns for primary keys, foreign keys, and type handling in
+BigQuery-backed Cube.dev cube definitions.
+---
+## Primary key rules
+### Single-column PK
+Expose with `primary_key: true`:
+```yaml
+dimensions:
+  id:
+    sql: "${CUBE}.id"
+    type: string
+    primary_key: true
+```
+### Composite PK (no natural single key)
+Use a synthetic string-concatenation dimension:
+```yaml
+dimensions:
+  id:
+    sql: "${CUBE}.deal_id || '_' || ${CUBE}.contact_id"
+    type: string
+    primary_key: true
+```
+Use `||` (SQL string concat) not `CONCAT()` — both work in BigQuery but `||` is
+more portable. Always cast non-string parts:
+```yaml
+sql: "${CUBE}.issue_id || '_' || CAST(${CUBE}.sprint_id AS STRING)"
+```
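A quick way to see why the `'_'` separator matters: without it, different ID pairs can concatenate to the same key. The Python below only mimics the string concatenation and is not BigQuery code; `composite_key` is a hypothetical helper.

```python
def composite_key(*parts, sep: str = "_") -> str:
    """Join key parts as strings, mirroring `a || '_' || CAST(b AS STRING)`."""
    return sep.join(str(p) for p in parts)


# Without a separator, (1, 23) and (12, 3) collide on "123".
assert composite_key(1, 23, sep="") == composite_key(12, 3, sep="")

# With the separator the two pairs stay distinct: "1_23" vs "12_3".
assert composite_key(1, 23) != composite_key(12, 3)
```

The separator only protects the key if it cannot occur inside the parts themselves, so IDs that may contain underscores would need a different delimiter.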
+### No natural PK
+When no unique column exists, use `ROW_NUMBER()` in the cube `sql:` view or
+document the absence clearly. Warn the user — Cube.js fan-out protection
+depends on a correct PK.
+---
+## FK type casting
+BigQuery enforces strict type matching in JOINs. Common mismatches:
+| Situation                   | Fix                                                   |
+| --------------------------- | ----------------------------------------------------- |
+| `id` is STRING, FK is INT64 | `CAST(fk_col AS STRING) = id`                         |
+| `id` is INT64, FK is STRING | `id = SAFE_CAST(fk_col AS INT64)`                     |
+| Both sides uncertain        | `SAFE_CAST(... AS STRING) = SAFE_CAST(... AS STRING)` |
+| JSON object storing ID      | `JSON_VALUE(col, '$.id')`                             |
+Use `SAFE_CAST` (not `CAST`) when the FK can contain non-numeric values —
+`SAFE_CAST` returns NULL on failure instead of throwing.
+---
+## JSON column patterns
+### Extracting a scalar value
+```sql
+-- From a top-level field
+JSON_VALUE(col, '$.fieldName')
+-- From a nested object
+JSON_VALUE(col, '$.parent.child.id')
+```
+### Extracting an array of scalars (for UNNEST)
+```sql
+-- Array of plain strings/numbers (association IDs):
+UNNEST(JSON_VALUE_ARRAY(col)) AS element
+-- Array of JSON objects (pipeline stages):
+UNNEST(JSON_QUERY_ARRAY(col)) AS obj
+-- then: JSON_VALUE(obj, '$.fieldName')
+```
+Rule of thumb: `JSON_VALUE_ARRAY` for scalar arrays, `JSON_QUERY_ARRAY` for object arrays.
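The rule of thumb can be applied mechanically to a sample value taken from the column. A minimal sketch, assuming the sample is available as a JSON string; `pick_array_function` is a hypothetical name, and the code only classifies the sample without reproducing BigQuery's runtime behavior.

```python
import json


def pick_array_function(sample: str) -> str:
    """Suggest the BigQuery array extractor for a sample JSON array value."""
    values = json.loads(sample)
    if not isinstance(values, list):
        raise ValueError("sample is not a JSON array")
    # Any nested object or array means the elements are not scalars.
    if any(isinstance(v, (dict, list)) for v in values):
        return "JSON_QUERY_ARRAY"
    return "JSON_VALUE_ARRAY"
```

One sample is only a hint. Checking a few rows guards against columns whose shape varies.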
+---
+## sql_table vs sql
+| Approach                                       | When to use                            |
+| ---------------------------------------------- | -------------------------------------- |
+| `sql_table: "\`<dataset>.<table>\`"`           | Raw table, no transformation needed    |
+| `sql: "SELECT ... FROM \`<dataset>.<table>\`"` | Derived view (UNNEST, JOIN, aggregate) |
+Always wrap BigQuery table names in backticks inside YAML. Backticks are literal
+characters in YAML double-quoted strings and in block scalars (`>` or `|`), so they need no escaping in either form. A backslash before a backtick is not a valid YAML escape sequence:
+```yaml
+# Double-quoted: backticks are literal, no escaping needed:
+sql_table: "`my_project.my_dataset.my_table`"
+# Inside sql block scalar: no escaping either:
+sql: >
+  SELECT id FROM `my_project.my_dataset.my_table`
+```
+---
+## refresh_key patterns
+Priority order for the timestamp column:
+1. `_airbyte_extracted_at` — present on all Airbyte-synced tables
+2. `updated_at` / `modified_at` / `lastModifiedDate` — CDC streams
+3. `created_at` — insert-only facts
+4. `every: 1 hour` — only when **no timestamp column exists**, with a YAML comment
+```yaml
+# Pattern 1 (preferred):
+refresh_key:
+  sql: "SELECT MAX(_airbyte_extracted_at) FROM `<dataset>.<table>`"
+# Pattern 4 (last resort):
+refresh_key:
+  every: 1 hour  # no timestamp column available in this table
+```
+For derived cubes (`sql:` based, not `sql_table:`), the refresh key should
+reference the **underlying source table**, not the derived view:
+```yaml
+# Bridge cube derived from deals:
+refresh_key:
+  sql: "SELECT MAX(_airbyte_extracted_at) FROM `<dataset>.<prefix>deals`"
+```
+---
+## Common dimension types in Cube.dev
+| BigQuery type             | Cube type | Notes                                      |
+| ------------------------- | --------- | ------------------------------------------ |
+| STRING                    | `string`  | default for most IDs, names                |
+| INT64, FLOAT64, NUMERIC   | `number`  | use for metrics                            |
+| BOOL                      | `boolean` |                                            |
+| TIMESTAMP, DATETIME, DATE | `time`    | enables time drill-downs                   |
+| JSON                      | `string`  | expose extracted subfields individually    |
+| ARRAY                     | —         | use UNNEST in a bridge cube or `sql:` view |
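The mapping table translates directly into a lookup. The sketch below mirrors the rows above and nothing more; `cube_dimension_type` is a hypothetical helper, and returning `None` for arrays is one possible way to signal "no direct dimension type".

```python
from typing import Optional

# One entry per BigQuery type listed in the table above.
CUBE_TYPE_BY_BQ_TYPE = {
    "STRING": "string",
    "INT64": "number",
    "FLOAT64": "number",
    "NUMERIC": "number",
    "BOOL": "boolean",
    "TIMESTAMP": "time",
    "DATETIME": "time",
    "DATE": "time",
    "JSON": "string",
}


def cube_dimension_type(bq_type: str) -> Optional[str]:
    """Map a BigQuery column type to a Cube dimension type, or None for arrays."""
    base = bq_type.strip().upper()
    if base.startswith("ARRAY"):
        # Arrays have no dimension type; they need UNNEST in a bridge cube or sql: view.
        return None
    if base not in CUBE_TYPE_BY_BQ_TYPE:
        raise ValueError(f"no mapping documented for {bq_type!r}")
    return CUBE_TYPE_BY_BQ_TYPE[base]
```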
+---
+## Naming conventions
+| Item                    | Convention                                |
+| ----------------------- | ----------------------------------------- |
+| Cube name               | `gold_` prefix stripped; snake_case       |
+| File name               | same as cube name + `.yml`                |
+| Dimension/measure names | snake_case                                |
+| Computed dimensions     | descriptive name, not `col_json_value`    |
+| Bridge cubes            | `<entity_a>_to_<entity_b>`                |
+| Table aliases           | `<entity>_<role>` (e.g. `users_assignee`) |
+---
+## BigQuery-specific SQL tips
+```sql
+-- Safe division (avoid divide-by-zero)
+SAFE_DIVIDE(numerator, denominator)
+-- Null-safe equality
+${CUBE}.col IS NOT DISTINCT FROM other_col
+-- Date truncation for time series
+DATE_TRUNC(${CUBE}.created_at, MONTH)
+-- String aggregation
+STRING_AGG(${CUBE}.name, ', ')
+```

package/dist/templates/skills/{create-semantic-model → create-cubes}/references/cube-examples.md RENAMED

@@ -1,4 +1,4 @@
-# Cube Overlay Examples
+# Cube Examples
 ## Table of Contents
@@ -50,7 +50,7 @@ Notes:
 1. Cube `name` is `hubspot_companies` (no `gold_` prefix).
 2. `sql_table` references `gold_hubspot_companies` (with `gold_` prefix), in backticks.
-3. The join references `${companies_deals}` — the cube name of a bridge cube defined in `semantic/companies_deals.yml`.
+3. The join references `${companies_deals}` — the cube name of a bridge cube defined in `cubes/companies_deals.yml`.
 4. Only `_airbyte_extracted_at` is exposed from Airbyte metadata, as `airbyte_extracted_at`.
 5. `refresh_key.sql` uses the same fully qualified table name as `sql_table`.

package/dist/templates/skills/create-cubes/references/hubspot-entities.md ADDED

@@ -0,0 +1,289 @@
+# HubSpot Entities Reference
+## Table naming
+Airbyte syncs HubSpot tables with a configurable prefix (default: `hubspot_`).
+Inspect the BigQuery dataset to identify the actual prefix:
+```sql
+SELECT table_name FROM `<dataset>.INFORMATION_SCHEMA.TABLES`
+WHERE table_name LIKE '%companies%' OR table_name LIKE '%deals%'
+ORDER BY table_name LIMIT 20;
+```
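Once the query above has returned table names, the prefix can be read off by looking for names that end in a known entity. A minimal sketch; `infer_prefixes` is a hypothetical helper and the table names in the usage note are made up.

```python
def infer_prefixes(table_names, entity: str = "companies") -> list:
    """Return candidate prefixes: whatever precedes `entity` in matching table names."""
    candidates = {
        name[: -len(entity)]
        for name in table_names
        if name.endswith(entity)
    }
    # Shortest first, so a raw-table prefix is listed before e.g. a gold_ variant.
    return sorted(candidates, key=lambda p: (len(p), p))
```

With `["hubspot_companies", "hubspot_deals"]` this returns `["hubspot_"]`. If more than one candidate comes back, the result is ambiguous and should be confirmed with the user rather than guessed.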
+Throughout this document `<prefix>` is a placeholder for that prefix (e.g. `hubspot_`).
+---
+## Primary entities
+| Cube name                | BigQuery table           | PK           | Notes                                              |
+| ------------------------ | ------------------------ | ------------ | -------------------------------------------------- |
+| `<prefix>companies`      | `<prefix>companies`      | `id`         | `properties_name` is the display name              |
+| `<prefix>contacts`       | `<prefix>contacts`       | `id`         | `properties_hs_full_name_or_email` is display name |
+| `<prefix>deals`          | `<prefix>deals`          | `id`         | `properties_dealname` is display name              |
+| `<prefix>tickets`        | `<prefix>tickets`        | `id`         | —                                                  |
+| `<prefix>owners`         | `<prefix>owners`         | `id`         | Display name: `firstName \|\| ' ' \|\| lastName`   |
+| `<prefix>engagements`    | `<prefix>engagements`    | `id`         | See engagement sub-types below                     |
+| `<prefix>deal_pipelines` | `<prefix>deal_pipelines` | `pipelineId` | Stages stored as JSON array                        |
+| `<prefix>line_items`     | `<prefix>line_items`     | `id`         | `properties_name`                                  |
+| `<prefix>products`       | `<prefix>products`       | `id`         | `properties_name`                                  |
+**Owner join pattern (shared by companies, contacts, deals, tickets):**
+```yaml
+joins:
+  <prefix>owners:
+    relationship: many_to_one
+    sql: "${CUBE}.properties_hubspot_owner_id = ${<prefix>owners.id}"
+```
+---
+## Bridge / junction cubes (public: false)
+HubSpot stores many-to-many associations as JSON arrays on the primary object.
+Bridge cubes are required to join across these associations. They must be
+`public: false` and use a composite PK.
+### Association columns
+| Source table          | Column                    | Contains                                |
+| --------------------- | ------------------------- | --------------------------------------- |
+| `<prefix>deals`       | `companies`               | JSON array of company IDs               |
+| `<prefix>deals`       | `contacts`                | JSON array of contact IDs               |
+| `<prefix>deals`       | `line_items`              | JSON array of line item IDs             |
+| `<prefix>deals`       | `deals`                   | JSON array (for tickets→deals)          |
+| `<prefix>tickets`     | `companies`               | JSON array of company IDs               |
+| `<prefix>tickets`     | `contacts`                | JSON array of contact IDs               |
+| `<prefix>tickets`     | `deals`                   | JSON array of deal IDs (CAST to STRING) |
+| `<prefix>companies`   | `contacts`                | JSON array of contact IDs               |
+| `<prefix>engagements` | `associations.contactIds` | JSON array of contact IDs               |
+| `<prefix>engagements` | `associations.companyIds` | JSON array of company IDs               |
+| `<prefix>engagements` | `associations.dealIds`    | JSON array of deal IDs                  |
+### Bridge cube: companies_to_deals
+```yaml
+name: <prefix>companies_to_deals
+sql: >
+  SELECT DISTINCT d.id as deal_id, company_id
+  FROM `<dataset>.<prefix>deals` d,
+  UNNEST(JSON_VALUE_ARRAY(d.companies)) company_id
+public: false
+dimensions:
+  id:
+    sql: "${CUBE}.company_id || '_' || ${CUBE}.deal_id"
+    type: string
+    primary_key: true
+  company_id:
+    sql: "${CUBE}.company_id"
+    type: string
+  deal_id:
+    sql: "${CUBE}.deal_id"
+    type: string
+joins:
+  <prefix>companies:
+    relationship: many_to_one
+    sql: "${CUBE}.company_id = ${<prefix>companies.id}"
+  <prefix>deals:
+    relationship: many_to_one
+    sql: "${CUBE}.deal_id = ${<prefix>deals.id}"
+refresh_key:
+  sql: "SELECT MAX(_airbyte_extracted_at) FROM `<dataset>.<prefix>deals`"
+```
+### Bridge cube: companies_to_tickets
+Same pattern — UNNEST `tickets.companies`:
+```yaml
+name: <prefix>companies_to_tickets
+sql: >
+  SELECT DISTINCT t.id as ticket_id, company_id
+  FROM `<dataset>.<prefix>tickets` t,
+  UNNEST(JSON_VALUE_ARRAY(t.companies)) company_id
+```
+### Bridge cube: deals_to_tickets
+Note: ticket `deals` column values are numbers — cast to STRING:
+```yaml
+name: <prefix>deals_to_tickets
+sql: >
+  SELECT DISTINCT t.id AS ticket_id, CAST(deal_id AS STRING) AS deal_id
+  FROM `<dataset>.<prefix>tickets` t,
+  UNNEST(JSON_VALUE_ARRAY(t.deals)) AS deal_id
+```
+### Bridge cube: deals_to_line_items
+```yaml
+name: <prefix>deals_to_line_items
+sql: >
+  SELECT DISTINCT d.id AS deal_id, line_item_id
+  FROM `<dataset>.<prefix>deals` d,
+  UNNEST(JSON_VALUE_ARRAY(d.line_items)) AS line_item_id
+```
+### Bridge cube: contacts_to_deals
+```yaml
+name: <prefix>contacts_to_deals
+sql: >
+  SELECT DISTINCT d.id AS deal_id, contact_id
+  FROM `<dataset>.<prefix>deals` d,
+  UNNEST(JSON_VALUE_ARRAY(d.contacts)) contact_id
+```
+### Bridge cube: contacts_to_tickets
+```yaml
+name: <prefix>contacts_to_tickets
+sql: >
+  SELECT DISTINCT t.id AS ticket_id, contact_id
+  FROM `<dataset>.<prefix>tickets` t,
+  UNNEST(JSON_VALUE_ARRAY(t.contacts)) contact_id
+```
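The bridge cubes above share one SQL shape: select the parent ID, then UNNEST a JSON array of associated IDs. That shape can be templated as sketched below. `bridge_sql` and its parameter names are hypothetical, and the identifiers are interpolated as-is, so this is only suitable for trusted, hand-written inputs.

```python
def bridge_sql(dataset: str, prefix: str, source: str, alias: str,
               source_id: str, assoc_column: str, element: str) -> str:
    """Render the SELECT for a bridge cube that unnests a JSON array of scalar IDs."""
    return (
        f"SELECT DISTINCT {alias}.id AS {source_id}, {element}\n"
        f"FROM `{dataset}.{prefix}{source}` {alias},\n"
        f"UNNEST(JSON_VALUE_ARRAY({alias}.{assoc_column})) AS {element}"
    )


# Reproduces the contacts_to_deals query shape for a hypothetical dataset "ds".
print(bridge_sql("ds", "hubspot_", "deals", "d", "deal_id", "contacts", "contact_id"))
```

Variants that need casting, such as `deals_to_tickets` or the engagement bridges, do not fit this template unchanged and are better written by hand.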
+### Bridge cube: contacts_to_companies
+Note: this uses `SAFE_CAST` on both sides — IDs can have type mismatches:
+```yaml
+name: <prefix>contacts_to_companies
+sql: >
+  SELECT DISTINCT c.id AS company_id, contact_id
+  FROM `<dataset>.<prefix>companies` c,
+  UNNEST(JSON_VALUE_ARRAY(c.contacts)) AS contact_id
+joins:
+  <prefix>contacts:
+    relationship: many_to_one
+    sql: "SAFE_CAST(${CUBE}.contact_id AS STRING) = SAFE_CAST(${<prefix>contacts.id} AS STRING)"
+  <prefix>companies:
+    relationship: many_to_one
+    sql: "SAFE_CAST(${CUBE}.company_id AS STRING) = SAFE_CAST(${<prefix>companies.id} AS STRING)"
+```
+### Bridge cubes: engagements_to_contacts / companies / deals
+Engagement IDs are integers — always CAST to STRING:
+```yaml
+name: <prefix>engagements_to_contacts
+sql: >
+  SELECT DISTINCT
+    CAST(e.id AS STRING) AS engagement_id,
+    CAST(contact_id AS STRING) AS contact_id
+  FROM `<dataset>.<prefix>engagements` e,
+  UNNEST(JSON_VALUE_ARRAY(e.associations.contactIds)) AS contact_id
+```
+Same pattern for `companyIds` → `engagements_to_companies` and `dealIds` → `engagements_to_deals`.
+Engagement join:
+```yaml
+joins:
+  <prefix>engagements:
+    relationship: many_to_one
+    sql: "CAST(${CUBE}.engagement_id AS STRING) = CAST(${<prefix>engagements.id} AS STRING)"
+```
+---
+## Special cubes
+### deal_pipeline_stages
+Derived from `deal_pipelines.stages` JSON array. Not a raw table — uses `sql:` not `sql_table:`.
+```yaml
+name: <prefix>deal_pipeline_stages
+sql: >
+  SELECT
+    JSON_VALUE(elem, '$.stageId') AS stage_id,
+    JSON_VALUE(elem, '$.label') AS label
+  FROM `<dataset>.<prefix>deal_pipelines`,
+  UNNEST(JSON_QUERY_ARRAY(stages)) AS elem
+dimensions:
+  stage_id:
+    sql: "${CUBE}.stage_id"
+    type: string
+    primary_key: true
+  label:
+    sql: "${CUBE}.label"
+    type: string
+joins:
+  <prefix>deals:
+    relationship: one_to_many
+    sql: "${CUBE}.stage_id = ${<prefix>deals.properties_dealstage}"
+refresh_key:
+  sql: "SELECT MAX(_airbyte_extracted_at) FROM `<dataset>.<prefix>deal_pipelines`"
+```
+Deals join to stages and pipelines:
+```yaml
+joins:
+  <prefix>deal_pipeline_stages:
+    relationship: many_to_one
+    sql: "${CUBE}.properties_dealstage = ${<prefix>deal_pipeline_stages.stage_id}"
+  <prefix>deal_pipelines:
+    relationship: many_to_one
+    sql: "${CUBE}.properties_pipeline = ${<prefix>deal_pipelines.pipelineId}"
+```
+### engagements sub-types
+The `engagements` table has sub-type tables: `engagements_calls`, `engagements_emails`,
+`engagements_meetings`, `engagements_tasks`, `engagements_notes`.
+Join pattern (one-to-one by ID with CAST):
+```yaml
+# On the engagements cube:
+joins:
+  <prefix>engagements_calls:
+    relationship: one_to_one
+    sql: "CAST(${CUBE}.id AS STRING) = ${<prefix>engagements_calls.id}"
+# On each sub-type cube:
+joins:
+  <prefix>engagements:
+    relationship: many_to_one
+    sql: "${CUBE}.id = CAST(${<prefix>engagements.id} AS STRING)"
+```
+---
+## Deals measures
+```yaml
+measures:
+  count_closed:
+    type: count
+    filters:
+      - sql: "${CUBE}.properties_hs_is_closed = TRUE"
+  count_closed_won:
+    type: count
+    filters:
+      - sql: "${CUBE}.properties_hs_is_closed_won = TRUE"
+  count_closed_lost:
+    type: count
+    filters:
+      - sql: >
+          ${CUBE}.properties_hs_is_closed = TRUE
+          AND ${CUBE}.properties_hs_is_closed_won = FALSE
+```
+---
+## Common pitfalls
+1. **ID type mismatches** — HubSpot IDs are sometimes integers, sometimes strings. Use `SAFE_CAST` when unsure (especially contacts_to_companies). Engagement IDs are always integers → always CAST to STRING.
+2. **JSON_VALUE_ARRAY vs JSON_QUERY_ARRAY** — use `JSON_VALUE_ARRAY` when the array contains scalar strings/ints (association IDs); use `JSON_QUERY_ARRAY` when the array contains JSON objects (deal_pipelines stages).
+3. **deal_pipeline_stages is derived** — uses `sql:` not `sql_table:`. Cannot be used in `revos cubes preview` diff against Airbyte-generated cubes.
+4. **engagements bridge refresh_key** — use the parent engagement table timestamp, not the contact/company/deal table.
+5. **Prefix varies** — always confirm the actual prefix from BigQuery before writing cube files.