@altimateai/altimate-code 0.4.9 → 0.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (38)
  1. package/CHANGELOG.md +36 -0
  2. package/package.json +54 -14
  3. package/postinstall.mjs +35 -0
  4. package/skills/cost-report/SKILL.md +134 -0
  5. package/skills/data-viz/SKILL.md +135 -0
  6. package/skills/data-viz/references/component-guide.md +394 -0
  7. package/skills/dbt-analyze/SKILL.md +130 -0
  8. package/skills/dbt-analyze/references/altimate-dbt-commands.md +66 -0
  9. package/skills/dbt-analyze/references/lineage-interpretation.md +58 -0
  10. package/skills/dbt-develop/SKILL.md +151 -0
  11. package/skills/dbt-develop/references/altimate-dbt-commands.md +66 -0
  12. package/skills/dbt-develop/references/common-mistakes.md +49 -0
  13. package/skills/dbt-develop/references/incremental-strategies.md +118 -0
  14. package/skills/dbt-develop/references/layer-patterns.md +158 -0
  15. package/skills/dbt-develop/references/medallion-architecture.md +125 -0
  16. package/skills/dbt-develop/references/yaml-generation.md +90 -0
  17. package/skills/dbt-docs/SKILL.md +99 -0
  18. package/skills/dbt-docs/references/altimate-dbt-commands.md +66 -0
  19. package/skills/dbt-docs/references/documentation-standards.md +94 -0
  20. package/skills/dbt-test/SKILL.md +121 -0
  21. package/skills/dbt-test/references/altimate-dbt-commands.md +66 -0
  22. package/skills/dbt-test/references/custom-tests.md +59 -0
  23. package/skills/dbt-test/references/schema-test-patterns.md +103 -0
  24. package/skills/dbt-test/references/unit-test-guide.md +121 -0
  25. package/skills/dbt-troubleshoot/SKILL.md +187 -0
  26. package/skills/dbt-troubleshoot/references/altimate-dbt-commands.md +66 -0
  27. package/skills/dbt-troubleshoot/references/compilation-errors.md +57 -0
  28. package/skills/dbt-troubleshoot/references/runtime-errors.md +71 -0
  29. package/skills/dbt-troubleshoot/references/test-failures.md +95 -0
  30. package/skills/lineage-diff/SKILL.md +64 -0
  31. package/skills/pii-audit/SKILL.md +117 -0
  32. package/skills/query-optimize/SKILL.md +86 -0
  33. package/skills/schema-migration/SKILL.md +119 -0
  34. package/skills/sql-review/SKILL.md +118 -0
  35. package/skills/sql-translate/SKILL.md +68 -0
  36. package/skills/teach/SKILL.md +54 -0
  37. package/skills/train/SKILL.md +51 -0
  38. package/skills/training-status/SKILL.md +45 -0
@@ -0,0 +1,151 @@
---
name: dbt-develop
description: Create and modify dbt models — staging, intermediate, marts, incremental, medallion architecture. Use when building new SQL models, extending existing ones, scaffolding YAML configs, or reorganizing project structure. Powered by altimate-dbt.
---

# dbt Model Development

## Requirements
**Agent:** builder or migrator (requires file write access)
**Tools used:** bash (runs `altimate-dbt` commands), read, glob, write, edit, schema_search, dbt_profiles, sql_analyze, altimate_core_validate, altimate_core_column_lineage

## When to Use This Skill

**Use when the user wants to:**
- Create a new dbt model (staging, intermediate, mart, OBT)
- Add or modify SQL logic in an existing model
- Generate sources.yml or schema.yml from warehouse metadata
- Reorganize models into layers (staging/intermediate/mart or bronze/silver/gold)
- Convert a model to incremental materialization
- Scaffold a new dbt project structure

**Do NOT use for:**
- Adding tests to models → use `dbt-test`
- Writing model/column descriptions → use `dbt-docs`
- Debugging build failures → use `dbt-troubleshoot`
- Analyzing change impact → use `dbt-analyze`

## Core Workflow: Plan → Discover → Write → Validate

### 1. Plan — Understand Before Writing

Before writing any SQL:
- Read the task requirements carefully
- Identify which layer this model belongs to (staging, intermediate, mart)
- Check existing models for naming conventions and patterns
- **Check dependencies:** If `packages.yml` exists, check for `dbt_packages/` or `package-lock.yml`. Only run `dbt deps` if packages are declared but not yet installed.
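The dependency check above can be sketched as a simple guard (a sketch; it assumes the standard dbt layout where installed packages land in `dbt_packages/`):

```bash
# Install packages only when they are declared but not yet present
if [ -f packages.yml ] && [ ! -d dbt_packages ]; then
  altimate-dbt deps
fi
```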

```bash
altimate-dbt info                      # project name, adapter type
altimate-dbt parents --model <name>    # understand what feeds a model
altimate-dbt children --model <name>   # understand what consumes it
```

**Check warehouse connection:** Run `dbt_profiles` to discover available profiles and map them to warehouse connections. This tells you which adapter (Snowflake, BigQuery, Postgres, etc.) and target the project uses — essential for dialect-aware SQL.
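One way to stay dialect-aware is to lean on dbt's cross-database macros, which compile to the adapter's own syntax. A sketch (model and column names are illustrative):

```sql
select
    order_id,
    -- compiles to the current adapter's date_trunc syntax
    {{ dbt.date_trunc('day', 'created_at') }} as order_day,
    {{ dbt.datediff('created_at', 'shipped_at', 'day') }} as days_to_ship
from {{ ref('stg_orders') }}
```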

### 2. Discover — Understand the Data Before Writing

**Never write SQL without deeply understanding your data first.** The #1 cause of wrong results is writing SQL blind — assuming grain, relationships, column names, or values without checking.

**Step 2a: Search for relevant tables and columns**
- Use `schema_search` with natural-language queries to find tables/columns in large warehouses (e.g., `schema_search(query: "customer orders")` returns matching tables and columns from the indexed schema cache)
- Read `sources.yml`, `schema.yml`, and any YAML files that describe the source/parent models
- These contain column descriptions, data types, tests, and business context
- Pay special attention to: primary keys, unique constraints, relationships between tables, and what each column represents

**Step 2b: Understand the grain of each parent model/source**
- What does one row represent? (one customer? one event? one day per customer?)
- What are the primary/unique keys?
- This is critical for JOINs — joining on the wrong grain causes fan-out (too many rows) or missing rows

```bash
altimate-dbt columns --model <name>                       # existing model columns
altimate-dbt columns-source --source <src> --table <tbl>  # source table columns
altimate-dbt execute --query "SELECT count(*) FROM {{ ref('model') }}" --limit 1
altimate-dbt execute --query "SELECT * FROM {{ ref('model') }}" --limit 5
altimate-dbt column-values --model <name> --column <col>  # sample values for key columns
```

**Step 2c: Query the actual data to verify your understanding**
- Check row counts, NULLs, date ranges, cardinality of key columns
- Verify foreign key relationships actually hold (do all IDs in child exist in parent?)
- Check for duplicates in what you think are unique keys
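The duplicate and foreign-key checks can be run directly via `execute` (a sketch; `stg_orders`, `stg_customers`, and their keys are illustrative):

```bash
# Duplicate check on an assumed unique key — any rows returned mean the grain is wrong
altimate-dbt execute --query "SELECT order_id, count(*) AS n FROM {{ ref('stg_orders') }} GROUP BY 1 HAVING count(*) > 1" --limit 10
# Orphan check — child IDs with no matching parent row
altimate-dbt execute --query "SELECT count(*) FROM {{ ref('stg_orders') }} o LEFT JOIN {{ ref('stg_customers') }} c ON o.customer_id = c.customer_id WHERE c.customer_id IS NULL" --limit 1
```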

**Step 2d: Read existing models that your new model will reference**
- Read the actual SQL of parent models — understand their logic, filters, and transformations
- Read 2-3 existing models in the same directory to match patterns and conventions

```bash
glob models/**/*.sql    # find all model files
read <model_file>       # understand existing patterns and logic
```

### 3. Write — Follow Layer Patterns

See [references/layer-patterns.md](references/layer-patterns.md) for staging/intermediate/mart templates.
See [references/medallion-architecture.md](references/medallion-architecture.md) for bronze/silver/gold patterns.
See [references/incremental-strategies.md](references/incremental-strategies.md) for incremental materialization.
See [references/yaml-generation.md](references/yaml-generation.md) for sources.yml and schema.yml.

### 4. Validate — Build, Verify, Check Impact

Never stop at writing the SQL. Always validate:

**Build it:**
```bash
altimate-dbt compile --model <name>    # catch Jinja errors
altimate-dbt build --model <name>      # materialize + run tests
```

**Verify the output:**
```bash
altimate-dbt columns --model <name>    # confirm expected columns exist
altimate-dbt execute --query "SELECT count(*) FROM {{ ref('<name>') }}" --limit 1
altimate-dbt execute --query "SELECT * FROM {{ ref('<name>') }}" --limit 10    # spot-check values
```
- Do the columns match what schema.yml or the task expects?
- Does the row count make sense? (no fan-out from bad joins, no missing rows from wrong filters)
- Are values correct? (spot-check NULLs, aggregations, date ranges)

**Check SQL quality** (on the compiled SQL from `altimate-dbt compile`):
- `sql_analyze` — catches anti-patterns (SELECT *, cartesian products, missing filters)
- `altimate_core_validate` — validates syntax and schema references
- `altimate_core_column_lineage` — traces how source columns flow to output columns. Use this to verify your SELECT is pulling the right columns from the right sources, especially for complex JOINs or multi-CTE models.

**Check downstream impact** (when modifying an existing model):
```bash
altimate-dbt children --model <name>              # who depends on this?
altimate-dbt build --model <name> --downstream    # rebuild downstream to catch breakage
```
Use `altimate-dbt children` and `altimate-dbt parents` to verify the DAG is intact when changes could affect downstream models.

## Iron Rules

1. **Never write SQL without reading the source columns first.** Use `altimate-dbt columns` or `altimate-dbt columns-source`.
2. **Never stop at compile.** Always `altimate-dbt build` to catch runtime errors.
3. **Match existing patterns.** Read 2-3 existing models in the same directory before writing.
4. **One model, one purpose.** A staging model should not contain business logic. An intermediate model should not be materialized as a table unless it has consumers.
5. **Fix ALL errors, not just yours.** After creating/modifying models, run a full `dbt build`. If ANY model fails — even pre-existing ones you didn't touch — fix them. Your job is to leave the project in a fully working state.

## Common Mistakes

| Mistake | Fix |
|---------|-----|
| Writing SQL without checking column names | Run `altimate-dbt columns` or `altimate-dbt columns-source` first |
| Stopping at `compile` — "it compiled, ship it" | Always `altimate-dbt build` to materialize and run tests |
| Hardcoding table references instead of `{{ ref() }}` | Always use `{{ ref('model') }}` or `{{ source('src', 'table') }}` |
| Creating a staging model with JOINs | Staging = 1:1 with source. JOINs belong in intermediate or mart |
| Not checking existing naming conventions | Read existing models in the same directory first |
| Using `SELECT *` in final models | Explicitly list columns for clarity and contract stability |

## Reference Guides

| Guide | Use When |
|-------|----------|
| [references/altimate-dbt-commands.md](references/altimate-dbt-commands.md) | Need the full CLI reference |
| [references/layer-patterns.md](references/layer-patterns.md) | Creating staging, intermediate, or mart models |
| [references/medallion-architecture.md](references/medallion-architecture.md) | Organizing into bronze/silver/gold layers |
| [references/incremental-strategies.md](references/incremental-strategies.md) | Converting to incremental materialization |
| [references/yaml-generation.md](references/yaml-generation.md) | Generating sources.yml or schema.yml |
| [references/common-mistakes.md](references/common-mistakes.md) | Extended anti-patterns catalog |
@@ -0,0 +1,66 @@
# altimate-dbt Command Reference

All dbt operations use the `altimate-dbt` CLI. Output is JSON to stdout; logs go to stderr.

```bash
altimate-dbt <command> [args...]
altimate-dbt <command> [args...] --format text    # Human-readable output
```

## First-Time Setup

```bash
altimate-dbt init                        # Auto-detect project root
altimate-dbt init --project-root /path   # Explicit root
altimate-dbt init --python-path /path    # Override Python
altimate-dbt doctor                      # Verify setup
altimate-dbt info                        # Project name, adapter, root
```

## Build & Run

```bash
altimate-dbt build --model <name> [--downstream]   # compile + run + test
altimate-dbt run --model <name> [--downstream]     # materialize only
altimate-dbt test --model <name>                   # run tests only
altimate-dbt build-project                         # full project build
```

## Compile

```bash
altimate-dbt compile --model <name>
altimate-dbt compile-query --query "SELECT * FROM {{ ref('stg_orders') }}" [--model <context>]
```

## Execute SQL

```bash
altimate-dbt execute --query "SELECT count(*) FROM {{ ref('orders') }}" --limit 100
```

## Schema & DAG

```bash
altimate-dbt columns --model <name>                       # column names and types
altimate-dbt columns-source --source <src> --table <tbl>  # source table columns
altimate-dbt column-values --model <name> --column <col>  # sample values
altimate-dbt children --model <name>                      # downstream models
altimate-dbt parents --model <name>                       # upstream models
```

## Packages

```bash
altimate-dbt deps                                             # install packages.yml
altimate-dbt add-packages --packages dbt-utils,dbt-expectations
```

## Error Handling

All errors return JSON with `error` and `fix` fields:
```json
{ "error": "dbt-core is not installed", "fix": "Install it: python3 -m pip install dbt-core" }
```

Run `altimate-dbt doctor` as the first diagnostic step for any failure.
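Because results are JSON on stdout and logs go to stderr, command output can be post-processed with standard tooling. A minimal sketch of extracting the `fix` field, with the payload hard-coded here for illustration (in practice it would come from an `altimate-dbt` call):

```shell
# Hypothetical error payload, matching the documented shape
payload='{"error":"dbt-core is not installed","fix":"Install it: python3 -m pip install dbt-core"}'
# Pull out the suggested remediation
printf '%s' "$payload" | python3 -c 'import json,sys; print(json.load(sys.stdin)["fix"])'
```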
@@ -0,0 +1,49 @@
# Common dbt Development Mistakes

## SQL Mistakes

| Mistake | Why It's Wrong | Fix |
|---------|---------------|-----|
| `SELECT *` in mart/fact models | Breaks contracts when upstream adds columns | Explicitly list all columns |
| Hardcoded table names | Breaks when schema/database changes | Use `{{ ref() }}` and `{{ source() }}` |
| Business logic in staging models | Staging = 1:1 source mirror | Move logic to intermediate layer |
| JOINs in staging models | Same as above | Move JOINs to intermediate layer |
| `LEFT JOIN` when `INNER JOIN` is correct | Returns NULLs for unmatched rows | Think about what NULL means for your use case |
| Missing `GROUP BY` columns | Query fails or returns wrong results | Every non-aggregate column must be in GROUP BY |
| Window functions without `PARTITION BY` | Aggregates across entire table | Add appropriate partitioning |
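The last row above in concrete form (table and column names are illustrative):

```sql
select
    order_id,
    customer_id,
    -- Wrong: sum(amount) over () totals the whole table for every row
    -- Right: partition by the entity you mean to aggregate within
    sum(amount) over (partition by customer_id) as customer_total
from {{ ref('stg_orders') }}
```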

## Project Structure Mistakes

| Mistake | Fix |
|---------|-----|
| Models in wrong layer directory | staging/ for source mirrors, intermediate/ for transforms, marts/ for business tables |
| No schema.yml for new models | Always create companion YAML with at minimum `unique` + `not_null` on primary key |
| Naming doesn't match convention | Check existing models: `stg_`, `int_`, `fct_`, `dim_` prefixes |
| Missing `{{ config() }}` block | Every model should declare materialization explicitly or inherit from dbt_project.yml |
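A minimal companion YAML satisfying the schema.yml rule above (model and column names are illustrative):

```yaml
version: 2
models:
  - name: stg_source__orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
```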

## Incremental Mistakes

| Mistake | Fix |
|---------|-----|
| No `unique_key` on merge strategy | Causes duplicates. Set `unique_key` to your primary key |
| Using `created_at` for mutable records | Use `updated_at` — `created_at` misses updates |
| No lookback window | Use `max(ts) - interval '1 hour'` to catch late-arriving data |
| Forgetting `is_incremental()` returns false on first run | The `WHERE` clause only applies after first run |
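The last two rows combined, as they'd appear in a model body (column name illustrative):

```sql
{% if is_incremental() %}
  -- Applies only after the first build; the one-hour lookback catches late-arriving rows
  where updated_at > (select max(updated_at) - interval '1 hour' from {{ this }})
{% endif %}
```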

## Validation Mistakes

| Mistake | Fix |
|---------|-----|
| "It compiled, ship it" | Compilation only checks Jinja syntax. Always `altimate-dbt build` |
| Not spot-checking output data | Run `altimate-dbt execute --query "SELECT * FROM {{ ref('model') }}" --limit 10` |
| Not checking row counts | Compare source vs output: `SELECT count(*) FROM {{ ref('model') }}` |
| Skipping downstream builds | Use `altimate-dbt build --model <name> --downstream` |

## Rationalizations to Resist

| You're Thinking... | Reality |
|--------------------|---------|
| "I'll add tests later" | You won't. Add them now. |
| "SELECT * is fine for now" | It will break when upstream changes. List columns explicitly. |
| "This model is temporary" | Nothing is more permanent than a temporary solution. |
| "The data looks right at a glance" | Run the build. Check the tests. Spot-check edge cases. |
@@ -0,0 +1,118 @@
# Incremental Materialization Strategies

## When to Use Incremental

Use incremental when:
- Table has millions+ rows
- Source data is append-only or has reliable `updated_at` timestamps
- Full refreshes take too long or cost too much

Do NOT use incremental when:
- Table is small (< 1M rows)
- Source data doesn't have reliable timestamps
- Logic requires full-table window functions

## Strategy Decision Tree

```
Is the data append-only (events, logs)?
  YES → Append strategy
  NO  → Can rows be updated?
    YES → Does your warehouse support MERGE?
      YES → Merge/Upsert strategy
      NO  → Delete+Insert strategy
    NO  → Is data date-partitioned?
      YES → Insert Overwrite strategy
      NO  → Append with dedup
```

## Append (Event Logs)

```sql
{{ config(
    materialized='incremental',
    on_schema_change='append_new_columns'
) }}

select
    event_id,
    event_type,
    created_at
from {{ ref('stg_events') }}

{% if is_incremental() %}
where created_at > (select max(created_at) from {{ this }})
{% endif %}
```

## Merge/Upsert (Mutable Records)

```sql
{{ config(
    materialized='incremental',
    unique_key='order_id',
    merge_update_columns=['status', 'updated_at', 'amount'],
    on_schema_change='sync_all_columns'
) }}

select
    order_id,
    status,
    amount,
    created_at,
    updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

## Insert Overwrite (Partitioned)

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by={'field': 'event_date', 'data_type': 'date'},
    on_schema_change='fail'
) }}

select
    date_trunc('day', created_at) as event_date,
    count(*) as event_count
from {{ ref('stg_events') }}

{% if is_incremental() %}
where date_trunc('day', created_at) >= (select max(event_date) - interval '3 days' from {{ this }})
{% endif %}

group by 1
```

## Common Pitfalls

| Issue | Problem | Fix |
|-------|---------|-----|
| Missing `unique_key` | Duplicates on re-run | Add `unique_key` matching the primary key |
| Wrong timestamp column | Missed updates | Use `updated_at` not `created_at` for mutable data |
| No lookback window | Late-arriving data missed | `max(ts) - interval '1 hour'` instead of strict `>` |
| `on_schema_change='fail'` | Breaks on column additions | Use `'append_new_columns'` or `'sync_all_columns'` |
| Full refresh needed | Schema drift accumulated | `altimate-dbt run --model <name>` with `--full-refresh` flag |

## Official Documentation

For the latest syntax and adapter-specific options, refer to:
- **dbt incremental models**: https://docs.getdbt.com/docs/build/incremental-models
- **Incremental strategies by adapter**: https://docs.getdbt.com/docs/build/incremental-strategy
- **Configuring incremental models**: https://docs.getdbt.com/reference/resource-configs/materialized#incremental

## Warehouse Support

| Warehouse | Default Strategy | Merge | Partition | Notes |
|-----------|-----------------|-------|-----------|-------|
| Snowflake | `merge` | Yes | Cluster keys | Best incremental support |
| BigQuery | `merge` | Yes | `partition_by` | Requires partition for insert_overwrite |
| PostgreSQL | `append` | No | No | Use delete+insert pattern |
| DuckDB | `append` | Partial | No | Limited incremental support |
| Redshift | `append` | No | dist/sort keys | Use delete+insert pattern |
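For the warehouses above that lack MERGE, the delete+insert pattern is mostly a config change (a sketch; model and key names are illustrative):

```sql
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='order_id'
) }}

select order_id, status, amount, updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
-- matching rows are deleted from the target, then the new batch is inserted
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```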
@@ -0,0 +1,158 @@
# dbt Layer Patterns

## Staging (`stg_`)

**Purpose**: 1:1 with source table. Rename, cast, no joins, no business logic.
**Materialization**: `view`
**Naming**: `stg_<source>__<table>.sql`
**Location**: `models/staging/<source>/`

```sql
with source as (
    select * from {{ source('source_name', 'table_name') }}
),

renamed as (
    select
        -- Primary key
        column_id as table_id,

        -- Dimensions
        column_name,

        -- Timestamps
        created_at,
        updated_at

    from source
)

select * from renamed
```

**Rules**:
- One CTE named `source`, one named `renamed` (or `casted`, `cleaned` — avoid the reserved word `cast`)
- Only type casting, renaming, deduplication
- No joins, no filters (except dedup), no business logic
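When the source does contain duplicates, a dedup step keeps staging at the intended grain. A sketch of the CTE that would follow `renamed`, on warehouses that support `QUALIFY` (e.g., Snowflake, BigQuery, DuckDB):

```sql
deduped as (
    select *
    from renamed
    -- keep only the latest version of each record
    qualify row_number() over (partition by table_id order by updated_at desc) = 1
)
```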

## Intermediate (`int_`)

**Purpose**: Business logic, joins, transformations between staging and marts.
**Materialization**: `ephemeral` or `view`
**Naming**: `int_<entity>__<verb>.sql` (e.g., `int_orders__joined`, `int_payments__pivoted`)
**Location**: `models/intermediate/`

```sql
with orders as (
    select * from {{ ref('stg_source__orders') }}
),

customers as (
    select * from {{ ref('stg_source__customers') }}
),

joined as (
    select
        orders.order_id,
        orders.customer_id,
        customers.customer_name,
        orders.order_date,
        orders.amount
    from orders
    left join customers on orders.customer_id = customers.customer_id
)

select * from joined
```

**Rules**:
- Cross-source joins allowed
- Business logic transformations
- Not exposed to end users directly
- Name the verb: `__joined`, `__pivoted`, `__filtered`, `__aggregated`

## Mart: Facts (`fct_`)

**Purpose**: Business events. Immutable, timestamped, narrow.
**Materialization**: `table` or `incremental`
**Naming**: `fct_<entity>.sql`
**Location**: `models/marts/<domain>/`

```sql
with final as (
    select
        order_id,
        customer_id,
        order_date,
        amount,
        discount_amount,
        amount - discount_amount as net_amount
    from {{ ref('int_orders__joined') }}
)

select * from final
```

## Mart: Dimensions (`dim_`)

**Purpose**: Descriptive attributes. Slowly changing, wide.
**Materialization**: `table`
**Naming**: `dim_<entity>.sql`

```sql
with final as (
    select
        customer_id,
        customer_name,
        email,
        segment,
        first_order_date,
        most_recent_order_date,
        lifetime_order_count
    from {{ ref('int_customers__aggregated') }}
)

select * from final
```

## One Big Table (`obt_`)

**Purpose**: Denormalized wide table combining fact + dimensions for BI consumption.
**Materialization**: `table`
**Naming**: `obt_<entity>.sql`

```sql
with facts as (
    select * from {{ ref('fct_orders') }}
),

customers as (
    select * from {{ ref('dim_customers') }}
),

dates as (
    select * from {{ ref('dim_dates') }}
),

final as (
    select
        facts.*,
        customers.customer_name,
        customers.segment,
        dates.day_of_week,
        dates.month_name,
        dates.is_weekend
    from facts
    left join customers on facts.customer_id = customers.customer_id
    left join dates on facts.order_date = dates.date_day
)

select * from final
```

## CTE Style Guide

- Name CTEs after what they contain, not what they do: `orders` not `get_orders`
- Use `final` as the last CTE name
- One CTE per source model (via `ref()` or `source()`)
- End every model with `select * from final`
@@ -0,0 +1,125 @@
# Medallion Architecture (Bronze / Silver / Gold)

An alternative to staging/intermediate/mart layering. Common in Databricks and lakehouse environments.

## Layer Mapping

| Medallion | Traditional dbt | Purpose |
|-----------|----------------|---------|
| Bronze | Staging (`stg_`) | Raw ingestion, minimal transform |
| Silver | Intermediate (`int_`) | Cleaned, conformed, joined |
| Gold | Marts (`fct_`/`dim_`) | Business-ready aggregations |

## Directory Structure

```
models/
  bronze/
    source_system/
      _source_system__sources.yml
      brz_source_system__table.sql
  silver/
    domain/
      slv_domain__entity.sql
  gold/
    domain/
      fct_metric.sql
      dim_entity.sql
```

## Bronze (Raw)

```sql
-- brz_stripe__payments.sql
{{ config(materialized='view') }}

with source as (
    select * from {{ source('stripe', 'payments') }}
),

casted as (
    select
        cast(id as varchar) as payment_id,
        cast(customer as varchar) as customer_id,
        cast(amount as integer) as amount_cents,
        cast(created as timestamp) as created_at,
        _loaded_at
    from source
)

select * from casted
```

**Rules**: 1:1 with source, only cast/rename, no joins, `view` materialization.

## Silver (Cleaned)

```sql
-- slv_finance__orders_enriched.sql
{{ config(materialized='table') }}

with orders as (
    select * from {{ ref('brz_stripe__payments') }}
),

customers as (
    select * from {{ ref('brz_crm__customers') }}
),

enriched as (
    select
        o.payment_id,
        o.amount_cents / 100.0 as amount_dollars,
        c.customer_name,
        c.segment,
        o.created_at
    from orders o
    left join customers c on o.customer_id = c.customer_id
    where o.created_at is not null
)

select * from enriched
```

**Rules**: Cross-source joins, business logic, quality filters, `table` materialization.

## Gold (Business-Ready)

```sql
-- fct_daily_revenue.sql
{{ config(materialized='table') }}

with daily as (
    select
        date_trunc('day', created_at) as revenue_date,
        segment,
        count(*) as order_count,
        sum(amount_dollars) as gross_revenue
    from {{ ref('slv_finance__orders_enriched') }}
    group by 1, 2
)

select * from daily
```

**Rules**: Aggregations, metrics, KPIs, `table` or `incremental` materialization.

## dbt_project.yml Config

```yaml
models:
  my_project:
    bronze:
      +materialized: view
    silver:
      +materialized: table
    gold:
      +materialized: table
```

## When to Use Medallion vs Traditional

| Use Medallion When | Use Traditional When |
|-------------------|---------------------|
| Databricks/lakehouse environment | dbt Cloud or Snowflake-centric |
| Team already uses bronze/silver/gold terminology | Team uses staging/intermediate/mart |
| Data platform team maintains bronze layer | Analytics engineers own the full stack |