@altimateai/altimate-code 0.4.9 → 0.5.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +36 -0
- package/package.json +54 -14
- package/postinstall.mjs +35 -0
- package/skills/cost-report/SKILL.md +134 -0
- package/skills/data-viz/SKILL.md +135 -0
- package/skills/data-viz/references/component-guide.md +394 -0
- package/skills/dbt-analyze/SKILL.md +130 -0
- package/skills/dbt-analyze/references/altimate-dbt-commands.md +66 -0
- package/skills/dbt-analyze/references/lineage-interpretation.md +58 -0
- package/skills/dbt-develop/SKILL.md +151 -0
- package/skills/dbt-develop/references/altimate-dbt-commands.md +66 -0
- package/skills/dbt-develop/references/common-mistakes.md +49 -0
- package/skills/dbt-develop/references/incremental-strategies.md +118 -0
- package/skills/dbt-develop/references/layer-patterns.md +158 -0
- package/skills/dbt-develop/references/medallion-architecture.md +125 -0
- package/skills/dbt-develop/references/yaml-generation.md +90 -0
- package/skills/dbt-docs/SKILL.md +99 -0
- package/skills/dbt-docs/references/altimate-dbt-commands.md +66 -0
- package/skills/dbt-docs/references/documentation-standards.md +94 -0
- package/skills/dbt-test/SKILL.md +121 -0
- package/skills/dbt-test/references/altimate-dbt-commands.md +66 -0
- package/skills/dbt-test/references/custom-tests.md +59 -0
- package/skills/dbt-test/references/schema-test-patterns.md +103 -0
- package/skills/dbt-test/references/unit-test-guide.md +121 -0
- package/skills/dbt-troubleshoot/SKILL.md +187 -0
- package/skills/dbt-troubleshoot/references/altimate-dbt-commands.md +66 -0
- package/skills/dbt-troubleshoot/references/compilation-errors.md +57 -0
- package/skills/dbt-troubleshoot/references/runtime-errors.md +71 -0
- package/skills/dbt-troubleshoot/references/test-failures.md +95 -0
- package/skills/lineage-diff/SKILL.md +64 -0
- package/skills/pii-audit/SKILL.md +117 -0
- package/skills/query-optimize/SKILL.md +86 -0
- package/skills/schema-migration/SKILL.md +119 -0
- package/skills/sql-review/SKILL.md +118 -0
- package/skills/sql-translate/SKILL.md +68 -0
- package/skills/teach/SKILL.md +54 -0
- package/skills/train/SKILL.md +51 -0
- package/skills/training-status/SKILL.md +45 -0
@@ -0,0 +1,151 @@
+---
+name: dbt-develop
+description: Create and modify dbt models — staging, intermediate, marts, incremental, medallion architecture. Use when building new SQL models, extending existing ones, scaffolding YAML configs, or reorganizing project structure. Powered by altimate-dbt.
+---
+
+# dbt Model Development
+
+## Requirements
+**Agent:** builder or migrator (requires file write access)
+**Tools used:** bash (runs `altimate-dbt` commands), read, glob, write, edit, schema_search, dbt_profiles, sql_analyze, altimate_core_validate, altimate_core_column_lineage
+
+## When to Use This Skill
+
+**Use when the user wants to:**
+- Create a new dbt model (staging, intermediate, mart, OBT)
+- Add or modify SQL logic in an existing model
+- Generate sources.yml or schema.yml from warehouse metadata
+- Reorganize models into layers (staging/intermediate/mart or bronze/silver/gold)
+- Convert a model to incremental materialization
+- Scaffold a new dbt project structure
+
+**Do NOT use for:**
+- Adding tests to models → use `dbt-test`
+- Writing model/column descriptions → use `dbt-docs`
+- Debugging build failures → use `dbt-troubleshoot`
+- Analyzing change impact → use `dbt-analyze`
+
+## Core Workflow: Plan → Discover → Write → Validate
+
+### 1. Plan — Understand Before Writing
+
+Before writing any SQL:
+- Read the task requirements carefully
+- Identify which layer this model belongs to (staging, intermediate, mart)
+- Check existing models for naming conventions and patterns
+- **Check dependencies:** If `packages.yml` exists, check for `dbt_packages/` or `package-lock.yml`. Only run `dbt deps` if packages are declared but not yet installed.
+
+```bash
+altimate-dbt info                      # project name, adapter type
+altimate-dbt parents --model <name>    # understand what feeds this model
+altimate-dbt children --model <name>   # understand what consumes it
+```
+
+**Check warehouse connection:** Run `dbt_profiles` to discover available profiles and map them to warehouse connections. This tells you which adapter (Snowflake, BigQuery, Postgres, etc.) and target the project uses — essential for dialect-aware SQL.
+
+### 2. Discover — Understand the Data Before Writing
+
+**Never write SQL without deeply understanding your data first.** The #1 cause of wrong results is writing SQL blind — assuming grain, relationships, column names, or values without checking.
+
+**Step 2a: Search for relevant tables and columns**
+- Use `schema_search` with natural-language queries to find tables/columns in large warehouses (e.g., `schema_search(query: "customer orders")` returns matching tables and columns from the indexed schema cache)
+- Read `sources.yml`, `schema.yml`, and any YAML files that describe the source/parent models
+- These contain column descriptions, data types, tests, and business context
+- Pay special attention to: primary keys, unique constraints, relationships between tables, and what each column represents
+
+**Step 2b: Understand the grain of each parent model/source**
+- What does one row represent? (one customer? one event? one day per customer?)
+- What are the primary/unique keys?
+- This is critical for JOINs — joining on the wrong grain causes fan-out (too many rows) or missing rows
+
+```bash
+altimate-dbt columns --model <name>                        # existing model columns
+altimate-dbt columns-source --source <src> --table <tbl>   # source table columns
+altimate-dbt execute --query "SELECT count(*) FROM {{ ref('model') }}" --limit 1
+altimate-dbt execute --query "SELECT * FROM {{ ref('model') }}" --limit 5
+altimate-dbt column-values --model <name> --column <col>   # sample values for key columns
+```
+
+**Step 2c: Query the actual data to verify your understanding**
+- Check row counts, NULLs, date ranges, cardinality of key columns
+- Verify foreign key relationships actually hold (do all IDs in the child exist in the parent?)
+- Check for duplicates in what you think are unique keys
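A quick way to test a grain assumption before joining — a hypothetical duplicate check (model and key names are illustrative), run through `altimate-dbt execute --query "..."`: any rows returned mean the assumed key is not unique.

```sql
-- Hypothetical grain check: find rows that share the assumed unique key.
-- Zero rows back means the grain assumption holds.
select
    order_id,
    count(*) as row_count
from {{ ref('stg_orders') }}
group by order_id
having count(*) > 1
```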
+
+**Step 2d: Read existing models that your new model will reference**
+- Read the actual SQL of parent models — understand their logic, filters, and transformations
+- Read 2-3 existing models in the same directory to match patterns and conventions
+
+```bash
+glob models/**/*.sql   # find all model files
+read <model_file>      # understand existing patterns and logic
+```
+
+### 3. Write — Follow Layer Patterns
+
+See [references/layer-patterns.md](references/layer-patterns.md) for staging/intermediate/mart templates.
+See [references/medallion-architecture.md](references/medallion-architecture.md) for bronze/silver/gold patterns.
+See [references/incremental-strategies.md](references/incremental-strategies.md) for incremental materialization.
+See [references/yaml-generation.md](references/yaml-generation.md) for sources.yml and schema.yml.
+
+### 4. Validate — Build, Verify, Check Impact
+
+Never stop at writing the SQL. Always validate:
+
+**Build it:**
+```bash
+altimate-dbt compile --model <name>   # catch Jinja errors
+altimate-dbt build --model <name>     # materialize + run tests
+```
+
+**Verify the output:**
+```bash
+altimate-dbt columns --model <name>   # confirm expected columns exist
+altimate-dbt execute --query "SELECT count(*) FROM {{ ref('<name>') }}" --limit 1
+altimate-dbt execute --query "SELECT * FROM {{ ref('<name>') }}" --limit 10   # spot-check values
+```
+- Do the columns match what schema.yml or the task expects?
+- Does the row count make sense? (no fan-out from bad joins, no missing rows from wrong filters)
+- Are values correct? (spot-check NULLs, aggregations, date ranges)
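One concrete way to check the row count for fan-out — a hypothetical comparison with illustrative model names: unless the logic intentionally changes the grain, the output should not have more rows than the table driving the join.

```sql
-- Hypothetical fan-out check: if output_rows > driving_rows,
-- a join is multiplying rows and the grain is wrong.
select
    (select count(*) from {{ ref('stg_orders') }}) as driving_rows,
    (select count(*) from {{ ref('fct_orders') }}) as output_rows
```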
+
+**Check SQL quality** (on the compiled SQL from `altimate-dbt compile`):
+- `sql_analyze` — catches anti-patterns (SELECT *, cartesian products, missing filters)
+- `altimate_core_validate` — validates syntax and schema references
+- `altimate_core_column_lineage` — traces how source columns flow to output columns. Use this to verify your SELECT is pulling the right columns from the right sources, especially for complex JOINs or multi-CTE models.
+
+**Check downstream impact** (when modifying an existing model):
+```bash
+altimate-dbt children --model <name>             # who depends on this?
+altimate-dbt build --model <name> --downstream   # rebuild downstream to catch breakage
+```
+Use `altimate-dbt children` and `altimate-dbt parents` to verify the DAG is intact when changes could affect downstream models.
+
+## Iron Rules
+
+1. **Never write SQL without reading the source columns first.** Use `altimate-dbt columns` or `altimate-dbt columns-source`.
+2. **Never stop at compile.** Always `altimate-dbt build` to catch runtime errors.
+3. **Match existing patterns.** Read 2-3 existing models in the same directory before writing.
+4. **One model, one purpose.** A staging model should not contain business logic. An intermediate model should not be materialized as a table unless it has consumers.
+5. **Fix ALL errors, not just yours.** After creating/modifying models, run a full `dbt build`. If ANY model fails — even pre-existing ones you didn't touch — fix them. Your job is to leave the project in a fully working state.
+
+## Common Mistakes
+
+| Mistake | Fix |
+|---------|-----|
+| Writing SQL without checking column names | Run `altimate-dbt columns` or `altimate-dbt columns-source` first |
+| Stopping at `compile` — "it compiled, ship it" | Always `altimate-dbt build` to materialize and run tests |
+| Hardcoding table references instead of `{{ ref() }}` | Always use `{{ ref('model') }}` or `{{ source('src', 'table') }}` |
+| Creating a staging model with JOINs | Staging = 1:1 with source. JOINs belong in intermediate or mart |
+| Not checking existing naming conventions | Read existing models in the same directory first |
+| Using `SELECT *` in final models | Explicitly list columns for clarity and contract stability |
+
+## Reference Guides
+
+| Guide | Use When |
+|-------|----------|
+| [references/altimate-dbt-commands.md](references/altimate-dbt-commands.md) | Need the full CLI reference |
+| [references/layer-patterns.md](references/layer-patterns.md) | Creating staging, intermediate, or mart models |
+| [references/medallion-architecture.md](references/medallion-architecture.md) | Organizing into bronze/silver/gold layers |
+| [references/incremental-strategies.md](references/incremental-strategies.md) | Converting to incremental materialization |
+| [references/yaml-generation.md](references/yaml-generation.md) | Generating sources.yml or schema.yml |
+| [references/common-mistakes.md](references/common-mistakes.md) | Extended anti-patterns catalog |

@@ -0,0 +1,66 @@
+# altimate-dbt Command Reference
+
+All dbt operations use the `altimate-dbt` CLI. Output is JSON to stdout; logs go to stderr.
+
+```bash
+altimate-dbt <command> [args...]
+altimate-dbt <command> [args...] --format text   # Human-readable output
+```
+
+## First-Time Setup
+
+```bash
+altimate-dbt init                        # Auto-detect project root
+altimate-dbt init --project-root /path   # Explicit root
+altimate-dbt init --python-path /path    # Override Python
+altimate-dbt doctor                      # Verify setup
+altimate-dbt info                        # Project name, adapter, root
+```
+
+## Build & Run
+
+```bash
+altimate-dbt build --model <name> [--downstream]   # compile + run + test
+altimate-dbt run --model <name> [--downstream]     # materialize only
+altimate-dbt test --model <name>                   # run tests only
+altimate-dbt build-project                         # full project build
+```
+
+## Compile
+
+```bash
+altimate-dbt compile --model <name>
+altimate-dbt compile-query --query "SELECT * FROM {{ ref('stg_orders') }}" [--model <context>]
+```
+
+## Execute SQL
+
+```bash
+altimate-dbt execute --query "SELECT count(*) FROM {{ ref('orders') }}" --limit 100
+```
+
+## Schema & DAG
+
+```bash
+altimate-dbt columns --model <name>                        # column names and types
+altimate-dbt columns-source --source <src> --table <tbl>   # source table columns
+altimate-dbt column-values --model <name> --column <col>   # sample values
+altimate-dbt children --model <name>                       # downstream models
+altimate-dbt parents --model <name>                        # upstream models
+```
+
+## Packages
+
+```bash
+altimate-dbt deps   # install packages.yml
+altimate-dbt add-packages --packages dbt-utils,dbt-expectations
+```
+
+## Error Handling
+
+All errors return JSON with `error` and `fix` fields:
+```json
+{ "error": "dbt-core is not installed", "fix": "Install it: python3 -m pip install dbt-core" }
+```
+
+Run `altimate-dbt doctor` as the first diagnostic step for any failure.

@@ -0,0 +1,49 @@
+# Common dbt Development Mistakes
+
+## SQL Mistakes
+
+| Mistake | Why It's Wrong | Fix |
+|---------|---------------|-----|
+| `SELECT *` in mart/fact models | Breaks contracts when upstream adds columns | Explicitly list all columns |
+| Hardcoded table names | Breaks when schema/database changes | Use `{{ ref() }}` and `{{ source() }}` |
+| Business logic in staging models | Staging = 1:1 source mirror | Move logic to intermediate layer |
+| JOINs in staging models | Same as above | Move JOINs to intermediate layer |
+| `LEFT JOIN` when `INNER JOIN` is correct | Returns NULLs for unmatched rows | Think about what NULL means for your use case |
+| Missing `GROUP BY` columns | Query fails or returns wrong results | Every non-aggregate column must be in GROUP BY |
+| Window functions without `PARTITION BY` | Aggregates across the entire table | Add appropriate partitioning |
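The last row is easiest to see side by side — a minimal sketch with illustrative names: without `partition by`, the window runs over every row in the table; partitioned, it restarts per customer, which is usually what was meant.

```sql
-- Hypothetical illustration of the PARTITION BY row above.
select
    order_id,
    customer_id,
    -- one running total across the whole table:
    sum(amount) over (order by order_date) as running_total_all_rows,
    -- a running total per customer:
    sum(amount) over (partition by customer_id order by order_date) as running_total_per_customer
from {{ ref('stg_orders') }}
```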
+
+## Project Structure Mistakes
+
+| Mistake | Fix |
+|---------|-----|
+| Models in wrong layer directory | staging/ for source mirrors, intermediate/ for transforms, marts/ for business tables |
+| No schema.yml for new models | Always create companion YAML with at minimum `unique` + `not_null` on the primary key |
+| Naming doesn't match convention | Check existing models: `stg_`, `int_`, `fct_`, `dim_` prefixes |
+| Missing `{{ config() }}` block | Every model should declare its materialization explicitly or inherit it from dbt_project.yml |
+
+## Incremental Mistakes
+
+| Mistake | Fix |
+|---------|-----|
+| No `unique_key` on merge strategy | Causes duplicates. Set `unique_key` to your primary key |
+| Using `created_at` for mutable records | Use `updated_at` — `created_at` misses updates |
+| No lookback window | Use `max(ts) - interval '1 hour'` to catch late-arriving data |
+| Forgetting `is_incremental()` returns false on first run | The `WHERE` clause only applies after the first run |
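The lookback advice from the table, sketched as an incremental `where` clause (column name is illustrative; interval syntax varies by warehouse dialect): widening the cutoff by an hour re-processes a small overlap so late-arriving rows aren't silently skipped.

```sql
-- Hypothetical lookback filter; the merge strategy's unique_key
-- deduplicates the re-processed overlap.
{% if is_incremental() %}
where updated_at > (select max(updated_at) - interval '1 hour' from {{ this }})
{% endif %}
```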
+
+## Validation Mistakes
+
+| Mistake | Fix |
+|---------|-----|
+| "It compiled, ship it" | Compilation only checks Jinja syntax. Always `altimate-dbt build` |
+| Not spot-checking output data | Run `altimate-dbt execute --query "SELECT * FROM {{ ref('model') }}" --limit 10` |
+| Not checking row counts | Compare source vs output: `SELECT count(*) FROM {{ ref('model') }}` |
+| Skipping downstream builds | Use `altimate-dbt build --model <name> --downstream` |
+
+## Rationalizations to Resist
+
+| You're Thinking... | Reality |
+|--------------------|---------|
+| "I'll add tests later" | You won't. Add them now. |
+| "SELECT * is fine for now" | It will break when upstream changes. List columns explicitly. |
+| "This model is temporary" | Nothing is more permanent than a temporary solution. |
+| "The data looks right at a glance" | Run the build. Check the tests. Spot-check edge cases. |

@@ -0,0 +1,118 @@
+# Incremental Materialization Strategies
+
+## When to Use Incremental
+
+Use incremental when:
+- The table has millions+ rows
+- Source data is append-only or has reliable `updated_at` timestamps
+- Full refreshes take too long or cost too much
+
+Do NOT use incremental when:
+- The table is small (< 1M rows)
+- Source data doesn't have reliable timestamps
+- Logic requires full-table window functions
+
+## Strategy Decision Tree
+
+```
+Is the data append-only (events, logs)?
+  YES → Append strategy
+  NO  → Can rows be updated?
+    YES → Does your warehouse support MERGE?
+      YES → Merge/Upsert strategy
+      NO  → Delete+Insert strategy
+    NO  → Is data date-partitioned?
+      YES → Insert Overwrite strategy
+      NO  → Append with dedup
+```
+
+## Append (Event Logs)
+
+```sql
+{{ config(
+    materialized='incremental',
+    on_schema_change='append_new_columns'
+) }}
+
+select
+    event_id,
+    event_type,
+    created_at
+from {{ ref('stg_events') }}
+
+{% if is_incremental() %}
+where created_at > (select max(created_at) from {{ this }})
+{% endif %}
+```
+
+## Merge/Upsert (Mutable Records)
+
+```sql
+{{ config(
+    materialized='incremental',
+    unique_key='order_id',
+    merge_update_columns=['status', 'updated_at', 'amount'],
+    on_schema_change='sync_all_columns'
+) }}
+
+select
+    order_id,
+    status,
+    amount,
+    created_at,
+    updated_at
+from {{ ref('stg_orders') }}
+
+{% if is_incremental() %}
+where updated_at > (select max(updated_at) from {{ this }})
+{% endif %}
+```
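For warehouses without MERGE (the Delete+Insert branch of the decision tree — e.g., Postgres or Redshift per the support table below), a minimal sketch with illustrative names: dbt's `delete+insert` strategy deletes existing rows matching the incoming `unique_key` values, then inserts the new batch.

```sql
-- Hypothetical delete+insert variant of the merge example above.
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='order_id'
) }}

select
    order_id,
    status,
    amount,
    updated_at
from {{ ref('stg_orders') }}

{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```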
+
+## Insert Overwrite (Partitioned)
+
+```sql
+{{ config(
+    materialized='incremental',
+    incremental_strategy='insert_overwrite',
+    partition_by={'field': 'event_date', 'data_type': 'date'},
+    on_schema_change='fail'
+) }}
+
+select
+    date_trunc('day', created_at) as event_date,
+    count(*) as event_count
+from {{ ref('stg_events') }}
+
+{% if is_incremental() %}
+where date_trunc('day', created_at) >= (select max(event_date) - interval '3 days' from {{ this }})
+{% endif %}
+
+group by 1
+```
+
+## Common Pitfalls
+
+| Issue | Problem | Fix |
+|-------|---------|-----|
+| Missing `unique_key` | Duplicates on re-run | Add `unique_key` matching the primary key |
+| Wrong timestamp column | Missed updates | Use `updated_at`, not `created_at`, for mutable data |
+| No lookback window | Late-arriving data missed | `max(ts) - interval '1 hour'` instead of a strict `>` |
+| `on_schema_change='fail'` | Breaks on column additions | Use `'append_new_columns'` or `'sync_all_columns'` |
+| Full refresh needed | Schema drift accumulated | `altimate-dbt run --model <name>` with the `--full-refresh` flag |
+
+## Official Documentation
+
+For the latest syntax and adapter-specific options, refer to:
+- **dbt incremental models**: https://docs.getdbt.com/docs/build/incremental-models
+- **Incremental strategies by adapter**: https://docs.getdbt.com/docs/build/incremental-strategy
+- **Configuring incremental models**: https://docs.getdbt.com/reference/resource-configs/materialized#incremental
+
+## Warehouse Support
+
+| Warehouse | Default Strategy | Merge | Partition | Notes |
+|-----------|-----------------|-------|-----------|-------|
+| Snowflake | `merge` | Yes | Cluster keys | Best incremental support |
+| BigQuery | `merge` | Yes | `partition_by` | Requires a partition for insert_overwrite |
+| PostgreSQL | `append` | No | No | Use the delete+insert pattern |
+| DuckDB | `append` | Partial | No | Limited incremental support |
+| Redshift | `append` | No | dist/sort keys | Use the delete+insert pattern |

@@ -0,0 +1,158 @@
+# dbt Layer Patterns
+
+## Staging (`stg_`)
+
+**Purpose**: 1:1 with the source table. Rename, cast, no joins, no business logic.
+**Materialization**: `view`
+**Naming**: `stg_<source>__<table>.sql`
+**Location**: `models/staging/<source>/`
+
+```sql
+with source as (
+    select * from {{ source('source_name', 'table_name') }}
+),
+
+renamed as (
+    select
+        -- Primary key
+        column_id as table_id,
+
+        -- Dimensions
+        column_name,
+
+        -- Timestamps
+        created_at,
+        updated_at
+
+    from source
+)
+
+select * from renamed
+```
+
+**Rules**:
+- One CTE named `source`, one named `renamed` (or `cast`, `cleaned`)
+- Only type casting, renaming, deduplication
+- No joins, no filters (except dedup), no business logic
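Since deduplication is the one filter staging allows, a sketch of a dedup CTE that could slot between `renamed` and the final select — names are illustrative, and `qualify` is warehouse-specific (Snowflake, BigQuery, DuckDB); elsewhere, put the `row_number()` in a subquery and filter on it.

```sql
deduplicated as (
    -- Hypothetical dedup: keep the latest row per primary key.
    select *
    from renamed
    qualify row_number() over (
        partition by table_id
        order by updated_at desc
    ) = 1
)
```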
+
+## Intermediate (`int_`)
+
+**Purpose**: Business logic, joins, transformations between staging and marts.
+**Materialization**: `ephemeral` or `view`
+**Naming**: `int_<entity>__<verb>.sql` (e.g., `int_orders__joined`, `int_payments__pivoted`)
+**Location**: `models/intermediate/`
+
+```sql
+with orders as (
+    select * from {{ ref('stg_source__orders') }}
+),
+
+customers as (
+    select * from {{ ref('stg_source__customers') }}
+),
+
+joined as (
+    select
+        orders.order_id,
+        orders.customer_id,
+        customers.customer_name,
+        orders.order_date,
+        orders.amount
+    from orders
+    left join customers on orders.customer_id = customers.customer_id
+)
+
+select * from joined
+```
+
+**Rules**:
+- Cross-source joins allowed
+- Business logic transformations
+- Not exposed to end users directly
+- Name the verb: `__joined`, `__pivoted`, `__filtered`, `__aggregated`
+
+## Mart: Facts (`fct_`)
+
+**Purpose**: Business events. Immutable, timestamped, narrow.
+**Materialization**: `table` or `incremental`
+**Naming**: `fct_<entity>.sql`
+**Location**: `models/marts/<domain>/`
+
+```sql
+with final as (
+    select
+        order_id,
+        customer_id,
+        order_date,
+        amount,
+        discount_amount,
+        amount - discount_amount as net_amount
+    from {{ ref('int_orders__joined') }}
+)
+
+select * from final
+```
+
+## Mart: Dimensions (`dim_`)
+
+**Purpose**: Descriptive attributes. Slowly changing, wide.
+**Materialization**: `table`
+**Naming**: `dim_<entity>.sql`
+
+```sql
+with final as (
+    select
+        customer_id,
+        customer_name,
+        email,
+        segment,
+        first_order_date,
+        most_recent_order_date,
+        lifetime_order_count
+    from {{ ref('int_customers__aggregated') }}
+)
+
+select * from final
+```
+
+## One Big Table (`obt_`)
+
+**Purpose**: Denormalized wide table combining a fact with its dimensions for BI consumption.
+**Materialization**: `table`
+**Naming**: `obt_<entity>.sql`
+
+```sql
+with facts as (
+    select * from {{ ref('fct_orders') }}
+),
+
+customers as (
+    select * from {{ ref('dim_customers') }}
+),
+
+dates as (
+    select * from {{ ref('dim_dates') }}
+),
+
+final as (
+    select
+        facts.*,
+        customers.customer_name,
+        customers.segment,
+        dates.day_of_week,
+        dates.month_name,
+        dates.is_weekend
+    from facts
+    left join customers on facts.customer_id = customers.customer_id
+    left join dates on facts.order_date = dates.date_day
+)
+
+select * from final
+```
+
+## CTE Style Guide
+
+- Name CTEs after what they contain, not what they do: `orders`, not `get_orders`
+- Use `final` as the last CTE name
+- One CTE per source model (via `ref()` or `source()`)
+- End every model with `select * from final`

@@ -0,0 +1,125 @@
+# Medallion Architecture (Bronze / Silver / Gold)
+
+An alternative to staging/intermediate/mart layering. Common in Databricks and lakehouse environments.
+
+## Layer Mapping
+
+| Medallion | Traditional dbt | Purpose |
+|-----------|----------------|---------|
+| Bronze | Staging (`stg_`) | Raw ingestion, minimal transform |
+| Silver | Intermediate (`int_`) | Cleaned, conformed, joined |
+| Gold | Marts (`fct_`/`dim_`) | Business-ready aggregations |
+
+## Directory Structure
+
+```
+models/
+  bronze/
+    source_system/
+      _source_system__sources.yml
+      brz_source_system__table.sql
+  silver/
+    domain/
+      slv_domain__entity.sql
+  gold/
+    domain/
+      fct_metric.sql
+      dim_entity.sql
+```
+
+## Bronze (Raw)
+
+```sql
+-- brz_stripe__payments.sql
+{{ config(materialized='view') }}
+
+with source as (
+    select * from {{ source('stripe', 'payments') }}
+),
+
+casted as (  -- avoid `cast` as a CTE name; it's a reserved word on most warehouses
+    select
+        cast(id as varchar) as payment_id,
+        cast(amount as integer) as amount_cents,
+        cast(created as timestamp) as created_at,
+        _loaded_at
+    from source
+)
+
+select * from casted
+```
+
+**Rules**: 1:1 with source, only cast/rename, no joins, `view` materialization.
+
+## Silver (Cleaned)
+
+```sql
+-- slv_finance__orders_enriched.sql
+{{ config(materialized='table') }}
+
+with orders as (
+    select * from {{ ref('brz_stripe__payments') }}
+),
+
+customers as (
+    select * from {{ ref('brz_crm__customers') }}
+),
+
+enriched as (
+    select
+        o.payment_id,
+        o.amount_cents / 100.0 as amount_dollars,
+        c.customer_name,
+        c.segment,
+        o.created_at
+    from orders o
+    left join customers c on o.customer_id = c.customer_id
+    where o.created_at is not null
+)
+
+select * from enriched
+```
+
+**Rules**: Cross-source joins, business logic, quality filters, `table` materialization.
+
+## Gold (Business-Ready)
+
+```sql
+-- fct_daily_revenue.sql
+{{ config(materialized='table') }}
+
+with daily as (
+    select
+        date_trunc('day', created_at) as revenue_date,
+        segment,
+        count(*) as order_count,
+        sum(amount_dollars) as gross_revenue
+    from {{ ref('slv_finance__orders_enriched') }}
+    group by 1, 2
+)
+
+select * from daily
+```
+
+**Rules**: Aggregations, metrics, KPIs, `table` or `incremental` materialization.
+
+## dbt_project.yml Config
+
+```yaml
+models:
+  my_project:
+    bronze:
+      +materialized: view
+    silver:
+      +materialized: table
+    gold:
+      +materialized: table
+```
+
+## When to Use Medallion vs Traditional
+
+| Use Medallion When | Use Traditional When |
+|-------------------|---------------------|
+| Databricks/lakehouse environment | dbt Cloud or Snowflake-centric |
+| Team already uses bronze/silver/gold terminology | Team uses staging/intermediate/mart |
+| Data platform team maintains the bronze layer | Analytics engineers own the full stack |