coalesce-transform-mcp 0.1.0
- package/LICENSE +21 -0
- package/README.md +304 -0
- package/dist/cache-dir.d.ts +26 -0
- package/dist/cache-dir.js +106 -0
- package/dist/client.d.ts +25 -0
- package/dist/client.js +212 -0
- package/dist/coalesce/api/environments.d.ts +20 -0
- package/dist/coalesce/api/environments.js +15 -0
- package/dist/coalesce/api/git-accounts.d.ts +21 -0
- package/dist/coalesce/api/git-accounts.js +21 -0
- package/dist/coalesce/api/jobs.d.ts +25 -0
- package/dist/coalesce/api/jobs.js +21 -0
- package/dist/coalesce/api/nodes.d.ts +29 -0
- package/dist/coalesce/api/nodes.js +33 -0
- package/dist/coalesce/api/projects.d.ts +22 -0
- package/dist/coalesce/api/projects.js +25 -0
- package/dist/coalesce/api/runs.d.ts +19 -0
- package/dist/coalesce/api/runs.js +34 -0
- package/dist/coalesce/api/subgraphs.d.ts +20 -0
- package/dist/coalesce/api/subgraphs.js +17 -0
- package/dist/coalesce/api/users.d.ts +30 -0
- package/dist/coalesce/api/users.js +31 -0
- package/dist/coalesce/types.d.ts +298 -0
- package/dist/coalesce/types.js +746 -0
- package/dist/generated/.gitkeep +0 -0
- package/dist/generated/node-type-corpus.json +42656 -0
- package/dist/index.d.ts +2 -0
- package/dist/index.js +10 -0
- package/dist/mcp/cache.d.ts +3 -0
- package/dist/mcp/cache.js +137 -0
- package/dist/mcp/environments.d.ts +3 -0
- package/dist/mcp/environments.js +61 -0
- package/dist/mcp/git-accounts.d.ts +3 -0
- package/dist/mcp/git-accounts.js +70 -0
- package/dist/mcp/jobs.d.ts +3 -0
- package/dist/mcp/jobs.js +77 -0
- package/dist/mcp/node-type-corpus.d.ts +3 -0
- package/dist/mcp/node-type-corpus.js +173 -0
- package/dist/mcp/nodes.d.ts +3 -0
- package/dist/mcp/nodes.js +341 -0
- package/dist/mcp/pipelines.d.ts +3 -0
- package/dist/mcp/pipelines.js +342 -0
- package/dist/mcp/projects.d.ts +3 -0
- package/dist/mcp/projects.js +70 -0
- package/dist/mcp/repo-node-types.d.ts +135 -0
- package/dist/mcp/repo-node-types.js +387 -0
- package/dist/mcp/runs.d.ts +3 -0
- package/dist/mcp/runs.js +92 -0
- package/dist/mcp/subgraphs.d.ts +3 -0
- package/dist/mcp/subgraphs.js +60 -0
- package/dist/mcp/users.d.ts +3 -0
- package/dist/mcp/users.js +107 -0
- package/dist/prompts/index.d.ts +2 -0
- package/dist/prompts/index.js +58 -0
- package/dist/resources/context/aggregation-patterns.md +145 -0
- package/dist/resources/context/data-engineering-principles.md +183 -0
- package/dist/resources/context/hydrated-metadata.md +92 -0
- package/dist/resources/context/id-discovery.md +64 -0
- package/dist/resources/context/intelligent-node-configuration.md +162 -0
- package/dist/resources/context/node-creation-decision-tree.md +156 -0
- package/dist/resources/context/node-operations.md +316 -0
- package/dist/resources/context/node-payloads.md +114 -0
- package/dist/resources/context/node-type-corpus.md +166 -0
- package/dist/resources/context/node-type-selection-guide.md +96 -0
- package/dist/resources/context/overview.md +135 -0
- package/dist/resources/context/pipeline-workflows.md +355 -0
- package/dist/resources/context/run-operations.md +55 -0
- package/dist/resources/context/sql-bigquery.md +41 -0
- package/dist/resources/context/sql-databricks.md +40 -0
- package/dist/resources/context/sql-platform-selection.md +70 -0
- package/dist/resources/context/sql-snowflake.md +43 -0
- package/dist/resources/context/storage-mappings.md +49 -0
- package/dist/resources/context/tool-usage.md +98 -0
- package/dist/resources/index.d.ts +5 -0
- package/dist/resources/index.js +254 -0
- package/dist/schemas/node-payloads.d.ts +5019 -0
- package/dist/schemas/node-payloads.js +147 -0
- package/dist/server.d.ts +7 -0
- package/dist/server.js +63 -0
- package/dist/services/cache/snapshots.d.ts +108 -0
- package/dist/services/cache/snapshots.js +275 -0
- package/dist/services/config/context-analyzer.d.ts +14 -0
- package/dist/services/config/context-analyzer.js +76 -0
- package/dist/services/config/field-classifier.d.ts +23 -0
- package/dist/services/config/field-classifier.js +47 -0
- package/dist/services/config/intelligent.d.ts +55 -0
- package/dist/services/config/intelligent.js +306 -0
- package/dist/services/config/rules.d.ts +6 -0
- package/dist/services/config/rules.js +44 -0
- package/dist/services/config/schema-resolver.d.ts +18 -0
- package/dist/services/config/schema-resolver.js +80 -0
- package/dist/services/corpus/loader.d.ts +56 -0
- package/dist/services/corpus/loader.js +25 -0
- package/dist/services/corpus/search.d.ts +49 -0
- package/dist/services/corpus/search.js +69 -0
- package/dist/services/corpus/templates.d.ts +4 -0
- package/dist/services/corpus/templates.js +11 -0
- package/dist/services/pipelines/execution.d.ts +20 -0
- package/dist/services/pipelines/execution.js +290 -0
- package/dist/services/pipelines/node-type-intent.d.ts +96 -0
- package/dist/services/pipelines/node-type-intent.js +356 -0
- package/dist/services/pipelines/node-type-selection.d.ts +66 -0
- package/dist/services/pipelines/node-type-selection.js +758 -0
- package/dist/services/pipelines/planning.d.ts +543 -0
- package/dist/services/pipelines/planning.js +1839 -0
- package/dist/services/policies/sql-override.d.ts +7 -0
- package/dist/services/policies/sql-override.js +109 -0
- package/dist/services/repo/operations.d.ts +6 -0
- package/dist/services/repo/operations.js +10 -0
- package/dist/services/repo/parser.d.ts +70 -0
- package/dist/services/repo/parser.js +365 -0
- package/dist/services/repo/path.d.ts +2 -0
- package/dist/services/repo/path.js +58 -0
- package/dist/services/templates/nodes.d.ts +50 -0
- package/dist/services/templates/nodes.js +336 -0
- package/dist/services/workspace/analysis.d.ts +56 -0
- package/dist/services/workspace/analysis.js +151 -0
- package/dist/services/workspace/mutations.d.ts +150 -0
- package/dist/services/workspace/mutations.js +1718 -0
- package/dist/utils.d.ts +5 -0
- package/dist/utils.js +7 -0
- package/dist/workflows/get-environment-overview.d.ts +9 -0
- package/dist/workflows/get-environment-overview.js +23 -0
- package/dist/workflows/get-run-details.d.ts +10 -0
- package/dist/workflows/get-run-details.js +28 -0
- package/dist/workflows/progress.d.ts +20 -0
- package/dist/workflows/progress.js +54 -0
- package/dist/workflows/retry-and-wait.d.ts +13 -0
- package/dist/workflows/retry-and-wait.js +139 -0
- package/dist/workflows/run-and-wait.d.ts +13 -0
- package/dist/workflows/run-and-wait.js +141 -0
- package/dist/workflows/run-status.d.ts +10 -0
- package/dist/workflows/run-status.js +27 -0
- package/package.json +34 -0
|
@@ -0,0 +1,355 @@
# Pipeline Building Workflows

## Quick Reference

| User says | Node type | Action |
|-----------|-----------|--------|
| "stage my X data" | `Stage` | `create-workspace-node-from-predecessor` with source node |
| "create a dimension for X" | `Dimension` | `create-workspace-node-from-predecessor` with staging node |
| "build a fact table from X and Y" | `Fact` | `create-workspace-node-from-predecessor` with both, then `convert-join-to-aggregation` |
| "create a view for X" | `View` | `create-workspace-node-from-predecessor` with upstream node |
| "join X and Y" | `View` | `create-workspace-node-from-predecessor` with both, then apply join condition |
| "build an incremental pipeline" | `Incremental Load` | See "Incremental Pipeline Setup" below |
| "create an empty node" | Any | `create-workspace-node-from-scratch` with `name` and `metadata.columns` |

Use this table as a heuristic for likely node families, but still call `plan-pipeline` before creating anything so the exact `nodeType` is confirmed from the repo/workspace context. Use the full workflow whenever the request is ambiguous or involves multiple layers.

## Choosing the Right Node Type

**Step 1: Check available types**

```javascript
list-workspace-node-types({ workspaceID })
```

Prefer types already in use. If `base-nodes:::Stage` etc. are observed, use those over built-in types. If a recommended type is not observed, tell the user it may need installation via Build Settings > Packages.

**Step 2: Read source columns**

```javascript
get-workspace-node({ workspaceID, nodeID: "source-id" })
```

Look for timestamp columns (UPDATED_AT, CREATED_AT) for incremental loading, business keys (CUSTOMER_ID) for merge keys, and SCD columns (EFFECTIVE_FROM, IS_CURRENT) for dimension tracking.

**Step 3: Select by pipeline layer**

| Layer | Default | Use Instead When |
|-------|---------|-----------------|
| Staging | `Stage` | Source has timestamps AND is large -> `Incremental Load` (requires package) |
| Intermediate | `View` | Expensive computation -> `Stage` or `Work` |
| Dimension (gold) | `Dimension` | Has SCD columns -> configure SCD Type 2 |
| Fact (gold) | `Fact` | Large append-heavy data -> incremental config |
| Metrics | `View` | Complex/frequently queried -> `Fact` |

IMPORTANT: `View` types can ONLY materialize as views. For aggregations queried repeatedly, use `Dimension` or `Fact`. See `coalesce://context/data-engineering-principles` for platform-specific materialization guidance.

Node type format: `"Stage"` (simple) or `"IncrementalLoading:::230"` (PackageName:::NodeTypeID). Prefer the package-prefixed format when known.
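The two node type formats above can be told apart by splitting on the `:::` separator. The following is a minimal sketch only; `parseNodeType` is a hypothetical helper written for illustration and is not part of this package.

```javascript
// Hypothetical helper (not part of the package): split a node type string
// into its package prefix and node type ID using the ":::" separator.
function parseNodeType(nodeType) {
  const separator = ":::";
  const index = nodeType.indexOf(separator);
  if (index === -1) {
    // Simple format, e.g. "Stage": there is no package prefix.
    return { packageName: null, nodeTypeID: nodeType };
  }
  return {
    packageName: nodeType.slice(0, index),
    nodeTypeID: nodeType.slice(index + separator.length),
  };
}

console.log(parseNodeType("Stage"));
console.log(parseNodeType("IncrementalLoading:::230"));
```

A check like `parseNodeType(t).packageName !== null` is one way to see whether a type is already in the preferred package-prefixed form.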
## Pipeline Building Sequence

### Step 1: Discover the Workspace

```javascript
list-projects({ includeWorkspaces: true })
```

### Step 2: Find Source Nodes

```javascript
list-workspace-nodes({ workspaceID, detail: true })
```

Note node IDs, names, and `locationName` (needed for `{{ ref() }}` syntax).

### Step 3: Plan to discover the correct node type

**Always call `plan-pipeline` before creating nodes.** The planner scans the repo for all committed node type definitions, scores them against your use case, and returns the best match. Do not guess node types; use what the planner recommends.

```javascript
plan-pipeline({
  workspaceID,
  goal: "stage customer data",
  sourceNodeIDs: ["source-id-1"],
  repoPath: "/path/to/repo" // or rely on COALESCE_REPO_PATH
})
```

The response includes:

- `nodeTypeSelection.consideredNodeTypes`: ranked candidates with scores and reasons
- `supportedNodeTypes`: types that support automatic creation
- `nodes[].nodeType`: the recommended type for each planned node

**If the user provides SQL:** Pass it directly to the planner.

```javascript
plan-pipeline({ workspaceID, sql: "<USER-PROVIDED SQL HERE>" })
```

Never author SQL yourself to pass to these tools.

### Step 4: Review and Execute

- Plan returns `status: "ready"` -> use the recommended node type to create, or call `create-pipeline-from-plan`
- Plan returns `status: "needs_clarification"` -> address `openQuestions`, then re-plan
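The two plan outcomes above amount to a small routing decision. The sketch below assumes the plan is a plain object carrying the `status`, `nodes[].nodeType`, and `openQuestions` fields named in this section; `nextActionForPlan` and the returned action labels are hypothetical and exist only for illustration.

```javascript
// Hypothetical router (not part of the package). Assumes the plan object
// exposes status, nodes[].nodeType and openQuestions as described above.
function nextActionForPlan(plan) {
  if (plan.status === "ready") {
    // Create nodes with the recommended types (or hand the plan to
    // create-pipeline-from-plan).
    const nodeTypes = (plan.nodes ?? []).map((node) => node.nodeType);
    return { action: "create", nodeTypes };
  }
  if (plan.status === "needs_clarification") {
    // Resolve the open questions with the user, then call plan-pipeline again.
    return { action: "ask-user", questions: plan.openQuestions ?? [] };
  }
  // Any other status is outside what this section describes.
  return { action: "stop", reason: `unexpected plan status: ${plan.status}` };
}
```

Treating unknown statuses as a stop condition keeps the caller from creating nodes on a plan it does not understand.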
### Step 5: Create Nodes with the planned node type

Use `create-workspace-node-from-predecessor` with the node type from the plan. **Always pass `repoPath`** so config completion runs automatically:

```javascript
create-workspace-node-from-predecessor({
  workspaceID,
  nodeType: "base-nodes:::Stage", // from plan-pipeline result
  predecessorNodeIDs: ["source-id-1"],
  changes: { name: "STG_CUSTOMER" },
  repoPath: "/path/to/repo" // enables automatic config completion
})
```

**Always use `create-workspace-node-from-predecessor` or `create-workspace-node-from-scratch`**; they handle validation, config completion, and column-level attributes automatically.

For joins and aggregations:

```javascript
convert-join-to-aggregation({
  workspaceID, nodeID: "join-node-id",
  groupByColumns: [...], aggregates: [...],
  maintainJoins: true,
  repoPath: "/path/to/repo"
})
```

### Step 6: Verify Config Completion

Config is applied automatically when `repoPath` is provided. Check the `configCompletion` field in the response:

- `configCompletion.configReview.status`: `complete`, `needs_attention`, or `incomplete`
- `configCompletion.configReview.summary`: human-readable status summary
- `configCompletion.configReview.missingRequired`: required fields/columnSelectors still unset
- `configCompletion.configReview.warnings`: issues needing manual review (e.g., missing business keys on dimension nodes)
- `configCompletion.configReview.suggestions`: optional improvements (e.g., change tracking, materialization)
- `configCompletion.appliedConfig`: node-level config values that were set
- `configCompletion.columnAttributeChanges.applied`: column-level attributes (isBusinessKey, etc.)
- `configCompletion.reasoning`: why each decision was made

**Action required when `configReview.status` is not `complete`:**

- `incomplete`: required fields are missing. Set them via `update-workspace-node` or `replace-workspace-node-columns`.
- `needs_attention`: warnings need manual review (e.g., set `isBusinessKey` on the correct columns).

If `configCompletionSkipped` appears instead, call `complete-node-configuration` with `repoPath` to retry.
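The status handling in Step 6 can be sketched as a small triage function. This is an illustration under assumptions: the response is taken to be a plain object with the `configCompletion` and `configCompletionSkipped` fields named above, and `triageConfigCompletion` plus its action labels are hypothetical.

```javascript
// Hypothetical triage helper (not part of the package). Field names follow
// the configCompletion description above; the action labels are made up.
function triageConfigCompletion(response) {
  if (response.configCompletionSkipped) {
    // Completion did not run: retry with complete-node-configuration and repoPath.
    return { action: "retry", tool: "complete-node-configuration" };
  }
  const review = response.configCompletion?.configReview;
  if (!review) {
    return { action: "inspect", detail: "no configReview in response" };
  }
  switch (review.status) {
    case "complete":
      return { action: "proceed" };
    case "incomplete":
      // Required fields are missing and must be set before moving on.
      return { action: "set-required", fields: review.missingRequired ?? [] };
    case "needs_attention":
      // Warnings need a manual decision, e.g. which columns are business keys.
      return { action: "review-warnings", warnings: review.warnings ?? [] };
    default:
      return { action: "inspect", detail: `unknown status: ${review.status}` };
  }
}
```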
### Step 7: Follow-Up Edits

Use `update-workspace-node` for post-creation changes.

## Multi-Node Pipelines

Create nodes bottom-up: upstream before downstream. Each step uses node IDs from the previous step.

**Example: join two sources then aggregate**

```javascript
// 0. Plan to discover correct node types for each layer
plan-pipeline({
  workspaceID,
  goal: "stage customer and orders, then build CLV fact",
  sourceNodeIDs: ["source-customer-id", "source-orders-id"],
  repoPath: "/path/to/repo"
})
// -> nodeTypeSelection shows e.g. "base-nodes:::Stage" for staging,
//    "base-nodes:::Fact" for the fact layer

// 1. Stage each source using the planned node type (independent, so these can run in parallel)
create-workspace-node-from-predecessor({
  workspaceID, nodeType: "base-nodes:::Stage", // from plan
  predecessorNodeIDs: ["source-customer-id"],
  changes: { name: "STG_CUSTOMER" },
  repoPath: "/path/to/repo" // auto-completes config
})
// -> "stg-cust-id" + configCompletion shows applied config

create-workspace-node-from-predecessor({
  workspaceID, nodeType: "base-nodes:::Stage", // from plan
  predecessorNodeIDs: ["source-orders-id"],
  changes: { name: "STG_ORDERS" },
  repoPath: "/path/to/repo"
})
// -> "stg-orders-id" + configCompletion shows applied config

// 2. Create join/fact node with planned fact type
create-workspace-node-from-predecessor({
  workspaceID, nodeType: "base-nodes:::Fact", // from plan
  predecessorNodeIDs: ["stg-cust-id", "stg-orders-id"],
  changes: { name: "FACT_CLV" },
  repoPath: "/path/to/repo"
})
// -> "fact-clv-id", joinSuggestions with common columns, configCompletion

// 3. Aggregate with automatic JOIN ON generation
convert-join-to-aggregation({
  workspaceID, nodeID: "fact-clv-id",
  groupByColumns: ['"STG_CUSTOMER"."CUSTOMER_ID"'],
  aggregates: [
    { name: "TOTAL_ORDERS", function: "COUNT", expression: 'DISTINCT "STG_ORDERS"."ORDER_ID"' },
    { name: "LIFETIME_VALUE", function: "SUM", expression: '"STG_ORDERS"."ORDER_TOTAL"' }
  ],
  maintainJoins: true,
  repoPath: "/path/to/repo"
})
```

**Verification after each step:** Check `validation.allPredecessorsRepresented`, `validation.autoPopulatedColumns`, `warning`, and `joinSuggestions` before proceeding.

**If a node is created with 0 columns:** The predecessor may have no columns, the IDs were wrong, or the node type doesn't auto-inherit. Try a projection-capable type (Stage, View, Work) or add columns with `replace-workspace-node-columns`.
### UNION / Multi-Source Nodes

```javascript
create-workspace-node-from-predecessor({
  workspaceID, nodeType: "Stage",
  predecessorNodeIDs: ["stg-orders-us-id", "stg-orders-eu-id"],
  changes: { name: "STG_ORDERS_ALL" }
})

update-workspace-node({
  workspaceID, nodeID: "new-node-id",
  changes: { config: { insertStrategy: "UNION ALL" } }
})
```

Values: `"UNION"` (dedup), `"UNION ALL"` (keep all), `"INSERT"` (sequential, default).
## Incremental Pipeline Setup

**With the Incremental-Nodes package** (check `list-workspace-node-types` for `IncrementalLoading:::230`):

```javascript
// 1. Create the node
create-workspace-node-from-predecessor({
  workspaceID, nodeType: "IncrementalLoading:::230",
  predecessorNodeIDs: ["source-orders-id"],
  changes: { name: "INC_ORDERS" }
})

// 2. Configure incremental settings
update-workspace-node({
  workspaceID, nodeID: "inc-orders-id",
  changes: {
    config: {
      filterBasedOnPersistentTable: true,
      persistentTableLocationName: "STAGING",
      persistentTableName: "INC_ORDERS",
      incrementalLoadColumn: "UPDATED_AT"
    }
  }
})
```

How it works: reads MAX of the high-water mark column from the target, filters source to rows above that value, INSERTs new rows. Any MERGE/upsert/SCD logic happens downstream in Dimension or Fact nodes.

**Without the package**, use a regular Stage with a MAX subquery in `joinCondition`:

```javascript
update-workspace-node({
  workspaceID, nodeID: "new-node-id",
  changes: {
    config: { truncateBefore: false },
    metadata: {
      sourceMapping: [{
        name: "STG_ORDERS_INCREMENTAL",
        dependencies: [{ locationName: "RAW", nodeName: "ORDERS" }],
        join: {
          joinCondition: 'FROM {{ ref(\'RAW\', \'ORDERS\') }} "ORDERS"\nWHERE "ORDERS"."UPDATED_AT" > (\n SELECT COALESCE(MAX("UPDATED_AT"), \'1900-01-01\')\n FROM {{ ref_no_link(\'STAGING\', \'STG_ORDERS_INCREMENTAL\') }}\n)'
        },
        customSQL: { customSQL: "" }, aliases: {}, noLinkRefs: []
      }]
    }
  }
})
```

Use `{{ ref_no_link() }}` for the self-reference to avoid circular DAG dependencies.
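The high-water-mark filter used by both variants in this section can be illustrated in memory. The sketch below is not how Coalesce executes it (that happens in warehouse SQL); `selectNewRows` is a hypothetical function, rows are plain objects, and timestamps are assumed to be ISO-8601 strings so that string comparison matches chronological order.

```javascript
// In-memory illustration of the high-water-mark pattern (hypothetical helper).
// Assumes ISO-8601 timestamp strings, which sort chronologically as strings.
function selectNewRows(sourceRows, targetRows, column, floor = "1900-01-01") {
  // MAX of the high-water-mark column in the target; falls back to the floor
  // when the target is empty, mirroring COALESCE(MAX(...), '1900-01-01').
  const highWaterMark = targetRows.reduce(
    (max, row) => (row[column] > max ? row[column] : max),
    floor
  );
  // Keep only source rows strictly above the mark; these are the rows to insert.
  return sourceRows.filter((row) => row[column] > highWaterMark);
}

const loadedOrders = [
  { ORDER_ID: 1, UPDATED_AT: "2024-01-01" },
  { ORDER_ID: 2, UPDATED_AT: "2024-01-03" },
];
const rawOrders = [...loadedOrders, { ORDER_ID: 3, UPDATED_AT: "2024-01-05" }];
console.log(selectNewRows(rawOrders, loadedOrders, "UPDATED_AT")); // only ORDER_ID 3
```

Because the comparison is strict, source rows that share the exact maximum timestamp already in the target are not reloaded.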
## Data Engineering Best Practices

### Naming Conventions

Follow layer-appropriate naming to keep pipelines readable:

| Layer | Convention | Examples |
|-------|-----------|----------|
| Staging | `STG_<SOURCE>` | `STG_CUSTOMERS`, `STG_ORDERS` |
| Intermediate | `INT_<PURPOSE>` or `WRK_<PURPOSE>` | `INT_ORDER_ENRICHMENT`, `WRK_CUSTOMER_DEDUP` |
| Dimension | `DIM_<ENTITY>` | `DIM_CUSTOMER`, `DIM_PRODUCT` |
| Fact | `FACT_<PROCESS>` or `FCT_<PROCESS>` | `FACT_SALES`, `FCT_CLV` |
| View | `V_<PURPOSE>` | `V_ACTIVE_CUSTOMERS` |
| Hub | `HUB_<KEY>` | `HUB_CUSTOMER` |
| Satellite | `SAT_<HUB>_<CONTEXT>` | `SAT_CUSTOMER_DETAILS` |
| Link | `LNK_<RELATIONSHIP>` | `LNK_CUSTOMER_ORDER` |

For Snowflake, UPPERCASE node names are conventional (`STG_LOCATION`) since unquoted identifiers are uppercase. For Databricks/BigQuery, lowercase is typical. **Always respect the user's chosen casing**: if they provide or create a node with a specific case, preserve it exactly.

### Join Verification Checklist

After creating a multi-predecessor node, ALWAYS:

1. **Review `joinSuggestions`**: the response shows common columns between predecessors. Confirm these are the correct business keys for the join.
2. **Choose the right join type:**
   - `INNER JOIN`: only matching rows (use when every record must exist in both tables)
   - `LEFT JOIN`: keep all rows from the primary table (use when the left table is the "driver" and right table may have missing matches)
   - `FULL OUTER JOIN`: keep all rows from both tables (rare, use for reconciliation)
3. **Verify join cardinality:** Joining a 1M-row table to a 10M-row table on a non-unique key causes fan-out (row multiplication). Ensure at least one side of the join is unique on the join key.
4. **Set the join condition**: call `convert-join-to-aggregation` (for aggregation), `apply-join-condition` (for row-level joins), or `update-workspace-node` (for full manual control)
5. **Verify columns**: call `get-workspace-node` to confirm the final column list and transforms are correct
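The cardinality rule in step 3 of the checklist can be illustrated with a small in-memory check. This is a sketch on sample rows, not something the MCP tools provide; `isUniqueOn` and `joinMayFanOut` are hypothetical names, and in practice the same check would be run as a query against the warehouse.

```javascript
// Hypothetical helpers illustrating the "at least one side unique" rule.
function isUniqueOn(rows, keyColumns) {
  const seen = new Set();
  for (const row of rows) {
    // Serialize the key values so composite keys compare correctly.
    const key = JSON.stringify(keyColumns.map((column) => row[column]));
    if (seen.has(key)) return false;
    seen.add(key);
  }
  return true;
}

// Fan-out is possible only when neither side is unique on the join key.
function joinMayFanOut(leftRows, rightRows, keyColumns) {
  return !isUniqueOn(leftRows, keyColumns) && !isUniqueOn(rightRows, keyColumns);
}

const customers = [{ CUSTOMER_ID: 1 }, { CUSTOMER_ID: 2 }];
const orders = [{ CUSTOMER_ID: 1 }, { CUSTOMER_ID: 1 }, { CUSTOMER_ID: 2 }];
console.log(joinMayFanOut(customers, orders, ["CUSTOMER_ID"])); // false
```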
### Fact Table Grain

When building fact tables, define the **grain** (the set of columns that uniquely identifies each row):

- Grain columns become your `groupByColumns` in `convert-join-to-aggregation`
- Mark grain columns as `isBusinessKey: true`
- All other columns should be aggregates (COUNT, SUM, AVG, etc.)
- If unsure about the grain, ask the user: "What uniquely identifies each row in this fact table?"

Example: A sales fact table might have grain = `[CUSTOMER_ID, ORDER_DATE, PRODUCT_ID]` with measures `QUANTITY`, `REVENUE`, `DISCOUNT_AMOUNT`.
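To make the idea of grain concrete, the sketch below groups sample rows by the grain columns and sums the measures, which is what a GROUP BY on the grain does in SQL. `aggregateToGrain` is a hypothetical in-memory helper, and the sample rows are invented for illustration.

```javascript
// Hypothetical in-memory equivalent of GROUP BY <grain> with SUM(<measure>).
function aggregateToGrain(rows, grainColumns, measures) {
  const groups = new Map();
  for (const row of rows) {
    const key = JSON.stringify(grainColumns.map((column) => row[column]));
    let group = groups.get(key);
    if (!group) {
      // First row for this grain value: copy the grain columns, zero the measures.
      group = {};
      for (const column of grainColumns) group[column] = row[column];
      for (const measure of measures) group[measure] = 0;
      groups.set(key, group);
    }
    for (const measure of measures) group[measure] += row[measure];
  }
  return [...groups.values()];
}

const salesLines = [
  { CUSTOMER_ID: 1, ORDER_DATE: "2024-01-01", PRODUCT_ID: "A", QUANTITY: 2, REVENUE: 20 },
  { CUSTOMER_ID: 1, ORDER_DATE: "2024-01-01", PRODUCT_ID: "A", QUANTITY: 1, REVENUE: 10 },
  { CUSTOMER_ID: 2, ORDER_DATE: "2024-01-02", PRODUCT_ID: "B", QUANTITY: 5, REVENUE: 50 },
];
console.log(
  aggregateToGrain(salesLines, ["CUSTOMER_ID", "ORDER_DATE", "PRODUCT_ID"], ["QUANTITY", "REVENUE"])
);
```

After aggregation there is exactly one output row per distinct combination of grain columns, which is the property the grain definition is meant to guarantee.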
### Post-Creation Verification

After creating each node, verify before moving to the next:

1. **Check `nextSteps`** in the creation response and follow all required steps
2. **Check `validation.allPredecessorsRepresented`**: if false, predecessors are missing from column sources
3. **Check `configCompletion`**: verify applied config and column attributes make sense
4. **For multi-predecessor nodes:** Confirm the join condition was set (call `get-workspace-node` to verify `metadata.sourceMapping[].join.joinCondition` is not empty)
5. **For aggregation nodes:** Verify GROUP BY is valid (`validation.valid: true` from `convert-join-to-aggregation`)
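Several of these checks can be collected into one pass over the responses. The sketch below assumes the creation response and the `get-workspace-node` result are plain objects shaped like the field paths named in the list; `verifyCreatedNode` is a hypothetical helper, and the exact response shapes are an assumption rather than something confirmed here.

```javascript
// Hypothetical checker (not part of the package). Returns a list of problems;
// an empty list means the checks covered here passed.
function verifyCreatedNode(creationResponse, node, { multiPredecessor = false } = {}) {
  const problems = [];
  if (creationResponse.validation?.allPredecessorsRepresented === false) {
    problems.push("predecessors are missing from column sources");
  }
  const reviewStatus = creationResponse.configCompletion?.configReview?.status;
  if (reviewStatus !== undefined && reviewStatus !== "complete") {
    problems.push(`config review status is "${reviewStatus}"`);
  }
  if (multiPredecessor) {
    // node is assumed to be the result of get-workspace-node.
    const mappings = node?.metadata?.sourceMapping ?? [];
    const hasJoinCondition = mappings.some(
      (mapping) => (mapping.join?.joinCondition ?? "").trim() !== ""
    );
    if (!hasJoinCondition) problems.push("join condition is empty");
  }
  return problems;
}
```

Items such as `nextSteps` still need to be read and acted on directly, since their content is instructions rather than a pass/fail flag.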
### Materialization Strategy

Choose materialization based on the node's role and query patterns:

- **Staging/Bronze:** Always `table` (preserve raw data; Snowflake: use transient tables)
- **Intermediate:** `view` for lightweight transforms; `table` for expensive computations
- **Dimensions:** `table` (small, queried repeatedly, need persistence)
- **Facts:** `table` with incremental loading for large volumes
- **Metrics/aggregations queried frequently:** `table` via Dimension or Fact (NEVER `view` for repeated aggregation queries)

IMPORTANT: `View` node types can ONLY materialize as views. If you need a table, use `Dimension`, `Fact`, `Stage`, or `Work`.

## After Building the Pipeline

1. **Deploy**: `start-run` with `runType: "deploy"`
2. **Run**: `start-run` with `runType: "refresh"`
3. **Monitor**: `run-status` or `run-and-wait`
4. **Troubleshoot**: `get-run-results` for errors, `retry-run` to re-run

Scheduling is configured via Jobs in the Coalesce UI. Trigger existing jobs with `start-run` and `jobID`. See `coalesce://context/run-operations` for full guidance.

## Related Resources

- `coalesce://context/node-creation-decision-tree`: routing (which tool to use)
- `coalesce://context/node-operations`: editing nodes after creation
- `coalesce://context/data-engineering-principles`: architecture and materialization
- `coalesce://context/aggregation-patterns`: GROUP BY, datatype inference
@@ -0,0 +1,55 @@
# Run Operations

This guidance matches the Coalesce run source model and the actual run workflows registered in this MCP.

## Current Tool Surface

Use these run tools and workflows:

- `list-runs`
- `get-run`
- `get-run-results`
- `get-run-details`
- `start-run`
- `run-status`
- `run-and-wait`
- `retry-run`
- `retry-and-wait`
- `cancel-run`

## Identifier And Status Model

- Coalesce scheduler start and rerun flows return a numeric `runCounter`.
- `run-status` polls by `runCounter`.
- The same numeric identifier is used as `runID` for `get-run`, `get-run-results`, and `get-run-details`.
- Non-terminal statuses are `waitingToRun` and `running`.
- Terminal statuses are `completed`, `failed`, and `canceled`.

## Source-Derived Lifecycle

- Coalesce app helpers call `/scheduler/startRun` or `/scheduler/rerun`, then poll `/scheduler/runStatus` until the run reaches a terminal status.
- The source only treats terminal completion as the point where success or failure can actually be asserted.
- The MCP `run-and-wait` and `retry-and-wait` workflows follow that same model and then fetch `/api/v1/runs/{runCounter}/results`.
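The poll-until-terminal model described above can be sketched as follows. This is not the package's implementation: `waitForTerminalStatus`, the injected `fetchStatus` callback, and the `intervalMs` and `maxPolls` options are all hypothetical, and only the status names come from this document.

```javascript
// Status names from the model above; everything else here is illustrative.
const TERMINAL_STATUSES = new Set(["completed", "failed", "canceled"]);

function isTerminalStatus(status) {
  return TERMINAL_STATUSES.has(status);
}

// fetchStatus is a caller-supplied async function (hypothetical) that returns
// the current status string for a runCounter, e.g. by calling run-status.
async function waitForTerminalStatus(fetchStatus, runCounter, { intervalMs = 1000, maxPolls = 60 } = {}) {
  for (let attempt = 0; attempt < maxPolls; attempt++) {
    const status = await fetchStatus(runCounter);
    if (isTerminalStatus(status)) {
      // Only now can success or failure be asserted.
      return { status, timedOut: false };
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  // The run never reached a terminal status within the polling budget.
  return { status: null, timedOut: true };
}
```

Injecting `fetchStatus` keeps the loop independent of any particular HTTP client, so the same function can be exercised with a stub.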
## Routing Rules

- Use `run-and-wait` when the user wants the final run outcome in one call.
- Use `retry-and-wait` when the prior run has already failed and should be retried immediately.
- Use `start-run` or `retry-run` when you want explicit control over the polling sequence.
- Use `run-status` for live scheduler state by `runCounter`.
- Use `get-run-details` when you want run metadata and results together.
- Use `get-run` or `get-run-results` when you only need one side of that data.
- Use `cancel-run` only with `runID`, `environmentID`, and org context.

## Practical Checks

- Do not treat "request accepted" as the same thing as "run completed".
- Inspect terminal status and result payloads before reporting success.
- If the user only knows a job name, resolve its numeric ID first.
- `run-and-wait` and `retry-and-wait` can still return `resultsError`, `incomplete`, or `timedOut`; inspect those fields before calling the workflow successful.
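The last check above can be sketched as a guard that runs before any success message is reported. The field names `resultsError`, `incomplete`, and `timedOut` come from this document; the presence of a `status` field on the workflow result and the `summarizeWorkflowResult` helper itself are assumptions made for illustration.

```javascript
// Hypothetical guard (not part of the package). Assumes the workflow result
// is a plain object that may carry timedOut, incomplete, resultsError, and
// a terminal status string under "status".
function summarizeWorkflowResult(result) {
  if (result.timedOut) {
    return { ok: false, reason: "timed out before a terminal status" };
  }
  if (result.incomplete) {
    return { ok: false, reason: "workflow reported incomplete" };
  }
  if (result.resultsError) {
    return { ok: false, reason: "results could not be fetched" };
  }
  if (result.status !== "completed") {
    return { ok: false, reason: `terminal status: ${result.status}` };
  }
  return { ok: true, reason: "completed" };
}
```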
## Avoid

- Do not use browser URL UUID fragments as run IDs.
- Do not poll `run-status` with a job ID or environment ID.
- Do not retry runs that are still `waitingToRun` or `running`.
@@ -0,0 +1,41 @@
# SQL Rules: BigQuery

This guidance is based on the AI runtime platform rules for BigQuery and the related Coalesce SQL rendering behavior.

## Default SQL Style

- Prefer lowercase identifiers.
- Use backticks to quote identifiers when needed.
- For raw physical references, prefer fully qualified `project.dataset.table` paths and quote the entire path with one pair of backticks when quoting is necessary.
- Avoid `SELECT *`; prefer explicit column lists.
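The path-quoting rule above (one pair of backticks around the whole `project.dataset.table` path) can be illustrated with a short helper. `quoteBigQueryPath` is a hypothetical function written for this example; it is not part of the package or of any BigQuery client library.

```javascript
// Hypothetical helper: wrap a fully qualified BigQuery path in a single pair
// of backticks, as the style rule above describes.
function quoteBigQueryPath(project, dataset, table) {
  const parts = [project, dataset, table];
  for (const part of parts) {
    // This sketch does not attempt escaping, so it rejects embedded backticks.
    if (part.includes("`")) {
      throw new Error(`identifier contains a backtick: ${part}`);
    }
  }
  return "`" + parts.join(".") + "`";
}

console.log(quoteBigQueryPath("my-project", "sales", "orders"));
```

This applies only to raw physical references; node-to-node dependencies in Coalesce node SQL should stay in `ref()` form.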
## Coalesce `ref()` Syntax

- In Coalesce node SQL, node-to-node dependencies should stay in logical ref form:

```jinja
FROM {{ ref('sample', 'nation') }} nation
LEFT JOIN {{ ref('storage_location', 'stg_foo') }} stg_foo
  ON nation.id = stg_foo.nation_id
```

- The SQL processing layer treats `ref()` arguments as exact `locationName` and `nodeName` values.
- Keep `ref()` arguments aligned with the saved Coalesce names, even when the rest of the SQL follows BigQuery lowercase conventions.
- For more on logical locations, use `coalesce://context/storage-mappings`.

## Physical Object Names

- When Coalesce renders physical database and schema locations for BigQuery, generated physical references may appear in backtick form such as `` `project`.`dataset`. ``
- Preserve that generated style when editing SQL that already includes physical references from Coalesce metadata or templates.

## Common Source-Backed Functions

- String: `split`, `substr`, `upper`, `lower`, `regexp_extract`, `replace`
- Date: `date_trunc`, `timestamp_trunc`, `date_add`, `date_sub`, `timestamp_add`, `timestamp_diff`
- Aggregate: `sum`, `count`, `avg`, `min`, `max`, `any_value`, `string_agg`, `array_agg`

## Avoid

- Do not replace Coalesce `ref()` references with raw project.dataset.table paths unless the user explicitly wants physical SQL.
- Do not switch BigQuery identifier quoting to Snowflake-style double quotes.
- Do not introduce wildcard projections when explicit columns are practical.
@@ -0,0 +1,40 @@
# SQL Rules: Databricks

This guidance is based on the AI runtime platform rules for Databricks and the related Coalesce SQL rendering behavior.

## Default SQL Style

- Table references should use backticks rather than double quotes.
- Use lowercase identifiers for SQL you introduce.
- Use clear naming prefixes such as `stg_`, `dim_`, and `fct_` when naming new objects.

## Coalesce `ref()` Syntax

- Coalesce node-to-node SQL should use logical refs:

```jinja
FROM {{ ref('sample', 'nation') }} `nation`
JOIN {{ ref('storage_location', 'stg_foo') }} `stg_foo`
```

- The SQL processing layer treats `ref()` arguments as exact `locationName` and `nodeName` values.
- Keep `ref()` arguments aligned with the saved Coalesce names, even when the rest of the SQL follows Databricks lowercase conventions.
- For more on logical locations, use `coalesce://context/storage-mappings`.

## Physical Object Names

- When Coalesce renders physical database and schema locations for Databricks, object names appear in backtick form such as `` `catalog`.`schema`. `` or `` `db`.`schema`. ``
- Preserve that generated style when editing SQL that already includes physical references from Coalesce metadata or templates.
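
As a rough illustration of the backtick form described above, the TypeScript sketch below builds such a prefix from name parts. The helper names `quoteDatabricksIdent` and `databricksPrefix` are hypothetical and are not part of this package; Coalesce renders these prefixes itself, so this only shows the shape to preserve. Doubling an embedded backtick is assumed here as the escape rule for delimited identifiers.

```typescript
// Hypothetical helper: wrap one name part in backticks.
function quoteDatabricksIdent(name: string): string {
  // Assumption: a backtick inside a delimited identifier is escaped by doubling it.
  return "`" + name.split("`").join("``") + "`";
}

// Hypothetical helper: join quoted parts with dots and keep the trailing dot,
// so an object name can be appended directly after the prefix.
function databricksPrefix(parts: string[]): string {
  return parts.map(quoteDatabricksIdent).join(".") + ".";
}

console.log(databricksPrefix(["catalog", "schema"]));
console.log(databricksPrefix(["db", "schema"]));
```
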

## Common Source-Backed Functions

- String: `split`, `regexp_extract`, `initcap`, `translate`, `reverse`
- Date: `date_trunc`, `add_months`, `months_between`, `next_day`, `current_date`
- Array: `explode`, `array_contains`, `size`, `slice`, `posexplode`
- Aggregate: `collect_list`, `collect_set`, `count_distinct`, `percentile_approx`, `any_value`

## Avoid

- Do not rewrite `ref()` arguments into physical warehouse paths.
- Do not use Snowflake double quotes for Databricks identifiers.
- Do not mix uppercase Snowflake-style aliasing with Databricks lowercase conventions unless the saved names require it.
@@ -0,0 +1,70 @@
# SQL Platform Selection

Read this resource before writing or editing SQL in a Coalesce node.

## Goal

Determine the platform first, then read exactly one dialect resource:

- `coalesce://context/sql-snowflake`
- `coalesce://context/sql-databricks`
- `coalesce://context/sql-bigquery`

Follow exactly one dialect's rules per edit. Mixing dialect conventions in one node creates compilation errors.
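
The one-platform-to-one-resource rule can be sketched as a simple lookup. This is a minimal TypeScript illustration, not code from this package: the `Platform` type and the `dialectResource` function are hypothetical names, and only the three resource URIs come from the list above.

```typescript
type Platform = "snowflake" | "databricks" | "bigquery";

// The three dialect resource URIs listed above, keyed by platform.
const DIALECT_RESOURCES: Record<Platform, string> = {
  snowflake: "coalesce://context/sql-snowflake",
  databricks: "coalesce://context/sql-databricks",
  bigquery: "coalesce://context/sql-bigquery",
};

// Hypothetical helper: returns exactly one resource URI,
// or null when the platform is not one of the three known values.
function dialectResource(platform: string): string | null {
  const key = platform.trim().toLowerCase();
  if (Object.prototype.hasOwnProperty.call(DIALECT_RESOURCES, key)) {
    return DIALECT_RESOURCES[key as Platform];
  }
  return null;
}

console.log(dialectResource("Databricks"));
```

Returning `null` for an unknown platform, rather than falling back to a default dialect, keeps the caller from silently mixing conventions.
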

## Best Signal Order

### 1. Check Project Metadata First

Use `get-project` or `list-projects` and inspect the returned project metadata for the warehouse or platform type.

This is the best first signal because it reflects the project configuration directly.

### 2. Check Existing Node SQL

If you are editing an existing node, read the current node and inspect:

- identifier casing
- quoting style
- function names
- date/time function patterns
- join and alias style

Prefer preserving the existing node and workspace conventions rather than normalizing everything.

### 3. Check Neighboring Nodes

If one node is ambiguous, inspect nearby workspace nodes in the same layer or dependency chain.

Workspace-local conventions are a better guide than generic SQL style advice.

### 4. Ask the User If Still Unclear

If project metadata and existing SQL still do not settle the dialect, ask the user.

When the dialect choice materially affects correctness (e.g., function names, quoting, type syntax), ask the user rather than guessing.
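
The four-step order above can be sketched as a fallthrough: metadata first, then the node's own SQL, then neighbors, then a question to the user. The TypeScript below is only an illustration under stated assumptions. The `Signals` shape and its field names are hypothetical (the actual project metadata fields are not shown here), and the function-name hints are weak examples drawn from the dialect resources' function lists; several such functions exist on more than one platform, so ambiguous hits deliberately resolve to nothing.

```typescript
type Platform = "snowflake" | "databricks" | "bigquery";

// Hypothetical inputs gathered before editing a node.
interface Signals {
  projectPlatform?: string; // assumed: a platform string read from project metadata
  nodeSql?: string;         // SQL of the node being edited
  neighborSql?: string[];   // SQL of nearby nodes in the same layer or chain
}

function fromMetadata(value: string | undefined): Platform | null {
  const v = (value || "").toLowerCase();
  if (v.indexOf("snowflake") !== -1) return "snowflake";
  if (v.indexOf("databricks") !== -1) return "databricks";
  if (v.indexOf("bigquery") !== -1) return "bigquery";
  return null;
}

// Weak, illustrative hints only; returns null unless exactly one platform matches.
function fromSql(sql: string | undefined): Platform | null {
  if (!sql) return null;
  const hits: Platform[] = [];
  if (/\b(DATEADD|REGEXP_SUBSTR)\s*\(/i.test(sql)) hits.push("snowflake");
  if (/\b(collect_list|collect_set|posexplode)\s*\(/i.test(sql)) hits.push("databricks");
  if (/\b(timestamp_trunc|timestamp_diff|string_agg)\s*\(/i.test(sql)) hits.push("bigquery");
  return hits.length === 1 ? hits[0] : null;
}

function resolvePlatform(s: Signals): Platform | "ask-user" {
  const meta = fromMetadata(s.projectPlatform); // step 1
  if (meta) return meta;
  const own = fromSql(s.nodeSql);               // step 2
  if (own) return own;
  const votes: Platform[] = [];                 // step 3
  for (const sql of s.neighborSql || []) {
    const p = fromSql(sql);
    if (p && votes.indexOf(p) === -1) votes.push(p);
  }
  return votes.length === 1 ? votes[0] : "ask-user"; // step 4
}

console.log(resolvePlatform({ nodeSql: "select collect_list(x) from t" }));
```
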

## Coalesce-Specific Rule

Inside Coalesce node SQL, prefer `{{ ref(...) }}` for node and storage references. For full reference syntax details, see `coalesce://context/storage-mappings`.

Preserve `{{ ref(...) }}` syntax for node references. Only replace with raw warehouse-qualified table names if the user explicitly wants warehouse-native SQL outside normal Coalesce patterns.

## Editing Principles

This is the canonical source for "preserve workspace conventions" when editing SQL.

When modifying existing SQL:

- preserve the current dialect
- preserve the current quoting and casing style unless it is clearly broken
- avoid broad formatting rewrites that do not change behavior
- preserve existing workspace style over personal preferences

When generating entirely new SQL and no local convention exists:

- use the selected platform resource as the default style guide

## Special Note

Run-tool authentication in this MCP server is Snowflake Key Pair-based, but that does not mean every project uses Snowflake SQL semantics. Determine the SQL platform from project metadata and node SQL, not from run-tool auth requirements.
@@ -0,0 +1,43 @@
# SQL Rules: Snowflake

This guidance is based on the AI runtime platform rules for Snowflake and the related Coalesce SQL rendering behavior.

## Default SQL Style

- Default to uppercase unquoted identifiers for column names, aliases, and SQL identifiers (Snowflake convention).
- Node names become Snowflake table/view names. Default to UPPERCASE (`STG_LOCATION`, `FACT_ORDERS`), but **respect the user's chosen casing**: if they name a node in lowercase, preserve it.
- Use double quotes only when you must preserve exact case or quote a reserved word.
- Column names and aliases should normally be uppercase and unquoted.
- Table aliases should normally be uppercase and unquoted.
- Avoid backticks, single-quoted identifiers, or mixed quoting styles.
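
One way to read the quoting rules above is as a small decision function: emit an uppercase unquoted identifier by default, and fall back to double quotes only for exact-case names, reserved words, or names that are not plain identifiers. The TypeScript sketch below is hypothetical and not part of this package; `snowflakeIdent` and `RESERVED_SAMPLE` are illustrative names, and the reserved-word list is a deliberately tiny sample rather than Snowflake's full list.

```typescript
// Illustrative subset only; the real reserved-word list is much longer.
const RESERVED_SAMPLE: string[] = ["SELECT", "FROM", "WHERE", "ORDER", "GROUP", "TABLE"];

function quoteSnowflake(name: string): string {
  // A double quote inside a quoted identifier is escaped by doubling it.
  return '"' + name.split('"').join('""') + '"';
}

// preserveCase = true models "the user chose this exact casing".
function snowflakeIdent(name: string, preserveCase: boolean = false): string {
  // Unquoted identifiers start with a letter or underscore and contain
  // only letters, digits, underscores, and dollar signs.
  const plain = /^[A-Za-z_][A-Za-z0-9_$]*$/.test(name);
  const upper = name.toUpperCase();
  const reserved = RESERVED_SAMPLE.indexOf(upper) !== -1;
  const caseMatters = preserveCase && name !== upper;

  if (!plain || caseMatters) return quoteSnowflake(name); // exact text must survive
  return reserved ? quoteSnowflake(upper) : upper;        // default: uppercase, unquoted
}

console.log(snowflakeIdent("nation_id"));
console.log(snowflakeIdent("stg_foo", true));
console.log(snowflakeIdent("order"));
```
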

## Coalesce `ref()` Syntax

- Coalesce node-to-node SQL should use logical refs:

```jinja
FROM {{ ref("SAMPLE", "NATION") }} N
LEFT JOIN {{ ref("STORAGE_LOCATION", "STG_FOO") }} F
ON N.ID = F.NATION_ID
```

- The SQL processing layer treats `ref()` arguments as exact `locationName` and `nodeName` values.
- Keep `ref()` arguments aligned with the saved Coalesce names, even when the rest of the SQL follows Snowflake casing preferences.
- For more on logical locations, use `coalesce://context/storage-mappings`.

## Physical Object Names

- When Coalesce renders physical database and schema locations, Snowflake object names may appear in double-quoted form such as `"DB"."SCHEMA".`
- Preserve that generated style when editing SQL that already includes physical object references from Coalesce metadata or templates.

## Common Source-Backed Functions

- String: `SPLIT`, `SUBSTR`, `UPPER`, `LOWER`, `REGEXP_SUBSTR`
- Date: `DATEADD`, `DATEDIFF`, `TO_DATE`, `DATE_TRUNC`
- Aggregate: `SUM`, `COUNT`, `AVG`, `MIN`, `MAX`, `MEDIAN`

## Avoid

- Do not rewrite `ref()` arguments into raw warehouse object paths.
- Do not switch Snowflake identifier quoting to backticks.
- Do not introduce mixed-case quoted identifiers unless preserving exact case is required.
@@ -0,0 +1,49 @@
# Storage Locations and References

This guidance is based on the AI runtime storage-context section and the related Coalesce ref-handling code.

## What Storage Locations Mean

- Storage locations are short logical names that map to `DATABASE.SCHEMA` pairs where tables live.
- Users may have many storage locations configured in a workspace.
- Any node can be manually configured to live somewhere other than the obvious default, so always pay attention to the existing node storage locations when writing joins or downstream refs.

## Default Placement Matters

- New nodes normally default to the storage location specified by their node type.
- That default matters when you create multiple nodes at once or in parallel because downstream refs may need to target the new node's default location, not the source node's location.

## Source Example

- You create a chain `A -> B -> C`
- `A` already exists in `SOURCE_A`
- `B` is created and defaults to `USER_WORKSPACE`
- `C` is created at the same time and also defaults to `USER_WORKSPACE`

That means:

- `B` should reference `A` with `{{ ref('SOURCE_A', 'ORDERS') }}`
- `C` should reference `B` with `{{ ref('USER_WORKSPACE', 'NODE_B') }}`
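
The chain above can be sketched in a few lines of TypeScript: each downstream ref is built from where the upstream node was actually placed, not from where the referencing node lives. The `Placement` interface and the `refTo` helper are hypothetical names used only for illustration; the node and location names come from the example.

```typescript
// Hypothetical record of where a node was actually placed.
interface Placement {
  nodeName: string;
  locationName: string;
}

// Build a ref() expression from the upstream node's own placement.
function refTo(upstream: Placement): string {
  return "{{ ref('" + upstream.locationName + "', '" + upstream.nodeName + "') }}";
}

const nodeA: Placement = { nodeName: "ORDERS", locationName: "SOURCE_A" };
const nodeB: Placement = { nodeName: "NODE_B", locationName: "USER_WORKSPACE" };

console.log("B reads: FROM " + refTo(nodeA));
console.log("C reads: FROM " + refTo(nodeB));
```
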

## `ref()` Contract

- Coalesce node SQL uses logical refs in this shape:

```jinja
FROM {{ ref('LOCATION_NAME', 'NODE_NAME') }}
```

- The SQL processing layer extracts refs into exact `locationName` and `nodeName` pairs.
- Keep `ref()` arguments aligned with the saved Coalesce names.
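
To make the contract concrete, here is a minimal TypeScript sketch of pulling `locationName` and `nodeName` pairs out of node SQL. It is an assumption about what such extraction could look like, not the actual Coalesce processing code; `RefPair` and `extractRefs` are hypothetical names, and the pattern only handles two plainly quoted string arguments.

```typescript
interface RefPair {
  locationName: string;
  nodeName: string;
}

// Hypothetical extractor: finds {{ ref('LOCATION', 'NODE') }} with single or
// double quotes and returns the arguments exactly as written (no case changes).
function extractRefs(sql: string): RefPair[] {
  const pattern = /\{\{\s*ref\(\s*(['"])(.*?)\1\s*,\s*(['"])(.*?)\3\s*\)\s*\}\}/g;
  const refs: RefPair[] = [];
  let m: RegExpExecArray | null = pattern.exec(sql);
  while (m !== null) {
    refs.push({ locationName: m[2], nodeName: m[4] });
    m = pattern.exec(sql);
  }
  return refs;
}

const sample =
  "FROM {{ ref('sample', 'nation') }} n " +
  'LEFT JOIN {{ ref("STORAGE_LOCATION", "STG_FOO") }} f';
console.log(extractRefs(sample));
```

Because the arguments are returned untouched, `'sample'` and `"STORAGE_LOCATION"` keep the casing they were written with, which is the point of treating them as exact values.
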

## Practical Rules

- Treat `locationName` as logical Coalesce state and `database` or `schema` as the resolved physical target behind it.
- When chaining node creation, confirm where the upstream node was actually placed before referencing it.
- If storage mappings are missing, fix that first instead of guessing raw warehouse paths.

## Avoid

- Do not guess logical location names from database or schema values.
- Do not assume new nodes land in the same location as their predecessors.
- Do not normalize `ref()` arguments to warehouse casing rules.
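
A quick way to catch accidental case normalization is to compare each ref against the saved names exactly, and flag those that only match when case is ignored. The TypeScript below is a hedged sketch; `findCaseMismatches` and the `RefPair` shape are hypothetical, and the list of saved nodes is assumed to come from workspace data you have already fetched.

```typescript
interface RefPair {
  locationName: string;
  nodeName: string;
}

// Returns refs that have no exact match among the saved names
// but would match if casing were ignored.
function findCaseMismatches(refs: RefPair[], saved: RefPair[]): RefPair[] {
  const exact = (x: RefPair, y: RefPair): boolean =>
    x.locationName === y.locationName && x.nodeName === y.nodeName;
  const ignoringCase = (x: RefPair, y: RefPair): boolean =>
    x.locationName.toLowerCase() === y.locationName.toLowerCase() &&
    x.nodeName.toLowerCase() === y.nodeName.toLowerCase();

  return refs.filter(
    (r) => !saved.some((s) => exact(r, s)) && saved.some((s) => ignoringCase(r, s))
  );
}

const savedNodes: RefPair[] = [{ locationName: "SOURCE_A", nodeName: "ORDERS" }];
const usedRefs: RefPair[] = [
  { locationName: "source_a", nodeName: "orders" }, // lowercased by mistake
  { locationName: "SOURCE_A", nodeName: "ORDERS" }, // matches the saved names
];
console.log(findCaseMismatches(usedRefs, savedNodes));
```
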