coalesce-transform-mcp 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (134)
  1. package/LICENSE +21 -0
  2. package/README.md +304 -0
  3. package/dist/cache-dir.d.ts +26 -0
  4. package/dist/cache-dir.js +106 -0
  5. package/dist/client.d.ts +25 -0
  6. package/dist/client.js +212 -0
  7. package/dist/coalesce/api/environments.d.ts +20 -0
  8. package/dist/coalesce/api/environments.js +15 -0
  9. package/dist/coalesce/api/git-accounts.d.ts +21 -0
  10. package/dist/coalesce/api/git-accounts.js +21 -0
  11. package/dist/coalesce/api/jobs.d.ts +25 -0
  12. package/dist/coalesce/api/jobs.js +21 -0
  13. package/dist/coalesce/api/nodes.d.ts +29 -0
  14. package/dist/coalesce/api/nodes.js +33 -0
  15. package/dist/coalesce/api/projects.d.ts +22 -0
  16. package/dist/coalesce/api/projects.js +25 -0
  17. package/dist/coalesce/api/runs.d.ts +19 -0
  18. package/dist/coalesce/api/runs.js +34 -0
  19. package/dist/coalesce/api/subgraphs.d.ts +20 -0
  20. package/dist/coalesce/api/subgraphs.js +17 -0
  21. package/dist/coalesce/api/users.d.ts +30 -0
  22. package/dist/coalesce/api/users.js +31 -0
  23. package/dist/coalesce/types.d.ts +298 -0
  24. package/dist/coalesce/types.js +746 -0
  25. package/dist/generated/.gitkeep +0 -0
  26. package/dist/generated/node-type-corpus.json +42656 -0
  27. package/dist/index.d.ts +2 -0
  28. package/dist/index.js +10 -0
  29. package/dist/mcp/cache.d.ts +3 -0
  30. package/dist/mcp/cache.js +137 -0
  31. package/dist/mcp/environments.d.ts +3 -0
  32. package/dist/mcp/environments.js +61 -0
  33. package/dist/mcp/git-accounts.d.ts +3 -0
  34. package/dist/mcp/git-accounts.js +70 -0
  35. package/dist/mcp/jobs.d.ts +3 -0
  36. package/dist/mcp/jobs.js +77 -0
  37. package/dist/mcp/node-type-corpus.d.ts +3 -0
  38. package/dist/mcp/node-type-corpus.js +173 -0
  39. package/dist/mcp/nodes.d.ts +3 -0
  40. package/dist/mcp/nodes.js +341 -0
  41. package/dist/mcp/pipelines.d.ts +3 -0
  42. package/dist/mcp/pipelines.js +342 -0
  43. package/dist/mcp/projects.d.ts +3 -0
  44. package/dist/mcp/projects.js +70 -0
  45. package/dist/mcp/repo-node-types.d.ts +135 -0
  46. package/dist/mcp/repo-node-types.js +387 -0
  47. package/dist/mcp/runs.d.ts +3 -0
  48. package/dist/mcp/runs.js +92 -0
  49. package/dist/mcp/subgraphs.d.ts +3 -0
  50. package/dist/mcp/subgraphs.js +60 -0
  51. package/dist/mcp/users.d.ts +3 -0
  52. package/dist/mcp/users.js +107 -0
  53. package/dist/prompts/index.d.ts +2 -0
  54. package/dist/prompts/index.js +58 -0
  55. package/dist/resources/context/aggregation-patterns.md +145 -0
  56. package/dist/resources/context/data-engineering-principles.md +183 -0
  57. package/dist/resources/context/hydrated-metadata.md +92 -0
  58. package/dist/resources/context/id-discovery.md +64 -0
  59. package/dist/resources/context/intelligent-node-configuration.md +162 -0
  60. package/dist/resources/context/node-creation-decision-tree.md +156 -0
  61. package/dist/resources/context/node-operations.md +316 -0
  62. package/dist/resources/context/node-payloads.md +114 -0
  63. package/dist/resources/context/node-type-corpus.md +166 -0
  64. package/dist/resources/context/node-type-selection-guide.md +96 -0
  65. package/dist/resources/context/overview.md +135 -0
  66. package/dist/resources/context/pipeline-workflows.md +355 -0
  67. package/dist/resources/context/run-operations.md +55 -0
  68. package/dist/resources/context/sql-bigquery.md +41 -0
  69. package/dist/resources/context/sql-databricks.md +40 -0
  70. package/dist/resources/context/sql-platform-selection.md +70 -0
  71. package/dist/resources/context/sql-snowflake.md +43 -0
  72. package/dist/resources/context/storage-mappings.md +49 -0
  73. package/dist/resources/context/tool-usage.md +98 -0
  74. package/dist/resources/index.d.ts +5 -0
  75. package/dist/resources/index.js +254 -0
  76. package/dist/schemas/node-payloads.d.ts +5019 -0
  77. package/dist/schemas/node-payloads.js +147 -0
  78. package/dist/server.d.ts +7 -0
  79. package/dist/server.js +63 -0
  80. package/dist/services/cache/snapshots.d.ts +108 -0
  81. package/dist/services/cache/snapshots.js +275 -0
  82. package/dist/services/config/context-analyzer.d.ts +14 -0
  83. package/dist/services/config/context-analyzer.js +76 -0
  84. package/dist/services/config/field-classifier.d.ts +23 -0
  85. package/dist/services/config/field-classifier.js +47 -0
  86. package/dist/services/config/intelligent.d.ts +55 -0
  87. package/dist/services/config/intelligent.js +306 -0
  88. package/dist/services/config/rules.d.ts +6 -0
  89. package/dist/services/config/rules.js +44 -0
  90. package/dist/services/config/schema-resolver.d.ts +18 -0
  91. package/dist/services/config/schema-resolver.js +80 -0
  92. package/dist/services/corpus/loader.d.ts +56 -0
  93. package/dist/services/corpus/loader.js +25 -0
  94. package/dist/services/corpus/search.d.ts +49 -0
  95. package/dist/services/corpus/search.js +69 -0
  96. package/dist/services/corpus/templates.d.ts +4 -0
  97. package/dist/services/corpus/templates.js +11 -0
  98. package/dist/services/pipelines/execution.d.ts +20 -0
  99. package/dist/services/pipelines/execution.js +290 -0
  100. package/dist/services/pipelines/node-type-intent.d.ts +96 -0
  101. package/dist/services/pipelines/node-type-intent.js +356 -0
  102. package/dist/services/pipelines/node-type-selection.d.ts +66 -0
  103. package/dist/services/pipelines/node-type-selection.js +758 -0
  104. package/dist/services/pipelines/planning.d.ts +543 -0
  105. package/dist/services/pipelines/planning.js +1839 -0
  106. package/dist/services/policies/sql-override.d.ts +7 -0
  107. package/dist/services/policies/sql-override.js +109 -0
  108. package/dist/services/repo/operations.d.ts +6 -0
  109. package/dist/services/repo/operations.js +10 -0
  110. package/dist/services/repo/parser.d.ts +70 -0
  111. package/dist/services/repo/parser.js +365 -0
  112. package/dist/services/repo/path.d.ts +2 -0
  113. package/dist/services/repo/path.js +58 -0
  114. package/dist/services/templates/nodes.d.ts +50 -0
  115. package/dist/services/templates/nodes.js +336 -0
  116. package/dist/services/workspace/analysis.d.ts +56 -0
  117. package/dist/services/workspace/analysis.js +151 -0
  118. package/dist/services/workspace/mutations.d.ts +150 -0
  119. package/dist/services/workspace/mutations.js +1718 -0
  120. package/dist/utils.d.ts +5 -0
  121. package/dist/utils.js +7 -0
  122. package/dist/workflows/get-environment-overview.d.ts +9 -0
  123. package/dist/workflows/get-environment-overview.js +23 -0
  124. package/dist/workflows/get-run-details.d.ts +10 -0
  125. package/dist/workflows/get-run-details.js +28 -0
  126. package/dist/workflows/progress.d.ts +20 -0
  127. package/dist/workflows/progress.js +54 -0
  128. package/dist/workflows/retry-and-wait.d.ts +13 -0
  129. package/dist/workflows/retry-and-wait.js +139 -0
  130. package/dist/workflows/run-and-wait.d.ts +13 -0
  131. package/dist/workflows/run-and-wait.js +141 -0
  132. package/dist/workflows/run-status.d.ts +10 -0
  133. package/dist/workflows/run-status.js +27 -0
  134. package/package.json +34 -0
@@ -0,0 +1,355 @@
1
+ # Pipeline Building Workflows
2
+
3
+ ## Quick Reference
4
+
5
+ | User says | Node type | Action |
6
+ |-----------|-----------|--------|
7
+ | "stage my X data" | `Stage` | `create-workspace-node-from-predecessor` with source node |
8
+ | "create a dimension for X" | `Dimension` | `create-workspace-node-from-predecessor` with staging node |
9
+ | "build a fact table from X and Y" | `Fact` | `create-workspace-node-from-predecessor` with both, then `convert-join-to-aggregation` |
10
+ | "create a view for X" | `View` | `create-workspace-node-from-predecessor` with upstream node |
11
+ | "join X and Y" | `View` | `create-workspace-node-from-predecessor` with both, then apply join condition |
12
+ | "build an incremental pipeline" | `Incremental Load` | See "Incremental Pipeline Setup" below |
13
+ | "create an empty node" | Any | `create-workspace-node-from-scratch` with `name` and `metadata.columns` |
14
+
15
+ Use this table as a heuristic for likely node families, but still call `plan-pipeline` before creating anything so the exact `nodeType` is confirmed from the repo/workspace context. Use the full workflow whenever the request is ambiguous or involves multiple layers.
16
+
17
+ ## Choosing the Right Node Type
18
+
19
+ **Step 1: Check available types**
20
+
21
+ ```javascript
22
+ list-workspace-node-types({ workspaceID })
23
+ ```
24
+
25
+ Prefer types already in use in the workspace. If package-prefixed types such as `base-nodes:::Stage` are observed, use those instead of the built-in types. If a recommended type is not observed, tell the user it may need to be installed via Build Settings > Packages.
26
+
27
+ **Step 2: Read source columns**
28
+
29
+ ```javascript
30
+ get-workspace-node({ workspaceID, nodeID: "source-id" })
31
+ ```
32
+
33
+ Look for timestamp columns (UPDATED_AT, CREATED_AT) for incremental loading, business keys (CUSTOMER_ID) for merge keys, and SCD columns (EFFECTIVE_FROM, IS_CURRENT) for dimension tracking.
34
+
35
+ **Step 3: Select by pipeline layer**
36
+
37
+ | Layer | Default | Use Instead When |
38
+ |-------|---------|-----------------|
39
+ | Staging | `Stage` | Source has timestamps AND is large -> `Incremental Load` (requires package) |
40
+ | Intermediate | `View` | Expensive computation -> `Stage` or `Work` |
41
+ | Dimension (gold) | `Dimension` | Has SCD columns -> configure SCD Type 2 |
42
+ | Fact (gold) | `Fact` | Large append-heavy data -> incremental config |
43
+ | Metrics | `View` | Complex/frequently queried -> `Fact` |
44
+
45
+ IMPORTANT: `View` types can ONLY materialize as views. For aggregations queried repeatedly, use `Dimension` or `Fact`. See `coalesce://context/data-engineering-principles` for platform-specific materialization guidance.
46
+
47
+ Node type format: `"Stage"` (simple) or `"IncrementalLoading:::230"` (PackageName:::NodeTypeID). Prefer the package-prefixed format when known.
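
The two node type formats above can be told apart by looking for the `:::` separator. The following is a minimal sketch only: `parseNodeType` and the `packageName`/`identifier` field names are hypothetical and are not part of the package.

```javascript
// Hypothetical helper (not part of the package): split a node type string
// of the form "Name" or "PackageName:::NodeTypeID" into its parts.
function parseNodeType(nodeType) {
  const separator = ":::";
  const index = nodeType.indexOf(separator);
  if (index === -1) {
    // Simple form such as "Stage": there is no package prefix.
    return { packageName: null, identifier: nodeType };
  }
  return {
    packageName: nodeType.slice(0, index),
    identifier: nodeType.slice(index + separator.length),
  };
}

console.log(parseNodeType("Stage"));
console.log(parseNodeType("IncrementalLoading:::230"));
```

A check like this may help when deciding whether a planned type is package-prefixed before preferring it over a simple name.
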
48
+
49
+ ## Pipeline Building Sequence
50
+
51
+ ### Step 1: Discover the Workspace
52
+
53
+ ```javascript
54
+ list-projects({ includeWorkspaces: true })
55
+ ```
56
+
57
+ ### Step 2: Find Source Nodes
58
+
59
+ ```javascript
60
+ list-workspace-nodes({ workspaceID, detail: true })
61
+ ```
62
+
63
+ Note node IDs, names, and `locationName` (needed for `{{ ref() }}` syntax).
64
+
65
+ ### Step 3: Plan to discover the correct node type
66
+
67
+ **Always call `plan-pipeline` before creating nodes.** The planner scans the repo for all committed node type definitions, scores them against your use case, and returns the best match. Do not guess node types — use what the planner recommends.
68
+
69
+ ```javascript
70
+ plan-pipeline({
71
+ workspaceID,
72
+ goal: "stage customer data",
73
+ sourceNodeIDs: ["source-id-1"],
74
+ repoPath: "/path/to/repo" // or rely on COALESCE_REPO_PATH
75
+ })
76
+ ```
77
+
78
+ The response includes:
79
+
80
+ - `nodeTypeSelection.consideredNodeTypes` — ranked candidates with scores and reasons
81
+ - `supportedNodeTypes` — types that support automatic creation
82
+ - `nodes[].nodeType` — the recommended type for each planned node
83
+
84
+ **If the user provides SQL:** Pass it directly to the planner.
85
+
86
+ ```javascript
87
+ plan-pipeline({ workspaceID, sql: "<USER-PROVIDED SQL HERE>" })
88
+ ```
89
+
90
+ Never author SQL yourself to pass to these tools.
91
+
92
+ ### Step 4: Review and Execute
93
+
94
+ - Plan returns `status: "ready"` -> use the recommended node type to create, or call `create-pipeline-from-plan`
95
+ - Plan returns `status: "needs_clarification"` -> address `openQuestions`, then re-plan
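
As a rough illustration of the two branches above, the routing could be sketched as follows. The `status`, `openQuestions`, and `nodes[].nodeType` fields come from this document; `nextActionForPlan` and the returned `action` labels are hypothetical, and the exact plan shape is an assumption.

```javascript
// Hypothetical router (not part of the package) for a plan-pipeline response.
function nextActionForPlan(plan) {
  if (plan.status === "ready") {
    // Use the node types the planner recommended for each planned node.
    const nodeTypes = (plan.nodes ?? []).map((node) => node.nodeType);
    return { action: "create", nodeTypes };
  }
  if (plan.status === "needs_clarification") {
    // Resolve the open questions with the user, then re-plan.
    return { action: "ask-user", questions: plan.openQuestions ?? [] };
  }
  // Any other status is outside what this document describes.
  return { action: "stop", reason: `Unrecognized plan status: ${String(plan.status)}` };
}

console.log(nextActionForPlan({ status: "ready", nodes: [{ nodeType: "base-nodes:::Stage" }] }));
console.log(nextActionForPlan({ status: "needs_clarification", openQuestions: ["Which key joins the sources?"] }));
```
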
96
+
97
+ ### Step 5: Create Nodes with the planned node type
98
+
99
+ Use `create-workspace-node-from-predecessor` with the node type from the plan. **Always pass `repoPath`** so config completion runs automatically:
100
+
101
+ ```javascript
102
+ create-workspace-node-from-predecessor({
103
+ workspaceID,
104
+ nodeType: "base-nodes:::Stage", // from plan-pipeline result
105
+ predecessorNodeIDs: ["source-id-1"],
106
+ changes: { name: "STG_CUSTOMER" },
107
+ repoPath: "/path/to/repo" // enables automatic config completion
108
+ })
109
+ ```
110
+
111
+ **Always use `create-workspace-node-from-predecessor` or `create-workspace-node-from-scratch`** — they handle validation, config completion, and column-level attributes automatically.
112
+
113
+ For joins and aggregations:
114
+
115
+ ```javascript
116
+ convert-join-to-aggregation({
117
+ workspaceID, nodeID: "join-node-id",
118
+ groupByColumns: [...], aggregates: [...],
119
+ maintainJoins: true,
120
+ repoPath: "/path/to/repo"
121
+ })
122
+ ```
123
+
124
+ ### Step 6: Verify Config Completion
125
+
126
+ Config is applied automatically when `repoPath` is provided. Check the `configCompletion` field in the response:
127
+
128
+ - `configCompletion.configReview.status` — `complete`, `needs_attention`, or `incomplete`
129
+ - `configCompletion.configReview.summary` — human-readable status summary
130
+ - `configCompletion.configReview.missingRequired` — required fields/columnSelectors still unset
131
+ - `configCompletion.configReview.warnings` — issues needing manual review (e.g., missing business keys on dimension nodes)
132
+ - `configCompletion.configReview.suggestions` — optional improvements (e.g., change tracking, materialization)
133
+ - `configCompletion.appliedConfig` — node-level config values that were set
134
+ - `configCompletion.columnAttributeChanges.applied` — column-level attributes (isBusinessKey, etc.)
135
+ - `configCompletion.reasoning` — why each decision was made
136
+
137
+ **Action required when `configReview.status` is not `complete`:**
138
+
139
+ - `incomplete` — required fields are missing. Set them via `update-workspace-node` or `replace-workspace-node-columns`.
140
+ - `needs_attention` — warnings need manual review (e.g., set `isBusinessKey` on the correct columns).
141
+
142
+ If `configCompletionSkipped` appears instead, call `complete-node-configuration` with `repoPath` to retry.
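
The follow-up rules above can be condensed into one decision function. This is a sketch under assumptions: the field names are taken from this document, but `followUpForConfigReview`, the returned `action` labels, and the assumption that `missingRequired` and `warnings` are arrays are all hypothetical.

```javascript
// Hypothetical helper (not part of the package): decide what to do after a
// node-creation response, based on its config completion fields.
function followUpForConfigReview(response) {
  if (response.configCompletionSkipped) {
    // Completion did not run; retry it explicitly with repoPath.
    return { action: "complete-node-configuration" };
  }
  const review = response.configCompletion?.configReview;
  if (!review) {
    return { action: "inspect-response" };
  }
  switch (review.status) {
    case "complete":
      return { action: "none" };
    case "incomplete":
      // Required fields are missing and must be set on the node.
      return { action: "set-required-fields", fields: review.missingRequired ?? [] };
    case "needs_attention":
      // Warnings need a manual review, for example business key selection.
      return { action: "manual-review", warnings: review.warnings ?? [] };
    default:
      return { action: "inspect-response" };
  }
}

console.log(followUpForConfigReview({
  configCompletion: { configReview: { status: "incomplete", missingRequired: ["businessKey"] } },
}));
```
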
143
+
144
+ ### Step 7: Follow-Up Edits
145
+
146
+ Use `update-workspace-node` for post-creation changes.
147
+
148
+ ## Multi-Node Pipelines
149
+
150
+ Create nodes bottom-up — upstream before downstream. Each step uses node IDs from the previous step.
151
+
152
+ **Example: join two sources then aggregate**
153
+
154
+ ```javascript
155
+ // 0. Plan to discover correct node types for each layer
156
+ plan-pipeline({
157
+ workspaceID,
158
+ goal: "stage customer and orders, then build CLV fact",
159
+ sourceNodeIDs: ["source-customer-id", "source-orders-id"],
160
+ repoPath: "/path/to/repo"
161
+ })
162
+ // -> nodeTypeSelection shows e.g. "base-nodes:::Stage" for staging,
163
+ // "base-nodes:::Fact" for the fact layer
164
+
165
+ // 1. Stage each source using the planned node type (independent — parallelize)
166
+ create-workspace-node-from-predecessor({
167
+ workspaceID, nodeType: "base-nodes:::Stage", // from plan
168
+ predecessorNodeIDs: ["source-customer-id"],
169
+ changes: { name: "STG_CUSTOMER" },
170
+ repoPath: "/path/to/repo" // auto-completes config
171
+ })
172
+ // -> "stg-cust-id" + configCompletion shows applied config
173
+
174
+ create-workspace-node-from-predecessor({
175
+ workspaceID, nodeType: "base-nodes:::Stage", // from plan
176
+ predecessorNodeIDs: ["source-orders-id"],
177
+ changes: { name: "STG_ORDERS" },
178
+ repoPath: "/path/to/repo"
179
+ })
180
+ // -> "stg-orders-id" + configCompletion shows applied config
181
+
182
+ // 2. Create join/fact node with planned fact type
183
+ create-workspace-node-from-predecessor({
184
+ workspaceID, nodeType: "base-nodes:::Fact", // from plan
185
+ predecessorNodeIDs: ["stg-cust-id", "stg-orders-id"],
186
+ changes: { name: "FACT_CLV" },
187
+ repoPath: "/path/to/repo"
188
+ })
189
+ // -> "fact-clv-id", joinSuggestions with common columns, configCompletion
190
+
191
+ // 3. Aggregate with automatic JOIN ON generation
192
+ convert-join-to-aggregation({
193
+ workspaceID, nodeID: "fact-clv-id",
194
+ groupByColumns: ['"STG_CUSTOMER"."CUSTOMER_ID"'],
195
+ aggregates: [
196
+ { name: "TOTAL_ORDERS", function: "COUNT", expression: 'DISTINCT "STG_ORDERS"."ORDER_ID"' },
197
+ { name: "LIFETIME_VALUE", function: "SUM", expression: '"STG_ORDERS"."ORDER_TOTAL"' }
198
+ ],
199
+ maintainJoins: true,
200
+ repoPath: "/path/to/repo"
201
+ })
202
+ ```
203
+
204
+ **Verification after each step:** Check `validation.allPredecessorsRepresented`, `validation.autoPopulatedColumns`, `warning`, and `joinSuggestions` before proceeding.
205
+
206
+ **If a node is created with 0 columns:** The predecessor may have no columns, the predecessor IDs may be wrong, or the node type may not auto-inherit columns. Try a projection-capable type (Stage, View, Work) or add columns with `replace-workspace-node-columns`.
207
+
208
+ ### UNION / Multi-Source Nodes
209
+
210
+ ```javascript
211
+ create-workspace-node-from-predecessor({
212
+ workspaceID, nodeType: "Stage",
213
+ predecessorNodeIDs: ["stg-orders-us-id", "stg-orders-eu-id"],
214
+ changes: { name: "STG_ORDERS_ALL" }
215
+ })
216
+
217
+ update-workspace-node({
218
+ workspaceID, nodeID: "new-node-id",
219
+ changes: { config: { insertStrategy: "UNION ALL" } }
220
+ })
221
+ ```
222
+
223
+ `insertStrategy` values: `"UNION"` (deduplicate), `"UNION ALL"` (keep all rows), `"INSERT"` (sequential, default).
224
+
225
+ ## Incremental Pipeline Setup
226
+
227
+ **With the Incremental-Nodes package** (check `list-workspace-node-types` for `IncrementalLoading:::230`):
228
+
229
+ ```javascript
230
+ // 1. Create the node
231
+ create-workspace-node-from-predecessor({
232
+ workspaceID, nodeType: "IncrementalLoading:::230",
233
+ predecessorNodeIDs: ["source-orders-id"],
234
+ changes: { name: "INC_ORDERS" }
235
+ })
236
+
237
+ // 2. Configure incremental settings
238
+ update-workspace-node({
239
+ workspaceID, nodeID: "inc-orders-id",
240
+ changes: {
241
+ config: {
242
+ filterBasedOnPersistentTable: true,
243
+ persistentTableLocationName: "STAGING",
244
+ persistentTableName: "INC_ORDERS",
245
+ incrementalLoadColumn: "UPDATED_AT"
246
+ }
247
+ }
248
+ })
249
+ ```
250
+
251
+ How it works: the node reads the MAX of the high-water mark column from the target table, filters the source to rows above that value, and INSERTs the new rows. Any MERGE/upsert/SCD logic happens downstream in Dimension or Fact nodes.
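
The high-water mark logic can be illustrated on in-memory rows. This is only a conceptual sketch of the filter, not how the node executes in the warehouse; `incrementalRows` and the sample data are hypothetical, and ISO-8601 date strings are assumed so that string comparison matches chronological order.

```javascript
// Hypothetical illustration (not part of the package): keep only source rows
// whose value in `column` is above the maximum already present in the target.
function incrementalRows(sourceRows, targetRows, column) {
  // Find the high-water mark: the MAX of the column in the target, or null
  // when the target is still empty.
  const highWaterMark = targetRows.reduce(
    (max, row) => (max === null || row[column] > max ? row[column] : max),
    null
  );
  // With an empty target every source row is new.
  return sourceRows.filter(
    (row) => highWaterMark === null || row[column] > highWaterMark
  );
}

const target = [{ ORDER_ID: 1, UPDATED_AT: "2024-01-02" }];
const source = [
  { ORDER_ID: 1, UPDATED_AT: "2024-01-02" },
  { ORDER_ID: 2, UPDATED_AT: "2024-01-03" },
  { ORDER_ID: 3, UPDATED_AT: "2024-01-01" },
];
console.log(incrementalRows(source, target, "UPDATED_AT"));
```

Note that a row arriving late with a timestamp below the mark (ORDER_ID 3 here) is not picked up by this filter.
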
252
+
253
+ **Without the package** — use a regular Stage with a MAX subquery in joinCondition:
254
+
255
+ ```javascript
256
+ update-workspace-node({
257
+ workspaceID, nodeID: "new-node-id",
258
+ changes: {
259
+ config: { truncateBefore: false },
260
+ metadata: {
261
+ sourceMapping: [{
262
+ name: "STG_ORDERS_INCREMENTAL",
263
+ dependencies: [{ locationName: "RAW", nodeName: "ORDERS" }],
264
+ join: {
265
+ joinCondition: 'FROM {{ ref(\'RAW\', \'ORDERS\') }} "ORDERS"\nWHERE "ORDERS"."UPDATED_AT" > (\n SELECT COALESCE(MAX("UPDATED_AT"), \'1900-01-01\')\n FROM {{ ref_no_link(\'STAGING\', \'STG_ORDERS_INCREMENTAL\') }}\n)'
266
+ },
267
+ customSQL: { customSQL: "" }, aliases: {}, noLinkRefs: []
268
+ }]
269
+ }
270
+ }
271
+ })
272
+ ```
273
+
274
+ Use `{{ ref_no_link() }}` for the self-reference to avoid circular DAG dependencies.
275
+
276
+ ## Data Engineering Best Practices
277
+
278
+ ### Naming Conventions
279
+
280
+ Follow layer-appropriate naming to keep pipelines readable:
281
+
282
+ | Layer | Convention | Examples |
283
+ |-------|-----------|----------|
284
+ | Staging | `STG_<SOURCE>` | `STG_CUSTOMERS`, `STG_ORDERS` |
285
+ | Intermediate | `INT_<PURPOSE>` or `WRK_<PURPOSE>` | `INT_ORDER_ENRICHMENT`, `WRK_CUSTOMER_DEDUP` |
286
+ | Dimension | `DIM_<ENTITY>` | `DIM_CUSTOMER`, `DIM_PRODUCT` |
287
+ | Fact | `FACT_<PROCESS>` or `FCT_<PROCESS>` | `FACT_SALES`, `FCT_CLV` |
288
+ | View | `V_<PURPOSE>` | `V_ACTIVE_CUSTOMERS` |
289
+ | Hub | `HUB_<KEY>` | `HUB_CUSTOMER` |
290
+ | Satellite | `SAT_<HUB>_<CONTEXT>` | `SAT_CUSTOMER_DETAILS` |
291
+ | Link | `LNK_<RELATIONSHIP>` | `LNK_CUSTOMER_ORDER` |
292
+
293
+ For Snowflake, UPPERCASE node names are conventional (`STG_LOCATION`) since unquoted identifiers are uppercase. For Databricks/BigQuery, lowercase is typical. **Always respect the user's chosen casing** — if they provide or create a node with a specific case, preserve it exactly.
294
+
295
+ ### Join Verification Checklist
296
+
297
+ After creating a multi-predecessor node, ALWAYS:
298
+
299
+ 1. **Review `joinSuggestions`** — the response shows common columns between predecessors. Confirm these are the correct business keys for the join.
300
+ 2. **Choose the right join type:**
301
+ - `INNER JOIN` — only matching rows (use when every record must exist in both tables)
302
+ - `LEFT JOIN` — keep all rows from the primary table (use when the left table is the "driver" and right table may have missing matches)
303
+ - `FULL OUTER JOIN` — keep all rows from both tables (rare, use for reconciliation)
304
+ 3. **Verify join cardinality:** Joining a 1M-row table to a 10M-row table on a non-unique key causes fan-out (row multiplication). Ensure at least one side of the join is unique on the join key.
305
+ 4. **Set the join condition** — call `convert-join-to-aggregation` (for aggregation), `apply-join-condition` (for row-level joins), or `update-workspace-node` (for full manual control)
306
+ 5. **Verify columns** — call `get-workspace-node` to confirm the final column list and transforms are correct
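
The fan-out risk in item 3 can be demonstrated on small in-memory samples. The sketch below is illustrative only; `isUniqueOnKey` and `joinedRowCount` are hypothetical helpers, not tools in this MCP, and in practice the check would run as SQL against the warehouse.

```javascript
// Hypothetical helper: true when no two rows share the same join key value.
function isUniqueOnKey(rows, key) {
  const seen = new Set();
  for (const row of rows) {
    if (seen.has(row[key])) return false;
    seen.add(row[key]);
  }
  return true;
}

// Hypothetical helper: number of rows an INNER JOIN on `key` would produce.
function joinedRowCount(left, right, key) {
  const rightCounts = new Map();
  for (const row of right) {
    rightCounts.set(row[key], (rightCounts.get(row[key]) ?? 0) + 1);
  }
  // Each left row matches every right row that has the same key value.
  return left.reduce((total, row) => total + (rightCounts.get(row[key]) ?? 0), 0);
}

const customers = [{ CUSTOMER_ID: 1 }, { CUSTOMER_ID: 1 }, { CUSTOMER_ID: 2 }];
const orders = [{ CUSTOMER_ID: 1 }, { CUSTOMER_ID: 1 }, { CUSTOMER_ID: 3 }];
console.log(isUniqueOnKey(customers, "CUSTOMER_ID"), isUniqueOnKey(orders, "CUSTOMER_ID"));
console.log(joinedRowCount(customers, orders, "CUSTOMER_ID"));
```

When neither side is unique on the key, the joined row count can exceed the size of either input, which is the row multiplication the checklist warns about.
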
307
+
308
+ ### Fact Table Grain
309
+
310
+ When building fact tables, define the **grain** (the set of columns that uniquely identifies each row):
311
+
312
+ - Grain columns become your `groupByColumns` in `convert-join-to-aggregation`
313
+ - Mark grain columns as `isBusinessKey: true`
314
+ - All other columns should be aggregates (COUNT, SUM, AVG, etc.)
315
+ - If unsure about the grain, ask the user: "What uniquely identifies each row in this fact table?"
316
+
317
+ Example: A sales fact table might have grain = `[CUSTOMER_ID, ORDER_DATE, PRODUCT_ID]` with measures `QUANTITY`, `REVENUE`, `DISCOUNT_AMOUNT`.
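
To make the grain idea concrete, the following sketch groups in-memory rows by the grain columns and sums one measure. It is an illustration of the concept only; `aggregateToGrain` and the sample rows are hypothetical, and the real aggregation is produced by `convert-join-to-aggregation` as SQL.

```javascript
// Hypothetical illustration: one output row per distinct grain combination,
// with the measure summed across the input rows in that group.
function aggregateToGrain(rows, grainColumns, measureColumn) {
  const groups = new Map();
  for (const row of rows) {
    // The grain values together identify the output row.
    const key = JSON.stringify(grainColumns.map((column) => row[column]));
    const group =
      groups.get(key) ??
      Object.fromEntries(grainColumns.map((column) => [column, row[column]]));
    group[measureColumn] = (group[measureColumn] ?? 0) + row[measureColumn];
    groups.set(key, group);
  }
  return [...groups.values()];
}

const sales = [
  { CUSTOMER_ID: 1, ORDER_DATE: "2024-01-01", REVENUE: 10 },
  { CUSTOMER_ID: 1, ORDER_DATE: "2024-01-01", REVENUE: 5 },
  { CUSTOMER_ID: 2, ORDER_DATE: "2024-01-01", REVENUE: 7 },
];
console.log(aggregateToGrain(sales, ["CUSTOMER_ID", "ORDER_DATE"], "REVENUE"));
```
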
318
+
319
+ ### Post-Creation Verification
320
+
321
+ After creating each node, verify before moving to the next:
322
+
323
+ 1. **Check `nextSteps`** in the creation response — follow all required steps
324
+ 2. **Check `validation.allPredecessorsRepresented`** — if false, predecessors are missing from column sources
325
+ 3. **Check `configCompletion`** — verify applied config and column attributes make sense
326
+ 4. **For multi-predecessor nodes:** Confirm the join condition was set (call `get-workspace-node` to verify `metadata.sourceMapping[].join.joinCondition` is not empty)
327
+ 5. **For aggregation nodes:** Verify GROUP BY is valid (`validation.valid: true` from `convert-join-to-aggregation`)
328
+
329
+ ### Materialization Strategy
330
+
331
+ Choose materialization based on the node's role and query patterns:
332
+
333
+ - **Staging/Bronze:** Always `table` (preserve raw data; Snowflake: use transient tables)
334
+ - **Intermediate:** `view` for lightweight transforms; `table` for expensive computations
335
+ - **Dimensions:** `table` (small, queried repeatedly, need persistence)
336
+ - **Facts:** `table` with incremental loading for large volumes
337
+ - **Metrics/aggregations queried frequently:** `table` via Dimension or Fact (NEVER `view` for repeated aggregation queries)
338
+
339
+ IMPORTANT: `View` node types can ONLY materialize as views. If you need a table, use `Dimension`, `Fact`, `Stage`, or `Work`.
340
+
341
+ ## After Building the Pipeline
342
+
343
+ 1. **Deploy**: `start-run` with `runType: "deploy"`
344
+ 2. **Run**: `start-run` with `runType: "refresh"`
345
+ 3. **Monitor**: `run-status` or `run-and-wait`
346
+ 4. **Troubleshoot**: `get-run-results` for errors, `retry-run` to re-run
347
+
348
+ Scheduling is configured via Jobs in the Coalesce UI. Trigger existing jobs with `start-run` and `jobID`. See `coalesce://context/run-operations` for full guidance.
349
+
350
+ ## Related Resources
351
+
352
+ - `coalesce://context/node-creation-decision-tree` — routing: which tool to use
353
+ - `coalesce://context/node-operations` — editing nodes after creation
354
+ - `coalesce://context/data-engineering-principles` — architecture and materialization
355
+ - `coalesce://context/aggregation-patterns` — GROUP BY, datatype inference
@@ -0,0 +1,55 @@
1
+ # Run Operations
2
+
3
+ This guidance matches the Coalesce run source model and the actual run workflows registered in this MCP.
4
+
5
+ ## Current Tool Surface
6
+
7
+ Use these run tools and workflows:
8
+
9
+ - `list-runs`
10
+ - `get-run`
11
+ - `get-run-results`
12
+ - `get-run-details`
13
+ - `start-run`
14
+ - `run-status`
15
+ - `run-and-wait`
16
+ - `retry-run`
17
+ - `retry-and-wait`
18
+ - `cancel-run`
19
+
20
+ ## Identifier And Status Model
21
+
22
+ - Coalesce scheduler start and rerun flows return a numeric `runCounter`.
23
+ - `run-status` polls by `runCounter`.
24
+ - The same numeric identifier is used as `runID` for `get-run`, `get-run-results`, and `get-run-details`.
25
+ - Non-terminal statuses are `waitingToRun` and `running`.
26
+ - Terminal statuses are `completed`, `failed`, and `canceled`.
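
The status model above lends itself to a small classifier that a polling loop could consult. The status strings are the ones listed in this section; `classifyRunStatus` and `shouldKeepPolling` are hypothetical names used for illustration.

```javascript
// Status values as listed in this section.
const NON_TERMINAL_STATUSES = new Set(["waitingToRun", "running"]);
const TERMINAL_STATUSES = new Set(["completed", "failed", "canceled"]);

// Hypothetical helper: classify a status string returned by run-status.
function classifyRunStatus(status) {
  if (TERMINAL_STATUSES.has(status)) return "terminal";
  if (NON_TERMINAL_STATUSES.has(status)) return "non-terminal";
  return "unknown";
}

// Hypothetical helper: keep polling only while the run is known to be in flight.
function shouldKeepPolling(status) {
  return classifyRunStatus(status) === "non-terminal";
}

for (const status of ["waitingToRun", "running", "completed", "failed", "canceled"]) {
  console.log(status, classifyRunStatus(status), shouldKeepPolling(status));
}
```

Treating an unrecognized status as "unknown" rather than terminal is a design choice in this sketch, so that a caller can surface it instead of silently reporting an outcome.
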
27
+
28
+ ## Source-Derived Lifecycle
29
+
30
+ - Coalesce app helpers call `/scheduler/startRun` or `/scheduler/rerun`, then poll `/scheduler/runStatus` until the run reaches a terminal status.
31
+ - The source treats only a terminal status as the point at which success or failure can be asserted.
32
+ - The MCP `run-and-wait` and `retry-and-wait` workflows follow that same model and then fetch `/api/v1/runs/{runCounter}/results`.
33
+
34
+ ## Routing Rules
35
+
36
+ - Use `run-and-wait` when the user wants the final run outcome in one call.
37
+ - Use `retry-and-wait` when the prior run has already failed and should be retried immediately.
38
+ - Use `start-run` or `retry-run` when you want explicit control over the polling sequence.
39
+ - Use `run-status` for live scheduler state by `runCounter`.
40
+ - Use `get-run-details` when you want run metadata and results together.
41
+ - Use `get-run` or `get-run-results` when you only need one side of that data.
42
+ - Use `cancel-run` only with `runID`, `environmentID`, and org context.
43
+
44
+ ## Practical Checks
45
+
46
+ - Do not treat "request accepted" as the same thing as "run completed".
47
+ - Inspect terminal status and result payloads before reporting success.
48
+ - If the user only knows a job name, resolve its numeric ID first.
49
+ - `run-and-wait` and `retry-and-wait` can still return `resultsError`, `incomplete`, or `timedOut`; inspect those fields before calling the workflow successful.
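
A guard along these lines could sit between a wait workflow and any success message. It is a sketch under assumptions: `resultsError`, `incomplete`, and `timedOut` are named in this section, but the presence of a top-level `status` field on the workflow result is an assumption, and `workflowSucceeded` is a hypothetical name.

```javascript
// Hypothetical guard (not part of the package): report success only when no
// failure flag is set and the run reached the "completed" terminal status.
function workflowSucceeded(result) {
  // Any of these fields means the outcome is not a clean success.
  if (result.resultsError || result.incomplete || result.timedOut) {
    return false;
  }
  // ASSUMPTION: the workflow result exposes the final run status as `status`.
  return result.status === "completed";
}

console.log(workflowSucceeded({ status: "completed" }));
console.log(workflowSucceeded({ status: "completed", resultsError: "results fetch failed" }));
console.log(workflowSucceeded({ status: "running", timedOut: true }));
```
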
50
+
51
+ ## Avoid
52
+
53
+ - Do not use browser URL UUID fragments as run IDs.
54
+ - Do not poll `run-status` with a job ID or environment ID.
55
+ - Do not retry runs that are still `waitingToRun` or `running`.
@@ -0,0 +1,41 @@
1
+ # SQL Rules: BigQuery
2
+
3
+ This guidance is based on the AI runtime platform rules for BigQuery and the related Coalesce SQL rendering behavior.
4
+
5
+ ## Default SQL Style
6
+
7
+ - Prefer lowercase identifiers.
8
+ - Use backticks to quote identifiers when needed.
9
+ - For raw physical references, prefer fully qualified `project.dataset.table` paths and quote the entire path with one pair of backticks when quoting is necessary.
10
+ - Avoid `SELECT *`; prefer explicit column lists.
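
The quoting rule for raw physical references can be shown with a small string helper. `quoteBigQueryPath` is a hypothetical name, and the sketch only covers the single-pair-of-backticks form described above, not the per-part form that Coalesce may generate.

```javascript
// Hypothetical helper: build a fully qualified BigQuery reference and wrap
// the whole project.dataset.table path in one pair of backticks.
function quoteBigQueryPath(project, dataset, table) {
  const path = [project, dataset, table].join(".");
  if (path.includes("`")) {
    // A backtick inside a part would break the quoting; reject it here.
    throw new Error("Path parts must not contain backticks");
  }
  return "`" + path + "`";
}

console.log(quoteBigQueryPath("my-project", "sales", "orders"));
```
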
11
+
12
+ ## Coalesce `ref()` Syntax
13
+
14
+ - In Coalesce node SQL, node-to-node dependencies should stay in logical ref form:
15
+
16
+ ```jinja
17
+ FROM {{ ref('sample', 'nation') }} nation
18
+ LEFT JOIN {{ ref('storage_location', 'stg_foo') }} stg_foo
19
+ ON nation.id = stg_foo.nation_id
20
+ ```
21
+
22
+ - The SQL processing layer treats `ref()` arguments as exact `locationName` and `nodeName` values.
23
+ - Keep `ref()` arguments aligned with the saved Coalesce names, even when the rest of the SQL follows BigQuery lowercase conventions.
24
+ - For more on logical locations, use `coalesce://context/storage-mappings`.
25
+
26
+ ## Physical Object Names
27
+
28
+ - When Coalesce renders physical database and schema locations for BigQuery, generated physical references may appear in backtick form such as `` `project`.`dataset`. ``
29
+ - Preserve that generated style when editing SQL that already includes physical references from Coalesce metadata or templates.
30
+
31
+ ## Common Source-Backed Functions
32
+
33
+ - String: `split`, `substr`, `upper`, `lower`, `regexp_extract`, `replace`
34
+ - Date: `date_trunc`, `timestamp_trunc`, `date_add`, `date_sub`, `timestamp_add`, `timestamp_diff`
35
+ - Aggregate: `sum`, `count`, `avg`, `min`, `max`, `any_value`, `string_agg`, `array_agg`
36
+
37
+ ## Avoid
38
+
39
+ - Do not replace Coalesce `ref()` references with raw project.dataset.table paths unless the user explicitly wants physical SQL.
40
+ - Do not switch BigQuery identifier quoting to Snowflake-style double quotes.
41
+ - Do not introduce wildcard projections when explicit columns are practical.
@@ -0,0 +1,40 @@
1
+ # SQL Rules: Databricks
2
+
3
+ This guidance is based on the AI runtime platform rules for Databricks and the related Coalesce SQL rendering behavior.
4
+
5
+ ## Default SQL Style
6
+
7
+ - Table references should use backticks rather than double quotes.
8
+ - Use lowercase identifiers for SQL you introduce.
9
+ - Use clear naming prefixes such as `stg_`, `dim_`, and `fct_` when naming new objects.
10
+
11
+ ## Coalesce `ref()` Syntax
12
+
13
+ - Coalesce node-to-node SQL should use logical refs:
14
+
15
+ ```jinja
16
+ FROM {{ ref('sample', 'nation') }} `nation`
17
+ JOIN {{ ref('storage_location', 'stg_foo') }} `stg_foo`
18
+ ```
19
+
20
+ - The SQL processing layer treats `ref()` arguments as exact `locationName` and `nodeName` values.
21
+ - Keep `ref()` arguments aligned with the saved Coalesce names, even when the rest of the SQL follows Databricks lowercase conventions.
22
+ - For more on logical locations, use `coalesce://context/storage-mappings`.
23
+
24
+ ## Physical Object Names
25
+
26
+ - When Coalesce renders physical database and schema locations for Databricks, object names appear in backtick form such as `` `catalog`.`schema`. `` or `` `db`.`schema`. ``
27
+ - Preserve that generated style when editing SQL that already includes physical references from Coalesce metadata or templates.
28
+
29
+ ## Common Source-Backed Functions
30
+
31
+ - String: `split`, `regexp_extract`, `initcap`, `translate`, `reverse`
32
+ - Date: `date_trunc`, `add_months`, `months_between`, `next_day`, `current_date`
33
+ - Array: `explode`, `array_contains`, `size`, `slice`, `posexplode`
34
+ - Aggregate: `collect_list`, `collect_set`, `count_distinct`, `percentile_approx`, `any_value`
35
+
36
+ ## Avoid
37
+
38
+ - Do not rewrite `ref()` arguments into physical warehouse paths.
39
+ - Do not use Snowflake double quotes for Databricks identifiers.
40
+ - Do not mix uppercase Snowflake-style aliasing with Databricks lowercase conventions unless the saved names require it.
@@ -0,0 +1,70 @@
1
+ # SQL Platform Selection
2
+
3
+ Read this resource before writing or editing SQL in a Coalesce node.
4
+
5
+ ## Goal
6
+
7
+ Determine the platform first, then read exactly one dialect resource:
8
+
9
+ - `coalesce://context/sql-snowflake`
10
+ - `coalesce://context/sql-databricks`
11
+ - `coalesce://context/sql-bigquery`
12
+
13
+ Follow exactly one dialect's rules per edit. Mixing dialect conventions in one node can cause compilation errors.
14
+
15
+ ## Best Signal Order
16
+
17
+ ### 1. Check Project Metadata First
18
+
19
+ Use `get-project` or `list-projects` and inspect the returned project metadata for the warehouse or platform type.
20
+
21
+ This is the best first signal because it reflects the project configuration directly.
22
+
23
+ ### 2. Check Existing Node SQL
24
+
25
+ If you are editing an existing node, read the current node and inspect:
26
+
27
+ - identifier casing
28
+ - quoting style
29
+ - function names
30
+ - date/time function patterns
31
+ - join and alias style
32
+
33
+ Preserve the existing node and workspace conventions instead of normalizing everything.
34
+
35
+ ### 3. Check Neighboring Nodes
36
+
37
+ If one node is ambiguous, inspect nearby workspace nodes in the same layer or dependency chain.
38
+
39
+ Workspace-local conventions are a better guide than generic SQL style advice.
40
+
41
+ ### 4. Ask the User If Still Unclear
42
+
43
+ If project metadata and existing SQL still do not settle the dialect, ask the user.
44
+
45
+ When the dialect choice materially affects correctness (e.g., function names, quoting, type syntax), ask the user rather than guessing.
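The signal order above can be sketched as a first-match chain that falls back to asking the user. Everything here is an assumption made for illustration: the `platform` field on `ProjectMetadata`, the `Signal` type, and the function names are hypothetical, and the real project metadata returned by `get-project` may be shaped differently.

```typescript
type Dialect = "snowflake" | "databricks" | "bigquery";

// A signal inspects one source of evidence and returns a dialect, or undefined if unsure.
type Signal = () => Dialect | undefined;

// Hypothetical metadata shape; the real field name may differ.
interface ProjectMetadata {
  platform?: string;
}

// Step 1: read the platform from project metadata when it names a known warehouse.
function dialectFromMetadata(meta: ProjectMetadata): Dialect | undefined {
  const value = (meta.platform ?? "").toLowerCase();
  if (value.includes("snowflake")) return "snowflake";
  if (value.includes("databricks")) return "databricks";
  if (value.includes("bigquery")) return "bigquery";
  return undefined;
}

// Try signals in priority order; if none settles the dialect, ask the user.
function selectDialect(signals: Signal[]): Dialect | "ask-user" {
  for (const signal of signals) {
    const dialect = signal();
    if (dialect !== undefined) return dialect;
  }
  return "ask-user";
}

// Map the chosen dialect to the single resource that should be read next.
function resourceFor(dialect: Dialect): string {
  return "coalesce://context/sql-" + dialect;
}

// Example: metadata settles the question, so later signals are never consulted.
const choice = selectDialect([
  () => dialectFromMetadata({ platform: "Databricks" }),
  () => undefined, // placeholder for inspecting the node's own SQL
  () => undefined, // placeholder for inspecting neighboring nodes
]);
const nextResource = choice === "ask-user" ? undefined : resourceFor(choice);
```

Keeping each signal as a separate function makes the priority explicit and leaves room for node-level and neighbor-level checks without guessing when they are inconclusive.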
46
+
47
+ ## Coalesce-Specific Rule
48
+
49
+ Inside Coalesce node SQL, prefer `{{ ref(...) }}` for node and storage references. For full reference syntax details, see `coalesce://context/storage-mappings`.
50
+
51
+ Preserve `{{ ref(...) }}` syntax for node references. Only replace with raw warehouse-qualified table names if the user explicitly wants warehouse-native SQL outside normal Coalesce patterns.
52
+
53
+ ## Editing Principles
54
+
55
+ This section is the canonical statement of the "preserve workspace conventions" rule for SQL edits.
56
+
57
+ When modifying existing SQL:
58
+
59
+ - preserve the current dialect
60
+ - preserve the current quoting and casing style unless it is clearly broken
61
+ - avoid broad formatting rewrites that do not change behavior
62
+ - preserve existing workspace style over personal preferences
63
+
64
+ When generating entirely new SQL and no local convention exists:
65
+
66
+ - use the selected platform resource as the default style guide
67
+
68
+ ## Special Note
69
+
70
+ Run-tool authentication in this MCP server uses Snowflake key-pair authentication, but that does not mean every project uses Snowflake SQL semantics. Determine the SQL platform from project metadata and node SQL, not from run-tool auth requirements.
@@ -0,0 +1,43 @@
1
+ # SQL Rules: Snowflake
2
+
3
+ This guidance is based on the AI runtime platform rules for Snowflake and the related Coalesce SQL rendering behavior.
4
+
5
+ ## Default SQL Style
6
+
7
+ - Default to uppercase unquoted identifiers for column names, aliases, and SQL identifiers (Snowflake convention).
8
+ - Node names become Snowflake table/view names. Default to UPPERCASE (`STG_LOCATION`, `FACT_ORDERS`), but **respect the user's chosen casing** — if they name a node in lowercase, preserve it.
9
+ - Use double quotes only when you must preserve exact case or quote a reserved word.
10
+ - Column names and aliases should normally be uppercase and unquoted.
11
+ - Table aliases should normally be uppercase and unquoted.
12
+ - Avoid backticks, single-quoted identifiers, or mixed quoting styles.
13
+
14
+ ## Coalesce `ref()` Syntax
15
+
16
+ - Coalesce node-to-node SQL should use logical refs:
17
+
18
+ ```jinja
19
+ FROM {{ ref("SAMPLE", "NATION") }} N
20
+ LEFT JOIN {{ ref("STORAGE_LOCATION", "STG_FOO") }} F
21
+ ON N.ID = F.NATION_ID
22
+ ```
23
+
24
+ - The SQL processing layer treats `ref()` arguments as exact `locationName` and `nodeName` values.
25
+ - Keep `ref()` arguments aligned with the saved Coalesce names, even when the rest of the SQL follows Snowflake casing preferences.
26
+ - For more on logical locations, use `coalesce://context/storage-mappings`.
27
+
28
+ ## Physical Object Names
29
+
30
+ - When Coalesce renders physical database and schema locations, Snowflake object names may appear in double-quoted form such as `"DB"."SCHEMA".`
31
+ - Preserve that generated style when editing SQL that already includes physical object references from Coalesce metadata or templates.
32
+
33
+ ## Common Source-Backed Functions
34
+
35
+ - String: `SPLIT`, `SUBSTR`, `UPPER`, `LOWER`, `REGEXP_SUBSTR`
36
+ - Date: `DATEADD`, `DATEDIFF`, `TO_DATE`, `DATE_TRUNC`
37
+ - Aggregate: `SUM`, `COUNT`, `AVG`, `MIN`, `MAX`, `MEDIAN`
38
+
39
+ ## Avoid
40
+
41
+ - Do not rewrite `ref()` arguments into raw warehouse object paths.
42
+ - Do not switch Snowflake identifier quoting to backticks.
43
+ - Do not introduce mixed-case quoted identifiers unless preserving exact case is required.
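The quoting rules in this resource can be approximated with a small helper: leave plain uppercase names unquoted, and double-quote only when exact case or a reserved word is involved. This is a sketch under stated assumptions, not Coalesce's rendering logic: `snowflakeIdentifier` is a hypothetical name, the reserved-word set is a tiny illustrative subset, and it assumes that preserving a user's lowercase name means quoting it.

```typescript
// Illustrative subset only; Snowflake's actual reserved-word list is much longer.
const RESERVED_WORDS = new Set(["SELECT", "FROM", "TABLE", "ORDER", "GROUP", "WHERE"]);

// Hypothetical helper: quote only when the name cannot be written as a plain identifier.
function snowflakeIdentifier(name: string): string {
  // Unquoted identifiers start with a letter or underscore and resolve as uppercase.
  const isPlainUppercase = /^[A-Z_][A-Z0-9_$]*$/.test(name);
  if (isPlainUppercase && !RESERVED_WORDS.has(name)) {
    return name;
  }
  // Double quotes preserve exact case; an embedded double quote is escaped by doubling it.
  return '"' + name.replace(/"/g, '""') + '"';
}

// Examples: default uppercase, a user-chosen lowercase name, and a reserved word.
const plainName = snowflakeIdentifier("STG_LOCATION");
const lowerName = snowflakeIdentifier("stg_location");
const reservedName = snowflakeIdentifier("ORDER");
```

Apply this only to identifiers in the SQL body; `ref()` arguments should stay exactly as saved in Coalesce.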
@@ -0,0 +1,49 @@
1
+ # Storage Locations and References
2
+
3
+ This guidance is based on the AI runtime storage-context section and the related Coalesce ref-handling code.
4
+
5
+ ## What Storage Locations Mean
6
+
7
+ - Storage locations are short logical names that map to `DATABASE.SCHEMA` pairs where tables live.
8
+ - Users may have many storage locations configured in a workspace.
9
+ - Any node can be manually configured to live somewhere other than the obvious default, so always pay attention to the existing node storage locations when writing joins or downstream refs.
10
+
11
+ ## Default Placement Matters
12
+
13
+ - New nodes normally default to the storage location specified by their node type.
14
+ - That default matters when you create multiple nodes at once or in parallel because downstream refs may need to target the new node's default location, not the source node's location.
15
+
16
+ ## Source Example
17
+
18
+ - You create a chain `A -> B -> C`
19
+ - `A` (saved as node `ORDERS`) already exists in `SOURCE_A`
20
+ - `B` is created and defaults to `USER_WORKSPACE`
21
+ - `C` is created at the same time and also defaults to `USER_WORKSPACE`
22
+
23
+ That means:
24
+
25
+ - `B` should reference `A` with `{{ ref('SOURCE_A', 'ORDERS') }}`
26
+ - `C` should reference `B` with `{{ ref('USER_WORKSPACE', 'NODE_B') }}`
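The chain above can be sketched as a lookup from each node's saved name to the location where it was actually placed. The `placements` map and the `refFor` helper are hypothetical and exist only to illustrate the rule; `NODE_C` is an assumed name for node `C`.

```typescript
// Hypothetical record of where each node actually lives (node name -> location name).
const placements = new Map<string, string>([
  ["ORDERS", "SOURCE_A"],       // node A, which already existed
  ["NODE_B", "USER_WORKSPACE"], // node B, placed at its node-type default
  ["NODE_C", "USER_WORKSPACE"], // node C (assumed name), created at the same time
]);

// Build a ref() from the upstream node's own placement, never the downstream node's.
function refFor(nodeName: string, known: Map<string, string>): string {
  const locationName = known.get(nodeName);
  if (locationName === undefined) {
    // Confirm the placement first instead of guessing a location.
    throw new Error("No known storage location for node " + nodeName);
  }
  return "{{ ref('" + locationName + "', '" + nodeName + "') }}";
}

const refUsedByB = refFor("ORDERS", placements); // B reads from A
const refUsedByC = refFor("NODE_B", placements); // C reads from B
```

Throwing on an unknown node reflects the guidance to confirm placement before referencing a node rather than assuming it shares a location with its predecessor.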
27
+
28
+ ## `ref()` Contract
29
+
30
+ - Coalesce node SQL uses logical refs in this shape:
31
+
32
+ ```jinja
33
+ FROM {{ ref('LOCATION_NAME', 'NODE_NAME') }}
34
+ ```
35
+
36
+ - The SQL processing layer extracts refs into exact `locationName` and `nodeName` pairs.
37
+ - Keep `ref()` arguments aligned with the saved Coalesce names.
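To make the contract concrete, here is a minimal sketch of pulling `locationName` and `nodeName` pairs out of node SQL with a regular expression. It is not the package's actual SQL processing layer: `extractRefs` and the `Ref` interface are hypothetical, and the pattern only handles two string-literal arguments in matching single or double quotes.

```typescript
interface Ref {
  locationName: string;
  nodeName: string;
}

// Hypothetical extractor: finds {{ ref('LOCATION', 'NODE') }} and keeps both values verbatim.
function extractRefs(sql: string): Ref[] {
  // Groups 1 and 3 capture the quote character; groups 2 and 4 capture the two arguments.
  const pattern = /\{\{\s*ref\(\s*(['"])(.*?)\1\s*,\s*(['"])(.*?)\3\s*\)\s*\}\}/g;
  const refs: Ref[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(sql)) !== null) {
    // No upper-casing or trimming: the arguments are treated as exact names.
    refs.push({ locationName: match[2] ?? "", nodeName: match[4] ?? "" });
  }
  return refs;
}

// Example using the join from the Snowflake resource.
const exampleSql =
  'FROM {{ ref("SAMPLE", "NATION") }} N ' +
  'LEFT JOIN {{ ref("STORAGE_LOCATION", "STG_FOO") }} F ON N.ID = F.NATION_ID';
const exampleRefs = extractRefs(exampleSql);
```

Because the values are compared exactly, changing the case of either argument to match warehouse conventions would produce a different pair that no longer matches the saved node.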
38
+
39
+ ## Practical Rules
40
+
41
+ - Treat `locationName` as logical Coalesce state and `database` or `schema` as the resolved physical target behind it.
42
+ - When chaining node creation, confirm where the upstream node was actually placed before referencing it.
43
+ - If storage mappings are missing, fix that first instead of guessing raw warehouse paths.
44
+
45
+ ## Avoid
46
+
47
+ - Do not guess logical location names from database or schema values.
48
+ - Do not assume new nodes land in the same location as their predecessors.
49
+ - Do not normalize `ref()` arguments to warehouse casing rules.