coalesce-transform-mcp 0.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (134)
  1. package/LICENSE +21 -0
  2. package/README.md +304 -0
  3. package/dist/cache-dir.d.ts +26 -0
  4. package/dist/cache-dir.js +106 -0
  5. package/dist/client.d.ts +25 -0
  6. package/dist/client.js +212 -0
  7. package/dist/coalesce/api/environments.d.ts +20 -0
  8. package/dist/coalesce/api/environments.js +15 -0
  9. package/dist/coalesce/api/git-accounts.d.ts +21 -0
  10. package/dist/coalesce/api/git-accounts.js +21 -0
  11. package/dist/coalesce/api/jobs.d.ts +25 -0
  12. package/dist/coalesce/api/jobs.js +21 -0
  13. package/dist/coalesce/api/nodes.d.ts +29 -0
  14. package/dist/coalesce/api/nodes.js +33 -0
  15. package/dist/coalesce/api/projects.d.ts +22 -0
  16. package/dist/coalesce/api/projects.js +25 -0
  17. package/dist/coalesce/api/runs.d.ts +19 -0
  18. package/dist/coalesce/api/runs.js +34 -0
  19. package/dist/coalesce/api/subgraphs.d.ts +20 -0
  20. package/dist/coalesce/api/subgraphs.js +17 -0
  21. package/dist/coalesce/api/users.d.ts +30 -0
  22. package/dist/coalesce/api/users.js +31 -0
  23. package/dist/coalesce/types.d.ts +298 -0
  24. package/dist/coalesce/types.js +746 -0
  25. package/dist/generated/.gitkeep +0 -0
  26. package/dist/generated/node-type-corpus.json +42656 -0
  27. package/dist/index.d.ts +2 -0
  28. package/dist/index.js +10 -0
  29. package/dist/mcp/cache.d.ts +3 -0
  30. package/dist/mcp/cache.js +137 -0
  31. package/dist/mcp/environments.d.ts +3 -0
  32. package/dist/mcp/environments.js +61 -0
  33. package/dist/mcp/git-accounts.d.ts +3 -0
  34. package/dist/mcp/git-accounts.js +70 -0
  35. package/dist/mcp/jobs.d.ts +3 -0
  36. package/dist/mcp/jobs.js +77 -0
  37. package/dist/mcp/node-type-corpus.d.ts +3 -0
  38. package/dist/mcp/node-type-corpus.js +173 -0
  39. package/dist/mcp/nodes.d.ts +3 -0
  40. package/dist/mcp/nodes.js +341 -0
  41. package/dist/mcp/pipelines.d.ts +3 -0
  42. package/dist/mcp/pipelines.js +342 -0
  43. package/dist/mcp/projects.d.ts +3 -0
  44. package/dist/mcp/projects.js +70 -0
  45. package/dist/mcp/repo-node-types.d.ts +135 -0
  46. package/dist/mcp/repo-node-types.js +387 -0
  47. package/dist/mcp/runs.d.ts +3 -0
  48. package/dist/mcp/runs.js +92 -0
  49. package/dist/mcp/subgraphs.d.ts +3 -0
  50. package/dist/mcp/subgraphs.js +60 -0
  51. package/dist/mcp/users.d.ts +3 -0
  52. package/dist/mcp/users.js +107 -0
  53. package/dist/prompts/index.d.ts +2 -0
  54. package/dist/prompts/index.js +58 -0
  55. package/dist/resources/context/aggregation-patterns.md +145 -0
  56. package/dist/resources/context/data-engineering-principles.md +183 -0
  57. package/dist/resources/context/hydrated-metadata.md +92 -0
  58. package/dist/resources/context/id-discovery.md +64 -0
  59. package/dist/resources/context/intelligent-node-configuration.md +162 -0
  60. package/dist/resources/context/node-creation-decision-tree.md +156 -0
  61. package/dist/resources/context/node-operations.md +316 -0
  62. package/dist/resources/context/node-payloads.md +114 -0
  63. package/dist/resources/context/node-type-corpus.md +166 -0
  64. package/dist/resources/context/node-type-selection-guide.md +96 -0
  65. package/dist/resources/context/overview.md +135 -0
  66. package/dist/resources/context/pipeline-workflows.md +355 -0
  67. package/dist/resources/context/run-operations.md +55 -0
  68. package/dist/resources/context/sql-bigquery.md +41 -0
  69. package/dist/resources/context/sql-databricks.md +40 -0
  70. package/dist/resources/context/sql-platform-selection.md +70 -0
  71. package/dist/resources/context/sql-snowflake.md +43 -0
  72. package/dist/resources/context/storage-mappings.md +49 -0
  73. package/dist/resources/context/tool-usage.md +98 -0
  74. package/dist/resources/index.d.ts +5 -0
  75. package/dist/resources/index.js +254 -0
  76. package/dist/schemas/node-payloads.d.ts +5019 -0
  77. package/dist/schemas/node-payloads.js +147 -0
  78. package/dist/server.d.ts +7 -0
  79. package/dist/server.js +63 -0
  80. package/dist/services/cache/snapshots.d.ts +108 -0
  81. package/dist/services/cache/snapshots.js +275 -0
  82. package/dist/services/config/context-analyzer.d.ts +14 -0
  83. package/dist/services/config/context-analyzer.js +76 -0
  84. package/dist/services/config/field-classifier.d.ts +23 -0
  85. package/dist/services/config/field-classifier.js +47 -0
  86. package/dist/services/config/intelligent.d.ts +55 -0
  87. package/dist/services/config/intelligent.js +306 -0
  88. package/dist/services/config/rules.d.ts +6 -0
  89. package/dist/services/config/rules.js +44 -0
  90. package/dist/services/config/schema-resolver.d.ts +18 -0
  91. package/dist/services/config/schema-resolver.js +80 -0
  92. package/dist/services/corpus/loader.d.ts +56 -0
  93. package/dist/services/corpus/loader.js +25 -0
  94. package/dist/services/corpus/search.d.ts +49 -0
  95. package/dist/services/corpus/search.js +69 -0
  96. package/dist/services/corpus/templates.d.ts +4 -0
  97. package/dist/services/corpus/templates.js +11 -0
  98. package/dist/services/pipelines/execution.d.ts +20 -0
  99. package/dist/services/pipelines/execution.js +290 -0
  100. package/dist/services/pipelines/node-type-intent.d.ts +96 -0
  101. package/dist/services/pipelines/node-type-intent.js +356 -0
  102. package/dist/services/pipelines/node-type-selection.d.ts +66 -0
  103. package/dist/services/pipelines/node-type-selection.js +758 -0
  104. package/dist/services/pipelines/planning.d.ts +543 -0
  105. package/dist/services/pipelines/planning.js +1839 -0
  106. package/dist/services/policies/sql-override.d.ts +7 -0
  107. package/dist/services/policies/sql-override.js +109 -0
  108. package/dist/services/repo/operations.d.ts +6 -0
  109. package/dist/services/repo/operations.js +10 -0
  110. package/dist/services/repo/parser.d.ts +70 -0
  111. package/dist/services/repo/parser.js +365 -0
  112. package/dist/services/repo/path.d.ts +2 -0
  113. package/dist/services/repo/path.js +58 -0
  114. package/dist/services/templates/nodes.d.ts +50 -0
  115. package/dist/services/templates/nodes.js +336 -0
  116. package/dist/services/workspace/analysis.d.ts +56 -0
  117. package/dist/services/workspace/analysis.js +151 -0
  118. package/dist/services/workspace/mutations.d.ts +150 -0
  119. package/dist/services/workspace/mutations.js +1718 -0
  120. package/dist/utils.d.ts +5 -0
  121. package/dist/utils.js +7 -0
  122. package/dist/workflows/get-environment-overview.d.ts +9 -0
  123. package/dist/workflows/get-environment-overview.js +23 -0
  124. package/dist/workflows/get-run-details.d.ts +10 -0
  125. package/dist/workflows/get-run-details.js +28 -0
  126. package/dist/workflows/progress.d.ts +20 -0
  127. package/dist/workflows/progress.js +54 -0
  128. package/dist/workflows/retry-and-wait.d.ts +13 -0
  129. package/dist/workflows/retry-and-wait.js +139 -0
  130. package/dist/workflows/run-and-wait.d.ts +13 -0
  131. package/dist/workflows/run-and-wait.js +141 -0
  132. package/dist/workflows/run-status.d.ts +10 -0
  133. package/dist/workflows/run-status.js +27 -0
  134. package/package.json +34 -0
@@ -0,0 +1,114 @@
# Node Payloads

Use this resource when reading or writing full workspace node bodies.

## Core Rule

Treat workspace node bodies as structured objects with important differences between top-level fields, `metadata`, `config`, and arrays.

## Common Payload Areas

### Top-Level Fields

Common top-level fields include:
- `id`
- `name`
- `description`
- `nodeType`
- `database`
- `schema`
- `locationName`
- `storageLocations`
- `config`
- `metadata`

Not every node type uses every field.

Project policy:
- Do not send `overrideSQL`
- Do not send `override.*`
- If a node definition mentions `overrideSQLToggle`, treat it as disallowed and omit it from writes

### `metadata`

`metadata` usually contains node-shape details such as:
- `columns`
- lineage/source structures
- mapping-related structures

This is where many node-specific editing mistakes happen.

### `config`

`config` is node-type-specific operational configuration.

Examples include:
- `preSQL`
- `postSQL`
- `insertStrategy`
- node-type-specific dropdown or tabular settings

Do not assume the same config keys exist across node types.

### `storageLocations` and top-level location fields

Some nodes use `storageLocations`. Some also expose location fields at the top level, such as:
- `database`
- `schema`
- `locationName`

Verify the saved node body instead of assuming one location representation is always present.

## Array Safety

Arrays are replace-on-write unless the tool says otherwise.

This is especially important for:
- `metadata.columns`
- other metadata arrays
- `storageLocations`

If you send only the new array items, you will usually replace the old array.
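The replace-on-write behavior can be sketched as follows. This is a hypothetical TypeScript illustration of the documented contract (objects deep-merge, arrays replace wholesale), not the server's actual implementation:

```typescript
// Sketch only: plain objects deep-merge key by key, arrays replace
// wholesale, and scalars overwrite. Field names below are examples.
function deepMerge(current: any, patch: any): any {
  if (Array.isArray(patch)) return patch; // arrays are replace-on-write
  if (
    patch !== null && typeof patch === "object" &&
    current !== null && typeof current === "object" && !Array.isArray(current)
  ) {
    const out: Record<string, any> = { ...current };
    for (const [key, value] of Object.entries(patch)) {
      out[key] = deepMerge(out[key], value);
    }
    return out;
  }
  return patch; // scalars overwrite
}

const saved = {
  name: "STG_ORDERS",
  metadata: { columns: [{ name: "ID" }, { name: "TOTAL" }] },
};
const patch = { metadata: { columns: [{ name: "STATUS" }] } };

const merged = deepMerge(saved, patch);
// merged.name is preserved, but metadata.columns now holds ONLY the
// patched column: ID and TOTAL are gone.
console.log(merged.metadata.columns); // [ { name: "STATUS" } ]
```

This is why a "partial" edit that includes `metadata.columns` must include the full desired array, not just the new items.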
## Tool Guidance

For tool selection and helper preference guidance, see `coalesce://context/node-creation-decision-tree`.

### Prefer `update-workspace-node`

Use it for most partial node edits because it:
- reads the current node first
- deep-merges object fields
- writes the merged full body back

### Use `set-workspace-node` carefully

Use `set-workspace-node` only when:
- you intend a full replacement
- you already have the complete final node body

### Scratch creation

For scratch creation guidance (use cases, completion levels, expected fields), see `coalesce://context/pipeline-workflows`.

## Validation Rules

After helper-based writes, inspect:
- `validation`
- `warning`

Do not assume a node is fully ready unless the helper says the requested completion was satisfied.

## Rename Safety

If a node name changes:
- verify the saved node body still looks correct
- verify related mapping/source fields still make sense
- do not assume downstream references were updated automatically

## Related Resources

- `coalesce://context/hydrated-metadata`
- `coalesce://context/storage-mappings`
- `coalesce://context/node-creation-decision-tree`
@@ -0,0 +1,166 @@
# Node Type Discovery and Corpus

## Prefer Repo-Backed Discovery First

**BEFORE creating or editing nodes:**
1. Determine which node types are already observed in the workspace
2. If a local committed repo is available, use repo-backed discovery with an explicit `repoPath` or the `COALESCE_REPO_PATH` fallback
3. Use the corpus only when the repo is unavailable or lacks the committed definition

Repo-backed workflow:
- Install the package in Coalesce
- Commit the workspace branch
- Update the local repo clone to that branch
- Use `list-repo-packages`, `list-repo-node-types`, `get-repo-node-type-definition`, or `generate-set-workspace-node-template`
- Prefer an explicit `repoPath` on repo-aware calls; when omitted, tools fall back to `COALESCE_REPO_PATH`
- There is no server-wide repo mapping in v1

## Repo-Aware Tools

### `list-repo-packages`
Inspects committed `packages/*.yml`. Use this to discover exact package aliases and to see which enabled node-type IDs have committed definitions.

### `list-repo-node-types`
Lists the exact resolvable node-type identifiers from committed `nodeTypes/`. Use this to confirm the exact direct identifier or `alias:::id` value before generating templates.

### `get-repo-node-type-definition`
Resolves one exact node type from the committed repo. Returns the parsed outer definition plus raw and parsed `metadata.nodeMetadataSpec`.

### `generate-set-workspace-node-template`
Generates a `set-workspace-node` body template from either:
- a raw definition object
- a committed repo definition resolved by `repoPath` + `nodeType`

In repo mode, this preserves the exact resolved node type, including package-backed identifiers like `alias:::id`.

## Corpus Tools

### `search-node-type-variants`
Searches for node type families (e.g., "stage", "dimension"). Returns variants with their config schema and metadata examples.

### `get-node-type-variant`
Gets an exact variant definition by key. Use this to get the authoritative structure for a node type.

### `list-workspace-node-types` *(NEW)*
Scans workspace nodes and returns the distinct observed node types. Use this to inspect current workspace usage and exact identifiers before recommending a type. The response includes `basis: "observed_nodes"` to make the contract explicit.

## Node-Type-Specific Patterns

Project policy:
- Never recommend or set SQL override fields
- Ignore or remove `overrideSQLToggle`, `overrideSQL`, and `override.*` when reading node-type definitions
- Prefer native node configuration and metadata patterns instead of SQL override

The corpus contains node-type-specific patterns and configurations. These vary by:
- Package source (Coalesce built-in vs community packages)
- Node type family (Stage, Dimension, Fact, View, etc.)
- Package version and variant

**When to consult corpus patterns:**
- Creating nodes with complex metadata (sources, joins, config)
- Applying node-type-specific logic (materialization, caching, partitioning)
- Understanding required vs optional config fields
- Adapting SQL patterns for specific node types

The corpus provides real-world examples from actual node type source code, including:
- Metadata structure (columns, sources, mappings)
- Config field schemas and valid values
- SQL patterns and Jinja syntax specific to that node type
- Storage location and reference patterns
## Column-Level Attributes (columnSelector)

Node type definitions contain config items with `"type": "columnSelector"`. These are **column-level attributes** — boolean flags set directly on individual column objects in `metadata.columns`, NOT in the node-level `config` object.

**How it works:**

1. The node type definition (in `nodeTypes/` in the local repo) has a config item like:
   ```json
   { "displayName": "Business Key", "attributeName": "isBusinessKey", "type": "columnSelector" }
   ```
2. To activate it, set the `attributeName` as a boolean on the column:
   ```json
   { "name": "CUSTOMER_ID", "dataType": "NUMBER(38,0)", "isBusinessKey": true, "transform": "..." }
   ```
3. The node type's Jinja templates reference it: `columns | selectattr('isBusinessKey')`

**Discovery workflow:**

1. Use `get-repo-node-type-definition` to read the node type definition from the local repo
2. Find config items where `type` is `"columnSelector"`
3. The `attributeName` field tells you what property to set on columns
4. Set `attributeName: true` on the appropriate columns via `update-workspace-node` or `replace-workspace-node-columns`

**Important:** Attribute names vary by node type and package. Always look them up in the actual node type definition — do not guess or hardcode attribute names.
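The discovery workflow can be sketched in TypeScript. The shapes below are assumptions based on the definition snippets in this document; read the real layout via `get-repo-node-type-definition`:

```typescript
// Hypothetical shapes for illustration only.
interface ConfigItem {
  displayName: string;
  attributeName?: string;
  type: string;
}

// Step 2-3: collect the column-level attribute names a node type declares.
function columnSelectorAttributes(configItems: ConfigItem[]): string[] {
  return configItems
    .filter((item) => item.type === "columnSelector" && item.attributeName)
    .map((item) => item.attributeName as string);
}

// Step 4: flag the chosen columns with one discovered attribute.
function markColumns(
  columns: Array<Record<string, any>>,
  columnNames: string[],
  attribute: string,
): Array<Record<string, any>> {
  return columns.map((col) =>
    columnNames.includes(col.name) ? { ...col, [attribute]: true } : col,
  );
}

const items: ConfigItem[] = [
  { displayName: "Business Key", attributeName: "isBusinessKey", type: "columnSelector" },
  { displayName: "Truncate Before", type: "toggleButton" },
];
console.log(columnSelectorAttributes(items)); // ["isBusinessKey"]

const cols = markColumns([{ name: "CUSTOMER_ID" }], ["CUSTOMER_ID"], "isBusinessKey");
console.log(cols[0].isBusinessKey); // true
```

The marked columns would then be written back via `update-workspace-node` or `replace-workspace-node-columns`.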
## Workflow

1. **Discover observed types:** `list-workspace-node-types`
2. **Check whether the recommended type is already observed:** compare it to the workspace types
3. **If unobserved:** do not claim it is unavailable; confirm installation in the UI before proceeding
4. **If a committed local repo is available:** use repo-aware tools with `repoPath` or `COALESCE_REPO_PATH`
5. **If repo resolution fails or the definition is missing:** use `search-node-type-variants` and `get-node-type-variant`
6. **Adapt the example:** replace placeholder values with user-specific data
7. **Create/update:** use the appropriate tool with the correct structure

## Node Type Availability

If a recommended node type is not observed in current workspace nodes:
1. Do not claim the type is unavailable
2. Explain that the workspace scan only reflects existing nodes, not a true installed-type registry
3. Confirm package installation in the Coalesce UI when exact availability is uncertain
4. Prefer repo-backed discovery to recover the exact identifier before proceeding

## Node Type Format

Node types can appear in two formats:

1. **Simple format:** direct node type names without a package prefix
   - Examples: `"Stage"`, `"persistentStage"`, `"View"`
   - Used for built-in node types or custom node types in the repo

2. **Package-prefixed format:** package name followed by `:::` and a node type ID
   - Examples: `"IncrementalLoading:::230"`, `"Databricks-Incremental-nodes:::278"`
   - Used for node types installed from published packages
   - The format is `"PackageName:::NodeTypeID"`
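The two identifier formats can be distinguished mechanically. This is a hypothetical parsing sketch of the convention described above, not a server API:

```typescript
// Split a node type identifier on the ":::" separator.
// "Stage" -> simple format; "IncrementalLoading:::230" -> package-prefixed.
function parseNodeType(nodeType: string): { packageName?: string; id: string } {
  const sep = nodeType.indexOf(":::");
  if (sep === -1) return { id: nodeType }; // simple format
  return { packageName: nodeType.slice(0, sep), id: nodeType.slice(sep + 3) };
}

console.log(parseNodeType("IncrementalLoading:::230")); // { packageName: "IncrementalLoading", id: "230" }
console.log(parseNodeType("Stage")); // { id: "Stage" }
```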
### Creating Nodes with Package-Prefixed Types

When creating nodes, you can use **either format**:

- Full format: `nodeType: "IncrementalLoading:::230"`
- Bare ID: `nodeType: "230"` (matches any package with that ID)

Prefer the full package-prefixed format when you know it. Using just the numeric ID (e.g., `"230"`) is safe when a matching package-prefixed type is already observed in workspace nodes.

### Discovering Node Type Format

Use `list-workspace-node-types` to see the exact format of observed node types. The response shows package-prefixed types like `"IncrementalLoading:::230"` when those identifiers are already present in current workspace nodes.

## Template Usage

### Template Generation Hierarchy

1. **For committed local repo definitions:** use `generate-set-workspace-node-template` in repo mode
   - Resolves the exact committed definition from `repoPath` or `COALESCE_REPO_PATH`
   - Preserves exact direct or package-backed node-type identifiers
   - Best choice when the local repo contains the definition

2. **For known corpus variants:** use `get-node-type-variant`
   - Returns the exact variant definition from the committed corpus snapshot
   - Best fallback when repo-backed resolution is unavailable

3. **For generating templates from corpus variants:** use `generate-set-workspace-node-template-from-variant`
   - Converts a corpus variant into an editable YAML-friendly template
   - Use this when the repo does not contain the committed definition

4. **For discovery:** search first, then get the variant
   - Use `search-node-type-variants` to find the right fallback variant
   - Then use `get-node-type-variant` or `generate-set-workspace-node-template-from-variant`
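The hierarchy above amounts to a routing decision. The tool names come from this document; the availability flags are assumptions used purely for illustration:

```typescript
// Hypothetical routing sketch for the template-generation hierarchy.
type TemplateSource =
  | { tool: "generate-set-workspace-node-template"; mode: "repo" }
  | { tool: "generate-set-workspace-node-template-from-variant" }
  | { tool: "search-node-type-variants" };

function chooseTemplateSource(opts: {
  repoHasDefinition: boolean; // committed definition resolvable from repoPath
  knownVariantKey: boolean;   // exact corpus variant key already in hand
}): TemplateSource {
  if (opts.repoHasDefinition) {
    return { tool: "generate-set-workspace-node-template", mode: "repo" };
  }
  if (opts.knownVariantKey) {
    return { tool: "generate-set-workspace-node-template-from-variant" };
  }
  // No repo definition and no known key: discover a fallback variant first.
  return { tool: "search-node-type-variants" };
}

console.log(chooseTemplateSource({ repoHasDefinition: true, knownVariantKey: false }).tool);
// "generate-set-workspace-node-template"
```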
@@ -0,0 +1,96 @@
# Node Type Selection Guide

When creating pipeline nodes, choose the node type based on the actual purpose of the node — not the SQL pattern.

## General-Purpose Node Types (use for most transforms)

### Stage / Work
- **Purpose**: General-purpose intermediate processing. Handles single-source transforms, multi-source joins, GROUP BY, UNION, and filters. The two are interchangeable for most patterns.
- **Materialization**: Table or View
- **Use for**: Column renames, type casts, WHERE filters, GROUP BY aggregations, multi-table JOINs, UNION/UNION ALL, any transformation without special requirements
- **Default choice**: When in doubt, use Stage or Work

### View
- **Purpose**: Virtual table with no physical storage. Recomputed on every query.
- **Materialization**: View only
- **Use for**: Lightweight projections, secure views, cost savings when recomputation is acceptable
- **Avoid when**: Downstream queries are performance-critical or aggregations are expensive

## Dimensional Modeling Node Types (only when explicitly building a dimensional model)

### Dimension
- **Purpose**: Descriptive business context (customers, products, locations). Requires business keys. Supports SCD Type 1/2.
- **Materialization**: Table or View
- **Use ONLY when**: Building a star/snowflake schema, the node is named `dim_` or `dimension_`, or SCD tracking is needed
- **NOT for**: Generic GROUP BY, aggregations, staging, or CTE decomposition

### Fact
- **Purpose**: Business measures (revenue, quantity, cost) at a defined grain. Requires business keys.
- **Materialization**: Table or View
- **Use ONLY when**: Building a fact table in a dimensional model, or the node is named `fct_` or `fact_`
- **NOT for**: Any GROUP BY or SUM — those are transforms, not fact tables

### Factless Fact
- **Purpose**: Records events without numeric measures (attendance, eligibility)
- **Use ONLY when**: Tracking events without measures in a dimensional model

## Change Tracking Node Types

### Persistent Stage
- **Purpose**: CDC / change tracking with business keys. Type 1/Type 2 history.
- **Materialization**: Table only
- **Use ONLY when**: The goal explicitly mentions CDC, change tracking, or history tracking
- **NOT for**: Simple staging, general transforms, CTE decomposition

## Data Vault Node Types

### Hub / Satellite / Link
- **Use ONLY when**: Explicitly building a Data Vault model
- **NOT for**: General-purpose joins or transforms

## Specialized Materialization Patterns (only when explicitly requested)

### Dynamic Tables
- **Purpose**: Snowflake-managed declarative refresh with lag-based orchestration
- **Use when**: Near-real-time / continuous refresh, streaming-like pipelines, replacing complex Streams + Tasks setups
- **NOT for**: Batch ETL, scheduled runs, CTE decomposition, cost-sensitive workloads
- **Key difference**: Snowflake manages the refresh DAG automatically. You specify a lag (e.g., 5 minutes) and Snowflake keeps data fresh within that window. This adds continuous compute cost.

### Incremental Load
- **Purpose**: Process only new/modified records via high-water-mark comparison
- **Use when**: Large tables where a full refresh is too expensive, append-only sources
- **NOT for**: Full-refresh staging, CTE decomposition, small-to-medium tables

### Materialized View
- **Purpose**: Pre-computed aggregations that auto-refresh when base data changes
- **Use when**: Expensive aggregations that need to stay current; single-source only
- **NOT for**: Multi-source joins, standard transforms

### Deferred Merge
- **Purpose**: Snowflake Streams plus scheduled merge tasks for high-frequency ingestion
- **Use when**: High-frequency data ingestion where immediate merge is too expensive
- **NOT for**: Batch ETL, standard staging

### Tasks / DAG
- **Purpose**: Snowflake scheduled or DAG-based orchestration
- **Use when**: Building scheduled task workflows
- **NOT for**: Data transformation nodes

## Decision Rules

1. **CTE decomposition**: Each CTE becomes a Stage or Work node. Never Dimension, Fact, or specialized types unless the user explicitly names it that way.
2. **GROUP BY / SUM / COUNT**: These are transforms. Use Stage/Work. Only use Fact/Dimension when building a dimensional model.
3. **Multi-source JOIN**: Use Stage or Work — both handle joins via sourceMapping.
4. **Name prefix**: `stg_` → Stage, `wrk_` → Work, `dim_` → Dimension, `fct_` → Fact, `vw_` → View. Follow the prefix.
5. **No explicit requirement**: Default to Stage for single-source, Work for multi-source.
6. **Specialized types require an explicit user request**: Dynamic Tables, Incremental Load, Materialized View, Deferred Merge, and Tasks are specialized materialization patterns. **NEVER auto-select these.** The user must explicitly ask for continuous refresh, incremental processing, or the specific pattern by name.
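Rules 4 and 5 can be expressed as a small lookup. This is an illustrative sketch of the decision rules only, not a server function, and it deliberately never returns a specialized type (rule 6):

```typescript
// Pick a default node type from the name prefix (rule 4), falling back
// to source count (rule 5). Case-insensitive prefix match.
function suggestNodeType(name: string, sourceCount: number): string {
  const prefixes: Array<[string, string]> = [
    ["stg_", "Stage"],
    ["wrk_", "Work"],
    ["dim_", "Dimension"],
    ["fct_", "Fact"],
    ["vw_", "View"],
  ];
  const lower = name.toLowerCase();
  for (const [prefix, nodeType] of prefixes) {
    if (lower.startsWith(prefix)) return nodeType;
  }
  return sourceCount > 1 ? "Work" : "Stage";
}

console.log(suggestNodeType("DIM_CUSTOMER", 1));     // "Dimension"
console.log(suggestNodeType("orders_enriched", 3));  // "Work"
console.log(suggestNodeType("orders_enriched", 1));  // "Stage"
```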
## Common Mistakes

| Mistake | Why It's Wrong | Correct Approach |
|---------|----------------|------------------|
| Using Dynamic Tables for batch ETL | Dynamic Tables add continuous compute cost and are for near-real-time refresh | Use Stage or Work with table materialization |
| Picking a node type by ID (e.g., "65") without `plan-pipeline` | The agent doesn't know what the ID maps to or whether it's appropriate | Always call `plan-pipeline` first |
| Using Dimension/Fact for GROUP BY | GROUP BY is a transform, not a dimensional model | Use Stage or Work |
| Skipping `plan-pipeline` and guessing "Stage" | May miss better options or select an inappropriate type | Always call `plan-pipeline` with `repoPath` |
| Using Incremental Load for small tables | The incremental overhead isn't worth it when a full refresh is fast | Use Stage with full refresh |
@@ -0,0 +1,135 @@
# Coalesce MCP Server Overview

## How This Server Works

This MCP server provides tools and resources for working with Coalesce Transform workspaces and environments via the Coalesce API.

### Available Resources

Consult these resources for specific guidance. Load only what the current task needs.

- **coalesce://context/pipeline-workflows** — Building pipelines, node type selection, multi-node sequences, incremental setup
- **coalesce://context/node-operations** — Editing nodes: join conditions, columns, config, renames, SQL conversion
- **coalesce://context/node-creation-decision-tree** — Routing: which tool to use for creation vs update
- **coalesce://context/data-engineering-principles** — Architecture, platforms, methodology, materialization, packages
- **coalesce://context/aggregation-patterns** — GROUP BY, datatype inference, common aggregation patterns
- **coalesce://context/node-type-corpus** — Node type discovery, corpus search, metadata patterns
- **coalesce://context/tool-usage** — Core tool rules, discovery patterns, pagination, parallelization
- **coalesce://context/sql-platform-selection** — Determine the active SQL platform
- **coalesce://context/sql-snowflake** — Snowflake SQL conventions
- **coalesce://context/sql-databricks** — Databricks SQL conventions
- **coalesce://context/sql-bigquery** — BigQuery SQL conventions
- **coalesce://context/storage-mappings** — `{{ ref() }}` syntax and storage locations
- **coalesce://context/id-discovery** — Resolving project, workspace, environment, and node IDs
- **coalesce://context/node-payloads** — Full node body editing guidance
- **coalesce://context/hydrated-metadata** — Hydrated metadata structures
- **coalesce://context/run-operations** — Start, retry, diagnose, and cancel runs
- **coalesce://context/intelligent-node-configuration** — Intelligent config completion

### Response Guidelines

- Answer questions directly without preamble
- Use fenced code blocks with language tags
- Provide detail when requested, brevity otherwise
- For multi-step pipeline creation, report progress after each node

## How Coalesce Nodes Work

**CRITICAL**: Coalesce nodes are NOT SQL scripts. They are declarative configurations with these components:

| Component | Where it lives | What it does |
|-----------|----------------|--------------|
| **Columns** | `metadata.columns[].transform` | Each column has a SQL expression (e.g., `"ORDERS"."CUSTOMER_ID"`, `SUM("ORDERS"."TOTAL")`) |
| **Join condition** | `metadata.sourceMapping[].join.joinCondition` | The FROM/JOIN/WHERE/GROUP BY clause using `{{ ref() }}` syntax |
| **Dependencies** | `metadata.sourceMapping[].dependencies` | Which upstream nodes this node reads from |
| **Config** | `config` | Node-type-specific settings (truncate, business keys, SCD, etc.) |

The node type's Jinja template combines these into final SQL at compile time. You configure **columns and joins separately** — you never write a complete SQL query.

**Example — a CLV aggregation node:**

```text
Columns:
- CUSTOMER_ID: transform = "CUSTOMER_LOYALTY"."CUSTOMER_ID"
- TOTAL_ORDERS: transform = COUNT(DISTINCT "ORDER_HEADER"."ORDER_ID")
- LIFETIME_VALUE: transform = SUM("ORDER_HEADER"."ORDER_TOTAL")

joinCondition:
FROM {{ ref('STAGING', 'CUSTOMER_LOYALTY') }} "CUSTOMER_LOYALTY"
LEFT JOIN {{ ref('STAGING', 'ORDER_HEADER') }} "ORDER_HEADER"
ON "CUSTOMER_LOYALTY"."CUSTOMER_ID" = "ORDER_HEADER"."CUSTOMER_ID"
GROUP BY "CUSTOMER_LOYALTY"."CUSTOMER_ID"
```

**Key rules:**

- Column transforms reference **predecessor table aliases** from the joinCondition
- GROUP BY goes inside the joinCondition, after the JOIN clauses
- Aggregate functions (COUNT, SUM, AVG) go in column transforms, NOT in the joinCondition
- CASE expressions go in column transforms
- CTEs are NOT supported — break them into separate upstream nodes
- Do NOT write `overrideSQL` or pass raw SQL — use the column/joinCondition model
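The CLV example above, written as the node-body fragment such an edit might carry. The field paths follow the component table in this document; treat the exact shape as an assumption to verify against a saved node body:

```typescript
// Illustrative fragment only — verify field shapes against a real saved node.
const clvFragment = {
  metadata: {
    columns: [
      { name: "CUSTOMER_ID", transform: `"CUSTOMER_LOYALTY"."CUSTOMER_ID"` },
      { name: "TOTAL_ORDERS", transform: `COUNT(DISTINCT "ORDER_HEADER"."ORDER_ID")` },
      { name: "LIFETIME_VALUE", transform: `SUM("ORDER_HEADER"."ORDER_TOTAL")` },
    ],
    sourceMapping: [
      {
        dependencies: ["CUSTOMER_LOYALTY", "ORDER_HEADER"],
        join: {
          joinCondition: [
            `FROM {{ ref('STAGING', 'CUSTOMER_LOYALTY') }} "CUSTOMER_LOYALTY"`,
            `LEFT JOIN {{ ref('STAGING', 'ORDER_HEADER') }} "ORDER_HEADER"`,
            `ON "CUSTOMER_LOYALTY"."CUSTOMER_ID" = "ORDER_HEADER"."CUSTOMER_ID"`,
            `GROUP BY "CUSTOMER_LOYALTY"."CUSTOMER_ID"`,
          ].join("\n"),
        },
      },
    ],
  },
};

console.log(clvFragment.metadata.columns.length); // 3
```

Note how the aggregates live in column transforms while the GROUP BY lives in the joinCondition, exactly as the key rules require.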
**Required workflow for creating nodes:**

1. **Always call `plan-pipeline` first** — it discovers all available node types from the repo and ranks them for your use case. Do NOT guess node types like "Stage" or "View".
2. **Use `create-workspace-node-from-predecessor`** (or `create-workspace-node-from-scratch` for nodes with no upstream). Pass `repoPath` for automatic config completion; these helpers handle validation, config completion, and column-level attributes automatically.

Config completion is automatic when `repoPath` is provided — the response includes `configCompletion` showing which node-level config and column-level attributes were applied.

**Post-creation verification (required before moving to the next node):**

1. Check `nextSteps` in the creation response — follow all required steps (especially join setup for multi-predecessor nodes)
2. Check `validation.allPredecessorsRepresented` — if false, the join is incomplete
3. For multi-predecessor nodes: set up the join condition via `convert-join-to-aggregation` (aggregation), `apply-join-condition` (row-level joins), or `update-workspace-node` (manual)
4. Verify the final node with `get-workspace-node` — confirm columns, joinCondition, and config are correct
5. Follow naming conventions: STG_ for staging, DIM_ for dimensions, FACT_ for facts, INT_ for intermediate (e.g., `STG_LOCATION`, `FACT_ORDERS`). Default to UPPERCASE for Snowflake, but **respect the user's chosen casing**
**Anti-pattern — writing SQL and passing it to the planner:**

Do NOT author SQL yourself to pass to `plan-pipeline` or `create-pipeline-from-sql`. The `sql` parameter exists solely for converting SQL that the **user** provided. When building a pipeline, use the declarative tools:

1. `create-workspace-node-from-predecessor` to create nodes
2. `update-workspace-node` to set the joinCondition
3. `replace-workspace-node-columns` or `convert-join-to-aggregation` for column transforms

### Writing SQL for Nodes

Determine the platform first, then load exactly one dialect resource:

1. **Detect**: `get-project` for the warehouse type, or check existing node SQL (see `coalesce://context/sql-platform-selection`)
2. **Load one**: Snowflake -> `coalesce://context/sql-snowflake`, Databricks -> `coalesce://context/sql-databricks`, BigQuery -> `coalesce://context/sql-bigquery`
3. **Follow that dialect's rules** for the entire edit

## Operational Scope

### In Scope

- Creating/updating workspace nodes
- Reasoning from project metadata and workspace patterns
- Writing SQL transforms and joins
- Running jobs and monitoring runs
- Managing environments and projects

### Out of Scope

- **Previewing compiled SQL** — compilation happens at deploy/run time
- **Changing a node's type** — set at creation time and cannot be changed via the API; create a new node of the desired type instead
- **Data preview / row counts** — this server manages node definitions, not live warehouse data
- **Cross-workspace replication** — no clone tool; recreate each node bottom-up in the target workspace
- Creating/modifying node type templates, source nodes, or macros

### Description Generation

Only generate descriptions when explicitly requested. Focus on disambiguation — what makes this column the one the user is looking for? If you lack context, ask.

Apply all descriptions in one call using `replace-workspace-node-columns`.

## Documentation

- [General Coalesce Docs](https://docs.coalesce.io/docs)
- [API Documentation](https://docs.coalesce.io/docs/api)
- [Snowflake Base Node Types](https://docs.coalesce.io/docs/marketplace/package/coalesce_snowflake_base-node-types)
- [Incremental Loading](https://docs.coalesce.io/docs/marketplace/package/coalesce_snowflake_incremental-loading)
- [Dynamic Tables](https://docs.coalesce.io/docs/marketplace/package/coalesce_snowflake_dynamic-tables)