coalesce-transform-mcp 0.1.0
- package/LICENSE +21 -0
- package/README.md +304 -0
- package/dist/cache-dir.d.ts +26 -0
- package/dist/cache-dir.js +106 -0
- package/dist/client.d.ts +25 -0
- package/dist/client.js +212 -0
- package/dist/coalesce/api/environments.d.ts +20 -0
- package/dist/coalesce/api/environments.js +15 -0
- package/dist/coalesce/api/git-accounts.d.ts +21 -0
- package/dist/coalesce/api/git-accounts.js +21 -0
- package/dist/coalesce/api/jobs.d.ts +25 -0
- package/dist/coalesce/api/jobs.js +21 -0
- package/dist/coalesce/api/nodes.d.ts +29 -0
- package/dist/coalesce/api/nodes.js +33 -0
- package/dist/coalesce/api/projects.d.ts +22 -0
- package/dist/coalesce/api/projects.js +25 -0
- package/dist/coalesce/api/runs.d.ts +19 -0
- package/dist/coalesce/api/runs.js +34 -0
- package/dist/coalesce/api/subgraphs.d.ts +20 -0
- package/dist/coalesce/api/subgraphs.js +17 -0
- package/dist/coalesce/api/users.d.ts +30 -0
- package/dist/coalesce/api/users.js +31 -0
- package/dist/coalesce/types.d.ts +298 -0
- package/dist/coalesce/types.js +746 -0
- package/dist/generated/.gitkeep +0 -0
- package/dist/generated/node-type-corpus.json +42656 -0
- package/dist/index.d.ts +2 -0
- package/dist/index.js +10 -0
- package/dist/mcp/cache.d.ts +3 -0
- package/dist/mcp/cache.js +137 -0
- package/dist/mcp/environments.d.ts +3 -0
- package/dist/mcp/environments.js +61 -0
- package/dist/mcp/git-accounts.d.ts +3 -0
- package/dist/mcp/git-accounts.js +70 -0
- package/dist/mcp/jobs.d.ts +3 -0
- package/dist/mcp/jobs.js +77 -0
- package/dist/mcp/node-type-corpus.d.ts +3 -0
- package/dist/mcp/node-type-corpus.js +173 -0
- package/dist/mcp/nodes.d.ts +3 -0
- package/dist/mcp/nodes.js +341 -0
- package/dist/mcp/pipelines.d.ts +3 -0
- package/dist/mcp/pipelines.js +342 -0
- package/dist/mcp/projects.d.ts +3 -0
- package/dist/mcp/projects.js +70 -0
- package/dist/mcp/repo-node-types.d.ts +135 -0
- package/dist/mcp/repo-node-types.js +387 -0
- package/dist/mcp/runs.d.ts +3 -0
- package/dist/mcp/runs.js +92 -0
- package/dist/mcp/subgraphs.d.ts +3 -0
- package/dist/mcp/subgraphs.js +60 -0
- package/dist/mcp/users.d.ts +3 -0
- package/dist/mcp/users.js +107 -0
- package/dist/prompts/index.d.ts +2 -0
- package/dist/prompts/index.js +58 -0
- package/dist/resources/context/aggregation-patterns.md +145 -0
- package/dist/resources/context/data-engineering-principles.md +183 -0
- package/dist/resources/context/hydrated-metadata.md +92 -0
- package/dist/resources/context/id-discovery.md +64 -0
- package/dist/resources/context/intelligent-node-configuration.md +162 -0
- package/dist/resources/context/node-creation-decision-tree.md +156 -0
- package/dist/resources/context/node-operations.md +316 -0
- package/dist/resources/context/node-payloads.md +114 -0
- package/dist/resources/context/node-type-corpus.md +166 -0
- package/dist/resources/context/node-type-selection-guide.md +96 -0
- package/dist/resources/context/overview.md +135 -0
- package/dist/resources/context/pipeline-workflows.md +355 -0
- package/dist/resources/context/run-operations.md +55 -0
- package/dist/resources/context/sql-bigquery.md +41 -0
- package/dist/resources/context/sql-databricks.md +40 -0
- package/dist/resources/context/sql-platform-selection.md +70 -0
- package/dist/resources/context/sql-snowflake.md +43 -0
- package/dist/resources/context/storage-mappings.md +49 -0
- package/dist/resources/context/tool-usage.md +98 -0
- package/dist/resources/index.d.ts +5 -0
- package/dist/resources/index.js +254 -0
- package/dist/schemas/node-payloads.d.ts +5019 -0
- package/dist/schemas/node-payloads.js +147 -0
- package/dist/server.d.ts +7 -0
- package/dist/server.js +63 -0
- package/dist/services/cache/snapshots.d.ts +108 -0
- package/dist/services/cache/snapshots.js +275 -0
- package/dist/services/config/context-analyzer.d.ts +14 -0
- package/dist/services/config/context-analyzer.js +76 -0
- package/dist/services/config/field-classifier.d.ts +23 -0
- package/dist/services/config/field-classifier.js +47 -0
- package/dist/services/config/intelligent.d.ts +55 -0
- package/dist/services/config/intelligent.js +306 -0
- package/dist/services/config/rules.d.ts +6 -0
- package/dist/services/config/rules.js +44 -0
- package/dist/services/config/schema-resolver.d.ts +18 -0
- package/dist/services/config/schema-resolver.js +80 -0
- package/dist/services/corpus/loader.d.ts +56 -0
- package/dist/services/corpus/loader.js +25 -0
- package/dist/services/corpus/search.d.ts +49 -0
- package/dist/services/corpus/search.js +69 -0
- package/dist/services/corpus/templates.d.ts +4 -0
- package/dist/services/corpus/templates.js +11 -0
- package/dist/services/pipelines/execution.d.ts +20 -0
- package/dist/services/pipelines/execution.js +290 -0
- package/dist/services/pipelines/node-type-intent.d.ts +96 -0
- package/dist/services/pipelines/node-type-intent.js +356 -0
- package/dist/services/pipelines/node-type-selection.d.ts +66 -0
- package/dist/services/pipelines/node-type-selection.js +758 -0
- package/dist/services/pipelines/planning.d.ts +543 -0
- package/dist/services/pipelines/planning.js +1839 -0
- package/dist/services/policies/sql-override.d.ts +7 -0
- package/dist/services/policies/sql-override.js +109 -0
- package/dist/services/repo/operations.d.ts +6 -0
- package/dist/services/repo/operations.js +10 -0
- package/dist/services/repo/parser.d.ts +70 -0
- package/dist/services/repo/parser.js +365 -0
- package/dist/services/repo/path.d.ts +2 -0
- package/dist/services/repo/path.js +58 -0
- package/dist/services/templates/nodes.d.ts +50 -0
- package/dist/services/templates/nodes.js +336 -0
- package/dist/services/workspace/analysis.d.ts +56 -0
- package/dist/services/workspace/analysis.js +151 -0
- package/dist/services/workspace/mutations.d.ts +150 -0
- package/dist/services/workspace/mutations.js +1718 -0
- package/dist/utils.d.ts +5 -0
- package/dist/utils.js +7 -0
- package/dist/workflows/get-environment-overview.d.ts +9 -0
- package/dist/workflows/get-environment-overview.js +23 -0
- package/dist/workflows/get-run-details.d.ts +10 -0
- package/dist/workflows/get-run-details.js +28 -0
- package/dist/workflows/progress.d.ts +20 -0
- package/dist/workflows/progress.js +54 -0
- package/dist/workflows/retry-and-wait.d.ts +13 -0
- package/dist/workflows/retry-and-wait.js +139 -0
- package/dist/workflows/run-and-wait.d.ts +13 -0
- package/dist/workflows/run-and-wait.js +141 -0
- package/dist/workflows/run-status.d.ts +10 -0
- package/dist/workflows/run-status.js +27 -0
- package/package.json +34 -0
|
@@ -0,0 +1,114 @@
|
|
|
1
|
+
# Node Payloads
|
|
2
|
+
|
|
3
|
+
Use this resource when reading or writing full workspace node bodies.
|
|
4
|
+
|
|
5
|
+
## Core Rule
|
|
6
|
+
|
|
7
|
+
Treat workspace node bodies as structured objects with important differences between top-level fields, `metadata`, `config`, and arrays.
|
|
8
|
+
|
|
9
|
+
## Common Payload Areas
|
|
10
|
+
|
|
11
|
+
### Top-Level Fields
|
|
12
|
+
|
|
13
|
+
Common top-level fields include:
|
|
14
|
+
- `id`
|
|
15
|
+
- `name`
|
|
16
|
+
- `description`
|
|
17
|
+
- `nodeType`
|
|
18
|
+
- `database`
|
|
19
|
+
- `schema`
|
|
20
|
+
- `locationName`
|
|
21
|
+
- `storageLocations`
|
|
22
|
+
- `config`
|
|
23
|
+
- `metadata`
|
|
24
|
+
|
|
25
|
+
Not every node type uses every field.
|
|
26
|
+
|
|
27
|
+
Project policy:
|
|
28
|
+
- Do not send `overrideSQL`
|
|
29
|
+
- Do not send `override.*`
|
|
30
|
+
- If a node definition mentions `overrideSQLToggle`, treat it as disallowed and omit it from writes
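
A minimal sketch of applying this policy before a write. The three key names come from the policy above. The helper name `stripSqlOverrides` and the matching rules (exact key names, a nested `override` object, and dotted `override.` keys) are assumptions, not the server's actual implementation.

```typescript
// Hypothetical helper: removes SQL override fields anywhere in a payload.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

const BLOCKED_KEYS = new Set(["overrideSQL", "overrideSQLToggle", "override"]);

// Assumption: `override.*` covers both a nested `override` object and dotted keys.
function isBlocked(key: string): boolean {
  return BLOCKED_KEYS.has(key) || key.startsWith("override.");
}

function stripSqlOverrides(value: Json): Json {
  if (Array.isArray(value)) {
    return value.map((item) => stripSqlOverrides(item));
  }
  if (value !== null && typeof value === "object") {
    const cleaned: { [key: string]: Json } = {};
    for (const key of Object.keys(value)) {
      if (!isBlocked(key)) {
        cleaned[key] = stripSqlOverrides(value[key]);
      }
    }
    return cleaned;
  }
  return value; // scalars pass through unchanged
}
```

Running a payload through a filter like this before every write keeps the policy in one place instead of relying on each caller to remember it.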
|
|
31
|
+
|
|
32
|
+
### `metadata`
|
|
33
|
+
|
|
34
|
+
`metadata` usually contains node-shape details such as:
|
|
35
|
+
- `columns`
|
|
36
|
+
- lineage/source structures
|
|
37
|
+
- mapping-related structures
|
|
38
|
+
|
|
39
|
+
This is where many node-specific editing mistakes happen.
|
|
40
|
+
|
|
41
|
+
### `config`
|
|
42
|
+
|
|
43
|
+
`config` is node-type-specific operational configuration.
|
|
44
|
+
|
|
45
|
+
Examples include:
|
|
46
|
+
- `preSQL`
|
|
47
|
+
- `postSQL`
|
|
48
|
+
- `insertStrategy`
|
|
49
|
+
- node-type-specific dropdown or tabular settings
|
|
50
|
+
|
|
51
|
+
Do not assume the same config keys exist across node types.
|
|
52
|
+
|
|
53
|
+
### `storageLocations` and top-level location fields
|
|
54
|
+
|
|
55
|
+
Some nodes use `storageLocations`.
|
|
56
|
+
Some also expose location fields at the top level such as:
|
|
57
|
+
- `database`
|
|
58
|
+
- `schema`
|
|
59
|
+
- `locationName`
|
|
60
|
+
|
|
61
|
+
Verify the saved node body instead of assuming one location representation is always present.
|
|
62
|
+
|
|
63
|
+
## Array Safety
|
|
64
|
+
|
|
65
|
+
Arrays are replace-on-write unless the tool says otherwise.
|
|
66
|
+
|
|
67
|
+
This is especially important for:
|
|
68
|
+
- `metadata.columns`
|
|
69
|
+
- other metadata arrays
|
|
70
|
+
- `storageLocations`
|
|
71
|
+
|
|
72
|
+
If you send only the new array items, you will usually replace the old array.
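
One way to avoid losing items is to build the complete array locally and send that. The sketch below assumes columns can be matched by `name`; the `Column` shape and the `mergeColumns` helper are hypothetical, and real column objects carry more fields.

```typescript
// Hypothetical column shape: only `name` is assumed here.
interface Column {
  name: string;
  [key: string]: unknown;
}

// Build the full array to send: existing columns are kept (and updated when a
// change has the same name), and genuinely new columns are appended.
function mergeColumns(existing: Column[], changes: Column[]): Column[] {
  const byName = new Map<string, Column>();
  for (const column of existing) {
    byName.set(column.name, column);
  }
  for (const column of changes) {
    const previous = byName.get(column.name);
    byName.set(column.name, previous ? { ...previous, ...column } : column);
  }
  // Map keeps insertion order, so existing columns stay in their positions.
  return Array.from(byName.values());
}
```

The result is what goes into `metadata.columns` on the write, so the replace-on-write behavior produces the intended final array.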
|
|
73
|
+
|
|
74
|
+
## Tool Guidance
|
|
75
|
+
|
|
76
|
+
For tool selection and helper preference guidance, see `coalesce://context/node-creation-decision-tree`.
|
|
77
|
+
|
|
78
|
+
### Prefer `update-workspace-node`
|
|
79
|
+
|
|
80
|
+
Use it for most partial node edits because it:
|
|
81
|
+
- reads the current node first
|
|
82
|
+
- deep-merges object fields
|
|
83
|
+
- writes the merged full body back
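
The merge step can be pictured roughly as follows. This is an illustrative sketch, not the tool's code: `deepMerge` and `isPlainObject` are hypothetical names, and replacing arrays wholesale is an assumption based on the Array Safety rule above.

```typescript
type Plain = { [key: string]: unknown };

function isPlainObject(value: unknown): value is Plain {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Objects merge key by key; arrays and scalars in the patch replace the
// current value outright.
function deepMerge(current: Plain, patch: Plain): Plain {
  const result: Plain = { ...current };
  for (const key of Object.keys(patch)) {
    const before = result[key];
    const after = patch[key];
    result[key] =
      isPlainObject(before) && isPlainObject(after)
        ? deepMerge(before, after)
        : after;
  }
  return result;
}
```

Under this model a patch of `{ config: { postSQL: "..." } }` leaves sibling `config` keys intact, while a patch containing `metadata.columns` swaps the entire column list.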
|
|
84
|
+
|
|
85
|
+
### Use `set-workspace-node` carefully
|
|
86
|
+
|
|
87
|
+
Use `set-workspace-node` only when:
|
|
88
|
+
- you intend a full replacement
|
|
89
|
+
- you already have the complete final node body
|
|
90
|
+
|
|
91
|
+
### Scratch creation
|
|
92
|
+
|
|
93
|
+
For scratch creation guidance (use cases, completion levels, expected fields), see `coalesce://context/pipeline-workflows`.
|
|
94
|
+
|
|
95
|
+
## Validation Rules
|
|
96
|
+
|
|
97
|
+
After helper-based writes, inspect:
|
|
98
|
+
- `validation`
|
|
99
|
+
- `warning`
|
|
100
|
+
|
|
101
|
+
Do not assume a node is fully ready unless the helper says the requested completion was satisfied.
|
|
102
|
+
|
|
103
|
+
## Rename Safety
|
|
104
|
+
|
|
105
|
+
If a node name changes:
|
|
106
|
+
- verify the saved node body still looks correct
|
|
107
|
+
- verify related mapping/source fields still make sense
|
|
108
|
+
- do not assume downstream references were updated automatically
|
|
109
|
+
|
|
110
|
+
## Related Resources
|
|
111
|
+
|
|
112
|
+
- `coalesce://context/hydrated-metadata`
|
|
113
|
+
- `coalesce://context/storage-mappings`
|
|
114
|
+
- `coalesce://context/node-creation-decision-tree`
|
|
@@ -0,0 +1,166 @@
|
|
|
1
|
+
# Node Type Discovery and Corpus
|
|
2
|
+
|
|
3
|
+
## Prefer Repo-Backed Discovery First
|
|
4
|
+
|
|
5
|
+
**BEFORE creating or editing nodes:**
|
|
6
|
+
1. Determine which node types are already observed in the workspace
|
|
7
|
+
2. If a local committed repo is available, use repo-backed discovery with explicit `repoPath` or the `COALESCE_REPO_PATH` fallback
|
|
8
|
+
3. Use the corpus only when the repo is unavailable or lacks the committed definition
|
|
9
|
+
|
|
10
|
+
Repo-backed workflow:
|
|
11
|
+
- Install the package in Coalesce
|
|
12
|
+
- Commit the workspace branch
|
|
13
|
+
- Update the local repo clone to that branch
|
|
14
|
+
- Use `list-repo-packages`, `list-repo-node-types`, `get-repo-node-type-definition`, or `generate-set-workspace-node-template`
|
|
15
|
+
- Prefer explicit `repoPath` on repo-aware calls; when omitted, tools fall back to `COALESCE_REPO_PATH`
|
|
16
|
+
- There is no server-wide repo mapping in v1
|
|
17
|
+
|
|
18
|
+
## Repo-Aware Tools
|
|
19
|
+
|
|
20
|
+
### `list-repo-packages`
|
|
21
|
+
Inspect committed `packages/*.yml`
|
|
22
|
+
Use this to discover exact package aliases and see which enabled node-type IDs have committed definitions
|
|
23
|
+
|
|
24
|
+
### `list-repo-node-types`
|
|
25
|
+
List exact resolvable node-type identifiers from committed `nodeTypes/`
|
|
26
|
+
Use this to confirm the exact direct identifier or `alias:::id` value before generating templates
|
|
27
|
+
|
|
28
|
+
### `get-repo-node-type-definition`
|
|
29
|
+
Resolve one exact node type from the committed repo
|
|
30
|
+
Returns the parsed outer definition plus raw and parsed `metadata.nodeMetadataSpec`
|
|
31
|
+
|
|
32
|
+
### `generate-set-workspace-node-template`
|
|
33
|
+
Generate a `set-workspace-node` body template from either:
|
|
34
|
+
- a raw definition object
|
|
35
|
+
- a committed repo definition resolved by `repoPath` + `nodeType`
|
|
36
|
+
|
|
37
|
+
In repo mode, this preserves the exact resolved node type, including package-backed identifiers like `alias:::id`
|
|
38
|
+
|
|
39
|
+
## Corpus Tools
|
|
40
|
+
|
|
41
|
+
### `search-node-type-variants`
|
|
42
|
+
Search for node type families (e.g., "stage", "dimension")
|
|
43
|
+
Returns variants with their config schema and metadata examples
|
|
44
|
+
|
|
45
|
+
### `get-node-type-variant`
|
|
46
|
+
Get exact variant definition by key
|
|
47
|
+
Use this to get the authoritative structure for a node type
|
|
48
|
+
|
|
49
|
+
### `list-workspace-node-types` *(NEW)*
|
|
50
|
+
Scan workspace nodes and return distinct observed node types
|
|
51
|
+
Use this to inspect current workspace usage and exact identifiers before recommending
|
|
52
|
+
The response includes `basis: "observed_nodes"` to make the contract explicit
|
|
53
|
+
|
|
54
|
+
## Node-Type-Specific Patterns
|
|
55
|
+
|
|
56
|
+
Project policy:
|
|
57
|
+
- Never recommend or set SQL override fields
|
|
58
|
+
- Ignore or remove `overrideSQLToggle`, `overrideSQL`, and `override.*` when reading node-type definitions
|
|
59
|
+
- Prefer native node configuration and metadata patterns instead of SQL override
|
|
60
|
+
|
|
61
|
+
The corpus contains node-type-specific patterns and configurations. These vary by:
|
|
62
|
+
- Package source (Coalesce built-in vs community packages)
|
|
63
|
+
- Node type family (Stage, Dimension, Fact, View, etc.)
|
|
64
|
+
- Package version and variant
|
|
65
|
+
|
|
66
|
+
**When to consult corpus patterns:**
|
|
67
|
+
- Creating nodes with complex metadata (sources, joins, config)
|
|
68
|
+
- Applying node-type-specific logic (materialization, caching, partitioning)
|
|
69
|
+
- Understanding required vs optional config fields
|
|
70
|
+
- Adapting SQL patterns for specific node types
|
|
71
|
+
|
|
72
|
+
The corpus provides real-world examples from actual node type source code, including:
|
|
73
|
+
- Metadata structure (columns, sources, mappings)
|
|
74
|
+
- Config field schemas and valid values
|
|
75
|
+
- SQL patterns and Jinja syntax specific to that node type
|
|
76
|
+
- Storage location and reference patterns
|
|
77
|
+
|
|
78
|
+
## Column-Level Attributes (columnSelector)
|
|
79
|
+
|
|
80
|
+
Node type definitions contain config items with `"type": "columnSelector"`. These are **column-level attributes** — boolean flags set directly on individual column objects in `metadata.columns`, NOT in the node-level `config` object.
|
|
81
|
+
|
|
82
|
+
**How it works:**
|
|
83
|
+
|
|
84
|
+
1. The node type definition (in `nodeTypes/` in the local repo) has a config item like:
|
|
85
|
+
```json
|
|
86
|
+
{ "displayName": "Business Key", "attributeName": "isBusinessKey", "type": "columnSelector" }
|
|
87
|
+
```
|
|
88
|
+
2. To activate it, set the `attributeName` as a boolean on the column:
|
|
89
|
+
```json
|
|
90
|
+
{ "name": "CUSTOMER_ID", "dataType": "NUMBER(38,0)", "isBusinessKey": true, "transform": "..." }
|
|
91
|
+
```
|
|
92
|
+
3. The node type's Jinja templates reference it: `columns | selectattr('isBusinessKey')`
|
|
93
|
+
|
|
94
|
+
**Discovery workflow:**
|
|
95
|
+
|
|
96
|
+
1. Use `get-repo-node-type-definition` to read the node type definition from the local repo
|
|
97
|
+
2. Find config items where `type` is `"columnSelector"`
|
|
98
|
+
3. The `attributeName` field tells you what property to set on columns
|
|
99
|
+
4. Set `attributeName: true` on the appropriate columns via `update-workspace-node` or `replace-workspace-node-columns`
|
|
100
|
+
|
|
101
|
+
**Important:** Attribute names vary by node type and package. Always look them up in the actual node type definition — do not guess or hardcode attribute names.
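
The discovery workflow above can be sketched as two small helpers. The config-item fields mirror the JSON example in this section. Treating the items as a flat list, and the helper names `columnSelectorAttributes` and `flagColumns`, are assumptions for illustration.

```typescript
// Assumed shapes: a flat list of config items and columns keyed by `name`.
interface ConfigItem {
  displayName?: string;
  attributeName?: string;
  type: string;
}

interface Column {
  name: string;
  [key: string]: unknown;
}

// Steps 2-3: collect the attribute names declared by columnSelector items.
function columnSelectorAttributes(items: ConfigItem[]): string[] {
  const names: string[] = [];
  for (const item of items) {
    if (item.type === "columnSelector" && item.attributeName) {
      names.push(item.attributeName);
    }
  }
  return names;
}

// Step 4: return a full column array with the attribute set to true on the
// chosen columns; all other columns are passed through untouched.
function flagColumns(
  columns: Column[],
  attributeName: string,
  columnNames: string[],
): Column[] {
  return columns.map((column) =>
    columnNames.indexOf(column.name) >= 0
      ? { ...column, [attributeName]: true }
      : column,
  );
}
```

Because `flagColumns` returns every column, its output is suitable for a write that replaces the whole `metadata.columns` array.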
|
|
102
|
+
|
|
103
|
+
## Workflow
|
|
104
|
+
|
|
105
|
+
1. **Discover observed types:** `list-workspace-node-types`
|
|
106
|
+
2. **Check if recommended type is already observed:** Compare to workspace types
|
|
107
|
+
3. **If unobserved:** Do not claim it is unavailable; confirm installation in the UI before proceeding
|
|
108
|
+
4. **If a committed local repo is available:** Use repo-aware tools with `repoPath` or `COALESCE_REPO_PATH`
|
|
109
|
+
5. **If repo resolution fails or the definition is missing:** Use `search-node-type-variants` and `get-node-type-variant`
|
|
110
|
+
6. **Adapt example:** Replace placeholder values with user-specific data
|
|
111
|
+
7. **Create/update:** Use appropriate tool with correct structure
|
|
112
|
+
|
|
113
|
+
## Node Type Availability
|
|
114
|
+
|
|
115
|
+
If a recommended node type is not observed in current workspace nodes:
|
|
116
|
+
1. Do not claim the type is unavailable
|
|
117
|
+
2. Explain that the workspace scan only reflects existing nodes, not a true installed-type registry
|
|
118
|
+
3. Confirm package installation in the Coalesce UI when the exact availability is uncertain
|
|
119
|
+
4. Prefer repo-backed discovery to recover the exact identifier before proceeding
|
|
120
|
+
|
|
121
|
+
## Node Type Format
|
|
122
|
+
|
|
123
|
+
Node types can appear in two formats:
|
|
124
|
+
|
|
125
|
+
1. **Simple format:** Direct node type names without package prefix
|
|
126
|
+
- Examples: `"Stage"`, `"persistentStage"`, `"View"`
|
|
127
|
+
- Used for built-in node types or custom node types in the repo
|
|
128
|
+
|
|
129
|
+
2. **Package-prefixed format:** Package name followed by `:::` and node type ID
|
|
130
|
+
- Examples: `"IncrementalLoading:::230"`, `"Databricks-Incremental-nodes:::278"`
|
|
131
|
+
- Used for package-installed node types from published packages
|
|
132
|
+
- The format is: `"PackageName:::NodeTypeID"`
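
A minimal sketch of telling the two formats apart. The `:::` separator comes from the text above; the `parseNodeType` helper and the `ParsedNodeType` shape are hypothetical.

```typescript
interface ParsedNodeType {
  packageName: string | null; // null for the simple format
  id: string; // node type name or node type ID
}

const SEPARATOR = ":::";

function parseNodeType(nodeType: string): ParsedNodeType {
  const index = nodeType.indexOf(SEPARATOR);
  if (index === -1) {
    // Simple format: no package prefix present.
    return { packageName: null, id: nodeType };
  }
  return {
    packageName: nodeType.slice(0, index),
    id: nodeType.slice(index + SEPARATOR.length),
  };
}
```

For example, `parseNodeType("IncrementalLoading:::230")` yields the package name `IncrementalLoading` and the ID `230`, while `parseNodeType("Stage")` has no package name.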
|
|
133
|
+
|
|
134
|
+
### Creating Nodes with Package-Prefixed Types
|
|
135
|
+
|
|
136
|
+
When creating nodes, you can use **either format**:
|
|
137
|
+
|
|
138
|
+
- Full format: `nodeType: "IncrementalLoading:::230"`
|
|
139
|
+
- Bare ID: `nodeType: "230"` (will match any package with that ID)
|
|
140
|
+
|
|
141
|
+
Prefer the full package-prefixed format when you know it. Use the bare numeric ID (e.g., `"230"`) only when a matching package-prefixed type is already observed in workspace nodes, because a bare ID matches any package with that ID.
|
|
142
|
+
|
|
143
|
+
### Discovering Node Type Format
|
|
144
|
+
|
|
145
|
+
Use `list-workspace-node-types` to see the exact format of observed node types. The response will show package-prefixed types like `"IncrementalLoading:::230"` when those identifiers are already present in current workspace nodes.
|
|
146
|
+
|
|
147
|
+
## Template Usage
|
|
148
|
+
|
|
149
|
+
### Template Generation Hierarchy
|
|
150
|
+
|
|
151
|
+
1. **For committed local repo definitions:** Use `generate-set-workspace-node-template` in repo mode
|
|
152
|
+
- Resolves the exact committed definition from `repoPath` or `COALESCE_REPO_PATH`
|
|
153
|
+
- Preserves exact direct or package-backed node-type identifiers
|
|
154
|
+
- Best choice when the local repo contains the definition
|
|
155
|
+
|
|
156
|
+
2. **For known corpus variants:** Use `get-node-type-variant`
|
|
157
|
+
- Returns exact variant definition from the committed corpus snapshot
|
|
158
|
+
- Best fallback when repo-backed resolution is unavailable
|
|
159
|
+
|
|
160
|
+
3. **For generating templates from corpus variants:** Use `generate-set-workspace-node-template-from-variant`
|
|
161
|
+
- Converts a corpus variant into an editable YAML-friendly template
|
|
162
|
+
- Use this when the repo does not contain the committed definition
|
|
163
|
+
|
|
164
|
+
4. **For discovery:** Search first, then get variant
|
|
165
|
+
- Use `search-node-type-variants` to find the right fallback variant
|
|
166
|
+
- Then use `get-node-type-variant` or `generate-set-workspace-node-template-from-variant`
|
|
@@ -0,0 +1,96 @@
|
|
|
1
|
+
# Node Type Selection Guide
|
|
2
|
+
|
|
3
|
+
When creating pipeline nodes, choose the node type based on the actual purpose of the node — not the SQL pattern.
|
|
4
|
+
|
|
5
|
+
## General-Purpose Node Types (use for most transforms)
|
|
6
|
+
|
|
7
|
+
### Stage / Work
|
|
8
|
+
- **Purpose**: General-purpose intermediate processing. Handles single-source, multi-source joins, GROUP BY, UNION, filters, transforms. These are interchangeable for most patterns.
|
|
9
|
+
- **Materialization**: Table or View
|
|
10
|
+
- **Use for**: Column renames, type casts, WHERE filters, GROUP BY aggregations, multi-table JOINs, UNION/UNION ALL, any transformation without special requirements
|
|
11
|
+
- **Default choice**: When in doubt, use Stage or Work
|
|
12
|
+
|
|
13
|
+
### View
|
|
14
|
+
- **Purpose**: Virtual table with no physical storage. Recalculates on every query.
|
|
15
|
+
- **Materialization**: View only
|
|
16
|
+
- **Use for**: Lightweight projections, secure views, cost savings when recomputation is OK
|
|
17
|
+
- **Avoid when**: Downstream queries are performance-critical or aggregations are expensive
|
|
18
|
+
|
|
19
|
+
## Dimensional Modeling Node Types (only when explicitly building a dimensional model)
|
|
20
|
+
|
|
21
|
+
### Dimension
|
|
22
|
+
- **Purpose**: Descriptive business context (customers, products, locations). Requires business keys. Supports SCD Type 1/2.
|
|
23
|
+
- **Materialization**: Table or View
|
|
24
|
+
- **Use ONLY when**: Building a star/snowflake schema, node name starts with `dim_` or `dimension_`, SCD tracking is needed
|
|
25
|
+
- **NOT for**: Generic GROUP BY, aggregations, staging, or CTE decomposition
|
|
26
|
+
|
|
27
|
+
### Fact
|
|
28
|
+
- **Purpose**: Business measures (revenue, quantity, cost) at a defined grain. Requires business keys.
|
|
29
|
+
- **Materialization**: Table or View
|
|
30
|
+
- **Use ONLY when**: Building a fact table in a dimensional model, node name starts with `fct_` or `fact_`
|
|
31
|
+
- **NOT for**: Any GROUP BY or SUM — those are transforms, not fact tables
|
|
32
|
+
|
|
33
|
+
### Factless Fact
|
|
34
|
+
- **Purpose**: Record events without numeric measures (attendance, eligibility)
|
|
35
|
+
- **Use ONLY when**: Event tracking without measures in a dimensional model
|
|
36
|
+
|
|
37
|
+
## Change Tracking Node Types
|
|
38
|
+
|
|
39
|
+
### Persistent Stage
|
|
40
|
+
- **Purpose**: CDC / change tracking with business keys. Type 1/Type 2 history.
|
|
41
|
+
- **Materialization**: Table only
|
|
42
|
+
- **Use ONLY when**: Goal explicitly mentions CDC, change tracking, or history tracking
|
|
43
|
+
- **NOT for**: Simple staging, general transforms, CTE decomposition
|
|
44
|
+
|
|
45
|
+
## Data Vault Node Types
|
|
46
|
+
|
|
47
|
+
### Hub / Satellite / Link
|
|
48
|
+
- **Use ONLY when**: Explicitly building a Data Vault model
|
|
49
|
+
- **NOT for**: General-purpose joins or transforms
|
|
50
|
+
|
|
51
|
+
## Specialized Materialization Patterns (only when explicitly requested)
|
|
52
|
+
|
|
53
|
+
### Dynamic Tables
|
|
54
|
+
- **Purpose**: Snowflake-managed declarative refresh with lag-based orchestration
|
|
55
|
+
- **Use when**: Near-real-time / continuous refresh, streaming-like pipelines, replacing complex Streams+Tasks
|
|
56
|
+
- **NOT for**: Batch ETL, scheduled runs, CTE decomposition, cost-sensitive workloads
|
|
57
|
+
- **Key difference**: Snowflake manages the refresh DAG automatically. You specify a lag (e.g., 5 minutes) and Snowflake keeps data fresh within that window. This adds continuous compute cost.
|
|
58
|
+
|
|
59
|
+
### Incremental Load
|
|
60
|
+
- **Purpose**: Process only new/modified records via high-water mark comparison
|
|
61
|
+
- **Use when**: Large tables where full refresh is too expensive, append-only sources
|
|
62
|
+
- **NOT for**: Full-refresh staging, CTE decomposition, small-to-medium tables
|
|
63
|
+
|
|
64
|
+
### Materialized View
|
|
65
|
+
- **Purpose**: Pre-computed aggregations that auto-refresh when base data changes
|
|
66
|
+
- **Use when**: Expensive aggregations that need to stay current, single-source only
|
|
67
|
+
- **NOT for**: Multi-source joins, standard transforms
|
|
68
|
+
|
|
69
|
+
### Deferred Merge
|
|
70
|
+
- **Purpose**: Snowflake Streams + scheduled merge tasks for high-frequency ingestion
|
|
71
|
+
- **Use when**: High-frequency data ingestion where immediate merge is too expensive
|
|
72
|
+
- **NOT for**: Batch ETL, standard staging
|
|
73
|
+
|
|
74
|
+
### Tasks / DAG
|
|
75
|
+
- **Purpose**: Snowflake scheduled or DAG-based orchestration
|
|
76
|
+
- **Use when**: Building scheduled task workflows
|
|
77
|
+
- **NOT for**: Data transformation nodes
|
|
78
|
+
|
|
79
|
+
## Decision Rules
|
|
80
|
+
|
|
81
|
+
1. **CTE decomposition**: Each CTE becomes a Stage or Work node. Never Dimension, Fact, or specialized types unless the user explicitly names it that way.
|
|
82
|
+
2. **GROUP BY / SUM / COUNT**: These are transforms. Use Stage/Work. Only use Fact/Dimension if building a dimensional model.
|
|
83
|
+
3. **Multi-source JOIN**: Use Stage or Work — both handle joins via sourceMapping.
|
|
84
|
+
4. **Name prefix**: `stg_` → Stage, `wrk_` → Work, `dim_` → Dimension, `fct_` → Fact, `vw_` → View. Follow the prefix.
|
|
85
|
+
5. **No explicit requirement**: Default to Stage for single-source, Work for multi-source.
|
|
86
|
+
6. **Specialized types require explicit user request**: Dynamic Tables, Incremental Load, Materialized View, Deferred Merge, and Tasks are specialized materialization patterns. **NEVER auto-select these.** The user must explicitly ask for continuous refresh, incremental processing, or the specific pattern by name.
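
Rules 4 and 5 can be sketched as a small lookup. This only illustrates the rules and does not replace `plan-pipeline`: the returned strings are family names, not exact node-type identifiers. The helper name `suggestGeneralNodeType` and the case-insensitive matching are assumptions.

```typescript
// Prefix table copied from rule 4; matching case-insensitively is an assumption.
const PREFIX_TO_TYPE: Array<[string, string]> = [
  ["stg_", "Stage"],
  ["wrk_", "Work"],
  ["dim_", "Dimension"],
  ["fct_", "Fact"],
  ["vw_", "View"],
];

function suggestGeneralNodeType(nodeName: string, sourceCount: number): string {
  const lower = nodeName.toLowerCase();
  for (const [prefix, nodeType] of PREFIX_TO_TYPE) {
    if (lower.startsWith(prefix)) {
      return nodeType; // rule 4: follow the prefix
    }
  }
  // Rule 5: Stage for single-source, Work for multi-source.
  return sourceCount > 1 ? "Work" : "Stage";
}
```

Specialized types never appear in the result, which matches rule 6: they are chosen only on explicit user request.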
|
|
87
|
+
|
|
88
|
+
## Common Mistakes
|
|
89
|
+
|
|
90
|
+
| Mistake | Why It's Wrong | Correct Approach |
|
|
91
|
+
|---------|---------------|-----------------|
|
|
92
|
+
| Using Dynamic Tables for batch ETL | Dynamic Tables add continuous compute cost and are for near-real-time refresh | Use Stage or Work with table materialization |
|
|
93
|
+
| Picking node type by ID (e.g., "65") without `plan-pipeline` | The agent doesn't know what the ID maps to or if it's appropriate | Always call `plan-pipeline` first |
|
|
94
|
+
| Using Dimension/Fact for GROUP BY | GROUP BY is a transform, not a dimensional model | Use Stage or Work |
|
|
95
|
+
| Skipping `plan-pipeline` and guessing "Stage" | May miss better options or select an inappropriate type | Always call `plan-pipeline` with `repoPath` |
|
|
96
|
+
| Using Incremental Load for small tables | Incremental overhead isn't worth it for fast full refreshes | Use Stage with full refresh |
|
|
@@ -0,0 +1,135 @@
|
|
|
1
|
+
# Coalesce MCP Server Overview
|
|
2
|
+
|
|
3
|
+
## How This Server Works
|
|
4
|
+
|
|
5
|
+
This MCP server provides tools and resources for working with Coalesce Transform workspaces and environments via the Coalesce API.
|
|
6
|
+
|
|
7
|
+
### Available Resources
|
|
8
|
+
|
|
9
|
+
Consult these resources for specific guidance. Load only what the current task needs.
|
|
10
|
+
|
|
11
|
+
- **coalesce://context/pipeline-workflows** — Building pipelines, node type selection, multi-node sequences, incremental setup
|
|
12
|
+
- **coalesce://context/node-operations** — Editing nodes: join conditions, columns, config, renames, SQL conversion
|
|
13
|
+
- **coalesce://context/node-creation-decision-tree** — Routing: which tool to use for creation vs update
|
|
14
|
+
- **coalesce://context/data-engineering-principles** — Architecture, platforms, methodology, materialization, packages
|
|
15
|
+
- **coalesce://context/aggregation-patterns** — GROUP BY, datatype inference, common aggregation patterns
|
|
16
|
+
- **coalesce://context/node-type-corpus** — Node type discovery, corpus search, metadata patterns
|
|
17
|
+
- **coalesce://context/tool-usage** — Core tool rules, discovery patterns, pagination, parallelization
|
|
18
|
+
- **coalesce://context/sql-platform-selection** — Determine the active SQL platform
|
|
19
|
+
- **coalesce://context/sql-snowflake** — Snowflake SQL conventions
|
|
20
|
+
- **coalesce://context/sql-databricks** — Databricks SQL conventions
|
|
21
|
+
- **coalesce://context/sql-bigquery** — BigQuery SQL conventions
|
|
22
|
+
- **coalesce://context/storage-mappings** — `{{ ref() }}` syntax and storage locations
|
|
23
|
+
- **coalesce://context/id-discovery** — Resolving project, workspace, environment, and node IDs
|
|
24
|
+
- **coalesce://context/node-payloads** — Full node body editing guidance
|
|
25
|
+
- **coalesce://context/hydrated-metadata** — Hydrated metadata structures
|
|
26
|
+
- **coalesce://context/run-operations** — Start, retry, diagnose, and cancel runs
|
|
27
|
+
- **coalesce://context/intelligent-node-configuration** — Intelligent config completion
|
|
28
|
+
|
|
29
|
+
### Response Guidelines
|
|
30
|
+
|
|
31
|
+
- Answer questions directly without preamble
|
|
32
|
+
- Use fenced code blocks with language tags
|
|
33
|
+
- Provide detail when requested, brevity otherwise
|
|
34
|
+
- For multi-step pipeline creation, report progress after each node
|
|
35
|
+
|
|
36
|
+
## How Coalesce Nodes Work
|
|
37
|
+
|
|
38
|
+
**CRITICAL**: Coalesce nodes are NOT SQL scripts. They are declarative configurations with these components:
|
|
39
|
+
|
|
40
|
+
| Component | Where it lives | What it does |
|
|
41
|
+
|-----------|---------------|--------------|
|
|
42
|
+
| **Columns** | `metadata.columns[].transform` | Each column has a SQL expression (e.g., `"ORDERS"."CUSTOMER_ID"`, `SUM("ORDERS"."TOTAL")`) |
|
|
43
|
+
| **Join condition** | `metadata.sourceMapping[].join.joinCondition` | The FROM/JOIN/WHERE/GROUP BY clause using `{{ ref() }}` syntax |
|
|
44
|
+
| **Dependencies** | `metadata.sourceMapping[].dependencies` | Which upstream nodes this node reads from |
|
|
45
|
+
| **Config** | `config` | Node-type-specific settings (truncate, business keys, SCD, etc.) |
|
|
46
|
+
|
|
47
|
+
The node type's Jinja template combines these into final SQL at compile time. You configure **columns and joins separately** — you never write a complete SQL query.
|
|
48
|
+
|
|
49
|
+
**Example — a CLV aggregation node:**
|
|
50
|
+
|
|
51
|
+
```text
|
|
52
|
+
Columns:
|
|
53
|
+
- CUSTOMER_ID: transform = "CUSTOMER_LOYALTY"."CUSTOMER_ID"
|
|
54
|
+
- TOTAL_ORDERS: transform = COUNT(DISTINCT "ORDER_HEADER"."ORDER_ID")
|
|
55
|
+
- LIFETIME_VALUE: transform = SUM("ORDER_HEADER"."ORDER_TOTAL")
|
|
56
|
+
|
|
57
|
+
joinCondition:
|
|
58
|
+
FROM {{ ref('STAGING', 'CUSTOMER_LOYALTY') }} "CUSTOMER_LOYALTY"
|
|
59
|
+
LEFT JOIN {{ ref('STAGING', 'ORDER_HEADER') }} "ORDER_HEADER"
|
|
60
|
+
ON "CUSTOMER_LOYALTY"."CUSTOMER_ID" = "ORDER_HEADER"."CUSTOMER_ID"
|
|
61
|
+
GROUP BY "CUSTOMER_LOYALTY"."CUSTOMER_ID"
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
**Key rules:**
|
|
65
|
+
|
|
66
|
+
- Column transforms reference **predecessor table aliases** from the joinCondition
|
|
67
|
+
- GROUP BY goes inside the joinCondition, after JOIN clauses
|
|
68
|
+
- Aggregate functions (COUNT, SUM, AVG) go in column transforms, NOT in joinCondition
|
|
69
|
+
- CASE expressions go in column transforms
|
|
70
|
+
- CTEs are NOT supported — break into separate upstream nodes
|
|
71
|
+
- Do NOT write `overrideSQL` or pass raw SQL — use the column/joinCondition model
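
Two of these rules lend themselves to a rough pre-write check. The sketch below is a heuristic only: the `joinConditionProblems` helper is hypothetical, and the regular expressions cover just the aggregate functions named above and a leading `WITH`, so they can miss cases or flag harmless text.

```typescript
// Heuristic check of a joinCondition string against the key rules.
function joinConditionProblems(joinCondition: string): string[] {
  const problems: string[] = [];
  // Aggregates belong in column transforms, not in the joinCondition.
  if (/\b(COUNT|SUM|AVG)\s*\(/i.test(joinCondition)) {
    problems.push("aggregate function belongs in a column transform");
  }
  // CTEs are not supported; a leading WITH suggests one.
  if (/^\s*WITH\b/i.test(joinCondition)) {
    problems.push("CTEs are not supported; split into upstream nodes");
  }
  return problems;
}
```

A clause such as the CLV example's `FROM ... LEFT JOIN ... GROUP BY` passes both checks, since its aggregates live in the column transforms.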

**Required workflow for creating nodes:**

1. **Always call `plan-pipeline` first.** It discovers all available node types from the repo and ranks them for your use case. Do NOT guess node types like "Stage" or "View".
2. **Use `create-workspace-node-from-predecessor`** (or `create-workspace-node-from-scratch` for nodes with no upstream). These tools handle validation, config completion, and column-level attributes automatically. Pass `repoPath` to enable automatic config completion.

Config completion is automatic when `repoPath` is provided; the response includes `configCompletion`, which shows what node-level config and column-level attributes were applied.

**Post-creation verification (required before moving to the next node):**

1. Check `nextSteps` in the creation response and follow all required steps (especially join setup for multi-predecessor nodes)
2. Check `validation.allPredecessorsRepresented`; if false, the join is incomplete
3. For multi-predecessor nodes: set up the join condition via `convert-join-to-aggregation` (aggregation), `apply-join-condition` (row-level joins), or `update-workspace-node` (manual)
4. Verify the final node with `get-workspace-node`: confirm that columns, joinCondition, and config are correct
5. Follow naming conventions: STG_ for staging, DIM_ for dimensions, FACT_ for facts, INT_ for intermediate (e.g., `STG_LOCATION`, `FACT_ORDERS`). Default to UPPERCASE for Snowflake, but **respect the user's chosen casing**
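
The first two checks lend themselves to a small guard that runs on every creation response. The sketch below assumes `nextSteps` is an array of strings and that `validation.allPredecessorsRepresented` is a boolean; both field names appear in this document, but their exact types are assumptions, and `pendingVerification` is a hypothetical helper rather than something the package exports.

```typescript
// Assumed response shape; only the field names come from the document.
interface CreationResponseSketch {
  nextSteps?: string[];
  validation?: { allPredecessorsRepresented?: boolean };
}

// Returns the outstanding items that should be resolved before
// moving on to the next node. An empty array means nothing was flagged.
function pendingVerification(response: CreationResponseSketch): string[] {
  const todo: string[] = [...(response.nextSteps ?? [])];
  if (response.validation?.allPredecessorsRepresented === false) {
    todo.push("Join is incomplete: set up the join condition before continuing");
  }
  return todo;
}

const example = pendingVerification({
  nextSteps: ["Set up the join for ORDER_HEADER"],
  validation: { allPredecessorsRepresented: false },
});
console.log(example.length); // number of items still to resolve
```

Treating a missing `validation` block as "nothing flagged" is a design choice of this sketch; a stricter caller might prefer to treat it as an error.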

**Anti-pattern: writing SQL and passing it to the planner**

Do NOT author SQL yourself to pass to `plan-pipeline` or `create-pipeline-from-sql`. The `sql` parameter exists solely for converting SQL that the **user** provided. When building a pipeline, use declarative tools:

1. `create-workspace-node-from-predecessor` to create nodes
2. `update-workspace-node` to set joinCondition
3. `replace-workspace-node-columns` or `convert-join-to-aggregation` for column transforms

### Writing SQL for Nodes

Determine the platform first, then load exactly one dialect resource:

1. **Detect**: `get-project` for warehouse type, or check existing node SQL (see `coalesce://context/sql-platform-selection`)
2. **Load one**: Snowflake -> `coalesce://context/sql-snowflake`, Databricks -> `coalesce://context/sql-databricks`, BigQuery -> `coalesce://context/sql-bigquery`
3. **Follow that dialect's rules** for the entire edit
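
The "load exactly one" step amounts to a lookup from platform to resource URI. A minimal sketch follows, using the three URIs listed above; the `Platform` type, the `dialectResourceFor` helper, and the assumption that the warehouse type arrives as a case-insensitive string such as `"Snowflake"` are all hypothetical, since this document does not show what `get-project` returns.

```typescript
type Platform = "snowflake" | "databricks" | "bigquery";

// URIs copied from the list above; exactly one is loaded per edit.
const DIALECT_RESOURCES: Record<Platform, string> = {
  snowflake: "coalesce://context/sql-snowflake",
  databricks: "coalesce://context/sql-databricks",
  bigquery: "coalesce://context/sql-bigquery",
};

// Hypothetical helper: normalizes the warehouse type and returns one URI.
function dialectResourceFor(warehouseType: string): string {
  const key = warehouseType.trim().toLowerCase();
  if (Object.prototype.hasOwnProperty.call(DIALECT_RESOURCES, key)) {
    return DIALECT_RESOURCES[key as Platform];
  }
  // Unknown platform: fail loudly rather than guess a dialect.
  throw new Error(`No dialect resource for warehouse type: ${warehouseType}`);
}

console.log(dialectResourceFor("Snowflake"));
```

Throwing on an unknown platform keeps the caller from silently mixing dialects; falling back to `coalesce://context/sql-platform-selection` would be another reasonable choice.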

## Operational Scope

### In Scope

- Creating/updating workspace nodes
- Reasoning from project metadata and workspace patterns
- Writing SQL transforms and joins
- Running jobs and monitoring runs
- Managing environments and projects

### Out of Scope

- **Previewing compiled SQL**: compilation happens at deploy/run time
- **Changing node type**: set at creation time, cannot be changed via API. Create a new node of the desired type instead
- **Data preview / row counts**: this server manages node definitions, not live warehouse data
- **Cross-workspace replication**: no clone tool; recreate each node bottom-up in the target workspace
- Creating/modifying node type templates, source nodes, or macros

### Description Generation

Only generate descriptions when explicitly requested. Focus on disambiguation: what makes this column the one the user is looking for? If you lack context, ask.

Apply all descriptions in one call using `replace-workspace-node-columns`.

## Documentation

- [General Coalesce Docs](https://docs.coalesce.io/docs)
- [API Documentation](https://docs.coalesce.io/docs/api)
- [Snowflake Base Node Types](https://docs.coalesce.io/docs/marketplace/package/coalesce_snowflake_base-node-types)
- [Incremental Loading](https://docs.coalesce.io/docs/marketplace/package/coalesce_snowflake_incremental-loading)
- [Dynamic Tables](https://docs.coalesce.io/docs/marketplace/package/coalesce_snowflake_dynamic-tables)