modscape 1.1.8 → 1.3.0

@@ -1,122 +1,731 @@
1
- # Modscape Data Modeling Rules (AI-Optimized)
1
+ # Modscape Modeling Rules for AI Agents
2
2
 
3
- ## 0. Foundational Principle: Instruction Fidelity
4
- AI agents MUST prioritize the user's specific instructions above all else.
5
- - **Accuracy**: Precisely implement every table, column, and relationship requested.
6
- - **Completeness**: Do not omit requested details for the sake of brevity.
7
- - **Expert Guidance**: If a user's instruction contradicts modeling best practices (e.g., mixing grains), **warn the user and suggest an alternative**, but do not ignore the original intent.
3
+ > **Purpose**: This file teaches AI agents how to write valid `model.yaml` files for Modscape.
4
+ > Read this file completely before generating or editing any YAML.
8
5
 
9
- ## CRITICAL: YAML Architecture
10
- AI agents MUST follow this root-level structure. Schema violations will cause parsing errors.
6
+ ---
7
+
8
+ ## QUICK REFERENCE (read this first)
9
+
10
+ ```
11
+ ROOT KEYS domains | tables | relationships | annotations | layout
12
+ COORDINATES ONLY in `layout`. NEVER inside tables or domains.
13
+ LINEAGE Use lineage.upstream (not relationships) for mart/aggregated tables.
14
+ parentId Every table listed in a domain's `tables` also needs `parentId: <domain_id>` in its layout entry.
15
+ IDs Every object (table, domain, annotation) needs a unique `id`.
16
+ sampleData One row per record, values in `columns` order. No header row. At least 3 realistic rows.
17
+ Grid All x/y values in `layout` must be multiples of 40.
18
+ ```
19
+
20
+ ---
21
+
22
+ ## 1. Root Structure
23
+
24
+ A valid `model.yaml` uses only the following top-level keys.
25
+
26
+ ```yaml
27
+ domains: # (array) visual containers — OPTIONAL but recommended
28
+ tables: # (array) entity definitions — REQUIRED
29
+ relationships: # (array) ER cardinality edges — OPTIONAL
30
+ annotations: # (array) sticky notes / callouts — OPTIONAL
31
+ layout: # (object) ALL coordinates — REQUIRED if any objects exist
32
+ ```
33
+
34
+ **MUST NOT** add any other top-level keys. They will be ignored or cause errors.
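You can enforce the root-key rule mechanically before saving. The following is a minimal sketch that assumes the YAML has already been parsed into a Python dict (for example by a YAML loader). `check_root_keys` is a hypothetical helper, not part of Modscape.

```python
# Hypothetical helper, not part of Modscape: validates top-level keys of a parsed model.
ALLOWED_ROOT_KEYS = {"domains", "tables", "relationships", "annotations", "layout"}


def check_root_keys(model: dict) -> list[str]:
    """Return a list of problems with the model's top-level keys (empty if valid)."""
    problems = []
    for key in model:
        if key not in ALLOWED_ROOT_KEYS:
            problems.append(f"unexpected top-level key: {key!r}")
    if "tables" not in model:
        problems.append("missing required top-level key: 'tables'")
    # layout becomes required as soon as any table, domain, or annotation exists
    has_objects = any(model.get(k) for k in ("tables", "domains", "annotations"))
    if has_objects and "layout" not in model:
        problems.append("missing 'layout' although objects exist")
    return problems


model = {"tables": [{"id": "fct_orders", "name": "Orders"}], "positions": {}}
print(check_root_keys(model))
```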
35
+
36
+ ---
37
+
38
+ ## 2. Tables
39
+
40
+ ### 2-1. Required and Optional Fields
41
+
42
+ | Field | Required | Description |
43
+ |-------|----------|-------------|
44
+ | `id` | **REQUIRED** | Unique identifier used as a key in `layout`, `domains.tables`, `lineage.upstream`, etc. Use snake_case. |
45
+ | `name` | **REQUIRED** | Conceptual (business) name, shown in large text on the canvas. |
46
+ | `logical_name` | optional | Formal business name, shown in medium text. Omit if same as `name`. |
47
+ | `physical_name` | optional | Actual database table name, shown in small text. |
48
+ | `appearance` | optional | Visual type, icon, color. |
49
+ | `conceptual` | optional | AI-friendly business context metadata. |
50
+ | `lineage` | optional | Upstream table IDs. Only for `mart` / aggregated tables. |
51
+ | `columns` | optional | Column definitions. |
52
+ | `sampleData` | optional | 2D array of sample rows. Strongly recommended. |
53
+
54
+ ### 2-2. `appearance` Fields
55
+
56
+ ```yaml
57
+ appearance:
58
+ type: fact # REQUIRED when `appearance` is present. See table below.
59
+ sub_type: transaction # optional free text (transaction | periodic | accumulating | ...)
60
+ scd: type2 # optional. dimension tables only. type0|type1|type2|type3|type4|type6
61
+ icon: "💰" # optional. any single emoji.
62
+ color: "#e0f2fe" # optional. hex or CSS color for the header.
63
+ ```
64
+
65
+ **`appearance.type` values:**
66
+
67
+ | type | Use when... |
68
+ |------|-------------|
69
+ | `fact` | Events, transactions, measurements. Has measures (numbers) and FK columns. |
70
+ | `dimension` | Entities, master data, reference lists. Descriptive attributes. |
71
+ | `mart` | Aggregated or consumer-facing output. **Always add `lineage.upstream`.** |
72
+ | `hub` | Data Vault: stores a single unique business key. |
73
+ | `link` | Data Vault: joins two or more hubs (transaction or relationship). |
74
+ | `satellite` | Data Vault: descriptive attributes of a hub, tracked over time. |
75
+ | `table` | Generic. Use when none of the above apply. |
76
+
77
+ **MUST NOT** use `scd` on `fact`, `mart`, `hub`, `link`, or `satellite` tables.
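The `appearance` rules can be turned into a per-table check. The sketch below follows the comment on `scd` ("dimension tables only") and so also rejects `scd` on generic `table` entries. That reading is an assumption. `check_appearance` is hypothetical.

```python
# Hypothetical helper, not part of Modscape: checks appearance.type and appearance.scd.
SCD_VALUES = {"type0", "type1", "type2", "type3", "type4", "type6"}
TABLE_TYPES = {"fact", "dimension", "mart", "hub", "link", "satellite", "table"}


def check_appearance(table: dict) -> list[str]:
    appearance = table.get("appearance")
    if appearance is None:
        return []  # appearance is optional
    problems = []
    ttype = appearance.get("type")
    if ttype not in TABLE_TYPES:
        problems.append(f"{table['id']}: invalid or missing appearance.type {ttype!r}")
    scd = appearance.get("scd")
    if scd is not None:
        if ttype != "dimension":
            problems.append(f"{table['id']}: scd is only allowed on dimension tables")
        elif scd not in SCD_VALUES:
            problems.append(f"{table['id']}: unknown scd value {scd!r}")
    return problems


print(check_appearance({"id": "fct_orders", "appearance": {"type": "fact", "scd": "type2"}}))
```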
78
+
79
+ ### 2-3. `conceptual` Fields (AI-readable business context)
80
+
81
+ ```yaml
82
+ conceptual:
83
+ description: "One row per order line item."
84
+ tags: [WHAT, HOW_MUCH] # BEAM* tags: WHO | WHAT | WHEN | WHERE | HOW | COUNT | HOW_MUCH
85
+ businessDefinitions:
86
+ revenue: "Net revenue after discounts and returns."
87
+ ```
88
+
89
+ ### 2-4. `columns` Fields
90
+
91
+ Each column has an `id` plus optional `logical` and `physical` blocks.
92
+
93
+ ```yaml
94
+ columns:
95
+ - id: order_id # REQUIRED. Unique within the table. Used in sampleData header.
96
+ logical:
97
+ name: "Order ID" # Display name
98
+ type: Int # Int | String | Decimal | Date | Timestamp | Boolean | ...
99
+ description: "Surrogate key." # optional
100
+ isPrimaryKey: true # optional. default false.
101
+ isForeignKey: false # optional. default false.
102
+ isPartitionKey: false # optional. default false.
103
+ isMetadata: false # optional. true for audit cols: load_date, record_source, hash_diff
104
+ additivity: fully # optional. fully=summable | semi=balance/stock | non=price/rate/ID
105
+ physical: # optional. override when warehouse names/types differ.
106
+ name: order_id_pk
107
+ type: "BIGINT"
108
+ constraints: [NOT NULL, UNIQUE]
109
+ ```
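Column ids must be unique within a table because `sampleData` and relationships refer to them. The sketch below checks that rule along with the allowed `additivity` values. It assumes a parsed table dict, and `check_columns` is a hypothetical name.

```python
# Hypothetical helper, not part of Modscape: validates column ids and additivity values.
ADDITIVITY = {"fully", "semi", "non"}


def check_columns(table: dict) -> list[str]:
    problems = []
    seen = set()
    for col in table.get("columns", []):
        col_id = col.get("id")
        if not col_id:
            problems.append(f"{table['id']}: column without id")
            continue
        if col_id in seen:
            problems.append(f"{table['id']}: duplicate column id {col_id!r}")
        seen.add(col_id)
        additivity = col.get("logical", {}).get("additivity")
        if additivity is not None and additivity not in ADDITIVITY:
            problems.append(f"{table['id']}.{col_id}: invalid additivity {additivity!r}")
    return problems


t = {"id": "fct_orders", "columns": [
    {"id": "order_id", "logical": {"isPrimaryKey": True}},
    {"id": "amount", "logical": {"additivity": "fully"}},
    {"id": "amount", "logical": {"additivity": "sum"}},
]}
print(check_columns(t))
```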
110
+
111
+ ---
112
+
113
+ ## 3. Relationships (ER Cardinality)
114
+
115
+ Use `relationships` **only** for structural ER connections between tables.
116
+
117
+ ```yaml
118
+ relationships:
119
+ - from:
120
+ table: dim_customers # table id
121
+ column: customer_key # column id — optional but recommended
122
+ to:
123
+ table: fct_orders
124
+ column: customer_key
125
+ type: one-to-many
126
+ ```
127
+
128
+ **`type` values:**
129
+
130
+ | type | Typical usage |
131
+ |------|--------------|
132
+ | `one-to-one` | Lookup table / vertical split |
133
+ | `one-to-many` | Dimension → Fact *(most common)* |
134
+ | `many-to-one` | Fact → Dimension *(inverse notation of above)* |
135
+ | `many-to-many` | Via a bridge / link table |
136
+
137
+ **MUST NOT** use `relationships` to express data lineage (use `lineage.upstream` instead).
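A relationship is only useful if both ends resolve to real tables and columns. The sketch below checks `type` against the four values listed above and verifies that every endpoint exists. This catches the `type: lineage` mistake. `check_relationships` is hypothetical and operates on a parsed model dict.

```python
# Hypothetical helper, not part of Modscape: validates relationship types and endpoints.
REL_TYPES = {"one-to-one", "one-to-many", "many-to-one", "many-to-many"}


def check_relationships(model: dict) -> list[str]:
    tables = {t["id"]: t for t in model.get("tables", [])}
    problems = []
    for i, rel in enumerate(model.get("relationships", [])):
        if rel.get("type") not in REL_TYPES:
            problems.append(f"relationships[{i}]: invalid type {rel.get('type')!r}")
        for end in ("from", "to"):
            ref = rel.get(end, {})
            table = tables.get(ref.get("table"))
            if table is None:
                problems.append(f"relationships[{i}].{end}: unknown table {ref.get('table')!r}")
            elif "column" in ref:  # column is optional but must exist when given
                col_ids = {c["id"] for c in table.get("columns", [])}
                if ref["column"] not in col_ids:
                    problems.append(f"relationships[{i}].{end}: unknown column {ref['column']!r}")
    return problems


model = {
    "tables": [
        {"id": "dim_customers", "columns": [{"id": "customer_key"}]},
        {"id": "fct_orders", "columns": [{"id": "customer_key"}]},
        {"id": "mart_revenue"},
    ],
    "relationships": [
        {"from": {"table": "dim_customers", "column": "customer_key"},
         "to": {"table": "fct_orders", "column": "customer_key"}, "type": "one-to-many"},
        {"from": {"table": "fct_orders"}, "to": {"table": "mart_revenue"}, "type": "lineage"},
    ],
}
print(check_relationships(model))
```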
138
+
139
+ ---
140
+
141
+ ## 4. Data Lineage
142
+
143
+ `lineage.upstream` declares which source tables a derived table is built from.
144
+ This is rendered as animated arrows in **Lineage Mode**. It is separate from ER relationships.
145
+
146
+ ```yaml
147
+ tables:
148
+ - id: mart_revenue
149
+ appearance: { type: mart }
150
+ lineage:
151
+ upstream:
152
+ - fct_orders # list of source table IDs
153
+ - dim_dates
154
+ ```
155
+
156
+ ### When to use lineage vs relationships
157
+
158
+ | Situation | Use |
159
+ |-----------|-----|
160
+ | `dim_customers` → `fct_orders` (FK join) | `relationships` |
161
+ | `fct_orders` + `dim_dates` → `mart_revenue` (aggregation) | `lineage.upstream` |
162
+
163
+ **MUST** define `lineage.upstream` for every `mart` or aggregated table.
164
+ **MUST NOT** define `lineage.upstream` for raw tables (`fact`, `dimension`, `hub`, `link`, `satellite`).
165
+ **MUST NOT** add a `relationships` entry for a connection already expressed in `lineage.upstream`.
166
+
167
+ #### Example: correct separation
168
+
169
+ ```yaml
170
+ # CORRECT
171
+ tables:
172
+ - id: mart_revenue
173
+ appearance: { type: mart }
174
+ lineage:
175
+ upstream: [fct_orders, dim_dates] # lineage only
176
+
177
+ relationships:
178
+ - from: { table: dim_customers, column: customer_key }
179
+ to: { table: fct_orders, column: customer_key }
180
+ type: one-to-many # ER only
181
+
182
+ # WRONG — do not add a relationships entry for the same connection as lineage
183
+ relationships:
184
+ - from: { table: fct_orders }
185
+ to: { table: mart_revenue }
186
+ type: lineage # ❌ never do this
187
+ ```
188
+
189
+ ---
190
+
191
+ ## 5. Domains
192
+
193
+ ```yaml
194
+ domains:
195
+ - id: sales_ops # REQUIRED. Used as key in layout.
196
+ name: "Sales Operations" # REQUIRED. Display name.
197
+ description: "..." # optional
198
+ color: "rgba(59, 130, 246, 0.1)" # optional. rgba recommended.
199
+ tables: # REQUIRED. List of table IDs inside this domain.
200
+ - fct_orders
201
+ - dim_customers
202
+ isLocked: false # optional. true = prevent drag on canvas.
203
+ ```
204
+
205
+ **MUST** list only table IDs that actually exist in `tables`.
206
+ **MUST** add a layout entry for the domain with `width` and `height`.
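Both domain rules compare `domains` against `tables` and `layout`. A minimal sketch follows, using a hypothetical `check_domains` on a parsed model dict:

```python
# Hypothetical helper, not part of Modscape: validates domain membership and domain layout size.
def check_domains(model: dict) -> list[str]:
    table_ids = {t["id"] for t in model.get("tables", [])}
    layout = model.get("layout", {})
    problems = []
    for d in model.get("domains", []):
        for tid in d.get("tables", []):
            if tid not in table_ids:
                problems.append(f"domain {d['id']}: unknown table {tid!r}")
        entry = layout.get(d["id"], {})
        if "width" not in entry or "height" not in entry:
            problems.append(f"domain {d['id']}: layout entry needs width and height")
    return problems


model = {
    "tables": [{"id": "fct_orders"}],
    "domains": [{"id": "sales_ops", "name": "Sales Operations",
                 "tables": ["fct_orders", "dim_customers"]}],
    "layout": {"sales_ops": {"x": 0, "y": 0}},
}
print(check_domains(model))
```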
207
+
208
+ ---
209
+
210
+ ## 6. Layout
211
+
212
+ **All coordinates live here.** Never put `x`, `y`, `width`, or `height` inside `tables` or `domains`.
213
+
214
+ ### 6-1. Field Reference
215
+
216
+ | Field | Required for | Description |
217
+ |-------|-------------|-------------|
218
+ | `x` | all entries | Canvas x coordinate (integer, multiple of 40) |
219
+ | `y` | all entries | Canvas y coordinate (integer, multiple of 40) |
220
+ | `width` | domains | Total pixel width of the domain container |
221
+ | `height` | domains | Total pixel height of the domain container |
222
+ | `parentId` | tables inside a domain | ID of the containing domain. Makes coordinates relative to domain origin. |
223
+ | `isLocked` | domains or tables | Prevents drag when true |
224
+
225
+ ### 6-2. Domain Size Formula
226
+
227
+ Calculate domain dimensions so tables fit without overflow:
228
+
229
+ ```
230
+ width = (numCols * 320) + ((numCols - 1) * 80) + 160
231
+ height = (numRows * 240) + ((numRows - 1) * 80) + 160
232
+ ```
233
+
234
+ Examples:
235
+ - 1 col × 1 row → width: 480, height: 400
236
+ - 2 cols × 1 row → width: 880, height: 400
237
+ - 2 cols × 2 rows → width: 880, height: 720
238
+ - 3 cols × 2 rows → width: 1280, height: 720
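The formula is plain arithmetic, so a small function can reproduce the four examples. In the sketch below, the constant names are illustrative. The 160px padding is read as 80px per side, which matches the `x: 80, y: 80` table positions used in this document.

```python
# Sketch of the domain size formula; constant names are illustrative, not Modscape settings.
TABLE_W, TABLE_H = 320, 240  # standard table size assumed by the formula
GAP = 80                     # gap between tables inside a domain
PADDING = 160                # total padding (80px on each side)


def domain_size(num_cols: int, num_rows: int) -> tuple[int, int]:
    width = num_cols * TABLE_W + (num_cols - 1) * GAP + PADDING
    height = num_rows * TABLE_H + (num_rows - 1) * GAP + PADDING
    return width, height


for cols, rows in [(1, 1), (2, 1), (2, 2), (3, 2)]:
    print(f"{cols} x {rows} -> {domain_size(cols, rows)}")
```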
239
+
240
+ ### 6-3. Table Positioning Inside a Domain
241
+
242
+ When `parentId` is set, `x`/`y` are **relative to the domain's top-left corner (0, 0)**.
243
+
244
+ ```yaml
245
+ layout:
246
+ sales_ops:
247
+ x: 0 # absolute canvas position
248
+ y: 0
249
+ width: 880
250
+ height: 400
251
+ dim_customers:
252
+ x: 80 # 80px from domain's left edge
253
+ y: 80 # 80px from domain's top edge
254
+ parentId: sales_ops
255
+ fct_orders:
256
+ x: 480 # 480px from domain's left edge
257
+ y: 80
258
+ parentId: sales_ops
259
+ ```
260
+
261
+ **MUST NOT** let any table's right edge (`x + 320`) or bottom edge (`y + 240`) exceed the domain's `width` or `height`.
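The boundary rule can be checked with a few comparisons. The sketch below assumes every table has the standard 320 × 240 footprint. A table with many columns may render taller, which this check would not see. `overflows` is a hypothetical name.

```python
# Hypothetical helper, not part of Modscape: flags tables that stick out of their domain.
TABLE_W, TABLE_H = 320, 240  # assumed standard table size


def overflows(table_pos: dict, domain: dict) -> bool:
    """True if a table (x/y relative to its domain) exceeds the domain's width or height."""
    right = table_pos["x"] + TABLE_W
    bottom = table_pos["y"] + TABLE_H
    return right > domain["width"] or bottom > domain["height"]


layout = {
    "small_domain": {"x": 0, "y": 0, "width": 480, "height": 400},
    "fct_orders": {"x": 280, "y": 80, "parentId": "small_domain"},
}
for table_id, pos in layout.items():
    parent = pos.get("parentId")
    if parent and overflows(pos, layout[parent]):
        print(f"{table_id} overflows {parent}")
```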
262
+
263
+ ### 6-4. Layout Flow Conventions
264
+
265
+ - **ER diagrams**: Dimension/Hub tables TOP, Fact/Link tables BOTTOM
266
+ - **Lineage diagrams**: Upstream (source) LEFT, Downstream (mart) RIGHT
267
+ - **Grid**: All `x` and `y` values must be multiples of 40
268
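+ - **Spacing**: Minimum gap of 120px between top-level nodes (domains and standalone tables). Tables inside a domain use the 80px gap built into the Section 6-2 formula.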
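When coordinates come from a calculation or a drag, they can be snapped back onto the 40px grid as integers. The following is a minimal sketch. `snap` and `on_grid` are hypothetical helpers, and rounding ties upward is an arbitrary choice.

```python
# Hypothetical helpers, not part of Modscape: keep layout coordinates on the 40px grid.
GRID = 40


def snap(value: float) -> int:
    """Round a coordinate to the nearest multiple of GRID (ties round up)."""
    return int((value + GRID / 2) // GRID) * GRID


def on_grid(entry: dict) -> bool:
    """True if both x and y of a layout entry are multiples of GRID."""
    return entry["x"] % GRID == 0 and entry["y"] % GRID == 0


print(snap(130), snap(203), on_grid({"x": 1000, "y": 0}))
```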
269
+
270
+ ### 6-5. Layout Template
271
+
272
+ ```yaml
273
+ layout:
274
+ # --- Domain ---
275
+ <domain_id>:
276
+ x: <canvas_x> # absolute
277
+ y: <canvas_y>
278
+ width: <W> # use formula above
279
+ height: <H>
280
+
281
+ # --- Table inside domain ---
282
+ <table_id>:
283
+ x: <relative_x> # relative to domain origin
284
+ y: <relative_y>
285
+ parentId: <domain_id>
286
+
287
+ # --- Standalone table ---
288
+ <table_id>:
289
+ x: <canvas_x> # absolute
290
+ y: <canvas_y>
291
+ ```
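The template can be filled in programmatically. The sketch below places tables row by row inside one domain. It applies the Section 6-2 formula for the domain size and 80px padding for the relative positions. It always reserves `num_cols` columns, even when fewer tables are given. `domain_layout` is hypothetical.

```python
# Hypothetical helper, not part of Modscape: builds layout entries for one domain and its tables.
TABLE_W, TABLE_H, GAP, PAD = 320, 240, 80, 80


def domain_layout(domain_id: str, table_ids: list[str], origin: tuple[int, int], num_cols: int) -> dict:
    num_rows = -(-len(table_ids) // num_cols)  # ceiling division
    layout = {
        domain_id: {
            "x": origin[0],  # absolute canvas position
            "y": origin[1],
            "width": num_cols * TABLE_W + (num_cols - 1) * GAP + 2 * PAD,
            "height": num_rows * TABLE_H + (num_rows - 1) * GAP + 2 * PAD,
        }
    }
    for i, table_id in enumerate(table_ids):
        row, col = divmod(i, num_cols)
        layout[table_id] = {
            "x": PAD + col * (TABLE_W + GAP),  # relative to the domain origin
            "y": PAD + row * (TABLE_H + GAP),
            "parentId": domain_id,
        }
    return layout


print(domain_layout("sales_ops", ["dim_customers", "fct_orders"], (0, 0), num_cols=2))
```

For two tables in two columns, this reproduces the `sales_ops` entries from Section 6-3.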
292
+
293
+ ---
294
+
295
+ ## 7. Annotations
296
+
297
+ ```yaml
298
+ annotations:
299
+ - id: note_001 # REQUIRED. Unique ID.
300
+ type: sticky # REQUIRED. sticky | callout
301
+ text: "..." # REQUIRED. Note content.
302
+ color: "#fef9c3" # optional. background color.
303
+ targetId: fct_orders # optional. ID of the object to attach to.
304
+ targetType: table # required if targetId is set. table | domain | relationship | column
305
+ offset:
306
+ x: 100 # Offset from the target's top-left corner. Without targetId, this is an absolute canvas position.
307
+ y: -80 # negative y = above the target.
308
+ ```
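The required annotation fields and the `targetId`/`targetType` pairing can be validated with a small function. The following is a sketch on a parsed annotation dict. `check_annotation` is hypothetical.

```python
# Hypothetical helper, not part of Modscape: validates one annotation entry.
ANNOTATION_TYPES = {"sticky", "callout"}
TARGET_TYPES = {"table", "domain", "relationship", "column"}


def check_annotation(note: dict) -> list[str]:
    nid = note.get("id", "<no id>")
    problems = [f"{nid}: missing required field {k!r}" for k in ("id", "type", "text") if k not in note]
    if "type" in note and note["type"] not in ANNOTATION_TYPES:
        problems.append(f"{nid}: type must be sticky or callout")
    if "targetId" in note and note.get("targetType") not in TARGET_TYPES:
        problems.append(f"{nid}: targetId requires a valid targetType")
    return problems


bad = {"id": "note_002", "type": "banner", "targetId": "fct_orders"}
print(check_annotation(bad))
```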
309
+
310
+ ---
311
+
312
+ ## 8. Sample Data
313
+
314
+ Every table SHOULD include `sampleData`.
315
+
316
+ ```yaml
317
+ sampleData:
318
+ - [1001, 1, 150.00, "COMPLETED"] # each row = one data record
319
+ - [1002, 2, 89.50, "PENDING"]
320
+ - [1003, 1, 210.00, "COMPLETED"]
321
+ ```
322
+
323
+ **Rules:**
324
+ - Each row is a plain data record. No header row.
325
+ - The order of values MUST match the order of `columns` defined in the table.
326
+ - Use realistic values. Do NOT use "test1", "foo", "xxx".
327
+ - Numeric measures should be plausible business amounts.
328
+ - Dates should be in ISO 8601 format: `"2024-01-15"` or `"2024-01-15T00:00:00Z"`.
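Most of these rules can be checked against the table's `columns`. The sketch below makes three checks. First, it flags a leading row that repeats the column ids (the old header-row format). Second, it flags rows whose length differs from the column count. Third, it flags `Date`/`Timestamp` values that are not ISO 8601 strings. "Realistic" values still need human judgement. `check_sample_data` and `is_iso8601` are hypothetical.

```python
# Hypothetical helpers, not part of Modscape: structural checks for sampleData.
from datetime import datetime


def is_iso8601(value) -> bool:
    """Accept ISO 8601 strings such as "2024-01-15" or "2024-01-15T00:00:00Z"."""
    if not isinstance(value, str):
        return False
    try:
        # fromisoformat() only accepts a trailing "Z" from Python 3.11, so normalise it first
        datetime.fromisoformat(value.replace("Z", "+00:00"))
        return True
    except ValueError:
        return False


def check_sample_data(table: dict) -> list[str]:
    tid = table["id"]
    columns = table.get("columns", [])
    col_ids = [c["id"] for c in columns]
    col_types = [c.get("logical", {}).get("type") for c in columns]
    rows = list(table.get("sampleData", []))
    problems = []
    if rows and list(rows[0]) == col_ids:
        problems.append(f"{tid}: first row repeats the column ids (no header row allowed)")
        rows = rows[1:]
    if len(rows) < 3:
        problems.append(f"{tid}: fewer than 3 data rows")
    for i, row in enumerate(rows):
        if len(row) != len(col_ids):
            problems.append(f"{tid}: row {i} has {len(row)} values, expected {len(col_ids)}")
            continue
        for value, ctype in zip(row, col_types):
            if ctype in ("Date", "Timestamp") and not is_iso8601(value):
                problems.append(f"{tid}: row {i} value {value!r} is not ISO 8601")
    return problems


old_style = {"id": "fct_orders", "columns": [{"id": "order_id"}, {"id": "amount"}],
             "sampleData": [["order_id", "amount"], [1001, 50.0], [1002, 120.5]]}
print(check_sample_data(old_style))
```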
329
+
330
+ ---
331
+
332
+ ## 9. Implementation Hints
333
+
334
+ `implementation` is an **optional** block inside each table. AI agents read it to generate dbt / Spark / SQLMesh code. Omitting it is fine — the visualizer works without it.
11
335
 
12
- 1. **`domains`**: (Array) Visual groupings.
13
- 2. **`tables`**: (Array) Entity definitions. **NEVER put `x` or `y` coordinates here.**
14
- 3. **`relationships`**: (Array) ER connections.
15
- 4. **`annotations`**: (Array) Sticky notes and callouts.
16
- 5. **`layout`**: (Dictionary) **MANDATORY**. All coordinates MUST live here, keyed by object ID.
336
+ ```yaml
337
+ tables:
338
+ - id: fct_orders
339
+ appearance: { type: fact }
340
+ implementation:
341
+ materialization: incremental # table | view | incremental | ephemeral
342
+ incremental_strategy: merge # merge | append | delete+insert
343
+ unique_key: order_id # column id used for upsert
344
+ partition_by:
345
+ field: event_date
346
+ granularity: day # day | month | year | hour
347
+ cluster_by: [customer_id, region_id]
348
+ grain: [month_key, region_id] # GROUP BY columns (mart only; shown here for reference)
349
+ measures: # mart only; shown here for reference
350
+ - column: total_revenue # output column id in this table
351
+ agg: sum # sum | count | count_distinct | avg | min | max
352
+ source_column: amount # upstream column id (use <table_id>.<col_id> to disambiguate)
353
+ ```
354
+
355
+ ### AI Inference Defaults (when `implementation` is absent)
356
+
357
+ | `appearance.type` | `appearance.scd` | Inferred `materialization` |
358
+ |------------------|-----------------|--------------------------|
359
+ | `fact` | — | `incremental` |
360
+ | `dimension` | `type2` | `table` (snapshot pattern) |
361
+ | `dimension` | other | `table` |
362
+ | `mart` | — | `table` |
363
+ | `hub` / `link` / `satellite` | — | `incremental` |
364
+ | `table` | — | `view` |
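The defaults table maps directly to a lookup. In the sketch below, an explicit `implementation.materialization` takes precedence over the inferred default. A table without `appearance` is treated as `type: table`. Both behaviours are assumptions, not stated rules. `infer_materialization` is hypothetical.

```python
# Hypothetical helper, not part of Modscape: applies the inference defaults table.
def infer_materialization(table: dict) -> str:
    impl = table.get("implementation", {})
    if "materialization" in impl:
        return impl["materialization"]  # an explicit hint wins (assumption)
    ttype = table.get("appearance", {}).get("type", "table")
    if ttype in ("fact", "hub", "link", "satellite"):
        return "incremental"
    if ttype in ("dimension", "mart"):
        # a dimension with scd: type2 is still a table, built with a snapshot pattern
        return "table"
    return "view"


print(infer_materialization({"appearance": {"type": "dimension", "scd": "type2"}}))
```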
365
+
366
+ **Rules:**
367
+ - `measures` and `grain` are for `mart` tables only.
368
+ - `incremental_strategy` and `unique_key` are only relevant when `materialization: incremental`.
369
+ - When `source_column` is ambiguous across multiple upstream tables, qualify it as `<table_id>.<column_id>` (e.g., `fct_orders.amount`).
370
+ - **MUST NOT** define `implementation` inside `domains`, `relationships`, or `annotations`.
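The qualification rule for `source_column` amounts to a resolver. The sketch below takes a map of upstream table ids to their column ids and handles three cases. An unqualified name that matches exactly one upstream table is qualified automatically. An ambiguous name raises an error. An already-qualified name is split at its last dot, so dbt-style ids such as `model.my_project.fct_orders` still work. `resolve_source_column` is hypothetical.

```python
# Hypothetical helper, not part of Modscape: qualifies a measure's source_column.
def resolve_source_column(source_column: str, upstream: dict[str, list[str]]) -> str:
    if "." in source_column:
        table_id, col_id = source_column.rsplit(".", 1)  # last dot separates table and column
        if col_id not in upstream.get(table_id, []):
            raise ValueError(f"{source_column!r} not found upstream")
        return source_column
    matches = [t for t, cols in upstream.items() if source_column in cols]
    if len(matches) == 1:
        return f"{matches[0]}.{source_column}"
    if not matches:
        raise ValueError(f"{source_column!r} not found in any upstream table")
    raise ValueError(f"{source_column!r} is ambiguous; qualify it as one of "
                     + ", ".join(f"{t}.{source_column}" for t in matches))


upstream = {"fct_orders": ["order_id", "customer_key", "amount"],
            "dim_customers": ["customer_key", "customer_name"]}
print(resolve_source_column("amount", upstream))
```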
17
371
 
18
372
  ---
19
373
 
20
- ## 1. Beautiful Layout Heuristics
21
- To ensure a professional and clean diagram, AI agents MUST use the following numeric standards:
374
+ ## 10. Common Mistakes (Before → After)
22
375
 
23
- ### Standard Metrics
24
- - **Grid Snapping**: All `x` and `y` values MUST be multiples of **40** (e.g., 0, 40, 80, 120).
25
- - **Standard Table Width**: `320`
26
- - **Standard Table Height**: `240` (base)
27
- - **Node Spacing (Gap)**: Minimum `120` between nodes.
376
+ ### Coordinates inside a table definition
28
377
 
29
- ### Directional Flow
30
- - **Data Lineage (Horizontal)**:
31
- - Upstream (Source) tables on the **LEFT**.
32
- - Downstream (Target) tables on the **RIGHT**.
33
- - **ER Relationships (Vertical)**:
34
- - Master/Dimension/Hub tables on the **TOP**.
35
- - Fact/Transaction/Link tables on the **BOTTOM**.
378
+ ```yaml
379
+ # WRONG
380
+ tables:
381
+ - id: fct_orders
382
+ x: 200 # coordinates do not belong here
383
+ y: 400
384
+ ```
385
+
386
+ ```yaml
387
+ # CORRECT
388
+ tables:
389
+ - id: fct_orders
390
+ name: Orders
36
391
 
37
- ### Domain Containers
38
- - Tables inside a domain are positioned **relative** to the domain's (0,0) origin.
39
- - **Domain Packing (Arithmetic Rule)**:
40
- To ensure tables fit perfectly inside a domain, calculate dimensions as follows:
41
- - **Width**: `(Cols * 320) + ((Cols - 1) * 80) + 160` (Padding). *Example: 2-col domain = 880px wide.*
42
- - **Height**: `(Rows * 240) + ((Rows - 1) * 80) + 160` (Padding). *Example: 2-row domain = 720px high.*
43
- - **Boundary Constraint**: NEVER place a table such that its right/bottom edge exceeds the domain's `width`/`height`.
392
+ layout:
393
+ fct_orders:
394
+ x: 200 # coordinates belong in layout
395
+ y: 400
396
+ ```
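This mistake can be repaired automatically. The sketch below moves stray `x`/`y`/`width`/`height` keys from tables and domains into `layout`, and any value already present in `layout` takes precedence. It does not set `parentId`. Coordinates of a table that belongs to a domain would still need to be made relative to the domain origin. `move_coordinates_to_layout` is hypothetical.

```python
# Hypothetical helper, not part of Modscape: moves misplaced coordinates into layout (in place).
COORD_KEYS = {"x", "y", "width", "height"}


def move_coordinates_to_layout(model: dict) -> dict:
    layout = model.setdefault("layout", {})
    for section in ("tables", "domains"):
        for obj in model.get(section, []):
            stray = {k: obj.pop(k) for k in list(obj) if k in COORD_KEYS}
            if stray:
                entry = layout.setdefault(obj["id"], {})
                for key, value in stray.items():
                    entry.setdefault(key, value)  # keep existing layout values
    return model


model = {"tables": [{"id": "fct_orders", "name": "Orders", "x": 200, "y": 400}]}
print(move_coordinates_to_layout(model))
```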
44
397
 
45
398
  ---
46
399
 
47
- ## 2. Table Naming Hierarchy (3-Layer)
48
- Bridge the gap between business and tech by populating all three layers:
400
+ ### Using relationships for lineage
49
401
 
50
- 1. **Conceptual Name (`name`)**: Business title (e.g., "Customers"). High-level clarity.
51
- 2. **Logical Name (`logical_name`)**: Formal modeling name (e.g., "Customer Master"). Hidden if identical to `name`.
52
- 3. **Physical Name (`physical_name`)**: Actual database table name (e.g., `dim_customers_v1`).
402
+ ```yaml
403
+ # WRONG
404
+ relationships:
405
+ - from: { table: fct_orders }
406
+ to: { table: mart_revenue }
407
+ type: lineage # ❌ 'lineage' is not a valid relationship type
408
+ ```
409
+
410
+ ```yaml
411
+ # CORRECT
412
+ tables:
413
+ - id: mart_revenue
414
+ appearance: { type: mart }
415
+ lineage:
416
+ upstream: [fct_orders] # ✅ express lineage here
417
+ ```
53
418
 
54
419
  ---
55
420
 
56
- ## 3. Modeling Strategy & Intelligence
57
- AI agents MUST analyze the nature of data to choose the correct classification and methodology.
421
+ ### Table listed in domain but missing from layout
58
422
 
59
- ### Table Classification Heuristics
60
- - **Fact (`fact`)**: Data represents **Events, Transactions, or Measurements** (e.g., "Sales", "Clicks"). Usually has numbers (measures) and foreign keys.
61
- - **Dimension (`dimension`)**: Data represents **Entities, People, or Reference Lists** (e.g., "Customers", "Products"). Contains descriptive attributes.
62
- - **Hub (`hub`)**: Data represents a **Unique Business Key** (e.g., "Customer ID"). Used in Data Vault for core entity identification.
63
- - **Satellite (`satellite`)**: Data represents **Descriptive Attributes of a Hub over time**. Always linked to a Hub.
423
+ ```yaml
424
+ # WRONG
425
+ domains:
426
+ - id: sales_ops
427
+ tables: [fct_orders, dim_customers] # dim_customers listed here...
64
428
 
65
- ### Defining the Grain (The "1-Row Rule")
66
- - Before adding columns, define the **Grain**: What does one row represent? (e.g., "One line item per invoice").
67
- - **STRICT**: NEVER mix grains in a single table. Aggregated measures and atomic transactions MUST be in separate tables.
429
+ layout:
430
+ sales_ops: { x: 0, y: 0, width: 880, height: 400 }
431
+ fct_orders: { x: 480, y: 80, parentId: sales_ops }
432
+ # ❌ dim_customers has no layout entry → will render at origin (0,0)
433
+ ```
68
434
 
69
- ### Methodology Selection
70
- - **Star Schema**: Use for most business reporting. Prioritize user-friendliness and query performance.
71
- - **Data Vault 2.0**: Use for high-integration environments with many source systems. Prioritize scalability and auditability over direct queryability.
435
+ ```yaml
436
+ # CORRECT: every table in a domain MUST have a layout entry
437
+ layout:
438
+ sales_ops: { x: 0, y: 0, width: 880, height: 400 }
439
+ dim_customers: { x: 80, y: 80, parentId: sales_ops } # ✅
440
+ fct_orders: { x: 480, y: 80, parentId: sales_ops } # ✅
441
+ ```
72
442
 
73
443
  ---
74
444
 
75
- ## 4. Logical Column Rules
76
- - **Key Flags**: Mark `isPrimaryKey`, `isForeignKey`, or `isPartitionKey`.
77
- - **Metadata**: Mark technical columns (e.g., `dw_load_date`) with `isMetadata: true`.
78
- - **Additivity**: `fully` (Summable), `semi` (Balance), `non` (Price/ID).
445
+ ### Table overflows domain boundary
446
+
447
+ ```yaml
448
+ # WRONG: domain width is 480, but the table at x:280 + width:320 = 600 > 480
449
+ layout:
450
+ small_domain: { x: 0, y: 0, width: 480, height: 400 }
451
+ fct_orders: { x: 280, y: 80, parentId: small_domain } # ❌ right edge = 600
452
+ ```
453
+
454
+ ```yaml
455
+ # CORRECT — use the formula: 1 col = width 480
456
+ layout:
457
+ small_domain: { x: 0, y: 0, width: 480, height: 400 }
458
+ fct_orders: { x: 80, y: 80, parentId: small_domain } # ✅ right edge = 400
459
+ ```
79
460
 
80
461
  ---
81
462
 
82
- ## 5. Sample Data Stories
83
- **Every table MUST include high-quality sample data.**
84
- - **Format**: 2D array. First row is Header IDs.
85
- - **Storytelling**: Provide at least 3 rows representing a real business scenario. Avoid "test1", "test2". Use realistic names, dates, and amounts.
463
+ ## 11. dbt Project Integration
464
+
465
+ If the user has a dbt project, AI agents SHOULD recommend using the built-in import commands instead of writing YAML from scratch.
466
+
467
+ ### 11-1. Commands
468
+
469
+ ```bash
470
+ # Prerequisite: generate manifest.json first
471
+ dbt parse
472
+
473
+ # Import a dbt project into Modscape YAML (one-time)
474
+ modscape dbt import [project-dir] [options]
475
+
476
+ # Sync dbt changes into existing Modscape YAML (incremental)
477
+ modscape dbt sync [project-dir] [options]
478
+ ```
479
+
480
+ **`dbt import` options:**
481
+
482
+ | Option | Description |
483
+ |--------|-------------|
484
+ | `-o, --output <dir>` | Output directory (default: `modscape-<project-name>`) |
485
+ | `--split-by folder` | One YAML file per dbt folder |
486
+ | `--split-by schema` | One YAML file per database schema |
487
+ | `--split-by tag` | One YAML file per dbt tag |
488
+
489
+ ### 11-2. What `dbt import` generates
490
+
491
+ The command reads `target/manifest.json` and produces YAML with:
492
+
493
+ | Field | Source | Notes |
494
+ |-------|--------|-------|
495
+ | `id` | `node.unique_id` | Format: `model.project.name` or `source.project.src.table` |
496
+ | `name` | `node.name` | Model / source name |
497
+ | `physical_name` | `node.alias` | Falls back to `node.name` |
498
+ | `conceptual.description` | `node.description` | From dbt docs |
499
+ | `columns[].logical.name/type/description` | `node.columns` | From dbt schema.yml |
500
+ | `lineage.upstream` | `node.depends_on.nodes` | Auto-populated |
501
+ | `appearance.type` | — | **Always `table`. Must be reclassified.** |
502
+ | `sampleData` | — | **Not generated. Must be added.** |
503
+ | `layout` | — | **Not generated. Must be added.** |
504
+ | `domains` | dbt folder structure | Auto-grouped by `fqn[1]` |
505
+
506
+ ### 11-3. What AI agents MUST do after `dbt import`
507
+
508
+ After running `modscape dbt import`, the generated YAML needs enrichment. AI agents MUST:
509
+
510
+ 1. **Reclassify `appearance.type`** — All tables default to `type: table`. Inspect the table name and columns to assign the correct type (`fact`, `dimension`, `mart`, etc.).
511
+ - Tables named `fct_*` → `fact`
512
+ - Tables named `dim_*` → `dimension`
513
+ - Tables named `mart_*` or `rpt_*` → `mart`
514
+ - Tables named `hub_*` → `hub`, `lnk_*` → `link`, `sat_*` → `satellite`
515
+
516
+ 2. **Add `layout`** — The import does not generate coordinates. Calculate domain sizes and add `layout` entries for all tables and domains using the formula in Section 6.
517
+
518
+ 3. **Add `sampleData`** — The import does not generate sample data. Add at least 3 realistic rows per table.
519
+
520
+ 4. **Do NOT re-generate `lineage.upstream`** — It is already correctly populated from `depends_on.nodes`. Keep it even on tables you reclassify as `fact` or `dimension`; `dbt sync` overwrites it anyway.
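Step 1 is a naming heuristic, so it can be scripted as a first pass before inspecting columns. In the sketch below, the name comes from `name` (from `node.name`) and falls back to the last segment of the dbt `unique_id`. Only tables still typed `table` are changed, and `lineage` is left untouched. `guess_type` and `reclassify` are hypothetical.

```python
# Hypothetical helpers, not part of Modscape: prefix-based first pass for appearance.type.
PREFIX_TO_TYPE = [
    ("fct_", "fact"),
    ("dim_", "dimension"),
    ("mart_", "mart"),
    ("rpt_", "mart"),
    ("hub_", "hub"),
    ("lnk_", "link"),
    ("sat_", "satellite"),
]


def guess_type(table: dict) -> str:
    name = table.get("name") or table["id"].rsplit(".", 1)[-1]
    for prefix, ttype in PREFIX_TO_TYPE:
        if name.startswith(prefix):
            return ttype
    return "table"  # no match: leave generic and inspect the columns manually


def reclassify(model: dict) -> None:
    for table in model.get("tables", []):
        appearance = table.setdefault("appearance", {})
        if appearance.get("type", "table") == "table":
            appearance["type"] = guess_type(table)


print(guess_type({"id": "model.my_project.fct_orders", "name": "fct_orders"}))
```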
521
+
522
+ ### 11-4. `dbt sync` — Incremental updates
523
+
524
+ Use `modscape dbt sync` when the dbt project has changed (new models, updated columns, etc.) and you want to update the existing Modscape YAML without losing manual edits.
525
+
526
+ **What `sync` overwrites:**
527
+ - `name`, `logical_name`, `physical_name`
528
+ - `conceptual.description`
529
+ - `columns` (all)
530
+ - `lineage.upstream`
531
+
532
+ **What `sync` preserves (safe to edit manually):**
533
+ - `appearance` (type, icon, color, scd)
534
+ - `sampleData`
535
+ - `layout`
536
+ - `domains`
537
+ - `annotations`
538
+ - Any fields not listed above
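The overwrite/preserve split can be illustrated on a single table. The sketch below is not Modscape's implementation. It only mirrors the two lists above. It overwrites a field only when the incoming dbt data actually contains it, which is an assumption. It also replaces only `conceptual.description`, so other `conceptual` keys such as `tags` survive. `sync_table` is hypothetical.

```python
# Hypothetical helper, not Modscape's sync code: mirrors the documented overwrite/preserve lists.
import copy


def sync_table(existing: dict, incoming: dict) -> dict:
    merged = copy.deepcopy(existing)
    for key in ("name", "logical_name", "physical_name", "columns"):
        if key in incoming:
            merged[key] = copy.deepcopy(incoming[key])
    if "description" in incoming.get("conceptual", {}):
        merged.setdefault("conceptual", {})["description"] = incoming["conceptual"]["description"]
    if "upstream" in incoming.get("lineage", {}):
        merged.setdefault("lineage", {})["upstream"] = list(incoming["lineage"]["upstream"])
    return merged  # appearance, sampleData and any other fields come from `existing`


existing = {"id": "model.p.fct_orders", "name": "fct_orders",
            "appearance": {"type": "fact", "color": "#e0f2fe"},
            "conceptual": {"description": "old", "tags": ["WHAT"]},
            "sampleData": [[1001, 150.0]]}
incoming = {"id": "model.p.fct_orders", "name": "fct_orders",
            "conceptual": {"description": "One row per order line item."},
            "columns": [{"id": "order_id"}, {"id": "amount"}],
            "lineage": {"upstream": ["model.p.stg_orders"]}}
print(sync_table(existing, incoming)["conceptual"])
```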
539
+
540
+ > **Workflow**: `dbt import` once → enrich with AI → `dbt sync` when dbt changes → re-enrich as needed.
541
+
542
+ ### 11-5. Table ID format in dbt-imported models
543
+
544
+ In dbt-imported YAML, table IDs are dbt `unique_id` strings, not short names:
545
+
546
+ ```yaml
547
+ # dbt-imported table ID examples
548
+ id: "model.my_project.fct_orders"
549
+ id: "source.my_project.raw.orders"
550
+ id: "seed.my_project.product_categories"
551
+
552
+ # lineage.upstream also uses unique_id format
553
+ lineage:
554
+ upstream:
555
+ - "model.my_project.stg_orders"
556
+ - "source.my_project.raw.customers"
557
+ ```
558
+
559
+ **MUST NOT** shorten these IDs. They are the join keys between `tables`, `domains.tables`, `lineage.upstream`, and `layout`.
86
560
 
87
561
  ---
88
562
 
89
- ## 6. Prohibitions & Anti-Patterns
90
- - **NO NESTED LAYOUT**: Never put `x` or `y` inside `tables[...]` or `domains[...]`.
91
- - **NO FLOATS**: Use only integers for coordinates.
92
- - **NO FRAGMENTED LINEAGE**: Always define `lineage.upstream` for derived tables.
563
+ ## 12. Merging YAML Files
564
+
565
+ When a user asks to **combine, merge, or consolidate** multiple YAML model files, use the built-in `merge` command instead of editing YAML manually.
566
+
567
+ ```bash
568
+ # Merge specific files
569
+ modscape merge sales.yaml marketing.yaml -o combined.yaml
570
+
571
+ # Merge all YAML files in a directory
572
+ modscape merge ./models -o combined.yaml
573
+
574
+ # Merge multiple directories
575
+ modscape merge ./sales ./marketing -o combined.yaml
576
+ ```
577
+
578
+ **Merge behavior:**
579
+
580
+ | Section | Behavior |
581
+ |---------|----------|
582
+ | `tables` | Deduplicated by `id`. First occurrence wins on conflict. |
583
+ | `relationships` | All entries included (no deduplication). |
584
+ | `domains` | Deduplicated by `id`. First occurrence wins on conflict. |
585
+ | `layout` | **Not included in output.** Must be added after merging. |
586
+ | `annotations` | **Not included in output.** Must be added after merging. |
587
+
588
+ **What AI agents MUST do after merge:**
589
+
590
+ 1. **Add `layout`** — Run `modscape dev <output>` and use auto-layout, or calculate coordinates manually using the formula in Section 6.
591
+ 2. **Check for relationship duplication** — If the same relationship exists in multiple source files, it appears once per source file in the output. Deduplicate manually if needed.
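The merge table and the follow-up deduplication can be illustrated on parsed model dicts. The sketch below reproduces the documented behaviour and is not the `modscape merge` source. Tables and domains keep the first occurrence of each `id`. Relationships are concatenated, and layout and annotations are dropped. It treats relationships as duplicates only when they are identical after key sorting, which is an assumption. `merge_models` and `dedupe_relationships` are hypothetical.

```python
# Hypothetical helpers, not Modscape's merge code: documented merge behaviour plus dedup.
import json


def merge_models(models: list[dict]) -> dict:
    tables, domains, relationships = {}, {}, []
    for model in models:
        for t in model.get("tables", []):
            tables.setdefault(t["id"], t)  # first occurrence wins
        for d in model.get("domains", []):
            domains.setdefault(d["id"], d)  # first occurrence wins
        relationships.extend(model.get("relationships", []))  # no deduplication
    # layout and annotations are intentionally not carried over
    return {"domains": list(domains.values()), "tables": list(tables.values()),
            "relationships": relationships}


def dedupe_relationships(relationships: list[dict]) -> list[dict]:
    seen, unique = set(), []
    for rel in relationships:
        key = json.dumps(rel, sort_keys=True)  # identical entries produce identical keys
        if key not in seen:
            seen.add(key)
            unique.append(rel)
    return unique


rel = {"from": {"table": "dim_customers"}, "to": {"table": "fct_orders"}, "type": "one-to-many"}
a = {"tables": [{"id": "fct_orders", "name": "Orders"}], "relationships": [rel],
     "layout": {"fct_orders": {"x": 0, "y": 0}}}
b = {"tables": [{"id": "fct_orders", "name": "Orders v2"}, {"id": "dim_customers", "name": "Customers"}],
     "relationships": [dict(rel)]}
merged = merge_models([a, b])
print(len(merged["relationships"]), len(dedupe_relationships(merged["relationships"])))
```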
93
592
 
94
593
  ---
95
594
 
96
- ## 7. Golden Schema Example
595
+ ## 13. Complete Example
596
+
97
597
  ```yaml
98
598
  domains:
99
599
  - id: sales_domain
100
600
  name: "Sales Operations"
101
- tables: [fct_orders]
601
+ description: "Core transactional data."
602
+ color: "rgba(239, 68, 68, 0.1)"
603
+ tables: [dim_customers, fct_orders]
604
+
605
+ - id: analytics_domain
606
+ name: "Analytics & Insights"
607
+ color: "rgba(245, 158, 11, 0.1)"
608
+ tables: [mart_monthly_revenue]
102
609
 
103
610
  tables:
611
+ - id: dim_customers
612
+ name: "Customers"
613
+ logical_name: "Customer Master"
614
+ physical_name: "dim_customers_v2"
615
+ appearance:
616
+ type: dimension
617
+ scd: type2
618
+ icon: "👤"
619
+ conceptual:
620
+ description: "One row per unique customer version (SCD Type 2)."
621
+ tags: [WHO]
622
+ columns:
623
+ - id: customer_key
624
+ logical: { name: "Customer Key", type: Int, isPrimaryKey: true }
625
+ - id: customer_name
626
+ logical: { name: "Name", type: String }
627
+ - id: dw_valid_from
628
+ logical: { name: "Valid From", type: Timestamp, isMetadata: true }
629
+ sampleData:
630
+ - [1, "Acme Corp", "2024-01-01T00:00:00Z"]
631
+ - [2, "Beta Ltd", "2024-03-15T00:00:00Z"]
632
+ - [3, "Gamma Inc", "2024-06-01T00:00:00Z"]
633
+
104
634
  - id: fct_orders
105
635
  name: "Orders"
106
636
  logical_name: "Order Transactions"
107
637
  physical_name: "fct_sales_orders"
108
638
  appearance: { type: fact, sub_type: transaction, icon: "🛒" }
639
+ conceptual:
640
+ description: "One row per order line item."
641
+ tags: [WHAT, HOW_MUCH]
642
+ implementation:
643
+ materialization: incremental
644
+ incremental_strategy: merge
645
+ unique_key: order_id
646
+ partition_by: { field: order_date, granularity: day }
647
+ cluster_by: [customer_key]
109
648
  columns:
110
649
  - id: order_id
111
- logical: { name: "ID", type: Int, isPrimaryKey: true }
650
+ logical: { name: "Order ID", type: Int, isPrimaryKey: true }
651
+ physical: { name: "order_id", type: "BIGINT", constraints: [NOT NULL] }
652
+ - id: customer_key
653
+ logical: { name: "Customer Key", type: Int, isForeignKey: true }
112
654
  - id: amount
113
655
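  logical: { name: "Amount", type: Decimal, additivity: fully }
+ - id: order_date
+ logical: { name: "Order Date", type: Date, isPartitionKey: true }
114
656
  sampleData:
115
- - [order_id, amount]
116
- - [1001, 50.0]
117
- - [1002, 120.5]
657
+ - [1001, 1, 150.00, "2024-01-15"]
658
+ - [1002, 2, 89.50, "2024-01-16"]
659
+ - [1003, 1, 210.00, "2024-02-03"]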
660
+
661
+ - id: mart_monthly_revenue
662
+ name: "Monthly Revenue"
663
+ logical_name: "Executive Revenue Summary"
664
+ physical_name: "mart_finance_monthly_revenue_agg"
665
+ appearance: { type: mart, icon: "📈" }
666
+ lineage: # mart → use lineage, not relationships
667
+ upstream:
668
+ - fct_orders
669
+ - dim_customers
670
+ implementation:
671
+ materialization: table
672
+ grain: [month_key]
673
+ measures:
674
+ - column: total_revenue
675
+ agg: sum
676
+ source_column: fct_orders.amount
677
+ columns:
678
+ - id: month_key
679
+ logical: { name: "Month", type: String, isPrimaryKey: true }
680
+ - id: total_revenue
681
+ logical: { name: "Revenue", type: Decimal, additivity: fully }
682
+ sampleData:
683
+ - ["2024-01", 12450.50]
684
+ - ["2024-02", 15200.00]
685
+ - ["2024-03", 18900.75]
686
+
687
+ relationships: # ER only — not for lineage
688
+ - from: { table: dim_customers, column: customer_key }
689
+ to: { table: fct_orders, column: customer_key }
690
+ type: one-to-many
691
+
692
+ annotations:
693
+ - id: note_001
694
+ type: sticky
695
+ text: "Grain: one row per order line item."
696
+ targetId: fct_orders
697
+ targetType: table
698
+ offset: { x: 100, y: -80 }
118
699
 
119
700
  layout:
120
- sales_domain: { x: 0, y: 0, width: 480, height: 400 }
121
- fct_orders: { x: 80, y: 80 } # Relative to domain
701
+ # Domain width/height calculated with the formula in Section 6-2
702
+ # sales_domain: 2 tables side by side, 2-col × 1-row → w:880, h:400
703
+ sales_domain:
704
+ x: 0
705
+ y: 0
706
+ width: 880
707
+ height: 400
708
+
709
+ # Tables inside sales_domain — coordinates relative to domain origin
710
+ dim_customers:
711
+ x: 80
712
+ y: 80
713
+ parentId: sales_domain
714
+
715
+ fct_orders:
716
+ x: 480
717
+ y: 80
718
+ parentId: sales_domain
719
+
720
+ # analytics_domain: 1 table → 1-col × 1-row → w:480, h:400
721
+ analytics_domain:
722
+ x: 1000
723
+ y: 0
724
+ width: 480
725
+ height: 400
726
+
727
+ mart_monthly_revenue:
728
+ x: 80
729
+ y: 80
730
+ parentId: analytics_domain
122
731
  ```