modscape 1.2.0 → 2.0.0

@@ -1,122 +1,730 @@
1
- # Modscape Data Modeling Rules (AI-Optimized)
1
+ # Modscape Modeling Rules for AI Agents
2
2
 
3
- ## 0. Foundational Principle: Instruction Fidelity
4
- AI agents MUST prioritize the user's specific instructions above all else.
5
- - **Accuracy**: Precisely implement every table, column, and relationship requested.
6
- - **Completeness**: Do not omit requested details for the sake of brevity.
7
- - **Expert Guidance**: If a user's instruction contradicts modeling best practices (e.g., mixing grains), **warn the user and suggest an alternative**, but do not ignore the original intent.
3
+ > **Purpose**: This file teaches AI agents how to write valid `model.yaml` files for Modscape.
4
+ > Read this file completely before generating or editing any YAML.
8
5
 
9
- ## CRITICAL: YAML Architecture
10
- AI agents MUST follow this root-level structure. Schema violations will cause parsing errors.
6
+ ---
7
+
8
+ ## QUICK REFERENCE (read this first)
9
+
10
+ ```
11
+ ROOT KEYS    domains | tables | relationships | lineage | annotations | layout
12
+ COORDINATES  ONLY in `layout`. NEVER inside tables or domains.
13
+ LINEAGE      Use top-level `lineage` section (not relationships, not table.lineage.upstream).
14
+ parentId     Declare a table's domain membership inside layout, not inside domains.
15
+ IDs          Every object (table, domain, annotation) needs a unique `id`.
16
+ sampleData   Data rows only, no header row. At least 3 realistic rows, values in `columns` order.
17
+ Grid         All x/y values must be multiples of 40.
18
+ ```
19
+
20
+ ---
21
+
22
+ ## 1. Root Structure
23
+
24
+ A valid `model.yaml` uses only the following top-level keys. `tables` is always required; `layout` is required as soon as any object exists.
25
+
26
+ ```yaml
27
+ domains: # (array) visual containers — OPTIONAL but recommended
28
+ tables: # (array) entity definitions — REQUIRED
29
+ relationships: # (array) ER cardinality edges — OPTIONAL
30
+ lineage: # (array) data lineage edges — OPTIONAL
31
+ annotations: # (array) sticky notes / callouts — OPTIONAL
32
+ layout: # (object) ALL coordinates — REQUIRED if any objects exist
33
+ ```
34
+
35
+ **MUST NOT** add any other top-level keys. They will be ignored or cause errors.
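
As a rough illustration of the rule above, a pre-flight check can compare a parsed model's top-level keys against the six documented ones. This is a minimal sketch, not part of Modscape: `ALLOWED_ROOT_KEYS` and `unknown_root_keys` are hypothetical names, and the model is assumed to be already parsed into a Python dict.

```python
# Hypothetical helper: the key set is copied from the Root Structure section.
ALLOWED_ROOT_KEYS = {
    "domains", "tables", "relationships", "lineage", "annotations", "layout",
}


def unknown_root_keys(model: dict) -> list[str]:
    """Return top-level keys that are not part of the documented root structure."""
    return sorted(key for key in model if key not in ALLOWED_ROOT_KEYS)


if __name__ == "__main__":
    model = {"tables": [], "layout": {}, "metadata": {}}
    print(unknown_root_keys(model))
```

An empty result means the root structure is acceptable; anything else should be removed or moved under one of the documented keys.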
36
+
37
+ ---
38
+
39
+ ## 2. Tables
40
+
41
+ ### 2-1. Required and Optional Fields
42
+
43
+ | Field | Required | Description |
44
+ |-------|----------|-------------|
45
+ | `id` | **REQUIRED** | Unique identifier used as a key in `layout`, `domains.tables`, `lineage.upstream`, etc. Use snake_case. |
46
+ | `name` | **REQUIRED** | Conceptual (business) name shown large on the canvas. |
47
+ | `logical_name` | optional | Formal business name shown medium. Omit if same as `name`. |
48
+ | `physical_name` | optional | Actual database table name shown small. |
49
+ | `appearance` | optional | Visual type, icon, color. |
50
+ | `conceptual` | optional | AI-friendly business context metadata. |
51
+ | `lineage` | optional | Upstream table IDs. Only for `mart` / aggregated tables. |
52
+ | `columns` | optional | Column definitions. |
53
+ | `sampleData` | optional | 2D array of sample rows. Strongly recommended. |
54
+
55
+ ### 2-2. `appearance` Fields
56
+
57
+ ```yaml
58
+ appearance:
59
+   type: fact            # REQUIRED if used. See table below.
60
+   sub_type: transaction # optional free text (transaction | periodic | accumulating | ...)
61
+   scd: type2            # optional. dimension tables only. type0|type1|type2|type3|type4|type6
62
+   icon: "💰"            # optional. any single emoji.
63
+   color: "#e0f2fe"      # optional. hex or CSS color for the header.
64
+ ```
65
+
66
+ **`appearance.type` values:**
67
+
68
+ | type | Use when... |
69
+ |------|-------------|
70
+ | `fact` | Events, transactions, measurements. Has measures (numbers) and FK columns. |
71
+ | `dimension` | Entities, master data, reference lists. Descriptive attributes. |
72
+ | `mart` | Aggregated or consumer-facing output. **Always add `lineage.upstream`.** |
73
+ | `hub` | Data Vault: stores a single unique business key. |
74
+ | `link` | Data Vault: joins two or more hubs (transaction or relationship). |
75
+ | `satellite` | Data Vault: descriptive attributes of a hub, tracked over time. |
76
+ | `table` | Generic. Use when none of the above apply. |
77
+
78
+ **MUST NOT** use `scd` on `fact`, `mart`, `hub`, `link`, or `satellite` tables.
79
+
80
+ ### 2-3. `conceptual` Fields (AI-readable business context)
81
+
82
+ ```yaml
83
+ conceptual:
84
+   description: "One row per order line item."
85
+   tags: [WHAT, HOW_MUCH] # BEAM* tags: WHO | WHAT | WHEN | WHERE | HOW | COUNT | HOW_MUCH
86
+   businessDefinitions:
87
+     revenue: "Net revenue after discounts and returns."
88
+ ```
89
+
90
+ ### 2-4. `columns` Fields
91
+
92
+ Each column has an `id` plus optional `logical` and `physical` blocks.
93
+
94
+ ```yaml
95
+ columns:
96
+   - id: order_id # REQUIRED. Unique within the table. Column order defines the order of sampleData values.
97
+     logical:
98
+       name: "Order ID"              # Display name
99
+       type: Int                     # Int | String | Decimal | Date | Timestamp | Boolean | ...
100
+       description: "Surrogate key." # optional
101
+       isPrimaryKey: true            # optional. default false.
102
+       isForeignKey: false           # optional. default false.
103
+       isPartitionKey: false         # optional. default false.
104
+       isMetadata: false             # optional. true for audit cols: load_date, record_source, hash_diff
105
+       additivity: fully             # optional. fully=summable | semi=balance/stock | non=price/rate/ID
106
+     physical:                       # optional. override when warehouse names/types differ.
107
+       name: order_id_pk
108
+       type: "BIGINT"
109
+       constraints: [NOT NULL, UNIQUE]
110
+ ```
111
+
112
+ ---
113
+
114
+ ## 3. Relationships (ER Cardinality)
115
+
116
+ Use `relationships` **only** for structural ER connections between tables.
117
+
118
+ ```yaml
119
+ relationships:
120
+   - from:
121
+       table: dim_customers # table id
122
+       column: customer_key # column id (optional but recommended)
123
+     to:
124
+       table: fct_orders
125
+       column: customer_key
126
+     type: one-to-many
127
+ ```
128
+
129
+ **`type` values:**
130
+
131
+ | type | Typical usage |
132
+ |------|--------------|
133
+ | `one-to-one` | Lookup table / vertical split |
134
+ | `one-to-many` | Dimension → Fact *(most common)* |
135
+ | `many-to-one` | Fact → Dimension *(inverse notation of above)* |
136
+ | `many-to-many` | Via a bridge / link table |
137
+
138
+ **MUST NOT** use `relationships` to express data lineage (use the top-level `lineage` section instead).
139
+
140
+ ---
141
+
142
+ ## 4. Data Lineage
143
+
144
+ Top-level `lineage` section declares data flow between tables (which source tables feed which derived tables).
145
+ This is rendered as dashed arrows in **Lineage Mode**. It is separate from ER relationships.
146
+
147
+ ```yaml
148
+ lineage:
149
+   - from: fct_orders # source table id
150
+     to: mart_revenue # derived table id
151
+   - from: dim_dates
152
+     to: mart_revenue
153
+ ```
154
+
155
+ ### When to use lineage vs relationships
156
+
157
+ | Situation | Use |
158
+ |-----------|-----|
159
+ | `dim_customers` → `fct_orders` (FK join) | `relationships` |
160
+ | `fct_orders` + `dim_dates` → `mart_revenue` (aggregation) | `lineage` |
161
+
162
+ **MUST** define `lineage` entries for every `mart` or aggregated table.
163
+ **MUST NOT** define `lineage` entries whose `to` (derived) side is a raw table (`fact`, `dimension`, `hub`, `link`, `satellite`). Raw tables appear in `lineage` only as `from` sources.
164
+ **MUST NOT** add a `relationships` entry for a connection already expressed in `lineage`.
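
The last rule can be checked mechanically. The sketch below is illustrative only and assumes the model is already parsed into a dict shaped like the YAML in this section; `overlapping_edges` is a hypothetical name, and treating a reversed pair as "the same connection" is an assumption rather than documented behavior.

```python
def overlapping_edges(model: dict) -> list[tuple[str, str]]:
    """Return (from, to) table pairs that appear in both relationships and lineage."""
    lineage_pairs = {(edge["from"], edge["to"]) for edge in model.get("lineage", [])}
    overlaps = []
    for rel in model.get("relationships", []):
        pair = (rel["from"]["table"], rel["to"]["table"])
        # Assumption: direction does not matter when deciding whether two
        # entries describe the same connection.
        if pair in lineage_pairs or pair[::-1] in lineage_pairs:
            overlaps.append(pair)
    return overlaps


if __name__ == "__main__":
    model = {
        "lineage": [{"from": "fct_orders", "to": "mart_revenue"}],
        "relationships": [
            {"from": {"table": "dim_customers"}, "to": {"table": "fct_orders"}},
            {"from": {"table": "fct_orders"}, "to": {"table": "mart_revenue"}},
        ],
    }
    print(overlapping_edges(model))
```

Any pair returned here should be removed from `relationships` and kept in `lineage`.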
165
+
166
+ #### Example: correct separation
167
+
168
+ ```yaml
169
+ # CORRECT
170
+ lineage:
171
+   - from: fct_orders
172
+     to: mart_revenue
173
+   - from: dim_dates
174
+     to: mart_revenue
175
+
176
+ relationships:
177
+   - from: { table: dim_customers, column: customer_key }
178
+     to: { table: fct_orders, column: customer_key }
179
+     type: one-to-many # ER only
180
+
181
+ # WRONG: do not add a relationships entry for the same connection as lineage
182
+ relationships:
183
+   - from: { table: fct_orders }
184
+     to: { table: mart_revenue }
185
+     type: lineage # ❌ never do this
186
+ ```
187
+
188
+ ---
189
+
190
+ ## 5. Domains
191
+
192
+ ```yaml
193
+ domains:
194
+   - id: sales_ops                    # REQUIRED. Used as key in layout.
195
+     name: "Sales Operations"         # REQUIRED. Display name.
196
+     description: "..."               # optional
197
+     color: "rgba(59, 130, 246, 0.1)" # optional. rgba recommended.
198
+     tables:                          # REQUIRED. List of table IDs inside this domain.
199
+       - fct_orders
200
+       - dim_customers
201
+     isLocked: false                  # optional. true = prevent drag on canvas.
202
+ ```
203
+
204
+ **MUST** list only table IDs that actually exist in `tables`.
205
+ **MUST** add a layout entry for the domain with `width` and `height`.
206
+
207
+ ---
208
+
209
+ ## 6. Layout
210
+
211
+ **All coordinates live here.** Never put `x`, `y`, `width`, or `height` inside `tables` or `domains`.
212
+
213
+ ### 6-1. Field Reference
214
+
215
+ | Field | Required for | Description |
216
+ |-------|-------------|-------------|
217
+ | `x` | all entries | Canvas x coordinate (integer, multiple of 40) |
218
+ | `y` | all entries | Canvas y coordinate (integer, multiple of 40) |
219
+ | `width` | domains | Total pixel width of the domain container |
220
+ | `height` | domains | Total pixel height of the domain container |
221
+ | `parentId` | tables inside a domain | ID of the containing domain. Makes coordinates relative to domain origin. |
222
+ | `isLocked` | domains or tables | Prevents drag when true |
223
+
224
+ ### 6-2. Domain Size Formula
225
+
226
+ Calculate domain dimensions so tables fit without overflow:
227
+
228
+ ```
229
+ width = (numCols * 320) + ((numCols - 1) * 80) + 160
230
+ height = (numRows * 240) + ((numRows - 1) * 80) + 160
231
+ ```
232
+
233
+ Examples:
234
+ - 1 col × 1 row → width: 480, height: 400
235
+ - 2 col × 1 row → width: 880, height: 400
236
+ - 2 col × 2 row → width: 880, height: 720
237
+ - 3 col × 2 row → width: 1280, height: 720
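
The formula and the four examples above can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic; the function and constant names are hypothetical, while the numbers (320, 240, 80, 160) come directly from the formula.

```python
TABLE_WIDTH = 320   # standard table width used by the formula
TABLE_HEIGHT = 240  # standard table height used by the formula
GAP = 80            # gap between neighbouring tables inside a domain
PADDING = 160       # total padding (80 on each side)


def domain_size(num_cols: int, num_rows: int) -> tuple[int, int]:
    """Return (width, height) of a domain holding num_cols x num_rows tables."""
    width = num_cols * TABLE_WIDTH + (num_cols - 1) * GAP + PADDING
    height = num_rows * TABLE_HEIGHT + (num_rows - 1) * GAP + PADDING
    return width, height


if __name__ == "__main__":
    for cols, rows in [(1, 1), (2, 1), (2, 2), (3, 2)]:
        print(cols, rows, domain_size(cols, rows))
```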
238
+
239
+ ### 6-3. Table Positioning Inside a Domain
240
+
241
+ When `parentId` is set, `x`/`y` are **relative to the domain's top-left corner (0, 0)**.
242
+
243
+ ```yaml
244
+ layout:
245
+   sales_ops:
246
+     x: 0 # absolute canvas position
247
+     y: 0
248
+     width: 880
249
+     height: 400
250
+   dim_customers:
251
+     x: 80 # 80px from domain's left edge
252
+     y: 80 # 80px from domain's top edge
253
+     parentId: sales_ops
254
+   fct_orders:
255
+     x: 480 # 480px from domain's left edge
256
+     y: 80
257
+     parentId: sales_ops
258
+ ```
259
+
260
+ **MUST NOT** let any table's right edge (`x + 320`) or bottom edge (`y + 240`) exceed the domain's `width` or `height`.
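
A boundary check for this rule might look like the following sketch. It assumes the standard 320 × 240 table size used throughout this section and coordinates relative to the domain origin; `fits_in_domain` is a hypothetical helper name.

```python
TABLE_WIDTH = 320
TABLE_HEIGHT = 240


def fits_in_domain(x: int, y: int, domain_width: int, domain_height: int) -> bool:
    """True if a table at relative (x, y) stays inside the domain's width and height."""
    right_edge = x + TABLE_WIDTH
    bottom_edge = y + TABLE_HEIGHT
    return right_edge <= domain_width and bottom_edge <= domain_height


if __name__ == "__main__":
    # Values taken from the 2-column layout shown above (domain 880 x 400).
    print(fits_in_domain(80, 80, 880, 400))
    print(fits_in_domain(480, 80, 880, 400))
    # A table at x=280 in a 480-wide domain has its right edge at 600.
    print(fits_in_domain(280, 80, 480, 400))
```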
261
+
262
+ ### 6-4. Layout Flow Conventions
263
+
264
+ - **ER diagrams**: Dimension/Hub tables TOP, Fact/Link tables BOTTOM
265
+ - **Lineage diagrams**: Upstream (source) LEFT, Downstream (mart) RIGHT
266
+ - **Grid**: All `x` and `y` values must be multiples of 40
267
+ - **Spacing**: Minimum gap of 120px between nodes
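
For the grid rule, a small helper can validate or repair coordinates before they are written to `layout`. This is a sketch under the assumption that off-grid values should be rounded to the nearest multiple of 40; the rounding policy and the function names are not specified by the rules and are illustrative only.

```python
GRID = 40  # grid step from the conventions above


def is_on_grid(value: int) -> bool:
    """True if the coordinate is a multiple of the grid step."""
    return value % GRID == 0


def snap_to_grid(value: int) -> int:
    """Round to the nearest multiple of GRID (exact ties round up)."""
    return (value + GRID // 2) // GRID * GRID


if __name__ == "__main__":
    for raw in (0, 95, 100, 280, 290):
        print(raw, is_on_grid(raw), snap_to_grid(raw))
```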
268
+
269
+ ### 6-5. Layout Template
270
+
271
+ ```yaml
272
+ layout:
273
+   # --- Domain ---
274
+   <domain_id>:
275
+     x: <canvas_x> # absolute
276
+     y: <canvas_y>
277
+     width: <W> # use formula above
278
+     height: <H>
279
+
280
+   # --- Table inside domain ---
281
+   <table_id>:
282
+     x: <relative_x> # relative to domain origin
283
+     y: <relative_y>
284
+     parentId: <domain_id>
285
+
286
+   # --- Standalone table ---
287
+   <table_id>:
288
+     x: <canvas_x> # absolute
289
+     y: <canvas_y>
290
+ ```
291
+
292
+ ---
293
+
294
+ ## 7. Annotations
295
+
296
+ ```yaml
297
+ annotations:
298
+   - id: note_001         # REQUIRED. Unique ID.
299
+     type: sticky         # REQUIRED. sticky | callout
300
+     text: "..."          # REQUIRED. Note content.
301
+     color: "#fef9c3"     # optional. background color.
302
+     targetId: fct_orders # optional. ID of the object to attach to.
303
+     targetType: table    # required if targetId is set. table | domain | relationship | column
304
+     offset:
305
+       x: 100 # offset from target's top-left. if no targetId, this is absolute canvas position.
306
+       y: -80 # negative y = above the target.
307
+ ```
11
308
 
12
- 1. **`domains`**: (Array) Visual groupings.
13
- 2. **`tables`**: (Array) Entity definitions. **NEVER put `x` or `y` coordinates here.**
14
- 3. **`relationships`**: (Array) ER connections.
15
- 4. **`annotations`**: (Array) Sticky notes and callouts.
16
- 5. **`layout`**: (Dictionary) **MANDATORY**. All coordinates MUST live here, keyed by object ID.
309
+ ---
310
+
311
+ ## 8. Sample Data
312
+
313
+ Every table SHOULD include `sampleData`.
314
+
315
+ ```yaml
316
+ sampleData:
317
+   - [1001, 1, 150.00, "COMPLETED"] # each row = one data record
318
+   - [1002, 2, 89.50, "PENDING"]
319
+   - [1003, 1, 210.00, "COMPLETED"]
320
+ ```
321
+
322
+ **Rules:**
323
+ - Each row is a plain data record. No header row.
324
+ - The order of values MUST match the order of `columns` defined in the table.
325
+ - Use realistic values. Do NOT use "test1", "foo", "xxx".
326
+ - Numeric measures should be plausible business amounts.
327
+ - Dates should be in ISO 8601 format: `"2024-01-15"` or `"2024-01-15T00:00:00Z"`.
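
The structural parts of these rules (row count and column alignment) lend themselves to an automated check. The sketch below assumes a table already parsed into a dict; `sample_data_problems` is a hypothetical helper, and the default minimum of 3 rows follows the "at least 3 realistic rows" guidance elsewhere in this file. It does not judge whether values are realistic.

```python
def sample_data_problems(table: dict, min_rows: int = 3) -> list[str]:
    """Return human-readable problems found in a table's sampleData."""
    columns = table.get("columns", [])
    rows = table.get("sampleData", [])
    problems = []
    if len(rows) < min_rows:
        problems.append(f"expected at least {min_rows} rows, found {len(rows)}")
    for index, row in enumerate(rows):
        # Each row must supply exactly one value per declared column.
        if len(row) != len(columns):
            problems.append(
                f"row {index} has {len(row)} values for {len(columns)} columns"
            )
    return problems


if __name__ == "__main__":
    table = {
        "columns": [{"id": "order_id"}, {"id": "customer_key"}, {"id": "amount"}],
        "sampleData": [[1001, 1, 150.00], [1002, 2, 89.50], [1003, 1]],
    }
    print(sample_data_problems(table))
```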
17
328
 
18
329
  ---
19
330
 
20
- ## 1. Beautiful Layout Heuristics
21
- To ensure a professional and clean diagram, AI agents MUST use the following numeric standards:
331
+ ## 9. Implementation Hints
332
+
333
+ `implementation` is an **optional** block inside each table. AI agents read it to generate dbt / Spark / SQLMesh code. Omitting it is fine — the visualizer works without it.
334
+
335
+ ```yaml
336
+ tables:
337
+   - id: fct_orders
338
+     appearance: { type: fact }
339
+     implementation:
340
+       materialization: incremental # table | view | incremental | ephemeral
341
+       incremental_strategy: merge  # merge | append | delete+insert
342
+       unique_key: order_id         # column id used for upsert
343
+       partition_by:
344
+         field: event_date
345
+         granularity: day           # day | month | year | hour
346
+       cluster_by: [customer_id, region_id]
347
+       grain: [month_key, region_id] # GROUP BY columns (mart only)
348
+       measures:
349
+         - column: total_revenue   # output column id in this table
350
+           agg: sum                # sum | count | count_distinct | avg | min | max
351
+           source_column: amount   # upstream column id (use <table_id>.<col_id> to disambiguate)
352
+ ```
22
353
 
23
- ### Standard Metrics
24
- - **Grid Snapping**: All `x` and `y` values MUST be multiples of **40** (e.g., 0, 40, 80, 120).
25
- - **Standard Table Width**: `320`
26
- - **Standard Table Height**: `240` (base)
27
- - **Node Spacing (Gap)**: Minimum `120` between nodes.
354
+ ### AI Inference Defaults (when `implementation` is absent)
28
355
 
29
- ### Directional Flow
30
- - **Data Lineage (Horizontal)**:
31
- - Upstream (Source) tables on the **LEFT**.
32
- - Downstream (Target) tables on the **RIGHT**.
33
- - **ER Relationships (Vertical)**:
34
- - Master/Dimension/Hub tables on the **TOP**.
35
- - Fact/Transaction/Link tables on the **BOTTOM**.
356
+ | `appearance.type` | `appearance.scd` | Inferred `materialization` |
357
+ |------------------|-----------------|--------------------------|
358
+ | `fact` | | `incremental` |
359
+ | `dimension` | `type2` | `table` (snapshot pattern) |
360
+ | `dimension` | other | `table` |
361
+ | `mart` | | `table` |
362
+ | `hub` / `link` / `satellite` | | `incremental` |
363
+ | `table` | — | `view` |
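
The defaults table above can be read as a simple lookup. The following sketch encodes it for a table parsed into a dict; the names are hypothetical, and falling back to the generic `table` type when `appearance` is missing is an assumption, not documented behavior. Note that `scd` does not change the inferred value: every dimension maps to `table`.

```python
# Mapping copied from the inference defaults table.
INFERRED_MATERIALIZATION = {
    "fact": "incremental",
    "dimension": "table",
    "mart": "table",
    "hub": "incremental",
    "link": "incremental",
    "satellite": "incremental",
    "table": "view",
}


def infer_materialization(table: dict) -> str:
    """Use an explicit implementation.materialization if present, else the default."""
    implementation = table.get("implementation") or {}
    if "materialization" in implementation:
        return implementation["materialization"]
    # Assumption: a table without appearance.type is treated as the generic type.
    table_type = (table.get("appearance") or {}).get("type", "table")
    return INFERRED_MATERIALIZATION[table_type]


if __name__ == "__main__":
    print(infer_materialization({"appearance": {"type": "fact"}}))
    print(infer_materialization({"appearance": {"type": "dimension", "scd": "type2"}}))
```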
36
364
 
37
- ### Domain Containers
38
- - Tables inside a domain are positioned **relative** to the domain's (0,0) origin.
39
- - **Domain Packing (Arithmetic Rule)**:
40
- To ensure tables fit perfectly inside a domain, calculate dimensions as follows:
41
- - **Width**: `(Cols * 320) + ((Cols - 1) * 80) + 160` (Padding). *Example: 2-col domain = 880px wide.*
42
- - **Height**: `(Rows * 240) + ((Rows - 1) * 80) + 160` (Padding). *Example: 2-row domain = 720px high.*
43
- - **Boundary Constraint**: NEVER place a table such that its right/bottom edge exceeds the domain's `width`/`height`.
365
+ **Rules:**
366
+ - `measures` and `grain` are for `mart` tables only.
367
+ - `incremental_strategy` and `unique_key` are only relevant when `materialization: incremental`.
368
+ - When `source_column` is ambiguous across multiple upstream tables, qualify it as `<table_id>.<column_id>` (e.g., `fct_orders.amount`).
369
+ - **MUST NOT** define `implementation` inside `domains`, `relationships`, or `annotations`.
44
370
 
45
371
  ---
46
372
 
47
- ## 2. Table Naming Hierarchy (3-Layer)
48
- Bridge the gap between business and tech by populating all three layers:
373
+ ## 10. Common Mistakes (Before → After)
49
374
 
50
- 1. **Conceptual Name (`name`)**: Business title (e.g., "Customers"). High-level clarity.
51
- 2. **Logical Name (`logical_name`)**: Formal modeling name (e.g., "Customer Master"). Hidden if identical to `name`.
52
- 3. **Physical Name (`physical_name`)**: Actual database table name (e.g., `dim_customers_v1`).
375
+ ### Coordinates inside a table definition
376
+
377
+ ```yaml
378
+ # WRONG
379
+ tables:
380
+   - id: fct_orders
381
+     x: 200 # ❌ coordinates do not belong here
382
+     y: 400
383
+ ```
384
+
385
+ ```yaml
386
+ # CORRECT
387
+ tables:
388
+   - id: fct_orders
389
+     name: Orders
390
+
391
+ layout:
392
+   fct_orders:
393
+     x: 200 # ✅ coordinates belong in layout
394
+     y: 400
395
+ ```
53
396
 
54
397
  ---
55
398
 
56
- ## 3. Modeling Strategy & Intelligence
57
- AI agents MUST analyze the nature of data to choose the correct classification and methodology.
399
+ ### Using relationships for lineage
58
400
 
59
- ### Table Classification Heuristics
60
- - **Fact (`fact`)**: Data represents **Events, Transactions, or Measurements** (e.g., "Sales", "Clicks"). Usually has numbers (measures) and foreign keys.
61
- - **Dimension (`dimension`)**: Data represents **Entities, People, or Reference Lists** (e.g., "Customers", "Products"). Contains descriptive attributes.
62
- - **Hub (`hub`)**: Data represents a **Unique Business Key** (e.g., "Customer ID"). Used in Data Vault for core entity identification.
63
- - **Satellite (`satellite`)**: Data represents **Descriptive Attributes of a Hub over time**. Always linked to a Hub.
401
+ ```yaml
402
+ # WRONG
403
+ relationships:
404
+   - from: { table: fct_orders }
405
+     to: { table: mart_revenue }
406
+     type: lineage # ❌ 'lineage' is not a valid relationship type
407
+ ```
64
408
 
65
- ### Defining the Grain (The "1-Row Rule")
66
- - Before adding columns, define the **Grain**: What does one row represent? (e.g., "One line item per invoice").
67
- - **STRICT**: NEVER mix grains in a single table. Aggregated measures and atomic transactions MUST be in separate tables.
409
+ ```yaml
410
+ # CORRECT
411
+ lineage:
412
+   - from: fct_orders
413
+     to: mart_revenue # ✅ express lineage in the top-level lineage section
414
+ ```
68
415
 
69
- ### Methodology Selection
70
- - **Star Schema**: Use for most business reporting. Prioritize user-friendliness and query performance.
71
- - **Data Vault 2.0**: Use for high-integration environments with many source systems. Prioritize scalability and auditability over direct queryability.
416
+ ---
417
+
418
+ ### Table listed in domain but missing from layout
419
+
420
+ ```yaml
421
+ # WRONG
422
+ domains:
423
+   - id: sales_ops
424
+     tables: [fct_orders, dim_customers] # dim_customers listed here...
425
+
426
+ layout:
427
+   sales_ops: { x: 0, y: 0, width: 880, height: 400 }
428
+   fct_orders: { x: 480, y: 80, parentId: sales_ops }
429
+   # ❌ dim_customers has no layout entry → will render at origin (0,0)
430
+ ```
431
+
432
+ ```yaml
433
+ # CORRECT — every table in a domain MUST have a layout entry
434
+ layout:
435
+   sales_ops: { x: 0, y: 0, width: 880, height: 400 }
436
+   dim_customers: { x: 80, y: 80, parentId: sales_ops } # ✅
437
+   fct_orders: { x: 480, y: 80, parentId: sales_ops } # ✅
438
+ ```
72
439
 
73
440
  ---
74
441
 
75
- ## 4. Logical Column Rules
76
- - **Key Flags**: Mark `isPrimaryKey`, `isForeignKey`, or `isPartitionKey`.
77
- - **Metadata**: Mark technical columns (e.g., `dw_load_date`) with `isMetadata: true`.
78
- - **Additivity**: `fully` (Summable), `semi` (Balance), `non` (Price/ID).
442
+ ### Table overflows domain boundary
443
+
444
+ ```yaml
445
+ # WRONG: domain width is 480, but the table's right edge is x:280 + width:320 = 600 > 480
446
+ layout:
447
+   small_domain: { x: 0, y: 0, width: 480, height: 400 }
448
+   fct_orders: { x: 280, y: 80, parentId: small_domain } # ❌ right edge = 600
449
+ ```
450
+
451
+ ```yaml
452
+ # CORRECT — use the formula: 1 col = width 480
453
+ layout:
454
+   small_domain: { x: 0, y: 0, width: 480, height: 400 }
455
+   fct_orders: { x: 80, y: 80, parentId: small_domain } # ✅ right edge = 400
456
+ ```
79
457
 
80
458
  ---
81
459
 
82
- ## 5. Sample Data Stories
83
- **Every table MUST include high-quality sample data.**
84
- - **Format**: 2D array. First row is Header IDs.
85
- - **Storytelling**: Provide at least 3 rows representing a real business scenario. Avoid "test1", "test2". Use realistic names, dates, and amounts.
460
+ ## 11. dbt Project Integration
461
+
462
+ If the user has a dbt project, AI agents SHOULD recommend using the built-in import commands instead of writing YAML from scratch.
463
+
464
+ ### 11-1. Commands
465
+
466
+ ```bash
467
+ # Prerequisite: generate manifest.json first
468
+ dbt parse
469
+
470
+ # Import a dbt project into Modscape YAML (one-time)
471
+ modscape dbt import [project-dir] [options]
472
+
473
+ # Sync dbt changes into existing Modscape YAML (incremental)
474
+ modscape dbt sync [project-dir] [options]
475
+ ```
476
+
477
+ **`dbt import` options:**
478
+
479
+ | Option | Description |
480
+ |--------|-------------|
481
+ | `-o, --output <dir>` | Output directory (default: `modscape-<project-name>`) |
482
+ | `--split-by folder` | One YAML file per dbt folder |
483
+ | `--split-by schema` | One YAML file per database schema |
484
+ | `--split-by tag` | One YAML file per dbt tag |
485
+
486
+ ### 11-2. What `dbt import` generates
487
+
488
+ The command reads `target/manifest.json` and produces YAML with:
489
+
490
+ | Field | Source | Notes |
491
+ |-------|--------|-------|
492
+ | `id` | `node.unique_id` | Format: `model.project.name` or `source.project.src.table` |
493
+ | `name` | `node.name` | Model / source name |
494
+ | `physical_name` | `node.alias` | Falls back to `node.name` |
495
+ | `conceptual.description` | `node.description` | From dbt docs |
496
+ | `columns[].logical.name/type/description` | `node.columns` | From dbt schema.yml |
497
+ | `lineage.upstream` | `node.depends_on.nodes` | Auto-populated |
498
+ | `appearance.type` | — | **Always `table`. Must be reclassified.** |
499
+ | `sampleData` | — | **Not generated. Must be added.** |
500
+ | `layout` | — | **Not generated. Must be added.** |
501
+ | `domains` | dbt folder structure | Auto-grouped by `fqn[1]` |
502
+
503
+ ### 11-3. What AI agents MUST do after `dbt import`
504
+
505
+ After running `modscape dbt import`, the generated YAML needs enrichment. AI agents MUST:
506
+
507
+ 1. **Reclassify `appearance.type`** — All tables default to `type: table`. Inspect the table name and columns to assign the correct type (`fact`, `dimension`, `mart`, etc.).
508
+ - Tables named `fct_*` → `fact`
509
+ - Tables named `dim_*` → `dimension`
510
+ - Tables named `mart_*` or `rpt_*` → `mart`
511
+ - Tables named `hub_*` → `hub`, `lnk_*` → `link`, `sat_*` → `satellite`
512
+
513
+ 2. **Add `layout`** — The import does not generate coordinates. Calculate domain sizes and add `layout` entries for all tables and domains using the formula in Section 6.
514
+
515
+ 3. **Add `sampleData`** — The import does not generate sample data. Add at least 3 realistic rows per table.
516
+
517
+ 4. **Do NOT re-generate `lineage.upstream`** — It is already correctly populated from `depends_on.nodes`.
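
The name-prefix part of step 1 can be sketched as a lookup. This is an illustrative helper, not a Modscape feature: `classify_by_name` is a hypothetical name, it expects the model's short `name` (not the dbt `unique_id`), and it covers only the prefix conventions listed above. Column inspection still has to be done separately.

```python
# Prefix conventions copied from step 1.
PREFIX_TO_TYPE = {
    "fct_": "fact",
    "dim_": "dimension",
    "mart_": "mart",
    "rpt_": "mart",
    "hub_": "hub",
    "lnk_": "link",
    "sat_": "satellite",
}


def classify_by_name(name: str) -> str:
    """Suggest an appearance.type from a table name; keep 'table' if no prefix matches."""
    for prefix, table_type in PREFIX_TO_TYPE.items():
        if name.startswith(prefix):
            return table_type
    return "table"


if __name__ == "__main__":
    for name in ("fct_orders", "dim_customers", "rpt_weekly_sales", "stg_orders"):
        print(name, classify_by_name(name))
```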
518
+
519
+ ### 11-4. `dbt sync` — Incremental updates
520
+
521
+ Use `modscape dbt sync` when the dbt project has changed (new models, updated columns, etc.) and you want to update the existing Modscape YAML without losing manual edits.
522
+
523
+ **What `sync` overwrites:**
524
+ - `name`, `logical_name`, `physical_name`
525
+ - `conceptual.description`
526
+ - `columns` (all)
527
+ - `lineage.upstream`
528
+
529
+ **What `sync` preserves (safe to edit manually):**
530
+ - `appearance` (type, icon, color, scd)
531
+ - `sampleData`
532
+ - `layout`
533
+ - `domains`
534
+ - `annotations`
535
+ - Any fields not listed above
536
+
537
+ > **Workflow**: `dbt import` once → enrich with AI → `dbt sync` when dbt changes → re-enrich as needed.
538
+
539
+ ### 11-5. Table ID format in dbt-imported models
540
+
541
+ In dbt-imported YAML, table IDs are dbt `unique_id` strings, not short names:
542
+
543
+ ```yaml
544
+ # dbt-imported table ID examples
545
+ id: "model.my_project.fct_orders"
546
+ id: "source.my_project.raw.orders"
547
+ id: "seed.my_project.product_categories"
548
+
549
+ # lineage.upstream also uses unique_id format
550
+ lineage:
551
+   upstream:
552
+     - "model.my_project.stg_orders"
553
+     - "source.my_project.raw.customers"
554
+ ```
555
+
556
+ **MUST NOT** shorten these IDs. They are the join keys between `tables`, `domains.tables`, `lineage.upstream`, and `layout`.
86
557
 
87
558
  ---
88
559
 
89
- ## 6. Prohibitions & Anti-Patterns
90
- - **NO NESTED LAYOUT**: Never put `x` or `y` inside `tables[...]` or `domains[...]`.
91
- - **NO FLOATS**: Use only integers for coordinates.
92
- - **NO FRAGMENTED LINEAGE**: Always define `lineage.upstream` for derived tables.
560
+ ## 12. Merging YAML Files
561
+
562
+ When a user asks to **combine, merge, or consolidate** multiple YAML model files, use the built-in `merge` command instead of editing YAML manually.
563
+
564
+ ```bash
565
+ # Merge specific files
566
+ modscape merge sales.yaml marketing.yaml -o combined.yaml
567
+
568
+ # Merge all YAML files in a directory
569
+ modscape merge ./models -o combined.yaml
570
+
571
+ # Merge multiple directories
572
+ modscape merge ./sales ./marketing -o combined.yaml
573
+ ```
574
+
575
+ **Merge behavior:**
576
+
577
+ | Section | Behavior |
578
+ |---------|----------|
579
+ | `tables` | Deduplicated by `id`. First occurrence wins on conflict. |
580
+ | `relationships` | All entries included (no deduplication). |
581
+ | `domains` | Deduplicated by `id`. First occurrence wins on conflict. |
582
+ | `layout` | **Not included in output.** Must be added after merging. |
583
+ | `annotations` | **Not included in output.** Must be added after merging. |
584
+
585
+ **What AI agents MUST do after merge:**
586
+
587
+ 1. **Add `layout`** — Run `modscape dev <output>` and use auto-layout, or calculate coordinates manually using the formula in Section 6.
588
+ 2. **Check for relationship duplication** — If the same relationship exists in multiple source files, it will appear twice. Deduplicate manually if needed.
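
Step 2 can be approximated with a small deduplication pass over the merged `relationships` list. This is a sketch assuming the list has been parsed into Python dicts; `dedupe_relationships` is a hypothetical helper, and treating two entries as duplicates only when tables, columns, and type all match is an assumption.

```python
def dedupe_relationships(relationships: list[dict]) -> list[dict]:
    """Drop repeated relationship entries, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for rel in relationships:
        key = (
            rel["from"]["table"],
            rel["from"].get("column"),
            rel["to"]["table"],
            rel["to"].get("column"),
            rel.get("type"),
        )
        if key not in seen:
            seen.add(key)
            unique.append(rel)
    return unique


if __name__ == "__main__":
    rel = {
        "from": {"table": "dim_customers", "column": "customer_key"},
        "to": {"table": "fct_orders", "column": "customer_key"},
        "type": "one-to-many",
    }
    print(len(dedupe_relationships([rel, dict(rel)])))
```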
93
589
 
94
590
  ---
95
591
 
96
- ## 7. Golden Schema Example
592
+ ## 13. Complete Example
593
+
97
594
  ```yaml
98
595
  domains:
99
596
    - id: sales_domain
100
597
      name: "Sales Operations"
101
-     tables: [fct_orders]
598
+     description: "Core transactional data."
599
+     color: "rgba(239, 68, 68, 0.1)"
600
+     tables: [dim_customers, fct_orders]
601
+
602
+   - id: analytics_domain
603
+     name: "Analytics & Insights"
604
+     color: "rgba(245, 158, 11, 0.1)"
605
+     tables: [mart_monthly_revenue]
102
606
 
103
607
  tables:
608
+   - id: dim_customers
609
+     name: "Customers"
610
+     logical_name: "Customer Master"
611
+     physical_name: "dim_customers_v2"
612
+     appearance:
613
+       type: dimension
614
+       scd: type2
615
+       icon: "👤"
616
+     conceptual:
617
+       description: "One row per unique customer version (SCD Type 2)."
618
+       tags: [WHO]
619
+     columns:
620
+       - id: customer_key
621
+         logical: { name: "Customer Key", type: Int, isPrimaryKey: true }
622
+       - id: customer_name
623
+         logical: { name: "Name", type: String }
624
+       - id: dw_valid_from
625
+         logical: { name: "Valid From", type: Timestamp, isMetadata: true }
626
+     sampleData:
627
+       - [1, "Acme Corp", "2024-01-01T00:00:00Z"]
628
+       - [2, "Beta Ltd", "2024-03-15T00:00:00Z"]
629
+       - [3, "Gamma Inc", "2024-06-01T00:00:00Z"]
630
+
104
631
    - id: fct_orders
105
632
      name: "Orders"
106
633
      logical_name: "Order Transactions"
107
634
      physical_name: "fct_sales_orders"
108
635
      appearance: { type: fact, sub_type: transaction, icon: "🛒" }
636
+     conceptual:
637
+       description: "One row per order line item."
638
+       tags: [WHAT, HOW_MUCH]
639
+     implementation:
640
+       materialization: incremental
641
+       incremental_strategy: merge
642
+       unique_key: order_id
643
+       partition_by: { field: order_date, granularity: day }
644
+       cluster_by: [customer_key]
109
645
      columns:
110
646
        - id: order_id
111
-         logical: { name: "ID", type: Int, isPrimaryKey: true }
647
+         logical: { name: "Order ID", type: Int, isPrimaryKey: true }
648
+         physical: { name: "order_id", type: "BIGINT", constraints: [NOT NULL] }
649
+       - id: customer_key
650
+         logical: { name: "Customer Key", type: Int, isForeignKey: true }
112
651
        - id: amount
113
652
          logical: { name: "Amount", type: Decimal, additivity: fully }
114
653
      sampleData:
115
-       - [order_id, amount]
116
-       - [1001, 50.0]
117
-       - [1002, 120.5]
654
+       - [1001, 1, 150.00]
655
+       - [1002, 2, 89.50]
656
+       - [1003, 1, 210.00]
657
+
658
+   - id: mart_monthly_revenue
659
+     name: "Monthly Revenue"
660
+     logical_name: "Executive Revenue Summary"
661
+     physical_name: "mart_finance_monthly_revenue_agg"
662
+     appearance: { type: mart, icon: "📈" }
663
+     implementation:
664
+       materialization: table
665
+       grain: [month_key]
666
+       measures:
667
+         - column: total_revenue
668
+           agg: sum
669
+           source_column: fct_orders.amount
670
+     columns:
671
+       - id: month_key
672
+         logical: { name: "Month", type: String, isPrimaryKey: true }
673
+       - id: total_revenue
674
+         logical: { name: "Revenue", type: Decimal, additivity: fully }
675
+     sampleData:
676
+       - ["2024-01", 12450.50]
677
+       - ["2024-02", 15200.00]
678
+       - ["2024-03", 18900.75]
679
+
680
+ lineage: # data flow, separate from ER
681
+   - from: fct_orders
682
+     to: mart_monthly_revenue
683
+   - from: dim_customers
684
+     to: mart_monthly_revenue
685
+
686
+ relationships: # ER only, not for lineage
687
+   - from: { table: dim_customers, column: customer_key }
688
+     to: { table: fct_orders, column: customer_key }
689
+     type: one-to-many
690
+
691
+ annotations:
692
+   - id: note_001
693
+     type: sticky
694
+     text: "Grain: one row per order line item."
695
+     targetId: fct_orders
696
+     targetType: table
697
+     offset: { x: 100, y: -80 }
118
698
 
119
699
  layout:
120
-   sales_domain: { x: 0, y: 0, width: 480, height: 400 }
121
-   fct_orders: { x: 80, y: 80 } # Relative to domain
700
+   # Domain width/height are calculated with the formula in Section 6
701
+   # sales_domain: 2 tables side by side → 2-col × 1-row → w:880, h:400
702
+   sales_domain:
703
+     x: 0
704
+     y: 0
705
+     width: 880
706
+     height: 400
707
+
708
+   # Tables inside sales_domain: coordinates are relative to the domain origin
709
+   dim_customers:
710
+     x: 80
711
+     y: 80
712
+     parentId: sales_domain
713
+
714
+   fct_orders:
715
+     x: 480
716
+     y: 80
717
+     parentId: sales_domain
718
+
719
+   # analytics_domain: 1 table → 1-col × 1-row → w:480, h:400
720
+   analytics_domain:
721
+     x: 1000
722
+     y: 0
723
+     width: 480
724
+     height: 400
725
+
726
+   mart_monthly_revenue:
727
+     x: 80
728
+     y: 80
729
+     parentId: analytics_domain
122
730
  ```