meadow-integration 1.0.4 → 1.0.6

Files changed (67)
  1. package/.dockerignore +11 -0
  2. package/Docker-Build.sh +2 -0
  3. package/Docker-Compose.sh +2 -0
  4. package/Docker-Push.sh +2 -0
  5. package/Docker-Tag.sh +2 -0
  6. package/Dockerfile +28 -0
  7. package/Dockerfile_LUXURYCode +23 -0
  8. package/README.md +139 -25
  9. package/docker-compose.yml +16 -0
  10. package/docs/README.md +65 -18
  11. package/docs/{cover.md → _cover.md} +3 -2
  12. package/docs/_sidebar.md +52 -7
  13. package/docs/_topbar.md +2 -0
  14. package/docs/api/clone-rest-client.md +278 -0
  15. package/docs/api/connection-manager.md +179 -0
  16. package/docs/api/guid-map.md +234 -0
  17. package/docs/api/integration-adapter.md +283 -0
  18. package/docs/api/operation.md +241 -0
  19. package/docs/api/sync-entity-initial.md +227 -0
  20. package/docs/api/sync-entity-ongoing.md +244 -0
  21. package/docs/api/sync.md +213 -0
  22. package/docs/api/tabular-check.md +213 -0
  23. package/docs/api/tabular-transform.md +316 -0
  24. package/docs/architecture.md +423 -0
  25. package/docs/cli/comprehensionarray.md +111 -0
  26. package/docs/cli/comprehensionintersect.md +132 -0
  27. package/docs/cli/csvcheck.md +111 -0
  28. package/docs/cli/csvtransform.md +170 -0
  29. package/docs/cli/data-clone.md +277 -0
  30. package/docs/cli/jsonarraytransform.md +166 -0
  31. package/docs/cli/load-comprehension.md +129 -0
  32. package/docs/cli/objectarraytocsv.md +159 -0
  33. package/docs/cli/overview.md +96 -0
  34. package/docs/cli/serve.md +102 -0
  35. package/docs/cli/tsvtransform.md +144 -0
  36. package/docs/data-clone/configuration.md +357 -0
  37. package/docs/data-clone/connection-manager.md +206 -0
  38. package/docs/data-clone/docker.md +290 -0
  39. package/docs/data-clone/overview.md +173 -0
  40. package/docs/data-clone/sync-modes.md +186 -0
  41. package/docs/implementation-reference.md +311 -0
  42. package/docs/overview.md +156 -0
  43. package/docs/quickstart.md +233 -0
  44. package/docs/rest/comprehension-push.md +209 -0
  45. package/docs/rest/comprehension.md +506 -0
  46. package/docs/rest/csv.md +255 -0
  47. package/docs/rest/entity-generation.md +158 -0
  48. package/docs/rest/json-array.md +243 -0
  49. package/docs/rest/overview.md +120 -0
  50. package/docs/rest/status.md +63 -0
  51. package/docs/rest/tsv.md +241 -0
  52. package/docs/retold-catalog.json +93 -3
  53. package/docs/retold-keyword-index.json +23683 -1901
  54. package/package.json +13 -10
  55. package/scripts/run.sh +18 -0
  56. package/source/Meadow-Integration.js +15 -1
  57. package/source/cli/Default-Meadow-Integration-Configuration.json +37 -2
  58. package/source/cli/Meadow-Integration-CLI-Program.js +4 -1
  59. package/source/cli/commands/Meadow-Integration-Command-DataClone.js +284 -0
  60. package/source/services/clone/Meadow-Service-ConnectionManager.js +251 -0
  61. package/source/services/clone/Meadow-Service-Operation.js +196 -0
  62. package/source/services/clone/Meadow-Service-RestClient.js +364 -0
  63. package/source/services/clone/Meadow-Service-Sync-Entity-Initial.js +367 -0
  64. package/source/services/clone/Meadow-Service-Sync-Entity-Ongoing.js +457 -0
  65. package/source/services/clone/Meadow-Service-Sync.js +142 -0
  66. /package/docs/examples/bookstore/{mapping_books_Author.json → mapping_books_author.json} +0 -0
  67. /package/docs/examples/bookstore/{mapping_books_Book.json → mapping_books_book.json} +0 -0
@@ -0,0 +1,423 @@
1
+ # Architecture
2
+
3
+ This document describes the architectural design of Meadow Integration, covering both the data transformation and data synchronization pipelines.
4
+
5
+ ## High-Level System Architecture
6
+
7
+ Meadow Integration sits between external data sources and the Meadow data access layer. It provides three interfaces (CLI, REST, Programmatic) that share a common set of services.
8
+
9
+ ```mermaid
10
+ flowchart TB
11
+ subgraph External["External Data Sources"]
12
+ CSV["CSV Files"]
13
+ TSV["TSV Files"]
14
+ JSON["JSON Arrays"]
15
+ API["Remote Meadow API"]
16
+ end
17
+
18
+ subgraph Interfaces["Interfaces"]
19
+ CLI["CLI Program\n(pict-service-commandlineutility)"]
20
+ REST["REST Server\n(Orator + Restify)"]
21
+ PROG["Programmatic API\n(require meadow-integration)"]
22
+ end
23
+
24
+ subgraph Services["Core Services"]
25
+ TC["TabularCheck"]
26
+ TT["TabularTransform"]
27
+ IA["IntegrationAdapter"]
28
+ GM["GUIDMap"]
29
+ CM["ConnectionManager"]
30
+ CRC["CloneRestClient"]
31
+ SYNC["Sync"]
32
+ SEI["SyncEntityInitial"]
33
+ SEO["SyncEntityOngoing"]
34
+ OP["Operation"]
35
+ end
36
+
37
+ subgraph Targets["Targets"]
38
+ MAPI["Meadow REST API\n(write)"]
39
+ DB["Local Database\n(MySQL / MSSQL)"]
40
+ end
41
+
42
+ CSV --> CLI
43
+ TSV --> CLI
44
+ JSON --> CLI
45
+ CSV --> REST
46
+ TSV --> REST
47
+ JSON --> REST
48
+
49
+ CLI --> TC
50
+ CLI --> TT
51
+ CLI --> IA
52
+ CLI --> SYNC
53
+ REST --> TC
54
+ REST --> TT
55
+ REST --> IA
56
+ PROG --> TC
57
+ PROG --> TT
58
+ PROG --> IA
59
+ PROG --> SYNC
60
+
61
+ TT --> IA
62
+ IA --> GM
63
+ IA --> MAPI
64
+
65
+ API --> CRC
66
+ CRC --> SYNC
67
+ SYNC --> SEI
68
+ SYNC --> SEO
69
+ SEI --> CM
70
+ SEO --> CM
71
+ CM --> DB
72
+
73
+ SYNC --> OP
74
+ SEI --> OP
75
+ SEO --> OP
76
+ ```
77
+
78
+ The system divides cleanly into two pipelines that share the `Operation` utility for timing and progress tracking.
79
+
80
+ ## Data Transformation Pipeline
81
+
82
+ The transformation pipeline converts tabular data into Meadow entity records. Each stage is a discrete service that can be used independently.
83
+
84
+ ```mermaid
85
+ flowchart LR
86
+ subgraph Input["1. Parse"]
87
+ FILE["Source File\n(CSV / TSV / JSON)"]
88
+ PARSE["Stream Parser\n(csv-parser / JSON.parse)"]
89
+ end
90
+
91
+ subgraph Transform["2. Transform"]
92
+ RECORD["Raw Record"]
93
+ MAPPING["Mapping Config\n(Implicit + Explicit + User)"]
94
+ TEMPLATE["Pict Template Engine\n({~D:Record.col~})"]
95
+ SOLVER["Solvers\n(Expression Parser)"]
96
+ end
97
+
98
+ subgraph Collect["3. Collect"]
99
+ COMP["Comprehension\n{Entity: {GUID: Record}}"]
100
+ MERGE["Intersect\n(merge by GUID)"]
101
+ end
102
+
103
+ subgraph Push["4. Push"]
104
+ ADAPTER["IntegrationAdapter"]
105
+ MARSHAL["Marshal Record\n(schema validation,\nstring truncation,\nGUID prefixing)"]
106
+ GUIDMAP["GUIDMap\n(external <-> Meadow IDs)"]
107
+ UPSERT["Upsert\n(single or bulk)"]
108
+ end
109
+
110
+ FILE --> PARSE --> RECORD
111
+ MAPPING --> TEMPLATE
112
+ RECORD --> TEMPLATE --> COMP
113
+ RECORD --> SOLVER --> COMP
114
+ COMP --> MERGE --> COMP
115
+ COMP --> ADAPTER --> MARSHAL --> GUIDMAP --> UPSERT
116
+ ```
117
+
118
+ ### Stage Details
119
+
120
+ **Parse** -- CSV and TSV files are streamed through a parser that emits one record per row. JSON Array files are loaded and iterated. The `TabularCheck` service can analyze records without transforming them, producing column statistics.
121
+
122
+ **Transform** -- The `TabularTransform` service applies a three-layer configuration cascade to each record:
123
+
124
+ 1. **Implicit** -- Auto-generated from the first record's keys (column names become field names, the first column is used for GUID generation)
125
+ 2. **Explicit** -- Loaded from a mapping file that specifies entity name, GUID template, and column-to-field mappings
126
+ 3. **User** -- Command-line overrides for entity name, GUID name, GUID template, and inline column mappings
127
+
128
+ Each layer merges on top of the previous one using `Object.assign`, so User settings always win.
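
The cascade can be sketched in a few lines. This is a minimal illustration, not the service's actual code: the layer objects and the key names (`Entity`, `GUIDName`, `GUIDTemplate`) are hypothetical stand-ins for whatever the real mapping configuration contains.

```javascript
// Hypothetical layer contents; key names are illustrative assumptions.
const implicitMapping = { Entity: 'books', GUIDName: 'GUIDbooks', GUIDTemplate: '{~D:Record.id~}' };
const explicitMapping = { Entity: 'Book', GUIDTemplate: 'Book_{~D:Record.id~}' };
const userMapping = { GUIDName: 'GUIDBook' };

// Later arguments overwrite earlier ones, so the last layer passed always wins.
function resolveMapping(...layers) {
  return Object.assign({}, ...layers);
}

const finalMapping = resolveMapping(implicitMapping, explicitMapping, userMapping);
console.log(finalMapping);
```

Because `Object.assign` copies only top-level properties, a nested object supplied by a later layer replaces the earlier layer's object wholesale instead of being merged key by key.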
129
+
130
+ Pict template expressions resolve column values at transformation time. Solvers (powered by the Fable Expression Parser) enable multi-entity extraction from a single source row by dynamically generating multiple GUID uniqueness entries.
131
+
132
+ **Collect** -- Transformed records accumulate in a Comprehension object. Records with duplicate GUIDs within the same parse are merged. Records can also be merged with an existing Comprehension loaded from disk.
133
+
134
+ **Push** -- The `IntegrationAdapter` marshals comprehension records into Meadow-compatible format. It fetches the target entity schema from the Meadow API, validates field types, truncates strings that exceed schema-defined sizes, and strips reserved columns (`CreateDate`, `UpdateDate`, `Deleted`, `DeleteDate`). Cross-entity GUID references are resolved through the `GUIDMap`. Records are pushed via upsert -- individually for small sets, or in configurable bulk batches (default threshold: 1000 records, batch size: 100) for large sets.
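
The reserved-column stripping and batching described above might look roughly like the sketch below. The helper names are hypothetical, and treating a set at or above the threshold as "large" is an assumption; the text gives only the default values.

```javascript
// Reserved columns named in the text; they are removed before a record is pushed.
const RESERVED_COLUMNS = ['CreateDate', 'UpdateDate', 'Deleted', 'DeleteDate'];
const BULK_THRESHOLD = 1000;  // default threshold from the text
const BULK_BATCH_SIZE = 100;  // default batch size from the text

// Hypothetical helper: return a copy of the record without reserved columns.
function stripReservedColumns(record) {
  const cleaned = {};
  for (const [key, value] of Object.entries(record)) {
    if (!RESERVED_COLUMNS.includes(key)) {
      cleaned[key] = value;
    }
  }
  return cleaned;
}

// Hypothetical helper: group records into upsert calls.
// Assumption: sets with at least `threshold` records use bulk batches;
// smaller sets are pushed one record per call.
function planUpsertBatches(records, threshold = BULK_THRESHOLD, batchSize = BULK_BATCH_SIZE) {
  const size = records.length >= threshold ? batchSize : 1;
  const batches = [];
  for (let i = 0; i < records.length; i += size) {
    batches.push(records.slice(i, i + size).map(stripReservedColumns));
  }
  return batches;
}

const sampleRecords = Array.from({ length: 1250 }, (_, i) => (
  { GUIDBook: `Book_${i + 1}`, Title: `Title ${i + 1}`, Deleted: 0 }
));
const batches = planUpsertBatches(sampleRecords);
console.log(batches.length, batches[batches.length - 1].length);
```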
135
+
136
+ ## Data Synchronization Pipeline
137
+
138
+ The Data Clone pipeline replicates entity data from a remote Meadow API into a local relational database.
139
+
140
+ ```mermaid
141
+ flowchart TB
142
+ subgraph Config["Configuration"]
143
+ MCFG[".meadow.config.json"]
144
+ SCHEMA["Extended Schema JSON\n(from Stricture)"]
145
+ CLIOPTS["CLI Overrides\n(--api_server, --db_host, etc.)"]
146
+ end
147
+
148
+ subgraph Auth["1. Authenticate"]
149
+ CRC["CloneRestClient"]
150
+ SESSION["Session Management\n(cookie / token)"]
151
+ end
152
+
153
+ subgraph Connect["2. Connect"]
154
+ CM["ConnectionManager"]
155
+ POOL["Connection Pool\n(MySQL or MSSQL)"]
156
+ end
157
+
158
+ subgraph Init["3. Initialize Schema"]
159
+ LOAD["Load Extended Schema"]
160
+ CREATE["Create Tables\n(if not exist)"]
161
+ INDEX["Create Indexes\n(GUID unique, Deleted)"]
162
+ end
163
+
164
+ subgraph Sync["4. Sync Entities"]
165
+ COMPARE["Compare\nlocal vs. remote\n(max ID, count, UpdateDate)"]
166
+ DOWNLOAD["Download Pages\n(filtered + sorted)"]
167
+ WRITE["Marshal + Write\n(create or update)"]
168
+ PROGRESS["Progress Tracking\n(Operation service)"]
169
+ end
170
+
171
+ MCFG --> CRC
172
+ CLIOPTS --> CRC
173
+ MCFG --> CM
174
+ CLIOPTS --> CM
175
+ SCHEMA --> LOAD
176
+
177
+ CRC --> SESSION --> DOWNLOAD
178
+ CM --> POOL --> WRITE
179
+ LOAD --> CREATE --> INDEX
180
+
181
+ COMPARE --> DOWNLOAD --> WRITE
182
+ WRITE --> PROGRESS
183
+ ```
184
+
185
+ ### Stage Details
186
+
187
+ **Authenticate** -- The `CloneRestClient` authenticates with the remote Meadow API by posting credentials to `/Authenticate`. The resulting session cookie or token is attached to all subsequent requests. If no credentials are configured, authentication is skipped (for unauthenticated APIs). HTTP keep-alive is enabled for connection reuse.
188
+
189
+ **Connect** -- The `ConnectionManager` establishes a connection pool to the local database. It supports MySQL (via `meadow-connection-mysql`) and MSSQL (via `meadow-connection-mssql`). The provider is selected by the `Destination.Provider` configuration key.
190
+
191
+ **Initialize Schema** -- The Meadow extended schema JSON (produced by Stricture's `build` command) is loaded. For each entity in the schema (or a configured subset), the sync service uses the Meadow provider to create the table if it does not exist. It then creates a unique index on the GUID column and a non-unique index on the Deleted column using the `ConnectionManager`.
192
+
193
+ **Sync Entities** -- Entities are synced sequentially in the order defined by `SyncEntityList` (or schema order if the list is empty). Two sync strategies are available:
194
+
195
+ - **Initial** -- Queries the local max ID and the remote max ID and record count. Generates paginated URL partials filtered to records with IDs greater than the local maximum. Downloads each page and creates records locally with identity insert enabled so primary keys match the remote system.
196
+ - **Ongoing** -- Extends Initial sync with `UpdateDate` comparison. After identifying new records by ID, it also compares `UpdateDate` timestamps. Records where the remote `UpdateDate` differs from the local `UpdateDate` by more than 5 milliseconds are updated. This handles both new records and modifications.
197
+
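
The page planning in the Initial strategy can be sketched as follows. This is a simplified illustration: it produces abstract page descriptors, not real Meadow URL partials, and it assumes `newRecordCount` is the number of remote records with an ID above the local maximum. The function and property names are hypothetical.

```javascript
// Hypothetical planner: returns page descriptors, not actual URL syntax.
function planInitialSyncPages(localMaxID, remoteMaxID, newRecordCount, pageSize) {
  // Nothing to download when the local copy is already caught up.
  if (remoteMaxID <= localMaxID || newRecordCount <= 0) {
    return [];
  }
  const pages = [];
  for (let offset = 0; offset < newRecordCount; offset += pageSize) {
    pages.push({
      idGreaterThan: localMaxID,                          // filter: ID > local max
      offset: offset,                                     // where this page starts
      limit: Math.min(pageSize, newRecordCount - offset)  // the last page may be short
    });
  }
  return pages;
}

const pages = planInitialSyncPages(100, 350, 250, 100);
console.log(pages);
```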
198
+ ## Service Dependency Diagram
199
+
200
+ All services extend `fable-serviceproviderbase` and register with a Fable instance. The diagram below shows the dependency relationships.
201
+
202
+ ```mermaid
203
+ classDiagram
204
+ class FableServiceProviderBase {
205
+ +fable
206
+ +options
207
+ +log
208
+ +serviceType
209
+ }
210
+
211
+ class TabularCheck {
212
+ +serviceType: TabularCheck
213
+ +newStatisticsObject()
214
+ +collectStatistics()
215
+ }
216
+
217
+ class TabularTransform {
218
+ +serviceType: TabularTransform
219
+ +newMappingOutcomeObject()
220
+ +transformRecord()
221
+ +addRecordToComprehension()
222
+ +createRecordFromMapping()
223
+ }
224
+
225
+ class IntegrationAdapter {
226
+ +serviceType: IntegrationAdapter
227
+ +Entity
228
+ +addSourceRecord()
229
+ +integrateRecords()
230
+ +marshalRecord()
231
+ +pushRecordsToServer()
232
+ }
233
+
234
+ class GUIDMap {
235
+ +serviceType: MeadowGUIDMap
236
+ +mapGUIDToID()
237
+ +getIDFromGUID()
238
+ +mapExternalGUIDtoMeadowGUID()
239
+ +getMeadowIDFromExternalGUID()
240
+ }
241
+
242
+ class ConnectionManager {
243
+ +serviceType: MeadowConnectionManager
244
+ +Provider
245
+ +ConnectionPool
246
+ +connect()
247
+ +createIndex()
248
+ }
249
+
250
+ class CloneRestClient {
251
+ +serviceType: MeadowCloneRestClient
252
+ +serverURL
253
+ +authenticate()
254
+ +deauthenticate()
255
+ +getJSON()
256
+ +upsertEntity()
257
+ +getEntitySet()
258
+ }
259
+
260
+ class Sync {
261
+ +serviceType: MeadowSync
262
+ +SyncMode
263
+ +SyncEntityList
264
+ +loadMeadowSchema()
265
+ +syncEntity()
266
+ +syncAll()
267
+ }
268
+
269
+ class SyncEntityInitial {
270
+ +serviceType: MeadowSyncEntityInitial
271
+ +EntitySchema
272
+ +PageSize
273
+ +initialize()
274
+ +sync()
275
+ +marshalRecord()
276
+ }
277
+
278
+ class SyncEntityOngoing {
279
+ +serviceType: MeadowSyncEntityOngoing
280
+ +EntitySchema
281
+ +PageSize
282
+ +initialize()
283
+ +sync()
284
+ +marshalRecord()
285
+ }
286
+
287
+ class Operation {
288
+ +timeStamps
289
+ +progressTrackers
290
+ +createTimeStamp()
291
+ +createProgressTracker()
292
+ +printProgressTrackerStatus()
293
+ }
294
+
295
+ FableServiceProviderBase <|-- TabularCheck
296
+ FableServiceProviderBase <|-- TabularTransform
297
+ FableServiceProviderBase <|-- IntegrationAdapter
298
+ FableServiceProviderBase <|-- GUIDMap
299
+ FableServiceProviderBase <|-- ConnectionManager
300
+ FableServiceProviderBase <|-- CloneRestClient
301
+ FableServiceProviderBase <|-- Sync
302
+ FableServiceProviderBase <|-- SyncEntityInitial
303
+ FableServiceProviderBase <|-- SyncEntityOngoing
304
+
305
+ IntegrationAdapter --> GUIDMap : uses
306
+ Sync --> SyncEntityInitial : creates
307
+ Sync --> SyncEntityOngoing : creates
308
+ SyncEntityInitial --> Operation : uses
309
+ SyncEntityOngoing --> Operation : uses
310
+ SyncEntityInitial ..> CloneRestClient : reads from
311
+ SyncEntityOngoing ..> CloneRestClient : reads from
312
+ SyncEntityInitial ..> ConnectionManager : writes to
313
+ SyncEntityOngoing ..> ConnectionManager : writes to
314
+ ```
315
+
316
+ ## Configuration Cascade
317
+
318
+ Configuration for the CLI flows through multiple layers, each overriding the previous.
319
+
320
+ ```mermaid
321
+ flowchart LR
322
+ DEF["Default Configuration\n(Default-Meadow-Integration-\nConfiguration.json)"]
323
+ FILE[".meadow.config.json\n(working directory)"]
324
+ CLI["Command-Line Flags\n(--api_server, --db_host, etc.)"]
325
+
326
+ DEF -->|"base"| MERGED["Resolved Configuration"]
327
+ FILE -->|"overrides"| MERGED
328
+ CLI -->|"overrides"| MERGED
329
+ ```
330
+
331
+ For data transformation, the mapping configuration has its own three-layer cascade:
332
+
333
+ ```mermaid
334
+ flowchart LR
335
+ IMP["Implicit\n(auto-detected from\nfirst record)"]
336
+ EXP["Explicit\n(mapping file\nvia -m flag)"]
337
+ USR["User\n(CLI flags:\n-e, -g, -n, -c)"]
338
+
339
+ IMP -->|"base"| FINAL["Final Mapping Config"]
340
+ EXP -->|"overrides"| FINAL
341
+ USR -->|"overrides"| FINAL
342
+ ```
343
+
344
+ ## Sync Mode Comparison
345
+
346
+ The two sync modes serve different purposes and have different performance characteristics.
347
+
348
+ ```mermaid
349
+ flowchart TB
350
+ subgraph Initial["Initial Sync"]
351
+ I1["Query local max ID"]
352
+ I2["Query remote max ID + count"]
353
+ I3["Generate paginated URL partials\n(filter: ID > local max)"]
354
+ I4["Download each page"]
355
+ I5["For each record:\nRead local by ID"]
356
+ I6{"Record\nexists?"}
357
+ I7["Skip"]
358
+ I8["Create with\nidentity insert"]
359
+
360
+ I1 --> I2 --> I3 --> I4 --> I5 --> I6
361
+ I6 -->|"yes"| I7
362
+ I6 -->|"no"| I8
363
+ end
364
+
365
+ subgraph Ongoing["Ongoing Sync"]
366
+ O1["Query local max ID + UpdateDate"]
367
+ O2["Query remote max ID + UpdateDate + count"]
368
+ O3["Iterate all records\n(paginated, ID ascending)"]
369
+ O4["For each record:\nRead local by ID"]
370
+ O5{"Record\nexists?"}
371
+ O6{"UpdateDate\ndifference\n> 5ms?"}
372
+ O7["Update record"]
373
+ O8["Skip"]
374
+ O9["Create with\nidentity insert"]
375
+
376
+ O1 --> O2 --> O3 --> O4 --> O5
377
+ O5 -->|"yes"| O6
378
+ O5 -->|"no"| O9
379
+ O6 -->|"yes"| O7
380
+ O6 -->|"no"| O8
381
+ end
382
+ ```
383
+
384
+ | Aspect | Initial | Ongoing |
385
+ |--------|---------|---------|
386
+ | **Purpose** | First-time clone or catch-up | Incremental sync of changes |
387
+ | **Strategy** | Only downloads records with IDs above local max | Walks all records and compares timestamps |
388
+ | **Handles new records** | Yes | Yes |
389
+ | **Handles updates** | No | Yes (by UpdateDate comparison) |
390
+ | **Performance** | Faster for first clone (skips existing) | Slower per run but keeps data current |
391
+ | **Typical usage** | Run once, then switch to Ongoing | Run on a schedule (cron, Docker) |
392
+
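
The per-record decision in both modes can be summarized in one small function. This is a sketch under stated assumptions: `classifyRecord` and the record shapes are hypothetical, and only the 5 millisecond tolerance and the create/update/skip outcomes come from the description above.

```javascript
const UPDATE_DATE_TOLERANCE_MS = 5; // tolerance stated in the text

// Hypothetical helper. `localRecord` is null or undefined when no local row
// with the same ID exists.
function classifyRecord(mode, localRecord, remoteRecord) {
  if (!localRecord) {
    return 'create'; // both modes create missing records (with identity insert)
  }
  if (mode !== 'Ongoing') {
    return 'skip'; // Initial mode never updates existing records
  }
  const localTime = new Date(localRecord.UpdateDate).getTime();
  const remoteTime = new Date(remoteRecord.UpdateDate).getTime();
  // Only a difference of more than the tolerance counts as a modification.
  return Math.abs(remoteTime - localTime) > UPDATE_DATE_TOLERANCE_MS ? 'update' : 'skip';
}

const remoteBook = { IDBook: 7, UpdateDate: '2024-05-01T12:00:00.250Z' };
const localFresh = { IDBook: 7, UpdateDate: '2024-05-01T12:00:00.247Z' };
const localStale = { IDBook: 7, UpdateDate: '2024-05-01T11:00:00.000Z' };

console.log(classifyRecord('Ongoing', null, remoteBook));
console.log(classifyRecord('Ongoing', localFresh, remoteBook));
console.log(classifyRecord('Ongoing', localStale, remoteBook));
console.log(classifyRecord('Initial', localStale, remoteBook));
```

The tolerance avoids needless updates when the two systems store timestamps at slightly different precision.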
393
+ ## Docker Deployment
394
+
395
+ The included Dockerfile builds a production image for running Data Clone as a containerized service. The image is based on `node:20-bookworm` and expects a `.meadow.config.json` to be provided at runtime (via volume mount or baked into a derived image).
396
+
397
+ ```mermaid
398
+ flowchart LR
399
+ subgraph Build["Docker Build"]
400
+ BASE["node:20-bookworm"]
401
+ DEPS["npm install --omit=dev"]
402
+ SRC["Copy source + scripts"]
403
+ end
404
+
405
+ subgraph Runtime["Docker Runtime"]
406
+ CFG[".meadow.config.json\n(volume mount)"]
407
+ SCHEMA["Extended Schema\n(volume mount)"]
408
+ RUN["scripts/run.sh"]
409
+ end
410
+
411
+ subgraph External["External"]
412
+ REMOTE["Remote Meadow API"]
413
+ LOCAL["Local Database"]
414
+ end
415
+
416
+ BASE --> DEPS --> SRC --> RUN
417
+ CFG --> RUN
418
+ SCHEMA --> RUN
419
+ RUN --> REMOTE
420
+ RUN --> LOCAL
421
+ ```
422
+
423
+ The included `docker-compose.yml` runs Data Clone alongside a local MySQL or MSSQL container for development and testing.
@@ -0,0 +1,111 @@
1
+ # comprehensionarray
2
+
3
+ Convert an object-keyed Comprehension into a JSON array. Comprehensions store records as `{ GUID: record }` objects for fast lookup and merging, but sometimes you need a plain array for export, UI consumption, or further processing.
4
+
5
+ **Aliases:** `comprehension_to_array`, `array`
6
+
7
+ ## Usage
8
+
9
+ ```shell
10
+ mdwint comprehensionarray <file> [options]
11
+ ```
12
+
13
+ ## Arguments
14
+
15
+ | Argument | Required | Description |
16
+ |----------|----------|-------------|
17
+ | `<file>` | Yes | Path to the Comprehension file to convert |
18
+
19
+ ## Options
20
+
21
+ | Option | Description | Default |
22
+ |--------|-------------|---------|
23
+ | `-e, --entity <name>` | Entity name to extract from the Comprehension | Auto-detected from the first key |
24
+ | `-o, --output <path>` | Output file path | `./Array-Comprehension-<filename>.json` |
25
+
26
+ ## Input Format
27
+
28
+ The input is a standard object-keyed Comprehension:
29
+
30
+ ```json
31
+ {
32
+ "Book": {
33
+ "Book_1": { "GUIDBook": "Book_1", "Title": "The Hunger Games", "Language": "eng" },
34
+ "Book_2": { "GUIDBook": "Book_2", "Title": "Harry Potter", "Language": "eng" }
35
+ }
36
+ }
37
+ ```
38
+
39
+ ## Output Format
40
+
41
+ The output is a JSON array of the record objects:
42
+
43
+ ```json
44
+ [
45
+ { "GUIDBook": "Book_1", "Title": "The Hunger Games", "Language": "eng" },
46
+ { "GUIDBook": "Book_2", "Title": "Harry Potter", "Language": "eng" }
47
+ ]
48
+ ```
49
+
50
+ The GUID keys are discarded; only the record values are included in the array.
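
In plain JavaScript the conversion amounts to taking the values of one entity's object. The sketch below is an illustration of that idea, not the command's source; `comprehensionToArray` is a hypothetical name.

```javascript
// Hypothetical equivalent of the conversion described above.
function comprehensionToArray(comprehension, entityName) {
  const entityKeys = Object.keys(comprehension);
  if (entityKeys.length === 0) {
    throw new Error('No entities found in the Comprehension');
  }
  // Fall back to the first entity key when no entity name is given.
  const entity = entityName || entityKeys[0];
  // A missing entity yields an empty array; GUID keys are dropped.
  return Object.values(comprehension[entity] || {});
}

const bookComprehension = {
  Book: {
    Book_1: { GUIDBook: 'Book_1', Title: 'The Hunger Games', Language: 'eng' },
    Book_2: { GUIDBook: 'Book_2', Title: 'Harry Potter', Language: 'eng' }
  }
};

console.log(comprehensionToArray(bookComprehension));
```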
51
+
52
+ ## Examples
53
+
54
+ ### Basic conversion with explicit entity
55
+
56
+ ```shell
57
+ mdwint comprehensionarray ./books.json -e Book -o books-array.json
58
+ ```
59
+
60
+ ### Auto-detect entity name
61
+
62
+ ```shell
63
+ mdwint comprehensionarray ./books.json -o books-array.json
64
+ ```
65
+
66
+ If `-e` is omitted, the entity name is inferred from the first key in the Comprehension. When the Comprehension contains a single entity, auto-detection always selects it.
67
+
68
+ ### Using the alias
69
+
70
+ ```shell
71
+ mdwint array ./books.json -e Book -o books-array.json
72
+ ```
73
+
74
+ ### Pipeline: Comprehension to CSV export
75
+
76
+ This command is commonly used as an intermediate step when exporting to CSV:
77
+
78
+ ```shell
79
+ # Step 1: Convert object Comprehension to array
80
+ mdwint comprehensionarray ./store.json -e Book -o books-array.json
81
+
82
+ # Step 2: Export array to CSV
83
+ mdwint objectarraytocsv ./books-array.json -o books.csv
84
+ ```
85
+
86
+ ### Extract one entity from a multi-entity Comprehension
87
+
88
+ ```shell
89
+ # Extract just the Author records from a multi-entity file
90
+ mdwint comprehensionarray ./bookstore.json -e Author -o authors-array.json
91
+
92
+ # Extract just the BookAuthorJoin records
93
+ mdwint comprehensionarray ./bookstore.json -e BookAuthorJoin -o joins-array.json
94
+ ```
95
+
96
+ ## Tips
97
+
98
+ - When working with multi-entity Comprehensions, always specify the `-e` flag to select which entity to extract. Without it, only the first entity key is used.
99
+ - This command is non-destructive to the input file. The original Comprehension is not modified.
100
+ - The output array follows JavaScript's object key enumeration order: integer-like keys come first in ascending numeric order, followed by all other keys in insertion order.
101
+
102
+ ## Notes
103
+
104
+ - If the specified entity does not exist in the Comprehension, the output will be an empty array.
105
+ - If no entities are found in the Comprehension file, the command will error.
106
+
107
+ ## See Also
108
+
109
+ - [objectarraytocsv](objectarraytocsv.md) -- Convert a JSON array to CSV format
110
+ - [csvtransform](csvtransform.md) -- Create Comprehensions from CSV files
111
+ - [Comprehensions](../comprehensions.md) -- Object vs. array format documentation
@@ -0,0 +1,132 @@
1
+ # comprehensionintersect
2
+
3
+ Merge two Comprehension JSON files together. Records with the same GUID in both files are merged, with values from the secondary file overwriting values in the primary file. This is useful when the same entities have data spread across multiple source files.
4
+
5
+ **Aliases:** `intersect`
6
+
7
+ ## Usage
8
+
9
+ ```shell
10
+ mdwint comprehensionintersect <primary_file> -i <secondary_file> [options]
11
+ ```
12
+
13
+ ## Arguments
14
+
15
+ | Argument | Required | Description |
16
+ |----------|----------|-------------|
17
+ | `<primary_file>` | Yes | Path to the primary Comprehension file |
18
+
19
+ ## Options
20
+
21
+ | Option | Description | Default |
22
+ |--------|-------------|---------|
23
+ | `-i, --intersect <path>` | Path to the secondary Comprehension file to merge with the primary | (required) |
24
+ | `-e, --entity <name>` | Entity name to merge | Auto-detected from the first key of the primary Comprehension |
25
+ | `-o, --output <path>` | Output file path | `./Intersected-Comprehension-<filename>.json` |
26
+
27
+ ## How Merging Works
28
+
29
+ The intersect operation iterates over every record in the secondary Comprehension and matches it to the primary Comprehension by GUID:
30
+
31
+ - **Matching GUID found in primary**: The record fields are merged. Fields from the secondary file overwrite fields in the primary file. Fields that exist only in the primary are preserved.
32
+ - **No matching GUID in primary**: The record from the secondary file is added to the result.
33
+ - **GUID exists only in primary**: The record is preserved as-is in the output.
34
+
35
+ This behavior makes `comprehensionintersect` ideal for enriching records with data from additional sources.
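
The three merge rules can be expressed as a short function. This is a minimal sketch, not the command's implementation: `intersectComprehensions` is a hypothetical name, and returning only the merged entity in the result is an assumption based on the output format shown in this document.

```javascript
// Hypothetical sketch of the merge rules described above.
function intersectComprehensions(primary, secondary, entityName) {
  // Fall back to the first key of the primary Comprehension.
  const entity = entityName || Object.keys(primary)[0];
  // Start from the primary records so primary-only GUIDs are preserved.
  const merged = Object.assign({}, primary[entity]);
  for (const [guid, record] of Object.entries(secondary[entity] || {})) {
    // Matching GUID: secondary fields overwrite primary fields.
    // New GUID: merged[guid] is undefined, so the secondary record is added as-is.
    merged[guid] = Object.assign({}, merged[guid], record);
  }
  return { [entity]: merged };
}

const costs = {
  Neighborhood: {
    'Capitol Hill': { GUIDNeighborhood: 'Capitol Hill', MedianHomeValue: '600000', MedianRent: '1850' },
    'Fremont': { GUIDNeighborhood: 'Fremont', MedianRent: '1700' }
  }
};
const demographics = {
  Neighborhood: {
    'Capitol Hill': { GUIDNeighborhood: 'Capitol Hill', MedianHomeValue: '625000', Population: '32000', MedianAge: '33' },
    'Ballard': { GUIDNeighborhood: 'Ballard', Population: '25000' }
  }
};

const mergedNeighborhoods = intersectComprehensions(costs, demographics);
console.log(JSON.stringify(mergedNeighborhoods, null, 2));
```

The inputs are not modified; each merged record is a new object built from the primary and secondary values.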
36
+
37
+ ## Output Format
38
+
39
+ The output is a standard Comprehension JSON object containing the merged records:
40
+
41
+ ```json
42
+ {
43
+ "Neighborhood": {
44
+ "Capitol Hill": {
45
+ "GUIDNeighborhood": "Capitol Hill",
46
+ "MedianHomeValue": "625000",
47
+ "MedianRent": "1850",
48
+ "Population": "32000",
49
+ "MedianAge": "33"
50
+ }
51
+ }
52
+ }
53
+ ```
54
+
55
+ ## Examples
56
+
57
+ ### Basic intersection
58
+
59
+ ```shell
60
+ mdwint comprehensionintersect set1.json \
61
+ -i set2.json \
62
+ -e Document \
63
+ -o merged.json
64
+ ```
65
+
66
+ ### Auto-detect entity name
67
+
68
+ ```shell
69
+ mdwint comprehensionintersect set1.json \
70
+ -i set2.json \
71
+ -o merged.json
72
+ ```
73
+
74
+ If `-e` is omitted, the entity name is inferred from the first key in the primary Comprehension file.
75
+
76
+ ### Chain multiple intersections
77
+
78
+ When you have data spread across three or more source files, chain the intersect operations:
79
+
80
+ ```shell
81
+ # Step 1: Transform each source to a Comprehension
82
+ mdwint csvtransform housing_chars.csv \
83
+ -e Neighborhood -n GUIDNeighborhood \
84
+ -g "{~D:Record.Neighborhood Name~}" \
85
+ -o set_chars.json
86
+
87
+ mdwint csvtransform housing_costs.csv \
88
+ -e Neighborhood -n GUIDNeighborhood \
89
+ -g "{~D:Record.Neighborhood Name~}" \
90
+ -o set_costs.json
91
+
92
+ mdwint csvtransform demographics.csv \
93
+ -e Neighborhood -n GUIDNeighborhood \
94
+ -g "{~D:Record.Neighborhood Name~}" \
95
+ -o set_demographics.json
96
+
97
+ # Step 2: Merge the first two
98
+ mdwint comprehensionintersect set_chars.json \
99
+ -i set_costs.json \
100
+ -e Neighborhood \
101
+ -o merged.json
102
+
103
+ # Step 3: Merge the third into the result
104
+ mdwint comprehensionintersect merged.json \
105
+ -i set_demographics.json \
106
+ -e Neighborhood \
107
+ -o merged.json
108
+ ```
109
+
110
+ ### Using the alias
111
+
112
+ ```shell
113
+ mdwint intersect primary.json -i secondary.json -o result.json
114
+ ```
115
+
116
+ ## Tips
117
+
118
+ - The GUID template must be identical across the source Comprehensions for records to match. Use the same `-g` value or the same mapping file GUID template when generating the source Comprehensions.
119
+ - The secondary file's values overwrite the primary file's values for matching fields. If you want the primary's values to take priority, swap the file arguments.
120
+ - You can intersect Comprehensions that were generated from different file formats (e.g., one from CSV, one from TSV, one from JSON array).
121
+ - For large merge operations with many source files, consider writing a shell script that chains the intersect calls sequentially.
122
+
123
+ ## Notes
124
+
125
+ - The `-i` option is required. The command will error if no secondary Comprehension file is specified.
126
+ - If no entity is specified and the primary Comprehension has multiple entity keys, the first key is used.
127
+ - The output file can be the same as one of the input files, allowing in-place merge operations.
128
+
129
+ ## See Also
130
+
131
+ - [csvtransform](csvtransform.md) -- Generate Comprehensions from CSV files
132
+ - [Comprehensions](../comprehensions.md) -- Core data structure and merging concepts