meadow-integration 1.0.5 → 1.0.6
- package/.dockerignore +11 -0
- package/Docker-Build.sh +2 -0
- package/Docker-Compose.sh +2 -0
- package/Docker-Push.sh +2 -0
- package/Docker-Tag.sh +2 -0
- package/Dockerfile +28 -0
- package/Dockerfile_LUXURYCode +23 -0
- package/README.md +139 -25
- package/docker-compose.yml +16 -0
- package/docs/README.md +65 -18
- package/docs/_cover.md +3 -2
- package/docs/_sidebar.md +52 -7
- package/docs/_topbar.md +2 -0
- package/docs/api/clone-rest-client.md +278 -0
- package/docs/api/connection-manager.md +179 -0
- package/docs/api/guid-map.md +234 -0
- package/docs/api/integration-adapter.md +283 -0
- package/docs/api/operation.md +241 -0
- package/docs/api/sync-entity-initial.md +227 -0
- package/docs/api/sync-entity-ongoing.md +244 -0
- package/docs/api/sync.md +213 -0
- package/docs/api/tabular-check.md +213 -0
- package/docs/api/tabular-transform.md +316 -0
- package/docs/architecture.md +423 -0
- package/docs/cli/comprehensionarray.md +111 -0
- package/docs/cli/comprehensionintersect.md +132 -0
- package/docs/cli/csvcheck.md +111 -0
- package/docs/cli/csvtransform.md +170 -0
- package/docs/cli/data-clone.md +277 -0
- package/docs/cli/jsonarraytransform.md +166 -0
- package/docs/cli/load-comprehension.md +129 -0
- package/docs/cli/objectarraytocsv.md +159 -0
- package/docs/cli/overview.md +96 -0
- package/docs/cli/serve.md +102 -0
- package/docs/cli/tsvtransform.md +144 -0
- package/docs/data-clone/configuration.md +357 -0
- package/docs/data-clone/connection-manager.md +206 -0
- package/docs/data-clone/docker.md +290 -0
- package/docs/data-clone/overview.md +173 -0
- package/docs/data-clone/sync-modes.md +186 -0
- package/docs/implementation-reference.md +311 -0
- package/docs/overview.md +156 -0
- package/docs/quickstart.md +233 -0
- package/docs/rest/comprehension-push.md +209 -0
- package/docs/rest/comprehension.md +506 -0
- package/docs/rest/csv.md +255 -0
- package/docs/rest/entity-generation.md +158 -0
- package/docs/rest/json-array.md +243 -0
- package/docs/rest/overview.md +120 -0
- package/docs/rest/status.md +63 -0
- package/docs/rest/tsv.md +241 -0
- package/docs/retold-catalog.json +93 -3
- package/docs/retold-keyword-index.json +23683 -1901
- package/package.json +6 -3
- package/scripts/run.sh +18 -0
- package/source/Meadow-Integration.js +15 -1
- package/source/cli/Default-Meadow-Integration-Configuration.json +37 -2
- package/source/cli/Meadow-Integration-CLI-Program.js +4 -1
- package/source/cli/commands/Meadow-Integration-Command-DataClone.js +284 -0
- package/source/services/clone/Meadow-Service-ConnectionManager.js +251 -0
- package/source/services/clone/Meadow-Service-Operation.js +196 -0
- package/source/services/clone/Meadow-Service-RestClient.js +364 -0
- package/source/services/clone/Meadow-Service-Sync-Entity-Initial.js +367 -0
- package/source/services/clone/Meadow-Service-Sync-Entity-Ongoing.js +457 -0
- package/source/services/clone/Meadow-Service-Sync.js +142 -0
- /package/docs/examples/bookstore/{mapping_books_Author.json → mapping_books_author.json} +0 -0
- /package/docs/examples/bookstore/{mapping_books_Book.json → mapping_books_book.json} +0 -0
|
@@ -0,0 +1,423 @@
|
|
|
1
|
+
# Architecture
|
|
2
|
+
|
|
3
|
+
This document describes the architectural design of Meadow Integration, covering both the data transformation and data synchronization pipelines.
|
|
4
|
+
|
|
5
|
+
## High-Level System Architecture
|
|
6
|
+
|
|
7
|
+
Meadow Integration sits between external data sources and the Meadow data access layer. It provides three interfaces (CLI, REST, Programmatic) that share a common set of services.
|
|
8
|
+
|
|
9
|
+
```mermaid
|
|
10
|
+
flowchart TB
|
|
11
|
+
subgraph External["External Data Sources"]
|
|
12
|
+
CSV["CSV Files"]
|
|
13
|
+
TSV["TSV Files"]
|
|
14
|
+
JSON["JSON Arrays"]
|
|
15
|
+
API["Remote Meadow API"]
|
|
16
|
+
end
|
|
17
|
+
|
|
18
|
+
subgraph Interfaces["Interfaces"]
|
|
19
|
+
CLI["CLI Program\n(pict-service-commandlineutility)"]
|
|
20
|
+
REST["REST Server\n(Orator + Restify)"]
|
|
21
|
+
PROG["Programmatic API\n(require meadow-integration)"]
|
|
22
|
+
end
|
|
23
|
+
|
|
24
|
+
subgraph Services["Core Services"]
|
|
25
|
+
TC["TabularCheck"]
|
|
26
|
+
TT["TabularTransform"]
|
|
27
|
+
IA["IntegrationAdapter"]
|
|
28
|
+
GM["GUIDMap"]
|
|
29
|
+
CM["ConnectionManager"]
|
|
30
|
+
CRC["CloneRestClient"]
|
|
31
|
+
SYNC["Sync"]
|
|
32
|
+
SEI["SyncEntityInitial"]
|
|
33
|
+
SEO["SyncEntityOngoing"]
|
|
34
|
+
OP["Operation"]
|
|
35
|
+
end
|
|
36
|
+
|
|
37
|
+
subgraph Targets["Targets"]
|
|
38
|
+
MAPI["Meadow REST API\n(write)"]
|
|
39
|
+
DB["Local Database\n(MySQL / MSSQL)"]
|
|
40
|
+
end
|
|
41
|
+
|
|
42
|
+
CSV --> CLI
|
|
43
|
+
TSV --> CLI
|
|
44
|
+
JSON --> CLI
|
|
45
|
+
CSV --> REST
|
|
46
|
+
TSV --> REST
|
|
47
|
+
JSON --> REST
|
|
48
|
+
|
|
49
|
+
CLI --> TC
|
|
50
|
+
CLI --> TT
|
|
51
|
+
CLI --> IA
|
|
52
|
+
CLI --> SYNC
|
|
53
|
+
REST --> TC
|
|
54
|
+
REST --> TT
|
|
55
|
+
REST --> IA
|
|
56
|
+
PROG --> TC
|
|
57
|
+
PROG --> TT
|
|
58
|
+
PROG --> IA
|
|
59
|
+
PROG --> SYNC
|
|
60
|
+
|
|
61
|
+
TT --> IA
|
|
62
|
+
IA --> GM
|
|
63
|
+
IA --> MAPI
|
|
64
|
+
|
|
65
|
+
API --> CRC
|
|
66
|
+
CRC --> SYNC
|
|
67
|
+
SYNC --> SEI
|
|
68
|
+
SYNC --> SEO
|
|
69
|
+
SEI --> CM
|
|
70
|
+
SEO --> CM
|
|
71
|
+
CM --> DB
|
|
72
|
+
|
|
73
|
+
SYNC --> OP
|
|
74
|
+
SEI --> OP
|
|
75
|
+
SEO --> OP
|
|
76
|
+
```
|
|
77
|
+
|
|
78
|
+
The system divides cleanly into two pipelines that share the `Operation` utility for timing and progress tracking.
|
|
79
|
+
|
|
80
|
+
## Data Transformation Pipeline
|
|
81
|
+
|
|
82
|
+
The transformation pipeline converts tabular data into Meadow entity records. Each stage is a discrete service that can be used independently.
|
|
83
|
+
|
|
84
|
+
```mermaid
|
|
85
|
+
flowchart LR
|
|
86
|
+
subgraph Input["1. Parse"]
|
|
87
|
+
FILE["Source File\n(CSV / TSV / JSON)"]
|
|
88
|
+
PARSE["Stream Parser\n(csv-parser / JSON.parse)"]
|
|
89
|
+
end
|
|
90
|
+
|
|
91
|
+
subgraph Transform["2. Transform"]
|
|
92
|
+
RECORD["Raw Record"]
|
|
93
|
+
MAPPING["Mapping Config\n(Implicit + Explicit + User)"]
|
|
94
|
+
TEMPLATE["Pict Template Engine\n({~D:Record.col~})"]
|
|
95
|
+
SOLVER["Solvers\n(Expression Parser)"]
|
|
96
|
+
end
|
|
97
|
+
|
|
98
|
+
subgraph Collect["3. Collect"]
|
|
99
|
+
COMP["Comprehension\n{Entity: {GUID: Record}}"]
|
|
100
|
+
MERGE["Intersect\n(merge by GUID)"]
|
|
101
|
+
end
|
|
102
|
+
|
|
103
|
+
subgraph Push["4. Push"]
|
|
104
|
+
ADAPTER["IntegrationAdapter"]
|
|
105
|
+
MARSHAL["Marshal Record\n(schema validation,\nstring truncation,\nGUID prefixing)"]
|
|
106
|
+
GUIDMAP["GUIDMap\n(external <-> Meadow IDs)"]
|
|
107
|
+
UPSERT["Upsert\n(single or bulk)"]
|
|
108
|
+
end
|
|
109
|
+
|
|
110
|
+
FILE --> PARSE --> RECORD
|
|
111
|
+
MAPPING --> TEMPLATE
|
|
112
|
+
RECORD --> TEMPLATE --> COMP
|
|
113
|
+
RECORD --> SOLVER --> COMP
|
|
114
|
+
COMP --> MERGE --> COMP
|
|
115
|
+
COMP --> ADAPTER --> MARSHAL --> GUIDMAP --> UPSERT
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
### Stage Details
|
|
119
|
+
|
|
120
|
+
**Parse** -- CSV and TSV files are streamed through a parser that emits one record per row. JSON Array files are loaded and iterated. The `TabularCheck` service can analyze records without transforming them, producing column statistics.
|
|
121
|
+
|
|
122
|
+
**Transform** -- The `TabularTransform` service applies a three-layer configuration cascade to each record:
|
|
123
|
+
|
|
124
|
+
1. **Implicit** -- Auto-generated from the first record's keys (column names become field names, the first column is used for GUID generation)
|
|
125
|
+
2. **Explicit** -- Loaded from a mapping file that specifies entity name, GUID template, and column-to-field mappings
|
|
126
|
+
3. **User** -- Command-line overrides for entity name, GUID name, GUID template, and inline column mappings
|
|
127
|
+
|
|
128
|
+
Each layer merges on top of the previous one using `Object.assign`, so User settings always win.
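
A minimal sketch of this cascade, assuming three plain configuration objects. The key names (`Entity`, `GUIDTemplate`) and values are hypothetical and chosen only for illustration; they are not taken from the package's actual mapping schema.

```javascript
// Hypothetical layer objects; key names and values are illustrative only.
const implicitConfig = { Entity: 'books', GUIDTemplate: '{~D:Record.id~}' };
const explicitConfig = { Entity: 'Book', GUIDTemplate: 'Book_{~D:Record.id~}' };
const userConfig = { Entity: 'Novel' };

// Object.assign copies properties left to right, so later layers overwrite
// earlier ones key by key and the User layer has the final say.
const resolvedConfig = Object.assign({}, implicitConfig, explicitConfig, userConfig);

console.log(resolvedConfig);
```

Here `Entity` comes from the User layer and `GUIDTemplate` from the Explicit layer. Note that `Object.assign` is a shallow merge: a nested object in a later layer replaces the earlier nested object wholesale rather than being merged into it.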
|
|
129
|
+
|
|
130
|
+
Pict template expressions resolve column values at transformation time. Solvers (powered by the Fable Expression Parser) enable multi-entity extraction from a single source row by dynamically generating multiple GUID uniqueness entries.
|
|
131
|
+
|
|
132
|
+
**Collect** -- Transformed records accumulate in a Comprehension object. Records with duplicate GUIDs within the same parse are merged. Records can also be merged with an existing Comprehension loaded from disk.
|
|
133
|
+
|
|
134
|
+
**Push** -- The `IntegrationAdapter` marshals comprehension records into Meadow-compatible format. It fetches the target entity schema from the Meadow API, validates field types, truncates strings that exceed schema-defined sizes, and strips reserved columns (`CreateDate`, `UpdateDate`, `Deleted`, `DeleteDate`). Cross-entity GUID references are resolved through the `GUIDMap`. Records are pushed via upsert -- individually for small sets, or in configurable bulk batches (default threshold: 1000 records, batch size: 100) for large sets.
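
The choice between individual and bulk upserts might look roughly like the sketch below. The function name `planUpserts` is hypothetical, and whether the real threshold comparison is inclusive or exclusive is an assumption; only the default values (1000 and 100) come from the description above.

```javascript
// Illustrative sketch only; not the IntegrationAdapter's actual code.
// Returns the groups of records that would be sent per upsert request.
function planUpserts(records, bulkThreshold = 1000, batchSize = 100) {
  // Assumption: sets below the threshold are pushed one record at a time.
  if (records.length < bulkThreshold) {
    return records.map((record) => [record]);
  }
  // Larger sets are sliced into fixed-size batches; the last may be shorter.
  const batches = [];
  for (let i = 0; i < records.length; i += batchSize) {
    batches.push(records.slice(i, i + batchSize));
  }
  return batches;
}

const makeRecords = (count) => Array.from({ length: count }, (_, i) => ({ ID: i + 1 }));

const smallPlan = planUpserts(makeRecords(3));
const largePlan = planUpserts(makeRecords(1050));

console.log(smallPlan.length, largePlan.length);
```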
|
|
135
|
+
|
|
136
|
+
## Data Synchronization Pipeline
|
|
137
|
+
|
|
138
|
+
The Data Clone pipeline replicates entity data from a remote Meadow API into a local relational database.
|
|
139
|
+
|
|
140
|
+
```mermaid
|
|
141
|
+
flowchart TB
|
|
142
|
+
subgraph Config["Configuration"]
|
|
143
|
+
MCFG[".meadow.config.json"]
|
|
144
|
+
SCHEMA["Extended Schema JSON\n(from Stricture)"]
|
|
145
|
+
CLIOPTS["CLI Overrides\n(--api_server, --db_host, etc.)"]
|
|
146
|
+
end
|
|
147
|
+
|
|
148
|
+
subgraph Auth["1. Authenticate"]
|
|
149
|
+
CRC["CloneRestClient"]
|
|
150
|
+
SESSION["Session Management\n(cookie / token)"]
|
|
151
|
+
end
|
|
152
|
+
|
|
153
|
+
subgraph Connect["2. Connect"]
|
|
154
|
+
CM["ConnectionManager"]
|
|
155
|
+
POOL["Connection Pool\n(MySQL or MSSQL)"]
|
|
156
|
+
end
|
|
157
|
+
|
|
158
|
+
subgraph Init["3. Initialize Schema"]
|
|
159
|
+
LOAD["Load Extended Schema"]
|
|
160
|
+
CREATE["Create Tables\n(if not exist)"]
|
|
161
|
+
INDEX["Create Indexes\n(GUID unique, Deleted)"]
|
|
162
|
+
end
|
|
163
|
+
|
|
164
|
+
subgraph Sync["4. Sync Entities"]
|
|
165
|
+
COMPARE["Compare\nlocal vs. remote\n(max ID, count, UpdateDate)"]
|
|
166
|
+
DOWNLOAD["Download Pages\n(filtered + sorted)"]
|
|
167
|
+
WRITE["Marshal + Write\n(create or update)"]
|
|
168
|
+
PROGRESS["Progress Tracking\n(Operation service)"]
|
|
169
|
+
end
|
|
170
|
+
|
|
171
|
+
MCFG --> CRC
|
|
172
|
+
CLIOPTS --> CRC
|
|
173
|
+
MCFG --> CM
|
|
174
|
+
CLIOPTS --> CM
|
|
175
|
+
SCHEMA --> LOAD
|
|
176
|
+
|
|
177
|
+
CRC --> SESSION --> DOWNLOAD
|
|
178
|
+
CM --> POOL --> WRITE
|
|
179
|
+
LOAD --> CREATE --> INDEX
|
|
180
|
+
|
|
181
|
+
COMPARE --> DOWNLOAD --> WRITE
|
|
182
|
+
WRITE --> PROGRESS
|
|
183
|
+
```
|
|
184
|
+
|
|
185
|
+
### Stage Details
|
|
186
|
+
|
|
187
|
+
**Authenticate** -- The `CloneRestClient` authenticates with the remote Meadow API by posting credentials to `/Authenticate`. The resulting session cookie or token is attached to all subsequent requests. If no credentials are configured, authentication is skipped (for unauthenticated APIs). HTTP keep-alive is enabled for connection reuse.
|
|
188
|
+
|
|
189
|
+
**Connect** -- The `ConnectionManager` establishes a connection pool to the local database. It supports MySQL (via `meadow-connection-mysql`) and MSSQL (via `meadow-connection-mssql`). The provider is selected by the `Destination.Provider` configuration key.
|
|
190
|
+
|
|
191
|
+
**Initialize Schema** -- The Meadow extended schema JSON (produced by Stricture's `build` command) is loaded. For each entity in the schema (or a configured subset), the sync service uses the Meadow provider to create the table if it does not exist. It then creates a unique index on the GUID column and a non-unique index on the Deleted column using the `ConnectionManager`.
|
|
192
|
+
|
|
193
|
+
**Sync Entities** -- Entities are synced sequentially in the order defined by `SyncEntityList` (or schema order if the list is empty). Two sync strategies are available:
|
|
194
|
+
|
|
195
|
+
- **Initial** -- Queries the local max ID and the remote max ID and record count. Generates paginated URL partials filtered to records with IDs greater than the local maximum. Downloads each page and creates records locally with identity insert enabled so primary keys match the remote system.
|
|
196
|
+
- **Ongoing** -- Extends Initial sync with `UpdateDate` comparison. After identifying new records by ID, it also compares `UpdateDate` timestamps. Records where the remote `UpdateDate` differs from the local `UpdateDate` by more than 5 milliseconds are updated. This handles both new records and modifications.
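
The per-record decision in Ongoing mode can be sketched as follows. The function name `decideSyncAction` and the record shapes are hypothetical; only the 5 millisecond tolerance and the create / update / skip outcomes come from the description above.

```javascript
// Illustrative sketch only; not the SyncEntityOngoing implementation.
const UPDATE_TOLERANCE_MS = 5;

function decideSyncAction(localRecord, remoteRecord) {
  // No local copy yet: the record is new and must be created.
  if (!localRecord) {
    return 'create';
  }
  // Compare timestamps in milliseconds since the epoch.
  const remoteTime = new Date(remoteRecord.UpdateDate).getTime();
  const localTime = new Date(localRecord.UpdateDate).getTime();
  // Differences within the tolerance are treated as "unchanged".
  return Math.abs(remoteTime - localTime) > UPDATE_TOLERANCE_MS ? 'update' : 'skip';
}

const remote = { IDBook: 1, UpdateDate: '2024-01-01T00:00:00.010Z' };

const actionNew = decideSyncAction(null, remote);
const actionSame = decideSyncAction({ IDBook: 1, UpdateDate: '2024-01-01T00:00:00.007Z' }, remote);
const actionChanged = decideSyncAction({ IDBook: 1, UpdateDate: '2024-01-01T00:00:00.000Z' }, remote);

console.log(actionNew, actionSame, actionChanged);
```

A small tolerance like this is a common way to absorb sub-millisecond rounding differences between database engines, so that identical records are not rewritten on every run.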
|
|
197
|
+
|
|
198
|
+
## Service Dependency Diagram
|
|
199
|
+
|
|
200
|
+
All services extend `fable-serviceproviderbase` and register with a Fable instance. The diagram below shows the dependency relationships.
|
|
201
|
+
|
|
202
|
+
```mermaid
|
|
203
|
+
classDiagram
|
|
204
|
+
class FableServiceProviderBase {
|
|
205
|
+
+fable
|
|
206
|
+
+options
|
|
207
|
+
+log
|
|
208
|
+
+serviceType
|
|
209
|
+
}
|
|
210
|
+
|
|
211
|
+
class TabularCheck {
|
|
212
|
+
+serviceType: TabularCheck
|
|
213
|
+
+newStatisticsObject()
|
|
214
|
+
+collectStatistics()
|
|
215
|
+
}
|
|
216
|
+
|
|
217
|
+
class TabularTransform {
|
|
218
|
+
+serviceType: TabularTransform
|
|
219
|
+
+newMappingOutcomeObject()
|
|
220
|
+
+transformRecord()
|
|
221
|
+
+addRecordToComprehension()
|
|
222
|
+
+createRecordFromMapping()
|
|
223
|
+
}
|
|
224
|
+
|
|
225
|
+
class IntegrationAdapter {
|
|
226
|
+
+serviceType: IntegrationAdapter
|
|
227
|
+
+Entity
|
|
228
|
+
+addSourceRecord()
|
|
229
|
+
+integrateRecords()
|
|
230
|
+
+marshalRecord()
|
|
231
|
+
+pushRecordsToServer()
|
|
232
|
+
}
|
|
233
|
+
|
|
234
|
+
class GUIDMap {
|
|
235
|
+
+serviceType: MeadowGUIDMap
|
|
236
|
+
+mapGUIDToID()
|
|
237
|
+
+getIDFromGUID()
|
|
238
|
+
+mapExternalGUIDtoMeadowGUID()
|
|
239
|
+
+getMeadowIDFromExternalGUID()
|
|
240
|
+
}
|
|
241
|
+
|
|
242
|
+
class ConnectionManager {
|
|
243
|
+
+serviceType: MeadowConnectionManager
|
|
244
|
+
+Provider
|
|
245
|
+
+ConnectionPool
|
|
246
|
+
+connect()
|
|
247
|
+
+createIndex()
|
|
248
|
+
}
|
|
249
|
+
|
|
250
|
+
class CloneRestClient {
|
|
251
|
+
+serviceType: MeadowCloneRestClient
|
|
252
|
+
+serverURL
|
|
253
|
+
+authenticate()
|
|
254
|
+
+deauthenticate()
|
|
255
|
+
+getJSON()
|
|
256
|
+
+upsertEntity()
|
|
257
|
+
+getEntitySet()
|
|
258
|
+
}
|
|
259
|
+
|
|
260
|
+
class Sync {
|
|
261
|
+
+serviceType: MeadowSync
|
|
262
|
+
+SyncMode
|
|
263
|
+
+SyncEntityList
|
|
264
|
+
+loadMeadowSchema()
|
|
265
|
+
+syncEntity()
|
|
266
|
+
+syncAll()
|
|
267
|
+
}
|
|
268
|
+
|
|
269
|
+
class SyncEntityInitial {
|
|
270
|
+
+serviceType: MeadowSyncEntityInitial
|
|
271
|
+
+EntitySchema
|
|
272
|
+
+PageSize
|
|
273
|
+
+initialize()
|
|
274
|
+
+sync()
|
|
275
|
+
+marshalRecord()
|
|
276
|
+
}
|
|
277
|
+
|
|
278
|
+
class SyncEntityOngoing {
|
|
279
|
+
+serviceType: MeadowSyncEntityOngoing
|
|
280
|
+
+EntitySchema
|
|
281
|
+
+PageSize
|
|
282
|
+
+initialize()
|
|
283
|
+
+sync()
|
|
284
|
+
+marshalRecord()
|
|
285
|
+
}
|
|
286
|
+
|
|
287
|
+
class Operation {
|
|
288
|
+
+timeStamps
|
|
289
|
+
+progressTrackers
|
|
290
|
+
+createTimeStamp()
|
|
291
|
+
+createProgressTracker()
|
|
292
|
+
+printProgressTrackerStatus()
|
|
293
|
+
}
|
|
294
|
+
|
|
295
|
+
FableServiceProviderBase <|-- TabularCheck
|
|
296
|
+
FableServiceProviderBase <|-- TabularTransform
|
|
297
|
+
FableServiceProviderBase <|-- IntegrationAdapter
|
|
298
|
+
FableServiceProviderBase <|-- GUIDMap
|
|
299
|
+
FableServiceProviderBase <|-- ConnectionManager
|
|
300
|
+
FableServiceProviderBase <|-- CloneRestClient
|
|
301
|
+
FableServiceProviderBase <|-- Sync
|
|
302
|
+
FableServiceProviderBase <|-- SyncEntityInitial
|
|
303
|
+
FableServiceProviderBase <|-- SyncEntityOngoing
|
|
304
|
+
|
|
305
|
+
IntegrationAdapter --> GUIDMap : uses
|
|
306
|
+
Sync --> SyncEntityInitial : creates
|
|
307
|
+
Sync --> SyncEntityOngoing : creates
|
|
308
|
+
SyncEntityInitial --> Operation : uses
|
|
309
|
+
SyncEntityOngoing --> Operation : uses
|
|
310
|
+
SyncEntityInitial ..> CloneRestClient : reads from
|
|
311
|
+
SyncEntityOngoing ..> CloneRestClient : reads from
|
|
312
|
+
SyncEntityInitial ..> ConnectionManager : writes to
|
|
313
|
+
SyncEntityOngoing ..> ConnectionManager : writes to
|
|
314
|
+
```
|
|
315
|
+
|
|
316
|
+
## Configuration Cascade
|
|
317
|
+
|
|
318
|
+
Configuration for the CLI flows through multiple layers, each overriding the previous.
|
|
319
|
+
|
|
320
|
+
```mermaid
|
|
321
|
+
flowchart LR
|
|
322
|
+
DEF["Default Configuration\n(Default-Meadow-Integration-\nConfiguration.json)"]
|
|
323
|
+
FILE[".meadow.config.json\n(working directory)"]
|
|
324
|
+
CLI["Command-Line Flags\n(--api_server, --db_host, etc.)"]
|
|
325
|
+
|
|
326
|
+
DEF -->|"base"| MERGED["Resolved Configuration"]
|
|
327
|
+
FILE -->|"overrides"| MERGED
|
|
328
|
+
CLI -->|"overrides"| MERGED
|
|
329
|
+
```
|
|
330
|
+
|
|
331
|
+
For data transformation, the mapping configuration has its own three-layer cascade:
|
|
332
|
+
|
|
333
|
+
```mermaid
|
|
334
|
+
flowchart LR
|
|
335
|
+
IMP["Implicit\n(auto-detected from\nfirst record)"]
|
|
336
|
+
EXP["Explicit\n(mapping file\nvia -m flag)"]
|
|
337
|
+
USR["User\n(CLI flags:\n-e, -g, -n, -c)"]
|
|
338
|
+
|
|
339
|
+
IMP -->|"base"| FINAL["Final Mapping Config"]
|
|
340
|
+
EXP -->|"overrides"| FINAL
|
|
341
|
+
USR -->|"overrides"| FINAL
|
|
342
|
+
```
|
|
343
|
+
|
|
344
|
+
## Sync Mode Comparison
|
|
345
|
+
|
|
346
|
+
The two sync modes serve different purposes and have different performance characteristics.
|
|
347
|
+
|
|
348
|
+
```mermaid
|
|
349
|
+
flowchart TB
|
|
350
|
+
subgraph Initial["Initial Sync"]
|
|
351
|
+
I1["Query local max ID"]
|
|
352
|
+
I2["Query remote max ID + count"]
|
|
353
|
+
I3["Generate paginated URL partials\n(filter: ID > local max)"]
|
|
354
|
+
I4["Download each page"]
|
|
355
|
+
I5["For each record:\nRead local by ID"]
|
|
356
|
+
I6{"Record\nexists?"}
|
|
357
|
+
I7["Skip"]
|
|
358
|
+
I8["Create with\nidentity insert"]
|
|
359
|
+
|
|
360
|
+
I1 --> I2 --> I3 --> I4 --> I5 --> I6
|
|
361
|
+
I6 -->|"yes"| I7
|
|
362
|
+
I6 -->|"no"| I8
|
|
363
|
+
end
|
|
364
|
+
|
|
365
|
+
subgraph Ongoing["Ongoing Sync"]
|
|
366
|
+
O1["Query local max ID + UpdateDate"]
|
|
367
|
+
O2["Query remote max ID + UpdateDate + count"]
|
|
368
|
+
O3["Iterate all records\n(paginated, ID ascending)"]
|
|
369
|
+
O4["For each record:\nRead local by ID"]
|
|
370
|
+
O5{"Record\nexists?"}
|
|
371
|
+
O6{"UpdateDate\ndifference\n> 5ms?"}
|
|
372
|
+
O7["Update record"]
|
|
373
|
+
O8["Skip"]
|
|
374
|
+
O9["Create with\nidentity insert"]
|
|
375
|
+
|
|
376
|
+
O1 --> O2 --> O3 --> O4 --> O5
|
|
377
|
+
O5 -->|"yes"| O6
|
|
378
|
+
O5 -->|"no"| O9
|
|
379
|
+
O6 -->|"yes"| O7
|
|
380
|
+
O6 -->|"no"| O8
|
|
381
|
+
end
|
|
382
|
+
```
|
|
383
|
+
|
|
384
|
+
| Aspect | Initial | Ongoing |
|
|
385
|
+
|--------|---------|---------|
|
|
386
|
+
| **Purpose** | First-time clone or catch-up | Incremental sync of changes |
|
|
387
|
+
| **Strategy** | Only downloads records with IDs above local max | Walks all records and compares timestamps |
|
|
388
|
+
| **Handles new records** | Yes | Yes |
|
|
389
|
+
| **Handles updates** | No | Yes (by UpdateDate comparison) |
|
|
390
|
+
| **Performance** | Faster for first clone (skips existing) | Slower per run but keeps data current |
|
|
391
|
+
| **Typical usage** | Run once, then switch to Ongoing | Run on a schedule (cron, Docker) |
|
|
392
|
+
|
|
393
|
+
## Docker Deployment
|
|
394
|
+
|
|
395
|
+
The included Dockerfile builds a production image for running Data Clone as a containerized service. The image is based on `node:20-bookworm` and expects a `.meadow.config.json` to be provided at runtime (via volume mount or baked into a derived image).
|
|
396
|
+
|
|
397
|
+
```mermaid
|
|
398
|
+
flowchart LR
|
|
399
|
+
subgraph Build["Docker Build"]
|
|
400
|
+
BASE["node:20-bookworm"]
|
|
401
|
+
DEPS["npm install --omit=dev"]
|
|
402
|
+
SRC["Copy source + scripts"]
|
|
403
|
+
end
|
|
404
|
+
|
|
405
|
+
subgraph Runtime["Docker Runtime"]
|
|
406
|
+
CFG[".meadow.config.json\n(volume mount)"]
|
|
407
|
+
SCHEMA["Extended Schema\n(volume mount)"]
|
|
408
|
+
RUN["scripts/run.sh"]
|
|
409
|
+
end
|
|
410
|
+
|
|
411
|
+
subgraph External["External"]
|
|
412
|
+
REMOTE["Remote Meadow API"]
|
|
413
|
+
LOCAL["Local Database"]
|
|
414
|
+
end
|
|
415
|
+
|
|
416
|
+
BASE --> DEPS --> SRC --> RUN
|
|
417
|
+
CFG --> RUN
|
|
418
|
+
SCHEMA --> RUN
|
|
419
|
+
RUN --> REMOTE
|
|
420
|
+
RUN --> LOCAL
|
|
421
|
+
```
|
|
422
|
+
|
|
423
|
+
The `docker-compose.yml` can be used to run the Data Clone alongside a local MySQL or MSSQL container for development and testing.
|
|
@@ -0,0 +1,111 @@
|
|
|
1
|
+
# comprehensionarray
|
|
2
|
+
|
|
3
|
+
Convert an object-keyed Comprehension into a JSON array. Comprehensions store records as `{ GUID: record }` objects for fast lookup and merging, but sometimes you need a plain array for export, UI consumption, or further processing.
|
|
4
|
+
|
|
5
|
+
**Aliases:** `comprehension_to_array`, `array`
|
|
6
|
+
|
|
7
|
+
## Usage
|
|
8
|
+
|
|
9
|
+
```shell
|
|
10
|
+
mdwint comprehensionarray <file> [options]
|
|
11
|
+
```
|
|
12
|
+
|
|
13
|
+
## Arguments
|
|
14
|
+
|
|
15
|
+
| Argument | Required | Description |
|
|
16
|
+
|----------|----------|-------------|
|
|
17
|
+
| `<file>` | Yes | Path to the Comprehension file to convert |
|
|
18
|
+
|
|
19
|
+
## Options
|
|
20
|
+
|
|
21
|
+
| Option | Description | Default |
|
|
22
|
+
|--------|-------------|---------|
|
|
23
|
+
| `-e, --entity <name>` | Entity name to extract from the Comprehension | Auto-detected from the first key |
|
|
24
|
+
| `-o, --output <path>` | Output file path | `./Array-Comprehension-<filename>.json` |
|
|
25
|
+
|
|
26
|
+
## Input Format
|
|
27
|
+
|
|
28
|
+
The input is a standard object-keyed Comprehension:
|
|
29
|
+
|
|
30
|
+
```json
|
|
31
|
+
{
|
|
32
|
+
"Book": {
|
|
33
|
+
"Book_1": { "GUIDBook": "Book_1", "Title": "The Hunger Games", "Language": "eng" },
|
|
34
|
+
"Book_2": { "GUIDBook": "Book_2", "Title": "Harry Potter", "Language": "eng" }
|
|
35
|
+
}
|
|
36
|
+
}
|
|
37
|
+
```
|
|
38
|
+
|
|
39
|
+
## Output Format
|
|
40
|
+
|
|
41
|
+
The output is a JSON array of the record objects:
|
|
42
|
+
|
|
43
|
+
```json
|
|
44
|
+
[
|
|
45
|
+
{ "GUIDBook": "Book_1", "Title": "The Hunger Games", "Language": "eng" },
|
|
46
|
+
{ "GUIDBook": "Book_2", "Title": "Harry Potter", "Language": "eng" }
|
|
47
|
+
]
|
|
48
|
+
```
|
|
49
|
+
|
|
50
|
+
The GUID keys are discarded; only the record values are included in the array.
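
In plain JavaScript the conversion amounts to taking the values of the entity object. The sketch below is an illustration under that assumption, not the command's actual source; `comprehensionToArray` is a hypothetical helper name.

```javascript
// Illustrative sketch only; not the package's actual implementation.
function comprehensionToArray(comprehension, entityName) {
  const entityKeys = Object.keys(comprehension);
  if (entityKeys.length === 0) {
    throw new Error('No entities found in the Comprehension.');
  }
  // Fall back to the first entity key when no entity is specified.
  const entity = entityName || entityKeys[0];
  // Object.values drops the GUID keys; a missing entity yields an empty array.
  return Object.values(comprehension[entity] || {});
}

const comprehension = {
  Book: {
    Book_1: { GUIDBook: 'Book_1', Title: 'The Hunger Games', Language: 'eng' },
    Book_2: { GUIDBook: 'Book_2', Title: 'Harry Potter', Language: 'eng' }
  }
};

const books = comprehensionToArray(comprehension, 'Book');
const missing = comprehensionToArray(comprehension, 'Author');

console.log(books.length, missing.length);
```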
|
|
51
|
+
|
|
52
|
+
## Examples
|
|
53
|
+
|
|
54
|
+
### Basic conversion with explicit entity
|
|
55
|
+
|
|
56
|
+
```shell
|
|
57
|
+
mdwint comprehensionarray ./books.json -e Book -o books-array.json
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
### Auto-detect entity name
|
|
61
|
+
|
|
62
|
+
```shell
|
|
63
|
+
mdwint comprehensionarray ./books.json -o books-array.json
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
If `-e` is omitted, the entity name is inferred from the first key in the Comprehension. For a single-entity Comprehension this always selects the right entity; for a multi-entity Comprehension, pass `-e` explicitly.
|
|
67
|
+
|
|
68
|
+
### Using the alias
|
|
69
|
+
|
|
70
|
+
```shell
|
|
71
|
+
mdwint array ./books.json -e Book -o books-array.json
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
### Pipeline: Comprehension to CSV export
|
|
75
|
+
|
|
76
|
+
This command is commonly used as an intermediate step when exporting to CSV:
|
|
77
|
+
|
|
78
|
+
```shell
|
|
79
|
+
# Step 1: Convert object Comprehension to array
|
|
80
|
+
mdwint comprehensionarray ./store.json -e Book -o books-array.json
|
|
81
|
+
|
|
82
|
+
# Step 2: Export array to CSV
|
|
83
|
+
mdwint objectarraytocsv ./books-array.json -o books.csv
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
### Extract one entity from a multi-entity Comprehension
|
|
87
|
+
|
|
88
|
+
```shell
|
|
89
|
+
# Extract just the Author records from a multi-entity file
|
|
90
|
+
mdwint comprehensionarray ./bookstore.json -e Author -o authors-array.json
|
|
91
|
+
|
|
92
|
+
# Extract just the BookAuthorJoin records
|
|
93
|
+
mdwint comprehensionarray ./bookstore.json -e BookAuthorJoin -o joins-array.json
|
|
94
|
+
```
|
|
95
|
+
|
|
96
|
+
## Tips
|
|
97
|
+
|
|
98
|
+
- When working with multi-entity Comprehensions, always specify the `-e` flag to select which entity to extract. Without it, only the first entity key is used.
|
|
99
|
+
- This command is non-destructive to the input file. The original Comprehension is not modified.
|
|
100
|
+
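- The output array preserves the order in which object keys are enumerated. In JavaScript this is insertion order for ordinary string keys; keys that look like non-negative integers (for example `"10"`) are enumerated first, in ascending numeric order.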
|
|
101
|
+
|
|
102
|
+
## Notes
|
|
103
|
+
|
|
104
|
+
- If the specified entity does not exist in the Comprehension, the output will be an empty array.
|
|
105
|
+
- If no entities are found in the Comprehension file, the command will error.
|
|
106
|
+
|
|
107
|
+
## See Also
|
|
108
|
+
|
|
109
|
+
- [objectarraytocsv](objectarraytocsv.md) -- Convert a JSON array to CSV format
|
|
110
|
+
- [csvtransform](csvtransform.md) -- Create Comprehensions from CSV files
|
|
111
|
+
- [Comprehensions](../comprehensions.md) -- Object vs. array format documentation
|
|
@@ -0,0 +1,132 @@
|
|
|
1
|
+
# comprehensionintersect
|
|
2
|
+
|
|
3
|
+
Merge two Comprehension JSON files together. Records with the same GUID in both files are merged, with values from the secondary file overwriting values in the primary file. This is useful when the same entities have data spread across multiple source files.
|
|
4
|
+
|
|
5
|
+
**Aliases:** `intersect`
|
|
6
|
+
|
|
7
|
+
## Usage
|
|
8
|
+
|
|
9
|
+
```shell
|
|
10
|
+
mdwint comprehensionintersect <primary_file> -i <secondary_file> [options]
|
|
11
|
+
```
|
|
12
|
+
|
|
13
|
+
## Arguments
|
|
14
|
+
|
|
15
|
+
| Argument | Required | Description |
|
|
16
|
+
|----------|----------|-------------|
|
|
17
|
+
| `<primary_file>` | Yes | Path to the primary Comprehension file |
|
|
18
|
+
|
|
19
|
+
## Options
|
|
20
|
+
|
|
21
|
+
| Option | Description | Default |
|
|
22
|
+
|--------|-------------|---------|
|
|
23
|
+
| `-i, --intersect <path>` | Path to the secondary Comprehension file to merge with the primary | (required) |
|
|
24
|
+
| `-e, --entity <name>` | Entity name to merge | Auto-detected from the first key of the primary Comprehension |
|
|
25
|
+
| `-o, --output <path>` | Output file path | `./Intersected-Comprehension-<filename>.json` |
|
|
26
|
+
|
|
27
|
+
## How Merging Works
|
|
28
|
+
|
|
29
|
+
The intersect operation iterates over every record in the secondary Comprehension and matches it to the primary Comprehension by GUID:
|
|
30
|
+
|
|
31
|
+
- **Matching GUID found in primary**: The record fields are merged. Fields from the secondary file overwrite fields in the primary file. Fields that exist only in the primary are preserved.
|
|
32
|
+
- **No matching GUID in primary**: The record from the secondary file is added to the result.
|
|
33
|
+
- **GUID exists only in primary**: The record is preserved as-is in the output.
|
|
34
|
+
|
|
35
|
+
This behavior makes `comprehensionintersect` ideal for enriching records with data from additional sources.
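
The three rules above can be sketched in a few lines of JavaScript. This is an illustration of the described behavior, not the command's actual source: `intersectComprehensions` is a hypothetical helper, the sample values are made up, and returning only the selected entity in the result is an assumption.

```javascript
// Illustrative sketch only; not the package's actual implementation.
function intersectComprehensions(primary, secondary, entityName) {
  // Fall back to the first key of the primary when no entity is given.
  const entity = entityName || Object.keys(primary)[0];
  // Start from a copy of the primary records (primary-only GUIDs are kept).
  const merged = { ...(primary[entity] || {}) };
  for (const [guid, record] of Object.entries(secondary[entity] || {})) {
    // Matching GUID: secondary fields overwrite, primary-only fields survive.
    // No match: the secondary record is added as-is.
    merged[guid] = { ...(merged[guid] || {}), ...record };
  }
  return { [entity]: merged };
}

const primary = {
  Neighborhood: {
    'Capitol Hill': { GUIDNeighborhood: 'Capitol Hill', MedianHomeValue: '625000', MedianRent: '1850' },
    Ballard: { GUIDNeighborhood: 'Ballard', MedianRent: '1700' }
  }
};
const secondary = {
  Neighborhood: {
    'Capitol Hill': { GUIDNeighborhood: 'Capitol Hill', MedianRent: '1900', Population: '32000' },
    Fremont: { GUIDNeighborhood: 'Fremont', Population: '12000' }
  }
};

const result = intersectComprehensions(primary, secondary, 'Neighborhood');
console.log(JSON.stringify(result, null, 2));
```

In this sample, `Capitol Hill` keeps `MedianHomeValue` from the primary, takes `MedianRent` from the secondary, and gains `Population`; `Ballard` and `Fremont` each pass through unchanged.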
|
|
36
|
+
|
|
37
|
+
## Output Format
|
|
38
|
+
|
|
39
|
+
The output is a standard Comprehension JSON object containing the merged records:
|
|
40
|
+
|
|
41
|
+
```json
|
|
42
|
+
{
|
|
43
|
+
"Neighborhood": {
|
|
44
|
+
"Capitol Hill": {
|
|
45
|
+
"GUIDNeighborhood": "Capitol Hill",
|
|
46
|
+
"MedianHomeValue": "625000",
|
|
47
|
+
"MedianRent": "1850",
|
|
48
|
+
"Population": "32000",
|
|
49
|
+
"MedianAge": "33"
|
|
50
|
+
}
|
|
51
|
+
}
|
|
52
|
+
}
|
|
53
|
+
```
|
|
54
|
+
|
|
55
|
+
## Examples
|
|
56
|
+
|
|
57
|
+
### Basic intersection
|
|
58
|
+
|
|
59
|
+
```shell
|
|
60
|
+
mdwint comprehensionintersect set1.json \
|
|
61
|
+
-i set2.json \
|
|
62
|
+
-e Document \
|
|
63
|
+
-o merged.json
|
|
64
|
+
```
|
|
65
|
+
|
|
66
|
+
### Auto-detect entity name
|
|
67
|
+
|
|
68
|
+
```shell
|
|
69
|
+
mdwint comprehensionintersect set1.json \
|
|
70
|
+
-i set2.json \
|
|
71
|
+
-o merged.json
|
|
72
|
+
```
|
|
73
|
+
|
|
74
|
+
If `-e` is omitted, the entity name is inferred from the first key in the primary Comprehension file.
|
|
75
|
+
|
|
76
|
+
### Chain multiple intersections
|
|
77
|
+
|
|
78
|
+
When you have data spread across three or more source files, chain the intersect operations:
|
|
79
|
+
|
|
80
|
+
```shell
|
|
81
|
+
# Step 1: Transform each source to a Comprehension
|
|
82
|
+
mdwint csvtransform housing_chars.csv \
|
|
83
|
+
-e Neighborhood -n GUIDNeighborhood \
|
|
84
|
+
-g "{~D:Record.Neighborhood Name~}" \
|
|
85
|
+
-o set_chars.json
|
|
86
|
+
|
|
87
|
+
mdwint csvtransform housing_costs.csv \
|
|
88
|
+
-e Neighborhood -n GUIDNeighborhood \
|
|
89
|
+
-g "{~D:Record.Neighborhood Name~}" \
|
|
90
|
+
-o set_costs.json
|
|
91
|
+
|
|
92
|
+
mdwint csvtransform demographics.csv \
|
|
93
|
+
-e Neighborhood -n GUIDNeighborhood \
|
|
94
|
+
-g "{~D:Record.Neighborhood Name~}" \
|
|
95
|
+
-o set_demographics.json
|
|
96
|
+
|
|
97
|
+
# Step 2: Merge the first two
|
|
98
|
+
mdwint comprehensionintersect set_chars.json \
|
|
99
|
+
-i set_costs.json \
|
|
100
|
+
-e Neighborhood \
|
|
101
|
+
-o merged.json
|
|
102
|
+
|
|
103
|
+
# Step 3: Merge the third into the result
|
|
104
|
+
mdwint comprehensionintersect merged.json \
|
|
105
|
+
-i set_demographics.json \
|
|
106
|
+
-e Neighborhood \
|
|
107
|
+
-o merged.json
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
### Using the alias
|
|
111
|
+
|
|
112
|
+
```shell
|
|
113
|
+
mdwint intersect primary.json -i secondary.json -o result.json
|
|
114
|
+
```
|
|
115
|
+
|
|
116
|
+
## Tips
|
|
117
|
+
|
|
118
|
+
- The GUID template must be identical across the source Comprehensions for records to match. Use the same `-g` value or the same mapping file GUID template when generating the source Comprehensions.
|
|
119
|
+
- The secondary file's values overwrite the primary file's values for matching fields. If you want the primary's values to take priority, swap the file arguments.
|
|
120
|
+
- You can intersect Comprehensions that were generated from different file formats (e.g., one from CSV, one from TSV, one from JSON array).
|
|
121
|
+
- For large merge operations with many source files, consider writing a shell script that chains the intersect calls sequentially.
|
|
122
|
+
|
|
123
|
+
## Notes
|
|
124
|
+
|
|
125
|
+
- The `-i` option is required. The command will error if no secondary Comprehension file is specified.
|
|
126
|
+
- If no entity is specified and the primary Comprehension has multiple entity keys, the first key is used.
|
|
127
|
+
- The output file can be the same as one of the input files, allowing in-place merge operations.
|
|
128
|
+
|
|
129
|
+
## See Also
|
|
130
|
+
|
|
131
|
+
- [csvtransform](csvtransform.md) -- Generate Comprehensions from CSV files
|
|
132
|
+
- [Comprehensions](../comprehensions.md) -- Core data structure and merging concepts
|