@pattern-stack/codegen 0.9.2 → 0.10.0

@@ -0,0 +1,134 @@
+ ---
+ name: sync
+ description: Load when integrating an external system (CRM, billing, etc.) in a project that ran `codegen subsystem install sync`. Triggers include implementing `IChangeSource<T>` for a provider; writing an `ISyncSink<T>`; registering `SyncModule.forRoot(...)` in `app.module.ts`; building a per-entity feature module that binds the change source, sink, and `ExecuteSyncUseCase`; declaring a `detection:` block in entity YAML; querying the `sync_runs` / `sync_run_items` audit log or the structured `changed_fields` jsonb; or wiring cursor persistence, diffing, and multi-tenancy.
+ allowed-tools: Read, Write, Edit, Glob, Grep, Bash
+ user-invocable: false
+ ---
+
+ <!-- managed by @pattern-stack/codegen — re-run `codegen skills install` to refresh. Edit the package source, not this vendored copy. -->
+
+ # Sync Subsystem
+
+ The sync subsystem is a generic external-system integration engine for your
+ app. One orchestrator — `ExecuteSyncUseCase<T>` — runs *every* integration in
+ your codebase. You write per-provider detection code against a single port
+ (`IChangeSource<T>`) and per-entity write code against a single sink
+ (`ISyncSink<T>`). Everything else — cursor persistence, field diffing,
+ per-record audit, run lifecycle — is provided by the subsystem.
+
+ You opt in with `codegen subsystem install sync`, which vendors the runtime
+ into `<paths.subsystems>/sync/` (imported as `@shared/subsystems/sync`), adds a
+ `sync:` block to `codegen.config.yaml`, and emits the audit schema
+ (`sync-audit.schema.ts`). Unlike some subsystems, sync ships **no `generated/`
+ directory** — there are no codegen-emitted runtime artifacts from the base
+ install. (Per-entity change-source modules are emitted only if you declare a
+ `detection:` block in an entity YAML; see `change-sources-and-sinks.md`.)
+
+ ## Mental model: the five-step dance
+
+ Every integration repeats the same five steps. The orchestrator drives steps
+ 2–4 (calling your per-entity sink in step 3), step 5 is optional, and step 1
+ is the only thing you write per provider:
+
+ 1. **detect** upstream change *(you — `IChangeSource<T>`)*
+ 2. **diff** against local state *(subsystem — `IFieldDiffer<T>`, default `DeepEqualDiffer`)*
+ 3. **apply** the upsert or soft-delete *(your sink, called by the orchestrator)*
+ 4. **record** the structured delta into `sync_run_items` *(subsystem)*
+ 5. **emit** an event on success *(you — wired in your sink, optional)*
+
+ Three detection modes (poll / CDC / webhook) converge on the single
+ `IChangeSource<T>` port; per-mode differences live in `Change<T>` metadata, not
+ in separate ports.
+
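To make the single-port idea concrete, here is a minimal sketch of a poll-mode adapter. Only the names `IChangeSource`, `Change`, `listChanges`, `source`, `dedupKey`, `providerChangedFields`, and the per-change `cursor` come from the text; every field type, the `Subscription` shape, the `'upserted'` operation label, and the `PollingOpportunitySource` class are assumptions made for illustration, not the vendored runtime's actual definitions.

```typescript
// Hypothetical shapes: the real types live in the vendored sync runtime.
interface Change<T> {
  operation: 'upserted' | 'deleted';   // 'upserted' is an assumed label
  externalId: string;
  record: T | null;
  cursor: unknown;                     // opaque to everything but the adapter
  source: 'poll' | 'cdc' | 'webhook';  // per-mode metadata rides on the change
  dedupKey?: string;
  providerChangedFields?: string[];
}

interface Subscription { id: string; domain: string }

interface IChangeSource<T> {
  listChanges(subscription: Subscription, cursor: unknown): AsyncIterable<Change<T>>;
}

interface Opportunity { external_id: string; stage_name: string; updated: number }

// Poll-mode adapter over an in-memory "upstream"; a real adapter would call the provider API.
class PollingOpportunitySource implements IChangeSource<Opportunity> {
  constructor(private readonly upstream: Opportunity[]) {}

  async *listChanges(_sub: Subscription, cursor: unknown): AsyncIterable<Change<Opportunity>> {
    // The adapter types its own cursor internally: here, an "updated" watermark.
    const since = typeof cursor === 'number' ? cursor : 0;
    const pending = this.upstream
      .filter((o) => o.updated > since)
      .sort((a, b) => a.updated - b.updated);
    for (const record of pending) {
      yield {
        operation: 'upserted',
        externalId: record.external_id,
        record,
        cursor: record.updated, // yielded on every change; storage is the orchestrator's job
        source: 'poll',
      };
    }
  }
}

// Helper that drains a source, standing in for the orchestrator's loop.
async function collect<T>(source: IChangeSource<T>, cursor: unknown): Promise<Change<T>[]> {
  const out: Change<T>[] = [];
  for await (const change of source.listChanges({ id: 'sub-1', domain: 'opportunity' }, cursor)) {
    out.push(change);
  }
  return out;
}
```

A CDC or webhook adapter would implement the same `listChanges` signature and differ only in the metadata it sets on each `Change<T>`.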
+ **Sync is not events and not jobs.** Sync detects upstream change → diffs →
+ applies → records (`sync_runs` + `sync_run_items` pairs). It can be *triggered
+ by* a scheduled job (polling) or a webhook, and it can *emit* events on a
+ successful upsert — but the three subsystems have distinct lifecycles. See the
+ `jobs` and `events` skills for those.
+
+ **The audit is structured, not freeform.** `sync_run_items.changed_fields` is
+ `{ fieldName: { from, to } }` jsonb, validated at write time. That makes drift
+ queries ("when did this opportunity first become Closed Won?") one-shot SQL
+ filters instead of JSON scrapes.
+
+ ## Wiring at a glance
+
+ `SyncModule.forRoot(...)` in `app.module.ts` wires the substrate — the cursor
+ store, run recorder, field differ, and multi-tenant flag. It is `global: true`
+ and **does NOT provide `ExecuteSyncUseCase`**. The orchestrator depends on
+ `SYNC_CHANGE_SOURCE` + `SYNC_SINK`, which are per-entity and consumer-owned, so
+ you register `ExecuteSyncUseCase` in your *feature module* alongside those
+ bindings. Putting it in `SyncModule` would force Nest to resolve those tokens
+ at module compile time, before your feature module is imported.
+
+ ```ts
+ import { Module } from '@nestjs/common';
+ import { SyncModule } from '@shared/subsystems/sync';
+ // DatabaseModule is your app's own module; import it from wherever it lives.
+
+ @Module({
+   imports: [
+     DatabaseModule,
+     SyncModule.forRoot({ backend: 'drizzle' }), // 'memory' in tests
+     // ... per-entity feature modules, other subsystems
+   ],
+ })
+ export class AppModule {}
+ ```
+
+ ## Routing table
+
+ | When the task involves… | Read |
+ |---|---|
+ | Implementing `IChangeSource<T>` or `ISyncSink<T>`; the per-entity feature module; the `detection:` block + provider-keyed factory; triggering a run; multi-tenancy; loopback; testing | `change-sources-and-sinks.md` |
+ | The `sync_runs` / `sync_run_items` / `sync_subscriptions` shape; the structured `changed_fields` contract; worked drift / staleness / stuck-run queries; orchestrator run lifecycle and failure semantics | `audit-and-detection.md` |
+
+ ## Non-obvious rules
+
+ 1. **One port for three modes.** Poll, CDC, and webhook adapters all implement
+    `IChangeSource<T>` with `listChanges(subscription, cursor): AsyncIterable<Change<T>>`.
+    Per-mode concerns ride in `Change<T>` metadata (`source`, `dedupKey`,
+    `providerChangedFields`). Do not introduce `IPollSource` / `ICdcSource` /
+    `IWebhookSource` — the union is deliberate.
+
+ 2. **Cursors are opaque at the port seam, owned by the orchestrator.** Your
+    adapter types its own cursor internally and yields it on each `Change<T>`.
+    The orchestrator is the only reader/writer of cursor storage — never inject
+    the cursor store inside a source or sink.
+
+ 3. **`SyncModule` does NOT provide `ExecuteSyncUseCase`.** Register the
+    orchestrator in your feature module's `providers` array next to your source
+    and sink bindings.
+
+ 4. **`changed_fields` is structured, validated at write.** It is
+    `{ fieldName: { from, to } }`, parsed against the field-diff schema before
+    insert (in both Drizzle and Memory backends). Do not treat it as freeform —
+    arbitrary keys break drift queries and get rejected.
+
+ 5. **The sync audit tables are subsystem-owned.** Query `sync_subscriptions`,
+    `sync_runs`, and `sync_run_items` freely for dashboards, but do not write to
+    them directly (bypassing the recorder's validation lands malformed data),
+    and do not author entity YAMLs for them (that produces redundant
+    repositories/services shadowing the subsystem).
+
+ 6. **All-failed runs still advance the cursor.** If every record in a run
+    fails, the run is `status='failed'` but the cursor still persists as
+    last-yielded — the source kept yielding, so re-running would not re-deliver
+    those records. This is the most common "wait, what?" moment; document it in
+    your runbooks. Retry semantics are caller-owned.
+
+ 7. **The orchestrator does not emit events, schedule itself, retry, or resolve
+    subscriptions.** Those are all consumer concerns. Wire event emission inside
+    your sink's transaction; wire scheduling via a job or webhook handler.
+
+ ## Do not
+
+ - Do not introduce mode-specific ports (`IPollSource` / `ICdcSource` /
+   `IWebhookSource`). One `IChangeSource<T>` for all modes.
+ - Do not treat `changed_fields` as freeform jsonb — the `{ from, to }` shape is
+   load-bearing for drift queries and enforced at write.
+ - Do not provide `ExecuteSyncUseCase` in `SyncModule` — it forces eager
+   resolution of consumer-owned tokens.
+ - Do not write directly to the sync audit tables, and do not ship entity YAMLs
+   for them.
+ - Do not inject the cursor store inside a source or sink — the orchestrator
+   owns the get/put lifecycle.
+ - Do not drop `tenantId` when `multi_tenant: true` — the orchestrator throws
+   `MissingTenantIdError` at entry.
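
The last rule can be pictured with a small guard. This is only a sketch of the behavior described above: the error class name comes from the text, while the function signature and the message string are hypothetical.

```typescript
// Hypothetical stand-in for the subsystem's error type; the message text is invented.
class MissingTenantIdError extends Error {
  constructor() {
    super('tenantId is required when multi_tenant is enabled');
    this.name = 'MissingTenantIdError';
  }
}

// Assumed signature: the real guard reads the flag from SyncModule.forRoot(...) options.
function assertTenantId(multiTenant: boolean, tenantId?: string): void {
  if (multiTenant && (tenantId === undefined || tenantId === '')) {
    throw new MissingTenantIdError();
  }
}
```

Because the guard runs at entry, a caller that forgets `tenantId` fails before any audit row is opened.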
@@ -0,0 +1,302 @@
+ <!-- managed by @pattern-stack/codegen — re-run `codegen skills install` to refresh. Edit the package source, not this vendored copy. -->
+
+ # Audit model, orchestrator flow, and diffing
+
+ The sync subsystem records every run into three tables and runs every
+ integration through one orchestrator loop. This file covers the table shapes,
+ the structured `changed_fields` contract, the queries you'll write against the
+ audit log, the orchestrator's run lifecycle and failure semantics, and how the
+ default field differ works.
+
+ ## The three audit tables
+
+ All three are **subsystem-owned**. Query them freely for dashboards and admin
+ UIs, but never write to them directly (you'd bypass the recorder's validation),
+ and never author entity YAMLs for them (that produces redundant
+ repositories/services that shadow the subsystem).
+
+ ### `sync_subscriptions`
+
+ Cursor owner per `(integration_id, adapter, domain, external_ref)` tuple. The
+ cursor store reads/writes it.
+
+ | Column | Type | Notes |
+ |---|---|---|
+ | `id` | uuid PK | `defaultRandom()` |
+ | `integration_id` | text | Opaque id of the connected account/instance (SFDC org id, GH installation id, …) |
+ | `adapter` | text | Short adapter label: `'salesforce'`, `'hubspot'` |
+ | `domain` | text | Canonical entity: `'opportunity'`, `'contact'` |
+ | `external_ref` | text NULL | Upstream scope (filter id, webhook subscription id); NULL = full domain |
+ | `enabled` | bool, default true | Scheduling filter |
+ | `config` | jsonb, default `{}` | Per-subscription config (`batchSize`, `highWatermark`, …) |
+ | `cursor` | jsonb NULL | Opaque; written by the cursor store; NULL until first successful run |
+ | `last_sync_at` | ts NULL | Stamped alongside `cursor` |
+ | `tenant_id` | text NULL | Present only when `sync.multi_tenant: true` |
+ | `created_at` / `updated_at` | ts | |
+
+ Indexes: a unique index on `(integration_id, adapter, domain, external_ref)`
+ and an `(enabled, last_sync_at)` scheduling index. Postgres treats NULLs as
+ distinct in a unique index by default, so the unique index does not stop two
+ full-domain rows (NULL `external_ref`) for the same integration, adapter, and
+ domain; guarding against that is a consumer modeling concern.
+
+ ### `sync_runs`
+
+ One row per `ExecuteSyncUseCase.execute()` invocation.
+
+ | Column | Type | Notes |
+ |---|---|---|
+ | `id` | uuid PK | |
+ | `subscription_id` | uuid FK → `sync_subscriptions.id` (cascade) | |
+ | `direction` | enum `inbound \| outbound` | Almost always `inbound`; `outbound` reserved for writeback |
+ | `action` | enum `poll \| cdc \| webhook \| manual \| writeback` | Provenance for self-identification |
+ | `status` | enum `running \| success \| no_changes \| failed` | `running` is in-flight only |
+ | `records_found` / `records_processed` | int, default 0 | |
+ | `cursor_before` / `cursor_after` | jsonb NULL | Opaque cursor snapshots |
+ | `duration_ms` | int NULL | Stamped at completion |
+ | `error` | text NULL | Run-level error only |
+ | `started_at` | ts, default now | |
+ | `completed_at` | ts NULL | NULL while `status='running'` |
+ | `tenant_id` | text NULL | Present only when multi-tenant |
+
+ Indexes: `(subscription_id, started_at)` for timelines, `(status, started_at)`
+ for the stale-run sweeper.
+
+ ### `sync_run_items`
+
+ One row per upstream change processed within a run.
+
+ | Column | Type | Notes |
+ |---|---|---|
+ | `id` | uuid PK | |
+ | `sync_run_id` | uuid FK → `sync_runs.id` (cascade) | |
+ | `entity_type` | text | Canonical domain (`'opportunity'`) |
+ | `external_id` | text | Upstream id |
+ | `local_id` | text NULL | Set on `created \| updated \| deleted`; null on `noop` |
+ | `operation` | enum `created \| updated \| deleted \| noop` | |
+ | `status` | enum `success \| failed \| skipped` | `skipped` = loopback echo |
+ | `changed_fields` | jsonb NOT NULL, default `{}` | Structured `{ from, to }` shape; validated at write |
+ | `title` | text NULL | Optional human label |
+ | `error` | text NULL | Item-level error on `status='failed'` |
+ | `created_at` | ts, default now | |
+ | `tenant_id` | text NULL | Present only when multi-tenant |
+
+ Indexes: `(sync_run_id, created_at)` for within-run timelines,
+ `(entity_type, external_id)` for per-record history.
+
+ ## The `changed_fields` contract
+
+ `changed_fields` is structured `{ fieldName: { from, to } }` jsonb — not
+ freeform:
+
+ ```jsonc
+ {
+   "stage_name": { "from": "Prospecting", "to": "Closed Won" },
+   "amount":     { "from": 92364, "to": 120000 }
+ }
+ ```
+
+ It is validated against the field-diff schema at the recorder boundary in
+ **both** the Drizzle and Memory backends, before the INSERT. A malformed shape
+ throws a validation error (not a DB constraint error) so you catch it at the
+ recorder, not in the database.
+
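The write-time check can be approximated with a few lines of TypeScript. This is a sketch of the idea, not the subsystem's field-diff schema: `parseChangedFields`, its error messages, and the decision to reject extra keys beside `from` and `to` are all assumptions.

```typescript
type FieldChange = { from: unknown; to: unknown };
type ChangedFields = Record<string, FieldChange>;

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value);
}

// Hypothetical validator: accepts only { fieldName: { from, to } }.
// Rejecting extra keys is an assumption; the real schema may be more lenient.
function parseChangedFields(input: unknown): ChangedFields {
  if (!isPlainObject(input)) {
    throw new TypeError('changed_fields must be an object');
  }
  for (const [field, change] of Object.entries(input)) {
    if (!isPlainObject(change)) {
      throw new TypeError(`${field}: expected { from, to }`);
    }
    const keys = Object.keys(change).sort().join(',');
    if (keys !== 'from,to') {
      throw new TypeError(`${field}: expected exactly "from" and "to"`);
    }
  }
  return input as ChangedFields;
}
```

An empty object passes, which matches the `{}` recorded for `deleted` and `noop` items, while a flat payload such as `{ amount: 120000 }` is rejected before it reaches the table.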
+ **Why structured beats freeform:** drift queries become one-shot SQL filters
+ instead of JSON scrapes. "When did this opportunity first become Closed Won?" is
+ an index-friendly filter any developer writes in two minutes (see queries
+ below), not a custom JSONB-extraction function per question.
+
+ **Canonical shape per operation:**
+
+ | Operation | `changed_fields` |
+ |---|---|
+ | `created` | `{ [field]: { from: null, to: <value> } }` for every non-null user field |
+ | `updated` | `{ [field]: { from: <old>, to: <new> } }` for mutated fields only |
+ | `deleted` | `{}` — the deletion itself is the change |
+ | `noop` | `{}` — no change detected |
+
+ Created-row diffs include domain identifiers like `external_id` — they are
+ legitimately part of a new record's diff. If that's too wide for your audit
+ taste, augment the differ's ignore list in your feature module (see Diffing).
+
+ ## Common queries
+
+ **"What changed in the last 24 hours across all sync?"**
+
+ ```sql
+ SELECT sr.action, sri.entity_type, sri.external_id, sri.operation, sri.changed_fields
+ FROM sync_run_items sri
+ JOIN sync_runs sr ON sri.sync_run_id = sr.id
+ WHERE sri.created_at > now() - interval '1 day'
+   AND sri.status = 'success'
+   AND sri.operation != 'noop'
+ ORDER BY sri.created_at DESC;
+ ```
+
+ **"Which subscriptions are stale?"** (uses the `(enabled, last_sync_at)` index)
+
+ ```sql
+ SELECT id, adapter, domain, external_ref, last_sync_at
+ FROM sync_subscriptions
+ WHERE enabled = true
+   AND (last_sync_at IS NULL OR last_sync_at < now() - interval '1 hour')
+ ORDER BY last_sync_at ASC NULLS FIRST;
+ ```
+
+ **"Any runs stuck in-flight?"** (uses the `(status, started_at)` index — should
+ return zero rows under normal operation; non-zero means the process died
+ mid-run without reaching the completion path)
+
+ ```sql
+ SELECT sr.id, sr.subscription_id, sr.started_at, sr.action
+ FROM sync_runs sr
+ WHERE sr.status = 'running'
+   AND sr.started_at < now() - interval '10 minutes';
+ ```
+
+ **"When did opportunity X first become Closed Won?"**
+
+ ```sql
+ SELECT sri.created_at
+ FROM sync_run_items sri
+ WHERE sri.entity_type = 'opportunity'
+   AND sri.external_id = '006Ab00000ABC'
+   AND sri.changed_fields -> 'stage_name' ->> 'to' = 'Closed Won'
+ ORDER BY sri.created_at ASC
+ LIMIT 1;
+ ```
+
+ **"Drift detection: opportunities whose `amount` changed in the last week"**
+ (the `?` operator hits the jsonb column directly — no JOIN, no JSONB function
+ gymnastics)
+
+ ```sql
+ SELECT sri.external_id,
+        sri.changed_fields -> 'amount' ->> 'from' AS old_amount,
+        sri.changed_fields -> 'amount' ->> 'to' AS new_amount,
+        sri.created_at
+ FROM sync_run_items sri
+ WHERE sri.entity_type = 'opportunity'
+   AND sri.created_at > now() - interval '7 days'
+   AND sri.changed_fields ? 'amount'
+ ORDER BY sri.created_at DESC;
+ ```
+
+ When `multi_tenant: true`, add `AND tenant_id = $1` to any of these.
+
+ ## Orchestrator run lifecycle
+
+ ```
+ execute(input)
+   ├─ assertTenantId(input.tenantId)   ← throws BEFORE startRun when multiTenant
+   ├─ cursorBefore = cursors.get(subId, tenantId)
+   ├─ runId = recorder.startRun({ subId, direction, action, cursorBefore, tenantId })
+
+   ├─ for await (change of source.listChanges(sub, cursorBefore)):
+   │    recordsFound++; latestCursor = change.cursor; cursorAdvanced = true
+   │    try:
+   │      if loopback.isEchoOfOwnWrite(…): recordItem({ operation:'noop', status:'skipped' }); continue
+   │      if change.operation === 'deleted':
+   │        result = sink.softDeleteByExternalId(…)
+   │        recordItem({ operation: result ? 'deleted' : 'noop', status:'success', localId: result?.id })
+   │      else:
+   │        existing = sink.findByExternalId(…)
+   │        diff = differ.diff(existing, change.record, change.providerChangedFields)
+   │        if diff === 'noop': recordItem({ operation:'noop', status:'success' })
+   │        else:
+   │          { id } = sink.upsertByExternalId(…)
+   │          recordItem({ operation: existing===null ? 'created' : 'updated',
+   │                       status:'success', localId: id, changedFields: diff })
+   │      recordsProcessed++
+   │    catch: recordsFailed++; recordItem({ status:'failed', error })
+
+   ├─ if cursorAdvanced: cursors.put(subId, latestCursor, tenantId)
+   └─ recorder.completeRun(runId, { status, counts, cursorAfter, durationMs, error })   ← finally
+ ```
+
+ ### Failure semantics worth memorizing
+
+ 1. **`assertTenantId` fires before `startRun`.** Rejected multi-tenant inputs
+    never open a `sync_runs` row — no dangling `status=running`. Backends
+    re-validate at their write boundary (defense in depth).
+
+ 2. **Cursor advances per-yield, not per-success.** `latestCursor` updates on
+    every yield, persisted once at the end as whatever the iterator *last*
+    produced — regardless of whether that record succeeded, failed, or was
+    skipped. A source that yields 10 then throws on 11 still persists the cursor
+    of record 10; re-running resumes at 11.
+
+ 3. **All-failed runs still advance the cursor.** If every record throws from the
+    sink, the run is `status='failed'` with `error: 'all N records failed'` — but
+    the cursor still persists, because the source kept yielding and re-running
+    would not re-deliver those records. **This is the most common "wait, what?"
+    moment for first-time consumers — document it in your runbooks.** Retry
+    semantics (dead-letter replay, `action: 'manual'` resync with a
+    `sourceOverride`) are caller-owned. If you want hold-on-all-fail, wrap the
+    orchestrator with your own retry layer — don't change the subsystem default.
+
+ 4. **Source throws mid-iteration** → run `status='failed'`, last-yielded cursor
+    persisted (per item 2, whether or not that record succeeded), completion
+    runs in `finally`. Partial runs don't lose progress.
+    **Source throws before any yield** (connect timeout) → cursor is not advanced;
+    `cursors.put` is skipped; the run completes with `cursorAfter: cursorBefore`.
+
+ 5. **`completeRun` is in a `finally` block.** A run that throws still
+    terminates, so a `status='running'` row outlives its run only when the
+    process itself dies mid-run. Operator cleanup queries can rely on
+    `completed_at IS NULL` to find those orphans.
+
+ 6. **Per-item failure does not fail the run.** The try/catch is per-record. A
+    run with 9 successes + 1 failure is `status='success'` with
+    `recordsProcessed: 9, recordsFailed: 1`. Only when *every* seen record fails
+    does the run go `failed` (see item 3).
+
+ 7. **`cursors.put` failure promotes a successful run to `failed`** with
+    `error: 'cursor put failed: ...'`. A successful run-log but no cursor advance
+    is a worse footgun than a failed-run marker — the next run would re-process
+    everything — so the orchestrator surfaces the cursor problem loudly.
+
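The cursor and status rules in items 2, 3, 4, and 6 can be exercised with a small in-memory simulation. This is a sketch under stated assumptions, not the shipped `ExecuteSyncUseCase`: the `executeSync` function, its parameters, the numeric cursor, and the choice to report `no_changes` when the source yields nothing are all illustrative, and diffing, loopback, and the recorder are left out.

```typescript
// Illustrative types; the real orchestrator works with Change<T> and opaque cursors.
interface SimChange { externalId: string; cursor: number }

interface RunResult {
  status: 'success' | 'no_changes' | 'failed';
  recordsFound: number;
  recordsProcessed: number;
  recordsFailed: number;
  cursorAfter: number | null;
  error?: string;
}

async function executeSync(
  source: (cursor: number | null) => AsyncIterable<SimChange>,
  applyToSink: (change: SimChange) => Promise<void>,
  cursorStore: { value: number | null },
): Promise<RunResult> {
  const cursorBefore = cursorStore.value;
  let latestCursor = cursorBefore;
  let cursorAdvanced = false;
  let found = 0;
  let processed = 0;
  let failed = 0;
  let runError: string | undefined;

  try {
    for await (const change of source(cursorBefore)) {
      found++;
      latestCursor = change.cursor; // advances on every yield...
      cursorAdvanced = true;
      try {
        await applyToSink(change);
        processed++;
      } catch {
        failed++; // ...even when the sink rejects the record
      }
    }
  } catch (err) {
    // The source itself threw: the run fails, but progress so far is kept.
    runError = err instanceof Error ? err.message : String(err);
  }

  // Persist once, after the loop, as whatever the iterator last produced.
  if (cursorAdvanced) cursorStore.value = latestCursor;
  if (!runError && found > 0 && failed === found) {
    runError = `all ${found} records failed`;
  }

  const status = runError ? 'failed' : found === 0 ? 'no_changes' : 'success';
  return {
    status,
    recordsFound: found,
    recordsProcessed: processed,
    recordsFailed: failed,
    cursorAfter: cursorStore.value,
    error: runError,
  };
}
```

Feeding this a source that yields three records and a sink that always throws gives a `failed` run whose stored cursor is still the third record's cursor, which is the behavior item 3 asks you to put in your runbooks.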
+ ### What the orchestrator does NOT do
+
+ - **Does not emit events** — wire `TypedEventBus.publish(...)` inside your
+   sink's `upsertByExternalId` transaction (see `change-sources-and-sinks.md`).
+ - **Does not schedule itself** — scheduling is a job, cron, or webhook handler
+   you own.
+ - **Does not retry** — per-item failures are recorded and skipped; run-level
+   failures bubble to the caller.
+ - **Does not resolve subscriptions** — `input.subscription` is passed in by the
+   caller; subscription lookup / enabled-checks are your concern.
+
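Since event emission belongs to the sink, a sink might look roughly like the in-memory sketch below. The three method names come from the lifecycle pseudocode; their signatures, the `EventBusLike` stand-in (the real `TypedEventBus.publish` API may differ), the event names, and the `InMemoryOpportunitySink` class are assumptions for illustration.

```typescript
interface Opportunity { external_id: string; stage_name: string }
interface StoredOpportunity extends Opportunity { id: string; deletedAt: Date | null }

// Stand-in for the events subsystem's bus; not the real TypedEventBus signature.
interface EventBusLike { publish(type: string, payload: unknown): void }

class InMemoryOpportunitySink {
  private readonly rows = new Map<string, StoredOpportunity>();
  private nextId = 1;

  constructor(private readonly bus: EventBusLike) {}

  async findByExternalId(externalId: string): Promise<StoredOpportunity | null> {
    return this.rows.get(externalId) ?? null;
  }

  async upsertByExternalId(record: Opportunity): Promise<{ id: string }> {
    // A database-backed sink would open a transaction here and publish inside it,
    // so the write and the event succeed or fail together.
    const existing = this.rows.get(record.external_id);
    const id = existing?.id ?? `opp-${this.nextId++}`;
    this.rows.set(record.external_id, { ...record, id, deletedAt: null });
    this.bus.publish(existing ? 'opportunity.updated' : 'opportunity.created', { id });
    return { id };
  }

  async softDeleteByExternalId(externalId: string): Promise<{ id: string } | null> {
    const existing = this.rows.get(externalId);
    // Returning null tells the caller there was nothing to delete (recorded as 'noop').
    if (!existing || existing.deletedAt) return null;
    existing.deletedAt = new Date();
    return { id: existing.id };
  }
}
```

The orchestrator never sees the bus; it only calls the three sink methods, which keeps event emission optional and entirely under the consumer's control.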
+ ## Diffing: the default `DeepEqualDiffer`
+
+ The orchestrator calls `differ.diff(existing, incoming, change.providerChangedFields)`
+ once per record. `existing` comes from your sink (canonical); `incoming` is
+ `change.record` (canonical). The default `DeepEqualDiffer` returns either a
+ `{ fieldName: { from, to } }` map or the literal `'noop'`.
+
+ **Default ignore list** (row metadata that sinks/services stamp):
+ `id`, `createdAt`, `updatedAt`, `deletedAt`, `type`, `lastModifiedAt`,
+ `fields`, `providerMetadata`. (`fields` is the EAV bag — diffed by the sink's
+ dual-write path, not the canonical layer.) Domain fields, including identifiers
+ like `external_id`, are NOT ignored.
+
+ **Normalizations applied during comparison** (so equal values aren't reported
+ as changes), but the diff output preserves the *raw* values:
+
+ - `Date → toISOString()` — adapters deliver strings, the DB driver returns
+   `Date`; normalize so they match.
+ - Decimal-string ↔ number — Postgres `numeric` comes back as a string through
+   Drizzle while adapters deliver numbers; a numeric and a finite-parseable
+   string compare equal (with an empty-string guard against silent 0-equality).
+
+ **Augment the ignore list per entity** (values merge with the defaults; you
+ cannot remove defaults):
+
+ ```ts
+ { provide: SYNC_FIELD_DIFFER, useValue: new DeepEqualDiffer({ ignore: ['sync_version'] }) }
+ ```
+
+ Bind it as `useValue: new DeepEqualDiffer(...)`, not `useClass` — the
+ constructor's optional options object confuses Nest's metadata reflection.
+
+ **`providerChangedFields` is advisory.** When a CDC provider tells you which
+ columns changed, set it on the `Change<T>` and the differ skips deep-equal over
+ untouched fields — but it still applies the ignore list. To write a fully
+ custom differ (type-aware enum normalization, hint-only inspection), implement
+ `IFieldDiffer<T>` and bind it to `SYNC_FIELD_DIFFER`.
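
To illustrate the comparison rules described in this section, here is a simplified differ written from scratch. It is not the shipped `DeepEqualDiffer`: the function names are hypothetical, `JSON.stringify` stands in for a real deep-equality check, and the `providerChangedFields` hint is not modeled.

```typescript
type FieldDiff = Record<string, { from: unknown; to: unknown }>;

// Default ignore list as documented above.
const DEFAULT_IGNORE = ['id', 'createdAt', 'updatedAt', 'deletedAt', 'type',
  'lastModifiedAt', 'fields', 'providerMetadata'];

// Comparison-only normalization; the diff output keeps the raw values.
function sameValue(a: unknown, b: unknown): boolean {
  if (a instanceof Date) a = a.toISOString();
  if (b instanceof Date) b = b.toISOString();
  if (typeof a === 'number' && typeof b === 'string') [a, b] = [b, a];
  if (typeof a === 'string' && typeof b === 'number') {
    // Empty-string guard: '' must not silently equal 0.
    return a.trim() !== '' && Number(a) === b;
  }
  // Simplified stand-in for deep equality (sensitive to key order).
  return JSON.stringify(a) === JSON.stringify(b);
}

function diffRecords(
  existing: Record<string, unknown> | null,
  incoming: Record<string, unknown>,
  extraIgnore: string[] = [],
): FieldDiff | 'noop' {
  // Extra entries merge with the defaults; defaults cannot be removed.
  const ignore = new Set([...DEFAULT_IGNORE, ...extraIgnore]);
  const diff: FieldDiff = {};
  for (const [field, to] of Object.entries(incoming)) {
    if (ignore.has(field)) continue;
    // Created rows record only non-null fields, with from: null.
    if (existing === null && (to === null || to === undefined)) continue;
    const from = existing ? existing[field] ?? null : null;
    if (!sameValue(from, to ?? null)) diff[field] = { from, to };
  }
  return Object.keys(diff).length === 0 ? 'noop' : diff;
}
```

With this sketch, an existing row holding `amount: '92364.00'` and an incoming record with `amount: 92364` produces no entry for `amount`, while a changed `stage_name` is reported with its raw `from` and `to` values.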