create-ncblock 0.0.39 → 0.0.41

Files changed (40)
  1. package/package.json +1 -1
  2. package/scripts/init.ts +39 -23
  3. package/scripts/utils/templates.ts +53 -8
  4. package/sdk-version.json +1 -1
  5. package/templates/worker/.agents/INSTRUCTIONS.md +593 -0
  6. package/templates/worker/.agents/skills/auth-guide/SKILL.md +227 -0
  7. package/templates/worker/.agents/skills/sync/SKILL.md +368 -0
  8. package/templates/worker/.agents/skills/sync-debug/SKILL.md +101 -0
  9. package/templates/worker/.agents/skills/sync-guide/SKILL.md +253 -0
  10. package/templates/worker/.agents/skills/sync-guide/api-pagination-patterns.md +661 -0
  11. package/templates/worker/.agents/skills/sync-guide/examples/incremental-basic.ts +103 -0
  12. package/templates/worker/.agents/skills/sync-guide/examples/incremental-bimodal.ts +207 -0
  13. package/templates/worker/.agents/skills/sync-guide/examples/incremental-events.ts +132 -0
  14. package/templates/worker/.agents/skills/sync-guide/examples/replace-paginated.ts +79 -0
  15. package/templates/worker/.agents/skills/sync-guide/examples/replace-simple.ts +57 -0
  16. package/templates/worker/.agents/skills/sync-validate/SKILL.md +60 -0
  17. package/templates/worker/.claudeignore +2 -0
  18. package/templates/worker/.codexignore +2 -0
  19. package/templates/worker/.examples/automation-example.ts +60 -0
  20. package/templates/worker/.examples/oauth-example.ts +79 -0
  21. package/templates/worker/.examples/sync-example.ts +184 -0
  22. package/templates/worker/.examples/tool-example.ts +37 -0
  23. package/templates/worker/.examples/webhook-example.ts +66 -0
  24. package/templates/worker/README.md +765 -0
  25. package/templates/worker/_gitignore +6 -0
  26. package/templates/worker/docs/custom-tool.png +0 -0
  27. package/templates/worker/notionhq-workers-0.4.0.tgz +0 -0
  28. package/templates/worker/package.json +25 -0
  29. package/templates/worker/src/index.ts +8 -0
  30. package/templates/worker/tsconfig.json +16 -0
  31. package/templates/worker/views/empty/AGENTS.md +71 -0
  32. package/templates/worker/views/empty/README.md +10 -0
  33. package/templates/worker/views/empty/_gitignore +2 -0
  34. package/templates/worker/views/empty/custom_blocks.json +4 -0
  35. package/templates/worker/views/empty/index.html +15 -0
  36. package/templates/worker/views/empty/package.json +23 -0
  37. package/templates/worker/views/empty/src/index.css +33 -0
  38. package/templates/worker/views/empty/src/index.tsx +20 -0
  39. package/templates/worker/views/empty/tsconfig.json +17 -0
  40. package/templates/worker/views/empty/vite.config.ts +7 -0
@@ -0,0 +1,661 @@
1
+ # API Pagination & Cursor Strategy Reference
2
+
3
+ Strategies drawn from production syncs with Salesforce, Stripe, HubSpot, GitHub, and ServiceNow. Intended as a reference for building Notion Workers syncs.
4
+
5
+ > **v2 SDK:** Code snippets use the v2 SDK shape. Databases are declared separately via `worker.database()` and syncs reference them by handle. For APIs with change tracking, the recommended architecture is a **backfill sync** (`mode: "replace"`, `schedule: "manual"`) paired with a **delta sync** (`mode: "incremental"`).
6
+
7
+ ---
8
+
9
+ ## The Universal Contract
10
+
11
+ ```
12
+ execute(state) → { changes, hasMore, nextState }
13
+ ```
14
+
15
+ The cursor lives in `nextState`. The runtime calls `execute` again with that state until `hasMore` is `false`, completing a **cycle**. The next cycle starts with the state from the end of the previous cycle.
16
+
17
+ **Critical:** In incremental mode, state is never reset. The cursor persists across cycles indefinitely. When a cycle ends (`hasMore: false`), the next cycle begins with the same `nextState`. This means:
18
+ - Records behind the cursor are never re-fetched (unless you explicitly move the cursor backwards)
19
+ - A consistency buffer isn't about "catching up next time" — it's about ensuring the cursor never advances past records that haven't been indexed by the source API yet
20
+ - If a record is missed because the cursor passed it, it's missed permanently
21
+
22
+ In replace mode, the runtime handles deletion detection. Each cycle must return the complete dataset. State is only used for within-cycle pagination and is effectively reset between cycles.
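
To make the contract concrete, here is a minimal sketch of a driver loop for one cycle. It is an assumption about how a runtime could behave, not the actual Workers runtime: `runCycle`, `ExecuteResult`, and the synchronous `execute` signature are hypothetical (the real `execute` is async), and the toy source is invented for illustration.

```typescript
// Hypothetical shapes: illustrative only, not the SDK's real types.
type ExecuteResult<S, C> = { changes: C[]; hasMore: boolean; nextState: S | undefined };
type Execute<S, C> = (state: S | undefined) => ExecuteResult<S, C>;

// Run one cycle: call execute repeatedly until hasMore is false.
// Returns every change seen plus the state handed to the next cycle.
function runCycle<S, C>(
  execute: Execute<S, C>,
  startState: S | undefined,
  maxCalls = 1000, // safety valve so a buggy execute cannot loop forever
): { changes: C[]; endState: S | undefined } {
  const changes: C[] = [];
  let state = startState;
  for (let call = 0; call < maxCalls; call++) {
    const result = execute(state);
    changes.push(...result.changes);
    state = result.nextState;
    if (!result.hasMore) return { changes, endState: state };
  }
  throw new Error(`cycle did not finish within ${maxCalls} calls`);
}

// Toy incremental source: numbered records, page size 2, cursor = last seen id.
const source = [1, 2, 3, 4, 5];
const pageSize = 2;
const toyExecute: Execute<{ cursor: number }, number> = (state) => {
  const cursor = state?.cursor ?? 0;
  const page = source.filter((id) => id > cursor).slice(0, pageSize);
  const last = page[page.length - 1] ?? cursor; // empty page keeps the old cursor
  return { changes: page, hasMore: page.length === pageSize, nextState: { cursor: last } };
};

const firstCycle = runCycle(toyExecute, undefined);
// Incremental mode: the next cycle starts from the previous cycle's end state.
const secondCycle = runCycle(toyExecute, firstCycle.endState);
```

The second cycle returns no changes because the cursor already sits past every record, which is exactly why a record skipped by the cursor is never seen again.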
23
+
24
+ ---
25
+
26
+ ## Source 1: Salesforce
27
+
28
+ **API type:** REST + SOQL queries
29
+ **Pagination:** Keyset on `(timestamp, id)`
30
+
31
+ ### Backfill
32
+
33
+ Uses `ORDER BY CreatedDate, Id LIMIT N` with a keyset `WHERE` clause:
34
+
35
+ ```sql
36
+ WHERE CreatedDate > :cursorTimestamp
37
+ OR (CreatedDate = :cursorTimestamp AND Id > :cursorId)
38
+ ```
39
+
40
+ This is the gold standard for paginating mutable datasets by timestamp. The `Id` column breaks ties when multiple records share the same `CreatedDate`, preventing both skips and duplicates.
41
+
42
+ ### Delta (Incremental)
43
+
44
+ Identical keyset pattern but on `SystemModstamp` (Salesforce's last-modified timestamp) instead of `CreatedDate`. The cursor is buffered to **at most 15 seconds behind "now"** to guard against Salesforce's eventual consistency. This buffer is critical because the cursor never goes backwards — any record not yet visible when the cursor passes it is lost permanently.
45
+
46
+ ### Cursor Design
47
+
48
+ With separate backfill and delta syncs, each has its own simple cursor:
49
+
50
+ ```ts
51
+ // Backfill cursor (within-cycle pagination for replace mode)
52
+ type SalesforceBackfillState = { cursorTimestamp: string; cursorId: string };
53
+
54
+ // Delta cursor (persists across cycles in incremental mode)
55
+ type SalesforceDeltaState = { cursorTimestamp: string; cursorId: string };
56
+ ```
57
+
58
+ Since the backfill is a replace-mode sync, its state is only used for within-cycle pagination. The delta sync's cursor persists across cycles and tracks the last-seen `SystemModstamp`.
59
+
60
+ ### Gotcha: Unreliable `done` Flag
61
+
62
+ Salesforce returns a `done` boolean in query results. It lies. The production code requires *both* `done == true` AND `records.length < limit` before treating a page as the last one. Neither signal alone is trustworthy.
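
The two-signal check described above could be sketched as follows. `QueryPage` and `isLastPage` are hypothetical names introduced for illustration; only the `done` flag and the page-size comparison come from the text.

```typescript
// Hypothetical shape of a query response; field names mirror the prose above.
type QueryPage<T> = { done: boolean; records: T[] };

// Treat a page as final only when BOTH signals agree:
// the API reports done AND the page came back short.
function isLastPage<T>(page: QueryPage<T>, limit: number): boolean {
  return page.done && page.records.length < limit;
}

const limit = 3;
// Full page with done=true: keep going (costs at most one extra request).
const fullButDone = isLastPage({ done: true, records: [1, 2, 3] }, limit);
// Short page with done=false: the API says more remain, keep going.
const shortNotDone = isLastPage({ done: false, records: [1] }, limit);
// Short page with done=true: both signals agree, stop.
const shortAndDone = isLastPage({ done: true, records: [1] }, limit);
```

The cost of the conservative rule is one extra request when the final page happens to be exactly full.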
63
+
64
+ ### Workers Mapping
65
+
66
+ With the v2 SDK, this is modeled as two syncs: a manual backfill (replace) and a scheduled delta (incremental).
67
+
68
+ ```ts
69
+ const db = worker.database("salesforce_accounts");
70
+
71
+ // Backfill: keyset pagination on CreatedDate — run manually to seed data
72
+ worker.sync("salesforceBackfill", {
73
+ database: db,
74
+ mode: "replace",
75
+ schedule: "manual",
76
+ execute: async (state: { cursorTimestamp: string; cursorId: string } | undefined) => {
77
+ // Keyset query: WHERE CreatedDate > X OR (CreatedDate = X AND Id > Y)
78
+ // ORDER BY CreatedDate, Id LIMIT 100
79
+ const records = await querySOQL(state?.cursorTimestamp, state?.cursorId);
80
+ const last = records[records.length - 1];
81
+ const done = records.length < 100;
82
+
83
+ return {
84
+ changes: records.map(toUpsert),
85
+ hasMore: !done,
86
+ nextState: done ? undefined : { cursorTimestamp: last.CreatedDate, cursorId: last.Id },
87
+ };
88
+ },
89
+ });
90
+
91
+ // Delta: keyset on SystemModstamp, with 15s consistency buffer
92
+ worker.sync("salesforceDelta", {
93
+ database: db,
94
+ mode: "incremental",
95
+ schedule: { cron: "*/5 * * * *" },
96
+ execute: async (state: { cursorTimestamp: string; cursorId: string } | undefined) => {
97
+ const bufferTs = new Date(Date.now() - 15_000).toISOString();
98
+ const records = await querySOQL(state?.cursorTimestamp, state?.cursorId, "SystemModstamp");
99
+ const last = records[records.length - 1];
100
+ const done = records.length < 100;
101
+
102
+ return {
103
+ changes: records.map(toUpsert),
104
+ hasMore: !done,
105
+ nextState: {
106
+ cursorTimestamp: done ? min(last?.SystemModstamp ?? state?.cursorTimestamp ?? bufferTs, bufferTs) : last.SystemModstamp, // min(): helper (not shown) returning the earlier of two ISO timestamps
107
+ cursorId: last?.Id ?? state?.cursorId,
108
+ },
109
+ };
110
+ },
111
+ });
112
+ ```
113
+
114
+ ---
115
+
116
+ ## Source 2: Stripe
117
+
118
+ **API type:** REST with cursor-based list pagination
119
+ **Pagination:** `starting_after` / `ending_before` + `has_more`
120
+
121
+ ### Backfill
122
+
123
+ Standard Stripe list pagination: `GET /v1/customers?starting_after=cus_xyz&limit=100`. The cursor is the `id` of the last object on the page. Stripe's `has_more` boolean is reliable.
124
+
125
+ **Critical pre-step:** Before fetching any data page, the backfill captures the ID of the most recent event from `GET /v1/events?limit=1`. This "event anchor" is saved in the cursor so the delta phase knows exactly where to start.
126
+
127
+ ### Delta (Event-Based)
128
+
129
+ Reads from `GET /v1/events` in reverse-chronological order. The cursor is an event ID. Events are filtered to only those at least **10 seconds old** — events younger than 10s are skipped. If all events on a page are too recent, the cursor does not advance. Since the cursor never resets, this buffer ensures the cursor doesn't permanently skip past late-arriving events.
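
A minimal sketch of the age filter and cursor rule described above, assuming events arrive newest-first as Stripe lists do. `StripeEvent` is a trimmed-down hypothetical type, and `selectSafeEvents` and the event IDs are invented for illustration.

```typescript
// Trimmed-down event shape; `created` is a Unix timestamp in seconds.
type StripeEvent = { id: string; created: number };

// Keep only events at least minAgeSec old and advance the cursor to the
// newest event that passed the filter. If nothing passes, the cursor stays put.
function selectSafeEvents(
  events: StripeEvent[], // newest first
  prevCursor: string | undefined,
  nowSec: number,
  minAgeSec = 10,
): { safe: StripeEvent[]; nextCursor: string | undefined } {
  const safe = events.filter((e) => e.created <= nowSec - minAgeSec);
  const newestSafe = safe[0]; // list is newest-first, so index 0 is the newest
  return { safe, nextCursor: newestSafe?.id ?? prevCursor };
}

const nowSec = 1_000;
const mixedPage = selectSafeEvents(
  [
    { id: "evt_3", created: 995 }, // 5s old: too recent, skipped
    { id: "evt_2", created: 985 }, // 15s old: safe
    { id: "evt_1", created: 980 }, // 20s old: safe
  ],
  "evt_0",
  nowSec,
);
const allTooRecent = selectSafeEvents([{ id: "evt_3", created: 995 }], "evt_2", nowSec);
```

Because the cursor stops at `evt_2`, the skipped `evt_3` is fetched again on a later run once it is old enough.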
130
+
131
+ ### Nested Object Extraction
132
+
133
+ Stripe objects contain nested sub-objects (e.g., a `PaymentIntent` contains `payment_method`). The sync recursively walks payloads and extracts sub-objects. If a list field has `has_more: true`, it paginates that sub-list inline. This means one "page" of the sync may trigger many HTTP requests.
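
The recursive walk could look roughly like this. It is a sketch, not the production extractor: it assumes nested objects can be recognized by string `id` and `object` fields, it only records which sub-lists report `has_more: true` instead of paginating them, and `extractObjects` and the sample payload are hypothetical.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };
type Extracted = { object: string; id: string };

// Recursively collect every nested value that has string `id` and `object`
// fields. Sub-lists with `has_more: true` are recorded by path so a caller
// could paginate them separately.
function extractObjects(
  value: Json,
  found: Extracted[] = [],
  truncatedLists: string[] = [],
  path = "$",
): { found: Extracted[]; truncatedLists: string[] } {
  if (Array.isArray(value)) {
    value.forEach((item, i) => {
      extractObjects(item, found, truncatedLists, `${path}[${i}]`);
    });
  } else if (value !== null && typeof value === "object") {
    const id = value["id"];
    const kind = value["object"];
    if (typeof id === "string" && typeof kind === "string") {
      found.push({ object: kind, id });
    }
    if (kind === "list" && value["has_more"] === true) truncatedLists.push(path);
    for (const [key, child] of Object.entries(value)) {
      extractObjects(child, found, truncatedLists, `${path}.${key}`);
    }
  }
  return { found, truncatedLists };
}

// Illustrative payload with made-up IDs.
const payload: Json = {
  id: "pi_1",
  object: "payment_intent",
  payment_method: { id: "pm_1", object: "payment_method" },
  charges: { object: "list", has_more: true, data: [{ id: "ch_1", object: "charge" }] },
};
const extraction = extractObjects(payload);
```

Each path in `truncatedLists` stands for at least one more HTTP request, which is how a single sync page fans out into many calls.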
134
+
135
+ ### Cursor Design
136
+
137
+ With separate syncs, each cursor is simple:
138
+
139
+ ```ts
140
+ // Backfill cursor (within-cycle pagination for replace mode)
141
+ type StripeBackfillState = { cursor: string | null };
142
+
143
+ // Delta cursor (event ID, persists across cycles)
144
+ type StripeDeltaState = { cursor: string };
145
+ ```
146
+
147
+ ### Workers Mapping
148
+
149
+ Two syncs: a manual backfill and a scheduled delta reading from the events endpoint.
150
+
151
+ ```ts
152
+ const db = worker.database("stripe_customers");
153
+
154
+ // Backfill: paginate all customers (capturing the event anchor for the delta handoff is not shown here)
155
+ worker.sync("stripeBackfill", {
156
+ database: db,
157
+ mode: "replace",
158
+ schedule: "manual",
159
+ execute: async (state: { cursor: string | null } | undefined) => {
160
+ const { data, has_more } = await stripe.customers.list({
161
+ starting_after: state?.cursor ?? undefined,
162
+ limit: 100,
163
+ });
164
+ const last = data[data.length - 1];
165
+
166
+ return {
167
+ changes: data.map(toUpsert),
168
+ hasMore: has_more,
169
+ nextState: has_more ? { cursor: last.id } : undefined,
170
+ };
171
+ },
172
+ });
173
+
174
+ // Delta: read events, skip any < 10s old
175
+ worker.sync("stripeDelta", {
176
+ database: db,
177
+ mode: "incremental",
178
+ schedule: { cron: "*/5 * * * *" },
179
+ execute: async (state: { cursor: string } | undefined) => {
180
+ const { data: events, has_more } = await stripe.events.list({
181
+ ending_before: state?.cursor,
182
+ limit: 100,
183
+ });
184
+ const safeEvents = events.filter(e => e.created < Date.now() / 1000 - 10);
185
+ const changes = safeEvents.map(eventToChange); // map to upsert or delete
186
+ const lastSafe = safeEvents[0]; // Stripe lists are newest-first, so index 0 is the newest safe event
187
+
188
+ return {
189
+ changes,
190
+ hasMore: has_more && safeEvents.length > 0,
191
+ nextState: { cursor: lastSafe?.id ?? state?.cursor },
192
+ };
193
+ },
194
+ });
195
+ ```
196
+
197
+ ---
198
+
199
+ ## Source 3: HubSpot
200
+
201
+ **API type:** REST (CRM v3 — both List and Search endpoints)
202
+ **Pagination:** Opaque `after` token (List) / timestamp cursor (Search)
203
+
204
+ ### Backfill
205
+
206
+ Uses `GET /crm/v3/objects/{type}?limit=100&after=<token>`. The `after` token is opaque (HubSpot generates it). Completion is detected by the absence of the `paging` key in the response.
207
+
208
+ ### Delta
209
+
210
+ Uses `POST /crm/v3/objects/{type}/search` with a `GTE` filter on `lastmodifieddate` (milliseconds). The cursor advances to `max(lastmodifieddate)` across the page. Capped to **10 seconds behind "now"** — since the cursor never resets, this ensures records still being indexed by HubSpot aren't permanently skipped.
211
+
212
+ ### The Deadlock Problem
213
+
214
+ The most instructive edge case across all sources. HubSpot's Search API only sorts by one field. If 100 or more records (at least a full page) share the same `lastmodifieddate`, the cursor can never advance past that timestamp: every request returns the same 100 records.
215
+
216
+ **Detection:** If `records.length == page_limit` AND all records have the same timestamp → deadlock.
217
+
218
+ **Resolution:** Switch to a special deadlock-breaking mode that filters `lastmodifieddate EQ <stuck_timestamp>` and paginates by `hs_object_id > <last_seen_id>`. When the deadlock clears (empty page), resume normal search with cursor advanced by 1ms.
219
+
220
+ ### Cursor Design
221
+
222
+ With separate syncs, the backfill cursor is simple. The delta sync still needs a multi-phase state for deadlock handling:
223
+
224
+ ```ts
225
+ // Backfill cursor (within-cycle pagination for replace mode)
226
+ type HubSpotBackfillState = { afterToken: string | null };
227
+
228
+ // Delta cursor (deadlock handling requires multi-phase state)
229
+ type HubSpotDeltaState =
230
+ | { phase: "delta"; cursorMs: number }
231
+ | { phase: "deadlock"; deadlockMs: number; lastId: string; resumeCursorMs: number };
232
+ ```
233
+
234
+ ### Workers Mapping
235
+
236
+ Two syncs: a manual backfill using the List endpoint, and a delta sync using the Search endpoint with deadlock handling.
237
+
238
+ ```ts
239
+ const db = worker.database("hubspot_contacts");
240
+
241
+ // Backfill: paginate using opaque after token
242
+ worker.sync("hubspotBackfill", {
243
+ database: db,
244
+ mode: "replace",
245
+ schedule: "manual",
246
+ execute: async (state: { afterToken: string | null } | undefined) => {
247
+ const { results, paging } = await hubspotList(state?.afterToken);
248
+ const hasMore = Boolean(paging?.next?.after);
249
+
250
+ return {
251
+ changes: results.map(toUpsert),
252
+ hasMore,
253
+ nextState: hasMore ? { afterToken: paging.next.after } : undefined,
254
+ };
255
+ },
256
+ });
257
+
258
+ // Delta: search by lastmodifieddate with deadlock handling
259
+ type HubSpotDeltaState =
260
+ | { phase: "delta"; cursorMs: number }
261
+ | { phase: "deadlock"; deadlockMs: number; lastId: string; resumeCursorMs: number };
262
+
263
+ worker.sync("hubspotDelta", {
264
+ database: db,
265
+ mode: "incremental",
266
+ schedule: { cron: "*/5 * * * *" },
267
+ execute: async (state: HubSpotDeltaState | undefined) => {
268
+ if (state?.phase === "deadlock") {
269
+ // Page through records at the stuck timestamp by ID
270
+ const results = await hubspotSearch({
271
+ filter: { lastmodifieddate: { eq: state.deadlockMs } },
272
+ after: state.lastId, // hs_object_id > lastId
273
+ });
274
+
275
+ if (results.length === 0) {
276
+ // Deadlock cleared — resume normal delta, advance cursor by 1ms
277
+ return {
278
+ changes: [],
279
+ hasMore: true,
280
+ nextState: { phase: "delta", cursorMs: state.resumeCursorMs + 1 },
281
+ };
282
+ }
283
+
284
+ const lastId = results[results.length - 1].id;
285
+ return {
286
+ changes: results.map(toUpsert),
287
+ hasMore: true,
288
+ nextState: { phase: "deadlock", deadlockMs: state.deadlockMs, lastId, resumeCursorMs: state.resumeCursorMs },
289
+ };
290
+ }
291
+
292
+ // Normal delta: search by lastmodifieddate >= cursorMs
293
+ const bufferMs = Date.now() - 10_000;
294
+ const cursorMs = state?.cursorMs ?? Date.now() - 5 * 60 * 1000;
295
+ const results = await hubspotSearch({
296
+ filter: { lastmodifieddate: { gte: cursorMs } },
297
+ limit: 100,
298
+ });
299
+
300
+ // Deadlock detection
301
+ const allSameTimestamp = results.length === 100 &&
302
+ results.every(r => r.lastmodifieddate === results[0].lastmodifieddate);
303
+
304
+ if (allSameTimestamp) {
305
+ return {
306
+ changes: results.map(toUpsert),
307
+ hasMore: true,
308
+ nextState: {
309
+ phase: "deadlock",
310
+ deadlockMs: results[0].lastmodifieddate,
311
+ lastId: results[results.length - 1].id,
312
+ resumeCursorMs: results[0].lastmodifieddate,
313
+ },
314
+ };
315
+ }
316
+
317
+ const maxTs = results.length > 0 ? Math.max(...results.map(r => r.lastmodifieddate)) : cursorMs; // Math.max() of an empty list is -Infinity
318
+ const nextCursor = Math.min(maxTs, bufferMs);
319
+ const done = results.length < 100;
320
+
321
+ return {
322
+ changes: results.map(toUpsert),
323
+ hasMore: !done,
324
+ nextState: { phase: "delta", cursorMs: done ? nextCursor : maxTs },
325
+ };
326
+ },
327
+ });
328
+ ```
329
+
330
+ ---
331
+
332
+ ## Source 4: GitHub
333
+
334
+ **API type:** GraphQL (Relay-style connections)
335
+ **Pagination:** `endCursor` + `hasNextPage` from `pageInfo`
336
+
337
+ ### Backfill
338
+
339
+ Standard Relay pagination: `first: 100, after: $cursor` → `pageInfo { endCursor, hasNextPage }`. The cursor is the opaque `endCursor` string.
340
+
341
+ ### Two-Level Pagination
342
+
343
+ GitHub has nested collections (e.g., issues within repositories). The sync handles this with a two-level cursor:
344
+
345
+ 1. **Outer level:** paginate over repositories using `endCursor`
346
+ 2. **Inner level:** for each repository, track a separate `endCursor` in a `nestedCursors` map
347
+
348
+ When inner cursors exist, the next request only queries repos with more data. The overall `hasMore` is `outerHasMore || Object.keys(nestedCursors).length > 0`.
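
One way to keep the two levels straight is a pure function that folds each response into the next state. This is a sketch under assumptions: `TwoLevelState` adds a hypothetical `outerDone` flag so the outer result survives requests that only touch inner pages, and `advance` plus the sample cursors are invented for illustration.

```typescript
type PageInfo = { endCursor: string | null; hasNextPage: boolean };

// Hypothetical state: outer cursor, an explicit outerDone flag, and one
// inner cursor per repository that still has pages left.
type TwoLevelState = {
  outerCursor: string | null;
  outerDone: boolean;
  nestedCursors: Record<string, string>; // repo -> inner endCursor
};

function advance(
  prev: TwoLevelState,
  outer: PageInfo | null, // null when this request skipped the outer level
  inner: Record<string, PageInfo>, // pageInfo of each repo's inner connection
): { hasMore: boolean; nextState: TwoLevelState } {
  const nestedCursors: Record<string, string> = { ...prev.nestedCursors };
  for (const [repo, info] of Object.entries(inner)) {
    if (info.hasNextPage && info.endCursor !== null) {
      nestedCursors[repo] = info.endCursor; // repo still has inner pages
    } else {
      delete nestedCursors[repo]; // repo's inner collection is drained
    }
  }
  const nextState: TwoLevelState = {
    outerCursor: outer ? outer.endCursor : prev.outerCursor,
    outerDone: outer ? !outer.hasNextPage : prev.outerDone,
    nestedCursors,
  };
  const hasMore = !nextState.outerDone || Object.keys(nestedCursors).length > 0;
  return { hasMore, nextState };
}

const start: TwoLevelState = { outerCursor: null, outerDone: false, nestedCursors: {} };
// Request 1: last page of repos; repo "a" still has issues left, "b" does not.
const step1 = advance(start, { endCursor: "R1", hasNextPage: false }, {
  a: { endCursor: "A1", hasNextPage: true },
  b: { endCursor: "B1", hasNextPage: false },
});
// Request 2: only repo "a" is queried, and it finishes.
const step2 = advance(step1.nextState, null, { a: { endCursor: "A2", hasNextPage: false } });
```

The cycle ends only when the outer level is exhausted and the inner cursor map is empty.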
349
+
350
+ ### Rate Limit Awareness
351
+
352
+ When the query requests it, the GraphQL response includes `rateLimit { limit }`. The production sync stores this value in the cursor and uses it to pace requests on subsequent pages; the simplified state below omits it.
353
+
354
+ ### Cursor Design
355
+
356
+ ```ts
357
+ type GitHubState = {
358
+ cursor: string | null;
359
+ nestedCursors?: Record<string, string>; // repo → inner endCursor
360
+ };
361
+
362
+ // hasMore = Boolean(pageInfo.hasNextPage) || Object.keys(nestedCursors ?? {}).length > 0
363
+ ```
364
+
365
+ ### Workers Mapping
366
+
367
+ ```ts
368
+ const db = worker.database("github_repos");
369
+
370
+ worker.sync("githubSync", {
371
+ database: db,
372
+ mode: "replace",
373
+ schedule: { cron: "0 * * * *" }, // GitHub GraphQL has no good incremental signal without webhooks
374
+ execute: async (state: GitHubState | undefined) => {
375
+ // For flat collections (e.g., repos): simple Relay pagination
376
+ const { data, pageInfo } = await graphql(query, { after: state?.cursor });
377
+
378
+ return {
379
+ changes: data.map(toUpsert),
380
+ hasMore: pageInfo.hasNextPage,
381
+ nextState: pageInfo.hasNextPage
382
+ ? { cursor: pageInfo.endCursor }
383
+ : undefined,
384
+ };
385
+
386
+ // For nested collections (e.g., issues across repos):
387
+ // Track nestedCursors map, query only repos with hasNextPage,
388
+ // hasMore = outerMore || Object.keys(nestedCursors).length > 0
389
+ },
390
+ });
391
+ ```
392
+
393
+ ---
394
+
395
+ ## Source 5: ServiceNow
396
+
397
+ **API type:** REST (Table API with SYSPARM query language)
398
+ **Pagination:** Keyset on `(sys_updated_on, sys_id)`
399
+
400
+ ### Backfill & Delta
401
+
402
+ Same keyset pattern as Salesforce, using ServiceNow's query syntax:
403
+
404
+ ```
405
+ sys_updated_on>{cursor}^NQsys_updated_on={cursor}^sys_id>{sys_id}
406
+ ^ORDERBYsys_updated_on^ORDERBYsys_id
407
+ ```
408
+
409
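+ The `^NQ` ("new query") operator ORs two complete condition sets, so the query reads as `sys_updated_on > {cursor}` OR (`sys_updated_on = {cursor}` AND `sys_id > {sys_id}`). This is the `(timestamp, id)` keyset pattern again.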
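
A small builder for the encoded query above might look like this. `buildKeysetQuery` is a hypothetical helper, and starting the first page with ordering only (no filter) is an assumption, not something the source specifies.

```typescript
type ServiceNowCursor = { updatedOn: string; sysId: string };

// Build the encoded query shown above. With no cursor yet, only the ordering
// is applied so the first page starts at the oldest record (an assumption).
function buildKeysetQuery(cursor?: ServiceNowCursor): string {
  const order = "ORDERBYsys_updated_on^ORDERBYsys_id";
  if (!cursor) return order;
  const { updatedOn, sysId } = cursor;
  return (
    `sys_updated_on>${updatedOn}` +
    `^NQsys_updated_on=${updatedOn}^sys_id>${sysId}` +
    `^${order}`
  );
}

const firstPageQuery = buildKeysetQuery();
const nextPageQuery = buildKeysetQuery({ updatedOn: "2024-01-01 00:00:00", sysId: "abc123" });
// URL-encode the query before placing it in a request URL.
const encodedQuery = encodeURIComponent(nextPageQuery);
```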
410
+
411
+ ### Deletion via Audit Log
412
+
413
+ ServiceNow captures deletes in the `sys_audit` table (`fieldname=DELETED`). In the production system, this runs as a separate parallel stream using `(sys_created_on, sys_id)` keyset pagination.
414
+
415
+ **In Workers (single stream):** Model this as a flip-flop. The main delta stream runs until `hasMore: false` (caught up), then the state switches to the delete stream for a cycle, then back. See the "Stream Flip-Flop" pattern below.
416
+
417
+ ### Completion Detection Difference
418
+
419
+ - **Backfill:** continues until an empty page (`records.length == 0`)
420
+ - **Delta:** stops when a page is not full (`records.length < limit`)
421
+
422
+ This is a subtle but important distinction. Backfill is exhaustive; delta assumes a non-full page means "caught up."
423
+
424
+ ### Cursor Design
425
+
426
+ With separate syncs, the backfill cursor is simple. The delta sync uses a flip-flop state for delete detection:
427
+
428
+ ```ts
429
+ // Backfill cursor (within-cycle pagination for replace mode)
430
+ type ServiceNowBackfillState = { afterTimestamp: string | null; afterId: string | null };
431
+
432
+ // Delta cursor (flip-flop between changes and deletes)
433
+ type ServiceNowDeltaState =
434
+ | { phase: "delta"; afterTimestamp: string; afterId: string;
435
+ deletesCursor?: { afterCreatedOn: string; afterId: string } }
436
+ | { phase: "deletes"; afterCreatedOn: string; afterId: string;
437
+ deltaCursor: { afterTimestamp: string; afterId: string } };
438
+ ```
439
+
440
+ ---
441
+
442
+ ## APIs Without Change Tracking
443
+
444
+ Some APIs (Linear, Airtable) have no `updated_at`, no change feed, and no deletion webhook. For these, **use `mode: "replace"`**. The runtime handles the full sweep automatically: each cycle returns the complete dataset, and anything not returned gets deleted.
445
+
446
+ Replace mode is the right choice when:
447
+ - The API provides only opaque cursor pagination with no timestamp filtering
448
+ - Total records are manageable (< ~50k, depending on schedule interval)
449
+ - You need deletion detection but the API provides no delete signal
450
+
451
+ The state in replace mode is just within-cycle pagination (e.g., `{ offset: string }`) and effectively resets between cycles.
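
Conceptually, deletion detection in replace mode is a set difference between what is stored and what the cycle returned. The sketch below illustrates that idea only; it is not the Workers runtime's implementation, and `findDeletedKeys` is a hypothetical name.

```typescript
// Anything stored but not returned by a complete cycle is treated as
// deleted at the source.
function findDeletedKeys(storedKeys: string[], returnedKeys: string[]): string[] {
  const returned = new Set(returnedKeys);
  return storedKeys.filter((key) => !returned.has(key));
}

// "b" disappeared upstream; "d" is new and will simply be upserted.
const deletedKeys = findDeletedKeys(["a", "b", "c"], ["a", "c", "d"]);
```

This is also why each replace cycle must return the complete dataset: a page that silently fails to load would look like a batch of deletions.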
452
+
453
+ ---
454
+
455
+ ## Cross-Cutting Patterns
456
+
457
+ These patterns recur across multiple sources. They're the building blocks of cursor design.
458
+
459
+ ### Pattern 1: Keyset Pagination `(timestamp, id)`
460
+
461
+ **Used by:** Salesforce, ServiceNow
462
+
463
+ The correct way to paginate a mutable dataset ordered by timestamp. Two columns form the cursor: the timestamp and a unique ID that breaks ties. The query uses an OR condition:
464
+
465
+ ```
466
+ WHERE ts > :cursorTs OR (ts = :cursorTs AND id > :cursorId)
467
+ ORDER BY ts, id
468
+ ```
469
+
470
+ **When to use:** Any API that lets you query with inequality filters on a timestamp and sort by it. Particularly important when multiple records can share the same timestamp (batch imports, bulk updates).
471
+
472
+ **Workers implementation:**
473
+
474
+ ```ts
475
+ type KeysetCursor = { cursorTimestamp: string; cursorId: string };
476
+
477
+ const lastRecord = records[records.length - 1];
478
+ const nextState: KeysetCursor = {
479
+ cursorTimestamp: lastRecord.updatedAt,
480
+ cursorId: lastRecord.id,
481
+ };
482
+ ```
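
To see why the tie-breaking ID matters, here is an in-memory sketch of the same query. The rows, `compareRows`, and `fetchPage` are invented for illustration; a real sync would push the filter and ordering down to the API.

```typescript
type Row = { id: string; updatedAt: string };
type KeysetCursor = { cursorTimestamp: string; cursorId: string };

// Total order on (updatedAt, id). ISO-8601 UTC strings in one format
// compare correctly as plain text.
function compareRows(a: Row, b: Row): number {
  if (a.updatedAt !== b.updatedAt) return a.updatedAt < b.updatedAt ? -1 : 1;
  return a.id < b.id ? -1 : a.id > b.id ? 1 : 0;
}

// In-memory equivalent of:
//   WHERE ts > :cursorTs OR (ts = :cursorTs AND id > :cursorId)
//   ORDER BY ts, id LIMIT :limit
function fetchPage(rows: Row[], cursor: KeysetCursor | undefined, limit: number): Row[] {
  return rows
    .filter((r) =>
      cursor === undefined ||
      r.updatedAt > cursor.cursorTimestamp ||
      (r.updatedAt === cursor.cursorTimestamp && r.id > cursor.cursorId))
    .sort(compareRows)
    .slice(0, limit);
}

// Three rows share one timestamp, as after a bulk update.
const rows: Row[] = [
  { id: "a", updatedAt: "2024-01-01T00:00:00Z" },
  { id: "b", updatedAt: "2024-01-01T00:00:00Z" },
  { id: "c", updatedAt: "2024-01-01T00:00:00Z" },
  { id: "d", updatedAt: "2024-01-02T00:00:00Z" },
];

const seen: string[] = [];
let cursor: KeysetCursor | undefined;
for (;;) {
  const page = fetchPage(rows, cursor, 2);
  const last = page[page.length - 1];
  if (last === undefined) break; // empty page: nothing left
  seen.push(...page.map((r) => r.id));
  cursor = { cursorTimestamp: last.updatedAt, cursorId: last.id };
}
```

With a timestamp-only cursor (`ts > :cursorTs`), the second page would skip row `c`, because it shares its timestamp with the last row of the first page.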
483
+
484
+ ### Pattern 2: Consistency Buffer
485
+
486
+ **Used by:** Salesforce (15s), Stripe (10s), HubSpot (10s)
487
+
488
+ Never advance the cursor to "now." Always leave a gap. Eventually consistent APIs may not surface recent writes in query results immediately. Because the cursor never resets in incremental mode, if it advances past a record that hasn't been indexed yet, that record is lost permanently.
489
+
490
+ The buffer ensures the cursor stays behind the API's consistency frontier.
491
+
492
+ **Workers implementation:**
493
+
494
+ ```ts
495
+ const bufferMs = 15_000; // 15 seconds
496
+ const maxCursor = new Date(Date.now() - bufferMs).toISOString();
497
+ const nextCursor = records.length > 0
498
+ ? min(lastRecord.updatedAt, maxCursor)
499
+ : maxCursor;
500
+ ```
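
The snippet above relies on a `min` helper for timestamps. A self-contained version could look like the sketch below; `minIso` and `nextBufferedCursor` are hypothetical names, and passing `nowMs` explicitly is a choice made here so the behavior is easy to test.

```typescript
// Return the earlier of two ISO-8601 timestamps. Comparing parsed values
// avoids depending on both strings using the same text format.
function minIso(a: string, b: string): string {
  return Date.parse(a) <= Date.parse(b) ? a : b;
}

// Cap the next cursor so it never gets closer to "now" than bufferMs.
function nextBufferedCursor(
  lastUpdatedAt: string | undefined, // last record's timestamp, if any
  nowMs: number,
  bufferMs = 15_000,
): string {
  const maxCursor = new Date(nowMs - bufferMs).toISOString();
  return lastUpdatedAt === undefined ? maxCursor : minIso(lastUpdatedAt, maxCursor);
}

const nowMs = Date.parse("2024-01-01T00:01:00.000Z");
// Record is 5s old: the cursor is capped at the buffer frontier (now - 15s).
const cappedCursor = nextBufferedCursor("2024-01-01T00:00:55.000Z", nowMs);
// Record is 50s old: its own timestamp is used unchanged.
const uncappedCursor = nextBufferedCursor("2024-01-01T00:00:10.000Z", nowMs);
// No records: the cursor moves to the buffer frontier.
const emptyPageCursor = nextBufferedCursor(undefined, nowMs);
```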
501
+
502
+ ### Pattern 3: Event Anchor (Backfill-to-Delta Transition)
503
+
504
+ **Used by:** Stripe, Salesforce, HubSpot
505
+
506
+ Before starting a backfill, snapshot the current position of the change feed (event ID, timestamp, etc.). The delta sync should start from that snapshot — not from the end of the backfill data.
507
+
508
+ **Why:** The backfill may take hours. Records change during that time. Without the anchor, changes between "backfill started" and "backfill ended" are lost permanently (since the cursor never goes backwards).
509
+
510
+ **v2 SDK note:** With separate backfill and delta syncs, the event anchor is handled by starting the delta sync before or concurrently with the backfill. The delta sync's cursor naturally captures the starting point. If you need explicit coordination, snapshot the event anchor before triggering the backfill and initialize the delta sync's cursor from it.
511
+
512
+ **Workers implementation:**
513
+
514
+ ```ts
515
+ // Snapshot the anchor before triggering the backfill
516
+ const eventAnchor = await getLatestEventId();
517
+ // Initialize the delta sync's cursor to start from this anchor
518
+ // The backfill (replace, manual) handles seeding all existing data
519
+ ```
520
+
521
+ ### Pattern 4: Sweep = Replace Mode
522
+
523
+ When an API has no `updated_at`, no change feed, and no deletion signal, use `mode: "replace"`. The runtime handles the full sweep and deletion detection automatically. You just return all records each cycle.
524
+
525
+ ### Pattern 5: Multi-Phase State Machine
526
+
527
+ **Used by:** HubSpot (deadlock handling), ServiceNow (flip-flop deletes)
528
+
529
+ Model the state as a discriminated union when a single sync needs multiple phases:
530
+
531
+ ```ts
532
+ type State =
533
+ | { phase: "delta"; cursor: string }
534
+ | { phase: "deadlock"; stuckAt: number; lastId: string; resumeCursor: string };
535
+ ```
536
+
537
+ Each `execute` call checks `state.phase` and runs the appropriate logic. In the v2 SDK, backfill and delta are typically **separate syncs** (backfill as `replace` + `manual`, delta as `incremental`), so the state machine within a single sync is simpler. Multi-phase state machines are still useful for edge cases within a delta sync (deadlock handling, flip-flop deletes).
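
A common way to structure the phase check is an exhaustive `switch`, sketched below. `describeNextRequest` is a hypothetical stand-in for the real per-phase logic and only returns a description of the request each phase would make.

```typescript
type State =
  | { phase: "delta"; cursor: string }
  | { phase: "deadlock"; stuckAt: number; lastId: string; resumeCursor: string };

// Dispatch on the discriminant. The `never` assignment makes the compiler
// flag any phase that is added to State but not handled here.
function describeNextRequest(state: State): string {
  switch (state.phase) {
    case "delta":
      return `search where modified >= ${state.cursor}`;
    case "deadlock":
      return `search where modified == ${state.stuckAt} and id > ${state.lastId}`;
    default: {
      const unhandled: never = state;
      throw new Error(`unhandled phase: ${JSON.stringify(unhandled)}`);
    }
  }
}

const deltaRequest = describeNextRequest({ phase: "delta", cursor: "1700000000000" });
const deadlockRequest = describeNextRequest({
  phase: "deadlock",
  stuckAt: 1700000000000,
  lastId: "42",
  resumeCursor: "1700000000000",
});
```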
538
+
539
+ ### Pattern 6: Stream Flip-Flop (Single-Stream Delete Detection)
540
+
541
+ **Used by:** ServiceNow (adapted for single-stream Workers)
542
+
543
+ When an API exposes deletions through a separate endpoint (audit log, archived filter, trash), but you only have one `execute` function, alternate between streams:
544
+
545
+ 1. Run the main delta stream until `hasMore: false` (caught up to present)
546
+ 2. Switch to the delete-detection stream for one or more cycles
547
+ 3. When the delete stream catches up, switch back to delta
548
+
549
+ The state carries cursors for both streams, plus which one is active:
550
+
551
+ ```ts
552
+ type State =
553
+ | { phase: "delta"; deltaCursor: string; deletesCursor?: string }
554
+ | { phase: "deletes"; deltaCursor: string; deletesCursor: string };
555
+
556
+ // In execute:
557
+ if (state.phase === "delta") {
558
+ const { records, hasMore } = await fetchChanges(state.deltaCursor);
559
+ if (!hasMore) {
560
+ // Delta caught up — flip to deletes on next cycle
561
+ return {
562
+ changes: records.map(toUpsert),
563
+ hasMore: false,
564
+ nextState: { phase: "deletes", deltaCursor: nextCursor, deletesCursor: state.deletesCursor ?? "" },
565
+ };
566
+ }
567
+ // ... continue delta
568
+ }
569
+
570
+ if (state.phase === "deletes") {
571
+ const { deletedIds, hasMore } = await fetchDeletedRecords(state.deletesCursor);
572
+ if (!hasMore) {
573
+ // Deletes caught up — flip back to delta
574
+ return {
575
+ changes: deletedIds.map(id => ({ type: "delete", key: id })),
576
+ hasMore: false,
577
+ nextState: { phase: "delta", deltaCursor: state.deltaCursor, deletesCursor: nextCursor },
578
+ };
579
+ }
580
+ // ... continue deletes
581
+ }
582
+ ```
583
+
584
+ The flip happens at cycle boundaries (`hasMore: false`). The next cycle picks up with the alternate stream. Both cursors advance independently and persist across cycles.
585
+
586
+ ---
587
+
588
+ ## Decision Tree: Choosing a Pagination Strategy
589
+
590
+ This tree applies to **backfill** pagination. Delta pagination often differs (see per-source sections).
591
+
592
+ ```
593
+ Does the API provide pagination?
594
+ ├─ No → Return all data in one batch (small datasets only)
595
+
596
+ ├─ Yes, opaque cursor token (GraphQL endCursor, Stripe starting_after)
597
+ │ └─ Use the token directly in state
598
+ │ State: { cursor: string | null }
599
+
600
+ ├─ Yes, page numbers or offsets
601
+ │ └─ Use page number in state
602
+ │ State: { page: number }
603
+
604
+ └─ Yes, timestamp-based query (updated_since, modified_after)
605
+ ├─ Can multiple records share the same timestamp?
606
+ │ ├─ No → Simple timestamp cursor
607
+ │ │ State: { cursor: string }
608
+ │ │
609
+ │ └─ Yes → Keyset cursor (timestamp + id)
610
+ │ State: { cursorTimestamp: string, cursorId: string }
611
+
612
+ └─ Always add a consistency buffer (10-60s behind now)
613
+ APIs tend to be eventually consistent — safe default
614
+ ```
615
+
616
+ For **delta** pagination, the main question is: does the API have a change feed?
617
+
618
+ ```
619
+ Does the API have an events/changelog endpoint?
620
+ ├─ Yes → Use event ID as delta cursor (Stripe pattern)
621
+ │ Anchor the latest event ID before backfill starts
622
+
623
+ ├─ No, but has updated_at / modified_since filter
624
+ │ └─ Use timestamp (or keyset) as delta cursor (Salesforce pattern)
625
+ │ Apply consistency buffer
626
+
627
+ └─ No change tracking at all
628
+ └─ Use replace mode instead of incremental
629
+ ```
630
+
631
+ ---
632
+
633
+ ## Decision Tree: Choosing Replace vs Incremental Mode
634
+
635
+ ```
636
+ Does the API support change tracking (updated_at / modified_since / change feed)?
637
+ ├─ No → replace (simpler, auto-handles deletes)
638
+
639
+ ├─ Yes → backfill (replace, manual) + delta (incremental, scheduled)
640
+ │ │
641
+ │ │ The backfill sync seeds the database on-demand.
642
+ │ │ The delta sync keeps it up-to-date on a schedule.
643
+ │ │ Both target the same worker.database() handle.
644
+ │ │
645
+ │ └─ Does the API support deletion detection?
646
+ │ ├─ Yes (archived filter, audit log, events) → delta sync with flip-flop deletes
647
+ │ ├─ No, but deletions matter → replace only (re-fetches everything, catches deletes)
648
+ │ └─ No, and deletions don't matter → incremental delta (accept stale records)
649
+ ```
650
+
651
+ ---
652
+
653
+ ## Summary Table
654
+
655
+ | Source | API Type | Backfill Pagination | Delta Strategy | Key Pattern |
656
+ |---|---|---|---|---|
657
+ | Salesforce | REST/SOQL | Keyset (timestamp, id) | Keyset on SystemModstamp | Consistency buffer (15s), overlap transition |
658
+ | Stripe | REST | `starting_after` cursor | Event feed (10s buffer) | Event anchor before backfill |
659
+ | HubSpot | REST | Opaque `after` token | Search API + timestamp | Deadlock detection & resolution |
660
+ | GitHub | GraphQL | Relay `endCursor` | N/A (use replace mode) | Two-level nested pagination |
661
+ | ServiceNow | REST | Keyset (timestamp, id) | Same keyset | Flip-flop delete stream via audit log |