@effect-app/infra 4.0.0-beta.258 → 4.0.0-beta.259

@@ -0,0 +1,262 @@
1
+ # Durable Workflow Engines (SQLite & Cosmos)
2
+
3
+ This package provides two custom durable `WorkflowEngine` implementations for
4
+ `@effect/workflow`:
5
+
6
+ - [`WorkflowEngineSqlite.ts`](../src/WorkflowEngineSqlite.ts) — SQL-backed (4 tables).
7
+ - [`WorkflowEngineCosmos.ts`](../src/WorkflowEngineCosmos.ts) — Azure Cosmos DB
8
+ (single container, per-execution partition key).
9
+
10
+ Both implement the low-level `WorkflowEngine.Encoded` contract from
11
+ `effect/unstable/workflow/WorkflowEngine` and wrap it with `makeUnsafe`. They are
12
+ drop-in alternatives to Effect's built-in `ClusterWorkflowEngine`, trading
13
+ cluster-grade routing for a much lighter operational footprint: they need only a
14
+ database, not the full cluster stack (ShardManager, Runners, MessageStorage).
15
+
16
+ This document explains what they provide, how they differ from
17
+ `ClusterWorkflowEngine`, and how they behave during blue/green deployments.
18
+
19
+ ---
20
+
21
+ ## 1. What these engines provide
22
+
23
+ Both engines persist the complete durable-execution state and recover it after a
24
+ restart. They are at **parity with `ClusterWorkflowEngine` on durability**:
25
+
26
+ | Capability | Provided | How |
27
+ | --------------------------------- | -------- | ------------------------------------------------------------------------ |
28
+ | Durable execution state | ✅ | exec / activity / deferred / clock rows, schema-encoded payloads+results |
29
+ | Restart recovery | ✅ | recovery poller re-drives `running` executions with an expired lease |
30
+ | Activity replay | ✅ | keyed by `(executionId, name, attempt)`; completed results replay |
31
+ | Durable clocks (`DurableClock`) | ✅ | clock row + clock poller; survives restart |
32
+ | Suspend / resume | ✅ | deferred completions persisted, polled, re-drive on completion |
33
+ | Idempotency / exactly-once writes | ✅ | first-writer-wins (`ON CONFLICT DO NOTHING` / batch `Create`) |
34
+ | Multi-process safety | ✅\* | etag optimistic-concurrency (OCC) + worker lease |
35
+
36
+ \* With caveats — see [§4 split-brain](#4-concurrency--split-brain) and
37
+ [§5 blue/green](#5-bluegreen-deployment-behavior).
38
+
39
+ ### Persistence layout
40
+
41
+ **SQLite** — 4 tables ([`WorkflowEngineSqlite.ts:157-203`](../src/WorkflowEngineSqlite.ts#L157-L203)):
42
+
43
+ - `*_executions` — `execution_id` PK, `workflow_name`, `payload`, `parent`,
44
+ `status` (`running|complete|interrupted`), `suspended`, `interrupted`,
45
+ `completed_result`, `worker`, `lease_expires_at`, `etag`.
46
+ Index `(status, lease_expires_at)` for recovery scans.
47
+ - `*_activities` — PK `(execution_id, name, attempt)`, `result`.
48
+ - `*_deferred` — PK `(execution_id, name)`, `exit`.
49
+ - `*_clocks` — PK `(execution_id, name)`, `fire_at`. Index on `fire_at`.
50
+
51
+ **Cosmos** — single container, 4 document types, all partitioned by
52
+ `executionId` ([`WorkflowEngineCosmos.ts:69-149`](../src/WorkflowEngineCosmos.ts#L69-L149)):
53
+
54
+ - `exec` doc (the execution), `activity::<name>::<attempt>`,
55
+ `deferred::<name>`, `clock::<name>`.
56
+ - Sharing one partition key per execution makes all writes for an execution
57
+ TransactionalBatch-eligible.
58
+
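The Cosmos layout above can be sketched with a few id helpers. This is a minimal illustration, not the package's code: only the id shapes (`activity::<name>::<attempt>`, `deferred::<name>`, `clock::<name>`) and the `executionId` partition key come from the text. The `WorkflowDoc` type, the helper names, and the literal `"exec"` id are assumptions.

```typescript
// Hypothetical document shape: one container, four kinds, one partition key.
type DocKind = "exec" | "activity" | "deferred" | "clock"

interface WorkflowDoc {
  readonly id: string
  readonly executionId: string // partition key shared by every doc of one execution
  readonly kind: DocKind
}

// Assumption: the execution document uses the fixed id "exec".
const execDoc = (executionId: string): WorkflowDoc => ({ id: "exec", executionId, kind: "exec" })

const activityDoc = (executionId: string, name: string, attempt: number): WorkflowDoc => ({
  id: `activity::${name}::${attempt}`,
  executionId,
  kind: "activity"
})

const deferredDoc = (executionId: string, name: string): WorkflowDoc => ({
  id: `deferred::${name}`,
  executionId,
  kind: "deferred"
})

const clockDoc = (executionId: string, name: string): WorkflowDoc => ({
  id: `clock::${name}`,
  executionId,
  kind: "clock"
})

// A transactional batch only works when every document targets the same partition.
const sameBatchPartition = (docs: ReadonlyArray<WorkflowDoc>): boolean =>
  docs.every((doc) => doc.executionId === docs[0]?.executionId)
```

Because every document of an execution carries the same `executionId`, a group such as "exec update + activity result + clock delete" passes the `sameBatchPartition` check, while documents from two executions do not.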
59
+ Both encode payloads/results via schema round-tripping
60
+ (`S.fromJsonString(S.toCodecJson(...))`) so typed values (dates, branded IDs,
61
+ schema classes) survive restart — same strategy as `ClusterWorkflowEngine`.
62
+
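To see why plain `JSON.stringify` is not enough, here is a small sketch with a hand-written codec standing in for a schema. It does not use the Effect Schema API; `OrderPayload`, `encodeOrder`, and `decodeOrder` are hypothetical names chosen for illustration.

```typescript
interface OrderPayload {
  readonly orderId: string
  readonly placedAt: Date
}

const original: OrderPayload = { orderId: "o-1", placedAt: new Date(0) }

// Naive round trip: the Date silently comes back as a string.
const naive = JSON.parse(JSON.stringify(original))

// Codec round trip: encode to a JSON-safe shape, decode back to the typed value.
const encodeOrder = (payload: OrderPayload): string =>
  JSON.stringify({ orderId: payload.orderId, placedAt: payload.placedAt.toISOString() })

const decodeOrder = (json: string): OrderPayload => {
  const raw: unknown = JSON.parse(json)
  if (typeof raw !== "object" || raw === null) throw new Error("expected an object")
  const { orderId, placedAt } = raw as Record<string, unknown>
  if (typeof orderId !== "string" || typeof placedAt !== "string") {
    throw new Error("invalid payload")
  }
  const date = new Date(placedAt)
  if (Number.isNaN(date.getTime())) throw new Error("invalid date")
  return { orderId, placedAt: date }
}

const restored = decodeOrder(encodeOrder(original))
```

Here `naive.placedAt` is a string, while `restored.placedAt` is a `Date` again. A schema-derived codec gives the same guarantee without writing each encoder by hand.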
63
+ ---
64
+
65
+ ## 2. Execution & recovery model
66
+
67
+ These engines use a **poll-and-claim** model, not deterministic routing:
68
+
69
+ 1. **Claim via lease.** A process claims an execution by writing its `workerId`
70
+ and `lease_expires_at = now + leaseTtl` under an etag OCC guard
71
+ ([sqlite:455-475](../src/WorkflowEngineSqlite.ts#L455-L475)). If the lease is
72
+ held and unexpired by another worker, the claim is skipped.
73
+ 2. **Heartbeat.** While driving, a fiber renews the lease every
74
+ `heartbeatInterval` ([sqlite:477-498](../src/WorkflowEngineSqlite.ts#L477-L498)).
75
+ 3. **Recovery poller.** Every `recoveryInterval`, each process scans for
76
+ `status = 'running'` executions with a `NULL`/expired lease and re-drives them
77
+ locally ([sqlite:750-772](../src/WorkflowEngineSqlite.ts#L750-L772)).
78
+ 4. **Clock poller.** Every `clockPollInterval`, each process scans for clocks with
79
+ `fire_at <= now`, inserts the deferred completion (first-writer-wins), deletes
80
+ the clock row, and re-drives ([sqlite:776-801](../src/WorkflowEngineSqlite.ts#L776-L801)).
81
+
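The claim, heartbeat, and recovery steps above can be sketched against an in-memory store. This is an illustration of the poll-and-claim idea under simplifying assumptions (a numeric etag, millisecond timestamps passed in explicitly), not the engines' SQL or Cosmos code. `InMemoryExecutions`, `tryClaim`, `heartbeat`, and `recoverStale` are hypothetical names.

```typescript
interface ExecutionRow {
  executionId: string
  status: "running" | "complete" | "interrupted"
  worker: string | null
  leaseExpiresAt: number | null // epoch millis
  etag: number
}

class InMemoryExecutions {
  private readonly rows = new Map<string, ExecutionRow>()

  insert(row: ExecutionRow): void {
    this.rows.set(row.executionId, { ...row })
  }

  get(executionId: string): ExecutionRow | undefined {
    const row = this.rows.get(executionId)
    return row ? { ...row } : undefined
  }

  // Optimistic concurrency: the write succeeds only if the stored etag is unchanged.
  replaceIfEtag(next: ExecutionRow, expectedEtag: number): boolean {
    const current = this.rows.get(next.executionId)
    if (!current || current.etag !== expectedEtag) return false
    this.rows.set(next.executionId, { ...next, etag: expectedEtag + 1 })
    return true
  }

  // Recovery scan: running executions whose lease is missing or expired.
  staleRunning(now: number): ExecutionRow[] {
    return [...this.rows.values()]
      .filter((r) => r.status === "running" && (r.leaseExpiresAt === null || r.leaseExpiresAt <= now))
      .map((r) => ({ ...r }))
  }
}

const tryClaim = (
  store: InMemoryExecutions, executionId: string, workerId: string, now: number, leaseTtlMs: number
): boolean => {
  const row = store.get(executionId)
  if (!row || row.status !== "running") return false
  const heldByOther = row.worker !== null && row.worker !== workerId &&
    row.leaseExpiresAt !== null && row.leaseExpiresAt > now
  if (heldByOther) return false // live lease elsewhere: skip
  return store.replaceIfEtag({ ...row, worker: workerId, leaseExpiresAt: now + leaseTtlMs }, row.etag)
}

const heartbeat = (
  store: InMemoryExecutions, executionId: string, workerId: string, now: number, leaseTtlMs: number
): boolean => {
  const row = store.get(executionId)
  if (!row || row.worker !== workerId) return false // lease was taken over
  return store.replaceIfEtag({ ...row, leaseExpiresAt: now + leaseTtlMs }, row.etag)
}

// Returns the ids this worker managed to claim and should re-drive locally.
const recoverStale = (
  store: InMemoryExecutions, workerId: string, now: number, leaseTtlMs: number
): string[] =>
  store.staleRunning(now)
    .filter((row) => tryClaim(store, row.executionId, workerId, now, leaseTtlMs))
    .map((row) => row.executionId)
```

A worker whose `heartbeat` returns `false` has lost the lease and should stop driving that execution. In this sketch that is the only signal a stalled worker gets that another process took over.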
82
+ ### Default timings (both engines)
83
+
84
+ | Option | Default | Meaning |
85
+ | ------------------- | ------- | ---------------------------------------- |
86
+ | `leaseTtl` | 30s | how long a claim is held without renewal |
87
+ | `heartbeatInterval` | 10s | lease renewal cadence (< `leaseTtl`) |
88
+ | `recoveryInterval` | 15s | stale-execution rescan cadence |
89
+ | `clockPollInterval` | 5s | due-clock rescan cadence |
90
+
91
+ Sources: [sqlite:137-141](../src/WorkflowEngineSqlite.ts#L137-L141),
92
+ [cosmos:146-149](../src/WorkflowEngineCosmos.ts#L146-L149).
93
+
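As a sketch of how the defaults in the table relate to each other, the following resolves overrides and enforces the `heartbeatInterval < leaseTtl` constraint. The option names come from the table; expressing them as millisecond numbers with an `Ms` suffix, and the `resolveTimings` helper itself, are assumptions. The real engines may accept a different option shape.

```typescript
interface EngineTimings {
  readonly leaseTtlMs: number
  readonly heartbeatIntervalMs: number
  readonly recoveryIntervalMs: number
  readonly clockPollIntervalMs: number
}

// Documented defaults: 30s / 10s / 15s / 5s.
const defaultTimings: EngineTimings = {
  leaseTtlMs: 30000,
  heartbeatIntervalMs: 10000,
  recoveryIntervalMs: 15000,
  clockPollIntervalMs: 5000
}

const resolveTimings = (overrides: Partial<EngineTimings> = {}): EngineTimings => {
  const timings = { ...defaultTimings, ...overrides }
  // A heartbeat that is not strictly shorter than the lease would let the lease
  // expire between renewals, so a healthy worker would lose its executions.
  if (timings.heartbeatIntervalMs >= timings.leaseTtlMs) {
    throw new Error("heartbeatInterval must be shorter than leaseTtl")
  }
  return timings
}
```

Shortening `leaseTtl` speeds up failover but narrows the margin for a slow heartbeat, so the two values are best tuned together.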
94
+ ---
95
+
96
+ ## 3. Comparison vs `ClusterWorkflowEngine`
97
+
98
+ Durability is matched. The gaps are about **routing efficiency, latency, and
99
+ targeting** — the things cluster sharding exists to provide.
100
+
101
+ | Aspect | These engines | `ClusterWorkflowEngine` |
102
+ | ---------------------------- | --------------------------------------------- | ---------------------------------------------------------- |
103
+ | Work routing | poll-and-race; every process scans the table | hash-ring routes each execution to one owning runner |
104
+ | Resume / signal latency | pull-based; up to poll interval (5–15s) | push-based; near-instant via entity messages |
105
+ | Interrupt across processes | flag in storage; remote fiber sees it on poll | interrupt message to owning runner; immediate |
106
+ | Load balancing | none; first claimer wins, recovery herds | even distribution by shard ownership |
107
+ | Node traits / host targeting | none; all processes are equal pollers | shard groups pin workloads to specific hosts (see §3.1) |
108
+ | Entity message ordering | concurrent `driveById` + OCC to resolve | serialized per-entity mailbox |
109
+ | Backpressure / capacity | none | mailbox capacity, termination timeout, poll intervals |
110
+ | Failover speed               | wait for lease expiry (~30s), then re-drive   | fast shard rebalance on node leave                         |
111
+ | Split-brain window | wider (lease-expiry racing) | narrower (shard lock) |
112
+ | Operational cost | **just a DB** | full cluster stack (ShardManager, Runners, MessageStorage) |
113
+
114
+ **Cost at scale:** with N processes, every process recovery-polls the whole
115
+ executions table (every 15s) and the clocks table (every 5s). On Cosmos the recovery scan is a
116
+ **cross-partition query** (fans across all partitions, RU-expensive). Cluster
117
+ scopes reads to owned shards.
118
+
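The scan load grows linearly with the number of processes. A small arithmetic sketch, assuming every process polls independently at the default intervals (`scansPerMinute` is a hypothetical helper):

```typescript
interface ScanLoad {
  readonly recovery: number // recovery scans per minute across all processes
  readonly clock: number // clock scans per minute across all processes
}

const scansPerMinute = (
  processes: number,
  recoveryIntervalSec: number = 15,
  clockPollIntervalSec: number = 5
): ScanLoad => ({
  recovery: processes * (60 / recoveryIntervalSec),
  clock: processes * (60 / clockPollIntervalSec)
})

// Four processes at the defaults: 4 * 4 recovery scans and 4 * 12 clock scans per minute.
const fourProcesses = scansPerMinute(4)
```

With four processes this is 16 recovery scans and 48 clock scans per minute, regardless of how many executions are actually stale or due.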
119
+ ### 3.1 Node traits / shard groups
120
+
121
+ `ClusterWorkflowEngine` supports **heterogeneous nodes** via shard groups:
122
+
123
+ - A runner declares which groups it hosts (`Runner.groups`,
124
+ `ShardingConfig.assignedShardGroups`).
125
+ - A workflow is pinned to a group with
126
+ `Workflow.make({...}).annotate(ClusterSchema.ShardGroup, () => "workflow")`.
127
+ - A separate hash ring per group means a workflow only ever lands on a runner
128
+ that hosts its group (e.g. "GPU workflows → GPU hosts only").
129
+
130
+ These engines have **no equivalent** — all processes are equal pollers. There is
131
+ no way to target a workflow at a subset of hosts. Note the trait granularity in
132
+ cluster is also coarse: opaque string group names, not arbitrary key/value label
133
+ selectors.
134
+
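The idea of group-scoped routing can be illustrated with a toy router. This is not how the cluster routes (it uses shard assignment, while this sketch uses a plain hash modulo the eligible runners); `RunnerInfo`, `fnv1a`, and `routeToGroup` are hypothetical names.

```typescript
interface RunnerInfo {
  readonly address: string
  readonly groups: ReadonlyArray<string>
}

// 32-bit FNV-1a, used here only to get a stable number from an id.
const fnv1a = (input: string): number => {
  let hash = 0x811c9dc5
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0
  }
  return hash >>> 0
}

// Only runners that host the group are candidates; the id then picks one deterministically.
const routeToGroup = (
  runners: ReadonlyArray<RunnerInfo>,
  group: string,
  executionId: string
): RunnerInfo | undefined => {
  const eligible = runners.filter((runner) => runner.groups.indexOf(group) !== -1)
  if (eligible.length === 0) return undefined
  return eligible[fnv1a(executionId) % eligible.length]
}

const runners: ReadonlyArray<RunnerInfo> = [
  { address: "cpu-1", groups: ["default"] },
  { address: "gpu-1", groups: ["default", "gpu"] },
  { address: "gpu-2", groups: ["gpu"] }
]
```

In the poll-and-claim engines there is no analogue of the `eligible` filter: every process that sees a claimable execution may take it.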
135
+ ---
136
+
137
+ ## 4. Concurrency & split-brain
138
+
139
+ Safety rests on two mechanisms:
140
+
141
+ - **etag OCC** on the exec doc — a losing writer gets
142
+ `OptimisticConcurrencyException` (412/409) and backs off.
143
+ - **Worker lease** — only the lease holder drives; others skip while the lease is
144
+ live.
145
+
146
+ **The window:** `leaseTtl` is 30s, heartbeat 10s. If a process stalls (GC pause,
147
+ network partition) for longer than `leaseTtl` but is still alive, another process
148
+ claims the execution and both can run until the next exec write triggers an OCC
149
+ conflict.
150
+
151
+ - **Persisted activity results** are protected — first-writer-wins
152
+ (`ON CONFLICT DO NOTHING`), so the stored result is single-valued.
153
+ - **The activity _effect itself_** can still fire twice inside that window.
154
+ → **Activities must be idempotent.** Same caveat applies to
155
+ `ClusterWorkflowEngine` (at-least-once during rebalance), but the lease-racing
156
+ window here is wider.
157
+
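The difference between a protected stored result and an unprotected side effect can be shown in a small simulation. `ActivityResults` mimics first-writer-wins storage, and `PaymentGateway` is a hypothetical downstream service that deduplicates on an idempotency key; neither is part of the package.

```typescript
// First-writer-wins result store, keyed like the activities table.
class ActivityResults {
  private readonly results = new Map<string, string>()

  // Returns the value that ends up stored, which is always the first one written.
  putIfAbsent(executionId: string, name: string, attempt: number, result: string): string {
    const key = `${executionId}/${name}/${attempt}`
    const existing = this.results.get(key)
    if (existing !== undefined) return existing
    this.results.set(key, result)
    return result
  }
}

// Hypothetical external service that honours idempotency keys.
class PaymentGateway {
  private readonly receipts = new Map<string, string>()
  chargeCount = 0

  charge(idempotencyKey: string, amountCents: number): string {
    const prior = this.receipts.get(idempotencyKey)
    if (prior !== undefined) return prior // duplicate request: no second charge
    this.chargeCount += 1
    const receipt = `receipt-${this.chargeCount}-${amountCents}`
    this.receipts.set(idempotencyKey, receipt)
    return receipt
  }
}

// The activity derives its idempotency key from stable workflow identifiers,
// so both workers in a split-brain window send the same key.
const runChargeActivity = (
  gateway: PaymentGateway, results: ActivityResults, executionId: string
): string => {
  const receipt = gateway.charge(`${executionId}:charge`, 1999)
  return results.putIfAbsent(executionId, "charge", 1, receipt)
}

const gateway = new PaymentGateway()
const results = new ActivityResults()
const fromWorkerA = runChargeActivity(gateway, results, "e1")
const fromWorkerB = runChargeActivity(gateway, results, "e1") // double-fire
```

Both workers observe the same stored result, and the gateway charges once. Without the idempotency key, the store would still hold a single result, but the customer would be charged twice.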
158
+ ---
159
+
160
+ ## 5. Blue/green deployment behavior
161
+
162
+ ### 5.1 Prerequisite: shared storage
163
+
164
+ - **Cosmos** — the container is always shared; blue and green see the same
165
+ executions. ✅
166
+ - **SQLite** — if each instance has its **own** database file, the storage is
167
+ **not shared**: green cannot see blue's in-flight executions, which stay stranded on
168
+ blue with no recovery path. **Blue/green with per-instance SQLite is broken.**
169
+ Use a shared volume, or treat the SQLite engine as single-node only.
170
+
171
+ The rest of this section assumes shared storage.
172
+
173
+ ### 5.2 Overlap window (blue + green both alive)
174
+
175
+ - In-flight executions hold blue's lease and heartbeat every 10s. Green's recovery
176
+ poller sees the live lease and **skips** them. Blue keeps running them; green
177
+ only takes executions it newly starts. No double-run while blue heartbeats. ✅
178
+ - Both environments poll the same tables → ~2× scan load during overlap (on Cosmos,
179
+ 2× cross-partition recovery RU). Minor, transient.
180
+
181
+ ### 5.3 Cutover (blue terminated)
182
+
183
+ - Blue's leases stop renewing. After `leaseTtl` (30s) + green's `recoveryInterval`
184
+ (15s), green detects the stale executions, claims, and re-drives.
185
+ - **Failover gap ≈ 30–45s** — in-flight executions are paused that long
186
+ (pull-based, not instant).
187
+ - If blue is hard-killed mid-activity, the activity result was not persisted as
188
+ `Complete`, so green **re-runs** it (at-least-once). Non-idempotent side effects
189
+ double-fire — see [§4](#4-concurrency--split-brain).
190
+
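The failover gap can be worked out from the timings. The sketch below assumes each heartbeat resets the lease to `now + leaseTtl` and that blue does not release its lease on shutdown; both are assumptions about behavior rather than statements about the implementation. `failoverGapMs` and `failoverGapBounds` are hypothetical helpers.

```typescript
interface FailoverTimings {
  readonly leaseTtlMs: number
  readonly heartbeatIntervalMs: number
  readonly recoveryIntervalMs: number
}

const documentedDefaults: FailoverTimings = {
  leaseTtlMs: 30000,
  heartbeatIntervalMs: 10000,
  recoveryIntervalMs: 15000
}

// Gap for one cutover: remaining lease time plus the wait for green's next recovery poll.
const failoverGapMs = (
  timings: FailoverTimings,
  sinceLastRenewalMs: number,
  waitForNextPollMs: number
): number => Math.max(0, timings.leaseTtlMs - sinceLastRenewalMs) + waitForNextPollMs

const failoverGapBounds = (timings: FailoverTimings): { minMs: number; maxMs: number } => ({
  // Best case: blue died just before its next heartbeat, green polls right at expiry.
  minMs: failoverGapMs(timings, timings.heartbeatIntervalMs, 0),
  // Worst case: blue renewed just before dying, green polled just before expiry.
  maxMs: failoverGapMs(timings, 0, timings.recoveryIntervalMs)
})

const bounds = failoverGapBounds(documentedDefaults)
```

Under these assumptions the upper bound matches the 45s figure above. The 30s lower figure corresponds to a lease renewed immediately before termination; if the last renewal was up to one heartbeat interval old, the gap can be as short as about 20s.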
191
+ ### 5.4 Split-brain risk is elevated during deploys
192
+
193
+ Blue/green deliberately runs two versions concurrently. If blue is slow-draining
194
+ (SIGTERM grace overlapping a long activity) past `leaseTtl`, green claims and both
195
+ run. OCC + first-writer-wins protect persisted state; the activity-effect
196
+ double-fire window remains.
197
+
198
+ ### 5.5 Sticky-lease — an accidental advantage
199
+
200
+ The lease model is **sticky to the starting worker**: an execution stays on the
201
+ process that started it as long as that process keeps heartbeating. With a
202
+ **graceful drain** (blue stops accepting new executions, finishes in-flight, then
203
+ terminates), in-flight workflows **complete on the version that started them** —
204
+ sidestepping cross-version replay entirely.
205
+
206
+ `ClusterWorkflowEngine` does the opposite: as soon as a blue runner deregisters,
207
+ its shards rebalance onto green (new-code) runners, so mid-flight cross-version
208
+ replay is the _default_ path during a deploy, not an edge case.
209
+
210
+ > ⚠️ **Missing drain hook.** Neither engine currently has a "stop claiming new
211
+ > executions" flag. To get the safe drain story above, this must be added (a flag
212
+ > that disables `tryClaim` for new work while letting heartbeats and in-flight
213
+ > drive continue). Until then, a graceful blue/green relies on the orchestrator
214
+ > keeping blue alive until in-flight work finishes.
215
+
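One possible shape for such a flag is sketched below. It is entirely hypothetical (neither engine exposes `DrainableWorker` or anything like it today) and only models the decision logic: refuse new claims while draining, keep driving what is already owned.

```typescript
// Hypothetical per-process drain state; not part of either engine.
class DrainableWorker {
  private draining = false
  private readonly owned = new Set<string>()

  beginDrain(): void {
    this.draining = true
  }

  // Executions already owned keep heartbeating and driving; new ones are refused while draining.
  shouldClaim(executionId: string): boolean {
    if (this.owned.has(executionId)) return true
    return !this.draining
  }

  markOwned(executionId: string): void {
    this.owned.add(executionId)
  }

  markFinished(executionId: string): void {
    this.owned.delete(executionId)
  }

  // The orchestrator can terminate the process once this turns true.
  canTerminate(): boolean {
    return this.draining && this.owned.size === 0
  }
}

const blue = new DrainableWorker()
blue.markOwned("e1")
blue.beginDrain()
```

After `beginDrain()`, `blue.shouldClaim("e1")` stays `true` while `blue.shouldClaim("e2")` is `false`, and `canTerminate()` flips to `true` once `e1` is marked finished.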
216
+ ### 5.6 Workflow versioning hazard (shared by both engines)
217
+
218
+ The real blue/green danger is **not** engine-specific — it is durable-execution
219
+ versioning:
220
+
221
+ - Replay must be deterministic. An execution started on v1 and replayed on v2 with
222
+ reordered/renamed activities no longer matches stored results by activity name → corrupt replay.
223
+ - A payload encoded with the v1 schema and decoded with an incompatible v2 schema
224
+ fails to decode.
225
+
226
+ Neither these engines nor `ClusterWorkflowEngine` solve this. Mitigations:
227
+
228
+ - **Additive-only schema changes** (no field removal/retype on in-flight payloads).
229
+ - **Do not reshape an in-flight workflow's step sequence**; version the workflow
230
+ name when the shape changes (`OrderV1` / `OrderV2`).
231
+ - **Drain in-flight executions before deploying breaking changes** (see §5.5).
232
+
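The renaming hazard and the versioned-name mitigation can be illustrated with a simplified replay model, where a workflow is a linear list of activity names and replay skips any activity that already has a stored result under the same name. Real workflows are not linear lists; `WorkflowDefinition`, `stepsToExecute`, and the step names are assumptions made for the example.

```typescript
interface WorkflowDefinition {
  readonly name: string
  readonly steps: ReadonlyArray<string>
}

// Replay skips activities with a stored result under the same name and runs the rest.
const stepsToExecute = (
  definition: WorkflowDefinition,
  recordedActivityNames: ReadonlyArray<string>
): string[] => definition.steps.filter((step) => recordedActivityNames.indexOf(step) === -1)

const orderV1: WorkflowDefinition = { name: "OrderV1", steps: ["reserve", "charge", "ship"] }

// Hazard: the same workflow name redeployed with a renamed activity.
const orderV1Renamed: WorkflowDefinition = {
  name: "OrderV1",
  steps: ["reserve", "capture-payment", "ship"]
}

// Mitigation: the new shape gets a new name, and the old definition stays registered.
const orderV2: WorkflowDefinition = { name: "OrderV2", steps: ["reserve", "capture-payment", "ship"] }
const registry = new Map<string, WorkflowDefinition>([
  [orderV1.name, orderV1],
  [orderV2.name, orderV2]
])

// An in-flight execution stores its workflow name, so it resolves to the shape it started with.
const resolveDefinition = (storedWorkflowName: string): WorkflowDefinition | undefined =>
  registry.get(storedWorkflowName)

const recorded = ["reserve", "charge"] // results persisted before the deploy
const safeReplay = stepsToExecute(orderV1, recorded)
const brokenReplay = stepsToExecute(orderV1Renamed, recorded)
```

`safeReplay` contains only `ship`, while `brokenReplay` also contains `capture-payment`, meaning the payment step would run a second time under its new name. Keeping `OrderV1` registered alongside `OrderV2` lets in-flight executions finish on the old shape.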
233
+ ---
234
+
235
+ ## 6. When to use which
236
+
237
+ - **These engines** — durable workflows on a single process, or a small number of
238
+ processes over shared storage, where you want minimal ops (no cluster stack) and
239
+ can tolerate 5–15s resume latency and ~30–45s failover. Graceful drain on
240
+ deploy strongly recommended.
241
+ - **`ClusterWorkflowEngine`** — many processes, low-latency resume/interrupt, even
242
+ load distribution, or host targeting (shard groups). Worth the cluster stack
243
+ when you actually need those.
244
+
245
+ ---
246
+
247
+ ## 7. Summary cheat-sheet
248
+
249
+ | Dimension | These engines | ClusterWorkflowEngine |
250
+ | ------------------------------- | --------------------------- | --------------------------------- |
251
+ | Durability / restart recovery | ✅ parity | ✅ |
252
+ | Activity replay & idempotency | ✅ parity | ✅ |
253
+ | Failover speed | ~30–45s (poll) | fast (push rebalance) |
254
+ | Resume / interrupt latency | 5–15s (pull) | near-instant (push) |
255
+ | Load balancing | ❌ | ✅ (shard ownership) |
256
+ | Host targeting / node traits | ❌ | ✅ (shard groups) |
257
+ | Split-brain window | wider (lease race) | narrower (shard lock) |
258
+ | Activity double-fire on cutover | possible — need idempotency | possible — need idempotency |
259
+ | Version-skew replay safety | unsolved — discipline only | unsolved — discipline only |
260
+ | In-flight version stickiness | sticky to starter (+ drain) | migrates to new code on rebalance |
261
+ | SQLite blue/green | needs shared storage | n/a |
262
+ | Operational cost | just a DB | full cluster stack |
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@effect-app/infra",
3
- "version": "4.0.0-beta.258",
3
+ "version": "4.0.0-beta.259",
4
4
  "license": "MIT",
5
5
  "type": "module",
6
6
  "dependencies": {
@@ -13,7 +13,7 @@
13
13
  "proper-lockfile": "^4.1.2",
14
14
  "pure-rand": "8.4.0",
15
15
  "query-string": "^9.3.1",
16
- "effect-app": "4.0.0-beta.258"
16
+ "effect-app": "4.0.0-beta.259"
17
17
  },
18
18
  "devDependencies": {
19
19
  "@azure/cosmos": "^4.9.3",
package/run.sh ADDED
@@ -0,0 +1 @@
1
+ OSMOS_TEST_URL="AccountEndpoint=https://macs-empasa-dev.documents.azure.com:443/;AccountKey=<redacted>;" pnpm test test/cluster-cosmos.test.ts