@bluelibs/runner-dev 5.1.0 → 6.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/AI.md +80 -6
- package/README.md +216 -22
- package/dist/cli/generators/scaffold/templates/package.json.d.ts +2 -2
- package/dist/cli/generators/scaffold/templates/package.json.js +2 -2
- package/dist/cli/generators/scaffold.js +1 -135
- package/dist/cli/generators/scaffold.js.map +1 -1
- package/dist/cli/generators/templates.js +2 -1
- package/dist/cli/generators/templates.js.map +1 -1
- package/dist/generated/resolvers-types.d.ts +545 -112
- package/dist/index.d.ts +39 -39
- package/dist/resources/cli.config.resource.d.ts +1 -1
- package/dist/resources/cli.config.resource.js +2 -2
- package/dist/resources/cli.config.resource.js.map +1 -1
- package/dist/resources/coverage.resource.d.ts +2 -2
- package/dist/resources/coverage.resource.js +3 -3
- package/dist/resources/coverage.resource.js.map +1 -1
- package/dist/resources/dev.resource.d.ts +1 -1
- package/dist/resources/dev.resource.js +2 -2
- package/dist/resources/dev.resource.js.map +1 -1
- package/dist/resources/docs.generator.resource.d.ts +4 -3
- package/dist/resources/docs.generator.resource.js +2 -2
- package/dist/resources/docs.generator.resource.js.map +1 -1
- package/dist/resources/graphql-accumulator.resource.d.ts +2 -2
- package/dist/resources/graphql-accumulator.resource.js +7 -3
- package/dist/resources/graphql-accumulator.resource.js.map +1 -1
- package/dist/resources/graphql.cli.resource.d.ts +1 -1
- package/dist/resources/graphql.cli.resource.js +2 -2
- package/dist/resources/graphql.cli.resource.js.map +1 -1
- package/dist/resources/graphql.query.cli.task.d.ts +14 -16
- package/dist/resources/graphql.query.cli.task.js +3 -3
- package/dist/resources/graphql.query.cli.task.js.map +1 -1
- package/dist/resources/graphql.query.task.d.ts +18 -18
- package/dist/resources/graphql.query.task.js +4 -4
- package/dist/resources/graphql.query.task.js.map +1 -1
- package/dist/resources/http.tag.d.ts +1 -1
- package/dist/resources/http.tag.js +2 -2
- package/dist/resources/http.tag.js.map +1 -1
- package/dist/resources/introspector.cli.resource.d.ts +2 -2
- package/dist/resources/introspector.cli.resource.js +37 -3
- package/dist/resources/introspector.cli.resource.js.map +1 -1
- package/dist/resources/introspector.resource.d.ts +3 -2
- package/dist/resources/introspector.resource.js +6 -6
- package/dist/resources/introspector.resource.js.map +1 -1
- package/dist/resources/live.resource.d.ts +7 -6
- package/dist/resources/live.resource.js +64 -25
- package/dist/resources/live.resource.js.map +1 -1
- package/dist/resources/models/Introspector.d.ts +59 -15
- package/dist/resources/models/Introspector.js +467 -137
- package/dist/resources/models/Introspector.js.map +1 -1
- package/dist/resources/models/durable.runtime.d.ts +1 -1
- package/dist/resources/models/durable.runtime.js +53 -2
- package/dist/resources/models/durable.runtime.js.map +1 -1
- package/dist/resources/models/durable.tools.d.ts +1 -1
- package/dist/resources/models/durable.tools.js +6 -3
- package/dist/resources/models/durable.tools.js.map +1 -1
- package/dist/resources/models/initializeFromStore.js +126 -19
- package/dist/resources/models/initializeFromStore.js.map +1 -1
- package/dist/resources/models/initializeFromStore.utils.d.ts +12 -7
- package/dist/resources/models/initializeFromStore.utils.js +319 -23
- package/dist/resources/models/initializeFromStore.utils.js.map +1 -1
- package/dist/resources/models/introspector.tools.js +18 -6
- package/dist/resources/models/introspector.tools.js.map +1 -1
- package/dist/resources/routeHandlers/createLiveStreamHandler.d.ts +16 -0
- package/dist/resources/routeHandlers/createLiveStreamHandler.js +127 -0
- package/dist/resources/routeHandlers/createLiveStreamHandler.js.map +1 -0
- package/dist/resources/routeHandlers/getDocsData.d.ts +4 -0
- package/dist/resources/routeHandlers/getDocsData.js +28 -0
- package/dist/resources/routeHandlers/getDocsData.js.map +1 -1
- package/dist/resources/routeHandlers/registerHttpRoutes.hook.d.ts +26 -23
- package/dist/resources/routeHandlers/registerHttpRoutes.hook.js +10 -9
- package/dist/resources/routeHandlers/registerHttpRoutes.hook.js.map +1 -1
- package/dist/resources/routeHandlers/requestCorrelation.d.ts +11 -0
- package/dist/resources/routeHandlers/requestCorrelation.js +29 -0
- package/dist/resources/routeHandlers/requestCorrelation.js.map +1 -0
- package/dist/resources/server.resource.d.ts +20 -20
- package/dist/resources/server.resource.js +17 -5
- package/dist/resources/server.resource.js.map +1 -1
- package/dist/resources/swap.cli.resource.d.ts +4 -4
- package/dist/resources/swap.cli.resource.js +2 -2
- package/dist/resources/swap.cli.resource.js.map +1 -1
- package/dist/resources/swap.resource.d.ts +7 -6
- package/dist/resources/swap.resource.js +188 -38
- package/dist/resources/swap.resource.js.map +1 -1
- package/dist/resources/swap.tools.d.ts +3 -2
- package/dist/resources/swap.tools.js +27 -27
- package/dist/resources/swap.tools.js.map +1 -1
- package/dist/resources/telemetry.resource.d.ts +1 -1
- package/dist/resources/telemetry.resource.js +46 -43
- package/dist/resources/telemetry.resource.js.map +1 -1
- package/dist/runner-compat.d.ts +85 -0
- package/dist/runner-compat.js +178 -0
- package/dist/runner-compat.js.map +1 -0
- package/dist/runner-node-compat.d.ts +2 -0
- package/dist/runner-node-compat.js +28 -0
- package/dist/runner-node-compat.js.map +1 -0
- package/dist/schema/index.js +8 -8
- package/dist/schema/index.js.map +1 -1
- package/dist/schema/model.d.ts +100 -20
- package/dist/schema/model.js.map +1 -1
- package/dist/schema/query.js +25 -1
- package/dist/schema/query.js.map +1 -1
- package/dist/schema/types/AllType.js +13 -2
- package/dist/schema/types/AllType.js.map +1 -1
- package/dist/schema/types/BaseElementCommon.js +10 -0
- package/dist/schema/types/BaseElementCommon.js.map +1 -1
- package/dist/schema/types/ErrorType.js +1 -1
- package/dist/schema/types/ErrorType.js.map +1 -1
- package/dist/schema/types/EventType.js +19 -2
- package/dist/schema/types/EventType.js.map +1 -1
- package/dist/schema/types/InterceptorOwnersType.d.ts +2 -0
- package/dist/schema/types/InterceptorOwnersType.js +63 -0
- package/dist/schema/types/InterceptorOwnersType.js.map +1 -0
- package/dist/schema/types/LaneSummaryTypes.d.ts +3 -0
- package/dist/schema/types/LaneSummaryTypes.js +19 -0
- package/dist/schema/types/LaneSummaryTypes.js.map +1 -0
- package/dist/schema/types/LiveType.js +81 -76
- package/dist/schema/types/LiveType.js.map +1 -1
- package/dist/schema/types/ResourceType.js +101 -15
- package/dist/schema/types/ResourceType.js.map +1 -1
- package/dist/schema/types/RunOptionsType.d.ts +2 -0
- package/dist/schema/types/RunOptionsType.js +107 -0
- package/dist/schema/types/RunOptionsType.js.map +1 -0
- package/dist/schema/types/TagType.js +35 -4
- package/dist/schema/types/TagType.js.map +1 -1
- package/dist/schema/types/TaskType.js +20 -0
- package/dist/schema/types/TaskType.js.map +1 -1
- package/dist/schema/types/index.d.ts +4 -2
- package/dist/schema/types/index.js +10 -7
- package/dist/schema/types/index.js.map +1 -1
- package/dist/schema/types/middleware/common.d.ts +3 -2
- package/dist/schema/types/middleware/common.js +19 -13
- package/dist/schema/types/middleware/common.js.map +1 -1
- package/dist/ui/.vite/manifest.json +2 -2
- package/dist/ui/assets/docs-Btkv97Ls.js +302 -0
- package/dist/ui/assets/docs-Btkv97Ls.js.map +1 -0
- package/dist/ui/assets/docs-CipvKUxZ.css +1 -0
- package/dist/utils/healthCollectors.d.ts +37 -0
- package/dist/utils/healthCollectors.js +147 -0
- package/dist/utils/healthCollectors.js.map +1 -0
- package/dist/utils/lane-resources.d.ts +55 -0
- package/dist/utils/lane-resources.js +143 -0
- package/dist/utils/lane-resources.js.map +1 -0
- package/dist/utils/zod.js +36 -3
- package/dist/utils/zod.js.map +1 -1
- package/dist/version.d.ts +1 -1
- package/dist/version.js +1 -1
- package/package.json +4 -6
- package/readmes/runner-AI.md +740 -0
- package/readmes/runner-durable-workflows.md +2247 -0
- package/readmes/runner-full-guide.md +5869 -0
- package/readmes/runner-remote-lanes.md +909 -0
- package/dist/ui/assets/docs-B_-zFz4-.css +0 -1
- package/dist/ui/assets/docs-Be-GHfZi.js +0 -353
- package/dist/ui/assets/docs-Be-GHfZi.js.map +0 -1
@@ -0,0 +1,2247 @@
# Durable Workflows (Node-only) — Architecture v2

← [Back to main README](../README.md)

---

> Durable workflows are Runner tasks with "save points". If your process dies, deploys, or scales horizontally, the workflow comes back and continues like nothing happened (except now you can finally sleep at night).

## Table of Contents

- [Start Here](#start-here)
- [Quickstart](#quickstart)
- [Tagging Workflows for Discovery](#tagging-workflows-for-discovery-required)
- [Why You'd Want This (In One Minute)](#why-youd-want-this-in-one-minute)
- [Core Insight](#core-insight)
- [Abstract Interfaces](#abstract-interfaces)
- [API Design](#api-design)
- [Safety & Semantics](#safety--semantics)
- [Signals (wait for external events)](#signals-wait-for-external-events)
- [Testing Utilities](#testing-utilities)
- [Compensation / Rollback Pattern](#compensation--rollback-pattern)
- [Branching with durableContext.switch()](#branching-with-durablecontextswitch)
- [Describing a Flow (Static Shape Export)](#describing-a-flow-static-shape-export)
- [Scheduling & Cron Jobs](#scheduling--cron-jobs)
- [Gotchas & Troubleshooting](#gotchas--troubleshooting)

## Start Here

- If you want the short version: `readmes/DURABLE_WORKFLOWS_AI.md`
- If you're new to Runner concepts (tasks/resources/events/middleware): `readmes/AI.md`
- Platform note (why this is Node-only): `readmes/MULTI_PLATFORM.md`
## Quickstart

### 0) Create durable support + a durable backend

The recommended integration is:

- register `resources.durable` once for durable tags/events support
- fork a concrete durable backend (`resources.memoryWorkflow` / `resources.redisWorkflow`)

The concrete durable backend:

- Executes Runner tasks via DI (`taskRunner.run(...)`).
- Provides a **per-resource** durable context, accessed via `durable.use()`.
- Optionally embeds a worker (`worker: true`) to consume the queue in that process.

### 1) Define a durable task (steps + sleep + signal)

```ts
import { event, r, run } from "@bluelibs/runner";
import { resources } from "@bluelibs/runner/node";

const Approved = event<{ approvedBy: string }>({ id: "app.signals.approved" });

const durable = resources.memoryWorkflow.fork("app-durable");

const durableRegistration = durable.with({
  worker: true, // single-process dev/tests
});

const approveOrder = r
  .task("app.tasks.approveOrder")
  .dependencies({ durable })
  .run(async (input: { orderId: string }, { durable }) => {
    const durableContext = durable.use();

    await durableContext.step("validate", async () => {
      // fetch order, validate invariants, etc.
      return { ok: true };
    });

    const outcome = await durableContext.waitForSignal(Approved, {
      timeoutMs: 86_400_000,
    });
    if (outcome.kind === "timeout") {
      return { status: "timed_out" };
    }

    await durableContext.step("ship", async () => {
      // ship only after approval
      return { shipped: true };
    });

    return {
      status: "approved",
      approvedBy: outcome.payload.approvedBy,
    };
  })
  .build();

const app = r
  .resource("app")
  .register([resources.durable, durableRegistration, approveOrder])
  .build();

await run(app, { logs: { printThreshold: null } });
```
## Tagging Workflows for Discovery (Required)

Durable workflows are regular Runner tasks, but **must be tagged with `tags.durableWorkflow`** to make them discoverable at runtime. Always add this tag to your workflow tasks:

```ts
import { r } from "@bluelibs/runner";
import { resources, tags } from "@bluelibs/runner/node";

const durable = resources.memoryWorkflow.fork("app-durable");

const onboarding = r
  .task("app.workflows.onboarding")
  .dependencies({ durable })
  .tags([
    tags.durableWorkflow.with({
      category: "users",
      defaults: { invitedBy: "system" },
    }),
  ])
  .run(async (_input, { durable }) => {
    const durableContext = durable.use();
    await durableContext.step("create-user", async () => ({ ok: true }));
    return { ok: true };
  })
  .build();

// later, after run(...)
// const durableRuntime = runtime.getResourceValue(durable);
// const workflows = durableRuntime.getWorkflows();
```

`tags.durableWorkflow` is **required** — workflows without this tag will not be discoverable via `getWorkflows()`. Register `resources.durable` once in the app so the durable tag definition and durable events are available at runtime.

`tags.durableWorkflow` is discovery metadata only. The unified response envelope is produced by `durable.startAndWait(...)`: `{ durable: { executionId }, data }`.

`tags.durableWorkflow` also supports optional `defaults` used by `durable.describe(task)` **only when no explicit describe input is provided**. This does not affect `start()`, `startAndWait()`, `schedule()`, or `ensureSchedule()`.
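For reference, that envelope shape can be captured as a small illustrative type. The `StartAndWaitEnvelope` name and the `unwrap` helper below are hypothetical; only the `{ durable: { executionId }, data }` shape comes from the docs above.

```ts
// Illustrative type for the startAndWait response envelope.
// (Hypothetical names; only the shape is taken from the docs.)
interface StartAndWaitEnvelope<T> {
  durable: { executionId: string };
  data: T;
}

// Hypothetical helper: unwrap the payload while keeping the
// executionId around for later status lookups.
function unwrap<T>(
  envelope: StartAndWaitEnvelope<T>,
): { executionId: string; data: T } {
  return { executionId: envelope.durable.executionId, data: envelope.data };
}

const sample: StartAndWaitEnvelope<{ ok: boolean }> = {
  durable: { executionId: "exec-1" },
  data: { ok: true },
};
const { executionId, data } = unwrap(sample);
```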
### Starting Durable Workflows From Resource Dependencies (HTTP route)

Tagged workflow tasks are discoverable metadata only. Execution is explicit: start with `durable.start(...)` (fire-and-track) or `durable.startAndWait(...)` (start-and-wait).

```ts
import express from "express";
import { r, run } from "@bluelibs/runner";
import { resources, tags } from "@bluelibs/runner/node";

const durable = resources.memoryWorkflow.fork("app-durable");

const approveOrder = r
  .task("app.workflows.approveOrder")
  .dependencies({ durable })
  .tags([tags.durableWorkflow.with({ category: "orders" })])
  .run(async (input: { orderId: string }, { durable }) => {
    const durableContext = durable.use();
    await durableContext.step("approve", async () => ({ approved: true }));
    return { orderId: input.orderId, status: "approved" as const };
  })
  .build();

const api = r
  .resource("app.api")
  .register([resources.durable, durable.with({ worker: false }), approveOrder])
  .dependencies({ durable, approveOrder })
  .init(async (_cfg, { durable, approveOrder }) => {
    const app = express();
    app.use(express.json());

    app.post("/orders/:id/approve", async (req, res) => {
      const executionId = await durable.start(approveOrder, {
        orderId: req.params.id,
      });

      res.status(202).json({ executionId });
    });

    app.listen(3000);
  })
  .build();

await run(api);
```
### Production wiring (Redis + RabbitMQ)

For production, swap the in-memory backends:

```ts
import { resources } from "@bluelibs/runner/node";

const durable = resources.redisWorkflow.fork("app-durable");

const durableRegistration = durable.with({
  redis: { url: process.env.REDIS_URL! },
  queue: { url: process.env.RABBITMQ_URL! },
  worker: true,
});
```

Isolation note: `resources.redisWorkflow` derives Redis key prefixes, pub/sub prefixes, and default queue names from the durable resource id (the value you pass to `.fork("...")`). Use different ids (or set `{ namespace }`) to run multiple durable "apps" safely on the same Redis/RabbitMQ.

API nodes typically **disable polling and the embedded worker**:

```ts
const durable = resources.redisWorkflow.fork("app-durable");
const durableRegistration = durable.with({
  redis: { url: process.env.REDIS_URL! },
  queue: { url: process.env.RABBITMQ_URL! },
  worker: false,
  polling: { enabled: false },
});
```

In a typical deployment:

- API nodes call `start()` / `signal()` / `wait()`.
- Worker nodes run the durable resource with `worker: true`.

### Scaling in production (recommended topology)

Durable workflows are designed to scale **horizontally**. The core idea is: **the store is the source of truth**, and the queue distributes work.

**Recommended split:**

- **API nodes** (stateless): accept HTTP/webhooks, call `start()` / `signal()` / `wait()`.
- **Worker nodes** (scalable): consume the durable queue and run executions.

**API node config (no background work):**

```ts
const durable = resources.redisWorkflow.fork("app-durable");
const durableRegistration = durable.with({
  redis: { url: process.env.REDIS_URL! },
  queue: { url: process.env.RABBITMQ_URL! },
  worker: false,
  polling: { enabled: false },
});
```

**Worker node config (does background work):**

```ts
const durable = resources.redisWorkflow.fork("app-durable");
const durableRegistration = durable.with({
  redis: { url: process.env.REDIS_URL! },
  queue: { url: process.env.RABBITMQ_URL! },
  worker: true,
  polling: { enabled: true, interval: 1000 },
});
```

**How it scales:**

- Increase worker replicas: each one consumes from the queue, so throughput scales with workers.
- Crash/redeploy safety: a worker can die at any time; the next worker resumes from the last checkpoint.
- Multi-worker correctness: executions/steps are coordinated through the store, not through in-memory state.

**Timers, sleeps, and schedules (important):**

Timers (used by `durableContext.sleep(...)`, signal timeouts, and scheduling) are driven by the durable polling loop. In multi-process setups you typically either:

- run a **single poller** (one worker replica with `polling.enabled: true`), or
- use a store implementation that provides **atomic timer claiming** so multiple pollers are safe.

If you enable polling in multiple processes without atomic claiming, you may get duplicate resume attempts. This is still designed to be safe (at-least-once), but it can increase load/noise.
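The atomic-claim idea can be illustrated with a tiny in-memory sketch. This is not the library's implementation: a real store would claim atomically (for example with Redis `SET NX PX` or a Lua script), and the `claimTimer` helper and `claims` map here are purely illustrative.

```ts
// Illustrative sketch of atomic timer claiming (not the library's code).
// A claim maps timerId -> { workerId, expiresAt }; the first caller wins,
// and later callers lose until the claim's TTL lapses.
const claims = new Map<string, { workerId: string; expiresAt: number }>();

function claimTimer(timerId: string, workerId: string, ttlMs: number): boolean {
  const now = Date.now();
  const existing = claims.get(timerId);
  if (existing && existing.expiresAt > now) {
    return false; // another poller already holds the claim
  }
  claims.set(timerId, { workerId, expiresAt: now + ttlMs });
  return true;
}

// Two pollers race for the same timer: exactly one wins,
// so only one resume attempt is made.
const winners = ["poller-a", "poller-b"].filter((w) =>
  claimTimer("timer-1", w, 30_000),
);
```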
### 2) Start an execution (store the executionId)

```ts
const executionId = await durable.start(approveOrder, {
  orderId: "order-123",
});
// store executionId on the order record so your webhook can resume the workflow later
```

### Reading status later (no double-sync required)

If you store the `executionId` in your main database (e.g. `orders.durable_execution_id`), you can fetch live workflow status on-demand from the durable store. This avoids mirroring every durable transition into Postgres.

```ts
import { DurableOperator, RedisStore } from "@bluelibs/runner/node";

const durableStorePrefix = process.env.DURABLE_STORE_PREFIX!; // same value used by your durable runtime config

// Read-only store client for status lookups (same redis url + prefix)
const store = new RedisStore({
  redis: process.env.REDIS_URL!,
  prefix: durableStorePrefix,
});

// Minimal: just the execution row (status/result/error)
const execution = await store.getExecution(executionId);

// Rich: execution + steps + audit (dashboard-like view)
const operator = new DurableOperator(store);
const detail = await operator.getExecutionDetail(executionId);
```

Keep the durable store prefix in one shared config module and reuse it for both workflow runtime wiring and read-only status lookups.

If you already have the durable resource instance (dependency injection), you can use the operator API directly:

```ts
const detail = await durable.operator.getExecutionDetail(executionId);
```

### 3) Resume from the outside (webhook / callback)

```ts
await durable.signal(executionId, Approved, { approvedBy: "admin@company.com" });
const result = await durable.wait(executionId, { timeout: 30_000 });
```

## Why You'd Want This (In One Minute)

- Your workflow needs to span time: minutes, hours, days (payments, shipping, approvals).
- You want deterministic retries without duplicating side effects (charge twice, email twice, etc.).
- You want horizontal scaling without "who owns this in-memory timeout?" problems.
- You want explicit, type-safe "outside world pokes the workflow" via signals.

## Core Insight

The key insight (Temporal/Inngest-style) is that workflows are just functions with checkpoints. We provide a `DurableContext` that gives tasks:

1. **`step(id, fn)`** - Execute a function once, cache the result, return cached on replay
2. **`sleep(ms)`** - Durable sleep that survives process restarts
3. **`emit(event, data)`** - Publish a best-effort notification, de-duplicated via `step()` (not guaranteed delivery)
4. **`waitForSignal(signal)`** - Suspend until an external signal is delivered (e.g. payment confirmation)
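The `step(id, fn)` semantics can be sketched in a few lines. This is a minimal illustration of the checkpoint idea, not the library's code; `stepOnce` and the `Map`-backed store are hypothetical names.

```ts
// Minimal sketch of step memoization (illustrative; not the library's code).
// On first run the function executes and its result is checkpointed; on
// replay the cached result is returned and the side effect is skipped.
type StepStore = Map<string, unknown>;

async function stepOnce<T>(
  store: StepStore,
  id: string,
  fn: () => Promise<T>,
): Promise<T> {
  if (store.has(id)) {
    return store.get(id) as T; // replay: return the checkpointed result
  }
  const result = await fn();
  store.set(id, result); // checkpoint before continuing
  return result;
}

// Usage: replaying the "workflow" does not re-run the side effect.
async function demo(): Promise<number> {
  const store: StepStore = new Map();
  let charges = 0;
  await stepOnce(store, "charge", async () => ++charges);
  await stepOnce(store, "charge", async () => ++charges); // replay: skipped
  return charges;
}
```

This is why retries do not double side effects: the replayed run walks the same steps but only re-executes the ones with no checkpointed result.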
**Scalability Model:** Multiple worker instances can process executions concurrently. Work is distributed via a durable queue (RabbitMQ quorum queues by default), with state stored in Redis.

```mermaid
graph TB
    subgraph Clients
        C1[Client 1]
        C2[Client 2]
    end

    subgraph DurableInfra[Durable Infrastructure]
        Q[(RabbitMQ - Quorum Queue)]
        R[(Redis - State/PubSub)]
    end

    subgraph Workers[Scalable Workers]
        W1[Worker 1]
        W2[Worker 2]
        W3[Worker N]
    end

    C1 -->|enqueue| Q
    C2 -->|enqueue| Q

    Q -->|consume| W1
    Q -->|consume| W2
    Q -->|consume| W3

    W1 <-->|state| R
    W2 <-->|state| R
    W3 <-->|state| R

    R -.->|pub/sub| W1
    R -.->|pub/sub| W2
    R -.->|pub/sub| W3
```

---

## Abstract Interfaces

Three pluggable interfaces allow swapping backends without changing application code:

### 1. IDurableStore - State Storage

The **Store** is the absolute source of truth. It persists execution state, step results, timers, and schedules. If it's not in the store, it didn't happen.

```typescript
// interfaces/IDurableStore.ts

export interface IDurableStore {
  // Executions (The primary workflow records)
  saveExecution(execution: Execution): Promise<void>;
  getExecution(id: string): Promise<Execution | null>;
  updateExecution(id: string, updates: Partial<Execution>): Promise<void>;
  listIncompleteExecutions(): Promise<Execution[]>;

  // Steps (Memoized results for exactly-once-ish semantics)
  getStepResult(
    executionId: string,
    stepId: string,
  ): Promise<StepResult | null>;
  saveStepResult(result: StepResult): Promise<void>;

  // Timers (Drives sleep(), signal timeouts, and cron)
  createTimer(timer: Timer): Promise<void>;
  getReadyTimers(now?: Date): Promise<Timer[]>;
  markTimerFired(timerId: string): Promise<void>;
  deleteTimer(timerId: string): Promise<void>;

  // Schedules (Cron and Interval orchestration)
  createSchedule(schedule: Schedule): Promise<void>;
  getSchedule(id: string): Promise<Schedule | null>;
  updateSchedule(id: string, updates: Partial<Schedule>): Promise<void>;
  deleteSchedule(id: string): Promise<void>;
  listSchedules(): Promise<Schedule[]>;
  listActiveSchedules(): Promise<Schedule[]>;

  // Optional: Distributed Timer Coordination
  claimTimer?(
    timerId: string,
    workerId: string,
    ttlMs: number,
  ): Promise<boolean>;

  // Optional: Idempotency (dedupe start calls)
  getExecutionIdByIdempotencyKey?(params: {
    taskId: string;
    idempotencyKey: string;
  }): Promise<string | null>;
  setExecutionIdByIdempotencyKey?(params: {
    taskId: string;
    idempotencyKey: string;
    executionId: string;
  }): Promise<boolean>;

  // Optional: Dashboard & Operator API
  listExecutions?(options?: ListExecutionsOptions): Promise<Execution[]>;
  listStepResults?(executionId: string): Promise<StepResult[]>;
  retryRollback?(executionId: string): Promise<void>;
  skipStep?(executionId: string, stepId: string): Promise<void>;
  forceFail?(
    executionId: string,
    error: { message: string; stack?: string },
  ): Promise<void>;
  editStepResult?(
    executionId: string,
    stepId: string,
    newResult: unknown,
  ): Promise<void>;

  // Lifecycle
  init?(): Promise<void>;
  dispose?(): Promise<void>;

  // Optional: Locking (if store handles its own concurrency)
  acquireLock?(resource: string, ttlMs: number): Promise<string | null>;
  releaseLock?(resource: string, lockId: string): Promise<void>;
}
```

**Implementations:**

- `MemoryStore` - Dev/test, no persistence
- `RedisStore` - Production default, distributed locking
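As an illustration, the optional idempotency hooks could be satisfied by an in-memory sketch like the one below. The first-writer-wins reading of the returned boolean is an assumption (the interface only says "dedupe start calls"), and the `Map`-backed storage and key format are purely illustrative.

```ts
// Illustrative in-memory sketch of the optional idempotency hooks.
// Assumption: setExecutionIdByIdempotencyKey returns true only for the
// first mapping of a (taskId, idempotencyKey) pair ("first writer wins").
const idempotency = new Map<string, string>();

function dedupeKey(taskId: string, idempotencyKey: string): string {
  return `${taskId}::${idempotencyKey}`;
}

async function setExecutionIdByIdempotencyKey(params: {
  taskId: string;
  idempotencyKey: string;
  executionId: string;
}): Promise<boolean> {
  const k = dedupeKey(params.taskId, params.idempotencyKey);
  if (idempotency.has(k)) return false; // duplicate start: keep first mapping
  idempotency.set(k, params.executionId);
  return true;
}

async function getExecutionIdByIdempotencyKey(params: {
  taskId: string;
  idempotencyKey: string;
}): Promise<string | null> {
  return idempotency.get(dedupeKey(params.taskId, params.idempotencyKey)) ?? null;
}
```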
### 2. IEventBus - Pub/Sub

For event notifications across workers (timer ready, execution complete, etc.).

```typescript
// interfaces/IEventBus.ts

export type EventHandler = (event: BusEvent) => Promise<void>;

export interface IEventBus {
  // Publish event to all subscribers
  publish(channel: string, event: BusEvent): Promise<void>;

  // Subscribe to events on a channel
  subscribe(channel: string, handler: EventHandler): Promise<void>;

  // Unsubscribe from a channel
  unsubscribe(channel: string): Promise<void>;

  // Lifecycle
  init?(): Promise<void>;
  dispose?(): Promise<void>;
}

export interface BusEvent {
  type: string;
  payload: unknown;
  timestamp: Date;
}
```

**Implementations:**

- `MemoryEventBus` - Dev/test, single-process only
- `RedisEventBus` - Production default, uses Redis Pub/Sub

**Serialization note:** `RedisEventBus` serializes events using Runner's serializer (tree mode) so `BusEvent.timestamp: Date` (and other supported built-in types) round-trip correctly across Redis Pub/Sub.
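To make the contract concrete, here is a single-process sketch in the spirit of `MemoryEventBus`. It is illustrative only (the `TinyEventBus` name is hypothetical, and a real implementation would also handle lifecycle and handler error isolation).

```ts
// Minimal in-process IEventBus-style sketch (illustrative, single-process only).
interface BusEvent {
  type: string;
  payload: unknown;
  timestamp: Date;
}
type EventHandler = (event: BusEvent) => Promise<void>;

class TinyEventBus {
  private handlers = new Map<string, EventHandler[]>();

  // Deliver the event to every handler subscribed to the channel.
  async publish(channel: string, event: BusEvent): Promise<void> {
    for (const handler of this.handlers.get(channel) ?? []) {
      await handler(event);
    }
  }

  async subscribe(channel: string, handler: EventHandler): Promise<void> {
    const list = this.handlers.get(channel) ?? [];
    list.push(handler);
    this.handlers.set(channel, list);
  }

  async unsubscribe(channel: string): Promise<void> {
    this.handlers.delete(channel);
  }
}
```

A distributed implementation keeps the same surface but routes `publish` through an external broker (Redis Pub/Sub in `RedisEventBus`), which is why workers can react to "timer ready" events fired by another process.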
### 3. IDurableQueue - Work Distribution
|
|
505
|
+
|
|
506
|
+
For distributing execution work across multiple workers with durability guarantees.
|
|
507
|
+
|
|
508
|
+
```typescript
|
|
509
|
+
// interfaces/IDurableQueue.ts
|
|
510
|
+
|
|
511
|
+
export interface QueueMessage<T = unknown> {
|
|
512
|
+
id: string;
|
|
513
|
+
type: "execute" | "resume" | "schedule";
|
|
514
|
+
payload: T;
|
|
515
|
+
attempts: number;
|
|
516
|
+
maxAttempts: number;
|
|
517
|
+
createdAt: Date;
|
|
518
|
+
}
|
|
519
|
+
|
|
520
|
+
export type MessageHandler<T = unknown> = (
|
|
521
|
+
message: QueueMessage<T>,
|
|
522
|
+
) => Promise<void>;
|
|
523
|
+
|
|
524
|
+
export interface IDurableQueue {
|
|
525
|
+
// Send message to queue
|
|
526
|
+
enqueue<T>(
|
|
527
|
+
message: Omit<QueueMessage<T>, "id" | "createdAt">,
|
|
528
|
+
): Promise<string>;
|
|
529
|
+
|
|
530
|
+
// Start consuming messages (calls handler for each)
|
|
531
|
+
consume<T>(handler: MessageHandler<T>): Promise<void>;
|
|
532
|
+
|
|
533
|
+
// Acknowledge successful processing
|
|
534
|
+
ack(messageId: string): Promise<void>;
|
|
535
|
+
|
|
536
|
+
// Negative acknowledge (requeue or dead-letter)
|
|
537
|
+
nack(messageId: string, requeue?: boolean): Promise<void>;
|
|
538
|
+
|
|
539
|
+
// Lifecycle
|
|
540
|
+
init?(): Promise<void>;
|
|
541
|
+
dispose?(): Promise<void>;
|
|
542
|
+
}
|
|
543
|
+
```
|
|
544
|
+
|
|
545
|
+
**Message types note:** Runner currently enqueues `execute` and `resume`. `schedule` is accepted by `DurableWorker` as an alias of `resume` (an execution hint) so custom adapters can use it, but built-in cron/interval scheduling is driven by timers + `resume`.

**Implementations:**

- `MemoryQueue` - Dev/test, no persistence
- `RabbitMQQueue` - Production default, quorum queues for durability

---

## Adapting to Your Flow: Custom Backends

One of Runner's core philosophies is **zero lock-in**. If your team uses Postgres for state or Kafka for queues, you shouldn't have to change your workflow logic to use them.

### Implementing a Custom Store

To implement a custom store (e.g., for SQL), you only need to satisfy the `IDurableStore` interface. The engine is designed to be "dumb" and trust the store for all persistence.

**Minimum Viable Store (Pseudo-SQL):**

```typescript
class MySqlStore implements IDurableStore {
  async saveExecution(e: Execution) {
    await db.query("INSERT INTO durable_executions ...", [e.id, serialize(e)]);
  }

  async getExecution(id: string) {
    const row = await db.query(
      "SELECT data FROM durable_executions WHERE id = ?",
      [id],
    );
    return row ? deserialize(row.data) : null;
  }

  // ... implement other methods by mapping to your DB tables
}
```

> [!TIP]
> Look at [MemoryStore.ts](../src/node/durable/store/MemoryStore.ts) for a clean reference of how to manage in-memory state, or [RedisStore.ts](../src/node/durable/store/RedisStore.ts) for a production-grade implementation using Lua scripts for atomicity.

### Implementing a Custom Queue

If you want to use a different message broker (SQS, Kafka, Redis Streams), implement `IDurableQueue`.

**Key Responsibilities:**

- **`enqueue`**: Push a message (task execution hint) to the broker.
- **`consume`**: Register a listener that calls the provided handler when a message arrives.
- **`ack` / `nack`**: Handle message confirmation/failure.

```typescript
class SqsQueue implements IDurableQueue {
  async enqueue(msg) {
    const res = await sqs.sendMessage({
      QueueUrl,
      MessageBody: JSON.stringify(msg),
    });
    return res.MessageId;
  }

  async consume(handler) {
    // Polling loop or subscription
    const msgs = await sqs.receiveMessage({ QueueUrl });
    for (const m of msgs) {
      await handler(JSON.parse(m.Body));
      await this.ack(m.ReceiptHandle);
    }
  }
}
```

> [!IMPORTANT]
> A queue in Durable Workflows is just a **hint**. If a message is lost, the `polling` loop in `DurableService` acts as a safety net to find and resume stuck executions. However, a reliable queue (like RabbitMQ or SQS) is critical for low-latency distribution and high throughput.

---

## Component Architecture

```mermaid
graph TB
    subgraph RunnerCore[Runner Core - Unchanged]
        R[Resources]
        T[Tasks]
        E[Events]
        H[Hooks]
    end

    subgraph DurableModule[src/node/durable/]
        DS[DurableService]
        DC[DurableContext]
        DW[DurableWorker]

        subgraph Interfaces[Abstract Interfaces]
            IS[IDurableStore]
            IB[IEventBus]
            IQ[IDurableQueue]
        end

        subgraph StoreImpl[Store Implementations]
            MS[MemoryStore]
            RS[RedisStore]
        end

        subgraph BusImpl[EventBus Implementations]
            MB[MemoryEventBus]
            RB[RedisEventBus]
        end

        subgraph QueueImpl[Queue Implementations]
            MQ[MemoryQueue]
            RQ[RabbitMQQueue]
        end
    end

    DS --> IS
    DS --> IB
    DS --> IQ
    DW --> IQ
    DC --> IS

    IS -.-> MS
    IS -.-> RS
    IB -.-> MB
    IB -.-> RB
    IQ -.-> MQ
    IQ -.-> RQ

    T -.->|uses| DC
```

---

## API Design

### Basic Usage

Durable workflows are **normal Runner tasks** that inject a **durable backend resource** (created via `resources.memoryWorkflow.fork(id)` or `resources.redisWorkflow.fork(id)` and registered via `.with(config)`) and call `durableContext.step(...)` / `durableContext.sleep(...)` from inside their `run` function.

```typescript
import { r, run } from "@bluelibs/runner";
import { MemoryStore, resources } from "@bluelibs/runner/node";

// 1. Create store
const store = new MemoryStore();

// 2. Create durable resource definition
const durable = resources.memoryWorkflow.fork("app-durable");

// 3. Register durable resource with config
const durableRegistration = durable.with({
  store,
  polling: { enabled: true, interval: 1000 }, // Timer polling interval
});

// 4. Define a task that uses durable context
const processOrder = r
  .task("app.tasks.processOrder")
  .inputSchema<{ orderId: string; customerId: string }>()
  .dependencies({ durable })
  .run(async (input, { durable }) => {
    const durableContext = durable.use();

    // Step 1: Validate order (checkpointed)
    const order = await durableContext.step("validate", async () => {
      const o = await db.orders.find(input.orderId);
      if (!o) throw new Error("Order not found");
      return o;
    });

    // Step 2: Process payment (checkpointed)
    const payment = await durableContext.step("charge-payment", async () => {
      return await payments.charge(order.customerId, order.total);
    });

    // Durable sleep - survives restart
    await durableContext.sleep(5000);

    // Step 3: Ship order (checkpointed)
    const shipment = await durableContext.step("create-shipment", async () => {
      return await shipping.create(order.id);
    });

    return {
      success: true,
      orderId: order.id,
      trackingId: shipment.trackingId,
    };
  })
  .build();

// 5. Wire up and run
const app = r
  .resource("app")
  .register([resources.durable, durableRegistration, processOrder])
  .build();

const runtime = await run(app);

// 6. Execute durably
const d = runtime.getResourceValue(durable);
const result = await d.startAndWait(processOrder, {
  orderId: "order-123",
  customerId: "cust-456",
});
```

### How It Works

1. **`durable.startAndWait(task, input)`** creates an execution record and runs the task
   - Prefer `startAndWait()` when you want "start and wait for result" in one call.
   - Prefer `start()` + `signal()` + `wait()` when the outside world must resume the workflow later (webhooks, approvals).
2. **`durableContext.step(id, fn)`** checks if step was already executed:
   - If yes: returns cached result (replay)
   - If no: executes fn, caches result, returns result
3. **`durableContext.sleep(ms)`** creates a timer record, suspends execution, resumes when timer fires
4. **`durableContext.waitForSignal(signal)`** records a durable wait checkpoint and suspends execution
5. **`durable.signal(executionId, signal, payload)`** completes the signal checkpoint and resumes the execution
6. If process crashes, **`durableService.recover()`** resumes incomplete executions from their last checkpoint
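
The replay rule in point 2 boils down to memoization keyed by step id. An illustrative sketch (Runner persists results in `IDurableStore`, not an in-memory map):

```typescript
// Illustrative sketch of step replay semantics (NOT Runner's implementation):
// a step executes once, its result is checkpointed, and replays return the
// cached value without re-running the side effect.
const stepCache = new Map<string, unknown>();
let sideEffectRuns = 0;

async function step<T>(stepId: string, fn: () => Promise<T>): Promise<T> {
  if (stepCache.has(stepId)) {
    return stepCache.get(stepId) as T; // replay: cached result, fn not called
  }
  const result = await fn(); // first run: execute the side effect
  stepCache.set(stepId, result); // checkpoint before returning
  return result;
}

(async () => {
  const first = await step("validate", async () => ++sideEffectRuns);
  const second = await step("validate", async () => ++sideEffectRuns);
  // first === second, and sideEffectRuns stays at 1: the function ran once
})();
```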

### `start()` vs `startAndWait()` (clear contract)

- `start(taskOrTaskId, input)`:
  returns immediately with `executionId` (`string`).
- `startAndWait(taskOrTaskId, input)`:
  convenience wrapper for `start(...)` + `wait(executionId)`; returns
  `{ durable: { executionId }, data }`.

`start()` and `startAndWait()` are the only supported durable execution APIs.

`taskOrTaskId` can be:

- an `ITask` (the built task object, returned by `.build()`)
- a task id `string`

It is **not** the injected dependency callable from `.dependencies({ someTask })`. That dependency is a function used to invoke the task directly, not an `ITask` reference.

```ts
// ✅ built task object
const executionIdA = await d.start(approveOrder, { orderId: "o1" });

// ✅ task id string
const executionIdB = await d.start(approveOrder.id, {
  orderId: "o2",
});

// ❌ injected callable dependency (different type)
// await d.start(deps.approveOrder, { orderId: "o3" });
```

### What Happens with the Return Value

Whatever your workflow function returns becomes the **execution result**, persisted in the durable store. You can retrieve it in three ways depending on your pattern:

- **`startAndWait(task, input)`** — starts the workflow **and** waits for it to finish, returning `{ durable: { executionId }, data }`:

  ```ts
  const result = await d.startAndWait(processOrder, { orderId: "order-123" });
  // result = {
  //   durable: { executionId: "..." },
  //   data: { success: true, orderId: "order-123", trackingId: "TRK-789" }
  // }
  ```

- **`start(task, input)`** + **`wait(executionId)`** — start and wait separately (useful when a webhook or external event resumes the workflow later):

  ```ts
  const executionId = await d.start(approveOrder, {
    orderId: "order-123",
  });
  // ... later (e.g. in a webhook handler) ...
  await d.signal(executionId, Approved, { approvedBy: "admin@co.com" });
  const result = await d.wait(executionId, { timeout: 30_000 });
  // result = { status: "approved", approvedBy: "admin@co.com" }
  ```

- **Read from the store** — fetch the persisted result without blocking:

  ```ts
  const execution = await store.getExecution(executionId);
  // execution.status = "completed" | "failed" | "running" | ...
  // execution.result = the return value of your workflow
  ```

If the workflow throws an error instead of returning, the execution is marked as `failed` and `startAndWait()`/`wait()` will reject with that error.

---

## Execution Flow

```mermaid
sequenceDiagram
    participant C as Client
    participant DS as DurableService
    participant S as Store
    participant DC as DurableContext
    participant T as Task Function

    C->>DS: startAndWait(task, input)
    DS->>S: createExecution(id, task, input)
    DS->>DC: create context for execution
    DS->>T: run task with context

    T->>DC: step('validate', fn)
    DC->>S: getStepResult(execId, 'validate')
    alt Step not cached
        DC->>DC: execute fn()
        DC->>S: saveStepResult(execId, 'validate', result)
    end
    DC-->>T: return result

    T->>DC: sleep(5000)
    DC->>S: createTimer(execId, fireAt)
    Note over DC,T: Execution suspends

    Note over DS: Timer polling...
    DS->>S: getReadyTimers()
    S-->>DS: timer ready!
    DS->>DC: resume execution

    T->>DC: step('ship', fn)
    DC->>S: getStepResult(execId, 'ship')
    DC->>DC: execute fn()
    DC->>S: saveStepResult(execId, 'ship', result)
    DC-->>T: return result

    T-->>DS: return final result
    DS->>S: markExecutionComplete(id, result)
    DS-->>C: return result
```

---

## Safety & Semantics

This section summarizes the safety guarantees and expectations of the durable workflow system.

- **Store is the source of truth**
  All durable state (executions, steps, timers, schedules) lives in `IDurableStore`. Queues and pub/sub are optimizations on top; correctness must not rely solely on in-memory state or transient messages.

- **At-least-once execution, effectively-once steps**
  - Executions are retried on failure, so the same logical workflow may run more than once.
  - `durableContext.step(stepId, fn)` ensures each step function is _observably_ executed at most once per execution: results are memoized in the store and returned on replay.
  - External side effects inside a step must still be designed to be idempotent or safely repeatable (for example, idempotent payment/refund APIs).

- **Sleep and resumption**
  - `durableContext.sleep(ms)` persists a timer and marks the execution as `sleeping`.
  - When the timer fires, execution is resumed from the code _after_ `sleep`, and all previous steps are replayed via cached results (no re-issuing of side effects wrapped in `step`).

- **Event emission without duplicates**
  - `durableContext.emit(event, data)` is implemented as one or more internal `step`s under the hood.
  - Each call is assigned a deterministic internal id like `__emit:<eventId>:<index>` so you can emit the same event type multiple times in one workflow.
  - On replay, memoization prevents duplicates for each individual emission.
  - **Determinism note:** those internal `:<index>` suffixes are derived from call order within the workflow. If you change the workflow structure (branching / adding/removing calls), the internal step ids may shift and past executions may no longer replay cleanly.
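
The call-order indexing behind those ids can be sketched as follows (illustrative only, not Runner's internals):

```typescript
// Illustrative sketch of deterministic internal step ids for emit calls
// (NOT Runner's code): ids are assigned per event id, in call order.
function createEmitIdFactory() {
  const counters = new Map<string, number>();
  return (eventId: string): string => {
    const index = counters.get(eventId) ?? 0;
    counters.set(eventId, index + 1);
    return `__emit:${eventId}:${index}`;
  };
}

const nextEmitId = createEmitIdFactory();
nextEmitId("app.events.shipped"); // "__emit:app.events.shipped:0"
nextEmitId("app.events.shipped"); // "__emit:app.events.shipped:1"
// Inserting a new emit call earlier in the workflow shifts every later
// index, which is exactly why replays can break after such refactors.
```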

- **Signals (wait until external confirmation)**
  - `durableContext.waitForSignal(signal)` suspends an execution until `durable.signal(executionId, signal, payload)` is called.
  - `stepId` keeps the same return type (payload + timeout error), while `timeoutMs` switches to a `{ kind: "signal" | "timeout" }` outcome.
  - Signals are memoized as steps under `__signal:<signal.id>[:index]` (or `__signal:<id>[:index]` for string ids).
  - Repeated waits use `__signal:<id>:<index>` and are resolved by the first available slot; payloads can be buffered for future waits.
  - **Determinism note:** like `emit`, the `:<index>` suffixes are derived from call order within the workflow; code changes can shift indexes on replay.

- **Retries and timeouts**
  - `StepOptions.retries` and `DurableServiceConfig.execution.maxAttempts` control step-level and execution-level retries respectively.
  - `StepOptions.timeout` and `execution.timeout` bound how long a single step or the whole execution may run.
  - **Global Timeouts**: `execution.timeout` measures the total time from the very first attempt (`createdAt`) and is not reset on retries or resumptions.
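
A hedged sketch of how these knobs might compose. The option names follow `StepOptions` and `DurableServiceConfig` as described above, but passing `StepOptions` as a third argument to `step(...)` is an assumption to verify against your version:

```typescript
// Hedged sketch — verify option names and the step(...) signature
// against your Runner version before relying on this.
const durableRegistration = durable.with({
  store,
  execution: {
    maxAttempts: 3, // execution-level retries
    timeout: 10 * 60_000, // global bound, measured from createdAt
  },
});

// Inside a workflow: per-step retry/timeout via StepOptions (assumed 3rd arg).
const payment = await durableContext.step(
  "charge-payment",
  async () => payments.charge(order.customerId, order.total),
  { retries: 2, timeout: 5_000 },
);
```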

- **Queue and worker semantics**
  - `IDurableQueue` provides **at-least-once** delivery: messages may be delivered more than once but will not be silently dropped.
  - Workers must treat queue messages as hints to load state from the store, apply `DurableContext` logic, and then `ack` or `nack` the message. Idempotency is achieved by reading/writing through `IDurableStore`, not by trusting the queue alone.

- **Multi-node coordination**
  - `IEventBus` is used to reduce `wait()` latency (publish `execution:<id>` completion events) but does not replace the store.
  - Timers (`sleep`, signal timeouts, schedules) are driven by the durable poller (`DurableService` polling loop). In multi-process setups, run a single poller (`polling: { enabled: true }`) or implement atomic timer claiming in your store.
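
A single-poller topology might look like this. This is a sketch: `polling.enabled`/`interval` follow the config shown earlier, but disabling polling with `enabled: false` on worker processes is an assumption to verify:

```typescript
// Hedged sketch of a single-poller topology (verify that
// `polling: { enabled: false }` is supported in your version).

// Scheduler process: the only one driving timers.
const schedulerRegistration = durable.with({
  store,
  polling: { enabled: true, interval: 1000 },
});

// Worker processes: execute work, never poll timers.
const workerRegistration = durable.with({
  store,
  polling: { enabled: false },
});
```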

- **Reserved step ids**
  - Step ids starting with `__` and `rollback:` are reserved for durable internals. Avoid using them in `durableContext.step(...)` to prevent collisions with system steps.

These semantics intentionally favor **safety and debuggability** over perfect "exactly-once" guarantees at the infrastructure level. Application code remains explicit and testable, while the system provides strong, well-defined durability guarantees around that code.

---

## Signals (wait for external events)

Durable workflows often need to pause until the outside world confirms something (e.g. payment provider callbacks). Use `durableContext.waitForSignal()` inside the workflow, and `durable.signal()` from the outside.

Signal summary:

- `stepId` is a stable key only; it does not change return types.
- `waitForSignal({ stepId })` requires a store that supports listing step results (`listStepResults`) so `durable.signal(...)` can find the waiter.
- `timeoutMs` changes the return value to a `{ kind: "signal" | "timeout" }` outcome.
- Without `timeoutMs`, timeouts throw an error (no union result).

Return shapes:

| Call                                           | Returns                                                |
| ---------------------------------------------- | ------------------------------------------------------ |
| `waitForSignal(signal)`                        | `payload` (throws on timeout)                          |
| `waitForSignal(signal, { stepId })`            | `payload` (throws on timeout)                          |
| `waitForSignal(signal, { timeoutMs })`         | `{ kind: "signal", payload }` or `{ kind: "timeout" }` |
| `waitForSignal(signal, { timeoutMs, stepId })` | `{ kind: "signal", payload }` or `{ kind: "timeout" }` |

### Example: `waitUntilPaid()`

```typescript
import { event, r } from "@bluelibs/runner";
import { MemoryStore, resources } from "@bluelibs/runner/node";

const Paid = event<{ paidAt: number }>({ id: "app.signals.paid" });
const durable = resources.memoryWorkflow.fork("app-durable");
const durableRegistration = durable.with({ store: new MemoryStore() });

export const processOrder = r
  .task("app.tasks.processOrder")
  .dependencies({ durable })
  .run(async (input: { orderId: string }, { durable }) => {
    const durableContext = durable.use();

    await durableContext.step("reserve", async () => {
      // reserve inventory, create payment intent, etc.
      return { ok: true };
    });

    const payment = await durableContext.waitForSignal(Paid);

    await durableContext.step("ship", async () => {
      // ship only after payment is confirmed
      return { ok: true, paidAt: payment.paidAt };
    });
  })
  .build();
```

From an API webhook / callback handler:

```typescript
// Store the workflow `executionId` in your domain data when you start it.
// You can get it immediately via `await d.start(task, input)`.
const d = runtime.getResourceValue(durable);
await d.signal(executionId, Paid, { paidAt: Date.now() });
```

### Whichever comes first: signal or timeout

If you need "wait for payment confirmation or continue after 1 day", use the timeout variant:

```typescript
const outcome = await durableContext.waitForSignal(Paid, {
  timeoutMs: 86_400_000,
});

if (outcome.kind === "timeout") {
  // mark order as expired, notify user, etc.
  return;
}

// outcome.kind === "signal"
await durableContext.step("ship", async () => ({
  paidAt: outcome.payload.paidAt,
}));
```

### Stable `stepId` without changing behavior

You can pass a stable step id for replay stability without changing the return type:

```typescript
const payment = await durableContext.waitForSignal(Paid, {
  stepId: "stable-paid",
});
```

---

## Compensation / Rollback Pattern

Instead of a complex saga orchestrator, users implement compensation explicitly:

```typescript
const processOrderWithRollback = r
  .task("app.tasks.processOrder")
  .dependencies({ durable })
  .run(async (input, { durable }) => {
    const durableContext = durable.use();

    // Reserve inventory
    const reservation = await durableContext
      .step("reserve-inventory")
      .up(async () => inventory.reserve(input.items))
      .down(async (res) => inventory.release(res.reservationId));

    // Charge payment
    const payment = await durableContext
      .step("charge-payment")
      .up(async () => payments.charge(input.customerId, input.amount))
      .down(async (p) => payments.refund(p.chargeId));

    try {
      // Ship order - might fail
      const shipment = await durableContext.step("ship-order", async () => {
        return await shipping.ship(input.orderId);
      });
      return { success: true, shipment };
    } catch (error) {
      await durableContext.rollback();
      return {
        success: false,
        error: error instanceof Error ? error.message : String(error),
      };
    }
  })
  .build();
```

This is more explicit and readable than an automatic saga system.

---

## Branching with durableContext.switch()

`durableContext.switch()` is a replay-safe branching primitive for durable workflows. Instead of using plain `if/else` (which the flow shape exporter can't capture), model conditional logic with `switch` so that:

1. The branch decision is **persisted** — on replay, matchers are skipped and the cached branch result is returned.
2. The branch structure is **visible** to the flow-shape recorder (via `durable.describe(...)`) for documentation and visualization.

### API

```typescript
const result = await durableContext.switch<TValue, TResult>(
  stepId, // unique step ID (like durableContext.step)
  value, // the value to match against
  branches, // array of { id, match, run }
  defaultBranch?, // optional { id, run } (no match needed)
);
```

### Example

```typescript
const fulfillOrder = r
  .task("app.tasks.fulfillOrder")
  .dependencies({ durable })
  .run(async (input: { orderId: string; tier: string }, { durable }) => {
    const durableContext = durable.use();

    const order = await durableContext.step("fetch-order", async () => {
      return await db.orders.findById(input.orderId);
    });

    const result = await durableContext.switch(
      "fulfillment-route",
      order.tier,
      [
        {
          id: "premium",
          match: (tier) => tier === "premium",
          run: async () => {
            await durableContext.step("express-ship", async () =>
              shipping.express(order),
            );
            return "express-shipped";
          },
        },
        {
          id: "standard",
          match: (tier) => tier === "standard",
          run: async () => {
            await durableContext.step("standard-ship", async () =>
              shipping.standard(order),
            );
            return "standard-shipped";
          },
        },
      ],
      {
        id: "manual-review",
        run: async () => {
          await durableContext.step("flag-review", async () =>
            flagForReview(order),
          );
          return "needs-review";
        },
      },
    );

    return { orderId: input.orderId, result };
  })
  .build();
```

### How it works

- **First execution**: matchers evaluate in order; the first matching branch's `run()` is called. The branch `id` and result are persisted as a step result.
- **Replay**: the cached `{ branchId, result }` is returned immediately — no matchers or `run()` are re-executed.
- **Audit**: emits a `switch_evaluated` audit entry with `branchId` and `durationMs`.
- **Determinism**: the step ID is user-provided (required), so it's stable across refactors (like `durableContext.step`).
- **Fail-fast**: throws if no branch matches and no default is provided.

### Interface

```typescript
interface SwitchBranch<TValue, TResult> {
  id: string;
  match: (value: TValue) => boolean;
  run: (value: TValue) => Promise<TResult>;
}
```

---

## Describing a Flow (Static Shape Export)

Use `durable.describe(...)` to capture the **structure** of a durable workflow without executing it. It returns a serializable `DurableFlowShape` object that you can use for:

- Documentation generation
- Visual workflow diagrams
- Tooling and editor plugins
- API schema exports

### From an existing task (recommended)

Call `describe()` on your durable dependency, then pass your task directly — it shims `durable.use()` and records every `durableContext.*` operation:

```typescript
import { r, run } from "@bluelibs/runner";
import { resources } from "@bluelibs/runner/node";

const durable = resources.memoryWorkflow.fork("app-durable");
const app = r
  .resource("app")
  .register([resources.durable, durable.with({})])
  .build();
const runtime = await run(app);

// TInput is inferred from the task:
const shape = await runtime.getResourceValue(durable).describe(approveOrder);

// Or specify input explicitly:
const shape2 = await runtime
  .getResourceValue(durable)
  .describe<{ orderId: string }>(approveOrder, { orderId: "123" });

console.log(shape.nodes);
// [
//   { kind: "step", stepId: "validate", hasCompensation: false },
//   { kind: "waitForSignal", signalId: "app.signals.approved", ... },
//   { kind: "step", stepId: "ship", hasCompensation: false },
//   { kind: "emit", eventId: "app.events.shipped", stepId: "notify" },
// ]
```

If your task is tagged with `tags.durableWorkflow.with({ defaults: {...} })`, `describe(task)` (without input) uses a cloned copy of those defaults. Passing `describe(task, input)` always wins and replaces tag defaults.

That's it. No refactoring — just call `durable.describe(task)` and get the shape.

### Output shape

```typescript
interface DurableFlowShape {
  nodes: FlowNode[];
}

type FlowNode =
  | { kind: "step"; stepId: string; hasCompensation: boolean }
  | { kind: "sleep"; durationMs: number; stepId?: string }
  | {
      kind: "waitForSignal";
      signalId: string;
      timeoutMs?: number;
      stepId?: string;
    }
  | { kind: "emit"; eventId: string; stepId?: string }
  | { kind: "switch"; stepId: string; branchIds: string[]; hasDefault: boolean }
  | { kind: "note"; message: string };
```

### How it works

The recorder runs your task's `run` function with **real runtime dependencies**, but wraps durable resource dependencies so `durable.use()` returns a **recording context**. That context implements `IDurableContext` and captures each `durableContext.*` call as a `FlowNode` instead of executing it.

The step builder API (`.up()` / `.down()`) is also supported: `hasCompensation` reflects whether `.down()` was called.

`rollback()` is a no-op in the recorder (it's a runtime concern, not a structural one).

---

## Scheduling & Cron Jobs

### One-Time Scheduled Execution

Run a task at a specific future time:

```typescript
// Schedule a task to run in 1 hour
const executionId = await durable.schedule(
  processReport,
  { reportId: "daily-sales" },
  { at: new Date(Date.now() + 3600000) },
);

// Or use the delay helper
const reminderExecutionId = await durable.schedule(
  sendReminder,
  { userId: "user-123" },
  { delay: 24 * 60 * 60 * 1000 }, // 24 hours from now
);
```

### Recurring Cron Jobs

Define tasks that run on a schedule using cron expressions:

```typescript
import { promises as fs } from "node:fs";

// Define a scheduled task
const dailyCleanup = r
  .task("app.tasks.dailyCleanup")
  .dependencies({ durable, db })
  .run(async (input, { durable, db }) => {
    const durableContext = durable.use();

    await durableContext.step("cleanup-old-sessions", async () => {
      await db.sessions.deleteOlderThan(7, "days");
    });

    await durableContext.step("cleanup-temp-files", async () => {
      await fs.rm("./tmp", { recursive: true, force: true });
    });

    return { cleaned: true };
  })
  .build();

// Create schedules once at startup (in a bootstrap resource/task)
// ensureSchedule() is idempotent — safe to call on every boot and concurrently
await durable.ensureSchedule(
  dailyCleanup,
  {},
  { id: "daily-cleanup", cron: "0 3 * * *" },
);
await durable.ensureSchedule(
  syncInventory,
  { full: false },
  { id: "hourly-sync", cron: "0 * * * *" },
);
await durable.ensureSchedule(
  generateWeeklyReport,
  { type: "weekly" },
  { id: "weekly-report", cron: "0 9 * * MON" },
);
```

### Interval-Based Scheduling

Run tasks at fixed intervals (e.g., every 30 seconds):

```typescript
// ensureSchedule() is idempotent — safe to call on every boot and concurrently
await durable.ensureSchedule(
  healthCheckTask,
  { endpoints: ["api", "db"] },
  { id: "health-check", interval: 30_000 },
);
await durable.ensureSchedule(
  pollExternalApi,
  {},
  { id: "poll-external-api", interval: 5 * 60 * 1000 },
);
await durable.ensureSchedule(
  metricsSync,
  { flush: true },
  { id: "metrics-sync", interval: 60_000 },
);
```

**Interval vs Cron:**

- **Interval**: Fixed delay between runs, measured from kickoff (see the behavior note below). Best for polling, health checks.
- **Cron**: Calendar-based. Next run = next matching time. Best for scheduled reports, daily cleanup.

**Interval Behavior (current implementation):**

Intervals are currently measured from when the schedule timer fires / execution is kicked off (not from task completion). If the task runs longer than the interval, the next run will be scheduled one interval after _kickoff time_, which can cause overlapping executions unless your task logic (or your infrastructure) prevents it.

```
Task starts at t=0, takes 12s to complete
Interval = 10s

t=0          t=10         t=12
|------------|------------|
task run A   next run B   A completes
```

If you need "completion-based" intervals (no overlap), implement it explicitly inside the workflow:

- run the work
- then `await durableContext.sleep(intervalMs)`
- then loop / re-run (or have the schedule fire less frequently and use durable sleeps inside)
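
The completion-based pattern can be sketched as below. The context here is a minimal stand-in with the same `step`/`sleep` shape as `IDurableContext`, so the loop structure is what matters, not the stand-in itself:

```typescript
// Sketch of a completion-based loop: each cycle must finish (work + sleep)
// before the next one starts, so runs never overlap. The context below is a
// stand-in with the same step/sleep shape as IDurableContext.
const runLog: string[] = [];

const durableContext = {
  async step<T>(stepId: string, fn: () => Promise<T>): Promise<T> {
    runLog.push(stepId);
    return fn();
  },
  async sleep(_durationMs: number): Promise<void> {
    runLog.push("sleep"); // a real durable sleep persists a timer instead
  },
};

for (let cycle = 0; cycle < 3; cycle++) {
  // Deterministic step ids keep replays stable across recoveries.
  await durableContext.step(`poll-${cycle}`, async () => {
    /* ...do the actual work... */
  });
  await durableContext.sleep(30_000);
}
// runLog: poll-0, sleep, poll-1, sleep, poll-2, sleep
```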

### Cron Expression Format

Standard 5-field cron format:

```
┌───────────── minute (0-59)
│ ┌─────────── hour (0-23)
│ │ ┌───────── day of month (1-31)
│ │ │ ┌─────── month (1-12 or JAN-DEC)
│ │ │ │ ┌───── day of week (0-6 or SUN-SAT)
│ │ │ │ │
* * * * *
```

Common patterns:

- `* * * * *` - Every minute
- `0 * * * *` - Every hour
- `0 0 * * *` - Every day at midnight
- `0 9 * * MON-FRI` - Weekdays at 9am
- `0 0 1 * *` - First of every month
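
To make the field semantics concrete, here is a deliberately minimal matcher. It handles only `*`, plain numbers, and comma lists (no ranges, steps, or names), and it is not the library's `CronParser`; it only illustrates how a date lines up against the 5 fields:

```typescript
// Minimal 5-field matcher: does `date` satisfy `expr`?
// Supports "*", numbers, and comma lists only (illustrative sketch).
function cronMatches(expr: string, date: Date): boolean {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error("expected 5 cron fields");
  const values = [
    date.getMinutes(),   // minute (0-59)
    date.getHours(),     // hour (0-23)
    date.getDate(),      // day of month (1-31)
    date.getMonth() + 1, // month (1-12)
    date.getDay(),       // day of week (0-6, Sunday = 0)
  ];
  return fields.every((field, i) =>
    field === "*"
      ? true
      : field.split(",").some((part) => Number(part) === values[i]),
  );
}

// "0 3 * * *" matches 03:00 on any day:
cronMatches("0 3 * * *", new Date(2024, 0, 15, 3, 0)); // true
cronMatches("0 3 * * *", new Date(2024, 0, 15, 4, 0)); // false
```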

### Schedule Management API

```typescript
// Pause a schedule
await durable.pauseSchedule("daily-cleanup");

// Resume a schedule
await durable.resumeSchedule("daily-cleanup");

// Get schedule status
const status = await durable.getSchedule("daily-cleanup");
// { id, cron, lastRun, nextRun, status: 'active' | 'paused' }

// List all schedules
const schedules = await durable.listSchedules();

// Update the schedule's cron expression
await durable.updateSchedule("daily-cleanup", { cron: "0 4 * * *" });

// Remove a schedule
await durable.removeSchedule("daily-cleanup");
```

### How Scheduling Works

```mermaid
sequenceDiagram
    participant DS as DurableService
    participant S as Store
    participant T as Task

    Note over DS: Timer polling loop

    loop Every polling interval
        DS->>S: getReadyTimers
        S-->>DS: timers ready to fire

        alt Schedule timer
            DS->>S: getSchedule by scheduleId
            DS->>DS: execute task with input
            DS->>S: calculateNextRun from cron
            DS->>S: createTimer for next run
        else Sleep timer
            DS->>DS: resume execution
        else One-time scheduled
            DS->>DS: execute task
        end
    end
```

---

## Core Types

```typescript
// types.ts

export type ExecutionStatus =
  | "pending"
  | "running"
  | "retrying"
  | "sleeping"
  | "completed"
  | "failed"
  | "compensation_failed";

export interface Execution<TInput = unknown, TResult = unknown> {
  id: string;
  taskId: string;
  input: TInput | undefined;
  status: ExecutionStatus;
  result?: TResult;
  error?: {
    message: string;
    stack?: string;
  };
  attempt: number;
  maxAttempts: number;
  timeout?: number;
  createdAt: Date;
  updatedAt: Date;
  completedAt?: Date;
}

export interface StepResult<T = unknown> {
  executionId: string;
  stepId: string;
  result: T;
  completedAt: Date;
}

export type TimerType =
  | "sleep"
  | "timeout"
  | "scheduled"
  | "cron"
  | "retry"
  | "signal_timeout";

export interface Timer {
  id: string;
  executionId?: string; // For sleep/timeout timers
  stepId?: string; // For step-specific timers
  scheduleId?: string; // For cron timers
  type: TimerType;
  fireAt: Date;
  status: "pending" | "fired";
}

export type ScheduleType = "cron" | "interval";

export interface Schedule<TInput = unknown> {
  id: string;
  taskId: string;
  type: ScheduleType;
  pattern: string; // Cron expression or interval (ms)
  input: TInput | undefined;
  status: "active" | "paused";
  lastRun?: Date;
  nextRun?: Date;
  createdAt: Date;
  updatedAt: Date;
}

export interface DurableContextState {
  executionId: string;
  attempt: number;
}
```

---

**Note on Interfaces**: The full technical contracts for `IDurableStore`, `IEventBus`, and `IDurableQueue` are documented in the [Abstract Interfaces](#abstract-interfaces) section.

---

## DurableContext

```typescript
// DurableContext.ts

export interface IDurableContext {
  readonly executionId: string;
  readonly attempt: number;

  /**
   * Execute a step with memoization. On replay, returns the cached result.
   */
  step<T>(stepId: string, fn: () => Promise<T>): Promise<T>;
  step<T>(
    stepId: string,
    options: StepOptions,
    fn: () => Promise<T>,
  ): Promise<T>;

  /**
   * Durable sleep that survives process restarts.
   */
  sleep(durationMs: number): Promise<void>;

  /**
   * Emit an event durably (as a step).
   */
  emit<T>(event: IEvent<T>, data: T): Promise<void>;
}

export interface StepOptions {
  retries?: number;
  timeout?: number;
}
```
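
The memoization contract of `step()` is the heart of replay. A minimal in-memory stand-in (the real store persists `StepResult` records instead of a `Map`) behaves like this:

```typescript
// In-memory stand-in for the step() memoization contract: the first call runs
// the function and caches its result; a replay with the same stepId returns
// the cached value without re-executing.
const stepCache = new Map<string, unknown>();
let executions = 0;

async function step<T>(stepId: string, fn: () => Promise<T>): Promise<T> {
  if (stepCache.has(stepId)) return stepCache.get(stepId) as T; // replay hit
  const result = await fn();
  stepCache.set(stepId, result); // persisted before the workflow moves on
  return result;
}

const charge = async () => {
  executions++; // side effect we never want to repeat
  return { chargeId: "ch_1" };
};

const first = await step("charge-card", charge);
const replay = await step("charge-card", charge); // charge() is NOT run again
// executions === 1; first and replay are the same cached value
```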

---

## DurableService

```typescript
// DurableService.ts (simplified interface)

export interface ScheduleConfig<TInput = unknown> {
  id: string;
  task: ITask<TInput, any>;
  cron?: string; // Cron expression (e.g., '0 3 * * *')
  interval?: number; // Interval in ms (e.g., 30000 for 30 seconds)
  input: TInput;
}
// Must specify either cron OR interval, not both

export interface DurableServiceConfig {
  store: IDurableStore;
  queue?: IDurableQueue;
  eventBus?: IEventBus;
  audit?: {
    enabled?: boolean; // Default: false
  };
  polling?: {
    enabled?: boolean; // Default: true
    interval?: number; // Default: 1000ms
  };
  execution?: {
    maxAttempts?: number; // Default: 3
    timeout?: number; // Default: no timeout
  };
  schedules?: ScheduleConfig[]; // Cron schedules to register
}

export interface ScheduleOptions {
  id?: string; // Stable schedule id (required for ensureSchedule)
  at?: Date; // Run at specific time
  delay?: number; // Run after delay (ms)
  cron?: string; // Cron expression (for recurring)
  interval?: number; // Interval in ms (for recurring)
}

export interface IDurableService {
  /**
   * Start a task durably and wait for it to complete.
   */
  startAndWait<TInput, TResult>(
    task: ITask<TInput, Promise<TResult>, any, any, any, any> | string,
    input?: TInput,
    options?: ExecuteOptions,
  ): Promise<TResult>;

  /**
   * Start a task execution and return the ID immediately.
   */
  start<TInput>(
    task: ITask<TInput, Promise<unknown>, any, any, any, any> | string,
    input?: TInput,
    options?: ExecuteOptions,
  ): Promise<string>;

  /**
   * Wait for a previously started execution to complete.
   */
  wait<TResult>(
    executionId: string,
    options?: { timeout?: number; waitPollIntervalMs?: number },
  ): Promise<TResult>;

  /**
   * Deliver a signal payload to a waiting workflow execution.
   */
  signal<TPayload>(
    executionId: string,
    signal: string | IEventDefinition<TPayload>,
    payload: TPayload,
  ): Promise<void>;

  /**
   * Schedule a one-time task execution.
   */
  schedule<TInput>(
    task: ITask<TInput, Promise<any>, any, any, any, any> | string,
    input: TInput,
    options: ScheduleOptions,
  ): Promise<string>;

  /**
   * Idempotently create (or update) a recurring schedule (cron/interval).
   * Safe to call on every boot and concurrently across processes.
   */
  ensureSchedule<TInput>(
    task: ITask<TInput, Promise<any>, any, any, any, any> | string,
    input: TInput,
    options: ScheduleOptions & { id: string },
  ): Promise<string>;

  /**
   * Recover incomplete executions on startup.
   */
  recover(): Promise<void>;

  /**
   * Start timer polling (called automatically on init).
   */
  start(): void;

  /**
   * Stop timer polling (called on dispose).
   */
  stop(): Promise<void>;

  // Schedule management
  pauseSchedule(scheduleId: string): Promise<void>;
  resumeSchedule(scheduleId: string): Promise<void>;
  getSchedule(scheduleId: string): Promise<Schedule | null>;
  listSchedules(): Promise<Schedule[]>;
  updateSchedule(
    scheduleId: string,
    updates: { cron?: string; interval?: number; input?: unknown },
  ): Promise<void>;
  removeSchedule(scheduleId: string): Promise<void>;
}
```

---

## File Structure

```
src/node/durable/
├── index.ts              # Public exports (from `@bluelibs/runner/node`)
├── core/                 # Engine (store is the source of truth)
│   ├── index.ts
│   ├── types.ts
│   ├── CronParser.ts
│   ├── DurableContext.ts
│   ├── DurableService.ts
│   ├── DurableWorker.ts
│   ├── DurableOperator.ts
│   ├── StepBuilder.ts
│   └── interfaces/
├── store/
│   ├── MemoryStore.ts
│   └── RedisStore.ts
├── queue/
│   ├── MemoryQueue.ts
│   └── RabbitMQQueue.ts
├── bus/
│   ├── MemoryEventBus.ts
│   ├── NoopEventBus.ts
│   └── RedisEventBus.ts
└── __tests__/
    ├── DurableContext.test.ts
    ├── DurableService.integration.test.ts
    ├── DurableService.realBackends.integration.test.ts
    ├── MemoryBackends.test.ts
    ├── RabbitMQQueue.mock.test.ts
    ├── RedisEventBus.mock.test.ts
    └── RedisStore.mock.test.ts
```

---

## Production Setup with Redis + RabbitMQ

For production, use Redis for state/pub-sub and RabbitMQ with quorum queues for durable work distribution.

Install required Node dependencies:

```bash
npm install ioredis amqplib
```

### Quick Start - Production Configuration

```typescript
import {
  RedisStore,
  RedisEventBus,
  RabbitMQQueue,
  resources,
} from "@bluelibs/runner/node";

// State storage with Redis
const store = new RedisStore({
  redis: process.env.REDIS_URL || "redis://localhost:6379",
  prefix: "durable:",
});

// Pub/Sub with Redis
const eventBus = new RedisEventBus({
  redis: process.env.REDIS_URL || "redis://localhost:6379",
  prefix: "durable:bus:",
});

// Work distribution with RabbitMQ quorum queues
const queue = new RabbitMQQueue({
  url: process.env.RABBITMQ_URL || "amqp://localhost",
  queue: {
    name: "durable-executions",
    quorum: true, // Use quorum queue for durability
    deadLetter: "durable-dlq", // Dead letter queue for failed messages
  },
  prefetch: 10, // Process up to 10 messages concurrently
});

// Create durable resource definition + registration
const durable = resources.redisWorkflow.fork("app-durable");
const durableRegistration = durable.with({
  store,
  eventBus,
  queue,
  worker: true, // starts a queue consumer in this process
  // polling.enabled defaults to true; keep it on for timers/schedules
});
```

If you want API-only nodes to call `start()` / `signal()` / `wait()` **without running the timer poller**, disable polling:

```ts
const durable = resources.redisWorkflow.fork("app-durable");
const durableRegistration = durable.with({
  store,
  eventBus,
  queue,
  worker: false,
  polling: { enabled: false },
});
```

Make sure at least one worker process runs with polling enabled; otherwise sleeps, timeouts, and schedules will never fire.

### RabbitMQ Quorum Queues

**Why quorum queues?**

- **Durability** - Messages survive broker restarts
- **Replication** - Messages are replicated across nodes
- **Consistency** - Strong guarantees vs classic mirrored queues
- **Dead-letter** - Failed messages go to a DLQ for inspection

```typescript
// queue/RabbitMQQueue.ts

export interface RabbitMQQueueConfig {
  url: string;
  queue: {
    name: string;
    quorum?: boolean; // Use quorum queue (default: true)
    deadLetter?: string; // Dead letter exchange
    messageTtl?: number; // Message TTL in ms
  };
  prefetch?: number; // Consumer prefetch (default: 10)
}

export class RabbitMQQueue implements IDurableQueue {
  constructor(config: RabbitMQQueueConfig);

  async init(): Promise<void> {
    // Creates a quorum queue with:
    // - x-queue-type: quorum
    // - x-dead-letter-exchange: <deadLetter>
    // - durable: true
  }
}
```

### Redis Store Implementation Details

- **Serialization**: `RedisStore` uses Runner's serializer for persistence. This preserves `Date` objects and other complex types, avoiding "time bombs" where dates become strings after being stored.
- **Performance (SCAN vs KEYS)**: All multi-key searches use Redis `SCAN` for non-blocking iteration. This prevents Redis from freezing when thousands of executions are present.
- **Concurrency & Atomicity**:
  - `updateExecution()` uses a Lua script to perform a read/merge/write update atomically.
  - Execution processing is guarded by `acquireLock()` so only one worker runs an execution attempt at a time.
  - Signal delivery (`durable.signal`) and signal waits (`durableContext.waitForSignal`) use a per-execution/per-signal lock when supported by the store, to prevent races between "signal arrives" and "wait is being recorded".
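
The locking guarantee can be pictured with an in-memory stand-in for `acquireLock()`. A Redis-backed store would typically get the same effect with `SET key value NX PX ttl`; this sketch is illustrative, not the actual implementation:

```typescript
// Stand-in lock table: executionId → lock expiry (ms since epoch).
const locks = new Map<string, number>();

function acquireLock(
  executionId: string,
  ttlMs: number,
  now = Date.now(),
): boolean {
  const expiry = locks.get(executionId);
  if (expiry !== undefined && expiry > now) return false; // held by another worker
  locks.set(executionId, now + ttlMs); // free or expired: take it
  return true;
}

const firstWorker = acquireLock("exec-1", 30_000); // true: wins the attempt
const secondWorker = acquireLock("exec-1", 30_000); // false: backs off / requeues
```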

### Optimized Client Waiting

When an `IEventBus` (like `RedisEventBus`) is present, calls to `durable.startAndWait()` or `durable.wait()` use a **reactive event-driven approach**. The service subscribes to completion events for that specific execution ID, resulting in near-instant response times once the workflow finishes, without constant store polling.
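
The subscribe-then-resolve mechanic looks roughly like this; a local `EventEmitter` stands in for the event bus, and the channel name and payload shape are illustrative assumptions:

```typescript
import { EventEmitter } from "node:events";

// Sketch of the reactive wait: instead of polling the store, subscribe to the
// completion event for one execution id and resolve as soon as it fires.
const bus = new EventEmitter();

function waitForCompletion<T>(executionId: string): Promise<T> {
  return new Promise((resolve) => {
    bus.once(`completed:${executionId}`, resolve); // one subscription per execution
  });
}

const pending = waitForCompletion<{ ok: boolean }>("exec-42");
bus.emit("completed:exec-42", { ok: true }); // published by the worker on finish
const result = await pending;
```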

### Horizontal Scaling

```mermaid
graph TB
    subgraph Clients[API Servers]
        A1[API 1]
        A2[API 2]
    end

    subgraph RabbitMQ[RabbitMQ Cluster]
        Q[(Quorum Queue)]
        DLQ[(Dead Letter Queue)]
    end

    subgraph Redis[Redis Cluster]
        RS[(State Store)]
        RP[(Pub/Sub)]
    end

    subgraph Workers[Worker Pool - Auto-Scaling]
        W1[Worker 1]
        W2[Worker 2]
        W3[Worker N]
    end

    A1 -->|enqueue| Q
    A2 -->|enqueue| Q

    Q -->|consume| W1
    Q -->|consume| W2
    Q -->|consume| W3

    W1 <-->|state| RS
    W2 <-->|state| RS
    W3 <-->|state| RS

    RP -.->|notify| W1
    RP -.->|notify| W2
    RP -.->|notify| W3

    Q -->|failed| DLQ
```

**Scaling characteristics:**

- **Workers** - Add more worker instances to increase throughput
- **Queue** - RabbitMQ handles work distribution automatically
- **State** - All workers share state via Redis
- **Events** - Redis pub/sub notifies workers of timer events

### Execution Flow with Queue

```mermaid
sequenceDiagram
    participant C as Client
    participant Q as RabbitMQ
    participant W as Worker
    participant R as Redis

    C->>R: Create execution record
    C->>Q: Enqueue execution message
    C-->>C: Return execution ID

    Note over Q,W: Workers consuming queue

    Q->>W: Deliver message
    W->>R: Acquire lock on execution

    alt Lock acquired
        W->>R: Load execution state
        W->>W: Execute task with DurableContext

        loop For each step
            W->>R: Check step result cache
            alt Cache hit
                R-->>W: Return cached result
            else Cache miss
                W->>W: Execute step
                W->>R: Cache step result
            end
        end

        W->>R: Mark execution complete
        W->>R: Release lock
        W->>Q: Ack message
    else Lock not acquired
        W->>Q: Nack with requeue
    end
```

---

## Integration with Runner Resources

The durable module integrates seamlessly with Runner's resource pattern:

### As a Dependency

```typescript
import { r, run } from "@bluelibs/runner";
import { MemoryStore, resources } from "@bluelibs/runner/node";

const durable = resources.memoryWorkflow.fork("app-durable");
const durableRegistration = durable.with({
  store: new MemoryStore(),
  worker: true, // single-process: also consumes the queue if configured
});

const processOrder = r
  .task("app.tasks.processOrder")
  .dependencies({ durable })
  .run(async (input, { durable }) => {
    const durableContext = durable.use();
    // ... durable task logic
  })
  .build();

const recoverDurable = r
  .resource("app-durable.recover")
  .dependencies({ durable })
  .init(async (_cfg, { durable }) => {
    await durable.recover();
  })
  .build();

const app = r
  .resource("app")
  .register([
    resources.durable,
    durableRegistration,
    processOrder,
    recoverDurable,
  ])
  .build();
await run(app);
```

### Resource Factory Pattern

Runner resources are definitions built at bootstrap time. If you want to pick a store based on environment/config, do it when you create the resource:

```typescript
const store = process.env.REDIS_URL
  ? new RedisStore({ redis: process.env.REDIS_URL })
  : new MemoryStore();

const durable = resources.memoryWorkflow.fork("app-durable");
const durableRegistration = durable.with({ store });
```

### Integration with HTTP Exposure

Expose durable task execution over HTTP using Runner's remote lanes pattern:

```typescript
import { createHttpClient } from "@bluelibs/runner";
import { rpcLanesResource } from "@bluelibs/runner/node";

const durableLane = r
  .rpcLane("app.rpc.durable")
  .applyTo([processOrder])
  .build();

const topology = r.rpcLane.topology({
  profiles: { worker: { serve: [durableLane] } },
  bindings: [{ lane: durableLane, communicator: r.rpcLane.http() }],
});

const app = r
  .resource("app")
  .register([
    durable,
    processOrder,
    rpcLanesResource.with({
      profile: "worker",
      mode: "network",
      topology,
      exposure: {
        http: { basePath: "/__runner", listen: { port: 7070 } },
      },
    }),
  ])
  .build();

// Remote clients can now call durable tasks via HTTP
const client = createHttpClient({ baseUrl: "http://worker:7070/__runner" });
await client.task("app.tasks.processOrder", { orderId: "123" });
```

## Recovery on Startup

```typescript
const recoverDurable = r
  .resource("app-durable.recover")
  .dependencies({ durable })
  .init(async (_cfg, { durable }) => {
    await durable.recover();
  })
  .build();

const app = r
  .resource("app")
  .register([durable, processOrder, recoverDurable])
  .build();
```

The recovery process:

1. Load all incomplete executions (status `pending`, `running`, `sleeping`, or `retrying`)
2. For each, re-execute the task within a new DurableContext
3. The task replays through cached steps automatically
4. Execution continues from where it left off
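
Step 1's status filter is easy to picture: terminal executions are left alone, everything else is re-driven. The data below is a hypothetical snapshot, illustrating the filter only:

```typescript
type ExecutionStatus =
  | "pending" | "running" | "retrying" | "sleeping"
  | "completed" | "failed" | "compensation_failed";

const RECOVERABLE: ReadonlySet<ExecutionStatus> = new Set([
  "pending", "running", "sleeping", "retrying",
]);

// Hypothetical snapshot of the store after a crash:
const executions: Array<{ id: string; status: ExecutionStatus }> = [
  { id: "e1", status: "running" },   // mid-flight when the process died
  { id: "e2", status: "completed" }, // terminal: untouched by recover()
  { id: "e3", status: "sleeping" },  // its timer will be honored again
];

const toRecover = executions.filter((e) => RECOVERABLE.has(e.status));
// toRecover: e1 and e3
```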

---

## Testing Utilities

Durable exports a small test harness so you can run workflows with in-memory backends while keeping the `run()` semantics you use in production.

```ts
import { r, run } from "@bluelibs/runner";
import {
  createDurableTestSetup,
  resources,
  waitUntil,
} from "@bluelibs/runner/node";

const { durable, durableRegistration, store } = createDurableTestSetup();
const Paid = r.event<{ paidAt: number }>("app.signals.paid").build();

const task = r
  .task("spec.durable.waitForSignal")
  .dependencies({ durable, Paid })
  .run(async (_input: undefined, { durable, Paid }) => {
    const durableContext = durable.use();
    const payment = await durableContext.waitForSignal(Paid);
    return { ok: true, paidAt: payment.paidAt };
  })
  .build();

const app = r
  .resource("spec.app")
  .register([resources.durable, durableRegistration, Paid, task])
  .build();
const runtime = await run(app);
const durableRuntime = runtime.getResourceValue(durable);

const executionId = await durableRuntime.start(task);

await waitUntil(
  async () => (await store.getExecution(executionId))?.status === "sleeping",
  { timeoutMs: 1000, intervalMs: 5 },
);

await durableRuntime.signal(executionId, Paid, { paidAt: Date.now() });
await durableRuntime.wait(executionId);

await runtime.dispose();
```

`createDurableTestSetup` uses `MemoryStore`, `MemoryEventBus`, and an optional `MemoryQueue`, so tests stay fast and isolated.

Tip: Use `stepId` for stability in tests without changing behavior, and use `timeoutMs` when you need an explicit timeout outcome.

### Running tests against real backends (Redis + RabbitMQ)

Runner also ships an integration suite that exercises the durable service with real backends (Redis for store + pub/sub and RabbitMQ for queue). This suite is part of the normal Jest test discovery, but it is **skipped by default** to keep local runs hermetic.

To enable it, set `DURABLE_INTEGRATION=1` and provide connection URLs (defaults point to localhost):

```bash
DURABLE_INTEGRATION=1 \
DURABLE_TEST_REDIS_URL=redis://127.0.0.1:6379 \
DURABLE_TEST_RABBIT_URL=amqp://127.0.0.1:5672 \
npm run coverage:ai
```

---

## Comparison with Previous Design

| Aspect              | Previous Design                                             | New Design                                |
| ------------------- | ----------------------------------------------------------- | ----------------------------------------- |
| Components          | 8+ (EventManager, WorkflowEngine, TimerManager, Saga, etc.) | 3 (DurableService, DurableContext, Store) |
| Files               | ~30                                                         | ~12                                       |
| New concepts        | Workflows, Sagas, Compensation, DLQ                         | Just `step()` and `sleep()`               |
| Changes to core     | EventBuilder, TaskBuilder modifications                     | None - pure node extension                |
| Learning curve      | High                                                        | Low                                       |
| Implementation time | 12 weeks                                                    | 2-3 weeks                                 |
|
|
## Operator & Observability

> [!NOTE]
> `createDashboardMiddleware` moved out of core and now lives in `@bluelibs/runner-durable-dashboard`.

### What is the store?

The **durable store** (`IDurableStore`) is the persistence layer for durable workflows. It is responsible for saving and loading:

- executions (id, task id, input, status, attempt/error, timestamps)
- step results (memoized outputs for `durableContext.step(...)`)
- timers and schedules (for `sleep`, signal timeouts, cron/interval scheduling)
- optional audit entries (timeline), and optional operator actions (manual interventions)

You provide a store implementation when you create the durable resource/service:

- `MemoryStore` — in-memory, great for local dev/tests (state is lost on restart)
- `RedisStore` — Redis-backed, appropriate for production durability

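To make those responsibilities concrete, here is a deliberately tiny in-memory sketch. The method and type names are illustrative only, not the real `IDurableStore` interface, and timers, schedules, audit, and operator actions are omitted:

```typescript
// Illustrative sketch only: a toy store covering executions and step results.
interface ExecutionRecord {
  id: string;
  taskId: string;
  status: "pending" | "running" | "completed" | "failed";
}

class TinyMemoryStore {
  private executions = new Map<string, ExecutionRecord>();
  private stepResults = new Map<string, unknown>();

  saveExecution(execution: ExecutionRecord): void {
    this.executions.set(execution.id, execution);
  }

  loadExecution(id: string): ExecutionRecord | undefined {
    return this.executions.get(id);
  }

  saveStepResult(executionId: string, stepId: string, value: unknown): void {
    // Step results are keyed per execution, so replays of other
    // executions never see each other's checkpoints.
    this.stepResults.set(`${executionId}:${stepId}`, value);
  }

  loadStepResult(executionId: string, stepId: string): unknown {
    return this.stepResults.get(`${executionId}:${stepId}`);
  }
}

const store = new TinyMemoryStore();
store.saveExecution({ id: "e1", taskId: "app.tasks.demo", status: "running" });
store.saveStepResult("e1", "charge", { amount: 100 });
```

A real store adds the same shape of save/load pairs for timers, schedules, and audit entries; the key design point is that everything a replay needs is addressable by `executionId`.
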
### What is `DurableOperator`?

`DurableOperator` is an **operations/admin helper** around the store. It does not execute workflows; it reads/writes durable state to support external tooling and manual interventions:

- query executions for listing (filters/pagination)
- load execution details (execution + step results + audit)
- operator actions: retry rollback, skip steps, force fail, patch a step result

You can use `DurableOperator` as the backend contract for your own operational UI or APIs.

### Audit trail (timeline)

In addition to `StepResult` records, durable can persist a structured audit trail as the workflow runs:

- execution status transitions (pending/running/sleeping/retrying/completed/failed/cancelled)
- step completions (with durations)
- sleep scheduled/completed
- signal waiting/delivered/timed-out
- user-added notes via `durableContext.note(...)`

Enable it via `resources.memoryWorkflow.fork("app-durable").with({ audit: { enabled: true }, ... })` or `resources.redisWorkflow.fork("app-durable").with({ audit: { enabled: true }, ... })` (default: off).

This is implemented via optional `IDurableStore` capabilities:

- `appendAuditEntry(entry)`
- `listAuditEntries(executionId)`

Notes are replay-safe: if the workflow replays after a suspend, the same `durableContext.note(...)` call does not create duplicates.

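One way to picture that replay safety is call-order keying: each `note(...)` call gets a deterministic key, so a replayed call finds its existing entry instead of appending a new one. This is a simplified model of the idea, not the library's actual mechanism:

```typescript
// Sketch: notes keyed by call order so replays are idempotent.
class NoteLog {
  private entries = new Map<string, string>();
  private callIndex = 0;

  // A replay re-executes the workflow body from the top,
  // so the call counter restarts too.
  startReplay(): void {
    this.callIndex = 0;
  }

  note(text: string): void {
    const key = `note:${this.callIndex++}`;
    if (!this.entries.has(key)) {
      this.entries.set(key, text); // only the first execution appends
    }
  }

  all(): string[] {
    return [...this.entries.values()];
  }
}

const log = new NoteLog();
log.note("reserved inventory");
log.note("charged card");

log.startReplay(); // simulate resuming after a suspend
log.note("reserved inventory");
log.note("charged card");
```

The same call-order discipline is why refactors that add or remove calls before a suspend point need care (see the gotchas below on call-order indexing).
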
### Stream audit entries via Runner events (for mirroring)

If you want to mirror audit entries to cold storage (S3/Glacier/Postgres), enable:

- `audit: { enabled: true, emitRunnerEvents: true }`

Then listen to Runner events (they are excluded from `on("*")` global hooks by default, so subscribe explicitly):

```ts
import { r } from "@bluelibs/runner";
import { durableEvents } from "@bluelibs/runner/node";

const mirrorAudit = r
  .hook("app.hooks.durableAuditMirror")
  .on(durableEvents.audit.appended)
  .run(async (event) => {
    const { entry } = event.data;
    // write entry to your cold store (idempotent by entry.id)
  })
  .build();
```

---

## Gotchas & Troubleshooting

- **Always put side effects inside `durableContext.step(...)`**: anything outside a step can run multiple times on retries/replays.
- **Keep step ids stable**: renaming a step id (or changing control flow so calls happen in a different order) can break replay determinism for existing executions.
- **Call-order indexing is real**: `emit()` and repeated `waitForSignal()` allocate `:<index>` suffixes internally based on call order; refactors that add or remove calls can shift indexes.
- **Signals are "deliver to current wait"**: `durableService.signal(executionId, ...)` delivers to the base signal slot if it has not completed yet (this can buffer the first signal even before the workflow reaches the wait). Additional signals only deliver to subsequent indexed waits; otherwise they are ignored.
- **Don't hang forever**: prefer `durableService.wait(executionId, { timeout: ... })` unless you intentionally want an unbounded wait.
- **Compensation failures are terminal**: if `durableContext.rollback()` fails, the execution becomes `compensation_failed` and `wait()` rejects. Use `DurableOperator.retryRollback(executionId)` after fixing the underlying issue.
- **Intervals can overlap**: interval schedules are currently measured from kickoff time, not completion time. If you need non-overlapping behavior, implement it via `durableContext.sleep()` inside the workflow.
- **Debugging**: inspect step results and timers via `DurableOperator`/store queries (Redis keys are prefixed with `durable:` by default).

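The first two gotchas follow directly from how step checkpointing works. Here is a minimal self-contained sketch of the memoization idea (illustrative only, not the library's code): results are cached by step id, so a replay skips side effects that already ran, and a renamed id simply misses the cache and re-executes:

```typescript
// Sketch: step results memoized by step id.
type StepResults = Map<string, unknown>;

async function step<T>(
  results: StepResults,
  id: string,
  fn: () => Promise<T>,
): Promise<T> {
  if (results.has(id)) {
    return results.get(id) as T; // replay: checkpointed, effect is skipped
  }
  const value = await fn();
  results.set(id, value); // checkpoint before moving on
  return value;
}

async function demo(): Promise<number> {
  const results: StepResults = new Map();
  let effectRuns = 0;
  const body = () =>
    step(results, "charge-card", async () => {
      effectRuns += 1; // the side effect we must not repeat
      return 42;
    });
  await body(); // first run executes the effect
  await body(); // replay is served from the checkpoint
  return effectRuns;
}
```

Running `demo()` shows the effect executes exactly once across two runs of the same body, which is why side effects outside `step(...)` (and unstable step ids) are dangerous.
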
## Idempotency & Deduplication

There are two different "idempotency" problems:

1. **Workflow-level deduplication (start only once)**

   - `start(task, input, { idempotencyKey })` supports a store-backed **"start-or-get"** mode.
   - It returns the same `executionId` for the same `{ taskId, idempotencyKey }` pair, even if multiple callers race.
   - Important: subsequent calls return the existing `executionId` and do **not** overwrite the originally stored `input`.
   - Store support: `MemoryStore` and `RedisStore` implement this. Custom stores must implement `getExecutionIdByIdempotencyKey` / `setExecutionIdByIdempotencyKey`.
   - You should still persist the returned `executionId` in your domain model for observability and to make webhook handling trivial.

2. **Schedule-level deduplication (create schedule only once)**

   - Use `ensureSchedule(...)` with a stable `id`. It is designed to be safe to call on every boot and concurrently across processes.

If you need workflow-level dedupe by business key (for example `orderId`), use it as the `idempotencyKey` (for example `order:${orderId}`), and store the returned `executionId` on the record as well.

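The "start-or-get" semantics can be modeled as a single lookup keyed by `{ taskId, idempotencyKey }`. This sketch shows the behavior, not the store implementation (names like `startOrGet` are invented for illustration):

```typescript
// Sketch: store-backed "start-or-get" deduplication.
const executionIdByKey = new Map<string, string>();
let counter = 0;

function startOrGet(taskId: string, idempotencyKey: string): string {
  const key = `${taskId}:${idempotencyKey}`;
  const existing = executionIdByKey.get(key);
  if (existing !== undefined) {
    return existing; // repeated or racing callers converge on one execution
  }
  const executionId = `exec-${counter++}`;
  executionIdByKey.set(key, executionId); // first caller wins
  return executionId;
}

const first = startOrGet("app.tasks.fulfillOrder", "order:123");
const second = startOrGet("app.tasks.fulfillOrder", "order:123"); // same id
const other = startOrGet("app.tasks.fulfillOrder", "order:456"); // new id
```

In a real multi-process setup the "first caller wins" write must be atomic in the store (for example a set-if-absent operation in Redis), which is what the `setExecutionIdByIdempotencyKey` contract exists for.
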
## Cancellation (and why it's tricky)

Durable exposes a first-class cancellation API:

- `durableService.cancelExecution(executionId, reason?)`

Semantics:

- Cancellation is **cooperative**, not preemptive: Node cannot reliably interrupt arbitrary async work.
- Cancelling marks the execution as terminal (`cancelled`), unblocks `wait()` / `startAndWait()`, and prevents future resumes (timers/signals won't continue it).
- Already-running code only stops at the next durable checkpoint (for example the next `durableContext.step(...)`, `durableContext.sleep(...)`, `durableContext.waitForSignal(...)`, or `durableContext.emit(...)`).

Administrative alternatives still exist:

- `DurableOperator.forceFail(executionId)` is a blunt instrument that stops the execution and marks it `failed`.

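Cooperative semantics can be illustrated with a checkpoint that inspects a cancelled flag before doing any work. This is a simplified model of the behavior (the real service persists this state in the store rather than on an in-memory object):

```typescript
// Sketch: cancellation is observed only at checkpoints, never mid-step.
interface Exec {
  cancelled: boolean;
  completedSteps: string[];
}

async function checkpoint(
  exec: Exec,
  id: string,
  fn: () => Promise<void>,
): Promise<void> {
  if (exec.cancelled) {
    throw new Error("cancelled"); // the only place cancellation is seen
  }
  await fn();
  exec.completedSteps.push(id);
}

async function workflow(exec: Exec): Promise<string> {
  try {
    await checkpoint(exec, "reserve", async () => {});
    // cancelExecution() arrives while the workflow is between checkpoints:
    exec.cancelled = true;
    // This checkpoint observes the flag and stops; its body never runs.
    await checkpoint(exec, "charge", async () => {});
    return "completed";
  } catch {
    return "cancelled";
  }
}

const exec: Exec = { cancelled: false, completedSteps: [] };
```

The key consequence: work already in flight inside a step finishes (or fails) on its own; cancellation only prevents the workflow from crossing the next checkpoint.
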
## What This Design Deliberately Excludes

1. **Exactly-once external side effects** – The system provides at-least-once execution with effectively-once steps; true exactly-once semantics at the boundary (e.g., payment processors) are left to idempotent APIs and application logic.
2. **Event sourcing** – Steps are modeled as checkpoints, not a full event stream. This keeps the model simple.
3. **Automatic saga orchestration DSLs** – There is no separate workflow language or visual designer. Compensation is regular TypeScript code using `try/catch` and `durableContext.step`.
4. **Built-in dashboards** – Not included in core; observability UIs are intentionally external to the runtime package.
5. **Cross-region or multi-tenant sharding logic** – Multi-region replication and advanced topology concerns are out of scope for v1.

Also intentionally minimal in v1:

6. **Preemptive cancellation** – Cancellation is cooperative (checkpoints), not an interrupt/kill mechanism for arbitrary in-flight async work.
7. **Advanced visibility indexes** – `listExecutions` is operator-oriented and not a full-blown search/indexing system.
8. **Cron timezone & misfire policies** – Cron is evaluated using the process environment defaults; DST/timezone/misfire handling is not configurable yet.

These can all be added in future versions if needed, without changing the core `DurableContext` and `DurableService` APIs.

---

## Why This is Better

1. **Fits Runner's philosophy** - No new concepts, just enhanced tasks
2. **No magic** - What you see is what you get
3. **Explicit over implicit** - Compensation is code, not configuration
4. **Simple mental model** - `step()` = checkpoint, that's it
5. **Easy to understand** - Read the code, know what happens
6. **Easy to test** - `MemoryStore` for tests, no external dependencies
7. **Easy to debug** - Each step is recorded, replay is deterministic