@ayepi/work 0.1.0 → 0.2.0

package/ayepi-work.md ADDED
@@ -0,0 +1,926 @@
1
+ <!--
2
+ ayepi-work.md — reference for `@ayepi/work`, written for coding agents.
3
+
4
+ Copy this file into any project that depends on `@ayepi/work` (e.g. into your repo's
5
+ `docs/` or `.claude/` directory) and reference it from your agents and slash commands.
6
+ It documents the public API, the patterns the package expects, and how it works under the
7
+ hood, with copy-pasteable examples. Keep it in sync with the installed package version.
8
+ -->
9
+
10
+ # `@ayepi/work` — overview
11
+
12
+ `@ayepi/work` is a **type-safe distributed work / job-queue + workflow engine**. Define
13
+ work types with `defineWork` (each yields a typed, callable, queueable builder), pass them
14
+ to `createWork` as a `const` registry, and `enqueue` is fully checked — by instance
15
+ (`enqueue(add({ a, b }))`) or by name (`enqueue('add', { a, b })`). A handler **returns a
16
+ `WorkResult`** (`ctx.result` / `ctx.queue` / `ctx.void` / `.next`) describing what it
17
+ produced, so each work carries **two** inferred types — its *awaited-alone* result and its
18
+ *group* contribution — and `enqueue(root).group()` resolves to a **precise union from the
19
+ workflow structure**, not the whole registry. Reach for it for durable background jobs,
20
+ fan-out/fan-in workflows, retries, scheduling, or cross-process coordination — type-checked
21
+ end to end.
22
+
23
+ It runs **zero-config**: an in-memory implementation of its three ports (`Queue` /
24
+ `PubSub` / `Store`) is bundled, so `createWork()` works with no setup. The same engine
25
+ scales out by swapping those ports for Redis/SQS/etc. — no engine changes.
26
+
27
+ ```sh
28
+ pnpm add @ayepi/work
29
+ ```
30
+
31
+ ```ts
32
+ import { defineWork, createWork } from '@ayepi/work'
33
+
34
+ const add = defineWork('add', (i: { a: number; b: number }, ctx) => ctx.result(i.a + i.b))
35
+ const w = createWork({ work: [add] as const })
36
+
37
+ const sum = await w.enqueue(add({ a: 1, b: 2 })).result() // 3, typed as number
38
+ await w.stop()
39
+ ```
40
+
41
+ Bare `import` has **no side effects** — the default instance does not auto-start.
42
+
43
+ ## This doc set
44
+
45
+ This reference is split by topic. Start here, then jump to the relevant page:
46
+
47
+ - **`ayepi-work.md`** (this file) — overview, the **`WorkResult` handler contract**
48
+ (`ctx.result` / `ctx.queue` / `ctx.void` / `.next`), `defineWork` / `defineBatchWork` /
49
+ `createWork`, the typed `enqueue` overloads, `WorkHandle`, `WorkContext`, instance
50
+ options, retries, deadlines, the id generator, doers, the default instance, tunable
51
+ defaults, plus abbreviated ["How it works under the hood"](#how-it-works-under-the-hood)
52
+ and ["Gotchas"](#gotchas--constraints) sections.
53
+ - **[`ayepi-work-deps-schedule.md`](./ayepi-work-deps-schedule.md)** — fan-in
54
+ dependencies (`dependency` / `DependencyCondition` / `conditionMet`, and the native
55
+ `.next` chain) and scheduling (`schedule` / `parseCron` / `nextAfter` / cron + fn forms).
56
+ - **[`ayepi-work-ports.md`](./ayepi-work-ports.md)** — the three ports
57
+ (`Queue` / `PubSub` / `Store`), custom backends, the bundled in-memory backend
58
+ (`memoryQueue` / `memoryPubSub` / `memoryStore` / `memoryBackend`), the JSON codec
59
+ (`defaultCodec` / `JsonCodec`), and the full engine-mechanics deep dive.
60
+
61
+ All durations throughout the package are **milliseconds**.
62
+
63
+ ---
64
+
65
+ ## The handler contract — returning a `WorkResult`
66
+
67
+ Every handler **returns a `WorkResult`** describing what it produced. A `WorkResult` is a
68
+ lazy instruction built by the context and carried out **after** the handler returns. There
69
+ are three constructors plus a chaining method:
70
+
71
+ ```ts
72
+ ctx.result(value, opts?) // contribute a value (this item's .result() AND the group)
73
+ ctx.queue(items, opts?) // run sub-work in the same group; this item DELEGATES (.result() = void)
74
+ ctx.void() // contribute nothing
75
+ result.next(works, cond?, opts?) // native dependency: queue `works` once prior items satisfy `cond`
76
+ ```
77
+
78
+ Each work then carries **two** inferred types — its *awaited-alone* result `S` (what
79
+ `.result()` resolves to) and its *group* contribution `G` (what awaiting the handle /
80
+ `.group()` resolves to). Because `G` is built from the structure the handler returns,
81
+ `enqueue(root).group()` is a **precise union of the workflow's parts**, not the
82
+ registry-wide union.
83
+
84
+ ```ts
85
+ // leaf: S = G = number
86
+ const add = defineWork('add', (i: { a: number; b: number }, ctx) => ctx.result(i.a + i.b))
87
+
88
+ // delegating root: S = void, G = number | string (the union of what it queues)
89
+ const fetch = defineWork('fetch', (i: { id: string }, ctx) => ctx.result(load(i.id))) // string
90
+ const flow = defineWork('flow', (i: { ids: string[] }, ctx) =>
91
+ ctx.queue(i.ids.map((id) => fetch({ id }))) // .result() is void; .group() is string
92
+ .next([add({ a: 1, b: 2 })], 'all-success'), // .next widens the group by number
93
+ )
94
+ ```
95
+
96
+ - **`ctx.result(value, opts?)`** ⇒ `WorkResult<value, value>`. `opts`: `{ final }` locks the
97
+ group result (later contributors can't overwrite it); `{ append: (existing) => next }`
98
+ folds this value into the existing group result instead of overwriting.
99
+ - **`ctx.queue(items, opts?)`** ⇒ `WorkResult<void, GroupOf<items>>`. `items` is a `Work`, a
100
+ `WorkResult`, or a tuple/array of them (nesting allowed). The works join **this item's
101
+ group**; the item itself **delegates**, so its own `.result()` is `void`. `opts` is the
102
+ same `WorkInstanceOptions` as `enqueue` (`delay`, `priority`, `group`, …).
103
+ - **`ctx.void()`** ⇒ `WorkResult<void, void>`. Contributes nothing (and `void` is dropped
104
+ from a group union).
105
+ - **`.next(works, condition?, opts?)`** — a **native dependency**: queue `works` once the
106
+ works the prior result queued satisfy `condition` (default `'all-success'`; see
107
+ [deps & scheduling](./ayepi-work-deps-schedule.md)). It widens the group type by `works`'
108
+ contribution and is the ergonomic form of enqueuing a `dependency(...)` by hand.
109
+
110
+ **Strict-return.** A `WorkResult` that is **created but not returned** throws — it would
111
+ otherwise be invisible to the group type and silently never run. Opt out per type
112
+ (`{ strictReturn: false }`) or system-wide (`createWork({ strictReturn: false })`); with it
113
+ off, a detached `ctx.queue(...)` simply doesn't execute.
114
+
115
+ **Group value (runtime).** The group's resolved value comes from the **last contributor to finish**
116
+ (last-writer-wins). `ctx.result(v, { final: true })` locks it; `{ append }` accumulates
117
+ (read-modify-write — best-effort under concurrency, exact when the contributors are
118
+ serialized). `ctx.void()` and delegating `ctx.queue(...)` contribute no value of their own.
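The group-value rules above can be modeled as a small fold over contributions in finish order. This is a minimal sketch, not the engine's code: `Contribution`, `GroupState`, `contribute`, and `resolveGroup` are hypothetical names, and it assumes a locked (`final`) value simply ignores later contributors.

```typescript
// Hypothetical model of the group-value rules; not the engine's implementation.
type Contribution<R> = {
  value: R;
  final?: boolean;
  append?: (existing: R | undefined) => R;
};

type GroupState<R> = { value: R | undefined; locked: boolean };

function contribute<R>(state: GroupState<R>, c: Contribution<R>): GroupState<R> {
  // Assumption: once a contributor used { final: true }, later values are ignored.
  if (state.locked) return state;
  // `append` folds into the existing value; otherwise the newest value overwrites it.
  const value = c.append ? c.append(state.value) : c.value;
  return { value, locked: c.final === true };
}

// Apply contributions in the order the contributors finished.
function resolveGroup<R>(inFinishOrder: Contribution<R>[]): R | undefined {
  let state: GroupState<R> = { value: undefined, locked: false };
  for (const c of inFinishOrder) state = contribute(state, c);
  return state.value;
}

const lastWins = resolveGroup([{ value: 1 }, { value: 2 }]); // 2
const locked = resolveGroup([{ value: 1, final: true }, { value: 2 }]); // 1
const add = (n: number) => ({
  value: n,
  append: (existing: number | undefined) => (existing ?? 0) + n,
});
const summed = resolveGroup([add(1), add(2), add(3)]); // 6
```

Because the fold is sequential, `append` is exact here; under real concurrency the read-modify-write caveat above still applies.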
119
+
120
+ ## Defining work — `defineWork`
121
+
122
+ ```ts
123
+ function defineWork<Name extends string, I, S, G>(
124
+ name: Name,
125
+ handler: WorkHandler<I, S, G>, // (input: I, ctx: WorkContext) => WorkResult<S, G> | Promise<…>
126
+ opts?: WorkOptions<I>, // default {}
127
+ ): WorkBuilder<Name, I, S, NonVoidUnion<G>>
128
+ ```
129
+
130
+ Returns a **callable builder** typed by its input `I` and the `WorkResult` the handler
131
+ returns (`S` = awaited-alone result, `G` = group contribution, with `void` dropped). Call
132
+ it with the work's exact input to mint a queueable `Work<Name, S, G>` with a fresh
133
+ build-time `id`:
134
+
135
+ ```ts
136
+ const add = defineWork('add', (i: { a: number; b: number }, ctx) => ctx.result(i.a + i.b))
137
+ const a = add({ a: 1, b: 2 }) // a: Work<'add', number, number>, a.id assigned now
138
+ ```
139
+
140
+ A `WorkBuilder` also exposes `.type` (the name) and `.def` (the underlying
141
+ `WorkDefinition`). The id is assigned at **build time**, so you can reference a work
142
+ instance before queueing it — e.g. to depend on it (see
143
+ [deps & scheduling](./ayepi-work-deps-schedule.md)). Override how build-time ids are minted
144
+ with `setIdGenerator` (see [Custom id generation](#custom-id-generation)).
145
+
146
+ ### `WorkHandler` and `WorkContext`
147
+
148
+ ```ts
149
+ type WorkHandler<I, S, G> = (input: I, ctx: WorkContext) => WorkResult<S, G> | Promise<WorkResult<S, G>>
150
+
151
+ interface WorkContext {
152
+ readonly id: string // this item's id
153
+ readonly groupId: string // group shared by this item and everything it queues
154
+ readonly attempt: number // delivery attempt (1 = first try)
155
+ readonly parent?: string // id of the work that queued this one (undefined at top level)
156
+ readonly dependents?: readonly string[] // when queued by a fired dependency, the ids it depended on
157
+ result<R>(value: R, opts?: { final?: boolean; append?: (existing: R | undefined) => R }): WorkResult<R, R>
158
+ queue<const Is>(items: Is, opts?: WorkInstanceOptions): WorkResult<void, GroupOf<Is>>
159
+ void(): WorkResult<void, void>
160
+ states(ids: readonly string[]): Promise<(WorkState | undefined)[]>
161
+ claim(key: string): Promise<boolean>
162
+ }
163
+ ```
164
+
165
+ - `ctx.result(value, opts?)` / `ctx.queue(items, opts?)` / `ctx.void()` build the
166
+ `WorkResult` the handler returns — see [the handler contract](#the-handler-contract--returning-a-workresult) above.
167
+ - `ctx.parent` is the id of the work whose handler queued this one (via `ctx.queue` or
168
+ `.next`); it's `undefined` for a top-level `enqueue`. `ctx.dependents` is the ids a
169
+ fired dependency was waiting `on` when it queued this work. Both are also exposed on the
170
+ item-scoped `WorkEvent`s.
171
+ - `ctx.states(ids)` reads other items' `WorkState` (used by the dependency type).
172
+ - `ctx.claim(key)` wins a one-time distributed claim — returns `true` exactly once across
173
+ the fleet (built on `Store.setIfNotExists`).
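The one-time claim can be pictured as a set-if-absent write. The sketch below is an illustration under assumptions: `TinyStore` is a hypothetical synchronous stand-in (the real `Store` port is likely asynchronous), and the `claim:` key prefix is invented for the example.

```typescript
// Hypothetical stand-in for a Store: only the one operation a claim needs.
class TinyStore {
  private readonly data = new Map<string, string>();

  // Writes only when the key is absent and reports whether this call did the write.
  setIfNotExists(key: string, value: string): boolean {
    if (this.data.has(key)) return false;
    this.data.set(key, value);
    return true;
  }
}

// A one-time claim: the first caller for a key wins, every later caller loses.
function claim(store: TinyStore, key: string, owner: string): boolean {
  return store.setIfNotExists(`claim:${key}`, owner);
}

const store = new TinyStore();
const first = claim(store, "send-welcome:user-1", "worker-a"); // true
const second = claim(store, "send-welcome:user-1", "worker-b"); // false
const other = claim(store, "send-welcome:user-2", "worker-b"); // true
```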
174
+
175
+ A handler shapes the group value by **returning** `ctx.result(value, { final?, append? })`.
176
+
177
+ ### `WorkOptions` — per-type config
178
+
179
+ Every field is optional:
180
+
181
+ ```ts
182
+ interface WorkOptions<I> {
183
+ readonly retry?: RetryOptions // default retry policy for this type
184
+ readonly priority?: number // default scheduling priority
185
+ readonly group?: string // default fairness group
186
+ readonly doer?: Doer // dedicated doer (else the system's doer) — caps this type's concurrency
187
+ readonly queue?: Queue // dedicated queue (else the system's queue) — isolates this type's load
188
+ readonly options?: (input: I) => WorkInstanceOptions // compute per-instance options from input
189
+ readonly codec?: JsonCodec // per-type codec (else the global codec)
190
+ readonly onEvent?: (event: WorkEvent) => void // per-type lifecycle hook
191
+ readonly onFailure?: FailureClassifier // classify a failure → abort / re-queue / retry (see "Classifying a failure")
192
+ readonly logContext?: (input: I) => object // derive logWith context from input
193
+ readonly timeout?: number // default relative deadline (ms from enqueue)
194
+ readonly strictReturn?: boolean // require WorkResults to be returned (default true)
195
+ readonly skipQueue?: boolean // run the first attempt in-process
196
+ }
197
+ ```
198
+
199
+ ```ts
200
+ const send = defineWork('send', handler, {
201
+ retry: { attempts: 5, base: 1000 },
202
+ options: (i: { to: string }) => ({ group: i.to, priority: 0 }), // computed per instance
203
+ onEvent: (e) => log(e.kind),
204
+ })
205
+ ```
206
+
207
+ ## Defining batched work — `defineBatchWork`
208
+
209
+ When per-item work is wasteful but a bulk call is cheap (embeddings, bulk inserts), define
210
+ the type with `defineBatchWork`. Items still enqueue, retry, prioritize, and join groups
211
+ individually, but **execute together** once `size` accumulate or `maxWait` ms elapse. Each
212
+ `.result()` resolves to its **index-aligned** output.
213
+
214
+ ```ts
215
+ function defineBatchWork<Name extends string, I, O>(
216
+ name: Name,
217
+ config: BatchConfig<I, O> & WorkOptions<I>,
218
+ ): WorkBuilder<Name, I, O, NonVoidUnion<O>> // each item's S = G = O
219
+
220
+ interface BatchConfig<I, O> {
221
+ readonly size: number // flush when this many items are buffered
222
+ readonly maxWait: number // flush a partial batch this long after the first item (ms)
223
+ readonly run: (inputs: I[]) => O[] | Promise<O[]> // one output per input, same order
224
+ }
225
+ ```
226
+
227
+ ```ts
228
+ import { defineBatchWork, createWork, priorityDoer } from '@ayepi/work'
229
+
230
+ const embed = defineBatchWork('embed', {
231
+ size: 50,
232
+ maxWait: 100,
233
+ run: (inputs: { text: string }[]) => embedAll(inputs.map((i) => i.text)), // number[][], aligned
234
+ doer: priorityDoer({ max: 2 }), // the type's doer governs how many *batches* run at once
235
+ })
236
+
237
+ const w = createWork({ work: [embed] as const })
238
+ const vec = await w.enqueue(embed({ text: 'hello' })).result() // its own embedding
239
+ ```
240
+
241
+ Notes:
242
+ - A batch handler gets **no per-item `ctx`** — it's for leaf work.
243
+ - If `run` throws, every item in the batch follows its **own** retry policy (re-enqueued,
244
+ possibly landing in a different batch next time).
245
+ - `run` **must** return an array of exactly `inputs.length` outputs, or the batch fails
246
+ with `batch "<type>" returned N outputs for M inputs`.
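The index-alignment rule in the last note can be sketched as a length check followed by a zip. `alignOutputs` is a hypothetical helper written for illustration; only the error message format is taken from the note above.

```typescript
// Hypothetical helper: pair each input with its index-aligned output, or fail the batch.
function alignOutputs<I, O>(type: string, inputs: I[], outputs: O[]): Array<[I, O]> {
  if (outputs.length !== inputs.length) {
    throw new Error(
      `batch "${type}" returned ${outputs.length} outputs for ${inputs.length} inputs`,
    );
  }
  // Lengths match, so outputs[i] exists for every input index.
  return inputs.map((input, i): [I, O] => [input, outputs[i] as O]);
}

const pairs = alignOutputs("embed", ["a", "bb"], [1, 2]);

// A short output array fails the whole batch with the documented message.
let message = "";
try {
  alignOutputs("embed", ["a", "bb"], [1]);
} catch (e) {
  message = (e as Error).message;
}
```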
247
+
248
+ ## Creating a system — `createWork`
249
+
250
+ ```ts
251
+ function createWork<const Defs extends readonly AnyWorkBuilder[]>(
252
+ opts?: WorkSystemOptions & { work?: Defs },
253
+ ): WorkSystem<Defs>
254
+ ```
255
+
256
+ Pass `work: [...] as const` for a typed registry. Zero-config (`createWork()`) uses the
257
+ bundled in-memory backend and an `unlimitedDoer`.
258
+
259
+ ### `WorkSystemOptions` and their defaults
260
+
261
+ | Option | Type | Default |
262
+ |---|---|---|
263
+ | `queue` | `Queue` | bundled `memoryQueue` |
264
+ | `pubsub` | `PubSub` | bundled `memoryPubSub` |
265
+ | `store` | `Store` | bundled `memoryStore` |
266
+ | `retry` | `RetryOptions` | `@ayepi/core` defaults (`attempts:3, base:1000, factor:2, max:30000, jitter:0.5`) |
267
+ | `doer` | `Doer` | `unlimitedDoer()` |
268
+ | `pollInterval` | `number` (ms) | `1000` |
269
+ | `backpressure` | `(ctx: BackpressureContext) => MaybePromise<number \| void>` | — (always proceed) |
270
+ | `visibility` | `number` (ms) | `30000` |
271
+ | `heartbeat` | `number` (ms) | `Math.floor(visibility / 3)` |
272
+ | `prefix` | `string` | `'work:'` |
273
+ | `codec` | `JsonCodec` | `defaultCodec` |
274
+ | `logWith` | `LogWith` | identity (no-op wrapper) |
275
+ | `logContext` | `(input, type) => object` | — |
276
+ | `onEvent` | `(event: WorkEvent) => void` | — |
277
+ | `onError` | `(err, phase: 'commit' \| 'queue') => void` | — |
278
+ | `onFailure` | `FailureClassifier` (default; per-type overrides) | — (retry) |
279
+ | `dlq` | `Queue` (readable — redrive source when idle) | — (off) |
280
+ | `redriveCount` | `number` (max moved per idle poll) | `10` |
281
+ | `metrics` | `Metrics` (`@ayepi/core` registry; bring one for quantiles) | fresh `createMetrics()` |
282
+ | `accept` | `(info: WorkAcceptInfo) => boolean` | — (accept all) |
283
+ | `unhandledWorkGroup` | `(info: UnhandledWorkGroupInfo) => void` | — |
284
+ | `strictReturn` | `boolean` (require handlers to return every `WorkResult` they create) | `true` |
285
+ | `generateId` | `() => string` (ids the **engine** mints — group/name-form/dependency/re-push) | process generator (`setIdGenerator`) |
286
+ | `autoStart` | `boolean` | `true` |
287
+ | `now` | `() => number` | `Date.now` |
288
+ | `random` | `() => number` | `Math.random` |
289
+
290
+ > The three ports are all-or-nothing: provide all three to go fully custom, or none for
291
+ > zero-config. Providing one or two means the rest fall back to a *fresh, separate*
292
+ > in-memory backend — usually not what you want.
293
+
294
+ ### `WorkSystem` — the returned API
295
+
296
+ ```ts
297
+ interface WorkSystem<Defs extends readonly AnyWorkBuilder[]> {
298
+ // instance form — await ⇒ the root's group contribution (structural), .result() ⇒ its own output
299
+ enqueue<W extends Work>(work: W, options?: WorkInstanceOptions): WorkHandle<SelfOfWork<W>, GroupOfWork<W>>
300
+ // name form — name ∈ registry, input typed
301
+ enqueue<K extends RegistryNames<Defs>>(name: K, input: InputForName<Defs, K>, options?: WorkInstanceOptions): WorkHandle<SelfForName<Defs, K>, GroupForName<Defs, K>>
302
+ schedule(config: ScheduleConfig): () => void // returns a cancel fn; see deps-schedule doc
303
+ start(): void // start worker + scheduler loops (idempotent)
304
+ stop(): Promise<void> // stop loops, flush in-flight (idempotent)
305
+ list(): Promise<WorkState[]> // snapshot of known states (best-effort)
306
+ active(): ActiveWork[] // work this instance polled + accepted
307
+ stats(): StatValue[] // flat per-type metric snapshot (see below)
308
+ readonly metrics: Metrics // the live metrics registry (list/get/subscribe)
309
+ readonly backend: Backend // the underlying ports
310
+ }
311
+ ```
312
+
313
+ ## Enqueueing & handles
314
+
315
+ `enqueue` returns a `WorkHandle`. **Awaiting it resolves to the group result**; use
316
+ `.result()` for this item's own output and `.group()` for the explicit group form.
317
+
318
+ ```ts
319
+ interface WorkHandle<Self, Group> extends PromiseLike<Group> {
320
+ readonly id: string
321
+ readonly groupId: string
322
+ result(): Promise<Self> // this item's own output
323
+ group(): Promise<Group> // the group's final result (same as awaiting the handle)
324
+ }
325
+ ```
326
+
327
+ The two `enqueue` overloads are equivalent at runtime:
328
+
329
+ ```ts
330
+ w.enqueue(add({ a: 1, b: 2 })) // instance form
331
+ w.enqueue('add', { a: 1, b: 2 }) // name form (name ∈ registry, input typed)
332
+ add({ a: 1 }) // ✗ type error: missing `b`
333
+ w.enqueue('nope', {}) // ✗ type error: unknown work name
334
+ ```
335
+
336
+ ### Example: define + enqueue + await the group result
337
+
338
+ ```ts
339
+ import { defineWork, createWork } from '@ayepi/work'
340
+
341
+ const add = defineWork('add', (i: { a: number; b: number }, ctx) => ctx.result(i.a + i.b))
342
+ const w = createWork({ work: [add] as const })
343
+
344
+ const group = await w.enqueue(add({ a: 1, b: 2 })) // group contribution (here: number)
345
+ const own = await w.enqueue(add({ a: 1, b: 2 })).result() // number — this item's output
346
+ await w.stop()
347
+ ```
348
+
349
+ ### Example: fanning out with `ctx.queue` (and shaping the group value)
350
+
351
+ A handler **returns** `ctx.queue(children)` to fan out into the same group; awaiting the
352
+ parent's handle waits for **all** of them, and the group value is the **last child to
353
+ finish**. The parent delegates, so its own `.result()` is `void`:
354
+
355
+ ```ts
356
+ const child = defineWork('child', (i: { n: number }, ctx) => ctx.result(i.n * 2))
357
+ const parent = defineWork('parent', (i: { ids: string[] }, ctx) =>
358
+ ctx.queue(i.ids.map((id) => child({ n: id.length }))), // each joins the same group
359
+ )
360
+
361
+ const w = createWork({ work: [child, parent] as const })
362
+ const group = await w.enqueue(parent({ ids: ['a', 'b'] })) // resolves after both children settle
363
+ // group is the last child's output (here: 2); typed number (child's contribution)
364
+ await w.enqueue(parent({ ids: ['a'] })).result() // undefined — the parent delegates
365
+ ```
366
+
367
+ To make the parent contribute its **own** value instead of delegating, return
368
+ `ctx.result(...)` (optionally `{ final: true }` so children can't overwrite it) and queue
369
+ the children via `.next` or a nested result. Use `{ append }` to accumulate across
370
+ contributors:
371
+
372
+ ```ts
373
+ const sum = defineWork('sum', (i: { n: number }, ctx) =>
374
+ ctx.result(i.n, { append: (existing) => (existing ?? 0) + i.n }), // fold into the group value
375
+ )
376
+ ```
377
+
378
+ ## Instance options — `WorkInstanceOptions`
379
+
380
+ `delay`, `runAt`, `retry`, `priority`, `group`, and `skipQueue` are **per-instance** —
381
+ provided at queue time, set as per-type constants, or computed from the input — and are
382
+ **serialized with the item**, so the worker that runs it applies the same policy.
383
+
384
+ ```ts
385
+ interface WorkInstanceOptions {
386
+ readonly delay?: number // sets startAt = queueAt + delay
387
+ readonly runAt?: number // absolute start (epoch ms) — alternative to delay, wins over it
388
+ readonly retry?: RetryOptions // retry policy override for this item
389
+ readonly priority?: number // higher runs first (consumed by the doer)
390
+ readonly group?: string // fairness group label (consumed by balancedDoer)
391
+ readonly deadline?: number // epoch ms by which it must start+finish, else terminal (no retry)
392
+ readonly timeout?: number // relative deadline (ms from enqueue) — deadline = queueAt + timeout
393
+ readonly skipQueue?: boolean // run the first attempt in-process (no queue hop)
394
+ }
395
+ ```
396
+
397
+ ```ts
398
+ w.enqueue(sendEmail({ to }), { delay: 5_000, priority: 10, group: to })
399
+ w.enqueue(report({}), { runAt: Date.parse('2030-01-01T03:00:00Z') }) // far-future scheduled
400
+ ```
401
+
402
+ `runAt` is an **absolute** schedule (epoch ms): `startAt = runAt` and `delay = runAt - now`,
403
+ so `runAt` wins over `delay` when both are given. It works for **arbitrarily far** times even
404
+ on backends that cap a single delay (e.g. SQS's 15-min `DelaySeconds`): the engine re-defers
405
+ an item that arrives early until its `startAt`. See
406
+ [Deferral & scheduling](#deferral--scheduling) below and the
407
+ [ports doc](./ayepi-work-ports.md#early-arrival-re-defer-far-future-scheduling).
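As a rough model of the rules above, the sketch below resolves `startAt` from `delay` / `runAt` and then counts how many capped deferrals a far-future item would need. All three function names are hypothetical, and the real re-defer logic may differ in detail.

```typescript
// Hypothetical helpers modeling the rules above; names are illustrative only.
interface When {
  delay?: number;
  runAt?: number;
}

// `runAt` wins over `delay`; with neither, the item is due at queue time.
function resolveStartAt(when: When, queueAt: number): number {
  if (when.runAt !== undefined) return when.runAt;
  return queueAt + (when.delay ?? 0);
}

// On a backend that caps a single delay, defer by at most `cap` per hop.
function nextDelay(startAt: number, now: number, cap: number): number {
  return Math.min(Math.max(startAt - now, 0), cap);
}

// Count the deferrals needed before the item is actually due (cap must be > 0).
function hopsUntilDue(startAt: number, queueAt: number, cap: number): number {
  let now = queueAt;
  let hops = 0;
  while (now < startAt) {
    now += nextDelay(startAt, now, cap);
    hops += 1;
  }
  return hops;
}

const cap = 15 * 60_000; // a 15-minute cap, as in the SQS example above
const both = resolveStartAt({ delay: 5_000, runAt: 10_000 }, 1_000); // 10000
const onlyDelay = resolveStartAt({ delay: 5_000 }, 1_000); // 6000
const hops = hopsUntilDue(40 * 60_000, 0, cap); // 3 (15 + 15 + 10 minutes)
```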
408
+
409
+ **Resolution precedence** (highest first), per the engine's `resolveOptions`:
410
+ `queue-time options` > `type options(input)` > `type constants` > defaults. For `retry`
411
+ the chain is fully merged field-by-field (lowest precedence first, so the last layer wins):
412
+ `getDefaultRetryOptions()` < system `retry` < type `retry` < computed `retry` < queue-time `retry`.
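A field-by-field merge means a later layer overrides only the fields it actually sets. Here is a minimal sketch of that idea; `mergeRetry` is a hypothetical helper and the `RetryOptions` shape is redeclared locally so the snippet stands alone.

```typescript
// Local copy of the documented shape so the sketch is self-contained.
interface RetryOptions {
  attempts?: number;
  base?: number;
  factor?: number;
  max?: number;
  jitter?: number;
}

// Layers are listed lowest precedence first; a later layer overrides only the fields it sets.
function mergeRetry(...layers: Array<RetryOptions | undefined>): RetryOptions {
  const merged: RetryOptions = {};
  for (const layer of layers) {
    if (!layer) continue;
    for (const key of Object.keys(layer) as Array<keyof RetryOptions>) {
      const value = layer[key];
      if (value !== undefined) merged[key] = value;
    }
  }
  return merged;
}

const merged = mergeRetry(
  { attempts: 3, base: 1000, factor: 2, max: 30000, jitter: 0.5 }, // documented defaults
  { attempts: 5 }, // system `retry`
  { base: 250 }, // type `retry`
  undefined, // no computed `retry` from options(input)
  { attempts: 2, jitter: 0 }, // queue-time `retry`
);
// merged: { attempts: 2, base: 250, factor: 2, max: 30000, jitter: 0 }
```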
413
+
414
+ ### Retries
415
+
416
+ `retry` is `@ayepi/core`'s `RetryOptions`:
417
+
418
+ ```ts
419
+ interface RetryOptions {
420
+ attempts?: number // total attempts incl. the first (default 3)
421
+ base?: number // first-retry delay ms (default 1000)
422
+ factor?: number // multiplier per attempt (default 2)
423
+ max?: number // delay cap ms (default 30000)
424
+ jitter?: number // jitter fraction [0,1] (default 0.5)
425
+ }
426
+ ```
427
+
428
+ A retry **re-enters the queue** as a fresh delivery (`attempt + 1`) after a backoff delay;
429
+ on exhaustion the item is dead-lettered. Backoff per retry `attempt` (1 = first retry) is
430
+ `min(base · factor^(attempt-1), max) · (1 − jitter · random())`.
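The formula above translates directly into a pure function. This is a sketch, not the package's `backoff` export: `backoffDelay` and `BackoffOptions` are hypothetical names, and `random` is injectable only so the result can be made deterministic.

```typescript
interface BackoffOptions {
  base: number;
  factor: number;
  max: number;
  jitter: number;
}

// retryAttempt is 1 for the first retry, matching the formula above.
function backoffDelay(
  retryAttempt: number,
  o: BackoffOptions,
  random: () => number = Math.random,
): number {
  const raw = Math.min(o.base * Math.pow(o.factor, retryAttempt - 1), o.max);
  // Jitter only ever shortens the delay: the multiplier lies in (1 - jitter, 1].
  return raw * (1 - o.jitter * random());
}

const opts = { base: 1000, factor: 2, max: 30000, jitter: 0.5 };
const noJitter = { ...opts, jitter: 0 };

const firstRetry = backoffDelay(1, noJitter); // 1000
const thirdRetry = backoffDelay(3, noJitter); // 4000
const cappedRetry = backoffDelay(10, noJitter); // 30000 (1000 * 2^9 exceeds max)
const fullJitter = backoffDelay(1, opts, () => 1); // 500
```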
431
+
432
+ ```ts
433
+ const flaky = defineWork('flaky', handler, {
434
+ retry: { attempts: 5, base: 1000, factor: 2, jitter: 0.5 },
435
+ })
436
+
437
+ // or per-instance at queue time:
438
+ await w.enqueue(flaky({}), { retry: { attempts: 2, base: 2, jitter: 0 } }).result()
439
+ ```
440
+
441
+ Set fleet-wide defaults with `setDefaultRetryOptions` (re-exported here from
442
+ `@ayepi/core/retry`, alongside `retry`, `backoff`, `getDefaultRetryOptions`,
443
+ `DEFAULT_RETRY_OPTIONS`).
444
+
445
+ ### `skipQueue`
446
+
447
+ `skipQueue` runs the **first attempt in-process** (no queue hop, lease, or heartbeat) for
448
+ low latency; state/results/group bookkeeping still go through the store. A **failure
449
+ re-enqueues durably** (`attempt + 1`), so the retry survives a crash and any instance can
450
+ pick it up. The first run itself is best-effort — the latency-for-durability trade.
451
+
452
+ ```ts
453
+ const h = w.enqueue(echo({ v: 'hi' }), { skipQueue: true })
454
+ await h.result() // resolves without a queue round-trip on the happy path
455
+ ```
456
+
457
+ ## Doers — concurrency, ordering, rate limiting
458
+
459
+ A **doer** (`@ayepi/core/doer`, re-exported here) decides how many items to pull and which
460
+ to run next. Set one globally (`createWork({ doer })`) or per type
461
+ (`defineWork(..., { doer })`):
462
+
463
+ - `unlimitedDoer()` — run everything, no concurrency cap.
464
+ - `balancedDoer({ max })` — cap N; share slots fairly across `group`s, then priority, then age.
465
+ - `priorityDoer({ max })` — cap N; highest priority first, then age.
466
+ - `ageDoer({ max })` — cap N; oldest first.
467
+ - `rateLimitedDoer({ limit, window, ... })` — from `@ayepi/rate` (not re-exported here);
468
+ caps the **start rate**.
469
+
470
+ ```ts
471
+ import { balancedDoer } from '@ayepi/work' // re-exported from @ayepi/core/doer
472
+
473
+ createWork({ work: [/* ... */] as const, doer: balancedDoer({ max: 20 }) })
474
+ ```
475
+
476
+ Re-exported doer types: `Doer`, `DoerTaskOptions`, `BoundedDoerOptions`,
477
+ `UnlimitedDoerOptions`.
478
+
479
+ ## Load-sharing / fairness — per-type `queue`
480
+
481
+ By default every type shares the system's one `Queue`, so a type that floods the queue can
482
+ starve the others behind it. Give a type its **own** `Queue` (`WorkOptions.queue`) to isolate
483
+ its load: the worker loop polls **every distinct queue each tick** (a fair `ceil(n / queues)`
484
+ share apiece, round-robin), so a flood on one queue can't starve types on another. Several
485
+ types can share one `Queue` instance — group them to draw the isolation boundary where you
486
+ want it.
487
+
488
+ ```ts
489
+ import { defineWork, createWork, memoryQueue, balancedDoer } from '@ayepi/work'
490
+
491
+ const bulkQ = memoryQueue() // a separate queue for the noisy type
492
+
493
+ const ingest = defineWork('ingest', handler, { queue: bulkQ }) // floods stay on bulkQ
494
+ const checkout = defineWork('checkout', handler) // on the default queue, unaffected
495
+ const w = createWork({ work: [ingest, checkout] as const })
496
+ ```
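To see why the `ceil(n / queues)` share prevents starvation, here is a toy model of a single poll round. It assumes `n` is the number of items the loop wants in that tick, and it models queues as plain arrays; `fairShare` and `pollRound` are hypothetical names, not engine functions.

```typescript
// Assumption: `n` is how many items the loop wants to pull this tick.
function fairShare(n: number, queueCount: number): number {
  return queueCount > 0 ? Math.ceil(n / queueCount) : 0;
}

// Hypothetical in-memory queues: each is an array of item labels, visited in turn.
function pollRound(queues: string[][], n: number): string[] {
  const share = fairShare(n, queues.length);
  const pulled: string[] = [];
  for (const q of queues) pulled.push(...q.splice(0, share));
  return pulled;
}

const bulk = ["b1", "b2", "b3", "b4", "b5", "b6"]; // the flooded queue
const main = ["checkout-1"]; // the quiet queue
const pulled = pollRound([bulk, main], 4);
// pulled: ["b1", "b2", "checkout-1"]
```

Even with six items waiting on the flooded queue, the quiet queue's single item is picked up in the same round.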
497
+
498
+ Per-type `queue` **composes with** the per-type `doer`: `queue` isolates a type at the
499
+ **queue boundary** (it can't starve types on another queue), while `doer` caps how many of
500
+ that type run **at once**. Use the two together to isolate a noisy type's intake and bound its
501
+ concurrency:
502
+
503
+ ```ts
504
+ const ingest = defineWork('ingest', handler, {
505
+ queue: bulkQ, // isolate its intake from other types
506
+ doer: balancedDoer({ max: 4 }), // and cap it to 4 concurrent
507
+ })
508
+ ```
509
+
510
+ The loop doesn't busy-spin: it keeps pulling immediately only while a queue returns a **full**
511
+ share *and* it's actually starting work, and backs off (sleeps `pollInterval`) when a full
512
+ round started nothing (only over-capacity or not-yet-due work was available).
513
+
514
+ ### Dynamic backpressure — `backpressure`
515
+
516
+ A `WorkSystemOptions.backpressure` hook is checked **before every poll**. Return a number of
517
+ **milliseconds to pause** before taking any work — even when doers have free slots — or `0` /
518
+ nothing to proceed. The loop sleeps for the returned time and checks again, so the hook is re-polled until
519
+ it returns `0`. Use it to stop pulling while an external resource is saturated (a database at
520
+ capacity, a downstream API rate-limited, a memory ceiling) and let it recover before resuming:
521
+
522
+ ```ts
523
+ createWork({
524
+ ...backend,
525
+ work: [/* ... */] as const,
526
+ backpressure: async () => (await db.poolUtilization()) > 0.9 ? 2000 : 0, // pause 2s while the DB pool is hot
527
+ })
528
+ ```
529
+
530
+ It may be async. A throwing `backpressure` is reported via `onError` (`'queue'`) and the loop
531
+ backs off `pollInterval`. Prefer a modest pause (it's re-polled, and it also bounds how long
532
+ `stop()` waits for the loop to exit). `backpressure` gates the durable queue loop only;
533
+ `skipQueue` work runs in-process regardless.
534
+
535
+ The hook receives a `BackpressureContext` — the live `metrics` registry plus the in-flight
536
+ `active` count — so the pause can adapt to observed throughput (taking no arguments is still
537
+ valid, as above).
538
+
539
+ #### `adaptiveDelay()` — automatic throughput-driven backoff
540
+
541
+ For the common case of "slow down automatically when a downstream starts failing," drop in
542
+ the bundled `adaptiveDelay()` helper. It's an AIMD controller (the shape TCP uses): each poll
543
+ it samples the **delta** in `succeeded`/`failed` since the last check, and when the recent
544
+ failure rate exceeds `maxFailRate` it backs off multiplicatively; when work completes cleanly
545
+ it ramps the pause back down additively — self-healing, with no windowed state to keep.
546
+
547
+ ```ts
548
+ import { adaptiveDelay } from '@ayepi/work'
549
+
550
+ createWork({
551
+ work: [/* ... */] as const,
552
+ backpressure: adaptiveDelay({ max: 10_000 }), // pause grows toward 10s under failures, eases back to 0
553
+ })
554
+ ```
555
+
556
+ ```ts
557
+ adaptiveDelay({
558
+ types?: string[] // only watch these types (default: all)
559
+ maxFailRate?: number // failed/(succeeded+failed) per interval before backing off (default 0 — any failure)
560
+ min?: number // pause floor while healthy (default 0)
561
+ max?: number // pause ceiling (default 30000)
562
+ base?: number // first non-zero pause when backoff starts (default 100)
563
+ factor?: number // multiplier per unhealthy interval (default 2)
564
+ step?: number // amount subtracted per healthy interval (default = base)
565
+ })
566
+ ```
567
+
568
+ It's **stateful** (the current pause + last counts live in the closure), so create **one**
569
+ per work system. Pass `types` to protect a specific downstream — e.g. watch only the type that
570
+ hits a rate-limited API. Or read `ctx.metrics` directly in your own hook for a custom policy.
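For readers writing a custom hook, the controller described above could look roughly like the following. This is a sketch of the general technique (multiplicative backoff, additive recovery) under assumptions, not the source of `adaptiveDelay()`: `makeAimd` is a hypothetical name, the option defaults are copied from the listing above, and holding the pause steady when nothing finished is a guess.

```typescript
interface AimdOptions {
  maxFailRate?: number;
  min?: number;
  max?: number;
  base?: number;
  factor?: number;
  step?: number;
}

// Returns a stateful function: feed it cumulative counters, get back the pause in ms.
function makeAimd(opts: AimdOptions = {}) {
  const { maxFailRate = 0, min = 0, max = 30000, base = 100, factor = 2 } = opts;
  const step = opts.step ?? base;
  let pause = min;
  let lastSucceeded = 0;
  let lastFailed = 0;

  return (succeeded: number, failed: number): number => {
    // Work out what changed since the previous check.
    const ok = succeeded - lastSucceeded;
    const bad = failed - lastFailed;
    lastSucceeded = succeeded;
    lastFailed = failed;

    const total = ok + bad;
    if (total === 0) return pause; // assumption: nothing finished, so hold steady
    if (bad / total > maxFailRate) {
      // Multiplicative backoff: start at `base`, then multiply, capped at `max`.
      pause = Math.min(max, pause < base ? base : pause * factor);
    } else {
      // Additive recovery toward the floor.
      pause = Math.max(min, pause - step);
    }
    return pause;
  };
}

const next = makeAimd({ max: 10_000 });
const trace = [
  next(0, 1), // 100: first failure starts the backoff at `base`
  next(0, 2), // 200
  next(0, 3), // 400
  next(5, 3), // 300: a clean interval subtracts `step`
  next(5, 3), // 300: no new completions
  next(9, 3), // 200
];
```

The closure holds the current pause and the last counters, which is why one controller per work system is the safe default.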
571
+
572
+ ## Deferral & scheduling
573
+
574
+ ### `runAt` — absolute scheduling
575
+
576
+ `enqueue(work, { runAt })` schedules an item for an absolute time (epoch ms). `runAt` is an
577
+ alternative to `delay` and **wins over it**; it works for arbitrarily-far times even on
578
+ backends that cap a single delay — the engine re-defers an early arrival until its `startAt`
579
+ (see [ports](./ayepi-work-ports.md#early-arrival-re-defer-far-future-scheduling)).
580
+
581
+ ```ts
582
+ w.enqueue(report({ day }), { runAt: Date.parse('2030-01-01T03:00:00Z') })
583
+ ```
584
+
585
+ ### `WorkDelayError` — defer from a handler (reschedule, not retry)
586
+
587
+ A handler throws `WorkDelayError` to **defer** its item to a later time. This is a
588
+ **reschedule, not a retry**: the `attempt` count is **unchanged**, so a handler can defer
589
+ indefinitely (e.g. "the upstream isn't ready, check again in 5 minutes") without ever
590
+ exhausting its retries or dead-lettering.
591
+
592
+ ```ts
593
+ import { WorkDelayError } from '@ayepi/work'
594
+
595
+ const poll = defineWork('poll', async (input, ctx) => {
596
+ if (!(await upstreamReady())) throw new WorkDelayError({ delay: 5 * 60_000 }) // try again in 5 min
597
+ return ctx.result(await doWork(input)) // handlers return a WorkResult
598
+ })
599
+ ```
600
+
601
+ `WorkDelayError`'s `when` is a `WorkDelaySpec` — give it `runAt` (absolute epoch ms, wins) or
602
+ `delay` (relative ms, resolved to `now + delay`):
603
+
604
+ ```ts
605
+ class WorkDelayError extends Error {
606
+ constructor(when: { runAt?: number; delay?: number }, message?: string) // default 'work deferred'
607
+ readonly when: WorkDelaySpec
608
+ }
609
+ interface WorkDelaySpec {
610
+ readonly runAt?: number // absolute (epoch ms) — wins over delay
611
+ readonly delay?: number // relative (ms) — runAt = now + delay
612
+ }
613
+ ```
614
+
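The precedence rule above can be sketched in isolation. This is a minimal illustration, not the package's implementation: `DelaySpec` is a hypothetical local mirror of the documented `WorkDelaySpec`, `resolveRunAt` is an invented helper name, and falling back to `now` when neither field is set is an assumption.

```typescript
// Hypothetical local mirror of the documented WorkDelaySpec shape (not imported from the package).
interface DelaySpec {
  readonly runAt?: number // absolute (epoch ms)
  readonly delay?: number // relative (ms)
}

// Resolve a spec to an absolute epoch-ms time: runAt wins, otherwise now + delay.
// Falling back to `now` when neither field is set is an assumption of this sketch.
function resolveRunAt(spec: DelaySpec, now: number = Date.now()): number {
  if (spec.runAt !== undefined) return spec.runAt
  return now + (spec.delay ?? 0)
}

const now = 1_000_000
const relative = resolveRunAt({ delay: 5 * 60_000 }, now) // now + 300_000
const both = resolveRunAt({ runAt: 2_000_000, delay: 5_000 }, now) // runAt wins over delay
```

Passing `now` explicitly keeps the helper deterministic in tests.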
615
+ A deferral re-enqueues the item at the resolved time **without** advancing `attempt`, removes
616
+ the current delivery, and emits a `deferred` event (`{ kind: 'deferred'; id; type; groupId;
617
+ runAt; at }`). A **batch** handler throwing `WorkDelayError` defers **every** item in the
618
+ batch. As with `runAt`, a far-future deferral is honored even on delay-capping backends via
619
+ the engine's early-arrival re-defer.
620
+
621
+ ### Classifying a failure — abort vs. retry vs. re-queue
622
+
623
+ When a handler throws, the engine routes the failure one of three ways, so that a permanent error stops
624
+ fast and a transient one (a rate limit) comes back **without** burning the retry budget:
625
+
626
+ - **`throw new RetryAbort(cause)`** → **dead-letter now** (permanent). No more attempts, no churn;
627
+ the item goes `dead` with `cause`'s message and the awaiting `.result()` rejects. (`RetryAbort`
628
+ is `@ayepi/core`'s, re-exported here.)
629
+ - **`throw new WorkDelayError({ delay })`** → **re-queue, `attempt` unchanged** (transient). The
630
+ natural fit for a rate limit (`429` + `Retry-After`) or "upstream not ready."
631
+ - **anything else** → the normal **retry** (backoff, `attempt++`, dead-letter once exhausted).
632
+
633
+ For policy you don't want to encode at each throw site, classify centrally with
634
+ **`onFailure`** (per-type on `WorkOptions`, or a default on `WorkSystemOptions`):
635
+
636
+ ```ts
637
+ type FailureDecision = 'retry' | 'abort' | { delay: number } | { runAt: number }
638
+
639
+ defineWork('call-api', handler, {
640
+ retry: { attempts: 5 },
641
+ onFailure: (err, { attempt }) => {
642
+ const s = (err as { status?: number }).status
643
+ if (s === 429) return { delay: 30_000 } // rate-limited → come back in 30s, NOT a retry
644
+ if (s && s >= 400 && s < 500) return 'abort' // client error → permanent, dead-letter now
645
+ return 'retry' // (or return nothing) → normal backoff retry
646
+ },
647
+ })
648
+ ```
649
+
650
+ `(err, info) => 'retry' | 'abort' | { delay } | { runAt } | void` — `info` is `{ id, type,
651
+ attempt, attempts }`. `'abort'` dead-letters; `{ delay }`/`{ runAt }` reschedule without counting a
652
+ retry (emitting a `deferred` event, like `WorkDelayError`); `'retry'`/`void` is the default. A
653
+ per-type `onFailure` overrides the system one; a throwing classifier is reported and falls back to
654
+ the default. (An explicit `RetryAbort`/`WorkDelayError` throw takes precedence over the classifier.)
655
+
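As a hedged illustration of the "`429` + `Retry-After`" case, a classifier could translate the header into a `{ delay }` or `{ runAt }` decision. The error shape (`HttpLikeError` with a raw `retryAfter` string), the `classify` name, and the 30-second fallback are assumptions of this sketch; only the `FailureDecision` union comes from the text above.

```typescript
// Copied from the documented union.
type FailureDecision = 'retry' | 'abort' | { delay: number } | { runAt: number }

// Hypothetical error shape: an HTTP client error exposing the status and the raw Retry-After header.
interface HttpLikeError {
  status?: number
  retryAfter?: string
}

function classify(err: unknown, now: number = Date.now()): FailureDecision {
  const e = (err ?? {}) as HttpLikeError
  if (e.status === 429) {
    const header = e.retryAfter?.trim() ?? ''
    if (/^\d+$/.test(header)) return { delay: Number(header) * 1000 } // delta-seconds form
    const date = Date.parse(header) // HTTP-date form; NaN when absent or malformed
    if (!Number.isNaN(date) && date > now) return { runAt: date }
    return { delay: 30_000 } // assumed fallback when the header is unusable
  }
  if (e.status !== undefined && e.status >= 400 && e.status < 500) return 'abort' // other client errors
  return 'retry' // everything else takes the normal backoff retry
}
```

A function like this would presumably be passed as `onFailure: (err) => classify(err)`; how your HTTP client exposes the header is left to you.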
656
+ ### Redriving the dead-letter queue — `dlq`
657
+
658
+ Dead-lettered items are terminal — but a downstream that was down often recovers. Point
659
+ `WorkSystemOptions.dlq` at a **readable** `Queue` and, whenever the normal queue(s) are idle (a
660
+ poll round pulled nothing) and there's free capacity, the loop transfers up to `redriveCount`
661
+ bodies from it back onto their type's queue as **fresh** work — `attempt` reset to 1 (full retry
662
+ budget), `queueAt`/`startAt` = now, a fresh group hold re-opened — then acks them off the DLQ:
663
+
664
+ ```ts
665
+ createWork({
666
+ work: [...] as const,
667
+ dlq: deadLetterQueue, // a Queue you can pop() — e.g. the sink your queue's deadLetter writes to
668
+ redriveCount: 10, // max moved per idle poll (default 10; 0 disables)
669
+ })
670
+ ```
671
+
672
+ Redrive only runs when a poll round of the live queues pulled nothing, so it never competes with fresh work. Each
673
+ moved item re-enters as a normal `queued` item (counted in `stats()`), and an unparseable body is
674
+ dropped (acked) rather than looped on. Wire `dlq` to the same sink your queue's `deadLetter`
675
+ targets so recovery is automatic; leave it unset to keep dead items terminal until you redrive
676
+ them yourself.
677
+
678
+ ### `retry()`'s own `on` hook
679
+
680
+ What a `retry()` does on each error is configurable per call:
681
+ `RetryOptions.on?: (err) => MaybePromise<number | false>` returns `false` to **stop**, or a number
682
+ of **ms to wait at least** before the next attempt (a floor under the normal backoff; `0` = just
683
+ back off). Default `(err) => (err instanceof RetryAbort ? false : 0)`, so e.g.
684
+ `retry(fn, { on: (e) => (e.status === 404 ? false : e.status === 429 ? 30_000 : 0) })` stops on a
685
+ 404 and waits ≥30s on a 429 — no `RetryAbort` wrapper needed. Overriding `on` replaces the default,
686
+ so to keep retrying through a `RetryAbort` just return a number (e.g. `on: () => 0`) instead of `false`.
687
+
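One way to read "a floor under the normal backoff" is that the actual wait is the larger of the computed backoff and the hook's number. The sketch below assumes that reading; `nextWait` is an invented helper, and the hook is synchronous here although the documented one may return a promise.

```typescript
// What the hook returns per the text: `false` to stop, or a minimum wait in ms.
type OnResult = number | false

// Combine the hook's answer with the computed backoff: the hook's number acts as a floor.
function nextWait(backoffMs: number, decision: OnResult): number | false {
  if (decision === false) return false
  return Math.max(backoffMs, decision)
}

// Hypothetical hook matching the inline example: stop on 404, wait at least 30s on 429.
const on = (e: { status?: number }): OnResult =>
  e.status === 404 ? false : e.status === 429 ? 30_000 : 0
```

Under this reading, a `0` from the hook leaves the backoff untouched, and a backoff already longer than the floor is not shortened.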
688
+ ## Deadlines & timeouts
689
+
690
+ A `deadline` (absolute epoch ms) or `timeout` (relative ms from enqueue) bounds the whole
691
+ life of an item: if it hasn't **started and finished** by then, it is **not retried** — it
692
+ goes terminal and an **`'expired'`** event fires. Unlike a retry budget (which counts
693
+ attempts), a deadline is wall-clock. Set it per-instance, or per-type via `timeout`:
694
+
695
+ ```ts
696
+ w.enqueue(charge({ id }), { timeout: 30_000 }) // must finish within 30s of enqueue
697
+ w.enqueue(report({}), { deadline: Date.parse('2030-01-01') }) // absolute cutoff
698
+ const sync = defineWork('sync', handler, { timeout: 60_000 }) // per-type default
699
+ ```
700
+
701
+ `deadline` wins over `timeout` (which resolves to `queueAt + timeout`); the resolved
702
+ absolute deadline is **serialized with the item** and carried across re-pushes. It is
703
+ enforced at two points:
704
+
705
+ - **Before dispatch** — an item whose scheduled `startAt` is already past its deadline
706
+ (e.g. a long `delay`) expires **without ever running**.
707
+ - **Before a retry** — if the next backoff would land past the deadline, the failing item
708
+ expires **instead of** re-enqueueing.
709
+
710
+ On expiry the item goes terminal (status `dead`, error `'deadline exceeded'`), its group
711
+ settles, the awaiting `.result()` **rejects**, and the `'expired'` event
712
+ (`{ kind: 'expired'; id; type; groupId; deadline; parent?; dependents?; at }`) fires; a
713
+ `work.expired` counter is bumped. (The dependency type's own `timeout` is the same idea
714
+ applied to a fan-in gate — see [deps & scheduling](./ayepi-work-deps-schedule.md).)
715
+
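The resolution rule and the two enforcement points can be sketched as pure functions. This is an illustration under assumptions, not the engine's code: the helper names are invented, and whether an item landing exactly on the deadline expires (strict `>` here) is a guess.

```typescript
// Hypothetical local shape for the two documented options.
interface DeadlineOptions {
  deadline?: number // absolute epoch ms
  timeout?: number // relative ms from enqueue
}

// deadline wins; otherwise timeout resolves to queueAt + timeout; otherwise no deadline.
function resolveDeadline(opts: DeadlineOptions, queueAt: number): number | undefined {
  if (opts.deadline !== undefined) return opts.deadline
  if (opts.timeout !== undefined) return queueAt + opts.timeout
  return undefined
}

// Before dispatch: the scheduled start is already past the deadline.
function expiresBeforeDispatch(startAt: number, deadline?: number): boolean {
  return deadline !== undefined && startAt > deadline
}

// Before a retry: the next backoff would land past the deadline.
function expiresBeforeRetry(now: number, backoffMs: number, deadline?: number): boolean {
  return deadline !== undefined && now + backoffMs > deadline
}
```

For example, an item enqueued at `t = 1_000` with `timeout: 30_000` and a 60-second `delay` would have `startAt = 61_000` against a deadline of `31_000`, so in this model it expires without running.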
716
+ ## Lifecycle events & affinity
717
+
718
+ `onEvent(event)` (global, and per-type via `WorkOptions.onEvent`) fires for:
719
+
720
+ ```ts
721
+ type WorkEvent =
722
+ | { kind: 'queued'; id; type; groupId; parent?; dependents?; at }
723
+ | { kind: 'started'; id; type; groupId; attempt; parent?; dependents?; at }
724
+ | { kind: 'deferred'; id; type; groupId; runAt; at } // rescheduled (WorkDelayError, or an onFailure { delay }/{ runAt }); attempt unchanged
725
+ | { kind: 'succeeded'; id; type; groupId; attempt; result; parent?; dependents?; at }
726
+ | { kind: 'failed'; id; type; groupId; attempt; error; willRetry; parent?; dependents?; at } // willRetry:false ⇒ dead-letter
727
+ | { kind: 'expired'; id; type; groupId; deadline; parent?; dependents?; at } // past its deadline/timeout — terminal, no retry
728
+ | { kind: 'group-done'; groupId; result; at }
729
+ ```
730
+
731
+ The item-scoped events carry **`parent`** (the id of the work that queued this one) and
732
+ **`dependents`** (the ids it depended on, when queued by a fired dependency) — the same
733
+ metadata exposed on `ctx`.
734
+
735
+ Both hooks (global and per-type) are wrapped, so an `onEvent` callback that throws **never disrupts the engine**.
736
+
737
+ `onError(err, phase)` observes **non-critical** errors the engine swallows so they're not
738
+ mistaken for handler failures. `phase: 'commit'` is a failure while **recording a result the
739
+ handler already produced** (the store/ack/pub-sub after success) — it's reported and **never
740
+ retried** (retrying would duplicate the work). `phase: 'queue'` is a poll/routing error in the
741
+ worker loop — it sleeps and continues. A handler's **own** error is not routed here; it
742
+ retries/dead-letters as usual. Off by default; a throwing `onError` is itself ignored.
743
+
744
+ `accept(info: WorkAcceptInfo)` returns `false` to **decline** an item on this instance so
745
+ another picks it up — shard work types across a fleet. A declined item is re-queued
746
+ (visible again after ~`pollInterval`).
747
+
748
+ ```ts
749
+ const a = createWork({ ...backend, work: [ping, pong] as const, accept: (i) => i.type === 'ping' })
750
+ const b = createWork({ ...backend, work: [ping, pong] as const, accept: (i) => i.type === 'pong' })
751
+ ```
752
+
753
+ `unhandledWorkGroup(info)` fires **once** when a group finishes with a result but nobody
754
+ awaited it (an orphan). `info` is `{ groupId, lastResult, states }`.
755
+
756
+ ## Inspecting state
757
+
758
+ `list()` returns a best-effort snapshot of the `WorkState`s this instance knows about:
759
+
760
+ ```ts
761
+ interface WorkState {
762
+ readonly id; readonly type; readonly status // 'pending'|'running'|'success'|'failed'|'dead'
763
+ readonly attempt: number
764
+ readonly result?: unknown
765
+ readonly error?: string
766
+ readonly queueAt: number // enqueued (epoch ms)
767
+ readonly startAt: number // scheduled earliest start = queueAt + delay
768
+ readonly runAt?: number // when execution actually began
769
+ readonly endAt?: number // terminal state reached
770
+ readonly priority?: number
771
+ readonly group?: string
772
+ }
773
+ ```
774
+
775
+ `active()` returns the work this instance has **polled and accepted** (will not be
776
+ skipped), as `ActiveWork` (`status` is `'pending'` = admitted to the doer awaiting a slot,
777
+ or `'running'`).
778
+
779
+ ### `metrics` / `stats()` — per-type metrics
780
+
781
+ The engine records per-type metrics into a `Metrics` registry from `@ayepi/core` (re-exported
782
+ here). Each series is **labelled by work `type`** and fed at every lifecycle transition, so it
783
+ tracks the gaps between an item's timestamps: creation (`queueAt`) → start (`runAt`) → terminal
784
+ (`endAt`). All durations are **ms**; counters are cumulative since start.
785
+
786
+ - `w.metrics` — the live registry: `list()`, `get(name, { type })`, `subscribe(listener)`.
787
+ - `w.stats()` — convenience for `w.metrics.list()`: a flat `StatValue[]` (one per name + labels).
788
+
789
+ Metric names live on the exported `WORK_METRICS` map (so you reference series without typos):
790
+
791
+ ```
792
+ counters work.queued work.started work.succeeded work.failed
793
+ work.retried work.deferred work.rescheduled
794
+ gauges work.active work.pending work.running work.peak_active
795
+ work.last_queued_at work.last_started_at work.last_succeeded_at work.last_failed_at
796
+ summaries work.wait_time poll lag runAt − startAt
797
+ (ms unless work.total_time end-to-end endAt − queueAt
798
+ noted) work.success_time / work.error_time run duration (success / dead-letter)
799
+ work.delay_time / work.reschedule_time re-queue horizons
800
+ work.attempts (count) tries used at terminal
801
+ ```
802
+
803
+ A **summary** always carries `{ count, total, min, max, avg }`; pass a quantile-enabled registry
804
+ to also get `quantiles` (p50/p95/p99) and histogram `buckets`:
805
+
806
+ ```ts
807
+ import { createWork, createMetrics, formatPrometheus, WORK_METRICS } from '@ayepi/work'
808
+
809
+ const metrics = createMetrics({ quantiles: [0.5, 0.95, 0.99] }) // opt-in percentiles
810
+ const w = createWork({ work: [...] as const, metrics })
811
+
812
+ const s = w.metrics.get(WORK_METRICS.successTime, { type: 'sendEmail' })?.summary
813
+ s?.avg; s?.quantiles?.['0.95'] // mean / tail latency
814
+
815
+ // integrate: scrape/log on an interval, or push on change
816
+ setInterval(() => console.log(formatPrometheus(w.stats())), 15_000)
817
+ w.metrics.subscribe((changed) => pushToStatsd(changed)) // coalesced (one batch per burst)
818
+ ```
819
+
820
+ Notes: `active = pending + running`; `peak_active` is the high-water mark. `wait_time` is the
821
+ **poll lag** (`runAt − startAt` for *this* delivery), not the end-to-end wait — use `total_time`
822
+ for cradle-to-grave. A type's series appear once it's first queued or claimed. `subscribe`
823
+ batches a burst of mutations into one callback (via microtask); for pull-based exporters just
824
+ call `stats()`/`metrics.list()`. See `@ayepi/core`'s stats module for the registry API.
825
+
826
+ ## Custom id generation
827
+
828
+ Work ids are minted in two places, both overridable:
829
+
830
+ - **Build-time** — a builder assigns an id when you call it (`add({ a, b }).id`). Override
831
+ the process-wide generator with `setIdGenerator(fn)`; call `setIdGenerator()` with no
832
+ argument to reset to the default (UUID).
833
+ - **Engine-minted** — group ids, name-form item ids, dependency keys, and re-push ids. Set
834
+ `WorkSystemOptions.generateId` to override these per system (defaults to the process
835
+ generator).
836
+
837
+ ```ts
838
+ import { setIdGenerator, createWork } from '@ayepi/work'
839
+
840
+ let n = 0
841
+ setIdGenerator(() => `job-${++n}`) // build-time ids: job-1, job-2, …
842
+ const w = createWork({ work: [...] as const, generateId: () => `eng-${++n}` })
843
+ // ...
844
+ setIdGenerator() // reset to the default UUID generator
845
+ ```
846
+
847
+ Use a deterministic generator in tests, or a monotonic/prefixed scheme to make ids sortable
848
+ or traceable. Ids must be **unique** — collisions alias distinct items in the store.
849
+
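A monotonic, prefixed scheme of the kind mentioned above might look like the following. `makeIdGenerator` is a hypothetical helper, not part of the package; zero-padding is one way to make the ids sort lexicographically in creation order within a single process.

```typescript
// Hypothetical factory: returns a generator producing `<prefix>-<zero-padded counter>`.
function makeIdGenerator(prefix: string, width = 8): () => string {
  let n = 0
  return () => `${prefix}-${String(++n).padStart(width, '0')}`
}

const nextId = makeIdGenerator('job')
const first = nextId()
const second = nextId()
```

A function like `nextId` has the zero-argument shape used by `setIdGenerator` and `generateId` in the example above. A per-process counter is only unique within that process, so a multi-process deployment would need something extra in the prefix (a host or pod id, say) to keep ids unique.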
850
+ ## Default instance + top-level exports
851
+
852
+ The module exports a default registry-less system (`autoStart: false`, so a bare import
853
+ has no side effects) plus convenience bindings:
854
+
855
+ ```ts
856
+ import { work, enqueue, schedule, start, stop, list } from '@ayepi/work'
857
+ // work — the default WorkSystem (registry-less); enqueue/schedule/start/stop/list are
858
+ // bound to it. `enqueue` is instance-form only (no typed registry).
859
+ ```
860
+
861
+ The default instance does **not** auto-start. Most apps call `createWork` with their own
862
+ registry instead.
863
+
864
+ ---
865
+
866
+ ## How it works under the hood
867
+
868
+ A quick map of the moving parts. The **full mechanics deep dive** (key layouts, exact
869
+ algorithms, every tunable constant) lives in
870
+ [`ayepi-work-ports.md`](./ayepi-work-ports.md#engine-mechanics-deep-dive).
871
+
872
+ - **Delivery** — the `Queue` is a durable log: `pop` leases items under a **visibility
873
+ timeout**, the engine **heartbeats** the lease (every `visibility/3`), and a dead worker's lease
874
+ lapses so `pop` **reclaims** it (`attempt + 1`). Lease handles are token-gated.
875
+ - **Groups** — each group keeps an atomic **open-work counter** (`Store.increment`); it hits
876
+ `0` only after every descendant settles, which fires `group-done` + the orphan check.
877
+ - **Distributed wait** — `.result()`/`.group()` races a `PubSub` subscription against a
878
+ `WAIT_POLL = 250` ms store re-read, so a waiter on one pod resolves when another finishes.
879
+ - **Idempotency** — every "exactly once across the fleet" concern (dependency fire,
880
+ scheduler lease, orphan hook) is one `Store.setIfNotExists` compare-and-set.
881
+ - **Backoff** — a retrying attempt sleeps
882
+ `min(base · factor^(attempt-1), max) · (1 − jitter · random())`, then re-enqueues.
883
+
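The backoff formula in the last bullet can be written out directly. This is a sketch of the arithmetic only; the `BackoffOptions` field names simply follow the symbols in the formula and are not confirmed option names, and the injectable `random` parameter is added here for deterministic testing.

```typescript
// Field names follow the symbols in the formula; they are assumptions, not confirmed option names.
interface BackoffOptions {
  base: number // ms before the second attempt
  factor: number // exponential growth per attempt
  max: number // cap in ms, applied before jitter
  jitter: number // fraction in [0, 1] by which the wait may shrink
}

// min(base * factor^(attempt - 1), max) * (1 - jitter * random())
function backoffMs(
  attempt: number,
  o: BackoffOptions,
  random: () => number = Math.random,
): number {
  const capped = Math.min(o.base * Math.pow(o.factor, attempt - 1), o.max)
  return capped * (1 - o.jitter * random())
}

const opts: BackoffOptions = { base: 100, factor: 2, max: 1_000, jitter: 0.5 }
const third = backoffMs(3, opts, () => 0) // 100 * 2^2 = 400, no jitter applied
const cappedWait = backoffMs(10, opts, () => 0) // capped at max = 1_000
```

Note that jitter in this formula only ever shortens the wait, so `max` is a true upper bound.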
884
+ ---
885
+
886
+ ## Gotchas / constraints
887
+
888
+ - **Group-result type is structural, but still a union.** Awaiting a handle is typed
889
+ `GroupOfWork<root>` — the union the root work **structurally** contributes (its own
890
+ `ctx.result` value plus everything it `ctx.queue`s / `.next`s, transitively), **not** the
891
+ registry-wide union. The actual runtime value is the **last contributor to finish** (or a
892
+ `{ final }`/`{ append }` result) — the type can't know which member it is, so treat it as
893
+ a union and narrow, or use `.result()` for a precisely-typed single-item output. A work
894
+ that returns `ctx.queue(...)` delegates, so its `.result()` is `void`.
895
+ - **At-least-once semantics.** An item can be delivered more than once (lease expiry,
896
+ redelivery). Handlers should be **idempotent**, or guard side effects with `ctx.claim`.
897
+ - **`increment` and non-atomic fallback.** Without a real atomic `Store.increment`, the
898
+ group open-counter falls back to get+set — correct only on a **single process**. Any
899
+ multi-pod backend must implement `increment` atomically.
900
+ - **`skipQueue` first run is best-effort.** It does not go through the durable queue/lease;
901
+ only the retry (on failure) is durable. Don't use `skipQueue` for work that must survive
902
+ a crash on its first attempt.
903
+ - **In-memory backend is per-process.** `memoryBackend()` shares state only within one
904
+ process. For real multi-pod deployments, supply distributed ports
905
+ (see [ports doc](./ayepi-work-ports.md)).
906
+ - **Codec must round-trip your inputs/outputs.** Values cross the wire as strings. Plain
907
+ `JSON` drops `undefined`, throws on `BigInt`, etc. — `defaultCodec` handles the common
908
+ non-native types; provide a custom `codec` for custom classes
909
+ (see [ports doc](./ayepi-work-ports.md)).
910
+ - **Cron is minute-granular, local time.** `nextAfter` scans minutes in **local** time
911
+ (see [deps & scheduling](./ayepi-work-deps-schedule.md)).
912
+
913
+ ## Running the engine as a managed component
914
+
915
+ `createWork` returns `start()` / `stop()` (idempotent). To wire it into graceful
916
+ startup/shutdown alongside the rest of your services, register it with `@ayepi/updown`:
917
+
918
+ ```ts
919
+ import { updown } from '@ayepi/updown'
920
+ const lc = updown()
921
+ const w = createWork({ work: [/* ... */] as const, autoStart: false })
922
+ lc.register({ name: 'work', up: () => w.start(), post: () => w.stop() })
923
+ ```
924
+
925
+ See `@ayepi/updown` for dependency-ordered `up()`/`down()` and health probes, and
926
+ `ayepi-core.md` when building the rest of your service on `@ayepi/core` (typed HTTP/WS).