queasy 0.2.0 → 0.3.0

package/CLAUDE.md CHANGED
@@ -21,7 +21,7 @@ Tests require a running Redis instance. Use `docker:up` first if needed.
21
21
 
22
22
  Queasy is a Redis-backed job queue with **at-least-once** delivery semantics. The core logic lives in two layers:
23
23
 
24
- - **JS layer** (`src/queue.js`): The `queue()` factory returns `{ dispatch, cancel, listen }`. On first use, it uploads the Lua script to Redis via `FUNCTION LOAD REPLACE`. A `WeakSet` (`initializedClients`) tracks which Redis clients have already had the functions loaded. `listen()` is currently a TODO stub.
24
+ - **JS layer** (`src/client.js`): The `Client` class accepts a `RedisOptions` object and constructs its own node-redis connection via `createClient` (plain object) or `createCluster` (object with `rootNodes`). On construction it connects, then uploads the Lua script to Redis via `FUNCTION LOAD REPLACE`. The connection is torn down in `close()` via `destroy()`.
25
25
  - **Lua layer** (`src/queasy.lua`): All queue state mutations are atomic Redis functions registered under the `queasy` library. No queue logic should be duplicated in JS — the Lua functions are the single source of truth for state transitions.
26
26
 
27
27
  ### Redis data structures
package/Readme.md CHANGED
@@ -13,7 +13,7 @@ A Redis-backed job queue for Node.js, featuring (in comparison with design inspi
13
13
 
14
14
  ### Terminology
15
15
 
16
- A _client_ is an instance of Queasy that connects to a Redis database. A _job_ is the basic unit of work that is _dispatched_ into a _queue_.
16
+ A _client_ is an instance of Queasy. It manages its own Redis connection. A _job_ is the basic unit of work that is _dispatched_ into a _queue_.
17
17
 
18
18
  A _handler_ is JavaScript code that performs work. There are two kinds of handlers: _task handlers_, which process jobs, and _fail handlers_, which are invoked when a job fails permanently. Handlers run on _workers_, which are Node.js worker threads. By default, a Queasy client automatically creates one worker per CPU.
19
19
 
@@ -55,9 +55,14 @@ The response of the heartbeat Lua function indicates whether the client had been
55
55
 
56
56
  ## API
57
57
 
58
- ### `client(redisConnection, workerCount)`
59
- Returns a Queasy client.
60
- - `redisConnection`: a node-redis connection object.
58
+ ### `new Client(options, workerCount)`
59
+ Constructs a Queasy client. Queasy creates and manages its own Redis connection internally.
60
+ - `options`: connection options. Two forms are accepted:
61
+ - **Single node** (plain object): passed to node-redis `createClient`. Accepts `url`, `socket`, `username`, `password`, and `database`. Defaults to `{}` (connects to `localhost:6379`).
62
+ - **Cluster** (object with `rootNodes`): passed to node-redis `createCluster`. Accepts:
63
+ - `rootNodes`: array of per-node connection options (same fields as single-node form); at least three nodes are recommended.
64
+ - `defaults`: options shared across all nodes (e.g. auth and TLS).
65
+ - `nodeAddressMap`: address translation map for NAT environments.
61
66
  - `workerCount`: number; size of the worker pool. If 0, or if constructed inside a queasy worker thread, no pool is created. Defaults to the number of CPUs.
62
67
 
63
68
  The returned client object is an EventEmitter that emits a 'disconnect' event when it fails permanently for any reason, such as a library version mismatch between workers connected to the same Redis instance, or a lost-locks situation. When this happens, the application should generally exit the worker process and let the supervisor restart it.
@@ -7,8 +7,6 @@ services:
7
7
  ports:
8
8
  - '6379:6379'
9
9
  command: redis-server --save 60 1 --loglevel warning
10
- volumes:
11
- - redis-data:/data
12
10
  healthcheck:
13
11
  test: ['CMD', 'redis-cli', 'ping']
14
12
  interval: 5s
@@ -0,0 +1,185 @@
1
+ # Queasy Fuzz Test Plan
2
+
3
+ A long-running end-to-end fuzz test that simulates random failures and continuously verifies core system invariants.
4
+
5
+ ## Invariants Verified
6
+
7
+ 1. **Mutual exclusion**: A job with a given job ID is never processed by two clients or worker threads simultaneously.
8
+ 2. **No re-processing of successful jobs**: A job that has succeeded is never processed again.
9
+ 3. **Scheduling**: No job is processed before its `run_at` time.
10
+ 4. **Priority ordering within a queue**: No job starts processing while another job in the same queue with a lower `run_at` is still waiting (i.e., eligible jobs are dequeued in order).
11
+ 5. **Fail handler completeness**: If a fail handler is registered, every job that does not eventually succeed MUST result in the fail handler being invoked.
12
+ 6. **Queue progress (priority starvation prevention)**: Non-empty queues at the highest priority level always make progress. When they drain, queues at the next priority level begin making progress.
13
+
14
+ ## Structure Overview
15
+
16
+ ```
17
+ fuzztest/
18
+ Readme.md # This file
19
+ fuzz.js # Orchestrator: spawns child processes, monitors shared state
20
+ process.js # Child process: sets up clients and listens on all queues
21
+ handlers/
22
+ periodic.js # Re-queues itself; dispatches cascade jobs; occasionally stalls/crashes
23
+ cascade-a.js # Dispatched by periodic; dispatches into cascade-b
24
+ cascade-b.js # Dispatched by cascade-a; final handler
25
+ fail-handler.js # Shared fail handler for all queues; records invocations
26
+ shared/
27
+ state.js # In-process shared state helpers (for the orchestrator)
28
+ log.js # Structured logger (writes to fuzz-output.log, never throws)
29
+ ```
30
+
31
+ ## Process Architecture
32
+
33
+ The orchestrator (`fuzz.js`) spawns **N child processes** (default: 4). Each child process creates one Queasy client and calls `listen()` on every queue. The orchestrator itself does not process jobs — it only monitors invariants and manages the lifecycle.
34
+
35
+ Handlers write events (job start, finish, fail, stall) directly to a Redis stream (`fuzz:events`). The orchestrator reads from this stream and maintains a shared in-memory log of events, checking invariants after each one. Handlers do not need to forward events through their parent process: the stream is the shared channel. (Child process main threads separately report job checkout and completion to the orchestrator over IPC, which feeds the mutual-exclusion check.)
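The orchestrator's read loop can be sketched as below. The reply-flattening helper is an assumption about the shape of `shared/stream.js` (the `streamId` field and the `handleEvent` call are illustrative, not the module's confirmed API):

```javascript
// Flatten a node-redis xRead reply into a flat list of event objects.
// xRead resolves to an array of { name, messages: [{ id, message }] }
// entries, or null when BLOCK times out with no new entries.
function flattenStreamReply(reply) {
  if (!reply) return [];
  const events = [];
  for (const stream of reply) {
    for (const { id, message } of stream.messages) {
      events.push({ streamId: id, ...message });
    }
  }
  return events;
}

// Hypothetical read loop (requires a connected node-redis client):
// let lastId = '0';
// for (;;) {
//   const reply = await redis.xRead({ key: 'fuzz:events', id: lastId }, { BLOCK: 5000 });
//   for (const event of flattenStreamReply(reply)) {
//     lastId = event.streamId;
//     handleEvent(event);
//   }
// }
```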
36
+
37
+ Child processes are deliberately killed and restarted periodically to simulate crashes. A killed process's checked-out jobs will be swept and retried/failed by the remaining processes.
38
+
39
+ ## Queue Configuration
40
+
41
+ Three queues at different priority levels, all listened on by every child process. Parameters are kept small to produce many events quickly:
42
+
43
+ | Parameter | `{fuzz}:periodic` | `{fuzz}:cascade-a` | `{fuzz}:cascade-b` |
44
+ |---|---|---|---|
45
+ | Handler | `periodic.js` | `cascade-a.js` | `cascade-b.js` |
46
+ | Priority | 300 | 200 | 100 |
47
+ | `maxRetries` | 3 | 3 | 3 |
48
+ | `maxStalls` | 2 | 2 | 2 |
49
+ | `minBackoff` | 200 ms | 200 ms | 200 ms |
50
+ | `maxBackoff` | 2000 ms | 2000 ms | 2000 ms |
51
+ | `timeout` | 3000 ms | 3000 ms | 3000 ms |
52
+ | `size` | 10 | 10 | 10 |
53
+ | `failHandler` | `fail-handler.js` | `fail-handler.js` | `fail-handler.js` |
54
+ | `failRetryOptions.maxRetries` | 5 | 5 | 5 |
55
+ | `failRetryOptions.minBackoff` | 200 ms | 200 ms | 200 ms |
56
+
57
+ The short `timeout` (3 s) means stalling jobs are detected and swept quickly. The short `minBackoff` / `maxBackoff` window (200 ms – 2 s) means retries cycle fast. With `maxRetries: 3` and `maxStalls: 2`, most failed jobs reach the fail handler within seconds.
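To illustrate the scale of the retry cycle, here is a backoff sketch assuming a doubling curve clamped to the configured window. The doubling shape is an assumption for illustration only; queasy's actual backoff curve is not specified here:

```javascript
// Illustrative only: assumes a doubling backoff clamped to
// [minBackoff, maxBackoff]; queasy's actual curve may differ.
function backoffMs(attempt, minBackoff = 200, maxBackoff = 2000) {
  return Math.min(maxBackoff, minBackoff * 2 ** (attempt - 1));
}

// Under this assumption, with maxRetries: 3 the total backoff spent before
// the fail handler is backoffMs(1) + backoffMs(2) + backoffMs(3)
// = 200 + 400 + 800 = 1400 ms.
```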
58
+
59
+ ## Periodic Jobs (Seed)
60
+
61
+ A fixed set of periodic job IDs (e.g., `periodic-0` through `periodic-4`) is dispatched by the orchestrator at startup. Each periodic handler:
62
+
63
+ 1. Records the current processing event by writing `{ type: 'start', queue, id, threadId, clientId, startedAt }` to the `fuzz:events` Redis stream.
64
+ 2. Optionally sleeps for a random short delay.
65
+ 3. Dispatches a cascade-a job with a unique ID and a `runAt` randomly up to 2 seconds in the future.
66
+ 4. Re-dispatches itself (same job ID, `updateRunAt: true`) with a delay of 1–5 seconds, so the job continues to fire periodically.
67
+ 5. On success, writes `{ type: 'finish', queue, id, threadId, clientId, finishedAt }` to the `fuzz:events` stream.
68
+
69
+ The fail handler for periodic jobs also re-dispatches the same periodic job ID (with a delay), ensuring periodic jobs survive permanent failures. This lets the orchestrator assert that periodic jobs keep running indefinitely.
70
+
71
+ ## Cascade Jobs
72
+
73
+ `cascade-a.js`:
74
+ - Records start/finish events.
75
+ - Dispatches one or two `cascade-b` jobs with unique IDs.
76
+ - Subject to all chaos behaviors (see below).
77
+
78
+ `cascade-b.js`:
79
+ - Records start/finish events.
80
+ - Terminal handler; does not dispatch further jobs.
81
+ - Subject to all chaos behaviors (see below).
82
+
83
+ ## Chaos Behaviors
84
+
85
+ All handlers are subject to all chaos behaviors. The probabilities below are per-invocation and apply uniformly across `periodic.js`, `cascade-a.js`, and `cascade-b.js`:
86
+
87
+ | Behavior | Probability | Notes |
88
+ |---|---|---|
89
+ | Normal completion | ~65% | Dispatches downstream jobs (if any), then returns |
90
+ | Retriable error (throws `Error`) | ~15% | No downstream dispatch |
91
+ | Permanent error (throws `PermanentError`) | ~5% | No downstream dispatch |
92
+ | Stall (returns a never-resolving promise) | ~10% | Detected after `timeout` (3 s); counts as a stall |
93
+ | CPU spin (blocks the worker thread) | ~3% | Tight loop until the process detects the hang and kills the thread (via `timeout`) |
94
+ | Crash (causes the child process to exit) | ~2% | Handler posts a crash message to the main thread, which exits the process |
95
+
96
+ With `timeout: 3000`, stalling and spinning jobs are swept within ~3–13 seconds (timeout + heartbeat sweep interval). With `maxStalls: 2`, two stalls exhaust the stall budget and the job is sent to the fail handler, cycling fast.
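The per-invocation probabilities above can be implemented as a weighted picker. This is a sketch; the real `shared/chaos.js` may be structured differently (the injectable `rand` parameter is for testability only):

```javascript
// Weighted chaos picker matching the table above (a sketch; the real
// shared/chaos.js may differ). `rand` defaults to Math.random() and is
// injectable for deterministic testing.
const CHAOS_WEIGHTS = [
  ['normal', 0.65],
  ['retriable', 0.15],
  ['permanent', 0.05],
  ['stall', 0.1],
  ['spin', 0.03],
  ['crash', 0.02],
];

function pickChaos(rand = Math.random()) {
  let cumulative = 0;
  for (const [behavior, weight] of CHAOS_WEIGHTS) {
    cumulative += weight;
    if (rand < cumulative) return behavior;
  }
  return 'normal'; // guard against floating-point round-off near rand = 1
}
```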
97
+
98
+ When a child process crashes, the orchestrator detects the exit event and restarts a new child process after a short delay.
99
+
100
+ ## Event Logging and Invariant Checking
101
+
102
+ The orchestrator maintains an append-only in-memory event log. Each entry contains:
103
+ ```js
104
+ { type, queue, id, threadId, clientId, timestamp }
105
+ ```
106
+ where `type` is one of: `start`, `finish`, `fail`, `stall`, `cancel`.
107
+
108
+ After each event is appended, the orchestrator runs incremental invariant checks:
109
+
110
+ ### Invariant 1: Mutual Exclusion
111
+ Maintain a `Map<jobId, { clientId, threadId, startedAt }>` of currently-active jobs. On `start`, check that the job ID is not already in the map. On `finish`/`fail`/`stall`, remove it.
112
+
113
+ If a `start` event arrives for a job ID already in the map → **VIOLATION**.
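A minimal sketch of this incremental check, using the event-log entry fields described above (`fuzz.js` holds the authoritative implementation):

```javascript
// Incremental mutual-exclusion checker over the event log.
function makeMutualExclusionChecker() {
  const active = new Map(); // jobId -> { clientId, threadId, startedAt }
  return {
    // Returns true when a start event violates mutual exclusion.
    onStart({ id, clientId, threadId, timestamp }) {
      if (active.has(id)) return true;
      active.set(id, { clientId, threadId, startedAt: timestamp });
      return false;
    },
    // Call on finish, fail, or stall to release the job.
    onEnd({ id }) {
      active.delete(id);
    },
  };
}
```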
114
+
115
+ ### Invariant 2: No Re-processing of Succeeded Jobs
116
+ Maintain a `Set<jobId>` of successfully finished job IDs. On `start`, check that the ID is not in this set.
117
+
118
+ If a `start` event arrives for a job ID in the succeeded set → **VIOLATION**.
119
+
120
+ Note: Re-processing after a stall or retry is expected and must not be flagged.
121
+
122
+ ### Invariant 3: Scheduling (No Early Processing)
123
+ Each `start` event includes `startedAt` (wall clock). Each job dispatch records an intended `runAt`. On `start`, verify `startedAt >= runAt - CLOCK_TOLERANCE_MS`.
124
+
125
+ If `startedAt < runAt - CLOCK_TOLERANCE_MS` → **VIOLATION**.
126
+
127
+ `CLOCK_TOLERANCE_MS` accounts for clock skew between the orchestrator, child processes, and Redis (default: 100ms).
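The scheduling predicate is a one-liner (times in epoch milliseconds):

```javascript
// True when a job started before its runAt, beyond the allowed clock skew.
function startedTooEarly(startedAt, runAt, toleranceMs = 100) {
  return startedAt < runAt - toleranceMs;
}
```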
128
+
129
+ ### Invariant 4: Priority Ordering
130
+ Track the earliest-known `runAt` for jobs dispatched into each queue but not yet started. When a `start` event arrives for a job in that queue, verify no other eligible job (with `runAt <= now`) in the same queue has a lower `runAt` that has been waiting longer.
131
+
132
+ This invariant is best-effort and checked with a configurable lag (e.g., 200ms) to account for the inherent race between dequeue polling and dispatch. A violation is only flagged when the ordering difference exceeds this lag.
133
+
134
+ ### Invariant 5: Fail Handler Completeness
135
+ Track every job that has been dispatched (by ID). When a job exceeds its `maxRetries` or receives a permanent error, a fail event should be observed. Maintain a map `{ jobId → { exhausted: bool, failSeen: bool } }`. After a configurable drain period (e.g., 30 seconds after a queue goes quiet), check that every exhausted job has a corresponding `fail` event.
136
+
137
+ ### Invariant 6: Queue Progress
138
+ The orchestrator monitors the time since the last `start` event per queue. If a queue is known to be non-empty (based on dispatched vs finished counts) and no `start` event has been seen for more than a configurable `STALL_THRESHOLD_MS` (e.g., 30 seconds), flag a progress violation.
139
+
140
+ Priority starvation is checked by verifying that the low-priority queue does not process jobs while the high-priority queue has outstanding jobs older than the dequeue poll interval.
141
+
142
+ ## Output and Reporting
143
+
144
+ Violations are logged to stdout and to `fuzz-output.log` with full context. The process does **not** exit on a violation — it logs and continues, accumulating a count of violations. A summary is printed periodically (every 60 seconds) and on `SIGINT`.
145
+
146
+ Log format (newline-delimited JSON):
147
+ ```json
148
+ { "time": "...", "level": "info|warn|error", "msg": "...", "data": { ... } }
149
+ ```
150
+
151
+ Violation entries use level `"error"` and include the invariant name, the offending event, and relevant recent history.
152
+
153
+ ## Configuration
154
+
155
+ All tunable parameters live at the top of `fuzz.js` as named constants:
156
+
157
+ ```js
158
+ const NUM_PROCESSES = 4; // Child processes
159
+ const NUM_PERIODIC_JOBS = 5; // Fixed periodic job IDs
160
+ const PERIODIC_MIN_DELAY = 1000; // ms before re-queuing self
161
+ const PERIODIC_MAX_DELAY = 5000;
162
+ const CRASH_INTERVAL_MS = 30000; // Orchestrator kills a random child process this often
163
+ const CLOCK_TOLERANCE_MS = 100;
164
+ const STALL_THRESHOLD_MS = 30000;
165
+ const PRIORITY_LAG_MS = 200;
166
+ const LOG_FILE = 'fuzz-output.log';
167
+ ```
168
+
169
+ ## Running
170
+
171
+ The fuzz test is separate from the default test suite and is never run by `npm test`. It is started manually:
172
+
173
+ ```sh
174
+ node fuzztest/fuzz.js
175
+ ```
176
+
177
+ It runs indefinitely. Stop with `Ctrl+C`. A summary of violations and events processed will be printed on exit.
178
+
179
+ ## Notes on Implementation
180
+
181
+ - Child processes use the `queasy` library's public API (`new Client()`, `queue()`, `dispatch()`, `listen()`). Apart from writing events to the `fuzz:events` stream, they do not talk to Redis directly.
182
+ - The orchestrator imports only the library's public entry point (to dispatch the seed jobs); it learns about the child process lifecycle only from the `spawn` and `exit` events.
183
+ - All handler modules in `fuzztest/handlers/` must be self-contained ESM modules that can be passed as `handlerPath` to `queue.listen()`.
184
+ - Handlers write events to the `fuzz:events` Redis stream using a dedicated Redis client created at handler module load time. The orchestrator reads from this stream via `XREAD BLOCK`. This is the only communication channel between handlers and the orchestrator — the handlers themselves use no IPC.
185
+ - The chaos crash behavior must be triggered from the child process's main thread, not from inside a handler's worker thread. To simulate a crash, the handler posts a `{ type: 'crash' }` message on a `BroadcastChannel`, which the main thread listens on and calls `process.exit()`.
@@ -0,0 +1,354 @@
1
+ /**
2
+ * Fuzz test orchestrator.
3
+ *
4
+ * - Spawns NUM_PROCESSES child processes, each running fuzztest/process.js
5
+ * - Dispatches seed periodic jobs at startup
6
+ * - Reads events from the fuzz:events Redis stream
7
+ * - Checks system invariants after each event
8
+ * - Logs violations without terminating
9
+ * - Prints a summary every 60 seconds and on SIGINT
10
+ */
11
+
12
+ import { fork } from 'node:child_process';
13
+ import { createWriteStream } from 'node:fs';
14
+ import { dirname, join } from 'node:path';
15
+ import { fileURLToPath } from 'node:url';
16
+ import { createClient } from 'redis';
17
+ import { Client } from '../src/index.js';
18
+ import { readEvents, STREAM_KEY } from './shared/stream.js';
19
+
20
+ const __dirname = dirname(fileURLToPath(import.meta.url));
21
+
22
+ // ── Configuration ──────────────────────────────────────────────────────────────
23
+
24
+ const NUM_PROCESSES = 4;
25
+ const NUM_PERIODIC_JOBS = 5;
26
+ const CRASH_INTERVAL_MS = 30_000; // kill a random process this often
27
+ const CLOCK_TOLERANCE_MS = 200; // allow this much early-start slop
28
+ const STALL_THRESHOLD_MS = 30_000; // no start event → progress violation
29
+ const PRIORITY_LAG_MS = 500; // ordering slop between queues
30
+ const PROCESS_RESTART_DELAY_MS = 500;
31
+ const SUMMARY_INTERVAL_MS = 60_000;
32
+ const LOG_FILE = join(__dirname, '..', 'fuzz-output.log');
33
+
34
+ // ── Logging ────────────────────────────────────────────────────────────────────
35
+
36
+ const logStream = createWriteStream(LOG_FILE, { flags: 'a' });
37
+
38
+ function log(level, msg, data = {}) {
39
+ const entry = JSON.stringify({ time: new Date().toISOString(), level, msg, data });
40
+ if (level === 'error') process.stdout.write(`VIOLATION: ${entry}\n`);
41
+ else process.stdout.write(`${entry}\n`);
42
+ logStream.write(`${entry}\n`);
43
+ }
44
+
45
+ // ── Invariant state ────────────────────────────────────────────────────────────
46
+
47
+ /** @type {Map<string, {queue: string, startedAt: number, runAt: number, pid: number}>} */
48
+ const activeJobs = new Map();
49
+
50
+ /** @type {Set<string>} */
51
+ const succeededJobs = new Set();
52
+
53
+ /**
54
+ * Per-queue: list of {id, runAt, dispatchedAt} for jobs seen but not yet started.
55
+ * Used to check priority ordering.
56
+ * @type {Map<string, {id: string, runAt: number, dispatchedAt: number}[]>}
57
+ */
58
+ const waitingByQueue = new Map();
59
+
60
+ /** @type {Map<string, number>} last start event timestamp per queue */
61
+ const lastStartPerQueue = new Map();
62
+
63
+ let violationCount = 0;
64
+ let eventCount = 0;
65
+
66
+ function violation(invariant, msg, data = {}) {
67
+ violationCount++;
68
+ log('error', `[${invariant}] ${msg}`, data);
69
+ }
70
+
71
+ // ── Invariant checks ───────────────────────────────────────────────────────────
72
+
73
+ /**
74
+ * Called when a child process dequeues a job (via IPC).
75
+ * @param {number} pid
76
+ * @param {{queue: string, jobId: string, runAt: number}} msg
77
+ */
78
+ function onIpcDequeue(pid, msg) {
79
+ const { jobId: id, queue, runAt } = msg;
80
+
81
+ // Mutual exclusion: job must not already be active
82
+ if (activeJobs.has(id)) {
83
+ const existing = activeJobs.get(id);
84
+ violation('MutualExclusion', `Job ${id} dequeued while already active`, {
85
+ existing,
86
+ newDequeue: { queue, pid },
87
+ });
88
+ }
89
+
90
+ activeJobs.set(id, { queue, pid, startedAt: Date.now(), runAt });
91
+ }
92
+
93
+ /**
94
+ * Called when a child process finishes/retries/fails a job (via IPC).
95
+ * @param {string} jobId
96
+ */
97
+ function onIpcJobDone(jobId) {
98
+ activeJobs.delete(jobId);
99
+ }
100
+
101
+ function onStart(event) {
102
+ const { id, queue, runAt: runAtStr, startedAt: startedAtStr } = event;
103
+ const runAt = Number(runAtStr);
104
+ const startedAt = Number(startedAtStr);
105
+
106
+ // No re-processing of succeeded jobs (except periodic which re-queues itself)
107
+ if (succeededJobs.has(id) && !id.startsWith('fuzz-periodic-')) {
108
+ violation('NoReprocess', `Job ${id} started after already succeeding`, {
109
+ queue,
110
+ startedAt,
111
+ });
112
+ }
113
+
114
+ // Scheduling: not before runAt
115
+ if (runAt > 0 && startedAt < runAt - CLOCK_TOLERANCE_MS) {
116
+ violation('Scheduling', `Job ${id} started ${runAt - startedAt}ms too early`, {
117
+ queue,
118
+ runAt,
119
+ startedAt,
120
+ delta: runAt - startedAt,
121
+ });
122
+ }
123
+
124
+ // Priority ordering: no eligible lower-runAt job in same queue waiting
125
+ const waiting = waitingByQueue.get(queue) ?? [];
126
+ for (const w of waiting) {
127
+ if (w.id === id) continue;
128
+ if (w.runAt <= startedAt - CLOCK_TOLERANCE_MS) {
129
+ if (runAt > w.runAt + PRIORITY_LAG_MS) {
130
+ violation(
131
+ 'Ordering',
132
+ `Job ${id} (runAt=${runAt}) started before ${w.id} (runAt=${w.runAt}) in ${queue}`,
133
+ {
134
+ startedId: id,
135
+ startedRunAt: runAt,
136
+ waitingId: w.id,
137
+ waitingRunAt: w.runAt,
138
+ }
139
+ );
140
+ break;
141
+ }
142
+ }
143
+ }
144
+
145
+ lastStartPerQueue.set(queue, startedAt);
146
+
147
+ // Remove from waiting list
148
+ if (waitingByQueue.has(queue)) {
149
+ waitingByQueue.set(
150
+ queue,
151
+ waitingByQueue.get(queue).filter((w) => w.id !== id)
152
+ );
153
+ }
154
+ }
155
+
156
+ function onFinish(event) {
157
+ succeededJobs.add(event.id);
158
+ }
159
+
160
+ /**
161
+ * Called when a child process exits. Clears all active jobs belonging to that
162
+ * PID so they don't trigger spurious MutualExclusion violations when the
163
+ * queasy sweep retries them and a new process picks them up.
164
+ * @param {number} pid
165
+ */
166
+ function onProcessExit(pid) {
167
+ for (const [id, entry] of activeJobs) {
168
+ if (entry.pid === pid) {
169
+ activeJobs.delete(id);
170
+ }
171
+ }
172
+ }
173
+
174
+ /**
175
+ * Called periodically to check queue progress and priority starvation.
176
+ */
177
+ function checkProgress() {
178
+ const now = Date.now();
179
+ for (const [queue, lastStart] of lastStartPerQueue) {
180
+ const idle = now - lastStart;
181
+ if (idle > STALL_THRESHOLD_MS) {
182
+ violation('Progress', `Queue ${queue} has not processed a job in ${idle}ms`, {
183
+ queue,
184
+ lastStartAt: lastStart,
185
+ idleMs: idle,
186
+ });
187
+ }
188
+ }
189
+
190
+ // Priority starvation: cascade-b should not start while periodic has old eligible jobs
191
+ const periodicWaiting = waitingByQueue.get('{fuzz}:periodic') ?? [];
192
+ const eligiblePeriodic = periodicWaiting.filter((w) => w.runAt <= now - PRIORITY_LAG_MS);
193
+ if (eligiblePeriodic.length > 0) {
194
+ const lastBStart = lastStartPerQueue.get('{fuzz}:cascade-b') ?? 0;
195
+ const bStartedAfterPeriodic = eligiblePeriodic.some(
196
+ (w) => lastBStart > w.dispatchedAt + PRIORITY_LAG_MS
197
+ );
198
+ if (bStartedAfterPeriodic) {
199
+ violation(
200
+ 'PriorityStarvation',
201
+ 'cascade-b processed while periodic had eligible waiting jobs',
202
+ {
203
+ eligiblePeriodicCount: eligiblePeriodic.length,
204
+ }
205
+ );
206
+ }
207
+ }
208
+ }
209
+
210
+ // ── Event dispatch ─────────────────────────────────────────────────────────────
211
+
212
+ function handleEvent(event) {
213
+ eventCount++;
214
+ const { type } = event;
215
+ log('info', 'event', event);
216
+
217
+ if (type === 'start') {
218
+ const { id, queue, runAt } = event;
219
+ // Register in waiting list for ordering checks (before onStart removes it)
220
+ const runAtNum = Number(runAt);
221
+ const q = waitingByQueue.get(queue) ?? [];
222
+ if (!q.find((w) => w.id === id)) {
223
+ q.push({ id, runAt: runAtNum, dispatchedAt: Date.now() });
224
+ waitingByQueue.set(queue, q);
225
+ }
226
+ onStart(event);
227
+ } else if (type === 'finish') {
228
+ onFinish(event);
229
+ }
230
+ }
231
+
232
+ // ── Summary ────────────────────────────────────────────────────────────────────
233
+
234
+ function printSummary() {
235
+ const summary = {
236
+ events: eventCount,
237
+ violations: violationCount,
238
+ activeJobs: activeJobs.size,
239
+ succeededJobs: succeededJobs.size,
240
+ lastStartPerQueue: Object.fromEntries(lastStartPerQueue),
241
+ };
242
+ log('info', 'Summary', summary);
243
+ console.log(`\n=== Fuzz Summary ===`);
244
+ console.log(` Events processed : ${eventCount}`);
245
+ console.log(` Violations found : ${violationCount}`);
246
+ console.log(` Active jobs : ${activeJobs.size}`);
247
+ console.log(` Succeeded jobs : ${succeededJobs.size}`);
248
+ console.log('===================\n');
249
+ }
250
+
251
+ // ── Child process management ───────────────────────────────────────────────────
252
+
253
+ /** @type {Set<import('node:child_process').ChildProcess>} */
254
+ const processes = new Set();
255
+
256
+ function spawnProcess() {
257
+ const child = fork(join(__dirname, 'process.js'));
258
+ processes.add(child);
259
+
260
+ child.on('message', (msg) => {
261
+ if (msg.type === 'dequeue') {
262
+ onIpcDequeue(child.pid, msg);
263
+ } else if (msg.type === 'finish' || msg.type === 'retry' || msg.type === 'fail') {
264
+ onIpcJobDone(msg.jobId);
265
+ }
266
+ });
267
+
268
+ child.on('exit', (code, signal) => {
269
+ processes.delete(child);
270
+ log('info', 'Child process exited', { pid: child.pid, code, signal });
271
+ onProcessExit(child.pid);
272
+ setTimeout(spawnProcess, PROCESS_RESTART_DELAY_MS);
273
+ });
274
+
275
+ child.on('error', (err) => {
276
+ log('info', 'Child process error', { pid: child.pid, error: err.message });
277
+ });
278
+
279
+ return child;
280
+ }
281
+
282
+ function killRandomProcess() {
283
+ const list = [...processes];
284
+ if (list.length === 0) return;
285
+ const target = list[Math.floor(Math.random() * list.length)];
286
+ log('info', 'Killing random child process', { pid: target.pid });
287
+ target.kill('SIGKILL');
288
+ }
289
+
290
+ // ── Redis setup ────────────────────────────────────────────────────────────────
291
+
292
+ const redis = createClient();
293
+ const dispatchRedis = createClient();
294
+
295
+ await redis.connect();
296
+ await dispatchRedis.connect();
297
+
298
+ // Clean up state from previous runs
299
+ await redis.del(STREAM_KEY);
300
+ log('info', 'Cleared fuzz:events stream from previous run');
301
+
302
+ // Dispatch seed periodic jobs (await ready to avoid Function not found race)
303
+ const dispatchClient = await new Promise((resolve) => new Client({}, 0, resolve));
304
+ const periodicQueue = dispatchClient.queue('{fuzz}:periodic', true);
305
+
306
+ for (let i = 0; i < NUM_PERIODIC_JOBS; i++) {
307
+ const id = `fuzz-periodic-${i}`;
308
+ await periodicQueue.dispatch({ periodic: true, index: i }, { id });
309
+ log('info', `Dispatched seed job ${id}`);
310
+ }
311
+
312
+ // ── Spawn child processes ──────────────────────────────────────────────────────
313
+
314
+ for (let i = 0; i < NUM_PROCESSES; i++) {
315
+ spawnProcess();
316
+ }
317
+
318
+ // Periodically kill a random process to simulate crashes
319
+ const crashTimer = setInterval(killRandomProcess, CRASH_INTERVAL_MS);
320
+
321
+ // Periodic progress + starvation check
322
+ const progressTimer = setInterval(checkProgress, 10_000);
323
+
324
+ // Summary timer
325
+ const summaryTimer = setInterval(printSummary, SUMMARY_INTERVAL_MS);
326
+
327
+ log('info', 'Orchestrator started', {
328
+ numProcesses: NUM_PROCESSES,
329
+ numPeriodicJobs: NUM_PERIODIC_JOBS,
330
+ logFile: LOG_FILE,
331
+ });
332
+
333
+ // ── SIGINT handler ─────────────────────────────────────────────────────────────
334
+
335
+ process.on('SIGINT', () => {
336
+ clearInterval(crashTimer);
337
+ clearInterval(progressTimer);
338
+ clearInterval(summaryTimer);
339
+
340
+ for (const child of processes) {
341
+ child.kill('SIGKILL');
342
+ }
343
+
344
+ printSummary();
345
+ process.exit(violationCount > 0 ? 1 : 0);
346
+ });
347
+
348
+ // ── Event loop ─────────────────────────────────────────────────────────────────
349
+
350
+ // Read from the beginning. We cleared the stream above, so '0' reads all
351
+ // events from the fresh start without missing anything.
352
+ for await (const event of readEvents(redis, '0')) {
353
+ handleEvent(event);
354
+ }
@@ -0,0 +1,94 @@
1
+ /**
2
+ * cascade-a handler — dispatches one or two cascade-b jobs on normal completion.
3
+ * Subject to all chaos behaviors.
4
+ */
5
+
6
+ import { BroadcastChannel } from 'node:worker_threads';
7
+ import { createClient } from 'redis';
8
+ import { Client, PermanentError } from '../../src/index.js';
9
+ import { pickChaos } from '../shared/chaos.js';
10
+ import { emitEvent } from '../shared/stream.js';
11
+
12
+ const redis = createClient();
13
+ const eventRedis = createClient();
14
+
15
+ await redis.connect();
16
+ await eventRedis.connect();
17
+
18
+ // Dispatch-only queasy client (await ready to avoid Function not found race)
19
+ const client = await new Promise((resolve) => new Client(redis, 0, resolve));
20
+ const cascadeBQueue = client.queue('{fuzz}:cascade-b', true);
21
+
22
+ const crashChannel = new BroadcastChannel('fuzz-crash');
23
+
24
+ /**
25
+ * @param {any} data
26
+ * @param {import('../../src/types.js').Job} job
27
+ */
28
+ export async function handle(_data, job) {
29
+ const startedAt = Date.now();
30
+ await emitEvent(eventRedis, {
31
+ type: 'start',
32
+ queue: '{fuzz}:cascade-a',
33
+ id: job.id,
34
+ pid: String(process.pid),
35
+ runAt: String(job.runAt),
36
+ startedAt: String(startedAt),
37
+ });
38
+
39
+ const chaos = pickChaos();
40
+ await emitEvent(eventRedis, {
41
+ type: 'chaos',
42
+ queue: '{fuzz}:cascade-a',
43
+ id: job.id,
44
+ chaos,
45
+ });
46
+
47
+ if (chaos === 'crash') {
48
+ crashChannel.postMessage({ type: 'crash' });
49
+ await new Promise(() => {});
50
+ }
51
+
52
+ if (chaos === 'stall') {
53
+ await new Promise(() => {});
54
+ }
55
+
56
+ if (chaos === 'spin') {
57
+ const end = Date.now() + 10_000;
58
+ while (Date.now() < end) {
59
+ /* busy wait */
60
+ }
61
+ }
62
+
63
+ if (chaos === 'permanent') {
64
+ throw new PermanentError('cascade-a: permanent chaos');
65
+ }
66
+
67
+ if (chaos === 'retriable') {
68
+ throw new Error('cascade-a: retriable chaos');
69
+ }
70
+
71
+ // Normal completion: dispatch 1-2 cascade-b jobs
72
+ const count = Math.random() < 0.5 ? 1 : 2;
73
+ const runAtOffset = Math.random() * 2000;
74
+ const dispatchPromises = [];
75
+ for (let i = 0; i < count; i++) {
76
+ dispatchPromises.push(
77
+ cascadeBQueue.dispatch(
78
+ { from: job.id, index: i },
79
+ {
80
+ runAt: Date.now() + runAtOffset,
81
+ }
82
+ )
83
+ );
84
+ }
85
+ const ids = await Promise.all(dispatchPromises);
86
+
87
+ await emitEvent(eventRedis, {
88
+ type: 'finish',
89
+ queue: '{fuzz}:cascade-a',
90
+ id: job.id,
91
+ finishedAt: String(Date.now()),
92
+ dispatched: ids.join(','),
93
+ });
94
+ }