queasy 0.2.0 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CLAUDE.md +1 -1
- package/Readme.md +9 -4
- package/docker-compose.yml +0 -2
- package/fuzztest/Readme.md +185 -0
- package/fuzztest/fuzz.js +354 -0
- package/fuzztest/handlers/cascade-a.js +94 -0
- package/fuzztest/handlers/cascade-b.js +72 -0
- package/fuzztest/handlers/fail-handler.js +52 -0
- package/fuzztest/handlers/periodic.js +93 -0
- package/fuzztest/process.js +100 -0
- package/fuzztest/shared/chaos.js +28 -0
- package/fuzztest/shared/stream.js +40 -0
- package/package.json +2 -3
- package/plans/redis-options.md +279 -0
- package/src/client.js +42 -12
- package/src/constants.js +1 -1
- package/src/queasy.lua +2 -3
- package/src/types.ts +15 -0
- package/test/client.test.js +1 -1
- package/test/guards.test.js +13 -23
- package/test/manager.test.js +3 -1
- package/test/pool.test.js +4 -2
- package/test/queue.test.js +5 -4
package/CLAUDE.md
CHANGED
@@ -21,7 +21,7 @@ Tests require a running Redis instance. Use `docker:up` first if needed.
 
 Queasy is a Redis-backed job queue with **at-least-once** delivery semantics. The core logic lives in two layers:
 
-- **JS layer** (`src/
+- **JS layer** (`src/client.js`): The `Client` class accepts a `RedisOptions` object and constructs its own node-redis connection via `createClient` (plain object) or `createCluster` (object with `rootNodes`). On construction it connects, then uploads the Lua script to Redis via `FUNCTION LOAD REPLACE`. The connection is torn down in `close()` via `destroy()`.
 - **Lua layer** (`src/queasy.lua`): All queue state mutations are atomic Redis functions registered under the `queasy` library. No queue logic should be duplicated in JS — the Lua functions are the single source of truth for state transitions.
 
 ### Redis data structures
package/Readme.md
CHANGED
@@ -13,7 +13,7 @@ A Redis-backed job queue for Node.js, featuring (in comparison with design inspi
 
 ### Terminology
 
-A _client_ is an instance of Queasy
+A _client_ is an instance of Queasy. It manages its own Redis connection. A _job_ is the basic unit of work that is _dispatched_ into a _queue_.
 
 A _handler_ is JavaScript code that performs work. There are two kinds of handlers: _task handlers_, which process jobs, and _fail handlers_, which are invoked when a job fails permanently. Handlers run on _workers_, which are Node.js worker threads. By default, a Queasy client automatically creates one worker per CPU.
 
@@ -55,9 +55,14 @@ The response of the heartbeat Lua function indicates whether the client had been
 
 ## API
 
-### `
-Returns a Queasy client.
-- `
+### `new Client(options, workerCount)`
+Returns a Queasy client. Queasy creates and manages its own Redis connection internally.
+- `options`: connection options. Two forms are accepted:
+  - **Single node** (plain object): passed to node-redis `createClient`. Accepts `url`, `socket`, `username`, `password`, and `database`. Defaults to `{}` (connects to `localhost:6379`).
+  - **Cluster** (object with `rootNodes`): passed to node-redis `createCluster`. Accepts:
+    - `rootNodes`: array of per-node connection options (same fields as single-node form); at least three nodes are recommended.
+    - `defaults`: options shared across all nodes (e.g. auth and TLS).
+    - `nodeAddressMap`: address translation map for NAT environments.
 - `workerCount`: number; size of the worker pool. If 0, or if called in a queasy worker thread, no pool is created. Defaults to the number of CPUs.
 
 The client object returned is an EventEmitter, which emits a 'disconnect' event when it fails permanently for any reason, such as library version mismatch between different workers connected to the same Redis instance, or a lost locks situation. When this happens, in general the application should exit the worker process and allow the supervisor to restart it.
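To make the two documented `options` forms concrete, here is a sketch of both shapes together with the documented dispatch rule. `factoryFor` is a hypothetical illustration (the real logic lives inside `src/client.js`), and the host addresses are placeholders:

```javascript
// The two documented options shapes for `new Client(options, workerCount)`.
const singleNode = { url: 'redis://localhost:6379' };
const cluster = {
  rootNodes: [
    { url: 'redis://10.0.0.1:6379' },
    { url: 'redis://10.0.0.2:6379' },
    { url: 'redis://10.0.0.3:6379' },
  ],
  defaults: { password: 'secret' }, // shared across all nodes (auth/TLS)
};

// Hypothetical sketch of the documented dispatch rule: an object with
// rootNodes goes to node-redis createCluster, anything else to createClient.
function factoryFor(options = {}) {
  return 'rootNodes' in options ? 'createCluster' : 'createClient';
}

console.log(factoryFor(singleNode)); // createClient
console.log(factoryFor(cluster)); // createCluster
console.log(factoryFor({})); // createClient (connects to localhost:6379)
```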
package/fuzztest/Readme.md
ADDED

# Queasy Fuzz Test Plan

A long-running end-to-end fuzz test that simulates random failures and continuously verifies core system invariants.

## Invariants Verified

1. **Mutual exclusion**: Two jobs with the same Job ID are never processed by different clients or worker threads simultaneously.
2. **No re-processing of successful jobs**: A job that has succeeded is never processed again.
3. **Scheduling**: No job is processed before its `run_at` time.
4. **Priority ordering within a queue**: No job starts processing while another job in the same queue with a lower `run_at` is still waiting (i.e., eligible jobs are dequeued in order).
5. **Fail handler completeness**: If a fail handler is registered, every job that does not eventually succeed MUST result in the fail handler being invoked.
6. **Queue progress (priority starvation prevention)**: Non-empty queues at the highest priority level always make progress. When they drain, queues at the next priority level begin making progress.

## Structure Overview

```
fuzztest/
  Readme.md          # This file
  fuzz.js            # Orchestrator: spawns child processes, monitors shared state
  process.js         # Child process: sets up clients and listens on all queues
  handlers/
    periodic.js      # Re-queues itself; dispatches cascade jobs; occasionally stalls/crashes
    cascade-a.js     # Dispatched by periodic; dispatches into cascade-b
    cascade-b.js     # Dispatched by cascade-a; final handler
    fail-handler.js  # Shared fail handler for all queues; records invocations
  shared/
    chaos.js         # pickChaos(): selects a chaos behavior per invocation
    stream.js        # fuzz:events stream helpers (emitEvent, readEvents, STREAM_KEY)
```

## Process Architecture

The orchestrator (`fuzz.js`) spawns **N child processes** (default: 4). Each child process creates one Redis client and calls `listen()` on every queue. The orchestrator itself does not process jobs — it only monitors invariants and manages the lifecycle.

Handlers write events (job start, finish, fail, stall) directly to a Redis stream (`fuzz:events`). The orchestrator reads from this stream and maintains a shared in-memory log of events, checking invariants after each one. Child processes do not need to forward events to the orchestrator themselves — the stream is the shared channel.

Child processes are deliberately killed and restarted periodically to simulate crashes. A killed process's checked-out jobs will be swept and retried/failed by the remaining processes.

## Queue Configuration

Three queues at different priority levels, all listened on by every child process. Parameters are kept small to produce many events quickly:

| Parameter | `{fuzz}:periodic` | `{fuzz}:cascade-a` | `{fuzz}:cascade-b` |
|---|---|---|---|
| Handler | `periodic.js` | `cascade-a.js` | `cascade-b.js` |
| Priority | 300 | 200 | 100 |
| `maxRetries` | 3 | 3 | 3 |
| `maxStalls` | 2 | 2 | 2 |
| `minBackoff` | 200 ms | 200 ms | 200 ms |
| `maxBackoff` | 2 000 ms | 2 000 ms | 2 000 ms |
| `timeout` | 3 000 ms | 3 000 ms | 3 000 ms |
| `size` | 10 | 10 | 10 |
| `failHandler` | `fail-handler.js` | `fail-handler.js` | `fail-handler.js` |
| `failRetryOptions.maxRetries` | 5 | 5 | 5 |
| `failRetryOptions.minBackoff` | 200 ms | 200 ms | 200 ms |

The short `timeout` (3 s) means stalling jobs are detected and swept quickly. The short `minBackoff` / `maxBackoff` window (200 ms – 2 s) means retries cycle fast. With `maxRetries: 3` and `maxStalls: 2`, most failed jobs reach the fail handler within seconds.

## Periodic Jobs (Seed)

A fixed set of periodic job IDs (e.g., `periodic-0` through `periodic-4`) is dispatched by the orchestrator at startup. Each periodic handler:

1. Records the current processing event by writing `{ type: 'start', queue, id, threadId, clientId, startedAt }` to the `fuzz:events` Redis stream.
2. Optionally sleeps for a random short delay.
3. Dispatches a cascade-a job with a unique ID and a `runAt` randomly up to 2 seconds in the future.
4. Re-dispatches itself (same job ID, `updateRunAt: true`) with a delay of 1–5 seconds, so the job continues to fire periodically.
5. On success, writes `{ type: 'finish', queue, id, threadId, clientId, finishedAt }` to the `fuzz:events` stream.

The fail handler for periodic jobs also re-dispatches the same periodic job ID (with a delay), ensuring periodic jobs survive permanent failures. This lets the orchestrator assert that periodic jobs keep running indefinitely.

## Cascade Jobs

`cascade-a.js`:
- Records start/finish events.
- Dispatches one or two `cascade-b` jobs with unique IDs.
- Subject to all chaos behaviors (see below).

`cascade-b.js`:
- Records start/finish events.
- Terminal handler; does not dispatch further jobs.
- Subject to all chaos behaviors (see below).

## Chaos Behaviors

All handlers are subject to all chaos behaviors. The probabilities below are per-invocation and apply uniformly across `periodic.js`, `cascade-a.js`, and `cascade-b.js`:

| Behavior | Probability | Notes |
|---|---|---|
| Normal completion | ~65% | Dispatches downstream jobs (if any), then returns |
| Retriable error (throws `Error`) | ~15% | No downstream dispatch |
| Permanent error (throws `PermanentError`) | ~5% | No downstream dispatch |
| Stall (returns a never-resolving promise) | ~10% | Detected after `timeout` (3 s); counts as a stall |
| CPU spin (blocks the worker thread) | ~3% | Tight loop until the process detects the hang and kills the thread (via `timeout`) |
| Crash (causes the child process to exit) | ~2% | Handler posts a crash message to the main thread, which exits the process |

With `timeout: 3000`, stalling and spinning jobs are swept within ~3–13 seconds (timeout + heartbeat sweep interval). With `maxStalls: 2`, two stalls exhaust the stall budget and the job is sent to the fail handler, cycling fast.

When a child process crashes, the orchestrator detects the exit event and restarts a new child process after a short delay.
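The probability table above reduces to one uniform draw mapped onto cumulative thresholds. A sketch of what `shared/chaos.js`'s `pickChaos()` might look like (the shipped implementation may differ; the optional `r` parameter is an assumption added here for testability):

```javascript
// Map one uniform draw in [0, 1) onto the chaos table:
// 65% normal, 15% retriable, 5% permanent, 10% stall, 3% spin, 2% crash.
function pickChaos(r = Math.random()) {
  if (r < 0.65) return 'normal';
  if (r < 0.80) return 'retriable';
  if (r < 0.85) return 'permanent';
  if (r < 0.95) return 'stall';
  if (r < 0.98) return 'spin';
  return 'crash';
}

console.log(pickChaos(0.5)); // normal
console.log(pickChaos(0.99)); // crash
```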

## Event Logging and Invariant Checking

The orchestrator maintains an append-only in-memory event log. Each entry contains:
```js
{ type, queue, id, threadId, clientId, timestamp }
```
where `type` is one of: `start`, `finish`, `fail`, `stall`, `cancel`.

After each event is appended, the orchestrator runs incremental invariant checks:

### Invariant 1: Mutual Exclusion
Maintain a `Map<jobId, { clientId, threadId, startedAt }>` of currently-active jobs. On `start`, check that the job ID is not already in the map. On `finish`/`fail`/`stall`, remove it.

If a `start` event arrives for a job ID already in the map → **VIOLATION**.

### Invariant 2: No Re-processing of Succeeded Jobs
Maintain a `Set<jobId>` of successfully finished job IDs. On `start`, check that the ID is not in this set.

If a `start` event arrives for a job ID in the succeeded set → **VIOLATION**.

Note: Re-processing after a stall or retry is expected and must not be flagged.
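A minimal sketch of the two bookkeeping structures just described, simplified relative to the orchestrator's real state (no per-queue or per-thread detail):

```javascript
// Invariants 1 and 2: active-job map plus succeeded-job set.
const active = new Map(); // jobId → { startedAt }
const succeeded = new Set();

function onEvent(event) {
  const { type, id } = event;
  const violations = [];
  if (type === 'start') {
    if (active.has(id)) violations.push('MutualExclusion');
    if (succeeded.has(id)) violations.push('NoReprocess');
    active.set(id, { startedAt: Date.now() });
  } else {
    active.delete(id); // finish / fail / stall all clear the slot
    if (type === 'finish') succeeded.add(id);
  }
  return violations;
}

console.log(onEvent({ type: 'start', id: 'a' })); // []
console.log(onEvent({ type: 'start', id: 'a' })); // ['MutualExclusion']
onEvent({ type: 'finish', id: 'a' });
console.log(onEvent({ type: 'start', id: 'a' })); // ['NoReprocess']
```

Note how a `stall` clears the active slot without marking the job succeeded, so the expected re-processing after a stall is not flagged.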

### Invariant 3: Scheduling (No Early Processing)
Each `start` event includes `startedAt` (wall clock). Each job dispatch records an intended `runAt`. On `start`, verify `startedAt >= runAt - CLOCK_TOLERANCE_MS`.

If `startedAt < runAt - CLOCK_TOLERANCE_MS` → **VIOLATION**.

`CLOCK_TOLERANCE_MS` accounts for clock skew between the orchestrator, child processes, and Redis (default: 100ms).
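The check reduces to a single comparison per `start` event; a sketch with the tolerance applied (the helper name is illustrative):

```javascript
const CLOCK_TOLERANCE_MS = 100; // default from the section above

// Invariant 3: a start is too early only if it beats runAt by more
// than the clock-skew tolerance.
function startedTooEarly(startedAt, runAt) {
  return startedAt < runAt - CLOCK_TOLERANCE_MS;
}

console.log(startedTooEarly(1_000, 1_050)); // false: 50ms early, within tolerance
console.log(startedTooEarly(1_000, 1_200)); // true: 200ms early, flagged
```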

### Invariant 4: Priority Ordering
Track the earliest-known `runAt` for jobs dispatched into each queue but not yet started. When a `start` event arrives for a job in that queue, verify that no other eligible job (with `runAt <= now`) in the same queue has a lower `runAt` and has been waiting longer.

This invariant is best-effort and checked with a configurable lag (e.g., 200ms) to account for the inherent race between dequeue polling and dispatch. A violation is only flagged when the ordering difference exceeds this lag.

### Invariant 5: Fail Handler Completeness
Track every job that has been dispatched (by ID). When a job exceeds its `maxRetries` or receives a permanent error, a fail event should be observed. Maintain a map `{ jobId → { exhausted: bool, failSeen: bool } }`. After a configurable drain period (e.g., 30 seconds after a queue goes quiet), check that every exhausted job has a corresponding `fail` event.
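The exhausted/failSeen bookkeeping might be sketched as follows (hypothetical and simplified: the real check also waits for the drain period before reporting):

```javascript
// Invariant 5: every exhausted job must eventually produce a fail event.
const jobs = new Map(); // jobId → { exhausted, failSeen }

function track(jobId) {
  if (!jobs.has(jobId)) jobs.set(jobId, { exhausted: false, failSeen: false });
  return jobs.get(jobId);
}

function onExhausted(jobId) { track(jobId).exhausted = true; }
function onFailEvent(jobId) { track(jobId).failSeen = true; }

// Run after the drain period: exhausted jobs with no fail event are violations.
function missingFailHandlers() {
  return [...jobs].filter(([, s]) => s.exhausted && !s.failSeen).map(([id]) => id);
}

onExhausted('job-1');
onExhausted('job-2');
onFailEvent('job-1');
console.log(missingFailHandlers()); // ['job-2']
```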

### Invariant 6: Queue Progress
The orchestrator monitors the time since the last `start` event per queue. If a queue is known to be non-empty (based on dispatched vs finished counts) and no `start` event has been seen for more than a configurable `STALL_THRESHOLD_MS` (e.g., 30 seconds), flag a progress violation.

Priority starvation is checked by verifying that the low-priority queue does not process jobs while the high-priority queue has outstanding jobs older than the dequeue poll interval.

## Output and Reporting

Violations are logged to stdout and to `fuzz-output.log` with full context. The process does **not** exit on a violation — it logs and continues, accumulating a count of violations. A summary is printed periodically (every 60 seconds) and on `SIGINT`.

Log format (newline-delimited JSON):
```json
{ "time": "...", "level": "info|warn|error", "msg": "...", "data": { ... } }
```

Violation entries use level `"error"` and include the invariant name, the offending event, and relevant recent history.

## Configuration

All tunable parameters live at the top of `fuzz.js` as named constants:

```js
const NUM_PROCESSES = 4;         // Child processes
const NUM_PERIODIC_JOBS = 5;     // Fixed periodic job IDs
const PERIODIC_MIN_DELAY = 1000; // ms before re-queuing self
const PERIODIC_MAX_DELAY = 5000;
const CRASH_INTERVAL_MS = 30000; // Orchestrator kills a random child process this often
const CLOCK_TOLERANCE_MS = 100;
const STALL_THRESHOLD_MS = 30000;
const PRIORITY_LAG_MS = 200;
const LOG_FILE = 'fuzz-output.log';
```

## Running

The fuzz test is separate from the default test suite and is never run by `npm test`. It is started manually:

```sh
node fuzztest/fuzz.js
```

It runs indefinitely. Stop with `Ctrl+C`. A summary of violations and events processed will be printed on exit.

## Notes on Implementation

- Child processes use the `queasy` library's public API (`queue()`, `dispatch()`, `listen()`). They do not talk directly to Redis.
- The orchestrator does not import from `src/`; it only spawns child processes, and it learns about child process lifecycle from the `spawn` and `exit` events.
- All handler modules in `fuzztest/handlers/` must be self-contained ESM modules that can be passed as `handlerPath` to `queue.listen()`.
- Handlers write events to the `fuzz:events` Redis stream using a dedicated Redis client created at handler module load time. The orchestrator reads from this stream via `XREAD BLOCK`. This is the only communication channel between handlers and the orchestrator — no IPC is used.
- The chaos crash behavior must be triggered from the child process's main thread, not from inside a handler's worker thread. To simulate a crash, the handler uses `postMessage` (over a `BroadcastChannel`) to send a `{ type: 'crash' }` message to the main thread, which listens for it and calls `process.exit()`.
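The `XREAD BLOCK` loop mentioned in the notes above can be sketched as an async generator in the shape of `shared/stream.js`'s `readEvents`. This is hypothetical: the real helper's signature and the node-redis call differ, and a stub client stands in for Redis so the sketch runs standalone:

```javascript
// Sketch of readEvents: repeatedly read from the fuzz:events stream,
// yield each entry's fields, and advance the cursor to the last-seen ID
// so no entry is delivered twice.
const STREAM_KEY = 'fuzz:events';

async function* readEvents(client, startId = '0') {
  let cursor = startId;
  for (;;) {
    const batch = await client.xRead(STREAM_KEY, cursor);
    if (batch === null) return; // stub signals end-of-stream; real XREAD BLOCK waits
    for (const { id, fields } of batch) {
      cursor = id; // resume after the last delivered entry
      yield fields;
    }
  }
}

// Stub client replaying two entries, then ending.
const entries = [
  { id: '1-0', fields: { type: 'start', queue: '{fuzz}:periodic', id: 'fuzz-periodic-0' } },
  { id: '2-0', fields: { type: 'finish', queue: '{fuzz}:periodic', id: 'fuzz-periodic-0' } },
];
const stub = {
  async xRead(_key, cursor) {
    const rest = entries.filter((e) => e.id > cursor);
    return rest.length ? rest : null;
  },
};

const seen = [];
for await (const event of readEvents(stub, '0')) seen.push(event.type);
console.log(seen.join(',')); // start,finish
```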
package/fuzztest/fuzz.js
ADDED

/**
 * Fuzz test orchestrator.
 *
 * - Spawns NUM_PROCESSES child processes, each running fuzztest/process.js
 * - Dispatches seed periodic jobs at startup
 * - Reads events from the fuzz:events Redis stream
 * - Checks system invariants after each event
 * - Logs violations without terminating
 * - Prints a summary every 60 seconds and on SIGINT
 */

import { fork } from 'node:child_process';
import { createWriteStream } from 'node:fs';
import { dirname, join } from 'node:path';
import { fileURLToPath } from 'node:url';
import { createClient } from 'redis';
import { Client } from '../src/index.js';
import { readEvents, STREAM_KEY } from './shared/stream.js';

const __dirname = dirname(fileURLToPath(import.meta.url));

// ── Configuration ──────────────────────────────────────────────────────────────

const NUM_PROCESSES = 4;
const NUM_PERIODIC_JOBS = 5;
const CRASH_INTERVAL_MS = 30_000; // kill a random process this often
const CLOCK_TOLERANCE_MS = 200; // allow this much early-start slop
const STALL_THRESHOLD_MS = 30_000; // no start event → progress violation
const PRIORITY_LAG_MS = 500; // ordering slop between queues
const PROCESS_RESTART_DELAY_MS = 500;
const SUMMARY_INTERVAL_MS = 60_000;
const LOG_FILE = join(__dirname, '..', 'fuzz-output.log');

// ── Logging ────────────────────────────────────────────────────────────────────

const logStream = createWriteStream(LOG_FILE, { flags: 'a' });

function log(level, msg, data = {}) {
  const entry = JSON.stringify({ time: new Date().toISOString(), level, msg, data });
  if (level === 'error') process.stdout.write(`VIOLATION: ${entry}\n`);
  else process.stdout.write(`${entry}\n`);
  logStream.write(`${entry}\n`);
}

// ── Invariant state ────────────────────────────────────────────────────────────

/** @type {Map<string, {queue: string, startedAt: number, runAt: number, pid: number}>} */
const activeJobs = new Map();

/** @type {Set<string>} */
const succeededJobs = new Set();

/**
 * Per-queue: list of {id, runAt, dispatchedAt} for jobs seen but not yet started.
 * Used to check priority ordering.
 * @type {Map<string, {id: string, runAt: number, dispatchedAt: number}[]>}
 */
const waitingByQueue = new Map();

/** @type {Map<string, number>} last start event timestamp per queue */
const lastStartPerQueue = new Map();

let violationCount = 0;
let eventCount = 0;

function violation(invariant, msg, data = {}) {
  violationCount++;
  log('error', `[${invariant}] ${msg}`, data);
}

// ── Invariant checks ───────────────────────────────────────────────────────────

/**
 * Called when a child process dequeues a job (via IPC).
 * @param {number} pid
 * @param {{queue: string, jobId: string, runAt: number}} msg
 */
function onIpcDequeue(pid, msg) {
  const { jobId: id, queue, runAt } = msg;

  // Mutual exclusion: job must not already be active
  if (activeJobs.has(id)) {
    const existing = activeJobs.get(id);
    violation('MutualExclusion', `Job ${id} dequeued while already active`, {
      existing,
      newDequeue: { queue, pid },
    });
  }

  activeJobs.set(id, { queue, pid, startedAt: Date.now(), runAt });
}

/**
 * Called when a child process finishes/retries/fails a job (via IPC).
 * @param {string} jobId
 */
function onIpcJobDone(jobId) {
  activeJobs.delete(jobId);
}

function onStart(event) {
  const { id, queue, runAt: runAtStr, startedAt: startedAtStr } = event;
  const runAt = Number(runAtStr);
  const startedAt = Number(startedAtStr);

  // No re-processing of succeeded jobs (except periodic which re-queues itself)
  if (succeededJobs.has(id) && !id.startsWith('fuzz-periodic-')) {
    violation('NoReprocess', `Job ${id} started after already succeeding`, {
      queue,
      startedAt,
    });
  }

  // Scheduling: not before runAt
  if (runAt > 0 && startedAt < runAt - CLOCK_TOLERANCE_MS) {
    violation('Scheduling', `Job ${id} started ${runAt - startedAt}ms too early`, {
      queue,
      runAt,
      startedAt,
      delta: runAt - startedAt,
    });
  }

  // Priority ordering: no eligible lower-runAt job in same queue waiting
  const waiting = waitingByQueue.get(queue) ?? [];
  for (const w of waiting) {
    if (w.id === id) continue;
    if (w.runAt <= startedAt - CLOCK_TOLERANCE_MS) {
      if (runAt > w.runAt + PRIORITY_LAG_MS) {
        violation(
          'Ordering',
          `Job ${id} (runAt=${runAt}) started before ${w.id} (runAt=${w.runAt}) in ${queue}`,
          {
            startedId: id,
            startedRunAt: runAt,
            waitingId: w.id,
            waitingRunAt: w.runAt,
          }
        );
        break;
      }
    }
  }

  lastStartPerQueue.set(queue, startedAt);

  // Remove from waiting list
  if (waitingByQueue.has(queue)) {
    waitingByQueue.set(
      queue,
      waitingByQueue.get(queue).filter((w) => w.id !== id)
    );
  }
}

function onFinish(event) {
  succeededJobs.add(event.id);
}

/**
 * Called when a child process exits. Clears all active jobs belonging to that
 * PID so they don't trigger spurious MutualExclusion violations when the
 * queasy sweep retries them and a new process picks them up.
 * @param {number} pid
 */
function onProcessExit(pid) {
  for (const [id, entry] of activeJobs) {
    if (entry.pid === pid) {
      activeJobs.delete(id);
    }
  }
}

/**
 * Called periodically to check queue progress and priority starvation.
 */
function checkProgress() {
  const now = Date.now();
  for (const [queue, lastStart] of lastStartPerQueue) {
    const idle = now - lastStart;
    if (idle > STALL_THRESHOLD_MS) {
      violation('Progress', `Queue ${queue} has not processed a job in ${idle}ms`, {
        queue,
        lastStartAt: lastStart,
        idleMs: idle,
      });
    }
  }

  // Priority starvation: cascade-b should not start while periodic has old eligible jobs
  const periodicWaiting = waitingByQueue.get('{fuzz}:periodic') ?? [];
  const eligiblePeriodic = periodicWaiting.filter((w) => w.runAt <= now - PRIORITY_LAG_MS);
  if (eligiblePeriodic.length > 0) {
    const lastBStart = lastStartPerQueue.get('{fuzz}:cascade-b') ?? 0;
    const bStartedAfterPeriodic = eligiblePeriodic.some(
      (w) => lastBStart > w.dispatchedAt + PRIORITY_LAG_MS
    );
    if (bStartedAfterPeriodic) {
      violation(
        'PriorityStarvation',
        'cascade-b processed while periodic had eligible waiting jobs',
        {
          eligiblePeriodicCount: eligiblePeriodic.length,
        }
      );
    }
  }
}

// ── Event dispatch ─────────────────────────────────────────────────────────────

function handleEvent(event) {
  eventCount++;
  const { type } = event;
  log('info', 'event', event);

  if (type === 'start') {
    const { id, queue, runAt } = event;
    // Register in waiting list for ordering checks (before onStart removes it)
    const runAtNum = Number(runAt);
    const q = waitingByQueue.get(queue) ?? [];
    if (!q.find((w) => w.id === id)) {
      q.push({ id, runAt: runAtNum, dispatchedAt: Date.now() });
      waitingByQueue.set(queue, q);
    }
    onStart(event);
  } else if (type === 'finish') {
    onFinish(event);
  }
}

// ── Summary ────────────────────────────────────────────────────────────────────

function printSummary() {
  const summary = {
    events: eventCount,
    violations: violationCount,
    activeJobs: activeJobs.size,
    succeededJobs: succeededJobs.size,
    lastStartPerQueue: Object.fromEntries(lastStartPerQueue),
  };
  log('info', 'Summary', summary);
  console.log(`\n=== Fuzz Summary ===`);
  console.log(`  Events processed : ${eventCount}`);
  console.log(`  Violations found : ${violationCount}`);
  console.log(`  Active jobs : ${activeJobs.size}`);
  console.log(`  Succeeded jobs : ${succeededJobs.size}`);
  console.log('===================\n');
}

// ── Child process management ───────────────────────────────────────────────────

/** @type {Set<import('node:child_process').ChildProcess>} */
const processes = new Set();

function spawnProcess() {
  const child = fork(join(__dirname, 'process.js'));
  processes.add(child);

  child.on('message', (msg) => {
    if (msg.type === 'dequeue') {
      onIpcDequeue(child.pid, msg);
    } else if (msg.type === 'finish' || msg.type === 'retry' || msg.type === 'fail') {
      onIpcJobDone(msg.jobId);
    }
  });

  child.on('exit', (code, signal) => {
    processes.delete(child);
    log('info', 'Child process exited', { pid: child.pid, code, signal });
    onProcessExit(child.pid);
    setTimeout(spawnProcess, PROCESS_RESTART_DELAY_MS);
  });

  child.on('error', (err) => {
    log('info', 'Child process error', { pid: child.pid, error: err.message });
  });

  return child;
}

function killRandomProcess() {
  const list = [...processes];
  if (list.length === 0) return;
  const target = list[Math.floor(Math.random() * list.length)];
  log('info', 'Killing random child process', { pid: target.pid });
  target.kill('SIGKILL');
}

// ── Redis setup ────────────────────────────────────────────────────────────────

const redis = createClient();
const dispatchRedis = createClient();

await redis.connect();
await dispatchRedis.connect();

// Clean up state from previous runs
await redis.del(STREAM_KEY);
log('info', 'Cleared fuzz:events stream from previous run');

// Dispatch seed periodic jobs (await ready to avoid Function not found race)
const dispatchClient = await new Promise((resolve) => new Client(dispatchRedis, 0, resolve));
const periodicQueue = dispatchClient.queue('{fuzz}:periodic', true);

for (let i = 0; i < NUM_PERIODIC_JOBS; i++) {
  const id = `fuzz-periodic-${i}`;
  await periodicQueue.dispatch({ periodic: true, index: i }, { id });
  log('info', `Dispatched seed job ${id}`);
}

// ── Spawn child processes ──────────────────────────────────────────────────────

for (let i = 0; i < NUM_PROCESSES; i++) {
  spawnProcess();
}

// Periodically kill a random process to simulate crashes
const crashTimer = setInterval(killRandomProcess, CRASH_INTERVAL_MS);

// Periodic progress + starvation check
const progressTimer = setInterval(checkProgress, 10_000);

// Summary timer
const summaryTimer = setInterval(printSummary, SUMMARY_INTERVAL_MS);

log('info', 'Orchestrator started', {
  numProcesses: NUM_PROCESSES,
  numPeriodicJobs: NUM_PERIODIC_JOBS,
  logFile: LOG_FILE,
});

// ── SIGINT handler ─────────────────────────────────────────────────────────────

process.on('SIGINT', () => {
  clearInterval(crashTimer);
  clearInterval(progressTimer);
  clearInterval(summaryTimer);

  for (const child of processes) {
    child.kill('SIGKILL');
  }

  printSummary();
  process.exit(violationCount > 0 ? 1 : 0);
});

// ── Event loop ─────────────────────────────────────────────────────────────────

// Read from the beginning. We cleared the stream above, so '0' reads all
// events from the fresh start without missing anything.
for await (const event of readEvents(redis, '0')) {
  handleEvent(event);
}
package/fuzztest/handlers/cascade-a.js
ADDED

/**
 * cascade-a handler — dispatches one or two cascade-b jobs on normal completion.
 * Subject to all chaos behaviors.
 */

import { BroadcastChannel } from 'node:worker_threads';
import { createClient } from 'redis';
import { Client, PermanentError } from '../../src/index.js';
import { pickChaos } from '../shared/chaos.js';
import { emitEvent } from '../shared/stream.js';

const redis = createClient();
const eventRedis = createClient();

await redis.connect();
await eventRedis.connect();

// Dispatch-only queasy client (await ready to avoid Function not found race)
const client = await new Promise((resolve) => new Client(redis, 0, resolve));
const cascadeBQueue = client.queue('{fuzz}:cascade-b', true);

const crashChannel = new BroadcastChannel('fuzz-crash');

/**
 * @param {any} data
 * @param {import('../../src/types.js').Job} job
 */
export async function handle(_data, job) {
  const startedAt = Date.now();
  await emitEvent(eventRedis, {
    type: 'start',
    queue: '{fuzz}:cascade-a',
    id: job.id,
    pid: String(process.pid),
    runAt: String(job.runAt),
    startedAt: String(startedAt),
  });

  const chaos = pickChaos();
  await emitEvent(eventRedis, {
    type: 'chaos',
    queue: '{fuzz}:cascade-a',
    id: job.id,
    chaos,
  });

  if (chaos === 'crash') {
    crashChannel.postMessage({ type: 'crash' });
    await new Promise(() => {});
  }

  if (chaos === 'stall') {
    await new Promise(() => {});
  }

  if (chaos === 'spin') {
    const end = Date.now() + 10_000;
    while (Date.now() < end) {
      /* busy wait */
    }
  }

  if (chaos === 'permanent') {
    throw new PermanentError('cascade-a: permanent chaos');
  }

  if (chaos === 'retriable') {
    throw new Error('cascade-a: retriable chaos');
  }

  // Normal completion: dispatch 1-2 cascade-b jobs
  const count = Math.random() < 0.5 ? 1 : 2;
  const runAtOffset = Math.random() * 2000;
  const dispatchPromises = [];
  for (let i = 0; i < count; i++) {
    dispatchPromises.push(
      cascadeBQueue.dispatch(
        { from: job.id, index: i },
        {
          runAt: Date.now() + runAtOffset,
        }
      )
    );
  }
  const ids = await Promise.all(dispatchPromises);

  await emitEvent(eventRedis, {
    type: 'finish',
    queue: '{fuzz}:cascade-a',
    id: job.id,
    finishedAt: String(Date.now()),
    dispatched: ids.join(','),
  });
}