groupmq-plus 1.1.0

Files changed (40)
  1. package/LICENSE +59 -0
  2. package/README.md +722 -0
  3. package/dist/index.cjs +2567 -0
  4. package/dist/index.cjs.map +1 -0
  5. package/dist/index.d.cts +1300 -0
  6. package/dist/index.d.ts +1300 -0
  7. package/dist/index.js +2557 -0
  8. package/dist/index.js.map +1 -0
  9. package/dist/lua/change-delay.lua +62 -0
  10. package/dist/lua/check-stalled.lua +86 -0
  11. package/dist/lua/clean-status.lua +64 -0
  12. package/dist/lua/cleanup-poisoned-group.lua +46 -0
  13. package/dist/lua/cleanup.lua +46 -0
  14. package/dist/lua/complete-and-reserve-next-with-metadata.lua +221 -0
  15. package/dist/lua/complete-with-metadata.lua +190 -0
  16. package/dist/lua/complete.lua +51 -0
  17. package/dist/lua/dead-letter.lua +86 -0
  18. package/dist/lua/enqueue-batch.lua +149 -0
  19. package/dist/lua/enqueue-flow.lua +107 -0
  20. package/dist/lua/enqueue.lua +154 -0
  21. package/dist/lua/get-active-count.lua +6 -0
  22. package/dist/lua/get-active-jobs.lua +6 -0
  23. package/dist/lua/get-delayed-count.lua +5 -0
  24. package/dist/lua/get-delayed-jobs.lua +5 -0
  25. package/dist/lua/get-unique-groups-count.lua +13 -0
  26. package/dist/lua/get-unique-groups.lua +15 -0
  27. package/dist/lua/get-waiting-count.lua +11 -0
  28. package/dist/lua/get-waiting-jobs.lua +15 -0
  29. package/dist/lua/heartbeat.lua +22 -0
  30. package/dist/lua/is-empty.lua +35 -0
  31. package/dist/lua/promote-delayed-jobs.lua +40 -0
  32. package/dist/lua/promote-delayed-one.lua +44 -0
  33. package/dist/lua/promote-staged.lua +70 -0
  34. package/dist/lua/record-job-result.lua +143 -0
  35. package/dist/lua/remove.lua +55 -0
  36. package/dist/lua/reserve-atomic.lua +114 -0
  37. package/dist/lua/reserve-batch.lua +141 -0
  38. package/dist/lua/reserve.lua +161 -0
  39. package/dist/lua/retry.lua +53 -0
  40. package/package.json +92 -0
package/README.md ADDED
@@ -0,0 +1,722 @@
<p align="center">
  <img src="website/public/favicon/web-app-manifest-512x512.png" width="200px" height="200px" />
  <h1 align="center"><b>GroupMQ, Redis Group Queue</b></h1>
  <p align="center">
    A fast, reliable Redis-backed per-group FIFO queue for Node + TypeScript with guaranteed job ordering and parallel processing across groups.
    <br />
    <br />
    <a href="https://openpanel-dev.github.io/groupmq/">Website</a>
    ·
    <a href="https://openpanel.dev">Created by OpenPanel.dev</a>
  </p>
  <br />
  <br />
</p>

## Install

```bash
npm i groupmq ioredis
```

## Quick start

```ts
import Redis from "ioredis";
import { Queue, Worker } from "groupmq";

const redis = new Redis("redis://127.0.0.1:6379");

const queue = new Queue({
  redis,
  namespace: "orders", // Will be prefixed with 'groupmq:'
  jobTimeoutMs: 30_000, // How long before a job times out
  logger: true, // Enable logging (optional)
});

await queue.add({
  groupId: "user:42",
  data: { type: "charge", amount: 999 },
  orderMs: Date.now(), // or event.createdAtMs
  maxAttempts: 5,
});

const worker = new Worker({
  queue,
  concurrency: 1, // Process 1 job at a time (increase for parallel processing)
  handler: async (job) => {
    console.log("Processing:", job.data);
  },
});

worker.run();
```

## Key Features

- **Per-group FIFO ordering** - Jobs within the same group process in strict order, perfect for user workflows, data pipelines, and sequential operations
- **Parallel processing across groups** - Process multiple groups simultaneously while maintaining order within each group
- **BullMQ-compatible API** - Familiar interface with enhanced group-based capabilities
- **High performance** - High throughput with low latency ([see benchmarks](https://openpanel-dev.github.io/groupmq/benchmarks/))
- **Built-in ordering strategies** - Handle out-of-order job arrivals with `'none'`, `'scheduler'`, or `'in-memory'` methods
- **Automatic recovery** - Stalled-job detection and connection error handling with exponential backoff
- **Production ready** - Atomic operations, graceful shutdown, and comprehensive logging
- **Zero polling** - Efficient blocking operations prevent wasteful Redis calls

## Inspiration from BullMQ

GroupMQ is heavily inspired by [BullMQ](https://github.com/taskforcesh/bullmq), a fantastic library and one of the most popular Redis-based job queue libraries for Node.js. We've taken many core concepts and design patterns from BullMQ while adapting them for our specific use case of per-group FIFO processing.

### Key differences from BullMQ

- **Per-group FIFO ordering** - Jobs within the same group are processed in strict order
- **Group-based concurrency** - Only one job per group can be active at a time
- **Ordered processing** - Built-in support for `orderMs` timestamp-based ordering
- **Cross-group parallelism** - Multiple groups can be processed simultaneously
- **No job types** - Simplified to a single job type; use union-typed data instead, e.g. `{ type: 'paint', data: { ... } } | { type: 'repair', data: { ... } }`

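To illustrate the last point, a single job type with union-typed data can be narrowed inside the handler. A small sketch (the `PaintOrRepair` type and `describe` helper are illustrative, not part of the groupmq API):

```typescript
// Illustrative union-typed job payload; `describe` stands in for a handler body.
type PaintOrRepair =
  | { type: 'paint'; data: { color: string } }
  | { type: 'repair'; data: { partId: number } };

function describe(payload: PaintOrRepair): string {
  // TypeScript narrows the union on the `type` discriminant.
  switch (payload.type) {
    case 'paint':
      return `paint in ${payload.data.color}`;
    case 'repair':
      return `repair part ${payload.data.partId}`;
  }
}
```

In a worker, `job.data` would carry such a union and the handler switches on its discriminant field.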
We're grateful to the BullMQ team for their excellent work and the foundation they've provided for the Redis job queue ecosystem.

### Third-Party Code Attribution

While GroupMQ is inspired by BullMQ's design and concepts, we have also directly copied some code from BullMQ:

- **`src/async-fifo-queue.ts`** - This file contains code copied from BullMQ's AsyncFifoQueue implementation. BullMQ's implementation is well-designed and fits our needs perfectly, so we've used it directly rather than reimplementing it.

This code is used under the MIT License. The original copyright notice and license can be found at:
- BullMQ Repository: https://github.com/taskforcesh/bullmq
- BullMQ License: https://github.com/taskforcesh/bullmq/blob/main/LICENSE

Original copyright: Copyright (c) Taskforce.sh and contributors

### Queue Options

```ts
type QueueOptions = {
  redis: Redis; // Redis client instance (required)
  namespace: string; // Unique queue name, gets 'groupmq:' prefix (required)
  logger?: boolean | LoggerInterface; // Enable logging (default: false)
  jobTimeoutMs?: number; // Job processing timeout (default: 30000ms)
  maxAttempts?: number; // Default max retry attempts (default: 3)
  reserveScanLimit?: number; // Groups to scan when reserving (default: 20)
  keepCompleted?: number; // Number of completed jobs to retain (default: 0)
  keepFailed?: number; // Number of failed jobs to retain (default: 0)
  schedulerLockTtlMs?: number; // Scheduler lock TTL (default: 1500ms)
  orderingMethod?: OrderingMethod; // Ordering strategy (default: 'none')
  orderingWindowMs?: number; // Time window for ordering (required for non-'none' methods)
  orderingMaxWaitMultiplier?: number; // Max grace period multiplier for in-memory (default: 3)
  orderingGracePeriodDecay?: number; // Grace period decay factor for in-memory (default: 1.0)
  orderingMaxBatchSize?: number; // Max jobs to collect in batch for in-memory (default: 10)
};

type OrderingMethod = 'none' | 'scheduler' | 'in-memory';
```

**Ordering Methods:**
- `'none'` - No ordering guarantees (fastest, zero overhead, no extra latency)
- `'scheduler'` - Redis buffering for large windows (≥1000ms, requires scheduler, adds latency)
- `'in-memory'` - Worker collection for small windows (50-500ms, no scheduler, adds latency per batch)

See [Ordering Methods](https://openpanel-dev.github.io/groupmq/docs/ordering-methods) for detailed comparison.

### Worker Options

```ts
type WorkerOptions<T> = {
  queue: Queue<T>; // Queue instance to process jobs from (required)
  handler: (job: ReservedJob<T>) => Promise<unknown>; // Job processing function (required)
  name?: string; // Worker name for logging (default: queue.name)
  logger?: boolean | LoggerInterface; // Enable logging (default: false)
  concurrency?: number; // Number of jobs to process in parallel (default: 1)
  heartbeatMs?: number; // Heartbeat interval (default: Math.max(1000, jobTimeoutMs/3))
  onError?: (err: unknown, job?: ReservedJob<T>) => void; // Error handler
  maxAttempts?: number; // Max retry attempts (default: queue.maxAttempts)
  backoff?: BackoffStrategy; // Retry backoff function (default: exponential with jitter)
  enableCleanup?: boolean; // Periodic cleanup (default: true)
  cleanupIntervalMs?: number; // Cleanup frequency (default: 60000ms)
  schedulerIntervalMs?: number; // Scheduler frequency (default: adaptive)
  blockingTimeoutSec?: number; // Blocking reserve timeout (default: 5s)
  atomicCompletion?: boolean; // Atomic completion + next reserve (default: true)
  stalledInterval?: number; // Check if stalled every N ms (default: 30000)
  maxStalledCount?: number; // Fail after N stalls (default: 1)
  stalledGracePeriod?: number; // Grace period before considering stalled (default: 0)
};

type BackoffStrategy = (attempt: number) => number; // returns delay in ms
```

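A `BackoffStrategy` is just a function from attempt number to delay in milliseconds. A sketch of exponential backoff with jitter (illustrative only; the library's built-in default may differ in detail):

```typescript
// Illustrative BackoffStrategy: exponential growth with jitter in the upper
// half of the window, capped at 30 seconds. `attempt` is 1-based.
const backoffWithJitter = (attempt: number): number => {
  const base = Math.min(30_000, 1_000 * 2 ** (attempt - 1)); // 1s, 2s, 4s, ...
  return Math.round(base / 2 + Math.random() * (base / 2)); // in [base/2, base]
};
```

Such a function can be passed as `backoff` in `WorkerOptions`.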
### Job Options

When adding a job to the queue:

```ts
await queue.add({
  groupId: string; // Required: Group ID for FIFO processing
  data: T; // Required: Job payload data
  orderMs?: number; // Timestamp for ordering (default: Date.now())
  maxAttempts?: number; // Max retry attempts (default: queue.maxAttempts)
  jobId?: string; // Custom job ID (default: auto-generated UUID)
  delay?: number; // Delay in ms before job becomes available
  runAt?: Date | number; // Specific time to run the job
  repeat?: RepeatOptions; // Repeating job configuration (cron or interval)
});

type RepeatOptions =
  | { every: number } // Repeat every N milliseconds
  | { pattern: string }; // Cron pattern (standard 5-field format)
```

**Example with delay:**
```ts
await queue.add({
  groupId: 'user:123',
  data: { action: 'send-reminder' },
  delay: 3600000, // Run in 1 hour
});
```

**Example with specific time:**
```ts
await queue.add({
  groupId: 'user:123',
  data: { action: 'scheduled-report' },
  runAt: new Date('2025-12-31T23:59:59Z'),
});
```

## Worker Concurrency

Workers support configurable concurrency to process multiple jobs in parallel from different groups:

```ts
const worker = new Worker({
  queue,
  concurrency: 8, // Process up to 8 jobs simultaneously
  handler: async (job) => {
    // Jobs from different groups can run in parallel
    // Jobs from the same group still run sequentially
  },
});
```

**Benefits:**
- Higher throughput for multi-group workloads
- Efficient resource utilization
- Still maintains per-group FIFO ordering

**Considerations:**
- Each job consumes memory and resources
- Set concurrency based on job duration and system resources
- Monitor Redis connection pool (ioredis default: 10 connections)

## Logging

Both Queue and Worker support optional logging for debugging and monitoring:

```ts
// Enable default logger
const queue = new Queue({
  redis,
  namespace: 'orders',
  logger: true, // Logs to console with queue name prefix
});

const worker = new Worker({
  queue,
  logger: true, // Logs to console with worker name prefix
  handler: async (job) => { /* ... */ },
});
```

**Custom logger:**

Works out of the box with both `pino` and `winston`:

```ts
import type { LoggerInterface } from 'groupmq';

const customLogger: LoggerInterface = {
  debug: (msg: string, ...args: any[]) => { /* custom logging */ },
  info: (msg: string, ...args: any[]) => { /* custom logging */ },
  warn: (msg: string, ...args: any[]) => { /* custom logging */ },
  error: (msg: string, ...args: any[]) => { /* custom logging */ },
};

const queue = new Queue({
  redis,
  namespace: 'orders',
  logger: customLogger,
});
```

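Because the interface is structural, any object with those four methods works. A small recording logger (a hypothetical helper, handy in tests; the local `Level` type mirrors the documented `LoggerInterface` method names):

```typescript
// A logger that records entries instead of printing. Its shape matches the
// documented LoggerInterface (debug/info/warn/error).
type Level = 'debug' | 'info' | 'warn' | 'error';

function recordingLogger() {
  const entries: Array<{ level: Level; msg: string }> = [];
  const log = (level: Level) => (msg: string, ..._args: unknown[]) => {
    entries.push({ level, msg });
  };
  return {
    entries, // inspect what was logged
    debug: log('debug'),
    info: log('info'),
    warn: log('warn'),
    error: log('error'),
  };
}
```

Passing `logger: recordingLogger()` to a Queue or Worker would then capture everything it logs.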
**What gets logged:**
- Job reservation and completion
- Error handling and retries
- Scheduler runs and delayed job promotions
- Group locking and unlocking
- Redis connection events
- Performance warnings

## Repeatable jobs (cron/interval)

GroupMQ supports simple repeatable jobs using either a fixed interval (`every`) or a basic cron pattern (`pattern`). Repeats are materialized by a lightweight scheduler that runs as part of the worker's periodic cleanup cycle.

### Add a repeating job (every N ms)

```ts
await queue.add({
  groupId: 'reports',
  data: { type: 'daily-summary' },
  repeat: { every: 5000 }, // run every 5 seconds
});

const worker = new Worker({
  queue,
  handler: async (job) => {
    // process...
  },
  // IMPORTANT: For timely repeats, run the scheduler frequently
  cleanupIntervalMs: 1000, // <= repeat.every (recommended 1–2s for 5s repeats)
});

worker.run();
```

### Add a repeating job (cron pattern)

```ts
await queue.add({
  groupId: 'emails',
  data: { type: 'weekly-digest' },
  repeat: { pattern: '0 9 * * 1-5' }, // 09:00 Mon–Fri
});
```

### Remove a repeating job

```ts
await queue.removeRepeatingJob('reports', { every: 5000 });
// or
await queue.removeRepeatingJob('emails', { pattern: '0 9 * * 1-5' });
```

### Scheduler behavior and best practices

- The worker's periodic cycle runs: `cleanup()`, `promoteDelayedJobs()`, and `processRepeatingJobs()`.
- Repeating jobs are enqueued during this cycle via a distributed scheduler with lock coordination.
- **Minimum practical repeat interval:** ~1.5-2 seconds (controlled by `schedulerLockTtlMs`, default: 1500ms)
- For sub-second repeats (not recommended in production):

  ```ts
  const queue = new Queue({
    redis,
    namespace: 'fast',
    schedulerLockTtlMs: 50, // Allow fast scheduler lock
  });

  const worker = new Worker({
    queue,
    schedulerIntervalMs: 10, // Check every 10ms
    cleanupIntervalMs: 100, // Cleanup every 100ms
    handler: async (job) => { /* ... */ },
  });
  ```

  ⚠️ Fast repeats (< 1s) increase Redis load and should be used sparingly.
- The scheduler is idempotent: it updates the next run time before enqueueing to prevent double runs.
- Each occurrence is a normal job with a fresh `jobId`, preserving per-group FIFO semantics.
- You can monitor repeated runs via BullBoard using the provided adapter.

## Graceful Shutdown

```ts
// Stop worker gracefully - waits for current job to finish
await worker.close(gracefulTimeoutMs);

// Wait for queue to be empty
const isEmpty = await queue.waitForEmpty(timeoutMs);

// Recover groups that might be stuck due to ordering delays
const recoveredCount = await queue.recoverDelayedGroups();
```

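A typical way to wire these calls into process signal handling (a sketch; the 30s timeout and the local `Closable` shape are illustrative, not a groupmq type):

```typescript
// Minimal shutdown sequence: stop the worker first (waits for in-flight jobs
// up to the timeout), then close the queue's Redis resources.
type Closable = { close: (timeoutMs?: number) => Promise<void> };

async function shutdown(worker: Closable, queue: Closable): Promise<void> {
  await worker.close(30_000);
  await queue.close();
}

// In an app you would register it once, e.g.:
// process.on('SIGTERM', () => shutdown(worker, queue).then(() => process.exit(0)));
```

Closing the worker before the queue matters: the worker may still need the queue's Redis connection to finish its current job.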
## Additional Methods

### Queue Methods

```ts
// Job counts and status
const counts = await queue.getJobCounts();
// { active: 5, waiting: 12, delayed: 3, total: 20, uniqueGroups: 8 }

const activeCount = await queue.getActiveCount();
const waitingCount = await queue.getWaitingCount();
const delayedCount = await queue.getDelayedCount();
const completedCount = await queue.getCompletedCount();
const failedCount = await queue.getFailedCount();

// Get job IDs by status
const activeJobIds = await queue.getActiveJobs();
const waitingJobIds = await queue.getWaitingJobs();
const delayedJobIds = await queue.getDelayedJobs();

// Get Job instances by status
const completedJobs = await queue.getCompletedJobs(limit); // returns Job[]
const failedJobs = await queue.getFailedJobs(limit);

// Group information
const groups = await queue.getUniqueGroups(); // ['user:123', 'order:456']
const groupCount = await queue.getUniqueGroupsCount();
const jobsInGroup = await queue.getGroupJobCount('user:123');

// Get specific job
const job = await queue.getJob(jobId); // returns Job instance

// Job manipulation
await queue.remove(jobId);
await queue.retry(jobId); // Re-enqueue a failed job
await queue.promote(jobId); // Promote delayed job to waiting
await queue.changeDelay(jobId, newDelayMs);
await queue.updateData(jobId, newData);

// Scheduler operations
await queue.runSchedulerOnce(); // Manual scheduler run
await queue.promoteDelayedJobs(); // Promote delayed jobs
await queue.recoverDelayedGroups(); // Recover stuck groups

// Cleanup and shutdown
await queue.waitForEmpty(timeoutMs);
await queue.close();
```

### Job Instance Methods

Jobs returned from `queue.getJob()`, `queue.getCompletedJobs()`, etc. have these methods:

```ts
const job = await queue.getJob(jobId);

// Manipulate the job
await job.remove();
await job.retry();
await job.promote();
await job.changeDelay(newDelayMs);
await job.updateData(newData);
await job.update(newData); // Alias for updateData

// Get job state
const state = await job.getState(); // 'active' | 'waiting' | 'delayed' | 'completed' | 'failed'

// Serialize job
const json = job.toJSON();
```

### Worker Methods

```ts
// Check worker status
const isProcessing = worker.isProcessing();

// Get current job(s) being processed
const currentJob = worker.getCurrentJob();
// { job: ReservedJob, processingTimeMs: 1500 } | null

// For concurrency > 1
const currentJobs = worker.getCurrentJobs();
// [{ job: ReservedJob, processingTimeMs: 1500 }, ...]

// Get worker metrics
const metrics = worker.getWorkerMetrics();
// { jobsInProgress: 2, lastJobPickupTime: 1234567890, ... }

// Graceful shutdown
await worker.close(gracefulTimeoutMs);
```

### Worker Events

Workers emit events that you can listen to:

```ts
worker.on('ready', () => {
  console.log('Worker is ready');
});

worker.on('completed', (job: Job) => {
  console.log('Job completed:', job.id);
});

worker.on('failed', (job: Job) => {
  console.log('Job failed:', job.id, job.failedReason);
});

worker.on('error', (error: Error) => {
  console.error('Worker error:', error);
});

worker.on('closed', () => {
  console.log('Worker closed');
});

worker.on('graceful-timeout', (job: Job) => {
  console.log('Job exceeded graceful timeout:', job.id);
});

// Remove event listeners
worker.off('completed', handler);
worker.removeAllListeners();
```

### BullBoard Integration

GroupMQ provides a BullBoard adapter for visual monitoring and management:

```ts
import { createBullBoard } from '@bull-board/api';
import { ExpressAdapter } from '@bull-board/express';
import { BullBoardGroupMQAdapter } from 'groupmq';
import express from 'express';

const serverAdapter = new ExpressAdapter();
serverAdapter.setBasePath('/admin/queues');

createBullBoard({
  queues: [
    new BullBoardGroupMQAdapter(queue, {
      displayName: 'Order Processing',
      description: 'Processes customer orders',
      readOnlyMode: false, // Allow job manipulation through UI
    }),
  ],
  serverAdapter,
});

const app = express();
app.use('/admin/queues', serverAdapter.getRouter());
app.listen(3000, () => {
  console.log('BullBoard running at http://localhost:3000/admin/queues');
});
```

### Detailed Architecture

#### Redis Data Structures

GroupMQ uses these Redis keys (all prefixed with `groupmq:{namespace}:`):

- **`:g:{groupId}`** - sorted set of job IDs in a group, ordered by score (derived from `orderMs` and `seq`)
- **`:ready`** - sorted set of group IDs that have jobs available, ordered by lowest job score
- **`:job:{jobId}`** - hash containing job data (id, groupId, data, attempts, status, etc.)
- **`:lock:{groupId}`** - string with the job ID that currently owns the group lock (with TTL)
- **`:processing`** - sorted set of active job IDs, ordered by deadline
- **`:processing:{jobId}`** - hash with processing metadata (groupId, deadlineAt)
- **`:delayed`** - sorted set of delayed jobs, ordered by runAt timestamp
- **`:completed`** - sorted set of completed job IDs (for retention)
- **`:failed`** - sorted set of failed job IDs (for retention)
- **`:repeats`** - hash of repeating job definitions (groupId → config)

#### Job Lifecycle States

1. **Waiting** - job is in `:g:{groupId}` and group is in `:ready`
2. **Delayed** - job is in `:delayed` (scheduled for future)
3. **Active** - job is in `:processing` and group is locked
4. **Completed** - job is in `:completed` (retention)
5. **Failed** - job exceeded maxAttempts, moved to `:failed` (retention)

#### Worker Loop

The worker runs a continuous loop optimized for both single and concurrent processing:

**For concurrency = 1 (sequential):**
```typescript
while (!stopping) {
  // 1. Blocking reserve (waits for job, efficient)
  const job = await queue.reserveBlocking(timeoutSec);

  // 2. Process job synchronously
  if (job) {
    await processOne(job);
  }

  // 3. Periodic scheduler run (every schedulerIntervalMs)
  await queue.runSchedulerOnce(); // Promotes delayed jobs, processes repeats
}
```

**For concurrency > 1 (parallel):**
```typescript
while (!stopping) {
  // 1. Run lightweight scheduler periodically
  await queue.runSchedulerOnce();

  // 2. Try batch reservation if we have capacity
  const capacity = concurrency - jobsInProgress.size;
  if (capacity > 0) {
    const jobs = await queue.reserveBatch(capacity);
    // Process all jobs concurrently (fire and forget)
    for (const job of jobs) {
      void processOne(job);
    }
  }

  // 3. Blocking reserve for remaining capacity
  const job = await queue.reserveBlocking(blockingTimeoutSec);
  if (job) {
    void processOne(job); // Process async
  }
}
```

**Key optimizations:**
- Batch reservation reduces Redis round-trips for concurrent workers
- Blocking operations prevent wasteful polling
- Heartbeat mechanism keeps jobs alive during long processing
- Atomic completion + next reservation reduces latency

#### Atomic Operations (Lua Scripts)

All critical operations use Lua scripts for atomicity:

- **`enqueue.lua`** - adds job to group queue, adds group to ready set
- **`reserve.lua`** - finds ready group, pops head job, locks group
- **`reserve-batch.lua`** - reserves one job from multiple groups atomically
- **`complete.lua`** - marks job complete, unlocks group, re-adds group to ready if more jobs
- **`complete-and-reserve-next.lua`** - atomic completion + reservation from same group
- **`retry.lua`** - increments attempts, re-adds job to group with backoff delay
- **`remove.lua`** - removes job from all data structures

#### Job Reservation Flow

When a worker reserves a job:

1. **Find Ready Group**: `ZRANGE :ready 0 0` gets lowest-score group
2. **Check Lock**: `PTTL :lock:{groupId}` ensures group isn't locked
3. **Pop Job**: `ZPOPMIN :g:{groupId} 1` gets head job atomically
4. **Lock Group**: `SET :lock:{groupId} {jobId} PX {timeout}`
5. **Mark Processing**: Add to `:processing` sorted set with deadline
6. **Re-add Group**: If more jobs exist, `ZADD :ready {score} {groupId}`

#### Job Completion Flow

When a job completes successfully:

1. **Remove from Processing**: `DEL :processing:{jobId}`, `ZREM :processing {jobId}`
2. **Mark Completed**: `HSET :job:{jobId} status completed`
3. **Add to Retention**: `ZADD :completed {now} {jobId}`
4. **Unlock Group**: `DEL :lock:{groupId}` (only if this job owns the lock)
5. **Check for More Jobs**: `ZCARD :g:{groupId}`
6. **Re-add to Ready**: If jobs remain, `ZADD :ready {nextScore} {groupId}`

Step 6 is critical: it ensures that after a job completes, the group becomes available again for other workers to pick up the next job in the queue.

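The invariant these two flows maintain can be modeled in a few lines (an in-memory sketch, not the Redis implementation; `MiniQueue` is illustrative): a group is locked while one of its jobs is active, and completing that job makes the group eligible again.

```typescript
// In-memory model of per-group reserve/complete. Sorted arrays stand in for
// the :g:{groupId} sorted sets; a Set stands in for the :lock:{groupId} keys.
type MiniJob = { id: string; groupId: string; score: number };

class MiniQueue {
  private groups = new Map<string, MiniJob[]>();
  private locks = new Set<string>();

  add(job: MiniJob): void {
    const list = this.groups.get(job.groupId) ?? [];
    list.push(job);
    list.sort((a, b) => a.score - b.score); // per-group order by score
    this.groups.set(job.groupId, list);
  }

  // Pop the head job of the first unlocked, non-empty group and lock it.
  reserve(): MiniJob | undefined {
    for (const [groupId, list] of this.groups) {
      if (this.locks.has(groupId) || list.length === 0) continue;
      this.locks.add(groupId);
      return list.shift();
    }
    return undefined;
  }

  // Completing a job releases the group lock so its next job can be reserved.
  complete(job: MiniJob): void {
    this.locks.delete(job.groupId);
  }
}
```

While `g1` has an active job, only jobs from other groups can be reserved; completing it hands out `g1`'s next job.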
#### Ordering and Scoring

Jobs are ordered using a composite score:

```typescript
score = (orderMs - baseEpoch) * 1000 + seq
```

- `orderMs` - user-provided timestamp for event ordering
- `baseEpoch` - fixed epoch timestamp (1704067200000) to keep scores manageable
- `seq` - auto-incrementing sequence for tiebreaking (resets daily to prevent overflow)

This ensures:
- Jobs with earlier `orderMs` process first
- Jobs with same `orderMs` process in submission order
- Score is stable and sortable
- Daily sequence reset prevents integer overflow

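The scoring arithmetic can be checked directly (using the `baseEpoch` value stated above; the function name is illustrative):

```typescript
// Composite score: milliseconds since the fixed base epoch, scaled by 1000 to
// leave room for a sequence number used as a tiebreaker within a millisecond.
const BASE_EPOCH = 1_704_067_200_000; // 2024-01-01T00:00:00Z

function jobScore(orderMs: number, seq: number): number {
  return (orderMs - BASE_EPOCH) * 1000 + seq;
}
```

An earlier `orderMs` always produces a lower score; equal `orderMs` values fall back to submission order via `seq`.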
#### Concurrency Modes

**concurrency = 1** (Sequential):
- Worker processes one job at a time
- Uses blocking reserve with synchronous processing
- Simplest mode, lowest memory, lowest Redis overhead
- Best for: CPU-intensive jobs, resource-constrained environments

**concurrency > 1** (Parallel):
- Worker attempts batch reservation first (lower latency)
- Processes multiple jobs concurrently (from different groups only)
- Each job runs in parallel with its own heartbeat
- Falls back to blocking reserve when batch is empty
- Higher throughput, efficient for I/O-bound workloads
- Best for: Network calls, database operations, API requests

**Important:** Per-group FIFO ordering is maintained regardless of concurrency level. Multiple jobs from the same group never run in parallel.

#### Error Handling and Retries

When a job fails:

1. **Increment Attempts**: `HINCRBY :job:{jobId} attempts 1`
2. **Check Max Attempts**: If `attempts >= maxAttempts`, mark as failed
3. **Calculate Backoff**: Use exponential backoff strategy
4. **Re-enqueue**: Add job back to `:g:{groupId}` with delay
5. **Unlock Group**: Release lock so next job can process

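Steps 2-4 amount to a small decision function. A sketch (names and the `RetryDecision` type are illustrative):

```typescript
// Decide what happens after a failed attempt: permanently fail once
// maxAttempts is exhausted, otherwise retry after a backoff delay.
type RetryDecision = { action: 'fail' } | { action: 'retry'; delayMs: number };

function afterFailure(
  attempts: number, // attempts already made, including the one that just failed
  maxAttempts: number,
  backoff: (attempt: number) => number,
): RetryDecision {
  if (attempts >= maxAttempts) return { action: 'fail' };
  return { action: 'retry', delayMs: backoff(attempts) };
}
```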
If a job times out (visibility timeout expires):
- Heartbeat mechanism extends the lock: `SET :lock:{groupId} {jobId} PX {timeout}`
- If heartbeat fails, job remains locked until TTL expires
- Cleanup cycle detects expired locks and recovers jobs

#### Cleanup and Recovery

Periodic cleanup runs:

1. **Promote Delayed Jobs**: Move jobs from `:delayed` to waiting when `runAt` arrives
2. **Process Repeats**: Enqueue next occurrence of repeating jobs
3. **Recover Stale Locks**: Find expired locks in `:processing` and unlock groups
4. **Recover Delayed Groups**: Handle groups stuck due to ordering delays
5. **Trim Completed/Failed**: Remove old completed and failed jobs per retention policy

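Step 1 is conceptually a threshold scan over the delayed set: anything whose `runAt` is at or before the current time is promoted to waiting. An in-memory sketch of that idea (the `promoteDue` helper is illustrative):

```typescript
// Split delayed jobs into those due now (to be promoted to waiting)
// and those still scheduled for the future.
type Delayed = { id: string; runAt: number };

function promoteDue(
  delayed: Delayed[],
  now: number,
): { due: Delayed[]; remaining: Delayed[] } {
  const due = delayed.filter((j) => j.runAt <= now);
  const remaining = delayed.filter((j) => j.runAt > now);
  return { due, remaining };
}
```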
### Performance Characteristics

**Latest Benchmarks** (MacBook M2, 500 jobs, 4 workers, multi-process):

#### GroupMQ Performance
- **Throughput**: 68-73 jobs/sec (500 jobs), 80-86 jobs/sec (5000 jobs)
- **Latency**: P95 pickup ~5-5.5s, P95 processing ~45-50ms
- **Memory**: ~120-145 MB per worker process
- **CPU**: <1% average, <70% peak

#### vs BullMQ Comparison
GroupMQ maintains competitive performance while adding per-group FIFO ordering guarantees:
- **Similar throughput** for group-based workloads
- **Better job ordering** with guaranteed per-group FIFO processing
- **Atomic operations** reduce race conditions and improve reliability

For detailed benchmark results and comparisons over time, see our [Performance Benchmarks](https://openpanel-dev.github.io/groupmq/benchmarks/) page.

**Optimizations:**
- **Batch Operations**: `reserveBatch` reduces round-trips for concurrent workers
- **Blocking Operations**: Efficient Redis BLPOP-style blocking prevents wasteful polling
- **Lua Scripts**: All critical paths are atomic, avoiding race conditions
- **Atomic Completion**: Complete job + reserve next in single operation
- **Minimal Data**: Jobs store only essential fields, keeps memory low
- **Score-Based Ordering**: O(log N) insertions and retrievals via sorted sets
- **Adaptive Behavior**: Scheduler intervals adjust based on ordering configuration

### Contributing

Contributions are welcome! When making changes:

1. **Run tests and benchmarks** before and after your changes to verify everything works correctly
2. **Add tests** for any new features

## Testing

Requires a local Redis at `127.0.0.1:6379` (no auth).

```bash
npm i
npm run build
npm test
```

Optionally:

```bash
docker run --rm -p 6379:6379 redis:7
```