npm - atomic-queues - Versions diffs - 2.3.0 → 3.0.0 - Mend

atomic-queues 2.3.0 → 3.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (204)

package/README.md +297 -382
package/dist/cli/generators/classes.d.ts +1 -1
package/dist/cli/generators/json-schema.d.ts +1 -1
package/dist/cli/generators/typescript.d.ts +1 -1
package/dist/cli/index.js +147 -5
package/dist/cli/index.js.map +1 -1
package/dist/cluster/cluster-discovery.service.d.ts +91 -0
package/dist/cluster/cluster-discovery.service.d.ts.map +1 -0
package/dist/cluster/cluster-discovery.service.js +423 -0
package/dist/cluster/cluster-discovery.service.js.map +1 -0
package/dist/cluster/grpc-peer-monitor.service.d.ts +31 -0
package/dist/cluster/grpc-peer-monitor.service.d.ts.map +1 -0
package/dist/cluster/grpc-peer-monitor.service.js +192 -0
package/dist/cluster/grpc-peer-monitor.service.js.map +1 -0
package/dist/cluster/index.d.ts +7 -0
package/dist/cluster/index.d.ts.map +1 -0
package/dist/cluster/index.js +23 -0
package/dist/cluster/index.js.map +1 -0
package/dist/cluster/leader-election.service.d.ts +38 -0
package/dist/cluster/leader-election.service.d.ts.map +1 -0
package/dist/cluster/leader-election.service.js +184 -0
package/dist/cluster/leader-election.service.js.map +1 -0
package/dist/cluster/master-coordinator.d.ts +50 -0
package/dist/cluster/master-coordinator.d.ts.map +1 -0
package/dist/cluster/master-coordinator.js +307 -0
package/dist/cluster/master-coordinator.js.map +1 -0
package/dist/cluster/redis-health-monitor.service.d.ts +23 -0
package/dist/cluster/redis-health-monitor.service.d.ts.map +1 -0
package/dist/cluster/redis-health-monitor.service.js +100 -0
package/dist/cluster/redis-health-monitor.service.js.map +1 -0
package/dist/cluster/server-ring.service.d.ts +48 -0
package/dist/cluster/server-ring.service.d.ts.map +1 -0
package/dist/cluster/server-ring.service.js +136 -0
package/dist/cluster/server-ring.service.js.map +1 -0
package/dist/decorators/entity.decorators.d.ts +16 -24
package/dist/decorators/entity.decorators.d.ts.map +1 -1
package/dist/decorators/entity.decorators.js +0 -39
package/dist/decorators/entity.decorators.js.map +1 -1
package/dist/decorators/interfaces.d.ts +10 -10
package/dist/decorators/interfaces.d.ts.map +1 -1
package/dist/decorators/job.decorators.d.ts +4 -52
package/dist/decorators/job.decorators.d.ts.map +1 -1
package/dist/decorators/job.decorators.js +6 -54
package/dist/decorators/job.decorators.js.map +1 -1
package/dist/decorators/metadata-readers.d.ts +4 -2
package/dist/decorators/metadata-readers.d.ts.map +1 -1
package/dist/decorators/metadata-readers.js +2 -0
package/dist/decorators/metadata-readers.js.map +1 -1
package/dist/decorators/schema.decorators.d.ts +1 -1
package/dist/decorators/schema.decorators.d.ts.map +1 -1
package/dist/decorators/schema.decorators.js.map +1 -1
package/dist/decorators/utils.d.ts +1 -1
package/dist/decorators/utils.d.ts.map +1 -1
package/dist/decorators/utils.js +5 -1
package/dist/decorators/utils.js.map +1 -1
package/dist/domain/interfaces/config.interfaces.d.ts +92 -29
package/dist/domain/interfaces/config.interfaces.d.ts.map +1 -1
package/dist/domain/interfaces/index.d.ts +1 -0
package/dist/domain/interfaces/index.d.ts.map +1 -1
package/dist/domain/interfaces/index.js +1 -0
package/dist/domain/interfaces/index.js.map +1 -1
package/dist/{services/registry → domain/interfaces}/registry.types.d.ts.map +1 -1
package/dist/domain/interfaces/registry.types.js.map +1 -0
package/dist/grpc/grpc-client-pool.service.d.ts +71 -0
package/dist/grpc/grpc-client-pool.service.d.ts.map +1 -0
package/dist/grpc/grpc-client-pool.service.js +307 -0
package/dist/grpc/grpc-client-pool.service.js.map +1 -0
package/dist/grpc/grpc-server.service.d.ts +47 -0
package/dist/grpc/grpc-server.service.d.ts.map +1 -0
package/dist/grpc/grpc-server.service.js +494 -0
package/dist/grpc/grpc-server.service.js.map +1 -0
package/dist/grpc/index.d.ts +3 -0
package/dist/grpc/index.d.ts.map +1 -0
package/dist/{services/executor-pool → grpc}/index.js +2 -1
package/dist/grpc/index.js.map +1 -0
package/dist/index.d.ts +4 -0
package/dist/index.d.ts.map +1 -1
package/dist/index.js +4 -0
package/dist/index.js.map +1 -1
package/dist/module/atomic-queues.module.d.ts +1 -0
package/dist/module/atomic-queues.module.d.ts.map +1 -1
package/dist/module/atomic-queues.module.js +59 -10
package/dist/module/atomic-queues.module.js.map +1 -1
package/dist/services/command-discovery/command-discovery.service.js +2 -2
package/dist/services/command-discovery/command-discovery.service.js.map +1 -1
package/dist/services/index.d.ts +2 -8
package/dist/services/index.d.ts.map +1 -1
package/dist/services/index.js +2 -8
package/dist/services/index.js.map +1 -1
package/dist/services/message-router/index.d.ts +2 -0
package/dist/services/message-router/index.d.ts.map +1 -0
package/dist/services/{actor-system → message-router}/index.js +1 -1
package/dist/services/message-router/index.js.map +1 -0
package/dist/services/message-router/message-router.service.d.ts +53 -0
package/dist/services/message-router/message-router.service.d.ts.map +1 -0
package/dist/services/message-router/message-router.service.js +519 -0
package/dist/services/message-router/message-router.service.js.map +1 -0
package/dist/services/queue-bus/cluster-contracts.d.ts +1 -1
package/dist/services/queue-bus/cluster-contracts.d.ts.map +1 -1
package/dist/services/queue-bus/cluster-contracts.js.map +1 -1
package/dist/services/queue-bus/queue-bus.service.d.ts +3 -21
package/dist/services/queue-bus/queue-bus.service.d.ts.map +1 -1
package/dist/services/queue-bus/queue-bus.service.js +15 -119
package/dist/services/queue-bus/queue-bus.service.js.map +1 -1
package/dist/utils/id.utils.d.ts +3 -0
package/dist/utils/id.utils.d.ts.map +1 -0
package/dist/utils/id.utils.js +14 -0
package/dist/utils/id.utils.js.map +1 -0
package/dist/utils/index.d.ts +1 -0
package/dist/utils/index.d.ts.map +1 -1
package/dist/utils/index.js +1 -0
package/dist/utils/index.js.map +1 -1
package/dist/wal/index.d.ts +4 -0
package/dist/wal/index.d.ts.map +1 -0
package/dist/{services/gate → wal}/index.js +3 -1
package/dist/wal/index.js.map +1 -0
package/dist/wal/wal.scripts.d.ts +51 -0
package/dist/wal/wal.scripts.d.ts.map +1 -0
package/dist/wal/wal.scripts.js +84 -0
package/dist/wal/wal.scripts.js.map +1 -0
package/dist/wal/wal.service.d.ts +46 -0
package/dist/wal/wal.service.d.ts.map +1 -0
package/dist/wal/wal.service.js +243 -0
package/dist/wal/wal.service.js.map +1 -0
package/dist/wal/wal.types.d.ts +23 -0
package/dist/wal/wal.types.d.ts.map +1 -0
package/dist/wal/wal.types.js +3 -0
package/dist/wal/wal.types.js.map +1 -0
package/dist/workers/consistent-hash.d.ts +97 -0
package/dist/workers/consistent-hash.d.ts.map +1 -0
package/dist/workers/consistent-hash.js +231 -0
package/dist/workers/consistent-hash.js.map +1 -0
package/dist/workers/entity-worker-manager.d.ts +35 -0
package/dist/workers/entity-worker-manager.d.ts.map +1 -0
package/dist/workers/entity-worker-manager.js +237 -0
package/dist/workers/entity-worker-manager.js.map +1 -0
package/dist/workers/entity-worker.d.ts +54 -0
package/dist/workers/entity-worker.d.ts.map +1 -0
package/dist/workers/entity-worker.js +142 -0
package/dist/workers/entity-worker.js.map +1 -0
package/dist/workers/index.d.ts +4 -0
package/dist/workers/index.d.ts.map +1 -0
package/dist/{services/log → workers}/index.js +3 -1
package/dist/workers/index.js.map +1 -0
package/package.json +17 -4
package/dist/services/actor-system/actor-system.service.d.ts +0 -19
package/dist/services/actor-system/actor-system.service.d.ts.map +0 -1
package/dist/services/actor-system/actor-system.service.js +0 -86
package/dist/services/actor-system/actor-system.service.js.map +0 -1
package/dist/services/actor-system/index.d.ts +0 -2
package/dist/services/actor-system/index.d.ts.map +0 -1
package/dist/services/actor-system/index.js.map +0 -1
package/dist/services/executor-pool/executor-pool.service.d.ts +0 -38
package/dist/services/executor-pool/executor-pool.service.d.ts.map +0 -1
package/dist/services/executor-pool/executor-pool.service.js +0 -166
package/dist/services/executor-pool/executor-pool.service.js.map +0 -1
package/dist/services/executor-pool/index.d.ts +0 -2
package/dist/services/executor-pool/index.d.ts.map +0 -1
package/dist/services/executor-pool/index.js.map +0 -1
package/dist/services/gate/gate.service.d.ts +0 -17
package/dist/services/gate/gate.service.d.ts.map +0 -1
package/dist/services/gate/gate.service.js +0 -81
package/dist/services/gate/gate.service.js.map +0 -1
package/dist/services/gate/index.d.ts +0 -2
package/dist/services/gate/index.d.ts.map +0 -1
package/dist/services/gate/index.js.map +0 -1
package/dist/services/log/index.d.ts +0 -2
package/dist/services/log/index.d.ts.map +0 -1
package/dist/services/log/index.js.map +0 -1
package/dist/services/log/log.service.d.ts +0 -21
package/dist/services/log/log.service.d.ts.map +0 -1
package/dist/services/log/log.service.js +0 -92
package/dist/services/log/log.service.js.map +0 -1
package/dist/services/registry/index.d.ts +0 -4
package/dist/services/registry/index.d.ts.map +0 -1
package/dist/services/registry/index.js +0 -20
package/dist/services/registry/index.js.map +0 -1
package/dist/services/registry/registry.service.d.ts +0 -43
package/dist/services/registry/registry.service.d.ts.map +0 -1
package/dist/services/registry/registry.service.js +0 -367
package/dist/services/registry/registry.service.js.map +0 -1
package/dist/services/registry/registry.types.js.map +0 -1
package/dist/services/registry/schema-converter.d.ts +0 -2
package/dist/services/registry/schema-converter.d.ts.map +0 -1
package/dist/services/registry/schema-converter.js +0 -27
package/dist/services/registry/schema-converter.js.map +0 -1
package/dist/services/result-collector/index.d.ts +0 -2
package/dist/services/result-collector/index.d.ts.map +0 -1
package/dist/services/result-collector/index.js +0 -18
package/dist/services/result-collector/index.js.map +0 -1
package/dist/services/result-collector/result-collector.service.d.ts +0 -17
package/dist/services/result-collector/result-collector.service.d.ts.map +0 -1
package/dist/services/result-collector/result-collector.service.js +0 -92
package/dist/services/result-collector/result-collector.service.js.map +0 -1
package/dist/services/scheduler/index.d.ts +0 -2
package/dist/services/scheduler/index.d.ts.map +0 -1
package/dist/services/scheduler/index.js +0 -18
package/dist/services/scheduler/index.js.map +0 -1
package/dist/services/scheduler/scheduler.service.d.ts +0 -17
package/dist/services/scheduler/scheduler.service.d.ts.map +0 -1
package/dist/services/scheduler/scheduler.service.js +0 -140
package/dist/services/scheduler/scheduler.service.js.map +0 -1
/package/dist/{services/registry → domain/interfaces}/registry.types.d.ts +0 -0
/package/dist/{services/registry → domain/interfaces}/registry.types.js +0 -0

package/README.md CHANGED

@@ -10,11 +10,11 @@
 ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡿⣽⣟⣳⡝⡼⢁⠎⠀⡀⢁⣴⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣶⡄⠰⣄⠈⠓⢌⠛⢽⣣⡟⢿⠿⣿⣿⢿⣿⣿⣿⣿⣿⣿⣿█▀█  █  █▄█ █ ▀ █ █ █▄▄
 ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣟⡿⣽⠳⡼⢁⡞⠀⡜⢰⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡆⢸⢵⠀⠀⠁⠂⠤⣉⠉⠓⠒⠚⠦⠥⡈⠉⣙⢛⡿⣿█▀█ █ █ █▀▀ █ █ █▀▀ █▀▀
 ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣟⡾⣽⣏⢳⢃⣞⠃⡼⢀⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡄⠀⠀⠀⠀⠀⠀⠀⠀⠁⢀⣀⠤⠐⢋⡰⣌⣾⣿⣿▀▀█ █▄█ ██▄ █▄█ ██▄ ▄▄█
-⣿⣿⣿⣿⣿⣿⣿⣿⣿⣟⣮⢳⣿⠶⠁⠖⠃⠀⠁⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡿⠿⠿⠟⠛⠛⠀⠀⠀⠀⢀⡤⠤⠐⠒⣉⠡⣄⠶⣭⣿⣽⣿⣿⣿⣿
-⣿⣿⣿⣿⣿⣿⣿⡿⠿⢉⡢⠝⠁⠀⠃⠀⠀⠀⠀⠀⠿⠃⠿⠿⠿⠛⠋⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠀⣀⢤⣰⣲⣽⣾⡟⣾⣿⣿⣿⣿⣿⣿⣿⣿
-⣿⣿⣟⡿⡚⠏⠁⠀⠐⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣠⠂⣠⠀⣯⣗⣮⢿⣷⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿z e r o   c o n t e n t i o n
-⣿⢯⡝⠠⠁⠀⠀⠠⠤⠀⠀⠀⠀⡀⠢⣄⣀⡀⠐⠤⡀⠀⠀⠀⢤⣄⣀⠤⣄⣤⢤⣖⡾⠋⢁⡼⠁⣸⡿⣞⣽⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿p e r   e n t i t y
-⣿⣷⣾⣵⣦⣶⣖⣳⣶⣝⣶⣯⣷⣽⣷⣾⣶⣽⣯⣶⠄⠈⠒⣤⣀⠉⠙⠛⠛⠋⠋⢁⣠⠔⠁⠀⢰⣿⣽⣯⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿l o c k   f r e e
+⣿⣿⣿⣿⣿⣿⣿⣿⣿⣟⣮⢳⣿⠶⠁⠖⠃⠀⠁⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⡿⠿⠿⠟⠛⠛⠀⠀⠀⠀⢀⡤⠤⠐⠒⣉⠡⣄⠶⣭⣿⣽⣿⣿⣿⣿⣿
+⣿⣿⣿⣿⣿⣿⣿⡿⠿⢉⡢⠝⠁⠀⠃⠀⠀⠀⠀⠀⠿⠃⠿⠿⠿⠛⠋⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⣀⠀⣀⢤⣰⣲⣽⣾⡟⣾⣿⣿⣿⣿⣿⣿⣿⣿⣿
+⣿⣟⡿⡚⠏⠁⠀⠀⠐⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣠⠂⣠⠀⣯⣗⣮⢿⣷⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿v i r t u a l   a c t o r s
+⣿⢯⡝⠠⠁⠀⠀⠠⠤⠀⠀⠀⠀⡀⠢⣄⣀⡀⠐⠤⡀⠀⠀⠀⢤⣄⣀⠤⣄⣤⢤⣖⡾⠋⢁⡼⠁⣸⡿⣞⣽⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿s t r i c t l y   o n c e
+⣿⣷⣾⣵⣦⣶⣖⣳⣶⣝⣶⣯⣷⣽⣷⣾⣶⣽⣯⣶⠄⠈⠒⣤⣀⠉⠙⠛⠛⠋⠋⢁⣠⠔⠁⠀⢰⣿⣽⣯⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿z e r o   l o c k s
 ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣦⡄⡀⡉⠛⠓⠶⠶⠒⠛⠋⠀⠀⢀⣼⣻⢷⣾⣷⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
 ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣾⣧⡵⣌⣒⢂⠀⣀⣀⣠⣤⣶⣿⣾⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
 ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣿⣾⣷⣯⣿⣧⣿⣷⣿⣷⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
@@ -26,7 +26,6 @@
 <p align="center">
   <img src="https://img.shields.io/npm/v/atomic-queues?style=flat-square&color=cb3837" alt="npm version" />
   <img src="https://img.shields.io/badge/NestJS-11-ea2845?style=flat-square&logo=nestjs" alt="NestJS 11" />
-  <img src="https://img.shields.io/badge/Redis-7-dc382d?style=flat-square&logo=redis&logoColor=white" alt="Redis 7" />
   <img src="https://img.shields.io/badge/license-MIT-blue?style=flat-square" alt="MIT License" />
 </p>
@@ -34,67 +33,78 @@
 ## What is atomic-queues?
-**Per-entity sequential processing for Node.js, built entirely on Redis primitives.**
+**Per-entity sequential processing with virtual actors for NestJS.**
-Think of it as automatic entity-level serialization for the NestJS ecosystem, requiring nothing beyond a Redis instance you probably already have.
+One worker per entity instance, spawned on demand, destroyed when idle. The worker IS the serialization boundary. If only one worker exists for `account:a-123` across the entire cluster, all operations on that account are serial by construction. No locks. No transactions. No race conditions.
-Messages addressed to the same entity execute sequentially. Messages addressed to different entities execute in parallel. No distributed locks. No worker processes. No message broker. No BullMQ.
+**Motto: Strictly once, fail if interrupted.**
 ```
 npm install atomic-queues ioredis
 ```
+**Peer dependencies:** `@nestjs/common`, `@nestjs/core`, `@nestjs/cqrs`, `ioredis`
+**Optional:** `@grpc/grpc-js`, `@grpc/proto-loader` (cluster mode), `zod` (CLI schema validation)
 ---
 ## The Problem
-Every distributed system eventually builds toward one of two failure modes: **state corruption** from concurrent mutations on the same entity, or **throughput collapse** from the locking mechanisms used to prevent it.
 ```
 Time    Request A                    Request B                    Database
 ──────────────────────────────────────────────────────────────────────────
-T₀      SELECT balance → $100        SELECT balance → $100        $100
-T₁      CHECK: $100 ≥ $80 ✓          CHECK: $100 ≥ $80 ✓
-T₂      UPDATE: $100 − $80 = $20                                  $20
-T₃                                   UPDATE: $100 − $80 = $20     −$60
+T0      SELECT balance -> $100       SELECT balance -> $100       $100
+T1      CHECK: $100 >= $80           CHECK: $100 >= $80
+T2      UPDATE: $100 - $80 = $20                                  $20
+T3                                   UPDATE: $100 - $80 = $20     -$60
 ──────────────────────────────────────────────────────────────────────────
-Result: Balance is −$60. Both withdrawals succeed. Integrity violated.
+Result: Balance is -$60. Both withdrawals succeed. Integrity violated.
 ```
-The standard answers — `SELECT ... FOR UPDATE`, optimistic locking with retries, distributed locks via Redlock or ZooKeeper, serializable transactions — all trade throughput for correctness. Under load, they become bottlenecks. Across services, they become nightmares. And every team ends up inventing some ad-hoc combination of them, poorly, under production pressure.
+Row locks, optimistic locking, Redlock — they all trade throughput for correctness.
 ## The Insight
-The problem disappears if you change *when* serialization happens. Instead of serializing at the database level (row locks, transaction isolation), serialize at the **message level**: route all operations for a given entity through a single ordered log, and process that log sequentially. Different entities maintain independent logs with zero coordination between them.
-This is the per-entity serialization model. It's the same insight behind the actor model (Erlang/OTP, Orleans, Akka) — but implemented with nothing beyond Redis and native to the NestJS ecosystem. Entity types are defined implicitly: any CQRS command or query decorated with `@EntityType` automatically gets per-entity sequential processing. Your `@CommandHandler` and `@QueryHandler` classes are the handlers — no separate actor classes needed.
+Don't lock the database. Don't lock the resource. **Route all operations for a given entity through a single worker.** That worker processes messages sequentially. Different entities have their own workers running concurrently.
 ```
-                        ┌─────────────────────────────────────────────────┐
-  Request A ─┐          │           Entity: account-42                    │
-             │          │  ┌──────┐  ┌──────┐  ┌──────┐                  │
-  Request B ─┼─► Route ─┼─►│ Msg1 │─►│ Msg2 │─►│ Msg3 │─► [Executor] ─┐ │
-             │          │  └──────┘  └──────┘  └──────┘               │ │
-  Request C ─┘          │                     Sequential ◄────────────┘ │
-                        └─────────────────────────────────────────────────┘
-  Meanwhile, account-99, order-7, user-abc — all execute
-  in parallel on the same cluster, completely independent.
+  account:a-1  ──► [Worker] ──► handler1 → handler2 → handler3  (sequential)
+  account:a-2  ──► [Worker] ──► handler1 → handler2              (sequential)
+  order:o-5    ──► [Worker] ──► handler1                          (sequential)
+                                                    (all concurrent across entities)
 ```
-This eliminates an entire class of bugs — lost updates, dirty reads, write skew, phantom reads on hot entities — without pessimistic locks, without optimistic retries, and without the `SELECT ... FOR UPDATE` that your DBA tells you not to use under load. The entity itself becomes the consistency boundary, and the consistency is structural rather than transactional.
+One worker per entity. Spawned when a message arrives. Destroyed when idle. The worker runs on the event loop — async I/O interleaves naturally across entities. No threads, no separate processes, no extra NestJS contexts.
 ---
-## How It Works
+## Quick Start
+### 1. Register the module
-### Entities and messages
+```typescript
+@Module({
+  imports: [
+    AtomicQueuesModule.forRoot({
+      redis: { host: 'localhost', port: 6379 },
+      entities: {
+        account: {},
+        order: { onInterrupt: 'dead-letter' },
+      },
+    }),
+  ],
+})
+export class AppModule {}
+```
-Everything in atomic-queues is an **entity** that receives **messages**. An entity is identified by a type and an ID — `account:a-42`, `order:o-17`, `user:u-abc`. A message is a command or query addressed to a specific entity instance. You define this relationship with two decorators:
+### 2. Define commands
 ```typescript
+import { EntityType, QueueEntityId, Reply } from 'atomic-queues';
 @EntityType('account')
-export class WithdrawCommand {
+class DepositCommand implements Reply<{ balance: number }> {
   constructor(
     @QueueEntityId() public readonly accountId: string,
     public readonly amount: number,
@@ -102,470 +112,375 @@ export class WithdrawCommand {
 }
 ```
-That's the entire contract. `@EntityType` says "this message targets the `account` entity type." `@QueueEntityId()` says "the value of `accountId` is the entity instance ID." When you enqueue this command, the runtime routes it to the log for `account:{accountId}` and guarantees sequential execution against that specific entity instance, cluster-wide.
+### 3. Handle commands
-### Two levels of abstraction
-Entity types are defined implicitly — decorate your CQRS command or query class with `@EntityType`, and atomic-queues routes it through the per-entity log and gate system. Your `@CommandHandler` and `@QueryHandler` classes are the handlers. The handler code doesn't change. The guarantee changes — instead of executing inline on whatever request thread happens to call `commandBus.execute()`, your handler now executes sequentially per entity, cluster-wide.
+Standard `@nestjs/cqrs` handlers — nothing new to learn:
 ```typescript
-@EntityType('account')
-export class WithdrawCommand {
-  constructor(
-    @QueueEntityId() public readonly accountId: string,
-    public readonly amount: number,
-  ) {}
-}
-@CommandHandler(WithdrawCommand)
-export class WithdrawHandler implements ICommandHandler<WithdrawCommand> {
-  async execute(cmd: WithdrawCommand) {
-    // This runs sequentially per account — cluster-wide.
-    // No locks. No transactions. The dispatch engine guarantees it.
+@CommandHandler(DepositCommand)
+class DepositHandler implements ICommandHandler<DepositCommand> {
+  async execute(cmd: DepositCommand) {
+    // Runs sequentially per accountId — no concurrent deposits to the same account
+    const balance = await this.accountService.deposit(cmd.accountId, cmd.amount);
+    return { balance };
   }
 }
 ```
-The library auto-discovers `@CommandHandler` and `@QueryHandler` classes at boot and wires them into the dispatch pipeline. Your existing CQRS architecture gets per-entity sequential guarantees without changing a single handler.
-### Enqueuing messages
+### 4. Dispatch
 ```typescript
-// Fire-and-forget
-await queueBus.enqueue(new WithdrawCommand(accountId, 100));
-// Enqueue and block until result — return type inferred from Reply<T> brand
-const balance = await queueBus.enqueueAndWait(new GetBalanceQuery(accountId));
+import { QueueBus } from 'atomic-queues';
-// Scoped to an entity type
-await queueBus.forEntity('account').enqueueBulk([charge1, charge2, charge3]);
+@Injectable()
+class PaymentService {
+  constructor(private readonly queueBus: QueueBus) {}
-// Cross-service: string-based API — no class import needed
-await queueBus.enqueue('warehouse', 'ReserveStockCommand', 'SKU-001', { sku: 'SKU-001', quantity: 50 });
-const stock = await queueBus.enqueueAndWait('warehouse', 'GetStockQuery', 'SKU-001', { sku: 'SKU-001' });
+  async deposit(accountId: string, amount: number) {
+    // Fire and forget
+    await this.queueBus.enqueue(new DepositCommand(accountId, amount));
-// Scoped cross-service
-const warehouse = queueBus.forEntity('warehouse');
-await warehouse.enqueue('ReserveStockCommand', 'SKU-001', { sku: 'SKU-001', quantity: 50 });
+    // Wait for typed result (Reply<R> branding)
+    const { balance } = await this.queueBus.enqueueAndWait(
+      new DepositCommand(accountId, amount),
+    );
+  }
+}
 ```
----
-## The Dispatch Engine
-Under every API call is the same pipeline: **message → Redis log → Lua scheduler → gate → executor → handler**. Understanding this pipeline is key to understanding what atomic-queues actually guarantees and why it can guarantee it without locks.
-### Per-entity message logs
-When you call `enqueue()`, the message is serialized to JSON and appended to a Redis list (`LPUSH aq:log:account:a-42`), and the entity key is added to a global ready set (`SADD aq:ready account:a-42`). A pub/sub notification wakes the executor pool. Three Redis commands, pipelined in one round-trip.
-The log is the source of truth for ordering. Redis lists are FIFO — `LPUSH` appends to the head, `RPOP` consumes from the tail. Messages for the same entity are always processed in enqueue order.
-### The dispatch gate
-The core consistency primitive is the **dispatch gate** — a Redis key per entity (`SET aq:gate:account:a-42 <token> EX 30 NX`). The `NX` flag means only one executor can acquire it. The `EX` TTL means a crashed executor releases it automatically. This is not a distributed lock in the Redlock sense — there's no quorum, no retry loop, no backoff. If the gate is held, the scheduler moves on to the next ready entity. Zero contention between entities, zero blocking within the scheduling loop.
-### Atomic Lua scheduling
-A single Lua script runs atomically in Redis to perform the entire dispatch cycle:
-1. Sample entities from the ready set (`SRANDMEMBER` with batch size 32)
-2. Try to acquire the gate for each candidate (`SET NX EX`)
-3. On first successful acquisition, pop the next message from that entity's log (`RPOP`)
-4. Remove the entity from the ready set if its log is now empty
-Because Lua scripts execute atomically in Redis, the pick → gate acquisition → message pop sequence cannot be interleaved by another executor on another node. This is what eliminates race conditions — not locks, but atomicity at the Redis command level.
-### Shared executor pool
-Traditional queue systems spawn a worker per queue or per entity type. With thousands of entities, that means thousands of blocking Redis connections, thousands of event loops, and a scaling problem that grows linearly with your domain model.
-atomic-queues uses a **shared executor pool** — a configurable number of concurrent executors per node that dispatch messages from *any* ready entity. One pool can service millions of distinct entities. The pool self-regulates: it drains the ready set until empty or until the concurrency limit is hit, then sleeps until the next pub/sub tickle wakes it. There are no workers to spawn, monitor, or auto-scale.
-### Gate refresh for long-running handlers
-If a handler runs longer than the gate TTL, the gate doesn't expire — the executor pool runs a background interval that extends the TTL while the handler is still executing. This prevents false recovery (another node re-dispatching the same message) without requiring an unreasonably large TTL as the safety default.
-### Multiplexed result collection
-Request-reply (`enqueueAndWait` / `sendAndWait`) uses a single `PSUBSCRIBE` connection per node for all concurrent result waits. Hundreds or thousands of pending results share one TCP connection to Redis, routed to the correct promise via correlation ID. No connection-per-call, no connection pool exhaustion, no subscriber amplification.
+First message for `account:a-123` spawns a worker. All subsequent messages for that account queue behind it. The handler runs on your app's event loop using your existing DI container.
 ---
-## Cross-Service Communication
-This is where atomic-queues stops being a "queue library" and becomes a **distributed coordination primitive**.
-### The problem it solves
+## Queries
-In a microservices architecture, the standard way for Service A to tell Service B to do something is: define a gRPC/REST contract, deploy an API gateway or service mesh, handle serialization, implement retries, manage circuit breakers, and hope the schema stays in sync across repos. For async communication, add a message broker (RabbitMQ, Kafka, SQS), define topic/queue naming conventions, implement dead-letter handling, and build consumer groups.
-atomic-queues replaces all of that with Redis.
-### How it works
-Enable the distributed registry and any service connected to the same Redis instance can send typed messages to any entity — regardless of which service owns the handler.
+Queries work identically to commands but route through the `QueryBus`. They are sequenced with commands — a query enqueued after a deposit will see the deposit's effect.
 ```typescript
-// warehouse-service: defines and handles the entity
-AtomicQueuesModule.forRoot({
-  redis: { url: process.env.REDIS_URL },
-  registry: { enabled: true, serviceName: 'warehouse-service' },
-})
-// order-service: generate classes from the live registry, then use them like local CQRS
-import { ReserveStockCommand, GetStockQuery } from './generated';
+@EntityType('account')
+class GetBalanceQuery implements Reply<{ balance: number }> {
+  constructor(@QueueEntityId() public readonly accountId: string) {}
+}
-await queueBus.enqueue(new ReserveStockCommand({ sku: 'SKU-001', quantity: 50 }));
-const stock = await queueBus.enqueueAndWait(new GetStockQuery({ sku: 'SKU-001' }));
-stock.available; // fully typed — no string API, no explicit timeout, no code dependency on warehouse-service
+const { balance } = await queueBus.enqueueAndWait(new GetBalanceQuery('acc-123'));
 ```
-When `warehouse-service` starts, it scans its own `@CommandHandler` and `@QueryHandler` classes and publishes **entity contracts** to Redis — a JSON document listing the entity type, accepted messages, optional JSON schemas, and reply schemas, refreshed via heartbeat TTL. When `order-service` enqueues a message, the registry validates it at the call site *before* it enters the log: entity type exists, message name is accepted, payload matches schema. Errors are immediate and descriptive — not silent dead letters discovered hours later in a DLQ dashboard.
-The Lua scheduler ensures each node only dispatches messages for entity types it owns handlers for. Services that don't own any handlers (API gateways, pure producers) participate in the registry without stealing messages from handler-owning nodes.
-### What this replaces
-Think about what you no longer need:
-**No API gateway between services.** Messages go directly into the entity's log via Redis. The "endpoint" is the entity type and message name, not a URL.
-**No message broker.** Redis is the transport, the ordering guarantee, and the persistence layer. You don't need RabbitMQ, Kafka, or SQS to get async cross-service communication with ordering guarantees.
-**No schema registry as a separate service.** The entity contracts live in Redis alongside the message logs. Schema validation happens at the call site. Zod schemas on the producer side serialize to JSON Schema in the registry and validate on every enqueue.
+---
-**No service discovery.** The registry *is* service discovery. When a service starts, it publishes what it handles. When a service stops, its registrations TTL out. Other services discover capabilities by reading the registry.
+## How It Works
-**No serialization framework.** Messages are JSON. The wire protocol is three Redis commands. No Protobuf compilation step, no `.proto` files, no code generation from IDL. (Though atomic-queues does offer codegen from the live registry — it generates decorated TypeScript classes so Service A gets compile-time type safety for messages destined to Service B, without importing Service B's code.)
+### Virtual Actors (EntityWorker)
-**No separate dead-letter infrastructure.** Failed messages are dead-lettered per entity type in Redis, queryable via the same connection.
+Each entity instance (`account:a-123`, `order:o-5`) gets its own virtual actor — a processor callback with a FIFO message queue. The actor:
-### Schema validation
+1. Spawns on first message (no pre-registration needed)
+2. Processes messages sequentially (one at a time, on the event loop)
+3. Yields at `await` points (other entities' actors proceed concurrently)
+4. Tears down after idle timeout (configurable, default 30s)
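The four actor steps listed above can be illustrated with a small in-memory sketch. This is not the library's `EntityWorker`: the class name `EntityMailboxes` and its methods are hypothetical, and for brevity the sketch tears an entity down as soon as its queue drains instead of waiting for an idle timeout.

```typescript
// Hypothetical sketch: none of these names are atomic-queues APIs.
type Task<T> = () => Promise<T>;

class EntityMailboxes {
  // Tail of the promise chain for each live entity key, e.g. "account:a-123".
  private readonly tails = new Map<string, Promise<unknown>>();

  enqueue<T>(entityKey: string, task: Task<T>): Promise<T> {
    // "Spawn" on first message: an unknown key starts from a resolved promise.
    const previous = this.tails.get(entityKey) ?? Promise.resolve();

    // Sequential per entity: the task starts only after the previous one settles.
    const run = previous.then(() => task());

    // The stored tail never rejects, so one failed task cannot block later ones.
    const tail = run.catch(() => undefined);
    this.tails.set(entityKey, tail);

    // Teardown: drop the entry if no newer task was queued behind this one.
    void tail.then(() => {
      if (this.tails.get(entityKey) === tail) this.tails.delete(entityKey);
    });

    // The caller still observes the task's own result or error.
    return run;
  }

  get liveEntities(): number {
    return this.tails.size;
  }
}
```

Tasks for the same key run one at a time, while tasks for different keys interleave at their `await` points on the same event loop.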
-Attach Zod schemas to message classes for runtime safety across service boundaries:
+### Write-Ahead Log (WAL)
-```typescript
-import { Schema } from 'atomic-queues';
-import { z } from 'zod';
+Every message is dual-written: in-memory queue (speed) + Redis WAL (durability). The WAL is a state machine:
-@Schema(z.object({
-  accountId: z.string().uuid(),
-  amount: z.number().positive(),
-}))
-@EntityType('account')
-export class WithdrawCommand {
-  @QueueEntityId() public readonly accountId: string;
-  public readonly amount: number;
-}
+```
+enqueued → dispatched → completed | failed | interrupted
 ```
-The Zod schema serializes to JSON Schema and stores in the registry. Every service validates payloads against it — even services that don't import your code, even services written in a different language that read the registry directly from Redis.
+Each transition is an atomic Lua script that checks the current state before moving forward. Recovery runs automatically on startup:
-### Entity co-ownership
+- `enqueued` → re-dispatch (handler never ran — this is the first attempt, not a retry)
+- `dispatched` → **dead-letter** (handler was running when the process crashed — never re-execute)
+- `completed` / `failed` / `interrupted` → cleanup (stale terminal entries)
-Multiple services can handle different message types on the same entity. Service A handles `DepositCommand` and `WithdrawCommand` on the `account` entity type. Service B handles `FreezeAccountCommand` on the same entity type. The registry merges their contracts automatically. The dispatch gate still ensures single-writer semantics per entity instance, regardless of which service's executor picks up the message.
+A background cleanup timer evicts terminal WAL entries on a configurable interval.
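The state machine and recovery rules can be modelled in a few lines. This is a minimal in-memory sketch, not the library's code: the library performs each transition as an atomic Lua script in Redis, and the names `WalState`, `transition`, and `recoveryAction` are hypothetical.

```typescript
// Hypothetical in-memory model of the WAL state machine. The real transitions
// run as atomic Lua scripts in Redis; this only illustrates the rules.
type WalState = 'enqueued' | 'dispatched' | 'completed' | 'failed' | 'interrupted';

// Allowed forward transitions. Terminal states have no successors.
const NEXT: Record<WalState, WalState[]> = {
  enqueued: ['dispatched'],
  dispatched: ['completed', 'failed', 'interrupted'],
  completed: [],
  failed: [],
  interrupted: [],
};

// Compare-and-set: move only if the entry is still in the expected state
// and the target is a legal successor of that state.
function transition(
  wal: Map<string, WalState>,
  id: string,
  from: WalState,
  to: WalState,
): boolean {
  if (wal.get(id) !== from) return false;
  if (NEXT[from].indexOf(to) === -1) return false;
  wal.set(id, to);
  return true;
}

type RecoveryAction = 're-dispatch' | 'dead-letter' | 'cleanup';

// Startup recovery decision for one entry, following the default rules.
function recoveryAction(state: WalState): RecoveryAction {
  if (state === 'enqueued') return 're-dispatch'; // handler never ran
  if (state === 'dispatched') return 'dead-letter'; // handler may have run
  return 'cleanup'; // stale terminal entry
}
```

Because every transition checks the current state first, a duplicate or late transition is a no-op rather than a second execution.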
-### Runtime introspection
+### Master Topology (Cluster Mode)
-Any service can discover what the cluster offers at runtime — no config files, no shared code:
+Each replica set has a **deterministic master** — the node with the lowest `serverId` among live nodes in the same `serviceGroup`. No locks, no elections, no Redlock. All nodes read the same Redis-backed heartbeat registry and independently compute who the master is.
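A rough sketch of that computation, assuming each node sees the same list of heartbeat records. The `NodeInfo` shape and the `computeMaster` function are hypothetical, and "lowest" is assumed here to mean plain string ordering of `serverId`.

```typescript
// Hypothetical heartbeat record as each node might read it from Redis.
interface NodeInfo {
  serverId: string;
  serviceGroup: string;
  lastHeartbeatMs: number;
}

// Pure function of the live node set: every node that sees the same records
// computes the same master, with no lock and no voting round.
function computeMaster(
  nodes: NodeInfo[],
  serviceGroup: string,
  nowMs: number,
  nodeTTLMs: number,
): string | undefined {
  const live = nodes
    .filter((n) => n.serviceGroup === serviceGroup)
    .filter((n) => nowMs - n.lastHeartbeatMs <= nodeTTLMs)
    .map((n) => n.serverId)
    .sort(); // assumption: default string ordering defines "lowest"
  return live[0]; // undefined when the group has no live nodes
}
```

With string ordering, `billing-10` sorts before `billing-2`, so zero-padded or fixed-width IDs keep the order intuitive.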
-```typescript
-const contracts = await queueBus.introspect();
+The master:
-contracts.entityTypes();                              // ['account', 'warehouse', ...]
-contracts.hasEntity('warehouse');                      // true
-contracts.messagesFor('warehouse');                    // ['ReserveStockCommand', 'GetStockQuery']
-contracts.accepts('warehouse', 'ReserveStockCommand'); // true
-contracts.schemaFor('warehouse', 'ReserveStockCommand');  // { properties: { sku: ..., quantity: ... } }
-contracts.replySchemaFor('warehouse', 'GetStockQuery');   // { properties: { sku: ..., available: ... } }
+- Owns the **worker assignment table**: which `entity:entityId` lives on which replica
+- Routes all petitions: replicas forward via gRPC to the master
+- Resolves workers via three tiers: existing assignment → consistent hash ring → least-loaded replica
+- **Epoch fences** every dispatch: replicas reject commands from stale masters
-// Human-readable summary for logging/debugging
-console.log(contracts.toString());
+```
+Replica Set: billing-service
+┌──────────────────────────────────────────────┐
+│  Master (deterministic: lowest serverId)     │
+│  ├── Assignment Table                        │
+│  │   account:a-1 → replica-2                 │
+│  │   account:a-2 → replica-1                 │
+│  └── Routes petitions, balances load         │
+│                                              │
+│  Replica-1: [worker: account:a-2]            │
+│  Replica-2: [worker: account:a-1]            │
+│  Replica-3: (master pod, no workers yet)     │
+└──────────────────────────────────────────────┘
 ```
-### Raw cross-service API
-For quick prototyping or dynamic dispatch, you can also use the string-based API — no classes, no codegen, no imports:
-```typescript
-// Fire-and-forget
-await queueBus.enqueue('warehouse', 'ReserveStockCommand', 'SKU-001', {
-  sku: 'SKU-001',
-  quantity: 50,
-});
+Masters interconnect across service groups:
+```
+Master (billing) ←── gRPC ──→ Master (warehouse)
+```
-// Request-reply
-const stock = await queueBus.enqueueAndWait('warehouse', 'GetStockQuery', 'SKU-001', {
-  sku: 'SKU-001',
-});
+### Master Failover
-// Scoped to an entity type
-const warehouse = queueBus.forEntity('warehouse');
-await warehouse.enqueue('ReserveStockCommand', 'SKU-001', { sku: 'SKU-001', quantity: 50 });
-const stock = await warehouse.enqueueAndWait('GetStockQuery', 'SKU-001', { sku: 'SKU-001' });
-```
+1. Master crashes → heartbeat TTL expires
+2. Remaining nodes recompute leader from node list → next-lowest `serverId` becomes master
+3. New master queries all replicas via gRPC `ListWorkers`
+4. Rebuilds assignment table from live cluster state (petitions rejected during rebuild — fail-fast over misrouting)
+5. Old master pushes its worker list to the new master on demotion
+6. Resumes operations
-This works out of the box — the registry validates entity type and message name at the call site. For production services, class codegen gives you full type safety.
+No split-brain: leadership is a pure function of the live node set. Epoch fencing rejects any stale-master commands that arrive during transitions.
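Epoch fencing on the replica side can be sketched as follows. This assumes the epoch is an integer that increases on every master change. The README does not specify how epochs are represented, and `EpochFence` is a hypothetical name.

```typescript
// Hypothetical replica-side fence. Assumes each dispatch carries the epoch
// of the master that issued it, and that epochs only ever increase.
class EpochFence {
  private highestSeen = 0;

  // Accept a dispatch only if it is not older than the newest epoch seen.
  accept(epoch: number): boolean {
    if (epoch < this.highestSeen) return false; // stale master: reject
    this.highestSeen = epoch;
    return true;
  }
}

const fence = new EpochFence();
const acceptedOld = fence.accept(1); // master at epoch 1
const acceptedNew = fence.accept(2); // new master after failover
const acceptedStale = fence.accept(1); // late command from the old master
```

Here `acceptedStale` is `false`: once the replica has seen epoch 2, anything stamped with epoch 1 is dropped instead of executed.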
-### Class codegen (recommended)
+### Health Monitoring
-Generate fully decorated TypeScript classes from the live registry — import them and use them like local CQRS classes with full autocomplete, type safety, and zero string APIs:
+**Redis health**: Periodic `PING`. Consecutive failures above threshold → degraded mode (reject new messages, leader resigns, discovery steps down). Automatic recovery when Redis responds again.
-```bash
-npx atomic-queues generate --classes -o src/generated
-```
+**gRPC peer connectivity**: Native gRPC channel state watching (`READY` → alive, `TRANSIENT_FAILURE` → suspected dead). Debounce timer prevents flapping on brief disconnects.
-This produces one file per entity type plus a barrel `index.ts`:
+**Per-peer circuit breakers**: gRPC connections track consecutive failures. After threshold → circuit opens (fast-fail, no network calls). After cooldown → half-open (one probe). Success → closed. Failure → re-open.
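The closed / open / half-open cycle described above can be illustrated with a small state machine. This is a sketch rather than the library's implementation: `PeerCircuitBreaker` is a hypothetical class, and the clock is passed in as `nowMs` so the behavior is easy to test.

```typescript
type CircuitState = 'closed' | 'open' | 'half-open';

// Hypothetical per-peer breaker. Defaults mirror the documented
// circuitBreakerFailureThreshold (3) and circuitBreakerCooldownMs (2000).
class PeerCircuitBreaker {
  private failures = 0;
  private state: CircuitState = 'closed';
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold = 3, cooldownMs = 2000) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // True if a call may go out now.
  canCall(nowMs: number): boolean {
    if (this.state === 'closed') return true;
    if (this.state === 'open' && nowMs - this.openedAt >= this.cooldownMs) {
      this.state = 'half-open'; // cooldown elapsed: allow exactly one probe
      return true;
    }
    return false; // still cooling down, or a probe is already in flight
  }

  onSuccess(): void {
    this.failures = 0;
    this.state = 'closed';
  }

  onFailure(nowMs: number): void {
    this.failures += 1;
    // A failed probe re-opens immediately; otherwise open at the threshold.
    if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAt = nowMs;
    }
  }

  current(): CircuitState {
    return this.state;
  }
}
```

While the circuit is open, `canCall` returns `false` without touching the network, which is the fast-fail behavior the paragraph describes.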
-```
-src/generated/
-  warehouse.ts    # ReserveStockCommand, GetStockQuery, data interfaces, reply interfaces
-  billing.ts      # ChargeCommand, GetInvoiceQuery, ...
-  index.ts        # export * from './warehouse'; export * from './billing';
-```
+---
-Then use them exactly like local command/query classes:
+## Enqueuing Messages
 ```typescript
-import { ReserveStockCommand, GetStockQuery } from './generated';
+// Fire-and-forget
+await queueBus.enqueue(new WithdrawCommand(accountId, 100));
-// Fire-and-forget — full autocomplete on constructor fields
-await queueBus.enqueue(new ReserveStockCommand({ sku: 'SKU-001', quantity: 50 }));
+// Enqueue and wait for typed result
+const { balance } = await queueBus.enqueueAndWait(new GetBalanceQuery(accountId));
-// Request-reply — return type inferred from Reply<T> brand, no explicit timeout
-const stock = await queueBus.enqueueAndWait(new GetStockQuery({ sku: 'SKU-001' }));
-stock.available; // typed as number — full IDE support
+// Scoped API
+const account = queueBus.forEntity('account', accountId);
+await account.enqueue(new DepositCommand(accountId, 500));
+// Raw string API (cross-service, no class needed)
+await queueBus.enqueue('warehouse', 'ReserveStockCommand', 'SKU-001', {
+  sku: 'SKU-001', quantity: 50,
+});
 ```
-Generated query classes implement `Reply<T>` via a phantom type brand, so `enqueueAndWait` infers the return type at compile time with zero runtime cost. No explicit generics, no timeout parameter — timeouts are resolved from config.
+---
-You can also filter to specific entity types:
+## Backpressure
-```bash
-npx atomic-queues generate --classes -o src/generated --entities warehouse,billing
-```
-### Other codegen formats
+Three levels, all configurable:
-```bash
-# TypeScript interfaces + DispatchMap (for typed string-based API)
-npx atomic-queues generate --ts --output ./generated/contracts.ts
+| Level | Config | Behavior |
+|-------|--------|----------|
+| Per-worker | `workerMaxQueueDepth` | Rejects with `QUEUE_DEPTH_EXCEEDED` |
+| Global workers | `maxTotalWorkers` | Rejects new entities with `WORKER_LIMIT_EXCEEDED` (existing entities still accepted) |
+| Global depth | `maxTotalQueueDepth` | Rejects all enqueues with `QUEUE_DEPTH_EXCEEDED` |
-# JSON Schema (language-agnostic)
-npx atomic-queues generate --json-schema --output ./generated/schema.json
+In cluster mode, the master also enforces `maxConcurrentPetitions` to bound petition processing.
-# Full registry snapshot
-npx atomic-queues generate --snapshot --output ./generated/snapshot.json
-```
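The three backpressure levels amount to an admission check on every enqueue. The sketch below is an assumption about how such a check could be ordered. The `Limits` and `Snapshot` shapes, the `admit` function, and the use of `>=` comparisons are all hypothetical. Only the config names, the error codes, and the `0` = unbounded convention come from the configuration reference.

```typescript
interface Limits {
  workerMaxQueueDepth: number; // per worker, 0 = unbounded
  maxTotalWorkers: number; // across all entity types, 0 = unbounded
  maxTotalQueueDepth: number; // across all workers, 0 = unbounded
}

// Hypothetical view of current load at enqueue time.
interface Snapshot {
  totalWorkers: number;
  totalQueueDepth: number;
  workerDepth: number | undefined; // undefined: no worker for this entity yet
}

type Rejection = 'QUEUE_DEPTH_EXCEEDED' | 'WORKER_LIMIT_EXCEEDED';

// Returns null when the message is admitted, or the rejection code.
function admit(limits: Limits, load: Snapshot): Rejection | null {
  // Global depth: rejects all enqueues.
  if (limits.maxTotalQueueDepth > 0 && load.totalQueueDepth >= limits.maxTotalQueueDepth) {
    return 'QUEUE_DEPTH_EXCEEDED';
  }
  if (load.workerDepth === undefined) {
    // New entity: would need a new worker, so the worker limit applies.
    if (limits.maxTotalWorkers > 0 && load.totalWorkers >= limits.maxTotalWorkers) {
      return 'WORKER_LIMIT_EXCEEDED';
    }
    return null;
  }
  // Existing entity: only its own queue depth applies.
  if (limits.workerMaxQueueDepth > 0 && load.workerDepth >= limits.workerMaxQueueDepth) {
    return 'QUEUE_DEPTH_EXCEEDED';
  }
  return null;
}
```

Note that an entity which already has a worker is still admitted when `maxTotalWorkers` is reached, matching the "existing entities still accepted" rule in the table.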
+---
-### Config-driven timeouts
+## Configuration
-`enqueueAndWait` resolves timeouts automatically — you never need to pass one explicitly:
+### Minimal (single server)
 ```typescript
 AtomicQueuesModule.forRoot({
-  executor: {
-    gateTTL: 30,
-    defaultReplyTimeout: 15000,     // global fallback: 15s
-  },
-  entities: {
-    warehouse: {
-      replyTimeout: 5000,           // warehouse-specific: 5s
-    },
-  },
+  redis: { host: 'localhost', port: 6379 },
 })
 ```
-Resolution chain: explicit arg → per-entity `replyTimeout` → global `defaultReplyTimeout` → `gateTTL * 2 * 1000`. If nothing is configured, defaults to 60s.
+That's it. Everything else has defaults. Add `entities` to customize per-entity behavior, `grpc` to enable cluster mode.
----
+### Full reference
-## Redis *is* the Protocol
+#### `AtomicQueuesModule.forRoot(config)`
-This is the most important architectural decision in the project, and it has implications that go far beyond NestJS.
+| Field | Type | Required | Default | Description |
+|-------|------|----------|---------|-------------|
+| `redis` | `IRedisConfig` | **yes** | — | Redis connection. Accepts `{ host, port, password, db }` or `{ url }` |
+| `entities` | `Record<string, IEntityConfig>` | no | `{}` | Per-entity-type overrides (see below) |
+| `keyPrefix` | `string` | no | `'aq'` | Prefix for all Redis keys |
+| `maxTotalWorkers` | `number` | no | `10000` | Max concurrent entity workers across all types. `0` = unbounded |
+| `maxTotalQueueDepth` | `number` | no | `100000` | Max total pending messages across all workers. `0` = unbounded |
+| `retry` | `IRetryPolicy` | no | `{ maxAttempts: 1 }` | Default retry policy (strictly-once by default) |
+| `wal` | `IWalConfig` | no | `{ enabled: true }` | Write-ahead log settings |
+| `grpc` | `IGrpcConfig` | no | `{ enabled: false }` | Cluster mode — omit entirely for single-server |
+| `verbose` | `boolean` | no | `false` | Enable verbose logging |
-The wire protocol is [fully documented](./WIRE-PROTOCOL.md), intentionally simple, and versioned with breaking-change semantics. Enqueuing a message is three Redis commands:
+#### `IEntityConfig` — per entity type
+```typescript
+entities: {
+  account: { /* all fields optional */ },
+  order: { onInterrupt: 'dead-letter', workerIdleTimeout: 60_000 },
+}
 ```
-LPUSH  aq:log:account:a-1  '<message JSON>'
-SADD   aq:ready  account:a-1
-PUBLISH aq:tickle  1
-```
-**Any language with a Redis client is a first-class citizen.** A Python data pipeline can enqueue commands to a NestJS-hosted entity. A Go microservice can fire events at entities defined in TypeScript. A Rust executor can run the same Lua scheduling script and compete for gates on equal terms with the Node.js executor pool. A Bash script can trigger a workflow.
-This is not a feature of most frameworks. Orleans requires the Orleans silo. Temporal requires the Temporal server with its own database. All of them are monoglot execution environments — handlers must be written in the framework's language.
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `defaultEntityId` | `string` | — | Property name used as entity ID when `@QueueEntityId` is not present |
+| `onInterrupt` | `'dead-letter' \| 'retry'` | `'dead-letter'` | What to do when a message is found mid-execution on recovery |
+| `workerIdleTimeout` | `number` (ms) | `30000` | How long an idle worker lives before teardown |
+| `workerMaxQueueDepth` | `number` | `0` (unbounded) | Max pending messages per worker. Rejects with `QUEUE_DEPTH_EXCEEDED` |
+| `replyTimeout` | `number` (ms) | `5000` | Default timeout for `enqueueAndWait` on this entity type |
+| `retry` | `IRetryPolicy` | inherits root | Per-entity retry policy override |
-atomic-queues is **polyglot by construction**. The coordination happens in Redis, not in the runtime. Any process that speaks the wire protocol participates on equal terms, and the [WIRE-PROTOCOL.md](./WIRE-PROTOCOL.md) includes a complete Python reference client to prove it.
+#### `IRetryPolicy`
-This opens architectures that are genuinely difficult to build otherwise:
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `maxAttempts` | `number` | `1` | Total attempts. `1` = strictly once, no retries |
+| `backoff` | `'fixed' \| 'exponential'` | `'exponential'` | Backoff strategy between retries |
+| `backoffDelay` | `number` (ms) | `1000` | Base delay between retries |
+| `maxDelay` | `number` (ms) | `30000` | Maximum delay cap for exponential backoff |
-- **Ingest in Go, process in Node.js, analyze in Python.** Each layer speaks Redis. The entity logs are the integration boundary.
-- **Rust executors for CPU-hot-path entities.** The same Lua scheduler, the same gates, the same entity logs. The Rust process is just another executor that happens to be faster. The Node.js side doesn't know or care.
-- **Gradual migration.** Move one entity type's handlers to a different service, a different language, or a different infrastructure — without touching any other service's code. The entity contract in the registry is the interface, not the import statement.
-- **Edge coordination.** An IoT device with a Redis client and 3 commands of knowledge can participate in the same entity model as your cloud services.
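To make the `IRetryPolicy` fields concrete, here is one way the delay between attempts could be derived. The doubling formula is an assumption: the README names the `'exponential'` strategy and the `maxDelay` cap but does not give the exact formula. `RetryPolicy` and `retryDelayMs` are hypothetical names.

```typescript
interface RetryPolicy {
  maxAttempts: number;
  backoff: 'fixed' | 'exponential';
  backoffDelay: number; // ms
  maxDelay: number; // ms
}

// Delay before retry number `retry` (1 = first retry after the initial attempt).
// Assumption: exponential backoff doubles the base delay on each retry.
function retryDelayMs(policy: RetryPolicy, retry: number): number {
  if (policy.backoff === 'fixed') return policy.backoffDelay;
  const uncapped = policy.backoffDelay * Math.pow(2, retry - 1);
  return Math.min(uncapped, policy.maxDelay);
}

const examplePolicy: RetryPolicy = {
  maxAttempts: 7,
  backoff: 'exponential',
  backoffDelay: 1000,
  maxDelay: 30000,
};
// Under the doubling assumption: 1000, 2000, 4000, 8000, 16000, then capped at 30000.
const delays = [1, 2, 3, 4, 5, 6].map((r) => retryDelayMs(examplePolicy, r));
```

With the default `maxAttempts: 1` no delay is ever computed, because the first attempt is also the last.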
+#### `IWalConfig` — write-ahead log
----
-## Quick Start
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `enabled` | `boolean` | `true` | Disable WAL for testing only — **never disable in production** |
+| `cleanupInterval` | `number` (ms) | `5000` | How often to evict completed/failed WAL entries |
+| `entryTTL` | `number` (seconds) | `86400` (24h) | Safety TTL for WAL entries in Redis |
-```typescript
-import { Module } from '@nestjs/common';
-import { AtomicQueuesModule } from 'atomic-queues';
-@Module({
-  imports: [
-    AtomicQueuesModule.forRoot({
-      redis: { host: 'localhost', port: 6379 },
-    }),
-  ],
-})
-export class AppModule {}
-```
+#### `IGrpcConfig` — cluster mode
-Define a command and enqueue it:
+Omit entirely for single-server. Set `enabled: true` to activate.
 ```typescript
-@EntityType('account')
-export class WithdrawCommand {
-  constructor(
-    @QueueEntityId() public readonly accountId: string,
-    public readonly amount: number,
-  ) {}
-}
-@Injectable()
-export class PaymentService {
-  constructor(private readonly queueBus: QueueBus) {}
-  async withdraw(accountId: string, amount: number) {
-    await this.queueBus.enqueue(new WithdrawCommand(accountId, amount));
-  }
+grpc: {
+  enabled: true,
+  listenAddress: '0.0.0.0:50051',
+  advertisedAddress: '10.0.1.5:50051',
+  serverId: 'billing-1',
+  serviceGroup: 'billing',
 }
 ```
-The command is appended to `account:{accountId}`'s message log and executed sequentially by the shared executor pool. No handler registration, no worker setup, no queue configuration.
+| Field | Type | Default | Description |
+|-------|------|---------|-------------|
+| `enabled` | `boolean` | `false` | Enable gRPC cluster transport |
+| `listenAddress` | `string` | `'0.0.0.0:50051'` | Address the gRPC server binds to |
+| `advertisedAddress` | `string` | `os.hostname() + ':50051'` | Address other nodes use to reach this one |
+| `serverId` | `string` | auto-generated UUID | Unique node ID. Must be stable across restarts for predictable leader election |
+| `serviceGroup` | `string` | `'default'` | Logical grouping — nodes in the same group form a replica set |
+| `maxForwardHops` | `number` | `3` | Max cross-service forwarding hops to prevent loops |
+| `maxConcurrentPetitions` | `number` | `50` | Max in-flight petitions the master processes. `0` = unbounded |
+**Timing (ms)**
+| Field | Default | Description |
+|-------|---------|-------------|
+| `heartbeatMs` | `400` | How often this node heartbeats to Redis |
+| `nodeTTLMs` | `1500` | Node considered dead after this long without heartbeat |
+| `reconcileIntervalMs` | `2000` | How often to scan Redis for membership changes |
+| `leaderTTLMs` | `2000` | Leader lock TTL |
+| `leaderRenewalMs` | `400` | Leader lock renewal interval |
+| `leaderDebounceMs` | `800` | Debounce window before recomputing leader after ring changes |
+**Health monitoring**
+| Field | Default | Description |
+|-------|---------|-------------|
+| `peerMonitorEnabled` | `true` | Watch gRPC channel state for fast failure detection |
+| `peerSuspectDebounceMs` | `500` | Debounce before declaring a peer suspected-dead |
+| `redisHealthCheckMs` | `500` | Redis PING interval |
+| `redisHealthFailureThreshold` | `3` | Consecutive PING failures before degraded mode |
+**Circuit breaker (per-peer gRPC connections)**
+| Field | Default | Description |
+|-------|---------|-------------|
+| `circuitBreakerFailureThreshold` | `3` | Consecutive failures before opening the circuit |
+| `circuitBreakerCooldownMs` | `2000` | Time before a half-open probe is allowed |
+**gRPC keepalive**
+| Field | Default | Description |
+|-------|---------|-------------|
+| `keepaliveTimeMs` | `10000` | Keepalive ping interval (minimum enforced by grpc-js) |
+| `keepaliveTimeoutMs` | `5000` | Connection dead if no keepalive response |
+**RPC deadlines** (`deadlines` sub-object)
+| Field | Default | Description |
+|-------|---------|-------------|
+| `deadlines.forwardMs` | `1500` | Deadline for fire-and-forget RPCs (forward, petition, enqueueToWorker) |
+| `deadlines.pingMs` | `1000` | Deadline for health ping |
+| `deadlines.andWaitMs` | `60000` | Default deadline for `*AndWait` RPCs when no `replyTimeout` is set |
+| `deadlines.syncMs` | `1000` | Deadline for `listWorkers` during master table rebuild |
+| `deadlines.connectivityWatchMs` | `30000` | Timeout for peer connectivity watch loop re-arm |
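As an illustration, the documented fields could be combined as below for one node of a clustered `billing` service. All values are hypothetical. Placing the timing fields directly under `grpc` is an assumption based on the table layout. Only `deadlines` is explicitly described as a sub-object. The object is left untyped so the snippet stands alone. In an application it would be passed to `AtomicQueuesModule.forRoot(...)`.

```typescript
// Hypothetical configuration built only from fields listed in the tables.
const clusterConfig = {
  redis: { host: 'localhost', port: 6379 },
  keyPrefix: 'aq',
  maxTotalWorkers: 10000,
  entities: {
    account: {
      workerIdleTimeout: 60000, // keep idle workers for 60s instead of 30s
      workerMaxQueueDepth: 1000, // reject with QUEUE_DEPTH_EXCEEDED beyond this
      replyTimeout: 5000,
      retry: { maxAttempts: 3, backoff: 'exponential', backoffDelay: 1000, maxDelay: 30000 },
    },
  },
  wal: { enabled: true, cleanupInterval: 5000, entryTTL: 86400 },
  grpc: {
    enabled: true,
    listenAddress: '0.0.0.0:50051',
    advertisedAddress: '10.0.1.5:50051',
    serverId: 'billing-1', // stable across restarts
    serviceGroup: 'billing',
    heartbeatMs: 400,
    nodeTTLMs: 1500, // several missed heartbeats before a node counts as dead
    deadlines: { forwardMs: 1500, andWaitMs: 60000 },
  },
};
```

Keeping `nodeTTLMs` several times larger than `heartbeatMs`, as the defaults do, means one delayed heartbeat does not trigger a master change.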
 ---
-## Configuration
+## Dead Letter Queue
-```typescript
-AtomicQueuesModule.forRoot({
-  redis: { host: 'localhost', port: 6379 },
-  executor: {
-    poolSize: 1,              // concurrent executors per node
-    gateTTL: 30,              // seconds before gate expires (safety net)
-    defaultReplyTimeout: 15000, // global default for enqueueAndWait (ms)
-  },
-  entities: {
-    account: {
-      defaultEntityId: 'accountId',
-      gateTTL: 60,
-      retry: { maxAttempts: 5, backoff: 'exponential', backoffDelay: 2000 },
-      replyTimeout: 5000,     // per-entity enqueueAndWait timeout (ms)
-    },
-  },
-  registry: {
-    enabled: false,
-    serviceName: 'my-service',
-    schemaValidation: false,
-    heartbeatInterval: 10000,
-    registrationTTL: 30,
-  },
-  keyPrefix: 'aq',
-  verbose: false,
-})
-```
-Optional peer dependencies:
+Messages found in `dispatched` state on recovery, or that exhaust all retry attempts, are moved to a Redis-backed dead letter queue.
 ```bash
-npm install @nestjs/cqrs          # for CQRS handler auto-wiring
-npm install zod zod-to-json-schema # for schema validation in the registry
+npx atomic-queues dlq list
+npx atomic-queues dlq replay --id <message-id>
+npx atomic-queues dlq purge
 ```
 ---
-## Guarantees
-| Guarantee | Scope | Mechanism |
-|---|---|---|
-| FIFO per entity | Cluster-wide | Redis list (`LPUSH`/`RPOP`) |
-| Single-writer per entity | Cluster-wide | Gate key (`SET NX EX`) |
-| At-least-once delivery | Per message | Retry on gate TTL expiry |
-| Parallel across entities | Per node | Executor pool concurrency |
-| Durability | Per message | Redis persistence (AOF/RDB) |
-### What this does NOT guarantee
-**Exactly-once processing.** Like every distributed message system — Orleans, Akka, Temporal, Kafka — handlers must be idempotent. If an executor crashes mid-processing, the gate TTL expires and the message retries on another node. This is a fundamental constraint of distributed systems, not a limitation of the library.
+## CLI
----
-## How It Compares
+```bash
+# Inspect live entity/command/query registry from Redis
+npx atomic-queues introspect
-| Capability | BullMQ | Temporal | atomic-queues |
-|---|---|---|---|
-| Per-entity ordering | Manual (named queues) | Workflow-scoped | Built-in, zero config |
-| Cross-entity parallelism | Worker pools | Worker pools | Shared executor pool |
-| Stateful entities | No | Workflow state | Per-entity sequential handlers |
-| Cross-service messaging | Shared queue names | gRPC | Redis registry + codegen |
-| Polyglot clients | JS/TS only | SDK per language | Any Redis client (3 commands) |
-| Infrastructure required | Redis | Temporal server + DB | Redis only |
-| Distributed locks needed | Yes, for ordering | Internal | None — gates are non-contending |
-| Service discovery | External | Built-in | Built-in (registry) |
-| Schema validation | No | Protobuf | Zod → JSON Schema |
+# Generate TypeScript from the live registry
+npx atomic-queues generate --classes -o ./src/generated   # decorated class files
+npx atomic-queues generate --ts -o ./src/generated        # namespace interfaces + DispatchMap
+npx atomic-queues generate --json-schema -o ./src/generated
+```
 ---
-## Decorator Reference
+## Guarantees
-| Decorator | Purpose |
+| Guarantee | Mechanism |
 |---|---|
-| `@EntityType('type')` | Route a message to an entity type |
-| `@QueueEntityId()` | Mark the property holding the entity ID |
-| `@QueueEntity('type', 'prop')` | Combined entity type + ID |
-| `@Schema(zodSchema)` | Attach a Zod schema for registry validation |
-| `@ReplySchema(zodSchema)` | Attach a reply schema for query codegen |
+| FIFO per entity | One worker per entity:entityId with FIFO queue |
+| Single-writer per entity | Only one worker exists across the cluster |
+| At-most-once delivery | WAL: enqueued → dispatched → completed. Never re-executed after dispatch. |
+| Fail if interrupted | Dispatched on crash → dead-lettered, source notified |
+| Concurrent across entities | Event loop interleaves at await points |
+| Durability | Redis WAL (dual-write: in-memory + Redis) |
+| Auto-recovery | WAL recovery + cleanup run automatically on startup |
+| Cluster coordination | Deterministic master topology with gRPC |
+| Master failover | Heartbeat expiry → deterministic re-election + assignment table rebuild |
+| Epoch fencing | Replicas reject commands from stale masters |
+| No distributed locks | The worker IS the serialization — not a lock, not Redlock, not SET NX |
 ---
-## Production Considerations
+## Design Philosophy
-### Redis as a Single Point of Failure
+AtomicQueues is pessimistic by design. At every decision point, it chooses safety over liveness:
-atomic-queues relies on a single Redis instance for all coordination: message logs, gates, the ready set, and the distributed registry. If that Redis instance becomes unavailable, all dispatch stops.
+- **Interrupted?** Dead-letter, don't retry.
+- **Redis down?** Reject new work, don't buffer.
+- **Stale epoch?** Reject, don't process.
+- **Master rebuilding?** Reject petitions, don't guess.
+- **Unknown assignment?** Bounce and retry through the master, don't deliver speculatively.
-**Mitigations:**
-- **Redis Sentinel** — automatic failover to a replica. Gates (SET NX EX) and Lua scripts work identically after promotion. Brief message re-delivery is possible during failover but per-entity ordering is preserved.
-- **Redis Cluster** — horizontal scaling. Requires all keys for a given entity to land on the same shard. Use Redis hash tags (e.g. `{account:a-1}`) in your `keyPrefix` config to ensure co-location.
-- **Persistence** — enable AOF (`appendonly yes`) with `appendfsync everysec` at minimum. RDB snapshots alone risk losing the last seconds of enqueued messages on crash.
-- **Monitoring** — watch `connected_clients`, `used_memory`, and `instantaneous_ops_per_sec`. Set alerts on replication lag if using Sentinel.
-### Retry Ordering
-Failed messages are re-enqueued with `RPUSH`, placing them at the back of the entity's log. This means other pending messages for the same entity are processed before the retry. If you need head-of-line retry (failed message retried immediately), implement a custom retry strategy.
+The system refuses to operate under uncertainty rather than risk executing a message twice.
 ---
-## Migrating from V1
-V2 is a full rewrite of the internals. BullMQ is removed. Workers are removed. The public API is largely preserved.
+## Migrating from v2
-**What stays the same**: `@EntityType`, `@QueueEntityId`, `@QueueEntity`, `queueBus.enqueue()`, `queueBus.forEntity()`, `queueBus.enqueueAndWait()`.
+**Removed**: `executor`, `registry`, `gateTTL`, `ActorSystem`, `LogService`, `GateService`, `SchedulerService`, `ExecutorPoolService`, `ResultCollector`, `RegistryService`, `workers` config, `WorkerModule`.
-**What's removed**: `@WorkerProcessor`, `@JobHandler`, `@EntityScaler`, `@OnSpawnWorker`, `@OnTerminateWorker`, `@GetActiveEntities`, `@GetDesiredWorkerCount`, `.forProcessor()`. All worker and scaling concepts are gone.
+**Added**: `EntityWorker`, `EntityWorkerManager`, `MasterCoordinator`, `workerIdleTimeout` in entity config.
-**What's new**: `@Schema`, `@ReplySchema`, `ActorSystem`, `RegistryService`, distributed registry, runtime introspection (`queueBus.introspect()`), cross-service string-based API, `Reply<T>` phantom type, class codegen CLI (`--classes`), config-driven timeouts.
+**Unchanged**: All decorators, `QueueBus` public API, CLI generators.
-**Migration steps**: (1) remove all `@WorkerProcessor` classes — configure entity defaults in module config and use `@CommandHandler`/`@QueryHandler`; (2) remove all scaling decorators; (3) run the data migration script to drain in-flight BullMQ jobs to the new log format; (4) remove `bullmq` and `@nestjs/bullmq` from your dependencies.
+**Migration**: Remove `executor`/`registry`/`workers` from config, along with any `gateTTL` settings (also removed). That's it. Workers are now internal.
 ---