atomic-queues 2.3.0 → 3.0.0

Files changed (204)
  1. package/README.md +297 -382
  2. package/dist/cli/generators/classes.d.ts +1 -1
  3. package/dist/cli/generators/json-schema.d.ts +1 -1
  4. package/dist/cli/generators/typescript.d.ts +1 -1
  5. package/dist/cli/index.js +147 -5
  6. package/dist/cli/index.js.map +1 -1
  7. package/dist/cluster/cluster-discovery.service.d.ts +91 -0
  8. package/dist/cluster/cluster-discovery.service.d.ts.map +1 -0
  9. package/dist/cluster/cluster-discovery.service.js +423 -0
  10. package/dist/cluster/cluster-discovery.service.js.map +1 -0
  11. package/dist/cluster/grpc-peer-monitor.service.d.ts +31 -0
  12. package/dist/cluster/grpc-peer-monitor.service.d.ts.map +1 -0
  13. package/dist/cluster/grpc-peer-monitor.service.js +192 -0
  14. package/dist/cluster/grpc-peer-monitor.service.js.map +1 -0
  15. package/dist/cluster/index.d.ts +7 -0
  16. package/dist/cluster/index.d.ts.map +1 -0
  17. package/dist/cluster/index.js +23 -0
  18. package/dist/cluster/index.js.map +1 -0
  19. package/dist/cluster/leader-election.service.d.ts +38 -0
  20. package/dist/cluster/leader-election.service.d.ts.map +1 -0
  21. package/dist/cluster/leader-election.service.js +184 -0
  22. package/dist/cluster/leader-election.service.js.map +1 -0
  23. package/dist/cluster/master-coordinator.d.ts +50 -0
  24. package/dist/cluster/master-coordinator.d.ts.map +1 -0
  25. package/dist/cluster/master-coordinator.js +307 -0
  26. package/dist/cluster/master-coordinator.js.map +1 -0
  27. package/dist/cluster/redis-health-monitor.service.d.ts +23 -0
  28. package/dist/cluster/redis-health-monitor.service.d.ts.map +1 -0
  29. package/dist/cluster/redis-health-monitor.service.js +100 -0
  30. package/dist/cluster/redis-health-monitor.service.js.map +1 -0
  31. package/dist/cluster/server-ring.service.d.ts +48 -0
  32. package/dist/cluster/server-ring.service.d.ts.map +1 -0
  33. package/dist/cluster/server-ring.service.js +136 -0
  34. package/dist/cluster/server-ring.service.js.map +1 -0
  35. package/dist/decorators/entity.decorators.d.ts +16 -24
  36. package/dist/decorators/entity.decorators.d.ts.map +1 -1
  37. package/dist/decorators/entity.decorators.js +0 -39
  38. package/dist/decorators/entity.decorators.js.map +1 -1
  39. package/dist/decorators/interfaces.d.ts +10 -10
  40. package/dist/decorators/interfaces.d.ts.map +1 -1
  41. package/dist/decorators/job.decorators.d.ts +4 -52
  42. package/dist/decorators/job.decorators.d.ts.map +1 -1
  43. package/dist/decorators/job.decorators.js +6 -54
  44. package/dist/decorators/job.decorators.js.map +1 -1
  45. package/dist/decorators/metadata-readers.d.ts +4 -2
  46. package/dist/decorators/metadata-readers.d.ts.map +1 -1
  47. package/dist/decorators/metadata-readers.js +2 -0
  48. package/dist/decorators/metadata-readers.js.map +1 -1
  49. package/dist/decorators/schema.decorators.d.ts +1 -1
  50. package/dist/decorators/schema.decorators.d.ts.map +1 -1
  51. package/dist/decorators/schema.decorators.js.map +1 -1
  52. package/dist/decorators/utils.d.ts +1 -1
  53. package/dist/decorators/utils.d.ts.map +1 -1
  54. package/dist/decorators/utils.js +5 -1
  55. package/dist/decorators/utils.js.map +1 -1
  56. package/dist/domain/interfaces/config.interfaces.d.ts +92 -29
  57. package/dist/domain/interfaces/config.interfaces.d.ts.map +1 -1
  58. package/dist/domain/interfaces/index.d.ts +1 -0
  59. package/dist/domain/interfaces/index.d.ts.map +1 -1
  60. package/dist/domain/interfaces/index.js +1 -0
  61. package/dist/domain/interfaces/index.js.map +1 -1
  62. package/dist/{services/registry → domain/interfaces}/registry.types.d.ts.map +1 -1
  63. package/dist/domain/interfaces/registry.types.js.map +1 -0
  64. package/dist/grpc/grpc-client-pool.service.d.ts +71 -0
  65. package/dist/grpc/grpc-client-pool.service.d.ts.map +1 -0
  66. package/dist/grpc/grpc-client-pool.service.js +307 -0
  67. package/dist/grpc/grpc-client-pool.service.js.map +1 -0
  68. package/dist/grpc/grpc-server.service.d.ts +47 -0
  69. package/dist/grpc/grpc-server.service.d.ts.map +1 -0
  70. package/dist/grpc/grpc-server.service.js +494 -0
  71. package/dist/grpc/grpc-server.service.js.map +1 -0
  72. package/dist/grpc/index.d.ts +3 -0
  73. package/dist/grpc/index.d.ts.map +1 -0
  74. package/dist/{services/executor-pool → grpc}/index.js +2 -1
  75. package/dist/grpc/index.js.map +1 -0
  76. package/dist/index.d.ts +4 -0
  77. package/dist/index.d.ts.map +1 -1
  78. package/dist/index.js +4 -0
  79. package/dist/index.js.map +1 -1
  80. package/dist/module/atomic-queues.module.d.ts +1 -0
  81. package/dist/module/atomic-queues.module.d.ts.map +1 -1
  82. package/dist/module/atomic-queues.module.js +59 -10
  83. package/dist/module/atomic-queues.module.js.map +1 -1
  84. package/dist/services/command-discovery/command-discovery.service.js +2 -2
  85. package/dist/services/command-discovery/command-discovery.service.js.map +1 -1
  86. package/dist/services/index.d.ts +2 -8
  87. package/dist/services/index.d.ts.map +1 -1
  88. package/dist/services/index.js +2 -8
  89. package/dist/services/index.js.map +1 -1
  90. package/dist/services/message-router/index.d.ts +2 -0
  91. package/dist/services/message-router/index.d.ts.map +1 -0
  92. package/dist/services/{actor-system → message-router}/index.js +1 -1
  93. package/dist/services/message-router/index.js.map +1 -0
  94. package/dist/services/message-router/message-router.service.d.ts +53 -0
  95. package/dist/services/message-router/message-router.service.d.ts.map +1 -0
  96. package/dist/services/message-router/message-router.service.js +519 -0
  97. package/dist/services/message-router/message-router.service.js.map +1 -0
  98. package/dist/services/queue-bus/cluster-contracts.d.ts +1 -1
  99. package/dist/services/queue-bus/cluster-contracts.d.ts.map +1 -1
  100. package/dist/services/queue-bus/cluster-contracts.js.map +1 -1
  101. package/dist/services/queue-bus/queue-bus.service.d.ts +3 -21
  102. package/dist/services/queue-bus/queue-bus.service.d.ts.map +1 -1
  103. package/dist/services/queue-bus/queue-bus.service.js +15 -119
  104. package/dist/services/queue-bus/queue-bus.service.js.map +1 -1
  105. package/dist/utils/id.utils.d.ts +3 -0
  106. package/dist/utils/id.utils.d.ts.map +1 -0
  107. package/dist/utils/id.utils.js +14 -0
  108. package/dist/utils/id.utils.js.map +1 -0
  109. package/dist/utils/index.d.ts +1 -0
  110. package/dist/utils/index.d.ts.map +1 -1
  111. package/dist/utils/index.js +1 -0
  112. package/dist/utils/index.js.map +1 -1
  113. package/dist/wal/index.d.ts +4 -0
  114. package/dist/wal/index.d.ts.map +1 -0
  115. package/dist/{services/gate → wal}/index.js +3 -1
  116. package/dist/wal/index.js.map +1 -0
  117. package/dist/wal/wal.scripts.d.ts +51 -0
  118. package/dist/wal/wal.scripts.d.ts.map +1 -0
  119. package/dist/wal/wal.scripts.js +84 -0
  120. package/dist/wal/wal.scripts.js.map +1 -0
  121. package/dist/wal/wal.service.d.ts +46 -0
  122. package/dist/wal/wal.service.d.ts.map +1 -0
  123. package/dist/wal/wal.service.js +243 -0
  124. package/dist/wal/wal.service.js.map +1 -0
  125. package/dist/wal/wal.types.d.ts +23 -0
  126. package/dist/wal/wal.types.d.ts.map +1 -0
  127. package/dist/wal/wal.types.js +3 -0
  128. package/dist/wal/wal.types.js.map +1 -0
  129. package/dist/workers/consistent-hash.d.ts +97 -0
  130. package/dist/workers/consistent-hash.d.ts.map +1 -0
  131. package/dist/workers/consistent-hash.js +231 -0
  132. package/dist/workers/consistent-hash.js.map +1 -0
  133. package/dist/workers/entity-worker-manager.d.ts +35 -0
  134. package/dist/workers/entity-worker-manager.d.ts.map +1 -0
  135. package/dist/workers/entity-worker-manager.js +237 -0
  136. package/dist/workers/entity-worker-manager.js.map +1 -0
  137. package/dist/workers/entity-worker.d.ts +54 -0
  138. package/dist/workers/entity-worker.d.ts.map +1 -0
  139. package/dist/workers/entity-worker.js +142 -0
  140. package/dist/workers/entity-worker.js.map +1 -0
  141. package/dist/workers/index.d.ts +4 -0
  142. package/dist/workers/index.d.ts.map +1 -0
  143. package/dist/{services/log → workers}/index.js +3 -1
  144. package/dist/workers/index.js.map +1 -0
  145. package/package.json +17 -4
  146. package/dist/services/actor-system/actor-system.service.d.ts +0 -19
  147. package/dist/services/actor-system/actor-system.service.d.ts.map +0 -1
  148. package/dist/services/actor-system/actor-system.service.js +0 -86
  149. package/dist/services/actor-system/actor-system.service.js.map +0 -1
  150. package/dist/services/actor-system/index.d.ts +0 -2
  151. package/dist/services/actor-system/index.d.ts.map +0 -1
  152. package/dist/services/actor-system/index.js.map +0 -1
  153. package/dist/services/executor-pool/executor-pool.service.d.ts +0 -38
  154. package/dist/services/executor-pool/executor-pool.service.d.ts.map +0 -1
  155. package/dist/services/executor-pool/executor-pool.service.js +0 -166
  156. package/dist/services/executor-pool/executor-pool.service.js.map +0 -1
  157. package/dist/services/executor-pool/index.d.ts +0 -2
  158. package/dist/services/executor-pool/index.d.ts.map +0 -1
  159. package/dist/services/executor-pool/index.js.map +0 -1
  160. package/dist/services/gate/gate.service.d.ts +0 -17
  161. package/dist/services/gate/gate.service.d.ts.map +0 -1
  162. package/dist/services/gate/gate.service.js +0 -81
  163. package/dist/services/gate/gate.service.js.map +0 -1
  164. package/dist/services/gate/index.d.ts +0 -2
  165. package/dist/services/gate/index.d.ts.map +0 -1
  166. package/dist/services/gate/index.js.map +0 -1
  167. package/dist/services/log/index.d.ts +0 -2
  168. package/dist/services/log/index.d.ts.map +0 -1
  169. package/dist/services/log/index.js.map +0 -1
  170. package/dist/services/log/log.service.d.ts +0 -21
  171. package/dist/services/log/log.service.d.ts.map +0 -1
  172. package/dist/services/log/log.service.js +0 -92
  173. package/dist/services/log/log.service.js.map +0 -1
  174. package/dist/services/registry/index.d.ts +0 -4
  175. package/dist/services/registry/index.d.ts.map +0 -1
  176. package/dist/services/registry/index.js +0 -20
  177. package/dist/services/registry/index.js.map +0 -1
  178. package/dist/services/registry/registry.service.d.ts +0 -43
  179. package/dist/services/registry/registry.service.d.ts.map +0 -1
  180. package/dist/services/registry/registry.service.js +0 -367
  181. package/dist/services/registry/registry.service.js.map +0 -1
  182. package/dist/services/registry/registry.types.js.map +0 -1
  183. package/dist/services/registry/schema-converter.d.ts +0 -2
  184. package/dist/services/registry/schema-converter.d.ts.map +0 -1
  185. package/dist/services/registry/schema-converter.js +0 -27
  186. package/dist/services/registry/schema-converter.js.map +0 -1
  187. package/dist/services/result-collector/index.d.ts +0 -2
  188. package/dist/services/result-collector/index.d.ts.map +0 -1
  189. package/dist/services/result-collector/index.js +0 -18
  190. package/dist/services/result-collector/index.js.map +0 -1
  191. package/dist/services/result-collector/result-collector.service.d.ts +0 -17
  192. package/dist/services/result-collector/result-collector.service.d.ts.map +0 -1
  193. package/dist/services/result-collector/result-collector.service.js +0 -92
  194. package/dist/services/result-collector/result-collector.service.js.map +0 -1
  195. package/dist/services/scheduler/index.d.ts +0 -2
  196. package/dist/services/scheduler/index.d.ts.map +0 -1
  197. package/dist/services/scheduler/index.js +0 -18
  198. package/dist/services/scheduler/index.js.map +0 -1
  199. package/dist/services/scheduler/scheduler.service.d.ts +0 -17
  200. package/dist/services/scheduler/scheduler.service.d.ts.map +0 -1
  201. package/dist/services/scheduler/scheduler.service.js +0 -140
  202. package/dist/services/scheduler/scheduler.service.js.map +0 -1
  203. /package/dist/{services/registry → domain/interfaces}/registry.types.d.ts +0 -0
  204. /package/dist/{services/registry → domain/interfaces}/registry.types.js +0 -0
package/README.md CHANGED
@@ -10,11 +10,11 @@
10
10
  ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡿⣽⣟⣳⡝⡼⢁⠎⠀⡀⢁⣴⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣶⡄⠰⣄⠈⠓⢌⠛⢽⣣⡟⢿⠿⣿⣿⢿⣿⣿⣿⣿⣿⣿⣿█▀█ █ █▄█ █ ▀ █ █ █▄▄
11
11
  ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣟⡿⣽⠳⡼⢁⡞⠀⡜⢰⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡆⢸⢵⠀⠀⠁⠂⠤⣉⠉⠓⠒⠚⠦⠥⡈⠉⣙⢛⡿⣿█▀█ █ █ █▀▀ █ █ █▀▀ █▀▀
12
12
  ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣟⡾⣽⣏⢳⢃⣞⠃⡼⢀⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡄⠀⠀⠀⠀⠀⠀⠀⠀⠁⢀⣀⠤⠐⢋⡰⣌⣾⣿⣿▀▀█ █▄█ ██▄ █▄█ ██▄ ▄▄█
13
- ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣟⣮⢳⣿⠶⠁⠖⠃⠀⠁⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⡿⠿⠿⠟⠛⠛⠀⠀⠀⠀⢀⡤⠤⠐⠒⣉⠡⣄⠶⣭⣿⣽⣿⣿⣿⣿
14
- ⣿⣿⣿⣿⣿⣿⣿⡿⠿⢉⡢⠝⠁⠀⠃⠀⠀⠀⠀⠀⠿⠃⠿⠿⠿⠛⠋⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⠀⣀⠀⣀⢤⣰⣲⣽⣾⡟⣾⣿⣿⣿⣿⣿⣿⣿⣿
15
- ⣿⣿⣟⡿⡚⠏⠁⠀⠐⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣠⠂⣠⠀⣯⣗⣮⢿⣷⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿z e r o c o n t e n t i o n
16
- ⣿⢯⡝⠠⠁⠀⠀⠠⠤⠀⠀⠀⠀⡀⠢⣄⣀⡀⠐⠤⡀⠀⠀⠀⢤⣄⣀⠤⣄⣤⢤⣖⡾⠋⢁⡼⠁⣸⡿⣞⣽⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿p e r e n t i t y
17
- ⣿⣷⣾⣵⣦⣶⣖⣳⣶⣝⣶⣯⣷⣽⣷⣾⣶⣽⣯⣶⠄⠈⠒⣤⣀⠉⠙⠛⠛⠋⠋⢁⣠⠔⠁⠀⢰⣿⣽⣯⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿l o c k f r e e
13
+ ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣟⣮⢳⣿⠶⠁⠖⠃⠀⠁⢸⣿⣿⣿⣿⣿⣿⣿⣿⣿⡿⠿⠿⠟⠛⠛⠀⠀⠀⠀⢀⡤⠤⠐⠒⣉⠡⣄⠶⣭⣿⣽⣿⣿⣿⣿⣿
14
+ ⣿⣿⣿⣿⣿⣿⣿⡿⠿⢉⡢⠝⠁⠀⠃⠀⠀⠀⠀⠀⠿⠃⠿⠿⠿⠛⠋⠉⠁⠀⠀⠀⠀⠀⠀⠀⠀⣀⠀⣀⢤⣰⣲⣽⣾⡟⣾⣿⣿⣿⣿⣿⣿⣿⣿⣿
15
+ ⣿⣟⡿⡚⠏⠁⠀⠀⠐⠉⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⠀⣠⠂⣠⠀⣯⣗⣮⢿⣷⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿v i r t u a l a c t o r s
16
+ ⣿⢯⡝⠠⠁⠀⠀⠠⠤⠀⠀⠀⠀⡀⠢⣄⣀⡀⠐⠤⡀⠀⠀⠀⢤⣄⣀⠤⣄⣤⢤⣖⡾⠋⢁⡼⠁⣸⡿⣞⣽⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿s t r i c t l y o n c e
17
+ ⣿⣷⣾⣵⣦⣶⣖⣳⣶⣝⣶⣯⣷⣽⣷⣾⣶⣽⣯⣶⠄⠈⠒⣤⣀⠉⠙⠛⠛⠋⠋⢁⣠⠔⠁⠀⢰⣿⣽⣯⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿z e r o l o c k s
18
18
  ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣦⡄⡀⡉⠛⠓⠶⠶⠒⠛⠋⠀⠀⢀⣼⣻⢷⣾⣷⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
19
19
  ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣾⣧⡵⣌⣒⢂⠀⣀⣀⣠⣤⣶⣿⣾⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
20
20
  ⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣷⣿⣾⣷⣯⣿⣧⣿⣷⣿⣷⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿⣿
@@ -26,7 +26,6 @@
26
26
  <p align="center">
27
27
  <img src="https://img.shields.io/npm/v/atomic-queues?style=flat-square&color=cb3837" alt="npm version" />
28
28
  <img src="https://img.shields.io/badge/NestJS-11-ea2845?style=flat-square&logo=nestjs" alt="NestJS 11" />
29
- <img src="https://img.shields.io/badge/Redis-7-dc382d?style=flat-square&logo=redis&logoColor=white" alt="Redis 7" />
30
29
  <img src="https://img.shields.io/badge/license-MIT-blue?style=flat-square" alt="MIT License" />
31
30
  </p>
32
31
 
@@ -34,67 +33,78 @@
34
33
 
35
34
  ## What is atomic-queues?
36
35
 
37
- **Per-entity sequential processing for Node.js, built entirely on Redis primitives.**
36
+ **Per-entity sequential processing with virtual actors for NestJS.**
38
37
 
39
- Think of it as automatic entity-level serialization for the NestJS ecosystem, requiring nothing beyond a Redis instance you probably already have.
38
+ One worker per entity instance, spawned on demand, destroyed when idle. The worker IS the serialization boundary. If only one worker exists for `account:a-123` across the entire cluster, all operations on that account are serial by construction. No locks. No transactions. No race conditions.
40
39
 
41
- Messages addressed to the same entity execute sequentially. Messages addressed to different entities execute in parallel. No distributed locks. No worker processes. No message broker. No BullMQ.
40
+ **Motto: Strictly once, fail if interrupted.**
42
41
 
43
42
  ```
44
43
  npm install atomic-queues ioredis
45
44
  ```
46
45
 
46
+ **Peer dependencies:** `@nestjs/common`, `@nestjs/core`, `@nestjs/cqrs`, `ioredis`
47
+
48
+ **Optional:** `@grpc/grpc-js`, `@grpc/proto-loader` (cluster mode), `zod` (CLI schema validation)
49
+
47
50
  ---
48
51
 
49
52
  ## The Problem
50
53
 
51
- Every distributed system eventually builds toward one of two failure modes: **state corruption** from concurrent mutations on the same entity, or **throughput collapse** from the locking mechanisms used to prevent it.
52
-
53
54
  ```
54
55
  Time Request A Request B Database
55
56
  ──────────────────────────────────────────────────────────────────────────
56
- T₀ SELECT balance → $100 SELECT balance → $100 $100
57
- T₁ CHECK: $100 ≥ $80 CHECK: $100 ≥ $80
58
- T₂ UPDATE: $100 − $80 = $20 $20
59
- T₃ UPDATE: $100 − $80 = $20 −$60
57
+ T0 SELECT balance -> $100 SELECT balance -> $100 $100
58
+ T1 CHECK: $100 >= $80 CHECK: $100 >= $80
59
+ T2 UPDATE: $100 - $80 = $20 $20
60
+ T3 UPDATE: $100 - $80 = $20 -$60
60
61
  ──────────────────────────────────────────────────────────────────────────
61
- Result: Balance is −$60. Both withdrawals succeed. Integrity violated.
62
+ Result: Balance is -$60. Both withdrawals succeed. Integrity violated.
62
63
  ```
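The timeline above can be reproduced in a few lines. This is a minimal sketch, not code from the package: `Account`, `ioDelay`, `withdraw`, and both driver functions are hypothetical names, and the "database" is a plain in-memory object. It assumes each request does a read, a check, and a relative update with an `await` between the steps.

```typescript
type Account = { balance: number };

// Stand-in for a database round trip; yields to the event loop.
const ioDelay = (): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, 0));

async function withdraw(account: Account, amount: number): Promise<boolean> {
  const seen = account.balance;      // SELECT balance
  await ioDelay();
  if (seen < amount) return false;   // CHECK against a possibly stale read
  await ioDelay();
  account.balance -= amount;         // UPDATE balance = balance - amount
  return true;
}

// Both requests in flight at once: each reads $100 before either one writes.
async function concurrentWithdrawals(): Promise<{ ok: boolean[]; balance: number }> {
  const account: Account = { balance: 100 };
  const ok = await Promise.all([withdraw(account, 80), withdraw(account, 80)]);
  return { ok, balance: account.balance };
}

// The same two requests processed one after the other.
async function serialWithdrawals(): Promise<{ ok: boolean[]; balance: number }> {
  const account: Account = { balance: 100 };
  const first = await withdraw(account, 80);
  const second = await withdraw(account, 80);
  return { ok: [first, second], balance: account.balance };
}
```

In the concurrent case both withdrawals pass the check and the balance ends at -60. In the serial case the second withdrawal reads $20 and is rejected, leaving the balance at $20.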
63
64
 
64
- The standard answers — `SELECT ... FOR UPDATE`, optimistic locking with retries, distributed locks via Redlock or ZooKeeper, serializable transactions — all trade throughput for correctness. Under load, they become bottlenecks. Across services, they become nightmares. And every team ends up inventing some ad-hoc combination of them, poorly, under production pressure.
65
+ Row locks, optimistic locking, Redlock: they all trade throughput for correctness.
65
66
 
66
67
  ## The Insight
67
68
 
68
- The problem disappears if you change *when* serialization happens. Instead of serializing at the database level (row locks, transaction isolation), serialize at the **message level**: route all operations for a given entity through a single ordered log, and process that log sequentially. Different entities maintain independent logs with zero coordination between them.
69
-
70
- This is the per-entity serialization model. It's the same insight behind the actor model (Erlang/OTP, Orleans, Akka) — but implemented with nothing beyond Redis and native to the NestJS ecosystem. Entity types are defined implicitly: any CQRS command or query decorated with `@EntityType` automatically gets per-entity sequential processing. Your `@CommandHandler` and `@QueryHandler` classes are the handlers — no separate actor classes needed.
69
+ Don't lock the database. Don't lock the resource. **Route all operations for a given entity through a single worker.** That worker processes messages sequentially. Different entities have their own workers running concurrently.
71
70
 
72
71
  ```
73
- ┌─────────────────────────────────────────────────┐
74
- Request A ─┐ │ Entity: account-42 │
75
- │ │ ┌──────┐ ┌──────┐ ┌──────┐ │
76
- Request B ─┼─► Route ─┼─►│ Msg1 │─►│ Msg2 │─►│ Msg3 │─► [Executor] ─┐ │
77
- │ │ └──────┘ └──────┘ └──────┘ │ │
78
- Request C ─┘ │ Sequential ◄────────────┘ │
79
- └─────────────────────────────────────────────────┘
80
-
81
- Meanwhile, account-99, order-7, user-abc — all execute
82
- in parallel on the same cluster, completely independent.
72
+ account:a-1 ──► [Worker] ──► handler1 → handler2 → handler3 (sequential)
73
+ account:a-2 ──► [Worker] ──► handler1 → handler2 (sequential)
74
+ order:o-5 ──► [Worker] ──► handler1 (sequential)
75
+ (all concurrent across entities)
83
76
  ```
84
77
 
85
- This eliminates an entire class of bugs (lost updates, dirty reads, write skew, phantom reads on hot entities) without pessimistic locks, without optimistic retries, and without the `SELECT ... FOR UPDATE` that your DBA tells you not to use under load. The entity itself becomes the consistency boundary, and the consistency is structural rather than transactional.
78
+ One worker per entity. Spawned when a message arrives. Destroyed when idle. The worker runs on the event loop; async I/O interleaves naturally across entities. No threads, no separate processes, no extra NestJS contexts.
86
79
 
87
80
  ---
88
81
 
89
- ## How It Works
82
+ ## Quick Start
83
+
84
+ ### 1. Register the module
90
85
 
91
- ### Entities and messages
86
+ ```typescript
87
+ @Module({
88
+ imports: [
89
+ AtomicQueuesModule.forRoot({
90
+ redis: { host: 'localhost', port: 6379 },
91
+ entities: {
92
+ account: {},
93
+ order: { onInterrupt: 'dead-letter' },
94
+ },
95
+ }),
96
+ ],
97
+ })
98
+ export class AppModule {}
99
+ ```
92
100
 
93
- Everything in atomic-queues is an **entity** that receives **messages**. An entity is identified by a type and an ID — `account:a-42`, `order:o-17`, `user:u-abc`. A message is a command or query addressed to a specific entity instance. You define this relationship with two decorators:
101
+ ### 2. Define commands
94
102
 
95
103
  ```typescript
104
+ import { EntityType, QueueEntityId, Reply } from 'atomic-queues';
105
+
96
106
  @EntityType('account')
97
- export class WithdrawCommand {
107
+ class DepositCommand implements Reply<{ balance: number }> {
98
108
  constructor(
99
109
  @QueueEntityId() public readonly accountId: string,
100
110
  public readonly amount: number,
@@ -102,470 +112,375 @@ export class WithdrawCommand {
102
112
  }
103
113
  ```
104
114
 
105
- That's the entire contract. `@EntityType` says "this message targets the `account` entity type." `@QueueEntityId()` says "the value of `accountId` is the entity instance ID." When you enqueue this command, the runtime routes it to the log for `account:{accountId}` and guarantees sequential execution against that specific entity instance, cluster-wide.
115
+ ### 3. Handle commands
106
116
 
107
- ### Two levels of abstraction
108
-
109
- Entity types are defined implicitly — decorate your CQRS command or query class with `@EntityType`, and atomic-queues routes it through the per-entity log and gate system. Your `@CommandHandler` and `@QueryHandler` classes are the handlers. The handler code doesn't change. The guarantee changes — instead of executing inline on whatever request thread happens to call `commandBus.execute()`, your handler now executes sequentially per entity, cluster-wide.
117
+ Standard `@nestjs/cqrs` handlers, nothing new to learn:
110
118
 
111
119
  ```typescript
112
- @EntityType('account')
113
- export class WithdrawCommand {
114
- constructor(
115
- @QueueEntityId() public readonly accountId: string,
116
- public readonly amount: number,
117
- ) {}
118
- }
119
-
120
- @CommandHandler(WithdrawCommand)
121
- export class WithdrawHandler implements ICommandHandler<WithdrawCommand> {
122
- async execute(cmd: WithdrawCommand) {
123
- // This runs sequentially per account — cluster-wide.
124
- // No locks. No transactions. The dispatch engine guarantees it.
120
+ @CommandHandler(DepositCommand)
121
+ class DepositHandler implements ICommandHandler<DepositCommand> {
122
+ async execute(cmd: DepositCommand) {
123
+ // Runs sequentially per accountId — no concurrent deposits to the same account
124
+ const balance = await this.accountService.deposit(cmd.accountId, cmd.amount);
125
+ return { balance };
125
126
  }
126
127
  }
127
128
  ```
128
129
 
129
- The library auto-discovers `@CommandHandler` and `@QueryHandler` classes at boot and wires them into the dispatch pipeline. Your existing CQRS architecture gets per-entity sequential guarantees without changing a single handler.
130
-
131
- ### Enqueuing messages
130
+ ### 4. Dispatch
132
131
 
133
132
  ```typescript
134
- // Fire-and-forget
135
- await queueBus.enqueue(new WithdrawCommand(accountId, 100));
136
-
137
- // Enqueue and block until result — return type inferred from Reply<T> brand
138
- const balance = await queueBus.enqueueAndWait(new GetBalanceQuery(accountId));
133
+ import { QueueBus } from 'atomic-queues';
139
134
 
140
- // Scoped to an entity type
141
- await queueBus.forEntity('account').enqueueBulk([charge1, charge2, charge3]);
135
+ @Injectable()
136
+ class PaymentService {
137
+ constructor(private readonly queueBus: QueueBus) {}
142
138
 
143
- // Cross-service: string-based API, no class import needed
144
- await queueBus.enqueue('warehouse', 'ReserveStockCommand', 'SKU-001', { sku: 'SKU-001', quantity: 50 });
145
- const stock = await queueBus.enqueueAndWait('warehouse', 'GetStockQuery', 'SKU-001', { sku: 'SKU-001' });
139
+ async deposit(accountId: string, amount: number) {
140
+ // Fire and forget
141
+ await this.queueBus.enqueue(new DepositCommand(accountId, amount));
146
142
 
147
- // Scoped cross-service
148
- const warehouse = queueBus.forEntity('warehouse');
149
- await warehouse.enqueue('ReserveStockCommand', 'SKU-001', { sku: 'SKU-001', quantity: 50 });
143
+ // Wait for typed result (Reply<R> branding)
144
+ const { balance } = await this.queueBus.enqueueAndWait(
145
+ new DepositCommand(accountId, amount),
146
+ );
147
+ }
148
+ }
150
149
  ```
151
150
 
152
- ---
153
-
154
- ## The Dispatch Engine
155
-
156
- Under every API call is the same pipeline: **message → Redis log → Lua scheduler → gate → executor → handler**. Understanding this pipeline is key to understanding what atomic-queues actually guarantees and why it can guarantee it without locks.
157
-
158
- ### Per-entity message logs
159
-
160
- When you call `enqueue()`, the message is serialized to JSON and appended to a Redis list (`LPUSH aq:log:account:a-42`), and the entity key is added to a global ready set (`SADD aq:ready account:a-42`). A pub/sub notification wakes the executor pool. Three Redis commands, pipelined in one round-trip.
161
-
162
- The log is the source of truth for ordering. Redis lists are FIFO — `LPUSH` appends to the head, `RPOP` consumes from the tail. Messages for the same entity are always processed in enqueue order.
163
-
164
- ### The dispatch gate
165
-
166
- The core consistency primitive is the **dispatch gate** — a Redis key per entity (`SET aq:gate:account:a-42 <token> EX 30 NX`). The `NX` flag means only one executor can acquire it. The `EX` TTL means a crashed executor releases it automatically. This is not a distributed lock in the Redlock sense — there's no quorum, no retry loop, no backoff. If the gate is held, the scheduler moves on to the next ready entity. Zero contention between entities, zero blocking within the scheduling loop.
167
-
168
- ### Atomic Lua scheduling
169
-
170
- A single Lua script runs atomically in Redis to perform the entire dispatch cycle:
171
-
172
- 1. Sample entities from the ready set (`SRANDMEMBER` with batch size 32)
173
- 2. Try to acquire the gate for each candidate (`SET NX EX`)
174
- 3. On first successful acquisition, pop the next message from that entity's log (`RPOP`)
175
- 4. Remove the entity from the ready set if its log is now empty
176
-
177
- Because Lua scripts execute atomically in Redis, the pick → gate acquisition → message pop sequence cannot be interleaved by another executor on another node. This is what eliminates race conditions — not locks, but atomicity at the Redis command level.
178
-
179
- ### Shared executor pool
180
-
181
- Traditional queue systems spawn a worker per queue or per entity type. With thousands of entities, that means thousands of blocking Redis connections, thousands of event loops, and a scaling problem that grows linearly with your domain model.
182
-
183
- atomic-queues uses a **shared executor pool** — a configurable number of concurrent executors per node that dispatch messages from *any* ready entity. One pool can service millions of distinct entities. The pool self-regulates: it drains the ready set until empty or until the concurrency limit is hit, then sleeps until the next pub/sub tickle wakes it. There are no workers to spawn, monitor, or auto-scale.
184
-
185
- ### Gate refresh for long-running handlers
186
-
187
- If a handler runs longer than the gate TTL, the gate doesn't expire — the executor pool runs a background interval that extends the TTL while the handler is still executing. This prevents false recovery (another node re-dispatching the same message) without requiring an unreasonably large TTL as the safety default.
188
-
189
- ### Multiplexed result collection
190
-
191
- Request-reply (`enqueueAndWait` / `sendAndWait`) uses a single `PSUBSCRIBE` connection per node for all concurrent result waits. Hundreds or thousands of pending results share one TCP connection to Redis, routed to the correct promise via correlation ID. No connection-per-call, no connection pool exhaustion, no subscriber amplification.
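The correlation-ID routing described for 2.3.0 can be sketched without Redis. This is a hypothetical illustration, not the removed `result-collector` service: `ResultMultiplexerSketch`, `waitFor`, and `onMessage` are invented names, and the single subscriber connection is assumed to call `onMessage` for every result it receives.

```typescript
type Waiter = {
  resolve: (value: unknown) => void;
  timer: ReturnType<typeof setTimeout>;
};

class ResultMultiplexerSketch {
  // All pending waits share this one map, and therefore one subscriber.
  private readonly pending = new Map<string, Waiter>();

  // Register interest in a result and get a promise for it.
  waitFor(correlationId: string, timeoutMs: number): Promise<unknown> {
    return new Promise((resolve, reject) => {
      const timer = setTimeout(() => {
        this.pending.delete(correlationId);
        reject(new Error(`timed out waiting for ${correlationId}`));
      }, timeoutMs);
      this.pending.set(correlationId, { resolve, timer });
    });
  }

  // Called by the (assumed) single subscriber for each incoming result.
  // Returns false when nobody on this node is waiting for that ID.
  onMessage(correlationId: string, payload: string): boolean {
    const waiter = this.pending.get(correlationId);
    if (!waiter) return false;
    clearTimeout(waiter.timer);
    this.pending.delete(correlationId);
    waiter.resolve(JSON.parse(payload));
    return true;
  }
}
```

The number of connections stays constant no matter how many calls are waiting; only the map grows.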
151
+ First message for `account:a-123` spawns a worker. All subsequent messages for that account queue behind it. The handler runs on your app's event loop using your existing DI container.
192
152
 
193
153
  ---
194
154
 
195
- ## Cross-Service Communication
196
-
197
- This is where atomic-queues stops being a "queue library" and becomes a **distributed coordination primitive**.
198
-
199
- ### The problem it solves
155
+ ## Queries
200
156
 
201
- In a microservices architecture, the standard way for Service A to tell Service B to do something is: define a gRPC/REST contract, deploy an API gateway or service mesh, handle serialization, implement retries, manage circuit breakers, and hope the schema stays in sync across repos. For async communication, add a message broker (RabbitMQ, Kafka, SQS), define topic/queue naming conventions, implement dead-letter handling, and build consumer groups.
202
-
203
- atomic-queues replaces all of that with Redis.
204
-
205
- ### How it works
206
-
207
- Enable the distributed registry and any service connected to the same Redis instance can send typed messages to any entity — regardless of which service owns the handler.
157
+ Queries work identically to commands but route through the `QueryBus`. They are sequenced with commands: a query enqueued after a deposit will see the deposit's effect.
208
158
 
209
159
  ```typescript
210
- // warehouse-service: defines and handles the entity
211
- AtomicQueuesModule.forRoot({
212
- redis: { url: process.env.REDIS_URL },
213
- registry: { enabled: true, serviceName: 'warehouse-service' },
214
- })
215
-
216
- // order-service: generate classes from the live registry, then use them like local CQRS
217
- import { ReserveStockCommand, GetStockQuery } from './generated';
160
+ @EntityType('account')
161
+ class GetBalanceQuery implements Reply<{ balance: number }> {
162
+ constructor(@QueueEntityId() public readonly accountId: string) {}
163
+ }
218
164
 
219
- await queueBus.enqueue(new ReserveStockCommand({ sku: 'SKU-001', quantity: 50 }));
220
- const stock = await queueBus.enqueueAndWait(new GetStockQuery({ sku: 'SKU-001' }));
221
- stock.available; // fully typed — no string API, no explicit timeout, no code dependency on warehouse-service
165
+ const { balance } = await queueBus.enqueueAndWait(new GetBalanceQuery('acc-123'));
222
166
  ```
223
167
 
224
- When `warehouse-service` starts, it scans its own `@CommandHandler` and `@QueryHandler` classes and publishes **entity contracts** to Redis — a JSON document listing the entity type, accepted messages, optional JSON schemas, and reply schemas, refreshed via heartbeat TTL. When `order-service` enqueues a message, the registry validates it at the call site *before* it enters the log: entity type exists, message name is accepted, payload matches schema. Errors are immediate and descriptive — not silent dead letters discovered hours later in a DLQ dashboard.
225
-
226
- The Lua scheduler ensures each node only dispatches messages for entity types it owns handlers for. Services that don't own any handlers (API gateways, pure producers) participate in the registry without stealing messages from handler-owning nodes.
227
-
228
- ### What this replaces
229
-
230
- Think about what you no longer need:
231
-
232
- **No API gateway between services.** Messages go directly into the entity's log via Redis. The "endpoint" is the entity type and message name, not a URL.
233
-
234
- **No message broker.** Redis is the transport, the ordering guarantee, and the persistence layer. You don't need RabbitMQ, Kafka, or SQS to get async cross-service communication with ordering guarantees.
235
-
236
- **No schema registry as a separate service.** The entity contracts live in Redis alongside the message logs. Schema validation happens at the call site. Zod schemas on the producer side serialize to JSON Schema in the registry and validate on every enqueue.
168
+ ---
237
169
 
238
- **No service discovery.** The registry *is* service discovery. When a service starts, it publishes what it handles. When a service stops, its registrations TTL out. Other services discover capabilities by reading the registry.
170
+ ## How It Works
239
171
 
240
- **No serialization framework.** Messages are JSON. The wire protocol is three Redis commands. No Protobuf compilation step, no `.proto` files, no code generation from IDL. (Though atomic-queues does offer codegen from the live registry — it generates decorated TypeScript classes so Service A gets compile-time type safety for messages destined to Service B, without importing Service B's code.)
172
+ ### Virtual Actors (EntityWorker)
241
173
 
242
- **No separate dead-letter infrastructure.** Failed messages are dead-lettered per entity type in Redis, queryable via the same connection.
174
+ Each entity instance (`account:a-123`, `order:o-5`) gets its own virtual actor: a processor callback with a FIFO message queue. The actor:
243
175
 
244
- ### Schema validation
176
+ 1. Spawns on first message (no pre-registration needed)
177
+ 2. Processes messages sequentially (one at a time, on the event loop)
178
+ 3. Yields at `await` points (other entities' actors proceed concurrently)
179
+ 4. Tears down after idle timeout (configurable, default 30s)
245
180
 
246
- Attach Zod schemas to message classes for runtime safety across service boundaries:
181
+ ### Write-Ahead Log (WAL)
247
182
 
248
- ```typescript
249
- import { Schema } from 'atomic-queues';
250
- import { z } from 'zod';
183
+ Every message is dual-written: in-memory queue (speed) + Redis WAL (durability). The WAL is a state machine:
251
184
 
252
- @Schema(z.object({
253
- accountId: z.string().uuid(),
254
- amount: z.number().positive(),
255
- }))
256
- @EntityType('account')
257
- export class WithdrawCommand {
258
- @QueueEntityId() public readonly accountId: string;
259
- public readonly amount: number;
260
- }
185
+ ```
186
+ enqueued → dispatched → completed | failed | interrupted
261
187
  ```
262
188
 
263
- The Zod schema is serialized to JSON Schema and stored in the registry. Every service validates payloads against it, including services that don't import your code and services written in a different language that read the registry directly from Redis.
189
+ Each transition is an atomic Lua script that checks the current state before moving forward. Recovery runs automatically on startup:
264
190
 
265
- ### Entity co-ownership
191
+ - `enqueued` → re-dispatch (handler never ran — this is the first attempt, not a retry)
192
+ - `dispatched` → **dead-letter** (handler was running when the process crashed — never re-execute)
193
+ - `completed` / `failed` / `interrupted` → cleanup (stale terminal entries)
266
194
 
267
- Multiple services can handle different message types on the same entity. Service A handles `DepositCommand` and `WithdrawCommand` on the `account` entity type. Service B handles `FreezeAccountCommand` on the same entity type. The registry merges their contracts automatically. The dispatch gate still ensures single-writer semantics per entity instance, regardless of which service's executor picks up the message.
195
+ A background cleanup timer evicts terminal WAL entries on a configurable interval.
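The WAL states and recovery rules above can be read as a small state machine. The following is only an illustrative sketch: `WalState`, `RecoveryAction`, `canTransition`, and `recoveryAction` are hypothetical names, not exports of atomic-queues, and the real transitions run as Lua scripts in Redis rather than in TypeScript. It models the default behavior, where a `dispatched` entry found on recovery is dead-lettered.

```typescript
// Hypothetical model of the WAL lifecycle; not the library's internal API.
type WalState = 'enqueued' | 'dispatched' | 'completed' | 'failed' | 'interrupted';
type RecoveryAction = 're-dispatch' | 'dead-letter' | 'cleanup';

// Forward transitions, taken from: enqueued → dispatched → completed | failed | interrupted
const TRANSITIONS: Record<WalState, readonly WalState[]> = {
  enqueued: ['dispatched'],
  dispatched: ['completed', 'failed', 'interrupted'],
  completed: [],
  failed: [],
  interrupted: [],
};

// A transition is legal only if the current state allows it (the "check before moving" rule).
function canTransition(from: WalState, to: WalState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// What startup recovery does with an entry found in a given state.
function recoveryAction(state: WalState): RecoveryAction {
  switch (state) {
    case 'enqueued':
      return 're-dispatch'; // handler never ran, so this is a first attempt
    case 'dispatched':
      return 'dead-letter'; // handler may have run partially, so never re-execute
    default:
      return 'cleanup'; // terminal entries are stale and can be evicted
  }
}
```

The point of the sketch is the asymmetry between `enqueued` and `dispatched`: only the former is safe to run again.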
268
196
 
269
- ### Runtime introspection
197
+ ### Master Topology (Cluster Mode)
270
198
 
271
- Any service can discover what the cluster offers at runtime, with no config files and no shared code:
199
+ Each replica set has a **deterministic master** — the node with the lowest `serverId` among live nodes in the same `serviceGroup`. No locks, no elections, no Redlock. All nodes read the same Redis-backed heartbeat registry and independently compute who the master is.
272
200
 
273
- ```typescript
274
- const contracts = await queueBus.introspect();
201
+ The master:
275
202
 
276
- contracts.entityTypes(); // ['account', 'warehouse', ...]
277
- contracts.hasEntity('warehouse'); // true
278
- contracts.messagesFor('warehouse'); // ['ReserveStockCommand', 'GetStockQuery']
279
- contracts.accepts('warehouse', 'ReserveStockCommand'); // true
280
- contracts.schemaFor('warehouse', 'ReserveStockCommand'); // { properties: { sku: ..., quantity: ... } }
281
- contracts.replySchemaFor('warehouse', 'GetStockQuery'); // { properties: { sku: ..., available: ... } }
203
+ - Owns the **worker assignment table**: which `entity:entityId` lives on which replica
204
+ - Routes all petitions: replicas forward via gRPC to the master
205
+ - Resolves workers via three tiers: existing assignment → consistent hash ring → least-loaded replica
206
+ - **Epoch fences** every dispatch: replicas reject commands from stale masters
282
207
 
283
- // Human-readable summary for logging/debugging
284
- console.log(contracts.toString());
208
+ ```
209
+ Replica Set: billing-service
210
+ ┌──────────────────────────────────────────────┐
211
+ │ Master (deterministic: lowest serverId) │
212
+ │ ├── Assignment Table │
213
+ │ │ account:a-1 → replica-2 │
214
+ │ │ account:a-2 → replica-1 │
215
+ │ └── Routes petitions, balances load │
216
+ │ │
217
+ │ Replica-1: [worker: account:a-2] │
218
+ │ Replica-2: [worker: account:a-1] │
219
+ │ Replica-3: (master pod, no workers yet) │
220
+ └──────────────────────────────────────────────┘
285
221
  ```
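The deterministic-master rule (lowest `serverId` among live nodes in the same `serviceGroup`) can be sketched as a pure function. This is an illustration under assumptions: `NodeHeartbeat` and `computeMaster` are hypothetical names, liveness is modeled as "last heartbeat within `nodeTTLMs`", and "lowest" is assumed to mean plain string ordering of `serverId`.

```typescript
// Hypothetical shape of one entry in the heartbeat registry.
interface NodeHeartbeat {
  serverId: string;
  serviceGroup: string;
  lastHeartbeatMs: number; // timestamp of the most recent heartbeat
}

// Every node can run this independently on the same registry snapshot
// and arrive at the same answer, which is why no lock is needed.
function computeMaster(
  nodes: NodeHeartbeat[],
  serviceGroup: string,
  nowMs: number,
  nodeTTLMs: number = 1500,
): string | undefined {
  const liveIds = nodes
    .filter((n) => n.serviceGroup === serviceGroup && nowMs - n.lastHeartbeatMs <= nodeTTLMs)
    .map((n) => n.serverId)
    .sort(); // assumption: default string ordering defines "lowest"
  return liveIds[0]; // undefined when the group has no live nodes
}
```

Because the result depends only on the live node set, two nodes disagree about the master only while they see different snapshots of that set.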
286
222
 
287
- ### Raw cross-service API
288
-
289
- For quick prototyping or dynamic dispatch, you can also use the string-based API — no classes, no codegen, no imports:
290
-
291
- ```typescript
292
- // Fire-and-forget
293
- await queueBus.enqueue('warehouse', 'ReserveStockCommand', 'SKU-001', {
294
- sku: 'SKU-001',
295
- quantity: 50,
296
- });
223
+ Masters interconnect across service groups:
224
+ ```
225
+ Master (billing) ←── gRPC ──→ Master (warehouse)
226
+ ```
297
227
 
298
- // Request-reply
299
- const stock = await queueBus.enqueueAndWait('warehouse', 'GetStockQuery', 'SKU-001', {
300
- sku: 'SKU-001',
301
- });
228
+ ### Master Failover
302
229
 
303
- // Scoped to an entity type
304
- const warehouse = queueBus.forEntity('warehouse');
305
- await warehouse.enqueue('ReserveStockCommand', 'SKU-001', { sku: 'SKU-001', quantity: 50 });
306
- const stock = await warehouse.enqueueAndWait('GetStockQuery', 'SKU-001', { sku: 'SKU-001' });
307
- ```
230
+ 1. Master crashes → heartbeat TTL expires
231
+ 2. Remaining nodes recompute leader from node list → next-lowest `serverId` becomes master
232
+ 3. New master queries all replicas via gRPC `ListWorkers`
233
+ 4. Rebuilds assignment table from live cluster state (petitions rejected during rebuild — fail-fast over misrouting)
234
+ 5. Old master pushes its worker list to the new master on demotion
235
+ 6. Resumes operations
308
236
 
309
- This works out of the box: the registry validates entity type and message name at the call site. For production services, class codegen gives you full type safety.
237
+ No split-brain: leadership is a pure function of the live node set. Epoch fencing rejects any stale-master commands that arrive during transitions.
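Epoch fencing on the replica side might look roughly like the sketch below. The README does not say how epochs are generated, so this assumes a monotonically increasing integer that is bumped on every master change; `EpochFence` is a hypothetical name, not a class exported by the package.

```typescript
// Hypothetical replica-side fence: remembers the highest master epoch seen
// and rejects any command stamped with an older one.
class EpochFence {
  private highestEpoch = 0;

  // Returns true if the command may be processed, false if it comes from a stale master.
  accept(commandEpoch: number): boolean {
    if (commandEpoch < this.highestEpoch) {
      return false; // stale master: reject rather than process
    }
    this.highestEpoch = commandEpoch;
    return true;
  }
}
```

A command from the old master that arrives after the new master has already dispatched something is rejected, which is the "reject, don't process" behavior described above.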
310
238
 
311
- ### Class codegen (recommended)
239
+ ### Health Monitoring
312
240
 
313
- Generate fully decorated TypeScript classes from the live registry; import them and use them like local CQRS classes with full autocomplete, type safety, and zero string APIs:
241
+ **Redis health**: Periodic `PING`. Consecutive failures above threshold → degraded mode (reject new messages, leader resigns, discovery steps down). Automatic recovery when Redis responds again.
314
242
 
315
- ```bash
316
- npx atomic-queues generate --classes -o src/generated
317
- ```
243
+ **gRPC peer connectivity**: Native gRPC channel state watching (`READY` → alive, `TRANSIENT_FAILURE` → suspected dead). Debounce timer prevents flapping on brief disconnects.
318
244
 
319
- This produces one file per entity type plus a barrel `index.ts`:
245
+ **Per-peer circuit breakers**: gRPC connections track consecutive failures. After threshold → circuit opens (fast-fail, no network calls). After cooldown → half-open (one probe). Success → closed. Failure → re-open.
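The closed → open → half-open cycle can be sketched as follows. This is a minimal illustration, not the package's implementation: `PeerCircuitBreaker` and its methods are hypothetical names, the clock is passed in explicitly to keep the example deterministic, and the circuit is assumed to open once the failure count reaches the threshold. The defaults mirror `circuitBreakerFailureThreshold` (3) and `circuitBreakerCooldownMs` (2000).

```typescript
type CircuitState = 'closed' | 'open' | 'half-open';

// Hypothetical per-peer breaker; one instance per gRPC connection.
class PeerCircuitBreaker {
  private state: CircuitState = 'closed';
  private consecutiveFailures = 0;
  private openedAtMs = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold: number = 3, cooldownMs: number = 2000) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // Should a call to this peer be attempted right now?
  canAttempt(nowMs: number): boolean {
    if (this.state === 'open' && nowMs - this.openedAtMs >= this.cooldownMs) {
      this.state = 'half-open'; // cooldown elapsed: allow exactly one probe
      return true;
    }
    return this.state === 'closed'; // open or already probing: fast-fail
  }

  recordSuccess(): void {
    this.state = 'closed';
    this.consecutiveFailures = 0;
  }

  recordFailure(nowMs: number): void {
    this.consecutiveFailures += 1;
    // A failed probe re-opens immediately; otherwise open at the threshold.
    if (this.state === 'half-open' || this.consecutiveFailures >= this.failureThreshold) {
      this.state = 'open';
      this.openedAtMs = nowMs;
    }
  }

  currentState(): CircuitState {
    return this.state;
  }
}
```

While the circuit is open, `canAttempt` returns `false` without touching the network, which is what makes the failure path cheap.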
320
246
 
321
- ```
322
- src/generated/
323
- warehouse.ts # ReserveStockCommand, GetStockQuery, data interfaces, reply interfaces
324
- billing.ts # ChargeCommand, GetInvoiceQuery, ...
325
- index.ts # export * from './warehouse'; export * from './billing';
326
- ```
247
+ ---
327
248
 
328
- Then use them exactly like local command/query classes:
249
+ ## Enqueuing Messages
329
250
 
330
251
  ```typescript
331
- import { ReserveStockCommand, GetStockQuery } from './generated';
252
+ // Fire-and-forget
253
+ await queueBus.enqueue(new WithdrawCommand(accountId, 100));
332
254
 
333
- // Fire-and-forget: full autocomplete on constructor fields
334
- await queueBus.enqueue(new ReserveStockCommand({ sku: 'SKU-001', quantity: 50 }));
255
+ // Enqueue and wait for typed result
256
+ const { balance } = await queueBus.enqueueAndWait(new GetBalanceQuery(accountId));
335
257
 
336
- // Request-reply — return type inferred from Reply<T> brand, no explicit timeout
337
- const stock = await queueBus.enqueueAndWait(new GetStockQuery({ sku: 'SKU-001' }));
338
- stock.available; // typed as number — full IDE support
258
+ // Scoped API
259
+ const account = queueBus.forEntity('account', accountId);
260
+ await account.enqueue(new DepositCommand(accountId, 500));
261
+
262
+ // Raw string API (cross-service, no class needed)
263
+ await queueBus.enqueue('warehouse', 'ReserveStockCommand', 'SKU-001', {
264
+ sku: 'SKU-001', quantity: 50,
265
+ });
339
266
  ```
340
267
 
341
- Generated query classes implement `Reply<T>` via a phantom type brand, so `enqueueAndWait` infers the return type at compile time with zero runtime cost. No explicit generics, no timeout parameter — timeouts are resolved from config.
268
+ ---
342
269
 
343
- You can also filter to specific entity types:
270
+ ## Backpressure
344
271
 
345
- ```bash
346
- npx atomic-queues generate --classes -o src/generated --entities warehouse,billing
347
- ```
348
-
349
- ### Other codegen formats
272
+ Three levels, all configurable:
350
273
 
351
- ```bash
352
- # TypeScript interfaces + DispatchMap (for typed string-based API)
353
- npx atomic-queues generate --ts --output ./generated/contracts.ts
274
+ | Level | Config | Behavior |
275
+ |-------|--------|----------|
276
+ | Per-worker | `workerMaxQueueDepth` | Rejects with `QUEUE_DEPTH_EXCEEDED` |
277
+ | Global workers | `maxTotalWorkers` | Rejects new entities with `WORKER_LIMIT_EXCEEDED` (existing entities still accepted) |
278
+ | Global depth | `maxTotalQueueDepth` | Rejects all enqueues with `QUEUE_DEPTH_EXCEEDED` |
354
279
 
355
- # JSON Schema (language-agnostic)
356
- npx atomic-queues generate --json-schema --output ./generated/schema.json
280
+ In cluster mode, the master also enforces `maxConcurrentPetitions` to bound petition processing.
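The three single-node levels can be pictured as one admission check at enqueue time. The sketch below is an assumption-laden illustration: `admit`, `BackpressureLimits`, and `LoadSnapshot` are hypothetical names, and the order in which the checks run is a guess. The error codes and the "`0` = unbounded" convention come from the tables in this README.

```typescript
type Rejection = 'QUEUE_DEPTH_EXCEEDED' | 'WORKER_LIMIT_EXCEEDED';

interface BackpressureLimits {
  workerMaxQueueDepth: number; // per worker, 0 = unbounded
  maxTotalWorkers: number;     // across all entity types, 0 = unbounded
  maxTotalQueueDepth: number;  // across all workers, 0 = unbounded
}

interface LoadSnapshot {
  totalWorkers: number;
  totalQueueDepth: number;
  // Pending messages for the target entity's worker, or undefined if no worker exists yet.
  workerQueueDepth: number | undefined;
}

// Returns null when the message is admitted, otherwise the rejection code.
function admit(limits: BackpressureLimits, load: LoadSnapshot): Rejection | null {
  if (limits.maxTotalQueueDepth > 0 && load.totalQueueDepth >= limits.maxTotalQueueDepth) {
    return 'QUEUE_DEPTH_EXCEEDED'; // global depth: rejects all enqueues
  }
  if (load.workerQueueDepth === undefined) {
    // New entity: only the worker limit applies, since its queue is empty.
    if (limits.maxTotalWorkers > 0 && load.totalWorkers >= limits.maxTotalWorkers) {
      return 'WORKER_LIMIT_EXCEEDED';
    }
    return null;
  }
  if (limits.workerMaxQueueDepth > 0 && load.workerQueueDepth >= limits.workerMaxQueueDepth) {
    return 'QUEUE_DEPTH_EXCEEDED'; // per-worker depth
  }
  return null;
}
```

Note that an existing entity is still admitted when `maxTotalWorkers` is reached, matching the "existing entities still accepted" behavior in the table.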
357
281
 
358
- # Full registry snapshot
359
- npx atomic-queues generate --snapshot --output ./generated/snapshot.json
360
- ```
282
+ ---
361
283
 
362
- ### Config-driven timeouts
284
+ ## Configuration
363
285
 
364
- `enqueueAndWait` resolves timeouts automatically — you never need to pass one explicitly:
286
+ ### Minimal (single server)
365
287
 
366
288
  ```typescript
367
289
  AtomicQueuesModule.forRoot({
368
- executor: {
369
- gateTTL: 30,
370
- defaultReplyTimeout: 15000, // global fallback: 15s
371
- },
372
- entities: {
373
- warehouse: {
374
- replyTimeout: 5000, // warehouse-specific: 5s
375
- },
376
- },
290
+ redis: { host: 'localhost', port: 6379 },
377
291
  })
378
292
  ```
379
293
 
380
- Resolution chain: explicit arg → per-entity `replyTimeout` → global `defaultReplyTimeout` → `gateTTL * 2 * 1000`. If nothing is configured, defaults to 60s.
294
+ That's it. Everything else has defaults. Add `entities` to customize per-entity behavior, `grpc` to enable cluster mode.
381
295
 
382
- ---
296
+ ### Full reference
383
297
 
384
- ## Redis *is* the Protocol
298
+ #### `AtomicQueuesModule.forRoot(config)`
385
299
 
386
- This is the most important architectural decision in the project, and it has implications that go far beyond NestJS.
300
+ | Field | Type | Required | Default | Description |
301
+ |-------|------|----------|---------|-------------|
302
+ | `redis` | `IRedisConfig` | **yes** | — | Redis connection. Accepts `{ host, port, password, db }` or `{ url }` |
303
+ | `entities` | `Record<string, IEntityConfig>` | no | `{}` | Per-entity-type overrides (see below) |
304
+ | `keyPrefix` | `string` | no | `'aq'` | Prefix for all Redis keys |
305
+ | `maxTotalWorkers` | `number` | no | `10000` | Max concurrent entity workers across all types. `0` = unbounded |
306
+ | `maxTotalQueueDepth` | `number` | no | `100000` | Max total pending messages across all workers. `0` = unbounded |
307
+ | `retry` | `IRetryPolicy` | no | `{ maxAttempts: 1 }` | Default retry policy (strictly-once by default) |
308
+ | `wal` | `IWalConfig` | no | `{ enabled: true }` | Write-ahead log settings |
309
+ | `grpc` | `IGrpcConfig` | no | `{ enabled: false }` | Cluster mode — omit entirely for single-server |
310
+ | `verbose` | `boolean` | no | `false` | Enable verbose logging |
387
311
 
388
- The wire protocol is [fully documented](./WIRE-PROTOCOL.md), intentionally simple, and versioned with breaking-change semantics. Enqueuing a message is three Redis commands:
312
+ #### `IEntityConfig`: per entity type
389
313
 
314
+ ```typescript
315
+ entities: {
316
+ account: { /* all fields optional */ },
317
+ order: { onInterrupt: 'dead-letter', workerIdleTimeout: 60_000 },
318
+ }
390
319
  ```
391
- LPUSH aq:log:account:a-1 '<message JSON>'
392
- SADD aq:ready account:a-1
393
- PUBLISH aq:tickle 1
394
- ```
395
-
396
- **Any language with a Redis client is a first-class citizen.** A Python data pipeline can enqueue commands to a NestJS-hosted entity. A Go microservice can fire events at entities defined in TypeScript. A Rust executor can run the same Lua scheduling script and compete for gates on equal terms with the Node.js executor pool. A Bash script can trigger a workflow.
397
320
 
398
- This is not a feature of most frameworks. Orleans requires the Orleans silo. Temporal requires the Temporal server with its own database. All of them are monoglot execution environments — handlers must be written in the framework's language.
321
+ | Field | Type | Default | Description |
322
+ |-------|------|---------|-------------|
323
+ | `defaultEntityId` | `string` | — | Property name used as entity ID when `@QueueEntityId` is not present |
324
+ | `onInterrupt` | `'dead-letter' \| 'retry'` | `'dead-letter'` | What to do when a message is found mid-execution on recovery |
325
+ | `workerIdleTimeout` | `number` (ms) | `30000` | How long an idle worker lives before teardown |
326
+ | `workerMaxQueueDepth` | `number` | `0` (unbounded) | Max pending messages per worker. Rejects with `QUEUE_DEPTH_EXCEEDED` |
327
+ | `replyTimeout` | `number` (ms) | `5000` | Default timeout for `enqueueAndWait` on this entity type |
328
+ | `retry` | `IRetryPolicy` | inherits root | Per-entity retry policy override |
399
329
 
400
- atomic-queues is **polyglot by construction**. The coordination happens in Redis, not in the runtime. Any process that speaks the wire protocol participates on equal terms, and the [WIRE-PROTOCOL.md](./WIRE-PROTOCOL.md) includes a complete Python reference client to prove it.
330
+ #### `IRetryPolicy`
401
331
 
402
- This opens architectures that are genuinely difficult to build otherwise:
332
+ | Field | Type | Default | Description |
333
+ |-------|------|---------|-------------|
334
+ | `maxAttempts` | `number` | `1` | Total attempts. `1` = strictly once, no retries |
335
+ | `backoff` | `'fixed' \| 'exponential'` | `'exponential'` | Backoff strategy between retries |
336
+ | `backoffDelay` | `number` (ms) | `1000` | Base delay between retries |
337
+ | `maxDelay` | `number` (ms) | `30000` | Maximum delay cap for exponential backoff |
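To make the interaction of these four fields concrete, here is a rough sketch of how a delay could be derived from a policy. The README does not state the exact exponential formula, so doubling per retry is an assumption, and `RetryPolicy` and `nextRetryDelayMs` are hypothetical names rather than the package's `IRetryPolicy` implementation.

```typescript
interface RetryPolicy {
  maxAttempts: number;
  backoff: 'fixed' | 'exponential';
  backoffDelay: number; // ms
  maxDelay: number;     // ms, cap for exponential backoff
}

// Defaults as listed in the table above.
const DEFAULT_RETRY: RetryPolicy = {
  maxAttempts: 1,
  backoff: 'exponential',
  backoffDelay: 1000,
  maxDelay: 30000,
};

// attemptsMade: how many attempts have already failed (1 after the first failure).
// Returns the delay before the next attempt, or null when attempts are exhausted.
function nextRetryDelayMs(policy: RetryPolicy, attemptsMade: number): number | null {
  if (attemptsMade >= policy.maxAttempts) {
    return null; // exhausted: the message would go to the dead letter queue
  }
  if (policy.backoff === 'fixed') {
    return policy.backoffDelay;
  }
  // Assumed formula: base delay doubled for each retry, capped at maxDelay.
  const delay = policy.backoffDelay * Math.pow(2, attemptsMade - 1);
  return Math.min(delay, policy.maxDelay);
}
```

With the default policy the function returns `null` after the first failure, which is the "strictly once, no retries" behavior.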
403
338
 
404
- - **Ingest in Go, process in Node.js, analyze in Python.** Each layer speaks Redis. The entity logs are the integration boundary.
405
- - **Rust executors for CPU-hot-path entities.** The same Lua scheduler, the same gates, the same entity logs. The Rust process is just another executor that happens to be faster. The Node.js side doesn't know or care.
406
- - **Gradual migration.** Move one entity type's handlers to a different service, a different language, or a different infrastructure — without touching any other service's code. The entity contract in the registry is the interface, not the import statement.
407
- - **Edge coordination.** An IoT device with a Redis client and 3 commands of knowledge can participate in the same entity model as your cloud services.
339
+ #### `IWalConfig`: write-ahead log
408
340
 
409
- ---
410
-
411
- ## Quick Start
341
+ | Field | Type | Default | Description |
342
+ |-------|------|---------|-------------|
343
+ | `enabled` | `boolean` | `true` | Disable WAL for testing only — **never disable in production** |
344
+ | `cleanupInterval` | `number` (ms) | `5000` | How often to evict completed/failed WAL entries |
345
+ | `entryTTL` | `number` (seconds) | `86400` (24h) | Safety TTL for WAL entries in Redis |
412
346
 
413
- ```typescript
414
- import { Module } from '@nestjs/common';
415
- import { AtomicQueuesModule } from 'atomic-queues';
416
-
417
- @Module({
418
- imports: [
419
- AtomicQueuesModule.forRoot({
420
- redis: { host: 'localhost', port: 6379 },
421
- }),
422
- ],
423
- })
424
- export class AppModule {}
425
- ```
347
+ #### `IGrpcConfig` — cluster mode
426
348
 
427
- Define a command and enqueue it:
349
+ Omit entirely for single-server. Set `enabled: true` to activate.
428
350
 
429
351
  ```typescript
430
- @EntityType('account')
431
- export class WithdrawCommand {
432
- constructor(
433
- @QueueEntityId() public readonly accountId: string,
434
- public readonly amount: number,
435
- ) {}
436
- }
437
-
438
- @Injectable()
439
- export class PaymentService {
440
- constructor(private readonly queueBus: QueueBus) {}
441
-
442
- async withdraw(accountId: string, amount: number) {
443
- await this.queueBus.enqueue(new WithdrawCommand(accountId, amount));
444
- }
352
+ grpc: {
353
+ enabled: true,
354
+ listenAddress: '0.0.0.0:50051',
355
+ advertisedAddress: '10.0.1.5:50051',
356
+ serverId: 'billing-1',
357
+ serviceGroup: 'billing',
445
358
  }
446
359
  ```
447
360
 
448
- The command is appended to `account:{accountId}`'s message log and executed sequentially by the shared executor pool. No handler registration, no worker setup, no queue configuration.
361
+ | Field | Type | Default | Description |
362
+ |-------|------|---------|-------------|
363
+ | `enabled` | `boolean` | `false` | Enable gRPC cluster transport |
364
+ | `listenAddress` | `string` | `'0.0.0.0:50051'` | Address the gRPC server binds to |
365
+ | `advertisedAddress` | `string` | `os.hostname() + ':50051'` | Address other nodes use to reach this one |
366
+ | `serverId` | `string` | auto-generated UUID | Unique node ID. Must be stable across restarts for predictable leader election |
367
+ | `serviceGroup` | `string` | `'default'` | Logical grouping — nodes in the same group form a replica set |
368
+ | `maxForwardHops` | `number` | `3` | Max cross-service forwarding hops to prevent loops |
369
+ | `maxConcurrentPetitions` | `number` | `50` | Max in-flight petitions the master processes. `0` = unbounded |
370
+
371
+ **Timing (ms)**
372
+
373
+ | Field | Default | Description |
374
+ |-------|---------|-------------|
375
+ | `heartbeatMs` | `400` | How often this node heartbeats to Redis |
376
+ | `nodeTTLMs` | `1500` | Node considered dead after this long without heartbeat |
377
+ | `reconcileIntervalMs` | `2000` | How often to scan Redis for membership changes |
378
+ | `leaderTTLMs` | `2000` | Leader lock TTL |
379
+ | `leaderRenewalMs` | `400` | Leader lock renewal interval |
380
+ | `leaderDebounceMs` | `800` | Debounce window before recomputing leader after ring changes |
381
+
382
+ **Health monitoring**
383
+
384
+ | Field | Default | Description |
385
+ |-------|---------|-------------|
386
+ | `peerMonitorEnabled` | `true` | Watch gRPC channel state for fast failure detection |
387
+ | `peerSuspectDebounceMs` | `500` | Debounce before declaring a peer suspected-dead |
388
+ | `redisHealthCheckMs` | `500` | Redis PING interval |
389
+ | `redisHealthFailureThreshold` | `3` | Consecutive PING failures before degraded mode |
390
+
391
+ **Circuit breaker (per-peer gRPC connections)**
392
+
393
+ | Field | Default | Description |
394
+ |-------|---------|-------------|
395
+ | `circuitBreakerFailureThreshold` | `3` | Consecutive failures before opening the circuit |
396
+ | `circuitBreakerCooldownMs` | `2000` | Time before a half-open probe is allowed |
397
+
398
+ **gRPC keepalive**
399
+
400
+ | Field | Default | Description |
401
+ |-------|---------|-------------|
402
+ | `keepaliveTimeMs` | `10000` | Keepalive ping interval (minimum enforced by grpc-js) |
403
+ | `keepaliveTimeoutMs` | `5000` | Connection dead if no keepalive response |
404
+
405
+ **RPC deadlines** (`deadlines` sub-object)
406
+
407
+ | Field | Default | Description |
408
+ |-------|---------|-------------|
409
+ | `deadlines.forwardMs` | `1500` | Deadline for fire-and-forget RPCs (forward, petition, enqueueToWorker) |
410
+ | `deadlines.pingMs` | `1000` | Deadline for health ping |
411
+ | `deadlines.andWaitMs` | `60000` | Default deadline for `*AndWait` RPCs when no `replyTimeout` is set |
412
+ | `deadlines.syncMs` | `1000` | Deadline for `listWorkers` during master table rebuild |
413
+ | `deadlines.connectivityWatchMs` | `30000` | Timeout for peer connectivity watch loop re-arm |
449
414
 
450
415
  ---
451
416
 
452
- ## Configuration
417
+ ## Dead Letter Queue
453
418
 
454
- ```typescript
455
- AtomicQueuesModule.forRoot({
456
- redis: { host: 'localhost', port: 6379 },
457
-
458
- executor: {
459
- poolSize: 1, // concurrent executors per node
460
- gateTTL: 30, // seconds before gate expires (safety net)
461
- defaultReplyTimeout: 15000, // global default for enqueueAndWait (ms)
462
- },
463
-
464
- entities: {
465
- account: {
466
- defaultEntityId: 'accountId',
467
- gateTTL: 60,
468
- retry: { maxAttempts: 5, backoff: 'exponential', backoffDelay: 2000 },
469
- replyTimeout: 5000, // per-entity enqueueAndWait timeout (ms)
470
- },
471
- },
472
-
473
- registry: {
474
- enabled: false,
475
- serviceName: 'my-service',
476
- schemaValidation: false,
477
- heartbeatInterval: 10000,
478
- registrationTTL: 30,
479
- },
480
-
481
- keyPrefix: 'aq',
482
- verbose: false,
483
- })
484
- ```
485
-
486
- Optional peer dependencies:
419
+ Messages found in `dispatched` state on recovery, or that exhaust all retry attempts, are moved to a Redis-backed dead letter queue.
487
420
 
488
421
  ```bash
489
- npm install @nestjs/cqrs # for CQRS handler auto-wiring
490
- npm install zod zod-to-json-schema # for schema validation in the registry
422
+ npx atomic-queues dlq list
423
+ npx atomic-queues dlq replay --id <message-id>
424
+ npx atomic-queues dlq purge
491
425
  ```
492
426
 
493
427
  ---
494
428
 
495
- ## Guarantees
496
-
497
- | Guarantee | Scope | Mechanism |
498
- |---|---|---|
499
- | FIFO per entity | Cluster-wide | Redis list (`LPUSH`/`RPOP`) |
500
- | Single-writer per entity | Cluster-wide | Gate key (`SET NX EX`) |
501
- | At-least-once delivery | Per message | Retry on gate TTL expiry |
502
- | Parallel across entities | Per node | Executor pool concurrency |
503
- | Durability | Per message | Redis persistence (AOF/RDB) |
504
-
505
- ### What this does NOT guarantee
506
-
507
- **Exactly-once processing.** Like every distributed message system — Orleans, Akka, Temporal, Kafka — handlers must be idempotent. If an executor crashes mid-processing, the gate TTL expires and the message retries on another node. This is a fundamental constraint of distributed systems, not a limitation of the library.
429
+ ## CLI
508
430
 
509
- ---
510
-
511
- ## How It Compares
431
+ ```bash
432
+ # Inspect live entity/command/query registry from Redis
433
+ npx atomic-queues introspect
512
434
 
513
- | Capability | BullMQ | Temporal | atomic-queues |
514
- |---|---|---|---|
515
- | Per-entity ordering | Manual (named queues) | Workflow-scoped | Built-in, zero config |
516
- | Cross-entity parallelism | Worker pools | Worker pools | Shared executor pool |
517
- | Stateful entities | No | Workflow state | Per-entity sequential handlers |
518
- | Cross-service messaging | Shared queue names | gRPC | Redis registry + codegen |
519
- | Polyglot clients | JS/TS only | SDK per language | Any Redis client (3 commands) |
520
- | Infrastructure required | Redis | Temporal server + DB | Redis only |
521
- | Distributed locks needed | Yes, for ordering | Internal | None — gates are non-contending |
522
- | Service discovery | External | Built-in | Built-in (registry) |
523
- | Schema validation | No | Protobuf | Zod → JSON Schema |
435
+ # Generate TypeScript from the live registry
436
+ npx atomic-queues generate --classes -o ./src/generated # decorated class files
437
+ npx atomic-queues generate --ts -o ./src/generated # namespace interfaces + DispatchMap
438
+ npx atomic-queues generate --json-schema -o ./src/generated
439
+ ```
524
440
 
525
441
  ---
526
442
 
527
- ## Decorator Reference
443
+ ## Guarantees
528
444
 
529
- | Decorator | Purpose |
445
+ | Guarantee | Mechanism |
530
446
  |---|---|
531
- | `@EntityType('type')` | Route a message to an entity type |
532
- | `@QueueEntityId()` | Mark the property holding the entity ID |
533
- | `@QueueEntity('type', 'prop')` | Combined entity type + ID |
534
- | `@Schema(zodSchema)` | Attach a Zod schema for registry validation |
535
- | `@ReplySchema(zodSchema)` | Attach a reply schema for query codegen |
447
+ | FIFO per entity | One worker per entity:entityId with FIFO queue |
448
+ | Single-writer per entity | Only one worker exists across the cluster |
449
+ | At-most-once delivery | WAL: enqueued → dispatched → completed. Never re-executed after dispatch. |
450
+ | Fail if interrupted | Dispatched on crash → dead-lettered, source notified |
451
+ | Concurrent across entities | Event loop interleaves at await points |
452
+ | Durability | Redis WAL (dual-write: in-memory + Redis) |
453
+ | Auto-recovery | WAL recovery + cleanup run automatically on startup |
454
+ | Cluster coordination | Deterministic master topology with gRPC |
455
+ | Master failover | Heartbeat expiry → deterministic re-election + assignment table rebuild |
456
+ | Epoch fencing | Replicas reject commands from stale masters |
457
+ | No distributed locks | The worker IS the serialization — not a lock, not Redlock, not SET NX |
536
458
 
537
459
  ---
538
460
 
539
- ## Production Considerations
461
+ ## Design Philosophy
540
462
 
541
- ### Redis as a Single Point of Failure
463
+ AtomicQueues is pessimistic by design. At every decision point, it chooses safety over liveness:
542
464
 
543
- atomic-queues relies on a single Redis instance for all coordination: message logs, gates, the ready set, and the distributed registry. If that Redis instance becomes unavailable, all dispatch stops.
465
+ - **Interrupted?** Dead-letter, don't retry.
466
+ - **Redis down?** Reject new work, don't buffer.
467
+ - **Stale epoch?** Reject, don't process.
468
+ - **Master rebuilding?** Reject petitions, don't guess.
469
+ - **Unknown assignment?** Bounce and retry through the master, don't deliver speculatively.
544
470
 
545
- **Mitigations:**
546
-
547
- - **Redis Sentinel** — automatic failover to a replica. Gates (SET NX EX) and Lua scripts work identically after promotion. Brief message re-delivery is possible during failover but per-entity ordering is preserved.
548
- - **Redis Cluster** — horizontal scaling. Requires all keys for a given entity to land on the same shard. Use Redis hash tags (e.g. `{account:a-1}`) in your `keyPrefix` config to ensure co-location.
549
- - **Persistence** — enable AOF (`appendonly yes`) with `appendfsync everysec` at minimum. RDB snapshots alone risk losing the last seconds of enqueued messages on crash.
550
- - **Monitoring** — watch `connected_clients`, `used_memory`, and `instantaneous_ops_per_sec`. Set alerts on replication lag if using Sentinel.
551
-
552
- ### Retry Ordering
553
-
554
- Failed messages are re-enqueued with `RPUSH`, placing them at the back of the entity's log. This means other pending messages for the same entity are processed before the retry. If you need head-of-line retry (failed message retried immediately), implement a custom retry strategy.
471
+ The system refuses to operate under uncertainty rather than risk executing a message twice.
555
472
 
556
473
  ---
557
474
 
558
- ## Migrating from V1
559
-
560
- V2 is a full rewrite of the internals. BullMQ is removed. Workers are removed. The public API is largely preserved.
475
+ ## Migrating from v2
561
476
 
562
- **What stays the same**: `@EntityType`, `@QueueEntityId`, `@QueueEntity`, `queueBus.enqueue()`, `queueBus.forEntity()`, `queueBus.enqueueAndWait()`.
477
+ **Removed**: `executor`, `registry`, `gateTTL`, `ActorSystem`, `LogService`, `GateService`, `SchedulerService`, `ExecutorPoolService`, `ResultCollector`, `RegistryService`, `workers` config, `WorkerModule`.
563
478
 
564
- **What's removed**: `@WorkerProcessor`, `@JobHandler`, `@EntityScaler`, `@OnSpawnWorker`, `@OnTerminateWorker`, `@GetActiveEntities`, `@GetDesiredWorkerCount`, `.forProcessor()`. All worker and scaling concepts are gone.
479
+ **Added**: `EntityWorker`, `EntityWorkerManager`, `MasterCoordinator`, `workerIdleTimeout` in entity config.
565
480
 
566
- **What's new**: `@Schema`, `@ReplySchema`, `ActorSystem`, `RegistryService`, distributed registry, runtime introspection (`queueBus.introspect()`), cross-service string-based API, `Reply<T>` phantom type, class codegen CLI (`--classes`), config-driven timeouts.
481
+ **Unchanged**: All decorators, `QueueBus` public API, CLI generators.
567
482
 
568
- **Migration steps**: (1) remove all `@WorkerProcessor` classes — configure entity defaults in module config and use `@CommandHandler`/`@QueryHandler`; (2) remove all scaling decorators; (3) run the data migration script to drain in-flight BullMQ jobs to the new log format; (4) remove `bullmq` and `@nestjs/bullmq` from your dependencies.
483
+ **Migration**: Remove `executor`/`registry`/`workers` from config. That's it. Workers are now internal.
569
484
 
570
485
  ---
571
486