@wazir-dev/cli 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (629)
  1. package/AGENTS.md +111 -0
  2. package/CHANGELOG.md +14 -0
  3. package/CONTRIBUTING.md +101 -0
  4. package/LICENSE +21 -0
  5. package/README.md +314 -0
  6. package/assets/composition-engine.mmd +34 -0
  7. package/assets/demo-script.sh +17 -0
  8. package/assets/logo-dark.svg +14 -0
  9. package/assets/logo.svg +14 -0
  10. package/assets/pipeline.mmd +39 -0
  11. package/assets/record-demo.sh +51 -0
  12. package/docs/README.md +51 -0
  13. package/docs/adapters/context-mode.md +60 -0
  14. package/docs/concepts/architecture.md +87 -0
  15. package/docs/concepts/artifact-model.md +60 -0
  16. package/docs/concepts/composition-engine.md +36 -0
  17. package/docs/concepts/indexing-and-recall.md +160 -0
  18. package/docs/concepts/observability.md +41 -0
  19. package/docs/concepts/roles-and-workflows.md +59 -0
  20. package/docs/concepts/terminology-policy.md +27 -0
  21. package/docs/getting-started/01-installation.md +78 -0
  22. package/docs/getting-started/02-first-run.md +102 -0
  23. package/docs/getting-started/03-adding-to-project.md +15 -0
  24. package/docs/getting-started/04-host-setup.md +15 -0
  25. package/docs/guides/ci-integration.md +15 -0
  26. package/docs/guides/creating-skills.md +15 -0
  27. package/docs/guides/expertise-module-authoring.md +15 -0
  28. package/docs/guides/hook-development.md +15 -0
  29. package/docs/guides/memory-and-learnings.md +34 -0
  30. package/docs/guides/multi-host-export.md +15 -0
  31. package/docs/guides/troubleshooting.md +101 -0
  32. package/docs/guides/writing-custom-roles.md +15 -0
  33. package/docs/plans/2026-03-15-cli-pipeline-integration-design.md +592 -0
  34. package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +598 -0
  35. package/docs/plans/2026-03-15-docs-enforcement-plan.md +238 -0
  36. package/docs/readmes/INDEX.md +99 -0
  37. package/docs/readmes/features/expertise/README.md +171 -0
  38. package/docs/readmes/features/exports/README.md +222 -0
  39. package/docs/readmes/features/hooks/README.md +103 -0
  40. package/docs/readmes/features/hooks/loop-cap-guard.md +133 -0
  41. package/docs/readmes/features/hooks/post-tool-capture.md +121 -0
  42. package/docs/readmes/features/hooks/post-tool-lint.md +130 -0
  43. package/docs/readmes/features/hooks/pre-compact-summary.md +122 -0
  44. package/docs/readmes/features/hooks/pre-tool-capture-route.md +100 -0
  45. package/docs/readmes/features/hooks/protected-path-write-guard.md +128 -0
  46. package/docs/readmes/features/hooks/session-start.md +119 -0
  47. package/docs/readmes/features/hooks/stop-handoff-harvest.md +125 -0
  48. package/docs/readmes/features/roles/README.md +157 -0
  49. package/docs/readmes/features/roles/clarifier.md +152 -0
  50. package/docs/readmes/features/roles/content-author.md +190 -0
  51. package/docs/readmes/features/roles/designer.md +193 -0
  52. package/docs/readmes/features/roles/executor.md +184 -0
  53. package/docs/readmes/features/roles/learner.md +210 -0
  54. package/docs/readmes/features/roles/planner.md +182 -0
  55. package/docs/readmes/features/roles/researcher.md +164 -0
  56. package/docs/readmes/features/roles/reviewer.md +184 -0
  57. package/docs/readmes/features/roles/specifier.md +162 -0
  58. package/docs/readmes/features/roles/verifier.md +215 -0
  59. package/docs/readmes/features/schemas/README.md +178 -0
  60. package/docs/readmes/features/skills/README.md +63 -0
  61. package/docs/readmes/features/skills/brainstorming.md +96 -0
  62. package/docs/readmes/features/skills/debugging.md +148 -0
  63. package/docs/readmes/features/skills/design.md +120 -0
  64. package/docs/readmes/features/skills/prepare-next.md +109 -0
  65. package/docs/readmes/features/skills/run-audit.md +159 -0
  66. package/docs/readmes/features/skills/scan-project.md +109 -0
  67. package/docs/readmes/features/skills/self-audit.md +176 -0
  68. package/docs/readmes/features/skills/tdd.md +137 -0
  69. package/docs/readmes/features/skills/using-skills.md +92 -0
  70. package/docs/readmes/features/skills/verification.md +120 -0
  71. package/docs/readmes/features/skills/writing-plans.md +104 -0
  72. package/docs/readmes/features/tooling/README.md +320 -0
  73. package/docs/readmes/features/workflows/README.md +186 -0
  74. package/docs/readmes/features/workflows/author.md +181 -0
  75. package/docs/readmes/features/workflows/clarify.md +154 -0
  76. package/docs/readmes/features/workflows/design-review.md +171 -0
  77. package/docs/readmes/features/workflows/design.md +169 -0
  78. package/docs/readmes/features/workflows/discover.md +162 -0
  79. package/docs/readmes/features/workflows/execute.md +173 -0
  80. package/docs/readmes/features/workflows/learn.md +167 -0
  81. package/docs/readmes/features/workflows/plan-review.md +165 -0
  82. package/docs/readmes/features/workflows/plan.md +170 -0
  83. package/docs/readmes/features/workflows/prepare-next.md +167 -0
  84. package/docs/readmes/features/workflows/review.md +169 -0
  85. package/docs/readmes/features/workflows/run-audit.md +191 -0
  86. package/docs/readmes/features/workflows/spec-challenge.md +159 -0
  87. package/docs/readmes/features/workflows/specify.md +160 -0
  88. package/docs/readmes/features/workflows/verify.md +177 -0
  89. package/docs/readmes/packages/README.md +50 -0
  90. package/docs/readmes/packages/ajv.md +117 -0
  91. package/docs/readmes/packages/context-mode.md +118 -0
  92. package/docs/readmes/packages/gray-matter.md +116 -0
  93. package/docs/readmes/packages/node-test.md +137 -0
  94. package/docs/readmes/packages/yaml.md +112 -0
  95. package/docs/reference/configuration-reference.md +159 -0
  96. package/docs/reference/expertise-index.md +52 -0
  97. package/docs/reference/git-flow.md +43 -0
  98. package/docs/reference/hooks.md +87 -0
  99. package/docs/reference/host-exports.md +50 -0
  100. package/docs/reference/launch-checklist.md +172 -0
  101. package/docs/reference/marketplace-listings.md +76 -0
  102. package/docs/reference/release-process.md +34 -0
  103. package/docs/reference/roles-reference.md +77 -0
  104. package/docs/reference/skills.md +33 -0
  105. package/docs/reference/templates.md +29 -0
  106. package/docs/reference/tooling-cli.md +94 -0
  107. package/docs/truth-claims.yaml +222 -0
  108. package/expertise/PROGRESS.md +63 -0
  109. package/expertise/README.md +18 -0
  110. package/expertise/antipatterns/PROGRESS.md +56 -0
  111. package/expertise/antipatterns/backend/api-design-antipatterns.md +1271 -0
  112. package/expertise/antipatterns/backend/auth-antipatterns.md +1195 -0
  113. package/expertise/antipatterns/backend/caching-antipatterns.md +622 -0
  114. package/expertise/antipatterns/backend/database-antipatterns.md +1038 -0
  115. package/expertise/antipatterns/backend/index.md +24 -0
  116. package/expertise/antipatterns/backend/microservices-antipatterns.md +850 -0
  117. package/expertise/antipatterns/code/architecture-antipatterns.md +919 -0
  118. package/expertise/antipatterns/code/async-antipatterns.md +622 -0
  119. package/expertise/antipatterns/code/code-smells.md +1186 -0
  120. package/expertise/antipatterns/code/dependency-antipatterns.md +1209 -0
  121. package/expertise/antipatterns/code/error-handling-antipatterns.md +1360 -0
  122. package/expertise/antipatterns/code/index.md +27 -0
  123. package/expertise/antipatterns/code/naming-and-abstraction.md +1118 -0
  124. package/expertise/antipatterns/code/state-management-antipatterns.md +1076 -0
  125. package/expertise/antipatterns/code/testing-antipatterns.md +1053 -0
  126. package/expertise/antipatterns/design/accessibility-antipatterns.md +1136 -0
  127. package/expertise/antipatterns/design/dark-patterns.md +1121 -0
  128. package/expertise/antipatterns/design/index.md +22 -0
  129. package/expertise/antipatterns/design/ui-antipatterns.md +1202 -0
  130. package/expertise/antipatterns/design/ux-antipatterns.md +680 -0
  131. package/expertise/antipatterns/frontend/css-layout-antipatterns.md +691 -0
  132. package/expertise/antipatterns/frontend/flutter-antipatterns.md +1827 -0
  133. package/expertise/antipatterns/frontend/index.md +23 -0
  134. package/expertise/antipatterns/frontend/mobile-antipatterns.md +573 -0
  135. package/expertise/antipatterns/frontend/react-antipatterns.md +1128 -0
  136. package/expertise/antipatterns/frontend/spa-antipatterns.md +1235 -0
  137. package/expertise/antipatterns/index.md +31 -0
  138. package/expertise/antipatterns/performance/index.md +20 -0
  139. package/expertise/antipatterns/performance/performance-antipatterns.md +1013 -0
  140. package/expertise/antipatterns/performance/premature-optimization.md +623 -0
  141. package/expertise/antipatterns/performance/scaling-antipatterns.md +785 -0
  142. package/expertise/antipatterns/process/ai-coding-antipatterns.md +853 -0
  143. package/expertise/antipatterns/process/code-review-antipatterns.md +656 -0
  144. package/expertise/antipatterns/process/deployment-antipatterns.md +920 -0
  145. package/expertise/antipatterns/process/index.md +23 -0
  146. package/expertise/antipatterns/process/technical-debt-antipatterns.md +647 -0
  147. package/expertise/antipatterns/security/index.md +20 -0
  148. package/expertise/antipatterns/security/secrets-antipatterns.md +849 -0
  149. package/expertise/antipatterns/security/security-theater.md +843 -0
  150. package/expertise/antipatterns/security/vulnerability-patterns.md +801 -0
  151. package/expertise/architecture/PROGRESS.md +70 -0
  152. package/expertise/architecture/data/caching-architecture.md +671 -0
  153. package/expertise/architecture/data/data-consistency.md +574 -0
  154. package/expertise/architecture/data/data-modeling.md +536 -0
  155. package/expertise/architecture/data/event-streams-and-queues.md +634 -0
  156. package/expertise/architecture/data/index.md +25 -0
  157. package/expertise/architecture/data/search-architecture.md +663 -0
  158. package/expertise/architecture/data/sql-vs-nosql.md +708 -0
  159. package/expertise/architecture/decisions/architecture-decision-records.md +640 -0
  160. package/expertise/architecture/decisions/build-vs-buy.md +616 -0
  161. package/expertise/architecture/decisions/index.md +23 -0
  162. package/expertise/architecture/decisions/monolith-to-microservices.md +790 -0
  163. package/expertise/architecture/decisions/technology-selection.md +616 -0
  164. package/expertise/architecture/distributed/cap-theorem-and-tradeoffs.md +800 -0
  165. package/expertise/architecture/distributed/circuit-breaker-bulkhead.md +741 -0
  166. package/expertise/architecture/distributed/consensus-and-coordination.md +796 -0
  167. package/expertise/architecture/distributed/distributed-systems-fundamentals.md +564 -0
  168. package/expertise/architecture/distributed/idempotency-and-retry.md +796 -0
  169. package/expertise/architecture/distributed/index.md +25 -0
  170. package/expertise/architecture/distributed/saga-pattern.md +797 -0
  171. package/expertise/architecture/foundations/architectural-thinking.md +460 -0
  172. package/expertise/architecture/foundations/coupling-and-cohesion.md +770 -0
  173. package/expertise/architecture/foundations/design-principles-solid.md +649 -0
  174. package/expertise/architecture/foundations/domain-driven-design.md +719 -0
  175. package/expertise/architecture/foundations/index.md +25 -0
  176. package/expertise/architecture/foundations/separation-of-concerns.md +472 -0
  177. package/expertise/architecture/foundations/twelve-factor-app.md +797 -0
  178. package/expertise/architecture/index.md +34 -0
  179. package/expertise/architecture/integration/api-design-graphql.md +638 -0
  180. package/expertise/architecture/integration/api-design-grpc.md +804 -0
  181. package/expertise/architecture/integration/api-design-rest.md +892 -0
  182. package/expertise/architecture/integration/index.md +25 -0
  183. package/expertise/architecture/integration/third-party-integration.md +795 -0
  184. package/expertise/architecture/integration/webhooks-and-callbacks.md +1152 -0
  185. package/expertise/architecture/integration/websockets-realtime.md +791 -0
  186. package/expertise/architecture/mobile-architecture/index.md +22 -0
  187. package/expertise/architecture/mobile-architecture/mobile-app-architecture.md +780 -0
  188. package/expertise/architecture/mobile-architecture/mobile-backend-for-frontend.md +670 -0
  189. package/expertise/architecture/mobile-architecture/offline-first.md +719 -0
  190. package/expertise/architecture/mobile-architecture/push-and-sync.md +782 -0
  191. package/expertise/architecture/patterns/cqrs-event-sourcing.md +717 -0
  192. package/expertise/architecture/patterns/event-driven.md +797 -0
  193. package/expertise/architecture/patterns/hexagonal-clean-architecture.md +870 -0
  194. package/expertise/architecture/patterns/index.md +27 -0
  195. package/expertise/architecture/patterns/layered-architecture.md +736 -0
  196. package/expertise/architecture/patterns/microservices.md +753 -0
  197. package/expertise/architecture/patterns/modular-monolith.md +692 -0
  198. package/expertise/architecture/patterns/monolith.md +626 -0
  199. package/expertise/architecture/patterns/plugin-architecture.md +735 -0
  200. package/expertise/architecture/patterns/serverless.md +780 -0
  201. package/expertise/architecture/scaling/database-scaling.md +615 -0
  202. package/expertise/architecture/scaling/feature-flags-and-rollouts.md +757 -0
  203. package/expertise/architecture/scaling/horizontal-vs-vertical.md +606 -0
  204. package/expertise/architecture/scaling/index.md +24 -0
  205. package/expertise/architecture/scaling/multi-tenancy.md +800 -0
  206. package/expertise/architecture/scaling/stateless-design.md +787 -0
  207. package/expertise/backend/embedded-firmware.md +625 -0
  208. package/expertise/backend/go.md +853 -0
  209. package/expertise/backend/index.md +24 -0
  210. package/expertise/backend/java-spring.md +448 -0
  211. package/expertise/backend/node-typescript.md +625 -0
  212. package/expertise/backend/python-fastapi.md +724 -0
  213. package/expertise/backend/rust.md +458 -0
  214. package/expertise/backend/solidity.md +711 -0
  215. package/expertise/composition-map.yaml +443 -0
  216. package/expertise/content/foundations/content-modeling.md +395 -0
  217. package/expertise/content/foundations/editorial-standards.md +449 -0
  218. package/expertise/content/foundations/index.md +24 -0
  219. package/expertise/content/foundations/microcopy.md +455 -0
  220. package/expertise/content/foundations/terminology-governance.md +509 -0
  221. package/expertise/content/index.md +34 -0
  222. package/expertise/content/patterns/accessibility-copy.md +518 -0
  223. package/expertise/content/patterns/index.md +24 -0
  224. package/expertise/content/patterns/notification-content.md +433 -0
  225. package/expertise/content/patterns/sample-content.md +486 -0
  226. package/expertise/content/patterns/state-copy.md +439 -0
  227. package/expertise/design/PROGRESS.md +58 -0
  228. package/expertise/design/disciplines/dark-mode-theming.md +577 -0
  229. package/expertise/design/disciplines/design-systems.md +595 -0
  230. package/expertise/design/disciplines/index.md +25 -0
  231. package/expertise/design/disciplines/information-architecture.md +800 -0
  232. package/expertise/design/disciplines/interaction-design.md +788 -0
  233. package/expertise/design/disciplines/responsive-design.md +552 -0
  234. package/expertise/design/disciplines/usability-testing.md +516 -0
  235. package/expertise/design/disciplines/user-research.md +792 -0
  236. package/expertise/design/foundations/accessibility-design.md +796 -0
  237. package/expertise/design/foundations/color-theory.md +797 -0
  238. package/expertise/design/foundations/iconography.md +795 -0
  239. package/expertise/design/foundations/index.md +26 -0
  240. package/expertise/design/foundations/motion-and-animation.md +653 -0
  241. package/expertise/design/foundations/rtl-design.md +585 -0
  242. package/expertise/design/foundations/spacing-and-layout.md +607 -0
  243. package/expertise/design/foundations/typography.md +800 -0
  244. package/expertise/design/foundations/visual-hierarchy.md +761 -0
  245. package/expertise/design/index.md +32 -0
  246. package/expertise/design/patterns/authentication-flows.md +474 -0
  247. package/expertise/design/patterns/content-consumption.md +789 -0
  248. package/expertise/design/patterns/data-display.md +618 -0
  249. package/expertise/design/patterns/e-commerce.md +1494 -0
  250. package/expertise/design/patterns/feedback-and-states.md +642 -0
  251. package/expertise/design/patterns/forms-and-input.md +819 -0
  252. package/expertise/design/patterns/gamification.md +801 -0
  253. package/expertise/design/patterns/index.md +31 -0
  254. package/expertise/design/patterns/microinteractions.md +449 -0
  255. package/expertise/design/patterns/navigation.md +800 -0
  256. package/expertise/design/patterns/notifications.md +705 -0
  257. package/expertise/design/patterns/onboarding.md +700 -0
  258. package/expertise/design/patterns/search-and-filter.md +601 -0
  259. package/expertise/design/patterns/settings-and-preferences.md +768 -0
  260. package/expertise/design/patterns/social-and-community.md +748 -0
  261. package/expertise/design/platforms/desktop-native.md +612 -0
  262. package/expertise/design/platforms/index.md +25 -0
  263. package/expertise/design/platforms/mobile-android.md +825 -0
  264. package/expertise/design/platforms/mobile-cross-platform.md +983 -0
  265. package/expertise/design/platforms/mobile-ios.md +699 -0
  266. package/expertise/design/platforms/tablet.md +794 -0
  267. package/expertise/design/platforms/web-dashboard.md +790 -0
  268. package/expertise/design/platforms/web-responsive.md +550 -0
  269. package/expertise/design/psychology/behavioral-nudges.md +449 -0
  270. package/expertise/design/psychology/cognitive-load.md +1191 -0
  271. package/expertise/design/psychology/error-psychology.md +778 -0
  272. package/expertise/design/psychology/index.md +22 -0
  273. package/expertise/design/psychology/persuasive-design.md +736 -0
  274. package/expertise/design/psychology/user-mental-models.md +623 -0
  275. package/expertise/design/tooling/open-pencil.md +266 -0
  276. package/expertise/frontend/angular.md +1073 -0
  277. package/expertise/frontend/desktop-electron.md +546 -0
  278. package/expertise/frontend/flutter.md +782 -0
  279. package/expertise/frontend/index.md +27 -0
  280. package/expertise/frontend/native-android.md +409 -0
  281. package/expertise/frontend/native-ios.md +490 -0
  282. package/expertise/frontend/react-native.md +1160 -0
  283. package/expertise/frontend/react.md +808 -0
  284. package/expertise/frontend/vue.md +1089 -0
  285. package/expertise/humanize/domain-rules-code.md +79 -0
  286. package/expertise/humanize/domain-rules-content.md +67 -0
  287. package/expertise/humanize/domain-rules-technical-docs.md +56 -0
  288. package/expertise/humanize/index.md +35 -0
  289. package/expertise/humanize/self-audit-checklist.md +87 -0
  290. package/expertise/humanize/sentence-patterns.md +218 -0
  291. package/expertise/humanize/vocabulary-blacklist.md +105 -0
  292. package/expertise/i18n/PROGRESS.md +65 -0
  293. package/expertise/i18n/advanced/accessibility-and-i18n.md +28 -0
  294. package/expertise/i18n/advanced/bidirectional-text-algorithm.md +38 -0
  295. package/expertise/i18n/advanced/complex-scripts.md +30 -0
  296. package/expertise/i18n/advanced/performance-and-i18n.md +27 -0
  297. package/expertise/i18n/advanced/testing-i18n.md +28 -0
  298. package/expertise/i18n/content/content-adaptation.md +23 -0
  299. package/expertise/i18n/content/locale-specific-formatting.md +23 -0
  300. package/expertise/i18n/content/machine-translation-integration.md +28 -0
  301. package/expertise/i18n/content/translation-management.md +29 -0
  302. package/expertise/i18n/foundations/date-time-calendars.md +67 -0
  303. package/expertise/i18n/foundations/i18n-architecture.md +272 -0
  304. package/expertise/i18n/foundations/locale-and-language-tags.md +79 -0
  305. package/expertise/i18n/foundations/numbers-currency-units.md +61 -0
  306. package/expertise/i18n/foundations/pluralization-and-gender.md +109 -0
  307. package/expertise/i18n/foundations/string-externalization.md +236 -0
  308. package/expertise/i18n/foundations/text-direction-bidi.md +241 -0
  309. package/expertise/i18n/foundations/unicode-and-encoding.md +86 -0
  310. package/expertise/i18n/index.md +38 -0
  311. package/expertise/i18n/platform/backend-i18n.md +31 -0
  312. package/expertise/i18n/platform/flutter-i18n.md +148 -0
  313. package/expertise/i18n/platform/native-android-i18n.md +36 -0
  314. package/expertise/i18n/platform/native-ios-i18n.md +36 -0
  315. package/expertise/i18n/platform/react-i18n.md +103 -0
  316. package/expertise/i18n/platform/web-css-i18n.md +81 -0
  317. package/expertise/i18n/rtl/arabic-specific.md +175 -0
  318. package/expertise/i18n/rtl/hebrew-specific.md +149 -0
  319. package/expertise/i18n/rtl/rtl-animations-and-transitions.md +111 -0
  320. package/expertise/i18n/rtl/rtl-forms-and-input.md +161 -0
  321. package/expertise/i18n/rtl/rtl-fundamentals.md +211 -0
  322. package/expertise/i18n/rtl/rtl-icons-and-images.md +181 -0
  323. package/expertise/i18n/rtl/rtl-layout-mirroring.md +252 -0
  324. package/expertise/i18n/rtl/rtl-navigation-and-gestures.md +107 -0
  325. package/expertise/i18n/rtl/rtl-testing-and-qa.md +147 -0
  326. package/expertise/i18n/rtl/rtl-typography.md +160 -0
  327. package/expertise/index.md +113 -0
  328. package/expertise/index.yaml +216 -0
  329. package/expertise/infrastructure/cloud-aws.md +597 -0
  330. package/expertise/infrastructure/cloud-gcp.md +599 -0
  331. package/expertise/infrastructure/cybersecurity.md +816 -0
  332. package/expertise/infrastructure/database-mongodb.md +447 -0
  333. package/expertise/infrastructure/database-postgres.md +400 -0
  334. package/expertise/infrastructure/devops-cicd.md +787 -0
  335. package/expertise/infrastructure/index.md +27 -0
  336. package/expertise/performance/PROGRESS.md +50 -0
  337. package/expertise/performance/backend/api-latency.md +1204 -0
  338. package/expertise/performance/backend/background-jobs.md +506 -0
  339. package/expertise/performance/backend/connection-pooling.md +1209 -0
  340. package/expertise/performance/backend/database-query-optimization.md +515 -0
  341. package/expertise/performance/backend/index.md +23 -0
  342. package/expertise/performance/backend/rate-limiting-and-throttling.md +971 -0
  343. package/expertise/performance/foundations/algorithmic-complexity.md +954 -0
  344. package/expertise/performance/foundations/caching-strategies.md +489 -0
  345. package/expertise/performance/foundations/concurrency-and-parallelism.md +847 -0
  346. package/expertise/performance/foundations/index.md +24 -0
  347. package/expertise/performance/foundations/measuring-and-profiling.md +440 -0
  348. package/expertise/performance/foundations/memory-management.md +964 -0
  349. package/expertise/performance/foundations/performance-budgets.md +1314 -0
  350. package/expertise/performance/index.md +31 -0
  351. package/expertise/performance/infrastructure/auto-scaling.md +1059 -0
  352. package/expertise/performance/infrastructure/cdn-and-edge.md +1081 -0
  353. package/expertise/performance/infrastructure/index.md +22 -0
  354. package/expertise/performance/infrastructure/load-balancing.md +1081 -0
  355. package/expertise/performance/infrastructure/observability.md +1079 -0
  356. package/expertise/performance/mobile/index.md +23 -0
  357. package/expertise/performance/mobile/mobile-animations.md +544 -0
  358. package/expertise/performance/mobile/mobile-memory-battery.md +416 -0
  359. package/expertise/performance/mobile/mobile-network.md +452 -0
  360. package/expertise/performance/mobile/mobile-rendering.md +599 -0
  361. package/expertise/performance/mobile/mobile-startup-time.md +505 -0
  362. package/expertise/performance/platform-specific/flutter-performance.md +647 -0
  363. package/expertise/performance/platform-specific/index.md +22 -0
  364. package/expertise/performance/platform-specific/node-performance.md +1307 -0
  365. package/expertise/performance/platform-specific/postgres-performance.md +1366 -0
  366. package/expertise/performance/platform-specific/react-performance.md +1403 -0
  367. package/expertise/performance/web/bundle-optimization.md +1239 -0
  368. package/expertise/performance/web/image-and-media.md +636 -0
  369. package/expertise/performance/web/index.md +24 -0
  370. package/expertise/performance/web/network-optimization.md +1133 -0
  371. package/expertise/performance/web/rendering-performance.md +1098 -0
  372. package/expertise/performance/web/ssr-and-hydration.md +918 -0
  373. package/expertise/performance/web/web-vitals.md +1374 -0
  374. package/expertise/quality/accessibility.md +985 -0
  375. package/expertise/quality/evidence-based-verification.md +499 -0
  376. package/expertise/quality/index.md +24 -0
  377. package/expertise/quality/ml-model-audit.md +614 -0
  378. package/expertise/quality/performance.md +600 -0
  379. package/expertise/quality/testing-api.md +891 -0
  380. package/expertise/quality/testing-mobile.md +496 -0
  381. package/expertise/quality/testing-web.md +849 -0
  382. package/expertise/security/PROGRESS.md +54 -0
  383. package/expertise/security/agentic-identity.md +540 -0
  384. package/expertise/security/compliance-frameworks.md +601 -0
  385. package/expertise/security/data/data-encryption.md +364 -0
  386. package/expertise/security/data/data-privacy-gdpr.md +692 -0
  387. package/expertise/security/data/database-security.md +1171 -0
  388. package/expertise/security/data/index.md +22 -0
  389. package/expertise/security/data/pii-handling.md +531 -0
  390. package/expertise/security/foundations/authentication.md +1041 -0
  391. package/expertise/security/foundations/authorization.md +603 -0
  392. package/expertise/security/foundations/cryptography.md +1001 -0
  393. package/expertise/security/foundations/index.md +25 -0
  394. package/expertise/security/foundations/owasp-top-10.md +1354 -0
  395. package/expertise/security/foundations/secrets-management.md +1217 -0
  396. package/expertise/security/foundations/secure-sdlc.md +700 -0
  397. package/expertise/security/foundations/supply-chain-security.md +698 -0
  398. package/expertise/security/index.md +31 -0
  399. package/expertise/security/infrastructure/cloud-security-aws.md +1296 -0
  400. package/expertise/security/infrastructure/cloud-security-gcp.md +1376 -0
  401. package/expertise/security/infrastructure/container-security.md +721 -0
  402. package/expertise/security/infrastructure/incident-response.md +1295 -0
  403. package/expertise/security/infrastructure/index.md +24 -0
  404. package/expertise/security/infrastructure/logging-and-monitoring.md +1618 -0
  405. package/expertise/security/infrastructure/network-security.md +1337 -0
  406. package/expertise/security/mobile/index.md +23 -0
  407. package/expertise/security/mobile/mobile-android-security.md +1218 -0
  408. package/expertise/security/mobile/mobile-binary-protection.md +1229 -0
  409. package/expertise/security/mobile/mobile-data-storage.md +1265 -0
  410. package/expertise/security/mobile/mobile-ios-security.md +1401 -0
  411. package/expertise/security/mobile/mobile-network-security.md +1520 -0
  412. package/expertise/security/smart-contract-security.md +594 -0
  413. package/expertise/security/testing/index.md +22 -0
  414. package/expertise/security/testing/penetration-testing.md +1258 -0
  415. package/expertise/security/testing/security-code-review.md +1765 -0
  416. package/expertise/security/testing/threat-modeling.md +1074 -0
  417. package/expertise/security/testing/vulnerability-scanning.md +1062 -0
  418. package/expertise/security/web/api-security.md +586 -0
  419. package/expertise/security/web/cors-and-headers.md +433 -0
  420. package/expertise/security/web/csrf.md +562 -0
  421. package/expertise/security/web/file-upload.md +1477 -0
  422. package/expertise/security/web/index.md +25 -0
  423. package/expertise/security/web/injection.md +1375 -0
  424. package/expertise/security/web/session-management.md +1101 -0
  425. package/expertise/security/web/xss.md +1158 -0
  426. package/exports/README.md +17 -0
  427. package/exports/hosts/claude/.claude/agents/clarifier.md +42 -0
  428. package/exports/hosts/claude/.claude/agents/content-author.md +63 -0
  429. package/exports/hosts/claude/.claude/agents/designer.md +55 -0
  430. package/exports/hosts/claude/.claude/agents/executor.md +55 -0
  431. package/exports/hosts/claude/.claude/agents/learner.md +51 -0
  432. package/exports/hosts/claude/.claude/agents/planner.md +53 -0
  433. package/exports/hosts/claude/.claude/agents/researcher.md +43 -0
  434. package/exports/hosts/claude/.claude/agents/reviewer.md +54 -0
  435. package/exports/hosts/claude/.claude/agents/specifier.md +47 -0
  436. package/exports/hosts/claude/.claude/agents/verifier.md +71 -0
  437. package/exports/hosts/claude/.claude/commands/author.md +42 -0
  438. package/exports/hosts/claude/.claude/commands/clarify.md +38 -0
  439. package/exports/hosts/claude/.claude/commands/design-review.md +46 -0
  440. package/exports/hosts/claude/.claude/commands/design.md +44 -0
  441. package/exports/hosts/claude/.claude/commands/discover.md +37 -0
  442. package/exports/hosts/claude/.claude/commands/execute.md +48 -0
  443. package/exports/hosts/claude/.claude/commands/learn.md +38 -0
  444. package/exports/hosts/claude/.claude/commands/plan-review.md +42 -0
  445. package/exports/hosts/claude/.claude/commands/plan.md +39 -0
  446. package/exports/hosts/claude/.claude/commands/prepare-next.md +37 -0
  447. package/exports/hosts/claude/.claude/commands/review.md +40 -0
  448. package/exports/hosts/claude/.claude/commands/run-audit.md +41 -0
  449. package/exports/hosts/claude/.claude/commands/spec-challenge.md +41 -0
  450. package/exports/hosts/claude/.claude/commands/specify.md +38 -0
  451. package/exports/hosts/claude/.claude/commands/verify.md +37 -0
  452. package/exports/hosts/claude/.claude/settings.json +34 -0
  453. package/exports/hosts/claude/CLAUDE.md +19 -0
  454. package/exports/hosts/claude/export.manifest.json +38 -0
  455. package/exports/hosts/claude/host-package.json +67 -0
  456. package/exports/hosts/codex/AGENTS.md +19 -0
  457. package/exports/hosts/codex/export.manifest.json +38 -0
  458. package/exports/hosts/codex/host-package.json +41 -0
  459. package/exports/hosts/cursor/.cursor/hooks.json +16 -0
  460. package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +19 -0
  461. package/exports/hosts/cursor/export.manifest.json +38 -0
  462. package/exports/hosts/cursor/host-package.json +42 -0
  463. package/exports/hosts/gemini/GEMINI.md +19 -0
  464. package/exports/hosts/gemini/export.manifest.json +38 -0
  465. package/exports/hosts/gemini/host-package.json +41 -0
  466. package/hooks/README.md +18 -0
  467. package/hooks/definitions/loop_cap_guard.yaml +21 -0
  468. package/hooks/definitions/post_tool_capture.yaml +24 -0
  469. package/hooks/definitions/pre_compact_summary.yaml +19 -0
  470. package/hooks/definitions/pre_tool_capture_route.yaml +19 -0
  471. package/hooks/definitions/protected_path_write_guard.yaml +19 -0
  472. package/hooks/definitions/session_start.yaml +19 -0
  473. package/hooks/definitions/stop_handoff_harvest.yaml +20 -0
  474. package/hooks/loop-cap-guard +17 -0
  475. package/hooks/post-tool-lint +36 -0
  476. package/hooks/protected-path-write-guard +17 -0
  477. package/hooks/session-start +41 -0
  478. package/llms-full.txt +2355 -0
  479. package/llms.txt +43 -0
  480. package/package.json +79 -0
  481. package/roles/README.md +20 -0
  482. package/roles/clarifier.md +42 -0
  483. package/roles/content-author.md +63 -0
  484. package/roles/designer.md +55 -0
  485. package/roles/executor.md +55 -0
  486. package/roles/learner.md +51 -0
  487. package/roles/planner.md +53 -0
  488. package/roles/researcher.md +43 -0
  489. package/roles/reviewer.md +54 -0
  490. package/roles/specifier.md +47 -0
  491. package/roles/verifier.md +71 -0
  492. package/schemas/README.md +24 -0
  493. package/schemas/accepted-learning.schema.json +20 -0
  494. package/schemas/author-artifact.schema.json +156 -0
  495. package/schemas/clarification.schema.json +19 -0
  496. package/schemas/design-artifact.schema.json +80 -0
  497. package/schemas/docs-claim.schema.json +18 -0
  498. package/schemas/export-manifest.schema.json +20 -0
  499. package/schemas/hook.schema.json +67 -0
  500. package/schemas/host-export-package.schema.json +18 -0
  501. package/schemas/implementation-plan.schema.json +19 -0
  502. package/schemas/proposed-learning.schema.json +19 -0
  503. package/schemas/research.schema.json +18 -0
  504. package/schemas/review.schema.json +29 -0
  505. package/schemas/run-manifest.schema.json +18 -0
  506. package/schemas/spec-challenge.schema.json +18 -0
  507. package/schemas/spec.schema.json +20 -0
  508. package/schemas/usage.schema.json +102 -0
  509. package/schemas/verification-proof.schema.json +29 -0
  510. package/schemas/wazir-manifest.schema.json +173 -0
  511. package/skills/README.md +40 -0
  512. package/skills/brainstorming/SKILL.md +77 -0
  513. package/skills/debugging/SKILL.md +50 -0
  514. package/skills/design/SKILL.md +61 -0
  515. package/skills/dispatching-parallel-agents/SKILL.md +128 -0
  516. package/skills/executing-plans/SKILL.md +70 -0
  517. package/skills/finishing-a-development-branch/SKILL.md +169 -0
  518. package/skills/humanize/SKILL.md +123 -0
  519. package/skills/init-pipeline/SKILL.md +124 -0
  520. package/skills/prepare-next/SKILL.md +20 -0
  521. package/skills/receiving-code-review/SKILL.md +123 -0
  522. package/skills/requesting-code-review/SKILL.md +105 -0
  523. package/skills/requesting-code-review/code-reviewer.md +108 -0
  524. package/skills/run-audit/SKILL.md +197 -0
  525. package/skills/scan-project/SKILL.md +41 -0
  526. package/skills/self-audit/SKILL.md +153 -0
  527. package/skills/subagent-driven-development/SKILL.md +154 -0
  528. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
  529. package/skills/subagent-driven-development/implementer-prompt.md +102 -0
  530. package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
  531. package/skills/tdd/SKILL.md +23 -0
  532. package/skills/using-git-worktrees/SKILL.md +163 -0
  533. package/skills/using-skills/SKILL.md +95 -0
  534. package/skills/verification/SKILL.md +22 -0
  535. package/skills/wazir/SKILL.md +463 -0
  536. package/skills/writing-plans/SKILL.md +30 -0
  537. package/skills/writing-skills/SKILL.md +157 -0
  538. package/skills/writing-skills/anthropic-best-practices.md +122 -0
  539. package/skills/writing-skills/persuasion-principles.md +50 -0
  540. package/templates/README.md +20 -0
  541. package/templates/artifacts/README.md +10 -0
  542. package/templates/artifacts/accepted-learning.md +19 -0
  543. package/templates/artifacts/accepted-learning.template.json +12 -0
  544. package/templates/artifacts/author.md +74 -0
  545. package/templates/artifacts/author.template.json +19 -0
  546. package/templates/artifacts/clarification.md +21 -0
  547. package/templates/artifacts/clarification.template.json +12 -0
  548. package/templates/artifacts/execute-notes.md +19 -0
  549. package/templates/artifacts/implementation-plan.md +21 -0
  550. package/templates/artifacts/implementation-plan.template.json +11 -0
  551. package/templates/artifacts/learning-proposal.md +19 -0
  552. package/templates/artifacts/next-run-handoff.md +21 -0
  553. package/templates/artifacts/plan-review.md +19 -0
  554. package/templates/artifacts/proposed-learning.template.json +12 -0
  555. package/templates/artifacts/research.md +21 -0
  556. package/templates/artifacts/research.template.json +12 -0
  557. package/templates/artifacts/review-findings.md +19 -0
  558. package/templates/artifacts/review.template.json +11 -0
  559. package/templates/artifacts/run-manifest.template.json +8 -0
  560. package/templates/artifacts/spec-challenge.md +19 -0
  561. package/templates/artifacts/spec-challenge.template.json +11 -0
  562. package/templates/artifacts/spec.md +21 -0
  563. package/templates/artifacts/spec.template.json +12 -0
  564. package/templates/artifacts/verification-proof.md +19 -0
  565. package/templates/artifacts/verification-proof.template.json +11 -0
  566. package/templates/examples/accepted-learning.example.json +14 -0
  567. package/templates/examples/author.example.json +152 -0
  568. package/templates/examples/clarification.example.json +15 -0
  569. package/templates/examples/docs-claim.example.json +8 -0
  570. package/templates/examples/export-manifest.example.json +7 -0
  571. package/templates/examples/host-export-package.example.json +11 -0
  572. package/templates/examples/implementation-plan.example.json +17 -0
  573. package/templates/examples/proposed-learning.example.json +13 -0
  574. package/templates/examples/research.example.json +15 -0
  575. package/templates/examples/research.example.md +6 -0
  576. package/templates/examples/review.example.json +17 -0
  577. package/templates/examples/run-manifest.example.json +9 -0
  578. package/templates/examples/spec-challenge.example.json +14 -0
  579. package/templates/examples/spec.example.json +21 -0
  580. package/templates/examples/verification-proof.example.json +21 -0
  581. package/templates/examples/wazir-manifest.example.yaml +65 -0
  582. package/templates/task-definition-schema.md +99 -0
  583. package/tooling/README.md +20 -0
  584. package/tooling/src/adapters/context-mode.js +50 -0
  585. package/tooling/src/capture/command.js +376 -0
  586. package/tooling/src/capture/store.js +99 -0
  587. package/tooling/src/capture/usage.js +270 -0
  588. package/tooling/src/checks/branches.js +50 -0
  589. package/tooling/src/checks/brand-truth.js +110 -0
  590. package/tooling/src/checks/changelog.js +231 -0
  591. package/tooling/src/checks/command-registry.js +36 -0
  592. package/tooling/src/checks/commits.js +102 -0
  593. package/tooling/src/checks/docs-drift.js +103 -0
  594. package/tooling/src/checks/docs-truth.js +201 -0
  595. package/tooling/src/checks/runtime-surface.js +156 -0
  596. package/tooling/src/cli.js +116 -0
  597. package/tooling/src/command-options.js +56 -0
  598. package/tooling/src/commands/validate.js +320 -0
  599. package/tooling/src/doctor/command.js +91 -0
  600. package/tooling/src/export/command.js +77 -0
  601. package/tooling/src/export/compiler.js +498 -0
  602. package/tooling/src/guards/loop-cap-guard.js +52 -0
  603. package/tooling/src/guards/protected-path-write-guard.js +67 -0
  604. package/tooling/src/index/command.js +152 -0
  605. package/tooling/src/index/storage.js +1061 -0
  606. package/tooling/src/index/summarizers.js +261 -0
  607. package/tooling/src/loaders.js +18 -0
  608. package/tooling/src/project-root.js +22 -0
  609. package/tooling/src/recall/command.js +225 -0
  610. package/tooling/src/schema-validator.js +30 -0
  611. package/tooling/src/state-root.js +40 -0
  612. package/tooling/src/status/command.js +71 -0
  613. package/wazir.manifest.yaml +135 -0
  614. package/workflows/README.md +19 -0
  615. package/workflows/author.md +42 -0
  616. package/workflows/clarify.md +38 -0
  617. package/workflows/design-review.md +46 -0
  618. package/workflows/design.md +44 -0
  619. package/workflows/discover.md +37 -0
  620. package/workflows/execute.md +48 -0
  621. package/workflows/learn.md +38 -0
  622. package/workflows/plan-review.md +42 -0
  623. package/workflows/plan.md +39 -0
  624. package/workflows/prepare-next.md +37 -0
  625. package/workflows/review.md +40 -0
  626. package/workflows/run-audit.md +41 -0
  627. package/workflows/spec-challenge.md +41 -0
  628. package/workflows/specify.md +38 -0
  629. package/workflows/verify.md +37 -0
@@ -0,0 +1,797 @@
1
+ # Event-Driven Architecture — Architecture Expertise Module
2
+
3
+ > Event-Driven Architecture (EDA) is a design paradigm where services communicate through events rather than direct calls. Producers emit events without knowing who consumes them; consumers react independently. This achieves loose coupling and high scalability at the cost of debugging complexity and eventual consistency.
4
+
5
+ > **Category:** Pattern
6
+ > **Complexity:** Complex
7
+ > **Applies when:** Systems requiring loose coupling, high-throughput event processing, audit trails, or complex workflows spanning multiple services
8
+
9
+ ---
10
+
11
+ ## What This Is (and What It Isn't)
12
+
13
+ ### Events vs Commands vs Messages
14
+
15
+ These three terms are conflated constantly. They are not the same thing.
16
+
17
+ An **event** is a notification that something has already happened. It is past tense, immutable, and carries no expectation of a response. `OrderPlaced`, `PaymentCompleted`, `UserRegistered`. The producer does not know — and must not care — who receives it. Events are facts. They cannot be rejected by consumers.
18
+
19
+ A **command** is a request for something to happen. It is imperative, directed at a specific handler, and expects exactly one consumer. `PlaceOrder`, `ChargePayment`, `SendEmail`. Commands can fail. They carry intent and imply a contract between sender and receiver.
20
+
21
+ A **message** is the envelope. Both events and commands travel as messages through brokers, queues, or buses. "Message-driven" describes the transport mechanism. "Event-driven" describes the semantic pattern. A system can be message-driven without being event-driven (e.g., command queues), and theoretically event-driven without messages (e.g., polling a change log).
22
+
23
+ **The litmus test:** If removing all consumers would break the producer, you have commands disguised as events. True events are fire-and-forget from the producer's perspective.
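Here is a minimal Python sketch of the distinction. It uses plain dataclasses rather than any messaging framework, and all names are hypothetical. The event is frozen because it records a fact. The command targets a single handler, and that handler is free to refuse it.

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class OrderPlaced:
    """Event: a past-tense, immutable fact. Consumers react to it; they cannot reject it."""
    order_id: str
    total_cents: int


@dataclass
class ChargePayment:
    """Command: an imperative request aimed at exactly one handler."""
    order_id: str
    amount_cents: int


class PaymentDeclined(Exception):
    """Raised when the single handler refuses the command."""


def handle_charge_payment(cmd: ChargePayment, balance_cents: int) -> int:
    # Unlike an event, a command carries intent and is allowed to fail.
    if cmd.amount_cents > balance_cents:
        raise PaymentDeclined(f"insufficient funds for {cmd.order_id}")
    return balance_cents - cmd.amount_cents


event = OrderPlaced(order_id="ord_12345", total_cents=5998)
try:
    event.total_cents = 0  # facts cannot be edited after the fact
except FrozenInstanceError:
    print("events are immutable")
```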
24
+
25
+ ### EDA Is Not Event Sourcing
26
+
27
+ Event-Driven Architecture and Event Sourcing are orthogonal concepts that happen to share the word "event."
28
+
29
+ **EDA** is a communication pattern between services. Service A emits an event; Services B, C, D react to it. The events are transient signals — once consumed, they have served their purpose (though they may be retained in a log for replay).
30
+
31
+ **Event Sourcing** is a persistence pattern within a single service. Instead of storing current state, you store the sequence of events that produced that state. The event log is the system of record. Current state is derived by replaying events.
32
+
33
+ You can use EDA without Event Sourcing (most systems do). You can use Event Sourcing without EDA (a single monolith with an event-sourced aggregate). You can use both together (a CQRS/ES service that also publishes domain events to a broker). Conflating them leads to architectures that are complex for no reason.
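To make the persistence side concrete, here is a minimal Event Sourcing sketch for a hypothetical bank account. The event list is the system of record, and the balance is derived by folding over it. Nothing is published to other services, so this alone is not EDA.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Union


@dataclass(frozen=True)
class Deposited:
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    amount: int


AccountEvent = Union[Deposited, Withdrawn]


def apply(balance: int, event: AccountEvent) -> int:
    """Pure transition function: previous state + event -> next state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    raise TypeError(f"unknown event {event!r}")


def replay(events: list) -> int:
    # Current state is never stored directly; it is rebuilt from the log.
    return reduce(apply, events, 0)


log = [Deposited(100), Withdrawn(30), Deposited(5)]
print(replay(log))  # 75
```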
34
+
35
+ ### Choreography vs Orchestration
36
+
37
+ These are the two fundamental coordination models in event-driven systems.
38
+
39
+ **Choreography** is decentralized. Each service listens for events and decides independently what to do. There is no central coordinator. An `OrderPlaced` event triggers the inventory service to reserve stock, the payment service to charge the card, the notification service to send a confirmation — all independently. The "workflow" emerges from the collective behavior of autonomous services reacting to events.
40
+
41
+ - **Strengths:** Loose coupling, independent deployability, no single point of failure, scales well when workflows are simple.
42
+ - **Weaknesses:** The workflow is implicit and scattered across codebases. Nobody owns the end-to-end flow. Debugging "why did this order fail?" requires correlating logs across every service. Adding a new step means modifying a consumer, and the existing producers are unaware. Circular event chains can create infinite loops.
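Here is a toy, in-process illustration of choreography. It uses a hypothetical `EventBus` class rather than a real broker client. The producer publishes `order.placed` without knowing who listens. The "workflow" exists only as the set of registered handlers, which is why nobody can see it end to end.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Minimal in-memory pub/sub: event type -> independent handlers."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # The producer neither knows nor cares who is listening.
        for handler in self._subscribers[event_type]:
            handler(payload)


actions: list = []
bus = EventBus()
bus.subscribe("order.placed", lambda e: actions.append(f"reserve stock for {e['orderId']}"))
bus.subscribe("order.placed", lambda e: actions.append(f"charge card for {e['orderId']}"))
bus.subscribe("order.placed", lambda e: actions.append(f"email confirmation for {e['orderId']}"))

bus.publish("order.placed", {"orderId": "ord_12345"})
print(actions)
```

In production, each handler would be a separate service consuming from a broker asynchronously. The synchronous loop here only shows the coupling structure.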
43
+
44
+ **Orchestration** is centralized. A coordinator service (the orchestrator) owns the workflow definition and issues commands to each participant. The orchestrator knows the sequence: first reserve inventory, then charge payment, then send notification. Participants respond with results, and the orchestrator decides the next step.
45
+
46
+ - **Strengths:** The workflow is explicit, visible, and debuggable. Compensation logic (rollbacks) lives in one place. Adding or reordering steps is straightforward.
47
+ - **Weaknesses:** The orchestrator is a coupling point and potential bottleneck. It must be highly available. It knows about every participant, creating a dependency fan-out.
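For contrast, here is a minimal orchestration sketch. It assumes each step is a hypothetical `(name, action, compensation)` tuple. The orchestrator owns the sequence. When a step fails, it runs the compensations of the completed steps in reverse order (the saga pattern). Engines such as Temporal or Step Functions add the durable state, timeouts, and retries that this toy leaves out.

```python
from typing import Callable

Step = tuple  # (name, action, compensation)


def run_saga(steps: list, log: list) -> bool:
    """Execute steps in order; on failure, compensate completed steps in reverse."""
    completed = []
    for step in steps:
        name, action, _ = step
        try:
            action()
            log.append(f"done:{name}")
            completed.append(step)
        except Exception as exc:
            log.append(f"failed:{name}:{exc}")
            for done_name, _, compensate in reversed(completed):
                compensate()
                log.append(f"compensated:{done_name}")
            return False
    return True


def decline() -> None:
    raise RuntimeError("card declined")


noop: Callable[[], None] = lambda: None
log: list = []
ok = run_saga(
    [
        ("reserve_inventory", noop, noop),
        ("charge_payment", decline, noop),
        ("send_notification", noop, noop),
    ],
    log,
)
print(ok, log)
```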
48
+
49
+ **The pragmatic answer:** Use choreography for simple, loosely related reactions (notifications, analytics, audit logging). Use orchestration for multi-step business-critical workflows where you need visibility and compensation (order fulfillment, payment processing, onboarding flows). Most production systems use both — choreography for fire-and-forget side effects, orchestration for the critical path. Tools like Temporal, Step Functions, and Conductor provide orchestration frameworks purpose-built for this.
50
+
51
+ ### The Four EDA Patterns
52
+
53
+ Martin Fowler identifies four distinct patterns that all fall under "event-driven":
54
+
55
+ 1. **Event Notification** — The simplest form. A service emits an event saying "something happened" with minimal data (just an ID and event type). Consumers must call back to the source to get details. Low coupling but high chattiness.
56
+
57
+ 2. **Event-Carried State Transfer (ECST)** — Events carry the full relevant state. `OrderPlaced { orderId, items, shippingAddress, total }`. Consumers have everything they need without callbacks. Reduces coupling further but increases event size and raises staleness risks.
58
+
59
+ 3. **Event Sourcing** — Events are the system of record. State is derived from the event log. Covered in detail in the CQRS/Event Sourcing module.
60
+
61
+ 4. **CQRS** — Command Query Responsibility Segregation. Read and write models are separated, often connected by events. The write side emits events; the read side builds projections. Covered in the CQRS/Event Sourcing module.
62
+
63
+ ### "Just Use Kafka" Is Not EDA
64
+
65
+ Kafka is an infrastructure component, not an architecture. Installing Kafka and publishing JSON blobs to topics does not make your system event-driven. Common anti-patterns:
66
+
67
+ - **Request-response over Kafka:** Producing a command to a topic and polling a response topic. You have reinvented HTTP with worse latency and no type safety.
68
+ - **Database replication via events:** Mirroring every database row change as an event. You have built a slow, unreliable database replica. Use CDC tools (Debezium) if you actually need this, and do not pretend it is EDA.
69
+ - **Kafka as a database:** Storing state in Kafka topics and reading it back. Kafka is an append-only log, not a query engine. Use it as a transport or audit log, not a primary datastore (unless you are building a very specific streaming application with KTables).
70
+
71
+ ---
72
+
73
+ ## When to Use It
74
+
75
+ ### Decoupling Services That Evolve Independently
76
+
77
+ When teams own different services and deploy on different schedules, synchronous coupling creates coordination nightmares. EDA allows services to communicate through contracts (event schemas) without runtime dependencies. The payment service does not import the order service's SDK — it subscribes to `OrderPlaced` events.
78
+
79
+ **Real example — Shopify:** Shopify processes approximately 66 million messages per second at peak through its Kafka backbone. Each domain (inventory, payments, search, analytics) consumes events independently. When the search team wants to re-index products, they replay events from the product catalog topic without coordinating with the catalog team.
80
+
81
+ ### Audit Trails and Compliance
82
+
83
+ Events are naturally auditable. An append-only event log provides a complete history of what happened, when, and why. Financial services, healthcare, and regulated industries benefit enormously.
84
+
85
+ **Real example — ING Bank:** ING adopted event-driven systems across its European financial platforms. Their payments platform processes thousands of event types through a schema registry with enforced backward compatibility. Every transaction produces an immutable event trail satisfying regulatory audit requirements.
86
+
87
+ ### High-Throughput Data Pipelines
88
+
89
+ When you need to ingest millions of records per second and distribute them to multiple consumers with different processing speeds, event logs excel.
90
+
91
+ **Real example — LinkedIn:** Apache Kafka was created at LinkedIn, and LinkedIn open-sourced it in 2011, specifically to solve this problem. LinkedIn needed to track page views, search queries, ad impressions, and connection updates (billions of events daily) and route them to both real-time consumers (LinkedIn Signal search, notification feeds) and offline consumers (Hadoop data warehouse, analytics pipelines). A single infrastructure serves both online and offline use cases. The engineering team uses the Actor Pattern extensively when building event-driven workflows.
92
+
93
+ ### Real-Time Stream Processing
94
+
95
+ When business decisions must happen in milliseconds based on continuous data streams — fraud detection, dynamic pricing, anomaly detection — batch processing is too slow.
96
+
97
+ **Real example — Uber:** Uber processes billions of Kafka messages daily. Millions of GPS pings and ride-status events per second flow from rider and driver apps into Kafka. Apache Flink consumes these streams to calculate surge pricing multipliers in real time, with pricing models updating every few seconds. Their Active-Active Kafka setup provides high availability, and the Kappa+ architecture enables seamless backfill. Uber Freight cut aggregation latency from 15 minutes to seconds by shifting from batch to streaming.
98
+
99
+ ### Complex Workflows Spanning Multiple Services
100
+
101
+ Order processing, user onboarding, insurance claims — any workflow that touches 5+ services and has compensation logic benefits from event-driven orchestration.
102
+
103
+ **Real example — Amazon order processing:** Amazon's order pipeline uses SNS for fan-out and SQS for reliable buffering. When an order is placed, an event fans out to inventory reservation, payment processing, fraud detection, and notification services simultaneously. Each service consumes from its own SQS queue at its own pace. EventBridge routes events based on content rules, enabling fine-grained filtering without consumer-side logic.
104
+
105
+ ### AI/ML Real-Time Inference Pipelines
106
+
107
+ In 2024-2025, EDA is increasingly powering real-time ML pipelines where events trigger model updates and predictions. Rather than batch training and batch inference, events flow through feature stores and model servers.
108
+
109
+ **Real example — Pinterest:** Pinterest's ML infrastructure evolved toward streaming architectures to support more responsive ad conversion models. Events from user interactions flow through real-time feature pipelines, enabling models to react to behavioral signals within seconds rather than hours.
110
+
111
+ ---
112
+
113
+ ## When NOT to Use It
114
+
115
+ This section is deliberately as long as "When to Use It." The industry has a bias toward over-engineering, and EDA is frequently adopted where simpler patterns suffice.
116
+
117
+ ### When Synchronous Request-Response Is Simpler
118
+
119
+ If a user submits a form and needs an immediate success/failure response, adding a message broker between the frontend and backend adds latency, complexity, and failure modes — with zero benefit. REST/gRPC request-response is the right choice when the caller needs a synchronous answer.
120
+
121
+ **Real example:** A startup built a simple CRUD app for managing restaurant menus. They used Kafka to "decouple" the API from the database write. Result: 200ms latency became 2 seconds, they needed Kafka ops expertise they did not have, and debugging a failed menu update required checking three systems instead of one. A direct database call would have been correct.
122
+
123
+ ### When You Have Fewer Than Three Consumers
124
+
125
+ If there is exactly one producer and one consumer, you do not need pub/sub. You need a function call or a direct API invocation. Event-driven architecture's value comes from the fan-out — one event, many independent reactions. One-to-one communication is better served by direct calls or simple queues.
126
+
127
+ **Anti-pattern:** Teams that put a message broker between two services "for decoupling" when both services are owned by the same team, deployed together, and always change in lockstep. The broker adds operational overhead without delivering any actual decoupling benefit.
128
+
129
+ ### When Ordering Guarantees Are Critical
130
+
131
+ EDA systems provide ordering within a partition (Kafka) or within a message group of a FIFO queue (SQS FIFO). They do not guarantee ordering across partitions or across topics. If your business logic requires strict global ordering across all events from all sources, EDA will fight you every step of the way. You will end up building a distributed lock manager on top of your event system, which is worse than not using events.
132
+
133
+ **Example:** A financial ledger system where debits and credits must be applied in exact chronological order across all accounts. Using events with per-account partitioning works, but cross-account ordering (e.g., an inter-account transfer that must debit before credit) requires careful saga design or synchronous coordination at the boundary.
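Per-partition ordering is normally achieved by routing every event for one entity (here, one account) to the same partition. The sketch below uses `zlib.crc32` purely as a stable stand-in hash. Kafka's default partitioner uses murmur2, so the actual partition numbers will differ, but the ordering property is the same.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    # Same key -> same partition, as long as num_partitions does not change.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


events = [
    ("acct_A", "debit 50"),
    ("acct_B", "credit 20"),
    ("acct_A", "credit 10"),
    ("acct_B", "debit 5"),
]

partitions = defaultdict(list)
for key, op in events:
    partitions[partition_for(key, 4)].append((key, op))

# Within each partition, each account's events keep their publish order.
# Nothing orders acct_A's events relative to acct_B's across partitions.
for p, evts in sorted(partitions.items()):
    print(p, evts)
```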
134
+
135
+ ### When "Why Did This Happen?" Must Be Instantly Answerable
136
+
137
+ In choreographed systems, understanding the full causal chain of an event requires correlating logs across every service that touched it. If your domain requires instant root-cause analysis — regulatory compliance investigations, financial audits with real-time reporting — the debugging overhead of EDA can be disqualifying unless you invest heavily in distributed tracing infrastructure (OpenTelemetry, Jaeger, correlation IDs in every event).
138
+
139
+ ### When the Team Lacks Distributed Systems Experience
140
+
141
+ EDA introduces failure modes that most developers have never encountered: message duplication, out-of-order delivery, consumer lag, partition rebalancing, schema evolution, poison pills, dead letter queue management. If the team's experience is primarily with synchronous CRUD applications, adopting EDA will result in production incidents that nobody knows how to debug.
142
+
143
+ **Real-world observation:** Most developers are trained on request/response systems. Adopting EDA often requires either hiring specialists or significant training investment. Companies that underestimate this end up with poorly implemented event-driven systems that create "integration debt" — undocumented streams, inconsistent schemas, unclear ownership, and absent governance.
144
+
145
+ ### When Batch Processing Is Good Enough
146
+
147
+ If your data processing needs are satisfied by running a job every hour (or even every minute), do not build a real-time streaming pipeline. Batch is simpler, cheaper, and easier to debug. The gap between "we process data every hour" and "we process data in real-time" is enormous in operational complexity.
148
+
149
+ **Anti-pattern:** A reporting system that generates daily executive dashboards adopted Kafka Streams for "real-time analytics." The dashboards are viewed once per morning. The Kafka cluster costs $15,000/month and requires a dedicated DevOps engineer. A daily cron job with PostgreSQL would have cost $200/month.
150
+
151
+ ### When Simple Function Calls Suffice
152
+
153
+ Within a single service or monolith, method calls are the simplest, fastest, and most debuggable form of communication. Adding an in-process event bus (like MediatR in .NET or Spring Events) adds indirection without adding value unless you have a genuine reason for the decoupling (plugin architecture, testability isolation). The stack trace becomes harder to follow, and "go to definition" stops working.
154
+
155
+ ---
156
+
157
+ ## How It Works
158
+
159
+ ### Event Anatomy
160
+
161
+ A well-designed event contains:
162
+
163
+ ```json
164
+ {
165
+ "eventId": "evt_a1b2c3d4-e5f6-7890-abcd-ef1234567890",
166
+ "eventType": "order.placed",
167
+ "eventVersion": "2.1",
168
+ "timestamp": "2025-03-15T14:30:00.000Z",
169
+ "correlationId": "corr_x9y8z7w6-v5u4-3210-fedc-ba0987654321",
170
+ "causationId": "evt_previous-event-id",
171
+ "source": "order-service",
172
+ "dataContentType": "application/json",
173
+ "data": {
174
+ "orderId": "ord_12345",
175
+ "customerId": "cust_67890",
176
+ "items": [
177
+ { "sku": "WIDGET-001", "quantity": 2, "unitPrice": 29.99 }
178
+ ],
179
+ "totalAmount": 59.98,
180
+ "currency": "USD"
181
+ },
182
+ "metadata": {
183
+ "traceId": "trace_abc123",
184
+ "spanId": "span_def456",
185
+ "environment": "production",
186
+ "schemaRegistryId": "sr_789"
187
+ }
188
+ }
189
+ ```
190
+
191
+ **Key fields explained:**
192
+
193
+ - **eventId:** Globally unique identifier, used for deduplication. Consumers that record processed eventIds can recognize redelivered messages. This is one of several ways to achieve idempotency.
194
+ - **correlationId:** Links all events in a single business transaction. When `OrderPlaced` triggers `PaymentCharged` which triggers `ShipmentCreated`, all three share the same correlationId. This is non-negotiable for debugging.
195
+ - **causationId:** The eventId of the event that directly caused this event. Enables building causal chains. CorrelationId groups; causationId sequences.
196
+ - **eventVersion:** Schema version. Consumers use this to select the correct deserialization logic. Without it, schema evolution breaks consumers silently.
197
+ - **source:** Which service produced this event. Critical for debugging and access control.
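The sketch below shows how a consumer might build the envelope for a follow-up event. The new event inherits `correlationId`, and its `causationId` points at the direct parent. The `make_event` helper and the `evt_`/`corr_` prefixes are hypothetical conventions that mirror the JSON above. They are not part of any standard.

```python
import uuid
from datetime import datetime, timezone
from typing import Optional


def make_event(event_type: str, source: str, data: dict,
               caused_by: Optional[dict] = None, version: str = "1.0") -> dict:
    """Build an envelope; inherit correlationId and set causationId from the parent event."""
    if caused_by is None:
        correlation_id = f"corr_{uuid.uuid4()}"  # start of a new business transaction
        causation_id = None
    else:
        correlation_id = caused_by["correlationId"]  # same business transaction
        causation_id = caused_by["eventId"]          # direct parent
    return {
        "eventId": f"evt_{uuid.uuid4()}",
        "eventType": event_type,
        "eventVersion": version,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "correlationId": correlation_id,
        "causationId": causation_id,
        "source": source,
        "data": data,
    }


placed = make_event("order.placed", "order-service", {"orderId": "ord_12345"})
charged = make_event("payment.charged", "payment-service", {"orderId": "ord_12345"}, caused_by=placed)
shipped = make_event("shipment.created", "shipping-service", {"orderId": "ord_12345"}, caused_by=charged)

# One correlationId groups all three; causationIds chain placed -> charged -> shipped.
print(shipped["correlationId"] == placed["correlationId"], shipped["causationId"] == charged["eventId"])
```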
198
+
199
+ ### Message Brokers vs Event Logs
200
+
201
+ These are fundamentally different infrastructure choices with different guarantees:
202
+
203
+ **Message Brokers (RabbitMQ, ActiveMQ, SQS):**
204
+ - Messages are delivered to consumers and deleted from the broker once acknowledged.
205
+ - Designed for task distribution: each message in a queue is processed by exactly one of that queue's competing consumers.
206
+ - Support complex routing (topic exchanges, headers-based routing, fan-out).
207
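+ - No replay by default: once a message is acknowledged, it is removed. Persistent messages survive broker restarts but are still deleted after acknowledgment. RabbitMQ Streams are a notable exception that adds log-style retention and replay.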
208
+ - Best for: command distribution, work queues, RPC-style async communication.
209
+
210
+ **Event Logs (Kafka, Redpanda, Pulsar):**
211
+ - Events are appended to a partitioned, replicated log and retained for a configurable period (hours, days, forever).
212
+ - Multiple consumer groups can read the same events independently, each tracking their own offset.
213
+ - Full replay capability — a new consumer can start from the beginning of the log and process all historical events.
214
+ - Ordering guaranteed within a partition, not across partitions.
215
+ - Best for: event sourcing, audit trails, stream processing, multi-consumer fan-out, data pipeline integration.
216
+
217
+ **The choice matters.** If you choose RabbitMQ and later need replay, you are stuck. If you choose Kafka and only need simple task queues, you are paying for complexity you do not use. Many production systems use both — Kafka for the event backbone, RabbitMQ or SQS for specific command-processing workflows.
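The difference in consumption semantics fits in a few lines. This toy uses in-memory structures, not any real client. The queue removes a message once it is taken (real brokers remove it on acknowledgment). The log retains every record and tracks a separate offset per consumer group, which is what makes replay possible.

```python
from collections import deque
from typing import Optional


class WorkQueue:
    """Broker-style: a consumed message is gone for everyone."""

    def __init__(self) -> None:
        self._messages = deque()

    def send(self, msg: str) -> None:
        self._messages.append(msg)

    def receive(self) -> Optional[str]:
        return self._messages.popleft() if self._messages else None


class EventLog:
    """Log-style: records are retained; each consumer group keeps its own offset."""

    def __init__(self) -> None:
        self._records: list = []
        self._offsets: dict = {}

    def append(self, record: str) -> None:
        self._records.append(record)

    def poll(self, group: str) -> list:
        start = self._offsets.get(group, 0)
        batch = self._records[start:]
        self._offsets[group] = len(self._records)  # commit the new offset
        return batch

    def seek(self, group: str, offset: int) -> None:
        self._offsets[group] = offset  # e.g. 0 to replay from the beginning


q = WorkQueue()
q.send("job-1")
print(q.receive(), q.receive())  # job-1 None

log = EventLog()
for r in ("e1", "e2", "e3"):
    log.append(r)
print(log.poll("billing"), log.poll("analytics"))  # both groups see all three
log.seek("billing", 0)
print(log.poll("billing"))  # replay: ['e1', 'e2', 'e3']
```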
218
+
219
+ ### Consumer Groups
220
+
221
+ A consumer group is a set of consumers that collectively process events from a topic. Each partition in a topic is assigned to exactly one consumer in the group. This provides:
222
+
223
+ - **Parallelism:** More partitions + more consumers = higher throughput.
224
+ - **Load balancing:** Partitions are distributed across available consumers.
225
+ - **Fault tolerance:** If a consumer dies, its partitions are reassigned to surviving consumers (rebalancing).
226
+ - **Independent processing:** Different consumer groups process the same events independently. The payment service's consumer group and the analytics consumer group both read from `order.placed`, each at their own pace.
227
+
228
+ **Critical constraint:** A group can never have more active consumers than the topic has partitions. If you have 12 partitions and 20 consumers, 8 consumers sit idle. Plan partition counts based on expected peak parallelism.
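A simplified round-robin assignment shows why consumers beyond the partition count receive nothing. It is only loosely modelled on Kafka's assignors. The real assignors and the rebalance protocol do considerably more.

```python
def assign(partitions: int, consumers: list) -> dict:
    """Give each partition to exactly one consumer, round-robin."""
    assignment = {c: [] for c in consumers}
    for p in range(partitions):
        assignment[consumers[p % len(consumers)]].append(p)
    return assignment


consumers = [f"c{i}" for i in range(20)]
result = assign(12, consumers)
idle = [c for c, parts in result.items() if not parts]
print(len(idle))  # 8

# "Rebalancing" after membership changes is just a fresh assignment here.
print(assign(6, ["a", "b"]))  # {'a': [0, 2, 4], 'b': [1, 3, 5]}
```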
229
+
230
+ ### Idempotency
231
+
232
+ In distributed systems, messages will be delivered more than once. Network retries, consumer crashes after processing but before committing the offset, broker failovers — all produce duplicates. Every consumer must be idempotent.
233
+
234
+ **Implementation strategies:**
235
+
236
+ 1. **Deduplication table:** Store processed eventIds in a database. Before processing, check if the eventId exists. Use a unique constraint. This is simple but requires a database lookup for every message.
237
+
238
+ 2. **Natural idempotency:** Design operations so that applying them multiple times has the same effect as applying them once. `SET balance = 100` is idempotent; `SET balance = balance + 10` is not. Prefer absolute state updates over relative deltas.
239
+
240
+ 3. **Idempotency key in the event:** Include a client-generated idempotency key. The consumer uses this key (not just the eventId) to detect duplicates. Useful when the same logical operation might produce multiple events with different eventIds.
241
+
242
+ 4. **Conditional writes:** Use database optimistic locking (version columns) or conditional updates (`UPDATE ... WHERE version = expected_version`) so that duplicate processing attempts fail safely.
243
+
244
+ ### Dead Letter Queues (DLQ)
245
+
246
+ When a consumer cannot process a message — malformed data, business rule violation, infrastructure failure — it must not block the entire queue. Dead letter queues provide a quarantine for poison messages.
247
+
248
+ **DLQ design:**
249
+
250
+ - After N retry attempts (typically 3-5), move the message to the DLQ.
251
+ - Preserve the original message, all headers, and the error details (stack trace, error code, consumer that failed).
252
+ - Set up alerting on DLQ depth. A growing DLQ means something is systematically wrong.
253
+ - Build tooling to inspect, replay, or discard DLQ messages. Without this tooling, DLQs become black holes that nobody checks until a customer complains.
254
+ - Consider separate DLQs per consumer group. A shared DLQ makes it impossible to determine which consumer failed.
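Here is a minimal sketch of the retry-then-quarantine flow. It uses an in-memory list as the DLQ and a fixed limit of three attempts. The DLQ entry keeps the original message together with the error details and the failing consumer's name. A real consumer would also back off between attempts.

```python
import traceback
from typing import Callable

MAX_ATTEMPTS = 3  # within the typical 3-5 range


def consume_with_dlq(message: dict, handler: Callable[[dict], None],
                     consumer_name: str, dlq: list) -> bool:
    """Run handler up to MAX_ATTEMPTS times; on repeated failure, park the message in the DLQ."""
    error_type, error_message, stack = "", "", ""
    for _ in range(MAX_ATTEMPTS):
        try:
            handler(message)
            return True
        except Exception as exc:
            # A production consumer would back off here and might skip retries
            # for errors that can never succeed (e.g. malformed payloads).
            error_type, error_message = type(exc).__name__, str(exc)
            stack = traceback.format_exc()
    dlq.append({
        "original": message,        # payload and headers preserved as received
        "consumer": consumer_name,  # which consumer failed
        "attempts": MAX_ATTEMPTS,
        "errorType": error_type,
        "errorMessage": error_message,
        "stackTrace": stack,
    })
    return False


def parse_amount(msg: dict) -> None:
    int(msg["data"]["amount"])  # raises ValueError on a malformed payload


dlq: list = []
ok = consume_with_dlq({"eventId": "evt_9", "data": {"amount": "ten"}}, parse_amount, "billing", dlq)
print(ok, dlq[0]["errorType"])  # False ValueError
```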
255
+
256
+ ### At-Least-Once vs Exactly-Once
257
+
258
+ **At-most-once:** Commit the offset, then process the message. If the consumer crashes after committing but before processing, the message is skipped on restart. Data loss is possible.
259
+
260
+ **At-least-once:** Process the message, then commit the offset. If the consumer crashes after processing but before committing, the message is reprocessed on restart. Duplicates are possible. This is why idempotency is mandatory.
261
+
262
+ **Exactly-once:** The holy grail. True exactly-once requires transactional coordination between the message broker and the consumer's side effects. Kafka achieves this through transactional producers and the read-process-write pattern within Kafka Streams — but only when both input and output are Kafka topics. The moment your consumer writes to an external database, exactly-once reverts to at-least-once-with-idempotency. Uber published their approach to real-time exactly-once ad event processing using Flink and Kafka transactions, but acknowledged it requires careful engineering and is not a general-purpose guarantee.
263
+
264
+ **Pragmatic advice:** Design for at-least-once delivery with idempotent consumers. This is the industry standard and covers the vast majority of use cases.
265
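A toy simulation makes the commit-ordering difference concrete. No broker is involved: `simulate` is a hypothetical model of one partition in which a single crash is injected between the two steps for one offset, and a "restart" resumes from the last committed offset.

```typescript
type CommitOrder = 'commit-then-process' | 'process-then-commit';

// Processes offsets [0, n) of one partition, crashing once between the
// two steps for offset `crashAt`. Returns every offset that was processed.
function simulate(order: CommitOrder, n: number, crashAt: number): number[] {
  const processed: number[] = [];
  let committed = 0; // next offset to read after a restart
  let crashed = false;

  while (committed < n) {
    const offset = committed;
    if (order === 'commit-then-process') {
      committed = offset + 1;
      if (offset === crashAt && !crashed) {
        crashed = true; // crash after commit: this offset is never processed
        continue;
      }
      processed.push(offset);
    } else {
      processed.push(offset);
      if (offset === crashAt && !crashed) {
        crashed = true; // crash before commit: restart reads this offset again
        continue;
      }
      committed = offset + 1;
    }
  }
  return processed;
}
```

With four messages and a crash at offset 2, `simulate('commit-then-process', 4, 2)` processes `[0, 1, 3]` (message 2 is lost), while `simulate('process-then-commit', 4, 2)` processes `[0, 1, 2, 2, 3]` (message 2 is duplicated, which an idempotent consumer absorbs).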
+
266
+ ### Schema Registry
267
+
268
+ As systems evolve, event schemas change. Without governance, a producer adding a field can break every consumer. A schema registry (Confluent Schema Registry, AWS Glue Schema Registry, Apicurio) provides:
269
+
270
+ - **Central schema storage:** All event schemas are registered and versioned.
271
+ - **Compatibility enforcement:** The registry rejects schema changes that would break consumers. Compatibility modes include:
272
+ - **BACKWARD:** New schemas can read data written by old schemas. Safe to delete fields and to add optional fields (fields with defaults).
273
+ - **FORWARD:** Old schemas can read data written by new schemas. Safe to add fields and to delete optional fields.
274
+ - **FULL:** Both backward and forward compatible. The safest but most restrictive.
275
+ - **TRANSITIVE variants** (BACKWARD_TRANSITIVE, FORWARD_TRANSITIVE, FULL_TRANSITIVE): Compatibility is checked against all previous versions, not just the immediately prior one.
276
+ - **Serialization efficiency:** Avro and Protobuf schemas enable compact binary serialization. The producer embeds a schema ID in each message (Confluent's wire format prefixes the payload with a magic byte and a 4-byte schema ID); the consumer resolves the ID from the registry to deserialize. No self-describing JSON overhead.
277
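The BACKWARD/FORWARD/FULL rules can be illustrated with a toy checker over flat record schemas. This is a deliberately simplified model (no unions, aliases, type promotion, or nested records) and not how Confluent's registry implements the check. It shows why adding a field with a default passes every mode while renaming a field without an alias fails.

```typescript
interface Field { name: string; type: string; default?: unknown }
interface RecordSchema { fields: Field[] }
type Mode = 'BACKWARD' | 'FORWARD' | 'FULL';

// Can a consumer using `reader` decode data written with `writer`?
// Toy rules: each reader field must exist in the writer with the same type,
// or carry a default. Writer-only fields are simply ignored by the reader.
function readProblems(reader: RecordSchema, writer: RecordSchema): string[] {
  const writerFields = new Map(writer.fields.map((f): [string, Field] => [f.name, f]));
  const problems: string[] = [];
  for (const field of reader.fields) {
    const written = writerFields.get(field.name);
    if (!written) {
      if (!('default' in field)) problems.push(`${field.name}: absent from writer and has no default`);
    } else if (written.type !== field.type) {
      problems.push(`${field.name}: type changed from ${written.type} to ${field.type}`);
    }
  }
  return problems;
}

function checkCompatibility(mode: Mode, next: RecordSchema, prev: RecordSchema): string[] {
  const backward = readProblems(next, prev); // new consumers read old data
  const forward = readProblems(prev, next); // old consumers read new data
  if (mode === 'BACKWARD') return backward;
  if (mode === 'FORWARD') return forward;
  return [...backward, ...forward];
}

const v1: RecordSchema = { fields: [{ name: 'orderId', type: 'string' }, { name: 'totalAmount', type: 'long' }] };
const v2: RecordSchema = { fields: [...v1.fields, { name: 'currency', type: 'string', default: 'USD' }] };
const v3: RecordSchema = { fields: [{ name: 'orderId', type: 'string' }, { name: 'orderTotal', type: 'long' }] };
```

Here `checkCompatibility('FULL', v2, v1)` reports no problems, while `checkCompatibility('BACKWARD', v3, v1)` flags `orderTotal` because the rename leaves new readers with a field that old data never wrote.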
+
278
+ **ING Bank's lesson:** When managing thousands of event types across hundreds of teams, enforcing backward compatibility through a schema registry was essential to keep their payments platform stable. Without it, a single team's schema change could cascade failures across the entire platform.
279
+
280
+ ### Back-Pressure
281
+
282
+ When producers emit events faster than consumers can process them, the system must degrade gracefully rather than crash.
283
+
284
+ **Strategies:**
285
+
286
+ - **Consumer-side buffering:** SQS and similar queues provide natural buffering. Messages wait in the queue until consumers are ready. Queue depth is a key metric.
287
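+ - **Partition-based scaling:** Add partitions and consumers to increase throughput. Kafka's consumer group protocol handles rebalancing automatically. Adding partitions changes which partition a key hashes to, so new messages for an existing key may land on a different partition than its earlier ones, breaking per-key ordering across the change.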
288
+ - **Rate limiting producers:** If consumers are overwhelmed, signal producers to slow down. This is easier with orchestration (the orchestrator can pause) than choreography (nobody is in charge).
289
+ - **Dropping or sampling:** For non-critical events (analytics, metrics), dropping or sampling under load may be acceptable. Never drop business-critical events.
290
+ - **Reactive streams:** Frameworks like Project Reactor and RxJava implement back-pressure protocols where consumers signal their capacity to producers.
291
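Within a single process, the same principle can be sketched as a bounded buffer whose `put` suspends the producer while the buffer is full. `BoundedQueue` is a hypothetical illustration of back-pressure, not the API of Reactor, RxJava, or any broker client.

```typescript
class BoundedQueue<T> {
  private readonly items: T[] = [];
  private readonly blockedProducers: Array<() => void> = [];
  private readonly waitingConsumers: Array<(item: T) => void> = [];

  constructor(private readonly capacity: number) {
    if (capacity < 1) throw new Error('capacity must be >= 1');
  }

  get size(): number {
    return this.items.length;
  }

  // Suspends the producer while the buffer is full (the back-pressure signal).
  async put(item: T): Promise<void> {
    while (this.items.length >= this.capacity) {
      await new Promise<void>(resolve => this.blockedProducers.push(resolve));
    }
    const consumer = this.waitingConsumers.shift();
    if (consumer) consumer(item); // hand off directly to an idle consumer
    else this.items.push(item);
  }

  // Takes the oldest item and wakes one blocked producer, since a slot opened.
  async take(): Promise<T> {
    if (this.items.length > 0) {
      const item = this.items.shift() as T;
      this.blockedProducers.shift()?.();
      return item;
    }
    return new Promise<T>(resolve => this.waitingConsumers.push(resolve));
  }
}

// Fill a 2-slot buffer, then show that a third put waits until a take frees a slot.
async function demo(): Promise<{ blockedWhileFull: boolean; first: number; size: number }> {
  const q = new BoundedQueue<number>(2);
  await q.put(1);
  await q.put(2);
  let thirdAccepted = false;
  const pending = q.put(3).then(() => { thirdAccepted = true; });
  await Promise.resolve(); // give put(3) a chance to run; it stays blocked
  const blockedWhileFull = !thirdAccepted;
  const first = await q.take(); // frees a slot and wakes the producer
  await pending;
  return { blockedWhileFull, first, size: q.size };
}
```

The woken producer re-checks capacity in a loop, so a slot grabbed by another producer in the meantime simply sends it back to waiting instead of overfilling the buffer.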
+
292
+ ---
293
+
294
+ ## Trade-Offs Matrix
295
+
296
+ | Dimension | Event-Driven Approach | Synchronous Alternative | Winner |
297
+ |---|---|---|---|
298
+ | **Coupling** | Producers and consumers are decoupled; either can change independently. Schema contract is the only coupling point. | Caller depends on callee's API, availability, and response time. Changes require coordinated deployment. | EDA |
299
+ | **Scalability** | Consumers scale independently. Add partitions and consumers to handle load. Natural load leveling via buffering. | Callee must handle peak load directly or caller retries/fails. Scaling requires both sides to coordinate. | EDA |
300
+ | **Latency** | Added latency from serialization, network hop to broker, deserialization. Milliseconds to seconds depending on consumer lag. | Sub-millisecond for in-process; single-digit milliseconds for network calls. Direct and predictable. | Sync |
301
+ | **Debuggability** | Requires distributed tracing, correlation IDs, centralized logging. "Where did this event go?" is a hard question. | Stack traces. Breakpoints. Step-through debugging. "What called this function?" is trivially answerable. | Sync |
302
+ | **Fault tolerance** | Broker absorbs producer/consumer failures. Consumers can catch up after recovery. Messages persist during outages. | If the callee is down, the caller gets an error immediately. Circuit breakers help but add complexity. | EDA |
303
+ | **Data consistency** | Eventual consistency. The window between event emission and consumer processing can range from milliseconds to hours. | Strong consistency via database transactions. ACID guarantees. Immediate confirmation. | Sync |
304
+ | **Auditability** | Event logs provide a natural audit trail. Every state change is recorded as an immutable event. | Audit logging requires explicit implementation. Easy to forget a code path. | EDA |
305
+ | **Operational cost** | Message broker infrastructure (Kafka cluster, schema registry, monitoring). Requires specialized ops knowledge. | Application servers and databases. Standard DevOps tooling. Lower baseline operational burden. | Sync |
306
+ | **Team autonomy** | Teams can deploy and scale independently. Schema contracts are the boundary. | Teams must coordinate API changes, deployment order, and version compatibility. | EDA |
307
+ | **Testing complexity** | Integration tests require broker infrastructure. Verifying async workflows requires waiting, polling, or test harnesses. | Direct function calls are trivially unit-testable. Integration tests are synchronous and deterministic. | Sync |
308
+ | **Recovery from failure** | Dead letter queues, replay from offset, reprocessing historical events. Rich recovery tooling. | Retry with exponential backoff. Failed requests may require manual reconciliation. | EDA |
309
+
310
+ **Summary:** EDA wins on coupling, scalability, fault tolerance, auditability, team autonomy, and recovery. Synchronous wins on latency, debuggability, consistency, operational cost, and testing. Choose based on which properties your system values most.
311
+
312
+ ---
313
+
314
+ ## Evolution Path
315
+
316
+ ### Stage 1: Start Synchronous (Monolith or Simple Services)
317
+
318
+ Begin with direct function calls or REST/gRPC APIs. This is correct for most new projects. You do not know your scaling requirements, your team is learning the domain, and the overhead of distributed systems is not justified.
319
+
320
+ **Indicators you are here:** Single deployment unit. Database transactions provide consistency. All communication is request-response. Debugging means reading stack traces.
321
+
322
+ ### Stage 2: Identify Specific Decoupling Needs
323
+
324
+ As the system grows, certain patterns emerge that signal EDA could help:
325
+
326
+ - **Side effects that slow down the critical path.** A user registration endpoint sends a welcome email, updates analytics, provisions a workspace, and notifies the sales team. The user waits 3 seconds for all of this. Move side effects to async event processing; the user gets a response in 200ms.
327
+ - **Multiple consumers need the same data.** The search index, recommendation engine, analytics pipeline, and audit log all need to know about product changes. Rather than the product service calling four APIs, it emits a `ProductUpdated` event.
328
+ - **Downstream services have different SLAs.** The payment service needs 99.99% availability; the email service needs 99%. Coupling them synchronously means the payment flow fails when email is down.
329
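The first signal (side effects on the critical path) can be sketched with a hypothetical in-process `EventBus` standing in for a real queue such as SQS or RabbitMQ. The registration handler persists the user, publishes one event, and responds; subscribers run only after the handler has returned. Unlike a durable queue, this toy bus loses pending events if the process crashes.

```typescript
type Handler<E> = (event: E) => Promise<void>;

class EventBus<E> {
  private handlers: Handler<E>[] = [];

  subscribe(handler: Handler<E>): void {
    this.handlers.push(handler);
  }

  // Schedules every handler on the microtask queue and returns immediately,
  // so the caller never waits for side effects to finish.
  publish(event: E): void {
    for (const handler of this.handlers) {
      Promise.resolve()
        .then(() => handler(event))
        .catch(err => console.error('side effect failed', err));
    }
  }
}

interface UserRegistered { userId: string; email: string }

const log: string[] = [];
const bus = new EventBus<UserRegistered>();
bus.subscribe(async e => { log.push(`welcome email to ${e.email}`); });
bus.subscribe(async e => { log.push(`workspace for ${e.userId}`); });

// Critical path: persist the user, emit one event, respond.
function registerUser(userId: string, email: string): { status: number } {
  log.push(`saved ${userId}`); // stands in for the database write
  bus.publish({ userId, email });
  log.push('responded 201');
  return { status: 201 };
}
```

Right after `registerUser('u1', 'a@example.com')` returns, `log` contains only the save and the response; the welcome email and workspace entries appear once the pending microtasks run.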
+
330
+ ### Stage 3: Introduce Async for Specific Paths
331
+
332
+ Do not convert everything at once. Pick the highest-value use case:
333
+
334
+ 1. **Notifications and side effects** — The easiest starting point. Emit events for email, SMS, push notifications. Use a simple queue (SQS, RabbitMQ). This is low risk and delivers immediate value.
335
+ 2. **Analytics and audit logging** — Emit domain events to an event log. Analytics consumers process them independently. The main application is unaffected.
336
+ 3. **Cross-service workflows** — Introduce event-driven communication between specific services where the coupling is causing deployment coordination pain.
337
+
338
+ ### Stage 4: Event Backbone
339
+
340
+ Once multiple services communicate through events, standardize:
341
+
342
+ - Adopt a schema registry and enforce compatibility rules.
343
+ - Standardize event envelope format (eventId, correlationId, causationId, version).
344
+ - Build shared libraries for event production and consumption (serialization, retry, DLQ, idempotency).
345
+ - Deploy centralized monitoring: consumer lag dashboards, DLQ alerting, event flow visualization.
346
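A standardized envelope is the kind of thing a shared library would provide. The sketch below is hypothetical: it uses the field names listed above plus `eventType` and `occurredAt`, and it assumes that a root event (one with no cause) starts a new correlation using its own `eventId`. The deterministic ID generator exists only to keep the example testable.

```typescript
interface EventEnvelope<T> {
  eventId: string;
  eventType: string;
  version: number;
  occurredAt: string; // ISO-8601
  correlationId: string; // shared by every event in one business transaction
  causationId: string | null; // eventId of the event that caused this one
  payload: T;
}

// Deterministic ids keep the sketch testable; use crypto.randomUUID() in practice.
let idCounter = 0;
const nextEventId = (): string => `evt-${++idCounter}`;

// Root events start a new correlation; derived events inherit it and record their cause.
function createEnvelope<T>(
  eventType: string,
  version: number,
  payload: T,
  cause?: EventEnvelope<unknown>,
  now: Date = new Date(),
): EventEnvelope<T> {
  const eventId = nextEventId();
  return {
    eventId,
    eventType,
    version,
    occurredAt: now.toISOString(),
    correlationId: cause ? cause.correlationId : eventId,
    causationId: cause ? cause.eventId : null,
    payload,
  };
}
```

Centralizing this logic means no team has to remember to copy `correlationId` forward or set `causationId` by hand.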
+
347
+ ### Stage 5: Full Event-Driven (If Justified)
348
+
349
+ Some organizations reach a point where events are the primary communication mechanism. This is appropriate for large-scale platforms (LinkedIn, Uber, Netflix) where hundreds of services must communicate with minimal coordination.
350
+
351
+ **Warning:** Most systems should stop at Stage 3 or 4. Full EDA is justified only when you have the team size, operational maturity, and business requirements to support it.
352
+
353
+ ---
354
+
355
+ ## Failure Modes
356
+
357
+ ### Event Storms
358
+
359
+ A cascade where one event triggers multiple events, which each trigger more events, creating an exponential amplification effect.
360
+
361
+ **How it happens:** Service A emits `OrderCreated`. Service B consumes it and emits `InventoryReserved`. Service C consumes both `OrderCreated` and `InventoryReserved` and emits `PaymentRequested` for each — but it should only emit one. Now the payment service processes two charges. The customer disputes. The refund flow emits more events. The storm grows.
362
+
363
+ **Real-world pattern:** Retry storms following transient errors. A downstream service returns a 503. All consumers retry immediately. The downstream service, already overloaded, returns more 503s. Each retry produces a retry event. Without exponential backoff and jitter, the system amplifies the original failure.
364
+
365
+ **Mitigation:** Implement circuit breakers on consumers. Add exponential backoff with jitter on retries. Use rate limiting on event production. Design idempotent consumers so duplicate processing is harmless. Monitor event production rates and alert on anomalous spikes.
366
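Two of these mitigations fit in a few lines each. The sketch below assumes "full jitter" backoff (a random delay up to an exponentially growing, capped ceiling) and a deliberately minimal circuit breaker; `backoffDelayMs`, `CircuitBreaker`, and their default values are hypothetical illustrations, not a specific library's API.

```typescript
// Full-jitter backoff: a random delay in [0, min(cap, base * 2^attempt)).
function backoffDelayMs(
  attempt: number,
  baseMs = 100,
  capMs = 30_000,
  random: () => number = Math.random,
): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Opens after `threshold` consecutive failures and rejects calls until
// `cooldownMs` has elapsed. After the cooldown, calls are allowed again; one
// more failure re-opens it because the failure count is still at the threshold.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(private readonly threshold = 5, private readonly cooldownMs = 10_000) {}

  canCall(now: number): boolean {
    if (this.openedAt === null) return true;
    return now - this.openedAt >= this.cooldownMs;
  }

  recordSuccess(): void {
    this.failures = 0;
    this.openedAt = null;
  }

  recordFailure(now: number): void {
    this.failures += 1;
    if (this.failures >= this.threshold) this.openedAt = now;
  }
}
```

Jitter matters as much as the exponential growth: without it, every consumer that failed at the same moment retries at the same moment, which recreates the spike the backoff was meant to avoid.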
+
367
+ ### Schema Drift
368
+
369
+ Producers change event schemas without coordinating with consumers. A field is renamed, a required field is removed, or a type changes from string to integer.
370
+
371
+ **How it happens:** Team A owns the order service and renames `totalAmount` to `orderTotal` in a "cleanup" commit. Team B's analytics consumer still reads `totalAmount` and starts getting null values. No error is thrown: the JSON deserializer silently ignores the unknown `orderTotal` field and returns nothing for the missing one. Analytics dashboards show zero revenue for 6 hours before anyone notices.
372
+
373
+ **Mitigation:** Schema registry with compatibility enforcement (BACKWARD or FULL mode). CI/CD pipeline that validates schema changes against the registry before deployment. Contract testing between producers and consumers.
374
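A schema registry is the primary fix. A complementary consumer-side guard is to parse required fields strictly, so drift fails loudly (and can be routed to the DLQ and alerted on) instead of flowing downstream as nulls. `parseOrderTotals` and `SchemaDriftError` are hypothetical names for this sketch, which reuses the `totalAmount` example above.

```typescript
interface OrderTotals { orderId: string; totalAmount: number }

class SchemaDriftError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'SchemaDriftError';
  }
}

// Unknown extra fields are tolerated; missing or mistyped required fields are not.
function parseOrderTotals(raw: string): OrderTotals {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== 'object' || data === null) {
    throw new SchemaDriftError('payload is not an object');
  }
  const { orderId, totalAmount } = data as Record<string, unknown>;
  if (typeof orderId !== 'string') {
    throw new SchemaDriftError('orderId missing or not a string');
  }
  if (typeof totalAmount !== 'number') {
    throw new SchemaDriftError('totalAmount missing or not a number');
  }
  return { orderId, totalAmount };
}
```

With this guard, the renamed payload `{"orderId":"o-1","orderTotal":4200}` throws on the first message after the deploy, rather than producing six hours of zero-revenue dashboards.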
+
375
+ ### Consumer Lag
376
+
377
+ A consumer falls behind the event production rate. The gap between the latest produced event and the latest consumed event grows continuously.
378
+
379
+ **How it happens:** A consumer does expensive processing (ML inference, database joins) that takes 500ms per event. The producer emits 1,000 events per second. The consumer processes 2 events per second. The lag grows by 998 events per second. After an hour, the backlog is about 3.6 million events and the consumer is working on events produced nearly an hour earlier. Within a day, the backlog passes 86 million events and cannot be cleared without adding capacity.
380
+
381
+ **Impact:** Data becomes stale. Real-time features show outdated information. If the consumer is part of a workflow, downstream steps are delayed. If the event log has a retention period, lagging consumers may lose events permanently when older segments are deleted.
382
+
383
+ **Mitigation:** Monitor consumer lag per consumer group per partition. Alert when lag exceeds a threshold. Scale consumers horizontally (add more instances). Increase partitions to allow more parallelism. Optimize consumer processing time. Consider dedicated consumer groups for slow processors so they do not block fast consumers.
384
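The lag arithmetic in this example can be captured in a few helpers. They are hypothetical utilities, not a monitoring tool's API; lag per partition is computed as the log-end offset minus the group's committed offset.

```typescript
interface PartitionOffsets { partition: number; logEndOffset: number; committedOffset: number }

// Lag per partition = latest offset in the log minus the group's committed offset.
function lagByPartition(offsets: PartitionOffsets[]): Map<number, number> {
  return new Map(
    offsets.map((o): [number, number] => [o.partition, Math.max(0, o.logEndOffset - o.committedOffset)]),
  );
}

// Net change in lag per second, given production rate and per-event processing time.
function lagGrowthPerSecond(producedPerSec: number, msPerEvent: number, consumers = 1): number {
  const consumedPerSec = consumers * (1000 / msPerEvent);
  return producedPerSec - consumedPerSec;
}

// Consumer instances needed for consumption to keep pace with production.
function consumersNeeded(producedPerSec: number, msPerEvent: number): number {
  return Math.ceil(producedPerSec / (1000 / msPerEvent));
}
```

For the scenario above, `lagGrowthPerSecond(1000, 500)` is 998, or 3,592,800 events after an hour. `consumersNeeded(1000, 500)` is 500, and since a Kafka consumer group cannot usefully run more instances than the topic has partitions, that would also mean 500 partitions, which is why optimizing per-event processing time is usually the first lever.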
+
385
+ ### Message Loss
386
+
387
+ Events disappear between production and consumption.
388
+
389
+ **How it happens:** Producer sends an event but does not wait for broker acknowledgment (fire-and-forget, acks=0 in Kafka). The broker crashes before flushing to disk. Or: The consumer commits the offset before processing the message, then crashes. The message is marked as consumed but was never processed.
390
+
391
+ **Mitigation:** Configure producers for `acks=all` (Kafka) or publisher confirms (RabbitMQ). Enable broker replication (a replication factor of at least 3, with `min.insync.replicas=2` so that `acks=all` actually requires more than one copy). Process messages before committing offsets. Use the transactional outbox pattern: write the event to a database table in the same transaction as the business state change, then have a separate process publish from the outbox to the broker.
392
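The transactional outbox can be sketched with an in-memory stand-in for the database. `FakeDb`, `placeOrder`, and `relayOutbox` are hypothetical: the "transaction" mutates a copy and swaps it in only if the callback completes, so the order row and its outbox row commit together or not at all. The relay publishes unpublished rows in order and then marks them, which makes delivery at-least-once.

```typescript
interface OutboxRow { id: number; topic: string; key: string; payload: string; publishedAt: number | null }
interface DbState { orders: Map<string, string>; outbox: OutboxRow[] }

// Toy database: a transaction works on a draft that replaces the live state
// only if the callback returns without throwing.
class FakeDb {
  state: DbState = { orders: new Map(), outbox: [] };
  private nextRowId = 1;

  transaction(fn: (tx: DbState, newRowId: () => number) => void): void {
    const draft: DbState = {
      orders: new Map(this.state.orders),
      outbox: this.state.outbox.map(row => ({ ...row })),
    };
    let id = this.nextRowId;
    fn(draft, () => id++);
    this.state = draft;
    this.nextRowId = id;
  }
}

// Business write and event write share one transaction.
function placeOrder(db: FakeDb, orderId: string, fail = false): void {
  db.transaction((tx, newRowId) => {
    tx.orders.set(orderId, 'PLACED');
    tx.outbox.push({ id: newRowId(), topic: 'orders.placed', key: orderId, payload: JSON.stringify({ orderId }), publishedAt: null });
    if (fail) throw new Error('simulated failure before commit');
  });
}

// Relay: publish pending rows in id order, then mark each one as published.
// A crash between publish and mark re-sends that row, so consumers must be idempotent.
async function relayOutbox(db: FakeDb, publish: (row: OutboxRow) => Promise<void>, now = Date.now()): Promise<number> {
  const pending = db.state.outbox.filter(r => r.publishedAt === null).sort((a, b) => a.id - b.id);
  let sent = 0;
  for (const row of pending) {
    await publish(row);
    db.transaction(tx => {
      const stored = tx.outbox.find(r => r.id === row.id);
      if (stored) stored.publishedAt = now;
    });
    sent++;
  }
  return sent;
}
```

If the order write fails, no outbox row exists, so no event can ever describe an order that was never saved.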
+
393
+ ### Ordering Violations
394
+
395
+ Events arrive at a consumer in a different order than they were produced.
396
+
397
+ **How it happens:** `OrderPlaced` and `OrderCancelled` are produced in sequence but land in different partitions (because the partition key was not set consistently). A consumer processes `OrderCancelled` first, does nothing (no order to cancel), then processes `OrderPlaced` and creates an order that should have been cancelled.
398
+
399
+ **Mitigation:** Use consistent partition keys. For order events, partition by orderId so all events for the same order arrive in order. Accept that cross-entity ordering is not guaranteed and design for it. Use sequence numbers in events so consumers can detect and handle out-of-order delivery.
400
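Both mitigations can be sketched briefly. `partitionFor` shows the "same key, same partition" property using FNV-1a purely for illustration (Kafka's default partitioner uses murmur2, so these partition numbers will not match a real cluster). `SequenceTracker` is a hypothetical per-entity sequence check that classifies each incoming event.

```typescript
// Same key always maps to the same partition (FNV-1a hash, illustration only).
function partitionFor(key: string, numPartitions: number): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < key.length; i++) {
    hash ^= key.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash % numPartitions;
}

type SequenceVerdict = 'apply' | 'duplicate' | 'gap';

// Tracks the last applied sequence number per entity (e.g. per orderId).
class SequenceTracker {
  private readonly last = new Map<string, number>();

  check(entityId: string, seq: number): SequenceVerdict {
    const prev = this.last.get(entityId) ?? 0;
    if (seq <= prev) return 'duplicate'; // already applied, or stale
    if (seq > prev + 1) return 'gap'; // an earlier event has not arrived: park or retry
    this.last.set(entityId, seq);
    return 'apply';
  }
}
```

In the scenario above, `OrderCancelled` (sequence 2) arriving before `OrderPlaced` (sequence 1) is reported as a `gap` instead of being silently treated as a no-op, so the consumer can park it until sequence 1 has been applied.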
+
401
+ ### Poison Pills
402
+
403
+ A malformed event that crashes the consumer on every processing attempt, blocking the entire partition.
404
+
405
+ **How it happens:** A producer has a serialization bug and emits an event with invalid JSON. Or a valid event triggers a bug in the consumer's business logic (division by zero, null reference). The consumer crashes, restarts, reads the same event, crashes again. The partition is stuck.
406
+
407
+ **Mitigation:** Wrap consumer processing in try-catch. After N failures (configurable), move the event to a dead letter queue and advance the offset. Log the full event payload and error details. Alert on DLQ additions. Build tooling to inspect and replay DLQ events after the bug is fixed.
408
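One refinement of this mitigation is to classify failures: some errors will fail identically on every attempt, so spending the retry budget on them only delays quarantine. The sketch below assumes JSON payloads and a hypothetical `NonRetryableError` that business logic throws for deterministic failures; `handleMessage` only decides the outcome and leaves the actual commit, retry, or DLQ publish to the caller.

```typescript
// Thrown by business logic when retrying cannot succeed (e.g. a validation rule).
class NonRetryableError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'NonRetryableError';
  }
}

type Outcome =
  | { action: 'commit' }
  | { action: 'retry'; nextAttempt: number }
  | { action: 'dlq'; reason: string };

// `attempt` is zero-based; `maxAttempts` is the total number of tries allowed.
function handleMessage(
  raw: string,
  attempt: number,
  maxAttempts: number,
  apply: (event: unknown) => void,
): Outcome {
  let event: unknown;
  try {
    event = JSON.parse(raw);
  } catch (e) {
    // Malformed bytes will never parse: quarantine immediately.
    return { action: 'dlq', reason: `unparseable payload: ${(e as Error).message}` };
  }
  try {
    apply(event);
    return { action: 'commit' };
  } catch (e) {
    const err = e instanceof Error ? e : new Error(String(e));
    if (err.name === 'NonRetryableError' || attempt + 1 >= maxAttempts) {
      return { action: 'dlq', reason: err.message };
    }
    return { action: 'retry', nextAttempt: attempt + 1 };
  }
}
```

Transient failures such as a database timeout still get the full retry budget, while invalid JSON or a deterministic validation failure leaves the partition on the first attempt.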
+
409
+ ### Split-Brain During Rebalancing
410
+
411
+ When a consumer group rebalances (a consumer joins or leaves), there is a brief period where partition ownership is ambiguous. Two consumers might process the same partition simultaneously.
412
+
413
+ **Mitigation:** Use cooperative rebalancing (Kafka 2.4+). Design consumers for idempotency so double-processing is harmless. Use session timeouts and heartbeat intervals tuned to your workload.
414
+
415
+ ---
416
+
417
+ ## Technology Landscape
418
+
419
+ ### Apache Kafka
420
+
421
+ The dominant event log platform. Append-only, partitioned, replicated log with configurable retention. Consumer groups provide scalable consumption. Kafka Streams and ksqlDB enable stream processing without a separate framework. Kafka Connect provides connectors for databases, cloud services, and file systems.
422
+
423
+ - **Strengths:** Throughput (millions of messages per second), durability, replay, ecosystem maturity, exactly-once semantics within Kafka.
424
+ - **Weaknesses:** Operational complexity (ZooKeeper was removed in Kafka 4.0 in favor of KRaft, but older clusters still depend on it), JVM memory tuning, partition management, steep learning curve.
425
+ - **Best for:** High-throughput event backbone, audit trails, stream processing, data pipeline integration.
426
+
427
+ ### RabbitMQ
428
+
429
+ A traditional message broker implementing AMQP. Supports complex routing (direct, topic, fanout, headers exchanges), message acknowledgment, and TTL.
430
+
431
+ - **Strengths:** Flexible routing, mature, well-understood, supports multiple protocols (AMQP, MQTT, STOMP), lightweight for small deployments.
432
+ - **Weaknesses:** Messages in classic and quorum queues are deleted once acknowledged (no replay); RabbitMQ Streams (3.9+) add log-style replay. Performance degrades with large queue depths. Not designed for high-throughput streaming.
433
+ - **Best for:** Task queues, command distribution, RPC-style async, workloads that need complex routing logic.
434
+
435
+ ### AWS SNS/SQS/EventBridge
436
+
437
+ Amazon's managed messaging services. SNS provides pub/sub fan-out. SQS provides reliable queues with at-least-once delivery. EventBridge provides content-based routing with rules.
438
+
439
+ - **Strengths:** Fully managed (no cluster operations), native AWS integration, pay-per-use pricing, FIFO variants for ordering.
440
+ - **Weaknesses:** Vendor lock-in, limited throughput compared to Kafka, no replay capability (except EventBridge Archive), SQS messages must be polled.
441
+ - **Best for:** AWS-native applications, serverless architectures (Lambda integration), moderate throughput workloads.
442
+
443
+ ### Google Cloud Pub/Sub
444
+
445
+ Managed pub/sub with automatic scaling, push and pull delivery, message replay (seek to timestamp), and exactly-once delivery mode. Best for GCP-native applications and global event distribution. Trade-off: vendor lock-in, ordering keys instead of partitions.
446
+
447
+ ### NATS / NATS JetStream
448
+
449
+ Lightweight, high-performance messaging. NATS Core provides at-most-once pub/sub; JetStream adds persistence, replay, and consumer groups. Extremely fast (lower latency than Kafka for small messages), single-binary deployment. Best for edge computing, IoT, and systems where Kafka is overkill. Trade-off: smaller ecosystem, less enterprise adoption.
450
+
451
+ ### Apache Pulsar
452
+
453
+ Combines log-based storage (like Kafka) with queue semantics. Separates compute from storage (BookKeeper). Supports multi-tenancy, geo-replication, and tiered storage. Best for multi-tenant platforms and geo-distributed deployments. Trade-off: smaller community, more complex architecture.
454
+
455
+ ### Temporal / Step Functions / Conductor
456
+
457
+ Workflow orchestration engines, not message brokers. They provide durable execution of multi-step workflows with built-in retry, compensation, and visibility.
458
+
459
+ - **Temporal:** Open-source, code-first workflow definitions, supports long-running workflows (months), automatic state management and retry.
460
+ - **AWS Step Functions:** Managed orchestration, visual workflow designer, native integration with AWS services.
461
+ - **Netflix Conductor:** Open-source, JSON-based workflow definitions, built for Netflix's microservices scale.
462
+
463
+ - **Best for:** Orchestrated sagas, multi-step business processes where you need visibility into workflow state, compensation logic, and reliable execution.
464
+
465
+ ### Schema Registry Options
466
+
467
+ - **Confluent Schema Registry:** The standard for Kafka. Supports Avro, Protobuf, JSON Schema. Compatibility enforcement.
468
+ - **AWS Glue Schema Registry:** Managed, integrates with AWS services. Supports Avro, JSON Schema, and Protobuf.
469
+ - **Apicurio Registry:** Open-source, supports multiple serialization formats, integrates with Kafka and other brokers.
470
+
471
+ ---
472
+
473
+ ## Decision Tree
474
+
475
+ Use concrete thresholds, not vibes.
476
+
477
+ ```
478
+ Q1: Does the producer need an immediate response from the consumer?
479
+ ├── YES → Use synchronous request-response (REST/gRPC). Stop here.
480
+ └── NO → Continue.
481
+
482
+ Q2: How many independent consumers need this data?
483
+ ├── 1 consumer → Use a direct API call or simple task queue. Stop here.
484
+ ├── 2-3 consumers → Consider EDA. Evaluate whether the coupling cost
485
+ │ justifies the operational overhead.
486
+ └── 4+ consumers → Use EDA. The fan-out value is clear.
487
+
488
+ Q3: What throughput do you need?
489
+ ├── < 100 msgs/sec → SQS, RabbitMQ, or even a database-backed queue.
490
+ ├── 100 - 10,000 msgs/sec → RabbitMQ, SQS, NATS, Kafka (single cluster).
491
+ ├── 10,000 - 1,000,000 msgs/sec → Kafka, Pulsar, NATS JetStream.
492
+ └── > 1,000,000 msgs/sec → Kafka (multi-cluster), custom infrastructure.
493
+
494
+ Q4: Do you need event replay?
495
+ ├── YES → Kafka, Pulsar, or NATS JetStream (log-based).
496
+ │ SQS and classic RabbitMQ queues do not support replay.
497
+ └── NO → Any broker is fine. Choose based on throughput and operational
498
+ preference.
499
+
500
+ Q5: Do you need strict ordering?
501
+ ├── GLOBAL ordering → Reconsider EDA. Single partition = no parallelism.
502
+ │ Consider a synchronous pipeline instead.
503
+ ├── Per-entity ordering → Use partition keys (Kafka) or message group
504
+ │ IDs (SQS FIFO). Standard EDA.
505
+ └── No ordering requirement → Any broker with any partitioning.
506
+
507
+ Q6: Choreography or Orchestration?
508
+ ├── Simple reactions (notifications, analytics, audit) → Choreography.
509
+ ├── Multi-step business workflow with compensation → Orchestration
510
+ │ (Temporal, Step Functions, Conductor).
511
+ └── Both → Choreography for side effects, orchestration for the
512
+ critical path. This is the most common production pattern.
513
+
514
+ Q7: What is your team's distributed systems experience?
515
+ ├── Low → Start with managed services (SQS/SNS, Cloud Pub/Sub).
516
+ │ Avoid self-managed Kafka until you have ops maturity.
517
+ ├── Medium → Managed Kafka (Confluent Cloud, Amazon MSK) with
518
+ │ schema registry and monitoring.
519
+ └── High → Self-managed Kafka, custom tooling, Flink/Streams for
520
+ stream processing.
521
+ ```
522
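Questions Q1 to Q5 of the tree can also be encoded as a function, which makes the thresholds reviewable and testable. This is a hypothetical sketch: how the range boundaries are handled (for example, whether exactly 10,000 msgs/sec falls in the lower tier) is a choice made here, since the tree does not specify it. The Q4 step keeps only the log-based options the tree names.

```typescript
interface Requirements {
  needsImmediateResponse: boolean;
  consumerCount: number;
  peakMsgsPerSec: number;
  needsReplay: boolean;
  ordering: 'global' | 'per-entity' | 'none';
}

interface Recommendation { style: string; candidates: string[]; notes: string[] }

function recommend(r: Requirements): Recommendation {
  // Q1: immediate response needed
  if (r.needsImmediateResponse) return { style: 'sync request-response (REST/gRPC)', candidates: [], notes: [] };
  // Q2: number of independent consumers
  if (r.consumerCount <= 1) return { style: 'direct API call or simple task queue', candidates: [], notes: [] };
  const notes: string[] = [];
  if (r.consumerCount <= 3) notes.push('2-3 consumers: confirm decoupling justifies the operational overhead');
  // Q3: throughput tiers
  let candidates: string[];
  if (r.peakMsgsPerSec < 100) candidates = ['SQS', 'RabbitMQ', 'database-backed queue'];
  else if (r.peakMsgsPerSec <= 10_000) candidates = ['RabbitMQ', 'SQS', 'NATS', 'Kafka'];
  else if (r.peakMsgsPerSec <= 1_000_000) candidates = ['Kafka', 'Pulsar', 'NATS JetStream'];
  else candidates = ['Kafka (multi-cluster)'];
  // Q4: replay requires a log-based broker
  if (r.needsReplay) {
    const logBased = new Set(['Kafka', 'Kafka (multi-cluster)', 'Pulsar', 'NATS JetStream']);
    candidates = candidates.filter(c => logBased.has(c));
    if (candidates.length === 0) candidates = ['Kafka', 'Pulsar', 'NATS JetStream'];
  }
  // Q5: ordering
  if (r.ordering === 'global') notes.push('global ordering means one partition and no parallelism: reconsider EDA');
  if (r.ordering === 'per-entity') notes.push('use partition keys (Kafka) or message group IDs (SQS FIFO)');
  return { style: 'event-driven', candidates, notes };
}
```

Q6 and Q7 depend on workflow shape and team experience rather than measurable inputs, so they are left to human judgment here.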
+
523
+ ---
524
+
525
+ ## Implementation Sketch
526
+
527
+ ### Event Schema Definition (Avro)
528
+
529
+ ```avro
530
+ {
531
+ "type": "record",
532
+ "name": "OrderPlaced",
533
+ "namespace": "com.example.orders.events",
534
+ "doc": "Emitted when a customer successfully places an order.",
535
+ "fields": [
536
+ { "name": "eventId", "type": "string", "doc": "Unique event identifier (UUID v4)" },
537
+ { "name": "eventVersion", "type": "string", "default": "1.0", "doc": "Schema version" },
538
+ { "name": "timestamp", "type": { "type": "long", "logicalType": "timestamp-millis" } },
539
+ { "name": "correlationId", "type": "string", "doc": "Shared across all events in a business transaction" },
540
+ { "name": "causationId", "type": ["null", "string"], "default": null, "doc": "EventId of the causing event" },
541
+ { "name": "orderId", "type": "string" },
542
+ { "name": "customerId", "type": "string" },
543
+ { "name": "items", "type": {
544
+ "type": "array",
545
+ "items": {
546
+ "type": "record",
547
+ "name": "OrderItem",
548
+ "fields": [
549
+ { "name": "sku", "type": "string" },
550
+ { "name": "quantity", "type": "int" },
551
+ { "name": "unitPriceCents", "type": "long", "doc": "Price in cents to avoid floating point" }
552
+ ]
553
+ }
554
+ }},
555
+ { "name": "totalAmountCents", "type": "long" },
556
+ { "name": "currency", "type": "string", "default": "USD" }
557
+ ]
558
+ }
559
+ ```
560
+
561
+ ### Producer (TypeScript / Node.js with Kafka)
562
+
563
+ ```typescript
564
+ import { Kafka, Partitioners } from 'kafkajs';
565
+ import { SchemaRegistry } from '@kafkajs/confluent-schema-registry';
566
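+ import { randomUUID } from 'crypto';
+
+ // Hypothetical domain type assumed by this sketch.
+ interface Order { id: string; customerId: string; currency: string; total: number; items: { sku: string; quantity: number; unitPrice: number }[] }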
567
+
568
+ const kafka = new Kafka({
569
+ clientId: 'order-service',
570
+ brokers: ['kafka-1:9092', 'kafka-2:9092', 'kafka-3:9092'],
571
+ });
572
+
573
+ const producer = kafka.producer({
574
+ createPartitioner: Partitioners.DefaultPartitioner,
575
+ idempotent: true, // Prevents duplicate messages from producer retries
576
+ maxInFlightRequests: 1, // Conservative: KafkaJS's transactional example pairs idempotent: true with 1
577
+ retry: { retries: 3 },
578
+ });
579
+
580
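+ const registry = new SchemaRegistry({ host: 'http://schema-registry:8081' });
+
+ // Connect once at service startup, before the first publish.
+ export async function startProducer(): Promise<void> {
+ await producer.connect();
+ }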
581
+
582
+ async function publishOrderPlaced(order: Order, correlationId: string): Promise<void> {
583
+ const event = {
584
+ eventId: randomUUID(),
585
+ eventVersion: '1.0',
586
+ timestamp: Date.now(),
587
+ correlationId,
588
+ causationId: null,
589
+ orderId: order.id,
590
+ customerId: order.customerId,
591
+ items: order.items.map(item => ({
592
+ sku: item.sku,
593
+ quantity: item.quantity,
594
+ unitPriceCents: Math.round(item.unitPrice * 100),
595
+ })),
596
+ totalAmountCents: Math.round(order.total * 100),
597
+ currency: order.currency,
598
+ };
599
+
600
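+ // Resolve the schema registered under the default subject name (<topic>-value); cache it in production.
+ const schemaId = await registry.getLatestSchemaId('orders.placed-value');
+ const encodedValue = await registry.encode(schemaId, event);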
601
+
602
+ await producer.send({
603
+ topic: 'orders.placed',
604
+ messages: [{
605
+ key: order.id, // Partition by orderId for ordering
606
+ value: encodedValue,
607
+ headers: {
608
+ 'event-id': event.eventId, // Read by consumers for logging and DLQ records
+ 'correlation-id': correlationId,
609
+ 'event-type': 'order.placed',
610
+ 'source': 'order-service',
611
+ },
612
+ }],
613
+ acks: -1, // Wait for all replicas (acks=all)
614
+ });
615
+ }
616
+ ```
617
+
618
+ ### Consumer with Idempotency and DLQ
619
+
620
+ ```typescript
621
+ import { Kafka, EachMessagePayload } from 'kafkajs';
622
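+ import { SchemaRegistry } from '@kafkajs/confluent-schema-registry';
+
+ // idempotencyStore and inventoryService are application-specific dependencies assumed to exist.
+ const kafka = new Kafka({ clientId: 'inventory-service', brokers: ['kafka-1:9092', 'kafka-2:9092', 'kafka-3:9092'] });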
623
+
624
+ const consumer = kafka.consumer({
625
+ groupId: 'inventory-service',
626
+ sessionTimeout: 30000,
627
+ heartbeatInterval: 3000,
628
+ rebalanceTimeout: 60000, // KafkaJS has no maxPollIntervalMs option (that is a Java-client setting)
629
+ });
630
+
631
+ const registry = new SchemaRegistry({ host: 'http://schema-registry:8081' });
632
+ const dlqProducer = kafka.producer();
633
+
634
+ const MAX_RETRIES = 3;
635
+
636
+ async function startConsumer(): Promise<void> {
637
+ await consumer.connect();
+ await dlqProducer.connect();
638
+ await consumer.subscribe({ topic: 'orders.placed', fromBeginning: false });
639
+
640
+ await consumer.run({
641
+ eachMessage: async (payload: EachMessagePayload) => {
642
+ const { topic, partition, message } = payload;
643
+ const eventId = message.headers?.['event-id']?.toString();
644
+ const correlationId = message.headers?.['correlation-id']?.toString();
645
+
646
+ try {
647
+ // Decode using schema registry
648
+ const event = await registry.decode(message.value!);
649
+
650
+ // Idempotency check: skip if already processed
651
+ const alreadyProcessed = await idempotencyStore.exists(event.eventId);
652
+ if (alreadyProcessed) {
653
+ console.log(`Skipping duplicate event ${event.eventId}`);
654
+ return; // Offset will be committed automatically
655
+ }
656
+
657
+ // Business logic: reserve inventory
658
+ await inventoryService.reserveStock(event.orderId, event.items);
659
+
660
+ // Mark as processed (in same transaction as business logic if possible)
661
+ await idempotencyStore.markProcessed(event.eventId);
662
+
663
+ console.log(`Processed order ${event.orderId} [correlation: ${correlationId}]`);
664
+
665
+ } catch (error) {
666
+ const retryCount = parseInt(message.headers?.['retry-count']?.toString() ?? '0', 10);
667
+
668
+ if (retryCount >= MAX_RETRIES) {
669
+ // Move to Dead Letter Queue
670
+ await dlqProducer.send({
671
+ topic: 'orders.placed.dlq',
672
+ messages: [{
673
+ key: message.key,
674
+ value: message.value,
675
+ headers: {
676
+ ...message.headers,
677
+ 'original-topic': topic,
678
+ 'original-partition': String(partition),
679
+ 'error-message': String(error),
680
+ 'failed-at': new Date().toISOString(),
681
+ 'consumer-group': 'inventory-service',
682
+ },
683
+ }],
684
+ });
685
+ console.error(`Moved to DLQ after ${MAX_RETRIES} retries: ${eventId}`);
686
+ } else {
687
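+ // Retry: re-publish to a retry topic. The delay is only recorded in the retry-after header;
+ // a separate retry consumer (not shown) must wait until then before reprocessing.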
688
+ await dlqProducer.send({
689
+ topic: 'orders.placed.retry',
690
+ messages: [{
691
+ key: message.key,
692
+ value: message.value,
693
+ headers: {
694
+ ...message.headers,
695
+ 'retry-count': String(retryCount + 1),
696
+ 'retry-after': String(Date.now() + (1000 * Math.pow(2, retryCount))),
697
+ },
698
+ }],
699
+ });
700
+ }
701
+ }
702
+ },
703
+ });
704
+ }
705
+ ```
706
+
707
+ ### Kafka Topic Design
708
+
709
+ ```
710
+ # Topic naming convention: <domain>.<event-type>, plus .retry / .dlq suffixes
711
+ # Use dots for logical separation, hyphens within names
712
+
713
+ orders.placed # New orders (partitioned by orderId)
714
+ orders.cancelled # Cancelled orders
715
+ orders.placed.retry # Retry topic with exponential backoff
716
+ orders.placed.dlq # Dead letter queue for orders.placed
717
+
718
+ payments.charged # Successful payments
719
+ payments.failed # Failed payment attempts
720
+ payments.refunded # Refunds
721
+
722
+ inventory.reserved # Stock reserved for an order
723
+ inventory.released # Stock released (order cancelled/failed)
724
+
725
+ notifications.email # Email send requests (command-style)
726
+ notifications.sms # SMS send requests
727
+
728
+ # Partition strategy:
729
+ # - orders.*: partition by orderId (ensures per-order ordering)
730
+ # - payments.*: partition by orderId (correlate with order events)
731
+ # - inventory.*: partition by warehouseId (per-warehouse processing)
732
+ # - notifications.*: partition by customerId (per-customer ordering)
733
+
734
+ # Retention:
735
+ # - Business events (orders, payments): 30 days (or infinite for audit)
736
+ # - Retry topics: 7 days
737
+ # - DLQ topics: 90 days (need time to investigate and replay)
738
+ # - Notifications: 3 days (ephemeral, no replay value)
739
+
740
+ # Partition count:
741
+ # - Start with 12 partitions for moderate throughput
742
+ # - Scale to 36-72 for high throughput
743
+ # - Never reduce partition count (Kafka does not support it)
744
+ # - Rule of thumb: partitions >= expected peak consumer instances
745
+ ```
746
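The partition-count rules of thumb can be turned into a small sizing helper. This is a hypothetical sketch: `perConsumerMsgsPerSec` must come from measuring one consumer instance, and the `headroom` factor (default 1.5) is an assumption reflecting the fact that partition counts can only grow.

```typescript
interface SizingInput {
  targetMsgsPerSec: number;
  perConsumerMsgsPerSec: number; // measured throughput of one consumer instance
  peakConsumers: number; // largest expected consumer group size
  headroom?: number; // growth factor, since partitions cannot be reduced later
}

// Enough partitions for throughput and for the largest consumer group
// (consumers beyond the partition count would sit idle).
function partitionCount(input: SizingInput): number {
  const { targetMsgsPerSec, perConsumerMsgsPerSec, peakConsumers, headroom = 1.5 } = input;
  const forThroughput = Math.ceil((targetMsgsPerSec / perConsumerMsgsPerSec) * headroom);
  return Math.max(forThroughput, peakConsumers, 1);
}
```

For example, 6,000 msgs/sec at 1,000 msgs/sec per consumer yields 9 partitions, while a light topic with 12 expected consumer instances is sized by the consumer count instead. Getting this right up front matters because adding partitions later remaps keys and breaks per-key ordering across the change.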
+
747
+ ### Correlation ID Propagation
748
+
749
+ ```typescript
750
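+ import { AsyncLocalStorage } from 'async_hooks';
+ import { randomUUID } from 'crypto';
+ import type { Request, Response, NextFunction } from 'express';
+
+ const asyncLocalStorage = new AsyncLocalStorage<{ correlationId: string }>();
+
+ // Middleware: extract or create correlation ID, attach to async context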
751
+ function correlationMiddleware(req: Request, res: Response, next: NextFunction) {
752
+ const correlationId = req.headers['x-correlation-id'] as string || randomUUID();
753
+ res.setHeader('x-correlation-id', correlationId);
754
+ asyncLocalStorage.run({ correlationId }, () => next());
755
+ }
756
+
757
+ // Consumer: propagate correlation ID to all downstream events (publishEvent and OrderPlacedEvent are assumed application helpers)
758
+ async function handleOrderPlaced(event: OrderPlacedEvent): Promise<void> {
759
+ await publishEvent('payments.requested', {
760
+ correlationId: event.correlationId, // Same correlation ID
761
+ causationId: event.eventId, // This event caused the payment request
762
+ orderId: event.orderId,
763
+ amount: event.totalAmountCents,
764
+ });
765
+ }
766
+ ```
767
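Once every event carries `correlationId` and `causationId`, the hard debugging question "where did this event come from?" becomes a lookup. `causalChain` is a hypothetical helper that, given events collected from logs or topics, follows `causationId` links back to the root; a `seen` set guards against malformed cyclic data.

```typescript
interface TracedEvent { eventId: string; type: string; correlationId: string; causationId: string | null }

// Returns event types from the root cause down to `eventId`.
function causalChain(events: TracedEvent[], eventId: string): string[] {
  const byId = new Map(events.map((e): [string, TracedEvent] => [e.eventId, e]));
  const chain: string[] = [];
  const seen = new Set<string>();
  let current = byId.get(eventId);
  while (current && !seen.has(current.eventId)) {
    seen.add(current.eventId);
    chain.unshift(current.type);
    current = current.causationId ? byId.get(current.causationId) : undefined;
  }
  return chain;
}

// Every event belonging to one business transaction.
function byCorrelation(events: TracedEvent[], correlationId: string): TracedEvent[] {
  return events.filter(e => e.correlationId === correlationId);
}
```

`causationId` gives the exact parent-child path, while `correlationId` groups everything in the transaction, including sibling branches that a single causal chain would not show.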
+
768
+ ---
769
+
770
+ ## Cross-References
771
+
772
+ - **[cqrs-event-sourcing](../cqrs-event-sourcing.md)** — Event Sourcing uses events as the system of record; CQRS separates read/write models connected by events. Complementary to EDA but orthogonal.
773
+ - **[microservices](../microservices.md)** — EDA is a primary communication pattern for microservices. Most microservice architectures use some form of event-driven communication for inter-service messaging.
774
+ - **[saga-pattern](../../backend/saga-pattern.md)** — Sagas coordinate multi-step distributed transactions using either choreography (event-driven) or orchestration. EDA is the substrate on which choreographed sagas run.
775
+ - **[event-streams-and-queues](../../infrastructure/event-streams-and-queues.md)** — Infrastructure-level details of broker deployment, cluster sizing, replication, and operational runbooks.
776
+ - **[idempotency-and-retry](../../backend/idempotency-and-retry.md)** — Idempotency is mandatory for EDA consumers. Retry strategies, deduplication tables, and exactly-once semantics are covered in depth.
777
+
778
+ ---
779
+
780
+ ## Sources
781
+
782
+ - [Growin: Event Driven Architecture Done Right (2025)](https://www.growin.com/blog/event-driven-architecture-scale-systems-2025/)
783
+ - [Estuary: 10 Event-Driven Architecture Examples](https://estuary.dev/blog/event-driven-architecture-examples/)
784
+ - [Ably: Event-Driven Architecture Challenges](https://ably.com/topic/event-driven-architecture-challenges)
785
+ - [Three Dots Labs: Event-Driven Architecture: The Hard Parts](https://threedots.tech/episode/event-driven-architecture/)
786
+ - [CodeOpinion: Event-Driven Architecture Issues & Challenges](https://codeopinion.com/event-driven-architecture-issues-challenges/)
787
+ - [Wix Engineering: 5 Pitfalls to Avoid](https://medium.com/wix-engineering/event-driven-architecture-5-pitfalls-to-avoid-b3ebf885bdb1)
788
+ - [Camunda: Orchestration vs Choreography](https://camunda.com/blog/2023/02/orchestration-vs-choreography/)
789
+ - [Temporal: Saga Orchestration vs Choreography](https://temporal.io/blog/to-choreograph-or-orchestrate-your-saga-that-is-the-question)
790
+ - [LinkedIn: Open-sourcing Kafka](https://www.linkedin.com/blog/member/archive/open-source-linkedin-kafka)
791
+ - [Quastor: How LinkedIn Uses Event Driven Architectures](https://blog.quastor.org/p/how-linkedin-uses-event-driven-architectures-to-scale)
792
+ - [Uber Blog: Real-Time Exactly-Once Ad Event Processing](https://www.uber.com/blog/real-time-exactly-once-ad-event-processing/)
793
+ - [ByteByteGo: How Uber Manages Petabytes of Real-Time Data](https://blog.bytebytego.com/p/how-uber-manages-petabytes-of-real)
794
+ - [AWS: SNS, SQS, or EventBridge Decision Guide](https://docs.aws.amazon.com/decision-guides/latest/sns-or-sqs-or-eventbridge/sns-or-sqs-or-eventbridge.html)
795
+ - [Confluent Schema Registry and Avro](https://markaicode.com/kafka-schema-registry-avro/)
796
+ - [The Burning Monk: Event Versioning Strategies](https://theburningmonk.com/2025/04/event-versioning-strategies-for-event-driven-architectures/)
797
+ - [Equal Experts: EDA — The Good, The Bad, and The Ugly](https://www.equalexperts.com/blog/tech-focus/event-driven-architecture-the-good-the-bad-and-the-ugly/)