@wazir-dev/cli 1.0.0
- package/AGENTS.md +111 -0
- package/CHANGELOG.md +14 -0
- package/CONTRIBUTING.md +101 -0
- package/LICENSE +21 -0
- package/README.md +314 -0
- package/assets/composition-engine.mmd +34 -0
- package/assets/demo-script.sh +17 -0
- package/assets/logo-dark.svg +14 -0
- package/assets/logo.svg +14 -0
- package/assets/pipeline.mmd +39 -0
- package/assets/record-demo.sh +51 -0
- package/docs/README.md +51 -0
- package/docs/adapters/context-mode.md +60 -0
- package/docs/concepts/architecture.md +87 -0
- package/docs/concepts/artifact-model.md +60 -0
- package/docs/concepts/composition-engine.md +36 -0
- package/docs/concepts/indexing-and-recall.md +160 -0
- package/docs/concepts/observability.md +41 -0
- package/docs/concepts/roles-and-workflows.md +59 -0
- package/docs/concepts/terminology-policy.md +27 -0
- package/docs/getting-started/01-installation.md +78 -0
- package/docs/getting-started/02-first-run.md +102 -0
- package/docs/getting-started/03-adding-to-project.md +15 -0
- package/docs/getting-started/04-host-setup.md +15 -0
- package/docs/guides/ci-integration.md +15 -0
- package/docs/guides/creating-skills.md +15 -0
- package/docs/guides/expertise-module-authoring.md +15 -0
- package/docs/guides/hook-development.md +15 -0
- package/docs/guides/memory-and-learnings.md +34 -0
- package/docs/guides/multi-host-export.md +15 -0
- package/docs/guides/troubleshooting.md +101 -0
- package/docs/guides/writing-custom-roles.md +15 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-design.md +592 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +598 -0
- package/docs/plans/2026-03-15-docs-enforcement-plan.md +238 -0
- package/docs/readmes/INDEX.md +99 -0
- package/docs/readmes/features/expertise/README.md +171 -0
- package/docs/readmes/features/exports/README.md +222 -0
- package/docs/readmes/features/hooks/README.md +103 -0
- package/docs/readmes/features/hooks/loop-cap-guard.md +133 -0
- package/docs/readmes/features/hooks/post-tool-capture.md +121 -0
- package/docs/readmes/features/hooks/post-tool-lint.md +130 -0
- package/docs/readmes/features/hooks/pre-compact-summary.md +122 -0
- package/docs/readmes/features/hooks/pre-tool-capture-route.md +100 -0
- package/docs/readmes/features/hooks/protected-path-write-guard.md +128 -0
- package/docs/readmes/features/hooks/session-start.md +119 -0
- package/docs/readmes/features/hooks/stop-handoff-harvest.md +125 -0
- package/docs/readmes/features/roles/README.md +157 -0
- package/docs/readmes/features/roles/clarifier.md +152 -0
- package/docs/readmes/features/roles/content-author.md +190 -0
- package/docs/readmes/features/roles/designer.md +193 -0
- package/docs/readmes/features/roles/executor.md +184 -0
- package/docs/readmes/features/roles/learner.md +210 -0
- package/docs/readmes/features/roles/planner.md +182 -0
- package/docs/readmes/features/roles/researcher.md +164 -0
- package/docs/readmes/features/roles/reviewer.md +184 -0
- package/docs/readmes/features/roles/specifier.md +162 -0
- package/docs/readmes/features/roles/verifier.md +215 -0
- package/docs/readmes/features/schemas/README.md +178 -0
- package/docs/readmes/features/skills/README.md +63 -0
- package/docs/readmes/features/skills/brainstorming.md +96 -0
- package/docs/readmes/features/skills/debugging.md +148 -0
- package/docs/readmes/features/skills/design.md +120 -0
- package/docs/readmes/features/skills/prepare-next.md +109 -0
- package/docs/readmes/features/skills/run-audit.md +159 -0
- package/docs/readmes/features/skills/scan-project.md +109 -0
- package/docs/readmes/features/skills/self-audit.md +176 -0
- package/docs/readmes/features/skills/tdd.md +137 -0
- package/docs/readmes/features/skills/using-skills.md +92 -0
- package/docs/readmes/features/skills/verification.md +120 -0
- package/docs/readmes/features/skills/writing-plans.md +104 -0
- package/docs/readmes/features/tooling/README.md +320 -0
- package/docs/readmes/features/workflows/README.md +186 -0
- package/docs/readmes/features/workflows/author.md +181 -0
- package/docs/readmes/features/workflows/clarify.md +154 -0
- package/docs/readmes/features/workflows/design-review.md +171 -0
- package/docs/readmes/features/workflows/design.md +169 -0
- package/docs/readmes/features/workflows/discover.md +162 -0
- package/docs/readmes/features/workflows/execute.md +173 -0
- package/docs/readmes/features/workflows/learn.md +167 -0
- package/docs/readmes/features/workflows/plan-review.md +165 -0
- package/docs/readmes/features/workflows/plan.md +170 -0
- package/docs/readmes/features/workflows/prepare-next.md +167 -0
- package/docs/readmes/features/workflows/review.md +169 -0
- package/docs/readmes/features/workflows/run-audit.md +191 -0
- package/docs/readmes/features/workflows/spec-challenge.md +159 -0
- package/docs/readmes/features/workflows/specify.md +160 -0
- package/docs/readmes/features/workflows/verify.md +177 -0
- package/docs/readmes/packages/README.md +50 -0
- package/docs/readmes/packages/ajv.md +117 -0
- package/docs/readmes/packages/context-mode.md +118 -0
- package/docs/readmes/packages/gray-matter.md +116 -0
- package/docs/readmes/packages/node-test.md +137 -0
- package/docs/readmes/packages/yaml.md +112 -0
- package/docs/reference/configuration-reference.md +159 -0
- package/docs/reference/expertise-index.md +52 -0
- package/docs/reference/git-flow.md +43 -0
- package/docs/reference/hooks.md +87 -0
- package/docs/reference/host-exports.md +50 -0
- package/docs/reference/launch-checklist.md +172 -0
- package/docs/reference/marketplace-listings.md +76 -0
- package/docs/reference/release-process.md +34 -0
- package/docs/reference/roles-reference.md +77 -0
- package/docs/reference/skills.md +33 -0
- package/docs/reference/templates.md +29 -0
- package/docs/reference/tooling-cli.md +94 -0
- package/docs/truth-claims.yaml +222 -0
- package/expertise/PROGRESS.md +63 -0
- package/expertise/README.md +18 -0
- package/expertise/antipatterns/PROGRESS.md +56 -0
- package/expertise/antipatterns/backend/api-design-antipatterns.md +1271 -0
- package/expertise/antipatterns/backend/auth-antipatterns.md +1195 -0
- package/expertise/antipatterns/backend/caching-antipatterns.md +622 -0
- package/expertise/antipatterns/backend/database-antipatterns.md +1038 -0
- package/expertise/antipatterns/backend/index.md +24 -0
- package/expertise/antipatterns/backend/microservices-antipatterns.md +850 -0
- package/expertise/antipatterns/code/architecture-antipatterns.md +919 -0
- package/expertise/antipatterns/code/async-antipatterns.md +622 -0
- package/expertise/antipatterns/code/code-smells.md +1186 -0
- package/expertise/antipatterns/code/dependency-antipatterns.md +1209 -0
- package/expertise/antipatterns/code/error-handling-antipatterns.md +1360 -0
- package/expertise/antipatterns/code/index.md +27 -0
- package/expertise/antipatterns/code/naming-and-abstraction.md +1118 -0
- package/expertise/antipatterns/code/state-management-antipatterns.md +1076 -0
- package/expertise/antipatterns/code/testing-antipatterns.md +1053 -0
- package/expertise/antipatterns/design/accessibility-antipatterns.md +1136 -0
- package/expertise/antipatterns/design/dark-patterns.md +1121 -0
- package/expertise/antipatterns/design/index.md +22 -0
- package/expertise/antipatterns/design/ui-antipatterns.md +1202 -0
- package/expertise/antipatterns/design/ux-antipatterns.md +680 -0
- package/expertise/antipatterns/frontend/css-layout-antipatterns.md +691 -0
- package/expertise/antipatterns/frontend/flutter-antipatterns.md +1827 -0
- package/expertise/antipatterns/frontend/index.md +23 -0
- package/expertise/antipatterns/frontend/mobile-antipatterns.md +573 -0
- package/expertise/antipatterns/frontend/react-antipatterns.md +1128 -0
- package/expertise/antipatterns/frontend/spa-antipatterns.md +1235 -0
- package/expertise/antipatterns/index.md +31 -0
- package/expertise/antipatterns/performance/index.md +20 -0
- package/expertise/antipatterns/performance/performance-antipatterns.md +1013 -0
- package/expertise/antipatterns/performance/premature-optimization.md +623 -0
- package/expertise/antipatterns/performance/scaling-antipatterns.md +785 -0
- package/expertise/antipatterns/process/ai-coding-antipatterns.md +853 -0
- package/expertise/antipatterns/process/code-review-antipatterns.md +656 -0
- package/expertise/antipatterns/process/deployment-antipatterns.md +920 -0
- package/expertise/antipatterns/process/index.md +23 -0
- package/expertise/antipatterns/process/technical-debt-antipatterns.md +647 -0
- package/expertise/antipatterns/security/index.md +20 -0
- package/expertise/antipatterns/security/secrets-antipatterns.md +849 -0
- package/expertise/antipatterns/security/security-theater.md +843 -0
- package/expertise/antipatterns/security/vulnerability-patterns.md +801 -0
- package/expertise/architecture/PROGRESS.md +70 -0
- package/expertise/architecture/data/caching-architecture.md +671 -0
- package/expertise/architecture/data/data-consistency.md +574 -0
- package/expertise/architecture/data/data-modeling.md +536 -0
- package/expertise/architecture/data/event-streams-and-queues.md +634 -0
- package/expertise/architecture/data/index.md +25 -0
- package/expertise/architecture/data/search-architecture.md +663 -0
- package/expertise/architecture/data/sql-vs-nosql.md +708 -0
- package/expertise/architecture/decisions/architecture-decision-records.md +640 -0
- package/expertise/architecture/decisions/build-vs-buy.md +616 -0
- package/expertise/architecture/decisions/index.md +23 -0
- package/expertise/architecture/decisions/monolith-to-microservices.md +790 -0
- package/expertise/architecture/decisions/technology-selection.md +616 -0
- package/expertise/architecture/distributed/cap-theorem-and-tradeoffs.md +800 -0
- package/expertise/architecture/distributed/circuit-breaker-bulkhead.md +741 -0
- package/expertise/architecture/distributed/consensus-and-coordination.md +796 -0
- package/expertise/architecture/distributed/distributed-systems-fundamentals.md +564 -0
- package/expertise/architecture/distributed/idempotency-and-retry.md +796 -0
- package/expertise/architecture/distributed/index.md +25 -0
- package/expertise/architecture/distributed/saga-pattern.md +797 -0
- package/expertise/architecture/foundations/architectural-thinking.md +460 -0
- package/expertise/architecture/foundations/coupling-and-cohesion.md +770 -0
- package/expertise/architecture/foundations/design-principles-solid.md +649 -0
- package/expertise/architecture/foundations/domain-driven-design.md +719 -0
- package/expertise/architecture/foundations/index.md +25 -0
- package/expertise/architecture/foundations/separation-of-concerns.md +472 -0
- package/expertise/architecture/foundations/twelve-factor-app.md +797 -0
- package/expertise/architecture/index.md +34 -0
- package/expertise/architecture/integration/api-design-graphql.md +638 -0
- package/expertise/architecture/integration/api-design-grpc.md +804 -0
- package/expertise/architecture/integration/api-design-rest.md +892 -0
- package/expertise/architecture/integration/index.md +25 -0
- package/expertise/architecture/integration/third-party-integration.md +795 -0
- package/expertise/architecture/integration/webhooks-and-callbacks.md +1152 -0
- package/expertise/architecture/integration/websockets-realtime.md +791 -0
- package/expertise/architecture/mobile-architecture/index.md +22 -0
- package/expertise/architecture/mobile-architecture/mobile-app-architecture.md +780 -0
- package/expertise/architecture/mobile-architecture/mobile-backend-for-frontend.md +670 -0
- package/expertise/architecture/mobile-architecture/offline-first.md +719 -0
- package/expertise/architecture/mobile-architecture/push-and-sync.md +782 -0
- package/expertise/architecture/patterns/cqrs-event-sourcing.md +717 -0
- package/expertise/architecture/patterns/event-driven.md +797 -0
- package/expertise/architecture/patterns/hexagonal-clean-architecture.md +870 -0
- package/expertise/architecture/patterns/index.md +27 -0
- package/expertise/architecture/patterns/layered-architecture.md +736 -0
- package/expertise/architecture/patterns/microservices.md +753 -0
- package/expertise/architecture/patterns/modular-monolith.md +692 -0
- package/expertise/architecture/patterns/monolith.md +626 -0
- package/expertise/architecture/patterns/plugin-architecture.md +735 -0
- package/expertise/architecture/patterns/serverless.md +780 -0
- package/expertise/architecture/scaling/database-scaling.md +615 -0
- package/expertise/architecture/scaling/feature-flags-and-rollouts.md +757 -0
- package/expertise/architecture/scaling/horizontal-vs-vertical.md +606 -0
- package/expertise/architecture/scaling/index.md +24 -0
- package/expertise/architecture/scaling/multi-tenancy.md +800 -0
- package/expertise/architecture/scaling/stateless-design.md +787 -0
- package/expertise/backend/embedded-firmware.md +625 -0
- package/expertise/backend/go.md +853 -0
- package/expertise/backend/index.md +24 -0
- package/expertise/backend/java-spring.md +448 -0
- package/expertise/backend/node-typescript.md +625 -0
- package/expertise/backend/python-fastapi.md +724 -0
- package/expertise/backend/rust.md +458 -0
- package/expertise/backend/solidity.md +711 -0
- package/expertise/composition-map.yaml +443 -0
- package/expertise/content/foundations/content-modeling.md +395 -0
- package/expertise/content/foundations/editorial-standards.md +449 -0
- package/expertise/content/foundations/index.md +24 -0
- package/expertise/content/foundations/microcopy.md +455 -0
- package/expertise/content/foundations/terminology-governance.md +509 -0
- package/expertise/content/index.md +34 -0
- package/expertise/content/patterns/accessibility-copy.md +518 -0
- package/expertise/content/patterns/index.md +24 -0
- package/expertise/content/patterns/notification-content.md +433 -0
- package/expertise/content/patterns/sample-content.md +486 -0
- package/expertise/content/patterns/state-copy.md +439 -0
- package/expertise/design/PROGRESS.md +58 -0
- package/expertise/design/disciplines/dark-mode-theming.md +577 -0
- package/expertise/design/disciplines/design-systems.md +595 -0
- package/expertise/design/disciplines/index.md +25 -0
- package/expertise/design/disciplines/information-architecture.md +800 -0
- package/expertise/design/disciplines/interaction-design.md +788 -0
- package/expertise/design/disciplines/responsive-design.md +552 -0
- package/expertise/design/disciplines/usability-testing.md +516 -0
- package/expertise/design/disciplines/user-research.md +792 -0
- package/expertise/design/foundations/accessibility-design.md +796 -0
- package/expertise/design/foundations/color-theory.md +797 -0
- package/expertise/design/foundations/iconography.md +795 -0
- package/expertise/design/foundations/index.md +26 -0
- package/expertise/design/foundations/motion-and-animation.md +653 -0
- package/expertise/design/foundations/rtl-design.md +585 -0
- package/expertise/design/foundations/spacing-and-layout.md +607 -0
- package/expertise/design/foundations/typography.md +800 -0
- package/expertise/design/foundations/visual-hierarchy.md +761 -0
- package/expertise/design/index.md +32 -0
- package/expertise/design/patterns/authentication-flows.md +474 -0
- package/expertise/design/patterns/content-consumption.md +789 -0
- package/expertise/design/patterns/data-display.md +618 -0
- package/expertise/design/patterns/e-commerce.md +1494 -0
- package/expertise/design/patterns/feedback-and-states.md +642 -0
- package/expertise/design/patterns/forms-and-input.md +819 -0
- package/expertise/design/patterns/gamification.md +801 -0
- package/expertise/design/patterns/index.md +31 -0
- package/expertise/design/patterns/microinteractions.md +449 -0
- package/expertise/design/patterns/navigation.md +800 -0
- package/expertise/design/patterns/notifications.md +705 -0
- package/expertise/design/patterns/onboarding.md +700 -0
- package/expertise/design/patterns/search-and-filter.md +601 -0
- package/expertise/design/patterns/settings-and-preferences.md +768 -0
- package/expertise/design/patterns/social-and-community.md +748 -0
- package/expertise/design/platforms/desktop-native.md +612 -0
- package/expertise/design/platforms/index.md +25 -0
- package/expertise/design/platforms/mobile-android.md +825 -0
- package/expertise/design/platforms/mobile-cross-platform.md +983 -0
- package/expertise/design/platforms/mobile-ios.md +699 -0
- package/expertise/design/platforms/tablet.md +794 -0
- package/expertise/design/platforms/web-dashboard.md +790 -0
- package/expertise/design/platforms/web-responsive.md +550 -0
- package/expertise/design/psychology/behavioral-nudges.md +449 -0
- package/expertise/design/psychology/cognitive-load.md +1191 -0
- package/expertise/design/psychology/error-psychology.md +778 -0
- package/expertise/design/psychology/index.md +22 -0
- package/expertise/design/psychology/persuasive-design.md +736 -0
- package/expertise/design/psychology/user-mental-models.md +623 -0
- package/expertise/design/tooling/open-pencil.md +266 -0
- package/expertise/frontend/angular.md +1073 -0
- package/expertise/frontend/desktop-electron.md +546 -0
- package/expertise/frontend/flutter.md +782 -0
- package/expertise/frontend/index.md +27 -0
- package/expertise/frontend/native-android.md +409 -0
- package/expertise/frontend/native-ios.md +490 -0
- package/expertise/frontend/react-native.md +1160 -0
- package/expertise/frontend/react.md +808 -0
- package/expertise/frontend/vue.md +1089 -0
- package/expertise/humanize/domain-rules-code.md +79 -0
- package/expertise/humanize/domain-rules-content.md +67 -0
- package/expertise/humanize/domain-rules-technical-docs.md +56 -0
- package/expertise/humanize/index.md +35 -0
- package/expertise/humanize/self-audit-checklist.md +87 -0
- package/expertise/humanize/sentence-patterns.md +218 -0
- package/expertise/humanize/vocabulary-blacklist.md +105 -0
- package/expertise/i18n/PROGRESS.md +65 -0
- package/expertise/i18n/advanced/accessibility-and-i18n.md +28 -0
- package/expertise/i18n/advanced/bidirectional-text-algorithm.md +38 -0
- package/expertise/i18n/advanced/complex-scripts.md +30 -0
- package/expertise/i18n/advanced/performance-and-i18n.md +27 -0
- package/expertise/i18n/advanced/testing-i18n.md +28 -0
- package/expertise/i18n/content/content-adaptation.md +23 -0
- package/expertise/i18n/content/locale-specific-formatting.md +23 -0
- package/expertise/i18n/content/machine-translation-integration.md +28 -0
- package/expertise/i18n/content/translation-management.md +29 -0
- package/expertise/i18n/foundations/date-time-calendars.md +67 -0
- package/expertise/i18n/foundations/i18n-architecture.md +272 -0
- package/expertise/i18n/foundations/locale-and-language-tags.md +79 -0
- package/expertise/i18n/foundations/numbers-currency-units.md +61 -0
- package/expertise/i18n/foundations/pluralization-and-gender.md +109 -0
- package/expertise/i18n/foundations/string-externalization.md +236 -0
- package/expertise/i18n/foundations/text-direction-bidi.md +241 -0
- package/expertise/i18n/foundations/unicode-and-encoding.md +86 -0
- package/expertise/i18n/index.md +38 -0
- package/expertise/i18n/platform/backend-i18n.md +31 -0
- package/expertise/i18n/platform/flutter-i18n.md +148 -0
- package/expertise/i18n/platform/native-android-i18n.md +36 -0
- package/expertise/i18n/platform/native-ios-i18n.md +36 -0
- package/expertise/i18n/platform/react-i18n.md +103 -0
- package/expertise/i18n/platform/web-css-i18n.md +81 -0
- package/expertise/i18n/rtl/arabic-specific.md +175 -0
- package/expertise/i18n/rtl/hebrew-specific.md +149 -0
- package/expertise/i18n/rtl/rtl-animations-and-transitions.md +111 -0
- package/expertise/i18n/rtl/rtl-forms-and-input.md +161 -0
- package/expertise/i18n/rtl/rtl-fundamentals.md +211 -0
- package/expertise/i18n/rtl/rtl-icons-and-images.md +181 -0
- package/expertise/i18n/rtl/rtl-layout-mirroring.md +252 -0
- package/expertise/i18n/rtl/rtl-navigation-and-gestures.md +107 -0
- package/expertise/i18n/rtl/rtl-testing-and-qa.md +147 -0
- package/expertise/i18n/rtl/rtl-typography.md +160 -0
- package/expertise/index.md +113 -0
- package/expertise/index.yaml +216 -0
- package/expertise/infrastructure/cloud-aws.md +597 -0
- package/expertise/infrastructure/cloud-gcp.md +599 -0
- package/expertise/infrastructure/cybersecurity.md +816 -0
- package/expertise/infrastructure/database-mongodb.md +447 -0
- package/expertise/infrastructure/database-postgres.md +400 -0
- package/expertise/infrastructure/devops-cicd.md +787 -0
- package/expertise/infrastructure/index.md +27 -0
- package/expertise/performance/PROGRESS.md +50 -0
- package/expertise/performance/backend/api-latency.md +1204 -0
- package/expertise/performance/backend/background-jobs.md +506 -0
- package/expertise/performance/backend/connection-pooling.md +1209 -0
- package/expertise/performance/backend/database-query-optimization.md +515 -0
- package/expertise/performance/backend/index.md +23 -0
- package/expertise/performance/backend/rate-limiting-and-throttling.md +971 -0
- package/expertise/performance/foundations/algorithmic-complexity.md +954 -0
- package/expertise/performance/foundations/caching-strategies.md +489 -0
- package/expertise/performance/foundations/concurrency-and-parallelism.md +847 -0
- package/expertise/performance/foundations/index.md +24 -0
- package/expertise/performance/foundations/measuring-and-profiling.md +440 -0
- package/expertise/performance/foundations/memory-management.md +964 -0
- package/expertise/performance/foundations/performance-budgets.md +1314 -0
- package/expertise/performance/index.md +31 -0
- package/expertise/performance/infrastructure/auto-scaling.md +1059 -0
- package/expertise/performance/infrastructure/cdn-and-edge.md +1081 -0
- package/expertise/performance/infrastructure/index.md +22 -0
- package/expertise/performance/infrastructure/load-balancing.md +1081 -0
- package/expertise/performance/infrastructure/observability.md +1079 -0
- package/expertise/performance/mobile/index.md +23 -0
- package/expertise/performance/mobile/mobile-animations.md +544 -0
- package/expertise/performance/mobile/mobile-memory-battery.md +416 -0
- package/expertise/performance/mobile/mobile-network.md +452 -0
- package/expertise/performance/mobile/mobile-rendering.md +599 -0
- package/expertise/performance/mobile/mobile-startup-time.md +505 -0
- package/expertise/performance/platform-specific/flutter-performance.md +647 -0
- package/expertise/performance/platform-specific/index.md +22 -0
- package/expertise/performance/platform-specific/node-performance.md +1307 -0
- package/expertise/performance/platform-specific/postgres-performance.md +1366 -0
- package/expertise/performance/platform-specific/react-performance.md +1403 -0
- package/expertise/performance/web/bundle-optimization.md +1239 -0
- package/expertise/performance/web/image-and-media.md +636 -0
- package/expertise/performance/web/index.md +24 -0
- package/expertise/performance/web/network-optimization.md +1133 -0
- package/expertise/performance/web/rendering-performance.md +1098 -0
- package/expertise/performance/web/ssr-and-hydration.md +918 -0
- package/expertise/performance/web/web-vitals.md +1374 -0
- package/expertise/quality/accessibility.md +985 -0
- package/expertise/quality/evidence-based-verification.md +499 -0
- package/expertise/quality/index.md +24 -0
- package/expertise/quality/ml-model-audit.md +614 -0
- package/expertise/quality/performance.md +600 -0
- package/expertise/quality/testing-api.md +891 -0
- package/expertise/quality/testing-mobile.md +496 -0
- package/expertise/quality/testing-web.md +849 -0
- package/expertise/security/PROGRESS.md +54 -0
- package/expertise/security/agentic-identity.md +540 -0
- package/expertise/security/compliance-frameworks.md +601 -0
- package/expertise/security/data/data-encryption.md +364 -0
- package/expertise/security/data/data-privacy-gdpr.md +692 -0
- package/expertise/security/data/database-security.md +1171 -0
- package/expertise/security/data/index.md +22 -0
- package/expertise/security/data/pii-handling.md +531 -0
- package/expertise/security/foundations/authentication.md +1041 -0
- package/expertise/security/foundations/authorization.md +603 -0
- package/expertise/security/foundations/cryptography.md +1001 -0
- package/expertise/security/foundations/index.md +25 -0
- package/expertise/security/foundations/owasp-top-10.md +1354 -0
- package/expertise/security/foundations/secrets-management.md +1217 -0
- package/expertise/security/foundations/secure-sdlc.md +700 -0
- package/expertise/security/foundations/supply-chain-security.md +698 -0
- package/expertise/security/index.md +31 -0
- package/expertise/security/infrastructure/cloud-security-aws.md +1296 -0
- package/expertise/security/infrastructure/cloud-security-gcp.md +1376 -0
- package/expertise/security/infrastructure/container-security.md +721 -0
- package/expertise/security/infrastructure/incident-response.md +1295 -0
- package/expertise/security/infrastructure/index.md +24 -0
- package/expertise/security/infrastructure/logging-and-monitoring.md +1618 -0
- package/expertise/security/infrastructure/network-security.md +1337 -0
- package/expertise/security/mobile/index.md +23 -0
- package/expertise/security/mobile/mobile-android-security.md +1218 -0
- package/expertise/security/mobile/mobile-binary-protection.md +1229 -0
- package/expertise/security/mobile/mobile-data-storage.md +1265 -0
- package/expertise/security/mobile/mobile-ios-security.md +1401 -0
- package/expertise/security/mobile/mobile-network-security.md +1520 -0
- package/expertise/security/smart-contract-security.md +594 -0
- package/expertise/security/testing/index.md +22 -0
- package/expertise/security/testing/penetration-testing.md +1258 -0
- package/expertise/security/testing/security-code-review.md +1765 -0
- package/expertise/security/testing/threat-modeling.md +1074 -0
- package/expertise/security/testing/vulnerability-scanning.md +1062 -0
- package/expertise/security/web/api-security.md +586 -0
- package/expertise/security/web/cors-and-headers.md +433 -0
- package/expertise/security/web/csrf.md +562 -0
- package/expertise/security/web/file-upload.md +1477 -0
- package/expertise/security/web/index.md +25 -0
- package/expertise/security/web/injection.md +1375 -0
- package/expertise/security/web/session-management.md +1101 -0
- package/expertise/security/web/xss.md +1158 -0
- package/exports/README.md +17 -0
- package/exports/hosts/claude/.claude/agents/clarifier.md +42 -0
- package/exports/hosts/claude/.claude/agents/content-author.md +63 -0
- package/exports/hosts/claude/.claude/agents/designer.md +55 -0
- package/exports/hosts/claude/.claude/agents/executor.md +55 -0
- package/exports/hosts/claude/.claude/agents/learner.md +51 -0
- package/exports/hosts/claude/.claude/agents/planner.md +53 -0
- package/exports/hosts/claude/.claude/agents/researcher.md +43 -0
- package/exports/hosts/claude/.claude/agents/reviewer.md +54 -0
- package/exports/hosts/claude/.claude/agents/specifier.md +47 -0
- package/exports/hosts/claude/.claude/agents/verifier.md +71 -0
- package/exports/hosts/claude/.claude/commands/author.md +42 -0
- package/exports/hosts/claude/.claude/commands/clarify.md +38 -0
- package/exports/hosts/claude/.claude/commands/design-review.md +46 -0
- package/exports/hosts/claude/.claude/commands/design.md +44 -0
- package/exports/hosts/claude/.claude/commands/discover.md +37 -0
- package/exports/hosts/claude/.claude/commands/execute.md +48 -0
- package/exports/hosts/claude/.claude/commands/learn.md +38 -0
- package/exports/hosts/claude/.claude/commands/plan-review.md +42 -0
- package/exports/hosts/claude/.claude/commands/plan.md +39 -0
- package/exports/hosts/claude/.claude/commands/prepare-next.md +37 -0
- package/exports/hosts/claude/.claude/commands/review.md +40 -0
- package/exports/hosts/claude/.claude/commands/run-audit.md +41 -0
- package/exports/hosts/claude/.claude/commands/spec-challenge.md +41 -0
- package/exports/hosts/claude/.claude/commands/specify.md +38 -0
- package/exports/hosts/claude/.claude/commands/verify.md +37 -0
- package/exports/hosts/claude/.claude/settings.json +34 -0
- package/exports/hosts/claude/CLAUDE.md +19 -0
- package/exports/hosts/claude/export.manifest.json +38 -0
- package/exports/hosts/claude/host-package.json +67 -0
- package/exports/hosts/codex/AGENTS.md +19 -0
- package/exports/hosts/codex/export.manifest.json +38 -0
- package/exports/hosts/codex/host-package.json +41 -0
- package/exports/hosts/cursor/.cursor/hooks.json +16 -0
- package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +19 -0
- package/exports/hosts/cursor/export.manifest.json +38 -0
- package/exports/hosts/cursor/host-package.json +42 -0
- package/exports/hosts/gemini/GEMINI.md +19 -0
- package/exports/hosts/gemini/export.manifest.json +38 -0
- package/exports/hosts/gemini/host-package.json +41 -0
- package/hooks/README.md +18 -0
- package/hooks/definitions/loop_cap_guard.yaml +21 -0
- package/hooks/definitions/post_tool_capture.yaml +24 -0
- package/hooks/definitions/pre_compact_summary.yaml +19 -0
- package/hooks/definitions/pre_tool_capture_route.yaml +19 -0
- package/hooks/definitions/protected_path_write_guard.yaml +19 -0
- package/hooks/definitions/session_start.yaml +19 -0
- package/hooks/definitions/stop_handoff_harvest.yaml +20 -0
- package/hooks/loop-cap-guard +17 -0
- package/hooks/post-tool-lint +36 -0
- package/hooks/protected-path-write-guard +17 -0
- package/hooks/session-start +41 -0
- package/llms-full.txt +2355 -0
- package/llms.txt +43 -0
- package/package.json +79 -0
- package/roles/README.md +20 -0
- package/roles/clarifier.md +42 -0
- package/roles/content-author.md +63 -0
- package/roles/designer.md +55 -0
- package/roles/executor.md +55 -0
- package/roles/learner.md +51 -0
- package/roles/planner.md +53 -0
- package/roles/researcher.md +43 -0
- package/roles/reviewer.md +54 -0
- package/roles/specifier.md +47 -0
- package/roles/verifier.md +71 -0
- package/schemas/README.md +24 -0
- package/schemas/accepted-learning.schema.json +20 -0
- package/schemas/author-artifact.schema.json +156 -0
- package/schemas/clarification.schema.json +19 -0
- package/schemas/design-artifact.schema.json +80 -0
- package/schemas/docs-claim.schema.json +18 -0
- package/schemas/export-manifest.schema.json +20 -0
- package/schemas/hook.schema.json +67 -0
- package/schemas/host-export-package.schema.json +18 -0
- package/schemas/implementation-plan.schema.json +19 -0
- package/schemas/proposed-learning.schema.json +19 -0
- package/schemas/research.schema.json +18 -0
- package/schemas/review.schema.json +29 -0
- package/schemas/run-manifest.schema.json +18 -0
- package/schemas/spec-challenge.schema.json +18 -0
- package/schemas/spec.schema.json +20 -0
- package/schemas/usage.schema.json +102 -0
- package/schemas/verification-proof.schema.json +29 -0
- package/schemas/wazir-manifest.schema.json +173 -0
- package/skills/README.md +40 -0
- package/skills/brainstorming/SKILL.md +77 -0
- package/skills/debugging/SKILL.md +50 -0
- package/skills/design/SKILL.md +61 -0
- package/skills/dispatching-parallel-agents/SKILL.md +128 -0
- package/skills/executing-plans/SKILL.md +70 -0
- package/skills/finishing-a-development-branch/SKILL.md +169 -0
- package/skills/humanize/SKILL.md +123 -0
- package/skills/init-pipeline/SKILL.md +124 -0
- package/skills/prepare-next/SKILL.md +20 -0
- package/skills/receiving-code-review/SKILL.md +123 -0
- package/skills/requesting-code-review/SKILL.md +105 -0
- package/skills/requesting-code-review/code-reviewer.md +108 -0
- package/skills/run-audit/SKILL.md +197 -0
- package/skills/scan-project/SKILL.md +41 -0
- package/skills/self-audit/SKILL.md +153 -0
- package/skills/subagent-driven-development/SKILL.md +154 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
- package/skills/subagent-driven-development/implementer-prompt.md +102 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
- package/skills/tdd/SKILL.md +23 -0
- package/skills/using-git-worktrees/SKILL.md +163 -0
- package/skills/using-skills/SKILL.md +95 -0
- package/skills/verification/SKILL.md +22 -0
- package/skills/wazir/SKILL.md +463 -0
- package/skills/writing-plans/SKILL.md +30 -0
- package/skills/writing-skills/SKILL.md +157 -0
- package/skills/writing-skills/anthropic-best-practices.md +122 -0
- package/skills/writing-skills/persuasion-principles.md +50 -0
- package/templates/README.md +20 -0
- package/templates/artifacts/README.md +10 -0
- package/templates/artifacts/accepted-learning.md +19 -0
- package/templates/artifacts/accepted-learning.template.json +12 -0
- package/templates/artifacts/author.md +74 -0
- package/templates/artifacts/author.template.json +19 -0
- package/templates/artifacts/clarification.md +21 -0
- package/templates/artifacts/clarification.template.json +12 -0
- package/templates/artifacts/execute-notes.md +19 -0
- package/templates/artifacts/implementation-plan.md +21 -0
- package/templates/artifacts/implementation-plan.template.json +11 -0
- package/templates/artifacts/learning-proposal.md +19 -0
- package/templates/artifacts/next-run-handoff.md +21 -0
- package/templates/artifacts/plan-review.md +19 -0
- package/templates/artifacts/proposed-learning.template.json +12 -0
- package/templates/artifacts/research.md +21 -0
- package/templates/artifacts/research.template.json +12 -0
- package/templates/artifacts/review-findings.md +19 -0
- package/templates/artifacts/review.template.json +11 -0
- package/templates/artifacts/run-manifest.template.json +8 -0
- package/templates/artifacts/spec-challenge.md +19 -0
- package/templates/artifacts/spec-challenge.template.json +11 -0
- package/templates/artifacts/spec.md +21 -0
- package/templates/artifacts/spec.template.json +12 -0
- package/templates/artifacts/verification-proof.md +19 -0
- package/templates/artifacts/verification-proof.template.json +11 -0
- package/templates/examples/accepted-learning.example.json +14 -0
- package/templates/examples/author.example.json +152 -0
- package/templates/examples/clarification.example.json +15 -0
- package/templates/examples/docs-claim.example.json +8 -0
- package/templates/examples/export-manifest.example.json +7 -0
- package/templates/examples/host-export-package.example.json +11 -0
- package/templates/examples/implementation-plan.example.json +17 -0
- package/templates/examples/proposed-learning.example.json +13 -0
- package/templates/examples/research.example.json +15 -0
- package/templates/examples/research.example.md +6 -0
- package/templates/examples/review.example.json +17 -0
- package/templates/examples/run-manifest.example.json +9 -0
- package/templates/examples/spec-challenge.example.json +14 -0
- package/templates/examples/spec.example.json +21 -0
- package/templates/examples/verification-proof.example.json +21 -0
- package/templates/examples/wazir-manifest.example.yaml +65 -0
- package/templates/task-definition-schema.md +99 -0
- package/tooling/README.md +20 -0
- package/tooling/src/adapters/context-mode.js +50 -0
- package/tooling/src/capture/command.js +376 -0
- package/tooling/src/capture/store.js +99 -0
- package/tooling/src/capture/usage.js +270 -0
- package/tooling/src/checks/branches.js +50 -0
- package/tooling/src/checks/brand-truth.js +110 -0
- package/tooling/src/checks/changelog.js +231 -0
- package/tooling/src/checks/command-registry.js +36 -0
- package/tooling/src/checks/commits.js +102 -0
- package/tooling/src/checks/docs-drift.js +103 -0
- package/tooling/src/checks/docs-truth.js +201 -0
- package/tooling/src/checks/runtime-surface.js +156 -0
- package/tooling/src/cli.js +116 -0
- package/tooling/src/command-options.js +56 -0
- package/tooling/src/commands/validate.js +320 -0
- package/tooling/src/doctor/command.js +91 -0
- package/tooling/src/export/command.js +77 -0
- package/tooling/src/export/compiler.js +498 -0
- package/tooling/src/guards/loop-cap-guard.js +52 -0
- package/tooling/src/guards/protected-path-write-guard.js +67 -0
- package/tooling/src/index/command.js +152 -0
- package/tooling/src/index/storage.js +1061 -0
- package/tooling/src/index/summarizers.js +261 -0
- package/tooling/src/loaders.js +18 -0
- package/tooling/src/project-root.js +22 -0
- package/tooling/src/recall/command.js +225 -0
- package/tooling/src/schema-validator.js +30 -0
- package/tooling/src/state-root.js +40 -0
- package/tooling/src/status/command.js +71 -0
- package/wazir.manifest.yaml +135 -0
- package/workflows/README.md +19 -0
- package/workflows/author.md +42 -0
- package/workflows/clarify.md +38 -0
- package/workflows/design-review.md +46 -0
- package/workflows/design.md +44 -0
- package/workflows/discover.md +37 -0
- package/workflows/execute.md +48 -0
- package/workflows/learn.md +38 -0
- package/workflows/plan-review.md +42 -0
- package/workflows/plan.md +39 -0
- package/workflows/prepare-next.md +37 -0
- package/workflows/review.md +40 -0
- package/workflows/run-audit.md +41 -0
- package/workflows/spec-challenge.md +41 -0
- package/workflows/specify.md +38 -0
- package/workflows/verify.md +37 -0
@@ -0,0 +1,634 @@
# Event Streams and Queues — Architecture Expertise Module

> Event streams (Kafka, Kinesis) and message queues (RabbitMQ, SQS) are the backbone of asynchronous communication. Despite surface similarity, they solve fundamentally different problems: queues distribute work across consumers (each message processed once), while event streams provide a durable, replayable log of events (multiple consumers can read independently). Conflating the two is one of the most common architectural mistakes in distributed systems.

> **Category:** Data
> **Complexity:** Moderate
> **Applies when:** Asynchronous communication between services, background job processing, event-driven architectures, real-time data pipelines

---

## What This Is (and What It Isn't)

### Two Fundamentally Different Abstractions

Despite both moving messages from producers to consumers, message queues and event streams have **different data models, different consumption semantics, and different failure characteristics**. They are not interchangeable. Treating one like the other leads to subtle, production-breaking bugs.

**Message Queue** — A transient buffer for work distribution. A producer enqueues a message; one consumer dequeues and processes it; the message is deleted. Think of it as a task list: once a task is done, it is removed. The queue's job is to ensure exactly one consumer handles each message. RabbitMQ, AWS SQS, ActiveMQ, and BullMQ are canonical examples.

**Event Stream** — An immutable, append-only log of events. Producers append events; consumers read from the log at their own pace using an offset (a cursor position). The event is never deleted by consumption — it persists for a configured retention period (or indefinitely with compaction). Multiple independent consumers can read the same events without interfering with each other. Apache Kafka, Amazon Kinesis, Apache Pulsar, and Redpanda are canonical examples.

### The Key Distinction: Ownership of the Cursor

In a queue, the **broker** owns the cursor. It tracks which messages have been delivered and acknowledged, and removes them. Consumers are passive recipients.

In a stream, the **consumer** owns the cursor (offset). The broker simply stores the log. Each consumer group tracks its own position independently. This is why replay is possible — rewind the offset and re-read.
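
The cursor distinction can be made concrete with a toy model (illustrative Python, not any broker's API): consuming from the queue destroys the message for everyone, while the stream only advances a per-group offset over an untouched log.

```python
from collections import deque

# Toy queue: the broker owns the cursor, so consuming removes the message.
work_queue = deque(["job-1", "job-2", "job-3"])
first_job = work_queue.popleft()       # delivered to exactly one consumer; now gone

# Toy stream: the log is immutable; each consumer group owns its own offset.
log = ["evt-1", "evt-2", "evt-3"]
offsets = {"billing": 0, "analytics": 0}

def poll(group):
    """Read the next event for a group and advance only that group's cursor."""
    if offsets[group] >= len(log):
        return None
    event = log[offsets[group]]
    offsets[group] += 1
    return event

poll("billing")                        # billing reads evt-1 ...
poll("billing")                        # ... and evt-2; analytics is untouched

offsets["billing"] = 0                 # replay: rewinding the cursor re-reads the log
replayed = poll("billing")
```

Note that nothing in the stream model deletes data; "consumed" is purely a property of the cursor, which is why two groups can read the same events at different speeds.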

### What They Share

Both provide **decoupling** (producer and consumer don't need to know about each other), **buffering** (absorb bursts when producers outpace consumers), and **asynchrony** (producer doesn't block waiting for the consumer). Both require careful handling of failure, ordering, and delivery guarantees.

### They Are NOT Interchangeable

Using Kafka as a simple work queue is possible but wasteful — you pay for partitioning, retention, and replication you don't need. Using RabbitMQ as an event log is dangerous — once a message is consumed it is gone, and you cannot replay. Choose based on the consumption pattern, not the throughput number on a benchmark slide.

---

## When to Use Queues

### The Core Pattern: Work Distribution

Use a message queue when you have **tasks to distribute across competing consumers** and each task should be processed exactly once.

**Background job processing.** A web request enqueues a "send welcome email" job. One of N worker processes picks it up, sends the email, and acknowledges. The job is removed from the queue. If the worker crashes before acknowledging, the message becomes visible again (visibility timeout in SQS, nack/requeue in RabbitMQ).

**Load leveling.** Your API receives 10,000 image resize requests per minute during peak hours but your GPU workers can only handle 2,000/min. The queue absorbs the burst. Workers drain it at their own pace. No request is lost; latency increases but the system doesn't crash.

**Request-reply (RPC over messaging).** Service A sends a request message with a correlation ID and a reply-to queue. Service B processes it and sends the response to the reply queue. RabbitMQ's direct reply-to feature optimizes this pattern.

**Fan-out with topic routing.** RabbitMQ exchanges (direct, topic, fanout, headers) provide flexible routing. A single message can be routed to multiple queues based on routing keys. This is pub/sub, but each queue still has competing consumers — the queue semantics remain.
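
Topic-pattern matching is easy to picture with a toy matcher (illustrative Python; RabbitMQ implements this in the broker, and the function below is not its actual algorithm). The AMQP convention is that `*` matches exactly one dot-separated word and `#` matches zero or more words:

```python
def topic_matches(pattern, routing_key):
    """Toy version of topic-exchange matching: '*' matches exactly one
    dot-separated word, '#' matches zero or more words."""
    def match(p, k):
        if not p:
            return not k                  # both exhausted => match
        if p[0] == "#":                   # '#' may swallow 0..len(k) words
            return any(match(p[1:], k[i:]) for i in range(len(k) + 1))
        return bool(k) and p[0] in ("*", k[0]) and match(p[1:], k[1:])
    return match(pattern.split("."), routing_key.split("."))
```

So a binding of `order.*.created` receives `order.us.created` but not `order.created`, while `order.#` receives everything under the `order` prefix, including bare `order`.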

**Delayed and scheduled jobs.** RabbitMQ supports delayed message plugins; SQS has DelaySeconds; BullMQ has built-in delayed jobs with cron scheduling. This is awkward to implement with event streams.

**Priority processing.** RabbitMQ supports priority queues natively (up to 255 priority levels). Kafka partitions have no concept of priority — all messages in a partition are read in order.

### Indicators That a Queue Is the Right Choice

- Each message represents a **unit of work** that should be processed once
- You need **competing consumers** that divide work (not duplicate it)
- Message **ordering across the entire queue** is not critical (or FIFO queues suffice)
- You want messages to **disappear after processing** — retention is not needed
- You need **routing logic** (topic patterns, header-based routing, priority lanes)
- The consumer count is **dynamic** and scales independently of message partitioning

---

## When to Use Streams

### The Core Pattern: Event Log

Use an event stream when events are **facts that happened** and multiple systems need to react to them independently, potentially at different speeds, potentially replaying history.

**Event sourcing.** Instead of storing current state, store every state-changing event. The stream IS the database. Rebuild state by replaying events from the beginning. Kafka's log compaction feature keeps the latest value per key, enabling infinite retention of the "current state" while discarding superseded events.
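
Rebuild-by-replay fits in a few lines (illustrative Python; the event shapes are invented for the example). Current state is just a fold over the log with a pure transition function:

```python
from functools import reduce

# The log of facts is the source of truth; state is derived, never stored.
events = [
    {"type": "deposited", "amount": 100},
    {"type": "withdrawn", "amount": 30},
    {"type": "deposited", "amount": 5},
]

def apply_event(balance, event):
    """Pure transition: old state + event -> new state."""
    if event["type"] == "deposited":
        return balance + event["amount"]
    if event["type"] == "withdrawn":
        return balance - event["amount"]
    return balance                        # unknown event types are ignored

balance = reduce(apply_event, events, 0)  # replay from offset 0
```

Because `apply_event` is deterministic, replaying the same log always reconstructs the same state, which is what makes reprocessing after a consumer bug safe.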

**Audit logs and compliance.** Financial regulations (SOX, PCI-DSS, MiFID II) require an immutable record of every transaction. An event stream provides this natively — events are append-only, timestamped, and retained for configurable periods.

**Multi-consumer data pipelines.** An "order placed" event needs to be consumed by: (1) the fulfillment service, (2) the analytics pipeline, (3) the notification service, (4) the fraud detection system. With a queue, you need four separate queues and fan-out logic. With a stream, each service has its own consumer group reading the same topic independently.

**Real-time analytics and CEP.** Kafka Streams, Apache Flink, and ksqlDB enable complex event processing — windowed aggregations, joins between streams, pattern detection — directly on the event log without extracting data to a separate system.

**Change Data Capture (CDC).** Debezium captures database row-level changes and publishes them to Kafka topics. Downstream services consume these change events to maintain materialized views, update search indexes, or synchronize data warehouses.

**Replay and reprocessing.** A bug in consumer v1 corrupted derived data. Deploy consumer v2, reset its offset to the beginning of the topic, and reprocess every event. The source of truth (the log) was never affected. This is impossible with a queue — consumed messages are gone.

### Indicators That a Stream Is the Right Choice

- Events are **facts** (immutable records of things that happened), not tasks
- **Multiple independent consumers** need the same events
- You need **replay** capability (reprocessing, backfilling, debugging)
- **Ordering within a partition** is critical (e.g., all events for a given user must be processed in order)
- You are building an **event-driven architecture** where services react to domain events
- Data must be **retained** for hours, days, or indefinitely
- You need **stream processing** (windowed aggregations, joins, pattern detection)

---

## When NOT to Use Either

This section is deliberately long because **over-engineering with messaging infrastructure is as common as under-engineering it**.

### When a Database-Backed Job Queue Is Enough

If you are processing 100 messages per day — or even 10,000 — you likely do not need Kafka or RabbitMQ. A Postgres table with a `status` column, a `locked_until` timestamp, and `SELECT ... FOR UPDATE SKIP LOCKED` gives you a perfectly functional job queue with zero additional infrastructure. Libraries like `graphile-worker` (Node.js), `Oban` (Elixir), `Sidekiq` (Ruby, backed by Redis), `Celery` (Python), and `Hangfire` (.NET) provide battle-tested abstractions over database-backed queues.
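
A minimal sketch of the pattern, using SQLite as a stand-in so it runs anywhere (in Postgres the claim would be `SELECT ... FOR UPDATE SKIP LOCKED` inside a transaction, shown in the comment; table and column names here are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE jobs (
        id      INTEGER PRIMARY KEY,
        payload TEXT NOT NULL,
        status  TEXT NOT NULL DEFAULT 'pending'
    )
""")
conn.executemany("INSERT INTO jobs (payload) VALUES (?)",
                 [("resize:img-1",), ("resize:img-2",)])

def claim_next(conn):
    """Claim the oldest pending job, or return None if the queue is empty.

    Postgres equivalent, immune to worker races:
        SELECT id, payload FROM jobs WHERE status = 'pending'
        ORDER BY id FOR UPDATE SKIP LOCKED LIMIT 1;
    followed by the UPDATE in the same transaction.
    """
    row = conn.execute(
        "SELECT id, payload FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    # Guarded UPDATE: only wins if the row is still pending.
    changed = conn.execute(
        "UPDATE jobs SET status = 'running' WHERE id = ? AND status = 'pending'",
        (row[0],),
    ).rowcount
    return row if changed else None

job = claim_next(conn)
```

The important property is that claiming is atomic: two workers polling concurrently can never both win the same row.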

**When this is the right call:**
- Fewer than 10,000 jobs/minute
- You already have a database and don't want to operate another system
- Jobs don't need to fan out to multiple consumers
- You don't need replay, retention, or stream processing
- Your team doesn't have Kafka/RabbitMQ operational expertise

### When Kafka Is Over-Engineering

Kafka is a distributed system that requires: ZooKeeper or KRaft for consensus, careful partition planning, ISR (in-sync replica) monitoring, topic configuration management, consumer group rebalancing, and JVM tuning. Running Kafka well demands dedicated operational knowledge.

**Do not use Kafka when:**
- You have a small team without dedicated infrastructure engineers
- Your message volume is under 1,000 messages/second and a managed queue (SQS) would suffice
- You don't need replay, multi-consumer, or stream processing
- You primarily need request-reply patterns (Kafka is awkward for RPC)
- You need per-message routing logic (Kafka topics are coarse-grained; RabbitMQ exchanges are far more flexible)
- You need message priority (Kafka has no priority concept)

### When a Direct HTTP Call or Webhook Is Enough

If Service A calls Service B synchronously and can tolerate B's latency, a direct HTTP call is simpler. Adding a queue "just in case" without a clear failure or decoupling requirement adds latency, operational burden, and debugging complexity (you now need to trace messages through a broker).

### When Polling or Cron Is Enough

A nightly batch job that processes all new orders doesn't need a real-time event stream. A cron job querying a database for `WHERE processed = false` is simpler, easier to debug, and sufficient when real-time processing isn't required.

### The "Resume-Driven Development" Anti-Pattern

Introducing Kafka because "we might need it someday" or "Netflix uses it" is the most common form of over-engineering in backend systems. Kafka solves real problems at scale — but at the cost of significant operational complexity. If you cannot articulate which specific problem Kafka solves that a simpler approach cannot, you don't need it yet. Start with the simplest tool that works and migrate when you hit actual limits.

---

## How It Works

### Queue Semantics

**Point-to-point delivery.** Each message is delivered to exactly one consumer. If five consumers listen on the same queue, the broker load-balances messages across them (competing consumers pattern). This is the default behavior in RabbitMQ and SQS.

**Fan-out via exchanges (RabbitMQ).** A fanout exchange copies each message to every bound queue. A topic exchange routes based on routing key patterns (e.g., `order.*.created` matches `order.us.created`). A headers exchange routes based on message header values. This provides pub/sub ON TOP of queue semantics — each queue still has its own competing consumers.

**Dead Letter Queues (DLQ).** When a message fails processing repeatedly (exceeds max retries or TTL), the broker moves it to a designated DLQ. This prevents poison messages from blocking the entire queue. SQS has native DLQ support via redrive policies. RabbitMQ uses dead-letter exchanges (DLX). DLQs are essential — without them, a single malformed message can halt all processing.
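
The retry-then-park logic looks like this in miniature (a toy model of the pattern, not any broker's API; `MAX_ATTEMPTS` and the message shape are invented):

```python
MAX_ATTEMPTS = 3

def deliver(message, handler, dead_letter_queue):
    """Retry a handler up to MAX_ATTEMPTS times, then park the message
    on the DLQ so one poison message cannot block the whole queue."""
    for _ in range(MAX_ATTEMPTS):
        try:
            handler(message)
            return "acked"
        except Exception:
            continue                     # a real worker would back off here
    dead_letter_queue.append(message)    # kept for manual inspection
    return "dead-lettered"

dlq = []

def parse_and_process(message):
    raise ValueError("malformed payload")     # a poison message always fails

poisoned = deliver({"id": 7, "body": "{not json"}, parse_and_process, dlq)
healthy = deliver({"id": 8, "body": "{}"}, lambda m: None, dlq)
```

Brokers implement the same idea declaratively: SQS counts receives against `maxReceiveCount`, RabbitMQ routes rejected messages through a dead-letter exchange.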

**Visibility timeout / acknowledgment.** SQS uses visibility timeout: after a consumer receives a message, the message is invisible to other consumers for a configurable period. If the consumer doesn't delete it in time, it becomes visible again. RabbitMQ uses explicit acknowledgments: the consumer sends `ack` (success) or `nack` (failure, optionally requeue). Unacknowledged messages are redelivered when the consumer disconnects.
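
A logical-clock sketch of the visibility-timeout lifecycle (illustrative only; this is not the SQS API, and the class name is invented):

```python
class ToyVisibilityQueue:
    """SQS-style visibility timeout: a received message is hidden until
    a deadline, then reappears unless it was deleted (acknowledged)."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.now = 0                     # logical clock, advanced by hand
        self.messages = {}               # id -> (body, invisible_until)

    def send(self, msg_id, body):
        self.messages[msg_id] = (body, 0)

    def receive(self):
        for msg_id, (body, until) in self.messages.items():
            if until <= self.now:
                # Hide the message instead of deleting it.
                self.messages[msg_id] = (body, self.now + self.timeout)
                return msg_id, body
        return None

    def delete(self, msg_id):            # the "ack": only now is it gone
        self.messages.pop(msg_id, None)

q = ToyVisibilityQueue(timeout=30)
q.send("m1", "charge order 42")
first = q.receive()       # worker A takes the message
ghost = q.receive()       # worker B sees nothing while m1 is invisible
q.now += 31               # worker A crashed; the timeout elapses
second = q.receive()      # m1 is redelivered to another worker
q.delete("m1")            # processed successfully this time
```

The crash-recovery guarantee falls out of the structure: a message only disappears on an explicit delete, never on a receive.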

**Message TTL and expiration.** Messages can expire after a configurable time. RabbitMQ supports per-queue and per-message TTL. SQS supports message retention from 1 minute to 14 days (default 4 days). Expired messages are either discarded or routed to a DLQ.

### Stream Semantics

**Partitions.** A topic is divided into partitions (Kafka) or shards (Kinesis). Each partition is an ordered, immutable sequence of events. Ordering is guaranteed ONLY within a partition, not across partitions. A message's partition is determined by its key (hash of key modulo partition count) or round-robin if no key is provided.
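
The key-to-partition mapping is just "hash mod partition count" (toy sketch; a byte-sum hash stands in for Kafka's actual murmur2, and `NUM_PARTITIONS` is an assumption):

```python
NUM_PARTITIONS = 6

def partition_for(key):
    """Toy partitioner: byte-sum hash standing in for Kafka's murmur2.
    What matters is only that the mapping is stable and deterministic."""
    return sum(key.encode()) % NUM_PARTITIONS

# Every event keyed by the same user lands on the same partition,
# which is exactly what preserves per-user ordering.
p1 = partition_for("user-123")
p2 = partition_for("user-123")
```

This is also why changing the partition count is disruptive: the modulus changes, so existing keys remap to different partitions and per-key ordering across the boundary is lost.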

**Offsets.** Each message in a partition has a monotonically increasing offset (a 64-bit integer). A consumer reads from a specific offset and advances it as it processes messages. The consumer (or consumer group) periodically commits its current offset to the broker, forming a checkpoint. On restart, the consumer resumes from the last committed offset.

**Consumer groups.** A consumer group is a set of consumers that cooperatively consume a topic. Each partition is assigned to exactly one consumer in the group. If you have 6 partitions and 3 consumers in a group, each consumer handles 2 partitions. If a consumer dies, its partitions are reassigned to surviving consumers (rebalancing). Different consumer groups are completely independent — each tracks its own offsets and reads every message.
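
The 6-partitions-across-3-consumers arithmetic, and what a rebalance does, can be sketched with round-robin assignment (a simplification; Kafka's real assignors such as range and cooperative-sticky are smarter, but the invariant is the same):

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment inside one consumer group: each partition
    belongs to exactly one group member, and the load is spread evenly."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

before = assign_partitions(range(6), ["c1", "c2", "c3"])
# "c3" dies; the rebalance spreads its partitions over the survivors:
after = assign_partitions(range(6), ["c1", "c2"])
```

Note the ceiling this implies: a seventh consumer in a 6-partition group would sit idle, which is why partition count bounds consumer parallelism.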
|
|
158
|
+
|
|
159
|
+
**Retention and compaction.** Time-based retention: delete messages older than N hours/days (default 7 days in Kafka). Size-based retention: delete oldest messages when the partition exceeds N bytes. Log compaction: keep only the latest message per key — enables "table" semantics on a topic where each key represents an entity and the latest value is its current state.
|
|
160
|
+
|
|
161
|
+
**Compacted topics as materialized views.** A compacted topic with key = `user-123` and value = `{name: "Alice", email: "..."}` effectively becomes a key-value store. Kafka's KTable abstraction and ksqlDB build on this. New consumers reading from the beginning of a compacted topic get the full current state of every entity.
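
The log-to-table collapse is a one-pass fold (illustrative Python; keys and values are invented):

```python
def compact(log):
    """Log compaction: later records for a key supersede earlier ones,
    so what remains is the current state of every entity."""
    latest = {}
    for key, value in log:               # read in offset order
        latest[key] = value              # newest record wins per key
    return latest

state = compact([
    ("user-123", {"email": "old@example.com"}),
    ("user-456", {"email": "b@example.com"}),
    ("user-123", {"email": "new@example.com"}),   # supersedes the first
])
```

This is the essence of KTable semantics: a stream of updates, read from the start, materializes into a key-value view.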

### Delivery Guarantees

**At-most-once.** Fire and forget. The producer sends the message and doesn't wait for acknowledgment. The consumer processes the message before committing the offset. If either crashes, the message is lost. Fastest, but unsuitable for anything that matters.

**At-least-once.** The producer retries until the broker acknowledges. The consumer commits the offset AFTER successful processing. If the consumer crashes after processing but before committing, the message is redelivered. This is the default for most systems and the RIGHT default for most use cases. Requires idempotent consumers.
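
"Requires idempotent consumers" means tracking what has already been applied, so a redelivered duplicate is a no-op (a minimal in-memory sketch; production systems keep the ledger in the same transactional store as the effect):

```python
balances = {"acct-9": 0}
processed = set()                 # idempotency ledger, keyed by message id

def handle(msg):
    """At-least-once delivery means duplicates WILL arrive; recording
    processed message ids makes redelivery harmless."""
    if msg["id"] in processed:
        return "duplicate-ignored"
    balances[msg["account"]] += msg["amount"]   # the side effect
    processed.add(msg["id"])                    # ...and its record
    return "applied"

first = handle({"id": "m-1", "account": "acct-9", "amount": 50})
again = handle({"id": "m-1", "account": "acct-9", "amount": 50})  # redelivery
```

The subtle part in real systems is making the side effect and the ledger write atomic; done separately, a crash between them reintroduces the duplicate.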

**Exactly-once.** The holy grail — and the most misunderstood guarantee. Kafka achieves exactly-once semantics (EOS) within its ecosystem via idempotent producers (sequence numbers per partition to deduplicate retries) and transactional writes (atomically write to multiple partitions and commit offsets). However, EOS has real limitations:

- **Scope**: Exactly-once applies to Kafka-to-Kafka workflows (consume from topic A, process, produce to topic B, commit offset — all atomically). It does NOT extend to external systems. If your consumer writes to a database and Kafka, you still need idempotency at the database level.
- **Performance**: Transactional writes add latency (synchronous RPCs for transaction coordination) and reduce throughput. Each producer can have only one active transaction.
- **Consumer isolation**: Consumers reading transactional topics must use `read_committed` isolation, which means they cannot see uncommitted messages. This increases end-to-end latency because consumers must wait for the LSO (Last Stable Offset) to advance.

**The pragmatic choice**: Design for at-least-once delivery with idempotent consumers. This works across all brokers and all external systems. Exactly-once within Kafka is valuable for stream processing pipelines (Kafka Streams, ksqlDB) but should not be relied upon as a substitute for application-level idempotency.

### Back-Pressure

**Queue-based back-pressure.** The queue grows when producers outpace consumers. The queue has a max length or max memory limit. When reached: RabbitMQ blocks publishers (TCP back-pressure) or drops messages (overflow policy); SQS has no hard limit (effectively infinite, up to 14-day retention). Monitor queue depth as a key health metric — growing depth means consumers are falling behind.
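
The blocking-publisher behavior is the same mechanism a bounded in-process queue gives you (a stand-in for a broker's max-length policy, shown with Python's standard library):

```python
import queue

# A bounded queue turns overload into back-pressure: when full, the
# producer blocks (or fails fast) instead of the system silently
# accepting unbounded work.
buffer = queue.Queue(maxsize=2)
buffer.put("job-1")
buffer.put("job-2")

try:
    buffer.put("job-3", block=False)   # full: fail fast rather than grow
    rejected = False
except queue.Full:
    rejected = True

buffer.get()                           # a consumer drains one item...
buffer.put("job-3", block=False)       # ...and the producer may proceed
```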

**Stream-based back-pressure.** Consumer lag (the gap between the latest offset and the consumer's committed offset) is the primary metric. Kafka does not slow down producers when consumers lag — the log keeps growing until retention kicks in. If a consumer falls behind far enough, its unconsumed messages may be deleted by retention. This is a data loss scenario. Monitor consumer lag aggressively. Auto-scale consumers or alert when lag exceeds thresholds.
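
Lag itself is simple arithmetic per partition (illustrative sketch; in practice the numbers come from broker metrics or a tool like Burrow rather than hand-fed dicts):

```python
def consumer_lag(latest, committed):
    """Per-partition lag: head-of-log offset minus the group's committed
    offset. The max across partitions is the usual alerting signal."""
    return {p: latest[p] - committed.get(p, 0) for p in latest}

lag = consumer_lag({0: 1000, 1: 1000}, {0: 990, 1: 400})
worst = max(lag.values())    # partition 1 is the one falling behind
```

A sensible alert threshold is expressed in time, not messages: if worst-case lag divided by consumption rate approaches the retention period, unread data is about to be deleted.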

---

## Trade-Offs Matrix

| Dimension | Message Queue (RabbitMQ/SQS) | Event Stream (Kafka/Kinesis) |
|---|---|---|
| **Data model** | Transient message, deleted after ack | Immutable log, retained by policy |
| **Consumption** | Competing consumers (1 msg = 1 consumer) | Independent consumer groups (1 msg = N consumers) |
| **Ordering** | FIFO per queue (with caveats) | Ordered per partition only |
| **Replay** | Not possible (message gone after ack) | Native (reset offset to any point) |
| **Routing** | Rich (exchange types, routing keys, headers) | Coarse (topic-level only) |
| **Throughput** | 50K-100K msg/s (RabbitMQ); unlimited (SQS) | 500K-1M+ msg/s (Kafka with batching) |
| **Latency** | Sub-ms at low throughput (RabbitMQ) | Low-single-digit ms (Kafka), tunable |
| **Delivery guarantees** | At-most-once, at-least-once | At-most-once, at-least-once, exactly-once (within Kafka) |
| **Back-pressure** | Broker-managed (queue depth limits, publisher blocking) | Consumer-managed (lag monitoring, no producer throttling) |
| **Priority** | Native support (RabbitMQ priority queues) | Not supported |
| **Delayed messages** | Supported (plugins, DelaySeconds) | Not natively supported |
| **Operational complexity** | Low-moderate (single Erlang node to HA cluster) | High (partitions, replication, ISR, consumer rebalancing) |
| **Scaling model** | Add consumers freely; queue is the bottleneck | Add partitions (but cannot reduce); consumers <= partitions |
| **Storage cost** | Low (messages deleted after processing) | High (all messages retained for retention period) |
| **Protocol** | AMQP, MQTT, STOMP (RabbitMQ); HTTP (SQS) | Custom binary protocol (Kafka); HTTP (Kinesis) |
| **Best for** | Task distribution, RPC, routing | Event sourcing, multi-consumer pipelines, analytics |

---

## Evolution Path

### Phase 1: Database-Backed Jobs (0-1,000 msg/s)

Start here. Use your existing database with a job processing library.

- **Node.js**: `graphile-worker` (Postgres), `BullMQ` (Redis)
- **Python**: `Celery` (Redis/RabbitMQ backend), `Dramatiq`, `Huey`
- **Ruby**: `Sidekiq` (Redis), `GoodJob` (Postgres)
- **Go**: `River` (Postgres), `Asynq` (Redis)
- **Elixir**: `Oban` (Postgres)
- **.NET**: `Hangfire` (SQL Server/Redis)

Benefits: no new infrastructure, transactional enqueue (enqueue a job in the same transaction as your database write), familiar debugging (SQL queries to inspect job state).

Limitations: database becomes the bottleneck above ~1,000-5,000 jobs/second; no pub/sub; no replay; polling-based consumption adds latency.

### Phase 2: Dedicated Message Broker (1,000-50,000 msg/s)

When you outgrow database-backed jobs or need pub/sub, routing, or multiple consumer patterns.

**RabbitMQ** for: flexible routing, request-reply, priority queues, mixed protocols (AMQP + MQTT for IoT). Operational model: Erlang cluster with quorum queues for durability. Modern RabbitMQ (3.13+, now 4.x) with quorum queues and streams is significantly more reliable than classic mirrored queues.

**Managed SQS/SNS** for: serverless architectures on AWS, zero-ops queue with unlimited throughput, fan-out via SNS-to-SQS subscriptions. Trade-off: 256KB message size limit, at-least-once delivery only (FIFO queues are limited to 300 msg/s without batching, or 3,000 msg/s with batching).

**NATS** for: lightweight, high-performance pub/sub with minimal operational overhead. NATS server is a single static binary (no JVM, no Erlang). NATS JetStream adds persistence, replay, and exactly-once delivery. Ideal for edge computing, IoT, and Kubernetes-native microservices. Throughput: 200K-400K msg/s with JetStream persistence.

### Phase 3: Event Streaming Platform (50,000+ msg/s or multi-consumer requirements)

When you need replay, multiple independent consumers, stream processing, or high-throughput event pipelines.

**Apache Kafka** for: the de facto standard in event streaming. Massive ecosystem (Kafka Connect, Kafka Streams, ksqlDB, Schema Registry). Production-proven at LinkedIn (7 trillion messages/day), PayPal (1 trillion messages/day), and Netflix. KRaft mode (replacing ZooKeeper, GA since Kafka 3.3) simplifies operations.

**Amazon Kinesis Data Streams** for: AWS-native event streaming without managing Kafka clusters. Auto-scaling with on-demand capacity mode. Integrates with Lambda, Firehose, Analytics. Trade-off: 1MB/s per shard ingestion, 2MB/s per shard consumption; less ecosystem than Kafka.

**Redpanda** for: Kafka API-compatible but written in C++ (no JVM). Lower tail latency, simpler operations (no ZooKeeper, no JVM tuning). Drop-in replacement for Kafka in most cases.

**Apache Pulsar** for: unified messaging (queues + streams in one system), multi-tenancy, geo-replication. Separates compute (brokers) from storage (BookKeeper), enabling independent scaling. More complex to operate than Kafka but more flexible.

### Phase 4: Hybrid Patterns (enterprise scale)
|
|
248
|
+
|
|
249
|
+
At scale, most organizations use BOTH queues and streams. Common pattern:
|
|
250
|
+
|
|
251
|
+
1. **Event stream** as the backbone: all domain events flow through Kafka topics
|
|
252
|
+
2. **Queue for task distribution**: a consumer reads from Kafka and enqueues specific tasks into SQS/RabbitMQ for worker pools
|
|
253
|
+
3. **CDC pipeline**: Debezium captures database changes into Kafka; downstream services consume change events
|
|
254
|
+
4. **Stream processing layer**: Kafka Streams or Flink for real-time aggregations, feeding results back into Kafka or a database
|
|
255
|
+
|
|
256
|
+
This is not over-engineering at scale — it is separation of concerns. The stream provides the durable event log; the queue provides work distribution. Each tool does what it does best.
|
|
257
|
+
|
|
258
|
+
---
|
|
259
|
+
|
|
260
|
+
## Failure Modes
|
|
261
|
+
|
|
262
|
+
### Consumer Lag (Streams)
|
|
263
|
+
|
|
264
|
+
**What happens:** Consumer group falls behind the latest offset. If lag exceeds retention, unprocessed messages are deleted — silent data loss.
|
|
265
|
+
|
|
266
|
+
**Causes:** Consumer processing is too slow; consumer crashed and wasn't restarted; rebalancing storms after deployment; external dependency (database, API) is slow.
|
|
267
|
+
|
|
268
|
+
**Detection:** Monitor `records-lag-max` and `records-lead-min` (Kafka JMX metrics), or use Burrow (LinkedIn's consumer lag monitoring tool). Alert when lag exceeds a threshold relative to your retention period.

**Mitigation:** Scale consumers (up to partition count); increase retention temporarily; optimize consumer processing; use back-pressure to slow producers if needed.

### Poison Messages (Queues and Streams)

**What happens:** A message that cannot be processed (malformed data, a schema mismatch, or a payload that triggers a bug) is retried infinitely, blocking the queue or partition.

**Queue mitigation:** Configure a max retry count and a dead-letter queue. After N failures, the message moves to the DLQ for manual inspection. SQS: set `maxReceiveCount` on the redrive policy. RabbitMQ: use `x-death` headers and dead-letter exchanges.
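
For SQS, the redrive policy is a JSON string queue attribute. A sketch — the ARN is a placeholder, and the helper name is ours; the real SQS quirk shown here is that `maxReceiveCount` must be a string inside the JSON:

```python
import json

def redrive_policy(dlq_arn: str, max_receive_count: int = 5) -> dict:
    # SQS expects RedrivePolicy as a JSON *string* attribute, with
    # maxReceiveCount serialized as a string inside it.
    return {
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            "maxReceiveCount": str(max_receive_count),
        })
    }

attrs = redrive_policy("arn:aws:sqs:us-east-1:123456789012:email-jobs-dlq")
# boto3 (sketch): sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=attrs)
```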

**Stream mitigation:** Harder because the consumer must advance its offset. Options: (1) log the error and skip (advance offset); (2) publish to a dead-letter topic; (3) use a circuit breaker that pauses consumption and alerts. Never allow a poison message to block an entire partition indefinitely.
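
Option (2) can be wrapped around any consumer loop. A sketch — `process` and `publish_dlq` are application-supplied callables, and the helper name is ours. The crucial detail is that the caller commits the offset in *both* outcomes, so the partition is never blocked:

```python
def handle_with_dlq(message, process, publish_dlq, max_attempts=3):
    """Try a message a bounded number of times, then dead-letter it."""
    last_err = None
    for _attempt in range(max_attempts):
        try:
            process(message)
            return True           # processed: caller commits the offset
        except Exception as err:  # narrow this to expected errors in real code
            last_err = err
    publish_dlq(message, reason=repr(last_err))
    return False                  # dead-lettered: caller STILL commits
```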

### Partition Hot Spots (Streams)

**What happens:** One partition receives disproportionately more messages than others because the partition key has skewed distribution (e.g., one customer generates 80% of events).

**Symptoms:** One consumer in the group is overwhelmed while others are idle. Lag grows on the hot partition only.

**Mitigation:** Choose partition keys with high cardinality and even distribution. If a single entity generates massive traffic, use a compound key (e.g., `customer-123-shard-N` with random suffix) to spread its events across partitions — but this sacrifices per-entity ordering. Alternatively, increase partition count (but you cannot decrease it in Kafka).
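
The compound-key trick can be sketched in a few lines. Function names are ours; the second variant is an assumption worth stating: hashing a stable sub-identifier (device, session) preserves ordering per sub-entity while still spreading the hot entity:

```python
import random
import zlib

def shard_key(entity_id: str, n_shards: int) -> str:
    # Random suffix: even spread, but this entity's events now interleave
    # across partitions (per-entity ordering is lost).
    return f"{entity_id}-shard-{random.randrange(n_shards)}"

def sticky_shard_key(entity_id: str, sub_id: str, n_shards: int) -> str:
    # Deterministic variant: ordering survives per sub-entity (e.g. per
    # device or session). crc32 is used because Python's hash() is salted
    # per process and would route inconsistently across producers.
    return f"{entity_id}-shard-{zlib.crc32(sub_id.encode()) % n_shards}"
```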

### Message Loss

**Queue message loss scenarios:**
- Consumer acks before processing completes, then crashes (pre-ack loss)
- Queue exceeds max length with `drop-head` overflow policy (silent discard)
- SQS message exceeds 14-day retention without being consumed
- RabbitMQ node failure with non-durable queues (messages in RAM only)

**Stream message loss scenarios:**
- Producer sends with `acks=0` (fire-and-forget) and broker crashes
- Consumer lag exceeds retention period (messages deleted before consumption)
- Under-replicated partitions: broker failure when `min.insync.replicas` is not set correctly
- Unclean leader election: a lagging replica becomes leader and messages on the old leader are lost (disabled by default since Kafka 0.11)

**Prevention:** Always use `acks=all` for Kafka producers in critical paths. Set `min.insync.replicas=2` with replication factor 3. Use quorum queues (not classic mirrored queues) in RabbitMQ. Monitor under-replicated partitions and ISR shrink events.
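
Collected as config sketches (librdkafka option names; values mirror the text and are a starting point, not a universal prescription — replication factor itself is fixed at topic creation, e.g. `NewTopic(..., replication_factor=3)`):

```python
# Producer side: refuse to consider a write "done" until all in-sync
# replicas have it, and let the broker deduplicate retries.
producer_config = {
    "bootstrap.servers": "kafka:9092",
    "acks": "all",
    "enable.idempotence": True,
}

# Topic side: with RF=3 and min ISR 2, one broker can fail without either
# losing acknowledged writes or silently accepting under-replicated ones.
topic_config = {
    "min.insync.replicas": "2",
    "unclean.leader.election.enable": "false",
}
```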

### Consumer Rebalancing Storms (Kafka)

**What happens:** Adding or removing consumers triggers a rebalance — all partitions are reassigned. During rebalance, no consumer processes messages. If a consumer is slow to respond to the rebalance, it gets kicked out, triggering ANOTHER rebalance. This cascading effect can cause minutes of downtime.

**Causes:** Long-running message processing exceeding `max.poll.interval.ms`; rolling deployments that add/remove consumers rapidly; unstable consumers that frequently crash.

**Mitigation:** Use the cooperative sticky assignor (incremental rebalancing, available since Kafka 2.4) instead of eager rebalancing. Increase `max.poll.interval.ms` if processing is legitimately slow. Use static group membership (`group.instance.id`) to avoid rebalance on brief consumer restarts. Deploy consumers with rolling updates that wait for rebalance to complete between pod restarts.
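
The same mitigations as consumer settings (librdkafka option names; the instance id and timeout values are illustrative and should match your workload):

```python
consumer_config = {
    "bootstrap.servers": "kafka:9092",
    "group.id": "fulfillment-service",
    # Incremental rebalancing: only moved partitions stop, not all of them.
    "partition.assignment.strategy": "cooperative-sticky",
    # Static membership: a quick restart under the same instance id does
    # not trigger a rebalance at all.
    "group.instance.id": "fulfillment-pod-0",
    # Allow slow-but-legitimate processing without being kicked out.
    "max.poll.interval.ms": 600_000,  # 10 minutes
    "session.timeout.ms": 45_000,
}
```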

### Split Brain and Network Partitions

**RabbitMQ:** Network partition between cluster nodes can cause split-brain — both sides accept writes, leading to divergent queue state. RabbitMQ's partition handling modes (`pause-minority`, `autoheal`) each have trade-offs. Quorum queues (Raft-based) handle this correctly by requiring majority consensus.

**Kafka:** Less susceptible due to single-leader-per-partition design. The controller detects broker failures and reassigns leadership. With `min.insync.replicas` set correctly, a minority partition cannot accept writes.

---

## Technology Landscape

### Apache Kafka

The industry standard for event streaming. Created at LinkedIn (2011), open-sourced via Apache. Now developed by Confluent (founded by Kafka's creators).

**Strengths:** Highest throughput (millions of msg/s), battle-tested at extreme scale (LinkedIn: 7T msg/day, PayPal: 1T msg/day, Uber, Netflix, Walmart), massive ecosystem (Connect, Streams, ksqlDB, Schema Registry), KRaft mode eliminates ZooKeeper dependency, log compaction enables table semantics.

**Weaknesses:** Operational complexity (JVM tuning, partition management, ISR monitoring), high resource requirements (8+ cores, 64-128GB RAM per broker recommended), no message priority, no per-message routing, no native delayed messages, consumer rebalancing can cause processing pauses.

**Managed offerings:** Confluent Cloud, Amazon MSK, Azure Event Hubs (Kafka-compatible API), Aiven, Upstash (serverless Kafka).

### RabbitMQ

The most widely deployed open-source message broker. Implements AMQP 0.9.1 with extensions. Written in Erlang.

**Strengths:** Flexible routing (4 exchange types), multiple protocols (AMQP, MQTT, STOMP), priority queues, built-in management UI, lower resource requirements, excellent documentation, quorum queues (since 3.8) for strong consistency, RabbitMQ Streams (since 3.9) for log-based consumption.

**Weaknesses:** Lower throughput than Kafka (50K-100K msg/s), Erlang can be challenging to debug and profile, clustering adds complexity, classic mirrored queues are deprecated (use quorum queues), no native exactly-once delivery.

**Note:** RabbitMQ 4.x (released 2024) introduced native AMQP 1.0 support, the Khepri metadata store (replacing Mnesia), and further improvements to quorum queues and streams. The project is actively evolving.

### AWS SQS / SNS / EventBridge / Kinesis

Amazon offers four complementary services, each solving a different problem:

**SQS** — Fully managed queue. Zero ops, virtually unlimited throughput (standard queues), 256KB message limit. Standard queues: at-least-once, best-effort ordering. FIFO queues: exactly-once processing, strict ordering, but limited to 300 msg/s per queue (3,000 with batching). Best for: serverless backends, decoupling microservices on AWS.

**SNS** — Pub/sub topic for fan-out. Pushes to SQS queues, Lambda functions, HTTP endpoints, email, SMS. Often used in the SNS (fan-out) → SQS (queue per consumer) pattern. Up to 12.5M subscriptions per topic.

**Kinesis Data Streams** — Managed event streaming. Shard-based (1MB/s in, 2MB/s out per shard). On-demand mode auto-scales. 24-hour default retention (up to 365 days). Best for: real-time analytics on AWS, Lambda integration, Firehose to S3/Redshift.

**EventBridge** — Serverless event bus. Content-based filtering (match on event fields, not just topics). Native integration with 90+ AWS services and SaaS partners (Shopify, Datadog, Auth0). Schema registry built in. Best for: event-driven architectures with complex routing rules, cross-account event sharing.
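
Content-based filtering is the distinctive feature, so a rule sketch helps: an EventBridge event pattern matches on fields *inside* the event, not just a topic name. The source and field names below are illustrative; `numeric` and `prefix` are EventBridge pattern operators:

```python
import json

# Match only order.placed events over $10,000 from US regions.
high_value_orders = {
    "source": ["app.orders"],
    "detail-type": ["order.placed"],
    "detail": {
        "total": [{"numeric": [">", 10000]}],  # numeric comparison operator
        "region": [{"prefix": "us-"}],         # prefix matching operator
    },
}
# boto3 (sketch): events.put_rule(Name="high-value-orders",
#                                 EventPattern=json.dumps(high_value_orders))
```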

### Google Cloud Pub/Sub

Fully managed, serverless messaging with global reach. Auto-scales without shard/partition management.

**Strengths:** True serverless (no capacity planning), global message routing, per-message acknowledgment with dead-letter topics, exactly-once delivery within Google Cloud, seek-to-timestamp for replay, schema enforcement via Schema Registry.

**Weaknesses:** No ordering guarantee by default (ordering keys are opt-in and limited to 1MB/s per ordering key), higher per-message cost than self-managed Kafka at high volume, vendor lock-in, 7-31 day retention only.

**Best for:** Google Cloud-native applications, global event distribution, teams that want zero operational overhead.

### NATS / NATS JetStream

Lightweight, high-performance messaging system. Originally pure pub/sub (fire-and-forget). JetStream (added in NATS 2.2) adds persistence, replay, exactly-once semantics, and key-value storage.

**Strengths:** Single static binary (12MB, no JVM), sub-millisecond latency, minimal resource footprint (runs on a Raspberry Pi), 200K-400K msg/s with JetStream, built-in WebSocket support, Kubernetes-native (NATS is the default messaging layer for many cloud-native projects), supports queue groups (competing consumers), request-reply, and pub/sub.

**Weaknesses:** Smaller ecosystem than Kafka, less mature stream processing tooling, JetStream is newer and less battle-tested at extreme scale, fewer managed offerings.

**Best for:** Edge computing, IoT, Kubernetes microservices, applications that need low latency and low operational overhead, polyglot environments (clients in 40+ languages).

### Redis Streams

Append-only log data structure built into Redis (since 5.0). Supports consumer groups, acknowledgments, and persistence (AOF/RDB).

**Strengths:** Sub-millisecond latency (in-memory), zero additional infrastructure if you already use Redis, consumer groups with a pending entries list (PEL), XRANGE for replay, simple API (XADD, XREAD, XREADGROUP, XACK), capped streams for bounded memory usage.

**Weaknesses:** Limited durability (Redis persistence is asynchronous by default; data loss is possible on crash), single-node throughput ceiling, no built-in partitioning across nodes (Redis Cluster shards by key, not by stream), no ecosystem for connectors or stream processing, retention is memory-bound.

**Best for:** Lightweight event streaming when Redis is already in the stack, real-time features (activity feeds, notifications), use cases where sub-millisecond latency matters more than strong durability.

### BullMQ / Celery / Sidekiq (Application-Level Job Queues)

Not message brokers — these are libraries that use Redis or a database as a backend. They provide higher-level abstractions: job scheduling, retries with backoff, rate limiting, dashboard UIs, cron jobs, job priority, and job dependencies.

**BullMQ** (Node.js/TypeScript + Redis): Successor to Bull. Supports delayed jobs, rate limiting, job dependencies, sandboxed processors. Excellent for Node.js backends.

**Celery** (Python + Redis/RabbitMQ): The standard Python task queue. Supports periodic tasks (Celery Beat), result backends, task chaining, groups, and chords. Mature but complex to configure.

**Sidekiq** (Ruby + Redis): The standard Ruby background job processor. Simple API, web dashboard, Sidekiq Pro/Enterprise for reliability features (unique jobs, batches, rate limiting).

**Best for:** Application-level background jobs where you want a high-level API and don't need cross-service messaging or event streaming.

---

## Decision Tree

```
START: Do you need asynchronous communication between components?
│
├─ NO → Use direct HTTP/gRPC calls. You don't need messaging.
│
└─ YES → Is this within a single application (background jobs)?
    │
    ├─ YES → Is volume < 5,000 jobs/minute?
    │    │
    │    ├─ YES → Use a database- or Redis-backed job queue
    │    │        (BullMQ, Celery, Sidekiq, Oban, graphile-worker)
    │    │
    │    └─ NO → Do you need complex routing or priority?
    │         │
    │         ├─ YES → Use RabbitMQ or managed SQS
    │         └─ NO → Use a Redis-backed job queue (BullMQ, Sidekiq)
    │
    └─ NO → Cross-service communication. Do multiple independent
            consumers need the same events?
         │
         ├─ NO → Each message processed by one consumer only?
         │    │
         │    ├─ YES → Do you need flexible routing, priority, or RPC?
         │    │    │
         │    │    ├─ YES → RabbitMQ (self-managed) or SQS + SNS (AWS)
         │    │    └─ NO → SQS (AWS) or NATS (self-managed, lightweight)
         │    │
         │    └─ (Unclear) → Start with a queue; migrate to a stream if
         │                   multi-consumer needs emerge
         │
         └─ YES → Do you need replay, retention, or stream processing?
              │
              ├─ YES → Event streaming platform
              │    │
              │    ├─ AWS-native? → Kinesis Data Streams
              │    ├─ GCP-native? → Google Cloud Pub/Sub
              │    ├─ Want zero JVM ops? → Redpanda or NATS JetStream
              │    └─ Need largest ecosystem? → Apache Kafka (or Confluent Cloud)
              │
              └─ NO → Fan-out with independent queues may suffice
                      (SNS → SQS, RabbitMQ fanout exchange)
```

---

## Implementation Sketch

### Queue Pattern: Background Job with Dead-Letter Handling (RabbitMQ)

```python
# Producer: enqueue a job
import json

import pika

connection = pika.BlockingConnection(pika.ConnectionParameters('localhost'))
channel = connection.channel()

# Declare the work queue with a dead-letter exchange
channel.queue_declare(
    queue='email_jobs',
    durable=True,
    arguments={
        'x-dead-letter-exchange': 'dlx',
        'x-dead-letter-routing-key': 'email_jobs.dead',
        'x-message-ttl': 60000,   # 60s TTL
        'x-max-length': 100000,   # Max queue depth
    }
)

# Publish with persistence
channel.basic_publish(
    exchange='',
    routing_key='email_jobs',
    body=json.dumps({'to': 'user@example.com', 'template': 'welcome'}),
    properties=pika.BasicProperties(
        delivery_mode=2,  # persistent
        content_type='application/json',
    )
)
```

```python
# Consumer: process jobs with manual acknowledgment.
# TransientError, PermanentError, and send_email are application-defined.
import json

def process_email(ch, method, properties, body):
    try:
        job = json.loads(body)
        send_email(job['to'], job['template'])
        ch.basic_ack(delivery_tag=method.delivery_tag)
    except TransientError:
        # Requeue for retry. Note: classic queues redeliver without limit;
        # quorum queues can cap redeliveries via x-delivery-limit.
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
    except PermanentError:
        # Reject without requeue — goes to DLQ
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)

channel.basic_qos(prefetch_count=10)  # Don't overwhelm the consumer
channel.basic_consume(queue='email_jobs', on_message_callback=process_email)
channel.start_consuming()
```

### Stream Pattern: Multi-Consumer Event Pipeline (Kafka)

```python
# Producer: publish domain events
import json
import logging
from datetime import datetime

from confluent_kafka import Producer

log = logging.getLogger(__name__)

producer = Producer({
    'bootstrap.servers': 'kafka:9092',
    'acks': 'all',               # Wait for all ISR replicas
    'enable.idempotence': True,  # Deduplicate retries
    'max.in.flight.requests.per.connection': 5,  # Safe with idempotence
    'compression.type': 'lz4',
})

def publish_order_event(order):
    producer.produce(
        topic='orders',
        key=str(order['customer_id']),  # Partition by customer
        value=json.dumps({
            'event': 'order.placed',
            'order_id': order['id'],
            'items': order['items'],
            'total': order['total'],
            'timestamp': datetime.utcnow().isoformat(),
        }),
        on_delivery=lambda err, msg: log.error(f'Failed: {err}') if err else None,
    )
    producer.flush()  # Ensure delivery (or use poll() in a loop)
```

```python
# Consumer Group A: Fulfillment service.
# handle_error, already_processed, create_shipment, and mark_processed
# are application-defined.
import json

from confluent_kafka import Consumer

consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'fulfillment-service',
    'auto.offset.reset': 'earliest',
    'enable.auto.commit': False,  # Manual commit for at-least-once
    'partition.assignment.strategy': 'cooperative-sticky',
})

consumer.subscribe(['orders'])

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None:
        continue
    if msg.error():
        handle_error(msg.error())
        continue

    event = json.loads(msg.value())
    if event['event'] == 'order.placed':
        # Idempotent processing: use order_id as dedup key
        if not already_processed(event['order_id']):
            create_shipment(event)
            mark_processed(event['order_id'])

    consumer.commit(asynchronous=False)  # Commit after processing
```

```python
# Consumer Group B: Analytics pipeline (completely independent)
analytics_consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'analytics-pipeline',  # Different group = independent
    'auto.offset.reset': 'earliest',
})

analytics_consumer.subscribe(['orders'])
# This consumer reads ALL the same messages independently.
# It can lag behind or be ahead of the fulfillment consumer.
```

### Hybrid Pattern: Kafka Stream to SQS Worker Queue

```python
# Bridge: Kafka consumer that fans out specific events to SQS
import json

import boto3
from confluent_kafka import Consumer

sqs = boto3.client('sqs')
consumer = Consumer({
    'bootstrap.servers': 'kafka:9092',
    'group.id': 'sqs-bridge',
    'enable.auto.commit': False,
})
consumer.subscribe(['orders'])

while True:
    msg = consumer.poll(timeout=1.0)
    if msg is None or msg.error():
        continue

    event = json.loads(msg.value())

    # Route high-value orders to a priority SQS queue for manual review
    if event['event'] == 'order.placed' and event['total'] > 10000:
        sqs.send_message(
            QueueUrl='https://sqs.us-east-1.amazonaws.com/123/high-value-orders.fifo',
            MessageBody=json.dumps(event),
            MessageAttributes={
                'OrderId': {'StringValue': event['order_id'], 'DataType': 'String'}
            },
            # The Kafka message key carries the customer_id; use it as the
            # FIFO message group so ordering holds per customer.
            MessageGroupId=msg.key().decode(),
            MessageDeduplicationId=event['order_id'],
        )

    consumer.commit(asynchronous=False)
```

---

## Real-World Case Studies

**LinkedIn** — Created Kafka to solve the problem of moving data between systems at scale. As of 2024, LinkedIn processes over 7 trillion messages per day across Kafka clusters, powering activity feeds, metrics, messaging, and the entire data pipeline from operational databases to the data warehouse.

**PayPal** — Processes approximately 1 trillion Kafka messages per day for real-time fraud detection, transaction processing, and risk analysis. The event streaming architecture enables sub-second fraud scoring on every payment transaction globally.

**Walmart** — Uses Kafka for real-time inventory tracking across thousands of stores, dynamic pricing adjustments based on demand signals, and customer behavior analytics. The streaming architecture replaced batch ETL processes that previously ran overnight, enabling real-time supply chain visibility.

**Uber** — Runs one of the largest Kafka deployments in the world, powering real-time trip tracking, surge pricing calculations, driver-rider matching, and the logging infrastructure. Uber contributed the Kafka consumer rebalance improvements that became the cooperative-sticky assignor.

**Netflix** — Uses a combination of Kafka and SQS. Kafka serves as the backbone for real-time event processing (viewing history, recommendations, A/B test events). SQS handles task distribution for encoding pipeline jobs, where each movie/show is encoded into multiple formats by competing worker instances.

**Shopify** — Migrated from RabbitMQ to Kafka for their core event platform when they needed multi-consumer access to the same events. They still use background job queues (Sidekiq) for merchant-facing task processing. This hybrid approach — streams for events, queues for jobs — is a recurring pattern at scale.

---

## Cross-References

- **[Event-Driven Architecture](../patterns/event-driven.md)** — The architectural style that event streams enable; covers choreography vs. orchestration, saga patterns, and event schema evolution
- **[Microservices](../patterns/microservices.md)** — Queues and streams are the communication backbone; covers service decomposition, API gateway patterns, and service mesh
- **[Idempotency and Retry](../integration/idempotency-and-retry.md)** — Essential for at-least-once delivery; covers idempotency keys, retry with exponential backoff, and deduplication strategies
- **[Data Consistency](../data/data-consistency.md)** — Queues and streams introduce eventual consistency; covers saga patterns, outbox pattern, and compensating transactions