@wazir-dev/cli 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries.
- package/AGENTS.md +111 -0
- package/CHANGELOG.md +14 -0
- package/CONTRIBUTING.md +101 -0
- package/LICENSE +21 -0
- package/README.md +314 -0
- package/assets/composition-engine.mmd +34 -0
- package/assets/demo-script.sh +17 -0
- package/assets/logo-dark.svg +14 -0
- package/assets/logo.svg +14 -0
- package/assets/pipeline.mmd +39 -0
- package/assets/record-demo.sh +51 -0
- package/docs/README.md +51 -0
- package/docs/adapters/context-mode.md +60 -0
- package/docs/concepts/architecture.md +87 -0
- package/docs/concepts/artifact-model.md +60 -0
- package/docs/concepts/composition-engine.md +36 -0
- package/docs/concepts/indexing-and-recall.md +160 -0
- package/docs/concepts/observability.md +41 -0
- package/docs/concepts/roles-and-workflows.md +59 -0
- package/docs/concepts/terminology-policy.md +27 -0
- package/docs/getting-started/01-installation.md +78 -0
- package/docs/getting-started/02-first-run.md +102 -0
- package/docs/getting-started/03-adding-to-project.md +15 -0
- package/docs/getting-started/04-host-setup.md +15 -0
- package/docs/guides/ci-integration.md +15 -0
- package/docs/guides/creating-skills.md +15 -0
- package/docs/guides/expertise-module-authoring.md +15 -0
- package/docs/guides/hook-development.md +15 -0
- package/docs/guides/memory-and-learnings.md +34 -0
- package/docs/guides/multi-host-export.md +15 -0
- package/docs/guides/troubleshooting.md +101 -0
- package/docs/guides/writing-custom-roles.md +15 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-design.md +592 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +598 -0
- package/docs/plans/2026-03-15-docs-enforcement-plan.md +238 -0
- package/docs/readmes/INDEX.md +99 -0
- package/docs/readmes/features/expertise/README.md +171 -0
- package/docs/readmes/features/exports/README.md +222 -0
- package/docs/readmes/features/hooks/README.md +103 -0
- package/docs/readmes/features/hooks/loop-cap-guard.md +133 -0
- package/docs/readmes/features/hooks/post-tool-capture.md +121 -0
- package/docs/readmes/features/hooks/post-tool-lint.md +130 -0
- package/docs/readmes/features/hooks/pre-compact-summary.md +122 -0
- package/docs/readmes/features/hooks/pre-tool-capture-route.md +100 -0
- package/docs/readmes/features/hooks/protected-path-write-guard.md +128 -0
- package/docs/readmes/features/hooks/session-start.md +119 -0
- package/docs/readmes/features/hooks/stop-handoff-harvest.md +125 -0
- package/docs/readmes/features/roles/README.md +157 -0
- package/docs/readmes/features/roles/clarifier.md +152 -0
- package/docs/readmes/features/roles/content-author.md +190 -0
- package/docs/readmes/features/roles/designer.md +193 -0
- package/docs/readmes/features/roles/executor.md +184 -0
- package/docs/readmes/features/roles/learner.md +210 -0
- package/docs/readmes/features/roles/planner.md +182 -0
- package/docs/readmes/features/roles/researcher.md +164 -0
- package/docs/readmes/features/roles/reviewer.md +184 -0
- package/docs/readmes/features/roles/specifier.md +162 -0
- package/docs/readmes/features/roles/verifier.md +215 -0
- package/docs/readmes/features/schemas/README.md +178 -0
- package/docs/readmes/features/skills/README.md +63 -0
- package/docs/readmes/features/skills/brainstorming.md +96 -0
- package/docs/readmes/features/skills/debugging.md +148 -0
- package/docs/readmes/features/skills/design.md +120 -0
- package/docs/readmes/features/skills/prepare-next.md +109 -0
- package/docs/readmes/features/skills/run-audit.md +159 -0
- package/docs/readmes/features/skills/scan-project.md +109 -0
- package/docs/readmes/features/skills/self-audit.md +176 -0
- package/docs/readmes/features/skills/tdd.md +137 -0
- package/docs/readmes/features/skills/using-skills.md +92 -0
- package/docs/readmes/features/skills/verification.md +120 -0
- package/docs/readmes/features/skills/writing-plans.md +104 -0
- package/docs/readmes/features/tooling/README.md +320 -0
- package/docs/readmes/features/workflows/README.md +186 -0
- package/docs/readmes/features/workflows/author.md +181 -0
- package/docs/readmes/features/workflows/clarify.md +154 -0
- package/docs/readmes/features/workflows/design-review.md +171 -0
- package/docs/readmes/features/workflows/design.md +169 -0
- package/docs/readmes/features/workflows/discover.md +162 -0
- package/docs/readmes/features/workflows/execute.md +173 -0
- package/docs/readmes/features/workflows/learn.md +167 -0
- package/docs/readmes/features/workflows/plan-review.md +165 -0
- package/docs/readmes/features/workflows/plan.md +170 -0
- package/docs/readmes/features/workflows/prepare-next.md +167 -0
- package/docs/readmes/features/workflows/review.md +169 -0
- package/docs/readmes/features/workflows/run-audit.md +191 -0
- package/docs/readmes/features/workflows/spec-challenge.md +159 -0
- package/docs/readmes/features/workflows/specify.md +160 -0
- package/docs/readmes/features/workflows/verify.md +177 -0
- package/docs/readmes/packages/README.md +50 -0
- package/docs/readmes/packages/ajv.md +117 -0
- package/docs/readmes/packages/context-mode.md +118 -0
- package/docs/readmes/packages/gray-matter.md +116 -0
- package/docs/readmes/packages/node-test.md +137 -0
- package/docs/readmes/packages/yaml.md +112 -0
- package/docs/reference/configuration-reference.md +159 -0
- package/docs/reference/expertise-index.md +52 -0
- package/docs/reference/git-flow.md +43 -0
- package/docs/reference/hooks.md +87 -0
- package/docs/reference/host-exports.md +50 -0
- package/docs/reference/launch-checklist.md +172 -0
- package/docs/reference/marketplace-listings.md +76 -0
- package/docs/reference/release-process.md +34 -0
- package/docs/reference/roles-reference.md +77 -0
- package/docs/reference/skills.md +33 -0
- package/docs/reference/templates.md +29 -0
- package/docs/reference/tooling-cli.md +94 -0
- package/docs/truth-claims.yaml +222 -0
- package/expertise/PROGRESS.md +63 -0
- package/expertise/README.md +18 -0
- package/expertise/antipatterns/PROGRESS.md +56 -0
- package/expertise/antipatterns/backend/api-design-antipatterns.md +1271 -0
- package/expertise/antipatterns/backend/auth-antipatterns.md +1195 -0
- package/expertise/antipatterns/backend/caching-antipatterns.md +622 -0
- package/expertise/antipatterns/backend/database-antipatterns.md +1038 -0
- package/expertise/antipatterns/backend/index.md +24 -0
- package/expertise/antipatterns/backend/microservices-antipatterns.md +850 -0
- package/expertise/antipatterns/code/architecture-antipatterns.md +919 -0
- package/expertise/antipatterns/code/async-antipatterns.md +622 -0
- package/expertise/antipatterns/code/code-smells.md +1186 -0
- package/expertise/antipatterns/code/dependency-antipatterns.md +1209 -0
- package/expertise/antipatterns/code/error-handling-antipatterns.md +1360 -0
- package/expertise/antipatterns/code/index.md +27 -0
- package/expertise/antipatterns/code/naming-and-abstraction.md +1118 -0
- package/expertise/antipatterns/code/state-management-antipatterns.md +1076 -0
- package/expertise/antipatterns/code/testing-antipatterns.md +1053 -0
- package/expertise/antipatterns/design/accessibility-antipatterns.md +1136 -0
- package/expertise/antipatterns/design/dark-patterns.md +1121 -0
- package/expertise/antipatterns/design/index.md +22 -0
- package/expertise/antipatterns/design/ui-antipatterns.md +1202 -0
- package/expertise/antipatterns/design/ux-antipatterns.md +680 -0
- package/expertise/antipatterns/frontend/css-layout-antipatterns.md +691 -0
- package/expertise/antipatterns/frontend/flutter-antipatterns.md +1827 -0
- package/expertise/antipatterns/frontend/index.md +23 -0
- package/expertise/antipatterns/frontend/mobile-antipatterns.md +573 -0
- package/expertise/antipatterns/frontend/react-antipatterns.md +1128 -0
- package/expertise/antipatterns/frontend/spa-antipatterns.md +1235 -0
- package/expertise/antipatterns/index.md +31 -0
- package/expertise/antipatterns/performance/index.md +20 -0
- package/expertise/antipatterns/performance/performance-antipatterns.md +1013 -0
- package/expertise/antipatterns/performance/premature-optimization.md +623 -0
- package/expertise/antipatterns/performance/scaling-antipatterns.md +785 -0
- package/expertise/antipatterns/process/ai-coding-antipatterns.md +853 -0
- package/expertise/antipatterns/process/code-review-antipatterns.md +656 -0
- package/expertise/antipatterns/process/deployment-antipatterns.md +920 -0
- package/expertise/antipatterns/process/index.md +23 -0
- package/expertise/antipatterns/process/technical-debt-antipatterns.md +647 -0
- package/expertise/antipatterns/security/index.md +20 -0
- package/expertise/antipatterns/security/secrets-antipatterns.md +849 -0
- package/expertise/antipatterns/security/security-theater.md +843 -0
- package/expertise/antipatterns/security/vulnerability-patterns.md +801 -0
- package/expertise/architecture/PROGRESS.md +70 -0
- package/expertise/architecture/data/caching-architecture.md +671 -0
- package/expertise/architecture/data/data-consistency.md +574 -0
- package/expertise/architecture/data/data-modeling.md +536 -0
- package/expertise/architecture/data/event-streams-and-queues.md +634 -0
- package/expertise/architecture/data/index.md +25 -0
- package/expertise/architecture/data/search-architecture.md +663 -0
- package/expertise/architecture/data/sql-vs-nosql.md +708 -0
- package/expertise/architecture/decisions/architecture-decision-records.md +640 -0
- package/expertise/architecture/decisions/build-vs-buy.md +616 -0
- package/expertise/architecture/decisions/index.md +23 -0
- package/expertise/architecture/decisions/monolith-to-microservices.md +790 -0
- package/expertise/architecture/decisions/technology-selection.md +616 -0
- package/expertise/architecture/distributed/cap-theorem-and-tradeoffs.md +800 -0
- package/expertise/architecture/distributed/circuit-breaker-bulkhead.md +741 -0
- package/expertise/architecture/distributed/consensus-and-coordination.md +796 -0
- package/expertise/architecture/distributed/distributed-systems-fundamentals.md +564 -0
- package/expertise/architecture/distributed/idempotency-and-retry.md +796 -0
- package/expertise/architecture/distributed/index.md +25 -0
- package/expertise/architecture/distributed/saga-pattern.md +797 -0
- package/expertise/architecture/foundations/architectural-thinking.md +460 -0
- package/expertise/architecture/foundations/coupling-and-cohesion.md +770 -0
- package/expertise/architecture/foundations/design-principles-solid.md +649 -0
- package/expertise/architecture/foundations/domain-driven-design.md +719 -0
- package/expertise/architecture/foundations/index.md +25 -0
- package/expertise/architecture/foundations/separation-of-concerns.md +472 -0
- package/expertise/architecture/foundations/twelve-factor-app.md +797 -0
- package/expertise/architecture/index.md +34 -0
- package/expertise/architecture/integration/api-design-graphql.md +638 -0
- package/expertise/architecture/integration/api-design-grpc.md +804 -0
- package/expertise/architecture/integration/api-design-rest.md +892 -0
- package/expertise/architecture/integration/index.md +25 -0
- package/expertise/architecture/integration/third-party-integration.md +795 -0
- package/expertise/architecture/integration/webhooks-and-callbacks.md +1152 -0
- package/expertise/architecture/integration/websockets-realtime.md +791 -0
- package/expertise/architecture/mobile-architecture/index.md +22 -0
- package/expertise/architecture/mobile-architecture/mobile-app-architecture.md +780 -0
- package/expertise/architecture/mobile-architecture/mobile-backend-for-frontend.md +670 -0
- package/expertise/architecture/mobile-architecture/offline-first.md +719 -0
- package/expertise/architecture/mobile-architecture/push-and-sync.md +782 -0
- package/expertise/architecture/patterns/cqrs-event-sourcing.md +717 -0
- package/expertise/architecture/patterns/event-driven.md +797 -0
- package/expertise/architecture/patterns/hexagonal-clean-architecture.md +870 -0
- package/expertise/architecture/patterns/index.md +27 -0
- package/expertise/architecture/patterns/layered-architecture.md +736 -0
- package/expertise/architecture/patterns/microservices.md +753 -0
- package/expertise/architecture/patterns/modular-monolith.md +692 -0
- package/expertise/architecture/patterns/monolith.md +626 -0
- package/expertise/architecture/patterns/plugin-architecture.md +735 -0
- package/expertise/architecture/patterns/serverless.md +780 -0
- package/expertise/architecture/scaling/database-scaling.md +615 -0
- package/expertise/architecture/scaling/feature-flags-and-rollouts.md +757 -0
- package/expertise/architecture/scaling/horizontal-vs-vertical.md +606 -0
- package/expertise/architecture/scaling/index.md +24 -0
- package/expertise/architecture/scaling/multi-tenancy.md +800 -0
- package/expertise/architecture/scaling/stateless-design.md +787 -0
- package/expertise/backend/embedded-firmware.md +625 -0
- package/expertise/backend/go.md +853 -0
- package/expertise/backend/index.md +24 -0
- package/expertise/backend/java-spring.md +448 -0
- package/expertise/backend/node-typescript.md +625 -0
- package/expertise/backend/python-fastapi.md +724 -0
- package/expertise/backend/rust.md +458 -0
- package/expertise/backend/solidity.md +711 -0
- package/expertise/composition-map.yaml +443 -0
- package/expertise/content/foundations/content-modeling.md +395 -0
- package/expertise/content/foundations/editorial-standards.md +449 -0
- package/expertise/content/foundations/index.md +24 -0
- package/expertise/content/foundations/microcopy.md +455 -0
- package/expertise/content/foundations/terminology-governance.md +509 -0
- package/expertise/content/index.md +34 -0
- package/expertise/content/patterns/accessibility-copy.md +518 -0
- package/expertise/content/patterns/index.md +24 -0
- package/expertise/content/patterns/notification-content.md +433 -0
- package/expertise/content/patterns/sample-content.md +486 -0
- package/expertise/content/patterns/state-copy.md +439 -0
- package/expertise/design/PROGRESS.md +58 -0
- package/expertise/design/disciplines/dark-mode-theming.md +577 -0
- package/expertise/design/disciplines/design-systems.md +595 -0
- package/expertise/design/disciplines/index.md +25 -0
- package/expertise/design/disciplines/information-architecture.md +800 -0
- package/expertise/design/disciplines/interaction-design.md +788 -0
- package/expertise/design/disciplines/responsive-design.md +552 -0
- package/expertise/design/disciplines/usability-testing.md +516 -0
- package/expertise/design/disciplines/user-research.md +792 -0
- package/expertise/design/foundations/accessibility-design.md +796 -0
- package/expertise/design/foundations/color-theory.md +797 -0
- package/expertise/design/foundations/iconography.md +795 -0
- package/expertise/design/foundations/index.md +26 -0
- package/expertise/design/foundations/motion-and-animation.md +653 -0
- package/expertise/design/foundations/rtl-design.md +585 -0
- package/expertise/design/foundations/spacing-and-layout.md +607 -0
- package/expertise/design/foundations/typography.md +800 -0
- package/expertise/design/foundations/visual-hierarchy.md +761 -0
- package/expertise/design/index.md +32 -0
- package/expertise/design/patterns/authentication-flows.md +474 -0
- package/expertise/design/patterns/content-consumption.md +789 -0
- package/expertise/design/patterns/data-display.md +618 -0
- package/expertise/design/patterns/e-commerce.md +1494 -0
- package/expertise/design/patterns/feedback-and-states.md +642 -0
- package/expertise/design/patterns/forms-and-input.md +819 -0
- package/expertise/design/patterns/gamification.md +801 -0
- package/expertise/design/patterns/index.md +31 -0
- package/expertise/design/patterns/microinteractions.md +449 -0
- package/expertise/design/patterns/navigation.md +800 -0
- package/expertise/design/patterns/notifications.md +705 -0
- package/expertise/design/patterns/onboarding.md +700 -0
- package/expertise/design/patterns/search-and-filter.md +601 -0
- package/expertise/design/patterns/settings-and-preferences.md +768 -0
- package/expertise/design/patterns/social-and-community.md +748 -0
- package/expertise/design/platforms/desktop-native.md +612 -0
- package/expertise/design/platforms/index.md +25 -0
- package/expertise/design/platforms/mobile-android.md +825 -0
- package/expertise/design/platforms/mobile-cross-platform.md +983 -0
- package/expertise/design/platforms/mobile-ios.md +699 -0
- package/expertise/design/platforms/tablet.md +794 -0
- package/expertise/design/platforms/web-dashboard.md +790 -0
- package/expertise/design/platforms/web-responsive.md +550 -0
- package/expertise/design/psychology/behavioral-nudges.md +449 -0
- package/expertise/design/psychology/cognitive-load.md +1191 -0
- package/expertise/design/psychology/error-psychology.md +778 -0
- package/expertise/design/psychology/index.md +22 -0
- package/expertise/design/psychology/persuasive-design.md +736 -0
- package/expertise/design/psychology/user-mental-models.md +623 -0
- package/expertise/design/tooling/open-pencil.md +266 -0
- package/expertise/frontend/angular.md +1073 -0
- package/expertise/frontend/desktop-electron.md +546 -0
- package/expertise/frontend/flutter.md +782 -0
- package/expertise/frontend/index.md +27 -0
- package/expertise/frontend/native-android.md +409 -0
- package/expertise/frontend/native-ios.md +490 -0
- package/expertise/frontend/react-native.md +1160 -0
- package/expertise/frontend/react.md +808 -0
- package/expertise/frontend/vue.md +1089 -0
- package/expertise/humanize/domain-rules-code.md +79 -0
- package/expertise/humanize/domain-rules-content.md +67 -0
- package/expertise/humanize/domain-rules-technical-docs.md +56 -0
- package/expertise/humanize/index.md +35 -0
- package/expertise/humanize/self-audit-checklist.md +87 -0
- package/expertise/humanize/sentence-patterns.md +218 -0
- package/expertise/humanize/vocabulary-blacklist.md +105 -0
- package/expertise/i18n/PROGRESS.md +65 -0
- package/expertise/i18n/advanced/accessibility-and-i18n.md +28 -0
- package/expertise/i18n/advanced/bidirectional-text-algorithm.md +38 -0
- package/expertise/i18n/advanced/complex-scripts.md +30 -0
- package/expertise/i18n/advanced/performance-and-i18n.md +27 -0
- package/expertise/i18n/advanced/testing-i18n.md +28 -0
- package/expertise/i18n/content/content-adaptation.md +23 -0
- package/expertise/i18n/content/locale-specific-formatting.md +23 -0
- package/expertise/i18n/content/machine-translation-integration.md +28 -0
- package/expertise/i18n/content/translation-management.md +29 -0
- package/expertise/i18n/foundations/date-time-calendars.md +67 -0
- package/expertise/i18n/foundations/i18n-architecture.md +272 -0
- package/expertise/i18n/foundations/locale-and-language-tags.md +79 -0
- package/expertise/i18n/foundations/numbers-currency-units.md +61 -0
- package/expertise/i18n/foundations/pluralization-and-gender.md +109 -0
- package/expertise/i18n/foundations/string-externalization.md +236 -0
- package/expertise/i18n/foundations/text-direction-bidi.md +241 -0
- package/expertise/i18n/foundations/unicode-and-encoding.md +86 -0
- package/expertise/i18n/index.md +38 -0
- package/expertise/i18n/platform/backend-i18n.md +31 -0
- package/expertise/i18n/platform/flutter-i18n.md +148 -0
- package/expertise/i18n/platform/native-android-i18n.md +36 -0
- package/expertise/i18n/platform/native-ios-i18n.md +36 -0
- package/expertise/i18n/platform/react-i18n.md +103 -0
- package/expertise/i18n/platform/web-css-i18n.md +81 -0
- package/expertise/i18n/rtl/arabic-specific.md +175 -0
- package/expertise/i18n/rtl/hebrew-specific.md +149 -0
- package/expertise/i18n/rtl/rtl-animations-and-transitions.md +111 -0
- package/expertise/i18n/rtl/rtl-forms-and-input.md +161 -0
- package/expertise/i18n/rtl/rtl-fundamentals.md +211 -0
- package/expertise/i18n/rtl/rtl-icons-and-images.md +181 -0
- package/expertise/i18n/rtl/rtl-layout-mirroring.md +252 -0
- package/expertise/i18n/rtl/rtl-navigation-and-gestures.md +107 -0
- package/expertise/i18n/rtl/rtl-testing-and-qa.md +147 -0
- package/expertise/i18n/rtl/rtl-typography.md +160 -0
- package/expertise/index.md +113 -0
- package/expertise/index.yaml +216 -0
- package/expertise/infrastructure/cloud-aws.md +597 -0
- package/expertise/infrastructure/cloud-gcp.md +599 -0
- package/expertise/infrastructure/cybersecurity.md +816 -0
- package/expertise/infrastructure/database-mongodb.md +447 -0
- package/expertise/infrastructure/database-postgres.md +400 -0
- package/expertise/infrastructure/devops-cicd.md +787 -0
- package/expertise/infrastructure/index.md +27 -0
- package/expertise/performance/PROGRESS.md +50 -0
- package/expertise/performance/backend/api-latency.md +1204 -0
- package/expertise/performance/backend/background-jobs.md +506 -0
- package/expertise/performance/backend/connection-pooling.md +1209 -0
- package/expertise/performance/backend/database-query-optimization.md +515 -0
- package/expertise/performance/backend/index.md +23 -0
- package/expertise/performance/backend/rate-limiting-and-throttling.md +971 -0
- package/expertise/performance/foundations/algorithmic-complexity.md +954 -0
- package/expertise/performance/foundations/caching-strategies.md +489 -0
- package/expertise/performance/foundations/concurrency-and-parallelism.md +847 -0
- package/expertise/performance/foundations/index.md +24 -0
- package/expertise/performance/foundations/measuring-and-profiling.md +440 -0
- package/expertise/performance/foundations/memory-management.md +964 -0
- package/expertise/performance/foundations/performance-budgets.md +1314 -0
- package/expertise/performance/index.md +31 -0
- package/expertise/performance/infrastructure/auto-scaling.md +1059 -0
- package/expertise/performance/infrastructure/cdn-and-edge.md +1081 -0
- package/expertise/performance/infrastructure/index.md +22 -0
- package/expertise/performance/infrastructure/load-balancing.md +1081 -0
- package/expertise/performance/infrastructure/observability.md +1079 -0
- package/expertise/performance/mobile/index.md +23 -0
- package/expertise/performance/mobile/mobile-animations.md +544 -0
- package/expertise/performance/mobile/mobile-memory-battery.md +416 -0
- package/expertise/performance/mobile/mobile-network.md +452 -0
- package/expertise/performance/mobile/mobile-rendering.md +599 -0
- package/expertise/performance/mobile/mobile-startup-time.md +505 -0
- package/expertise/performance/platform-specific/flutter-performance.md +647 -0
- package/expertise/performance/platform-specific/index.md +22 -0
- package/expertise/performance/platform-specific/node-performance.md +1307 -0
- package/expertise/performance/platform-specific/postgres-performance.md +1366 -0
- package/expertise/performance/platform-specific/react-performance.md +1403 -0
- package/expertise/performance/web/bundle-optimization.md +1239 -0
- package/expertise/performance/web/image-and-media.md +636 -0
- package/expertise/performance/web/index.md +24 -0
- package/expertise/performance/web/network-optimization.md +1133 -0
- package/expertise/performance/web/rendering-performance.md +1098 -0
- package/expertise/performance/web/ssr-and-hydration.md +918 -0
- package/expertise/performance/web/web-vitals.md +1374 -0
- package/expertise/quality/accessibility.md +985 -0
- package/expertise/quality/evidence-based-verification.md +499 -0
- package/expertise/quality/index.md +24 -0
- package/expertise/quality/ml-model-audit.md +614 -0
- package/expertise/quality/performance.md +600 -0
- package/expertise/quality/testing-api.md +891 -0
- package/expertise/quality/testing-mobile.md +496 -0
- package/expertise/quality/testing-web.md +849 -0
- package/expertise/security/PROGRESS.md +54 -0
- package/expertise/security/agentic-identity.md +540 -0
- package/expertise/security/compliance-frameworks.md +601 -0
- package/expertise/security/data/data-encryption.md +364 -0
- package/expertise/security/data/data-privacy-gdpr.md +692 -0
- package/expertise/security/data/database-security.md +1171 -0
- package/expertise/security/data/index.md +22 -0
- package/expertise/security/data/pii-handling.md +531 -0
- package/expertise/security/foundations/authentication.md +1041 -0
- package/expertise/security/foundations/authorization.md +603 -0
- package/expertise/security/foundations/cryptography.md +1001 -0
- package/expertise/security/foundations/index.md +25 -0
- package/expertise/security/foundations/owasp-top-10.md +1354 -0
- package/expertise/security/foundations/secrets-management.md +1217 -0
- package/expertise/security/foundations/secure-sdlc.md +700 -0
- package/expertise/security/foundations/supply-chain-security.md +698 -0
- package/expertise/security/index.md +31 -0
- package/expertise/security/infrastructure/cloud-security-aws.md +1296 -0
- package/expertise/security/infrastructure/cloud-security-gcp.md +1376 -0
- package/expertise/security/infrastructure/container-security.md +721 -0
- package/expertise/security/infrastructure/incident-response.md +1295 -0
- package/expertise/security/infrastructure/index.md +24 -0
- package/expertise/security/infrastructure/logging-and-monitoring.md +1618 -0
- package/expertise/security/infrastructure/network-security.md +1337 -0
- package/expertise/security/mobile/index.md +23 -0
- package/expertise/security/mobile/mobile-android-security.md +1218 -0
- package/expertise/security/mobile/mobile-binary-protection.md +1229 -0
- package/expertise/security/mobile/mobile-data-storage.md +1265 -0
- package/expertise/security/mobile/mobile-ios-security.md +1401 -0
- package/expertise/security/mobile/mobile-network-security.md +1520 -0
- package/expertise/security/smart-contract-security.md +594 -0
- package/expertise/security/testing/index.md +22 -0
- package/expertise/security/testing/penetration-testing.md +1258 -0
- package/expertise/security/testing/security-code-review.md +1765 -0
- package/expertise/security/testing/threat-modeling.md +1074 -0
- package/expertise/security/testing/vulnerability-scanning.md +1062 -0
- package/expertise/security/web/api-security.md +586 -0
- package/expertise/security/web/cors-and-headers.md +433 -0
- package/expertise/security/web/csrf.md +562 -0
- package/expertise/security/web/file-upload.md +1477 -0
- package/expertise/security/web/index.md +25 -0
- package/expertise/security/web/injection.md +1375 -0
- package/expertise/security/web/session-management.md +1101 -0
- package/expertise/security/web/xss.md +1158 -0
- package/exports/README.md +17 -0
- package/exports/hosts/claude/.claude/agents/clarifier.md +42 -0
- package/exports/hosts/claude/.claude/agents/content-author.md +63 -0
- package/exports/hosts/claude/.claude/agents/designer.md +55 -0
- package/exports/hosts/claude/.claude/agents/executor.md +55 -0
- package/exports/hosts/claude/.claude/agents/learner.md +51 -0
- package/exports/hosts/claude/.claude/agents/planner.md +53 -0
- package/exports/hosts/claude/.claude/agents/researcher.md +43 -0
- package/exports/hosts/claude/.claude/agents/reviewer.md +54 -0
- package/exports/hosts/claude/.claude/agents/specifier.md +47 -0
- package/exports/hosts/claude/.claude/agents/verifier.md +71 -0
- package/exports/hosts/claude/.claude/commands/author.md +42 -0
- package/exports/hosts/claude/.claude/commands/clarify.md +38 -0
- package/exports/hosts/claude/.claude/commands/design-review.md +46 -0
- package/exports/hosts/claude/.claude/commands/design.md +44 -0
- package/exports/hosts/claude/.claude/commands/discover.md +37 -0
- package/exports/hosts/claude/.claude/commands/execute.md +48 -0
- package/exports/hosts/claude/.claude/commands/learn.md +38 -0
- package/exports/hosts/claude/.claude/commands/plan-review.md +42 -0
- package/exports/hosts/claude/.claude/commands/plan.md +39 -0
- package/exports/hosts/claude/.claude/commands/prepare-next.md +37 -0
- package/exports/hosts/claude/.claude/commands/review.md +40 -0
- package/exports/hosts/claude/.claude/commands/run-audit.md +41 -0
- package/exports/hosts/claude/.claude/commands/spec-challenge.md +41 -0
- package/exports/hosts/claude/.claude/commands/specify.md +38 -0
- package/exports/hosts/claude/.claude/commands/verify.md +37 -0
- package/exports/hosts/claude/.claude/settings.json +34 -0
- package/exports/hosts/claude/CLAUDE.md +19 -0
- package/exports/hosts/claude/export.manifest.json +38 -0
- package/exports/hosts/claude/host-package.json +67 -0
- package/exports/hosts/codex/AGENTS.md +19 -0
- package/exports/hosts/codex/export.manifest.json +38 -0
- package/exports/hosts/codex/host-package.json +41 -0
- package/exports/hosts/cursor/.cursor/hooks.json +16 -0
- package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +19 -0
- package/exports/hosts/cursor/export.manifest.json +38 -0
- package/exports/hosts/cursor/host-package.json +42 -0
- package/exports/hosts/gemini/GEMINI.md +19 -0
- package/exports/hosts/gemini/export.manifest.json +38 -0
- package/exports/hosts/gemini/host-package.json +41 -0
- package/hooks/README.md +18 -0
- package/hooks/definitions/loop_cap_guard.yaml +21 -0
- package/hooks/definitions/post_tool_capture.yaml +24 -0
- package/hooks/definitions/pre_compact_summary.yaml +19 -0
- package/hooks/definitions/pre_tool_capture_route.yaml +19 -0
- package/hooks/definitions/protected_path_write_guard.yaml +19 -0
- package/hooks/definitions/session_start.yaml +19 -0
- package/hooks/definitions/stop_handoff_harvest.yaml +20 -0
- package/hooks/loop-cap-guard +17 -0
- package/hooks/post-tool-lint +36 -0
- package/hooks/protected-path-write-guard +17 -0
- package/hooks/session-start +41 -0
- package/llms-full.txt +2355 -0
- package/llms.txt +43 -0
- package/package.json +79 -0
- package/roles/README.md +20 -0
- package/roles/clarifier.md +42 -0
- package/roles/content-author.md +63 -0
- package/roles/designer.md +55 -0
- package/roles/executor.md +55 -0
- package/roles/learner.md +51 -0
- package/roles/planner.md +53 -0
- package/roles/researcher.md +43 -0
- package/roles/reviewer.md +54 -0
- package/roles/specifier.md +47 -0
- package/roles/verifier.md +71 -0
- package/schemas/README.md +24 -0
- package/schemas/accepted-learning.schema.json +20 -0
- package/schemas/author-artifact.schema.json +156 -0
- package/schemas/clarification.schema.json +19 -0
- package/schemas/design-artifact.schema.json +80 -0
- package/schemas/docs-claim.schema.json +18 -0
- package/schemas/export-manifest.schema.json +20 -0
- package/schemas/hook.schema.json +67 -0
- package/schemas/host-export-package.schema.json +18 -0
- package/schemas/implementation-plan.schema.json +19 -0
- package/schemas/proposed-learning.schema.json +19 -0
- package/schemas/research.schema.json +18 -0
- package/schemas/review.schema.json +29 -0
- package/schemas/run-manifest.schema.json +18 -0
- package/schemas/spec-challenge.schema.json +18 -0
- package/schemas/spec.schema.json +20 -0
- package/schemas/usage.schema.json +102 -0
- package/schemas/verification-proof.schema.json +29 -0
- package/schemas/wazir-manifest.schema.json +173 -0
- package/skills/README.md +40 -0
- package/skills/brainstorming/SKILL.md +77 -0
- package/skills/debugging/SKILL.md +50 -0
- package/skills/design/SKILL.md +61 -0
- package/skills/dispatching-parallel-agents/SKILL.md +128 -0
- package/skills/executing-plans/SKILL.md +70 -0
- package/skills/finishing-a-development-branch/SKILL.md +169 -0
- package/skills/humanize/SKILL.md +123 -0
- package/skills/init-pipeline/SKILL.md +124 -0
- package/skills/prepare-next/SKILL.md +20 -0
- package/skills/receiving-code-review/SKILL.md +123 -0
- package/skills/requesting-code-review/SKILL.md +105 -0
- package/skills/requesting-code-review/code-reviewer.md +108 -0
- package/skills/run-audit/SKILL.md +197 -0
- package/skills/scan-project/SKILL.md +41 -0
- package/skills/self-audit/SKILL.md +153 -0
- package/skills/subagent-driven-development/SKILL.md +154 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
- package/skills/subagent-driven-development/implementer-prompt.md +102 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
- package/skills/tdd/SKILL.md +23 -0
- package/skills/using-git-worktrees/SKILL.md +163 -0
- package/skills/using-skills/SKILL.md +95 -0
- package/skills/verification/SKILL.md +22 -0
- package/skills/wazir/SKILL.md +463 -0
- package/skills/writing-plans/SKILL.md +30 -0
- package/skills/writing-skills/SKILL.md +157 -0
- package/skills/writing-skills/anthropic-best-practices.md +122 -0
- package/skills/writing-skills/persuasion-principles.md +50 -0
- package/templates/README.md +20 -0
- package/templates/artifacts/README.md +10 -0
- package/templates/artifacts/accepted-learning.md +19 -0
- package/templates/artifacts/accepted-learning.template.json +12 -0
- package/templates/artifacts/author.md +74 -0
- package/templates/artifacts/author.template.json +19 -0
- package/templates/artifacts/clarification.md +21 -0
- package/templates/artifacts/clarification.template.json +12 -0
- package/templates/artifacts/execute-notes.md +19 -0
- package/templates/artifacts/implementation-plan.md +21 -0
- package/templates/artifacts/implementation-plan.template.json +11 -0
- package/templates/artifacts/learning-proposal.md +19 -0
- package/templates/artifacts/next-run-handoff.md +21 -0
- package/templates/artifacts/plan-review.md +19 -0
- package/templates/artifacts/proposed-learning.template.json +12 -0
- package/templates/artifacts/research.md +21 -0
- package/templates/artifacts/research.template.json +12 -0
- package/templates/artifacts/review-findings.md +19 -0
- package/templates/artifacts/review.template.json +11 -0
- package/templates/artifacts/run-manifest.template.json +8 -0
- package/templates/artifacts/spec-challenge.md +19 -0
- package/templates/artifacts/spec-challenge.template.json +11 -0
- package/templates/artifacts/spec.md +21 -0
- package/templates/artifacts/spec.template.json +12 -0
- package/templates/artifacts/verification-proof.md +19 -0
- package/templates/artifacts/verification-proof.template.json +11 -0
- package/templates/examples/accepted-learning.example.json +14 -0
- package/templates/examples/author.example.json +152 -0
- package/templates/examples/clarification.example.json +15 -0
- package/templates/examples/docs-claim.example.json +8 -0
- package/templates/examples/export-manifest.example.json +7 -0
- package/templates/examples/host-export-package.example.json +11 -0
- package/templates/examples/implementation-plan.example.json +17 -0
- package/templates/examples/proposed-learning.example.json +13 -0
- package/templates/examples/research.example.json +15 -0
- package/templates/examples/research.example.md +6 -0
- package/templates/examples/review.example.json +17 -0
- package/templates/examples/run-manifest.example.json +9 -0
- package/templates/examples/spec-challenge.example.json +14 -0
- package/templates/examples/spec.example.json +21 -0
- package/templates/examples/verification-proof.example.json +21 -0
- package/templates/examples/wazir-manifest.example.yaml +65 -0
- package/templates/task-definition-schema.md +99 -0
- package/tooling/README.md +20 -0
- package/tooling/src/adapters/context-mode.js +50 -0
- package/tooling/src/capture/command.js +376 -0
- package/tooling/src/capture/store.js +99 -0
- package/tooling/src/capture/usage.js +270 -0
- package/tooling/src/checks/branches.js +50 -0
- package/tooling/src/checks/brand-truth.js +110 -0
- package/tooling/src/checks/changelog.js +231 -0
- package/tooling/src/checks/command-registry.js +36 -0
- package/tooling/src/checks/commits.js +102 -0
- package/tooling/src/checks/docs-drift.js +103 -0
- package/tooling/src/checks/docs-truth.js +201 -0
- package/tooling/src/checks/runtime-surface.js +156 -0
- package/tooling/src/cli.js +116 -0
- package/tooling/src/command-options.js +56 -0
- package/tooling/src/commands/validate.js +320 -0
- package/tooling/src/doctor/command.js +91 -0
- package/tooling/src/export/command.js +77 -0
- package/tooling/src/export/compiler.js +498 -0
- package/tooling/src/guards/loop-cap-guard.js +52 -0
- package/tooling/src/guards/protected-path-write-guard.js +67 -0
- package/tooling/src/index/command.js +152 -0
- package/tooling/src/index/storage.js +1061 -0
- package/tooling/src/index/summarizers.js +261 -0
- package/tooling/src/loaders.js +18 -0
- package/tooling/src/project-root.js +22 -0
- package/tooling/src/recall/command.js +225 -0
- package/tooling/src/schema-validator.js +30 -0
- package/tooling/src/state-root.js +40 -0
- package/tooling/src/status/command.js +71 -0
- package/wazir.manifest.yaml +135 -0
- package/workflows/README.md +19 -0
- package/workflows/author.md +42 -0
- package/workflows/clarify.md +38 -0
- package/workflows/design-review.md +46 -0
- package/workflows/design.md +44 -0
- package/workflows/discover.md +37 -0
- package/workflows/execute.md +48 -0
- package/workflows/learn.md +38 -0
- package/workflows/plan-review.md +42 -0
- package/workflows/plan.md +39 -0
- package/workflows/prepare-next.md +37 -0
- package/workflows/review.md +40 -0
- package/workflows/run-audit.md +41 -0
- package/workflows/spec-challenge.md +41 -0
- package/workflows/specify.md +38 -0
- package/workflows/verify.md +37 -0
@@ -0,0 +1,797 @@
# Saga Pattern: Architecture Expertise Module

> The Saga pattern manages distributed transactions across multiple services by breaking them into a sequence of local transactions, each paired with a compensating action for rollback. It replaces distributed two-phase commit (2PC), which scales poorly because participants hold locks while a coordinator collects votes, with eventual consistency and explicit compensation logic. The term was introduced by Hector Garcia-Molina and Kenneth Salem in their 1987 ACM SIGMOD paper "Sagas," which originally addressed long-lived transactions in a single database system. The microservices community later repurposed the concept for cross-service coordination, and it has become the dominant pattern for distributed business transactions.

> **Category:** Distributed
> **Complexity:** Expert
> **Applies when:** Business transactions spanning multiple services that need atomicity guarantees without distributed database transactions

---
|
## What This Is (and What It Isn't)

### The Core Idea

A saga is a sequence of local transactions in which each step (T1, T2, ... Tn) executes within a single service's database boundary. Each step Ti has a corresponding compensating transaction Ci that semantically undoes the effect of Ti. If step Tk fails, the saga executes the compensating transactions Ck-1, Ck-2, ... C1 in reverse order to undo the work of all preceding steps. Tk itself needs no compensation, because its own local transaction rolled back.

This is fundamentally different from distributed two-phase commit (2PC). In 2PC, a coordinator asks all participants to prepare (vote), then tells them all to commit or all to abort. Participants hold locks from the prepare phase until the decision arrives, the coordinator is a single point of failure, and a coordinator crash or network partition can leave prepared participants blocked. 2PC provides strong consistency at the cost of availability and latency. Sagas provide eventual consistency at the cost of isolation, but with dramatically better availability and scalability.

**The key distinction:** 2PC holds locks across services until all participants agree. Sagas commit each local transaction immediately and rely on compensating transactions if something goes wrong later. Each local transaction is committed and visible to other transactions the moment it completes; there is no global lock.
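The Ti/Ci pairing can be made concrete with a minimal in-process sketch. `Step` and `run_saga` are hypothetical names chosen for illustration; a real saga would reach other services through messages and persist its progress, both of which this sketch deliberately omits.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], None]      # Ti: a local transaction in one service
    compensate: Callable[[], None]  # Ci: semantically undoes Ti


def run_saga(steps: List[Step]) -> bool:
    """Run T1..Tn in order; if Tk fails, run Ck-1..C1 in reverse order."""
    completed: List[Step] = []
    for step in steps:
        try:
            step.action()
        except Exception:
            # Tk's local transaction rolled back on its own, so only the
            # steps that already committed need compensating.
            for done in reversed(completed):
                done.compensate()
            return False
        completed.append(step)
    return True


if __name__ == "__main__":
    log: List[str] = []

    def decline() -> None:
        raise RuntimeError("payment declined")

    ok = run_saga([
        Step("create_order", lambda: log.append("T1"), lambda: log.append("C1")),
        Step("reserve_inventory", lambda: log.append("T2"), lambda: log.append("C2")),
        Step("authorize_payment", decline, lambda: log.append("C3")),
    ])
    print(ok, log)  # False ['T1', 'T2', 'C2', 'C1']
```

Note that C3 never runs: the failed step is not compensated, only the two that committed before it.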
|
### Two Coordination Strategies

**Choreography-based sagas:** Each service publishes domain events after completing its local transaction. Other services listen for those events and react by executing their own local transaction and publishing the next event. There is no central coordinator. The saga's execution flow is implicit: it emerges from the event subscriptions. Each service knows only about its own step and which events to listen for.

**Orchestration-based sagas:** A central saga coordinator (orchestrator) explicitly controls the sequence. The orchestrator sends commands to each service ("reserve inventory," "charge payment"), waits for replies, and decides the next step. If a step fails, the orchestrator invokes compensating transactions in reverse order. The saga's execution flow is explicit: it lives in the orchestrator's state machine.
|
### What a Saga Is Not

**Not a distributed transaction.** A saga does not provide ACID guarantees across services. There is no atomicity in the traditional sense, because intermediate states are visible. There is no isolation either; other transactions can see partially completed saga state. What a saga provides instead is eventual atomicity: either all steps complete, or all completed steps are compensated.

**Not a replacement for local database transactions.** Each step within a saga should be a proper ACID transaction within a single service's database. The saga pattern coordinates the sequence of these local transactions. If a single service needs a multi-table write, that is a local transaction, not a saga.

**Not event sourcing.** Event sourcing stores all state changes as an append-only sequence of events. Sagas coordinate multi-service transactions using events (in choreography) or commands (in orchestration). They are complementary but distinct patterns. You can implement sagas with or without event sourcing.

**Not an outbox pattern.** The transactional outbox pattern ensures reliable event publishing by writing events to an outbox table within the same local transaction as the business data. Sagas frequently use the outbox pattern as an implementation detail to guarantee reliable step transitions, but they are separate concerns.
|
---

## When to Use It

### The Qualifying Conditions

Apply the saga pattern when **all** of these are true:

**The business transaction spans multiple services with separate databases.** The entire reason sagas exist is that you cannot execute a single ACID transaction across service boundaries. If all the data lives in one database, use a regular database transaction. If the data is split across services that each own their data (the database-per-service pattern), sagas are the primary mechanism for cross-service consistency.

**Eventual consistency is acceptable for the business.** There will be a window (milliseconds to seconds, sometimes minutes) during which the system is in a partially completed state. An order may be created but payment not yet confirmed. Inventory may be reserved but the shipping label not yet generated. If the business can tolerate this intermediate visibility (and most can, with proper UI design), sagas work. If not, you need a different architecture.

**Each step can be meaningfully compensated.** Every step that might need to be undone must have a well-defined undo operation. "Unreserve inventory" is easy. "Unsend an email" is impossible. If the saga includes non-compensatable actions (sending notifications, charging credit cards with no-refund policies, triggering physical processes), those steps must be placed at the end of the saga (as the pivot transaction or the retryable steps that follow it) or handled with alternative strategies.

**The workflow is well-defined and finite.** Sagas model a sequence of steps with known entry and exit conditions. They are not suitable for open-ended, long-running processes where the set of steps is not known in advance. For those, consider a process manager or workflow engine.
|
### Real-World Domains

**E-commerce order processing.** The canonical example. An order saga coordinates: create order (Order Service) -> reserve inventory (Inventory Service) -> authorize payment (Payment Service) -> arrange shipment (Shipping Service). If payment authorization fails at step 3, the saga compensates by unreserving inventory (C2) and canceling the order (C1). Amazon, Shopify, and most large e-commerce platforms use saga variants for order fulfillment.

**Travel booking.** Booking a trip involves reserving a flight (Airline Service) -> booking a hotel (Hotel Service) -> renting a car (Car Rental Service). If the hotel booking fails, the saga compensates by canceling the flight reservation. Booking.com and Expedia handle multi-supplier coordination with saga-like patterns, often with orchestration due to the heterogeneity of supplier APIs.

**Banking and financial transfers.** A fund transfer saga: debit source account (Account Service) -> credit destination account (Account Service or external) -> record transaction (Ledger Service). If the credit fails, the saga compensates by reversing the debit. Banks like ING and Rabobank have publicly discussed saga-based architectures for payment processing.

**Insurance claims processing.** Validate claim -> assess damage -> approve payment -> disburse funds. Each step involves different services with different SLAs. Saga orchestration handles the multi-day, multi-step nature of claims workflows.

**Food delivery.** Accept order -> assign restaurant -> assign driver -> process payment -> track delivery. DoorDash and Uber Eats use saga-like coordination to manage the complex multi-party workflow where any participant can fail or cancel.

---
|
## When NOT to Use It

This section is equally important. The saga pattern is frequently over-applied, and its complexity cost is consistently underestimated by teams adopting microservices for the first time.

### The Disqualifying Conditions

**A single database transaction suffices.** If all the data involved in the business operation lives in one database (or can be accessed within one service), use a regular transaction. This sounds obvious, but teams routinely split services prematurely, then discover they need sagas to coordinate what was previously a single transaction. The solution is not to add a saga; it is to reconsider the service boundary. A monolith or modular monolith with proper transaction boundaries eliminates entire categories of distributed coordination problems.

**Eventual consistency is unacceptable.** Some business domains have hard requirements for immediate consistency. Real-time trading systems, medical device control systems, and safety-critical infrastructure cannot tolerate windows of inconsistency. If the business says "the user must never see a partially completed state," sagas are the wrong tool. Consider keeping the data in a single database or using synchronous coordination with proper locking (accepting the availability trade-off).
|
**Compensating transactions are impractical or impossible.** If you cannot undo a step, you cannot include it in a saga's compensatable sequence. Common non-compensatable operations:
- Sending emails or SMS notifications (you cannot unsend them)
- Charging a credit card with a no-refund payment processor
- Triggering a physical process (printing a shipping label, dispatching a delivery vehicle)
- Publishing data to external partners with no delete API
- Regulatory submissions that cannot be retracted

If most steps in your workflow are non-compensatable, sagas become a poor fit. You need an alternative strategy: place non-compensatable steps last (as the pivot transaction or the retryable steps that follow it), use reservation/confirmation patterns, or accept that compensation is approximate (e.g., send a "cancellation notice" email rather than unsending the original).
|
**The team underestimates the complexity.** Implementing sagas correctly requires designing compensating transactions for every step, handling partial failures during compensation itself, dealing with concurrent saga instances acting on the same data, making every step idempotent (messages may be delivered more than once), managing timeouts and stalled sagas, and building observability to trace saga execution across services. Teams that adopt the saga pattern without experience in distributed systems regularly spend 3-6 months debugging subtle consistency issues that would not have existed in a monolithic architecture. If your team has not shipped distributed systems before, start with a modular monolith.

**The number of steps is large (>7).** Sagas with many steps become much harder to reason about. Each step adds a compensating transaction, a potential failure point, and interactions with concurrent sagas. A 10-step saga has 10 possible failure points and 10 compensating transactions, and two concurrent instances of it can interleave their steps in 184,756 different orders (20 choose 10). Consider decomposing into smaller sagas or rethinking service boundaries.

**You are solving a data consistency problem caused by premature service decomposition.** If two services are always modified together, they probably should not be separate services. The "distributed monolith" antipattern is not fixed by adding a saga. It is fixed by merging the services or redesigning boundaries around true bounded contexts.
|
91
|
+
|
|
92
|
+
### The Complexity Tax Is Real
|
|
93
|
+
|
|
94
|
+
Production teams consistently report: "We spent more time implementing saga coordination than the actual business logic." The saga pattern introduces a new failure mode, consistency challenge, observability requirement, testing burden, and operational concern for every step. This cost is justified when you genuinely need cross-service transactions at scale. It is not justified when a simpler architecture would avoid the problem entirely.
|
|
95
|
+
|
|
96
|
+
---
|
|
## How It Works

### Choreography: Event-Driven Coordination

In choreography, there is no central coordinator. Each service publishes events after completing its local transaction, and other services subscribe to those events to trigger their next step.

```
Order Service        Inventory Service       Payment Service       Shipping Service
      |                      |                      |                      |
      |--- OrderCreated ---->|                      |                      |
      |                      |                      |                      |
      |               [Reserve Stock]               |                      |
      |                      |                      |                      |
      |                      |-- InventoryReserved->|                      |
      |                      |                      |                      |
      |                      |                [Charge Card]                |
      |                      |                      |                      |
      |                      |                      |--- PaymentCharged -->|
      |                      |                      |                      |
      |                      |                      |              [Create Shipment]
      |                      |                      |                      |
      |<------------------------- ShipmentCreated -------------------------|
      |                      |                      |                      |
[Mark Complete]              |                      |                      |
```
|
**Compensation flow (if Payment fails):**

```
Payment Service publishes PaymentFailed
  -> Inventory Service hears PaymentFailed -> unreserves stock -> publishes InventoryUnreserved
  -> Order Service hears PaymentFailed -> marks order as failed
```
|
**Choreography strengths:**
- No single point of failure: each service is autonomous
- Loose coupling: services communicate only through events
- Natural fit for simple, linear workflows with 3-4 steps
- Each service owns its logic completely

**Choreography weaknesses:**
- The overall saga flow is implicit: it exists only as the sum of all event subscriptions
- Difficult to understand, debug, and test as the number of services grows
- Adding a new step requires modifying event subscriptions across services
- Hard to implement global timeouts or saga-level retries
- Cyclic dependencies between services can create event storms
- No single place to view the current state of a saga instance
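The choreographed order saga and its PaymentFailed compensation can be mimicked in memory. In this sketch `EventBus` is a hypothetical stand-in for a real broker, and the handler names are illustrative. Notice that no function holds the overall flow; it exists only in the `subscribe` calls, which is exactly the implicitness listed among the weaknesses.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Toy synchronous pub/sub standing in for a real message broker."""

    def __init__(self) -> None:
        self.handlers: Dict[str, List[Handler]] = defaultdict(list)
        self.published: List[str] = []

    def subscribe(self, event: str, handler: Handler) -> None:
        self.handlers[event].append(handler)

    def publish(self, event: str, data: dict) -> None:
        self.published.append(event)
        for handler in self.handlers[event]:
            handler(data)


def wire_services(bus: EventBus, stock: Dict[str, int], card_ok: bool) -> Dict[str, str]:
    """Each service only subscribes to events; no component knows the whole flow."""
    orders: Dict[str, str] = {}

    def reserve(d: dict) -> None:       # Inventory Service
        stock[d["sku"]] -= d["qty"]
        bus.publish("InventoryReserved", d)

    def unreserve(d: dict) -> None:     # Inventory Service (compensation)
        stock[d["sku"]] += d["qty"]
        bus.publish("InventoryUnreserved", d)

    def charge(d: dict) -> None:        # Payment Service
        bus.publish("PaymentCharged" if card_ok else "PaymentFailed", d)

    def ship(d: dict) -> None:          # Shipping Service
        bus.publish("ShipmentCreated", d)

    def complete(d: dict) -> None:      # Order Service
        orders[d["order_id"]] = "COMPLETE"

    def fail(d: dict) -> None:          # Order Service (compensation)
        orders[d["order_id"]] = "FAILED"

    bus.subscribe("OrderCreated", reserve)
    bus.subscribe("InventoryReserved", charge)
    bus.subscribe("PaymentCharged", ship)
    bus.subscribe("ShipmentCreated", complete)
    bus.subscribe("PaymentFailed", unreserve)
    bus.subscribe("PaymentFailed", fail)
    return orders


if __name__ == "__main__":
    bus = EventBus()
    stock = {"SKU-100": 10}
    orders = wire_services(bus, stock, card_ok=False)
    bus.publish("OrderCreated", {"order_id": "o-1", "sku": "SKU-100", "qty": 5})
    print(bus.published, stock, orders)
```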
|
### Orchestration: Central Coordinator

In orchestration, a saga orchestrator (often called a saga execution coordinator or SEC) manages the entire flow. It sends commands to services and receives replies.

```
Saga Orchestrator
    |
    |--- CreateOrder ---------> Order Service
    |<-- OrderCreated --------- |
    |
    |--- ReserveInventory ----> Inventory Service
    |<-- InventoryReserved ---- |
    |
    |--- ChargePayment -------> Payment Service
    |<-- PaymentCharged ------- |
    |
    |--- CreateShipment ------> Shipping Service
    |<-- ShipmentCreated ------ |
    |
[Saga Complete]
```
|
**Compensation flow (if Payment fails):**

```
Saga Orchestrator
    |
    |<-- PaymentFailed -------- Payment Service
    |
    |--- UnreserveInventory --> Inventory Service
    |<-- InventoryUnreserved -- |
    |
    |--- CancelOrder ---------> Order Service
    |<-- OrderCancelled ------- |
    |
[Saga Compensated]
```

The orchestrator is typically implemented as a state machine. Each state represents a step in the saga, and transitions are triggered by command replies (success or failure). The orchestrator persists its state so it can recover from crashes.
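A minimal sketch of such a state machine, assuming replies arrive one at a time and using a plain dict as a stand-in for durable storage (a real orchestrator would persist to a database and exchange messages). `STEPS`, `SagaOrchestrator`, and the `RefundPayment` command are hypothetical; pivot handling is omitted, so every step except the last has a compensation. The status names follow the saga log statuses listed later in this module.

```python
from typing import Callable, Dict, List, Tuple

# (command, compensating command). RefundPayment is a hypothetical compensation;
# the last step never needs one because no step runs after it.
STEPS: List[Tuple[str, str]] = [
    ("CreateOrder", "CancelOrder"),
    ("ReserveInventory", "UnreserveInventory"),
    ("ChargePayment", "RefundPayment"),
    ("CreateShipment", ""),
]


class SagaOrchestrator:
    """State machine whose transitions are triggered only by command replies."""

    def __init__(self, saga_id: str, store: Dict[str, dict],
                 send: Callable[[str, str], None]) -> None:
        self.saga_id = saga_id
        self.store = store  # stand-in for a durable saga state store
        self.send = send    # stand-in for a command channel

    def _save(self, status: str, step: int) -> None:
        # Persist the new state before sending the next command, so a
        # restarted orchestrator can pick up where this one stopped.
        self.store[self.saga_id] = {"status": status, "step": step}

    def start(self) -> None:
        self._save("STARTED", 0)
        self.send(self.saga_id, STEPS[0][0])

    def on_reply(self, success: bool) -> None:
        status = self.store[self.saga_id]["status"]
        step = self.store[self.saga_id]["step"]
        if status == "STARTED" and success:
            if step == len(STEPS) - 1:
                self._save("COMPLETED", step)
            else:
                self._save("STARTED", step + 1)
                self.send(self.saga_id, STEPS[step + 1][0])
        elif status == "STARTED":
            # The failed step rolled back locally; undo the steps before it.
            self._compensate(step - 1)
        elif status == "COMPENSATING" and success:
            self._compensate(step - 1)
        elif status == "COMPENSATING":
            # A compensation failed: resend it, since it must eventually succeed.
            self.send(self.saga_id, STEPS[step][1])

    def _compensate(self, step: int) -> None:
        if step < 0:
            self._save("FAILED", -1)  # every completed step has been compensated
        else:
            self._save("COMPENSATING", step)
            self.send(self.saga_id, STEPS[step][1])
```

Because all state lives in `store`, a fresh `SagaOrchestrator` built with the same `saga_id` and store after a crash continues the saga from its last recorded transition.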
|
**Orchestration strengths:**
- The entire saga flow is explicit and visible in one place
- Easy to add, remove, or reorder steps
- Centralized error handling and compensation logic
- Straightforward to implement timeouts, retries, and dead letter handling
- Easy to query the current state of any saga instance
- Better suited for complex workflows with branching, parallel steps, or conditional logic

**Orchestration weaknesses:**
- The orchestrator is a potential single point of failure (mitigated by running it as stateless, replaceable instances backed by durable state storage)
- Risk of coupling business logic into the orchestrator (it should only coordinate, not contain domain logic)
- Additional infrastructure component to deploy, monitor, and scale
|
### Compensating Transactions: Design Principles

Compensating transactions are the heart of the saga pattern. Designing them well is the difference between a saga that works and one that creates data inconsistency nightmares.

**Principle 1: Semantic, not physical undo.** A compensating transaction does not "roll back" in the database sense. It applies a new business operation that logically reverses the effect. "Unreserve inventory" is not a DELETE of the reservation row; it is a new transaction that increments available stock and marks the reservation as cancelled. The original reservation record should be preserved for audit trails.
|
**Principle 2: Compensating transactions must be idempotent.** Because messages can be delivered more than once (at-least-once delivery), a compensating transaction may be invoked multiple times. Calling "unreserve inventory" twice for the same saga instance must produce the same result as calling it once. Use idempotency keys (typically the saga instance ID) to detect and skip duplicate compensations.
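One way to apply idempotency keys, sketched with an in-memory set standing in for a persistent processed-messages table. In practice the key check and the business update should commit in the same local transaction. `InventoryService` and its method names are hypothetical.

```python
from typing import Dict, Set, Tuple


class InventoryService:
    def __init__(self, stock: Dict[str, int]) -> None:
        self.stock = stock
        self.reservations: Dict[str, Tuple[str, int]] = {}  # saga_id -> (sku, qty)
        self.processed: Set[Tuple[str, str]] = set()        # (saga_id, operation)

    def reserve(self, saga_id: str, sku: str, qty: int) -> None:
        key = (saga_id, "reserve")
        if key in self.processed:
            return  # redelivered message: already applied
        self.stock[sku] -= qty
        self.reservations[saga_id] = (sku, qty)
        self.processed.add(key)

    def unreserve(self, saga_id: str) -> None:
        key = (saga_id, "unreserve")
        if key in self.processed:
            return  # redelivered compensation: skip it
        if saga_id in self.reservations:
            sku, qty = self.reservations.pop(saga_id)
            self.stock[sku] += qty
        self.processed.add(key)
```

Keying on (saga ID, operation) rather than on the saga ID alone lets the forward step and its compensation each be deduplicated independently.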
|
**Principle 3: Compensating transactions must be retryable.** If a compensating transaction fails (network error, service down), it must be retried until it succeeds. A saga that fails to compensate leaves the system in an inconsistent state. Design compensations to be safe to retry indefinitely. Use exponential backoff with jitter.
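A minimal retry helper, assuming the compensation it wraps is already idempotent. It uses "full jitter" (a random delay between zero and an exponentially growing ceiling). The name `retry_until_success`, the default base and cap, and the injectable `sleep` and `rng` parameters are illustrative choices. There is deliberately no attempt limit, matching the principle above; a production system would typically also alert an operator once retries have gone on too long.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_until_success(
    operation: Callable[[], T],
    base: float = 0.1,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> T:
    """Call operation until it returns, backing off exponentially with full jitter."""
    if rng is None:
        rng = random.Random()
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            # Ceiling doubles per attempt, capped at `cap` seconds; the exponent
            # is clamped so very long outages cannot overflow the float math.
            ceiling = min(cap, base * 2 ** min(attempt, 32))
            # Full jitter: a random delay in [0, ceiling] keeps many failing
            # sagas from retrying in lockstep.
            sleep(rng.uniform(0, ceiling))
            attempt += 1
```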
|
**Principle 4: Commutative updates reduce conflicts.** Design data updates so that the order of application does not matter. Instead of "set stock = 50," use "increment stock by 5." This reduces conflicts when multiple sagas operate on the same data concurrently.
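A tiny illustration of why increments commute and absolute writes do not. It assumes two sagas that both read a stock level of 50; one releases 5 units and the other reserves 3. Every application order is tried, and the set of distinct final values is collected.

```python
from itertools import permutations
from typing import Callable, List, Set

Update = Callable[[int], int]

start = 50

# Absolute writes computed from the same stale read of 50: order decides the result.
absolute: List[Update] = [lambda _: start + 5, lambda _: start - 3]
# Deltas applied to whatever the current value is: order does not matter.
deltas: List[Update] = [lambda v: v + 5, lambda v: v - 3]


def outcomes(updates: List[Update]) -> Set[int]:
    """Final values reached over every possible order of applying the updates."""
    results: Set[int] = set()
    for order in permutations(updates):
        value = start
        for update in order:
            value = update(value)
        results.add(value)
    return results


print(sorted(outcomes(absolute)))  # [47, 55]: one saga's update is always lost
print(sorted(outcomes(deltas)))    # [52]
```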
|
### Transaction Classification

Chris Richardson's treatment in *Microservices Patterns* classifies saga transactions into three categories (Garcia-Molina and Salem's original paper simply pairs every step with a compensating transaction):

**Compensatable transactions:** Steps that can be undone by a compensating transaction. These are all steps before the pivot transaction. Examples: reserve inventory, create a pending order, place a hold on funds.

**Pivot transaction:** The go/no-go point of the saga. If the pivot transaction succeeds, the saga is committed to completing. If it fails, the saga must compensate all preceding steps. The pivot transaction is the step after which all remaining steps are retryable (guaranteed to eventually succeed). Example: charging the credit card is often the pivot; once the card is charged, the saga proceeds to fulfillment steps that can be retried.

**Retryable transactions:** Steps that are guaranteed to eventually succeed (possibly after retries). These come after the pivot transaction. They do not need compensating transactions because they will always complete. Examples: sending a confirmation email, updating an analytics event, generating an invoice.

**The correct saga structure is:** Compensatable steps -> Pivot transaction -> Retryable steps. This ordering minimizes the window of inconsistency and ensures that compensation is always possible for steps that might need it.
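The ordering rule can be checked mechanically, for example in a unit test over a saga definition. This sketch uses hypothetical names (`Kind`, `valid_structure`) and assumes, per the structure stated above, that a saga has exactly one pivot.

```python
from enum import Enum
from typing import Sequence


class Kind(Enum):
    COMPENSATABLE = 0
    PIVOT = 1
    RETRYABLE = 2


def valid_structure(kinds: Sequence[Kind]) -> bool:
    """True if kinds form: COMPENSATABLE* PIVOT RETRYABLE*."""
    if list(kinds).count(Kind.PIVOT) != 1:
        return False
    # Enum values increase along the required order, so a valid
    # sequence never steps back to a lower value.
    return all(a.value <= b.value for a, b in zip(kinds, kinds[1:]))


if __name__ == "__main__":
    C, P, R = Kind.COMPENSATABLE, Kind.PIVOT, Kind.RETRYABLE
    # create order, reserve inventory, charge card, create shipment, send email
    print(valid_structure([C, C, P, R, R]))  # True
    # sending the email before charging the card breaks the rule
    print(valid_structure([C, R, P]))        # False
```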
|
### Semantic Locks

Semantic locks are an application-level mechanism to manage concurrent access when sagas operate on the same data. When a compensatable transaction creates or updates a record, it sets a flag indicating the record is "in progress" and may change.

```sql
-- When reserving inventory (compensatable step):
UPDATE inventory SET
    available_quantity = available_quantity - 5,
    saga_lock = 'ORDER-12345',     -- semantic lock
    lock_status = 'PENDING'        -- indicates in-progress saga
WHERE product_id = 'SKU-100'
  AND saga_lock IS NULL            -- never overwrite another saga's lock
  AND available_quantity >= 5;     -- zero rows updated means the step failed

-- Other sagas or queries can check lock_status before acting.

-- When the saga completes:
UPDATE inventory SET
    lock_status = 'COMMITTED',
    saga_lock = NULL
WHERE product_id = 'SKU-100' AND saga_lock = 'ORDER-12345';

-- When the saga compensates (matching on saga_lock makes a repeat a no-op):
UPDATE inventory SET
    available_quantity = available_quantity + 5,
    lock_status = 'COMPENSATED',
    saga_lock = NULL
WHERE product_id = 'SKU-100' AND saga_lock = 'ORDER-12345';
```

Other transactions can treat locked records with suspicion: waiting, reading optimistically, or failing fast, depending on the business rules.
|
### The Saga Log

The saga log (or saga state store) records the current state of every saga instance. For orchestration, the orchestrator persists:
- Saga instance ID
- Current step
- Status (STARTED, COMPENSATING, COMPLETED, FAILED)
- Input data for each step
- Timestamps for each transition
- Compensation progress (which steps have been compensated)

This log is critical for recovery. If the orchestrator crashes mid-saga, it reads the log on restart and resumes from the last recorded state. For choreography, each service maintains its own view of saga state, making recovery harder (one reason orchestration is generally preferred for complex sagas).
|
---

## Trade-Offs Matrix

| Dimension | Saga Benefit | Saga Cost |
|---|---|---|
| **Availability** | No global locks; each service can process independently | Compensating transactions add load during failures |
| **Consistency** | Eventual atomicity: all complete or all compensate | No isolation; intermediate states visible to other transactions |
| **Scalability** | Each service scales independently; no global coordinator bottleneck (choreography) | Orchestrator can become a bottleneck (orchestration) |
| **Complexity** | Well-defined pattern for cross-service transactions | Requires a compensating transaction for every step; dramatically increases the codebase |
| **Debuggability** | Orchestration provides clear execution trace | Choreography makes tracing extremely difficult |
| **Failure handling** | Explicit compensation logic for every failure mode | Compensation failures require additional handling (dead letter, manual intervention) |
| **Performance** | No distributed locks, so lower latency per step | Total transaction time increases due to async coordination |
| **Data integrity** | Guarantees eventual consistency under normal operation | Lost updates, dirty reads, and fuzzy reads possible under concurrent sagas |
| **Testability** | Each step is a local transaction, testable in isolation | Testing all failure/compensation paths requires extensive integration testing |
| **Operational cost** | Handles partial failures gracefully in production | Requires saga-specific monitoring, alerting, and tooling |
| **Development velocity** | Established pattern with framework support | Initial implementation takes 3-10x longer than equivalent monolithic transaction |
| **Team cognitive load** | Clear mental model (sequence of steps + compensations) | Every developer must understand distributed systems failure modes |
|
### Choreography vs. Orchestration Decision

| Factor | Choreography | Orchestration |
|---|---|---|
| Number of services | 2-4 services | 5+ services |
| Flow complexity | Linear, no branching | Branching, parallel steps, conditionals |
| Coupling | Loose (event-based) | Moderate (orchestrator knows all services) |
| Visibility | Low (implicit flow) | High (explicit state machine) |
| Single point of failure | None | Orchestrator (mitigatable) |
| Adding new steps | Requires updating event subscriptions | Change orchestrator logic only |
| Timeout handling | Per-service | Centralized |
| Debugging | Difficult | Straightforward |
| Team autonomy | High (each team owns their events) | Lower (orchestrator team coordinates) |
| Recommended for | Simple, well-understood workflows | Complex, evolving business processes |

---
|
299
|
+
|
|
300
|
+

## Evolution Path

### Stage 1: Monolithic Transactions (Start Here)

Start with a monolith or modular monolith. All business operations execute within a single database transaction. This is the simplest, most reliable approach. Do not introduce sagas until you have a genuine need to decompose into separate services with separate databases.

### Stage 2: Synchronous Cross-Service Calls

When services are first extracted, teams often chain synchronous HTTP calls within a single request: Service A calls Service B, which calls Service C. This works for simple cases but creates temporal coupling (all services must be available at the same time), cascading failures, and latency that grows with every hop.

### Stage 3: Choreography-Based Sagas

For simple, linear workflows (3-4 services), introduce choreography-based sagas. Each service publishes an event after completing its step, and failure events trigger compensating transactions. This is the lowest-overhead saga implementation, but it becomes hard to manage as workflows grow.

### Stage 4: Orchestration-Based Sagas

When workflows become complex (5+ services, branching logic, conditional steps), introduce a saga orchestrator. The orchestrator manages the state machine, coordinates steps, handles retries and timeouts, and invokes compensations on failure. This is the recommended approach for most production saga implementations.

### Stage 5: Workflow Engine

For long-running, complex business processes with human interaction, approval steps, timers, and complex branching, adopt a dedicated workflow engine (Temporal, Camunda, AWS Step Functions). These engines provide durable execution, built-in retry and compensation support, visual process monitoring, and workflow versioning. This is the most sophisticated and most capable approach.

### Anti-evolution: Do Not Skip Stages

Teams that jump directly from Stage 1 to Stage 4 or 5 without understanding why they need sagas tend to over-engineer their solutions. Each stage should be driven by a concrete problem with the current approach, not by anticipated future complexity.

---

## Failure Modes

### 1. Incomplete Compensation

**The problem:** A compensating transaction fails partway through. The saga is now in a state where some steps are completed, some are compensated, and some are stuck.

**Real-world scenario:** An order saga charges a credit card (step 3) and then fails to create a shipment (step 4). The orchestrator initiates compensation: refund the credit card (C3). But the payment gateway is temporarily down, so the refund fails. The customer has been charged but has no order.

**Mitigation:**
- Retry compensating transactions with exponential backoff; a compensation must eventually succeed, so it is never silently abandoned
- After N retries, move the compensation to a dead letter queue for manual intervention
- Alert operations teams immediately when a compensation fails
- Design compensations to be idempotent so retries are safe
- Implement a "saga janitor" process that periodically scans for stuck sagas
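
The orchestrator sketch later in this document calls a `retry_with_backoff` helper without defining it. Below is one minimal, single-process way the retry-then-escalate policy above could look. It is a sketch under assumptions: `retry_with_backoff`, `run_compensation`, and the in-memory `dead_letter_queue` list are hypothetical names (a real system would use a durable queue and retry only transient errors), and `sleep` is injectable so tests can skip the delays.

```python
import random
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CompensationFailedException(Exception):
    """Raised when an action still fails after every retry."""


def retry_with_backoff(
    action: Callable[[], T],
    max_retries: int = 10,
    base_delay: float = 0.1,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Hypothetical helper: call `action` up to `max_retries` times, sleeping
    between attempts with capped exponential backoff and full jitter."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries):
        try:
            return action()
        except Exception as exc:  # real code would retry transient errors only
            last_error = exc
            if attempt < max_retries - 1:
                delay = min(max_delay, base_delay * 2 ** attempt)
                sleep(random.uniform(0, delay))  # jitter de-synchronizes retries
    raise CompensationFailedException(
        f"gave up after {max_retries} attempts"
    ) from last_error


# Hypothetical in-memory stand-in for a durable dead letter queue.
dead_letter_queue: list = []


def run_compensation(
    saga_id: str, name: str, action: Callable[[], object], **retry_options
) -> bool:
    """Retry a compensation; park it in the dead letter queue if it never succeeds."""
    try:
        retry_with_backoff(action, **retry_options)
        return True
    except CompensationFailedException as exc:
        dead_letter_queue.append(
            {"saga_id": saga_id, "compensation": name, "error": repr(exc.__cause__)}
        )
        return False
```

Full jitter (a random delay between zero and the capped exponential value) keeps many sagas that failed at the same moment from retrying in lockstep against a recovering payment gateway.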

### 2. Non-Compensatable Actions

**The problem:** Some operations cannot be undone. Emails have been sent. Physical goods have been dispatched. External APIs have been called with no reversal endpoint.

**Real-world scenario:** A travel booking saga books a flight (step 1), sends a confirmation email (step 2), then fails to book the hotel (step 3). The email is already sent and cannot be unsent.

**Mitigation:**
- Place non-compensatable actions at the end of the saga (after the pivot transaction)
- Use the reservation/confirmation pattern: reserve first (compensatable), confirm last (retryable)
- For emails/notifications: send a "pending" notification early and the "confirmed" notification only once the saga can no longer be rolled back
- Accept approximate compensation: send a "cancellation" email rather than trying to unsend the original
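
A minimal sketch of the reservation/confirmation idea, assuming a single in-memory store and a hypothetical `ReservationStore` API. A pending hold is cheap to cancel (or simply expires after a TTL), while `confirm` is idempotent so it can be retried safely after the pivot. Non-compensatable side effects such as the confirmation email would be triggered only after `confirm` succeeds.

```python
import time
import uuid
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Reservation:
    reservation_id: str
    item: str
    expires_at: float
    status: str = "PENDING"  # PENDING -> CONFIRMED or CANCELLED


class ReservationStore:
    """Hypothetical in-memory store showing reserve -> confirm / cancel."""

    def __init__(self, ttl_seconds: float = 900.0,
                 clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.reservations: Dict[str, Reservation] = {}

    def reserve(self, item: str) -> Reservation:
        # Compensatable step: only a temporary hold is created.
        r = Reservation(str(uuid.uuid4()), item, self.clock() + self.ttl)
        self.reservations[r.reservation_id] = r
        return r

    def cancel(self, reservation_id: str) -> None:
        # Compensation: idempotent, but a confirmed booking is never silently undone.
        r = self.reservations[reservation_id]
        if r.status == "CONFIRMED":
            raise ValueError("already confirmed; needs an explicit cancellation flow")
        r.status = "CANCELLED"

    def confirm(self, reservation_id: str) -> Reservation:
        # Retryable step: confirming twice has no additional effect.
        r = self.reservations[reservation_id]
        if r.status == "CONFIRMED":
            return r
        if r.status == "CANCELLED" or self.clock() > r.expires_at:
            r.status = "CANCELLED"
            raise ValueError("reservation expired or was cancelled")
        r.status = "CONFIRMED"
        return r
```

The TTL is the design knob here: it has to exceed the worst-case time for the saga to reach its confirmation step, otherwise a healthy but slow saga loses its hold.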

### 3. Saga Coordinator Failure

**The problem:** The orchestrator crashes mid-saga. If the saga state is lost, the saga is orphaned: some steps have completed, and nothing is driving it to completion or compensation.

**Mitigation:**
- Persist saga state to a durable store (database, event log) before each step
- On restart, the orchestrator reads the persisted state and resumes from the last recorded step
- Make state changes and outgoing messages atomic (Kafka transactions, transactional outbox) and pair them with idempotent consumers; in practice this gives effectively-once rather than true exactly-once processing
- Deploy the orchestrator with high availability (multiple replicas with leader election)
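
The persist-and-resume mitigation can be sketched with Python's built-in `sqlite3` module standing in for the durable store. Assumptions: the step names, the `saga_state` table, and the `crash_after` parameter (a test-only hook that simulates a crash) are hypothetical. State is recorded after each step completes, so a step interrupted by a crash is re-run on resume and must therefore be idempotent.

```python
import sqlite3
from typing import Callable, Dict, Optional

# Hypothetical step names mirroring the order saga in this document.
STEPS = ["create_order", "reserve_inventory", "authorize_payment", "create_shipment"]


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS saga_state ("
        " saga_id TEXT PRIMARY KEY, last_step INTEGER NOT NULL, status TEXT NOT NULL)"
    )
    return conn


def record_step(conn: sqlite3.Connection, saga_id: str, step: int, status: str) -> None:
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO saga_state (saga_id, last_step, status) VALUES (?, ?, ?)",
            (saga_id, step, status),
        )


def run_saga(
    conn: sqlite3.Connection,
    saga_id: str,
    handlers: Dict[str, Callable[[str], None]],
    crash_after: Optional[str] = None,
) -> None:
    row = conn.execute(
        "SELECT last_step FROM saga_state WHERE saga_id = ?", (saga_id,)
    ).fetchone()
    start = row[0] + 1 if row else 0  # resume after the last recorded step
    for i in range(start, len(STEPS)):
        handlers[STEPS[i]](saga_id)  # must be idempotent: may be replayed
        status = "COMPLETED" if i == len(STEPS) - 1 else "RUNNING"
        record_step(conn, saga_id, i, status)
        if crash_after == STEPS[i]:
            raise RuntimeError("simulated orchestrator crash")
```

In a fuller implementation, the same local transaction that records the step would also write the next command to an outbox table, which is what makes the state change and the message send atomic.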

### 4. Interleaving Sagas (Isolation Anomalies)

**The problem:** Two sagas operating on the same data concurrently create anomalies that neither saga detects, because a saga lacks the isolation (the "I" in ACID) of a single database transaction.

**Real-world scenario:** Saga A reserves the last 5 units of inventory. Saga B reads inventory and sees the 5 units as reserved (a dirty read). Saga A then fails and compensates, releasing the 5 units. Saga B proceeds on the assumption that stock is low and triggers a reorder, which turns out to be unnecessary.

**Anomaly types:**
- **Lost updates:** Saga A modifies a record, and Saga B overwrites it without seeing A's change
- **Dirty reads:** Saga B reads data that Saga A has modified but may still compensate
- **Fuzzy reads:** Saga B reads the same data twice and gets different values because Saga A modified it between reads

**Mitigation:**
- Semantic locks (application-level flags marking records that an in-progress saga is modifying)
- Commutative updates (increment/decrement instead of absolute set)
- Pessimistic ordering: reorder saga steps to minimize dirty read windows
- Version numbers on records to detect concurrent modifications
- Accepting and designing for eventual consistency rather than fighting it
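
Two of these countermeasures, semantic locks and version numbers, can be sketched on a single in-memory record. This is an illustration under assumptions: `InventoryRecord` and its functions are hypothetical, and in a real database each check-and-write would be one atomic statement (for example an `UPDATE ... WHERE version = ?` whose affected-row count is checked). `apply_delta` shows the commutative alternative, which needs no version check because increments and decrements can be applied in any order.

```python
from dataclasses import dataclass
from typing import Optional


class ConcurrentModification(Exception):
    """Raised when a record changed since the caller last read it."""


@dataclass
class InventoryRecord:  # hypothetical record shape
    product_id: str
    available: int
    version: int = 0
    locked_by: Optional[str] = None  # semantic lock: id of the saga mid-update


def try_lock(record: InventoryRecord, saga_id: str) -> bool:
    """Acquire the semantic lock; other sagas see it and can wait, retry, or fail."""
    if record.locked_by not in (None, saga_id):
        return False
    record.locked_by = saga_id
    return True


def release(record: InventoryRecord, saga_id: str) -> None:
    """Release on saga completion or compensation (idempotent)."""
    if record.locked_by == saga_id:
        record.locked_by = None


def set_available(record: InventoryRecord, new_value: int, expected_version: int) -> None:
    """Absolute write guarded by a version check (optimistic concurrency)."""
    if record.version != expected_version:
        raise ConcurrentModification(
            f"{record.product_id}: version {record.version} != {expected_version}"
        )
    record.available = new_value
    record.version += 1


def apply_delta(record: InventoryRecord, delta: int) -> None:
    """Commutative update: the order of concurrent deltas does not matter."""
    record.available += delta
    record.version += 1
```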

### 5. Timeout and Stuck Sagas

**The problem:** A saga step sends a command to a service and never receives a reply. Without a timeout, the saga is stuck indefinitely.

**Mitigation:**
- Implement per-step timeouts in the orchestrator
- After a timeout, retry the step (this requires idempotency, since the original command may have succeeded even though the reply was lost)
- After N retries, begin compensation
- Run a background "saga sweeper" that detects sagas stuck in a state for longer than a configured threshold
- Alert on stuck sagas for manual investigation
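
A saga sweeper can be a periodic job that classifies non-terminal sagas by how long they have sat in their current state. The sketch below assumes a hypothetical `SagaRecord` shape with an `updated_at` timestamp and a retry counter, and it returns decisions instead of acting on them so the policy is easy to test. State names follow the orchestrator sketch later in this document; a saga already past the pivot is escalated rather than compensated.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Hypothetical state groupings taken from the order saga's state names.
TERMINAL_STATES = {"COMPLETED", "COMPENSATED"}
PAST_PIVOT_STATES = {"PAYMENT_AUTHORIZED"}


@dataclass
class SagaRecord:
    saga_id: str
    state: str
    updated_at: float  # epoch seconds of the last state change
    retries: int = 0


def sweep(
    sagas: Iterable[SagaRecord], now: float, stuck_after: float, max_retries: int
) -> Dict[str, List[str]]:
    """Decide what to do with sagas stuck longer than `stuck_after` seconds."""
    actions: Dict[str, List[str]] = {"retry": [], "compensate": [], "alert": []}
    for saga in sagas:
        if saga.state in TERMINAL_STATES or now - saga.updated_at < stuck_after:
            continue
        if saga.state == "COMPENSATING":
            actions["alert"].append(saga.saga_id)  # a stuck rollback needs a human
        elif saga.retries < max_retries:
            actions["retry"].append(saga.saga_id)  # re-send the step (idempotent)
        elif saga.state in PAST_PIVOT_STATES:
            actions["alert"].append(saga.saga_id)  # past the pivot: never roll back
        else:
            actions["compensate"].append(saga.saga_id)
    return actions
```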

### 6. Message Ordering and Duplication

**The problem:** In event-driven choreography, messages may arrive out of order or be delivered more than once. A compensation event may even arrive before the event for the original transaction.

**Mitigation:**
- Design every step and every compensation to be idempotent
- Use saga instance IDs to correlate messages and detect duplicates
- Use sequence numbers or causality tracking to detect out-of-order delivery
- Choose messaging infrastructure with per-partition ordering guarantees (for example Kafka), and key messages by saga ID so that all events for one saga land on the same partition
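
Duplicate detection and reordering can be combined in one consumer when the producer stamps each saga's events with a contiguous sequence number starting at 0. The sketch below makes that assumption; `SequencedConsumer` is a hypothetical name, and its bookkeeping lives in memory, whereas a real consumer would persist `next_seq` in the same transaction as the handler's side effects.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict


class SequencedConsumer:
    """Hypothetical consumer: passes each saga's events to `handler` once,
    in sequence order, buffering early arrivals (in-memory sketch)."""

    def __init__(self, handler: Callable[[str, Any], None]):
        self.handler = handler
        self.next_seq: DefaultDict[str, int] = defaultdict(int)
        self.buffered: DefaultDict[str, Dict[int, Any]] = defaultdict(dict)

    def receive(self, saga_id: str, seq: int, event: Any) -> None:
        if seq < self.next_seq[saga_id] or seq in self.buffered[saga_id]:
            return  # duplicate delivery: already handled or already buffered
        self.buffered[saga_id][seq] = event
        # Drain every event that is now contiguous with what was handled.
        while self.next_seq[saga_id] in self.buffered[saga_id]:
            ready = self.buffered[saga_id].pop(self.next_seq[saga_id])
            self.handler(saga_id, ready)
            self.next_seq[saga_id] += 1
```

A `PaymentFailed` event that overtakes its `OrderCreated` is held back until the gap is filled, so compensation logic never runs against a step it has not seen yet.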

### 7. Cascading Saga Failures

**The problem:** One saga's compensation triggers events that cause other sagas to fail and compensate, creating a cascade of compensations across the system.

**Mitigation:**
- Design services to handle "saga compensated" events gracefully
- Use circuit breakers to prevent cascading failures
- Rate-limit saga creation during periods of high failure
- Monitor saga failure rates and alert on anomalies
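
A circuit breaker stops saga steps from hammering a failing dependency, which is one way compensation storms spread. The sketch below is a minimal, single-threaded version with hypothetical names and thresholds: after `failure_threshold` consecutive failures it fails fast for `reset_timeout` seconds, then lets a trial call through (the "half-open" state) and closes again if that call succeeds.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised instead of calling the dependency while the circuit is open."""


class CircuitBreaker:
    """Hypothetical minimal breaker; not thread-safe."""

    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.opened_at is not None and self.clock() - self.opened_at < self.reset_timeout:
            raise CircuitOpenError("dependency unavailable; failing fast")
        # Closed, or open long enough that a trial (half-open) call is allowed.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        self.failures = 0
        self.opened_at = None
        return result
```

Inside a saga, a `CircuitOpenError` would typically be handled like any other step failure: retry later, or compensate if the step is still before the pivot.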

---

## Technology Landscape

### Temporal (Recommended for New Projects)

Temporal is an open-source durable execution platform developed by Temporal Technologies. It grew out of Cadence, which its founding engineers built at Uber, and it supports the saga pattern natively through workflow-as-code.

**Saga support:** Write saga logic as code in Go, Java, TypeScript, Python, or .NET. Temporal guarantees that a workflow runs to completion: if a worker crashes, another worker replays the workflow's event history to rebuild its state and continues from where it left off. Compensation is expressed naturally with try/catch and a compensation stack, and the Java SDK also ships a `Saga` helper class. Retries with exponential backoff and timeouts are built in.

**Strengths:** Code-native (no DSLs or YAML), durable execution guarantees, built-in retries and timeouts, excellent observability via the Temporal Web UI, multi-language SDKs, active open-source community.

**Weaknesses:** Operational complexity of self-hosting the Temporal server (a persistence store such as Cassandra or PostgreSQL, often plus Elasticsearch for visibility), a learning curve for the replay-based execution model (workflow code must be deterministic), and Temporal Cloud pricing at scale ($25 per 1M actions at the time of writing).

### AWS Step Functions

AWS Step Functions is a serverless workflow orchestration service. Sagas are implemented as state machines defined in the Amazon States Language (JSON/YAML), with error handling and compensation states.

**Saga support:** Define saga steps as states in a state machine. Use `Catch` fields to route failures to compensation states. Step Functions handles retries (via `Retry` fields), timeouts, and parallel execution, and integrates natively with Lambda, DynamoDB, SQS, SNS, and other AWS services.

**Strengths:** Fully managed (no infrastructure to operate), deep AWS ecosystem integration, visual workflow designer (Workflow Studio), pay-per-use pricing ($0.025 per 1,000 state transitions for Standard workflows; $1.00 per 1M requests plus a duration charge for Express workflows).

**Weaknesses:** AWS vendor lock-in, workflows must be expressed in the Amazon States Language, state payloads are limited to 256 KB, cold start latency for Lambda-backed steps, and less expressiveness than code-based approaches.

### Camunda

Camunda is a BPMN-based process orchestration platform designed for enterprise workflows. Camunda 8 is available both as a managed SaaS offering and as a self-managed deployment.

**Saga support:** Model sagas as BPMN processes with compensation events. The engine executes the process, retries failed jobs, and triggers the compensation handlers you model when a compensation event is thrown. It also supports visual process modeling, auditing, and process versioning.

**Strengths:** BPMN standard compliance, visual process modeling for business/developer collaboration, enterprise-grade auditing and compliance features, both cloud and on-premise deployment, strong Java ecosystem integration.

**Weaknesses:** BPMN overhead for simple workflows, steeper learning curve for non-Java teams, enterprise pricing, and a heavier operational footprint than Temporal.

### MassTransit (Automatonymous)

MassTransit is an open-source distributed application framework for .NET. It implements saga orchestration with state machines; the state machine library was originally the separate Automatonymous project and has been part of MassTransit itself since version 8.

**Saga support:** Define sagas as state machines in C#. MassTransit handles message routing, saga persistence (Entity Framework, MongoDB, Redis, etc.), and correlation. It integrates with RabbitMQ, Azure Service Bus, Amazon SQS, and Kafka.

**Strengths:** Native .NET integration, flexible message broker support, mature and well-documented, active open-source community, no separate infrastructure beyond the message broker and saga persistence store.

**Weaknesses:** .NET only, requires understanding of state machine concepts, and compensation logic must be written by hand.

### Axon Framework

Axon Framework is a Java framework for building event-driven microservices with built-in CQRS and saga support.

**Saga support:** Annotate a Java class with `@Saga`, mark the handler that begins the saga with `@StartSaga`, define `@SagaEventHandler` methods (each with an `associationProperty`) for each step, and call `SagaLifecycle.end()` (or annotate a handler with `@EndSaga`) to complete the saga. Axon handles event routing, saga persistence, and correlation.

**Strengths:** Deep integration with CQRS and event sourcing, Java-native, Axon Server provides an event store and message routing, well-suited to DDD-based architectures.

**Weaknesses:** Java/Kotlin only, Axon Server adds infrastructure complexity, strong opinions that may conflict with existing architecture, commercial licensing for Axon Server Enterprise.

### Eventuate Tram

Eventuate Tram is an open-source framework by Chris Richardson (author of *Microservices Patterns*) specifically designed for implementing sagas in Java/Spring applications.

**Saga support:** Define saga orchestrators as Java classes with step definitions and compensations. Uses the transactional outbox pattern for reliable messaging. Supports both choreography (via domain events) and orchestration (via Eventuate Tram Sagas).

**Strengths:** Purpose-built for sagas, built-in transactional outbox, works with existing Spring applications, created by a leading authority on microservices patterns.

**Weaknesses:** Java/Spring only, smaller community than Temporal or MassTransit, less actively maintained than the alternatives.

### Manual Implementation

For simple sagas (2-3 steps), a manual implementation using a message broker (Kafka, RabbitMQ) and a saga state table is viable. This avoids framework dependencies but requires implementing:
- Saga state persistence
- Step coordination logic
- Compensation logic
- Idempotency handling
- Timeout detection
- Retry logic

This is recommended only for simple workflows where adopting a framework is not justified. For anything beyond 3-4 steps, use a framework or workflow engine.
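
To make the list above concrete, here is a deliberately small in-process saga runner that covers only step coordination and compensation logic. It is a sketch under assumptions: `Step` and `execute_saga` are hypothetical names, each action and compensation receives a shared context dict, and persistence, idempotency, timeouts, and retries (the other items on the list) are left out on purpose.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Context = Dict[str, object]


@dataclass
class Step:  # hypothetical step definition
    name: str
    action: Callable[[Context], None]
    compensation: Optional[Callable[[Context], None]] = None  # None = nothing to undo


def execute_saga(steps: List[Step], ctx: Context) -> Tuple[str, Optional[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse.

    Returns ("COMPLETED", None) or ("COMPENSATED", name_of_failed_step).
    """
    completed: List[Step] = []
    for step in steps:
        try:
            step.action(ctx)
        except Exception:
            for done in reversed(completed):
                if done.compensation is not None:
                    done.compensation(ctx)
            return "COMPENSATED", step.name
        completed.append(step)
    return "COMPLETED", None
```

Everything this runner keeps in local variables (`completed`, the current step) is exactly what a production version would need to write to the saga state table before moving on.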

### Technology Selection Guide

| Criterion | Temporal | Step Functions | Camunda | MassTransit | Axon |
|---|---|---|---|---|---|
| Language | Go, Java, TS, Python, .NET | JSON/YAML (ASL) | Java, REST API | C# (.NET) | Java/Kotlin |
| Hosting | Self-hosted or Cloud | AWS managed | Self-hosted or Cloud | Self-hosted | Self-hosted or Cloud |
| Saga complexity | Any | Moderate | Any | Moderate-High | Moderate-High |
| Learning curve | Moderate | Low (AWS users) | High | Moderate | High |
| Vendor lock-in | None | AWS | None | None | Moderate (Axon Server) |
| Operational overhead | High (self-hosted) | None | Moderate-High | Low | Moderate |
| Community | Large, growing | Very large (AWS) | Large (enterprise) | Medium (.NET) | Medium (Java/DDD) |

---

## Decision Tree

```
Do you need a transaction spanning multiple services?
├── No --> Use a local database transaction. Stop here.
└── Yes
    ├── Is eventual consistency acceptable?
    │   ├── No --> Reconsider service boundaries. Can you merge services?
    │   │          If not, consider synchronous 2PC (accepting availability cost)
    │   │          or redesign the business process to tolerate eventual consistency.
    │   └── Yes
    │       └── Can every step be compensated?
    │           ├── No --> Can non-compensatable steps be moved to the end (after pivot)?
    │           │          If yes, restructure the saga. If no, saga is a poor fit.
    │           │          Consider alternative patterns (reservation/confirmation, outbox only).
    │           └── Yes
    │               ├── How many services are involved?
    │               │   ├── 2-3 services, linear flow
    │               │   │   └── Choreography-based saga
    │               │   │       (or manual implementation with message broker)
    │               │   ├── 4-6 services, some branching
    │               │   │   └── Orchestration-based saga
    │               │   │       (Temporal, MassTransit, Eventuate Tram)
    │               │   └── 7+ services or complex branching/parallel steps
    │               │       └── Workflow engine (Temporal, Camunda, Step Functions)
    │               │           Consider decomposing into smaller sagas
    │               └── Does the team have distributed systems experience?
    │                   ├── No --> Start with a modular monolith. Learn the failure
    │                   │          modes before distributing. Read "Microservices Patterns"
    │                   │          by Chris Richardson before implementing.
    │                   └── Yes --> Proceed with saga implementation.
    └── Is the "transaction" actually a long-running business process?
        └── Yes --> Consider a process manager or workflow engine instead of a saga.
                    Sagas are for finite sequences of steps, not open-ended processes.
```
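
For teams that want to encode this guidance in an architecture-review checklist or script, the main path through the tree can be expressed as a small function. The thresholds come straight from the tree; the function name and parameters are hypothetical, the earlier questions are modeled as boolean inputs, and the team-experience and long-running-process branches are left to human judgment.

```python
def recommend_coordination(
    spans_services: bool,
    eventual_consistency_ok: bool,
    all_steps_compensatable: bool,
    num_services: int,
    complex_branching: bool,
) -> str:
    """Hypothetical encoding of the decision tree's main path."""
    if not spans_services:
        return "local database transaction"
    if not eventual_consistency_ok:
        return "reconsider service boundaries"
    if not all_steps_compensatable:
        return "restructure: move non-compensatable steps after the pivot"
    if num_services >= 7 or complex_branching:
        return "workflow engine"
    if num_services >= 4:
        return "orchestration-based saga"
    return "choreography-based saga"
```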

---

## Implementation Sketch

### Order Saga: Orchestration with Compensation

This sketch shows an order saga orchestrator coordinating four services. The orchestrator is a state machine that persists its state to a database.

```
Saga Definition:

Step 1: CreateOrder        Compensation: CancelOrder
Step 2: ReserveInventory   Compensation: UnreserveInventory
Step 3: AuthorizePayment   Compensation: RefundPayment         <-- PIVOT
Step 4: CreateShipment     (retryable, no compensation needed)
```
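
The definition above follows a structural rule: compensatable steps first, then a single pivot, then retryable steps. A hypothetical validator for such definitions, assuming each step is tagged with one of three kinds, could flag a definition that places a compensatable step after a retryable one (the shape behind the non-compensatable-action failure mode), a missing or duplicated pivot, or a compensatable step without a compensation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Kind(Enum):  # order of values encodes the required step order
    COMPENSATABLE = 0
    PIVOT = 1
    RETRYABLE = 2


@dataclass
class StepDef:  # hypothetical declarative step
    name: str
    kind: Kind
    compensation: Optional[str] = None


def validate(steps: List[StepDef]) -> List[str]:
    """Return a list of problems; an empty list means the definition is well-formed."""
    problems: List[str] = []
    pivots = [s.name for s in steps if s.kind is Kind.PIVOT]
    if len(pivots) != 1:
        problems.append(f"expected exactly one pivot, found {len(pivots)}")
    for prev, cur in zip(steps, steps[1:]):
        if cur.kind.value < prev.kind.value:
            problems.append(
                f"{cur.name} ({cur.kind.name}) must not follow {prev.name} ({prev.kind.name})"
            )
    for s in steps:
        if s.kind is Kind.COMPENSATABLE and s.compensation is None:
            problems.append(f"{s.name} is compensatable but has no compensation")
    return problems
```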

**Pseudocode: Saga Orchestrator**

```python
class StepFailedException(Exception):
    """Raised by a service client when a compensatable step fails for good."""


class CompensationFailedException(Exception):
    """Raised by retry_with_backoff when an action keeps failing."""


class OrderSaga:
    # The service clients (order_service, inventory_service, payment_service,
    # shipping_service), saga_store, dead_letter_queue, alert_operations and
    # retry_with_backoff are assumed to be injected or defined elsewhere.

    def __init__(self, saga_id, order_data):
        self.saga_id = saga_id
        self.state = "CREATED"
        self.order_data = order_data
        self.compensation_stack = []

    def execute(self):
        try:
            # Step 1: Create Order (compensatable)
            order = self.order_service.create(
                saga_id=self.saga_id,
                data=self.order_data,
            )
            self.compensation_stack.append(
                lambda: self.order_service.cancel(self.saga_id, order.id)
            )
            self.persist_state("ORDER_CREATED")

            # Step 2: Reserve Inventory (compensatable)
            reservation = self.inventory_service.reserve(
                saga_id=self.saga_id,
                items=self.order_data.items,
            )
            self.compensation_stack.append(
                lambda: self.inventory_service.unreserve(self.saga_id, reservation.id)
            )
            self.persist_state("INVENTORY_RESERVED")

            # Step 3: Authorize Payment (PIVOT transaction). If it fails,
            # steps 1-2 are compensated.
            payment = self.payment_service.authorize(
                saga_id=self.saga_id,
                amount=self.order_data.total,
                payment_method=self.order_data.payment_method,
            )
            # Once the pivot succeeds, the saga is committed to completing.
            # The refund is recorded for operator-initiated rollback only; the
            # saga itself never runs it, because later steps are retried.
            self.compensation_stack.append(
                lambda: self.payment_service.refund(self.saga_id, payment.id)
            )
            self.persist_state("PAYMENT_AUTHORIZED")
        except StepFailedException as e:
            self.compensate(e)
            return

        # Step 4: Create Shipment (retryable). It runs outside the try block so
        # a shipping failure is retried, never compensated; the shipping service
        # must deduplicate on saga_id. If retries are exhausted, the exception
        # propagates and the saga stays in PAYMENT_AUTHORIZED for a sweeper.
        retry_with_backoff(
            lambda: self.shipping_service.create_shipment(
                saga_id=self.saga_id,
                order_id=order.id,
                address=self.order_data.shipping_address,
            ),
            max_retries=100,
        )
        self.persist_state("COMPLETED")

    def compensate(self, original_error):
        self.persist_state("COMPENSATING")
        all_succeeded = True
        while self.compensation_stack:
            compensation = self.compensation_stack.pop()
            try:
                retry_with_backoff(compensation, max_retries=10)
            except CompensationFailedException as e:
                # Compensation failed after retries: escalate, then keep
                # compensating the remaining steps
                all_succeeded = False
                self.dead_letter_queue.send(
                    saga_id=self.saga_id,
                    failed_compensation=compensation,
                    error=e,
                )
                self.alert_operations(self.saga_id, e)
        self.persist_state("COMPENSATED" if all_succeeded else "COMPENSATION_FAILED")

    def persist_state(self, new_state):
        self.state = new_state
        # Callables cannot be serialized: a real store would persist the names
        # and IDs of completed steps so compensations can be rebuilt on restart.
        self.saga_store.save(self.saga_id, self.state, self.compensation_stack)
```

**Pseudocode: Idempotent Service Handler**

```python
class InventoryService:
    # self.db and Reservation are assumed persistence helpers. The reservations
    # table should have a UNIQUE constraint on saga_id so that two concurrent
    # deliveries of the same command cannot both insert a reservation.

    def reserve(self, saga_id, items):
        with self.db.transaction():
            # Idempotency: look up inside the transaction so a redelivered
            # command returns the existing reservation instead of reserving twice
            existing = self.db.find_reservation(saga_id=saga_id)
            if existing:
                return existing  # may be a tombstone if unreserve arrived first

            reservation = Reservation(
                saga_id=saga_id,
                items=items,
                status="RESERVED",
                lock_status="PENDING",  # Semantic lock: saga still in progress
            )
            self.db.save(reservation)
            for item in items:
                self.db.update_inventory(
                    product_id=item.product_id,
                    decrement=item.quantity,
                )
            return reservation

    def unreserve(self, saga_id, reservation_id):
        with self.db.transaction():
            reservation = self.db.find_reservation(saga_id=saga_id)
            if reservation is None:
                # Compensation arrived before the reserve command: record a
                # tombstone so a late reserve becomes a no-op
                self.db.save(Reservation(
                    saga_id=saga_id,
                    items=[],
                    status="UNRESERVED",
                    lock_status="COMPENSATED",
                ))
                return
            if reservation.status == "UNRESERVED":
                return  # Idempotency: already compensated

            reservation.status = "UNRESERVED"
            reservation.lock_status = "COMPENSATED"
            self.db.save(reservation)
            for item in reservation.items:
                self.db.update_inventory(
                    product_id=item.product_id,
                    increment=item.quantity,
                )
```

**Pseudocode: Choreography Variant (Event-Driven)**

```python
# Assumed infrastructure: @on_event registers a method as the handler for an
# event type, and self.publish() emits an event. Handlers must be idempotent
# because brokers typically deliver at least once.

# Order Service
class OrderEventHandler:
    @on_event("OrderRequested")
    def handle_order_requested(self, event):
        order = self.create_order(event.order_data)
        self.publish("OrderCreated",
                     order_id=order.id,
                     items=event.items,
                     amount=event.order_data.total)

    @on_event("PaymentCharged")
    def handle_payment_charged(self, event):
        self.approve_order(event.order_id)  # happy path ends here

    @on_event("InventoryReservationFailed")
    def handle_reservation_failed(self, event):
        self.cancel_order(event.order_id)
        self.publish("OrderCancelled", order_id=event.order_id)

    @on_event("PaymentFailed")
    def handle_payment_failed(self, event):
        self.cancel_order(event.order_id)
        self.publish("OrderCancelled", order_id=event.order_id)

# Inventory Service
class InventoryEventHandler:
    @on_event("OrderCreated")
    def handle_order_created(self, event):
        try:
            reservation = self.reserve(event.order_id, event.items)
            self.publish("InventoryReserved",
                         order_id=event.order_id,
                         reservation_id=reservation.id,
                         amount=event.amount)  # forwarded for the payment step
        except InsufficientStockError:
            self.publish("InventoryReservationFailed",
                         order_id=event.order_id)

    @on_event("PaymentFailed")
    def handle_payment_failed(self, event):
        self.unreserve(event.order_id)
        self.publish("InventoryUnreserved", order_id=event.order_id)

# Payment Service
class PaymentEventHandler:
    @on_event("InventoryReserved")
    def handle_inventory_reserved(self, event):
        try:
            payment = self.charge(event.order_id, event.amount)
            self.publish("PaymentCharged",
                         order_id=event.order_id,
                         payment_id=payment.id)
        except PaymentDeclinedError:
            self.publish("PaymentFailed", order_id=event.order_id)
```
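
The choreography handlers above assume an `on_event` decorator and a `publish` method. One hypothetical way to provide them for local testing is a tiny synchronous, in-memory event bus; `EventBus`, `Event`, and this `on_event` are illustrative names, not a library API, and a production system would use a broker such as Kafka or RabbitMQ with at-least-once delivery instead.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

_HANDLER_ATTR = "_handles_event"  # hypothetical marker attribute


def on_event(event_type: str) -> Callable:
    """Mark a method as the handler for `event_type` (registered by EventBus.subscribe)."""
    def mark(method: Callable) -> Callable:
        setattr(method, _HANDLER_ATTR, event_type)
        return method
    return mark


class Event:
    def __init__(self, event_type: str, **fields: Any):
        self.type = event_type
        self.__dict__.update(fields)  # expose fields as attributes (event.order_id)


class EventBus:
    """Synchronous in-memory bus: publish() calls every subscribed handler."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Callable[[Event], None]]] = defaultdict(list)
        self.log: List[str] = []

    def subscribe(self, service: Any) -> None:
        # Register every method marked with @on_event and give the service publish().
        for name in dir(service):
            method = getattr(service, name)
            event_type = getattr(method, _HANDLER_ATTR, None)
            if event_type is not None:
                self.handlers[event_type].append(method)
        service.publish = self.publish

    def publish(self, event_type: str, **fields: Any) -> None:
        self.log.append(event_type)
        event = Event(event_type, **fields)
        for handler in list(self.handlers[event_type]):
            handler(event)
```

Because delivery here is synchronous and depth-first, the whole saga plays out inside the first `publish` call, which makes compensation paths easy to exercise in unit tests but hides the reordering and duplication a real broker introduces.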

### Temporal Workflow Implementation (Production-Grade)

```typescript
// saga-workflow.ts: Temporal workflow definition
import { proxyActivities, ApplicationFailure, log } from '@temporalio/workflow';
import type * as activities from './activities';
// Assumed local module defining the workflow's input and output types.
import type { OrderData, OrderResult } from './types';

// Steps up to and including the pivot: a few attempts, then compensate.
const { createOrder, cancelOrder,
        reserveInventory, unreserveInventory,
        authorizePayment } = proxyActivities<typeof activities>({
  startToCloseTimeout: '30s',
  retry: { maximumAttempts: 3 },
});

// Retryable step after the pivot: with no retry override, Temporal's default
// activity retry policy (exponential backoff, unlimited attempts) applies.
const { createShipment } = proxyActivities<typeof activities>({
  startToCloseTimeout: '30s',
});

export async function orderSaga(orderData: OrderData): Promise<OrderResult> {
  const compensations: (() => Promise<void>)[] = [];
  let orderId: string;

  try {
    // Step 1: Create Order
    const order = await createOrder(orderData);
    orderId = order.id;
    compensations.push(() => cancelOrder(order.id));

    // Step 2: Reserve Inventory
    const reservation = await reserveInventory(order.id, orderData.items);
    compensations.push(() => unreserveInventory(reservation.id));

    // Step 3: Authorize Payment (Pivot). No compensation is registered here:
    // once it succeeds the saga only moves forward (refundPayment remains
    // available as an activity for operator-initiated rollback).
    await authorizePayment(order.id, orderData.total);
  } catch (err) {
    // Compensate in reverse order
    for (const compensation of compensations.reverse()) {
      try {
        await compensation();
      } catch (compensationErr) {
        // Log but continue compensating remaining steps
        log.error('Compensation failed, continuing', { error: String(compensationErr) });
      }
    }
    throw ApplicationFailure.nonRetryable(
      `Order saga failed and compensated: ${err}`
    );
  }

  // Step 4: Create Shipment (Retryable). Outside the try block, so a shipping
  // failure is retried by Temporal instead of triggering compensation.
  const shipment = await createShipment(orderId, orderData.shippingAddress);

  return { orderId, shipmentId: shipment.id, status: 'COMPLETED' };
}
```

---

## Cross-References

- **Event-Driven Architecture**: Choreography sagas rely on event-driven communication. Understanding event schemas, ordering, and event sourcing is essential.
- **Microservices**: The database-per-service pattern creates the need for the cross-service transaction coordination that sagas provide.
- **Data Consistency**: Sagas implement eventual consistency. Understanding the CAP theorem and consistency models is prerequisite knowledge.
- **Idempotency and Retry**: Every saga step and compensation must be idempotent because messages are delivered at least once.
- **Distributed Systems Fundamentals**: Sagas operate under partial failures, network partitions, and message loss. Understanding distributed failure modes is essential.
- **Transactional Outbox**: The outbox pattern records a business change and its outgoing event in one local database transaction, so the event is published if and only if the change commits. Sagas frequently use it for reliable step transitions.
- **CQRS**: Command Query Responsibility Segregation is often paired with sagas. Commands trigger saga steps; queries read projections of eventually consistent state.

---

## Further Reading

- Garcia-Molina & Salem (1987). "Sagas." ACM SIGMOD: https://dl.acm.org/doi/10.1145/38713.38742
- Richardson, C. (2018). *Microservices Patterns.* Manning. Chapter 4 covers sagas in depth.
- Temporal: https://temporal.io/blog/saga-pattern-made-easy
- Azure Architecture Center: https://learn.microsoft.com/en-us/azure/architecture/patterns/saga
- AWS Prescriptive Guidance: https://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/saga.html
- microservices.io: https://microservices.io/patterns/data/saga.html