@wazir-dev/cli 1.0.0
This diff shows the content of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +111 -0
- package/CHANGELOG.md +14 -0
- package/CONTRIBUTING.md +101 -0
- package/LICENSE +21 -0
- package/README.md +314 -0
- package/assets/composition-engine.mmd +34 -0
- package/assets/demo-script.sh +17 -0
- package/assets/logo-dark.svg +14 -0
- package/assets/logo.svg +14 -0
- package/assets/pipeline.mmd +39 -0
- package/assets/record-demo.sh +51 -0
- package/docs/README.md +51 -0
- package/docs/adapters/context-mode.md +60 -0
- package/docs/concepts/architecture.md +87 -0
- package/docs/concepts/artifact-model.md +60 -0
- package/docs/concepts/composition-engine.md +36 -0
- package/docs/concepts/indexing-and-recall.md +160 -0
- package/docs/concepts/observability.md +41 -0
- package/docs/concepts/roles-and-workflows.md +59 -0
- package/docs/concepts/terminology-policy.md +27 -0
- package/docs/getting-started/01-installation.md +78 -0
- package/docs/getting-started/02-first-run.md +102 -0
- package/docs/getting-started/03-adding-to-project.md +15 -0
- package/docs/getting-started/04-host-setup.md +15 -0
- package/docs/guides/ci-integration.md +15 -0
- package/docs/guides/creating-skills.md +15 -0
- package/docs/guides/expertise-module-authoring.md +15 -0
- package/docs/guides/hook-development.md +15 -0
- package/docs/guides/memory-and-learnings.md +34 -0
- package/docs/guides/multi-host-export.md +15 -0
- package/docs/guides/troubleshooting.md +101 -0
- package/docs/guides/writing-custom-roles.md +15 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-design.md +592 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +598 -0
- package/docs/plans/2026-03-15-docs-enforcement-plan.md +238 -0
- package/docs/readmes/INDEX.md +99 -0
- package/docs/readmes/features/expertise/README.md +171 -0
- package/docs/readmes/features/exports/README.md +222 -0
- package/docs/readmes/features/hooks/README.md +103 -0
- package/docs/readmes/features/hooks/loop-cap-guard.md +133 -0
- package/docs/readmes/features/hooks/post-tool-capture.md +121 -0
- package/docs/readmes/features/hooks/post-tool-lint.md +130 -0
- package/docs/readmes/features/hooks/pre-compact-summary.md +122 -0
- package/docs/readmes/features/hooks/pre-tool-capture-route.md +100 -0
- package/docs/readmes/features/hooks/protected-path-write-guard.md +128 -0
- package/docs/readmes/features/hooks/session-start.md +119 -0
- package/docs/readmes/features/hooks/stop-handoff-harvest.md +125 -0
- package/docs/readmes/features/roles/README.md +157 -0
- package/docs/readmes/features/roles/clarifier.md +152 -0
- package/docs/readmes/features/roles/content-author.md +190 -0
- package/docs/readmes/features/roles/designer.md +193 -0
- package/docs/readmes/features/roles/executor.md +184 -0
- package/docs/readmes/features/roles/learner.md +210 -0
- package/docs/readmes/features/roles/planner.md +182 -0
- package/docs/readmes/features/roles/researcher.md +164 -0
- package/docs/readmes/features/roles/reviewer.md +184 -0
- package/docs/readmes/features/roles/specifier.md +162 -0
- package/docs/readmes/features/roles/verifier.md +215 -0
- package/docs/readmes/features/schemas/README.md +178 -0
- package/docs/readmes/features/skills/README.md +63 -0
- package/docs/readmes/features/skills/brainstorming.md +96 -0
- package/docs/readmes/features/skills/debugging.md +148 -0
- package/docs/readmes/features/skills/design.md +120 -0
- package/docs/readmes/features/skills/prepare-next.md +109 -0
- package/docs/readmes/features/skills/run-audit.md +159 -0
- package/docs/readmes/features/skills/scan-project.md +109 -0
- package/docs/readmes/features/skills/self-audit.md +176 -0
- package/docs/readmes/features/skills/tdd.md +137 -0
- package/docs/readmes/features/skills/using-skills.md +92 -0
- package/docs/readmes/features/skills/verification.md +120 -0
- package/docs/readmes/features/skills/writing-plans.md +104 -0
- package/docs/readmes/features/tooling/README.md +320 -0
- package/docs/readmes/features/workflows/README.md +186 -0
- package/docs/readmes/features/workflows/author.md +181 -0
- package/docs/readmes/features/workflows/clarify.md +154 -0
- package/docs/readmes/features/workflows/design-review.md +171 -0
- package/docs/readmes/features/workflows/design.md +169 -0
- package/docs/readmes/features/workflows/discover.md +162 -0
- package/docs/readmes/features/workflows/execute.md +173 -0
- package/docs/readmes/features/workflows/learn.md +167 -0
- package/docs/readmes/features/workflows/plan-review.md +165 -0
- package/docs/readmes/features/workflows/plan.md +170 -0
- package/docs/readmes/features/workflows/prepare-next.md +167 -0
- package/docs/readmes/features/workflows/review.md +169 -0
- package/docs/readmes/features/workflows/run-audit.md +191 -0
- package/docs/readmes/features/workflows/spec-challenge.md +159 -0
- package/docs/readmes/features/workflows/specify.md +160 -0
- package/docs/readmes/features/workflows/verify.md +177 -0
- package/docs/readmes/packages/README.md +50 -0
- package/docs/readmes/packages/ajv.md +117 -0
- package/docs/readmes/packages/context-mode.md +118 -0
- package/docs/readmes/packages/gray-matter.md +116 -0
- package/docs/readmes/packages/node-test.md +137 -0
- package/docs/readmes/packages/yaml.md +112 -0
- package/docs/reference/configuration-reference.md +159 -0
- package/docs/reference/expertise-index.md +52 -0
- package/docs/reference/git-flow.md +43 -0
- package/docs/reference/hooks.md +87 -0
- package/docs/reference/host-exports.md +50 -0
- package/docs/reference/launch-checklist.md +172 -0
- package/docs/reference/marketplace-listings.md +76 -0
- package/docs/reference/release-process.md +34 -0
- package/docs/reference/roles-reference.md +77 -0
- package/docs/reference/skills.md +33 -0
- package/docs/reference/templates.md +29 -0
- package/docs/reference/tooling-cli.md +94 -0
- package/docs/truth-claims.yaml +222 -0
- package/expertise/PROGRESS.md +63 -0
- package/expertise/README.md +18 -0
- package/expertise/antipatterns/PROGRESS.md +56 -0
- package/expertise/antipatterns/backend/api-design-antipatterns.md +1271 -0
- package/expertise/antipatterns/backend/auth-antipatterns.md +1195 -0
- package/expertise/antipatterns/backend/caching-antipatterns.md +622 -0
- package/expertise/antipatterns/backend/database-antipatterns.md +1038 -0
- package/expertise/antipatterns/backend/index.md +24 -0
- package/expertise/antipatterns/backend/microservices-antipatterns.md +850 -0
- package/expertise/antipatterns/code/architecture-antipatterns.md +919 -0
- package/expertise/antipatterns/code/async-antipatterns.md +622 -0
- package/expertise/antipatterns/code/code-smells.md +1186 -0
- package/expertise/antipatterns/code/dependency-antipatterns.md +1209 -0
- package/expertise/antipatterns/code/error-handling-antipatterns.md +1360 -0
- package/expertise/antipatterns/code/index.md +27 -0
- package/expertise/antipatterns/code/naming-and-abstraction.md +1118 -0
- package/expertise/antipatterns/code/state-management-antipatterns.md +1076 -0
- package/expertise/antipatterns/code/testing-antipatterns.md +1053 -0
- package/expertise/antipatterns/design/accessibility-antipatterns.md +1136 -0
- package/expertise/antipatterns/design/dark-patterns.md +1121 -0
- package/expertise/antipatterns/design/index.md +22 -0
- package/expertise/antipatterns/design/ui-antipatterns.md +1202 -0
- package/expertise/antipatterns/design/ux-antipatterns.md +680 -0
- package/expertise/antipatterns/frontend/css-layout-antipatterns.md +691 -0
- package/expertise/antipatterns/frontend/flutter-antipatterns.md +1827 -0
- package/expertise/antipatterns/frontend/index.md +23 -0
- package/expertise/antipatterns/frontend/mobile-antipatterns.md +573 -0
- package/expertise/antipatterns/frontend/react-antipatterns.md +1128 -0
- package/expertise/antipatterns/frontend/spa-antipatterns.md +1235 -0
- package/expertise/antipatterns/index.md +31 -0
- package/expertise/antipatterns/performance/index.md +20 -0
- package/expertise/antipatterns/performance/performance-antipatterns.md +1013 -0
- package/expertise/antipatterns/performance/premature-optimization.md +623 -0
- package/expertise/antipatterns/performance/scaling-antipatterns.md +785 -0
- package/expertise/antipatterns/process/ai-coding-antipatterns.md +853 -0
- package/expertise/antipatterns/process/code-review-antipatterns.md +656 -0
- package/expertise/antipatterns/process/deployment-antipatterns.md +920 -0
- package/expertise/antipatterns/process/index.md +23 -0
- package/expertise/antipatterns/process/technical-debt-antipatterns.md +647 -0
- package/expertise/antipatterns/security/index.md +20 -0
- package/expertise/antipatterns/security/secrets-antipatterns.md +849 -0
- package/expertise/antipatterns/security/security-theater.md +843 -0
- package/expertise/antipatterns/security/vulnerability-patterns.md +801 -0
- package/expertise/architecture/PROGRESS.md +70 -0
- package/expertise/architecture/data/caching-architecture.md +671 -0
- package/expertise/architecture/data/data-consistency.md +574 -0
- package/expertise/architecture/data/data-modeling.md +536 -0
- package/expertise/architecture/data/event-streams-and-queues.md +634 -0
- package/expertise/architecture/data/index.md +25 -0
- package/expertise/architecture/data/search-architecture.md +663 -0
- package/expertise/architecture/data/sql-vs-nosql.md +708 -0
- package/expertise/architecture/decisions/architecture-decision-records.md +640 -0
- package/expertise/architecture/decisions/build-vs-buy.md +616 -0
- package/expertise/architecture/decisions/index.md +23 -0
- package/expertise/architecture/decisions/monolith-to-microservices.md +790 -0
- package/expertise/architecture/decisions/technology-selection.md +616 -0
- package/expertise/architecture/distributed/cap-theorem-and-tradeoffs.md +800 -0
- package/expertise/architecture/distributed/circuit-breaker-bulkhead.md +741 -0
- package/expertise/architecture/distributed/consensus-and-coordination.md +796 -0
- package/expertise/architecture/distributed/distributed-systems-fundamentals.md +564 -0
- package/expertise/architecture/distributed/idempotency-and-retry.md +796 -0
- package/expertise/architecture/distributed/index.md +25 -0
- package/expertise/architecture/distributed/saga-pattern.md +797 -0
- package/expertise/architecture/foundations/architectural-thinking.md +460 -0
- package/expertise/architecture/foundations/coupling-and-cohesion.md +770 -0
- package/expertise/architecture/foundations/design-principles-solid.md +649 -0
- package/expertise/architecture/foundations/domain-driven-design.md +719 -0
- package/expertise/architecture/foundations/index.md +25 -0
- package/expertise/architecture/foundations/separation-of-concerns.md +472 -0
- package/expertise/architecture/foundations/twelve-factor-app.md +797 -0
- package/expertise/architecture/index.md +34 -0
- package/expertise/architecture/integration/api-design-graphql.md +638 -0
- package/expertise/architecture/integration/api-design-grpc.md +804 -0
- package/expertise/architecture/integration/api-design-rest.md +892 -0
- package/expertise/architecture/integration/index.md +25 -0
- package/expertise/architecture/integration/third-party-integration.md +795 -0
- package/expertise/architecture/integration/webhooks-and-callbacks.md +1152 -0
- package/expertise/architecture/integration/websockets-realtime.md +791 -0
- package/expertise/architecture/mobile-architecture/index.md +22 -0
- package/expertise/architecture/mobile-architecture/mobile-app-architecture.md +780 -0
- package/expertise/architecture/mobile-architecture/mobile-backend-for-frontend.md +670 -0
- package/expertise/architecture/mobile-architecture/offline-first.md +719 -0
- package/expertise/architecture/mobile-architecture/push-and-sync.md +782 -0
- package/expertise/architecture/patterns/cqrs-event-sourcing.md +717 -0
- package/expertise/architecture/patterns/event-driven.md +797 -0
- package/expertise/architecture/patterns/hexagonal-clean-architecture.md +870 -0
- package/expertise/architecture/patterns/index.md +27 -0
- package/expertise/architecture/patterns/layered-architecture.md +736 -0
- package/expertise/architecture/patterns/microservices.md +753 -0
- package/expertise/architecture/patterns/modular-monolith.md +692 -0
- package/expertise/architecture/patterns/monolith.md +626 -0
- package/expertise/architecture/patterns/plugin-architecture.md +735 -0
- package/expertise/architecture/patterns/serverless.md +780 -0
- package/expertise/architecture/scaling/database-scaling.md +615 -0
- package/expertise/architecture/scaling/feature-flags-and-rollouts.md +757 -0
- package/expertise/architecture/scaling/horizontal-vs-vertical.md +606 -0
- package/expertise/architecture/scaling/index.md +24 -0
- package/expertise/architecture/scaling/multi-tenancy.md +800 -0
- package/expertise/architecture/scaling/stateless-design.md +787 -0
- package/expertise/backend/embedded-firmware.md +625 -0
- package/expertise/backend/go.md +853 -0
- package/expertise/backend/index.md +24 -0
- package/expertise/backend/java-spring.md +448 -0
- package/expertise/backend/node-typescript.md +625 -0
- package/expertise/backend/python-fastapi.md +724 -0
- package/expertise/backend/rust.md +458 -0
- package/expertise/backend/solidity.md +711 -0
- package/expertise/composition-map.yaml +443 -0
- package/expertise/content/foundations/content-modeling.md +395 -0
- package/expertise/content/foundations/editorial-standards.md +449 -0
- package/expertise/content/foundations/index.md +24 -0
- package/expertise/content/foundations/microcopy.md +455 -0
- package/expertise/content/foundations/terminology-governance.md +509 -0
- package/expertise/content/index.md +34 -0
- package/expertise/content/patterns/accessibility-copy.md +518 -0
- package/expertise/content/patterns/index.md +24 -0
- package/expertise/content/patterns/notification-content.md +433 -0
- package/expertise/content/patterns/sample-content.md +486 -0
- package/expertise/content/patterns/state-copy.md +439 -0
- package/expertise/design/PROGRESS.md +58 -0
- package/expertise/design/disciplines/dark-mode-theming.md +577 -0
- package/expertise/design/disciplines/design-systems.md +595 -0
- package/expertise/design/disciplines/index.md +25 -0
- package/expertise/design/disciplines/information-architecture.md +800 -0
- package/expertise/design/disciplines/interaction-design.md +788 -0
- package/expertise/design/disciplines/responsive-design.md +552 -0
- package/expertise/design/disciplines/usability-testing.md +516 -0
- package/expertise/design/disciplines/user-research.md +792 -0
- package/expertise/design/foundations/accessibility-design.md +796 -0
- package/expertise/design/foundations/color-theory.md +797 -0
- package/expertise/design/foundations/iconography.md +795 -0
- package/expertise/design/foundations/index.md +26 -0
- package/expertise/design/foundations/motion-and-animation.md +653 -0
- package/expertise/design/foundations/rtl-design.md +585 -0
- package/expertise/design/foundations/spacing-and-layout.md +607 -0
- package/expertise/design/foundations/typography.md +800 -0
- package/expertise/design/foundations/visual-hierarchy.md +761 -0
- package/expertise/design/index.md +32 -0
- package/expertise/design/patterns/authentication-flows.md +474 -0
- package/expertise/design/patterns/content-consumption.md +789 -0
- package/expertise/design/patterns/data-display.md +618 -0
- package/expertise/design/patterns/e-commerce.md +1494 -0
- package/expertise/design/patterns/feedback-and-states.md +642 -0
- package/expertise/design/patterns/forms-and-input.md +819 -0
- package/expertise/design/patterns/gamification.md +801 -0
- package/expertise/design/patterns/index.md +31 -0
- package/expertise/design/patterns/microinteractions.md +449 -0
- package/expertise/design/patterns/navigation.md +800 -0
- package/expertise/design/patterns/notifications.md +705 -0
- package/expertise/design/patterns/onboarding.md +700 -0
- package/expertise/design/patterns/search-and-filter.md +601 -0
- package/expertise/design/patterns/settings-and-preferences.md +768 -0
- package/expertise/design/patterns/social-and-community.md +748 -0
- package/expertise/design/platforms/desktop-native.md +612 -0
- package/expertise/design/platforms/index.md +25 -0
- package/expertise/design/platforms/mobile-android.md +825 -0
- package/expertise/design/platforms/mobile-cross-platform.md +983 -0
- package/expertise/design/platforms/mobile-ios.md +699 -0
- package/expertise/design/platforms/tablet.md +794 -0
- package/expertise/design/platforms/web-dashboard.md +790 -0
- package/expertise/design/platforms/web-responsive.md +550 -0
- package/expertise/design/psychology/behavioral-nudges.md +449 -0
- package/expertise/design/psychology/cognitive-load.md +1191 -0
- package/expertise/design/psychology/error-psychology.md +778 -0
- package/expertise/design/psychology/index.md +22 -0
- package/expertise/design/psychology/persuasive-design.md +736 -0
- package/expertise/design/psychology/user-mental-models.md +623 -0
- package/expertise/design/tooling/open-pencil.md +266 -0
- package/expertise/frontend/angular.md +1073 -0
- package/expertise/frontend/desktop-electron.md +546 -0
- package/expertise/frontend/flutter.md +782 -0
- package/expertise/frontend/index.md +27 -0
- package/expertise/frontend/native-android.md +409 -0
- package/expertise/frontend/native-ios.md +490 -0
- package/expertise/frontend/react-native.md +1160 -0
- package/expertise/frontend/react.md +808 -0
- package/expertise/frontend/vue.md +1089 -0
- package/expertise/humanize/domain-rules-code.md +79 -0
- package/expertise/humanize/domain-rules-content.md +67 -0
- package/expertise/humanize/domain-rules-technical-docs.md +56 -0
- package/expertise/humanize/index.md +35 -0
- package/expertise/humanize/self-audit-checklist.md +87 -0
- package/expertise/humanize/sentence-patterns.md +218 -0
- package/expertise/humanize/vocabulary-blacklist.md +105 -0
- package/expertise/i18n/PROGRESS.md +65 -0
- package/expertise/i18n/advanced/accessibility-and-i18n.md +28 -0
- package/expertise/i18n/advanced/bidirectional-text-algorithm.md +38 -0
- package/expertise/i18n/advanced/complex-scripts.md +30 -0
- package/expertise/i18n/advanced/performance-and-i18n.md +27 -0
- package/expertise/i18n/advanced/testing-i18n.md +28 -0
- package/expertise/i18n/content/content-adaptation.md +23 -0
- package/expertise/i18n/content/locale-specific-formatting.md +23 -0
- package/expertise/i18n/content/machine-translation-integration.md +28 -0
- package/expertise/i18n/content/translation-management.md +29 -0
- package/expertise/i18n/foundations/date-time-calendars.md +67 -0
- package/expertise/i18n/foundations/i18n-architecture.md +272 -0
- package/expertise/i18n/foundations/locale-and-language-tags.md +79 -0
- package/expertise/i18n/foundations/numbers-currency-units.md +61 -0
- package/expertise/i18n/foundations/pluralization-and-gender.md +109 -0
- package/expertise/i18n/foundations/string-externalization.md +236 -0
- package/expertise/i18n/foundations/text-direction-bidi.md +241 -0
- package/expertise/i18n/foundations/unicode-and-encoding.md +86 -0
- package/expertise/i18n/index.md +38 -0
- package/expertise/i18n/platform/backend-i18n.md +31 -0
- package/expertise/i18n/platform/flutter-i18n.md +148 -0
- package/expertise/i18n/platform/native-android-i18n.md +36 -0
- package/expertise/i18n/platform/native-ios-i18n.md +36 -0
- package/expertise/i18n/platform/react-i18n.md +103 -0
- package/expertise/i18n/platform/web-css-i18n.md +81 -0
- package/expertise/i18n/rtl/arabic-specific.md +175 -0
- package/expertise/i18n/rtl/hebrew-specific.md +149 -0
- package/expertise/i18n/rtl/rtl-animations-and-transitions.md +111 -0
- package/expertise/i18n/rtl/rtl-forms-and-input.md +161 -0
- package/expertise/i18n/rtl/rtl-fundamentals.md +211 -0
- package/expertise/i18n/rtl/rtl-icons-and-images.md +181 -0
- package/expertise/i18n/rtl/rtl-layout-mirroring.md +252 -0
- package/expertise/i18n/rtl/rtl-navigation-and-gestures.md +107 -0
- package/expertise/i18n/rtl/rtl-testing-and-qa.md +147 -0
- package/expertise/i18n/rtl/rtl-typography.md +160 -0
- package/expertise/index.md +113 -0
- package/expertise/index.yaml +216 -0
- package/expertise/infrastructure/cloud-aws.md +597 -0
- package/expertise/infrastructure/cloud-gcp.md +599 -0
- package/expertise/infrastructure/cybersecurity.md +816 -0
- package/expertise/infrastructure/database-mongodb.md +447 -0
- package/expertise/infrastructure/database-postgres.md +400 -0
- package/expertise/infrastructure/devops-cicd.md +787 -0
- package/expertise/infrastructure/index.md +27 -0
- package/expertise/performance/PROGRESS.md +50 -0
- package/expertise/performance/backend/api-latency.md +1204 -0
- package/expertise/performance/backend/background-jobs.md +506 -0
- package/expertise/performance/backend/connection-pooling.md +1209 -0
- package/expertise/performance/backend/database-query-optimization.md +515 -0
- package/expertise/performance/backend/index.md +23 -0
- package/expertise/performance/backend/rate-limiting-and-throttling.md +971 -0
- package/expertise/performance/foundations/algorithmic-complexity.md +954 -0
- package/expertise/performance/foundations/caching-strategies.md +489 -0
- package/expertise/performance/foundations/concurrency-and-parallelism.md +847 -0
- package/expertise/performance/foundations/index.md +24 -0
- package/expertise/performance/foundations/measuring-and-profiling.md +440 -0
- package/expertise/performance/foundations/memory-management.md +964 -0
- package/expertise/performance/foundations/performance-budgets.md +1314 -0
- package/expertise/performance/index.md +31 -0
- package/expertise/performance/infrastructure/auto-scaling.md +1059 -0
- package/expertise/performance/infrastructure/cdn-and-edge.md +1081 -0
- package/expertise/performance/infrastructure/index.md +22 -0
- package/expertise/performance/infrastructure/load-balancing.md +1081 -0
- package/expertise/performance/infrastructure/observability.md +1079 -0
- package/expertise/performance/mobile/index.md +23 -0
- package/expertise/performance/mobile/mobile-animations.md +544 -0
- package/expertise/performance/mobile/mobile-memory-battery.md +416 -0
- package/expertise/performance/mobile/mobile-network.md +452 -0
- package/expertise/performance/mobile/mobile-rendering.md +599 -0
- package/expertise/performance/mobile/mobile-startup-time.md +505 -0
- package/expertise/performance/platform-specific/flutter-performance.md +647 -0
- package/expertise/performance/platform-specific/index.md +22 -0
- package/expertise/performance/platform-specific/node-performance.md +1307 -0
- package/expertise/performance/platform-specific/postgres-performance.md +1366 -0
- package/expertise/performance/platform-specific/react-performance.md +1403 -0
- package/expertise/performance/web/bundle-optimization.md +1239 -0
- package/expertise/performance/web/image-and-media.md +636 -0
- package/expertise/performance/web/index.md +24 -0
- package/expertise/performance/web/network-optimization.md +1133 -0
- package/expertise/performance/web/rendering-performance.md +1098 -0
- package/expertise/performance/web/ssr-and-hydration.md +918 -0
- package/expertise/performance/web/web-vitals.md +1374 -0
- package/expertise/quality/accessibility.md +985 -0
- package/expertise/quality/evidence-based-verification.md +499 -0
- package/expertise/quality/index.md +24 -0
- package/expertise/quality/ml-model-audit.md +614 -0
- package/expertise/quality/performance.md +600 -0
- package/expertise/quality/testing-api.md +891 -0
- package/expertise/quality/testing-mobile.md +496 -0
- package/expertise/quality/testing-web.md +849 -0
- package/expertise/security/PROGRESS.md +54 -0
- package/expertise/security/agentic-identity.md +540 -0
- package/expertise/security/compliance-frameworks.md +601 -0
- package/expertise/security/data/data-encryption.md +364 -0
- package/expertise/security/data/data-privacy-gdpr.md +692 -0
- package/expertise/security/data/database-security.md +1171 -0
- package/expertise/security/data/index.md +22 -0
- package/expertise/security/data/pii-handling.md +531 -0
- package/expertise/security/foundations/authentication.md +1041 -0
- package/expertise/security/foundations/authorization.md +603 -0
- package/expertise/security/foundations/cryptography.md +1001 -0
- package/expertise/security/foundations/index.md +25 -0
- package/expertise/security/foundations/owasp-top-10.md +1354 -0
- package/expertise/security/foundations/secrets-management.md +1217 -0
- package/expertise/security/foundations/secure-sdlc.md +700 -0
- package/expertise/security/foundations/supply-chain-security.md +698 -0
- package/expertise/security/index.md +31 -0
- package/expertise/security/infrastructure/cloud-security-aws.md +1296 -0
- package/expertise/security/infrastructure/cloud-security-gcp.md +1376 -0
- package/expertise/security/infrastructure/container-security.md +721 -0
- package/expertise/security/infrastructure/incident-response.md +1295 -0
- package/expertise/security/infrastructure/index.md +24 -0
- package/expertise/security/infrastructure/logging-and-monitoring.md +1618 -0
- package/expertise/security/infrastructure/network-security.md +1337 -0
- package/expertise/security/mobile/index.md +23 -0
- package/expertise/security/mobile/mobile-android-security.md +1218 -0
- package/expertise/security/mobile/mobile-binary-protection.md +1229 -0
- package/expertise/security/mobile/mobile-data-storage.md +1265 -0
- package/expertise/security/mobile/mobile-ios-security.md +1401 -0
- package/expertise/security/mobile/mobile-network-security.md +1520 -0
- package/expertise/security/smart-contract-security.md +594 -0
- package/expertise/security/testing/index.md +22 -0
- package/expertise/security/testing/penetration-testing.md +1258 -0
- package/expertise/security/testing/security-code-review.md +1765 -0
- package/expertise/security/testing/threat-modeling.md +1074 -0
- package/expertise/security/testing/vulnerability-scanning.md +1062 -0
- package/expertise/security/web/api-security.md +586 -0
- package/expertise/security/web/cors-and-headers.md +433 -0
- package/expertise/security/web/csrf.md +562 -0
- package/expertise/security/web/file-upload.md +1477 -0
- package/expertise/security/web/index.md +25 -0
- package/expertise/security/web/injection.md +1375 -0
- package/expertise/security/web/session-management.md +1101 -0
- package/expertise/security/web/xss.md +1158 -0
- package/exports/README.md +17 -0
- package/exports/hosts/claude/.claude/agents/clarifier.md +42 -0
- package/exports/hosts/claude/.claude/agents/content-author.md +63 -0
- package/exports/hosts/claude/.claude/agents/designer.md +55 -0
- package/exports/hosts/claude/.claude/agents/executor.md +55 -0
- package/exports/hosts/claude/.claude/agents/learner.md +51 -0
- package/exports/hosts/claude/.claude/agents/planner.md +53 -0
- package/exports/hosts/claude/.claude/agents/researcher.md +43 -0
- package/exports/hosts/claude/.claude/agents/reviewer.md +54 -0
- package/exports/hosts/claude/.claude/agents/specifier.md +47 -0
- package/exports/hosts/claude/.claude/agents/verifier.md +71 -0
- package/exports/hosts/claude/.claude/commands/author.md +42 -0
- package/exports/hosts/claude/.claude/commands/clarify.md +38 -0
- package/exports/hosts/claude/.claude/commands/design-review.md +46 -0
- package/exports/hosts/claude/.claude/commands/design.md +44 -0
- package/exports/hosts/claude/.claude/commands/discover.md +37 -0
- package/exports/hosts/claude/.claude/commands/execute.md +48 -0
- package/exports/hosts/claude/.claude/commands/learn.md +38 -0
- package/exports/hosts/claude/.claude/commands/plan-review.md +42 -0
- package/exports/hosts/claude/.claude/commands/plan.md +39 -0
- package/exports/hosts/claude/.claude/commands/prepare-next.md +37 -0
- package/exports/hosts/claude/.claude/commands/review.md +40 -0
- package/exports/hosts/claude/.claude/commands/run-audit.md +41 -0
- package/exports/hosts/claude/.claude/commands/spec-challenge.md +41 -0
- package/exports/hosts/claude/.claude/commands/specify.md +38 -0
- package/exports/hosts/claude/.claude/commands/verify.md +37 -0
- package/exports/hosts/claude/.claude/settings.json +34 -0
- package/exports/hosts/claude/CLAUDE.md +19 -0
- package/exports/hosts/claude/export.manifest.json +38 -0
- package/exports/hosts/claude/host-package.json +67 -0
- package/exports/hosts/codex/AGENTS.md +19 -0
- package/exports/hosts/codex/export.manifest.json +38 -0
- package/exports/hosts/codex/host-package.json +41 -0
- package/exports/hosts/cursor/.cursor/hooks.json +16 -0
- package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +19 -0
- package/exports/hosts/cursor/export.manifest.json +38 -0
- package/exports/hosts/cursor/host-package.json +42 -0
- package/exports/hosts/gemini/GEMINI.md +19 -0
- package/exports/hosts/gemini/export.manifest.json +38 -0
- package/exports/hosts/gemini/host-package.json +41 -0
- package/hooks/README.md +18 -0
- package/hooks/definitions/loop_cap_guard.yaml +21 -0
- package/hooks/definitions/post_tool_capture.yaml +24 -0
- package/hooks/definitions/pre_compact_summary.yaml +19 -0
- package/hooks/definitions/pre_tool_capture_route.yaml +19 -0
- package/hooks/definitions/protected_path_write_guard.yaml +19 -0
- package/hooks/definitions/session_start.yaml +19 -0
- package/hooks/definitions/stop_handoff_harvest.yaml +20 -0
- package/hooks/loop-cap-guard +17 -0
- package/hooks/post-tool-lint +36 -0
- package/hooks/protected-path-write-guard +17 -0
- package/hooks/session-start +41 -0
- package/llms-full.txt +2355 -0
- package/llms.txt +43 -0
- package/package.json +79 -0
- package/roles/README.md +20 -0
- package/roles/clarifier.md +42 -0
- package/roles/content-author.md +63 -0
- package/roles/designer.md +55 -0
- package/roles/executor.md +55 -0
- package/roles/learner.md +51 -0
- package/roles/planner.md +53 -0
- package/roles/researcher.md +43 -0
- package/roles/reviewer.md +54 -0
- package/roles/specifier.md +47 -0
- package/roles/verifier.md +71 -0
- package/schemas/README.md +24 -0
- package/schemas/accepted-learning.schema.json +20 -0
- package/schemas/author-artifact.schema.json +156 -0
- package/schemas/clarification.schema.json +19 -0
- package/schemas/design-artifact.schema.json +80 -0
- package/schemas/docs-claim.schema.json +18 -0
- package/schemas/export-manifest.schema.json +20 -0
- package/schemas/hook.schema.json +67 -0
- package/schemas/host-export-package.schema.json +18 -0
- package/schemas/implementation-plan.schema.json +19 -0
- package/schemas/proposed-learning.schema.json +19 -0
- package/schemas/research.schema.json +18 -0
- package/schemas/review.schema.json +29 -0
- package/schemas/run-manifest.schema.json +18 -0
- package/schemas/spec-challenge.schema.json +18 -0
- package/schemas/spec.schema.json +20 -0
- package/schemas/usage.schema.json +102 -0
- package/schemas/verification-proof.schema.json +29 -0
- package/schemas/wazir-manifest.schema.json +173 -0
- package/skills/README.md +40 -0
- package/skills/brainstorming/SKILL.md +77 -0
- package/skills/debugging/SKILL.md +50 -0
- package/skills/design/SKILL.md +61 -0
- package/skills/dispatching-parallel-agents/SKILL.md +128 -0
- package/skills/executing-plans/SKILL.md +70 -0
- package/skills/finishing-a-development-branch/SKILL.md +169 -0
- package/skills/humanize/SKILL.md +123 -0
- package/skills/init-pipeline/SKILL.md +124 -0
- package/skills/prepare-next/SKILL.md +20 -0
- package/skills/receiving-code-review/SKILL.md +123 -0
- package/skills/requesting-code-review/SKILL.md +105 -0
- package/skills/requesting-code-review/code-reviewer.md +108 -0
- package/skills/run-audit/SKILL.md +197 -0
- package/skills/scan-project/SKILL.md +41 -0
- package/skills/self-audit/SKILL.md +153 -0
- package/skills/subagent-driven-development/SKILL.md +154 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
- package/skills/subagent-driven-development/implementer-prompt.md +102 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
- package/skills/tdd/SKILL.md +23 -0
- package/skills/using-git-worktrees/SKILL.md +163 -0
- package/skills/using-skills/SKILL.md +95 -0
- package/skills/verification/SKILL.md +22 -0
- package/skills/wazir/SKILL.md +463 -0
- package/skills/writing-plans/SKILL.md +30 -0
- package/skills/writing-skills/SKILL.md +157 -0
- package/skills/writing-skills/anthropic-best-practices.md +122 -0
- package/skills/writing-skills/persuasion-principles.md +50 -0
- package/templates/README.md +20 -0
- package/templates/artifacts/README.md +10 -0
- package/templates/artifacts/accepted-learning.md +19 -0
- package/templates/artifacts/accepted-learning.template.json +12 -0
- package/templates/artifacts/author.md +74 -0
- package/templates/artifacts/author.template.json +19 -0
- package/templates/artifacts/clarification.md +21 -0
- package/templates/artifacts/clarification.template.json +12 -0
- package/templates/artifacts/execute-notes.md +19 -0
- package/templates/artifacts/implementation-plan.md +21 -0
- package/templates/artifacts/implementation-plan.template.json +11 -0
- package/templates/artifacts/learning-proposal.md +19 -0
- package/templates/artifacts/next-run-handoff.md +21 -0
- package/templates/artifacts/plan-review.md +19 -0
- package/templates/artifacts/proposed-learning.template.json +12 -0
- package/templates/artifacts/research.md +21 -0
- package/templates/artifacts/research.template.json +12 -0
- package/templates/artifacts/review-findings.md +19 -0
- package/templates/artifacts/review.template.json +11 -0
- package/templates/artifacts/run-manifest.template.json +8 -0
- package/templates/artifacts/spec-challenge.md +19 -0
- package/templates/artifacts/spec-challenge.template.json +11 -0
- package/templates/artifacts/spec.md +21 -0
- package/templates/artifacts/spec.template.json +12 -0
- package/templates/artifacts/verification-proof.md +19 -0
- package/templates/artifacts/verification-proof.template.json +11 -0
- package/templates/examples/accepted-learning.example.json +14 -0
- package/templates/examples/author.example.json +152 -0
- package/templates/examples/clarification.example.json +15 -0
- package/templates/examples/docs-claim.example.json +8 -0
- package/templates/examples/export-manifest.example.json +7 -0
- package/templates/examples/host-export-package.example.json +11 -0
- package/templates/examples/implementation-plan.example.json +17 -0
- package/templates/examples/proposed-learning.example.json +13 -0
- package/templates/examples/research.example.json +15 -0
- package/templates/examples/research.example.md +6 -0
- package/templates/examples/review.example.json +17 -0
- package/templates/examples/run-manifest.example.json +9 -0
- package/templates/examples/spec-challenge.example.json +14 -0
- package/templates/examples/spec.example.json +21 -0
- package/templates/examples/verification-proof.example.json +21 -0
- package/templates/examples/wazir-manifest.example.yaml +65 -0
- package/templates/task-definition-schema.md +99 -0
- package/tooling/README.md +20 -0
- package/tooling/src/adapters/context-mode.js +50 -0
- package/tooling/src/capture/command.js +376 -0
- package/tooling/src/capture/store.js +99 -0
- package/tooling/src/capture/usage.js +270 -0
- package/tooling/src/checks/branches.js +50 -0
- package/tooling/src/checks/brand-truth.js +110 -0
- package/tooling/src/checks/changelog.js +231 -0
- package/tooling/src/checks/command-registry.js +36 -0
- package/tooling/src/checks/commits.js +102 -0
- package/tooling/src/checks/docs-drift.js +103 -0
- package/tooling/src/checks/docs-truth.js +201 -0
- package/tooling/src/checks/runtime-surface.js +156 -0
- package/tooling/src/cli.js +116 -0
- package/tooling/src/command-options.js +56 -0
- package/tooling/src/commands/validate.js +320 -0
- package/tooling/src/doctor/command.js +91 -0
- package/tooling/src/export/command.js +77 -0
- package/tooling/src/export/compiler.js +498 -0
- package/tooling/src/guards/loop-cap-guard.js +52 -0
- package/tooling/src/guards/protected-path-write-guard.js +67 -0
- package/tooling/src/index/command.js +152 -0
- package/tooling/src/index/storage.js +1061 -0
- package/tooling/src/index/summarizers.js +261 -0
- package/tooling/src/loaders.js +18 -0
- package/tooling/src/project-root.js +22 -0
- package/tooling/src/recall/command.js +225 -0
- package/tooling/src/schema-validator.js +30 -0
- package/tooling/src/state-root.js +40 -0
- package/tooling/src/status/command.js +71 -0
- package/wazir.manifest.yaml +135 -0
- package/workflows/README.md +19 -0
- package/workflows/author.md +42 -0
- package/workflows/clarify.md +38 -0
- package/workflows/design-review.md +46 -0
- package/workflows/design.md +44 -0
- package/workflows/discover.md +37 -0
- package/workflows/execute.md +48 -0
- package/workflows/learn.md +38 -0
- package/workflows/plan-review.md +42 -0
- package/workflows/plan.md +39 -0
- package/workflows/prepare-next.md +37 -0
- package/workflows/review.md +40 -0
- package/workflows/run-audit.md +41 -0
- package/workflows/spec-challenge.md +41 -0
- package/workflows/specify.md +38 -0
- package/workflows/verify.md +37 -0
|
@@ -0,0 +1,796 @@
|
|
|
1
|
+
# Idempotency and Retry — Architecture Expertise Module
|
|
2
|
+
|
|
3
|
+
> Idempotency ensures that performing an operation multiple times produces the same result as performing it once. Retry logic automatically re-attempts failed operations. Together, they are the foundation of reliability in distributed systems — without idempotency, retries cause duplicate operations; without retries, transient failures become permanent.
|
|
4
|
+
|
|
5
|
+
> **Category:** Distributed
|
|
6
|
+
> **Complexity:** Moderate
|
|
7
|
+
> **Applies when:** Any system where operations can fail and be retried — API calls, message processing, payment processing, database operations across network boundaries
|
|
8
|
+
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## What This Is (and What It Isn't)
|
|
12
|
+
|
|
13
|
+
### Idempotency: f(x) = f(f(x))
|
|
14
|
+
|
|
15
|
+
The mathematical definition is precise: an operation is idempotent if applying it multiple times yields the same result as applying it once. In distributed systems, the definition broadens slightly: an operation is idempotent if repeating it produces the same **observable side effects** as executing it once. The response body may differ (e.g., returning the already-created resource on a second call), but the system state must not change after the first successful execution.
|
|
16
|
+
|
|
17
|
+
This is not an abstract concern. In a distributed system, the network sits between every caller and callee. A request can succeed on the server but the response can be lost in transit. The client cannot distinguish "the server never received my request" from "the server processed my request but I never got the response." Without idempotency, the only safe option is to not retry — which means accepting permanent failure from transient network issues.
|
|
18
|
+
|
|
19
|
+
**Natural idempotency** — some operations are inherently idempotent without any special design. HTTP GET returns the same resource regardless of how many times you call it. HTTP PUT replaces a resource with the provided state — calling it twice with the same body produces the same result. HTTP DELETE removes a resource — calling it on an already-deleted resource is a no-op (or returns 404, which is still idempotent in terms of system state). SQL `UPDATE users SET email = 'x@y.com' WHERE id = 42` is naturally idempotent. `INSERT ... ON CONFLICT DO NOTHING` is naturally idempotent. Setting a value is idempotent; incrementing a value is not.
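The difference between setting and incrementing can be seen in a few lines. The following is a minimal Python sketch; the `set_email` and `increment_logins` helpers and the `user` dictionary are illustrative assumptions, not part of any real API.

```python
def set_email(user: dict, email: str) -> dict:
    """Assignment: the result depends only on the argument, not on prior state."""
    return {**user, "email": email}


def increment_logins(user: dict) -> dict:
    """Increment: the result depends on prior state, so repeats accumulate."""
    return {**user, "logins": user["logins"] + 1}


user = {"id": 42, "email": "old@y.com", "logins": 0}

# Applying the assignment twice leaves the same state as applying it once.
assert set_email(set_email(user, "x@y.com"), "x@y.com") == set_email(user, "x@y.com")

# Applying the increment twice does not: a retried increment is a double count.
assert increment_logins(increment_logins(user)) != increment_logins(user)
```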
|
|
20
|
+
|
|
21
|
+
**Designed idempotency** — most real-world operations are not naturally idempotent. HTTP POST creates a new resource — calling it twice creates two resources. `INSERT INTO orders (...)` creates a duplicate row. `balance = balance - 100` deducts twice. Sending a notification email is not idempotent. Calling an external payment API is not idempotent. These operations require explicit idempotency design: an idempotency key mechanism that allows the server to detect and deduplicate repeated requests.
|
|
22
|
+
|
|
23
|
+
### Idempotency Keys
|
|
24
|
+
|
|
25
|
+
The idempotency key pattern, popularized by Stripe and now widely adopted (an IETF Internet-Draft specifies an `Idempotency-Key` HTTP header field), works as follows:
|
|
26
|
+
|
|
27
|
+
1. The client generates a unique key for each logical operation (typically a UUID v4 or a deterministic hash of the operation's business parameters).
|
|
28
|
+
2. The client sends the key with the request (usually as a header: `Idempotency-Key: <value>`).
|
|
29
|
+
3. The server receives the request and checks whether this key has been seen before.
|
|
30
|
+
4. If the key is new: process the request, store the key alongside the result, return the response.
|
|
31
|
+
5. If the key exists: skip processing, return the stored response from the first execution.
|
|
32
|
+
6. If the key exists but with different parameters: reject the request with a 422 error — this prevents accidental misuse where a client reuses a key for a different operation.
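The server-side half of these steps (3 through 6) can be sketched as follows. This is a minimal in-memory illustration under stated assumptions: `IdempotentHandler`, `StoredResult`, and the `process` callback are hypothetical names, the 201 status is arbitrary, and a real store would need to be durable, shared between instances, and safe under concurrent requests.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class StoredResult:
    request_hash: str
    status_code: int
    body: dict


class IdempotentHandler:
    """In-memory sketch only; not safe for concurrent or multi-process use."""

    def __init__(self, process):
        self._process = process  # hypothetical business operation: dict -> dict
        self._store: dict[str, StoredResult] = {}
        self.executions = 0  # counts real executions, for demonstration

    @staticmethod
    def _fingerprint(params: dict) -> str:
        # Canonical JSON so that key order does not change the hash.
        canonical = json.dumps(params, sort_keys=True).encode()
        return hashlib.sha256(canonical).hexdigest()

    def handle(self, idempotency_key: str, params: dict) -> tuple[int, dict]:
        fingerprint = self._fingerprint(params)
        stored = self._store.get(idempotency_key)
        if stored is not None:
            if stored.request_hash != fingerprint:
                # Step 6: same key, different parameters.
                return 422, {"error": "idempotency key reused with different parameters"}
            # Step 5: replay the stored response without re-executing.
            return stored.status_code, stored.body
        # Step 4: new key, so execute once and remember the result.
        self.executions += 1
        body = self._process(params)
        self._store[idempotency_key] = StoredResult(fingerprint, 201, body)
        return 201, body
```

Calling `handle` twice with the same key and parameters returns the same response while `executions` stays at 1; reusing the key with different parameters yields the 422 rejection.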
|
|
33
|
+
|
|
34
|
+
The key insight is that idempotency keys shift the deduplication responsibility to the server, which is the only component with authoritative knowledge of what has already been processed.
|
|
35
|
+
|
|
36
|
+
### Retry: Not "Just Try Again"
|
|
37
|
+
|
|
38
|
+
Retry is the automatic re-attempt of a failed operation. But naive retry — immediately repeating a failed call — is one of the most dangerous patterns in distributed systems. Retry requires careful design across multiple dimensions:
|
|
39
|
+
|
|
40
|
+
- **Which failures are retryable?** A 500 Internal Server Error is retryable. A 400 Bad Request is not — sending the same malformed request again will produce the same error. A 429 Too Many Requests is retryable, but only after respecting the `Retry-After` header. A network timeout is retryable. A connection refused might be retryable (the server may be restarting) or not (the server may be permanently down).
|
|
41
|
+
- **How many times?** Unbounded retries can run forever, consuming resources and amplifying load on a struggling service. A cap of 3–5 attempts per request is typical.
|
|
42
|
+
- **How long to wait between attempts?** Immediate retry floods the server. Fixed-interval retry synchronizes waves of retries from multiple clients. Exponential backoff increases wait times but can still synchronize. Exponential backoff with jitter is the current best practice.
|
|
43
|
+
- **What happens when all retries are exhausted?** The operation must fail gracefully — dead letter queues, compensation logic, alerting, or surfacing the failure to the user for manual intervention.
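The four questions above map onto a small retry loop. The sketch below is one possible shape, not a definitive implementation: `call_with_retry`, `RetriesExhausted`, and the `send` callable (returning a status code and an optional `Retry-After` value in seconds) are assumptions made for illustration, and the caller is expected to handle `RetriesExhausted` by dead-lettering, compensating, or alerting.

```python
import random
import time

# Status codes treated as transient in this sketch.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


class RetriesExhausted(Exception):
    """Raised when every attempt failed with a retryable error."""


def call_with_retry(send, max_attempts=4, base_delay=0.5, cap=8.0, sleep=time.sleep):
    """Call send() until it succeeds, fails permanently, or attempts run out.

    send is a hypothetical callable returning (status_code, retry_after_seconds
    or None); it may raise ConnectionError or TimeoutError on network failure.
    """
    for attempt in range(max_attempts):
        try:
            status, retry_after = send()
        except (ConnectionError, TimeoutError):
            status, retry_after = None, None  # network-level failure: retryable
        if status is not None and status < 400:
            return status
        if status is not None and status not in RETRYABLE_STATUS:
            raise ValueError(f"non-retryable status {status}")
        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter, never shorter than Retry-After.
            delay = random.uniform(0, min(cap, base_delay * 2 ** attempt))
            sleep(max(delay, retry_after or 0.0))
    raise RetriesExhausted(f"gave up after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. The operation behind `send` must itself be idempotent for this loop to be safe.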
|
|
44
|
+
|
|
45
|
+
### What This Is Not
|
|
46
|
+
|
|
47
|
+
**Not a substitute for fixing root causes.** If an API returns errors 50% of the time, adding retries does not solve the problem — it masks it while doubling the load. Retry is for transient failures (network blips, brief overloads, temporary unavailability during deployments), not for systematic failures.
|
|
48
|
+
|
|
49
|
+
**Not free.** Every retry consumes resources on both client and server. Every idempotency key consumes storage. Every deduplication check adds latency. These costs are justified by the reliability gains, but they are real and must be budgeted.
|
|
50
|
+
|
|
51
|
+
**Not a replacement for transactions.** Idempotency ensures an operation is not duplicated. It does not ensure that a multi-step operation is atomic. If step 1 succeeds and step 2 fails, idempotency on step 1 prevents it from being duplicated on retry, but it does not roll back step 1. For multi-step atomicity, you need sagas or distributed transactions in addition to idempotency.
|
|
52
|
+
|
|
53
|
+
---
|
|
54
|
+
|
|
55
|
+
## When to Use It
|
|
56
|
+
|
|
57
|
+
### Idempotency: The Universal Requirement
|
|
58
|
+
|
|
59
|
+
**All external API calls.** Any call that crosses a network boundary can fail ambiguously. The caller may not know whether the server received and processed the request. Without idempotency, retrying risks duplicate processing. This applies to outgoing calls (your service calling Stripe, Twilio, AWS) and incoming calls (clients calling your API).
|
|
60
|
+
|
|
61
|
+
**Payment processing: the canonical example.** Stripe's idempotency key implementation is the usual reference point because the consequences of non-idempotent payments are severe. A customer is charged $500. The response is lost due to a network timeout. The client retries. Without idempotency, the customer is charged $1,000. Stripe addresses this by accepting an `Idempotency-Key` header on POST requests: the same key returns the same charge result, regardless of how many times it is sent. Stripe retains idempotency keys for at least 24 hours, which is long enough to cover any reasonable retry window but short enough to avoid unbounded storage growth. Brandur Leach, then at Stripe, published a detailed design showing how to implement Stripe-like idempotency keys in Postgres using an `idempotency_keys` table with atomic phases tracked within each key's lifecycle.
|
|
62
|
+
|
|
63
|
+
**Message queue consumers.** The major message brokers (Kafka, RabbitMQ, SQS, Pub/Sub) are typically run with at-least-once delivery, meaning messages can be delivered more than once. Consumer-side idempotency is required to prevent duplicate processing. The message ID or a business-domain deduplication key serves as the idempotency key. Without this, every message redelivery (caused by consumer crashes, network partitions, or broker failovers) produces duplicate side effects.
|
|
64
|
+
|
|
65
|
+
**Webhook handlers.** Webhook providers (Stripe, GitHub, Twilio, Shopify) explicitly document that webhooks may be delivered multiple times. Your webhook handler must be idempotent. The webhook event ID serves as the natural idempotency key. Store processed event IDs and skip duplicates.
|
|
66
|
+
|
|
67
|
+
**Database operations across service boundaries.** When Service A calls Service B to create a record, and the response is lost, Service A retrying should not create a duplicate record in Service B. Service B needs an idempotency mechanism — either a natural unique constraint (email, order reference number) or an explicit idempotency key.
|
|
68
|
+
|
|
69
|
+
**Scheduled jobs and cron tasks.** A daily billing job that runs at midnight can be accidentally triggered twice (clock skew, overlapping runs, manual trigger during debugging). Each execution must be idempotent — processing only unbilled items, not rebilling already-billed ones.
|
|
70
|
+
|
|
71
|
+
### Retry: The Resilience Requirement
|
|
72
|
+
|
|
73
|
+
**Transient network failures.** DNS resolution failures, TCP connection resets, TLS handshake timeouts — these are temporary by nature. A retry after 1–2 seconds almost always succeeds. Not retrying means a single dropped packet causes a permanent failure.
|
|
74
|
+
|
|
75
|
+
**Service deployments and rolling restarts.** During a rolling deployment, some instances are temporarily unavailable. Requests routed to a terminating instance get connection refused or 503 errors. A retry routed to a healthy instance succeeds immediately. Without retry, every deployment causes visible errors.
|
|
76
|
+
|
|
77
|
+
**Cloud infrastructure transient errors.** AWS, GCP, and Azure all document transient error rates as expected behavior. AWS SDK retry behavior defaults to 3 retries with exponential backoff for this reason. The AWS Builders' Library explicitly states: "We use timeouts and retries to make the inevitable failures invisible to customers."
|
|
78
|
+
|
|
79
|
+
**Rate limiting responses.** A 429 response with a `Retry-After` header is an explicit invitation to retry. Respecting the header and retrying after the specified delay is the correct behavior — not retrying means accepting failure when the server has told you exactly when to try again.
|
|
80
|
+
|
|
81
|
+
---
|
|
82
|
+
|
|
83
|
+
## When NOT to Use It
|
|
84
|
+
|
|
85
|
+
This section is equally important. Idempotency and retry, applied incorrectly, cause some of the most devastating production incidents in distributed systems.
|
|
86
|
+
|
|
87
|
+
### Retries Without Idempotency: The Double-Charge Problem
|
|
88
|
+
|
|
89
|
+
The single most dangerous mistake is adding retry logic to a non-idempotent operation. Consider a payment API that is not idempotent:
|
|
90
|
+
|
|
91
|
+
1. Client sends `POST /charge` with `amount: $100`.
|
|
92
|
+
2. Server processes the charge successfully. Customer is billed $100.
|
|
93
|
+
3. Response is lost (network timeout).
|
|
94
|
+
4. Client retries `POST /charge` with `amount: $100`.
|
|
95
|
+
5. Server processes the charge again. Customer is billed another $100.
|
|
96
|
+
|
|
97
|
+
The customer has been charged $200 for a $100 purchase. This is not a hypothetical scenario — it is one of the most common bugs in payment systems. The same pattern applies to any side-effecting operation: sending duplicate emails, creating duplicate orders, posting duplicate messages, triggering duplicate shipments.
|
|
98
|
+
|
|
99
|
+
**Rule: Never add retry logic to an operation unless that operation is idempotent.** If you cannot make the operation idempotent, do not retry it — fail loudly and handle the failure through compensation or manual intervention.
|
|
100
|
+
|
|
101
|
+
### Retry Storms: Retries That Kill Services
|
|
102
|
+
|
|
103
|
+
A retry storm occurs when multiple clients simultaneously retry requests against a service that is already struggling, amplifying the load and turning a partial failure into a complete outage. This is one of the most common causes of cascading failures in microservices architectures.
|
|
104
|
+
|
|
105
|
+
**The Yandex Go Incident.** A new release of the order service introduced widespread errors. CPU usage across many services reached 100%. The release was rolled back within 10 minutes, but the system remained down for an entire hour. Why? Every upstream service was retrying failed requests. The order service recovered, but immediately received a flood of retried requests from every client that had accumulated retries during the 10-minute outage. Each retry wave triggered more failures, which triggered more retries. The system only stabilized after manually disabling retries across all upstream services.
|
|
106
|
+
|
|
107
|
+
**The AWS/ECS Cascading Retry Incident.** A single frustrated user tapped "Try Again" four times in quick succession on a mobile app. The load balancer was configured to retry all 503 responses five times. The original request plus the four taps made 5 user requests, and the load balancer turned each one into 6 attempts (1 original plus 5 retries), so a single user generated 5 × 6 = 30 backend requests. Multiply this by thousands of users experiencing the same error, and the backend was overwhelmed by retry traffic that dwarfed the original load.
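The amplification in this incident is plain multiplication, which a short sketch makes explicit. The `backend_requests` helper is a hypothetical name introduced here; the numbers are the ones from the incident description.

```python
def backend_requests(user_requests: int, lb_retries: int) -> int:
    """Each user request becomes one original attempt plus lb_retries retries."""
    attempts_per_request = 1 + lb_retries
    return user_requests * attempts_per_request


# One initial request plus four "Try Again" taps, each retried five times.
total = backend_requests(user_requests=1 + 4, lb_retries=5)
print(total)  # 30
```

Every additional layer that retries independently multiplies the total again, which is why retries are best owned by a single layer.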
|
|
108
|
+
|
|
109
|
+
**The Agoda Production Incident.** Agoda implemented a 5% retry budget at the service level. During a production incident, they discovered that heavy endpoints were consuming up to 35% of the retry budget, starving lightweight health-check and metadata endpoints of their retry capacity. The uniform budget did not account for the uneven distribution of retry costs across endpoints.
|
|
110
|
+
|
|
111
|
+
### Exponential Backoff Without Jitter: Synchronized Retry Waves
|
|
112
|
+
|
|
113
|
+
Exponential backoff alone is insufficient. If 1,000 clients all fail at the same moment and all use exponential backoff with the same base and multiplier, they all retry at exactly the same times: 1 second, 2 seconds, 4 seconds, 8 seconds. The retries are spaced further apart but remain perfectly synchronized, creating periodic spikes that can overwhelm the recovering service.
|
|
114
|
+
|
|
115
|
+
AWS documented this problem extensively in their Architecture Blog. The solution is jitter — adding randomness to the backoff interval to desynchronize retries across clients.
|
|
116
|
+
|
|
117
|
+
### Retrying Non-Transient Errors
|
|
118
|
+
|
|
119
|
+
Retrying a 400 Bad Request, a 401 Unauthorized, a 403 Forbidden, or a 404 Not Found is pointless — the same request will produce the same error every time. Retrying these errors wastes resources and delays the actual error handling. Only retry errors that indicate transient conditions: 429 (Too Many Requests), 500 (Internal Server Error), 502 (Bad Gateway), 503 (Service Unavailable), 504 (Gateway Timeout), and network-level errors (connection refused, timeout, DNS failure).
|
|
120
|
+
|
|
121
|
+
### Idempotency Key Collisions and Misuse
|
|
122
|
+
|
|
123
|
+
If idempotency keys are generated with insufficient entropy (e.g., short random strings, timestamp-based), different operations can receive the same key. The server treats the second operation as a retry of the first and returns the first operation's result — silently dropping the second operation. Similarly, if a client reuses the same idempotency key for genuinely different operations (sending the same key with different parameters), the server must reject the request rather than silently returning the wrong cached result.
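Two common ways to generate keys with enough entropy are sketched below. The function names and the `operation`/`params` shape are assumptions for illustration; only the standard-library `uuid`, `hashlib`, and `json` modules are used.

```python
import hashlib
import json
import uuid


def random_key() -> str:
    """UUID v4: 122 random bits, generated once per logical operation."""
    return str(uuid.uuid4())


def deterministic_key(operation: str, params: dict) -> str:
    """SHA-256 over a canonical encoding of the operation's business parameters."""
    canonical = json.dumps(
        {"op": operation, "params": params},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A random key must be generated once and reused across retries of the same operation. A deterministic key survives client restarts, but it also makes two intentionally identical operations look like one, so the hashed parameters should include something that distinguishes them (an order reference, for example).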
|
|
124
|
+
|
|
125
|
+
### Unbounded Retry Without Circuit Breaking
|
|
126
|
+
|
|
127
|
+
Retrying indefinitely against a down service keeps connections open, consumes thread pools, and can cause the caller to fail under resource exhaustion. Retries must have a maximum attempt count, and ideally integrate with a circuit breaker that stops retries entirely when the failure rate exceeds a threshold.
|
|
128
|
+
|
|
129
|
+
---
|
|
130
|
+
|
|
131
|
+
## How It Works
|
|
132
|
+
|
|
133
|
+
### Idempotency Key Lifecycle
|
|
134
|
+
|
|
135
|
+
The lifecycle of an idempotency key follows a clear state machine:
|
|
136
|
+
|
|
137
|
+
```
|
|
138
|
+
Client generates key (UUID v4 or deterministic hash)
|
|
139
|
+
│
|
|
140
|
+
▼
|
|
141
|
+
┌─────────────────┐
|
|
142
|
+
│ Server receives │
|
|
143
|
+
│ request + key │
|
|
144
|
+
└────────┬────────┘
|
|
145
|
+
│
|
|
146
|
+
▼
|
|
147
|
+
┌─────────────────┐ ┌─────────────────┐
|
|
148
|
+
│ Key exists in │─Yes─▶│ Parameters match │─Yes─▶ Return stored response
|
|
149
|
+
│ idempotency │ │ original? │
|
|
150
|
+
│ store? │ └────────┬────────┘
|
|
151
|
+
└────────┬────────┘ │No
|
|
152
|
+
│No ▼
|
|
153
|
+
▼ Return 422 Unprocessable Content
|
|
154
|
+
┌─────────────────┐
|
|
155
|
+
│ Store key with │
|
|
156
|
+
│ status: STARTED │
|
|
157
|
+
└────────┬────────┘
|
|
158
|
+
│
|
|
159
|
+
▼
|
|
160
|
+
┌─────────────────┐
|
|
161
|
+
│ Process request │
|
|
162
|
+
└────────┬────────┘
|
|
163
|
+
│
|
|
164
|
+
┌────┴────┐
|
|
165
|
+
│Success │Failure
|
|
166
|
+
▼ ▼
|
|
167
|
+
Store result Store error (if permanent)
|
|
168
|
+
+ COMPLETED or delete key (if transient)
|
|
169
|
+
│ │
|
|
170
|
+
▼ ▼
|
|
171
|
+
Return Return error
|
|
172
|
+
response (client may retry)
|
|
173
|
+
```
|
|
174
|
+
|
|
175
|
+
**Critical design decision: what to store on failure.** If the server stores the error response alongside the idempotency key, subsequent retries will return the cached error without re-attempting the operation. This is correct for permanent errors (validation failures, business rule violations) but catastrophic for transient errors (database timeouts, downstream service unavailability). For transient errors, the idempotency key should be deleted or marked as retryable, allowing the client's next attempt to actually re-execute the operation.
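A sketch of this decision is shown below. It is illustrative only: `TransientError`, `PermanentError`, `execute_once`, the dictionary-based `store`, and the specific status codes (including the 409 for an in-flight duplicate) are assumptions, not the behavior of any particular library.

```python
class TransientError(Exception):
    """For example a database timeout or an unavailable downstream service."""


class PermanentError(Exception):
    """For example a validation failure or a business rule violation."""


def execute_once(store: dict, key: str, operation):
    """Run operation() at most once per key, caching only final outcomes."""
    record = store.get(key)
    if record is not None:
        if record["status"] == "STARTED":
            # Assumption: another request with this key is still in flight.
            return (409, "request with this key is in progress")
        return record["response"]  # COMPLETED or FAILED: replay as-is

    store[key] = {"status": "STARTED", "response": None}
    try:
        result = operation()
    except PermanentError as exc:
        # Cache the error: retrying can never succeed.
        response = (422, str(exc))
        store[key] = {"status": "FAILED", "response": response}
        return response
    except TransientError:
        # Forget the key so the client's next attempt really re-executes.
        del store[key]
        return (503, "temporarily unavailable; retry with the same key")

    response = (200, result)
    store[key] = {"status": "COMPLETED", "response": response}
    return response
```

The important asymmetry is in the two `except` branches: the permanent error is stored and replayed, while the transient error leaves no trace in the store.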
|
|
176
|
+
|
|
177
|
+
### Idempotency Store Design
|
|
178
|
+
|
|
179
|
+
The idempotency store is a lookup table mapping keys to results. It can be implemented as:
|
|
180
|
+
|
|
181
|
+
**Database table (most common for critical operations):**
|
|
182
|
+
|
|
183
|
+
```sql
|
|
184
|
+
CREATE TABLE idempotency_keys (
|
|
185
|
+
key VARCHAR(255) PRIMARY KEY,
|
|
186
|
+
request_path VARCHAR(500) NOT NULL,
|
|
187
|
+
request_hash VARCHAR(64) NOT NULL, -- SHA-256 of request body
|
|
188
|
+
status VARCHAR(20) NOT NULL, -- STARTED, COMPLETED, FAILED
|
|
189
|
+
response_code INTEGER,
|
|
190
|
+
response_body JSONB,
|
|
191
|
+
created_at TIMESTAMP NOT NULL DEFAULT NOW(),
|
|
192
|
+
locked_at TIMESTAMP, -- for concurrent request protection
|
|
193
|
+
locked_by VARCHAR(255) -- process/instance ID
|
|
194
|
+
);
|
|
195
|
+
|
|
196
|
+
CREATE INDEX idx_idempotency_keys_created ON idempotency_keys (created_at);
|
|
197
|
+
-- For cleanup: DELETE FROM idempotency_keys WHERE created_at < NOW() - INTERVAL '24 hours';
|
|
198
|
+
```
|
|
199
|
+
|
|
200
|
+
**Redis/cache (for high-throughput, lower-criticality operations):**
|
|
201
|
+
|
|
202
|
+
```
|
|
203
|
+
SET idempotency:{key} {serialized_response} EX 86400 -- 24h TTL
|
|
204
|
+
```
|
|
205
|
+
|
|
206
|
+
Redis provides automatic expiration (no cleanup job needed) and faster lookups, but lacks transactional guarantees with the main database. For payment processing, a database-backed store within the same transaction as the business operation is strongly preferred.
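To illustrate what "within the same transaction" buys, here is a small sketch using Python's built-in `sqlite3` module in place of a production database. The two-table schema, the column names, and the `charge` function are simplified assumptions; parameter-mismatch checks and key expiry are omitted to keep the focus on atomicity.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript(
        """
        CREATE TABLE idempotency_keys (
            idempotency_key TEXT PRIMARY KEY,
            response        TEXT NOT NULL
        );
        CREATE TABLE charges (
            id           INTEGER PRIMARY KEY,
            amount_cents INTEGER NOT NULL
        );
        """
    )


def charge(conn: sqlite3.Connection, idempotency_key: str, amount_cents: int) -> str:
    # "with conn" commits on success and rolls back if an exception escapes,
    # so the charge row and the key row are written together or not at all.
    with conn:
        row = conn.execute(
            "SELECT response FROM idempotency_keys WHERE idempotency_key = ?",
            (idempotency_key,),
        ).fetchone()
        if row is not None:
            return row[0]  # replay: no second charge row is written
        cursor = conn.execute(
            "INSERT INTO charges (amount_cents) VALUES (?)", (amount_cents,)
        )
        response = f"charge:{cursor.lastrowid}"
        conn.execute(
            "INSERT INTO idempotency_keys (idempotency_key, response) VALUES (?, ?)",
            (idempotency_key, response),
        )
        return response
```

If the key insert fails (for instance on a primary-key conflict from a concurrent duplicate), the rollback also discards the charge row. A cache-based store cannot offer that guarantee because it commits separately from the business data.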
|
|
207
|
+
|
|
208
|
+
### Deduplication Window
|
|
209
|
+
|
|
210
|
+
Idempotency keys cannot be stored forever. The deduplication window defines how long keys are retained:
|
|
211
|
+
|
|
212
|
+
- **Stripe:** 24 hours. Long enough for any reasonable retry sequence (including overnight retries after a failure during business hours), short enough to avoid unbounded storage growth.
|
|
213
|
+
- **Payment systems:** 24–72 hours is typical. Some regulatory environments require longer windows.
|
|
214
|
+
- **Message queues:** Typically 5–15 minutes. Message redelivery happens quickly; if a consumer hasn't processed a message within 15 minutes, something is fundamentally wrong.
|
|
215
|
+
- **Webhook handlers:** 7–30 days. Webhook providers may retry failed deliveries over days.
|
|
216
|
+
|
|
217
|
+
After the deduplication window expires, the key is pruned and the same key can be used again (it will be treated as a new request). This is by design — the window covers the retry period, not eternity.
|
|
218
|
+
|
|
219
|
+
### Retry Strategies
|
|
220
|
+
|
|
221
|
+
**Immediate retry (almost never correct):**
|
|
222
|
+
|
|
223
|
+
```
|
|
224
|
+
retry(operation, maxAttempts: 3, delay: 0)
|
|
225
|
+
```
|
|
226
|
+
|
|
227
|
+
Retries instantly. Only appropriate when the failure is known to be resolved immediately (e.g., a lock contention that clears in microseconds). For network failures, immediate retry floods the server.
|
|
228
|
+
|
|
229
|
+
**Fixed-interval retry (rarely correct):**
|
|
230
|
+
|
|
231
|
+
```
|
|
232
|
+
retry(operation, maxAttempts: 3, delay: 1000ms)
|
|
233
|
+
```
|
|
234
|
+
|
|
235
|
+
Retries at a constant interval. Better than immediate, but all clients that fail at the same time retry at the same time, creating synchronized waves.
|
|
236
|
+
|
|
237
|
+
**Exponential backoff (good, but incomplete):**
|
|
238
|
+
|
|
239
|
+
```
|
|
240
|
+
delay = baseDelay * 2^attempt
|
|
241
|
+
// With baseDelay = 1s and attempt counted from 0: 1s, 2s, 4s, 8s
|
|
242
|
+
```
|
|
243
|
+
|
|
244
|
+
Increasing delays reduce load on the recovering server. But without jitter, synchronized retries still occur.
|
|
245
|
+
|
|
246
|
+
**Exponential backoff with full jitter (current best practice):**
|
|
247
|
+
|
|
248
|
+
```
|
|
249
|
+
delay = random(0, baseDelay * 2^attempt)
|
|
250
|
+
// With baseDelay = 1s and attempt counted from 0: random(0, 1s), random(0, 2s), random(0, 4s)
|
|
251
|
+
```
|
|
252
|
+
|
|
253
|
+
AWS's analysis shows that full jitter provides the best spread of retries and the lowest total number of calls needed for all clients to eventually succeed. The randomness breaks synchronization, distributing retries evenly over the backoff window. AWS compared three jitter strategies:
|
|
254
|
+
|
|
255
|
+
- **Full Jitter:** `sleep = random(0, min(cap, base * 2^attempt))` — best overall distribution.
|
|
256
|
+
- **Equal Jitter:** `sleep = min(cap, base * 2^attempt) / 2 + random(0, min(cap, base * 2^attempt) / 2)` — guarantees at least half the backoff delay, less spread.
|
|
257
|
+
- **Decorrelated Jitter:** `sleep = min(cap, random(base, previousSleep * 3))` — each retry's delay depends on the previous one, not the attempt number.
|
|
258
|
+
|
|
259
|
+
**Decorrelated jitter as an alternative:** because each delay is derived from the previous sleep rather than from the attempt number, the client only needs to remember its last delay. Successive delays can grow or shrink, but they always stay between `base` and `cap`.
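The three jitter formulas translate directly into code. The sketch below assumes delays in seconds, an attempt counter starting at 0, and illustrative defaults for `base` and `cap`; the `rng` parameter is an addition that allows a seeded generator to be injected for testing.

```python
import random


def full_jitter(attempt: int, base: float = 1.0, cap: float = 30.0, rng=random) -> float:
    """sleep = random(0, min(cap, base * 2^attempt))"""
    return rng.uniform(0, min(cap, base * 2 ** attempt))


def equal_jitter(attempt: int, base: float = 1.0, cap: float = 30.0, rng=random) -> float:
    """Half of the capped backoff is fixed; the other half is random."""
    half = min(cap, base * 2 ** attempt) / 2
    return half + rng.uniform(0, half)


def decorrelated_jitter(previous_sleep: float, base: float = 1.0, cap: float = 30.0, rng=random) -> float:
    """sleep = min(cap, random(base, previous_sleep * 3))"""
    return min(cap, rng.uniform(base, previous_sleep * 3))
```

For decorrelated jitter, the first call is usually seeded with `previous_sleep = base`, and each result is fed back in as the next call's `previous_sleep`.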
|
|
260
|
+
|
|
261
|
+
### Retry Budgets
|
|
262
|
+
|
|
263
|
+
Instead of (or in addition to) per-request retry limits, a retry budget limits the total percentage of requests that can be retries across an entire service. Google's SRE practices recommend a retry budget of 10%: if a service is making 1,000 requests per second, at most 100 of those can be retries. This prevents retry amplification at the system level.
|
|
264
|
+
|
|
265
|
+
```
|
|
266
|
+
totalRequests = firstAttempts + retries
|
|
267
|
+
retryRatio = retries / totalRequests
|
|
268
|
+
if retryRatio > 0.10:
|
|
269
|
+
stop retrying, fail fast
|
|
270
|
+
```
|
|
271
|
+
|
|
272
|
+
Agoda's production experience revealed that uniform retry budgets are insufficient — they discovered that heavy endpoints consumed disproportionate retry budget, starving lightweight endpoints. The solution was per-endpoint or weighted retry budgets.
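A per-endpoint budget can be sketched as a pair of counters per endpoint. This is a simplified illustration: `RetryBudget` and its method names are assumptions, and the counters here grow forever, whereas a production version would count over a sliding time window.

```python
from collections import defaultdict


class RetryBudget:
    """Tracks first attempts and retries separately for each endpoint."""

    def __init__(self, max_retry_ratio: float = 0.10):
        self.max_retry_ratio = max_retry_ratio
        self._first_attempts = defaultdict(int)
        self._retries = defaultdict(int)

    def record_first_attempt(self, endpoint: str) -> None:
        self._first_attempts[endpoint] += 1

    def try_acquire_retry(self, endpoint: str) -> bool:
        """Return True and count the retry only if it keeps the ratio in budget."""
        retries = self._retries[endpoint] + 1  # include the retry being requested
        total = self._first_attempts[endpoint] + retries
        if retries / total > self.max_retry_ratio:
            return False  # budget exhausted for this endpoint: fail fast
        self._retries[endpoint] = retries
        return True
```

Because each endpoint has its own counters, a heavy endpoint that exhausts its budget cannot consume the retries of a lightweight one.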
|
|
273
|
+
|
|
274
|
+
### Circuit Breaker Integration
|
|
275
|
+
|
|
276
|
+
Retry and circuit breaking are complementary, not competing, patterns:
|
|
277
|
+
|
|
278
|
+
1. **Closed state (normal):** Requests pass through. Failures are retried with backoff.
|
|
279
|
+
2. **Open state (circuit broken):** The failure rate has exceeded the threshold. All requests fail immediately without being sent — no retries, no network calls. This stops retry storms from overwhelming the downstream service.
|
|
280
|
+
3. **Half-open state (probing):** After a timeout, a single request is allowed through. If it succeeds, the circuit closes. If it fails, the circuit re-opens.
|
|
281
|
+
|
|
282
|
+
The circuit breaker wraps the retry logic. When the circuit is open, retries are not attempted. When the circuit is closed or half-open, retries proceed with backoff and jitter.
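The three states can be sketched as a small class that wraps any callable, including a retrying one. Note the simplifications: this version trips on a count of consecutive failures rather than a failure rate, it is not thread-safe, and `CircuitBreaker`, `CircuitOpenError`, and the default thresholds are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker rejects a call without attempting it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self.state = "closed"

    def call(self, operation):
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            self.state = "half-open"  # timeout elapsed: let one probe through
        try:
            result = operation()
        except Exception:
            self._failures += 1
            # A failed probe, or too many failures while closed, opens the circuit.
            if self.state == "half-open" or self._failures >= self.failure_threshold:
                self.state = "open"
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self.state = "closed"
        return result
```

While the circuit is open, `call` raises `CircuitOpenError` before `operation` runs, so no request and no retry reaches the downstream service.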
|
|
283
|
+
|
|
284
|
+
### Dead Letter Queues
|
|
285
|
+
|
|
286
|
+
When all retry attempts are exhausted and the operation has permanently failed, the request must go somewhere for investigation and potential manual processing. A dead letter queue (DLQ) captures these failed operations with full context: the original request, the idempotency key, the error history, and timestamps.
|
|
287
|
+
|
|
288
|
+
```
|
|
289
|
+
Request → Retry 1 (fail) → Retry 2 (fail) → Retry 3 (fail) → Dead Letter Queue
|
|
290
|
+
│
|
|
291
|
+
Alert operations team
|
|
292
|
+
Manual investigation
|
|
293
|
+
Reprocessing when ready
|
|
294
|
+
```
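The flow above can be sketched with a plain list standing in for the dead letter queue. The `process_with_dlq` function, the `handler` callback, and the message shape are assumptions for illustration; backoff between attempts is omitted for brevity and would be needed in practice.

```python
import time


def process_with_dlq(message: dict, handler, dead_letters: list, max_attempts: int = 3):
    """Try handler(message) up to max_attempts times, then dead-letter it."""
    errors = []
    for attempt in range(1, max_attempts + 1):
        try:
            return handler(message)
        except Exception as exc:
            # Keep the error history so the DLQ entry carries full context.
            errors.append({"attempt": attempt, "error": repr(exc), "at": time.time()})
    dead_letters.append(
        {
            "message": message,
            "idempotency_key": message.get("idempotency_key"),
            "errors": errors,
        }
    )
    return None
```

Keeping the idempotency key in the dead-letter entry means a later manual reprocessing can reuse it, so a replay cannot double-apply the operation.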
|
|
295
|
+
|
|
296
|
+
### At-Least-Once + Idempotency = Effectively-Once
|
|
297
|
+
|
|
298
|
+
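Exactly-once delivery cannot be guaranteed over an unreliable network: as the Two Generals Problem illustrates, a sender can never be certain that its message, or the acknowledgement of it, arrived (the FLP impossibility result establishes a related limit for consensus in asynchronous systems). The practical approximation used across the industry is: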
|
|
299
|
+
|
|
300
|
+
1. **At-least-once delivery:** The message/request will be delivered one or more times. Guaranteed by retry logic.
|
|
301
|
+
2. **Idempotent processing:** Processing the same message/request multiple times produces the same result as processing it once. Guaranteed by idempotency keys or natural idempotency.
|
|
302
|
+
3. **The combination:** Every operation is attempted until it succeeds (at-least-once), and duplicate attempts are harmless (idempotent). The observable result is "effectively exactly-once."
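The combination can be demonstrated with a toy simulation in which acknowledgements are lost and the sender redelivers. Everything here is hypothetical scaffolding: `deliver_at_least_once` models a sender that repeats until an ack gets through, and `IdempotentLedger` models a consumer that deduplicates by message ID.

```python
def deliver_at_least_once(message: dict, consumer, ack_lost_on: set) -> int:
    """Redeliver until an ack gets through; return the number of deliveries.

    ack_lost_on holds the delivery numbers whose acknowledgement is dropped.
    """
    delivery = 0
    while True:
        delivery += 1
        consumer(message)  # the consumer always processes before acking
        if delivery not in ack_lost_on:
            return delivery


class IdempotentLedger:
    def __init__(self):
        self.balance = 0
        self._seen = set()

    def apply(self, message: dict) -> None:
        if message["id"] in self._seen:
            return  # duplicate delivery: harmless no-op
        self._seen.add(message["id"])
        self.balance += message["amount"]


ledger = IdempotentLedger()
deliveries = deliver_at_least_once(
    {"id": "msg-1", "amount": 100}, ledger.apply, ack_lost_on={1, 2}
)
print(deliveries, ledger.balance)  # 3 100
```

The message was delivered three times, but the ledger changed once, which is the "effectively exactly-once" outcome described above.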
|
|
303
|
+
|
|
304
|
+
Apache Kafka's exactly-once semantics (EOS) implement this pattern with three mechanisms: idempotent producers (each message carries a producer ID and sequence number, and the broker discards duplicates), transactions (multiple messages are committed atomically), and transactional consumers (consumer offsets are committed in the same transaction as the produced messages). The guarantee covers read-process-write cycles inside Kafka. Extending it to external systems requires the Outbox Pattern or the Listen-to-Yourself Pattern, where side effects are driven by consuming your own events rather than by inline execution.
|
|
305
|
+
|
|
306
|
+
---
|
|
307
|
+
|
|
308
|
+
## Trade-Offs Matrix
|
|
309
|
+
|
|
310
|
+
| Dimension | Without Idempotency + Retry | With Idempotency + Retry | Notes |
|---|---|---|---|
| **Duplicate operations** | Retries cause duplicates (double charges, double sends) | Retries are safe; duplicates are detected and deduplicated | The fundamental reason idempotency exists |
| **Transient failure handling** | Single failure = permanent failure; user sees errors | Transient failures are invisible to users | AWS Builders' Library design goal |
| **Storage overhead** | None | Idempotency store requires space (keys + responses) | Stripe: 24h retention; typical: a few GB for millions of keys |
| **Latency overhead** | None | Each request incurs a lookup against the idempotency store | Database: 1–5ms; Redis: <1ms; amortized across requests |
| **Implementation complexity** | Simple (fire and forget) | Moderate (key generation, store, cleanup, concurrent access) | Most complexity is in the idempotency store and edge cases |
| **Server load under failure** | Normal (no retries) or catastrophic (naive retries) | Controlled (backoff + jitter + budget + circuit breaker) | Requires all four mechanisms working together |
| **Observability** | Hard to distinguish new requests from retries | Idempotency keys enable tracking retry rates and patterns | Valuable operational signal: rising retry rates indicate problems |
| **Client complexity** | Client must handle failures manually | Client retry logic is standardized and often SDK-provided | AWS SDK, gRPC, Axios interceptors all support configurable retry |
| **Data consistency** | Risk of inconsistent state from partial retries | Consistent state; operations are atomic + deduplicated | Especially critical for financial and inventory operations |
| **Recovery from outages** | Manual reconciliation to find and fix duplicates | Automatic recovery; retries succeed when service recovers | Reduces MTTR (Mean Time To Recovery) significantly |
| **Cost** | No infrastructure cost | Idempotency store infra + compute for dedup checks | Negligible relative to the cost of duplicate operations or data loss |
|
|
323
|
+
|
|
324
|
+
---
|
|
325
|
+
|
|
326
|
+
## Evolution Path
|
|
327
|
+
|
|
328
|
+
### Stage 1: Naive (No Retry, No Idempotency)
|
|
329
|
+
|
|
330
|
+
The starting state of most systems. Every operation is fire-and-forget. If it fails, the user sees an error. If the response is lost, the client has no way to know whether the operation succeeded. This works for read-only operations and internal tools where occasional failures are acceptable.
|
|
331
|
+
|
|
332
|
+
**Symptoms that you've outgrown this:** Users report seeing errors for operations that actually succeeded. Duplicate records appear in the database. Payment disputes arise from double charges. Operations teams spend hours reconciling data after outages.
|
|
333
|
+
|
|
334
|
+
### Stage 2: Client-Side Retry (Dangerous Without Idempotency)
|
|
335
|
+
|
|
336
|
+
The team adds retry logic (often an Axios interceptor or an HTTP client middleware) to automatically retry failed requests. This immediately causes duplicate operations because the server is not idempotent. The team then adds ad-hoc deduplication — unique database constraints, "check before write" patterns, or manual reconciliation.
|
|
337
|
+
|
|
338
|
+
**Symptoms:** Duplicate records with slightly different timestamps. IntegrityError exceptions in logs from unique constraint violations. Race conditions where two retries both pass the "check" in "check before write."
|
|
339
|
+
|
|
340
|
+
### Stage 3: Idempotency Keys on Critical Paths
|
|
341
|
+
|
|
342
|
+
The team implements idempotency keys on the most critical endpoints — payment processing, order creation, account registration. The idempotency store is a database table. Retry logic uses exponential backoff. This covers the highest-risk operations but leaves gaps in lower-risk paths.
|
|
343
|
+
|
|
344
|
+
**Symptoms that you've outgrown this:** Idempotency coverage is inconsistent — some endpoints are safe, others are not. Developers forget to add idempotency to new endpoints. No retry budget or circuit breaking exists yet.
|
|
345
|
+
|
|
346
|
+
### Stage 4: Systematic Idempotency + Retry Infrastructure
|
|
347
|
+
|
|
348
|
+
Idempotency middleware is applied to all mutating endpoints by default (opt-out, not opt-in). Retry logic includes exponential backoff with jitter, retry budgets, and circuit breaker integration. Dead letter queues capture exhausted retries. Monitoring dashboards track retry rates, idempotency key hit rates, and circuit breaker state.
|
|
349
|
+
|
|
350
|
+
**This is the target state for most production systems.**
|
|
351
|
+
|
|
352
|
+
### Stage 5: Platform-Level Guarantees
|
|
353
|
+
|
|
354
|
+
The infrastructure platform provides idempotency and retry as built-in primitives. Temporal workflows automatically retry activities with configurable policies. Kafka consumers use transactional exactly-once semantics. API gateways enforce idempotency key requirements and handle retry logic at the edge. Individual services no longer implement these patterns — they inherit them from the platform.
|
|
355
|
+
|
|
356
|
+
**This is where organizations like Stripe, Google, and Amazon operate.**
|
|
357
|
+
|
|
358
|
+
---
|
|
359
|
+
|
|
360
|
+
## Failure Modes
|
|
361
|
+
|
|
362
|
+
### Retry Storms (Cascading Amplification)
|
|
363
|
+
|
|
364
|
+
**What happens:** A downstream service experiences elevated latency or errors. Every upstream caller retries. Each retry adds load to the already-struggling service. The service degrades further. More retries are triggered. The cycle escalates until the service (and potentially its callers' callers) completely fails.
|
|
365
|
+
|
|
366
|
+
**Why it happens:** No retry budget (unlimited retries). No circuit breaker (retries continue even when the service is clearly down). No backoff (immediate retries). All clients retry simultaneously (no jitter).
|
|
367
|
+
|
|
368
|
+
**How to prevent:** Retry budgets (10% max retry traffic). Circuit breakers (stop retrying when failure rate > threshold). Exponential backoff with full jitter. Load shedding on the server side (reject excess requests with 503 + Retry-After). Monitoring and alerting on retry rates.
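
A retry budget can be approximated with two counters. The sketch below assumes a simple lifetime ratio; real implementations typically use a sliding window or a token bucket, and the `RetryBudget` name and its methods are illustrative.

```typescript
class RetryBudget {
  private requests = 0;
  private retries = 0;
  private readonly maxRetryRatio: number;

  // maxRetryRatio = 0.1 means retries may add at most 10% extra traffic.
  constructor(maxRetryRatio: number = 0.1) {
    this.maxRetryRatio = maxRetryRatio;
  }

  // Call once for every original (non-retry) request.
  recordRequest(): void {
    this.requests += 1;
  }

  // Returns true and consumes budget if one more retry fits within the ratio.
  tryAcquireRetry(): boolean {
    if (this.retries + 1 > this.requests * this.maxRetryRatio) {
      return false; // budget exhausted: fail instead of retrying
    }
    this.retries += 1;
    return true;
  }
}
```

When `tryAcquireRetry()` returns `false`, the caller surfaces the original error instead of adding load to a service that is already struggling.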
|
|
369
|
+
|
|
370
|
+
**Real-world example:** The Yandex Go incident — a 10-minute bad deployment caused a 60-minute outage because retry storms from upstream services kept the recovered service overwhelmed for 50 minutes after the fix was deployed.
|
|
371
|
+
|
|
372
|
+
### Duplicate Processing Without Idempotency
|
|
373
|
+
|
|
374
|
+
**What happens:** A message is delivered twice (normal in at-least-once systems). The consumer processes it twice, creating duplicate side effects — duplicate database records, duplicate emails sent, duplicate payments charged.
|
|
375
|
+
|
|
376
|
+
**Why it happens:** The consumer does not track which messages it has already processed. The operation is not naturally idempotent. No deduplication mechanism exists.
|
|
377
|
+
|
|
378
|
+
**How to prevent:** Consumer-side dedup table keyed on message ID. Natural idempotency through upserts or unique constraints. Idempotent consumer pattern with explicit processed-message tracking.
|
|
379
|
+
|
|
380
|
+
### Idempotency Key Collisions
|
|
381
|
+
|
|
382
|
+
**What happens:** Two different operations receive the same idempotency key. The server treats the second operation as a retry of the first and returns the first operation's response, silently dropping the second operation.
|
|
383
|
+
|
|
384
|
+
**Why it happens:** Idempotency keys generated with insufficient entropy (short strings, timestamps, sequential IDs). Client-side bugs that reuse keys across different operations.
|
|
385
|
+
|
|
386
|
+
**How to prevent:** Use UUID v4, which carries 122 random bits: any two given keys collide with probability 1 in 2^122, and the chance of any collision remains negligible even after billions of keys. When an incoming key matches a stored one, validate that the request parameters also match the original request; if they do not, return an error rather than the cached response. Stripe explicitly implements this parameter-matching check.
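
To put numbers on the entropy argument, the birthday-bound approximation p ≈ n² / 2^(bits + 1) gives a rough collision probability for n random keys. The helper below is an illustrative calculation only (the function name is an assumption), and the approximation holds only while p is small.

```typescript
// Approximate probability that at least two of `keyCount` random keys collide.
function approxCollisionProbability(keyCount: number, randomBits: number): number {
  return (keyCount * keyCount) / Math.pow(2, randomBits + 1);
}

// One billion UUID v4 keys (122 random bits): on the order of 1e-19.
const uuidV4Risk = approxCollisionProbability(1e9, 122);

// Ten thousand keys drawn from only 32 random bits: roughly 1%.
const shortKeyRisk = approxCollisionProbability(1e4, 32);

console.log(uuidV4Risk, shortKeyRisk);
```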
|
|
387
|
+
|
|
388
|
+
### Stale Idempotency Cache
|
|
389
|
+
|
|
390
|
+
**What happens:** The idempotency store retains a key beyond the point where the stored response is still valid. The cached response references data that has since changed (prices updated, inventory depleted, user permissions revoked). A retry returns the stale cached response.
|
|
391
|
+
|
|
392
|
+
**Why it happens:** Deduplication window is too long. The cached response includes data that is inherently time-sensitive.
|
|
393
|
+
|
|
394
|
+
**How to prevent:** Set deduplication windows appropriate to the operation's time sensitivity. Cache only the operation's result (success/failure + resource ID), not derived data. On cache hit, optionally re-fetch current state for the response body while still skipping the mutating operation.
|
|
395
|
+
|
|
396
|
+
### Thundering Herd from Synchronized Retries
|
|
397
|
+
|
|
398
|
+
**What happens:** A server goes down, 10,000 clients fail simultaneously, and all retry at the same exponential backoff intervals. Even with increasing delays, the retries arrive in synchronized waves — a spike at 1 second, a spike at 2 seconds, a spike at 4 seconds — each spike large enough to overwhelm the recovering server.
|
|
399
|
+
|
|
400
|
+
**Why it happens:** Exponential backoff without jitter. All clients use the same base delay and multiplier. The failure event synchronized all clients to the same retry clock.
|
|
401
|
+
|
|
402
|
+
**How to prevent:** Full jitter: `delay = random(0, baseDelay * 2^attempt)`. This spreads retries uniformly across the entire backoff window, eliminating spikes. AWS's analysis shows full jitter reduces total completion time by 40% compared to exponential backoff alone because it avoids the retry wave collisions.
|
|
403
|
+
|
|
404
|
+
### Idempotency Store Failure
|
|
405
|
+
|
|
406
|
+
**What happens:** The idempotency store itself becomes unavailable (database outage, Redis failure). The server cannot check whether a key has been seen before. Options: reject all requests (safe but causes downtime) or process all requests without deduplication (risks duplicates).
|
|
407
|
+
|
|
408
|
+
**Why it happens:** The idempotency store is a single point of failure that has not been designed for high availability.
|
|
409
|
+
|
|
410
|
+
**How to prevent:** Use the same database as the main application data (they fail together and recover together — and the idempotency check can be part of the same transaction). If using Redis, deploy in cluster mode with replicas. Define a fallback policy: most systems choose to process requests without dedup during store outages, accepting the small risk of duplicates in exchange for availability.
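
The fallback policy can be made explicit in code. In this sketch the `IdempotencyStore` interface, the policy names, and the `decide` function are all hypothetical; the point is only that the outage behaviour is a deliberate, configurable choice rather than an accident of error handling.

```typescript
type StorePolicy = 'fail-open' | 'fail-closed';
type Decision = 'process' | 'skip-duplicate' | 'reject';

interface IdempotencyStore {
  // Returns true if the key was seen before; throws when the store is unavailable.
  seen(key: string): boolean;
}

function decide(store: IdempotencyStore, key: string, policy: StorePolicy): Decision {
  try {
    return store.seen(key) ? 'skip-duplicate' : 'process';
  } catch {
    // Store outage: trade duplicate safety for availability, or the reverse.
    return policy === 'fail-open' ? 'process' : 'reject';
  }
}
```

A payments path might choose `fail-closed`, while a notifications path might accept `fail-open` and the occasional duplicate.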
|
|
411
|
+
|
|
412
|
+
---
|
|
413
|
+
|
|
414
|
+
## Technology Landscape
|
|
415
|
+
|
|
416
|
+
### Stripe Idempotency (The Industry Standard)
|
|
417
|
+
|
|
418
|
+
Stripe accepts an optional `Idempotency-Key` header on all POST requests. Keys are retained for at least 24 hours, after which they may be pruned. Replayed requests return the cached response, and reusing a key with different parameters returns an error. The implementation approach is documented in detail by Brandur Leach, who described a Postgres-backed design: an `idempotency_keys` table with atomic phases, where each key tracks its lifecycle (started → processing → completed/failed) to handle concurrent requests and partial failures correctly. This design has been adopted or adapted by Shopify, Square, PayPal, Adyen, and most modern payment processors.
|
|
419
|
+
|
|
420
|
+
### AWS SDK Retry Behavior
|
|
421
|
+
|
|
422
|
+
AWS SDKs implement three retry modes:
|
|
423
|
+
|
|
424
|
+
- **Legacy mode:** Simple retry with exponential backoff, up to the maximum retry count (default varies by service).
|
|
425
|
+
- **Standard mode:** Exponential backoff with jitter, token bucket rate limiting (each retry consumes tokens; when the bucket is empty, retries stop), and automatic classification of retryable vs. non-retryable errors.
|
|
426
|
+
- **Adaptive mode (experimental):** Adds client-side rate limiting that dynamically adjusts based on server responses. Throttling responses reduce the retry rate; successful responses increase it.
|
|
427
|
+
|
|
428
|
+
Default maximum attempts: 3 (including the initial request). Backoff formula: `min(maxBackoff, baseDelay * 2^(attemptNumber - 1))` with jitter. Base delay: typically 100ms for most services.
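
As a worked example of the formula, the helper below computes the backoff ceiling per attempt; jitter then picks a random delay below that ceiling. The 20-second cap and the function name are assumptions for illustration, not values taken from a specific SDK.

```typescript
// Upper bound on the delay before the given attempt, before jitter is applied.
function backoffCeilingMs(attemptNumber: number, baseDelayMs: number, maxBackoffMs: number): number {
  return Math.min(maxBackoffMs, baseDelayMs * Math.pow(2, attemptNumber - 1));
}

// With a 100 ms base delay the ceiling doubles on each attempt until it hits the cap.
const schedule = [1, 2, 3, 4, 5].map((n) => backoffCeilingMs(n, 100, 20_000));
console.log(schedule); // [100, 200, 400, 800, 1600]

// Full jitter: the actual delay is uniform in [0, ceiling).
const jitteredDelayMs = Math.random() * backoffCeilingMs(3, 100, 20_000);
console.log(jitteredDelayMs < 400); // true
```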
|
|
429
|
+
|
|
430
|
+
### Polly (.NET)
|
|
431
|
+
|
|
432
|
+
Polly provides policy-based resilience for .NET applications. Retry policies support immediate, fixed-interval, and exponential backoff with jitter. Circuit breaker policies integrate with retry policies. Bulkhead policies limit concurrent calls. Policies are composable: a request can pass through a bulkhead → circuit breaker → retry → timeout pipeline.
|
|
433
|
+
|
|
434
|
+
```csharp
// Retry up to 3 times on network errors or HTTP 503,
// waiting 2^attempt seconds plus up to 1 second of random jitter.
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .OrResult<HttpResponseMessage>(r => r.StatusCode == HttpStatusCode.ServiceUnavailable)
    .WaitAndRetryAsync(3, attempt =>
        TimeSpan.FromSeconds(Math.Pow(2, attempt))
        + TimeSpan.FromMilliseconds(Random.Shared.Next(0, 1000)));
```
|
|
442
|
+
|
|
443
|
+
### Resilience4j (Java/Kotlin)
|
|
444
|
+
|
|
445
|
+
Resilience4j is the de facto successor to Netflix Hystrix (now in maintenance mode) for JVM-based systems. It provides retry, circuit breaker, bulkhead, rate limiter, and time limiter as composable decorators. Retry supports exponential backoff with configurable jitter. It integrates with Spring Boot via annotations and publishes metrics to Micrometer for monitoring.
|
|
446
|
+
|
|
447
|
+
```java
// Response stands for the caller's HTTP response type (any type exposing getStatus()).
// The type witness is needed so that retryOnResult sees a Response rather than Object.
RetryConfig config = RetryConfig.<Response>custom()
    .maxAttempts(3)
    .intervalFunction(IntervalFunction.ofExponentialRandomBackoff(
        Duration.ofMillis(500), 2.0, Duration.ofSeconds(30)))
    .retryOnResult(response -> response.getStatus() >= 500)
    .retryExceptions(IOException.class, TimeoutException.class)
    .ignoreExceptions(BusinessException.class)
    .build();
```
|
|
457
|
+
|
|
458
|
+
### Temporal Built-In Retry
|
|
459
|
+
|
|
460
|
+
Temporal workflows provide retry as a first-class primitive. Each activity (unit of work) has a configurable retry policy: initial interval, backoff coefficient, maximum interval, maximum attempts, and non-retryable error types. Temporal automatically retries failed activities according to the policy, with the workflow's progress durably persisted. If the worker crashes mid-retry, another worker picks up where it left off. This eliminates the need for application-level retry logic for operations orchestrated through Temporal.
|
|
461
|
+
|
|
462
|
+
```typescript
import { proxyActivities } from '@temporalio/workflow';

// Activities is the application's own activity interface (defined elsewhere).
const { chargeCustomer } = proxyActivities<Activities>({
  retry: {
    initialInterval: '1s',
    backoffCoefficient: 2.0,
    maximumInterval: '30s',
    maximumAttempts: 5,
    nonRetryableErrorTypes: ['InvalidCardError', 'InsufficientFundsError'],
  },
  startToCloseTimeout: '60s',
});
```
|
|
474
|
+
|
|
475
|
+
### Database-Backed Idempotency Stores
|
|
476
|
+
|
|
477
|
+
For systems that do not use a dedicated idempotency service, the most robust approach is a database table within the same database as the application data. This allows the idempotency check and the business operation to be wrapped in a single database transaction — if the transaction commits, both the business operation and the idempotency record are persisted atomically; if it rolls back, neither is persisted.
|
|
478
|
+
|
|
479
|
+
```sql
BEGIN;

-- Check idempotency
SELECT status, response_body FROM idempotency_keys
WHERE key = $1 FOR UPDATE;

-- If key exists and status = 'COMPLETED': return stored response, COMMIT
-- If key exists and status = 'STARTED': another request is in-flight, return 409 Conflict
-- If key does not exist:

INSERT INTO idempotency_keys (key, request_hash, status, created_at)
VALUES ($1, $2, 'STARTED', NOW());

-- Execute business logic
INSERT INTO orders (...) VALUES (...);

-- Update idempotency record
UPDATE idempotency_keys
SET status = 'COMPLETED', response_code = 201, response_body = $3
WHERE key = $1;

COMMIT;
```
|
|
503
|
+
|
|
504
|
+
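The `FOR UPDATE` lock serializes concurrent requests that find an existing row for the same idempotency key. It cannot lock a row that does not exist yet, so for two concurrent first requests the guarantee comes from a unique constraint on `key`: the second `INSERT` blocks until the first transaction finishes and then fails, which prevents both requests from processing the operation.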
|
|
505
|
+
|
|
506
|
+
---
|
|
507
|
+
|
|
508
|
+
## Decision Tree
|
|
509
|
+
|
|
510
|
+
```
Is the operation read-only (GET, query, search)?
├── Yes → Naturally idempotent. Retry freely. No idempotency key needed.
└── No (mutating operation) →
    │
    Is the operation naturally idempotent (PUT with full replacement, DELETE, upsert)?
    ├── Yes → Retry freely. Consider adding idempotency keys for observability.
    └── No (POST, increment, side-effecting) →
        │
        Does the operation cross a network boundary?
        ├── No (local function call) → Retry may help for lock contention.
        │   Idempotency usually unnecessary.
        └── Yes →
            │
            Is the operation critical (payments, orders, account creation)?
            ├── Yes → REQUIRED: Database-backed idempotency store in same
            │   transaction. Exponential backoff + jitter. Circuit breaker.
            │   Dead letter queue. Monitoring.
            └── No (notifications, logging, analytics) →
                │
                Can duplicates be tolerated?
                ├── Yes → Retry with backoff + jitter. Skip idempotency store.
                │   Accept occasional duplicates.
                └── No → Redis-backed idempotency with TTL. Retry with backoff
                    + jitter. Retry budget.
```
|
|
536
|
+
|
|
537
|
+
```
Choosing a retry strategy:

What kind of failure?
├── 4xx Client Error (400, 401, 403, 404, 422) → DO NOT RETRY. Fix the request.
├── 429 Too Many Requests → Retry AFTER Retry-After header delay.
├── 5xx Server Error (500, 502, 503, 504) → Retry with backoff + jitter.
├── Network error (timeout, connection refused, DNS failure) → Retry with backoff + jitter.
└── Unknown/ambiguous error → Retry cautiously (low attempt count, longer backoff).

How many retries?
├── Critical path (user-facing, payment) → 3-5 attempts, short total timeout.
├── Background job (async processing) → 5-10 attempts, longer total timeout.
└── Webhook delivery (outgoing) → Many attempts over hours/days with increasing delays.

Backoff strategy?
├── Always use exponential backoff with full jitter.
├── Cap the maximum delay (10-60 seconds for synchronous, minutes for async).
└── Add retry budget at the service level (10% max retry traffic).
```
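
The "What kind of failure?" branch maps directly onto a small classifier. The sketch below uses exactly the status codes listed in the tree and treats everything else as ambiguous; the type and function names are illustrative.

```typescript
type RetryDecision =
  | 'do-not-retry'
  | 'retry-after-header'
  | 'retry-with-backoff'
  | 'retry-cautiously';

const NON_RETRYABLE = new Set([400, 401, 403, 404, 422]);
const RETRYABLE_SERVER_ERRORS = new Set([500, 502, 503, 504]);

// status is undefined when no HTTP response was received.
function classifyFailure(status?: number, networkError: boolean = false): RetryDecision {
  if (networkError) return 'retry-with-backoff';     // timeout, refused, DNS failure
  if (status === undefined) return 'retry-cautiously';
  if (status === 429) return 'retry-after-header';   // honour Retry-After
  if (NON_RETRYABLE.has(status)) return 'do-not-retry';
  if (RETRYABLE_SERVER_ERRORS.has(status)) return 'retry-with-backoff';
  return 'retry-cautiously';                         // unknown or ambiguous
}
```

Keeping the classification in one function makes the policy easy to audit and to share between HTTP clients.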
|
|
557
|
+
|
|
558
|
+
---
|
|
559
|
+
|
|
560
|
+
## Implementation Sketch
|
|
561
|
+
|
|
562
|
+
### Idempotency Middleware (Node.js/Express)
|
|
563
|
+
|
|
564
|
+
```typescript
import { Request, Response, NextFunction } from 'express';
import { Pool } from 'pg';
import crypto from 'crypto';

// Assumes idempotency_keys.key has a PRIMARY KEY or UNIQUE constraint
// and that response_body is a JSONB column.
export function idempotencyMiddleware(pool: Pool, headerName = 'Idempotency-Key') {
  return async (req: Request, res: Response, next: NextFunction) => {
    if (req.method !== 'POST') return next();

    const idempotencyKey = req.get(headerName);
    if (!idempotencyKey) {
      return res.status(400).json({
        error: 'missing_idempotency_key',
        message: `${headerName} header is required for POST requests`,
      });
    }

    // Fingerprint the body so key reuse with different parameters is detectable.
    const requestHash = crypto
      .createHash('sha256')
      .update(JSON.stringify(req.body))
      .digest('hex');

    const client = await pool.connect();
    try {
      await client.query('BEGIN');

      const existing = await client.query(
        `SELECT status, request_hash, response_code, response_body
         FROM idempotency_keys WHERE key = $1 FOR UPDATE`,
        [idempotencyKey]
      );

      if (existing.rows.length > 0) {
        const record = existing.rows[0];
        if (record.request_hash !== requestHash) {
          await client.query('ROLLBACK');
          return res.status(422).json({ error: 'idempotency_key_reuse' });
        }
        if (record.status === 'STARTED') {
          await client.query('ROLLBACK');
          return res.status(409).json({ error: 'request_in_progress' });
        }
        await client.query('COMMIT');
        return res.status(record.response_code).json(record.response_body);
      }

      // FOR UPDATE cannot lock a row that does not exist yet, so two concurrent
      // first requests can both reach this point. ON CONFLICT lets exactly one win.
      const inserted = await client.query(
        `INSERT INTO idempotency_keys (key, request_path, request_hash, status, created_at)
         VALUES ($1, $2, $3, 'STARTED', NOW())
         ON CONFLICT (key) DO NOTHING`,
        [idempotencyKey, req.path, requestHash]
      );
      await client.query('COMMIT');
      if (inserted.rowCount === 0) {
        return res.status(409).json({ error: 'request_in_progress' });
      }

      // Intercept response to capture and store the result
      const originalJson = res.json.bind(res);
      res.json = (body: any) => {
        pool.query(
          `UPDATE idempotency_keys
           SET status = 'COMPLETED', response_code = $2, response_body = $3
           WHERE key = $1`,
          // Serialize explicitly: pg would send a JS array as a Postgres array, not JSON.
          [idempotencyKey, res.statusCode, JSON.stringify(body)]
        ).catch(err => console.error('Idempotency store update failed:', err));
        return originalJson(body);
      };

      next();
    } catch (err) {
      await client.query('ROLLBACK');
      next(err);
    } finally {
      client.release();
    }
  };
}
```
|
|
639
|
+
|
|
640
|
+
### Retry with Exponential Backoff + Full Jitter (TypeScript)
|
|
641
|
+
|
|
642
|
+
```typescript
interface RetryConfig {
  maxAttempts: number;          // Total attempts including the first
  baseDelayMs: number;          // Base delay for backoff calculation
  maxDelayMs: number;           // Cap on the computed delay
  retryableErrors: Set<number>; // HTTP status codes that are retryable
  onRetry?: (attempt: number, error: Error, delayMs: number) => void;
}

const DEFAULT_RETRY_CONFIG: RetryConfig = {
  maxAttempts: 4,
  baseDelayMs: 500,
  maxDelayMs: 30_000,
  retryableErrors: new Set([429, 500, 502, 503, 504]),
};

class RetriesExhaustedError extends Error {
  constructor(
    public readonly lastError: Error,
    public readonly attempts: number,
  ) {
    super(`All ${attempts} retry attempts exhausted. Last error: ${lastError.message}`);
    this.name = 'RetriesExhaustedError';
  }
}

async function withRetry<T>(
  operation: () => Promise<T>,
  config: RetryConfig = DEFAULT_RETRY_CONFIG,
): Promise<T> {
  let lastError: Error | null = null;

  for (let attempt = 1; attempt <= config.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error: any) {
      lastError = error;

      // Do not retry non-retryable errors
      const statusCode = error?.response?.status ?? error?.statusCode;
      if (statusCode && !config.retryableErrors.has(statusCode)) {
        throw error;
      }

      // Do not retry if this was the last attempt
      if (attempt === config.maxAttempts) {
        break;
      }

      // Calculate delay with exponential backoff + full jitter
      const exponentialDelay = config.baseDelayMs * Math.pow(2, attempt - 1);
      const cappedDelay = Math.min(exponentialDelay, config.maxDelayMs);
      const jitteredDelay = Math.random() * cappedDelay; // Full jitter

      // Respect Retry-After header for 429 responses
      const retryAfter = error?.response?.headers?.['retry-after'];
      const finalDelay = retryAfter
        ? Math.max(jitteredDelay, parseRetryAfter(retryAfter))
        : jitteredDelay;

      config.onRetry?.(attempt, error, finalDelay);

      await sleep(finalDelay);
    }
  }

  throw new RetriesExhaustedError(lastError!, config.maxAttempts);
}

function parseRetryAfter(value: string): number {
  const seconds = parseInt(value, 10);
  if (!isNaN(seconds)) return seconds * 1000;
  // Retry-After can also be an HTTP date
  const date = new Date(value);
  if (!isNaN(date.getTime())) return Math.max(0, date.getTime() - Date.now());
  return 0;
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```
|
|
724
|
+
|
|
725
|
+
### Idempotent Message Consumer (Python)
|
|
726
|
+
|
|
727
|
+
```python
import logging
from datetime import datetime, timedelta, timezone
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


class IdempotentConsumer:
    """Wraps a message handler to ensure idempotent processing via dedup table."""

    def __init__(self, db_connection, dedup_window_hours: int = 1):
        # db_connection is assumed to be a DB-API 2.0 connection (e.g. psycopg2)
        # with autocommit off, so a transaction begins implicitly on first execute.
        self.db = db_connection
        # Not used below; presumably intended for pruning old processed_messages rows.
        self.dedup_window = timedelta(hours=dedup_window_hours)

    def process(self, message_id: str, body: dict, handler: Callable) -> Optional[Any]:
        cursor = self.db.cursor()
        try:
            cursor.execute(
                "SELECT 1 FROM processed_messages WHERE message_id = %s FOR UPDATE",
                (message_id,),
            )
            if cursor.fetchone():
                logger.info("Skipping duplicate message %s", message_id)
                self.db.rollback()
                return None

            # The dedup row and the handler's writes commit together or not at all.
            cursor.execute(
                "INSERT INTO processed_messages (message_id, processed_at) VALUES (%s, %s)",
                (message_id, datetime.now(timezone.utc)),
            )
            result = handler(body)
            self.db.commit()
            return result
        except Exception:
            # Rolling back also discards the dedup row inserted above,
            # so the message can be retried without extra cleanup.
            self.db.rollback()
            raise
```
|
|
768
|
+
|
|
769
|
+
---
|
|
770
|
+
|
|
771
|
+
## Cross-References
|
|
772
|
+
|
|
773
|
+
- **Circuit Breaker and Bulkhead** — Circuit breakers stop retries when a service is down; bulkheads isolate failures. Retry without circuit breaking causes retry storms. See `circuit-breaker-bulkhead`.
|
|
774
|
+
- **Distributed Systems Fundamentals** — The CAP theorem, Two Generals Problem, and FLP impossibility result explain why exactly-once delivery is impossible and why at-least-once + idempotency is the practical alternative. See `distributed-systems-fundamentals`.
|
|
775
|
+
- **Event-Driven Architecture** — Message brokers provide at-least-once delivery, requiring consumer-side idempotency. The Outbox Pattern and transactional messaging are key integration patterns. See `event-driven`.
|
|
776
|
+
- **Saga Pattern** — Long-running transactions across services use compensating actions on failure. Each saga step must be idempotent (both the action and the compensation) to handle retries safely. See `saga-pattern`.
|
|
777
|
+
- **API Design (REST)** — HTTP method idempotency semantics (GET, PUT, DELETE are idempotent; POST is not), idempotency key headers, and error response design for retryable vs. non-retryable errors. See `api-design-rest`.
|
|
778
|
+
|
|
779
|
+
---
|
|
780
|
+
|
|
781
|
+
## Sources
|
|
782
|
+
|
|
783
|
+
- [Stripe: Designing Robust and Predictable APIs with Idempotency](https://stripe.com/blog/idempotency)
|
|
784
|
+
- [Stripe API Reference: Idempotent Requests](https://docs.stripe.com/api/idempotent_requests)
|
|
785
|
+
- [Brandur Leach: Implementing Stripe-like Idempotency Keys in Postgres](https://brandur.org/idempotency-keys)
|
|
786
|
+
- [AWS Builders' Library: Timeouts, Retries, and Backoff with Jitter](https://aws.amazon.com/builders-library/timeouts-retries-and-backoff-with-jitter/)
|
|
787
|
+
- [AWS Architecture Blog: Exponential Backoff and Jitter](https://aws.amazon.com/blogs/architecture/exponential-backoff-and-jitter/)
|
|
788
|
+
- [AWS Prescriptive Guidance: Retry with Backoff Pattern](https://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/retry-backoff.html)
|
|
789
|
+
- [Microsoft Azure: Retry Storm Antipattern](https://learn.microsoft.com/en-us/azure/architecture/antipatterns/retry-storm/)
|
|
790
|
+
- [Yandex Engineering: Good Retry, Bad Retry — An Incident Story](https://medium.com/yandex/good-retry-bad-retry-an-incident-story-648072d3cee6)
|
|
791
|
+
- [Agoda Engineering: How Agoda Solved Retry Storms](https://medium.com/agoda-engineering/how-agoda-solved-retry-storms-to-boost-system-reliability-9bf0d1dfbeee)
|
|
792
|
+
- [Temporal: Error Handling in Distributed Systems](https://temporal.io/blog/error-handling-in-distributed-systems)
|
|
793
|
+
- [Confluent: Exactly-Once Semantics in Apache Kafka](https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/)
|
|
794
|
+
- [Gunnar Morling: On Idempotency Keys](https://www.morling.dev/blog/on-idempotency-keys/)
|
|
795
|
+
- [HTTP Toolkit: Working with the Idempotency Keys RFC](https://httptoolkit.com/blog/idempotency-keys/)
|
|
796
|
+
- [InfoQ: Timeouts, Retries and Idempotency in Distributed Systems](https://www.infoq.com/presentations/distributed-systems-resiliency/)
|