@wazir-dev/cli 1.0.0
This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +111 -0
- package/CHANGELOG.md +14 -0
- package/CONTRIBUTING.md +101 -0
- package/LICENSE +21 -0
- package/README.md +314 -0
- package/assets/composition-engine.mmd +34 -0
- package/assets/demo-script.sh +17 -0
- package/assets/logo-dark.svg +14 -0
- package/assets/logo.svg +14 -0
- package/assets/pipeline.mmd +39 -0
- package/assets/record-demo.sh +51 -0
- package/docs/README.md +51 -0
- package/docs/adapters/context-mode.md +60 -0
- package/docs/concepts/architecture.md +87 -0
- package/docs/concepts/artifact-model.md +60 -0
- package/docs/concepts/composition-engine.md +36 -0
- package/docs/concepts/indexing-and-recall.md +160 -0
- package/docs/concepts/observability.md +41 -0
- package/docs/concepts/roles-and-workflows.md +59 -0
- package/docs/concepts/terminology-policy.md +27 -0
- package/docs/getting-started/01-installation.md +78 -0
- package/docs/getting-started/02-first-run.md +102 -0
- package/docs/getting-started/03-adding-to-project.md +15 -0
- package/docs/getting-started/04-host-setup.md +15 -0
- package/docs/guides/ci-integration.md +15 -0
- package/docs/guides/creating-skills.md +15 -0
- package/docs/guides/expertise-module-authoring.md +15 -0
- package/docs/guides/hook-development.md +15 -0
- package/docs/guides/memory-and-learnings.md +34 -0
- package/docs/guides/multi-host-export.md +15 -0
- package/docs/guides/troubleshooting.md +101 -0
- package/docs/guides/writing-custom-roles.md +15 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-design.md +592 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +598 -0
- package/docs/plans/2026-03-15-docs-enforcement-plan.md +238 -0
- package/docs/readmes/INDEX.md +99 -0
- package/docs/readmes/features/expertise/README.md +171 -0
- package/docs/readmes/features/exports/README.md +222 -0
- package/docs/readmes/features/hooks/README.md +103 -0
- package/docs/readmes/features/hooks/loop-cap-guard.md +133 -0
- package/docs/readmes/features/hooks/post-tool-capture.md +121 -0
- package/docs/readmes/features/hooks/post-tool-lint.md +130 -0
- package/docs/readmes/features/hooks/pre-compact-summary.md +122 -0
- package/docs/readmes/features/hooks/pre-tool-capture-route.md +100 -0
- package/docs/readmes/features/hooks/protected-path-write-guard.md +128 -0
- package/docs/readmes/features/hooks/session-start.md +119 -0
- package/docs/readmes/features/hooks/stop-handoff-harvest.md +125 -0
- package/docs/readmes/features/roles/README.md +157 -0
- package/docs/readmes/features/roles/clarifier.md +152 -0
- package/docs/readmes/features/roles/content-author.md +190 -0
- package/docs/readmes/features/roles/designer.md +193 -0
- package/docs/readmes/features/roles/executor.md +184 -0
- package/docs/readmes/features/roles/learner.md +210 -0
- package/docs/readmes/features/roles/planner.md +182 -0
- package/docs/readmes/features/roles/researcher.md +164 -0
- package/docs/readmes/features/roles/reviewer.md +184 -0
- package/docs/readmes/features/roles/specifier.md +162 -0
- package/docs/readmes/features/roles/verifier.md +215 -0
- package/docs/readmes/features/schemas/README.md +178 -0
- package/docs/readmes/features/skills/README.md +63 -0
- package/docs/readmes/features/skills/brainstorming.md +96 -0
- package/docs/readmes/features/skills/debugging.md +148 -0
- package/docs/readmes/features/skills/design.md +120 -0
- package/docs/readmes/features/skills/prepare-next.md +109 -0
- package/docs/readmes/features/skills/run-audit.md +159 -0
- package/docs/readmes/features/skills/scan-project.md +109 -0
- package/docs/readmes/features/skills/self-audit.md +176 -0
- package/docs/readmes/features/skills/tdd.md +137 -0
- package/docs/readmes/features/skills/using-skills.md +92 -0
- package/docs/readmes/features/skills/verification.md +120 -0
- package/docs/readmes/features/skills/writing-plans.md +104 -0
- package/docs/readmes/features/tooling/README.md +320 -0
- package/docs/readmes/features/workflows/README.md +186 -0
- package/docs/readmes/features/workflows/author.md +181 -0
- package/docs/readmes/features/workflows/clarify.md +154 -0
- package/docs/readmes/features/workflows/design-review.md +171 -0
- package/docs/readmes/features/workflows/design.md +169 -0
- package/docs/readmes/features/workflows/discover.md +162 -0
- package/docs/readmes/features/workflows/execute.md +173 -0
- package/docs/readmes/features/workflows/learn.md +167 -0
- package/docs/readmes/features/workflows/plan-review.md +165 -0
- package/docs/readmes/features/workflows/plan.md +170 -0
- package/docs/readmes/features/workflows/prepare-next.md +167 -0
- package/docs/readmes/features/workflows/review.md +169 -0
- package/docs/readmes/features/workflows/run-audit.md +191 -0
- package/docs/readmes/features/workflows/spec-challenge.md +159 -0
- package/docs/readmes/features/workflows/specify.md +160 -0
- package/docs/readmes/features/workflows/verify.md +177 -0
- package/docs/readmes/packages/README.md +50 -0
- package/docs/readmes/packages/ajv.md +117 -0
- package/docs/readmes/packages/context-mode.md +118 -0
- package/docs/readmes/packages/gray-matter.md +116 -0
- package/docs/readmes/packages/node-test.md +137 -0
- package/docs/readmes/packages/yaml.md +112 -0
- package/docs/reference/configuration-reference.md +159 -0
- package/docs/reference/expertise-index.md +52 -0
- package/docs/reference/git-flow.md +43 -0
- package/docs/reference/hooks.md +87 -0
- package/docs/reference/host-exports.md +50 -0
- package/docs/reference/launch-checklist.md +172 -0
- package/docs/reference/marketplace-listings.md +76 -0
- package/docs/reference/release-process.md +34 -0
- package/docs/reference/roles-reference.md +77 -0
- package/docs/reference/skills.md +33 -0
- package/docs/reference/templates.md +29 -0
- package/docs/reference/tooling-cli.md +94 -0
- package/docs/truth-claims.yaml +222 -0
- package/expertise/PROGRESS.md +63 -0
- package/expertise/README.md +18 -0
- package/expertise/antipatterns/PROGRESS.md +56 -0
- package/expertise/antipatterns/backend/api-design-antipatterns.md +1271 -0
- package/expertise/antipatterns/backend/auth-antipatterns.md +1195 -0
- package/expertise/antipatterns/backend/caching-antipatterns.md +622 -0
- package/expertise/antipatterns/backend/database-antipatterns.md +1038 -0
- package/expertise/antipatterns/backend/index.md +24 -0
- package/expertise/antipatterns/backend/microservices-antipatterns.md +850 -0
- package/expertise/antipatterns/code/architecture-antipatterns.md +919 -0
- package/expertise/antipatterns/code/async-antipatterns.md +622 -0
- package/expertise/antipatterns/code/code-smells.md +1186 -0
- package/expertise/antipatterns/code/dependency-antipatterns.md +1209 -0
- package/expertise/antipatterns/code/error-handling-antipatterns.md +1360 -0
- package/expertise/antipatterns/code/index.md +27 -0
- package/expertise/antipatterns/code/naming-and-abstraction.md +1118 -0
- package/expertise/antipatterns/code/state-management-antipatterns.md +1076 -0
- package/expertise/antipatterns/code/testing-antipatterns.md +1053 -0
- package/expertise/antipatterns/design/accessibility-antipatterns.md +1136 -0
- package/expertise/antipatterns/design/dark-patterns.md +1121 -0
- package/expertise/antipatterns/design/index.md +22 -0
- package/expertise/antipatterns/design/ui-antipatterns.md +1202 -0
- package/expertise/antipatterns/design/ux-antipatterns.md +680 -0
- package/expertise/antipatterns/frontend/css-layout-antipatterns.md +691 -0
- package/expertise/antipatterns/frontend/flutter-antipatterns.md +1827 -0
- package/expertise/antipatterns/frontend/index.md +23 -0
- package/expertise/antipatterns/frontend/mobile-antipatterns.md +573 -0
- package/expertise/antipatterns/frontend/react-antipatterns.md +1128 -0
- package/expertise/antipatterns/frontend/spa-antipatterns.md +1235 -0
- package/expertise/antipatterns/index.md +31 -0
- package/expertise/antipatterns/performance/index.md +20 -0
- package/expertise/antipatterns/performance/performance-antipatterns.md +1013 -0
- package/expertise/antipatterns/performance/premature-optimization.md +623 -0
- package/expertise/antipatterns/performance/scaling-antipatterns.md +785 -0
- package/expertise/antipatterns/process/ai-coding-antipatterns.md +853 -0
- package/expertise/antipatterns/process/code-review-antipatterns.md +656 -0
- package/expertise/antipatterns/process/deployment-antipatterns.md +920 -0
- package/expertise/antipatterns/process/index.md +23 -0
- package/expertise/antipatterns/process/technical-debt-antipatterns.md +647 -0
- package/expertise/antipatterns/security/index.md +20 -0
- package/expertise/antipatterns/security/secrets-antipatterns.md +849 -0
- package/expertise/antipatterns/security/security-theater.md +843 -0
- package/expertise/antipatterns/security/vulnerability-patterns.md +801 -0
- package/expertise/architecture/PROGRESS.md +70 -0
- package/expertise/architecture/data/caching-architecture.md +671 -0
- package/expertise/architecture/data/data-consistency.md +574 -0
- package/expertise/architecture/data/data-modeling.md +536 -0
- package/expertise/architecture/data/event-streams-and-queues.md +634 -0
- package/expertise/architecture/data/index.md +25 -0
- package/expertise/architecture/data/search-architecture.md +663 -0
- package/expertise/architecture/data/sql-vs-nosql.md +708 -0
- package/expertise/architecture/decisions/architecture-decision-records.md +640 -0
- package/expertise/architecture/decisions/build-vs-buy.md +616 -0
- package/expertise/architecture/decisions/index.md +23 -0
- package/expertise/architecture/decisions/monolith-to-microservices.md +790 -0
- package/expertise/architecture/decisions/technology-selection.md +616 -0
- package/expertise/architecture/distributed/cap-theorem-and-tradeoffs.md +800 -0
- package/expertise/architecture/distributed/circuit-breaker-bulkhead.md +741 -0
- package/expertise/architecture/distributed/consensus-and-coordination.md +796 -0
- package/expertise/architecture/distributed/distributed-systems-fundamentals.md +564 -0
- package/expertise/architecture/distributed/idempotency-and-retry.md +796 -0
- package/expertise/architecture/distributed/index.md +25 -0
- package/expertise/architecture/distributed/saga-pattern.md +797 -0
- package/expertise/architecture/foundations/architectural-thinking.md +460 -0
- package/expertise/architecture/foundations/coupling-and-cohesion.md +770 -0
- package/expertise/architecture/foundations/design-principles-solid.md +649 -0
- package/expertise/architecture/foundations/domain-driven-design.md +719 -0
- package/expertise/architecture/foundations/index.md +25 -0
- package/expertise/architecture/foundations/separation-of-concerns.md +472 -0
- package/expertise/architecture/foundations/twelve-factor-app.md +797 -0
- package/expertise/architecture/index.md +34 -0
- package/expertise/architecture/integration/api-design-graphql.md +638 -0
- package/expertise/architecture/integration/api-design-grpc.md +804 -0
- package/expertise/architecture/integration/api-design-rest.md +892 -0
- package/expertise/architecture/integration/index.md +25 -0
- package/expertise/architecture/integration/third-party-integration.md +795 -0
- package/expertise/architecture/integration/webhooks-and-callbacks.md +1152 -0
- package/expertise/architecture/integration/websockets-realtime.md +791 -0
- package/expertise/architecture/mobile-architecture/index.md +22 -0
- package/expertise/architecture/mobile-architecture/mobile-app-architecture.md +780 -0
- package/expertise/architecture/mobile-architecture/mobile-backend-for-frontend.md +670 -0
- package/expertise/architecture/mobile-architecture/offline-first.md +719 -0
- package/expertise/architecture/mobile-architecture/push-and-sync.md +782 -0
- package/expertise/architecture/patterns/cqrs-event-sourcing.md +717 -0
- package/expertise/architecture/patterns/event-driven.md +797 -0
- package/expertise/architecture/patterns/hexagonal-clean-architecture.md +870 -0
- package/expertise/architecture/patterns/index.md +27 -0
- package/expertise/architecture/patterns/layered-architecture.md +736 -0
- package/expertise/architecture/patterns/microservices.md +753 -0
- package/expertise/architecture/patterns/modular-monolith.md +692 -0
- package/expertise/architecture/patterns/monolith.md +626 -0
- package/expertise/architecture/patterns/plugin-architecture.md +735 -0
- package/expertise/architecture/patterns/serverless.md +780 -0
- package/expertise/architecture/scaling/database-scaling.md +615 -0
- package/expertise/architecture/scaling/feature-flags-and-rollouts.md +757 -0
- package/expertise/architecture/scaling/horizontal-vs-vertical.md +606 -0
- package/expertise/architecture/scaling/index.md +24 -0
- package/expertise/architecture/scaling/multi-tenancy.md +800 -0
- package/expertise/architecture/scaling/stateless-design.md +787 -0
- package/expertise/backend/embedded-firmware.md +625 -0
- package/expertise/backend/go.md +853 -0
- package/expertise/backend/index.md +24 -0
- package/expertise/backend/java-spring.md +448 -0
- package/expertise/backend/node-typescript.md +625 -0
- package/expertise/backend/python-fastapi.md +724 -0
- package/expertise/backend/rust.md +458 -0
- package/expertise/backend/solidity.md +711 -0
- package/expertise/composition-map.yaml +443 -0
- package/expertise/content/foundations/content-modeling.md +395 -0
- package/expertise/content/foundations/editorial-standards.md +449 -0
- package/expertise/content/foundations/index.md +24 -0
- package/expertise/content/foundations/microcopy.md +455 -0
- package/expertise/content/foundations/terminology-governance.md +509 -0
- package/expertise/content/index.md +34 -0
- package/expertise/content/patterns/accessibility-copy.md +518 -0
- package/expertise/content/patterns/index.md +24 -0
- package/expertise/content/patterns/notification-content.md +433 -0
- package/expertise/content/patterns/sample-content.md +486 -0
- package/expertise/content/patterns/state-copy.md +439 -0
- package/expertise/design/PROGRESS.md +58 -0
- package/expertise/design/disciplines/dark-mode-theming.md +577 -0
- package/expertise/design/disciplines/design-systems.md +595 -0
- package/expertise/design/disciplines/index.md +25 -0
- package/expertise/design/disciplines/information-architecture.md +800 -0
- package/expertise/design/disciplines/interaction-design.md +788 -0
- package/expertise/design/disciplines/responsive-design.md +552 -0
- package/expertise/design/disciplines/usability-testing.md +516 -0
- package/expertise/design/disciplines/user-research.md +792 -0
- package/expertise/design/foundations/accessibility-design.md +796 -0
- package/expertise/design/foundations/color-theory.md +797 -0
- package/expertise/design/foundations/iconography.md +795 -0
- package/expertise/design/foundations/index.md +26 -0
- package/expertise/design/foundations/motion-and-animation.md +653 -0
- package/expertise/design/foundations/rtl-design.md +585 -0
- package/expertise/design/foundations/spacing-and-layout.md +607 -0
- package/expertise/design/foundations/typography.md +800 -0
- package/expertise/design/foundations/visual-hierarchy.md +761 -0
- package/expertise/design/index.md +32 -0
- package/expertise/design/patterns/authentication-flows.md +474 -0
- package/expertise/design/patterns/content-consumption.md +789 -0
- package/expertise/design/patterns/data-display.md +618 -0
- package/expertise/design/patterns/e-commerce.md +1494 -0
- package/expertise/design/patterns/feedback-and-states.md +642 -0
- package/expertise/design/patterns/forms-and-input.md +819 -0
- package/expertise/design/patterns/gamification.md +801 -0
- package/expertise/design/patterns/index.md +31 -0
- package/expertise/design/patterns/microinteractions.md +449 -0
- package/expertise/design/patterns/navigation.md +800 -0
- package/expertise/design/patterns/notifications.md +705 -0
- package/expertise/design/patterns/onboarding.md +700 -0
- package/expertise/design/patterns/search-and-filter.md +601 -0
- package/expertise/design/patterns/settings-and-preferences.md +768 -0
- package/expertise/design/patterns/social-and-community.md +748 -0
- package/expertise/design/platforms/desktop-native.md +612 -0
- package/expertise/design/platforms/index.md +25 -0
- package/expertise/design/platforms/mobile-android.md +825 -0
- package/expertise/design/platforms/mobile-cross-platform.md +983 -0
- package/expertise/design/platforms/mobile-ios.md +699 -0
- package/expertise/design/platforms/tablet.md +794 -0
- package/expertise/design/platforms/web-dashboard.md +790 -0
- package/expertise/design/platforms/web-responsive.md +550 -0
- package/expertise/design/psychology/behavioral-nudges.md +449 -0
- package/expertise/design/psychology/cognitive-load.md +1191 -0
- package/expertise/design/psychology/error-psychology.md +778 -0
- package/expertise/design/psychology/index.md +22 -0
- package/expertise/design/psychology/persuasive-design.md +736 -0
- package/expertise/design/psychology/user-mental-models.md +623 -0
- package/expertise/design/tooling/open-pencil.md +266 -0
- package/expertise/frontend/angular.md +1073 -0
- package/expertise/frontend/desktop-electron.md +546 -0
- package/expertise/frontend/flutter.md +782 -0
- package/expertise/frontend/index.md +27 -0
- package/expertise/frontend/native-android.md +409 -0
- package/expertise/frontend/native-ios.md +490 -0
- package/expertise/frontend/react-native.md +1160 -0
- package/expertise/frontend/react.md +808 -0
- package/expertise/frontend/vue.md +1089 -0
- package/expertise/humanize/domain-rules-code.md +79 -0
- package/expertise/humanize/domain-rules-content.md +67 -0
- package/expertise/humanize/domain-rules-technical-docs.md +56 -0
- package/expertise/humanize/index.md +35 -0
- package/expertise/humanize/self-audit-checklist.md +87 -0
- package/expertise/humanize/sentence-patterns.md +218 -0
- package/expertise/humanize/vocabulary-blacklist.md +105 -0
- package/expertise/i18n/PROGRESS.md +65 -0
- package/expertise/i18n/advanced/accessibility-and-i18n.md +28 -0
- package/expertise/i18n/advanced/bidirectional-text-algorithm.md +38 -0
- package/expertise/i18n/advanced/complex-scripts.md +30 -0
- package/expertise/i18n/advanced/performance-and-i18n.md +27 -0
- package/expertise/i18n/advanced/testing-i18n.md +28 -0
- package/expertise/i18n/content/content-adaptation.md +23 -0
- package/expertise/i18n/content/locale-specific-formatting.md +23 -0
- package/expertise/i18n/content/machine-translation-integration.md +28 -0
- package/expertise/i18n/content/translation-management.md +29 -0
- package/expertise/i18n/foundations/date-time-calendars.md +67 -0
- package/expertise/i18n/foundations/i18n-architecture.md +272 -0
- package/expertise/i18n/foundations/locale-and-language-tags.md +79 -0
- package/expertise/i18n/foundations/numbers-currency-units.md +61 -0
- package/expertise/i18n/foundations/pluralization-and-gender.md +109 -0
- package/expertise/i18n/foundations/string-externalization.md +236 -0
- package/expertise/i18n/foundations/text-direction-bidi.md +241 -0
- package/expertise/i18n/foundations/unicode-and-encoding.md +86 -0
- package/expertise/i18n/index.md +38 -0
- package/expertise/i18n/platform/backend-i18n.md +31 -0
- package/expertise/i18n/platform/flutter-i18n.md +148 -0
- package/expertise/i18n/platform/native-android-i18n.md +36 -0
- package/expertise/i18n/platform/native-ios-i18n.md +36 -0
- package/expertise/i18n/platform/react-i18n.md +103 -0
- package/expertise/i18n/platform/web-css-i18n.md +81 -0
- package/expertise/i18n/rtl/arabic-specific.md +175 -0
- package/expertise/i18n/rtl/hebrew-specific.md +149 -0
- package/expertise/i18n/rtl/rtl-animations-and-transitions.md +111 -0
- package/expertise/i18n/rtl/rtl-forms-and-input.md +161 -0
- package/expertise/i18n/rtl/rtl-fundamentals.md +211 -0
- package/expertise/i18n/rtl/rtl-icons-and-images.md +181 -0
- package/expertise/i18n/rtl/rtl-layout-mirroring.md +252 -0
- package/expertise/i18n/rtl/rtl-navigation-and-gestures.md +107 -0
- package/expertise/i18n/rtl/rtl-testing-and-qa.md +147 -0
- package/expertise/i18n/rtl/rtl-typography.md +160 -0
- package/expertise/index.md +113 -0
- package/expertise/index.yaml +216 -0
- package/expertise/infrastructure/cloud-aws.md +597 -0
- package/expertise/infrastructure/cloud-gcp.md +599 -0
- package/expertise/infrastructure/cybersecurity.md +816 -0
- package/expertise/infrastructure/database-mongodb.md +447 -0
- package/expertise/infrastructure/database-postgres.md +400 -0
- package/expertise/infrastructure/devops-cicd.md +787 -0
- package/expertise/infrastructure/index.md +27 -0
- package/expertise/performance/PROGRESS.md +50 -0
- package/expertise/performance/backend/api-latency.md +1204 -0
- package/expertise/performance/backend/background-jobs.md +506 -0
- package/expertise/performance/backend/connection-pooling.md +1209 -0
- package/expertise/performance/backend/database-query-optimization.md +515 -0
- package/expertise/performance/backend/index.md +23 -0
- package/expertise/performance/backend/rate-limiting-and-throttling.md +971 -0
- package/expertise/performance/foundations/algorithmic-complexity.md +954 -0
- package/expertise/performance/foundations/caching-strategies.md +489 -0
- package/expertise/performance/foundations/concurrency-and-parallelism.md +847 -0
- package/expertise/performance/foundations/index.md +24 -0
- package/expertise/performance/foundations/measuring-and-profiling.md +440 -0
- package/expertise/performance/foundations/memory-management.md +964 -0
- package/expertise/performance/foundations/performance-budgets.md +1314 -0
- package/expertise/performance/index.md +31 -0
- package/expertise/performance/infrastructure/auto-scaling.md +1059 -0
- package/expertise/performance/infrastructure/cdn-and-edge.md +1081 -0
- package/expertise/performance/infrastructure/index.md +22 -0
- package/expertise/performance/infrastructure/load-balancing.md +1081 -0
- package/expertise/performance/infrastructure/observability.md +1079 -0
- package/expertise/performance/mobile/index.md +23 -0
- package/expertise/performance/mobile/mobile-animations.md +544 -0
- package/expertise/performance/mobile/mobile-memory-battery.md +416 -0
- package/expertise/performance/mobile/mobile-network.md +452 -0
- package/expertise/performance/mobile/mobile-rendering.md +599 -0
- package/expertise/performance/mobile/mobile-startup-time.md +505 -0
- package/expertise/performance/platform-specific/flutter-performance.md +647 -0
- package/expertise/performance/platform-specific/index.md +22 -0
- package/expertise/performance/platform-specific/node-performance.md +1307 -0
- package/expertise/performance/platform-specific/postgres-performance.md +1366 -0
- package/expertise/performance/platform-specific/react-performance.md +1403 -0
- package/expertise/performance/web/bundle-optimization.md +1239 -0
- package/expertise/performance/web/image-and-media.md +636 -0
- package/expertise/performance/web/index.md +24 -0
- package/expertise/performance/web/network-optimization.md +1133 -0
- package/expertise/performance/web/rendering-performance.md +1098 -0
- package/expertise/performance/web/ssr-and-hydration.md +918 -0
- package/expertise/performance/web/web-vitals.md +1374 -0
- package/expertise/quality/accessibility.md +985 -0
- package/expertise/quality/evidence-based-verification.md +499 -0
- package/expertise/quality/index.md +24 -0
- package/expertise/quality/ml-model-audit.md +614 -0
- package/expertise/quality/performance.md +600 -0
- package/expertise/quality/testing-api.md +891 -0
- package/expertise/quality/testing-mobile.md +496 -0
- package/expertise/quality/testing-web.md +849 -0
- package/expertise/security/PROGRESS.md +54 -0
- package/expertise/security/agentic-identity.md +540 -0
- package/expertise/security/compliance-frameworks.md +601 -0
- package/expertise/security/data/data-encryption.md +364 -0
- package/expertise/security/data/data-privacy-gdpr.md +692 -0
- package/expertise/security/data/database-security.md +1171 -0
- package/expertise/security/data/index.md +22 -0
- package/expertise/security/data/pii-handling.md +531 -0
- package/expertise/security/foundations/authentication.md +1041 -0
- package/expertise/security/foundations/authorization.md +603 -0
- package/expertise/security/foundations/cryptography.md +1001 -0
- package/expertise/security/foundations/index.md +25 -0
- package/expertise/security/foundations/owasp-top-10.md +1354 -0
- package/expertise/security/foundations/secrets-management.md +1217 -0
- package/expertise/security/foundations/secure-sdlc.md +700 -0
- package/expertise/security/foundations/supply-chain-security.md +698 -0
- package/expertise/security/index.md +31 -0
- package/expertise/security/infrastructure/cloud-security-aws.md +1296 -0
- package/expertise/security/infrastructure/cloud-security-gcp.md +1376 -0
- package/expertise/security/infrastructure/container-security.md +721 -0
- package/expertise/security/infrastructure/incident-response.md +1295 -0
- package/expertise/security/infrastructure/index.md +24 -0
- package/expertise/security/infrastructure/logging-and-monitoring.md +1618 -0
- package/expertise/security/infrastructure/network-security.md +1337 -0
- package/expertise/security/mobile/index.md +23 -0
- package/expertise/security/mobile/mobile-android-security.md +1218 -0
- package/expertise/security/mobile/mobile-binary-protection.md +1229 -0
- package/expertise/security/mobile/mobile-data-storage.md +1265 -0
- package/expertise/security/mobile/mobile-ios-security.md +1401 -0
- package/expertise/security/mobile/mobile-network-security.md +1520 -0
- package/expertise/security/smart-contract-security.md +594 -0
- package/expertise/security/testing/index.md +22 -0
- package/expertise/security/testing/penetration-testing.md +1258 -0
- package/expertise/security/testing/security-code-review.md +1765 -0
- package/expertise/security/testing/threat-modeling.md +1074 -0
- package/expertise/security/testing/vulnerability-scanning.md +1062 -0
- package/expertise/security/web/api-security.md +586 -0
- package/expertise/security/web/cors-and-headers.md +433 -0
- package/expertise/security/web/csrf.md +562 -0
- package/expertise/security/web/file-upload.md +1477 -0
- package/expertise/security/web/index.md +25 -0
- package/expertise/security/web/injection.md +1375 -0
- package/expertise/security/web/session-management.md +1101 -0
- package/expertise/security/web/xss.md +1158 -0
- package/exports/README.md +17 -0
- package/exports/hosts/claude/.claude/agents/clarifier.md +42 -0
- package/exports/hosts/claude/.claude/agents/content-author.md +63 -0
- package/exports/hosts/claude/.claude/agents/designer.md +55 -0
- package/exports/hosts/claude/.claude/agents/executor.md +55 -0
- package/exports/hosts/claude/.claude/agents/learner.md +51 -0
- package/exports/hosts/claude/.claude/agents/planner.md +53 -0
- package/exports/hosts/claude/.claude/agents/researcher.md +43 -0
- package/exports/hosts/claude/.claude/agents/reviewer.md +54 -0
- package/exports/hosts/claude/.claude/agents/specifier.md +47 -0
- package/exports/hosts/claude/.claude/agents/verifier.md +71 -0
- package/exports/hosts/claude/.claude/commands/author.md +42 -0
- package/exports/hosts/claude/.claude/commands/clarify.md +38 -0
- package/exports/hosts/claude/.claude/commands/design-review.md +46 -0
- package/exports/hosts/claude/.claude/commands/design.md +44 -0
- package/exports/hosts/claude/.claude/commands/discover.md +37 -0
- package/exports/hosts/claude/.claude/commands/execute.md +48 -0
- package/exports/hosts/claude/.claude/commands/learn.md +38 -0
- package/exports/hosts/claude/.claude/commands/plan-review.md +42 -0
- package/exports/hosts/claude/.claude/commands/plan.md +39 -0
- package/exports/hosts/claude/.claude/commands/prepare-next.md +37 -0
- package/exports/hosts/claude/.claude/commands/review.md +40 -0
- package/exports/hosts/claude/.claude/commands/run-audit.md +41 -0
- package/exports/hosts/claude/.claude/commands/spec-challenge.md +41 -0
- package/exports/hosts/claude/.claude/commands/specify.md +38 -0
- package/exports/hosts/claude/.claude/commands/verify.md +37 -0
- package/exports/hosts/claude/.claude/settings.json +34 -0
- package/exports/hosts/claude/CLAUDE.md +19 -0
- package/exports/hosts/claude/export.manifest.json +38 -0
- package/exports/hosts/claude/host-package.json +67 -0
- package/exports/hosts/codex/AGENTS.md +19 -0
- package/exports/hosts/codex/export.manifest.json +38 -0
- package/exports/hosts/codex/host-package.json +41 -0
- package/exports/hosts/cursor/.cursor/hooks.json +16 -0
- package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +19 -0
- package/exports/hosts/cursor/export.manifest.json +38 -0
- package/exports/hosts/cursor/host-package.json +42 -0
- package/exports/hosts/gemini/GEMINI.md +19 -0
- package/exports/hosts/gemini/export.manifest.json +38 -0
- package/exports/hosts/gemini/host-package.json +41 -0
- package/hooks/README.md +18 -0
- package/hooks/definitions/loop_cap_guard.yaml +21 -0
- package/hooks/definitions/post_tool_capture.yaml +24 -0
- package/hooks/definitions/pre_compact_summary.yaml +19 -0
- package/hooks/definitions/pre_tool_capture_route.yaml +19 -0
- package/hooks/definitions/protected_path_write_guard.yaml +19 -0
- package/hooks/definitions/session_start.yaml +19 -0
- package/hooks/definitions/stop_handoff_harvest.yaml +20 -0
- package/hooks/loop-cap-guard +17 -0
- package/hooks/post-tool-lint +36 -0
- package/hooks/protected-path-write-guard +17 -0
- package/hooks/session-start +41 -0
- package/llms-full.txt +2355 -0
- package/llms.txt +43 -0
- package/package.json +79 -0
- package/roles/README.md +20 -0
- package/roles/clarifier.md +42 -0
- package/roles/content-author.md +63 -0
- package/roles/designer.md +55 -0
- package/roles/executor.md +55 -0
- package/roles/learner.md +51 -0
- package/roles/planner.md +53 -0
- package/roles/researcher.md +43 -0
- package/roles/reviewer.md +54 -0
- package/roles/specifier.md +47 -0
- package/roles/verifier.md +71 -0
- package/schemas/README.md +24 -0
- package/schemas/accepted-learning.schema.json +20 -0
- package/schemas/author-artifact.schema.json +156 -0
- package/schemas/clarification.schema.json +19 -0
- package/schemas/design-artifact.schema.json +80 -0
- package/schemas/docs-claim.schema.json +18 -0
- package/schemas/export-manifest.schema.json +20 -0
- package/schemas/hook.schema.json +67 -0
- package/schemas/host-export-package.schema.json +18 -0
- package/schemas/implementation-plan.schema.json +19 -0
- package/schemas/proposed-learning.schema.json +19 -0
- package/schemas/research.schema.json +18 -0
- package/schemas/review.schema.json +29 -0
- package/schemas/run-manifest.schema.json +18 -0
- package/schemas/spec-challenge.schema.json +18 -0
- package/schemas/spec.schema.json +20 -0
- package/schemas/usage.schema.json +102 -0
- package/schemas/verification-proof.schema.json +29 -0
- package/schemas/wazir-manifest.schema.json +173 -0
- package/skills/README.md +40 -0
- package/skills/brainstorming/SKILL.md +77 -0
- package/skills/debugging/SKILL.md +50 -0
- package/skills/design/SKILL.md +61 -0
- package/skills/dispatching-parallel-agents/SKILL.md +128 -0
- package/skills/executing-plans/SKILL.md +70 -0
- package/skills/finishing-a-development-branch/SKILL.md +169 -0
- package/skills/humanize/SKILL.md +123 -0
- package/skills/init-pipeline/SKILL.md +124 -0
- package/skills/prepare-next/SKILL.md +20 -0
- package/skills/receiving-code-review/SKILL.md +123 -0
- package/skills/requesting-code-review/SKILL.md +105 -0
- package/skills/requesting-code-review/code-reviewer.md +108 -0
- package/skills/run-audit/SKILL.md +197 -0
- package/skills/scan-project/SKILL.md +41 -0
- package/skills/self-audit/SKILL.md +153 -0
- package/skills/subagent-driven-development/SKILL.md +154 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
- package/skills/subagent-driven-development/implementer-prompt.md +102 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
- package/skills/tdd/SKILL.md +23 -0
- package/skills/using-git-worktrees/SKILL.md +163 -0
- package/skills/using-skills/SKILL.md +95 -0
- package/skills/verification/SKILL.md +22 -0
- package/skills/wazir/SKILL.md +463 -0
- package/skills/writing-plans/SKILL.md +30 -0
- package/skills/writing-skills/SKILL.md +157 -0
- package/skills/writing-skills/anthropic-best-practices.md +122 -0
- package/skills/writing-skills/persuasion-principles.md +50 -0
- package/templates/README.md +20 -0
- package/templates/artifacts/README.md +10 -0
- package/templates/artifacts/accepted-learning.md +19 -0
- package/templates/artifacts/accepted-learning.template.json +12 -0
- package/templates/artifacts/author.md +74 -0
- package/templates/artifacts/author.template.json +19 -0
- package/templates/artifacts/clarification.md +21 -0
- package/templates/artifacts/clarification.template.json +12 -0
- package/templates/artifacts/execute-notes.md +19 -0
- package/templates/artifacts/implementation-plan.md +21 -0
- package/templates/artifacts/implementation-plan.template.json +11 -0
- package/templates/artifacts/learning-proposal.md +19 -0
- package/templates/artifacts/next-run-handoff.md +21 -0
- package/templates/artifacts/plan-review.md +19 -0
- package/templates/artifacts/proposed-learning.template.json +12 -0
- package/templates/artifacts/research.md +21 -0
- package/templates/artifacts/research.template.json +12 -0
- package/templates/artifacts/review-findings.md +19 -0
- package/templates/artifacts/review.template.json +11 -0
- package/templates/artifacts/run-manifest.template.json +8 -0
- package/templates/artifacts/spec-challenge.md +19 -0
- package/templates/artifacts/spec-challenge.template.json +11 -0
- package/templates/artifacts/spec.md +21 -0
- package/templates/artifacts/spec.template.json +12 -0
- package/templates/artifacts/verification-proof.md +19 -0
- package/templates/artifacts/verification-proof.template.json +11 -0
- package/templates/examples/accepted-learning.example.json +14 -0
- package/templates/examples/author.example.json +152 -0
- package/templates/examples/clarification.example.json +15 -0
- package/templates/examples/docs-claim.example.json +8 -0
- package/templates/examples/export-manifest.example.json +7 -0
- package/templates/examples/host-export-package.example.json +11 -0
- package/templates/examples/implementation-plan.example.json +17 -0
- package/templates/examples/proposed-learning.example.json +13 -0
- package/templates/examples/research.example.json +15 -0
- package/templates/examples/research.example.md +6 -0
- package/templates/examples/review.example.json +17 -0
- package/templates/examples/run-manifest.example.json +9 -0
- package/templates/examples/spec-challenge.example.json +14 -0
- package/templates/examples/spec.example.json +21 -0
- package/templates/examples/verification-proof.example.json +21 -0
- package/templates/examples/wazir-manifest.example.yaml +65 -0
- package/templates/task-definition-schema.md +99 -0
- package/tooling/README.md +20 -0
- package/tooling/src/adapters/context-mode.js +50 -0
- package/tooling/src/capture/command.js +376 -0
- package/tooling/src/capture/store.js +99 -0
- package/tooling/src/capture/usage.js +270 -0
- package/tooling/src/checks/branches.js +50 -0
- package/tooling/src/checks/brand-truth.js +110 -0
- package/tooling/src/checks/changelog.js +231 -0
- package/tooling/src/checks/command-registry.js +36 -0
- package/tooling/src/checks/commits.js +102 -0
- package/tooling/src/checks/docs-drift.js +103 -0
- package/tooling/src/checks/docs-truth.js +201 -0
- package/tooling/src/checks/runtime-surface.js +156 -0
- package/tooling/src/cli.js +116 -0
- package/tooling/src/command-options.js +56 -0
- package/tooling/src/commands/validate.js +320 -0
- package/tooling/src/doctor/command.js +91 -0
- package/tooling/src/export/command.js +77 -0
- package/tooling/src/export/compiler.js +498 -0
- package/tooling/src/guards/loop-cap-guard.js +52 -0
- package/tooling/src/guards/protected-path-write-guard.js +67 -0
- package/tooling/src/index/command.js +152 -0
- package/tooling/src/index/storage.js +1061 -0
- package/tooling/src/index/summarizers.js +261 -0
- package/tooling/src/loaders.js +18 -0
- package/tooling/src/project-root.js +22 -0
- package/tooling/src/recall/command.js +225 -0
- package/tooling/src/schema-validator.js +30 -0
- package/tooling/src/state-root.js +40 -0
- package/tooling/src/status/command.js +71 -0
- package/wazir.manifest.yaml +135 -0
- package/workflows/README.md +19 -0
- package/workflows/author.md +42 -0
- package/workflows/clarify.md +38 -0
- package/workflows/design-review.md +46 -0
- package/workflows/design.md +44 -0
- package/workflows/discover.md +37 -0
- package/workflows/execute.md +48 -0
- package/workflows/learn.md +38 -0
- package/workflows/plan-review.md +42 -0
- package/workflows/plan.md +39 -0
- package/workflows/prepare-next.md +37 -0
- package/workflows/review.md +40 -0
- package/workflows/run-audit.md +41 -0
- package/workflows/spec-challenge.md +41 -0
- package/workflows/specify.md +38 -0
- package/workflows/verify.md +37 -0
@@ -0,0 +1,1152 @@
# Webhooks and Callbacks — Architecture Expertise Module

> **Purpose:** Reference for AI agents during planning and implementation of webhook-based integrations — both sending and receiving.
> **Last updated:** 2026-03-08
> **Sources:** Stripe/GitHub/Twilio/Shopify webhook docs, webhooks.fyi, Svix, Hookdeck, OWASP, real-world production incidents 2022-2025

---

## 1. What This Is

### 1.1 Core Concept

Webhooks are HTTP callbacks — automated POST requests sent from one system (the **provider**) to another system (the **consumer**) when an event occurs. The consumer registers a URL endpoint in advance, and the provider pushes event data to that URL whenever relevant state changes happen.

The term "webhook" was coined by Jeff Lindsay in 2007. Today, webhooks are the dominant integration pattern for SaaS APIs. Stripe, GitHub, Twilio, Shopify, Slack, PayPal, and virtually every major platform uses webhooks to notify external systems of events.

### 1.2 What Webhooks Are NOT

**Webhooks are not WebSockets.** WebSockets establish a persistent, bidirectional connection between client and server. Webhooks are one-directional, fire-and-forget HTTP requests triggered by events. WebSockets maintain connection state; webhooks are stateless individual requests.

**Webhooks are not message queues.** Message queues (RabbitMQ, SQS, Kafka) provide durable delivery, backpressure, and consumer acknowledgment, and can be configured for ordered and effectively exactly-once processing. Webhooks are simpler but offer weaker delivery guarantees, typically at-least-once with provider-managed retries. You do not control the queue; the provider does.

**Webhooks are not Server-Sent Events (SSE).** SSE maintains a long-lived HTTP connection where the server streams events to the client. Webhooks are discrete HTTP requests to a URL you control. SSE requires the client to maintain a connection; webhooks require the consumer to expose a public endpoint.

### 1.3 Webhook vs Polling vs SSE vs WebSocket

| Dimension | Webhooks | Polling | SSE | WebSocket |
|-----------|----------|---------|-----|-----------|
| **Direction** | Server-to-server push | Client pulls from server | Server pushes to client | Bidirectional |
| **Connection** | No persistent connection | No persistent connection | Long-lived HTTP | Long-lived TCP |
| **Latency** | Near real-time (seconds) | Interval-dependent (minutes) | Real-time (ms) | Real-time (ms) |
| **Efficiency** | High — only fires on events | Low — Zapier reports only 1.5% of polls find updates | High | High |
| **Reliability** | Provider-managed retries | Consumer controls retry | Connection drops lose events | Connection drops lose events |
| **Complexity** | Moderate (endpoint + security) | Low (simple HTTP GET loop) | Low-moderate | High (connection management) |
| **Firewall** | Requires inbound access | Outbound only | Outbound only | Outbound only |
| **Use case** | Service-to-service events | Legacy APIs, no webhook support | Live dashboards, feeds | Chat, gaming, collaboration |

### 1.4 The Fundamental Tradeoff

Polling offers **control and universality** at the cost of freshness and efficiency. Webhooks offer **real-time behavior and efficiency** but introduce reliability concerns and require the receiver to expose a publicly accessible endpoint. In practice, mature systems use webhooks as the primary mechanism with polling as a reconciliation fallback.

---

## 2. When to Use It

### 2.1 Strong Fit

- **External service integration.** Receiving payment confirmations from Stripe, push events from GitHub, SMS delivery receipts from Twilio. These providers already have webhook infrastructure — use it.
- **Payment and billing notifications.** `payment_intent.succeeded`, `invoice.paid`, `subscription.canceled`. Financial events that must trigger downstream actions (fulfill order, update entitlements, send receipt).
- **CI/CD pipeline triggers.** GitHub/GitLab webhooks trigger builds on push, PR creation, tag creation. Jenkins, CircleCI, and ArgoCD all consume repository webhooks.
- **SaaS platform events.** Shopify order created, Slack message posted, Jira issue updated, Zendesk ticket resolved. Any multi-tenant platform where consumers need to react to state changes.
- **User-facing notification triggers.** New comment, review approved, deployment completed — events that should trigger emails, Slack messages, or in-app notifications.
- **Building a developer platform.** If you are building an API that third parties integrate with, offering webhooks is table stakes. Developers expect push-based event notification.

### 2.2 Complementary Pattern: Webhooks + Polling

The most resilient integration pattern combines both:

1. **Webhooks** handle the happy path — instant notification on state change.
2. **Periodic polling** serves as a reconciliation sweep — catches any events missed due to webhook delivery failure, endpoint downtime, or provider outage.
3. **Idempotent processing** on the receiver ensures duplicate events (from both channels) are handled safely.
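
The three steps above can be sketched as follows. This is a minimal in-memory illustration, not a production design: `fetch_events_since` is a hypothetical callable standing in for a provider's list-events API, and the `processed_ids` set stands in for a durable deduplication store.

```python
from typing import Callable, Iterable

processed_ids: set[str] = set()   # stand-in for a durable dedup store
applied: list[str] = []           # records which events were applied, in order


def process_once(event: dict) -> bool:
    """Apply an event unless its ID was already seen. Returns True if applied."""
    if event["id"] in processed_ids:
        return False
    processed_ids.add(event["id"])
    applied.append(event["id"])
    return True


def on_webhook(event: dict) -> None:
    """Happy path: the provider pushed the event to us."""
    process_once(event)


def reconcile(fetch_events_since: Callable[[int], Iterable[dict]], since: int) -> int:
    """Sweep: pull recent events and apply only those the webhook path missed."""
    return sum(process_once(e) for e in fetch_events_since(since))
```

Because both channels funnel into `process_once`, an event that arrives by webhook and again through the sweep is applied only once.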

Stripe explicitly recommends this pattern: "We recommend listening to webhooks for getting updates, along with periodically polling for any missed events."

---

## 3. When NOT to Use It

This section is intentionally long. Webhook misuse causes more production incidents than webhook bugs.

### 3.1 Internal Service Communication

**Use message queues instead.** If both the sender and receiver are services you own and deploy, a message broker (RabbitMQ, SQS, Kafka, NATS) provides stronger guarantees:

- **Durable delivery** — messages persist until acknowledged, surviving consumer restarts.
- **Backpressure** — consumers pull at their own pace; no thundering herd risk.
- **Exactly-once semantics** — achievable with transactional consumers (Kafka exactly-once, SQS FIFO deduplication).
- **Ordering guarantees** — partition-level ordering in Kafka, FIFO queues in SQS.

Webhooks over HTTP between internal services add network overhead, TLS handshake latency, and require you to build retry/DLQ infrastructure that message brokers provide natively.

### 3.2 When Delivery Guarantees Are Critical Without Retry Infrastructure

Webhooks provide **at-least-once** delivery at best — and only if the provider implements retries. If your business logic requires guaranteed delivery and you have no retry infrastructure (no DLQ, no reconciliation polling, no event store), webhooks will lose events. Common causes:

- Receiver returns 200 but crashes before processing
- Receiver is down during all retry attempts (provider gives up after 24-72 hours)
- Network partition between provider and receiver
- Provider has a bug in their retry logic

**Real example:** WooCommerce webhooks fire once and do not retry on failure. If your endpoint is down when an order is placed, you never receive that event. No DLQ, no replay, no reconciliation.

### 3.3 When the Receiver Cannot Handle Burst Traffic

Webhook providers do not respect your capacity. A bulk import on Shopify can trigger thousands of `product/update` webhooks in seconds. A GitHub organization rename can fire webhooks for every repository simultaneously. Stripe processes billions of events monthly — a merchant with high transaction volume can receive thousands of events per minute during peak.

If your receiver is a single-instance application without a queue buffer, webhook storms will overwhelm it — causing timeouts, which trigger retries, which create more load (the retry amplification problem).

### 3.4 Firewall and NAT Traversal

Webhooks require the receiver to expose a publicly accessible HTTP endpoint. This is a non-starter in:

- Corporate networks behind strict firewalls with no inbound rules
- On-premise deployments without public IP addresses
- IoT devices behind NAT without port forwarding
- Development environments (without tools like ngrok or Hookdeck)

In these scenarios, polling or a managed webhook proxy (Hookdeck, Svix) is required.

### 3.5 When Ordering Matters

Webhooks provide **no ordering guarantees**. A `payment_intent.succeeded` event might arrive before `payment_intent.created` due to network timing, provider processing order, or retry scheduling. If your processing logic assumes events arrive in order, webhooks will break it.

Solutions exist (timestamp-based ordering, event sequence numbers) but they add complexity. If strict ordering is a hard requirement, a message queue with partition-level ordering (Kafka) or FIFO semantics (SQS FIFO) is more appropriate.

### 3.6 High-Security Environments

Webhooks require opening inbound HTTP endpoints, which increases attack surface. In environments with strict security postures (PCI DSS scope, HIPAA, FedRAMP), every inbound endpoint must be hardened, monitored, and audited. Some compliance frameworks make this prohibitively expensive. Polling (outbound-only) is simpler to secure.

---

## 4. How It Works

### 4.1 Webhook Registration

The consumer registers a webhook subscription with the provider, specifying:

```
POST /api/webhooks
{
  "url": "https://myapp.com/webhooks/stripe",
  "events": ["payment_intent.succeeded", "invoice.paid"],
  "secret": "whsec_..."  // or provider generates and returns this
}
```

**Registration models vary by provider:**

| Provider | Registration Method | Event Filtering | Secret Management |
|----------|-------------------|-----------------|-------------------|
| Stripe | Dashboard + API | Per-endpoint event filter | Provider generates signing secret |
| GitHub | Repository/Org settings + API | Per-hook event selection | User provides secret |
| Twilio | Per-resource URL configuration | Implicit (resource-level) | Account auth token used for signing |
| Shopify | Partner Dashboard + API | Topic-based subscription | Provider generates HMAC secret |
| Slack | App configuration | Event subscriptions | Signing secret in app credentials |

### 4.2 Event Payload Design

A well-designed webhook payload includes:

```json
{
  "id": "evt_1NqZ5bABCDEF",
  "type": "payment_intent.succeeded",
  "created": 1695312000,
  "api_version": "2023-10-16",
  "data": {
    "object": {
      "id": "pi_3NqZ5bABCDEF",
      "amount": 2000,
      "currency": "usd",
      "status": "succeeded",
      "customer": "cus_ABCDEF"
    },
    "previous_attributes": {
      "status": "processing"
    }
  },
  "livemode": true
}
```

**Key design decisions:**

| Decision | Recommended Approach | Rationale |
|----------|---------------------|-----------|
| **Fat vs thin payload** | Fat (include full object state) | Reduces need for API callbacks; thin payloads force receiver to fetch data, adding latency and coupling |
| **Event ID** | Globally unique, provider-generated | Enables idempotency on receiver side |
| **Timestamp** | Unix epoch (integer) | Unambiguous, timezone-free, enables ordering |
| **API version** | Include in payload | Receiver can handle schema evolution |
| **Previous state** | Include changed attributes | Enables delta processing without storing previous state |
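
On the receiving side, an envelope shaped like the one above can be parsed into a typed structure. The sketch below is one possible approach: the `WebhookEvent` class and `parse_event` function are illustrative names, not part of any provider SDK, and the field names follow the sample payload in this section.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class WebhookEvent:
    id: str
    type: str
    created: datetime
    api_version: str
    obj: dict        # full object state ("fat" payload)
    previous: dict   # attributes as they were before the change
    livemode: bool

    def changes(self) -> dict:
        """Map each changed attribute to an (old, new) pair."""
        return {key: (old, self.obj.get(key)) for key, old in self.previous.items()}


def parse_event(raw: bytes) -> WebhookEvent:
    """Build a WebhookEvent from the raw JSON body."""
    body = json.loads(raw)
    data = body["data"]
    return WebhookEvent(
        id=body["id"],
        type=body["type"],
        # Unix epoch seconds are converted to an aware UTC datetime
        created=datetime.fromtimestamp(body["created"], tz=timezone.utc),
        api_version=body.get("api_version", ""),
        obj=data["object"],
        previous=data.get("previous_attributes", {}),
        livemode=body.get("livemode", False),
    )
```

With the sample payload, `changes()` reports that `status` moved from `"processing"` to `"succeeded"`, which is the delta processing that the previous-state field enables.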

### 4.3 HMAC Signature Verification

HMAC is the dominant webhook authentication method — used by 65% of webhook providers (per webhooks.fyi). The pattern:

**Provider side (sender):**

```python
import hmac
import hashlib
import time

def sign_webhook(payload: bytes, secret: str) -> dict:
    timestamp = str(int(time.time()))
    # Sign the raw bytes directly; decoding first would fail on non-UTF-8 bodies
    signed_content = timestamp.encode() + b"." + payload
    signature = hmac.new(
        secret.encode(),
        signed_content,
        hashlib.sha256
    ).hexdigest()

    return {
        "X-Webhook-Timestamp": timestamp,
        "X-Webhook-Signature": f"sha256={signature}"
    }
```

**Consumer side (receiver):**

```python
import hmac
import hashlib
import time

TOLERANCE_SECONDS = 300  # 5 minutes

def verify_webhook(payload: bytes, headers: dict, secret: str) -> bool:
    timestamp = headers.get("X-Webhook-Timestamp")
    signature = headers.get("X-Webhook-Signature")

    if not timestamp or not signature:
        return False

    # Reject stale or malformed timestamps to prevent replay attacks
    try:
        age = abs(time.time() - int(timestamp))
    except ValueError:
        return False
    if age > TOLERANCE_SECONDS:
        return False

    # Recompute signature over the raw bytes
    signed_content = timestamp.encode() + b"." + payload
    expected = hmac.new(
        secret.encode(),
        signed_content,
        hashlib.sha256
    ).hexdigest()

    # Timing-safe comparison to prevent timing attacks
    # (compare as bytes so a non-ASCII header value cannot raise TypeError)
    return hmac.compare_digest(f"sha256={expected}".encode(), signature.encode())
```

**Critical implementation details:**

1. **Use the raw request body** for signature computation. Do not parse to JSON and re-serialize — re-serialization can change field order, whitespace, or unicode escaping, breaking the signature.
2. **Use timing-safe comparison** (`hmac.compare_digest` in Python, `crypto.timingSafeEqual` in Node.js). Standard string equality (`==`) is vulnerable to timing attacks that can leak the signature character by character.
3. **Include a timestamp** in the signed content and reject old timestamps (typically 5 minutes). This prevents replay attacks where an attacker captures a valid webhook and resends it later.
4. **Store the secret securely** — environment variables or a secrets manager (AWS Secrets Manager, HashiCorp Vault). Never commit secrets to source control.
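
The first point is easy to get wrong, so here is a small demonstration of why the raw body matters. It is a simplified sketch: the secret and payload are made-up values, and the timestamp prefix is omitted to keep the focus on serialization.

```python
import hashlib
import hmac
import json

secret = b"whsec_example"                    # hypothetical secret for illustration
raw_body = b'{"id":"evt_1","amount":2000}'   # compact JSON, exactly as sent


def sign(body: bytes) -> str:
    """HMAC-SHA256 over the given bytes, hex encoded."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


provider_signature = sign(raw_body)

# Correct: verify against the exact bytes received.
raw_ok = hmac.compare_digest(sign(raw_body), provider_signature)

# Incorrect: parse and re-serialize first. json.dumps inserts spaces after
# ":" and "," by default, so the bytes differ even though the data is equal.
reserialized = json.dumps(json.loads(raw_body)).encode()
reserialized_ok = hmac.compare_digest(sign(reserialized), provider_signature)
```

Here `raw_ok` is true and `reserialized_ok` is false, although both byte strings decode to the same JSON object. Many web frameworks parse the body eagerly, so the handler usually has to ask for the unparsed bytes explicitly.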

### 4.4 Retry with Exponential Backoff

When a webhook delivery fails (non-2xx response or timeout), providers retry with exponential backoff:

```
Attempt 1: Immediate
Attempt 2: ~5 seconds
Attempt 3: ~25 seconds
Attempt 4: ~2 minutes
Attempt 5: ~10 minutes
Attempt 6: ~1 hour
Attempt 7: ~6 hours
Attempt 8: ~24 hours (final attempt)
```

**Real-world provider retry policies:**

| Provider | Max Retries | Total Window | Backoff Strategy | Give-Up Behavior |
|----------|-------------|--------------|------------------|------------------|
| Stripe | No fixed count; retries continue for up to 3 days (live mode) | 72 hours | Exponential | Disables endpoint after sustained failures |
| GitHub | None (no automatic redelivery) | N/A (10-second response timeout) | None | Delivery marked failed; redeliver manually or via the API |
| Shopify | 19 retries | 48 hours | Exponential | Webhook removed after 19 consecutive failures |
| Twilio | Configurable | Configurable | Exponential | Depends on configuration |
| Slack | 3 retries | ~1 hour | Exponential | Disables app event subscription |

**Exponential backoff with jitter (sender implementation):**

```python
import random
import asyncio

async def deliver_with_retry(
    url: str,
    payload: dict,
    max_retries: int = 8,
    base_delay: float = 5.0,
    max_delay: float = 3600.0,  # 1 hour cap
):
    # http_post and send_to_dlq are assumed helpers provided by the application
    last_response = None
    for attempt in range(max_retries):
        try:
            response = await http_post(url, json=payload, timeout=30)
            last_response = response
            if 200 <= response.status < 300:
                return True  # Success
            if 400 <= response.status < 500:
                # Client error — do not retry (except 408, 429)
                if response.status not in (408, 429):
                    await send_to_dlq(url, payload, response)
                    return False
        except (TimeoutError, ConnectionError):
            pass  # Retry on network errors

        # Exponential backoff with full jitter; skip the wait after the final attempt
        if attempt < max_retries - 1:
            delay = min(base_delay * (2 ** attempt), max_delay)
            await asyncio.sleep(random.uniform(0, delay))

    # All retries exhausted — dead letter queue
    await send_to_dlq(url, payload, last_response)
    return False
```

**Why jitter matters:** Without jitter, if 1,000 webhooks fail simultaneously (common during a consumer outage), all 1,000 retry at exactly the same intervals. When the endpoint recovers, it gets hit with 1,000 simultaneous requests — the **thundering herd problem**. Jitter spreads retries across time windows, preventing synchronized spikes.
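
The effect can be made concrete with a small simulation. This sketch assumes the same base delay and cap as the sender above and compares 1,000 retries scheduled with and without full jitter; the helper names are illustrative, and the random generator is seeded only to keep the run reproducible.

```python
import random
from collections import Counter
from typing import Optional


def retry_delay(attempt: int, base: float = 5.0, cap: float = 3600.0,
                jitter: bool = True, rng: Optional[random.Random] = None) -> float:
    """Delay before retry number `attempt` (0-based), optionally with full jitter."""
    ceiling = min(base * (2 ** attempt), cap)
    if not jitter:
        return ceiling
    return (rng or random).uniform(0, ceiling)


def peak_per_second(delays: list) -> int:
    """Largest number of retries that land in the same one-second bucket."""
    return max(Counter(int(d) for d in delays).values())


rng = random.Random(42)  # seeded so the sketch is reproducible
fixed = [retry_delay(3, jitter=False) for _ in range(1000)]
spread = [retry_delay(3, rng=rng) for _ in range(1000)]
```

Without jitter, all 1,000 retries fall into the same second (the 40-second mark for attempt index 3), so `peak_per_second(fixed)` is 1,000. With full jitter the same retries are distributed across the whole 0 to 40 second window, and the busiest second sees only a small fraction of them.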

### 4.5 Idempotency on the Receiver

Providers deliver at-least-once, meaning your receiver **will** get duplicate events. Every webhook handler must be idempotent.

**Pattern: Event ID deduplication**

```python
async def handle_webhook(request):
    payload = await request.json()
    event_id = payload["id"]

    # Check if already processed (atomic check-and-set)
    if await redis.set(f"webhook:{event_id}", "processing", nx=True, ex=86400):
        try:
            await process_event(payload)
            await redis.set(f"webhook:{event_id}", "done", ex=604800)
        except Exception:
            await redis.delete(f"webhook:{event_id}")
            raise
    else:
        # Already processed or in-progress — acknowledge without reprocessing
        pass

    return Response(status=200)
```

**Pattern: Database-level idempotency with unique constraint**

```sql
CREATE TABLE processed_events (
    event_id VARCHAR(255) PRIMARY KEY,
    event_type VARCHAR(100) NOT NULL,
    processed_at TIMESTAMP DEFAULT NOW(),
    payload JSONB
);

-- In application code: INSERT ... ON CONFLICT DO NOTHING
-- If insert succeeds, process the event. If conflict, skip.
```
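
The application-side half of this pattern might look like the sketch below. It uses the standard library's `sqlite3` so that it runs without a database server, which means two substitutions relative to the SQL above: SQLite's `INSERT OR IGNORE` plays the role of `ON CONFLICT DO NOTHING`, and the column types are simplified. The `handled` list is a stand-in for real side effects.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE processed_events (
        event_id   TEXT PRIMARY KEY,
        event_type TEXT NOT NULL,
        payload    TEXT
    )
    """
)
handled: list = []  # stand-in for real side effects


def handle_event(event: dict) -> bool:
    """Claim the event ID and process it in one transaction. True if processed."""
    with conn:  # commits on success, rolls back if processing raises
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id, event_type, payload) "
            "VALUES (?, ?, ?)",
            (event["id"], event["type"], json.dumps(event)),
        )
        if cur.rowcount == 0:
            return False  # primary key conflict: duplicate delivery, skip
        handled.append(event["id"])
        return True
```

Running the insert and the processing inside the same transaction means a failure during processing also rolls back the claim on the event ID, so a later retry from the provider is not mistaken for a duplicate.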

**Pattern: Idempotency through business logic**

Some operations are naturally idempotent. `UPDATE orders SET status = 'paid' WHERE id = ?` produces the same result regardless of how many times it runs. Design your state transitions to be idempotent where possible, rather than relying solely on deduplication.

### 4.6 Delivery Guarantees

| Guarantee | Description | Who Provides It |
|-----------|-------------|-----------------|
| **At-most-once** | Event delivered 0 or 1 times. No retries. | GitHub (no automatic redelivery), fire-and-forget providers |
| **At-least-once** | Event delivered 1 or more times. Provider retries on failure. | Stripe, Shopify, most mature providers |
| **Exactly-once** | Event delivered exactly 1 time. | Nobody — not achievable over HTTP without receiver-side deduplication |

**At-least-once is the practical standard.** Exactly-once delivery is a theoretical impossibility in distributed systems without both sides participating. The closest approximation: at-least-once delivery from the provider + idempotent processing on the receiver = effectively-once processing.

### 4.7 Ordering Challenges

Webhooks have no inherent ordering. Strategies for handling out-of-order events:

**Strategy 1: Timestamp-based last-write-wins**

```python
async def handle_order_update(event):
    event_time = event["created"]
    order_id = event["data"]["object"]["id"]

    # Only apply if this event is newer than what we have
    result = await db.execute("""
        UPDATE orders
        SET status = $1, updated_at = $2
        WHERE id = $3 AND updated_at < $2
    """, event["data"]["object"]["status"], event_time, order_id)

    # If no rows updated, we already have a newer state — safe to ignore
```

**Strategy 2: Fetch current state from provider API**

When you receive any event for an object, ignore the event payload and fetch the current state from the provider's API. This guarantees you always have the latest state, regardless of event ordering.

**Strategy 3: Event sequence numbers**

Some providers include a sequence number or version. Only apply events with a higher sequence than your current stored version. This is the most robust approach but requires provider support.
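
A minimal sketch of the sequence-number strategy, assuming each event carries a monotonically increasing integer per object. The `sequence` parameter and the `VersionedStore` class are hypothetical; real providers name and scope this field differently, if they offer it at all.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Keeps the latest state per object, keyed by a provider-supplied sequence."""
    state: dict = field(default_factory=dict)     # object_id -> status
    versions: dict = field(default_factory=dict)  # object_id -> last applied sequence

    def apply(self, object_id: str, sequence: int, status: str) -> bool:
        """Apply the event only if it is newer than what is stored."""
        if sequence <= self.versions.get(object_id, -1):
            return False  # stale or duplicate event
        self.versions[object_id] = sequence
        self.state[object_id] = status
        return True
```

If `succeeded` (sequence 2) arrives before `created` (sequence 1), the late event is rejected and the stored state stays at `succeeded`. The same comparison also discards exact duplicates, so this strategy gives idempotency as a side effect.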

### 4.8 Webhook Management UI

If you are building a platform that sends webhooks, provide consumers with:

- **Endpoint management** — CRUD operations for webhook URLs and event subscriptions
- **Event log** — Searchable history of all delivered events with status, response code, latency
- **Manual retry** — Ability to replay failed events individually or in bulk
- **Test/ping** — Send a test event to verify endpoint connectivity
- **Secret rotation** — Rotate signing secrets without downtime (support multiple active secrets during transition)
- **Failure alerts** — Notify consumers when their endpoint is failing consistently
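
Of these, secret rotation has the most direct impact on verification code: during the transition window a receiver has to accept signatures made with either the old or the new secret. The sketch below shows one way to do that, reusing the `timestamp.payload` scheme and `sha256=` prefix from section 4.3. The function names are illustrative, and the timestamp tolerance check is omitted for brevity.

```python
import hashlib
import hmac
from typing import Iterable


def signature_for(secret: str, timestamp: str, body: bytes) -> str:
    """Signature over "<timestamp>.<body>", in the sha256=<hex> format."""
    signed = timestamp.encode() + b"." + body
    return "sha256=" + hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()


def verify_any(active_secrets: Iterable[str], timestamp: str,
               body: bytes, signature: str) -> bool:
    """Accept the request if any currently active secret produces the signature."""
    ok = False
    for secret in active_secrets:
        expected = signature_for(secret, timestamp, body)
        # Check every secret rather than returning on the first match
        if hmac.compare_digest(expected.encode(), signature.encode()):
            ok = True
    return ok
```

Once all senders have switched to the new secret, the old one is removed from `active_secrets` and signatures made with it stop verifying.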

Stripe's webhook dashboard is the gold standard. Study it when designing your own.

---

## 5. Trade-Offs Matrix

| Dimension | Webhooks Win | Webhooks Lose |
|-----------|-------------|---------------|
| **Latency** | Near real-time (seconds vs minutes for polling) | Higher than in-process events or message queues |
| **Efficiency** | Only fires on actual events; no wasted requests | Receiver must maintain an always-on public endpoint |
| **Simplicity** | Simple to consume — just an HTTP endpoint | Complex to operate reliably (retries, idempotency, signature verification, ordering) |
| **Coupling** | Loose temporal coupling (async) | Tight protocol coupling (HTTP, specific payload schema) |
| **Cost** | Eliminates polling infrastructure costs | Requires endpoint hosting, TLS certificates, monitoring |
| **Scalability** | Provider handles fan-out to all consumers | Consumer must handle burst traffic from provider |
| **Debugging** | Event logs provide audit trail | Harder to debug than synchronous API calls |
| **Security** | HMAC signatures provide authenticity verification | Exposes an inbound endpoint — increased attack surface |
| **Reliability** | Provider manages retry infrastructure | Consumer has no control over retry timing or backoff |
| **Ordering** | Events reflect actual occurrence | No delivery ordering guarantees |
| **Universality** | Dominant SaaS integration pattern | Not all providers support webhooks; some fire-and-forget |
| **Development** | Rich ecosystem of tools (ngrok, Svix, Hookdeck) | Local development requires tunneling or proxy tools |

---

## 6. Evolution Path

### 6.1 Maturity Levels

**Level 0: Direct polling**
Cron job polls provider API every N minutes. Simple, reliable, wasteful. Suitable for low-volume integrations where minute-level latency is acceptable.

**Level 1: Basic webhook receiver**
Single HTTP endpoint receives webhooks. No signature verification, no idempotency, no retry handling. Processes events synchronously in the request handler. Breaks under load, loses events on downtime.

**Level 2: Hardened webhook receiver**
Signature verification, idempotent processing with event ID deduplication, quick 200 response before async processing. Handles duplicates safely. Still vulnerable to burst traffic and extended downtime.

**Level 3: Queue-buffered processing**
Webhook endpoint validates signature, persists event to a queue (SQS, Redis, RabbitMQ), and returns 200 immediately. Separate workers process events from the queue at their own pace. Handles bursts, survives worker restarts. This is the recommended production architecture.

```
Provider → [Webhook Endpoint] → [Queue] → [Worker] → [Business Logic]
           validates sig        buffer    processes    idempotent
           returns 200          durable   at own pace  operations
```
|
|
449
|
+
|
|
450
|
+
**Level 4: Full event infrastructure**
|
|
451
|
+
Queue-buffered processing plus: DLQ for failed events, reconciliation polling as fallback, event store for replay, monitoring dashboards, automatic endpoint health checks, webhook management UI. Used by teams processing thousands of webhook events per minute.
|
|
452
|
+
|
|
453
|
+
**Level 5: Managed webhook infrastructure**
|
|
454
|
+
Offload sending/receiving to a dedicated service (Svix for sending, Hookdeck for receiving). Handles retries, DLQ, signature verification, rate limiting, event log, and management UI. You focus on business logic.
|
|
455
|
+
|
|
456
|
+
### 6.2 Migration Path: Polling to Webhooks

1. Build webhook receiver alongside existing polling infrastructure
2. Run both in parallel: webhooks as primary, polling as reconciliation
3. Extend polling interval (5 min to 30 min to 1 hour) as webhook reliability is confirmed
4. Keep polling as a safety net; never fully remove it for critical integrations

---

## 7. Failure Modes

### 7.1 Receiver Downtime Losing Events

**Scenario:** Your webhook endpoint is down for a deployment, scaling event, or outage. The provider delivers webhooks, gets errors, retries according to its policy, and eventually gives up.

**Impact:** Events permanently lost. For payment webhooks, this means orders fulfilled without payment confirmation, or payments received without order fulfillment.

**Real example:** Shopify removes webhook subscriptions entirely after 19 consecutive failures. If your endpoint stays down for the whole 48-hour retry window, you lose the subscription itself, not just individual events.

**Mitigation:**
- Queue-buffered architecture (Level 3+): the endpoint stays up even if workers are down
- Reconciliation polling as fallback
- Zero-downtime deployments (blue-green, rolling) for the webhook endpoint
- Health check monitoring with immediate alerting

### 7.2 Replay Attacks

**Scenario:** Attacker intercepts a legitimate webhook request (including valid signature) and resends it later to trigger duplicate processing, such as double-charging a customer or double-crediting an account.

**Mitigation:**
- Include a timestamp in the signed content
- Reject webhooks with timestamps older than 5 minutes (`TOLERANCE_SECONDS = 300`)
- Idempotent processing with event ID deduplication (defense in depth)

### 7.3 Signature Verification Skipped

**Scenario:** Developer skips signature verification during development ("I'll add it later") and the code ships to production. Attacker discovers the endpoint URL and sends forged webhook payloads.

**Impact:** Arbitrary event injection. Attacker can forge `payment_intent.succeeded` events to get products without paying, or `customer.deleted` events to disrupt service.

**Real-world frequency:** Alarmingly common. A 2023 survey by Hookdeck found that a significant percentage of webhook consumers do not verify signatures in production.

**Mitigation:**
- Signature verification in middleware/decorator, applied to all webhook routes by default
- CI/CD tests that verify webhook handlers reject unsigned requests
- Framework-level enforcement (reject all webhook requests without valid signature)

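The CI check in the mitigation list can be quite small. The sketch below assumes the `<timestamp>.<body>` signing scheme and the `X-Webhook-*` header names used in this document's implementation sketch; `sign`, `handle_webhook`, and `SECRET` are hypothetical names, and the handler is reduced to a pure function returning a status code so the test needs no web framework. Timestamp freshness is left out to keep the test focused on signatures.

```python
import hashlib
import hmac
import time

SECRET = "test-secret"  # hypothetical signing secret, for tests only


def sign(body: bytes, timestamp: str, secret: str = SECRET) -> str:
    """Build the signature header value over "<timestamp>.<body>"."""
    mac = hmac.new(secret.encode(), timestamp.encode() + b"." + body, hashlib.sha256)
    return "sha256=" + mac.hexdigest()


def handle_webhook(headers: dict, body: bytes, secret: str = SECRET) -> int:
    """Return the HTTP status the endpoint would answer with."""
    timestamp = headers.get("X-Webhook-Timestamp")
    signature = headers.get("X-Webhook-Signature")
    if not timestamp or not signature:
        return 401  # unsigned request
    if not hmac.compare_digest(sign(body, timestamp, secret), signature):
        return 401  # forged signature or tampered body
    return 200


def test_rejects_unsigned_and_forged_requests():
    body = b'{"type": "payment_intent.succeeded"}'
    ts = str(int(time.time()))
    good = {"X-Webhook-Timestamp": ts, "X-Webhook-Signature": sign(body, ts)}
    assert handle_webhook(good, body) == 200
    assert handle_webhook({}, body) == 401
    forged = dict(good, **{"X-Webhook-Signature": "sha256=" + "0" * 64})
    assert handle_webhook(forged, body) == 401
    assert handle_webhook(good, body + b" ") == 401
```

Running a test like this against every registered webhook route, rather than one handler, is what turns "implemented" into "enforced".
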
### 7.4 Webhook Storms

**Scenario:** A bulk operation on the provider side triggers thousands of webhooks simultaneously. A Shopify merchant imports 50,000 products and 50,000 `product/create` webhooks fire within minutes. A GitHub organization is renamed, triggering webhooks across hundreds of repositories.

**Impact:** The receiver is overwhelmed and returns timeouts or 5xx errors, which trigger retries, which create more load: the **retry amplification loop**. This can cascade into a full receiver outage.

**Mitigation:**
- Queue-buffered architecture (absorb bursts in the queue)
- Rate limiting on the receiver (return 429; well-behaved providers will back off)
- Circuit breaker pattern on the receiver side
- Provider-side: batch events where possible, respect consumer rate limits

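Receiver-side rate limiting is often done with a token bucket. The following is a minimal in-process sketch, not a production limiter: `TokenBucket` and `admit` are illustrative names, the `Retry-After` value is an arbitrary choice, and whether a given provider honors 429 or `Retry-After` is provider-specific.

```python
import time


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so tests can control time
        self.tokens = float(capacity)
        self.updated = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def admit(bucket: TokenBucket) -> tuple[int, dict]:
    """Return (status, headers) for an incoming webhook request."""
    if bucket.try_acquire():
        return 200, {}
    # Retry-After is only a hint to the sender.
    return 429, {"Retry-After": "1"}
```

A single-process bucket like this only protects one instance; behind a load balancer the counter would need to live in shared storage, which is outside the scope of this sketch.
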
### 7.5 Endpoint URL Hijacking

**Scenario:** Attacker gains access to the provider dashboard and changes the webhook URL to their own endpoint, receiving all events (potentially including sensitive data like customer information, payment details).

**Mitigation:**
- MFA on provider dashboard accounts
- Webhook URL changes trigger email notification to account owner
- Restrict webhook management to admin roles only
- Monitor for unexpected changes to webhook configuration

### 7.6 Payload Size and Timeout Issues

**Scenario:** Provider sends a webhook with a large payload (e.g., a Shopify order with 500 line items). Receiver's web framework has a body size limit that rejects the request, or processing takes longer than the provider's timeout (typically 5-30 seconds).

**Mitigation:**
- Configure generous body size limits on webhook endpoints (at least 1 MB)
- Never process business logic synchronously in the webhook handler: persist to queue, return 200
- Stripe's guidance: return a 2xx response before any complex logic that could cause a timeout

### 7.7 Secret Rotation Failures

**Scenario:** Signing secret needs rotation (compromised, compliance requirement, employee departure). Old secret is invalidated before new secret is deployed to all receiver instances, causing all webhooks to fail signature verification.

**Mitigation:**
- Support multiple active signing secrets during rotation window
- Provider generates new secret → deploy new secret to all receivers → verify → invalidate old secret
- Some providers support this natively (Stripe can roll an endpoint secret while keeping the previous one valid for a grace period)

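Supporting several active secrets during the rotation window comes down to accepting a request if any of them reproduces the signature. A minimal sketch, assuming the same `<timestamp>.<body>` HMAC-SHA256 scheme as the rest of this document; `compute_signature` and `verify_with_rotation` are illustrative names:

```python
import hashlib
import hmac


def compute_signature(secret: str, timestamp: str, body: bytes) -> str:
    """HMAC-SHA256 over "<timestamp>.<body>", formatted as a header value."""
    mac = hmac.new(secret.encode(), timestamp.encode() + b"." + body, hashlib.sha256)
    return "sha256=" + mac.hexdigest()


def verify_with_rotation(active_secrets: list[str], timestamp: str,
                         body: bytes, signature: str) -> bool:
    """Accept the request if any currently active secret produces the signature."""
    matched = False
    for secret in active_secrets:
        expected = compute_signature(secret, timestamp, body)
        # Each comparison is constant-time; every secret is checked
        # rather than returning on the first match.
        if hmac.compare_digest(expected, signature):
            matched = True
    return matched
```

During rotation the list would hold both the new and the old secret; once the provider has invalidated the old one, it is removed from the list and the receiver is back to a single secret.
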
---

## 8. Technology Landscape

### 8.1 Webhook Infrastructure Services

| Service | Focus | Key Features | Pricing Model |
|---------|-------|--------------|---------------|
| **Svix** | Sending webhooks | Open-source core, hosted option, management UI, retry, signing, event types | Free tier (50K msgs), paid from $10/mo |
| **Hookdeck** | Receiving webhooks | Gateway, retry, transformation, rate limiting, DLQ, event routing | Free tier (100K requests), paid from $39/mo |
| **Convoy** | Sending + receiving | Open-source, self-hostable, multi-tenant, rate limiting | Open-source (free), cloud hosted available |
| **Hook Relay** | Receiving webhooks | Relay proxy, local development, event replay | Usage-based |
| **ngrok** | Local development | Tunnel public URL to localhost, request inspection | Free tier, paid from $8/mo |

**Build vs buy decision:** Svix estimates it takes 2-4 engineering months to build production-grade webhook sending infrastructure from scratch. For most teams, buying or using open-source is the correct choice unless webhooks are core to your product.

### 8.2 Provider Implementation Examples

**Stripe** (the gold standard for webhook design):
- Fat payloads with full object state + `previous_attributes`
- HMAC-SHA256 signing with timestamp-based replay protection
- Event type hierarchy (`payment_intent.succeeded`, `invoice.payment_failed`)
- Webhook endpoint management API + dashboard
- Event log with filtering and manual retry
- Test mode events (no real charges)
- Explicit recommendation to use webhooks + polling together

**GitHub** (comprehensive but limited retries):
- Configurable per-repository or per-organization
- 250+ event types with granular filtering
- HMAC-SHA256 with user-provided secret
- Recent Deliveries log with redeliver button
- Only 1 automatic retry after ~10 seconds; if both fail, the event is lost
- Ping event on webhook creation for endpoint verification

**Twilio** (resource-level webhook configuration):
- Webhook URLs set per-resource (per phone number, per messaging service)
- Signatures use account auth token (shared across all webhooks)
- StatusCallback URLs for async event notification
- Fallback URLs for primary endpoint failure
- Request validation using the `X-Twilio-Signature` header

**Shopify** (aggressive failure handling):
- Topic-based subscription (`orders/create`, `products/update`)
- HMAC-SHA256 with app-specific secret
- 19 retry attempts over 48 hours with exponential backoff
- **Removes webhook subscription after 19 consecutive failures**
- Mandatory webhook compliance for apps in the Shopify App Store

### 8.3 Local Development Tools

| Tool | How It Works | Best For |
|------|-------------|----------|
| **ngrok** | Creates secure tunnel from public URL to localhost | General webhook development, widely supported |
| **Hookdeck CLI** | Routes webhooks through Hookdeck to localhost | Teams already using Hookdeck |
| **Stripe CLI** | `stripe listen --forward-to localhost:3000` | Stripe-specific development |
| **localtunnel** | Open-source alternative to ngrok | Cost-sensitive, open-source preference |
| **Webhook.site** | Captures and inspects webhook payloads in browser | Debugging payload format, no code needed |

---

## 9. Decision Tree

```
Need to react to external service events?
├── YES: Does the provider support webhooks?
│   ├── YES: Use webhooks (proceed to architecture decisions below)
│   │   ├── Is your receiver always available?
│   │   │   ├── YES: Level 2 (hardened receiver) may suffice
│   │   │   └── NO: Level 3+ (queue-buffered) required
│   │   ├── Can you handle burst traffic?
│   │   │   ├── YES: Direct processing possible
│   │   │   └── NO: Queue buffer mandatory
│   │   └── Are events critical (payments, orders)?
│   │       ├── YES: Webhooks + reconciliation polling + DLQ
│   │       └── NO: Webhooks with basic retry handling
│   └── NO: Polling is your only option
│       ├── Need near-real-time? → Short polling interval (15-30s)
│       └── Minutes-level latency OK? → Standard polling (1-5 min)
│
├── Need to SEND events to external consumers?
│   ├── < 10 consumers, low volume: Build simple webhook sender
│   ├── 10-100 consumers, moderate volume: Use Svix or Convoy
│   └── 100+ consumers, high volume: Managed infrastructure (Svix/Hookdeck)
│
└── Internal service-to-service events?
    ├── Strong ordering needed? → Kafka / SQS FIFO
    ├── Simple pub/sub? → RabbitMQ / SNS+SQS / NATS
    └── Real-time bidirectional? → WebSocket / gRPC streaming
```

---

## 10. Implementation Sketch

### 10.1 Webhook Sender (Provider Side)

```python
"""
Production webhook sender with retry, signing, and DLQ.
Uses a background queue for async delivery.
"""

import hmac
import hashlib
import json
import time
import uuid
import random
from dataclasses import asdict, dataclass
from enum import Enum

# NOTE: `http_post` and the WebhookSender methods `get_active_subscriptions`,
# `reset_failure_count`, and `increment_failure_count` are assumed to be
# provided elsewhere (HTTP client and subscription storage).

class DeliveryStatus(Enum):
    PENDING = "pending"
    DELIVERED = "delivered"
    FAILED = "failed"
    DLQ = "dead_letter"

@dataclass
class WebhookEvent:
    id: str
    type: str
    timestamp: int
    data: dict
    api_version: str = "2025-01-01"

@dataclass
class WebhookSubscription:
    id: str
    url: str
    events: list[str]
    secret: str
    active: bool = True
    failure_count: int = 0
    max_failures: int = 20  # Disable after N consecutive failures

class WebhookSender:
    """Handles webhook delivery with signing, retries, and failure tracking."""

    def __init__(self, queue, event_store, dlq):
        self.queue = queue              # SQS, RabbitMQ, Redis, etc.
        self.event_store = event_store  # DB for event log
        self.dlq = dlq                  # Dead letter queue

    def emit_event(self, event_type: str, data: dict):
        """Create an event and fan out to all matching subscriptions."""
        event = WebhookEvent(
            id=f"evt_{uuid.uuid4().hex[:24]}",
            type=event_type,
            timestamp=int(time.time()),
            data=data,
        )

        subscriptions = self.get_active_subscriptions(event_type)
        for sub in subscriptions:
            delivery = {
                "event": asdict(event),
                "subscription_id": sub.id,
                "url": sub.url,
                "secret": sub.secret,
                "attempt": 0,
                "max_retries": 8,
            }
            self.queue.enqueue(delivery)
            self.event_store.log(event, sub, DeliveryStatus.PENDING)

    def sign_payload(self, payload_bytes: bytes, secret: str, event_id: str) -> dict:
        """Generate HMAC-SHA256 signature with timestamp."""
        timestamp = str(int(time.time()))
        # Sign "<timestamp>.<body>" so the receiver can reject stale replays.
        signed_content = timestamp.encode() + b"." + payload_bytes
        signature = hmac.new(
            secret.encode(),
            signed_content,
            hashlib.sha256,
        ).hexdigest()
        return {
            "Content-Type": "application/json",
            # Reuse the event ID so every retry of the same event carries the
            # same ID and receivers can deduplicate on it.
            "X-Webhook-ID": event_id,
            "X-Webhook-Timestamp": timestamp,
            "X-Webhook-Signature": f"sha256={signature}",
        }

    async def deliver(self, delivery: dict):
        """Attempt delivery with retry on failure."""
        event = delivery["event"]
        payload = json.dumps(event).encode()
        headers = self.sign_payload(payload, delivery["secret"], event["id"])

        try:
            response = await http_post(
                delivery["url"],
                body=payload,
                headers=headers,
                timeout=30,
            )
            if 200 <= response.status < 300:
                self.event_store.update(
                    event["id"], delivery["subscription_id"],
                    DeliveryStatus.DELIVERED,
                )
                self.reset_failure_count(delivery["subscription_id"])
                return

            # 4xx (except 408, 429) = do not retry
            if 400 <= response.status < 500 and response.status not in (408, 429):
                self.event_store.update(
                    event["id"], delivery["subscription_id"],
                    DeliveryStatus.FAILED,
                )
                return
        except (TimeoutError, ConnectionError):
            pass  # Fall through to the retry logic below

        # Schedule retry or send to DLQ
        attempt = delivery["attempt"] + 1
        if attempt < delivery["max_retries"]:
            delay = min(5 * (2 ** attempt), 3600)  # Cap at 1 hour
            # Full jitter: pick the actual delay uniformly in [0, delay]
            jittered_delay = random.uniform(0, delay)
            delivery["attempt"] = attempt
            self.queue.enqueue(delivery, delay_seconds=jittered_delay)
        else:
            self.dlq.enqueue(delivery)
            self.event_store.update(
                event["id"], delivery["subscription_id"],
                DeliveryStatus.DLQ,
            )
            # Counts toward max_failures for this subscription
            self.increment_failure_count(delivery["subscription_id"])
```

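To see what the sender's retry settings amount to in practice, the sketch below evaluates the same ceiling formula, `min(5 * 2**attempt, 3600)`, for the seven retries that `max_retries = 8` allows (one initial attempt plus retries numbered 1 through 7). `backoff_ceiling` and `retry_delay` are illustrative names, not part of the sender.

```python
import random


def backoff_ceiling(attempt: int, base: float = 5.0, cap: float = 3600.0) -> float:
    """Upper bound of the delay before retry number `attempt`."""
    return min(base * (2 ** attempt), cap)


def retry_delay(attempt: int, rng=random) -> float:
    """Full jitter: a uniformly random delay between 0 and the ceiling."""
    return rng.uniform(0, backoff_ceiling(attempt))


# Ceilings for retries 1..7, in seconds.
schedule = [backoff_ceiling(a) for a in range(1, 8)]
worst_case_seconds = sum(schedule)
```

Under these settings the ceilings are 10, 20, 40, 80, 160, 320, and 640 seconds, which sum to 1,270 seconds. An event therefore reaches the DLQ at most about 21 minutes after its first failed attempt (not counting time spent waiting on requests), and the one-hour cap never comes into play. A sender that should ride out multi-hour outages needs a larger `max_retries` or base delay.
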
### 10.2 Webhook Receiver (Consumer Side)

```python
"""
Production webhook receiver with signature verification,
queue-buffered processing, and idempotency.
"""

import hmac
import hashlib
import logging
import os
import time
import json
from functools import wraps

log = logging.getLogger(__name__)

# NOTE: `app`, `Response`, `redis`, `queue`, `db`, and `send_receipt_email`
# are placeholders for your web framework, cache, queue, and database clients.
# `db` is assumed to expose an asyncpg-style API (`execute`, `fetchval`).

# -- Middleware: Signature Verification --

WEBHOOK_SECRET = os.environ["WEBHOOK_SIGNING_SECRET"]
TIMESTAMP_TOLERANCE = 300  # 5 minutes

def verify_webhook_signature(func):
    """Decorator that enforces HMAC signature verification."""
    @wraps(func)
    async def wrapper(request):
        # Read raw body ONCE (do not parse and re-serialize)
        raw_body = await request.body()

        timestamp = request.headers.get("X-Webhook-Timestamp")
        signature = request.headers.get("X-Webhook-Signature")

        if not timestamp or not signature:
            return Response(status=401, body="Missing signature headers")

        # Replay protection: reject timestamps outside the tolerance window
        try:
            ts = int(timestamp)
        except ValueError:
            return Response(status=401, body="Invalid timestamp")

        if abs(time.time() - ts) > TIMESTAMP_TOLERANCE:
            return Response(status=401, body="Timestamp outside tolerance")

        # Recompute and compare signature over the raw bytes
        signed_content = timestamp.encode() + b"." + raw_body
        expected = hmac.new(
            WEBHOOK_SECRET.encode(),
            signed_content,
            hashlib.sha256,
        ).hexdigest()

        # Timing-safe comparison
        if not hmac.compare_digest(f"sha256={expected}", signature):
            return Response(status=401, body="Invalid signature")

        # Attach parsed payload to request for handler
        request.webhook_payload = json.loads(raw_body)
        return await func(request)

    return wrapper


# -- Endpoint: Queue-Buffered Receiver --

@app.post("/webhooks/provider")
@verify_webhook_signature
async def receive_webhook(request):
    """Accept webhook, persist to queue, return 200 immediately."""
    payload = request.webhook_payload
    event_id = payload.get("id")

    # Quick dedup check (optional; the worker also deduplicates)
    if await redis.exists(f"webhook:seen:{event_id}"):
        return Response(status=200, body="Already received")

    # Persist to processing queue
    await queue.enqueue({
        "event_id": event_id,
        "event_type": payload.get("type"),
        "payload": payload,
        "received_at": time.time(),
    })

    await redis.set(f"webhook:seen:{event_id}", "1", ex=86400)

    # Return 200 BEFORE any business logic
    return Response(status=200)


# -- Worker: Idempotent Event Processor --

class WebhookWorker:
    """Processes webhook events from queue with idempotency."""

    def __init__(self, queue, db):
        self.queue = queue
        self.db = db
        self.handlers = {}

    def register(self, event_type: str):
        """Decorator to register an event handler."""
        def decorator(func):
            self.handlers[event_type] = func
            return func
        return decorator

    async def process(self):
        """Main processing loop."""
        while True:
            message = await self.queue.dequeue()
            event_id = message["event_id"]
            event_type = message["event_type"]

            # Idempotency: skip if already processed
            if await self.already_processed(event_id):
                await self.queue.ack(message)
                continue

            handler = self.handlers.get(event_type)
            if not handler:
                # Unknown event type: ack and skip
                await self.queue.ack(message)
                continue

            try:
                await handler(message["payload"])
                await self.mark_processed(event_id, event_type)
                await self.queue.ack(message)
            except Exception as e:
                # Return to queue for retry (queue handles DLQ)
                await self.queue.nack(message)
                log.error(f"Failed to process {event_id}: {e}")

    async def already_processed(self, event_id: str) -> bool:
        """Check idempotency store."""
        # fetchval returns None when no row matches. An asyncpg-style
        # execute() returns a status string, which is never None.
        result = await self.db.fetchval(
            "SELECT 1 FROM processed_events WHERE event_id = $1",
            event_id,
        )
        return result is not None

    async def mark_processed(self, event_id: str, event_type: str):
        """Record successful processing."""
        await self.db.execute(
            """INSERT INTO processed_events (event_id, event_type, processed_at)
               VALUES ($1, $2, NOW())
               ON CONFLICT (event_id) DO NOTHING""",
            event_id, event_type,
        )


# -- Handler Registration --

worker = WebhookWorker(queue, db)

@worker.register("payment_intent.succeeded")
async def handle_payment_succeeded(payload):
    payment = payload["data"]["object"]
    order_id = payment["metadata"]["order_id"]

    # Idempotent operation: SET status = 'paid' is safe to repeat
    updated_id = await db.fetchval(
        """UPDATE orders SET status = 'paid', paid_at = NOW()
           WHERE id = $1 AND status = 'pending'
           RETURNING id""",
        order_id,
    )
    # Email only when this call made the transition, so a redelivered
    # event does not send the receipt twice.
    if updated_id is not None:
        await send_receipt_email(order_id)

@worker.register("customer.subscription.deleted")
async def handle_subscription_canceled(payload):
    subscription = payload["data"]["object"]
    customer_id = subscription["customer"]

    # Idempotent: downgrade only if still on paid plan
    await db.execute(
        """UPDATE users SET plan = 'free', plan_expires_at = NOW()
           WHERE stripe_customer_id = $1 AND plan != 'free'""",
        customer_id,
    )
```

### 10.3 Reconciliation Poller (Safety Net)

```python
"""
Periodic reconciliation that catches events missed by webhooks.
Run alongside webhook processing, not instead of it.
"""

import logging
import time

import stripe

log = logging.getLogger(__name__)

# NOTE: `already_processed` and `process_event` are assumed to be the same
# idempotency check and dispatch logic used by the webhook worker.

async def reconcile_payments():
    """Fetch recent Stripe events and process any we missed."""
    # Get events from the last hour.
    # stripe.Event.list is the synchronous client call, so it is not awaited;
    # it blocks the event loop while the request runs.
    events = stripe.Event.list(
        type="payment_intent.succeeded",
        created={"gte": int(time.time()) - 3600},
        limit=100,
    )

    for event in events.auto_paging_iter():
        if not await already_processed(event.id):
            await process_event(event)
            log.info(f"Reconciliation caught missed event: {event.id}")

# Run every 30 minutes via cron or scheduler
# Frequency decreases as webhook reliability is proven
```

---

## 11. Operational Checklist

### 11.1 Before Going to Production

- [ ] Signature verification is enforced on all webhook endpoints (enforced, not just implemented)
- [ ] Timestamp validation rejects webhooks older than 5 minutes
- [ ] Raw request body is used for signature computation (not re-serialized JSON)
- [ ] Timing-safe comparison is used for signature matching
- [ ] Signing secret is stored in environment variables or secrets manager
- [ ] Webhook handler returns 200 before any business logic processing
- [ ] Events are persisted to a queue before acknowledgment
- [ ] Idempotency is implemented (event ID deduplication or idempotent operations)
- [ ] Reconciliation polling is running as a safety net for critical integrations
- [ ] Monitoring and alerting are configured for webhook endpoint health
- [ ] Load testing has verified the endpoint can handle expected burst traffic
- [ ] Body size limits are configured (minimum 1 MB for webhook endpoints)
- [ ] Zero-downtime deployment is configured for the webhook endpoint

### 11.2 Monitoring Metrics

| Metric | Alert Threshold | Why It Matters |
|--------|----------------|----------------|
| Webhook endpoint response time (p99) | > 5 seconds | Provider will time out and retry, creating amplification |
| Webhook endpoint error rate | > 1% over 5 minutes | Provider may disable endpoint |
| Queue depth | Growing for > 10 minutes | Workers are not keeping up with event volume |
| DLQ depth | Any message | Failed events need investigation |
| Event processing latency | > target SLA | Business impact (delayed order fulfillment, etc.) |
| Duplicate event rate | Sustained > 10% | Provider retry logic may be misconfigured, or dedup is failing |
| Signature verification failures | Any | Either a bug or an active attack |

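As an illustration of how a threshold such as "error rate > 1% over 5 minutes" can be evaluated, here is a small sliding-window sketch. In practice this usually lives in a metrics system rather than application code; `ErrorRateWindow` is a hypothetical name and the defaults simply mirror the table row.

```python
from collections import deque


class ErrorRateWindow:
    """Track request outcomes and flag when the error rate exceeds a threshold."""

    def __init__(self, window_seconds: float = 300.0, threshold: float = 0.01):
        self.window_seconds = window_seconds
        self.threshold = threshold
        self.samples = deque()  # (timestamp, ok) pairs, oldest first

    def record(self, timestamp: float, ok: bool) -> None:
        self.samples.append((timestamp, ok))

    def error_rate(self, now: float) -> float:
        # Drop samples that have aged out of the window.
        while self.samples and self.samples[0][0] <= now - self.window_seconds:
            self.samples.popleft()
        if not self.samples:
            return 0.0
        errors = sum(1 for _, ok in self.samples if not ok)
        return errors / len(self.samples)

    def breached(self, now: float) -> bool:
        return self.error_rate(now) > self.threshold
```

The comparison is strict, so a rate of exactly 1% does not alert; whether the boundary should be inclusive is a policy choice.
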
---

## 12. Case Studies

### 12.1 Stripe: Webhook Infrastructure at Scale

Stripe processes billions of webhook deliveries monthly. Key architectural decisions:

- **Fat payloads**: full object state included, reducing API round-trips
- **Event versioning**: API version pinned per-endpoint, enabling schema evolution without breaking consumers
- **Retry policy**: in live mode, retries continue for up to 3 days with exponential backoff
- **Endpoint disabling**: after sustained failures, endpoints are marked as disabled (not deleted) with email notification
- **Test mode**: separate webhook endpoints for test vs live mode events
- **Explicit reconciliation guidance**: Stripe documentation explicitly recommends polling as a complement to webhooks

**Lesson:** Even the best webhook infrastructure recommends a backup mechanism. Webhooks are the fast path, not the only path.

### 12.2 GitHub: Minimal Retry, Maximum Transparency

GitHub takes a different approach: minimal retries (1 retry after ~10 seconds) but maximum transparency:

- **Recent Deliveries** dashboard shows the last 20 deliveries with full request/response details
- **Redeliver** button allows manual replay of any past delivery
- **Ping event** sent on webhook creation to verify connectivity
- **250+ event types** with granular filtering

**Lesson:** If your retry policy is minimal, compensate with excellent tooling for debugging and manual replay. GitHub's approach works because their events are less critical (CI triggers, notifications) than financial events.

### 12.3 Shopify: Aggressive Failure Handling

Shopify takes the most aggressive stance on endpoint health:

- 19 retry attempts over 48 hours
- **Removes the entire webhook subscription** after 19 consecutive failures
- Mandatory webhook compliance for App Store apps
- Bulk operations can trigger thousands of webhooks simultaneously

**Lesson:** Your webhook consumer must be resilient, or you will lose your subscription entirely. Queue-buffered architectures are not optional for Shopify integrations.

---

## 13. Anti-Patterns

### 13.1 Synchronous Processing in the Handler

```python
# BAD: Processing in the request handler
@app.post("/webhooks")
async def webhook(request):
    payload = await request.json()
    await update_database(payload)   # Slow
    await send_email(payload)        # Slow
    await notify_slack(payload)      # Slow
    await update_analytics(payload)  # Slow
    return Response(status=200)      # Provider already timed out
```

Providers typically time out after 5-30 seconds. If your processing exceeds that, the provider records a failure and retries, creating duplicate work and potentially disabling your endpoint.

### 13.2 Trusting the Payload Without Verification

```python
# BAD: No signature verification
@app.post("/webhooks")
async def webhook(request):
    payload = await request.json()
    if payload["type"] == "payment.succeeded":
        await fulfill_order(payload["data"]["order_id"])  # Attacker can forge this
```

Anyone who discovers your webhook URL can send forged events. Always verify the signature.

### 13.3 Relying on Webhook Ordering
|
|
1081
|
+
|
|
1082
|
+
```python
|
|
1083
|
+
# BAD: Assuming events arrive in order
|
|
1084
|
+
@app.post("/webhooks")
|
|
1085
|
+
async def webhook(request):
|
|
1086
|
+
payload = await request.json()
|
|
1087
|
+
if payload["type"] == "order.created":
|
|
1088
|
+
await create_order(payload)
|
|
1089
|
+
elif payload["type"] == "order.paid":
|
|
1090
|
+
order = await get_order(payload) # May not exist yet!
|
|
1091
|
+
await mark_paid(order)
|
|
1092
|
+
```
|
|
1093
|
+
|
|
1094
|
+
`order.paid` can arrive before `order.created`. Use timestamp-based updates or fetch current state from the provider API.

### 13.4 No Idempotency

```python
# BAD: Non-idempotent handler
@app.post("/webhooks")
async def webhook(request):
    payload = await request.json()
    if payload["type"] == "payment.succeeded":
        await credit_account(payload["amount"])  # Doubles on retry!
```

At-least-once delivery means you will receive duplicates. Every handler must be idempotent.
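One common way to make a handler idempotent is to deduplicate by event ID. The sketch below assumes each event carries a unique `id` field; the `Ledger` class and its in-memory set are hypothetical stand-ins for an account table and a processed-events table.

```python
class Ledger:
    """Toy account that credits each event at most once."""

    def __init__(self) -> None:
        self.balance = 0
        self.seen_event_ids: set = set()

    def handle(self, event: dict) -> bool:
        """Process an event once. Returns False for a duplicate delivery."""
        if event["id"] in self.seen_event_ids:
            return False  # already processed: safe to acknowledge and skip
        if event["type"] == "payment.succeeded":
            self.balance += event["amount"]
        self.seen_event_ids.add(event["id"])
        return True
```

In a real system the dedup record and the credit would typically be written in the same database transaction, with a unique constraint on the event ID, so that a crash between the two steps cannot double-credit or lose the event.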

### 13.5 Exposing Internal State in Webhook Payloads

```python
# BAD: Leaking internal IDs and database structure
event = {
    "type": "user.created",
    "data": {
        "internal_user_id": 12345,           # Sequential, so enumerable
        "postgres_row_id": 67890,            # Reveals database
        "auth_hash": "bcrypt$...",           # Security breach
        "feature_flags": {"beta_v2": True},  # Internal roadmap leak
    }
}
```

Webhook payloads go to external systems. Treat them as public API responses and expose only what consumers need.
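A simple guard is to build payloads from an explicit allowlist rather than serializing the internal record. The field names in `PUBLIC_USER_FIELDS` and the `build_user_created_event` helper are assumptions for illustration; the point is that anything not listed never leaves the system.

```python
# Hypothetical allowlist of fields that are safe to expose to consumers.
PUBLIC_USER_FIELDS = ("id", "email", "created_at")


def build_user_created_event(user_record: dict) -> dict:
    """Copy only allowlisted fields from the internal record into the payload."""
    data = {key: user_record[key] for key in PUBLIC_USER_FIELDS if key in user_record}
    return {"type": "user.created", "data": data}


internal_record = {
    "id": "usr_8f3a2c",              # opaque public identifier
    "email": "ada@example.com",
    "created_at": "2026-03-15T10:00:00Z",
    "internal_user_id": 12345,       # stays internal
    "auth_hash": "bcrypt$...",       # stays internal
    "feature_flags": {"beta_v2": True},
}
event = build_user_created_event(internal_record)
```

An allowlist fails closed: a column added to the internal record later is not exposed until someone deliberately adds it to the list.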

---

## 14. Cross-References

- **api-design-rest**: Webhook registration endpoints follow REST conventions; event payload design shares principles with API response design
- **event-driven**: Webhooks are the cross-boundary implementation of event-driven architecture; internal events use message brokers
- **idempotency-and-retry**: Core requirement for webhook consumers; idempotency keys and retry strategies apply directly
- **third-party-integration**: Webhooks are the primary mechanism for SaaS integration; this module covers the webhook-specific patterns
- **rate-limiting-and-throttling**: Webhook receivers need rate limiting to handle bursts; webhook senders should respect consumer rate limits
- **secrets-management**: Webhook signing secrets must be stored securely and rotated safely

---

## 15. Quick Reference Card

```
SENDING WEBHOOKS                            RECEIVING WEBHOOKS
─────────────────                           ──────────────────
1. Sign with HMAC-SHA256 + timestamp        1. Verify signature (HMAC-SHA256)
2. Include event ID, type, timestamp        2. Reject stale timestamps (>5 min)
3. Fat payload (full object state)          3. Return 200 IMMEDIATELY
4. Retry with exponential backoff+jitter    4. Persist to queue, process async
5. DLQ after max retries exhausted          5. Deduplicate by event ID
6. Disable endpoint after N failures        6. Handle out-of-order events
7. Provide event log + manual replay        7. Run reconciliation polling
8. Support secret rotation                  8. Monitor endpoint health + DLQ depth
```
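Item 4 in the sending column can be sketched as a delay schedule. This version uses "full jitter" (a uniform random delay between zero and the exponential ceiling), which is one of several jitter strategies; the `retry_delays` helper and its default `base` and `cap` values are assumptions, not values prescribed by this module.

```python
import random


def retry_delays(max_retries: int, base: float = 1.0, cap: float = 3600.0, rng=None) -> list:
    """Return one randomized delay (in seconds) per retry attempt."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_retries):
        ceiling = min(cap, base * 2 ** attempt)  # exponential growth, bounded by cap
        delays.append(rng.uniform(0, ceiling))   # jitter spreads retries out
    return delays
```

Once the list is exhausted, the sender would move the event to the dead-letter queue (item 5) rather than retrying indefinitely.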