@wazir-dev/cli 1.0.0
This diff shows the content of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
- package/AGENTS.md +111 -0
- package/CHANGELOG.md +14 -0
- package/CONTRIBUTING.md +101 -0
- package/LICENSE +21 -0
- package/README.md +314 -0
- package/assets/composition-engine.mmd +34 -0
- package/assets/demo-script.sh +17 -0
- package/assets/logo-dark.svg +14 -0
- package/assets/logo.svg +14 -0
- package/assets/pipeline.mmd +39 -0
- package/assets/record-demo.sh +51 -0
- package/docs/README.md +51 -0
- package/docs/adapters/context-mode.md +60 -0
- package/docs/concepts/architecture.md +87 -0
- package/docs/concepts/artifact-model.md +60 -0
- package/docs/concepts/composition-engine.md +36 -0
- package/docs/concepts/indexing-and-recall.md +160 -0
- package/docs/concepts/observability.md +41 -0
- package/docs/concepts/roles-and-workflows.md +59 -0
- package/docs/concepts/terminology-policy.md +27 -0
- package/docs/getting-started/01-installation.md +78 -0
- package/docs/getting-started/02-first-run.md +102 -0
- package/docs/getting-started/03-adding-to-project.md +15 -0
- package/docs/getting-started/04-host-setup.md +15 -0
- package/docs/guides/ci-integration.md +15 -0
- package/docs/guides/creating-skills.md +15 -0
- package/docs/guides/expertise-module-authoring.md +15 -0
- package/docs/guides/hook-development.md +15 -0
- package/docs/guides/memory-and-learnings.md +34 -0
- package/docs/guides/multi-host-export.md +15 -0
- package/docs/guides/troubleshooting.md +101 -0
- package/docs/guides/writing-custom-roles.md +15 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-design.md +592 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +598 -0
- package/docs/plans/2026-03-15-docs-enforcement-plan.md +238 -0
- package/docs/readmes/INDEX.md +99 -0
- package/docs/readmes/features/expertise/README.md +171 -0
- package/docs/readmes/features/exports/README.md +222 -0
- package/docs/readmes/features/hooks/README.md +103 -0
- package/docs/readmes/features/hooks/loop-cap-guard.md +133 -0
- package/docs/readmes/features/hooks/post-tool-capture.md +121 -0
- package/docs/readmes/features/hooks/post-tool-lint.md +130 -0
- package/docs/readmes/features/hooks/pre-compact-summary.md +122 -0
- package/docs/readmes/features/hooks/pre-tool-capture-route.md +100 -0
- package/docs/readmes/features/hooks/protected-path-write-guard.md +128 -0
- package/docs/readmes/features/hooks/session-start.md +119 -0
- package/docs/readmes/features/hooks/stop-handoff-harvest.md +125 -0
- package/docs/readmes/features/roles/README.md +157 -0
- package/docs/readmes/features/roles/clarifier.md +152 -0
- package/docs/readmes/features/roles/content-author.md +190 -0
- package/docs/readmes/features/roles/designer.md +193 -0
- package/docs/readmes/features/roles/executor.md +184 -0
- package/docs/readmes/features/roles/learner.md +210 -0
- package/docs/readmes/features/roles/planner.md +182 -0
- package/docs/readmes/features/roles/researcher.md +164 -0
- package/docs/readmes/features/roles/reviewer.md +184 -0
- package/docs/readmes/features/roles/specifier.md +162 -0
- package/docs/readmes/features/roles/verifier.md +215 -0
- package/docs/readmes/features/schemas/README.md +178 -0
- package/docs/readmes/features/skills/README.md +63 -0
- package/docs/readmes/features/skills/brainstorming.md +96 -0
- package/docs/readmes/features/skills/debugging.md +148 -0
- package/docs/readmes/features/skills/design.md +120 -0
- package/docs/readmes/features/skills/prepare-next.md +109 -0
- package/docs/readmes/features/skills/run-audit.md +159 -0
- package/docs/readmes/features/skills/scan-project.md +109 -0
- package/docs/readmes/features/skills/self-audit.md +176 -0
- package/docs/readmes/features/skills/tdd.md +137 -0
- package/docs/readmes/features/skills/using-skills.md +92 -0
- package/docs/readmes/features/skills/verification.md +120 -0
- package/docs/readmes/features/skills/writing-plans.md +104 -0
- package/docs/readmes/features/tooling/README.md +320 -0
- package/docs/readmes/features/workflows/README.md +186 -0
- package/docs/readmes/features/workflows/author.md +181 -0
- package/docs/readmes/features/workflows/clarify.md +154 -0
- package/docs/readmes/features/workflows/design-review.md +171 -0
- package/docs/readmes/features/workflows/design.md +169 -0
- package/docs/readmes/features/workflows/discover.md +162 -0
- package/docs/readmes/features/workflows/execute.md +173 -0
- package/docs/readmes/features/workflows/learn.md +167 -0
- package/docs/readmes/features/workflows/plan-review.md +165 -0
- package/docs/readmes/features/workflows/plan.md +170 -0
- package/docs/readmes/features/workflows/prepare-next.md +167 -0
- package/docs/readmes/features/workflows/review.md +169 -0
- package/docs/readmes/features/workflows/run-audit.md +191 -0
- package/docs/readmes/features/workflows/spec-challenge.md +159 -0
- package/docs/readmes/features/workflows/specify.md +160 -0
- package/docs/readmes/features/workflows/verify.md +177 -0
- package/docs/readmes/packages/README.md +50 -0
- package/docs/readmes/packages/ajv.md +117 -0
- package/docs/readmes/packages/context-mode.md +118 -0
- package/docs/readmes/packages/gray-matter.md +116 -0
- package/docs/readmes/packages/node-test.md +137 -0
- package/docs/readmes/packages/yaml.md +112 -0
- package/docs/reference/configuration-reference.md +159 -0
- package/docs/reference/expertise-index.md +52 -0
- package/docs/reference/git-flow.md +43 -0
- package/docs/reference/hooks.md +87 -0
- package/docs/reference/host-exports.md +50 -0
- package/docs/reference/launch-checklist.md +172 -0
- package/docs/reference/marketplace-listings.md +76 -0
- package/docs/reference/release-process.md +34 -0
- package/docs/reference/roles-reference.md +77 -0
- package/docs/reference/skills.md +33 -0
- package/docs/reference/templates.md +29 -0
- package/docs/reference/tooling-cli.md +94 -0
- package/docs/truth-claims.yaml +222 -0
- package/expertise/PROGRESS.md +63 -0
- package/expertise/README.md +18 -0
- package/expertise/antipatterns/PROGRESS.md +56 -0
- package/expertise/antipatterns/backend/api-design-antipatterns.md +1271 -0
- package/expertise/antipatterns/backend/auth-antipatterns.md +1195 -0
- package/expertise/antipatterns/backend/caching-antipatterns.md +622 -0
- package/expertise/antipatterns/backend/database-antipatterns.md +1038 -0
- package/expertise/antipatterns/backend/index.md +24 -0
- package/expertise/antipatterns/backend/microservices-antipatterns.md +850 -0
- package/expertise/antipatterns/code/architecture-antipatterns.md +919 -0
- package/expertise/antipatterns/code/async-antipatterns.md +622 -0
- package/expertise/antipatterns/code/code-smells.md +1186 -0
- package/expertise/antipatterns/code/dependency-antipatterns.md +1209 -0
- package/expertise/antipatterns/code/error-handling-antipatterns.md +1360 -0
- package/expertise/antipatterns/code/index.md +27 -0
- package/expertise/antipatterns/code/naming-and-abstraction.md +1118 -0
- package/expertise/antipatterns/code/state-management-antipatterns.md +1076 -0
- package/expertise/antipatterns/code/testing-antipatterns.md +1053 -0
- package/expertise/antipatterns/design/accessibility-antipatterns.md +1136 -0
- package/expertise/antipatterns/design/dark-patterns.md +1121 -0
- package/expertise/antipatterns/design/index.md +22 -0
- package/expertise/antipatterns/design/ui-antipatterns.md +1202 -0
- package/expertise/antipatterns/design/ux-antipatterns.md +680 -0
- package/expertise/antipatterns/frontend/css-layout-antipatterns.md +691 -0
- package/expertise/antipatterns/frontend/flutter-antipatterns.md +1827 -0
- package/expertise/antipatterns/frontend/index.md +23 -0
- package/expertise/antipatterns/frontend/mobile-antipatterns.md +573 -0
- package/expertise/antipatterns/frontend/react-antipatterns.md +1128 -0
- package/expertise/antipatterns/frontend/spa-antipatterns.md +1235 -0
- package/expertise/antipatterns/index.md +31 -0
- package/expertise/antipatterns/performance/index.md +20 -0
- package/expertise/antipatterns/performance/performance-antipatterns.md +1013 -0
- package/expertise/antipatterns/performance/premature-optimization.md +623 -0
- package/expertise/antipatterns/performance/scaling-antipatterns.md +785 -0
- package/expertise/antipatterns/process/ai-coding-antipatterns.md +853 -0
- package/expertise/antipatterns/process/code-review-antipatterns.md +656 -0
- package/expertise/antipatterns/process/deployment-antipatterns.md +920 -0
- package/expertise/antipatterns/process/index.md +23 -0
- package/expertise/antipatterns/process/technical-debt-antipatterns.md +647 -0
- package/expertise/antipatterns/security/index.md +20 -0
- package/expertise/antipatterns/security/secrets-antipatterns.md +849 -0
- package/expertise/antipatterns/security/security-theater.md +843 -0
- package/expertise/antipatterns/security/vulnerability-patterns.md +801 -0
- package/expertise/architecture/PROGRESS.md +70 -0
- package/expertise/architecture/data/caching-architecture.md +671 -0
- package/expertise/architecture/data/data-consistency.md +574 -0
- package/expertise/architecture/data/data-modeling.md +536 -0
- package/expertise/architecture/data/event-streams-and-queues.md +634 -0
- package/expertise/architecture/data/index.md +25 -0
- package/expertise/architecture/data/search-architecture.md +663 -0
- package/expertise/architecture/data/sql-vs-nosql.md +708 -0
- package/expertise/architecture/decisions/architecture-decision-records.md +640 -0
- package/expertise/architecture/decisions/build-vs-buy.md +616 -0
- package/expertise/architecture/decisions/index.md +23 -0
- package/expertise/architecture/decisions/monolith-to-microservices.md +790 -0
- package/expertise/architecture/decisions/technology-selection.md +616 -0
- package/expertise/architecture/distributed/cap-theorem-and-tradeoffs.md +800 -0
- package/expertise/architecture/distributed/circuit-breaker-bulkhead.md +741 -0
- package/expertise/architecture/distributed/consensus-and-coordination.md +796 -0
- package/expertise/architecture/distributed/distributed-systems-fundamentals.md +564 -0
- package/expertise/architecture/distributed/idempotency-and-retry.md +796 -0
- package/expertise/architecture/distributed/index.md +25 -0
- package/expertise/architecture/distributed/saga-pattern.md +797 -0
- package/expertise/architecture/foundations/architectural-thinking.md +460 -0
- package/expertise/architecture/foundations/coupling-and-cohesion.md +770 -0
- package/expertise/architecture/foundations/design-principles-solid.md +649 -0
- package/expertise/architecture/foundations/domain-driven-design.md +719 -0
- package/expertise/architecture/foundations/index.md +25 -0
- package/expertise/architecture/foundations/separation-of-concerns.md +472 -0
- package/expertise/architecture/foundations/twelve-factor-app.md +797 -0
- package/expertise/architecture/index.md +34 -0
- package/expertise/architecture/integration/api-design-graphql.md +638 -0
- package/expertise/architecture/integration/api-design-grpc.md +804 -0
- package/expertise/architecture/integration/api-design-rest.md +892 -0
- package/expertise/architecture/integration/index.md +25 -0
- package/expertise/architecture/integration/third-party-integration.md +795 -0
- package/expertise/architecture/integration/webhooks-and-callbacks.md +1152 -0
- package/expertise/architecture/integration/websockets-realtime.md +791 -0
- package/expertise/architecture/mobile-architecture/index.md +22 -0
- package/expertise/architecture/mobile-architecture/mobile-app-architecture.md +780 -0
- package/expertise/architecture/mobile-architecture/mobile-backend-for-frontend.md +670 -0
- package/expertise/architecture/mobile-architecture/offline-first.md +719 -0
- package/expertise/architecture/mobile-architecture/push-and-sync.md +782 -0
- package/expertise/architecture/patterns/cqrs-event-sourcing.md +717 -0
- package/expertise/architecture/patterns/event-driven.md +797 -0
- package/expertise/architecture/patterns/hexagonal-clean-architecture.md +870 -0
- package/expertise/architecture/patterns/index.md +27 -0
- package/expertise/architecture/patterns/layered-architecture.md +736 -0
- package/expertise/architecture/patterns/microservices.md +753 -0
- package/expertise/architecture/patterns/modular-monolith.md +692 -0
- package/expertise/architecture/patterns/monolith.md +626 -0
- package/expertise/architecture/patterns/plugin-architecture.md +735 -0
- package/expertise/architecture/patterns/serverless.md +780 -0
- package/expertise/architecture/scaling/database-scaling.md +615 -0
- package/expertise/architecture/scaling/feature-flags-and-rollouts.md +757 -0
- package/expertise/architecture/scaling/horizontal-vs-vertical.md +606 -0
- package/expertise/architecture/scaling/index.md +24 -0
- package/expertise/architecture/scaling/multi-tenancy.md +800 -0
- package/expertise/architecture/scaling/stateless-design.md +787 -0
- package/expertise/backend/embedded-firmware.md +625 -0
- package/expertise/backend/go.md +853 -0
- package/expertise/backend/index.md +24 -0
- package/expertise/backend/java-spring.md +448 -0
- package/expertise/backend/node-typescript.md +625 -0
- package/expertise/backend/python-fastapi.md +724 -0
- package/expertise/backend/rust.md +458 -0
- package/expertise/backend/solidity.md +711 -0
- package/expertise/composition-map.yaml +443 -0
- package/expertise/content/foundations/content-modeling.md +395 -0
- package/expertise/content/foundations/editorial-standards.md +449 -0
- package/expertise/content/foundations/index.md +24 -0
- package/expertise/content/foundations/microcopy.md +455 -0
- package/expertise/content/foundations/terminology-governance.md +509 -0
- package/expertise/content/index.md +34 -0
- package/expertise/content/patterns/accessibility-copy.md +518 -0
- package/expertise/content/patterns/index.md +24 -0
- package/expertise/content/patterns/notification-content.md +433 -0
- package/expertise/content/patterns/sample-content.md +486 -0
- package/expertise/content/patterns/state-copy.md +439 -0
- package/expertise/design/PROGRESS.md +58 -0
- package/expertise/design/disciplines/dark-mode-theming.md +577 -0
- package/expertise/design/disciplines/design-systems.md +595 -0
- package/expertise/design/disciplines/index.md +25 -0
- package/expertise/design/disciplines/information-architecture.md +800 -0
- package/expertise/design/disciplines/interaction-design.md +788 -0
- package/expertise/design/disciplines/responsive-design.md +552 -0
- package/expertise/design/disciplines/usability-testing.md +516 -0
- package/expertise/design/disciplines/user-research.md +792 -0
- package/expertise/design/foundations/accessibility-design.md +796 -0
- package/expertise/design/foundations/color-theory.md +797 -0
- package/expertise/design/foundations/iconography.md +795 -0
- package/expertise/design/foundations/index.md +26 -0
- package/expertise/design/foundations/motion-and-animation.md +653 -0
- package/expertise/design/foundations/rtl-design.md +585 -0
- package/expertise/design/foundations/spacing-and-layout.md +607 -0
- package/expertise/design/foundations/typography.md +800 -0
- package/expertise/design/foundations/visual-hierarchy.md +761 -0
- package/expertise/design/index.md +32 -0
- package/expertise/design/patterns/authentication-flows.md +474 -0
- package/expertise/design/patterns/content-consumption.md +789 -0
- package/expertise/design/patterns/data-display.md +618 -0
- package/expertise/design/patterns/e-commerce.md +1494 -0
- package/expertise/design/patterns/feedback-and-states.md +642 -0
- package/expertise/design/patterns/forms-and-input.md +819 -0
- package/expertise/design/patterns/gamification.md +801 -0
- package/expertise/design/patterns/index.md +31 -0
- package/expertise/design/patterns/microinteractions.md +449 -0
- package/expertise/design/patterns/navigation.md +800 -0
- package/expertise/design/patterns/notifications.md +705 -0
- package/expertise/design/patterns/onboarding.md +700 -0
- package/expertise/design/patterns/search-and-filter.md +601 -0
- package/expertise/design/patterns/settings-and-preferences.md +768 -0
- package/expertise/design/patterns/social-and-community.md +748 -0
- package/expertise/design/platforms/desktop-native.md +612 -0
- package/expertise/design/platforms/index.md +25 -0
- package/expertise/design/platforms/mobile-android.md +825 -0
- package/expertise/design/platforms/mobile-cross-platform.md +983 -0
- package/expertise/design/platforms/mobile-ios.md +699 -0
- package/expertise/design/platforms/tablet.md +794 -0
- package/expertise/design/platforms/web-dashboard.md +790 -0
- package/expertise/design/platforms/web-responsive.md +550 -0
- package/expertise/design/psychology/behavioral-nudges.md +449 -0
- package/expertise/design/psychology/cognitive-load.md +1191 -0
- package/expertise/design/psychology/error-psychology.md +778 -0
- package/expertise/design/psychology/index.md +22 -0
- package/expertise/design/psychology/persuasive-design.md +736 -0
- package/expertise/design/psychology/user-mental-models.md +623 -0
- package/expertise/design/tooling/open-pencil.md +266 -0
- package/expertise/frontend/angular.md +1073 -0
- package/expertise/frontend/desktop-electron.md +546 -0
- package/expertise/frontend/flutter.md +782 -0
- package/expertise/frontend/index.md +27 -0
- package/expertise/frontend/native-android.md +409 -0
- package/expertise/frontend/native-ios.md +490 -0
- package/expertise/frontend/react-native.md +1160 -0
- package/expertise/frontend/react.md +808 -0
- package/expertise/frontend/vue.md +1089 -0
- package/expertise/humanize/domain-rules-code.md +79 -0
- package/expertise/humanize/domain-rules-content.md +67 -0
- package/expertise/humanize/domain-rules-technical-docs.md +56 -0
- package/expertise/humanize/index.md +35 -0
- package/expertise/humanize/self-audit-checklist.md +87 -0
- package/expertise/humanize/sentence-patterns.md +218 -0
- package/expertise/humanize/vocabulary-blacklist.md +105 -0
- package/expertise/i18n/PROGRESS.md +65 -0
- package/expertise/i18n/advanced/accessibility-and-i18n.md +28 -0
- package/expertise/i18n/advanced/bidirectional-text-algorithm.md +38 -0
- package/expertise/i18n/advanced/complex-scripts.md +30 -0
- package/expertise/i18n/advanced/performance-and-i18n.md +27 -0
- package/expertise/i18n/advanced/testing-i18n.md +28 -0
- package/expertise/i18n/content/content-adaptation.md +23 -0
- package/expertise/i18n/content/locale-specific-formatting.md +23 -0
- package/expertise/i18n/content/machine-translation-integration.md +28 -0
- package/expertise/i18n/content/translation-management.md +29 -0
- package/expertise/i18n/foundations/date-time-calendars.md +67 -0
- package/expertise/i18n/foundations/i18n-architecture.md +272 -0
- package/expertise/i18n/foundations/locale-and-language-tags.md +79 -0
- package/expertise/i18n/foundations/numbers-currency-units.md +61 -0
- package/expertise/i18n/foundations/pluralization-and-gender.md +109 -0
- package/expertise/i18n/foundations/string-externalization.md +236 -0
- package/expertise/i18n/foundations/text-direction-bidi.md +241 -0
- package/expertise/i18n/foundations/unicode-and-encoding.md +86 -0
- package/expertise/i18n/index.md +38 -0
- package/expertise/i18n/platform/backend-i18n.md +31 -0
- package/expertise/i18n/platform/flutter-i18n.md +148 -0
- package/expertise/i18n/platform/native-android-i18n.md +36 -0
- package/expertise/i18n/platform/native-ios-i18n.md +36 -0
- package/expertise/i18n/platform/react-i18n.md +103 -0
- package/expertise/i18n/platform/web-css-i18n.md +81 -0
- package/expertise/i18n/rtl/arabic-specific.md +175 -0
- package/expertise/i18n/rtl/hebrew-specific.md +149 -0
- package/expertise/i18n/rtl/rtl-animations-and-transitions.md +111 -0
- package/expertise/i18n/rtl/rtl-forms-and-input.md +161 -0
- package/expertise/i18n/rtl/rtl-fundamentals.md +211 -0
- package/expertise/i18n/rtl/rtl-icons-and-images.md +181 -0
- package/expertise/i18n/rtl/rtl-layout-mirroring.md +252 -0
- package/expertise/i18n/rtl/rtl-navigation-and-gestures.md +107 -0
- package/expertise/i18n/rtl/rtl-testing-and-qa.md +147 -0
- package/expertise/i18n/rtl/rtl-typography.md +160 -0
- package/expertise/index.md +113 -0
- package/expertise/index.yaml +216 -0
- package/expertise/infrastructure/cloud-aws.md +597 -0
- package/expertise/infrastructure/cloud-gcp.md +599 -0
- package/expertise/infrastructure/cybersecurity.md +816 -0
- package/expertise/infrastructure/database-mongodb.md +447 -0
- package/expertise/infrastructure/database-postgres.md +400 -0
- package/expertise/infrastructure/devops-cicd.md +787 -0
- package/expertise/infrastructure/index.md +27 -0
- package/expertise/performance/PROGRESS.md +50 -0
- package/expertise/performance/backend/api-latency.md +1204 -0
- package/expertise/performance/backend/background-jobs.md +506 -0
- package/expertise/performance/backend/connection-pooling.md +1209 -0
- package/expertise/performance/backend/database-query-optimization.md +515 -0
- package/expertise/performance/backend/index.md +23 -0
- package/expertise/performance/backend/rate-limiting-and-throttling.md +971 -0
- package/expertise/performance/foundations/algorithmic-complexity.md +954 -0
- package/expertise/performance/foundations/caching-strategies.md +489 -0
- package/expertise/performance/foundations/concurrency-and-parallelism.md +847 -0
- package/expertise/performance/foundations/index.md +24 -0
- package/expertise/performance/foundations/measuring-and-profiling.md +440 -0
- package/expertise/performance/foundations/memory-management.md +964 -0
- package/expertise/performance/foundations/performance-budgets.md +1314 -0
- package/expertise/performance/index.md +31 -0
- package/expertise/performance/infrastructure/auto-scaling.md +1059 -0
- package/expertise/performance/infrastructure/cdn-and-edge.md +1081 -0
- package/expertise/performance/infrastructure/index.md +22 -0
- package/expertise/performance/infrastructure/load-balancing.md +1081 -0
- package/expertise/performance/infrastructure/observability.md +1079 -0
- package/expertise/performance/mobile/index.md +23 -0
- package/expertise/performance/mobile/mobile-animations.md +544 -0
- package/expertise/performance/mobile/mobile-memory-battery.md +416 -0
- package/expertise/performance/mobile/mobile-network.md +452 -0
- package/expertise/performance/mobile/mobile-rendering.md +599 -0
- package/expertise/performance/mobile/mobile-startup-time.md +505 -0
- package/expertise/performance/platform-specific/flutter-performance.md +647 -0
- package/expertise/performance/platform-specific/index.md +22 -0
- package/expertise/performance/platform-specific/node-performance.md +1307 -0
- package/expertise/performance/platform-specific/postgres-performance.md +1366 -0
- package/expertise/performance/platform-specific/react-performance.md +1403 -0
- package/expertise/performance/web/bundle-optimization.md +1239 -0
- package/expertise/performance/web/image-and-media.md +636 -0
- package/expertise/performance/web/index.md +24 -0
- package/expertise/performance/web/network-optimization.md +1133 -0
- package/expertise/performance/web/rendering-performance.md +1098 -0
- package/expertise/performance/web/ssr-and-hydration.md +918 -0
- package/expertise/performance/web/web-vitals.md +1374 -0
- package/expertise/quality/accessibility.md +985 -0
- package/expertise/quality/evidence-based-verification.md +499 -0
- package/expertise/quality/index.md +24 -0
- package/expertise/quality/ml-model-audit.md +614 -0
- package/expertise/quality/performance.md +600 -0
- package/expertise/quality/testing-api.md +891 -0
- package/expertise/quality/testing-mobile.md +496 -0
- package/expertise/quality/testing-web.md +849 -0
- package/expertise/security/PROGRESS.md +54 -0
- package/expertise/security/agentic-identity.md +540 -0
- package/expertise/security/compliance-frameworks.md +601 -0
- package/expertise/security/data/data-encryption.md +364 -0
- package/expertise/security/data/data-privacy-gdpr.md +692 -0
- package/expertise/security/data/database-security.md +1171 -0
- package/expertise/security/data/index.md +22 -0
- package/expertise/security/data/pii-handling.md +531 -0
- package/expertise/security/foundations/authentication.md +1041 -0
- package/expertise/security/foundations/authorization.md +603 -0
- package/expertise/security/foundations/cryptography.md +1001 -0
- package/expertise/security/foundations/index.md +25 -0
- package/expertise/security/foundations/owasp-top-10.md +1354 -0
- package/expertise/security/foundations/secrets-management.md +1217 -0
- package/expertise/security/foundations/secure-sdlc.md +700 -0
- package/expertise/security/foundations/supply-chain-security.md +698 -0
- package/expertise/security/index.md +31 -0
- package/expertise/security/infrastructure/cloud-security-aws.md +1296 -0
- package/expertise/security/infrastructure/cloud-security-gcp.md +1376 -0
- package/expertise/security/infrastructure/container-security.md +721 -0
- package/expertise/security/infrastructure/incident-response.md +1295 -0
- package/expertise/security/infrastructure/index.md +24 -0
- package/expertise/security/infrastructure/logging-and-monitoring.md +1618 -0
- package/expertise/security/infrastructure/network-security.md +1337 -0
- package/expertise/security/mobile/index.md +23 -0
- package/expertise/security/mobile/mobile-android-security.md +1218 -0
- package/expertise/security/mobile/mobile-binary-protection.md +1229 -0
- package/expertise/security/mobile/mobile-data-storage.md +1265 -0
- package/expertise/security/mobile/mobile-ios-security.md +1401 -0
- package/expertise/security/mobile/mobile-network-security.md +1520 -0
- package/expertise/security/smart-contract-security.md +594 -0
- package/expertise/security/testing/index.md +22 -0
- package/expertise/security/testing/penetration-testing.md +1258 -0
- package/expertise/security/testing/security-code-review.md +1765 -0
- package/expertise/security/testing/threat-modeling.md +1074 -0
- package/expertise/security/testing/vulnerability-scanning.md +1062 -0
- package/expertise/security/web/api-security.md +586 -0
- package/expertise/security/web/cors-and-headers.md +433 -0
- package/expertise/security/web/csrf.md +562 -0
- package/expertise/security/web/file-upload.md +1477 -0
- package/expertise/security/web/index.md +25 -0
- package/expertise/security/web/injection.md +1375 -0
- package/expertise/security/web/session-management.md +1101 -0
- package/expertise/security/web/xss.md +1158 -0
- package/exports/README.md +17 -0
- package/exports/hosts/claude/.claude/agents/clarifier.md +42 -0
- package/exports/hosts/claude/.claude/agents/content-author.md +63 -0
- package/exports/hosts/claude/.claude/agents/designer.md +55 -0
- package/exports/hosts/claude/.claude/agents/executor.md +55 -0
- package/exports/hosts/claude/.claude/agents/learner.md +51 -0
- package/exports/hosts/claude/.claude/agents/planner.md +53 -0
- package/exports/hosts/claude/.claude/agents/researcher.md +43 -0
- package/exports/hosts/claude/.claude/agents/reviewer.md +54 -0
- package/exports/hosts/claude/.claude/agents/specifier.md +47 -0
- package/exports/hosts/claude/.claude/agents/verifier.md +71 -0
- package/exports/hosts/claude/.claude/commands/author.md +42 -0
- package/exports/hosts/claude/.claude/commands/clarify.md +38 -0
- package/exports/hosts/claude/.claude/commands/design-review.md +46 -0
- package/exports/hosts/claude/.claude/commands/design.md +44 -0
- package/exports/hosts/claude/.claude/commands/discover.md +37 -0
- package/exports/hosts/claude/.claude/commands/execute.md +48 -0
- package/exports/hosts/claude/.claude/commands/learn.md +38 -0
- package/exports/hosts/claude/.claude/commands/plan-review.md +42 -0
- package/exports/hosts/claude/.claude/commands/plan.md +39 -0
- package/exports/hosts/claude/.claude/commands/prepare-next.md +37 -0
- package/exports/hosts/claude/.claude/commands/review.md +40 -0
- package/exports/hosts/claude/.claude/commands/run-audit.md +41 -0
- package/exports/hosts/claude/.claude/commands/spec-challenge.md +41 -0
- package/exports/hosts/claude/.claude/commands/specify.md +38 -0
- package/exports/hosts/claude/.claude/commands/verify.md +37 -0
- package/exports/hosts/claude/.claude/settings.json +34 -0
- package/exports/hosts/claude/CLAUDE.md +19 -0
- package/exports/hosts/claude/export.manifest.json +38 -0
- package/exports/hosts/claude/host-package.json +67 -0
- package/exports/hosts/codex/AGENTS.md +19 -0
- package/exports/hosts/codex/export.manifest.json +38 -0
- package/exports/hosts/codex/host-package.json +41 -0
- package/exports/hosts/cursor/.cursor/hooks.json +16 -0
- package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +19 -0
- package/exports/hosts/cursor/export.manifest.json +38 -0
- package/exports/hosts/cursor/host-package.json +42 -0
- package/exports/hosts/gemini/GEMINI.md +19 -0
- package/exports/hosts/gemini/export.manifest.json +38 -0
- package/exports/hosts/gemini/host-package.json +41 -0
- package/hooks/README.md +18 -0
- package/hooks/definitions/loop_cap_guard.yaml +21 -0
- package/hooks/definitions/post_tool_capture.yaml +24 -0
- package/hooks/definitions/pre_compact_summary.yaml +19 -0
- package/hooks/definitions/pre_tool_capture_route.yaml +19 -0
- package/hooks/definitions/protected_path_write_guard.yaml +19 -0
- package/hooks/definitions/session_start.yaml +19 -0
- package/hooks/definitions/stop_handoff_harvest.yaml +20 -0
- package/hooks/loop-cap-guard +17 -0
- package/hooks/post-tool-lint +36 -0
- package/hooks/protected-path-write-guard +17 -0
- package/hooks/session-start +41 -0
- package/llms-full.txt +2355 -0
- package/llms.txt +43 -0
- package/package.json +79 -0
- package/roles/README.md +20 -0
- package/roles/clarifier.md +42 -0
- package/roles/content-author.md +63 -0
- package/roles/designer.md +55 -0
- package/roles/executor.md +55 -0
- package/roles/learner.md +51 -0
- package/roles/planner.md +53 -0
- package/roles/researcher.md +43 -0
- package/roles/reviewer.md +54 -0
- package/roles/specifier.md +47 -0
- package/roles/verifier.md +71 -0
- package/schemas/README.md +24 -0
- package/schemas/accepted-learning.schema.json +20 -0
- package/schemas/author-artifact.schema.json +156 -0
- package/schemas/clarification.schema.json +19 -0
- package/schemas/design-artifact.schema.json +80 -0
- package/schemas/docs-claim.schema.json +18 -0
- package/schemas/export-manifest.schema.json +20 -0
- package/schemas/hook.schema.json +67 -0
- package/schemas/host-export-package.schema.json +18 -0
- package/schemas/implementation-plan.schema.json +19 -0
- package/schemas/proposed-learning.schema.json +19 -0
- package/schemas/research.schema.json +18 -0
- package/schemas/review.schema.json +29 -0
- package/schemas/run-manifest.schema.json +18 -0
- package/schemas/spec-challenge.schema.json +18 -0
- package/schemas/spec.schema.json +20 -0
- package/schemas/usage.schema.json +102 -0
- package/schemas/verification-proof.schema.json +29 -0
- package/schemas/wazir-manifest.schema.json +173 -0
- package/skills/README.md +40 -0
- package/skills/brainstorming/SKILL.md +77 -0
- package/skills/debugging/SKILL.md +50 -0
- package/skills/design/SKILL.md +61 -0
- package/skills/dispatching-parallel-agents/SKILL.md +128 -0
- package/skills/executing-plans/SKILL.md +70 -0
- package/skills/finishing-a-development-branch/SKILL.md +169 -0
- package/skills/humanize/SKILL.md +123 -0
- package/skills/init-pipeline/SKILL.md +124 -0
- package/skills/prepare-next/SKILL.md +20 -0
- package/skills/receiving-code-review/SKILL.md +123 -0
- package/skills/requesting-code-review/SKILL.md +105 -0
- package/skills/requesting-code-review/code-reviewer.md +108 -0
- package/skills/run-audit/SKILL.md +197 -0
- package/skills/scan-project/SKILL.md +41 -0
- package/skills/self-audit/SKILL.md +153 -0
- package/skills/subagent-driven-development/SKILL.md +154 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
- package/skills/subagent-driven-development/implementer-prompt.md +102 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
- package/skills/tdd/SKILL.md +23 -0
- package/skills/using-git-worktrees/SKILL.md +163 -0
- package/skills/using-skills/SKILL.md +95 -0
- package/skills/verification/SKILL.md +22 -0
- package/skills/wazir/SKILL.md +463 -0
- package/skills/writing-plans/SKILL.md +30 -0
- package/skills/writing-skills/SKILL.md +157 -0
- package/skills/writing-skills/anthropic-best-practices.md +122 -0
- package/skills/writing-skills/persuasion-principles.md +50 -0
- package/templates/README.md +20 -0
- package/templates/artifacts/README.md +10 -0
- package/templates/artifacts/accepted-learning.md +19 -0
- package/templates/artifacts/accepted-learning.template.json +12 -0
- package/templates/artifacts/author.md +74 -0
- package/templates/artifacts/author.template.json +19 -0
- package/templates/artifacts/clarification.md +21 -0
- package/templates/artifacts/clarification.template.json +12 -0
- package/templates/artifacts/execute-notes.md +19 -0
- package/templates/artifacts/implementation-plan.md +21 -0
- package/templates/artifacts/implementation-plan.template.json +11 -0
- package/templates/artifacts/learning-proposal.md +19 -0
- package/templates/artifacts/next-run-handoff.md +21 -0
- package/templates/artifacts/plan-review.md +19 -0
- package/templates/artifacts/proposed-learning.template.json +12 -0
- package/templates/artifacts/research.md +21 -0
- package/templates/artifacts/research.template.json +12 -0
- package/templates/artifacts/review-findings.md +19 -0
- package/templates/artifacts/review.template.json +11 -0
- package/templates/artifacts/run-manifest.template.json +8 -0
- package/templates/artifacts/spec-challenge.md +19 -0
- package/templates/artifacts/spec-challenge.template.json +11 -0
- package/templates/artifacts/spec.md +21 -0
- package/templates/artifacts/spec.template.json +12 -0
- package/templates/artifacts/verification-proof.md +19 -0
- package/templates/artifacts/verification-proof.template.json +11 -0
- package/templates/examples/accepted-learning.example.json +14 -0
- package/templates/examples/author.example.json +152 -0
- package/templates/examples/clarification.example.json +15 -0
- package/templates/examples/docs-claim.example.json +8 -0
- package/templates/examples/export-manifest.example.json +7 -0
- package/templates/examples/host-export-package.example.json +11 -0
- package/templates/examples/implementation-plan.example.json +17 -0
- package/templates/examples/proposed-learning.example.json +13 -0
- package/templates/examples/research.example.json +15 -0
- package/templates/examples/research.example.md +6 -0
- package/templates/examples/review.example.json +17 -0
- package/templates/examples/run-manifest.example.json +9 -0
- package/templates/examples/spec-challenge.example.json +14 -0
- package/templates/examples/spec.example.json +21 -0
- package/templates/examples/verification-proof.example.json +21 -0
- package/templates/examples/wazir-manifest.example.yaml +65 -0
- package/templates/task-definition-schema.md +99 -0
- package/tooling/README.md +20 -0
- package/tooling/src/adapters/context-mode.js +50 -0
- package/tooling/src/capture/command.js +376 -0
- package/tooling/src/capture/store.js +99 -0
- package/tooling/src/capture/usage.js +270 -0
- package/tooling/src/checks/branches.js +50 -0
- package/tooling/src/checks/brand-truth.js +110 -0
- package/tooling/src/checks/changelog.js +231 -0
- package/tooling/src/checks/command-registry.js +36 -0
- package/tooling/src/checks/commits.js +102 -0
- package/tooling/src/checks/docs-drift.js +103 -0
- package/tooling/src/checks/docs-truth.js +201 -0
- package/tooling/src/checks/runtime-surface.js +156 -0
- package/tooling/src/cli.js +116 -0
- package/tooling/src/command-options.js +56 -0
- package/tooling/src/commands/validate.js +320 -0
- package/tooling/src/doctor/command.js +91 -0
- package/tooling/src/export/command.js +77 -0
- package/tooling/src/export/compiler.js +498 -0
- package/tooling/src/guards/loop-cap-guard.js +52 -0
- package/tooling/src/guards/protected-path-write-guard.js +67 -0
- package/tooling/src/index/command.js +152 -0
- package/tooling/src/index/storage.js +1061 -0
- package/tooling/src/index/summarizers.js +261 -0
- package/tooling/src/loaders.js +18 -0
- package/tooling/src/project-root.js +22 -0
- package/tooling/src/recall/command.js +225 -0
- package/tooling/src/schema-validator.js +30 -0
- package/tooling/src/state-root.js +40 -0
- package/tooling/src/status/command.js +71 -0
- package/wazir.manifest.yaml +135 -0
- package/workflows/README.md +19 -0
- package/workflows/author.md +42 -0
- package/workflows/clarify.md +38 -0
- package/workflows/design-review.md +46 -0
- package/workflows/design.md +44 -0
- package/workflows/discover.md +37 -0
- package/workflows/execute.md +48 -0
- package/workflows/learn.md +38 -0
- package/workflows/plan-review.md +42 -0
- package/workflows/plan.md +39 -0
- package/workflows/prepare-next.md +37 -0
- package/workflows/review.md +40 -0
- package/workflows/run-audit.md +41 -0
- package/workflows/spec-challenge.md +41 -0
- package/workflows/specify.md +38 -0
- package/workflows/verify.md +37 -0
|
@@ -0,0 +1,800 @@
|
|
|
1
|
+
# CAP Theorem and Tradeoffs -- Architecture Expertise Module
|
|
2
|
+
|
|
3
|
+
> The CAP theorem states that a distributed system can provide at most two of three guarantees:
|
|
4
|
+
> Consistency, Availability, and Partition tolerance. Since network partitions are inevitable,
|
|
5
|
+
> the real choice is between consistency and availability during a partition. PACELC extends
|
|
6
|
+
> this: even without partitions, there is a latency vs consistency tradeoff.
|
|
7
|
+
|
|
8
|
+
> **Category:** Distributed
|
|
9
|
+
> **Complexity:** Complex
|
|
10
|
+
> **Applies when:** Designing data replication strategy, choosing databases, or deciding consistency guarantees for distributed services.
|
|
11
|
+
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
## What This Is
|
|
15
|
+
|
|
16
|
+
### The CAP Theorem
|
|
17
|
+
|
|
18
|
+
Eric Brewer introduced the CAP conjecture in his keynote at the ACM PODC symposium in 2000.
|
|
19
|
+
Seth Gilbert and Nancy Lynch of MIT formally proved it in 2002, elevating it from conjecture
|
|
20
|
+
to theorem. The theorem states that any distributed data store can provide at most two of the
|
|
21
|
+
following three guarantees simultaneously:
|
|
22
|
+
|
|
23
|
+
**Consistency (C):** Every read receives the most recent write or an error. All nodes in the
|
|
24
|
+
system see the same data at the same time. When a client writes a value, every subsequent
|
|
25
|
+
read from any node must return that value (or a more recent one). This is linearizable
|
|
26
|
+
consistency -- the strongest form -- not eventual consistency.
|
|
27
|
+
|
|
28
|
+
**Availability (A):** Every request to a non-failing node receives a non-error response,
|
|
29
|
+
without the guarantee that it contains the most recent write. The system remains operational
|
|
30
|
+
and responsive. In CAP's formal definition, *every* request must eventually receive a
|
|
31
|
+
response -- there is no timeout constraint.
|
|
32
|
+
|
|
33
|
+
**Partition Tolerance (P):** The system continues to operate despite an arbitrary number of
|
|
34
|
+
messages being dropped or delayed by the network between nodes. A partition is a
|
|
35
|
+
communication break -- a lost or temporarily delayed connection between two nodes or groups
|
|
36
|
+
of nodes.
|
|
37
|
+
|
|
38
|
+
### The Real Choice: CP or AP
|
|
39
|
+
|
|
40
|
+
The critical insight that Brewer himself clarified in 2012 is that CAP is not about
|
|
41
|
+
choosing two of three in normal operation. It is about what happens *during a network
|
|
42
|
+
partition*:
|
|
43
|
+
|
|
44
|
+
- **Network partitions are inevitable.** In any distributed system that spans more than one
|
|
45
|
+
node, network failures will occur. Hardware fails, cables get cut, switches drop packets,
|
|
46
|
+
cloud availability zones lose connectivity. Partition tolerance is not optional -- it is a
|
|
47
|
+
given.
|
|
48
|
+
|
|
49
|
+
- **The real choice is binary.** When a partition occurs, the system must choose: respond to
|
|
50
|
+
requests with potentially stale data (choose Availability, sacrifice Consistency) or refuse
|
|
51
|
+
to respond until the partition heals and consistency can be confirmed (choose Consistency,
|
|
52
|
+
sacrifice Availability).
|
|
53
|
+
|
|
54
|
+
- **During normal operation, all three are achievable.** When the network is healthy and no
|
|
55
|
+
partitions exist, a well-designed system can be both consistent and available. The tradeoff
|
|
56
|
+
only manifests during failures.
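
The binary choice described above can be sketched in a few lines. This is a minimal illustration, not any database's actual behavior: the `Replica` class, its `mode` labels, and the `Unavailable` exception are hypothetical names introduced here.

```python
from dataclasses import dataclass


class Unavailable(Exception):
    """Raised by a CP replica that cannot confirm it holds the latest value."""


@dataclass
class Replica:
    mode: str                  # "CP" or "AP" (hypothetical labels for this sketch)
    value: str                 # last value this replica has seen
    partitioned: bool = False  # True when the replica cannot reach its peers

    def read(self) -> str:
        if self.partitioned and self.mode == "CP":
            # Consistency over availability: refuse rather than risk a stale answer.
            raise Unavailable("cannot confirm latest write during partition")
        # AP during a partition, or either mode when healthy: answer from local state.
        return self.value


cp = Replica(mode="CP", value="v1", partitioned=True)
ap = Replica(mode="AP", value="v1", partitioned=True)
```

Calling `ap.read()` returns the possibly stale `"v1"`, while `cp.read()` raises `Unavailable`. With `partitioned=False`, both modes answer normally, which mirrors the third bullet.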
|
|
57
|
+
|
|
58
|
+
### PACELC: The Essential Extension
|
|
59
|
+
|
|
60
|
+
Daniel Abadi proposed the PACELC theorem in 2010 (published 2012) to address CAP's blind
|
|
61
|
+
spot. CAP says nothing about system behavior when there is *no* partition -- which is the
|
|
62
|
+
vast majority of the time. PACELC states:
|
|
63
|
+
|
|
64
|
+
> If there is a **P**artition, choose between **A**vailability and **C**onsistency;
|
|
65
|
+
> **E**lse, when the system is operating normally, choose between **L**atency and
|
|
66
|
+
> **C**onsistency.
|
|
67
|
+
|
|
68
|
+
This captures a fundamental truth: even without failures, replicating data across nodes
|
|
69
|
+
introduces latency. A system that waits for all replicas to acknowledge a write before
|
|
70
|
+
responding is consistent but slow. A system that responds after writing to one replica is
|
|
71
|
+
fast but temporarily inconsistent.
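
The latency side of this tradeoff can be made concrete with a small model. The sketch below assumes a write is sent to all replicas in parallel and returns once a chosen number have acknowledged; the function name and the round-trip times are illustrative assumptions, not measurements.

```python
def write_latency_ms(replica_rtts_ms: list[float], acks_required: int) -> float:
    """Latency of a write sent to all replicas in parallel that returns
    once `acks_required` replicas have acknowledged it."""
    if not 1 <= acks_required <= len(replica_rtts_ms):
        raise ValueError("acks_required must be between 1 and the replica count")
    # The write completes when the acks_required-th fastest replica responds.
    return sorted(replica_rtts_ms)[acks_required - 1]


# Hypothetical round trips: local, same-continent, and cross-ocean replicas.
rtts = [2.0, 45.0, 120.0]
wait_for_all = write_latency_ms(rtts, acks_required=3)
wait_for_quorum = write_latency_ms(rtts, acks_required=2)
wait_for_one = write_latency_ms(rtts, acks_required=1)
```

Under these assumed numbers, waiting for every replica is bounded by the slowest one (120 ms), while acknowledging after one replica is bounded by the fastest (2 ms). A quorum sits in between.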
|
|
72
|
+
|
|
73
|
+
PACELC classifications:
|
|
74
|
+
|
|
75
|
+
| Classification | During Partition | Normal Operation | Example Systems |
|---------------|-----------------|-----------------|-----------------|
| **PA/EL** | Availability | Low Latency | DynamoDB, Cassandra, Riak |
| **PA/EC** | Availability | Consistency | --- |
| **PC/EL** | Consistency | Low Latency | --- |
| **PC/EC** | Consistency | Consistency | Google Spanner, CockroachDB, VoltDB |
|
|
81
|
+
|
|
82
|
+
PA/EL systems are the most common AP systems -- they optimize for speed and uptime at the
|
|
83
|
+
cost of strict consistency. PC/EC systems are the most common CP systems -- they never
|
|
84
|
+
compromise on correctness. The off-diagonal combinations (PA/EC, PC/EL) are rare because
|
|
85
|
+
the design philosophies that drive partition behavior tend to align with normal-operation
|
|
86
|
+
behavior.
|
|
87
|
+
|
|
88
|
+
### What CAP Does NOT Say
|
|
89
|
+
|
|
90
|
+
These are the most common and damaging misconceptions:
|
|
91
|
+
|
|
92
|
+
**Misconception 1: "Pick any two."** The original "pick 2 of 3" framing (often drawn as a
|
|
93
|
+
Venn diagram with CA, CP, and AP regions) is misleading. CA systems do not exist in
|
|
94
|
+
distributed computing because you cannot simply opt out of partitions. A single-node
|
|
95
|
+
PostgreSQL database is "CA" only because it is not distributed -- there is no network to
|
|
96
|
+
partition. The moment you add replication, you must handle partitions.
|
|
97
|
+
|
|
98
|
+
**Misconception 2: "A system is either CP or AP, full stop."** As Martin Kleppmann argued in
|
|
99
|
+
his influential 2015 essay "Please stop calling databases CP or AP," most real systems cannot
|
|
100
|
+
be cleanly classified. MongoDB with a single primary is CP for writes but different replicas
|
|
101
|
+
may serve stale reads. DynamoDB offers tunable consistency per request. A system's CAP
|
|
102
|
+
position can vary by operation, configuration, and even by individual request.
|
|
103
|
+
|
|
104
|
+
**Misconception 3: "CAP means you can never have consistency and availability."** You can have
|
|
105
|
+
both during normal operation. The tradeoff is only forced during a partition event. Google
|
|
106
|
+
Spanner demonstrates this: it is technically CP (it will sacrifice availability during a
|
|
107
|
+
partition) but achieves greater than 99.999% availability because Google's private network
|
|
108
|
+
infrastructure makes partitions extraordinarily rare.
|
|
109
|
+
|
|
110
|
+
**Misconception 4: "Availability in CAP means 'high availability' as SREs define it."** CAP's
|
|
111
|
+
definition of availability is very specific: every request to a non-failing node must receive
|
|
112
|
+
a response. It says nothing about response time. A system that takes 30 days to respond is
|
|
113
|
+
"available" by CAP's definition. This is why PACELC's addition of latency is so important
|
|
114
|
+
for practical system design.
|
|
115
|
+
|
|
116
|
+
**Misconception 5: "Consistency in CAP is the same as ACID consistency."** CAP consistency
|
|
117
|
+
means linearizability -- a specific property of read/write operations across replicas. ACID
|
|
118
|
+
consistency means transactions preserve database invariants (foreign keys, constraints). They
|
|
119
|
+
are different concepts that happen to share a name.
|
|
120
|
+
|
|
121
|
+
---
|
|
122
|
+
|
|
123
|
+
## When to Prioritize Consistency (CP Systems)
|
|
124
|
+
|
|
125
|
+
Choose consistency over availability when incorrect or stale data causes irreversible harm,
|
|
126
|
+
financial loss, or safety risks.
|
|
127
|
+
|
|
128
|
+
### Financial Systems and Banking
|
|
129
|
+
|
|
130
|
+
A bank transfer between accounts must be atomic. If a network partition occurs mid-transfer,
|
|
131
|
+
the system must refuse to process further transactions on those accounts rather than risk
|
|
132
|
+
double-spending or lost funds. When a customer checks their balance, they must see the actual
|
|
133
|
+
balance -- not a cached value from before a recent deposit. Banks typically choose CP
|
|
134
|
+
because the cost of an incorrect balance (regulatory penalties, customer trust, actual money
|
|
135
|
+
loss) far outweighs the cost of brief unavailability.
|
|
136
|
+
|
|
137
|
+
**Real-world example:** Traditional banking cores (Temenos, FIS) operate as CP systems. When
|
|
138
|
+
a branch cannot reach the central ledger, it queues transactions locally rather than
|
|
139
|
+
processing them optimistically. ATM networks use authorization holds -- a CP pattern --
|
|
140
|
+
rather than dispensing cash they cannot verify.
|
|
141
|
+
|
|
142
|
+
### Inventory and Booking Systems
|
|
143
|
+
|
|
144
|
+
An airline cannot sell the same seat twice. A hotel cannot book the same room to two guests.
|
|
145
|
+
An event venue cannot oversell beyond capacity. These systems require strong consistency
|
|
146
|
+
because the physical resource is finite and non-fungible.
|
|
147
|
+
|
|
148
|
+
**Real-world example:** Ticketmaster's seat reservation system uses strong consistency for
|
|
149
|
+
the booking operation. When a partition occurs between data centers, the system will reject
|
|
150
|
+
booking attempts rather than risk double-booking. However, the *browsing* portion of the
|
|
151
|
+
system (checking what seats are available) can tolerate eventual consistency -- showing a
|
|
152
|
+
seat as available when it was just booked is acceptable because the booking attempt will
|
|
153
|
+
be rejected at the consistent layer.
|
|
154
|
+
|
|
155
|
+
### Medical Records and Safety-Critical Systems
|
|
156
|
+
|
|
157
|
+
Patient medication records, dosage calculations, and allergy information cannot tolerate
|
|
158
|
+
stale reads. Administering a medication that was contraindicated by a recently-entered
|
|
159
|
+
allergy could be fatal. These systems choose CP and accept the operational burden of
|
|
160
|
+
unavailability during network issues.
|
|
161
|
+
|
|
162
|
+
### Leader Election and Coordination
|
|
163
|
+
|
|
164
|
+
Distributed coordination services like ZooKeeper and etcd are inherently CP. They implement
|
|
165
|
+
consensus protocols (ZAB, Raft) that sacrifice availability during partitions to ensure
|
|
166
|
+
that all nodes agree on the current state. This is necessary because their purpose is to
|
|
167
|
+
provide a single source of truth for configuration, leader election, and distributed locks.
|
|
168
|
+
|
|
169
|
+
### CP System Characteristics
|
|
170
|
+
|
|
171
|
+
| Property | Typical CP Behavior |
|----------|-------------------|
| Write path | Synchronous replication to a quorum before acknowledging |
| Read path | Read from leader, or read from follower with consistency check |
| During partition | Minority partition becomes read-only or fully unavailable |
| Recovery | Automatic once partition heals; no conflict resolution needed |
| Consensus protocol | Raft, Paxos, ZAB, or similar |
| Example databases | PostgreSQL (single-primary), CockroachDB, Google Spanner, etcd, ZooKeeper, HBase |
|
|
179
|
+
|
|
180
|
+
---
|
|
181
|
+
|
|
182
|
+
## When to Prioritize Availability (AP Systems)
|
|
183
|
+
|
|
184
|
+
Choose availability over consistency when the system must remain responsive at all costs
|
|
185
|
+
and temporary staleness or inconsistency is tolerable.
|
|
186
|
+
|
|
187
|
+
### Social Media Feeds and Content Platforms
|
|
188
|
+
|
|
189
|
+
When a user posts on a social platform, it is acceptable if followers in another region
|
|
190
|
+
see the post a few seconds or even minutes later. The platform must never show a user an
|
|
191
|
+
error page or refuse to load their feed because of a network issue between data centers.
|
|
192
|
+
Facebook, Twitter/X, and Instagram all prioritize availability for feed rendering.
|
|
193
|
+
|
|
194
|
+
**Real-world example:** If a user updates their profile picture on Facebook, other users
|
|
195
|
+
may see the old picture for a brief period. This is a deliberate design choice: the
|
|
196
|
+
alternative -- making the entire profile unavailable until all replicas confirm the new
|
|
197
|
+
picture -- would degrade the user experience far more than a few seconds of staleness.
|
|
198
|
+
|
|
199
|
+
### Caching Layers and CDNs
|
|
200
|
+
|
|
201
|
+
Content delivery networks are inherently AP systems. A CDN node serves cached content even
|
|
202
|
+
if it cannot reach the origin server. The content may be stale (an old version of a webpage,
|
|
203
|
+
an outdated product image) but serving stale content is vastly preferable to serving nothing.
|
|
204
|
+
DNS is another classic AP system -- DNS resolvers cache records and serve potentially stale
|
|
205
|
+
entries rather than failing when they cannot reach authoritative nameservers.
|
|
206
|
+
|
|
207
|
+
### Shopping Carts and Wishlists
|
|
208
|
+
|
|
209
|
+
Amazon's original Dynamo paper (2007) described the shopping cart as an AP use case. Items
|
|
210
|
+
added to a cart during a partition might temporarily diverge across replicas, but the system
|
|
211
|
+
resolves conflicts by merging (union of items) rather than discarding. A customer seeing a
|
|
212
|
+
previously removed item reappear in their cart is annoying; a customer being unable to add
|
|
213
|
+
items to their cart at all loses revenue.
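
The union merge, and the reappearing item it can cause, can be sketched as follows. This is a simplified model with plain sets; `merge_carts` and the cart contents are hypothetical and do not reproduce Dynamo's actual versioning scheme.

```python
def merge_carts(replica_a: set[str], replica_b: set[str]) -> set[str]:
    """Resolve divergent cart replicas by keeping every item either side has."""
    return replica_a | replica_b


before_partition = {"book", "lamp"}
replica_a = before_partition - {"lamp"}  # customer removed the lamp via replica A
replica_b = before_partition | {"mug"}   # customer added a mug via replica B
merged = merge_carts(replica_a, replica_b)
```

After the merge the cart holds the book, the mug, and also the lamp. Replica B never saw the removal, so a plain union cannot tell a deleted item from one that was never removed.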
|
|
214
|
+
|
|
215
|
+
### IoT Sensor Data and Telemetry
|
|
216
|
+
|
|
217
|
+
Sensor networks collecting temperature, humidity, or machine telemetry readings prioritize
|
|
218
|
+
availability. Missing a few readings or receiving them out of order is tolerable. Losing the
|
|
219
|
+
ability to ingest data at all -- because a network link between the collection tier and the
|
|
220
|
+
storage tier is down -- means losing irreplaceable time-series data.
|
|
221
|
+
|
|
222
|
+
### AP System Characteristics
|
|
223
|
+
|
|
224
|
+
| Property | Typical AP Behavior |
|----------|-------------------|
| Write path | Write to any available node; asynchronous replication |
| Read path | Read from any available node; may return stale data |
| During partition | Both sides of the partition continue serving reads and writes |
| Recovery | Conflict resolution via last-write-wins, vector clocks, CRDTs, or application-level merge |
| Conflict strategy | Merge, last-write-wins, or custom resolution |
| Example databases | Cassandra, DynamoDB (default), Riak, CouchDB, DNS |
|
|
232
|
+
|
|
233
|
+
---
|
|
234
|
+
|
|
235
|
+
## When NOT to Apply CAP
|
|
236
|
+
|
|
237
|
+
This section is as important as the sections above. CAP is frequently misapplied, leading
|
|
238
|
+
to poor architectural decisions.
|
|
239
|
+
|
|
240
|
+
### Single-Node Systems
|
|
241
|
+
|
|
242
|
+
CAP is a theorem about *distributed* systems. A single PostgreSQL instance, a single Redis
|
|
243
|
+
server, or a monolithic application with one database has no network partitions to worry
|
|
244
|
+
about. Applying CAP to these systems is a category error. If you have a single database
|
|
245
|
+
server with no replication, your concerns are durability (disk failure), capacity (can the
|
|
246
|
+
server handle the load), and recovery time -- not CAP tradeoffs.
|
|
247
|
+
|
|
248
|
+
**Common mistake:** A team chooses Cassandra "for availability" when they have a single
|
|
249
|
+
data center, a single application server, and modest data volumes. A single PostgreSQL
|
|
250
|
+
instance with good backups would serve them better, with simpler operations and stronger
|
|
251
|
+
consistency.
|
|
252
|
+
|
|
253
|
+
### Systems Where Partitions Are Handled by Infrastructure
|
|
254
|
+
|
|
255
|
+
Google Spanner is technically CP but achieves greater than 99.999% availability. How?
|
|
256
|
+
Google's private fiber network makes partitions so rare that the theoretical availability
|
|
257
|
+
sacrifice almost never materializes. Similarly, systems running within a single availability
|
|
258
|
+
zone in AWS with redundant networking face partition probabilities so low that designing
|
|
259
|
+
around partition behavior is not the dominant architectural concern.
|
|
260
|
+
|
|
261
|
+
**Key insight:** If your partition probability is 0.001%, designing your entire data model
|
|
262
|
+
around partition behavior is over-engineering. Focus on the tradeoffs that actually affect
|
|
263
|
+
your system daily: latency, throughput, operational complexity, and cost.
|
|
264
|
+
|
|
265
|
+
### Over-Simplification of "Just Pick CP or AP"
|
|
266
|
+
|
|
267
|
+
Real systems are not uniformly CP or AP. A well-designed system uses different consistency
|
|
268
|
+
levels for different operations:
|
|
269
|
+
|
|
270
|
+
- **E-commerce platform:** Product catalog browsing (AP -- eventual consistency is fine),
|
|
271
|
+
inventory reservation (CP -- must be consistent), payment processing (CP -- must be
|
|
272
|
+
consistent), order history display (AP -- slight delay is acceptable), recommendation
|
|
273
|
+
engine (AP -- stale preferences are tolerable).
|
|
274
|
+
|
|
275
|
+
- **Ride-sharing application:** Driver location updates (AP -- eventual consistency, high
|
|
276
|
+
frequency), ride matching (CP at the assignment moment -- cannot double-assign a driver),
|
|
277
|
+
fare calculation (CP -- must be based on consistent trip data), trip history (AP -- can
|
|
278
|
+
tolerate brief delays).
|
|
279
|
+
|
|
280
|
+
**Common mistake:** A team declares their entire system "AP" or "CP" and forces every
|
|
281
|
+
component into that mold, rather than making per-feature consistency decisions.
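
One way to make per-feature decisions explicit is a small policy table in code. The sketch below restates the e-commerce example above; the operation names, the `Consistency` enum, and the choice to default unknown operations to the stricter level are all assumptions for illustration.

```python
from enum import Enum


class Consistency(Enum):
    STRONG = "strong"      # CP-style: reject rather than serve possibly wrong data
    EVENTUAL = "eventual"  # AP-style: serve possibly stale data


# Hypothetical operation names mirroring the e-commerce example.
POLICY = {
    "catalog_browse": Consistency.EVENTUAL,
    "inventory_reserve": Consistency.STRONG,
    "payment": Consistency.STRONG,
    "order_history": Consistency.EVENTUAL,
    "recommendations": Consistency.EVENTUAL,
}


def consistency_for(operation: str) -> Consistency:
    # Unknown operations fall back to the stricter level (a design assumption).
    return POLICY.get(operation, Consistency.STRONG)
```

Keeping the mapping in one place makes the consistency requirement of each feature reviewable, instead of leaving it implied by whichever database the feature happens to use.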
|
|
282
|
+
|
|
283
|
+
### When the Problem Is Actually Latency, Not Partitions
|
|
284
|
+
|
|
285
|
+
Many teams invoke CAP when their real problem is latency. They say "we chose AP for
|
|
286
|
+
availability" when they actually mean "we chose eventual consistency because synchronous
|
|
287
|
+
replication was too slow." This is a PACELC tradeoff (EL vs EC), not a CAP tradeoff (A vs C).
|
|
288
|
+
Conflating the two leads to architectural discussions where participants talk past each other.
|
|
289
|
+
|
|
290
|
+
### When Consensus Is the Real Requirement
|
|
291
|
+
|
|
292
|
+
If your system needs distributed transactions, global ordering of events, or leader election,
|
|
293
|
+
CAP is the wrong framework. You need to reason about consensus protocols (Raft, Paxos,
|
|
294
|
+
PBFT), their failure modes, and their performance characteristics. CAP tells you that
|
|
295
|
+
a minority partition cannot reach consensus -- which is true but not useful for designing the
|
|
296
|
+
consensus protocol itself.
|
|
297
|
+
|
|
298
|
+
### Real Examples of Teams Misapplying CAP
|
|
299
|
+
|
|
300
|
+
**Example 1: Choosing MongoDB "because it is CP."** A startup chose MongoDB for a social
|
|
301
|
+
media application specifically because they wanted consistency. But MongoDB with a replica
|
|
302
|
+
set is only CP for writes routed to the primary. Reads from secondaries return stale data
|
|
303
|
+
whenever replication lags. The team did not configure read preferences correctly and ended up with an
|
|
304
|
+
effectively AP read path they did not intend, causing subtle bugs in their notification
|
|
305
|
+
system.
|
|
306
|
+
|
|
307
|
+
**Example 2: Choosing Cassandra "because it is AP" for financial data.** A fintech company
|
|
308
|
+
chose Cassandra for transaction records because they wanted "five nines of availability."
|
|
309
|
+
They did not realize that Cassandra's AP nature meant concurrent writes to the same
|
|
310
|
+
transaction record could conflict silently, with last-write-wins discarding earlier updates.
|
|
311
|
+
They discovered lost transactions during an audit and had to add an external coordination
|
|
312
|
+
layer (effectively rebuilding CP semantics on top of an AP database).
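
The silent loss in Example 2 can be reproduced with a toy last-write-wins register. This is a sketch of the general technique, not Cassandra's implementation; the `Versioned` type, the timestamps, and the amounts are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    timestamp: int  # writer-supplied timestamp (hypothetical units)
    amount: int     # full balance as computed by that writer


def last_write_wins(a: Versioned, b: Versioned) -> Versioned:
    """Keep whichever write carries the later timestamp; discard the other."""
    return a if a.timestamp >= b.timestamp else b


balance = 100
# Two writers read the same balance concurrently and each write back a full value.
debit = Versioned(timestamp=10, amount=balance - 30)
credit = Versioned(timestamp=11, amount=balance + 50)
stored = last_write_wins(debit, credit)
```

The stored balance is 150, although applying both updates should give 120. The debit disappears without any error, which is why such a discrepancy tends to surface only in an audit.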
|
|
313
|
+
|
|
314
|
+
**Example 3: Ignoring the "Else" in PACELC.** A team chose a PC/EC database (CockroachDB)
|
|
315
|
+
for a latency-sensitive user-facing API. During normal operation (no partitions), every write
|
|
316
|
+
required cross-region consensus, adding 100-200ms of latency. Users complained about slow
|
|
317
|
+
response times. The team eventually moved user session data to a PA/EL store (Redis with
|
|
318
|
+
replication) while keeping financial data in CockroachDB -- a per-feature consistency
|
|
319
|
+
decision they should have made from the start.
|
|
320
|
+
|
|
321
|
+
---
|
|
322
|
+
|
|
323
|
+
## How It Works
|
|
324
|
+
|
|
325
|
+
### Partition Detection
|
|
326
|
+
|
|
327
|
+
A network partition is detected when nodes cannot communicate with each other within a
|
|
328
|
+
configured timeout. Detection mechanisms include:
|
|
329
|
+
|
|
330
|
+
1. **Heartbeat failure:** Nodes exchange periodic heartbeat messages. If a node misses
|
|
331
|
+
several consecutive heartbeats from a peer, it suspects a partition.
|
|
332
|
+
2. **Quorum loss:** In consensus-based systems, if a node cannot reach a majority of peers,
|
|
333
|
+
it knows it is on the minority side of a partition.
|
|
334
|
+
3. **Split-brain detection:** Some systems use a witness node, a shared disk, or a cloud
|
|
335
|
+
API as a tiebreaker to determine which side of a partition is the "real" cluster.
|
|
336
|
+
|
|
337
|
+
The difficulty is distinguishing a true network partition from a slow node. A node that
|
|
338
|
+
takes 10 seconds to respond to a heartbeat might be overloaded, not partitioned. Most
|
|
339
|
+
systems use aggressive timeouts (seconds) to detect partitions quickly, accepting that some
|
|
340
|
+
slow nodes will be falsely flagged as partitioned.
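
Heartbeat timeouts and the quorum check can be combined in a small detector. The sketch below is a simplified model under stated assumptions: the class name is hypothetical, time is passed in explicitly, and cluster size is inferred from the peers that have ever sent a heartbeat.

```python
class HeartbeatDetector:
    """Tracks the last heartbeat from each peer and suspects silent ones."""

    def __init__(self, timeout_s: float):
        self.timeout_s = timeout_s
        self.last_seen: dict[str, float] = {}

    def record_heartbeat(self, peer: str, now: float) -> None:
        self.last_seen[peer] = now

    def suspected(self, now: float) -> set[str]:
        # A peer is suspected once it has been silent longer than the timeout.
        # This cannot tell a partitioned peer from a merely slow one.
        return {p for p, t in self.last_seen.items() if now - t > self.timeout_s}

    def has_quorum(self, now: float) -> bool:
        cluster_size = len(self.last_seen) + 1  # known peers plus this node
        reachable = cluster_size - len(self.suspected(now))
        return reachable > cluster_size // 2


detector = HeartbeatDetector(timeout_s=3.0)
detector.record_heartbeat("b", now=0.0)
detector.record_heartbeat("c", now=0.0)
detector.record_heartbeat("b", now=2.0)
```

At `now=4.0` only peer `c` is suspected and the node still sees a majority of a three-node cluster. At `now=10.0` both peers are suspected, so the node should treat itself as being on the minority side.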
|
|
341
|
+
|
|
342
|
+
### Consistency Levels: A Spectrum
|
|
343
|
+
|
|
344
|
+
Consistency is not binary. The following levels form a hierarchy from strongest to weakest:
|
|
345
|
+
|
|
346
|
+
**Linearizability (Strongest):**
|
|
347
|
+
All operations appear to execute atomically at some point between their invocation and
|
|
348
|
+
completion. There is a total order of operations consistent with real-time ordering. If
|
|
349
|
+
operation A completes before operation B begins, A appears before B in the total order. This
|
|
350
|
+
is what CAP means by "consistency." It requires coordination on every operation and is the
|
|
351
|
+
most expensive consistency level.
|
|
352
|
+
|
|
353
|
+
**Sequential Consistency:**
|
|
354
|
+
All operations appear to execute in some total order that is consistent with the program
|
|
355
|
+
order of each individual process, but this order need not respect real-time ordering. Two
|
|
356
|
+
clients always agree on the order of writes, but that order may lag real time: a read issued
|
|
357
|
+
after a write completes on another client can still return the older value.
|
|
358
|
+
|
|
359
|
+
**Causal Consistency:**
|
|
360
|
+
Operations that are causally related must be seen in the same order by all nodes. Causally
|
|
361
|
+
unrelated (concurrent) operations may be seen in different orders by different nodes. If
|
|
362
|
+
process A writes X, and process B reads X and then writes Y, then X causally precedes Y and
|
|
363
|
+
all nodes must see X before Y. But if process C independently writes Z with no knowledge of
|
|
364
|
+
X or Y, Z may appear at any point.
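
Vector clocks are one common way to track the causal relationships described above. The sketch below stamps the X, Y, Z example with hypothetical clock values and uses two helper functions introduced here for illustration.

```python
def happened_before(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if the event stamped `a` causally precedes the event stamped `b`."""
    nodes = a.keys() | b.keys()
    no_entry_ahead = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    some_entry_behind = any(a.get(n, 0) < b.get(n, 0) for n in nodes)
    return no_entry_ahead and some_entry_behind


def concurrent(a: dict[str, int], b: dict[str, int]) -> bool:
    """True if neither stamp precedes the other."""
    return not happened_before(a, b) and not happened_before(b, a)


x = {"A": 1}          # process A writes X
y = {"A": 1, "B": 1}  # process B reads X, then writes Y
z = {"C": 1}          # process C writes Z independently
```

Here `happened_before(x, y)` holds, so every node must order X before Y, while Z is concurrent with both and may be placed anywhere.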
|
|
365
|
+
|
|
366
|
+
**Read-Your-Writes Consistency:**
|
|
367
|
+
A client always sees its own writes. If client A writes a value, client A's subsequent reads
|
|
368
|
+
will reflect that write. Other clients may see stale data. This is often sufficient for web
|
|
369
|
+
applications where users primarily interact with their own data.
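
A common way to provide this guarantee is to have each session remember the version of its last write and avoid replicas that are behind it. The sketch below assumes a single primary and one lagging follower; the `Replica` and `Session` classes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    version: int = 0
    data: dict = field(default_factory=dict)

    def apply(self, version: int, key: str, value: str) -> None:
        self.data[key] = value
        self.version = version


@dataclass
class Session:
    min_version: int = 0  # version of this session's most recent write

    def write(self, primary: Replica, key: str, value: str) -> None:
        primary.apply(primary.version + 1, key, value)
        self.min_version = primary.version

    def read(self, primary: Replica, follower: Replica, key: str):
        # Use the follower only if it has caught up with this session's writes.
        source = follower if follower.version >= self.min_version else primary
        return source.data.get(key)


primary, follower = Replica(), Replica()
alice, bob = Session(), Session()
alice.write(primary, "name", "Al")  # the follower has not replicated this yet
```

Alice's read falls back to the primary and sees her own write. Bob's session has no such constraint, so his read is served by the lagging follower and returns nothing yet.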
|
|
370
|
+
|
|
371
|
+
**Eventual Consistency (Weakest):**
|
|
372
|
+
If no new updates are made, all replicas will *eventually* converge to the same value. There
|
|
373
|
+
is no bound on how long "eventually" takes (though in practice it is usually seconds). During
|
|
374
|
+
the convergence window, different replicas may return different values. This is the cheapest
|
|
375
|
+
consistency level in terms of latency and availability.
|
|
376
|
+
|
|
377
|
+
### Availability Levels
|
|
378
|
+
|
|
379
|
+
| Level | Annual Downtime | Description |
|-------|----------------|-------------|
| 99% ("two nines") | 3.65 days | Acceptable for internal tools |
| 99.9% ("three nines") | 8.76 hours | Standard for most web applications |
| 99.95% | 4.38 hours | Typical SLA for cloud databases |
| 99.99% ("four nines") | 52.6 minutes | High-availability production systems |
| 99.999% ("five nines") | 5.26 minutes | Telecom, financial systems, Google Spanner |
| 99.9999% ("six nines") | 31.5 seconds | Theoretical; requires extraordinary infrastructure |
|
|
387
|
+
|
|
388
|
+
Note: CAP availability and SLA availability are different concepts. CAP availability means
|
|
389
|
+
every request to a non-failing node gets a response. SLA availability means the system
|
|
390
|
+
responds successfully within a defined latency threshold for a defined percentage of
|
|
391
|
+
requests over a time window.
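
The downtime figures in the table follow from simple arithmetic over a 365-day year. The helper below is an illustrative sketch; the function name is an assumption.

```python
def annual_downtime_minutes(availability_percent: float) -> float:
    """Allowed downtime per 365-day year for a given availability target."""
    minutes_per_year = 365 * 24 * 60  # 525,600
    return (1 - availability_percent / 100) * minutes_per_year


three_nines_hours = annual_downtime_minutes(99.9) / 60
five_nines_minutes = annual_downtime_minutes(99.999)
```

For example, 99.9% leaves about 8.76 hours per year and 99.999% leaves about 5.26 minutes, matching the table rows.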
|
|
392
|
+
|
|
393
|
+
### Real Database CAP/PACELC Positions
|
|
394
|
+
|
|
395
|
+
| Database | CAP Position | PACELC Position | Notes |
|----------|-------------|----------------|-------|
| **PostgreSQL** (single primary) | CP | PC/EC | Followers reject writes during partition; strong consistency always |
| **PostgreSQL** (Patroni HA) | CP | PC/EC | Automatic failover but still single-writer; fencing prevents split-brain |
| **MySQL** (Group Replication) | CP | PC/EC | Multi-primary mode exists but defaults to single-primary |
| **Cassandra** | AP | PA/EL | Tunable consistency (ONE, QUORUM, ALL) per query; default is eventual |
| **DynamoDB** | AP (default) | PA/EL | Supports strongly consistent reads per-request (these consume 2x the read capacity of eventually consistent reads) |
| **MongoDB** | CP-ish | PC/EC | Primary handles writes; reads from secondaries can be stale unless configured |
| **CockroachDB** | CP | PC/EC | Serializable isolation; Raft consensus for every write; geo-partitioned leaseholders reduce latency |
| **Google Spanner** | CP | PC/EC | TrueTime enables external consistency; >99.999% availability via network investment |
| **Redis** (replicated) | AP | PA/EL | Asynchronous replication; acknowledged writes can be lost on failover |
| **Redis** (Sentinel) | AP | PA/EL | Sentinel provides failover but does not prevent data loss during partition |
| **etcd** | CP | PC/EC | Raft consensus; minority partition is unavailable |
| **ZooKeeper** | CP | PC/EC | ZAB protocol; minority partition refuses requests |
| **CouchDB** | AP | PA/EL | Multi-master replication with conflict detection; user resolves conflicts |
| **Riak** | AP | PA/EL | Dynamo-inspired; vector clocks and CRDTs for conflict resolution |
| **ScyllaDB** | AP | PA/EL | Cassandra-compatible; same tunable consistency model, higher throughput |
| **TiDB** | CP | PC/EC | Raft-based; strong consistency with MySQL compatibility |
| **YugabyteDB** | CP | PC/EC | Raft consensus; PostgreSQL-compatible wire protocol |
| **FoundationDB** | CP | PC/EC | Strictly serializable; Apple's iCloud backend |
|
|
415
|
+
|
|
416
|
+
---

## Trade-Offs Matrix

| Decision | Choose Consistency When | Choose Availability When | Real-World Signal |
|----------|----------------------|------------------------|-------------------|
| **Data correctness** | A wrong answer is worse than no answer (finance, medical, legal) | A stale answer is better than no answer (feeds, search, analytics) | "Can we tolerate showing outdated data for 5 seconds?" |
| **Conflict resolution** | Conflicts are expensive or impossible to resolve after the fact (double-booking, double-spending) | Conflicts are cheap to resolve or merge (shopping carts, like counts, view counters) | "What happens if two replicas accept conflicting writes?" |
| **User expectations** | Users expect to see their most recent action immediately (bank balance after transfer) | Users tolerate brief delays (social feed not showing a just-posted comment) | "Will users call support if they see stale data?" |
| **Regulatory requirements** | Regulations demand audit trails with total ordering (SOX, PCI-DSS, HIPAA) | No regulatory ordering requirements (content platforms, IoT telemetry) | "Do auditors need to see a globally consistent timeline?" |
| **Failure blast radius** | Brief unavailability affects few users or is operationally manageable | Unavailability causes revenue loss, user churn, or SLA penalties | "What costs more: 30 seconds of downtime or 30 seconds of stale data?" |
| **Write frequency** | Writes are infrequent relative to the consensus latency budget | Writes are high-frequency and latency-sensitive | "Can we afford 50-200ms of consensus overhead per write?" |
| **Geographic distribution** | Users are geographically concentrated or latency is not the primary concern | Users are globally distributed and latency is critical | "Are our users within one region or spread across continents?" |
| **Operational complexity** | Team can operate consensus-based systems (monitoring, debugging split-brain, quorum management) | Team prefers simpler operational model (any-node-writes, no quorum) | "Does the team have experience operating Raft/Paxos-based systems?" |
| **Recovery cost** | Recovery from inconsistency is expensive (manual reconciliation, compensating transactions) | Recovery from inconsistency is automated (CRDTs, last-write-wins, merge functions) | "What does our conflict resolution procedure look like?" |
| **Data volume and velocity** | Moderate data volume where consensus overhead is acceptable | High data volume or velocity where consensus would be a bottleneck | "Are we writing 100 records/sec or 100,000 records/sec?" |

---

## Evolution Path

Most systems should start with strong consistency and deliberately relax it where the
tradeoffs justify the complexity.

### Phase 1: Start with Strong Consistency

Begin with a single-primary relational database (PostgreSQL, MySQL). Every read and write
goes through one node. There are no CAP tradeoffs because there is no distribution. This is
not a limitation -- it is a feature. You get linearizable consistency, ACID transactions, and
simple debugging. Most applications never outgrow this phase.

### Phase 2: Identify Read Paths That Tolerate Staleness

As traffic grows, identify read operations where eventual consistency is acceptable: product
catalog pages, user feeds, recommendation results, analytics dashboards. Route these reads
to replicas with asynchronous replication. Writes still go to the primary.

### Phase 3: Add Caching for AP Read Paths

Put a caching layer (Redis, Memcached) in front of read replicas for frequently accessed,
staleness-tolerant data. The cache is inherently AP: it serves stale data when it cannot
reach the database, and cache invalidation introduces a consistency window. The result is
a tiered read path: hot reads from cache (AP, sub-ms), warm reads from replica
(near-consistent, low-ms), and consistent reads from primary (CP, higher latency).

### Phase 4: Per-Feature Consistency Decisions

As the system grows, different features adopt different consistency models based on their
requirements. This is the mature state. Document each feature's consistency choice and
rationale. For example: authentication and payment processing use CP (PostgreSQL primary,
serializable isolation), inventory reservation uses CP (SELECT FOR UPDATE), while product
catalog (Elasticsearch via CDC), shopping cart (DynamoDB), recommendations (Redis), order
history (read replica), and notifications (WebSocket) all use AP with varying staleness
tolerances.

### Phase 5: Multi-Region with Tunable Consistency

For global scale, use databases that support per-operation consistency tuning. CockroachDB
with geo-partitioned leaseholders pins data to the region closest to the user, reducing
consensus latency. DynamoDB global tables provide eventual consistency across regions with
strong consistency available per-request within a region.

---

## Failure Modes

### Split-Brain from Incorrect Partition Handling

**What happens:** A network partition divides a cluster into two groups. Both groups elect a
leader and accept writes independently. When the partition heals, the system has two
divergent histories that cannot be automatically reconciled.

**Real-world example:** In GitHub's October 2018 incident, a 43-second loss of connectivity
between data centers triggered an automated failover of their MySQL clusters. Each side ended
up holding writes the other had never replicated, and reconciling them required manual
intervention.

**Prevention:**
- Use odd-numbered clusters (3, 5, 7 nodes) so that a two-way partition always leaves exactly
  one side with a strict majority
- Implement fencing tokens -- when a new leader is elected, it gets a monotonically
  increasing token, and storage nodes reject writes from old leaders with stale tokens
- Use external witness services (cloud provider APIs, separate availability zone) as a
  tiebreaker
- Prefer consensus protocols (Raft, Paxos), whose safety guarantees rule out split-brain, over
  ad-hoc leader election
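
The fencing-token bullet above can be sketched in a few lines. This is a toy, in-memory illustration under stated assumptions: `FencedStore` and `StaleLeaderError` are hypothetical names, and a real system would obtain the token from its lock or consensus service rather than passing integers by hand.

```python
from typing import Dict, Optional


class StaleLeaderError(Exception):
    """Raised when a write carries a fencing token older than one already seen."""


class FencedStore:
    """Toy storage node that rejects writes from superseded leaders."""

    def __init__(self) -> None:
        self._highest_token = 0
        self._data: Dict[str, str] = {}

    def write(self, key: str, value: str, token: int) -> None:
        # A token lower than the highest one seen means a newer leader exists.
        if token < self._highest_token:
            raise StaleLeaderError(f"token {token} < {self._highest_token}")
        self._highest_token = token
        self._data[key] = value

    def read(self, key: str) -> Optional[str]:
        return self._data.get(key)


store = FencedStore()
store.write("config", "from-leader-1", token=1)   # original leader
store.write("config", "from-leader-2", token=2)   # leader elected after a partition
try:
    store.write("config", "late-write", token=1)  # old leader resumes and retries
except StaleLeaderError as exc:
    print("rejected:", exc)
print(store.read("config"))
```

The check lives on the storage side on purpose: the old leader cannot be trusted to know it has been replaced, so the node receiving the write has to enforce the ordering.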

### Stale Reads Causing Business Logic Errors

**What happens:** A service reads stale data from an eventually consistent store and makes a
business decision based on that stale data. The decision is wrong because the data has since
changed.

**Example:** An inventory service reads available stock from a read replica (2 seconds behind
primary). It sees 5 units available and allows a purchase of 3 units. But the primary already
processed 4 other purchases, leaving only 1 unit. The system has now oversold by 2 units.

**Prevention:**
- Route business-critical reads to the primary or a synchronous replica
- Use read-your-writes consistency for operations within a single user session
- Implement optimistic concurrency control (version numbers) so that writes based on stale
  reads fail at the write step
- Accept eventual consistency for the read (showing "5 in stock") but enforce consistency
  at the write (inventory decrement uses a compare-and-swap or database constraint)
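
The last two bullets (version checks plus a guarded decrement) can be illustrated with a conditional `UPDATE`. The sketch below uses an in-memory SQLite database as a stand-in for the primary; the `inventory` table, its columns, and the `reserve` helper are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory ("
    " sku TEXT PRIMARY KEY,"
    " stock INTEGER NOT NULL CHECK (stock >= 0),"
    " version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO inventory VALUES ('widget', 1, 7)")  # true state on the primary
conn.commit()


def reserve(conn: sqlite3.Connection, sku: str, quantity: int,
            expected_version: int) -> bool:
    """Decrement stock only if the row is unchanged since the caller read it."""
    cursor = conn.execute(
        "UPDATE inventory SET stock = stock - ?, version = version + 1 "
        "WHERE sku = ? AND version = ? AND stock >= ?",
        (quantity, sku, expected_version, quantity),
    )
    conn.commit()
    return cursor.rowcount == 1  # zero rows updated means the read was stale


# A caller that saw a stale snapshot (version 3, 5 in stock) tries to buy 3 units.
stale_attempt = reserve(conn, "widget", 3, expected_version=3)
# A caller holding the current version buys the last unit.
fresh_attempt = reserve(conn, "widget", 1, expected_version=7)
print(stale_attempt, fresh_attempt)
```

A failed reservation is a signal to re-read from the primary and retry or report "out of stock"; the stale read itself never reaches the stored data.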

### Unavailability Cascading Through the System

**What happens:** A CP data store becomes unavailable during a partition. Services that depend
on it also become unavailable. Services that depend on *those* services also become
unavailable. The blast radius grows with every layer of dependency.

**Example:** The user authentication service uses etcd (CP) for session validation. During a
partition, etcd's minority side is unavailable. All services in that zone cannot validate
sessions. The API gateway cannot authenticate requests. The entire zone is effectively down,
even though the application servers, databases, and network within the zone are healthy.

**Prevention:**
- Cache authentication tokens locally with a TTL so services can validate existing sessions
  during brief partitions
- Implement circuit breakers that allow degraded operation when a CP dependency is unavailable
- Design fallback paths: if the CP store is unreachable, degrade gracefully rather than
  failing completely
- Avoid putting CP systems on the critical path of every request
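
The first two prevention items (a local TTL cache and a circuit breaker) combine naturally. The following is a minimal sketch, not a production breaker: `SessionValidator`, its parameters, and the `lookup` callable standing in for the CP store are all assumptions made for illustration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SessionValidator:
    """Validates sessions against a CP store, degrading to a local TTL cache."""

    def __init__(self, lookup: Callable[[str], bool], ttl: float = 60.0,
                 failure_threshold: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup              # hypothetical call into the CP store
        self._ttl = ttl
        self._failure_threshold = failure_threshold
        self._reset_after = reset_after
        self._clock = clock
        self._cache: Dict[str, Tuple[bool, float]] = {}
        self._failures = 0
        self._opened_at: Optional[float] = None

    def _circuit_open(self) -> bool:
        if self._opened_at is None:
            return False
        if self._clock() - self._opened_at >= self._reset_after:
            # Half-open: allow one trial call; a single failure re-opens the circuit.
            self._opened_at = None
            self._failures = self._failure_threshold - 1
            return False
        return True

    def _from_cache(self, session_id: str) -> bool:
        entry = self._cache.get(session_id)
        if entry is None:
            return False                   # unknown sessions fail closed
        valid, stored_at = entry
        return valid and self._clock() - stored_at <= self._ttl

    def is_valid(self, session_id: str) -> bool:
        if self._circuit_open():
            return self._from_cache(session_id)
        try:
            valid = self._lookup(session_id)
        except ConnectionError:
            self._failures += 1
            if self._failures >= self._failure_threshold:
                self._opened_at = self._clock()
            return self._from_cache(session_id)
        self._failures = 0
        self._cache[session_id] = (valid, self._clock())
        return valid


state = {"store_up": True}


def lookup(session_id: str) -> bool:
    if not state["store_up"]:
        raise ConnectionError("CP store unreachable")
    return session_id == "alice"


validator = SessionValidator(lookup)
print(validator.is_valid("alice"))    # store reachable, result cached
state["store_up"] = False
print(validator.is_valid("alice"))    # served from the local cache
print(validator.is_valid("mallory"))  # never seen before, so refused
```

Failing closed for unknown sessions keeps the degraded mode from becoming an authentication bypass; only sessions validated recently enough survive the outage.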

### Conflict Resolution Complexity in AP Systems

**What happens:** During a partition, both sides accept conflicting writes. When the partition
heals, the system must resolve these conflicts. Simple strategies (last-write-wins) lose
data. Complex strategies (application-level merge) introduce subtle bugs.

**Example:** Two users edit the same document during a partition. User A adds paragraph 3.
User B deletes paragraph 2. When the partition heals, the system must merge these changes.
Last-write-wins would discard one user's edits entirely. A naive merge might apply both
changes but produce a garbled document.

**Prevention:**
- Use CRDTs (Conflict-free Replicated Data Types) for data structures that can be
  mathematically merged without conflicts (counters, sets, registers)
- Design data models to be append-only where possible (event sourcing) so conflicts become
  a matter of ordering rather than overwriting
- Implement domain-specific merge functions that understand the semantics of the data
- Alert operators when conflicts occur so they can be reviewed, rather than silently
  applying a generic resolution strategy
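
To make the CRDT bullet concrete, here is a sketch of the simplest one, a grow-only counter (G-Counter). Each replica increments only its own slot, and merging takes the per-replica maximum, so merges commute and can be repeated safely. The class and replica names are illustrative.

```python
from typing import Dict


class GCounter:
    """Grow-only counter CRDT: one slot per replica, merged by element-wise max."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.counts: Dict[str, int] = {}

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Taking the max per replica makes merge commutative and idempotent.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


# Two replicas accept increments independently during a partition.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)

# After the partition heals they exchange state; merge order does not matter.
a.merge(b)
b.merge(a)
print(a.value(), b.value())
```

This works for like counts or view counters; data that needs decrements or deletions requires richer CRDTs, and free-form documents such as the example above need more than a counter.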

### Timeout Misconfiguration

**What happens:** Partition detection timeouts set too aggressively cause healthy-but-slow
nodes to be flagged as partitioned, triggering unnecessary failovers. Timeouts set too
conservatively leave the system operating inconsistently for minutes without detection.

**Prevention:** Use adaptive timeouts (phi accrual failure detector, used by Cassandra and
Akka). Separate "suspicion" from "declared dead" thresholds. Monitor false-positive and
false-negative rates and tune accordingly.
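
The idea behind an accrual detector is to output a continuous suspicion level instead of a binary timeout. The sketch below is a simplified version that assumes heartbeat intervals are exponentially distributed, so that phi = elapsed / (mean interval x ln 10); the published detector fits a different distribution, and the two thresholds here are illustrative rather than tuned values.

```python
import math
from collections import deque
from typing import Deque, Optional


class PhiAccrualDetector:
    """Simplified phi accrual failure detector.

    Assumes inter-arrival times are exponentially distributed, so
    P(next heartbeat arrives later than t) = exp(-t / mean_interval)
    and phi = -log10 of that probability.
    """

    def __init__(self, window: int = 100, suspect_phi: float = 1.0,
                 dead_phi: float = 8.0) -> None:
        self._intervals: Deque[float] = deque(maxlen=window)
        self._last_heartbeat: Optional[float] = None
        self.suspect_phi = suspect_phi   # "suspicion" threshold (illustrative)
        self.dead_phi = dead_phi         # "declared dead" threshold (illustrative)

    def heartbeat(self, now: float) -> None:
        if self._last_heartbeat is not None:
            self._intervals.append(now - self._last_heartbeat)
        self._last_heartbeat = now

    def phi(self, now: float) -> float:
        if self._last_heartbeat is None or not self._intervals:
            return 0.0                   # not enough history to suspect anything
        mean = sum(self._intervals) / len(self._intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self._last_heartbeat
        return elapsed / (mean * math.log(10))

    def status(self, now: float) -> str:
        value = self.phi(now)
        if value >= self.dead_phi:
            return "dead"
        if value >= self.suspect_phi:
            return "suspect"
        return "alive"


detector = PhiAccrualDetector()
for t in range(0, 10):                   # heartbeats arrive once per second
    detector.heartbeat(float(t))

for now in (9.5, 13.0, 30.0):
    print(now, round(detector.phi(now), 2), detector.status(now))
```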

---

## Technology Landscape

### CP Databases and When to Use Them

| Database | Consensus Protocol | Consistency Level | Best For |
|----------|-------------------|-------------------|----------|
| **PostgreSQL** (single primary + replicas) | N/A (single writer) | Linearizable (primary), eventual (replicas) | General-purpose OLTP, moderate scale |
| **CockroachDB** | Raft | Serializable (default), read-committed available | Global OLTP requiring strong consistency with PostgreSQL compatibility |
| **Google Spanner** | Paxos + TrueTime | External consistency (stronger than linearizable) | Global-scale OLTP where Google Cloud is acceptable |
| **TiDB** | Raft (via TiKV) | Snapshot isolation (default), configurable | MySQL-compatible distributed OLTP |
| **YugabyteDB** | Raft | Serializable (YSQL), tunable (YCQL) | PostgreSQL-compatible distributed OLTP |
| **FoundationDB** | Custom (OCC + Paxos) | Strictly serializable | Low-level key-value requiring strongest guarantees |
| **etcd** | Raft | Linearizable | Configuration management, service discovery, leader election |
| **ZooKeeper** | ZAB | Linearizable | Distributed coordination, lock management |

### AP Databases and When to Use Them

| Database | Replication Model | Conflict Resolution | Best For |
|----------|------------------|-------------------|----------|
| **Cassandra** | Gossip + hinted handoff | Last-write-wins (LWW) by default, LWTs available | High-throughput time-series, IoT, logs |
| **DynamoDB** | Multi-master (global tables) | Last-write-wins; strong reads available per-request | Serverless, key-value, session stores |
| **Riak** | Vnodes + hinted handoff | Vector clocks, CRDTs, sibling resolution | High availability key-value, session stores |
| **CouchDB** | Multi-master HTTP replication | Revision tree, deterministic winner, user-resolves conflicts | Offline-first mobile, document sync |
| **ScyllaDB** | Gossip (Cassandra-compatible) | Last-write-wins (Cassandra-compatible) | Cassandra workloads requiring lower latency |
| **Redis** (replicated) | Async primary-replica | Last-write-wins (no conflict detection) | Caching, session stores, pub/sub |

### Tunable Consistency Databases

These databases allow per-operation consistency tuning, which is the most practical approach
for systems with mixed consistency requirements:

| Database | Consistency Tuning Mechanism | Range |
|----------|---------------------------|-------|
| **Cassandra** | Per-query consistency level (ONE, TWO, THREE, QUORUM, ALL, LOCAL_QUORUM, EACH_QUORUM) | Full AP to near-CP |
| **DynamoDB** | `ConsistentRead: true` parameter on GetItem/Query | AP (default) or CP per request |
| **MongoDB** | Read concern (local, majority, linearizable) + write concern (w:1, w:majority, or w:N for a fixed number of nodes) | Near-AP to CP |
| **YugabyteDB** | YSQL (serializable) vs YCQL (tunable, Cassandra-compatible) | CP (YSQL) or tunable (YCQL) |
| **Cosmos DB** | Five consistency levels (strong, bounded staleness, session, consistent prefix, eventual) | CP to AP in five steps |
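
One way to read the "Full AP to near-CP" range for Dynamo-style stores: with N replicas, a read that contacts R replicas is guaranteed to overlap a write acknowledged by W replicas only when R + W > N. The sketch below checks that condition for a few level pairs. It is an illustration under simplifying assumptions (single datacenter, no failed or partially applied writes), and the helper names are hypothetical; only the level names mirror Cassandra's.

```python
def replicas_required(level: str, replication_factor: int) -> int:
    """Number of replicas that must respond for a given consistency level."""
    required = {
        "ONE": 1,
        "TWO": 2,
        "THREE": 3,
        "QUORUM": replication_factor // 2 + 1,
        "ALL": replication_factor,
    }
    return required[level]


def read_sees_latest_write(read_level: str, write_level: str,
                           replication_factor: int = 3) -> bool:
    """True when every read set must intersect every acknowledged write set."""
    r = replicas_required(read_level, replication_factor)
    w = replicas_required(write_level, replication_factor)
    return r + w > replication_factor


for read_level, write_level in [("ONE", "ONE"), ("QUORUM", "QUORUM"), ("ONE", "ALL")]:
    print(read_level, write_level, read_sees_latest_write(read_level, write_level))
```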

---

## Decision Tree

Use this flowchart to determine the appropriate consistency model for a specific feature
or operation (not for the entire system):

```
START: What happens if this operation returns stale data?
  |
  |-- "Financial loss, safety risk, or regulatory violation"
  |     |
  |     --> Use STRONG CONSISTENCY (CP)
  |         |
  |         |-- Is this a global system with cross-region writes?
  |         |     |-- Yes --> CockroachDB, Spanner, YugabyteDB
  |         |     |-- No --> PostgreSQL (single primary), MySQL with Group Replication
  |         |
  |         |-- Is this a coordination/config primitive?
  |               |-- Yes --> etcd, ZooKeeper
  |               |-- No --> Use database above with serializable isolation
  |
  |-- "User confusion but no lasting harm"
  |     |
  |     --> Use SESSION CONSISTENCY (read-your-writes)
  |         |
  |         |-- Route user's reads to the same node that processed their writes
  |         |-- Or use sticky sessions with a bounded staleness guarantee
  |         |-- Example: User sees their own post immediately; followers see it within seconds
  |
  |-- "Minor inconvenience or unnoticeable"
  |     |
  |     --> Use EVENTUAL CONSISTENCY (AP)
  |         |
  |         |-- Can conflicts be automatically resolved?
  |         |     |-- Yes, with LWW --> Cassandra, DynamoDB, Redis
  |         |     |-- Yes, with CRDTs --> Riak, custom implementation
  |         |     |-- No, needs manual merge --> CouchDB, application-level resolution
  |         |
  |         |-- Is this a cache?
  |               |-- Yes --> Redis, Memcached with TTL-based invalidation
  |               |-- No --> Choose based on data model and query patterns
  |
  |-- "It depends on the specific field or context"
        |
        --> Use MIXED CONSISTENCY (tunable per-operation)
            |
            |-- Use Cosmos DB's five levels, or
            |-- Use DynamoDB with per-request ConsistentRead, or
            |-- Use separate databases for different consistency tiers
```

---

## Implementation Sketch

### Pattern: Consistency Tier Router

A middleware that routes requests to different data stores based on the consistency
requirement of each operation:

```python
from enum import Enum
from typing import Any, Optional


class ConsistencyLevel(Enum):
    STRONG = "strong"      # Linearizable reads from primary
    SESSION = "session"    # Read-your-writes within a session
    BOUNDED = "bounded"    # Staleness bounded by time or version
    EVENTUAL = "eventual"  # Read from any replica or cache


class ConsistencyRouter:
    """Routes data operations to the appropriate store based on
    the consistency level required by each operation."""

    def __init__(self, primary_db, read_replica, cache):
        self.primary = primary_db    # CP: PostgreSQL primary
        self.replica = read_replica  # Near-consistent: streaming replica
        self.cache = cache           # AP: Redis cache

    def read(self, key: str, level: ConsistencyLevel,
             session_id: Optional[str] = None) -> Any:
        if level == ConsistencyLevel.STRONG:
            # Always read from primary -- linearizable
            return self.primary.read(key)

        if level == ConsistencyLevel.SESSION:
            # Look up the log position (LSN) of this session's last write to the key
            last_write_lsn = None
            if session_id:
                last_write_lsn = self.cache.get(f"session:{session_id}:lwt:{key}")
            # replayed_lsn() is an assumed replica method: the last LSN it has applied
            if last_write_lsn is not None and self.replica.replayed_lsn() < last_write_lsn:
                # Replica has not caught up to this session's write
                return self.primary.read(key)
            return self.replica.read(key)

        if level == ConsistencyLevel.BOUNDED:
            # Read from replica if lag is within bounds
            if self.replica.lag_seconds() < 5:
                return self.replica.read(key)
            return self.primary.read(key)

        if level == ConsistencyLevel.EVENTUAL:
            # Try cache first, then fall back to the replica
            cached = self.cache.get(key)
            if cached is not None:
                return cached
            value = self.replica.read(key)
            if value is not None:
                # Do not cache misses, or a later write would stay hidden until the TTL expires
                self.cache.set(key, value, ttl=60)
            return value

        raise ValueError(f"Unsupported consistency level: {level!r}")

    def write(self, key: str, value: Any,
              session_id: Optional[str] = None) -> None:
        # Writes always go to primary (CP path)
        self.primary.write(key, value)

        # Record the write's log position (LSN) for session consistency
        if session_id:
            self.cache.set(
                f"session:{session_id}:lwt:{key}",
                self.primary.current_lsn(),
                ttl=300
            )

        # Invalidate the cached copy so later eventual reads repopulate it (AP path)
        self.cache.delete(key)
```

### Pattern: Feature Consistency Declaration

Declare consistency requirements per feature in configuration, not in code, so they can
be reviewed and audited. Each entry specifies the consistency level, backing store,
optional staleness bound, conflict resolution strategy, and a rationale:

```yaml
# consistency-config.yaml
features:
  user_authentication: { consistency: strong, store: postgresql_primary, rationale: "Security-critical" }
  product_catalog: { consistency: eventual, store: elasticsearch, max_staleness: 30s }
  inventory_check: { consistency: bounded, store: postgresql_replica, max_staleness: 2s }
  inventory_reservation: { consistency: strong, store: postgresql_primary, isolation: serializable }
  shopping_cart: { consistency: session, store: dynamodb, conflict_resolution: union_merge }
  recommendation_feed: { consistency: eventual, store: redis_cache, max_staleness: 5m }
  payment_processing: { consistency: strong, store: postgresql_primary, isolation: serializable }
```
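
Keeping these declarations in configuration also makes them checkable. The sketch below validates an already-parsed `features` mapping; loading the YAML file is left out, and the rules themselves (known levels, required `store`, `max_staleness` for bounded entries) are assumptions about what a team might enforce, not part of any existing tool. The `search_suggestions` entry is a hypothetical bad example.

```python
from typing import Any, Dict, List

ALLOWED_LEVELS = {"strong", "session", "bounded", "eventual"}


def validate_features(features: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return human-readable problems found in a parsed `features` mapping."""
    problems: List[str] = []
    for name, spec in features.items():
        level = spec.get("consistency")
        if level not in ALLOWED_LEVELS:
            problems.append(f"{name}: unknown consistency level {level!r}")
        if "store" not in spec:
            problems.append(f"{name}: missing backing store")
        if level == "bounded" and "max_staleness" not in spec:
            problems.append(f"{name}: bounded consistency needs max_staleness")
    return problems


features = {
    "inventory_check": {"consistency": "bounded", "store": "postgresql_replica",
                        "max_staleness": "2s"},
    "shopping_cart": {"consistency": "session", "store": "dynamodb",
                      "conflict_resolution": "union_merge"},
    "search_suggestions": {"consistency": "bounded", "store": "redis_cache"},
}
for problem in validate_features(features):
    print(problem)
```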

---

## Key Takeaways

1. **CAP is about partitions, not about normal operation.** During normal operation, a
   well-designed distributed system provides both consistency and availability. The
   tradeoff only manifests during network partitions.

2. **PACELC is the more useful model.** It captures the latency-consistency tradeoff that
   dominates day-to-day system design, not just the partition-time tradeoff.

3. **Per-feature, not per-system.** Choose consistency levels per feature, per operation,
   or even per request. No serious production system is uniformly CP or AP.

4. **Start consistent, relax deliberately.** Begin with strong consistency. Identify paths
   where eventual consistency is acceptable. Document the rationale for each relaxation.

5. **The real question is cost of inconsistency vs cost of unavailability.** For each
   feature, quantify what happens when data is stale versus what happens when the service
   is down. The answer determines your consistency choice.

---

## Cross-References

- **distributed-systems-fundamentals** -- Foundational concepts (replication, consensus, failure models) that underpin CAP
- **data-consistency** -- Deep dive into consistency models, isolation levels, and implementation patterns
- **consensus-and-coordination** -- Raft, Paxos, ZAB, and other protocols that implement CP guarantees
- **sql-vs-nosql** -- Database selection criteria beyond CAP, including data model, query patterns, and operational concerns

---

## Sources

- Brewer, E. (2000). "Towards Robust Distributed Systems." ACM PODC Keynote.
- Gilbert, S. and Lynch, N. (2002). "Brewer's Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services." ACM SIGACT News.
- Brewer, E. (2012). "CAP Twelve Years Later: How the 'Rules' Have Changed." IEEE Computer.
- Abadi, D. (2012). "Consistency Tradeoffs in Modern Distributed Database System Design." IEEE Computer.
- Kleppmann, M. (2015). ["Please stop calling databases CP or AP."](https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html)
- Brewer, E. (2017). ["Spanner, TrueTime and the CAP Theorem."](https://research.google/pubs/spanner-truetime-and-the-cap-theorem/) Google Research.
- DeCandia, G. et al. (2007). "Dynamo: Amazon's Highly Available Key-value Store." SOSP.
- [CAP Theorem -- Wikipedia](https://en.wikipedia.org/wiki/CAP_theorem)
- [PACELC Theorem -- Wikipedia](https://en.wikipedia.org/wiki/PACELC_design_principle)
- [Jepsen: Consistency Models](https://jepsen.io/consistency)
- [Consistency and Partition Tolerance -- ByteByteGo](https://blog.bytebytego.com/p/consistency-and-partition-tolerance)
- [CAP Theorem -- IBM](https://www.ibm.com/think/topics/cap-theorem)
- [PACELC Theorem -- ScyllaDB](https://www.scylladb.com/glossary/pacelc-theorem/)
- [Strong Consistency Models -- Aphyr](https://aphyr.com/posts/313-strong-consistency-models)