@wazir-dev/cli 1.0.0
This diff shows the contents of publicly released package versions as they appear in their public registries. It is provided for informational purposes only.
- package/AGENTS.md +111 -0
- package/CHANGELOG.md +14 -0
- package/CONTRIBUTING.md +101 -0
- package/LICENSE +21 -0
- package/README.md +314 -0
- package/assets/composition-engine.mmd +34 -0
- package/assets/demo-script.sh +17 -0
- package/assets/logo-dark.svg +14 -0
- package/assets/logo.svg +14 -0
- package/assets/pipeline.mmd +39 -0
- package/assets/record-demo.sh +51 -0
- package/docs/README.md +51 -0
- package/docs/adapters/context-mode.md +60 -0
- package/docs/concepts/architecture.md +87 -0
- package/docs/concepts/artifact-model.md +60 -0
- package/docs/concepts/composition-engine.md +36 -0
- package/docs/concepts/indexing-and-recall.md +160 -0
- package/docs/concepts/observability.md +41 -0
- package/docs/concepts/roles-and-workflows.md +59 -0
- package/docs/concepts/terminology-policy.md +27 -0
- package/docs/getting-started/01-installation.md +78 -0
- package/docs/getting-started/02-first-run.md +102 -0
- package/docs/getting-started/03-adding-to-project.md +15 -0
- package/docs/getting-started/04-host-setup.md +15 -0
- package/docs/guides/ci-integration.md +15 -0
- package/docs/guides/creating-skills.md +15 -0
- package/docs/guides/expertise-module-authoring.md +15 -0
- package/docs/guides/hook-development.md +15 -0
- package/docs/guides/memory-and-learnings.md +34 -0
- package/docs/guides/multi-host-export.md +15 -0
- package/docs/guides/troubleshooting.md +101 -0
- package/docs/guides/writing-custom-roles.md +15 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-design.md +592 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +598 -0
- package/docs/plans/2026-03-15-docs-enforcement-plan.md +238 -0
- package/docs/readmes/INDEX.md +99 -0
- package/docs/readmes/features/expertise/README.md +171 -0
- package/docs/readmes/features/exports/README.md +222 -0
- package/docs/readmes/features/hooks/README.md +103 -0
- package/docs/readmes/features/hooks/loop-cap-guard.md +133 -0
- package/docs/readmes/features/hooks/post-tool-capture.md +121 -0
- package/docs/readmes/features/hooks/post-tool-lint.md +130 -0
- package/docs/readmes/features/hooks/pre-compact-summary.md +122 -0
- package/docs/readmes/features/hooks/pre-tool-capture-route.md +100 -0
- package/docs/readmes/features/hooks/protected-path-write-guard.md +128 -0
- package/docs/readmes/features/hooks/session-start.md +119 -0
- package/docs/readmes/features/hooks/stop-handoff-harvest.md +125 -0
- package/docs/readmes/features/roles/README.md +157 -0
- package/docs/readmes/features/roles/clarifier.md +152 -0
- package/docs/readmes/features/roles/content-author.md +190 -0
- package/docs/readmes/features/roles/designer.md +193 -0
- package/docs/readmes/features/roles/executor.md +184 -0
- package/docs/readmes/features/roles/learner.md +210 -0
- package/docs/readmes/features/roles/planner.md +182 -0
- package/docs/readmes/features/roles/researcher.md +164 -0
- package/docs/readmes/features/roles/reviewer.md +184 -0
- package/docs/readmes/features/roles/specifier.md +162 -0
- package/docs/readmes/features/roles/verifier.md +215 -0
- package/docs/readmes/features/schemas/README.md +178 -0
- package/docs/readmes/features/skills/README.md +63 -0
- package/docs/readmes/features/skills/brainstorming.md +96 -0
- package/docs/readmes/features/skills/debugging.md +148 -0
- package/docs/readmes/features/skills/design.md +120 -0
- package/docs/readmes/features/skills/prepare-next.md +109 -0
- package/docs/readmes/features/skills/run-audit.md +159 -0
- package/docs/readmes/features/skills/scan-project.md +109 -0
- package/docs/readmes/features/skills/self-audit.md +176 -0
- package/docs/readmes/features/skills/tdd.md +137 -0
- package/docs/readmes/features/skills/using-skills.md +92 -0
- package/docs/readmes/features/skills/verification.md +120 -0
- package/docs/readmes/features/skills/writing-plans.md +104 -0
- package/docs/readmes/features/tooling/README.md +320 -0
- package/docs/readmes/features/workflows/README.md +186 -0
- package/docs/readmes/features/workflows/author.md +181 -0
- package/docs/readmes/features/workflows/clarify.md +154 -0
- package/docs/readmes/features/workflows/design-review.md +171 -0
- package/docs/readmes/features/workflows/design.md +169 -0
- package/docs/readmes/features/workflows/discover.md +162 -0
- package/docs/readmes/features/workflows/execute.md +173 -0
- package/docs/readmes/features/workflows/learn.md +167 -0
- package/docs/readmes/features/workflows/plan-review.md +165 -0
- package/docs/readmes/features/workflows/plan.md +170 -0
- package/docs/readmes/features/workflows/prepare-next.md +167 -0
- package/docs/readmes/features/workflows/review.md +169 -0
- package/docs/readmes/features/workflows/run-audit.md +191 -0
- package/docs/readmes/features/workflows/spec-challenge.md +159 -0
- package/docs/readmes/features/workflows/specify.md +160 -0
- package/docs/readmes/features/workflows/verify.md +177 -0
- package/docs/readmes/packages/README.md +50 -0
- package/docs/readmes/packages/ajv.md +117 -0
- package/docs/readmes/packages/context-mode.md +118 -0
- package/docs/readmes/packages/gray-matter.md +116 -0
- package/docs/readmes/packages/node-test.md +137 -0
- package/docs/readmes/packages/yaml.md +112 -0
- package/docs/reference/configuration-reference.md +159 -0
- package/docs/reference/expertise-index.md +52 -0
- package/docs/reference/git-flow.md +43 -0
- package/docs/reference/hooks.md +87 -0
- package/docs/reference/host-exports.md +50 -0
- package/docs/reference/launch-checklist.md +172 -0
- package/docs/reference/marketplace-listings.md +76 -0
- package/docs/reference/release-process.md +34 -0
- package/docs/reference/roles-reference.md +77 -0
- package/docs/reference/skills.md +33 -0
- package/docs/reference/templates.md +29 -0
- package/docs/reference/tooling-cli.md +94 -0
- package/docs/truth-claims.yaml +222 -0
- package/expertise/PROGRESS.md +63 -0
- package/expertise/README.md +18 -0
- package/expertise/antipatterns/PROGRESS.md +56 -0
- package/expertise/antipatterns/backend/api-design-antipatterns.md +1271 -0
- package/expertise/antipatterns/backend/auth-antipatterns.md +1195 -0
- package/expertise/antipatterns/backend/caching-antipatterns.md +622 -0
- package/expertise/antipatterns/backend/database-antipatterns.md +1038 -0
- package/expertise/antipatterns/backend/index.md +24 -0
- package/expertise/antipatterns/backend/microservices-antipatterns.md +850 -0
- package/expertise/antipatterns/code/architecture-antipatterns.md +919 -0
- package/expertise/antipatterns/code/async-antipatterns.md +622 -0
- package/expertise/antipatterns/code/code-smells.md +1186 -0
- package/expertise/antipatterns/code/dependency-antipatterns.md +1209 -0
- package/expertise/antipatterns/code/error-handling-antipatterns.md +1360 -0
- package/expertise/antipatterns/code/index.md +27 -0
- package/expertise/antipatterns/code/naming-and-abstraction.md +1118 -0
- package/expertise/antipatterns/code/state-management-antipatterns.md +1076 -0
- package/expertise/antipatterns/code/testing-antipatterns.md +1053 -0
- package/expertise/antipatterns/design/accessibility-antipatterns.md +1136 -0
- package/expertise/antipatterns/design/dark-patterns.md +1121 -0
- package/expertise/antipatterns/design/index.md +22 -0
- package/expertise/antipatterns/design/ui-antipatterns.md +1202 -0
- package/expertise/antipatterns/design/ux-antipatterns.md +680 -0
- package/expertise/antipatterns/frontend/css-layout-antipatterns.md +691 -0
- package/expertise/antipatterns/frontend/flutter-antipatterns.md +1827 -0
- package/expertise/antipatterns/frontend/index.md +23 -0
- package/expertise/antipatterns/frontend/mobile-antipatterns.md +573 -0
- package/expertise/antipatterns/frontend/react-antipatterns.md +1128 -0
- package/expertise/antipatterns/frontend/spa-antipatterns.md +1235 -0
- package/expertise/antipatterns/index.md +31 -0
- package/expertise/antipatterns/performance/index.md +20 -0
- package/expertise/antipatterns/performance/performance-antipatterns.md +1013 -0
- package/expertise/antipatterns/performance/premature-optimization.md +623 -0
- package/expertise/antipatterns/performance/scaling-antipatterns.md +785 -0
- package/expertise/antipatterns/process/ai-coding-antipatterns.md +853 -0
- package/expertise/antipatterns/process/code-review-antipatterns.md +656 -0
- package/expertise/antipatterns/process/deployment-antipatterns.md +920 -0
- package/expertise/antipatterns/process/index.md +23 -0
- package/expertise/antipatterns/process/technical-debt-antipatterns.md +647 -0
- package/expertise/antipatterns/security/index.md +20 -0
- package/expertise/antipatterns/security/secrets-antipatterns.md +849 -0
- package/expertise/antipatterns/security/security-theater.md +843 -0
- package/expertise/antipatterns/security/vulnerability-patterns.md +801 -0
- package/expertise/architecture/PROGRESS.md +70 -0
- package/expertise/architecture/data/caching-architecture.md +671 -0
- package/expertise/architecture/data/data-consistency.md +574 -0
- package/expertise/architecture/data/data-modeling.md +536 -0
- package/expertise/architecture/data/event-streams-and-queues.md +634 -0
- package/expertise/architecture/data/index.md +25 -0
- package/expertise/architecture/data/search-architecture.md +663 -0
- package/expertise/architecture/data/sql-vs-nosql.md +708 -0
- package/expertise/architecture/decisions/architecture-decision-records.md +640 -0
- package/expertise/architecture/decisions/build-vs-buy.md +616 -0
- package/expertise/architecture/decisions/index.md +23 -0
- package/expertise/architecture/decisions/monolith-to-microservices.md +790 -0
- package/expertise/architecture/decisions/technology-selection.md +616 -0
- package/expertise/architecture/distributed/cap-theorem-and-tradeoffs.md +800 -0
- package/expertise/architecture/distributed/circuit-breaker-bulkhead.md +741 -0
- package/expertise/architecture/distributed/consensus-and-coordination.md +796 -0
- package/expertise/architecture/distributed/distributed-systems-fundamentals.md +564 -0
- package/expertise/architecture/distributed/idempotency-and-retry.md +796 -0
- package/expertise/architecture/distributed/index.md +25 -0
- package/expertise/architecture/distributed/saga-pattern.md +797 -0
- package/expertise/architecture/foundations/architectural-thinking.md +460 -0
- package/expertise/architecture/foundations/coupling-and-cohesion.md +770 -0
- package/expertise/architecture/foundations/design-principles-solid.md +649 -0
- package/expertise/architecture/foundations/domain-driven-design.md +719 -0
- package/expertise/architecture/foundations/index.md +25 -0
- package/expertise/architecture/foundations/separation-of-concerns.md +472 -0
- package/expertise/architecture/foundations/twelve-factor-app.md +797 -0
- package/expertise/architecture/index.md +34 -0
- package/expertise/architecture/integration/api-design-graphql.md +638 -0
- package/expertise/architecture/integration/api-design-grpc.md +804 -0
- package/expertise/architecture/integration/api-design-rest.md +892 -0
- package/expertise/architecture/integration/index.md +25 -0
- package/expertise/architecture/integration/third-party-integration.md +795 -0
- package/expertise/architecture/integration/webhooks-and-callbacks.md +1152 -0
- package/expertise/architecture/integration/websockets-realtime.md +791 -0
- package/expertise/architecture/mobile-architecture/index.md +22 -0
- package/expertise/architecture/mobile-architecture/mobile-app-architecture.md +780 -0
- package/expertise/architecture/mobile-architecture/mobile-backend-for-frontend.md +670 -0
- package/expertise/architecture/mobile-architecture/offline-first.md +719 -0
- package/expertise/architecture/mobile-architecture/push-and-sync.md +782 -0
- package/expertise/architecture/patterns/cqrs-event-sourcing.md +717 -0
- package/expertise/architecture/patterns/event-driven.md +797 -0
- package/expertise/architecture/patterns/hexagonal-clean-architecture.md +870 -0
- package/expertise/architecture/patterns/index.md +27 -0
- package/expertise/architecture/patterns/layered-architecture.md +736 -0
- package/expertise/architecture/patterns/microservices.md +753 -0
- package/expertise/architecture/patterns/modular-monolith.md +692 -0
- package/expertise/architecture/patterns/monolith.md +626 -0
- package/expertise/architecture/patterns/plugin-architecture.md +735 -0
- package/expertise/architecture/patterns/serverless.md +780 -0
- package/expertise/architecture/scaling/database-scaling.md +615 -0
- package/expertise/architecture/scaling/feature-flags-and-rollouts.md +757 -0
- package/expertise/architecture/scaling/horizontal-vs-vertical.md +606 -0
- package/expertise/architecture/scaling/index.md +24 -0
- package/expertise/architecture/scaling/multi-tenancy.md +800 -0
- package/expertise/architecture/scaling/stateless-design.md +787 -0
- package/expertise/backend/embedded-firmware.md +625 -0
- package/expertise/backend/go.md +853 -0
- package/expertise/backend/index.md +24 -0
- package/expertise/backend/java-spring.md +448 -0
- package/expertise/backend/node-typescript.md +625 -0
- package/expertise/backend/python-fastapi.md +724 -0
- package/expertise/backend/rust.md +458 -0
- package/expertise/backend/solidity.md +711 -0
- package/expertise/composition-map.yaml +443 -0
- package/expertise/content/foundations/content-modeling.md +395 -0
- package/expertise/content/foundations/editorial-standards.md +449 -0
- package/expertise/content/foundations/index.md +24 -0
- package/expertise/content/foundations/microcopy.md +455 -0
- package/expertise/content/foundations/terminology-governance.md +509 -0
- package/expertise/content/index.md +34 -0
- package/expertise/content/patterns/accessibility-copy.md +518 -0
- package/expertise/content/patterns/index.md +24 -0
- package/expertise/content/patterns/notification-content.md +433 -0
- package/expertise/content/patterns/sample-content.md +486 -0
- package/expertise/content/patterns/state-copy.md +439 -0
- package/expertise/design/PROGRESS.md +58 -0
- package/expertise/design/disciplines/dark-mode-theming.md +577 -0
- package/expertise/design/disciplines/design-systems.md +595 -0
- package/expertise/design/disciplines/index.md +25 -0
- package/expertise/design/disciplines/information-architecture.md +800 -0
- package/expertise/design/disciplines/interaction-design.md +788 -0
- package/expertise/design/disciplines/responsive-design.md +552 -0
- package/expertise/design/disciplines/usability-testing.md +516 -0
- package/expertise/design/disciplines/user-research.md +792 -0
- package/expertise/design/foundations/accessibility-design.md +796 -0
- package/expertise/design/foundations/color-theory.md +797 -0
- package/expertise/design/foundations/iconography.md +795 -0
- package/expertise/design/foundations/index.md +26 -0
- package/expertise/design/foundations/motion-and-animation.md +653 -0
- package/expertise/design/foundations/rtl-design.md +585 -0
- package/expertise/design/foundations/spacing-and-layout.md +607 -0
- package/expertise/design/foundations/typography.md +800 -0
- package/expertise/design/foundations/visual-hierarchy.md +761 -0
- package/expertise/design/index.md +32 -0
- package/expertise/design/patterns/authentication-flows.md +474 -0
- package/expertise/design/patterns/content-consumption.md +789 -0
- package/expertise/design/patterns/data-display.md +618 -0
- package/expertise/design/patterns/e-commerce.md +1494 -0
- package/expertise/design/patterns/feedback-and-states.md +642 -0
- package/expertise/design/patterns/forms-and-input.md +819 -0
- package/expertise/design/patterns/gamification.md +801 -0
- package/expertise/design/patterns/index.md +31 -0
- package/expertise/design/patterns/microinteractions.md +449 -0
- package/expertise/design/patterns/navigation.md +800 -0
- package/expertise/design/patterns/notifications.md +705 -0
- package/expertise/design/patterns/onboarding.md +700 -0
- package/expertise/design/patterns/search-and-filter.md +601 -0
- package/expertise/design/patterns/settings-and-preferences.md +768 -0
- package/expertise/design/patterns/social-and-community.md +748 -0
- package/expertise/design/platforms/desktop-native.md +612 -0
- package/expertise/design/platforms/index.md +25 -0
- package/expertise/design/platforms/mobile-android.md +825 -0
- package/expertise/design/platforms/mobile-cross-platform.md +983 -0
- package/expertise/design/platforms/mobile-ios.md +699 -0
- package/expertise/design/platforms/tablet.md +794 -0
- package/expertise/design/platforms/web-dashboard.md +790 -0
- package/expertise/design/platforms/web-responsive.md +550 -0
- package/expertise/design/psychology/behavioral-nudges.md +449 -0
- package/expertise/design/psychology/cognitive-load.md +1191 -0
- package/expertise/design/psychology/error-psychology.md +778 -0
- package/expertise/design/psychology/index.md +22 -0
- package/expertise/design/psychology/persuasive-design.md +736 -0
- package/expertise/design/psychology/user-mental-models.md +623 -0
- package/expertise/design/tooling/open-pencil.md +266 -0
- package/expertise/frontend/angular.md +1073 -0
- package/expertise/frontend/desktop-electron.md +546 -0
- package/expertise/frontend/flutter.md +782 -0
- package/expertise/frontend/index.md +27 -0
- package/expertise/frontend/native-android.md +409 -0
- package/expertise/frontend/native-ios.md +490 -0
- package/expertise/frontend/react-native.md +1160 -0
- package/expertise/frontend/react.md +808 -0
- package/expertise/frontend/vue.md +1089 -0
- package/expertise/humanize/domain-rules-code.md +79 -0
- package/expertise/humanize/domain-rules-content.md +67 -0
- package/expertise/humanize/domain-rules-technical-docs.md +56 -0
- package/expertise/humanize/index.md +35 -0
- package/expertise/humanize/self-audit-checklist.md +87 -0
- package/expertise/humanize/sentence-patterns.md +218 -0
- package/expertise/humanize/vocabulary-blacklist.md +105 -0
- package/expertise/i18n/PROGRESS.md +65 -0
- package/expertise/i18n/advanced/accessibility-and-i18n.md +28 -0
- package/expertise/i18n/advanced/bidirectional-text-algorithm.md +38 -0
- package/expertise/i18n/advanced/complex-scripts.md +30 -0
- package/expertise/i18n/advanced/performance-and-i18n.md +27 -0
- package/expertise/i18n/advanced/testing-i18n.md +28 -0
- package/expertise/i18n/content/content-adaptation.md +23 -0
- package/expertise/i18n/content/locale-specific-formatting.md +23 -0
- package/expertise/i18n/content/machine-translation-integration.md +28 -0
- package/expertise/i18n/content/translation-management.md +29 -0
- package/expertise/i18n/foundations/date-time-calendars.md +67 -0
- package/expertise/i18n/foundations/i18n-architecture.md +272 -0
- package/expertise/i18n/foundations/locale-and-language-tags.md +79 -0
- package/expertise/i18n/foundations/numbers-currency-units.md +61 -0
- package/expertise/i18n/foundations/pluralization-and-gender.md +109 -0
- package/expertise/i18n/foundations/string-externalization.md +236 -0
- package/expertise/i18n/foundations/text-direction-bidi.md +241 -0
- package/expertise/i18n/foundations/unicode-and-encoding.md +86 -0
- package/expertise/i18n/index.md +38 -0
- package/expertise/i18n/platform/backend-i18n.md +31 -0
- package/expertise/i18n/platform/flutter-i18n.md +148 -0
- package/expertise/i18n/platform/native-android-i18n.md +36 -0
- package/expertise/i18n/platform/native-ios-i18n.md +36 -0
- package/expertise/i18n/platform/react-i18n.md +103 -0
- package/expertise/i18n/platform/web-css-i18n.md +81 -0
- package/expertise/i18n/rtl/arabic-specific.md +175 -0
- package/expertise/i18n/rtl/hebrew-specific.md +149 -0
- package/expertise/i18n/rtl/rtl-animations-and-transitions.md +111 -0
- package/expertise/i18n/rtl/rtl-forms-and-input.md +161 -0
- package/expertise/i18n/rtl/rtl-fundamentals.md +211 -0
- package/expertise/i18n/rtl/rtl-icons-and-images.md +181 -0
- package/expertise/i18n/rtl/rtl-layout-mirroring.md +252 -0
- package/expertise/i18n/rtl/rtl-navigation-and-gestures.md +107 -0
- package/expertise/i18n/rtl/rtl-testing-and-qa.md +147 -0
- package/expertise/i18n/rtl/rtl-typography.md +160 -0
- package/expertise/index.md +113 -0
- package/expertise/index.yaml +216 -0
- package/expertise/infrastructure/cloud-aws.md +597 -0
- package/expertise/infrastructure/cloud-gcp.md +599 -0
- package/expertise/infrastructure/cybersecurity.md +816 -0
- package/expertise/infrastructure/database-mongodb.md +447 -0
- package/expertise/infrastructure/database-postgres.md +400 -0
- package/expertise/infrastructure/devops-cicd.md +787 -0
- package/expertise/infrastructure/index.md +27 -0
- package/expertise/performance/PROGRESS.md +50 -0
- package/expertise/performance/backend/api-latency.md +1204 -0
- package/expertise/performance/backend/background-jobs.md +506 -0
- package/expertise/performance/backend/connection-pooling.md +1209 -0
- package/expertise/performance/backend/database-query-optimization.md +515 -0
- package/expertise/performance/backend/index.md +23 -0
- package/expertise/performance/backend/rate-limiting-and-throttling.md +971 -0
- package/expertise/performance/foundations/algorithmic-complexity.md +954 -0
- package/expertise/performance/foundations/caching-strategies.md +489 -0
- package/expertise/performance/foundations/concurrency-and-parallelism.md +847 -0
- package/expertise/performance/foundations/index.md +24 -0
- package/expertise/performance/foundations/measuring-and-profiling.md +440 -0
- package/expertise/performance/foundations/memory-management.md +964 -0
- package/expertise/performance/foundations/performance-budgets.md +1314 -0
- package/expertise/performance/index.md +31 -0
- package/expertise/performance/infrastructure/auto-scaling.md +1059 -0
- package/expertise/performance/infrastructure/cdn-and-edge.md +1081 -0
- package/expertise/performance/infrastructure/index.md +22 -0
- package/expertise/performance/infrastructure/load-balancing.md +1081 -0
- package/expertise/performance/infrastructure/observability.md +1079 -0
- package/expertise/performance/mobile/index.md +23 -0
- package/expertise/performance/mobile/mobile-animations.md +544 -0
- package/expertise/performance/mobile/mobile-memory-battery.md +416 -0
- package/expertise/performance/mobile/mobile-network.md +452 -0
- package/expertise/performance/mobile/mobile-rendering.md +599 -0
- package/expertise/performance/mobile/mobile-startup-time.md +505 -0
- package/expertise/performance/platform-specific/flutter-performance.md +647 -0
- package/expertise/performance/platform-specific/index.md +22 -0
- package/expertise/performance/platform-specific/node-performance.md +1307 -0
- package/expertise/performance/platform-specific/postgres-performance.md +1366 -0
- package/expertise/performance/platform-specific/react-performance.md +1403 -0
- package/expertise/performance/web/bundle-optimization.md +1239 -0
- package/expertise/performance/web/image-and-media.md +636 -0
- package/expertise/performance/web/index.md +24 -0
- package/expertise/performance/web/network-optimization.md +1133 -0
- package/expertise/performance/web/rendering-performance.md +1098 -0
- package/expertise/performance/web/ssr-and-hydration.md +918 -0
- package/expertise/performance/web/web-vitals.md +1374 -0
- package/expertise/quality/accessibility.md +985 -0
- package/expertise/quality/evidence-based-verification.md +499 -0
- package/expertise/quality/index.md +24 -0
- package/expertise/quality/ml-model-audit.md +614 -0
- package/expertise/quality/performance.md +600 -0
- package/expertise/quality/testing-api.md +891 -0
- package/expertise/quality/testing-mobile.md +496 -0
- package/expertise/quality/testing-web.md +849 -0
- package/expertise/security/PROGRESS.md +54 -0
- package/expertise/security/agentic-identity.md +540 -0
- package/expertise/security/compliance-frameworks.md +601 -0
- package/expertise/security/data/data-encryption.md +364 -0
- package/expertise/security/data/data-privacy-gdpr.md +692 -0
- package/expertise/security/data/database-security.md +1171 -0
- package/expertise/security/data/index.md +22 -0
- package/expertise/security/data/pii-handling.md +531 -0
- package/expertise/security/foundations/authentication.md +1041 -0
- package/expertise/security/foundations/authorization.md +603 -0
- package/expertise/security/foundations/cryptography.md +1001 -0
- package/expertise/security/foundations/index.md +25 -0
- package/expertise/security/foundations/owasp-top-10.md +1354 -0
- package/expertise/security/foundations/secrets-management.md +1217 -0
- package/expertise/security/foundations/secure-sdlc.md +700 -0
- package/expertise/security/foundations/supply-chain-security.md +698 -0
- package/expertise/security/index.md +31 -0
- package/expertise/security/infrastructure/cloud-security-aws.md +1296 -0
- package/expertise/security/infrastructure/cloud-security-gcp.md +1376 -0
- package/expertise/security/infrastructure/container-security.md +721 -0
- package/expertise/security/infrastructure/incident-response.md +1295 -0
- package/expertise/security/infrastructure/index.md +24 -0
- package/expertise/security/infrastructure/logging-and-monitoring.md +1618 -0
- package/expertise/security/infrastructure/network-security.md +1337 -0
- package/expertise/security/mobile/index.md +23 -0
- package/expertise/security/mobile/mobile-android-security.md +1218 -0
- package/expertise/security/mobile/mobile-binary-protection.md +1229 -0
- package/expertise/security/mobile/mobile-data-storage.md +1265 -0
- package/expertise/security/mobile/mobile-ios-security.md +1401 -0
- package/expertise/security/mobile/mobile-network-security.md +1520 -0
- package/expertise/security/smart-contract-security.md +594 -0
- package/expertise/security/testing/index.md +22 -0
- package/expertise/security/testing/penetration-testing.md +1258 -0
- package/expertise/security/testing/security-code-review.md +1765 -0
- package/expertise/security/testing/threat-modeling.md +1074 -0
- package/expertise/security/testing/vulnerability-scanning.md +1062 -0
- package/expertise/security/web/api-security.md +586 -0
- package/expertise/security/web/cors-and-headers.md +433 -0
- package/expertise/security/web/csrf.md +562 -0
- package/expertise/security/web/file-upload.md +1477 -0
- package/expertise/security/web/index.md +25 -0
- package/expertise/security/web/injection.md +1375 -0
- package/expertise/security/web/session-management.md +1101 -0
- package/expertise/security/web/xss.md +1158 -0
- package/exports/README.md +17 -0
- package/exports/hosts/claude/.claude/agents/clarifier.md +42 -0
- package/exports/hosts/claude/.claude/agents/content-author.md +63 -0
- package/exports/hosts/claude/.claude/agents/designer.md +55 -0
- package/exports/hosts/claude/.claude/agents/executor.md +55 -0
- package/exports/hosts/claude/.claude/agents/learner.md +51 -0
- package/exports/hosts/claude/.claude/agents/planner.md +53 -0
- package/exports/hosts/claude/.claude/agents/researcher.md +43 -0
- package/exports/hosts/claude/.claude/agents/reviewer.md +54 -0
- package/exports/hosts/claude/.claude/agents/specifier.md +47 -0
- package/exports/hosts/claude/.claude/agents/verifier.md +71 -0
- package/exports/hosts/claude/.claude/commands/author.md +42 -0
- package/exports/hosts/claude/.claude/commands/clarify.md +38 -0
- package/exports/hosts/claude/.claude/commands/design-review.md +46 -0
- package/exports/hosts/claude/.claude/commands/design.md +44 -0
- package/exports/hosts/claude/.claude/commands/discover.md +37 -0
- package/exports/hosts/claude/.claude/commands/execute.md +48 -0
- package/exports/hosts/claude/.claude/commands/learn.md +38 -0
- package/exports/hosts/claude/.claude/commands/plan-review.md +42 -0
- package/exports/hosts/claude/.claude/commands/plan.md +39 -0
- package/exports/hosts/claude/.claude/commands/prepare-next.md +37 -0
- package/exports/hosts/claude/.claude/commands/review.md +40 -0
- package/exports/hosts/claude/.claude/commands/run-audit.md +41 -0
- package/exports/hosts/claude/.claude/commands/spec-challenge.md +41 -0
- package/exports/hosts/claude/.claude/commands/specify.md +38 -0
- package/exports/hosts/claude/.claude/commands/verify.md +37 -0
- package/exports/hosts/claude/.claude/settings.json +34 -0
- package/exports/hosts/claude/CLAUDE.md +19 -0
- package/exports/hosts/claude/export.manifest.json +38 -0
- package/exports/hosts/claude/host-package.json +67 -0
- package/exports/hosts/codex/AGENTS.md +19 -0
- package/exports/hosts/codex/export.manifest.json +38 -0
- package/exports/hosts/codex/host-package.json +41 -0
- package/exports/hosts/cursor/.cursor/hooks.json +16 -0
- package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +19 -0
- package/exports/hosts/cursor/export.manifest.json +38 -0
- package/exports/hosts/cursor/host-package.json +42 -0
- package/exports/hosts/gemini/GEMINI.md +19 -0
- package/exports/hosts/gemini/export.manifest.json +38 -0
- package/exports/hosts/gemini/host-package.json +41 -0
- package/hooks/README.md +18 -0
- package/hooks/definitions/loop_cap_guard.yaml +21 -0
- package/hooks/definitions/post_tool_capture.yaml +24 -0
- package/hooks/definitions/pre_compact_summary.yaml +19 -0
- package/hooks/definitions/pre_tool_capture_route.yaml +19 -0
- package/hooks/definitions/protected_path_write_guard.yaml +19 -0
- package/hooks/definitions/session_start.yaml +19 -0
- package/hooks/definitions/stop_handoff_harvest.yaml +20 -0
- package/hooks/loop-cap-guard +17 -0
- package/hooks/post-tool-lint +36 -0
- package/hooks/protected-path-write-guard +17 -0
- package/hooks/session-start +41 -0
- package/llms-full.txt +2355 -0
- package/llms.txt +43 -0
- package/package.json +79 -0
- package/roles/README.md +20 -0
- package/roles/clarifier.md +42 -0
- package/roles/content-author.md +63 -0
- package/roles/designer.md +55 -0
- package/roles/executor.md +55 -0
- package/roles/learner.md +51 -0
- package/roles/planner.md +53 -0
- package/roles/researcher.md +43 -0
- package/roles/reviewer.md +54 -0
- package/roles/specifier.md +47 -0
- package/roles/verifier.md +71 -0
- package/schemas/README.md +24 -0
- package/schemas/accepted-learning.schema.json +20 -0
- package/schemas/author-artifact.schema.json +156 -0
- package/schemas/clarification.schema.json +19 -0
- package/schemas/design-artifact.schema.json +80 -0
- package/schemas/docs-claim.schema.json +18 -0
- package/schemas/export-manifest.schema.json +20 -0
- package/schemas/hook.schema.json +67 -0
- package/schemas/host-export-package.schema.json +18 -0
- package/schemas/implementation-plan.schema.json +19 -0
- package/schemas/proposed-learning.schema.json +19 -0
- package/schemas/research.schema.json +18 -0
- package/schemas/review.schema.json +29 -0
- package/schemas/run-manifest.schema.json +18 -0
- package/schemas/spec-challenge.schema.json +18 -0
- package/schemas/spec.schema.json +20 -0
- package/schemas/usage.schema.json +102 -0
- package/schemas/verification-proof.schema.json +29 -0
- package/schemas/wazir-manifest.schema.json +173 -0
- package/skills/README.md +40 -0
- package/skills/brainstorming/SKILL.md +77 -0
- package/skills/debugging/SKILL.md +50 -0
- package/skills/design/SKILL.md +61 -0
- package/skills/dispatching-parallel-agents/SKILL.md +128 -0
- package/skills/executing-plans/SKILL.md +70 -0
- package/skills/finishing-a-development-branch/SKILL.md +169 -0
- package/skills/humanize/SKILL.md +123 -0
- package/skills/init-pipeline/SKILL.md +124 -0
- package/skills/prepare-next/SKILL.md +20 -0
- package/skills/receiving-code-review/SKILL.md +123 -0
- package/skills/requesting-code-review/SKILL.md +105 -0
- package/skills/requesting-code-review/code-reviewer.md +108 -0
- package/skills/run-audit/SKILL.md +197 -0
- package/skills/scan-project/SKILL.md +41 -0
- package/skills/self-audit/SKILL.md +153 -0
- package/skills/subagent-driven-development/SKILL.md +154 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
- package/skills/subagent-driven-development/implementer-prompt.md +102 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
- package/skills/tdd/SKILL.md +23 -0
- package/skills/using-git-worktrees/SKILL.md +163 -0
- package/skills/using-skills/SKILL.md +95 -0
- package/skills/verification/SKILL.md +22 -0
- package/skills/wazir/SKILL.md +463 -0
- package/skills/writing-plans/SKILL.md +30 -0
- package/skills/writing-skills/SKILL.md +157 -0
- package/skills/writing-skills/anthropic-best-practices.md +122 -0
- package/skills/writing-skills/persuasion-principles.md +50 -0
- package/templates/README.md +20 -0
- package/templates/artifacts/README.md +10 -0
- package/templates/artifacts/accepted-learning.md +19 -0
- package/templates/artifacts/accepted-learning.template.json +12 -0
- package/templates/artifacts/author.md +74 -0
- package/templates/artifacts/author.template.json +19 -0
- package/templates/artifacts/clarification.md +21 -0
- package/templates/artifacts/clarification.template.json +12 -0
- package/templates/artifacts/execute-notes.md +19 -0
- package/templates/artifacts/implementation-plan.md +21 -0
- package/templates/artifacts/implementation-plan.template.json +11 -0
- package/templates/artifacts/learning-proposal.md +19 -0
- package/templates/artifacts/next-run-handoff.md +21 -0
- package/templates/artifacts/plan-review.md +19 -0
- package/templates/artifacts/proposed-learning.template.json +12 -0
- package/templates/artifacts/research.md +21 -0
- package/templates/artifacts/research.template.json +12 -0
- package/templates/artifacts/review-findings.md +19 -0
- package/templates/artifacts/review.template.json +11 -0
- package/templates/artifacts/run-manifest.template.json +8 -0
- package/templates/artifacts/spec-challenge.md +19 -0
- package/templates/artifacts/spec-challenge.template.json +11 -0
- package/templates/artifacts/spec.md +21 -0
- package/templates/artifacts/spec.template.json +12 -0
- package/templates/artifacts/verification-proof.md +19 -0
- package/templates/artifacts/verification-proof.template.json +11 -0
- package/templates/examples/accepted-learning.example.json +14 -0
- package/templates/examples/author.example.json +152 -0
- package/templates/examples/clarification.example.json +15 -0
- package/templates/examples/docs-claim.example.json +8 -0
- package/templates/examples/export-manifest.example.json +7 -0
- package/templates/examples/host-export-package.example.json +11 -0
- package/templates/examples/implementation-plan.example.json +17 -0
- package/templates/examples/proposed-learning.example.json +13 -0
- package/templates/examples/research.example.json +15 -0
- package/templates/examples/research.example.md +6 -0
- package/templates/examples/review.example.json +17 -0
- package/templates/examples/run-manifest.example.json +9 -0
- package/templates/examples/spec-challenge.example.json +14 -0
- package/templates/examples/spec.example.json +21 -0
- package/templates/examples/verification-proof.example.json +21 -0
- package/templates/examples/wazir-manifest.example.yaml +65 -0
- package/templates/task-definition-schema.md +99 -0
- package/tooling/README.md +20 -0
- package/tooling/src/adapters/context-mode.js +50 -0
- package/tooling/src/capture/command.js +376 -0
- package/tooling/src/capture/store.js +99 -0
- package/tooling/src/capture/usage.js +270 -0
- package/tooling/src/checks/branches.js +50 -0
- package/tooling/src/checks/brand-truth.js +110 -0
- package/tooling/src/checks/changelog.js +231 -0
- package/tooling/src/checks/command-registry.js +36 -0
- package/tooling/src/checks/commits.js +102 -0
- package/tooling/src/checks/docs-drift.js +103 -0
- package/tooling/src/checks/docs-truth.js +201 -0
- package/tooling/src/checks/runtime-surface.js +156 -0
- package/tooling/src/cli.js +116 -0
- package/tooling/src/command-options.js +56 -0
- package/tooling/src/commands/validate.js +320 -0
- package/tooling/src/doctor/command.js +91 -0
- package/tooling/src/export/command.js +77 -0
- package/tooling/src/export/compiler.js +498 -0
- package/tooling/src/guards/loop-cap-guard.js +52 -0
- package/tooling/src/guards/protected-path-write-guard.js +67 -0
- package/tooling/src/index/command.js +152 -0
- package/tooling/src/index/storage.js +1061 -0
- package/tooling/src/index/summarizers.js +261 -0
- package/tooling/src/loaders.js +18 -0
- package/tooling/src/project-root.js +22 -0
- package/tooling/src/recall/command.js +225 -0
- package/tooling/src/schema-validator.js +30 -0
- package/tooling/src/state-root.js +40 -0
- package/tooling/src/status/command.js +71 -0
- package/wazir.manifest.yaml +135 -0
- package/workflows/README.md +19 -0
- package/workflows/author.md +42 -0
- package/workflows/clarify.md +38 -0
- package/workflows/design-review.md +46 -0
- package/workflows/design.md +44 -0
- package/workflows/discover.md +37 -0
- package/workflows/execute.md +48 -0
- package/workflows/learn.md +38 -0
- package/workflows/plan-review.md +42 -0
- package/workflows/plan.md +39 -0
- package/workflows/prepare-next.md +37 -0
- package/workflows/review.md +40 -0
- package/workflows/run-audit.md +41 -0
- package/workflows/spec-challenge.md +41 -0
- package/workflows/specify.md +38 -0
- package/workflows/verify.md +37 -0
|
@@ -0,0 +1,615 @@
|
|
|
1
|
+
# Database Scaling — Architecture Expertise Module
|
|
2
|
+
|
|
3
|
+
> Database scaling is typically the hardest scaling challenge because databases are stateful. The progression: optimize queries → add indexes → vertical scaling → read replicas → caching → partitioning → sharding. Most applications never need to go past read replicas. Premature sharding is one of the most expensive architectural mistakes.
|
|
4
|
+
|
|
5
|
+
> **Category:** Scaling
|
|
6
|
+
> **Complexity:** Complex
|
|
7
|
+
> **Applies when:** Database becoming a performance bottleneck due to query volume, data size, or write throughput
|
|
8
|
+
|
|
9
|
+
---
|
|
10
|
+
|
|
11
|
+
## What This Is
|
|
12
|
+
|
|
13
|
+
Database scaling increases a system's capacity to handle growing workloads — more QPS, larger datasets, higher write throughput. Unlike stateless app servers, databases hold persistent state, making every scaling step fundamentally harder.
|
|
14
|
+
|
|
15
|
+
### The Scaling Ladder
|
|
16
|
+
|
|
17
|
+
Each step is roughly **10x harder** than the previous one in complexity, operational burden, and risk.
|
|
18
|
+
|
|
19
|
+
```
|
|
20
|
+
Level 0: Optimize Queries — Cost: Hours | Risk: None | Impact: 2-100x
|
|
21
|
+
Level 1: Add/Improve Indexes — Cost: Hours | Risk: Low | Impact: 10-1000x
|
|
22
|
+
Level 2: Vertical Scaling — Cost: Minutes | Risk: Low | Impact: 2-8x
|
|
23
|
+
Level 3: Connection Pooling — Cost: Days | Risk: Low | Impact: 2-10x
|
|
24
|
+
Level 4: Read Replicas — Cost: Days | Risk: Medium | Impact: 2-50x
|
|
25
|
+
Level 5: Caching Layer — Cost: Weeks | Risk: Medium | Impact: 10-100x
|
|
26
|
+
Level 6: Table Partitioning — Cost: Weeks | Risk: Medium | Impact: 2-10x
|
|
27
|
+
Level 7: Vertical Partitioning — Cost: Months | Risk: High | Impact: 2-10x
|
|
28
|
+
Level 8: Horizontal Sharding — Cost: Months+ | Risk: Very High | Impact: 10-1000x
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
**Critical insight:** Levels 0-3 are free or nearly free. Level 4 handles 80%+ of scaling needs. Level 5 handles another 15%. Only ~5% of applications ever genuinely need Levels 6-8. OpenAI serves 800M ChatGPT users with a single PostgreSQL primary and ~50 read replicas — no sharding.
|
|
32
|
+
|
|
33
|
+
---
|
|
34
|
+
|
|
35
|
+
## When to Use Each Level
|
|
36
|
+
|
|
37
|
+
### Level 0-1: Query and Index Optimization — Always First
|
|
38
|
+
|
|
39
|
+
**Signals:** Slow queries >100ms, sequential scans on tables >10K rows, N+1 patterns, missing composite indexes.
|
|
40
|
+
**Tools:** `EXPLAIN ANALYZE`, `pg_stat_statements`, `auto_explain`.
|
|
41
|
+
**Outcome:** 10-1000x improvement on specific queries. Often eliminates all other scaling needs.
|
|
42
|
+
|
|
43
|
+
### Level 2: Vertical Scaling — Bigger Hardware
|
|
44
|
+
|
|
45
|
+
**Signals:** CPU >70% sustained after query optimization, instance is not the largest available.
|
|
46
|
+
**Ceiling:** AWS's r6g.16xlarge offers 64 vCPUs / 512GB RAM, and larger instance classes are available. Azure offers up to 128 vCPUs / 4TB RAM. Most databases never exhaust these.
|
|
47
|
+
|
|
48
|
+
### Level 3: Connection Pooling — Reduce Overhead
|
|
49
|
+
|
|
50
|
+
**Signals:** `max_connections` errors, hundreds of idle connections (~10MB each in PostgreSQL), serverless connection storms.
|
|
51
|
+
**Rule:** Any app with >100 RPS should use PgBouncer (transaction mode) or ProxySQL.
|
|
52
|
+
|
|
53
|
+
### Level 4: Read Replicas — Scale Reads (80% of use cases)
|
|
54
|
+
|
|
55
|
+
**Signals:** >80% read workload, single primary CPU saturated by SELECTs, can tolerate <100ms replication lag.
|
|
56
|
+
**Scale:** 2-5 replicas (most apps), 5-15 (high-traffic SaaS), 15-50 (exceptional — OpenAI).
|
|
57
|
+
**Requirement:** Application must handle eventual consistency and read-your-writes routing.
|
|
58
|
+
|
|
59
|
+
### Level 5: Caching Layer — Absorb Hot Reads
|
|
60
|
+
|
|
61
|
+
**Signals:** Hot data <10% of total, same queries repeated thousands of times/min, data changes infrequently.
|
|
62
|
+
**Patterns:** Cache-aside (lazy load), write-through, write-behind. Redis preferred over Memcached.
|
|
63
|
+
**Critical:** Implement thundering herd protection (cache locks). OpenAI uses single-flight cache locking.
|
|
64
|
+
|
|
65
|
+
### Level 6: Table Partitioning — Split Large Tables
|
|
66
|
+
|
|
67
|
+
**Signals:** Tables >100M rows, VACUUM/REINDEX taking hours, queries naturally filter by partition key (date, tenant).
|
|
68
|
+
**Strategies:** Range (time-series), List (categorical), Hash (even distribution).
|
|
69
|
+
|
|
70
|
+
### Level 7: Vertical Partitioning — Split by Domain
|
|
71
|
+
|
|
72
|
+
**Signals:** Independent feature domains with different scaling needs, no cross-table JOINs needed between groups.
|
|
73
|
+
**Example:** Figma went from 1 to 12 PostgreSQL databases by moving table groups (Files, Organizations) to dedicated servers.
|
|
74
|
+
|
|
75
|
+
### Level 8: Horizontal Sharding — Split Rows Across Servers
|
|
76
|
+
|
|
77
|
+
**ALL must be true:** Vertical scaling exhausted, read replicas insufficient (write-bound), connection pooling and caching in place, single table >1-5TB and growing, writes >50-200K/sec sustained.
|
|
78
|
+
**Reality:** Most PostgreSQL instances handle 1TB+ with proper indexing. You almost certainly do not need sharding.
|
|
79
|
+
|
|
80
|
+
---
|
|
81
|
+
|
|
82
|
+
## When NOT to Shard
|
|
83
|
+
|
|
84
|
+
**Premature sharding is one of the most catastrophic architectural mistakes.** It is essentially irreversible without a multi-month migration.
|
|
85
|
+
|
|
86
|
+
### What Sharding Destroys
|
|
87
|
+
|
|
88
|
+
| Operation | Before Sharding | After Sharding |
|
|
89
|
+
|-----------|-----------------|----------------|
|
|
90
|
+
| Simple query | Single-node lookup | Route to correct shard |
|
|
91
|
+
| Aggregate query | `SELECT COUNT(*)` | Query ALL shards, merge results |
|
|
92
|
+
| JOIN | Standard SQL JOIN | Impossible across shard keys, or scatter-gather |
|
|
93
|
+
| Transaction | `BEGIN; ... COMMIT;` | Distributed 2PC across shards |
|
|
94
|
+
| Schema migration | Single `ALTER TABLE` | Execute on EVERY shard, coordinate rollback |
|
|
95
|
+
| Unique constraint | Database enforces | Application must enforce globally |
|
|
96
|
+
| Foreign keys | Database enforces | Cannot enforce across shards |
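
To make the aggregate-query row above concrete, here is a minimal scatter-gather sketch. The shards are hypothetical in-memory lists standing in for separate database servers, and `count_on_shard` stands in for running `SELECT COUNT(*)` on one of them; neither comes from a real sharding library.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for four shards, each holding some order rows.
SHARDS = [
    [{"customer_id": 1, "total": 20}, {"customer_id": 5, "total": 35}],
    [{"customer_id": 2, "total": 10}],
    [],
    [{"customer_id": 3, "total": 50}, {"customer_id": 7, "total": 5}],
]


def count_on_shard(rows) -> int:
    """Stands in for running SELECT COUNT(*) against one shard."""
    return len(rows)


def scatter_gather_count(shards) -> int:
    # Scatter: query every shard in parallel. Gather: merge the partial counts.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        return sum(pool.map(count_on_shard, shards))


print(scatter_gather_count(SHARDS))  # 5
```

Even this simple count touches every shard, so its latency is bounded by the slowest shard and it fails if any single shard is unavailable.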
|
|
97
|
+
|
|
98
|
+
### Real-World Sharding Disasters
|
|
99
|
+
|
|
100
|
+
**The $2.9M Rollback:** A company spent $2.9M (implementation + rollback), wasted 30 months, lost 5 engineers. The database had 100M records / 2TB with 8% CPU and 34% memory usage at 400 QPS. Sharding was never needed — proper indexing would have sufficed.
|
|
101
|
+
|
|
102
|
+
**Foursquare Shard Imbalance (2010):** Two shards, uneven user distribution — one grew to 67GB (exceeding RAM), the other 50GB. Performance collapsed, causing an 11-hour outage. Even "simple" sharding has unpredictable failure modes.
|
|
103
|
+
|
|
104
|
+
**Wrong Shard Key:** E-commerce team sharded by `order_id` (even distribution), but every query was "show all orders for customer X" — scatter-gather across ALL shards. Resharding by `customer_id` required migrating billions of rows over 3 weeks.
|
|
105
|
+
|
|
106
|
+
**Gaming Hot Shard:** Sharded by `game_id` — one viral game got 80% of traffic on one shard while 15 others sat idle.
|
|
107
|
+
|
|
108
|
+
### Do NOT Shard If
|
|
109
|
+
|
|
110
|
+
1. Database under 500GB (proper indexing + vertical scaling handles it)
|
|
111
|
+
2. Under 50K QPS (connection pooling + read replicas suffice)
|
|
112
|
+
3. Under 10K writes/sec (single PostgreSQL primary handles this easily)
|
|
113
|
+
4. Fewer than 10 engineers (sharding requires dedicated DB engineering capacity)
|
|
114
|
+
5. Many cross-entity relationships (sharding destroys JOINs)
|
|
115
|
+
6. You haven't exhausted Levels 0-6 (every prior level is cheaper and less risky)
|
|
116
|
+
|
|
117
|
+
---
|
|
118
|
+
|
|
119
|
+
## How It Works
|
|
120
|
+
|
|
121
|
+
### Read Replicas: Replication and Routing
|
|
122
|
+
|
|
123
|
+
```
      ┌──────────────┐
      │ Application  │
      └──────┬───────┘
      ┌──────▼───────┐
      │ Router/Proxy │
      └──┬─────┬───┬─┘
   ┌─────▼──┐ ┌▼───▼───┐
   │Primary │ │Replicas│  ← WAL streaming from primary
   │(writes)│ │(reads) │
   └────────┘ └────────┘
```
|
|
135
|
+
|
|
136
|
+
**Replication lag patterns:**
|
|
137
|
+
|
|
138
|
+
1. **Read-your-writes:** After a write, route the same session's reads to the primary for a configurable window (e.g., 5 seconds):
|
|
139
|
+
   ```python
   class DatabaseRouter:
       def route(self, query, session):
           if query.is_write():
               return self.primary
           if session.has_recent_write(window_seconds=5):
               return self.primary  # read-your-writes consistency
           return self.replica_pool.next()  # round-robin replicas
   ```
|
|
148
|
+
2. **Monotonic reads:** Pin a user session to one replica to avoid reading from a more-lagged replica after a less-lagged one.
|
|
149
|
+
3. **Causal consistency:** Track logical timestamps; route reads to any replica caught up past the last write's timestamp.
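
The causal-consistency pattern in item 3 could be sketched as follows. This is an illustration under assumptions, not a production router: `Replica`, `Session`, and `CausalRouter` are hypothetical names, and the integer `replayed_lsn` stands in for whatever logical timestamp the system tracks (for PostgreSQL, a WAL position).

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    replayed_lsn: int  # hypothetical: last WAL position this replica has applied


@dataclass
class Session:
    last_write_lsn: int = 0  # WAL position of this session's most recent write


@dataclass
class CausalRouter:
    primary: str
    replicas: list = field(default_factory=list)

    def route_read(self, session: Session) -> str:
        """Return a replica that has replayed the session's last write, else the primary."""
        for replica in self.replicas:
            if replica.replayed_lsn >= session.last_write_lsn:
                return replica.name
        return self.primary


router = CausalRouter(
    primary="primary",
    replicas=[Replica("replica-a", replayed_lsn=90), Replica("replica-b", replayed_lsn=120)],
)
print(router.route_read(Session(last_write_lsn=100)))  # replica-b
print(router.route_read(Session(last_write_lsn=500)))  # primary
```

Unlike the fixed time window used for read-your-writes, this approach sends a read to the primary only when no replica has actually caught up.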
|
|
150
|
+
|
|
151
|
+
**WAL distribution at scale (OpenAI):** At ~50 replicas, the primary cannot stream WAL to all — network bandwidth and CPU pressure cause unstable replica lag. Solution: intermediate "relay" replicas form a tree topology (primary -> relay -> leaf), enabling 100+ replicas without overwhelming the primary.
|
|
152
|
+
|
|
153
|
+
### Connection Pooling: PgBouncer
|
|
154
|
+
|
|
155
|
+
| Mode | Behavior | Use Case |
|
|
156
|
+
|------|----------|----------|
|
|
157
|
+
| Session | Client owns connection for entire session | Legacy apps, prepared statements |
|
|
158
|
+
| Transaction | Connection returned after each transaction | **Most production workloads** |
|
|
159
|
+
| Statement | Connection returned after each statement | Simple read-only workloads |
|
|
160
|
+
|
|
161
|
+
Pool size formula: `(available_RAM / 20MB) / num_databases`, capped at 100-200.
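
As a worked example of the formula, the sketch below computes a per-database pool size. The function name, the default cap of 100 (the low end of the range quoted above), and the floor of 1 are assumptions made for illustration.

```python
def pool_size(available_ram_mb: int, num_databases: int,
              per_conn_mb: int = 20, cap: int = 100) -> int:
    """Pool size per database: (available RAM / per-connection memory) / databases, capped."""
    if num_databases < 1:
        raise ValueError("num_databases must be at least 1")
    uncapped = (available_ram_mb // per_conn_mb) // num_databases
    return max(1, min(uncapped, cap))


print(pool_size(available_ram_mb=4096, num_databases=4))   # 51
print(pool_size(available_ram_mb=65536, num_databases=2))  # 100 (capped)
```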
|
|
162
|
+
|
|
163
|
+
### Partitioning: Range, Hash, List
|
|
164
|
+
|
|
165
|
+
```sql
|
|
166
|
+
-- Range partitioning by date (most common)
|
|
167
|
+
CREATE TABLE events (
|
|
168
|
+
id BIGINT GENERATED ALWAYS AS IDENTITY,
|
|
169
|
+
created_at TIMESTAMPTZ NOT NULL,
|
|
170
|
+
payload JSONB
|
|
171
|
+
) PARTITION BY RANGE (created_at);
|
|
172
|
+
|
|
173
|
+
CREATE TABLE events_2025_01 PARTITION OF events
|
|
174
|
+
FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
|
|
175
|
+
```
|
|
176
|
+
|
|
177
|
+
```sql
|
|
178
|
+
-- Hash partitioning for even distribution
|
|
179
|
+
CREATE TABLE orders (
|
|
180
|
+
id BIGINT GENERATED ALWAYS AS IDENTITY,
|
|
181
|
+
customer_id BIGINT NOT NULL,
|
|
182
|
+
total NUMERIC(10,2)
|
|
183
|
+
) PARTITION BY HASH (customer_id);
|
|
184
|
+
|
|
185
|
+
CREATE TABLE orders_p0 PARTITION OF orders FOR VALUES WITH (MODULUS 8, REMAINDER 0);
|
|
186
|
+
CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (MODULUS 8, REMAINDER 1);
|
|
187
|
+
-- ... through REMAINDER 7
|
|
188
|
+
```
|
|
189
|
+
|
|
190
|
+
Partition pruning: `WHERE created_at >= '2025-01-15' AND created_at < '2025-01-20'` scans only the January partition, not all data. Verify with `EXPLAIN ANALYZE`.
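
Monthly range partitions like `events_2025_01` have to exist before rows arrive for them. The sketch below generates the DDL for a run of consecutive months. It is a hypothetical helper (in practice `pg_partman` automates this), and the table name is interpolated directly into the SQL, so it must come from trusted code rather than user input.

```python
from datetime import date


def next_month(d: date) -> date:
    """First day of the month after `d`."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def monthly_partition_ddl(table: str, first: date, count: int) -> list[str]:
    """Build CREATE TABLE ... PARTITION OF statements for `count` consecutive months."""
    statements = []
    start = date(first.year, first.month, 1)
    for _ in range(count):
        end = next_month(start)  # upper bound is exclusive, matching FROM ... TO semantics
        statements.append(
            f"CREATE TABLE {table}_{start:%Y_%m} PARTITION OF {table}\n"
            f"    FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
        )
        start = end
    return statements


for statement in monthly_partition_ddl("events", date(2024, 12, 1), 2):
    print(statement)
```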
|
|
191
|
+
|
|
192
|
+
### Horizontal Sharding: Shard Key Selection
|
|
193
|
+
|
|
194
|
+
The **single most consequential decision** in sharding. Criteria: (1) High cardinality — many distinct values. (2) Even distribution — no power-law hotspots. (3) Query alignment — most queries filter by shard key. (4) Growth stability — stays even as data grows.
|
|
195
|
+
|
|
196
|
+
**Instagram's ID design:** 41 bits timestamp + 13 bits shard ID (8192 logical shards) + 10 bits sequence. Logical shards map to physical servers. Rebalancing moves logical shards — no row-level resharding.
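
The 41 + 13 + 10 bit layout can be illustrated with a short packing sketch. The custom epoch value and the function names are assumptions for illustration; only the bit widths come from the description above.

```python
TIMESTAMP_BITS, SHARD_BITS, SEQUENCE_BITS = 41, 13, 10
CUSTOM_EPOCH_MS = 1_700_000_000_000  # hypothetical epoch; pick one and never change it


def make_id(now_ms: int, shard_id: int, sequence: int) -> int:
    """Pack (milliseconds since epoch, logical shard, sequence) into one 64-bit integer."""
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range")
    elapsed = now_ms - CUSTOM_EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range")
    seq = sequence % (1 << SEQUENCE_BITS)  # sequence wraps within each millisecond
    return (elapsed << (SHARD_BITS + SEQUENCE_BITS)) | (shard_id << SEQUENCE_BITS) | seq


def shard_of(id_: int) -> int:
    """Recover the logical shard from an ID without any lookup table."""
    return (id_ >> SEQUENCE_BITS) & ((1 << SHARD_BITS) - 1)


example = make_id(CUSTOM_EPOCH_MS + 1000, shard_id=1341, sequence=7)
print(shard_of(example))  # 1341
```

Because the timestamp occupies the high bits, IDs sort roughly by creation time, and because the shard is embedded in the ID, any service can route a lookup from the ID alone.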
|
|
197
|
+
|
|
198
|
+
**Routing strategies:**
|
|
199
|
+
|
|
200
|
+
| Strategy | Pros | Cons |
|
|
201
|
+
|----------|------|------|
|
|
202
|
+
| Hash (modulo) | Simple, even distribution | Resharding moves ~all data |
|
|
203
|
+
| Consistent hashing | Minimal data movement on reshard | More complex |
|
|
204
|
+
| Range-based | Efficient range queries | Hot shard risk |
|
|
205
|
+
| Directory/lookup | Maximum flexibility | Lookup table is SPOF |
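
The difference between the first two rows can be measured with a small experiment. The sketch below is an assumption-laden illustration: `HashRing` is a minimal consistent-hash ring with virtual nodes written for this example, not the implementation used by any particular middleware. Growing from 4 to 5 shards, modulo hashing relocates roughly 80% of uniformly hashed keys, while the ring relocates roughly one fifth, and only onto the new shard.

```python
import bisect
import hashlib


def h(value: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.md5(value.encode()).digest()[:8], "big")


def modulo_shard(key: str, num_shards: int) -> int:
    return h(key) % num_shards


class HashRing:
    def __init__(self, shards, vnodes: int = 100):
        # Each shard owns `vnodes` points on the ring to smooth out the distribution.
        self._points = sorted((h(f"{s}#{v}"), s) for s in shards for v in range(vnodes))
        self._hashes = [point for point, _ in self._points]

    def shard_for(self, key: str):
        # The first ring point clockwise from the key's hash owns the key.
        i = bisect.bisect(self._hashes, h(key)) % len(self._points)
        return self._points[i][1]


keys = [f"customer-{i}" for i in range(10_000)]
moved_modulo = sum(modulo_shard(k, 4) != modulo_shard(k, 5) for k in keys) / len(keys)
old_ring, new_ring = HashRing(range(4)), HashRing(range(5))
moved_ring = sum(old_ring.shard_for(k) != new_ring.shard_for(k) for k in keys) / len(keys)
print(f"modulo: {moved_modulo:.0%} moved, consistent hashing: {moved_ring:.0%} moved")
```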
|
|
206
|
+
|
|
207
|
+
### Query Optimization: EXPLAIN ANALYZE
|
|
208
|
+
|
|
209
|
+
The single most important command for database performance:
|
|
210
|
+
|
|
211
|
+
```sql
|
|
212
|
+
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
|
|
213
|
+
SELECT u.name, COUNT(o.id) as order_count
|
|
214
|
+
FROM users u
|
|
215
|
+
JOIN orders o ON o.user_id = u.id
|
|
216
|
+
WHERE u.created_at > '2025-01-01'
|
|
217
|
+
GROUP BY u.name
|
|
218
|
+
ORDER BY order_count DESC LIMIT 10;
|
|
219
|
+
```
|
|
220
|
+
|
|
221
|
+
**Red flags in output:**
|
|
222
|
+
- `Seq Scan` on large tables -> needs an index
|
|
223
|
+
- `Nested Loop` with large outer table -> consider `Hash Join`
|
|
224
|
+
- `Rows Removed by Filter` >> actual rows returned -> index not selective enough
|
|
225
|
+
- `Buffers: shared read` >> `shared hit` -> working set exceeds `shared_buffers`
|
|
226
|
+
- `Sort Method: external merge Disk` -> `work_mem` too low for this query
|
|
227
|
+
|
|
228
|
+
---
|
|
229
|
+
|
|
230
|
+
## Trade-Offs Matrix
|
|
231
|
+
|
|
232
|
+
| Strategy | Complexity | Read Scale | Write Scale | Consistency | Reversibility |
|
|
233
|
+
|----------|-----------|------------|-------------|-------------|---------------|
|
|
234
|
+
| Query optimization | Very Low | High | Medium | Full | N/A |
|
|
235
|
+
| Indexing | Low | Very High | Slight negative | Full | Easy |
|
|
236
|
+
| Vertical scaling | Very Low | Medium | Medium | Full | Trivial |
|
|
237
|
+
| Connection pooling | Low | Medium | Medium | Full | Easy |
|
|
238
|
+
| Read replicas | Medium | Very High | None | Eventual | Moderate |
|
|
239
|
+
| Caching (Redis) | Medium | Very High | None | Eventual/TTL | Moderate |
|
|
240
|
+
| Table partitioning | Medium | Medium | Low | Full | Difficult |
|
|
241
|
+
| Vertical partitioning | High | Medium | Medium | Per-database | Very Difficult |
|
|
242
|
+
| Horizontal sharding | Very High | Very High | Very High | Per-shard | Nearly Impossible |
|
|
243
|
+
|
|
244
|
+
| Strategy | Team Size | Time to Implement | Data Model Impact |
|
|
245
|
+
|----------|-----------|-------------------|-------------------|
|
|
246
|
+
| Query/Index optimization | 1 engineer | Hours-Days | None |
|
|
247
|
+
| Vertical scaling | 1 SRE | Minutes | None |
|
|
248
|
+
| Connection pooling | 1-2 engineers | Days | None |
|
|
249
|
+
| Read replicas | 2-3 engineers | Days-Weeks | Minimal (read routing) |
|
|
250
|
+
| Caching | 2-3 engineers | Weeks | Moderate (invalidation logic) |
|
|
251
|
+
| Table partitioning | 2-3 engineers | Weeks | Moderate (partition key) |
|
|
252
|
+
| Vertical partitioning | 3-5 engineers | Months | High (no cross-DB JOINs) |
|
|
253
|
+
| Horizontal sharding | 5-10 engineers | Months-Years | Fundamental (shard key governs everything) |
|
|
254
|
+
|
|
255
|
+
---
|
|
256
|
+
|
|
257
|
+
## Evolution Path
|
|
258
|
+
|
|
259
|
+
### Stage 1: Foundation (0-1K users, <100 RPS)
|
|
260
|
+
Single PostgreSQL instance. Set up `pg_stat_statements` and `auto_explain` from day one.
|
|
261
|
+
**Advance when:** P95 latency >200ms OR CPU >50% sustained.
|
|
262
|
+
|
|
263
|
+
### Stage 2: Optimization (1K-100K users, 100-1K RPS)
|
|
264
|
+
Fix top 10 queries by total time. Add composite indexes. Eliminate N+1 queries. Add PgBouncer. Tune `shared_buffers`, `work_mem`, `effective_cache_size`.
|
|
265
|
+
|
|
266
|
+
**Key metrics targets:**
|
|
267
|
+
```
|
|
268
|
+
Cache hit ratio: > 99% (if below, increase shared_buffers)
|
|
269
|
+
Connection count: < 80% of max_connections
|
|
270
|
+
CPU utilization: < 70% sustained
|
|
271
|
+
Disk I/O wait: < 10%
|
|
272
|
+
Query p99 latency: < 100ms
|
|
273
|
+
```
|
|
274
|
+
**Advance when:** CPU >70% after optimization.
|
|
275
|
+
|
|
276
|
+
### Stage 3: Read Scaling (100K-10M users, 1K-50K RPS)
|
|
277
|
+
Deploy 2-5 read replicas. Implement read/write routing with read-your-writes consistency. Monitor replication lag; alert at >1s.
|
|
278
|
+
**Advance when:** Replicas saturated OR write throughput is the bottleneck.
|
|
279
|
+
|
|
280
|
+
### Stage 4: Caching (10M+ users, >10K RPS)
|
|
281
|
+
Redis cache-aside for hot entities. Cache invalidation on writes. Thundering herd protection via cache locks (OpenAI pattern: single-flight, one request populates cache, others wait).
|
|
282
|
+
**Advance when:** Single-table performance degrades despite caching.
|
|
283
|
+
|
|
284
|
+
### Stage 5: Partitioning (tables >100M rows)
|
|
285
|
+
Identify tables with natural partition keys. Range by time is most common. Use `pg_partman` for automation. Verify partition pruning in `EXPLAIN` output.
|
|
286
|
+
**Advance when:** Aggregate load exceeds single server.
|
|
287
|
+
|
|
288
|
+
### Stage 6: Vertical Partitioning
|
|
289
|
+
Move independent table groups to separate databases (Figma: 1 → 12 databases). Remove cross-domain JOINs first.
|
|
290
|
+
**Advance when:** Single-table write throughput exceeds vertical limits. (Rare.)
|
|
291
|
+
|
|
292
|
+
### Stage 7: Horizontal Sharding
|
|
293
|
+
Select shard key. Choose middleware (Vitess/Citus/application-level). Use logical shards (Instagram: 8192 logical → few physical). Build dual-write migration with verification (Shopify). Plan for years, not months (Slack: 3 years).
|
|
294
|
+
|
|
295
|
+
---
|
|
296
|
+
|
|
297
|
+
## Failure Modes
|
|
298
|
+
|
|
299
|
+
### 1. Replication Lag → Stale Reads
|
|
300
|
+
User creates a record, sees it missing. **Fix:** Read-your-writes routing; synchronous replication for critical paths; monitor lag, circuit-break to primary if >threshold.
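
The circuit-break step might look like the sketch below. The function name and the input shape (a mapping of replica name to measured lag in seconds) are assumptions; the 1-second default mirrors the alert threshold suggested in Stage 3.

```python
def pick_read_target(replica_lag_seconds: dict, max_lag_seconds: float = 1.0) -> str:
    """Return the least-lagged healthy replica, or "primary" if none is fresh enough."""
    healthy = {
        name: lag for name, lag in replica_lag_seconds.items() if lag <= max_lag_seconds
    }
    if not healthy:
        return "primary"  # circuit-break: every replica exceeds the lag threshold
    return min(healthy, key=healthy.get)


print(pick_read_target({"replica-1": 0.2, "replica-2": 3.0}))  # replica-1
print(pick_read_target({"replica-1": 5.0, "replica-2": 3.0}))  # primary
```

Falling back to the primary protects correctness but shifts read load onto it, so the fallback itself should be monitored.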
|
|
301
|
+
|
|
302
|
+
### 2. Connection Pool Exhaustion
|
|
303
|
+
"Too many connections" errors, total unavailability. **Fix:** Size pools correctly, set `statement_timeout` (30s), monitor at 80% utilization, use `SHOW POOLS` to diagnose.
|
|
304
|
+
|
|
305
|
+
### 3. Shard Key Hot Spots
|
|
306
|
+
One shard overwhelmed while others idle (gaming company: 80% traffic on one shard). **Fix:** Hash-based sharding, split hot keys with salt, monitor per-shard imbalance.
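
Salting a hot key could be sketched as below. The bucket count, the `#` separator, and the function names are hypothetical choices for illustration. The trade-off is visible in `read_keys`: writes to a hot key spread across shards, but reads must fan out to every salt bucket and merge the results.

```python
import hashlib
import random

SALT_BUCKETS = 8  # hypothetical fan-out for keys known to be hot


def shard_for(key: str, num_shards: int) -> int:
    """Hash-based shard placement using a stable hash."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def write_key(game_id: str, hot_keys: set, rng=random) -> str:
    """Append a random salt to hot keys so their writes spread over several shards."""
    if game_id in hot_keys:
        return f"{game_id}#{rng.randrange(SALT_BUCKETS)}"
    return game_id


def read_keys(game_id: str, hot_keys: set) -> list:
    """Reads of a salted key must cover every salt bucket."""
    if game_id in hot_keys:
        return [f"{game_id}#{i}" for i in range(SALT_BUCKETS)]
    return [game_id]


hot = {"viral-game"}
print(sorted({shard_for(k, 16) for k in read_keys("viral-game", hot)}))
```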
|
|
307
|
+
|
|
308
|
+
### 4. Cross-Shard JOIN Impossibility
|
|
309
|
+
Features requiring multi-shard data become impossible. **Fix:** Denormalize joined data into each shard, maintain read-only aggregate store for analytics, application-level joins.
|
|
310
|
+
|
|
311
|
+
### 5. Resharding Downtime
|
|
312
|
+
System unavailable during rebalancing. **Fix:** Use virtual/logical shards from day one (Instagram: 8192 logical shards), consistent hashing, online dual-write resharding.
|
|
313
|
+
|
|
314
|
+
### 6. Cache Stampede (Thundering Herd)
|
|
315
|
+
DB load spikes when popular cache entries expire simultaneously. **Fix:** Cache locks / single-flight, jittered TTLs, background refresh before expiry.
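
Jittered TTLs are the simplest of these fixes. A minimal sketch, assuming a hypothetical helper name and a default spread of plus or minus 10%:

```python
import random


def jittered_ttl(base_seconds: int, jitter_fraction: float = 0.1, rng=random) -> int:
    """Spread expiries around the base TTL so entries cached together do not expire together."""
    spread = base_seconds * jitter_fraction
    return max(1, round(base_seconds + rng.uniform(-spread, spread)))


print(jittered_ttl(300))  # some value between 270 and 330
```

The result would replace a fixed TTL wherever entries are written, for example as the expiry argument passed to the cache client's set-with-expiry call.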
|
|
316
|
+
|
|
317
|
+
### 7. Partition Explosion
|
|
318
|
+
Query planner slows with >10K partitions. **Fix:** Keep under 1000 (ideally <100), coarser granularity, drop old partitions.
|
|
319
|
+
|
|
320
|
+
### 8. Split-Brain During Failover
|
|
321
|
+
Two nodes both accepting writes after failover. **Fix:** Fencing (STONITH), monitor timeline divergence, use managed services that handle failover correctly, test regularly.
|
|
322
|
+
|
|
323
|
+
---
|
|
324
|
+
|
|
325
|
+
## Technology Landscape
|
|
326
|
+
|
|
327
|
+
### PostgreSQL Ecosystem
|
|
328
|
+
| Tool | Purpose |
|
|
329
|
+
|------|---------|
|
|
330
|
+
| **PgBouncer** | Connection pooling (standard for production) |
|
|
331
|
+
| **Citus** | Distributed PostgreSQL — sharding as extension |
|
|
332
|
+
| **pg_partman** | Automated partition management |
|
|
333
|
+
| **Patroni** | HA and automatic failover |
|
|
334
|
+
|
|
335
|
+
### MySQL Ecosystem
|
|
336
|
+
| Tool | Purpose |
|
|
337
|
+
|------|---------|
|
|
338
|
+
| **ProxySQL** | Query routing + connection pooling |
|
|
339
|
+
| **Vitess** | Sharding middleware (Slack, Shopify, GitHub) |
|
|
340
|
+
| **Orchestrator** | Replication topology management |
|
|
341
|
+
|
|
342
|
+
### Managed Cloud Databases
|
|
343
|
+
| Service | Provider | Key Feature |
|
|
344
|
+
|---------|----------|-------------|
|
|
345
|
+
| **Aurora** | AWS | 15 replicas, auto-scales to 128TB |
|
|
346
|
+
| **AlloyDB** | GCP | PostgreSQL-compatible, 100x faster analytics |
|
|
347
|
+
| **PlanetScale** | PlanetScale | Vitess-based serverless MySQL, zero-downtime DDL |
|
|
348
|
+
| **Neon** | Neon | Serverless PostgreSQL, branching, scale-to-zero |
|
|
349
|
+
|
|
350
|
+
### NewSQL / Distributed SQL
|
|
351
|
+
| Database | Sharding | SQL Compat | Best For |
|
|
352
|
+
|----------|----------|------------|----------|
|
|
353
|
+
| **CockroachDB** | Automatic (ranges) | PostgreSQL wire | Global distribution, strong consistency |
|
|
354
|
+
| **YugabyteDB** | Automatic (hash/range) | PostgreSQL-compatible | PostgreSQL apps needing horizontal scale |
|
|
355
|
+
| **TiDB** | Automatic (ranges) | MySQL-compatible | MySQL apps needing horizontal scale |
|
|
356
|
+
| **Vitess** | Application-directed | MySQL (middleware) | Existing MySQL at extreme scale |
|
|
357
|
+
| **Citus** | Explicit (extension) | PostgreSQL (native) | Multi-tenant, real-time analytics |
|
|
358
|
+
|
|
359
|
+
**Key distinction:** CockroachDB/YugabyteDB handle sharding transparently (no shard key). Citus/Vitess require explicit shard key selection. Transparent sharding trades performance for simplicity.
|
|
360
|
+
|
|
361
|
+
### Read Replica Support by Provider
|
|
362
|
+
|
|
363
|
+
| Provider | Max Replicas | Cross-Region | Replication |
|
|
364
|
+
|----------|-------------|--------------|-------------|
|
|
365
|
+
| AWS Aurora | 15 | Yes | Async (sync available) |
|
|
366
|
+
| AWS RDS | 5 | Yes | Async |
|
|
367
|
+
| GCP Cloud SQL | 10 | Yes | Async |
|
|
368
|
+
| Azure Flexible | 5 | Yes (geo-replication) | Async |
|
|
369
|
+
| Self-managed | Unlimited | Manual setup | Streaming WAL |
|
|
370
|
+
|
|
371
|
+
---
|
|
372
|
+
|
|
373
|
+
## Decision Tree
|
|
374
|
+
|
|
375
|
+
```
|
|
376
|
+
START: Database is slow
|
|
377
|
+
│
|
|
378
|
+
├─ Analyzed slow queries? (pg_stat_statements, EXPLAIN ANALYZE)
|
|
379
|
+
│ └─ NO → Do this first. Stop. 90% of problems end here.
|
|
380
|
+
│
|
|
381
|
+
├─ Queries optimized but CPU > 70%?
|
|
382
|
+
│ ├─ Largest instance? NO → Vertically scale. Done.
|
|
383
|
+
│ └─ YES → Continue.
|
|
384
|
+
│
|
|
385
|
+
├─ Workload > 80% reads?
|
|
386
|
+
│ ├─ YES → Add read replicas (start with 2).
|
|
387
|
+
│ │ ├─ Same data read repeatedly? → Add Redis caching.
|
|
388
|
+
│ │ └─ Still bottlenecked? → Continue.
|
|
389
|
+
│ └─ NO (write-heavy) → Continue.
|
|
390
|
+
│
|
|
391
|
+
├─ Have connection pooling?
|
|
392
|
+
│ └─ NO → Add PgBouncer. May solve the problem alone.
|
|
393
|
+
│
|
|
394
|
+
├─ Tables > 100M rows?
|
|
395
|
+
│ └─ YES → Partition (range by time most common).
|
|
396
|
+
│
|
|
397
|
+
├─ Workload splittable into independent domains?
|
|
398
|
+
│ └─ YES → Vertical partition (Figma: 1 → 12 databases).
|
|
399
|
+
│
|
|
400
|
+
├─ Single-table writes > 50K/sec sustained?
|
|
401
|
+
│ ├─ YES → Horizontal sharding warranted.
|
|
402
|
+
│ │ ├─ PostgreSQL → Citus or application-level
|
|
403
|
+
│ │ ├─ MySQL → Vitess or PlanetScale
|
|
404
|
+
│ │ └─ Greenfield → CockroachDB or YugabyteDB
|
|
405
|
+
│ └─ NO → Revisit Levels 0-6. You don't need sharding.
|
|
406
|
+
│
|
|
407
|
+
└─ None of the above?
|
|
408
|
+
   └─ Problem is likely not the database. Check: network latency,
      app-level N+1 patterns, lock contention, disk I/O.
|
|
410
|
+
```
|
|
411
|
+
|
|
412
|
+
---
|
|
413
|
+
|
|
414
|
+
## Implementation Sketch
|
|
415
|
+
|
|
416
|
+
### Read Replica Setup (PostgreSQL / AWS RDS)
|
|
417
|
+
|
|
418
|
+
```bash
|
|
419
|
+
# Create read replica
|
|
420
|
+
aws rds create-db-instance-read-replica \
|
|
421
|
+
--db-instance-identifier myapp-read-1 \
|
|
422
|
+
--source-db-instance-identifier myapp-primary \
|
|
423
|
+
--db-instance-class db.r6g.2xlarge
|
|
424
|
+
```
|
|
425
|
+
|
|
426
|
+
```yaml
# Rails read/write routing (config/database.yml)
production:
  primary:
    url: postgres://myapp-primary.xxx.rds.amazonaws.com/myapp
  primary_replica:
    url: postgres://myapp-read-1.xxx.rds.amazonaws.com/myapp
    replica: true
```
|
|
435
|
+
|
|
436
|
+
### PgBouncer Production Config
|
|
437
|
+
|
|
438
|
+
```ini
[pgbouncer]
; PgBouncer only treats ; or # as a comment at the start of a line
; recommended for most workloads
pool_mode = transaction
; per user/database pair
default_pool_size = 50
max_client_conn = 1000
; hard cap on backend connections
max_db_connections = 100
server_idle_timeout = 600
query_timeout = 30
```
|
|
447
|
+
|
|
448
|
+
### Cache Lock Pattern (Thundering Herd Protection)
|
|
449
|
+
|
|
450
|
+
Based on the OpenAI approach — only one request fetches from the database on a cache miss:
|
|
451
|
+
|
|
452
|
+
```python
import time

# `redis`, `db`, `serialize`, and `deserialize` are assumed to be provided by the application.


def get_user(user_id, max_retries=20):
    key, lock_key = f"user:{user_id}", f"lock:user:{user_id}"
    for _ in range(max_retries):
        cached = redis.get(key)
        if cached is not None:
            return deserialize(cached)

        # Acquire lock - only one request fetches from DB
        if redis.set(lock_key, "1", nx=True, ex=5):
            try:
                user = db.query("SELECT * FROM users WHERE id = %s", user_id)
                redis.setex(key, 300, serialize(user))
                return user
            finally:
                redis.delete(lock_key)  # release the lock even if the query fails

        time.sleep(0.05)  # wait for other request to populate, then retry
    # Bounded retries replace unbounded recursion if the lock holder never finishes
    raise TimeoutError(f"cache fill for user {user_id} did not complete")
```

### Zero-Downtime Partitioning Migration

```sql
-- Step 1: Create partitioned table with same schema
-- (any primary key or unique index copied from events must include created_at)
CREATE TABLE events_partitioned (
    LIKE events INCLUDING ALL
) PARTITION BY RANGE (created_at);

-- Step 2: Create partitions for each range
CREATE TABLE events_p_2025_01 PARTITION OF events_partitioned
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
-- ... create all needed partitions

-- Step 3: Backfill in batches (avoid long locks)
INSERT INTO events_partitioned
SELECT * FROM events
WHERE created_at >= '2025-01-01' AND created_at < '2025-02-01'
ON CONFLICT DO NOTHING;

-- Step 4: Swap tables (brief exclusive lock)
-- Copy any rows written since the backfill before swapping.
BEGIN;
ALTER TABLE events RENAME TO events_old;
ALTER TABLE events_partitioned RENAME TO events;
COMMIT;

-- Step 5: Verify, then drop after confirmation period
-- DROP TABLE events_old;
```
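
Step 2 leaves the remaining partitions as an exercise. One way to produce them is to generate the DDL, as in this minimal sketch. The helper names are hypothetical, and it assumes monthly ranges and the `events_p_YYYY_MM` naming used above.

```python
from datetime import date


def next_month(d):
    """First day of the month after d."""
    return date(d.year + d.month // 12, d.month % 12 + 1, 1)


def monthly_partition_ddl(parent, prefix, start, months):
    """Return one CREATE TABLE ... PARTITION OF statement per month."""
    statements = []
    lower = date(start.year, start.month, 1)
    for _ in range(months):
        upper = next_month(lower)  # upper bound is exclusive, as in the SQL above
        name = f"{prefix}_{lower:%Y_%m}"
        statements.append(
            f"CREATE TABLE {name} PARTITION OF {parent}\n"
            f"    FOR VALUES FROM ('{lower.isoformat()}') TO ('{upper.isoformat()}');"
        )
        lower = upper
    return statements


# Twelve monthly partitions covering 2025.
ddl_2025 = monthly_partition_ddl("events_partitioned", "events_p", date(2025, 1, 1), 12)
```

The first generated statement matches the hand-written one in Step 2, and the last covers December 2025 with an exclusive upper bound of `2026-01-01`.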

### Essential Monitoring Queries

```sql
-- Top queries by total time (requires the pg_stat_statements extension)
SELECT calls, round(mean_exec_time::numeric, 2) as mean_ms,
       round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) as pct,
       left(query, 80) as query
FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 20;

-- Cache hit ratio (target > 99%)
SELECT round(100.0 * sum(heap_blks_hit) /
       nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) as cache_hit_pct
FROM pg_statio_user_tables;

-- Replication lag in bytes (WAL flushed on the standby but not yet replayed)
SELECT client_addr, state,
       flush_lsn - replay_lsn as replay_lag_bytes
FROM pg_stat_replication;

-- Table bloat and dead tuples (VACUUM health)
SELECT relname, n_live_tup, n_dead_tup,
       round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) as dead_pct,
       last_autovacuum
FROM pg_stat_user_tables
WHERE n_live_tup > 10000 ORDER BY n_dead_tup DESC LIMIT 20;

-- Active connections by state
SELECT state, count(*), max(now() - state_change) as longest
FROM pg_stat_activity
WHERE pid <> pg_backend_pid() GROUP BY state;
```
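
If these numbers feed an alerting script, the percentage arithmetic can be mirrored in application code. The sketch below is one possible shape. The 99% cache-hit target comes from the query comment above, while the 20% dead-tuple threshold and all function names are assumptions for illustration.

```python
def pct(numerator, denominator):
    """Percentage rounded to 2 places; None when the denominator is 0 (like nullif)."""
    if denominator == 0:
        return None
    return round(100.0 * numerator / denominator, 2)


def cache_hit_pct(heap_blks_hit, heap_blks_read):
    return pct(heap_blks_hit, heap_blks_hit + heap_blks_read)


def dead_tuple_pct(n_live_tup, n_dead_tup):
    return pct(n_dead_tup, n_live_tup + n_dead_tup)


def warnings_for(cache_hit, dead_pct, min_cache_hit=99.0, max_dead_pct=20.0):
    """Return human-readable warnings; max_dead_pct is an assumed threshold."""
    found = []
    if cache_hit is not None and cache_hit < min_cache_hit:
        found.append(f"cache hit ratio {cache_hit}% is below {min_cache_hit}%")
    if dead_pct is not None and dead_pct > max_dead_pct:
        found.append(f"dead tuples at {dead_pct}% exceed {max_dead_pct}%")
    return found


# Example counters (made up for illustration).
report = warnings_for(cache_hit_pct(9_700, 300), dead_tuple_pct(70_000, 30_000))
```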

---

## Real-World Case Studies

### OpenAI (ChatGPT): Read Replicas at Extreme Scale

**Scale:** 800 million users, millions of queries per second.
**Architecture:** Single Azure PostgreSQL Flexible Server primary + ~50 read replicas across multiple regions.

Key innovations:
- Tree-topology WAL distribution: the primary streams to relay replicas and relays stream to leaf replicas, which avoids overwhelming the primary's network and CPU at 50+ replicas
- Cache lock mechanism: only a single reader that misses on a cache key fetches from PostgreSQL while other requests wait, which prevents a thundering herd
- Consistent low double-digit millisecond p99 client-side latency
- Five-nines (99.999%) availability in production
- **No sharding needed** for the core metadata store

**Lesson:** Aggressive query optimization, caching, and read replicas can scale PostgreSQL far beyond what most engineers expect.
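
The tree topology described above can be sketched as a breadth-first assignment of replicas to upstream nodes. This is an illustration only: the fan-out limit of 7 and the node names are assumptions, and the source does not say how many relays are used.

```python
from collections import deque


def build_replication_tree(replicas, max_fanout):
    """Assign each replica an upstream node so no node streams to more than max_fanout."""
    if max_fanout < 1:
        raise ValueError("max_fanout must be at least 1")
    tree = {"primary": []}
    open_nodes = deque(["primary"])  # nodes that can still accept downstream replicas
    for replica in replicas:
        while len(tree[open_nodes[0]]) >= max_fanout:
            open_nodes.popleft()  # this node is full; move to the next one
        upstream = open_nodes[0]
        tree[upstream].append(replica)
        tree[replica] = []
        open_nodes.append(replica)  # a replica can itself relay WAL further down
    return tree


replica_names = [f"replica-{i}" for i in range(50)]
tree = build_replication_tree(replica_names, max_fanout=7)
direct_from_primary = len(tree["primary"])
```

Under these assumptions the primary streams WAL to 7 relays instead of 50 replicas, and the relays carry the remaining 43.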

### Instagram: Logical Sharding on PostgreSQL

**Scale:** 2+ billion monthly active users.
**Architecture:** Django + PostgreSQL, sharded across thousands of logical shards mapped to physical servers.

Key innovations:
- Custom 64-bit ID: 41-bit timestamp (time-sortable) + 13-bit shard ID (8192 logical shards) + 10-bit auto-increment sequence
- Logical shards as PostgreSQL schemas (not separate databases), so multiple logical shards fit on each physical server
- PgBouncer for connection pooling across all shards
- Cassandra for specific use cases; Redis for ephemeral caching
- Rebalancing moves logical shards between physical servers, with no row-level data migration

**Lesson:** Logical sharding provides resharding flexibility. Start with many logical shards on few physical servers. Rebalancing then means moving entire schemas, not splitting tables.
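
The 41/13/10 bit layout can be made concrete with a few shifts and masks. The sketch below follows the layout described above. The custom epoch value and the function names are assumptions chosen for illustration.

```python
EPOCH_MS = 1_314_220_021_721  # assumed custom epoch, in milliseconds since 1970

TIMESTAMP_BITS, SHARD_BITS, SEQ_BITS = 41, 13, 10


def make_id(now_ms, shard_id, sequence):
    """Pack timestamp, logical shard, and per-shard sequence into one 64-bit ID."""
    elapsed = now_ms - EPOCH_MS
    if not 0 <= elapsed < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp outside the 41-bit range")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id must be in 0..8191")
    seq = sequence % (1 << SEQ_BITS)  # sequence wraps every 1024 IDs
    return (elapsed << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq


def split_id(value):
    """Recover (timestamp_ms, shard_id, sequence) from an ID."""
    seq = value & ((1 << SEQ_BITS) - 1)
    shard_id = (value >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    timestamp_ms = (value >> (SHARD_BITS + SEQ_BITS)) + EPOCH_MS
    return timestamp_ms, shard_id, seq


example_ms = EPOCH_MS + 1_000_000
example_id = make_id(example_ms, shard_id=1341, sequence=5001)
```

Because the timestamp occupies the highest bits, IDs sort by creation time across all shards, and the shard that owns a row can be read straight from its ID.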

### Figma: Incremental Vertical Partitioning

**Scale:** 4M+ users, database traffic growing ~3x annually.
**Starting point:** Single PostgreSQL on AWS RDS, hitting 65% CPU at peak with all queries on one database.

Evolution:
1. Upgraded from r5.12xlarge to r5.24xlarge (largest available), which bought time
2. Added multiple read replicas + PgBouncer as connection pooler
3. Created new databases for new features to limit growth of the original
4. Vertical partitioning: moved high-traffic table groups (Files, Organizations) to dedicated databases, growing from 1 to 12 databases
5. Only then began exploring horizontal sharding on top of vertically partitioned RDS Postgres

**Key principle:** Minimize developer impact. Every step was incremental, with no "big bang" cutover. App developers could focus on features instead of refactoring.

**Lesson:** Exhaust vertical partitioning before horizontal sharding. Going from 1 to 12 databases gave ~12x headroom with far less complexity than sharding.
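
At the application layer, vertical partitioning comes down to routing each table to the database that owns its group. The sketch below shows one minimal way to express that. All table and database names are hypothetical and not taken from Figma's schema.

```python
DEFAULT_DATABASE = "main"

# Hypothetical table groups: tables that are joined together move together.
TABLE_TO_DATABASE = {
    "files": "files_db",
    "file_versions": "files_db",
    "organizations": "orgs_db",
    "org_members": "orgs_db",
}


def database_for(table):
    """Database that owns a table; unmoved tables stay on the original database."""
    return TABLE_TO_DATABASE.get(table, DEFAULT_DATABASE)


def can_join(*tables):
    """A SQL JOIN only works when every table lives on the same database."""
    return len({database_for(t) for t in tables}) == 1


same_group = can_join("files", "file_versions")
cross_group = can_join("files", "organizations")
```

The `can_join` check reflects why tables are moved as groups: a query that joins across two databases has to be split and recombined in application code.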

### Slack: Vitess Migration (3 Years)

**Scale:** Hundreds of thousands of MySQL queries/second, thousands of sharded hosts.
**Problem:** Shard-per-workspace model meant large customers overwhelmed individual shards while thousands of others sat mostly idle.

Migration:
- Chose Vitess for its flexible "shard by anything" model: "no other storage system truly fit all of Slack's needs"
- Timeline: July 2017 to late 2020, 3+ years from 0% to 99% adoption
- Scaled from 0 to 2.3 million QPS on Vitess
- The migration of a single table carrying 20% of overall query load is documented in detail

**Lesson:** Sharding migrations at scale take years, not months. Budget accordingly and plan for long dual-write periods.

### Shopify: Vitess with Query Verification

**Architecture:** MySQL + Vitess, sharded by `user_id`.

- User-owned data in sharded "users" keyspace; global/shared data in unsharded "global" keyspace
- Built query verifiers in the application layer that validated query correctness, routing, and data distribution
- Ran verifiers in shadow mode in production before switching traffic
- Credited verifiers as instrumental to the successful migration

**Lesson:** Build verification tooling before migrating. Validate every query against the new sharded topology in production (shadow mode) before cutting over traffic.
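
A shadow-mode verifier can start out very simply. The sketch below is not Shopify's implementation. It is a rough heuristic with hypothetical table names that flags SELECT, UPDATE, and DELETE statements touching a sharded table without a `user_id` filter. In shadow mode it only logs violations, and in enforcing mode it raises.

```python
import logging
import re

logger = logging.getLogger("query_verifier")

SHARDED_TABLES = {"orders", "carts"}  # hypothetical tables in the sharded keyspace
SHARD_KEY = "user_id"

_TABLE_PATTERN = re.compile(r"\b(?:from|join|update)\s+([a-z_][a-z0-9_]*)")
_KEY_PATTERN = re.compile(rf"\b{SHARD_KEY}\b\s*(?:=|in\b)")


def find_violations(sql):
    """Return a list of problems that would prevent single-shard routing."""
    lowered = sql.lower()
    touched = set(_TABLE_PATTERN.findall(lowered)) & SHARDED_TABLES
    if touched and not _KEY_PATTERN.search(lowered):
        names = ", ".join(sorted(touched))
        return [f"query on sharded table(s) {names} has no {SHARD_KEY} filter"]
    return []


def verify(sql, shadow_mode=True):
    """Log violations in shadow mode; raise once verification is enforced."""
    violations = find_violations(sql)
    if violations and not shadow_mode:
        raise ValueError("; ".join(violations))
    for violation in violations:
        logger.warning("shadow verification: %s | %s", violation, sql)
    return violations


routable = find_violations("SELECT * FROM orders WHERE user_id = %s")
scatter = find_violations("SELECT * FROM orders WHERE status = 'open'")
unsharded = find_violations("SELECT * FROM currencies")
```

Running the verifier in shadow mode first surfaces unroutable queries from real traffic without affecting users. A production verifier would more likely parse SQL properly than rely on regular expressions.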

---

## Cross-References

- **[data-modeling](../data-modeling.md):** Schema design impacts scaling options. Normalized schemas require JOINs that break under sharding.
- **[sql-vs-nosql](../sql-vs-nosql.md):** NoSQL databases are pre-sharded by design but sacrifice JOINs and transactions.
- **[caching-architecture](../caching-architecture.md):** Level 5 on the scaling ladder. Redis can reduce DB load by 90%+, often eliminating further scaling needs.
- **[horizontal-vs-vertical](../horizontal-vs-vertical.md):** Vertical is always simpler. Go horizontal only when vertical limits are reached.
- **[data-consistency](../data-consistency.md):** Every scaling step beyond vertical introduces consistency trade-offs. Understand the CAP theorem before choosing.

---

*Last updated: 2026-03-08*
*Sources: [OpenAI Engineering](https://openai.com/index/scaling-postgresql/), [Instagram Engineering](https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c), [Figma Blog](https://www.figma.com/blog/how-figma-scaled-to-multiple-databases/), [Slack Engineering](https://slack.engineering/scaling-datastores-at-slack-with-vitess/), [Shopify Engineering](https://shopify.engineering/horizontally-scaling-the-rails-backend-of-shop-app-with-vitess)*