@wazir-dev/cli 1.0.0

This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.
Files changed (629)
  1. package/AGENTS.md +111 -0
  2. package/CHANGELOG.md +14 -0
  3. package/CONTRIBUTING.md +101 -0
  4. package/LICENSE +21 -0
  5. package/README.md +314 -0
  6. package/assets/composition-engine.mmd +34 -0
  7. package/assets/demo-script.sh +17 -0
  8. package/assets/logo-dark.svg +14 -0
  9. package/assets/logo.svg +14 -0
  10. package/assets/pipeline.mmd +39 -0
  11. package/assets/record-demo.sh +51 -0
  12. package/docs/README.md +51 -0
  13. package/docs/adapters/context-mode.md +60 -0
  14. package/docs/concepts/architecture.md +87 -0
  15. package/docs/concepts/artifact-model.md +60 -0
  16. package/docs/concepts/composition-engine.md +36 -0
  17. package/docs/concepts/indexing-and-recall.md +160 -0
  18. package/docs/concepts/observability.md +41 -0
  19. package/docs/concepts/roles-and-workflows.md +59 -0
  20. package/docs/concepts/terminology-policy.md +27 -0
  21. package/docs/getting-started/01-installation.md +78 -0
  22. package/docs/getting-started/02-first-run.md +102 -0
  23. package/docs/getting-started/03-adding-to-project.md +15 -0
  24. package/docs/getting-started/04-host-setup.md +15 -0
  25. package/docs/guides/ci-integration.md +15 -0
  26. package/docs/guides/creating-skills.md +15 -0
  27. package/docs/guides/expertise-module-authoring.md +15 -0
  28. package/docs/guides/hook-development.md +15 -0
  29. package/docs/guides/memory-and-learnings.md +34 -0
  30. package/docs/guides/multi-host-export.md +15 -0
  31. package/docs/guides/troubleshooting.md +101 -0
  32. package/docs/guides/writing-custom-roles.md +15 -0
  33. package/docs/plans/2026-03-15-cli-pipeline-integration-design.md +592 -0
  34. package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +598 -0
  35. package/docs/plans/2026-03-15-docs-enforcement-plan.md +238 -0
  36. package/docs/readmes/INDEX.md +99 -0
  37. package/docs/readmes/features/expertise/README.md +171 -0
  38. package/docs/readmes/features/exports/README.md +222 -0
  39. package/docs/readmes/features/hooks/README.md +103 -0
  40. package/docs/readmes/features/hooks/loop-cap-guard.md +133 -0
  41. package/docs/readmes/features/hooks/post-tool-capture.md +121 -0
  42. package/docs/readmes/features/hooks/post-tool-lint.md +130 -0
  43. package/docs/readmes/features/hooks/pre-compact-summary.md +122 -0
  44. package/docs/readmes/features/hooks/pre-tool-capture-route.md +100 -0
  45. package/docs/readmes/features/hooks/protected-path-write-guard.md +128 -0
  46. package/docs/readmes/features/hooks/session-start.md +119 -0
  47. package/docs/readmes/features/hooks/stop-handoff-harvest.md +125 -0
  48. package/docs/readmes/features/roles/README.md +157 -0
  49. package/docs/readmes/features/roles/clarifier.md +152 -0
  50. package/docs/readmes/features/roles/content-author.md +190 -0
  51. package/docs/readmes/features/roles/designer.md +193 -0
  52. package/docs/readmes/features/roles/executor.md +184 -0
  53. package/docs/readmes/features/roles/learner.md +210 -0
  54. package/docs/readmes/features/roles/planner.md +182 -0
  55. package/docs/readmes/features/roles/researcher.md +164 -0
  56. package/docs/readmes/features/roles/reviewer.md +184 -0
  57. package/docs/readmes/features/roles/specifier.md +162 -0
  58. package/docs/readmes/features/roles/verifier.md +215 -0
  59. package/docs/readmes/features/schemas/README.md +178 -0
  60. package/docs/readmes/features/skills/README.md +63 -0
  61. package/docs/readmes/features/skills/brainstorming.md +96 -0
  62. package/docs/readmes/features/skills/debugging.md +148 -0
  63. package/docs/readmes/features/skills/design.md +120 -0
  64. package/docs/readmes/features/skills/prepare-next.md +109 -0
  65. package/docs/readmes/features/skills/run-audit.md +159 -0
  66. package/docs/readmes/features/skills/scan-project.md +109 -0
  67. package/docs/readmes/features/skills/self-audit.md +176 -0
  68. package/docs/readmes/features/skills/tdd.md +137 -0
  69. package/docs/readmes/features/skills/using-skills.md +92 -0
  70. package/docs/readmes/features/skills/verification.md +120 -0
  71. package/docs/readmes/features/skills/writing-plans.md +104 -0
  72. package/docs/readmes/features/tooling/README.md +320 -0
  73. package/docs/readmes/features/workflows/README.md +186 -0
  74. package/docs/readmes/features/workflows/author.md +181 -0
  75. package/docs/readmes/features/workflows/clarify.md +154 -0
  76. package/docs/readmes/features/workflows/design-review.md +171 -0
  77. package/docs/readmes/features/workflows/design.md +169 -0
  78. package/docs/readmes/features/workflows/discover.md +162 -0
  79. package/docs/readmes/features/workflows/execute.md +173 -0
  80. package/docs/readmes/features/workflows/learn.md +167 -0
  81. package/docs/readmes/features/workflows/plan-review.md +165 -0
  82. package/docs/readmes/features/workflows/plan.md +170 -0
  83. package/docs/readmes/features/workflows/prepare-next.md +167 -0
  84. package/docs/readmes/features/workflows/review.md +169 -0
  85. package/docs/readmes/features/workflows/run-audit.md +191 -0
  86. package/docs/readmes/features/workflows/spec-challenge.md +159 -0
  87. package/docs/readmes/features/workflows/specify.md +160 -0
  88. package/docs/readmes/features/workflows/verify.md +177 -0
  89. package/docs/readmes/packages/README.md +50 -0
  90. package/docs/readmes/packages/ajv.md +117 -0
  91. package/docs/readmes/packages/context-mode.md +118 -0
  92. package/docs/readmes/packages/gray-matter.md +116 -0
  93. package/docs/readmes/packages/node-test.md +137 -0
  94. package/docs/readmes/packages/yaml.md +112 -0
  95. package/docs/reference/configuration-reference.md +159 -0
  96. package/docs/reference/expertise-index.md +52 -0
  97. package/docs/reference/git-flow.md +43 -0
  98. package/docs/reference/hooks.md +87 -0
  99. package/docs/reference/host-exports.md +50 -0
  100. package/docs/reference/launch-checklist.md +172 -0
  101. package/docs/reference/marketplace-listings.md +76 -0
  102. package/docs/reference/release-process.md +34 -0
  103. package/docs/reference/roles-reference.md +77 -0
  104. package/docs/reference/skills.md +33 -0
  105. package/docs/reference/templates.md +29 -0
  106. package/docs/reference/tooling-cli.md +94 -0
  107. package/docs/truth-claims.yaml +222 -0
  108. package/expertise/PROGRESS.md +63 -0
  109. package/expertise/README.md +18 -0
  110. package/expertise/antipatterns/PROGRESS.md +56 -0
  111. package/expertise/antipatterns/backend/api-design-antipatterns.md +1271 -0
  112. package/expertise/antipatterns/backend/auth-antipatterns.md +1195 -0
  113. package/expertise/antipatterns/backend/caching-antipatterns.md +622 -0
  114. package/expertise/antipatterns/backend/database-antipatterns.md +1038 -0
  115. package/expertise/antipatterns/backend/index.md +24 -0
  116. package/expertise/antipatterns/backend/microservices-antipatterns.md +850 -0
  117. package/expertise/antipatterns/code/architecture-antipatterns.md +919 -0
  118. package/expertise/antipatterns/code/async-antipatterns.md +622 -0
  119. package/expertise/antipatterns/code/code-smells.md +1186 -0
  120. package/expertise/antipatterns/code/dependency-antipatterns.md +1209 -0
  121. package/expertise/antipatterns/code/error-handling-antipatterns.md +1360 -0
  122. package/expertise/antipatterns/code/index.md +27 -0
  123. package/expertise/antipatterns/code/naming-and-abstraction.md +1118 -0
  124. package/expertise/antipatterns/code/state-management-antipatterns.md +1076 -0
  125. package/expertise/antipatterns/code/testing-antipatterns.md +1053 -0
  126. package/expertise/antipatterns/design/accessibility-antipatterns.md +1136 -0
  127. package/expertise/antipatterns/design/dark-patterns.md +1121 -0
  128. package/expertise/antipatterns/design/index.md +22 -0
  129. package/expertise/antipatterns/design/ui-antipatterns.md +1202 -0
  130. package/expertise/antipatterns/design/ux-antipatterns.md +680 -0
  131. package/expertise/antipatterns/frontend/css-layout-antipatterns.md +691 -0
  132. package/expertise/antipatterns/frontend/flutter-antipatterns.md +1827 -0
  133. package/expertise/antipatterns/frontend/index.md +23 -0
  134. package/expertise/antipatterns/frontend/mobile-antipatterns.md +573 -0
  135. package/expertise/antipatterns/frontend/react-antipatterns.md +1128 -0
  136. package/expertise/antipatterns/frontend/spa-antipatterns.md +1235 -0
  137. package/expertise/antipatterns/index.md +31 -0
  138. package/expertise/antipatterns/performance/index.md +20 -0
  139. package/expertise/antipatterns/performance/performance-antipatterns.md +1013 -0
  140. package/expertise/antipatterns/performance/premature-optimization.md +623 -0
  141. package/expertise/antipatterns/performance/scaling-antipatterns.md +785 -0
  142. package/expertise/antipatterns/process/ai-coding-antipatterns.md +853 -0
  143. package/expertise/antipatterns/process/code-review-antipatterns.md +656 -0
  144. package/expertise/antipatterns/process/deployment-antipatterns.md +920 -0
  145. package/expertise/antipatterns/process/index.md +23 -0
  146. package/expertise/antipatterns/process/technical-debt-antipatterns.md +647 -0
  147. package/expertise/antipatterns/security/index.md +20 -0
  148. package/expertise/antipatterns/security/secrets-antipatterns.md +849 -0
  149. package/expertise/antipatterns/security/security-theater.md +843 -0
  150. package/expertise/antipatterns/security/vulnerability-patterns.md +801 -0
  151. package/expertise/architecture/PROGRESS.md +70 -0
  152. package/expertise/architecture/data/caching-architecture.md +671 -0
  153. package/expertise/architecture/data/data-consistency.md +574 -0
  154. package/expertise/architecture/data/data-modeling.md +536 -0
  155. package/expertise/architecture/data/event-streams-and-queues.md +634 -0
  156. package/expertise/architecture/data/index.md +25 -0
  157. package/expertise/architecture/data/search-architecture.md +663 -0
  158. package/expertise/architecture/data/sql-vs-nosql.md +708 -0
  159. package/expertise/architecture/decisions/architecture-decision-records.md +640 -0
  160. package/expertise/architecture/decisions/build-vs-buy.md +616 -0
  161. package/expertise/architecture/decisions/index.md +23 -0
  162. package/expertise/architecture/decisions/monolith-to-microservices.md +790 -0
  163. package/expertise/architecture/decisions/technology-selection.md +616 -0
  164. package/expertise/architecture/distributed/cap-theorem-and-tradeoffs.md +800 -0
  165. package/expertise/architecture/distributed/circuit-breaker-bulkhead.md +741 -0
  166. package/expertise/architecture/distributed/consensus-and-coordination.md +796 -0
  167. package/expertise/architecture/distributed/distributed-systems-fundamentals.md +564 -0
  168. package/expertise/architecture/distributed/idempotency-and-retry.md +796 -0
  169. package/expertise/architecture/distributed/index.md +25 -0
  170. package/expertise/architecture/distributed/saga-pattern.md +797 -0
  171. package/expertise/architecture/foundations/architectural-thinking.md +460 -0
  172. package/expertise/architecture/foundations/coupling-and-cohesion.md +770 -0
  173. package/expertise/architecture/foundations/design-principles-solid.md +649 -0
  174. package/expertise/architecture/foundations/domain-driven-design.md +719 -0
  175. package/expertise/architecture/foundations/index.md +25 -0
  176. package/expertise/architecture/foundations/separation-of-concerns.md +472 -0
  177. package/expertise/architecture/foundations/twelve-factor-app.md +797 -0
  178. package/expertise/architecture/index.md +34 -0
  179. package/expertise/architecture/integration/api-design-graphql.md +638 -0
  180. package/expertise/architecture/integration/api-design-grpc.md +804 -0
  181. package/expertise/architecture/integration/api-design-rest.md +892 -0
  182. package/expertise/architecture/integration/index.md +25 -0
  183. package/expertise/architecture/integration/third-party-integration.md +795 -0
  184. package/expertise/architecture/integration/webhooks-and-callbacks.md +1152 -0
  185. package/expertise/architecture/integration/websockets-realtime.md +791 -0
  186. package/expertise/architecture/mobile-architecture/index.md +22 -0
  187. package/expertise/architecture/mobile-architecture/mobile-app-architecture.md +780 -0
  188. package/expertise/architecture/mobile-architecture/mobile-backend-for-frontend.md +670 -0
  189. package/expertise/architecture/mobile-architecture/offline-first.md +719 -0
  190. package/expertise/architecture/mobile-architecture/push-and-sync.md +782 -0
  191. package/expertise/architecture/patterns/cqrs-event-sourcing.md +717 -0
  192. package/expertise/architecture/patterns/event-driven.md +797 -0
  193. package/expertise/architecture/patterns/hexagonal-clean-architecture.md +870 -0
  194. package/expertise/architecture/patterns/index.md +27 -0
  195. package/expertise/architecture/patterns/layered-architecture.md +736 -0
  196. package/expertise/architecture/patterns/microservices.md +753 -0
  197. package/expertise/architecture/patterns/modular-monolith.md +692 -0
  198. package/expertise/architecture/patterns/monolith.md +626 -0
  199. package/expertise/architecture/patterns/plugin-architecture.md +735 -0
  200. package/expertise/architecture/patterns/serverless.md +780 -0
  201. package/expertise/architecture/scaling/database-scaling.md +615 -0
  202. package/expertise/architecture/scaling/feature-flags-and-rollouts.md +757 -0
  203. package/expertise/architecture/scaling/horizontal-vs-vertical.md +606 -0
  204. package/expertise/architecture/scaling/index.md +24 -0
  205. package/expertise/architecture/scaling/multi-tenancy.md +800 -0
  206. package/expertise/architecture/scaling/stateless-design.md +787 -0
  207. package/expertise/backend/embedded-firmware.md +625 -0
  208. package/expertise/backend/go.md +853 -0
  209. package/expertise/backend/index.md +24 -0
  210. package/expertise/backend/java-spring.md +448 -0
  211. package/expertise/backend/node-typescript.md +625 -0
  212. package/expertise/backend/python-fastapi.md +724 -0
  213. package/expertise/backend/rust.md +458 -0
  214. package/expertise/backend/solidity.md +711 -0
  215. package/expertise/composition-map.yaml +443 -0
  216. package/expertise/content/foundations/content-modeling.md +395 -0
  217. package/expertise/content/foundations/editorial-standards.md +449 -0
  218. package/expertise/content/foundations/index.md +24 -0
  219. package/expertise/content/foundations/microcopy.md +455 -0
  220. package/expertise/content/foundations/terminology-governance.md +509 -0
  221. package/expertise/content/index.md +34 -0
  222. package/expertise/content/patterns/accessibility-copy.md +518 -0
  223. package/expertise/content/patterns/index.md +24 -0
  224. package/expertise/content/patterns/notification-content.md +433 -0
  225. package/expertise/content/patterns/sample-content.md +486 -0
  226. package/expertise/content/patterns/state-copy.md +439 -0
  227. package/expertise/design/PROGRESS.md +58 -0
  228. package/expertise/design/disciplines/dark-mode-theming.md +577 -0
  229. package/expertise/design/disciplines/design-systems.md +595 -0
  230. package/expertise/design/disciplines/index.md +25 -0
  231. package/expertise/design/disciplines/information-architecture.md +800 -0
  232. package/expertise/design/disciplines/interaction-design.md +788 -0
  233. package/expertise/design/disciplines/responsive-design.md +552 -0
  234. package/expertise/design/disciplines/usability-testing.md +516 -0
  235. package/expertise/design/disciplines/user-research.md +792 -0
  236. package/expertise/design/foundations/accessibility-design.md +796 -0
  237. package/expertise/design/foundations/color-theory.md +797 -0
  238. package/expertise/design/foundations/iconography.md +795 -0
  239. package/expertise/design/foundations/index.md +26 -0
  240. package/expertise/design/foundations/motion-and-animation.md +653 -0
  241. package/expertise/design/foundations/rtl-design.md +585 -0
  242. package/expertise/design/foundations/spacing-and-layout.md +607 -0
  243. package/expertise/design/foundations/typography.md +800 -0
  244. package/expertise/design/foundations/visual-hierarchy.md +761 -0
  245. package/expertise/design/index.md +32 -0
  246. package/expertise/design/patterns/authentication-flows.md +474 -0
  247. package/expertise/design/patterns/content-consumption.md +789 -0
  248. package/expertise/design/patterns/data-display.md +618 -0
  249. package/expertise/design/patterns/e-commerce.md +1494 -0
  250. package/expertise/design/patterns/feedback-and-states.md +642 -0
  251. package/expertise/design/patterns/forms-and-input.md +819 -0
  252. package/expertise/design/patterns/gamification.md +801 -0
  253. package/expertise/design/patterns/index.md +31 -0
  254. package/expertise/design/patterns/microinteractions.md +449 -0
  255. package/expertise/design/patterns/navigation.md +800 -0
  256. package/expertise/design/patterns/notifications.md +705 -0
  257. package/expertise/design/patterns/onboarding.md +700 -0
  258. package/expertise/design/patterns/search-and-filter.md +601 -0
  259. package/expertise/design/patterns/settings-and-preferences.md +768 -0
  260. package/expertise/design/patterns/social-and-community.md +748 -0
  261. package/expertise/design/platforms/desktop-native.md +612 -0
  262. package/expertise/design/platforms/index.md +25 -0
  263. package/expertise/design/platforms/mobile-android.md +825 -0
  264. package/expertise/design/platforms/mobile-cross-platform.md +983 -0
  265. package/expertise/design/platforms/mobile-ios.md +699 -0
  266. package/expertise/design/platforms/tablet.md +794 -0
  267. package/expertise/design/platforms/web-dashboard.md +790 -0
  268. package/expertise/design/platforms/web-responsive.md +550 -0
  269. package/expertise/design/psychology/behavioral-nudges.md +449 -0
  270. package/expertise/design/psychology/cognitive-load.md +1191 -0
  271. package/expertise/design/psychology/error-psychology.md +778 -0
  272. package/expertise/design/psychology/index.md +22 -0
  273. package/expertise/design/psychology/persuasive-design.md +736 -0
  274. package/expertise/design/psychology/user-mental-models.md +623 -0
  275. package/expertise/design/tooling/open-pencil.md +266 -0
  276. package/expertise/frontend/angular.md +1073 -0
  277. package/expertise/frontend/desktop-electron.md +546 -0
  278. package/expertise/frontend/flutter.md +782 -0
  279. package/expertise/frontend/index.md +27 -0
  280. package/expertise/frontend/native-android.md +409 -0
  281. package/expertise/frontend/native-ios.md +490 -0
  282. package/expertise/frontend/react-native.md +1160 -0
  283. package/expertise/frontend/react.md +808 -0
  284. package/expertise/frontend/vue.md +1089 -0
  285. package/expertise/humanize/domain-rules-code.md +79 -0
  286. package/expertise/humanize/domain-rules-content.md +67 -0
  287. package/expertise/humanize/domain-rules-technical-docs.md +56 -0
  288. package/expertise/humanize/index.md +35 -0
  289. package/expertise/humanize/self-audit-checklist.md +87 -0
  290. package/expertise/humanize/sentence-patterns.md +218 -0
  291. package/expertise/humanize/vocabulary-blacklist.md +105 -0
  292. package/expertise/i18n/PROGRESS.md +65 -0
  293. package/expertise/i18n/advanced/accessibility-and-i18n.md +28 -0
  294. package/expertise/i18n/advanced/bidirectional-text-algorithm.md +38 -0
  295. package/expertise/i18n/advanced/complex-scripts.md +30 -0
  296. package/expertise/i18n/advanced/performance-and-i18n.md +27 -0
  297. package/expertise/i18n/advanced/testing-i18n.md +28 -0
  298. package/expertise/i18n/content/content-adaptation.md +23 -0
  299. package/expertise/i18n/content/locale-specific-formatting.md +23 -0
  300. package/expertise/i18n/content/machine-translation-integration.md +28 -0
  301. package/expertise/i18n/content/translation-management.md +29 -0
  302. package/expertise/i18n/foundations/date-time-calendars.md +67 -0
  303. package/expertise/i18n/foundations/i18n-architecture.md +272 -0
  304. package/expertise/i18n/foundations/locale-and-language-tags.md +79 -0
  305. package/expertise/i18n/foundations/numbers-currency-units.md +61 -0
  306. package/expertise/i18n/foundations/pluralization-and-gender.md +109 -0
  307. package/expertise/i18n/foundations/string-externalization.md +236 -0
  308. package/expertise/i18n/foundations/text-direction-bidi.md +241 -0
  309. package/expertise/i18n/foundations/unicode-and-encoding.md +86 -0
  310. package/expertise/i18n/index.md +38 -0
  311. package/expertise/i18n/platform/backend-i18n.md +31 -0
  312. package/expertise/i18n/platform/flutter-i18n.md +148 -0
  313. package/expertise/i18n/platform/native-android-i18n.md +36 -0
  314. package/expertise/i18n/platform/native-ios-i18n.md +36 -0
  315. package/expertise/i18n/platform/react-i18n.md +103 -0
  316. package/expertise/i18n/platform/web-css-i18n.md +81 -0
  317. package/expertise/i18n/rtl/arabic-specific.md +175 -0
  318. package/expertise/i18n/rtl/hebrew-specific.md +149 -0
  319. package/expertise/i18n/rtl/rtl-animations-and-transitions.md +111 -0
  320. package/expertise/i18n/rtl/rtl-forms-and-input.md +161 -0
  321. package/expertise/i18n/rtl/rtl-fundamentals.md +211 -0
  322. package/expertise/i18n/rtl/rtl-icons-and-images.md +181 -0
  323. package/expertise/i18n/rtl/rtl-layout-mirroring.md +252 -0
  324. package/expertise/i18n/rtl/rtl-navigation-and-gestures.md +107 -0
  325. package/expertise/i18n/rtl/rtl-testing-and-qa.md +147 -0
  326. package/expertise/i18n/rtl/rtl-typography.md +160 -0
  327. package/expertise/index.md +113 -0
  328. package/expertise/index.yaml +216 -0
  329. package/expertise/infrastructure/cloud-aws.md +597 -0
  330. package/expertise/infrastructure/cloud-gcp.md +599 -0
  331. package/expertise/infrastructure/cybersecurity.md +816 -0
  332. package/expertise/infrastructure/database-mongodb.md +447 -0
  333. package/expertise/infrastructure/database-postgres.md +400 -0
  334. package/expertise/infrastructure/devops-cicd.md +787 -0
  335. package/expertise/infrastructure/index.md +27 -0
  336. package/expertise/performance/PROGRESS.md +50 -0
  337. package/expertise/performance/backend/api-latency.md +1204 -0
  338. package/expertise/performance/backend/background-jobs.md +506 -0
  339. package/expertise/performance/backend/connection-pooling.md +1209 -0
  340. package/expertise/performance/backend/database-query-optimization.md +515 -0
  341. package/expertise/performance/backend/index.md +23 -0
  342. package/expertise/performance/backend/rate-limiting-and-throttling.md +971 -0
  343. package/expertise/performance/foundations/algorithmic-complexity.md +954 -0
  344. package/expertise/performance/foundations/caching-strategies.md +489 -0
  345. package/expertise/performance/foundations/concurrency-and-parallelism.md +847 -0
  346. package/expertise/performance/foundations/index.md +24 -0
  347. package/expertise/performance/foundations/measuring-and-profiling.md +440 -0
  348. package/expertise/performance/foundations/memory-management.md +964 -0
  349. package/expertise/performance/foundations/performance-budgets.md +1314 -0
  350. package/expertise/performance/index.md +31 -0
  351. package/expertise/performance/infrastructure/auto-scaling.md +1059 -0
  352. package/expertise/performance/infrastructure/cdn-and-edge.md +1081 -0
  353. package/expertise/performance/infrastructure/index.md +22 -0
  354. package/expertise/performance/infrastructure/load-balancing.md +1081 -0
  355. package/expertise/performance/infrastructure/observability.md +1079 -0
  356. package/expertise/performance/mobile/index.md +23 -0
  357. package/expertise/performance/mobile/mobile-animations.md +544 -0
  358. package/expertise/performance/mobile/mobile-memory-battery.md +416 -0
  359. package/expertise/performance/mobile/mobile-network.md +452 -0
  360. package/expertise/performance/mobile/mobile-rendering.md +599 -0
  361. package/expertise/performance/mobile/mobile-startup-time.md +505 -0
  362. package/expertise/performance/platform-specific/flutter-performance.md +647 -0
  363. package/expertise/performance/platform-specific/index.md +22 -0
  364. package/expertise/performance/platform-specific/node-performance.md +1307 -0
  365. package/expertise/performance/platform-specific/postgres-performance.md +1366 -0
  366. package/expertise/performance/platform-specific/react-performance.md +1403 -0
  367. package/expertise/performance/web/bundle-optimization.md +1239 -0
  368. package/expertise/performance/web/image-and-media.md +636 -0
  369. package/expertise/performance/web/index.md +24 -0
  370. package/expertise/performance/web/network-optimization.md +1133 -0
  371. package/expertise/performance/web/rendering-performance.md +1098 -0
  372. package/expertise/performance/web/ssr-and-hydration.md +918 -0
  373. package/expertise/performance/web/web-vitals.md +1374 -0
  374. package/expertise/quality/accessibility.md +985 -0
  375. package/expertise/quality/evidence-based-verification.md +499 -0
  376. package/expertise/quality/index.md +24 -0
  377. package/expertise/quality/ml-model-audit.md +614 -0
  378. package/expertise/quality/performance.md +600 -0
  379. package/expertise/quality/testing-api.md +891 -0
  380. package/expertise/quality/testing-mobile.md +496 -0
  381. package/expertise/quality/testing-web.md +849 -0
  382. package/expertise/security/PROGRESS.md +54 -0
  383. package/expertise/security/agentic-identity.md +540 -0
  384. package/expertise/security/compliance-frameworks.md +601 -0
  385. package/expertise/security/data/data-encryption.md +364 -0
  386. package/expertise/security/data/data-privacy-gdpr.md +692 -0
  387. package/expertise/security/data/database-security.md +1171 -0
  388. package/expertise/security/data/index.md +22 -0
  389. package/expertise/security/data/pii-handling.md +531 -0
  390. package/expertise/security/foundations/authentication.md +1041 -0
  391. package/expertise/security/foundations/authorization.md +603 -0
  392. package/expertise/security/foundations/cryptography.md +1001 -0
  393. package/expertise/security/foundations/index.md +25 -0
  394. package/expertise/security/foundations/owasp-top-10.md +1354 -0
  395. package/expertise/security/foundations/secrets-management.md +1217 -0
  396. package/expertise/security/foundations/secure-sdlc.md +700 -0
  397. package/expertise/security/foundations/supply-chain-security.md +698 -0
  398. package/expertise/security/index.md +31 -0
  399. package/expertise/security/infrastructure/cloud-security-aws.md +1296 -0
  400. package/expertise/security/infrastructure/cloud-security-gcp.md +1376 -0
  401. package/expertise/security/infrastructure/container-security.md +721 -0
  402. package/expertise/security/infrastructure/incident-response.md +1295 -0
  403. package/expertise/security/infrastructure/index.md +24 -0
  404. package/expertise/security/infrastructure/logging-and-monitoring.md +1618 -0
  405. package/expertise/security/infrastructure/network-security.md +1337 -0
  406. package/expertise/security/mobile/index.md +23 -0
  407. package/expertise/security/mobile/mobile-android-security.md +1218 -0
  408. package/expertise/security/mobile/mobile-binary-protection.md +1229 -0
  409. package/expertise/security/mobile/mobile-data-storage.md +1265 -0
  410. package/expertise/security/mobile/mobile-ios-security.md +1401 -0
  411. package/expertise/security/mobile/mobile-network-security.md +1520 -0
  412. package/expertise/security/smart-contract-security.md +594 -0
  413. package/expertise/security/testing/index.md +22 -0
  414. package/expertise/security/testing/penetration-testing.md +1258 -0
  415. package/expertise/security/testing/security-code-review.md +1765 -0
  416. package/expertise/security/testing/threat-modeling.md +1074 -0
  417. package/expertise/security/testing/vulnerability-scanning.md +1062 -0
  418. package/expertise/security/web/api-security.md +586 -0
  419. package/expertise/security/web/cors-and-headers.md +433 -0
  420. package/expertise/security/web/csrf.md +562 -0
  421. package/expertise/security/web/file-upload.md +1477 -0
  422. package/expertise/security/web/index.md +25 -0
  423. package/expertise/security/web/injection.md +1375 -0
  424. package/expertise/security/web/session-management.md +1101 -0
  425. package/expertise/security/web/xss.md +1158 -0
  426. package/exports/README.md +17 -0
  427. package/exports/hosts/claude/.claude/agents/clarifier.md +42 -0
  428. package/exports/hosts/claude/.claude/agents/content-author.md +63 -0
  429. package/exports/hosts/claude/.claude/agents/designer.md +55 -0
  430. package/exports/hosts/claude/.claude/agents/executor.md +55 -0
  431. package/exports/hosts/claude/.claude/agents/learner.md +51 -0
  432. package/exports/hosts/claude/.claude/agents/planner.md +53 -0
  433. package/exports/hosts/claude/.claude/agents/researcher.md +43 -0
  434. package/exports/hosts/claude/.claude/agents/reviewer.md +54 -0
  435. package/exports/hosts/claude/.claude/agents/specifier.md +47 -0
  436. package/exports/hosts/claude/.claude/agents/verifier.md +71 -0
  437. package/exports/hosts/claude/.claude/commands/author.md +42 -0
  438. package/exports/hosts/claude/.claude/commands/clarify.md +38 -0
  439. package/exports/hosts/claude/.claude/commands/design-review.md +46 -0
  440. package/exports/hosts/claude/.claude/commands/design.md +44 -0
  441. package/exports/hosts/claude/.claude/commands/discover.md +37 -0
  442. package/exports/hosts/claude/.claude/commands/execute.md +48 -0
  443. package/exports/hosts/claude/.claude/commands/learn.md +38 -0
  444. package/exports/hosts/claude/.claude/commands/plan-review.md +42 -0
  445. package/exports/hosts/claude/.claude/commands/plan.md +39 -0
  446. package/exports/hosts/claude/.claude/commands/prepare-next.md +37 -0
  447. package/exports/hosts/claude/.claude/commands/review.md +40 -0
  448. package/exports/hosts/claude/.claude/commands/run-audit.md +41 -0
  449. package/exports/hosts/claude/.claude/commands/spec-challenge.md +41 -0
  450. package/exports/hosts/claude/.claude/commands/specify.md +38 -0
  451. package/exports/hosts/claude/.claude/commands/verify.md +37 -0
  452. package/exports/hosts/claude/.claude/settings.json +34 -0
  453. package/exports/hosts/claude/CLAUDE.md +19 -0
  454. package/exports/hosts/claude/export.manifest.json +38 -0
  455. package/exports/hosts/claude/host-package.json +67 -0
  456. package/exports/hosts/codex/AGENTS.md +19 -0
  457. package/exports/hosts/codex/export.manifest.json +38 -0
  458. package/exports/hosts/codex/host-package.json +41 -0
  459. package/exports/hosts/cursor/.cursor/hooks.json +16 -0
  460. package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +19 -0
  461. package/exports/hosts/cursor/export.manifest.json +38 -0
  462. package/exports/hosts/cursor/host-package.json +42 -0
  463. package/exports/hosts/gemini/GEMINI.md +19 -0
  464. package/exports/hosts/gemini/export.manifest.json +38 -0
  465. package/exports/hosts/gemini/host-package.json +41 -0
  466. package/hooks/README.md +18 -0
  467. package/hooks/definitions/loop_cap_guard.yaml +21 -0
  468. package/hooks/definitions/post_tool_capture.yaml +24 -0
  469. package/hooks/definitions/pre_compact_summary.yaml +19 -0
  470. package/hooks/definitions/pre_tool_capture_route.yaml +19 -0
  471. package/hooks/definitions/protected_path_write_guard.yaml +19 -0
  472. package/hooks/definitions/session_start.yaml +19 -0
  473. package/hooks/definitions/stop_handoff_harvest.yaml +20 -0
  474. package/hooks/loop-cap-guard +17 -0
  475. package/hooks/post-tool-lint +36 -0
  476. package/hooks/protected-path-write-guard +17 -0
  477. package/hooks/session-start +41 -0
  478. package/llms-full.txt +2355 -0
  479. package/llms.txt +43 -0
  480. package/package.json +79 -0
  481. package/roles/README.md +20 -0
  482. package/roles/clarifier.md +42 -0
  483. package/roles/content-author.md +63 -0
  484. package/roles/designer.md +55 -0
  485. package/roles/executor.md +55 -0
  486. package/roles/learner.md +51 -0
  487. package/roles/planner.md +53 -0
  488. package/roles/researcher.md +43 -0
  489. package/roles/reviewer.md +54 -0
  490. package/roles/specifier.md +47 -0
  491. package/roles/verifier.md +71 -0
  492. package/schemas/README.md +24 -0
  493. package/schemas/accepted-learning.schema.json +20 -0
  494. package/schemas/author-artifact.schema.json +156 -0
  495. package/schemas/clarification.schema.json +19 -0
  496. package/schemas/design-artifact.schema.json +80 -0
  497. package/schemas/docs-claim.schema.json +18 -0
  498. package/schemas/export-manifest.schema.json +20 -0
  499. package/schemas/hook.schema.json +67 -0
  500. package/schemas/host-export-package.schema.json +18 -0
  501. package/schemas/implementation-plan.schema.json +19 -0
  502. package/schemas/proposed-learning.schema.json +19 -0
  503. package/schemas/research.schema.json +18 -0
  504. package/schemas/review.schema.json +29 -0
  505. package/schemas/run-manifest.schema.json +18 -0
  506. package/schemas/spec-challenge.schema.json +18 -0
  507. package/schemas/spec.schema.json +20 -0
  508. package/schemas/usage.schema.json +102 -0
  509. package/schemas/verification-proof.schema.json +29 -0
  510. package/schemas/wazir-manifest.schema.json +173 -0
  511. package/skills/README.md +40 -0
  512. package/skills/brainstorming/SKILL.md +77 -0
  513. package/skills/debugging/SKILL.md +50 -0
  514. package/skills/design/SKILL.md +61 -0
  515. package/skills/dispatching-parallel-agents/SKILL.md +128 -0
  516. package/skills/executing-plans/SKILL.md +70 -0
  517. package/skills/finishing-a-development-branch/SKILL.md +169 -0
  518. package/skills/humanize/SKILL.md +123 -0
  519. package/skills/init-pipeline/SKILL.md +124 -0
  520. package/skills/prepare-next/SKILL.md +20 -0
  521. package/skills/receiving-code-review/SKILL.md +123 -0
  522. package/skills/requesting-code-review/SKILL.md +105 -0
  523. package/skills/requesting-code-review/code-reviewer.md +108 -0
  524. package/skills/run-audit/SKILL.md +197 -0
  525. package/skills/scan-project/SKILL.md +41 -0
  526. package/skills/self-audit/SKILL.md +153 -0
  527. package/skills/subagent-driven-development/SKILL.md +154 -0
  528. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
  529. package/skills/subagent-driven-development/implementer-prompt.md +102 -0
  530. package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
  531. package/skills/tdd/SKILL.md +23 -0
  532. package/skills/using-git-worktrees/SKILL.md +163 -0
  533. package/skills/using-skills/SKILL.md +95 -0
  534. package/skills/verification/SKILL.md +22 -0
  535. package/skills/wazir/SKILL.md +463 -0
  536. package/skills/writing-plans/SKILL.md +30 -0
  537. package/skills/writing-skills/SKILL.md +157 -0
  538. package/skills/writing-skills/anthropic-best-practices.md +122 -0
  539. package/skills/writing-skills/persuasion-principles.md +50 -0
  540. package/templates/README.md +20 -0
  541. package/templates/artifacts/README.md +10 -0
  542. package/templates/artifacts/accepted-learning.md +19 -0
  543. package/templates/artifacts/accepted-learning.template.json +12 -0
  544. package/templates/artifacts/author.md +74 -0
  545. package/templates/artifacts/author.template.json +19 -0
  546. package/templates/artifacts/clarification.md +21 -0
  547. package/templates/artifacts/clarification.template.json +12 -0
  548. package/templates/artifacts/execute-notes.md +19 -0
  549. package/templates/artifacts/implementation-plan.md +21 -0
  550. package/templates/artifacts/implementation-plan.template.json +11 -0
  551. package/templates/artifacts/learning-proposal.md +19 -0
  552. package/templates/artifacts/next-run-handoff.md +21 -0
  553. package/templates/artifacts/plan-review.md +19 -0
  554. package/templates/artifacts/proposed-learning.template.json +12 -0
  555. package/templates/artifacts/research.md +21 -0
  556. package/templates/artifacts/research.template.json +12 -0
  557. package/templates/artifacts/review-findings.md +19 -0
  558. package/templates/artifacts/review.template.json +11 -0
  559. package/templates/artifacts/run-manifest.template.json +8 -0
  560. package/templates/artifacts/spec-challenge.md +19 -0
  561. package/templates/artifacts/spec-challenge.template.json +11 -0
  562. package/templates/artifacts/spec.md +21 -0
  563. package/templates/artifacts/spec.template.json +12 -0
  564. package/templates/artifacts/verification-proof.md +19 -0
  565. package/templates/artifacts/verification-proof.template.json +11 -0
  566. package/templates/examples/accepted-learning.example.json +14 -0
  567. package/templates/examples/author.example.json +152 -0
  568. package/templates/examples/clarification.example.json +15 -0
  569. package/templates/examples/docs-claim.example.json +8 -0
  570. package/templates/examples/export-manifest.example.json +7 -0
  571. package/templates/examples/host-export-package.example.json +11 -0
  572. package/templates/examples/implementation-plan.example.json +17 -0
  573. package/templates/examples/proposed-learning.example.json +13 -0
  574. package/templates/examples/research.example.json +15 -0
  575. package/templates/examples/research.example.md +6 -0
  576. package/templates/examples/review.example.json +17 -0
  577. package/templates/examples/run-manifest.example.json +9 -0
  578. package/templates/examples/spec-challenge.example.json +14 -0
  579. package/templates/examples/spec.example.json +21 -0
  580. package/templates/examples/verification-proof.example.json +21 -0
  581. package/templates/examples/wazir-manifest.example.yaml +65 -0
  582. package/templates/task-definition-schema.md +99 -0
  583. package/tooling/README.md +20 -0
  584. package/tooling/src/adapters/context-mode.js +50 -0
  585. package/tooling/src/capture/command.js +376 -0
  586. package/tooling/src/capture/store.js +99 -0
  587. package/tooling/src/capture/usage.js +270 -0
  588. package/tooling/src/checks/branches.js +50 -0
  589. package/tooling/src/checks/brand-truth.js +110 -0
  590. package/tooling/src/checks/changelog.js +231 -0
  591. package/tooling/src/checks/command-registry.js +36 -0
  592. package/tooling/src/checks/commits.js +102 -0
  593. package/tooling/src/checks/docs-drift.js +103 -0
  594. package/tooling/src/checks/docs-truth.js +201 -0
  595. package/tooling/src/checks/runtime-surface.js +156 -0
  596. package/tooling/src/cli.js +116 -0
  597. package/tooling/src/command-options.js +56 -0
  598. package/tooling/src/commands/validate.js +320 -0
  599. package/tooling/src/doctor/command.js +91 -0
  600. package/tooling/src/export/command.js +77 -0
  601. package/tooling/src/export/compiler.js +498 -0
  602. package/tooling/src/guards/loop-cap-guard.js +52 -0
  603. package/tooling/src/guards/protected-path-write-guard.js +67 -0
  604. package/tooling/src/index/command.js +152 -0
  605. package/tooling/src/index/storage.js +1061 -0
  606. package/tooling/src/index/summarizers.js +261 -0
  607. package/tooling/src/loaders.js +18 -0
  608. package/tooling/src/project-root.js +22 -0
  609. package/tooling/src/recall/command.js +225 -0
  610. package/tooling/src/schema-validator.js +30 -0
  611. package/tooling/src/state-root.js +40 -0
  612. package/tooling/src/status/command.js +71 -0
  613. package/wazir.manifest.yaml +135 -0
  614. package/workflows/README.md +19 -0
  615. package/workflows/author.md +42 -0
  616. package/workflows/clarify.md +38 -0
  617. package/workflows/design-review.md +46 -0
  618. package/workflows/design.md +44 -0
  619. package/workflows/discover.md +37 -0
  620. package/workflows/execute.md +48 -0
  621. package/workflows/learn.md +38 -0
  622. package/workflows/plan-review.md +42 -0
  623. package/workflows/plan.md +39 -0
  624. package/workflows/prepare-next.md +37 -0
  625. package/workflows/review.md +40 -0
  626. package/workflows/run-audit.md +41 -0
  627. package/workflows/spec-challenge.md +41 -0
  628. package/workflows/specify.md +38 -0
  629. package/workflows/verify.md +37 -0
@@ -0,0 +1,615 @@
1
+ # Database Scaling — Architecture Expertise Module
2
+
3
+ > Database scaling is typically the hardest scaling challenge because databases are stateful. The progression: optimize queries → add indexes → vertical scaling → read replicas → caching → partitioning → sharding. Most applications never need to go past read replicas. Premature sharding is one of the most expensive architectural mistakes.
4
+
5
+ > **Category:** Scaling
6
+ > **Complexity:** Complex
7
+ > **Applies when:** Database becoming a performance bottleneck due to query volume, data size, or write throughput
8
+
9
+ ---
10
+
11
+ ## What This Is
12
+
13
+ Database scaling increases a system's capacity to handle growing workloads — more QPS, larger datasets, higher write throughput. Unlike stateless app servers, databases hold persistent state, making every scaling step fundamentally harder.
14
+
15
+ ### The Scaling Ladder
16
+
17
+ Each step is roughly **10x harder** than the previous one in complexity, operational burden, and risk.
18
+
19
+ ```
20
+ Level 0: Optimize Queries — Cost: Hours | Risk: None | Impact: 2-100x
21
+ Level 1: Add/Improve Indexes — Cost: Hours | Risk: Low | Impact: 10-1000x
22
+ Level 2: Vertical Scaling — Cost: Minutes | Risk: Low | Impact: 2-8x
23
+ Level 3: Connection Pooling — Cost: Days | Risk: Low | Impact: 2-10x
24
+ Level 4: Read Replicas — Cost: Days | Risk: Medium | Impact: 2-50x
25
+ Level 5: Caching Layer — Cost: Weeks | Risk: Medium | Impact: 10-100x
26
+ Level 6: Table Partitioning — Cost: Weeks | Risk: Medium | Impact: 2-10x
27
+ Level 7: Vertical Partitioning — Cost: Months | Risk: High | Impact: 2-10x
28
+ Level 8: Horizontal Sharding — Cost: Months+ | Risk: Very High| Impact: 10-1000x
29
+ ```
30
+
31
+ **Critical insight:** Levels 0-3 are free or nearly free. Level 4 handles 80%+ of scaling needs. Level 5 handles another 15%. Only ~5% of applications ever genuinely need Levels 6-8. OpenAI serves 800M ChatGPT users with a single PostgreSQL primary and ~50 read replicas — no sharding.
32
+
33
+ ---
34
+
35
+ ## When to Use Each Level
36
+
37
+ ### Level 0-1: Query and Index Optimization — Always First
38
+
39
+ **Signals:** Slow queries >100ms, sequential scans on tables >10K rows, N+1 patterns, missing composite indexes.
40
+ **Tools:** `EXPLAIN ANALYZE`, `pg_stat_statements`, `auto_explain`.
41
+ **Outcome:** 10-1000x improvement on specific queries. Often eliminates all other scaling needs.
42
+
43
+ ### Level 2: Vertical Scaling — Bigger Hardware
44
+
45
+ **Signals:** CPU >70% sustained after query optimization, instance is not the largest available.
46
+ **Ceiling:** AWS offers 64 vCPUs / 512GB RAM (r6g.16xlarge). Azure offers 128 vCPUs / 4TB RAM. Most databases never exhaust these.
47
+
48
+ ### Level 3: Connection Pooling — Reduce Overhead
49
+
50
+ **Signals:** `max_connections` errors, hundreds of idle connections (~10MB each in PostgreSQL), serverless connection storms.
51
+ **Rule:** Any app with >100 RPS should use PgBouncer (transaction mode) or ProxySQL.
52
+
53
+ ### Level 4: Read Replicas — Scale Reads (80% of use cases)
54
+
55
+ **Signals:** >80% read workload, single primary CPU saturated by SELECTs, can tolerate <100ms replication lag.
56
+ **Scale:** 2-5 replicas (most apps), 5-15 (high-traffic SaaS), 15-50 (exceptional — OpenAI).
57
+ **Requirement:** Application must handle eventual consistency and read-your-writes routing.
58
+
59
+ ### Level 5: Caching Layer — Absorb Hot Reads
60
+
61
+ **Signals:** Hot data <10% of total, same queries repeated thousands of times/min, data changes infrequently.
62
+ **Patterns:** Cache-aside (lazy load), write-through, write-behind. Redis preferred over Memcached.
63
+ **Critical:** Implement thundering herd protection (cache locks). OpenAI uses single-flight cache locking.
64
+
65
+ ### Level 6: Table Partitioning — Split Large Tables
66
+
67
+ **Signals:** Tables >100M rows, VACUUM/REINDEX taking hours, queries naturally filter by partition key (date, tenant).
68
+ **Strategies:** Range (time-series), List (categorical), Hash (even distribution).
69
+
70
+ ### Level 7: Vertical Partitioning — Split by Domain
71
+
72
+ **Signals:** Independent feature domains with different scaling needs, no cross-table JOINs needed between groups.
73
+ **Example:** Figma went from 1 to 12 PostgreSQL databases by moving table groups (Files, Organizations) to dedicated servers.
74
+
75
+ ### Level 8: Horizontal Sharding — Split Rows Across Servers
76
+
77
+ **ALL must be true:** Vertical scaling exhausted, read replicas insufficient (write-bound), connection pooling and caching in place, single table >1-5TB and growing, writes >50-200K/sec sustained.
78
+ **Reality:** Most PostgreSQL instances handle 1TB+ with proper indexing. You almost certainly do not need sharding.
79
+
80
+ ---
81
+
82
+ ## When NOT to Shard
83
+
84
+ **Premature sharding is one of the most catastrophic architectural mistakes.** It is essentially irreversible without a multi-month migration.
85
+
86
+ ### What Sharding Destroys
87
+
88
+ | Operation | Before Sharding | After Sharding |
89
+ |-----------|-----------------|----------------|
90
+ | Simple query | Single-node lookup | Route to correct shard |
91
+ | Aggregate query | `SELECT COUNT(*)` | Query ALL shards, merge results |
92
+ | JOIN | Standard SQL JOIN | Impossible across shard keys, or scatter-gather |
93
+ | Transaction | `BEGIN; ... COMMIT;` | Distributed 2PC across shards |
94
+ | Schema migration | Single `ALTER TABLE` | Execute on EVERY shard, coordinate rollback |
95
+ | Unique constraint | Database enforces | Application must enforce globally |
96
+ | Foreign keys | Database enforces | Cannot enforce across shards |
97
+
98
+ ### Real-World Sharding Disasters
99
+
100
+ **The $2.9M Rollback:** A company spent $2.9M (implementation + rollback), wasted 30 months, lost 5 engineers. The database had 100M records / 2TB with 8% CPU and 34% memory usage at 400 QPS. Sharding was never needed — proper indexing would have sufficed.
101
+
102
+ **Foursquare Shard Imbalance (2010):** Two shards, uneven user distribution — one grew to 67GB (exceeding RAM), the other 50GB. Performance collapsed, causing an 11-hour outage. Even "simple" sharding has unpredictable failure modes.
103
+
104
+ **Wrong Shard Key:** E-commerce team sharded by `order_id` (even distribution), but every query was "show all orders for customer X" — scatter-gather across ALL shards. Resharding by `customer_id` required migrating billions of rows over 3 weeks.
105
+
106
+ **Gaming Hot Shard:** Sharded by `game_id` — one viral game got 80% of traffic on one shard while 15 others sat idle.
107
+
108
+ ### Do NOT Shard If
109
+
110
+ 1. Database under 500GB (proper indexing + vertical scaling handles it)
111
+ 2. Under 50K QPS (connection pooling + read replicas suffice)
112
+ 3. Under 10K writes/sec (single PostgreSQL primary handles this easily)
113
+ 4. Fewer than 10 engineers (sharding requires dedicated DB engineering capacity)
114
+ 5. Many cross-entity relationships (sharding destroys JOINs)
115
+ 6. You haven't exhausted Levels 0-6 (every prior level is cheaper and less risky)
116
+
117
+ ---
118
+
119
+ ## How It Works
120
+
121
+ ### Read Replicas: Replication and Routing
122
+
123
+ ```
124
+ ┌──────────────┐
125
+ │ Application │
126
+ └──────┬───────┘
127
+ ┌──────▼───────┐
128
+ │ Router/Proxy │
129
+ └──┬─────┬────┬┘
130
+ ┌─────▼──┐ ┌▼────▼──┐
131
+ │Primary │ │Replicas │ ← WAL streaming from primary
132
+ │(writes)│ │ (reads) │
133
+ └────────┘ └─────────┘
134
+ ```
135
+
136
+ **Replication lag patterns:**
137
+
138
+ 1. **Read-your-writes:** After a write, route the same session's reads to the primary for a configurable window (e.g., 5 seconds):
139
+ ```python
+ # Sketch: `query`, `session`, `primary`, and `replica_pool` are application-defined objects.
+ class DatabaseRouter:
+     def route(self, query, session):
+         if query.is_write():
+             return self.primary
+         if session.has_recent_write(window_seconds=5):
+             return self.primary  # read-your-writes consistency
+         return self.replica_pool.next()  # round-robin replicas
+ ```
148
+ 2. **Monotonic reads:** Pin a user session to one replica to avoid reading from a more-lagged replica after a less-lagged one.
149
+ 3. **Causal consistency:** Track logical timestamps; route reads to any replica caught up past the last write's timestamp.
150
+
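The three lag patterns above can be combined in one routing decision. The following is a minimal sketch, not a production router: `Replica.replayed_lsn`, `Session.last_write_lsn`, and the `"primary"` label are hypothetical names, and positions are modeled as plain integers rather than real WAL locations.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Replica:
    name: str
    replayed_lsn: int  # hypothetical: last log position this replica has applied


@dataclass
class Session:
    last_write_lsn: int = 0       # position reported by the session's last write
    pinned: Optional[str] = None  # replica currently used for monotonic reads


@dataclass
class CausalRouter:
    replicas: List[Replica] = field(default_factory=list)

    def route_read(self, session: Session) -> str:
        """Return the name of the node that should serve this read."""
        # Causal consistency: only replicas that have replayed the session's write qualify.
        caught_up = [r for r in self.replicas if r.replayed_lsn >= session.last_write_lsn]
        if not caught_up:
            return "primary"  # read-your-writes fallback: nobody has the write yet
        # Monotonic reads: stay on the pinned replica while it is still caught up.
        for replica in caught_up:
            if replica.name == session.pinned:
                return replica.name
        # Otherwise pin to the most advanced caught-up replica.
        choice = max(caught_up, key=lambda r: r.replayed_lsn)
        session.pinned = choice.name
        return choice.name
```

Because replicas only move forward, switching away from a pinned replica that has fallen behind the session's last write always lands on a replica that is further ahead, so reads in this sketch never go backwards in time.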
151
+ **WAL distribution at scale (OpenAI):** At ~50 replicas, the primary cannot stream WAL to all — network bandwidth and CPU pressure cause unstable replica lag. Solution: intermediate "relay" replicas form a tree topology (primary -> relay -> leaf), enabling 100+ replicas without overwhelming the primary.
152
+
153
+ ### Connection Pooling: PgBouncer
154
+
155
+ | Mode | Behavior | Use Case |
156
+ |------|----------|----------|
157
+ | Session | Client owns connection for entire session | Legacy apps, prepared statements |
158
+ | Transaction | Connection returned after each transaction | **Most production workloads** |
159
+ | Statement | Connection returned after each statement | Simple read-only workloads |
160
+
161
+ Pool size formula: `(available_RAM / 20MB) / num_databases`, capped at 100-200.
162
+
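As a worked example of the formula above, here is a small sketch. The function name, the integer rounding, and the choice of 200 as the default cap (the upper end of the stated 100-200 range) are assumptions made for illustration.

```python
def pool_size(available_ram_mb: int, num_databases: int,
              per_conn_mb: int = 20, cap: int = 200) -> int:
    """Per-database pool size: (available_RAM / 20MB) / num_databases, capped."""
    if num_databases < 1:
        raise ValueError("num_databases must be at least 1")
    raw = (available_ram_mb // per_conn_mb) // num_databases
    # Never return less than one connection, never more than the cap.
    return max(1, min(raw, cap))
```

With 8 GB available and 4 databases this gives 102 connections per database; with 64 GB and 2 databases the raw value is far above the cap, so the cap applies.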
163
+ ### Partitioning: Range, Hash, List
164
+
165
+ ```sql
166
+ -- Range partitioning by date (most common)
167
+ CREATE TABLE events (
168
+ id BIGINT GENERATED ALWAYS AS IDENTITY,
169
+ created_at TIMESTAMPTZ NOT NULL,
170
+ payload JSONB
171
+ ) PARTITION BY RANGE (created_at);
172
+
173
+ CREATE TABLE events_2025_01 PARTITION OF events
174
+ FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
175
+ ```
176
+
177
+ ```sql
178
+ -- Hash partitioning for even distribution
179
+ CREATE TABLE orders (
180
+ id BIGINT GENERATED ALWAYS AS IDENTITY,
181
+ customer_id BIGINT NOT NULL,
182
+ total NUMERIC(10,2)
183
+ ) PARTITION BY HASH (customer_id);
184
+
185
+ CREATE TABLE orders_p0 PARTITION OF orders FOR VALUES WITH (MODULUS 8, REMAINDER 0);
186
+ CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (MODULUS 8, REMAINDER 1);
187
+ -- ... through REMAINDER 7
188
+ ```
189
+
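Writing one `CREATE TABLE` per remainder by hand is error-prone, so the statements elided above can be generated. This is a sketch only: the helper name is hypothetical, and it assumes the `<parent>_p<remainder>` naming used in the example.

```python
from typing import List


def hash_partition_ddl(parent: str, modulus: int) -> List[str]:
    """Build one CREATE TABLE statement per hash remainder (0 .. modulus-1)."""
    return [
        f"CREATE TABLE {parent}_p{remainder} PARTITION OF {parent} "
        f"FOR VALUES WITH (MODULUS {modulus}, REMAINDER {remainder});"
        for remainder in range(modulus)
    ]
```

Calling `hash_partition_ddl("orders", 8)` yields the eight statements for `orders_p0` through `orders_p7`; the table name is interpolated directly, so it should come from trusted configuration, not user input.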
190
+ Partition pruning: `WHERE created_at >= '2025-01-15' AND created_at < '2025-01-20'` scans only the January partition, not all data. Verify with `EXPLAIN ANALYZE`.
191
+
192
+ ### Horizontal Sharding: Shard Key Selection
193
+
194
+ The **single most consequential decision** in sharding. Criteria: (1) High cardinality — many distinct values. (2) Even distribution — no power-law hotspots. (3) Query alignment — most queries filter by shard key. (4) Growth stability — stays even as data grows.
195
+
196
+ **Instagram's ID design:** 41 bits timestamp + 13 bits shard ID (8192 logical shards) + 10 bits sequence. Logical shards map to physical servers. Rebalancing moves logical shards — no row-level resharding.
197
+
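The 41 + 13 + 10 bit layout can be illustrated with plain bit shifts. This is a sketch of the layout only, not Instagram's actual implementation: the epoch value and all function names are assumptions.

```python
TIMESTAMP_BITS, SHARD_BITS, SEQ_BITS = 41, 13, 10
EPOCH_MS = 1_293_840_000_000  # assumption: custom epoch of 2011-01-01 UTC, in milliseconds


def make_id(now_ms: int, shard_id: int, seq: int) -> int:
    """Pack timestamp, logical shard ID, and sequence into one 64-bit integer."""
    ts = now_ms - EPOCH_MS
    if not 0 <= ts < (1 << TIMESTAMP_BITS):
        raise ValueError("timestamp out of range for 41 bits")
    if not 0 <= shard_id < (1 << SHARD_BITS):
        raise ValueError("shard_id out of range for 13 bits")
    if not 0 <= seq < (1 << SEQ_BITS):
        raise ValueError("seq out of range for 10 bits")
    return (ts << (SHARD_BITS + SEQ_BITS)) | (shard_id << SEQ_BITS) | seq


def shard_of(id_: int) -> int:
    """Recover the logical shard ID from a generated ID."""
    return (id_ >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)


def timestamp_ms_of(id_: int) -> int:
    """Recover the creation time (Unix milliseconds) from a generated ID."""
    return (id_ >> (SHARD_BITS + SEQ_BITS)) + EPOCH_MS
```

Because the timestamp occupies the high bits, IDs sort by creation time, and because the shard ID is embedded, any ID can be routed without a lookup beyond the logical-to-physical shard map.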
198
+ **Routing strategies:**
199
+
200
+ | Strategy | Pros | Cons |
201
+ |----------|------|------|
202
+ | Hash (modulo) | Simple, even distribution | Resharding moves ~all data |
203
+ | Consistent hashing | Minimal data movement on reshard | More complex |
204
+ | Range-based | Efficient range queries | Hot shard risk |
205
+ | Directory/lookup | Maximum flexibility | Lookup table is SPOF |
206
+
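The first two rows of the table can be checked empirically. The sketch below compares how many keys change shard when growing from 4 to 5 shards under modulo hashing and under a simple hash ring. The shard names, the SHA-256 based hash, and the 100 virtual nodes per shard are illustrative assumptions.

```python
import bisect
import hashlib
from typing import Callable, Iterable, List


def _hash(value: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is randomized per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def modulo_shard(key: str, n_shards: int) -> int:
    return _hash(key) % n_shards


class HashRing:
    """Consistent-hash ring with virtual nodes."""

    def __init__(self, shards: List[str], vnodes: int = 100):
        self._ring = sorted((_hash(f"{s}#{v}"), s) for s in shards for v in range(vnodes))
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around at the end.
        i = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[i][1]


def moved_fraction(keys: Iterable[str], before: Callable, after: Callable) -> float:
    keys = list(keys)
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)
```

With modulo hashing, most keys are expected to move when a shard is added; with the ring, only the keys claimed by the new shard move, and they all move to that new shard.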
207
+ ### Query Optimization: EXPLAIN ANALYZE
208
+
209
+ The single most important command for database performance:
210
+
211
+ ```sql
212
+ EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
213
+ SELECT u.name, COUNT(o.id) as order_count
214
+ FROM users u
215
+ JOIN orders o ON o.user_id = u.id
216
+ WHERE u.created_at > '2025-01-01'
217
+ GROUP BY u.name
218
+ ORDER BY order_count DESC LIMIT 10;
219
+ ```
220
+
221
+ **Red flags in output:**
222
+ - `Seq Scan` on large tables -> needs an index
223
+ - `Nested Loop` with large outer table -> check for a missing index on the inner side or stale statistics; a `Hash Join` is usually cheaper at this size
224
+ - `Rows Removed by Filter` >> actual rows returned -> index not selective enough
225
+ - `Buffers: shared read` >> `shared hit` -> working set exceeds `shared_buffers`
226
+ - `Sort Method: external merge` with a `Disk:` figure -> the sort spilled to disk; `work_mem` is too low for this query
227
+
228
+ ---
229
+
230
+ ## Trade-Offs Matrix
231
+
232
+ | Strategy | Complexity | Read Scale | Write Scale | Consistency | Reversibility |
233
+ |----------|-----------|------------|-------------|-------------|---------------|
234
+ | Query optimization | Very Low | High | Medium | Full | N/A |
235
+ | Indexing | Low | Very High | Slight negative | Full | Easy |
236
+ | Vertical scaling | Very Low | Medium | Medium | Full | Trivial |
237
+ | Connection pooling | Low | Medium | Medium | Full | Easy |
238
+ | Read replicas | Medium | Very High | None | Eventual | Moderate |
239
+ | Caching (Redis) | Medium | Very High | None | Eventual/TTL | Moderate |
240
+ | Table partitioning | Medium | Medium | Low | Full | Difficult |
241
+ | Vertical partitioning | High | Medium | Medium | Per-database | Very Difficult |
242
+ | Horizontal sharding | Very High | Very High | Very High | Per-shard | Nearly Impossible |
243
+
244
+ | Strategy | Team Size | Time to Implement | Data Model Impact |
245
+ |----------|-----------|-------------------|-------------------|
246
+ | Query/Index optimization | 1 engineer | Hours-Days | None |
247
+ | Vertical scaling | 1 SRE | Minutes | None |
248
+ | Connection pooling | 1-2 engineers | Days | None |
249
+ | Read replicas | 2-3 engineers | Days-Weeks | Minimal (read routing) |
250
+ | Caching | 2-3 engineers | Weeks | Moderate (invalidation logic) |
251
+ | Table partitioning | 2-3 engineers | Weeks | Moderate (partition key) |
252
+ | Vertical partitioning | 3-5 engineers | Months | High (no cross-DB JOINs) |
253
+ | Horizontal sharding | 5-10 engineers | Months-Years | Fundamental (shard key governs everything) |
254
+
255
+ ---
256
+
257
+ ## Evolution Path
258
+
259
+ ### Stage 1: Foundation (0-1K users, <100 RPS)
260
+ Single PostgreSQL instance. Set up `pg_stat_statements` and `auto_explain` from day one.
261
+ **Advance when:** P95 latency >200ms OR CPU >50% sustained.
262
+
263
+ ### Stage 2: Optimization (1K-100K users, 100-1K RPS)
264
+ Fix top 10 queries by total time. Add composite indexes. Eliminate N+1 queries. Add PgBouncer. Tune `shared_buffers`, `work_mem`, `effective_cache_size`.
265
+
266
+ **Key metrics targets:**
267
+ ```
268
+ Cache hit ratio: > 99% (if below, increase shared_buffers)
269
+ Connection count: < 80% of max_connections
270
+ CPU utilization: < 70% sustained
271
+ Disk I/O wait: < 10%
272
+ Query p99 latency: < 100ms
273
+ ```
274
+ **Advance when:** CPU >70% after optimization.
275
+
276
+ ### Stage 3: Read Scaling (100K-10M users, 1K-50K RPS)
277
+ Deploy 2-5 read replicas. Implement read/write routing with read-your-writes consistency. Monitor replication lag; alert at >1s.
278
+ **Advance when:** Replicas saturated OR write throughput is the bottleneck.
279
+
280
+ ### Stage 4: Caching (10M+ users, >10K RPS)
281
+ Redis cache-aside for hot entities. Cache invalidation on writes. Thundering herd protection via cache locks (OpenAI pattern: single-flight, one request populates cache, others wait).
282
+ **Advance when:** Single-table performance degrades despite caching.
283
+
284
+ ### Stage 5: Partitioning (tables >100M rows)
285
+ Identify tables with natural partition keys. Range by time is most common. Use `pg_partman` for automation. Verify partition pruning in `EXPLAIN` output.
286
+ **Advance when:** Aggregate load exceeds single server.
287
+
288
+ ### Stage 6: Vertical Partitioning
289
+ Move independent table groups to separate databases (Figma: 1 → 12 databases). Remove cross-domain JOINs first.
290
+ **Advance when:** Single-table write throughput exceeds vertical limits. (Rare.)
291
+
292
+ ### Stage 7: Horizontal Sharding
293
+ Select shard key. Choose middleware (Vitess/Citus/application-level). Use logical shards (Instagram: 8192 logical → few physical). Build dual-write migration with verification (Shopify). Plan for years, not months (Slack: 3 years).
294
+
295
+ ---
296
+
297
+ ## Failure Modes
298
+
299
+ ### 1. Replication Lag → Stale Reads
300
+ User creates a record, sees it missing. **Fix:** Read-your-writes routing; synchronous replication for critical paths; monitor lag, circuit-break to primary if >threshold.
301
+
302
+ ### 2. Connection Pool Exhaustion
303
+ "Too many connections" errors, total unavailability. **Fix:** Size pools correctly, set `statement_timeout` (30s), monitor at 80% utilization, use `SHOW POOLS` to diagnose.
304
+
305
+ ### 3. Shard Key Hot Spots
306
+ One shard overwhelmed while others idle (gaming company: 80% traffic on one shard). **Fix:** Hash-based sharding, split hot keys with salt, monitor per-shard imbalance.
307
+
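Salting a hot key can be sketched as follows. The `#` separator, the default of 8 salts, and both function names are assumptions; the point is only that writes pick one sub-key at random while reads must cover all of them.

```python
import random
from typing import List


def salted_write_key(hot_key: str, n_salts: int = 8) -> str:
    """Spread writes for one hot key across n_salts sub-keys (and thus shards)."""
    return f"{hot_key}#{random.randrange(n_salts)}"


def salted_read_keys(hot_key: str, n_salts: int = 8) -> List[str]:
    """Reads fan out over every sub-key; the caller merges the results."""
    return [f"{hot_key}#{i}" for i in range(n_salts)]
```

The trade-off is explicit here: write load on the hot key drops by roughly a factor of `n_salts`, while every read of that key becomes `n_salts` lookups, so salting is worth applying only to keys known to be hot.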
308
+ ### 4. Cross-Shard JOIN Impossibility
309
+ Features requiring multi-shard data become impossible. **Fix:** Denormalize joined data into each shard, maintain read-only aggregate store for analytics, application-level joins.
310
+
311
+ ### 5. Resharding Downtime
312
+ System unavailable during rebalancing. **Fix:** Use virtual/logical shards from day one (Instagram: 8192 logical shards), consistent hashing, online dual-write resharding.
313
+
314
+ ### 6. Cache Stampede (Thundering Herd)
315
+ DB load spikes when popular cache entries expire simultaneously. **Fix:** Cache locks / single-flight, jittered TTLs, background refresh before expiry.
316
+
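Jittered TTLs are the simplest of these fixes to add. A minimal sketch, assuming a symmetric jitter of plus or minus 10% around the base TTL (both the percentage and the function name are illustrative choices):

```python
import random


def jittered_ttl(base_seconds: int, jitter: float = 0.1) -> int:
    """Return the base TTL shifted by a random amount within +/- jitter."""
    spread = int(base_seconds * jitter)
    return base_seconds + random.randint(-spread, spread)
```

Passing `jittered_ttl(300)` instead of a fixed `300` when setting cache entries spreads expiry over roughly a minute, so entries written together no longer expire in the same instant.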
317
+ ### 7. Partition Explosion
318
+ Query planner slows with >10K partitions. **Fix:** Keep under 1000 (ideally <100), coarser granularity, drop old partitions.
319
+
320
+ ### 8. Split-Brain During Failover
321
+ Two nodes both accepting writes after failover. **Fix:** Fencing (STONITH), monitor timeline divergence, use managed services that handle failover correctly, test regularly.
322
+
323
+ ---
324
+
325
+ ## Technology Landscape
326
+
327
+ ### PostgreSQL Ecosystem
328
+ | Tool | Purpose |
329
+ |------|---------|
330
+ | **PgBouncer** | Connection pooling (standard for production) |
331
+ | **Citus** | Distributed PostgreSQL — sharding as extension |
332
+ | **pg_partman** | Automated partition management |
333
+ | **Patroni** | HA and automatic failover |
334
+
335
+ ### MySQL Ecosystem
336
+ | Tool | Purpose |
337
+ |------|---------|
338
+ | **ProxySQL** | Query routing + connection pooling |
339
+ | **Vitess** | Sharding middleware (Slack, Shopify, GitHub) |
340
+ | **Orchestrator** | Replication topology management |
341
+
342
+ ### Managed Cloud Databases
343
+ | Service | Provider | Key Feature |
344
+ |---------|----------|-------------|
345
+ | **Aurora** | AWS | 15 replicas, auto-scales to 128TB |
346
+ | **AlloyDB** | GCP | PostgreSQL-compatible; Google claims up to 100x faster analytical queries |
347
+ | **PlanetScale** | PlanetScale | Vitess-based serverless MySQL, zero-downtime DDL |
348
+ | **Neon** | Neon | Serverless PostgreSQL, branching, scale-to-zero |
349
+
350
+ ### NewSQL / Distributed SQL
351
+ | Database | Sharding | SQL Compat | Best For |
352
+ |----------|----------|------------|----------|
353
+ | **CockroachDB** | Automatic (ranges) | PostgreSQL wire | Global distribution, strong consistency |
354
+ | **YugabyteDB** | Automatic (hash/range) | PostgreSQL-compatible | PostgreSQL apps needing horizontal scale |
355
+ | **TiDB** | Automatic (ranges) | MySQL-compatible | MySQL apps needing horizontal scale |
356
+ | **Vitess** | Application-directed | MySQL (middleware) | Existing MySQL at extreme scale |
357
+ | **Citus** | Explicit (extension) | PostgreSQL (native) | Multi-tenant, real-time analytics |
358
+
359
+ **Key distinction:** CockroachDB/YugabyteDB handle sharding transparently (no shard key). Citus/Vitess require explicit shard key selection. Transparent sharding trades performance for simplicity.
360
+
361
+ ### Read Replica Support by Provider
362
+
363
+ | Provider | Max Replicas | Cross-Region | Replication |
364
+ |----------|-------------|--------------|-------------|
365
+ | AWS Aurora | 15 | Yes | Async (sync available) |
366
+ | AWS RDS | 5 | Yes | Async |
367
+ | GCP Cloud SQL | 10 | Yes | Async |
368
+ | Azure Flexible | 5 | Yes (geo-replication) | Async |
369
+ | Self-managed | Unlimited | Manual setup | Streaming WAL |
370
+
371
+ ---
372
+
373
+ ## Decision Tree
374
+
375
+ ```
376
+ START: Database is slow
377
+
378
+ ├─ Analyzed slow queries? (pg_stat_statements, EXPLAIN ANALYZE)
379
+ │ └─ NO → Do this first. Stop. 90% of problems end here.
380
+
381
+ ├─ Queries optimized but CPU > 70%?
382
+ │ ├─ Largest instance? NO → Vertically scale. Done.
383
+ │ └─ YES → Continue.
384
+
385
+ ├─ Workload > 80% reads?
386
+ │ ├─ YES → Add read replicas (start with 2).
387
+ │ │ ├─ Same data read repeatedly? → Add Redis caching.
388
+ │ │ └─ Still bottlenecked? → Continue.
389
+ │ └─ NO (write-heavy) → Continue.
390
+
391
+ ├─ Have connection pooling?
392
+ │ └─ NO → Add PgBouncer. May solve the problem alone.
393
+
394
+ ├─ Tables > 100M rows?
395
+ │ └─ YES → Partition (range by time most common).
396
+
397
+ ├─ Workload splittable into independent domains?
398
+ │ └─ YES → Vertical partition (Figma: 1 → 12 databases).
399
+
400
+ ├─ Single-table writes > 50K/sec sustained?
401
+ │ ├─ YES → Horizontal sharding warranted.
402
+ │ │ ├─ PostgreSQL → Citus or application-level
403
+ │ │ ├─ MySQL → Vitess or PlanetScale
404
+ │ │ └─ Greenfield → CockroachDB or YugabyteDB
405
+ │ └─ NO → Revisit Levels 0-6. You don't need sharding.
406
+
407
+ └─ None of the above?
408
+ └─ Problem is likely not the database. Check: network latency,
409
+ app-level N+1 patterns, lock contention, disk I/O.
410
+ ```
411
+
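The decision tree can also be expressed as a function, which makes the ordering of the checks explicit. This is one possible reading of the tree, not a definitive encoding: every field name on `DbState` is an assumption, and fields such as `has_read_replicas` and `tables_partitioned` are added so that a step already taken is not recommended again.

```python
from dataclasses import dataclass


@dataclass
class DbState:
    slow_queries_analyzed: bool = True
    cpu_pct: float = 0.0
    on_largest_instance: bool = False
    read_fraction: float = 0.0           # share of the workload that is reads (0.0-1.0)
    has_read_replicas: bool = False
    repeated_hot_reads: bool = False
    has_cache: bool = False
    has_connection_pooling: bool = True
    largest_table_rows: int = 0
    tables_partitioned: bool = False
    splittable_domains: bool = False
    single_table_writes_per_sec: int = 0


def next_step(s: DbState) -> str:
    """Walk the tree top to bottom and return the first applicable action."""
    if not s.slow_queries_analyzed:
        return "optimize queries and indexes"
    if s.cpu_pct > 70 and not s.on_largest_instance:
        return "scale vertically"
    if s.read_fraction > 0.8:
        if not s.has_read_replicas:
            return "add read replicas"
        if s.repeated_hot_reads and not s.has_cache:
            return "add Redis caching"
    if not s.has_connection_pooling:
        return "add PgBouncer"
    if s.largest_table_rows > 100_000_000 and not s.tables_partitioned:
        return "partition large tables"
    if s.splittable_domains:
        return "vertical partitioning"
    if s.single_table_writes_per_sec > 50_000:
        return "horizontal sharding"
    return "revisit Levels 0-6 or look outside the database"
```

Sharding is reachable only after every cheaper branch has been ruled out, which mirrors the ladder: for example, `next_step(DbState(cpu_pct=85))` recommends vertical scaling long before any partitioning is considered.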
412
+ ---
413
+
414
+ ## Implementation Sketch
415
+
416
+ ### Read Replica Setup (PostgreSQL / AWS RDS)
417
+
418
+ ```bash
419
+ # Create read replica
420
+ aws rds create-db-instance-read-replica \
421
+ --db-instance-identifier myapp-read-1 \
422
+ --source-db-instance-identifier myapp-primary \
423
+ --db-instance-class db.r6g.2xlarge
424
+ ```
425
+
426
+ ```yaml
+ # Rails read/write routing (config/database.yml)
+ production:
+   primary:
+     url: postgres://myapp-primary.xxx.rds.amazonaws.com/myapp
+   primary_replica:
+     url: postgres://myapp-read-1.xxx.rds.amazonaws.com/myapp
+     replica: true
+ ```
435
+
436
+ ### PgBouncer Production Config
437
+
438
+ ```ini
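+ [pgbouncer]
+ ; PgBouncer treats ";" and "#" as comment markers only at the start of a line,
+ ; so comments sit on their own lines rather than after values.
+ ; Transaction pooling: recommended for most workloads
+ pool_mode = transaction
+ ; Server connections per user/database pair
+ default_pool_size = 50
+ max_client_conn = 1000
+ ; Hard cap on backend connections per database
+ max_db_connections = 100
+ server_idle_timeout = 600
+ query_timeout = 30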
+ ```
447
+
448
+ ### Cache Lock Pattern (Thundering Herd Protection)
449
+
450
+ Based on the OpenAI approach — only one request fetches from the database on a cache miss:
451
+
452
+ ```python
+ import time
+
+ # Assumes `redis` (a redis-py client), `db`, `serialize`, and `deserialize` are defined elsewhere.
+ def get_user(user_id, retries=20):
+     key, lock_key = f"user:{user_id}", f"lock:user:{user_id}"
+     for _ in range(retries):
+         cached = redis.get(key)
+         if cached is not None:
+             return deserialize(cached)
+
+         # Acquire lock - only one request fetches from DB
+         if redis.set(lock_key, "1", nx=True, ex=5):
+             try:
+                 user = db.query("SELECT * FROM users WHERE id = %s", user_id)
+                 redis.setex(key, 300, serialize(user))
+                 return user
+             finally:
+                 redis.delete(lock_key)  # release the lock even if the query fails
+         time.sleep(0.05)  # wait for other request to populate
+     # Bounded wait exhausted: query the DB directly instead of retrying forever
+     return db.query("SELECT * FROM users WHERE id = %s", user_id)
+ ```
469
+
470
+ ### Zero-Downtime Partitioning Migration
471
+
472
+ ```sql
473
+ -- Step 1: Create partitioned table with same schema (PRIMARY KEY/UNIQUE constraints must include created_at, or this fails)
474
+ CREATE TABLE events_partitioned (
475
+ LIKE events INCLUDING ALL
476
+ ) PARTITION BY RANGE (created_at);
477
+
478
+ -- Step 2: Create partitions for each range
479
+ CREATE TABLE events_p_2025_01 PARTITION OF events_partitioned
480
+ FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
481
+ -- ... create all needed partitions
482
+
483
+ -- Step 3: Backfill in batches (avoid long locks)
484
+ INSERT INTO events_partitioned
485
+ SELECT * FROM events
486
+ WHERE created_at >= '2025-01-01' AND created_at < '2025-02-01'
487
+ ON CONFLICT DO NOTHING;
488
+
489
+ -- Step 4: Swap tables (brief exclusive lock)
490
+ BEGIN;
491
+ ALTER TABLE events RENAME TO events_old;
492
+ ALTER TABLE events_partitioned RENAME TO events;
493
+ COMMIT;
494
+
495
+ -- Step 5: Verify, then drop after confirmation period
496
+ -- DROP TABLE events_old;
497
+ ```
498
+
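Steps 2 and 3 repeat once per partition range, so the statements are often generated rather than typed by hand. A minimal Python sketch, assuming monthly ranges and the `events_p_YYYY_MM` naming used above; `month_ranges` and `partition_ddl` are hypothetical helper names, not part of any library:

```python
from datetime import date


def month_ranges(start, end):
    """Yield (lower, upper) first-of-month pairs covering [start, end)."""
    cur = date(start.year, start.month, 1)
    while cur < end:
        # Roll over to January of the next year after December.
        nxt = date(cur.year + (cur.month == 12), cur.month % 12 + 1, 1)
        yield cur, nxt
        cur = nxt


def partition_ddl(lower, upper):
    """Build the Step 2 statement for one monthly partition."""
    return (
        f"CREATE TABLE events_p_{lower:%Y_%m} PARTITION OF events_partitioned "
        f"FOR VALUES FROM ('{lower}') TO ('{upper}');"
    )


if __name__ == "__main__":
    for lower, upper in month_ranges(date(2025, 1, 1), date(2025, 4, 1)):
        print(partition_ddl(lower, upper))
```

The same `(lower, upper)` pairs can drive the Step 3 backfill, one `INSERT ... SELECT` per range, so each batch holds locks only briefly.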
499
+ ### Essential Monitoring Queries
500
+
501
+ ```sql
+ -- Top queries by total time
+ SELECT calls, round(mean_exec_time::numeric, 2) as mean_ms,
+        round((100 * total_exec_time / sum(total_exec_time) OVER ())::numeric, 2) as pct,
+        left(query, 80) as query
+ FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 20;
+
+ -- Cache hit ratio (target > 99%)
+ SELECT round(100.0 * sum(heap_blks_hit) /
+        nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2) as cache_hit_pct
+ FROM pg_statio_user_tables;
+
+ -- Replay lag in bytes: WAL flushed on the standby but not yet replayed (run on the primary)
+ SELECT client_addr, state,
+        flush_lsn - replay_lsn as replay_lag_bytes
+ FROM pg_stat_replication;
+
+ -- Table bloat and dead tuples (VACUUM health)
+ SELECT relname, n_live_tup, n_dead_tup,
+        round(100.0 * n_dead_tup / nullif(n_live_tup + n_dead_tup, 0), 2) as dead_pct,
+        last_autovacuum
+ FROM pg_stat_user_tables
+ WHERE n_live_tup > 10000 ORDER BY n_dead_tup DESC LIMIT 20;
+
+ -- Active connections by state
+ SELECT state, count(*), max(now() - state_change) as longest
+ FROM pg_stat_activity
+ WHERE pid <> pg_backend_pid() GROUP BY state;
+ ```
530
+
531
+ ---
532
+
533
+ ## Real-World Case Studies
534
+
535
+ ### OpenAI (ChatGPT) — Read Replicas at Extreme Scale
536
+
537
+ **Scale:** 800 million users, millions of queries per second.
538
+ **Architecture:** Single Azure PostgreSQL Flexible Server primary + ~50 read replicas across multiple regions.
539
+
540
+ Key innovations:
541
+ - Tree-topology WAL distribution: primary streams to relay replicas, relays stream to leaf replicas — avoids overwhelming the primary's network/CPU at 50+ replicas
542
+ - Cache lock mechanism: only a single reader that misses on a cache key fetches from PostgreSQL; other requests wait — prevents thundering herd
543
+ - Consistent low double-digit millisecond p99 client-side latency
544
+ - Five-nines (99.999%) availability in production
545
+ - **No sharding needed** for the core metadata store
546
+
547
+ **Lesson:** Aggressive query optimization, caching, and read replicas can scale PostgreSQL far beyond what most engineers expect.
548
+
549
+ ### Instagram — Logical Sharding on PostgreSQL
550
+
551
+ **Scale:** 2+ billion monthly active users.
552
+ **Architecture:** Django + PostgreSQL, sharded across thousands of logical shards mapped to physical servers.
553
+
554
+ Key innovations:
555
+ - Custom 64-bit ID: 41-bit timestamp (time-sortable) + 13-bit shard ID (8192 logical shards) + 10-bit auto-increment sequence
556
+ - Logical shards as PostgreSQL schemas (not separate databases) — multiple logical shards per physical server
557
+ - PgBouncer for connection pooling across all shards
558
+ - Cassandra for specific use cases; Redis for ephemeral caching
559
+ - Rebalancing moves logical shards between physical servers — no row-level data migration
560
+
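The 64-bit ID layout in the first bullet (41 + 13 + 10 bits) can be illustrated with plain bit shifts. This is a sketch under stated assumptions, not Instagram's implementation: the epoch value and the function names `make_id` and `split_id` are hypothetical.

```python
import time

# Hypothetical custom epoch (2011-01-01 UTC, in milliseconds); assumed for illustration.
EPOCH_MS = 1_293_840_000_000

TIMESTAMP_BITS, SHARD_BITS, SEQ_BITS = 41, 13, 10


def make_id(shard_id, seq, now_ms=None):
    """Pack timestamp, logical shard ID, and sequence into one 64-bit integer."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    ts = (now_ms - EPOCH_MS) & ((1 << TIMESTAMP_BITS) - 1)
    shard = shard_id & ((1 << SHARD_BITS) - 1)
    sequence = seq % (1 << SEQ_BITS)
    # Timestamp occupies the high bits, so IDs sort by creation time.
    return (ts << (SHARD_BITS + SEQ_BITS)) | (shard << SEQ_BITS) | sequence


def split_id(value):
    """Recover (timestamp offset in ms, shard ID, sequence) from an ID."""
    seq = value & ((1 << SEQ_BITS) - 1)
    shard = (value >> SEQ_BITS) & ((1 << SHARD_BITS) - 1)
    ts = value >> (SHARD_BITS + SEQ_BITS)
    return ts, shard, seq
```

Because the shard ID is embedded in the ID itself, a lookup can be routed to the right logical shard without a separate directory query.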
561
+ **Lesson:** Logical sharding provides resharding flexibility. Start with many logical shards on few physical servers. Rebalancing is moving entire schemas, not splitting tables.
562
+
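The lesson above depends on an indirection: rows map to a fixed logical shard, and a separate table maps logical shards to physical servers. A minimal sketch of that indirection, assuming 8192 logical shards as in the ID layout; the server names and the helpers `logical_shard`, `build_map`, and `move_shard` are hypothetical:

```python
LOGICAL_SHARDS = 8192  # matches the 13-bit shard ID


def logical_shard(user_id):
    """A row's logical shard never changes once assigned."""
    return user_id % LOGICAL_SHARDS


def build_map(servers):
    """Assign logical shards to servers in contiguous, near-equal ranges."""
    per_server = -(-LOGICAL_SHARDS // len(servers))  # ceiling division
    return {s: servers[s // per_server] for s in range(LOGICAL_SHARDS)}


def move_shard(shard_map, shard, new_server):
    """Rebalance by repointing one logical shard; row IDs stay the same."""
    shard_map[shard] = new_server


if __name__ == "__main__":
    shard_map = build_map(["db1", "db2"])
    move_shard(shard_map, logical_shard(4095), "db3")
    print(shard_map[logical_shard(4095)])
```

In a real deployment the schema for that shard would be copied to the new server before the map entry is updated; the sketch only shows the bookkeeping.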
563
+ ### Figma — Incremental Vertical Partitioning
564
+
565
+ **Scale:** 4M+ users, database traffic growing ~3x annually.
566
+ **Starting point:** Single PostgreSQL on AWS RDS, hitting 65% CPU at peak with all queries on one database.
567
+
568
+ Evolution:
569
+ 1. Upgraded from r5.12xlarge to r5.24xlarge (largest available) — bought time
570
+ 2. Added multiple read replicas + PgBouncer as connection pooler
571
+ 3. Created new databases for new features to limit growth of the original
572
+ 4. Vertical partitioning: moved high-traffic table groups (Files, Organizations) to dedicated databases — grew from 1 to 12 databases
573
+ 5. Only then began exploring horizontal sharding on top of vertically partitioned RDS Postgres
574
+
575
+ **Key principle:** Minimize developer impact. Every step was incremental — no "big bang" cutover. App developers could focus on features instead of refactoring.
576
+
577
+ **Lesson:** Exhaust vertical partitioning before horizontal sharding. Going from 1 to 12 databases spread the load across 12 independent instances (up to ~12x headroom if load divides evenly) with far less complexity than sharding.
578
+
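Vertical partitioning as described above amounts to routing each table group to its own database. A minimal sketch, assuming a static table-to-database map; the table names, database names, and the helpers `database_for` and `can_join` are all hypothetical:

```python
# Hypothetical mapping: table groups moved out of the original database.
TABLE_TO_DB = {
    "files": "files_db",
    "file_comments": "files_db",
    "organizations": "orgs_db",
}
DEFAULT_DB = "main_db"  # everything not yet moved stays here


def database_for(table):
    """Return the database that owns a table."""
    return TABLE_TO_DB.get(table, DEFAULT_DB)


def can_join(*tables):
    """Tables can be joined in SQL only if they live in the same database."""
    return len({database_for(t) for t in tables}) == 1
```

The `can_join` check reflects the main cost of this approach: queries and transactions that span table groups have to be split in the application.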
579
+ ### Slack — Vitess Migration (3 Years)
580
+
581
+ **Scale:** Hundreds of thousands of MySQL queries/second, thousands of sharded hosts.
582
+ **Problem:** Shard-per-workspace model meant large customers overwhelmed individual shards while thousands of others sat mostly idle.
583
+
584
+ Migration:
585
+ - Chose Vitess for its flexible "shard by anything" model, concluding that "no other storage system truly fit all of Slack's needs"
586
+ - Timeline: July 2017 to late 2020 — 3+ years from 0% to 99% adoption
587
+ - Scaled from 0 to 2.3 million QPS on Vitess
588
+ - The migration of a single table that accounted for 20% of overall query load is documented in detail
589
+
590
+ **Lesson:** Sharding migrations at scale take years, not months. Budget accordingly and plan for long dual-write periods.
591
+
592
+ ### Shopify — Vitess with Query Verification
593
+
594
+ **Architecture:** MySQL + Vitess, sharded by `user_id`.
595
+ - User-owned data in sharded "users" keyspace; global/shared data in unsharded "global" keyspace
596
+ - Built query verifiers in the application layer that validated query correctness, routing, and data distribution
597
+ - Ran verifiers in shadow mode in production before switching traffic
598
+ - Credited verifiers as instrumental to the successful migration
599
+
600
+ **Lesson:** Build verification tooling before migrating. Validate every query against the new sharded topology in production (shadow mode) before cutting over traffic.
601
+
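A shadow-mode verifier of the kind described above inspects each query and records problems without blocking it. The following is a rough sketch under stated assumptions, not Shopify's tooling: the table set, the regex-based parsing, and the names `verify_query` and `shadow_check` are hypothetical, and the regexes only cover simple SELECT, UPDATE, and DELETE statements.

```python
import re

SHARDED_TABLES = {"orders", "carts"}  # hypothetical tables in the sharded keyspace
SHARD_KEY = "user_id"

TABLE_RE = re.compile(r"\b(?:from|join|update)\s+([a-z_][a-z0-9_]*)")
KEY_RE = re.compile(rf"\b{SHARD_KEY}\s*(?:=|in\b)")


def verify_query(sql):
    """Return a list of routing problems; an empty list means none found."""
    text = sql.lower()
    tables = set(TABLE_RE.findall(text))
    sharded = tables & SHARDED_TABLES
    problems = []
    if sharded and not KEY_RE.search(text):
        problems.append("missing shard key filter")
    if sharded and tables - SHARDED_TABLES:
        problems.append("mixes sharded and global tables")
    return problems


def shadow_check(sql, report):
    """Shadow mode: record problems in report but never reject the query."""
    problems = verify_query(sql)
    if problems:
        report.append((sql, problems))
    return problems
```

Running the check in production traffic before cutover surfaces queries that would scatter across shards or need a cross-keyspace join, while the existing database keeps serving them.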
602
+ ---
603
+
604
+ ## Cross-References
605
+
606
+ - **[data-modeling](../data-modeling.md):** Schema design impacts scaling options. Normalized schemas require JOINs that break under sharding.
607
+ - **[sql-vs-nosql](../sql-vs-nosql.md):** Many NoSQL databases shard natively but typically give up JOINs and restrict multi-row transactions.
608
+ - **[caching-architecture](../caching-architecture.md):** Level 5 on the scaling ladder. Redis can reduce DB load by 90%+, often eliminating further scaling needs.
609
+ - **[horizontal-vs-vertical](../horizontal-vs-vertical.md):** Vertical scaling is simpler. Scale horizontally only once vertical limits are reached.
610
+ - **[data-consistency](../data-consistency.md):** Every scaling step beyond vertical introduces consistency trade-offs. Understand the CAP theorem before choosing.
611
+
612
+ ---
613
+
614
+ *Last updated: 2026-03-08*
615
+ *Sources: [OpenAI Engineering](https://openai.com/index/scaling-postgresql/), [Instagram Engineering](https://instagram-engineering.com/sharding-ids-at-instagram-1cf5a71e5a5c), [Figma Blog](https://www.figma.com/blog/how-figma-scaled-to-multiple-databases/), [Slack Engineering](https://slack.engineering/scaling-datastores-at-slack-with-vitess/), [Shopify Engineering](https://shopify.engineering/horizontally-scaling-the-rails-backend-of-shop-app-with-vitess)*