@wazir-dev/cli 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (629)
  1. package/AGENTS.md +111 -0
  2. package/CHANGELOG.md +14 -0
  3. package/CONTRIBUTING.md +101 -0
  4. package/LICENSE +21 -0
  5. package/README.md +314 -0
  6. package/assets/composition-engine.mmd +34 -0
  7. package/assets/demo-script.sh +17 -0
  8. package/assets/logo-dark.svg +14 -0
  9. package/assets/logo.svg +14 -0
  10. package/assets/pipeline.mmd +39 -0
  11. package/assets/record-demo.sh +51 -0
  12. package/docs/README.md +51 -0
  13. package/docs/adapters/context-mode.md +60 -0
  14. package/docs/concepts/architecture.md +87 -0
  15. package/docs/concepts/artifact-model.md +60 -0
  16. package/docs/concepts/composition-engine.md +36 -0
  17. package/docs/concepts/indexing-and-recall.md +160 -0
  18. package/docs/concepts/observability.md +41 -0
  19. package/docs/concepts/roles-and-workflows.md +59 -0
  20. package/docs/concepts/terminology-policy.md +27 -0
  21. package/docs/getting-started/01-installation.md +78 -0
  22. package/docs/getting-started/02-first-run.md +102 -0
  23. package/docs/getting-started/03-adding-to-project.md +15 -0
  24. package/docs/getting-started/04-host-setup.md +15 -0
  25. package/docs/guides/ci-integration.md +15 -0
  26. package/docs/guides/creating-skills.md +15 -0
  27. package/docs/guides/expertise-module-authoring.md +15 -0
  28. package/docs/guides/hook-development.md +15 -0
  29. package/docs/guides/memory-and-learnings.md +34 -0
  30. package/docs/guides/multi-host-export.md +15 -0
  31. package/docs/guides/troubleshooting.md +101 -0
  32. package/docs/guides/writing-custom-roles.md +15 -0
  33. package/docs/plans/2026-03-15-cli-pipeline-integration-design.md +592 -0
  34. package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +598 -0
  35. package/docs/plans/2026-03-15-docs-enforcement-plan.md +238 -0
  36. package/docs/readmes/INDEX.md +99 -0
  37. package/docs/readmes/features/expertise/README.md +171 -0
  38. package/docs/readmes/features/exports/README.md +222 -0
  39. package/docs/readmes/features/hooks/README.md +103 -0
  40. package/docs/readmes/features/hooks/loop-cap-guard.md +133 -0
  41. package/docs/readmes/features/hooks/post-tool-capture.md +121 -0
  42. package/docs/readmes/features/hooks/post-tool-lint.md +130 -0
  43. package/docs/readmes/features/hooks/pre-compact-summary.md +122 -0
  44. package/docs/readmes/features/hooks/pre-tool-capture-route.md +100 -0
  45. package/docs/readmes/features/hooks/protected-path-write-guard.md +128 -0
  46. package/docs/readmes/features/hooks/session-start.md +119 -0
  47. package/docs/readmes/features/hooks/stop-handoff-harvest.md +125 -0
  48. package/docs/readmes/features/roles/README.md +157 -0
  49. package/docs/readmes/features/roles/clarifier.md +152 -0
  50. package/docs/readmes/features/roles/content-author.md +190 -0
  51. package/docs/readmes/features/roles/designer.md +193 -0
  52. package/docs/readmes/features/roles/executor.md +184 -0
  53. package/docs/readmes/features/roles/learner.md +210 -0
  54. package/docs/readmes/features/roles/planner.md +182 -0
  55. package/docs/readmes/features/roles/researcher.md +164 -0
  56. package/docs/readmes/features/roles/reviewer.md +184 -0
  57. package/docs/readmes/features/roles/specifier.md +162 -0
  58. package/docs/readmes/features/roles/verifier.md +215 -0
  59. package/docs/readmes/features/schemas/README.md +178 -0
  60. package/docs/readmes/features/skills/README.md +63 -0
  61. package/docs/readmes/features/skills/brainstorming.md +96 -0
  62. package/docs/readmes/features/skills/debugging.md +148 -0
  63. package/docs/readmes/features/skills/design.md +120 -0
  64. package/docs/readmes/features/skills/prepare-next.md +109 -0
  65. package/docs/readmes/features/skills/run-audit.md +159 -0
  66. package/docs/readmes/features/skills/scan-project.md +109 -0
  67. package/docs/readmes/features/skills/self-audit.md +176 -0
  68. package/docs/readmes/features/skills/tdd.md +137 -0
  69. package/docs/readmes/features/skills/using-skills.md +92 -0
  70. package/docs/readmes/features/skills/verification.md +120 -0
  71. package/docs/readmes/features/skills/writing-plans.md +104 -0
  72. package/docs/readmes/features/tooling/README.md +320 -0
  73. package/docs/readmes/features/workflows/README.md +186 -0
  74. package/docs/readmes/features/workflows/author.md +181 -0
  75. package/docs/readmes/features/workflows/clarify.md +154 -0
  76. package/docs/readmes/features/workflows/design-review.md +171 -0
  77. package/docs/readmes/features/workflows/design.md +169 -0
  78. package/docs/readmes/features/workflows/discover.md +162 -0
  79. package/docs/readmes/features/workflows/execute.md +173 -0
  80. package/docs/readmes/features/workflows/learn.md +167 -0
  81. package/docs/readmes/features/workflows/plan-review.md +165 -0
  82. package/docs/readmes/features/workflows/plan.md +170 -0
  83. package/docs/readmes/features/workflows/prepare-next.md +167 -0
  84. package/docs/readmes/features/workflows/review.md +169 -0
  85. package/docs/readmes/features/workflows/run-audit.md +191 -0
  86. package/docs/readmes/features/workflows/spec-challenge.md +159 -0
  87. package/docs/readmes/features/workflows/specify.md +160 -0
  88. package/docs/readmes/features/workflows/verify.md +177 -0
  89. package/docs/readmes/packages/README.md +50 -0
  90. package/docs/readmes/packages/ajv.md +117 -0
  91. package/docs/readmes/packages/context-mode.md +118 -0
  92. package/docs/readmes/packages/gray-matter.md +116 -0
  93. package/docs/readmes/packages/node-test.md +137 -0
  94. package/docs/readmes/packages/yaml.md +112 -0
  95. package/docs/reference/configuration-reference.md +159 -0
  96. package/docs/reference/expertise-index.md +52 -0
  97. package/docs/reference/git-flow.md +43 -0
  98. package/docs/reference/hooks.md +87 -0
  99. package/docs/reference/host-exports.md +50 -0
  100. package/docs/reference/launch-checklist.md +172 -0
  101. package/docs/reference/marketplace-listings.md +76 -0
  102. package/docs/reference/release-process.md +34 -0
  103. package/docs/reference/roles-reference.md +77 -0
  104. package/docs/reference/skills.md +33 -0
  105. package/docs/reference/templates.md +29 -0
  106. package/docs/reference/tooling-cli.md +94 -0
  107. package/docs/truth-claims.yaml +222 -0
  108. package/expertise/PROGRESS.md +63 -0
  109. package/expertise/README.md +18 -0
  110. package/expertise/antipatterns/PROGRESS.md +56 -0
  111. package/expertise/antipatterns/backend/api-design-antipatterns.md +1271 -0
  112. package/expertise/antipatterns/backend/auth-antipatterns.md +1195 -0
  113. package/expertise/antipatterns/backend/caching-antipatterns.md +622 -0
  114. package/expertise/antipatterns/backend/database-antipatterns.md +1038 -0
  115. package/expertise/antipatterns/backend/index.md +24 -0
  116. package/expertise/antipatterns/backend/microservices-antipatterns.md +850 -0
  117. package/expertise/antipatterns/code/architecture-antipatterns.md +919 -0
  118. package/expertise/antipatterns/code/async-antipatterns.md +622 -0
  119. package/expertise/antipatterns/code/code-smells.md +1186 -0
  120. package/expertise/antipatterns/code/dependency-antipatterns.md +1209 -0
  121. package/expertise/antipatterns/code/error-handling-antipatterns.md +1360 -0
  122. package/expertise/antipatterns/code/index.md +27 -0
  123. package/expertise/antipatterns/code/naming-and-abstraction.md +1118 -0
  124. package/expertise/antipatterns/code/state-management-antipatterns.md +1076 -0
  125. package/expertise/antipatterns/code/testing-antipatterns.md +1053 -0
  126. package/expertise/antipatterns/design/accessibility-antipatterns.md +1136 -0
  127. package/expertise/antipatterns/design/dark-patterns.md +1121 -0
  128. package/expertise/antipatterns/design/index.md +22 -0
  129. package/expertise/antipatterns/design/ui-antipatterns.md +1202 -0
  130. package/expertise/antipatterns/design/ux-antipatterns.md +680 -0
  131. package/expertise/antipatterns/frontend/css-layout-antipatterns.md +691 -0
  132. package/expertise/antipatterns/frontend/flutter-antipatterns.md +1827 -0
  133. package/expertise/antipatterns/frontend/index.md +23 -0
  134. package/expertise/antipatterns/frontend/mobile-antipatterns.md +573 -0
  135. package/expertise/antipatterns/frontend/react-antipatterns.md +1128 -0
  136. package/expertise/antipatterns/frontend/spa-antipatterns.md +1235 -0
  137. package/expertise/antipatterns/index.md +31 -0
  138. package/expertise/antipatterns/performance/index.md +20 -0
  139. package/expertise/antipatterns/performance/performance-antipatterns.md +1013 -0
  140. package/expertise/antipatterns/performance/premature-optimization.md +623 -0
  141. package/expertise/antipatterns/performance/scaling-antipatterns.md +785 -0
  142. package/expertise/antipatterns/process/ai-coding-antipatterns.md +853 -0
  143. package/expertise/antipatterns/process/code-review-antipatterns.md +656 -0
  144. package/expertise/antipatterns/process/deployment-antipatterns.md +920 -0
  145. package/expertise/antipatterns/process/index.md +23 -0
  146. package/expertise/antipatterns/process/technical-debt-antipatterns.md +647 -0
  147. package/expertise/antipatterns/security/index.md +20 -0
  148. package/expertise/antipatterns/security/secrets-antipatterns.md +849 -0
  149. package/expertise/antipatterns/security/security-theater.md +843 -0
  150. package/expertise/antipatterns/security/vulnerability-patterns.md +801 -0
  151. package/expertise/architecture/PROGRESS.md +70 -0
  152. package/expertise/architecture/data/caching-architecture.md +671 -0
  153. package/expertise/architecture/data/data-consistency.md +574 -0
  154. package/expertise/architecture/data/data-modeling.md +536 -0
  155. package/expertise/architecture/data/event-streams-and-queues.md +634 -0
  156. package/expertise/architecture/data/index.md +25 -0
  157. package/expertise/architecture/data/search-architecture.md +663 -0
  158. package/expertise/architecture/data/sql-vs-nosql.md +708 -0
  159. package/expertise/architecture/decisions/architecture-decision-records.md +640 -0
  160. package/expertise/architecture/decisions/build-vs-buy.md +616 -0
  161. package/expertise/architecture/decisions/index.md +23 -0
  162. package/expertise/architecture/decisions/monolith-to-microservices.md +790 -0
  163. package/expertise/architecture/decisions/technology-selection.md +616 -0
  164. package/expertise/architecture/distributed/cap-theorem-and-tradeoffs.md +800 -0
  165. package/expertise/architecture/distributed/circuit-breaker-bulkhead.md +741 -0
  166. package/expertise/architecture/distributed/consensus-and-coordination.md +796 -0
  167. package/expertise/architecture/distributed/distributed-systems-fundamentals.md +564 -0
  168. package/expertise/architecture/distributed/idempotency-and-retry.md +796 -0
  169. package/expertise/architecture/distributed/index.md +25 -0
  170. package/expertise/architecture/distributed/saga-pattern.md +797 -0
  171. package/expertise/architecture/foundations/architectural-thinking.md +460 -0
  172. package/expertise/architecture/foundations/coupling-and-cohesion.md +770 -0
  173. package/expertise/architecture/foundations/design-principles-solid.md +649 -0
  174. package/expertise/architecture/foundations/domain-driven-design.md +719 -0
  175. package/expertise/architecture/foundations/index.md +25 -0
  176. package/expertise/architecture/foundations/separation-of-concerns.md +472 -0
  177. package/expertise/architecture/foundations/twelve-factor-app.md +797 -0
  178. package/expertise/architecture/index.md +34 -0
  179. package/expertise/architecture/integration/api-design-graphql.md +638 -0
  180. package/expertise/architecture/integration/api-design-grpc.md +804 -0
  181. package/expertise/architecture/integration/api-design-rest.md +892 -0
  182. package/expertise/architecture/integration/index.md +25 -0
  183. package/expertise/architecture/integration/third-party-integration.md +795 -0
  184. package/expertise/architecture/integration/webhooks-and-callbacks.md +1152 -0
  185. package/expertise/architecture/integration/websockets-realtime.md +791 -0
  186. package/expertise/architecture/mobile-architecture/index.md +22 -0
  187. package/expertise/architecture/mobile-architecture/mobile-app-architecture.md +780 -0
  188. package/expertise/architecture/mobile-architecture/mobile-backend-for-frontend.md +670 -0
  189. package/expertise/architecture/mobile-architecture/offline-first.md +719 -0
  190. package/expertise/architecture/mobile-architecture/push-and-sync.md +782 -0
  191. package/expertise/architecture/patterns/cqrs-event-sourcing.md +717 -0
  192. package/expertise/architecture/patterns/event-driven.md +797 -0
  193. package/expertise/architecture/patterns/hexagonal-clean-architecture.md +870 -0
  194. package/expertise/architecture/patterns/index.md +27 -0
  195. package/expertise/architecture/patterns/layered-architecture.md +736 -0
  196. package/expertise/architecture/patterns/microservices.md +753 -0
  197. package/expertise/architecture/patterns/modular-monolith.md +692 -0
  198. package/expertise/architecture/patterns/monolith.md +626 -0
  199. package/expertise/architecture/patterns/plugin-architecture.md +735 -0
  200. package/expertise/architecture/patterns/serverless.md +780 -0
  201. package/expertise/architecture/scaling/database-scaling.md +615 -0
  202. package/expertise/architecture/scaling/feature-flags-and-rollouts.md +757 -0
  203. package/expertise/architecture/scaling/horizontal-vs-vertical.md +606 -0
  204. package/expertise/architecture/scaling/index.md +24 -0
  205. package/expertise/architecture/scaling/multi-tenancy.md +800 -0
  206. package/expertise/architecture/scaling/stateless-design.md +787 -0
  207. package/expertise/backend/embedded-firmware.md +625 -0
  208. package/expertise/backend/go.md +853 -0
  209. package/expertise/backend/index.md +24 -0
  210. package/expertise/backend/java-spring.md +448 -0
  211. package/expertise/backend/node-typescript.md +625 -0
  212. package/expertise/backend/python-fastapi.md +724 -0
  213. package/expertise/backend/rust.md +458 -0
  214. package/expertise/backend/solidity.md +711 -0
  215. package/expertise/composition-map.yaml +443 -0
  216. package/expertise/content/foundations/content-modeling.md +395 -0
  217. package/expertise/content/foundations/editorial-standards.md +449 -0
  218. package/expertise/content/foundations/index.md +24 -0
  219. package/expertise/content/foundations/microcopy.md +455 -0
  220. package/expertise/content/foundations/terminology-governance.md +509 -0
  221. package/expertise/content/index.md +34 -0
  222. package/expertise/content/patterns/accessibility-copy.md +518 -0
  223. package/expertise/content/patterns/index.md +24 -0
  224. package/expertise/content/patterns/notification-content.md +433 -0
  225. package/expertise/content/patterns/sample-content.md +486 -0
  226. package/expertise/content/patterns/state-copy.md +439 -0
  227. package/expertise/design/PROGRESS.md +58 -0
  228. package/expertise/design/disciplines/dark-mode-theming.md +577 -0
  229. package/expertise/design/disciplines/design-systems.md +595 -0
  230. package/expertise/design/disciplines/index.md +25 -0
  231. package/expertise/design/disciplines/information-architecture.md +800 -0
  232. package/expertise/design/disciplines/interaction-design.md +788 -0
  233. package/expertise/design/disciplines/responsive-design.md +552 -0
  234. package/expertise/design/disciplines/usability-testing.md +516 -0
  235. package/expertise/design/disciplines/user-research.md +792 -0
  236. package/expertise/design/foundations/accessibility-design.md +796 -0
  237. package/expertise/design/foundations/color-theory.md +797 -0
  238. package/expertise/design/foundations/iconography.md +795 -0
  239. package/expertise/design/foundations/index.md +26 -0
  240. package/expertise/design/foundations/motion-and-animation.md +653 -0
  241. package/expertise/design/foundations/rtl-design.md +585 -0
  242. package/expertise/design/foundations/spacing-and-layout.md +607 -0
  243. package/expertise/design/foundations/typography.md +800 -0
  244. package/expertise/design/foundations/visual-hierarchy.md +761 -0
  245. package/expertise/design/index.md +32 -0
  246. package/expertise/design/patterns/authentication-flows.md +474 -0
  247. package/expertise/design/patterns/content-consumption.md +789 -0
  248. package/expertise/design/patterns/data-display.md +618 -0
  249. package/expertise/design/patterns/e-commerce.md +1494 -0
  250. package/expertise/design/patterns/feedback-and-states.md +642 -0
  251. package/expertise/design/patterns/forms-and-input.md +819 -0
  252. package/expertise/design/patterns/gamification.md +801 -0
  253. package/expertise/design/patterns/index.md +31 -0
  254. package/expertise/design/patterns/microinteractions.md +449 -0
  255. package/expertise/design/patterns/navigation.md +800 -0
  256. package/expertise/design/patterns/notifications.md +705 -0
  257. package/expertise/design/patterns/onboarding.md +700 -0
  258. package/expertise/design/patterns/search-and-filter.md +601 -0
  259. package/expertise/design/patterns/settings-and-preferences.md +768 -0
  260. package/expertise/design/patterns/social-and-community.md +748 -0
  261. package/expertise/design/platforms/desktop-native.md +612 -0
  262. package/expertise/design/platforms/index.md +25 -0
  263. package/expertise/design/platforms/mobile-android.md +825 -0
  264. package/expertise/design/platforms/mobile-cross-platform.md +983 -0
  265. package/expertise/design/platforms/mobile-ios.md +699 -0
  266. package/expertise/design/platforms/tablet.md +794 -0
  267. package/expertise/design/platforms/web-dashboard.md +790 -0
  268. package/expertise/design/platforms/web-responsive.md +550 -0
  269. package/expertise/design/psychology/behavioral-nudges.md +449 -0
  270. package/expertise/design/psychology/cognitive-load.md +1191 -0
  271. package/expertise/design/psychology/error-psychology.md +778 -0
  272. package/expertise/design/psychology/index.md +22 -0
  273. package/expertise/design/psychology/persuasive-design.md +736 -0
  274. package/expertise/design/psychology/user-mental-models.md +623 -0
  275. package/expertise/design/tooling/open-pencil.md +266 -0
  276. package/expertise/frontend/angular.md +1073 -0
  277. package/expertise/frontend/desktop-electron.md +546 -0
  278. package/expertise/frontend/flutter.md +782 -0
  279. package/expertise/frontend/index.md +27 -0
  280. package/expertise/frontend/native-android.md +409 -0
  281. package/expertise/frontend/native-ios.md +490 -0
  282. package/expertise/frontend/react-native.md +1160 -0
  283. package/expertise/frontend/react.md +808 -0
  284. package/expertise/frontend/vue.md +1089 -0
  285. package/expertise/humanize/domain-rules-code.md +79 -0
  286. package/expertise/humanize/domain-rules-content.md +67 -0
  287. package/expertise/humanize/domain-rules-technical-docs.md +56 -0
  288. package/expertise/humanize/index.md +35 -0
  289. package/expertise/humanize/self-audit-checklist.md +87 -0
  290. package/expertise/humanize/sentence-patterns.md +218 -0
  291. package/expertise/humanize/vocabulary-blacklist.md +105 -0
  292. package/expertise/i18n/PROGRESS.md +65 -0
  293. package/expertise/i18n/advanced/accessibility-and-i18n.md +28 -0
  294. package/expertise/i18n/advanced/bidirectional-text-algorithm.md +38 -0
  295. package/expertise/i18n/advanced/complex-scripts.md +30 -0
  296. package/expertise/i18n/advanced/performance-and-i18n.md +27 -0
  297. package/expertise/i18n/advanced/testing-i18n.md +28 -0
  298. package/expertise/i18n/content/content-adaptation.md +23 -0
  299. package/expertise/i18n/content/locale-specific-formatting.md +23 -0
  300. package/expertise/i18n/content/machine-translation-integration.md +28 -0
  301. package/expertise/i18n/content/translation-management.md +29 -0
  302. package/expertise/i18n/foundations/date-time-calendars.md +67 -0
  303. package/expertise/i18n/foundations/i18n-architecture.md +272 -0
  304. package/expertise/i18n/foundations/locale-and-language-tags.md +79 -0
  305. package/expertise/i18n/foundations/numbers-currency-units.md +61 -0
  306. package/expertise/i18n/foundations/pluralization-and-gender.md +109 -0
  307. package/expertise/i18n/foundations/string-externalization.md +236 -0
  308. package/expertise/i18n/foundations/text-direction-bidi.md +241 -0
  309. package/expertise/i18n/foundations/unicode-and-encoding.md +86 -0
  310. package/expertise/i18n/index.md +38 -0
  311. package/expertise/i18n/platform/backend-i18n.md +31 -0
  312. package/expertise/i18n/platform/flutter-i18n.md +148 -0
  313. package/expertise/i18n/platform/native-android-i18n.md +36 -0
  314. package/expertise/i18n/platform/native-ios-i18n.md +36 -0
  315. package/expertise/i18n/platform/react-i18n.md +103 -0
  316. package/expertise/i18n/platform/web-css-i18n.md +81 -0
  317. package/expertise/i18n/rtl/arabic-specific.md +175 -0
  318. package/expertise/i18n/rtl/hebrew-specific.md +149 -0
  319. package/expertise/i18n/rtl/rtl-animations-and-transitions.md +111 -0
  320. package/expertise/i18n/rtl/rtl-forms-and-input.md +161 -0
  321. package/expertise/i18n/rtl/rtl-fundamentals.md +211 -0
  322. package/expertise/i18n/rtl/rtl-icons-and-images.md +181 -0
  323. package/expertise/i18n/rtl/rtl-layout-mirroring.md +252 -0
  324. package/expertise/i18n/rtl/rtl-navigation-and-gestures.md +107 -0
  325. package/expertise/i18n/rtl/rtl-testing-and-qa.md +147 -0
  326. package/expertise/i18n/rtl/rtl-typography.md +160 -0
  327. package/expertise/index.md +113 -0
  328. package/expertise/index.yaml +216 -0
  329. package/expertise/infrastructure/cloud-aws.md +597 -0
  330. package/expertise/infrastructure/cloud-gcp.md +599 -0
  331. package/expertise/infrastructure/cybersecurity.md +816 -0
  332. package/expertise/infrastructure/database-mongodb.md +447 -0
  333. package/expertise/infrastructure/database-postgres.md +400 -0
  334. package/expertise/infrastructure/devops-cicd.md +787 -0
  335. package/expertise/infrastructure/index.md +27 -0
  336. package/expertise/performance/PROGRESS.md +50 -0
  337. package/expertise/performance/backend/api-latency.md +1204 -0
  338. package/expertise/performance/backend/background-jobs.md +506 -0
  339. package/expertise/performance/backend/connection-pooling.md +1209 -0
  340. package/expertise/performance/backend/database-query-optimization.md +515 -0
  341. package/expertise/performance/backend/index.md +23 -0
  342. package/expertise/performance/backend/rate-limiting-and-throttling.md +971 -0
  343. package/expertise/performance/foundations/algorithmic-complexity.md +954 -0
  344. package/expertise/performance/foundations/caching-strategies.md +489 -0
  345. package/expertise/performance/foundations/concurrency-and-parallelism.md +847 -0
  346. package/expertise/performance/foundations/index.md +24 -0
  347. package/expertise/performance/foundations/measuring-and-profiling.md +440 -0
  348. package/expertise/performance/foundations/memory-management.md +964 -0
  349. package/expertise/performance/foundations/performance-budgets.md +1314 -0
  350. package/expertise/performance/index.md +31 -0
  351. package/expertise/performance/infrastructure/auto-scaling.md +1059 -0
  352. package/expertise/performance/infrastructure/cdn-and-edge.md +1081 -0
  353. package/expertise/performance/infrastructure/index.md +22 -0
  354. package/expertise/performance/infrastructure/load-balancing.md +1081 -0
  355. package/expertise/performance/infrastructure/observability.md +1079 -0
  356. package/expertise/performance/mobile/index.md +23 -0
  357. package/expertise/performance/mobile/mobile-animations.md +544 -0
  358. package/expertise/performance/mobile/mobile-memory-battery.md +416 -0
  359. package/expertise/performance/mobile/mobile-network.md +452 -0
  360. package/expertise/performance/mobile/mobile-rendering.md +599 -0
  361. package/expertise/performance/mobile/mobile-startup-time.md +505 -0
  362. package/expertise/performance/platform-specific/flutter-performance.md +647 -0
  363. package/expertise/performance/platform-specific/index.md +22 -0
  364. package/expertise/performance/platform-specific/node-performance.md +1307 -0
  365. package/expertise/performance/platform-specific/postgres-performance.md +1366 -0
  366. package/expertise/performance/platform-specific/react-performance.md +1403 -0
  367. package/expertise/performance/web/bundle-optimization.md +1239 -0
  368. package/expertise/performance/web/image-and-media.md +636 -0
  369. package/expertise/performance/web/index.md +24 -0
  370. package/expertise/performance/web/network-optimization.md +1133 -0
  371. package/expertise/performance/web/rendering-performance.md +1098 -0
  372. package/expertise/performance/web/ssr-and-hydration.md +918 -0
  373. package/expertise/performance/web/web-vitals.md +1374 -0
  374. package/expertise/quality/accessibility.md +985 -0
  375. package/expertise/quality/evidence-based-verification.md +499 -0
  376. package/expertise/quality/index.md +24 -0
  377. package/expertise/quality/ml-model-audit.md +614 -0
  378. package/expertise/quality/performance.md +600 -0
  379. package/expertise/quality/testing-api.md +891 -0
  380. package/expertise/quality/testing-mobile.md +496 -0
  381. package/expertise/quality/testing-web.md +849 -0
  382. package/expertise/security/PROGRESS.md +54 -0
  383. package/expertise/security/agentic-identity.md +540 -0
  384. package/expertise/security/compliance-frameworks.md +601 -0
  385. package/expertise/security/data/data-encryption.md +364 -0
  386. package/expertise/security/data/data-privacy-gdpr.md +692 -0
  387. package/expertise/security/data/database-security.md +1171 -0
  388. package/expertise/security/data/index.md +22 -0
  389. package/expertise/security/data/pii-handling.md +531 -0
  390. package/expertise/security/foundations/authentication.md +1041 -0
  391. package/expertise/security/foundations/authorization.md +603 -0
  392. package/expertise/security/foundations/cryptography.md +1001 -0
  393. package/expertise/security/foundations/index.md +25 -0
  394. package/expertise/security/foundations/owasp-top-10.md +1354 -0
  395. package/expertise/security/foundations/secrets-management.md +1217 -0
  396. package/expertise/security/foundations/secure-sdlc.md +700 -0
  397. package/expertise/security/foundations/supply-chain-security.md +698 -0
  398. package/expertise/security/index.md +31 -0
  399. package/expertise/security/infrastructure/cloud-security-aws.md +1296 -0
  400. package/expertise/security/infrastructure/cloud-security-gcp.md +1376 -0
  401. package/expertise/security/infrastructure/container-security.md +721 -0
  402. package/expertise/security/infrastructure/incident-response.md +1295 -0
  403. package/expertise/security/infrastructure/index.md +24 -0
  404. package/expertise/security/infrastructure/logging-and-monitoring.md +1618 -0
  405. package/expertise/security/infrastructure/network-security.md +1337 -0
  406. package/expertise/security/mobile/index.md +23 -0
  407. package/expertise/security/mobile/mobile-android-security.md +1218 -0
  408. package/expertise/security/mobile/mobile-binary-protection.md +1229 -0
  409. package/expertise/security/mobile/mobile-data-storage.md +1265 -0
  410. package/expertise/security/mobile/mobile-ios-security.md +1401 -0
  411. package/expertise/security/mobile/mobile-network-security.md +1520 -0
  412. package/expertise/security/smart-contract-security.md +594 -0
  413. package/expertise/security/testing/index.md +22 -0
  414. package/expertise/security/testing/penetration-testing.md +1258 -0
  415. package/expertise/security/testing/security-code-review.md +1765 -0
  416. package/expertise/security/testing/threat-modeling.md +1074 -0
  417. package/expertise/security/testing/vulnerability-scanning.md +1062 -0
  418. package/expertise/security/web/api-security.md +586 -0
  419. package/expertise/security/web/cors-and-headers.md +433 -0
  420. package/expertise/security/web/csrf.md +562 -0
  421. package/expertise/security/web/file-upload.md +1477 -0
  422. package/expertise/security/web/index.md +25 -0
  423. package/expertise/security/web/injection.md +1375 -0
  424. package/expertise/security/web/session-management.md +1101 -0
  425. package/expertise/security/web/xss.md +1158 -0
  426. package/exports/README.md +17 -0
  427. package/exports/hosts/claude/.claude/agents/clarifier.md +42 -0
  428. package/exports/hosts/claude/.claude/agents/content-author.md +63 -0
  429. package/exports/hosts/claude/.claude/agents/designer.md +55 -0
  430. package/exports/hosts/claude/.claude/agents/executor.md +55 -0
  431. package/exports/hosts/claude/.claude/agents/learner.md +51 -0
  432. package/exports/hosts/claude/.claude/agents/planner.md +53 -0
  433. package/exports/hosts/claude/.claude/agents/researcher.md +43 -0
  434. package/exports/hosts/claude/.claude/agents/reviewer.md +54 -0
  435. package/exports/hosts/claude/.claude/agents/specifier.md +47 -0
  436. package/exports/hosts/claude/.claude/agents/verifier.md +71 -0
  437. package/exports/hosts/claude/.claude/commands/author.md +42 -0
  438. package/exports/hosts/claude/.claude/commands/clarify.md +38 -0
  439. package/exports/hosts/claude/.claude/commands/design-review.md +46 -0
  440. package/exports/hosts/claude/.claude/commands/design.md +44 -0
  441. package/exports/hosts/claude/.claude/commands/discover.md +37 -0
  442. package/exports/hosts/claude/.claude/commands/execute.md +48 -0
  443. package/exports/hosts/claude/.claude/commands/learn.md +38 -0
  444. package/exports/hosts/claude/.claude/commands/plan-review.md +42 -0
  445. package/exports/hosts/claude/.claude/commands/plan.md +39 -0
  446. package/exports/hosts/claude/.claude/commands/prepare-next.md +37 -0
  447. package/exports/hosts/claude/.claude/commands/review.md +40 -0
  448. package/exports/hosts/claude/.claude/commands/run-audit.md +41 -0
  449. package/exports/hosts/claude/.claude/commands/spec-challenge.md +41 -0
  450. package/exports/hosts/claude/.claude/commands/specify.md +38 -0
  451. package/exports/hosts/claude/.claude/commands/verify.md +37 -0
  452. package/exports/hosts/claude/.claude/settings.json +34 -0
  453. package/exports/hosts/claude/CLAUDE.md +19 -0
  454. package/exports/hosts/claude/export.manifest.json +38 -0
  455. package/exports/hosts/claude/host-package.json +67 -0
  456. package/exports/hosts/codex/AGENTS.md +19 -0
  457. package/exports/hosts/codex/export.manifest.json +38 -0
  458. package/exports/hosts/codex/host-package.json +41 -0
  459. package/exports/hosts/cursor/.cursor/hooks.json +16 -0
  460. package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +19 -0
  461. package/exports/hosts/cursor/export.manifest.json +38 -0
  462. package/exports/hosts/cursor/host-package.json +42 -0
  463. package/exports/hosts/gemini/GEMINI.md +19 -0
  464. package/exports/hosts/gemini/export.manifest.json +38 -0
  465. package/exports/hosts/gemini/host-package.json +41 -0
  466. package/hooks/README.md +18 -0
  467. package/hooks/definitions/loop_cap_guard.yaml +21 -0
  468. package/hooks/definitions/post_tool_capture.yaml +24 -0
  469. package/hooks/definitions/pre_compact_summary.yaml +19 -0
  470. package/hooks/definitions/pre_tool_capture_route.yaml +19 -0
  471. package/hooks/definitions/protected_path_write_guard.yaml +19 -0
  472. package/hooks/definitions/session_start.yaml +19 -0
  473. package/hooks/definitions/stop_handoff_harvest.yaml +20 -0
  474. package/hooks/loop-cap-guard +17 -0
  475. package/hooks/post-tool-lint +36 -0
  476. package/hooks/protected-path-write-guard +17 -0
  477. package/hooks/session-start +41 -0
  478. package/llms-full.txt +2355 -0
  479. package/llms.txt +43 -0
  480. package/package.json +79 -0
  481. package/roles/README.md +20 -0
  482. package/roles/clarifier.md +42 -0
  483. package/roles/content-author.md +63 -0
  484. package/roles/designer.md +55 -0
  485. package/roles/executor.md +55 -0
  486. package/roles/learner.md +51 -0
  487. package/roles/planner.md +53 -0
  488. package/roles/researcher.md +43 -0
  489. package/roles/reviewer.md +54 -0
  490. package/roles/specifier.md +47 -0
  491. package/roles/verifier.md +71 -0
  492. package/schemas/README.md +24 -0
  493. package/schemas/accepted-learning.schema.json +20 -0
  494. package/schemas/author-artifact.schema.json +156 -0
  495. package/schemas/clarification.schema.json +19 -0
  496. package/schemas/design-artifact.schema.json +80 -0
  497. package/schemas/docs-claim.schema.json +18 -0
  498. package/schemas/export-manifest.schema.json +20 -0
  499. package/schemas/hook.schema.json +67 -0
  500. package/schemas/host-export-package.schema.json +18 -0
  501. package/schemas/implementation-plan.schema.json +19 -0
  502. package/schemas/proposed-learning.schema.json +19 -0
  503. package/schemas/research.schema.json +18 -0
  504. package/schemas/review.schema.json +29 -0
  505. package/schemas/run-manifest.schema.json +18 -0
  506. package/schemas/spec-challenge.schema.json +18 -0
  507. package/schemas/spec.schema.json +20 -0
  508. package/schemas/usage.schema.json +102 -0
  509. package/schemas/verification-proof.schema.json +29 -0
  510. package/schemas/wazir-manifest.schema.json +173 -0
  511. package/skills/README.md +40 -0
  512. package/skills/brainstorming/SKILL.md +77 -0
  513. package/skills/debugging/SKILL.md +50 -0
  514. package/skills/design/SKILL.md +61 -0
  515. package/skills/dispatching-parallel-agents/SKILL.md +128 -0
  516. package/skills/executing-plans/SKILL.md +70 -0
  517. package/skills/finishing-a-development-branch/SKILL.md +169 -0
  518. package/skills/humanize/SKILL.md +123 -0
  519. package/skills/init-pipeline/SKILL.md +124 -0
  520. package/skills/prepare-next/SKILL.md +20 -0
  521. package/skills/receiving-code-review/SKILL.md +123 -0
  522. package/skills/requesting-code-review/SKILL.md +105 -0
  523. package/skills/requesting-code-review/code-reviewer.md +108 -0
  524. package/skills/run-audit/SKILL.md +197 -0
  525. package/skills/scan-project/SKILL.md +41 -0
  526. package/skills/self-audit/SKILL.md +153 -0
  527. package/skills/subagent-driven-development/SKILL.md +154 -0
  528. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
  529. package/skills/subagent-driven-development/implementer-prompt.md +102 -0
  530. package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
  531. package/skills/tdd/SKILL.md +23 -0
  532. package/skills/using-git-worktrees/SKILL.md +163 -0
  533. package/skills/using-skills/SKILL.md +95 -0
  534. package/skills/verification/SKILL.md +22 -0
  535. package/skills/wazir/SKILL.md +463 -0
  536. package/skills/writing-plans/SKILL.md +30 -0
  537. package/skills/writing-skills/SKILL.md +157 -0
  538. package/skills/writing-skills/anthropic-best-practices.md +122 -0
  539. package/skills/writing-skills/persuasion-principles.md +50 -0
  540. package/templates/README.md +20 -0
  541. package/templates/artifacts/README.md +10 -0
  542. package/templates/artifacts/accepted-learning.md +19 -0
  543. package/templates/artifacts/accepted-learning.template.json +12 -0
  544. package/templates/artifacts/author.md +74 -0
  545. package/templates/artifacts/author.template.json +19 -0
  546. package/templates/artifacts/clarification.md +21 -0
  547. package/templates/artifacts/clarification.template.json +12 -0
  548. package/templates/artifacts/execute-notes.md +19 -0
  549. package/templates/artifacts/implementation-plan.md +21 -0
  550. package/templates/artifacts/implementation-plan.template.json +11 -0
  551. package/templates/artifacts/learning-proposal.md +19 -0
  552. package/templates/artifacts/next-run-handoff.md +21 -0
  553. package/templates/artifacts/plan-review.md +19 -0
  554. package/templates/artifacts/proposed-learning.template.json +12 -0
  555. package/templates/artifacts/research.md +21 -0
  556. package/templates/artifacts/research.template.json +12 -0
  557. package/templates/artifacts/review-findings.md +19 -0
  558. package/templates/artifacts/review.template.json +11 -0
  559. package/templates/artifacts/run-manifest.template.json +8 -0
  560. package/templates/artifacts/spec-challenge.md +19 -0
  561. package/templates/artifacts/spec-challenge.template.json +11 -0
  562. package/templates/artifacts/spec.md +21 -0
  563. package/templates/artifacts/spec.template.json +12 -0
  564. package/templates/artifacts/verification-proof.md +19 -0
  565. package/templates/artifacts/verification-proof.template.json +11 -0
  566. package/templates/examples/accepted-learning.example.json +14 -0
  567. package/templates/examples/author.example.json +152 -0
  568. package/templates/examples/clarification.example.json +15 -0
  569. package/templates/examples/docs-claim.example.json +8 -0
  570. package/templates/examples/export-manifest.example.json +7 -0
  571. package/templates/examples/host-export-package.example.json +11 -0
  572. package/templates/examples/implementation-plan.example.json +17 -0
  573. package/templates/examples/proposed-learning.example.json +13 -0
  574. package/templates/examples/research.example.json +15 -0
  575. package/templates/examples/research.example.md +6 -0
  576. package/templates/examples/review.example.json +17 -0
  577. package/templates/examples/run-manifest.example.json +9 -0
  578. package/templates/examples/spec-challenge.example.json +14 -0
  579. package/templates/examples/spec.example.json +21 -0
  580. package/templates/examples/verification-proof.example.json +21 -0
  581. package/templates/examples/wazir-manifest.example.yaml +65 -0
  582. package/templates/task-definition-schema.md +99 -0
  583. package/tooling/README.md +20 -0
  584. package/tooling/src/adapters/context-mode.js +50 -0
  585. package/tooling/src/capture/command.js +376 -0
  586. package/tooling/src/capture/store.js +99 -0
  587. package/tooling/src/capture/usage.js +270 -0
  588. package/tooling/src/checks/branches.js +50 -0
  589. package/tooling/src/checks/brand-truth.js +110 -0
  590. package/tooling/src/checks/changelog.js +231 -0
  591. package/tooling/src/checks/command-registry.js +36 -0
  592. package/tooling/src/checks/commits.js +102 -0
  593. package/tooling/src/checks/docs-drift.js +103 -0
  594. package/tooling/src/checks/docs-truth.js +201 -0
  595. package/tooling/src/checks/runtime-surface.js +156 -0
  596. package/tooling/src/cli.js +116 -0
  597. package/tooling/src/command-options.js +56 -0
  598. package/tooling/src/commands/validate.js +320 -0
  599. package/tooling/src/doctor/command.js +91 -0
  600. package/tooling/src/export/command.js +77 -0
  601. package/tooling/src/export/compiler.js +498 -0
  602. package/tooling/src/guards/loop-cap-guard.js +52 -0
  603. package/tooling/src/guards/protected-path-write-guard.js +67 -0
  604. package/tooling/src/index/command.js +152 -0
  605. package/tooling/src/index/storage.js +1061 -0
  606. package/tooling/src/index/summarizers.js +261 -0
  607. package/tooling/src/loaders.js +18 -0
  608. package/tooling/src/project-root.js +22 -0
  609. package/tooling/src/recall/command.js +225 -0
  610. package/tooling/src/schema-validator.js +30 -0
  611. package/tooling/src/state-root.js +40 -0
  612. package/tooling/src/status/command.js +71 -0
  613. package/wazir.manifest.yaml +135 -0
  614. package/workflows/README.md +19 -0
  615. package/workflows/author.md +42 -0
  616. package/workflows/clarify.md +38 -0
  617. package/workflows/design-review.md +46 -0
  618. package/workflows/design.md +44 -0
  619. package/workflows/discover.md +37 -0
  620. package/workflows/execute.md +48 -0
  621. package/workflows/learn.md +38 -0
  622. package/workflows/plan-review.md +42 -0
  623. package/workflows/plan.md +39 -0
  624. package/workflows/prepare-next.md +37 -0
  625. package/workflows/review.md +40 -0
  626. package/workflows/run-audit.md +41 -0
  627. package/workflows/spec-challenge.md +41 -0
  628. package/workflows/specify.md +38 -0
  629. package/workflows/verify.md +37 -0
@@ -0,0 +1,753 @@
1
+ # Microservices Architecture — Architecture Expertise Module
2
+
3
+ > Microservices decompose an application into small, independently deployable services each owning its data and business capability. They enable organizational scale and independent deployment at the cost of massive operational complexity. The right choice for a minority of systems — and the most over-applied architectural pattern in the industry.
4
+
5
+ > **Category:** Pattern
6
+ > **Complexity:** Expert
7
+ > **Applies when:** Organizations with 5+ independent teams, proven product-market fit, independent deployment needs, and operational maturity for distributed systems
8
+
9
+ ---
10
+
11
+ ## What This Is (and What It Isn't)
12
+
13
+ A microservices architecture structures an application as a **collection of independently deployable services**, each running in its own process, owning its own data store, and communicating over the network via well-defined APIs. Each service is built, tested, deployed, and scaled independently by a team that owns the full lifecycle of that service.
14
+
15
+ The pattern took shape during the 2000s and acquired the name "microservices" in the early 2010s, driven by two parallel forces:
16
+
17
+ - **Amazon's "two-pizza teams" mandate (2002-2006):** Jeff Bezos's directive that every team must be small enough to be fed by two pizzas, and every team must expose its functionality via service interfaces. No direct database access between teams. This was an organizational decision that forced a technical architecture.
18
+ - **Netflix's migration (2008-2015):** After a catastrophic three-day database corruption outage in 2008 exposed the fragility of their monolith, Netflix spent seven years migrating to over 700 microservices. They open-sourced much of the tooling (Hystrix, Eureka, Zuul, Ribbon) and became the public face of the pattern.
19
+
20
+ The critical distinction from adjacent architectures:
21
+
22
+ | Architecture | Deployment units | Data ownership | Network calls | Operational complexity |
23
+ |---|---|---|---|---|
24
+ | Monolith | 1 | Shared DB | None | Low |
25
+ | Modular monolith | 1 | Per-module schema | None (in-process) | Low-moderate |
26
+ | Microservices | Many (1 per service) | Per-service DB | Yes — every call | Very high |
27
+ | Distributed monolith | Many | Often shared DB | Yes — every call | Very high, no benefits |
28
+
29
+ **What microservices are NOT:**
30
+
31
+ - **Not "small services."** Size is irrelevant. A service should align with a business capability or bounded context, whether that is 200 lines or 20,000 lines. The "micro" in microservices refers to scope of responsibility, not lines of code. Teams that split by size produce anemic services that cannot do anything useful alone.
32
+ - **Not a performance optimization.** Microservices add network latency to every inter-service call: an in-process call of roughly 0.1 µs becomes a 0.5-5 ms round trip over the network. They exist for organizational and deployment independence, not for speed. If your goal is performance, a monolith with in-process calls will generally be faster for the same workload.
33
+ - **Not a default architecture.** The pattern solves specific scaling problems that emerge only at organizational scale. Choosing microservices before you have those problems is premature optimization of your architecture.
34
+ - **Not SOA rebranded.** Service-Oriented Architecture (SOA) used shared enterprise service buses, shared schemas (canonical data models), and top-down governance. Microservices deliberately reject all three — each service owns its data, picks its own technology, and communicates via lightweight protocols without a centralized bus.
35
+ - **Not the opposite of a monolith.** A well-structured modular monolith and a well-structured microservices system solve many of the same code organization problems. The difference is deployment topology and the operational cost that comes with it.
36
+
37
+ **Common misconceptions that cause real damage:**
38
+
39
+ 1. "Microservices make the system simpler." They make each individual service simpler while making the system as a whole dramatically more complex. You trade code complexity for operational complexity.
40
+ 2. "Microservices improve performance." They degrade performance (network latency on every call) while enabling independent scaling — which only matters if different parts of your system have genuinely different load profiles.
41
+ 3. "We need microservices for CI/CD." A well-structured monolith can deploy dozens of times per day (Shopify deploys 40+ times daily from a monolith). CI/CD is an engineering practice, not an architecture.
42
+ 4. "Every new project should start with microservices." Martin Fowler's MonolithFirst principle: almost all successful microservices architectures started as monoliths that were decomposed after the domain was well understood.
43
+
44
+ ---
45
+
46
+ ## When to Use It
47
+
48
+ This pattern is justified only when **multiple specific conditions are true simultaneously**. Having one or two is insufficient.
49
+
50
+ ### 5+ autonomous teams needing independent deployment
51
+
52
+ When five or more teams work on the same product and are blocked by each other's release cycles, microservices provide genuine organizational relief. Team A can deploy a new recommendation algorithm without waiting for Team B's payment refactor to be tested. This is the primary benefit — organizational, not technical.
53
+
54
+ **Netflix:** Over 2,000 engineers organized into hundreds of small teams, each owning one or more services. A team can deploy multiple times per day without coordinating with anyone. At this scale, a shared monolith would create crippling deployment contention.
55
+
56
+ **Amazon:** Thousands of services powering amazon.com, each owned by a "two-pizza team" with full operational responsibility. The organizational model (small, autonomous teams with end-to-end ownership) requires the technical model (independent services).
57
+
58
+ ### Genuinely different scaling requirements
59
+
60
+ When one part of your system handles 10,000x the load of another, scaling them as a single unit is wasteful. Microservices allow you to run 200 instances of the video encoding service while running 2 instances of the user settings service.
61
+
62
+ **Netflix:** Video streaming (CDN, encoding, recommendation) operates at a fundamentally different scale than account management or billing. Scaling the entire system to handle streaming load would waste resources on components that handle 1/10,000th of the traffic.
63
+
64
+ **Uber:** Trip matching and pricing need to handle millions of real-time requests per second. Driver onboarding handles hundreds per day. Independent scaling is a hard requirement.
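To make the scaling argument concrete, here is a rough sizing sketch in Python. The load figures, per-instance capacity, memory footprints, and helper names (`instances_needed`, `per_service_memory_gb`, `monolith_memory_gb`) are all hypothetical. The model is a toy that treats every replica as serving 100 requests per second of whatever traffic it receives. Its only point is that a monolith replica carries every component, so scaling the replica count to fit the hot path also multiplies the footprint of the quiet components.

```python
import math

# Illustrative assumptions only: load, capacity, and memory figures are made up.
LOAD_RPS = {"video-encoding": 20_000, "user-settings": 2}
MEMORY_GB = {"video-encoding": 2.0, "user-settings": 0.5}
CAPACITY_RPS_PER_INSTANCE = 100
MIN_INSTANCES = 2  # keep two replicas of anything for availability


def instances_needed(load_rps: float, capacity_rps: float = CAPACITY_RPS_PER_INSTANCE) -> int:
    """Replicas required to serve load_rps, never fewer than MIN_INSTANCES."""
    return max(MIN_INSTANCES, math.ceil(load_rps / capacity_rps))


def per_service_instances() -> dict[str, int]:
    # Each service is scaled against its own load only.
    return {name: instances_needed(rps) for name, rps in LOAD_RPS.items()}


def per_service_memory_gb() -> float:
    return sum(MEMORY_GB[name] * count for name, count in per_service_instances().items())


def monolith_memory_gb() -> float:
    # Every monolith replica loads every component, and the replica count
    # is driven by total load, which the hot path dominates.
    replicas = instances_needed(sum(LOAD_RPS.values()))
    return replicas * sum(MEMORY_GB.values())


if __name__ == "__main__":
    print(per_service_instances())
    print(per_service_memory_gb(), monolith_memory_gb())
```

Under these made-up numbers, the split deployment runs 200 encoders and 2 settings instances. The monolith runs 201 full replicas. The real gap depends heavily on how much idle components actually cost per replica.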
65
+
66
+ ### Polyglot technology requirements
67
+
68
+ When different parts of the system genuinely benefit from different technology stacks — ML inference in Python/PyTorch, real-time data processing in Go or Rust, web API in Node.js — microservices allow each team to choose the best tool for their specific problem.
69
+
70
+ ### Proven domain boundaries
71
+
72
+ The domain must be well understood. Service boundaries that are drawn wrong create distributed monoliths — services that must be deployed together, defeating the entire purpose. Domain-Driven Design bounded contexts are the prerequisite, not the outcome, of microservices adoption.
73
+
74
+ **Uber's lesson:** With 2,200+ microservices, Uber found that engineers often had to work through 50 services across 12 different teams to investigate a single root cause. They responded by reorganizing 2,200 microservices into 70 domains (DOMA — Domain-Oriented Microservice Architecture), reducing inter-team touchpoints by 25-50%.
75
+
76
+ ### Mature operational infrastructure
77
+
78
+ Microservices require: container orchestration (Kubernetes), service discovery, distributed tracing, centralized logging, CI/CD per service, API gateway, health checking, circuit breakers, and on-call for each service. If your organization does not already have (or can immediately invest in) this infrastructure, microservices will drown you in operational work.
79
+
80
+ ---
81
+
82
+ ## When NOT to Use It
83
+
84
+ **This section is deliberately longer than "When to Use It" because the failure mode of unnecessary microservices adoption is far more common and far more damaging than the failure mode of staying on a monolith too long.**
85
+
86
+ ### Teams of fewer than 10 developers — never
87
+
88
+ A team of 8 developers does not need independent deployment because they can coordinate directly. They do not need independent scaling because they share a deployment target. They do not have the staffing for per-service CI/CD, per-service on-call, and per-service monitoring.
89
+
90
+ **Real failure pattern:** A startup of 8 engineers builds 15 microservices. They spend 40% of their engineering time on infrastructure — maintaining CI/CD pipelines, debugging network issues between services, managing Kubernetes clusters, handling distributed tracing. They have 60% of their capacity left for actual product development. A competitor with a monolith ships twice as fast with the same team size.
91
+
92
+ **The math:** 10 microservices = 10 CI/CD pipelines, 10 deployment configurations, 10 sets of health checks, 10 log aggregation configurations, 10 sets of metrics dashboards. For a team of 8, this means every engineer maintains more than one service's infrastructure alongside their product work.
93
+
94
+ ### Startup without product-market fit — never
95
+
96
+ Before product-market fit, the product changes constantly. Service boundaries drawn today will be wrong in 3 months because the domain itself is being discovered. Refactoring across service boundaries (changing APIs, migrating data between services, updating contracts) is 10-100x more expensive than refactoring within a monolith.
97
+
98
+ Martin Fowler's MonolithFirst principle applies with full force: "You shouldn't start a new project with microservices, even if you're sure your application will be big enough to make it worthwhile."
99
+
100
+ ### Network latency kills your use case
101
+
102
+ An in-process function call takes ~0.1 microseconds. A network call between services in the same datacenter takes 0.5-5 milliseconds, a 5,000x-50,000x increase. If a single user request requires 10 sequential service-to-service calls, you add 5-50ms of pure network overhead.
103
+
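The latency arithmetic above can be checked with a short Python sketch. The constants restate the figures from the text. `network_overhead_ms` and `slowdown_factor` are hypothetical helpers. The parallel case assumes an idealized fan-out in which independent calls overlap perfectly.

```python
from typing import Literal

IN_PROCESS_CALL_US = 0.1          # ~0.1 microseconds per in-process call
NETWORK_CALL_US = (500, 5_000)    # 0.5-5 ms per same-datacenter network call


def network_overhead_ms(
    calls: int,
    per_call_us: float,
    mode: Literal["sequential", "parallel"] = "sequential",
) -> float:
    """Network latency added to one request that makes `calls` remote calls."""
    if calls <= 0:
        return 0.0
    if mode == "sequential":
        total_us = calls * per_call_us   # each hop waits for the previous one
    else:
        total_us = per_call_us           # independent hops overlap completely
    return total_us / 1_000


def slowdown_factor(per_call_us: float) -> float:
    """How many times slower a network call is than an in-process call."""
    return per_call_us / IN_PROCESS_CALL_US


if __name__ == "__main__":
    for per_call in NETWORK_CALL_US:
        print(
            f"{per_call} us/call: sequential={network_overhead_ms(10, per_call)} ms, "
            f"parallel={network_overhead_ms(10, per_call, 'parallel')} ms, "
            f"~{slowdown_factor(per_call):,.0f}x slower than in-process"
        )
```

Parallel fan-out only helps when the calls are independent. A chain in which each service needs the previous service's result pays the full sequential cost.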
104
+ For latency-sensitive applications (real-time trading, game servers, video processing pipelines), this overhead is unacceptable. Amazon Prime Video discovered this firsthand.
105
+
106
+ **Amazon Prime Video (2023):** The Video Quality Analysis team had built their stream-monitoring service from distributed serverless components. AWS Step Functions orchestrated each stream's workflow, and video frames were uploaded to S3 so that separate defect-detector components could download and analyze them. Step Functions bills per state transition, and the service performed multiple transitions for every second of every stream. The per-frame S3 calls added further cost. The design hit a hard scaling limit at around 5% of the expected load.
107
+
108
+ They re-architected the components into a single process, so frames passed between stages in memory instead of through S3. **Result: infrastructure cost fell by over 90%.** The bottleneck was not compute. It was the overhead of orchestrating and moving data between components that had no reason to be separated. The components had no independent teams, no independent scaling needs, and no independent deployment requirements. They were separated by default, not because separation solved a real problem.
109
+
110
+ ### Distributed transactions become the norm
111
+
112
+ If your business logic frequently requires atomic operations spanning multiple services (e.g., "debit account AND update inventory AND create order — all or nothing"), microservices force you into distributed transaction patterns (Saga, 2PC) that are orders of magnitude more complex than a database transaction.
113
+
114
+ A single SQL `BEGIN; ... COMMIT;` across three tables in a monolith becomes a multi-step choreography or orchestration with compensation logic, idempotency requirements, and eventual consistency guarantees. If more than 20% of your operations require cross-service transactions, you have drawn your service boundaries wrong — or microservices are wrong for your domain.
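A minimal orchestration-style saga can be sketched in Python as an ordered list of steps. Each step pairs a local transaction with a compensating action. This is an in-memory illustration under stated assumptions: `SagaStep`, `run_saga`, and the step functions are hypothetical. A production saga would also persist its progress, retry compensations, and make every step idempotent so that it survives a crash mid-sequence.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SagaStep:
    name: str
    action: Callable[[dict], None]       # local transaction in one service
    compensate: Callable[[dict], None]   # semantic undo of that transaction


def run_saga(steps: list[SagaStep], ctx: dict) -> tuple[bool, list[str]]:
    """Run steps in order; if one fails, undo the completed ones in reverse."""
    completed: list[SagaStep] = []
    log: list[str] = []
    for step in steps:
        try:
            step.action(ctx)
        except Exception:
            log.append(f"failed:{step.name}")
            for done in reversed(completed):
                done.compensate(ctx)
                log.append(f"compensated:{done.name}")
            return False, log
        completed.append(step)
        log.append(f"done:{step.name}")
    return True, log


# Hypothetical stand-ins for calls to payment-, inventory-, and order-service.
def debit_account(ctx: dict) -> None:
    ctx["balance"] -= ctx["amount"]


def refund_account(ctx: dict) -> None:
    ctx["balance"] += ctx["amount"]


def reserve_stock(ctx: dict) -> None:
    ctx["stock"] -= 1


def release_stock(ctx: dict) -> None:
    ctx["stock"] += 1


def create_order(ctx: dict) -> None:
    raise RuntimeError("order-service rejected the order")


def cancel_order(ctx: dict) -> None:
    ctx.pop("order_id", None)


ORDER_SAGA = [
    SagaStep("debit-account", debit_account, refund_account),
    SagaStep("reserve-inventory", reserve_stock, release_stock),
    SagaStep("create-order", create_order, cancel_order),
]

if __name__ == "__main__":
    state = {"balance": 100, "amount": 30, "stock": 5}
    ok, log = run_saga(ORDER_SAGA, state)
    print(ok, log, state)
```

In the example run, the third step fails. The two earlier local transactions are then undone in reverse order, which replaces the single `ROLLBACK` a monolith would issue.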
115
+
116
+ ### Operational overhead exceeds organizational benefit
117
+
118
+ **The operational tax of microservices is real and quantifiable:**
119
+
120
+ - Infrastructure costs increase 25%+ due to per-service containers, sidecars, load balancers, and orchestration overhead.
121
+ - Teams report 30-50% more time spent on deployment automation compared to monolithic deployments.
122
+ - Communication overhead increases 3-5x — standups go from 15 minutes to 45 minutes when each service needs representation.
123
+ - A 30-person team running microservices pays ~$40K/month vs ~$10K/month for an equivalent modular monolith — $360K annual difference.
124
+ - 15+ services require a dedicated platform team just to keep the infrastructure running.
125
+
126
+ **Gartner research (2024-2025):** 90% of organizations adopting microservices prematurely will fail to realize the expected benefits. The most common reason: adopting the pattern without the organizational maturity to operate it.
127
+
128
+ ### Segment's retreat (2017-2018)
129
+
130
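+ Segment adopted microservices to solve fault isolation. All events originally flowed through a single shared queue. When one integration destination (e.g., the Google Analytics connector) slowed down or failed, its retries backed up the queue and delayed delivery to every other destination. Segment moved each destination into its own service with its own queue, eventually running 140+ services.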
131
+
132
+ The operational cost exploded. Each service needed its own queue, its own deployment pipeline, its own monitoring. The small team was drowning in infrastructure management. In the words of Segment engineer Alexandra Noonan: "If microservices are implemented incorrectly or used as a band-aid without addressing some of the root flaws in your system, you'll be unable to do new product development because you're drowning in the complexity."
133
+
134
+ They consolidated the destinations back into a single service, and a new delivery system called Centrifuge replaced the per-destination queues. Developer velocity recovered. Fault isolation, the original goal, was handled inside the consolidated system rather than by distributing it across 140+ deployable units.
135
+
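Isolating a failing destination does not by itself require separate deployables. The following minimal Python sketch shows the idea, assuming a single process with one in-memory retry queue per destination. The `Dispatcher` class and its behavior are hypothetical and are not Segment's Centrifuge design. A failing destination backs up only its own queue, so healthy destinations keep receiving events.

```python
from collections import deque
from typing import Callable

Sender = Callable[[dict], None]


class Dispatcher:
    def __init__(self, destinations: dict[str, Sender]):
        self.destinations = destinations
        self.retry = {name: deque() for name in destinations}  # per-destination backlog
        self.delivered: dict[str, list[str]] = {name: [] for name in destinations}

    def _try_send(self, name: str, event: dict) -> bool:
        try:
            self.destinations[name](event)
        except Exception:
            return False  # contained: the error never reaches other destinations
        self.delivered[name].append(event["id"])
        return True

    def dispatch(self, event: dict) -> None:
        for name in self.destinations:
            if self.retry[name]:
                # Preserve this destination's ordering: queue behind earlier failures.
                self.retry[name].append(event)
            elif not self._try_send(name, event):
                self.retry[name].append(event)

    def drain(self, name: str) -> None:
        """Retry one destination's backlog in order, stopping at the first failure."""
        backlog = self.retry[name]
        while backlog and self._try_send(name, backlog[0]):
            backlog.popleft()
```

Ordering is preserved per destination only. A slow or broken destination therefore delays its own events without blocking the others, which is the head-of-line problem that a shared queue creates.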
136
+ ### Debugging becomes archaeology across 50 services
137
+
138
+ In a monolith, a stack trace shows you the complete call path. In microservices, a single user request may traverse 10-20 services. Debugging requires distributed tracing (Jaeger, Zipkin, Datadog), log correlation across services, and understanding of asynchronous event flows that span multiple systems.
139
+
140
+ Uber engineers reported having to investigate across 50 services and 12 teams to find the root cause of a single problem. Without world-class observability tooling, debugging a microservices system is like reading a novel where every chapter is in a different building.
141
+
142
+ ### Testing complexity explodes
143
+
144
+ Testing a monolith: start the application, run tests. Testing microservices: start all dependent services (or maintain contract tests for each service boundary), manage test data across multiple databases, handle asynchronous event propagation in tests, mock network failures, test timeout and retry behavior.
145
+
146
+ Integration test scope grows combinatorially. With 10 services and 3 live API versions each, there are 3^10 = 59,049 possible version combinations. Teams without mature contract testing practices (Pact, Spring Cloud Contract) spend more time maintaining test infrastructure than writing tests.
147
+
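Contract testing replaces many full integration runs with a cheaper check at each boundary. Below is a hand-rolled Python sketch of a consumer-driven contract. It assumes that the consumer (order-service) records the fields and types it relies on, and that the provider (payment-service) checks its real responses against that record in CI. Tools such as Pact automate this workflow. The names `PAYMENT_RESPONSE_CONTRACT` and `contract_violations` are hypothetical.

```python
# Fields order-service actually reads from payment-service's response.
# Extra fields in the response are allowed (tolerant reader).
PAYMENT_RESPONSE_CONTRACT: dict[str, type] = {
    "authorized": bool,
    "transaction_id": str,
}


def contract_violations(response: dict, contract: dict[str, type]) -> list[str]:
    """Return human-readable violations; an empty list means the contract holds."""
    problems = []
    for field, expected_type in contract.items():
        if field not in response:
            problems.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            actual = type(response[field]).__name__
            problems.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return problems


if __name__ == "__main__":
    # Provider-side check: run against whatever the real handler returns.
    sample = {"authorized": True, "transaction_id": "tx-1", "decline_reason": ""}
    print(contract_violations(sample, PAYMENT_RESPONSE_CONTRACT) or "contract holds")
```

Only the fields a consumer actually uses are pinned. The provider therefore remains free to add fields without a coordinated release.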
148
+ ---
149
+
150
+ ## How It Works
151
+
152
+ ### Service boundaries: bounded contexts, not technical layers
153
+
154
+ Each service maps to a business capability (Orders, Payments, Inventory, Recommendations), **not** a technical layer (API, Business Logic, Data Access). A service owns the full vertical stack for its capability: API endpoint, business logic, data store, and deployment pipeline.
155
+
156
+ Wrong decomposition (by technical layer):
157
+ ```
158
+ api-gateway-service/     → handles all HTTP routing
159
+ business-logic-service/  → handles all business rules
160
+ data-access-service/     → handles all database queries
161
+ ```
162
+
163
+ Correct decomposition (by business capability):
164
+ ```
165
+ order-service/           → owns orders: API + logic + data
166
+ payment-service/         → owns payments: API + logic + data
167
+ inventory-service/       → owns inventory: API + logic + data
168
+ recommendation-service/  → owns recommendations: API + logic + data
169
+ ```
170
+
171
+ ### Synchronous communication (request-response)
172
+
173
+ Used when the caller needs an immediate response to continue processing.
174
+
175
+ **REST/HTTP:** The default for external-facing APIs and simple service-to-service calls. Well understood, easy to debug, broadly supported.
176
+
177
+ **gRPC:** Binary protocol over HTTP/2. Faster serialization (Protocol Buffers), bidirectional streaming, strong typing via `.proto` contracts. Preferred for internal service-to-service calls where performance matters.
178
+
179
+ ```protobuf
180
+ syntax = "proto3";  // order.proto: contract between order-service and payment-service
181
+ service PaymentService {
182
+   rpc AuthorizePayment(PaymentRequest) returns (PaymentResponse);
183
+ }
184
+
185
+ message PaymentRequest {
186
+   string order_id = 1;
187
+   int64 amount_cents = 2;
188
+   string currency = 3;
189
+   string idempotency_key = 4;
190
+ }
191
+
192
+ message PaymentResponse {
193
+   bool authorized = 1;
194
+   string transaction_id = 2;
195
+   string decline_reason = 3;
196
+ }
197
+ ```
198
+
199
+ **The danger of synchronous calls:** Every synchronous call creates temporal coupling. If payment-service is down, order-service cannot complete. Chain three synchronous calls together and their availabilities multiply. If each service is up 99.9% of the time and failures are independent, the chain succeeds only about 99.7% of the time (0.999³ ≈ 0.997). Use synchronous calls only when the caller genuinely cannot proceed without an immediate answer.
200
+
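A circuit breaker is the usual guard against the temporal coupling described above. After repeated failures, it stops calling the dependency and fails fast. Once a cool-down has passed, it lets a single trial call through. Below is a minimal single-threaded Python sketch. The class names and thresholds are hypothetical, and libraries such as Resilience4j provide production-grade versions.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised instead of calling a dependency while the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout_s:
                raise CircuitOpenError("dependency marked unhealthy; failing fast")
            # Cool-down elapsed: half-open, so let this one trial call through.
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            half_open = self.opened_at is not None
            if half_open or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # (re)open the circuit
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result
```

Pair the breaker with a timeout on each call. Without one, a hung dependency never registers as a failure. The injectable `clock` makes the open and half-open transitions easy to test without sleeping.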
201
+ ### Asynchronous communication (event-driven)
202
+
203
+ Used when the caller does not need an immediate response, or when multiple consumers need to react to the same event.
204
+
205
+ ```
206
+ order-service publishes → OrderPlaced event → message broker (Kafka/RabbitMQ)
207
+                                               ├── payment-service subscribes → initiates payment
208
+                                               ├── inventory-service subscribes → reserves stock
209
+                                               ├── notification-service subscribes → sends confirmation email
210
+                                               └── analytics-service subscribes → records event for reporting
211
+ ```
212
+
213
+ Events decouple the publisher from all consumers. Order-service does not know (or care) how many services react to an OrderPlaced event. New consumers can be added without modifying the publisher.
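The decoupling can be seen in a small in-memory Python sketch. `InMemoryBroker` and `place_order` are hypothetical stand-ins. The broker replaces Kafka or RabbitMQ, but it delivers synchronously and exactly once. Real brokers deliver asynchronously and usually at least once, so real consumers must tolerate delays and duplicates.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class InMemoryBroker:
    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> None:
        # The publisher never learns who, if anyone, consumed the event.
        for handler in list(self._subscribers[topic]):
            handler(event)


def place_order(broker: InMemoryBroker, order_id: str) -> None:
    """order-service: after persisting the order (omitted here), announce it."""
    broker.publish("OrderPlaced", {"order_id": order_id})


if __name__ == "__main__":
    broker = InMemoryBroker()
    received: list[str] = []
    broker.subscribe("OrderPlaced", lambda e: received.append(f"payment:{e['order_id']}"))
    broker.subscribe("OrderPlaced", lambda e: received.append(f"inventory:{e['order_id']}"))
    # A new consumer is added without touching place_order.
    broker.subscribe("OrderPlaced", lambda e: received.append(f"analytics:{e['order_id']}"))
    place_order(broker, "o-42")
    print(received)
```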
214
+
215
+ **Tradeoff:** Eventual consistency. After order-service publishes OrderPlaced, there is a window where the payment has not yet been processed and inventory has not yet been reserved. The system is temporarily inconsistent. Designing for eventual consistency requires careful thought about what the user sees during this window.
216
+
217
+ ### Service discovery
218
+
219
+ Services need to find each other without hardcoded addresses. Two models:
220
+
221
+ **Client-side discovery:** The client queries a service registry (Consul, Eureka) and load-balances across available instances. The client is responsible for choosing an instance.
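Client-side discovery can be sketched in Python as a registry lookup followed by the client's own load-balancing choice. `ServiceRegistry` and `RoundRobinClient` are hypothetical in-memory stand-ins, and the addresses are made up. Consul and Eureka expose their own APIs, heartbeats, and health checks, none of which this sketch models.

```python
class ServiceRegistry:
    """In-memory map of service name -> live instance addresses."""

    def __init__(self) -> None:
        self._instances: dict[str, list[str]] = {}

    def register(self, service: str, address: str) -> None:
        addresses = self._instances.setdefault(service, [])
        if address not in addresses:
            addresses.append(address)

    def deregister(self, service: str, address: str) -> None:
        if address in self._instances.get(service, []):
            self._instances[service].remove(address)

    def instances(self, service: str) -> list[str]:
        return list(self._instances.get(service, []))


class RoundRobinClient:
    """The caller, not the platform, picks which instance to hit."""

    def __init__(self, registry: ServiceRegistry, service: str) -> None:
        self.registry = registry
        self.service = service
        self._counter = 0

    def next_address(self) -> str:
        candidates = self.registry.instances(self.service)  # re-read on every call
        if not candidates:
            raise LookupError(f"no registered instances of {self.service}")
        address = candidates[self._counter % len(candidates)]
        self._counter += 1
        return address
```

The client re-reads the registry on every call, so a deregistered instance stops receiving traffic on the next request.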
222
+
223
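+ **Server-side discovery (platform-managed):** Kubernetes Services provide built-in service discovery. From within the same namespace, `http://payment-service:8080` resolves through cluster DNS (CoreDNS, still exposed under the `kube-dns` Service name) to the Service's virtual IP. The node's service proxy (typically kube-proxy) then forwards the traffic to a ready pod. The platform handles load balancing and health checking. This is the dominant model in 2024+.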
224
+
225
+ ### API Gateway
226
+
227
+ A single entry point for external clients. Routes requests to internal services, handles cross-cutting concerns (authentication, rate limiting, request logging, SSL termination), and aggregates responses from multiple services when needed.
228
+
229
+ ```
230
+ Client → API Gateway → order-service
231
+                      → payment-service
232
+                      → user-service
233
+ ```
234
+
235
+ The gateway shields internal service topology from external consumers. Internal services can be split, merged, or moved without changing the external API.
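The gateway's routing and one cross-cutting check can be sketched as pure Python functions. The route table, the upstream hostnames, and the `match_route` and `handle` functions are hypothetical. A real gateway would also proxy the request, terminate TLS, and enforce rate limits.

```python
from typing import Optional

# Public path prefix -> internal upstream. Clients only ever see the prefixes.
ROUTES = {
    "/orders": "http://order-service:8080",
    "/payments": "http://payment-service:8080",
    "/users": "http://user-service:8080",
}


def match_route(path: str, routes: dict[str, str] = ROUTES) -> Optional[str]:
    """Longest-prefix match on whole path segments."""
    candidates = [p for p in routes if path == p or path.startswith(p + "/")]
    return max(candidates, key=len) if candidates else None


def handle(path: str, headers: dict[str, str]) -> tuple[int, str]:
    """Return (status, upstream URL or error message) for one incoming request."""
    if "authorization" not in {name.lower() for name in headers}:
        return 401, "missing credentials"   # authentication at the edge
    prefix = match_route(path)
    if prefix is None:
        return 404, "no route"
    return 200, ROUTES[prefix] + path        # forward the original path upstream
```

Clients address `/orders/...` rather than `order-service`, so the table can change when services are split, merged, or moved.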
236
+
237
+ ### Data isolation: each service owns its database
238
+
239
+ **This is the most important — and most violated — principle of microservices.** Each service has its own database (or schema) that no other service may access directly. Not read-only access. Not "just for reporting." No access at all.
240
+
241
+ If order-service needs customer data, it calls user-service's API. It does not query user-service's database directly. This is non-negotiable — shared databases create coupling that defeats every benefit of independent deployment.
242
+
243
+ **Consequence:** No cross-service JOINs. No cross-service foreign keys. Reports that need data from multiple services must either query each service's API and aggregate, or consume events into a dedicated reporting/analytics data store (CQRS read model).
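A reporting read model of this kind can be sketched as a projection that consumes events from several services into one denormalized table. The event names and fields (`CustomerRegistered`, `OrderPlaced`, `customer_id`) and the `OrderReportProjection` class are hypothetical. Each service publishes independently, so events can arrive out of order, and the projection tolerates this.

```python
from dataclasses import dataclass, field


@dataclass
class OrderReportProjection:
    """Order rows joined with customer names, built only from events."""

    customer_names: dict[str, str] = field(default_factory=dict)  # from user-service
    rows: dict[str, dict] = field(default_factory=dict)           # order_id -> report row

    def apply(self, event: dict) -> None:
        kind = event["type"]
        if kind == "CustomerRegistered":
            cid = event["customer_id"]
            self.customer_names[cid] = event["name"]
            # Backfill rows for orders whose events arrived before this one.
            for row in self.rows.values():
                if row["customer_id"] == cid:
                    row["customer_name"] = event["name"]
        elif kind == "OrderPlaced":
            cid = event["customer_id"]
            self.rows[event["order_id"]] = {
                "order_id": event["order_id"],
                "customer_id": cid,
                "customer_name": self.customer_names.get(cid),  # None until known
                "total_cents": event["total_cents"],
            }
        # Other event types are ignored, so new producers do not break the report.

    def revenue_by_customer(self) -> dict[str, int]:
        totals: dict[str, int] = {}
        for row in self.rows.values():
            key = row["customer_name"] or row["customer_id"]
            totals[key] = totals.get(key, 0) + row["total_cents"]
        return totals
```

The cross-service "JOIN" happens once, when events are applied, instead of at query time against two databases.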
244
+
245
+ ### Eventual consistency
246
+
247
+ Without shared databases and distributed transactions, consistency between services is eventual. After order-service creates an order and publishes an event, payment-service will process the payment — but not instantly. The system is consistent "eventually" (typically milliseconds to seconds, but potentially longer under failure).
248
+
249
+ Patterns for managing this:
250
+ - **Saga pattern:** A sequence of local transactions across services, with compensating transactions to undo previous steps if a later step fails. See `architecture/patterns/saga-pattern.md`.
251
+ - **Outbox pattern:** Events are written to a local "outbox" table in the same transaction as the domain change, then published asynchronously by a separate relay. Guarantees at-least-once delivery without distributed transactions, so consumers must tolerate duplicate events.
252
+ - **CQRS:** Separate read and write models. Write models are strongly consistent within a service. Read models are eventually consistent projections built from events.
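The outbox pattern from the list above can be sketched with Python's built-in `sqlite3` module standing in for order-service's database. The table layout and the `place_order` and `relay_once` functions are hypothetical. A production relay would poll the table or tail the database log (change data capture) and publish to a real broker. A row is marked published only after `publish` returns. A crash between those two steps therefore re-sends the event, which is the at-least-once behavior the pattern promises.

```python
import json
import sqlite3
from typing import Callable

SCHEMA = """
CREATE TABLE orders (id TEXT PRIMARY KEY, total_cents INTEGER NOT NULL);
CREATE TABLE outbox (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    topic TEXT NOT NULL,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
"""


def place_order(conn: sqlite3.Connection, order_id: str, total_cents: int) -> None:
    payload = json.dumps({"order_id": order_id, "total_cents": total_cents})
    with conn:  # one local transaction: both rows commit, or neither does
        conn.execute("INSERT INTO orders (id, total_cents) VALUES (?, ?)", (order_id, total_cents))
        conn.execute("INSERT INTO outbox (topic, payload) VALUES (?, ?)", ("OrderPlaced", payload))


def relay_once(conn: sqlite3.Connection, publish: Callable[[str, dict], None]) -> int:
    """Publish pending outbox rows in order; return how many were sent."""
    rows = conn.execute(
        "SELECT seq, topic, payload FROM outbox WHERE published = 0 ORDER BY seq"
    ).fetchall()
    for seq, topic, payload in rows:
        publish(topic, json.loads(payload))  # a crash right after this means a re-send
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    place_order(conn, "o-1", 4_200)
    print(relay_once(conn, lambda topic, event: print("publish", topic, event)))
```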
253
+
254
+ ### Container orchestration (Kubernetes)
255
+
256
+ Kubernetes is the de facto standard. It handles deployment, scaling, health checking, service discovery (kube-dns), rolling updates, and resource management. Each service runs as multiple replicated pods for availability. Each service declares CPU/memory resource requests and limits.
257
+
258
+ ### Service mesh
259
+
260
+ A dedicated infrastructure layer (Istio, Linkerd) deployed as sidecar proxies alongside each service. Provides mutual TLS (mTLS), traffic management (canary, blue-green), observability, and retry/timeout policies — without application code changes. Necessary at scale (50+ services) but significant operational overhead for smaller deployments.
261
+
262
+ ### Distributed tracing
263
+
264
+ A single user request may traverse 10+ services. Distributed tracing assigns a unique trace ID and propagates it across all service calls, creating a visual trace with timing per hop:
265
+
266
+ ```
267
+ [trace-id=abc123] api-gateway(2ms) → order-service(15ms) → payment-service(45ms) → fraud-detection(30ms)
268
+ ```
269
+
270
+ Without distributed tracing, debugging is effectively impossible. Tracing is prerequisite infrastructure, not an optimization.
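Trace propagation can be sketched in Python with a context manager. It records one span per service and hands downstream calls the headers to forward. The `x-trace-id` and `x-parent-span-id` header names and the `traced` helper are hypothetical. Production systems typically use the W3C Trace Context `traceparent` header via OpenTelemetry, and they export spans to a backend such as Jaeger or Zipkin rather than to a list.

```python
import time
import uuid
from contextlib import contextmanager
from typing import Iterator, Optional

SPANS: list[dict] = []  # stand-in for a tracing backend


@contextmanager
def traced(service: str, incoming: Optional[dict] = None) -> Iterator[dict]:
    """Record one span for `service` and yield headers for downstream calls."""
    incoming = incoming or {}
    trace_id = incoming.get("x-trace-id") or uuid.uuid4().hex  # the edge starts a new trace
    span_id = uuid.uuid4().hex[:16]
    start = time.perf_counter()
    try:
        yield {"x-trace-id": trace_id, "x-parent-span-id": span_id}
    finally:
        SPANS.append({
            "trace_id": trace_id,
            "span_id": span_id,
            "parent_id": incoming.get("x-parent-span-id"),
            "service": service,
            "ms": (time.perf_counter() - start) * 1000,
        })


def payment_service(headers: dict) -> None:
    with traced("payment-service", headers):
        pass  # authorize the payment here


def order_service(headers: dict) -> None:
    with traced("order-service", headers) as outgoing:
        payment_service(outgoing)  # forward trace headers on the downstream call


def api_gateway() -> None:
    with traced("api-gateway") as outgoing:
        order_service(outgoing)


if __name__ == "__main__":
    api_gateway()
    for s in SPANS:
        print(f"[trace-id={s['trace_id'][:6]}] {s['service']} ({s['ms']:.2f}ms)")
```

Every span shares one trace ID, and each span records its caller's span ID. That shared ID is what allows a backend to reassemble the per-hop timeline.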
271
+
272
+ ---
273
+
274
+ ## Trade-Offs Matrix
275
+
276
+ | You Get | You Pay |
277
+ |---------|---------|
278
+ | Independent deployment per service — teams ship on their own schedule | N services = N CI/CD pipelines, N deployment configs, N rollback procedures |
279
+ | Independent scaling — allocate resources where load actually is | Infrastructure cost increases 25%+ from per-service overhead (containers, sidecars, load balancers) |
280
+ | Team autonomy — each team owns their service end-to-end | Cross-service features require multi-team coordination, API versioning, and contract negotiation |
281
+ | Technology freedom — each service can use the best tool for the job | Polyglot infrastructure multiplies operational complexity (different build tools, different monitoring, different debugging) |
282
+ | Fault isolation — one service failing does not crash the system | Partial failures are harder to reason about than total failures; cascading failures require circuit breakers |
283
+ | Organizational scalability — 100+ engineers can work without stepping on each other | Communication overhead increases 3-5x; standups grow from 15 to 45+ minutes |
284
+ | Smaller, focused codebases per service are easier to understand individually | The system as a whole is dramatically harder to understand; no single person comprehends the full architecture |
285
+ | Forced API discipline — service boundaries require explicit contracts | Every service boundary is a potential point of failure, latency, and versioning complexity |
286
+ | Independent data stores prevent data coupling | No cross-service JOINs, no cross-service transactions; reporting requires dedicated infrastructure |
287
+ | Easier to replace or rewrite a single service | Service boundaries drawn wrong create a distributed monolith — worse than a regular monolith |
288
+ | Enables continuous delivery at organizational scale | Requires mature DevOps culture, platform team, and significant tooling investment before benefits materialize |
289
+
290
+ ---
291
+
292
+ ## Evolution Path
293
+
294
+ ### The cardinal rule: never start with microservices
295
+
296
+ Almost every successful microservices architecture started as a monolith. Netflix started as a monolith. Amazon started as a monolith. Uber started as a monolith. They migrated to microservices after years of growth, when the organizational and scaling problems became concrete and measurable — not theoretical.
297
+
298
+ Starting with microservices means drawing service boundaries before you understand your domain. The boundaries will be wrong. Refactoring across service boundaries is 10-100x more expensive than refactoring within a monolith.
299
+
300
+ ### Stage 1: Monolith (correct starting point)
301
+
302
+ Build a well-structured monolith. Focus on clean domain separation within the codebase, not because you plan to extract services, but because it is good engineering. Spend the energy you would have put into Kubernetes, service mesh, and distributed tracing on shipping product and finding product-market fit.
303
+
304
+ ### Stage 2: Modular monolith (when team grows)
305
+
306
+ When the organization grows to 10+ developers split across multiple teams, enforce module boundaries within the monolith. Use tooling (ArchUnit, Packwerk, Nx boundary rules) to prevent modules from reaching into each other's internals. Each module owns its own database schema. Communication between modules goes through published interfaces or an in-process event bus.
307
+
308
+ This stage provides 80% of the organizational benefits of microservices (team ownership, code isolation, clear contracts) with none of the distributed systems overhead.
309
+
310
+ **Many organizations should stay at this stage permanently.** Shopify runs a modular monolith with 2,000+ engineers. Stack Overflow serves billions of page views from a monolith.
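A minimal sketch of the two Stage 2 communication styles follows:
- **Published interface:** `BillingApi` is the only billing type other modules may depend on.
- **In-process event bus:** a synchronous bus delivers events to subscribers in other modules.

The module split, `InProcessEventBus`, `BillingApi`, and `OrderPlacedEvent` are all hypothetical. A boundary tool such as ArchUnit would then be configured to reject imports of anything but the published types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Synchronous in-process event bus: no network, no broker, one deployable.
// Dispatches on the exact event class.
final class InProcessEventBus {
    private final Map<Class<?>, List<Consumer<Object>>> handlers = new ConcurrentHashMap<>();

    <E> void subscribe(Class<E> type, Consumer<? super E> handler) {
        handlers.computeIfAbsent(type, t -> new CopyOnWriteArrayList<>())
                .add(event -> handler.accept(type.cast(event)));
    }

    void publish(Object event) {
        handlers.getOrDefault(event.getClass(), List.of())
                .forEach(h -> h.accept(event));
    }
}

// --- orders module: published event type (part of its public API) ---
record OrderPlacedEvent(String orderId, long totalCents) {}

// --- billing module: published interface; the implementation stays internal ---
interface BillingApi {
    List<String> invoicedOrders();
}

final class BillingModule implements BillingApi {
    // Internal state: other modules may not touch this directly.
    private final List<String> invoices = new ArrayList<>();

    BillingModule(InProcessEventBus bus) {
        // Billing reacts to orders without the orders module knowing billing exists.
        bus.subscribe(OrderPlacedEvent.class, e -> invoices.add(e.orderId()));
    }

    @Override
    public List<String> invoicedOrders() {
        return List.copyOf(invoices);
    }
}
```

Because the bus is in-process, publishing is a method call. There is no serialization and no broker, and a failing handler surfaces as an ordinary exception in the same stack trace.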
311
+
312
+ ### Stage 3: Selective extraction (when specific constraints demand it)
313
+
314
+ Extract individual modules to services **only when a specific, measurable constraint requires it**:
315
+
316
+ - Module X needs independent scaling (measured, not speculated)
317
+ - Module X must deploy independently due to regulatory requirements
318
+ - Module X requires a fundamentally different technology stack
319
+ - Module X's deployment cadence is blocked by shared deployment
320
+
321
+ **The Strangler Fig pattern** is the safest extraction method:
322
+
323
+ 1. **Intercept:** Place a routing layer (API gateway or proxy) in front of the monolith.
324
+ 2. **Extract:** Build the new service implementing the module's functionality.
325
+ 3. **Redirect:** Route requests for that capability to the new service.
326
+ 4. **Remove:** Delete the module code from the monolith.
327
+
328
+ Each step is independently deployable and reversible. The system works at every intermediate stage. If the extraction proves to be a mistake, you can route traffic back to the monolith.
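The intercept and redirect steps can be sketched as a routing table in front of the monolith. Backend URLs and path prefixes here are hypothetical. Redirecting a capability means adding one entry to the table, and rolling back means removing it; that is what makes each step reversible.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical routing layer (the "intercept" step). All traffic goes to the
// monolith unless a path prefix has been redirected to an extracted service.
final class StranglerRouter {
    private final String monolithUrl;
    // path prefix -> base URL of the extracted service
    private final Map<String, String> extracted = new ConcurrentHashMap<>();

    StranglerRouter(String monolithUrl) {
        this.monolithUrl = monolithUrl;
    }

    // "Redirect" step: start sending a capability to the new service.
    void redirect(String pathPrefix, String serviceUrl) {
        extracted.put(pathPrefix, serviceUrl);
    }

    // Rollback: route the capability back to the monolith.
    void rollback(String pathPrefix) {
        extracted.remove(pathPrefix);
    }

    // Longest matching prefix wins. A prefix matches only whole path segments,
    // so "/api/orders" does not capture "/api/ordersummary".
    String resolve(String path) {
        Optional<Map.Entry<String, String>> best = extracted.entrySet().stream()
                .filter(e -> path.equals(e.getKey()) || path.startsWith(e.getKey() + "/"))
                .max((a, b) -> Integer.compare(a.getKey().length(), b.getKey().length()));
        String base = best.map(Map.Entry::getValue).orElse(monolithUrl);
        return base + path;
    }
}
```

Longest-prefix matching also lets a narrower capability be carved out later without touching the existing route.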
329
+
330
+ **Never extract more than one module at a time.** Validate each extraction in production — performance, reliability, operational burden — before starting the next. Netflix's migration took seven years. Rushing is how you build a distributed monolith.
331
+
332
+ ### Stage 4: Domain-oriented architecture (at extreme scale)
333
+
334
+ At 100+ services, individual services become too granular for organizational management. Uber's response — Domain-Oriented Microservice Architecture (DOMA) — groups related services into domains with a single entry point. Each domain exposes a gateway service; internal services within the domain are hidden from the rest of the organization.
335
+
336
+ This is the architecture of organizations with 1,000+ engineers and hundreds of services. If you are reading this section because you think you need it, you almost certainly do not.
337
+
338
+ ---
339
+
340
+ ## Failure Modes
341
+
342
+ ### Failure Mode 1: The distributed monolith
343
+
344
+ **What it looks like:** 15 "microservices" that must all be deployed together. A change to the Order schema requires simultaneous updates to Order, Payment, Inventory, and Notification services. Services share a database. The team has a "deployment day" where all services are released in sequence.
345
+
346
+ **Why it happens:** Service boundaries were drawn by technical layer (API service, business logic service, data service) instead of by business capability. Or the team split the monolith along arbitrary lines without understanding which components are genuinely independent.
347
+
348
+ **The result:** You have all the operational complexity of microservices (network calls, distributed debugging, multiple pipelines) with none of the benefits (independent deployment, independent scaling). This is strictly worse than a monolith.
349
+
350
+ **90% of microservices teams still batch-deploy like monoliths (2024-2025 industry data).** If you cannot deploy any single service without touching others, you have a distributed monolith.
351
+
352
+ **Fix:** Either consolidate back to a monolith (or modular monolith) or invest in proper service boundary redesign using DDD bounded contexts.
353
+
354
+ ### Failure Mode 2: Chatty services
355
+
356
+ **What it looks like:** Rendering a single web page requires 15 synchronous service-to-service calls. Latency is dominated by network round-trips, not computation. A latency spike in any one of the 15 services causes the entire page to slow down or time out.
357
+
358
+ **Why it happens:** Services are too fine-grained. Instead of a single Order service that handles the complete order lifecycle, there is an OrderCreation service, an OrderValidation service, an OrderPricing service, and an OrderPersistence service — all of which must be called in sequence.
359
+
360
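+ **The math:** 15 sequential calls × 5 ms average latency = 75 ms of pure network overhead, before any actual computation. Tail latency makes it worse. If one of those calls has a P99 of 50 ms, 1% of page loads spend about 120 ms on networking alone (14 × 5 ms + 50 ms). If every call has such a tail, roughly 14% of page loads (1 − 0.99^15) hit at least one slow call, so slow pages stop being rare and become routine.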
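The latency arithmetic can be reproduced in a few lines. The sketch assumes that calls run sequentially and that each call independently lands in its slow tail 1% of the time (the definition of P99). The last function shows why a per-call tail becomes a page-level norm: the chance that at least one of n calls is slow is 1 - 0.99^n.

```java
// Back-of-the-envelope latency model for a chain of sequential service calls.
// Assumes sequential calls with independent slow tails.
final class ChattyLatency {
    // Pure network time when every call takes the average latency.
    static double sequentialMillis(int calls, double avgMillis) {
        return calls * avgMillis;
    }

    // Network time when exactly one call hits its slow tail.
    static double oneSlowCallMillis(int calls, double avgMillis, double slowMillis) {
        return (calls - 1) * avgMillis + slowMillis;
    }

    // Probability that at least one of `calls` hits a tail that each call
    // independently hits with probability `tailProbability` (0.01 for P99).
    static double probabilityAnySlow(int calls, double tailProbability) {
        return 1.0 - Math.pow(1.0 - tailProbability, calls);
    }
}
```

For 15 calls at 5 ms:
- `sequentialMillis(15, 5)` gives 75 ms.
- `oneSlowCallMillis(15, 5, 50)` gives 120 ms.
- `probabilityAnySlow(15, 0.01)` is about 0.14, so roughly one page load in seven hits at least one slow call.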
361
+
362
+ **Fix:** Merge chatty services. If two services always communicate synchronously and cannot function independently, they are one service that has been unnecessarily split. An inter-service call should represent a genuine organizational or scaling boundary, not a function call.
363
+
364
+ ### Failure Mode 3: Shared database
365
+
366
+ **What it looks like:** Three services read from and write to the same PostgreSQL database. The Order table has columns added by the Payment team. A schema migration by the Inventory team breaks the Order service.
367
+
368
+ **Why it happens:** The team wanted the convenience of JOINs and transactions across service boundaries. Or the team was told "do microservices" but nobody enforced data isolation.
369
+
370
+ **The result:** Tight coupling through shared state. Any service can break any other service by changing the schema. Services cannot be deployed independently because a schema change must be coordinated. The "micro" in microservices is meaningless when all services share a single data store.
371
+
372
+ **Fix:** Each service gets its own database or schema with no cross-service access. Accept the cost: no cross-service JOINs, eventual consistency for cross-service data needs, and API calls or events for data sharing.
373
+
374
+ ### Failure Mode 4: Missing observability
375
+
376
+ **What it looks like:** A user reports that checkout is slow. The on-call engineer checks the order service — it looks fine. Checks the payment service — also fine. The issue is actually a slow query in the fraud detection service that is called by the payment service, but nobody knows this because there is no distributed tracing.
377
+
378
+ **Real cost:** Without distributed tracing, debugging a 10-service call chain takes hours instead of minutes. Engineers resort to adding log statements, redeploying, and hoping to reproduce the issue.
379
+
380
+ **Fix:** Implement distributed tracing (Jaeger, Zipkin, Datadog APM) before you deploy your second service. Not after. Not when you "have time." Tracing is prerequisite infrastructure for microservices, not an optimization.
381
+
382
+ ### Failure Mode 5: Cascade failures
383
+
384
+ **What it looks like:** The recommendation service experiences high load. It becomes slow, causing the product-page service (which calls it synchronously) to queue up threads waiting for responses. The product-page service's thread pools fill up, and its instances stop responding to the API gateway. The gateway marks the stalled instances as unhealthy and routes all traffic to the remaining instances, which immediately overload. The entire system goes down because one non-critical service was slow.
385
+
386
+ **Why it happens:** No circuit breakers, no bulkheading, no timeout budgets. Every service trusts that its dependencies will respond quickly.
387
+
388
+ **Fix:** Circuit breaker pattern (Resilience4j, Polly; Netflix's Hystrix is in maintenance mode): when a downstream service fails, stop calling it and return a degraded response. Bulkheading: isolate thread pools per dependency so one slow dependency cannot exhaust all resources. Timeout budgets: a request that has already consumed 80% of its time budget should not make a downstream call that needs 50% of the budget; it should fail fast instead.
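The following is a minimal, single-threaded sketch of the circuit-breaker state machine (closed, open, half-open). It is written from scratch rather than against Resilience4j or Polly, whose real APIs differ. The failure threshold, cool-down period, and injectable clock are illustrative assumptions. While the circuit is open, calls go straight to the fallback instead of tying up a thread waiting on the slow dependency.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Illustrative circuit breaker. Not thread-safe; production libraries add
// sliding failure-rate windows, concurrency control, and metrics.
final class CircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures before opening
    private final long openMillis;      // how long to fail fast before probing
    private final LongSupplier clock;   // injectable time source (millis)

    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0;

    CircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    State state() { return state; }

    <T> T call(Supplier<T> downstream, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < openMillis) {
                return fallback.get();   // fail fast: do not touch the dependency
            }
            state = State.HALF_OPEN;     // cool-down over: let one probe through
        }
        try {
            T result = downstream.get();
            state = State.CLOSED;        // success closes the circuit
            consecutiveFailures = 0;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;      // failed probe or too many failures
                openedAt = clock.getAsLong();
            }
            return fallback.get();       // degraded response
        }
    }
}
```

A real deployment would pair this with a timeout on the downstream call itself. Without one, a hung dependency never produces the failure that opens the circuit.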
389
+
390
+ ### Failure Mode 6: Version coupling hell
391
+
392
+ **What it looks like:** Payment service v2 changes its API contract. Order service, Subscription service, and Refund service all depend on Payment v1. All three must be updated simultaneously. The team spends a sprint coordinating the migration across four services owned by three teams.
393
+
394
+ **Why it happens:** No API versioning strategy. Breaking changes are deployed without maintaining backward compatibility.
395
+
396
+ **Fix:** Avoid breaking changes within an API version: add new fields; don't remove or repurpose old ones. When a breaking change is unavoidable, publish a new version (`/v1/payments`, `/v2/payments`) and support both versions simultaneously during migration. Consumer-driven contract testing (Pact) catches breaking changes before deployment.
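Below is a sketch of serving two API versions side by side from one internal model. It assumes a hypothetical payment resource in which v2 adds a `currency` field and renames `state` to `status`. The v1 mapper keeps existing consumers working unchanged while they migrate; once v1 traffic reaches zero, the mapper and its route can be deleted.

```java
// Hypothetical internal model: the single source of truth behind both versions.
record Payment(String id, long amountCents, String currency, String status) {}

// v1 contract (frozen): no currency, and the status field is called "state".
record PaymentV1Response(String id, long amountCents, String state) {}

// v2 contract: adds currency and renames state -> status (a breaking change,
// hence a new version rather than an edit to v1).
record PaymentV2Response(String id, long amountCents, String currency, String status) {}

final class PaymentApiMappers {
    static PaymentV1Response toV1(Payment p) {
        return new PaymentV1Response(p.id(), p.amountCents(), p.status());
    }

    static PaymentV2Response toV2(Payment p) {
        return new PaymentV2Response(p.id(), p.amountCents(), p.currency(), p.status());
    }

    // Route by version prefix; both versions stay live during the migration.
    static Object handleGet(String path, Payment p) {
        if (path.startsWith("/v1/payments/")) return toV1(p);
        if (path.startsWith("/v2/payments/")) return toV2(p);
        throw new IllegalArgumentException("unknown API version: " + path);
    }
}
```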
397
+
398
+ ### Failure Mode 7: Over-decomposition
399
+
400
+ **What it looks like:** A 12-person team maintains 45 services. Each service has 200-500 lines of code. Most services do a single database query and forward the result. Engineers spend more time navigating between repositories and updating inter-service contracts than writing business logic.
401
+
402
+ **Why it happens:** The team took "micro" literally and split by every entity or function rather than by business capability.
403
+
404
+ **Fix:** Merge services that have no independent deployment or scaling needs. A reasonable starting point: one service per team of 3-8 engineers. If a service cannot justify its own CI/CD pipeline and on-call rotation, it should not be a separate service.
405
+
406
+ ---
407
+
408
+ ## Technology Landscape
409
+
410
+ ### Container orchestration
411
+
412
+ **Kubernetes (K8s):** The de facto standard. Handles deployment, scaling, service discovery (kube-dns), health checking, rolling updates, resource management, and secrets. Non-negotiable for serious microservices deployment. See Implementation Sketch for deployment YAML.
413
+
414
+ ### Service mesh
415
+
416
+ **Istio:** Full-featured — mTLS, traffic management (canary, blue-green), observability, policy enforcement. Significant overhead; justified for 50+ services. **Linkerd:** Lighter-weight alternative, simpler to operate. Often better for organizations new to service mesh.
417
+
418
+ ### API Gateway
419
+
420
+ **Kong:** Open-source, plugin-based (auth, rate limiting, logging). K8s-native via Ingress Controller. **AWS/GCP API Gateway:** Cloud-managed, lower ops overhead. **Envoy:** High-performance proxy, also the data plane for Istio.
421
+
422
+ ### Communication protocols
423
+
424
+ **REST/HTTP:** Default for external APIs. Well-understood, easy to debug. Use OpenAPI for contracts. **gRPC:** Binary (Protocol Buffers) over HTTP/2, 2-10x faster than JSON/REST for internal calls. Strong typing via `.proto` files. **GraphQL:** Useful as gateway aggregation layer, not for service-to-service.
425
+
426
+ ### Asynchronous messaging
427
+
428
+ **Apache Kafka:** Durable, ordered, replayable event log. The standard at scale. **RabbitMQ:** Flexible routing, simpler to operate for smaller deployments. **AWS SQS/SNS, GCP Pub/Sub:** Cloud-managed, lower ops overhead at the cost of provider lock-in.
429
+
430
+ ### Observability
431
+
432
+ **OpenTelemetry:** The vendor-neutral industry standard for instrumentation: instrument once, export to any backend. **Jaeger / Zipkin:** Open-source distributed tracing (self-hosted). **Datadog / New Relic / Honeycomb:** Commercial platforms combining tracing, metrics, and logging.
433
+
434
+ ### Resilience
435
+
436
+ **Circuit breaker:** Resilience4j (Java), Polly (.NET), opossum (Node.js). Stops calling failing services, returns fallback. **Retry with backoff:** Exponential backoff + jitter for transient failures; requires server-side idempotency. **Bulkhead:** Isolates resource pools per dependency so one slow service cannot exhaust all capacity.
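Here is a minimal sketch of retry with exponential backoff and "full jitter": each delay is drawn uniformly between zero and an exponentially growing ceiling. It assumes the operation is idempotent, as the text requires. The base delay, cap, attempt count, and the injectable `sleeper` are illustrative choices, not values from any particular library.

```java
import java.util.Random;
import java.util.function.LongConsumer;
import java.util.function.Supplier;

// Retries a call that may fail transiently. Only safe for idempotent calls
// (e.g. the server deduplicates on an idempotency key).
final class RetryWithBackoff {
    private final int maxAttempts;
    private final long baseMillis;
    private final long capMillis;
    private final Random random;
    private final LongConsumer sleeper; // injectable so tests need not really sleep

    RetryWithBackoff(int maxAttempts, long baseMillis, long capMillis,
                     Random random, LongConsumer sleeper) {
        if (maxAttempts < 1) throw new IllegalArgumentException("maxAttempts must be >= 1");
        this.maxAttempts = maxAttempts;
        this.baseMillis = baseMillis;
        this.capMillis = capMillis;
        this.random = random;
        this.sleeper = sleeper;
    }

    // Full jitter: uniform in [0, min(cap, base * 2^attempt)].
    long delayFor(int attempt) {
        long exponential = baseMillis << Math.min(attempt, 30); // bound the shift
        long ceiling = Math.min(capMillis, exponential);
        return (long) (random.nextDouble() * (ceiling + 1));
    }

    <T> T call(Supplier<T> operation) {
        RuntimeException last = null;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                return operation.get();
            } catch (RuntimeException e) {
                last = e;
                if (attempt + 1 < maxAttempts) {
                    sleeper.accept(delayFor(attempt)); // back off before retrying
                }
            }
        }
        throw last; // all attempts failed: surface the final error
    }
}
```

Jitter spreads retries from many clients over time, so a recovering service is not hit by synchronized waves of retries.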
437
+
438
+ ---
439
+
440
+ ## Decision Tree
441
+
442
+ ```
443
+ How many developers work on this system?
444
+
445
+ ├── Fewer than 10 (single team)
446
+ │ └── STOP. Use a monolith or modular monolith.
447
+ │ Microservices will cost more than they save.
448
+ │ Revisit when team grows past 10.
449
+
450
+ ├── 10-50 developers (2-5 teams)
451
+ │ ├── Do all teams work on the same codebase?
452
+ │ │ ├── Yes → Modular monolith with enforced boundaries.
453
+ │ │ │ Extract only if specific modules have
454
+ │ │ │ measured independent scaling needs.
455
+ │ │ └── No → Are the codebases genuinely independent products?
456
+ │ │ ├── Yes → Separate applications (not microservices).
457
+ │ │ └── No → Modular monolith. Shared deployment is fine.
458
+ │ │
459
+ │ ├── Do you have a mature DevOps/platform team?
460
+ │ │ ├── No → STOP. You cannot operate microservices without
461
+ │ │ │ CI/CD automation, container orchestration,
462
+ │ │ │ and distributed tracing. Build this first.
463
+ │ │ └── Yes → Continue evaluation.
464
+ │ │
465
+ │ └── Is the product domain well understood?
466
+ │ ├── No → Monolith or modular monolith. You will draw
467
+ │ │ service boundaries wrong. Wait until domain
468
+ │ │ boundaries are proven (6-12 months of production).
469
+ │ └── Yes → Consider selective extraction of 1-3 services
470
+ │ with proven independent scaling/deployment needs.
471
+
472
+ ├── 50+ developers (5+ autonomous teams)
473
+ │ ├── Do teams need independent deployment cadences?
474
+ │ │ ├── No → Modular monolith (see: Shopify, 2000+ engineers).
475
+ │ │ └── Yes → Microservices for teams needing independence.
476
+ │ │ Keep shared components as modular monolith.
477
+ │ │
478
+ │ ├── Are there genuinely different scaling profiles?
479
+ │ │ ├── No → Modular monolith scales horizontally as a unit.
480
+ │ │ └── Yes → Extract high-scale components to services.
481
+ │ │
482
+ │ └── Do you have platform engineering capacity?
483
+ │ ├── No → STOP. Invest in platform team first.
484
+ │ │ Microservices without platform support = chaos.
485
+ │ └── Yes → Microservices with domain-oriented grouping.
486
+ │ Each domain = 3-8 services behind a domain gateway.
487
+
488
+ └── Is this a startup pre-product-market-fit?
489
+ └── ALWAYS monolith. No exceptions. Ship product.
490
+ Revisit architecture when you have paying customers
491
+ and measurable scaling problems.
492
+ ```
493
+
494
+ **Summary decision heuristic:**
495
+
496
+ - Team < 10 → monolith or modular monolith (never microservices)
497
+ - Team 10-50, same product → modular monolith (selective extraction only with evidence)
498
+ - Team 50+, multiple products, mature DevOps → microservices (with domain-oriented grouping)
499
+ - Startup without PMF → monolith (regardless of team size)
500
+ - No platform/DevOps team → not ready (regardless of team size)
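The summary heuristic can be encoded as a small function, which makes the precedence explicit: the PMF and team-size checks win before any readiness question is asked. The input fields and enum names are hypothetical. The function restates the bullets and does not replace the fuller tree.

The summary does not cover 50+ developers on a single product. This sketch maps that case to selective extraction, following the tree's "No → Modular monolith" branch; that mapping is an assumption.

```java
// Hypothetical encoding of the summary decision heuristic.
final class ArchitectureHeuristic {
    enum Recommendation {
        MONOLITH,
        MONOLITH_OR_MODULAR_MONOLITH,
        MODULAR_MONOLITH_SELECTIVE_EXTRACTION,
        NOT_READY_FOR_MICROSERVICES,
        MICROSERVICES_DOMAIN_ORIENTED
    }

    record Context(int developers, boolean hasProductMarketFit,
                   boolean multipleProducts, boolean hasPlatformTeam) {}

    static Recommendation recommend(Context c) {
        // Startup without PMF: monolith regardless of team size.
        if (!c.hasProductMarketFit()) return Recommendation.MONOLITH;
        // Team < 10: never microservices.
        if (c.developers() < 10) return Recommendation.MONOLITH_OR_MODULAR_MONOLITH;
        // No platform/DevOps team: not ready, regardless of team size.
        if (!c.hasPlatformTeam()) return Recommendation.NOT_READY_FOR_MICROSERVICES;
        // 10-50, or one product at any size: extract only with evidence.
        if (c.developers() <= 50 || !c.multipleProducts()) {
            return Recommendation.MODULAR_MONOLITH_SELECTIVE_EXTRACTION;
        }
        // 50+, multiple products, mature platform: domain-oriented microservices.
        return Recommendation.MICROSERVICES_DOMAIN_ORIENTED;
    }
}
```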
501
+
502
+ ---
503
+
504
+ ## Implementation Sketch
505
+
506
+ ### Service structure (single service)
507
+
508
+ ```
509
+ order-service/
510
+ ├── src/
511
+ │ ├── api/ # Inbound adapters
512
+ │ │ ├── rest/
513
+ │ │ │ ├── OrderController.java # REST endpoints
514
+ │ │ │ └── dto/
515
+ │ │ │ ├── CreateOrderRequest.java
516
+ │ │ │ └── OrderResponse.java
517
+ │ │ └── grpc/
518
+ │ │ └── OrderGrpcService.java # gRPC endpoint
519
+ │ ├── domain/ # Core business logic (no framework deps)
520
+ │ │ ├── Order.java # Aggregate root
521
+ │ │ ├── OrderItem.java # Value object
522
+ │ │ ├── OrderStatus.java # Enum
523
+ │ │ ├── OrderRepository.java # Port (interface)
524
+ │ │ └── events/
525
+ │ │ ├── OrderPlaced.java # Domain event
526
+ │ │ └── OrderCancelled.java
527
+ │ ├── application/ # Use cases / application services
528
+ │ │ ├── CreateOrderUseCase.java
529
+ │ │ ├── CancelOrderUseCase.java
530
+ │ │ └── ports/
531
+ │ │ ├── PaymentClient.java # Port for calling payment-service
532
+ │ │ └── InventoryClient.java # Port for calling inventory-service
533
+ │ └── infrastructure/ # Outbound adapters
534
+ │ ├── persistence/
535
+ │ │ └── JpaOrderRepository.java
536
+ │ ├── clients/
537
+ │ │ ├── PaymentGrpcClient.java
538
+ │ │ └── InventoryRestClient.java
539
+ │ ├── messaging/
540
+ │ │ └── KafkaEventPublisher.java
541
+ │ └── config/
542
+ │ └── AppConfig.java
543
+ ├── Dockerfile
544
+ ├── build.gradle
545
+ ├── openapi.yaml # API contract (source of truth)
546
+ └── kubernetes/
547
+ ├── deployment.yaml
548
+ ├── service.yaml
549
+ └── configmap.yaml
550
+ ```
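The following is a condensed, single-file sketch of how the layers in that tree depend on each other. It reuses the tree's type names (`Order`, `OrderItem`, `OrderStatus`, `OrderRepository`, `PaymentClient`, `CreateOrderUseCase`) but with hypothetical method signatures. The domain and application types know only the port interfaces.

An in-memory repository stands in for `JpaOrderRepository`, and any lambda can stand in for `PaymentGrpcClient`, so the use case can be exercised without a database or network. The rule that a declined payment cancels the order is an illustrative assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// --- domain/ : no framework dependencies ---
enum OrderStatus { PENDING, CONFIRMED, CANCELLED }

record OrderItem(UUID productId, int quantity, long unitPriceCents) { // value object
    OrderItem {
        if (quantity < 1) throw new IllegalArgumentException("quantity must be >= 1");
    }
    long subtotalCents() { return quantity * unitPriceCents; }
}

final class Order { // aggregate root
    private final UUID id;
    private final UUID customerId;
    private final List<OrderItem> items;
    private OrderStatus status = OrderStatus.PENDING;

    Order(UUID id, UUID customerId, List<OrderItem> items) {
        if (items.isEmpty()) throw new IllegalArgumentException("order needs items");
        this.id = id;
        this.customerId = customerId;
        this.items = List.copyOf(items);
    }

    UUID id() { return id; }
    UUID customerId() { return customerId; }
    OrderStatus status() { return status; }
    long totalCents() { return items.stream().mapToLong(OrderItem::subtotalCents).sum(); }
    void confirm() { status = OrderStatus.CONFIRMED; }
    void cancel() { status = OrderStatus.CANCELLED; }
}

interface OrderRepository { // port, implemented in infrastructure/
    void save(Order order);
    Optional<Order> findById(UUID id);
}

// --- application/ : use cases depend only on ports ---
interface PaymentClient { // port for calling payment-service
    boolean charge(UUID customerId, long amountCents);
}

final class CreateOrderUseCase {
    private final OrderRepository orders;
    private final PaymentClient payments;

    CreateOrderUseCase(OrderRepository orders, PaymentClient payments) {
        this.orders = orders;
        this.payments = payments;
    }

    Order execute(UUID customerId, List<OrderItem> items) {
        Order order = new Order(UUID.randomUUID(), customerId, items);
        if (payments.charge(order.customerId(), order.totalCents())) {
            order.confirm();
        } else {
            order.cancel(); // declined payment (illustrative rule)
        }
        orders.save(order);
        return order;
    }
}

// --- infrastructure/ : in-memory adapter standing in for JpaOrderRepository ---
final class InMemoryOrderRepository implements OrderRepository {
    private final Map<UUID, Order> store = new ConcurrentHashMap<>();
    public void save(Order order) { store.put(order.id(), order); }
    public Optional<Order> findById(UUID id) { return Optional.ofNullable(store.get(id)); }
}
```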
551
+
552
+ ### Docker Compose for local development
553
+
554
+ ```yaml
555
+ # docker-compose.yaml — local dev environment (NOT production)
+ # Abridged: shows pattern, not every service
+ services:
+   order-service:
+     build: ./order-service
+     ports: ["8081:8080"]
+     environment:
+       - DB_URL=jdbc:postgresql://order-db:5432/orders
+       - PAYMENT_SERVICE_URL=http://payment-service:8080
+       - KAFKA_BOOTSTRAP_SERVERS=kafka:9092
+     depends_on: [order-db, kafka]
+
+   payment-service:
+     build: ./payment-service
+     ports: ["8082:8080"]
+     environment:
+       - DB_URL=jdbc:postgresql://payment-db:5432/payments
+       - KAFKA_BOOTSTRAP_SERVERS=kafka:9092
+     depends_on: [payment-db, kafka]
+
+   # Each service gets its own database — data isolation enforced at infra level
+   order-db:
+     image: postgres:16
+     environment: { POSTGRES_DB: orders, POSTGRES_USER: order_app, POSTGRES_PASSWORD: localdev }
+   payment-db:
+     image: postgres:16
+     environment: { POSTGRES_DB: payments, POSTGRES_USER: payment_app, POSTGRES_PASSWORD: localdev }
+
+   kafka:
+     image: confluentinc/cp-kafka:7.6.0
+     ports: ["9092:9092"]
+     depends_on: [zookeeper]
+     environment:
+       KAFKA_BROKER_ID: 1
+       KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
+       KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092   # reachable by other containers
+       KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1             # single local broker
+   zookeeper:
+     image: confluentinc/cp-zookeeper:7.6.0
+     environment: { ZOOKEEPER_CLIENT_PORT: 2181 }
+   jaeger:
+     image: jaegertracing/all-in-one:1.55
+     ports: ["16686:16686", "4317:4317"]  # UI + OTLP
592
+ ```
593
+
594
+ Note the local development complexity: 7 containers for just 2 services. A real system with 5+ services needs 15-20+ containers locally. In production, add service mesh sidecars, API gateway, monitoring stack, and log aggregation. This is the operational cost made tangible.
595
+
596
+ ### OpenAPI contract (source of truth)
597
+
598
+ ```yaml
599
+ # order-service/openapi.yaml — each service publishes its contract
+ openapi: 3.1.0
+ info:
+   title: Order Service API
+   version: 1.2.0
+ paths:
+   /api/v1/orders:
+     post:
+       operationId: createOrder
+       requestBody:
+         required: true
+         content:
+           application/json:
+             schema:
+               $ref: '#/components/schemas/CreateOrderRequest'
+       responses:
+         '201':
+           description: Order created
+           content:
+             application/json:
+               schema:
+                 $ref: '#/components/schemas/OrderResponse'
+         '402':
+           description: Payment declined
+   /api/v1/orders/{orderId}:
+     get:
+       operationId: getOrder
+       parameters:
+         - name: orderId
+           in: path
+           required: true
+           schema: { type: string, format: uuid }
+       responses:
+         '200':
+           description: Order found
+           content:
+             application/json:
+               schema:
+                 $ref: '#/components/schemas/OrderResponse'
+         '404':
+           description: Order not found
+ components:
+   schemas:
+     CreateOrderRequest:
+       type: object
+       required: [customerId, items]
+       properties:
+         customerId: { type: string, format: uuid }
+         items:
+           type: array
+           items: { $ref: '#/components/schemas/OrderItemRequest' }
+         idempotencyKey: { type: string }
+     OrderItemRequest:
+       type: object
+       required: [productId, quantity]
+       properties:
+         productId: { type: string, format: uuid }
+         quantity: { type: integer, minimum: 1 }
+     OrderResponse:
+       type: object
+       properties:
+         id: { type: string, format: uuid }
+         status: { type: string, enum: [PENDING, CONFIRMED, SHIPPED, DELIVERED, CANCELLED] }
+         totalCents: { type: integer }
+         createdAt: { type: string, format: date-time }
663
+ ```
664
+
665
+ ### Kubernetes deployment with health checks
666
+
667
+ ```yaml
668
+ # order-service/kubernetes/deployment.yaml — key elements shown
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: order-service
+   namespace: production
+ spec:
+   replicas: 3
+   strategy:
+     type: RollingUpdate
+     rollingUpdate: { maxUnavailable: 1, maxSurge: 1 }
+   selector:
+     matchLabels: { app: order-service }
+   template:
+     metadata:
+       labels: { app: order-service, domain: commerce }
+       annotations:
+         prometheus.io/scrape: "true"
+         prometheus.io/port: "8080"
+     spec:
+       containers:
+         - name: order-service
+           image: registry.example.com/order-service:v1.2.0
+           ports:
+             - containerPort: 8080
+           env:
+             - name: DB_URL
+               valueFrom:
+                 secretKeyRef: { name: order-service-secrets, key: db-url }
+             - name: OTEL_EXPORTER_OTLP_ENDPOINT
+               value: "http://jaeger-collector:4317"
+           resources:
+             requests: { memory: "256Mi", cpu: "250m" }
+             limits: { memory: "512Mi", cpu: "500m" }
+           readinessProbe:
+             httpGet: { path: /health/ready, port: 8080 }
+             initialDelaySeconds: 10
+             periodSeconds: 5
+           livenessProbe:
+             httpGet: { path: /health/live, port: 8080 }
+             initialDelaySeconds: 30
+             periodSeconds: 10
+ ---
+ apiVersion: v1
+ kind: Service
+ metadata:
+   name: order-service
+   namespace: production   # must match the Deployment's namespace
+ spec:
+   selector: { app: order-service }
+   ports: [{ port: 8080, targetPort: 8080 }]
+ ---
+ apiVersion: autoscaling/v2
+ kind: HorizontalPodAutoscaler
+ metadata:
+   name: order-service-hpa
+   namespace: production   # an HPA can only target a Deployment in its own namespace
+ spec:
+   scaleTargetRef:
+     apiVersion: apps/v1
+     kind: Deployment
+     name: order-service
+   minReplicas: 3
+   maxReplicas: 20
+   metrics:
+     - type: Resource
+       resource:
+         name: cpu
+         target: { type: Utilization, averageUtilization: 70 }
735
+ ```
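The Kubernetes documentation describes the HPA's core scaling rule as desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The result is clamped to `minReplicas`/`maxReplicas`, and no scaling happens when the ratio is within a tolerance of 1.0 (0.1 by default).

The sketch below is a simplified model of that calculation for the manifest above. It deliberately ignores stabilization windows, scaling policies, and unready pods, so it approximates what the controller does rather than reproducing it.

```java
// Simplified model of the HorizontalPodAutoscaler replica calculation.
// Ignores stabilization windows, scale-up/down policies, and pod readiness.
final class HpaMath {
    static int desiredReplicas(int currentReplicas, double currentUtilization,
                               double targetUtilization, int minReplicas,
                               int maxReplicas, double tolerance) {
        double ratio = currentUtilization / targetUtilization;
        int desired = Math.abs(ratio - 1.0) <= tolerance
                ? currentReplicas                            // close enough: no change
                : (int) Math.ceil(currentReplicas * ratio);  // scale proportionally
        return Math.max(minReplicas, Math.min(maxReplicas, desired));
    }
}
```

With the manifest's values (target 70% CPU, 3 to 20 replicas):
- Three pods averaging 140% CPU scale to 6.
- At 75% they stay at 3, because the ratio is within tolerance.
- Very high load is capped at 20.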
736
+
737
+ ---
738
+
739
+ ## Cross-References
740
+
741
+ - **monolith:** The correct starting point. Understand when a monolith is sufficient before reaching for microservices. See `architecture/patterns/monolith.md`.
742
+ - **modular-monolith:** The recommended intermediate step. Provides team ownership and boundary enforcement without distributed systems overhead. The architecture most teams should use instead of microservices. See `architecture/patterns/modular-monolith.md`.
743
+ - **event-driven:** Asynchronous communication between microservices relies on event-driven patterns. Events decouple services and enable eventual consistency. See `architecture/patterns/event-driven.md`.
744
+ - **saga-pattern:** The primary pattern for managing distributed transactions across microservices without 2PC. See `architecture/patterns/saga-pattern.md`.
745
+ - **api-design-rest:** REST API design principles for service-to-service and external communication. See `architecture/integration/api-design-rest.md`.
746
+ - **api-design-grpc:** gRPC design for high-performance internal service communication. See `architecture/integration/api-design-grpc.md`.
747
+ - **circuit-breaker-bulkhead:** Resilience patterns essential for preventing cascade failures in microservices. See `architecture/distributed/circuit-breaker-bulkhead.md`.
748
+ - **monolith-to-microservices:** The strangler fig pattern and migration strategies for extracting services from a monolith. See `architecture/patterns/monolith-to-microservices.md`.
749
+ - **domain-driven-design:** Bounded contexts are the prerequisite for defining service boundaries. Wrong boundaries = distributed monolith. See `architecture/foundations/domain-driven-design.md`.
750
+
751
+ ---
752
+
753
+ *Researched: 2026-03-08 | Sources: [Amazon Prime Video — Scaling Up the Prime Video Audio/Video Monitoring Service](https://www.primevideotech.com/video-streaming/scaling-up-the-prime-video-audio-video-monitoring-service-and-reducing-costs-by-90) | [Segment — Goodbye Microservices: From 100s of Problem Children to 1 Superstar (InfoQ)](https://www.infoq.com/news/2020/04/microservices-back-again/) | [Netflix Microservices Migration — From Monolith to 700+ Services](https://caffeinatedcoder.medium.com/netflixs-microservices-migration-from-monolith-to-700-services-8caa8e5bc574) | [Uber — Introducing Domain-Oriented Microservice Architecture](https://www.uber.com/en-US/blog/microservice-architecture/) | [Microservices Retrospective — What We Learned from Netflix (InfoQ)](https://www.infoq.com/presentations/microservices-netflix-industry/) | [The True Cost of Microservices — Quantifying Operational Complexity](https://www.softwareseni.com/the-true-cost-of-microservices-quantifying-operational-complexity-and-debugging-overhead/) | [The Architecture Decision That Saved Us $2M (FullScale)](https://fullscale.io/blog/microservices-team-management/) | [Microservices: Lessons from the Trenches (JavaPro)](https://javapro.io/2025/09/12/microservices-lessons-from-the-trenches/) | [Monolith vs Microservices 2025: Real Cloud Migration Costs](https://medium.com/@pawel.piwosz/monolith-vs-microservices-2025-real-cloud-migration-costs-and-hidden-challenges-8b453a3c71ec) | [Martin Fowler — MonolithFirst](https://martinfowler.com/bliki/MonolithFirst.html) | [Strangler Fig Pattern — AWS Prescriptive Guidance](https://docs.aws.amazon.com/prescriptive-guidance/latest/cloud-design-patterns/strangler-fig.html) | [Adopting Microservices at Netflix: Lessons for Architectural Design (F5/NGINX)](https://www.f5.com/company/blog/nginx/microservices-at-netflix-architectural-best-practices)*