@wazir-dev/cli 1.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (629)
  1. package/AGENTS.md +111 -0
  2. package/CHANGELOG.md +14 -0
  3. package/CONTRIBUTING.md +101 -0
  4. package/LICENSE +21 -0
  5. package/README.md +314 -0
  6. package/assets/composition-engine.mmd +34 -0
  7. package/assets/demo-script.sh +17 -0
  8. package/assets/logo-dark.svg +14 -0
  9. package/assets/logo.svg +14 -0
  10. package/assets/pipeline.mmd +39 -0
  11. package/assets/record-demo.sh +51 -0
  12. package/docs/README.md +51 -0
  13. package/docs/adapters/context-mode.md +60 -0
  14. package/docs/concepts/architecture.md +87 -0
  15. package/docs/concepts/artifact-model.md +60 -0
  16. package/docs/concepts/composition-engine.md +36 -0
  17. package/docs/concepts/indexing-and-recall.md +160 -0
  18. package/docs/concepts/observability.md +41 -0
  19. package/docs/concepts/roles-and-workflows.md +59 -0
  20. package/docs/concepts/terminology-policy.md +27 -0
  21. package/docs/getting-started/01-installation.md +78 -0
  22. package/docs/getting-started/02-first-run.md +102 -0
  23. package/docs/getting-started/03-adding-to-project.md +15 -0
  24. package/docs/getting-started/04-host-setup.md +15 -0
  25. package/docs/guides/ci-integration.md +15 -0
  26. package/docs/guides/creating-skills.md +15 -0
  27. package/docs/guides/expertise-module-authoring.md +15 -0
  28. package/docs/guides/hook-development.md +15 -0
  29. package/docs/guides/memory-and-learnings.md +34 -0
  30. package/docs/guides/multi-host-export.md +15 -0
  31. package/docs/guides/troubleshooting.md +101 -0
  32. package/docs/guides/writing-custom-roles.md +15 -0
  33. package/docs/plans/2026-03-15-cli-pipeline-integration-design.md +592 -0
  34. package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +598 -0
  35. package/docs/plans/2026-03-15-docs-enforcement-plan.md +238 -0
  36. package/docs/readmes/INDEX.md +99 -0
  37. package/docs/readmes/features/expertise/README.md +171 -0
  38. package/docs/readmes/features/exports/README.md +222 -0
  39. package/docs/readmes/features/hooks/README.md +103 -0
  40. package/docs/readmes/features/hooks/loop-cap-guard.md +133 -0
  41. package/docs/readmes/features/hooks/post-tool-capture.md +121 -0
  42. package/docs/readmes/features/hooks/post-tool-lint.md +130 -0
  43. package/docs/readmes/features/hooks/pre-compact-summary.md +122 -0
  44. package/docs/readmes/features/hooks/pre-tool-capture-route.md +100 -0
  45. package/docs/readmes/features/hooks/protected-path-write-guard.md +128 -0
  46. package/docs/readmes/features/hooks/session-start.md +119 -0
  47. package/docs/readmes/features/hooks/stop-handoff-harvest.md +125 -0
  48. package/docs/readmes/features/roles/README.md +157 -0
  49. package/docs/readmes/features/roles/clarifier.md +152 -0
  50. package/docs/readmes/features/roles/content-author.md +190 -0
  51. package/docs/readmes/features/roles/designer.md +193 -0
  52. package/docs/readmes/features/roles/executor.md +184 -0
  53. package/docs/readmes/features/roles/learner.md +210 -0
  54. package/docs/readmes/features/roles/planner.md +182 -0
  55. package/docs/readmes/features/roles/researcher.md +164 -0
  56. package/docs/readmes/features/roles/reviewer.md +184 -0
  57. package/docs/readmes/features/roles/specifier.md +162 -0
  58. package/docs/readmes/features/roles/verifier.md +215 -0
  59. package/docs/readmes/features/schemas/README.md +178 -0
  60. package/docs/readmes/features/skills/README.md +63 -0
  61. package/docs/readmes/features/skills/brainstorming.md +96 -0
  62. package/docs/readmes/features/skills/debugging.md +148 -0
  63. package/docs/readmes/features/skills/design.md +120 -0
  64. package/docs/readmes/features/skills/prepare-next.md +109 -0
  65. package/docs/readmes/features/skills/run-audit.md +159 -0
  66. package/docs/readmes/features/skills/scan-project.md +109 -0
  67. package/docs/readmes/features/skills/self-audit.md +176 -0
  68. package/docs/readmes/features/skills/tdd.md +137 -0
  69. package/docs/readmes/features/skills/using-skills.md +92 -0
  70. package/docs/readmes/features/skills/verification.md +120 -0
  71. package/docs/readmes/features/skills/writing-plans.md +104 -0
  72. package/docs/readmes/features/tooling/README.md +320 -0
  73. package/docs/readmes/features/workflows/README.md +186 -0
  74. package/docs/readmes/features/workflows/author.md +181 -0
  75. package/docs/readmes/features/workflows/clarify.md +154 -0
  76. package/docs/readmes/features/workflows/design-review.md +171 -0
  77. package/docs/readmes/features/workflows/design.md +169 -0
  78. package/docs/readmes/features/workflows/discover.md +162 -0
  79. package/docs/readmes/features/workflows/execute.md +173 -0
  80. package/docs/readmes/features/workflows/learn.md +167 -0
  81. package/docs/readmes/features/workflows/plan-review.md +165 -0
  82. package/docs/readmes/features/workflows/plan.md +170 -0
  83. package/docs/readmes/features/workflows/prepare-next.md +167 -0
  84. package/docs/readmes/features/workflows/review.md +169 -0
  85. package/docs/readmes/features/workflows/run-audit.md +191 -0
  86. package/docs/readmes/features/workflows/spec-challenge.md +159 -0
  87. package/docs/readmes/features/workflows/specify.md +160 -0
  88. package/docs/readmes/features/workflows/verify.md +177 -0
  89. package/docs/readmes/packages/README.md +50 -0
  90. package/docs/readmes/packages/ajv.md +117 -0
  91. package/docs/readmes/packages/context-mode.md +118 -0
  92. package/docs/readmes/packages/gray-matter.md +116 -0
  93. package/docs/readmes/packages/node-test.md +137 -0
  94. package/docs/readmes/packages/yaml.md +112 -0
  95. package/docs/reference/configuration-reference.md +159 -0
  96. package/docs/reference/expertise-index.md +52 -0
  97. package/docs/reference/git-flow.md +43 -0
  98. package/docs/reference/hooks.md +87 -0
  99. package/docs/reference/host-exports.md +50 -0
  100. package/docs/reference/launch-checklist.md +172 -0
  101. package/docs/reference/marketplace-listings.md +76 -0
  102. package/docs/reference/release-process.md +34 -0
  103. package/docs/reference/roles-reference.md +77 -0
  104. package/docs/reference/skills.md +33 -0
  105. package/docs/reference/templates.md +29 -0
  106. package/docs/reference/tooling-cli.md +94 -0
  107. package/docs/truth-claims.yaml +222 -0
  108. package/expertise/PROGRESS.md +63 -0
  109. package/expertise/README.md +18 -0
  110. package/expertise/antipatterns/PROGRESS.md +56 -0
  111. package/expertise/antipatterns/backend/api-design-antipatterns.md +1271 -0
  112. package/expertise/antipatterns/backend/auth-antipatterns.md +1195 -0
  113. package/expertise/antipatterns/backend/caching-antipatterns.md +622 -0
  114. package/expertise/antipatterns/backend/database-antipatterns.md +1038 -0
  115. package/expertise/antipatterns/backend/index.md +24 -0
  116. package/expertise/antipatterns/backend/microservices-antipatterns.md +850 -0
  117. package/expertise/antipatterns/code/architecture-antipatterns.md +919 -0
  118. package/expertise/antipatterns/code/async-antipatterns.md +622 -0
  119. package/expertise/antipatterns/code/code-smells.md +1186 -0
  120. package/expertise/antipatterns/code/dependency-antipatterns.md +1209 -0
  121. package/expertise/antipatterns/code/error-handling-antipatterns.md +1360 -0
  122. package/expertise/antipatterns/code/index.md +27 -0
  123. package/expertise/antipatterns/code/naming-and-abstraction.md +1118 -0
  124. package/expertise/antipatterns/code/state-management-antipatterns.md +1076 -0
  125. package/expertise/antipatterns/code/testing-antipatterns.md +1053 -0
  126. package/expertise/antipatterns/design/accessibility-antipatterns.md +1136 -0
  127. package/expertise/antipatterns/design/dark-patterns.md +1121 -0
  128. package/expertise/antipatterns/design/index.md +22 -0
  129. package/expertise/antipatterns/design/ui-antipatterns.md +1202 -0
  130. package/expertise/antipatterns/design/ux-antipatterns.md +680 -0
  131. package/expertise/antipatterns/frontend/css-layout-antipatterns.md +691 -0
  132. package/expertise/antipatterns/frontend/flutter-antipatterns.md +1827 -0
  133. package/expertise/antipatterns/frontend/index.md +23 -0
  134. package/expertise/antipatterns/frontend/mobile-antipatterns.md +573 -0
  135. package/expertise/antipatterns/frontend/react-antipatterns.md +1128 -0
  136. package/expertise/antipatterns/frontend/spa-antipatterns.md +1235 -0
  137. package/expertise/antipatterns/index.md +31 -0
  138. package/expertise/antipatterns/performance/index.md +20 -0
  139. package/expertise/antipatterns/performance/performance-antipatterns.md +1013 -0
  140. package/expertise/antipatterns/performance/premature-optimization.md +623 -0
  141. package/expertise/antipatterns/performance/scaling-antipatterns.md +785 -0
  142. package/expertise/antipatterns/process/ai-coding-antipatterns.md +853 -0
  143. package/expertise/antipatterns/process/code-review-antipatterns.md +656 -0
  144. package/expertise/antipatterns/process/deployment-antipatterns.md +920 -0
  145. package/expertise/antipatterns/process/index.md +23 -0
  146. package/expertise/antipatterns/process/technical-debt-antipatterns.md +647 -0
  147. package/expertise/antipatterns/security/index.md +20 -0
  148. package/expertise/antipatterns/security/secrets-antipatterns.md +849 -0
  149. package/expertise/antipatterns/security/security-theater.md +843 -0
  150. package/expertise/antipatterns/security/vulnerability-patterns.md +801 -0
  151. package/expertise/architecture/PROGRESS.md +70 -0
  152. package/expertise/architecture/data/caching-architecture.md +671 -0
  153. package/expertise/architecture/data/data-consistency.md +574 -0
  154. package/expertise/architecture/data/data-modeling.md +536 -0
  155. package/expertise/architecture/data/event-streams-and-queues.md +634 -0
  156. package/expertise/architecture/data/index.md +25 -0
  157. package/expertise/architecture/data/search-architecture.md +663 -0
  158. package/expertise/architecture/data/sql-vs-nosql.md +708 -0
  159. package/expertise/architecture/decisions/architecture-decision-records.md +640 -0
  160. package/expertise/architecture/decisions/build-vs-buy.md +616 -0
  161. package/expertise/architecture/decisions/index.md +23 -0
  162. package/expertise/architecture/decisions/monolith-to-microservices.md +790 -0
  163. package/expertise/architecture/decisions/technology-selection.md +616 -0
  164. package/expertise/architecture/distributed/cap-theorem-and-tradeoffs.md +800 -0
  165. package/expertise/architecture/distributed/circuit-breaker-bulkhead.md +741 -0
  166. package/expertise/architecture/distributed/consensus-and-coordination.md +796 -0
  167. package/expertise/architecture/distributed/distributed-systems-fundamentals.md +564 -0
  168. package/expertise/architecture/distributed/idempotency-and-retry.md +796 -0
  169. package/expertise/architecture/distributed/index.md +25 -0
  170. package/expertise/architecture/distributed/saga-pattern.md +797 -0
  171. package/expertise/architecture/foundations/architectural-thinking.md +460 -0
  172. package/expertise/architecture/foundations/coupling-and-cohesion.md +770 -0
  173. package/expertise/architecture/foundations/design-principles-solid.md +649 -0
  174. package/expertise/architecture/foundations/domain-driven-design.md +719 -0
  175. package/expertise/architecture/foundations/index.md +25 -0
  176. package/expertise/architecture/foundations/separation-of-concerns.md +472 -0
  177. package/expertise/architecture/foundations/twelve-factor-app.md +797 -0
  178. package/expertise/architecture/index.md +34 -0
  179. package/expertise/architecture/integration/api-design-graphql.md +638 -0
  180. package/expertise/architecture/integration/api-design-grpc.md +804 -0
  181. package/expertise/architecture/integration/api-design-rest.md +892 -0
  182. package/expertise/architecture/integration/index.md +25 -0
  183. package/expertise/architecture/integration/third-party-integration.md +795 -0
  184. package/expertise/architecture/integration/webhooks-and-callbacks.md +1152 -0
  185. package/expertise/architecture/integration/websockets-realtime.md +791 -0
  186. package/expertise/architecture/mobile-architecture/index.md +22 -0
  187. package/expertise/architecture/mobile-architecture/mobile-app-architecture.md +780 -0
  188. package/expertise/architecture/mobile-architecture/mobile-backend-for-frontend.md +670 -0
  189. package/expertise/architecture/mobile-architecture/offline-first.md +719 -0
  190. package/expertise/architecture/mobile-architecture/push-and-sync.md +782 -0
  191. package/expertise/architecture/patterns/cqrs-event-sourcing.md +717 -0
  192. package/expertise/architecture/patterns/event-driven.md +797 -0
  193. package/expertise/architecture/patterns/hexagonal-clean-architecture.md +870 -0
  194. package/expertise/architecture/patterns/index.md +27 -0
  195. package/expertise/architecture/patterns/layered-architecture.md +736 -0
  196. package/expertise/architecture/patterns/microservices.md +753 -0
  197. package/expertise/architecture/patterns/modular-monolith.md +692 -0
  198. package/expertise/architecture/patterns/monolith.md +626 -0
  199. package/expertise/architecture/patterns/plugin-architecture.md +735 -0
  200. package/expertise/architecture/patterns/serverless.md +780 -0
  201. package/expertise/architecture/scaling/database-scaling.md +615 -0
  202. package/expertise/architecture/scaling/feature-flags-and-rollouts.md +757 -0
  203. package/expertise/architecture/scaling/horizontal-vs-vertical.md +606 -0
  204. package/expertise/architecture/scaling/index.md +24 -0
  205. package/expertise/architecture/scaling/multi-tenancy.md +800 -0
  206. package/expertise/architecture/scaling/stateless-design.md +787 -0
  207. package/expertise/backend/embedded-firmware.md +625 -0
  208. package/expertise/backend/go.md +853 -0
  209. package/expertise/backend/index.md +24 -0
  210. package/expertise/backend/java-spring.md +448 -0
  211. package/expertise/backend/node-typescript.md +625 -0
  212. package/expertise/backend/python-fastapi.md +724 -0
  213. package/expertise/backend/rust.md +458 -0
  214. package/expertise/backend/solidity.md +711 -0
  215. package/expertise/composition-map.yaml +443 -0
  216. package/expertise/content/foundations/content-modeling.md +395 -0
  217. package/expertise/content/foundations/editorial-standards.md +449 -0
  218. package/expertise/content/foundations/index.md +24 -0
  219. package/expertise/content/foundations/microcopy.md +455 -0
  220. package/expertise/content/foundations/terminology-governance.md +509 -0
  221. package/expertise/content/index.md +34 -0
  222. package/expertise/content/patterns/accessibility-copy.md +518 -0
  223. package/expertise/content/patterns/index.md +24 -0
  224. package/expertise/content/patterns/notification-content.md +433 -0
  225. package/expertise/content/patterns/sample-content.md +486 -0
  226. package/expertise/content/patterns/state-copy.md +439 -0
  227. package/expertise/design/PROGRESS.md +58 -0
  228. package/expertise/design/disciplines/dark-mode-theming.md +577 -0
  229. package/expertise/design/disciplines/design-systems.md +595 -0
  230. package/expertise/design/disciplines/index.md +25 -0
  231. package/expertise/design/disciplines/information-architecture.md +800 -0
  232. package/expertise/design/disciplines/interaction-design.md +788 -0
  233. package/expertise/design/disciplines/responsive-design.md +552 -0
  234. package/expertise/design/disciplines/usability-testing.md +516 -0
  235. package/expertise/design/disciplines/user-research.md +792 -0
  236. package/expertise/design/foundations/accessibility-design.md +796 -0
  237. package/expertise/design/foundations/color-theory.md +797 -0
  238. package/expertise/design/foundations/iconography.md +795 -0
  239. package/expertise/design/foundations/index.md +26 -0
  240. package/expertise/design/foundations/motion-and-animation.md +653 -0
  241. package/expertise/design/foundations/rtl-design.md +585 -0
  242. package/expertise/design/foundations/spacing-and-layout.md +607 -0
  243. package/expertise/design/foundations/typography.md +800 -0
  244. package/expertise/design/foundations/visual-hierarchy.md +761 -0
  245. package/expertise/design/index.md +32 -0
  246. package/expertise/design/patterns/authentication-flows.md +474 -0
  247. package/expertise/design/patterns/content-consumption.md +789 -0
  248. package/expertise/design/patterns/data-display.md +618 -0
  249. package/expertise/design/patterns/e-commerce.md +1494 -0
  250. package/expertise/design/patterns/feedback-and-states.md +642 -0
  251. package/expertise/design/patterns/forms-and-input.md +819 -0
  252. package/expertise/design/patterns/gamification.md +801 -0
  253. package/expertise/design/patterns/index.md +31 -0
  254. package/expertise/design/patterns/microinteractions.md +449 -0
  255. package/expertise/design/patterns/navigation.md +800 -0
  256. package/expertise/design/patterns/notifications.md +705 -0
  257. package/expertise/design/patterns/onboarding.md +700 -0
  258. package/expertise/design/patterns/search-and-filter.md +601 -0
  259. package/expertise/design/patterns/settings-and-preferences.md +768 -0
  260. package/expertise/design/patterns/social-and-community.md +748 -0
  261. package/expertise/design/platforms/desktop-native.md +612 -0
  262. package/expertise/design/platforms/index.md +25 -0
  263. package/expertise/design/platforms/mobile-android.md +825 -0
  264. package/expertise/design/platforms/mobile-cross-platform.md +983 -0
  265. package/expertise/design/platforms/mobile-ios.md +699 -0
  266. package/expertise/design/platforms/tablet.md +794 -0
  267. package/expertise/design/platforms/web-dashboard.md +790 -0
  268. package/expertise/design/platforms/web-responsive.md +550 -0
  269. package/expertise/design/psychology/behavioral-nudges.md +449 -0
  270. package/expertise/design/psychology/cognitive-load.md +1191 -0
  271. package/expertise/design/psychology/error-psychology.md +778 -0
  272. package/expertise/design/psychology/index.md +22 -0
  273. package/expertise/design/psychology/persuasive-design.md +736 -0
  274. package/expertise/design/psychology/user-mental-models.md +623 -0
  275. package/expertise/design/tooling/open-pencil.md +266 -0
  276. package/expertise/frontend/angular.md +1073 -0
  277. package/expertise/frontend/desktop-electron.md +546 -0
  278. package/expertise/frontend/flutter.md +782 -0
  279. package/expertise/frontend/index.md +27 -0
  280. package/expertise/frontend/native-android.md +409 -0
  281. package/expertise/frontend/native-ios.md +490 -0
  282. package/expertise/frontend/react-native.md +1160 -0
  283. package/expertise/frontend/react.md +808 -0
  284. package/expertise/frontend/vue.md +1089 -0
  285. package/expertise/humanize/domain-rules-code.md +79 -0
  286. package/expertise/humanize/domain-rules-content.md +67 -0
  287. package/expertise/humanize/domain-rules-technical-docs.md +56 -0
  288. package/expertise/humanize/index.md +35 -0
  289. package/expertise/humanize/self-audit-checklist.md +87 -0
  290. package/expertise/humanize/sentence-patterns.md +218 -0
  291. package/expertise/humanize/vocabulary-blacklist.md +105 -0
  292. package/expertise/i18n/PROGRESS.md +65 -0
  293. package/expertise/i18n/advanced/accessibility-and-i18n.md +28 -0
  294. package/expertise/i18n/advanced/bidirectional-text-algorithm.md +38 -0
  295. package/expertise/i18n/advanced/complex-scripts.md +30 -0
  296. package/expertise/i18n/advanced/performance-and-i18n.md +27 -0
  297. package/expertise/i18n/advanced/testing-i18n.md +28 -0
  298. package/expertise/i18n/content/content-adaptation.md +23 -0
  299. package/expertise/i18n/content/locale-specific-formatting.md +23 -0
  300. package/expertise/i18n/content/machine-translation-integration.md +28 -0
  301. package/expertise/i18n/content/translation-management.md +29 -0
  302. package/expertise/i18n/foundations/date-time-calendars.md +67 -0
  303. package/expertise/i18n/foundations/i18n-architecture.md +272 -0
  304. package/expertise/i18n/foundations/locale-and-language-tags.md +79 -0
  305. package/expertise/i18n/foundations/numbers-currency-units.md +61 -0
  306. package/expertise/i18n/foundations/pluralization-and-gender.md +109 -0
  307. package/expertise/i18n/foundations/string-externalization.md +236 -0
  308. package/expertise/i18n/foundations/text-direction-bidi.md +241 -0
  309. package/expertise/i18n/foundations/unicode-and-encoding.md +86 -0
  310. package/expertise/i18n/index.md +38 -0
  311. package/expertise/i18n/platform/backend-i18n.md +31 -0
  312. package/expertise/i18n/platform/flutter-i18n.md +148 -0
  313. package/expertise/i18n/platform/native-android-i18n.md +36 -0
  314. package/expertise/i18n/platform/native-ios-i18n.md +36 -0
  315. package/expertise/i18n/platform/react-i18n.md +103 -0
  316. package/expertise/i18n/platform/web-css-i18n.md +81 -0
  317. package/expertise/i18n/rtl/arabic-specific.md +175 -0
  318. package/expertise/i18n/rtl/hebrew-specific.md +149 -0
  319. package/expertise/i18n/rtl/rtl-animations-and-transitions.md +111 -0
  320. package/expertise/i18n/rtl/rtl-forms-and-input.md +161 -0
  321. package/expertise/i18n/rtl/rtl-fundamentals.md +211 -0
  322. package/expertise/i18n/rtl/rtl-icons-and-images.md +181 -0
  323. package/expertise/i18n/rtl/rtl-layout-mirroring.md +252 -0
  324. package/expertise/i18n/rtl/rtl-navigation-and-gestures.md +107 -0
  325. package/expertise/i18n/rtl/rtl-testing-and-qa.md +147 -0
  326. package/expertise/i18n/rtl/rtl-typography.md +160 -0
  327. package/expertise/index.md +113 -0
  328. package/expertise/index.yaml +216 -0
  329. package/expertise/infrastructure/cloud-aws.md +597 -0
  330. package/expertise/infrastructure/cloud-gcp.md +599 -0
  331. package/expertise/infrastructure/cybersecurity.md +816 -0
  332. package/expertise/infrastructure/database-mongodb.md +447 -0
  333. package/expertise/infrastructure/database-postgres.md +400 -0
  334. package/expertise/infrastructure/devops-cicd.md +787 -0
  335. package/expertise/infrastructure/index.md +27 -0
  336. package/expertise/performance/PROGRESS.md +50 -0
  337. package/expertise/performance/backend/api-latency.md +1204 -0
  338. package/expertise/performance/backend/background-jobs.md +506 -0
  339. package/expertise/performance/backend/connection-pooling.md +1209 -0
  340. package/expertise/performance/backend/database-query-optimization.md +515 -0
  341. package/expertise/performance/backend/index.md +23 -0
  342. package/expertise/performance/backend/rate-limiting-and-throttling.md +971 -0
  343. package/expertise/performance/foundations/algorithmic-complexity.md +954 -0
  344. package/expertise/performance/foundations/caching-strategies.md +489 -0
  345. package/expertise/performance/foundations/concurrency-and-parallelism.md +847 -0
  346. package/expertise/performance/foundations/index.md +24 -0
  347. package/expertise/performance/foundations/measuring-and-profiling.md +440 -0
  348. package/expertise/performance/foundations/memory-management.md +964 -0
  349. package/expertise/performance/foundations/performance-budgets.md +1314 -0
  350. package/expertise/performance/index.md +31 -0
  351. package/expertise/performance/infrastructure/auto-scaling.md +1059 -0
  352. package/expertise/performance/infrastructure/cdn-and-edge.md +1081 -0
  353. package/expertise/performance/infrastructure/index.md +22 -0
  354. package/expertise/performance/infrastructure/load-balancing.md +1081 -0
  355. package/expertise/performance/infrastructure/observability.md +1079 -0
  356. package/expertise/performance/mobile/index.md +23 -0
  357. package/expertise/performance/mobile/mobile-animations.md +544 -0
  358. package/expertise/performance/mobile/mobile-memory-battery.md +416 -0
  359. package/expertise/performance/mobile/mobile-network.md +452 -0
  360. package/expertise/performance/mobile/mobile-rendering.md +599 -0
  361. package/expertise/performance/mobile/mobile-startup-time.md +505 -0
  362. package/expertise/performance/platform-specific/flutter-performance.md +647 -0
  363. package/expertise/performance/platform-specific/index.md +22 -0
  364. package/expertise/performance/platform-specific/node-performance.md +1307 -0
  365. package/expertise/performance/platform-specific/postgres-performance.md +1366 -0
  366. package/expertise/performance/platform-specific/react-performance.md +1403 -0
  367. package/expertise/performance/web/bundle-optimization.md +1239 -0
  368. package/expertise/performance/web/image-and-media.md +636 -0
  369. package/expertise/performance/web/index.md +24 -0
  370. package/expertise/performance/web/network-optimization.md +1133 -0
  371. package/expertise/performance/web/rendering-performance.md +1098 -0
  372. package/expertise/performance/web/ssr-and-hydration.md +918 -0
  373. package/expertise/performance/web/web-vitals.md +1374 -0
  374. package/expertise/quality/accessibility.md +985 -0
  375. package/expertise/quality/evidence-based-verification.md +499 -0
  376. package/expertise/quality/index.md +24 -0
  377. package/expertise/quality/ml-model-audit.md +614 -0
  378. package/expertise/quality/performance.md +600 -0
  379. package/expertise/quality/testing-api.md +891 -0
  380. package/expertise/quality/testing-mobile.md +496 -0
  381. package/expertise/quality/testing-web.md +849 -0
  382. package/expertise/security/PROGRESS.md +54 -0
  383. package/expertise/security/agentic-identity.md +540 -0
  384. package/expertise/security/compliance-frameworks.md +601 -0
  385. package/expertise/security/data/data-encryption.md +364 -0
  386. package/expertise/security/data/data-privacy-gdpr.md +692 -0
  387. package/expertise/security/data/database-security.md +1171 -0
  388. package/expertise/security/data/index.md +22 -0
  389. package/expertise/security/data/pii-handling.md +531 -0
  390. package/expertise/security/foundations/authentication.md +1041 -0
  391. package/expertise/security/foundations/authorization.md +603 -0
  392. package/expertise/security/foundations/cryptography.md +1001 -0
  393. package/expertise/security/foundations/index.md +25 -0
  394. package/expertise/security/foundations/owasp-top-10.md +1354 -0
  395. package/expertise/security/foundations/secrets-management.md +1217 -0
  396. package/expertise/security/foundations/secure-sdlc.md +700 -0
  397. package/expertise/security/foundations/supply-chain-security.md +698 -0
  398. package/expertise/security/index.md +31 -0
  399. package/expertise/security/infrastructure/cloud-security-aws.md +1296 -0
  400. package/expertise/security/infrastructure/cloud-security-gcp.md +1376 -0
  401. package/expertise/security/infrastructure/container-security.md +721 -0
  402. package/expertise/security/infrastructure/incident-response.md +1295 -0
  403. package/expertise/security/infrastructure/index.md +24 -0
  404. package/expertise/security/infrastructure/logging-and-monitoring.md +1618 -0
  405. package/expertise/security/infrastructure/network-security.md +1337 -0
  406. package/expertise/security/mobile/index.md +23 -0
  407. package/expertise/security/mobile/mobile-android-security.md +1218 -0
  408. package/expertise/security/mobile/mobile-binary-protection.md +1229 -0
  409. package/expertise/security/mobile/mobile-data-storage.md +1265 -0
  410. package/expertise/security/mobile/mobile-ios-security.md +1401 -0
  411. package/expertise/security/mobile/mobile-network-security.md +1520 -0
  412. package/expertise/security/smart-contract-security.md +594 -0
  413. package/expertise/security/testing/index.md +22 -0
  414. package/expertise/security/testing/penetration-testing.md +1258 -0
  415. package/expertise/security/testing/security-code-review.md +1765 -0
  416. package/expertise/security/testing/threat-modeling.md +1074 -0
  417. package/expertise/security/testing/vulnerability-scanning.md +1062 -0
  418. package/expertise/security/web/api-security.md +586 -0
  419. package/expertise/security/web/cors-and-headers.md +433 -0
  420. package/expertise/security/web/csrf.md +562 -0
  421. package/expertise/security/web/file-upload.md +1477 -0
  422. package/expertise/security/web/index.md +25 -0
  423. package/expertise/security/web/injection.md +1375 -0
  424. package/expertise/security/web/session-management.md +1101 -0
  425. package/expertise/security/web/xss.md +1158 -0
  426. package/exports/README.md +17 -0
  427. package/exports/hosts/claude/.claude/agents/clarifier.md +42 -0
  428. package/exports/hosts/claude/.claude/agents/content-author.md +63 -0
  429. package/exports/hosts/claude/.claude/agents/designer.md +55 -0
  430. package/exports/hosts/claude/.claude/agents/executor.md +55 -0
  431. package/exports/hosts/claude/.claude/agents/learner.md +51 -0
  432. package/exports/hosts/claude/.claude/agents/planner.md +53 -0
  433. package/exports/hosts/claude/.claude/agents/researcher.md +43 -0
  434. package/exports/hosts/claude/.claude/agents/reviewer.md +54 -0
  435. package/exports/hosts/claude/.claude/agents/specifier.md +47 -0
  436. package/exports/hosts/claude/.claude/agents/verifier.md +71 -0
  437. package/exports/hosts/claude/.claude/commands/author.md +42 -0
  438. package/exports/hosts/claude/.claude/commands/clarify.md +38 -0
  439. package/exports/hosts/claude/.claude/commands/design-review.md +46 -0
  440. package/exports/hosts/claude/.claude/commands/design.md +44 -0
  441. package/exports/hosts/claude/.claude/commands/discover.md +37 -0
  442. package/exports/hosts/claude/.claude/commands/execute.md +48 -0
  443. package/exports/hosts/claude/.claude/commands/learn.md +38 -0
  444. package/exports/hosts/claude/.claude/commands/plan-review.md +42 -0
  445. package/exports/hosts/claude/.claude/commands/plan.md +39 -0
  446. package/exports/hosts/claude/.claude/commands/prepare-next.md +37 -0
  447. package/exports/hosts/claude/.claude/commands/review.md +40 -0
  448. package/exports/hosts/claude/.claude/commands/run-audit.md +41 -0
  449. package/exports/hosts/claude/.claude/commands/spec-challenge.md +41 -0
  450. package/exports/hosts/claude/.claude/commands/specify.md +38 -0
  451. package/exports/hosts/claude/.claude/commands/verify.md +37 -0
  452. package/exports/hosts/claude/.claude/settings.json +34 -0
  453. package/exports/hosts/claude/CLAUDE.md +19 -0
  454. package/exports/hosts/claude/export.manifest.json +38 -0
  455. package/exports/hosts/claude/host-package.json +67 -0
  456. package/exports/hosts/codex/AGENTS.md +19 -0
  457. package/exports/hosts/codex/export.manifest.json +38 -0
  458. package/exports/hosts/codex/host-package.json +41 -0
  459. package/exports/hosts/cursor/.cursor/hooks.json +16 -0
  460. package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +19 -0
  461. package/exports/hosts/cursor/export.manifest.json +38 -0
  462. package/exports/hosts/cursor/host-package.json +42 -0
  463. package/exports/hosts/gemini/GEMINI.md +19 -0
  464. package/exports/hosts/gemini/export.manifest.json +38 -0
  465. package/exports/hosts/gemini/host-package.json +41 -0
  466. package/hooks/README.md +18 -0
  467. package/hooks/definitions/loop_cap_guard.yaml +21 -0
  468. package/hooks/definitions/post_tool_capture.yaml +24 -0
  469. package/hooks/definitions/pre_compact_summary.yaml +19 -0
  470. package/hooks/definitions/pre_tool_capture_route.yaml +19 -0
  471. package/hooks/definitions/protected_path_write_guard.yaml +19 -0
  472. package/hooks/definitions/session_start.yaml +19 -0
  473. package/hooks/definitions/stop_handoff_harvest.yaml +20 -0
  474. package/hooks/loop-cap-guard +17 -0
  475. package/hooks/post-tool-lint +36 -0
  476. package/hooks/protected-path-write-guard +17 -0
  477. package/hooks/session-start +41 -0
  478. package/llms-full.txt +2355 -0
  479. package/llms.txt +43 -0
  480. package/package.json +79 -0
  481. package/roles/README.md +20 -0
  482. package/roles/clarifier.md +42 -0
  483. package/roles/content-author.md +63 -0
  484. package/roles/designer.md +55 -0
  485. package/roles/executor.md +55 -0
  486. package/roles/learner.md +51 -0
  487. package/roles/planner.md +53 -0
  488. package/roles/researcher.md +43 -0
  489. package/roles/reviewer.md +54 -0
  490. package/roles/specifier.md +47 -0
  491. package/roles/verifier.md +71 -0
  492. package/schemas/README.md +24 -0
  493. package/schemas/accepted-learning.schema.json +20 -0
  494. package/schemas/author-artifact.schema.json +156 -0
  495. package/schemas/clarification.schema.json +19 -0
  496. package/schemas/design-artifact.schema.json +80 -0
  497. package/schemas/docs-claim.schema.json +18 -0
  498. package/schemas/export-manifest.schema.json +20 -0
  499. package/schemas/hook.schema.json +67 -0
  500. package/schemas/host-export-package.schema.json +18 -0
  501. package/schemas/implementation-plan.schema.json +19 -0
  502. package/schemas/proposed-learning.schema.json +19 -0
  503. package/schemas/research.schema.json +18 -0
  504. package/schemas/review.schema.json +29 -0
  505. package/schemas/run-manifest.schema.json +18 -0
  506. package/schemas/spec-challenge.schema.json +18 -0
  507. package/schemas/spec.schema.json +20 -0
  508. package/schemas/usage.schema.json +102 -0
  509. package/schemas/verification-proof.schema.json +29 -0
  510. package/schemas/wazir-manifest.schema.json +173 -0
  511. package/skills/README.md +40 -0
  512. package/skills/brainstorming/SKILL.md +77 -0
  513. package/skills/debugging/SKILL.md +50 -0
  514. package/skills/design/SKILL.md +61 -0
  515. package/skills/dispatching-parallel-agents/SKILL.md +128 -0
  516. package/skills/executing-plans/SKILL.md +70 -0
  517. package/skills/finishing-a-development-branch/SKILL.md +169 -0
  518. package/skills/humanize/SKILL.md +123 -0
  519. package/skills/init-pipeline/SKILL.md +124 -0
  520. package/skills/prepare-next/SKILL.md +20 -0
  521. package/skills/receiving-code-review/SKILL.md +123 -0
  522. package/skills/requesting-code-review/SKILL.md +105 -0
  523. package/skills/requesting-code-review/code-reviewer.md +108 -0
  524. package/skills/run-audit/SKILL.md +197 -0
  525. package/skills/scan-project/SKILL.md +41 -0
  526. package/skills/self-audit/SKILL.md +153 -0
  527. package/skills/subagent-driven-development/SKILL.md +154 -0
  528. package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
  529. package/skills/subagent-driven-development/implementer-prompt.md +102 -0
  530. package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
  531. package/skills/tdd/SKILL.md +23 -0
  532. package/skills/using-git-worktrees/SKILL.md +163 -0
  533. package/skills/using-skills/SKILL.md +95 -0
  534. package/skills/verification/SKILL.md +22 -0
  535. package/skills/wazir/SKILL.md +463 -0
  536. package/skills/writing-plans/SKILL.md +30 -0
  537. package/skills/writing-skills/SKILL.md +157 -0
  538. package/skills/writing-skills/anthropic-best-practices.md +122 -0
  539. package/skills/writing-skills/persuasion-principles.md +50 -0
  540. package/templates/README.md +20 -0
  541. package/templates/artifacts/README.md +10 -0
  542. package/templates/artifacts/accepted-learning.md +19 -0
  543. package/templates/artifacts/accepted-learning.template.json +12 -0
  544. package/templates/artifacts/author.md +74 -0
  545. package/templates/artifacts/author.template.json +19 -0
  546. package/templates/artifacts/clarification.md +21 -0
  547. package/templates/artifacts/clarification.template.json +12 -0
  548. package/templates/artifacts/execute-notes.md +19 -0
  549. package/templates/artifacts/implementation-plan.md +21 -0
  550. package/templates/artifacts/implementation-plan.template.json +11 -0
  551. package/templates/artifacts/learning-proposal.md +19 -0
  552. package/templates/artifacts/next-run-handoff.md +21 -0
  553. package/templates/artifacts/plan-review.md +19 -0
  554. package/templates/artifacts/proposed-learning.template.json +12 -0
  555. package/templates/artifacts/research.md +21 -0
  556. package/templates/artifacts/research.template.json +12 -0
  557. package/templates/artifacts/review-findings.md +19 -0
  558. package/templates/artifacts/review.template.json +11 -0
  559. package/templates/artifacts/run-manifest.template.json +8 -0
  560. package/templates/artifacts/spec-challenge.md +19 -0
  561. package/templates/artifacts/spec-challenge.template.json +11 -0
  562. package/templates/artifacts/spec.md +21 -0
  563. package/templates/artifacts/spec.template.json +12 -0
  564. package/templates/artifacts/verification-proof.md +19 -0
  565. package/templates/artifacts/verification-proof.template.json +11 -0
  566. package/templates/examples/accepted-learning.example.json +14 -0
  567. package/templates/examples/author.example.json +152 -0
  568. package/templates/examples/clarification.example.json +15 -0
  569. package/templates/examples/docs-claim.example.json +8 -0
  570. package/templates/examples/export-manifest.example.json +7 -0
  571. package/templates/examples/host-export-package.example.json +11 -0
  572. package/templates/examples/implementation-plan.example.json +17 -0
  573. package/templates/examples/proposed-learning.example.json +13 -0
  574. package/templates/examples/research.example.json +15 -0
  575. package/templates/examples/research.example.md +6 -0
  576. package/templates/examples/review.example.json +17 -0
  577. package/templates/examples/run-manifest.example.json +9 -0
  578. package/templates/examples/spec-challenge.example.json +14 -0
  579. package/templates/examples/spec.example.json +21 -0
  580. package/templates/examples/verification-proof.example.json +21 -0
  581. package/templates/examples/wazir-manifest.example.yaml +65 -0
  582. package/templates/task-definition-schema.md +99 -0
  583. package/tooling/README.md +20 -0
  584. package/tooling/src/adapters/context-mode.js +50 -0
  585. package/tooling/src/capture/command.js +376 -0
  586. package/tooling/src/capture/store.js +99 -0
  587. package/tooling/src/capture/usage.js +270 -0
  588. package/tooling/src/checks/branches.js +50 -0
  589. package/tooling/src/checks/brand-truth.js +110 -0
  590. package/tooling/src/checks/changelog.js +231 -0
  591. package/tooling/src/checks/command-registry.js +36 -0
  592. package/tooling/src/checks/commits.js +102 -0
  593. package/tooling/src/checks/docs-drift.js +103 -0
  594. package/tooling/src/checks/docs-truth.js +201 -0
  595. package/tooling/src/checks/runtime-surface.js +156 -0
  596. package/tooling/src/cli.js +116 -0
  597. package/tooling/src/command-options.js +56 -0
  598. package/tooling/src/commands/validate.js +320 -0
  599. package/tooling/src/doctor/command.js +91 -0
  600. package/tooling/src/export/command.js +77 -0
  601. package/tooling/src/export/compiler.js +498 -0
  602. package/tooling/src/guards/loop-cap-guard.js +52 -0
  603. package/tooling/src/guards/protected-path-write-guard.js +67 -0
  604. package/tooling/src/index/command.js +152 -0
  605. package/tooling/src/index/storage.js +1061 -0
  606. package/tooling/src/index/summarizers.js +261 -0
  607. package/tooling/src/loaders.js +18 -0
  608. package/tooling/src/project-root.js +22 -0
  609. package/tooling/src/recall/command.js +225 -0
  610. package/tooling/src/schema-validator.js +30 -0
  611. package/tooling/src/state-root.js +40 -0
  612. package/tooling/src/status/command.js +71 -0
  613. package/wazir.manifest.yaml +135 -0
  614. package/workflows/README.md +19 -0
  615. package/workflows/author.md +42 -0
  616. package/workflows/clarify.md +38 -0
  617. package/workflows/design-review.md +46 -0
  618. package/workflows/design.md +44 -0
  619. package/workflows/discover.md +37 -0
  620. package/workflows/execute.md +48 -0
  621. package/workflows/learn.md +38 -0
  622. package/workflows/plan-review.md +42 -0
  623. package/workflows/plan.md +39 -0
  624. package/workflows/prepare-next.md +37 -0
  625. package/workflows/review.md +40 -0
  626. package/workflows/run-audit.md +41 -0
  627. package/workflows/spec-challenge.md +41 -0
  628. package/workflows/specify.md +38 -0
  629. package/workflows/verify.md +37 -0
@@ -0,0 +1,1059 @@
1
+ # Auto-Scaling Performance Expertise Module
2
+
3
+ > **Domain**: Infrastructure Performance
4
+ > **Last Updated**: 2026-03-08
5
+ > **Confidence Level**: High (benchmarks from production systems, AWS documentation, and peer-reviewed sources)
6
+
7
+ ---
8
+
9
+ ## Table of Contents
10
+
11
+ 1. [Overview and Scaling Taxonomy](#overview-and-scaling-taxonomy)
12
+ 2. [Horizontal vs Vertical Scaling: Performance Tradeoffs](#horizontal-vs-vertical-scaling-performance-tradeoffs)
13
+ 3. [Kubernetes HPA and VPA](#kubernetes-hpa-and-vpa)
14
+ 4. [Kubernetes Node Scaling: Cluster Autoscaler vs Karpenter](#kubernetes-node-scaling-cluster-autoscaler-vs-karpenter)
15
+ 5. [KEDA: Event-Driven Autoscaling](#keda-event-driven-autoscaling)
16
+ 6. [AWS Auto Scaling Policies](#aws-auto-scaling-policies)
17
+ 7. [Serverless Scaling and Cold Starts](#serverless-scaling-and-cold-starts)
18
+ 8. [Custom Metrics for Scaling](#custom-metrics-for-scaling)
19
+ 9. [Scaling Speed: Time-to-Ready Analysis](#scaling-speed-time-to-ready-analysis)
20
+ 10. [Warm Pool Strategies](#warm-pool-strategies)
21
+ 11. [Cost vs Performance Tradeoffs](#cost-vs-performance-tradeoffs)
22
+ 12. [Common Bottlenecks](#common-bottlenecks)
23
+ 13. [Anti-Patterns](#anti-patterns)
24
+ 14. [Before/After: Configuration Improvements](#beforeafter-configuration-improvements)
25
+ 15. [Decision Tree: How Should I Configure Auto-Scaling?](#decision-tree-how-should-i-configure-auto-scaling)
26
+ 16. [Sources](#sources)
27
+
28
+ ---
29
+
30
+ ## Overview and Scaling Taxonomy
31
+
32
+ Auto-scaling is the automatic adjustment of compute resources in response to demand.
33
+ The three fundamental dimensions are:
34
+
35
+ | Dimension | Mechanism | Latency to Effect | Cost Profile |
36
+ |---|---|---|---|
37
+ | **Horizontal** (scale out/in) | Add/remove instances or pods | 45s-4min (pods), 1-5min (VMs) | Linear with instance count |
38
+ | **Vertical** (scale up/down) | Resize CPU/memory of existing units | 0s (in-place VPA) to 2min (restart) | Step function at instance type boundaries |
39
+ | **Functional** | Offload to specialized services | N/A (architectural) | Varies by service |
40
+
41
+ Scaling triggers fall into two categories:
42
+
43
+ - **Reactive**: Respond to observed metric thresholds (CloudWatch alarms, HPA polling). Scaling-related latency of 2-5 minutes is typical.
44
+ - **Predictive/Proactive**: Use ML models to forecast demand and pre-provision capacity. Reduces scaling-related latency by 65-80% compared to reactive approaches. Hybrid approaches reduce average response time by 35% while maintaining resource utilization above 75%.
45
+
46
+ ---
47
+
48
+ ## Horizontal vs Vertical Scaling: Performance Tradeoffs
49
+
50
+ ### Horizontal Scaling (Scale Out)
51
+
52
+ **Strengths:**
53
+ - Near-linear throughput increase for stateless workloads (going from 1 to 10 pods yields ~10x throughput for embarrassingly parallel work)
54
+ - No upper hardware ceiling -- scale to thousands of nodes
55
+ - Built-in fault tolerance -- losing 1 of 20 instances loses only 5% capacity
56
+ - Geographic distribution reduces client latency by 30-70ms per continent hop
57
+
58
+ **Performance costs:**
59
+ - Network latency between instances: 0.1-0.5ms intra-AZ, 0.5-2ms cross-AZ, 50-150ms cross-region
60
+ - Load balancer overhead: 0.05-0.2ms per request for ALB/NLB
61
+ - Distributed state coordination: Two-Phase Commit adds 2-10ms per transaction
62
+ - Session affinity complexity: sticky sessions reduce effective capacity by 15-30%
63
+
64
+ **Best for:** Stateless APIs, web frontends, worker queues, microservices with >1000 RPS
65
+
66
+ ### Vertical Scaling (Scale Up)
67
+
68
+ **Strengths:**
69
+ - Zero distributed systems overhead -- all data local to one machine
70
+ - Complex SQL joins execute 2-5x faster than cross-shard equivalents
71
+ - ACID transactions without distributed coordination
72
+ - Simpler operational model -- 1 server to monitor, backup, tune
73
+
74
+ **Performance costs:**
75
+ - Hardware ceiling: even very large EC2 instances are finite (u-24tb1.112xlarge has 448 vCPUs and 24TB RAM)
76
+ - Scaling requires downtime for non-live-resize platforms: 1-5 minutes during resize
77
+ - Single point of failure without replication
78
+ - Diminishing returns: doubling CPU from 64 to 128 cores yields <2x throughput due to lock contention
79
+
80
+ **Best for:** Relational databases, in-memory caches, legacy monoliths, workloads with <500 RPS
81
+
82
+ ### Head-to-Head Comparison
83
+
84
+ | Metric | Horizontal | Vertical |
85
+ |---|---|---|
86
+ | P99 latency (stateless API) | +0.5-2ms (cross-node) | Baseline |
87
+ | Max throughput ceiling | Effectively unlimited | Hardware-bound |
88
+ | Time to scale | 45s-5min | 1-5min (restart) |
89
+ | Data consistency | Requires coordination | Native ACID |
90
+ | Cost at 10x load | ~10x baseline | 3-8x baseline (non-linear pricing) |
91
+ | Failure blast radius | 1/N capacity lost | 100% capacity lost |
92
+
93
+ ---
94
+
95
+ ## Kubernetes HPA and VPA
96
+
97
+ ### Horizontal Pod Autoscaler (HPA)
98
+
99
+ The HPA polls metrics every **15 seconds** (default `--horizontal-pod-autoscaler-sync-period`) and adjusts replica count based on the ratio of current to target metric values.
100
+
101
+ **Scaling formula:**
102
+ ```
103
+ desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue))
104
+ ```
105
+
106
+ **Performance characteristics:**
107
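+ - Reaction time to demand change: **2-4 minutes** end-to-end is typical (15s sync period plus metrics-pipeline lag and pod or node startup); the stabilization window adds no delay on scale-up by default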
108
+ - Scale-up stabilization window: **0 seconds** (default, immediate)
109
+ - Scale-down stabilization window: **300 seconds** (default, prevents flapping)
110
+ - Tolerance band: 10% (no scaling if metric is within 0.9x-1.1x of target)
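The scaling formula and the 10% tolerance band above can be sketched in a few lines. This is a simplified illustration, not the controller's actual code: the function name and arguments are hypothetical, and details such as unready pods, missing metrics, and min/max clamping are left out.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, tolerance: float = 0.10) -> int:
    """Apply the HPA ratio formula, skipping changes inside the tolerance band."""
    ratio = current_value / target_value
    # Within 0.9x-1.1x of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 4 pods at 90% CPU against a 60% target scale out; 63% stays within tolerance.
print(desired_replicas(4, 90, 60))
print(desired_replicas(4, 63, 60))
```

Because the result is rounded up, the sketch errs toward extra capacity on scale-out, which matches the `ceil` in the formula.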
111
+
112
+ **Recommended configuration for latency-sensitive workloads:**
113
+ ```yaml
114
+ apiVersion: autoscaling/v2
115
+ kind: HorizontalPodAutoscaler
116
+ metadata:
117
+   name: api-server
118
+ spec:
119
+   scaleTargetRef:
120
+     apiVersion: apps/v1
121
+     kind: Deployment
122
+     name: api-server
123
+   minReplicas: 3
124
+   maxReplicas: 50
125
+   metrics:
126
+   - type: Resource
127
+     resource:
128
+       name: cpu
129
+       target:
130
+         type: Utilization
131
+         averageUtilization: 60 # Leave 40% headroom for spikes
132
+   - type: Pods
133
+     pods:
134
+       metric:
135
+         name: http_requests_per_second
136
+       target:
137
+         type: AverageValue
138
+         averageValue: "100" # Scale on business metric too
139
+   behavior:
140
+     scaleUp:
141
+       stabilizationWindowSeconds: 0
142
+       policies:
143
+       - type: Percent
144
+         value: 100 # Allow doubling per scale event
145
+         periodSeconds: 60
146
+     scaleDown:
147
+       stabilizationWindowSeconds: 300
148
+       policies:
149
+       - type: Percent
150
+         value: 10 # Scale down slowly (10%/min)
151
+         periodSeconds: 60
152
+ ```
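To see what the `scaleUp` policy in this configuration implies, the following sketch counts how many 60-second policy periods a 100% `Percent` policy needs to grow from `minReplicas` to `maxReplicas`. The helper name is hypothetical and the rounding (ceiling per period) is an assumption for illustration; the real controller also depends on the metric values seen in each sync.

```python
import math


def scale_up_steps(start: int, needed: int, percent: int, max_replicas: int) -> list[int]:
    """Replica count after each policy period until `needed` (or the cap) is reached."""
    if start < 1 or percent <= 0:
        raise ValueError("start must be >= 1 and percent must be > 0")
    target = min(needed, max_replicas)
    steps = []
    replicas = start
    while replicas < target:
        # Assumed limit: the policy allows at most `percent` growth per period.
        limit = math.ceil(replicas * (1 + percent / 100))
        replicas = min(limit, target)
        steps.append(replicas)
    return steps


# Values taken from the manifest: minReplicas 3, maxReplicas 50, 100% per 60s.
print(scale_up_steps(3, 50, 100, 50))
```

Under these assumptions a full ramp takes five periods, so a doubling policy still needs several minutes to cover a large surge from a small baseline.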
153
+
154
+ **Key insight**: Organizations using multiple metric types (CPU + custom metrics) for HPA scaling decisions experience fewer outages during traffic surges compared to CPU-only configurations, per the 2024 Kubernetes Benchmark Report.
155
+
156
+ ### Vertical Pod Autoscaler (VPA)
157
+
158
+ VPA adjusts CPU and memory requests/limits for individual pods based on historical usage.
159
+
160
+ **Modes:**
161
+ | Mode | Behavior | Disruption | Use Case |
162
+ |---|---|---|---|
163
+ | `Off` | Recommendations only | None | Capacity planning |
164
+ | `Initial` | Sets requests at pod creation | None (existing pods) | Batch jobs |
165
+ | `Auto` | Evicts and recreates pods | Pod restart (5-30s) | Stateless services |
166
+ | `InPlaceOrRecreate` (relies on in-place pod resize, beta since K8s 1.33) | Resizes without restart where possible | None, unless it falls back to eviction | Latency-sensitive |
167
+
168
+ **Real-world optimization results:**
169
+ - MongoDB cluster (3 replicas): VPA reduced memory requests from 6GB to 3.41GB, saving 4.2GB across the cluster
170
+ - etcd deployment: VPA recommended 93m CPU (vs. 10m initial) and 599MB memory, preventing OOMKills
171
+ - Typical memory right-sizing: 20-40% reduction in requested resources
172
+
173
+ **Critical constraint**: Do NOT run HPA and VPA on the same metric (e.g., both scaling on CPU). HPA adds pods because CPU utilization is high; VPA raises CPU requests for the same reason, which changes the utilization percentage HPA is measuring. The two controllers then fight each other in a scaling seesaw. Use VPA for memory right-sizing and HPA for horizontal scaling on CPU or custom metrics.
174
+
175
+ ### Combining HPA + VPA Effectively
176
+
177
+ ```
178
+ VPA handles: memory requests/limits (right-sizing)
179
+ HPA handles: replica count based on CPU utilization + custom metrics
180
+ Result: Covers ~80% of use cases without conflict
181
+ ```
182
+
183
+ ---
184
+
185
+ ## Kubernetes Node Scaling: Cluster Autoscaler vs Karpenter
186
+
187
+ ### Cluster Autoscaler (CAS)
188
+
189
+ - **Architecture**: Periodic scan loop (default 10-second interval)
190
+ - **Scaling mechanism**: Manages pre-defined Auto Scaling Groups (ASGs) with fixed instance types
191
+ - **Node provisioning time**: **3-4 minutes** end-to-end (scan cycle + ASG spin-up)
192
+ - **Scan interval tradeoff**: Increasing the scan interval from 10s to 60s cuts API calls by 6x but slows scale-up by 38%
193
+
194
+ ### Karpenter
195
+
196
+ - **Architecture**: Event-driven reconciliation -- each pending pod immediately triggers provisioning
197
+ - **Scaling mechanism**: Direct cloud provider API calls, no ASG dependency
198
+ - **Node provisioning time**: **45-60 seconds** in AWS benchmarks
199
+ - **Spot interruption recovery**: Can replace a Spot node within the 2-minute interruption notice window
200
+
201
+ ### Performance Comparison
202
+
203
+ | Metric | Cluster Autoscaler | Karpenter |
204
+ |---|---|---|
205
+ | Pod-to-running latency | 3-4 minutes | 45-60 seconds |
206
+ | Instance type flexibility | Fixed per node group | Any type per pod spec |
207
+ | Bin-packing efficiency | Moderate (pre-defined groups) | High (right-sized per workload) |
208
+ | Cost reduction (reported) | Baseline | Up to 70% vs CAS |
209
+ | Spot instance support | Via ASG mixed instances | Native, with consolidation |
210
+ | Scale-down intelligence | Node utilization threshold | Active consolidation (replaces underutilized nodes) |
211
+
212
+ **Production benchmark** (SaaS workload): Karpenter brought CPU-bound pods online in ~55 seconds, while Cluster Autoscaler required 3-4 minutes -- primarily ASG spin-up time.
213
+
214
+ AWS introduced **EKS Auto Mode** (December 2024), which abstracts node management entirely. Early adopters report 60-70% cost savings and an 80% reduction in infrastructure management time.
215
+
216
+ ---
217
+
218
+ ## KEDA: Event-Driven Autoscaling
219
+
220
+ KEDA (Kubernetes Event-Driven Autoscaling) extends HPA with external event sources.
221
+
222
+ **Key capability**: Scale to zero when idle, scale from zero on first event. Standard HPA does not support this by default (`minReplicas: 0` requires the alpha `HPAScaleToZero` feature gate).
223
+
224
+ **Supported scalers**: 60+ including Kafka, RabbitMQ, AWS SQS, Azure Service Bus, PostgreSQL, Redis, Prometheus, Datadog, HTTP request count.
225
+
226
+ **Architecture:**
227
+ ```
228
+ Event Source (e.g., SQS queue)
229
+ |
230
+ v
231
+ KEDA Metrics Server --> HPA --> Deployment
232
+ |
233
+ v
234
+ ScaledObject CRD (defines thresholds)
235
+ ```
236
+
237
+ **Performance characteristics:**
238
+ - Metric polling interval: configurable, typically 15-30 seconds
239
+ - Scale-from-zero latency: container startup time + pod scheduling (typically 5-30 seconds)
240
+ - Scale-to-zero cooldown: configurable (default 300 seconds)
241
+
242
+ **Example: SQS queue-based scaling:**
243
+ ```yaml
244
+ apiVersion: keda.sh/v1alpha1
245
+ kind: ScaledObject
246
+ metadata:
247
+   name: order-processor
248
+ spec:
249
+   scaleTargetRef:
250
+     name: order-processor
251
+   minReplicaCount: 0 # Scale to zero when queue is empty
252
+   maxReplicaCount: 100
253
+   triggers:
254
+   - type: aws-sqs-queue
255
+     metadata:
256
+       queueURL: https://sqs.us-east-1.amazonaws.com/123456789/orders
257
+       queueLength: "5" # Target 5 messages per pod
258
+       awsRegion: us-east-1
259
+ ```
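As a rough mental model of this ScaledObject, the replica count tracks queue depth divided by `queueLength`, clamped to the configured bounds. The sketch below is a simplification under that assumption; the function name is hypothetical, and KEDA's separate activation step for the 0-to-1 transition is not modeled.

```python
import math


def keda_desired_replicas(queue_messages: int, queue_length_target: int = 5,
                          min_replicas: int = 0, max_replicas: int = 100) -> int:
    """Approximate replica count for a queue trigger targeting N messages per pod."""
    raw = math.ceil(queue_messages / queue_length_target)
    # Clamp to minReplicaCount / maxReplicaCount from the ScaledObject.
    return max(min_replicas, min(raw, max_replicas))


for depth in (0, 12, 120, 1000):
    print(depth, "messages ->", keda_desired_replicas(depth), "pods")
```

Note that with these values the deployment saturates at 500 queued messages; deeper backlogs need a higher `maxReplicaCount` or a larger `queueLength`.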
260
+
261
+ **When to use KEDA over HPA:**
262
+ - Queue-based workloads that should scale to zero
263
+ - Event-driven architectures (Kafka consumers, webhook processors)
264
+ - Workloads driven by external metrics (database row count, API rate)
265
+
266
+ ---
267
+
268
+ ## AWS Auto Scaling Policies
269
+
270
+ ### Target Tracking Scaling
271
+
272
+ Automatically adjusts capacity to keep a metric at a target value. AWS **strongly recommends** this as the default policy type.
273
+
274
+ **Behavior:**
275
+ - Scales out aggressively (proportional to metric overshoot)
276
+ - Scales in gradually (conservative to avoid flapping)
277
+ - Creates and manages CloudWatch alarms automatically
278
+ - Uses 1-minute metrics for fastest response (recommended over 5-minute defaults)
279
+
280
+ **Pre-defined metrics:**
281
+ | Metric | Typical Target | Best For |
282
+ |---|---|---|
283
+ | `ASGAverageCPUUtilization` | 50-70% | General compute |
284
+ | `ALBRequestCountPerTarget` | 100-1000 | Web APIs |
285
+ | `ASGAverageNetworkOut` | Varies | Data processing |
286
+ | Custom CloudWatch metric | Application-specific | Business logic |
287
+
288
+ **Example configuration:**
289
+ ```json
290
+ {
291
+   "TargetTrackingScalingPolicyConfiguration": {
292
+     "TargetValue": 60.0,
293
+     "PredefinedMetricSpecification": {
294
+       "PredefinedMetricType": "ASGAverageCPUUtilization"
295
+     },
296
+     "ScaleInCooldown": 300,
297
+     "ScaleOutCooldown": 60,
298
+     "DisableScaleIn": false
299
+   }
300
+ }
301
+ ```
302
+
303
+ ### Step Scaling
304
+
305
+ Provides graduated scaling responses based on alarm breach severity.
306
+
307
+ **Advantages over target tracking:**
308
+ - Fine-grained control: small load increase adds 1 instance, large surge adds 10
309
+ - Multiple step adjustments prevent over-provisioning for moderate increases
310
+ - Better for workloads with non-linear resource requirements
311
+
312
+ **Example step configuration:**
313
+ ```
314
+ CPU 60-70% → Add 1 instance
315
+ CPU 70-80% → Add 3 instances
316
+ CPU 80-90% → Add 5 instances
317
+ CPU >90% → Add 10 instances
318
+ ```
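The step table above maps directly onto a lookup. The sketch below is illustrative only: the names are hypothetical, and treating each lower bound as inclusive (so exactly 70% falls in the 70-80% step, and 90% counts as the top step) is an assumption, since the listed ranges share their boundaries.

```python
# (lower bound in CPU %, instances to add), checked from the highest step down.
STEP_ADJUSTMENTS = [(90, 10), (80, 5), (70, 3), (60, 1)]


def instances_to_add(cpu_percent: float) -> int:
    """Return the scale-out adjustment for the step containing cpu_percent."""
    for lower_bound, adjustment in STEP_ADJUSTMENTS:
        if cpu_percent >= lower_bound:
            return adjustment
    return 0  # Below the 60% alarm threshold: no scale-out.


for cpu in (55, 65, 70, 85, 95):
    print(f"CPU {cpu}% -> add {instances_to_add(cpu)}")
```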
319
+
320
+ ### Predictive Scaling
321
+
322
+ Uses ML models trained on up to 14 days of historical data to forecast demand and pre-provision capacity.
323
+
324
+ **Performance benefits:**
325
+ - Reduces scaling-related latency by 65-80% vs reactive approaches
326
+ - Pre-provisions capacity before demand spike arrives
327
+ - Reduces underprovisioned intervals by 45-60% vs threshold-based approaches
328
+
329
+ **Best suited for:**
330
+ - Cyclical traffic (business hours vs off-hours): daily patterns with 3-5x variation
331
+ - Recurring batch processing windows
332
+ - Applications with long initialization (>60 seconds bootstrap)
333
+
334
+ **Requirements:**
335
+ - Minimum 24 hours of historical data (14 days recommended)
336
+ - Traffic must have repeating patterns (random traffic defeats prediction)
337
+ - Forecasts generated every 6 hours, capacity provisioned 1 hour before predicted need
338
+
339
+ **Cost savings**: 20-30% reduction in infrastructure costs for workloads with recognizable patterns, because capacity is right-sized rather than over-provisioned as a buffer.
340
+
341
+ ### Policy Selection Guide
342
+
343
+ | Scenario | Recommended Policy | Why |
344
+ |---|---|---|
345
+ | General web API | Target Tracking on ALBRequestCount | Proportional, self-managing |
346
+ | CPU-intensive batch | Step Scaling on CPU | Graduated response to load levels |
347
+ | Daily traffic pattern | Predictive + Target Tracking | Pre-warm + reactive fallback |
348
+ | Queue processing | Target Tracking on custom backlog metric | Proportional to actual work |
349
+ | Scheduled events (sales) | Scheduled + Target Tracking | Guaranteed minimum + dynamic |
350
+
351
+ ---
352
+
353
+ ## Serverless Scaling and Cold Starts
354
+
355
+ ### AWS Lambda Cold Start Benchmarks
356
+
357
+ Cold start time = INIT phase (runtime bootstrap + dependency loading + function initialization).
358
+
359
+ **By runtime (simple functions, 2025 benchmarks):**
360
+
361
+ | Runtime | P50 Cold Start | P99 Cold Start | Warm Invocation P50 |
362
+ |---|---|---|---|
363
+ | Python 3.12 | 100-200ms | 300-500ms | 1-5ms |
364
+ | Node.js 20 | 100-200ms | 300-600ms | 1-5ms |
365
+ | Go (provided.al2023) | 8-15ms | 30-50ms | 0.5-2ms |
366
+ | Rust (provided.al2023) | 8-15ms | 30-50ms | 0.5-2ms |
367
+ | Java 21 (no SnapStart) | 3,000-4,000ms | 5,000-6,000ms | 2-10ms |
368
+ | Java 21 (SnapStart) | 150-200ms | 600-700ms | 2-10ms |
369
+ | .NET 8 (Native AOT) | 200-400ms | 600-1,000ms | 1-5ms |
370
+
371
+ **Key finding**: Java SnapStart reduces P50 cold starts from 3,841ms to 182ms -- a **95% reduction** at the median. SnapStart expanded to Python and .NET in November 2024.
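The 95% figure follows directly from the two P50 values quoted above. A quick check (the helper name is hypothetical):

```python
def percent_reduction(before_ms: float, after_ms: float) -> float:
    """Relative improvement of after_ms over before_ms, in percent."""
    return (before_ms - after_ms) / before_ms * 100


# P50 cold start without and with SnapStart, as quoted in the key finding.
print(round(percent_reduction(3841, 182), 1))
```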
372
+
373
+ **Factors that multiply cold start time:**
374
+ - VPC attachment: historically added 10+ seconds, now <1 second with Hyperplane ENIs
375
+ - Package size: each additional 1MB adds ~2-5ms to INIT
376
+ - Dependency count: heavy frameworks (Spring Boot, Django) add 500-3000ms
377
+ - Memory allocation: a 128MB function can see ~3x slower INIT than a 1024MB one (CPU scales with memory)
378
+
379
+ **Architecture impact (Arm64 vs x86_64):**
380
+ Graviton2-based arm64 Lambda functions show **13-24% faster cold start initialization** at equivalent memory settings.
381
+
382
+ **Billing change (August 2025):** AWS now bills for the Lambda INIT phase, making cold start frequency a direct cost factor in addition to a latency concern.
383
+
384
+ ### Container Cold Starts (ECS Fargate, Kubernetes)
385
+
386
+ Container cold starts are 10-100x slower than Lambda cold starts.
387
+
388
+ **Fargate cold start breakdown (production benchmarks):**
389
+
390
+ | Phase | Duration | Optimization |
391
+ |---|---|---|
392
+ | ENI Provisioning | 10-30 seconds | Cannot optimize (platform) |
393
+ | Image Pull | 5-60 seconds | Use SOCI, smaller images, ECR in same region |
394
+ | Layer Extraction | 2-15 seconds | Use zstd compression (27% reduction) |
395
+ | Application Bootstrap | 1-10 seconds | Optimize startup code, lazy init |
396
+ | **Total (unoptimized)** | **20-60 seconds** | -- |
397
+ | **Total (optimized)** | **3-8 seconds** | SOCI + small image + zstd |
398
+
399
+ **Optimization results:**
400
+ - SOCI (Seekable OCI) lazy loading: **50% startup acceleration**; 10GB Deep Learning Container showed ~60% improvement in pull times
401
+ - zstd compression: up to **27% reduction** in task/pod startup time
402
+ - Production achievement (Prime Day 2025): P99 cold starts reduced from 38 seconds to **under 4 seconds**
403
+
404
+ **Kubernetes pod startup time (typical):**
405
+
406
+ | Component | Duration |
407
+ |---|---|
408
+ | Scheduling decision | 0.5-2 seconds |
409
+ | Image pull (cached) | 0-1 seconds |
410
+ | Image pull (uncached, 500MB) | 5-20 seconds |
411
+ | Container start | 0.5-2 seconds |
412
+ | Readiness probe pass | 1-30 seconds (app-dependent) |
413
+ | **Total (cached image)** | **2-5 seconds** |
414
+ | **Total (cold pull)** | **10-30 seconds** |
415
+
416
+ ---
417
+
418
+ ## Custom Metrics for Scaling
419
+
420
+ CPU and memory utilization are lagging indicators. By the time CPU hits 80%, users are already experiencing degraded performance. Custom metrics provide **leading indicators** of demand.
421
+
422
+ ### Metric Categories and Use Cases
423
+
424
+ **Queue-Based Metrics (most responsive for async workloads):**
425
+
426
+ | Metric | How to Calculate Target | Example |
427
+ |---|---|---|
428
+ | Backlog per instance | acceptable_latency / avg_processing_time | 10s latency / 0.1s per msg = 100 msgs/instance |
429
+ | Queue depth | total_messages / target_per_instance | 5000 msgs / 50 per pod = 100 pods |
430
+ | Age of oldest message | Alert if > SLA threshold | Scale if oldest > 30 seconds |
431
+
432
+ **Important**: Scale on backlog-per-instance, not raw queue depth. Raw depth does not account for processing speed or current instance count.
433
+
434
+ **Request-Based Metrics (best for synchronous APIs):**
435
+
436
+ | Metric | Target | When to Use |
437
+ |---|---|---|
438
+ | Requests per second per pod | 50-500 (benchmark your app) | HTTP APIs with known capacity |
439
+ | P95 response latency | Your SLA target (e.g., 200ms) | Latency-sensitive services |
440
+ | Error rate (5xx) | 0.1-1% | Overload detection |
441
+ | Active connections per instance | 80% of max (e.g., 800 of 1000) | Connection-limited services |
442
+
443
+ **Business Metrics (most aligned with value):**
444
+
445
+ | Metric | Example | Benefit |
446
+ |---|---|---|
447
+ | Orders per minute | Scale checkout service at 50 orders/min/pod | Directly tied to revenue |
448
+ | Active users | Scale at 1000 concurrent users per instance | Capacity planning |
449
+ | Payment queue depth | Scale payment processor per backlog | SLA compliance |
450
+ | Search queries per second | Scale search cluster at 200 QPS/node | User experience |
451
+
452
+ ### Implementing Custom Metrics in Kubernetes
453
+
454
+ **Prometheus Adapter example (exposing HTTP request rate to HPA):**
455
+ ```yaml
456
+ # prometheus-adapter-config
457
+ rules:
458
+ - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
459
+   resources:
460
+     overrides:
461
+       namespace: {resource: "namespace"}
462
+       pod: {resource: "pod"}
463
+   name:
464
+     matches: "^(.*)_total"
465
+     as: "${1}_per_second"
466
+   metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
467
+ ```
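The adapter rule turns a monotonically increasing counter (`http_requests_total`) into a per-second value over a 2-minute window. The sketch below shows the idea with plain arithmetic. It is a deliberate simplification, not PromQL's `rate()`: counter resets and window-edge extrapolation are ignored, and the sample data is made up.

```python
def simple_rate(samples: list[tuple[float, float]]) -> float:
    """Per-second increase between the first and last (timestamp, value) sample."""
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    if t_last <= t_first:
        raise ValueError("need at least two samples with increasing timestamps")
    return (v_last - v_first) / (t_last - t_first)


# Hypothetical counter readings for one pod over a 2-minute window.
window = [(0.0, 1000.0), (60.0, 7000.0), (120.0, 13000.0)]
print(simple_rate(window))  # 12000 requests over 120 seconds
```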
468
+
469
+ **HPA using the custom metric:**
470
+ ```yaml
471
+ metrics:
472
+ - type: Pods
473
+     pods:
474
+       metric:
475
+         name: http_requests_per_second
476
+       target:
477
+         type: AverageValue
478
+         averageValue: "200" # Scale when avg exceeds 200 RPS/pod
479
+ ```
480
+
481
+ ### AWS Custom Metric Scaling (SQS Example)
482
+
483
+ The recommended approach for SQS is to calculate **backlog per instance**:
484
+
485
+ ```
486
+ backlog_per_instance = ApproximateNumberOfMessagesVisible / RunningTaskCount
487
+ target_backlog = acceptable_latency / average_processing_time
488
+ ```
489
+
490
+ If average processing time = 0.1 seconds and acceptable latency = 10 seconds:
491
+ - Target backlog per instance = 10 / 0.1 = **100 messages**
492
+ - With 5000 messages in queue and target of 100: desired = 5000 / 100 = **50 instances**
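The worked example above can be reproduced with integer arithmetic, which avoids floating-point surprises when dividing by 0.1. The function names are hypothetical, and times are expressed in milliseconds purely for convenience.

```python
def target_backlog_per_instance(acceptable_latency_ms: int, avg_processing_ms: int) -> int:
    """Messages one instance can hold in backlog and still meet the latency goal."""
    return acceptable_latency_ms // avg_processing_ms


def desired_instances(visible_messages: int, target_backlog: int) -> int:
    """Instances needed so that backlog per instance stays at or below the target."""
    return -(-visible_messages // target_backlog)  # ceiling division


def backlog_per_instance(visible_messages: int, running_tasks: int) -> float:
    """Current backlog per instance, the value a scaling policy would track."""
    return visible_messages / running_tasks


target = target_backlog_per_instance(10_000, 100)  # 10s latency, 0.1s per message
print(target)
print(desired_instances(5000, target))
print(backlog_per_instance(5000, 20))
```

With 20 running tasks the current backlog per instance is well above the target, which is the signal that should drive scale-out.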
493
+
494
+ **Critical monitoring**: Track both queue depth AND age of oldest message. Scaling on depth alone may miss important messages aging past SLA thresholds.
495
+
496
+ ---
497
+
498
+ ## Scaling Speed: Time-to-Ready Analysis
499
+
500
+ The total time from "demand increase detected" to "new capacity serving traffic" varies dramatically by platform.
501
+
502
+ ### End-to-End Scaling Timeline
503
+
504
+ ```
505
+                      Detection   Provisioning   Healthy
506
+                      ─────────   ────────────   ───────
507
+ Lambda (warm):       0s          0s             0s        = 0s total
508
+ Lambda (cold):       0s          0.1-5s         0s        = 0.1-5s total
509
+ K8s Pod (cached):    15-30s      2-5s           1-30s     = 18-65s total
510
+ K8s Pod (Karpenter): 0-5s        45-60s         5-30s     = 50-95s total
511
+ K8s Pod (CAS):       10-30s      180-240s       5-30s     = 195-300s total
512
+ EC2 (ASG):           60-300s     60-180s        30-120s   = 150-600s total
513
+ EC2 (Warm Pool):     60-300s     5-30s          5-30s     = 70-360s total
514
+ Fargate (cold):      15-60s      20-60s         1-30s     = 36-150s total
515
+ Fargate (optimized): 15-60s      3-8s           1-30s     = 19-98s total
516
+ ```
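Each total in the timeline is the sum of the best-case and worst-case values of its three phases. A small sketch that recomputes a few rows and checks them against a deadline (the structure and names are hypothetical; the numbers are copied from the timeline):

```python
Range = tuple[float, float]  # (best case, worst case) in seconds

# Detection, provisioning, healthy: values copied from the timeline.
TIMELINES: dict[str, list[Range]] = {
    "K8s Pod (cached)": [(15, 30), (2, 5), (1, 30)],
    "K8s Pod (Karpenter)": [(0, 5), (45, 60), (5, 30)],
    "K8s Pod (CAS)": [(10, 30), (180, 240), (5, 30)],
    "EC2 (ASG)": [(60, 300), (60, 180), (30, 120)],
}


def total_range(phases: list[Range]) -> Range:
    """Sum the best cases and the worst cases across all phases."""
    return (sum(lo for lo, _ in phases), sum(hi for _, hi in phases))


def always_within(phases: list[Range], deadline_s: float) -> bool:
    """True only if even the worst case fits inside the deadline."""
    return total_range(phases)[1] <= deadline_s


for name, phases in TIMELINES.items():
    print(name, total_range(phases), "fits 60s:", always_within(phases, 60))
```

Judging by worst case rather than best case is the conservative choice for spike handling: a platform that only sometimes meets the deadline still needs headroom or pre-warmed capacity.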
517
+
518
+ ### Breakdown of Detection Phase
519
+
520
+ | Scaling System | Detection Mechanism | Detection Latency |
521
+ |---|---|---|
522
+ | HPA | Polling (15s default) | 0-15 seconds |
523
+ | KEDA | Polling (configurable) | 0-30 seconds |
524
+ | Karpenter | Event-driven (pending pods) | 0-5 seconds |
525
+ | Cluster Autoscaler | Scan loop (10s default) | 0-10 seconds |
526
+ | AWS Target Tracking | CloudWatch alarm (1-5 min) | 60-300 seconds |
527
+ | AWS Predictive | ML forecast (1hr ahead) | Pre-provisioned |
528
+
529
+ ### Key Takeaway
530
+
531
+ For latency-sensitive workloads that must handle spikes within 60 seconds, the only viable options are:
532
+ 1. **Over-provision** (maintain 30-50% headroom)
533
+ 2. **Lambda** (instant scaling if functions are warm, <5s cold)
534
+ 3. **Pre-warmed capacity** (warm pools, provisioned concurrency, min replicas)
535
+ 4. **Predictive scaling** (if traffic is patterned)
536
+
537
+ ---
538
+
539
+ ## Warm Pool Strategies
540
+
541
+ Warm pools pre-initialize resources so they can be placed into service faster than cold-starting new ones.
542
+
543
+ ### AWS EC2 Warm Pools
544
+
545
+ **Instance states in warm pool:**
546
+
547
+ | State | Boot Time to Service | Cost | Use Case |
548
+ |---|---|---|---|
549
+ | Running | 5-10 seconds | Full instance cost | Ultra-fast scaling, short spikes |
550
+ | Stopped | 30-60 seconds | EBS storage cost only | Cost-effective for moderate latency tolerance |
551
+ | Hibernated | 10-30 seconds | EBS storage + memory snapshot | Stateful apps, OS-level caches |
552
+
553
+ **Configuration:**
554
+ ```json
555
+ {
+   "WarmPool": {
+     "MinSize": 2,
+     "MaxGroupPreparedCapacity": 10,
+     "PoolState": "Stopped",
+     "InstanceReusePolicy": {
+       "ReuseOnScaleIn": true
+     }
+   }
+ }
565
+ ```
566
+
567
+ **Key consideration**: If the warm pool is depleted during a scale-out event, instances launch cold (full boot). Size your warm pool to cover expected burst magnitude.
568
+
569
+ **Recent expansion (2025)**: AWS added warm pool support for Auto Scaling groups with mixed instances policies, enabling Spot + On-Demand warm pools.
570
+
571
+ ### Lambda Provisioned Concurrency
572
+
573
+ Pre-initializes a specified number of execution environments, eliminating cold starts entirely for those instances.
574
+
575
+ **Performance:**
576
+ - Cold start: **0ms** for provisioned instances
577
+ - Spillover: Standard cold start if demand exceeds provisioned count
578
+ - Scaling: Provisioned concurrency can be scheduled or managed via Application Auto Scaling
579
+
580
+ **Cost model:**
581
+ - Provisioned rate: ~60% cheaper per GB-second than on-demand execution
582
+ - But: you pay 24/7 even with zero requests
583
+ - Break-even: typically cost-effective at >100 invocations/hour consistently
584
+
585
+ **Strategic warm-up alternative**: A scheduled Amazon EventBridge (formerly CloudWatch Events) rule that sends concurrent warm-up requests every 5 minutes. Achieves 80-95% warm availability at 5-15% of the cost of full provisioned concurrency.
586
+
587
+ ### Kubernetes Warm Strategies
588
+
589
+ **Over-provisioning with priority-based preemption:**
590
+ ```yaml
591
+ # Low-priority "balloon" pods that hold capacity
+ apiVersion: scheduling.k8s.io/v1
+ kind: PriorityClass
+ metadata:
+   name: overprovisioning
+ value: -1  # Below the default priority (0), so real workloads preempt these pods
+ preemptionPolicy: Never  # Balloon pods themselves never preempt other pods
+ ---
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: overprovisioning
+ spec:
+   replicas: 3
+   selector:  # Required by apps/v1; the label name is illustrative
+     matchLabels:
+       app: overprovisioning
+   template:
+     metadata:
+       labels:
+         app: overprovisioning
+     spec:
+       priorityClassName: overprovisioning
+       containers:
+       - name: pause
+         image: registry.k8s.io/pause:3.9
+         resources:
+           requests:
+             cpu: "2"
+             memory: "4Gi"
615
+ ```
616
+
617
+ When real workloads need resources, the scheduler preempts the lower-priority balloon pods within seconds, and the new pods schedule on the freed capacity without waiting for node provisioning.
618
+
619
+ **Effective warm capacity**: 3 balloon pods x (2 CPU, 4Gi each) = 6 CPU and 12Gi always available for burst. Cost: ~$200-400/month for 3 medium instances.
620
+
621
+ ---
622
+
623
+ ## Cost vs Performance Tradeoffs
624
+
625
+ ### The Fundamental Tension
626
+
627
+ ```
628
+ Over-provisioning                            Under-provisioning
+ (high cost, low latency)                     (low cost, high latency risk)
+         |                                            |
+         |    ← Sweet Spot: 60-70% utilization →      |
+         |                                            |
+ 30% idle capacity                            Users hit latency spikes
+ $$$$ wasted                                  during scale-up lag
+ Zero scaling lag                             2-5 min degraded performance
636
+ ```
637
+
638
+ **Industry benchmark (CAST AI Report 2025):** Average resource utilization across cloud providers is 67% (AWS) and 66% (GCP). This means 33-34% of compute spend is wasted on idle capacity -- but this waste buys protection against scaling lag.
639
+
640
+ ### Cost-Performance Matrix
641
+
642
+ | Strategy | Cost Overhead | P99 Latency During Spike | Time to Full Capacity |
643
+ |---|---|---|---|
644
+ | Always over-provisioned (50% headroom) | +50% baseline | 0ms impact | 0 seconds |
645
+ | Moderate headroom (20%) + reactive HPA | +20% baseline | +50-200ms for 2-4 min | 2-4 minutes |
646
+ | Tight provisioning + aggressive HPA | +5% baseline | +200-500ms for 3-5 min | 3-5 minutes |
647
+ | Predictive + reactive hybrid | +10-15% baseline | +20-50ms for 30-60s | 30-60 seconds |
648
+ | Scale-to-zero (KEDA/Lambda) | Pay-per-use only | Cold start penalty | 0.1s-30s |
649
+
650
+ ### GPU Workload Waste
651
+
652
+ GPU workloads often suffer from high idle time and unused memory. A single A100 costs ~$3/hour; idle GPU capacity at scale translates to tens of thousands in monthly waste. Auto-scaling GPU workloads with Karpenter or KEDA (based on inference queue depth) can reduce GPU costs by 40-60%.
653
+
654
+ ### Right-Sizing Formula
655
+
656
+ ```
657
+ target_capacity = peak_demand * (1 + safety_margin)
658
+ safety_margin = scaling_time / acceptable_degradation_time
659
+
660
+ Example:
661
+ Peak demand: 100 pods
662
+ Scaling time: 3 minutes (180s)
663
+ Acceptable degradation: 1 minute (60s)
664
+ Safety margin: 180/60 = 3.0 (300%)
665
+ Target capacity: 100 * 4 = 400 pods ← Unsustainable!
666
+
667
+ Better approach:
668
+ Use predictive scaling (reduces effective scaling time to 30s)
669
+ Safety margin: 30/60 = 0.5 (50%)
670
+ Target capacity: 100 * 1.5 = 150 pods ← Manageable
671
+ ```
672
+
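The right-sizing formula translates directly into code. The sketch below is a minimal illustration with a hypothetical function name; it reproduces both worked examples from the block above.

```python
import math


def target_capacity(peak_demand: int,
                    scaling_time_s: float,
                    acceptable_degradation_s: float) -> int:
    """target_capacity = peak_demand * (1 + scaling_time / acceptable_degradation_time)."""
    safety_margin = scaling_time_s / acceptable_degradation_s
    return math.ceil(peak_demand * (1 + safety_margin))


# Reactive scaling: 180s to scale, 60s of acceptable degradation
print(target_capacity(100, 180, 60))
# Predictive scaling: effective scaling time reduced to 30s
print(target_capacity(100, 30, 60))
```

Because the safety margin is a ratio, shortening the scaling time and relaxing the acceptable degradation window have the same effect on required capacity.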
673
+ ---
674
+
675
+ ## Common Bottlenecks
676
+
677
+ ### 1. Slow Scale-Up
678
+
679
+ **Symptom**: Latency spikes lasting 3-10 minutes during traffic increases.
680
+
681
+ **Root causes and fixes:**
682
+
683
+ | Root Cause | Impact | Fix | Improvement |
684
+ |---|---|---|---|
685
+ | Large container images (>1GB) | +20-60s pull time | Multi-stage builds, distroless base | 50-80% smaller images |
686
+ | Slow health checks | +30-120s before serving | Separate liveness/readiness, fast startup probe | 30-60s faster |
687
+ | CAS scan interval too long | +60s detection delay | Reduce to 10s or switch to Karpenter | 45-60s faster |
688
+ | CloudWatch 5-min metrics | +300s detection delay | Switch to 1-min detailed monitoring | 240s faster |
689
+ | Cold node pool (no warm pool) | +120-180s boot time | EC2 warm pool or Karpenter | 90-150s faster |
690
+
691
+ ### 2. Database Becomes Bottleneck During Scale
692
+
693
+ **Symptom**: Application scales horizontally but response times increase because all new instances hit the same database.
694
+
695
+ **The math:**
696
+ - 10 pods, each with 50 DB connections = 500 connections
697
+ - Scale to 30 pods = 1,500 connections
698
+ - RDS max_connections for db.r5.xlarge = 2,730
699
+ - At 40 pods = 2,000 connections (73% of max, performance degrades at >70%)
700
+
701
+ **Fixes:**
702
+ 1. **Connection pooling** (PgBouncer, ProxySQL): Reduce per-pod connections from 50 to 5-10
703
+ 2. **Read replicas**: Route read traffic to replicas, scale reads independently
704
+ 3. **Database-aware scaling limits**: Set HPA maxReplicas based on DB connection budget
705
+ 4. **Caching layer**: Add Redis/Memcached to absorb repeated reads (70-90% cache hit rate typical)
706
+
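A database-aware replica ceiling follows from the connection math above. The sketch below is one way to derive it, assuming the 70% degradation threshold quoted earlier; the function name and the `reserved` parameter (connections held back for admin or migration tools) are hypothetical.

```python
def max_replicas_for_db(max_connections: int,
                        conns_per_pod: int,
                        safe_percent: int = 70,
                        reserved: int = 0) -> int:
    """Largest replica count that keeps total connections within the safe budget."""
    # Integer arithmetic avoids floating-point rounding at the budget boundary
    budget = max_connections * safe_percent // 100 - reserved
    return max(0, budget // conns_per_pod)


# db.r5.xlarge figure from the text: 2,730 max connections
print(max_replicas_for_db(2730, conns_per_pod=50))  # direct connections from each pod
print(max_replicas_for_db(2730, conns_per_pod=10))  # after connection pooling
```

The result is a candidate value for HPA `maxReplicas`. Cutting per-pod connections from 50 to 10 raises the ceiling fivefold, which is why pooling comes first in the list of fixes.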
707
+ ### 3. Thundering Herd After Scale Events
708
+
709
+ **Symptom**: Cache expires or service restarts, and all new instances simultaneously fetch the same data, overwhelming backends.
710
+
711
+ **The timeline:**
712
+ - Auto-scaling adds 20 instances at T=0
713
+ - All 20 instances start with cold caches at T=+30s
714
+ - All 20 simultaneously query the database for warm-up data at T=+31s
715
+ - Database CPU spikes to 100%, queries timeout at T=+32s
716
+ - Auto-scaling detects failure, may add MORE instances (cascading failure)
717
+
718
+ **Mitigation strategies:**
719
+
720
+ | Strategy | Mechanism | Effectiveness |
721
+ |---|---|---|
722
+ | Request coalescing | Single fetch, shared response for identical concurrent requests | Reduces DB load by 90%+ during storms |
723
+ | Jittered cache TTLs | Random ±10-20% on TTL prevents synchronized expiry | Eliminates cache stampede |
724
+ | Exponential backoff with jitter | 200ms, 400ms, 800ms delays with random offset | Staggers retry storms |
725
+ | Staggered rollout | Roll out new instances 2-3 at a time with 30s intervals | Prevents simultaneous cold cache |
726
+ | Cache pre-warming | Load critical data before marking instance healthy | Zero cold-cache window |
727
+
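Two of the mitigations in the table, backoff with jitter and jittered TTLs, are small enough to sketch. This is an illustrative version under stated assumptions: the function names, the 10-second cap, and the 50% offset range are choices made here, not values from the table. Other jitter schemes exist and may suit a given client better.

```python
import random


def backoff_delay(attempt, base_s=0.2, cap_s=10.0, jitter=0.5, rng=random):
    """Delay before retry `attempt` (0-based): exponential step plus a random offset."""
    step = min(cap_s, base_s * 2 ** attempt)  # 0.2s, 0.4s, 0.8s, ... up to the cap
    return step + rng.uniform(0, step * jitter)


def jittered_ttl(base_ttl_s, spread=0.2, rng=random):
    """Cache TTL randomized by +/- `spread` so entries do not all expire together."""
    return base_ttl_s * (1 + rng.uniform(-spread, spread))


rng = random.Random(42)  # seeded only so the demo is repeatable
for attempt in range(3):
    print(f"retry {attempt}: {backoff_delay(attempt, rng=rng):.3f}s")
print(f"ttl: {jittered_ttl(300, rng=rng):.1f}s")
```

Passing the random source as a parameter keeps both helpers deterministic under test while defaulting to the module-level generator in normal use.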
728
+ **Key number**: Auto-scaling takes 45+ seconds to respond to spikes, while a thundering herd spike happens in seconds. By the time scaling responds, the damage is done. Prevention (coalescing, jitter, pre-warming) beats reaction.
729
+
730
+ ---
731
+
732
+ ## Anti-Patterns
733
+
734
+ ### 1. Scaling on CPU Only
735
+
736
+ **Why it fails**: CPU is a lagging indicator. By the time CPU reaches 80%, request queues are already saturated and users experience 2-5x latency.
737
+
738
+ **Additional problem with memory**: Most application runtimes do not release memory after load decreases. They keep memory allocated for reuse. Scaling on memory utilization may scale out but **never scale back in**.
739
+
740
+ **Fix**: Use request-based or queue-based metrics as primary scaling signals. Use CPU as a safety backstop only.
741
+
742
+ ### 2. No Scale-Down Policy
743
+
744
+ **What happens**: `disableScaleIn: true` or overly conservative scale-down settings cause capacity to ratchet up permanently. A single daily spike provisions instances that run 24/7 at <10% utilization.
745
+
746
+ **Cost impact**: A 20-instance ASG that should average 8 instances wastes 12 instances * $0.10/hr * 720 hrs/month = **$864/month** in idle capacity.
747
+
748
+ **Fix**: Configure aggressive but stable scale-down:
749
+ - Scale-down cooldown: 300 seconds (prevents flapping)
750
+ - Scale-down evaluation: 15 consecutive minutes below threshold
751
+ - Scale-down rate: 1-2 instances per evaluation period
752
+
753
+ ### 3. Scaling to Infinity (No Max Limit)
754
+
755
+ **What happens**: A bug, retry storm, or DDoS triggers unbounded scaling. Thousands of instances launch. Monthly bill: $50,000+.
756
+
757
+ **Real scenario**: A misconfigured health check returns 500 errors. Load balancer retries. Each retry increases load. Auto-scaling adds instances. New instances also return 500s. More retries. More scaling. 200 instances running within 10 minutes, none serving real traffic.
758
+
759
+ **Fix**: Always set `maxReplicas` / `MaxSize`. Set billing alerts at 150% and 200% of expected spend. Use AWS Service Quotas as a hard ceiling.
760
+
761
+ ### 4. Scaling Oscillation (Flapping)
762
+
763
+ **What happens**: Scale-up threshold at 70% CPU, scale-down at 60% CPU. Adding instances drops CPU to 55%. Scale-down triggers. Removing instances raises CPU to 75%. Scale-up triggers. Infinite loop.
764
+
765
+ **Fix**: Maintain at least a 20-percentage-point gap between scale-up and scale-down thresholds. Use stabilization windows: 0 seconds for scale-up, 300+ seconds for scale-down.
766
+
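Whether a pair of thresholds will flap can be checked before deploying it. The sketch below assumes total load stays constant and spreads evenly across instances, so utilization scales with `n / (n + added)`. The function names and the 11-to-15 instance example are hypothetical, chosen to match the 75% to 55% swing described above.

```python
def utilization_after_scale_out(current_util: float,
                                current_instances: int,
                                added: int) -> float:
    """Average utilization once the same load is spread over more instances."""
    return current_util * current_instances / (current_instances + added)


def will_flap(trigger_util: float,
              current_instances: int,
              added: int,
              scale_down_threshold: float) -> bool:
    """True if scaling out immediately drops utilization below the scale-down threshold."""
    new_util = utilization_after_scale_out(trigger_util, current_instances, added)
    return new_util < scale_down_threshold


# 11 instances at 75% CPU, scale out by 4: utilization lands at 55%
print(utilization_after_scale_out(75, 11, 4))
print(will_flap(75, 11, 4, scale_down_threshold=60))  # narrow gap
print(will_flap(75, 11, 4, scale_down_threshold=50))  # wider gap
```

The same check shows why large scale-out steps need a wider gap: the more instances added at once, the further utilization falls.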
767
+ ### 5. Ignoring Startup Time in Scaling Calculations
768
+
769
+ **What happens**: HPA targets 70% CPU. Application takes 60 seconds to start serving. During those 60 seconds, existing pods handle 100% of traffic at 90% CPU. HPA sees 90%, adds MORE pods. Overshoot by 2-3x.
770
+
771
+ **Fix**: Account for startup time with `behavior.scaleUp.stabilizationWindowSeconds`. Use readiness gates to prevent HPA from counting pods that are not yet serving traffic.
772
+
773
+ ### 6. HPA + VPA Conflict on Same Metric
774
+
775
+ **What happens**: Both scale on CPU. CPU rises. HPA adds pods. VPA increases CPU requests. More total CPU requested than available. Pods go Pending. Cluster Autoscaler adds nodes. Massive over-provisioning.
776
+
777
+ **Fix**: VPA for memory only. HPA for CPU and custom metrics.
778
+
779
+ ---
780
+
781
+ ## Before/After: Configuration Improvements
782
+
783
+ ### Case 1: E-Commerce API During Flash Sale
784
+
785
+ **Before** (naive configuration):
786
+ ```yaml
787
+ # HPA: CPU only, default settings
+ spec:
+   minReplicas: 2
+   maxReplicas: 10
+   metrics:
+   - type: Resource
+     resource:
+       name: cpu
+       target:
+         type: Utilization
+         averageUtilization: 80
798
+ ```
799
+
800
+ **Behavior during 10x traffic spike:**
801
+ - T=0: Traffic spike begins, CPU at 30%
802
+ - T=+60s: CPU reaches 80%, HPA triggers
803
+ - T=+120s: 2 new pods starting, CPU at 95%, P99 latency: 2,500ms
804
+ - T=+180s: New pods ready, but only at 4 total. Still need more.
805
+ - T=+300s: 6 pods running. CPU at 75%. P99 latency: 800ms
806
+ - T=+420s: 8 pods running. Stabilized. P99 latency: 200ms
807
+ - **Total degradation window: 7 minutes. Peak P99: 2,500ms**
808
+
809
+ **After** (optimized configuration):
810
+ ```yaml
811
+ spec:
+   minReplicas: 5   # Higher baseline for faster initial absorption
+   maxReplicas: 50  # Room to grow
+   metrics:
+   - type: Resource
+     resource:
+       name: cpu
+       target:
+         type: Utilization
+         averageUtilization: 60  # Lower target = earlier scaling
+   - type: Pods
+     pods:
+       metric:
+         name: http_requests_per_second
+       target:
+         type: AverageValue
+         averageValue: "200"  # Leading indicator
+   behavior:
+     scaleUp:
+       stabilizationWindowSeconds: 0
+       policies:
+       - type: Percent
+         value: 100  # Double capacity per minute
+         periodSeconds: 60
+       - type: Pods
+         value: 10  # Or add 10 pods, whichever is greater
+         periodSeconds: 60
+       selectPolicy: Max
+     scaleDown:
+       stabilizationWindowSeconds: 600
+       policies:
+       - type: Percent
+         value: 10
+         periodSeconds: 120
845
+ ```
846
+
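To see how the metrics and the `scaleUp` policies interact, the sketch below models one HPA evaluation. It uses the documented core formula, `desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)`, takes the highest result across metrics, and caps it with the two policies under `selectPolicy: Max`. It is a simplification: it ignores the HPA tolerance band, assumes no other scaling happened within the policy period, and leaves out the scale-down path. The function names and the example metric values are hypothetical.

```python
import math


def desired_replicas(current: int, metric_value: float, target_value: float) -> int:
    """Core HPA formula for a single metric."""
    return math.ceil(current * metric_value / target_value)


def scale_up_limit(current: int, percent: int = 100, pods: int = 10) -> int:
    """Largest replica count allowed this period when selectPolicy is Max."""
    by_percent = math.ceil(current * (1 + percent / 100))
    by_pods = current + pods
    return max(by_percent, by_pods)


def next_replicas(current: int, metrics, max_replicas: int = 50) -> int:
    """One scale-up evaluation: highest metric wins, then policy and maxReplicas cap it."""
    wanted = max(desired_replicas(current, value, target) for value, target in metrics)
    if wanted <= current:
        return current  # scale-down path intentionally not modeled here
    return min(wanted, scale_up_limit(current), max_replicas)


# 20 pods at 90% CPU (target 60) and 500 RPS per pod (target 200)
print(next_replicas(20, [(90, 60), (500, 200)]))
```

In this example the RPS metric asks for 50 replicas, but the policy cap allows only 40 in the first period, so reaching the full target takes a second evaluation.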
847
+ **Behavior during 10x traffic spike:**
848
+ - T=0: Traffic spike begins, 5 pods absorb initial burst
849
+ - T=+15s: HPA detects RPS increase (leading indicator), triggers scale-up
850
+ - T=+20s: 10 pods targeted (100% increase)
851
+ - T=+45s: 10 pods ready and serving. CPU at 65%. P99 latency: 250ms
852
+ - T=+60s: HPA evaluates again, adds 5 more pods (RPS still above target)
853
+ - T=+90s: 15 pods running. Stabilized. P99 latency: 150ms
854
+ - **Total degradation window: 45 seconds. Peak P99: 350ms**
855
+
856
+ **Improvement: 7 minutes degradation reduced to 45 seconds. Peak P99 reduced from 2,500ms to 350ms.**
857
+
858
+ ### Case 2: AWS ASG with Predictive Scaling
859
+
860
+ **Before** (reactive only):
861
+ ```
862
+ Policy: Target Tracking on CPU at 70%
863
+ Metrics: 5-minute CloudWatch intervals
864
+ Warm pool: None
865
+ Min: 2, Max: 20
866
+ ```
867
+ - Daily traffic pattern: 3x spike at 9 AM, ramp-down at 6 PM
868
+ - Every morning: 5-8 minutes of degraded performance (P99 >1s) while scaling from 2 to 8 instances
869
+ - Each instance takes 3 minutes to boot + pass health checks
870
+
871
+ **After** (predictive + reactive + warm pool):
872
+ ```
873
+ Policy: Predictive Scaling (forecast and scale mode) + Target Tracking at 65%
874
+ Metrics: 1-minute detailed monitoring
875
+ Warm pool: 4 Stopped instances
876
+ Min: 2, Max: 20
877
+ Predictive: provisions capacity 1 hour before predicted need
878
+ ```
879
+ - 8 AM: Predictive scaling launches 6 instances: 4 from the warm pool (30s boot from stopped) and 2 cold (about 3 minutes each)
880
+ - 8:30 AM: 8 instances ready before traffic arrives
881
+ - 9 AM: Traffic spike absorbed by pre-provisioned capacity. P99: 180ms
882
+ - Reactive target tracking handles any variance above prediction
883
+ - **Result: Zero degradation window. P99 stays under 200ms throughout the day.**
884
+
885
+ ### Case 3: Lambda Function with Provisioned Concurrency
886
+
887
+ **Before**:
888
+ ```
889
+ Runtime: Java 21
890
+ Memory: 512MB
891
+ Provisioned Concurrency: None
892
+ Cold start P50: 3,841ms
893
+ Cold start P99: 5,200ms
894
+ ```
895
+ - API Gateway timeout set to 10 seconds
896
+ - 5% of requests hit cold starts
897
+ - 0.3% of cold-start requests timeout entirely (>10s)
898
+ - User-facing error rate: 0.015%
899
+
900
+ **After**:
901
+ ```
902
+ Runtime: Java 21 with SnapStart
903
+ Memory: 1024MB (2x CPU allocation)
904
+ Provisioned Concurrency: 20 (covers P95 concurrent demand)
905
+ Cold start P50: 182ms (SnapStart for spillover)
906
+ Cold start P99: 700ms (SnapStart for spillover)
907
+ ```
908
+ - Provisioned handles 95% of invocations: 0ms cold start
909
+ - Spillover 5% uses SnapStart: 182ms P50 cold start
910
+ - Timeout rate: 0%
911
+ - **Result: P50 cold start reduced by 95%. Timeout errors eliminated.**
912
+
913
+ ---
914
+
915
+ ## Decision Tree: How Should I Configure Auto-Scaling?
916
+
917
+ ```
918
+ START: What type of workload?
+ │
+ ├─► Synchronous API (HTTP/gRPC)
+ │   │
+ │   ├─► Latency-sensitive (P99 < 200ms SLA)?
+ │   │   │
+ │   │   ├─► YES: Use HPA with request-rate metric + CPU backstop
+ │   │   │        Set minReplicas to handle 30% of peak
+ │   │   │        Use Karpenter (not CAS) for node scaling
+ │   │   │        Enable predictive scaling if traffic is patterned
+ │   │   │        Consider balloon pods for instant burst capacity
+ │   │   │
+ │   │   └─► NO:  Use HPA with CPU at 60-70% target
+ │   │            Default minReplicas (2-3 for HA)
+ │   │            Cluster Autoscaler is sufficient
+ │   │
+ │   └─► Serverless candidate? (< 1000 RPS, spiky traffic)
+ │       │
+ │       ├─► YES: Lambda + API Gateway
+ │       │        Use SnapStart for Java/.NET
+ │       │        Provisioned Concurrency if P99 < 500ms required
+ │       │        Arm64 (Graviton) for 13-24% faster cold starts
+ │       │
+ │       └─► NO:  Stay on containers (ECS/EKS)
+ │
+ ├─► Asynchronous Queue Processing
+ │   │
+ │   ├─► Can tolerate scale-to-zero? (no traffic = no cost)
+ │   │   │
+ │   │   ├─► YES: KEDA with queue-length scaler
+ │   │   │        Set target messages-per-pod based on:
+ │   │   │          acceptable_latency / avg_processing_time
+ │   │   │        Monitor oldest message age (not just depth)
+ │   │   │
+ │   │   └─► NO:  HPA with custom backlog-per-instance metric
+ │   │            minReplicas: 1-3 for always-on processing
+ │   │
+ │   └─► Processing time per message?
+ │       │
+ │       ├─► < 15 minutes: Lambda (SQS trigger, automatic scaling)
+ │       │
+ │       └─► > 15 minutes: ECS/EKS with KEDA or HPA
+ │
+ ├─► Batch / Scheduled Workload
+ │   │
+ │   ├─► Predictable schedule?
+ │   │   │
+ │   │   ├─► YES: Scheduled scaling (cron-based min/max)
+ │   │   │        + Target Tracking for variance
+ │   │   │        + Warm pool for fast scale-up
+ │   │   │
+ │   │   └─► NO:  Event-driven (KEDA or Step Functions)
+ │   │
+ │   └─► GPU required?
+ │       │
+ │       ├─► YES: Karpenter with GPU node pools
+ │       │        Scale on inference queue depth (KEDA)
+ │       │        Aggressive scale-down (GPU instances are expensive)
+ │       │
+ │       └─► NO:  Standard compute auto-scaling
+ │
+ └─► Stateful Workload (Database, Cache)
+     │
+     ├─► Vertical scaling first (larger instance type)
+     │   Until: single-instance limits reached OR cost prohibitive
+     │
+     ├─► Read scaling: Add read replicas with connection routing
+     │
+     ├─► Write scaling: Sharding (application-level partitioning)
+     │
+     └─► Managed auto-scaling:
+         ├─► Aurora: Auto-scales storage + read replicas
+         ├─► DynamoDB: On-demand or provisioned with auto-scaling
+         └─► ElastiCache: Cluster mode with shard auto-scaling
992
+ ```
993
+
994
+ ### Quick Reference: Scaling Configuration Checklist
995
+
996
+ ```
997
+ [ ] Set maxReplicas / MaxSize (NEVER leave unbounded)
998
+ [ ] Set minReplicas >= 2 for HA (production workloads)
999
+ [ ] Use request-based or queue-based metrics as PRIMARY scaling signal
1000
+ [ ] Use CPU as SECONDARY backstop only
1001
+ [ ] Configure scale-up: stabilizationWindowSeconds = 0
1002
+ [ ] Configure scale-down: stabilizationWindowSeconds >= 300
1003
+ [ ] Ensure 20%+ gap between scale-up and scale-down thresholds
1004
+ [ ] Test actual scaling speed end-to-end (don't assume)
1005
+ [ ] Set billing alerts at 150% and 200% of expected spend
1006
+ [ ] Monitor database connections as a function of instance count
1007
+ [ ] Implement connection pooling before scaling application tier
1008
+ [ ] Use warm pools or predictive scaling for boot times > 60s
1009
+ [ ] Separate VPA (memory) from HPA (CPU + custom) to avoid conflicts
1010
+ [ ] Configure PodDisruptionBudgets for scale-down safety
1011
+ [ ] Load test at 2x expected peak to validate scaling behavior
1012
+ ```
1013
+
1014
+ ---
1015
+
1016
+ ## Sources
1017
+
1018
+ - [AWS EC2 Auto Scaling Predictive Scaling Documentation](https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-predictive-scaling.html)
1019
+ - [AWS EC2 Auto Scaling Warm Pools](https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-warm-pools.html)
1020
+ - [AWS Target Tracking Scaling Policies](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-target-tracking.html)
1021
+ - [AWS Step and Simple Scaling Policies](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-simple-step.html)
1022
+ - [AWS SQS-Based Scaling Policy](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-using-sqs-queue.html)
1023
+ - [AWS Lambda Provisioned Concurrency](https://docs.aws.amazon.com/lambda/latest/dg/provisioned-concurrency.html)
1024
+ - [AWS Lambda Cold Start Benchmarks - maxday](https://maxday.github.io/lambda-perf/)
1025
+ - [AWS Lambda Cold Starts in 2025 - Edge Delta](https://edgedelta.com/company/knowledge-center/aws-lambda-cold-start-cost)
1026
+ - [AWS Lambda Cold Start Optimization 2025 - Zircon Tech](https://zircon.tech/blog/aws-lambda-cold-start-optimization-in-2025-what-actually-works/)
1027
+ - [AWS Lambda Cold Start 7 Fixes 2026 - AgileSoft Labs](https://www.agilesoftlabs.com/blog/2026/02/aws-lambda-cold-start-7-proven-fixes)
1028
+ - [AWS Lambda Arm64 vs x86_64 Performance - Chris Ebert](https://chrisebert.net/comparing-aws-lambda-arm64-vs-x86_64-performance-across-multiple-runtimes-in-late-2025/)
1029
+ - [Serverless Java Cold Start Solved 2025 - Devrim Ozcay](https://devrimozcay.medium.com/serverless-java-aws-lambda-cold-start-solved-in-2025-ea3d28c734c3)
1030
+ - [Reducing Fargate Startup with zstd - AWS Blog](https://aws.amazon.com/blogs/containers/reducing-aws-fargate-startup-times-with-zstd-compressed-container-images/)
1031
+ - [Taming Cold Starts on Fargate - AWS Plain English](https://aws.plainenglish.io/taming-cold-starts-on-aws-fargate-the-architecture-behind-sub-5-second-task-launches-622ebd73b051)
1032
+ - [Advanced Autoscaling Reduces AWS Costs by 70% - InfoQ](https://www.infoq.com/news/2025/08/autoscaling-karpenter-automode/)
1033
+ - [Kubernetes Autoscaling in 2025 - Sedai](https://www.sedai.io/blog/kubernetes-autoscaling)
1034
+ - [Kubernetes Best Practices 2025 - KodeKloud](https://kodekloud.com/blog/kubernetes-best-practices-2025/)
1035
+ - [HPA vs VPA Kubernetes Autoscaling 2025 - ScaleOps](https://scaleops.com/blog/hpa-vs-vpa-understanding-kubernetes-autoscaling-and-why-its-not-enough-in-2025/)
1036
+ - [Karpenter vs Cluster Autoscaler 2025 - ScaleOps](https://scaleops.com/blog/karpenter-vs-cluster-autoscaler/)
1037
+ - [Karpenter vs Cluster Autoscaler - Spacelift](https://spacelift.io/blog/karpenter-vs-cluster-autoscaler)
1038
+ - [Karpenter vs Cluster Autoscaler - PerfectScale](https://www.perfectscale.io/blog/karpenter-vs-cluster-autoscaler)
1039
+ - [KEDA - Kubernetes Event-Driven Autoscaling](https://keda.sh/)
1040
+ - [KEDA Practical Guide - Digital Power](https://medium.com/@digitalpower/kubernetes-based-event-driven-autoscaling-with-keda-a-practical-guide-ed29cf482e7b)
1041
+ - [HPA Custom Metrics with Prometheus Adapter](https://oneuptime.com/blog/post/2026-02-09-hpa-custom-metrics-prometheus-adapter/view)
1042
+ - [HPA Object Metrics for Queue-Based Scaling](https://oneuptime.com/blog/post/2026-02-09-hpa-object-metrics-queue/view)
1043
+ - [Custom Metrics Autoscaling in Kubernetes - Pixie Labs](https://blog.px.dev/autoscaling-custom-k8s-metric/)
1044
+ - [AWS ECS Auto Scaling with Custom Metrics - AWS Blog](https://aws.amazon.com/blogs/containers/amazon-elastic-container-service-ecs-auto-scaling-using-custom-metrics/)
1045
+ - [Scaling Depot: Thundering Herd Problem](https://depot.dev/blog/planetscale-to-reduce-the-thundering-herd)
1046
+ - [Thundering Herd Problem Explained - Dhairya Singla](https://medium.com/@work.dhairya.singla/the-thundering-herd-problem-explained-causes-examples-and-solutions-7166b7e26c0c)
1047
+ - [Thundering Herds: The Scalability Killer - Aonnis](https://docs.aonnis.com/blog/thundering-herds-the-scalability-killer)
1048
+ - [Hybrid Reactive-Proactive Auto-scaling - arXiv](https://www.arxiv.org/pdf/2512.14290)
1049
+ - [Proactive and Reactive Autoscaling for Edge Computing - arXiv](https://arxiv.org/pdf/2510.10166)
1050
+ - [Predictive Scaling with ML - Hokstad Consulting](https://hokstadconsulting.com/blog/predictive-scaling-with-machine-learning-how-it-works)
1051
+ - [CAST AI AWS Cost Optimization Report 2025](https://cast.ai/blog/aws-cost-optimization/)
1052
+ - [Horizontal vs Vertical Scaling - PingCAP](https://www.pingcap.com/horizontal-scaling-vs-vertical-scaling/)
1053
+ - [Horizontal vs Vertical Scaling - DataCamp](https://www.datacamp.com/blog/horizontal-vs-vertical-scaling)
1054
+ - [AWS EKS Autoscaling Best Practices](https://docs.aws.amazon.com/eks/latest/best-practices/cas.html)
1055
+ - [Azure AKS Performance and Scaling Best Practices - Microsoft](https://learn.microsoft.com/en-us/azure/aks/best-practices-performance-scale)
1056
+ - [Kubernetes Autoscaling Challenges - ScaleOps](https://scaleops.com/blog/kubernetes-autoscaling/)
1057
+ - [Lambda Provisioned Concurrency - Lumigo](https://lumigo.io/blog/provisioned-concurrency-the-end-of-cold-starts/)
1058
+ - [Lambda Provisioned Concurrency - Pulumi](https://www.pulumi.com/blog/aws-lambda-provisioned-concurrency-no-cold-starts/)
1059
+ - [AWS EC2 Auto Scaling Warm Pool Mixed Instances 2025](https://aws.amazon.com/about-aws/whats-new/2025/11/ec2-auto-scaling-warm-pool-mixed-instances-policies/)