@wazir-dev/cli 1.0.0
This diff shows the content of package versions published to one of the supported public registries. It is provided for informational purposes only and reflects the changes between versions as they appear in those registries.
- package/AGENTS.md +111 -0
- package/CHANGELOG.md +14 -0
- package/CONTRIBUTING.md +101 -0
- package/LICENSE +21 -0
- package/README.md +314 -0
- package/assets/composition-engine.mmd +34 -0
- package/assets/demo-script.sh +17 -0
- package/assets/logo-dark.svg +14 -0
- package/assets/logo.svg +14 -0
- package/assets/pipeline.mmd +39 -0
- package/assets/record-demo.sh +51 -0
- package/docs/README.md +51 -0
- package/docs/adapters/context-mode.md +60 -0
- package/docs/concepts/architecture.md +87 -0
- package/docs/concepts/artifact-model.md +60 -0
- package/docs/concepts/composition-engine.md +36 -0
- package/docs/concepts/indexing-and-recall.md +160 -0
- package/docs/concepts/observability.md +41 -0
- package/docs/concepts/roles-and-workflows.md +59 -0
- package/docs/concepts/terminology-policy.md +27 -0
- package/docs/getting-started/01-installation.md +78 -0
- package/docs/getting-started/02-first-run.md +102 -0
- package/docs/getting-started/03-adding-to-project.md +15 -0
- package/docs/getting-started/04-host-setup.md +15 -0
- package/docs/guides/ci-integration.md +15 -0
- package/docs/guides/creating-skills.md +15 -0
- package/docs/guides/expertise-module-authoring.md +15 -0
- package/docs/guides/hook-development.md +15 -0
- package/docs/guides/memory-and-learnings.md +34 -0
- package/docs/guides/multi-host-export.md +15 -0
- package/docs/guides/troubleshooting.md +101 -0
- package/docs/guides/writing-custom-roles.md +15 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-design.md +592 -0
- package/docs/plans/2026-03-15-cli-pipeline-integration-plan.md +598 -0
- package/docs/plans/2026-03-15-docs-enforcement-plan.md +238 -0
- package/docs/readmes/INDEX.md +99 -0
- package/docs/readmes/features/expertise/README.md +171 -0
- package/docs/readmes/features/exports/README.md +222 -0
- package/docs/readmes/features/hooks/README.md +103 -0
- package/docs/readmes/features/hooks/loop-cap-guard.md +133 -0
- package/docs/readmes/features/hooks/post-tool-capture.md +121 -0
- package/docs/readmes/features/hooks/post-tool-lint.md +130 -0
- package/docs/readmes/features/hooks/pre-compact-summary.md +122 -0
- package/docs/readmes/features/hooks/pre-tool-capture-route.md +100 -0
- package/docs/readmes/features/hooks/protected-path-write-guard.md +128 -0
- package/docs/readmes/features/hooks/session-start.md +119 -0
- package/docs/readmes/features/hooks/stop-handoff-harvest.md +125 -0
- package/docs/readmes/features/roles/README.md +157 -0
- package/docs/readmes/features/roles/clarifier.md +152 -0
- package/docs/readmes/features/roles/content-author.md +190 -0
- package/docs/readmes/features/roles/designer.md +193 -0
- package/docs/readmes/features/roles/executor.md +184 -0
- package/docs/readmes/features/roles/learner.md +210 -0
- package/docs/readmes/features/roles/planner.md +182 -0
- package/docs/readmes/features/roles/researcher.md +164 -0
- package/docs/readmes/features/roles/reviewer.md +184 -0
- package/docs/readmes/features/roles/specifier.md +162 -0
- package/docs/readmes/features/roles/verifier.md +215 -0
- package/docs/readmes/features/schemas/README.md +178 -0
- package/docs/readmes/features/skills/README.md +63 -0
- package/docs/readmes/features/skills/brainstorming.md +96 -0
- package/docs/readmes/features/skills/debugging.md +148 -0
- package/docs/readmes/features/skills/design.md +120 -0
- package/docs/readmes/features/skills/prepare-next.md +109 -0
- package/docs/readmes/features/skills/run-audit.md +159 -0
- package/docs/readmes/features/skills/scan-project.md +109 -0
- package/docs/readmes/features/skills/self-audit.md +176 -0
- package/docs/readmes/features/skills/tdd.md +137 -0
- package/docs/readmes/features/skills/using-skills.md +92 -0
- package/docs/readmes/features/skills/verification.md +120 -0
- package/docs/readmes/features/skills/writing-plans.md +104 -0
- package/docs/readmes/features/tooling/README.md +320 -0
- package/docs/readmes/features/workflows/README.md +186 -0
- package/docs/readmes/features/workflows/author.md +181 -0
- package/docs/readmes/features/workflows/clarify.md +154 -0
- package/docs/readmes/features/workflows/design-review.md +171 -0
- package/docs/readmes/features/workflows/design.md +169 -0
- package/docs/readmes/features/workflows/discover.md +162 -0
- package/docs/readmes/features/workflows/execute.md +173 -0
- package/docs/readmes/features/workflows/learn.md +167 -0
- package/docs/readmes/features/workflows/plan-review.md +165 -0
- package/docs/readmes/features/workflows/plan.md +170 -0
- package/docs/readmes/features/workflows/prepare-next.md +167 -0
- package/docs/readmes/features/workflows/review.md +169 -0
- package/docs/readmes/features/workflows/run-audit.md +191 -0
- package/docs/readmes/features/workflows/spec-challenge.md +159 -0
- package/docs/readmes/features/workflows/specify.md +160 -0
- package/docs/readmes/features/workflows/verify.md +177 -0
- package/docs/readmes/packages/README.md +50 -0
- package/docs/readmes/packages/ajv.md +117 -0
- package/docs/readmes/packages/context-mode.md +118 -0
- package/docs/readmes/packages/gray-matter.md +116 -0
- package/docs/readmes/packages/node-test.md +137 -0
- package/docs/readmes/packages/yaml.md +112 -0
- package/docs/reference/configuration-reference.md +159 -0
- package/docs/reference/expertise-index.md +52 -0
- package/docs/reference/git-flow.md +43 -0
- package/docs/reference/hooks.md +87 -0
- package/docs/reference/host-exports.md +50 -0
- package/docs/reference/launch-checklist.md +172 -0
- package/docs/reference/marketplace-listings.md +76 -0
- package/docs/reference/release-process.md +34 -0
- package/docs/reference/roles-reference.md +77 -0
- package/docs/reference/skills.md +33 -0
- package/docs/reference/templates.md +29 -0
- package/docs/reference/tooling-cli.md +94 -0
- package/docs/truth-claims.yaml +222 -0
- package/expertise/PROGRESS.md +63 -0
- package/expertise/README.md +18 -0
- package/expertise/antipatterns/PROGRESS.md +56 -0
- package/expertise/antipatterns/backend/api-design-antipatterns.md +1271 -0
- package/expertise/antipatterns/backend/auth-antipatterns.md +1195 -0
- package/expertise/antipatterns/backend/caching-antipatterns.md +622 -0
- package/expertise/antipatterns/backend/database-antipatterns.md +1038 -0
- package/expertise/antipatterns/backend/index.md +24 -0
- package/expertise/antipatterns/backend/microservices-antipatterns.md +850 -0
- package/expertise/antipatterns/code/architecture-antipatterns.md +919 -0
- package/expertise/antipatterns/code/async-antipatterns.md +622 -0
- package/expertise/antipatterns/code/code-smells.md +1186 -0
- package/expertise/antipatterns/code/dependency-antipatterns.md +1209 -0
- package/expertise/antipatterns/code/error-handling-antipatterns.md +1360 -0
- package/expertise/antipatterns/code/index.md +27 -0
- package/expertise/antipatterns/code/naming-and-abstraction.md +1118 -0
- package/expertise/antipatterns/code/state-management-antipatterns.md +1076 -0
- package/expertise/antipatterns/code/testing-antipatterns.md +1053 -0
- package/expertise/antipatterns/design/accessibility-antipatterns.md +1136 -0
- package/expertise/antipatterns/design/dark-patterns.md +1121 -0
- package/expertise/antipatterns/design/index.md +22 -0
- package/expertise/antipatterns/design/ui-antipatterns.md +1202 -0
- package/expertise/antipatterns/design/ux-antipatterns.md +680 -0
- package/expertise/antipatterns/frontend/css-layout-antipatterns.md +691 -0
- package/expertise/antipatterns/frontend/flutter-antipatterns.md +1827 -0
- package/expertise/antipatterns/frontend/index.md +23 -0
- package/expertise/antipatterns/frontend/mobile-antipatterns.md +573 -0
- package/expertise/antipatterns/frontend/react-antipatterns.md +1128 -0
- package/expertise/antipatterns/frontend/spa-antipatterns.md +1235 -0
- package/expertise/antipatterns/index.md +31 -0
- package/expertise/antipatterns/performance/index.md +20 -0
- package/expertise/antipatterns/performance/performance-antipatterns.md +1013 -0
- package/expertise/antipatterns/performance/premature-optimization.md +623 -0
- package/expertise/antipatterns/performance/scaling-antipatterns.md +785 -0
- package/expertise/antipatterns/process/ai-coding-antipatterns.md +853 -0
- package/expertise/antipatterns/process/code-review-antipatterns.md +656 -0
- package/expertise/antipatterns/process/deployment-antipatterns.md +920 -0
- package/expertise/antipatterns/process/index.md +23 -0
- package/expertise/antipatterns/process/technical-debt-antipatterns.md +647 -0
- package/expertise/antipatterns/security/index.md +20 -0
- package/expertise/antipatterns/security/secrets-antipatterns.md +849 -0
- package/expertise/antipatterns/security/security-theater.md +843 -0
- package/expertise/antipatterns/security/vulnerability-patterns.md +801 -0
- package/expertise/architecture/PROGRESS.md +70 -0
- package/expertise/architecture/data/caching-architecture.md +671 -0
- package/expertise/architecture/data/data-consistency.md +574 -0
- package/expertise/architecture/data/data-modeling.md +536 -0
- package/expertise/architecture/data/event-streams-and-queues.md +634 -0
- package/expertise/architecture/data/index.md +25 -0
- package/expertise/architecture/data/search-architecture.md +663 -0
- package/expertise/architecture/data/sql-vs-nosql.md +708 -0
- package/expertise/architecture/decisions/architecture-decision-records.md +640 -0
- package/expertise/architecture/decisions/build-vs-buy.md +616 -0
- package/expertise/architecture/decisions/index.md +23 -0
- package/expertise/architecture/decisions/monolith-to-microservices.md +790 -0
- package/expertise/architecture/decisions/technology-selection.md +616 -0
- package/expertise/architecture/distributed/cap-theorem-and-tradeoffs.md +800 -0
- package/expertise/architecture/distributed/circuit-breaker-bulkhead.md +741 -0
- package/expertise/architecture/distributed/consensus-and-coordination.md +796 -0
- package/expertise/architecture/distributed/distributed-systems-fundamentals.md +564 -0
- package/expertise/architecture/distributed/idempotency-and-retry.md +796 -0
- package/expertise/architecture/distributed/index.md +25 -0
- package/expertise/architecture/distributed/saga-pattern.md +797 -0
- package/expertise/architecture/foundations/architectural-thinking.md +460 -0
- package/expertise/architecture/foundations/coupling-and-cohesion.md +770 -0
- package/expertise/architecture/foundations/design-principles-solid.md +649 -0
- package/expertise/architecture/foundations/domain-driven-design.md +719 -0
- package/expertise/architecture/foundations/index.md +25 -0
- package/expertise/architecture/foundations/separation-of-concerns.md +472 -0
- package/expertise/architecture/foundations/twelve-factor-app.md +797 -0
- package/expertise/architecture/index.md +34 -0
- package/expertise/architecture/integration/api-design-graphql.md +638 -0
- package/expertise/architecture/integration/api-design-grpc.md +804 -0
- package/expertise/architecture/integration/api-design-rest.md +892 -0
- package/expertise/architecture/integration/index.md +25 -0
- package/expertise/architecture/integration/third-party-integration.md +795 -0
- package/expertise/architecture/integration/webhooks-and-callbacks.md +1152 -0
- package/expertise/architecture/integration/websockets-realtime.md +791 -0
- package/expertise/architecture/mobile-architecture/index.md +22 -0
- package/expertise/architecture/mobile-architecture/mobile-app-architecture.md +780 -0
- package/expertise/architecture/mobile-architecture/mobile-backend-for-frontend.md +670 -0
- package/expertise/architecture/mobile-architecture/offline-first.md +719 -0
- package/expertise/architecture/mobile-architecture/push-and-sync.md +782 -0
- package/expertise/architecture/patterns/cqrs-event-sourcing.md +717 -0
- package/expertise/architecture/patterns/event-driven.md +797 -0
- package/expertise/architecture/patterns/hexagonal-clean-architecture.md +870 -0
- package/expertise/architecture/patterns/index.md +27 -0
- package/expertise/architecture/patterns/layered-architecture.md +736 -0
- package/expertise/architecture/patterns/microservices.md +753 -0
- package/expertise/architecture/patterns/modular-monolith.md +692 -0
- package/expertise/architecture/patterns/monolith.md +626 -0
- package/expertise/architecture/patterns/plugin-architecture.md +735 -0
- package/expertise/architecture/patterns/serverless.md +780 -0
- package/expertise/architecture/scaling/database-scaling.md +615 -0
- package/expertise/architecture/scaling/feature-flags-and-rollouts.md +757 -0
- package/expertise/architecture/scaling/horizontal-vs-vertical.md +606 -0
- package/expertise/architecture/scaling/index.md +24 -0
- package/expertise/architecture/scaling/multi-tenancy.md +800 -0
- package/expertise/architecture/scaling/stateless-design.md +787 -0
- package/expertise/backend/embedded-firmware.md +625 -0
- package/expertise/backend/go.md +853 -0
- package/expertise/backend/index.md +24 -0
- package/expertise/backend/java-spring.md +448 -0
- package/expertise/backend/node-typescript.md +625 -0
- package/expertise/backend/python-fastapi.md +724 -0
- package/expertise/backend/rust.md +458 -0
- package/expertise/backend/solidity.md +711 -0
- package/expertise/composition-map.yaml +443 -0
- package/expertise/content/foundations/content-modeling.md +395 -0
- package/expertise/content/foundations/editorial-standards.md +449 -0
- package/expertise/content/foundations/index.md +24 -0
- package/expertise/content/foundations/microcopy.md +455 -0
- package/expertise/content/foundations/terminology-governance.md +509 -0
- package/expertise/content/index.md +34 -0
- package/expertise/content/patterns/accessibility-copy.md +518 -0
- package/expertise/content/patterns/index.md +24 -0
- package/expertise/content/patterns/notification-content.md +433 -0
- package/expertise/content/patterns/sample-content.md +486 -0
- package/expertise/content/patterns/state-copy.md +439 -0
- package/expertise/design/PROGRESS.md +58 -0
- package/expertise/design/disciplines/dark-mode-theming.md +577 -0
- package/expertise/design/disciplines/design-systems.md +595 -0
- package/expertise/design/disciplines/index.md +25 -0
- package/expertise/design/disciplines/information-architecture.md +800 -0
- package/expertise/design/disciplines/interaction-design.md +788 -0
- package/expertise/design/disciplines/responsive-design.md +552 -0
- package/expertise/design/disciplines/usability-testing.md +516 -0
- package/expertise/design/disciplines/user-research.md +792 -0
- package/expertise/design/foundations/accessibility-design.md +796 -0
- package/expertise/design/foundations/color-theory.md +797 -0
- package/expertise/design/foundations/iconography.md +795 -0
- package/expertise/design/foundations/index.md +26 -0
- package/expertise/design/foundations/motion-and-animation.md +653 -0
- package/expertise/design/foundations/rtl-design.md +585 -0
- package/expertise/design/foundations/spacing-and-layout.md +607 -0
- package/expertise/design/foundations/typography.md +800 -0
- package/expertise/design/foundations/visual-hierarchy.md +761 -0
- package/expertise/design/index.md +32 -0
- package/expertise/design/patterns/authentication-flows.md +474 -0
- package/expertise/design/patterns/content-consumption.md +789 -0
- package/expertise/design/patterns/data-display.md +618 -0
- package/expertise/design/patterns/e-commerce.md +1494 -0
- package/expertise/design/patterns/feedback-and-states.md +642 -0
- package/expertise/design/patterns/forms-and-input.md +819 -0
- package/expertise/design/patterns/gamification.md +801 -0
- package/expertise/design/patterns/index.md +31 -0
- package/expertise/design/patterns/microinteractions.md +449 -0
- package/expertise/design/patterns/navigation.md +800 -0
- package/expertise/design/patterns/notifications.md +705 -0
- package/expertise/design/patterns/onboarding.md +700 -0
- package/expertise/design/patterns/search-and-filter.md +601 -0
- package/expertise/design/patterns/settings-and-preferences.md +768 -0
- package/expertise/design/patterns/social-and-community.md +748 -0
- package/expertise/design/platforms/desktop-native.md +612 -0
- package/expertise/design/platforms/index.md +25 -0
- package/expertise/design/platforms/mobile-android.md +825 -0
- package/expertise/design/platforms/mobile-cross-platform.md +983 -0
- package/expertise/design/platforms/mobile-ios.md +699 -0
- package/expertise/design/platforms/tablet.md +794 -0
- package/expertise/design/platforms/web-dashboard.md +790 -0
- package/expertise/design/platforms/web-responsive.md +550 -0
- package/expertise/design/psychology/behavioral-nudges.md +449 -0
- package/expertise/design/psychology/cognitive-load.md +1191 -0
- package/expertise/design/psychology/error-psychology.md +778 -0
- package/expertise/design/psychology/index.md +22 -0
- package/expertise/design/psychology/persuasive-design.md +736 -0
- package/expertise/design/psychology/user-mental-models.md +623 -0
- package/expertise/design/tooling/open-pencil.md +266 -0
- package/expertise/frontend/angular.md +1073 -0
- package/expertise/frontend/desktop-electron.md +546 -0
- package/expertise/frontend/flutter.md +782 -0
- package/expertise/frontend/index.md +27 -0
- package/expertise/frontend/native-android.md +409 -0
- package/expertise/frontend/native-ios.md +490 -0
- package/expertise/frontend/react-native.md +1160 -0
- package/expertise/frontend/react.md +808 -0
- package/expertise/frontend/vue.md +1089 -0
- package/expertise/humanize/domain-rules-code.md +79 -0
- package/expertise/humanize/domain-rules-content.md +67 -0
- package/expertise/humanize/domain-rules-technical-docs.md +56 -0
- package/expertise/humanize/index.md +35 -0
- package/expertise/humanize/self-audit-checklist.md +87 -0
- package/expertise/humanize/sentence-patterns.md +218 -0
- package/expertise/humanize/vocabulary-blacklist.md +105 -0
- package/expertise/i18n/PROGRESS.md +65 -0
- package/expertise/i18n/advanced/accessibility-and-i18n.md +28 -0
- package/expertise/i18n/advanced/bidirectional-text-algorithm.md +38 -0
- package/expertise/i18n/advanced/complex-scripts.md +30 -0
- package/expertise/i18n/advanced/performance-and-i18n.md +27 -0
- package/expertise/i18n/advanced/testing-i18n.md +28 -0
- package/expertise/i18n/content/content-adaptation.md +23 -0
- package/expertise/i18n/content/locale-specific-formatting.md +23 -0
- package/expertise/i18n/content/machine-translation-integration.md +28 -0
- package/expertise/i18n/content/translation-management.md +29 -0
- package/expertise/i18n/foundations/date-time-calendars.md +67 -0
- package/expertise/i18n/foundations/i18n-architecture.md +272 -0
- package/expertise/i18n/foundations/locale-and-language-tags.md +79 -0
- package/expertise/i18n/foundations/numbers-currency-units.md +61 -0
- package/expertise/i18n/foundations/pluralization-and-gender.md +109 -0
- package/expertise/i18n/foundations/string-externalization.md +236 -0
- package/expertise/i18n/foundations/text-direction-bidi.md +241 -0
- package/expertise/i18n/foundations/unicode-and-encoding.md +86 -0
- package/expertise/i18n/index.md +38 -0
- package/expertise/i18n/platform/backend-i18n.md +31 -0
- package/expertise/i18n/platform/flutter-i18n.md +148 -0
- package/expertise/i18n/platform/native-android-i18n.md +36 -0
- package/expertise/i18n/platform/native-ios-i18n.md +36 -0
- package/expertise/i18n/platform/react-i18n.md +103 -0
- package/expertise/i18n/platform/web-css-i18n.md +81 -0
- package/expertise/i18n/rtl/arabic-specific.md +175 -0
- package/expertise/i18n/rtl/hebrew-specific.md +149 -0
- package/expertise/i18n/rtl/rtl-animations-and-transitions.md +111 -0
- package/expertise/i18n/rtl/rtl-forms-and-input.md +161 -0
- package/expertise/i18n/rtl/rtl-fundamentals.md +211 -0
- package/expertise/i18n/rtl/rtl-icons-and-images.md +181 -0
- package/expertise/i18n/rtl/rtl-layout-mirroring.md +252 -0
- package/expertise/i18n/rtl/rtl-navigation-and-gestures.md +107 -0
- package/expertise/i18n/rtl/rtl-testing-and-qa.md +147 -0
- package/expertise/i18n/rtl/rtl-typography.md +160 -0
- package/expertise/index.md +113 -0
- package/expertise/index.yaml +216 -0
- package/expertise/infrastructure/cloud-aws.md +597 -0
- package/expertise/infrastructure/cloud-gcp.md +599 -0
- package/expertise/infrastructure/cybersecurity.md +816 -0
- package/expertise/infrastructure/database-mongodb.md +447 -0
- package/expertise/infrastructure/database-postgres.md +400 -0
- package/expertise/infrastructure/devops-cicd.md +787 -0
- package/expertise/infrastructure/index.md +27 -0
- package/expertise/performance/PROGRESS.md +50 -0
- package/expertise/performance/backend/api-latency.md +1204 -0
- package/expertise/performance/backend/background-jobs.md +506 -0
- package/expertise/performance/backend/connection-pooling.md +1209 -0
- package/expertise/performance/backend/database-query-optimization.md +515 -0
- package/expertise/performance/backend/index.md +23 -0
- package/expertise/performance/backend/rate-limiting-and-throttling.md +971 -0
- package/expertise/performance/foundations/algorithmic-complexity.md +954 -0
- package/expertise/performance/foundations/caching-strategies.md +489 -0
- package/expertise/performance/foundations/concurrency-and-parallelism.md +847 -0
- package/expertise/performance/foundations/index.md +24 -0
- package/expertise/performance/foundations/measuring-and-profiling.md +440 -0
- package/expertise/performance/foundations/memory-management.md +964 -0
- package/expertise/performance/foundations/performance-budgets.md +1314 -0
- package/expertise/performance/index.md +31 -0
- package/expertise/performance/infrastructure/auto-scaling.md +1059 -0
- package/expertise/performance/infrastructure/cdn-and-edge.md +1081 -0
- package/expertise/performance/infrastructure/index.md +22 -0
- package/expertise/performance/infrastructure/load-balancing.md +1081 -0
- package/expertise/performance/infrastructure/observability.md +1079 -0
- package/expertise/performance/mobile/index.md +23 -0
- package/expertise/performance/mobile/mobile-animations.md +544 -0
- package/expertise/performance/mobile/mobile-memory-battery.md +416 -0
- package/expertise/performance/mobile/mobile-network.md +452 -0
- package/expertise/performance/mobile/mobile-rendering.md +599 -0
- package/expertise/performance/mobile/mobile-startup-time.md +505 -0
- package/expertise/performance/platform-specific/flutter-performance.md +647 -0
- package/expertise/performance/platform-specific/index.md +22 -0
- package/expertise/performance/platform-specific/node-performance.md +1307 -0
- package/expertise/performance/platform-specific/postgres-performance.md +1366 -0
- package/expertise/performance/platform-specific/react-performance.md +1403 -0
- package/expertise/performance/web/bundle-optimization.md +1239 -0
- package/expertise/performance/web/image-and-media.md +636 -0
- package/expertise/performance/web/index.md +24 -0
- package/expertise/performance/web/network-optimization.md +1133 -0
- package/expertise/performance/web/rendering-performance.md +1098 -0
- package/expertise/performance/web/ssr-and-hydration.md +918 -0
- package/expertise/performance/web/web-vitals.md +1374 -0
- package/expertise/quality/accessibility.md +985 -0
- package/expertise/quality/evidence-based-verification.md +499 -0
- package/expertise/quality/index.md +24 -0
- package/expertise/quality/ml-model-audit.md +614 -0
- package/expertise/quality/performance.md +600 -0
- package/expertise/quality/testing-api.md +891 -0
- package/expertise/quality/testing-mobile.md +496 -0
- package/expertise/quality/testing-web.md +849 -0
- package/expertise/security/PROGRESS.md +54 -0
- package/expertise/security/agentic-identity.md +540 -0
- package/expertise/security/compliance-frameworks.md +601 -0
- package/expertise/security/data/data-encryption.md +364 -0
- package/expertise/security/data/data-privacy-gdpr.md +692 -0
- package/expertise/security/data/database-security.md +1171 -0
- package/expertise/security/data/index.md +22 -0
- package/expertise/security/data/pii-handling.md +531 -0
- package/expertise/security/foundations/authentication.md +1041 -0
- package/expertise/security/foundations/authorization.md +603 -0
- package/expertise/security/foundations/cryptography.md +1001 -0
- package/expertise/security/foundations/index.md +25 -0
- package/expertise/security/foundations/owasp-top-10.md +1354 -0
- package/expertise/security/foundations/secrets-management.md +1217 -0
- package/expertise/security/foundations/secure-sdlc.md +700 -0
- package/expertise/security/foundations/supply-chain-security.md +698 -0
- package/expertise/security/index.md +31 -0
- package/expertise/security/infrastructure/cloud-security-aws.md +1296 -0
- package/expertise/security/infrastructure/cloud-security-gcp.md +1376 -0
- package/expertise/security/infrastructure/container-security.md +721 -0
- package/expertise/security/infrastructure/incident-response.md +1295 -0
- package/expertise/security/infrastructure/index.md +24 -0
- package/expertise/security/infrastructure/logging-and-monitoring.md +1618 -0
- package/expertise/security/infrastructure/network-security.md +1337 -0
- package/expertise/security/mobile/index.md +23 -0
- package/expertise/security/mobile/mobile-android-security.md +1218 -0
- package/expertise/security/mobile/mobile-binary-protection.md +1229 -0
- package/expertise/security/mobile/mobile-data-storage.md +1265 -0
- package/expertise/security/mobile/mobile-ios-security.md +1401 -0
- package/expertise/security/mobile/mobile-network-security.md +1520 -0
- package/expertise/security/smart-contract-security.md +594 -0
- package/expertise/security/testing/index.md +22 -0
- package/expertise/security/testing/penetration-testing.md +1258 -0
- package/expertise/security/testing/security-code-review.md +1765 -0
- package/expertise/security/testing/threat-modeling.md +1074 -0
- package/expertise/security/testing/vulnerability-scanning.md +1062 -0
- package/expertise/security/web/api-security.md +586 -0
- package/expertise/security/web/cors-and-headers.md +433 -0
- package/expertise/security/web/csrf.md +562 -0
- package/expertise/security/web/file-upload.md +1477 -0
- package/expertise/security/web/index.md +25 -0
- package/expertise/security/web/injection.md +1375 -0
- package/expertise/security/web/session-management.md +1101 -0
- package/expertise/security/web/xss.md +1158 -0
- package/exports/README.md +17 -0
- package/exports/hosts/claude/.claude/agents/clarifier.md +42 -0
- package/exports/hosts/claude/.claude/agents/content-author.md +63 -0
- package/exports/hosts/claude/.claude/agents/designer.md +55 -0
- package/exports/hosts/claude/.claude/agents/executor.md +55 -0
- package/exports/hosts/claude/.claude/agents/learner.md +51 -0
- package/exports/hosts/claude/.claude/agents/planner.md +53 -0
- package/exports/hosts/claude/.claude/agents/researcher.md +43 -0
- package/exports/hosts/claude/.claude/agents/reviewer.md +54 -0
- package/exports/hosts/claude/.claude/agents/specifier.md +47 -0
- package/exports/hosts/claude/.claude/agents/verifier.md +71 -0
- package/exports/hosts/claude/.claude/commands/author.md +42 -0
- package/exports/hosts/claude/.claude/commands/clarify.md +38 -0
- package/exports/hosts/claude/.claude/commands/design-review.md +46 -0
- package/exports/hosts/claude/.claude/commands/design.md +44 -0
- package/exports/hosts/claude/.claude/commands/discover.md +37 -0
- package/exports/hosts/claude/.claude/commands/execute.md +48 -0
- package/exports/hosts/claude/.claude/commands/learn.md +38 -0
- package/exports/hosts/claude/.claude/commands/plan-review.md +42 -0
- package/exports/hosts/claude/.claude/commands/plan.md +39 -0
- package/exports/hosts/claude/.claude/commands/prepare-next.md +37 -0
- package/exports/hosts/claude/.claude/commands/review.md +40 -0
- package/exports/hosts/claude/.claude/commands/run-audit.md +41 -0
- package/exports/hosts/claude/.claude/commands/spec-challenge.md +41 -0
- package/exports/hosts/claude/.claude/commands/specify.md +38 -0
- package/exports/hosts/claude/.claude/commands/verify.md +37 -0
- package/exports/hosts/claude/.claude/settings.json +34 -0
- package/exports/hosts/claude/CLAUDE.md +19 -0
- package/exports/hosts/claude/export.manifest.json +38 -0
- package/exports/hosts/claude/host-package.json +67 -0
- package/exports/hosts/codex/AGENTS.md +19 -0
- package/exports/hosts/codex/export.manifest.json +38 -0
- package/exports/hosts/codex/host-package.json +41 -0
- package/exports/hosts/cursor/.cursor/hooks.json +16 -0
- package/exports/hosts/cursor/.cursor/rules/wazir-core.mdc +19 -0
- package/exports/hosts/cursor/export.manifest.json +38 -0
- package/exports/hosts/cursor/host-package.json +42 -0
- package/exports/hosts/gemini/GEMINI.md +19 -0
- package/exports/hosts/gemini/export.manifest.json +38 -0
- package/exports/hosts/gemini/host-package.json +41 -0
- package/hooks/README.md +18 -0
- package/hooks/definitions/loop_cap_guard.yaml +21 -0
- package/hooks/definitions/post_tool_capture.yaml +24 -0
- package/hooks/definitions/pre_compact_summary.yaml +19 -0
- package/hooks/definitions/pre_tool_capture_route.yaml +19 -0
- package/hooks/definitions/protected_path_write_guard.yaml +19 -0
- package/hooks/definitions/session_start.yaml +19 -0
- package/hooks/definitions/stop_handoff_harvest.yaml +20 -0
- package/hooks/loop-cap-guard +17 -0
- package/hooks/post-tool-lint +36 -0
- package/hooks/protected-path-write-guard +17 -0
- package/hooks/session-start +41 -0
- package/llms-full.txt +2355 -0
- package/llms.txt +43 -0
- package/package.json +79 -0
- package/roles/README.md +20 -0
- package/roles/clarifier.md +42 -0
- package/roles/content-author.md +63 -0
- package/roles/designer.md +55 -0
- package/roles/executor.md +55 -0
- package/roles/learner.md +51 -0
- package/roles/planner.md +53 -0
- package/roles/researcher.md +43 -0
- package/roles/reviewer.md +54 -0
- package/roles/specifier.md +47 -0
- package/roles/verifier.md +71 -0
- package/schemas/README.md +24 -0
- package/schemas/accepted-learning.schema.json +20 -0
- package/schemas/author-artifact.schema.json +156 -0
- package/schemas/clarification.schema.json +19 -0
- package/schemas/design-artifact.schema.json +80 -0
- package/schemas/docs-claim.schema.json +18 -0
- package/schemas/export-manifest.schema.json +20 -0
- package/schemas/hook.schema.json +67 -0
- package/schemas/host-export-package.schema.json +18 -0
- package/schemas/implementation-plan.schema.json +19 -0
- package/schemas/proposed-learning.schema.json +19 -0
- package/schemas/research.schema.json +18 -0
- package/schemas/review.schema.json +29 -0
- package/schemas/run-manifest.schema.json +18 -0
- package/schemas/spec-challenge.schema.json +18 -0
- package/schemas/spec.schema.json +20 -0
- package/schemas/usage.schema.json +102 -0
- package/schemas/verification-proof.schema.json +29 -0
- package/schemas/wazir-manifest.schema.json +173 -0
- package/skills/README.md +40 -0
- package/skills/brainstorming/SKILL.md +77 -0
- package/skills/debugging/SKILL.md +50 -0
- package/skills/design/SKILL.md +61 -0
- package/skills/dispatching-parallel-agents/SKILL.md +128 -0
- package/skills/executing-plans/SKILL.md +70 -0
- package/skills/finishing-a-development-branch/SKILL.md +169 -0
- package/skills/humanize/SKILL.md +123 -0
- package/skills/init-pipeline/SKILL.md +124 -0
- package/skills/prepare-next/SKILL.md +20 -0
- package/skills/receiving-code-review/SKILL.md +123 -0
- package/skills/requesting-code-review/SKILL.md +105 -0
- package/skills/requesting-code-review/code-reviewer.md +108 -0
- package/skills/run-audit/SKILL.md +197 -0
- package/skills/scan-project/SKILL.md +41 -0
- package/skills/self-audit/SKILL.md +153 -0
- package/skills/subagent-driven-development/SKILL.md +154 -0
- package/skills/subagent-driven-development/code-quality-reviewer-prompt.md +26 -0
- package/skills/subagent-driven-development/implementer-prompt.md +102 -0
- package/skills/subagent-driven-development/spec-reviewer-prompt.md +61 -0
- package/skills/tdd/SKILL.md +23 -0
- package/skills/using-git-worktrees/SKILL.md +163 -0
- package/skills/using-skills/SKILL.md +95 -0
- package/skills/verification/SKILL.md +22 -0
- package/skills/wazir/SKILL.md +463 -0
- package/skills/writing-plans/SKILL.md +30 -0
- package/skills/writing-skills/SKILL.md +157 -0
- package/skills/writing-skills/anthropic-best-practices.md +122 -0
- package/skills/writing-skills/persuasion-principles.md +50 -0
- package/templates/README.md +20 -0
- package/templates/artifacts/README.md +10 -0
- package/templates/artifacts/accepted-learning.md +19 -0
- package/templates/artifacts/accepted-learning.template.json +12 -0
- package/templates/artifacts/author.md +74 -0
- package/templates/artifacts/author.template.json +19 -0
- package/templates/artifacts/clarification.md +21 -0
- package/templates/artifacts/clarification.template.json +12 -0
- package/templates/artifacts/execute-notes.md +19 -0
- package/templates/artifacts/implementation-plan.md +21 -0
- package/templates/artifacts/implementation-plan.template.json +11 -0
- package/templates/artifacts/learning-proposal.md +19 -0
- package/templates/artifacts/next-run-handoff.md +21 -0
- package/templates/artifacts/plan-review.md +19 -0
- package/templates/artifacts/proposed-learning.template.json +12 -0
- package/templates/artifacts/research.md +21 -0
- package/templates/artifacts/research.template.json +12 -0
- package/templates/artifacts/review-findings.md +19 -0
- package/templates/artifacts/review.template.json +11 -0
- package/templates/artifacts/run-manifest.template.json +8 -0
- package/templates/artifacts/spec-challenge.md +19 -0
- package/templates/artifacts/spec-challenge.template.json +11 -0
- package/templates/artifacts/spec.md +21 -0
- package/templates/artifacts/spec.template.json +12 -0
- package/templates/artifacts/verification-proof.md +19 -0
- package/templates/artifacts/verification-proof.template.json +11 -0
- package/templates/examples/accepted-learning.example.json +14 -0
- package/templates/examples/author.example.json +152 -0
- package/templates/examples/clarification.example.json +15 -0
- package/templates/examples/docs-claim.example.json +8 -0
- package/templates/examples/export-manifest.example.json +7 -0
- package/templates/examples/host-export-package.example.json +11 -0
- package/templates/examples/implementation-plan.example.json +17 -0
- package/templates/examples/proposed-learning.example.json +13 -0
- package/templates/examples/research.example.json +15 -0
- package/templates/examples/research.example.md +6 -0
- package/templates/examples/review.example.json +17 -0
- package/templates/examples/run-manifest.example.json +9 -0
- package/templates/examples/spec-challenge.example.json +14 -0
- package/templates/examples/spec.example.json +21 -0
- package/templates/examples/verification-proof.example.json +21 -0
- package/templates/examples/wazir-manifest.example.yaml +65 -0
- package/templates/task-definition-schema.md +99 -0
- package/tooling/README.md +20 -0
- package/tooling/src/adapters/context-mode.js +50 -0
- package/tooling/src/capture/command.js +376 -0
- package/tooling/src/capture/store.js +99 -0
- package/tooling/src/capture/usage.js +270 -0
- package/tooling/src/checks/branches.js +50 -0
- package/tooling/src/checks/brand-truth.js +110 -0
- package/tooling/src/checks/changelog.js +231 -0
- package/tooling/src/checks/command-registry.js +36 -0
- package/tooling/src/checks/commits.js +102 -0
- package/tooling/src/checks/docs-drift.js +103 -0
- package/tooling/src/checks/docs-truth.js +201 -0
- package/tooling/src/checks/runtime-surface.js +156 -0
- package/tooling/src/cli.js +116 -0
- package/tooling/src/command-options.js +56 -0
- package/tooling/src/commands/validate.js +320 -0
- package/tooling/src/doctor/command.js +91 -0
- package/tooling/src/export/command.js +77 -0
- package/tooling/src/export/compiler.js +498 -0
- package/tooling/src/guards/loop-cap-guard.js +52 -0
- package/tooling/src/guards/protected-path-write-guard.js +67 -0
- package/tooling/src/index/command.js +152 -0
- package/tooling/src/index/storage.js +1061 -0
- package/tooling/src/index/summarizers.js +261 -0
- package/tooling/src/loaders.js +18 -0
- package/tooling/src/project-root.js +22 -0
- package/tooling/src/recall/command.js +225 -0
- package/tooling/src/schema-validator.js +30 -0
- package/tooling/src/state-root.js +40 -0
- package/tooling/src/status/command.js +71 -0
- package/wazir.manifest.yaml +135 -0
- package/workflows/README.md +19 -0
- package/workflows/author.md +42 -0
- package/workflows/clarify.md +38 -0
- package/workflows/design-review.md +46 -0
- package/workflows/design.md +44 -0
- package/workflows/discover.md +37 -0
- package/workflows/execute.md +48 -0
- package/workflows/learn.md +38 -0
- package/workflows/plan-review.md +42 -0
- package/workflows/plan.md +39 -0
- package/workflows/prepare-next.md +37 -0
- package/workflows/review.md +40 -0
- package/workflows/run-audit.md +41 -0
- package/workflows/spec-challenge.md +41 -0
- package/workflows/specify.md +38 -0
- package/workflows/verify.md +37 -0
@@ -0,0 +1,1059 @@

# Auto-Scaling Performance Expertise Module

> **Domain**: Infrastructure Performance
> **Last Updated**: 2026-03-08
> **Confidence Level**: High (benchmarks from production systems, AWS documentation, and peer-reviewed sources)

---

## Table of Contents

1. [Overview and Scaling Taxonomy](#overview-and-scaling-taxonomy)
2. [Horizontal vs Vertical Scaling: Performance Tradeoffs](#horizontal-vs-vertical-scaling-performance-tradeoffs)
3. [Kubernetes HPA and VPA](#kubernetes-hpa-and-vpa)
4. [Kubernetes Node Scaling: Cluster Autoscaler vs Karpenter](#kubernetes-node-scaling-cluster-autoscaler-vs-karpenter)
5. [KEDA: Event-Driven Autoscaling](#keda-event-driven-autoscaling)
6. [AWS Auto Scaling Policies](#aws-auto-scaling-policies)
7. [Serverless Scaling and Cold Starts](#serverless-scaling-and-cold-starts)
8. [Custom Metrics for Scaling](#custom-metrics-for-scaling)
9. [Scaling Speed: Time-to-Ready Analysis](#scaling-speed-time-to-ready-analysis)
10. [Warm Pool Strategies](#warm-pool-strategies)
11. [Cost vs Performance Tradeoffs](#cost-vs-performance-tradeoffs)
12. [Common Bottlenecks](#common-bottlenecks)
13. [Anti-Patterns](#anti-patterns)
14. [Before/After: Configuration Improvements](#beforeafter-configuration-improvements)
15. [Decision Tree: How Should I Configure Auto-Scaling?](#decision-tree-how-should-i-configure-auto-scaling)
16. [Sources](#sources)

---

## Overview and Scaling Taxonomy

Auto-scaling is the automatic adjustment of compute resources in response to demand.
The three fundamental dimensions are:

| Dimension | Mechanism | Latency to Effect | Cost Profile |
|---|---|---|---|
| **Horizontal** (scale out/in) | Add/remove instances or pods | 45s-4min (pods), 1-5min (VMs) | Linear with instance count |
| **Vertical** (scale up/down) | Resize CPU/memory of existing units | 0s (in-place VPA) to 2min (restart) | Step function at instance type boundaries |
| **Functional** | Offload to specialized services | N/A (architectural) | Varies by service |

Scaling triggers fall into two categories:

- **Reactive**: Respond to observed metric thresholds (CloudWatch alarms, HPA polling). Scaling-related latency of 2-5 minutes is typical.
- **Predictive/Proactive**: Use ML models to forecast demand and pre-provision capacity. Reduces scaling-related latency by 65-80% compared to reactive approaches. Hybrid approaches reduce average response time by 35% while maintaining resource utilization above 75%.

---

## Horizontal vs Vertical Scaling: Performance Tradeoffs

### Horizontal Scaling (Scale Out)

**Strengths:**
- Near-linear throughput increase for stateless workloads (adding 10 pods yields ~10x throughput for embarrassingly parallel work)
- No upper hardware ceiling -- scale to thousands of nodes
- Built-in fault tolerance -- losing 1 of 20 instances loses only 5% capacity
- Geographic distribution reduces client latency by 30-70ms per continent hop

**Performance costs:**
- Network latency between instances: 0.1-0.5ms intra-AZ, 0.5-2ms cross-AZ, 50-150ms cross-region
- Load balancer overhead: 0.05-0.2ms per request for ALB/NLB
- Distributed state coordination: Two-Phase Commit adds 2-10ms per transaction
- Session affinity complexity: sticky sessions reduce effective capacity by 15-30%

**Best for:** Stateless APIs, web frontends, worker queues, microservices with >1000 RPS

### Vertical Scaling (Scale Up)

**Strengths:**
- Zero distributed systems overhead -- all data local to one machine
- Complex SQL joins execute 2-5x faster than cross-shard equivalents
- ACID transactions without distributed coordination
- Simpler operational model -- 1 server to monitor, backup, tune

**Performance costs:**
- Hardware ceiling: largest EC2 instance (u-24tb1.112xlarge) has 448 vCPUs and 24TB RAM
- Scaling requires downtime for non-live-resize platforms: 1-5 minutes during resize
- Single point of failure without replication
- Diminishing returns: doubling CPU from 64 to 128 cores yields <2x throughput due to lock contention

**Best for:** Relational databases, in-memory caches, legacy monoliths, workloads with <500 RPS

### Head-to-Head Comparison

| Metric | Horizontal | Vertical |
|---|---|---|
| P99 latency (stateless API) | +0.5-2ms (cross-node) | Baseline |
| Max throughput ceiling | Effectively unlimited | Hardware-bound |
| Time to scale | 45s-5min | 1-5min (restart) |
| Data consistency | Requires coordination | Native ACID |
| Cost at 10x load | ~10x baseline | 3-8x baseline (non-linear pricing) |
| Failure blast radius | 1/N capacity lost | 100% capacity lost |

---

## Kubernetes HPA and VPA

### Horizontal Pod Autoscaler (HPA)

The HPA polls metrics every **15 seconds** (default `--horizontal-pod-autoscaler-sync-period`) and adjusts replica count based on the ratio of current to target metric values.

**Scaling formula:**
```
desiredReplicas = ceil(currentReplicas * (currentMetricValue / desiredMetricValue))
```

**Performance characteristics:**
- Reaction time to demand change: **2-4 minutes** end-to-end (15s poll + 3min stabilization window)
- Scale-up stabilization window: **0 seconds** (default, immediate)
- Scale-down stabilization window: **300 seconds** (default, prevents flapping)
- Tolerance band: 10% (no scaling if metric is within 0.9x-1.1x of target)
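
The formula and the tolerance band combine into just a few lines. A minimal Python sketch (the `desired_replicas` helper is an illustrative name, not part of the Kubernetes API):

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float, tolerance: float = 0.10) -> int:
    """Sketch of the HPA replica calculation described above."""
    ratio = current_metric / target_metric
    # Within the tolerance band (0.9x-1.1x of target): no scaling action.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)

# 10 pods at 80% CPU against a 60% target -> scale out
print(desired_replicas(10, 80, 60))  # 14
print(desired_replicas(10, 63, 60))  # 10 (within the 10% tolerance band)
```

Note how `ceil` makes scale-up slightly aggressive: 10 pods at 80% against a 60% target yields 14 replicas, not 13.3.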

**Recommended configuration for latency-sensitive workloads:**
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-server
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 3
  maxReplicas: 50
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 60  # Leave 40% headroom for spikes
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"  # Scale on business metric too
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100  # Allow doubling per scale event
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10  # Scale down slowly (10%/min)
        periodSeconds: 60
```

**Key insight**: Organizations using multiple metric types (CPU + custom metrics) for HPA scaling decisions experience fewer outages during traffic surges compared to CPU-only configurations, per the 2024 Kubernetes Benchmark Report.

### Vertical Pod Autoscaler (VPA)

VPA adjusts CPU and memory requests/limits for individual pods based on historical usage.

**Modes:**
| Mode | Behavior | Disruption | Use Case |
|---|---|---|---|
| `Off` | Recommendations only | None | Capacity planning |
| `Initial` | Sets requests at pod creation | None (existing pods) | Batch jobs |
| `Auto` | Evicts and recreates pods | Pod restart (5-30s) | Stateless services |
| `InPlace` (beta, K8s 1.32+) | Resizes without restart | None | Latency-sensitive |

**Real-world optimization results:**
- MongoDB cluster (3 replicas): VPA reduced memory requests from 6GB to 3.41GB, saving 4.2GB across the cluster
- etcd deployment: VPA recommended 93m CPU (vs. 10m initial) and 599MB memory, preventing OOMKills
- Typical memory right-sizing: 20-40% reduction in requested resources

**Critical constraint**: Do NOT run HPA and VPA on the same metric (e.g., both scaling on CPU). HPA adds pods because CPU is high; VPA increases CPU limits because CPU is high. They will fight each other in a scaling seesaw. Use VPA for memory right-sizing and HPA for horizontal scaling on CPU or custom metrics.

### Combining HPA + VPA Effectively

```
VPA handles: memory requests/limits (right-sizing)
HPA handles: replica count based on CPU utilization + custom metrics
Result: Covers ~80% of use cases without conflict
```

---

## Kubernetes Node Scaling: Cluster Autoscaler vs Karpenter

### Cluster Autoscaler (CAS)

- **Architecture**: Periodic scan loop (default 10-second interval)
- **Scaling mechanism**: Manages pre-defined Auto Scaling Groups (ASGs) with fixed instance types
- **Node provisioning time**: **3-4 minutes** end-to-end (scan cycle + ASG spin-up)
- **Scan interval tradeoff**: Increasing the scan interval from 10s to 60s cuts API calls by 6x but slows scale-up by 38%

### Karpenter

- **Architecture**: Event-driven reconciliation -- each pending pod immediately triggers provisioning
- **Scaling mechanism**: Direct cloud provider API calls, no ASG dependency
- **Node provisioning time**: **45-60 seconds** in AWS benchmarks
- **Spot interruption recovery**: Can replace a Spot node within the 2-minute interruption notice window

### Performance Comparison

| Metric | Cluster Autoscaler | Karpenter |
|---|---|---|
| Pod-to-running latency | 3-4 minutes | 45-60 seconds |
| Instance type flexibility | Fixed per node group | Any type per pod spec |
| Bin-packing efficiency | Moderate (pre-defined groups) | High (right-sized per workload) |
| Cost reduction (reported) | Baseline | Up to 70% vs CAS |
| Spot instance support | Via ASG mixed instances | Native, with consolidation |
| Scale-down intelligence | Node utilization threshold | Active consolidation (replaces underutilized nodes) |

**Production benchmark** (SaaS workload): Karpenter brought CPU-bound pods online in ~55 seconds, while Cluster Autoscaler required 3-4 minutes -- primarily ASG spin-up time.

AWS introduced **EKS Auto Mode** (November 2024), which abstracts node management entirely. Early adopters report 60-70% cost savings and 80% reduction in infrastructure management time.

---

## KEDA: Event-Driven Autoscaling

KEDA (Kubernetes Event-Driven Autoscaling) extends HPA with external event sources.

**Key capability**: Scale to zero when idle, scale from zero on first event. This is impossible with standard HPA.

**Supported scalers**: 60+ including Kafka, RabbitMQ, AWS SQS, Azure Service Bus, PostgreSQL, Redis, Prometheus, Datadog, HTTP request count.

**Architecture:**
```
Event Source (e.g., SQS queue)
        |
        v
KEDA Metrics Server --> HPA --> Deployment
        |
        v
ScaledObject CRD (defines thresholds)
```

**Performance characteristics:**
- Metric polling interval: configurable, typically 15-30 seconds
- Scale-from-zero latency: container startup time + pod scheduling (typically 5-30 seconds)
- Scale-to-zero cooldown: configurable (default 300 seconds)

**Example: SQS queue-based scaling:**
```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor
spec:
  scaleTargetRef:
    name: order-processor
  minReplicaCount: 0  # Scale to zero when queue is empty
  maxReplicaCount: 100
  triggers:
  - type: aws-sqs-queue
    metadata:
      queueURL: https://sqs.us-east-1.amazonaws.com/123456789/orders
      queueLength: "5"  # Target 5 messages per pod
      awsRegion: us-east-1
```

**When to use KEDA over HPA:**
- Queue-based workloads that should scale to zero
- Event-driven architectures (Kafka consumers, webhook processors)
- Workloads driven by external metrics (database row count, API rate)

---

## AWS Auto Scaling Policies

### Target Tracking Scaling

Automatically adjusts capacity to keep a metric at a target value. AWS **strongly recommends** this as the default policy type.

**Behavior:**
- Scales out aggressively (proportional to metric overshoot)
- Scales in gradually (conservative to avoid flapping)
- Creates and manages CloudWatch alarms automatically
- Uses 1-minute metrics for fastest response (recommended over 5-minute defaults)

**Pre-defined metrics:**
| Metric | Typical Target | Best For |
|---|---|---|
| `ASGAverageCPUUtilization` | 50-70% | General compute |
| `ALBRequestCountPerTarget` | 100-1000 | Web APIs |
| `ASGAverageNetworkOut` | Varies | Data processing |
| Custom CloudWatch metric | Application-specific | Business logic |

**Example configuration:**
```json
{
  "TargetTrackingScalingPolicyConfiguration": {
    "TargetValue": 60.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "ScaleInCooldown": 300,
    "ScaleOutCooldown": 60,
    "DisableScaleIn": false
  }
}
```

### Step Scaling

Provides graduated scaling responses based on alarm breach severity.

**Advantages over target tracking:**
- Fine-grained control: small load increase adds 1 instance, large surge adds 10
- Multiple step adjustments prevent over-provisioning for moderate increases
- Better for workloads with non-linear resource requirements

**Example step configuration:**
```
CPU 60-70%  → Add 1 instance
CPU 70-80%  → Add 3 instances
CPU 80-90%  → Add 5 instances
CPU >90%    → Add 10 instances
```
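
The graduated response above amounts to a threshold lookup. A minimal Python sketch (boundaries copied from the example; `instances_to_add` and `STEPS` are illustrative names):

```python
# (lower_bound_percent, instances_to_add), checked from the highest step down.
STEPS = [(90, 10), (80, 5), (70, 3), (60, 1)]

def instances_to_add(cpu_percent: float) -> int:
    """Graduated response: larger alarm breaches trigger larger adjustments."""
    for lower_bound, adjustment in STEPS:
        if cpu_percent > lower_bound:
            return adjustment
    return 0  # below the 60% alarm threshold: no scale-out

print(instances_to_add(65))  # 1
print(instances_to_add(95))  # 10
```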

### Predictive Scaling

Uses ML models trained on 14 days of historical data to forecast demand and pre-provision capacity.

**Performance benefits:**
- Reduces scaling-related latency by 65-80% vs reactive approaches
- Pre-provisions capacity before demand spike arrives
- Reduces underprovisioned intervals by 45-60% vs threshold-based approaches

**Best suited for:**
- Cyclical traffic (business hours vs off-hours): daily patterns with 3-5x variation
- Recurring batch processing windows
- Applications with long initialization (>60 seconds bootstrap)

**Requirements:**
- Minimum 24 hours of historical data (14 days recommended)
- Traffic must have repeating patterns (random traffic defeats prediction)
- Forecasts generated every 6 hours, capacity provisioned 1 hour before predicted need

**Cost savings**: 20-30% reduction in infrastructure costs for workloads with recognizable patterns, because capacity is right-sized rather than over-provisioned as a buffer.

### Policy Selection Guide

| Scenario | Recommended Policy | Why |
|---|---|---|
| General web API | Target Tracking on ALBRequestCount | Proportional, self-managing |
| CPU-intensive batch | Step Scaling on CPU | Graduated response to load levels |
| Daily traffic pattern | Predictive + Target Tracking | Pre-warm + reactive fallback |
| Queue processing | Target Tracking on custom backlog metric | Proportional to actual work |
| Scheduled events (sales) | Scheduled + Target Tracking | Guaranteed minimum + dynamic |

---

## Serverless Scaling and Cold Starts

### AWS Lambda Cold Start Benchmarks

Cold start time = INIT phase (runtime bootstrap + dependency loading + function initialization).

**By runtime (simple functions, 2025 benchmarks):**

| Runtime | P50 Cold Start | P99 Cold Start | Warm Invocation P50 |
|---|---|---|---|
| Python 3.12 | 100-200ms | 300-500ms | 1-5ms |
| Node.js 20 | 100-200ms | 300-600ms | 1-5ms |
| Go (provided.al2023) | 8-15ms | 30-50ms | 0.5-2ms |
| Rust (provided.al2023) | 8-15ms | 30-50ms | 0.5-2ms |
| Java 21 (no SnapStart) | 3,000-4,000ms | 5,000-6,000ms | 2-10ms |
| Java 21 (SnapStart) | 150-200ms | 600-700ms | 2-10ms |
| .NET 8 (Native AOT) | 200-400ms | 600-1,000ms | 1-5ms |

**Key finding**: Java SnapStart reduces P50 cold starts from 3,841ms to 182ms -- a **95% reduction** at the median. SnapStart expanded to Python (November 2024) and .NET 8 with Native AOT.

**Factors that multiply cold start time:**
- VPC attachment: historically added 10+ seconds, now <1 second with Hyperplane ENIs
- Package size: each additional 1MB adds ~2-5ms to INIT
- Dependency count: heavy frameworks (Spring Boot, Django) add 500-3000ms
- Memory allocation: 128MB vs 1024MB can mean 3x slower INIT (CPU scales with memory)
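
One common mitigation for the dependency-loading cost is to do expensive setup once at module scope, during INIT, so warm invocations skip it. A minimal Python sketch (the commented-out boto3 client is a hypothetical stand-in for any heavy dependency; the handler follows the standard Lambda signature):

```python
import json

# Module scope runs once per execution environment (the INIT phase).
# Heavy imports and client construction belong here, not in the handler:
# import boto3                          # hypothetical heavy dependency
# _client = boto3.client("dynamodb")    # built once, reused while warm

_cache = {}  # module-level state survives across warm invocations

def handler(event, context):
    # Warm invocations skip INIT entirely and reuse module-level state.
    key = event.get("id", "default")
    entry = _cache.setdefault(key, {"hits": 0})
    entry["hits"] += 1
    return {"statusCode": 200, "body": json.dumps(entry)}
```

The same placement is what makes SnapStart effective: the snapshot is taken after INIT, so work done at module scope is captured in it.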

**Architecture impact (Arm64 vs x86_64):**
Graviton2-based arm64 Lambda functions show **13-24% faster cold start initialization** at equivalent memory settings.

**Billing change (August 2025):** AWS now bills for the Lambda INIT phase, making cold start frequency a direct cost factor in addition to a latency concern.

### Container Cold Starts (ECS Fargate, Kubernetes)

Container cold starts are 10-100x slower than Lambda cold starts.

**Fargate cold start breakdown (production benchmarks):**

| Phase | Duration | Optimization |
|---|---|---|
| ENI Provisioning | 10-30 seconds | Cannot optimize (platform) |
| Image Pull | 5-60 seconds | Use SOCI, smaller images, ECR in same region |
| Layer Extraction | 2-15 seconds | Use zstd compression (27% reduction) |
| Application Bootstrap | 1-10 seconds | Optimize startup code, lazy init |
| **Total (unoptimized)** | **20-60 seconds** | -- |
| **Total (optimized)** | **3-8 seconds** | SOCI + small image + zstd |

**Optimization results:**
- SOCI (Seekable OCI) lazy loading: **50% startup acceleration**; 10GB Deep Learning Container showed ~60% improvement in pull times
- zstd compression: up to **27% reduction** in task/pod startup time
- Production achievement (Prime Day 2025): P99 cold starts reduced from 38 seconds to **under 4 seconds**

**Kubernetes pod startup time (typical):**

| Component | Duration |
|---|---|
| Scheduling decision | 0.5-2 seconds |
| Image pull (cached) | 0-1 seconds |
| Image pull (uncached, 500MB) | 5-20 seconds |
| Container start | 0.5-2 seconds |
| Readiness probe pass | 1-30 seconds (app-dependent) |
| **Total (cached image)** | **2-5 seconds** |
| **Total (cold pull)** | **10-30 seconds** |

---

## Custom Metrics for Scaling

CPU and memory utilization are lagging indicators. By the time CPU hits 80%, users are already experiencing degraded performance. Custom metrics provide **leading indicators** of demand.

### Metric Categories and Use Cases

**Queue-Based Metrics (most responsive for async workloads):**

| Metric | How to Calculate Target | Example |
|---|---|---|
| Backlog per instance | acceptable_latency / avg_processing_time | 10s latency / 0.1s per msg = 100 msgs/instance |
| Queue depth | total_messages / target_per_instance | 5000 msgs / 50 per pod = 100 pods |
| Age of oldest message | Alert if > SLA threshold | Scale if oldest > 30 seconds |

**Important**: Scale on backlog-per-instance, not raw queue depth. Raw depth does not account for processing speed or current instance count.

**Request-Based Metrics (best for synchronous APIs):**

| Metric | Target | When to Use |
|---|---|---|
| Requests per second per pod | 50-500 (benchmark your app) | HTTP APIs with known capacity |
| P95 response latency | Your SLA target (e.g., 200ms) | Latency-sensitive services |
| Error rate (5xx) | 0.1-1% | Overload detection |
| Active connections per instance | 80% of max (e.g., 800 of 1000) | Connection-limited services |

**Business Metrics (most aligned with value):**

| Metric | Example | Benefit |
|---|---|---|
| Orders per minute | Scale checkout service at 50 orders/min/pod | Directly tied to revenue |
| Active users | Scale at 1000 concurrent users per instance | Capacity planning |
| Payment queue depth | Scale payment processor per backlog | SLA compliance |
| Search queries per second | Scale search cluster at 200 QPS/node | User experience |

### Implementing Custom Metrics in Kubernetes

**Prometheus Adapter example (exposing HTTP request rate to HPA):**
```yaml
# prometheus-adapter-config
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total"
    as: "${1}_per_second"
  metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'
```

**HPA using the custom metric:**
```yaml
metrics:
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "200"  # Scale when avg exceeds 200 RPS/pod
```

### AWS Custom Metric Scaling (SQS Example)

The recommended approach for SQS is to calculate **backlog per instance**:

```
backlog_per_instance = ApproximateNumberOfMessagesVisible / RunningTaskCount
target_backlog = acceptable_latency / average_processing_time
```

If average processing time = 0.1 seconds and acceptable latency = 10 seconds:
- Target backlog per instance = 10 / 0.1 = **100 messages**
- With 5000 messages in queue and target of 100: desired = 5000 / 100 = **50 instances**
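
The worked example above, as a Python sketch (`sqs_scaling` is an illustrative helper; the rounding choices are assumptions):

```python
import math

def sqs_scaling(visible_messages: int, acceptable_latency_s: float,
                avg_processing_time_s: float) -> tuple:
    """Backlog-per-instance math from the worked example above."""
    # round() guards against float artifacts (e.g., 10 / 0.1 is not exactly 100)
    target_backlog = round(acceptable_latency_s / avg_processing_time_s)
    # ceil: one extra instance beats breaching the latency SLA
    desired_instances = math.ceil(visible_messages / target_backlog)
    return target_backlog, desired_instances

# 10s acceptable latency, 0.1s per message, 5000 visible messages
print(sqs_scaling(5000, 10, 0.1))  # (100, 50)
```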
|
|
493
|
+
|
|
494
|
+
**Critical monitoring**: Track both queue depth AND age of oldest message. Scaling on depth alone may miss important messages aging past SLA thresholds.
|
|
495
|
+
|
|
496
|
+
---
|
|
497
|
+
|
|
498
|
+
## Scaling Speed: Time-to-Ready Analysis
|
|
499
|
+
|
|
500
|
+
The total time from "demand increase detected" to "new capacity serving traffic" varies dramatically by platform.
|
|
501
|
+
|
|
502
|
+
### End-to-End Scaling Timeline
|
|
503
|
+
|
|
504
|
+
```
|
|
505
|
+
Detection Provisioning Healthy
|
|
506
|
+
───────── ──────────── ───────
|
|
507
|
+
Lambda (warm): 0s 0s 0s = 0s total
|
|
508
|
+
Lambda (cold): 0s 0.1-5s 0s = 0.1-5s total
|
|
509
|
+
K8s Pod (cached): 15-30s 2-5s 1-30s = 18-65s total
|
|
510
|
+
K8s Pod (Karpenter): 0-5s 45-60s 5-30s = 50-95s total
|
|
511
|
+
K8s Pod (CAS): 10-30s 180-240s 5-30s = 195-300s total
|
|
512
|
+
EC2 (ASG): 60-300s 60-180s 30-120s = 150-600s total
|
|
513
|
+
EC2 (Warm Pool): 60-300s 5-30s 5-30s = 70-360s total
|
|
514
|
+
Fargate (cold): 15-60s 20-60s 1-30s = 36-150s total
|
|
515
|
+
Fargate (optimized): 15-60s 3-8s 1-30s = 19-98s total
|
|
516
|
+
```
|
|
517
|
+
|
|
518
|
+
### Breakdown of Detection Phase

| Scaling System | Detection Mechanism | Detection Latency |
|---|---|---|
| HPA | Polling (15s default) | 0-15 seconds |
| KEDA | Polling (configurable) | 0-30 seconds |
| Karpenter | Event-driven (pending pods) | 0-5 seconds |
| Cluster Autoscaler | Scan loop (10s default) | 0-10 seconds |
| AWS Target Tracking | CloudWatch alarm (1-5 min) | 60-300 seconds |
| AWS Predictive | ML forecast (1hr ahead) | Pre-provisioned |

### Key Takeaway

For latency-sensitive workloads that must handle spikes within 60 seconds, the only viable options are:

1. **Over-provision** (maintain 30-50% headroom)
2. **Lambda** (instant scaling if functions are warm, <5s cold)
3. **Pre-warmed capacity** (warm pools, provisioned concurrency, min replicas)
4. **Predictive scaling** (if traffic is patterned)

---

## Warm Pool Strategies

Warm pools pre-initialize resources so they can be placed into service faster than cold-starting new ones.

### AWS EC2 Warm Pools

**Instance states in warm pool:**

| State | Boot Time to Service | Cost | Use Case |
|---|---|---|---|
| Running | 5-10 seconds | Full instance cost | Ultra-fast scaling, short spikes |
| Stopped | 30-60 seconds | EBS storage cost only | Cost-effective for moderate latency tolerance |
| Hibernated | 10-30 seconds | EBS storage + memory snapshot | Stateful apps, OS-level caches |

**Configuration:**
```json
{
  "WarmPool": {
    "MinSize": 2,
    "MaxGroupPreparedCapacity": 10,
    "PoolState": "Stopped",
    "InstanceReusePolicy": {
      "ReuseOnScaleIn": true
    }
  }
}
```

**Key consideration**: If the warm pool is depleted during a scale-out event, instances launch cold (full boot). Size your warm pool to cover the expected burst magnitude.

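A rough way to size the pool from traffic estimates; every number below is hypothetical, and the per-instance throughput must come from your own load tests:

```python
import math

def warm_pool_size(burst_rps: float, steady_rps: float,
                   rps_per_instance: float) -> int:
    """Warm instances needed to absorb a burst without any cold boots."""
    extra_rps = max(0.0, burst_rps - steady_rps)
    return math.ceil(extra_rps / rps_per_instance)

# Hypothetical: bursts to 3,000 RPS from a 1,000 RPS baseline, 250 RPS/instance
print(warm_pool_size(3000, 1000, 250))  # → 8
```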
**Recent expansion (2025)**: AWS added warm pool support for Auto Scaling groups with mixed instances policies, enabling Spot + On-Demand warm pools.

### Lambda Provisioned Concurrency

Pre-initializes a specified number of execution environments, eliminating cold starts entirely for those instances.

**Performance:**
- Cold start: **0ms** for provisioned instances
- Spillover: Standard cold start if demand exceeds the provisioned count
- Scaling: Provisioned concurrency can be scheduled or managed via Application Auto Scaling

**Cost model:**
- Provisioned rate: ~60% cheaper per GB-second than on-demand execution
- But: you pay 24/7 even with zero requests
- Break-even: typically cost-effective at >100 invocations/hour consistently

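To see where provisioned concurrency pays off, here is a cost sketch. The $/GB-second rates are placeholders shaped like published us-east-1 Lambda pricing; substitute current regional prices before drawing conclusions:

```python
HOURS_PER_MONTH = 720

# Placeholder $/GB-second rates; check current regional pricing.
ON_DEMAND = 0.0000166667    # standard execution
PC_HELD = 0.0000041667      # charged per GB-s the environment is kept warm
PC_EXEC = 0.0000097222      # execution rate while using provisioned capacity

def on_demand_cost(inv_per_hour, duration_s, mem_gb):
    """Monthly duration cost with no provisioned concurrency."""
    return inv_per_hour * HOURS_PER_MONTH * duration_s * mem_gb * ON_DEMAND

def provisioned_cost(inv_per_hour, duration_s, mem_gb, envs):
    """Monthly cost: a fixed charge for held environments plus cheaper execution."""
    held = envs * mem_gb * HOURS_PER_MONTH * 3600 * PC_HELD
    execd = inv_per_hour * HOURS_PER_MONTH * duration_s * mem_gb * PC_EXEC
    return held + execd

# Low volume: paying to keep an environment warm dominates
print(on_demand_cost(100, 0.2, 1) < provisioned_cost(100, 0.2, 1, 1))      # True
# High, steady volume: the cheaper execution rate wins
print(provisioned_cost(20000, 0.2, 1, 1) < on_demand_cost(20000, 0.2, 1))  # True
```

Note this models cost only; the latency argument for provisioned concurrency (zero cold starts) can justify it well below the cost break-even.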
**Strategic warm-up alternative**: A CloudWatch Events timer every 5 minutes sending concurrent warm-up requests achieves 80-95% warm availability at 5-15% of the cost of full provisioned concurrency.

### Kubernetes Warm Strategies

**Over-provisioning with priority-based preemption:**
```yaml
# Low-priority "balloon" pods that hold capacity
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: overprovisioning
value: -1 # Lowest priority
preemptionPolicy: Never
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: overprovisioning
spec:
  replicas: 3
  selector:
    matchLabels:
      run: overprovisioning
  template:
    metadata:
      labels:
        run: overprovisioning
    spec:
      priorityClassName: overprovisioning
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "2"
              memory: "4Gi"
```

When real workloads need resources, balloon pods are evicted instantly (0 seconds). New pods schedule on the freed capacity without waiting for node provisioning.

**Effective warm capacity**: 3 balloon pods x 2 CPU x 4Gi = 6 CPU and 12Gi always available for burst. Cost: ~$200-400/month for 3 medium instances.

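As a quick check on the numbers above (the per-vCPU rate is a hypothetical on-demand price, chosen only to land in the doc's rough range):

```python
replicas, cpu_per_pod, mem_gib_per_pod = 3, 2, 4

warm_cpu = replicas * cpu_per_pod        # cores held in reserve
warm_mem = replicas * mem_gib_per_pod    # GiB held in reserve

usd_per_vcpu_hour = 0.05                 # hypothetical rate
monthly_cost = warm_cpu * usd_per_vcpu_hour * 730
print(warm_cpu, warm_mem, round(monthly_cost))  # → 6 12 219
```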
---

## Cost vs Performance Tradeoffs

### The Fundamental Tension

```
Over-provisioning                        Under-provisioning
(high cost, low latency)                 (low cost, high latency risk)
        |                                        |
        | ←  Sweet Spot: 60-70% utilization  →  |
        |                                        |
30% idle capacity                        Users hit latency spikes
$$$$ wasted                              during scale-up lag
Zero scaling lag                         2-5 min degraded performance
```

**Industry benchmark (CAST AI Report 2025):** Average resource utilization across cloud providers is 67% (AWS) and 66% (GCP). This means 33-34% of compute spend is wasted on idle capacity -- but this waste buys protection against scaling lag.

### Cost-Performance Matrix

| Strategy | Cost Overhead | P99 Latency During Spike | Time to Full Capacity |
|---|---|---|---|
| Always over-provisioned (50% headroom) | +50% baseline | 0ms impact | 0 seconds |
| Moderate headroom (20%) + reactive HPA | +20% baseline | +50-200ms for 2-4 min | 2-4 minutes |
| Tight provisioning + aggressive HPA | +5% baseline | +200-500ms for 3-5 min | 3-5 minutes |
| Predictive + reactive hybrid | +10-15% baseline | +20-50ms for 30-60s | 30-60 seconds |
| Scale-to-zero (KEDA/Lambda) | Pay-per-use only | Cold start penalty | 0.1s-30s |

### GPU Workload Waste

GPU workloads often suffer from high idle time and unused memory. A single A100 costs ~$3/hour; idle GPU capacity at scale translates to tens of thousands in monthly waste. Auto-scaling GPU workloads with Karpenter or KEDA (based on inference queue depth) can reduce GPU costs by 40-60%.

### Right-Sizing Formula

```
target_capacity = peak_demand * (1 + safety_margin)
safety_margin   = scaling_time / acceptable_degradation_time

Example:
  Peak demand: 100 pods
  Scaling time: 3 minutes (180s)
  Acceptable degradation: 1 minute (60s)
  Safety margin: 180/60 = 3.0 (300%)
  Target capacity: 100 * 4 = 400 pods ← Unsustainable!

Better approach:
  Use predictive scaling (reduces effective scaling time to 30s)
  Safety margin: 30/60 = 0.5 (50%)
  Target capacity: 100 * 1.5 = 150 pods ← Manageable
```

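The same formula as a function, reproducing the worked example (a sketch; `math.ceil` just rounds fractional pods up):

```python
import math

def target_capacity(peak_demand: int, scaling_time_s: float,
                    acceptable_degradation_s: float) -> int:
    """Headroom grows with how long scaling lags what users will tolerate."""
    safety_margin = scaling_time_s / acceptable_degradation_s
    return math.ceil(peak_demand * (1 + safety_margin))

print(target_capacity(100, 180, 60))  # → 400 (reactive-only: unsustainable)
print(target_capacity(100, 30, 60))   # → 150 (predictive: manageable)
```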
---

## Common Bottlenecks

### 1. Slow Scale-Up

**Symptom**: Latency spikes lasting 3-10 minutes during traffic increases.

**Root causes and fixes:**

| Root Cause | Impact | Fix | Improvement |
|---|---|---|---|
| Large container images (>1GB) | +20-60s pull time | Multi-stage builds, distroless base | 50-80% smaller images |
| Slow health checks | +30-120s before serving | Separate liveness/readiness, fast startup probe | 30-60s faster |
| CAS scan interval too long | +60s detection delay | Reduce to 10s or switch to Karpenter | 45-60s faster |
| CloudWatch 5-min metrics | +300s detection delay | Switch to 1-min detailed monitoring | 240s faster |
| Cold node pool (no warm pool) | +120-180s boot time | EC2 warm pool or Karpenter | 90-150s faster |

### 2. Database Becomes Bottleneck During Scale

**Symptom**: Application scales horizontally but response times increase because all new instances hit the same database.

**The math:**
- 10 pods, each with 50 DB connections = 500 connections
- Scale to 30 pods = 1,500 connections
- RDS max_connections for db.r5.xlarge = 2,730
- At 40 pods = 2,000 connections (73% of max; performance degrades above ~70%)

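This connection budget translates directly into a cap on HPA `maxReplicas`. A minimal sketch, with the 70% headroom factor mirroring the degradation threshold above:

```python
def max_replicas_for_db(max_connections: int, conns_per_pod: int,
                        headroom: float = 0.7) -> int:
    """Cap replica count so total connections stay under ~70% of DB max."""
    return int(max_connections * headroom) // conns_per_pod

print(max_replicas_for_db(2730, 50))  # → 38  (no pooling)
print(max_replicas_for_db(2730, 10))  # → 191 (PgBouncer-style pooling)
```

The second call shows why connection pooling belongs before horizontal scaling: the same database supports roughly 5x the replica count.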
**Fixes:**
1. **Connection pooling** (PgBouncer, ProxySQL): Reduce per-pod connections from 50 to 5-10
2. **Read replicas**: Route read traffic to replicas, scale reads independently
3. **Database-aware scaling limits**: Set HPA maxReplicas based on the DB connection budget
4. **Caching layer**: Add Redis/Memcached to absorb repeated reads (70-90% cache hit rate typical)

### 3. Thundering Herd After Scale Events

**Symptom**: Cache expires or service restarts, and all new instances simultaneously fetch the same data, overwhelming backends.

**The timeline:**
- Auto-scaling adds 20 instances at T=0
- All 20 instances start with cold caches at T=+30s
- All 20 simultaneously query the database for warm-up data at T=+31s
- Database CPU spikes to 100%, queries time out at T=+32s
- Auto-scaling detects failure and may add MORE instances (cascading failure)

**Mitigation strategies:**

| Strategy | Mechanism | Effectiveness |
|---|---|---|
| Request coalescing | Single fetch, shared response for identical concurrent requests | Reduces DB load by 90%+ during storms |
| Jittered cache TTLs | Random ±10-20% on TTL prevents synchronized expiry | Eliminates cache stampede |
| Exponential backoff with jitter | 200ms, 400ms, 800ms delays with random offset | Staggers retry storms |
| Staggered rollout | Roll out new instances 2-3 at a time with 30s intervals | Prevents simultaneous cold cache |
| Cache pre-warming | Load critical data before marking instance healthy | Zero cold-cache window |

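Two of these mitigations fit in a few lines of Python; the parameter defaults echo the table's numbers (a sketch, not a library):

```python
import random

def jittered_ttl(base_ttl_s: float, spread: float = 0.2) -> float:
    """Randomize a cache TTL by ±spread so entries don't expire in lockstep."""
    return base_ttl_s * (1 + random.uniform(-spread, spread))

def backoff_delays_ms(base_ms: float = 200, attempts: int = 3) -> list:
    """Exponential backoff with full jitter: retry i waits 0..(base * 2^i) ms."""
    return [random.uniform(0, base_ms * 2 ** i) for i in range(attempts)]
```

With the defaults, retry caps land at 200, 400, and 800 ms, matching the table row; full jitter (uniform over the whole window) spreads retries more evenly than adding a small random offset to fixed delays.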
**Key number**: Auto-scaling takes 45+ seconds to respond to spikes, while a thundering herd spike happens in seconds. By the time scaling responds, the damage is done. Prevention (coalescing, jitter, pre-warming) beats reaction.

---

## Anti-Patterns

### 1. Scaling on CPU Only

**Why it fails**: CPU is a lagging indicator. By the time CPU reaches 80%, request queues are already saturated and users experience 2-5x latency.

**Additional problem with memory**: Most application runtimes do not release memory after load decreases; they keep it allocated for reuse. Scaling on memory utilization may scale out but **never scale back in**.

**Fix**: Use request-based or queue-based metrics as primary scaling signals. Use CPU as a safety backstop only.

### 2. No Scale-Down Policy

**What happens**: `disableScaleIn: true` or overly conservative scale-down settings cause capacity to ratchet up permanently. A single daily spike provisions instances that run 24/7 at <10% utilization.

**Cost impact**: A 20-instance ASG that should average 8 instances wastes 12 instances * $0.10/hr * 720 hrs/month = **$864/month** in idle capacity.

**Fix**: Configure aggressive but stable scale-down:
- Scale-down cooldown: 300 seconds (prevents flapping)
- Scale-down evaluation: 15 consecutive minutes below threshold
- Scale-down rate: 1-2 instances per evaluation period

### 3. Scaling to Infinity (No Max Limit)

**What happens**: A bug, retry storm, or DDoS triggers unbounded scaling. Thousands of instances launch. Monthly bill: $50,000+.

**Real scenario**: A misconfigured health check returns 500 errors. Load balancer retries. Each retry increases load. Auto-scaling adds instances. New instances also return 500s. More retries. More scaling. 200 instances running within 10 minutes, none serving real traffic.

**Fix**: Always set `maxReplicas` / `MaxSize`. Set billing alerts at 150% and 200% of expected spend. Use AWS Service Quotas as a hard ceiling.

### 4. Scaling Oscillation (Flapping)

**What happens**: Scale-up threshold at 70% CPU, scale-down at 60% CPU. Adding instances drops CPU to 55%. Scale-down triggers. Removing instances raises CPU to 75%. Scale-up triggers. Infinite loop.

**Fix**: Maintain at least a 20% gap between scale-up and scale-down thresholds. Use stabilization windows: 0 seconds for scale-up, 300+ seconds for scale-down.

### 5. Ignoring Startup Time in Scaling Calculations

**What happens**: HPA targets 70% CPU. Application takes 60 seconds to start serving. During those 60 seconds, existing pods handle 100% of traffic at 90% CPU. HPA sees 90%, adds MORE pods. Overshoot by 2-3x.

**Fix**: Account for startup time with `scaleUp.stabilizationWindowSeconds`. Use readiness gates to prevent HPA from counting pods that are not yet serving traffic.

### 6. HPA + VPA Conflict on Same Metric

**What happens**: Both scale on CPU. CPU rises. HPA adds pods. VPA increases CPU requests. More total CPU requested than available. Pods go Pending. Cluster Autoscaler adds nodes. Massive over-provisioning.

**Fix**: VPA for memory only. HPA for CPU and custom metrics.

---

## Before/After: Configuration Improvements

### Case 1: E-Commerce API During Flash Sale

**Before** (naive configuration):
```yaml
# HPA: CPU only, default settings
spec:
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```

**Behavior during 10x traffic spike:**
- T=0: Traffic spike begins, CPU at 30%
- T=+60s: CPU reaches 80%, HPA triggers
- T=+120s: 2 new pods starting, CPU at 95%, P99 latency: 2,500ms
- T=+180s: New pods ready, but only 4 total. Still need more.
- T=+300s: 6 pods running. CPU at 75%. P99 latency: 800ms
- T=+420s: 8 pods running. Stabilized. P99 latency: 200ms
- **Total degradation window: 7 minutes. Peak P99: 2,500ms**

**After** (optimized configuration):

```yaml
spec:
  minReplicas: 5 # Higher baseline for faster initial absorption
  maxReplicas: 50 # Room to grow
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60 # Lower target = earlier scaling
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: "200" # Leading indicator
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100 # Double capacity per minute
          periodSeconds: 60
        - type: Pods
          value: 10 # Or add 10 pods, whichever is greater
          periodSeconds: 60
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 600
      policies:
        - type: Percent
          value: 10
          periodSeconds: 120
```

**Behavior during 10x traffic spike:**
- T=0: Traffic spike begins, 5 pods absorb initial burst
- T=+15s: HPA detects RPS increase (leading indicator), triggers scale-up
- T=+20s: 10 pods targeted (100% increase)
- T=+45s: 10 pods ready and serving. CPU at 65%. P99 latency: 250ms
- T=+60s: HPA evaluates again, adds 5 more pods (RPS still above target)
- T=+90s: 15 pods running. Stabilized. P99 latency: 150ms
- **Total degradation window: 45 seconds. Peak P99: 350ms**

**Improvement: 7 minutes of degradation reduced to 45 seconds. Peak P99 reduced from 2,500ms to 350ms.**

### Case 2: AWS ASG with Predictive Scaling

**Before** (reactive only):
```
Policy: Target Tracking on CPU at 70%
Metrics: 5-minute CloudWatch intervals
Warm pool: None
Min: 2, Max: 20
```
- Daily traffic pattern: 3x spike at 9 AM, ramp-down at 6 PM
- Every morning: 5-8 minutes of degraded performance (P99 >1s) while scaling from 2 to 8 instances
- Each instance takes 3 minutes to boot + pass health checks

**After** (predictive + reactive + warm pool):
```
Policy: Predictive Scaling (forecast mode) + Target Tracking at 65%
Metrics: 1-minute detailed monitoring
Warm pool: 4 Stopped instances
Min: 2, Max: 20
Predictive: provisions capacity 1 hour before predicted need
```
- 8 AM: Predictive scaling launches 6 instances (4 from the warm pool at ~30s boot from Stopped, 2 cold)
- 8:30 AM: 8 instances ready before traffic arrives
- 9 AM: Traffic spike absorbed by pre-provisioned capacity. P99: 180ms
- Reactive target tracking handles any variance above prediction
- **Result: Zero degradation window. P99 stays under 200ms throughout the day.**

### Case 3: Lambda Function with Provisioned Concurrency

**Before**:
```
Runtime: Java 21
Memory: 512MB
Provisioned Concurrency: None
Cold start P50: 3,841ms
Cold start P99: 5,200ms
```
- API Gateway timeout set to 10 seconds
- 5% of requests hit cold starts
- 0.3% of cold-start requests time out entirely (>10s)
- User-facing error rate: 0.015%

**After**:
```
Runtime: Java 21 with SnapStart
Memory: 1024MB (2x CPU allocation)
Provisioned Concurrency: 20 (covers P95 concurrent demand)
Cold start P50: 182ms (SnapStart for spillover)
Cold start P99: 700ms (SnapStart for spillover)
```
- Provisioned handles 95% of invocations: 0ms cold start
- Spillover 5% uses SnapStart: 182ms P50 cold start
- Timeout rate: 0%
- **Result: P50 cold start reduced by 95%. Timeout errors eliminated.**

---

## Decision Tree: How Should I Configure Auto-Scaling?

```
START: What type of workload?
│
├─► Synchronous API (HTTP/gRPC)
│   │
│   ├─► Latency-sensitive (P99 < 200ms SLA)?
│   │   │
│   │   ├─► YES: Use HPA with request-rate metric + CPU backstop
│   │   │        Set minReplicas to handle 30% of peak
│   │   │        Use Karpenter (not CAS) for node scaling
│   │   │        Enable predictive scaling if traffic is patterned
│   │   │        Consider balloon pods for instant burst capacity
│   │   │
│   │   └─► NO: Use HPA with CPU at 60-70% target
│   │           Default minReplicas (2-3 for HA)
│   │           Cluster Autoscaler is sufficient
│   │
│   └─► Serverless candidate? (< 1000 RPS, spiky traffic)
│       │
│       ├─► YES: Lambda + API Gateway
│       │        Use SnapStart for Java/.NET
│       │        Provisioned Concurrency if P99 < 500ms required
│       │        Arm64 (Graviton) for 13-24% faster cold starts
│       │
│       └─► NO: Stay on containers (ECS/EKS)
│
├─► Asynchronous Queue Processing
│   │
│   ├─► Can tolerate scale-to-zero? (no traffic = no cost)
│   │   │
│   │   ├─► YES: KEDA with queue-length scaler
│   │   │        Set target messages-per-pod based on:
│   │   │          acceptable_latency / avg_processing_time
│   │   │        Monitor oldest message age (not just depth)
│   │   │
│   │   └─► NO: HPA with custom backlog-per-instance metric
│   │           minReplicas: 1-3 for always-on processing
│   │
│   └─► Processing time per message?
│       │
│       ├─► < 15 minutes: Lambda (SQS trigger, automatic scaling)
│       │
│       └─► > 15 minutes: ECS/EKS with KEDA or HPA
│
├─► Batch / Scheduled Workload
│   │
│   ├─► Predictable schedule?
│   │   │
│   │   ├─► YES: Scheduled scaling (cron-based min/max)
│   │   │        + Target Tracking for variance
│   │   │        + Warm pool for fast scale-up
│   │   │
│   │   └─► NO: Event-driven (KEDA or Step Functions)
│   │
│   └─► GPU required?
│       │
│       ├─► YES: Karpenter with GPU node pools
│       │        Scale on inference queue depth (KEDA)
│       │        Aggressive scale-down (GPU instances are expensive)
│       │
│       └─► NO: Standard compute auto-scaling
│
└─► Stateful Workload (Database, Cache)
    │
    ├─► Vertical scaling first (larger instance type)
    │   Until: single-instance limits reached OR cost prohibitive
    │
    ├─► Read scaling: Add read replicas with connection routing
    │
    ├─► Write scaling: Sharding (application-level partitioning)
    │
    └─► Managed auto-scaling:
        ├─► Aurora: Auto-scales storage + read replicas
        ├─► DynamoDB: On-demand or provisioned with auto-scaling
        └─► ElastiCache: Cluster mode with shard auto-scaling
```

### Quick Reference: Scaling Configuration Checklist

```
[ ] Set maxReplicas / MaxSize (NEVER leave unbounded)
[ ] Set minReplicas >= 2 for HA (production workloads)
[ ] Use request-based or queue-based metrics as PRIMARY scaling signal
[ ] Use CPU as SECONDARY backstop only
[ ] Configure scale-up: stabilizationWindowSeconds = 0
[ ] Configure scale-down: stabilizationWindowSeconds >= 300
[ ] Ensure 20%+ gap between scale-up and scale-down thresholds
[ ] Test actual scaling speed end-to-end (don't assume)
[ ] Set billing alerts at 150% and 200% of expected spend
[ ] Monitor database connections as a function of instance count
[ ] Implement connection pooling before scaling application tier
[ ] Use warm pools or predictive scaling for boot times > 60s
[ ] Separate VPA (memory) from HPA (CPU + custom) to avoid conflicts
[ ] Configure PodDisruptionBudgets for scale-down safety
[ ] Load test at 2x expected peak to validate scaling behavior
```

---

## Sources

- [AWS EC2 Auto Scaling Predictive Scaling Documentation](https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-predictive-scaling.html)
- [AWS EC2 Auto Scaling Warm Pools](https://docs.aws.amazon.com/autoscaling/ec2/userguide/ec2-auto-scaling-warm-pools.html)
- [AWS Target Tracking Scaling Policies](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-target-tracking.html)
- [AWS Step and Simple Scaling Policies](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-scaling-simple-step.html)
- [AWS SQS-Based Scaling Policy](https://docs.aws.amazon.com/autoscaling/ec2/userguide/as-using-sqs-queue.html)
- [AWS Lambda Provisioned Concurrency](https://docs.aws.amazon.com/lambda/latest/dg/provisioned-concurrency.html)
- [AWS Lambda Cold Start Benchmarks - maxday](https://maxday.github.io/lambda-perf/)
- [AWS Lambda Cold Starts in 2025 - Edge Delta](https://edgedelta.com/company/knowledge-center/aws-lambda-cold-start-cost)
- [AWS Lambda Cold Start Optimization 2025 - Zircon Tech](https://zircon.tech/blog/aws-lambda-cold-start-optimization-in-2025-what-actually-works/)
- [AWS Lambda Cold Start 7 Fixes 2026 - AgileSoft Labs](https://www.agilesoftlabs.com/blog/2026/02/aws-lambda-cold-start-7-proven-fixes)
- [AWS Lambda Arm64 vs x86_64 Performance - Chris Ebert](https://chrisebert.net/comparing-aws-lambda-arm64-vs-x86_64-performance-across-multiple-runtimes-in-late-2025/)
- [Serverless Java Cold Start Solved 2025 - Devrim Ozcay](https://devrimozcay.medium.com/serverless-java-aws-lambda-cold-start-solved-in-2025-ea3d28c734c3)
- [Reducing Fargate Startup with zstd - AWS Blog](https://aws.amazon.com/blogs/containers/reducing-aws-fargate-startup-times-with-zstd-compressed-container-images/)
- [Taming Cold Starts on Fargate - AWS Plain English](https://aws.plainenglish.io/taming-cold-starts-on-aws-fargate-the-architecture-behind-sub-5-second-task-launches-622ebd73b051)
- [Advanced Autoscaling Reduces AWS Costs by 70% - InfoQ](https://www.infoq.com/news/2025/08/autoscaling-karpenter-automode/)
- [Kubernetes Autoscaling in 2025 - Sedai](https://www.sedai.io/blog/kubernetes-autoscaling)
- [Kubernetes Best Practices 2025 - KodeKloud](https://kodekloud.com/blog/kubernetes-best-practices-2025/)
- [HPA vs VPA Kubernetes Autoscaling 2025 - ScaleOps](https://scaleops.com/blog/hpa-vs-vpa-understanding-kubernetes-autoscaling-and-why-its-not-enough-in-2025/)
- [Karpenter vs Cluster Autoscaler 2025 - ScaleOps](https://scaleops.com/blog/karpenter-vs-cluster-autoscaler/)
- [Karpenter vs Cluster Autoscaler - Spacelift](https://spacelift.io/blog/karpenter-vs-cluster-autoscaler)
- [Karpenter vs Cluster Autoscaler - PerfectScale](https://www.perfectscale.io/blog/karpenter-vs-cluster-autoscaler)
- [KEDA - Kubernetes Event-Driven Autoscaling](https://keda.sh/)
- [KEDA Practical Guide - Digital Power](https://medium.com/@digitalpower/kubernetes-based-event-driven-autoscaling-with-keda-a-practical-guide-ed29cf482e7b)
- [HPA Custom Metrics with Prometheus Adapter](https://oneuptime.com/blog/post/2026-02-09-hpa-custom-metrics-prometheus-adapter/view)
- [HPA Object Metrics for Queue-Based Scaling](https://oneuptime.com/blog/post/2026-02-09-hpa-object-metrics-queue/view)
- [Custom Metrics Autoscaling in Kubernetes - Pixie Labs](https://blog.px.dev/autoscaling-custom-k8s-metric/)
- [AWS ECS Auto Scaling with Custom Metrics - AWS Blog](https://aws.amazon.com/blogs/containers/amazon-elastic-container-service-ecs-auto-scaling-using-custom-metrics/)
- [Scaling Depot: Thundering Herd Problem](https://depot.dev/blog/planetscale-to-reduce-the-thundering-herd)
- [Thundering Herd Problem Explained - Dhairya Singla](https://medium.com/@work.dhairya.singla/the-thundering-herd-problem-explained-causes-examples-and-solutions-7166b7e26c0c)
- [Thundering Herds: The Scalability Killer - Aonnis](https://docs.aonnis.com/blog/thundering-herds-the-scalability-killer)
- [Hybrid Reactive-Proactive Auto-scaling - arXiv](https://www.arxiv.org/pdf/2512.14290)
- [Proactive and Reactive Autoscaling for Edge Computing - arXiv](https://arxiv.org/pdf/2510.10166)
- [Predictive Scaling with ML - Hokstad Consulting](https://hokstadconsulting.com/blog/predictive-scaling-with-machine-learning-how-it-works)
- [CAST AI AWS Cost Optimization Report 2025](https://cast.ai/blog/aws-cost-optimization/)
- [Horizontal vs Vertical Scaling - PingCAP](https://www.pingcap.com/horizontal-scaling-vs-vertical-scaling/)
- [Horizontal vs Vertical Scaling - DataCamp](https://www.datacamp.com/blog/horizontal-vs-vertical-scaling)
- [AWS EKS Autoscaling Best Practices](https://docs.aws.amazon.com/eks/latest/best-practices/cas.html)
- [Azure AKS Performance and Scaling Best Practices - Microsoft](https://learn.microsoft.com/en-us/azure/aks/best-practices-performance-scale)
- [Kubernetes Autoscaling Challenges - ScaleOps](https://scaleops.com/blog/kubernetes-autoscaling/)
- [Lambda Provisioned Concurrency - Lumigo](https://lumigo.io/blog/provisioned-concurrency-the-end-of-cold-starts/)
- [Lambda Provisioned Concurrency - Pulumi](https://www.pulumi.com/blog/aws-lambda-provisioned-concurrency-no-cold-starts/)
- [AWS EC2 Auto Scaling Warm Pool Mixed Instances 2025](https://aws.amazon.com/about-aws/whats-new/2025/11/ec2-auto-scaling-warm-pool-mixed-instances-policies/)