@umacloud/knowledge 1.0.1

Files changed (418)
  1. package/00-governance/governance-capabilities.md +557 -0
  2. package/00-governance/knowledge-map.md +39 -0
  3. package/00-governance/maintenance-policy.md +76 -0
  4. package/00-governance/review-checklist.md +81 -0
  5. package/README.md +13 -0
  6. package/ai/01-standards/agent-development-complete.md +691 -0
  7. package/ai/01-standards/llm-application-complete.md +488 -0
  8. package/ai/01-standards/mlops-complete.md +798 -0
  9. package/ai/01-standards/prompt-engineering-complete.md +646 -0
  10. package/ai/01-standards/rag-architecture-complete.md +649 -0
  11. package/ai/02-playbooks/llm-evaluation-playbook.md +847 -0
  12. package/ai/03-checklists/ai-project-checklist.md +215 -0
  13. package/ai/04-antipatterns/ai-antipatterns.md +661 -0
  14. package/ai/05-cases/case-rag-production.md +147 -0
  15. package/ai/06-glossary/ai-glossary.md +162 -0
  16. package/ai/agent-evaluation-benchmark.md +53 -0
  17. package/ai/ai-agent-memory-context-management.md +41 -0
  18. package/ai/ai-cost-capacity-optimization-playbook.md +42 -0
  19. package/ai/ai-data-security-and-compliance-playbook.md +37 -0
  20. package/ai/ai-domain-index-and-checklist.md +40 -0
  21. package/ai/ai-governance-maturity-model.md +50 -0
  22. package/ai/ai-model-selection-and-routing-strategy.md +47 -0
  23. package/ai/ai-observability-and-oncall-runbook.md +52 -0
  24. package/ai/ai-rag-engineering-playbook.md +42 -0
  25. package/ai/ai-red-team-and-safety-evaluation.md +42 -0
  26. package/ai/ai-release-readiness-and-rollback-gate.md +42 -0
  27. package/ai/llm-agent-engineering-deep-dive.md +57 -0
  28. package/ai/prompt-and-tool-guardrails.md +52 -0
  29. package/api/01-standards/enterprise-api-standards.md +198 -0
  30. package/api/01-standards/rest-api-design-guide.md +63 -0
  31. package/api/02-playbooks/api-pagination-playbook.md +93 -0
  32. package/api/02-playbooks/graphql-production-playbook.md +176 -0
  33. package/api/03-checklists/api-review-checklist.md +55 -0
  34. package/api/04-antipatterns/api-antipatterns.md +112 -0
  35. package/architecture/01-standards/api-gateway-patterns.md +496 -0
  36. package/architecture/01-standards/cloud-native-patterns.md +644 -0
  37. package/architecture/01-standards/distributed-systems-patterns.md +591 -0
  38. package/architecture/01-standards/event-driven-architecture.md +595 -0
  39. package/architecture/01-standards/microservices-patterns-complete.md +968 -0
  40. package/architecture/01-standards/microservices-patterns.md +495 -0
  41. package/architecture/01-standards/system-design-interview.md +664 -0
  42. package/architecture/02-playbooks/microservices-patterns-playbook.md +137 -0
  43. package/architecture/02-playbooks/migration-playbook.md +780 -0
  44. package/architecture/02-playbooks/system-design-playbook.md +779 -0
  45. package/architecture/03-checklists/architecture-decision-checklist.md +297 -0
  46. package/architecture/04-antipatterns/architecture-antipatterns.md +417 -0
  47. package/architecture/05-cases/case-netflix-microservices.md +413 -0
  48. package/architecture/06-glossary/architecture-glossary.md +164 -0
  49. package/architecture/adr-template-and-examples.md +38 -0
  50. package/architecture/api-gateway-deep-dive.md +1291 -0
  51. package/architecture/configuration-management.md +1162 -0
  52. package/architecture/distributed-transactions.md +1220 -0
  53. package/architecture/microservices-complete.md +735 -0
  54. package/architecture/resilience-and-disaster-patterns.md +37 -0
  55. package/architecture/service-governance.md +1198 -0
  56. package/architecture/system-architecture-deep-dive.md +37 -0
  57. package/backend/01-standards/analytics-and-growth.md +65 -0
  58. package/backend/01-standards/api-and-error-conventions.md +120 -0
  59. package/backend/01-standards/application-layering-and-packaging.md +160 -0
  60. package/backend/01-standards/auth-implementation.md +104 -0
  61. package/backend/01-standards/backend-framework-idioms.md +74 -0
  62. package/backend/01-standards/background-jobs-and-async.md +66 -0
  63. package/backend/01-standards/caching-strategies-complete.md +390 -0
  64. package/backend/01-standards/config-and-observability.md +77 -0
  65. package/backend/01-standards/data-modeling-and-persistence.md +94 -0
  66. package/backend/01-standards/django-complete.md +1765 -0
  67. package/backend/01-standards/email-and-notifications.md +64 -0
  68. package/backend/01-standards/fastapi-complete.md +925 -0
  69. package/backend/01-standards/file-upload-and-storage.md +66 -0
  70. package/backend/01-standards/graphql-api-complete.md +416 -0
  71. package/backend/01-standards/llm-application-standard.md +78 -0
  72. package/backend/01-standards/message-queue-patterns.md +379 -0
  73. package/backend/01-standards/microservices-and-distributed.md +78 -0
  74. package/backend/01-standards/nestjs-complete.md +2167 -0
  75. package/backend/01-standards/payment-integration.md +80 -0
  76. package/backend/01-standards/rate-limiting-complete.md +451 -0
  77. package/backend/01-standards/realtime-and-websocket.md +65 -0
  78. package/backend/01-standards/search-and-filtering.md +64 -0
  79. package/backend/01-standards/spring-boot-complete.md +445 -0
  80. package/backend/02-playbooks/api-design-playbook.md +718 -0
  81. package/backend/02-playbooks/email-send-playbook.md +130 -0
  82. package/backend/02-playbooks/file-upload-s3-playbook.md +153 -0
  83. package/backend/02-playbooks/typescript-enterprise-playbook.md +133 -0
  84. package/backend/02-playbooks/websocket-realtime-playbook.md +154 -0
  85. package/backend/03-checklists/api-launch-checklist.md +189 -0
  86. package/backend/04-antipatterns/backend-antipatterns.md +1051 -0
  87. package/blockchain/01-standards/blockchain-basics.md +557 -0
  88. package/blockchain/01-standards/smart-contract-development.md +1315 -0
  89. package/cicd/01-standards/deployment-and-delivery-standard.md +96 -0
  90. package/cicd/01-standards/github-actions-complete.md +473 -0
  91. package/cicd/01-standards/release-and-store-submission.md +75 -0
  92. package/cicd/02-playbooks/cicd-pipeline-playbook.md +144 -0
  93. package/cicd/02-playbooks/release-management-playbook.md +605 -0
  94. package/cicd/03-checklists/pipeline-security-checklist.md +168 -0
  95. package/cicd/04-antipatterns/cicd-antipatterns.md +589 -0
  96. package/cicd/05-cases/case-deployment-automation.md +221 -0
  97. package/cicd/05-cases/case-gitops-transformation.md +212 -0
  98. package/cicd/06-glossary/cicd-glossary.md +114 -0
  99. package/cicd/cicd-blueprint-deep-dive.md +38 -0
  100. package/cicd/release-readiness-gate.md +37 -0
  101. package/cloud-native/01-standards/container-security.md +741 -0
  102. package/cloud-native/01-standards/kubernetes-complete.md +812 -0
  103. package/cloud-native/02-playbooks/api-gateway-playbook.md +155 -0
  104. package/cloud-native/02-playbooks/gitops-with-argocd.md +760 -0
  105. package/cloud-native/02-playbooks/k8s-troubleshooting-playbook.md +1942 -0
  106. package/cloud-native/02-playbooks/message-queue-playbook.md +129 -0
  107. package/cloud-native/02-playbooks/multicloud-governance.md +726 -0
  108. package/cloud-native/02-playbooks/serverless-patterns.md +788 -0
  109. package/cloud-native/02-playbooks/service-mesh-playbook.md +612 -0
  110. package/cloud-native/02-playbooks/terraform-iac-playbook.md +143 -0
  111. package/cloud-native/03-checklists/container-security-checklist.md +431 -0
  112. package/cloud-native/03-checklists/k8s-production-readiness-checklist.md +460 -0
  113. package/cloud-native/04-antipatterns/container-antipatterns.md +660 -0
  114. package/cloud-native/04-antipatterns/k8s-antipatterns.md +743 -0
  115. package/cloud-native/05-cases/case-k8s-migration.md +478 -0
  116. package/cloud-native/05-cases/case-k8s-scaling.md +642 -0
  117. package/cloud-native/05-cases/case-k8s-security-incident.md +397 -0
  118. package/cloud-native/06-glossary/cloud-native-glossary.md +337 -0
  119. package/cross-platform/01-standards/cross-platform-frameworks.md +83 -0
  120. package/cross-platform/01-standards/platform-selection-and-architecture.md +77 -0
  121. package/data/01-standards/elasticsearch-complete.md +2098 -0
  122. package/data/01-standards/postgresql-complete.md +1613 -0
  123. package/data/01-standards/redis-complete.md +1527 -0
  124. package/data/02-playbooks/database-optimization-playbook.md +403 -0
  125. package/data/02-playbooks/elasticsearch-production-playbook.md +132 -0
  126. package/data/03-checklists/database-launch-checklist.md +187 -0
  127. package/data/04-antipatterns/database-antipatterns.md +873 -0
  128. package/data/05-cases/case-database-migration.md +310 -0
  129. package/data/06-glossary/database-glossary.md +440 -0
  130. package/data/data-governance-and-modeling-deep-dive.md +39 -0
  131. package/data-engineering/01-standards/airflow-complete.md +523 -0
  132. package/data-engineering/01-standards/kafka-complete.md +1521 -0
  133. package/data-engineering/02-playbooks/spark-etl-playbook.md +496 -0
  134. package/data-engineering/03-checklists/pipeline-launch-checklist.md +194 -0
  135. package/data-engineering/04-antipatterns/data-pipeline-antipatterns.md +684 -0
  136. package/data-engineering/05-cases/case-real-time-pipeline.md +355 -0
  137. package/data-engineering/06-glossary/data-engineering-glossary.md +429 -0
  138. package/database/01-standards/database-schema-standards.md +147 -0
  139. package/database/02-playbooks/postgresql-optimization-quick.md +52 -0
  140. package/database/02-playbooks/postgresql-performance-optimization.md +58 -0
  141. package/database/02-playbooks/postgresql-production-playbook.md +146 -0
  142. package/database/02-playbooks/redis-caching-playbook.md +117 -0
  143. package/database/03-checklists/database-review-checklist.md +50 -0
  144. package/database/04-antipatterns/database-antipatterns.md +112 -0
  145. package/design/01-standards/ui-design-system-complete.md +423 -0
  146. package/design/02-playbooks/design-handoff-playbook.md +254 -0
  147. package/design/02-playbooks/design-review-playbook.md +388 -0
  148. package/design/03-checklists/design-review-checklist.md +246 -0
  149. package/design/04-antipatterns/design-antipatterns.md +378 -0
  150. package/design/05-cases/case-design-system-adoption.md +328 -0
  151. package/design/06-glossary/design-glossary.md +329 -0
  152. package/design/ui-full-lifecycle-cross-platform-playbook.md +571 -0
  153. package/design/ux-system-deep-dive.md +38 -0
  154. package/design-systems/00-craft-rules.md +71 -0
  155. package/design-systems/aesthetic-families.md +43 -0
  156. package/design-systems/anti-ai-slop.md +162 -0
  157. package/design-systems/bold-geometric.md +120 -0
  158. package/design-systems/brutalist-bold.md +103 -0
  159. package/design-systems/editorial-clean.md +109 -0
  160. package/design-systems/glass-aurora.md +108 -0
  161. package/design-systems/modern-minimal.md +145 -0
  162. package/design-systems/premium-luxury.md +106 -0
  163. package/design-systems/product-type-design-map.md +48 -0
  164. package/design-systems/soft-warm.md +123 -0
  165. package/design-systems/tech-utility.md +113 -0
  166. package/desktop/01-standards/desktop-app-standard.md +72 -0
  167. package/desktop/01-standards/desktop-design.md +71 -0
  168. package/development/00-governance/document-template.md +41 -0
  169. package/development/01-standards/api-versioning-strategies.md +432 -0
  170. package/development/01-standards/authentication-patterns-complete.md +479 -0
  171. package/development/01-standards/css-architecture-complete.md +550 -0
  172. package/development/01-standards/database-migration-strategies.md +484 -0
  173. package/development/01-standards/elasticsearch-complete.md +347 -0
  174. package/development/01-standards/git-complete.md +371 -0
  175. package/development/01-standards/golang-complete.md +1565 -0
  176. package/development/01-standards/graphql-complete.md +298 -0
  177. package/development/01-standards/javascript-bundlers-complete.md +469 -0
  178. package/development/01-standards/javascript-typescript-complete.md +528 -0
  179. package/development/01-standards/jest-complete.md +275 -0
  180. package/development/01-standards/linux-complete.md +234 -0
  181. package/development/01-standards/logging-observability-complete.md +526 -0
  182. package/development/01-standards/microservices-communication.md +502 -0
  183. package/development/01-standards/mongodb-complete.md +406 -0
  184. package/development/01-standards/oauth2-complete.md +285 -0
  185. package/development/01-standards/performance-optimization-complete.md +289 -0
  186. package/development/01-standards/playwright-complete.md +247 -0
  187. package/development/01-standards/postgresql-complete.md +456 -0
  188. package/development/01-standards/pytest-complete.md +340 -0
  189. package/development/01-standards/python-async-programming.md +902 -0
  190. package/development/01-standards/python-complete.md +956 -0
  191. package/development/01-standards/python-decorators-complete.md +799 -0
  192. package/development/01-standards/python-design-patterns.md +2854 -0
  193. package/development/01-standards/python-packaging-distribution.md +420 -0
  194. package/development/01-standards/python-testing-strategies.md +607 -0
  195. package/development/01-standards/python-web-frameworks-comparison.md +471 -0
  196. package/development/01-standards/redis-complete.md +317 -0
  197. package/development/01-standards/rest-api-complete.md +316 -0
  198. package/development/01-standards/rust-complete.md +578 -0
  199. package/development/01-standards/typescript-advanced-types.md +1513 -0
  200. package/development/01-standards/web-security-complete.md +292 -0
  201. package/development/02-playbooks/api-design-playbook.md +810 -0
  202. package/development/02-playbooks/database-migration-playbook.md +580 -0
  203. package/development/02-playbooks/debugging-playbook.md +692 -0
  204. package/development/02-playbooks/feature-delivery-playbook.md +430 -0
  205. package/development/02-playbooks/incident-hotfix-playbook.md +387 -0
  206. package/development/02-playbooks/performance-optimization-playbook.md +531 -0
  207. package/development/02-playbooks/performance-tuning-playbook.md +652 -0
  208. package/development/02-playbooks/refactor-playbook.md +403 -0
  209. package/development/02-playbooks/release-playbook.md +469 -0
  210. package/development/03-checklists/architecture-review-checklist.md +168 -0
  211. package/development/03-checklists/data-migration-checklist.md +157 -0
  212. package/development/03-checklists/oncall-handover-checklist.md +173 -0
  213. package/development/03-checklists/pr-checklist.md +158 -0
  214. package/development/03-checklists/production-readiness-checklist.md +190 -0
  215. package/development/03-checklists/release-readiness-checklist.md +154 -0
  216. package/development/03-checklists/security-review-checklist.md +182 -0
  217. package/development/04-antipatterns/api-antipatterns.md +657 -0
  218. package/development/04-antipatterns/architecture-antipatterns.md +686 -0
  219. package/development/04-antipatterns/backend-antipatterns.md +648 -0
  220. package/development/04-antipatterns/cicd-antipatterns.md +540 -0
  221. package/development/04-antipatterns/code-smell-antipatterns.md +571 -0
  222. package/development/04-antipatterns/data-antipatterns.md +658 -0
  223. package/development/04-antipatterns/database-antipatterns.md +578 -0
  224. package/development/04-antipatterns/frontend-antipatterns.md +635 -0
  225. package/development/04-antipatterns/reliability-antipatterns.md +700 -0
  226. package/development/04-antipatterns/security-antipatterns.md +747 -0
  227. package/development/05-cases/case-api-version-migration.md +428 -0
  228. package/development/05-cases/case-authorization-hardening.md +383 -0
  229. package/development/05-cases/case-bluegreen-rollback.md +466 -0
  230. package/development/05-cases/case-cache-snowball-protection.md +485 -0
  231. package/development/05-cases/case-ci-cd-pipeline.md +544 -0
  232. package/development/05-cases/case-database-scaling.md +500 -0
  233. package/development/05-cases/case-db-hotspot-optimization.md +487 -0
  234. package/development/05-cases/case-incident-mttr-reduction.md +563 -0
  235. package/development/05-cases/case-microservice-migration.md +375 -0
  236. package/development/05-cases/case-performance-optimization.md +406 -0
  237. package/development/05-cases/case-security-incident-response.md +345 -0
  238. package/development/06-glossary/full-stack-glossary.md +166 -0
  239. package/development/09-maturity/quarterly-audit-template.md +35 -0
  240. package/development/11-ui-excellence/ui-aesthetic-system.md +41 -0
  241. package/development/11-ui-excellence/ui-engineering-excellence.md +435 -0
  242. package/development/12-scenarios/development-scenarios-guide.md +565 -0
  243. package/development/13-implementation-assets/implementation-toolkit.md +282 -0
  244. package/development/13-implementation-assets/knowledge-gates-execution.md +43 -0
  245. package/development/14-full-lifecycle/software-lifecycle-gates.md +511 -0
  246. package/development/15-lifecycle-templates/project-templates-collection.md +791 -0
  247. package/development/api-contract-and-versioning-guide.md +36 -0
  248. package/development/api-governance-complete.md +43 -0
  249. package/development/backend-engineering-complete.md +43 -0
  250. package/development/code-review-quality-complete.md +43 -0
  251. package/development/concurrency-reliability-complete.md +43 -0
  252. package/development/database-engineering-complete.md +43 -0
  253. package/development/engineering-effectiveness-complete.md +43 -0
  254. package/development/engineering-standards-deep-dive.md +38 -0
  255. package/development/frontend-engineering-complete.md +43 -0
  256. package/development/performance-capacity-complete.md +43 -0
  257. package/development/refactor-migration-complete.md +42 -0
  258. package/development/refactoring-and-techdebt-playbook.md +37 -0
  259. package/development/security-in-development-complete.md +43 -0
  260. package/devops/01-standards/cicd-pipeline-complete.md +262 -0
  261. package/devops/01-standards/docker-complete.md +1490 -0
  262. package/devops/01-standards/github-actions-complete.md +337 -0
  263. package/devops/01-standards/kubernetes-complete.md +638 -0
  264. package/devops/01-standards/terraform-complete.md +2117 -0
  265. package/devops/02-playbooks/docker-compose-playbook.md +233 -0
  266. package/devops/02-playbooks/docker-k8s-production-playbook.md +186 -0
  267. package/devops/02-playbooks/docker-production-playbook.md +952 -0
  268. package/edge-iot/01-standards/edge-iot-complete.md +473 -0
  269. package/experts/architect/api-design.md +178 -0
  270. package/experts/architect/methodology.md +124 -0
  271. package/experts/architect/security.md +75 -0
  272. package/experts/backend-lead/methodology.md +216 -0
  273. package/experts/devops/methodology.md +160 -0
  274. package/experts/frontend-lead/methodology.md +178 -0
  275. package/experts/product-manager/industry/ecommerce.md +43 -0
  276. package/experts/product-manager/industry/saas.md +40 -0
  277. package/experts/product-manager/methodology.md +97 -0
  278. package/experts/qa-lead/methodology.md +123 -0
  279. package/experts/qa-lead/test-strategy.md +128 -0
  280. package/experts/uiux-designer/methodology.md +125 -0
  281. package/frontend/01-standards/accessibility-complete.md +532 -0
  282. package/frontend/01-standards/accessibility-standard.md +74 -0
  283. package/frontend/01-standards/admin-dashboard-and-crud.md +72 -0
  284. package/frontend/01-standards/design-tokens-complete.md +444 -0
  285. package/frontend/01-standards/forms-and-validation.md +77 -0
  286. package/frontend/01-standards/frontend-architecture-and-layering.md +119 -0
  287. package/frontend/01-standards/i18n-and-localization.md +65 -0
  288. package/frontend/01-standards/nextjs-complete.md +451 -0
  289. package/frontend/01-standards/react-complete.md +713 -0
  290. package/frontend/01-standards/react-hooks-complete-guide.md +1100 -0
  291. package/frontend/01-standards/react-hooks-complete.md +1171 -0
  292. package/frontend/01-standards/seo-and-web-vitals.md +77 -0
  293. package/frontend/01-standards/state-management-complete.md +444 -0
  294. package/frontend/01-standards/vue-complete.md +499 -0
  295. package/frontend/01-standards/vue3-complete.md +2002 -0
  296. package/frontend/01-standards/web-framework-best-practices.md +64 -0
  297. package/frontend/01-standards/web-performance-complete.md +495 -0
  298. package/frontend/02-playbooks/accessibility-a11y-playbook.md +161 -0
  299. package/frontend/02-playbooks/frontend-performance-playbook.md +707 -0
  300. package/frontend/02-playbooks/i18n-internationalization-playbook.md +120 -0
  301. package/frontend/02-playbooks/performance-optimization-playbook.md +163 -0
  302. package/frontend/02-playbooks/react-nextjs-production-playbook.md +167 -0
  303. package/frontend/02-playbooks/react-state-management-playbook.md +173 -0
  304. package/frontend/03-checklists/component-quality-checklist.md +166 -0
  305. package/frontend/03-checklists/frontend-launch-checklist.md +299 -0
  306. package/frontend/04-antipatterns/frontend-antipatterns.md +886 -0
  307. package/frontend/05-cases/case-performance-optimization.md +274 -0
  308. package/harmony/01-standards/harmonyos-arkts-standard.md +75 -0
  309. package/harmony/01-standards/harmonyos-design.md +65 -0
  310. package/high-quality-engineering-playbook.md +54 -0
  311. package/incident/01-standards/incident-response-complete.md +303 -0
  312. package/incident/02-playbooks/chaos-engineering-playbook.md +883 -0
  313. package/incident/02-playbooks/postmortem-playbook.md +398 -0
  314. package/incident/03-checklists/incident-readiness-checklist.md +181 -0
  315. package/incident/04-antipatterns/incident-antipatterns.md +490 -0
  316. package/incident/05-cases/case-cascade-failure.md +176 -0
  317. package/incident/06-glossary/incident-glossary.md +114 -0
  318. package/incident/postmortem-and-response-deep-dive.md +39 -0
  319. package/industries/ecommerce/ecommerce-complete.md +631 -0
  320. package/industries/education/education-complete.md +555 -0
  321. package/industries/fintech/fintech-complete.md +501 -0
  322. package/industries/gaming/gaming-complete.md +587 -0
  323. package/industries/healthcare/healthcare-complete.md +452 -0
  324. package/low-code/01-standards/low-code-complete.md +944 -0
  325. package/miniprogram/01-standards/ai-common-mistakes.md +61 -0
  326. package/miniprogram/01-standards/miniprogram-custom-navbar-capsule.md +77 -0
  327. package/miniprogram/01-standards/miniprogram-design.md +61 -0
  328. package/miniprogram/01-standards/miniprogram-standard.md +81 -0
  329. package/mobile/01-standards/android-material-design.md +70 -0
  330. package/mobile/01-standards/flutter-complete.md +384 -0
  331. package/mobile/01-standards/ios-design-hig.md +78 -0
  332. package/mobile/01-standards/mobile-app-standard.md +85 -0
  333. package/mobile/01-standards/react-native-complete.md +352 -0
  334. package/mobile/02-playbooks/mobile-cross-platform-playbook.md +175 -0
  335. package/mobile/02-playbooks/mobile-performance.md +473 -0
  336. package/mobile/03-checklists/mobile-release-checklist.md +234 -0
  337. package/mobile/04-antipatterns/mobile-antipatterns.md +798 -0
  338. package/mobile/05-cases/case-app-performance.md +500 -0
  339. package/mobile/05-cases/case-app-startup-optimization.md +218 -0
  340. package/mobile/06-glossary/mobile-glossary.md +484 -0
  341. package/observability/01-standards/observability-standards.md +103 -0
  342. package/observability/02-playbooks/prometheus-grafana-playbook.md +135 -0
  343. package/observability/02-playbooks/structured-logging-playbook.md +73 -0
  344. package/observability/03-checklists/observability-checklist.md +54 -0
  345. package/observability/04-antipatterns/observability-antipatterns.md +106 -0
  346. package/operations/01-standards/prometheus-monitoring-complete.md +1578 -0
  347. package/operations/02-playbooks/capacity-planning-playbook.md +620 -0
  348. package/operations/03-checklists/production-launch-checklist.md +365 -0
  349. package/operations/04-antipatterns/operations-antipatterns.md +664 -0
  350. package/operations/05-cases/case-sre-practices.md +581 -0
  351. package/operations/06-glossary/operations-glossary.md +120 -0
  352. package/operations/aiops-anomaly-detection.md +758 -0
  353. package/operations/capacity-planning.md +1061 -0
  354. package/operations/chaos-engineering.md +659 -0
  355. package/operations/incident-command-system.md +38 -0
  356. package/operations/observability-complete.md +442 -0
  357. package/operations/slo-sli-playbook.md +517 -0
  358. package/operations/sre-operations-deep-dive.md +39 -0
  359. package/package.json +8 -0
  360. package/performance/01-standards/performance-and-scalability.md +80 -0
  361. package/performance/01-standards/performance-standards.md +156 -0
  362. package/performance/02-playbooks/query-optimization-playbook.md +103 -0
  363. package/performance/03-checklists/performance-checklist.md +56 -0
  364. package/performance/04-antipatterns/performance-antipatterns.md +146 -0
  365. package/product/01-standards/product-management-complete.md +285 -0
  366. package/product/02-playbooks/feature-launch-playbook.md +207 -0
  367. package/product/02-playbooks/user-research-playbook.md +532 -0
  368. package/product/03-checklists/feature-launch-checklist.md +275 -0
  369. package/product/04-antipatterns/product-antipatterns.md +355 -0
  370. package/product/05-cases/case-mvp-to-scale.md +384 -0
  371. package/product/06-glossary/product-glossary.md +462 -0
  372. package/product/feature-prioritization-framework.md +40 -0
  373. package/product/kpi-and-metric-tree.md +37 -0
  374. package/product/product-discovery-and-prd-deep-dive.md +41 -0
  375. package/quantum/01-standards/quantum-complete.md +1186 -0
  376. package/security/01-standards/api-security-complete.md +511 -0
  377. package/security/01-standards/container-runtime-security.md +574 -0
  378. package/security/01-standards/data-protection-gdpr.md +543 -0
  379. package/security/01-standards/owasp-top10-complete.md +1890 -0
  380. package/security/01-standards/secure-coding-baseline.md +90 -0
  381. package/security/01-standards/supply-chain-security.md +441 -0
  382. package/security/01-standards/web-security-checklist.md +108 -0
  383. package/security/01-standards/zero-trust-architecture.md +521 -0
  384. package/security/02-playbooks/auth-sso-playbook.md +166 -0
  385. package/security/02-playbooks/incident-response-security-playbook.md +588 -0
  386. package/security/02-playbooks/owasp-api-security-playbook.md +129 -0
  387. package/security/02-playbooks/payment-integration-playbook.md +119 -0
  388. package/security/02-playbooks/penetration-testing-playbook.md +517 -0
  389. package/security/03-checklists/security-audit-checklist.md +356 -0
  390. package/security/04-antipatterns/security-coding-antipatterns.md +580 -0
  391. package/security/05-cases/case-log4shell-incident.md +537 -0
  392. package/security/05-cases/case-major-breaches.md +468 -0
  393. package/security/06-glossary/security-glossary.md +212 -0
  394. package/security/compliance-automation.md +993 -0
  395. package/security/container-security.md +680 -0
  396. package/security/devsecops-complete.md +426 -0
  397. package/security/sast-dast-sca.md +775 -0
  398. package/security/secrets-management.md +594 -0
  399. package/security/security-architecture-deep-dive.md +37 -0
  400. package/security/threat-modeling-stride-playbook.md +40 -0
  401. package/seed-templates/auth-system.md +59 -0
  402. package/seed-templates/blog-content.md +94 -0
  403. package/seed-templates/dashboard.md +89 -0
  404. package/seed-templates/docs-site.md +73 -0
  405. package/seed-templates/e-commerce.md +50 -0
  406. package/seed-templates/saas-landing.md +92 -0
  407. package/seed-templates/settings-page.md +51 -0
  408. package/testing/01-standards/test-strategy-and-layering.md +83 -0
  409. package/testing/01-standards/testing-strategy-complete.md +422 -0
  410. package/testing/01-standards/unit-testing-best-practices.md +118 -0
  411. package/testing/02-playbooks/e2e-testing-playbook.md +988 -0
  412. package/testing/02-playbooks/testing-strategy-playbook.md +126 -0
  413. package/testing/03-checklists/test-strategy-checklist.md +208 -0
  414. package/testing/04-antipatterns/testing-antipatterns.md +718 -0
  415. package/testing/05-cases/case-testing-transformation.md +300 -0
  416. package/testing/06-glossary/testing-glossary.md +110 -0
  417. package/testing/risk-based-test-matrix.md +36 -0
  418. package/testing/testing-strategy-deep-dive.md +37 -0
@@ -0,0 +1,620 @@
+ ---
+ id: capacity-planning-playbook
+ title: Capacity Planning Playbook
+ domain: operations
+ category: 02-playbooks
+ difficulty: intermediate
+ tags: [capacity, operations, planning, playbook, business growth assessment template, business overview, prerequisites, report information]
+ quality_score: 70
+ last_updated: 2026-06-15
+ ---
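+ # Capacity Planning Playbook
+
+ ## Overview
+
+ Capacity planning is a systematic engineering practice that ensures a system keeps meeting its performance and availability targets as the business grows. This playbook covers the full process from requirements analysis to autoscaling, helping teams build a data-driven capacity decision process and avoid both wasted resources and performance degradation. It applies to medium and large production systems (more than 10,000 daily active users or more than 100 million requests per month).
+
+ ## Prerequisites
+
+ ### Required
+
+ - [ ] Production observability is deployed (Prometheus / Datadog / CloudWatch)
+ - [ ] SLIs/SLOs are defined for core business metrics
+ - [ ] Historical traffic data covers at least the last 90 days
+ - [ ] Infrastructure as Code (IaC) is in place and supports rapid scale-out and scale-in
+ - [ ] Cost monitoring tooling is configured (AWS Cost Explorer / GCP Billing / in-house)
+
+ ### Recommended
+
+ - [ ] The business side has provided growth expectations for the next 6-12 months
+ - [ ] At least one baseline load test has been completed
+ - [ ] Alerting covers core resource metrics (CPU/Memory/Disk/Network)
+
+ ---
+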
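+ ## Phase 1: Requirements Analysis and Business Modeling
+
+ ### 1.1 Business Growth Forecast
+
+ ```markdown
+ ## Business Growth Assessment Template
+
+ ### Current Baseline
+ - Daily active users (DAU): ___
+ - Peak QPS: ___
+ - Request latency (P50/P95/P99): ___/___/___ ms
+ - Storage growth rate: ___ GB/month
+ - Peak database connections: ___
+
+ ### Growth Expectations
+ - Expected DAU in 3 months: ___ (growth rate ___%)
+ - Expected DAU in 6 months: ___ (growth rate ___%)
+ - Expected DAU in 12 months: ___ (growth rate ___%)
+ - Planned marketing campaigns / major promotion dates: ___
+ - Estimated traffic impact of upcoming feature launches: ___
+
+ ### Seasonality
+ - Intraday peak-to-trough ratio: ___ (e.g. 10:1)
+ - Weekly fluctuation pattern: ___
+ - Annual seasonal peaks: ___
+ ```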
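
The growth expectations in the template are stated per horizon (3, 6, 12 months), while capacity formulas usually want a monthly rate. The sketch below shows one way to derive a compound monthly growth rate from a current and an expected DAU. The function name `implied_monthly_growth` and the sample numbers are hypothetical and are not part of the playbook; it also assumes growth compounds evenly month over month, which real traffic rarely does exactly.

```python
def implied_monthly_growth(current_dau: float, expected_dau: float, months: int) -> float:
    """Return the compound monthly growth rate that takes current_dau to expected_dau.

    Assumes smooth compounding: expected = current * (1 + rate) ** months.
    """
    if current_dau <= 0 or expected_dau <= 0 or months <= 0:
        raise ValueError("DAU values and months must be positive")
    return (expected_dau / current_dau) ** (1 / months) - 1


# Hypothetical example: 100,000 DAU today, 200,000 expected in 6 months.
rate = implied_monthly_growth(100_000, 200_000, 6)
print(f"Implied monthly growth: {rate:.1%}")
```

Doubling in six months works out to a bit over 12% per month, which is noticeably lower than the naive 100% / 6 estimate, so it is worth computing rather than eyeballing.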
+
+ ### 1.2 Resource Consumption Profile
+
+ For each core service, build a resource consumption model:
+
+ ```yaml
+ # Example service resource profile
+ service: order-service
+ resource_profile:
+   cpu:
+     baseline: 200m         # CPU usage when idle
+     per_request: 5m        # incremental CPU per request
+     peak_multiplier: 3.0   # peak multiplier
+   memory:
+     baseline: 256Mi
+     per_connection: 2Mi
+     cache_overhead: 128Mi
+   disk_io:
+     read_iops: 500
+     write_iops: 200
+   network:
+     ingress_per_req: 2KB
+     egress_per_req: 8KB
+ dependencies:
+   - name: postgres
+     connections_per_pod: 10
+     max_pool: 50
+   - name: redis
+     connections_per_pod: 20
+ ```
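
A profile like this can be turned into a rough pod count. The sketch below is one possible reading, not the playbook's method: it assumes `per_request` means millicores consumed per sustained request per second, and it introduces a pod CPU limit and a `max_utilization` headroom factor that do not appear in the profile. The names `parse_millicores` and `pods_needed` are hypothetical.

```python
import math


def parse_millicores(value: str) -> int:
    """Convert a Kubernetes-style CPU quantity ('200m' or '2') to millicores."""
    if value.endswith("m"):
        return int(value[:-1])
    return int(float(value) * 1000)


def pods_needed(target_qps: float, baseline: str, per_request: str,
                pod_cpu_limit: str, max_utilization: float = 0.7) -> int:
    """Estimate how many pods are needed to serve target_qps.

    Assumption: each pod may use at most max_utilization of its CPU limit,
    and whatever is left after the idle baseline is spent on requests.
    """
    usable = parse_millicores(pod_cpu_limit) * max_utilization - parse_millicores(baseline)
    if usable <= 0:
        raise ValueError("CPU limit leaves no room above the idle baseline")
    qps_per_pod = usable / parse_millicores(per_request)
    return math.ceil(target_qps / qps_per_pod)


# Hypothetical example: 2300 QPS target, 2-core pod limit, values from the profile.
pods = pods_needed(2300, baseline="200m", per_request="5m", pod_cpu_limit="2000m")
postgres_connections = pods * 10  # connections_per_pod from the profile
print(pods, postgres_connections)
```

Multiplying the pod count by `connections_per_pod` gives a quick check on whether the database can accept the resulting number of connections before scaling out.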
+
+ ### 1.3 Capacity Requirement Calculation
+
+ ```python
+ # Capacity requirement formula
+ def calculate_capacity(current_qps, growth_rate, months, safety_margin=0.3):
+     """
+     Calculate the target capacity requirement.
+
+     Args:
+         current_qps: current peak QPS
+         growth_rate: monthly growth rate (e.g. 0.15 means 15%)
+         months: planning horizon in months
+         safety_margin: safety headroom (default 30%)
+
+     Returns:
+         target_qps: target capacity
+     """
+     projected_qps = current_qps * (1 + growth_rate) ** months
+     target_qps = projected_qps * (1 + safety_margin)
+     return target_qps
+
+ # Example: 1000 QPS today, 10% monthly growth, 6-month horizon
+ # Target = 1000 * 1.1^6 * 1.3 ≈ 2303 QPS
+ ```
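
The same compound-growth model can be inverted to ask how long the currently provisioned capacity will last. The sketch below solves `current_qps * (1 + growth_rate) ** months * (1 + safety_margin) = provisioned_qps` for `months`. The function name `months_of_runway` and the sample figures are hypothetical, and the result is only as good as the constant-growth assumption behind it.

```python
import math


def months_of_runway(current_qps: float, provisioned_qps: float,
                     growth_rate: float, safety_margin: float = 0.3) -> float:
    """Return how many months until traffic plus safety margin exceeds provisioned capacity."""
    if current_qps <= 0 or provisioned_qps <= 0:
        raise ValueError("QPS values must be positive")
    if growth_rate <= 0:
        return math.inf  # flat or shrinking traffic never exhausts capacity
    usable_qps = provisioned_qps / (1 + safety_margin)
    if usable_qps <= current_qps:
        return 0.0  # the safety margin is already being consumed
    return math.log(usable_qps / current_qps) / math.log(1 + growth_rate)


# Hypothetical example: 1000 QPS today, 2300 QPS provisioned, 10% monthly growth.
runway = months_of_runway(1000, 2300, 0.10)
print(f"Runway: {runway:.1f} months")
```

With these inputs the runway comes out at roughly six months, which is the point at which procurement or scaling work would need to be finished, not started.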
+
+ ---
+
+ ## Phase 2: Load Testing
+
+ ### 2.1 k6 Load Test Plan
+
+ ```javascript
+ // k6-capacity-test.js
+ // BASE_URL is read from the environment, e.g.:
+ //   k6 run -e BASE_URL=https://staging.example.com k6-capacity-test.js
+ import http from 'k6/http';
+ import { check, sleep } from 'k6';
+ import { Rate, Trend } from 'k6/metrics';
+
+ const errorRate = new Rate('errors');
+ const latency = new Trend('request_latency');
+
+ // Stepped load: ramp up virtual users gradually to find the system's inflection point
+ export const options = {
+   stages: [
+     { duration: '2m', target: 50 },   // warm-up
+     { duration: '5m', target: 100 },  // baseline load
+     { duration: '5m', target: 200 },  // 2x baseline
+     { duration: '5m', target: 400 },  // 4x baseline
+     { duration: '5m', target: 600 },  // 6x baseline (probe the limit)
+     { duration: '5m', target: 800 },  // 8x baseline (overload test)
+     { duration: '3m', target: 0 },    // ramp down and recover
+   ],
+   thresholds: {
+     http_req_duration: ['p(95)<500', 'p(99)<1000'],
+     errors: ['rate<0.01'],
+   },
+ };
148
+
149
+ export default function () {
150
+ // 模拟核心业务场景(按真实流量比例混合)
151
+ const scenarios = [
152
+ { weight: 50, fn: browseProducts },
153
+ { weight: 30, fn: searchProducts },
154
+ { weight: 15, fn: addToCart },
155
+ { weight: 5, fn: checkout },
156
+ ];
157
+
158
+ const rand = Math.random() * 100;
159
+ let cumulative = 0;
160
+ for (const s of scenarios) {
161
+ cumulative += s.weight;
162
+ if (rand < cumulative) {
163
+ s.fn();
164
+ break;
165
+ }
166
+ }
167
+
168
+ sleep(1);
169
+ }
170
+
171
+ function browseProducts() {
172
+ const res = http.get(`${__ENV.BASE_URL}/api/products`);
173
+ check(res, { 'browse 200': (r) => r.status === 200 });
174
+ errorRate.add(res.status !== 200);
175
+ latency.add(res.timings.duration);
176
+ }
177
+
178
+ function searchProducts() {
179
+ const res = http.get(`${__ENV.BASE_URL}/api/search?q=test`);
180
+ check(res, { 'search 200': (r) => r.status === 200 });
181
+ errorRate.add(res.status !== 200);
182
+ latency.add(res.timings.duration);
183
+ }
184
+
185
+ function addToCart() {
186
+ const payload = JSON.stringify({ productId: 1, quantity: 1 });
187
+ const params = { headers: { 'Content-Type': 'application/json' } };
188
+ const res = http.post(`${__ENV.BASE_URL}/api/cart`, payload, params);
189
+ check(res, { 'cart 200': (r) => r.status === 200 });
190
+ errorRate.add(res.status !== 200);
191
+ latency.add(res.timings.duration);
192
+ }
193
+
194
+ function checkout() {
195
+ const res = http.post(`${__ENV.BASE_URL}/api/checkout`, '{}', {
196
+ headers: { 'Content-Type': 'application/json' },
197
+ });
198
+ check(res, { 'checkout 200': (r) => r.status === 200 });
199
+ errorRate.add(res.status !== 200);
200
+ latency.add(res.timings.duration);
201
+ }
202
+ ```
+
+ ### 2.2 Locust Load Test Plan
+
+ ```python
+ # locustfile.py
+ from locust import HttpUser, task, between
+
+ class CapacityTestUser(HttpUser):
+     """User model for capacity-planning load tests"""
+     wait_time = between(0.5, 2.0)
+
+     @task(50)
+     def browse_products(self):
+         with self.client.get("/api/products", catch_response=True) as resp:
+             if resp.status_code != 200:
+                 resp.failure(f"Status {resp.status_code}")
+
+     @task(30)
+     def search_products(self):
+         with self.client.get("/api/search?q=test", catch_response=True) as resp:
+             if resp.status_code != 200:
+                 resp.failure(f"Status {resp.status_code}")
+
+     @task(15)
+     def add_to_cart(self):
+         payload = {"productId": 1, "quantity": 1}
+         with self.client.post("/api/cart", json=payload, catch_response=True) as resp:
+             if resp.status_code != 200:
+                 resp.failure(f"Status {resp.status_code}")
+
+     @task(5)
+     def checkout(self):
+         with self.client.post("/api/checkout", json={}, catch_response=True) as resp:
+             if resp.status_code != 200:
+                 resp.failure(f"Status {resp.status_code}")
+
+ # Run command (headless, so the user count and run time apply without the web UI):
+ # locust -f locustfile.py --headless --host=http://target:8080 \
+ #   --users 500 --spawn-rate 10 --run-time 30m \
+ #   --csv=capacity_report --html=capacity_report.html
+ ```
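+
+ ### 2.3 Load Test Execution Rules
+
+ | Item | Requirement |
+ |------|------|
+ | Test environment | Must match the production configuration (instance size, replica count, middleware versions) |
+ | Data preparation | Data volume at least 50% of production, with the same distribution characteristics |
+ | Baseline test | First measure baseline P50/P95/P99 under the current configuration |
+ | Step increase | Raise load by 50-100% per stage and hold each stage for at least 5 minutes |
+ | Monitoring coverage | Collect CPU/Memory/Disk IO/Network/DB metrics throughout the test |
+ | Timeouts | Request timeouts must match production (do not raise them) |
+ | Repeat verification | Reproduce key conclusions in at least 2 runs |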
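+
+ ---
+
+ ## Phase 3: Bottleneck Identification
+
+ ### 3.1 Resource Dimension Analysis Matrix
+
+ ```markdown
+ ## Bottleneck Identification Checklist
+
+ ### CPU Bottlenecks
+ - [ ] Is CPU utilization consistently > 70%?
+ - [ ] Is there CPU throttling (in containers, check nr_throttled)?
+ - [ ] Can hot functions be optimized (confirm via profiling)?
+ - [ ] Is there unnecessary serialization/deserialization overhead?
+
+ ### Memory Bottlenecks
+ - [ ] Does RSS memory keep growing (possible memory leak)?
+ - [ ] Do GC pauses affect latency?
+ - [ ] Are there OOM Kill events?
+ - [ ] Does the cache hit rate meet the target (> 90%)?
+
+ ### Disk I/O Bottlenecks
+ - [ ] Is IOPS approaching the disk limit?
+ - [ ] Is I/O await time too high (> 10ms)?
+ - [ ] Is there write amplification (WAL, excessive logging)?
+ - [ ] Is disk usage > 80%?
+
+ ### Network Bottlenecks
+ - [ ] Is bandwidth usage approaching the NIC limit?
+ - [ ] Is the TCP connection count approaching ulimit?
+ - [ ] Is DNS resolution latency too high?
+ - [ ] Is cross-availability-zone traffic excessive?
+
+ ### Database Bottlenecks
+ - [ ] What share of queries are slow (> 100ms)?
+ - [ ] Is connection pool utilization > 80%?
+ - [ ] Are lock waits frequent?
+ - [ ] Do indexes cover the high-frequency queries?
+ - [ ] Is primary-replica replication lag < 1s?
+
+ ### External Dependency Bottlenecks
+ - [ ] Is third-party API P99 latency < 500ms?
+ - [ ] Is the circuit breaker trip frequency normal?
+ - [ ] Is the message queue backlog under control?
+ ```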
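+
+ ### 3.2 Locating Performance Inflection Points
+
+ The system's performance inflection points are the core data points for capacity planning. Using the stepped load-test data, identify the following points:
+
+ | Inflection point | Definition | How to identify |
+ |----------|------|----------|
+ | Point A (end of linear region) | Latency starts to deviate from linear growth | P95 latency rises > 20% between adjacent steps |
+ | Point B (saturation point) | Throughput no longer rises with load | QPS grows < 5% while load increases > 20% |
+ | Point C (breaking point) | Error rate spikes and the system becomes unavailable | Error rate > 1% or P99 > SLO threshold |
+
+ **Safe operating range = 70-80% of the load at point A**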
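
One way to apply the criteria in the table is to compare each load step with the previous one. The sketch below is a minimal version under stated assumptions: the `Step` record, the function names, and the sample measurements are all made up for illustration, each point is reported as the load of the step where its criterion first holds, and the optional P99-versus-SLO check for point C is left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Step:
    load: float        # offered load for the step (e.g. virtual users)
    qps: float         # measured throughput
    p95_ms: float      # measured P95 latency
    error_rate: float  # fraction of failed requests, 0.0-1.0


def find_inflection_points(steps: List[Step]) -> Dict[str, Optional[float]]:
    """Return the load at which each criterion from the table first holds."""
    points: Dict[str, Optional[float]] = {"A": None, "B": None, "C": None}
    for prev, cur in zip(steps, steps[1:]):
        load_growth = (cur.load - prev.load) / prev.load
        qps_growth = (cur.qps - prev.qps) / prev.qps
        p95_growth = (cur.p95_ms - prev.p95_ms) / prev.p95_ms
        if points["A"] is None and p95_growth > 0.20:
            points["A"] = cur.load
        if points["B"] is None and load_growth > 0.20 and qps_growth < 0.05:
            points["B"] = cur.load
    for step in steps:
        if step.error_rate > 0.01:
            points["C"] = step.load
            break
    return points


def safe_range(point_a: float) -> Tuple[float, float]:
    """Safe operating range: 70-80% of the load at point A."""
    return (0.7 * point_a, 0.8 * point_a)


# Made-up sample measurements, one entry per load step.
steps = [
    Step(100, 95, 80, 0.0),
    Step(200, 190, 88, 0.0),
    Step(400, 370, 120, 0.001),
    Step(600, 380, 300, 0.004),
    Step(800, 382, 900, 0.05),
]
points = find_inflection_points(steps)
print(points, safe_range(points["A"]))
```

Reporting the previous step instead of the current one for point A would give a more conservative safe range; which convention fits depends on how coarse the load steps are.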
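+
+ ---
+
+ ## Phase 4: Scaling Strategy
+
+ ### 4.1 Vertical Scaling (Scale Up)
+
+ When it applies:
+ - Database primary nodes (write bottleneck)
+ - Stateful services (high migration cost)
+ - Single-thread bottlenecks (need a higher clock speed)
+
+ ```yaml
+ # Vertical scaling decision matrix
+ vertical_scaling:
+   when_to_use:
+     - Single-instance CPU/Memory has not hit its limit
+     - The application cannot scale horizontally (strongly stateful)
+     - The scaling window is urgent (< 1 hour)
+   limits:
+     - Single-machine ceiling is bounded by the cloud provider's instance sizes
+     - Resizing may require downtime (in some scenarios)
+     - Cost rises steeply at larger instance sizes
+   sizing_guide:
+     cpu_upgrade: "Upgrade one tier at a time (e.g. 4c -> 8c), then observe for 24h"
+     memory_upgrade: "Scale to 2x current usage, leaving headroom for GC"
+     disk_upgrade: "When IOPS is insufficient, upgrade the disk type first (gp3 -> io2)"
+ ```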
+
+ ### 4.2 Horizontal Scaling (Scale Out)
+
+ When it applies:
+ - Stateless Web/API services
+ - Read-heavy databases (add read replicas)
+ - Message consumers (add consumer instances)
+
+ ```yaml
+ # Horizontal scaling example (Kubernetes)
+ # Excerpt: selector, Pod labels and container image are omitted
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: api-server
+ spec:
+   replicas: 6  # scaled from 3 to 6
+   strategy:
+     rollingUpdate:
+       maxSurge: 2  # rolling update batch size
+       maxUnavailable: 1
+   template:
+     spec:
+       containers:
+         - name: api
+           resources:
+             requests:
+               cpu: "500m"
+               memory: "512Mi"
+             limits:
+               cpu: "1000m"
+               memory: "1Gi"
+       topologySpreadConstraints:  # spread across availability zones
+         - maxSkew: 1
+           topologyKey: topology.kubernetes.io/zone
+           whenUnsatisfiable: DoNotSchedule
+ ```
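+
+ ### 4.3 Scaling Decision Tree
+
+ ```
+ What is the current bottleneck?
+ ├── CPU → Does the application support multiple replicas?
+ │   ├── Yes → Scale out (add Pods)
+ │   └── No → Scale up (upgrade CPU)
+ ├── Memory → Can the cache be externalized?
+ │   ├── Yes → Introduce an external cache (Redis)
+ │   └── No → Scale up (add memory)
+ ├── Disk I/O → Is it a read bottleneck?
+ │   ├── Yes → Add read replicas / introduce a cache layer
+ │   └── No → Upgrade the disk type / shard the database
+ ├── Network → Is it cross-zone traffic?
+ │   ├── Yes → Add in-zone replicas / CDN
+ │   └── No → Upgrade the NIC / optimize payloads
+ └── DB connections → Introduce connection pooling / PgBouncer / ProxySQL
+ ```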
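
The tree above can also be encoded as a small lookup so that a runbook or bot returns the same recommendation every time. This is only a sketch: the `DECISION_TREE` table, the key names, and the `recommend` function are hypothetical, and the strings simply restate the branches of the tree.

```python
from typing import Optional

# Each branch maps to (question, recommendation if yes, recommendation if no).
DECISION_TREE = {
    "cpu": ("Supports multiple replicas?",
            "scale out (add Pods)", "scale up (upgrade CPU)"),
    "memory": ("Can the cache be externalized?",
               "introduce an external cache (Redis)", "scale up (add memory)"),
    "disk_io": ("Is it a read bottleneck?",
                "add read replicas / introduce a cache layer",
                "upgrade the disk type / shard the database"),
    "network": ("Is it cross-zone traffic?",
                "add in-zone replicas / CDN", "upgrade the NIC / optimize payloads"),
}


def recommend(bottleneck: str, answer: Optional[bool] = None) -> str:
    """Return the recommendation for a bottleneck and a yes/no answer."""
    key = bottleneck.lower()
    if key == "db_connections":
        # This branch has no follow-up question in the tree.
        return "introduce connection pooling / PgBouncer / ProxySQL"
    if key not in DECISION_TREE:
        raise ValueError(f"unknown bottleneck: {bottleneck}")
    question, if_yes, if_no = DECISION_TREE[key]
    if answer is None:
        raise ValueError(f"an answer is required: {question}")
    return if_yes if answer else if_no


print(recommend("cpu", answer=True))
print(recommend("disk_io", answer=False))
print(recommend("db_connections"))
```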
+
+ ---
+
+ ## Phase 5: Autoscaling
+
+ ### 5.1 Kubernetes HPA
+
+ ```yaml
+ # HPA configuration (CPU + custom metric)
+ apiVersion: autoscaling/v2
+ kind: HorizontalPodAutoscaler
+ metadata:
+   name: api-server-hpa
+ spec:
+   scaleTargetRef:
+     apiVersion: apps/v1
+     kind: Deployment
+     name: api-server
+   minReplicas: 3
+   maxReplicas: 20
+   behavior:
+     scaleUp:
+       stabilizationWindowSeconds: 60  # scale-up stabilization window: 1 minute
+       policies:
+         - type: Pods
+           value: 4  # add at most 4 Pods per period
+           periodSeconds: 60
+     scaleDown:
+       stabilizationWindowSeconds: 300  # scale-down stabilization window: 5 minutes
+       policies:
+         - type: Percent
+           value: 25  # remove at most 25% per period
+           periodSeconds: 120
+   metrics:
+     - type: Resource
+       resource:
+         name: cpu
+         target:
+           type: Utilization
+           averageUtilization: 65  # CPU target 65%
+     - type: Resource
+       resource:
+         name: memory
+         target:
+           type: Utilization
+           averageUtilization: 75
+     - type: Pods
+       pods:
+         metric:
+           name: http_requests_per_second  # custom business metric
+         target:
+           type: AverageValue
+           averageValue: "100"  # target 100 QPS per Pod
+ ```
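
To reason about what this configuration would do for a given reading, it helps to recall the core calculation described in the Kubernetes HPA documentation: `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with the largest proposal across metrics winning. The sketch below reproduces only that step. It ignores the tolerance band, the stabilization windows, and the `behavior` policies, and the function names and sample readings are hypothetical.

```python
import math
from typing import Iterable, Tuple


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float) -> int:
    """Core HPA ratio: scale the replica count by current/target, rounded up."""
    return math.ceil(current_replicas * current_value / target_value)


def hpa_recommendation(current_replicas: int,
                       metrics: Iterable[Tuple[float, float]],
                       min_replicas: int = 3, max_replicas: int = 20) -> int:
    """Take the largest per-metric proposal, then clamp to the min/max bounds."""
    proposal = max(desired_replicas(current_replicas, cur, tgt)
                   for cur, tgt in metrics)
    return max(min_replicas, min(max_replicas, proposal))


# Hypothetical readings for 4 running Pods: (current value, target value).
readings = [
    (90, 65),    # CPU utilization %
    (60, 75),    # memory utilization %
    (180, 100),  # http_requests_per_second per Pod
]
print(hpa_recommendation(4, readings))
```

In this example the request-rate metric drives the decision, which shows why a business metric can react earlier than CPU alone.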
+
+ ### 5.2 KEDA Event-Driven Scaling
+
+ ```yaml
+ # KEDA ScaledObject (based on message queue depth)
+ apiVersion: keda.sh/v1alpha1
+ kind: ScaledObject
+ metadata:
+   name: order-processor-scaler
+ spec:
+   scaleTargetRef:
+     name: order-processor
+   minReplicaCount: 2
+   maxReplicaCount: 50
+   pollingInterval: 15
+   cooldownPeriod: 120
+   triggers:
+     - type: rabbitmq
+       metadata:
+         host: "amqp://rabbitmq.default:5672"
+         queueName: "orders"
+         queueLength: "100"  # one Pod per 100 queued messages
+     - type: prometheus
+       metadata:
+         serverAddress: "http://prometheus:9090"
+         metricName: order_processing_lag_seconds
+         threshold: "30"  # scale out when processing lag > 30s
+         query: |
+           avg(order_processing_lag_seconds{job="order-processor"})
+ ```
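+
+ ### 5.3 Scaling Policy Best Practices
+
+ | Policy | Recommended value | Rationale |
+ |------|--------|------|
+ | Scale-up cooldown | 30-60s | Respond quickly to traffic spikes |
+ | Scale-down cooldown | 300-600s | Avoid frequent scale-downs caused by traffic fluctuation |
+ | CPU target utilization | 60-70% | Leave headroom for bursts |
+ | Minimum replicas | >= 2 | Ensure high availability |
+ | Maximum replicas | Based on budget and cluster capacity | Prevent resource exhaustion |
+ | Scale-up step | 50-100% of current replicas | Reach the needed capacity quickly |
+ | Scale-down step | 10-25% of current replicas | Shrink smoothly |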
+
+ ---
+
+ ## Phase 6: Cost Optimization
+
+ ### 6.1 Resource Utilization Audit
+
+ ```bash
+ #!/bin/bash
+ # Resource utilization audit script
+ echo "=== Resource utilization audit ==="
+
+ # Actual usage across all namespaces (top Pods by CPU)
+ kubectl top pods -A --sort-by=cpu | head -20
+
+ # List CPU requests per Pod; compare with the usage above to spot over-allocation
+ echo "--- Over-allocation check ---"
+ kubectl get pods -A -o json | jq -r '
+   .items[] |
+   select(.spec.containers[0].resources.requests.cpu != null) |
+   "\(.metadata.namespace)/\(.metadata.name) - CPU Request: \(.spec.containers[0].resources.requests.cpu)"
+ '
+
+ # List PVCs sorted by requested storage
+ echo "--- Storage requests ---"
+ kubectl get pvc -A --sort-by=.spec.resources.requests.storage
+ ```
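
The script above prints usage and requests separately, so the comparison still has to be done by hand. A minimal sketch of that comparison follows. It assumes the two outputs have already been joined into `(name, cpu_request, cpu_usage)` tuples, and the 2x threshold, the helper names, and the sample Pods are illustrative assumptions rather than values from the script.

```python
from typing import Iterable, List, Tuple


def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("250m", "2", "0.5") to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def over_allocated(pods: Iterable[Tuple[str, str, str]],
                   ratio_threshold: float = 2.0) -> List[Tuple[str, float]]:
    """Return (name, request/usage ratio) for Pods at or above the threshold."""
    flagged = []
    for name, request, usage in pods:
        req_m, used_m = parse_cpu(request), parse_cpu(usage)
        ratio = req_m / max(used_m, 1)  # guard against division by zero for idle Pods
        if ratio >= ratio_threshold:
            flagged.append((name, round(ratio, 1)))
    return flagged


# Made-up sample: (namespace/name, CPU request, observed CPU usage).
sample = [
    ("default/api-server-abc", "500m", "120m"),
    ("default/order-svc-def", "1", "800m"),
    ("batch/report-job-ghi", "2", "150m"),
]
print(over_allocated(sample))
```

A single snapshot can mislead, so ratios like these are more useful when computed from usage averaged over a representative period.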
+
+ ### 6.2 Cost Optimization Checklist
+
+ | Optimization | Expected savings | Risk | Effort |
+ |--------|----------|----------|----------|
+ | Use Spot/Preemptible instances (stateless services) | 60-80% | Medium | Low |
+ | Right-size resource requests/limits | 20-40% | Low | Low |
+ | Reserved Instances / Savings Plans (baseline load) | 30-50% | Low | Low |
+ | Scale down during off-peak hours (scheduled HPA) | 15-25% | Low | Low |
+ | Archive cold data (S3 Glacier / infrequent-access storage) | 40-60% | Low | Medium |
+ | Trim multi-AZ replicas (non-critical services) | 10-20% | Medium | Medium |
+ | Optimize the cache layer to reduce DB read load | Indirect | Low | Medium |
+
+ ---
+
+ ## Phase 7: Capacity Report
+
+ ### 7.1 Capacity Report Template
+
+ ```markdown
+ # Capacity Planning Report
+
+ ## Report Information
+ - Report date: YYYY-MM-DD
+ - Author: ___
+ - Review period: Q_ YYYY
+ - Next review: YYYY-MM-DD
+
+ ## 1. Business Overview
+ | Metric | Previous period | Current period | Change | Forecast (next period) |
+ |------|------|------|------|-------------|
+ | DAU | | | | |
+ | Peak QPS | | | | |
+ | Data volume (GB) | | | | |
+ | Monthly cost ($) | | | | |
+
+ ## 2. Core Service Capacity Status
+ | Service | Current capacity | Actual load | Utilization | Headroom to inflection point | Risk level |
+ |------|---------|---------|--------|---------|---------|
+ | api-server | 2000 QPS | 1200 QPS | 60% | 800 QPS | Low |
+ | order-svc | 500 QPS | 420 QPS | 84% | 80 QPS | High |
+ | user-db | 200 conn | 180 conn | 90% | 20 conn | Critical |
+
+ ## 3. Resource Utilization Summary
+ | Resource type | Total quota | Actual usage | Utilization | Trend |
+ |----------|--------|---------|--------|------|
+ | CPU (cores) | 100 | 62 | 62% | ↑ |
+ | Memory (GB) | 256 | 180 | 70% | ↑ |
+ | Storage (TB) | 10 | 7.2 | 72% | ↑ |
+ | DB Connections | 500 | 380 | 76% | → |
+
+ ## 4. Load Test Summary
+ - Test date: YYYY-MM-DD
+ - Tool: k6 / Locust
+ - Point A: ___ QPS (P95 < 200ms)
+ - Point B: ___ QPS (throughput saturated)
+ - Point C: ___ QPS (error rate > 1%)
+ - Safe operating limit: ___ QPS
+
+ ## 5. Risks and Recommendations
+ | Risk | Impact | Recommended action | Priority | Estimated cost |
+ |--------|------|---------|--------|---------|
+ | order-svc near saturation | Failed orders | Scale out to 8 Pods | P0 | $200/month |
+ | user-db connection pool full | Site-wide degradation | Introduce PgBouncer | P0 | $50/month |
+ | Storage growing fast | Disk full | Archive cold data | P1 | Saves $100/month |
+
+ ## 6. Action Plan
+ | ID | Action item | Owner | Due date | Status |
+ |------|--------|--------|---------|------|
+ | CP-01 | Scale out order-svc | SRE-A | YYYY-MM-DD | Pending |
+ | CP-02 | Introduce PgBouncer | DBA-B | YYYY-MM-DD | Pending |
+ | CP-03 | Cold data archiving plan | SRE-C | YYYY-MM-DD | Planning |
+ ```
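
The utilization and risk columns in section 2 of the template can be derived rather than typed in by hand. The sketch below shows one way to do it. Mapping the sample labels onto the 70% and 85% alert thresholds from the Agent Checklist in this document is an assumption, as are the function names; the service figures are the sample rows from the template.

```python
def utilization(load: float, capacity: float) -> float:
    """Utilization as a percentage of capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return load / capacity * 100


def risk_level(util_pct: float, warn: float = 70.0, alert: float = 85.0) -> str:
    """Assumed mapping: above alert is critical, above warn is high, else low."""
    if util_pct > alert:
        return "critical"
    if util_pct > warn:
        return "high"
    return "low"


# Sample rows from the template: service -> (actual load, current capacity).
services = {
    "api-server": (1200, 2000),
    "order-svc": (420, 500),
    "user-db": (180, 200),
}
for name, (load, capacity) in services.items():
    pct = utilization(load, capacity)
    print(f"{name}: {pct:.0f}% -> {risk_level(pct)}, headroom {capacity - load}")
```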
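+
+ ### 7.2 Capacity Review Cadence
+
+ | Review type | Frequency | Participants | Output |
+ |----------|------|--------|------|
+ | Weekly quick check | Weekly | SRE on-call | Slack notification of anomalous alerts |
+ | Monthly review | Monthly | SRE + architects | Monthly capacity brief |
+ | Quarterly planning | Quarterly | SRE + architecture + business | Full capacity report + budget request |
+ | Major promotion readiness | 4 weeks before the promotion | Whole team | Load test report + scaling plan |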
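+
+ ---
+
+ ## Agent Checklist
+
+ - [ ] Business growth forecast completed and resource consumption profiles built
+ - [ ] Baseline and stepped load tests completed with k6 or Locust
+ - [ ] Performance inflection points identified (points A/B/C)
+ - [ ] Bottleneck analysis completed (CPU/Memory/Disk/Network/DB/external dependencies)
+ - [ ] Scaling strategy defined (vertical vs horizontal vs hybrid)
+ - [ ] Autoscaling configured (HPA/KEDA or an equivalent)
+ - [ ] Cost optimization audit completed and recommendations delivered
+ - [ ] Full capacity planning report delivered
+ - [ ] Regular capacity review cadence established (weekly/monthly/quarterly)
+ - [ ] Capacity alerts integrated into monitoring (warn at utilization > 70%, alert at > 85%)
+ - [ ] Load test scripts are under version control and repeatable
+ - [ ] Growth expectations aligned with the business side and the capacity model updated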