@umacloud/knowledge 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (418)
  1. package/00-governance/governance-capabilities.md +557 -0
  2. package/00-governance/knowledge-map.md +39 -0
  3. package/00-governance/maintenance-policy.md +76 -0
  4. package/00-governance/review-checklist.md +81 -0
  5. package/README.md +13 -0
  6. package/ai/01-standards/agent-development-complete.md +691 -0
  7. package/ai/01-standards/llm-application-complete.md +488 -0
  8. package/ai/01-standards/mlops-complete.md +798 -0
  9. package/ai/01-standards/prompt-engineering-complete.md +646 -0
  10. package/ai/01-standards/rag-architecture-complete.md +649 -0
  11. package/ai/02-playbooks/llm-evaluation-playbook.md +847 -0
  12. package/ai/03-checklists/ai-project-checklist.md +215 -0
  13. package/ai/04-antipatterns/ai-antipatterns.md +661 -0
  14. package/ai/05-cases/case-rag-production.md +147 -0
  15. package/ai/06-glossary/ai-glossary.md +162 -0
  16. package/ai/agent-evaluation-benchmark.md +53 -0
  17. package/ai/ai-agent-memory-context-management.md +41 -0
  18. package/ai/ai-cost-capacity-optimization-playbook.md +42 -0
  19. package/ai/ai-data-security-and-compliance-playbook.md +37 -0
  20. package/ai/ai-domain-index-and-checklist.md +40 -0
  21. package/ai/ai-governance-maturity-model.md +50 -0
  22. package/ai/ai-model-selection-and-routing-strategy.md +47 -0
  23. package/ai/ai-observability-and-oncall-runbook.md +52 -0
  24. package/ai/ai-rag-engineering-playbook.md +42 -0
  25. package/ai/ai-red-team-and-safety-evaluation.md +42 -0
  26. package/ai/ai-release-readiness-and-rollback-gate.md +42 -0
  27. package/ai/llm-agent-engineering-deep-dive.md +57 -0
  28. package/ai/prompt-and-tool-guardrails.md +52 -0
  29. package/api/01-standards/enterprise-api-standards.md +198 -0
  30. package/api/01-standards/rest-api-design-guide.md +63 -0
  31. package/api/02-playbooks/api-pagination-playbook.md +93 -0
  32. package/api/02-playbooks/graphql-production-playbook.md +176 -0
  33. package/api/03-checklists/api-review-checklist.md +55 -0
  34. package/api/04-antipatterns/api-antipatterns.md +112 -0
  35. package/architecture/01-standards/api-gateway-patterns.md +496 -0
  36. package/architecture/01-standards/cloud-native-patterns.md +644 -0
  37. package/architecture/01-standards/distributed-systems-patterns.md +591 -0
  38. package/architecture/01-standards/event-driven-architecture.md +595 -0
  39. package/architecture/01-standards/microservices-patterns-complete.md +968 -0
  40. package/architecture/01-standards/microservices-patterns.md +495 -0
  41. package/architecture/01-standards/system-design-interview.md +664 -0
  42. package/architecture/02-playbooks/microservices-patterns-playbook.md +137 -0
  43. package/architecture/02-playbooks/migration-playbook.md +780 -0
  44. package/architecture/02-playbooks/system-design-playbook.md +779 -0
  45. package/architecture/03-checklists/architecture-decision-checklist.md +297 -0
  46. package/architecture/04-antipatterns/architecture-antipatterns.md +417 -0
  47. package/architecture/05-cases/case-netflix-microservices.md +413 -0
  48. package/architecture/06-glossary/architecture-glossary.md +164 -0
  49. package/architecture/adr-template-and-examples.md +38 -0
  50. package/architecture/api-gateway-deep-dive.md +1291 -0
  51. package/architecture/configuration-management.md +1162 -0
  52. package/architecture/distributed-transactions.md +1220 -0
  53. package/architecture/microservices-complete.md +735 -0
  54. package/architecture/resilience-and-disaster-patterns.md +37 -0
  55. package/architecture/service-governance.md +1198 -0
  56. package/architecture/system-architecture-deep-dive.md +37 -0
  57. package/backend/01-standards/analytics-and-growth.md +65 -0
  58. package/backend/01-standards/api-and-error-conventions.md +120 -0
  59. package/backend/01-standards/application-layering-and-packaging.md +160 -0
  60. package/backend/01-standards/auth-implementation.md +104 -0
  61. package/backend/01-standards/backend-framework-idioms.md +74 -0
  62. package/backend/01-standards/background-jobs-and-async.md +66 -0
  63. package/backend/01-standards/caching-strategies-complete.md +390 -0
  64. package/backend/01-standards/config-and-observability.md +77 -0
  65. package/backend/01-standards/data-modeling-and-persistence.md +94 -0
  66. package/backend/01-standards/django-complete.md +1765 -0
  67. package/backend/01-standards/email-and-notifications.md +64 -0
  68. package/backend/01-standards/fastapi-complete.md +925 -0
  69. package/backend/01-standards/file-upload-and-storage.md +66 -0
  70. package/backend/01-standards/graphql-api-complete.md +416 -0
  71. package/backend/01-standards/llm-application-standard.md +78 -0
  72. package/backend/01-standards/message-queue-patterns.md +379 -0
  73. package/backend/01-standards/microservices-and-distributed.md +78 -0
  74. package/backend/01-standards/nestjs-complete.md +2167 -0
  75. package/backend/01-standards/payment-integration.md +80 -0
  76. package/backend/01-standards/rate-limiting-complete.md +451 -0
  77. package/backend/01-standards/realtime-and-websocket.md +65 -0
  78. package/backend/01-standards/search-and-filtering.md +64 -0
  79. package/backend/01-standards/spring-boot-complete.md +445 -0
  80. package/backend/02-playbooks/api-design-playbook.md +718 -0
  81. package/backend/02-playbooks/email-send-playbook.md +130 -0
  82. package/backend/02-playbooks/file-upload-s3-playbook.md +153 -0
  83. package/backend/02-playbooks/typescript-enterprise-playbook.md +133 -0
  84. package/backend/02-playbooks/websocket-realtime-playbook.md +154 -0
  85. package/backend/03-checklists/api-launch-checklist.md +189 -0
  86. package/backend/04-antipatterns/backend-antipatterns.md +1051 -0
  87. package/blockchain/01-standards/blockchain-basics.md +557 -0
  88. package/blockchain/01-standards/smart-contract-development.md +1315 -0
  89. package/cicd/01-standards/deployment-and-delivery-standard.md +96 -0
  90. package/cicd/01-standards/github-actions-complete.md +473 -0
  91. package/cicd/01-standards/release-and-store-submission.md +75 -0
  92. package/cicd/02-playbooks/cicd-pipeline-playbook.md +144 -0
  93. package/cicd/02-playbooks/release-management-playbook.md +605 -0
  94. package/cicd/03-checklists/pipeline-security-checklist.md +168 -0
  95. package/cicd/04-antipatterns/cicd-antipatterns.md +589 -0
  96. package/cicd/05-cases/case-deployment-automation.md +221 -0
  97. package/cicd/05-cases/case-gitops-transformation.md +212 -0
  98. package/cicd/06-glossary/cicd-glossary.md +114 -0
  99. package/cicd/cicd-blueprint-deep-dive.md +38 -0
  100. package/cicd/release-readiness-gate.md +37 -0
  101. package/cloud-native/01-standards/container-security.md +741 -0
  102. package/cloud-native/01-standards/kubernetes-complete.md +812 -0
  103. package/cloud-native/02-playbooks/api-gateway-playbook.md +155 -0
  104. package/cloud-native/02-playbooks/gitops-with-argocd.md +760 -0
  105. package/cloud-native/02-playbooks/k8s-troubleshooting-playbook.md +1942 -0
  106. package/cloud-native/02-playbooks/message-queue-playbook.md +129 -0
  107. package/cloud-native/02-playbooks/multicloud-governance.md +726 -0
  108. package/cloud-native/02-playbooks/serverless-patterns.md +788 -0
  109. package/cloud-native/02-playbooks/service-mesh-playbook.md +612 -0
  110. package/cloud-native/02-playbooks/terraform-iac-playbook.md +143 -0
  111. package/cloud-native/03-checklists/container-security-checklist.md +431 -0
  112. package/cloud-native/03-checklists/k8s-production-readiness-checklist.md +460 -0
  113. package/cloud-native/04-antipatterns/container-antipatterns.md +660 -0
  114. package/cloud-native/04-antipatterns/k8s-antipatterns.md +743 -0
  115. package/cloud-native/05-cases/case-k8s-migration.md +478 -0
  116. package/cloud-native/05-cases/case-k8s-scaling.md +642 -0
  117. package/cloud-native/05-cases/case-k8s-security-incident.md +397 -0
  118. package/cloud-native/06-glossary/cloud-native-glossary.md +337 -0
  119. package/cross-platform/01-standards/cross-platform-frameworks.md +83 -0
  120. package/cross-platform/01-standards/platform-selection-and-architecture.md +77 -0
  121. package/data/01-standards/elasticsearch-complete.md +2098 -0
  122. package/data/01-standards/postgresql-complete.md +1613 -0
  123. package/data/01-standards/redis-complete.md +1527 -0
  124. package/data/02-playbooks/database-optimization-playbook.md +403 -0
  125. package/data/02-playbooks/elasticsearch-production-playbook.md +132 -0
  126. package/data/03-checklists/database-launch-checklist.md +187 -0
  127. package/data/04-antipatterns/database-antipatterns.md +873 -0
  128. package/data/05-cases/case-database-migration.md +310 -0
  129. package/data/06-glossary/database-glossary.md +440 -0
  130. package/data/data-governance-and-modeling-deep-dive.md +39 -0
  131. package/data-engineering/01-standards/airflow-complete.md +523 -0
  132. package/data-engineering/01-standards/kafka-complete.md +1521 -0
  133. package/data-engineering/02-playbooks/spark-etl-playbook.md +496 -0
  134. package/data-engineering/03-checklists/pipeline-launch-checklist.md +194 -0
  135. package/data-engineering/04-antipatterns/data-pipeline-antipatterns.md +684 -0
  136. package/data-engineering/05-cases/case-real-time-pipeline.md +355 -0
  137. package/data-engineering/06-glossary/data-engineering-glossary.md +429 -0
  138. package/database/01-standards/database-schema-standards.md +147 -0
  139. package/database/02-playbooks/postgresql-optimization-quick.md +52 -0
  140. package/database/02-playbooks/postgresql-performance-optimization.md +58 -0
  141. package/database/02-playbooks/postgresql-production-playbook.md +146 -0
  142. package/database/02-playbooks/redis-caching-playbook.md +117 -0
  143. package/database/03-checklists/database-review-checklist.md +50 -0
  144. package/database/04-antipatterns/database-antipatterns.md +112 -0
  145. package/design/01-standards/ui-design-system-complete.md +423 -0
  146. package/design/02-playbooks/design-handoff-playbook.md +254 -0
  147. package/design/02-playbooks/design-review-playbook.md +388 -0
  148. package/design/03-checklists/design-review-checklist.md +246 -0
  149. package/design/04-antipatterns/design-antipatterns.md +378 -0
  150. package/design/05-cases/case-design-system-adoption.md +328 -0
  151. package/design/06-glossary/design-glossary.md +329 -0
  152. package/design/ui-full-lifecycle-cross-platform-playbook.md +571 -0
  153. package/design/ux-system-deep-dive.md +38 -0
  154. package/design-systems/00-craft-rules.md +71 -0
  155. package/design-systems/aesthetic-families.md +43 -0
  156. package/design-systems/anti-ai-slop.md +162 -0
  157. package/design-systems/bold-geometric.md +120 -0
  158. package/design-systems/brutalist-bold.md +103 -0
  159. package/design-systems/editorial-clean.md +109 -0
  160. package/design-systems/glass-aurora.md +108 -0
  161. package/design-systems/modern-minimal.md +145 -0
  162. package/design-systems/premium-luxury.md +106 -0
  163. package/design-systems/product-type-design-map.md +48 -0
  164. package/design-systems/soft-warm.md +123 -0
  165. package/design-systems/tech-utility.md +113 -0
  166. package/desktop/01-standards/desktop-app-standard.md +72 -0
  167. package/desktop/01-standards/desktop-design.md +71 -0
  168. package/development/00-governance/document-template.md +41 -0
  169. package/development/01-standards/api-versioning-strategies.md +432 -0
  170. package/development/01-standards/authentication-patterns-complete.md +479 -0
  171. package/development/01-standards/css-architecture-complete.md +550 -0
  172. package/development/01-standards/database-migration-strategies.md +484 -0
  173. package/development/01-standards/elasticsearch-complete.md +347 -0
  174. package/development/01-standards/git-complete.md +371 -0
  175. package/development/01-standards/golang-complete.md +1565 -0
  176. package/development/01-standards/graphql-complete.md +298 -0
  177. package/development/01-standards/javascript-bundlers-complete.md +469 -0
  178. package/development/01-standards/javascript-typescript-complete.md +528 -0
  179. package/development/01-standards/jest-complete.md +275 -0
  180. package/development/01-standards/linux-complete.md +234 -0
  181. package/development/01-standards/logging-observability-complete.md +526 -0
  182. package/development/01-standards/microservices-communication.md +502 -0
  183. package/development/01-standards/mongodb-complete.md +406 -0
  184. package/development/01-standards/oauth2-complete.md +285 -0
  185. package/development/01-standards/performance-optimization-complete.md +289 -0
  186. package/development/01-standards/playwright-complete.md +247 -0
  187. package/development/01-standards/postgresql-complete.md +456 -0
  188. package/development/01-standards/pytest-complete.md +340 -0
  189. package/development/01-standards/python-async-programming.md +902 -0
  190. package/development/01-standards/python-complete.md +956 -0
  191. package/development/01-standards/python-decorators-complete.md +799 -0
  192. package/development/01-standards/python-design-patterns.md +2854 -0
  193. package/development/01-standards/python-packaging-distribution.md +420 -0
  194. package/development/01-standards/python-testing-strategies.md +607 -0
  195. package/development/01-standards/python-web-frameworks-comparison.md +471 -0
  196. package/development/01-standards/redis-complete.md +317 -0
  197. package/development/01-standards/rest-api-complete.md +316 -0
  198. package/development/01-standards/rust-complete.md +578 -0
  199. package/development/01-standards/typescript-advanced-types.md +1513 -0
  200. package/development/01-standards/web-security-complete.md +292 -0
  201. package/development/02-playbooks/api-design-playbook.md +810 -0
  202. package/development/02-playbooks/database-migration-playbook.md +580 -0
  203. package/development/02-playbooks/debugging-playbook.md +692 -0
  204. package/development/02-playbooks/feature-delivery-playbook.md +430 -0
  205. package/development/02-playbooks/incident-hotfix-playbook.md +387 -0
  206. package/development/02-playbooks/performance-optimization-playbook.md +531 -0
  207. package/development/02-playbooks/performance-tuning-playbook.md +652 -0
  208. package/development/02-playbooks/refactor-playbook.md +403 -0
  209. package/development/02-playbooks/release-playbook.md +469 -0
  210. package/development/03-checklists/architecture-review-checklist.md +168 -0
  211. package/development/03-checklists/data-migration-checklist.md +157 -0
  212. package/development/03-checklists/oncall-handover-checklist.md +173 -0
  213. package/development/03-checklists/pr-checklist.md +158 -0
  214. package/development/03-checklists/production-readiness-checklist.md +190 -0
  215. package/development/03-checklists/release-readiness-checklist.md +154 -0
  216. package/development/03-checklists/security-review-checklist.md +182 -0
  217. package/development/04-antipatterns/api-antipatterns.md +657 -0
  218. package/development/04-antipatterns/architecture-antipatterns.md +686 -0
  219. package/development/04-antipatterns/backend-antipatterns.md +648 -0
  220. package/development/04-antipatterns/cicd-antipatterns.md +540 -0
  221. package/development/04-antipatterns/code-smell-antipatterns.md +571 -0
  222. package/development/04-antipatterns/data-antipatterns.md +658 -0
  223. package/development/04-antipatterns/database-antipatterns.md +578 -0
  224. package/development/04-antipatterns/frontend-antipatterns.md +635 -0
  225. package/development/04-antipatterns/reliability-antipatterns.md +700 -0
  226. package/development/04-antipatterns/security-antipatterns.md +747 -0
  227. package/development/05-cases/case-api-version-migration.md +428 -0
  228. package/development/05-cases/case-authorization-hardening.md +383 -0
  229. package/development/05-cases/case-bluegreen-rollback.md +466 -0
  230. package/development/05-cases/case-cache-snowball-protection.md +485 -0
  231. package/development/05-cases/case-ci-cd-pipeline.md +544 -0
  232. package/development/05-cases/case-database-scaling.md +500 -0
  233. package/development/05-cases/case-db-hotspot-optimization.md +487 -0
  234. package/development/05-cases/case-incident-mttr-reduction.md +563 -0
  235. package/development/05-cases/case-microservice-migration.md +375 -0
  236. package/development/05-cases/case-performance-optimization.md +406 -0
  237. package/development/05-cases/case-security-incident-response.md +345 -0
  238. package/development/06-glossary/full-stack-glossary.md +166 -0
  239. package/development/09-maturity/quarterly-audit-template.md +35 -0
  240. package/development/11-ui-excellence/ui-aesthetic-system.md +41 -0
  241. package/development/11-ui-excellence/ui-engineering-excellence.md +435 -0
  242. package/development/12-scenarios/development-scenarios-guide.md +565 -0
  243. package/development/13-implementation-assets/implementation-toolkit.md +282 -0
  244. package/development/13-implementation-assets/knowledge-gates-execution.md +43 -0
  245. package/development/14-full-lifecycle/software-lifecycle-gates.md +511 -0
  246. package/development/15-lifecycle-templates/project-templates-collection.md +791 -0
  247. package/development/api-contract-and-versioning-guide.md +36 -0
  248. package/development/api-governance-complete.md +43 -0
  249. package/development/backend-engineering-complete.md +43 -0
  250. package/development/code-review-quality-complete.md +43 -0
  251. package/development/concurrency-reliability-complete.md +43 -0
  252. package/development/database-engineering-complete.md +43 -0
  253. package/development/engineering-effectiveness-complete.md +43 -0
  254. package/development/engineering-standards-deep-dive.md +38 -0
  255. package/development/frontend-engineering-complete.md +43 -0
  256. package/development/performance-capacity-complete.md +43 -0
  257. package/development/refactor-migration-complete.md +42 -0
  258. package/development/refactoring-and-techdebt-playbook.md +37 -0
  259. package/development/security-in-development-complete.md +43 -0
  260. package/devops/01-standards/cicd-pipeline-complete.md +262 -0
  261. package/devops/01-standards/docker-complete.md +1490 -0
  262. package/devops/01-standards/github-actions-complete.md +337 -0
  263. package/devops/01-standards/kubernetes-complete.md +638 -0
  264. package/devops/01-standards/terraform-complete.md +2117 -0
  265. package/devops/02-playbooks/docker-compose-playbook.md +233 -0
  266. package/devops/02-playbooks/docker-k8s-production-playbook.md +186 -0
  267. package/devops/02-playbooks/docker-production-playbook.md +952 -0
  268. package/edge-iot/01-standards/edge-iot-complete.md +473 -0
  269. package/experts/architect/api-design.md +178 -0
  270. package/experts/architect/methodology.md +124 -0
  271. package/experts/architect/security.md +75 -0
  272. package/experts/backend-lead/methodology.md +216 -0
  273. package/experts/devops/methodology.md +160 -0
  274. package/experts/frontend-lead/methodology.md +178 -0
  275. package/experts/product-manager/industry/ecommerce.md +43 -0
  276. package/experts/product-manager/industry/saas.md +40 -0
  277. package/experts/product-manager/methodology.md +97 -0
  278. package/experts/qa-lead/methodology.md +123 -0
  279. package/experts/qa-lead/test-strategy.md +128 -0
  280. package/experts/uiux-designer/methodology.md +125 -0
  281. package/frontend/01-standards/accessibility-complete.md +532 -0
  282. package/frontend/01-standards/accessibility-standard.md +74 -0
  283. package/frontend/01-standards/admin-dashboard-and-crud.md +72 -0
  284. package/frontend/01-standards/design-tokens-complete.md +444 -0
  285. package/frontend/01-standards/forms-and-validation.md +77 -0
  286. package/frontend/01-standards/frontend-architecture-and-layering.md +119 -0
  287. package/frontend/01-standards/i18n-and-localization.md +65 -0
  288. package/frontend/01-standards/nextjs-complete.md +451 -0
  289. package/frontend/01-standards/react-complete.md +713 -0
  290. package/frontend/01-standards/react-hooks-complete-guide.md +1100 -0
  291. package/frontend/01-standards/react-hooks-complete.md +1171 -0
  292. package/frontend/01-standards/seo-and-web-vitals.md +77 -0
  293. package/frontend/01-standards/state-management-complete.md +444 -0
  294. package/frontend/01-standards/vue-complete.md +499 -0
  295. package/frontend/01-standards/vue3-complete.md +2002 -0
  296. package/frontend/01-standards/web-framework-best-practices.md +64 -0
  297. package/frontend/01-standards/web-performance-complete.md +495 -0
  298. package/frontend/02-playbooks/accessibility-a11y-playbook.md +161 -0
  299. package/frontend/02-playbooks/frontend-performance-playbook.md +707 -0
  300. package/frontend/02-playbooks/i18n-internationalization-playbook.md +120 -0
  301. package/frontend/02-playbooks/performance-optimization-playbook.md +163 -0
  302. package/frontend/02-playbooks/react-nextjs-production-playbook.md +167 -0
  303. package/frontend/02-playbooks/react-state-management-playbook.md +173 -0
  304. package/frontend/03-checklists/component-quality-checklist.md +166 -0
  305. package/frontend/03-checklists/frontend-launch-checklist.md +299 -0
  306. package/frontend/04-antipatterns/frontend-antipatterns.md +886 -0
  307. package/frontend/05-cases/case-performance-optimization.md +274 -0
  308. package/harmony/01-standards/harmonyos-arkts-standard.md +75 -0
  309. package/harmony/01-standards/harmonyos-design.md +65 -0
  310. package/high-quality-engineering-playbook.md +54 -0
  311. package/incident/01-standards/incident-response-complete.md +303 -0
  312. package/incident/02-playbooks/chaos-engineering-playbook.md +883 -0
  313. package/incident/02-playbooks/postmortem-playbook.md +398 -0
  314. package/incident/03-checklists/incident-readiness-checklist.md +181 -0
  315. package/incident/04-antipatterns/incident-antipatterns.md +490 -0
  316. package/incident/05-cases/case-cascade-failure.md +176 -0
  317. package/incident/06-glossary/incident-glossary.md +114 -0
  318. package/incident/postmortem-and-response-deep-dive.md +39 -0
  319. package/industries/ecommerce/ecommerce-complete.md +631 -0
  320. package/industries/education/education-complete.md +555 -0
  321. package/industries/fintech/fintech-complete.md +501 -0
  322. package/industries/gaming/gaming-complete.md +587 -0
  323. package/industries/healthcare/healthcare-complete.md +452 -0
  324. package/low-code/01-standards/low-code-complete.md +944 -0
  325. package/miniprogram/01-standards/ai-common-mistakes.md +61 -0
  326. package/miniprogram/01-standards/miniprogram-custom-navbar-capsule.md +77 -0
  327. package/miniprogram/01-standards/miniprogram-design.md +61 -0
  328. package/miniprogram/01-standards/miniprogram-standard.md +81 -0
  329. package/mobile/01-standards/android-material-design.md +70 -0
  330. package/mobile/01-standards/flutter-complete.md +384 -0
  331. package/mobile/01-standards/ios-design-hig.md +78 -0
  332. package/mobile/01-standards/mobile-app-standard.md +85 -0
  333. package/mobile/01-standards/react-native-complete.md +352 -0
  334. package/mobile/02-playbooks/mobile-cross-platform-playbook.md +175 -0
  335. package/mobile/02-playbooks/mobile-performance.md +473 -0
  336. package/mobile/03-checklists/mobile-release-checklist.md +234 -0
  337. package/mobile/04-antipatterns/mobile-antipatterns.md +798 -0
  338. package/mobile/05-cases/case-app-performance.md +500 -0
  339. package/mobile/05-cases/case-app-startup-optimization.md +218 -0
  340. package/mobile/06-glossary/mobile-glossary.md +484 -0
  341. package/observability/01-standards/observability-standards.md +103 -0
  342. package/observability/02-playbooks/prometheus-grafana-playbook.md +135 -0
  343. package/observability/02-playbooks/structured-logging-playbook.md +73 -0
  344. package/observability/03-checklists/observability-checklist.md +54 -0
  345. package/observability/04-antipatterns/observability-antipatterns.md +106 -0
  346. package/operations/01-standards/prometheus-monitoring-complete.md +1578 -0
  347. package/operations/02-playbooks/capacity-planning-playbook.md +620 -0
  348. package/operations/03-checklists/production-launch-checklist.md +365 -0
  349. package/operations/04-antipatterns/operations-antipatterns.md +664 -0
  350. package/operations/05-cases/case-sre-practices.md +581 -0
  351. package/operations/06-glossary/operations-glossary.md +120 -0
  352. package/operations/aiops-anomaly-detection.md +758 -0
  353. package/operations/capacity-planning.md +1061 -0
  354. package/operations/chaos-engineering.md +659 -0
  355. package/operations/incident-command-system.md +38 -0
  356. package/operations/observability-complete.md +442 -0
  357. package/operations/slo-sli-playbook.md +517 -0
  358. package/operations/sre-operations-deep-dive.md +39 -0
  359. package/package.json +8 -0
  360. package/performance/01-standards/performance-and-scalability.md +80 -0
  361. package/performance/01-standards/performance-standards.md +156 -0
  362. package/performance/02-playbooks/query-optimization-playbook.md +103 -0
  363. package/performance/03-checklists/performance-checklist.md +56 -0
  364. package/performance/04-antipatterns/performance-antipatterns.md +146 -0
  365. package/product/01-standards/product-management-complete.md +285 -0
  366. package/product/02-playbooks/feature-launch-playbook.md +207 -0
  367. package/product/02-playbooks/user-research-playbook.md +532 -0
  368. package/product/03-checklists/feature-launch-checklist.md +275 -0
  369. package/product/04-antipatterns/product-antipatterns.md +355 -0
  370. package/product/05-cases/case-mvp-to-scale.md +384 -0
  371. package/product/06-glossary/product-glossary.md +462 -0
  372. package/product/feature-prioritization-framework.md +40 -0
  373. package/product/kpi-and-metric-tree.md +37 -0
  374. package/product/product-discovery-and-prd-deep-dive.md +41 -0
  375. package/quantum/01-standards/quantum-complete.md +1186 -0
  376. package/security/01-standards/api-security-complete.md +511 -0
  377. package/security/01-standards/container-runtime-security.md +574 -0
  378. package/security/01-standards/data-protection-gdpr.md +543 -0
  379. package/security/01-standards/owasp-top10-complete.md +1890 -0
  380. package/security/01-standards/secure-coding-baseline.md +90 -0
  381. package/security/01-standards/supply-chain-security.md +441 -0
  382. package/security/01-standards/web-security-checklist.md +108 -0
  383. package/security/01-standards/zero-trust-architecture.md +521 -0
  384. package/security/02-playbooks/auth-sso-playbook.md +166 -0
  385. package/security/02-playbooks/incident-response-security-playbook.md +588 -0
  386. package/security/02-playbooks/owasp-api-security-playbook.md +129 -0
  387. package/security/02-playbooks/payment-integration-playbook.md +119 -0
  388. package/security/02-playbooks/penetration-testing-playbook.md +517 -0
  389. package/security/03-checklists/security-audit-checklist.md +356 -0
  390. package/security/04-antipatterns/security-coding-antipatterns.md +580 -0
  391. package/security/05-cases/case-log4shell-incident.md +537 -0
  392. package/security/05-cases/case-major-breaches.md +468 -0
  393. package/security/06-glossary/security-glossary.md +212 -0
  394. package/security/compliance-automation.md +993 -0
  395. package/security/container-security.md +680 -0
  396. package/security/devsecops-complete.md +426 -0
  397. package/security/sast-dast-sca.md +775 -0
  398. package/security/secrets-management.md +594 -0
  399. package/security/security-architecture-deep-dive.md +37 -0
  400. package/security/threat-modeling-stride-playbook.md +40 -0
  401. package/seed-templates/auth-system.md +59 -0
  402. package/seed-templates/blog-content.md +94 -0
  403. package/seed-templates/dashboard.md +89 -0
  404. package/seed-templates/docs-site.md +73 -0
  405. package/seed-templates/e-commerce.md +50 -0
  406. package/seed-templates/saas-landing.md +92 -0
  407. package/seed-templates/settings-page.md +51 -0
  408. package/testing/01-standards/test-strategy-and-layering.md +83 -0
  409. package/testing/01-standards/testing-strategy-complete.md +422 -0
  410. package/testing/01-standards/unit-testing-best-practices.md +118 -0
  411. package/testing/02-playbooks/e2e-testing-playbook.md +988 -0
  412. package/testing/02-playbooks/testing-strategy-playbook.md +126 -0
  413. package/testing/03-checklists/test-strategy-checklist.md +208 -0
  414. package/testing/04-antipatterns/testing-antipatterns.md +718 -0
  415. package/testing/05-cases/case-testing-transformation.md +300 -0
  416. package/testing/06-glossary/testing-glossary.md +110 -0
  417. package/testing/risk-based-test-matrix.md +36 -0
  418. package/testing/testing-strategy-deep-dive.md +37 -0
@@ -0,0 +1,664 @@
+ ---
+ id: operations-antipatterns
+ title: Operations Anti-Patterns
+ domain: operations
+ category: 04-antipatterns
+ difficulty: intermediate
+ tags: [alert, antipatterns, deployment, fatigue, manual, operations, server, snowflake]
+ quality_score: 70
+ last_updated: 2026-06-15
+ ---
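+ # Operations Anti-Patterns
+
+ ## Overview
+
+ This document catalogs the 10 most common anti-patterns in production operations. Each entry covers a problem description, typical symptoms, root-cause analysis, the correct approach, and detection methods. These anti-patterns recur in medium and large systems and are a leading cause of reduced availability and team efficiency. Identifying and fixing them is a key step toward SRE/DevOps maturity.
+
+ ---
+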
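+ ## Anti-Pattern 1: Alert Fatigue
+
+ ### Problem Description
+
+ Too many alerts, a high false-positive rate, and no severity tiers desensitize operators, so genuinely critical alerts are drowned out by noise.
+
+ ### Typical Symptoms
+
+ - On-call receives 100+ alert notifications per day
+ - The alert group/channel has been muted
+ - P0 incidents are discovered through user complaints rather than alerts
+ - The team's default is "ignore it for now and look if it persists"
+ - Alert recovery notifications are ignored outright
+
+ ### Root-Cause Analysis
+
+ - Thresholds are set too low (alerting at CPU > 50%)
+ - No alert aggregation or inhibition rules
+ - No severity levels (all alerts go through the same channel)
+ - Alerts track only resource metrics, not business impact
+ - Stale alerts are never cleaned up (decommissioned services still fire alerts)
+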
+ ### Correct Approach
+
+ ```yaml
+ # Alert governance best practices
+ alert_governance:
+   principles:
+     - Every alert must be actionable (the recipient knows what to do on receipt)
+     - Alerts must be tiered (P0-P3 go to different channels)
+     - Non-urgent alerts do not fire at night
+     - Review alert effectiveness monthly (delete or tune ineffective alerts)
+
+   metrics:
+     target_daily_alerts: "< 5 P0/P1 alerts"
+     noise_ratio: "< 10%"              # false-positive rate
+     ack_time_p95: "< 5 minutes"       # time to acknowledge
+     resolution_time_p95: "< 30 minutes"
+
+   tiers:
+     P0_critical:
+       channel: PagerDuty + phone call
+       examples: ["Service unavailable", "Data loss", "Security incident"]
+     P1_high:
+       channel: Slack alert channel + SMS
+       examples: ["Error rate > 5%", "Latency > SLO"]
+     P2_medium:
+       channel: Slack alert channel
+       examples: ["CPU > 80%", "Disk > 85%"]
+     P3_low:
+       channel: Daily digest
+       examples: ["Certificate expires within 30 days", "Outdated dependency version"]
+ ```
+
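The tiering and night-time rules in the YAML above can be sketched as a small routing function. This is a minimal illustration, not a definitive implementation: the channel identifiers, the 22:00 to 08:00 quiet window, the choice of P2/P3 as "non-urgent", and holding those alerts for the daily digest are all assumptions made for the example.

```python
from datetime import time

# Hypothetical channel identifiers mirroring the tiers in the YAML above.
ROUTES = {
    "P0": ["pagerduty", "phone"],
    "P1": ["slack-alerts", "sms"],
    "P2": ["slack-alerts"],
    "P3": ["daily-digest"],
}

# Assumption: only P0 and P1 are urgent enough to interrupt someone at night.
URGENT = {"P0", "P1"}


def route_alert(severity: str, now: time,
                quiet_start: time = time(22, 0),
                quiet_end: time = time(8, 0)) -> list[str]:
    """Return the channels an alert of the given severity should go to."""
    if severity not in ROUTES:
        raise ValueError(f"unknown severity: {severity!r}")

    # The quiet window may wrap past midnight (for example 22:00 to 08:00).
    if quiet_start <= quiet_end:
        in_quiet_hours = quiet_start <= now < quiet_end
    else:
        in_quiet_hours = now >= quiet_start or now < quiet_end

    # Non-urgent alerts raised at night are held for the daily digest.
    if severity not in URGENT and in_quiet_hours:
        return ["daily-digest"]
    return ROUTES[severity]
```

For example, `route_alert("P2", time(3, 0))` returns `["daily-digest"]`, while the same alert at `time(14, 0)` goes to `["slack-alerts"]`.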
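+ ### Detection Methods
+
+ - Track the daily/weekly alert-volume trend
+ - Compute the false-positive rate (the share of alerts that auto-resolve with no one acting on them)
+ - Check whether P0 incidents were discovered through alerts or through user reports
+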
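The false-positive rate described above (alerts that resolved on their own and that nobody handled) can be computed from exported alert history. The sketch below assumes a hypothetical `Alert` record with two boolean fields; real alerting systems expose this information under different names.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    """Hypothetical alert record; field names are assumptions for this sketch."""
    name: str
    auto_resolved: bool   # the alert cleared without intervention
    acknowledged: bool    # a human acknowledged or acted on it


def noise_ratio(alerts: list[Alert]) -> float:
    """Share of alerts that auto-resolved and were never handled by anyone."""
    if not alerts:
        return 0.0
    noisy = sum(1 for a in alerts if a.auto_resolved and not a.acknowledged)
    return noisy / len(alerts)


sample = [
    Alert("cpu_high", auto_resolved=True, acknowledged=False),
    Alert("cpu_high", auto_resolved=True, acknowledged=False),
    Alert("error_rate", auto_resolved=False, acknowledged=True),
    Alert("disk_full", auto_resolved=True, acknowledged=True),
]
print(f"noise ratio: {noise_ratio(sample):.0%}")  # prints "noise ratio: 50%"
```

Tracked week over week, this number can be compared against the `noise_ratio` target of under 10%.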
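+ ---
+
+ ## Anti-Pattern 2: Snowflake Server
+
+ ### Problem Description
+
+ Servers are configured entirely by hand, so each machine is a one-of-a-kind "snowflake" that cannot be replicated or rebuilt; when one fails, all you can do is pray.
+
+ ### Typical Symptoms
+
+ - Nobody knows what software is installed on the production servers
+ - "This machine can't be rebooted; it has a lot of hand-modified configuration"
+ - Standing up a new environment takes days, and the result differs each time
+ - Configuration drift leads to "it works on my machine"
+ - Operational knowledge lives only in a few people's heads
+
+ ### Root-Cause Analysis
+
+ - No infrastructure as code (IaC)
+ - Configuration is changed manually over SSH
+ - No configuration-management tool (Ansible/Chef/Puppet)
+ - No concept of immutable infrastructure
+ - Documentation is missing or out of date
+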
+ ### Correct Approach
+
+ ```hcl
+ # Infrastructure as code (Terraform example)
+ resource "aws_instance" "api_server" {
+   count         = 2                        # example value; count.index below requires count to be set
+   ami           = data.aws_ami.ubuntu.id   # standardized image (data source assumed to be defined elsewhere)
+   instance_type = "t3.large"
+
+   user_data = file("init.sh")              # version-controlled init script
+
+   tags = {
+     Name        = "api-server-${count.index}"
+     Environment = "production"
+     ManagedBy   = "terraform"              # marks the resource as IaC-managed
+   }
+ }
+
+ # Principles:
+ # 1. All infrastructure changes go through Git PR review
+ # 2. Servers are cattle, not pets
+ # 3. Any server can be destroyed and rebuilt at any time
+ # 4. Configuration-drift detection runs daily
+ ```
+
127
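+ ### Detection Methods
128
+
129
+ - Try rebuilding a production server from scratch; record how long it takes and what problems come up
130
+ - Check whether every server's configuration lives in a Git repository
131
+ - Compare the package lists of several servers in the same role and look for differences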
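
The package-list comparison above is easy to automate once each host's inventory has been collected. The sketch below assumes the inventory is already available as a mapping of hostname to `{package: version}`; the function name `package_drift` and the sample hosts are hypothetical, and how the inventory is gathered (for example from the package manager on each host) is left out.

```python
def package_drift(hosts):
    """Return {package: {hostname: version or None}} for packages that differ.

    `hosts` maps hostname -> {package: version}. A version of None means
    the package is not installed on that host.
    """
    all_packages = set()
    for packages in hosts.values():
        all_packages.update(packages)

    drift = {}
    for pkg in sorted(all_packages):
        versions = {host: packages.get(pkg) for host, packages in hosts.items()}
        if len(set(versions.values())) > 1:  # hosts disagree on this package
            drift[pkg] = versions
    return drift


inventory = {
    "web-1": {"nginx": "1.24.0", "openssl": "3.0.2"},
    "web-2": {"nginx": "1.24.0", "openssl": "3.0.13", "htop": "3.2.1"},
}
for pkg, versions in package_drift(inventory).items():
    print(pkg, versions)
```

An empty result for servers in the same role is the expected state; any entry is a candidate snowflake.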
132
+
133
+ ---
134
+
135
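+ ## Anti-pattern 3: Manual Deployment
136
+
137
+ ### Problem Description
138
+
139
+ Deployment depends on people running commands by hand. There is no automated pipeline, and every release is a gamble.
140
+
141
+ ### Typical Symptoms
142
+
143
+ - Deploying requires SSHing into servers and running commands manually
144
+ - Deployment steps live in a wiki or are passed on by word of mouth
145
+ - Only specific people know how to deploy; when they are on leave, nobody can release
146
+ - Deployments are very infrequent (once a month), and each one is a big-bang release
147
+ - Every feature has to be verified manually after deployment
148
+
149
+ ### Root Cause Analysis
150
+
151
+ - No CI/CD pipeline
152
+ - The team lacks confidence in automated deployment
153
+ - Accumulated technical debt makes automation difficult
154
+ - Inertia: "manual deployment still works, doesn't it?"
155
+
156
+ ### Recommended Practice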
157
+
158
+ ```yaml
159
+ # GitHub Actions CI/CD example
160
+ name: Production Deploy
161
+ on:
162
+   push:
163
+     tags: ['v*']
164
+
165
+ jobs:
166
+   deploy:
167
+     runs-on: ubuntu-latest
168
+     steps:
169
+       - uses: actions/checkout@v4
170
+
171
+       - name: Run tests
172
+         run: pytest --tb=short
173
+
174
+       - name: Build & push image
175
+         run: |
176
+           docker build -t registry/app:${{ github.ref_name }} .
177
+           docker push registry/app:${{ github.ref_name }}
178
+
179
+       - name: Deploy to production
180
+         run: |
181
+           kubectl set image deployment/app \
182
+             app=registry/app:${{ github.ref_name }}
183
+           kubectl rollout status deployment/app --timeout=300s
184
+
185
+       - name: Smoke test
186
+         run: ./scripts/smoke-test.sh
187
+
188
+       - name: Notify
189
+         if: always()
190
+         run: ./scripts/notify-deploy.sh ${{ job.status }}
191
+ ```
192
+
193
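+ ### Detection Methods
194
+
195
+ - Track deployment frequency and the duration of each deployment
196
+ - Check whether every deployment has an audit record
197
+ - Test: pick a developer at random and ask them to complete a production deployment within 30 minutes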
198
+
199
+ ---
200
+
201
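+ ## Anti-pattern 4: No Runbook
202
+
203
+ ### Problem Description
204
+
205
+ There is no incident-handling manual, so the on-call engineer has to improvise or wait for an "expert" to come online.
206
+
207
+ ### Typical Symptoms
208
+
209
+ - New on-call engineers have no idea what to do when an alert fires
210
+ - The same failure is handled differently every time
211
+ - Recovery time depends heavily on the experience of whoever is on shift
212
+ - "Only Lao Zhang knows how to handle this one"
213
+ - Postmortems keep producing a "write a Runbook" action item that never gets done
214
+
215
+ ### Root Cause Analysis
216
+
217
+ - No process requires Runbooks to be written
218
+ - Alerts are created without a matching handling guide
219
+ - Runbooks were written but nobody maintains them, so the content is outdated
220
+ - No culture of knowledge sharing
221
+
222
+ ### Recommended Practice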
223
+
224
+ ```markdown
225
+ # Runbook Template
226
+
227
+ ## Alert name: API Error Rate > 5%
228
+
229
+ ### Severity: P1
230
+
231
+ ### Impact
232
+ - Users may see 500 errors
233
+ - Downstream services may be affected
234
+
235
+ ### Diagnostic Steps
236
+ 1. Check the error logs:
237
+    kubectl logs -n prod -l app=api --tail=100 | grep ERROR
238
+ 2. Check the status of dependencies:
239
+    curl -s http://db-monitor:9090/health
240
+    redis-cli -h redis -p 6379 ping
241
+ 3. Check recent deployments:
242
+    kubectl rollout history deployment/api -n prod
243
+ 4. Check resource usage:
244
+    kubectl top pods -n prod -l app=api
245
+
246
+ ### Remediation
247
+ - **If caused by a recent deployment**: roll back -> kubectl rollout undo deployment/api -n prod
248
+ - **If a dependency has failed**: enable the degradation switch -> curl -X POST http://api/admin/circuit-breaker/open
249
+ - **If caused by a traffic spike**: scale out manually -> kubectl scale deployment/api --replicas=10 -n prod
250
+ - **If caused by slow database queries**: contact the on-call DBA and escalate to P0
251
+
252
+ ### Verification
253
+ - Error rate is back to < 1%
254
+ - P95 latency < 200ms
255
+ - Health checks pass
256
+
257
+ ### Escalation Criteria
258
+ - Not recovered within 15 minutes -> escalate to P0 and open a War Room
259
+ ```
260
+
261
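+ ### Detection Methods
262
+
263
+ - Check that every P0/P1 alert has a corresponding Runbook
264
+ - Have new team members handle a simulated failure using the Runbook and record the success rate
265
+ - Measure how incident MTTR correlates with whether a Runbook exists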
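
The first check above, that every P0/P1 alert has a Runbook, can run as a simple lint in CI. This is a sketch under stated assumptions: alert rules are represented as dictionaries with hypothetical `name` and `severity` keys, and `runbooks` is the set of alert names that have a Runbook file. Neither reflects a specific alerting tool's format.

```python
def alerts_missing_runbook(alerts, runbooks, tiers=("P0", "P1")):
    """Return the sorted names of alerts in the given tiers that lack a Runbook."""
    return sorted(
        alert["name"]
        for alert in alerts
        if alert["severity"] in tiers and alert["name"] not in runbooks
    )


alert_rules = [
    {"name": "api_error_rate_high", "severity": "P1"},
    {"name": "service_down", "severity": "P0"},
    {"name": "cpu_high", "severity": "P2"},
]
existing_runbooks = {"api_error_rate_high"}

missing = alerts_missing_runbook(alert_rules, existing_runbooks)
print("alerts without a Runbook:", missing)
```

Failing the build when the list is non-empty ties Runbook authoring to alert creation, which addresses the root cause listed earlier.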
266
+
267
+ ---
268
+
269
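+ ## Anti-pattern 5: Single Point of Failure
270
+
271
+ ### Problem Description
272
+
273
+ A single component with no redundancy sits on the system's critical path, and its failure takes the whole system down.
274
+
275
+ ### Typical Symptoms
276
+
277
+ - The database has a single primary node and no replicas
278
+ - A critical service runs as a single instance
279
+ - All traffic passes through one Nginx server
280
+ - Deployment depends on a single CI server
281
+ - The configuration center or service registry runs on a single node
282
+
283
+ ### Root Cause Analysis
284
+
285
+ - The early architecture did not consider high availability
286
+ - Cost constraints led to redundancy being skipped
287
+ - Wishful thinking: "it has never failed before"
288
+ - No regular failure-mode analysis
289
+
290
+ ### Recommended Practice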
291
+
292
+ ```yaml
293
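+ # Checklist for eliminating single points of failure
294
+ single_point_elimination:
295
+   compute:
296
+     - At least 2 instances per service
297
+     - Deploy across availability zones
298
+     - PDB guarantees minimum availability during rolling updates
299
+
300
+   database:
301
+     - Primary-replica replication (synchronous or semi-synchronous)
302
+     - Automatic failover (Patroni / RDS Multi-AZ)
303
+     - Read/write splitting (write to primary, read from replicas)
304
+
305
+   network:
306
+     - Multi-node load balancers
307
+     - Multiple DNS providers
308
+     - Multiple egress links
309
+
310
+   storage:
311
+     - Off-site backup storage
312
+     - Disk RAID or distributed storage
313
+
314
+   third_party:
315
+     - Critical third-party services have a fallback option
316
+     - Circuit breakers + degradation strategy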
317
+ ```
318
+
319
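+ ### Detection Methods
320
+
321
+ - Draw the system architecture diagram and label each component with its redundancy count
322
+ - Assume each component fails in turn and analyze the blast radius
323
+ - Run Chaos Engineering experiments to verify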
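
The first two steps above can be approximated on paper before any chaos experiment. The sketch below is illustrative only: `replicas` (component to instance count) and `depends_on` (component to the set of components it calls) are hypothetical inputs that would come from the architecture diagram, and "impacted" simply means "depends on the failed component, directly or transitively".

```python
def single_points_of_failure(replicas, depends_on):
    """For each component with fewer than 2 instances, list what its failure impacts."""

    def dependents(target):
        # Walk the dependency graph backwards from the failed component.
        impacted, stack = set(), [target]
        while stack:
            current = stack.pop()
            for component, deps in depends_on.items():
                if current in deps and component not in impacted:
                    impacted.add(component)
                    stack.append(component)
        return impacted

    return {
        component: sorted(dependents(component))
        for component, count in replicas.items()
        if count < 2
    }


replicas = {"nginx": 1, "api": 3, "postgres": 1, "redis": 2}
depends_on = {"nginx": {"api"}, "api": {"postgres", "redis"}}

for component, impacted in single_points_of_failure(replicas, depends_on).items():
    print(f"{component}: single instance, failure impacts {impacted}")
```

The output is a candidate list for the Chaos Engineering experiments in the last step; it does not replace them, since instance count alone says nothing about whether failover actually works.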
324
+
325
+ ---
326
+
327
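+ ## Anti-pattern 6: Ignoring Logs
328
+
329
+ ### Problem Description
330
+
331
+ The logging system exists in name only: logs are not collected centrally, or they are collected but nobody reads them, or the format is too inconsistent to search.
332
+
333
+ ### Typical Symptoms
334
+
335
+ - Troubleshooting means SSHing into every server and grepping logs
336
+ - Log formats are inconsistent (some JSON, some plain text, some mixed)
337
+ - Critical error logs are buried under INFO-level logs
338
+ - Logs fill the disk and crash the service
339
+ - A request's end-to-end logs cannot be correlated by request_id
340
+
341
+ ### Root Cause Analysis
342
+
343
+ - No logging standard
344
+ - Log collection infrastructure has not been built
345
+ - Developers do not prioritize log quality
346
+ - No log retention policy
347
+
348
+ ### Recommended Practice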
349
+
350
+ ```python
351
+ # Structured logging standard (Python example)
352
+ import structlog
353
+ import uuid
354
+
355
+ logger = structlog.get_logger()
356
+
357
+ def handle_request(request):
358
+     # Attach a unique trace_id to every request (reuse the caller's X-Trace-ID if present)
359
+     trace_id = request.headers.get("X-Trace-ID", str(uuid.uuid4()))
360
+     log = logger.bind(
361
+         trace_id=trace_id,
362
+         method=request.method,
363
+         path=request.path,
364
+         user_id=request.user_id,
365
+     )
366
+
367
+     log.info("request_started")
368
+
369
+     try:
370
+         result = process(request)
371
+         log.info("request_completed",
372
+                  status=200,
373
+                  duration_ms=result.duration)
374
+         return result
375
+     except Exception as e:
376
+         log.error("request_failed",
377
+                   error_type=type(e).__name__,
378
+                   error_message=str(e),
379
+                   status=500)
380
+         raise
381
+
382
+ # Example output (JSON; assumes structlog is configured with a JSON renderer and timestamper):
383
+ # {"event":"request_started","trace_id":"abc-123",
384
+ #  "method":"POST","path":"/api/orders","user_id":"u-456",
385
+ #  "timestamp":"2025-01-15T10:30:00Z","level":"info"}
386
+ ```
387
+
388
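+ ### Detection Methods
389
+
390
+ - Try to find the complete log trail of one specific request from 24 hours ago
391
+ - Check whether all logs use a uniform structured JSON format
392
+ - Check whether log retention and rotation policies are configured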
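
The first check above is straightforward once logs are structured. The sketch below reconstructs a request's trail from JSON log lines, assuming each record carries the `trace_id` and `timestamp` fields used in this section's structured-logging example. The function name `trace_logs` is hypothetical, and in production this query would normally run in the central log store rather than in a script.

```python
import json


def trace_logs(lines, trace_id):
    """Return the JSON log records for one trace_id, ordered by timestamp."""
    events = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unstructured line: cannot be correlated, so skip it
        if isinstance(record, dict) and record.get("trace_id") == trace_id:
            events.append(record)
    return sorted(events, key=lambda r: r.get("timestamp", ""))


raw_lines = [
    '{"event":"request_completed","trace_id":"abc-123","timestamp":"2025-01-15T10:30:01Z"}',
    "plain text line without structure",
    '{"event":"request_started","trace_id":"abc-123","timestamp":"2025-01-15T10:30:00Z"}',
    '{"event":"request_started","trace_id":"xyz-789","timestamp":"2025-01-15T10:30:00Z"}',
]
for record in trace_logs(raw_lines, "abc-123"):
    print(record["timestamp"], record["event"])
```

Every line the function has to skip is a line that cannot be tied to a request, which is a rough measure of how far the service is from a uniform structured format.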
393
+
394
+ ---
395
+
396
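+ ## Anti-pattern 7: Untested Backups
397
+
398
+ ### Problem Description
399
+
400
+ A backup policy is configured but the restore process has never been verified; the backups are found to be unusable only when a restore is actually needed.
401
+
402
+ ### Typical Symptoms
403
+
404
+ - "We have backups," but nobody has ever performed a restore
405
+ - Corrupted backup files go unnoticed (for months or even years)
406
+ - Restores take far longer than expected ("the promised 1 hour turned into 8")
407
+ - Restored data is incomplete or inconsistent
408
+ - Backup storage is full and new backups fail silently
409
+
410
+ ### Root Cause Analysis
411
+
412
+ - Backups are treated as done once the scheduled job is configured
413
+ - No process requires restore drills
414
+ - No backup monitoring (no alert when a backup fails)
415
+ - No disaster recovery plan (DR Plan)
416
+
417
+ ### Recommended Practice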
418
+
419
+ ```bash
420
+ #!/bin/bash
421
+ # Automated backup verification script (run weekly)
422
+ set -euo pipefail
423
+
424
+ BACKUP_DATE=$(date +%Y%m%d)
425
+ RESTORE_DB="restore_test_${BACKUP_DATE}"
426
+
427
+ echo "=== Backup verification started: $(date) ==="
428
+
429
+ # 1. Download the latest backup
430
+ echo "Downloading latest backup..."
431
+ aws s3 cp s3://backups/db/latest.sql.gz /tmp/restore_test.sql.gz
432
+
433
+ # 2. Check the integrity of the backup file
434
+ echo "Verifying file integrity..."
435
+ gunzip -t /tmp/restore_test.sql.gz
436
+
437
+ # 3. Restore into a test database (stop on the first SQL error)
438
+ echo "Restoring into test database..."
439
+ createdb "$RESTORE_DB"
440
+ gunzip -c /tmp/restore_test.sql.gz | psql -v ON_ERROR_STOP=1 "$RESTORE_DB"
441
+
442
+ # 4. Verify data integrity (-tA prints the bare value without padding)
443
+ echo "Verifying data integrity..."
444
+ RECORD_COUNT=$(psql -tA "$RESTORE_DB" -c "SELECT count(*) FROM users")
445
+ if [ "$RECORD_COUNT" -lt 1000 ]; then
446
+   echo "ERROR: unexpected record count: $RECORD_COUNT"
447
+   exit 1
448
+ fi
449
+
450
+ # 5. Verify that the tables can be listed
451
+ echo "Verifying table structure..."
452
+ psql "$RESTORE_DB" -c "\dt" > /dev/null
453
+
454
+ # 6. Clean up
455
+ dropdb "$RESTORE_DB"
456
+ rm /tmp/restore_test.sql.gz
457
+
458
+ echo "=== Backup verification passed: $(date) ==="
459
+
460
+ # Send a success notification
461
+ curl -X POST -H 'Content-Type: application/json' "$SLACK_WEBHOOK" \
462
+   -d "{\"text\":\"Backup verification PASSED: $(date)\"}"
463
+ ```
464
+
465
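+ ### Detection Methods
466
+
467
+ - Ask the team for the date of the most recent restore drill
468
+ - Check the success/failure history of backup jobs
469
+ - Check whether backup storage usage is growing at a reasonable rate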
470
+
471
+ ---
472
+
473
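+ ## Anti-pattern 8: Hardcoded Secrets
474
+
475
+ ### Problem Description
476
+
477
+ Passwords, API keys, tokens, and other sensitive values are written directly into code, configuration files, or environment variables, with no secure management.
478
+
479
+ ### Typical Symptoms
480
+
481
+ - Searching the code repository turns up plaintext values for password/secret/token
482
+ - `.env` files are committed to Git
483
+ - All environments share the same set of secrets
484
+ - Secrets have never been rotated
485
+ - Former employees still hold valid secrets
486
+
487
+ ### Root Cause Analysis
488
+
489
+ - A "get it running first" development habit
490
+ - No secrets management tool
491
+ - An incomplete `.gitignore`
492
+ - Code review does not include security checks
493
+ - No secret rotation process
494
+
495
+ ### Recommended Practice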
496
+
497
+ ```yaml
498
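+ # Secrets management best practices
499
+ secrets_management:
500
+   storage:
501
+     - Use HashiCorp Vault / AWS Secrets Manager / GCP Secret Manager
502
+     - On Kubernetes, sync with External Secrets Operator
503
+     - Never store secrets in the code repository
504
+
505
+   access:
506
+     - Least privilege (each service can access only its own secrets)
507
+     - Secret access is audit-logged
508
+     - Prefer dynamic secrets over static ones (Vault dynamic database credentials)
509
+
510
+   rotation:
511
+     - Database passwords: rotate every 90 days
512
+     - API keys: rotate every 180 days
513
+     - TLS certificates: auto-renew (validity < 90 days)
514
+     - Rotate immediately after any leak (zero tolerance)
515
+
516
+   prevention:
517
+     - pre-commit hook scans for secrets (gitleaks / detect-secrets)
518
+     - CI pipeline secret scanning (block commits that contain secrets)
519
+     - Watch for hardcoded credentials during code review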
520
+ ```
521
+
522
+ ```yaml
523
+ # pre-commit configuration example
524
+ # .pre-commit-config.yaml
525
+ repos:
526
+   - repo: https://github.com/gitleaks/gitleaks
527
+     rev: v8.18.0
528
+     hooks:
529
+       - id: gitleaks
530
+ ```
531
+
532
+ ### Detection Methods
533
+
534
+ - Run `gitleaks detect` or `trufflehog` against the code repository
535
+ - Check whether any `.env` file has been committed
536
+ - Review how secrets are passed through CI/CD
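
The rotation intervals in this section (90 days for database passwords, 180 days for API keys) can also be audited automatically. The sketch below assumes a secrets inventory is available as a list of records with hypothetical `name`, `kind`, and `last_rotated` fields; a real secrets manager exposes rotation metadata through its own API, which is not shown here.

```python
from datetime import date

# Maximum age in days per secret kind, taken from the rotation policy in this section.
ROTATION_DAYS = {"db_password": 90, "api_key": 180}


def overdue_secrets(secrets, today):
    """Return the sorted names of secrets older than their rotation interval."""
    return sorted(
        secret["name"]
        for secret in secrets
        if (today - secret["last_rotated"]).days > ROTATION_DAYS[secret["kind"]]
    )


secret_inventory = [
    {"name": "orders-db", "kind": "db_password", "last_rotated": date(2025, 1, 1)},
    {"name": "payments-api", "kind": "api_key", "last_rotated": date(2025, 1, 1)},
]
print(overdue_secrets(secret_inventory, today=date(2025, 6, 1)))
```

On 2025-06-01 both secrets are 151 days old, so only the database password exceeds its interval.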
537
+
538
+ ---
539
+
540
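+ ## Anti-pattern 9: No Capacity Planning
541
+
542
+ ### Problem Description
543
+
544
+ System capacity is pure guesswork: no load testing, no growth forecasting, and no scaling until the system falls over.
545
+
546
+ ### Typical Symptoms
547
+
548
+ - The system goes down repeatedly during sales events or marketing campaigns
549
+ - The database disk fills up and writes fail
550
+ - Every scale-out is an emergency response rather than planned ahead
551
+ - Nobody knows the system's performance ceiling
552
+ - Resource utilization is either very low (wasteful) or very high (dangerous)
553
+
554
+ ### Root Cause Analysis
555
+
556
+ - No regular capacity review
557
+ - No load-testing infrastructure
558
+ - Poor communication between business and engineering teams
559
+ - A crude "just add machines" mindset
560
+
561
+ ### Recommended Practice
562
+
563
+ See `capacity-planning-playbook.md` in this knowledge base. Key points:
564
+
565
+ 1. Hold a capacity review every quarter
566
+ 2. Build a resource-consumption model for each service
567
+ 3. Run baseline load tests regularly
568
+ 4. Plan capacity from business growth forecasts
569
+ 5. Set capacity warning alerts (utilization > 70%)
570
+ 6. Reserve a 30% safety margin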
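
The last three points above combine into a small calculation. The sketch below is one possible reading, not the playbook's own formula: it interprets the 30% safety margin as "plan each instance to run at no more than 70% of its load-tested limit", and the function names and sample numbers are hypothetical.

```python
import math


def required_instances(peak_qps, qps_per_instance, growth_rate, headroom=0.30):
    """Instances needed for forecast peak load while keeping `headroom` spare.

    peak_qps: current observed peak
    qps_per_instance: per-instance limit measured in a baseline load test
    growth_rate: forecast growth over the planning period (0.5 means +50%)
    """
    projected_peak = peak_qps * (1 + growth_rate)
    usable_per_instance = qps_per_instance * (1 - headroom)
    return math.ceil(projected_peak / usable_per_instance)


def needs_capacity_warning(utilization, threshold=0.70):
    """True when utilization exceeds the warning threshold (70% by default)."""
    return utilization > threshold


print(required_instances(peak_qps=1000, qps_per_instance=200, growth_rate=0.5))
print(needs_capacity_warning(0.82))
```

With a 1000 QPS peak, 50% forecast growth, and 200 QPS per instance, the projected peak is 1500 QPS against 140 usable QPS per instance, which rounds up to 11 instances.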
571
+
572
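+ ### Detection Methods
573
+
574
+ - Ask the team for the system's performance ceiling (QPS / connection count)
575
+ - Check whether there are records of regular load tests
576
+ - Check whether resource-utilization alerts are configured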
577
+
578
+ ---
579
+
580
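+ ## Anti-pattern 10: Configuration Drift
581
+
582
+ ### Problem Description
583
+
584
+ The actual configuration of the production environment does not match what is declared in the code repository, and the gap keeps widening.
585
+
586
+ ### Typical Symptoms
587
+
588
+ - "Terraform apply wants to change 50 resources, but we never touched them"
589
+ - Servers in the same role run different software versions
590
+ - Configuration differs between environments (works in test, fails in production)
591
+ - Manual changes are made and never synced back to the IaC code
592
+ - Audits find security group rules that do not match the documentation
593
+
594
+ ### Root Cause Analysis
595
+
596
+ - Manual changes to production are allowed (bypassing IaC)
597
+ - No configuration drift detection
598
+ - The IaC code is not the single source of truth
599
+ - Emergency fixes are not followed up with IaC changes
600
+
601
+ ### Recommended Practice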
602
+
603
+ ```yaml
604
+ # Configuration drift prevention strategy
605
+ drift_prevention:
606
+   principles:
607
+     - IaC code is the Single Source of Truth
608
+     - No manual changes in production (revoke Console/SSH write access)
609
+     - All changes go through PR -> Review -> CI/CD pipeline
610
+
611
+   detection:
612
+     - Run terraform plan daily and alert on differences
613
+     - Run a configuration consistency scan weekly
614
+     - Use AWS Config / Azure Policy / OPA for continuous compliance checks
615
+
616
+   remediation:
617
+     - Drift detected -> create an Issue immediately
618
+     - Within 48 hours, import the manual change into IaC or roll it back
619
+     - Review the cause of the drift and harden the process
620
+
621
+   tooling:
622
+     - Terraform: terraform plan -detailed-exitcode
623
+     - Kubernetes: kubectl diff / ArgoCD drift detection
624
+     - Ansible: --check --diff mode
625
+     - General: driftctl / CloudQuery
626
+ ```
627
+
628
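+ ### Detection Methods
629
+
630
+ - Run `terraform plan` and count the differences
631
+ - Compare the actual state of the Kubernetes cluster with what the Git repository declares
632
+ - Check whether anyone has modified resources directly through the Console in the last 30 days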
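
The daily `terraform plan` check can be scripted around the `-detailed-exitcode` flag, which makes the command exit with 0 when there are no changes, 1 on error, and 2 when the plan contains changes. The wrapper below is a minimal sketch: the function names and the `workdir` parameter are hypothetical, it assumes `terraform` is on `PATH` and the directory is already initialized, and alert delivery is left out.

```python
import subprocess


def classify_plan_exit_code(code):
    """Map the exit code of `terraform plan -detailed-exitcode` to a status."""
    if code == 0:
        return "in_sync"  # plan succeeded, no changes
    if code == 2:
        return "drift"    # plan succeeded, changes present
    return "error"        # 1 (or anything else): the plan itself failed


def check_drift(workdir):
    """Run terraform plan in `workdir` and return (status, plan output)."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit_code(result.returncode), result.stdout


print(classify_plan_exit_code(2))
```

Treating `error` separately from `drift` matters in a scheduled job: a failed plan (for example, expired credentials) should page differently from a real difference between code and infrastructure.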
633
+
634
+ ---
635
+
636
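+ ## Anti-pattern Impact Matrix
637
+
638
+ | Anti-pattern | Availability Impact | Security Impact | Efficiency Impact | Remediation Difficulty | Priority |
639
+ |--------------|---------------------|-----------------|-------------------|------------------------|----------|
640
+ | Alert Fatigue | High | Medium | High | Low | P0 |
641
+ | Snowflake Server | High | Medium | High | High | P1 |
642
+ | Manual Deployment | Medium | Medium | High | Medium | P1 |
643
+ | No Runbook | High | Low | High | Low | P0 |
644
+ | Single Point of Failure | Very High | Low | Low | Medium | P0 |
645
+ | Ignoring Logs | Medium | Medium | High | Medium | P1 |
646
+ | Untested Backups | Very High | Low | Low | Low | P0 |
647
+ | Hardcoded Secrets | Low | Very High | Low | Low | P0 |
648
+ | No Capacity Planning | High | Low | Medium | Medium | P1 |
649
+ | Configuration Drift | Medium | High | Medium | Medium | P1 |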
650
+
651
+ ---
652
+
653
+ ## Agent Checklist
654
+
655
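+ - [ ] Alerting system reviewed; false-positive rate confirmed < 10% and every alert is actionable
656
+ - [ ] All server configuration confirmed to be managed through IaC, with no snowflake servers
657
+ - [ ] Deployment process confirmed to be fully automated, with no manual SSH steps
658
+ - [ ] Every P0/P1 alert confirmed to have a corresponding Runbook
659
+ - [ ] Single-point-of-failure analysis completed; no single points on the critical path
660
+ - [ ] Logs confirmed to be centrally collected, structured, and searchable
661
+ - [ ] Backup restore verification performed; RTO/RPO meet the SLA
662
+ - [ ] Code repository confirmed free of hardcoded secrets; secrets management tooling in place
663
+ - [ ] Capacity planning process established, with regular reviews
664
+ - [ ] Configuration drift detection deployed; IaC is the single source of truth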