@umacloud/knowledge 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (418)
  1. package/00-governance/governance-capabilities.md +557 -0
  2. package/00-governance/knowledge-map.md +39 -0
  3. package/00-governance/maintenance-policy.md +76 -0
  4. package/00-governance/review-checklist.md +81 -0
  5. package/README.md +13 -0
  6. package/ai/01-standards/agent-development-complete.md +691 -0
  7. package/ai/01-standards/llm-application-complete.md +488 -0
  8. package/ai/01-standards/mlops-complete.md +798 -0
  9. package/ai/01-standards/prompt-engineering-complete.md +646 -0
  10. package/ai/01-standards/rag-architecture-complete.md +649 -0
  11. package/ai/02-playbooks/llm-evaluation-playbook.md +847 -0
  12. package/ai/03-checklists/ai-project-checklist.md +215 -0
  13. package/ai/04-antipatterns/ai-antipatterns.md +661 -0
  14. package/ai/05-cases/case-rag-production.md +147 -0
  15. package/ai/06-glossary/ai-glossary.md +162 -0
  16. package/ai/agent-evaluation-benchmark.md +53 -0
  17. package/ai/ai-agent-memory-context-management.md +41 -0
  18. package/ai/ai-cost-capacity-optimization-playbook.md +42 -0
  19. package/ai/ai-data-security-and-compliance-playbook.md +37 -0
  20. package/ai/ai-domain-index-and-checklist.md +40 -0
  21. package/ai/ai-governance-maturity-model.md +50 -0
  22. package/ai/ai-model-selection-and-routing-strategy.md +47 -0
  23. package/ai/ai-observability-and-oncall-runbook.md +52 -0
  24. package/ai/ai-rag-engineering-playbook.md +42 -0
  25. package/ai/ai-red-team-and-safety-evaluation.md +42 -0
  26. package/ai/ai-release-readiness-and-rollback-gate.md +42 -0
  27. package/ai/llm-agent-engineering-deep-dive.md +57 -0
  28. package/ai/prompt-and-tool-guardrails.md +52 -0
  29. package/api/01-standards/enterprise-api-standards.md +198 -0
  30. package/api/01-standards/rest-api-design-guide.md +63 -0
  31. package/api/02-playbooks/api-pagination-playbook.md +93 -0
  32. package/api/02-playbooks/graphql-production-playbook.md +176 -0
  33. package/api/03-checklists/api-review-checklist.md +55 -0
  34. package/api/04-antipatterns/api-antipatterns.md +112 -0
  35. package/architecture/01-standards/api-gateway-patterns.md +496 -0
  36. package/architecture/01-standards/cloud-native-patterns.md +644 -0
  37. package/architecture/01-standards/distributed-systems-patterns.md +591 -0
  38. package/architecture/01-standards/event-driven-architecture.md +595 -0
  39. package/architecture/01-standards/microservices-patterns-complete.md +968 -0
  40. package/architecture/01-standards/microservices-patterns.md +495 -0
  41. package/architecture/01-standards/system-design-interview.md +664 -0
  42. package/architecture/02-playbooks/microservices-patterns-playbook.md +137 -0
  43. package/architecture/02-playbooks/migration-playbook.md +780 -0
  44. package/architecture/02-playbooks/system-design-playbook.md +779 -0
  45. package/architecture/03-checklists/architecture-decision-checklist.md +297 -0
  46. package/architecture/04-antipatterns/architecture-antipatterns.md +417 -0
  47. package/architecture/05-cases/case-netflix-microservices.md +413 -0
  48. package/architecture/06-glossary/architecture-glossary.md +164 -0
  49. package/architecture/adr-template-and-examples.md +38 -0
  50. package/architecture/api-gateway-deep-dive.md +1291 -0
  51. package/architecture/configuration-management.md +1162 -0
  52. package/architecture/distributed-transactions.md +1220 -0
  53. package/architecture/microservices-complete.md +735 -0
  54. package/architecture/resilience-and-disaster-patterns.md +37 -0
  55. package/architecture/service-governance.md +1198 -0
  56. package/architecture/system-architecture-deep-dive.md +37 -0
  57. package/backend/01-standards/analytics-and-growth.md +65 -0
  58. package/backend/01-standards/api-and-error-conventions.md +120 -0
  59. package/backend/01-standards/application-layering-and-packaging.md +160 -0
  60. package/backend/01-standards/auth-implementation.md +104 -0
  61. package/backend/01-standards/backend-framework-idioms.md +74 -0
  62. package/backend/01-standards/background-jobs-and-async.md +66 -0
  63. package/backend/01-standards/caching-strategies-complete.md +390 -0
  64. package/backend/01-standards/config-and-observability.md +77 -0
  65. package/backend/01-standards/data-modeling-and-persistence.md +94 -0
  66. package/backend/01-standards/django-complete.md +1765 -0
  67. package/backend/01-standards/email-and-notifications.md +64 -0
  68. package/backend/01-standards/fastapi-complete.md +925 -0
  69. package/backend/01-standards/file-upload-and-storage.md +66 -0
  70. package/backend/01-standards/graphql-api-complete.md +416 -0
  71. package/backend/01-standards/llm-application-standard.md +78 -0
  72. package/backend/01-standards/message-queue-patterns.md +379 -0
  73. package/backend/01-standards/microservices-and-distributed.md +78 -0
  74. package/backend/01-standards/nestjs-complete.md +2167 -0
  75. package/backend/01-standards/payment-integration.md +80 -0
  76. package/backend/01-standards/rate-limiting-complete.md +451 -0
  77. package/backend/01-standards/realtime-and-websocket.md +65 -0
  78. package/backend/01-standards/search-and-filtering.md +64 -0
  79. package/backend/01-standards/spring-boot-complete.md +445 -0
  80. package/backend/02-playbooks/api-design-playbook.md +718 -0
  81. package/backend/02-playbooks/email-send-playbook.md +130 -0
  82. package/backend/02-playbooks/file-upload-s3-playbook.md +153 -0
  83. package/backend/02-playbooks/typescript-enterprise-playbook.md +133 -0
  84. package/backend/02-playbooks/websocket-realtime-playbook.md +154 -0
  85. package/backend/03-checklists/api-launch-checklist.md +189 -0
  86. package/backend/04-antipatterns/backend-antipatterns.md +1051 -0
  87. package/blockchain/01-standards/blockchain-basics.md +557 -0
  88. package/blockchain/01-standards/smart-contract-development.md +1315 -0
  89. package/cicd/01-standards/deployment-and-delivery-standard.md +96 -0
  90. package/cicd/01-standards/github-actions-complete.md +473 -0
  91. package/cicd/01-standards/release-and-store-submission.md +75 -0
  92. package/cicd/02-playbooks/cicd-pipeline-playbook.md +144 -0
  93. package/cicd/02-playbooks/release-management-playbook.md +605 -0
  94. package/cicd/03-checklists/pipeline-security-checklist.md +168 -0
  95. package/cicd/04-antipatterns/cicd-antipatterns.md +589 -0
  96. package/cicd/05-cases/case-deployment-automation.md +221 -0
  97. package/cicd/05-cases/case-gitops-transformation.md +212 -0
  98. package/cicd/06-glossary/cicd-glossary.md +114 -0
  99. package/cicd/cicd-blueprint-deep-dive.md +38 -0
  100. package/cicd/release-readiness-gate.md +37 -0
  101. package/cloud-native/01-standards/container-security.md +741 -0
  102. package/cloud-native/01-standards/kubernetes-complete.md +812 -0
  103. package/cloud-native/02-playbooks/api-gateway-playbook.md +155 -0
  104. package/cloud-native/02-playbooks/gitops-with-argocd.md +760 -0
  105. package/cloud-native/02-playbooks/k8s-troubleshooting-playbook.md +1942 -0
  106. package/cloud-native/02-playbooks/message-queue-playbook.md +129 -0
  107. package/cloud-native/02-playbooks/multicloud-governance.md +726 -0
  108. package/cloud-native/02-playbooks/serverless-patterns.md +788 -0
  109. package/cloud-native/02-playbooks/service-mesh-playbook.md +612 -0
  110. package/cloud-native/02-playbooks/terraform-iac-playbook.md +143 -0
  111. package/cloud-native/03-checklists/container-security-checklist.md +431 -0
  112. package/cloud-native/03-checklists/k8s-production-readiness-checklist.md +460 -0
  113. package/cloud-native/04-antipatterns/container-antipatterns.md +660 -0
  114. package/cloud-native/04-antipatterns/k8s-antipatterns.md +743 -0
  115. package/cloud-native/05-cases/case-k8s-migration.md +478 -0
  116. package/cloud-native/05-cases/case-k8s-scaling.md +642 -0
  117. package/cloud-native/05-cases/case-k8s-security-incident.md +397 -0
  118. package/cloud-native/06-glossary/cloud-native-glossary.md +337 -0
  119. package/cross-platform/01-standards/cross-platform-frameworks.md +83 -0
  120. package/cross-platform/01-standards/platform-selection-and-architecture.md +77 -0
  121. package/data/01-standards/elasticsearch-complete.md +2098 -0
  122. package/data/01-standards/postgresql-complete.md +1613 -0
  123. package/data/01-standards/redis-complete.md +1527 -0
  124. package/data/02-playbooks/database-optimization-playbook.md +403 -0
  125. package/data/02-playbooks/elasticsearch-production-playbook.md +132 -0
  126. package/data/03-checklists/database-launch-checklist.md +187 -0
  127. package/data/04-antipatterns/database-antipatterns.md +873 -0
  128. package/data/05-cases/case-database-migration.md +310 -0
  129. package/data/06-glossary/database-glossary.md +440 -0
  130. package/data/data-governance-and-modeling-deep-dive.md +39 -0
  131. package/data-engineering/01-standards/airflow-complete.md +523 -0
  132. package/data-engineering/01-standards/kafka-complete.md +1521 -0
  133. package/data-engineering/02-playbooks/spark-etl-playbook.md +496 -0
  134. package/data-engineering/03-checklists/pipeline-launch-checklist.md +194 -0
  135. package/data-engineering/04-antipatterns/data-pipeline-antipatterns.md +684 -0
  136. package/data-engineering/05-cases/case-real-time-pipeline.md +355 -0
  137. package/data-engineering/06-glossary/data-engineering-glossary.md +429 -0
  138. package/database/01-standards/database-schema-standards.md +147 -0
  139. package/database/02-playbooks/postgresql-optimization-quick.md +52 -0
  140. package/database/02-playbooks/postgresql-performance-optimization.md +58 -0
  141. package/database/02-playbooks/postgresql-production-playbook.md +146 -0
  142. package/database/02-playbooks/redis-caching-playbook.md +117 -0
  143. package/database/03-checklists/database-review-checklist.md +50 -0
  144. package/database/04-antipatterns/database-antipatterns.md +112 -0
  145. package/design/01-standards/ui-design-system-complete.md +423 -0
  146. package/design/02-playbooks/design-handoff-playbook.md +254 -0
  147. package/design/02-playbooks/design-review-playbook.md +388 -0
  148. package/design/03-checklists/design-review-checklist.md +246 -0
  149. package/design/04-antipatterns/design-antipatterns.md +378 -0
  150. package/design/05-cases/case-design-system-adoption.md +328 -0
  151. package/design/06-glossary/design-glossary.md +329 -0
  152. package/design/ui-full-lifecycle-cross-platform-playbook.md +571 -0
  153. package/design/ux-system-deep-dive.md +38 -0
  154. package/design-systems/00-craft-rules.md +71 -0
  155. package/design-systems/aesthetic-families.md +43 -0
  156. package/design-systems/anti-ai-slop.md +162 -0
  157. package/design-systems/bold-geometric.md +120 -0
  158. package/design-systems/brutalist-bold.md +103 -0
  159. package/design-systems/editorial-clean.md +109 -0
  160. package/design-systems/glass-aurora.md +108 -0
  161. package/design-systems/modern-minimal.md +145 -0
  162. package/design-systems/premium-luxury.md +106 -0
  163. package/design-systems/product-type-design-map.md +48 -0
  164. package/design-systems/soft-warm.md +123 -0
  165. package/design-systems/tech-utility.md +113 -0
  166. package/desktop/01-standards/desktop-app-standard.md +72 -0
  167. package/desktop/01-standards/desktop-design.md +71 -0
  168. package/development/00-governance/document-template.md +41 -0
  169. package/development/01-standards/api-versioning-strategies.md +432 -0
  170. package/development/01-standards/authentication-patterns-complete.md +479 -0
  171. package/development/01-standards/css-architecture-complete.md +550 -0
  172. package/development/01-standards/database-migration-strategies.md +484 -0
  173. package/development/01-standards/elasticsearch-complete.md +347 -0
  174. package/development/01-standards/git-complete.md +371 -0
  175. package/development/01-standards/golang-complete.md +1565 -0
  176. package/development/01-standards/graphql-complete.md +298 -0
  177. package/development/01-standards/javascript-bundlers-complete.md +469 -0
  178. package/development/01-standards/javascript-typescript-complete.md +528 -0
  179. package/development/01-standards/jest-complete.md +275 -0
  180. package/development/01-standards/linux-complete.md +234 -0
  181. package/development/01-standards/logging-observability-complete.md +526 -0
  182. package/development/01-standards/microservices-communication.md +502 -0
  183. package/development/01-standards/mongodb-complete.md +406 -0
  184. package/development/01-standards/oauth2-complete.md +285 -0
  185. package/development/01-standards/performance-optimization-complete.md +289 -0
  186. package/development/01-standards/playwright-complete.md +247 -0
  187. package/development/01-standards/postgresql-complete.md +456 -0
  188. package/development/01-standards/pytest-complete.md +340 -0
  189. package/development/01-standards/python-async-programming.md +902 -0
  190. package/development/01-standards/python-complete.md +956 -0
  191. package/development/01-standards/python-decorators-complete.md +799 -0
  192. package/development/01-standards/python-design-patterns.md +2854 -0
  193. package/development/01-standards/python-packaging-distribution.md +420 -0
  194. package/development/01-standards/python-testing-strategies.md +607 -0
  195. package/development/01-standards/python-web-frameworks-comparison.md +471 -0
  196. package/development/01-standards/redis-complete.md +317 -0
  197. package/development/01-standards/rest-api-complete.md +316 -0
  198. package/development/01-standards/rust-complete.md +578 -0
  199. package/development/01-standards/typescript-advanced-types.md +1513 -0
  200. package/development/01-standards/web-security-complete.md +292 -0
  201. package/development/02-playbooks/api-design-playbook.md +810 -0
  202. package/development/02-playbooks/database-migration-playbook.md +580 -0
  203. package/development/02-playbooks/debugging-playbook.md +692 -0
  204. package/development/02-playbooks/feature-delivery-playbook.md +430 -0
  205. package/development/02-playbooks/incident-hotfix-playbook.md +387 -0
  206. package/development/02-playbooks/performance-optimization-playbook.md +531 -0
  207. package/development/02-playbooks/performance-tuning-playbook.md +652 -0
  208. package/development/02-playbooks/refactor-playbook.md +403 -0
  209. package/development/02-playbooks/release-playbook.md +469 -0
  210. package/development/03-checklists/architecture-review-checklist.md +168 -0
  211. package/development/03-checklists/data-migration-checklist.md +157 -0
  212. package/development/03-checklists/oncall-handover-checklist.md +173 -0
  213. package/development/03-checklists/pr-checklist.md +158 -0
  214. package/development/03-checklists/production-readiness-checklist.md +190 -0
  215. package/development/03-checklists/release-readiness-checklist.md +154 -0
  216. package/development/03-checklists/security-review-checklist.md +182 -0
  217. package/development/04-antipatterns/api-antipatterns.md +657 -0
  218. package/development/04-antipatterns/architecture-antipatterns.md +686 -0
  219. package/development/04-antipatterns/backend-antipatterns.md +648 -0
  220. package/development/04-antipatterns/cicd-antipatterns.md +540 -0
  221. package/development/04-antipatterns/code-smell-antipatterns.md +571 -0
  222. package/development/04-antipatterns/data-antipatterns.md +658 -0
  223. package/development/04-antipatterns/database-antipatterns.md +578 -0
  224. package/development/04-antipatterns/frontend-antipatterns.md +635 -0
  225. package/development/04-antipatterns/reliability-antipatterns.md +700 -0
  226. package/development/04-antipatterns/security-antipatterns.md +747 -0
  227. package/development/05-cases/case-api-version-migration.md +428 -0
  228. package/development/05-cases/case-authorization-hardening.md +383 -0
  229. package/development/05-cases/case-bluegreen-rollback.md +466 -0
  230. package/development/05-cases/case-cache-snowball-protection.md +485 -0
  231. package/development/05-cases/case-ci-cd-pipeline.md +544 -0
  232. package/development/05-cases/case-database-scaling.md +500 -0
  233. package/development/05-cases/case-db-hotspot-optimization.md +487 -0
  234. package/development/05-cases/case-incident-mttr-reduction.md +563 -0
  235. package/development/05-cases/case-microservice-migration.md +375 -0
  236. package/development/05-cases/case-performance-optimization.md +406 -0
  237. package/development/05-cases/case-security-incident-response.md +345 -0
  238. package/development/06-glossary/full-stack-glossary.md +166 -0
  239. package/development/09-maturity/quarterly-audit-template.md +35 -0
  240. package/development/11-ui-excellence/ui-aesthetic-system.md +41 -0
  241. package/development/11-ui-excellence/ui-engineering-excellence.md +435 -0
  242. package/development/12-scenarios/development-scenarios-guide.md +565 -0
  243. package/development/13-implementation-assets/implementation-toolkit.md +282 -0
  244. package/development/13-implementation-assets/knowledge-gates-execution.md +43 -0
  245. package/development/14-full-lifecycle/software-lifecycle-gates.md +511 -0
  246. package/development/15-lifecycle-templates/project-templates-collection.md +791 -0
  247. package/development/api-contract-and-versioning-guide.md +36 -0
  248. package/development/api-governance-complete.md +43 -0
  249. package/development/backend-engineering-complete.md +43 -0
  250. package/development/code-review-quality-complete.md +43 -0
  251. package/development/concurrency-reliability-complete.md +43 -0
  252. package/development/database-engineering-complete.md +43 -0
  253. package/development/engineering-effectiveness-complete.md +43 -0
  254. package/development/engineering-standards-deep-dive.md +38 -0
  255. package/development/frontend-engineering-complete.md +43 -0
  256. package/development/performance-capacity-complete.md +43 -0
  257. package/development/refactor-migration-complete.md +42 -0
  258. package/development/refactoring-and-techdebt-playbook.md +37 -0
  259. package/development/security-in-development-complete.md +43 -0
  260. package/devops/01-standards/cicd-pipeline-complete.md +262 -0
  261. package/devops/01-standards/docker-complete.md +1490 -0
  262. package/devops/01-standards/github-actions-complete.md +337 -0
  263. package/devops/01-standards/kubernetes-complete.md +638 -0
  264. package/devops/01-standards/terraform-complete.md +2117 -0
  265. package/devops/02-playbooks/docker-compose-playbook.md +233 -0
  266. package/devops/02-playbooks/docker-k8s-production-playbook.md +186 -0
  267. package/devops/02-playbooks/docker-production-playbook.md +952 -0
  268. package/edge-iot/01-standards/edge-iot-complete.md +473 -0
  269. package/experts/architect/api-design.md +178 -0
  270. package/experts/architect/methodology.md +124 -0
  271. package/experts/architect/security.md +75 -0
  272. package/experts/backend-lead/methodology.md +216 -0
  273. package/experts/devops/methodology.md +160 -0
  274. package/experts/frontend-lead/methodology.md +178 -0
  275. package/experts/product-manager/industry/ecommerce.md +43 -0
  276. package/experts/product-manager/industry/saas.md +40 -0
  277. package/experts/product-manager/methodology.md +97 -0
  278. package/experts/qa-lead/methodology.md +123 -0
  279. package/experts/qa-lead/test-strategy.md +128 -0
  280. package/experts/uiux-designer/methodology.md +125 -0
  281. package/frontend/01-standards/accessibility-complete.md +532 -0
  282. package/frontend/01-standards/accessibility-standard.md +74 -0
  283. package/frontend/01-standards/admin-dashboard-and-crud.md +72 -0
  284. package/frontend/01-standards/design-tokens-complete.md +444 -0
  285. package/frontend/01-standards/forms-and-validation.md +77 -0
  286. package/frontend/01-standards/frontend-architecture-and-layering.md +119 -0
  287. package/frontend/01-standards/i18n-and-localization.md +65 -0
  288. package/frontend/01-standards/nextjs-complete.md +451 -0
  289. package/frontend/01-standards/react-complete.md +713 -0
  290. package/frontend/01-standards/react-hooks-complete-guide.md +1100 -0
  291. package/frontend/01-standards/react-hooks-complete.md +1171 -0
  292. package/frontend/01-standards/seo-and-web-vitals.md +77 -0
  293. package/frontend/01-standards/state-management-complete.md +444 -0
  294. package/frontend/01-standards/vue-complete.md +499 -0
  295. package/frontend/01-standards/vue3-complete.md +2002 -0
  296. package/frontend/01-standards/web-framework-best-practices.md +64 -0
  297. package/frontend/01-standards/web-performance-complete.md +495 -0
  298. package/frontend/02-playbooks/accessibility-a11y-playbook.md +161 -0
  299. package/frontend/02-playbooks/frontend-performance-playbook.md +707 -0
  300. package/frontend/02-playbooks/i18n-internationalization-playbook.md +120 -0
  301. package/frontend/02-playbooks/performance-optimization-playbook.md +163 -0
  302. package/frontend/02-playbooks/react-nextjs-production-playbook.md +167 -0
  303. package/frontend/02-playbooks/react-state-management-playbook.md +173 -0
  304. package/frontend/03-checklists/component-quality-checklist.md +166 -0
  305. package/frontend/03-checklists/frontend-launch-checklist.md +299 -0
  306. package/frontend/04-antipatterns/frontend-antipatterns.md +886 -0
  307. package/frontend/05-cases/case-performance-optimization.md +274 -0
  308. package/harmony/01-standards/harmonyos-arkts-standard.md +75 -0
  309. package/harmony/01-standards/harmonyos-design.md +65 -0
  310. package/high-quality-engineering-playbook.md +54 -0
  311. package/incident/01-standards/incident-response-complete.md +303 -0
  312. package/incident/02-playbooks/chaos-engineering-playbook.md +883 -0
  313. package/incident/02-playbooks/postmortem-playbook.md +398 -0
  314. package/incident/03-checklists/incident-readiness-checklist.md +181 -0
  315. package/incident/04-antipatterns/incident-antipatterns.md +490 -0
  316. package/incident/05-cases/case-cascade-failure.md +176 -0
  317. package/incident/06-glossary/incident-glossary.md +114 -0
  318. package/incident/postmortem-and-response-deep-dive.md +39 -0
  319. package/industries/ecommerce/ecommerce-complete.md +631 -0
  320. package/industries/education/education-complete.md +555 -0
  321. package/industries/fintech/fintech-complete.md +501 -0
  322. package/industries/gaming/gaming-complete.md +587 -0
  323. package/industries/healthcare/healthcare-complete.md +452 -0
  324. package/low-code/01-standards/low-code-complete.md +944 -0
  325. package/miniprogram/01-standards/ai-common-mistakes.md +61 -0
  326. package/miniprogram/01-standards/miniprogram-custom-navbar-capsule.md +77 -0
  327. package/miniprogram/01-standards/miniprogram-design.md +61 -0
  328. package/miniprogram/01-standards/miniprogram-standard.md +81 -0
  329. package/mobile/01-standards/android-material-design.md +70 -0
  330. package/mobile/01-standards/flutter-complete.md +384 -0
  331. package/mobile/01-standards/ios-design-hig.md +78 -0
  332. package/mobile/01-standards/mobile-app-standard.md +85 -0
  333. package/mobile/01-standards/react-native-complete.md +352 -0
  334. package/mobile/02-playbooks/mobile-cross-platform-playbook.md +175 -0
  335. package/mobile/02-playbooks/mobile-performance.md +473 -0
  336. package/mobile/03-checklists/mobile-release-checklist.md +234 -0
  337. package/mobile/04-antipatterns/mobile-antipatterns.md +798 -0
  338. package/mobile/05-cases/case-app-performance.md +500 -0
  339. package/mobile/05-cases/case-app-startup-optimization.md +218 -0
  340. package/mobile/06-glossary/mobile-glossary.md +484 -0
  341. package/observability/01-standards/observability-standards.md +103 -0
  342. package/observability/02-playbooks/prometheus-grafana-playbook.md +135 -0
  343. package/observability/02-playbooks/structured-logging-playbook.md +73 -0
  344. package/observability/03-checklists/observability-checklist.md +54 -0
  345. package/observability/04-antipatterns/observability-antipatterns.md +106 -0
  346. package/operations/01-standards/prometheus-monitoring-complete.md +1578 -0
  347. package/operations/02-playbooks/capacity-planning-playbook.md +620 -0
  348. package/operations/03-checklists/production-launch-checklist.md +365 -0
  349. package/operations/04-antipatterns/operations-antipatterns.md +664 -0
  350. package/operations/05-cases/case-sre-practices.md +581 -0
  351. package/operations/06-glossary/operations-glossary.md +120 -0
  352. package/operations/aiops-anomaly-detection.md +758 -0
  353. package/operations/capacity-planning.md +1061 -0
  354. package/operations/chaos-engineering.md +659 -0
  355. package/operations/incident-command-system.md +38 -0
  356. package/operations/observability-complete.md +442 -0
  357. package/operations/slo-sli-playbook.md +517 -0
  358. package/operations/sre-operations-deep-dive.md +39 -0
  359. package/package.json +8 -0
  360. package/performance/01-standards/performance-and-scalability.md +80 -0
  361. package/performance/01-standards/performance-standards.md +156 -0
  362. package/performance/02-playbooks/query-optimization-playbook.md +103 -0
  363. package/performance/03-checklists/performance-checklist.md +56 -0
  364. package/performance/04-antipatterns/performance-antipatterns.md +146 -0
  365. package/product/01-standards/product-management-complete.md +285 -0
  366. package/product/02-playbooks/feature-launch-playbook.md +207 -0
  367. package/product/02-playbooks/user-research-playbook.md +532 -0
  368. package/product/03-checklists/feature-launch-checklist.md +275 -0
  369. package/product/04-antipatterns/product-antipatterns.md +355 -0
  370. package/product/05-cases/case-mvp-to-scale.md +384 -0
  371. package/product/06-glossary/product-glossary.md +462 -0
  372. package/product/feature-prioritization-framework.md +40 -0
  373. package/product/kpi-and-metric-tree.md +37 -0
  374. package/product/product-discovery-and-prd-deep-dive.md +41 -0
  375. package/quantum/01-standards/quantum-complete.md +1186 -0
  376. package/security/01-standards/api-security-complete.md +511 -0
  377. package/security/01-standards/container-runtime-security.md +574 -0
  378. package/security/01-standards/data-protection-gdpr.md +543 -0
  379. package/security/01-standards/owasp-top10-complete.md +1890 -0
  380. package/security/01-standards/secure-coding-baseline.md +90 -0
  381. package/security/01-standards/supply-chain-security.md +441 -0
  382. package/security/01-standards/web-security-checklist.md +108 -0
  383. package/security/01-standards/zero-trust-architecture.md +521 -0
  384. package/security/02-playbooks/auth-sso-playbook.md +166 -0
  385. package/security/02-playbooks/incident-response-security-playbook.md +588 -0
  386. package/security/02-playbooks/owasp-api-security-playbook.md +129 -0
  387. package/security/02-playbooks/payment-integration-playbook.md +119 -0
  388. package/security/02-playbooks/penetration-testing-playbook.md +517 -0
  389. package/security/03-checklists/security-audit-checklist.md +356 -0
  390. package/security/04-antipatterns/security-coding-antipatterns.md +580 -0
  391. package/security/05-cases/case-log4shell-incident.md +537 -0
  392. package/security/05-cases/case-major-breaches.md +468 -0
  393. package/security/06-glossary/security-glossary.md +212 -0
  394. package/security/compliance-automation.md +993 -0
  395. package/security/container-security.md +680 -0
  396. package/security/devsecops-complete.md +426 -0
  397. package/security/sast-dast-sca.md +775 -0
  398. package/security/secrets-management.md +594 -0
  399. package/security/security-architecture-deep-dive.md +37 -0
  400. package/security/threat-modeling-stride-playbook.md +40 -0
  401. package/seed-templates/auth-system.md +59 -0
  402. package/seed-templates/blog-content.md +94 -0
  403. package/seed-templates/dashboard.md +89 -0
  404. package/seed-templates/docs-site.md +73 -0
  405. package/seed-templates/e-commerce.md +50 -0
  406. package/seed-templates/saas-landing.md +92 -0
  407. package/seed-templates/settings-page.md +51 -0
  408. package/testing/01-standards/test-strategy-and-layering.md +83 -0
  409. package/testing/01-standards/testing-strategy-complete.md +422 -0
  410. package/testing/01-standards/unit-testing-best-practices.md +118 -0
  411. package/testing/02-playbooks/e2e-testing-playbook.md +988 -0
  412. package/testing/02-playbooks/testing-strategy-playbook.md +126 -0
  413. package/testing/03-checklists/test-strategy-checklist.md +208 -0
  414. package/testing/04-antipatterns/testing-antipatterns.md +718 -0
  415. package/testing/05-cases/case-testing-transformation.md +300 -0
  416. package/testing/06-glossary/testing-glossary.md +110 -0
  417. package/testing/risk-based-test-matrix.md +36 -0
  418. package/testing/testing-strategy-deep-dive.md +37 -0
@@ -0,0 +1,517 @@
---
id: slo-sli-playbook
title: slo-sli-playbook
domain: operations
category: slo-sli-playbook.md
difficulty: intermediate
tags: [budget, operations, playbook, sli, slo, implementation-process, core-concepts, goals]
quality_score: 70
last_updated: 2026-06-15
---
# Author: Excellent (11964948@qq.com)
# Feature: SLO/SLI practical playbook
# Purpose: a complete implementation guide for Service Level Objectives (SLOs) and Service Level Indicators (SLIs)
# Created: 2026-03-20
# Last modified: 2026-03-20

## Goals
Establish an SLO-centered reliability engineering practice that:
- quantifies service quality,
- balances feature delivery against stability, and
- drives incident response and capacity planning with data.

## Scope
- All user-facing production services
- Critical internal dependencies (databases, message queues, caches)
- Third-party service integrations (payment gateways, SMS services, cloud services)

## Core concepts

### SLI (Service Level Indicator)
**Definition**: a metric that quantifies how the service behaves and reflects its quality.

**SLI types**:
1. **Availability**
   - Definition: the proportion of requests the service handles successfully
   - Calculation: successful requests / total requests
   - Example: 99.9% availability = up to 0.1% of requests may fail

2. **Latency**
   - Definition: request response time
   - Calculation: P50/P95/P99 percentiles
   - Example: P95 < 200ms, P99 < 500ms

3. **Throughput**
   - Definition: requests handled per unit of time
   - Calculation: QPS/RPS
   - Example: 1000 QPS

4. **Error rate**
   - Definition: the proportion of requests that fail
   - Calculation: failed requests / total requests
   - Example: error rate < 0.1%

5. **Data quality**
   - Definition: data accuracy/completeness
   - Calculation: correct records / total records
   - Example: 99.99% accuracy of order amounts
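The sketch below illustrates the calculations listed above. It computes the availability, error-rate and latency-percentile SLIs from a batch of request records. It rests on a few assumptions:
- The `Request` type and the helper names are hypothetical.
- Only 5xx responses count as failures, which matches the `status!~"5.."` filter in the PromQL queries later in this playbook.
- Percentiles use the nearest-rank method.

Real deployments usually compute these SLIs from metrics rather than from raw records.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    status: int        # HTTP status code
    latency_ms: float  # end-to-end response time in milliseconds


def availability(requests: list[Request]) -> float:
    """Successful requests / total requests; only 5xx counts as a failure (assumption)."""
    ok = sum(1 for r in requests if not 500 <= r.status <= 599)
    return ok / len(requests)


def error_rate(requests: list[Request]) -> float:
    """Failed requests / total requests (the complement of availability here)."""
    return 1.0 - availability(requests)


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of samples <= it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    # 1000 requests: 999 succeed, 1 returns 503; latencies are 1..1000 ms
    reqs = [Request(200, float(i)) for i in range(1, 1000)] + [Request(503, 1000.0)]
    print(f"availability = {availability(reqs):.3%}")  # 99.900%
    print(f"error rate   = {error_rate(reqs):.3%}")    # 0.100%
    lat = [r.latency_ms for r in reqs]
    print(f"P95 = {percentile(lat, 95)} ms, P99 = {percentile(lat, 99)} ms")  # P95 = 950.0 ms, P99 = 990.0 ms
```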

### SLO (Service Level Objective)
**Definition**: the target value for an SLI; it defines the minimum acceptable behavior of the service.

**SLO design principles**:
- Measurable: built on a reliable SLI
- Achievable: accounts for historical performance and team capacity
- User-centric: reflects what users actually experience
- Incremental: raise targets step by step to avoid overcommitting

**SLO example**:
```yaml
# E-commerce order service SLO
availability:
  target: 99.9%
  window: 30d
  description: "Order service availability >= 99.9% over 30 days"

latency:
  - target: 200ms
    percentile: 95
    window: 5m
    description: "P95 latency < 200ms (5-minute window)"
  - target: 500ms
    percentile: 99
    window: 5m
    description: "P99 latency < 500ms (5-minute window)"

throughput:
  target: 1000 QPS
  window: 1m
  description: "Throughput >= 1000 QPS (1-minute window)"

error_rate:
  target: 0.1%
  window: 5m
  description: "Error rate < 0.1% (5-minute window)"
```
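The YAML above is declarative. One way to act on it is to compare each measured SLI with its target in the right direction: `>=` for availability and throughput, `<` for latency and error rate.

This is a minimal sketch with some assumptions:
- Each measurement has already been aggregated over its objective's own window. The sketch does not handle the differing 30d/5m/1m windows itself.
- The `Objective` type, field names and sample measurements are hypothetical.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Objective:
    name: str
    target: float
    compare: Callable[[float, float], bool]  # (measured, target) -> compliant?


# Targets transcribed from the order-service SLO example: ratios for
# availability and error rate, milliseconds for latency, QPS for throughput.
ORDER_SERVICE_SLO = [
    Objective("availability", 0.999, operator.ge),   # >= 99.9%
    Objective("latency_p95_ms", 200, operator.lt),   # P95 < 200ms
    Objective("latency_p99_ms", 500, operator.lt),   # P99 < 500ms
    Objective("throughput_qps", 1000, operator.ge),  # >= 1000 QPS
    Objective("error_rate", 0.001, operator.lt),     # < 0.1%
]


def evaluate(measured: dict[str, float], slo: list[Objective]) -> dict[str, bool]:
    """Return per-objective compliance for already-aggregated measurements."""
    return {o.name: o.compare(measured[o.name], o.target) for o in slo}


if __name__ == "__main__":
    window = {
        "availability": 0.9993,
        "latency_p95_ms": 185,
        "latency_p99_ms": 620,
        "throughput_qps": 1200,
        "error_rate": 0.0007,
    }
    for name, ok in evaluate(window, ORDER_SERVICE_SLO).items():
        print(f"{name:15} {'OK' if ok else 'VIOLATED'}")
```

With the sample values, only the P99 latency objective is violated.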
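
### SLA (Service Level Agreement)
**Definition**: a formal agreement with customers that requires compensation when it is violated.

**How the SLA relates to the SLO**:
- The SLA is an external commitment; the SLO is an internal target
- The SLO should be stricter than the SLA, leaving a buffer
- Recommendation: SLO = SLA + safety margin (e.g. SLA 99.9% -> SLO 99.95%)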
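Converting both targets into a downtime allowance shows what the safety margin buys. This small sketch assumes a 30-day window, as in the examples in this playbook, and counts only full outages.

```python
WINDOW_MINUTES = 30 * 24 * 60  # 30-day window = 43,200 minutes


def downtime_allowance(target: float, window_minutes: int = WINDOW_MINUTES) -> float:
    """Minutes of full outage the window tolerates while still meeting `target`."""
    return (1.0 - target) * window_minutes


if __name__ == "__main__":
    sla = downtime_allowance(0.999)   # external SLA: 99.9%
    slo = downtime_allowance(0.9995)  # internal SLO: 99.95%
    print(f"SLA allowance: {sla:.1f} min")                   # 43.2 min
    print(f"SLO allowance: {slo:.1f} min")                   # 21.6 min
    print(f"Margin before SLA breach: {sla - slo:.1f} min")  # 21.6 min
```

With these targets, the team can exhaust the internal SLO budget and still have about 21.6 more minutes of full outage before the external SLA is breached.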

**SLA example**:
```yaml
# E-commerce order service SLA
availability:
  target: 99.9%
  window: 30d
  penalty:
    - range: "99.0% - 99.9%"
      compensation: "10% service fee credit"
    - range: "95.0% - 99.0%"
      compensation: "25% service fee credit"
    - range: "< 95.0%"
      compensation: "100% service fee credit"
```
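The penalty table can be encoded as an ordered tier lookup. Availability is expressed in percent.

The example ranges share their boundaries; for instance, 99.0% appears in two tiers. This sketch resolves that by treating each lower bound as inclusive, which is an assumption rather than something the example specifies.

```python
# Tiers from the SLA example as (lower bound in %, inclusive by assumption; credit %).
CREDIT_TIERS = [
    (99.9, 0),   # target met: no credit
    (99.0, 10),
    (95.0, 25),
]


def service_credit_percent(measured_availability: float) -> int:
    """Return the fee credit owed for a 30-day measured availability given in %."""
    for lower_bound, credit in CREDIT_TIERS:
        if measured_availability >= lower_bound:
            return credit
    return 100  # below 95.0%: full credit


if __name__ == "__main__":
    for measured in (99.95, 99.5, 97.0, 94.0):
        print(f"{measured}% -> {service_credit_percent(measured)}% credit")
```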
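
## SLO implementation process

### Step 1: Identify critical services
**Assessment dimensions**:
- User visibility: does it directly affect the user experience?
- Revenue impact: does it affect core business revenue?
- Dependencies: do multiple services depend on it?
- Failure impact: how large is the blast radius of a failure?

**Priority matrix**:
| Service | User visibility | Revenue impact | Dependency level | Priority |
|---------|-----------------|----------------|------------------|----------|
| Order service | High | High | High | P0 |
| Payment service | High | High | Medium | P0 |
| Search service | High | Medium | Medium | P1 |
| Notification service | Medium | Low | Low | P2 |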
133
+
134
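+ ### Step 2: Define SLIs
+ **User journey analysis** (journey stage and the SLI that represents it):
+ ```
+ Search         -> View details   -> Add to cart    -> Place order    -> Pay            -> Fulfillment
+ ↓                 ↓                 ↓                 ↓                 ↓                 ↓
+ search latency    page load time    response time     availability      success rate      delivery time
+ ```
+
+ **SLI selection checklist**:
+ - [ ] Availability SLI: request success rate
+ - [ ] Latency SLI: response-time percentiles
+ - [ ] Throughput SLI: QPS/RPS
+ - [ ] Error-rate SLI: fraction of failed requests
+ - [ ] Data-quality SLI: data accuracy (where applicable)
+ - [ ] Custom SLI: business-specific metrics (e.g., order fulfillment time)
+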
+ ### Step 3: Set SLO targets
+ **Start from historical data**:
+ ```promql
+ # Availability over the past 30 days (%)
+ sum(rate(http_requests_total{status!~"5.."}[30d])) /
+ sum(rate(http_requests_total[30d])) * 100
+
+ # P95 latency over the past 30 days
+ histogram_quantile(0.95,
+   sum(rate(http_request_duration_seconds_bucket[30d])) by (le)
+ )
+ ```
+
+ **Target-setting strategies**:
+ - Aggressive: the historical best, improved by a further 10% (e.g., 10% lower P95 latency than the best month)
+ - Conservative: the historical median
+ - Incremental: start from the conservative target and tighten it every quarter
+
+ **Example SLO table**:
+ | Service | SLI | Current | SLO target | SLA commitment |
+ |---------|-----|---------|------------|----------------|
+ | Order service | Availability | 99.85% | 99.9% | 99.5% |
+ | Order service | P95 latency | 180ms | 200ms | 300ms |
+ | Payment service | Success rate | 99.92% | 99.95% | 99.5% |
+ | Search service | P95 latency | 250ms | 300ms | 500ms |
+
+ ### Step 4: Implement monitoring
+ **SLO monitoring architecture**:
+ ```
+ Prometheus (metric collection)
+   -> Recording rules (precomputed SLI ratios)
+   -> Alertmanager (alerting)
+   -> Grafana (visualization)
+ ```
+
+ **Recording rules example**:
+ ```yaml
+ # slo-recording-rules.yaml
+ groups:
+   - name: slo_availability
+     rules:
+       - record: slo:availability:ratio
+         expr: |
+           sum(rate(http_requests_total{status!~"5.."}[5m])) by (service) /
+           sum(rate(http_requests_total[5m])) by (service)
+
+       # 1h and 1d ratios are used by the 1h alert and the error budget dashboard
+       - record: slo:availability:ratio:1h
+         expr: |
+           sum(rate(http_requests_total{status!~"5.."}[1h])) by (service) /
+           sum(rate(http_requests_total[1h])) by (service)
+
+       - record: slo:availability:ratio:1d
+         expr: |
+           sum(rate(http_requests_total{status!~"5.."}[1d])) by (service) /
+           sum(rate(http_requests_total[1d])) by (service)
+
+       - record: slo:availability:ratio:30d
+         expr: |
+           sum(rate(http_requests_total{status!~"5.."}[30d])) by (service) /
+           sum(rate(http_requests_total[30d])) by (service)
+
+   - name: slo_latency
+     rules:
+       - record: slo:latency:p95
+         expr: |
+           histogram_quantile(0.95,
+             sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
+           )
+
+       - record: slo:latency:p99
+         expr: |
+           histogram_quantile(0.99,
+             sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service)
+           )
+ ```
+
+ **Grafana dashboard example** (schematic panel definitions, not Grafana's JSON model):
+ ```yaml
+ # SLO overview dashboard
+ panels:
+   - title: "Availability SLO (30 days)"
+     type: gauge
+     targets:
+       - expr: slo:availability:ratio:30d{service="order-service"} * 100
+     # Each threshold step applies from its value upward
+     thresholds:
+       - value: 0
+         color: red
+       - value: 99.0
+         color: yellow
+       - value: 99.9
+         color: green
+
+   - title: "P95 latency SLO (5 minutes)"
+     type: timeseries
+     targets:
+       - expr: slo:latency:p95{service="order-service"} * 1000  # seconds -> ms
+     # Lower is better: green below the 200ms SLO, yellow up to the 300ms SLA, red beyond
+     thresholds:
+       - value: 0
+         color: green
+       - value: 200
+         color: yellow
+       - value: 300
+         color: red
+ ```
+
+ ### Step 5: Configure alerts
+ **Error budget alerts**:
+ ```yaml
+ # Error budget burn-rate alert: the 1h burn rate exceeds 14.4x the sustainable rate,
+ # i.e., about 2% of the 30-day budget is being spent per hour
+ - alert: ErrorBudgetBurningFast
+   expr: |
+     (
+       1 - slo:availability:ratio:1h{service="order-service"}
+     ) / (1 - 0.999) > 14.4
+   for: 5m
+   labels:
+     severity: P1
+   annotations:
+     summary: "Order service is burning its error budget too fast"
+     description: "1h burn rate is {{ $value }}x the sustainable rate (threshold: 14.4x)"
+
+ # Availability SLO breach alert
+ - alert: AvailabilitySLOBreach
+   expr: slo:availability:ratio:30d{service="order-service"} < 0.999
+   for: 5m
+   labels:
+     severity: P0
+   annotations:
+     summary: "Order service availability SLO breached"
+     description: "30-day availability {{ $value | humanizePercentage }} is below the 99.9% target"
+ ```
+
+ **Multi-window alerting strategy**:
+ ```yaml
+ # Short window (fast detection)
+ - alert: HighErrorRate_5m
+   expr: slo:availability:ratio{service="order-service"} < 0.99
+   for: 5m
+   labels:
+     severity: P1
+
+ # Medium window (sustained problems)
+ - alert: HighErrorRate_1h
+   expr: slo:availability:ratio:1h{service="order-service"} < 0.995
+   for: 1h
+   labels:
+     severity: P1
+
+ # Long window (SLO breach)
+ - alert: HighErrorRate_30d
+   expr: slo:availability:ratio:30d{service="order-service"} < 0.999
+   for: 5m
+   labels:
+     severity: P0
+ ```
+
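Burn rate is the error rate divided by the error budget: at 1.0 the budget runs out exactly at the end of the 30-day window. Single-threshold rules either react late (30-day window) or fire on short blips (5-minute window); the multiwindow, multi-burn-rate approach described in Google's SRE Workbook requires a long and a short window to both exceed a threshold. A Python sketch of that evaluation logic; the window names and the `availability_by_window` input are assumptions, and in Prometheus each window would be its own recording rule.

```python
SLO = 0.999  # 30-day availability target

# (long window, short window, burn-rate threshold, action), following the
# multiwindow, multi-burn-rate recommendation for a 30-day SLO.
POLICIES = [
    ("1h", "5m", 14.4, "page"),   # 2% of the budget spent in 1 hour
    ("6h", "30m", 6.0, "page"),   # 5% of the budget spent in 6 hours
    ("3d", "6h", 1.0, "ticket"),  # 10% of the budget spent in 3 days
]


def burn_rate(availability: float, slo: float = SLO) -> float:
    """1.0 means the budget would be used up exactly at the end of the SLO window."""
    return (1 - availability) / (1 - slo)


def fired_alerts(availability_by_window: dict[str, float], slo: float = SLO) -> list[str]:
    """Fire only if BOTH windows exceed the threshold: the long window shows the burn
    is significant, the short window shows it is still happening."""
    fired = []
    for long_w, short_w, threshold, action in POLICIES:
        if (burn_rate(availability_by_window[long_w], slo) > threshold
                and burn_rate(availability_by_window[short_w], slo) > threshold):
            fired.append(f"{action}:{long_w}/{short_w}")
    return fired
```

The short window also makes the alert resolve quickly once the problem is fixed, even though the long window still remembers the earlier errors.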
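+ ## Error Budget
+
+ ### Concept
+ **Definition**: the maximum amount of failure (downtime or failed requests) allowed before the SLO is breached
+
+ **Formulas**:
+ ```
+ error budget = 1 - SLO target
+ monthly error budget (minutes) = (1 - SLO) * 30 days * 24 hours * 60 minutes
+ ```
+
+ **Example**:
+ ```
+ SLO = 99.9%
+ error budget = 1 - 99.9% = 0.1%
+ monthly error budget = 0.1% * 30 * 24 * 60 = 43.2 minutes
+ ```
+ For a request-based SLI such as availability, the budget is really 0.1% of all requests; 43.2 minutes is the equivalent duration of a complete outage.
+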
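The budget arithmetic is easy to script when comparing candidate SLOs. A small sketch with illustrative function names; for request-based SLIs the count of allowed failed requests is the more faithful number.

```python
MINUTES_PER_DAY = 24 * 60


def error_budget(slo: float) -> float:
    """Fraction of requests (or time) allowed to fail."""
    return 1 - slo


def budget_minutes(slo: float, days: int = 30) -> float:
    """Equivalent full-outage minutes in a window of `days` days."""
    return error_budget(slo) * days * MINUTES_PER_DAY


def allowed_failed_requests(slo: float, total_requests: int) -> float:
    """For a request-based SLI the budget is a request count, not minutes."""
    return error_budget(slo) * total_requests


if __name__ == "__main__":
    # Compare the monthly budget of several candidate SLOs.
    for slo in (0.99, 0.995, 0.999, 0.9995, 0.9999):
        print(f"SLO {slo:.2%}: {budget_minutes(slo):.1f} min per 30 days")
```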
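+ ### Error budget policy
+ **Where it applies**:
+ 1. **Release decisions**:
+    - Ample budget (> 50% remaining): feature releases can speed up
+    - Tight budget (< 20% remaining): pause non-urgent releases and prioritize stability fixes
+    - Budget exhausted (<= 0% remaining): freeze all releases and focus on restoring the SLO
+
+ 2. **Incident priority**:
+    - Ample budget: P1 incident
+    - Tight budget: P0 incident (burning > 10% of the budget per day)
+    - Budget exhausted: P0 incident plus a postmortem with improvement actions
+
+ 3. **Capacity planning**:
+    - Budget burning fast: scale out or optimize performance
+    - Budget burning slowly: scale-out can be deferred to save cost
+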
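The release and incident rules above reduce to a few threshold checks. A sketch assuming the remaining budget is expressed as a fraction (1.0 = untouched) and the daily burn as the fraction of the whole budget spent in the last day. The policy does not cover the 20-50% band or a tight budget burning at 10% per day or less, so this sketch treats them as "normal" and P1 respectively (assumptions).

```python
def release_policy(budget_remaining: float) -> str:
    """Map the remaining error budget (1.0 = untouched, <= 0 = exhausted) to a release decision."""
    if budget_remaining <= 0:
        return "freeze"          # freeze all releases, restore the SLO
    if budget_remaining < 0.20:
        return "stability-only"  # pause non-urgent releases
    if budget_remaining > 0.50:
        return "accelerate"      # room to ship faster
    return "normal"              # 20-50%: not specified by the policy (assumption)


def incident_priority(budget_remaining: float, daily_burn: float) -> str:
    """daily_burn: fraction of the whole budget consumed in the last day."""
    if budget_remaining <= 0:
        return "P0"  # exhausted: P0 plus a postmortem
    if budget_remaining < 0.20 and daily_burn > 0.10:
        return "P0"  # tight budget burning faster than 10% per day
    return "P1"
```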
+ **Error budget dashboard**:
+ ```yaml
+ panels:
+   - title: "Error budget remaining"
+     type: gauge
+     targets:
+       - expr: |
+           (1 - (1 - slo:availability:ratio:30d{service="order-service"}) / (1 - 0.999)) * 100
+     thresholds:
+       - value: 50
+         color: green
+       - value: 20
+         color: yellow
+       - value: 0
+         color: red
+
+   - title: "Error budget burn per day"
+     type: stat
+     targets:
+       # 1-day burn rate divided by the 30-day window = share of the monthly budget spent in the last day
+       - expr: |
+           (1 - slo:availability:ratio:1d{service="order-service"}) / (1 - 0.999) / 30 * 100
+     description: "Percentage of the 30-day error budget consumed per day (about 3.3% is sustainable)"
+ ```
+
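The two dashboard expressions, written out in Python to make the units explicit (function names are hypothetical). The remaining budget goes negative once the budget is overspent, and the daily figure is the 1-day burn rate divided by the 30-day window, so a sustainable pace shows about 3.3% per day.

```python
def budget_remaining_pct(availability_30d: float, slo: float = 0.999) -> float:
    """Gauge value: 100 = untouched budget, 0 = exhausted, negative = overspent."""
    return (1 - (1 - availability_30d) / (1 - slo)) * 100


def daily_budget_spend_pct(availability_1d: float, slo: float = 0.999, window_days: int = 30) -> float:
    """Percent of the whole 30-day budget spent in the last day.

    The 1-day burn rate (1 - availability) / (1 - slo) is 1.0 at the sustainable pace;
    dividing by the window length turns it into a per-day share of the budget.
    """
    return (1 - availability_1d) / (1 - slo) / window_days * 100
```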
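+ ## SLO Review and Evolution
+
+ ### Regular review (monthly)
+ **Agenda**:
+ 1. Review of SLO attainment
+    - Which SLOs were met?
+    - Which SLOs were breached?
+    - Root-cause analysis of the breaches
+
+ 2. Error budget usage
+    - Burn rate
+    - Main sources of consumption (incidents, releases, maintenance)
+    - Budget forecast
+
+ 3. SLO fitness assessment
+    - Is the current SLO reasonable?
+    - Does the target need adjusting?
+    - What does user feedback say?
+
+ 4. Improvement plan
+    - Stability improvements
+    - Monitoring gaps to close
+    - Team training
+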
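+ **Review report template**:
+ ```markdown
+ # SLO Monthly Review Report - YYYY-MM
+
+ ## SLO attainment
+ | Service | SLI | Target | Actual | Status |
+ |---------|-----|--------|--------|--------|
+ | Order service | Availability | 99.9% | 99.98% | Met |
+ | Order service | P95 latency | 200ms | 180ms | Met |
+
+ ## Error budget analysis
+ - Budget at start of month: 43.2 minutes
+ - Consumed this month: 8.5 minutes (19.7%)
+ - Remaining: 34.7 minutes (80.3%)
+
+ ## Major incidents
+ 1. [Incident ID] Order service database connection pool exhausted (consumed 5 minutes of budget)
+ 2. [Incident ID] Payment gateway timeouts (consumed 2 minutes of budget)
+
+ ## Improvement actions
+ 1. Database connection pool tuning (priority: P1, owner: XXX, due: YYYY-MM-DD)
+ 2. Payment gateway circuit-breaker configuration (priority: P1, owner: XXX, due: YYYY-MM-DD)
+
+ ## Plan for next month
+ - SLO target adjustment: availability 99.9% -> 99.95% (incremental increase)
+ ```
+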
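+ ### SLO adjustment strategy
+ **Raising an SLO**:
+ - Condition: the current SLO has been met for 3 consecutive months
+ - Steps: raise it slightly (+0.05 to +0.1 percentage points), then observe for 1 month
+ - Risk: raising it too fast leads to frequent breaches
+
+ **Lowering an SLO**:
+ - Condition: the current SLO has been breached for 3 consecutive months
+ - Steps: talk to the business side and reassess user tolerance
+ - Risk: erodes user trust and damages the brand
+
+ **Adding an SLO**:
+ - When: a key feature is added or a new service launches
+ - Steps: run it informally for 1 month, then set the formal target from the collected data
+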
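The adjustment rules can be encoded as a small decision helper. A sketch assuming a month-by-month record (oldest first) of whether the SLO was met; "review-lower" only flags the conversation with the business side and never lowers a target automatically. Function names are hypothetical.

```python
def recommend_adjustment(monthly_met: list[bool]) -> str:
    """monthly_met: oldest-to-newest record of whether the SLO was met each month."""
    last3 = monthly_met[-3:]
    if len(last3) < 3:
        return "keep"          # not enough history to justify a change
    if all(last3):
        return "raise"         # met 3 months in a row
    if not any(last3):
        return "review-lower"  # breached 3 months in a row: renegotiate with the business
    return "keep"


def raised_target(current_pct: float, step_pct: float = 0.05) -> float:
    """Raise by a small step (0.05-0.1 percentage points), then observe for a month."""
    if not 0.05 <= step_pct <= 0.1:
        raise ValueError("step must be 0.05-0.1 percentage points")
    return round(current_pct + step_pct, 4)
```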
+ ## SLO Tools and Platforms
+
+ ### Open-source tools
+ **SLO frameworks**:
+ - Pyrra: Kubernetes-native SLO management https://pyrra.dev/
+ - Sloth: SLO generator for Prometheus https://slok.github.io/sloth/
+ - OpenSLO: an open SLO specification https://openslo.com/
+
+ **Example: Pyrra SLO configuration**:
+ ```yaml
+ apiVersion: pyrra.dev/v1alpha1
+ kind: ServiceLevelObjective
+ metadata:
+   name: order-service-availability
+   namespace: monitoring
+ spec:
+   target: "99.9"
+   window: 30d
+   indicator:
+     ratio:
+       errors:
+         metric: http_requests_total{service="order-service",status=~"5.."}
+       total:
+         metric: http_requests_total{service="order-service"}
+   # Verify the alerting fields against the Pyrra CRD version you deploy
+   alerting:
+     name: OrderServiceAvailability
+     labels:
+       severity: critical
+ ```
+
+ ### Commercial platforms
+ - Datadog SLO Monitor
+ - New Relic Service Level Management
+ - Google Cloud SLO Monitoring
+ - AWS CloudWatch ServiceLens
+
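+ ## Common Failure Modes
+
+ ### 1. SLO design problems
+ - **SLO too loose**: a 99.9% availability SLO is met, yet users are already complaining (they actually need 99.95%)
+ - **SLO too strict**: 99.999% availability costs too much and adds no business value
+ - **Missing key SLIs**: only availability is monitored; latency and data quality are ignored
+ - **SLO disconnected from the business**: technical SLOs are met while business KPIs decline
+
+ ### 2. Monitoring implementation problems
+ - **Inaccurate metric collection**: key paths lack instrumentation, so SLO calculations are distorted
+ - **Poorly chosen windows**: short windows make alerts noisy; long windows react slowly
+ - **No error budget**: SLOs exist without an error budget, so they cannot guide decisions
+ - **Unclear dashboards**: SLO information is buried among countless other metrics
+
+ ### 3. Organizational and process problems
+ - **SLOs never reviewed after creation**: they are never adjusted and drift away from real needs
+ - **No action on breaches**: repeated breaches without improvement actions make the SLO meaningless
+ - **No cross-team alignment**: service SLOs do not match, so a dependency's (downstream) SLO cannot support the SLO of the service that calls it (upstream)
+ - **SLOs tied to performance reviews inappropriately**: teams hide problems and game the metrics
+
+ ### 4. Technical debt problems
+ - **Legacy systems without monitoring**: SLOs cannot be computed, leaving user reports as the only signal
+ - **External dependencies without SLOs**: third-party failures hurt your SLO, with no SLA to hold the vendor to
+ - **Little automation**: SLO calculation, alerting, and dashboard upkeep are expensive to maintain
+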
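+ ## Acceptance Criteria
+
+ ### Functional acceptance
+ - [ ] SLOs defined for 100% of critical services (top 5)
+ - [ ] SLO dashboards deployed for 100% of them
+ - [ ] SLO alert rules configured for 100% of them
+ - [ ] Error budget computed and visualized
+ - [ ] SLO review process documented
+
+ ### Quality acceptance
+ - [ ] SLO attainment rate >= 90% (3-month observation period)
+ - [ ] False-positive rate < 5% (false alerts / total alerts)
+ - [ ] SLO breaches detected within 5 minutes
+ - [ ] SLO data accuracy >= 99% (compared against actual business data)
+
+ ### Operational acceptance
+ - [ ] Monthly SLO review meeting held
+ - [ ] SLO review report produced
+ - [ ] SLO training covers 100% of the team
+ - [ ] SLO documentation completeness >= 90%
+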
+ ## References
+
+ ### Classic books
+ - Google SRE Book - Chapter 4: Service Level Objectives
+ - The Site Reliability Workbook - Chapter 2: Implementing SLOs
+ - Building Secure and Reliable Systems - Chapter 8: Design for Resilience
+
+ ### Best practices
+ - The Art of SLOs (Google) https://sre.google/workbook/implementing-slos/
+ - SLO Adoption Framework (Datadog)
+ - Error Budget Best Practices (PagerDuty)
+
+ ### Tools and frameworks
+ - OpenSLO specification: https://github.com/OpenSLO/OpenSLO
+ - Pyrra: https://pyrra.dev/
+ - Sloth: https://github.com/slok/sloth
@@ -0,0 +1,39 @@
+ ---
+ id: sre-operations-deep-dive
+ title: sre-operations-deep-dive
+ domain: operations
+ category: sre-operations-deep-dive.md
+ difficulty: intermediate
+ tags: [deep, dive, operations, sre, 运维环节深度知识库]
+ quality_score: 70
+ last_updated: 2026-06-15
+ ---
+ # Developer: Excellent (11964948@qq.com)
+
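+ ## Operations Deep-Dive Knowledge Base
+
+ ### Goal
+ - Build a production operations system that is observable, responsive, and recoverable.
+
+ ### Observability
+ - Metrics: request volume, error rate, latency, resource utilization.
+ - Logs: structured logs, contextual fields, search conventions.
+ - Tracing: a unified TraceID and cross-service distributed tracing.
+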
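+ ### Alert governance
+ - Severity levels: critical, high, medium, low.
+ - Noise reduction: aggregation, inhibition, and silence windows.
+ - On-call rules: rotation, escalation paths, response-time limits.
+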
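The three noise-reduction techniques compose as successive filters: silence, then inhibit, then aggregate. A self-contained Python sketch with a hypothetical `Alert` record. In practice Prometheus Alertmanager provides these natively through `group_by`/`group_interval`, `inhibit_rules`, and silences, so treat this as an illustration of the semantics rather than something to deploy; the batch-wide inhibition is a deliberate simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    service: str
    severity: str  # "critical" | "high" | "medium" | "low"
    ts: float      # seconds


def notifications(alerts, group_window_s=300.0, silences=()):
    """Return the alerts that should actually notify on-call.

    silences: iterable of (service, start_ts, end_ts) maintenance windows.
    Simplification: inhibition looks at the whole batch, not at time overlap.
    """
    critical_services = {a.service for a in alerts if a.severity == "critical"}
    last_sent: dict[tuple[str, str], float] = {}
    out = []
    for a in sorted(alerts, key=lambda x: x.ts):
        # Silence: drop alerts for a service inside a maintenance window
        if any(svc == a.service and start <= a.ts < end for svc, start, end in silences):
            continue
        # Inhibit: a critical alert on a service suppresses its lower-severity alerts
        if a.severity != "critical" and a.service in critical_services:
            continue
        # Aggregate: at most one notification per (name, service) per window
        key = (a.name, a.service)
        if key in last_sent and a.ts - last_sent[key] < group_window_s:
            continue
        last_sent[key] = a.ts
        out.append(a)
    return out
```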
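+ ### Reliability management
+ - Tie release decisions to SLOs and error budgets.
+ - Set timeout, retry, circuit-breaker, and degradation policies for critical dependencies.
+ - Run backup-restore and disaster-recovery drills regularly.
+
+ ### Runbook requirements
+ - Every core service must have startup, health-check, rollback, and fault-diagnosis procedures.
+ - Update the runbook and monitoring thresholds after every major incident.
+
+ ### Common failure modes
+ - Too many alerts, none of them actionable.
+ - Metrics look healthy, but there is no monitoring of the business success rate.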
package/package.json ADDED
@@ -0,0 +1,8 @@
+ {
+ "name": "@umacloud/knowledge",
+ "version": "1.0.1",
+ "description": "UmaDev curated engineering knowledge corpus (standards, methodologies, expert playbooks, design systems, miniprogram/uniapp guides). Platform-independent data shipped once so npm users get the full KB offline.",
+ "license": "MIT",
+ "repository": { "type": "git", "url": "https://github.com/umacloud/umadev.git" },
+ "files": ["**/*.md"]
+ }
@@ -0,0 +1,80 @@
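+ ---
+ id: performance-and-scalability
+ title: Performance and Scalability Standard (Required Reading for Commercial-Grade Systems)
+ domain: performance
+ category: 01-standards
+ difficulty: intermediate
+ tags: [性能, performance, 可扩展, scalability, 缓存, cache, 异步, 连接池, 分页, n+1, 索引, cdn, 懒加载, 商业级]
+ quality_score: 95
+ last_updated: 2026-06-19
+ ---
+
+ # Performance and Scalability Standard (Required Reading for Commercial-Grade Systems)
+
+ > Commercial systems must withstand real traffic. Performance is not something to optimize after launch: obvious bottlenecks should be avoided while the code is being written. Do not optimize prematurely either. First avoid the known pitfalls listed in this standard, then let data drive targeted optimization.
+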
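+ ## 1. Golden Rules
+
+ - **Measure before optimizing**: locate bottlenecks with real metrics/profiling, not intuition. Premature optimization is waste.
+ - **Avoid known pitfalls** (these are mandatory and do not count as "premature optimization"): N+1 queries, full table scans without indexes, synchronous blocking, missing pagination, and repeated expensive computation without caching.
+ - Focus on **tail latency** (p95/p99) rather than averages, and on resource saturation points.
+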
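+ ## 2. Database Performance
+
+ - **Eliminate N+1**: for lists and associations, use eager loading (JOIN/include/selectinload) or batched queries instead of querying row by row in a loop.
+ - Add **indexes** on frequent query conditions, foreign keys, and sort columns; verify index use with `EXPLAIN`; select only the columns you need (no `SELECT *`).
+ - Lists must be **paginated with an upper limit**; for deep pagination, use a cursor instead of a large offset.
+ - Use a **connection pool** (with sensible min/max sizes) instead of opening a new connection per request.
+ - For read-heavy hotspots, consider read/write splitting with read replicas; use bulk insert/update for batch writes.
+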
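A sketch of several of these rules together, using the standard-library `sqlite3` module and a hypothetical `users`/`orders` schema: an indexed foreign key checked with `EXPLAIN QUERY PLAN`, a JOIN instead of per-row lookups, explicit columns, a capped page size, and keyset (cursor) pagination. Connection pooling is outside the scope of an in-memory SQLite example.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """In-memory demo schema: users and their orders (hypothetical tables)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            amount INTEGER NOT NULL
        );
        CREATE INDEX idx_orders_user_id ON orders(user_id);  -- index the foreign key
        """
    )
    conn.executemany("INSERT INTO users (id, name) VALUES (?, ?)",
                     [(i, f"user{i}") for i in range(1, 6)])
    conn.executemany("INSERT INTO orders (user_id, amount) VALUES (?, ?)",
                     [(1 + i % 5, 10 * i) for i in range(1, 21)])
    return conn


def page_orders(conn, after_id: int = 0, limit: int = 20, max_limit: int = 100):
    """One page of orders with their user names.

    - The JOIN loads each order's user in the same query (no per-row N+1 lookups).
    - Only the needed columns are selected (no SELECT *).
    - Keyset pagination: 'id > last seen id' seeks on the primary key, so deep pages
      cost roughly the same as the first, unlike a large OFFSET that skips rows.
    """
    limit = min(limit, max_limit)  # enforce an upper bound on page size
    rows = conn.execute(
        "SELECT o.id, o.amount, u.name FROM orders AS o JOIN users AS u ON u.id = o.user_id "
        "WHERE o.id > ? ORDER BY o.id LIMIT ?",
        (after_id, limit),
    ).fetchall()
    # A full page means there may be more; the caller passes this id back as the cursor.
    next_cursor = rows[-1][0] if len(rows) == limit else None
    return rows, next_cursor


def uses_index(conn, index_name: str = "idx_orders_user_id") -> bool:
    """Check with EXPLAIN QUERY PLAN that a lookup by user_id hits the index."""
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id, amount FROM orders WHERE user_id = ?", (1,)
    ).fetchall()
    return any(index_name in row[-1] for row in plan)
```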
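+ ## 3. Caching (Use Deliberately, Mind Invalidation)
+
+ - Cache layers: client/CDN → reverse proxy → in-process memory → distributed cache (Redis) → DB.
+ - Cache data that is expensive to produce and relatively stable (configuration, hot detail pages, aggregates), with a sensible TTL.
+ - Define an explicit **invalidation strategy**: invalidate or update on write so stale data is not served.
+ - Guard against **cache breakdown, penetration, and avalanche**: rebuild hot keys under a mutex, cache empty results, and add random jitter to TTLs.
+ - Do not cache data that requires strong consistency (balances, real-time inventory) without a fallback.
+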
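The cache-aside rules above (TTL, invalidate on write, and the three cache failure modes) in one in-process Python sketch. The class and parameter names are hypothetical, a plain dict stands in for Redis, and the injectable `clock` exists only to make expiry testable.

```python
import random
import threading
import time
from typing import Any, Callable, Optional

_MISSING = object()  # sentinel: "not cached", distinct from a cached None


class CacheAside:
    """In-process cache-aside sketch; in production the store would typically be Redis.

    - breakdown (hot key expires): a per-key lock lets only one caller rebuild it
    - penetration (key never exists): cache the 'not found' result (None) briefly
    - avalanche (many keys expire together): add random jitter to each TTL
    """

    def __init__(self, loader: Callable[[str], Optional[Any]], ttl_s: float = 60.0,
                 jitter_s: float = 10.0, null_ttl_s: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._loader = loader
        self._ttl_s, self._jitter_s, self._null_ttl_s = ttl_s, jitter_s, null_ttl_s
        self._clock = clock
        self._data: dict[str, tuple[Any, float]] = {}  # key -> (value, expires_at)
        self._locks: dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()

    def _lookup(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]
        return _MISSING

    def _lock_for(self, key: str):
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> Any:
        value = self._lookup(key)
        if value is not _MISSING:
            return value
        with self._lock_for(key):
            value = self._lookup(key)  # re-check: another thread may have rebuilt it
            if value is not _MISSING:
                return value
            value = self._loader(key)  # read from the source of truth (e.g. the DB)
            if value is None:
                ttl = self._null_ttl_s
            else:
                ttl = self._ttl_s + random.uniform(0, self._jitter_s)
            self._data[key] = (value, self._clock() + ttl)
            return value

    def invalidate(self, key: str) -> None:
        """Call after a successful DB write so the next read reloads fresh data."""
        self._data.pop(key, None)
```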
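+ ## 4. Async and Concurrency
+
+ - Use **async/non-blocking** I/O for I/O-bound work (async/await, event loops); do not block request threads waiting on external calls.
+ - Move slow tasks (sending email, generating reports, calling third parties) to a **background queue/job** so the request returns quickly instead of making the user wait.
+ - Control concurrency and backpressure: cap concurrency and use queues to absorb peaks; give external calls a timeout + retries (exponential backoff) + a circuit breaker.
+ - Run CPU-bound work in workers/thread pools so it does not block the event loop.
+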
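A sketch of "timeout + retry with exponential backoff + circuit breaker" for an async external call, using only `asyncio`. The breaker is deliberately minimal (consecutive-failure count, fixed cool-down, then trial calls); names and defaults are assumptions, and only timeouts and `ConnectionError` are treated as retryable.

```python
import asyncio
import random
import time


class CircuitOpenError(RuntimeError):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures, rejects calls for
    `reset_timeout_s`, then lets trial calls through (half-open)."""

    def __init__(self, failure_threshold: int = 5, reset_timeout_s: float = 30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None

    def allow(self) -> bool:
        if self._opened_at is None:
            return True
        return self._clock() - self._opened_at >= self.reset_timeout_s

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()  # (re)open the circuit


async def call_with_resilience(fn, breaker: CircuitBreaker, *, timeout_s: float = 1.0,
                               retries: int = 3, base_delay_s: float = 0.1,
                               max_delay_s: float = 2.0):
    """Call the coroutine function `fn` with a timeout, jittered backoff retries, and a breaker."""
    for attempt in range(retries + 1):
        if not breaker.allow():
            raise CircuitOpenError("circuit open; take the fallback path")
        try:
            result = await asyncio.wait_for(fn(), timeout=timeout_s)
        except (asyncio.TimeoutError, ConnectionError):
            breaker.record_failure()
            if attempt == retries:
                raise
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, delay))  # exponential backoff, full jitter
        else:
            breaker.record_success()
            return result
```

Retrying only transient errors matters: retrying a validation failure just multiplies load. When the breaker is open, the caller should serve degraded data (cached or default) rather than retry.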
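+ ## 5. Frontend Performance
+
+ - **Code splitting + lazy loading**: route-level lazy loading and on-demand loading of heavy components; keep first-screen JS small.
+ - Asset optimization: modern image formats (webp/avif) + responsive sizes + lazy loading; font subsetting; static assets served from a CDN with long-lived caching and content hashes.
+ - Reduce re-renders: stable keys, memo, virtualized rendering for long lists; avoid heavy computation during render.
+ - Use a data-fetching cache library (React Query) for server data to dedupe, cache, and refetch, avoiding duplicate requests.
+ - Watch Core Web Vitals (LCP/CLS/INP).
+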
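+ ## 6. Scalability (Friendly to Horizontal Scaling)
+
+ - Keep applications **stateless** (sessions/state in external storage) so they can scale horizontally across instances.
+ - Avoid single points of failure; critical paths must be able to scale horizontally.
+ - For long-lived connections/stateful services (WebSocket), use shared storage plus sticky sessions or gateway-based routing.
+ - Rate-limit to protect backends; degrade gracefully (return a fallback when a dependency fails instead of failing entirely).
+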
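Rate limiting and graceful degradation from the list above, sketched with a token bucket. This is a single-process, non-thread-safe illustration with hypothetical names; with multiple stateless instances, the bucket state would live in shared storage such as Redis.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained `rate_per_s` requests per second."""

    def __init__(self, rate_per_s: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate_per_s = rate_per_s
        self.capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def try_acquire(self, tokens: float = 1.0) -> bool:
        now = self._clock()
        # Refill in proportion to elapsed time, never above capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate_per_s)
        self._last = now
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False


def serve(bucket: TokenBucket, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
    """Rate-limit, then degrade gracefully instead of failing the whole request."""
    if not bucket.try_acquire():
        return fallback()  # shed load: e.g. HTTP 429 or a cached/default response
    try:
        return primary()
    except Exception:
        return fallback()  # dependency failed: serve the fallback instead of crashing
```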
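+ ## 7. Anti-patterns (Any One Is a Failing Grade)
+
+ - N+1 queries; frequent queries without indexes; `SELECT *`; lists that pull the whole table without pagination.
+ - A new DB connection per request (no connection pool).
+ - Request threads waiting synchronously on slow external calls; slow tasks not queued, leaving users waiting.
+ - Repeated expensive computation without caching; caches without an invalidation strategy, serving stale data.
+ - External calls without timeout/retry/circuit breaker, leading to cascading failures.
+ - Frontend loading all JS on the first screen, unoptimized large images, long lists not virtualized.
+ - Stateful applications that cannot scale horizontally.
+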
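+ ## 8. Minimum Delivery Checklist
+
+ - [ ] Lists have no N+1, are indexed, and are paginated with a limit; only needed columns are selected; a connection pool is used.
+ - [ ] Expensive, stable data is cached with explicit invalidation; breakdown/penetration/avalanche are guarded against.
+ - [ ] I/O is async and non-blocking; slow tasks go to a background queue; external calls have timeout + retry + circuit breaker.
+ - [ ] Frontend code splitting/lazy loading, image/font/static asset optimization, virtualized long lists, cached server data.
+ - [ ] Applications are stateless and horizontally scalable; rate limiting + graceful degradation are in place.
+ - [ ] p95/p99 and Core Web Vitals are tracked, and optimization is driven by measurement.
+
+ ---
+ **References**: Use The Index, Luke; the Cache-Aside pattern; 12-Factor (stateless processes); Circuit Breaker; Core Web Vitals; Google SRE.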