@umacloud/knowledge 1.0.1
- package/00-governance/governance-capabilities.md +557 -0
- package/00-governance/knowledge-map.md +39 -0
- package/00-governance/maintenance-policy.md +76 -0
- package/00-governance/review-checklist.md +81 -0
- package/README.md +13 -0
- package/ai/01-standards/agent-development-complete.md +691 -0
- package/ai/01-standards/llm-application-complete.md +488 -0
- package/ai/01-standards/mlops-complete.md +798 -0
- package/ai/01-standards/prompt-engineering-complete.md +646 -0
- package/ai/01-standards/rag-architecture-complete.md +649 -0
- package/ai/02-playbooks/llm-evaluation-playbook.md +847 -0
- package/ai/03-checklists/ai-project-checklist.md +215 -0
- package/ai/04-antipatterns/ai-antipatterns.md +661 -0
- package/ai/05-cases/case-rag-production.md +147 -0
- package/ai/06-glossary/ai-glossary.md +162 -0
- package/ai/agent-evaluation-benchmark.md +53 -0
- package/ai/ai-agent-memory-context-management.md +41 -0
- package/ai/ai-cost-capacity-optimization-playbook.md +42 -0
- package/ai/ai-data-security-and-compliance-playbook.md +37 -0
- package/ai/ai-domain-index-and-checklist.md +40 -0
- package/ai/ai-governance-maturity-model.md +50 -0
- package/ai/ai-model-selection-and-routing-strategy.md +47 -0
- package/ai/ai-observability-and-oncall-runbook.md +52 -0
- package/ai/ai-rag-engineering-playbook.md +42 -0
- package/ai/ai-red-team-and-safety-evaluation.md +42 -0
- package/ai/ai-release-readiness-and-rollback-gate.md +42 -0
- package/ai/llm-agent-engineering-deep-dive.md +57 -0
- package/ai/prompt-and-tool-guardrails.md +52 -0
- package/api/01-standards/enterprise-api-standards.md +198 -0
- package/api/01-standards/rest-api-design-guide.md +63 -0
- package/api/02-playbooks/api-pagination-playbook.md +93 -0
- package/api/02-playbooks/graphql-production-playbook.md +176 -0
- package/api/03-checklists/api-review-checklist.md +55 -0
- package/api/04-antipatterns/api-antipatterns.md +112 -0
- package/architecture/01-standards/api-gateway-patterns.md +496 -0
- package/architecture/01-standards/cloud-native-patterns.md +644 -0
- package/architecture/01-standards/distributed-systems-patterns.md +591 -0
- package/architecture/01-standards/event-driven-architecture.md +595 -0
- package/architecture/01-standards/microservices-patterns-complete.md +968 -0
- package/architecture/01-standards/microservices-patterns.md +495 -0
- package/architecture/01-standards/system-design-interview.md +664 -0
- package/architecture/02-playbooks/microservices-patterns-playbook.md +137 -0
- package/architecture/02-playbooks/migration-playbook.md +780 -0
- package/architecture/02-playbooks/system-design-playbook.md +779 -0
- package/architecture/03-checklists/architecture-decision-checklist.md +297 -0
- package/architecture/04-antipatterns/architecture-antipatterns.md +417 -0
- package/architecture/05-cases/case-netflix-microservices.md +413 -0
- package/architecture/06-glossary/architecture-glossary.md +164 -0
- package/architecture/adr-template-and-examples.md +38 -0
- package/architecture/api-gateway-deep-dive.md +1291 -0
- package/architecture/configuration-management.md +1162 -0
- package/architecture/distributed-transactions.md +1220 -0
- package/architecture/microservices-complete.md +735 -0
- package/architecture/resilience-and-disaster-patterns.md +37 -0
- package/architecture/service-governance.md +1198 -0
- package/architecture/system-architecture-deep-dive.md +37 -0
- package/backend/01-standards/analytics-and-growth.md +65 -0
- package/backend/01-standards/api-and-error-conventions.md +120 -0
- package/backend/01-standards/application-layering-and-packaging.md +160 -0
- package/backend/01-standards/auth-implementation.md +104 -0
- package/backend/01-standards/backend-framework-idioms.md +74 -0
- package/backend/01-standards/background-jobs-and-async.md +66 -0
- package/backend/01-standards/caching-strategies-complete.md +390 -0
- package/backend/01-standards/config-and-observability.md +77 -0
- package/backend/01-standards/data-modeling-and-persistence.md +94 -0
- package/backend/01-standards/django-complete.md +1765 -0
- package/backend/01-standards/email-and-notifications.md +64 -0
- package/backend/01-standards/fastapi-complete.md +925 -0
- package/backend/01-standards/file-upload-and-storage.md +66 -0
- package/backend/01-standards/graphql-api-complete.md +416 -0
- package/backend/01-standards/llm-application-standard.md +78 -0
- package/backend/01-standards/message-queue-patterns.md +379 -0
- package/backend/01-standards/microservices-and-distributed.md +78 -0
- package/backend/01-standards/nestjs-complete.md +2167 -0
- package/backend/01-standards/payment-integration.md +80 -0
- package/backend/01-standards/rate-limiting-complete.md +451 -0
- package/backend/01-standards/realtime-and-websocket.md +65 -0
- package/backend/01-standards/search-and-filtering.md +64 -0
- package/backend/01-standards/spring-boot-complete.md +445 -0
- package/backend/02-playbooks/api-design-playbook.md +718 -0
- package/backend/02-playbooks/email-send-playbook.md +130 -0
- package/backend/02-playbooks/file-upload-s3-playbook.md +153 -0
- package/backend/02-playbooks/typescript-enterprise-playbook.md +133 -0
- package/backend/02-playbooks/websocket-realtime-playbook.md +154 -0
- package/backend/03-checklists/api-launch-checklist.md +189 -0
- package/backend/04-antipatterns/backend-antipatterns.md +1051 -0
- package/blockchain/01-standards/blockchain-basics.md +557 -0
- package/blockchain/01-standards/smart-contract-development.md +1315 -0
- package/cicd/01-standards/deployment-and-delivery-standard.md +96 -0
- package/cicd/01-standards/github-actions-complete.md +473 -0
- package/cicd/01-standards/release-and-store-submission.md +75 -0
- package/cicd/02-playbooks/cicd-pipeline-playbook.md +144 -0
- package/cicd/02-playbooks/release-management-playbook.md +605 -0
- package/cicd/03-checklists/pipeline-security-checklist.md +168 -0
- package/cicd/04-antipatterns/cicd-antipatterns.md +589 -0
- package/cicd/05-cases/case-deployment-automation.md +221 -0
- package/cicd/05-cases/case-gitops-transformation.md +212 -0
- package/cicd/06-glossary/cicd-glossary.md +114 -0
- package/cicd/cicd-blueprint-deep-dive.md +38 -0
- package/cicd/release-readiness-gate.md +37 -0
- package/cloud-native/01-standards/container-security.md +741 -0
- package/cloud-native/01-standards/kubernetes-complete.md +812 -0
- package/cloud-native/02-playbooks/api-gateway-playbook.md +155 -0
- package/cloud-native/02-playbooks/gitops-with-argocd.md +760 -0
- package/cloud-native/02-playbooks/k8s-troubleshooting-playbook.md +1942 -0
- package/cloud-native/02-playbooks/message-queue-playbook.md +129 -0
- package/cloud-native/02-playbooks/multicloud-governance.md +726 -0
- package/cloud-native/02-playbooks/serverless-patterns.md +788 -0
- package/cloud-native/02-playbooks/service-mesh-playbook.md +612 -0
- package/cloud-native/02-playbooks/terraform-iac-playbook.md +143 -0
- package/cloud-native/03-checklists/container-security-checklist.md +431 -0
- package/cloud-native/03-checklists/k8s-production-readiness-checklist.md +460 -0
- package/cloud-native/04-antipatterns/container-antipatterns.md +660 -0
- package/cloud-native/04-antipatterns/k8s-antipatterns.md +743 -0
- package/cloud-native/05-cases/case-k8s-migration.md +478 -0
- package/cloud-native/05-cases/case-k8s-scaling.md +642 -0
- package/cloud-native/05-cases/case-k8s-security-incident.md +397 -0
- package/cloud-native/06-glossary/cloud-native-glossary.md +337 -0
- package/cross-platform/01-standards/cross-platform-frameworks.md +83 -0
- package/cross-platform/01-standards/platform-selection-and-architecture.md +77 -0
- package/data/01-standards/elasticsearch-complete.md +2098 -0
- package/data/01-standards/postgresql-complete.md +1613 -0
- package/data/01-standards/redis-complete.md +1527 -0
- package/data/02-playbooks/database-optimization-playbook.md +403 -0
- package/data/02-playbooks/elasticsearch-production-playbook.md +132 -0
- package/data/03-checklists/database-launch-checklist.md +187 -0
- package/data/04-antipatterns/database-antipatterns.md +873 -0
- package/data/05-cases/case-database-migration.md +310 -0
- package/data/06-glossary/database-glossary.md +440 -0
- package/data/data-governance-and-modeling-deep-dive.md +39 -0
- package/data-engineering/01-standards/airflow-complete.md +523 -0
- package/data-engineering/01-standards/kafka-complete.md +1521 -0
- package/data-engineering/02-playbooks/spark-etl-playbook.md +496 -0
- package/data-engineering/03-checklists/pipeline-launch-checklist.md +194 -0
- package/data-engineering/04-antipatterns/data-pipeline-antipatterns.md +684 -0
- package/data-engineering/05-cases/case-real-time-pipeline.md +355 -0
- package/data-engineering/06-glossary/data-engineering-glossary.md +429 -0
- package/database/01-standards/database-schema-standards.md +147 -0
- package/database/02-playbooks/postgresql-optimization-quick.md +52 -0
- package/database/02-playbooks/postgresql-performance-optimization.md +58 -0
- package/database/02-playbooks/postgresql-production-playbook.md +146 -0
- package/database/02-playbooks/redis-caching-playbook.md +117 -0
- package/database/03-checklists/database-review-checklist.md +50 -0
- package/database/04-antipatterns/database-antipatterns.md +112 -0
- package/design/01-standards/ui-design-system-complete.md +423 -0
- package/design/02-playbooks/design-handoff-playbook.md +254 -0
- package/design/02-playbooks/design-review-playbook.md +388 -0
- package/design/03-checklists/design-review-checklist.md +246 -0
- package/design/04-antipatterns/design-antipatterns.md +378 -0
- package/design/05-cases/case-design-system-adoption.md +328 -0
- package/design/06-glossary/design-glossary.md +329 -0
- package/design/ui-full-lifecycle-cross-platform-playbook.md +571 -0
- package/design/ux-system-deep-dive.md +38 -0
- package/design-systems/00-craft-rules.md +71 -0
- package/design-systems/aesthetic-families.md +43 -0
- package/design-systems/anti-ai-slop.md +162 -0
- package/design-systems/bold-geometric.md +120 -0
- package/design-systems/brutalist-bold.md +103 -0
- package/design-systems/editorial-clean.md +109 -0
- package/design-systems/glass-aurora.md +108 -0
- package/design-systems/modern-minimal.md +145 -0
- package/design-systems/premium-luxury.md +106 -0
- package/design-systems/product-type-design-map.md +48 -0
- package/design-systems/soft-warm.md +123 -0
- package/design-systems/tech-utility.md +113 -0
- package/desktop/01-standards/desktop-app-standard.md +72 -0
- package/desktop/01-standards/desktop-design.md +71 -0
- package/development/00-governance/document-template.md +41 -0
- package/development/01-standards/api-versioning-strategies.md +432 -0
- package/development/01-standards/authentication-patterns-complete.md +479 -0
- package/development/01-standards/css-architecture-complete.md +550 -0
- package/development/01-standards/database-migration-strategies.md +484 -0
- package/development/01-standards/elasticsearch-complete.md +347 -0
- package/development/01-standards/git-complete.md +371 -0
- package/development/01-standards/golang-complete.md +1565 -0
- package/development/01-standards/graphql-complete.md +298 -0
- package/development/01-standards/javascript-bundlers-complete.md +469 -0
- package/development/01-standards/javascript-typescript-complete.md +528 -0
- package/development/01-standards/jest-complete.md +275 -0
- package/development/01-standards/linux-complete.md +234 -0
- package/development/01-standards/logging-observability-complete.md +526 -0
- package/development/01-standards/microservices-communication.md +502 -0
- package/development/01-standards/mongodb-complete.md +406 -0
- package/development/01-standards/oauth2-complete.md +285 -0
- package/development/01-standards/performance-optimization-complete.md +289 -0
- package/development/01-standards/playwright-complete.md +247 -0
- package/development/01-standards/postgresql-complete.md +456 -0
- package/development/01-standards/pytest-complete.md +340 -0
- package/development/01-standards/python-async-programming.md +902 -0
- package/development/01-standards/python-complete.md +956 -0
- package/development/01-standards/python-decorators-complete.md +799 -0
- package/development/01-standards/python-design-patterns.md +2854 -0
- package/development/01-standards/python-packaging-distribution.md +420 -0
- package/development/01-standards/python-testing-strategies.md +607 -0
- package/development/01-standards/python-web-frameworks-comparison.md +471 -0
- package/development/01-standards/redis-complete.md +317 -0
- package/development/01-standards/rest-api-complete.md +316 -0
- package/development/01-standards/rust-complete.md +578 -0
- package/development/01-standards/typescript-advanced-types.md +1513 -0
- package/development/01-standards/web-security-complete.md +292 -0
- package/development/02-playbooks/api-design-playbook.md +810 -0
- package/development/02-playbooks/database-migration-playbook.md +580 -0
- package/development/02-playbooks/debugging-playbook.md +692 -0
- package/development/02-playbooks/feature-delivery-playbook.md +430 -0
- package/development/02-playbooks/incident-hotfix-playbook.md +387 -0
- package/development/02-playbooks/performance-optimization-playbook.md +531 -0
- package/development/02-playbooks/performance-tuning-playbook.md +652 -0
- package/development/02-playbooks/refactor-playbook.md +403 -0
- package/development/02-playbooks/release-playbook.md +469 -0
- package/development/03-checklists/architecture-review-checklist.md +168 -0
- package/development/03-checklists/data-migration-checklist.md +157 -0
- package/development/03-checklists/oncall-handover-checklist.md +173 -0
- package/development/03-checklists/pr-checklist.md +158 -0
- package/development/03-checklists/production-readiness-checklist.md +190 -0
- package/development/03-checklists/release-readiness-checklist.md +154 -0
- package/development/03-checklists/security-review-checklist.md +182 -0
- package/development/04-antipatterns/api-antipatterns.md +657 -0
- package/development/04-antipatterns/architecture-antipatterns.md +686 -0
- package/development/04-antipatterns/backend-antipatterns.md +648 -0
- package/development/04-antipatterns/cicd-antipatterns.md +540 -0
- package/development/04-antipatterns/code-smell-antipatterns.md +571 -0
- package/development/04-antipatterns/data-antipatterns.md +658 -0
- package/development/04-antipatterns/database-antipatterns.md +578 -0
- package/development/04-antipatterns/frontend-antipatterns.md +635 -0
- package/development/04-antipatterns/reliability-antipatterns.md +700 -0
- package/development/04-antipatterns/security-antipatterns.md +747 -0
- package/development/05-cases/case-api-version-migration.md +428 -0
- package/development/05-cases/case-authorization-hardening.md +383 -0
- package/development/05-cases/case-bluegreen-rollback.md +466 -0
- package/development/05-cases/case-cache-snowball-protection.md +485 -0
- package/development/05-cases/case-ci-cd-pipeline.md +544 -0
- package/development/05-cases/case-database-scaling.md +500 -0
- package/development/05-cases/case-db-hotspot-optimization.md +487 -0
- package/development/05-cases/case-incident-mttr-reduction.md +563 -0
- package/development/05-cases/case-microservice-migration.md +375 -0
- package/development/05-cases/case-performance-optimization.md +406 -0
- package/development/05-cases/case-security-incident-response.md +345 -0
- package/development/06-glossary/full-stack-glossary.md +166 -0
- package/development/09-maturity/quarterly-audit-template.md +35 -0
- package/development/11-ui-excellence/ui-aesthetic-system.md +41 -0
- package/development/11-ui-excellence/ui-engineering-excellence.md +435 -0
- package/development/12-scenarios/development-scenarios-guide.md +565 -0
- package/development/13-implementation-assets/implementation-toolkit.md +282 -0
- package/development/13-implementation-assets/knowledge-gates-execution.md +43 -0
- package/development/14-full-lifecycle/software-lifecycle-gates.md +511 -0
- package/development/15-lifecycle-templates/project-templates-collection.md +791 -0
- package/development/api-contract-and-versioning-guide.md +36 -0
- package/development/api-governance-complete.md +43 -0
- package/development/backend-engineering-complete.md +43 -0
- package/development/code-review-quality-complete.md +43 -0
- package/development/concurrency-reliability-complete.md +43 -0
- package/development/database-engineering-complete.md +43 -0
- package/development/engineering-effectiveness-complete.md +43 -0
- package/development/engineering-standards-deep-dive.md +38 -0
- package/development/frontend-engineering-complete.md +43 -0
- package/development/performance-capacity-complete.md +43 -0
- package/development/refactor-migration-complete.md +42 -0
- package/development/refactoring-and-techdebt-playbook.md +37 -0
- package/development/security-in-development-complete.md +43 -0
- package/devops/01-standards/cicd-pipeline-complete.md +262 -0
- package/devops/01-standards/docker-complete.md +1490 -0
- package/devops/01-standards/github-actions-complete.md +337 -0
- package/devops/01-standards/kubernetes-complete.md +638 -0
- package/devops/01-standards/terraform-complete.md +2117 -0
- package/devops/02-playbooks/docker-compose-playbook.md +233 -0
- package/devops/02-playbooks/docker-k8s-production-playbook.md +186 -0
- package/devops/02-playbooks/docker-production-playbook.md +952 -0
- package/edge-iot/01-standards/edge-iot-complete.md +473 -0
- package/experts/architect/api-design.md +178 -0
- package/experts/architect/methodology.md +124 -0
- package/experts/architect/security.md +75 -0
- package/experts/backend-lead/methodology.md +216 -0
- package/experts/devops/methodology.md +160 -0
- package/experts/frontend-lead/methodology.md +178 -0
- package/experts/product-manager/industry/ecommerce.md +43 -0
- package/experts/product-manager/industry/saas.md +40 -0
- package/experts/product-manager/methodology.md +97 -0
- package/experts/qa-lead/methodology.md +123 -0
- package/experts/qa-lead/test-strategy.md +128 -0
- package/experts/uiux-designer/methodology.md +125 -0
- package/frontend/01-standards/accessibility-complete.md +532 -0
- package/frontend/01-standards/accessibility-standard.md +74 -0
- package/frontend/01-standards/admin-dashboard-and-crud.md +72 -0
- package/frontend/01-standards/design-tokens-complete.md +444 -0
- package/frontend/01-standards/forms-and-validation.md +77 -0
- package/frontend/01-standards/frontend-architecture-and-layering.md +119 -0
- package/frontend/01-standards/i18n-and-localization.md +65 -0
- package/frontend/01-standards/nextjs-complete.md +451 -0
- package/frontend/01-standards/react-complete.md +713 -0
- package/frontend/01-standards/react-hooks-complete-guide.md +1100 -0
- package/frontend/01-standards/react-hooks-complete.md +1171 -0
- package/frontend/01-standards/seo-and-web-vitals.md +77 -0
- package/frontend/01-standards/state-management-complete.md +444 -0
- package/frontend/01-standards/vue-complete.md +499 -0
- package/frontend/01-standards/vue3-complete.md +2002 -0
- package/frontend/01-standards/web-framework-best-practices.md +64 -0
- package/frontend/01-standards/web-performance-complete.md +495 -0
- package/frontend/02-playbooks/accessibility-a11y-playbook.md +161 -0
- package/frontend/02-playbooks/frontend-performance-playbook.md +707 -0
- package/frontend/02-playbooks/i18n-internationalization-playbook.md +120 -0
- package/frontend/02-playbooks/performance-optimization-playbook.md +163 -0
- package/frontend/02-playbooks/react-nextjs-production-playbook.md +167 -0
- package/frontend/02-playbooks/react-state-management-playbook.md +173 -0
- package/frontend/03-checklists/component-quality-checklist.md +166 -0
- package/frontend/03-checklists/frontend-launch-checklist.md +299 -0
- package/frontend/04-antipatterns/frontend-antipatterns.md +886 -0
- package/frontend/05-cases/case-performance-optimization.md +274 -0
- package/harmony/01-standards/harmonyos-arkts-standard.md +75 -0
- package/harmony/01-standards/harmonyos-design.md +65 -0
- package/high-quality-engineering-playbook.md +54 -0
- package/incident/01-standards/incident-response-complete.md +303 -0
- package/incident/02-playbooks/chaos-engineering-playbook.md +883 -0
- package/incident/02-playbooks/postmortem-playbook.md +398 -0
- package/incident/03-checklists/incident-readiness-checklist.md +181 -0
- package/incident/04-antipatterns/incident-antipatterns.md +490 -0
- package/incident/05-cases/case-cascade-failure.md +176 -0
- package/incident/06-glossary/incident-glossary.md +114 -0
- package/incident/postmortem-and-response-deep-dive.md +39 -0
- package/industries/ecommerce/ecommerce-complete.md +631 -0
- package/industries/education/education-complete.md +555 -0
- package/industries/fintech/fintech-complete.md +501 -0
- package/industries/gaming/gaming-complete.md +587 -0
- package/industries/healthcare/healthcare-complete.md +452 -0
- package/low-code/01-standards/low-code-complete.md +944 -0
- package/miniprogram/01-standards/ai-common-mistakes.md +61 -0
- package/miniprogram/01-standards/miniprogram-custom-navbar-capsule.md +77 -0
- package/miniprogram/01-standards/miniprogram-design.md +61 -0
- package/miniprogram/01-standards/miniprogram-standard.md +81 -0
- package/mobile/01-standards/android-material-design.md +70 -0
- package/mobile/01-standards/flutter-complete.md +384 -0
- package/mobile/01-standards/ios-design-hig.md +78 -0
- package/mobile/01-standards/mobile-app-standard.md +85 -0
- package/mobile/01-standards/react-native-complete.md +352 -0
- package/mobile/02-playbooks/mobile-cross-platform-playbook.md +175 -0
- package/mobile/02-playbooks/mobile-performance.md +473 -0
- package/mobile/03-checklists/mobile-release-checklist.md +234 -0
- package/mobile/04-antipatterns/mobile-antipatterns.md +798 -0
- package/mobile/05-cases/case-app-performance.md +500 -0
- package/mobile/05-cases/case-app-startup-optimization.md +218 -0
- package/mobile/06-glossary/mobile-glossary.md +484 -0
- package/observability/01-standards/observability-standards.md +103 -0
- package/observability/02-playbooks/prometheus-grafana-playbook.md +135 -0
- package/observability/02-playbooks/structured-logging-playbook.md +73 -0
- package/observability/03-checklists/observability-checklist.md +54 -0
- package/observability/04-antipatterns/observability-antipatterns.md +106 -0
- package/operations/01-standards/prometheus-monitoring-complete.md +1578 -0
- package/operations/02-playbooks/capacity-planning-playbook.md +620 -0
- package/operations/03-checklists/production-launch-checklist.md +365 -0
- package/operations/04-antipatterns/operations-antipatterns.md +664 -0
- package/operations/05-cases/case-sre-practices.md +581 -0
- package/operations/06-glossary/operations-glossary.md +120 -0
- package/operations/aiops-anomaly-detection.md +758 -0
- package/operations/capacity-planning.md +1061 -0
- package/operations/chaos-engineering.md +659 -0
- package/operations/incident-command-system.md +38 -0
- package/operations/observability-complete.md +442 -0
- package/operations/slo-sli-playbook.md +517 -0
- package/operations/sre-operations-deep-dive.md +39 -0
- package/package.json +8 -0
- package/performance/01-standards/performance-and-scalability.md +80 -0
- package/performance/01-standards/performance-standards.md +156 -0
- package/performance/02-playbooks/query-optimization-playbook.md +103 -0
- package/performance/03-checklists/performance-checklist.md +56 -0
- package/performance/04-antipatterns/performance-antipatterns.md +146 -0
- package/product/01-standards/product-management-complete.md +285 -0
- package/product/02-playbooks/feature-launch-playbook.md +207 -0
- package/product/02-playbooks/user-research-playbook.md +532 -0
- package/product/03-checklists/feature-launch-checklist.md +275 -0
- package/product/04-antipatterns/product-antipatterns.md +355 -0
- package/product/05-cases/case-mvp-to-scale.md +384 -0
- package/product/06-glossary/product-glossary.md +462 -0
- package/product/feature-prioritization-framework.md +40 -0
- package/product/kpi-and-metric-tree.md +37 -0
- package/product/product-discovery-and-prd-deep-dive.md +41 -0
- package/quantum/01-standards/quantum-complete.md +1186 -0
- package/security/01-standards/api-security-complete.md +511 -0
- package/security/01-standards/container-runtime-security.md +574 -0
- package/security/01-standards/data-protection-gdpr.md +543 -0
- package/security/01-standards/owasp-top10-complete.md +1890 -0
- package/security/01-standards/secure-coding-baseline.md +90 -0
- package/security/01-standards/supply-chain-security.md +441 -0
- package/security/01-standards/web-security-checklist.md +108 -0
- package/security/01-standards/zero-trust-architecture.md +521 -0
- package/security/02-playbooks/auth-sso-playbook.md +166 -0
- package/security/02-playbooks/incident-response-security-playbook.md +588 -0
- package/security/02-playbooks/owasp-api-security-playbook.md +129 -0
- package/security/02-playbooks/payment-integration-playbook.md +119 -0
- package/security/02-playbooks/penetration-testing-playbook.md +517 -0
- package/security/03-checklists/security-audit-checklist.md +356 -0
- package/security/04-antipatterns/security-coding-antipatterns.md +580 -0
- package/security/05-cases/case-log4shell-incident.md +537 -0
- package/security/05-cases/case-major-breaches.md +468 -0
- package/security/06-glossary/security-glossary.md +212 -0
- package/security/compliance-automation.md +993 -0
- package/security/container-security.md +680 -0
- package/security/devsecops-complete.md +426 -0
- package/security/sast-dast-sca.md +775 -0
- package/security/secrets-management.md +594 -0
- package/security/security-architecture-deep-dive.md +37 -0
- package/security/threat-modeling-stride-playbook.md +40 -0
- package/seed-templates/auth-system.md +59 -0
- package/seed-templates/blog-content.md +94 -0
- package/seed-templates/dashboard.md +89 -0
- package/seed-templates/docs-site.md +73 -0
- package/seed-templates/e-commerce.md +50 -0
- package/seed-templates/saas-landing.md +92 -0
- package/seed-templates/settings-page.md +51 -0
- package/testing/01-standards/test-strategy-and-layering.md +83 -0
- package/testing/01-standards/testing-strategy-complete.md +422 -0
- package/testing/01-standards/unit-testing-best-practices.md +118 -0
- package/testing/02-playbooks/e2e-testing-playbook.md +988 -0
- package/testing/02-playbooks/testing-strategy-playbook.md +126 -0
- package/testing/03-checklists/test-strategy-checklist.md +208 -0
- package/testing/04-antipatterns/testing-antipatterns.md +718 -0
- package/testing/05-cases/case-testing-transformation.md +300 -0
- package/testing/06-glossary/testing-glossary.md +110 -0
- package/testing/risk-based-test-matrix.md +36 -0
- package/testing/testing-strategy-deep-dive.md +37 -0
@@ -0,0 +1,466 @@
---
id: case-bluegreen-rollback
title: "Case Study: Building a Blue-Green Release and Fast Rollback System"
domain: development
category: 05-cases
difficulty: intermediate
tags: [agent, bluegreen, case, checklist, development, rollback, metadata]
quality_score: 70
last_updated: 2026-06-15
---
# Case Study: Building a Blue-Green Release and Fast Rollback System

## Metadata

| Field | Value |
|------|------|
| Industry | Online travel agency (OTA) platform |
| System scale | 300,000 orders per day on average; peak QPS 18,000 |
| Tech stack | Go + React + PostgreSQL + Redis + Kubernetes |
| Number of services | 18 microservices |
| Team size | 24 backend, 8 frontend, 5 SRE |
| Build timeline | 8 weeks (2024-01 to 2024-03) |
| Core goal | Cut recovery time after a faulty release from 30 minutes to 2 minutes |

---

## 1. Background

### 1.1 Current Release Process

Pain points of high-frequency releases on the travel platform:

```
Release process (before the overhaul):
1. Build the image and push it to Harbor (5 min)
2. Update the Deployment with kubectl set image (1 min)
3. Wait for the Rolling Update to finish (8-15 min, depending on service startup speed)
4. Manually verify the core flows (10-20 min)
5. If a problem is found, roll back manually (kubectl rollout undo, 5-10 min)
6. Wait for the rollback to finish (8-15 min)

Total: normal release 25-40 min, rollback 15-25 min
```
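As a quick sanity check on the totals above, the per-step ranges can be summed. This is only an illustrative sketch: the `step` type and the `total` helper are names introduced here, not part of the team's tooling. Summing the listed ranges gives 24-41 min for a normal release and 13-25 min for a rollback, so the stated totals appear to be rounded.

```go
package main

import "fmt"

// step is one stage of the pre-overhaul release process, with its
// duration range in minutes as listed in the process description.
type step struct {
	name     string
	min, max int
}

// total sums the lower and upper duration bounds of a sequence of steps.
func total(steps []step) (lo, hi int) {
	for _, s := range steps {
		lo += s.min
		hi += s.max
	}
	return lo, hi
}

func main() {
	release := []step{
		{"build and push image", 5, 5},
		{"kubectl set image", 1, 1},
		{"rolling update", 8, 15},
		{"manual verification", 10, 20},
	}
	rollback := []step{
		{"kubectl rollout undo", 5, 10},
		{"wait for rollback", 8, 15},
	}

	lo, hi := total(release)
	fmt.Printf("release: %d-%d min\n", lo, hi) // release: 24-41 min

	lo, hi = total(rollback)
	fmt.Printf("rollback: %d-%d min\n", lo, hi) // rollback: 13-25 min
}
```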
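
Release incident statistics for the past 6 months:

| Metric | Value |
|------|------|
| Total releases | 180 |
| Failed releases | 22 (12%) |
| Releases requiring rollback | 15 (8%) |
| Average rollback time | 28 minutes |
| Longest rollback time | 52 minutes (database migration rollback) |
| P0 incidents caused by releases | 3 |
| Release window restriction | Weekdays 10:00-16:00 |

### 1.2 Core Pain Points

1. **Rollback is too slow**: a 28-minute average rollback means orders worth hundreds of thousands of yuan are lost during an outage
2. **Rollback is unreliable**: Rolling Update rollback relies on the Kubernetes revision history, which by default keeps only 10 old revisions (`revisionHistoryLimit`); anything older cannot be rolled back to reliably
3. **Verification lags**: manual verification after a new version goes live is slow, so problems are detected 10-20 minutes late
4. **All-at-once release**: traffic is switched in full in a single step, with no canary capability
5. **Coupled database migrations**: code and database migrations are deployed together, so the database cannot be reverted when the code is rolled back

---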
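
## 2. Challenges

### 2.1 Technical Challenges

1. **Stateful services**: the search service warms up a local cache (15 minutes); a cold cache after a blue-green switch hurts performance
2. **Long-lived connections**: the WebSocket push service holds 100,000+ long-lived connections, which must not be dropped during a switch
3. **Database compatibility**: the Blue and Green versions must both work against the same database schema
4. **Resource cost**: blue-green deployment requires double the compute resources
5. **Service dependencies**: the 18 microservices call one another, which makes version compatibility complex to manage

### 2.2 Business Constraints

1. The travel platform has strict real-time requirements (flight and hotel prices change in real time)
2. Payment must not be interrupted (an order that is mid-payment must complete on the version it started on)
3. The search service is the traffic entry point, so any performance degradation directly hurts conversion

---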

## 3. Solution Design

### 3.1 Blue-Green Deployment Architecture

```
                  ┌─────────────┐
                  │   Istio     │
                  │   Gateway   │
                  └──────┬──────┘
                         │
        ┌────────────────┼────────────────┐
        │                │                │
  ┌─────┴─────┐    ┌─────┴─────┐    ┌─────┴─────┐
  │ VirtualSvc│    │ VirtualSvc│    │ VirtualSvc│
  │ (Search)  │    │ (Order)   │    │ (Payment) │
  └─────┬─────┘    └─────┬─────┘    └─────┬─────┘
        │                │                │
┌────┬──┴──┐     ┌────┬──┴──┐     ┌────┬──┴──┐
│Blue│Green│     │Blue│Green│     │Blue│Green│
│v2.1│v2.2 │     │v3.0│v3.1 │     │v1.5│v1.5 │
└────┴─────┘     └────┴─────┘     └────┴─────┘
```

### 3.2 Traffic Switching Strategy

```yaml
# Istio VirtualService configuration
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: search-service
spec:
  hosts:
    - search-service
  http:
    - match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: search-service
            subset: green
          weight: 100

    - route:
        - destination:
            host: search-service
            subset: blue
          weight: 90
        - destination:
            host: search-service
            subset: green
          weight: 10
```
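The default route above splits traffic 90/10 between the `blue` and `green` subsets. The following is a minimal Go sketch of how a controller might derive such a pair of weights from a single Green percentage. `RouteWeight` and `splitWeights` are hypothetical names used for illustration, not Istio API types; a real controller would write these values into the VirtualService through the Kubernetes API. Under this assumption a rollback is simply `splitWeights(0)`.

```go
package main

import (
	"errors"
	"fmt"
)

// RouteWeight mirrors one weighted destination in the default route of a
// VirtualService. It is a hypothetical type, not part of the Istio API.
type RouteWeight struct {
	Subset string
	Weight int
}

// splitWeights returns the blue/green weights for a given share of traffic
// sent to green. The two weights always sum to 100.
func splitWeights(greenPercent int) ([]RouteWeight, error) {
	if greenPercent < 0 || greenPercent > 100 {
		return nil, errors.New("greenPercent must be between 0 and 100")
	}
	return []RouteWeight{
		{Subset: "blue", Weight: 100 - greenPercent},
		{Subset: "green", Weight: greenPercent},
	}, nil
}

func main() {
	// 0 corresponds to a rollback, 10 to the split shown in the YAML,
	// and 100 to a full cutover.
	for _, p := range []int{0, 10, 100} {
		ws, err := splitWeights(p)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("green=%d%% -> %v\n", p, ws)
	}
}
```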

### 3.3 Canary Traffic-Shift Plan

```
Traffic-shift steps (orchestrated automatically):

Step 1: Deploy the Green version (receives no traffic)
  - Deploy the new version to the Green Deployment
  - Wait until all Pods are Ready and health checks pass
  - For the search service, also wait for cache warm-up to finish

Step 2: Smoke test (0% real traffic)
  - Send synthetic test requests to Green
  - Verify the core endpoints: search / order / payment / refund
  - Any failure → abort the release

Step 3: Internal canary (1% of traffic)
  - Shift 1% of real traffic to Green
  - Hold for 5 minutes
  - Automatically compare the error rate and latency of Blue and Green
  - Green error rate > Blue * 1.5 → automatic rollback

Step 4: Small canary (10% of traffic)
  - Shift 10% of traffic to Green
  - Hold for 10 minutes
  - Analyze metrics automatically

Step 5: Half-traffic canary (50% of traffic)
  - Shift 50% of traffic to Green
  - Hold for 15 minutes
  - Analyze metrics automatically and compare business metrics

Step 6: Full cutover (100% of traffic)
  - Shift all traffic to Green
  - Keep Blue running without traffic (on standby for rollback)
  - Clean up Blue resources after 24 hours

Rollback (from any step):
  - Set the VirtualService weight back to 100% Blue
  - Takes < 2 seconds (time for the Istio configuration to take effect)
```
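
The Step 3 rule (roll back when the Green error rate exceeds 1.5 times the Blue error rate) can be sketched in Go as follows. The function name and the `minBaseline` floor are assumptions added for illustration: without a floor, a Blue error rate of zero would make any single Green error trigger a rollback.

```go
package main

// greenIsWorse reports whether the Green error rate is more than 1.5x the
// Blue error rate. minBaseline is an assumed floor for the Blue rate; it is
// not part of the plan above and exists only to avoid comparing against zero.
func greenIsWorse(blueErrRate, greenErrRate, minBaseline float64) bool {
	baseline := blueErrRate
	if baseline < minBaseline {
		baseline = minBaseline
	}
	return greenErrRate > baseline*1.5
}
```

For example, with Blue at 0.2% and Green at 0.4% the check fires, while Green at 0.25% passes.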

### 3.4 Automatic Rollback Triggers

```yaml
# Automatic rollback policy
rollback_triggers:
  # Error rate
  - metric: "http_5xx_rate"
    threshold: 0.01       # 5xx error rate > 1%
    duration: "2m"        # sustained for 2 minutes
    action: "auto_rollback"

  # Latency
  - metric: "http_p99_latency_ms"
    threshold: 500        # P99 > 500 ms
    duration: "3m"
    action: "auto_rollback"

  # Business metric
  - metric: "order_creation_success_rate"
    threshold: 0.98       # order success rate < 98%
    duration: "2m"
    action: "auto_rollback"

  # Search metric
  - metric: "search_empty_result_rate"
    threshold: 0.15       # empty search result rate > 15%
    duration: "5m"
    action: "alert"       # alert only, no automatic rollback (needs human confirmation)
```
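
A minimal Go sketch of how such triggers might be evaluated. Two things here are assumptions rather than part of the configuration: the `Below` field (the YAML expresses the comparison direction only in comments, and the order success rate is the one metric that is bad when it falls under its threshold), and the conversion of `duration` into a count of consecutive samples, which presumes a fixed 30-second sampling interval so that `2m` becomes 4 samples.

```go
package main

// Trigger mirrors one rollback_triggers entry. Below and Samples are
// hypothetical fields added for this sketch.
type Trigger struct {
	Metric    string
	Threshold float64
	Samples   int    // consecutive breaching samples required
	Below     bool   // true when a value under the threshold is the bad case
	Action    string // "auto_rollback" or "alert"
}

// breached reports whether the most recent t.Samples values all violate
// the threshold, i.e. the breach was sustained for the whole window.
func (t Trigger) breached(values []float64) bool {
	if t.Samples <= 0 || len(values) < t.Samples {
		return false
	}
	for _, v := range values[len(values)-t.Samples:] {
		bad := v > t.Threshold
		if t.Below {
			bad = v < t.Threshold
		}
		if !bad {
			return false
		}
	}
	return true
}

// decide returns "auto_rollback" if any rollback trigger fired, otherwise
// "alert" if an alert-only trigger fired, otherwise the empty string.
func decide(triggers []Trigger, series map[string][]float64) string {
	action := ""
	for _, t := range triggers {
		if !t.breached(series[t.Metric]) {
			continue
		}
		if t.Action == "auto_rollback" {
			return t.Action
		}
		action = t.Action
	}
	return action
}
```

Requiring every sample in the window to breach means a single healthy sample resets the decision, which is one way to read "sustained for 2 minutes"; an average over the window would be a looser alternative.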

---

## 4. Implementation Steps

### 4.1 Phase 1: Infrastructure Preparation (Week 1-2)

```
Week 1: Deploy the Istio service mesh
  - Install Istio 1.20 (sidecar injection mode)
  - Enable sidecars one service at a time (starting with non-critical services)
  - Verify that Istio does not affect existing traffic

Week 2: Blue/Green Deployment templates
  - Create two Deployments (Blue and Green) for each service
  - Define the Blue/Green subsets in a DestinationRule
  - Initialize the VirtualService (100% Blue)
  - Capacity planning: run the Green environment on Spot Instances to reduce cost
```

### 4.2 Phase 2: Traffic-Shift Engine Development (Week 3-4)

```
Week 3: Traffic-shift controller
  - Develop the Rollout Controller (Go)
  - Implement canary step orchestration (1% → 10% → 50% → 100%)
  - Implement metric collection and the automatic rollback decision

Week 4: Observability
  - Prometheus metric collection (Blue and Green distinguished by label)
  - Grafana dashboards: Blue/Green traffic, error-rate, and latency comparisons
  - Alert rule configuration
  - ChatOps integration (release progress notifications via Slack/DingTalk)
```

**Core logic of the traffic-shift controller**:

```go
import (
	"context"
	"fmt"
	"time"
)

// log is assumed to be a leveled logger that provides Infof and Warnf.
// Controller and its helpers (setTrafficWeight, collectMetrics,
// shouldRollback, notify) are defined elsewhere.

// RolloutStep describes one stage of the canary rollout.
type RolloutStep struct {
	Weight       int           // percentage of traffic sent to Green
	Duration     time.Duration // observation window
	AutoRollback bool          // roll back automatically on a metric breach
}

// ExecuteRollout runs the steps in order. It returns nil only when every
// step completed; after a rollback it returns a non-nil error.
func (c *Controller) ExecuteRollout(ctx context.Context, service string, steps []RolloutStep) error {
	for i, step := range steps {
		log.Infof("Step %d: Setting green weight to %d%%", i+1, step.Weight)

		// Update the VirtualService.
		if err := c.setTrafficWeight(ctx, service, step.Weight); err != nil {
			return c.rollback(ctx, service, fmt.Sprintf("failed to set weight: %v", err))
		}
		// Observation window.
		if err := c.observe(ctx, service, i+1, step); err != nil {
			return err
		}
		log.Infof("Step %d completed successfully", i+1)
	}
	return nil
}

// observe checks metrics every 30 seconds until step.Duration has elapsed.
func (c *Controller) observe(ctx context.Context, service string, n int, step RolloutStep) error {
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop() // release the ticker when the step ends
	timer := time.NewTimer(step.Duration)
	defer timer.Stop()

	for {
		select {
		case <-ticker.C:
			metrics, err := c.collectMetrics(ctx, service)
			if err != nil {
				log.Warnf("Failed to collect metrics: %v", err)
				continue
			}
			if step.AutoRollback && c.shouldRollback(metrics) {
				return c.rollback(ctx, service,
					fmt.Sprintf("metrics exceeded threshold at step %d", n))
			}
		case <-timer.C:
			return nil
		case <-ctx.Done():
			return c.rollback(ctx, service, "context cancelled")
		}
	}
}

// rollback sends 100% of traffic back to Blue and always returns a non-nil
// error so callers can tell a rollback from a successful rollout.
func (c *Controller) rollback(ctx context.Context, service, reason string) error {
	log.Warnf("Rolling back %s: %s", service, reason)
	// Detach from cancellation (Go 1.21+) so rollback still runs when the
	// rollout context is already cancelled. The 30s timeout is an assumption.
	rbCtx, cancel := context.WithTimeout(context.WithoutCancel(ctx), 30*time.Second)
	defer cancel()

	start := time.Now()
	if err := c.setTrafficWeight(rbCtx, service, 0); err != nil { // 0% Green = 100% Blue
		return fmt.Errorf("rollback of %s failed (%s): %w", service, reason, err)
	}
	elapsed := time.Since(start)
	log.Infof("Rollback completed in %v", elapsed)
	c.notify(fmt.Sprintf("🔴 Rollback: %s - %s (took %v)", service, reason, elapsed))
	return fmt.Errorf("rollout of %s rolled back: %s", service, reason)
}
```

### 4.3 Phase 3: Special-Case Handling (Week 5-6)

#### Search Service Cache Warm-up

```
Problem: after startup the search service needs 15 minutes to warm its local cache; during the cold start, query latency is 5x higher

Solution:
  1. Start cache warm-up as soon as the Green version is deployed (no real traffic)
  2. Warm-up completion signal: a custom Readiness Probe checks that the cache hit rate is > 80%
  3. Blue keeps serving all traffic during warm-up
  4. Canary traffic shifting starts only after warm-up completes
```

```yaml
# Custom Readiness Probe for the search service
readinessProbe:
  httpGet:
    path: /health/ready
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 10
  successThreshold: 3    # Ready only after 3 consecutive successes
  failureThreshold: 60   # wait at most 10 minutes (60 checks x 10 s)
```
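
On the service side, the `/health/ready` endpoint could look roughly like the Go sketch below: it answers 503 until the cache hit rate exceeds 80%, so the probe keeps the Pod out of rotation during warm-up. The `warm` and `readyHandler` names and the `stats` callback are hypothetical; how the search service actually tracks its hit rate is not described in the source.

```go
package main

import "net/http"

// warm reports whether the observed hit ratio is above minRatio.
// With no lookups yet, the cache is treated as cold.
func warm(hits, lookups uint64, minRatio float64) bool {
	if lookups == 0 {
		return false
	}
	return float64(hits)/float64(lookups) > minRatio
}

// readyHandler builds the handler for /health/ready. stats is a hypothetical
// callback that returns the cache counters of the running service.
func readyHandler(stats func() (hits, lookups uint64)) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		hits, lookups := stats()
		if !warm(hits, lookups, 0.80) {
			// A non-2xx status makes the readiness probe fail.
			http.Error(w, "cache warming", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}
```

The handler would be registered with something like `http.Handle("/health/ready", readyHandler(cache.Stats))`, where `cache.Stats` stands for whatever counter source the service exposes.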

#### Graceful Migration of Long-Lived WebSocket Connections

```
Problem: 100,000+ WebSocket connections must not be dropped during the switch

Solution:
  1. Do not move WebSocket connections on rollback (WebSocket has its own VirtualService)
  2. Route new connections to Green; existing connections stay on Blue
  3. Set the grace period of Blue's WebSocket Pods to 30 min
  4. Natural transition: clients land on Green automatically when they reconnect
```

#### Protecting Orders That Are Mid-Payment

```
Problem: orders that are in the middle of payment must not be interrupted during the traffic shift

Solution:
  1. Use session affinity for the payment endpoints
  2. Requests that have entered the payment flow are pinned to the current version (marked with a cookie)
  3. During the shift, keep Blue's payment Pods for 10 minutes (drain period)
  4. Payments that time out without completing go through asynchronous compensation
```
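
A minimal Go sketch of the cookie-pinning rule in item 2, under the assumption that the cookie stores the literal subset name. The function name, the cookie values, and the `roll` parameter (a stand-in for the random weighted choice) are hypothetical.

```go
package main

// routePayment picks the version for a payment request.
// pinned is the value of the version cookie ("" when absent), greenWeight is
// the current Green percentage, and roll is a number in [0, 100).
// It returns the chosen version and whether the cookie must be set.
func routePayment(pinned string, greenWeight, roll int) (version string, setCookie bool) {
	// A request already inside the payment flow stays on its version,
	// regardless of the current traffic weights.
	if pinned == "blue" || pinned == "green" {
		return pinned, false
	}
	// First request of a payment flow: weighted choice, then pin it.
	if roll < greenWeight {
		return "green", true
	}
	return "blue", true
}
```

Because the pin ignores the weights, a payment that started on Blue keeps going to Blue even after the cutover to 100% Green, which is why the Blue payment Pods need the 10-minute drain period.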

### 4.4 Phase 4: Decoupling Database Migrations (Week 7)

```
Core principle: separate database migrations from code deployment

Rules:
  1. Migrations must be forward compatible (both version N and version N+1 must work)
  2. Adding a column: run the migration that adds it → then deploy the code that uses it
  3. Dropping a column: deploy code that no longer uses it → then run the migration that drops it
  4. Renaming a column: add the new column → dual-write → migrate data → drop the old column (3 deployments)

Example (renaming column old_name → new_name):
  Release 1: ALTER TABLE ADD new_name; UPDATE SET new_name = old_name;
             Code dual-writes old_name and new_name, and reads new_name
  Release 2: Code writes only new_name and no longer reads or writes old_name
  Release 3: ALTER TABLE DROP old_name;
```
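
The Release 1 code path can be sketched in Go as below. The dual write follows the example directly. The read fallback to `old_name` is an assumption added here: while Blue and Green run side by side, the previous version may still insert rows that populate only `old_name`. The `row` type and the use of an empty string for SQL NULL are simplifications.

```go
package main

// row models one record during the old_name -> new_name rename.
// An empty string stands in for SQL NULL in this sketch.
type row struct {
	OldName string
	NewName string
}

// writeName is the Release 1 write path: it writes both columns so that the
// previous version (which reads old_name) and the new one both see the value.
func writeName(r *row, v string) {
	r.OldName = v
	r.NewName = v
}

// readName is the Release 1 read path. It prefers new_name and falls back
// to old_name for rows inserted by the previous version after the backfill.
// The fallback is an assumption, not part of the original example.
func readName(r row) string {
	if r.NewName != "" {
		return r.NewName
	}
	return r.OldName
}
```

This sketch does not cover the case where the previous version updates `old_name` on a row whose `new_name` is already set; handling that would need something extra, such as a second backfill before Release 2.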

### 4.5 Phase 5: Automation and Drills (Week 8)

```
1. One-command release script:
   super-deploy --service search --version v2.3.0 --strategy canary

2. Rollback drills (every two weeks):
   - Deliberately deploy a version that trips the rollback thresholds
   - Verify that automatic rollback completes within 2 minutes
   - Verify that alert notifications arrive promptly

3. Chaos engineering:
   - Inject network latency during the canary and verify that rollback is triggered
   - Kill Green Pods during the canary and verify self-healing
```

---

## 5. Results

### 5.1 Core Metrics Comparison

| Metric | Before | After | Improvement |
|------|--------|--------|----------|
| Normal release time | 25-40 min | 12 min (including canary observation) | -65% |
| Rollback time | 28 min (average) | 1.8 s (traffic switch) | -99.9% |
| Longest rollback time | 52 min | 3.2 s | -99.9% |
| Release failure rate | 12% | 3% (caught during canary) | -75% |
| Share of releases needing rollback | 8% | 2% (automatic rollback) | -75% |
| P0 incidents caused by releases | 3 per half-year | 0 per half-year | -100% |
| Release window restriction | Weekdays 10:00-16:00 | 24/7 (canary as safety net) | Any time |

### 5.2 Business Impact

| Metric | Before | After |
|------|--------|--------|
| Release frequency | 7 per week | 5+ per day |
| Requirement delivery lead time | 2 weeks | 3 days |
| Order losses during releases | CNY 150,000 per month (average) | CNY 0 |
| Share of SRE time spent on releases | 40% | 5% |

### 5.3 Resource Cost

| Item | Cost |
|------|------|
| Green environment resources | +40% compute (Spot Instances when inactive) |
| Istio overhead | +15% CPU / +20% memory (sidecars) |
| Actual monthly cost increase | CNY 28,000 |
| Monthly incident losses avoided (average) | CNY 150,000 |
| **Net benefit** | **CNY 122,000 per month** |
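
---

## 6. Lessons Learned

### 6.1 What Went Right

1. **Istio traffic management**: finer-grained traffic control than Nginx routing, with support for shifting by header, cookie, or percentage
2. **Automatic rollback**: a manual rollback decision costs 5-10 minutes of hesitation; automatic rollback removes that decision latency
3. **Decoupled database migrations**: with migrations separated from code deployment, rollback only requires switching traffic and never a database rollback
4. **Routine rollback drills**: a drill every two weeks keeps the rollback mechanism working; the drills uncovered configuration drift on 2 occasions
5. **Search cache warm-up**: warming up in advance, tied to the Readiness Probe, avoided cold-start performance degradation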
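
### 6.2 What Went Wrong

1. **Spot Instances not considered at first**: the Green environment ran entirely on on-demand instances, which doubled the cost. Switching to Spot Instances later cut the cost by 60%
2. **Over-sensitive rollback thresholds**: the initial threshold was too tight (error rate > 0.5%), so healthy releases were rolled back by mistake. It stabilized after being raised to 1%
3. **WebSocket handled too late**: the WebSocket disconnect problem was discovered only during the first traffic shift and was fixed with an emergency patch
4. **Incomplete metric coverage**: initially only HTTP metrics were monitored and gRPC service metrics were missed, so problems in internal services failed to trigger a rollback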
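
### 6.3 Key Takeaways

- The core value of blue-green deployment is not "faster releases" but "faster and safer rollback"
- Rollback speed = incident recovery speed; at the business level, a 2-second rollback is a qualitative leap over a 30-minute one
- Automatic rollback thresholds need 2-3 rounds of calibration to settle: too loose and problems slip through, too tight and healthy releases are flagged
- Decoupling database migrations from code deployment is a prerequisite for blue-green deployment
- The added cost of doubled resources is far lower than the business losses from release incidents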
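
---

## Agent Checklist

When an AI agent assists in building a blue-green release system, confirm each item:

- [ ] **Environment parity**: are resource configuration, environment variables, and dependency versions identical between Blue and Green?
- [ ] **Traffic management**: is fine-grained traffic switching available (percentage/header/cookie)?
- [ ] **Canary steps**: are the traffic-shift steps and the observation time of each step defined?
- [ ] **Automatic rollback**: are metric-based automatic rollback triggers configured?
- [ ] **Rollback speed**: can a rollback complete within seconds?
- [ ] **Metric coverage**: is there a dashboard comparing Blue/Green error rate, latency, and business metrics?
- [ ] **Smoke tests**: are automated smoke tests run before the new version goes live?
- [ ] **Cache warm-up**: do services with local caches have a warm-up mechanism?
- [ ] **Long-lived connections**: is the switching plan for WebSocket, gRPC streams, and other long-lived connections defined?
- [ ] **Stateful protection**: are in-flight transactions (payments and the like) protected by session affinity?
- [ ] **Database decoupling**: are database migrations separated from code deployment?
- [ ] **Resource cost**: does the Green environment use Spot/preemptible instances to reduce cost?
- [ ] **Rollback drills**: is a regular rollback drill in place?
- [ ] **Notifications**: does the team receive timely notifications of release progress and rollback events?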