@umacloud/knowledge 1.0.1
- package/00-governance/governance-capabilities.md +557 -0
- package/00-governance/knowledge-map.md +39 -0
- package/00-governance/maintenance-policy.md +76 -0
- package/00-governance/review-checklist.md +81 -0
- package/README.md +13 -0
- package/ai/01-standards/agent-development-complete.md +691 -0
- package/ai/01-standards/llm-application-complete.md +488 -0
- package/ai/01-standards/mlops-complete.md +798 -0
- package/ai/01-standards/prompt-engineering-complete.md +646 -0
- package/ai/01-standards/rag-architecture-complete.md +649 -0
- package/ai/02-playbooks/llm-evaluation-playbook.md +847 -0
- package/ai/03-checklists/ai-project-checklist.md +215 -0
- package/ai/04-antipatterns/ai-antipatterns.md +661 -0
- package/ai/05-cases/case-rag-production.md +147 -0
- package/ai/06-glossary/ai-glossary.md +162 -0
- package/ai/agent-evaluation-benchmark.md +53 -0
- package/ai/ai-agent-memory-context-management.md +41 -0
- package/ai/ai-cost-capacity-optimization-playbook.md +42 -0
- package/ai/ai-data-security-and-compliance-playbook.md +37 -0
- package/ai/ai-domain-index-and-checklist.md +40 -0
- package/ai/ai-governance-maturity-model.md +50 -0
- package/ai/ai-model-selection-and-routing-strategy.md +47 -0
- package/ai/ai-observability-and-oncall-runbook.md +52 -0
- package/ai/ai-rag-engineering-playbook.md +42 -0
- package/ai/ai-red-team-and-safety-evaluation.md +42 -0
- package/ai/ai-release-readiness-and-rollback-gate.md +42 -0
- package/ai/llm-agent-engineering-deep-dive.md +57 -0
- package/ai/prompt-and-tool-guardrails.md +52 -0
- package/api/01-standards/enterprise-api-standards.md +198 -0
- package/api/01-standards/rest-api-design-guide.md +63 -0
- package/api/02-playbooks/api-pagination-playbook.md +93 -0
- package/api/02-playbooks/graphql-production-playbook.md +176 -0
- package/api/03-checklists/api-review-checklist.md +55 -0
- package/api/04-antipatterns/api-antipatterns.md +112 -0
- package/architecture/01-standards/api-gateway-patterns.md +496 -0
- package/architecture/01-standards/cloud-native-patterns.md +644 -0
- package/architecture/01-standards/distributed-systems-patterns.md +591 -0
- package/architecture/01-standards/event-driven-architecture.md +595 -0
- package/architecture/01-standards/microservices-patterns-complete.md +968 -0
- package/architecture/01-standards/microservices-patterns.md +495 -0
- package/architecture/01-standards/system-design-interview.md +664 -0
- package/architecture/02-playbooks/microservices-patterns-playbook.md +137 -0
- package/architecture/02-playbooks/migration-playbook.md +780 -0
- package/architecture/02-playbooks/system-design-playbook.md +779 -0
- package/architecture/03-checklists/architecture-decision-checklist.md +297 -0
- package/architecture/04-antipatterns/architecture-antipatterns.md +417 -0
- package/architecture/05-cases/case-netflix-microservices.md +413 -0
- package/architecture/06-glossary/architecture-glossary.md +164 -0
- package/architecture/adr-template-and-examples.md +38 -0
- package/architecture/api-gateway-deep-dive.md +1291 -0
- package/architecture/configuration-management.md +1162 -0
- package/architecture/distributed-transactions.md +1220 -0
- package/architecture/microservices-complete.md +735 -0
- package/architecture/resilience-and-disaster-patterns.md +37 -0
- package/architecture/service-governance.md +1198 -0
- package/architecture/system-architecture-deep-dive.md +37 -0
- package/backend/01-standards/analytics-and-growth.md +65 -0
- package/backend/01-standards/api-and-error-conventions.md +120 -0
- package/backend/01-standards/application-layering-and-packaging.md +160 -0
- package/backend/01-standards/auth-implementation.md +104 -0
- package/backend/01-standards/backend-framework-idioms.md +74 -0
- package/backend/01-standards/background-jobs-and-async.md +66 -0
- package/backend/01-standards/caching-strategies-complete.md +390 -0
- package/backend/01-standards/config-and-observability.md +77 -0
- package/backend/01-standards/data-modeling-and-persistence.md +94 -0
- package/backend/01-standards/django-complete.md +1765 -0
- package/backend/01-standards/email-and-notifications.md +64 -0
- package/backend/01-standards/fastapi-complete.md +925 -0
- package/backend/01-standards/file-upload-and-storage.md +66 -0
- package/backend/01-standards/graphql-api-complete.md +416 -0
- package/backend/01-standards/llm-application-standard.md +78 -0
- package/backend/01-standards/message-queue-patterns.md +379 -0
- package/backend/01-standards/microservices-and-distributed.md +78 -0
- package/backend/01-standards/nestjs-complete.md +2167 -0
- package/backend/01-standards/payment-integration.md +80 -0
- package/backend/01-standards/rate-limiting-complete.md +451 -0
- package/backend/01-standards/realtime-and-websocket.md +65 -0
- package/backend/01-standards/search-and-filtering.md +64 -0
- package/backend/01-standards/spring-boot-complete.md +445 -0
- package/backend/02-playbooks/api-design-playbook.md +718 -0
- package/backend/02-playbooks/email-send-playbook.md +130 -0
- package/backend/02-playbooks/file-upload-s3-playbook.md +153 -0
- package/backend/02-playbooks/typescript-enterprise-playbook.md +133 -0
- package/backend/02-playbooks/websocket-realtime-playbook.md +154 -0
- package/backend/03-checklists/api-launch-checklist.md +189 -0
- package/backend/04-antipatterns/backend-antipatterns.md +1051 -0
- package/blockchain/01-standards/blockchain-basics.md +557 -0
- package/blockchain/01-standards/smart-contract-development.md +1315 -0
- package/cicd/01-standards/deployment-and-delivery-standard.md +96 -0
- package/cicd/01-standards/github-actions-complete.md +473 -0
- package/cicd/01-standards/release-and-store-submission.md +75 -0
- package/cicd/02-playbooks/cicd-pipeline-playbook.md +144 -0
- package/cicd/02-playbooks/release-management-playbook.md +605 -0
- package/cicd/03-checklists/pipeline-security-checklist.md +168 -0
- package/cicd/04-antipatterns/cicd-antipatterns.md +589 -0
- package/cicd/05-cases/case-deployment-automation.md +221 -0
- package/cicd/05-cases/case-gitops-transformation.md +212 -0
- package/cicd/06-glossary/cicd-glossary.md +114 -0
- package/cicd/cicd-blueprint-deep-dive.md +38 -0
- package/cicd/release-readiness-gate.md +37 -0
- package/cloud-native/01-standards/container-security.md +741 -0
- package/cloud-native/01-standards/kubernetes-complete.md +812 -0
- package/cloud-native/02-playbooks/api-gateway-playbook.md +155 -0
- package/cloud-native/02-playbooks/gitops-with-argocd.md +760 -0
- package/cloud-native/02-playbooks/k8s-troubleshooting-playbook.md +1942 -0
- package/cloud-native/02-playbooks/message-queue-playbook.md +129 -0
- package/cloud-native/02-playbooks/multicloud-governance.md +726 -0
- package/cloud-native/02-playbooks/serverless-patterns.md +788 -0
- package/cloud-native/02-playbooks/service-mesh-playbook.md +612 -0
- package/cloud-native/02-playbooks/terraform-iac-playbook.md +143 -0
- package/cloud-native/03-checklists/container-security-checklist.md +431 -0
- package/cloud-native/03-checklists/k8s-production-readiness-checklist.md +460 -0
- package/cloud-native/04-antipatterns/container-antipatterns.md +660 -0
- package/cloud-native/04-antipatterns/k8s-antipatterns.md +743 -0
- package/cloud-native/05-cases/case-k8s-migration.md +478 -0
- package/cloud-native/05-cases/case-k8s-scaling.md +642 -0
- package/cloud-native/05-cases/case-k8s-security-incident.md +397 -0
- package/cloud-native/06-glossary/cloud-native-glossary.md +337 -0
- package/cross-platform/01-standards/cross-platform-frameworks.md +83 -0
- package/cross-platform/01-standards/platform-selection-and-architecture.md +77 -0
- package/data/01-standards/elasticsearch-complete.md +2098 -0
- package/data/01-standards/postgresql-complete.md +1613 -0
- package/data/01-standards/redis-complete.md +1527 -0
- package/data/02-playbooks/database-optimization-playbook.md +403 -0
- package/data/02-playbooks/elasticsearch-production-playbook.md +132 -0
- package/data/03-checklists/database-launch-checklist.md +187 -0
- package/data/04-antipatterns/database-antipatterns.md +873 -0
- package/data/05-cases/case-database-migration.md +310 -0
- package/data/06-glossary/database-glossary.md +440 -0
- package/data/data-governance-and-modeling-deep-dive.md +39 -0
- package/data-engineering/01-standards/airflow-complete.md +523 -0
- package/data-engineering/01-standards/kafka-complete.md +1521 -0
- package/data-engineering/02-playbooks/spark-etl-playbook.md +496 -0
- package/data-engineering/03-checklists/pipeline-launch-checklist.md +194 -0
- package/data-engineering/04-antipatterns/data-pipeline-antipatterns.md +684 -0
- package/data-engineering/05-cases/case-real-time-pipeline.md +355 -0
- package/data-engineering/06-glossary/data-engineering-glossary.md +429 -0
- package/database/01-standards/database-schema-standards.md +147 -0
- package/database/02-playbooks/postgresql-optimization-quick.md +52 -0
- package/database/02-playbooks/postgresql-performance-optimization.md +58 -0
- package/database/02-playbooks/postgresql-production-playbook.md +146 -0
- package/database/02-playbooks/redis-caching-playbook.md +117 -0
- package/database/03-checklists/database-review-checklist.md +50 -0
- package/database/04-antipatterns/database-antipatterns.md +112 -0
- package/design/01-standards/ui-design-system-complete.md +423 -0
- package/design/02-playbooks/design-handoff-playbook.md +254 -0
- package/design/02-playbooks/design-review-playbook.md +388 -0
- package/design/03-checklists/design-review-checklist.md +246 -0
- package/design/04-antipatterns/design-antipatterns.md +378 -0
- package/design/05-cases/case-design-system-adoption.md +328 -0
- package/design/06-glossary/design-glossary.md +329 -0
- package/design/ui-full-lifecycle-cross-platform-playbook.md +571 -0
- package/design/ux-system-deep-dive.md +38 -0
- package/design-systems/00-craft-rules.md +71 -0
- package/design-systems/aesthetic-families.md +43 -0
- package/design-systems/anti-ai-slop.md +162 -0
- package/design-systems/bold-geometric.md +120 -0
- package/design-systems/brutalist-bold.md +103 -0
- package/design-systems/editorial-clean.md +109 -0
- package/design-systems/glass-aurora.md +108 -0
- package/design-systems/modern-minimal.md +145 -0
- package/design-systems/premium-luxury.md +106 -0
- package/design-systems/product-type-design-map.md +48 -0
- package/design-systems/soft-warm.md +123 -0
- package/design-systems/tech-utility.md +113 -0
- package/desktop/01-standards/desktop-app-standard.md +72 -0
- package/desktop/01-standards/desktop-design.md +71 -0
- package/development/00-governance/document-template.md +41 -0
- package/development/01-standards/api-versioning-strategies.md +432 -0
- package/development/01-standards/authentication-patterns-complete.md +479 -0
- package/development/01-standards/css-architecture-complete.md +550 -0
- package/development/01-standards/database-migration-strategies.md +484 -0
- package/development/01-standards/elasticsearch-complete.md +347 -0
- package/development/01-standards/git-complete.md +371 -0
- package/development/01-standards/golang-complete.md +1565 -0
- package/development/01-standards/graphql-complete.md +298 -0
- package/development/01-standards/javascript-bundlers-complete.md +469 -0
- package/development/01-standards/javascript-typescript-complete.md +528 -0
- package/development/01-standards/jest-complete.md +275 -0
- package/development/01-standards/linux-complete.md +234 -0
- package/development/01-standards/logging-observability-complete.md +526 -0
- package/development/01-standards/microservices-communication.md +502 -0
- package/development/01-standards/mongodb-complete.md +406 -0
- package/development/01-standards/oauth2-complete.md +285 -0
- package/development/01-standards/performance-optimization-complete.md +289 -0
- package/development/01-standards/playwright-complete.md +247 -0
- package/development/01-standards/postgresql-complete.md +456 -0
- package/development/01-standards/pytest-complete.md +340 -0
- package/development/01-standards/python-async-programming.md +902 -0
- package/development/01-standards/python-complete.md +956 -0
- package/development/01-standards/python-decorators-complete.md +799 -0
- package/development/01-standards/python-design-patterns.md +2854 -0
- package/development/01-standards/python-packaging-distribution.md +420 -0
- package/development/01-standards/python-testing-strategies.md +607 -0
- package/development/01-standards/python-web-frameworks-comparison.md +471 -0
- package/development/01-standards/redis-complete.md +317 -0
- package/development/01-standards/rest-api-complete.md +316 -0
- package/development/01-standards/rust-complete.md +578 -0
- package/development/01-standards/typescript-advanced-types.md +1513 -0
- package/development/01-standards/web-security-complete.md +292 -0
- package/development/02-playbooks/api-design-playbook.md +810 -0
- package/development/02-playbooks/database-migration-playbook.md +580 -0
- package/development/02-playbooks/debugging-playbook.md +692 -0
- package/development/02-playbooks/feature-delivery-playbook.md +430 -0
- package/development/02-playbooks/incident-hotfix-playbook.md +387 -0
- package/development/02-playbooks/performance-optimization-playbook.md +531 -0
- package/development/02-playbooks/performance-tuning-playbook.md +652 -0
- package/development/02-playbooks/refactor-playbook.md +403 -0
- package/development/02-playbooks/release-playbook.md +469 -0
- package/development/03-checklists/architecture-review-checklist.md +168 -0
- package/development/03-checklists/data-migration-checklist.md +157 -0
- package/development/03-checklists/oncall-handover-checklist.md +173 -0
- package/development/03-checklists/pr-checklist.md +158 -0
- package/development/03-checklists/production-readiness-checklist.md +190 -0
- package/development/03-checklists/release-readiness-checklist.md +154 -0
- package/development/03-checklists/security-review-checklist.md +182 -0
- package/development/04-antipatterns/api-antipatterns.md +657 -0
- package/development/04-antipatterns/architecture-antipatterns.md +686 -0
- package/development/04-antipatterns/backend-antipatterns.md +648 -0
- package/development/04-antipatterns/cicd-antipatterns.md +540 -0
- package/development/04-antipatterns/code-smell-antipatterns.md +571 -0
- package/development/04-antipatterns/data-antipatterns.md +658 -0
- package/development/04-antipatterns/database-antipatterns.md +578 -0
- package/development/04-antipatterns/frontend-antipatterns.md +635 -0
- package/development/04-antipatterns/reliability-antipatterns.md +700 -0
- package/development/04-antipatterns/security-antipatterns.md +747 -0
- package/development/05-cases/case-api-version-migration.md +428 -0
- package/development/05-cases/case-authorization-hardening.md +383 -0
- package/development/05-cases/case-bluegreen-rollback.md +466 -0
- package/development/05-cases/case-cache-snowball-protection.md +485 -0
- package/development/05-cases/case-ci-cd-pipeline.md +544 -0
- package/development/05-cases/case-database-scaling.md +500 -0
- package/development/05-cases/case-db-hotspot-optimization.md +487 -0
- package/development/05-cases/case-incident-mttr-reduction.md +563 -0
- package/development/05-cases/case-microservice-migration.md +375 -0
- package/development/05-cases/case-performance-optimization.md +406 -0
- package/development/05-cases/case-security-incident-response.md +345 -0
- package/development/06-glossary/full-stack-glossary.md +166 -0
- package/development/09-maturity/quarterly-audit-template.md +35 -0
- package/development/11-ui-excellence/ui-aesthetic-system.md +41 -0
- package/development/11-ui-excellence/ui-engineering-excellence.md +435 -0
- package/development/12-scenarios/development-scenarios-guide.md +565 -0
- package/development/13-implementation-assets/implementation-toolkit.md +282 -0
- package/development/13-implementation-assets/knowledge-gates-execution.md +43 -0
- package/development/14-full-lifecycle/software-lifecycle-gates.md +511 -0
- package/development/15-lifecycle-templates/project-templates-collection.md +791 -0
- package/development/api-contract-and-versioning-guide.md +36 -0
- package/development/api-governance-complete.md +43 -0
- package/development/backend-engineering-complete.md +43 -0
- package/development/code-review-quality-complete.md +43 -0
- package/development/concurrency-reliability-complete.md +43 -0
- package/development/database-engineering-complete.md +43 -0
- package/development/engineering-effectiveness-complete.md +43 -0
- package/development/engineering-standards-deep-dive.md +38 -0
- package/development/frontend-engineering-complete.md +43 -0
- package/development/performance-capacity-complete.md +43 -0
- package/development/refactor-migration-complete.md +42 -0
- package/development/refactoring-and-techdebt-playbook.md +37 -0
- package/development/security-in-development-complete.md +43 -0
- package/devops/01-standards/cicd-pipeline-complete.md +262 -0
- package/devops/01-standards/docker-complete.md +1490 -0
- package/devops/01-standards/github-actions-complete.md +337 -0
- package/devops/01-standards/kubernetes-complete.md +638 -0
- package/devops/01-standards/terraform-complete.md +2117 -0
- package/devops/02-playbooks/docker-compose-playbook.md +233 -0
- package/devops/02-playbooks/docker-k8s-production-playbook.md +186 -0
- package/devops/02-playbooks/docker-production-playbook.md +952 -0
- package/edge-iot/01-standards/edge-iot-complete.md +473 -0
- package/experts/architect/api-design.md +178 -0
- package/experts/architect/methodology.md +124 -0
- package/experts/architect/security.md +75 -0
- package/experts/backend-lead/methodology.md +216 -0
- package/experts/devops/methodology.md +160 -0
- package/experts/frontend-lead/methodology.md +178 -0
- package/experts/product-manager/industry/ecommerce.md +43 -0
- package/experts/product-manager/industry/saas.md +40 -0
- package/experts/product-manager/methodology.md +97 -0
- package/experts/qa-lead/methodology.md +123 -0
- package/experts/qa-lead/test-strategy.md +128 -0
- package/experts/uiux-designer/methodology.md +125 -0
- package/frontend/01-standards/accessibility-complete.md +532 -0
- package/frontend/01-standards/accessibility-standard.md +74 -0
- package/frontend/01-standards/admin-dashboard-and-crud.md +72 -0
- package/frontend/01-standards/design-tokens-complete.md +444 -0
- package/frontend/01-standards/forms-and-validation.md +77 -0
- package/frontend/01-standards/frontend-architecture-and-layering.md +119 -0
- package/frontend/01-standards/i18n-and-localization.md +65 -0
- package/frontend/01-standards/nextjs-complete.md +451 -0
- package/frontend/01-standards/react-complete.md +713 -0
- package/frontend/01-standards/react-hooks-complete-guide.md +1100 -0
- package/frontend/01-standards/react-hooks-complete.md +1171 -0
- package/frontend/01-standards/seo-and-web-vitals.md +77 -0
- package/frontend/01-standards/state-management-complete.md +444 -0
- package/frontend/01-standards/vue-complete.md +499 -0
- package/frontend/01-standards/vue3-complete.md +2002 -0
- package/frontend/01-standards/web-framework-best-practices.md +64 -0
- package/frontend/01-standards/web-performance-complete.md +495 -0
- package/frontend/02-playbooks/accessibility-a11y-playbook.md +161 -0
- package/frontend/02-playbooks/frontend-performance-playbook.md +707 -0
- package/frontend/02-playbooks/i18n-internationalization-playbook.md +120 -0
- package/frontend/02-playbooks/performance-optimization-playbook.md +163 -0
- package/frontend/02-playbooks/react-nextjs-production-playbook.md +167 -0
- package/frontend/02-playbooks/react-state-management-playbook.md +173 -0
- package/frontend/03-checklists/component-quality-checklist.md +166 -0
- package/frontend/03-checklists/frontend-launch-checklist.md +299 -0
- package/frontend/04-antipatterns/frontend-antipatterns.md +886 -0
- package/frontend/05-cases/case-performance-optimization.md +274 -0
- package/harmony/01-standards/harmonyos-arkts-standard.md +75 -0
- package/harmony/01-standards/harmonyos-design.md +65 -0
- package/high-quality-engineering-playbook.md +54 -0
- package/incident/01-standards/incident-response-complete.md +303 -0
- package/incident/02-playbooks/chaos-engineering-playbook.md +883 -0
- package/incident/02-playbooks/postmortem-playbook.md +398 -0
- package/incident/03-checklists/incident-readiness-checklist.md +181 -0
- package/incident/04-antipatterns/incident-antipatterns.md +490 -0
- package/incident/05-cases/case-cascade-failure.md +176 -0
- package/incident/06-glossary/incident-glossary.md +114 -0
- package/incident/postmortem-and-response-deep-dive.md +39 -0
- package/industries/ecommerce/ecommerce-complete.md +631 -0
- package/industries/education/education-complete.md +555 -0
- package/industries/fintech/fintech-complete.md +501 -0
- package/industries/gaming/gaming-complete.md +587 -0
- package/industries/healthcare/healthcare-complete.md +452 -0
- package/low-code/01-standards/low-code-complete.md +944 -0
- package/miniprogram/01-standards/ai-common-mistakes.md +61 -0
- package/miniprogram/01-standards/miniprogram-custom-navbar-capsule.md +77 -0
- package/miniprogram/01-standards/miniprogram-design.md +61 -0
- package/miniprogram/01-standards/miniprogram-standard.md +81 -0
- package/mobile/01-standards/android-material-design.md +70 -0
- package/mobile/01-standards/flutter-complete.md +384 -0
- package/mobile/01-standards/ios-design-hig.md +78 -0
- package/mobile/01-standards/mobile-app-standard.md +85 -0
- package/mobile/01-standards/react-native-complete.md +352 -0
- package/mobile/02-playbooks/mobile-cross-platform-playbook.md +175 -0
- package/mobile/02-playbooks/mobile-performance.md +473 -0
- package/mobile/03-checklists/mobile-release-checklist.md +234 -0
- package/mobile/04-antipatterns/mobile-antipatterns.md +798 -0
- package/mobile/05-cases/case-app-performance.md +500 -0
- package/mobile/05-cases/case-app-startup-optimization.md +218 -0
- package/mobile/06-glossary/mobile-glossary.md +484 -0
- package/observability/01-standards/observability-standards.md +103 -0
- package/observability/02-playbooks/prometheus-grafana-playbook.md +135 -0
- package/observability/02-playbooks/structured-logging-playbook.md +73 -0
- package/observability/03-checklists/observability-checklist.md +54 -0
- package/observability/04-antipatterns/observability-antipatterns.md +106 -0
- package/operations/01-standards/prometheus-monitoring-complete.md +1578 -0
- package/operations/02-playbooks/capacity-planning-playbook.md +620 -0
- package/operations/03-checklists/production-launch-checklist.md +365 -0
- package/operations/04-antipatterns/operations-antipatterns.md +664 -0
- package/operations/05-cases/case-sre-practices.md +581 -0
- package/operations/06-glossary/operations-glossary.md +120 -0
- package/operations/aiops-anomaly-detection.md +758 -0
- package/operations/capacity-planning.md +1061 -0
- package/operations/chaos-engineering.md +659 -0
- package/operations/incident-command-system.md +38 -0
- package/operations/observability-complete.md +442 -0
- package/operations/slo-sli-playbook.md +517 -0
- package/operations/sre-operations-deep-dive.md +39 -0
- package/package.json +8 -0
- package/performance/01-standards/performance-and-scalability.md +80 -0
- package/performance/01-standards/performance-standards.md +156 -0
- package/performance/02-playbooks/query-optimization-playbook.md +103 -0
- package/performance/03-checklists/performance-checklist.md +56 -0
- package/performance/04-antipatterns/performance-antipatterns.md +146 -0
- package/product/01-standards/product-management-complete.md +285 -0
- package/product/02-playbooks/feature-launch-playbook.md +207 -0
- package/product/02-playbooks/user-research-playbook.md +532 -0
- package/product/03-checklists/feature-launch-checklist.md +275 -0
- package/product/04-antipatterns/product-antipatterns.md +355 -0
- package/product/05-cases/case-mvp-to-scale.md +384 -0
- package/product/06-glossary/product-glossary.md +462 -0
- package/product/feature-prioritization-framework.md +40 -0
- package/product/kpi-and-metric-tree.md +37 -0
- package/product/product-discovery-and-prd-deep-dive.md +41 -0
- package/quantum/01-standards/quantum-complete.md +1186 -0
- package/security/01-standards/api-security-complete.md +511 -0
- package/security/01-standards/container-runtime-security.md +574 -0
- package/security/01-standards/data-protection-gdpr.md +543 -0
- package/security/01-standards/owasp-top10-complete.md +1890 -0
- package/security/01-standards/secure-coding-baseline.md +90 -0
- package/security/01-standards/supply-chain-security.md +441 -0
- package/security/01-standards/web-security-checklist.md +108 -0
- package/security/01-standards/zero-trust-architecture.md +521 -0
- package/security/02-playbooks/auth-sso-playbook.md +166 -0
- package/security/02-playbooks/incident-response-security-playbook.md +588 -0
- package/security/02-playbooks/owasp-api-security-playbook.md +129 -0
- package/security/02-playbooks/payment-integration-playbook.md +119 -0
- package/security/02-playbooks/penetration-testing-playbook.md +517 -0
- package/security/03-checklists/security-audit-checklist.md +356 -0
- package/security/04-antipatterns/security-coding-antipatterns.md +580 -0
- package/security/05-cases/case-log4shell-incident.md +537 -0
- package/security/05-cases/case-major-breaches.md +468 -0
- package/security/06-glossary/security-glossary.md +212 -0
- package/security/compliance-automation.md +993 -0
- package/security/container-security.md +680 -0
- package/security/devsecops-complete.md +426 -0
- package/security/sast-dast-sca.md +775 -0
- package/security/secrets-management.md +594 -0
- package/security/security-architecture-deep-dive.md +37 -0
- package/security/threat-modeling-stride-playbook.md +40 -0
- package/seed-templates/auth-system.md +59 -0
- package/seed-templates/blog-content.md +94 -0
- package/seed-templates/dashboard.md +89 -0
- package/seed-templates/docs-site.md +73 -0
- package/seed-templates/e-commerce.md +50 -0
- package/seed-templates/saas-landing.md +92 -0
- package/seed-templates/settings-page.md +51 -0
- package/testing/01-standards/test-strategy-and-layering.md +83 -0
- package/testing/01-standards/testing-strategy-complete.md +422 -0
- package/testing/01-standards/unit-testing-best-practices.md +118 -0
- package/testing/02-playbooks/e2e-testing-playbook.md +988 -0
- package/testing/02-playbooks/testing-strategy-playbook.md +126 -0
- package/testing/03-checklists/test-strategy-checklist.md +208 -0
- package/testing/04-antipatterns/testing-antipatterns.md +718 -0
- package/testing/05-cases/case-testing-transformation.md +300 -0
- package/testing/06-glossary/testing-glossary.md +110 -0
- package/testing/risk-based-test-matrix.md +36 -0
- package/testing/testing-strategy-deep-dive.md +37 -0
@@ -0,0 +1,692 @@
---
id: debugging-playbook
title: Systematic Debugging Playbook
domain: development
category: 02-playbooks
difficulty: intermediate
tags: [agent, checklist, debugging, development, playbook, prerequisites, rollback-plan, report]
quality_score: 70
last_updated: 2026-06-15
---
# Systematic Debugging Playbook

## Overview

Debugging is the process of locating and fixing software defects with a systematic method. This playbook covers the full methodology, from log analysis, breakpoint debugging, and distributed tracing to memory-leak investigation. It emphasizes an iterative loop of scientific hypothesis -> verification -> elimination, so that you avoid blind guessing.

## Prerequisites

### Must Be Satisfied

- [ ] The problem can be reproduced (or there is enough log/monitoring data to reconstruct what happened)
- [ ] Observability infrastructure exists (at least two of logs, metrics, and traces)
- [ ] You have permission to access the relevant system logs and monitoring data
- [ ] You understand the system architecture and how the key components interact

### Tool List

| Scenario | Tools |
|----------|-------|
| Log analysis | ELK Stack, Loki+Grafana, CloudWatch |
| Breakpoint debugging | pdb/ipdb (Python), delve (Go), gdb/lldb (C/C++) |
| Remote debugging | debugpy (Python), dlv attach (Go) |
| Distributed tracing | Jaeger, Zipkin, SkyWalking, Datadog APM |
| Network debugging | tcpdump, wireshark, curl -v, httpie |
| Memory analysis | tracemalloc, objgraph, valgrind, MAT |
| System level | strace, ltrace, dmesg, journalctl |

---

## Step 1: Problem Definition

### 1.1 Problem Description Template

```markdown
## Bug Report

### Symptoms
- What: [the specific error behavior]
- When: [time of first occurrence, frequency]
- Where: [which environment, which endpoint, which page]
- Who is affected: [user group, scope of impact]

### Context
- Any recent releases/changes?
- Any recent changes in traffic/data volume?
- Any recent infrastructure changes?
- Does it occur only under specific conditions?

### Information Already Available
- Error logs: [paste the key log lines]
- Error code/message: [the exact error message]
- Reproduction steps: [1, 2, 3...]
- Investigation attempted so far: [what has been done]

### Impact
- Business impact: [specific impact]
- Urgency: P0/P1/P2/P3
```
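
### 1.2 Debugging Mental Model

```
Applying the scientific method to debugging:

1. Observe the symptoms → collect all relevant information
2. Form a hypothesis → infer possible causes from the symptoms
3. Design an experiment → define the steps that will test the hypothesis
4. Run the verification → execute the experiment
5. Analyze the result → is the hypothesis confirmed, ruled out, or in need of revision?
6. Repeat 2-5 → until the root cause is located

Key principles:
- Change only one variable at a time
- Record every hypothesis and its verification result
- Rule out the most likely causes first
- Don't assume; verify
```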
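
The "record every hypothesis and its verification result" principle can be made concrete with a small log structure. The sketch below is only an illustration: `DebugLog`, `Hypothesis`, and `Verdict` are hypothetical names that do not come from this playbook or from any library, and a shared document or ticket comment would serve the same purpose.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Verdict(Enum):
    """Outcome of testing one hypothesis (step 5 of the loop)."""
    CONFIRMED = "confirmed"
    RULED_OUT = "ruled_out"
    NEEDS_REVISION = "needs_revision"


@dataclass
class Hypothesis:
    statement: str                      # step 2: the suspected cause
    experiment: str                     # step 3: how it will be tested
    verdict: Optional[Verdict] = None   # filled in after step 5
    evidence: str = ""


@dataclass
class DebugLog:
    """Hypothetical helper: an ordered record of hypotheses and results."""
    symptom: str
    hypotheses: List[Hypothesis] = field(default_factory=list)

    def propose(self, statement: str, experiment: str) -> Hypothesis:
        hypothesis = Hypothesis(statement, experiment)
        self.hypotheses.append(hypothesis)
        return hypothesis

    def record(self, hypothesis: Hypothesis, verdict: Verdict, evidence: str) -> None:
        hypothesis.verdict = verdict
        hypothesis.evidence = evidence

    def open_hypotheses(self) -> List[Hypothesis]:
        """Hypotheses that have not been verified yet."""
        return [h for h in self.hypotheses if h.verdict is None]

    def root_cause(self) -> Optional[Hypothesis]:
        """First confirmed hypothesis, or None while the search continues."""
        for h in self.hypotheses:
            if h.verdict is Verdict.CONFIRMED:
                return h
        return None
```

Keeping ruled-out hypotheses in the list, rather than deleting them, is deliberate: it prevents the same dead end from being re-investigated later.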

---

## Step 2: Log Analysis

### 2.1 Structured Log Queries

```bash
# ELK - Kibana Query Language (KQL)
# Filter by time range and error level
level:ERROR AND service:"order-service" AND @timestamp >= "2024-01-15T10:00:00"

# Trace by request ID
request_id:"req_abc123def456"

# Trace by user
user_id:12345 AND level:(ERROR OR WARN)

# Count by error type
# (KQL only filters and has no aggregation syntax; the query below is ES|QL,
#  and the index pattern "logs-*" is an assumption)
FROM logs-* | WHERE level == "ERROR" | STATS error_count = COUNT(*) BY error_code
```

```bash
# Loki (Grafana) - LogQL
# Filter by labels
{service="order-service", level="error"}

# Regex match
{service="order-service"} |~ "timeout|connection refused"

# Count by pattern
# (the pattern parser extracts error_type; the 1h window is an example value)
sum by (error_type) (count_over_time({service="order-service"} | pattern `<_> ERROR <_> <error_type> <_>` [1h]))
```

```bash
# Quick command-line analysis (when ELK is not available)
# View recent error logs
kubectl logs -n production deployment/order-service --since=30m | grep -i error | tail -50

# Aggregate view across multiple Pods
# (kubectl logs deployment/... reads from a single Pod only)
stern order-service -n production --since 30m -o raw | grep -E "(ERROR|FATAL|Traceback)"

# Count frequency by error type
kubectl logs -n production deployment/order-service --since=1h | \
  grep ERROR | \
  sed 's/.*ERROR *//' | \
  sed 's/\[.*$//' | \
  sort | uniq -c | sort -rn | head -20

# Extract a specific time window
kubectl logs -n production deployment/order-service --since=1h | \
  awk '/2024-01-15T10:00/,/2024-01-15T10:30/'
```
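
When the logs have already been saved to a file or fetched by a script, the `grep | sed | sort | uniq -c` frequency count above can be reproduced in Python. This is a rough sketch, not an exact port: `count_error_types` is a hypothetical helper, and it assumes plain-text lines where the message follows the word `ERROR` and any trailing `[...]` block holds per-request detail.

```python
from collections import Counter


def count_error_types(lines, top_n=20):
    """Count error messages, most frequent first.

    Roughly mirrors: grep ERROR | sed 's/.*ERROR *//' | sed 's/\\[.*$//'
                     | sort | uniq -c | sort -rn | head -N
    """
    counts = Counter()
    for line in lines:
        # Keep only the text after the last "ERROR", like the greedy sed.
        _, found, rest = line.rpartition("ERROR")
        if not found:
            continue  # not an error line
        # Drop everything from the first "[" onward, then trim whitespace.
        message = rest.lstrip(" ").split("[", 1)[0].strip()
        counts[message] += 1
    return counts.most_common(top_n)
```

A typical call would pass an open file object or `text.splitlines()`; the result is a list of `(message, count)` pairs.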

### 2.2 Log Correlation Analysis

```python
# Correlate scattered logs into a complete request chain by request_id

import json

def correlate_logs(log_lines: list[str], request_id: str) -> list[dict]:
    """Correlate logs by request_id to reconstruct the request chain."""
    events = []
    for line in log_lines:
        try:
            log = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if isinstance(log, dict) and log.get('request_id') == request_id:
            events.append(log)
    # Assumes ISO 8601 timestamps, which sort chronologically as strings
    return sorted(events, key=lambda x: x.get('timestamp', ''))

# Example output:
# 10:00:00.001 [INFO] Received POST /api/v1/orders
# 10:00:00.005 [INFO] Validating request payload
# 10:00:00.010 [INFO] Querying user from database
# 10:00:00.510 [WARN] Database query slow: 500ms
# 10:00:00.515 [INFO] Checking inventory
# 10:00:01.520 [ERROR] Inventory service timeout after 1000ms
# 10:00:01.521 [ERROR] Order creation failed: upstream timeout
```
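
When the failing request is not yet known, it can help to group every record by `request_id` first and then look for the longest pause inside each chain. The following is a minimal sketch under assumptions: the field names `request_id`, `timestamp`, and `message`, and the helper names `group_by_request` and `largest_gap`, are illustrative and not part of the original playbook.

```python
import json
from collections import defaultdict
from datetime import datetime

def group_by_request(log_lines):
    """Group JSON log records by request_id, each group sorted by timestamp."""
    groups = defaultdict(list)
    for line in log_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore non-JSON lines
        if isinstance(record, dict) and "request_id" in record:
            groups[record["request_id"]].append(record)
    for events in groups.values():
        events.sort(key=lambda e: e.get("timestamp", ""))
    return dict(groups)

def largest_gap(events):
    """Return (gap_seconds, message_before_gap) for the longest pause between events."""
    best = (0.0, None)
    for prev, cur in zip(events, events[1:]):
        gap = (datetime.fromisoformat(cur["timestamp"])
               - datetime.fromisoformat(prev["timestamp"])).total_seconds()
        if gap > best[0]:
            best = (gap, prev.get("message"))
    return best

sample = [
    '{"request_id": "req_1", "timestamp": "2024-01-15T10:00:00.010", "message": "Querying user from database"}',
    '{"request_id": "req_1", "timestamp": "2024-01-15T10:00:00.510", "message": "Checking inventory"}',
    '{"request_id": "req_2", "timestamp": "2024-01-15T10:00:00.100", "message": "Received GET /health"}',
    'not json',
    '{"request_id": "req_1", "timestamp": "2024-01-15T10:00:00.001", "message": "Received POST /api/v1/orders"}',
]
groups = group_by_request(sample)
gap, message = largest_gap(groups["req_1"])
print(f"{gap:.3f}s after: {message}")
```

The step that precedes the longest pause is usually the first place to look, here the database query.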

### 2.3 Log Pattern Recognition

```bash
# Find the point in time where errors spike
kubectl logs -n production deployment/order-service --since=6h | \
  grep ERROR | \
  awk -F'T' '{print $1"T"substr($2,1,5)}' | \
  sort | uniq -c | sort -k2

# Example output (abridged):
#   5 2024-01-15T10:00
#   3 2024-01-15T10:05
#   2 2024-01-15T10:10
# 158 2024-01-15T10:15  ← error spike!
# 203 2024-01-15T10:20
#  12 2024-01-15T10:25

# Check what happened around this point in time
# 1. Was there a deployment?
kubectl rollout history deployment/order-service -n production

# 2. Was there a configuration change?
git log --since="2024-01-15 10:00" --until="2024-01-15 10:20"
```
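
Reading per-minute counts by eye works for a short window; for longer windows the same idea can be automated. The sketch below is one possible approach, not the playbook's method: it flags minutes whose error count exceeds a multiple of the median. The threshold `factor=5.0` and the function names are assumptions chosen for illustration.

```python
import re
from collections import Counter
from statistics import median

MINUTE = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2})")

def errors_per_minute(lines):
    """Count ERROR lines per minute, keyed by 'YYYY-MM-DDTHH:MM'."""
    counts = Counter()
    for line in lines:
        if "ERROR" not in line:
            continue
        match = MINUTE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts

def find_spikes(counts, factor=5.0):
    """Return the minutes whose count exceeds factor x median, in time order."""
    if not counts:
        return []
    baseline = median(counts.values())
    return [minute for minute in sorted(counts) if counts[minute] > factor * baseline]

lines = (
    ["2024-01-15T10:00:01 ERROR db timeout"] * 5
    + ["2024-01-15T10:05:02 ERROR db timeout"] * 3
    + ["2024-01-15T10:10:03 ERROR db timeout"] * 2
    + ["2024-01-15T10:15:04 ERROR upstream timeout"] * 158
    + ["2024-01-15T10:20:05 ERROR upstream timeout"] * 203
    + ["2024-01-15T10:25:06 ERROR upstream timeout"] * 12
    + ["2024-01-15T10:25:07 INFO ok"]
)
spikes = find_spikes(errors_per_minute(lines))
print(spikes)
```

The median is used as the baseline because a mean would be pulled upward by the spike itself.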

---

## Step 3: Breakpoint Debugging

### 3.1 Python Debugging

```python
# pdb - built-in debugger
def process_order(order_data):
    user = get_user(order_data['user_id'])
    import pdb; pdb.set_trace()  # pause here
    # or use breakpoint() (Python 3.7+)
    inventory = check_inventory(order_data['product_id'])
    return create_order(user, inventory)

# Common pdb commands:
# n (next)           - execute the next line
# s (step)           - step into a function
# c (continue)       - continue execution
# p var              - print a variable
# pp var             - pretty-print a variable
# l (list)           - show source code
# w (where)          - show the call stack
# u (up)             - move up one stack frame
# d (down)           - move down one stack frame
# b 42               - set a breakpoint at line 42
# condition 1 x > 10 - make breakpoint 1 conditional

# ipdb - enhanced version (Tab completion and syntax highlighting)
# pip install ipdb
import ipdb; ipdb.set_trace()
```

```python
# Remote debugging - debugpy (VS Code remote debugging)
# pip install debugpy

import debugpy
# 0.0.0.0 accepts connections from any host; restrict access to trusted networks
debugpy.listen(("0.0.0.0", 5678))
print("Waiting for debugger to attach...")
debugpy.wait_for_client()
# In VS Code, configure launch.json to attach to port 5678

# Conditional debugging - trigger only under specific conditions
def process_order(order_data):
    result = calculate_total(order_data)
    if result < 0:  # debug only on abnormal values
        import pdb; pdb.set_trace()
    return result
```

### 3.2 Post-mortem Debugging

```python
# Analyze after the program crashes
import traceback

def main():
    try:
        risky_operation()
    except Exception:
        # Print the full exception information
        traceback.print_exc()
        # Enter post-mortem debugging
        import pdb; pdb.post_mortem()

# Or run from the command line:
# python -m pdb script.py
# pdb enters post-mortem mode automatically on an uncaught exception
```
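
An interactive post-mortem session is not always possible, for example in a container without a TTY. In that case a similar view can be captured non-interactively with the standard library's `traceback.TracebackException`, which can record the local variables of each frame. This is a sketch under assumptions: `apply_discount` and `format_with_locals` are hypothetical names used only for the demonstration.

```python
import traceback

def format_with_locals(exc):
    """Render an exception together with the local variables of every frame."""
    tb = traceback.TracebackException.from_exception(exc, capture_locals=True)
    return "".join(tb.format())

def apply_discount(price, percent):
    factor = 1 - percent / 100
    return price / factor  # fails when percent == 100

def demo():
    try:
        apply_discount(200, 100)
    except ZeroDivisionError as exc:
        return format_with_locals(exc)

report = demo()
print(report)
```

Captured locals may contain secrets or personal data, so a report like this should be treated with the same care as any other log output.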

### 3.3 Debugging Async Code

```python
# asyncio debug mode
import asyncio
import logging
import warnings

# Trace coroutine execution
logging.getLogger('asyncio').setLevel(logging.DEBUG)

# Detect coroutines that were never awaited
warnings.filterwarnings('error', category=RuntimeWarning, message='.*was never awaited.*')

# Enable asyncio debug mode (main() stands for the application's entry coroutine)
asyncio.run(main(), debug=True)
# Or set the environment variable PYTHONASYNCIODEBUG=1
```

---

## Step 4: Distributed Tracing

### 4.1 Tracing Architecture

```
User request
  │
  ▼
API Gateway (Span 1: gateway)
  │
  ├─► Order Service (Span 2: order-create)
  │     │
  │     ├─► User Service (Span 3: user-validate)
  │     │
  │     ├─► Inventory Service (Span 4: inventory-check)
  │     │     │
  │     │     └─► Redis Cache (Span 5: cache-lookup)
  │     │
  │     └─► Database (Span 6: db-insert)
  │
  └─► Response

Each Span records:
- trace_id: unique ID for the entire trace
- span_id: ID of the current operation
- parent_span_id: ID of the parent operation
- operation_name: name of the operation
- start_time / duration: timing information
- tags: metadata (http.method, http.status_code, error)
- logs: event logs
```
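
The `parent_span_id` field is what lets a tracing backend rebuild the tree from a flat list of spans. The sketch below illustrates that reconstruction with a hypothetical `Span` dataclass that mirrors the fields listed above; it is not the data model of any particular tracing library, and the timings are invented.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_span_id: Optional[str]
    operation_name: str
    start_time: float   # seconds since the trace started (illustrative unit)
    duration: float     # seconds
    tags: dict = field(default_factory=dict)

def render_tree(spans):
    """Return indented lines showing the parent/child structure of one trace."""
    children = {}
    for span in spans:
        children.setdefault(span.parent_span_id, []).append(span)
    lines = []

    def walk(parent_id, depth):
        for span in sorted(children.get(parent_id, []), key=lambda s: s.start_time):
            lines.append(f"{'  ' * depth}{span.operation_name} ({span.duration * 1000:.0f}ms)")
            walk(span.span_id, depth + 1)

    walk(None, 0)  # root spans have no parent
    return lines

spans = [
    Span("t1", "s4", "s2", "inventory-check", 0.515, 1.0),
    Span("t1", "s1", None, "gateway", 0.0, 1.53),
    Span("t1", "s2", "s1", "order-create", 0.001, 1.52),
    Span("t1", "s3", "s2", "user-validate", 0.010, 0.5),
    Span("t1", "s5", "s4", "cache-lookup", 0.516, 0.002),
]
tree = render_tree(spans)
print("\n".join(tree))
```

Spans can arrive in any order, which is why the tree is built from the parent index rather than from list position.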

### 4.2 OpenTelemetry Integration

```python
# pip install opentelemetry-api opentelemetry-sdk opentelemetry-instrumentation-fastapi opentelemetry-exporter-jaeger
# Note: the Jaeger Thrift exporter is deprecated in recent OpenTelemetry Python
# releases; newer Jaeger versions can receive OTLP directly.

from fastapi import FastAPI
from opentelemetry import trace
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.jaeger.thrift import JaegerExporter
from opentelemetry.instrumentation.fastapi import FastAPIInstrumentor

app = FastAPI()

# Configure tracing; service.name is the name the spans appear under in Jaeger
provider = TracerProvider(resource=Resource.create({"service.name": "order-service"}))
jaeger_exporter = JaegerExporter(
    agent_host_name="jaeger",
    agent_port=6831,
)
provider.add_span_processor(BatchSpanProcessor(jaeger_exporter))
trace.set_tracer_provider(provider)

# Automatically instrument FastAPI
FastAPIInstrumentor.instrument_app(app)

# Create Spans manually (tracing business logic)
tracer = trace.get_tracer("order-service")

async def create_order(order_data: dict):
    with tracer.start_as_current_span("create_order") as span:
        span.set_attribute("order.product_id", order_data["product_id"])

        with tracer.start_as_current_span("validate_user"):
            user = await validate_user(order_data["user_id"])

        with tracer.start_as_current_span("check_inventory") as inv_span:
            available = await check_inventory(order_data["product_id"])
            inv_span.set_attribute("inventory.available", available)
            if not available:
                inv_span.set_status(trace.Status(trace.StatusCode.ERROR, "Out of stock"))
                # BusinessError is an application-defined exception
                raise BusinessError("INSUFFICIENT_STOCK")

        with tracer.start_as_current_span("persist_order"):
            order = await save_order(order_data)

        span.set_attribute("order.id", order.id)
        return order
```

### 4.3 Querying Jaeger

```bash
# Note: /api/traces is the internal HTTP API used by the Jaeger UI and may change between versions

# Find traces by service and operation
curl -s "http://jaeger:16686/api/traces?service=order-service&operation=create_order&limit=20&lookback=1h" | jq '.data[0].spans | length'

# Get the complete trace by Trace ID
curl -s "http://jaeger:16686/api/traces/<trace-id>" | jq '.data[0].spans[] | {operationName, duration: (.duration/1000|tostring + "ms"), tags: [.tags[] | select(.key | test("error|http.status")) | {(.key): .value}]}'

# Find slow requests
curl -s "http://jaeger:16686/api/traces?service=order-service&minDuration=1s&limit=10" | jq '.data[].traceID'

# Find failed requests (tags is a URL-encoded JSON object: {"error":"true"})
curl -s "http://jaeger:16686/api/traces?service=order-service&tags=%7B%22error%22%3A%22true%22%7D&limit=10"
```

### 4.4 Trace Analysis Method

```markdown
Trace analysis steps:

1. Find the problematic Trace ID (from logs, alerts, or a Jaeger search)
2. View the complete Trace in the Jaeger UI
3. Find the Span with the longest duration (locate the bottleneck)
4. Check the Span's Tags and Logs (error details)
5. Analyze the timing relationships between Spans:
   - Slow because calls run serially? → Run them in parallel
   - Slow because one Span is slow by itself? → Dig into that service
   - Slow because of retries? → Review the retry policy
   - Gaps between Spans? → Check for queueing or waiting
```
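
Steps 3 and 5 can be approximated in code once the spans of a trace are available as plain records. The sketch below computes each span's self time (its duration minus the time covered by its direct children) and the idle gaps between sibling spans. The record layout (`id`, `parent`, `name`, `start`, `end` in milliseconds) is an assumption made for the example, not the Jaeger JSON schema.

```python
def self_times(spans):
    """Return {name: self_time_ms}: time in a span not covered by its direct children."""
    by_parent = {}
    for s in spans:
        by_parent.setdefault(s["parent"], []).append(s)
    result = {}
    for s in spans:
        covered, cursor = 0, s["start"]
        for child in sorted(by_parent.get(s["id"], []), key=lambda c: c["start"]):
            start = max(child["start"], cursor)   # avoid double-counting overlaps
            end = min(child["end"], s["end"])
            if end > start:
                covered += end - start
                cursor = end
        result[s["name"]] = (s["end"] - s["start"]) - covered
    return result

def sibling_gaps(spans, parent_id):
    """Idle time between consecutive children of one parent (possible queueing)."""
    kids = sorted((s for s in spans if s["parent"] == parent_id), key=lambda s: s["start"])
    return [(a["name"], b["name"], b["start"] - a["end"])
            for a, b in zip(kids, kids[1:]) if b["start"] > a["end"]]

spans = [
    {"id": 1, "parent": None, "name": "create_order", "start": 0, "end": 1600},
    {"id": 2, "parent": 1, "name": "validate_user", "start": 10, "end": 510},
    {"id": 3, "parent": 1, "name": "check_inventory", "start": 515, "end": 1520},
    {"id": 4, "parent": 3, "name": "cache_lookup", "start": 516, "end": 518},
    {"id": 5, "parent": 1, "name": "persist_order", "start": 1580, "end": 1595},
]
times = self_times(spans)
slowest = max(times, key=times.get)
gaps = sibling_gaps(spans, 1)
print(slowest, times[slowest], gaps)
```

Ranking by self time rather than total duration keeps a parent span from being blamed for time spent in its children.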

---

## Step 5: Memory Leak Investigation

### 5.1 Detecting a Memory Leak

```bash
# Monitor the process memory trend (runs until interrupted with Ctrl+C)
while true; do
  # -o picks the oldest matching process when several match
  RSS=$(ps -o rss= -p "$(pgrep -o -f "your-app")")
  echo "$(date +%H:%M:%S) RSS: ${RSS}KB"
  sleep 10
done > memory_trend.log

# Visualize with matplotlib
python3 << 'PYEOF'
import matplotlib.pyplot as plt

times, values = [], []
with open('memory_trend.log') as f:
    for line in f:
        parts = line.strip().split()
        if len(parts) != 3 or not parts[2][:-2].isdigit():
            continue  # skip samples taken while the process was not found
        times.append(parts[0])
        values.append(int(parts[2][:-2]))

plt.figure(figsize=(12, 4))
plt.plot(values)
plt.ylabel('RSS (KB)')
plt.title('Memory Usage Over Time')
plt.savefig('memory_trend.png')
PYEOF
```
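
A plot shows the trend, but a single number is easier to compare across runs or to alert on. The sketch below fits a least-squares line to the samples and reports growth in KB per minute. It assumes the `HH:MM:SS RSS: <n>KB` line format and the 10-second sampling interval used above; the function names are illustrative.

```python
def parse_trend(lines):
    """Parse lines such as '10:00:00 RSS: 51200KB' into a list of KB values."""
    values = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3 and parts[2].endswith("KB") and parts[2][:-2].isdigit():
            values.append(int(parts[2][:-2]))
    return values

def growth_kb_per_minute(samples, interval_s=10):
    """Least-squares slope of RSS samples (KB) taken every interval_s seconds."""
    n = len(samples)
    if n < 2:
        return 0.0
    xs = [i * interval_s / 60 for i in range(n)]  # minutes since the first sample
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

log = [f"10:00:{i * 10:02d} RSS: {51200 + 100 * i}KB" for i in range(6)]
slope = growth_kb_per_minute(parse_trend(log))
print(f"{slope:.1f} KB/min")
```

A slope that stays positive over a long window under steady load suggests a leak; a slope that flattens after warm-up usually points to caches filling instead.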

### 5.2 Locating the Leak

```python
import tracemalloc
import gc

# Enable tracing (keep up to 25 stack frames per allocation)
tracemalloc.start(25)

# First snapshot
gc.collect()
snapshot1 = tracemalloc.take_snapshot()

# ... run for a while or perform N operations ...

# Second snapshot
gc.collect()
snapshot2 = tracemalloc.take_snapshot()

# Compare to find growth
stats = snapshot2.compare_to(snapshot1, 'traceback')
print("\n=== Top 10 Memory Growth ===")
for stat in stats[:10]:
    print(f"\n{stat}")
    for line in stat.traceback.format():
        print(f"  {line}")
```

### 5.3 Common Leak Patterns

```python
# Leak pattern 1: global cache that grows without bound
# Problem
_cache = {}
def get_data(key):
    if key not in _cache:
        _cache[key] = expensive_fetch(key)  # never evicted!
    return _cache[key]

# Fix: use an LRU cache
from functools import lru_cache

@lru_cache(maxsize=1024)
def get_data(key):
    return expensive_fetch(key)


# Leak pattern 2: event listeners that are never unsubscribed
# Problem
class DataProcessor:
    def __init__(self, event_bus):
        event_bus.subscribe("data_ready", self.on_data)  # registered but never removed

# Fix: use a weak reference or unsubscribe explicitly
import weakref

class DataProcessor:
    def __init__(self, event_bus):
        self._event_bus = event_bus
        # Assumes the event bus dereferences weak callbacks before calling them
        self._callback = weakref.WeakMethod(self.on_data)
        event_bus.subscribe("data_ready", self._callback)

    def close(self):
        # Unsubscribe the same object that was subscribed; call close() explicitly,
        # because __del__ is not guaranteed to run at a predictable time
        self._event_bus.unsubscribe("data_ready", self._callback)


# Leak pattern 3: closure capturing a large object
# Problem
def create_handler(large_data):
    def handler():
        return len(large_data)  # the closure holds a reference to large_data
    return handler

# Fix: capture only the values that are needed
def create_handler(large_data):
    data_length = len(large_data)  # extract the needed value
    def handler():
        return data_length
    return handler


# Leak pattern 4: reference cycles
# Problem (CPython's cycle collector reclaims these eventually, but only when it
# runs, so memory is released later than expected)
class Parent:
    def __init__(self):
        self.child = Child(self)

class Child:
    def __init__(self, parent):
        self.parent = parent  # reference cycle

# Fix: use a weak reference
class Child:
    def __init__(self, parent):
        self.parent = weakref.ref(parent)  # call self.parent() to get the object, or None
```
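
The effect of the weak reference in pattern 4 can be observed directly. The sketch below disables the cycle collector and checks whether a `Parent` is reclaimed by reference counting alone. It relies on CPython's reference-counting behavior, and the `weak` flag and `freed_without_gc` helper are additions made for this demonstration.

```python
import gc
import weakref

class Child:
    def __init__(self, parent, weak):
        self.parent = weakref.ref(parent) if weak else parent

class Parent:
    def __init__(self, weak):
        self.child = Child(self, weak)

def freed_without_gc(weak):
    """Return True if a Parent is reclaimed by reference counting alone."""
    gc.disable()                     # rule out the cycle collector
    try:
        parent = Parent(weak)
        probe = weakref.ref(parent)  # observes the object without keeping it alive
        del parent
        return probe() is None
    finally:
        gc.enable()
        gc.collect()                 # clean up the cycle left by the strong variant

print("strong back-reference freed immediately:", freed_without_gc(weak=False))
print("weak back-reference freed immediately:  ", freed_without_gc(weak=True))
```

With the strong back-reference the object survives until the collector runs; with the weak one it is released as soon as the last external reference goes away.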

### 5.4 Memory Diagnostics in Production

```python
# Memory analysis endpoints for production, usable without downtime
# Restrict access to these endpoints (authentication or internal network only)
from fastapi import FastAPI
import tracemalloc
import gc

app = FastAPI()

# Start tracemalloc at application startup (adds CPU and memory overhead;
# the roughly 5% figure is an estimate and depends on the workload)
tracemalloc.start(10)
_baseline_snapshot = None

@app.post("/debug/memory/baseline")
async def set_memory_baseline():
    """Set the memory baseline"""
    global _baseline_snapshot
    gc.collect()
    _baseline_snapshot = tracemalloc.take_snapshot()
    return {"status": "baseline set"}

@app.get("/debug/memory/growth")
async def get_memory_growth(top_n: int = 20):
    """View memory growth"""
    gc.collect()
    current = tracemalloc.take_snapshot()
    if _baseline_snapshot:
        stats = current.compare_to(_baseline_snapshot, 'lineno')
    else:
        stats = current.statistics('lineno')

    return {
        "top_allocations": [
            {
                "file": str(stat.traceback),
                "size_kb": stat.size / 1024,
                # size_diff is only present when comparing against a baseline
                "growth_kb": getattr(stat, "size_diff", 0) / 1024,
                "count": stat.count,
            }
            for stat in stats[:top_n]
        ]
    }

@app.get("/debug/memory/objects")
async def get_object_stats(top_n: int = 20):
    """View object statistics"""
    gc.collect()
    import objgraph  # pip install objgraph
    return {
        "most_common": objgraph.most_common_types(limit=top_n),
        "growth": objgraph.growth(limit=top_n),
    }
```

---

## Step 6: Debugging Network Issues

### 6.1 Connectivity Issues

```bash
# DNS resolution
dig api.external-service.com +short
nslookup api.external-service.com

# Connectivity test
curl -v -o /dev/null -w "\
DNS: %{time_namelookup}s\n\
TCP: %{time_connect}s\n\
TLS: %{time_appconnect}s\n\
First byte: %{time_starttransfer}s\n\
Total: %{time_total}s\n\
Status: %{http_code}\n" \
  https://api.external-service.com/health

# Route tracing
mtr -r -c 10 api.external-service.com

# TCP connection state analysis
# (-a lists sockets in every state; the default view omits some, such as TIME_WAIT)
ss -tan | awk 'NR>1 {print $1}' | sort | uniq -c | sort -rn
# Many TIME_WAIT sockets: kernel parameters may need tuning
# Many CLOSE_WAIT sockets: the application is not closing connections properly
```
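
When `curl` or `dig` are not available in a container image, the DNS and TCP phases can be timed from Python's standard library instead. The following is a minimal sketch: `time_connect` is a hypothetical helper, it measures only name resolution and the TCP handshake (no TLS or HTTP), and the demonstration connects to a throwaway listener on the loopback interface so that it does not depend on an external host.

```python
import socket
import time

def time_connect(host, port, timeout=3.0):
    """Return (dns_seconds, tcp_seconds) for one connection attempt."""
    t0 = time.perf_counter()
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    t1 = time.perf_counter()
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(timeout)
        sock.connect(addr)  # raises OSError on refusal or timeout
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1

# Local demonstration against a temporary listener on 127.0.0.1
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS choose a free port
listener.listen(1)
port = listener.getsockname()[1]
dns_s, tcp_s = time_connect("127.0.0.1", port)
listener.close()
print(f"DNS: {dns_s:.4f}s  TCP: {tcp_s:.4f}s")
```

A slow first phase points at DNS, while a slow or failing second phase points at routing, firewalls, or an overloaded listener.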

### 6.2 Request-Level Debugging

```bash
# Detailed HTTP request debugging
curl -v --trace-time https://api.example.com/api/v1/orders 2>&1 | head -50

# Use httpie (friendlier output)
http --print=HhBb GET https://api.example.com/api/v1/orders Authorization:"Bearer xxx"

# Packet capture
sudo tcpdump -i any -A -s 0 'port 8080 and host 10.0.0.1' -w debug.pcap -c 1000
# Analyze debug.pcap with Wireshark
```

---

## Verification

### Debugging Completion Checklist

```markdown
Signs that debugging is complete:

1. [ ] Root cause confirmed (the underlying cause, not a surface symptom)
2. [ ] Fix verified (the problem no longer reproduces)
3. [ ] Regression tests pass (the fix introduced no new problems)
4. [ ] Monitoring added (similar problems will be detected promptly)
5. [ ] Knowledge captured (the debugging process and lessons are documented)
```

---

## Rollback Plan

### Rolling Back Debugging Changes

```bash
# Remove debugging code
grep -rn "pdb\|breakpoint()\|debugpy\|import ipdb" src/ | head -20
# Make sure no debugging code remains

# Remove debug endpoints
# Make sure /debug/ paths are not exposed in production

# Disable tracemalloc (if the overhead is significant)
# In Python: tracemalloc.stop()

# Restore the log level
# The log level may have been made more verbose while debugging; restore it afterwards
```
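
The `grep` check can also run as an automated gate, for example in CI. The sketch below is one way to do that in Python. The pattern list and the `find_debug_leftovers` name are assumptions, and the regular expression is deliberately simple, so it can miss indirect uses such as aliased imports.

```python
import re
import tempfile
from pathlib import Path

# Lines that start with a debugger import or breakpoint(), or that call set_trace()
DEBUG_PATTERN = re.compile(
    r"^\s*(import (pdb|ipdb|debugpy)\b|breakpoint\(\)|.*\b(pdb|ipdb)\.set_trace\(\))"
)

def find_debug_leftovers(root):
    """Return (relative_path, line_number, text) for each suspicious line under root."""
    hits = []
    root = Path(root)
    for path in sorted(root.rglob("*.py")):
        for number, text in enumerate(path.read_text(encoding="utf-8").splitlines(), 1):
            if DEBUG_PATTERN.search(text):
                hits.append((str(path.relative_to(root)), number, text.strip()))
    return hits

# Demonstration on a temporary source tree
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "clean.py").write_text("def ok():\n    return 1\n", encoding="utf-8")
    Path(tmp, "dirty.py").write_text(
        "def f(x):\n    import pdb; pdb.set_trace()\n    return x\n", encoding="utf-8")
    leftovers = find_debug_leftovers(tmp)
print(leftovers)
```

In a pipeline, a non-empty result would typically fail the build so that debugger calls cannot reach production.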

### Rolling Back the Fix

```bash
# If the fix introduced new problems
git revert <fix-commit>

# If the fix involved a configuration change
kubectl rollout undo deployment/<service> -n production
```
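
---

## Agent Checklist

When assisting with debugging, an AI coding agent must confirm each of the following items:

- [ ] **Problem defined**: there is a clear description of the symptoms, the scope of impact, and the conditions to reproduce
- [ ] **Information collected**: logs, monitoring metrics, and trace data have been gathered
- [ ] **Hypotheses recorded**: every hypothesis and its verification result is written down
- [ ] **Systematic investigation**: one hypothesis is tested at a time, with no blind guessing
- [ ] **Root cause confirmed**: more than the surface symptom has been fixed
- [ ] **Minimal fix**: the scope of the fix is as small as possible
- [ ] **Regression verified**: the fix introduced no new problems
- [ ] **Debugging code cleaned up**: no pdb/breakpoint/debugpy leftovers
- [ ] **Monitoring added**: monitoring and alerts for this class of problem are in place
- [ ] **Tests added**: tests covering this bug scenario have been added
- [ ] **Knowledge captured**: the debugging process and root cause analysis are documented
- [ ] **Recurrence prevention**: there are concrete measures to prevent the same class of problem from happening again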