@umacloud/knowledge 1.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/00-governance/governance-capabilities.md +557 -0
- package/00-governance/knowledge-map.md +39 -0
- package/00-governance/maintenance-policy.md +76 -0
- package/00-governance/review-checklist.md +81 -0
- package/README.md +13 -0
- package/ai/01-standards/agent-development-complete.md +691 -0
- package/ai/01-standards/llm-application-complete.md +488 -0
- package/ai/01-standards/mlops-complete.md +798 -0
- package/ai/01-standards/prompt-engineering-complete.md +646 -0
- package/ai/01-standards/rag-architecture-complete.md +649 -0
- package/ai/02-playbooks/llm-evaluation-playbook.md +847 -0
- package/ai/03-checklists/ai-project-checklist.md +215 -0
- package/ai/04-antipatterns/ai-antipatterns.md +661 -0
- package/ai/05-cases/case-rag-production.md +147 -0
- package/ai/06-glossary/ai-glossary.md +162 -0
- package/ai/agent-evaluation-benchmark.md +53 -0
- package/ai/ai-agent-memory-context-management.md +41 -0
- package/ai/ai-cost-capacity-optimization-playbook.md +42 -0
- package/ai/ai-data-security-and-compliance-playbook.md +37 -0
- package/ai/ai-domain-index-and-checklist.md +40 -0
- package/ai/ai-governance-maturity-model.md +50 -0
- package/ai/ai-model-selection-and-routing-strategy.md +47 -0
- package/ai/ai-observability-and-oncall-runbook.md +52 -0
- package/ai/ai-rag-engineering-playbook.md +42 -0
- package/ai/ai-red-team-and-safety-evaluation.md +42 -0
- package/ai/ai-release-readiness-and-rollback-gate.md +42 -0
- package/ai/llm-agent-engineering-deep-dive.md +57 -0
- package/ai/prompt-and-tool-guardrails.md +52 -0
- package/api/01-standards/enterprise-api-standards.md +198 -0
- package/api/01-standards/rest-api-design-guide.md +63 -0
- package/api/02-playbooks/api-pagination-playbook.md +93 -0
- package/api/02-playbooks/graphql-production-playbook.md +176 -0
- package/api/03-checklists/api-review-checklist.md +55 -0
- package/api/04-antipatterns/api-antipatterns.md +112 -0
- package/architecture/01-standards/api-gateway-patterns.md +496 -0
- package/architecture/01-standards/cloud-native-patterns.md +644 -0
- package/architecture/01-standards/distributed-systems-patterns.md +591 -0
- package/architecture/01-standards/event-driven-architecture.md +595 -0
- package/architecture/01-standards/microservices-patterns-complete.md +968 -0
- package/architecture/01-standards/microservices-patterns.md +495 -0
- package/architecture/01-standards/system-design-interview.md +664 -0
- package/architecture/02-playbooks/microservices-patterns-playbook.md +137 -0
- package/architecture/02-playbooks/migration-playbook.md +780 -0
- package/architecture/02-playbooks/system-design-playbook.md +779 -0
- package/architecture/03-checklists/architecture-decision-checklist.md +297 -0
- package/architecture/04-antipatterns/architecture-antipatterns.md +417 -0
- package/architecture/05-cases/case-netflix-microservices.md +413 -0
- package/architecture/06-glossary/architecture-glossary.md +164 -0
- package/architecture/adr-template-and-examples.md +38 -0
- package/architecture/api-gateway-deep-dive.md +1291 -0
- package/architecture/configuration-management.md +1162 -0
- package/architecture/distributed-transactions.md +1220 -0
- package/architecture/microservices-complete.md +735 -0
- package/architecture/resilience-and-disaster-patterns.md +37 -0
- package/architecture/service-governance.md +1198 -0
- package/architecture/system-architecture-deep-dive.md +37 -0
- package/backend/01-standards/analytics-and-growth.md +65 -0
- package/backend/01-standards/api-and-error-conventions.md +120 -0
- package/backend/01-standards/application-layering-and-packaging.md +160 -0
- package/backend/01-standards/auth-implementation.md +104 -0
- package/backend/01-standards/backend-framework-idioms.md +74 -0
- package/backend/01-standards/background-jobs-and-async.md +66 -0
- package/backend/01-standards/caching-strategies-complete.md +390 -0
- package/backend/01-standards/config-and-observability.md +77 -0
- package/backend/01-standards/data-modeling-and-persistence.md +94 -0
- package/backend/01-standards/django-complete.md +1765 -0
- package/backend/01-standards/email-and-notifications.md +64 -0
- package/backend/01-standards/fastapi-complete.md +925 -0
- package/backend/01-standards/file-upload-and-storage.md +66 -0
- package/backend/01-standards/graphql-api-complete.md +416 -0
- package/backend/01-standards/llm-application-standard.md +78 -0
- package/backend/01-standards/message-queue-patterns.md +379 -0
- package/backend/01-standards/microservices-and-distributed.md +78 -0
- package/backend/01-standards/nestjs-complete.md +2167 -0
- package/backend/01-standards/payment-integration.md +80 -0
- package/backend/01-standards/rate-limiting-complete.md +451 -0
- package/backend/01-standards/realtime-and-websocket.md +65 -0
- package/backend/01-standards/search-and-filtering.md +64 -0
- package/backend/01-standards/spring-boot-complete.md +445 -0
- package/backend/02-playbooks/api-design-playbook.md +718 -0
- package/backend/02-playbooks/email-send-playbook.md +130 -0
- package/backend/02-playbooks/file-upload-s3-playbook.md +153 -0
- package/backend/02-playbooks/typescript-enterprise-playbook.md +133 -0
- package/backend/02-playbooks/websocket-realtime-playbook.md +154 -0
- package/backend/03-checklists/api-launch-checklist.md +189 -0
- package/backend/04-antipatterns/backend-antipatterns.md +1051 -0
- package/blockchain/01-standards/blockchain-basics.md +557 -0
- package/blockchain/01-standards/smart-contract-development.md +1315 -0
- package/cicd/01-standards/deployment-and-delivery-standard.md +96 -0
- package/cicd/01-standards/github-actions-complete.md +473 -0
- package/cicd/01-standards/release-and-store-submission.md +75 -0
- package/cicd/02-playbooks/cicd-pipeline-playbook.md +144 -0
- package/cicd/02-playbooks/release-management-playbook.md +605 -0
- package/cicd/03-checklists/pipeline-security-checklist.md +168 -0
- package/cicd/04-antipatterns/cicd-antipatterns.md +589 -0
- package/cicd/05-cases/case-deployment-automation.md +221 -0
- package/cicd/05-cases/case-gitops-transformation.md +212 -0
- package/cicd/06-glossary/cicd-glossary.md +114 -0
- package/cicd/cicd-blueprint-deep-dive.md +38 -0
- package/cicd/release-readiness-gate.md +37 -0
- package/cloud-native/01-standards/container-security.md +741 -0
- package/cloud-native/01-standards/kubernetes-complete.md +812 -0
- package/cloud-native/02-playbooks/api-gateway-playbook.md +155 -0
- package/cloud-native/02-playbooks/gitops-with-argocd.md +760 -0
- package/cloud-native/02-playbooks/k8s-troubleshooting-playbook.md +1942 -0
- package/cloud-native/02-playbooks/message-queue-playbook.md +129 -0
- package/cloud-native/02-playbooks/multicloud-governance.md +726 -0
- package/cloud-native/02-playbooks/serverless-patterns.md +788 -0
- package/cloud-native/02-playbooks/service-mesh-playbook.md +612 -0
- package/cloud-native/02-playbooks/terraform-iac-playbook.md +143 -0
- package/cloud-native/03-checklists/container-security-checklist.md +431 -0
- package/cloud-native/03-checklists/k8s-production-readiness-checklist.md +460 -0
- package/cloud-native/04-antipatterns/container-antipatterns.md +660 -0
- package/cloud-native/04-antipatterns/k8s-antipatterns.md +743 -0
- package/cloud-native/05-cases/case-k8s-migration.md +478 -0
- package/cloud-native/05-cases/case-k8s-scaling.md +642 -0
- package/cloud-native/05-cases/case-k8s-security-incident.md +397 -0
- package/cloud-native/06-glossary/cloud-native-glossary.md +337 -0
- package/cross-platform/01-standards/cross-platform-frameworks.md +83 -0
- package/cross-platform/01-standards/platform-selection-and-architecture.md +77 -0
- package/data/01-standards/elasticsearch-complete.md +2098 -0
- package/data/01-standards/postgresql-complete.md +1613 -0
- package/data/01-standards/redis-complete.md +1527 -0
- package/data/02-playbooks/database-optimization-playbook.md +403 -0
- package/data/02-playbooks/elasticsearch-production-playbook.md +132 -0
- package/data/03-checklists/database-launch-checklist.md +187 -0
- package/data/04-antipatterns/database-antipatterns.md +873 -0
- package/data/05-cases/case-database-migration.md +310 -0
- package/data/06-glossary/database-glossary.md +440 -0
- package/data/data-governance-and-modeling-deep-dive.md +39 -0
- package/data-engineering/01-standards/airflow-complete.md +523 -0
- package/data-engineering/01-standards/kafka-complete.md +1521 -0
- package/data-engineering/02-playbooks/spark-etl-playbook.md +496 -0
- package/data-engineering/03-checklists/pipeline-launch-checklist.md +194 -0
- package/data-engineering/04-antipatterns/data-pipeline-antipatterns.md +684 -0
- package/data-engineering/05-cases/case-real-time-pipeline.md +355 -0
- package/data-engineering/06-glossary/data-engineering-glossary.md +429 -0
- package/database/01-standards/database-schema-standards.md +147 -0
- package/database/02-playbooks/postgresql-optimization-quick.md +52 -0
- package/database/02-playbooks/postgresql-performance-optimization.md +58 -0
- package/database/02-playbooks/postgresql-production-playbook.md +146 -0
- package/database/02-playbooks/redis-caching-playbook.md +117 -0
- package/database/03-checklists/database-review-checklist.md +50 -0
- package/database/04-antipatterns/database-antipatterns.md +112 -0
- package/design/01-standards/ui-design-system-complete.md +423 -0
- package/design/02-playbooks/design-handoff-playbook.md +254 -0
- package/design/02-playbooks/design-review-playbook.md +388 -0
- package/design/03-checklists/design-review-checklist.md +246 -0
- package/design/04-antipatterns/design-antipatterns.md +378 -0
- package/design/05-cases/case-design-system-adoption.md +328 -0
- package/design/06-glossary/design-glossary.md +329 -0
- package/design/ui-full-lifecycle-cross-platform-playbook.md +571 -0
- package/design/ux-system-deep-dive.md +38 -0
- package/design-systems/00-craft-rules.md +71 -0
- package/design-systems/aesthetic-families.md +43 -0
- package/design-systems/anti-ai-slop.md +162 -0
- package/design-systems/bold-geometric.md +120 -0
- package/design-systems/brutalist-bold.md +103 -0
- package/design-systems/editorial-clean.md +109 -0
- package/design-systems/glass-aurora.md +108 -0
- package/design-systems/modern-minimal.md +145 -0
- package/design-systems/premium-luxury.md +106 -0
- package/design-systems/product-type-design-map.md +48 -0
- package/design-systems/soft-warm.md +123 -0
- package/design-systems/tech-utility.md +113 -0
- package/desktop/01-standards/desktop-app-standard.md +72 -0
- package/desktop/01-standards/desktop-design.md +71 -0
- package/development/00-governance/document-template.md +41 -0
- package/development/01-standards/api-versioning-strategies.md +432 -0
- package/development/01-standards/authentication-patterns-complete.md +479 -0
- package/development/01-standards/css-architecture-complete.md +550 -0
- package/development/01-standards/database-migration-strategies.md +484 -0
- package/development/01-standards/elasticsearch-complete.md +347 -0
- package/development/01-standards/git-complete.md +371 -0
- package/development/01-standards/golang-complete.md +1565 -0
- package/development/01-standards/graphql-complete.md +298 -0
- package/development/01-standards/javascript-bundlers-complete.md +469 -0
- package/development/01-standards/javascript-typescript-complete.md +528 -0
- package/development/01-standards/jest-complete.md +275 -0
- package/development/01-standards/linux-complete.md +234 -0
- package/development/01-standards/logging-observability-complete.md +526 -0
- package/development/01-standards/microservices-communication.md +502 -0
- package/development/01-standards/mongodb-complete.md +406 -0
- package/development/01-standards/oauth2-complete.md +285 -0
- package/development/01-standards/performance-optimization-complete.md +289 -0
- package/development/01-standards/playwright-complete.md +247 -0
- package/development/01-standards/postgresql-complete.md +456 -0
- package/development/01-standards/pytest-complete.md +340 -0
- package/development/01-standards/python-async-programming.md +902 -0
- package/development/01-standards/python-complete.md +956 -0
- package/development/01-standards/python-decorators-complete.md +799 -0
- package/development/01-standards/python-design-patterns.md +2854 -0
- package/development/01-standards/python-packaging-distribution.md +420 -0
- package/development/01-standards/python-testing-strategies.md +607 -0
- package/development/01-standards/python-web-frameworks-comparison.md +471 -0
- package/development/01-standards/redis-complete.md +317 -0
- package/development/01-standards/rest-api-complete.md +316 -0
- package/development/01-standards/rust-complete.md +578 -0
- package/development/01-standards/typescript-advanced-types.md +1513 -0
- package/development/01-standards/web-security-complete.md +292 -0
- package/development/02-playbooks/api-design-playbook.md +810 -0
- package/development/02-playbooks/database-migration-playbook.md +580 -0
- package/development/02-playbooks/debugging-playbook.md +692 -0
- package/development/02-playbooks/feature-delivery-playbook.md +430 -0
- package/development/02-playbooks/incident-hotfix-playbook.md +387 -0
- package/development/02-playbooks/performance-optimization-playbook.md +531 -0
- package/development/02-playbooks/performance-tuning-playbook.md +652 -0
- package/development/02-playbooks/refactor-playbook.md +403 -0
- package/development/02-playbooks/release-playbook.md +469 -0
- package/development/03-checklists/architecture-review-checklist.md +168 -0
- package/development/03-checklists/data-migration-checklist.md +157 -0
- package/development/03-checklists/oncall-handover-checklist.md +173 -0
- package/development/03-checklists/pr-checklist.md +158 -0
- package/development/03-checklists/production-readiness-checklist.md +190 -0
- package/development/03-checklists/release-readiness-checklist.md +154 -0
- package/development/03-checklists/security-review-checklist.md +182 -0
- package/development/04-antipatterns/api-antipatterns.md +657 -0
- package/development/04-antipatterns/architecture-antipatterns.md +686 -0
- package/development/04-antipatterns/backend-antipatterns.md +648 -0
- package/development/04-antipatterns/cicd-antipatterns.md +540 -0
- package/development/04-antipatterns/code-smell-antipatterns.md +571 -0
- package/development/04-antipatterns/data-antipatterns.md +658 -0
- package/development/04-antipatterns/database-antipatterns.md +578 -0
- package/development/04-antipatterns/frontend-antipatterns.md +635 -0
- package/development/04-antipatterns/reliability-antipatterns.md +700 -0
- package/development/04-antipatterns/security-antipatterns.md +747 -0
- package/development/05-cases/case-api-version-migration.md +428 -0
- package/development/05-cases/case-authorization-hardening.md +383 -0
- package/development/05-cases/case-bluegreen-rollback.md +466 -0
- package/development/05-cases/case-cache-snowball-protection.md +485 -0
- package/development/05-cases/case-ci-cd-pipeline.md +544 -0
- package/development/05-cases/case-database-scaling.md +500 -0
- package/development/05-cases/case-db-hotspot-optimization.md +487 -0
- package/development/05-cases/case-incident-mttr-reduction.md +563 -0
- package/development/05-cases/case-microservice-migration.md +375 -0
- package/development/05-cases/case-performance-optimization.md +406 -0
- package/development/05-cases/case-security-incident-response.md +345 -0
- package/development/06-glossary/full-stack-glossary.md +166 -0
- package/development/09-maturity/quarterly-audit-template.md +35 -0
- package/development/11-ui-excellence/ui-aesthetic-system.md +41 -0
- package/development/11-ui-excellence/ui-engineering-excellence.md +435 -0
- package/development/12-scenarios/development-scenarios-guide.md +565 -0
- package/development/13-implementation-assets/implementation-toolkit.md +282 -0
- package/development/13-implementation-assets/knowledge-gates-execution.md +43 -0
- package/development/14-full-lifecycle/software-lifecycle-gates.md +511 -0
- package/development/15-lifecycle-templates/project-templates-collection.md +791 -0
- package/development/api-contract-and-versioning-guide.md +36 -0
- package/development/api-governance-complete.md +43 -0
- package/development/backend-engineering-complete.md +43 -0
- package/development/code-review-quality-complete.md +43 -0
- package/development/concurrency-reliability-complete.md +43 -0
- package/development/database-engineering-complete.md +43 -0
- package/development/engineering-effectiveness-complete.md +43 -0
- package/development/engineering-standards-deep-dive.md +38 -0
- package/development/frontend-engineering-complete.md +43 -0
- package/development/performance-capacity-complete.md +43 -0
- package/development/refactor-migration-complete.md +42 -0
- package/development/refactoring-and-techdebt-playbook.md +37 -0
- package/development/security-in-development-complete.md +43 -0
- package/devops/01-standards/cicd-pipeline-complete.md +262 -0
- package/devops/01-standards/docker-complete.md +1490 -0
- package/devops/01-standards/github-actions-complete.md +337 -0
- package/devops/01-standards/kubernetes-complete.md +638 -0
- package/devops/01-standards/terraform-complete.md +2117 -0
- package/devops/02-playbooks/docker-compose-playbook.md +233 -0
- package/devops/02-playbooks/docker-k8s-production-playbook.md +186 -0
- package/devops/02-playbooks/docker-production-playbook.md +952 -0
- package/edge-iot/01-standards/edge-iot-complete.md +473 -0
- package/experts/architect/api-design.md +178 -0
- package/experts/architect/methodology.md +124 -0
- package/experts/architect/security.md +75 -0
- package/experts/backend-lead/methodology.md +216 -0
- package/experts/devops/methodology.md +160 -0
- package/experts/frontend-lead/methodology.md +178 -0
- package/experts/product-manager/industry/ecommerce.md +43 -0
- package/experts/product-manager/industry/saas.md +40 -0
- package/experts/product-manager/methodology.md +97 -0
- package/experts/qa-lead/methodology.md +123 -0
- package/experts/qa-lead/test-strategy.md +128 -0
- package/experts/uiux-designer/methodology.md +125 -0
- package/frontend/01-standards/accessibility-complete.md +532 -0
- package/frontend/01-standards/accessibility-standard.md +74 -0
- package/frontend/01-standards/admin-dashboard-and-crud.md +72 -0
- package/frontend/01-standards/design-tokens-complete.md +444 -0
- package/frontend/01-standards/forms-and-validation.md +77 -0
- package/frontend/01-standards/frontend-architecture-and-layering.md +119 -0
- package/frontend/01-standards/i18n-and-localization.md +65 -0
- package/frontend/01-standards/nextjs-complete.md +451 -0
- package/frontend/01-standards/react-complete.md +713 -0
- package/frontend/01-standards/react-hooks-complete-guide.md +1100 -0
- package/frontend/01-standards/react-hooks-complete.md +1171 -0
- package/frontend/01-standards/seo-and-web-vitals.md +77 -0
- package/frontend/01-standards/state-management-complete.md +444 -0
- package/frontend/01-standards/vue-complete.md +499 -0
- package/frontend/01-standards/vue3-complete.md +2002 -0
- package/frontend/01-standards/web-framework-best-practices.md +64 -0
- package/frontend/01-standards/web-performance-complete.md +495 -0
- package/frontend/02-playbooks/accessibility-a11y-playbook.md +161 -0
- package/frontend/02-playbooks/frontend-performance-playbook.md +707 -0
- package/frontend/02-playbooks/i18n-internationalization-playbook.md +120 -0
- package/frontend/02-playbooks/performance-optimization-playbook.md +163 -0
- package/frontend/02-playbooks/react-nextjs-production-playbook.md +167 -0
- package/frontend/02-playbooks/react-state-management-playbook.md +173 -0
- package/frontend/03-checklists/component-quality-checklist.md +166 -0
- package/frontend/03-checklists/frontend-launch-checklist.md +299 -0
- package/frontend/04-antipatterns/frontend-antipatterns.md +886 -0
- package/frontend/05-cases/case-performance-optimization.md +274 -0
- package/harmony/01-standards/harmonyos-arkts-standard.md +75 -0
- package/harmony/01-standards/harmonyos-design.md +65 -0
- package/high-quality-engineering-playbook.md +54 -0
- package/incident/01-standards/incident-response-complete.md +303 -0
- package/incident/02-playbooks/chaos-engineering-playbook.md +883 -0
- package/incident/02-playbooks/postmortem-playbook.md +398 -0
- package/incident/03-checklists/incident-readiness-checklist.md +181 -0
- package/incident/04-antipatterns/incident-antipatterns.md +490 -0
- package/incident/05-cases/case-cascade-failure.md +176 -0
- package/incident/06-glossary/incident-glossary.md +114 -0
- package/incident/postmortem-and-response-deep-dive.md +39 -0
- package/industries/ecommerce/ecommerce-complete.md +631 -0
- package/industries/education/education-complete.md +555 -0
- package/industries/fintech/fintech-complete.md +501 -0
- package/industries/gaming/gaming-complete.md +587 -0
- package/industries/healthcare/healthcare-complete.md +452 -0
- package/low-code/01-standards/low-code-complete.md +944 -0
- package/miniprogram/01-standards/ai-common-mistakes.md +61 -0
- package/miniprogram/01-standards/miniprogram-custom-navbar-capsule.md +77 -0
- package/miniprogram/01-standards/miniprogram-design.md +61 -0
- package/miniprogram/01-standards/miniprogram-standard.md +81 -0
- package/mobile/01-standards/android-material-design.md +70 -0
- package/mobile/01-standards/flutter-complete.md +384 -0
- package/mobile/01-standards/ios-design-hig.md +78 -0
- package/mobile/01-standards/mobile-app-standard.md +85 -0
- package/mobile/01-standards/react-native-complete.md +352 -0
- package/mobile/02-playbooks/mobile-cross-platform-playbook.md +175 -0
- package/mobile/02-playbooks/mobile-performance.md +473 -0
- package/mobile/03-checklists/mobile-release-checklist.md +234 -0
- package/mobile/04-antipatterns/mobile-antipatterns.md +798 -0
- package/mobile/05-cases/case-app-performance.md +500 -0
- package/mobile/05-cases/case-app-startup-optimization.md +218 -0
- package/mobile/06-glossary/mobile-glossary.md +484 -0
- package/observability/01-standards/observability-standards.md +103 -0
- package/observability/02-playbooks/prometheus-grafana-playbook.md +135 -0
- package/observability/02-playbooks/structured-logging-playbook.md +73 -0
- package/observability/03-checklists/observability-checklist.md +54 -0
- package/observability/04-antipatterns/observability-antipatterns.md +106 -0
- package/operations/01-standards/prometheus-monitoring-complete.md +1578 -0
- package/operations/02-playbooks/capacity-planning-playbook.md +620 -0
- package/operations/03-checklists/production-launch-checklist.md +365 -0
- package/operations/04-antipatterns/operations-antipatterns.md +664 -0
- package/operations/05-cases/case-sre-practices.md +581 -0
- package/operations/06-glossary/operations-glossary.md +120 -0
- package/operations/aiops-anomaly-detection.md +758 -0
- package/operations/capacity-planning.md +1061 -0
- package/operations/chaos-engineering.md +659 -0
- package/operations/incident-command-system.md +38 -0
- package/operations/observability-complete.md +442 -0
- package/operations/slo-sli-playbook.md +517 -0
- package/operations/sre-operations-deep-dive.md +39 -0
- package/package.json +8 -0
- package/performance/01-standards/performance-and-scalability.md +80 -0
- package/performance/01-standards/performance-standards.md +156 -0
- package/performance/02-playbooks/query-optimization-playbook.md +103 -0
- package/performance/03-checklists/performance-checklist.md +56 -0
- package/performance/04-antipatterns/performance-antipatterns.md +146 -0
- package/product/01-standards/product-management-complete.md +285 -0
- package/product/02-playbooks/feature-launch-playbook.md +207 -0
- package/product/02-playbooks/user-research-playbook.md +532 -0
- package/product/03-checklists/feature-launch-checklist.md +275 -0
- package/product/04-antipatterns/product-antipatterns.md +355 -0
- package/product/05-cases/case-mvp-to-scale.md +384 -0
- package/product/06-glossary/product-glossary.md +462 -0
- package/product/feature-prioritization-framework.md +40 -0
- package/product/kpi-and-metric-tree.md +37 -0
- package/product/product-discovery-and-prd-deep-dive.md +41 -0
- package/quantum/01-standards/quantum-complete.md +1186 -0
- package/security/01-standards/api-security-complete.md +511 -0
- package/security/01-standards/container-runtime-security.md +574 -0
- package/security/01-standards/data-protection-gdpr.md +543 -0
- package/security/01-standards/owasp-top10-complete.md +1890 -0
- package/security/01-standards/secure-coding-baseline.md +90 -0
- package/security/01-standards/supply-chain-security.md +441 -0
- package/security/01-standards/web-security-checklist.md +108 -0
- package/security/01-standards/zero-trust-architecture.md +521 -0
- package/security/02-playbooks/auth-sso-playbook.md +166 -0
- package/security/02-playbooks/incident-response-security-playbook.md +588 -0
- package/security/02-playbooks/owasp-api-security-playbook.md +129 -0
- package/security/02-playbooks/payment-integration-playbook.md +119 -0
- package/security/02-playbooks/penetration-testing-playbook.md +517 -0
- package/security/03-checklists/security-audit-checklist.md +356 -0
- package/security/04-antipatterns/security-coding-antipatterns.md +580 -0
- package/security/05-cases/case-log4shell-incident.md +537 -0
- package/security/05-cases/case-major-breaches.md +468 -0
- package/security/06-glossary/security-glossary.md +212 -0
- package/security/compliance-automation.md +993 -0
- package/security/container-security.md +680 -0
- package/security/devsecops-complete.md +426 -0
- package/security/sast-dast-sca.md +775 -0
- package/security/secrets-management.md +594 -0
- package/security/security-architecture-deep-dive.md +37 -0
- package/security/threat-modeling-stride-playbook.md +40 -0
- package/seed-templates/auth-system.md +59 -0
- package/seed-templates/blog-content.md +94 -0
- package/seed-templates/dashboard.md +89 -0
- package/seed-templates/docs-site.md +73 -0
- package/seed-templates/e-commerce.md +50 -0
- package/seed-templates/saas-landing.md +92 -0
- package/seed-templates/settings-page.md +51 -0
- package/testing/01-standards/test-strategy-and-layering.md +83 -0
- package/testing/01-standards/testing-strategy-complete.md +422 -0
- package/testing/01-standards/unit-testing-best-practices.md +118 -0
- package/testing/02-playbooks/e2e-testing-playbook.md +988 -0
- package/testing/02-playbooks/testing-strategy-playbook.md +126 -0
- package/testing/03-checklists/test-strategy-checklist.md +208 -0
- package/testing/04-antipatterns/testing-antipatterns.md +718 -0
- package/testing/05-cases/case-testing-transformation.md +300 -0
- package/testing/06-glossary/testing-glossary.md +110 -0
- package/testing/risk-based-test-matrix.md +36 -0
- package/testing/testing-strategy-deep-dive.md +37 -0
@@ -0,0 +1,1521 @@
---
id: kafka-complete
title: Apache Kafka Complete Guide
domain: data-engineering
category: 01-standards
difficulty: intermediate
tags: [complete, connect, data-engineering, kafka, schema-management, streams, core-concepts, overview]
quality_score: 70
last_updated: 2026-06-15
---
# Apache Kafka Complete Guide
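
## Overview

Apache Kafka is a distributed event streaming platform for high-throughput, low-latency real-time data pipelines and stream processing. It was originally developed at LinkedIn and later donated to the Apache Software Foundation. With its durable log model, horizontal scalability, and fault-tolerant design, Kafka has become core infrastructure in modern data architectures.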
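
### Message queue comparison

| Feature | Kafka | RabbitMQ | Redis Streams | Pulsar |
|---------|-------|----------|---------------|--------|
| Model | Distributed log | AMQP message broker | In-memory stream | Log with tiered storage |
| Typical throughput | Millions of messages/s | Tens of thousands/s | Hundreds of thousands/s | Millions of messages/s |
| Typical latency | Milliseconds | Microseconds | Sub-millisecond | Milliseconds |
| Persistence | Sequential disk writes | Optional persistence | AOF/RDB | BookKeeper |
| Message replay | Yes (by offset) | No | Yes (by ID) | Yes (by MessageID) |
| Consumption model | Pull | Push | Pull / blocking read | Push + pull |
| Protocol | Custom binary protocol | AMQP/MQTT/STOMP | Redis protocol | Custom binary protocol |
| Multi-tenancy | Limited (ACLs) | VHost isolation | No native support | Native multi-tenancy |
| Storage-compute separation | Partial (tiered storage) | No | No | Native |
| Typical use cases | Event streaming / log aggregation / CDC | Task queues / RPC | Lightweight real-time streams | Large-scale multi-tenant streaming |

**Selection guidance**:
- **High-throughput event streaming / log collection / CDC**: choose Kafka
- **Complex routing / task queues / low-latency RPC**: choose RabbitMQ
- **Lightweight real-time streams / an existing Redis ecosystem**: choose Redis Streams
- **Multi-tenancy / storage-compute separation / geo-replication**: choose Pulsar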

## Core concepts

### 1. Broker

A broker is a single server node in a Kafka cluster. It receives, stores, and serves messages.

```
Kafka cluster topology:
┌──────────┐  ┌──────────┐  ┌──────────┐
│ Broker 0 │  │ Broker 1 │  │ Broker 2 │
│ (Leader) │  │(Follower)│  │(Follower)│
│  P0,P3   │  │  P1,P4   │  │  P2,P5   │
└────┬─────┘  └────┬─────┘  └────┬─────┘
     │             │             │
     └─────────────┼─────────────┘
                   │
          ┌────────┴────────┐
          │ ZooKeeper/KRaft │
          └─────────────────┘
```

**Key configuration**:
```properties
# server.properties
# Unique ID of this broker within the cluster
broker.id=0
listeners=PLAINTEXT://0.0.0.0:9092
# Address that clients use to reach this broker
advertised.listeners=PLAINTEXT://kafka-broker-0:9092
log.dirs=/var/kafka-logs
# Defaults applied to automatically created topics
num.partitions=6
default.replication.factor=3
# Minimum in-sync replicas required for an acks=all write to succeed
min.insync.replicas=2
# Retain data for 7 days and roll log segments at 1 GiB
log.retention.hours=168
log.segment.bytes=1073741824
```
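The arithmetic behind these settings can be sketched in a few lines. The helper below is hypothetical (`ReplicationMath` is not part of Kafka), and it assumes producers send with `acks=all`, which this section does not state. Under that assumption, a replication factor of 3 with `min.insync.replicas=2` keeps accepting writes with one replica down.

```java
// Sketch only: ReplicationMath is a hypothetical helper, not part of Kafka.
final class ReplicationMath {
    private ReplicationMath() {}

    /** Replicas that may be unavailable while acks=all writes are still accepted. */
    static int failuresToleratedForWrites(int replicationFactor, int minInsyncReplicas) {
        if (minInsyncReplicas < 1 || minInsyncReplicas > replicationFactor) {
            throw new IllegalArgumentException(
                "expected 1 <= min.insync.replicas <= replication factor");
        }
        return replicationFactor - minInsyncReplicas;
    }

    /** Converts log.retention.hours to whole days (rounded down). */
    static long retentionDays(long retentionHours) {
        return retentionHours / 24;
    }

    /** Converts log.segment.bytes to mebibytes (rounded down). */
    static long segmentMiB(long segmentBytes) {
        return segmentBytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println(failuresToleratedForWrites(3, 2)); // 1
        System.out.println(retentionDays(168));               // 7
        System.out.println(segmentMiB(1_073_741_824L));       // 1024
    }
}
```

Raising `min.insync.replicas` to 3 in this sketch returns 0: any single replica outage would then block `acks=all` writes, which is why 2 is a common pairing with a replication factor of 3.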

### 2. Topic and Partition

A topic is a logical category of messages. A partition is a physical shard of a topic and the basic unit of parallelism in Kafka.

```
Topic: order-events (3 partitions, RF=3)

Partition 0: [msg0, msg3, msg6, msg9, ...]  → Leader: Broker 0
Partition 1: [msg1, msg4, msg7, msg10, ...] → Leader: Broker 1
Partition 2: [msg2, msg5, msg8, msg11, ...] → Leader: Broker 2

Each message has a unique, monotonically increasing offset within its partition:
Partition 0: offset 0 → offset 1 → offset 2 → ...
```

**Topic management**:
```bash
# Create a topic
kafka-topics.sh --bootstrap-server localhost:9092 \
  --create --topic order-events \
  --partitions 6 --replication-factor 3

# List topics
kafka-topics.sh --bootstrap-server localhost:9092 --list

# Describe a topic
kafka-topics.sh --bootstrap-server localhost:9092 \
  --describe --topic order-events

# Change the partition count (it can only be increased, never decreased)
kafka-topics.sh --bootstrap-server localhost:9092 \
  --alter --topic order-events --partitions 12

# Delete a topic
kafka-topics.sh --bootstrap-server localhost:9092 \
  --delete --topic order-events
```
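The partition and offset layout in this section can be mimicked with a toy in-memory model. This is a sketch, not Kafka code: `ToyTopic` and its methods are hypothetical, and messages are placed round-robin only to match the illustration (real producers typically choose a partition from the record key or by batching strategy).

```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model for illustration; ToyTopic is hypothetical and not part of Kafka.
final class ToyTopic {
    private final List<List<String>> partitions = new ArrayList<>();
    private int nextPartition = 0; // round-robin cursor (an assumption of this sketch)

    ToyTopic(int partitionCount) {
        if (partitionCount < 1) {
            throw new IllegalArgumentException("a topic needs at least one partition");
        }
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    /** Appends a message round-robin and returns {partition, offset}. */
    int[] append(String message) {
        int partition = nextPartition;
        nextPartition = (nextPartition + 1) % partitions.size();
        List<String> log = partitions.get(partition);
        log.add(message);
        return new int[] {partition, log.size() - 1}; // offsets grow by one per append
    }

    /** Reads the message stored at the given offset of a partition. */
    String read(int partition, int offset) {
        return partitions.get(partition).get(offset);
    }

    /** Mirrors the rule that the partition count can only be increased. */
    void increasePartitions(int newCount) {
        if (newCount <= partitions.size()) {
            throw new IllegalArgumentException("partition count can only be increased");
        }
        while (partitions.size() < newCount) {
            partitions.add(new ArrayList<>());
        }
    }

    int partitionCount() {
        return partitions.size();
    }

    public static void main(String[] args) {
        ToyTopic topic = new ToyTopic(3);
        for (int i = 0; i < 12; i++) {
            topic.append("msg" + i);
        }
        System.out.println(topic.read(0, 3)); // msg9
    }
}
```

Offsets are scoped to a partition, so `read(0, 3)` and `read(1, 3)` return different messages. Ordering is therefore only guaranteed within a single partition, not across the topic.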

### 3. Consumer Group

A consumer group is a set of consumers that cooperate to consume the same topic. Within a group, each partition is consumed by exactly one consumer, which balances the load across members.

```
Consumer Group: order-processing-group

Topic: order-events (6 partitions)

Consumer A ← P0, P1
Consumer B ← P2, P3
Consumer C ← P4, P5

If Consumer B goes down:
Consumer A ← P0, P1, P2
Consumer C ← P3, P4, P5   (a rebalance is triggered)
```
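The assignment in this example (six partitions spread over three consumers, then over two after one fails) is what a range-style split produces. The sketch below is a simplified, hypothetical model of that idea for a single topic; `RangeAssignmentSketch` is not the Kafka client's assignor, and real assignment also depends on the configured strategy and subscriptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Simplified model for illustration; not the Kafka client's assignor implementation.
final class RangeAssignmentSketch {
    private RangeAssignmentSketch() {}

    /** Splits partitions 0..partitionCount-1 into contiguous ranges, one per consumer. */
    static Map<String, List<Integer>> assign(List<String> consumers, int partitionCount) {
        List<String> sorted = new ArrayList<>(new TreeSet<>(consumers)); // sorted, de-duplicated
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        if (sorted.isEmpty()) {
            return result;
        }
        int base = partitionCount / sorted.size();  // every consumer gets at least this many
        int extra = partitionCount % sorted.size(); // the first `extra` consumers get one more
        int next = 0;
        for (int i = 0; i < sorted.size(); i++) {
            int count = base + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int j = 0; j < count; j++) {
                owned.add(next++);
            }
            result.put(sorted.get(i), owned);
        }
        return result;
    }

    public static void main(String[] args) {
        // Before the failure: three consumers share six partitions.
        System.out.println(assign(Arrays.asList("A", "B", "C"), 6)); // {A=[0, 1], B=[2, 3], C=[4, 5]}
        // After Consumer B leaves, a rebalance recomputes the split.
        System.out.println(assign(Arrays.asList("A", "C"), 6));      // {A=[0, 1, 2], C=[3, 4, 5]}
    }
}
```

With more consumers than partitions, the sketch leaves the surplus consumers with empty lists. This reflects the practical limit that a group cannot use more active consumers than the topic has partitions.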
|
|
128
|
+
|
|
129
|
+
### 4. Offset管理
|
|
130
|
+
|
|
131
|
+
Offset是消息在Partition中的位置标识,Consumer通过Offset追踪消费进度。
|
|
132
|
+
|
|
133
|
+
```
|
|
134
|
+
Partition 0:
|
|
135
|
+
┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
|
|
136
|
+
│ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ 8 │ 9 │
|
|
137
|
+
└───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
|
|
138
|
+
↑ ↑ ↑
|
|
139
|
+
committed current LEO
|
|
140
|
+
offset position (Log End Offset)
|
|
141
|
+
|
|
142
|
+
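committed offset: the last offset the consumer has committed
current position: the offset of the next record the consumer will read
LEO: log end offset (where the next record will be written)
HW (High Watermark): the offset up to which records are replicated to every ISR member; consumers can read only below it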
|
|
146
|
+
```
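
The relationship between these offsets can be made concrete with two small calculations. This is a sketch, not Kafka's implementation: `consumer_lag` and `high_watermark` are hypothetical helper names, and the high watermark is modelled as the minimum log end offset across the ISR.

```python
def consumer_lag(log_end_offset, committed_offset):
    """Records written to the partition but not yet committed by the group."""
    return max(log_end_offset - committed_offset, 0)


def high_watermark(isr_log_end_offsets):
    """Model the high watermark as the smallest LEO among the in-sync replicas."""
    return min(isr_log_end_offsets.values())


# Broker id -> log end offset of its replica (illustrative numbers).
isr_leo = {0: 10, 1: 8, 2: 9}
print(consumer_lag(log_end_offset=10, committed_offset=4))
print(high_watermark(isr_leo))
```

In this example the leader has written up to offset 10, but consumers can only read below offset 8 because Broker 1 has not yet replicated the last two records.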
|
|
147
|
+
|
|
148
|
+
### 5. ISR/Leader/Follower
|
|
149
|
+
|
|
150
|
+
The ISR (In-Sync Replicas) is the set of replicas that are keeping up with the leader.
|
|
151
|
+
|
|
152
|
+
```
|
|
153
|
+
Partition 0 (RF=3):
|
|
154
|
+
  Leader:   Broker 0 (serves reads and writes)
  Follower: Broker 1 (ISR member, in sync)
  Follower: Broker 2 (ISR member, in sync)

A follower that lags for longer than replica.lag.time.max.ms is removed from the ISR:
  ISR: [0, 1, 2] → [0, 1] (Broker 2 removed)

Leader election: a new leader is chosen only from the ISR.
If unclean.leader.election.enable=true, a replica outside the ISR may be elected (data loss is possible).
|
|
163
|
+
```
|
|
164
|
+
|
|
165
|
+
## Producers
|
|
166
|
+
|
|
167
|
+
### 1. Basic producer
|
|
168
|
+
|
|
169
|
+
```java
|
|
170
|
+
// Java producer
|
|
171
|
+
Properties props = new Properties();
|
|
172
|
+
props.put("bootstrap.servers", "kafka1:9092,kafka2:9092,kafka3:9092");
|
|
173
|
+
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
|
|
174
|
+
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
|
|
175
|
+
|
|
176
|
+
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
|
|
177
|
+
|
|
178
|
+
// Asynchronous send
|
|
179
|
+
producer.send(new ProducerRecord<>("order-events", "order-123", orderJson),
|
|
180
|
+
(metadata, exception) -> {
|
|
181
|
+
if (exception != null) {
|
|
182
|
+
log.error("发送失败", exception);
|
|
183
|
+
} else {
|
|
184
|
+
log.info("发送成功: topic={}, partition={}, offset={}",
|
|
185
|
+
metadata.topic(), metadata.partition(), metadata.offset());
|
|
186
|
+
}
|
|
187
|
+
});
|
|
188
|
+
|
|
189
|
+
// Synchronous send (blocks until the broker acknowledges the record)
|
|
190
|
+
RecordMetadata metadata = producer.send(
|
|
191
|
+
new ProducerRecord<>("order-events", "order-123", orderJson)).get();
|
|
192
|
+
|
|
193
|
+
producer.close();
|
|
194
|
+
```
|
|
195
|
+
|
|
196
|
+
```python
|
|
197
|
+
# Python producer (confluent-kafka)
|
|
198
|
+
from confluent_kafka import Producer
|
|
199
|
+
|
|
200
|
+
conf = {
|
|
201
|
+
'bootstrap.servers': 'kafka1:9092,kafka2:9092,kafka3:9092',
|
|
202
|
+
'client.id': 'order-producer',
|
|
203
|
+
'acks': 'all',
|
|
204
|
+
}
|
|
205
|
+
|
|
206
|
+
producer = Producer(conf)
|
|
207
|
+
|
|
208
|
+
def delivery_callback(err, msg):
|
|
209
|
+
if err:
|
|
210
|
+
print(f'发送失败: {err}')
|
|
211
|
+
else:
|
|
212
|
+
print(f'发送成功: topic={msg.topic()}, partition={msg.partition()}, offset={msg.offset()}')
|
|
213
|
+
|
|
214
|
+
producer.produce(
|
|
215
|
+
topic='order-events',
|
|
216
|
+
key='order-123',
|
|
217
|
+
value=order_json.encode('utf-8'),
|
|
218
|
+
callback=delivery_callback
|
|
219
|
+
)
|
|
220
|
+
producer.flush()
|
|
221
|
+
```
|
|
222
|
+
|
|
223
|
+
### 2. Partitioning strategy
|
|
224
|
+
|
|
225
|
+
```java
|
|
226
|
+
// Default partitioning:
// 1. Partition specified explicitly → use it
// 2. Key present → hash(key) % numPartitions (murmur2 hash)
// 3. No key → sticky partitioner (Kafka 2.4+)
|
|
230
|
+
|
|
231
|
+
// Custom partitioner
|
|
232
|
+
public class OrderPartitioner implements Partitioner {
|
|
233
|
+
@Override
|
|
234
|
+
public int partition(String topic, Object key, byte[] keyBytes,
|
|
235
|
+
Object value, byte[] valueBytes, Cluster cluster) {
|
|
236
|
+
List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
|
|
237
|
+
int numPartitions = partitions.size();
|
|
238
|
+
|
|
239
|
+
if (key == null) {
|
|
240
|
+
// No key: pick a random partition (note: random, not round-robin)
|
|
241
|
+
return ThreadLocalRandom.current().nextInt(numPartitions);
|
|
242
|
+
}
|
|
243
|
+
|
|
244
|
+
String orderKey = (String) key;
|
|
245
|
+
// Route VIP orders to a dedicated partition
|
|
246
|
+
if (orderKey.startsWith("VIP-")) {
|
|
247
|
+
return 0;
|
|
248
|
+
}
|
|
249
|
+
// Hash all other orders by key; toPositive avoids the negative value Math.abs returns for Integer.MIN_VALUE
return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
|
|
251
|
+
}
|
|
252
|
+
|
|
253
|
+
@Override
|
|
254
|
+
public void close() {}
|
|
255
|
+
|
|
256
|
+
@Override
|
|
257
|
+
public void configure(Map<String, ?> configs) {}
|
|
258
|
+
}
|
|
259
|
+
|
|
260
|
+
// Register the custom partitioner
|
|
261
|
+
props.put("partitioner.class", "com.example.OrderPartitioner");
|
|
262
|
+
```
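
A key hash in Java is a signed 32-bit `int`, so turning it into a partition index needs care. The sketch below emulates Java's integer arithmetic in Python to show why `Math.abs` is not enough for `Integer.MIN_VALUE`, and why masking the sign bit (the approach taken by Kafka's `Utils.toPositive`) is safer. The helper names (`to_int32`, `java_abs`, `java_mod`, `to_positive`) are illustrative emulations, not Kafka APIs.

```python
INT_MIN = -2**31


def to_int32(n):
    """Wrap an integer into Java's signed 32-bit range."""
    return (n + 2**31) % 2**32 - 2**31


def java_abs(n):
    """Emulate Math.abs on a Java int: the negation overflows for INT_MIN."""
    return to_int32(-n) if n < 0 else n


def java_mod(a, b):
    """Emulate Java's % operator, whose result takes the sign of the dividend."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def to_positive(n):
    """Clear the sign bit, so the result is always in 0..2**31-1."""
    return n & 0x7FFFFFFF


num_partitions = 6
print(java_mod(java_abs(INT_MIN), num_partitions))     # negative index
print(java_mod(to_positive(INT_MIN), num_partitions))  # valid index
```

A negative partition index would make the send fail, and only for the rare keys whose hash is exactly `Integer.MIN_VALUE`, which makes the bug hard to reproduce.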
|
|
263
|
+
|
|
264
|
+
### 3. Idempotence and exactly-once semantics
|
|
265
|
+
|
|
266
|
+
```properties
|
|
267
|
+
# Idempotent producer settings (prevent duplicates caused by retries)
|
|
268
|
+
enable.idempotence=true
|
|
269
|
+
acks=all
|
|
270
|
+
retries=2147483647
|
|
271
|
+
max.in.flight.requests.per.connection=5
|
|
272
|
+
```
|
|
273
|
+
|
|
274
|
+
```java
|
|
275
|
+
// 事务性生产者(跨分区Exactly-Once)
|
|
276
|
+
props.put("enable.idempotence", "true");
|
|
277
|
+
props.put("transactional.id", "order-tx-producer-1");
|
|
278
|
+
|
|
279
|
+
KafkaProducer<String, String> producer = new KafkaProducer<>(props);
|
|
280
|
+
producer.initTransactions();
|
|
281
|
+
|
|
282
|
+
try {
|
|
283
|
+
producer.beginTransaction();
|
|
284
|
+
|
|
285
|
+
// 发送到多个Topic/Partition(原子操作)
|
|
286
|
+
producer.send(new ProducerRecord<>("order-events", orderKey, orderJson));
|
|
287
|
+
producer.send(new ProducerRecord<>("inventory-events", skuKey, inventoryJson));
|
|
288
|
+
producer.send(new ProducerRecord<>("payment-events", paymentKey, paymentJson));
|
|
289
|
+
|
|
290
|
+
// 提交消费位移(消费-转换-生产模式)
|
|
291
|
+
producer.sendOffsetsToTransaction(offsets, consumerGroupMetadata);
|
|
292
|
+
|
|
293
|
+
producer.commitTransaction();
|
|
294
|
+
} catch (ProducerFencedException | OutOfOrderSequenceException e) {
|
|
295
|
+
producer.close(); // 不可恢复的错误
|
|
296
|
+
} catch (KafkaException e) {
|
|
297
|
+
producer.abortTransaction(); // 可恢复的错误,回滚
|
|
298
|
+
}
|
|
299
|
+
```
|
|
300
|
+
|
|
301
|
+
### 4. Batching and compression
|
|
302
|
+
|
|
303
|
+
```properties
|
|
304
|
+
# Batching
# Batch size in bytes (default 16384)
batch.size=65536
# Time to wait for a batch to fill, in ms (default 0)
linger.ms=20
# Total producer buffer size (64 MB)
buffer.memory=67108864

# Compression: none, gzip, snappy, lz4, or zstd
compression.type=lz4
# Trade-offs:
#   gzip:   highest compression ratio, highest CPU cost
#   snappy: moderate ratio, low CPU cost
#   lz4:    moderate ratio, fastest
#   zstd:   high ratio, fast (available since Kafka 2.1)
|
|
316
|
+
```
|
|
317
|
+
|
|
318
|
+
### 5. acks and reliability
|
|
319
|
+
|
|
320
|
+
```properties
|
|
321
|
+
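# acks=0: do not wait for any acknowledgement (fastest, may lose data)
# acks=1: wait for the leader only (the default before Kafka 3.0; data can be lost if the leader fails)
# acks=all/-1: wait for all in-sync replicas (safest, and the default since Kafka 3.0; pair with min.insync.replicas)
acks=all

# Recommended combination for reliability
# (acks is a producer setting; min.insync.replicas and the replication factor are topic/broker settings)
acks=all
min.insync.replicas=2
replication.factor=3
# Guarantee: acknowledged writes survive the loss of one broker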
|
|
331
|
+
```
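
The guarantee behind this combination is simple arithmetic. The following sketch uses two hypothetical helpers, `tolerated_broker_failures` and `can_accept_write`, and assumes each replica of a partition lives on a different broker.

```python
def tolerated_broker_failures(replication_factor, min_insync_replicas):
    """Replica failures a partition can absorb while still accepting acks=all writes."""
    if not 1 <= min_insync_replicas <= replication_factor:
        raise ValueError("min.insync.replicas must be between 1 and the replication factor")
    return replication_factor - min_insync_replicas


def can_accept_write(isr_size, min_insync_replicas):
    """With acks=all, a write is rejected when the ISR is smaller than min.insync.replicas."""
    return isr_size >= min_insync_replicas


print(tolerated_broker_failures(3, 2))
print(can_accept_write(isr_size=2, min_insync_replicas=2))
print(can_accept_write(isr_size=1, min_insync_replicas=2))
```

Setting `min.insync.replicas` equal to the replication factor would leave no room for failures: a single broker outage would block all `acks=all` writes to the affected partitions.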
|
|
332
|
+
|
|
333
|
+
## Consumers
|
|
334
|
+
|
|
335
|
+
### 1. Basic consumer
|
|
336
|
+
|
|
337
|
+
```java
|
|
338
|
+
// Java consumer
|
|
339
|
+
Properties props = new Properties();
|
|
340
|
+
props.put("bootstrap.servers", "kafka1:9092,kafka2:9092,kafka3:9092");
|
|
341
|
+
props.put("group.id", "order-processing-group");
|
|
342
|
+
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
|
|
343
|
+
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
|
|
344
|
+
props.put("auto.offset.reset", "earliest");
|
|
345
|
+
|
|
346
|
+
KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
|
|
347
|
+
consumer.subscribe(Arrays.asList("order-events"));
|
|
348
|
+
|
|
349
|
+
try {
|
|
350
|
+
while (true) {
|
|
351
|
+
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
|
|
352
|
+
for (ConsumerRecord<String, String> record : records) {
|
|
353
|
+
log.info("消费: topic={}, partition={}, offset={}, key={}, value={}",
|
|
354
|
+
record.topic(), record.partition(), record.offset(),
|
|
355
|
+
record.key(), record.value());
|
|
356
|
+
processOrder(record.value());
|
|
357
|
+
}
|
|
358
|
+
}
|
|
359
|
+
} finally {
|
|
360
|
+
consumer.close();
|
|
361
|
+
}
|
|
362
|
+
```
|
|
363
|
+
|
|
364
|
+
```python
|
|
365
|
+
# Python consumer (confluent-kafka)
|
|
366
|
+
from confluent_kafka import Consumer
|
|
367
|
+
|
|
368
|
+
conf = {
|
|
369
|
+
'bootstrap.servers': 'kafka1:9092,kafka2:9092,kafka3:9092',
|
|
370
|
+
'group.id': 'order-processing-group',
|
|
371
|
+
'auto.offset.reset': 'earliest',
|
|
372
|
+
'enable.auto.commit': False,
|
|
373
|
+
}
|
|
374
|
+
|
|
375
|
+
consumer = Consumer(conf)
|
|
376
|
+
consumer.subscribe(['order-events'])
|
|
377
|
+
|
|
378
|
+
try:
|
|
379
|
+
while True:
|
|
380
|
+
msg = consumer.poll(timeout=1.0)
|
|
381
|
+
if msg is None:
|
|
382
|
+
continue
|
|
383
|
+
if msg.error():
|
|
384
|
+
print(f'消费错误: {msg.error()}')
|
|
385
|
+
continue
|
|
386
|
+
|
|
387
|
+
process_order(msg.value().decode('utf-8'))
|
|
388
|
+
consumer.commit(asynchronous=False)
|
|
389
|
+
finally:
|
|
390
|
+
consumer.close()
|
|
391
|
+
```
|
|
392
|
+
|
|
393
|
+
### 2. Automatic vs manual commits
|
|
394
|
+
|
|
395
|
+
```java
|
|
396
|
+
// 自动提交(简单但可能重复消费或丢失)
|
|
397
|
+
props.put("enable.auto.commit", "true");
|
|
398
|
+
props.put("auto.commit.interval.ms", "5000");
|
|
399
|
+
|
|
400
|
+
// 手动同步提交(逐条)
|
|
401
|
+
props.put("enable.auto.commit", "false");
|
|
402
|
+
|
|
403
|
+
for (ConsumerRecord<String, String> record : records) {
|
|
404
|
+
processOrder(record.value());
|
|
405
|
+
// 处理完一条提交一次(性能差但最安全)
|
|
406
|
+
consumer.commitSync(Collections.singletonMap(
|
|
407
|
+
new TopicPartition(record.topic(), record.partition()),
|
|
408
|
+
new OffsetAndMetadata(record.offset() + 1)
|
|
409
|
+
));
|
|
410
|
+
}
|
|
411
|
+
|
|
412
|
+
// 手动同步提交(批次)
|
|
413
|
+
for (ConsumerRecord<String, String> record : records) {
|
|
414
|
+
processOrder(record.value());
|
|
415
|
+
}
|
|
416
|
+
consumer.commitSync(); // 处理完一批再提交
|
|
417
|
+
|
|
418
|
+
// 手动异步提交(高性能)
|
|
419
|
+
consumer.commitAsync((offsets, exception) -> {
|
|
420
|
+
if (exception != null) {
|
|
421
|
+
log.error("提交失败: {}", offsets, exception);
|
|
422
|
+
}
|
|
423
|
+
});
|
|
424
|
+
|
|
425
|
+
// 最佳实践: 异步+同步混合
|
|
426
|
+
try {
|
|
427
|
+
while (true) {
|
|
428
|
+
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
|
|
429
|
+
for (ConsumerRecord<String, String> record : records) {
|
|
430
|
+
processOrder(record.value());
|
|
431
|
+
}
|
|
432
|
+
consumer.commitAsync(); // 正常用异步
|
|
433
|
+
}
|
|
434
|
+
} catch (Exception e) {
|
|
435
|
+
log.error("消费异常", e);
|
|
436
|
+
} finally {
|
|
437
|
+
consumer.commitSync(); // 关闭前用同步确保提交
|
|
438
|
+
consumer.close();
|
|
439
|
+
}
|
|
440
|
+
```
|
|
441
|
+
|
|
442
|
+
### 3. Rebalance strategy
|
|
443
|
+
|
|
444
|
+
```java
|
|
445
|
+
// Rebalance triggers:
// 1. A consumer joins or leaves the group
// 2. The partition count of a subscribed topic changes
// 3. A consumer misses heartbeats for longer than session.timeout.ms
// 4. A consumer goes longer than max.poll.interval.ms between polls

// Key settings
props.put("session.timeout.ms", "30000");     // heartbeat (session) timeout
props.put("heartbeat.interval.ms", "10000");  // heartbeat interval (about 1/3 of session.timeout.ms)
props.put("max.poll.interval.ms", "300000");  // maximum time between two poll() calls
props.put("max.poll.records", "500");         // maximum records returned by one poll()

// Partition assignment strategy
props.put("partition.assignment.strategy",
    "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
// Available strategies:
// RangeAssignor: assigns contiguous ranges per topic (default); can be uneven
// RoundRobinAssignor: round-robin assignment, more even
// StickyAssignor: sticky assignment, keeps existing assignments where possible
// CooperativeStickyAssignor: incremental cooperative rebalancing (recommended; avoids stop-the-world pauses)
|
|
465
|
+
|
|
466
|
+
// Rebalance监听器(用于保存中间状态)
|
|
467
|
+
consumer.subscribe(Arrays.asList("order-events"), new ConsumerRebalanceListener() {
|
|
468
|
+
@Override
|
|
469
|
+
public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
|
|
470
|
+
// 分区被回收前: 提交当前offset, 保存处理状态
|
|
471
|
+
consumer.commitSync();
|
|
472
|
+
log.info("分区被回收: {}", partitions);
|
|
473
|
+
}
|
|
474
|
+
|
|
475
|
+
@Override
|
|
476
|
+
public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
|
|
477
|
+
// 分区被分配后: 恢复处理状态
|
|
478
|
+
log.info("分区被分配: {}", partitions);
|
|
479
|
+
}
|
|
480
|
+
});
|
|
481
|
+
```
|
|
482
|
+
|
|
483
|
+
### 4. Choosing where to start consuming
|
|
484
|
+
|
|
485
|
+
```java
|
|
486
|
+
// 从指定Offset消费
|
|
487
|
+
consumer.assign(Arrays.asList(new TopicPartition("order-events", 0)));
|
|
488
|
+
consumer.seek(new TopicPartition("order-events", 0), 1000L);
|
|
489
|
+
|
|
490
|
+
// 从指定时间戳消费
|
|
491
|
+
Map<TopicPartition, Long> timestamps = new HashMap<>();
|
|
492
|
+
timestamps.put(new TopicPartition("order-events", 0),
|
|
493
|
+
Instant.parse("2026-03-01T00:00:00Z").toEpochMilli());
|
|
494
|
+
Map<TopicPartition, OffsetAndTimestamp> offsets =
|
|
495
|
+
consumer.offsetsForTimes(timestamps);
|
|
496
|
+
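for (Map.Entry<TopicPartition, OffsetAndTimestamp> entry : offsets.entrySet()) {
    // offsetsForTimes maps a partition to null when it has no record at or after the timestamp
    if (entry.getValue() != null) {
        consumer.seek(entry.getKey(), entry.getValue().offset());
    }
}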
|
|
499
|
+
|
|
500
|
+
// 从头消费
|
|
501
|
+
consumer.seekToBeginning(consumer.assignment());
|
|
502
|
+
|
|
503
|
+
// 从末尾消费
|
|
504
|
+
consumer.seekToEnd(consumer.assignment());
|
|
505
|
+
```
|
|
506
|
+
|
|
507
|
+
## Cluster management
|
|
508
|
+
|
|
509
|
+
### 1. Replica and ISR settings
|
|
510
|
+
|
|
511
|
+
```properties
|
|
512
|
+
# Broker settings
# Default replication factor for automatically created topics
default.replication.factor=3
# Minimum number of in-sync replicas required for acks=all writes
min.insync.replicas=2
# Maximum time a follower may lag before it is removed from the ISR
replica.lag.time.max.ms=30000
# Disallow unclean leader election (prevents data loss)
unclean.leader.election.enable=false

# Topic-level override (a shell command, not a properties entry):
# kafka-configs.sh --bootstrap-server localhost:9092 \
#   --entity-type topics --entity-name order-events \
#   --alter --add-config min.insync.replicas=2
|
|
522
|
+
```
|
|
523
|
+
|
|
524
|
+
### 2. The controller role
|
|
525
|
+
|
|
526
|
+
```
|
|
527
|
+
Controller responsibilities:
1. Partition leader election
2. Replica state management
3. Topic creation and deletion
4. Handling brokers joining and leaving the cluster

ZooKeeper mode:
  One broker in the cluster acts as the controller
  The controller is elected through an ephemeral ZooKeeper node
  The controller writes metadata to ZooKeeper

KRaft mode (Kafka 3.3+, recommended):
  No ZooKeeper dependency
  Controller election uses the Raft protocol
  Metadata is stored in Kafka's own __cluster_metadata topic
|
|
542
|
+
```
|
|
543
|
+
|
|
544
|
+
### 3. ZooKeeper vs KRaft
|
|
545
|
+
|
|
546
|
+
```properties
|
|
547
|
+
# ZooKeeper mode (legacy)
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181/kafka
zookeeper.connection.timeout.ms=18000

# KRaft mode (Kafka 3.3+, recommended)
# process.roles may also be just "broker" or just "controller"
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@kafka1:9093,2@kafka2:9093,3@kafka3:9093
controller.listener.names=CONTROLLER
listeners=PLAINTEXT://:9092,CONTROLLER://:9093

# KRaft advantages:
# 1. No ZooKeeper dependency, so operations are simpler
# 2. Faster controller failover (standby controllers already hold the metadata)
# 3. Supports far more partitions per cluster (on the order of millions)
# 4. More efficient metadata propagation
|
|
563
|
+
```
|
|
564
|
+
|
|
565
|
+
**KRaft setup and migration steps**:
|
|
566
|
+
```bash
|
|
567
|
+
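# 1. Generate a cluster ID
kafka-storage.sh random-uuid

# 2. Format the storage directories
kafka-storage.sh format -t <cluster-id> -c config/kraft/server.properties

# 3. Start the KRaft node
kafka-server-start.sh config/kraft/server.properties

# 4. Migrating an existing ZooKeeper cluster (online migration, Kafka 3.6+)
#    Steps 1-3 create a new KRaft cluster. Migrating a ZooKeeper-based cluster is a
#    rolling, configuration-driven procedure: a KRaft controller quorum is provisioned
#    with zookeeper.metadata.migration.enable=true, then brokers are restarted into
#    KRaft mode. There is no single migration command; follow the official migration
#    guide for your Kafka version.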
|
|
579
|
+
```
|
|
580
|
+
|
|
581
|
+
## Schema management
|
|
582
|
+
|
|
583
|
+
### 1. Schema Registry
|
|
584
|
+
|
|
585
|
+
```
|
|
586
|
+
Producer → Schema Registry → Kafka Broker → Schema Registry → Consumer
      (register schema)   (store messages)   (fetch schema)

Schema Registry storage: _schemas (internal Kafka topic)
Compatibility check: on registration, a new schema is validated against the existing versions
|
|
591
|
+
```
|
|
592
|
+
|
|
593
|
+
### 2. Avro Schema
|
|
594
|
+
|
|
595
|
+
```json
|
|
596
|
+
{
|
|
597
|
+
"type": "record",
|
|
598
|
+
"name": "OrderEvent",
|
|
599
|
+
"namespace": "com.example.events",
|
|
600
|
+
"fields": [
|
|
601
|
+
{"name": "orderId", "type": "string"},
|
|
602
|
+
{"name": "userId", "type": "string"},
|
|
603
|
+
{"name": "amount", "type": "double"},
|
|
604
|
+
{"name": "currency", "type": "string", "default": "CNY"},
|
|
605
|
+
{"name": "status", "type": {
|
|
606
|
+
"type": "enum",
|
|
607
|
+
"name": "OrderStatus",
|
|
608
|
+
"symbols": ["CREATED", "PAID", "SHIPPED", "DELIVERED", "CANCELLED"]
|
|
609
|
+
}},
|
|
610
|
+
{"name": "items", "type": {
|
|
611
|
+
"type": "array",
|
|
612
|
+
"items": {
|
|
613
|
+
"type": "record",
|
|
614
|
+
"name": "OrderItem",
|
|
615
|
+
"fields": [
|
|
616
|
+
{"name": "skuId", "type": "string"},
|
|
617
|
+
{"name": "quantity", "type": "int"},
|
|
618
|
+
{"name": "price", "type": "double"}
|
|
619
|
+
]
|
|
620
|
+
}
|
|
621
|
+
}},
|
|
622
|
+
{"name": "createdAt", "type": {"type": "long", "logicalType": "timestamp-millis"}},
|
|
623
|
+
{"name": "metadata", "type": ["null", {"type": "map", "values": "string"}], "default": null}
|
|
624
|
+
]
|
|
625
|
+
}
|
|
626
|
+
```
|
|
627
|
+
|
|
628
|
+
### 3. Protobuf Schema
|
|
629
|
+
|
|
630
|
+
```protobuf
|
|
631
|
+
syntax = "proto3";
|
|
632
|
+
package com.example.events;
|
|
633
|
+
|
|
634
|
+
message OrderEvent {
|
|
635
|
+
string order_id = 1;
|
|
636
|
+
string user_id = 2;
|
|
637
|
+
double amount = 3;
|
|
638
|
+
string currency = 4;
|
|
639
|
+
OrderStatus status = 5;
|
|
640
|
+
repeated OrderItem items = 6;
|
|
641
|
+
int64 created_at = 7;
|
|
642
|
+
map<string, string> metadata = 8;
|
|
643
|
+
|
|
644
|
+
enum OrderStatus {
|
|
645
|
+
CREATED = 0;
|
|
646
|
+
PAID = 1;
|
|
647
|
+
SHIPPED = 2;
|
|
648
|
+
DELIVERED = 3;
|
|
649
|
+
CANCELLED = 4;
|
|
650
|
+
}
|
|
651
|
+
|
|
652
|
+
message OrderItem {
|
|
653
|
+
string sku_id = 1;
|
|
654
|
+
int32 quantity = 2;
|
|
655
|
+
double price = 3;
|
|
656
|
+
}
|
|
657
|
+
}
|
|
658
|
+
```
|
|
659
|
+
|
|
660
|
+
### 4. Compatibility strategy
|
|
661
|
+
|
|
662
|
+
```
|
|
663
|
+
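Compatibility levels:
┌─────────────────────┬──────────────────────────────────────────────────────────────┐
│ BACKWARD            │ New schema can read data from the previous version (default) │
│ BACKWARD_TRANSITIVE │ New schema can read data from all earlier versions           │
│ FORWARD             │ Previous schema can read data from the new version           │
│ FORWARD_TRANSITIVE  │ All earlier schemas can read data from the new version       │
│ FULL                │ Backward and forward compatible with the previous version    │
│ FULL_TRANSITIVE     │ Backward and forward compatible with all versions            │
│ NONE                │ No compatibility checks (not recommended in production)      │
└─────────────────────┴──────────────────────────────────────────────────────────────┘

Safe and unsafe schema evolution operations:
✅ Add a field with a default value (BACKWARD compatible)
✅ Remove a field that has a default value (FORWARD compatible)
✅ Add an optional field (FULL compatible)
❌ Remove a required field (breaks FORWARD: old readers still expect it)
❌ Change a field's type (breaks compatibility unless it is an allowed type promotion)
❌ Rename a field (breaks compatibility unless an alias is declared)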
|
|
681
|
+
```
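
The rules for adding and removing fields can be checked mechanically. The sketch below is a deliberately reduced model, not the Schema Registry's algorithm: a schema is a dict mapping field name to "has a default", and the helpers `is_backward` and `is_forward` are hypothetical. It ignores type changes, aliases, and nested records.

```python
def _unreadable_fields(reader, writer):
    """Fields the reader needs that the writer did not write and that have no default."""
    return [name for name, has_default in reader.items()
            if name not in writer and not has_default]


def is_backward(new, old):
    """BACKWARD: the new schema (reader) can read data written with the old schema."""
    return not _unreadable_fields(reader=new, writer=old)


def is_forward(new, old):
    """FORWARD: the old schema (reader) can read data written with the new schema."""
    return not _unreadable_fields(reader=old, writer=new)


old = {"orderId": False, "amount": False}                        # no defaults
add_default = {"orderId": False, "amount": False, "currency": True}
drop_required = {"orderId": False}

print(is_backward(add_default, old), is_forward(add_default, old))
print(is_backward(drop_required, old), is_forward(drop_required, old))
```

In this model, adding a field with a default passes both checks, while dropping a required field still lets the new schema read old data but leaves old readers without a value for `amount`.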
|
|
682
|
+
|
|
683
|
+
```bash
|
|
684
|
+
# Schema Registry API
|
|
685
|
+
# Register a schema
|
|
686
|
+
curl -X POST http://schema-registry:8081/subjects/order-events-value/versions \
|
|
687
|
+
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
|
|
688
|
+
-d '{"schema": "{\"type\":\"record\",\"name\":\"OrderEvent\",...}"}'
|
|
689
|
+
|
|
690
|
+
# Show the subject's compatibility level
|
|
691
|
+
curl http://schema-registry:8081/config/order-events-value
|
|
692
|
+
|
|
693
|
+
# Set the compatibility level
|
|
694
|
+
curl -X PUT http://schema-registry:8081/config/order-events-value \
|
|
695
|
+
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
|
|
696
|
+
-d '{"compatibility": "FULL_TRANSITIVE"}'
|
|
697
|
+
|
|
698
|
+
# Test a schema against the latest registered version
|
|
699
|
+
curl -X POST http://schema-registry:8081/compatibility/subjects/order-events-value/versions/latest \
|
|
700
|
+
-H "Content-Type: application/vnd.schemaregistry.v1+json" \
|
|
701
|
+
-d '{"schema": "{...}"}'
|
|
702
|
+
```
|
|
703
|
+
|
|
704
|
+
## Kafka Streams
|
|
705
|
+
|
|
706
|
+
### 1. KStream and KTable
|
|
707
|
+
|
|
708
|
+
```java
|
|
709
|
+
// KStream: an unbounded event stream (each record is an independent event)
// KTable: a changelog stream (only the latest value per key is kept, like a database table)
|
|
711
|
+
|
|
712
|
+
StreamsBuilder builder = new StreamsBuilder();
|
|
713
|
+
|
|
714
|
+
// KStream: 订单事件流
|
|
715
|
+
KStream<String, OrderEvent> orderStream = builder.stream("order-events",
|
|
716
|
+
Consumed.with(Serdes.String(), orderEventSerde));
|
|
717
|
+
|
|
718
|
+
// KTable: 用户信息表(从compacted topic读取)
|
|
719
|
+
KTable<String, UserInfo> userTable = builder.table("user-info",
|
|
720
|
+
Materialized.as("user-info-store"));
|
|
721
|
+
|
|
722
|
+
// 流处理: 过滤 + 转换
|
|
723
|
+
KStream<String, EnrichedOrder> enrichedOrders = orderStream
|
|
724
|
+
.filter((key, order) -> order.getAmount() > 0)
|
|
725
|
+
.mapValues(order -> EnrichedOrder.from(order))
|
|
726
|
+
.selectKey((key, order) -> order.getUserId());
|
|
727
|
+
|
|
728
|
+
// Stream-Table Join(用户信息关联)
|
|
729
|
+
KStream<String, OrderWithUser> ordersWithUser = enrichedOrders.join(
|
|
730
|
+
userTable,
|
|
731
|
+
(order, user) -> new OrderWithUser(order, user)
|
|
732
|
+
);
|
|
733
|
+
|
|
734
|
+
ordersWithUser.to("enriched-order-events",
|
|
735
|
+
    Produced.with(Serdes.String(), orderWithUserSerde)); // the stream's values are OrderWithUser, so the serde must match that type
|
|
736
|
+
```
|
|
737
|
+
|
|
738
|
+
### 2. Windowed operations
|
|
739
|
+
|
|
740
|
+
```java
|
|
741
|
+
// Tumbling window: fixed size, no overlap
|
|
742
|
+
KTable<Windowed<String>, Long> tumblingCounts = orderStream
|
|
743
|
+
.groupByKey()
|
|
744
|
+
.windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
|
|
745
|
+
.count(Materialized.as("tumbling-counts"));
|
|
746
|
+
|
|
747
|
+
// Hopping window: fixed size, overlapping
|
|
748
|
+
KTable<Windowed<String>, Long> hoppingCounts = orderStream
|
|
749
|
+
.groupByKey()
|
|
750
|
+
.windowedBy(TimeWindows.ofSizeAndGrace(
|
|
751
|
+
Duration.ofMinutes(5),
|
|
752
|
+
Duration.ofMinutes(1))
|
|
753
|
+
.advanceBy(Duration.ofMinutes(1)))
|
|
754
|
+
.count(Materialized.as("hopping-counts"));
|
|
755
|
+
|
|
756
|
+
// Sliding window: defined by the maximum time difference between records
|
|
757
|
+
KTable<Windowed<String>, Long> slidingCounts = orderStream
|
|
758
|
+
.groupByKey()
|
|
759
|
+
.windowedBy(SlidingWindows.ofTimeDifferenceAndGrace(
|
|
760
|
+
Duration.ofMinutes(5),
|
|
761
|
+
Duration.ofMinutes(1)))
|
|
762
|
+
.count(Materialized.as("sliding-counts"));
|
|
763
|
+
|
|
764
|
+
// Session window: closed by a gap of inactivity
|
|
765
|
+
KTable<Windowed<String>, Long> sessionCounts = orderStream
|
|
766
|
+
.groupByKey()
|
|
767
|
+
.windowedBy(SessionWindows.ofInactivityGapAndGrace(
|
|
768
|
+
Duration.ofMinutes(30),
|
|
769
|
+
Duration.ofMinutes(5)))
|
|
770
|
+
.count(Materialized.as("session-counts"));
|
|
771
|
+
```
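
To see which windows a record falls into, the window boundaries can be computed by hand. The sketch below assumes epoch-aligned windows with an inclusive start and an exclusive end; `tumbling_window` and `hopping_windows` are hypothetical helpers, not part of the Kafka Streams API.

```python
MINUTE_MS = 60_000


def tumbling_window(ts_ms, size_ms):
    """Return the single [start, end) window that contains the timestamp."""
    start = ts_ms - ts_ms % size_ms
    return (start, start + size_ms)


def hopping_windows(ts_ms, size_ms, advance_ms):
    """Return every [start, end) window that contains the timestamp."""
    start = ts_ms - ts_ms % advance_ms  # latest window start at or before ts
    windows = []
    while start > ts_ms - size_ms and start >= 0:
        windows.append((start, start + size_ms))
        start -= advance_ms
    return sorted(windows)


ts = 7 * MINUTE_MS + 30_000  # a record at 00:07:30 after the epoch
print(tumbling_window(ts, 5 * MINUTE_MS))
for window in hopping_windows(ts, 5 * MINUTE_MS, 1 * MINUTE_MS):
    print(window)
```

With a 5-minute size and a 1-minute advance, one record contributes to five overlapping windows, so hopping aggregations write roughly size/advance times as many updates as tumbling ones.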
|
|
772
|
+
|
|
773
|
+
### 3. State stores
|
|
774
|
+
|
|
775
|
+
```java
|
|
776
|
+
// Kafka Streams uses RocksDB as its default local state store
// State stores are backed up to a changelog topic for fault tolerance
|
|
778
|
+
|
|
779
|
+
// 自定义状态存储
|
|
780
|
+
StoreBuilder<KeyValueStore<String, OrderAggregate>> storeBuilder =
|
|
781
|
+
Stores.keyValueStoreBuilder(
|
|
782
|
+
Stores.persistentKeyValueStore("order-aggregate-store"),
|
|
783
|
+
Serdes.String(),
|
|
784
|
+
orderAggregateSerde
|
|
785
|
+
).withCachingEnabled()
|
|
786
|
+
.withLoggingEnabled(new HashMap<>()); // 启用changelog
|
|
787
|
+
|
|
788
|
+
builder.addStateStore(storeBuilder);
|
|
789
|
+
|
|
790
|
+
// 在Processor中使用状态存储
|
|
791
|
+
orderStream.process(() -> new Processor<String, OrderEvent, String, OrderAggregate>() {
    private ProcessorContext<String, OrderAggregate> context;
    private KeyValueStore<String, OrderAggregate> store;

    @Override
    public void init(ProcessorContext<String, OrderAggregate> context) {
        // Keep the context: the Processor interface has no context() accessor
        this.context = context;
        store = context.getStateStore("order-aggregate-store");
    }

    @Override
    public void process(Record<String, OrderEvent> record) {
        OrderAggregate agg = store.get(record.key());
        if (agg == null) agg = new OrderAggregate();
        agg.add(record.value());
        store.put(record.key(), agg);
        context.forward(record.withValue(agg));
    }
}, "order-aggregate-store");
|
|
808
|
+
```
|
|
809
|
+
|
|
810
|
+
## Kafka Connect
|
|
811
|
+
|
|
812
|
+
### 1. Source connector (data import)
|
|
813
|
+
|
|
814
|
+
```json
|
|
815
|
+
{
|
|
816
|
+
"name": "mysql-source-connector",
|
|
817
|
+
"config": {
|
|
818
|
+
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
|
|
819
|
+
"tasks.max": "1",
|
|
820
|
+
"database.hostname": "mysql-host",
|
|
821
|
+
"database.port": "3306",
|
|
822
|
+
"database.user": "debezium",
|
|
823
|
+
"database.password": "${env:MYSQL_PASSWORD}",
|
|
824
|
+
"database.server.id": "184054",
|
|
825
|
+
"topic.prefix": "cdc-mysql",
|
|
826
|
+
"database.include.list": "ecommerce",
|
|
827
|
+
"table.include.list": "ecommerce.orders,ecommerce.users",
|
|
828
|
+
"schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
|
|
829
|
+
"schema.history.internal.kafka.topic": "schema-history.ecommerce",
|
|
830
|
+
"include.schema.changes": "true",
|
|
831
|
+
"snapshot.mode": "initial",
|
|
832
|
+
"transforms": "route",
|
|
833
|
+
"transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
|
|
834
|
+
"transforms.route.regex": "cdc-mysql\\.ecommerce\\.(.*)",
|
|
835
|
+
"transforms.route.replacement": "cdc.$1"
|
|
836
|
+
}
|
|
837
|
+
}
|
|
838
|
+
```
|
|
839
|
+
|
|
840
|
+
### 2. Sink connector (data export)
|
|
841
|
+
|
|
842
|
+
```json
|
|
843
|
+
{
|
|
844
|
+
"name": "elasticsearch-sink-connector",
|
|
845
|
+
"config": {
|
|
846
|
+
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
|
|
847
|
+
"tasks.max": "3",
|
|
848
|
+
"topics": "order-events,user-events",
|
|
849
|
+
"connection.url": "http://elasticsearch:9200",
|
|
850
|
+
"type.name": "_doc",
|
|
851
|
+
"key.ignore": "false",
|
|
852
|
+
"schema.ignore": "true",
|
|
853
|
+
"behavior.on.null.values": "delete",
|
|
854
|
+
"write.method": "upsert",
|
|
855
|
+
"transforms": "extractKey,timestampRouter",
|
|
856
|
+
"transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
|
|
857
|
+
"transforms.extractKey.field": "id",
|
|
858
|
+
"transforms.timestampRouter.type": "org.apache.kafka.connect.transforms.TimestampRouter",
|
|
859
|
+
"transforms.timestampRouter.topic.format": "${topic}-${timestamp}",
|
|
860
|
+
"transforms.timestampRouter.timestamp.format": "yyyyMMdd"
|
|
861
|
+
}
|
|
862
|
+
}
|
|
863
|
+
```
|
|
864
|
+
|
|
865
|
+
### 3. Connect management API
|
|
866
|
+
|
|
867
|
+
```bash
|
|
868
|
+
# List installed connector plugins
|
|
869
|
+
curl http://connect:8083/connector-plugins | jq
|
|
870
|
+
|
|
871
|
+
# Create a connector
|
|
872
|
+
curl -X POST http://connect:8083/connectors \
|
|
873
|
+
-H "Content-Type: application/json" \
|
|
874
|
+
-d @mysql-source-connector.json
|
|
875
|
+
|
|
876
|
+
# Show connector status
|
|
877
|
+
curl http://connect:8083/connectors/mysql-source-connector/status | jq
|
|
878
|
+
|
|
879
|
+
# Pause / resume
|
|
880
|
+
curl -X PUT http://connect:8083/connectors/mysql-source-connector/pause
|
|
881
|
+
curl -X PUT http://connect:8083/connectors/mysql-source-connector/resume
|
|
882
|
+
|
|
883
|
+
# Restart the connector
|
|
884
|
+
curl -X POST http://connect:8083/connectors/mysql-source-connector/restart
|
|
885
|
+
|
|
886
|
+
# Restart a single task
|
|
887
|
+
curl -X POST http://connect:8083/connectors/mysql-source-connector/tasks/0/restart
|
|
888
|
+
|
|
889
|
+
# Delete the connector
|
|
890
|
+
curl -X DELETE http://connect:8083/connectors/mysql-source-connector
|
|
891
|
+
```
|
|
892
|
+
|
|
893
|
+
## Performance tuning
|
|
894
|
+
|
|
895
|
+
### 1. Planning the partition count
|
|
896
|
+
|
|
897
|
+
```
|
|
898
|
+
Partition count formula:
  target throughput / min(per-partition producer throughput, per-partition consumer throughput)

Example:
  Target: 100 MB/s
  Producer, per partition: 20 MB/s
  Consumer, per partition: 10 MB/s
  Partitions = 100 / 10 = 10 (the consumer is the bottleneck)

Suggested partition counts:
- Small (< 10 MB/s): 6-12 partitions
- Medium (10-100 MB/s): 12-64 partitions
- Large (> 100 MB/s): 64-256 partitions
- Upper bound: each partition costs the broker roughly 1 MB of memory plus open file handles for every log segment and its index files

Note: the partition count can only be increased, never decreased, so leave headroom when planning.
|
|
914
|
+
```
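
The sizing formula is easy to turn into a helper. This is a minimal sketch: the function name `required_partitions` is made up for illustration, and rounding up is an assumption so that fractional results never under-provision.

```python
import math


def required_partitions(target_mb_s, producer_mb_s, consumer_mb_s):
    """Partitions needed so that the slower side can sustain the target throughput."""
    bottleneck = min(producer_mb_s, consumer_mb_s)
    if bottleneck <= 0:
        raise ValueError("per-partition throughput must be positive")
    return math.ceil(target_mb_s / bottleneck)


# The worked example above: 100 MB/s target, 20 MB/s producer, 10 MB/s consumer.
print(required_partitions(100, 20, 10))
```

The per-partition figures should come from measurements on the actual hardware and message sizes; the result is a lower bound to which planning headroom is then added.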
|
|
915
|
+
|
|
916
|
+
### 2. Producer tuning
|
|
917
|
+
|
|
918
|
+
```properties
|
|
919
|
+
# Batching (the main optimization)
# 128 KB (the 16 KB default is often too small)
batch.size=131072
# Wait up to 20 ms to fill a batch (the default of 0 sends immediately)
linger.ms=20

# Compression: LZ4 is fast with a moderate compression ratio
compression.type=lz4

# Buffering
# 128 MB send buffer
buffer.memory=134217728
# Maximum time send() blocks when the buffer is full
max.block.ms=60000

# Network
# TCP send buffer
send.buffer.bytes=131072
# TCP receive buffer
receive.buffer.bytes=65536

# Requests
# Maximum request size: 10 MB
max.request.size=10485760
# Request timeout: 30 s
request.timeout.ms=30000
# Total delivery timeout: 120 s
delivery.timeout.ms=120000
|
|
938
|
+
```
|
|
939
|
+
|
|
940
|
+
### 3. Consumer tuning
|
|
941
|
+
|
|
942
|
+
```properties
|
|
943
|
+
# Fetch settings
# Wait for at least 1 MB per fetch (fewer requests)
fetch.min.bytes=1048576
# At most 50 MB per fetch
fetch.max.bytes=52428800
# Wait at most 500 ms for fetch.min.bytes to accumulate
fetch.max.wait.ms=500
# At most 10 MB per partition per fetch
max.partition.fetch.bytes=10485760

# Poll batch size: maximum records returned by one poll()
max.poll.records=1000

# Parallelism: one consumer per partition gives the maximum useful parallelism (1:1 mapping)
|
|
953
|
+
```
|
|
954
|
+
|
|
955
|
+
### 4. Broker tuning
|
|
956
|
+
|
|
957
|
+
```properties
|
|
958
|
+
# Zero-copy (sendfile system call, enabled by default)
# Kafka sends data from disk straight to the network card without passing through user space

# Page cache
# Kafka relies on the OS page cache rather than the JVM heap
# Recommendation: leave 25-50% of physical memory for the page cache
# JVM heap: 6-8 GB is enough (do not oversize it); set it in the environment, not in server.properties:
# export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"

# Disks
# Multiple disks allow parallel writes
log.dirs=/data1/kafka-logs,/data2/kafka-logs
# Flush every 10000 messages (relying on the page cache is usually better)
log.flush.interval.messages=10000
# Flush every second
log.flush.interval.ms=1000

# Threads
# Network I/O threads
num.network.threads=8
# Disk I/O threads
num.io.threads=16
# Replica fetcher threads
num.replica.fetchers=4

# Log segments
# 1 GB segment files
log.segment.bytes=1073741824
# Index entry interval in bytes
log.index.interval.bytes=4096
|
|
980
|
+
```
|
|
981
|
+
|
|
982
|
+
## Monitoring
|
|
983
|
+
|
|
984
|
+
### 1. Core JMX metrics
|
|
985
|
+
|
|
986
|
+
```
|
|
987
|
+
Broker metrics:
┌──────────────────────────────────────────┬────────────────────────────────────────┐
│ Metric                                   │ Description                            │
├──────────────────────────────────────────┼────────────────────────────────────────┤
│ kafka.server:type=BrokerTopicMetrics,    │ Incoming message rate (msgs/s)         │
│   name=MessagesInPerSec                  │                                        │
│ kafka.server:type=BrokerTopicMetrics,    │ Incoming byte rate (B/s)               │
│   name=BytesInPerSec                     │                                        │
│ kafka.server:type=BrokerTopicMetrics,    │ Outgoing byte rate (B/s)               │
│   name=BytesOutPerSec                    │                                        │
│ kafka.server:type=ReplicaManager,        │ ISR shrink rate (frequent = unhealthy) │
│   name=IsrShrinksPerSec                  │                                        │
│ kafka.server:type=ReplicaManager,        │ Under-replicated partition count       │
│   name=UnderReplicatedPartitions         │                                        │
│ kafka.controller:type=KafkaController,   │ Active controllers (should be 1)       │
│   name=ActiveControllerCount             │                                        │
│ kafka.server:type=ReplicaManager,        │ Leader partition count                 │
│   name=LeaderCount                       │                                        │
│ kafka.network:type=RequestMetrics,       │ Produce request latency (ms)           │
│   name=TotalTimeMs,request=Produce       │                                        │
│ kafka.log:type=LogFlushStats,            │ Log flush rate and time (ms)           │
│   name=LogFlushRateAndTimeMs             │                                        │
└──────────────────────────────────────────┴────────────────────────────────────────┘

Producer metrics:
  record-send-rate: records sent per second
  record-error-rate: failed record sends per second
  request-latency-avg: average request latency
  batch-size-avg: average batch size
  compression-rate-avg: average compression ratio

Consumer metrics:
  records-consumed-rate: records consumed per second
  records-lag-max: maximum consumer lag (records)
  fetch-latency-avg: average fetch latency
  commit-latency-avg: average commit latency
|
|
1023
|
+
```
|
|
1024
|
+
|
|
1025
|
+
### 2. Lag monitoring
|
|
1026
|
+
|
|
1027
|
+
```bash
|
|
1028
|
+
# Show consumer lag from the command line
|
|
1029
|
+
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
|
|
1030
|
+
--describe --group order-processing-group
|
|
1031
|
+
|
|
1032
|
+
# Sample output:
|
|
1033
|
+
# GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG
|
|
1034
|
+
# order-processing-group order-events 0 1000 1050 50
|
|
1035
|
+
# order-processing-group order-events 1 2000 2100 100
|
|
1036
|
+
# order-processing-group order-events 2 3000 3010 10
|
|
1037
|
+
```
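
The `--describe` output is plain whitespace-separated text, so it is easy to aggregate in a script. The following Python sketch sums the `LAG` column per topic; the function name `total_lag_by_topic` is an assumption for illustration, and the parser assumes no column value contains spaces.

```python
def total_lag_by_topic(describe_output):
    """Sum the LAG column per topic from `kafka-consumer-groups.sh --describe` text."""
    totals = {}
    header = None
    for line in describe_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "TOPIC" in fields and "LAG" in fields:
            header = fields  # remember column positions from the header row
            continue
        if header is None or len(fields) < len(header):
            continue
        row = dict(zip(header, fields))
        if not row["LAG"].isdigit():
            continue  # lag is shown as "-" when the group has no committed offset
        totals[row["TOPIC"]] = totals.get(row["TOPIC"], 0) + int(row["LAG"])
    return totals
```

Fed with the sample output above (without the leading `# `), it yields a total lag of 160 for `order-events`.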

**Burrow monitoring configuration**:
```toml
# burrow.toml
[general]
access-control-allow-origin = "*"

[zookeeper]
servers = ["zk1:2181", "zk2:2181", "zk3:2181"]

[cluster.production]
class-name = "kafka"
servers = ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
topic-refresh = 60
offset-refresh = 30

[consumer.production]
class-name = "kafka"
cluster = "production"
servers = ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
group-denylist = "^console-consumer-"
offset-refresh = 30

[notifier.slack]
class-name = "http"
url-open = "https://hooks.slack.com/services/xxx"
template-open = "burrow-alert.tmpl"
send-close = true
interval = 60
threshold = 2  # alert only at WARN or above

# Burrow evaluation statuses:
# OK:   lag is stable or decreasing
# WARN: lag keeps growing
# ERR:  lag is growing and consumption has stalled
# STOP: consumption has stopped completely
```

### 3. Prometheus + Grafana Monitoring

```yaml
# docker-compose.yml - JMX Exporter
services:
  kafka:
    environment:
      KAFKA_JMX_OPTS: >-
        -Dcom.sun.management.jmxremote
        -Dcom.sun.management.jmxremote.port=9999
        -Dcom.sun.management.jmxremote.authenticate=false
        -Dcom.sun.management.jmxremote.ssl=false
      EXTRA_ARGS: >-
        -javaagent:/opt/jmx-exporter/jmx_prometheus_javaagent.jar=7071:/opt/jmx-exporter/kafka-broker.yml

# prometheus.yml
scrape_configs:
  - job_name: 'kafka'
    static_configs:
      - targets: ['kafka1:7071', 'kafka2:7071', 'kafka3:7071']
```

## Security

### 1. SASL Authentication

```properties
# Broker configuration (SASL/SCRAM)
listeners=SASL_SSL://0.0.0.0:9093
advertised.listeners=SASL_SSL://kafka-broker:9093
security.inter.broker.protocol=SASL_SSL
sasl.mechanism.inter.broker.protocol=SCRAM-SHA-512
sasl.enabled.mechanisms=SCRAM-SHA-512

# Create SCRAM users (shell commands)
kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --add-config 'SCRAM-SHA-512=[password=secret123]' \
  --entity-type users --entity-name producer-user

kafka-configs.sh --bootstrap-server localhost:9092 \
  --alter --add-config 'SCRAM-SHA-512=[password=secret456]' \
  --entity-type users --entity-name consumer-user
```
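
The broker settings above need a matching client configuration. The Python sketch below assembles the standard Java-client properties for SASL_SSL with SCRAM-SHA-512; the helper names `scram_client_properties` and `to_properties` are assumptions for illustration, and in practice the password would come from a secret store rather than a literal.

```python
def scram_client_properties(username, password, truststore_path):
    """Build Java-client properties for a SASL_SSL listener using SCRAM-SHA-512."""
    jaas = (
        "org.apache.kafka.common.security.scram.ScramLoginModule required "
        f'username="{username}" password="{password}";'
    )
    return {
        "security.protocol": "SASL_SSL",
        "sasl.mechanism": "SCRAM-SHA-512",
        "sasl.jaas.config": jaas,
        "ssl.truststore.location": truststore_path,
    }


def to_properties(props):
    """Render a dict as key=value lines for a .properties file."""
    return "\n".join(f"{key}={value}" for key, value in props.items())
```

For example, `to_properties(scram_client_properties("producer-user", "secret123", "/etc/kafka/ssl/kafka.truststore.jks"))` produces text that can be saved as a client properties file.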

### 2. SSL/TLS Encryption

```bash
# Generate the CA certificate
openssl req -new -x509 -keyout ca-key -out ca-cert -days 3650 \
  -subj "/CN=KafkaCA" -nodes

# Generate a keystore for each broker
keytool -keystore kafka-broker.keystore.jks -alias broker \
  -genkey -keyalg RSA -validity 3650 \
  -dname "CN=kafka-broker,OU=Kafka,O=Example,L=BJ,ST=BJ,C=CN" \
  -storepass changeit -keypass changeit

# Create a signing request and sign it with the CA
keytool -keystore kafka-broker.keystore.jks -alias broker \
  -certreq -file cert-file -storepass changeit
openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file \
  -out cert-signed -days 3650 -CAcreateserial

# Import the CA certificate and the signed certificate
keytool -keystore kafka-broker.keystore.jks -alias CARoot \
  -import -file ca-cert -storepass changeit -noprompt
keytool -keystore kafka-broker.keystore.jks -alias broker \
  -import -file cert-signed -storepass changeit

# Create the truststore
keytool -keystore kafka.truststore.jks -alias CARoot \
  -import -file ca-cert -storepass changeit -noprompt
```

```properties
# Broker SSL configuration
ssl.keystore.location=/etc/kafka/ssl/kafka-broker.keystore.jks
ssl.keystore.password=changeit
ssl.key.password=changeit
ssl.truststore.location=/etc/kafka/ssl/kafka.truststore.jks
ssl.truststore.password=changeit
ssl.client.auth=required
ssl.endpoint.identification.algorithm=https
```

### 3. ACL Authorization

```bash
# Allow the producer to write
kafka-acls.sh --bootstrap-server localhost:9092 \
  --add --allow-principal User:producer-user \
  --operation Write --topic order-events

# Allow the consumer to read (topic and group in one command)
kafka-acls.sh --bootstrap-server localhost:9092 \
  --add --allow-principal User:consumer-user \
  --operation Read --topic order-events \
  --group order-processing-group

# Grant access to the consumer group only
kafka-acls.sh --bootstrap-server localhost:9092 \
  --add --allow-principal User:consumer-user \
  --operation Read --group order-processing-group

# List ACLs
kafka-acls.sh --bootstrap-server localhost:9092 \
  --list --topic order-events

# Remove an ACL
kafka-acls.sh --bootstrap-server localhost:9092 \
  --remove --allow-principal User:producer-user \
  --operation Write --topic order-events

# Prefix-based authorization (matches every topic starting with "order-")
kafka-acls.sh --bootstrap-server localhost:9092 \
  --add --allow-principal User:analytics-user \
  --operation Read --topic order- --resource-pattern-type prefixed
```
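
To make the difference between literal and prefixed resource patterns concrete, here is a small Python sketch of the matching rule. It only models the name comparison (the function name `pattern_matches` is an assumption for this example) and ignores principals, operations and deny rules, which the broker's authorizer also evaluates.

```python
def pattern_matches(pattern_type, pattern_name, resource_name):
    """Return True if an ACL resource pattern covers the given resource name."""
    if pattern_type == "literal":
        # A literal "*" is the wildcard that matches every resource of that type.
        return pattern_name == "*" or pattern_name == resource_name
    if pattern_type == "prefixed":
        return resource_name.startswith(pattern_name)
    raise ValueError(f"unsupported pattern type: {pattern_type}")
```

With the last command above, `pattern_matches("prefixed", "order-", "order-events")` is true, so `analytics-user` can also read any future topic whose name starts with `order-`.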

## Operations

### 1. Scaling Out and In

```bash
# Scale out: after adding a new broker, reassign partitions
# 1. Generate a reassignment plan
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --topics-to-move-json-file topics.json \
  --broker-list "0,1,2,3" \
  --generate

# topics.json
# {"topics": [{"topic": "order-events"}], "version": 1}

# 2. Execute the migration (throttled to 50 MB/s to protect production traffic)
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file reassignment.json \
  --execute \
  --throttle 50000000

# 3. Verify the migration status (also clears the throttle once complete)
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file reassignment.json \
  --verify

# Scale in: move partitions to the other brokers first, then decommission
# Make sure the broker being removed no longer hosts any partitions (leaders included)
```
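
Before running a throttled reassignment it helps to estimate how long it will take. The Python sketch below builds the `topics.json` payload and computes a lower bound on the transfer time; both helper names are assumptions for this example, and the estimate treats the throttle as the only bottleneck, which real clusters rarely satisfy exactly.

```python
import json
import math


def topics_to_move(topics):
    """Build the JSON document expected by --topics-to-move-json-file."""
    return json.dumps({"topics": [{"topic": t} for t in topics], "version": 1})


def estimated_transfer_seconds(bytes_to_move, throttle_bytes_per_sec):
    """Lower bound on the time needed to copy the data at the given throttle."""
    if throttle_bytes_per_sec <= 0:
        raise ValueError("throttle must be positive")
    return math.ceil(bytes_to_move / throttle_bytes_per_sec)
```

Moving 500 GB through a 50 MB/s throttle takes at least 10,000 seconds, a little under three hours, so the throttle should be chosen with the maintenance window in mind.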

### 2. Data Retention Policy

```properties
# Time-based retention (precedence: ms > minutes > hours)
# Keep data for 7 days (the default)
log.retention.hours=168
# Minute-level granularity
log.retention.minutes=10080
# Millisecond-level granularity
log.retention.ms=604800000

# Size-based retention
# Keep up to 100 GiB per partition; -1 means no size limit
log.retention.bytes=107374182400

# Compaction-based retention
# Keep only the latest value for each key
log.cleanup.policy=compact
# Minimum time a record stays uncompacted: 24 h
log.cleaner.min.compaction.lag.ms=86400000
# Keep tombstone records for 24 h
log.cleaner.delete.retention.ms=86400000

# Combined policy (compaction plus time/size-based deletion)
log.cleanup.policy=compact,delete

# Topic-level override (shell command, 30 days)
kafka-configs.sh --bootstrap-server localhost:9092 \
  --entity-type topics --entity-name order-events \
  --alter --add-config retention.ms=2592000000
```
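
When more than one time-based setting is present, the broker uses `log.retention.ms` first, then `log.retention.minutes`, then `log.retention.hours`. The Python sketch below mirrors that precedence so the effective value can be checked quickly; the function name `effective_retention_ms` and the dict input are assumptions for this example.

```python
MS_PER_MINUTE = 60_000
MS_PER_HOUR = 3_600_000
MS_PER_DAY = 86_400_000


def effective_retention_ms(props):
    """Resolve broker time-based retention: ms beats minutes, minutes beat hours."""
    if "log.retention.ms" in props:
        return int(props["log.retention.ms"])
    if "log.retention.minutes" in props:
        return int(props["log.retention.minutes"]) * MS_PER_MINUTE
    # 168 hours (7 days) is the broker default when nothing is configured.
    return int(props.get("log.retention.hours", 168)) * MS_PER_HOUR
```

The three example values in the block above all resolve to 604,800,000 ms (7 days), and the topic-level override of 2,592,000,000 ms divided by `MS_PER_DAY` gives 30 days.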

### 3. Cross-Datacenter Replication (MirrorMaker 2)

```properties
# mm2.properties (MirrorMaker 2 configuration)
clusters = source, target

source.bootstrap.servers = dc1-kafka1:9092,dc1-kafka2:9092
target.bootstrap.servers = dc2-kafka1:9092,dc2-kafka2:9092

# Replication flow
source->target.enabled = true
source->target.topics = order-events,user-events,payment-events
source->target.topics.exclude = .*-internal,__.*
source->target.groups = order-processing-group,analytics-group

# Replication factors for replicated and internal topics
replication.factor = 3
offset-syncs.topic.replication.factor = 3
heartbeats.topic.replication.factor = 3
checkpoints.topic.replication.factor = 3

# Performance settings
tasks.max = 4
producer.buffer.memory = 134217728
consumer.fetch.max.bytes = 52428800

# Offset synchronization (preserves consumer positions across failover)
sync.group.offsets.enabled = true
sync.group.offsets.interval.seconds = 10
emit.checkpoints.enabled = true
emit.checkpoints.interval.seconds = 30
```

```bash
# Start MirrorMaker 2
connect-mirror-maker.sh mm2.properties

# Disaster-recovery failover:
# 1. Stop producers on the source cluster
# 2. Wait for MirrorMaker 2 to finish replicating (check checkpoint lag)
# 3. Point consumers at the target cluster
# 4. Resume consumption from the synced offsets
# 5. Start producers on the target cluster
```
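
One detail that matters in step 3 of the failover: with MirrorMaker 2's default replication policy, replicated topics on the target cluster are prefixed with the source cluster alias and a `.` separator. The Python sketch below derives the names consumers would subscribe to after failover; the helper names are assumptions for this example, and the prefix does not apply if an identity replication policy has been configured.

```python
def remote_topic(source_alias, topic, separator="."):
    """Name of a replicated topic on the target cluster under the default policy."""
    return f"{source_alias}{separator}{topic}"


def failover_subscriptions(source_alias, topics):
    """Topics a consumer should subscribe to on the target cluster after failover."""
    return [remote_topic(source_alias, t) for t in topics]
```

With the configuration above, `failover_subscriptions("source", ["order-events", "user-events"])` returns `["source.order-events", "source.user-events"]`.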

### 4. Topic Migration

```bash
# Re-elect partition leaders (preferred replica election)
kafka-leader-election.sh --bootstrap-server localhost:9092 \
  --election-type preferred \
  --all-topic-partitions

# Increase a topic's replication factor
# 1. Write a reassignment JSON that lists the additional replicas
cat > increase-rf.json << 'EOF'
{
  "version": 1,
  "partitions": [
    {"topic": "order-events", "partition": 0, "replicas": [0, 1, 2]},
    {"topic": "order-events", "partition": 1, "replicas": [1, 2, 0]},
    {"topic": "order-events", "partition": 2, "replicas": [2, 0, 1]}
  ]
}
EOF

# 2. Execute
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file increase-rf.json \
  --execute --throttle 50000000
```
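
Writing the reassignment JSON by hand gets error-prone for topics with many partitions. The Python sketch below generates it with a simple round-robin layout; the function name `replica_assignment` is an assumption for this example, and the layout ignores rack awareness and existing replica placement, which a production plan may need to respect.

```python
import json


def replica_assignment(topic, num_partitions, brokers, replication_factor):
    """Build a reassignment document that rotates replicas across brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    partitions = []
    for partition in range(num_partitions):
        # Start each partition on a different broker so leaders are spread evenly.
        replicas = [
            brokers[(partition + i) % len(brokers)] for i in range(replication_factor)
        ]
        partitions.append({"topic": topic, "partition": partition, "replicas": replicas})
    return {"version": 1, "partitions": partitions}


if __name__ == "__main__":
    print(json.dumps(replica_assignment("order-events", 3, [0, 1, 2], 3), indent=2))
```

For three partitions on brokers 0, 1 and 2 this reproduces the replica lists shown in `increase-rf.json` above.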

## Common Pitfalls

### 1. Consumer Lag Surge

```
Symptom: consumer lag keeps growing; consumption cannot keep up with production
Causes:
  - Consumer processing takes too long (slow database queries, external call timeouts)
  - Too few consumers (fewer than the number of partitions)
  - GC pauses stall consumption
  - max.poll.records is too large, so processing exceeds max.poll.interval.ms

Troubleshooting steps:
  1. Run kafka-consumer-groups.sh --describe to see lag per partition
  2. Check consumer logs for processing errors and timeouts
  3. Monitor JVM GC activity on the consumers
  4. Check response times of downstream dependencies (DB/cache/HTTP)

Solutions:
  ✅ Add consumer instances (no more than the number of partitions)
  ✅ Lower max.poll.records and raise max.poll.interval.ms
  ✅ Process asynchronously: hand polled records to a local queue and use worker threads
  ✅ Optimize downstream calls (batched DB writes, connection pools, caching)
  ❌ Blindly adding partitions (this only helps if consumers are added as well)
```
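
The sizing advice above can be turned into two quick calculations. This Python sketch is a back-of-the-envelope aid under stated assumptions: the function names and the 50% safety factor are illustrative choices, and the per-record time and per-consumer rate have to be measured on the real workload.

```python
import math


def max_safe_poll_records(per_record_ms, max_poll_interval_ms, safety_factor=0.5):
    """Largest batch that finishes within a fraction of max.poll.interval.ms."""
    budget_ms = max_poll_interval_ms * safety_factor
    return max(1, math.floor(budget_ms / per_record_ms))


def consumers_needed(produce_rate, per_consumer_rate, partitions):
    """Consumers required to keep up, capped at the partition count."""
    needed = math.ceil(produce_rate / per_consumer_rate)
    return min(needed, partitions)
```

For example, at 200 ms per record and the default `max.poll.interval.ms` of 300000, a batch of at most 750 records leaves half the interval as headroom. If the cap in `consumers_needed` is hit, more partitions or faster processing are required; adding consumers alone will not help.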

### 2. Rebalance Storm

```
Symptom: consumers trigger rebalances constantly and consumption almost stops
Causes:
  - session.timeout.ms is too short, so missed heartbeats trigger a rebalance
  - max.poll.interval.ms is too short, so slow processing gets the consumer evicted from the group
  - Consumers start and stop frequently (for example, Kubernetes pods restarting)
  - GC pauses exceed session.timeout.ms

Solutions:
  ✅ Use CooperativeStickyAssignor (incremental rebalancing)
  ✅ Raise session.timeout.ms (30-60 s)
  ✅ Raise max.poll.interval.ms (5-10 min)
  ✅ Lower max.poll.records so processing time stays bounded
  ✅ Set group.instance.id to enable static membership (restarts no longer trigger a rebalance)

# Static membership configuration (Kafka 2.3+)
# Must be unique per instance
group.instance.id=consumer-host-1
# Can be longer: a static member leaving does not trigger an immediate rebalance
session.timeout.ms=60000
```
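
Static membership only works if each instance gets an ID that is unique and stays the same across restarts. A minimal Python sketch, assuming the ID is derived from a stable hostname such as a Kubernetes StatefulSet pod name; the function name `static_member_config` and the naming scheme are illustrative choices, not a Kafka requirement.

```python
def static_member_config(app_name, stable_hostname, session_timeout_ms=60000):
    """Consumer properties for static membership with incremental rebalancing."""
    if not stable_hostname:
        raise ValueError("a stable hostname is required for a stable instance id")
    return {
        # Same host after a restart -> same id -> the broker treats it as the same member.
        "group.instance.id": f"{app_name}-{stable_hostname}",
        "session.timeout.ms": str(session_timeout_ms),
        "partition.assignment.strategy":
            "org.apache.kafka.clients.consumer.CooperativeStickyAssignor",
    }
```

Pods from a plain Deployment get a new random name on every restart, so with this scheme they would not benefit from static membership.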

### 3. Too Many Partitions

```
Symptom: high broker memory usage, slow controller failover, higher end-to-end latency
Cause: far more partitions than the actual throughput requires

Impact:
  - Each partition uses roughly 1 MB of broker memory (metadata + indexes)
  - Controller failover time grows in proportion to the partition count
  - More open file handles (2-3 files per partition log segment)
  - More producer memory (one RecordBatch buffer per partition)

Recommendations:
  ✅ Total partitions per cluster < 200,000 (KRaft mode can support more)
  ✅ Partitions per broker < 4,000
  ✅ Plan from actual throughput requirements and keep 20-30% headroom
  ❌ Do not create large numbers of partitions blindly (the count can be increased but never decreased)
```
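
A common rule of thumb for planning from throughput is to take the larger of target/producer-throughput and target/consumer-throughput per partition, then add headroom. The Python sketch below applies that rule and checks the per-broker load; the function names are assumptions for this example, and the per-partition throughput figures must come from benchmarks on the actual hardware.

```python
import math


def plan_partitions(target_mb_s, producer_mb_s_per_partition,
                    consumer_mb_s_per_partition, headroom=0.25):
    """Partitions needed for a target throughput, with headroom (e.g. 0.25 = 25%)."""
    base = max(target_mb_s / producer_mb_s_per_partition,
               target_mb_s / consumer_mb_s_per_partition)
    return math.ceil(base * (1 + headroom))


def replicas_per_broker(partitions, replication_factor, brokers):
    """Average number of partition replicas each broker has to host."""
    return math.ceil(partitions * replication_factor / brokers)
```

For a 100 MB/s target with 10 MB/s per partition on the producer side and 5 MB/s on the consumer side, the consumer side dominates: 20 partitions, or 25 with 25% headroom.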

### 4. Message Loss

```
Scenario 1: loss on the producer side
  Cause: acks=0, or acks=1 and the leader fails
  Fix: acks=all + min.insync.replicas=2 + retries=Integer.MAX_VALUE

Scenario 2: loss on the broker side
  Cause: unclean.leader.election.enable=true lets a non-ISR replica become leader
  Fix: unclean.leader.election.enable=false

Scenario 3: loss on the consumer side
  Cause: the offset is auto-committed and processing then fails
  Fix: commit manually, processing first and committing afterwards (at-least-once)

Recommended production settings to prevent loss:
# Producer
acks=all
retries=2147483647
enable.idempotence=true
max.in.flight.requests.per.connection=5

# Broker
min.insync.replicas=2
unclean.leader.election.enable=false
default.replication.factor=3

# Consumer
enable.auto.commit=false
# Manual commit: process first, then commit
```
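
Why the order of processing and committing matters in scenario 3 can be shown with a toy simulation. The Python sketch below does not use a Kafka client; `run_consumer` and `delivered` are hypothetical helpers that model a consumer crashing while it handles one record and then restarting from the committed offset.

```python
def run_consumer(records, committed, crash_at=None, commit_first=False):
    """Process records from `committed`; crash while handling index `crash_at`."""
    processed = []
    for i in range(committed, len(records)):
        if commit_first:
            committed = i + 1            # offset committed before processing
        if i == crash_at:
            return processed, committed  # crash: record i was never processed
        processed.append(records[i])
        if not commit_first:
            committed = i + 1            # offset committed after processing
    return processed, committed


def delivered(records, crash_at, commit_first):
    """Records processed across one crashed run plus one clean restart."""
    first, committed = run_consumer(records, 0, crash_at, commit_first)
    second, _ = run_consumer(records, committed, None, commit_first)
    return first + second
```

With records `["a", "b", "c"]` and a crash on `"b"`, committing first delivers only `["a", "c"]`, so `"b"` is lost, while processing first delivers all three. The price of processing first is possible duplicates when the crash happens between processing and commit, which is the topic of the next section.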

### 5. Duplicate Consumption

```
Scenario: a consumer finishes processing a message but crashes before committing the offset,
          so the message is consumed again after restart
Cause: this is normal behavior under at-least-once semantics

Solutions:
1. Idempotent consumption (recommended)
   - Deduplicate on a unique ID in the message (orderId)
   - Use UPSERT/ON CONFLICT when inserting into the database
   - Record processed message IDs with Redis SETNX

2. Exactly-once semantics
   - Kafka transactions (consume-transform-produce pipelines)
   - Write the offset and the business data in one transaction (for example, the same database)

3. Consumption dedup table
   CREATE TABLE consumed_offsets (
     consumer_group VARCHAR(255),
     topic VARCHAR(255),
     partition_id INT,
     offset_val BIGINT,
     processed_at TIMESTAMP,
     PRIMARY KEY (consumer_group, topic, partition_id)
   );
```
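
The idempotent-consumption approach can be sketched with an in-memory SQLite database standing in for the business database. The table names, the `handle_once` function and the use of SQLite's `INSERT OR IGNORE` are assumptions for this example; other databases would use their own UPSERT or `ON CONFLICT` syntax.

```python
import sqlite3


def open_store():
    """Create an in-memory database with a dedup table and a business table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")
    db.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, amount INTEGER)")
    return db


def handle_once(db, message_id, order_id, amount):
    """Apply a message at most once; return True if it was applied."""
    with db:  # one transaction: the dedup marker and the business write commit together
        cursor = db.execute(
            "INSERT OR IGNORE INTO processed_messages (message_id) VALUES (?)",
            (message_id,),
        )
        if cursor.rowcount == 0:
            return False  # already processed: a redelivered duplicate
        db.execute(
            "INSERT INTO orders (order_id, amount) VALUES (?, ?)",
            (order_id, amount),
        )
        return True
```

Because the marker and the order row are written in the same transaction, a crash between the two cannot leave a message marked as processed without its business effect.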

## Learning Path

### Beginner (1-2 weeks)
1. Understand Kafka's core concepts (Topic/Partition/Consumer Group/Offset)
2. Set up a single-node Kafka (Docker)
3. Send and receive messages with the command-line tools
4. Write a simple producer and consumer

### Intermediate (2-4 weeks)
1. Cluster setup and replica management
2. Schema management and serialization
3. Consumer groups and the rebalance mechanism
4. Data integration with Kafka Connect
5. Basic monitoring and alerting

### Advanced (1-2 months)
1. Stream processing with Kafka Streams
2. Transactions and exactly-once semantics
3. Performance tuning and capacity planning
4. Security configuration (SASL/SSL/ACL)
5. Cross-datacenter replication (MirrorMaker 2)

### Expert (ongoing)
1. KRaft architecture and migration
2. Operating large clusters (millions of partitions)
3. CDC pipeline design (Debezium)
4. Failure drills and disaster recovery
5. Custom Interceptor/Serializer/Partitioner

## References

### Official Documentation
- [Kafka documentation](https://kafka.apache.org/documentation/)
- [Confluent documentation](https://docs.confluent.io/)
- [KRaft documentation](https://kafka.apache.org/documentation/#kraft)

### Tools
- [Kafka UI](https://github.com/provectus/kafka-ui) - web management interface
- [Burrow](https://github.com/linkedin/Burrow) - consumer lag monitoring
- [AKHQ](https://github.com/tchiotludo/akhq) - Kafka management platform
- [Debezium](https://debezium.io/) - CDC connectors

### Books
- *Kafka: The Definitive Guide*, 2nd Edition (O'Reilly)
- *Kafka Streams in Action* (Manning)

---

## Agent Checklist

- [ ] Topic design: naming convention (domain.entity.event), sensible partition count, replication factor >= 3
- [ ] Producer: acks=all, idempotence enabled, batching and compression configured
- [ ] Consumer: manual offset commits, CooperativeStickyAssignor, idempotent consumption
- [ ] Schema: use Schema Registry, set a compatibility policy, serialize with Avro/Protobuf
- [ ] Cluster: min.insync.replicas=2, unclean.leader.election.enable=false
- [ ] Security: SASL authentication + SSL/TLS encryption + ACL authorization, least-privilege principle
- [ ] Monitoring: JMX metrics exported to Prometheus, consumer lag alerts, broker health checks
- [ ] Performance: partition count matched to consumer count, zero-copy + page cache, reasonable JVM heap (6-8 GB)
- [ ] Retention: configure retention according to business needs, use the compact policy for important topics
- [ ] Disaster recovery: MirrorMaker 2 cross-DC replication, regular failure drills, documented failover SOP
- [ ] Loss prevention: acks=all + min.insync.replicas=2 + manual commits + idempotent consumption
- [ ] Duplicate prevention: business-level idempotency + dedup table/Redis dedup + exactly-once transactions (if needed)
- [ ] KRaft migration: check whether the Kafka 3.3+ requirement is met, plan the ZK→KRaft migration path

---

**Knowledge ID**: `kafka-complete`
**Domain**: data-engineering
**Type**: standards
**Difficulty**: intermediate
**Quality score**: 95
**Maintainer**: data-team@umadev.com
**Last updated**: 2026-03-28