@umacloud/knowledge 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (418)
  1. package/00-governance/governance-capabilities.md +557 -0
  2. package/00-governance/knowledge-map.md +39 -0
  3. package/00-governance/maintenance-policy.md +76 -0
  4. package/00-governance/review-checklist.md +81 -0
  5. package/README.md +13 -0
  6. package/ai/01-standards/agent-development-complete.md +691 -0
  7. package/ai/01-standards/llm-application-complete.md +488 -0
  8. package/ai/01-standards/mlops-complete.md +798 -0
  9. package/ai/01-standards/prompt-engineering-complete.md +646 -0
  10. package/ai/01-standards/rag-architecture-complete.md +649 -0
  11. package/ai/02-playbooks/llm-evaluation-playbook.md +847 -0
  12. package/ai/03-checklists/ai-project-checklist.md +215 -0
  13. package/ai/04-antipatterns/ai-antipatterns.md +661 -0
  14. package/ai/05-cases/case-rag-production.md +147 -0
  15. package/ai/06-glossary/ai-glossary.md +162 -0
  16. package/ai/agent-evaluation-benchmark.md +53 -0
  17. package/ai/ai-agent-memory-context-management.md +41 -0
  18. package/ai/ai-cost-capacity-optimization-playbook.md +42 -0
  19. package/ai/ai-data-security-and-compliance-playbook.md +37 -0
  20. package/ai/ai-domain-index-and-checklist.md +40 -0
  21. package/ai/ai-governance-maturity-model.md +50 -0
  22. package/ai/ai-model-selection-and-routing-strategy.md +47 -0
  23. package/ai/ai-observability-and-oncall-runbook.md +52 -0
  24. package/ai/ai-rag-engineering-playbook.md +42 -0
  25. package/ai/ai-red-team-and-safety-evaluation.md +42 -0
  26. package/ai/ai-release-readiness-and-rollback-gate.md +42 -0
  27. package/ai/llm-agent-engineering-deep-dive.md +57 -0
  28. package/ai/prompt-and-tool-guardrails.md +52 -0
  29. package/api/01-standards/enterprise-api-standards.md +198 -0
  30. package/api/01-standards/rest-api-design-guide.md +63 -0
  31. package/api/02-playbooks/api-pagination-playbook.md +93 -0
  32. package/api/02-playbooks/graphql-production-playbook.md +176 -0
  33. package/api/03-checklists/api-review-checklist.md +55 -0
  34. package/api/04-antipatterns/api-antipatterns.md +112 -0
  35. package/architecture/01-standards/api-gateway-patterns.md +496 -0
  36. package/architecture/01-standards/cloud-native-patterns.md +644 -0
  37. package/architecture/01-standards/distributed-systems-patterns.md +591 -0
  38. package/architecture/01-standards/event-driven-architecture.md +595 -0
  39. package/architecture/01-standards/microservices-patterns-complete.md +968 -0
  40. package/architecture/01-standards/microservices-patterns.md +495 -0
  41. package/architecture/01-standards/system-design-interview.md +664 -0
  42. package/architecture/02-playbooks/microservices-patterns-playbook.md +137 -0
  43. package/architecture/02-playbooks/migration-playbook.md +780 -0
  44. package/architecture/02-playbooks/system-design-playbook.md +779 -0
  45. package/architecture/03-checklists/architecture-decision-checklist.md +297 -0
  46. package/architecture/04-antipatterns/architecture-antipatterns.md +417 -0
  47. package/architecture/05-cases/case-netflix-microservices.md +413 -0
  48. package/architecture/06-glossary/architecture-glossary.md +164 -0
  49. package/architecture/adr-template-and-examples.md +38 -0
  50. package/architecture/api-gateway-deep-dive.md +1291 -0
  51. package/architecture/configuration-management.md +1162 -0
  52. package/architecture/distributed-transactions.md +1220 -0
  53. package/architecture/microservices-complete.md +735 -0
  54. package/architecture/resilience-and-disaster-patterns.md +37 -0
  55. package/architecture/service-governance.md +1198 -0
  56. package/architecture/system-architecture-deep-dive.md +37 -0
  57. package/backend/01-standards/analytics-and-growth.md +65 -0
  58. package/backend/01-standards/api-and-error-conventions.md +120 -0
  59. package/backend/01-standards/application-layering-and-packaging.md +160 -0
  60. package/backend/01-standards/auth-implementation.md +104 -0
  61. package/backend/01-standards/backend-framework-idioms.md +74 -0
  62. package/backend/01-standards/background-jobs-and-async.md +66 -0
  63. package/backend/01-standards/caching-strategies-complete.md +390 -0
  64. package/backend/01-standards/config-and-observability.md +77 -0
  65. package/backend/01-standards/data-modeling-and-persistence.md +94 -0
  66. package/backend/01-standards/django-complete.md +1765 -0
  67. package/backend/01-standards/email-and-notifications.md +64 -0
  68. package/backend/01-standards/fastapi-complete.md +925 -0
  69. package/backend/01-standards/file-upload-and-storage.md +66 -0
  70. package/backend/01-standards/graphql-api-complete.md +416 -0
  71. package/backend/01-standards/llm-application-standard.md +78 -0
  72. package/backend/01-standards/message-queue-patterns.md +379 -0
  73. package/backend/01-standards/microservices-and-distributed.md +78 -0
  74. package/backend/01-standards/nestjs-complete.md +2167 -0
  75. package/backend/01-standards/payment-integration.md +80 -0
  76. package/backend/01-standards/rate-limiting-complete.md +451 -0
  77. package/backend/01-standards/realtime-and-websocket.md +65 -0
  78. package/backend/01-standards/search-and-filtering.md +64 -0
  79. package/backend/01-standards/spring-boot-complete.md +445 -0
  80. package/backend/02-playbooks/api-design-playbook.md +718 -0
  81. package/backend/02-playbooks/email-send-playbook.md +130 -0
  82. package/backend/02-playbooks/file-upload-s3-playbook.md +153 -0
  83. package/backend/02-playbooks/typescript-enterprise-playbook.md +133 -0
  84. package/backend/02-playbooks/websocket-realtime-playbook.md +154 -0
  85. package/backend/03-checklists/api-launch-checklist.md +189 -0
  86. package/backend/04-antipatterns/backend-antipatterns.md +1051 -0
  87. package/blockchain/01-standards/blockchain-basics.md +557 -0
  88. package/blockchain/01-standards/smart-contract-development.md +1315 -0
  89. package/cicd/01-standards/deployment-and-delivery-standard.md +96 -0
  90. package/cicd/01-standards/github-actions-complete.md +473 -0
  91. package/cicd/01-standards/release-and-store-submission.md +75 -0
  92. package/cicd/02-playbooks/cicd-pipeline-playbook.md +144 -0
  93. package/cicd/02-playbooks/release-management-playbook.md +605 -0
  94. package/cicd/03-checklists/pipeline-security-checklist.md +168 -0
  95. package/cicd/04-antipatterns/cicd-antipatterns.md +589 -0
  96. package/cicd/05-cases/case-deployment-automation.md +221 -0
  97. package/cicd/05-cases/case-gitops-transformation.md +212 -0
  98. package/cicd/06-glossary/cicd-glossary.md +114 -0
  99. package/cicd/cicd-blueprint-deep-dive.md +38 -0
  100. package/cicd/release-readiness-gate.md +37 -0
  101. package/cloud-native/01-standards/container-security.md +741 -0
  102. package/cloud-native/01-standards/kubernetes-complete.md +812 -0
  103. package/cloud-native/02-playbooks/api-gateway-playbook.md +155 -0
  104. package/cloud-native/02-playbooks/gitops-with-argocd.md +760 -0
  105. package/cloud-native/02-playbooks/k8s-troubleshooting-playbook.md +1942 -0
  106. package/cloud-native/02-playbooks/message-queue-playbook.md +129 -0
  107. package/cloud-native/02-playbooks/multicloud-governance.md +726 -0
  108. package/cloud-native/02-playbooks/serverless-patterns.md +788 -0
  109. package/cloud-native/02-playbooks/service-mesh-playbook.md +612 -0
  110. package/cloud-native/02-playbooks/terraform-iac-playbook.md +143 -0
  111. package/cloud-native/03-checklists/container-security-checklist.md +431 -0
  112. package/cloud-native/03-checklists/k8s-production-readiness-checklist.md +460 -0
  113. package/cloud-native/04-antipatterns/container-antipatterns.md +660 -0
  114. package/cloud-native/04-antipatterns/k8s-antipatterns.md +743 -0
  115. package/cloud-native/05-cases/case-k8s-migration.md +478 -0
  116. package/cloud-native/05-cases/case-k8s-scaling.md +642 -0
  117. package/cloud-native/05-cases/case-k8s-security-incident.md +397 -0
  118. package/cloud-native/06-glossary/cloud-native-glossary.md +337 -0
  119. package/cross-platform/01-standards/cross-platform-frameworks.md +83 -0
  120. package/cross-platform/01-standards/platform-selection-and-architecture.md +77 -0
  121. package/data/01-standards/elasticsearch-complete.md +2098 -0
  122. package/data/01-standards/postgresql-complete.md +1613 -0
  123. package/data/01-standards/redis-complete.md +1527 -0
  124. package/data/02-playbooks/database-optimization-playbook.md +403 -0
  125. package/data/02-playbooks/elasticsearch-production-playbook.md +132 -0
  126. package/data/03-checklists/database-launch-checklist.md +187 -0
  127. package/data/04-antipatterns/database-antipatterns.md +873 -0
  128. package/data/05-cases/case-database-migration.md +310 -0
  129. package/data/06-glossary/database-glossary.md +440 -0
  130. package/data/data-governance-and-modeling-deep-dive.md +39 -0
  131. package/data-engineering/01-standards/airflow-complete.md +523 -0
  132. package/data-engineering/01-standards/kafka-complete.md +1521 -0
  133. package/data-engineering/02-playbooks/spark-etl-playbook.md +496 -0
  134. package/data-engineering/03-checklists/pipeline-launch-checklist.md +194 -0
  135. package/data-engineering/04-antipatterns/data-pipeline-antipatterns.md +684 -0
  136. package/data-engineering/05-cases/case-real-time-pipeline.md +355 -0
  137. package/data-engineering/06-glossary/data-engineering-glossary.md +429 -0
  138. package/database/01-standards/database-schema-standards.md +147 -0
  139. package/database/02-playbooks/postgresql-optimization-quick.md +52 -0
  140. package/database/02-playbooks/postgresql-performance-optimization.md +58 -0
  141. package/database/02-playbooks/postgresql-production-playbook.md +146 -0
  142. package/database/02-playbooks/redis-caching-playbook.md +117 -0
  143. package/database/03-checklists/database-review-checklist.md +50 -0
  144. package/database/04-antipatterns/database-antipatterns.md +112 -0
  145. package/design/01-standards/ui-design-system-complete.md +423 -0
  146. package/design/02-playbooks/design-handoff-playbook.md +254 -0
  147. package/design/02-playbooks/design-review-playbook.md +388 -0
  148. package/design/03-checklists/design-review-checklist.md +246 -0
  149. package/design/04-antipatterns/design-antipatterns.md +378 -0
  150. package/design/05-cases/case-design-system-adoption.md +328 -0
  151. package/design/06-glossary/design-glossary.md +329 -0
  152. package/design/ui-full-lifecycle-cross-platform-playbook.md +571 -0
  153. package/design/ux-system-deep-dive.md +38 -0
  154. package/design-systems/00-craft-rules.md +71 -0
  155. package/design-systems/aesthetic-families.md +43 -0
  156. package/design-systems/anti-ai-slop.md +162 -0
  157. package/design-systems/bold-geometric.md +120 -0
  158. package/design-systems/brutalist-bold.md +103 -0
  159. package/design-systems/editorial-clean.md +109 -0
  160. package/design-systems/glass-aurora.md +108 -0
  161. package/design-systems/modern-minimal.md +145 -0
  162. package/design-systems/premium-luxury.md +106 -0
  163. package/design-systems/product-type-design-map.md +48 -0
  164. package/design-systems/soft-warm.md +123 -0
  165. package/design-systems/tech-utility.md +113 -0
  166. package/desktop/01-standards/desktop-app-standard.md +72 -0
  167. package/desktop/01-standards/desktop-design.md +71 -0
  168. package/development/00-governance/document-template.md +41 -0
  169. package/development/01-standards/api-versioning-strategies.md +432 -0
  170. package/development/01-standards/authentication-patterns-complete.md +479 -0
  171. package/development/01-standards/css-architecture-complete.md +550 -0
  172. package/development/01-standards/database-migration-strategies.md +484 -0
  173. package/development/01-standards/elasticsearch-complete.md +347 -0
  174. package/development/01-standards/git-complete.md +371 -0
  175. package/development/01-standards/golang-complete.md +1565 -0
  176. package/development/01-standards/graphql-complete.md +298 -0
  177. package/development/01-standards/javascript-bundlers-complete.md +469 -0
  178. package/development/01-standards/javascript-typescript-complete.md +528 -0
  179. package/development/01-standards/jest-complete.md +275 -0
  180. package/development/01-standards/linux-complete.md +234 -0
  181. package/development/01-standards/logging-observability-complete.md +526 -0
  182. package/development/01-standards/microservices-communication.md +502 -0
  183. package/development/01-standards/mongodb-complete.md +406 -0
  184. package/development/01-standards/oauth2-complete.md +285 -0
  185. package/development/01-standards/performance-optimization-complete.md +289 -0
  186. package/development/01-standards/playwright-complete.md +247 -0
  187. package/development/01-standards/postgresql-complete.md +456 -0
  188. package/development/01-standards/pytest-complete.md +340 -0
  189. package/development/01-standards/python-async-programming.md +902 -0
  190. package/development/01-standards/python-complete.md +956 -0
  191. package/development/01-standards/python-decorators-complete.md +799 -0
  192. package/development/01-standards/python-design-patterns.md +2854 -0
  193. package/development/01-standards/python-packaging-distribution.md +420 -0
  194. package/development/01-standards/python-testing-strategies.md +607 -0
  195. package/development/01-standards/python-web-frameworks-comparison.md +471 -0
  196. package/development/01-standards/redis-complete.md +317 -0
  197. package/development/01-standards/rest-api-complete.md +316 -0
  198. package/development/01-standards/rust-complete.md +578 -0
  199. package/development/01-standards/typescript-advanced-types.md +1513 -0
  200. package/development/01-standards/web-security-complete.md +292 -0
  201. package/development/02-playbooks/api-design-playbook.md +810 -0
  202. package/development/02-playbooks/database-migration-playbook.md +580 -0
  203. package/development/02-playbooks/debugging-playbook.md +692 -0
  204. package/development/02-playbooks/feature-delivery-playbook.md +430 -0
  205. package/development/02-playbooks/incident-hotfix-playbook.md +387 -0
  206. package/development/02-playbooks/performance-optimization-playbook.md +531 -0
  207. package/development/02-playbooks/performance-tuning-playbook.md +652 -0
  208. package/development/02-playbooks/refactor-playbook.md +403 -0
  209. package/development/02-playbooks/release-playbook.md +469 -0
  210. package/development/03-checklists/architecture-review-checklist.md +168 -0
  211. package/development/03-checklists/data-migration-checklist.md +157 -0
  212. package/development/03-checklists/oncall-handover-checklist.md +173 -0
  213. package/development/03-checklists/pr-checklist.md +158 -0
  214. package/development/03-checklists/production-readiness-checklist.md +190 -0
  215. package/development/03-checklists/release-readiness-checklist.md +154 -0
  216. package/development/03-checklists/security-review-checklist.md +182 -0
  217. package/development/04-antipatterns/api-antipatterns.md +657 -0
  218. package/development/04-antipatterns/architecture-antipatterns.md +686 -0
  219. package/development/04-antipatterns/backend-antipatterns.md +648 -0
  220. package/development/04-antipatterns/cicd-antipatterns.md +540 -0
  221. package/development/04-antipatterns/code-smell-antipatterns.md +571 -0
  222. package/development/04-antipatterns/data-antipatterns.md +658 -0
  223. package/development/04-antipatterns/database-antipatterns.md +578 -0
  224. package/development/04-antipatterns/frontend-antipatterns.md +635 -0
  225. package/development/04-antipatterns/reliability-antipatterns.md +700 -0
  226. package/development/04-antipatterns/security-antipatterns.md +747 -0
  227. package/development/05-cases/case-api-version-migration.md +428 -0
  228. package/development/05-cases/case-authorization-hardening.md +383 -0
  229. package/development/05-cases/case-bluegreen-rollback.md +466 -0
  230. package/development/05-cases/case-cache-snowball-protection.md +485 -0
  231. package/development/05-cases/case-ci-cd-pipeline.md +544 -0
  232. package/development/05-cases/case-database-scaling.md +500 -0
  233. package/development/05-cases/case-db-hotspot-optimization.md +487 -0
  234. package/development/05-cases/case-incident-mttr-reduction.md +563 -0
  235. package/development/05-cases/case-microservice-migration.md +375 -0
  236. package/development/05-cases/case-performance-optimization.md +406 -0
  237. package/development/05-cases/case-security-incident-response.md +345 -0
  238. package/development/06-glossary/full-stack-glossary.md +166 -0
  239. package/development/09-maturity/quarterly-audit-template.md +35 -0
  240. package/development/11-ui-excellence/ui-aesthetic-system.md +41 -0
  241. package/development/11-ui-excellence/ui-engineering-excellence.md +435 -0
  242. package/development/12-scenarios/development-scenarios-guide.md +565 -0
  243. package/development/13-implementation-assets/implementation-toolkit.md +282 -0
  244. package/development/13-implementation-assets/knowledge-gates-execution.md +43 -0
  245. package/development/14-full-lifecycle/software-lifecycle-gates.md +511 -0
  246. package/development/15-lifecycle-templates/project-templates-collection.md +791 -0
  247. package/development/api-contract-and-versioning-guide.md +36 -0
  248. package/development/api-governance-complete.md +43 -0
  249. package/development/backend-engineering-complete.md +43 -0
  250. package/development/code-review-quality-complete.md +43 -0
  251. package/development/concurrency-reliability-complete.md +43 -0
  252. package/development/database-engineering-complete.md +43 -0
  253. package/development/engineering-effectiveness-complete.md +43 -0
  254. package/development/engineering-standards-deep-dive.md +38 -0
  255. package/development/frontend-engineering-complete.md +43 -0
  256. package/development/performance-capacity-complete.md +43 -0
  257. package/development/refactor-migration-complete.md +42 -0
  258. package/development/refactoring-and-techdebt-playbook.md +37 -0
  259. package/development/security-in-development-complete.md +43 -0
  260. package/devops/01-standards/cicd-pipeline-complete.md +262 -0
  261. package/devops/01-standards/docker-complete.md +1490 -0
  262. package/devops/01-standards/github-actions-complete.md +337 -0
  263. package/devops/01-standards/kubernetes-complete.md +638 -0
  264. package/devops/01-standards/terraform-complete.md +2117 -0
  265. package/devops/02-playbooks/docker-compose-playbook.md +233 -0
  266. package/devops/02-playbooks/docker-k8s-production-playbook.md +186 -0
  267. package/devops/02-playbooks/docker-production-playbook.md +952 -0
  268. package/edge-iot/01-standards/edge-iot-complete.md +473 -0
  269. package/experts/architect/api-design.md +178 -0
  270. package/experts/architect/methodology.md +124 -0
  271. package/experts/architect/security.md +75 -0
  272. package/experts/backend-lead/methodology.md +216 -0
  273. package/experts/devops/methodology.md +160 -0
  274. package/experts/frontend-lead/methodology.md +178 -0
  275. package/experts/product-manager/industry/ecommerce.md +43 -0
  276. package/experts/product-manager/industry/saas.md +40 -0
  277. package/experts/product-manager/methodology.md +97 -0
  278. package/experts/qa-lead/methodology.md +123 -0
  279. package/experts/qa-lead/test-strategy.md +128 -0
  280. package/experts/uiux-designer/methodology.md +125 -0
  281. package/frontend/01-standards/accessibility-complete.md +532 -0
  282. package/frontend/01-standards/accessibility-standard.md +74 -0
  283. package/frontend/01-standards/admin-dashboard-and-crud.md +72 -0
  284. package/frontend/01-standards/design-tokens-complete.md +444 -0
  285. package/frontend/01-standards/forms-and-validation.md +77 -0
  286. package/frontend/01-standards/frontend-architecture-and-layering.md +119 -0
  287. package/frontend/01-standards/i18n-and-localization.md +65 -0
  288. package/frontend/01-standards/nextjs-complete.md +451 -0
  289. package/frontend/01-standards/react-complete.md +713 -0
  290. package/frontend/01-standards/react-hooks-complete-guide.md +1100 -0
  291. package/frontend/01-standards/react-hooks-complete.md +1171 -0
  292. package/frontend/01-standards/seo-and-web-vitals.md +77 -0
  293. package/frontend/01-standards/state-management-complete.md +444 -0
  294. package/frontend/01-standards/vue-complete.md +499 -0
  295. package/frontend/01-standards/vue3-complete.md +2002 -0
  296. package/frontend/01-standards/web-framework-best-practices.md +64 -0
  297. package/frontend/01-standards/web-performance-complete.md +495 -0
  298. package/frontend/02-playbooks/accessibility-a11y-playbook.md +161 -0
  299. package/frontend/02-playbooks/frontend-performance-playbook.md +707 -0
  300. package/frontend/02-playbooks/i18n-internationalization-playbook.md +120 -0
  301. package/frontend/02-playbooks/performance-optimization-playbook.md +163 -0
  302. package/frontend/02-playbooks/react-nextjs-production-playbook.md +167 -0
  303. package/frontend/02-playbooks/react-state-management-playbook.md +173 -0
  304. package/frontend/03-checklists/component-quality-checklist.md +166 -0
  305. package/frontend/03-checklists/frontend-launch-checklist.md +299 -0
  306. package/frontend/04-antipatterns/frontend-antipatterns.md +886 -0
  307. package/frontend/05-cases/case-performance-optimization.md +274 -0
  308. package/harmony/01-standards/harmonyos-arkts-standard.md +75 -0
  309. package/harmony/01-standards/harmonyos-design.md +65 -0
  310. package/high-quality-engineering-playbook.md +54 -0
  311. package/incident/01-standards/incident-response-complete.md +303 -0
  312. package/incident/02-playbooks/chaos-engineering-playbook.md +883 -0
  313. package/incident/02-playbooks/postmortem-playbook.md +398 -0
  314. package/incident/03-checklists/incident-readiness-checklist.md +181 -0
  315. package/incident/04-antipatterns/incident-antipatterns.md +490 -0
  316. package/incident/05-cases/case-cascade-failure.md +176 -0
  317. package/incident/06-glossary/incident-glossary.md +114 -0
  318. package/incident/postmortem-and-response-deep-dive.md +39 -0
  319. package/industries/ecommerce/ecommerce-complete.md +631 -0
  320. package/industries/education/education-complete.md +555 -0
  321. package/industries/fintech/fintech-complete.md +501 -0
  322. package/industries/gaming/gaming-complete.md +587 -0
  323. package/industries/healthcare/healthcare-complete.md +452 -0
  324. package/low-code/01-standards/low-code-complete.md +944 -0
  325. package/miniprogram/01-standards/ai-common-mistakes.md +61 -0
  326. package/miniprogram/01-standards/miniprogram-custom-navbar-capsule.md +77 -0
  327. package/miniprogram/01-standards/miniprogram-design.md +61 -0
  328. package/miniprogram/01-standards/miniprogram-standard.md +81 -0
  329. package/mobile/01-standards/android-material-design.md +70 -0
  330. package/mobile/01-standards/flutter-complete.md +384 -0
  331. package/mobile/01-standards/ios-design-hig.md +78 -0
  332. package/mobile/01-standards/mobile-app-standard.md +85 -0
  333. package/mobile/01-standards/react-native-complete.md +352 -0
  334. package/mobile/02-playbooks/mobile-cross-platform-playbook.md +175 -0
  335. package/mobile/02-playbooks/mobile-performance.md +473 -0
  336. package/mobile/03-checklists/mobile-release-checklist.md +234 -0
  337. package/mobile/04-antipatterns/mobile-antipatterns.md +798 -0
  338. package/mobile/05-cases/case-app-performance.md +500 -0
  339. package/mobile/05-cases/case-app-startup-optimization.md +218 -0
  340. package/mobile/06-glossary/mobile-glossary.md +484 -0
  341. package/observability/01-standards/observability-standards.md +103 -0
  342. package/observability/02-playbooks/prometheus-grafana-playbook.md +135 -0
  343. package/observability/02-playbooks/structured-logging-playbook.md +73 -0
  344. package/observability/03-checklists/observability-checklist.md +54 -0
  345. package/observability/04-antipatterns/observability-antipatterns.md +106 -0
  346. package/operations/01-standards/prometheus-monitoring-complete.md +1578 -0
  347. package/operations/02-playbooks/capacity-planning-playbook.md +620 -0
  348. package/operations/03-checklists/production-launch-checklist.md +365 -0
  349. package/operations/04-antipatterns/operations-antipatterns.md +664 -0
  350. package/operations/05-cases/case-sre-practices.md +581 -0
  351. package/operations/06-glossary/operations-glossary.md +120 -0
  352. package/operations/aiops-anomaly-detection.md +758 -0
  353. package/operations/capacity-planning.md +1061 -0
  354. package/operations/chaos-engineering.md +659 -0
  355. package/operations/incident-command-system.md +38 -0
  356. package/operations/observability-complete.md +442 -0
  357. package/operations/slo-sli-playbook.md +517 -0
  358. package/operations/sre-operations-deep-dive.md +39 -0
  359. package/package.json +8 -0
  360. package/performance/01-standards/performance-and-scalability.md +80 -0
  361. package/performance/01-standards/performance-standards.md +156 -0
  362. package/performance/02-playbooks/query-optimization-playbook.md +103 -0
  363. package/performance/03-checklists/performance-checklist.md +56 -0
  364. package/performance/04-antipatterns/performance-antipatterns.md +146 -0
  365. package/product/01-standards/product-management-complete.md +285 -0
  366. package/product/02-playbooks/feature-launch-playbook.md +207 -0
  367. package/product/02-playbooks/user-research-playbook.md +532 -0
  368. package/product/03-checklists/feature-launch-checklist.md +275 -0
  369. package/product/04-antipatterns/product-antipatterns.md +355 -0
  370. package/product/05-cases/case-mvp-to-scale.md +384 -0
  371. package/product/06-glossary/product-glossary.md +462 -0
  372. package/product/feature-prioritization-framework.md +40 -0
  373. package/product/kpi-and-metric-tree.md +37 -0
  374. package/product/product-discovery-and-prd-deep-dive.md +41 -0
  375. package/quantum/01-standards/quantum-complete.md +1186 -0
  376. package/security/01-standards/api-security-complete.md +511 -0
  377. package/security/01-standards/container-runtime-security.md +574 -0
  378. package/security/01-standards/data-protection-gdpr.md +543 -0
  379. package/security/01-standards/owasp-top10-complete.md +1890 -0
  380. package/security/01-standards/secure-coding-baseline.md +90 -0
  381. package/security/01-standards/supply-chain-security.md +441 -0
  382. package/security/01-standards/web-security-checklist.md +108 -0
  383. package/security/01-standards/zero-trust-architecture.md +521 -0
  384. package/security/02-playbooks/auth-sso-playbook.md +166 -0
  385. package/security/02-playbooks/incident-response-security-playbook.md +588 -0
  386. package/security/02-playbooks/owasp-api-security-playbook.md +129 -0
  387. package/security/02-playbooks/payment-integration-playbook.md +119 -0
  388. package/security/02-playbooks/penetration-testing-playbook.md +517 -0
  389. package/security/03-checklists/security-audit-checklist.md +356 -0
  390. package/security/04-antipatterns/security-coding-antipatterns.md +580 -0
  391. package/security/05-cases/case-log4shell-incident.md +537 -0
  392. package/security/05-cases/case-major-breaches.md +468 -0
  393. package/security/06-glossary/security-glossary.md +212 -0
  394. package/security/compliance-automation.md +993 -0
  395. package/security/container-security.md +680 -0
  396. package/security/devsecops-complete.md +426 -0
  397. package/security/sast-dast-sca.md +775 -0
  398. package/security/secrets-management.md +594 -0
  399. package/security/security-architecture-deep-dive.md +37 -0
  400. package/security/threat-modeling-stride-playbook.md +40 -0
  401. package/seed-templates/auth-system.md +59 -0
  402. package/seed-templates/blog-content.md +94 -0
  403. package/seed-templates/dashboard.md +89 -0
  404. package/seed-templates/docs-site.md +73 -0
  405. package/seed-templates/e-commerce.md +50 -0
  406. package/seed-templates/saas-landing.md +92 -0
  407. package/seed-templates/settings-page.md +51 -0
  408. package/testing/01-standards/test-strategy-and-layering.md +83 -0
  409. package/testing/01-standards/testing-strategy-complete.md +422 -0
  410. package/testing/01-standards/unit-testing-best-practices.md +118 -0
  411. package/testing/02-playbooks/e2e-testing-playbook.md +988 -0
  412. package/testing/02-playbooks/testing-strategy-playbook.md +126 -0
  413. package/testing/03-checklists/test-strategy-checklist.md +208 -0
  414. package/testing/04-antipatterns/testing-antipatterns.md +718 -0
  415. package/testing/05-cases/case-testing-transformation.md +300 -0
  416. package/testing/06-glossary/testing-glossary.md +110 -0
  417. package/testing/risk-based-test-matrix.md +36 -0
  418. package/testing/testing-strategy-deep-dive.md +37 -0
@@ -0,0 +1,1521 @@
+ ---
+ id: kafka-complete
+ title: Apache Kafka Complete Guide
+ domain: data-engineering
+ category: 01-standards
+ difficulty: intermediate
+ tags: [complete, connect, data-engineering, kafka, schema-management, streams, core-concepts, overview]
+ quality_score: 70
+ last_updated: 2026-06-15
+ ---
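+ # Apache Kafka Complete Guide
+
+ ## Overview
+
+ Apache Kafka is a distributed event streaming platform for high-throughput, low-latency real-time data pipelines and stream processing. It was originally developed at LinkedIn and later donated to the Apache Software Foundation. Its durable log model, horizontal scalability, and fault-tolerant design have made Kafka core infrastructure in modern data architectures.
+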
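+ ### Message Queue Comparison
+
+ | Feature | Kafka | RabbitMQ | Redis Streams | Pulsar |
+ |---------|-------|----------|---------------|--------|
+ | Model | Distributed log | AMQP message broker | In-memory stream | Log with tiered storage |
+ | Throughput | Millions of messages/s | Tens of thousands of messages/s | Hundreds of thousands of messages/s | Millions of messages/s |
+ | Latency | Milliseconds | Microseconds | Sub-millisecond | Milliseconds |
+ | Persistence | Sequential disk writes | Optional persistence | AOF/RDB | BookKeeper |
+ | Message replay | Supported (offset) | Not supported | Supported (ID) | Supported (MessageID) |
+ | Consumption model | Pull | Push | Pull / blocking read | Push + pull |
+ | Protocol | Custom binary protocol | AMQP/MQTT/STOMP | Redis protocol | Custom binary protocol |
+ | Multi-tenancy | Limited (ACLs) | VHost isolation | No native support | Native multi-tenancy |
+ | Storage-compute separation | Partial, via tiered storage (KIP-405) | Not supported | Not supported | Native |
+ | Typical use cases | Event streaming / log aggregation / CDC | Task queues / RPC | Lightweight real-time streams | Large-scale multi-tenant streaming |
+
+ **Selection guidance**:
+ - **High-throughput event streaming / log collection / CDC**: choose Kafka
+ - **Complex routing / task queues / low-latency RPC**: choose RabbitMQ
+ - **Lightweight real-time streams / an existing Redis ecosystem**: choose Redis Streams
+ - **Multi-tenancy / storage-compute separation / geo-replication**: choose Pulsar
+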
+ ## Core Concepts
+
+ ### 1. Broker
+
+ A broker is a single server node in a Kafka cluster, responsible for receiving, storing, and serving messages.
+
+ ```
+ Kafka cluster topology (leadership is per partition; each broker is shown with the partitions it leads):
+ ┌──────────┐ ┌──────────┐ ┌──────────┐
+ │ Broker 0 │ │ Broker 1 │ │ Broker 2 │
+ │ Leader:  │ │ Leader:  │ │ Leader:  │
+ │ P0, P3   │ │ P1, P4   │ │ P2, P5   │
+ └──────────┘ └──────────┘ └──────────┘
+      │            │            │
+      └────────────┼────────────┘
+                   │
+          ┌────────┴────────┐
+          │ ZooKeeper/KRaft │
+          └─────────────────┘
+ ```
+
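+ **Key configuration**:
+ ```properties
+ # server.properties
+ # Unique ID of this broker within the cluster
+ broker.id=0
+ # Address the broker binds to, and the address it advertises to clients
+ listeners=PLAINTEXT://0.0.0.0:9092
+ advertised.listeners=PLAINTEXT://kafka-broker-0:9092
+ # Directory where partition log data is stored
+ log.dirs=/var/kafka-logs
+ # Defaults applied to automatically created topics
+ num.partitions=6
+ default.replication.factor=3
+ # Minimum ISR size required for a write with acks=all to succeed
+ min.insync.replicas=2
+ # Retain log data for 168 hours (7 days); roll segments at 1 GiB
+ log.retention.hours=168
+ log.segment.bytes=1073741824
+ ```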
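
How `default.replication.factor=3` and `min.insync.replicas=2` interact can be checked with a little arithmetic. The sketch below is illustrative only: `DurabilitySketch` and its methods are hypothetical names, and it assumes producers send with `acks=all` (a producer setting the excerpt above does not show), since `min.insync.replicas` is only enforced for such writes.

```java
class DurabilitySketch {
    // Replicas that may drop out of the ISR while acks=all writes are still accepted.
    static int tolerableReplicaLoss(int replicationFactor, int minInsyncReplicas) {
        return Math.max(0, replicationFactor - minInsyncReplicas);
    }

    // An acks=all write is accepted only while the ISR has at least minInsyncReplicas members.
    static boolean acceptsAcksAllWrite(int isrSize, int minInsyncReplicas) {
        return isrSize >= minInsyncReplicas;
    }

    public static void main(String[] args) {
        int rf = 3, minIsr = 2; // values taken from the server.properties excerpt
        System.out.println("tolerable replica loss: " + tolerableReplicaLoss(rf, minIsr));
        for (int isrSize = rf; isrSize >= 1; isrSize--) {
            System.out.println("ISR size " + isrSize + " -> accepts acks=all write: "
                    + acceptsAcksAllWrite(isrSize, minIsr));
        }
    }
}
```

With these values, one replica can fall out of sync without blocking writes. Once the ISR shrinks to a single member, `acks=all` writes are rejected rather than accepted with reduced durability.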
+
+ ### 2. Topic and Partition
+
+ A topic is a logical category of messages. A partition is a physical shard of a topic and the basic unit of parallelism in Kafka.
+
+ ```
+ Topic: order-events (3 Partitions, RF=3)
+
+ Partition 0: [msg0, msg3, msg6, msg9, ...]  → Leader: Broker 0
+ Partition 1: [msg1, msg4, msg7, msg10, ...] → Leader: Broker 1
+ Partition 2: [msg2, msg5, msg8, msg11, ...] → Leader: Broker 2
+
+ Each message has a unique, monotonically increasing offset within its partition:
+ Partition 0: offset 0 → offset 1 → offset 2 → ...
+ ```
+
+ **Topic management**:
+ ```bash
+ # Create a topic
+ kafka-topics.sh --bootstrap-server localhost:9092 \
+   --create --topic order-events \
+   --partitions 6 --replication-factor 3
+
+ # List topics
+ kafka-topics.sh --bootstrap-server localhost:9092 --list
+
+ # Describe a topic
+ kafka-topics.sh --bootstrap-server localhost:9092 \
+   --describe --topic order-events
+
+ # Change the partition count (it can only be increased, never decreased)
+ kafka-topics.sh --bootstrap-server localhost:9092 \
+   --alter --topic order-events --partitions 12
+
+ # Delete a topic
+ kafka-topics.sh --bootstrap-server localhost:9092 \
+   --delete --topic order-events
+ ```
110
+
111
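+ ### 3. Consumer Group
+
+ A consumer group is a set of consumers that cooperate to consume the same topic. Within a group, each partition is consumed by exactly one consumer, which spreads the load across the members.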
114
+
115
+ ```
116
+ Consumer Group: order-processing-group
117
+
118
+ Topic: order-events (6 Partitions)
119
+
120
+ Consumer A ← P0, P1
121
+ Consumer B ← P2, P3
122
+ Consumer C ← P4, P5
123
+
124
+ If Consumer B fails:
+ Consumer A ← P0, P1, P2
+ Consumer C ← P3, P4, P5 (triggers a rebalance)
127
+ ```
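
The assignment shown above can be reproduced with a small sketch. It splits partitions into contiguous ranges, which is similar in spirit to a range-style assignor for a single topic; the function name `assign_partitions` is hypothetical, and the real assignors in the Kafka client handle multiple topics and subscriptions.

```python
def assign_partitions(consumers, num_partitions):
    """Split partitions 0..num_partitions-1 into contiguous ranges, one per consumer."""
    members = sorted(consumers)
    base, extra = divmod(num_partitions, len(members))
    assignment, start = {}, 0
    for i, member in enumerate(members):
        # The first `extra` members receive one additional partition.
        count = base + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


before = assign_partitions(["A", "B", "C"], 6)
after = assign_partitions(["A", "C"], 6)  # Consumer B has left the group
print(before)
print(after)
```

Recomputing the whole assignment after a membership change is what makes a rebalance disruptive: partitions P2 and P3 move to new owners even though A and C never failed.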
128
+
129
+ ### 4. Offset Management
+
+ An offset identifies a message's position within a partition. Consumers track their progress through offsets.
+
+ ```
+ Partition 0:
+ ┌───┬───┬───┬───┬───┬───┬───┬───┬───┬───┐
+ │ 0 │ 1 │ 2 │ 3 │ 4 │ 5 │ 6 │ 7 │ 8 │ 9 │
+ └───┴───┴───┴───┴───┴───┴───┴───┴───┴───┘
+               ↑           ↑               ↑
+           committed    current           LEO
+            offset      position   (Log End Offset)
+
+ committed offset: the consumer position that has been committed
+ current position: the position the consumer is currently reading
+ LEO: log end offset (where the next message will be written)
+ HW (High Watermark): the offset up to which every ISR replica has replicated; consumers can read only below it
146
+ ```
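
As a rough illustration of how these positions relate, the sketch below computes consumer lag and the high watermark from plain integers. The function names and the sample numbers are assumptions made for the example; a broker derives the high watermark internally from follower fetch positions.

```python
def consumer_lag(log_end_offset, committed_offset):
    """Messages written to the partition but not yet committed by the group."""
    return max(log_end_offset - committed_offset, 0)


def high_watermark(isr_log_end_offsets):
    """The high watermark is the smallest log end offset among the ISR replicas."""
    return min(isr_log_end_offsets.values())


lag = consumer_lag(log_end_offset=10, committed_offset=3)
hw = high_watermark({0: 10, 1: 9, 2: 8})  # broker id -> that replica's LEO
print(lag, hw)
```

In this example the leader has written up to offset 10, but a consumer can read only below offset 8 until the slowest ISR replica catches up.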
147
+
148
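+ ### 5. ISR/Leader/Follower
+
+ The ISR (In-Sync Replicas) is the set of replicas that are keeping up with the leader. The leader itself is always a member.
+
+ ```
+ Partition 0 (RF=3):
+ Leader: Broker 0 (serves reads and writes)
+ Follower: Broker 1 (ISR member, in sync)
+ Follower: Broker 2 (ISR member, in sync)
+
+ A follower that has not caught up with the leader for longer than replica.lag.time.max.ms is removed from the ISR:
+ ISR: [0, 1, 2] → [0, 1] (Broker 2 removed)
+
+ Leader election: a new leader is chosen only from the ISR
+ If unclean.leader.election.enable=true, a replica outside the ISR may be elected (data loss is possible)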
163
+ ```
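
The ISR shrink rule can be sketched as a time comparison. This is a simplified model, assuming each follower reports the last time it was fully caught up; `in_sync_replicas` and its parameters are hypothetical names, not broker APIs.

```python
def in_sync_replicas(leader_id, last_caught_up_ms, now_ms,
                     replica_lag_time_max_ms=30000):
    """Return the ISR: the leader plus every follower that caught up recently enough."""
    isr = [leader_id]
    for broker_id, caught_up_ms in sorted(last_caught_up_ms.items()):
        if now_ms - caught_up_ms <= replica_lag_time_max_ms:
            isr.append(broker_id)
    return isr


# Broker 1 caught up 5 s ago; Broker 2 caught up 40 s ago.
isr = in_sync_replicas(leader_id=0,
                       last_caught_up_ms={1: 95_000, 2: 60_000},
                       now_ms=100_000)
print(isr)
```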
164
+
165
+ ## Producers
+
+ ### 1. Basic producer
+
+ ```java
+ // Java producer
171
+ Properties props = new Properties();
172
+ props.put("bootstrap.servers", "kafka1:9092,kafka2:9092,kafka3:9092");
173
+ props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
174
+ props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
175
+
176
+ KafkaProducer<String, String> producer = new KafkaProducer<>(props);
177
+
178
+ // Asynchronous send
+ producer.send(new ProducerRecord<>("order-events", "order-123", orderJson),
+     (metadata, exception) -> {
+         if (exception != null) {
+             log.error("Send failed", exception);
+         } else {
+             log.info("Sent: topic={}, partition={}, offset={}",
+                 metadata.topic(), metadata.partition(), metadata.offset());
+         }
+     });
+
+ // Synchronous send (blocks until the send is acknowledged or fails)
190
+ RecordMetadata metadata = producer.send(
191
+ new ProducerRecord<>("order-events", "order-123", orderJson)).get();
192
+
193
+ producer.close();
194
+ ```
195
+
196
+ ```python
+ # Python producer (confluent-kafka)
+ from confluent_kafka import Producer
+
+ conf = {
+     'bootstrap.servers': 'kafka1:9092,kafka2:9092,kafka3:9092',
+     'client.id': 'order-producer',
+     'acks': 'all',
+ }
+
+ producer = Producer(conf)
+
+ def delivery_callback(err, msg):
+     if err:
+         print(f'Delivery failed: {err}')
+     else:
+         print(f'Delivered: topic={msg.topic()}, partition={msg.partition()}, offset={msg.offset()}')
+
+ # order_json is assumed to be a JSON string built elsewhere
+ producer.produce(
+     topic='order-events',
+     key='order-123',
+     value=order_json.encode('utf-8'),
+     callback=delivery_callback
+ )
+ # flush() blocks until outstanding messages are delivered and callbacks have run
+ producer.flush()
+ ```
222
+
223
+ ### 2. Partitioning strategy
+
+ ```java
+ // Default partitioning strategy:
+ // 1. Partition specified explicitly → use it as-is
+ // 2. Key present → murmur2(keyBytes) % numPartitions
+ // 3. No key → sticky partitioner (Kafka 2.4+)
+
+ import java.util.List;
+ import java.util.Map;
+ import java.util.concurrent.ThreadLocalRandom;
+ import org.apache.kafka.clients.producer.Partitioner;
+ import org.apache.kafka.common.Cluster;
+ import org.apache.kafka.common.PartitionInfo;
+ import org.apache.kafka.common.utils.Utils;
+
+ // Custom partitioner
+ public class OrderPartitioner implements Partitioner {
+     @Override
+     public int partition(String topic, Object key, byte[] keyBytes,
+                          Object value, byte[] valueBytes, Cluster cluster) {
+         List<PartitionInfo> partitions = cluster.partitionsForTopic(topic);
+         int numPartitions = partitions.size();
+
+         if (key == null) {
+             // No key: pick a random partition (this is not round-robin)
+             return ThreadLocalRandom.current().nextInt(numPartitions);
+         }
+
+         String orderKey = (String) key;
+         // Route VIP orders to partition 0.
+         // Note: partition 0 is not exclusive; other keys can also hash to it.
+         if (orderKey.startsWith("VIP-")) {
+             return 0;
+         }
+         // Hash all other orders by key. Utils.toPositive avoids the negative
+         // result that Math.abs returns for Integer.MIN_VALUE.
+         return Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions;
+     }
+
+     @Override
+     public void close() {}
+
+     @Override
+     public void configure(Map<String, ?> configs) {}
+ }
+
+ // Register the custom partitioner
+ props.put("partitioner.class", "com.example.OrderPartitioner");
262
+ ```
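
To see why key-based partitioning keeps per-key ordering, and why changing the partition count disturbs it, here is a minimal sketch. It uses `zlib.crc32` purely as a stand-in hash; Kafka's default partitioner uses murmur2, so the partition numbers below will not match a real cluster. The name `choose_partition` is hypothetical.

```python
import zlib


def choose_partition(key, num_partitions):
    """Map a key to a partition with a stable hash (crc32 stands in for murmur2)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


keys = [f"order-{i}" for i in range(100)]
with_6 = {k: choose_partition(k, 6) for k in keys}
with_12 = {k: choose_partition(k, 12) for k in keys}

# The same key always lands on the same partition for a fixed partition count.
print(with_6["order-1"] == choose_partition("order-1", 6))
# After growing from 6 to 12 partitions, some keys move to a new partition.
moved = [k for k in keys if with_6[k] != with_12[k]]
print(len(moved), "of", len(keys), "keys moved")
```

Because the mapping depends on the partition count, records for a key written before and after an increase can sit in different partitions, which is one reason to plan the count up front.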
263
+
264
+ ### 3. Idempotence and exactly-once semantics
+
+ ```properties
+ # Idempotent producer configuration (prevents duplicates caused by retries)
268
+ enable.idempotence=true
269
+ acks=all
270
+ retries=2147483647
271
+ max.in.flight.requests.per.connection=5
272
+ ```
273
+
274
+ ```java
275
+ // Transactional producer (exactly-once across partitions)
+ props.put("enable.idempotence", "true");
+ props.put("transactional.id", "order-tx-producer-1");
+
+ KafkaProducer<String, String> producer = new KafkaProducer<>(props);
+ producer.initTransactions();
+
+ try {
+     producer.beginTransaction();
+
+     // Send to multiple topics/partitions atomically
+     producer.send(new ProducerRecord<>("order-events", orderKey, orderJson));
+     producer.send(new ProducerRecord<>("inventory-events", skuKey, inventoryJson));
+     producer.send(new ProducerRecord<>("payment-events", paymentKey, paymentJson));
+
+     // Commit consumed offsets inside the transaction (consume-transform-produce pattern)
+     producer.sendOffsetsToTransaction(offsets, consumerGroupMetadata);
+
+     producer.commitTransaction();
+ } catch (ProducerFencedException | OutOfOrderSequenceException e) {
+     producer.close(); // Unrecoverable error: this producer cannot continue
+ } catch (KafkaException e) {
+     producer.abortTransaction(); // Recoverable error: roll back
+ }
299
+ ```
300
+
301
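+ ### 4. Batching and compression
+
+ ```properties
+ # Note: .properties files treat a trailing "# ..." as part of the value,
+ # so every comment sits on its own line.
+
+ # Batching
+ # Batch size in bytes (default 16384)
+ batch.size=65536
+ # Time to wait for a batch to fill, in ms (default 0 in Kafka 3.x and earlier)
+ linger.ms=20
+ # Total buffer memory (64 MB)
+ buffer.memory=67108864
+
+ # Compression: none, gzip, snappy, lz4, or zstd
+ compression.type=lz4
+ # Rough comparison (actual results depend on the payload):
+ #   gzip:   highest compression ratio, highest CPU cost
+ #   snappy: medium ratio, low CPU cost
+ #   lz4:    medium ratio, very fast
+ #   zstd:   high ratio, fast (requires Kafka 2.1+)
+ ```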
317
+
318
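+ ### 5. acks and reliability
+
+ ```properties
+ # acks=0: do not wait for acknowledgement (fastest; data may be lost)
+ # acks=1: wait for the leader only (default before Kafka 3.0; data may be lost if the leader fails)
+ # acks=all/-1: wait for all ISR replicas (safest; default since Kafka 3.0; pair with min.insync.replicas)
+ acks=all
+
+ # Recommended combination for durability
+ # Producer:                acks=all
+ # Broker or topic:         min.insync.replicas=2
+ # Topic (set at creation): replication factor 3
+ # Result: acknowledged writes survive the loss of one broker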
331
+ ```
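
The interaction between `acks`, the ISR size, and `min.insync.replicas` can be expressed as a small decision function. This is a simplified model with hypothetical function names, not broker code: with `acks=all`, the broker rejects a write when fewer than `min.insync.replicas` replicas are in sync.

```python
def can_accept_write(acks, isr_size, min_insync_replicas):
    """Whether a produce request is accepted in this simplified model."""
    if acks == "all":
        return isr_size >= min_insync_replicas
    # acks=0 and acks=1 only need a live leader.
    return isr_size >= 1


def tolerated_broker_failures(replication_factor, min_insync_replicas):
    """Brokers that can fail while acks=all writes are still accepted."""
    return replication_factor - min_insync_replicas


print(tolerated_broker_failures(3, 2))   # one broker may fail
print(can_accept_write("all", 2, 2))     # two replicas in sync: accepted
print(can_accept_write("all", 1, 2))     # only the leader left: rejected
```

With a replication factor of 3 and `min.insync.replicas=2`, losing a second broker makes the partition unavailable for `acks=all` writes instead of silently reducing durability.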
332
+
333
+ ## Consumers
+
+ ### 1. Basic consumer
+
+ ```java
+ // Java consumer
339
+ Properties props = new Properties();
340
+ props.put("bootstrap.servers", "kafka1:9092,kafka2:9092,kafka3:9092");
341
+ props.put("group.id", "order-processing-group");
342
+ props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
343
+ props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
344
+ props.put("auto.offset.reset", "earliest");
345
+
346
+ KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
347
+ consumer.subscribe(Arrays.asList("order-events"));
348
+
349
+ try {
350
+ while (true) {
351
+ ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
352
+ for (ConsumerRecord<String, String> record : records) {
353
+ log.info("Consumed: topic={}, partition={}, offset={}, key={}, value={}",
354
+ record.topic(), record.partition(), record.offset(),
355
+ record.key(), record.value());
356
+ processOrder(record.value());
357
+ }
358
+ }
359
+ } finally {
360
+ consumer.close();
361
+ }
362
+ ```
363
+
364
+ ```python
+ # Python consumer (confluent-kafka)
+ from confluent_kafka import Consumer
+
+ conf = {
+     'bootstrap.servers': 'kafka1:9092,kafka2:9092,kafka3:9092',
+     'group.id': 'order-processing-group',
+     'auto.offset.reset': 'earliest',
+     'enable.auto.commit': False,
+ }
+
+ consumer = Consumer(conf)
+ consumer.subscribe(['order-events'])
+
+ try:
+     while True:
+         msg = consumer.poll(timeout=1.0)
+         if msg is None:
+             continue
+         if msg.error():
+             print(f'Consumer error: {msg.error()}')
+             continue
+
+         # process_order is assumed to be defined by the application
+         process_order(msg.value().decode('utf-8'))
+         # Commit synchronously after each successfully processed message
+         consumer.commit(asynchronous=False)
+ finally:
+     consumer.close()
+ ```
392
+
393
+ ### 2. Automatic vs. manual commit
+
+ ```java
+ // Automatic commit (simple, but records can be reprocessed or lost)
+ props.put("enable.auto.commit", "true");
+ props.put("auto.commit.interval.ms", "5000");
+
+ // Manual synchronous commit (per record)
+ props.put("enable.auto.commit", "false");
+
+ for (ConsumerRecord<String, String> record : records) {
+     processOrder(record.value());
+     // Commit after every record (slow, but the smallest replay window).
+     // The committed value is the offset of the NEXT record to read, hence + 1.
+     consumer.commitSync(Collections.singletonMap(
+         new TopicPartition(record.topic(), record.partition()),
+         new OffsetAndMetadata(record.offset() + 1)
+     ));
+ }
+
+ // Manual synchronous commit (per batch)
+ for (ConsumerRecord<String, String> record : records) {
+     processOrder(record.value());
+ }
+ consumer.commitSync(); // Commit once the whole batch is processed
+
+ // Manual asynchronous commit (higher throughput; failed commits are not retried)
+ consumer.commitAsync((offsets, exception) -> {
+     if (exception != null) {
+         log.error("Commit failed: {}", offsets, exception);
+     }
+ });
+
+ // Common pattern: asynchronous commits in the loop, one synchronous commit on shutdown
+ try {
+     while (true) {
+         ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
+         for (ConsumerRecord<String, String> record : records) {
+             processOrder(record.value());
+         }
+         consumer.commitAsync(); // Asynchronous during normal operation
+     }
+ } catch (Exception e) {
+     log.error("Consumer failed", e);
+ } finally {
+     consumer.commitSync(); // Synchronous before closing, so the last offsets are committed
+     consumer.close();
+ }
440
+ ```
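
The trade-off behind per-batch commits can be illustrated with a toy in-memory log. This sketch makes no Kafka calls; `consume` and `crash_after` are hypothetical names. It shows that when a consumer crashes after processing records but before committing, the restarted consumer reprocesses them, which is at-least-once delivery.

```python
def consume(log, committed, crash_after=None):
    """Process records starting at `committed`; commit once at the end of the batch.

    Returns (processed_records, new_committed_offset). If `crash_after` is set,
    the consumer stops after that many records WITHOUT committing.
    """
    processed = []
    for offset in range(committed, len(log)):
        if crash_after is not None and len(processed) == crash_after:
            return processed, committed  # crashed before the commit
        processed.append(log[offset])
    # The committed offset is the position of the next record to read.
    return processed, len(log)


log = ["o1", "o2", "o3", "o4"]
first_run, committed = consume(log, committed=0, crash_after=2)
second_run, committed = consume(log, committed)
print(first_run)    # handled before the crash
print(second_run)   # replayed from the last committed offset
print(committed)
```

Records `o1` and `o2` are handled twice here, so the processing step (`processOrder` in the Java examples) needs to be idempotent.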
441
+
442
+ ### 3. Rebalance strategy
+
+ ```java
+ // A rebalance is triggered when:
+ // 1. A consumer joins or leaves the group
+ // 2. The partition count of a subscribed topic changes
+ // 3. A consumer misses heartbeats for session.timeout.ms
+ // 4. A consumer does not call poll() within max.poll.interval.ms
+
+ // Key settings
+ props.put("session.timeout.ms", "30000");    // Heartbeat timeout
+ props.put("heartbeat.interval.ms", "10000"); // Heartbeat interval (at most 1/3 of session.timeout.ms)
+ props.put("max.poll.interval.ms", "300000"); // Maximum gap between two poll() calls
+ props.put("max.poll.records", "500");        // Maximum records returned by one poll()
+
+ // Partition assignment strategy
+ props.put("partition.assignment.strategy",
+     "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
+ // Available strategies:
+ // RangeAssignor: contiguous ranges per topic (first in the default list); can be uneven
+ // RoundRobinAssignor: round-robin; more even
+ // StickyAssignor: keeps existing assignments where possible
+ // CooperativeStickyAssignor: incremental cooperative rebalance (recommended; avoids stop-the-world)
+
+ // Rebalance listener (used to save in-flight state)
+ consumer.subscribe(Arrays.asList("order-events"), new ConsumerRebalanceListener() {
+     @Override
+     public void onPartitionsRevoked(Collection<TopicPartition> partitions) {
+         // Before partitions are revoked: commit current offsets and save processing state
+         consumer.commitSync();
+         log.info("Partitions revoked: {}", partitions);
+     }
+
+     @Override
+     public void onPartitionsAssigned(Collection<TopicPartition> partitions) {
+         // After partitions are assigned: restore processing state
+         log.info("Partitions assigned: {}", partitions);
+     }
+ });
481
+ ```
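
A quick sanity check of these timeouts can catch configurations that cause avoidable rebalances. The sketch below is an illustrative helper, not part of any Kafka client; `per_record_ms` is an assumed worst-case processing time per record that you would measure in your own application.

```python
def check_consumer_timeouts(session_timeout_ms, heartbeat_interval_ms,
                            max_poll_interval_ms, max_poll_records,
                            per_record_ms):
    """Return a list of human-readable problems; an empty list means the settings look consistent."""
    problems = []
    if heartbeat_interval_ms * 3 > session_timeout_ms:
        problems.append("heartbeat.interval.ms should be at most 1/3 of session.timeout.ms")
    worst_case_batch_ms = max_poll_records * per_record_ms
    if worst_case_batch_ms >= max_poll_interval_ms:
        problems.append("a full batch may take longer than max.poll.interval.ms")
    return problems


ok = check_consumer_timeouts(30000, 10000, 300000, 500, per_record_ms=100)
slow = check_consumer_timeouts(30000, 10000, 300000, 500, per_record_ms=1000)
print(ok)
print(slow)
```

In the second call, 500 records at one second each would take 500 s, well beyond the 300 s poll interval, so the consumer would be removed from the group mid-batch. Lowering `max.poll.records` or raising `max.poll.interval.ms` resolves it.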
482
+
483
+ ### 4. Consumption strategies
+
+ ```java
+ // Consume from a specific offset
+ consumer.assign(Arrays.asList(new TopicPartition("order-events", 0)));
+ consumer.seek(new TopicPartition("order-events", 0), 1000L);
+
+ // Consume from a specific timestamp
+ Map<TopicPartition, Long> timestamps = new HashMap<>();
+ timestamps.put(new TopicPartition("order-events", 0),
+     Instant.parse("2026-03-01T00:00:00Z").toEpochMilli());
+ Map<TopicPartition, OffsetAndTimestamp> offsets =
+     consumer.offsetsForTimes(timestamps);
+ for (Map.Entry<TopicPartition, OffsetAndTimestamp> entry : offsets.entrySet()) {
+     // The value is null when the partition has no record at or after the timestamp
+     if (entry.getValue() != null) {
+         consumer.seek(entry.getKey(), entry.getValue().offset());
+     }
+ }
+
+ // Consume from the beginning
+ consumer.seekToBeginning(consumer.assignment());
+
+ // Consume from the end
+ consumer.seekToEnd(consumer.assignment());
+ ```
506
+
507
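+ ## Cluster Management
+
+ ### 1. Replica and ISR configuration
+
+ ```properties
+ # Broker-side configuration (comments sit on their own lines because
+ # .properties files treat a trailing "# ..." as part of the value)
+ # Default replication factor
+ default.replication.factor=3
+ # Minimum number of in-sync replicas for acks=all writes
+ min.insync.replicas=2
+ # Maximum time a follower may lag before it leaves the ISR
+ replica.lag.time.max.ms=30000
+ # Disallow unclean leader election (prevents data loss)
+ unclean.leader.election.enable=false
+
+ # Topic-level override (a shell command, not a server.properties entry)
+ kafka-configs.sh --bootstrap-server localhost:9092 \
+   --entity-type topics --entity-name order-events \
+   --alter --add-config min.insync.replicas=2
+ ```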
523
+
524
+ ### 2. The controller role
+
+ ```
+ Controller responsibilities:
+ 1. Partition leader election
+ 2. Replica state management
+ 3. Topic creation/deletion
+ 4. Handling brokers joining and leaving
+
+ ZooKeeper mode:
+ One broker in the cluster acts as the controller
+ The controller is elected through an ephemeral ZooKeeper node
+ The controller writes metadata to ZooKeeper
+
+ KRaft mode (Kafka 3.3+, recommended):
+ No dependency on ZooKeeper
+ Controllers are elected with the Raft protocol
+ Metadata is stored in Kafka's own __cluster_metadata topic
+ ```
543
+
544
+ ### 3. ZooKeeper vs KRaft
+
+ ```properties
+ # ZooKeeper mode (legacy; removed in Kafka 4.0)
+ zookeeper.connect=zk1:2181,zk2:2181,zk3:2181/kafka
+ zookeeper.connection.timeout.ms=18000
+
+ # KRaft mode (Kafka 3.3+, recommended)
+ # process.roles may also be only "broker" or only "controller"
+ process.roles=broker,controller
+ node.id=1
+ controller.quorum.voters=1@kafka1:9093,2@kafka2:9093,3@kafka3:9093
+ controller.listener.names=CONTROLLER
+ listeners=PLAINTEXT://:9092,CONTROLLER://:9093
+
+ # KRaft advantages:
+ # 1. No ZooKeeper dependency; simpler operations
+ # 2. Faster controller failover (standby controllers already hold the metadata)
+ # 3. Designed to support far more partitions per cluster (on the order of millions)
+ # 4. More efficient metadata propagation
+ ```
564
+
565
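+ **Setting up KRaft and migrating from ZooKeeper**:
+ ```bash
+ # 1. Generate a cluster ID (new clusters only; a migration reuses the existing cluster's ID)
+ kafka-storage.sh random-uuid
+
+ # 2. Format the storage directories
+ kafka-storage.sh format -t <cluster-id> -c config/kraft/server.properties
+
+ # 3. Start the KRaft node
+ kafka-server-start.sh config/kraft/server.properties
+
+ # 4. Migrating from ZooKeeper (online migration is supported from Kafka 3.6)
+ # The migration is driven by configuration and rolling restarts rather than a single command:
+ # enable zookeeper.metadata.migration.enable=true on the new KRaft controllers and on the
+ # brokers, restart the brokers in KRaft mode, then finalize as described in the official guide.
+ # A metadata snapshot can be inspected with the metadata shell:
+ kafka-metadata-shell.sh --snapshot /path/to/snapshot
+ ```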
580
+
581
+ ## Schema Management
+
+ ### 1. Schema Registry
+
+ ```
+ Producer → Schema Registry → Kafka Broker → Schema Registry → Consumer
+            (register schema)  (store messages)  (fetch schema)
+
+ Schema Registry storage layer: _schemas (an internal Kafka topic)
+ Compatibility check: on registration, a new schema is validated against the existing versions
+ ```
592
+
593
+ ### 2. Avro Schema
594
+
595
+ ```json
596
+ {
597
+ "type": "record",
598
+ "name": "OrderEvent",
599
+ "namespace": "com.example.events",
600
+ "fields": [
601
+ {"name": "orderId", "type": "string"},
602
+ {"name": "userId", "type": "string"},
603
+ {"name": "amount", "type": "double"},
604
+ {"name": "currency", "type": "string", "default": "CNY"},
605
+ {"name": "status", "type": {
606
+ "type": "enum",
607
+ "name": "OrderStatus",
608
+ "symbols": ["CREATED", "PAID", "SHIPPED", "DELIVERED", "CANCELLED"]
609
+ }},
610
+ {"name": "items", "type": {
611
+ "type": "array",
612
+ "items": {
613
+ "type": "record",
614
+ "name": "OrderItem",
615
+ "fields": [
616
+ {"name": "skuId", "type": "string"},
617
+ {"name": "quantity", "type": "int"},
618
+ {"name": "price", "type": "double"}
619
+ ]
620
+ }
621
+ }},
622
+ {"name": "createdAt", "type": {"type": "long", "logicalType": "timestamp-millis"}},
623
+ {"name": "metadata", "type": ["null", {"type": "map", "values": "string"}], "default": null}
624
+ ]
625
+ }
626
+ ```
627
+
628
+ ### 3. Protobuf Schema
629
+
630
+ ```protobuf
631
+ syntax = "proto3";
632
+ package com.example.events;
633
+
634
+ message OrderEvent {
635
+ string order_id = 1;
636
+ string user_id = 2;
637
+ double amount = 3;
638
+ string currency = 4;
639
+ OrderStatus status = 5;
640
+ repeated OrderItem items = 6;
641
+ int64 created_at = 7;
642
+ map<string, string> metadata = 8;
643
+
644
+ enum OrderStatus {
645
+ CREATED = 0;
646
+ PAID = 1;
647
+ SHIPPED = 2;
648
+ DELIVERED = 3;
649
+ CANCELLED = 4;
650
+ }
651
+
652
+ message OrderItem {
653
+ string sku_id = 1;
654
+ int32 quantity = 2;
655
+ double price = 3;
656
+ }
657
+ }
658
+ ```
659
+
660
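+ ### 4. Compatibility strategy
+
+ ```
+ Compatibility levels:
+ Level                 Meaning
+ BACKWARD              New schema reads data from the previous version (default)
+ BACKWARD_TRANSITIVE   New schema reads data from all earlier versions
+ FORWARD               Previous schema reads data written by the new schema
+ FORWARD_TRANSITIVE    All earlier schemas read data written by the new schema
+ FULL                  Backward and forward, against the previous version
+ FULL_TRANSITIVE       Backward and forward, against all earlier versions
+ NONE                  No compatibility checks (not recommended in production)
+
+ Schema evolution operations (Avro):
+ ✅ Add a field with a default value (BACKWARD compatible; also FORWARD, so FULL)
+ ✅ Remove a field that has a default value (FORWARD compatible; also BACKWARD, so FULL)
+ ✅ Add an optional field (FULL compatible)
+ ❌ Remove a required field (breaks FORWARD: old readers cannot fill in the missing field)
+ ❌ Change a field's type (breaking, unless Avro can promote the type, e.g. int → long)
+ ❌ Rename a field (breaking, unless aliases are used)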
681
+ ```
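
The field-level rules above can be checked mechanically. The sketch below is a deliberately simplified model of Avro record evolution, assuming a schema is represented as a dict from field name to "has a default value"; it ignores type changes, aliases, and nested records, and the function names are hypothetical rather than part of Schema Registry.

```python
def is_backward_compatible(old, new):
    """New schema can read old data: every ADDED field needs a default."""
    return all(new[name] for name in new if name not in old)


def is_forward_compatible(old, new):
    """Old schema can read new data: every REMOVED field needs a default in the old schema."""
    return all(old[name] for name in old if name not in new)


v1 = {"orderId": False, "currency": True}                     # True = has a default
add_optional = {"orderId": False, "currency": True, "note": True}
add_required = {"orderId": False, "currency": True, "note": False}
drop_required = {"currency": True}

for label, candidate in [("add optional", add_optional),
                         ("add required", add_required),
                         ("drop required", drop_required)]:
    print(label,
          is_backward_compatible(v1, candidate),
          is_forward_compatible(v1, candidate))
```

A change is FULL compatible in this model when both functions return `True`.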
682
+
683
+ ```bash
684
+ # Schema Registry API
685
+ # Register a schema
686
+ curl -X POST http://schema-registry:8081/subjects/order-events-value/versions \
687
+ -H "Content-Type: application/vnd.schemaregistry.v1+json" \
688
+ -d '{"schema": "{\"type\":\"record\",\"name\":\"OrderEvent\",...}"}'
689
+
690
+ # Get the compatibility level
691
+ curl http://schema-registry:8081/config/order-events-value
692
+
693
+ # Set the compatibility level
694
+ curl -X PUT http://schema-registry:8081/config/order-events-value \
695
+ -H "Content-Type: application/vnd.schemaregistry.v1+json" \
696
+ -d '{"compatibility": "FULL_TRANSITIVE"}'
697
+
698
+ # Test a schema for compatibility against the latest version
699
+ curl -X POST http://schema-registry:8081/compatibility/subjects/order-events-value/versions/latest \
700
+ -H "Content-Type: application/vnd.schemaregistry.v1+json" \
701
+ -d '{"schema": "{...}"}'
702
+ ```
703
+
704
+ ## Kafka Streams
705
+
706
+ ### 1. KStream and KTable
+
+ ```java
+ // KStream: an unbounded event stream (every record is an independent event)
+ // KTable: a changelog stream (only the latest value per key is kept, like a database table)
+
+ StreamsBuilder builder = new StreamsBuilder();
+
+ // KStream: order events
+ KStream<String, OrderEvent> orderStream = builder.stream("order-events",
+     Consumed.with(Serdes.String(), orderEventSerde));
+
+ // KTable: user information (read from a compacted topic)
+ KTable<String, UserInfo> userTable = builder.table("user-info",
+     Materialized.as("user-info-store"));
+
+ // Stream processing: filter, transform, and re-key by user ID
+ KStream<String, EnrichedOrder> enrichedOrders = orderStream
+     .filter((key, order) -> order.getAmount() > 0)
+     .mapValues(order -> EnrichedOrder.from(order))
+     .selectKey((key, order) -> order.getUserId());
+
+ // Stream-table join (attach user information)
+ KStream<String, OrderWithUser> ordersWithUser = enrichedOrders.join(
+     userTable,
+     (order, user) -> new OrderWithUser(order, user)
+ );
+
+ // The value serde must match the stream's value type (OrderWithUser);
+ // orderWithUserSerde is assumed to be a Serde<OrderWithUser> defined elsewhere
+ ordersWithUser.to("enriched-order-events",
+     Produced.with(Serdes.String(), orderWithUserSerde));
+ ```
737
+
738
+ ### 2. Windowing
+
+ ```java
+ // Tumbling window: fixed size, no overlap
+ KTable<Windowed<String>, Long> tumblingCounts = orderStream
+     .groupByKey()
+     .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5)))
+     .count(Materialized.as("tumbling-counts"));
+
+ // Hopping window: fixed size, overlapping
+ KTable<Windowed<String>, Long> hoppingCounts = orderStream
+     .groupByKey()
+     .windowedBy(TimeWindows.ofSizeAndGrace(
+             Duration.ofMinutes(5),
+             Duration.ofMinutes(1))
+         .advanceBy(Duration.ofMinutes(1)))
+     .count(Materialized.as("hopping-counts"));
+
+ // Sliding window: based on the time difference between records
+ KTable<Windowed<String>, Long> slidingCounts = orderStream
+     .groupByKey()
+     .windowedBy(SlidingWindows.ofTimeDifferenceAndGrace(
+         Duration.ofMinutes(5),
+         Duration.ofMinutes(1)))
+     .count(Materialized.as("sliding-counts"));
+
+ // Session window: based on gaps in activity
+ KTable<Windowed<String>, Long> sessionCounts = orderStream
+     .groupByKey()
+     .windowedBy(SessionWindows.ofInactivityGapAndGrace(
+         Duration.ofMinutes(30),
+         Duration.ofMinutes(5)))
+     .count(Materialized.as("session-counts"));
771
+ ```
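
To make the difference between tumbling and hopping windows concrete, the sketch below computes which windows a single record timestamp falls into. It assumes epoch-aligned windows with an inclusive start and exclusive end, which is how time windows are commonly described for Kafka Streams; the function names are hypothetical and grace periods are ignored.

```python
def tumbling_window(ts_ms, size_ms):
    """The single [start, end) window that contains the timestamp."""
    start = ts_ms - ts_ms % size_ms
    return (start, start + size_ms)


def hopping_windows(ts_ms, size_ms, advance_ms):
    """All overlapping [start, end) windows that contain the timestamp."""
    start = max(0, ts_ms - size_ms + advance_ms) // advance_ms * advance_ms
    windows = []
    while start <= ts_ms:
        windows.append((start, start + size_ms))
        start += advance_ms
    return windows


MINUTE = 60_000
ts = 7 * MINUTE + 30_000  # a record at 00:07:30
print(tumbling_window(ts, 5 * MINUTE))
print(hopping_windows(ts, 5 * MINUTE, 1 * MINUTE))
```

With a 5-minute size and a 1-minute advance, one record contributes to five hopping windows, so state size and output volume grow by roughly `size / advance` compared with tumbling windows.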
772
+
773
+ ### 3. State stores
+
+ ```java
+ // Kafka Streams uses RocksDB as its default local state store
+ // State stores are backed up to a changelog topic for fault tolerance
+
+ // Custom state store
+ StoreBuilder<KeyValueStore<String, OrderAggregate>> storeBuilder =
+     Stores.keyValueStoreBuilder(
+         Stores.persistentKeyValueStore("order-aggregate-store"),
+         Serdes.String(),
+         orderAggregateSerde
+     ).withCachingEnabled()
+      .withLoggingEnabled(new HashMap<>()); // Enable the changelog
+
+ builder.addStateStore(storeBuilder);
+
+ // Use the state store inside a Processor
+ orderStream.process(() -> new Processor<String, OrderEvent, String, OrderAggregate>() {
+     private ProcessorContext<String, OrderAggregate> context;
+     private KeyValueStore<String, OrderAggregate> store;
+
+     @Override
+     public void init(ProcessorContext<String, OrderAggregate> context) {
+         // Keep the context so process() can forward records downstream
+         this.context = context;
+         this.store = context.getStateStore("order-aggregate-store");
+     }
+
+     @Override
+     public void process(Record<String, OrderEvent> record) {
+         OrderAggregate agg = store.get(record.key());
+         if (agg == null) agg = new OrderAggregate();
+         agg.add(record.value());
+         store.put(record.key(), agg);
+         context.forward(record.withValue(agg));
+     }
+ }, "order-aggregate-store");
+ ```
809
+
810
+ ## Kafka Connect
811
+
812
+ ### 1. Source connector (data import)
813
+
814
+ ```json
815
+ {
816
+ "name": "mysql-source-connector",
817
+ "config": {
818
+ "connector.class": "io.debezium.connector.mysql.MySqlConnector",
819
+ "tasks.max": "1",
820
+ "database.hostname": "mysql-host",
821
+ "database.port": "3306",
822
+ "database.user": "debezium",
823
+ "database.password": "${env:MYSQL_PASSWORD}",
824
+ "database.server.id": "184054",
825
+ "topic.prefix": "cdc-mysql",
826
+ "database.include.list": "ecommerce",
827
+ "table.include.list": "ecommerce.orders,ecommerce.users",
828
+ "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
829
+ "schema.history.internal.kafka.topic": "schema-history.ecommerce",
830
+ "include.schema.changes": "true",
831
+ "snapshot.mode": "initial",
832
+ "transforms": "route",
833
+ "transforms.route.type": "org.apache.kafka.connect.transforms.RegexRouter",
834
+ "transforms.route.regex": "cdc-mysql\\.ecommerce\\.(.*)",
835
+ "transforms.route.replacement": "cdc.$1"
836
+ }
837
+ }
838
+ ```
839
+
840
+ ### 2. Sink connector (data export)
841
+
842
+ ```json
843
+ {
844
+ "name": "elasticsearch-sink-connector",
845
+ "config": {
846
+ "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
847
+ "tasks.max": "3",
848
+ "topics": "order-events,user-events",
849
+ "connection.url": "http://elasticsearch:9200",
850
+ "type.name": "_doc",
851
+ "key.ignore": "false",
852
+ "schema.ignore": "true",
853
+ "behavior.on.null.values": "delete",
854
+ "write.method": "upsert",
855
+ "transforms": "extractKey,timestampRouter",
856
+ "transforms.extractKey.type": "org.apache.kafka.connect.transforms.ExtractField$Key",
857
+ "transforms.extractKey.field": "id",
858
+ "transforms.timestampRouter.type": "org.apache.kafka.connect.transforms.TimestampRouter",
859
+ "transforms.timestampRouter.topic.format": "${topic}-${timestamp}",
860
+ "transforms.timestampRouter.timestamp.format": "yyyyMMdd"
861
+ }
862
+ }
863
+ ```
864
+
865
+ ### 3. Connect REST API
+
+ ```bash
+ # List installed connector plugins
+ curl http://connect:8083/connector-plugins | jq
+
+ # Create a connector
+ curl -X POST http://connect:8083/connectors \
+   -H "Content-Type: application/json" \
+   -d @mysql-source-connector.json
+
+ # Check connector status
+ curl http://connect:8083/connectors/mysql-source-connector/status | jq
+
+ # Pause / resume
+ curl -X PUT http://connect:8083/connectors/mysql-source-connector/pause
+ curl -X PUT http://connect:8083/connectors/mysql-source-connector/resume
+
+ # Restart the connector
+ curl -X POST http://connect:8083/connectors/mysql-source-connector/restart
+
+ # Restart a single task
+ curl -X POST http://connect:8083/connectors/mysql-source-connector/tasks/0/restart
+
+ # Delete the connector
+ curl -X DELETE http://connect:8083/connectors/mysql-source-connector
+ ```
892
+
893
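+ ## Performance Tuning
+
+ ### 1. Planning the partition count
+
+ ```
+ Partition count formula:
+ target throughput / min(per-partition producer throughput, per-partition consumer throughput)
+
+ Example:
+ Target: 100 MB/s
+ Producer, per partition: 20 MB/s
+ Consumer, per partition: 10 MB/s
+ Partitions = 100 / 10 = 10 (the consumer is the bottleneck)
+
+ Rules of thumb:
+ - Small (< 10 MB/s): 6-12 partitions
+ - Medium (10-100 MB/s): 12-64 partitions
+ - Large (> 100 MB/s): 64-256 partitions
+ - Upper bound: every partition costs broker memory and open file handles (segment and index files per replica)
+
+ Note: the partition count can only be increased, never decreased, so plan with headroom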
914
+ ```
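
The formula translates directly into code. This is a minimal sketch with a hypothetical function name; the per-partition throughput figures are inputs you would measure in your own environment, and the result is rounded up because a fractional partition is not possible.

```python
import math


def required_partitions(target_mb_s, producer_mb_s_per_partition,
                        consumer_mb_s_per_partition):
    """Partitions needed so that the slower side can still reach the target throughput."""
    bottleneck = min(producer_mb_s_per_partition, consumer_mb_s_per_partition)
    return math.ceil(target_mb_s / bottleneck)


print(required_partitions(100, 20, 10))  # the worked example from the text
print(required_partitions(100, 20, 15))  # a non-integer ratio is rounded up
```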
915
+
916
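+ ### 2. Producer tuning
+
+ ```properties
+ # Comments sit on their own lines: .properties files treat a trailing "# ..." as part of the value.
+
+ # Batching (the most important optimization)
+ # 128 KB (the 16 KB default is small for high-throughput workloads)
+ batch.size=131072
+ # Wait up to 20 ms for a batch to fill
+ linger.ms=20
+
+ # Compression: LZ4 is fast with a moderate compression ratio
+ compression.type=lz4
+
+ # Buffering
+ # 128 MB send buffer
+ buffer.memory=134217728
+ # Maximum time send() blocks when the buffer is full
+ max.block.ms=60000
+
+ # Network
+ # TCP send buffer
+ send.buffer.bytes=131072
+ # TCP receive buffer
+ receive.buffer.bytes=65536
+
+ # Requests
+ # Maximum size of a single request (10 MB)
+ max.request.size=10485760
+ # Request timeout (30 s)
+ request.timeout.ms=30000
+ # Overall delivery timeout (120 s)
+ delivery.timeout.ms=120000
+ ```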
939
+
940
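+ ### 3. Consumer tuning
+
+ ```properties
+ # Fetching
+ # Fetch at least 1 MB per request (fewer requests)
+ fetch.min.bytes=1048576
+ # Fetch at most 50 MB per request
+ fetch.max.bytes=52428800
+ # Wait at most 500 ms for fetch.min.bytes to accumulate
+ fetch.max.wait.ms=500
+ # Fetch at most 10 MB per partition
+ max.partition.fetch.bytes=10485760
+
+ # Batch size: maximum records returned by one poll()
+ max.poll.records=1000
+
+ # Parallelism: consumers beyond the partition count sit idle, so a 1:1 mapping gives the maximum parallelism
+ ```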
954
+
955
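+ ### 4. Broker tuning
+
+ ```properties
+ # Zero-copy (sendfile system call, used by default for plaintext traffic)
+ # Data moves from the page cache to the network socket without passing through user space
+
+ # Page cache
+ # Kafka relies on the OS page cache rather than the JVM heap
+ # Suggestion: leave 25-50% of physical memory for the page cache
+ # A 6-8 GB JVM heap is usually enough (do not oversize it)
+ # Shell environment variable, set before starting the broker:
+ export KAFKA_HEAP_OPTS="-Xms6g -Xmx6g"
+
+ # Disks: multiple log directories allow parallel writes
+ log.dirs=/data1/kafka-logs,/data2/kafka-logs
+ # Explicit flush intervals. Kafka's documentation generally recommends leaving these unset
+ # and relying on replication plus the page cache for durability.
+ log.flush.interval.messages=10000
+ log.flush.interval.ms=1000
+
+ # Threads
+ # Network I/O threads
+ num.network.threads=8
+ # Disk I/O threads
+ num.io.threads=16
+ # Replica fetcher threads
+ num.replica.fetchers=4
+
+ # Log segments
+ # 1 GB segment files
+ log.segment.bytes=1073741824
+ # Index interval
+ log.index.interval.bytes=4096
+ ```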
981
+
982
+ ## Monitoring
+
+ ### 1. Core JMX metrics
+
+ ```
+ Broker metrics:
+ MBean | Description
+ kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec | Incoming message rate (messages/s)
+ kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec | Incoming byte rate (B/s)
+ kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec | Outgoing byte rate (B/s)
+ kafka.server:type=ReplicaManager,name=IsrShrinksPerSec | ISR shrink rate (frequent shrinks are unhealthy)
+ kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions | Number of under-replicated partitions
+ kafka.controller:type=KafkaController,name=ActiveControllerCount | Active controllers (should sum to 1 across the cluster)
+ kafka.server:type=ReplicaManager,name=LeaderCount | Number of leader partitions on the broker
+ kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce | Produce request latency (ms)
+ kafka.log:type=LogFlushStats,name=LogFlushRateAndTimeMs | Log flush rate and time
+
+ Producer metrics:
+ record-send-rate: records sent per second
+ record-error-rate: failed sends per second
+ request-latency-avg: average request latency
+ batch-size-avg: average batch size
+ compression-rate-avg: average compression rate
+
+ Consumer metrics:
+ records-consumed-rate: records consumed per second
+ records-lag-max: maximum consumer lag (records)
+ fetch-latency-avg: average fetch latency
+ commit-latency-avg: average commit latency
+ ```
1024
+
1025
+ ### 2. Lag monitoring
+
+ ```bash
+ # Inspect consumer lag from the command line
+ kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
+   --describe --group order-processing-group
+
+ # Sample output:
1033
+ # GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG
1034
+ # order-processing-group order-events 0 1000 1050 50
1035
+ # order-processing-group order-events 1 2000 2100 100
1036
+ # order-processing-group order-events 2 3000 3010 10
1037
+ ```
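
For quick scripting, output like the sample above can be turned into per-partition lag numbers. The sketch below parses only the whitespace-separated columns shown in the sample; real `kafka-consumer-groups.sh` output includes additional columns (such as consumer ID and host) and may print `-` for unknown values, so treat `parse_lag` as a hypothetical helper to adapt rather than a complete parser.

```python
SAMPLE_OUTPUT = """\
GROUP                  TOPIC        PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG
order-processing-group order-events 0         1000           1050           50
order-processing-group order-events 1         2000           2100           100
order-processing-group order-events 2         3000           3010           10
"""


def parse_lag(output):
    """Return {partition: lag} from the tabular output, located by header names."""
    rows = [line.split() for line in output.strip().splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    partition_col = header.index("PARTITION")
    lag_col = header.index("LAG")
    return {int(row[partition_col]): int(row[lag_col]) for row in data}


lag_by_partition = parse_lag(SAMPLE_OUTPUT)
total_lag = sum(lag_by_partition.values())
print(lag_by_partition, total_lag)
```

Locating columns by header name rather than by position keeps the helper working if extra columns appear between the ones it reads.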
1038
+
1039
+ **Burrow configuration**:
+ ```toml
1041
+ # burrow.toml
1042
+ [general]
1043
+ access-control-allow-origin = "*"
1044
+
1045
+ [zookeeper]
1046
+ servers = ["zk1:2181", "zk2:2181", "zk3:2181"]
1047
+
1048
+ [cluster.production]
1049
+ class-name = "kafka"
1050
+ servers = ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
1051
+ topic-refresh = 60
1052
+ offset-refresh = 30
1053
+
1054
+ [consumer.production]
1055
+ class-name = "kafka"
1056
+ cluster = "production"
1057
+ servers = ["kafka1:9092", "kafka2:9092", "kafka3:9092"]
1058
+ group-denylist = "^console-consumer-"
1059
+ offset-refresh = 30
1060
+
1061
+ [notifier.slack]
1062
+ class-name = "http"
1063
+ url-open = "https://hooks.slack.com/services/xxx"
1064
+ template-open = "burrow-alert.tmpl"
1065
+ send-close = true
1066
+ interval = 60
1067
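+ threshold = 2 # Notify only at WARN or above
+
+ # Burrow group statuses:
+ # OK: lag is stable or shrinking
+ # WARN: offsets are being committed but lag keeps growing
+ # STALL: offsets are being committed but do not advance while lag is non-zero
+ # STOP: the consumer has stopped committing offsets
+ # ERR: at least one partition is in a STOP, STALL, or REWIND state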
1074
+ ```
1075
+
1076
+ ### 3. Monitoring with Prometheus + Grafana
1077
+
1078
+ ```yaml
1079
+ # docker-compose.yml - JMX Exporter
1080
+ services:
1081
+ kafka:
1082
+ environment:
1083
+ KAFKA_JMX_OPTS: >-
1084
+ -Dcom.sun.management.jmxremote
1085
+ -Dcom.sun.management.jmxremote.port=9999
1086
+ -Dcom.sun.management.jmxremote.authenticate=false
1087
+ -Dcom.sun.management.jmxremote.ssl=false
1088
+ EXTRA_ARGS: >-
1089
+ -javaagent:/opt/jmx-exporter/jmx_prometheus_javaagent.jar=7071:/opt/jmx-exporter/kafka-broker.yml
1090
+
1091
+ # prometheus.yml
1092
+ scrape_configs:
1093
+ - job_name: 'kafka'
1094
+ static_configs:
1095
+ - targets: ['kafka1:7071', 'kafka2:7071', 'kafka3:7071']
1096
+ ```
1097
+
1098
+ ## Security
+
+ ### 1. SASL authentication
+
+ ```properties
+ # Broker configuration (SASL/SCRAM)
1104
+ listeners=SASL_SSL://0.0.0.0:9093
1105
+ advertised.listeners=SASL_SSL://kafka-broker:9093
1106
+ security.inter.broker.protocol=SASL_SSL
1107
+ sasl.mechanism.inter.broker.protocol=SCRAM-SHA-512
1108
+ sasl.enabled.mechanisms=SCRAM-SHA-512
1109
+
1110
+ # Create SCRAM users (shell commands; replace the example passwords)
1111
+ kafka-configs.sh --bootstrap-server localhost:9092 \
1112
+ --alter --add-config 'SCRAM-SHA-512=[password=secret123]' \
1113
+ --entity-type users --entity-name producer-user
1114
+
1115
+ kafka-configs.sh --bootstrap-server localhost:9092 \
1116
+ --alter --add-config 'SCRAM-SHA-512=[password=secret456]' \
1117
+ --entity-type users --entity-name consumer-user
1118
+ ```
1119
+
1120
+ ### 2. SSL/TLS encryption
+
+ ```bash
+ # "changeit" is a placeholder password; use strong, distinct passwords in practice
+
+ # Generate the CA certificate
+ openssl req -new -x509 -keyout ca-key -out ca-cert -days 3650 \
+   -subj "/CN=KafkaCA" -nodes
+
+ # Generate a keystore for each broker
+ keytool -keystore kafka-broker.keystore.jks -alias broker \
+   -genkey -keyalg RSA -validity 3650 \
+   -dname "CN=kafka-broker,OU=Kafka,O=Example,L=BJ,ST=BJ,C=CN" \
+   -storepass changeit -keypass changeit
+
+ # Create a signing request and sign it with the CA
+ keytool -keystore kafka-broker.keystore.jks -alias broker \
+   -certreq -file cert-file -storepass changeit
+ openssl x509 -req -CA ca-cert -CAkey ca-key -in cert-file \
+   -out cert-signed -days 3650 -CAcreateserial
+
+ # Import the CA certificate and the signed certificate
+ keytool -keystore kafka-broker.keystore.jks -alias CARoot \
+   -import -file ca-cert -storepass changeit -noprompt
+ keytool -keystore kafka-broker.keystore.jks -alias broker \
+   -import -file cert-signed -storepass changeit
+
+ # Create the truststore
+ keytool -keystore kafka.truststore.jks -alias CARoot \
+   -import -file ca-cert -storepass changeit -noprompt
+ ```
1149
+
1150
+ ```properties
1151
+ # Broker SSL configuration
1152
+ ssl.keystore.location=/etc/kafka/ssl/kafka-broker.keystore.jks
1153
+ ssl.keystore.password=changeit
1154
+ ssl.key.password=changeit
1155
+ ssl.truststore.location=/etc/kafka/ssl/kafka.truststore.jks
1156
+ ssl.truststore.password=changeit
1157
+ ssl.client.auth=required
1158
+ ssl.endpoint.identification.algorithm=https
1159
+ ```
1160
+
1161
+ ### 3. ACL授权
1162
+
1163
+ ```bash
1164
+ # Grant the producer write access
+ kafka-acls.sh --bootstrap-server localhost:9092 \
+ --add --allow-principal User:producer-user \
+ --operation Write --topic order-events
+
+ # Grant the consumer read access
+ kafka-acls.sh --bootstrap-server localhost:9092 \
+ --add --allow-principal User:consumer-user \
+ --operation Read --topic order-events \
+ --group order-processing-group
+
+ # Grant access to the consumer group
+ kafka-acls.sh --bootstrap-server localhost:9092 \
+ --add --allow-principal User:consumer-user \
+ --operation Read --group order-processing-group
+
+ # List ACLs
+ kafka-acls.sh --bootstrap-server localhost:9092 \
+ --list --topic order-events
+
+ # Remove an ACL
+ kafka-acls.sh --bootstrap-server localhost:9092 \
+ --remove --allow-principal User:producer-user \
+ --operation Write --topic order-events
+
+ # Pattern-based grant (prefix match)
+ kafka-acls.sh --bootstrap-server localhost:9092 \
+ --add --allow-principal User:analytics-user \
+ --operation Read --topic order- --resource-pattern-type prefixed
1193
+ ```
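The prefixed grant above covers every topic whose name starts with `order-`. The following is a minimal Python sketch of that name-matching rule only. The function name `pattern_matches` is hypothetical, and principals, hosts, and deny rules are deliberately not modeled.

```python
def pattern_matches(pattern_type: str, pattern_name: str, topic: str) -> bool:
    """Return True if an ACL resource pattern covers the given topic name."""
    if pattern_type == "literal":
        # A literal "*" acts as the wildcard; any other literal must match exactly.
        return pattern_name == "*" or pattern_name == topic
    if pattern_type == "prefixed":
        # A prefixed pattern covers every resource whose name starts with it.
        return topic.startswith(pattern_name)
    raise ValueError(f"unsupported pattern type: {pattern_type}")


if __name__ == "__main__":
    for name in ["order-events", "order-refunds", "user-events"]:
        print(name, pattern_matches("prefixed", "order-", name))
```

Because a prefix silently covers topics created later, a sketch like this can help review which existing topic names a planned grant would expose before the ACL is applied.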
1194
+
1195
+ ## Operations
+
+ ### 1. Scaling Out and In
1198
+
1199
+ ```bash
1200
+ # Scale out: after adding new brokers, reassign partitions
+ # 1. Generate a reassignment plan
+ kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
+ --topics-to-move-json-file topics.json \
+ --broker-list "0,1,2,3" \
+ --generate
+
+ # topics.json
+ # {"topics": [{"topic": "order-events"}], "version": 1}
+
+ # 2. Execute the reassignment (save the proposed plan from step 1 as reassignment.json)
+ kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
+ --reassignment-json-file reassignment.json \
+ --execute \
+ --throttle 50000000 # Throttle to 50 MB/s to limit impact on production traffic
+
+ # 3. Verify the reassignment status (this also removes the throttle once the move has completed)
+ kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
+ --reassignment-json-file reassignment.json \
+ --verify
+
+ # Scale in: move partitions to other brokers first, then take the broker offline
+ # Make sure the broker being removed no longer hosts any partition leaders
1223
+ ```
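When choosing a `--throttle` value it helps to know roughly how long the move will take. The sketch below is a back-of-the-envelope estimate, assuming the throttle is the bottleneck and ignoring new writes that arrive during the move. The function name and the 500 GB figure are illustrative assumptions, not values from this document.

```python
def estimate_reassignment_seconds(bytes_to_move: int, throttle_bytes_per_sec: int) -> float:
    """Lower-bound duration of a reassignment limited only by the throttle."""
    if throttle_bytes_per_sec <= 0:
        raise ValueError("throttle must be positive")
    return bytes_to_move / throttle_bytes_per_sec


if __name__ == "__main__":
    # Hypothetical example: 500 GB of replica data at the 50 MB/s throttle used above.
    seconds = estimate_reassignment_seconds(500 * 10**9, 50_000_000)
    print(f"at least {seconds / 3600:.1f} hours")
```

If the estimate is longer than the maintenance window, the throttle can be raised or the reassignment split into smaller batches of partitions.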
1224
+
1225
+ ### 2. Data Retention Policies
1226
+
1227
+ ```properties
1228
+ # Time-based retention
+ # If several are set, log.retention.ms overrides log.retention.minutes, which overrides log.retention.hours
+ # Keep data for 7 days (the default)
+ log.retention.hours=168
+ # Minute-level granularity
+ log.retention.minutes=10080
+ # Millisecond-level granularity
+ log.retention.ms=604800000
+
+ # Size-based retention: keep up to 100 GB per partition (-1 means no size limit)
+ log.retention.bytes=107374182400
+
+ # Log compaction: keep only the latest value for each key
+ log.cleanup.policy=compact
+ # Minimum compaction lag: 24 h
+ log.cleaner.min.compaction.lag.ms=86400000
+ # Keep tombstone records for 24 h
+ log.cleaner.delete.retention.ms=86400000
+
+ # Combined policy (compaction plus time/size-based deletion), as an alternative to compact alone
+ log.cleanup.policy=compact,delete
+
+ # Topic-level override (a shell command, not a broker property): 30 days
+ kafka-configs.sh --bootstrap-server localhost:9092 \
+ --entity-type topics --entity-name order-events \
+ --alter --add-config retention.ms=2592000000
1249
+ ```
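The time-based settings overlap, so it is worth being explicit about which one wins. The following Python sketch models the precedence (topic-level `retention.ms` first, then `log.retention.ms`, `log.retention.minutes`, and `log.retention.hours`). The function name `effective_retention_ms` is hypothetical, and the sketch ignores size-based retention.

```python
from typing import Mapping, Optional

DEFAULT_RETENTION_HOURS = 168  # broker default: 7 days


def effective_retention_ms(broker_cfg: Mapping[str, int],
                           topic_retention_ms: Optional[int] = None) -> int:
    """Resolve the time-based retention that applies to a topic, in milliseconds."""
    if topic_retention_ms is not None:
        # A topic-level retention.ms overrides every broker-level default.
        return topic_retention_ms
    if "log.retention.ms" in broker_cfg:
        return broker_cfg["log.retention.ms"]
    if "log.retention.minutes" in broker_cfg:
        return broker_cfg["log.retention.minutes"] * 60 * 1000
    hours = broker_cfg.get("log.retention.hours", DEFAULT_RETENTION_HOURS)
    return hours * 60 * 60 * 1000


if __name__ == "__main__":
    broker = {"log.retention.hours": 168, "log.retention.minutes": 10080}
    print(effective_retention_ms(broker))              # broker-level default
    print(effective_retention_ms(broker, 2592000000))  # topic override (30 days)
```

All three broker-level values in the configuration above resolve to the same 7 days, which is why setting only one of them is usually enough.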
1250
+
1251
+ ### 3. Cross-Datacenter Replication (MirrorMaker 2)
1252
+
1253
+ ```properties
1254
+ # mm2.properties (MirrorMaker 2 configuration)
+ clusters = source, target
+
+ source.bootstrap.servers = dc1-kafka1:9092,dc1-kafka2:9092
+ target.bootstrap.servers = dc2-kafka1:9092,dc2-kafka2:9092
+
+ # Replication flow
+ source->target.enabled = true
+ source->target.topics = order-events,user-events,payment-events
+ source->target.topics.exclude = .*-internal,__.*
+ source->target.groups = order-processing-group,analytics-group
+
+ # Replication factors for replicated and internal topics
+ replication.factor = 3
+ offset-syncs.topic.replication.factor = 3
+ heartbeats.topic.replication.factor = 3
+ checkpoints.topic.replication.factor = 3
+
+ # Performance
+ tasks.max = 4
+ producer.buffer.memory = 134217728
+ consumer.fetch.max.bytes = 52428800
+
+ # Offset sync (preserves consumer positions across a failover)
+ sync.group.offsets.enabled = true
+ sync.group.offsets.interval.seconds = 10
+ emit.checkpoints.enabled = true
+ emit.checkpoints.interval.seconds = 30
1282
+ ```
1283
+
1284
+ ```bash
1285
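+ # Start MirrorMaker 2
+ connect-mirror-maker.sh mm2.properties
+
+ # Disaster recovery failover:
+ # 1. Stop the producers on the source cluster
+ # 2. Wait for MirrorMaker 2 to finish syncing (check the checkpoint lag)
+ # 3. Point the consumers at the target cluster
+ # 4. Resume consumption from the synced offsets
+ # 5. Start the producers on the target cluster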
1294
+ ```
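Two details of the flow above are easy to miss during a failover: which topics the include and exclude lists actually select, and what the replicated topics are called on the target cluster. The Python sketch below models both, assuming full-string regex matching for the filters and the default replication policy, which prefixes remote topics with the source cluster alias. The function names are hypothetical.

```python
import re
from typing import Iterable, List


def select_topics(all_topics: Iterable[str], include: List[str], exclude: List[str]) -> List[str]:
    """Keep topics that fully match an include pattern and no exclude pattern."""
    inc = [re.compile(p) for p in include]
    exc = [re.compile(p) for p in exclude]
    return [
        t for t in all_topics
        if any(p.fullmatch(t) for p in inc) and not any(p.fullmatch(t) for p in exc)
    ]


def remote_topic_name(source_alias: str, topic: str, separator: str = ".") -> str:
    """Name of the replicated topic on the target under the default policy (assumed)."""
    return f"{source_alias}{separator}{topic}"


if __name__ == "__main__":
    topics = ["order-events", "user-events", "payment-events",
              "__consumer_offsets", "audit-internal"]
    selected = select_topics(
        topics,
        include=["order-events", "user-events", "payment-events"],
        exclude=[r".*-internal", r"__.*"],
    )
    for t in selected:
        print(t, "->", remote_topic_name("source", t))
```

Under that naming, consumers moved to the target cluster in step 3 would subscribe to `source.order-events` rather than `order-events`, unless a different replication policy is configured.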
1295
+
1296
+ ### 4. Topic Migration
1297
+
1298
+ ```bash
1299
+ # Re-elect partition leaders (preferred replica election)
+ kafka-leader-election.sh --bootstrap-server localhost:9092 \
+ --election-type preferred \
+ --all-topic-partitions
+
+ # Increase a topic's replication factor
+ # 1. Write a reassignment JSON that lists the full replica set for each partition
1306
+ cat > increase-rf.json << 'EOF'
1307
+ {
1308
+ "version": 1,
1309
+ "partitions": [
1310
+ {"topic": "order-events", "partition": 0, "replicas": [0, 1, 2]},
1311
+ {"topic": "order-events", "partition": 1, "replicas": [1, 2, 0]},
1312
+ {"topic": "order-events", "partition": 2, "replicas": [2, 0, 1]}
1313
+ ]
1314
+ }
1315
+ EOF
1316
+
1317
+ # 2. Execute
1318
+ kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
1319
+ --reassignment-json-file increase-rf.json \
1320
+ --execute --throttle 50000000
1321
+ ```
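Writing the replica lists by hand does not scale beyond a few partitions. The following Python sketch generates the same JSON shape with a simple round-robin layout. The function name `build_reassignment` is hypothetical, and the layout is an assumption: a real plan should normally keep each partition's existing replicas first, since the first replica in the list is the preferred leader.

```python
import json
from typing import Dict, List


def build_reassignment(topic: str, partitions: int, brokers: List[int],
                       replication_factor: int) -> Dict:
    """Build a reassignment plan that spreads replicas round-robin over brokers."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    plan = []
    for p in range(partitions):
        # Rotate the starting broker per partition so leaders are spread evenly.
        replicas = [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        plan.append({"topic": topic, "partition": p, "replicas": replicas})
    return {"version": 1, "partitions": plan}


if __name__ == "__main__":
    plan = build_reassignment("order-events", partitions=3, brokers=[0, 1, 2],
                              replication_factor=3)
    print(json.dumps(plan, indent=2))
```

For three partitions on brokers 0 to 2 this reproduces the replica lists in `increase-rf.json` above.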
1322
+
1323
+ ## Common Pitfalls
+
+ ### 1. Consumer Lag Spikes
1326
+
1327
+ ```
1328
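+ Symptom: consumer lag keeps growing; consumption cannot keep up with production
+ Causes:
+ - Slow processing logic (slow database queries, timeouts on external calls)
+ - Too few consumers (fewer than the number of partitions)
+ - GC pauses stalling consumption
+ - max.poll.records too large, so processing one batch exceeds max.poll.interval.ms
+
+ Troubleshooting steps:
+ 1. Run kafka-consumer-groups.sh --describe to check the lag of each partition
+ 2. Check the consumer logs for processing errors or timeouts
+ 3. Monitor JVM GC activity on the consumers
+ 4. Check the response times of downstream dependencies (DB, cache, HTTP)
+
+ Solutions:
+ ✅ Add consumer instances (no more than the number of partitions)
+ ✅ Lower max.poll.records and raise max.poll.interval.ms
+ ✅ Process asynchronously: hand polled records to a local queue and process them with multiple threads
+ ✅ Optimize downstream calls (batched DB writes, connection pooling, caching)
+ ❌ Blindly adding partitions (this only helps if consumers are added as well)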
1347
+ ```
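The per-partition lag reported by `kafka-consumer-groups.sh --describe` is the log end offset minus the committed offset. A minimal Python sketch of that calculation, plus a simple check for "lag keeps growing", follows. The function names are hypothetical, and treating a partition with no committed offset as starting from 0 is a simplifying assumption.

```python
from typing import Dict, List


def partition_lag(log_end_offsets: Dict[int, int],
                  committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition: log end offset minus committed offset, never negative."""
    lag = {}
    for partition, end in log_end_offsets.items():
        committed = committed_offsets.get(partition, 0)  # assumption: no commit means 0
        lag[partition] = max(end - committed, 0)
    return lag


def is_lag_growing(samples: List[int]) -> bool:
    """True if every sample of total lag is strictly larger than the previous one."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


if __name__ == "__main__":
    lag = partition_lag({0: 1200, 1: 800, 2: 950}, {0: 1150, 1: 800, 2: 400})
    print(lag, "total:", sum(lag.values()))
    print(is_lag_growing([100, 450, 900]))
```

Lag concentrated in one partition usually points at a hot key or a stuck consumer, whereas lag that grows evenly across partitions suggests overall throughput is too low.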
1348
+
1349
+ ### 2. Rebalance Storms
1350
+
1351
+ ```
1352
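+ Symptom: consumers trigger rebalances frequently and consumption almost stalls
+ Causes:
+ - session.timeout.ms too short, so missed heartbeats trigger a rebalance
+ - max.poll.interval.ms too short, so slow processing gets the consumer removed from the group
+ - Consumers starting and stopping frequently (for example, Kubernetes pods restarting)
+ - GC pauses longer than session.timeout.ms
+
+ Solutions:
+ ✅ Use CooperativeStickyAssignor (incremental rebalancing)
+ ✅ Raise session.timeout.ms (30-60 s)
+ ✅ Raise max.poll.interval.ms (5-10 min)
+ ✅ Lower max.poll.records so processing time stays bounded
+ ✅ Set group.instance.id to enable static membership (restarts no longer trigger a rebalance)
+
+ # Static membership configuration (Kafka 2.3+)
+ # Must be unique per instance
+ group.instance.id=consumer-host-1
+ # Can be longer, because a static member leaving does not trigger an immediate rebalance
+ session.timeout.ms=60000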
1369
+ ```
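The advice to lower `max.poll.records` can be made concrete: one batch must finish well within `max.poll.interval.ms`. The Python sketch below derives an upper bound from a measured worst-case per-record processing time. The function name and the 50% safety factor are assumptions for illustration.

```python
def max_safe_poll_records(max_poll_interval_ms: int,
                          worst_case_ms_per_record: float,
                          safety_factor: float = 0.5) -> int:
    """Largest batch size whose worst-case processing time fits the poll interval."""
    if worst_case_ms_per_record <= 0:
        raise ValueError("per-record time must be positive")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    budget_ms = max_poll_interval_ms * safety_factor
    return max(1, int(budget_ms / worst_case_ms_per_record))


if __name__ == "__main__":
    # 5-minute poll interval, 200 ms worst case per record, half the interval as budget.
    print(max_safe_poll_records(300_000, 200))
```

If the result is smaller than the current `max.poll.records`, either lower that setting or raise `max.poll.interval.ms`, as listed in the solutions above.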
1370
+
1371
+ ### 3. Too Many Partitions
1372
+
1373
+ ```
1374
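+ Symptom: high broker memory usage, slow controller failover, higher end-to-end latency
+ Cause: far more partitions than the actual throughput requires
+
+ Impact:
+ - Each partition uses roughly 1 MB of broker memory (metadata + indexes)
+ - Controller failure recovery time grows in proportion to the number of partitions
+ - More open file handles (at least 2-3 files per partition: the log segment plus its index files)
+ - More producer memory (one RecordBatch buffer per partition)
+
+ Recommendations:
+ ✅ Keep the total partition count per cluster below 200,000 (KRaft mode can support more)
+ ✅ Keep the partition count per broker below 4,000
+ ✅ Plan from actual throughput requirements and leave 20-30% headroom
+ ❌ Do not blindly create a large number of partitions (the partition count can be increased but never decreased)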
1388
+ ```
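Planning from throughput with 20-30% headroom can be turned into a small calculation. The sketch below uses a common sizing heuristic, which is an assumption here rather than something this document states: take the target throughput divided by the measured per-partition producer and consumer throughput, use the larger of the two, and add headroom. The function name is hypothetical.

```python
import math


def plan_partitions(target_mb_s: float,
                    producer_mb_s_per_partition: float,
                    consumer_mb_s_per_partition: float,
                    headroom: float = 0.25) -> int:
    """Partition count needed for a target throughput, with headroom added."""
    if min(producer_mb_s_per_partition, consumer_mb_s_per_partition) <= 0:
        raise ValueError("per-partition throughput must be positive")
    base = max(target_mb_s / producer_mb_s_per_partition,
               target_mb_s / consumer_mb_s_per_partition)
    return math.ceil(base * (1 + headroom))


if __name__ == "__main__":
    # Hypothetical measurements: 10 MB/s per partition when producing, 5 MB/s when consuming.
    print(plan_partitions(target_mb_s=100, producer_mb_s_per_partition=10,
                          consumer_mb_s_per_partition=5))
```

The per-partition figures should come from a benchmark on the actual hardware and message sizes, since they vary widely between deployments.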
1389
+
1390
+ ### 4. Message Loss
1391
+
1392
+ ```
1393
+ Scenario 1: loss on the producer side
+ Cause: acks=0, or acks=1 and the leader crashes
+ Fix: acks=all + min.insync.replicas=2 + retries=Integer.MAX_VALUE
+
+ Scenario 2: loss on the broker side
+ Cause: unclean.leader.election.enable=true lets a replica outside the ISR become leader
+ Fix: unclean.leader.election.enable=false
+
+ Scenario 3: loss on the consumer side
+ Cause: the offset is auto-committed and processing then fails
+ Fix: commit manually, processing first and committing afterwards (at-least-once)
+
+ Recommended production settings to prevent loss:
+ # Producer
+ acks=all
+ retries=2147483647
+ enable.idempotence=true
+ max.in.flight.requests.per.connection=5
+
+ # Broker
+ min.insync.replicas=2
+ unclean.leader.election.enable=false
+ default.replication.factor=3
+
+ # Consumer
+ enable.auto.commit=false
+ # Manual commit: process first, then commit
1420
+ ```
1421
+
1422
+ ### 5. Duplicate Consumption
1423
+
1424
+ ```
1425
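+ Scenario: a consumer finishes processing a message but crashes before committing the offset, then reprocesses it after restarting
+ Cause: this is expected behavior under at-least-once semantics
+
+ Solutions:
+ 1. Idempotent consumption (recommended)
+ - Deduplicate on a unique ID carried in the message (orderId)
+ - Use UPSERT / ON CONFLICT when inserting into the database
+ - Record processed message IDs with Redis SETNX
+
+ 2. Exactly-once semantics
+ - Kafka transactions (consume-transform-produce pipelines)
+ - Write the offset and the business data in the same transaction (for example, in the same database)
+
+ 3. Consumed-offset deduplication table
+ CREATE TABLE consumed_offsets (
+ consumer_group VARCHAR(255),
+ topic VARCHAR(255),
+ partition_id INT,
+ offset_val BIGINT,
+ processed_at TIMESTAMP,
+ PRIMARY KEY (consumer_group, topic, partition_id)
+ );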
1447
+ ```
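Options 2 and 3 above can be combined: store the last processed offset per partition in the same database transaction as the business write, and skip any message at or below that offset. The following is a minimal sketch using Python's built-in `sqlite3` (SQLite 3.24+ for the upsert syntax). The `orders` table and the function `process_once` are hypothetical stand-ins for real business logic, and no Kafka client is involved.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS consumed_offsets (
    consumer_group TEXT,
    topic          TEXT,
    partition_id   INTEGER,
    offset_val     INTEGER,
    processed_at   TEXT DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (consumer_group, topic, partition_id)
);
CREATE TABLE IF NOT EXISTS orders (order_id TEXT PRIMARY KEY, amount INTEGER);
"""


def process_once(conn: sqlite3.Connection, group: str, topic: str, partition: int,
                 offset: int, order_id: str, amount: int) -> bool:
    """Apply a message and record its offset atomically; return False for replays."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT offset_val FROM consumed_offsets "
            "WHERE consumer_group = ? AND topic = ? AND partition_id = ?",
            (group, topic, partition),
        ).fetchone()
        if row is not None and offset <= row[0]:
            return False  # already processed: skip the duplicate delivery
        conn.execute(
            "INSERT INTO orders (order_id, amount) VALUES (?, ?) "
            "ON CONFLICT(order_id) DO UPDATE SET amount = excluded.amount",
            (order_id, amount),
        )
        conn.execute(
            "INSERT INTO consumed_offsets (consumer_group, topic, partition_id, offset_val) "
            "VALUES (?, ?, ?, ?) "
            "ON CONFLICT(consumer_group, topic, partition_id) DO UPDATE SET "
            "offset_val = excluded.offset_val, processed_at = CURRENT_TIMESTAMP",
            (group, topic, partition, offset),
        )
    return True


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    print(process_once(conn, "order-processing-group", "order-events", 0, 42, "o-1", 100))
    print(process_once(conn, "order-processing-group", "order-events", 0, 42, "o-1", 100))
```

Because the offset and the order row commit together, a crash between processing and the Kafka offset commit results in a skipped replay instead of a double write. On restart, a real consumer would also seek to the stored offset plus one.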
1448
+
1449
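+ ## Learning Path
+
+ ### Beginner (1-2 weeks)
+ 1. Understand Kafka's core concepts (Topic/Partition/Consumer Group/Offset)
+ 2. Set up a single-node Kafka (Docker)
+ 3. Send and receive messages with the command-line tools
+ 4. Write a simple producer and consumer
+
+ ### Intermediate (2-4 weeks)
+ 1. Cluster setup and replica management
+ 2. Schema management and serialization
+ 3. Consumer groups and the rebalance mechanism
+ 4. Data integration with Kafka Connect
+ 5. Basic monitoring and alerting
+
+ ### Advanced (1-2 months)
+ 1. Stream processing with Kafka Streams
+ 2. Transactions and exactly-once semantics
+ 3. Performance tuning and capacity planning
+ 4. Security configuration (SASL/SSL/ACL)
+ 5. Cross-datacenter replication (MirrorMaker 2)
+
+ ### Expert (ongoing)
+ 1. KRaft architecture and migration
+ 2. Operating large clusters (millions of partitions)
+ 3. CDC pipeline design (Debezium)
+ 4. Failure drills and disaster recovery
+ 5. Custom Interceptor/Serializer/Partitioner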
1477
+
1478
+ ## References
+
+ ### Official Documentation
+ - [Kafka documentation](https://kafka.apache.org/documentation/)
+ - [Confluent documentation](https://docs.confluent.io/)
+ - [KRaft documentation](https://kafka.apache.org/documentation/#kraft)
+
+ ### Tools
+ - [Kafka UI](https://github.com/provectus/kafka-ui) - Web management UI
+ - [Burrow](https://github.com/linkedin/Burrow) - Consumer lag monitoring
+ - [AKHQ](https://github.com/tchiotludo/akhq) - Kafka management platform
+ - [Debezium](https://debezium.io/) - CDC connectors
+
+ ### Books
+ - "Kafka: The Definitive Guide" (2nd edition) - O'Reilly
+ - "Kafka Streams in Action" - Manning
1494
+
1495
+ ---
1496
+
1497
+ ## Agent Checklist
1498
+
1499
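+ - [ ] Topic design: naming convention (domain.entity.event), sensible partition count, replication factor >= 3
+ - [ ] Producer: acks=all, idempotence enabled, batching and compression configured
+ - [ ] Consumer: manual offset commits, CooperativeStickyAssignor, idempotent consumption
+ - [ ] Schema: use Schema Registry, set a compatibility policy, serialize with Avro/Protobuf
+ - [ ] Cluster: min.insync.replicas=2, unclean.leader.election.enable=false
+ - [ ] Security: SASL authentication + SSL/TLS encryption + ACL authorization, principle of least privilege
+ - [ ] Monitoring: JMX metrics exported to Prometheus, consumer lag alerts, broker health checks
+ - [ ] Performance: partition count matched to consumer count, zero-copy + page cache, a reasonable JVM heap (6-8 GB)
+ - [ ] Retention: configure retention according to business needs, use the compact policy for important topics
+ - [ ] Disaster recovery: cross-DC replication with MirrorMaker 2, regular failure drills, documented failover SOP
+ - [ ] Loss prevention: acks=all + min.insync.replicas=2 + manual commits + idempotent consumption
+ - [ ] Duplicate prevention: business-level idempotency + dedup table or Redis dedup + exactly-once transactions (if needed)
+ - [ ] KRaft migration: assess whether the Kafka 3.3+ requirement is met, plan the ZK→KRaft migration path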
1512
+
1513
+ ---
1514
+
1515
+ **Knowledge ID**: `kafka-complete`
+ **Domain**: data-engineering
+ **Type**: standards
+ **Difficulty**: intermediate
+ **Quality score**: 95
+ **Maintainer**: data-team@umadev.com
+ **Last updated**: 2026-03-28