@umacloud/knowledge 1.0.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (418)
  1. package/00-governance/governance-capabilities.md +557 -0
  2. package/00-governance/knowledge-map.md +39 -0
  3. package/00-governance/maintenance-policy.md +76 -0
  4. package/00-governance/review-checklist.md +81 -0
  5. package/README.md +13 -0
  6. package/ai/01-standards/agent-development-complete.md +691 -0
  7. package/ai/01-standards/llm-application-complete.md +488 -0
  8. package/ai/01-standards/mlops-complete.md +798 -0
  9. package/ai/01-standards/prompt-engineering-complete.md +646 -0
  10. package/ai/01-standards/rag-architecture-complete.md +649 -0
  11. package/ai/02-playbooks/llm-evaluation-playbook.md +847 -0
  12. package/ai/03-checklists/ai-project-checklist.md +215 -0
  13. package/ai/04-antipatterns/ai-antipatterns.md +661 -0
  14. package/ai/05-cases/case-rag-production.md +147 -0
  15. package/ai/06-glossary/ai-glossary.md +162 -0
  16. package/ai/agent-evaluation-benchmark.md +53 -0
  17. package/ai/ai-agent-memory-context-management.md +41 -0
  18. package/ai/ai-cost-capacity-optimization-playbook.md +42 -0
  19. package/ai/ai-data-security-and-compliance-playbook.md +37 -0
  20. package/ai/ai-domain-index-and-checklist.md +40 -0
  21. package/ai/ai-governance-maturity-model.md +50 -0
  22. package/ai/ai-model-selection-and-routing-strategy.md +47 -0
  23. package/ai/ai-observability-and-oncall-runbook.md +52 -0
  24. package/ai/ai-rag-engineering-playbook.md +42 -0
  25. package/ai/ai-red-team-and-safety-evaluation.md +42 -0
  26. package/ai/ai-release-readiness-and-rollback-gate.md +42 -0
  27. package/ai/llm-agent-engineering-deep-dive.md +57 -0
  28. package/ai/prompt-and-tool-guardrails.md +52 -0
  29. package/api/01-standards/enterprise-api-standards.md +198 -0
  30. package/api/01-standards/rest-api-design-guide.md +63 -0
  31. package/api/02-playbooks/api-pagination-playbook.md +93 -0
  32. package/api/02-playbooks/graphql-production-playbook.md +176 -0
  33. package/api/03-checklists/api-review-checklist.md +55 -0
  34. package/api/04-antipatterns/api-antipatterns.md +112 -0
  35. package/architecture/01-standards/api-gateway-patterns.md +496 -0
  36. package/architecture/01-standards/cloud-native-patterns.md +644 -0
  37. package/architecture/01-standards/distributed-systems-patterns.md +591 -0
  38. package/architecture/01-standards/event-driven-architecture.md +595 -0
  39. package/architecture/01-standards/microservices-patterns-complete.md +968 -0
  40. package/architecture/01-standards/microservices-patterns.md +495 -0
  41. package/architecture/01-standards/system-design-interview.md +664 -0
  42. package/architecture/02-playbooks/microservices-patterns-playbook.md +137 -0
  43. package/architecture/02-playbooks/migration-playbook.md +780 -0
  44. package/architecture/02-playbooks/system-design-playbook.md +779 -0
  45. package/architecture/03-checklists/architecture-decision-checklist.md +297 -0
  46. package/architecture/04-antipatterns/architecture-antipatterns.md +417 -0
  47. package/architecture/05-cases/case-netflix-microservices.md +413 -0
  48. package/architecture/06-glossary/architecture-glossary.md +164 -0
  49. package/architecture/adr-template-and-examples.md +38 -0
  50. package/architecture/api-gateway-deep-dive.md +1291 -0
  51. package/architecture/configuration-management.md +1162 -0
  52. package/architecture/distributed-transactions.md +1220 -0
  53. package/architecture/microservices-complete.md +735 -0
  54. package/architecture/resilience-and-disaster-patterns.md +37 -0
  55. package/architecture/service-governance.md +1198 -0
  56. package/architecture/system-architecture-deep-dive.md +37 -0
  57. package/backend/01-standards/analytics-and-growth.md +65 -0
  58. package/backend/01-standards/api-and-error-conventions.md +120 -0
  59. package/backend/01-standards/application-layering-and-packaging.md +160 -0
  60. package/backend/01-standards/auth-implementation.md +104 -0
  61. package/backend/01-standards/backend-framework-idioms.md +74 -0
  62. package/backend/01-standards/background-jobs-and-async.md +66 -0
  63. package/backend/01-standards/caching-strategies-complete.md +390 -0
  64. package/backend/01-standards/config-and-observability.md +77 -0
  65. package/backend/01-standards/data-modeling-and-persistence.md +94 -0
  66. package/backend/01-standards/django-complete.md +1765 -0
  67. package/backend/01-standards/email-and-notifications.md +64 -0
  68. package/backend/01-standards/fastapi-complete.md +925 -0
  69. package/backend/01-standards/file-upload-and-storage.md +66 -0
  70. package/backend/01-standards/graphql-api-complete.md +416 -0
  71. package/backend/01-standards/llm-application-standard.md +78 -0
  72. package/backend/01-standards/message-queue-patterns.md +379 -0
  73. package/backend/01-standards/microservices-and-distributed.md +78 -0
  74. package/backend/01-standards/nestjs-complete.md +2167 -0
  75. package/backend/01-standards/payment-integration.md +80 -0
  76. package/backend/01-standards/rate-limiting-complete.md +451 -0
  77. package/backend/01-standards/realtime-and-websocket.md +65 -0
  78. package/backend/01-standards/search-and-filtering.md +64 -0
  79. package/backend/01-standards/spring-boot-complete.md +445 -0
  80. package/backend/02-playbooks/api-design-playbook.md +718 -0
  81. package/backend/02-playbooks/email-send-playbook.md +130 -0
  82. package/backend/02-playbooks/file-upload-s3-playbook.md +153 -0
  83. package/backend/02-playbooks/typescript-enterprise-playbook.md +133 -0
  84. package/backend/02-playbooks/websocket-realtime-playbook.md +154 -0
  85. package/backend/03-checklists/api-launch-checklist.md +189 -0
  86. package/backend/04-antipatterns/backend-antipatterns.md +1051 -0
  87. package/blockchain/01-standards/blockchain-basics.md +557 -0
  88. package/blockchain/01-standards/smart-contract-development.md +1315 -0
  89. package/cicd/01-standards/deployment-and-delivery-standard.md +96 -0
  90. package/cicd/01-standards/github-actions-complete.md +473 -0
  91. package/cicd/01-standards/release-and-store-submission.md +75 -0
  92. package/cicd/02-playbooks/cicd-pipeline-playbook.md +144 -0
  93. package/cicd/02-playbooks/release-management-playbook.md +605 -0
  94. package/cicd/03-checklists/pipeline-security-checklist.md +168 -0
  95. package/cicd/04-antipatterns/cicd-antipatterns.md +589 -0
  96. package/cicd/05-cases/case-deployment-automation.md +221 -0
  97. package/cicd/05-cases/case-gitops-transformation.md +212 -0
  98. package/cicd/06-glossary/cicd-glossary.md +114 -0
  99. package/cicd/cicd-blueprint-deep-dive.md +38 -0
  100. package/cicd/release-readiness-gate.md +37 -0
  101. package/cloud-native/01-standards/container-security.md +741 -0
  102. package/cloud-native/01-standards/kubernetes-complete.md +812 -0
  103. package/cloud-native/02-playbooks/api-gateway-playbook.md +155 -0
  104. package/cloud-native/02-playbooks/gitops-with-argocd.md +760 -0
  105. package/cloud-native/02-playbooks/k8s-troubleshooting-playbook.md +1942 -0
  106. package/cloud-native/02-playbooks/message-queue-playbook.md +129 -0
  107. package/cloud-native/02-playbooks/multicloud-governance.md +726 -0
  108. package/cloud-native/02-playbooks/serverless-patterns.md +788 -0
  109. package/cloud-native/02-playbooks/service-mesh-playbook.md +612 -0
  110. package/cloud-native/02-playbooks/terraform-iac-playbook.md +143 -0
  111. package/cloud-native/03-checklists/container-security-checklist.md +431 -0
  112. package/cloud-native/03-checklists/k8s-production-readiness-checklist.md +460 -0
  113. package/cloud-native/04-antipatterns/container-antipatterns.md +660 -0
  114. package/cloud-native/04-antipatterns/k8s-antipatterns.md +743 -0
  115. package/cloud-native/05-cases/case-k8s-migration.md +478 -0
  116. package/cloud-native/05-cases/case-k8s-scaling.md +642 -0
  117. package/cloud-native/05-cases/case-k8s-security-incident.md +397 -0
  118. package/cloud-native/06-glossary/cloud-native-glossary.md +337 -0
  119. package/cross-platform/01-standards/cross-platform-frameworks.md +83 -0
  120. package/cross-platform/01-standards/platform-selection-and-architecture.md +77 -0
  121. package/data/01-standards/elasticsearch-complete.md +2098 -0
  122. package/data/01-standards/postgresql-complete.md +1613 -0
  123. package/data/01-standards/redis-complete.md +1527 -0
  124. package/data/02-playbooks/database-optimization-playbook.md +403 -0
  125. package/data/02-playbooks/elasticsearch-production-playbook.md +132 -0
  126. package/data/03-checklists/database-launch-checklist.md +187 -0
  127. package/data/04-antipatterns/database-antipatterns.md +873 -0
  128. package/data/05-cases/case-database-migration.md +310 -0
  129. package/data/06-glossary/database-glossary.md +440 -0
  130. package/data/data-governance-and-modeling-deep-dive.md +39 -0
  131. package/data-engineering/01-standards/airflow-complete.md +523 -0
  132. package/data-engineering/01-standards/kafka-complete.md +1521 -0
  133. package/data-engineering/02-playbooks/spark-etl-playbook.md +496 -0
  134. package/data-engineering/03-checklists/pipeline-launch-checklist.md +194 -0
  135. package/data-engineering/04-antipatterns/data-pipeline-antipatterns.md +684 -0
  136. package/data-engineering/05-cases/case-real-time-pipeline.md +355 -0
  137. package/data-engineering/06-glossary/data-engineering-glossary.md +429 -0
  138. package/database/01-standards/database-schema-standards.md +147 -0
  139. package/database/02-playbooks/postgresql-optimization-quick.md +52 -0
  140. package/database/02-playbooks/postgresql-performance-optimization.md +58 -0
  141. package/database/02-playbooks/postgresql-production-playbook.md +146 -0
  142. package/database/02-playbooks/redis-caching-playbook.md +117 -0
  143. package/database/03-checklists/database-review-checklist.md +50 -0
  144. package/database/04-antipatterns/database-antipatterns.md +112 -0
  145. package/design/01-standards/ui-design-system-complete.md +423 -0
  146. package/design/02-playbooks/design-handoff-playbook.md +254 -0
  147. package/design/02-playbooks/design-review-playbook.md +388 -0
  148. package/design/03-checklists/design-review-checklist.md +246 -0
  149. package/design/04-antipatterns/design-antipatterns.md +378 -0
  150. package/design/05-cases/case-design-system-adoption.md +328 -0
  151. package/design/06-glossary/design-glossary.md +329 -0
  152. package/design/ui-full-lifecycle-cross-platform-playbook.md +571 -0
  153. package/design/ux-system-deep-dive.md +38 -0
  154. package/design-systems/00-craft-rules.md +71 -0
  155. package/design-systems/aesthetic-families.md +43 -0
  156. package/design-systems/anti-ai-slop.md +162 -0
  157. package/design-systems/bold-geometric.md +120 -0
  158. package/design-systems/brutalist-bold.md +103 -0
  159. package/design-systems/editorial-clean.md +109 -0
  160. package/design-systems/glass-aurora.md +108 -0
  161. package/design-systems/modern-minimal.md +145 -0
  162. package/design-systems/premium-luxury.md +106 -0
  163. package/design-systems/product-type-design-map.md +48 -0
  164. package/design-systems/soft-warm.md +123 -0
  165. package/design-systems/tech-utility.md +113 -0
  166. package/desktop/01-standards/desktop-app-standard.md +72 -0
  167. package/desktop/01-standards/desktop-design.md +71 -0
  168. package/development/00-governance/document-template.md +41 -0
  169. package/development/01-standards/api-versioning-strategies.md +432 -0
  170. package/development/01-standards/authentication-patterns-complete.md +479 -0
  171. package/development/01-standards/css-architecture-complete.md +550 -0
  172. package/development/01-standards/database-migration-strategies.md +484 -0
  173. package/development/01-standards/elasticsearch-complete.md +347 -0
  174. package/development/01-standards/git-complete.md +371 -0
  175. package/development/01-standards/golang-complete.md +1565 -0
  176. package/development/01-standards/graphql-complete.md +298 -0
  177. package/development/01-standards/javascript-bundlers-complete.md +469 -0
  178. package/development/01-standards/javascript-typescript-complete.md +528 -0
  179. package/development/01-standards/jest-complete.md +275 -0
  180. package/development/01-standards/linux-complete.md +234 -0
  181. package/development/01-standards/logging-observability-complete.md +526 -0
  182. package/development/01-standards/microservices-communication.md +502 -0
  183. package/development/01-standards/mongodb-complete.md +406 -0
  184. package/development/01-standards/oauth2-complete.md +285 -0
  185. package/development/01-standards/performance-optimization-complete.md +289 -0
  186. package/development/01-standards/playwright-complete.md +247 -0
  187. package/development/01-standards/postgresql-complete.md +456 -0
  188. package/development/01-standards/pytest-complete.md +340 -0
  189. package/development/01-standards/python-async-programming.md +902 -0
  190. package/development/01-standards/python-complete.md +956 -0
  191. package/development/01-standards/python-decorators-complete.md +799 -0
  192. package/development/01-standards/python-design-patterns.md +2854 -0
  193. package/development/01-standards/python-packaging-distribution.md +420 -0
  194. package/development/01-standards/python-testing-strategies.md +607 -0
  195. package/development/01-standards/python-web-frameworks-comparison.md +471 -0
  196. package/development/01-standards/redis-complete.md +317 -0
  197. package/development/01-standards/rest-api-complete.md +316 -0
  198. package/development/01-standards/rust-complete.md +578 -0
  199. package/development/01-standards/typescript-advanced-types.md +1513 -0
  200. package/development/01-standards/web-security-complete.md +292 -0
  201. package/development/02-playbooks/api-design-playbook.md +810 -0
  202. package/development/02-playbooks/database-migration-playbook.md +580 -0
  203. package/development/02-playbooks/debugging-playbook.md +692 -0
  204. package/development/02-playbooks/feature-delivery-playbook.md +430 -0
  205. package/development/02-playbooks/incident-hotfix-playbook.md +387 -0
  206. package/development/02-playbooks/performance-optimization-playbook.md +531 -0
  207. package/development/02-playbooks/performance-tuning-playbook.md +652 -0
  208. package/development/02-playbooks/refactor-playbook.md +403 -0
  209. package/development/02-playbooks/release-playbook.md +469 -0
  210. package/development/03-checklists/architecture-review-checklist.md +168 -0
  211. package/development/03-checklists/data-migration-checklist.md +157 -0
  212. package/development/03-checklists/oncall-handover-checklist.md +173 -0
  213. package/development/03-checklists/pr-checklist.md +158 -0
  214. package/development/03-checklists/production-readiness-checklist.md +190 -0
  215. package/development/03-checklists/release-readiness-checklist.md +154 -0
  216. package/development/03-checklists/security-review-checklist.md +182 -0
  217. package/development/04-antipatterns/api-antipatterns.md +657 -0
  218. package/development/04-antipatterns/architecture-antipatterns.md +686 -0
  219. package/development/04-antipatterns/backend-antipatterns.md +648 -0
  220. package/development/04-antipatterns/cicd-antipatterns.md +540 -0
  221. package/development/04-antipatterns/code-smell-antipatterns.md +571 -0
  222. package/development/04-antipatterns/data-antipatterns.md +658 -0
  223. package/development/04-antipatterns/database-antipatterns.md +578 -0
  224. package/development/04-antipatterns/frontend-antipatterns.md +635 -0
  225. package/development/04-antipatterns/reliability-antipatterns.md +700 -0
  226. package/development/04-antipatterns/security-antipatterns.md +747 -0
  227. package/development/05-cases/case-api-version-migration.md +428 -0
  228. package/development/05-cases/case-authorization-hardening.md +383 -0
  229. package/development/05-cases/case-bluegreen-rollback.md +466 -0
  230. package/development/05-cases/case-cache-snowball-protection.md +485 -0
  231. package/development/05-cases/case-ci-cd-pipeline.md +544 -0
  232. package/development/05-cases/case-database-scaling.md +500 -0
  233. package/development/05-cases/case-db-hotspot-optimization.md +487 -0
  234. package/development/05-cases/case-incident-mttr-reduction.md +563 -0
  235. package/development/05-cases/case-microservice-migration.md +375 -0
  236. package/development/05-cases/case-performance-optimization.md +406 -0
  237. package/development/05-cases/case-security-incident-response.md +345 -0
  238. package/development/06-glossary/full-stack-glossary.md +166 -0
  239. package/development/09-maturity/quarterly-audit-template.md +35 -0
  240. package/development/11-ui-excellence/ui-aesthetic-system.md +41 -0
  241. package/development/11-ui-excellence/ui-engineering-excellence.md +435 -0
  242. package/development/12-scenarios/development-scenarios-guide.md +565 -0
  243. package/development/13-implementation-assets/implementation-toolkit.md +282 -0
  244. package/development/13-implementation-assets/knowledge-gates-execution.md +43 -0
  245. package/development/14-full-lifecycle/software-lifecycle-gates.md +511 -0
  246. package/development/15-lifecycle-templates/project-templates-collection.md +791 -0
  247. package/development/api-contract-and-versioning-guide.md +36 -0
  248. package/development/api-governance-complete.md +43 -0
  249. package/development/backend-engineering-complete.md +43 -0
  250. package/development/code-review-quality-complete.md +43 -0
  251. package/development/concurrency-reliability-complete.md +43 -0
  252. package/development/database-engineering-complete.md +43 -0
  253. package/development/engineering-effectiveness-complete.md +43 -0
  254. package/development/engineering-standards-deep-dive.md +38 -0
  255. package/development/frontend-engineering-complete.md +43 -0
  256. package/development/performance-capacity-complete.md +43 -0
  257. package/development/refactor-migration-complete.md +42 -0
  258. package/development/refactoring-and-techdebt-playbook.md +37 -0
  259. package/development/security-in-development-complete.md +43 -0
  260. package/devops/01-standards/cicd-pipeline-complete.md +262 -0
  261. package/devops/01-standards/docker-complete.md +1490 -0
  262. package/devops/01-standards/github-actions-complete.md +337 -0
  263. package/devops/01-standards/kubernetes-complete.md +638 -0
  264. package/devops/01-standards/terraform-complete.md +2117 -0
  265. package/devops/02-playbooks/docker-compose-playbook.md +233 -0
  266. package/devops/02-playbooks/docker-k8s-production-playbook.md +186 -0
  267. package/devops/02-playbooks/docker-production-playbook.md +952 -0
  268. package/edge-iot/01-standards/edge-iot-complete.md +473 -0
  269. package/experts/architect/api-design.md +178 -0
  270. package/experts/architect/methodology.md +124 -0
  271. package/experts/architect/security.md +75 -0
  272. package/experts/backend-lead/methodology.md +216 -0
  273. package/experts/devops/methodology.md +160 -0
  274. package/experts/frontend-lead/methodology.md +178 -0
  275. package/experts/product-manager/industry/ecommerce.md +43 -0
  276. package/experts/product-manager/industry/saas.md +40 -0
  277. package/experts/product-manager/methodology.md +97 -0
  278. package/experts/qa-lead/methodology.md +123 -0
  279. package/experts/qa-lead/test-strategy.md +128 -0
  280. package/experts/uiux-designer/methodology.md +125 -0
  281. package/frontend/01-standards/accessibility-complete.md +532 -0
  282. package/frontend/01-standards/accessibility-standard.md +74 -0
  283. package/frontend/01-standards/admin-dashboard-and-crud.md +72 -0
  284. package/frontend/01-standards/design-tokens-complete.md +444 -0
  285. package/frontend/01-standards/forms-and-validation.md +77 -0
  286. package/frontend/01-standards/frontend-architecture-and-layering.md +119 -0
  287. package/frontend/01-standards/i18n-and-localization.md +65 -0
  288. package/frontend/01-standards/nextjs-complete.md +451 -0
  289. package/frontend/01-standards/react-complete.md +713 -0
  290. package/frontend/01-standards/react-hooks-complete-guide.md +1100 -0
  291. package/frontend/01-standards/react-hooks-complete.md +1171 -0
  292. package/frontend/01-standards/seo-and-web-vitals.md +77 -0
  293. package/frontend/01-standards/state-management-complete.md +444 -0
  294. package/frontend/01-standards/vue-complete.md +499 -0
  295. package/frontend/01-standards/vue3-complete.md +2002 -0
  296. package/frontend/01-standards/web-framework-best-practices.md +64 -0
  297. package/frontend/01-standards/web-performance-complete.md +495 -0
  298. package/frontend/02-playbooks/accessibility-a11y-playbook.md +161 -0
  299. package/frontend/02-playbooks/frontend-performance-playbook.md +707 -0
  300. package/frontend/02-playbooks/i18n-internationalization-playbook.md +120 -0
  301. package/frontend/02-playbooks/performance-optimization-playbook.md +163 -0
  302. package/frontend/02-playbooks/react-nextjs-production-playbook.md +167 -0
  303. package/frontend/02-playbooks/react-state-management-playbook.md +173 -0
  304. package/frontend/03-checklists/component-quality-checklist.md +166 -0
  305. package/frontend/03-checklists/frontend-launch-checklist.md +299 -0
  306. package/frontend/04-antipatterns/frontend-antipatterns.md +886 -0
  307. package/frontend/05-cases/case-performance-optimization.md +274 -0
  308. package/harmony/01-standards/harmonyos-arkts-standard.md +75 -0
  309. package/harmony/01-standards/harmonyos-design.md +65 -0
  310. package/high-quality-engineering-playbook.md +54 -0
  311. package/incident/01-standards/incident-response-complete.md +303 -0
  312. package/incident/02-playbooks/chaos-engineering-playbook.md +883 -0
  313. package/incident/02-playbooks/postmortem-playbook.md +398 -0
  314. package/incident/03-checklists/incident-readiness-checklist.md +181 -0
  315. package/incident/04-antipatterns/incident-antipatterns.md +490 -0
  316. package/incident/05-cases/case-cascade-failure.md +176 -0
  317. package/incident/06-glossary/incident-glossary.md +114 -0
  318. package/incident/postmortem-and-response-deep-dive.md +39 -0
  319. package/industries/ecommerce/ecommerce-complete.md +631 -0
  320. package/industries/education/education-complete.md +555 -0
  321. package/industries/fintech/fintech-complete.md +501 -0
  322. package/industries/gaming/gaming-complete.md +587 -0
  323. package/industries/healthcare/healthcare-complete.md +452 -0
  324. package/low-code/01-standards/low-code-complete.md +944 -0
  325. package/miniprogram/01-standards/ai-common-mistakes.md +61 -0
  326. package/miniprogram/01-standards/miniprogram-custom-navbar-capsule.md +77 -0
  327. package/miniprogram/01-standards/miniprogram-design.md +61 -0
  328. package/miniprogram/01-standards/miniprogram-standard.md +81 -0
  329. package/mobile/01-standards/android-material-design.md +70 -0
  330. package/mobile/01-standards/flutter-complete.md +384 -0
  331. package/mobile/01-standards/ios-design-hig.md +78 -0
  332. package/mobile/01-standards/mobile-app-standard.md +85 -0
  333. package/mobile/01-standards/react-native-complete.md +352 -0
  334. package/mobile/02-playbooks/mobile-cross-platform-playbook.md +175 -0
  335. package/mobile/02-playbooks/mobile-performance.md +473 -0
  336. package/mobile/03-checklists/mobile-release-checklist.md +234 -0
  337. package/mobile/04-antipatterns/mobile-antipatterns.md +798 -0
  338. package/mobile/05-cases/case-app-performance.md +500 -0
  339. package/mobile/05-cases/case-app-startup-optimization.md +218 -0
  340. package/mobile/06-glossary/mobile-glossary.md +484 -0
  341. package/observability/01-standards/observability-standards.md +103 -0
  342. package/observability/02-playbooks/prometheus-grafana-playbook.md +135 -0
  343. package/observability/02-playbooks/structured-logging-playbook.md +73 -0
  344. package/observability/03-checklists/observability-checklist.md +54 -0
  345. package/observability/04-antipatterns/observability-antipatterns.md +106 -0
  346. package/operations/01-standards/prometheus-monitoring-complete.md +1578 -0
  347. package/operations/02-playbooks/capacity-planning-playbook.md +620 -0
  348. package/operations/03-checklists/production-launch-checklist.md +365 -0
  349. package/operations/04-antipatterns/operations-antipatterns.md +664 -0
  350. package/operations/05-cases/case-sre-practices.md +581 -0
  351. package/operations/06-glossary/operations-glossary.md +120 -0
  352. package/operations/aiops-anomaly-detection.md +758 -0
  353. package/operations/capacity-planning.md +1061 -0
  354. package/operations/chaos-engineering.md +659 -0
  355. package/operations/incident-command-system.md +38 -0
  356. package/operations/observability-complete.md +442 -0
  357. package/operations/slo-sli-playbook.md +517 -0
  358. package/operations/sre-operations-deep-dive.md +39 -0
  359. package/package.json +8 -0
  360. package/performance/01-standards/performance-and-scalability.md +80 -0
  361. package/performance/01-standards/performance-standards.md +156 -0
  362. package/performance/02-playbooks/query-optimization-playbook.md +103 -0
  363. package/performance/03-checklists/performance-checklist.md +56 -0
  364. package/performance/04-antipatterns/performance-antipatterns.md +146 -0
  365. package/product/01-standards/product-management-complete.md +285 -0
  366. package/product/02-playbooks/feature-launch-playbook.md +207 -0
  367. package/product/02-playbooks/user-research-playbook.md +532 -0
  368. package/product/03-checklists/feature-launch-checklist.md +275 -0
  369. package/product/04-antipatterns/product-antipatterns.md +355 -0
  370. package/product/05-cases/case-mvp-to-scale.md +384 -0
  371. package/product/06-glossary/product-glossary.md +462 -0
  372. package/product/feature-prioritization-framework.md +40 -0
  373. package/product/kpi-and-metric-tree.md +37 -0
  374. package/product/product-discovery-and-prd-deep-dive.md +41 -0
  375. package/quantum/01-standards/quantum-complete.md +1186 -0
  376. package/security/01-standards/api-security-complete.md +511 -0
  377. package/security/01-standards/container-runtime-security.md +574 -0
  378. package/security/01-standards/data-protection-gdpr.md +543 -0
  379. package/security/01-standards/owasp-top10-complete.md +1890 -0
  380. package/security/01-standards/secure-coding-baseline.md +90 -0
  381. package/security/01-standards/supply-chain-security.md +441 -0
  382. package/security/01-standards/web-security-checklist.md +108 -0
  383. package/security/01-standards/zero-trust-architecture.md +521 -0
  384. package/security/02-playbooks/auth-sso-playbook.md +166 -0
  385. package/security/02-playbooks/incident-response-security-playbook.md +588 -0
  386. package/security/02-playbooks/owasp-api-security-playbook.md +129 -0
  387. package/security/02-playbooks/payment-integration-playbook.md +119 -0
  388. package/security/02-playbooks/penetration-testing-playbook.md +517 -0
  389. package/security/03-checklists/security-audit-checklist.md +356 -0
  390. package/security/04-antipatterns/security-coding-antipatterns.md +580 -0
  391. package/security/05-cases/case-log4shell-incident.md +537 -0
  392. package/security/05-cases/case-major-breaches.md +468 -0
  393. package/security/06-glossary/security-glossary.md +212 -0
  394. package/security/compliance-automation.md +993 -0
  395. package/security/container-security.md +680 -0
  396. package/security/devsecops-complete.md +426 -0
  397. package/security/sast-dast-sca.md +775 -0
  398. package/security/secrets-management.md +594 -0
  399. package/security/security-architecture-deep-dive.md +37 -0
  400. package/security/threat-modeling-stride-playbook.md +40 -0
  401. package/seed-templates/auth-system.md +59 -0
  402. package/seed-templates/blog-content.md +94 -0
  403. package/seed-templates/dashboard.md +89 -0
  404. package/seed-templates/docs-site.md +73 -0
  405. package/seed-templates/e-commerce.md +50 -0
  406. package/seed-templates/saas-landing.md +92 -0
  407. package/seed-templates/settings-page.md +51 -0
  408. package/testing/01-standards/test-strategy-and-layering.md +83 -0
  409. package/testing/01-standards/testing-strategy-complete.md +422 -0
  410. package/testing/01-standards/unit-testing-best-practices.md +118 -0
  411. package/testing/02-playbooks/e2e-testing-playbook.md +988 -0
  412. package/testing/02-playbooks/testing-strategy-playbook.md +126 -0
  413. package/testing/03-checklists/test-strategy-checklist.md +208 -0
  414. package/testing/04-antipatterns/testing-antipatterns.md +718 -0
  415. package/testing/05-cases/case-testing-transformation.md +300 -0
  416. package/testing/06-glossary/testing-glossary.md +110 -0
  417. package/testing/risk-based-test-matrix.md +36 -0
  418. package/testing/testing-strategy-deep-dive.md +37 -0
@@ -0,0 +1,684 @@
+ ---
+ id: data-pipeline-antipatterns
+ title: The Complete Guide to Data Pipeline Antipatterns
+ domain: data-engineering
+ category: 04-antipatterns
+ difficulty: intermediate
+ tags: [antipatterns, backfill, capability, data, data-engineering, failure, observability, pipeline]
+ quality_score: 70
+ last_updated: 2026-06-15
+ ---
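+ # The Complete Guide to Data Pipeline Antipatterns
+
+ > Scope: ETL / ELT / stream-processing / batch-processing pipelines
+ > Constraint level: SHALL (violations must be caught during Pipeline Review and architecture review)
+ > Applicable tools: Apache Spark / Flink / Airflow / dbt / Kafka / Prefect
+
+ ---
+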
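+ ## 1. Lack of Idempotency (Non-Idempotent Pipeline)
+
+ ### Description
+ A pipeline task produces duplicate data or side effects when it is retried or rerun. In distributed systems, network timeouts and node failures are routine, so any task may execute more than once. A non-idempotent pipeline leads to serious problems such as duplicated records, double charges, and inflated metrics.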
+
+ ### Incorrect Example
+ ```python
+ # Airflow -- non-idempotent INSERT
+ from airflow.decorators import task
+
+ @task
+ def load_daily_sales(ds):
+     # `db` is assumed to be an already-configured database client
+     db.execute(f"""
+         INSERT INTO sales_summary (date, total_amount, order_count)
+         SELECT '{ds}', SUM(amount), COUNT(*)
+         FROM orders
+         WHERE order_date = '{ds}'
+     """)
+     # One retry -> two rows inserted for the same day
+     # A scheduled rerun -> the data doubles
+ ```
+
+ ```python
+ # Spark -- non-idempotent append mode
+ from pyspark.sql import functions as F
+
+ def process_daily(date):
+     # `spark` is assumed to be an active SparkSession
+     df = spark.read.parquet(f"s3://raw/orders/{date}/")
+     result = df.groupBy("category").agg(F.sum("amount"))
+     result.write.mode("append").parquet("s3://warehouse/daily_sales/")
+     # A rerun appends duplicate data to the same directory
+ ```
+
+ ### Correct Example
+ ```python
+ # Airflow -- idempotent approach: delete, then insert (UPSERT-style)
+ from airflow.decorators import task
+
+ @task
+ def load_daily_sales(ds):
+     # `db` is assumed to run both statements in a single transaction
+     db.execute(f"""
+         DELETE FROM sales_summary WHERE date = '{ds}';
+         INSERT INTO sales_summary (date, total_amount, order_count)
+         SELECT '{ds}', SUM(amount), COUNT(*)
+         FROM orders
+         WHERE order_date = '{ds}';
+     """)
+     # The result is correct no matter how many times the task runs
+ ```
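
The delete-then-insert pattern above is only safe to retry if both statements commit together; otherwise a failure between them leaves the day's row missing. The following is a minimal sketch using the standard-library `sqlite3` module. The simplified `orders` / `sales_summary` schema, the `load_daily_sales(conn, ds)` signature, and the `demo` helper are assumptions made for illustration, not the guide's actual warehouse. The date is passed as a bound parameter rather than interpolated into the SQL string.

```python
import sqlite3


def load_daily_sales(conn: sqlite3.Connection, ds: str) -> None:
    """Rebuild the summary row for one day; safe to run any number of times."""
    # `with conn` wraps both statements in one transaction: it commits on
    # success and rolls back if either statement raises.
    with conn:
        conn.execute("DELETE FROM sales_summary WHERE date = ?", (ds,))
        conn.execute(
            """
            INSERT INTO sales_summary (date, total_amount, order_count)
            SELECT ?, COALESCE(SUM(amount), 0), COUNT(*)
            FROM orders
            WHERE order_date = ?
            """,
            (ds, ds),
        )


def demo() -> list:
    """Load the same day three times and return the resulting summary rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (order_date TEXT, amount INTEGER)")
    conn.execute(
        "CREATE TABLE sales_summary (date TEXT, total_amount INTEGER, order_count INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [("2024-06-15", 100), ("2024-06-15", 250), ("2024-06-16", 40)],
    )
    conn.commit()
    for _ in range(3):  # simulate a retry plus a scheduled rerun
        load_daily_sales(conn, "2024-06-15")
    return conn.execute("SELECT * FROM sales_summary").fetchall()
```

Calling `demo()` runs the load three times against the same date; the summary table still holds a single row for that day, which is the property a retry-safe task needs.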
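+
+ ```python
+ # Spark -- idempotent approach: partition overwrite
+ from pyspark.sql import functions as F
+
+ # "dynamic" replaces only the partitions written; the default "static" mode wipes them all
+ spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
+
+ def process_daily(date):
+     df = spark.read.parquet(f"s3://raw/orders/{date}/")
+     result = df.groupBy("category").agg(F.sum("amount"))
+     result = result.withColumn("date", F.lit(date))  # partition column must exist
+     result.write.mode("overwrite").partitionBy("date").parquet(
+         "s3://warehouse/daily_sales/"
+     )
+     # Overwrites only that day's partition, so a rerun yields the same result
+ ```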
+
+ ```sql
+ -- PostgreSQL -- UPSERT (ON CONFLICT)
+ -- Requires a unique constraint or index on (date, category)
+ INSERT INTO sales_summary (date, category, total_amount, order_count)
+ SELECT order_date, category, SUM(amount), COUNT(*)
+ FROM orders
+ WHERE order_date = '2024-06-15'
+ GROUP BY order_date, category
+ ON CONFLICT (date, category)
+ DO UPDATE SET
+     total_amount = EXCLUDED.total_amount,
+     order_count = EXCLUDED.order_count,
+     updated_at = NOW();
+ ```
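+
+ ### Idempotency Checklist
+ | Operation type | Idempotent implementation |
+ |---------|------------|
+ | File writes | Overwrite the target partition / use a deterministic, unique file name |
+ | Database writes | DELETE + INSERT / UPSERT / MERGE |
+ | API calls | Attach an idempotency key to each request |
+ | Message sending | Consumer-side deduplication / exactly-once semantics |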
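
The "idempotency key" and "consumer-side deduplication" rows can be sketched together: the consumer records each message's key in the same transaction as the write, so a redelivered message is skipped. This is a minimal illustration using the standard-library `sqlite3` module. The `payments` and `processed_messages` tables, the `make_store` and `apply_payment` functions, and the message fields are hypothetical names introduced for the example.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory database with a ledger and a dedup table (both hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (account TEXT, amount INTEGER)")
    # The primary key is what rejects a second delivery of the same message.
    conn.execute("CREATE TABLE processed_messages (idempotency_key TEXT PRIMARY KEY)")
    return conn


def apply_payment(conn: sqlite3.Connection, message: dict) -> bool:
    """Apply a payment once per idempotency key; return False for a duplicate."""
    try:
        # `with conn` commits on success and rolls back on error, so the key
        # and the side effect are recorded together or not at all.
        with conn:
            conn.execute(
                "INSERT INTO processed_messages (idempotency_key) VALUES (?)",
                (message["idempotency_key"],),
            )
            conn.execute(
                "INSERT INTO payments (account, amount) VALUES (?, ?)",
                (message["account"], message["amount"]),
            )
        return True
    except sqlite3.IntegrityError:
        # Key already seen: this message was delivered before, so skip it.
        return False
```

Delivering the same message twice to `apply_payment` applies the payment once; the second call returns `False` and leaves the ledger unchanged. Keeping the key and the write in one transaction is the design choice that matters: recording the key separately would reopen the window for double processing.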
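+
+ ---
+
+ ## 2. No Monitoring or Alerting (Missing Observability)
+
+ ### Description
+ The pipeline has no monitoring of run status, data volume, or latency, and no alerting on anomalies. When it fails silently, data volume drops abnormally, or latency exceeds its target, the team does not find out in time, and downstream reports and business systems consume incorrect data.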
102
+
103
+ ### 错误示例
104
+ ```python
+ # Airflow DAG -- no monitoring at all
+ @dag(schedule="0 2 * * *", catchup=False)
+ def daily_etl():
+     @task
+     def extract():
+         data = fetch_from_api()
+         save_to_staging(data)
+         # What if the API returns no data? Nobody knows
+         # What if it returns only 10% of yesterday's volume? Nobody knows
+
+     @task
+     def transform():
+         process_staging_data()
+         # What if a transform bug drops 30% of the rows? Nobody knows
+
+     @task
+     def load():
+         load_to_warehouse()
+         # What if the target table is locked and the load times out? The error is buried in the logs
+ ```
+
+ ### Correct Example
+ ```python
+ # Airflow DAG -- monitoring and alerting built in
+ # sla_alert, metrics, db, get_expected_min_rows and DataQualityError are
+ # project helpers assumed to be defined elsewhere
+ import time
+
+ from airflow.decorators import dag, task
+ from airflow.providers.slack.operators.slack_webhook import SlackWebhookOperator
+
+ def on_failure(context):
+     """Send an alert when a task fails."""
+     task_id = context["task_instance"].task_id
+     dag_id = context["dag"].dag_id
+     execution_date = context["execution_date"]
+     log_url = context["task_instance"].log_url
+     SlackWebhookOperator(
+         task_id="slack_alert",
+         slack_webhook_conn_id="slack_ops",
+         message=f":red_circle: Pipeline Failed\n"
+                 f"DAG: {dag_id}\nTask: {task_id}\n"
+                 f"Date: {execution_date}\nLog: {log_url}",
+     ).execute(context)
+
+ @dag(
+     schedule="0 2 * * *",
+     catchup=False,
+     default_args={"on_failure_callback": on_failure},
+     sla_miss_callback=sla_alert,  # alert when the SLA is missed
+ )
+ def daily_etl():
+
+     @task
+     def extract(**context):
+         started = time.monotonic()
+         data = fetch_from_api()
+         row_count = len(data)
+
+         # Volume anomaly detection
+         expected_min = get_expected_min_rows(context["ds"])
+         if row_count < expected_min * 0.5:
+             raise ValueError(
+                 f"Row count {row_count} is below 50% of expected {expected_min}"
+             )
+
+         # Push metrics to Prometheus / DataDog
+         metrics.gauge("pipeline.extract.row_count", row_count,
+                       tags={"dag": "daily_etl", "date": context["ds"]})
+         # task_instance.duration is not set until the task finishes, so time it here
+         metrics.timer("pipeline.extract.duration", time.monotonic() - started)
+
+         save_to_staging(data)
+         return {"row_count": row_count}
+
+     @task
+     def data_quality_check(extract_result, **context):
+         """Standalone data quality check task."""
+         checks = [
+             ("null_check", "SELECT COUNT(*) FROM staging WHERE id IS NULL"),
+             ("dup_check", "SELECT COUNT(*) - COUNT(DISTINCT id) FROM staging"),
+             ("range_check", "SELECT COUNT(*) FROM staging WHERE amount < 0"),
+         ]
+         for name, sql in checks:
+             bad_count = db.execute(sql).scalar()
+             if bad_count > 0:
+                 raise DataQualityError(f"{name} failed: {bad_count} bad rows")
+
+         metrics.gauge("pipeline.quality.pass_rate", 1.0,
+                       tags={"dag": "daily_etl"})
+
+     # Wire the tasks: the quality check runs after extract
+     data_quality_check(extract())
+
+ daily_etl()
+ ```
+
+ ### Monitoring Dimension Matrix
+ | Dimension | Metric | Alert threshold | Tooling |
+ |------|------|---------|------|
+ | Run status | Success / failed / skipped | Alert on any failure | Airflow callback |
+ | Data volume | Input rows / output rows | < 50% or > 200% of expected | Custom check |
+ | Latency | Task duration / data freshness | > SLA threshold | Airflow SLA |
+ | Data quality | Null rate / duplicate rate / out-of-range values | > threshold | Great Expectations / dbt tests |
+ | Resources | CPU / memory / disk | > 80% | Prometheus + Grafana |
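
The data volume row names a two-sided threshold that the example DAG only checks on the low side. A minimal sketch of the full check might look like the following. `expected_from_history` and `check_row_count` are hypothetical helpers, and taking the median of recent runs as the expected value is an assumption; the source does not say how the baseline is computed.

```python
import statistics


def expected_from_history(history):
    """Baseline row count: median of recent runs (an assumed choice)."""
    if not history:
        raise ValueError("history must not be empty")
    return statistics.median(history)


def check_row_count(actual, expected, low=0.5, high=2.0):
    """Classify a row count against the < 50% / > 200% thresholds."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    ratio = actual / expected
    if ratio < low:
        return "too_low"
    if ratio > high:
        return "too_high"
    return "ok"


baseline = expected_from_history([980, 1010, 1000, 1025, 990])
status = check_row_count(420, baseline)
```

A median is less sensitive than a mean to a single abnormal day in the history, which is why it is used for the baseline in this sketch.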
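+
+ ---
+
+ ## 3. Hardcoded Schema
+
+ ### Description
+ Column names, data types, and table structure are hardcoded in the pipeline code. When the upstream schema changes (a column is added, renamed, or retyped), the pipeline silently produces wrong data or crashes outright.
+
+ ### Incorrect Example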
+ ```python
+ # Hardcoded column indexes -- everything shifts when upstream adds a column
+ def parse_csv_row(row):
+     return {
+         "user_id": row[0],
+         "name": row[1],
+         "email": row[2],   # upstream added a nickname column after name
+         "amount": row[3],  # actually reads email
+     }
+
+ # Hardcoded DataFrame column names -- no validation
+ def transform(df):
+     df["total"] = df["price"] * df["quantity"]
+     df["category_name"] = df["cat"]  # upstream renamed cat to category
+     return df  # KeyError here; more lenient lookups silently yield NaN
+ ```
+
+ ### Correct Example
+ ```python
+ # Schema registration + validation
+ # `validator` is the pydantic v1 API; pydantic v2 deprecates it in favor of `field_validator`
+ from datetime import datetime
+
+ from pydantic import BaseModel, ValidationError, validator
+
+ class SchemaValidationError(Exception):
+     """Raised when too many rows fail schema validation."""
+
+ class OrderSchema(BaseModel):
+     """Order data schema -- kept under version management."""
+     order_id: str
+     user_id: int
+     amount: float
+     currency: str = "CNY"
+     status: str
+     created_at: datetime
+
+     @validator("amount")
+     def amount_must_be_non_negative(cls, v):
+         if v < 0:
+             raise ValueError("amount must be non-negative")
+         return v
+
+     @validator("status")
+     def status_must_be_valid(cls, v):
+         valid = {"pending", "paid", "shipped", "completed", "cancelled"}
+         if v not in valid:
+             raise ValueError(f"invalid status: {v}, expected one of {valid}")
+         return v
+
+ def transform(raw_data: list[dict]) -> list[OrderSchema]:
+     """Validate each row while parsing; abort when more than 1% of rows fail."""
+     validated = []
+     errors = []
+     for i, row in enumerate(raw_data):
+         try:
+             validated.append(OrderSchema(**row))
+         except ValidationError as e:
+             errors.append({"row": i, "error": str(e)})
+
+     error_rate = len(errors) / len(raw_data) if raw_data else 0
+     if error_rate > 0.01:  # abort when the error rate exceeds 1%
+         raise SchemaValidationError(
+             f"Schema validation failed: {len(errors)} errors "
+             f"({error_rate:.1%}), first 5: {errors[:5]}"
+         )
+     return validated
+ ```
+
+ ```yaml
+ # dbt -- schema tests
+ # models/staging/schema.yml
+ version: 2
+ models:
+   - name: stg_orders
+     columns:
+       - name: order_id
+         tests: [not_null, unique]
+       - name: amount
+         tests:
+           - not_null
+           - dbt_utils.accepted_range:
+               min_value: 0
+               max_value: 1000000
+       - name: status
+         tests:
+           - accepted_values:
+               values: ['pending', 'paid', 'shipped', 'completed', 'cancelled']
+ ```
+
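+ ---
+
+ ## 4. Ignoring Data Quality
+
+ ### Description
+ The pipeline only moves data and never checks its completeness, accuracy, consistency, or timeliness. Once dirty data reaches downstream systems it distorts reports, biases model predictions, and leads to wrong business decisions; repairing the damage costs far more than preventing it.
+
+ ### Incorrect Example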
+ ```python
+ # Load directly, with no quality checks at all
+ def load_user_events(date):
+     df = spark.read.json(f"s3://raw/events/{date}/")
+     df.write.mode("overwrite").saveAsTable("warehouse.user_events")
+     # May contain: empty user_id, timestamps in the future, negative duration
+     # Downstream dashboards show inflated DAU (an empty user_id is counted as a user)
+ ```
+
+ ### Correct Example
+ ```python
+ # Great Expectations -- data quality checks
+ # Uses the pre-1.0 fluent API (context.sources); df is assumed to be a pandas DataFrame
+ # DataQualityError is a project-defined exception
+ import great_expectations as gx
+
+ def validate_user_events(df, date):
+     context = gx.get_context()
+     validator = context.sources.pandas_default.read_dataframe(df)
+
+     # Completeness checks
+     validator.expect_column_values_to_not_be_null("user_id")
+     validator.expect_column_values_to_not_be_null("event_type")
+     validator.expect_column_values_to_not_be_null("timestamp")
+
+     # Accuracy checks
+     validator.expect_column_values_to_be_between(
+         "timestamp",
+         min_value=f"{date}T00:00:00Z",
+         max_value=f"{date}T23:59:59Z",
+     )
+     validator.expect_column_values_to_be_in_set(
+         "event_type",
+         ["page_view", "click", "purchase", "signup"],
+     )
+
+     # Consistency checks
+     validator.expect_column_values_to_be_between("duration_ms", min_value=0, max_value=3600000)
+
+     # Volume checks
+     validator.expect_table_row_count_to_be_between(min_value=10000, max_value=10000000)
+
+     result = validator.validate()
+     if not result.success:
+         failed = [r for r in result.results if not r.success]
+         raise DataQualityError(f"Quality check failed: {len(failed)} checks failed")
+
+     return df
+ ```
+
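+ ### Data Quality Dimensions
+ | Dimension | Checks | Tooling |
+ |------|--------|------|
+ | Completeness | Non-null, required fields | Great Expectations / dbt tests |
+ | Accuracy | Value ranges, formats, referential integrity | Pydantic / custom rules |
+ | Consistency | Data agrees across tables and systems | dbt_utils.equality / custom |
+ | Timeliness | Data freshness | dbt source freshness / custom |
+ | Uniqueness | Unique primary key, unique business key | dbt unique test |
+ | Volume plausibility | Row count within the expected range | Custom threshold check |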
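
Timeliness is the one dimension in this table that the Great Expectations example does not exercise. As a rough illustration of the "custom" option, the sketch below compares the newest event timestamp with the current time and grades the lag against two thresholds. The two-threshold shape is loosely modeled on the `warn_after` and `error_after` settings of dbt source freshness; `freshness_status` and the sample values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def freshness_status(latest_event, now, warn_after, error_after):
    """Grade data freshness as "pass", "warn" or "error" by how far it lags."""
    lag = now - latest_event
    if lag > error_after:
        return "error"
    if lag > warn_after:
        return "warn"
    return "pass"


now = datetime(2024, 6, 15, 12, 0, tzinfo=timezone.utc)
latest = datetime(2024, 6, 15, 5, 0, tzinfo=timezone.utc)  # newest row is 7 hours old
status = freshness_status(
    latest,
    now,
    warn_after=timedelta(hours=6),
    error_after=timedelta(hours=12),
)
```

Passing `now` in as an argument, rather than calling `datetime.now()` inside the function, keeps the check reproducible when a historical date is reprocessed.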
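+
+ ---
+
+ ## 5. Single Point of Failure
+
+ ### Description
+ Key components of the pipeline (the scheduler, source connections, intermediate storage) run as a single instance, so one failure stops the whole pipeline. Any component can fail in production, so redundancy and failover are required.
+
+ ### Incorrect Example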
+ ```python
+ # Single data source -- no fallback
+ def extract():
+     # The only source: if the API goes down, the whole pipeline stalls
+     data = requests.get("https://api.partner.com/data", timeout=30).json()
+     return data
+
+ # Single scheduler -- no HA
+ # Only one Airflow Scheduler instance is deployed
+ # Scheduler process dies -> every DAG stops being scheduled
+
+ # Intermediate results on local disk -- no redundancy
+ def transform(data):
+     df = pd.DataFrame(data)
+     df.to_parquet("/tmp/staging/result.parquet")  # machine reboot -> data lost
+ ```
+
+ ### Correct Example
+ ```python
+ # Redundant data source + retry + fallback
+ import logging
+
+ import pandas as pd
+ import requests
+ from tenacity import retry, stop_after_attempt, wait_exponential
+
+ logger = logging.getLogger(__name__)
+
+ @retry(
+     stop=stop_after_attempt(3),
+     wait=wait_exponential(multiplier=1, min=4, max=60),
+     reraise=True,
+ )
+ def fetch_from_primary():
+     resp = requests.get("https://api.partner.com/data", timeout=30)
+     resp.raise_for_status()  # HTTP errors count as failures, so the retry applies
+     return resp.json()
+
+ def extract():
+     try:
+         return fetch_from_primary()
+     except Exception as e:
+         logger.warning(f"Primary source failed: {e}, falling back to replica")
+         # Fall back to the backup source
+         resp = requests.get("https://api-backup.partner.com/data", timeout=60)
+         resp.raise_for_status()
+         return resp.json()
+
+ # Store intermediate results in distributed storage
+ def transform(data):
+     df = pd.DataFrame(data)
+     # S3 Standard stores objects redundantly across multiple Availability Zones
+     df.to_parquet("s3://pipeline-staging/daily/result.parquet")
+ ```
+
+ ```yaml
+ # Airflow HA deployment -- Kubernetes
+ # Multiple Scheduler instances (natively supported since Airflow 2.0)
+ apiVersion: apps/v1
+ kind: Deployment
+ metadata:
+   name: airflow-scheduler
+ spec:
+   replicas: 2  # two active-active schedulers, coordinated through database row locks
+   selector:
+     matchLabels:
+       app: airflow-scheduler
+   template:
+     metadata:
+       labels:
+         app: airflow-scheduler  # must match spec.selector
+     spec:
+       containers:
+         - name: scheduler
+           image: apache/airflow:2.8.0
+           command: ["airflow", "scheduler"]
+           env:
+             - name: AIRFLOW__SCHEDULER__STANDALONE_DAG_PROCESSOR
+               value: "False"
+ ```
+
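+ ---
+
+ ## 6. No Backfill Capability
+
+ ### Description
+ The pipeline can only process the current day's data and cannot reprocess history. When historical data turns out to be wrong, a logic change calls for a backfill, or a new metric has to be computed retroactively, the only options are to patch the data by hand or wait for it to accumulate naturally.
+
+ ### Incorrect Example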
+ ```python
+ # Only processes "current" data; the date is not a parameter
+ def daily_etl():
+     today = datetime.now().strftime("%Y-%m-%d")
+     df = spark.read.parquet(f"s3://raw/events/{today}/")
+     result = df.groupBy("category").agg(sum("amount"))
+     result.write.mode("overwrite").saveAsTable("warehouse.daily_summary")
+     # How do you backfill last month? Edit the code? Loop by hand?
+
+ # Output is not partitioned by date -- no partial rerun
+ def load(df):
+     df.write.mode("append").parquet("s3://warehouse/all_data/")
+     # Backfilling one day creates duplicates; that day alone cannot be overwritten
+ ```
+
+ ### Correct Example
+ ```python
+ # Airflow -- parameterized date + partitioned output
+ from datetime import datetime
+
+ from airflow.decorators import dag, task
+ from pyspark.sql import functions as F
+
+ @dag(
+     schedule="0 2 * * *",
+     start_date=datetime(2024, 1, 1),
+     catchup=True,  # allow backfill runs
+ )
+ def daily_etl():
+
+     @task
+     def run_daily(ds=None):
+         """Airflow passes in ds, the logical date (YYYY-MM-DD)."""
+         # Spark DataFrames cannot travel between tasks through XCom,
+         # so extract, transform and load run inside one task
+         df = spark.read.parquet(f"s3://raw/events/{ds}/")
+         result = (
+             df.groupBy("category")
+             .agg(F.sum("amount").alias("total"))
+             .withColumn("date", F.lit(ds))
+         )
+         # Overwrite by date partition: idempotent and backfillable
+         # (needs spark.sql.sources.partitionOverwriteMode=dynamic)
+         result.write.mode("overwrite").partitionBy("date").parquet(
+             "s3://warehouse/daily_summary/"
+         )
+
+     run_daily()
+
+ daily_etl()
+
+ # Backfill command (Airflow 2.x CLI):
+ # airflow dags backfill daily_etl -s 2024-01-01 -e 2024-01-31
+ ```
+
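+ ---
+
+ ## 7. Over-Micro-Batching
+
+ ### Description
+ Running micro-batches far too often (every second or every minute) for workloads that do not need low latency produces huge numbers of small files (the small file problem), scheduling overhead that far exceeds compute time, and heavy metadata pressure on the storage system.
+
+ ### Incorrect Example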
+ ```python
+ # A Spark job every minute -- 90% of the time goes to scheduling and initialization
+ @dag(schedule="* * * * *")  # every minute
+ def micro_batch_etl():
+     @task
+     def process():
+         spark = SparkSession.builder.getOrCreate()   # ~15s startup
+         df = spark.read.parquet("s3://raw/latest/")  # ~5s read
+         df.write.parquet(f"s3://output/{ts}/")       # ~3s write
+         # Actual compute < 2s; scheduling and initialization > 20s
+         # 1,440 runs per day -> at least 1,440 small files
+
+ # Kafka -> S3, one file per message
+ def process_message(message):
+     key = f"s3://events/{message['id']}.json"
+     s3.put_object(Bucket="events", Key=key, Body=json.dumps(message))
+     # 1 million messages per day = 1 million small files
+     # S3 LIST is O(n), so downstream Spark reads are extremely slow
+ ```
+
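+ ### Correct Example
+ ```python
+ # Pick the batch frequency that the SLA requires
+ from pyspark.sql import functions as F
+
+ # Reporting -> daily batch
+ @dag(schedule="0 2 * * *")  # every day at 02:00
+ def daily_batch():
+     ...
+
+ # Near-real-time -> micro-batch every 5-15 minutes
+ @dag(schedule="*/15 * * * *")  # every 15 minutes
+ def near_realtime():
+     ...
+
+ # Truly real-time -> use a stream processing engine
+ # Flink / Spark Structured Streaming
+ def realtime_stream():
+     events = (
+         spark.readStream
+         .format("kafka")
+         .option("kafka.bootstrap.servers", "kafka:9092")
+         .option("subscribe", "events")
+         .load()
+         # Kafka values are bytes; the JSON layout below is an assumed example
+         .select(
+             "timestamp",
+             F.from_json(
+                 F.col("value").cast("string"), "category STRING, amount DOUBLE"
+             ).alias("event"),
+         )
+         .select("timestamp", "event.category", "event.amount")
+     )
+
+     result = (
+         events.withWatermark("timestamp", "10 minutes")
+         .groupBy(F.window("timestamp", "5 minutes"), "category")
+         .agg(F.sum("amount").alias("total_amount"))
+     )
+
+     # The Delta sink accepts append and complete output modes; with the
+     # watermark above, each window is appended once it closes
+     (
+         result.writeStream
+         .format("delta")
+         .option("checkpointLocation", "s3://checkpoints/events/")
+         .outputMode("append")
+         .trigger(processingTime="1 minute")
+         .start("s3://warehouse/realtime_summary/")
+     )
+ ```
+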
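+ ### Batch Frequency Selection Guide
+ | Freshness requirement | Recommended approach | Typical use cases |
+ |------------|---------|---------|
+ | T+1 (next day) | Daily batch with Spark / dbt | Reports, BI, data warehouse |
+ | Hourly | Hourly micro-batch | Operations dashboards, trend monitoring |
+ | Minutes | Structured Streaming / Flink | Real-time recommendations, risk alerts |
+ | Seconds / milliseconds | Flink / Kafka Streams | Transaction monitoring, fraud detection |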
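
The guide above can also be read as a small decision function. The sketch below encodes it that way, purely as an illustration: `recommend_approach` is a hypothetical helper, and the exact cutoffs (one day, one hour, one minute) are assumptions, since the table only names tiers.

```python
from datetime import timedelta


def recommend_approach(max_staleness: timedelta) -> str:
    """Map the tolerated data staleness to a tier of the selection guide."""
    if max_staleness >= timedelta(days=1):
        return "daily batch (Spark / dbt)"
    if max_staleness >= timedelta(hours=1):
        return "hourly micro-batch"
    if max_staleness >= timedelta(minutes=1):
        return "Structured Streaming / Flink"
    return "Flink / Kafka Streams"


report_tier = recommend_approach(timedelta(days=1))
fraud_tier = recommend_approach(timedelta(milliseconds=200))
```

Starting from the staleness the business can tolerate, rather than from the tooling already in place, is what keeps a reporting workload from ending up on a per-minute schedule.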
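+
+ ---
+
+ ## 8. Ignoring Backpressure
+
+ ### Description
+ When data is produced faster than it is consumed and nothing limits the upstream inflow, the consumer runs out of memory, messages pile up, and end-to-end latency climbs sharply. Backpressure is one of the core challenges of stream processing systems.
+
+ ### Incorrect Example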
+ ```python
+ # Kafka consumer -- no flow control
+ def consume_forever():
+     consumer = KafkaConsumer("events", bootstrap_servers="kafka:9092")
+     for message in consumer:
+         # Synchronous processing: if it is slower than the producers,
+         # consumer lag keeps growing -> eventually OOM, or data expires and is deleted
+         result = heavy_transform(message.value)
+         db.insert(result)  # blocks here whenever the DB is slow
+ ```
+
+ ```python
+ # Spark Streaming -- no rate limit
+ df = (
+     spark.readStream
+     .format("kafka")
+     .option("subscribe", "high_volume_events")  # 100K messages per second
+     .load()
+ )
+ # Reads all available data by default -> micro-batch size explodes -> OOM
+ ```
+
+ ### Correct Example
+ ```python
+ # Kafka consumer -- bounded polling + concurrent processing
+ from concurrent.futures import ThreadPoolExecutor
+
+ from kafka import KafkaConsumer
+
+ class BackpressureConsumer:
+     def __init__(self, max_workers=8):
+         self.pool = ThreadPoolExecutor(max_workers=max_workers)  # caps concurrency
+         self.consumer = KafkaConsumer(
+             "events",
+             bootstrap_servers="kafka:9092",
+             group_id="events-etl",        # example name; commits need a consumer group
+             max_poll_records=500,         # at most 500 records per poll
+             max_poll_interval_ms=300000,  # processing timeout: 5 minutes
+             enable_auto_commit=False,
+         )
+
+     def process_with_backpressure(self):
+         while True:
+             # The next batch is not fetched until the current one is done,
+             # so a slow sink slows consumption instead of filling memory
+             polled = self.consumer.poll(timeout_ms=1000)
+             messages = [m for records in polled.values() for m in records]
+             if not messages:
+                 continue
+             # list() waits for every message and re-raises the first failure
+             list(self.pool.map(self._process, messages))
+             self.consumer.commit()  # commit only after the whole batch succeeded
+
+     def _process(self, message):
+         result = heavy_transform(message.value)
+         db.insert(result)
+ ```
+
+ ```python
+ # Spark Streaming -- rate limits configured
+ df = (
+     spark.readStream
+     .format("kafka")
+     .option("kafka.bootstrap.servers", "kafka:9092")
+     .option("subscribe", "high_volume_events")
+     .option("maxOffsetsPerTrigger", 100000)  # at most 100,000 records per micro-batch
+     .option("minOffsetsPerTrigger", 10000)   # at least 10,000, so batches do not fire too often
+     .load()
+ )
+
+ # Flink -- built-in backpressure
+ # Flink uses credit-based flow control: when downstream cannot keep up, upstream slows down automatically
+ # Watch the Backpressure panel in the Flink Web UI
+ # If backpressure persists: raise parallelism / optimize the processing logic / scale out
+ ```
+
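+ ### Backpressure Handling Strategies
+ | Strategy | Implementation | When to use |
+ |------|---------|---------|
+ | Rate limiting | maxOffsetsPerTrigger / max_poll_records | All scenarios |
+ | Buffer queue | In-memory queue that spills to disk | Traffic bursts |
+ | Drop policy | Drop oldest / sample | Monitoring workloads that tolerate data loss |
+ | Dynamic scaling | K8s HPA / automatically add partition consumers | Cloud-native environments |
+ | Async + batched writes | Accumulate a batch, then write it to the DB / S3 in one call | Write-heavy workloads |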
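
The buffer queue and batched-write rows can be combined in a few lines of standard-library Python. The following is a simplified sketch under stated assumptions, not the source's implementation: the buffer is memory-only (no spill to disk), `producer`, `batch_writer`, and `SENTINEL` are hypothetical names, and `written.append` stands in for a real bulk write to a database or S3.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def producer(buf, items):
    for item in items:
        buf.put(item)  # blocks while the buffer is full: this is the backpressure
    buf.put(SENTINEL)


def batch_writer(buf, write_batch, batch_size):
    """Drain the buffer and hand records to write_batch in groups."""
    batch = []
    while True:
        item = buf.get()
        if item is SENTINEL:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            write_batch(batch)
            batch = []
    if batch:
        write_batch(batch)  # flush the final partial batch


written = []
buf = queue.Queue(maxsize=10)  # bounded buffer absorbs short bursts
writer = threading.Thread(target=batch_writer, args=(buf, written.append, 4))
writer.start()
producer(buf, range(10))
writer.join()
```

Because the queue is bounded, a slow writer eventually stalls the producer instead of letting memory grow without limit, and grouping records cuts the number of write calls from ten to three in this run.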
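+
+ ---
+
+ ## Anti-Pattern Quick Reference Matrix
+
+ | # | Anti-pattern | Risk level | Typical consequences | How to detect |
+ |---|--------|:-------:|---------|---------|
+ | 1 | No idempotency | CRITICAL | Duplicate data, wrong amounts | Rerun verification + code review |
+ | 2 | No monitoring | HIGH | Silent failures, dirty data flowing downstream | Operations audit |
+ | 3 | Hardcoded schema | HIGH | Pipeline crashes or silently errs after a schema change | Schema registry + tests |
+ | 4 | Ignoring data quality | HIGH | Distorted reports, model bias | Data quality tests |
+ | 5 | Single point of failure | HIGH | Complete pipeline outage | Architecture review + chaos testing |
+ | 6 | No backfill capability | MEDIUM | Historical data cannot be repaired | Backfill testing |
+ | 7 | Over-micro-batching | MEDIUM | Small files, wasted resources | File count monitoring + cost analysis |
+ | 8 | Ignoring backpressure | HIGH | OOM, data loss, latency spikes | Consumer lag monitoring |
+
+ ---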
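+
+ ## Agent Checklist
+
+ - [ ] Every pipeline task is idempotent; reruns produce no duplicate data
+ - [ ] The pipeline is fully monitored: run status, data volume, latency, quality
+ - [ ] Alerts cover every critical failure scenario, and an on-call process is in place
+ - [ ] Schemas are validated through a registry or Pydantic/dbt tests
+ - [ ] Data quality checks cover completeness, accuracy, consistency, and timeliness
+ - [ ] No critical component is a single point of failure; the scheduler and storage are redundant
+ - [ ] The pipeline supports parameterized-date backfill, and output is partitioned by date
+ - [ ] Batch frequency matches the business SLA; no over-micro-batching
+ - [ ] Streaming workloads have backpressure control and rate limiting configured
+ - [ ] Consumer lag and processing latency are monitored and alerted on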