@umacloud/knowledge 1.0.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/00-governance/governance-capabilities.md +557 -0
- package/00-governance/knowledge-map.md +39 -0
- package/00-governance/maintenance-policy.md +76 -0
- package/00-governance/review-checklist.md +81 -0
- package/README.md +13 -0
- package/ai/01-standards/agent-development-complete.md +691 -0
- package/ai/01-standards/llm-application-complete.md +488 -0
- package/ai/01-standards/mlops-complete.md +798 -0
- package/ai/01-standards/prompt-engineering-complete.md +646 -0
- package/ai/01-standards/rag-architecture-complete.md +649 -0
- package/ai/02-playbooks/llm-evaluation-playbook.md +847 -0
- package/ai/03-checklists/ai-project-checklist.md +215 -0
- package/ai/04-antipatterns/ai-antipatterns.md +661 -0
- package/ai/05-cases/case-rag-production.md +147 -0
- package/ai/06-glossary/ai-glossary.md +162 -0
- package/ai/agent-evaluation-benchmark.md +53 -0
- package/ai/ai-agent-memory-context-management.md +41 -0
- package/ai/ai-cost-capacity-optimization-playbook.md +42 -0
- package/ai/ai-data-security-and-compliance-playbook.md +37 -0
- package/ai/ai-domain-index-and-checklist.md +40 -0
- package/ai/ai-governance-maturity-model.md +50 -0
- package/ai/ai-model-selection-and-routing-strategy.md +47 -0
- package/ai/ai-observability-and-oncall-runbook.md +52 -0
- package/ai/ai-rag-engineering-playbook.md +42 -0
- package/ai/ai-red-team-and-safety-evaluation.md +42 -0
- package/ai/ai-release-readiness-and-rollback-gate.md +42 -0
- package/ai/llm-agent-engineering-deep-dive.md +57 -0
- package/ai/prompt-and-tool-guardrails.md +52 -0
- package/api/01-standards/enterprise-api-standards.md +198 -0
- package/api/01-standards/rest-api-design-guide.md +63 -0
- package/api/02-playbooks/api-pagination-playbook.md +93 -0
- package/api/02-playbooks/graphql-production-playbook.md +176 -0
- package/api/03-checklists/api-review-checklist.md +55 -0
- package/api/04-antipatterns/api-antipatterns.md +112 -0
- package/architecture/01-standards/api-gateway-patterns.md +496 -0
- package/architecture/01-standards/cloud-native-patterns.md +644 -0
- package/architecture/01-standards/distributed-systems-patterns.md +591 -0
- package/architecture/01-standards/event-driven-architecture.md +595 -0
- package/architecture/01-standards/microservices-patterns-complete.md +968 -0
- package/architecture/01-standards/microservices-patterns.md +495 -0
- package/architecture/01-standards/system-design-interview.md +664 -0
- package/architecture/02-playbooks/microservices-patterns-playbook.md +137 -0
- package/architecture/02-playbooks/migration-playbook.md +780 -0
- package/architecture/02-playbooks/system-design-playbook.md +779 -0
- package/architecture/03-checklists/architecture-decision-checklist.md +297 -0
- package/architecture/04-antipatterns/architecture-antipatterns.md +417 -0
- package/architecture/05-cases/case-netflix-microservices.md +413 -0
- package/architecture/06-glossary/architecture-glossary.md +164 -0
- package/architecture/adr-template-and-examples.md +38 -0
- package/architecture/api-gateway-deep-dive.md +1291 -0
- package/architecture/configuration-management.md +1162 -0
- package/architecture/distributed-transactions.md +1220 -0
- package/architecture/microservices-complete.md +735 -0
- package/architecture/resilience-and-disaster-patterns.md +37 -0
- package/architecture/service-governance.md +1198 -0
- package/architecture/system-architecture-deep-dive.md +37 -0
- package/backend/01-standards/analytics-and-growth.md +65 -0
- package/backend/01-standards/api-and-error-conventions.md +120 -0
- package/backend/01-standards/application-layering-and-packaging.md +160 -0
- package/backend/01-standards/auth-implementation.md +104 -0
- package/backend/01-standards/backend-framework-idioms.md +74 -0
- package/backend/01-standards/background-jobs-and-async.md +66 -0
- package/backend/01-standards/caching-strategies-complete.md +390 -0
- package/backend/01-standards/config-and-observability.md +77 -0
- package/backend/01-standards/data-modeling-and-persistence.md +94 -0
- package/backend/01-standards/django-complete.md +1765 -0
- package/backend/01-standards/email-and-notifications.md +64 -0
- package/backend/01-standards/fastapi-complete.md +925 -0
- package/backend/01-standards/file-upload-and-storage.md +66 -0
- package/backend/01-standards/graphql-api-complete.md +416 -0
- package/backend/01-standards/llm-application-standard.md +78 -0
- package/backend/01-standards/message-queue-patterns.md +379 -0
- package/backend/01-standards/microservices-and-distributed.md +78 -0
- package/backend/01-standards/nestjs-complete.md +2167 -0
- package/backend/01-standards/payment-integration.md +80 -0
- package/backend/01-standards/rate-limiting-complete.md +451 -0
- package/backend/01-standards/realtime-and-websocket.md +65 -0
- package/backend/01-standards/search-and-filtering.md +64 -0
- package/backend/01-standards/spring-boot-complete.md +445 -0
- package/backend/02-playbooks/api-design-playbook.md +718 -0
- package/backend/02-playbooks/email-send-playbook.md +130 -0
- package/backend/02-playbooks/file-upload-s3-playbook.md +153 -0
- package/backend/02-playbooks/typescript-enterprise-playbook.md +133 -0
- package/backend/02-playbooks/websocket-realtime-playbook.md +154 -0
- package/backend/03-checklists/api-launch-checklist.md +189 -0
- package/backend/04-antipatterns/backend-antipatterns.md +1051 -0
- package/blockchain/01-standards/blockchain-basics.md +557 -0
- package/blockchain/01-standards/smart-contract-development.md +1315 -0
- package/cicd/01-standards/deployment-and-delivery-standard.md +96 -0
- package/cicd/01-standards/github-actions-complete.md +473 -0
- package/cicd/01-standards/release-and-store-submission.md +75 -0
- package/cicd/02-playbooks/cicd-pipeline-playbook.md +144 -0
- package/cicd/02-playbooks/release-management-playbook.md +605 -0
- package/cicd/03-checklists/pipeline-security-checklist.md +168 -0
- package/cicd/04-antipatterns/cicd-antipatterns.md +589 -0
- package/cicd/05-cases/case-deployment-automation.md +221 -0
- package/cicd/05-cases/case-gitops-transformation.md +212 -0
- package/cicd/06-glossary/cicd-glossary.md +114 -0
- package/cicd/cicd-blueprint-deep-dive.md +38 -0
- package/cicd/release-readiness-gate.md +37 -0
- package/cloud-native/01-standards/container-security.md +741 -0
- package/cloud-native/01-standards/kubernetes-complete.md +812 -0
- package/cloud-native/02-playbooks/api-gateway-playbook.md +155 -0
- package/cloud-native/02-playbooks/gitops-with-argocd.md +760 -0
- package/cloud-native/02-playbooks/k8s-troubleshooting-playbook.md +1942 -0
- package/cloud-native/02-playbooks/message-queue-playbook.md +129 -0
- package/cloud-native/02-playbooks/multicloud-governance.md +726 -0
- package/cloud-native/02-playbooks/serverless-patterns.md +788 -0
- package/cloud-native/02-playbooks/service-mesh-playbook.md +612 -0
- package/cloud-native/02-playbooks/terraform-iac-playbook.md +143 -0
- package/cloud-native/03-checklists/container-security-checklist.md +431 -0
- package/cloud-native/03-checklists/k8s-production-readiness-checklist.md +460 -0
- package/cloud-native/04-antipatterns/container-antipatterns.md +660 -0
- package/cloud-native/04-antipatterns/k8s-antipatterns.md +743 -0
- package/cloud-native/05-cases/case-k8s-migration.md +478 -0
- package/cloud-native/05-cases/case-k8s-scaling.md +642 -0
- package/cloud-native/05-cases/case-k8s-security-incident.md +397 -0
- package/cloud-native/06-glossary/cloud-native-glossary.md +337 -0
- package/cross-platform/01-standards/cross-platform-frameworks.md +83 -0
- package/cross-platform/01-standards/platform-selection-and-architecture.md +77 -0
- package/data/01-standards/elasticsearch-complete.md +2098 -0
- package/data/01-standards/postgresql-complete.md +1613 -0
- package/data/01-standards/redis-complete.md +1527 -0
- package/data/02-playbooks/database-optimization-playbook.md +403 -0
- package/data/02-playbooks/elasticsearch-production-playbook.md +132 -0
- package/data/03-checklists/database-launch-checklist.md +187 -0
- package/data/04-antipatterns/database-antipatterns.md +873 -0
- package/data/05-cases/case-database-migration.md +310 -0
- package/data/06-glossary/database-glossary.md +440 -0
- package/data/data-governance-and-modeling-deep-dive.md +39 -0
- package/data-engineering/01-standards/airflow-complete.md +523 -0
- package/data-engineering/01-standards/kafka-complete.md +1521 -0
- package/data-engineering/02-playbooks/spark-etl-playbook.md +496 -0
- package/data-engineering/03-checklists/pipeline-launch-checklist.md +194 -0
- package/data-engineering/04-antipatterns/data-pipeline-antipatterns.md +684 -0
- package/data-engineering/05-cases/case-real-time-pipeline.md +355 -0
- package/data-engineering/06-glossary/data-engineering-glossary.md +429 -0
- package/database/01-standards/database-schema-standards.md +147 -0
- package/database/02-playbooks/postgresql-optimization-quick.md +52 -0
- package/database/02-playbooks/postgresql-performance-optimization.md +58 -0
- package/database/02-playbooks/postgresql-production-playbook.md +146 -0
- package/database/02-playbooks/redis-caching-playbook.md +117 -0
- package/database/03-checklists/database-review-checklist.md +50 -0
- package/database/04-antipatterns/database-antipatterns.md +112 -0
- package/design/01-standards/ui-design-system-complete.md +423 -0
- package/design/02-playbooks/design-handoff-playbook.md +254 -0
- package/design/02-playbooks/design-review-playbook.md +388 -0
- package/design/03-checklists/design-review-checklist.md +246 -0
- package/design/04-antipatterns/design-antipatterns.md +378 -0
- package/design/05-cases/case-design-system-adoption.md +328 -0
- package/design/06-glossary/design-glossary.md +329 -0
- package/design/ui-full-lifecycle-cross-platform-playbook.md +571 -0
- package/design/ux-system-deep-dive.md +38 -0
- package/design-systems/00-craft-rules.md +71 -0
- package/design-systems/aesthetic-families.md +43 -0
- package/design-systems/anti-ai-slop.md +162 -0
- package/design-systems/bold-geometric.md +120 -0
- package/design-systems/brutalist-bold.md +103 -0
- package/design-systems/editorial-clean.md +109 -0
- package/design-systems/glass-aurora.md +108 -0
- package/design-systems/modern-minimal.md +145 -0
- package/design-systems/premium-luxury.md +106 -0
- package/design-systems/product-type-design-map.md +48 -0
- package/design-systems/soft-warm.md +123 -0
- package/design-systems/tech-utility.md +113 -0
- package/desktop/01-standards/desktop-app-standard.md +72 -0
- package/desktop/01-standards/desktop-design.md +71 -0
- package/development/00-governance/document-template.md +41 -0
- package/development/01-standards/api-versioning-strategies.md +432 -0
- package/development/01-standards/authentication-patterns-complete.md +479 -0
- package/development/01-standards/css-architecture-complete.md +550 -0
- package/development/01-standards/database-migration-strategies.md +484 -0
- package/development/01-standards/elasticsearch-complete.md +347 -0
- package/development/01-standards/git-complete.md +371 -0
- package/development/01-standards/golang-complete.md +1565 -0
- package/development/01-standards/graphql-complete.md +298 -0
- package/development/01-standards/javascript-bundlers-complete.md +469 -0
- package/development/01-standards/javascript-typescript-complete.md +528 -0
- package/development/01-standards/jest-complete.md +275 -0
- package/development/01-standards/linux-complete.md +234 -0
- package/development/01-standards/logging-observability-complete.md +526 -0
- package/development/01-standards/microservices-communication.md +502 -0
- package/development/01-standards/mongodb-complete.md +406 -0
- package/development/01-standards/oauth2-complete.md +285 -0
- package/development/01-standards/performance-optimization-complete.md +289 -0
- package/development/01-standards/playwright-complete.md +247 -0
- package/development/01-standards/postgresql-complete.md +456 -0
- package/development/01-standards/pytest-complete.md +340 -0
- package/development/01-standards/python-async-programming.md +902 -0
- package/development/01-standards/python-complete.md +956 -0
- package/development/01-standards/python-decorators-complete.md +799 -0
- package/development/01-standards/python-design-patterns.md +2854 -0
- package/development/01-standards/python-packaging-distribution.md +420 -0
- package/development/01-standards/python-testing-strategies.md +607 -0
- package/development/01-standards/python-web-frameworks-comparison.md +471 -0
- package/development/01-standards/redis-complete.md +317 -0
- package/development/01-standards/rest-api-complete.md +316 -0
- package/development/01-standards/rust-complete.md +578 -0
- package/development/01-standards/typescript-advanced-types.md +1513 -0
- package/development/01-standards/web-security-complete.md +292 -0
- package/development/02-playbooks/api-design-playbook.md +810 -0
- package/development/02-playbooks/database-migration-playbook.md +580 -0
- package/development/02-playbooks/debugging-playbook.md +692 -0
- package/development/02-playbooks/feature-delivery-playbook.md +430 -0
- package/development/02-playbooks/incident-hotfix-playbook.md +387 -0
- package/development/02-playbooks/performance-optimization-playbook.md +531 -0
- package/development/02-playbooks/performance-tuning-playbook.md +652 -0
- package/development/02-playbooks/refactor-playbook.md +403 -0
- package/development/02-playbooks/release-playbook.md +469 -0
- package/development/03-checklists/architecture-review-checklist.md +168 -0
- package/development/03-checklists/data-migration-checklist.md +157 -0
- package/development/03-checklists/oncall-handover-checklist.md +173 -0
- package/development/03-checklists/pr-checklist.md +158 -0
- package/development/03-checklists/production-readiness-checklist.md +190 -0
- package/development/03-checklists/release-readiness-checklist.md +154 -0
- package/development/03-checklists/security-review-checklist.md +182 -0
- package/development/04-antipatterns/api-antipatterns.md +657 -0
- package/development/04-antipatterns/architecture-antipatterns.md +686 -0
- package/development/04-antipatterns/backend-antipatterns.md +648 -0
- package/development/04-antipatterns/cicd-antipatterns.md +540 -0
- package/development/04-antipatterns/code-smell-antipatterns.md +571 -0
- package/development/04-antipatterns/data-antipatterns.md +658 -0
- package/development/04-antipatterns/database-antipatterns.md +578 -0
- package/development/04-antipatterns/frontend-antipatterns.md +635 -0
- package/development/04-antipatterns/reliability-antipatterns.md +700 -0
- package/development/04-antipatterns/security-antipatterns.md +747 -0
- package/development/05-cases/case-api-version-migration.md +428 -0
- package/development/05-cases/case-authorization-hardening.md +383 -0
- package/development/05-cases/case-bluegreen-rollback.md +466 -0
- package/development/05-cases/case-cache-snowball-protection.md +485 -0
- package/development/05-cases/case-ci-cd-pipeline.md +544 -0
- package/development/05-cases/case-database-scaling.md +500 -0
- package/development/05-cases/case-db-hotspot-optimization.md +487 -0
- package/development/05-cases/case-incident-mttr-reduction.md +563 -0
- package/development/05-cases/case-microservice-migration.md +375 -0
- package/development/05-cases/case-performance-optimization.md +406 -0
- package/development/05-cases/case-security-incident-response.md +345 -0
- package/development/06-glossary/full-stack-glossary.md +166 -0
- package/development/09-maturity/quarterly-audit-template.md +35 -0
- package/development/11-ui-excellence/ui-aesthetic-system.md +41 -0
- package/development/11-ui-excellence/ui-engineering-excellence.md +435 -0
- package/development/12-scenarios/development-scenarios-guide.md +565 -0
- package/development/13-implementation-assets/implementation-toolkit.md +282 -0
- package/development/13-implementation-assets/knowledge-gates-execution.md +43 -0
- package/development/14-full-lifecycle/software-lifecycle-gates.md +511 -0
- package/development/15-lifecycle-templates/project-templates-collection.md +791 -0
- package/development/api-contract-and-versioning-guide.md +36 -0
- package/development/api-governance-complete.md +43 -0
- package/development/backend-engineering-complete.md +43 -0
- package/development/code-review-quality-complete.md +43 -0
- package/development/concurrency-reliability-complete.md +43 -0
- package/development/database-engineering-complete.md +43 -0
- package/development/engineering-effectiveness-complete.md +43 -0
- package/development/engineering-standards-deep-dive.md +38 -0
- package/development/frontend-engineering-complete.md +43 -0
- package/development/performance-capacity-complete.md +43 -0
- package/development/refactor-migration-complete.md +42 -0
- package/development/refactoring-and-techdebt-playbook.md +37 -0
- package/development/security-in-development-complete.md +43 -0
- package/devops/01-standards/cicd-pipeline-complete.md +262 -0
- package/devops/01-standards/docker-complete.md +1490 -0
- package/devops/01-standards/github-actions-complete.md +337 -0
- package/devops/01-standards/kubernetes-complete.md +638 -0
- package/devops/01-standards/terraform-complete.md +2117 -0
- package/devops/02-playbooks/docker-compose-playbook.md +233 -0
- package/devops/02-playbooks/docker-k8s-production-playbook.md +186 -0
- package/devops/02-playbooks/docker-production-playbook.md +952 -0
- package/edge-iot/01-standards/edge-iot-complete.md +473 -0
- package/experts/architect/api-design.md +178 -0
- package/experts/architect/methodology.md +124 -0
- package/experts/architect/security.md +75 -0
- package/experts/backend-lead/methodology.md +216 -0
- package/experts/devops/methodology.md +160 -0
- package/experts/frontend-lead/methodology.md +178 -0
- package/experts/product-manager/industry/ecommerce.md +43 -0
- package/experts/product-manager/industry/saas.md +40 -0
- package/experts/product-manager/methodology.md +97 -0
- package/experts/qa-lead/methodology.md +123 -0
- package/experts/qa-lead/test-strategy.md +128 -0
- package/experts/uiux-designer/methodology.md +125 -0
- package/frontend/01-standards/accessibility-complete.md +532 -0
- package/frontend/01-standards/accessibility-standard.md +74 -0
- package/frontend/01-standards/admin-dashboard-and-crud.md +72 -0
- package/frontend/01-standards/design-tokens-complete.md +444 -0
- package/frontend/01-standards/forms-and-validation.md +77 -0
- package/frontend/01-standards/frontend-architecture-and-layering.md +119 -0
- package/frontend/01-standards/i18n-and-localization.md +65 -0
- package/frontend/01-standards/nextjs-complete.md +451 -0
- package/frontend/01-standards/react-complete.md +713 -0
- package/frontend/01-standards/react-hooks-complete-guide.md +1100 -0
- package/frontend/01-standards/react-hooks-complete.md +1171 -0
- package/frontend/01-standards/seo-and-web-vitals.md +77 -0
- package/frontend/01-standards/state-management-complete.md +444 -0
- package/frontend/01-standards/vue-complete.md +499 -0
- package/frontend/01-standards/vue3-complete.md +2002 -0
- package/frontend/01-standards/web-framework-best-practices.md +64 -0
- package/frontend/01-standards/web-performance-complete.md +495 -0
- package/frontend/02-playbooks/accessibility-a11y-playbook.md +161 -0
- package/frontend/02-playbooks/frontend-performance-playbook.md +707 -0
- package/frontend/02-playbooks/i18n-internationalization-playbook.md +120 -0
- package/frontend/02-playbooks/performance-optimization-playbook.md +163 -0
- package/frontend/02-playbooks/react-nextjs-production-playbook.md +167 -0
- package/frontend/02-playbooks/react-state-management-playbook.md +173 -0
- package/frontend/03-checklists/component-quality-checklist.md +166 -0
- package/frontend/03-checklists/frontend-launch-checklist.md +299 -0
- package/frontend/04-antipatterns/frontend-antipatterns.md +886 -0
- package/frontend/05-cases/case-performance-optimization.md +274 -0
- package/harmony/01-standards/harmonyos-arkts-standard.md +75 -0
- package/harmony/01-standards/harmonyos-design.md +65 -0
- package/high-quality-engineering-playbook.md +54 -0
- package/incident/01-standards/incident-response-complete.md +303 -0
- package/incident/02-playbooks/chaos-engineering-playbook.md +883 -0
- package/incident/02-playbooks/postmortem-playbook.md +398 -0
- package/incident/03-checklists/incident-readiness-checklist.md +181 -0
- package/incident/04-antipatterns/incident-antipatterns.md +490 -0
- package/incident/05-cases/case-cascade-failure.md +176 -0
- package/incident/06-glossary/incident-glossary.md +114 -0
- package/incident/postmortem-and-response-deep-dive.md +39 -0
- package/industries/ecommerce/ecommerce-complete.md +631 -0
- package/industries/education/education-complete.md +555 -0
- package/industries/fintech/fintech-complete.md +501 -0
- package/industries/gaming/gaming-complete.md +587 -0
- package/industries/healthcare/healthcare-complete.md +452 -0
- package/low-code/01-standards/low-code-complete.md +944 -0
- package/miniprogram/01-standards/ai-common-mistakes.md +61 -0
- package/miniprogram/01-standards/miniprogram-custom-navbar-capsule.md +77 -0
- package/miniprogram/01-standards/miniprogram-design.md +61 -0
- package/miniprogram/01-standards/miniprogram-standard.md +81 -0
- package/mobile/01-standards/android-material-design.md +70 -0
- package/mobile/01-standards/flutter-complete.md +384 -0
- package/mobile/01-standards/ios-design-hig.md +78 -0
- package/mobile/01-standards/mobile-app-standard.md +85 -0
- package/mobile/01-standards/react-native-complete.md +352 -0
- package/mobile/02-playbooks/mobile-cross-platform-playbook.md +175 -0
- package/mobile/02-playbooks/mobile-performance.md +473 -0
- package/mobile/03-checklists/mobile-release-checklist.md +234 -0
- package/mobile/04-antipatterns/mobile-antipatterns.md +798 -0
- package/mobile/05-cases/case-app-performance.md +500 -0
- package/mobile/05-cases/case-app-startup-optimization.md +218 -0
- package/mobile/06-glossary/mobile-glossary.md +484 -0
- package/observability/01-standards/observability-standards.md +103 -0
- package/observability/02-playbooks/prometheus-grafana-playbook.md +135 -0
- package/observability/02-playbooks/structured-logging-playbook.md +73 -0
- package/observability/03-checklists/observability-checklist.md +54 -0
- package/observability/04-antipatterns/observability-antipatterns.md +106 -0
- package/operations/01-standards/prometheus-monitoring-complete.md +1578 -0
- package/operations/02-playbooks/capacity-planning-playbook.md +620 -0
- package/operations/03-checklists/production-launch-checklist.md +365 -0
- package/operations/04-antipatterns/operations-antipatterns.md +664 -0
- package/operations/05-cases/case-sre-practices.md +581 -0
- package/operations/06-glossary/operations-glossary.md +120 -0
- package/operations/aiops-anomaly-detection.md +758 -0
- package/operations/capacity-planning.md +1061 -0
- package/operations/chaos-engineering.md +659 -0
- package/operations/incident-command-system.md +38 -0
- package/operations/observability-complete.md +442 -0
- package/operations/slo-sli-playbook.md +517 -0
- package/operations/sre-operations-deep-dive.md +39 -0
- package/package.json +8 -0
- package/performance/01-standards/performance-and-scalability.md +80 -0
- package/performance/01-standards/performance-standards.md +156 -0
- package/performance/02-playbooks/query-optimization-playbook.md +103 -0
- package/performance/03-checklists/performance-checklist.md +56 -0
- package/performance/04-antipatterns/performance-antipatterns.md +146 -0
- package/product/01-standards/product-management-complete.md +285 -0
- package/product/02-playbooks/feature-launch-playbook.md +207 -0
- package/product/02-playbooks/user-research-playbook.md +532 -0
- package/product/03-checklists/feature-launch-checklist.md +275 -0
- package/product/04-antipatterns/product-antipatterns.md +355 -0
- package/product/05-cases/case-mvp-to-scale.md +384 -0
- package/product/06-glossary/product-glossary.md +462 -0
- package/product/feature-prioritization-framework.md +40 -0
- package/product/kpi-and-metric-tree.md +37 -0
- package/product/product-discovery-and-prd-deep-dive.md +41 -0
- package/quantum/01-standards/quantum-complete.md +1186 -0
- package/security/01-standards/api-security-complete.md +511 -0
- package/security/01-standards/container-runtime-security.md +574 -0
- package/security/01-standards/data-protection-gdpr.md +543 -0
- package/security/01-standards/owasp-top10-complete.md +1890 -0
- package/security/01-standards/secure-coding-baseline.md +90 -0
- package/security/01-standards/supply-chain-security.md +441 -0
- package/security/01-standards/web-security-checklist.md +108 -0
- package/security/01-standards/zero-trust-architecture.md +521 -0
- package/security/02-playbooks/auth-sso-playbook.md +166 -0
- package/security/02-playbooks/incident-response-security-playbook.md +588 -0
- package/security/02-playbooks/owasp-api-security-playbook.md +129 -0
- package/security/02-playbooks/payment-integration-playbook.md +119 -0
- package/security/02-playbooks/penetration-testing-playbook.md +517 -0
- package/security/03-checklists/security-audit-checklist.md +356 -0
- package/security/04-antipatterns/security-coding-antipatterns.md +580 -0
- package/security/05-cases/case-log4shell-incident.md +537 -0
- package/security/05-cases/case-major-breaches.md +468 -0
- package/security/06-glossary/security-glossary.md +212 -0
- package/security/compliance-automation.md +993 -0
- package/security/container-security.md +680 -0
- package/security/devsecops-complete.md +426 -0
- package/security/sast-dast-sca.md +775 -0
- package/security/secrets-management.md +594 -0
- package/security/security-architecture-deep-dive.md +37 -0
- package/security/threat-modeling-stride-playbook.md +40 -0
- package/seed-templates/auth-system.md +59 -0
- package/seed-templates/blog-content.md +94 -0
- package/seed-templates/dashboard.md +89 -0
- package/seed-templates/docs-site.md +73 -0
- package/seed-templates/e-commerce.md +50 -0
- package/seed-templates/saas-landing.md +92 -0
- package/seed-templates/settings-page.md +51 -0
- package/testing/01-standards/test-strategy-and-layering.md +83 -0
- package/testing/01-standards/testing-strategy-complete.md +422 -0
- package/testing/01-standards/unit-testing-best-practices.md +118 -0
- package/testing/02-playbooks/e2e-testing-playbook.md +988 -0
- package/testing/02-playbooks/testing-strategy-playbook.md +126 -0
- package/testing/03-checklists/test-strategy-checklist.md +208 -0
- package/testing/04-antipatterns/testing-antipatterns.md +718 -0
- package/testing/05-cases/case-testing-transformation.md +300 -0
- package/testing/06-glossary/testing-glossary.md +110 -0
- package/testing/risk-based-test-matrix.md +36 -0
- package/testing/testing-strategy-deep-dive.md +37 -0
@@ -0,0 +1,2098 @@
---
id: elasticsearch-complete
title: Elasticsearch Complete Guide for the Data Domain
domain: data
category: 01-standards
difficulty: intermediate
tags: [complete, data, elasticsearch, mapping, performance-optimization, query, core-concepts, overview]
quality_score: 70
last_updated: 2026-06-15
---
# Elasticsearch Complete Guide for the Data Domain
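
> Document version: v1.0 | Last updated: 2026-03-28

## Overview

Elasticsearch is a distributed search and analytics engine built on Apache Lucene, known for near-real-time (NRT) full-text search and horizontal scalability. It is the core component of the ELK Stack for log analytics and is also widely used for site search, recommendation systems, security analytics, APM (application performance monitoring), and vector retrieval. Elasticsearch 8.x adds native vector search (kNN), enables security (TLS and authentication) by default with no manual configuration, and introduces the ES|QL query language.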

### Full-Text Search vs. Structured Search

| Dimension | Full-text search | Structured search |
|------|----------|------------|
| Data type | Unstructured text (articles, logs, comments) | Exact values (status codes, IDs, date ranges) |
| Matching | Tokenization -> inverted index -> relevance scoring | Exact match / range filtering, no scoring |
| Typical queries | `match`, `multi_match`, `match_phrase` | `term`, `range`, `exists`, `bool` with `filter` |
| Scored | Yes (`_score`) | No (filter context, cacheable) |
| Performance profile | CPU-bound (tokenization + scoring) | I/O-bound but highly cacheable |
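
The two columns are usually combined in a single `bool` query: the full-text part goes in `must` and contributes to `_score`, while the structured part goes in `filter` and is not scored. The sketch below only builds the request body as a Python dictionary; the field names `description`, `category`, and `price` and the helper name `build_product_search` are assumptions for illustration, and `category` is assumed to be mapped as `keyword`.

```python
import json


def build_product_search(text, category, max_price):
    """Build a bool query with a scored full-text clause and unscored filters."""
    return {
        "query": {
            "bool": {
                # Query context: analyzed and scored.
                "must": [{"match": {"description": text}}],
                # Filter context: exact/range conditions, no scoring, cacheable.
                "filter": [
                    {"term": {"category": category}},
                    {"range": {"price": {"lte": max_price}}},
                ],
            }
        }
    }


body = build_product_search("titanium", "electronics", 8000)
print(json.dumps(body, indent=2, ensure_ascii=False))
```

The resulting body could be sent to `POST /<index>/_search`; moving a condition from `must` to `filter` is the usual choice when it should restrict results without influencing ranking.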
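
### When to Use ES

| Scenario | Description |
|------|------|
| Site search | E-commerce products, documentation knowledge bases, CMS content retrieval |
| Logging and observability | ELK/EFK Stack, centralized log search and alerting |
| Security analytics (SIEM) | Elastic Security, threat detection and incident tracing |
| APM | Distributed tracing, service topology, error tracking |
| Recommendation / personalization | Real-time recommendations based on user behavior, combined with `function_score` |
| Vector retrieval | kNN search in 8.x, hybrid ranking of semantic and traditional retrieval |
| Geospatial | `geo_point` / `geo_shape` queries, LBS applications |
| Metric aggregation | Real-time dashboards, multi-dimensional statistical analysis |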
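
### When Not to Choose ES

- **Strong transactional requirements**: ES does not support ACID transactions, and a write becomes visible to search only after the next refresh (every 1s by default)
- **Frequent full-document updates**: an ES update is effectively a delete followed by a reindex of the whole document, which causes heavy write amplification
- **Primary data store**: ES should not be the sole source of truth; pair it with a relational database
- **Exact lookups on small data sets**: when the data set is below the million-record range and only exact-match queries are needed, an RDBMS is a better fit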

---

## Core Concepts

### 1. Index

An index in ES is similar to a "table" in a relational database: it is the logical container for documents. Each index has its own mapping (schema definition) and settings (number of shards, number of replicas, analyzers, and so on).

```json
// Create an index.
// The ik_max_word tokenizer requires the IK analysis plugin.
// A custom filter such as synonym_filter must be defined under analysis.filter
// before an analyzer can reference it; the synonym list is only illustrative.
PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "refresh_interval": "5s",
    "analysis": {
      "filter": {
        "synonym_filter": {
          "type": "synonym",
          "synonyms": ["phone, mobile"]
        }
      },
      "analyzer": {
        "product_analyzer": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": ["lowercase", "synonym_filter"]
        }
      }
    }
  }
}
```

### 2. Document

The document is the smallest unit of data in ES and is stored as JSON. Every document belongs to one index and has a unique `_id` within it.

```json
// Index a document
PUT /products/_doc/1
{
  "name": "Apple iPhone 15 Pro",
  "category": "electronics",
  "price": 7999,
  "description": "A17 Pro chip, titanium design, 48 MP main camera",
  "tags": ["smartphone", "apple", "5g"],
  "created_at": "2024-09-15T10:30:00Z"
}
```
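
Because the `_id` is unique within an index, sending `PUT /products/_doc/1` a second time replaces the existing document instead of creating another one. The sketch below is a toy in-memory model of that behavior, not the Elasticsearch client; the function name `index_doc` and the second price are made up for illustration.

```python
def index_doc(store, index, doc_id, source):
    """Mimic PUT /<index>/_doc/<id>: create the document or fully replace it."""
    key = (index, doc_id)
    previous = store.get(key)
    version = 1 if previous is None else previous["_version"] + 1
    store[key] = {"_version": version, "_source": source}
    return {
        "_index": index,
        "_id": doc_id,
        "_version": version,
        "result": "created" if previous is None else "updated",
    }


store = {}
first = index_doc(store, "products", "1", {"name": "Apple iPhone 15 Pro", "price": 7999})
second = index_doc(store, "products", "1", {"name": "Apple iPhone 15 Pro", "price": 7499})
print(first["result"], first["_version"])    # created 1
print(second["result"], second["_version"])  # updated 2
print(len(store))                            # 1
```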

### 3. Shard

An index can be split into multiple shards, each of which is an independent Lucene instance. Shards are the basis of horizontal scaling in ES.

- **Primary shard**: the target of writes; the number of primary shards cannot be changed after the index is created (except by reindexing or by using the Split or Shrink APIs)
- **Replica shard**: a copy of a primary shard that provides read load balancing and fault tolerance

```
Index: products (3P + 1R)
├── Shard 0 (Primary)     -> Node A
│   └── Shard 0 (Replica) -> Node B
├── Shard 1 (Primary)     -> Node B
│   └── Shard 1 (Replica) -> Node C
└── Shard 2 (Primary)     -> Node C
    └── Shard 2 (Replica) -> Node A
```
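
One way to see why the primary shard count is fixed: a document is assigned to a shard by hashing its routing value (the `_id` by default) and taking the result modulo the number of primary shards. The sketch below is a simplified model under stated assumptions: `zlib.crc32` stands in for the Murmur3 hash that Elasticsearch uses, the routing-shards factor is ignored, and the helper name `route` is hypothetical.

```python
import zlib


def route(doc_id, num_primary_shards):
    """Pick a shard number from the routing value (simplified model)."""
    # crc32 is a stand-in hash; Elasticsearch itself uses Murmur3.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


doc_ids = [str(i) for i in range(1, 1001)]
with_3 = {d: route(d, 3) for d in doc_ids}
with_4 = {d: route(d, 4) for d in doc_ids}

# Count documents whose target shard changes when the shard count changes.
moved = sum(1 for d in doc_ids if with_3[d] != with_4[d])
print(f"{moved} of {len(doc_ids)} documents would map to a different shard")
```

Since most documents would land on a different shard after the change, lookups by `_id` would miss them, which is why changing the count means rewriting the data.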

**Shard sizing guidelines**:
- Aim for 10GB–50GB per shard (log workloads can stretch to 50GB–80GB)
- Keep the number of shards on each node to no more than 20 per GB of heap memory
- Avoid over-sharding; even empty shards consume resources
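
The guidelines above reduce to simple arithmetic. The sketch below assumes a hypothetical 600 GB index, a 30 GB target shard size (inside the 10GB–50GB range), one replica, and a node with 30 GB of heap; the function names are illustrative only.

```python
import math


def primary_shards_needed(index_size_gb, target_shard_gb=30):
    """Number of primary shards so that each stays near the target size."""
    return max(1, math.ceil(index_size_gb / target_shard_gb))


def max_shards_on_node(heap_gb, shards_per_gb_heap=20):
    """Upper bound on shards per node from the 20-per-GB-of-heap guideline."""
    return heap_gb * shards_per_gb_heap


primaries = primary_shards_needed(600)   # ceil(600 / 30)
total_copies = primaries * (1 + 1)       # primaries plus one replica each
print(primaries, total_copies, max_shards_on_node(30))  # 20 40 600
```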

### 4. Replica

A replica is a full copy of a primary shard. It serves several purposes:

- **High availability**: when the node holding a primary shard fails, a replica is automatically promoted to primary
- **Read scaling**: search requests can be routed to replicas, spreading the read load
- **Dynamically adjustable**: the number of replicas can be changed at any time:

```json
PUT /products/_settings
{
  "number_of_replicas": 2
}
```
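
Raising `number_of_replicas` only helps if there are enough nodes, because a replica is never allocated on the same node as its primary or as another copy of the same shard. The sketch below is a simplified model of the resulting index health (green: all copies assigned; yellow: all primaries assigned but some replicas unassigned; red: a primary unassigned). It ignores disk watermarks and allocation filters, and `index_health` is a hypothetical helper.

```python
def index_health(number_of_replicas, data_nodes):
    """Return green/yellow/red for an index in a simplified allocation model."""
    if data_nodes < 1:
        return "red"  # no node can hold the primaries
    # Each copy of a shard needs its own node, so at most (nodes - 1) replicas fit.
    assignable = min(number_of_replicas, data_nodes - 1)
    return "green" if assignable == number_of_replicas else "yellow"


for nodes in (1, 2, 3):
    print(nodes, index_health(1, nodes), index_health(2, nodes))
# 1 yellow yellow
# 2 green yellow
# 3 green green
```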

### 5. Segments and Near-Real-Time Search

Each shard is made up of multiple immutable segments. The document write path is:

```
Write request -> Memory buffer -> refresh (default 1s) -> new segment (searchable)
              -> Translog (durability guarantee)
              -> flush -> segments persisted to disk
                       -> translog cleared
```

- **refresh**: writes the contents of the in-memory buffer to a new segment, making them searchable (every 1 second by default)
- **flush**: persists segments to disk and clears the translog (managed automatically by ES)
- **merge**: a background process that periodically merges small segments into larger ones and reclaims the space held by deleted documents

```json
// Manual refresh (rarely needed; useful in tests)
POST /products/_refresh

// Adjust the refresh interval (can be disabled temporarily during bulk loads; restore it afterwards)
PUT /products/_settings
{
  "refresh_interval": "-1"
}
```
|
|
157
|
+
|
|
158
|
+
### 6. How the inverted index works

The inverted index is the core data structure behind full-text search in ES. It maps each term produced by analysis to the list of documents that contain it.

```
Original documents:
Doc 1: "Elasticsearch 是一个搜索引擎"
Doc 2: "Elasticsearch 支持全文搜索"
Doc 3: "Lucene 是搜索引擎的基础"

Inverted index after analysis (simplified):
Term           -> Posting List (DocID, Position, Frequency)
─────────────────────────────────────────────────────────
elasticsearch  -> [{doc:1, pos:0, freq:1}, {doc:2, pos:0, freq:1}]
搜索引擎       -> [{doc:1, pos:1, freq:1}, {doc:3, pos:1, freq:1}]
全文搜索       -> [{doc:2, pos:1, freq:1}]
lucene         -> [{doc:3, pos:0, freq:1}]
基础           -> [{doc:3, pos:2, freq:1}]
```

Key components around the inverted index:
- **Term Dictionary**: all analyzed terms, stored in sorted order
- **Posting List**: for each term, the list of document IDs plus positions and term frequencies
- **Doc Values**: columnar storage used for sorting and aggregations (not part of the inverted index, but equally important)
- **Stored Fields**: row-oriented storage of original field values (used to return `_source`)

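To make the term-to-posting-list mapping concrete, here is a minimal sketch of an inverted index in plain Python. It assumes whitespace tokenization and lowercasing only, which is far simpler than a real Lucene analyzer, and the function names are invented for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to {doc_id: [positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, token in enumerate(text.lower().split()):
            index[token].setdefault(doc_id, []).append(pos)
    return index


def search(index, term):
    """Return the sorted IDs of documents that contain the term."""
    return sorted(index.get(term.lower(), {}))


docs = {
    1: "Elasticsearch is a search engine",
    2: "Elasticsearch supports full text search",
    3: "Lucene is the basis of the search engine",
}
index = build_inverted_index(docs)
print(search(index, "Elasticsearch"))   # documents containing the term
print(index["the"][3])                  # positions of "the" in doc 3; its length is the term frequency
```
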

---

## Mapping design

A mapping defines the data type of every field in an index and how it is indexed, similar to a schema in an RDBMS.

### Common field types

| Type | Description | Index structure |
|------|------|----------|
| `text` | Full-text field, analyzed into terms | Inverted index |
| `keyword` | Exact value, not analyzed | Inverted index (whole value as a single term) |
| `long` / `integer` / `short` / `byte` | Integers | BKD tree |
| `float` / `double` / `half_float` / `scaled_float` | Floating-point numbers | BKD tree |
| `date` | Date and time | BKD tree |
| `boolean` | Boolean | Inverted index |
| `object` | JSON object (stored flattened) | Each sub-field indexed independently |
| `nested` | Nested object (preserves the association between inner fields) | Separate hidden documents |
| `geo_point` | Latitude/longitude coordinates | BKD tree |
| `geo_shape` | Arbitrary GeoJSON geometry | BKD tree |
| `dense_vector` | Dense vector (for kNN search) | HNSW graph index |
| `ip` | IPv4 / IPv6 address | BKD tree |
| `completion` | Autocomplete (FST structure) | Dedicated index |
| `join` | Parent-child relation | Routed to the same shard |

### Dynamic mapping

ES enables dynamic mapping by default and infers field types automatically. In production, consider disabling it or setting it to `strict`:

```json
PUT /orders
{
  "mappings": {
    "dynamic": "strict",
    "properties": {
      "order_id": { "type": "keyword" },
      "amount": { "type": "scaled_float", "scaling_factor": 100 },
      "status": { "type": "keyword" },
      "created_at": { "type": "date", "format": "yyyy-MM-dd'T'HH:mm:ssZ||epoch_millis" }
    }
  }
}
```

Dynamic mapping options:
- `true` (default): new fields are added automatically
- `false`: new fields are ignored (not indexed, but still kept in `_source`)
- `strict`: an undefined field causes the request to fail
- `runtime`: new fields are added as runtime fields (not indexed)

### Multi-fields

The same field can be indexed in several ways to serve different query needs:

```json
{
  "mappings": {
    "properties": {
      "product_name": {
        "type": "text",
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_smart",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          },
          "pinyin": {
            "type": "text",
            "analyzer": "pinyin_analyzer"
          },
          "suggest": {
            "type": "completion"
          }
        }
      }
    }
  }
}
```

At query time, use `product_name` (full-text search), `product_name.keyword` (exact match / aggregations), `product_name.pinyin` (pinyin search), or `product_name.suggest` (autocomplete) to reach the different sub-fields. `pinyin_analyzer` is assumed to be defined in the index's analysis settings.

### Analyzer chain

An analyzer has three parts that process text in order:

```
Raw text -> Character Filter -> Tokenizer -> Token Filter -> Terms
```

- **Character Filter**: preprocesses the raw text before tokenization (e.g. stripping HTML tags, character mapping)
- **Tokenizer**: splits the text into tokens
- **Token Filter**: transforms the tokens (e.g. lowercasing, synonyms, stop-word removal, stemming)

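The three stages can be mimicked in a few lines of Python. This is a toy model meant to show the order of operations only: the regex-based stages and the stop-word set are assumptions for illustration and do not reproduce any built-in ES analyzer.

```python
import re

STOPWORDS = {"is", "the", "a"}  # hypothetical stop-word list for the example


def char_filter(text: str) -> str:
    """Character filter stage: strip HTML tags and map '&' to 'and'."""
    text = re.sub(r"<[^>]+>", " ", text)
    return text.replace("&", " and ")


def tokenizer(text: str) -> list:
    """Tokenizer stage: split the text into word tokens."""
    return re.findall(r"\w+", text)


def token_filter(tokens: list) -> list:
    """Token filter stage: lowercase, then drop stop words."""
    lowered = [t.lower() for t in tokens]
    return [t for t in lowered if t not in STOPWORDS]


def analyze(text: str) -> list:
    """Run the stages in the same order an analyzer does."""
    return token_filter(tokenizer(char_filter(text)))


print(analyze("<b>Search</b> & Analytics is the Core"))
```

Because the character filter runs first, the tokenizer never sees the HTML markup, and the token filter only ever sees whole tokens.
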
### Custom analyzers

```json
// Note: `synonym_filter` is updateable, so it is used only in the search-time analyzer;
// ES rejects updateable synonym filters in index-time analyzers.
// The `pinyin` filter type requires the pinyin analysis plugin.
PUT /articles
{
  "settings": {
    "analysis": {
      "char_filter": {
        "html_strip_filter": {
          "type": "html_strip",
          "escaped_tags": ["b", "em"]
        },
        "ampersand_mapping": {
          "type": "mapping",
          "mappings": ["& => and", "| => or"]
        }
      },
      "tokenizer": {
        "my_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 15,
          "token_chars": ["letter", "digit"]
        }
      },
      "filter": {
        "my_stopwords": {
          "type": "stop",
          "stopwords": ["的", "了", "是", "在", "和"]
        },
        "synonym_filter": {
          "type": "synonym_graph",
          "synonyms_path": "analysis/synonyms.txt",
          "updateable": true
        },
        "my_pinyin": {
          "type": "pinyin",
          "keep_full_pinyin": true,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true
        }
      },
      "analyzer": {
        "article_analyzer": {
          "type": "custom",
          "char_filter": ["html_strip_filter", "ampersand_mapping"],
          "tokenizer": "ik_max_word",
          "filter": ["lowercase", "my_stopwords"]
        },
        "search_analyzer": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": ["lowercase", "synonym_filter"]
        },
        "autocomplete_analyzer": {
          "type": "custom",
          "tokenizer": "my_edge_ngram",
          "filter": ["lowercase"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "article_analyzer",
        "search_analyzer": "search_analyzer"
      },
      "content": {
        "type": "text",
        "analyzer": "article_analyzer"
      }
    }
  }
}
```

### Chinese tokenization: IK Analyzer

IK is the most widely used Chinese analysis plugin for ES. It offers two modes:

| Mode | Description | Typical use |
|------|------|----------|
| `ik_max_word` | Finest-grained segmentation; exhausts all possible combinations | Index time, to improve recall |
| `ik_smart` | Coarsest-grained segmentation; no overlapping tokens | Search time, to improve precision |

```bash
# Install the IK plugin
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v8.x.x/elasticsearch-analysis-ik-8.x.x.zip

# Test the tokenization (a Console request; run it in Kibana Dev Tools, not the shell)
POST /_analyze
{
  "analyzer": "ik_max_word",
  "text": "中华人民共和国国歌"
}
# Result: ["中华人民共和国", "中华人民", "中华", "华人", "人民共和国", "人民", "共和国", "共和", "国歌"]
```

**Custom dictionaries**:

```xml
<!-- config/IKAnalyzer.cfg.xml -->
<properties>
  <entry key="ext_dict">custom/custom_dict.dic</entry>
  <entry key="ext_stopwords">custom/custom_stopwords.dic</entry>
  <entry key="remote_ext_dict">http://dict-server/hot_words.txt</entry>
  <entry key="remote_ext_stopwords">http://dict-server/stop_words.txt</entry>
</properties>
```

### Synonym configuration

```
# analysis/synonyms.txt
# Format 1: equivalent synonyms
手机,手提电话,移动电话
电脑,计算机,PC

# Format 2: one-way mapping
iPhone => 苹果手机
MacBook => 苹果笔记本

# Format 3: abbreviation expansion
ES => Elasticsearch
K8s => Kubernetes
```

Prefer applying the synonym filter in the search analyzer rather than the index analyzer, so that updating synonyms does not require rebuilding the index.


---

## Query DSL

ES queries run in one of two contexts:
- **Query context**: computes a relevance score (`_score`); used for full-text search
- **Filter context**: no scoring, results can be cached; used for exact filtering

### match query

```json
// Basic match: the query text is analyzed before searching
GET /products/_search
{
  "query": {
    "match": {
      "description": {
        "query": "苹果手机 拍照",
        "operator": "or",
        "minimum_should_match": "75%"
      }
    }
  }
}

// match_phrase: phrase matching that preserves term order
GET /products/_search
{
  "query": {
    "match_phrase": {
      "description": {
        "query": "钛金属设计",
        "slop": 1
      }
    }
  }
}

// multi_match: search across multiple fields
GET /products/_search
{
  "query": {
    "multi_match": {
      "query": "苹果手机",
      "fields": ["name^3", "description^1", "tags^2"],
      "type": "best_fields",
      "tie_breaker": 0.3
    }
  }
}
```

`multi_match` types:
- `best_fields`: uses the score of the best-matching field (default)
- `most_fields`: sums the scores of all matching fields
- `cross_fields`: treats the fields as one large field
- `phrase`: runs `match_phrase` on each field
- `phrase_prefix`: runs `match_phrase_prefix` on each field

### term query

```json
// term: exact match (the input is not analyzed)
GET /products/_search
{
  "query": {
    "term": {
      "status": {
        "value": "active"
      }
    }
  }
}

// terms: exact match on multiple values (similar to SQL IN)
GET /products/_search
{
  "query": {
    "terms": {
      "category": ["electronics", "accessories"]
    }
  }
}

// exists: check that a field is present
GET /products/_search
{
  "query": {
    "exists": {
      "field": "discount_price"
    }
  }
}
```

> **Warning**: Do not use `term` queries on `text` fields. A `text` field is analyzed at index time while `term` does not analyze its input, so the query will rarely match. Query a `keyword` sub-field instead.

### bool query

```json
GET /products/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "description": "智能手机" } }
      ],
      "must_not": [
        { "term": { "status": "discontinued" } }
      ],
      "should": [
        { "term": { "brand.keyword": "Apple" } },
        { "term": { "brand.keyword": "Samsung" } }
      ],
      "minimum_should_match": 1,
      "filter": [
        { "range": { "price": { "gte": 3000, "lte": 10000 } } },
        { "term": { "in_stock": true } }
      ]
    }
  }
}
```

- `must`: must match; contributes to the score
- `must_not`: must not match; does not contribute to the score (filter context)
- `should`: optional match; contributes to the score. If there is no `must` or `filter` clause, at least one `should` clause must match
- `filter`: must match; does not contribute to the score (filter context, results can be cached)

### range query

```json
GET /orders/_search
{
  "query": {
    "range": {
      "created_at": {
        "gte": "2024-01-01",
        "lt": "2024-07-01",
        "format": "yyyy-MM-dd",
        "time_zone": "+08:00"
      }
    }
  }
}
```

### nested query

When inner objects are stored with the `nested` type, they must be queried with the `nested` query:

```json
// Mapping
PUT /orders
{
  "mappings": {
    "properties": {
      "order_id": { "type": "keyword" },
      "items": {
        "type": "nested",
        "properties": {
          "product_name": { "type": "text", "analyzer": "ik_smart" },
          "quantity": { "type": "integer" },
          "unit_price": { "type": "float" }
        }
      }
    }
  }
}

// Query: find orders that contain a given product with quantity > 5
GET /orders/_search
{
  "query": {
    "nested": {
      "path": "items",
      "query": {
        "bool": {
          "must": [
            { "match": { "items.product_name": "键盘" } },
            { "range": { "items.quantity": { "gt": 5 } } }
          ]
        }
      },
      "inner_hits": {
        "size": 3,
        "_source": ["items.product_name", "items.quantity"]
      }
    }
  }
}
```

> **Note**: The `object` type flattens the fields of inner objects, so values from different objects can cross-match incorrectly. To preserve the association between fields inside each inner object, use the `nested` type.

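The cross-matching problem described in the note is easy to reproduce outside ES. The following sketch models the two behaviours with plain Python data structures; the helper names are invented for this illustration and exact string equality stands in for a real `match` query.

```python
def flatten(items):
    """Mimic the `object` type: collect each field's values across all inner objects."""
    flat = {}
    for item in items:
        for key, value in item.items():
            flat.setdefault(key, []).append(value)
    return flat


def object_match(items, name, min_qty):
    """Flattened semantics: each condition may be satisfied by a different inner object."""
    flat = flatten(items)
    has_name = name in flat.get("product_name", [])
    has_qty = any(q > min_qty for q in flat.get("quantity", []))
    return has_name and has_qty


def nested_match(items, name, min_qty):
    """Nested semantics: both conditions must hold on the same inner object."""
    return any(i["product_name"] == name and i["quantity"] > min_qty for i in items)


order_items = [
    {"product_name": "keyboard", "quantity": 1},
    {"product_name": "mouse", "quantity": 10},
]
print(object_match(order_items, "keyboard", 5))  # false positive under flattening
print(nested_match(order_items, "keyboard", 5))  # correct result
```

The order contains a keyboard and some item with quantity above 5, but never both on the same line item, which is exactly the case where the two mappings disagree.
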
### has_child / has_parent queries

```json
// Mapping (parent-child relation)
PUT /qa_forum
{
  "mappings": {
    "properties": {
      "relation_type": {
        "type": "join",
        "relations": {
          "question": "answer"
        }
      },
      "title": { "type": "text" },
      "content": { "type": "text" },
      "votes": { "type": "integer" }
    }
  }
}

// Index a parent document (question)
PUT /qa_forum/_doc/q1
{
  "title": "如何优化 Elasticsearch 查询性能?",
  "relation_type": "question"
}

// Index a child document (answer); route it to the shard that holds the parent
PUT /qa_forum/_doc/a1?routing=q1
{
  "content": "首先确保使用了 filter context...",
  "votes": 42,
  "relation_type": {
    "name": "answer",
    "parent": "q1"
  }
}

// has_child: find questions that have highly voted answers
GET /qa_forum/_search
{
  "query": {
    "has_child": {
      "type": "answer",
      "query": {
        "range": { "votes": { "gte": 10 } }
      },
      "score_mode": "max",
      "min_children": 1
    }
  }
}

// has_parent: find all answers to the matching questions
GET /qa_forum/_search
{
  "query": {
    "has_parent": {
      "parent_type": "question",
      "query": {
        "match": { "title": "Elasticsearch 性能" }
      }
    }
  }
}
```

### function_score / script_score

```json
// function_score: custom scoring
GET /products/_search
{
  "query": {
    "function_score": {
      "query": { "match": { "name": "手机" } },
      "functions": [
        {
          "filter": { "term": { "is_promoted": true } },
          "weight": 5
        },
        {
          "field_value_factor": {
            "field": "sales_count",
            "modifier": "log1p",
            "factor": 0.5
          }
        },
        {
          "gauss": {
            "created_at": {
              "origin": "now",
              "scale": "30d",
              "offset": "7d",
              "decay": 0.5
            }
          }
        },
        {
          "random_score": {
            "seed": 12345,
            "field": "_seq_no"
          }
        }
      ],
      "score_mode": "sum",
      "boost_mode": "multiply",
      "max_boost": 100
    }
  }
}

// script_score: scripted scoring (vector similarity and other complex cases)
GET /products/_search
{
  "query": {
    "script_score": {
      "query": { "match_all": {} },
      "script": {
        "source": """
          double textScore = _score;
          double popularity = doc['popularity'].value;
          double recency = decayDateLinear(params.origin, params.scale, params.offset, params.decay, doc['created_at'].value);
          return textScore * 0.5 + Math.log1p(popularity) * 0.3 + recency * 0.2;
        """,
        "params": {
          "origin": "2024-06-01",
          "scale": "90d",
          "offset": "7d",
          "decay": 0.5
        }
      }
    }
  }
}
```

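The `gauss` function above is easier to reason about with numbers. The sketch below evaluates the Gaussian decay curve as described in the Elasticsearch function score documentation, with distances expressed in days for simplicity; the function name and the use of plain floats instead of dates are assumptions for this illustration.

```python
import math


def gauss_decay(distance: float, scale: float, offset: float = 0.0, decay: float = 0.5) -> float:
    """Gaussian decay: 1.0 within `offset`, and exactly `decay` at distance offset + scale."""
    sigma_sq = -(scale ** 2) / (2.0 * math.log(decay))
    effective = max(0.0, abs(distance) - offset)
    return math.exp(-(effective ** 2) / (2.0 * sigma_sq))


# Same parameters as the function_score example: scale=30d, offset=7d, decay=0.5
for age_days in (0, 7, 37, 90):
    print(age_days, round(gauss_decay(age_days, scale=30, offset=7), 4))
```

With these parameters a document up to 7 days old keeps a full factor of 1.0, and a 37-day-old document is scored at half of that.
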
### Aggregations

Aggregations are a powerful analytics feature of ES and can be nested and combined.

#### Bucket aggregations

```json
// terms aggregation: group by category
GET /products/_search
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": {
        "field": "category.keyword",
        "size": 20,
        "order": { "_count": "desc" },
        "min_doc_count": 1
      }
    }
  }
}

// date_histogram aggregation: bucket by time
GET /orders/_search
{
  "size": 0,
  "aggs": {
    "monthly_sales": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month",
        "format": "yyyy-MM",
        "time_zone": "+08:00",
        "min_doc_count": 0,
        "extended_bounds": {
          "min": "2024-01",
          "max": "2024-12"
        }
      },
      "aggs": {
        "total_revenue": {
          "sum": { "field": "amount" }
        },
        "avg_order_value": {
          "avg": { "field": "amount" }
        }
      }
    }
  }
}

// range aggregation: custom range buckets
GET /products/_search
{
  "size": 0,
  "aggs": {
    "price_ranges": {
      "range": {
        "field": "price",
        "ranges": [
          { "key": "budget", "to": 1000 },
          { "key": "mid", "from": 1000, "to": 5000 },
          { "key": "premium", "from": 5000, "to": 10000 },
          { "key": "luxury", "from": 10000 }
        ]
      }
    }
  }
}
```

#### Metrics aggregations

```json
GET /orders/_search
{
  "size": 0,
  "aggs": {
    "revenue_stats": {
      "stats": { "field": "amount" }
    },
    "unique_customers": {
      "cardinality": {
        "field": "customer_id",
        "precision_threshold": 10000
      }
    },
    "percentile_response_time": {
      "percentiles": {
        "field": "response_ms",
        "percents": [50, 90, 95, 99]
      }
    },
    "top_orders": {
      "top_hits": {
        "size": 3,
        "sort": [{ "amount": "desc" }],
        "_source": ["order_id", "amount", "customer_name"]
      }
    }
  }
}
```

#### Pipeline aggregations

```json
// Note: `moving_avg` was removed in ES 8.0; `moving_fn` with
// MovingFunctions.unweightedAvg is the equivalent of the former "simple" model.
GET /orders/_search
{
  "size": 0,
  "aggs": {
    "monthly": {
      "date_histogram": {
        "field": "created_at",
        "calendar_interval": "month"
      },
      "aggs": {
        "monthly_revenue": {
          "sum": { "field": "amount" }
        },
        "revenue_derivative": {
          "derivative": {
            "buckets_path": "monthly_revenue"
          }
        },
        "cumulative_revenue": {
          "cumulative_sum": {
            "buckets_path": "monthly_revenue"
          }
        },
        "moving_avg_revenue": {
          "moving_fn": {
            "buckets_path": "monthly_revenue",
            "window": 3,
            "script": "MovingFunctions.unweightedAvg(values)"
          }
        }
      }
    },
    "max_monthly_revenue": {
      "max_bucket": {
        "buckets_path": "monthly>monthly_revenue"
      }
    }
  }
}
```

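Pipeline aggregations operate on the output of other aggregations rather than on documents. The sketch below reproduces the arithmetic of the four pipeline aggregations in the request above on a made-up list of monthly sums. It is a simplification: the moving average covers the preceding buckets only and returns `None` for an empty window, an assumption chosen for readability.

```python
from itertools import accumulate


def derivative(values):
    """Difference between each bucket and the previous one (no value for the first bucket)."""
    return [None] + [b - a for a, b in zip(values, values[1:])]


def cumulative_sum(values):
    """Running total across buckets."""
    return list(accumulate(values))


def moving_average(values, window):
    """Unweighted average of up to `window` preceding buckets, excluding the current one."""
    result = []
    for i in range(len(values)):
        previous = values[max(0, i - window):i]
        result.append(sum(previous) / len(previous) if previous else None)
    return result


monthly_revenue = [100, 150, 120, 180]   # hypothetical per-month sums
print(derivative(monthly_revenue))
print(cumulative_sum(monthly_revenue))
print(moving_average(monthly_revenue, 3))
print(max(monthly_revenue))              # what max_bucket would report
```
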

---

## Index management

### Index aliases

An alias is a virtual name that points to one or more indices. It is the key tool for switching indices with zero downtime.

```json
// Create an alias (here, atomically move `products` from v1 to v2)
POST /_aliases
{
  "actions": [
    { "add": { "index": "products_v2", "alias": "products" } },
    { "remove": { "index": "products_v1", "alias": "products" } }
  ]
}

// Filtered alias (a virtual view over a subset)
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "logs-2024",
        "alias": "logs-error",
        "filter": { "term": { "level": "error" } }
      }
    }
  ]
}

// Write alias (designate the write target)
POST /_aliases
{
  "actions": [
    {
      "add": {
        "index": "products_v2",
        "alias": "products_write",
        "is_write_index": true
      }
    }
  ]
}
```

### Rollover

```json
// Create the initial index and attach the aliases
PUT /logs-000001
{
  "aliases": {
    "logs-write": { "is_write_index": true },
    "logs-read": {}
  },
  "settings": {
    "number_of_shards": 2,
    "number_of_replicas": 1
  }
}

// Manual rollover
POST /logs-write/_rollover
{
  "conditions": {
    "max_age": "7d",
    "max_docs": 10000000,
    "max_primary_shard_size": "50gb"
  }
}
// Result: if any condition is met, logs-000002 is created and logs-write points to it
```

### ILM (index lifecycle management)

```json
// Note: the `freeze` action is omitted from the cold phase; it is deprecated and a no-op in 8.x.
PUT /_ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_primary_shard_size": "50gb",
            "max_age": "7d"
          },
          "set_priority": { "priority": 100 }
        }
      },
      "warm": {
        "min_age": "30d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 },
          "allocate": {
            "require": { "data": "warm" }
          },
          "set_priority": { "priority": 50 }
        }
      },
      "cold": {
        "min_age": "90d",
        "actions": {
          "allocate": {
            "require": { "data": "cold" }
          },
          "set_priority": { "priority": 0 }
        }
      },
      "delete": {
        "min_age": "365d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

// Apply the policy through an index template
PUT /_index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 2,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs_policy",
      "index.lifecycle.rollover_alias": "logs-write"
    }
  }
}
```

### Reindex

```json
// Basic reindex
POST /_reindex
{
  "source": {
    "index": "products_v1",
    "query": {
      "range": { "created_at": { "gte": "2024-01-01" } }
    }
  },
  "dest": {
    "index": "products_v2",
    "pipeline": "enrich_product"
  }
}

// Remote reindex (cross-cluster migration)
// The remote host must be listed in `reindex.remote.whitelist` on the destination cluster
POST /_reindex
{
  "source": {
    "remote": {
      "host": "https://old-cluster:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "products",
    "size": 5000
  },
  "dest": {
    "index": "products"
  }
}

// Asynchronous reindex (large data volumes)
POST /_reindex?wait_for_completion=false&slices=auto
{
  "source": { "index": "big_index" },
  "dest": { "index": "big_index_v2" }
}
// Returns a task ID; check progress with GET /_tasks/<task_id>
```

### Split / Shrink

```json
// Split: increase the shard count (the target must be a multiple of the source count)
// Prerequisite: the source index is write-blocked (`index.blocks.write: true`)
POST /products/_split/products_split
{
  "settings": {
    "index.number_of_shards": 6
  }
}

// Shrink: reduce the shard count (the target must be a factor of the source count)
// Prerequisites: the index is read-only and a copy of every shard is on the same node
PUT /logs-000001/_settings
{
  "index.routing.allocation.require._name": "shrink_node",
  "index.blocks.write": true
}

POST /logs-000001/_shrink/logs-000001-shrunk
{
  "settings": {
    "index.number_of_shards": 1,
    "index.number_of_replicas": 1,
    "index.routing.allocation.require._name": null,
    "index.blocks.write": null
  }
}
```

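The multiple/factor rules for Split and Shrink can be checked before issuing the request. The helpers below are a hypothetical pre-flight check written for this illustration; they only encode the divisibility rules stated above and do not model other constraints such as `index.number_of_routing_shards`.

```python
def valid_split_target(source_shards: int, target_shards: int) -> bool:
    """Split: the target must be a larger multiple of the source shard count."""
    return target_shards > source_shards and target_shards % source_shards == 0


def valid_shrink_target(source_shards: int, target_shards: int) -> bool:
    """Shrink: the target must be a smaller factor of the source shard count."""
    return 0 < target_shards < source_shards and source_shards % target_shards == 0


def shrink_options(source_shards: int) -> list:
    """All shard counts an index with `source_shards` primaries could be shrunk to."""
    return [n for n in range(1, source_shards) if source_shards % n == 0]


print(valid_split_target(3, 6))    # the products -> products_split example
print(valid_shrink_target(2, 1))   # the logs-000001 example
print(shrink_options(8))
```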

---

## Performance tuning

### Sharding strategy

| Principle | Guidance |
|------|------|
| Shard size | 10–50GB (search workloads), 50–80GB (log workloads) |
| Total shard count | Shards per node ≤ 20 × JVM heap (GB) |
| Avoid over-sharding | 1000 shards of 1MB are far less efficient than a single 1GB shard |
| Time series | Use ILM + Rollover to split automatically by time |
| Search parallelism | Rough heuristic: shard count ≈ expected concurrent searches × response-time requirement; confirm with benchmarks |

### Routing

By default, a document is routed to a shard by (in simplified form) `hash(_id) % number_of_shards`. Custom routing places related documents on the same shard, which reduces shard fan-out at search time:

```json
// Route by tenant ID
PUT /multi_tenant_logs/_doc/1?routing=tenant_abc
{
  "tenant_id": "tenant_abc",
  "message": "User login successful",
  "timestamp": "2024-06-15T10:30:00Z"
}

// Pass routing at search time to query only the target shard
GET /multi_tenant_logs/_search?routing=tenant_abc
{
  "query": {
    "match": { "message": "login" }
  }
}

// Require routing in the mapping (set when the index is created)
PUT /multi_tenant_logs
{
  "mappings": {
    "_routing": { "required": true },
    "properties": {
      "tenant_id": { "type": "keyword" },
      "message": { "type": "text" }
    }
  }
}
```

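The effect of custom routing on fan-out can be illustrated with a small simulation. Note the assumptions: `zlib.crc32` stands in for the hash function purely for illustration (Elasticsearch uses a different hash internally), and the helper names are invented for this sketch.

```python
import zlib


def shard_for(routing_key: str, number_of_shards: int) -> int:
    """Simplified routing: hash the routing key and take it modulo the shard count."""
    return zlib.crc32(routing_key.encode("utf-8")) % number_of_shards


def shards_touched(routing_keys, number_of_shards: int) -> set:
    """Set of shards that hold documents written with the given routing keys."""
    return {shard_for(key, number_of_shards) for key in routing_keys}


SHARDS = 5
doc_ids = [f"doc-{i}" for i in range(1000)]

# Default routing: each document is routed by its own ID, so a tenant's data is spread out.
print(len(shards_touched(doc_ids, SHARDS)))

# Custom routing: every document of the tenant uses the same routing key.
print(len(shards_touched(["tenant_abc"] * len(doc_ids), SHARDS)))
```

With a shared routing key, a search that passes the same `routing` value only needs to visit one shard instead of all of them.
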
### Bulk API

Bulk operations are the key write-performance optimization in ES:

```json
POST /_bulk
{"index": {"_index": "products", "_id": "1"}}
{"name": "Product A", "price": 100}
{"index": {"_index": "products", "_id": "2"}}
{"name": "Product B", "price": 200}
{"update": {"_index": "products", "_id": "1"}}
{"doc": {"price": 150}}
{"delete": {"_index": "products", "_id": "3"}}
```

Bulk best practices:
- **Batch size**: 5–15MB per batch (size by bytes, not by document count); benchmark to find the right value
- **Concurrent writers**: write from multiple threads or processes, typically 3–8 concurrent writers
- **Disable replicas**: before a large load, temporarily set `number_of_replicas` to 0 and restore it afterwards
- **Disable refresh**: set `refresh_interval: -1` during the load and refresh manually when done
- **Error handling**: check the `errors` field of the bulk response and inspect each item; partially failed batches need their failed items retried

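Two of the practices above, sizing batches by bytes and checking per-item errors, can be sketched without any client library. The code below only builds NDJSON payloads and inspects a response dictionary shaped like a bulk response; the function names and the tiny `max_bytes` value in the usage are assumptions for illustration, and sending the payload is left out.

```python
import json


def bulk_batches(actions, max_bytes=5 * 1024 * 1024):
    """Yield NDJSON payloads no larger than max_bytes (a single oversized action is sent alone).

    `actions` is an iterable of (metadata, source) pairs; source is None for deletes.
    """
    lines, size = [], 0
    for meta, source in actions:
        chunk = json.dumps(meta) + "\n"
        if source is not None:
            chunk += json.dumps(source) + "\n"
        chunk_size = len(chunk.encode("utf-8"))
        if lines and size + chunk_size > max_bytes:
            yield "".join(lines)
            lines, size = [], 0
        lines.append(chunk)
        size += chunk_size
    if lines:
        yield "".join(lines)


def failed_items(response):
    """Return (position, action, error) for every failed item in a bulk response."""
    if not response.get("errors"):
        return []
    failures = []
    for position, item in enumerate(response["items"]):
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((position, action, result["error"]))
    return failures


actions = [({"index": {"_index": "products", "_id": str(i)}}, {"name": f"P{i}"}) for i in range(10)]
batches = list(bulk_batches(actions, max_bytes=200))
print(len(batches))
```

The positions returned by `failed_items` line up with the order of the submitted actions, so they can be used to pick the entries that need a retry.
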
### scroll vs search_after

| Method | Use case | Characteristics |
|------|----------|------|
| `from + size` | Shallow pagination (first 100 pages) | Simple, but `from + size ≤ 10000` (default) |
| `scroll` | Full export / traversal | Snapshot semantics, holds resources, not real-time |
| `search_after` | Deep pagination / real-time paging | Stateless, real-time, requires sort fields |
| PIT + search_after | Consistent deep pagination | Preferred approach; keeps a consistent view |

```json
// scroll (full export)
POST /products/_search?scroll=5m
{
  "size": 1000,
  "query": { "match_all": {} },
  "sort": ["_doc"]
}
// Subsequent requests
POST /_search/scroll
{
  "scroll": "5m",
  "scroll_id": "<scroll_id>"
}
// Clean up when done
DELETE /_search/scroll
{ "scroll_id": "<scroll_id>" }

// search_after (recommended for deep pagination)
// Note: sorting on `_id` needs fielddata on `_id`, which is disabled by default in 8.x;
// a keyword copy of the ID is a safer tiebreaker.
// First page
GET /products/_search
{
  "size": 20,
  "query": { "match": { "category": "electronics" } },
  "sort": [
    { "created_at": "desc" },
    { "_id": "asc" }
  ]
}
// Next page: pass the sort values of the last hit on the previous page
GET /products/_search
{
  "size": 20,
  "query": { "match": { "category": "electronics" } },
  "sort": [
    { "created_at": "desc" },
    { "_id": "asc" }
  ],
  "search_after": ["2024-06-15T10:30:00.000Z", "product_999"]
}

// PIT (Point in Time) + search_after (consistent deep pagination)
POST /products/_pit?keep_alive=5m
// Returns { "id": "<pit_id>" }

GET /_search
{
  "size": 20,
  "query": { "match": { "category": "electronics" } },
  "pit": {
    "id": "<pit_id>",
    "keep_alive": "5m"
  },
  "sort": [
    { "created_at": "desc" },
    { "_id": "asc" }
  ],
  "search_after": ["2024-06-15T10:30:00.000Z", "product_999"]
}
```

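The cursor logic behind `search_after` can be modelled on an in-memory list. The sketch below assumes the same sort as the examples above (`created_at` descending, then a unique ID ascending) and uses ISO date strings so that string comparison matches chronological order; the function name and sample data are invented for this illustration.

```python
def search_after_page(docs, size, after=None):
    """Return one page plus the cursor (sort values of the last hit) for the next request."""
    # Two stable sorts give: created_at descending, then id ascending as the tiebreaker.
    ordered = sorted(docs, key=lambda d: d["id"])
    ordered = sorted(ordered, key=lambda d: d["created_at"], reverse=True)

    def comes_after(doc):
        if after is None:
            return True
        created_at, doc_id = after
        if doc["created_at"] != created_at:
            return doc["created_at"] < created_at   # strictly older
        return doc["id"] > doc_id                   # same timestamp: larger id

    page = [d for d in ordered if comes_after(d)][:size]
    cursor = (page[-1]["created_at"], page[-1]["id"]) if page else None
    return page, cursor


docs = [{"id": f"p{i:02d}", "created_at": f"2024-06-{(i % 3) + 1:02d}"} for i in range(7)]

seen, cursor = [], None
while True:
    page, cursor = search_after_page(docs, size=3, after=cursor)
    if not page:
        break
    seen.extend(d["id"] for d in page)
print(seen)
```

The unique tiebreaker is what keeps hits with identical timestamps from being skipped or repeated across pages.
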
### Caching and warm-up

ES has several built-in cache layers:

| Cache | Description | Invalidation |
|----------|------|----------|
| Node Query Cache | Caches filter-context results (bitsets) | Segment merges / index updates |
| Shard Request Cache | Caches aggregation results and `size=0` searches | Invalidated on refresh |
| Fielddata Cache | Loaded when sorting or aggregating on `text` fields (avoid) | Manual clearing or memory pressure |
| OS Page Cache | Operating-system filesystem cache | LRU eviction when memory runs low |

```json
// Warm-up: enable the shard request cache in the index settings
// (`index.queries.cache.enabled` is a static setting, on by default, and cannot be changed on an open index)
PUT /products/_settings
{
  "index.requests.cache.enable": true
}

// Warm up key queries manually
GET /products/_search?request_cache=true
{
  "size": 0,
  "aggs": {
    "popular_categories": {
      "terms": { "field": "category.keyword", "size": 50 }
    }
  }
}

// Clear caches (use with care)
POST /products/_cache/clear
POST /_cache/clear?query=true&fielddata=true&request=true
```

### Mapping Optimization: Disabling Unneeded Features

```json
PUT /optimized_logs
{
  "mappings": {
    "properties": {
      "message": {
        "type": "text",
        "norms": false,
        "index_options": "freqs"
      },
      "trace_id": {
        "type": "keyword",
        "doc_values": false,
        "norms": false
      },
      "raw_body": {
        "type": "keyword",
        "index": false,
        "doc_values": false
      },
      "metadata": {
        "type": "object",
        "enabled": false
      }
    },
    "_source": {
      "excludes": ["raw_body"]
    }
  }
}
```

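Notes on each optimization:
- `norms: false`: disable when the field does not need scoring (saves roughly 1 byte per document per field)
- `doc_values: false`: disable on `keyword` fields that are never sorted or aggregated on
- `index: false`: for fields that are kept but never searched (such as raw logs)
- `enabled: false`: skips parsing and indexing entirely (the value lives only in `_source`)
- `index_options: freqs`: drops position data when it is not needed; phrase and proximity queries no longer work on that field
- `_source.excludes`: drops large fields from `_source` to reduce storage and network transfer. Excluded fields cannot be returned from `_source` and are lost on update or reindex; `raw_body` above is also unindexed and has no doc values, so it would need `store: true` to remain retrievable

---
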
## Cluster Operations

### Node Roles

| Role | Configuration | Responsibility |
|------|---------------|----------------|
| `master` | `node.roles: [master]` | Manages cluster state and decides shard allocation |
| `data` | `node.roles: [data]` | Stores data; runs searches and aggregations |
| `data_hot` | `node.roles: [data_hot]` | Stores hot data (SSD, high I/O) |
| `data_warm` | `node.roles: [data_warm]` | Stores warm data (HDD, moderate I/O) |
| `data_cold` | `node.roles: [data_cold]` | Stores cold data (high-capacity HDD) |
| `data_frozen` | `node.roles: [data_frozen]` | Searchable snapshots (S3 or shared storage) |
| `ingest` | `node.roles: [ingest]` | Pre-processes documents (ingest pipelines) |
| `ml` | `node.roles: [ml]` | Machine learning jobs |
| `coordinating` | `node.roles: []` | Routes requests and merges results (holds no data) |
| `transform` | `node.roles: [transform]` | Transform jobs |

Minimum production deployment:
- 3 dedicated master nodes (avoids split brain and keeps master election stable)
- N data nodes (scaled to storage volume and performance needs)
- 1–2 coordinating nodes (for high-concurrency search workloads)

### Capacity Planning

```
Storage:
  raw data size × (1 + replicas) × 1.1 (index overhead) × 1.15 (OS / scratch space)

Memory:
  JVM heap ≤ 50% of physical memory and ≤ 30GB (compressed-pointer boundary)
  Leave the remaining 50%+ to the OS page cache (the filesystem cache is critical)
  Example: 64GB physical memory -> 30GB heap + 34GB page cache

Shards:
  total shards = total data size / target shard size (30GB)
  shards per node ≤ 20 × JVM heap (GB)

Example:
  1TB of raw logs per day, 30-day retention, 1 replica
  storage = 1TB × 30 × 2 × 1.1 × 1.15 ≈ 76TB
  shards  = 76TB / 30GB ≈ 2534 shards
  nodes   = 2534 / (20 × 30) ≈ 4.2, rounded up to 5 data nodes (30GB heap each)
  (shard-count lower bound only: 76TB / 5 ≈ 15TB per node, so disk capacity may call for more nodes)
```

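The formulas above can be turned into a small calculator. This is a sketch under stated assumptions: 1TB is treated as 1000GB, the 1.1 and 1.15 factors and the 20-shards-per-GB-of-heap rule are taken from the text, and the function names are illustrative. Because it works from the unrounded 75.9TB rather than 76TB, it reports 2530 shards instead of the rounded 2534 shown above.

```python
import math


def plan_capacity(raw_gb_per_day: int, retention_days: int, replicas: int,
                  shard_size_gb: int = 30, heap_gb: int = 30) -> dict:
    """Apply the storage, shard, and node formulas from the text (integer GB math)."""
    # raw × days × (1 + replicas) × 1.1 × 1.15, kept in integers to avoid float drift
    storage_gb = raw_gb_per_day * retention_days * (1 + replicas) * 110 * 115 // 10_000
    shards = math.ceil(storage_gb / shard_size_gb)
    # Rule of thumb from the text: at most 20 shards per GB of heap on each node
    data_nodes = math.ceil(shards / (20 * heap_gb))
    return {"storage_gb": storage_gb, "shards": shards, "data_nodes": data_nodes}


def heap_size_gb(physical_memory_gb: int, cap_gb: int = 30) -> int:
    """Heap is the smaller of half the physical memory and the 30GB cap."""
    return min(physical_memory_gb // 2, cap_gb)


plan = plan_capacity(raw_gb_per_day=1000, retention_days=30, replicas=1)
print(plan)
print(heap_size_gb(64))
```

The node count here only reflects the shard-count rule; per-node disk capacity should be checked separately.
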
### Cluster Health

```json
// Cluster health status
GET /_cluster/health
// green:  all shards are assigned
// yellow: all primaries are assigned, some replicas are not
// red:    at least one primary is unassigned (risk of data loss)

// Explain why a shard is unassigned
GET /_cluster/allocation/explain
{
  "index": "products",
  "shard": 0,
  "primary": true
}

// Node statistics
GET /_nodes/stats/jvm,os,process,fs

// Shard distribution
GET /_cat/shards?v&s=index,shard

// Hot threads diagnostics
GET /_nodes/hot_threads

// Pending tasks
GET /_cluster/pending_tasks
```

### Hot-Warm-Cold Architecture

```yaml
# elasticsearch.yml (hot node)
node.roles: [data_hot, ingest]
node.attr.data: hot

# elasticsearch.yml (warm node)
node.roles: [data_warm]
node.attr.data: warm

# elasticsearch.yml (cold node)
node.roles: [data_cold]
node.attr.data: cold
```

Combine this with an ILM policy to move data between tiers automatically (see the ILM section above). What each tier is for:
- **Hot nodes**: high-performance SSDs; handle writes and frequent queries on the newest data
- **Warm nodes**: standard SSDs or HDDs; historical data queried at moderate frequency
- **Cold nodes**: high-capacity HDDs; archived data that is rarely queried
- **Frozen nodes**: searchable snapshots, with the data held in object storage (S3)

### Cross-Cluster Search

```json
// Configure remote clusters
PUT /_cluster/settings
{
  "persistent": {
    "cluster": {
      "remote": {
        "cluster_us": {
          "seeds": ["us-node1:9300", "us-node2:9300"],
          "transport.compress": true,
          "skip_unavailable": true
        },
        "cluster_eu": {
          "seeds": ["eu-node1:9300"],
          "skip_unavailable": true
        }
      }
    }
  }
}

// Cross-cluster search
GET /local_index,cluster_us:remote_index,cluster_eu:remote_index/_search
{
  "query": {
    "match": { "message": "critical error" }
  }
}
```

### Snapshot Backups

```json
// Register a snapshot repository (S3)
PUT /_snapshot/s3_backup
{
  "type": "s3",
  "settings": {
    "bucket": "es-snapshots",
    "region": "ap-east-1",
    "base_path": "production",
    "compress": true,
    "max_restore_bytes_per_sec": "200mb",
    "max_snapshot_bytes_per_sec": "200mb"
  }
}

// Create a snapshot
PUT /_snapshot/s3_backup/snapshot_20240615
{
  "indices": "products,orders-*",
  "ignore_unavailable": true,
  "include_global_state": false
}

// Check snapshot status
GET /_snapshot/s3_backup/snapshot_20240615/_status

// Restore a snapshot
POST /_snapshot/s3_backup/snapshot_20240615/_restore
{
  "indices": "products",
  "rename_pattern": "(.+)",
  "rename_replacement": "restored_$1"
}

// SLM (snapshot lifecycle management)
PUT /_slm/policy/nightly_backup
{
  "schedule": "0 0 2 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "s3_backup",
  "config": {
    "indices": "*",
    "ignore_unavailable": true,
    "include_global_state": false
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 7,
    "max_count": 60
  }
}
```

---

## Security

### X-Pack Security (enabled by default in 8.x)

On first startup, Elasticsearch 8.x automatically generates a CA certificate, node certificates, and a password for the `elastic` superuser.

```yaml
# elasticsearch.yml
xpack.security.enabled: true
xpack.security.enrollment.enabled: true

xpack.security.transport.ssl:
  enabled: true
  verification_mode: certificate
  keystore.path: certs/transport.p12
  truststore.path: certs/transport.p12

xpack.security.http.ssl:
  enabled: true
  keystore.path: certs/http.p12
```

### Role and User Management

```json
// Create a role
POST /_security/role/product_reader
{
  "cluster": ["monitor"],
  "indices": [
    {
      "names": ["products*"],
      "privileges": ["read", "view_index_metadata"],
      "field_security": {
        "grant": ["name", "category", "price", "description"]
      },
      "query": {
        "term": { "status": "active" }
      }
    }
  ],
  "run_as": []
}

// Create a user
POST /_security/user/product_app
{
  "password": "strong_password_here",
  "roles": ["product_reader"],
  "full_name": "Product App Service Account",
  "email": "product-app@example.com"
}
```

### API Key

```json
// Create an API key
POST /_security/api_key
{
  "name": "product-search-key",
  "expiration": "90d",
  "role_descriptors": {
    "product_search": {
      "cluster": [],
      "index": [
        {
          "names": ["products*"],
          "privileges": ["read"]
        }
      ]
    }
  },
  "metadata": {
    "application": "product-search-service",
    "team": "search-platform"
  }
}
// Returns { "id": "...", "api_key": "...", "encoded": "base64..." }
// Usage: curl -H "Authorization: ApiKey <encoded>"

// Revoke an API key
DELETE /_security/api_key
{
  "ids": ["<api_key_id>"]
}
```

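The `encoded` value in the response is the Base64 encoding of `id:api_key`, so a client that stored only the `id` and `api_key` fields can rebuild the header itself. A minimal sketch; the function name `api_key_header` and the sample values are illustrative.

```python
import base64


def api_key_header(key_id: str, api_key: str) -> dict:
    """Build the Authorization header from the `id` and `api_key` response fields."""
    token = base64.b64encode(f"{key_id}:{api_key}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"ApiKey {token}"}


# Placeholder values, not real credentials
print(api_key_header("abc", "def"))
```
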
### Field Level Security

Restrict the visible fields with `field_security` in the role definition. `except` must be a subset of the granted fields, so it is paired with a wildcard grant here; an explicit `grant` list on its own already hides every field it does not name.

```json
{
  "indices": [
    {
      "names": ["customers*"],
      "privileges": ["read"],
      "field_security": {
        "grant": ["*"],
        "except": ["ssn", "credit_card"]
      }
    }
  ]
}
```

### Document Level Security

Restrict the visible documents with `query` in the role definition:

```json
{
  "indices": [
    {
      "names": ["orders*"],
      "privileges": ["read"],
      "query": {
        "bool": {
          "filter": [
            { "term": { "region": "asia-pacific" } },
            { "range": { "created_at": { "gte": "now-1y" } } }
          ]
        }
      }
    }
  ]
}
```

Document level security and field level security can be combined for fine-grained data access control.

---

## Application Integration

### Python elasticsearch-py

```python
from elasticsearch import Elasticsearch, helpers
from datetime import datetime

# Connect (recommended approach for 8.x)
es = Elasticsearch(
    "https://localhost:9200",
    api_key="base64_encoded_key",
    ca_certs="/path/to/http_ca.crt",
    request_timeout=30,
    max_retries=3,
    retry_on_timeout=True
)

# Verify the connection
print(es.info())

# Index a single document
doc = {
    "name": "Elasticsearch Guide",
    "category": "book",
    "price": 59.99,
    "created_at": datetime.now()
}
resp = es.index(index="products", id="book_001", document=doc)
print(f"Indexed: {resp['result']}")  # created / updated

# Search
resp = es.search(
    index="products",
    query={
        "bool": {
            "must": [{"match": {"name": "Elasticsearch"}}],
            "filter": [{"range": {"price": {"lte": 100}}}]
        }
    },
    size=10,
    source=["name", "price"]
)
for hit in resp["hits"]["hits"]:
    print(f"{hit['_id']}: {hit['_source']}")

# Aggregation
resp = es.search(
    index="products",
    size=0,
    aggs={
        "by_category": {
            "terms": {"field": "category.keyword", "size": 20},
            "aggs": {
                "avg_price": {"avg": {"field": "price"}}
            }
        }
    }
)
for bucket in resp["aggregations"]["by_category"]["buckets"]:
    print(f"{bucket['key']}: {bucket['doc_count']} items, avg={bucket['avg_price']['value']:.2f}")

# Bulk indexing
def generate_actions():
    for i in range(10000):
        yield {
            "_index": "products",
            "_id": f"product_{i}",
            "_source": {
                "name": f"Product {i}",
                "price": round(10 + i * 0.1, 2),
                "category": f"cat_{i % 10}"
            }
        }

success, errors = helpers.bulk(
    es,
    generate_actions(),
    chunk_size=500,
    max_retries=3,
    raise_on_error=False
)
print(f"Bulk indexed: {success} success, {len(errors)} errors")

# Async version
from elasticsearch import AsyncElasticsearch
import asyncio

async def async_search():
    es_async = AsyncElasticsearch(
        "https://localhost:9200",
        api_key="base64_encoded_key",
        ca_certs="/path/to/http_ca.crt"
    )
    resp = await es_async.search(
        index="products",
        query={"match_all": {}},
        size=5
    )
    await es_async.close()
    return resp

asyncio.run(async_search())
```

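With `raise_on_error=False`, the `errors` list returned by `helpers.bulk` still has to be inspected. The sketch below separates failures worth retrying from permanent ones. It assumes each error item has the usual bulk-response shape (`{"index": {"_id": ..., "status": ..., "error": ...}}`); the helper name and the choice of retryable status codes are assumptions, not library behavior.

```python
from typing import Any, Dict, List, Tuple

# Assumption: treat throttling (429) and unavailability (503) as retryable
RETRYABLE_STATUSES = {429, 503}


def split_bulk_errors(errors: List[Dict[str, Any]]) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Return (document ids worth retrying, items that failed permanently)."""
    retry_ids: List[str] = []
    permanent: List[Dict[str, Any]] = []
    for item in errors:
        # Each item has a single key naming the action: index, create, update, or delete
        action, detail = next(iter(item.items()))
        if detail.get("status") in RETRYABLE_STATUSES:
            retry_ids.append(detail["_id"])
        else:
            permanent.append({"action": action, "id": detail.get("_id"),
                              "error": detail.get("error")})
    return retry_ids, permanent
```

Retryable ids can be fed back into a later bulk call with backoff, while permanent failures (for example mapping errors) are better logged or routed to a dead-letter store.
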
### Calling the REST API Directly

```bash
# Basic search
curl -X GET "https://localhost:9200/products/_search" \
  -H "Content-Type: application/json" \
  -H "Authorization: ApiKey <encoded_key>" \
  --cacert /path/to/http_ca.crt \
  -d '{
    "query": { "match": { "name": "手机" } },
    "size": 10
  }'

# Cluster health
curl -s "https://localhost:9200/_cluster/health?pretty" \
  -H "Authorization: ApiKey <encoded_key>" \
  --cacert /path/to/http_ca.crt

# Cat APIs (handy for operations)
curl -s "https://localhost:9200/_cat/indices?v&s=store.size:desc" \
  -H "Authorization: ApiKey <encoded_key>" \
  --cacert /path/to/http_ca.crt
```

### Logstash

```ruby
# logstash.conf
input {
  beats {
    port => 5044
  }
  kafka {
    bootstrap_servers => "kafka1:9092,kafka2:9092"
    topics => ["app-logs"]
    group_id => "logstash-consumers"
    codec => json
  }
}

filter {
  if [type] == "nginx" {
    grok {
      match => { "message" => "%{COMBINEDAPACHELOG}" }
    }
    date {
      match => ["timestamp", "dd/MMM/yyyy:HH:mm:ss Z"]
    }
    geoip {
      source => "clientip"
      target => "geoip"
    }
  }

  mutate {
    remove_field => ["agent", "ecs", "host"]
    rename => { "clientip" => "client_ip" }
  }
}

output {
  elasticsearch {
    hosts => ["https://es-node1:9200", "https://es-node2:9200"]
    # Note: when ILM is enabled, ilm_rollover_alias takes precedence over index
    index => "logs-%{[type]}-%{+YYYY.MM.dd}"
    api_key => "id:api_key"
    ssl_certificate_authorities => ["/path/to/http_ca.crt"]
    manage_template => true
    template_name => "logs"
    ilm_enabled => true
    ilm_rollover_alias => "logs-write"
    ilm_policy => "logs_policy"
  }
}
```

### Filebeat

```yaml
# filebeat.yml
filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /var/log/app/*.log
    multiline:
      pattern: '^\d{4}-\d{2}-\d{2}'
      negate: true
      match: after
    fields:
      app: my-service
      env: production

  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log
    processors:
      - add_kubernetes_metadata:
          host: ${NODE_NAME}
          matchers:
            - logs_path:
                logs_path: "/var/lib/docker/containers/"

processors:
  - drop_fields:
      fields: ["agent.ephemeral_id", "agent.hostname"]
  - add_host_metadata: ~

output.elasticsearch:
  hosts: ["https://es-node1:9200"]
  api_key: "id:api_key"
  ssl.certificate_authorities: ["/path/to/http_ca.crt"]
  index: "filebeat-%{+yyyy.MM.dd}"

setup.ilm.enabled: true
setup.ilm.rollover_alias: "filebeat"
setup.ilm.policy_name: "filebeat-policy"

monitoring.enabled: true
monitoring.elasticsearch:
  hosts: ["https://es-monitoring:9200"]
```

### Kibana

Kibana is the official visualization platform for Elasticsearch. Core features:

- **Discover**: search and browse logs, with KQL and Lucene query syntax
- **Dashboard**: visual dashboards with 30+ chart types
- **Lens**: drag-and-drop visualization builder
- **Dev Tools**: interactive REST API console (essential for development and debugging)
- **Index Management**: UI for managing indices, templates, and ILM policies
- **Security**: UI for managing users, roles, and API keys
- **Alerting**: condition-based alert rules (Watcher / Rules)
- **Canvas**: pixel-perfect reports and presentation boards

```yaml
# kibana.yml key settings
server.host: "0.0.0.0"
server.port: 5601
server.publicBaseUrl: "https://kibana.example.com"

elasticsearch.hosts: ["https://es-node1:9200"]
elasticsearch.serviceAccountToken: "<token>"
elasticsearch.ssl.certificateAuthorities: ["/path/to/http_ca.crt"]

xpack.encryptedSavedObjects.encryptionKey: "min-32-char-encryption-key-here!"
xpack.security.encryptionKey: "min-32-char-encryption-key-here!"
xpack.reporting.encryptionKey: "min-32-char-encryption-key-here!"
```

---

## Common Pitfalls

### 1. Mapping Explosion

**Problem**: With dynamic mapping enabled, large numbers of distinct field names are created automatically, which bloats cluster metadata and can exhaust memory.

**Typical scenario**: Indexing user-defined attributes or arbitrary JSON keys from logs as-is.

```json
// Anti-pattern: every user-defined attribute becomes its own field
{"user_attr_color": "red", "user_attr_size": "L", "user_attr_custom_12345": "value"}

// Better: use the nested or flattened type
{
  "user_attributes": [
    {"key": "color", "value": "red"},
    {"key": "size", "value": "L"}
  ]
}
```

**Defenses**:
- Set `dynamic: strict` or `dynamic: false`
- Configure `index.mapping.total_fields.limit` (default 1000)
- Store arbitrary JSON in a `flattened` field
- Use `path_match` / `unmatch` in dynamic templates for precise control

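The reshaping from the anti-pattern to the key/value layout can be done before indexing. A minimal sketch, assuming the attribute keys share a common prefix as in the example; the function name and the `user_attr_` prefix handling are illustrative.

```python
from typing import Any, Dict, List


def to_key_value_pairs(attrs: Dict[str, Any], prefix: str = "user_attr_") -> List[Dict[str, str]]:
    """Turn arbitrary attribute keys into a fixed two-field structure."""
    pairs = []
    for name, value in sorted(attrs.items()):
        # Strip the shared prefix so only the attribute name is stored as the key
        key = name[len(prefix):] if name.startswith(prefix) else name
        pairs.append({"key": key, "value": str(value)})
    return pairs


doc = {"user_attr_color": "red", "user_attr_size": "L", "user_attr_custom_12345": "value"}
print({"user_attributes": to_key_value_pairs(doc)})
```

However many distinct attributes users define, the mapping then only ever contains the `key` and `value` fields.
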
### 2. Deep Pagination

**Problem**: A request fails once `from + size` exceeds `max_result_window` (default 10000). Even with a higher limit, deep pagination consumes a lot of memory.

**Cause**: Elasticsearch fetches `from + size` records from every shard, and the coordinating node sorts `shards × (from + size)` records in memory.

```
Page 1000 at 20 hits per page: from=19980, size=20
A 5-shard index actually sorts: 5 × 20000 = 100,000 records
```

**Solutions**:
- Shallow pagination (< 100 pages): `from + size` is fine
- Deep pagination: use `search_after` + PIT (see the performance optimization section)
- Full export: `scroll` works but holds resources; Elasticsearch now recommends `search_after` + PIT over `scroll` for deep pagination
- Product design: steer users toward narrowing results with filters instead of paging indefinitely

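The arithmetic behind the deep-pagination cost can be checked with a few lines. This sketch just evaluates `shards × (from + size)` from the explanation above; the function name is illustrative.

```python
def coordinating_sort_cost(page: int, page_size: int, shards: int) -> dict:
    """Records the coordinating node must merge for a from + size request."""
    offset = (page - 1) * page_size   # the `from` parameter
    per_shard = offset + page_size    # each shard returns from + size hits
    return {"from": offset, "per_shard": per_shard, "total": shards * per_shard}


print(coordinating_sort_cost(page=1000, page_size=20, shards=5))
```
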
### 3. High-Cardinality Aggregations

**Problem**: A `terms` aggregation on a high-cardinality field (such as `user_id`, `ip`, or `url`) is extremely expensive in memory and CPU.

```json
// Dangerous: a terms aggregation over millions of user_id values
{
  "aggs": {
    "all_users": {
      "terms": { "field": "user_id", "size": 1000000 }
    }
  }
}
```

**Solutions**:
- Use the `cardinality` aggregation for approximate distinct counts (HyperLogLog++; accuracy is tunable with `precision_threshold`)
- Keep the `size` of `terms` aggregations in a reasonable range (usually < 1000)
- Use the `composite` aggregation to page through all buckets
- Reduce cardinality: normalize URLs with a `script`, or group by prefix
- Pre-aggregate: use Transforms to periodically roll fine-grained data up into summary indices

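Paging through every bucket with a `composite` aggregation follows the `after_key` returned by each response. A minimal sketch, assuming a client whose `search` method accepts `index`, `size`, and `aggs` keyword arguments as in elasticsearch-py 8.x; the helper name, the aggregation name `paged`, and the source name `value` are illustrative.

```python
from typing import Any, Dict, Iterator


def iter_composite_buckets(client: Any, index: str, field: str,
                           page_size: int = 1000) -> Iterator[Dict[str, Any]]:
    """Yield all buckets for `field`, one composite page at a time."""
    after_key = None
    while True:
        composite: Dict[str, Any] = {
            "size": page_size,
            "sources": [{"value": {"terms": {"field": field}}}],
        }
        if after_key is not None:
            # Resume after the last bucket of the previous page
            composite["after"] = after_key
        resp = client.search(index=index, size=0,
                             aggs={"paged": {"composite": composite}})
        agg = resp["aggregations"]["paged"]
        yield from agg["buckets"]
        after_key = agg.get("after_key")
        # Stop when a page is empty or the response carries no after_key
        if after_key is None or not agg["buckets"]:
            break
```

Each request holds at most `page_size` buckets in memory, which is the point of preferring this over a single `terms` aggregation with a very large `size`.
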
### 4. Over-Sharding

**Problem**: Too many small shards bloat cluster metadata, raise search latency, and put more load on the master node.

**Typical scenario**: Creating a log index per day with 5 shards each (the default before 7.0; since 7.0 the default is 1 primary shard), even though the log volume is small.

```
365 days × 5 shards × 3 index types = 5475 shards (the vast majority < 100MB)
```

**Solutions**:
- Use 1 shard for low-volume indices
- Use ILM + rollover to roll indices over automatically by size or age
- Periodically shrink old indices to reduce the shard count
- Use data streams instead of hand-managed time-series indices

### 5. GC Pressure (Garbage Collection Pressure)

**Problem**: An undersized or misused JVM heap causes frequent GC (especially old-generation, stop-the-world collections); the cluster slows down and nodes can even drop out of it.

**Common causes**:
- JVM heap set too large (> 30GB, which loses compressed pointers)
- Heavy fielddata loading (sorting or aggregating on `text` fields)
- High-cardinality `terms` aggregations
- Oversized bulk requests
- Too many pending tasks and in-flight requests

**Solutions**:
- Keep the JVM heap ≤ 50% of physical memory and ≤ 30GB
- Avoid aggregating or sorting on `text` fields (use a `keyword` sub-field with `doc_values`)
- Monitor `jvm.gc.collectors.old.collection_time_in_millis`
- Configure circuit breakers:

```json
PUT /_cluster/settings
{
  "persistent": {
    "indices.breaker.total.limit": "70%",
    "indices.breaker.fielddata.limit": "40%",
    "indices.breaker.request.limit": "60%",
    "network.breaker.inflight_requests.limit": "100%"
  }
}
```

- Use G1GC (the default in ES 8.x) and leave enough off-heap memory

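### 6. Other Common Problems

| Problem | Cause | Solution |
|---------|-------|----------|
| Yellow status | Replicas cannot be allocated (too few nodes) | Add nodes or reduce the replica count |
| Writes rejected | Thread pool queue is full | Limit write concurrency; enlarge the queue (with caution) |
| Search timeouts | Heavy queries or large data volume | Optimize queries, add shards, use `terminate_after` |
| Disk watermarks | Disk usage above 85% / 90% / 95% | Delete old indices or add disk capacity |
| Unassigned shards | Node failure, full disk, or conflicting allocation rules | Diagnose with `_cluster/allocation/explain` |
| Upgrade compatibility | Upgrading across major versions | Do rolling upgrades one major version at a time (7→8, moving to 7.17 first); do not skip major versions |
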
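For alerting on the disk watermarks in the table, usage can be mapped to the threshold it has crossed. A small sketch assuming the default 85% / 90% / 95% watermarks quoted above; the function name and return labels are illustrative, and the comments summarize Elasticsearch's documented default behavior at each level.

```python
def disk_watermark_state(used_percent: float) -> str:
    """Map disk usage to the default watermark it has crossed (85/90/95%)."""
    if used_percent >= 95:
        return "flood_stage"  # affected indices get a read-only (allow delete) block
    if used_percent >= 90:
        return "high"         # Elasticsearch tries to relocate shards off the node
    if used_percent >= 85:
        return "low"          # no further shards are allocated to the node
    return "ok"


for pct in (80, 85, 92, 96):
    print(pct, disk_watermark_state(pct))
```
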
---

## Agent Checklist

### Mapping and Index Design
- [ ] Production indices set `dynamic: strict` or `dynamic: false`
- [ ] Field types are defined explicitly rather than inferred by dynamic mapping
- [ ] `text` fields have a suitable `analyzer` and `search_analyzer`
- [ ] Fields that are aggregated or sorted on use the `keyword` type or a `keyword` sub-field
- [ ] `_source` excludes large fields that never need to be returned
- [ ] `norms` is disabled on fields that do not need scoring
- [ ] `doc_values` is disabled on fields that are never sorted or aggregated on
- [ ] The need to preserve relationships inside nested objects has been assessed (`object` vs `nested`)
- [ ] `index.mapping.total_fields.limit` is tuned to actual needs

### Query Optimization
- [ ] Exact-match conditions are placed in `filter` context (cacheable, no scoring cost)
- [ ] `term` queries are not used on `text` fields
- [ ] The `multi_match` type is chosen per use case (`best_fields` / `cross_fields`)
- [ ] Deep pagination uses `search_after` + PIT instead of `from + size`
- [ ] The `size` of `terms` aggregations on high-cardinality fields is kept in a reasonable range
- [ ] Scoring formulas use `function_score` rather than `script_score` (better performance)
- [ ] Frequently run aggregation queries use `request_cache=true`

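### Write Optimization
- [ ] Bulk batches are sized at 5–15MB (rather than a fixed document count)
- [ ] Before large bulk loads, `refresh_interval` is temporarily disabled and `number_of_replicas` reduced
- [ ] The `errors` field of bulk responses is checked, with retry logic
- [ ] `_routing` sends related documents to the same shard (multi-tenant scenarios)
- [ ] Document `_id` values are business keys or auto-generated (avoid random UUIDs, which hurt write performance)
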
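Sizing bulk batches by bytes rather than by document count can be sketched as a small generator. This is an approximation under stated assumptions: it measures only the JSON-serialized document plus a newline and ignores the per-action metadata line of the real bulk body; the function name and the 10MB default are illustrative.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def chunk_by_bytes(docs: Iterable[Dict[str, Any]],
                   max_bytes: int = 10 * 1024 * 1024) -> Iterator[List[Dict[str, Any]]]:
    """Group documents into batches whose serialized size stays under max_bytes."""
    batch: List[Dict[str, Any]] = []
    batch_bytes = 0
    for doc in docs:
        size = len(json.dumps(doc).encode("utf-8")) + 1  # +1 for the NDJSON newline
        # Flush the current batch before it would exceed the limit
        if batch and batch_bytes + size > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(doc)
        batch_bytes += size
    if batch:
        yield batch
```

A document larger than `max_bytes` is still emitted, alone in its own batch. elasticsearch-py's `helpers.bulk` also accepts a `max_chunk_bytes` argument that applies a similar limit inside the helper.
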
### Cluster Operations
- [ ] Master nodes are dedicated (no data or ingest roles)
- [ ] JVM heap ≤ 50% of physical memory and ≤ 30GB
- [ ] A Hot-Warm-Cold architecture is in place (for data volumes > 1TB)
- [ ] ILM policies are configured and running normally
- [ ] Snapshot backups are configured (SLM or scheduled jobs)
- [ ] Cluster health monitoring and alerting are in place (`_cluster/health`)
- [ ] Slow query logging is enabled (`index.search.slowlog.threshold`)
- [ ] Circuit breaker parameters are set sensibly
- [ ] Disk watermark alerts are configured (85% / 90% / 95%)

### Security
- [ ] X-Pack Security is enabled (on by default in 8.x)
- [ ] TLS encryption is configured (transport layer and HTTP layer)
- [ ] Least privilege is followed: applications use their own roles, not the `elastic` superuser
- [ ] API keys have a reasonable expiration and scope
- [ ] Field level and/or document level security is configured where needed
- [ ] The `elastic` superuser password has been changed and is stored securely

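### Chinese Search
- [ ] The IK analysis plugin is installed and its version matches Elasticsearch
- [ ] `ik_max_word` is used at index time and `ik_smart` at search time
- [ ] Custom dictionaries are configured (industry terms, brand names, new words, and so on)
- [ ] The synonym file is configured in the `search_analyzer`
- [ ] The stop-word list is tailored to the business scenario
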
### Monitoring and Observability
- [ ] Key metrics are fed into a monitoring system (Prometheus / Datadog / built-in Monitoring)
- [ ] JVM GC time and frequency have alert thresholds
- [ ] P99 search latency has an SLO and alerts
- [ ] A drop in indexing rate triggers an alert
- [ ] Unassigned shards trigger an immediate alert
- [ ] Kibana Dev Tools is available for day-to-day operations and debugging

---

**Knowledge ID**: `elasticsearch-complete`
**Domain**: data
**Type**: standards
**Difficulty**: intermediate-advanced
**Quality score**: 95
**Maintainer**: data-team@umadev.com
**Last updated**: 2026-03-28