blockmine 1.24.0 → 1.25.0
- package/CHANGELOG.md +32 -0
- package/README.en.md +427 -0
- package/README.md +40 -0
- package/backend/cli.js +1 -1
- package/backend/src/ai/plugin-assistant-system-prompt.md +664 -5
- package/backend/src/api/routes/bots.js +13 -0
- package/backend/src/api/routes/servers.js +14 -2
- package/backend/src/core/BotProcess.js +98 -2
- package/backend/src/core/PluginLoader.js +83 -3
- package/backend/src/core/PluginManager.js +75 -5
- package/backend/src/core/services/BotLifecycleService.js +186 -2
- package/backend/src/server.js +11 -1
- package/frontend/dist/assets/browser-ponyfill-DN7pwmHT.js +2 -0
- package/frontend/dist/assets/index-LSy71uwm.js +11261 -0
- package/frontend/dist/assets/index-SfhKxI4-.css +32 -0
- package/frontend/dist/flags/en.svg +32 -0
- package/frontend/dist/flags/ru.svg +5 -0
- package/frontend/dist/index.html +2 -2
- package/frontend/dist/locales/en/admin.json +100 -0
- package/frontend/dist/locales/en/api-keys.json +58 -0
- package/frontend/dist/locales/en/bots.json +110 -0
- package/frontend/dist/locales/en/common.json +47 -0
- package/frontend/dist/locales/en/configuration.json +22 -0
- package/frontend/dist/locales/en/console.json +10 -0
- package/frontend/dist/locales/en/dashboard.json +85 -0
- package/frontend/dist/locales/en/dialogs.json +70 -0
- package/frontend/dist/locales/en/event-graphs.json +50 -0
- package/frontend/dist/locales/en/graph-store.json +70 -0
- package/frontend/dist/locales/en/login.json +34 -0
- package/frontend/dist/locales/en/management.json +114 -0
- package/frontend/dist/locales/en/minecraft-viewer.json +27 -0
- package/frontend/dist/locales/en/nodes.json +1077 -0
- package/frontend/dist/locales/en/permissions.json +50 -0
- package/frontend/dist/locales/en/plugin-detail.json +49 -0
- package/frontend/dist/locales/en/plugins.json +110 -0
- package/frontend/dist/locales/en/proxies.json +81 -0
- package/frontend/dist/locales/en/servers.json +39 -0
- package/frontend/dist/locales/en/setup.json +17 -0
- package/frontend/dist/locales/en/sidebar.json +27 -0
- package/frontend/dist/locales/en/tasks.json +62 -0
- package/frontend/dist/locales/en/visual-editor.json +219 -0
- package/frontend/dist/locales/en/websocket.json +86 -0
- package/frontend/dist/locales/ru/admin.json +100 -0
- package/frontend/dist/locales/ru/api-keys.json +58 -0
- package/frontend/dist/locales/ru/bots.json +110 -0
- package/frontend/dist/locales/ru/common.json +49 -0
- package/frontend/dist/locales/ru/configuration.json +22 -0
- package/frontend/dist/locales/ru/console.json +10 -0
- package/frontend/dist/locales/ru/dashboard.json +85 -0
- package/frontend/dist/locales/ru/dialogs.json +70 -0
- package/frontend/dist/locales/ru/event-graphs.json +50 -0
- package/frontend/dist/locales/ru/graph-store.json +70 -0
- package/frontend/dist/locales/ru/login.json +34 -0
- package/frontend/dist/locales/ru/management.json +114 -0
- package/frontend/dist/locales/ru/minecraft-viewer.json +27 -0
- package/frontend/dist/locales/ru/nodes.json +1077 -0
- package/frontend/dist/locales/ru/permissions.json +50 -0
- package/frontend/dist/locales/ru/plugin-detail.json +49 -0
- package/frontend/dist/locales/ru/plugins.json +110 -0
- package/frontend/dist/locales/ru/proxies.json +81 -0
- package/frontend/dist/locales/ru/servers.json +39 -0
- package/frontend/dist/locales/ru/setup.json +17 -0
- package/frontend/dist/locales/ru/sidebar.json +27 -0
- package/frontend/dist/locales/ru/tasks.json +62 -0
- package/frontend/dist/locales/ru/visual-editor.json +221 -0
- package/frontend/dist/locales/ru/websocket.json +86 -0
- package/frontend/dist/monacoeditorwork/css.worker.bundle.js +7 -7
- package/frontend/dist/monacoeditorwork/html.worker.bundle.js +7 -7
- package/frontend/dist/monacoeditorwork/json.worker.bundle.js +7 -7
- package/frontend/dist/monacoeditorwork/ts.worker.bundle.js +3 -3
- package/frontend/package.json +4 -0
- package/package.json +1 -1
- package/screen/3dviewer.png +0 -0
- package/screen/console.png +0 -0
- package/screen/dashboard.png +0 -0
- package/screen/graph_collabe.png +0 -0
- package/screen/graph_live_debug.png +0 -0
- package/screen/language_selector.png +0 -0
- package/screen/management_command.png +0 -0
- package/screen/node_debug_trace.png +0 -0
- package/screen/plugin_обзор.png +0 -0
- package/screen/websocket.png +0 -0
- package/screen/настройки_отдельных_команд_кажду_команлду_можно_настраивать.png +0 -0
- package/screen/планировщик_можно_задавать_действия_по_времени.png +0 -0
- package/.claude/agents/README.md +0 -469
- package/.claude/agents/auth-route-debugger.md +0 -118
- package/.claude/agents/auth-route-tester.md +0 -93
- package/.claude/agents/auto-error-resolver.md +0 -97
- package/.claude/agents/build-optimizer.md +0 -236
- package/.claude/agents/code-architect.md +0 -34
- package/.claude/agents/code-architecture-reviewer.md +0 -83
- package/.claude/agents/code-explorer.md +0 -51
- package/.claude/agents/code-refactor-master.md +0 -94
- package/.claude/agents/code-reviewer.md +0 -46
- package/.claude/agents/cost-optimizer.md +0 -134
- package/.claude/agents/deployment-orchestrator.md +0 -113
- package/.claude/agents/documentation-architect.md +0 -82
- package/.claude/agents/frontend-error-fixer.md +0 -77
- package/.claude/agents/iac-code-generator.md +0 -71
- package/.claude/agents/incident-responder.md +0 -346
- package/.claude/agents/infrastructure-architect.md +0 -31
- package/.claude/agents/kubernetes-specialist.md +0 -56
- package/.claude/agents/migration-planner.md +0 -181
- package/.claude/agents/network-architect.md +0 -196
- package/.claude/agents/plan-reviewer.md +0 -52
- package/.claude/agents/refactor-planner.md +0 -63
- package/.claude/agents/security-scanner.md +0 -102
- package/.claude/agents/web-research-specialist.md +0 -78
- package/.claude/commands/cost-analysis.md +0 -315
- package/.claude/commands/dev-docs-update.md +0 -55
- package/.claude/commands/dev-docs.md +0 -51
- package/.claude/commands/feature-dev.md +0 -125
- package/.claude/commands/incident-debug.md +0 -247
- package/.claude/commands/infra-plan.md +0 -81
- package/.claude/commands/migration-plan.md +0 -478
- package/.claude/commands/route-research-for-testing.md +0 -37
- package/.claude/commands/security-review.md +0 -66
- package/.claude/hooks/CONFIG.md +0 -448
- package/.claude/hooks/README.md +0 -163
- package/.claude/hooks/SKILL_ACTIVATION_COMPLETE.md +0 -226
- package/.claude/hooks/WINDOWS_HOOKS_README.md +0 -151
- package/.claude/hooks/add-skill-activation-banners.ts +0 -132
- package/.claude/hooks/comprehensive-skill-test.ts +0 -1315
- package/.claude/hooks/error-handling-reminder.sh +0 -12
- package/.claude/hooks/error-handling-reminder.ts +0 -222
- package/.claude/hooks/k8s-manifest-validator.sh +0 -56
- package/.claude/hooks/package-lock.json +0 -556
- package/.claude/hooks/package.json +0 -16
- package/.claude/hooks/post-tool-use-tracker.ps1 +0 -174
- package/.claude/hooks/post-tool-use-tracker.sh +0 -183
- package/.claude/hooks/security-policy-check.sh +0 -247
- package/.claude/hooks/skill-activation-prompt.ps1 +0 -10
- package/.claude/hooks/skill-activation-prompt.sh +0 -10
- package/.claude/hooks/skill-activation-prompt.ts +0 -141
- package/.claude/hooks/stop-build-check-enhanced.sh +0 -130
- package/.claude/hooks/terraform-validator.sh +0 -53
- package/.claude/hooks/test-input.json +0 -7
- package/.claude/hooks/test-skill-activation.ts +0 -427
- package/.claude/hooks/trigger-build-resolver.sh +0 -79
- package/.claude/hooks/tsc-check.sh +0 -173
- package/.claude/hooks/tsconfig.json +0 -19
- package/.claude/settings.json +0 -59
- package/.claude/settings.local.json +0 -67
- package/.claude/skills/README.md +0 -507
- package/.claude/skills/api-engineering/SKILL.md +0 -63
- package/.claude/skills/api-engineering/resources/api-versioning.md +0 -88
- package/.claude/skills/api-engineering/resources/graphql-patterns.md +0 -106
- package/.claude/skills/api-engineering/resources/rate-limiting.md +0 -118
- package/.claude/skills/api-engineering/resources/rest-api-design.md +0 -105
- package/.claude/skills/backend-dev-guidelines/SKILL.md +0 -306
- package/.claude/skills/backend-dev-guidelines/resources/architecture-overview.md +0 -451
- package/.claude/skills/backend-dev-guidelines/resources/async-and-errors.md +0 -307
- package/.claude/skills/backend-dev-guidelines/resources/complete-examples.md +0 -638
- package/.claude/skills/backend-dev-guidelines/resources/configuration.md +0 -275
- package/.claude/skills/backend-dev-guidelines/resources/database-patterns.md +0 -224
- package/.claude/skills/backend-dev-guidelines/resources/middleware-guide.md +0 -213
- package/.claude/skills/backend-dev-guidelines/resources/routing-and-controllers.md +0 -756
- package/.claude/skills/backend-dev-guidelines/resources/sentry-and-monitoring.md +0 -336
- package/.claude/skills/backend-dev-guidelines/resources/services-and-repositories.md +0 -789
- package/.claude/skills/backend-dev-guidelines/resources/testing-guide.md +0 -235
- package/.claude/skills/backend-dev-guidelines/resources/validation-patterns.md +0 -754
- package/.claude/skills/budget-and-cost-management/SKILL.md +0 -850
- package/.claude/skills/build-engineering/SKILL.md +0 -431
- package/.claude/skills/build-engineering/resources/artifact-repositories.md +0 -72
- package/.claude/skills/build-engineering/resources/build-caching.md +0 -96
- package/.claude/skills/build-engineering/resources/build-pipelines.md +0 -105
- package/.claude/skills/build-engineering/resources/build-security.md +0 -95
- package/.claude/skills/build-engineering/resources/build-systems.md +0 -389
- package/.claude/skills/build-engineering/resources/compilation-optimization.md +0 -201
- package/.claude/skills/build-engineering/resources/dependency-management.md +0 -73
- package/.claude/skills/build-engineering/resources/monorepo-builds.md +0 -110
- package/.claude/skills/build-engineering/resources/performance-optimization.md +0 -113
- package/.claude/skills/build-engineering/resources/reproducible-builds.md +0 -82
- package/.claude/skills/cloud-engineering/SKILL.md +0 -675
- package/.claude/skills/cloud-engineering/resources/aws-patterns.md +0 -742
- package/.claude/skills/cloud-engineering/resources/azure-patterns.md +0 -714
- package/.claude/skills/cloud-engineering/resources/cleared-cloud-environments.md +0 -987
- package/.claude/skills/cloud-engineering/resources/cloud-cost-optimization.md +0 -757
- package/.claude/skills/cloud-engineering/resources/cloud-networking.md +0 -1058
- package/.claude/skills/cloud-engineering/resources/cloud-security-tools.md +0 -1530
- package/.claude/skills/cloud-engineering/resources/cloud-security.md +0 -990
- package/.claude/skills/cloud-engineering/resources/gcp-patterns.md +0 -758
- package/.claude/skills/cloud-engineering/resources/migration-strategies.md +0 -820
- package/.claude/skills/cloud-engineering/resources/multi-cloud-strategies.md +0 -670
- package/.claude/skills/cloud-engineering/resources/oci-patterns.md +0 -1198
- package/.claude/skills/cloud-engineering/resources/serverless-patterns.md +0 -795
- package/.claude/skills/cloud-engineering/resources/well-architected-frameworks.md +0 -966
- package/.claude/skills/cybersecurity/SKILL.md +0 -409
- package/.claude/skills/cybersecurity/resources/security-architecture.md +0 -266
- package/.claude/skills/database-engineering/SKILL.md +0 -61
- package/.claude/skills/database-engineering/resources/backup-and-recovery.md +0 -72
- package/.claude/skills/database-engineering/resources/database-replication.md +0 -63
- package/.claude/skills/database-engineering/resources/postgresql-fundamentals.md +0 -70
- package/.claude/skills/database-engineering/resources/query-optimization.md +0 -68
- package/.claude/skills/devsecops/SKILL.md +0 -374
- package/.claude/skills/devsecops/resources/ci-cd-security.md +0 -204
- package/.claude/skills/devsecops/resources/compliance-automation.md +0 -530
- package/.claude/skills/devsecops/resources/compliance-frameworks.md +0 -2322
- package/.claude/skills/devsecops/resources/container-security.md +0 -915
- package/.claude/skills/devsecops/resources/cspm-integration.md +0 -1440
- package/.claude/skills/devsecops/resources/policy-enforcement.md +0 -619
- package/.claude/skills/devsecops/resources/secrets-management.md +0 -755
- package/.claude/skills/devsecops/resources/security-monitoring.md +0 -146
- package/.claude/skills/devsecops/resources/security-scanning.md +0 -887
- package/.claude/skills/devsecops/resources/security-testing.md +0 -203
- package/.claude/skills/devsecops/resources/supply-chain-security.md +0 -518
- package/.claude/skills/devsecops/resources/vulnerability-management.md +0 -481
- package/.claude/skills/devsecops/resources/zero-trust-architecture.md +0 -177
- package/.claude/skills/documentation-as-code/SKILL.md +0 -323
- package/.claude/skills/documentation-as-code/resources/api-documentation.md +0 -90
- package/.claude/skills/documentation-as-code/resources/changelog-management.md +0 -79
- package/.claude/skills/documentation-as-code/resources/diagram-generation.md +0 -44
- package/.claude/skills/documentation-as-code/resources/docs-as-code-workflow.md +0 -99
- package/.claude/skills/documentation-as-code/resources/documentation-automation.md +0 -68
- package/.claude/skills/documentation-as-code/resources/documentation-sites.md +0 -79
- package/.claude/skills/documentation-as-code/resources/markdown-best-practices.md +0 -162
- package/.claude/skills/documentation-as-code/resources/openapi-specification.md +0 -77
- package/.claude/skills/documentation-as-code/resources/readme-engineering.md +0 -60
- package/.claude/skills/documentation-as-code/resources/technical-writing-guide.md +0 -202
- package/.claude/skills/engineering-management/SKILL.md +0 -356
- package/.claude/skills/engineering-management/resources/career-ladders.md +0 -609
- package/.claude/skills/engineering-management/resources/hiring-and-assessment.md +0 -555
- package/.claude/skills/engineering-management/resources/one-on-one-guides.md +0 -609
- package/.claude/skills/engineering-management/resources/resource-planning.md +0 -557
- package/.claude/skills/engineering-management/resources/team-organization-patterns.md +0 -491
- package/.claude/skills/engineering-management/resources/technical-interviews.md +0 -474
- package/.claude/skills/engineering-operations-management/SKILL.md +0 -817
- package/.claude/skills/error-tracking/SKILL.md +0 -379
- package/.claude/skills/frontend-design/SKILL.md +0 -42
- package/.claude/skills/frontend-dev-guidelines/SKILL.md +0 -403
- package/.claude/skills/frontend-dev-guidelines/resources/common-patterns.md +0 -331
- package/.claude/skills/frontend-dev-guidelines/resources/complete-examples.md +0 -872
- package/.claude/skills/frontend-dev-guidelines/resources/component-patterns.md +0 -502
- package/.claude/skills/frontend-dev-guidelines/resources/data-fetching.md +0 -767
- package/.claude/skills/frontend-dev-guidelines/resources/file-organization.md +0 -502
- package/.claude/skills/frontend-dev-guidelines/resources/loading-and-error-states.md +0 -501
- package/.claude/skills/frontend-dev-guidelines/resources/performance.md +0 -406
- package/.claude/skills/frontend-dev-guidelines/resources/routing-guide.md +0 -364
- package/.claude/skills/frontend-dev-guidelines/resources/styling-guide.md +0 -428
- package/.claude/skills/frontend-dev-guidelines/resources/typescript-standards.md +0 -418
- package/.claude/skills/general-it-engineering/SKILL.md +0 -393
- package/.claude/skills/general-it-engineering/resources/asset-management.md +0 -712
- package/.claude/skills/general-it-engineering/resources/automation-orchestration.md +0 -817
- package/.claude/skills/general-it-engineering/resources/business-continuity.md +0 -786
- package/.claude/skills/general-it-engineering/resources/change-management.md +0 -715
- package/.claude/skills/general-it-engineering/resources/enterprise-monitoring.md +0 -729
- package/.claude/skills/general-it-engineering/resources/help-desk-operations.md +0 -738
- package/.claude/skills/general-it-engineering/resources/incident-service-management.md +0 -834
- package/.claude/skills/general-it-engineering/resources/it-governance.md +0 -753
- package/.claude/skills/general-it-engineering/resources/itil-framework.md +0 -503
- package/.claude/skills/general-it-engineering/resources/service-management.md +0 -669
- package/.claude/skills/infrastructure-architecture/SKILL.md +0 -328
- package/.claude/skills/infrastructure-architecture/resources/architecture-decision-records.md +0 -505
- package/.claude/skills/infrastructure-architecture/resources/architecture-patterns.md +0 -528
- package/.claude/skills/infrastructure-architecture/resources/capacity-planning.md +0 -453
- package/.claude/skills/infrastructure-architecture/resources/cleared-environment-architecture.md +0 -773
- package/.claude/skills/infrastructure-architecture/resources/cost-architecture.md +0 -499
- package/.claude/skills/infrastructure-architecture/resources/data-architecture.md +0 -501
- package/.claude/skills/infrastructure-architecture/resources/disaster-recovery.md +0 -535
- package/.claude/skills/infrastructure-architecture/resources/migration-architecture.md +0 -512
- package/.claude/skills/infrastructure-architecture/resources/multi-region-design.md +0 -608
- package/.claude/skills/infrastructure-architecture/resources/reference-architectures.md +0 -562
- package/.claude/skills/infrastructure-architecture/resources/security-architecture.md +0 -538
- package/.claude/skills/infrastructure-architecture/resources/system-design-principles.md +0 -489
- package/.claude/skills/infrastructure-architecture/resources/workload-classification.md +0 -1000
- package/.claude/skills/infrastructure-strategy/SKILL.md +0 -924
- package/.claude/skills/network-engineering/SKILL.md +0 -385
- package/.claude/skills/network-engineering/resources/dns-management.md +0 -738
- package/.claude/skills/network-engineering/resources/load-balancing.md +0 -820
- package/.claude/skills/network-engineering/resources/network-architecture.md +0 -546
- package/.claude/skills/network-engineering/resources/network-security.md +0 -921
- package/.claude/skills/network-engineering/resources/network-troubleshooting.md +0 -749
- package/.claude/skills/network-engineering/resources/routing-switching.md +0 -373
- package/.claude/skills/network-engineering/resources/sdn-networking.md +0 -695
- package/.claude/skills/network-engineering/resources/service-mesh-networking.md +0 -777
- package/.claude/skills/network-engineering/resources/tcp-ip-protocols.md +0 -444
- package/.claude/skills/network-engineering/resources/vpn-connectivity.md +0 -672
- package/.claude/skills/node-development/SKILL.md +0 -317
- package/.claude/skills/observability-engineering/SKILL.md +0 -101
- package/.claude/skills/observability-engineering/resources/apm-tools.md +0 -97
- package/.claude/skills/observability-engineering/resources/correlation-strategies.md +0 -87
- package/.claude/skills/observability-engineering/resources/distributed-tracing.md +0 -98
- package/.claude/skills/observability-engineering/resources/logs-aggregation.md +0 -118
- package/.claude/skills/observability-engineering/resources/observability-cost-optimization.md +0 -141
- package/.claude/skills/observability-engineering/resources/opentelemetry.md +0 -110
- package/.claude/skills/platform-engineering/SKILL.md +0 -555
- package/.claude/skills/platform-engineering/resources/architecture-overview.md +0 -600
- package/.claude/skills/platform-engineering/resources/container-orchestration.md +0 -916
- package/.claude/skills/platform-engineering/resources/cost-optimization.md +0 -634
- package/.claude/skills/platform-engineering/resources/developer-platforms.md +0 -670
- package/.claude/skills/platform-engineering/resources/gitops-automation.md +0 -650
- package/.claude/skills/platform-engineering/resources/infrastructure-as-code.md +0 -778
- package/.claude/skills/platform-engineering/resources/infrastructure-standards.md +0 -708
- package/.claude/skills/platform-engineering/resources/multi-tenancy.md +0 -602
- package/.claude/skills/platform-engineering/resources/platform-security.md +0 -711
- package/.claude/skills/platform-engineering/resources/resource-management.md +0 -592
- package/.claude/skills/platform-engineering/resources/service-mesh.md +0 -628
- package/.claude/skills/release-engineering/SKILL.md +0 -393
- package/.claude/skills/release-engineering/resources/artifact-management.md +0 -108
- package/.claude/skills/release-engineering/resources/build-optimization.md +0 -84
- package/.claude/skills/release-engineering/resources/ci-cd-pipelines.md +0 -411
- package/.claude/skills/release-engineering/resources/deployment-strategies.md +0 -197
- package/.claude/skills/release-engineering/resources/pipeline-security.md +0 -62
- package/.claude/skills/release-engineering/resources/progressive-delivery.md +0 -83
- package/.claude/skills/release-engineering/resources/release-automation.md +0 -68
- package/.claude/skills/release-engineering/resources/release-orchestration.md +0 -77
- package/.claude/skills/release-engineering/resources/rollback-strategies.md +0 -66
- package/.claude/skills/release-engineering/resources/versioning-strategies.md +0 -59
- package/.claude/skills/route-tester/SKILL.md +0 -392
- package/.claude/skills/skill-developer/ADVANCED.md +0 -197
- package/.claude/skills/skill-developer/HOOK_MECHANISMS.md +0 -306
- package/.claude/skills/skill-developer/PATTERNS_LIBRARY.md +0 -152
- package/.claude/skills/skill-developer/SKILL.md +0 -430
- package/.claude/skills/skill-developer/SKILL_RULES_REFERENCE.md +0 -315
- package/.claude/skills/skill-developer/TRIGGER_TYPES.md +0 -305
- package/.claude/skills/skill-developer/TROUBLESHOOTING.md +0 -514
- package/.claude/skills/skill-rules.json +0 -2989
- package/.claude/skills/sre/SKILL.md +0 -464
- package/.claude/skills/sre/resources/alerting-best-practices.md +0 -282
- package/.claude/skills/sre/resources/capacity-planning.md +0 -226
- package/.claude/skills/sre/resources/chaos-engineering.md +0 -193
- package/.claude/skills/sre/resources/disaster-recovery.md +0 -232
- package/.claude/skills/sre/resources/incident-management.md +0 -436
- package/.claude/skills/sre/resources/observability-stack.md +0 -240
- package/.claude/skills/sre/resources/on-call-runbooks.md +0 -167
- package/.claude/skills/sre/resources/performance-optimization.md +0 -108
- package/.claude/skills/sre/resources/reliability-patterns.md +0 -183
- package/.claude/skills/sre/resources/slo-sli-sla.md +0 -464
- package/.claude/skills/sre/resources/toil-reduction.md +0 -145
- package/.claude/skills/systems-engineering/SKILL.md +0 -648
- package/.claude/skills/systems-engineering/resources/automation-patterns.md +0 -771
- package/.claude/skills/systems-engineering/resources/configuration-management.md +0 -998
- package/.claude/skills/systems-engineering/resources/linux-administration.md +0 -672
- package/.claude/skills/systems-engineering/resources/networking-fundamentals.md +0 -982
- package/.claude/skills/systems-engineering/resources/performance-tuning.md +0 -871
- package/.claude/skills/systems-engineering/resources/powershell-scripting.md +0 -482
- package/.claude/skills/systems-engineering/resources/security-hardening.md +0 -739
- package/.claude/skills/systems-engineering/resources/shell-scripting.md +0 -915
- package/.claude/skills/systems-engineering/resources/storage-management.md +0 -628
- package/.claude/skills/systems-engineering/resources/system-monitoring.md +0 -787
- package/.claude/skills/systems-engineering/resources/troubleshooting-guide.md +0 -753
- package/.claude/skills/systems-engineering/resources/windows-administration.md +0 -738
- package/.claude/skills/technical-leadership/SKILL.md +0 -728
- package/backend/docs/SECRETS_DOCUMENTATION.md +0 -327
- package/frontend/dist/assets/index-BC-NbKXi.css +0 -32
- package/frontend/dist/assets/index-DqJXZMHY.js +0 -11266
@@ -1,729 +0,0 @@
-# Enterprise Monitoring
-
-Enterprise monitoring tools, dashboards, capacity management, performance metrics, and proactive monitoring strategies.
-
-## Table of Contents
-
-- [Monitoring Overview](#monitoring-overview)
-- [Monitoring Tools](#monitoring-tools)
-- [Monitoring Metrics](#monitoring-metrics)
-- [Dashboards](#dashboards)
-- [Alerting](#alerting)
-- [Capacity Management](#capacity-management)
-- [Best Practices](#best-practices)
-
-## Monitoring Overview
-
-### Purpose
-
-Enterprise monitoring provides:
-- Real-time visibility into IT infrastructure
-- Proactive issue detection
-- Performance optimization
-- Capacity planning
-- Service level compliance
-- Root cause analysis
-
-### Monitoring Layers
-
-```
-┌─────────────────────────────────────────┐
-│ Business Monitoring                     │
-│ - Transaction success rate              │
-│ - Revenue per minute                    │
-│ - Customer experience                   │
-└──────────────┬──────────────────────────┘
-               ↓
-┌─────────────────────────────────────────┐
-│ Application Monitoring (APM)            │
-│ - Response times                        │
-│ - Error rates                           │
-│ - Database query performance            │
-└──────────────┬──────────────────────────┘
-               ↓
-┌─────────────────────────────────────────┐
-│ Infrastructure Monitoring               │
-│ - Server CPU/memory                     │
-│ - Network bandwidth                     │
-│ - Storage capacity                      │
-└──────────────┬──────────────────────────┘
-               ↓
-┌─────────────────────────────────────────┐
-│ Network Monitoring                      │
-│ - Link availability                     │
-│ - Latency                               │
-│ - Packet loss                           │
-└─────────────────────────────────────────┘
-```
-
-## Monitoring Tools
-
-### Enterprise Monitoring Stack
-
-**Infrastructure Monitoring:**
-```yaml
-Tools:
-  - Nagios/Icinga: Traditional monitoring
-  - Zabbix: Enterprise monitoring
-  - PRTG: Network monitoring
-  - SolarWinds: Comprehensive suite
-
-Capabilities:
-  - Server monitoring (CPU, memory, disk)
-  - Network device monitoring
-  - Service checks (HTTP, SMTP, etc.)
-  - SNMP monitoring
-  - Alerting
-```
-
-**Application Performance Monitoring (APM):**
-```yaml
-Tools:
-  - New Relic: Full-stack observability
-  - Dynatrace: AI-powered APM
-  - AppDynamics: Application intelligence
-  - Datadog: Cloud-scale monitoring
-
-Capabilities:
-  - Application performance
-  - Transaction tracing
-  - Code-level diagnostics
-  - User experience monitoring
-  - Error tracking
-```
-
-**Log Management:**
-```yaml
-Tools:
-  - Splunk: Enterprise log analysis
-  - ELK Stack: Open-source (Elasticsearch, Logstash, Kibana)
-  - Graylog: Log management
-  - Sumo Logic: Cloud-native logs
-
-Capabilities:
-  - Centralized logging
-  - Log aggregation
-  - Search and analysis
-  - Correlation
-  - Compliance
-```
-
-**Cloud Monitoring:**
-```yaml
-AWS:
-  - CloudWatch: Metrics and logs
-  - X-Ray: Distributed tracing
-  - CloudTrail: Audit logs
-
-Azure:
-  - Azure Monitor: Unified monitoring
-  - Application Insights: APM
-  - Log Analytics: Log management
-
-GCP:
-  - Cloud Monitoring: Metrics
-  - Cloud Logging: Logs
-  - Cloud Trace: Distributed tracing
-```
-
-## Monitoring Metrics
-
-### Infrastructure Metrics
-
-**Server Metrics:**
-```yaml
-CPU:
-  - CPU utilization (%)
-  - Load average (1m, 5m, 15m)
-  - Context switches
-  - CPU steal time (virtual)
-
-  Thresholds:
-    Warning: >70%
-    Critical: >90%
-
-Memory:
-  - Memory utilization (%)
-  - Swap usage
-  - Memory available
-  - Page faults
-
-  Thresholds:
-    Warning: >80%
-    Critical: >95%
-
-Disk:
-  - Disk utilization (%)
-  - Disk I/O (read/write IOPS)
-  - Disk latency
-  - Disk queue depth
-
-  Thresholds:
-    Utilization Warning: >80%
-    Utilization Critical: >90%
-    Latency Warning: >20ms
-    Latency Critical: >50ms
-
-Network:
-  - Bandwidth utilization (%)
-  - Packets in/out
-  - Errors
-  - Dropped packets
-
-  Thresholds:
-    Bandwidth Warning: >70%
-    Bandwidth Critical: >90%
-```
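
As an illustration of how the warning and critical thresholds listed in the removed block above might be applied, here is a minimal Python sketch. The `THRESHOLDS` table and the `classify` function are hypothetical names introduced for this example and are not part of the package; the numbers are copied from the block, and a value exactly at a threshold is treated as not exceeding it, following the `>` in the original.

```python
# Hypothetical helper: names are illustrative; thresholds come from the block above.
# Each entry maps a metric to its (warning, critical) limits.
THRESHOLDS = {
    "cpu_percent": (70.0, 90.0),
    "memory_percent": (80.0, 95.0),
    "disk_percent": (80.0, 90.0),
    "disk_latency_ms": (20.0, 50.0),
    "bandwidth_percent": (70.0, 90.0),
}


def classify(metric: str, value: float) -> str:
    """Return 'ok', 'warning', or 'critical' for one sampled metric value."""
    warning, critical = THRESHOLDS[metric]
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"


if __name__ == "__main__":
    samples = [("cpu_percent", 65.0), ("memory_percent", 85.0), ("disk_latency_ms", 75.0)]
    for metric, value in samples:
        print(f"{metric}={value}: {classify(metric, value)}")
```

Checking the critical limit first keeps the function correct even if a metric's two limits are later set to the same value.
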
-
-### Application Metrics
-
-```yaml
-Availability:
-  - Uptime (%)
-  - Error rate (%)
-  - Success rate (%)
-
-  Targets:
-    Uptime: 99.9% (SLA)
-    Error Rate: <1%
-
-Performance:
-  - Response time (p50, p95, p99)
-  - Transactions per second (TPS)
-  - Throughput
-  - Apdex score
-
-  Targets:
-    Response Time p95: <500ms
-    Response Time p99: <1000ms
-    TPS: >1000
-
-Resource Usage:
-  - Connection pool usage
-  - Thread pool usage
-  - Cache hit rate
-  - Queue depth
-
-  Targets:
-    Connection Pool: <80%
-    Cache Hit Rate: >90%
-
-Database:
-  - Query response time
-  - Slow queries
-  - Connection count
-  - Deadlocks
-
-  Targets:
-    Query Time p95: <100ms
-    Slow Queries: <10/hour
-```
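
The response-time targets and the Apdex score named in the removed block above can be made concrete with a short Python sketch. The function names are hypothetical and the nearest-rank percentile is one common definition among several. The Apdex threshold `t` is an assumption, since the removed document does not state one; the Apdex formula itself is the standard one (satisfied samples count fully, tolerating samples, up to `4t`, count half).

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def apdex(samples, t):
    """Apdex = (satisfied + tolerating / 2) / total, for a chosen threshold t."""
    if not samples:
        raise ValueError("samples must not be empty")
    satisfied = sum(1 for s in samples if s <= t)
    tolerating = sum(1 for s in samples if t < s <= 4 * t)
    return (satisfied + tolerating / 2) / len(samples)


def meets_response_targets(samples_ms):
    """Check the p95 < 500 ms and p99 < 1000 ms targets from the block above."""
    return percentile(samples_ms, 95) < 500 and percentile(samples_ms, 99) < 1000


if __name__ == "__main__":
    latencies_ms = [120, 180, 240, 310, 450, 90, 130, 620, 200, 150]
    print("p95:", percentile(latencies_ms, 95))
    print("apdex (t=500 ms, assumed):", apdex(latencies_ms, 500))
    print("meets targets:", meets_response_targets(latencies_ms))
```
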
-
-### Business Metrics
-
-```yaml
-E-Commerce Example:
-
-  Revenue Metrics:
-    - Orders per minute
-    - Revenue per minute
-    - Cart abandonment rate
-    - Conversion rate
-
-  User Experience:
-    - Page load time
-    - Time to first byte
-    - Search results time
-    - Checkout time
-
-  Operational:
-    - Inventory accuracy
-    - Order fulfillment time
-    - Customer support tickets
-    - Failed payments
-```

## Dashboards

### Executive Dashboard

```yaml
Executive IT Dashboard:

  Service Health:
  ┌──────────────────────────────────────┐
  │ Service Status                       │
  ├──────────────────────────────────────┤
  │ Email:           ✅ Operational      │
  │ Customer Portal: ✅ Operational      │
  │ VPN:             ✅ Operational      │
  │ File Shares:     ⚠️ Degraded         │
  │ ERP System:      ✅ Operational      │
  └──────────────────────────────────────┘

  SLA Compliance (This Month):
  ┌──────────────────────────────────────┐
  │ Overall SLA: 99.7% ✅ (Target: 99.5%)│
  │                                      │
  │ Email:       99.95% ✅               │
  │ Portal:      99.80% ✅               │
  │ VPN:         99.50% ✅               │
  │ File Shares: 99.40% ⚠️               │
  └──────────────────────────────────────┘

  Incidents:
  ┌──────────────────────────────────────┐
  │ Open: 15 (▼ 25% vs last month)       │
  │ P1: 0                                │
  │ P2: 2                                │
  │ P3: 8                                │
  │ P4: 5                                │
  │                                      │
  │ MTTR: 2.5 hours ✅ (Target: 4 hours) │
  └──────────────────────────────────────┘

  Costs:
  ┌──────────────────────────────────────┐
  │ Cloud Spend: $145,000                │
  │ Trend: ▼ 5% vs budget ✅             │
  │ Top Costs:                           │
  │ 1. Compute: $65,000 (45%)            │
  │ 2. Storage: $35,000 (24%)            │
  │ 3. Network: $25,000 (17%)            │
  └──────────────────────────────────────┘
```
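
The SLA compliance figures on the executive dashboard come down to a downtime budget. As an illustrative Python sketch (the function names and the 30-day month are assumptions, not something the guide prescribes), the following converts an availability target into allowed downtime and a measured downtime back into a percentage:

```python
MINUTES_PER_DAY = 24 * 60


def allowed_downtime_minutes(target_pct: float, days: int = 30) -> float:
    """Downtime budget, in minutes, for an availability target over `days`."""
    return days * MINUTES_PER_DAY * (100 - target_pct) / 100


def availability_pct(downtime_minutes: float, days: int = 30) -> float:
    """Measured availability (%) given total downtime in the period."""
    total = days * MINUTES_PER_DAY
    return 100 * (total - downtime_minutes) / total


# A 99.5% target over an assumed 30-day month allows about 216 minutes.
budget = allowed_downtime_minutes(99.5)
# About 259 minutes of downtime corresponds to roughly 99.40% availability,
# the File Shares figure flagged on the dashboard.
measured = availability_pct(259.2)
print(f"budget={budget:.0f} min, measured={measured:.2f}%")
```

Expressing the target as a minute budget makes it easier to see how much of the month's allowance a single incident consumes.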

### Operations Dashboard

```yaml
NOC (Network Operations Center) Dashboard:

  Infrastructure Overview:
  ┌──────────────────────────────────────┐
  │ Servers:      245 ✅ / 3 ⚠️ / 0 ❌   │
  │ Network:       45 ✅ / 1 ⚠️ / 0 ❌   │
  │ Storage:       15 ✅ / 0 ⚠️ / 0 ❌   │
  │ Applications:  32 ✅ / 1 ⚠️ / 0 ❌   │
  └──────────────────────────────────────┘

  Active Alerts:
  ┌──────────────────────────────────────┐
  │ Critical: 0                          │
  │ Warning: 5                           │
  │                                      │
  │ 1. File Server Disk 85% (Warning)    │
  │ 2. Web01 CPU 75% (Warning)           │
  │ 3. Network Link Latency 25ms (Warn)  │
  │ 4. Database Slow Queries (Warning)   │
  │ 5. Backup Job Delayed (Warning)      │
  └──────────────────────────────────────┘

  Performance:
  ┌──────────────────────────────────────┐
  │ Application Response Time (p95)      │
  │ ████████████░░░░░░░░ 485ms ✅        │
  │ Target: <500ms                       │
  │                                      │
  │ Network Latency (avg)                │
  │ ████░░░░░░░░░░░░░░░░ 18ms ✅         │
  │ Target: <50ms                        │
  │                                      │
  │ Database Query Time (p95)            │
  │ ██████░░░░░░░░░░░░░░ 85ms ✅         │
  │ Target: <100ms                       │
  └──────────────────────────────────────┘
```

### Application Dashboard

```yaml
Application Performance Dashboard:

  Customer Portal:

  Response Time Trend (24 hours):
  ┌──────────────────────────────────────┐
  │ p50  ▁▂▃▂▁▂▃▂▁▂▃▂▁▂▃▂▁▂▃▂▁▂▃▂  250ms │
  │ p95  ▃▅▆▅▃▅▆▅▃▅▆▅▃▅▆▅▃▅▆▅▃▅▆▅  480ms │
  │ p99  ▆▇█▇▆▇█▇▆▇█▇▆▇█▇▆▇█▇▆▇█▇  920ms │
  │                                      │
  │      00:00 06:00 12:00 18:00         │
  └──────────────────────────────────────┘

  Error Rate (24 hours):
  ┌──────────────────────────────────────┐
  │ 2% █                                 │
  │ 1% █ ▆ ▃                             │
  │ 0% ▅▃▂▁▂▃▂▁▂▃▂▁▂▃▂▁▂▃▂▁▂▃▂▁▂         │
  │                                      │
  │ Current: 0.3% ✅ (Target: <1%)       │
  └──────────────────────────────────────┘

  Top Endpoints:
  ┌──────────────────────────────────────┐
  │ Endpoint        | Requests | p95     │
  ├──────────────────────────────────────┤
  │ /api/orders     | 15,000   | 320ms   │
  │ /api/products   | 12,500   | 280ms   │
  │ /api/customers  | 8,000    | 450ms   │
  │ /api/search     | 6,000    | 650ms   │
  │ /api/checkout   | 3,500    | 890ms   │
  └──────────────────────────────────────┘

  Database Queries:
  ┌──────────────────────────────────────┐
  │ Slow Queries (>1s): 12 ⚠️            │
  │                                      │
  │ Top Slow Queries:                    │
  │ 1. SELECT * FROM orders... (2.5s)    │
  │ 2. JOIN customers... (1.8s)          │
  │ 3. UPDATE inventory... (1.2s)        │
  └──────────────────────────────────────┘
```
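
The endpoint table above becomes more useful when compared against a target. A small Python sketch, assuming the 500 ms p95 target from the Performance targets in this guide; the `over_target` helper is a hypothetical name, and the p95 values are copied from the Top Endpoints table:

```python
P95_TARGET_MS = 500  # assumed to be the same p95 target used for the application

# p95 values (ms) copied from the Top Endpoints table above.
p95_by_endpoint = {
    "/api/orders": 320,
    "/api/products": 280,
    "/api/customers": 450,
    "/api/search": 650,
    "/api/checkout": 890,
}


def over_target(p95_ms, target_ms=P95_TARGET_MS):
    """Return endpoints whose p95 exceeds the target, slowest first."""
    slow = [endpoint for endpoint, value in p95_ms.items() if value > target_ms]
    return sorted(slow, key=p95_ms.get, reverse=True)


print(over_target(p95_by_endpoint))
```

Here the two lowest-traffic endpoints are the ones over target, which an aggregate p95 of 480 ms would hide.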

## Alerting

### Alert Levels

```yaml
Alert Severity Levels:

  Critical:
    Description: Service down, immediate action required
    Examples:
      - Production database down
      - Website unreachable
      - Data loss detected
    Response: Page on-call, all hands on deck
    SLA: Response within 15 minutes

  Warning:
    Description: Threshold exceeded, may impact service
    Examples:
      - Disk 85% full
      - CPU 80% for 10 minutes
      - Backup job delayed
    Response: Investigate within 1 hour
    SLA: Acknowledge within 30 minutes

  Info:
    Description: Informational, no action required
    Examples:
      - Backup completed successfully
      - Deployment finished
      - Certificate renewed
    Response: Review during business hours
    SLA: No SLA
```
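
The severity levels above map naturally onto first-response deadlines. The sketch below is one possible Python encoding; `ACK_SLA` and `ack_deadline` are hypothetical names, and treating the Critical "response" and Warning "acknowledge" times as the same kind of deadline is an assumption.

```python
from datetime import datetime, timedelta
from typing import Optional

# First-response windows taken from the severity levels above.
# Info alerts carry no SLA, so they map to None.
ACK_SLA = {
    "critical": timedelta(minutes=15),
    "warning": timedelta(minutes=30),
    "info": None,
}


def ack_deadline(severity: str, raised_at: datetime) -> Optional[datetime]:
    """Return when an alert must be acknowledged, or None if no SLA applies."""
    window = ACK_SLA[severity]
    return None if window is None else raised_at + window


raised = datetime(2025, 1, 1, 12, 0)
print(ack_deadline("critical", raised))
print(ack_deadline("info", raised))
```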

### Alert Rules

```yaml
Server CPU Alert:

  Metric: cpu.utilization
  Condition: Average > 80% for 10 minutes
  Severity: Warning

  Actions:
    - Send email to ops-team@company.com
    - Create Slack notification in #ops-alerts
    - Create ServiceNow ticket

  Escalation:
    If CPU > 90% for 10 minutes:
      - Upgrade to Critical
      - Page on-call engineer
      - Notify manager

  Auto-remediation:
    If CPU > 95% for 5 minutes:
      - Scale out (add a server instance)
      - Restart stuck processes (if configured)
```
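
The CPU rule above combines a threshold with a duration. A minimal Python sketch of that evaluation follows, assuming one utilization sample per minute and interpreting every "for N minutes" clause as the average over the last N samples; the rule text only says "Average" for the warning condition, so applying it to the escalation and auto-remediation tiers is an assumption. `evaluate_cpu` is a hypothetical name.

```python
def evaluate_cpu(samples: list[float]) -> str:
    """Evaluate the CPU rule on per-minute utilization samples (oldest first).

    Returns 'auto-remediate', 'critical', 'warning', or 'ok'.
    """

    def average_of_last(minutes: int):
        # Require a full window so a single spike cannot trigger the rule.
        if len(samples) < minutes:
            return None
        window = samples[-minutes:]
        return sum(window) / minutes

    avg_5 = average_of_last(5)
    avg_10 = average_of_last(10)

    if avg_5 is not None and avg_5 > 95:
        return "auto-remediate"  # CPU > 95% for 5 minutes
    if avg_10 is not None and avg_10 > 90:
        return "critical"        # CPU > 90% for 10 minutes
    if avg_10 is not None and avg_10 > 80:
        return "warning"         # Average > 80% for 10 minutes
    return "ok"


print(evaluate_cpu([85.0] * 10))
print(evaluate_cpu([50.0] * 5 + [97.0] * 5))
```

Checking the most severe tier first means a host that satisfies several conditions at once gets the strongest response.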

### Alert Best Practices

```yaml
Alert Design:

  1. Actionable:
    ❌ Bad: "Server CPU high"
    ✅ Good: "Web01 CPU >90% for 15min. Check runaway processes or scale up."

  2. Contextual:
    Include:
      - Current value
      - Threshold
      - Duration
      - Impact
      - Runbook link

  3. Threshold Tuning:
    - Start conservative (avoid alert fatigue)
    - Adjust based on normal patterns
    - Different thresholds for different times
    - Use anomaly detection

  4. Alert Routing:
    - Route to responsible team
    - Escalate if not acknowledged
    - Different channels per severity
    - On-call rotation

  5. Alert Deduplication:
    - Group related alerts
    - Suppress dependent alerts
    - Cooldown periods
    - Flapping detection
```
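
The cooldown idea from the deduplication practices above can be sketched in a few lines of Python. This is an illustration only: `AlertDeduplicator`, the 300-second default, and the `host/metric` key format are all hypothetical choices, and timestamps are plain seconds to keep the example self-contained.

```python
class AlertDeduplicator:
    """Suppress repeat notifications for the same alert key during a cooldown."""

    def __init__(self, cooldown_seconds: float = 300):
        self.cooldown_seconds = cooldown_seconds
        self._last_sent = {}  # alert key -> timestamp of last notification

    def should_notify(self, key: str, now: float) -> bool:
        """Return True if a notification for `key` should be sent at time `now`."""
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still cooling down, so suppress the duplicate
        self._last_sent[key] = now
        return True


dedup = AlertDeduplicator(cooldown_seconds=300)
print(dedup.should_notify("web01/cpu", now=0))
print(dedup.should_notify("web01/cpu", now=120))
print(dedup.should_notify("web01/cpu", now=300))
```

Keying on host and metric groups repeats of the same condition while still letting a different alert on the same host through.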

## Capacity Management

### Capacity Planning Process

```yaml
Capacity Planning Cycle:

  1. Monitor Current Usage (Ongoing):
    - Track resource utilization
    - Identify trends
    - Collect metrics

  2. Forecast Future Demand (Quarterly):
    - Business growth projections
    - Seasonal variations
    - New initiatives
    - Historical trends

  3. Analyze Capacity (Quarterly):
    - Current vs forecasted demand
    - Time to resource exhaustion
    - Bottlenecks
    - Optimization opportunities

  4. Plan Capacity Changes (Quarterly):
    - Procurement requirements
    - Budget approval
    - Implementation timeline
    - Risk mitigation

  5. Implement Changes (As needed):
    - Procure resources
    - Deploy infrastructure
    - Validate capacity
    - Document changes

  6. Review and Optimize (Monthly):
    - Actual vs plan
    - Cost efficiency
    - Performance impact
    - Lessons learned
```

### Capacity Metrics

```yaml
Server Capacity:

  Current State:
    Total Servers: 250
    Average CPU: 45%
    Average Memory: 60%
    Average Disk: 55%

  Trend (6 months):
    CPU: ▲ 5% increase
    Memory: ▲ 8% increase
    Disk: ▲ 12% increase

  Forecast (Next 6 months):
    Expected CPU: 55% (10% headroom)
    Expected Memory: 75% (adequate)
    Expected Disk: 75% (adequate)

  Action Required:
    - None for CPU/Memory
    - Monitor disk growth
    - Plan storage expansion in Q2

Storage Capacity:

  Current: 500 TB used / 750 TB total (67%)
  Growth Rate: 15 TB/month
  Forecast: 590 TB in 6 months (79%)
  Threshold: 80% (warning)

  Action:
    - Procure additional 250 TB
    - Timeline: Q2 2025
    - Budget: $50,000

Network Capacity:

  Current: 1 Gbps links
  Peak Usage: 650 Mbps (65%)
  Growth: 5 percentage points (about 50 Mbps) per quarter
  Forecast: 850 Mbps in 12 months (85%)

  Action:
    - Upgrade to 10 Gbps in Q3
    - Cost: $25,000
    - Provides 10x the current link capacity
```
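
The storage and network forecasts above follow from simple linear extrapolation. A short Python sketch under that assumption (the helper names are hypothetical): the storage numbers use the stated 15 TB/month, and the network forecast of 850 Mbps is reproduced by assuming growth of 50 Mbps per quarter, that is, 5 points of the 1 Gbps link.

```python
def linear_forecast(current: float, growth_per_period: float, periods: int) -> float:
    """Project usage forward assuming constant growth per period."""
    return current + growth_per_period * periods


def periods_until(current: float, growth_per_period: float, limit: float) -> float:
    """Number of periods until usage reaches `limit` (infinite if not growing)."""
    if growth_per_period <= 0:
        return float("inf")
    return max(0.0, (limit - current) / growth_per_period)


# Storage: 500 TB used of 750 TB, growing 15 TB/month.
storage_6m = linear_forecast(500, 15, 6)             # 590 TB
storage_pct = 100 * storage_6m / 750                 # roughly 79%
months_to_warning = periods_until(500, 15, 600)      # 600 TB is 80% of 750 TB

# Network: 650 Mbps peak on a 1 Gbps link, assumed +50 Mbps per quarter.
network_12m = linear_forecast(650, 50, 4)            # 850 Mbps

print(storage_6m, round(storage_pct), round(months_to_warning, 1), network_12m)
```

On these figures the 80% storage warning threshold is reached a little under seven months out, which is consistent with planning the expansion ahead of that point.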

### Capacity Reporting

```yaml
Monthly Capacity Report:

  Executive Summary:
    - All systems within capacity targets
    - Storage requiring expansion in 6 months
    - Network upgrade planned Q3
    - No immediate concerns

  Current Utilization:
    Compute: 45% (Low ✅)
    Memory: 60% (Moderate ✅)
    Storage: 67% (Moderate ✅)
    Network: 65% (Moderate ✅)

  Trends:
    - Steady 5% quarterly compute growth
    - Storage growth accelerating (cleanup needed)
    - Network stable

  Forecasts:
    Next 6 Months:
      - Compute: Adequate capacity
      - Storage: Approaching limit (action required)
      - Network: Adequate capacity

    Next 12 Months:
      - Compute: Adequate capacity
      - Storage: Expansion required
      - Network: Upgrade recommended

  Actions:
    - Storage procurement initiated
    - Network upgrade planning started
    - Cost: $75,000 (approved)
```

## Best Practices

### 1. Monitoring Coverage

```yaml
Ensure Comprehensive Coverage:

  Infrastructure:
    - All production servers
    - Network devices
    - Storage systems
    - Virtualization platforms

  Applications:
    - All critical applications
    - Key transactions
    - Dependencies
    - APIs

  Business:
    - Revenue metrics
    - User experience
    - SLA compliance
    - Customer satisfaction
```

### 2. Baseline Establishment

```yaml
Establish Performance Baselines:

  Process:
    1. Collect metrics for 30 days
    2. Analyze patterns (daily, weekly)
    3. Calculate normal ranges
    4. Set thresholds above baseline
    5. Review quarterly

  Example:
    Metric: Application response time
    Baseline (p95): 450ms
    Warning: 600ms (133% of baseline)
    Critical: 900ms (200% of baseline)
```
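
The example above derives thresholds as fixed multiples of the baseline. A minimal Python sketch of that arithmetic, where `thresholds_from_baseline` is a hypothetical helper and the default factors (4/3 for about 133%, 2.0 for 200%) are read off the example rather than prescribed by the guide:

```python
def thresholds_from_baseline(
    baseline_ms: float,
    warning_factor: float = 4 / 3,   # about 133% of baseline
    critical_factor: float = 2.0,    # 200% of baseline
) -> dict:
    """Derive warning and critical thresholds (ms) from a p95 baseline."""
    return {
        "warning": round(baseline_ms * warning_factor),
        "critical": round(baseline_ms * critical_factor),
    }


print(thresholds_from_baseline(450))
```

Keeping the factors as parameters makes the quarterly review a matter of re-running the calculation with a fresh baseline.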

### 3. Alert Fatigue Prevention

```yaml
Avoid Alert Fatigue:

  Strategies:
    - Tune thresholds (reduce false positives)
    - Use intelligent alerting (anomaly detection)
    - Implement alert aggregation
    - Regular alert review and cleanup
    - Auto-remediation where possible

  Metrics:
    - Alert volume: <100/day
    - Alert-to-incident ratio: >50%
    - False positive rate: <10%
    - Time to acknowledge: <5 minutes
```
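
The alert quality metrics above can be computed from a reviewed alert log. The following Python sketch assumes each alert has been labeled during review with two hypothetical boolean fields, `became_incident` and `false_positive`; neither field name comes from the guide, and the sample log is made up.

```python
def alert_quality(alerts: list) -> dict:
    """Compute alert-to-incident ratio and false positive rate, in percent."""
    total = len(alerts)
    if total == 0:
        return {"alert_to_incident_pct": 0.0, "false_positive_pct": 0.0}
    incidents = sum(1 for alert in alerts if alert["became_incident"])
    false_positives = sum(1 for alert in alerts if alert["false_positive"])
    return {
        "alert_to_incident_pct": 100 * incidents / total,
        "false_positive_pct": 100 * false_positives / total,
    }


# Made-up review log: 6 real incidents, 1 false positive, 3 benign alerts.
log = (
    [{"became_incident": True, "false_positive": False}] * 6
    + [{"became_incident": False, "false_positive": True}] * 1
    + [{"became_incident": False, "false_positive": False}] * 3
)
print(alert_quality(log))
```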

### 4. Correlation and Root Cause

```yaml
Use Correlation for RCA:

  Approach:
    - Correlate metrics across layers
    - Identify cascading failures
    - Trace requests end-to-end
    - Link logs to metrics
    - Use dependency mapping

  Example:
    Symptom: Application slow
    Correlation:
      - Application response time ↑
      - Database query time ↑
      - Database disk I/O ↑
      - Storage latency ↑
    Root Cause: Storage array degraded disk
```
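
The correlation example above can be expressed as a walk over a dependency map: a component is a likely root cause if it is unhealthy and none of its own dependencies are. This Python sketch uses a hypothetical three-layer `DEPENDS_ON` map modeled on the example; real dependency data would come from whatever mapping tool is in use.

```python
# Hypothetical dependency map mirroring the example: app -> database -> storage.
DEPENDS_ON = {
    "application": ["database"],
    "database": ["storage"],
    "storage": [],
}


def root_causes(component: str, unhealthy: set, depends_on: dict) -> list:
    """Return unhealthy components reachable from `component` that have
    no unhealthy dependencies of their own."""
    causes, seen, stack = [], set(), [component]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        dependencies = depends_on.get(node, [])
        unhealthy_deps = [dep for dep in dependencies if dep in unhealthy]
        if node in unhealthy and not unhealthy_deps:
            causes.append(node)
        stack.extend(dependencies)
    return causes


print(root_causes("application", {"application", "database", "storage"}, DEPENDS_ON))
```

The same walk also supports suppressing dependent alerts: anything unhealthy that is not in the returned list is a symptom rather than a cause.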

### 5. Continuous Improvement

```yaml
Monitoring Improvement Process:

  Monthly:
    - Review alert effectiveness
    - Tune thresholds
    - Add missing metrics
    - Update dashboards

  Quarterly:
    - Capacity planning review
    - Tool evaluation
    - Process optimization
    - Team training

  Annually:
    - Technology refresh
    - Tool consolidation
    - Architecture review
    - Strategy planning
```

---

**Related Resources:**
- [incident-service-management.md](incident-service-management.md) - Incident response
- [business-continuity.md](business-continuity.md) - DR monitoring
- [automation-orchestration.md](automation-orchestration.md) - Automated remediation