blockmine 1.24.0 → 1.25.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +32 -0
- package/README.en.md +427 -0
- package/README.md +40 -0
- package/backend/cli.js +1 -1
- package/backend/src/ai/plugin-assistant-system-prompt.md +664 -5
- package/backend/src/api/routes/bots.js +13 -0
- package/backend/src/api/routes/servers.js +14 -2
- package/backend/src/core/BotProcess.js +98 -2
- package/backend/src/core/PluginLoader.js +83 -3
- package/backend/src/core/PluginManager.js +75 -5
- package/backend/src/core/services/BotLifecycleService.js +186 -2
- package/backend/src/server.js +11 -1
- package/frontend/dist/assets/browser-ponyfill-DN7pwmHT.js +2 -0
- package/frontend/dist/assets/index-LSy71uwm.js +11261 -0
- package/frontend/dist/assets/index-SfhKxI4-.css +32 -0
- package/frontend/dist/flags/en.svg +32 -0
- package/frontend/dist/flags/ru.svg +5 -0
- package/frontend/dist/index.html +2 -2
- package/frontend/dist/locales/en/admin.json +100 -0
- package/frontend/dist/locales/en/api-keys.json +58 -0
- package/frontend/dist/locales/en/bots.json +110 -0
- package/frontend/dist/locales/en/common.json +47 -0
- package/frontend/dist/locales/en/configuration.json +22 -0
- package/frontend/dist/locales/en/console.json +10 -0
- package/frontend/dist/locales/en/dashboard.json +85 -0
- package/frontend/dist/locales/en/dialogs.json +70 -0
- package/frontend/dist/locales/en/event-graphs.json +50 -0
- package/frontend/dist/locales/en/graph-store.json +70 -0
- package/frontend/dist/locales/en/login.json +34 -0
- package/frontend/dist/locales/en/management.json +114 -0
- package/frontend/dist/locales/en/minecraft-viewer.json +27 -0
- package/frontend/dist/locales/en/nodes.json +1077 -0
- package/frontend/dist/locales/en/permissions.json +50 -0
- package/frontend/dist/locales/en/plugin-detail.json +49 -0
- package/frontend/dist/locales/en/plugins.json +110 -0
- package/frontend/dist/locales/en/proxies.json +81 -0
- package/frontend/dist/locales/en/servers.json +39 -0
- package/frontend/dist/locales/en/setup.json +17 -0
- package/frontend/dist/locales/en/sidebar.json +27 -0
- package/frontend/dist/locales/en/tasks.json +62 -0
- package/frontend/dist/locales/en/visual-editor.json +219 -0
- package/frontend/dist/locales/en/websocket.json +86 -0
- package/frontend/dist/locales/ru/admin.json +100 -0
- package/frontend/dist/locales/ru/api-keys.json +58 -0
- package/frontend/dist/locales/ru/bots.json +110 -0
- package/frontend/dist/locales/ru/common.json +49 -0
- package/frontend/dist/locales/ru/configuration.json +22 -0
- package/frontend/dist/locales/ru/console.json +10 -0
- package/frontend/dist/locales/ru/dashboard.json +85 -0
- package/frontend/dist/locales/ru/dialogs.json +70 -0
- package/frontend/dist/locales/ru/event-graphs.json +50 -0
- package/frontend/dist/locales/ru/graph-store.json +70 -0
- package/frontend/dist/locales/ru/login.json +34 -0
- package/frontend/dist/locales/ru/management.json +114 -0
- package/frontend/dist/locales/ru/minecraft-viewer.json +27 -0
- package/frontend/dist/locales/ru/nodes.json +1077 -0
- package/frontend/dist/locales/ru/permissions.json +50 -0
- package/frontend/dist/locales/ru/plugin-detail.json +49 -0
- package/frontend/dist/locales/ru/plugins.json +110 -0
- package/frontend/dist/locales/ru/proxies.json +81 -0
- package/frontend/dist/locales/ru/servers.json +39 -0
- package/frontend/dist/locales/ru/setup.json +17 -0
- package/frontend/dist/locales/ru/sidebar.json +27 -0
- package/frontend/dist/locales/ru/tasks.json +62 -0
- package/frontend/dist/locales/ru/visual-editor.json +221 -0
- package/frontend/dist/locales/ru/websocket.json +86 -0
- package/frontend/dist/monacoeditorwork/css.worker.bundle.js +7 -7
- package/frontend/dist/monacoeditorwork/html.worker.bundle.js +7 -7
- package/frontend/dist/monacoeditorwork/json.worker.bundle.js +7 -7
- package/frontend/dist/monacoeditorwork/ts.worker.bundle.js +3 -3
- package/frontend/package.json +4 -0
- package/package.json +1 -1
- package/screen/3dviewer.png +0 -0
- package/screen/console.png +0 -0
- package/screen/dashboard.png +0 -0
- package/screen/graph_collabe.png +0 -0
- package/screen/graph_live_debug.png +0 -0
- package/screen/language_selector.png +0 -0
- package/screen/management_command.png +0 -0
- package/screen/node_debug_trace.png +0 -0
- package/screen/plugin_обзор.png +0 -0
- package/screen/websocket.png +0 -0
- package/screen/настройки_отдельных_команд_каждую_команлду_можно_настраивать.png +0 -0
- package/screen/планировщик_можно_задавать_действия_по_времени.png +0 -0
- package/.claude/agents/README.md +0 -469
- package/.claude/agents/auth-route-debugger.md +0 -118
- package/.claude/agents/auth-route-tester.md +0 -93
- package/.claude/agents/auto-error-resolver.md +0 -97
- package/.claude/agents/build-optimizer.md +0 -236
- package/.claude/agents/code-architect.md +0 -34
- package/.claude/agents/code-architecture-reviewer.md +0 -83
- package/.claude/agents/code-explorer.md +0 -51
- package/.claude/agents/code-refactor-master.md +0 -94
- package/.claude/agents/code-reviewer.md +0 -46
- package/.claude/agents/cost-optimizer.md +0 -134
- package/.claude/agents/deployment-orchestrator.md +0 -113
- package/.claude/agents/documentation-architect.md +0 -82
- package/.claude/agents/frontend-error-fixer.md +0 -77
- package/.claude/agents/iac-code-generator.md +0 -71
- package/.claude/agents/incident-responder.md +0 -346
- package/.claude/agents/infrastructure-architect.md +0 -31
- package/.claude/agents/kubernetes-specialist.md +0 -56
- package/.claude/agents/migration-planner.md +0 -181
- package/.claude/agents/network-architect.md +0 -196
- package/.claude/agents/plan-reviewer.md +0 -52
- package/.claude/agents/refactor-planner.md +0 -63
- package/.claude/agents/security-scanner.md +0 -102
- package/.claude/agents/web-research-specialist.md +0 -78
- package/.claude/commands/cost-analysis.md +0 -315
- package/.claude/commands/dev-docs-update.md +0 -55
- package/.claude/commands/dev-docs.md +0 -51
- package/.claude/commands/feature-dev.md +0 -125
- package/.claude/commands/incident-debug.md +0 -247
- package/.claude/commands/infra-plan.md +0 -81
- package/.claude/commands/migration-plan.md +0 -478
- package/.claude/commands/route-research-for-testing.md +0 -37
- package/.claude/commands/security-review.md +0 -66
- package/.claude/hooks/CONFIG.md +0 -448
- package/.claude/hooks/README.md +0 -163
- package/.claude/hooks/SKILL_ACTIVATION_COMPLETE.md +0 -226
- package/.claude/hooks/WINDOWS_HOOKS_README.md +0 -151
- package/.claude/hooks/add-skill-activation-banners.ts +0 -132
- package/.claude/hooks/comprehensive-skill-test.ts +0 -1315
- package/.claude/hooks/error-handling-reminder.sh +0 -12
- package/.claude/hooks/error-handling-reminder.ts +0 -222
- package/.claude/hooks/k8s-manifest-validator.sh +0 -56
- package/.claude/hooks/package-lock.json +0 -556
- package/.claude/hooks/package.json +0 -16
- package/.claude/hooks/post-tool-use-tracker.ps1 +0 -174
- package/.claude/hooks/post-tool-use-tracker.sh +0 -183
- package/.claude/hooks/security-policy-check.sh +0 -247
- package/.claude/hooks/skill-activation-prompt.ps1 +0 -10
- package/.claude/hooks/skill-activation-prompt.sh +0 -10
- package/.claude/hooks/skill-activation-prompt.ts +0 -141
- package/.claude/hooks/stop-build-check-enhanced.sh +0 -130
- package/.claude/hooks/terraform-validator.sh +0 -53
- package/.claude/hooks/test-input.json +0 -7
- package/.claude/hooks/test-skill-activation.ts +0 -427
- package/.claude/hooks/trigger-build-resolver.sh +0 -79
- package/.claude/hooks/tsc-check.sh +0 -173
- package/.claude/hooks/tsconfig.json +0 -19
- package/.claude/settings.json +0 -59
- package/.claude/settings.local.json +0 -67
- package/.claude/skills/README.md +0 -507
- package/.claude/skills/api-engineering/SKILL.md +0 -63
- package/.claude/skills/api-engineering/resources/api-versioning.md +0 -88
- package/.claude/skills/api-engineering/resources/graphql-patterns.md +0 -106
- package/.claude/skills/api-engineering/resources/rate-limiting.md +0 -118
- package/.claude/skills/api-engineering/resources/rest-api-design.md +0 -105
- package/.claude/skills/backend-dev-guidelines/SKILL.md +0 -306
- package/.claude/skills/backend-dev-guidelines/resources/architecture-overview.md +0 -451
- package/.claude/skills/backend-dev-guidelines/resources/async-and-errors.md +0 -307
- package/.claude/skills/backend-dev-guidelines/resources/complete-examples.md +0 -638
- package/.claude/skills/backend-dev-guidelines/resources/configuration.md +0 -275
- package/.claude/skills/backend-dev-guidelines/resources/database-patterns.md +0 -224
- package/.claude/skills/backend-dev-guidelines/resources/middleware-guide.md +0 -213
- package/.claude/skills/backend-dev-guidelines/resources/routing-and-controllers.md +0 -756
- package/.claude/skills/backend-dev-guidelines/resources/sentry-and-monitoring.md +0 -336
- package/.claude/skills/backend-dev-guidelines/resources/services-and-repositories.md +0 -789
- package/.claude/skills/backend-dev-guidelines/resources/testing-guide.md +0 -235
- package/.claude/skills/backend-dev-guidelines/resources/validation-patterns.md +0 -754
- package/.claude/skills/budget-and-cost-management/SKILL.md +0 -850
- package/.claude/skills/build-engineering/SKILL.md +0 -431
- package/.claude/skills/build-engineering/resources/artifact-repositories.md +0 -72
- package/.claude/skills/build-engineering/resources/build-caching.md +0 -96
- package/.claude/skills/build-engineering/resources/build-pipelines.md +0 -105
- package/.claude/skills/build-engineering/resources/build-security.md +0 -95
- package/.claude/skills/build-engineering/resources/build-systems.md +0 -389
- package/.claude/skills/build-engineering/resources/compilation-optimization.md +0 -201
- package/.claude/skills/build-engineering/resources/dependency-management.md +0 -73
- package/.claude/skills/build-engineering/resources/monorepo-builds.md +0 -110
- package/.claude/skills/build-engineering/resources/performance-optimization.md +0 -113
- package/.claude/skills/build-engineering/resources/reproducible-builds.md +0 -82
- package/.claude/skills/cloud-engineering/SKILL.md +0 -675
- package/.claude/skills/cloud-engineering/resources/aws-patterns.md +0 -742
- package/.claude/skills/cloud-engineering/resources/azure-patterns.md +0 -714
- package/.claude/skills/cloud-engineering/resources/cleared-cloud-environments.md +0 -987
- package/.claude/skills/cloud-engineering/resources/cloud-cost-optimization.md +0 -757
- package/.claude/skills/cloud-engineering/resources/cloud-networking.md +0 -1058
- package/.claude/skills/cloud-engineering/resources/cloud-security-tools.md +0 -1530
- package/.claude/skills/cloud-engineering/resources/cloud-security.md +0 -990
- package/.claude/skills/cloud-engineering/resources/gcp-patterns.md +0 -758
- package/.claude/skills/cloud-engineering/resources/migration-strategies.md +0 -820
- package/.claude/skills/cloud-engineering/resources/multi-cloud-strategies.md +0 -670
- package/.claude/skills/cloud-engineering/resources/oci-patterns.md +0 -1198
- package/.claude/skills/cloud-engineering/resources/serverless-patterns.md +0 -795
- package/.claude/skills/cloud-engineering/resources/well-architected-frameworks.md +0 -966
- package/.claude/skills/cybersecurity/SKILL.md +0 -409
- package/.claude/skills/cybersecurity/resources/security-architecture.md +0 -266
- package/.claude/skills/database-engineering/SKILL.md +0 -61
- package/.claude/skills/database-engineering/resources/backup-and-recovery.md +0 -72
- package/.claude/skills/database-engineering/resources/database-replication.md +0 -63
- package/.claude/skills/database-engineering/resources/postgresql-fundamentals.md +0 -70
- package/.claude/skills/database-engineering/resources/query-optimization.md +0 -68
- package/.claude/skills/devsecops/SKILL.md +0 -374
- package/.claude/skills/devsecops/resources/ci-cd-security.md +0 -204
- package/.claude/skills/devsecops/resources/compliance-automation.md +0 -530
- package/.claude/skills/devsecops/resources/compliance-frameworks.md +0 -2322
- package/.claude/skills/devsecops/resources/container-security.md +0 -915
- package/.claude/skills/devsecops/resources/cspm-integration.md +0 -1440
- package/.claude/skills/devsecops/resources/policy-enforcement.md +0 -619
- package/.claude/skills/devsecops/resources/secrets-management.md +0 -755
- package/.claude/skills/devsecops/resources/security-monitoring.md +0 -146
- package/.claude/skills/devsecops/resources/security-scanning.md +0 -887
- package/.claude/skills/devsecops/resources/security-testing.md +0 -203
- package/.claude/skills/devsecops/resources/supply-chain-security.md +0 -518
- package/.claude/skills/devsecops/resources/vulnerability-management.md +0 -481
- package/.claude/skills/devsecops/resources/zero-trust-architecture.md +0 -177
- package/.claude/skills/documentation-as-code/SKILL.md +0 -323
- package/.claude/skills/documentation-as-code/resources/api-documentation.md +0 -90
- package/.claude/skills/documentation-as-code/resources/changelog-management.md +0 -79
- package/.claude/skills/documentation-as-code/resources/diagram-generation.md +0 -44
- package/.claude/skills/documentation-as-code/resources/docs-as-code-workflow.md +0 -99
- package/.claude/skills/documentation-as-code/resources/documentation-automation.md +0 -68
- package/.claude/skills/documentation-as-code/resources/documentation-sites.md +0 -79
- package/.claude/skills/documentation-as-code/resources/markdown-best-practices.md +0 -162
- package/.claude/skills/documentation-as-code/resources/openapi-specification.md +0 -77
- package/.claude/skills/documentation-as-code/resources/readme-engineering.md +0 -60
- package/.claude/skills/documentation-as-code/resources/technical-writing-guide.md +0 -202
- package/.claude/skills/engineering-management/SKILL.md +0 -356
- package/.claude/skills/engineering-management/resources/career-ladders.md +0 -609
- package/.claude/skills/engineering-management/resources/hiring-and-assessment.md +0 -555
- package/.claude/skills/engineering-management/resources/one-on-one-guides.md +0 -609
- package/.claude/skills/engineering-management/resources/resource-planning.md +0 -557
- package/.claude/skills/engineering-management/resources/team-organization-patterns.md +0 -491
- package/.claude/skills/engineering-management/resources/technical-interviews.md +0 -474
- package/.claude/skills/engineering-operations-management/SKILL.md +0 -817
- package/.claude/skills/error-tracking/SKILL.md +0 -379
- package/.claude/skills/frontend-design/SKILL.md +0 -42
- package/.claude/skills/frontend-dev-guidelines/SKILL.md +0 -403
- package/.claude/skills/frontend-dev-guidelines/resources/common-patterns.md +0 -331
- package/.claude/skills/frontend-dev-guidelines/resources/complete-examples.md +0 -872
- package/.claude/skills/frontend-dev-guidelines/resources/component-patterns.md +0 -502
- package/.claude/skills/frontend-dev-guidelines/resources/data-fetching.md +0 -767
- package/.claude/skills/frontend-dev-guidelines/resources/file-organization.md +0 -502
- package/.claude/skills/frontend-dev-guidelines/resources/loading-and-error-states.md +0 -501
- package/.claude/skills/frontend-dev-guidelines/resources/performance.md +0 -406
- package/.claude/skills/frontend-dev-guidelines/resources/routing-guide.md +0 -364
- package/.claude/skills/frontend-dev-guidelines/resources/styling-guide.md +0 -428
- package/.claude/skills/frontend-dev-guidelines/resources/typescript-standards.md +0 -418
- package/.claude/skills/general-it-engineering/SKILL.md +0 -393
- package/.claude/skills/general-it-engineering/resources/asset-management.md +0 -712
- package/.claude/skills/general-it-engineering/resources/automation-orchestration.md +0 -817
- package/.claude/skills/general-it-engineering/resources/business-continuity.md +0 -786
- package/.claude/skills/general-it-engineering/resources/change-management.md +0 -715
- package/.claude/skills/general-it-engineering/resources/enterprise-monitoring.md +0 -729
- package/.claude/skills/general-it-engineering/resources/help-desk-operations.md +0 -738
- package/.claude/skills/general-it-engineering/resources/incident-service-management.md +0 -834
- package/.claude/skills/general-it-engineering/resources/it-governance.md +0 -753
- package/.claude/skills/general-it-engineering/resources/itil-framework.md +0 -503
- package/.claude/skills/general-it-engineering/resources/service-management.md +0 -669
- package/.claude/skills/infrastructure-architecture/SKILL.md +0 -328
- package/.claude/skills/infrastructure-architecture/resources/architecture-decision-records.md +0 -505
- package/.claude/skills/infrastructure-architecture/resources/architecture-patterns.md +0 -528
- package/.claude/skills/infrastructure-architecture/resources/capacity-planning.md +0 -453
- package/.claude/skills/infrastructure-architecture/resources/cleared-environment-architecture.md +0 -773
- package/.claude/skills/infrastructure-architecture/resources/cost-architecture.md +0 -499
- package/.claude/skills/infrastructure-architecture/resources/data-architecture.md +0 -501
- package/.claude/skills/infrastructure-architecture/resources/disaster-recovery.md +0 -535
- package/.claude/skills/infrastructure-architecture/resources/migration-architecture.md +0 -512
- package/.claude/skills/infrastructure-architecture/resources/multi-region-design.md +0 -608
- package/.claude/skills/infrastructure-architecture/resources/reference-architectures.md +0 -562
- package/.claude/skills/infrastructure-architecture/resources/security-architecture.md +0 -538
- package/.claude/skills/infrastructure-architecture/resources/system-design-principles.md +0 -489
- package/.claude/skills/infrastructure-architecture/resources/workload-classification.md +0 -1000
- package/.claude/skills/infrastructure-strategy/SKILL.md +0 -924
- package/.claude/skills/network-engineering/SKILL.md +0 -385
- package/.claude/skills/network-engineering/resources/dns-management.md +0 -738
- package/.claude/skills/network-engineering/resources/load-balancing.md +0 -820
- package/.claude/skills/network-engineering/resources/network-architecture.md +0 -546
- package/.claude/skills/network-engineering/resources/network-security.md +0 -921
- package/.claude/skills/network-engineering/resources/network-troubleshooting.md +0 -749
- package/.claude/skills/network-engineering/resources/routing-switching.md +0 -373
- package/.claude/skills/network-engineering/resources/sdn-networking.md +0 -695
- package/.claude/skills/network-engineering/resources/service-mesh-networking.md +0 -777
- package/.claude/skills/network-engineering/resources/tcp-ip-protocols.md +0 -444
- package/.claude/skills/network-engineering/resources/vpn-connectivity.md +0 -672
- package/.claude/skills/node-development/SKILL.md +0 -317
- package/.claude/skills/observability-engineering/SKILL.md +0 -101
- package/.claude/skills/observability-engineering/resources/apm-tools.md +0 -97
- package/.claude/skills/observability-engineering/resources/correlation-strategies.md +0 -87
- package/.claude/skills/observability-engineering/resources/distributed-tracing.md +0 -98
- package/.claude/skills/observability-engineering/resources/logs-aggregation.md +0 -118
- package/.claude/skills/observability-engineering/resources/observability-cost-optimization.md +0 -141
- package/.claude/skills/observability-engineering/resources/opentelemetry.md +0 -110
- package/.claude/skills/platform-engineering/SKILL.md +0 -555
- package/.claude/skills/platform-engineering/resources/architecture-overview.md +0 -600
- package/.claude/skills/platform-engineering/resources/container-orchestration.md +0 -916
- package/.claude/skills/platform-engineering/resources/cost-optimization.md +0 -634
- package/.claude/skills/platform-engineering/resources/developer-platforms.md +0 -670
- package/.claude/skills/platform-engineering/resources/gitops-automation.md +0 -650
- package/.claude/skills/platform-engineering/resources/infrastructure-as-code.md +0 -778
- package/.claude/skills/platform-engineering/resources/infrastructure-standards.md +0 -708
- package/.claude/skills/platform-engineering/resources/multi-tenancy.md +0 -602
- package/.claude/skills/platform-engineering/resources/platform-security.md +0 -711
- package/.claude/skills/platform-engineering/resources/resource-management.md +0 -592
- package/.claude/skills/platform-engineering/resources/service-mesh.md +0 -628
- package/.claude/skills/release-engineering/SKILL.md +0 -393
- package/.claude/skills/release-engineering/resources/artifact-management.md +0 -108
- package/.claude/skills/release-engineering/resources/build-optimization.md +0 -84
- package/.claude/skills/release-engineering/resources/ci-cd-pipelines.md +0 -411
- package/.claude/skills/release-engineering/resources/deployment-strategies.md +0 -197
- package/.claude/skills/release-engineering/resources/pipeline-security.md +0 -62
- package/.claude/skills/release-engineering/resources/progressive-delivery.md +0 -83
- package/.claude/skills/release-engineering/resources/release-automation.md +0 -68
- package/.claude/skills/release-engineering/resources/release-orchestration.md +0 -77
- package/.claude/skills/release-engineering/resources/rollback-strategies.md +0 -66
- package/.claude/skills/release-engineering/resources/versioning-strategies.md +0 -59
- package/.claude/skills/route-tester/SKILL.md +0 -392
- package/.claude/skills/skill-developer/ADVANCED.md +0 -197
- package/.claude/skills/skill-developer/HOOK_MECHANISMS.md +0 -306
- package/.claude/skills/skill-developer/PATTERNS_LIBRARY.md +0 -152
- package/.claude/skills/skill-developer/SKILL.md +0 -430
- package/.claude/skills/skill-developer/SKILL_RULES_REFERENCE.md +0 -315
- package/.claude/skills/skill-developer/TRIGGER_TYPES.md +0 -305
- package/.claude/skills/skill-developer/TROUBLESHOOTING.md +0 -514
- package/.claude/skills/skill-rules.json +0 -2989
- package/.claude/skills/sre/SKILL.md +0 -464
- package/.claude/skills/sre/resources/alerting-best-practices.md +0 -282
- package/.claude/skills/sre/resources/capacity-planning.md +0 -226
- package/.claude/skills/sre/resources/chaos-engineering.md +0 -193
- package/.claude/skills/sre/resources/disaster-recovery.md +0 -232
- package/.claude/skills/sre/resources/incident-management.md +0 -436
- package/.claude/skills/sre/resources/observability-stack.md +0 -240
- package/.claude/skills/sre/resources/on-call-runbooks.md +0 -167
- package/.claude/skills/sre/resources/performance-optimization.md +0 -108
- package/.claude/skills/sre/resources/reliability-patterns.md +0 -183
- package/.claude/skills/sre/resources/slo-sli-sla.md +0 -464
- package/.claude/skills/sre/resources/toil-reduction.md +0 -145
- package/.claude/skills/systems-engineering/SKILL.md +0 -648
- package/.claude/skills/systems-engineering/resources/automation-patterns.md +0 -771
- package/.claude/skills/systems-engineering/resources/configuration-management.md +0 -998
- package/.claude/skills/systems-engineering/resources/linux-administration.md +0 -672
- package/.claude/skills/systems-engineering/resources/networking-fundamentals.md +0 -982
- package/.claude/skills/systems-engineering/resources/performance-tuning.md +0 -871
- package/.claude/skills/systems-engineering/resources/powershell-scripting.md +0 -482
- package/.claude/skills/systems-engineering/resources/security-hardening.md +0 -739
- package/.claude/skills/systems-engineering/resources/shell-scripting.md +0 -915
- package/.claude/skills/systems-engineering/resources/storage-management.md +0 -628
- package/.claude/skills/systems-engineering/resources/system-monitoring.md +0 -787
- package/.claude/skills/systems-engineering/resources/troubleshooting-guide.md +0 -753
- package/.claude/skills/systems-engineering/resources/windows-administration.md +0 -738
- package/.claude/skills/technical-leadership/SKILL.md +0 -728
- package/backend/docs/SECRETS_DOCUMENTATION.md +0 -327
- package/frontend/dist/assets/index-BC-NbKXi.css +0 -32
- package/frontend/dist/assets/index-DqJXZMHY.js +0 -11266
@@ -1,608 +0,0 @@

# Multi-Region Design

Comprehensive guide to designing multi-region architectures for high availability, disaster recovery, and global performance.

## Why Multi-Region?

**Primary Drivers:**
1. **High Availability** - Survive regional outages
2. **Disaster Recovery** - Meet recovery time and recovery point objectives (RTO/RPO)
3. **Global Performance** - Reduce latency for users worldwide
4. **Compliance** - Meet data residency requirements
5. **Business Continuity** - Keep critical applications running

## Architecture Patterns

### 1. Active-Passive (Warm Standby)

**Description:**
The primary region handles all traffic. The secondary region stays on standby with replicated data.

**Diagram:**
```
┌─────────────────────┐        ┌─────────────────────┐
│ Region A (Primary)  │        │ Region B (Secondary)│
│                     │        │                     │
│  ┌─────────────┐    │        │   ┌─────────────┐   │
│  │ Application │◄───┼────────┼──►│ Application │   │
│  │  (Active)   │    │  Data  │   │  (Standby)  │   │
│  └──────┬──────┘    │  Repl  │   └─────────────┘   │
│         │           │        │                     │
│  ┌──────▼──────┐    │        │   ┌─────────────┐   │
│  │  Database   │────┼────────┼──►│  Database   │   │
│  │  (Primary)  │    │ Async  │   │  (Replica)  │   │
│  └─────────────┘    │        │   └─────────────┘   │
└─────────────────────┘        └─────────────────────┘
         100%                            0%
        Traffic                        Traffic
```

**When to Use:**
- RTO: 5-30 minutes acceptable
- Cost optimization (secondary mostly idle)
- A simple failover model is sufficient

**Implementation:**
```terraform
# Route 53 health check and failover
resource "aws_route53_health_check" "primary" {
  fqdn              = "api.primary-region.example.com"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
}

resource "aws_route53_record" "api" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"

  # Primary region (active)
  set_identifier = "primary"
  failover_routing_policy {
    type = "PRIMARY"
  }
  health_check_id = aws_route53_health_check.primary.id
  alias {
    name                   = aws_lb.primary.dns_name
    zone_id                = aws_lb.primary.zone_id
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "api_failover" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"

  # Secondary region (standby)
  set_identifier = "secondary"
  failover_routing_policy {
    type = "SECONDARY"
  }
  alias {
    name                   = aws_lb.secondary.dns_name
    zone_id                = aws_lb.secondary.zone_id
    evaluate_target_health = true
  }
}
```
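
The failover records above behave like a small state machine. The TypeScript below is a rough sketch of how the PRIMARY/SECONDARY pair resolves, assuming a single health checker (Route 53 actually aggregates results from many checker locations); `HealthModel`, `resolveFailover`, and `detectionTimeSec` are hypothetical names. Route 53 documents `failure_threshold` as the number of consecutive checks an endpoint must fail *or pass* before its status changes, so the model applies the threshold in both directions.

```typescript
// Simplified model of a Route 53 PRIMARY/SECONDARY failover record pair.
// Assumptions: one health checker and a fixed check interval; real Route 53
// aggregates results from many checker locations.

interface HealthCheckConfig {
  failureThreshold: number;   // mirrors failure_threshold above
  requestIntervalSec: number; // mirrors request_interval above
}

type FailoverTarget = "primary" | "secondary";

class HealthModel {
  private healthy = true;
  private streak = 0; // consecutive results that disagree with the current status
  private readonly config: HealthCheckConfig;

  constructor(config: HealthCheckConfig) {
    this.config = config;
  }

  // The status flips only after `failureThreshold` consecutive results that
  // disagree with it, in either direction (healthy -> unhealthy or back).
  record(ok: boolean): void {
    if (ok === this.healthy) {
      this.streak = 0;
      return;
    }
    this.streak += 1;
    if (this.streak >= this.config.failureThreshold) {
      this.healthy = ok;
      this.streak = 0;
    }
  }

  isHealthy(): boolean {
    return this.healthy;
  }
}

// Failover routing: answer with the primary while it is healthy,
// otherwise with the secondary.
function resolveFailover(primaryHealthy: boolean): FailoverTarget {
  return primaryHealthy ? "primary" : "secondary";
}

// Approximate time to detect an outage, before any DNS caching by resolvers.
function detectionTimeSec(config: HealthCheckConfig): number {
  return config.failureThreshold * config.requestIntervalSec;
}

const config: HealthCheckConfig = { failureThreshold: 3, requestIntervalSec: 30 };
const primaryHealth = new HealthModel(config);
for (const ok of [false, false, false]) primaryHealth.record(ok);

console.log(resolveFailover(primaryHealth.isHealthy())); // secondary
console.log(detectionTimeSec(config)); // 90
```

With `failure_threshold = 3` and `request_interval = 30`, detection alone takes roughly 90 seconds, before DNS caching by resolvers is taken into account.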

**Pros:**
- ✅ Lower cost (secondary mostly idle)
- ✅ Simple to implement
- ✅ Data continuously replicated to the secondary (asynchronously, so the most recent writes may be missing after failover)

**Cons:**
- ❌ Slower failover (5-30 minutes)
- ❌ Wasted capacity in secondary
- ❌ Manual steps often required (e.g., promoting the database replica), even though DNS fails over automatically

**Cost Estimate:**
- Primary: 100% of single-region cost
- Secondary: 30-50% (data replication + standby compute)
- **Total: ~130-150% of single-region**
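
The estimate is plain addition over per-region fractions. A minimal sketch that reproduces it, where each component is expressed as a fraction of one single-region deployment (1.0 = 100%); the `CostComponent` shape and the `totalCost` and `formatRange` helpers are hypothetical, and the inputs are this guide's illustrative ranges, not real prices:

```typescript
// Each component is a fraction of a single-region deployment's cost (1.0 = 100%).
interface CostComponent {
  label: string;
  low: number;
  high: number;
}

interface CostRange {
  low: number;
  high: number;
}

// Sum the low and high ends of every component independently.
function totalCost(components: CostComponent[]): CostRange {
  return components.reduce<CostRange>(
    (acc, c) => ({ low: acc.low + c.low, high: acc.high + c.high }),
    { low: 0, high: 0 },
  );
}

// Render a range as whole percentages, e.g. "130-150%".
function formatRange(range: CostRange): string {
  return `${Math.round(range.low * 100)}-${Math.round(range.high * 100)}%`;
}

const activePassive = totalCost([
  { label: "primary", low: 1.0, high: 1.0 },
  { label: "secondary (replication + standby compute)", low: 0.3, high: 0.5 },
]);
console.log(formatRange(activePassive)); // 130-150%
```

Listing two full regions plus a 0.1-0.2 data-transfer component reproduces the active-active estimate the same way.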

---

### 2. Active-Active (Hot-Hot)

**Description:**
Multiple regions serve production traffic simultaneously.

**Diagram:**
```
           ┌─────────────────┐
           │   Global Load   │
           │    Balancer     │
           │   (Route 53)    │
           └────────┬────────┘
                    │
        ┏━━━━━━━━━━━┻━━━━━━━━━━━┓
        ┃                       ┃
┌───────▼────────┐      ┌───────▼────────┐
│    Region A    │      │    Region B    │
│ ┌───────────┐  │      │  ┌───────────┐ │
│ │Application│  │      │  │Application│ │
│ │ (Active)  │  │      │  │ (Active)  │ │
│ └─────┬─────┘  │      │  └─────┬─────┘ │
│ ┌─────▼─────┐  │      │  ┌─────▼─────┐ │
│ │ Database  │◄─┼──────┼─►│ Database  │ │
│ │ (Primary) │  │Bi-dir│  │ (Primary) │ │
│ └───────────┘  │ sync │  └───────────┘ │
└────────────────┘      └────────────────┘
       50%                     50%
     Traffic                 Traffic
```

**When to Use:**
- RTO: < 5 minutes required
- Global user base
- High availability critical
- Cost is less of a concern
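
Taken together, the two "When to Use" lists imply a rough decision rule keyed on the recovery time objective. A minimal sketch, assuming RTO is the only input (in practice cost, consistency needs, and where users are located matter too); `choosePattern` is a hypothetical helper:

```typescript
type Pattern = "active-active" | "active-passive";

// Thresholds follow the "When to Use" lists in this guide:
// active-active for an RTO under 5 minutes, active-passive for 5-30 minutes.
function choosePattern(rtoMinutes: number): Pattern {
  if (!Number.isFinite(rtoMinutes) || rtoMinutes <= 0) {
    throw new RangeError("RTO must be a positive number of minutes");
  }
  if (rtoMinutes < 5) {
    return "active-active";
  }
  // An RTO looser than 30 minutes is also met by a warm standby;
  // cheaper options are not modeled in this sketch.
  return "active-passive";
}

console.log(choosePattern(2));  // active-active
console.log(choosePattern(15)); // active-passive
```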

**Implementation:**
```terraform
# DynamoDB global tables (active-active)
resource "aws_dynamodb_table" "users" {
  name         = "users"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "userId"

  attribute {
    name = "userId"
    type = "S"
  }

  # Enable global tables. The table itself is created in the provider's
  # region (assumed here to be us-east-1), so only the other regions are
  # listed as replicas.
  replica {
    region_name = "eu-west-1"
  }

  replica {
    region_name = "ap-southeast-1"
  }

  # Global table replication uses DynamoDB Streams with NEW_AND_OLD_IMAGES.
  # DynamoDB resolves concurrent writes to the same item with last-writer-wins.
  stream_enabled   = true
  stream_view_type = "NEW_AND_OLD_IMAGES"
}

# Route 53 latency-based routing
resource "aws_route53_record" "api_us" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"

  set_identifier = "us-east-1"
  latency_routing_policy {
    region = "us-east-1"
  }
  alias {
    name                   = aws_lb.us_east_1.dns_name
    zone_id                = aws_lb.us_east_1.zone_id
    evaluate_target_health = true
  }
}

resource "aws_route53_record" "api_eu" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"

  set_identifier = "eu-west-1"
  latency_routing_policy {
    region = "eu-west-1"
  }
  alias {
    name                   = aws_lb.eu_west_1.dns_name
    zone_id                = aws_lb.eu_west_1.zone_id
    evaluate_target_health = true
  }
}
```
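
Latency-based routing answers each query with the record for the region that Route 53's own measurements show as lowest-latency for the user, and with `evaluate_target_health = true` it skips records whose target is unhealthy. A simplified sketch of that selection; the `RegionEndpoint` type, `resolveLatency`, and the latency figures for a client near London are all hypothetical:

```typescript
// Hypothetical view of one client's latency to each regional endpoint.
interface RegionEndpoint {
  region: string;
  latencyMs: number; // made-up figures; Route 53 uses its own measurements
  healthy: boolean;  // outcome of evaluate_target_health / health checks
}

// Pick the lowest-latency healthy endpoint. This sketch returns undefined when
// none is healthy; see the Route 53 docs for how the real service behaves then.
function resolveLatency(endpoints: RegionEndpoint[]): string | undefined {
  let best: RegionEndpoint | undefined;
  for (const ep of endpoints) {
    if (!ep.healthy) continue;
    if (best === undefined || ep.latencyMs < best.latencyMs) {
      best = ep;
    }
  }
  return best?.region;
}

const fromLondon: RegionEndpoint[] = [
  { region: "us-east-1", latencyMs: 75, healthy: true },
  { region: "eu-west-1", latencyMs: 12, healthy: true },
];
console.log(resolveLatency(fromLondon)); // eu-west-1

// If eu-west-1 fails its health evaluation, the same client goes to us-east-1.
const euDown = fromLondon.map((ep) =>
  ep.region === "eu-west-1" ? { ...ep, healthy: false } : ep,
);
console.log(resolveLatency(euDown)); // us-east-1
```

Because the surviving region is already serving requests, shifting traffic this way needs nothing to be started or promoted.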

**Pros:**
- ✅ Fastest failover (seconds)
- ✅ Optimal user latency (latency-based routing)
- ✅ Full capacity utilization
- ✅ No wasted resources
|
|
214
|
-
|
|
215
|
-
**Cons:**
|
|
216
|
-
- ❌ Higher cost (2x infrastructure)
|
|
217
|
-
- ❌ Data consistency complexity
|
|
218
|
-
- ❌ Cross-region data transfer costs
|
|
219
|
-
- ❌ Difficult to test
|
|
220
|
-
|
|
221
|
-
**Cost Estimate:**
|
|
222
|
-
- Region A: 100% of single-region
|
|
223
|
-
- Region B: 100% of single-region
|
|
224
|
-
- Data transfer: 10-20% additional
|
|
225
|
-
- **Total: ~210-220% of single-region**

---

### 3. Read Replicas (Global Read)

**Description:**
Primary region for writes, multiple regions for reads.

**Diagram:**
```
            Writes (10%)
                 │
         ┌───────▼────────┐
         │ Primary Region │
         │  (us-east-1)   │
         │  ┌──────────┐  │
         │  │ Database │  │
         │  │ (Primary)│  │
         │  └────┬─────┘  │
         └───────┼────────┘
                 │ Replication
        ┌────────┴─────────┐
        │                  │
┌───────▼────────┐ ┌───────▼────────┐
│  Read Replica  │ │  Read Replica  │
│  (eu-west-1)   │ │(ap-southeast-1)│
│                │ │                │
│  Reads (45%)   │ │  Reads (45%)   │
└────────────────┘ └────────────────┘
```

**When to Use:**
- Read-heavy workloads (90%+ reads)
- Global user base
- Eventual consistency acceptable
- Cost optimization

**Implementation:**
```terraform
# Assumes provider aliases aws.us_east_1, aws.eu_west_1, and aws.ap_southeast_1
# are configured elsewhere; storage, credentials, and networking are omitted.

# Primary database
resource "aws_db_instance" "primary" {
  provider   = aws.us_east_1
  identifier = "users-primary"

  engine         = "postgres"
  instance_class = "db.r5.xlarge"

  backup_retention_period = 7 # Automated backups must be on to create replicas
  multi_az                = true
}

# Read replica in Europe
resource "aws_db_instance" "replica_eu" {
  provider   = aws.eu_west_1
  identifier = "users-replica-eu"

  # A cross-region replica must reference its source by ARN
  replicate_source_db = aws_db_instance.primary.arn
  instance_class      = "db.r5.large" # Can be smaller (size it up if it may be promoted)

  # No backups needed while it is a replica; enable them after promotion
  backup_retention_period = 0
}

# Read replica in Asia
resource "aws_db_instance" "replica_asia" {
  provider   = aws.ap_southeast_1
  identifier = "users-replica-asia"

  replicate_source_db = aws_db_instance.primary.arn
  instance_class      = "db.r5.large"

  backup_retention_period = 0
}
```

**Application Code:**
```typescript
import { PrismaClient } from '@prisma/client';
import { readReplicas } from '@prisma/extension-read-replicas';

// Smart connection routing: one write endpoint, several read endpoints
const connectionConfig = {
  write: process.env.DATABASE_PRIMARY_URL!,
  read: [
    process.env.DATABASE_REPLICA_EU_URL!,
    process.env.DATABASE_REPLICA_ASIA_URL!,
  ],
};

// Prisma read-replicas extension: writes and transactions use the primary,
// other reads use one of the listed replicas. It does not pick by proximity;
// to read from the nearest replica, give each regional deployment only its
// local replica URL.
const prisma = new PrismaClient({
  datasources: {
    db: {
      url: connectionConfig.write,
    },
  },
}).$extends(readReplicas({ url: connectionConfig.read }));

// Explicit read from a replica
const users = await prisma.$replica().$queryRaw`
  SELECT * FROM users WHERE region = 'EU'
`;

// Writes always go to the primary
await prisma.user.create({
  data: { name: 'John' },
});
```

**Pros:**
- ✅ Lower cost than active-active
- ✅ Improved read performance globally
- ✅ Reduced load on primary

**Cons:**
- ❌ Replication lag (eventual consistency)
- ❌ Writes still centralized
- ❌ Application must handle read-after-write consistency
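
One common way to handle read-after-write is to pin a user's reads to the primary for a short window after that user writes. The sketch below is a minimal, single-process illustration; `ReadAfterWriteRouter` and the 5-second window are hypothetical and should be sized to your observed replication lag.

```typescript
// Hypothetical router for read-your-writes consistency: after a user writes,
// their reads go to the primary until replicas have had time to catch up.
type ReadTarget = 'primary' | 'replica';

class ReadAfterWriteRouter {
  private readonly pinWindowMs: number;
  // userId -> timestamp (ms) of that user's most recent write
  private readonly lastWriteAt = new Map<string, number>();

  constructor(pinWindowMs: number) {
    this.pinWindowMs = pinWindowMs;
  }

  // Call after every successful write made on behalf of userId
  recordWrite(userId: string, now: number = Date.now()): void {
    this.lastWriteAt.set(userId, now);
  }

  // Decide where this user's next read should go
  routeRead(userId: string, now: number = Date.now()): ReadTarget {
    const last = this.lastWriteAt.get(userId);
    if (last !== undefined && now - last < this.pinWindowMs) {
      return 'primary';
    }
    return 'replica';
  }
}

// Assumes replica lag normally stays under 5 seconds
const router = new ReadAfterWriteRouter(5_000);
router.recordWrite('user-1', 1_000);
console.log(router.routeRead('user-1', 3_000)); // primary (2 s after the write)
console.log(router.routeRead('user-1', 7_000)); // replica (6 s after the write)
console.log(router.routeRead('user-2', 3_000)); // replica (no recent write)
```

Because the map lives in one process, a fleet of API servers would need a shared store (or a cookie carrying the last-write time) instead, and old entries should be evicted so the map does not grow without bound.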

**Cost Estimate:**
- Primary: 100%
- Replica 1: 50% (smaller instance)
- Replica 2: 50%
- **Total: ~200% of single-region**

---

## Data Replication Strategies

### Synchronous Replication
```
Client → Primary DB → Secondary DB → Client
         (wait)       (wait)         (response)
```
- ✅ Strong consistency
- ❌ Higher latency
- ❌ Availability impact if secondary fails

### Asynchronous Replication
```
Client → Primary DB → Client
         (no wait)    (response)
             ↓
         Secondary DB
         (background)
```
- ✅ Lower latency
- ✅ High availability
- ❌ Potential data loss
- ❌ Eventual consistency

### Comparison

| Aspect | Synchronous | Asynchronous |
|--------|-------------|--------------|
| Consistency | Strong | Eventual |
| Latency | High (+50-200ms) | Low (no replication wait) |
| Data Loss Risk | None | Seconds to minutes |
| Availability | Lower | Higher |
| Use Case | Financial transactions | Most applications |
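
To make the acknowledgment difference concrete, here is a toy in-memory model, not how any particular database implements replication. `ReplicatedStore` acknowledges a write either after the secondary has it (sync) or immediately (async), and `failover()` returns what survives if the primary dies before the next replication cycle.

```typescript
// Toy model of when a write is acknowledged and what a primary crash loses
type Mode = 'sync' | 'async';

class ReplicatedStore {
  readonly mode: Mode;
  primary: string[] = [];
  secondary: string[] = [];
  private backlog: string[] = []; // async writes not yet applied on the secondary

  constructor(mode: Mode) {
    this.mode = mode;
  }

  // Returns once the write is acknowledged to the client
  write(value: string): void {
    this.primary.push(value);
    if (this.mode === 'sync') {
      this.secondary.push(value); // apply on the secondary before acknowledging
    } else {
      this.backlog.push(value); // acknowledge now, replicate later
    }
  }

  // Background replication cycle (only async mode has anything queued)
  replicate(): void {
    this.secondary.push(...this.backlog);
    this.backlog = [];
  }

  // Primary fails: only what the secondary holds survives
  failover(): string[] {
    this.backlog = [];
    return [...this.secondary];
  }
}

for (const mode of ['sync', 'async'] as const) {
  const store = new ReplicatedStore(mode);
  store.write('a');
  store.replicate();
  store.write('b'); // acknowledged to the client...
  // ...but the primary crashes before the next replication cycle
  console.log(mode, store.failover());
}
// sync [ 'a', 'b' ]
// async [ 'a' ]
```

The async store lost `'b'` even though the client was told it succeeded; that window is the "Seconds to minutes" data-loss risk in the table.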

---

## Handling Cross-Region Challenges

### 1. Network Latency

**Problem:** Cross-region round-trip latency is typically 50-300ms

**Solutions:**
- Use a CDN for static assets
- Cache aggressively at the edge
- Use asynchronous replication
- Partition data by region

```typescript
// Regional data partitioning: each user's data lives in one "home" region.
// getUserRegion and the per-region clients (usEastDB, euWestDB, apSouthDB)
// are assumed to be defined elsewhere.
const userRegion = getUserRegion(userId);

// Route to the user's home region (normally the one nearest to them)
const regionalDB = {
  US: usEastDB,
  EU: euWestDB,
  ASIA: apSouthDB,
}[userRegion];

if (!regionalDB) {
  throw new Error(`No database configured for region ${userRegion}`);
}

const user = await regionalDB.users.findUnique({
  where: { id: userId },
});
```

### 2. Data Consistency

**Problem:** Data is distributed across regions, so concurrent writes can conflict

**Solutions:**
- Last-writer-wins (simple, but silently discards the losing write)
- Vector clocks (detect concurrent writes precisely; resolution is still needed)
- CRDTs (Conflict-Free Replicated Data Types)
- Application-level conflict resolution

```typescript
// Last-writer-wins with timestamp
interface User {
  id: string;
  name: string;
  updatedAt: Date; // Used for conflict resolution
  region: string;  // Origin region; also used as a deterministic tie-breaker
}

function resolveConflict(local: User, remote: User): User {
  const diff = local.updatedAt.getTime() - remote.updatedAt.getTime();
  if (diff !== 0) {
    // Simple: most recent write wins
    return diff > 0 ? local : remote;
  }
  // Equal timestamps: break ties by origin region so both sides agree on the winner
  return local.region > remote.region ? local : remote;
}
```
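
To show what vector clocks add over a single timestamp, here is a minimal sketch with one counter per region; `tick`, `merge`, and `compare` are illustrative names, not a library API. It only detects concurrent updates; choosing a winner or merging them is still up to the application.

```typescript
// One counter per region that has written the record
type VectorClock = Record<string, number>;
type Ordering = 'before' | 'after' | 'equal' | 'concurrent';

// Advance this region's counter before stamping a local write
function tick(clock: VectorClock, region: string): VectorClock {
  return { ...clock, [region]: (clock[region] ?? 0) + 1 };
}

// Combine clocks when applying a remote update: element-wise maximum
function merge(a: VectorClock, b: VectorClock): VectorClock {
  const out: VectorClock = { ...a };
  for (const region of Object.keys(b)) {
    out[region] = Math.max(out[region] ?? 0, b[region]);
  }
  return out;
}

// 'concurrent' means neither write had seen the other: a genuine conflict
function compare(a: VectorClock, b: VectorClock): Ordering {
  let aAhead = false;
  let bAhead = false;
  for (const region of Object.keys(a).concat(Object.keys(b))) {
    const x = a[region] ?? 0;
    const y = b[region] ?? 0;
    if (x > y) aAhead = true;
    if (y > x) bAhead = true;
  }
  if (aAhead && bAhead) return 'concurrent';
  if (aAhead) return 'after';
  if (bAhead) return 'before';
  return 'equal';
}

// Both regions edit the same record after seeing the same base version
const base = tick({}, 'us-east-1');     // { 'us-east-1': 1 }
const usEdit = tick(base, 'us-east-1'); // { 'us-east-1': 2 }
const euEdit = tick(base, 'eu-west-1'); // { 'us-east-1': 1, 'eu-west-1': 1 }

console.log(compare(usEdit, base));                  // after
console.log(compare(usEdit, euEdit));                // concurrent
console.log(compare(merge(usEdit, euEdit), euEdit)); // after
```

Where last-writer-wins would silently keep whichever edit has the later timestamp, the vector clock reports the two edits as concurrent so the application can decide how to resolve them.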

### 3. Data Transfer Costs

**AWS Inter-Region Data Transfer:**
- us-east-1 → us-west-2: $0.02/GB
- us-east-1 → eu-west-1: $0.02/GB
- us-east-1 → ap-south-1: $0.08/GB

**Cost Optimization:**
```terraform
# Inter-region VPC peering keeps replication traffic on the AWS network
# instead of sending it through NAT gateways and public endpoints.
# Peered traffic is still billed at inter-region data transfer rates.
resource "aws_vpc_peering_connection" "us_to_eu" {
  vpc_id      = aws_vpc.us_east.id
  peer_vpc_id = aws_vpc.eu_west.id
  peer_region = "eu-west-1"

  # Cross-region peering must be accepted in the peer region
  # (e.g. with an aws_vpc_peering_connection_accepter resource)
}

# Compress data before transfer
resource "aws_lambda_function" "replicate_with_compression" {
  function_name = "replicate-compressed"
  # role, runtime, handler, and deployment package omitted

  environment {
    variables = {
      COMPRESSION = "gzip" # Applied by the function code; text/JSON often shrinks 60-80%
    }
  }
}
```
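
A rough way to reason about these numbers is to multiply replicated volume by the per-GB rate, after applying whatever reduction compression achieves. The sketch below assumes a 30-day month, the $0.02/GB rate listed above, and a 70% compression saving; `estimateTransferCost` is a hypothetical helper, and the saving should be replaced with a ratio measured on your own payloads.

```typescript
// Rough monthly estimate of inter-region replication transfer cost
interface TransferEstimate {
  rawGb: number;          // data produced for replication
  billedGb: number;       // data actually sent after compression
  monthlyCostUsd: number;
}

function estimateTransferCost(
  gbPerDay: number,
  ratePerGb: number,
  compressionSaving = 0, // e.g. 0.7 means payloads are 70% smaller after gzip
  days = 30,
): TransferEstimate {
  if (compressionSaving < 0 || compressionSaving >= 1) {
    throw new RangeError('compressionSaving must be in [0, 1)');
  }
  const rawGb = gbPerDay * days;
  const billedGb = rawGb * (1 - compressionSaving);
  return { rawGb, billedGb, monthlyCostUsd: billedGb * ratePerGb };
}

// 500 GB/day replicated us-east-1 -> eu-west-1 at $0.02/GB
const plain = estimateTransferCost(500, 0.02);
const gzipped = estimateTransferCost(500, 0.02, 0.7);
console.log(plain.monthlyCostUsd.toFixed(2));   // 300.00
console.log(gzipped.monthlyCostUsd.toFixed(2)); // 90.00
```

Compression only pays off if the CPU time spent compressing and decompressing costs less than the transfer it saves, which is usually the case for text-heavy replication streams.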

---

## Failover Procedures

### Automated Failover (Active-Passive)

```bash
#!/bin/bash
# Automated failover script
set -euo pipefail

PRIMARY_HEALTH_URL="https://api.us-east-1.example.com/health"

# 1. Detect primary region failure: require 3 consecutive failed checks
#    so a single dropped request does not trigger a failover
FAILURES=0
for _ in 1 2 3; do
  if curl -fsS --max-time 5 "$PRIMARY_HEALTH_URL" > /dev/null; then
    break
  fi
  FAILURES=$((FAILURES + 1))
  sleep 10
done

if [ "$FAILURES" -eq 3 ]; then
  echo "Primary region unhealthy. Initiating failover..."

  # 2. Promote secondary database to primary, then wait until it is available
  aws rds promote-read-replica \
    --db-instance-identifier users-replica-eu \
    --region eu-west-1
  aws rds wait db-instance-available \
    --db-instance-identifier users-replica-eu \
    --region eu-west-1

  # 3. Update DNS to point to secondary
  aws route53 change-resource-record-sets \
    --hosted-zone-id Z1234567890ABC \
    --change-batch '{
      "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
          "Name": "api.example.com",
          "Type": "A",
          "SetIdentifier": "failover",
          "Failover": "PRIMARY",
          "AliasTarget": {
            "HostedZoneId": "Z0987654321XYZ",
            "DNSName": "api.eu-west-1.example.com",
            "EvaluateTargetHealth": true
          }
        }
      }]
    }'

  # 4. Notify team via an SNS topic outside the failed region
  aws sns publish \
    --region eu-west-1 \
    --topic-arn arn:aws:sns:eu-west-1:123456789012:failover-alerts \
    --message "Failover to eu-west-1 completed"

  echo "Failover complete. Traffic now routing to eu-west-1."
fi
```
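
Failover should not hinge on a single failed request. Below is a minimal TypeScript sketch of a consecutive-failure detector that a scheduler could run alongside the health check; `isRegionDown`, the threshold, and the interval are assumptions, and the check is injected so the logic can be tested without a network. The example check uses the global `fetch` and `AbortSignal.timeout` available in Node 18+.

```typescript
// Hypothetical failure detector: reports the region as down only after
// `threshold` consecutive failed health checks
type HealthCheck = () => Promise<boolean>;

async function isRegionDown(
  check: HealthCheck,
  threshold: number,
  intervalMs: number,
): Promise<boolean> {
  for (let i = 0; i < threshold; i++) {
    if (await check()) {
      return false; // any healthy response means the region is up
    }
    if (i < threshold - 1) {
      await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
  }
  return true;
}

// Example check against the primary's health endpoint with a 5-second timeout
const checkPrimary: HealthCheck = async () => {
  try {
    const res = await fetch('https://api.us-east-1.example.com/health', {
      signal: AbortSignal.timeout(5_000),
    });
    return res.ok;
  } catch {
    return false; // network errors and timeouts count as failures
  }
};

// Usage: if (await isRegionDown(checkPrimary, 3, 10_000)) { /* start failover */ }
```

Running the detector from a third location, outside both regions, avoids mistaking a network problem near the checker for a regional outage.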

### Manual Failover Runbook

````markdown
# Multi-Region Failover Runbook

## Pre-Failover Checklist
- [ ] Confirm primary region is truly down (not a false alarm)
- [ ] Verify secondary region is healthy
- [ ] Check replication lag < 5 minutes
- [ ] Notify stakeholders (Slack #incidents)
- [ ] Document start time

## Failover Steps (~40 minutes)

### Step 1: Promote Secondary Database (5 min)
```bash
aws rds promote-read-replica \
  --db-instance-identifier users-replica-eu \
  --region eu-west-1
```

### Step 2: Update Application Config (5 min)
```bash
kubectl set env deployment/api \
  DATABASE_URL=postgresql://eu-west-1.rds.amazonaws.com/users
```

### Step 3: Update DNS (10 min propagation)
```bash
# Update Route 53
aws route53 change-resource-record-sets ...
```

### Step 4: Verify Traffic (5 min)
- Check CloudWatch metrics
- Test critical user flows
- Monitor error rates

### Step 5: Monitor (15 min)
- Watch dashboards for 15 minutes
- Confirm no spike in errors
- Verify latency is acceptable

## Post-Failover
- [ ] Update status page
- [ ] Monitor for 4 hours
- [ ] Plan failback once primary is restored
- [ ] Schedule post-mortem
````

---

## Testing Multi-Region Failover

### Chaos Engineering

```yaml
# Chaos Mesh experiment: Simulate region failure
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos
metadata:
  name: region-failure-simulation
spec:
  action: partition
  mode: all
  selector:
    namespaces:
      - production
    labelSelectors:
      'region': 'us-east-1'
  direction: both
  duration: '30m'
```

### GameDays

**Quarterly failover testing:**
1. Schedule 2-hour window
2. Announce to team
3. Execute failover procedure
4. Measure RTO/RPO
5. Document learnings
6. Update runbooks
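
Measuring RTO/RPO is simple subtraction once the GameDay timeline is recorded. The sketch below assumes the timeline captures when the failure was injected, when critical flows passed again in the secondary region, and the timestamp of the newest write present in the promoted database; the field names and the timestamps in the example are illustrative.

```typescript
// Illustrative GameDay timeline (hypothetical field names)
interface GameDayTimeline {
  failureInjectedAt: Date;     // primary region taken down
  serviceRestoredAt: Date;     // critical flows passing in the secondary
  lastReplicatedWriteAt: Date; // newest write present in the promoted database
}

interface RecoveryMetrics {
  rtoSeconds: number; // how long the service was unavailable
  rpoSeconds: number; // upper bound on the window of lost writes
}

function measureRecovery(t: GameDayTimeline): RecoveryMetrics {
  const seconds = (ms: number) => Math.max(0, ms / 1000);
  return {
    rtoSeconds: seconds(t.serviceRestoredAt.getTime() - t.failureInjectedAt.getTime()),
    rpoSeconds: seconds(t.failureInjectedAt.getTime() - t.lastReplicatedWriteAt.getTime()),
  };
}

const metrics = measureRecovery({
  failureInjectedAt: new Date('2024-01-10T14:00:00Z'),
  serviceRestoredAt: new Date('2024-01-10T14:27:30Z'),
  lastReplicatedWriteAt: new Date('2024-01-10T13:59:48Z'),
});
console.log(metrics); // { rtoSeconds: 1650, rpoSeconds: 12 }
```

The RPO figure is an upper bound: only writes the primary accepted after `lastReplicatedWriteAt` were actually lost, so comparing against the primary's last committed write gives a tighter number when that log survives.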

---

**Related Resources:**
- disaster-recovery.md - RTO/RPO planning and backup strategies
- capacity-planning.md - Sizing multi-region infrastructure
- cost-architecture.md - Multi-region cost optimization