blockmine 1.20.0 → 1.22.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude/agents/README.md +469 -0
- package/.claude/agents/auth-route-debugger.md +118 -0
- package/.claude/agents/auth-route-tester.md +93 -0
- package/.claude/agents/auto-error-resolver.md +97 -0
- package/.claude/agents/build-optimizer.md +236 -0
- package/.claude/agents/code-architecture-reviewer.md +83 -0
- package/.claude/agents/code-refactor-master.md +94 -0
- package/.claude/agents/cost-optimizer.md +134 -0
- package/.claude/agents/deployment-orchestrator.md +113 -0
- package/.claude/agents/documentation-architect.md +82 -0
- package/.claude/agents/frontend-error-fixer.md +77 -0
- package/.claude/agents/iac-code-generator.md +71 -0
- package/.claude/agents/incident-responder.md +346 -0
- package/.claude/agents/infrastructure-architect.md +31 -0
- package/.claude/agents/kubernetes-specialist.md +56 -0
- package/.claude/agents/migration-planner.md +181 -0
- package/.claude/agents/network-architect.md +196 -0
- package/.claude/agents/plan-reviewer.md +52 -0
- package/.claude/agents/refactor-planner.md +63 -0
- package/.claude/agents/security-scanner.md +102 -0
- package/.claude/agents/web-research-specialist.md +78 -0
- package/.claude/commands/cost-analysis.md +315 -0
- package/.claude/commands/dev-docs-update.md +55 -0
- package/.claude/commands/dev-docs.md +51 -0
- package/.claude/commands/incident-debug.md +247 -0
- package/.claude/commands/infra-plan.md +81 -0
- package/.claude/commands/migration-plan.md +478 -0
- package/.claude/commands/route-research-for-testing.md +37 -0
- package/.claude/commands/security-review.md +66 -0
- package/.claude/hooks/CONFIG.md +448 -0
- package/.claude/hooks/README.md +163 -0
- package/.claude/hooks/SKILL_ACTIVATION_COMPLETE.md +226 -0
- package/.claude/hooks/WINDOWS_HOOKS_README.md +151 -0
- package/.claude/hooks/add-skill-activation-banners.ts +132 -0
- package/.claude/hooks/comprehensive-skill-test.ts +1315 -0
- package/.claude/hooks/error-handling-reminder.sh +12 -0
- package/.claude/hooks/error-handling-reminder.ts +222 -0
- package/.claude/hooks/k8s-manifest-validator.sh +56 -0
- package/.claude/hooks/package-lock.json +556 -0
- package/.claude/hooks/package.json +16 -0
- package/.claude/hooks/post-tool-use-tracker.ps1 +174 -0
- package/.claude/hooks/post-tool-use-tracker.sh +183 -0
- package/.claude/hooks/security-policy-check.sh +247 -0
- package/.claude/hooks/skill-activation-prompt.ps1 +10 -0
- package/.claude/hooks/skill-activation-prompt.sh +10 -0
- package/.claude/hooks/skill-activation-prompt.ts +141 -0
- package/.claude/hooks/stop-build-check-enhanced.sh +130 -0
- package/.claude/hooks/terraform-validator.sh +53 -0
- package/.claude/hooks/test-input.json +7 -0
- package/.claude/hooks/test-skill-activation.ts +427 -0
- package/.claude/hooks/trigger-build-resolver.sh +79 -0
- package/.claude/hooks/tsc-check.sh +173 -0
- package/.claude/hooks/tsconfig.json +19 -0
- package/.claude/settings.json +55 -0
- package/.claude/settings.local.json +28 -3
- package/.claude/skills/README.md +507 -0
- package/.claude/skills/api-engineering/SKILL.md +63 -0
- package/.claude/skills/api-engineering/resources/api-versioning.md +88 -0
- package/.claude/skills/api-engineering/resources/graphql-patterns.md +106 -0
- package/.claude/skills/api-engineering/resources/rate-limiting.md +118 -0
- package/.claude/skills/api-engineering/resources/rest-api-design.md +105 -0
- package/.claude/skills/backend-dev-guidelines/SKILL.md +306 -0
- package/.claude/skills/backend-dev-guidelines/resources/architecture-overview.md +451 -0
- package/.claude/skills/backend-dev-guidelines/resources/async-and-errors.md +307 -0
- package/.claude/skills/backend-dev-guidelines/resources/complete-examples.md +638 -0
- package/.claude/skills/backend-dev-guidelines/resources/configuration.md +275 -0
- package/.claude/skills/backend-dev-guidelines/resources/database-patterns.md +224 -0
- package/.claude/skills/backend-dev-guidelines/resources/middleware-guide.md +213 -0
- package/.claude/skills/backend-dev-guidelines/resources/routing-and-controllers.md +756 -0
- package/.claude/skills/backend-dev-guidelines/resources/sentry-and-monitoring.md +336 -0
- package/.claude/skills/backend-dev-guidelines/resources/services-and-repositories.md +789 -0
- package/.claude/skills/backend-dev-guidelines/resources/testing-guide.md +235 -0
- package/.claude/skills/backend-dev-guidelines/resources/validation-patterns.md +754 -0
- package/.claude/skills/budget-and-cost-management/SKILL.md +850 -0
- package/.claude/skills/build-engineering/SKILL.md +431 -0
- package/.claude/skills/build-engineering/resources/artifact-repositories.md +72 -0
- package/.claude/skills/build-engineering/resources/build-caching.md +96 -0
- package/.claude/skills/build-engineering/resources/build-pipelines.md +105 -0
- package/.claude/skills/build-engineering/resources/build-security.md +95 -0
- package/.claude/skills/build-engineering/resources/build-systems.md +389 -0
- package/.claude/skills/build-engineering/resources/compilation-optimization.md +201 -0
- package/.claude/skills/build-engineering/resources/dependency-management.md +73 -0
- package/.claude/skills/build-engineering/resources/monorepo-builds.md +110 -0
- package/.claude/skills/build-engineering/resources/performance-optimization.md +113 -0
- package/.claude/skills/build-engineering/resources/reproducible-builds.md +82 -0
- package/.claude/skills/cloud-engineering/SKILL.md +675 -0
- package/.claude/skills/cloud-engineering/resources/aws-patterns.md +742 -0
- package/.claude/skills/cloud-engineering/resources/azure-patterns.md +714 -0
- package/.claude/skills/cloud-engineering/resources/cleared-cloud-environments.md +987 -0
- package/.claude/skills/cloud-engineering/resources/cloud-cost-optimization.md +757 -0
- package/.claude/skills/cloud-engineering/resources/cloud-networking.md +1058 -0
- package/.claude/skills/cloud-engineering/resources/cloud-security-tools.md +1530 -0
- package/.claude/skills/cloud-engineering/resources/cloud-security.md +990 -0
- package/.claude/skills/cloud-engineering/resources/gcp-patterns.md +758 -0
- package/.claude/skills/cloud-engineering/resources/migration-strategies.md +820 -0
- package/.claude/skills/cloud-engineering/resources/multi-cloud-strategies.md +670 -0
- package/.claude/skills/cloud-engineering/resources/oci-patterns.md +1198 -0
- package/.claude/skills/cloud-engineering/resources/serverless-patterns.md +795 -0
- package/.claude/skills/cloud-engineering/resources/well-architected-frameworks.md +966 -0
- package/.claude/skills/cybersecurity/SKILL.md +409 -0
- package/.claude/skills/cybersecurity/resources/security-architecture.md +266 -0
- package/.claude/skills/database-engineering/SKILL.md +61 -0
- package/.claude/skills/database-engineering/resources/backup-and-recovery.md +72 -0
- package/.claude/skills/database-engineering/resources/database-replication.md +63 -0
- package/.claude/skills/database-engineering/resources/postgresql-fundamentals.md +70 -0
- package/.claude/skills/database-engineering/resources/query-optimization.md +68 -0
- package/.claude/skills/devsecops/SKILL.md +374 -0
- package/.claude/skills/devsecops/resources/ci-cd-security.md +204 -0
- package/.claude/skills/devsecops/resources/compliance-automation.md +530 -0
- package/.claude/skills/devsecops/resources/compliance-frameworks.md +2322 -0
- package/.claude/skills/devsecops/resources/container-security.md +915 -0
- package/.claude/skills/devsecops/resources/cspm-integration.md +1440 -0
- package/.claude/skills/devsecops/resources/policy-enforcement.md +619 -0
- package/.claude/skills/devsecops/resources/secrets-management.md +755 -0
- package/.claude/skills/devsecops/resources/security-monitoring.md +146 -0
- package/.claude/skills/devsecops/resources/security-scanning.md +887 -0
- package/.claude/skills/devsecops/resources/security-testing.md +203 -0
- package/.claude/skills/devsecops/resources/supply-chain-security.md +518 -0
- package/.claude/skills/devsecops/resources/vulnerability-management.md +481 -0
- package/.claude/skills/devsecops/resources/zero-trust-architecture.md +177 -0
- package/.claude/skills/documentation-as-code/SKILL.md +323 -0
- package/.claude/skills/documentation-as-code/resources/api-documentation.md +90 -0
- package/.claude/skills/documentation-as-code/resources/changelog-management.md +79 -0
- package/.claude/skills/documentation-as-code/resources/diagram-generation.md +44 -0
- package/.claude/skills/documentation-as-code/resources/docs-as-code-workflow.md +99 -0
- package/.claude/skills/documentation-as-code/resources/documentation-automation.md +68 -0
- package/.claude/skills/documentation-as-code/resources/documentation-sites.md +79 -0
- package/.claude/skills/documentation-as-code/resources/markdown-best-practices.md +162 -0
- package/.claude/skills/documentation-as-code/resources/openapi-specification.md +77 -0
- package/.claude/skills/documentation-as-code/resources/readme-engineering.md +60 -0
- package/.claude/skills/documentation-as-code/resources/technical-writing-guide.md +202 -0
- package/.claude/skills/engineering-management/SKILL.md +356 -0
- package/.claude/skills/engineering-management/resources/career-ladders.md +609 -0
- package/.claude/skills/engineering-management/resources/hiring-and-assessment.md +555 -0
- package/.claude/skills/engineering-management/resources/one-on-one-guides.md +609 -0
- package/.claude/skills/engineering-management/resources/resource-planning.md +557 -0
- package/.claude/skills/engineering-management/resources/team-organization-patterns.md +491 -0
- package/.claude/skills/engineering-management/resources/technical-interviews.md +474 -0
- package/.claude/skills/engineering-operations-management/SKILL.md +817 -0
- package/.claude/skills/error-tracking/SKILL.md +379 -0
- package/.claude/skills/frontend-dev-guidelines/SKILL.md +403 -0
- package/.claude/skills/frontend-dev-guidelines/resources/common-patterns.md +331 -0
- package/.claude/skills/frontend-dev-guidelines/resources/complete-examples.md +872 -0
- package/.claude/skills/frontend-dev-guidelines/resources/component-patterns.md +502 -0
- package/.claude/skills/frontend-dev-guidelines/resources/data-fetching.md +767 -0
- package/.claude/skills/frontend-dev-guidelines/resources/file-organization.md +502 -0
- package/.claude/skills/frontend-dev-guidelines/resources/loading-and-error-states.md +501 -0
- package/.claude/skills/frontend-dev-guidelines/resources/performance.md +406 -0
- package/.claude/skills/frontend-dev-guidelines/resources/routing-guide.md +364 -0
- package/.claude/skills/frontend-dev-guidelines/resources/styling-guide.md +428 -0
- package/.claude/skills/frontend-dev-guidelines/resources/typescript-standards.md +418 -0
- package/.claude/skills/general-it-engineering/SKILL.md +393 -0
- package/.claude/skills/general-it-engineering/resources/asset-management.md +712 -0
- package/.claude/skills/general-it-engineering/resources/automation-orchestration.md +817 -0
- package/.claude/skills/general-it-engineering/resources/business-continuity.md +786 -0
- package/.claude/skills/general-it-engineering/resources/change-management.md +715 -0
- package/.claude/skills/general-it-engineering/resources/enterprise-monitoring.md +729 -0
- package/.claude/skills/general-it-engineering/resources/help-desk-operations.md +738 -0
- package/.claude/skills/general-it-engineering/resources/incident-service-management.md +834 -0
- package/.claude/skills/general-it-engineering/resources/it-governance.md +753 -0
- package/.claude/skills/general-it-engineering/resources/itil-framework.md +503 -0
- package/.claude/skills/general-it-engineering/resources/service-management.md +669 -0
- package/.claude/skills/infrastructure-architecture/SKILL.md +328 -0
- package/.claude/skills/infrastructure-architecture/resources/architecture-decision-records.md +505 -0
- package/.claude/skills/infrastructure-architecture/resources/architecture-patterns.md +528 -0
- package/.claude/skills/infrastructure-architecture/resources/capacity-planning.md +453 -0
- package/.claude/skills/infrastructure-architecture/resources/cleared-environment-architecture.md +773 -0
- package/.claude/skills/infrastructure-architecture/resources/cost-architecture.md +499 -0
- package/.claude/skills/infrastructure-architecture/resources/data-architecture.md +501 -0
- package/.claude/skills/infrastructure-architecture/resources/disaster-recovery.md +535 -0
- package/.claude/skills/infrastructure-architecture/resources/migration-architecture.md +512 -0
- package/.claude/skills/infrastructure-architecture/resources/multi-region-design.md +608 -0
- package/.claude/skills/infrastructure-architecture/resources/reference-architectures.md +562 -0
- package/.claude/skills/infrastructure-architecture/resources/security-architecture.md +538 -0
- package/.claude/skills/infrastructure-architecture/resources/system-design-principles.md +489 -0
- package/.claude/skills/infrastructure-architecture/resources/workload-classification.md +1000 -0
- package/.claude/skills/infrastructure-strategy/SKILL.md +924 -0
- package/.claude/skills/network-engineering/SKILL.md +385 -0
- package/.claude/skills/network-engineering/resources/dns-management.md +738 -0
- package/.claude/skills/network-engineering/resources/load-balancing.md +820 -0
- package/.claude/skills/network-engineering/resources/network-architecture.md +546 -0
- package/.claude/skills/network-engineering/resources/network-security.md +921 -0
- package/.claude/skills/network-engineering/resources/network-troubleshooting.md +749 -0
- package/.claude/skills/network-engineering/resources/routing-switching.md +373 -0
- package/.claude/skills/network-engineering/resources/sdn-networking.md +695 -0
- package/.claude/skills/network-engineering/resources/service-mesh-networking.md +777 -0
- package/.claude/skills/network-engineering/resources/tcp-ip-protocols.md +444 -0
- package/.claude/skills/network-engineering/resources/vpn-connectivity.md +672 -0
- package/.claude/skills/observability-engineering/SKILL.md +101 -0
- package/.claude/skills/observability-engineering/resources/apm-tools.md +97 -0
- package/.claude/skills/observability-engineering/resources/correlation-strategies.md +87 -0
- package/.claude/skills/observability-engineering/resources/distributed-tracing.md +98 -0
- package/.claude/skills/observability-engineering/resources/logs-aggregation.md +118 -0
- package/.claude/skills/observability-engineering/resources/observability-cost-optimization.md +141 -0
- package/.claude/skills/observability-engineering/resources/opentelemetry.md +110 -0
- package/.claude/skills/platform-engineering/SKILL.md +555 -0
- package/.claude/skills/platform-engineering/resources/architecture-overview.md +600 -0
- package/.claude/skills/platform-engineering/resources/container-orchestration.md +916 -0
- package/.claude/skills/platform-engineering/resources/cost-optimization.md +634 -0
- package/.claude/skills/platform-engineering/resources/developer-platforms.md +670 -0
- package/.claude/skills/platform-engineering/resources/gitops-automation.md +650 -0
- package/.claude/skills/platform-engineering/resources/infrastructure-as-code.md +778 -0
- package/.claude/skills/platform-engineering/resources/infrastructure-standards.md +708 -0
- package/.claude/skills/platform-engineering/resources/multi-tenancy.md +602 -0
- package/.claude/skills/platform-engineering/resources/platform-security.md +711 -0
- package/.claude/skills/platform-engineering/resources/resource-management.md +592 -0
- package/.claude/skills/platform-engineering/resources/service-mesh.md +628 -0
- package/.claude/skills/release-engineering/SKILL.md +393 -0
- package/.claude/skills/release-engineering/resources/artifact-management.md +108 -0
- package/.claude/skills/release-engineering/resources/build-optimization.md +84 -0
- package/.claude/skills/release-engineering/resources/ci-cd-pipelines.md +411 -0
- package/.claude/skills/release-engineering/resources/deployment-strategies.md +197 -0
- package/.claude/skills/release-engineering/resources/pipeline-security.md +62 -0
- package/.claude/skills/release-engineering/resources/progressive-delivery.md +83 -0
- package/.claude/skills/release-engineering/resources/release-automation.md +68 -0
- package/.claude/skills/release-engineering/resources/release-orchestration.md +77 -0
- package/.claude/skills/release-engineering/resources/rollback-strategies.md +66 -0
- package/.claude/skills/release-engineering/resources/versioning-strategies.md +59 -0
- package/.claude/skills/route-tester/SKILL.md +392 -0
- package/.claude/skills/skill-developer/ADVANCED.md +197 -0
- package/.claude/skills/skill-developer/HOOK_MECHANISMS.md +306 -0
- package/.claude/skills/skill-developer/PATTERNS_LIBRARY.md +152 -0
- package/.claude/skills/skill-developer/SKILL.md +430 -0
- package/.claude/skills/skill-developer/SKILL_RULES_REFERENCE.md +315 -0
- package/.claude/skills/skill-developer/TRIGGER_TYPES.md +305 -0
- package/.claude/skills/skill-developer/TROUBLESHOOTING.md +514 -0
- package/.claude/skills/skill-rules.json +2940 -0
- package/.claude/skills/sre/SKILL.md +464 -0
- package/.claude/skills/sre/resources/alerting-best-practices.md +282 -0
- package/.claude/skills/sre/resources/capacity-planning.md +226 -0
- package/.claude/skills/sre/resources/chaos-engineering.md +193 -0
- package/.claude/skills/sre/resources/disaster-recovery.md +232 -0
- package/.claude/skills/sre/resources/incident-management.md +436 -0
- package/.claude/skills/sre/resources/observability-stack.md +240 -0
- package/.claude/skills/sre/resources/on-call-runbooks.md +167 -0
- package/.claude/skills/sre/resources/performance-optimization.md +108 -0
- package/.claude/skills/sre/resources/reliability-patterns.md +183 -0
- package/.claude/skills/sre/resources/slo-sli-sla.md +464 -0
- package/.claude/skills/sre/resources/toil-reduction.md +145 -0
- package/.claude/skills/systems-engineering/SKILL.md +648 -0
- package/.claude/skills/systems-engineering/resources/automation-patterns.md +771 -0
- package/.claude/skills/systems-engineering/resources/configuration-management.md +998 -0
- package/.claude/skills/systems-engineering/resources/linux-administration.md +672 -0
- package/.claude/skills/systems-engineering/resources/networking-fundamentals.md +982 -0
- package/.claude/skills/systems-engineering/resources/performance-tuning.md +871 -0
- package/.claude/skills/systems-engineering/resources/powershell-scripting.md +482 -0
- package/.claude/skills/systems-engineering/resources/security-hardening.md +739 -0
- package/.claude/skills/systems-engineering/resources/shell-scripting.md +915 -0
- package/.claude/skills/systems-engineering/resources/storage-management.md +628 -0
- package/.claude/skills/systems-engineering/resources/system-monitoring.md +787 -0
- package/.claude/skills/systems-engineering/resources/troubleshooting-guide.md +753 -0
- package/.claude/skills/systems-engineering/resources/windows-administration.md +738 -0
- package/.claude/skills/technical-leadership/SKILL.md +728 -0
- package/CHANGELOG.md +90 -39
- package/README.md +94 -0
- package/backend/docs/SECRETS_DOCUMENTATION.md +327 -0
- package/backend/jest.config.js +59 -0
- package/backend/package-lock.json +6129 -0
- package/backend/package.json +16 -4
- package/backend/prisma/migrations/20251026104609_add_websocket_api/migration.sql +33 -0
- package/backend/prisma/schema.prisma +33 -0
- package/backend/src/__tests__/core/DependencyService.test.js +336 -0
- package/backend/src/__tests__/core/UserService.test.js +875 -0
- package/backend/src/__tests__/repositories/BaseRepository.test.js +146 -0
- package/backend/src/__tests__/repositories/BotRepository.test.js +118 -0
- package/backend/src/__tests__/repositories/CommandRepository.test.js +132 -0
- package/backend/src/__tests__/repositories/EventGraphRepository.test.js +93 -0
- package/backend/src/__tests__/repositories/GroupRepository.test.js +155 -0
- package/backend/src/__tests__/repositories/PermissionRepository.test.js +130 -0
- package/backend/src/__tests__/repositories/PluginRepository.test.js +107 -0
- package/backend/src/__tests__/repositories/ServerRepository.test.js +80 -0
- package/backend/src/__tests__/repositories/UserRepository.test.js +128 -0
- package/backend/src/__tests__/secretsFilter.test.js +425 -0
- package/backend/src/__tests__/services/BotLifecycleService.test.js +411 -0
- package/backend/src/__tests__/services/BotProcessManager.test.js +285 -0
- package/backend/src/__tests__/services/CacheManager.test.js +125 -0
- package/backend/src/__tests__/services/CommandExecutionService.test.js +460 -0
- package/backend/src/__tests__/services/ResourceMonitorService.test.js +207 -0
- package/backend/src/__tests__/services/TelemetryService.test.js +291 -0
- package/backend/src/__tests__/setup.js +25 -0
- package/backend/src/api/routes/apiKeys.js +181 -0
- package/backend/src/api/routes/bots.js +49 -7
- package/backend/src/api/routes/plugins.js +2 -1
- package/backend/src/api/routes/system.js +174 -0
- package/backend/src/container.js +82 -0
- package/backend/src/core/BotManager.js +142 -871
- package/backend/src/core/BotManager.old.js +1093 -0
- package/backend/src/core/BotProcess.js +1092 -850
- package/backend/src/core/BreakLoopSignal.js +8 -0
- package/backend/src/core/EventGraphManager.js +280 -193
- package/backend/src/core/GraphExecutionEngine.js +321 -928
- package/backend/src/core/MessageQueue.js +27 -6
- package/backend/src/core/NodeRegistry.js +37 -991
- package/backend/src/core/PluginManager.js +62 -12
- package/backend/src/core/PrismaService.js +32 -0
- package/backend/src/core/UserService.js +3 -3
- package/backend/src/core/__tests__/PrismaService.test.js +24 -0
- package/backend/src/core/commands/README.md +305 -0
- package/backend/src/core/commands/dev.js +13 -7
- package/backend/src/core/commands/ping.js +10 -4
- package/backend/src/core/commands/whois.js +63 -0
- package/backend/src/core/config/validation.js +27 -0
- package/backend/src/core/constants/graphTypes.js +21 -0
- package/backend/src/core/node-registries/actions.js +132 -0
- package/backend/src/core/node-registries/arrays.js +137 -0
- package/backend/src/core/node-registries/bot.js +23 -0
- package/backend/src/core/node-registries/data.js +290 -0
- package/backend/src/core/node-registries/debug.js +26 -0
- package/backend/src/core/node-registries/events.js +187 -0
- package/backend/src/core/node-registries/flow.js +139 -0
- package/backend/src/core/node-registries/logic.js +45 -0
- package/backend/src/core/node-registries/math.js +42 -0
- package/backend/src/core/node-registries/objects.js +98 -0
- package/backend/src/core/node-registries/strings.js +153 -0
- package/backend/src/core/node-registries/time.js +113 -0
- package/backend/src/core/node-registries/users.js +79 -0
- package/backend/src/core/nodes/actions/bot_look_at.js +36 -0
- package/backend/src/core/nodes/actions/bot_set_variable.js +32 -0
- package/backend/src/core/nodes/actions/http_request.js +98 -0
- package/backend/src/core/nodes/actions/send_log.js +28 -0
- package/backend/src/core/nodes/actions/send_message.js +32 -0
- package/backend/src/core/nodes/actions/send_websocket_response.js +33 -0
- package/backend/src/core/nodes/arrays/add_element.js +23 -0
- package/backend/src/core/nodes/arrays/contains.js +40 -0
- package/backend/src/core/nodes/arrays/find_index.js +23 -0
- package/backend/src/core/nodes/arrays/get_by_index.js +23 -0
- package/backend/src/core/nodes/arrays/get_next.js +35 -0
- package/backend/src/core/nodes/arrays/get_random_element.js +32 -0
- package/backend/src/core/nodes/arrays/remove_by_index.js +30 -0
- package/backend/src/core/nodes/bot/get_position.js +20 -0
- package/backend/src/core/nodes/data/array_literal.js +31 -0
- package/backend/src/core/nodes/data/boolean_literal.js +21 -0
- package/backend/src/core/nodes/data/cast.js +42 -0
- package/backend/src/core/nodes/data/datetime_literal.js +27 -0
- package/backend/src/core/nodes/data/entity_info.js +69 -0
- package/backend/src/core/nodes/data/get_argument.js +23 -0
- package/backend/src/core/nodes/data/get_bot_look.js +14 -0
- package/backend/src/core/nodes/data/get_entity_field.js +18 -0
- package/backend/src/core/nodes/data/get_nearby_entities.js +32 -0
- package/backend/src/core/nodes/data/get_nearby_players.js +64 -0
- package/backend/src/core/nodes/data/get_server_players.js +18 -0
- package/backend/src/core/nodes/data/get_user_field.js +40 -0
- package/backend/src/core/nodes/data/get_variable.js +23 -0
- package/backend/src/core/nodes/data/length.js +25 -0
- package/backend/src/core/nodes/data/make_object.js +31 -0
- package/backend/src/core/nodes/data/number_literal.js +21 -0
- package/backend/src/core/nodes/data/string_literal.js +34 -0
- package/backend/src/core/nodes/data/type_check.js +53 -0
- package/backend/src/core/nodes/debug/log.js +16 -0
- package/backend/src/core/nodes/flow/branch.js +15 -0
- package/backend/src/core/nodes/flow/break.js +14 -0
- package/backend/src/core/nodes/flow/delay.js +43 -0
- package/backend/src/core/nodes/flow/for_each.js +39 -0
- package/backend/src/core/nodes/flow/sequence.js +16 -0
- package/backend/src/core/nodes/flow/switch.js +47 -0
- package/backend/src/core/nodes/flow/while.js +64 -0
- package/backend/src/core/nodes/logic/__tests__/compare.test.js +83 -0
- package/backend/src/core/nodes/logic/compare.js +33 -0
- package/backend/src/core/nodes/logic/operation.js +35 -0
- package/backend/src/core/nodes/math/__tests__/operation.test.js +65 -0
- package/backend/src/core/nodes/math/operation.js +31 -0
- package/backend/src/core/nodes/math/random_number.js +43 -0
- package/backend/src/core/nodes/objects/create.js +40 -0
- package/backend/src/core/nodes/objects/delete.js +26 -0
- package/backend/src/core/nodes/objects/get.js +23 -0
- package/backend/src/core/nodes/objects/has_key.js +30 -0
- package/backend/src/core/nodes/objects/set.js +27 -0
- package/backend/src/core/nodes/strings/__tests__/concat.test.js +89 -0
- package/backend/src/core/nodes/strings/concat.js +27 -0
- package/backend/src/core/nodes/strings/contains.js +41 -0
- package/backend/src/core/nodes/strings/ends_with.js +43 -0
- package/backend/src/core/nodes/strings/equals.js +36 -0
- package/backend/src/core/nodes/strings/length.js +36 -0
- package/backend/src/core/nodes/strings/matches.js +39 -0
- package/backend/src/core/nodes/strings/split.js +37 -0
- package/backend/src/core/nodes/strings/starts_with.js +43 -0
- package/backend/src/core/nodes/time/__tests__/now.test.js +24 -0
- package/backend/src/core/nodes/time/add.js +33 -0
- package/backend/src/core/nodes/time/compare.js +35 -0
- package/backend/src/core/nodes/time/diff.js +29 -0
- package/backend/src/core/nodes/time/format.js +32 -0
- package/backend/src/core/nodes/time/now.js +18 -0
- package/backend/src/core/nodes/users/check_blacklist.js +37 -0
- package/backend/src/core/nodes/users/get_groups.js +36 -0
- package/backend/src/core/nodes/users/get_permissions.js +36 -0
- package/backend/src/core/nodes/users/set_blacklist.js +37 -0
- package/backend/src/core/services/BotLifecycleService.js +596 -0
- package/backend/src/core/services/BotProcessManager.js +163 -0
- package/backend/src/core/services/CacheManager.js +111 -0
- package/backend/src/core/services/CommandExecutionService.js +351 -0
- package/backend/src/core/services/ResourceMonitorService.js +90 -0
- package/backend/src/core/services/TelemetryService.js +124 -0
- package/backend/src/core/services/ValidationService.js +132 -0
- package/backend/src/core/services/__tests__/ValidationService.test.js +148 -0
- package/backend/src/core/services.js +20 -5
- package/backend/src/core/system/CommandContext.js +84 -0
- package/backend/src/core/system/Transport.js +78 -0
- package/backend/src/core/utils/__tests__/jsonParser.test.js +44 -0
- package/backend/src/core/utils/jsonParser.js +18 -0
- package/backend/src/core/utils/secretsFilter.js +262 -0
- package/backend/src/core/utils/variableParser.js +89 -0
- package/backend/src/core/validation/__tests__/nodeSchemas.test.js +175 -0
- package/backend/src/core/validation/nodeSchemas.js +112 -0
- package/backend/src/lib/prisma.js +2 -4
- package/backend/src/real-time/botApi/handlers/commandHandlers.js +28 -0
- package/backend/src/real-time/botApi/handlers/graphHandlers.js +99 -0
- package/backend/src/real-time/botApi/handlers/graphWebSocketHandlers.js +147 -0
- package/backend/src/real-time/botApi/handlers/index.js +43 -0
- package/backend/src/real-time/botApi/handlers/messageHandlers.js +66 -0
- package/backend/src/real-time/botApi/handlers/statusHandlers.js +17 -0
- package/backend/src/real-time/botApi/handlers/userHandlers.js +141 -0
- package/backend/src/real-time/botApi/index.js +40 -0
- package/backend/src/real-time/botApi/middleware.js +79 -0
- package/backend/src/real-time/botApi/utils.js +54 -0
- package/backend/src/real-time/socketHandler.js +6 -2
- package/backend/src/repositories/BaseRepository.js +43 -0
- package/backend/src/repositories/BotRepository.js +42 -0
- package/backend/src/repositories/CommandRepository.js +53 -0
- package/backend/src/repositories/EventGraphRepository.js +40 -0
- package/backend/src/repositories/GroupRepository.js +69 -0
- package/backend/src/repositories/PermissionRepository.js +48 -0
- package/backend/src/repositories/PluginRepository.js +42 -0
- package/backend/src/repositories/ServerRepository.js +27 -0
- package/backend/src/repositories/UserRepository.js +48 -0
- package/backend/src/server.js +3 -0
- package/backend/src/test-refactor.js +85 -0
- package/frontend/dist/assets/index-CfTo92bP.css +1 -0
- package/frontend/dist/assets/index-CiFD5X9Z.js +8344 -0
- package/frontend/dist/index.html +2 -2
- package/frontend/package.json +1 -5
- package/package.json +2 -1
- package/frontend/dist/assets/index-BFd7YoAj.css +0 -1
- package/frontend/dist/assets/index-CMMutadc.js +0 -8352
- package/nul +0 -0
@@ -0,0 +1,834 @@
# Incident and Problem Management

Incident classification, escalation procedures, problem management, root cause analysis, and known error database for resolving and preventing IT issues.

## Table of Contents

- [Incident Management](#incident-management)
- [Incident Classification](#incident-classification)
- [Escalation Procedures](#escalation-procedures)
- [Problem Management](#problem-management)
- [Root Cause Analysis](#root-cause-analysis)
- [Known Error Database](#known-error-database)
- [Best Practices](#best-practices)

## Incident Management

### Purpose

**Incident:** Unplanned interruption or reduction in quality of service.

**Goals:**
- Restore normal service as quickly as possible
- Minimize business impact
- Maintain quality of service levels
- Meet SLA targets

### Incident vs Problem vs Request

```yaml
Incident:
  Definition: Service disruption or degradation
  Example: Email server down, application error
  Goal: Restore service quickly
  Owner: Service desk / Incident Manager

Problem:
  Definition: Unknown root cause of one or more incidents
  Example: Why do email servers crash every week?
  Goal: Find and fix root cause
  Owner: Problem Manager

Service Request:
  Definition: Request for service or information
  Example: New laptop, password reset, access request
  Goal: Fulfill request
  Owner: Service desk / Fulfillment team
```
|
|
48
|
+
|
|
49
|
+
### Incident Lifecycle
|
|
50
|
+
|
|
51
|
+
```
|
|
52
|
+
Detection
|
|
53
|
+
↓
|
|
54
|
+
Logging
|
|
55
|
+
↓
|
|
56
|
+
Categorization & Prioritization
|
|
57
|
+
↓
|
|
58
|
+
Initial Diagnosis
|
|
59
|
+
↓
|
|
60
|
+
Escalation (if needed)
|
|
61
|
+
↓
|
|
62
|
+
Investigation & Diagnosis
|
|
63
|
+
↓
|
|
64
|
+
Resolution & Recovery
|
|
65
|
+
↓
|
|
66
|
+
Closure
|
|
67
|
+
↓
|
|
68
|
+
Post-Incident Review (for major incidents)
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
## Incident Classification
|
|
72
|
+
|
|
73
|
+
### Priority Matrix
|
|
74
|
+
|
|
75
|
+
```
|
|
76
|
+
┌──────────────────────────────────────────┐
│    Urgency (rows) vs Impact (columns)    │
├────────────┬─────┬─────┬──────┬──────────┤
│            │ Low │ Med │ High │ Critical │
├────────────┼─────┼─────┼──────┼──────────┤
│ Urgent     │ P3  │ P2  │ P1   │ P1       │
│ High       │ P3  │ P2  │ P2   │ P1       │
│ Medium     │ P4  │ P3  │ P2   │ P2       │
│ Low        │ P4  │ P4  │ P3   │ P3       │
└────────────┴─────┴─────┴──────┴──────────┘
|
|
86
|
+
```
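
The matrix above can be read as a simple lookup table. The following is a minimal Python sketch, assuming that the rows are urgency and the columns are impact (the matrix title does not say which axis is which). The names `PRIORITY_MATRIX` and `priority_for` are hypothetical and not part of any tool referenced here.

```python
# Hypothetical lookup table transcribed from the matrix above.
# Assumption: rows are urgency, columns are impact.
IMPACT_LEVELS = ("low", "med", "high", "critical")

PRIORITY_MATRIX = {
    "urgent": ("P3", "P2", "P1", "P1"),
    "high":   ("P3", "P2", "P2", "P1"),
    "medium": ("P4", "P3", "P2", "P2"),
    "low":    ("P4", "P4", "P3", "P3"),
}


def priority_for(urgency: str, impact: str) -> str:
    """Return the priority label for an urgency/impact pair."""
    row = PRIORITY_MATRIX[urgency.lower()]
    return row[IMPACT_LEVELS.index(impact.lower())]


print(priority_for("High", "Critical"))  # P1
print(priority_for("Low", "Med"))        # P4
```

Keeping the matrix as data rather than nested conditionals makes it easy to compare against the table when the policy changes.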
|
|
87
|
+
|
|
88
|
+
### Priority Definitions
|
|
89
|
+
|
|
90
|
+
**P1 - Critical:**
|
|
91
|
+
```yaml
|
|
92
|
+
Criteria:
|
|
93
|
+
- Complete service outage
|
|
94
|
+
- Critical business process stopped
|
|
95
|
+
- Security breach in progress
|
|
96
|
+
- Data loss imminent
|
|
97
|
+
- Affects >1000 users
|
|
98
|
+
|
|
99
|
+
Examples:
|
|
100
|
+
- Production database down
|
|
101
|
+
- Website completely unavailable
|
|
102
|
+
- Email service down globally
|
|
103
|
+
- Ransomware attack
|
|
104
|
+
|
|
105
|
+
SLA Targets:
|
|
106
|
+
Response Time: 15 minutes
|
|
107
|
+
Resolution Time: 4 hours
|
|
108
|
+
Communication: Every 30 minutes
|
|
109
|
+
Support: 24/7/365
|
|
110
|
+
|
|
111
|
+
Process:
|
|
112
|
+
- Immediate notification to Incident Manager
|
|
113
|
+
- War room/bridge call initiated
|
|
114
|
+
- All hands on deck
|
|
115
|
+
- Executive notification
|
|
116
|
+
- Status page updated
|
|
117
|
+
```
|
|
118
|
+
|
|
119
|
+
**P2 - High:**
|
|
120
|
+
```yaml
|
|
121
|
+
Criteria:
|
|
122
|
+
- Major service degradation
|
|
123
|
+
- Critical users affected
|
|
124
|
+
- Workaround not available
|
|
125
|
+
- Affects 100-1000 users
|
|
126
|
+
|
|
127
|
+
Examples:
|
|
128
|
+
- Application slow (80% degradation)
|
|
129
|
+
- VPN intermittent failures
|
|
130
|
+
- Key functionality broken
|
|
131
|
+
- Regional service outage
|
|
132
|
+
|
|
133
|
+
SLA Targets:
|
|
134
|
+
Response Time: 1 hour
|
|
135
|
+
Resolution Time: 8 hours
|
|
136
|
+
Communication: Every 2 hours
|
|
137
|
+
Support: 24/7
|
|
138
|
+
|
|
139
|
+
Process:
|
|
140
|
+
- Incident Manager assigned
|
|
141
|
+
- Bridge call if needed
|
|
142
|
+
- Regular updates
|
|
143
|
+
- Management notification
|
|
144
|
+
```
|
|
145
|
+
|
|
146
|
+
**P3 - Medium:**
|
|
147
|
+
```yaml
|
|
148
|
+
Criteria:
|
|
149
|
+
- Moderate service impact
|
|
150
|
+
- Workaround available
|
|
151
|
+
- Affects 10-100 users
|
|
152
|
+
- Non-critical business impact
|
|
153
|
+
|
|
154
|
+
Examples:
|
|
155
|
+
- Minor application bug
|
|
156
|
+
- Single user unable to access file share
|
|
157
|
+
- Printer offline
|
|
158
|
+
- Report not generating
|
|
159
|
+
|
|
160
|
+
SLA Targets:
|
|
161
|
+
Response Time: 4 hours
|
|
162
|
+
Resolution Time: 24 hours
|
|
163
|
+
Communication: Daily
|
|
164
|
+
Support: Business hours
|
|
165
|
+
|
|
166
|
+
Process:
|
|
167
|
+
- Standard troubleshooting
|
|
168
|
+
- Escalate if unresolved
|
|
169
|
+
- User updates as needed
|
|
170
|
+
```
|
|
171
|
+
|
|
172
|
+
**P4 - Low:**
|
|
173
|
+
```yaml
|
|
174
|
+
Criteria:
|
|
175
|
+
- Minimal impact
|
|
176
|
+
- Single user affected
|
|
177
|
+
- Enhancement request
|
|
178
|
+
- Cosmetic issues
|
|
179
|
+
|
|
180
|
+
Examples:
|
|
181
|
+
- Typo in application
|
|
182
|
+
- Feature request
|
|
183
|
+
- Question about functionality
|
|
184
|
+
- UI improvement suggestion
|
|
185
|
+
|
|
186
|
+
SLA Targets:
|
|
187
|
+
Response Time: 8 hours
|
|
188
|
+
Resolution Time: 48 hours
|
|
189
|
+
Communication: On close
|
|
190
|
+
Support: Business hours
|
|
191
|
+
|
|
192
|
+
Process:
|
|
193
|
+
- Queue for resolution
|
|
194
|
+
- May convert to service request
|
|
195
|
+
- May defer to next release
|
|
196
|
+
```
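
The response and resolution targets in the four priority definitions above translate directly into deadlines. The sketch below is one possible way to compute them in Python; `SLA_TARGETS` and `sla_deadlines` are hypothetical names, and the calculation deliberately uses wall-clock time, so it does not model the business-hours support windows listed for P3 and P4.

```python
from datetime import datetime, timedelta

# SLA targets copied from the priority definitions above.
SLA_TARGETS = {
    "P1": {"response": timedelta(minutes=15), "resolution": timedelta(hours=4)},
    "P2": {"response": timedelta(hours=1), "resolution": timedelta(hours=8)},
    "P3": {"response": timedelta(hours=4), "resolution": timedelta(hours=24)},
    "P4": {"response": timedelta(hours=8), "resolution": timedelta(hours=48)},
}


def sla_deadlines(priority: str, opened_at: datetime) -> dict:
    """Return response and resolution deadlines for an incident.

    Simplification: uses wall-clock time and ignores the business-hours
    support windows listed for P3 and P4.
    """
    targets = SLA_TARGETS[priority]
    return {name: opened_at + delta for name, delta in targets.items()}


deadlines = sla_deadlines("P1", datetime(2024, 11, 1, 14, 30))
print(deadlines["response"])    # 2024-11-01 14:45:00
print(deadlines["resolution"])  # 2024-11-01 18:30:00
```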
|
|
197
|
+
|
|
198
|
+
### Incident Categories
|
|
199
|
+
|
|
200
|
+
```yaml
|
|
201
|
+
Category Hierarchy:
|
|
202
|
+
|
|
203
|
+
Hardware:
|
|
204
|
+
- Desktop/Laptop
|
|
205
|
+
- Server
|
|
206
|
+
- Network
|
|
207
|
+
- Printer
|
|
208
|
+
- Phone
|
|
209
|
+
|
|
210
|
+
Software:
|
|
211
|
+
- Application
|
|
212
|
+
- Operating System
|
|
213
|
+
- Database
|
|
214
|
+
- Middleware
|
|
215
|
+
|
|
216
|
+
Service:
|
|
217
|
+
- Email
|
|
218
|
+
- Network Connectivity
|
|
219
|
+
- Authentication
|
|
220
|
+
- File Storage
|
|
221
|
+
- VPN
|
|
222
|
+
|
|
223
|
+
Security:
|
|
224
|
+
- Account Compromise
|
|
225
|
+
- Malware
|
|
226
|
+
- Phishing
|
|
227
|
+
- Unauthorized Access
|
|
228
|
+
```
|
|
229
|
+
|
|
230
|
+
## Escalation Procedures
|
|
231
|
+
|
|
232
|
+
### Escalation Types
|
|
233
|
+
|
|
234
|
+
**Functional Escalation:**
|
|
235
|
+
```
|
|
236
|
+
Escalate to higher expertise level
|
|
237
|
+
|
|
238
|
+
L1 Service Desk
|
|
239
|
+
↓ (if unresolved in 15 minutes)
|
|
240
|
+
L2 Technical Support
|
|
241
|
+
↓ (if unresolved in 1 hour)
|
|
242
|
+
L3 Subject Matter Expert
|
|
243
|
+
↓ (if vendor issue)
|
|
244
|
+
Vendor Support
|
|
245
|
+
```
|
|
246
|
+
|
|
247
|
+
**Hierarchical Escalation:**
|
|
248
|
+
```
|
|
249
|
+
Escalate to higher management level
|
|
250
|
+
|
|
251
|
+
Service Desk Analyst
|
|
252
|
+
↓ (if P1 or SLA breach)
|
|
253
|
+
Service Desk Manager
|
|
254
|
+
↓ (if not resolving)
|
|
255
|
+
IT Manager
|
|
256
|
+
↓ (if business critical)
|
|
257
|
+
CIO
|
|
258
|
+
```
|
|
259
|
+
|
|
260
|
+
### Escalation Triggers
|
|
261
|
+
|
|
262
|
+
```yaml
|
|
263
|
+
Automatic Escalation Triggers:
|
|
264
|
+
|
|
265
|
+
1. Priority-based:
|
|
266
|
+
- P1 incident created → Immediate
|
|
267
|
+
- P2 approaching SLA breach → 1 hour before
|
|
268
|
+
- P3/P4 SLA breached → Immediate
|
|
269
|
+
|
|
270
|
+
2. Time-based:
|
|
271
|
+
- L1 unable to resolve in 15 minutes → L2
|
|
272
|
+
- L2 unable to resolve in 1 hour → L3
|
|
273
|
+
- Any tier stuck for 2x expected time → Manager
|
|
274
|
+
|
|
275
|
+
3. Impact-based:
|
|
276
|
+
- Executive affected → Immediate
|
|
277
|
+
- Revenue-impacting → Immediate
|
|
278
|
+
- Compliance issue → Immediate
|
|
279
|
+
- Security incident → Immediate
|
|
280
|
+
|
|
281
|
+
4. Pattern-based:
|
|
282
|
+
- 3rd occurrence of same issue → Problem Management
|
|
283
|
+
- Affecting multiple users → Escalate priority
|
|
284
|
+
- Scope expanding → Re-prioritize
|
|
285
|
+
```
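
A few of the triggers above (priority-based, time-based, and pattern-based) can be expressed as a small rule function. This is only a sketch under stated assumptions: the `Incident` fields and the `escalation_actions` helper are hypothetical, and the impact-based triggers are reduced to a single `security` flag for brevity.

```python
from dataclasses import dataclass
from datetime import timedelta

# Time limits taken from the time-based triggers above.
TIER_LIMITS = {"L1": timedelta(minutes=15), "L2": timedelta(hours=1)}
NEXT_TIER = {"L1": "L2", "L2": "L3"}


@dataclass
class Incident:
    priority: str            # "P1" .. "P4"
    tier: str                # "L1", "L2" or "L3"
    time_at_tier: timedelta  # how long the current tier has held the ticket
    occurrences: int = 1     # times the same issue has been seen
    security: bool = False   # flagged as a security incident


def escalation_actions(incident: Incident) -> list[str]:
    """Return the escalation actions that the triggers above imply."""
    actions = []
    if incident.priority == "P1" or incident.security:
        actions.append("escalate immediately")
    limit = TIER_LIMITS.get(incident.tier)
    if limit is not None and incident.time_at_tier > limit:
        actions.append(f"escalate to {NEXT_TIER[incident.tier]}")
    if incident.occurrences >= 3:
        actions.append("refer to Problem Management")
    return actions


stuck = Incident(priority="P3", tier="L1", time_at_tier=timedelta(minutes=20), occurrences=3)
print(escalation_actions(stuck))
# ['escalate to L2', 'refer to Problem Management']
```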
|
|
286
|
+
|
|
287
|
+
### Major Incident Management
|
|
288
|
+
|
|
289
|
+
```yaml
|
|
290
|
+
Major Incident: High impact, urgent situation requiring immediate attention
|
|
291
|
+
|
|
292
|
+
Criteria:
|
|
293
|
+
- Multiple critical services down
|
|
294
|
+
- Widespread user impact (>1000 users)
|
|
295
|
+
- Executive visibility
|
|
296
|
+
- Revenue impact
|
|
297
|
+
- Regulatory implications
|
|
298
|
+
|
|
299
|
+
Process:
|
|
300
|
+
|
|
301
|
+
1. Declaration:
|
|
302
|
+
- Incident Manager declares major incident
|
|
303
|
+
- Initiate major incident process
|
|
304
|
+
- Notify stakeholders
|
|
305
|
+
|
|
306
|
+
2. War Room:
|
|
307
|
+
- Virtual bridge call
|
|
308
|
+
- Dedicated Slack channel
|
|
309
|
+
- Screen sharing for collaboration
|
|
310
|
+
|
|
311
|
+
3. Roles:
|
|
312
|
+
- Incident Manager: Leads response
|
|
313
|
+
- Technical Lead: Coordinates fixes
|
|
314
|
+
- Communications Lead: User updates
|
|
315
|
+
- Scribe: Documents timeline
|
|
316
|
+
|
|
317
|
+
4. Communication:
|
|
318
|
+
- Status page updated immediately
|
|
319
|
+
- Updates every 15-30 minutes
|
|
320
|
+
- Executive briefings
|
|
321
|
+
- Customer notifications
|
|
322
|
+
|
|
323
|
+
5. Resolution:
|
|
324
|
+
- Focus on service restoration (not root cause)
|
|
325
|
+
- Implement workaround if faster than fix
|
|
326
|
+
- Verify resolution
|
|
327
|
+
- Monitoring period
|
|
328
|
+
|
|
329
|
+
6. Post-Incident Review:
|
|
330
|
+
- Mandatory for major incidents
|
|
331
|
+
- Timeline analysis
|
|
332
|
+
- Root cause
|
|
333
|
+
- Action items
|
|
334
|
+
- Process improvements
|
|
335
|
+
```
|
|
336
|
+
|
|
337
|
+
### War Room Example
|
|
338
|
+
|
|
339
|
+
```yaml
|
|
340
|
+
Major Incident: Customer Portal Down
|
|
341
|
+
|
|
342
|
+
Bridge Call Details:
|
|
343
|
+
Conference: +1-555-1234, Code: 9876
|
|
344
|
+
Slack Channel: #incident-portal-down
|
|
345
|
+
Start Time: 2024-11-01 14:30
|
|
346
|
+
|
|
347
|
+
Participants:
|
|
348
|
+
Incident Manager: Sarah Johnson
|
|
349
|
+
Technical Lead: Mike Chen
|
|
350
|
+
App Team: Dev Team Lead + 2 engineers
|
|
351
|
+
Infrastructure: Ops Manager + SRE
|
|
352
|
+
Database: DBA on call
|
|
353
|
+
Communications: Customer Success Manager
|
|
354
|
+
Executive: VP Engineering (observer)
|
|
355
|
+
|
|
356
|
+
Timeline:
|
|
357
|
+
14:30 - Incident declared, war room started
|
|
358
|
+
14:35 - Investigation started (all hands)
|
|
359
|
+
14:40 - Root cause identified (database connection pool exhausted)
|
|
360
|
+
14:50 - Fix applied (increased connection pool size)
|
|
361
|
+
15:00 - Service restored
|
|
362
|
+
15:15 - Monitoring period complete, war room closed
|
|
363
|
+
15:30 - PIR scheduled for tomorrow 10am
|
|
364
|
+
|
|
365
|
+
Actions:
|
|
366
|
+
- [DONE] Restart application servers
|
|
367
|
+
- [DONE] Increase DB connection pool
|
|
368
|
+
- [DONE] Verify functionality
|
|
369
|
+
- [PENDING] PIR tomorrow
|
|
370
|
+
- [PENDING] Permanent fix in sprint
|
|
371
|
+
```
|
|
372
|
+
|
|
373
|
+
## Problem Management
|
|
374
|
+
|
|
375
|
+
### Purpose
|
|
376
|
+
|
|
377
|
+
**Problem:** Unknown root cause of one or more incidents.
|
|
378
|
+
|
|
379
|
+
**Goals:**
|
|
380
|
+
- Identify root causes
|
|
381
|
+
- Prevent incident recurrence
|
|
382
|
+
- Minimize impact of incidents that can't be prevented
|
|
383
|
+
- Improve service quality
|
|
384
|
+
|
|
385
|
+
### Reactive vs Proactive
|
|
386
|
+
|
|
387
|
+
**Reactive Problem Management:**
|
|
388
|
+
```
|
|
389
|
+
Triggered by:
|
|
390
|
+
- Multiple related incidents
|
|
391
|
+
- Recurring incidents
|
|
392
|
+
- Major incident root cause analysis
|
|
393
|
+
|
|
394
|
+
Process:
|
|
395
|
+
- Analyze incident patterns
|
|
396
|
+
- Identify common root cause
|
|
397
|
+
- Document as problem
|
|
398
|
+
- Assign for investigation
|
|
399
|
+
```
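
The "analyze incident patterns" step of reactive problem management can start as a simple count of incidents per affected service. The sketch below assumes a hypothetical incident log and a threshold of three occurrences; both the data and the `problem_candidates` name are illustrative only.

```python
from collections import Counter

# Hypothetical incident log: (incident id, affected service) pairs.
incidents = [
    ("INC-1", "customer-portal"),
    ("INC-2", "email"),
    ("INC-3", "customer-portal"),
    ("INC-4", "customer-portal"),
    ("INC-5", "vpn"),
]


def problem_candidates(log, threshold=3):
    """Return services with at least `threshold` incidents in the log."""
    counts = Counter(service for _, service in log)
    return sorted(s for s, n in counts.items() if n >= threshold)


print(problem_candidates(incidents))  # ['customer-portal']
```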
|
|
400
|
+
|
|
401
|
+
**Proactive Problem Management:**
|
|
402
|
+
```
|
|
403
|
+
Triggered by:
|
|
404
|
+
- Trend analysis
|
|
405
|
+
- Performance monitoring
|
|
406
|
+
- Technology assessments
|
|
407
|
+
- Known vulnerabilities
|
|
408
|
+
|
|
409
|
+
Process:
|
|
410
|
+
- Identify potential problems before incidents occur
|
|
411
|
+
- Analyze risks
|
|
412
|
+
- Implement preventive measures
|
|
413
|
+
- Continuous improvement
|
|
414
|
+
```
|
|
415
|
+
|
|
416
|
+
### Problem Lifecycle
|
|
417
|
+
|
|
418
|
+
```
|
|
419
|
+
Problem Detection
|
|
420
|
+
↓
|
|
421
|
+
Problem Logging
|
|
422
|
+
↓
|
|
423
|
+
Categorization & Prioritization
|
|
424
|
+
↓
|
|
425
|
+
Investigation & Diagnosis
|
|
426
|
+
↓
|
|
427
|
+
Workaround (interim solution)
|
|
428
|
+
↓
|
|
429
|
+
Known Error (if workaround found)
|
|
430
|
+
↓
|
|
431
|
+
Root Cause Identified
|
|
432
|
+
↓
|
|
433
|
+
Change Request (for permanent fix)
|
|
434
|
+
↓
|
|
435
|
+
Problem Resolution
|
|
436
|
+
↓
|
|
437
|
+
Problem Closure
|
|
438
|
+
```
|
|
439
|
+
|
|
440
|
+
### Problem Record Example
|
|
441
|
+
|
|
442
|
+
```yaml
|
|
443
|
+
Problem: PRB-456
|
|
444
|
+
|
|
445
|
+
Status: Known Error (workaround available)
|
|
446
|
+
Category: Application
|
|
447
|
+
Subcategory: Performance
|
|
448
|
+
Priority: P2
|
|
449
|
+
|
|
450
|
+
Description:
|
|
451
|
+
Customer Portal experiences slow response times (>5 seconds)
|
|
452
|
+
during business hours, especially between 9am-11am.
|
|
453
|
+
|
|
454
|
+
Related Incidents:
|
|
455
|
+
- INC-12340: 2024-10-15 - Slow portal
|
|
456
|
+
- INC-12398: 2024-10-22 - Portal timeout
|
|
457
|
+
- INC-12445: 2024-10-29 - Cannot load customer data
|
|
458
|
+
- INC-12467: 2024-11-01 - Slow search results
|
|
459
|
+
Total: 15 incidents in 30 days
|
|
460
|
+
|
|
461
|
+
Investigation Timeline:
|
|
462
|
+
2024-10-16: Problem created (after 2nd occurrence)
|
|
463
|
+
2024-10-17: Assigned to App Team
|
|
464
|
+
2024-10-20: Performance testing conducted
|
|
465
|
+
2024-10-22: Root cause identified
|
|
466
|
+
2024-10-23: Workaround implemented
|
|
467
|
+
2024-10-25: Marked as Known Error
|
|
468
|
+
|
|
469
|
+
Root Cause:
|
|
470
|
+
Database query inefficiency due to missing index on
|
|
471
|
+
customer_orders.created_date column. Query scans full table
|
|
472
|
+
(5M rows) instead of using index, causing 8+ second queries.
|
|
473
|
+
|
|
474
|
+
Workaround:
|
|
475
|
+
Add a database read replica for search and reporting queries to reduce
|
|
476
|
+
load on primary database.
|
|
477
|
+
|
|
478
|
+
Implementation:
|
|
479
|
+
- Created read replica
|
|
480
|
+
- Routed search queries to replica
|
|
481
|
+
- Reduced primary DB load by 40%
|
|
482
|
+
|
|
483
|
+
Result:
|
|
484
|
+
- Response times improved to <2 seconds
|
|
485
|
+
- Incidents reduced by 90%
|
|
486
|
+
|
|
487
|
+
Permanent Solution:
|
|
488
|
+
Change Request: CHG-12890
|
|
489
|
+
Description: Add index on customer_orders.created_date
|
|
490
|
+
Status: Approved
|
|
491
|
+
Scheduled: 2024-11-10 02:00 AM
|
|
492
|
+
Expected Result: Eliminate slow queries completely
|
|
493
|
+
|
|
494
|
+
Owner: Database Team
|
|
495
|
+
Next Review: 2024-11-15 (verify fix effectiveness)
|
|
496
|
+
```
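
The root cause and permanent fix in this record (a full table scan replaced by an index on `customer_orders.created_date`) can be illustrated with a query plan. The record does not name the database engine, so the sketch below uses SQLite from Python's standard library purely as a stand-in. The `total` column, the sample rows, and the index name `idx_orders_created_date` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer_orders (id INTEGER PRIMARY KEY, created_date TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO customer_orders (created_date, total) VALUES (?, ?)",
    [(f"2024-10-{day:02d}", 10.0 * day) for day in range(1, 29)],
)

QUERY = "SELECT id, total FROM customer_orders WHERE created_date >= '2024-10-20'"


def query_plan(connection, sql):
    """Return SQLite's plan description for a query as one string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return "; ".join(row[-1] for row in rows)


plan_before = query_plan(conn, QUERY)  # expected to describe a table scan
conn.execute("CREATE INDEX idx_orders_created_date ON customer_orders (created_date)")
plan_after = query_plan(conn, QUERY)   # expected to describe an index search

print(plan_before)
print(plan_after)
conn.close()
```

The exact wording of the plan text varies between SQLite versions, so it is safer to check for the index name in the output than to compare whole strings.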
|
|
497
|
+
|
|
498
|
+
## Root Cause Analysis
|
|
499
|
+
|
|
500
|
+
### 5 Whys Technique
|
|
501
|
+
|
|
502
|
+
```yaml
|
|
503
|
+
Incident: Production website down for 2 hours
|
|
504
|
+
|
|
505
|
+
Why? The web server crashed.
|
|
506
|
+
↓
|
|
507
|
+
Why did the web server crash? It ran out of memory.
|
|
508
|
+
↓
|
|
509
|
+
Why did it run out of memory? Memory leak in application code.
|
|
510
|
+
↓
|
|
511
|
+
Why was there a memory leak? Database connections not closed properly.
|
|
512
|
+
↓
|
|
513
|
+
Why weren't connections closed? Missing finally blocks in error handling.
|
|
514
|
+
↓
|
|
515
|
+
Root Cause: Code review process didn't catch missing resource cleanup
|
|
516
|
+
|
|
517
|
+
Actions:
|
|
518
|
+
1. Fix immediate issue: Add connection cleanup code
|
|
519
|
+
2. Prevent recurrence: Update code review checklist
|
|
520
|
+
3. Detect earlier: Add memory monitoring alerts
|
|
521
|
+
4. Improve process: Automated code analysis for resource leaks
|
|
522
|
+
```
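
The fifth "why" above (connections left open because error handling lacked `finally` blocks) is easy to show in code. The following Python sketch uses a hypothetical `FakeConnection` class instead of a real database driver so that the effect can be observed directly; it is not the application's actual code.

```python
class FakeConnection:
    """Stand-in for a database connection that records whether it was closed."""

    def __init__(self):
        self.closed = False

    def query(self, sql):
        raise RuntimeError("query failed")  # simulate an error mid-request

    def close(self):
        self.closed = True


def run_leaky(conn):
    """close() is skipped when query() raises, so the connection leaks."""
    result = conn.query("SELECT 1")
    conn.close()
    return result


def run_safe(conn):
    """The finally block closes the connection on success and on error."""
    try:
        return conn.query("SELECT 1")
    finally:
        conn.close()


for runner in (run_leaky, run_safe):
    conn = FakeConnection()
    try:
        runner(conn)
    except RuntimeError:
        pass
    print(runner.__name__, "closed:", conn.closed)
# run_leaky closed: False
# run_safe closed: True
```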
|
|
523
|
+
|
|
524
|
+
### Fishbone Diagram (Ishikawa)
|
|
525
|
+
|
|
526
|
+
```
|
|
527
|
+
Problem: Database Performance Degradation
|
|
528
|
+
|
|
529
|
+
    People              Process              Technology
       │                   │                      │
       │                   │                      │
No DBA training  No capacity planning       Old hardware
       │                   │                      │
       └───────────────────┴──────────────────────┘
                           │
                    ┌──────▼──────┐
                    │  Database   │
                    │ Performance │
                    │   Issues    │
                    └──────▲──────┘
       ┌───────────────────┴──────────────────────┐
       │                   │                      │
 No monitoring      No query tuning      Unoptimized queries
       │                   │                      │
  Environment           Methods              Management
|
|
546
|
+
```
|
|
547
|
+
|
|
548
|
+
### RCA Report Template
|
|
549
|
+
|
|
550
|
+
```yaml
|
|
551
|
+
Root Cause Analysis Report
|
|
552
|
+
|
|
553
|
+
Incident: INC-12500 - Email Service Outage
|
|
554
|
+
Date: 2024-11-01
|
|
555
|
+
Duration: 14:30 - 17:15 (2 hours 45 minutes)
|
|
556
|
+
Impact: 5,000 users unable to send/receive email
|
|
557
|
+
Business Impact: $45,000 in lost revenue (estimated)
|
|
558
|
+
|
|
559
|
+
Timeline:
|
|
560
|
+
14:30 - Users report cannot send email
|
|
561
|
+
14:35 - Service desk escalates to L2
|
|
562
|
+
14:40 - Email team investigates
|
|
563
|
+
15:00 - Identified mail server queue full
|
|
564
|
+
15:15 - Identified spam causing queue backup
|
|
565
|
+
15:30 - Spam filtering updated
|
|
566
|
+
16:00 - Queue processing resumed
|
|
567
|
+
17:00 - All queued emails delivered
|
|
568
|
+
17:15 - Service fully restored
|
|
569
|
+
|
|
570
|
+
Root Cause:
|
|
571
|
+
Spam filtering rules outdated (last update 6 months ago).
|
|
572
|
+
New spam campaign bypassed filters, overwhelming mail queue
|
|
573
|
+
with 500,000 spam messages in 2 hours.
|
|
574
|
+
|
|
575
|
+
Contributing Factors:
|
|
576
|
+
1. No automated spam filter updates
|
|
577
|
+
2. No queue monitoring alerts
|
|
578
|
+
3. No rate limiting on inbound email
|
|
579
|
+
|
|
580
|
+
Immediate Actions (Completed):
|
|
581
|
+
- Updated spam filters
|
|
582
|
+
- Cleared spam from queue
|
|
583
|
+
- Implemented rate limiting
|
|
584
|
+
|
|
585
|
+
Preventive Actions:
|
|
586
|
+
1. Automate spam filter updates (daily) - Due: Nov 5
|
|
587
|
+
2. Implement queue monitoring - Due: Nov 10
|
|
588
|
+
3. Review email security architecture - Due: Nov 20
|
|
589
|
+
4. Add rate limiting for all services - Due: Nov 30
|
|
590
|
+
|
|
591
|
+
Lessons Learned:
|
|
592
|
+
- Automated updates critical for security controls
|
|
593
|
+
- Monitoring should cover queue depths, not just service status
|
|
594
|
+
- Need faster diagnosis (30 min from the first user report to identifying the full mail queue)
|
|
595
|
+
|
|
596
|
+
Owner: Email Team Lead
|
|
597
|
+
PIR Completed: 2024-11-02
|
|
598
|
+
Follow-up Review: 2024-12-01
|
|
599
|
+
```
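
One lesson in this report is that monitoring should cover queue depth, not only service status. A queue-depth check can be as small as the Python sketch below. The function name and both thresholds are assumptions chosen for illustration; real values would depend on the mail system's normal load.

```python
def queue_alert_level(depth: int, warn_at: int = 10_000, critical_at: int = 50_000) -> str:
    """Classify a mail-queue depth as 'ok', 'warning' or 'critical'."""
    if depth >= critical_at:
        return "critical"
    if depth >= warn_at:
        return "warning"
    return "ok"


# Hypothetical depth samples taken during an event like the one above.
for depth in (1_200, 18_000, 500_000):
    print(depth, queue_alert_level(depth))
# 1200 ok
# 18000 warning
# 500000 critical
```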
|
|
600
|
+
|
|
601
|
+
## Known Error Database
|
|
602
|
+
|
|
603
|
+
### Purpose
|
|
604
|
+
|
|
605
|
+
**Known Error Database (KEDB):** Repository of Known Errors and workarounds.
|
|
606
|
+
|
|
607
|
+
**Benefits:**
|
|
608
|
+
- Faster incident resolution
|
|
609
|
+
- Consistent solutions
|
|
610
|
+
- Knowledge sharing
|
|
611
|
+
- Reduced dependency on specific individuals
|
|
612
|
+
|
|
613
|
+
### Known Error Record
|
|
614
|
+
|
|
615
|
+
```yaml
|
|
616
|
+
Known Error: KE-123
|
|
617
|
+
|
|
618
|
+
Problem: PRB-456
|
|
619
|
+
Status: Active (workaround available)
|
|
620
|
+
Category: Application
|
|
621
|
+
Subcategory: Performance
|
|
622
|
+
|
|
623
|
+
Description:
|
|
624
|
+
Customer search returns results slowly (>5 seconds)
|
|
625
|
+
during peak hours (9am-11am, 1pm-3pm).
|
|
626
|
+
|
|
627
|
+
Root Cause:
|
|
628
|
+
Missing database index on frequently queried column.
|
|
629
|
+
Query performs table scan instead of index lookup.
|
|
630
|
+
|
|
631
|
+
Workaround:
|
|
632
|
+
Title: Use Advanced Search with Date Filter
|
|
633
|
+
|
|
634
|
+
Steps:
|
|
635
|
+
1. Click "Advanced Search"
|
|
636
|
+
2. Add date range filter (last 90 days)
|
|
637
|
+
3. Perform search
|
|
638
|
+
|
|
639
|
+
Result:
|
|
640
|
+
- Reduces query scope
|
|
641
|
+
- Returns results in <1 second
|
|
642
|
+
- 95% of searches within 90 days
|
|
643
|
+
|
|
644
|
+
Limitations:
|
|
645
|
+
- Doesn't help for searches >90 days
|
|
646
|
+
- Extra step for users
|
|
647
|
+
|
|
648
|
+
Permanent Fix:
|
|
649
|
+
Status: In Progress
|
|
650
|
+
Change: CHG-12890 (database index creation)
|
|
651
|
+
ETA: 2024-11-10
|
|
652
|
+
|
|
653
|
+
Once deployed:
|
|
654
|
+
- Workaround no longer needed
|
|
655
|
+
- All searches will be fast
|
|
656
|
+
- KE will be closed
|
|
657
|
+
|
|
658
|
+
Communication:
|
|
659
|
+
- Service desk trained on workaround
|
|
660
|
+
- Knowledge base article published: KB-789
|
|
661
|
+
- User notification sent
|
|
662
|
+
- Status page updated
|
|
663
|
+
|
|
664
|
+
Related Documents:
|
|
665
|
+
- Problem Record: PRB-456
|
|
666
|
+
- Knowledge Article: KB-789
|
|
667
|
+
- Change Request: CHG-12890
|
|
668
|
+
- Incidents: INC-12340, INC-12398, INC-12445
|
|
669
|
+
|
|
670
|
+
Owner: Database Team
|
|
671
|
+
Created: 2024-10-23
|
|
672
|
+
Last Updated: 2024-11-01
|
|
673
|
+
Next Review: 2024-11-15
|
|
674
|
+
```
|
|
675
|
+
|
|
676
|
+
### KEDB Integration
|
|
677
|
+
|
|
678
|
+
```yaml
|
|
679
|
+
Integration Points:
|
|
680
|
+
|
|
681
|
+
Service Desk:
|
|
682
|
+
- Search KEDB before escalating
|
|
683
|
+
- Apply workaround if available
|
|
684
|
+
- Link incident to known error
|
|
685
|
+
- Faster resolution
|
|
686
|
+
|
|
687
|
+
Incident Management:
|
|
688
|
+
- Check for known errors during diagnosis
|
|
689
|
+
- Apply documented workarounds
|
|
690
|
+
- Track effectiveness
|
|
691
|
+
|
|
692
|
+
Problem Management:
|
|
693
|
+
- Create KE when workaround identified
|
|
694
|
+
- Update KE when permanent fix deployed
|
|
695
|
+
- Close KE when problem resolved
|
|
696
|
+
|
|
697
|
+
Knowledge Management:
|
|
698
|
+
- Convert KE to KB articles
|
|
699
|
+
- Self-service access
|
|
700
|
+
- User education
|
|
701
|
+
```
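
The service desk step "search KEDB before escalating" could be approximated with a keyword match against known error records. The sketch below is a deliberately naive illustration: the `KnownError` structure, the keyword set attached to KE-123, and the two-keyword matching rule are all assumptions, not a description of any particular ITSM tool.

```python
from dataclasses import dataclass


@dataclass
class KnownError:
    ke_id: str
    keywords: set
    workaround: str


# One entry modelled on KE-123 above; the keyword set is an assumption.
KEDB = [
    KnownError(
        ke_id="KE-123",
        keywords={"search", "slow", "portal"},
        workaround="Use Advanced Search with a date range filter (last 90 days)",
    ),
]


def find_known_errors(summary: str, kedb=KEDB, min_hits: int = 2):
    """Return known errors sharing at least `min_hits` keywords with the summary."""
    words = set(summary.lower().split())
    return [ke for ke in kedb if len(ke.keywords & words) >= min_hits]


matches = find_known_errors("Customer search is very slow this morning")
print([ke.ke_id for ke in matches])  # ['KE-123']
```

If a match is found, the analyst applies the documented workaround and links the incident to the known error instead of escalating.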
|
|
702
|
+
|
|
703
|
+
## Best Practices
|
|
704
|
+
|
|
705
|
+
### 1. Clear Communication
|
|
706
|
+
|
|
707
|
+
**User Updates:**
|
|
708
|
+
```
|
|
709
|
+
During Incident:
|
|
710
|
+
- Initial acknowledgment (within SLA)
|
|
711
|
+
- Progress updates (regular intervals)
|
|
712
|
+
- Workaround instructions (if available)
|
|
713
|
+
- Resolution notification
|
|
714
|
+
- Follow-up (verify fix)
|
|
715
|
+
|
|
716
|
+
Status Page:
|
|
717
|
+
- Real-time incident status
|
|
718
|
+
- Affected services
|
|
719
|
+
- ETR (Estimated Time to Repair)
|
|
720
|
+
- Updates as situation evolves
|
|
721
|
+
```
|
|
722
|
+
|
|
723
|
+
### 2. Incident Metrics
|
|
724
|
+
|
|
725
|
+
```yaml
|
|
726
|
+
Key Performance Indicators:
|
|
727
|
+
|
|
728
|
+
Response Metrics:
|
|
729
|
+
- Mean Time to Respond (MTTR)
|
|
730
|
+
Target: <15 min (P1), <1 hr (P2)
|
|
731
|
+
- First Response SLA Compliance
|
|
732
|
+
Target: >95%
|
|
733
|
+
|
|
734
|
+
Resolution Metrics:
|
|
735
|
+
- Mean Time to Resolve (MTTR)
|
|
736
|
+
Target: <4 hr (P1), <8 hr (P2)
|
|
737
|
+
- Resolution SLA Compliance
|
|
738
|
+
Target: >90%
|
|
739
|
+
- First Contact Resolution
|
|
740
|
+
Target: >70%
|
|
741
|
+
|
|
742
|
+
Quality Metrics:
|
|
743
|
+
- Incident Reopen Rate
|
|
744
|
+
Target: <5%
|
|
745
|
+
- Escalation Rate
|
|
746
|
+
Target: <30%
|
|
747
|
+
- User Satisfaction (CSAT)
|
|
748
|
+
Target: >4.0/5.0
|
|
749
|
+
|
|
750
|
+
Volume Metrics:
|
|
751
|
+
- Incidents by category
|
|
752
|
+
- Incidents by priority
|
|
753
|
+
- Trend analysis (increasing/decreasing)
|
|
754
|
+
```
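
The response and resolution metrics above are averages and ratios over incident timestamps. The following Python sketch shows the arithmetic on two hypothetical P1 incidents; the sample data and the helper name `mean_minutes` are made up for the example.

```python
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical P1 incidents: (opened, first response, resolved).
incidents = [
    (datetime(2024, 11, 1, 14, 30), datetime(2024, 11, 1, 14, 40), datetime(2024, 11, 1, 17, 15)),
    (datetime(2024, 11, 3, 9, 0), datetime(2024, 11, 3, 9, 20), datetime(2024, 11, 3, 14, 0)),
]

RESOLUTION_TARGET = timedelta(hours=4)  # P1 target from the list above


def mean_minutes(deltas):
    """Average a sequence of timedeltas, in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


mean_response = mean_minutes(resp - opened for opened, resp, _ in incidents)
mean_resolve = mean_minutes(done - opened for opened, _, done in incidents)
within_sla = sum(done - opened <= RESOLUTION_TARGET for opened, _, done in incidents)
compliance = 100 * within_sla / len(incidents)

print(mean_response)  # 15.0
print(mean_resolve)   # 232.5
print(compliance)     # 50.0
```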
|
|
755
|
+
|
|
756
|
+
### 3. Post-Incident Reviews
|
|
757
|
+
|
|
758
|
+
**For major incidents, conduct PIR:**
|
|
759
|
+
```yaml
|
|
760
|
+
PIR Agenda:
|
|
761
|
+
|
|
762
|
+
1. Timeline Review (20 min)
|
|
763
|
+
- What happened and when?
|
|
764
|
+
- Who was involved?
|
|
765
|
+
- What actions were taken?
|
|
766
|
+
|
|
767
|
+
2. Root Cause Analysis (20 min)
|
|
768
|
+
- Why did it happen?
|
|
769
|
+
- What were contributing factors?
|
|
770
|
+
- Could it have been prevented?
|
|
771
|
+
|
|
772
|
+
3. Impact Assessment (10 min)
|
|
773
|
+
- Users affected
|
|
774
|
+
- Business impact
|
|
775
|
+
- Financial impact
|
|
776
|
+
- Reputation impact
|
|
777
|
+
|
|
778
|
+
4. Response Evaluation (15 min)
|
|
779
|
+
- What went well?
|
|
780
|
+
- What could be improved?
|
|
781
|
+
- Was communication effective?
|
|
782
|
+
- Were tools adequate?
|
|
783
|
+
|
|
784
|
+
5. Action Items (15 min)
|
|
785
|
+
- Immediate fixes
|
|
786
|
+
- Preventive measures
|
|
787
|
+
- Process improvements
|
|
788
|
+
- Training needs
|
|
789
|
+
- Tool enhancements
|
|
790
|
+
|
|
791
|
+
6. Follow-up (5 min)
|
|
792
|
+
- Assign owners
|
|
793
|
+
- Set deadlines
|
|
794
|
+
- Schedule review
|
|
795
|
+
```
|
|
796
|
+
|
|
797
|
+
### 4. Knowledge Management
|
|
798
|
+
|
|
799
|
+
Build a comprehensive knowledge base from incident and problem records.
|
|
800
|
+
|
|
801
|
+
### 5. Automation
|
|
802
|
+
|
|
803
|
+
```yaml
|
|
804
|
+
Automate Common Resolutions:
|
|
805
|
+
|
|
806
|
+
Self-Healing:
|
|
807
|
+
- Service restart on failure
|
|
808
|
+
- Disk space cleanup
|
|
809
|
+
- Cache clearing
|
|
810
|
+
- Certificate renewal
|
|
811
|
+
|
|
812
|
+
Automated Diagnostics:
|
|
813
|
+
- Health check scripts
|
|
814
|
+
- Log analysis
|
|
815
|
+
- Performance baselines
|
|
816
|
+
- Connectivity tests
|
|
817
|
+
|
|
818
|
+
Chatbots:
|
|
819
|
+
- Password reset
|
|
820
|
+
- Account unlock
|
|
821
|
+
- Common questions
|
|
822
|
+
- Ticket creation
|
|
823
|
+
```
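
The "service restart on failure" item under self-healing can be sketched as a bounded retry loop around a health check. In the Python example below, `self_heal`, `is_healthy`, and `restart` are hypothetical names; a real implementation would call the service manager and add a delay between attempts.

```python
def self_heal(is_healthy, restart, max_restarts=3):
    """Restart a service until its health check passes or attempts run out.

    Returns the number of restarts performed, or raises RuntimeError so a
    human can be paged when automation does not fix the problem.
    """
    for attempt in range(max_restarts + 1):
        if is_healthy():
            return attempt
        if attempt < max_restarts:
            restart()
    raise RuntimeError(f"still unhealthy after {max_restarts} restarts")


# Simulated service that recovers after its second restart.
state = {"restarts": 0}
restarts_used = self_heal(
    is_healthy=lambda: state["restarts"] >= 2,
    restart=lambda: state.update(restarts=state["restarts"] + 1),
)
print(restarts_used)  # 2
```

Capping the number of restarts keeps automation from hiding a persistent fault; once the cap is reached the failure should become a normal incident.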
|
|
824
|
+
|
|
825
|
+
### 6. Continuous Improvement
|
|
826
|
+
|
|
827
|
+
Learn from every incident to prevent future occurrences.
|
|
828
|
+
|
|
829
|
+
---
|
|
830
|
+
|
|
831
|
+
**Related Resources:**
|
|
832
|
+
- [itil-framework.md](itil-framework.md) - ITIL incident and problem management
|
|
833
|
+
- [service-management.md](service-management.md) - SLA management
|
|
834
|
+
- [help-desk-operations.md](help-desk-operations.md) - Service desk operations
|