@arthai/agents 1.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +123 -0
- package/VERSION +1 -0
- package/agents/ai-consultant.md +999 -0
- package/agents/architect.md +174 -0
- package/agents/code-reviewer.md +115 -0
- package/agents/competitive-analyst.md +688 -0
- package/agents/content-strategist.md +607 -0
- package/agents/design-studio-create.md +304 -0
- package/agents/design-studio-critique.md +258 -0
- package/agents/design-studio-think.md +79 -0
- package/agents/domain-hunter.md +519 -0
- package/agents/explore-light.md +52 -0
- package/agents/frontend.md +261 -0
- package/agents/gtm-expert.md +811 -0
- package/agents/meeting-prep.md +318 -0
- package/agents/ops.md +149 -0
- package/agents/product-manager.md +563 -0
- package/agents/python-backend.md +286 -0
- package/agents/qa-baseline-updater.md +45 -0
- package/agents/qa-challenger.md +97 -0
- package/agents/qa-domain.md +145 -0
- package/agents/qa-e2e.md +184 -0
- package/agents/qa-test-promoter.md +97 -0
- package/agents/qa.md +226 -0
- package/agents/setup.md +134 -0
- package/agents/sre.md +165 -0
- package/agents/stakeholder-reporter.md +94 -0
- package/agents/user-researcher.md +602 -0
- package/bin/cli.js +322 -0
- package/bundles/canvas.json +16 -0
- package/bundles/compass.json +16 -0
- package/bundles/counsel.json +31 -0
- package/bundles/cruise.json +11 -0
- package/bundles/forge.json +26 -0
- package/bundles/prime.json +10 -0
- package/bundles/prism.json +23 -0
- package/bundles/scalpel.json +17 -0
- package/bundles/sentinel.json +19 -0
- package/bundles/shield.json +14 -0
- package/bundles/spark.json +19 -0
- package/compiler.sh +305 -0
- package/dist/plugins/canvas/.claude-plugin/plugin.json +6 -0
- package/dist/plugins/canvas/agents/design-studio-create.md +304 -0
- package/dist/plugins/canvas/agents/design-studio-critique.md +258 -0
- package/dist/plugins/canvas/agents/design-studio-think.md +79 -0
- package/dist/plugins/canvas/agents/frontend.md +261 -0
- package/dist/plugins/canvas/skills/planning/SKILL.md +436 -0
- package/dist/plugins/compass/.claude-plugin/plugin.json +6 -0
- package/dist/plugins/compass/agents/content-strategist.md +607 -0
- package/dist/plugins/compass/agents/gtm-expert.md +811 -0
- package/dist/plugins/compass/agents/product-manager.md +563 -0
- package/dist/plugins/compass/agents/user-researcher.md +602 -0
- package/dist/plugins/compass/skills/planning/SKILL.md +436 -0
- package/dist/plugins/counsel/.claude-plugin/plugin.json +6 -0
- package/dist/plugins/counsel/agents/ai-consultant.md +999 -0
- package/dist/plugins/counsel/agents/competitive-analyst.md +688 -0
- package/dist/plugins/counsel/agents/meeting-prep.md +318 -0
- package/dist/plugins/counsel/agents/stakeholder-reporter.md +94 -0
- package/dist/plugins/counsel/hooks/check-deliverable.sh +65 -0
- package/dist/plugins/counsel/hooks/ensure-client-dir.sh +59 -0
- package/dist/plugins/counsel/hooks/hooks.json +28 -0
- package/dist/plugins/counsel/skills/client-discovery/SKILL.md +266 -0
- package/dist/plugins/counsel/skills/consulting/SKILL.md +282 -0
- package/dist/plugins/counsel/skills/deliverable-builder/SKILL.md +928 -0
- package/dist/plugins/counsel/skills/engagement-tracker/SKILL.md +380 -0
- package/dist/plugins/counsel/skills/market-research/SKILL.md +300 -0
- package/dist/plugins/counsel/skills/opportunity-map/SKILL.md +307 -0
- package/dist/plugins/counsel/skills/pitch-generator/SKILL.md +378 -0
- package/dist/plugins/counsel/skills/roi-calculator/SKILL.md +469 -0
- package/dist/plugins/counsel/skills/share/SKILL.md +211 -0
- package/dist/plugins/counsel/skills/solution-architect/SKILL.md +566 -0
- package/dist/plugins/counsel/skills/templates/SKILL.md +194 -0
- package/dist/plugins/counsel/skills/welcome/SKILL.md +136 -0
- package/dist/plugins/counsel/skills/wizard/SKILL.md +411 -0
- package/dist/plugins/cruise/.claude-plugin/plugin.json +6 -0
- package/dist/plugins/cruise/skills/autopilot/SKILL.md +425 -0
- package/dist/plugins/forge/.claude-plugin/plugin.json +6 -0
- package/dist/plugins/forge/agents/architect.md +174 -0
- package/dist/plugins/forge/agents/code-reviewer.md +115 -0
- package/dist/plugins/forge/agents/frontend.md +261 -0
- package/dist/plugins/forge/agents/product-manager.md +563 -0
- package/dist/plugins/forge/agents/python-backend.md +286 -0
- package/dist/plugins/forge/agents/qa.md +226 -0
- package/dist/plugins/forge/hooks/hooks.json +28 -0
- package/dist/plugins/forge/hooks/post-test-summary.sh +115 -0
- package/dist/plugins/forge/hooks/triage-router.sh +740 -0
- package/dist/plugins/forge/skills/implement/SKILL.md +532 -0
- package/dist/plugins/forge/skills/planning/SKILL.md +436 -0
- package/dist/plugins/forge/skills/pr/SKILL.md +275 -0
- package/dist/plugins/forge/skills/precheck/SKILL.md +159 -0
- package/dist/plugins/forge/skills/qa/SKILL.md +127 -0
- package/dist/plugins/forge/skills/review-pr/SKILL.md +367 -0
- package/dist/plugins/prime/.claude-plugin/plugin.json +6 -0
- package/dist/plugins/prime/agents/ai-consultant.md +999 -0
- package/dist/plugins/prime/agents/architect.md +174 -0
- package/dist/plugins/prime/agents/code-reviewer.md +115 -0
- package/dist/plugins/prime/agents/competitive-analyst.md +688 -0
- package/dist/plugins/prime/agents/content-strategist.md +607 -0
- package/dist/plugins/prime/agents/design-studio-create.md +304 -0
- package/dist/plugins/prime/agents/design-studio-critique.md +258 -0
- package/dist/plugins/prime/agents/design-studio-think.md +79 -0
- package/dist/plugins/prime/agents/explore-light.md +52 -0
- package/dist/plugins/prime/agents/frontend.md +261 -0
- package/dist/plugins/prime/agents/gtm-expert.md +811 -0
- package/dist/plugins/prime/agents/meeting-prep.md +318 -0
- package/dist/plugins/prime/agents/ops.md +149 -0
- package/dist/plugins/prime/agents/product-manager.md +563 -0
- package/dist/plugins/prime/agents/python-backend.md +286 -0
- package/dist/plugins/prime/agents/qa-baseline-updater.md +45 -0
- package/dist/plugins/prime/agents/qa-challenger.md +97 -0
- package/dist/plugins/prime/agents/qa-domain.md +145 -0
- package/dist/plugins/prime/agents/qa-e2e.md +184 -0
- package/dist/plugins/prime/agents/qa-test-promoter.md +97 -0
- package/dist/plugins/prime/agents/qa.md +226 -0
- package/dist/plugins/prime/agents/setup.md +134 -0
- package/dist/plugins/prime/agents/sre.md +165 -0
- package/dist/plugins/prime/agents/stakeholder-reporter.md +94 -0
- package/dist/plugins/prime/agents/user-researcher.md +602 -0
- package/dist/plugins/prime/hooks/check-deliverable.sh +65 -0
- package/dist/plugins/prime/hooks/ensure-client-dir.sh +59 -0
- package/dist/plugins/prime/hooks/hooks.json +184 -0
- package/dist/plugins/prime/hooks/post-deploy-health.sh +83 -0
- package/dist/plugins/prime/hooks/post-diff-test-compare.sh +125 -0
- package/dist/plugins/prime/hooks/post-edit-lint.sh +92 -0
- package/dist/plugins/prime/hooks/post-git-state.sh +54 -0
- package/dist/plugins/prime/hooks/post-merge-cleanup.sh +101 -0
- package/dist/plugins/prime/hooks/post-test-summary.sh +115 -0
- package/dist/plugins/prime/hooks/pre-bash-guard.sh +142 -0
- package/dist/plugins/prime/hooks/pre-edit-guard.sh +121 -0
- package/dist/plugins/prime/hooks/pre-task-context.sh +113 -0
- package/dist/plugins/prime/hooks/session-bootstrap.sh +379 -0
- package/dist/plugins/prime/hooks/session-end.sh +107 -0
- package/dist/plugins/prime/hooks/session-summary.sh +97 -0
- package/dist/plugins/prime/hooks/sync-agents.sh +269 -0
- package/dist/plugins/prime/hooks/triage-router.sh +740 -0
- package/dist/plugins/prime/skills/arth/SKILL.md +165 -0
- package/dist/plugins/prime/skills/autopilot/SKILL.md +425 -0
- package/dist/plugins/prime/skills/calibrate/SKILL.md +1807 -0
- package/dist/plugins/prime/skills/ci-fix/SKILL.md +293 -0
- package/dist/plugins/prime/skills/client-discovery/SKILL.md +266 -0
- package/dist/plugins/prime/skills/consulting/SKILL.md +282 -0
- package/dist/plugins/prime/skills/custom-domain/SKILL.md +261 -0
- package/dist/plugins/prime/skills/deliverable-builder/SKILL.md +928 -0
- package/dist/plugins/prime/skills/discord-ops/SKILL.md +125 -0
- package/dist/plugins/prime/skills/engagement-tracker/SKILL.md +380 -0
- package/dist/plugins/prime/skills/explore.md +43 -0
- package/dist/plugins/prime/skills/fix/SKILL.md +1058 -0
- package/dist/plugins/prime/skills/implement/SKILL.md +532 -0
- package/dist/plugins/prime/skills/incident/SKILL.md +910 -0
- package/dist/plugins/prime/skills/issue/SKILL.md +134 -0
- package/dist/plugins/prime/skills/market-research/SKILL.md +300 -0
- package/dist/plugins/prime/skills/onboard/SKILL.md +344 -0
- package/dist/plugins/prime/skills/opportunity-map/SKILL.md +307 -0
- package/dist/plugins/prime/skills/pitch-generator/SKILL.md +378 -0
- package/dist/plugins/prime/skills/planning/SKILL.md +436 -0
- package/dist/plugins/prime/skills/pr/SKILL.md +275 -0
- package/dist/plugins/prime/skills/precheck/SKILL.md +159 -0
- package/dist/plugins/prime/skills/qa/SKILL.md +127 -0
- package/dist/plugins/prime/skills/qa-incident/SKILL.md +54 -0
- package/dist/plugins/prime/skills/qa-learn/SKILL.md +47 -0
- package/dist/plugins/prime/skills/restart/SKILL.md +70 -0
- package/dist/plugins/prime/skills/review-pr/SKILL.md +367 -0
- package/dist/plugins/prime/skills/roi-calculator/SKILL.md +469 -0
- package/dist/plugins/prime/skills/scan/SKILL.md +232 -0
- package/dist/plugins/prime/skills/setup/SKILL.md +691 -0
- package/dist/plugins/prime/skills/share/SKILL.md +211 -0
- package/dist/plugins/prime/skills/solution-architect/SKILL.md +566 -0
- package/dist/plugins/prime/skills/sre/SKILL.md +362 -0
- package/dist/plugins/prime/skills/sync/SKILL.md +188 -0
- package/dist/plugins/prime/skills/templates/SKILL.md +194 -0
- package/dist/plugins/prime/skills/welcome/SKILL.md +136 -0
- package/dist/plugins/prime/skills/wizard/SKILL.md +411 -0
- package/dist/plugins/prism/.claude-plugin/plugin.json +6 -0
- package/dist/plugins/prism/agents/qa-baseline-updater.md +45 -0
- package/dist/plugins/prism/agents/qa-challenger.md +97 -0
- package/dist/plugins/prism/agents/qa-domain.md +145 -0
- package/dist/plugins/prism/agents/qa-e2e.md +184 -0
- package/dist/plugins/prism/agents/qa-test-promoter.md +97 -0
- package/dist/plugins/prism/agents/qa.md +226 -0
- package/dist/plugins/prism/hooks/hooks.json +26 -0
- package/dist/plugins/prism/hooks/post-diff-test-compare.sh +125 -0
- package/dist/plugins/prism/hooks/post-test-summary.sh +115 -0
- package/dist/plugins/prism/skills/qa/SKILL.md +127 -0
- package/dist/plugins/prism/skills/qa-incident/SKILL.md +54 -0
- package/dist/plugins/prism/skills/qa-learn/SKILL.md +47 -0
- package/dist/plugins/scalpel/.claude-plugin/plugin.json +6 -0
- package/dist/plugins/scalpel/agents/code-reviewer.md +115 -0
- package/dist/plugins/scalpel/hooks/hooks.json +26 -0
- package/dist/plugins/scalpel/hooks/pre-edit-guard.sh +121 -0
- package/dist/plugins/scalpel/skills/ci-fix/SKILL.md +293 -0
- package/dist/plugins/scalpel/skills/fix/SKILL.md +1058 -0
- package/dist/plugins/scalpel/skills/issue/SKILL.md +134 -0
- package/dist/plugins/sentinel/.claude-plugin/plugin.json +6 -0
- package/dist/plugins/sentinel/agents/ops.md +149 -0
- package/dist/plugins/sentinel/agents/sre.md +165 -0
- package/dist/plugins/sentinel/hooks/hooks.json +26 -0
- package/dist/plugins/sentinel/hooks/post-deploy-health.sh +83 -0
- package/dist/plugins/sentinel/hooks/post-git-state.sh +54 -0
- package/dist/plugins/sentinel/skills/incident/SKILL.md +910 -0
- package/dist/plugins/sentinel/skills/restart/SKILL.md +70 -0
- package/dist/plugins/sentinel/skills/sre/SKILL.md +362 -0
- package/dist/plugins/shield/.claude-plugin/plugin.json +6 -0
- package/dist/plugins/shield/hooks/hooks.json +60 -0
- package/dist/plugins/shield/hooks/pre-bash-guard.sh +142 -0
- package/dist/plugins/shield/hooks/pre-edit-guard.sh +121 -0
- package/dist/plugins/shield/hooks/session-bootstrap.sh +379 -0
- package/dist/plugins/shield/hooks/triage-router.sh +740 -0
- package/dist/plugins/spark/.claude-plugin/plugin.json +6 -0
- package/dist/plugins/spark/agents/explore-light.md +52 -0
- package/dist/plugins/spark/agents/setup.md +134 -0
- package/dist/plugins/spark/hooks/hooks.json +16 -0
- package/dist/plugins/spark/hooks/session-bootstrap.sh +379 -0
- package/dist/plugins/spark/skills/calibrate/SKILL.md +1807 -0
- package/dist/plugins/spark/skills/onboard/SKILL.md +344 -0
- package/dist/plugins/spark/skills/scan/SKILL.md +232 -0
- package/dist/plugins/spark/skills/setup/SKILL.md +691 -0
- package/hook-defs.json +104 -0
- package/hooks/check-deliverable.sh +65 -0
- package/hooks/ensure-client-dir.sh +59 -0
- package/hooks/hooks.json +16 -0
- package/hooks/post-deploy-health.sh +83 -0
- package/hooks/post-diff-test-compare.sh +125 -0
- package/hooks/post-edit-lint.sh +92 -0
- package/hooks/post-git-state.sh +54 -0
- package/hooks/post-merge-cleanup.sh +101 -0
- package/hooks/post-test-summary.sh +115 -0
- package/hooks/pre-bash-guard.sh +142 -0
- package/hooks/pre-edit-guard.sh +121 -0
- package/hooks/pre-task-context.sh +113 -0
- package/hooks/session-bootstrap.sh +379 -0
- package/hooks/session-end.sh +107 -0
- package/hooks/session-start.sh +46 -0
- package/hooks/session-summary.sh +97 -0
- package/hooks/sync-agents.sh +269 -0
- package/hooks/triage-router.sh +740 -0
- package/install.sh +3185 -0
- package/package.json +40 -0
- package/portable.manifest +112 -0
- package/skills/arth/SKILL.md +165 -0
- package/skills/autopilot/SKILL.md +425 -0
- package/skills/calibrate/SKILL.md +1807 -0
- package/skills/ci-fix/SKILL.md +293 -0
- package/skills/client-discovery/SKILL.md +266 -0
- package/skills/consulting/SKILL.md +282 -0
- package/skills/continue/SKILL.md +174 -0
- package/skills/custom-domain/SKILL.md +261 -0
- package/skills/deliverable-builder/SKILL.md +928 -0
- package/skills/discord-ops/SKILL.md +125 -0
- package/skills/engagement-tracker/SKILL.md +380 -0
- package/skills/explore.md +43 -0
- package/skills/fix/SKILL.md +1058 -0
- package/skills/implement/SKILL.md +532 -0
- package/skills/incident/SKILL.md +910 -0
- package/skills/issue/SKILL.md +134 -0
- package/skills/market-research/SKILL.md +300 -0
- package/skills/onboard/SKILL.md +344 -0
- package/skills/opportunity-map/SKILL.md +307 -0
- package/skills/pitch-generator/SKILL.md +378 -0
- package/skills/planning/SKILL.md +436 -0
- package/skills/pr/SKILL.md +275 -0
- package/skills/precheck/SKILL.md +159 -0
- package/skills/qa/SKILL.md +127 -0
- package/skills/qa-incident/SKILL.md +54 -0
- package/skills/qa-learn/SKILL.md +47 -0
- package/skills/railway/central-station/SKILL.md +226 -0
- package/skills/railway/central-station/references/environment-config.md +183 -0
- package/skills/railway/central-station/references/monorepo.md +216 -0
- package/skills/railway/central-station/references/railpack.md +257 -0
- package/skills/railway/central-station/references/variables.md +170 -0
- package/skills/railway/database/SKILL.md +284 -0
- package/skills/railway/database/references/environment-config.md +183 -0
- package/skills/railway/database/references/monorepo.md +216 -0
- package/skills/railway/database/references/railpack.md +257 -0
- package/skills/railway/database/references/variables.md +170 -0
- package/skills/railway/database/scripts/railway-api.sh +41 -0
- package/skills/railway/deploy/SKILL.md +128 -0
- package/skills/railway/deploy/references/environment-config.md +183 -0
- package/skills/railway/deploy/references/monorepo.md +216 -0
- package/skills/railway/deploy/references/railpack.md +257 -0
- package/skills/railway/deploy/references/variables.md +170 -0
- package/skills/railway/deployment/SKILL.md +222 -0
- package/skills/railway/deployment/references/environment-config.md +183 -0
- package/skills/railway/deployment/references/monorepo.md +216 -0
- package/skills/railway/deployment/references/railpack.md +257 -0
- package/skills/railway/deployment/references/variables.md +170 -0
- package/skills/railway/domain/SKILL.md +137 -0
- package/skills/railway/domain/references/environment-config.md +183 -0
- package/skills/railway/domain/references/monorepo.md +216 -0
- package/skills/railway/domain/references/railpack.md +257 -0
- package/skills/railway/domain/references/variables.md +170 -0
- package/skills/railway/environment/SKILL.md +266 -0
- package/skills/railway/environment/references/environment-config.md +183 -0
- package/skills/railway/environment/references/monorepo.md +216 -0
- package/skills/railway/environment/references/railpack.md +257 -0
- package/skills/railway/environment/references/variables.md +170 -0
- package/skills/railway/metrics/SKILL.md +211 -0
- package/skills/railway/metrics/references/environment-config.md +183 -0
- package/skills/railway/metrics/references/monorepo.md +216 -0
- package/skills/railway/metrics/references/railpack.md +257 -0
- package/skills/railway/metrics/references/variables.md +170 -0
- package/skills/railway/metrics/scripts/railway-api.sh +41 -0
- package/skills/railway/new/SKILL.md +489 -0
- package/skills/railway/new/references/environment-config.md +183 -0
- package/skills/railway/new/references/monorepo.md +216 -0
- package/skills/railway/new/references/railpack.md +257 -0
- package/skills/railway/new/references/variables.md +170 -0
- package/skills/railway/projects/SKILL.md +142 -0
- package/skills/railway/projects/references/environment-config.md +183 -0
- package/skills/railway/projects/references/monorepo.md +216 -0
- package/skills/railway/projects/references/railpack.md +257 -0
- package/skills/railway/projects/references/variables.md +170 -0
- package/skills/railway/projects/scripts/railway-api.sh +41 -0
- package/skills/railway/railway-docs/SKILL.md +47 -0
- package/skills/railway/railway-docs/references/environment-config.md +183 -0
- package/skills/railway/railway-docs/references/monorepo.md +216 -0
- package/skills/railway/railway-docs/references/railpack.md +257 -0
- package/skills/railway/railway-docs/references/variables.md +170 -0
- package/skills/railway/service/SKILL.md +249 -0
- package/skills/railway/service/references/environment-config.md +183 -0
- package/skills/railway/service/references/monorepo.md +216 -0
- package/skills/railway/service/references/railpack.md +257 -0
- package/skills/railway/service/references/variables.md +170 -0
- package/skills/railway/service/scripts/railway-api.sh +41 -0
- package/skills/railway/status/SKILL.md +91 -0
- package/skills/railway/status/references/environment-config.md +183 -0
- package/skills/railway/status/references/monorepo.md +216 -0
- package/skills/railway/status/references/railpack.md +257 -0
- package/skills/railway/status/references/variables.md +170 -0
- package/skills/railway/templates/SKILL.md +275 -0
- package/skills/railway/templates/references/environment-config.md +183 -0
- package/skills/railway/templates/references/monorepo.md +216 -0
- package/skills/railway/templates/references/railpack.md +257 -0
- package/skills/railway/templates/references/variables.md +170 -0
- package/skills/railway/templates/scripts/railway-api.sh +41 -0
- package/skills/restart/SKILL.md +70 -0
- package/skills/review-pr/SKILL.md +367 -0
- package/skills/roi-calculator/SKILL.md +469 -0
- package/skills/scan/SKILL.md +232 -0
- package/skills/setup/SKILL.md +691 -0
- package/skills/share/SKILL.md +211 -0
- package/skills/solution-architect/SKILL.md +566 -0
- package/skills/sre/SKILL.md +362 -0
- package/skills/superpowers/brainstorming/SKILL.md +96 -0
- package/skills/superpowers/dispatching-parallel-agents/SKILL.md +180 -0
- package/skills/superpowers/executing-plans/SKILL.md +84 -0
- package/skills/superpowers/finishing-a-development-branch/SKILL.md +200 -0
- package/skills/superpowers/receiving-code-review/SKILL.md +213 -0
- package/skills/superpowers/requesting-code-review/SKILL.md +105 -0
- package/skills/superpowers/requesting-code-review/code-reviewer.md +146 -0
- package/skills/superpowers/subagent-driven-development/SKILL.md +242 -0
- package/skills/superpowers/subagent-driven-development/code-quality-reviewer-prompt.md +20 -0
- package/skills/superpowers/subagent-driven-development/implementer-prompt.md +78 -0
- package/skills/superpowers/subagent-driven-development/spec-reviewer-prompt.md +61 -0
- package/skills/superpowers/systematic-debugging/CREATION-LOG.md +119 -0
- package/skills/superpowers/systematic-debugging/SKILL.md +296 -0
- package/skills/superpowers/systematic-debugging/condition-based-waiting-example.ts +158 -0
- package/skills/superpowers/systematic-debugging/condition-based-waiting.md +115 -0
- package/skills/superpowers/systematic-debugging/defense-in-depth.md +122 -0
- package/skills/superpowers/systematic-debugging/find-polluter.sh +63 -0
- package/skills/superpowers/systematic-debugging/root-cause-tracing.md +169 -0
- package/skills/superpowers/systematic-debugging/test-academic.md +14 -0
- package/skills/superpowers/systematic-debugging/test-pressure-1.md +58 -0
- package/skills/superpowers/systematic-debugging/test-pressure-2.md +68 -0
- package/skills/superpowers/systematic-debugging/test-pressure-3.md +69 -0
- package/skills/superpowers/test-driven-development/SKILL.md +371 -0
- package/skills/superpowers/test-driven-development/testing-anti-patterns.md +299 -0
- package/skills/superpowers/using-git-worktrees/SKILL.md +218 -0
- package/skills/superpowers/using-superpowers/SKILL.md +95 -0
- package/skills/superpowers/verification-before-completion/SKILL.md +139 -0
- package/skills/superpowers/writing-plans/SKILL.md +116 -0
- package/skills/superpowers/writing-skills/SKILL.md +655 -0
- package/skills/superpowers/writing-skills/anthropic-best-practices.md +1150 -0
- package/skills/superpowers/writing-skills/examples/CLAUDE_MD_TESTING.md +189 -0
- package/skills/superpowers/writing-skills/graphviz-conventions.dot +172 -0
- package/skills/superpowers/writing-skills/persuasion-principles.md +187 -0
- package/skills/superpowers/writing-skills/render-graphs.js +168 -0
- package/skills/superpowers/writing-skills/testing-skills-with-subagents.md +384 -0
- package/skills/sync/SKILL.md +188 -0
- package/skills/templates/SKILL.md +194 -0
- package/skills/welcome/SKILL.md +136 -0
- package/skills/wizard/SKILL.md +411 -0
- package/templates/CLAUDE.md.managed-block +123 -0
- package/templates/CLAUDE.md.template +111 -0
- package/templates/consulting/engagement-tracker-template.md +181 -0
- package/templates/consulting/executive-summary-template.md +83 -0
- package/templates/consulting/maturity-assessment-template.md +182 -0
- package/templates/consulting/proposal-template.md +209 -0
- package/templates/consulting/roi-model-template.md +139 -0
- package/templates/consulting/solution-architecture-template.md +313 -0
- package/templates/settings.json +130 -0
@@ -0,0 +1,910 @@
---
name: incident
description: "Incident triage orchestrator — classifies severity, diagnoses in parallel, routes to /sre, /ci-fix, or /fix based on evidence. Usage: /incident <description>"
user-invocable: true
arguments: "<description>"
---

# /incident — Incident Triage Orchestrator

The single entry point for anything going wrong. It classifies the problem, diagnoses it, and
routes it to the right skill, so you don't need to work out whether it's an infra issue, a CI
failure, or a code bug. `/incident` figures it out for you.

**Flow: Classify → Parallel Diagnosis → Correlate → Challenge → Verdict → Route → Resolve → Learn**

## When to Use

- "The site is down"
- "CI is failing"
- "There's a 500 error in production"
- "Staging deploy failed"
- "The database is slow"
- "Users are getting timeouts"
- "Fix issue #234"
- Any time something is broken and you don't know which skill to use

## Argument Parsing

| Input | Action |
|-------|--------|
| `/incident the site is down` | Free-text incident description |
| `/incident #234` | Load incident from GitHub issue |
| `/incident` (no args) | Auto-detect — check health, CI, recent deploys |
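The three input modes in this table can be sketched as a small parser. This is a minimal Python illustration, not part of the skill itself: `parse_incident_args` and `IncidentInput` are hypothetical names, and it assumes only a bare `#N` argument counts as an issue reference, so text such as "Fix issue #234" stays a free-text description.

```python
import re
from dataclasses import dataclass
from typing import Optional

ISSUE_REF = re.compile(r"^#(\d+)$")  # a bare "#234" and nothing else


@dataclass
class IncidentInput:
    mode: str  # "auto_detect", "github_issue", or "free_text"
    description: str = ""
    issue_number: Optional[int] = None


def parse_incident_args(raw: str) -> IncidentInput:
    """Map the text after `/incident` onto one of the three input modes."""
    text = raw.strip()
    if not text:
        # No arguments: check health, CI, and recent deploys automatically.
        return IncidentInput(mode="auto_detect")
    match = ISSUE_REF.match(text)
    if match:
        # Load the incident from the referenced GitHub issue.
        return IncidentInput(mode="github_issue", issue_number=int(match.group(1)))
    # Anything else is treated as a free-text incident description.
    return IncidentInput(mode="free_text", description=text)
```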

---

## Phase 1: Classify (instant — no agents, just pattern matching)

From the description, classify **severity** and **type** immediately.

### Severity Classification

| Keywords | Severity | Response Time |
|----------|----------|---------------|
| down, outage, crash, 500 in production, data loss, security breach | **CRITICAL** | Act immediately, no questions |
| slow, timeout, degraded, errors increasing, staging broken | **HIGH** | Act within minutes |
| failing, broken, error, not working, regression | **MEDIUM** | Investigate, then act |
| flaky, intermittent, warning, minor, cosmetic | **LOW** | Queue for next available time |
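The table implies a precedence rule: check the most severe tier first and stop at the first hit, so "errors increasing" lands in HIGH before the MEDIUM keyword "error" is considered. Below is a minimal Python sketch of that ordering; `classify_severity` is a hypothetical helper and falling back to MEDIUM when nothing matches is an assumption. Literal keyword matching misses paraphrases (for example "500 error in production" does not contain "500 in production"), so in practice the orchestrating model judges severity; the sketch only shows the tier order.

```python
import re

# Ordered from most to least severe: the first tier with a hit wins.
SEVERITY_TIERS = [
    ("CRITICAL", ["down", "outage", "crash", "500 in production", "data loss", "security breach"]),
    ("HIGH", ["slow", "timeout", "degraded", "errors increasing", "staging broken"]),
    ("MEDIUM", ["failing", "broken", "error", "not working", "regression"]),
    ("LOW", ["flaky", "intermittent", "warning", "minor", "cosmetic"]),
]


def _mentions(text: str, keyword: str) -> bool:
    # Anchor at a word start: "timeouts" hits "timeout", but "slowdown" does not hit "down".
    return re.search(r"\b" + re.escape(keyword), text) is not None


def classify_severity(description: str, default: str = "MEDIUM") -> str:
    text = description.lower()
    for severity, keywords in SEVERITY_TIERS:
        if any(_mentions(text, kw) for kw in keywords):
            return severity
    return default  # assumption: unmatched reports are investigated before acting
```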

### Type Classification

| Keywords | Type | Primary Route |
|----------|------|---------------|
| CI, pipeline, build failed, lint, test fail, Actions, workflow | **CI Failure** | `/ci-fix` |
| deploy, health, down, outage, production, staging, 500, 502, 503 | **Infrastructure** | `/sre debug` |
| bug, fix, issue #, regression, broken feature, wrong behavior | **Code Bug** | `/fix` |
| slow, performance, timeout, latency, memory, CPU | **Performance** | `/sre debug` (perf focus) |
| restart, start, stop, hung, frozen, local | **Local Ops** | `/restart` or ops agent |
| database, migration, schema, data corruption | **Data Issue** | `/sre debug` (data focus) |
| auth, login, permission, 401, 403 | **Auth Issue** | Could be infra OR code — diagnose first |
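Unlike severity, one description can hit several types at once: "The database is slow" matches both Performance and Data Issue keywords. This hypothetical `classify_type` sketch returns every matching type in table order and leaves the final choice to the evidence gathered in later phases. The simple suffix handling (plural, -ed, -ing) is an assumption added so words like "timeouts" or "restarted" still match, while "start" does not fire inside "restart".

```python
import re

# (type, primary route, keywords), in the same order as the table above.
TYPE_RULES = [
    ("CI Failure", "/ci-fix", ["ci", "pipeline", "build failed", "lint", "test fail", "actions", "workflow"]),
    ("Infrastructure", "/sre debug", ["deploy", "health", "down", "outage", "production", "staging", "500", "502", "503"]),
    ("Code Bug", "/fix", ["bug", "fix", "issue #", "regression", "broken feature", "wrong behavior"]),
    ("Performance", "/sre debug (perf focus)", ["slow", "performance", "timeout", "latency", "memory", "cpu"]),
    ("Local Ops", "/restart or ops agent", ["restart", "start", "stop", "hung", "frozen", "local"]),
    ("Data Issue", "/sre debug (data focus)", ["database", "migration", "schema", "data corruption"]),
    ("Auth Issue", "diagnose first (infra or code)", ["auth", "login", "permission", "401", "403"]),
]


def _keyword_pattern(keyword: str):
    # Whole word or phrase, allowing a simple suffix ("timeouts", "restarted", "failing").
    return re.compile(r"\b" + re.escape(keyword) + r"(?:s|es|ed|ing)?\b")


COMPILED_RULES = [
    (name, route, [_keyword_pattern(kw) for kw in keywords])
    for name, route, keywords in TYPE_RULES
]


def classify_type(description: str) -> list:
    """Return every (type, route) pair whose keywords appear, in table order."""
    text = description.lower()
    return [
        (name, route)
        for name, route, patterns in COMPILED_RULES
        if any(p.search(text) for p in patterns)
    ]
```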

### Output after classification

```
INCIDENT — {severity}: {type}
Description: {user's description}
```

Don't ask questions for CRITICAL severity; proceed straight to Phase 2.
For HIGH/MEDIUM, briefly confirm the description, then proceed.
For LOW, ask whether the user wants a full triage or just a quick check.

---

## Phase 2: Parallel Diagnosis (< 60 seconds)

Spawn 4 cheap agents IN PARALLEL to gather evidence fast. Use the explore-light and ops
agents (Haiku, 1x cost) rather than Sonnet/Opus: speed and cost matter here.

### 2.1: Health Check (explore-light)

```
subagent_type: "explore-light"
name: "health-check"
prompt: "Read CLAUDE.md for health endpoints and infrastructure details.
Hit every health endpoint listed. Also check:
- Production URL: {from CLAUDE.md or project-profile.md}
- Staging URL: {if configured}
Report pass/fail for each endpoint with response time and status code."
```
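What the health-check agent is asked to report (pass/fail, status code, response time) maps onto a plain HTTP probe. The sketch below uses only Python's standard library; `check_endpoint`, `summarize`, and `EndpointResult` are hypothetical names, the URLs are assumed to have been read from CLAUDE.md already, and counting any 2xx or 3xx status as a pass is an assumption.

```python
import time
import urllib.error
import urllib.request
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndpointResult:
    url: str
    ok: bool
    status: Optional[int]  # None when no HTTP response came back at all
    elapsed_ms: float
    error: str = ""


def check_endpoint(url: str, timeout: float = 5.0) -> EndpointResult:
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status = resp.status
        elapsed = (time.monotonic() - start) * 1000
        return EndpointResult(url, 200 <= status < 400, status, elapsed)
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses arrive as HTTPError but still carry a status code.
        elapsed = (time.monotonic() - start) * 1000
        return EndpointResult(url, False, exc.code, elapsed, str(exc.reason))
    except OSError as exc:
        # URLError, refused connections, and socket timeouts are all OSError subclasses.
        elapsed = (time.monotonic() - start) * 1000
        return EndpointResult(url, False, None, elapsed, str(exc))


def summarize(result: EndpointResult) -> str:
    verdict = "PASS" if result.ok else "FAIL"
    status = result.status if result.status is not None else "no response"
    return f"{verdict} {result.url} status={status} time={result.elapsed_ms:.0f}ms"
```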

### 2.2: Recent Deploys (ops agent)

```
subagent_type: "ops"
name: "deploy-check"
prompt: "Check recent deployment activity:
1. git log --oneline -5 main (what recently shipped)
2. gh run list --branch main --limit 5 (CI status)
3. If Railway: railway status (or platform-appropriate command)
Report: what was the last deploy, when, did CI pass, any errors."
```
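To turn the `gh run list` step into the GREEN/RED CI signal used in Phase 3, the run list can be requested as JSON (`--json` with fields such as `status`, `conclusion`, `displayTitle`, `createdAt`, `headSha`) and reduced to one verdict. In this sketch `fetch_runs` and `ci_verdict` are hypothetical helpers, and the rule that cancelled or skipped runs are passed over in favor of the next completed run is an assumption.

```python
import json
import subprocess

GH_FIELDS = "status,conclusion,displayTitle,createdAt,headSha"


def fetch_runs(branch: str = "main", limit: int = 5) -> list:
    """Call the GitHub CLI (must be installed and authenticated) and parse its JSON."""
    out = subprocess.run(
        ["gh", "run", "list", "--branch", branch, "--limit", str(limit), "--json", GH_FIELDS],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def ci_verdict(runs: list) -> str:
    """Classify CI from the newest decisive run: GREEN, RED, or UNKNOWN."""
    for run in runs:  # gh lists the most recent runs first
        if run.get("status") != "completed":
            continue  # queued or in-progress runs are not a verdict yet
        conclusion = run.get("conclusion")
        if conclusion == "success":
            return "GREEN"
        if conclusion in ("failure", "timed_out", "startup_failure"):
            return "RED"
        # cancelled/skipped runs say little about code health: keep looking
    return "UNKNOWN"
```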

### 2.3: Error Signals (ops agent)

```
subagent_type: "ops"
name: "error-check"
prompt: "Check for error signals:
1. If deploy platform has logs: get last 50 lines, filter for ERROR/WARN/FATAL
2. If Sentry configured (check .env or CLAUDE.md): note DSN existence
3. If Docker running: docker ps --format table, check for unhealthy/restarting containers
4. Check if any services are expected but not running (from CLAUDE.md Local Dev Services)
Report: active errors, unhealthy services, recent error patterns."
```
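Steps 1 and 3 of the error-check prompt reduce to two small filters. This is a hedged Python sketch with hypothetical names (`error_signals`, `unhealthy_containers`); it assumes log levels appear as upper-case tokens, and that container state is listed with `docker ps --format '{{.Names}}\t{{.Status}}'`, where Docker reports states such as `Up 3 minutes (unhealthy)` or `Restarting (1) 5 seconds ago`.

```python
import re
from collections import Counter

LEVEL = re.compile(r"\b(ERROR|WARN(?:ING)?|FATAL)\b")


def error_signals(log_text: str, tail: int = 50):
    """Return (counts per level, matching lines) from the last `tail` log lines."""
    counts = Counter()
    hits = []
    for line in log_text.splitlines()[-tail:]:
        match = LEVEL.search(line)
        if match:
            level = "WARN" if match.group(1).startswith("WARN") else match.group(1)
            counts[level] += 1
            hits.append(line)
    return counts, hits


def unhealthy_containers(docker_ps_output: str):
    """Parse tab-separated name/status rows and keep unhealthy or restarting ones."""
    bad = []
    for row in docker_ps_output.splitlines():
        name, _, status = row.partition("\t")
        if "(unhealthy)" in status or status.startswith("Restarting"):
            bad.append((name, status))
    return bad
```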

### 2.4: Knowledge Base Lookup (ops agent)

```
subagent_type: "ops"
name: "kb-lookup"
prompt: "Search for similar past incidents:
1. Read .claude/qa-knowledge/incidents/ — any files mentioning {affected area keywords}
2. Read .claude/qa-knowledge/bug-patterns.md — any patterns matching {symptoms}
3. Read .claude/knowledge/agents/sre.md — any past resolutions for similar issues
Report: similar past incidents with root causes and how they were resolved."
```
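The first knowledge-base step is essentially a keyword search over past incident notes. A minimal sketch, assuming the notes are Markdown files somewhere under `.claude/qa-knowledge/incidents/` and that ranking by the number of distinct keywords matched is good enough; `find_similar_incidents` is a hypothetical name.

```python
from pathlib import Path


def find_similar_incidents(kb_dir: str, keywords: list, limit: int = 5):
    """Rank incident notes by how many distinct keywords they mention."""
    root = Path(kb_dir)
    if not root.is_dir():
        return []  # no knowledge base yet, so nothing to compare against
    scored = []
    for path in sorted(root.rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace").lower()
        hits = sorted({kw for kw in keywords if kw.lower() in text})
        if hits:
            scored.append((len(hits), path.name, hits))
    # Most keyword hits first; ties broken alphabetically by file name.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored[:limit]
```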

**CRITICAL: Spawn all 4 in ONE message for maximum parallelism.**
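Spawning the four agents in one message is a fan-out: all probes start together, so the wait is bounded by the slowest one rather than the sum of all four. The Python `asyncio` sketch below is only an analogy (subagents are not coroutines); `gather_evidence` is hypothetical, and the per-probe budget mirrors the "< 60 seconds" in the Phase 2 heading. A probe that times out or fails is reported as such instead of discarding the other three results.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Probe = Callable[[], Awaitable[str]]


async def _bounded(name: str, probe: Probe, budget: float):
    try:
        return name, await asyncio.wait_for(probe(), timeout=budget)
    except asyncio.TimeoutError:
        return name, f"TIMED OUT after {budget:.0f}s"  # keep the partial picture
    except Exception as exc:  # one failed probe must not sink the others
        return name, f"FAILED: {exc}"


async def gather_evidence(probes: Dict[str, Probe], budget: float = 60.0) -> Dict[str, str]:
    """Start every probe at once and wait at most `budget` seconds for each."""
    results = await asyncio.gather(*(_bounded(n, p, budget) for n, p in probes.items()))
    return dict(results)
```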

---

## Phase 3: Correlate (analyze the evidence)

When all 4 agents report back, correlate their findings:

### Decision Matrix

| Health | Last Deploy | CI | Errors | Diagnosis | Route |
|--------|-----------|-----|--------|-----------|-------|
| DOWN | Recent (< 1hr) | Green | Deploy errors in logs | **Bad deploy** | Revert or `/fix` |
| DOWN | Recent (< 1hr) | Red | Test failures | **Broken CI shipped** | Revert, then `/ci-fix` |
| DOWN | None recent | — | Infra errors | **Infrastructure failure** | `/sre debug` |
| DOWN | None recent | — | No errors | **External dependency** | Check 3rd party status |
| UP | — | Red | CI errors | **CI broken, prod OK** | `/ci-fix` (not urgent) |
| UP | — | Green | App errors in logs | **Code bug in prod** | `/fix` with log context |
| UP | — | Green | No errors | **Intermittent/resolved** | Monitor, check if still happening |
| SLOW | Recent | — | Timeout errors | **Perf regression from deploy** | `/sre debug` + `/fix` |
| SLOW | None recent | — | DB slow queries | **Database performance** | `/sre debug` (data focus) |
| N/A | — | Red | — | **CI failure only** | `/ci-fix` |
| N/A | — | — | Local errors | **Local dev issue** | ops agent or `/restart` |
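Read top to bottom, the matrix works as a first-match lookup in which the dash cells are wildcards. This sketch assumes the four signals have already been normalized into short labels (`"recent"` for a deploy within the last hour, `"none"` for no errors, and so on); that label vocabulary and `match_row` are hypothetical, while the diagnoses and routes are copied from the table.

```python
ANY = None  # stands in for a dash cell in the matrix

# (health, last_deploy, ci, errors) -> (diagnosis, route)
DECISION_MATRIX = [
    (("DOWN", "recent", "green", "deploy errors"), ("Bad deploy", "Revert or /fix")),
    (("DOWN", "recent", "red", "test failures"), ("Broken CI shipped", "Revert, then /ci-fix")),
    (("DOWN", "none", ANY, "infra errors"), ("Infrastructure failure", "/sre debug")),
    (("DOWN", "none", ANY, "none"), ("External dependency", "Check 3rd party status")),
    (("UP", ANY, "red", "ci errors"), ("CI broken, prod OK", "/ci-fix (not urgent)")),
    (("UP", ANY, "green", "app errors"), ("Code bug in prod", "/fix with log context")),
    (("UP", ANY, "green", "none"), ("Intermittent/resolved", "Monitor, check if still happening")),
    (("SLOW", "recent", ANY, "timeout errors"), ("Perf regression from deploy", "/sre debug + /fix")),
    (("SLOW", "none", ANY, "db slow queries"), ("Database performance", "/sre debug (data focus)")),
    (("N/A", ANY, "red", ANY), ("CI failure only", "/ci-fix")),
    (("N/A", ANY, ANY, "local errors"), ("Local dev issue", "ops agent or /restart")),
]


def match_row(signals):
    """Return (diagnosis, route) for the first row whose non-wildcard cells all match."""
    for pattern, outcome in DECISION_MATRIX:
        if all(cell is ANY or cell == value for cell, value in zip(pattern, signals)):
            return outcome
    return None  # no row fits the evidence
```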

### Correlation Output

```
DIAGNOSIS:
Severity: {CRITICAL/HIGH/MEDIUM/LOW}
Type: {infrastructure/CI/code bug/performance/data/auth}
Evidence:
Health: {UP/DOWN/SLOW} — {details}
Last deploy: {time} — {commit message}
CI: {GREEN/RED} — {details}
Errors: {summary}
Similar past incident: {if found}

Root cause hypothesis: {what we think happened based on correlation}

Recommended action: {specific skill to invoke}

Preliminary confidence: {HIGH (4 signals aligned) / MEDIUM (3 signals aligned) / LOW (2 or fewer signals aligned)}
Dissenting signals: {list any signals that don't fit the primary hypothesis, or "none"}
```

Confidence is computed from how many of the 4 diagnostic signals (Health, Last Deploy, CI, Errors) align with the matched Decision Matrix row:
- 4 signals aligned = HIGH confidence
- 3 signals aligned = MEDIUM confidence
- 2 or fewer signals aligned = LOW confidence
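The confidence rule leaves one detail open: how a dash cell in the matched row counts. This sketch assumes a dash cell places no constraint and therefore counts as aligned, picks the row with the most aligned signals (ties go to the earlier row), and reports the mismatched signals as the dissenting ones. `assess`, `aligned_count`, and the `rows` format are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Cells = Tuple[Optional[str], ...]  # four cells; None plays the role of a dash
SIGNAL_NAMES = ("Health", "Last Deploy", "CI", "Errors")


def aligned_count(cells: Cells, signals) -> int:
    # A dash cell (None) is unconstrained, so it counts as aligned.
    return sum(1 for cell, value in zip(cells, signals) if cell is None or cell == value)


def assess(rows: Sequence[Tuple[Cells, str]], signals):
    """Return (diagnosis, confidence, dissenting signal names) for the best row."""
    # max() keeps the first row among equals, so earlier matrix rows win ties.
    cells, diagnosis = max(rows, key=lambda row: aligned_count(row[0], signals))
    count = aligned_count(cells, signals)
    confidence = "HIGH" if count == 4 else "MEDIUM" if count == 3 else "LOW"
    dissent = [name for name, cell, value in zip(SIGNAL_NAMES, cells, signals)
               if cell is not None and cell != value]
    return diagnosis, confidence, dissent or ["none"]
```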

---

## Phase 3b: Competing Hypothesis Challenge
|
|
182
|
+
|
|
183
|
+
Before routing to a resolution skill, subject the primary diagnosis to adversarial challenge. 90 seconds of verification prevents 30 minutes of wrong-path investigation.
|
|
184
|
+
|
|
185
|
+
### Mode Selection
|
|
186
|
+
|
|
187
|
+
| Severity | Confidence | Mode | Time Budget | Agents | Condition |
|
|
188
|
+
|----------|------------|------|-------------|--------|-----------|
|
|
189
|
+
| CRITICAL | HIGH | Skip | 0s | None | Unambiguous evidence — act now |
|
|
190
|
+
| CRITICAL | MEDIUM or LOW | Fast Challenge | 30s | 1 Haiku (qa-challenger) | Quick sanity check before committing |
|
|
191
|
+
| HIGH | Any | Full Challenge | 90s | 2 Haiku agents | Worth 90s to avoid 30min wrong-path |
|
|
192
|
+
| MEDIUM | Any | Full Challenge | 90s | 2 Haiku agents | Same as HIGH |
|
|
193
|
+
| LOW | Any | Full Challenge | 120s | 2 Haiku agents | Extra time, low urgency |
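
The Mode Selection table reduces to a small lookup. A hypothetical sketch (`select_challenge_mode` and `ChallengePlan` are illustrative names, not part of any agent API):

```python
from typing import NamedTuple


class ChallengePlan(NamedTuple):
    mode: str      # "skip", "fast" or "full"
    budget_s: int  # time budget in seconds
    agents: int    # number of Haiku challenger agents


def select_challenge_mode(severity: str, confidence: str) -> ChallengePlan:
    """Apply the Mode Selection table to a severity and a Phase 3 confidence."""
    severity, confidence = severity.upper(), confidence.upper()
    if severity == "CRITICAL":
        # Unambiguous CRITICAL evidence skips the challenge entirely.
        if confidence == "HIGH":
            return ChallengePlan("skip", 0, 0)
        return ChallengePlan("fast", 30, 1)
    if severity in ("HIGH", "MEDIUM"):
        return ChallengePlan("full", 90, 2)
    if severity == "LOW":
        # Low urgency buys extra time, not extra agents.
        return ChallengePlan("full", 120, 2)
    raise ValueError(f"unknown severity: {severity!r}")


print(select_challenge_mode("CRITICAL", "MEDIUM"))
```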

### Fast Challenge Mode (1 agent, 30s)

Spawn a single qa-challenger subagent as a devil's advocate:

```
subagent_type: "qa-challenger"
model: haiku
name: "devil-advocate-fast"
prompt: "You are a devil's advocate reviewing an incident diagnosis.

Incident: {original description}

Evidence gathered:
  Health: {Phase 2.1 result}
  Last deploy: {Phase 2.2 result}
  CI status: {Phase 2.2 CI result}
  Error signals: {Phase 2.3 result}
  KB matches: {Phase 2.4 result}

Primary diagnosis: {hypothesis from Phase 3}
Recommended action: {skill from Phase 3}

Your job: in 30 seconds, challenge this diagnosis.
- Is there an alternative explanation that fits the evidence better?
- Is any key evidence being ignored?
- Could this be a different failure mode?

Output exactly:
CHALLENGE: {alternative hypothesis} OR NONE — diagnosis looks correct
CONFIDENCE: HIGH / MEDIUM / LOW
KEY EVIDENCE: {the piece of evidence that drives your challenge, or n/a}"
```
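
Because the prompt fixes the reply format, the orchestrator can parse it mechanically. A minimal sketch, assuming the agent's reply arrives as plain text; treating a missing or unreadable `CHALLENGE` line as "no challenge" mirrors the timeout rule, and defaulting an unreadable confidence to LOW is an assumption. `parse_challenge` and `Challenge` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class Challenge(NamedTuple):
    alternative: Optional[str]   # None when the challenger answered NONE
    confidence: str              # HIGH / MEDIUM / LOW
    key_evidence: Optional[str]  # None when the challenger answered n/a


FIELD = re.compile(r"^\s*(CHALLENGE|CONFIDENCE|KEY EVIDENCE):\s*(.*)$")


def parse_challenge(reply: str) -> Challenge:
    """Parse a devil's advocate reply; a missing CHALLENGE line counts as no challenge."""
    fields = {}
    for line in reply.splitlines():
        match = FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2).strip()

    challenge = fields.get("CHALLENGE", "")
    # "NONE ..." means the challenger agrees with the primary diagnosis.
    if re.match(r"NONE\b", challenge, re.IGNORECASE):
        alternative = None
    else:
        alternative = challenge or None

    evidence = fields.get("KEY EVIDENCE")
    if evidence is not None and evidence.lower() in ("", "n/a"):
        evidence = None

    # Assumption: an unreadable confidence is treated as LOW.
    confidence = fields.get("CONFIDENCE", "LOW").upper()
    if confidence not in ("HIGH", "MEDIUM", "LOW"):
        confidence = "LOW"
    return Challenge(alternative, confidence, evidence)
```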

The orchestrator enforces the 30-second time budget: if no response arrives within 30s, proceed with the primary hypothesis.
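
A sketch of how an orchestrator written in Python might enforce the budget with `asyncio.wait_for`, which cancels the awaited call once the timeout expires. How the challenger is actually invoked is not specified in this document, so it is modeled as any awaitable that returns the reply text; `challenge_within_budget` is a hypothetical name.

```python
import asyncio
from typing import Awaitable, Optional


async def challenge_within_budget(challenger: Awaitable[str], budget_s: float) -> Optional[str]:
    """Return the challenger's reply, or None if it misses the time budget."""
    try:
        return await asyncio.wait_for(challenger, timeout=budget_s)
    except asyncio.TimeoutError:
        # Budget exhausted: the caller proceeds with the primary hypothesis.
        return None


# Usage: asyncio.sleep(..., result=...) stands in for a real challenger call.
late = asyncio.run(challenge_within_budget(asyncio.sleep(2, result="CHALLENGE: ..."), budget_s=0.1))
print(late)  # None, so proceed with the primary hypothesis
```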

### Full Challenge Mode (2 agents, 90-120s)

Create an agent team for cross-domain adversarial review:

```
TeamCreate: "incident-challenge-{timestamp}"

Agent 1 — devil-advocate:
  subagent_type: "qa-challenger"
  model: haiku
  role: Challenges primary hypothesis, proposes alternatives

Agent 2 — alt-hypothesis:
  subagent_type: "sre" (if primary routes to /fix or /ci-fix)
              OR "ops" (if primary routes to /sre debug — always the OPPOSITE domain)
  model: haiku
  role: Cross-domain verification of the challenge
```

**Agent 2 domain selection rule:** always choose the domain OPPOSITE to where the primary hypothesis routes. If the primary diagnosis says "code bug → /fix", Agent 2 is an SRE (infra perspective). If the primary says "infrastructure → /sre debug", Agent 2 is an ops agent (code/config perspective).
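
The selection rule as a function. A minimal sketch; the document only defines the two cases below, so raising an error for any other route (for example `/restart`) is an assumption, and `alt_hypothesis_agent` is a hypothetical name.

```python
def alt_hypothesis_agent(primary_route: str) -> str:
    """Choose Agent 2's subagent_type: always the domain opposite the primary route."""
    route = primary_route.strip()
    if route in ("/fix", "/ci-fix"):
        return "sre"  # primary blames code or CI, so get the infra view
    if route.startswith("/sre"):
        return "ops"  # primary blames infra, so get the code/config view
    # Assumption: other routes are not covered by the rule.
    raise ValueError(f"no opposite domain defined for {route!r}")


print(alt_hypothesis_agent("/sre debug"))  # ops
```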

**SendMessage flow (max 2 hops):**

Hop 1 — devil-advocate → alt-hypothesis:
```
"Challenge: Primary diagnosis says {primary hypothesis} but I want your cross-domain view.

Evidence:
  Health: {Phase 2.1 result}
  Last deploy: {Phase 2.2 result}
  CI status: {Phase 2.2 CI result}
  Error signals: {Phase 2.3 result}

My alternative hypothesis: {alternative — or 'I agree with primary if no better alternative'}

What do you see from your domain perspective?"
```

Hop 2 — alt-hypothesis → devil-advocate:
```
"Verdict: {AGREE-PRIMARY | AGREE-CHALLENGE | THIRD-HYPOTHESIS}
Reasoning: {brief — max 2 sentences}
Additional evidence: {any cross-domain signal that supports your verdict}"
```

After the two hops, devil-advocate sends a summary of the final positions to the team lead (orchestrator):
```
CHALLENGE SUMMARY:
  devil-advocate position: {primary is correct / alternative: X}
  alt-hypothesis verdict: {AGREE-PRIMARY / AGREE-CHALLENGE / THIRD-HYPOTHESIS: X}
  Key disagreement: {what they disagreed on, or "none — consensus reached"}
```

### Urgency Override

If at any point during Phase 3b the user sends "just fix it" (or an equivalent urgency signal), immediately proceed with the primary hypothesis. Skip the remaining challenge steps and document the override in the incident report.

---

## Phase 3c: Verdict Gate

Compute a confidence score from the Phase 3b results. This score determines whether to proceed, merge, or escalate.

### Confidence Score Computation

Start at 1.0 and apply every adjustment that matches:

| Signal | Adjustment |
|--------|-----------|
| Devil's advocate (DA) says "NONE — diagnosis looks correct" | +0.0 (stay at 1.0) |
| DA proposes alternative with HIGH confidence | -0.4 |
| DA proposes alternative with MEDIUM confidence | -0.2 |
| DA proposes alternative with LOW confidence | -0.1 |
| alt-hypothesis AGREES with challenge (not primary) | -0.3 |
| alt-hypothesis proposes THIRD hypothesis | -0.2 |
| alt-hypothesis AGREES with primary | +0.1 (cap at 1.0) |
| KB found matching past incident (Phase 2.4) | +0.1 (cap at 1.0) |
| Fast Challenge mode (only 1 agent ran) | No penalty for the missing agent; the alt-hypothesis rows simply do not apply |
| Skip mode (CRITICAL + HIGH confidence) | Score = 1.0 by definition |
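
A sketch of the score computation, assuming the Phase 3b outputs have already been parsed into the arguments shown (all names are hypothetical). Adjustments are applied in table order, so capping once at the end gives the same result as capping after each bonus; the final `round` only hides floating-point noise such as `0.30000000000000004`.

```python
from typing import Optional

# Penalty when the devil's advocate (DA) proposes an alternative, keyed by its confidence.
DA_PENALTY = {"HIGH": 0.4, "MEDIUM": 0.2, "LOW": 0.1}
# Adjustment from the alt-hypothesis verdict (full challenge only).
ALT_ADJUSTMENT = {"AGREE-PRIMARY": 0.1, "AGREE-CHALLENGE": -0.3, "THIRD-HYPOTHESIS": -0.2}


def confidence_score(
    mode: str,
    da_alternative: bool,
    da_confidence: str = "LOW",
    alt_verdict: Optional[str] = None,
    kb_match: bool = False,
) -> float:
    """Compute the Phase 3c confidence score from Phase 3b results."""
    if mode == "skip":
        return 1.0  # CRITICAL + HIGH confidence: 1.0 by definition
    score = 1.0
    if da_alternative:
        score -= DA_PENALTY[da_confidence.upper()]
    if mode == "full" and alt_verdict is not None:
        score += ALT_ADJUSTMENT[alt_verdict]  # fast mode has no alt-hypothesis agent
    if kb_match:
        score += 0.1
    # Bonuses are capped at 1.0.
    return round(min(score, 1.0), 2)


print(confidence_score("full", True, "HIGH", "AGREE-CHALLENGE"))  # 0.3
```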

### Verdict Paths

| Score | Verdict | Action |
|-------|---------|--------|
| >= 0.8 | PROCEED | Continue with primary hypothesis unchanged |
| >= 0.5 and < 0.8 | MERGE | Incorporate challenger insights into the fix context. Document fallback hypothesis for Phase 4 agents. Proceed with primary but stay alert for signs of the alternative. |
| < 0.5 | ESCALATE | Do not auto-route. Present competing hypotheses to user. |
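
Read as half-open intervals, the thresholds cover every score with no gap between 0.79 and 0.8. A hypothetical helper:

```python
def verdict(score: float) -> str:
    """Map a Phase 3c confidence score to PROCEED / MERGE / ESCALATE."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    if score >= 0.8:
        return "PROCEED"
    if score >= 0.5:
        return "MERGE"
    return "ESCALATE"


print(verdict(0.795))  # MERGE
```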

**ESCALATE output format:**
```
Competing hypotheses — your call:

[A] Primary: {primary hypothesis}
    Evidence for: {supporting signals}
    Confidence: {preliminary confidence from Phase 3}

[B] Challenger: {alternative hypothesis}
    Evidence for: {challenger's key evidence}
    From: {devil-advocate / alt-hypothesis / both}

[C] Investigate both — run parallel diagnosis targeting both hypotheses

Which path? (A / B / C)
```

### Verdict Output

```
VERDICT GATE:
  Challenge mode: {skip / fast / full}
  Primary hypothesis: {diagnosis}
  Challenger finding: {agreed / alternative: X / third hypothesis: X}
  Confidence score: {0.0–1.0}
  Verdict: {PROCEED / MERGE / ESCALATE}
  Fallback hypothesis: {if MERGE — what to watch for during resolution}
```

---

## Phase 4: Route to Resolution

Based on the diagnosis, invoke the appropriate skill. **Do NOT ask the user which skill to use**; the correlation already determined it. The only exceptions are an ESCALATE verdict from Phase 3c and an inconclusive correlation (4.6), where the user confirms the path.
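
The routing decision can be expressed as a lookup table. This sketch follows sections 4.1 to 4.6 and the Decision Matrix: the `data` entry comes from the matrix's database-performance row, `local` stands for the 4.5 case, `auth` has no dedicated route here and falls through to the 4.6 path, and the `"ask-user"` marker is a hypothetical stand-in for the ESCALATE prompt.

```python
# Diagnosis type -> resolution skill (sections 4.1-4.6 and the Decision Matrix).
ROUTES = {
    "infrastructure": "/sre debug",
    "CI": "/ci-fix",
    "code bug": "/fix",
    "performance": "/sre debug",  # with the perf focus from 4.4
    "data": "/sre debug",         # Decision Matrix: database performance, data focus
    "local": "/restart",          # 4.5; or spawn the ops agent instead
}


def route(diagnosis_type: str, verdict: str) -> str:
    """Pick the resolution skill without asking the user, except on ESCALATE."""
    if verdict == "ESCALATE":
        return "ask-user"  # hypothetical marker: show the competing hypotheses instead
    # Unmapped or inconclusive types (e.g. "auth") take the 4.6 path.
    return ROUTES.get(diagnosis_type, "/sre status")


print(route("code bug", "MERGE"))  # /fix
```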

### 4.1: Infrastructure Issue → `/sre debug`

```
Invoke /sre debug with context:
- Description: {original description}
- Health check results: {from Phase 2.1}
- Deploy history: {from Phase 2.2}
- Error logs: {from Phase 2.3}
- Similar incidents: {from Phase 2.4}
- Challenge results: {verdict from Phase 3c — PROCEED/MERGE/ESCALATE/N/A (challenge skipped) and challenger summary}
- Fallback hypothesis: {if MERGE verdict — alternative hypothesis to watch for during investigation}
```

The SRE agent receives all the context from the parallel diagnosis, so it does not need to rediscover it.

### 4.2: CI Failure → `/ci-fix`

```
Invoke /ci-fix with context:
- Mode: ci (or staging/prod based on diagnosis)
- Branch: {from deploy check}
- Known patterns: {from KB lookup}
- Challenge results: {verdict from Phase 3c — PROCEED/MERGE/ESCALATE/N/A (challenge skipped) and challenger summary}
- Fallback hypothesis: {if MERGE verdict — alternative hypothesis to watch for during investigation}
```

### 4.3: Code Bug → `/fix`

```
Invoke /fix with context:
- Description: {original description + diagnosis}
- Severity: {from Phase 1}
- Error logs: {from Phase 2.3 — gives the agent a head start on root cause}
- Similar incidents: {from Phase 2.4 — may identify root cause immediately}
- Challenge results: {verdict from Phase 3c — PROCEED/MERGE/ESCALATE/N/A (challenge skipped) and challenger summary}
- Fallback hypothesis: {if MERGE verdict — alternative hypothesis to watch for during investigation}
```

If an issue number was provided (`/incident #234`), pass it: `/fix #234 --severity {classified severity}`

### 4.4: Performance Issue → `/sre debug` with perf focus

```
Invoke /sre debug with:
- Description: "Performance degradation: {description}"
- Focus: "Use the RED method (Rate/Errors/Duration) for API endpoints.
  Check: database query times, connection pool, cache hit rates, memory usage."
- Challenge results: {verdict from Phase 3c — PROCEED/MERGE/ESCALATE/N/A (challenge skipped) and challenger summary}
- Fallback hypothesis: {if MERGE verdict — alternative hypothesis to watch for during investigation}
```

### 4.5: Local Ops → `/restart` or ops agent

```
For local issues (services not starting, local errors):
- Invoke the /restart skill
- Or spawn the ops agent for a specific task
```

### 4.6: Unknown / Ambiguous → Full `/sre status` then decide

```
If correlation is inconclusive:
1. Run /sre status for complete system overview
2. Present findings to user
3. Ask: "Based on this, it looks like {hypothesis}. Should I proceed with {skill}?"
```

---

## Phase 5: Post-Resolution

Phase 5 starts only after the routed skill completes its full lifecycle. **Important**: different skills have
different post-implementation flows, so `/incident` must wait for the ENTIRE flow, not just
the fix itself.

### Coordination with routed skill lifecycle

| Routed to | What that skill does after fixing | When /incident Phase 5 starts |
|-----------|----------------------------------|-------------------------------|
| `/fix` | Step 6: QA → restart servers → user tests → `/pr --skip-qa` → PR created | After PR is created (Step 6.4 completes) |
| `/ci-fix` | Retries CI, may push fixes | After CI passes or exhausts retries |
| `/sre debug` | Infra fix (no code) → verify health | Immediately after health verified |
| `/sre debug` → escalates to `/fix` | Same as `/fix` row above | After `/fix`'s full lifecycle completes |
| `/restart` | Restarts services → verify health | Immediately after health verified |

**Key rule**: Do NOT duplicate QA, server restarts, or user testing that the routed skill
already handles. `/incident` Phase 5 is about **verification, reporting, and learning** — not
re-running the same checks.

### 5.1: Verify resolution

For **code bug routes** (`/fix`, `/ci-fix`): the routed skill has already done its own verification
(for `/fix`: QA, server restarts, and user confirmation). Phase 5.1 only re-runs health checks to confirm
the deployed fix (if auto-deployed) or that local state is clean:

```bash
# Re-run the health checks from Phase 2.1
# Compare: are the previously failing endpoints now passing?
# Compare: are the error signals gone?
```
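
The comparison steps can be made concrete once both health-check runs are recorded as endpoint-to-status maps. A minimal sketch; the map format, the 2xx rule, and using `0` for "no response" are assumptions, and the endpoint paths are placeholders.

```python
from typing import Dict, List, Tuple


def is_healthy(status: int) -> bool:
    # Assumption: any 2xx is healthy; 0 stands for "no response".
    return 200 <= status < 300


def compare_health(before: Dict[str, int], after: Dict[str, int]) -> Tuple[List[str], List[str]]:
    """Return (recovered, still_failing) endpoints from two health-check runs."""
    recovered = [
        url for url, status in before.items()
        if not is_healthy(status) and is_healthy(after.get(url, 0))
    ]
    # Anything unhealthy now counts, including endpoints that newly broke.
    still_failing = [url for url, status in after.items() if not is_healthy(status)]
    return recovered, still_failing


print(compare_health({"/health": 503}, {"/health": 200}))  # (['/health'], [])
```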

For **infra routes** (`/sre debug`, `/restart`): These don't go through QA/PR.
Phase 5.1 is the primary verification — check health endpoints and error signals.
If the infra fix involved config or environment changes that could affect behavior:

```
After infra fix, quick sanity check:
1. Hit all health endpoints (from CLAUDE.md Environments table)
2. Check that error signals have cleared
3. If config was changed: ask user to smoke-test manually
```

### 5.2: Write incident report

Create `.claude/qa-knowledge/incidents/{date}-{slug}.md`:

```markdown
---
status: resolved
severity: {severity}
type: {type}
affected: {services/endpoints}
duration: {time from report to resolution}
root_cause: {from the skill that fixed it}
resolved_by: {/sre, /ci-fix, /fix, /restart}
---

## Timeline
- {time}: Incident reported — "{original description}"
- {time}: Parallel diagnosis — {what was found}
- {time}: Routed to {skill} — {diagnosis}
- {time}: Resolution — {what was done}
- {time}: Verified — {health checks pass}

## Root Cause
{from the resolving skill}

## Prevention
{what would prevent this from happening again}

## Similar Past Incidents
{from Phase 2.4 KB lookup}

## Challenge Results
- Challenge mode: {skip / fast / full}
- Primary hypothesis: {diagnosis from Phase 3}
- Challenger finding: {agreed / alternative proposed: X / third hypothesis: X}
- Verdict: {PROCEED / MERGE / ESCALATE}
- Confidence score: {0.0–1.0}
```
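
A sketch of building the report path, assuming `{date}` is an ISO `YYYY-MM-DD` date and `{slug}` is a lowercase, hyphenated form of the incident title. The document defines neither format, and the 60-character cap and the `incident_report_path` name are also assumptions.

```python
import re
from datetime import date
from pathlib import Path

INCIDENTS_DIR = Path(".claude/qa-knowledge/incidents")


def incident_report_path(title: str, day: date, root: Path = INCIDENTS_DIR) -> Path:
    """Build {date}-{slug}.md for a new incident report."""
    # Lowercase, then collapse every run of non-alphanumerics into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    slug = slug[:60].rstrip("-") or "incident"  # keep filenames short and non-empty
    return root / f"{day.isoformat()}-{slug}.md"


print(incident_report_path("API returns 502 on /login!", date(2024, 5, 1)))
```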

### 5.3: Update knowledge base

- Append to `.claude/knowledge/agents/sre.md`: incident summary + resolution
- If new pattern discovered: append to `.claude/qa-knowledge/bug-patterns.md`
- If the diagnosis was wrong (routed to the wrong skill): note the misclassification
  so future triage is more accurate
- Append to `.claude/knowledge/skills/incident.md`: challenge effectiveness data —
  whether the challenger caught a real issue, whether the primary hypothesis was correct,
  and the final confidence score. This builds a record of when challenge modes add value
  vs. when they confirm the obvious.

### 5.4: Notify team

If Discord MCP is available:
```
Send to #deployments:
"Incident resolved — {severity} {type}
Root cause: {summary}
Fix: {what was done}
Duration: {time}"
```

### 5.5: What's next

After resolution is verified and reported:

```
Incident resolved. Duration: {time}.

What's next?
[1] Monitor — keep watching for recurrence (/sre health in 5 min)
[2] Fix another issue — /fix #N or /incident <description>
[3] See project status — /onboard
[4] Done for now
```

For CRITICAL/HIGH severity, default to option 1 (monitor) and suggest running
`/sre health` after 5-10 minutes to confirm the fix holds.

---

## Auto-Detect Mode (`/incident` with no args)

When invoked without a description, run Phase 2 checks proactively:

1. Hit all health endpoints
2. Check CI status
3. Check recent deploys
4. Check for error signals

If everything is green:
```
All systems healthy:
  ✓ Health endpoints: all responding
  ✓ CI: last 3 runs green
  ✓ Deploy: last deploy {time ago}, healthy
  ✓ No error signals detected

Nothing to triage. What prompted the check?
```

If something is wrong, proceed to Phase 3 correlation, Phase 3b challenge, Phase 3c verdict, and Phase 4 routing automatically.

---

## Escalation Rules

| Condition | Escalation |
|-----------|-----------|
| /sre debug finds a code bug | → Escalate to `/fix` with SRE's findings as context |
| /ci-fix exhausts 3 attempts | → Escalate to `/fix` (deeper investigation needed) |
| /fix finds an infra issue (not code) | → Escalate back to `/sre debug` |
| Any skill fails to resolve in 15 min | → Alert on Discord, present options to user |
| CRITICAL severity not resolved in 30 min | → Suggest revert: `git revert HEAD && git push` (if HEAD is a merge commit, use `git revert -m 1 HEAD`) |
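
The two time-based rows can be checked on a timer. A minimal sketch with hypothetical action names; `revert_allowed` models the "don't revert" constraint described under Stakeholder Input During Incident.

```python
from typing import List


def time_based_escalations(severity: str, minutes_unresolved: float,
                           revert_allowed: bool = True) -> List[str]:
    """Return the time-based escalations that are due (hypothetical action names)."""
    actions: List[str] = []
    if minutes_unresolved >= 15:
        actions.append("alert-and-present-options")
    # A stakeholder "don't revert" constraint suppresses the revert suggestion.
    if severity.upper() == "CRITICAL" and minutes_unresolved >= 30 and revert_allowed:
        actions.append("suggest-revert")
    return actions


print(time_based_escalations("CRITICAL", 35))
```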

---

## Integration with Existing Skills

| Skill | How /incident uses it |
|-------|----------------------|
| `/sre debug` | Routed to for infrastructure issues, gets pre-gathered context |
| `/ci-fix` | Routed to for CI failures, gets branch and pattern context |
| `/fix` | Routed to for code bugs, gets error logs and similar incidents |
| `/restart` | Routed to for local ops issues |
| `/qa` | Run after resolution to verify no regressions |
| explore-light | Phase 2 health checks (1x cost) |
| ops agent | Phase 2 deploy/error/KB checks (1x cost) |
| `/review-pr` | After /fix creates a PR, review it before merge |

---

## Cost Model

### By Phase

| Phase | Agents | Relative cost (1 Haiku agent = 1x) |
|-------|--------|------|
| Phase 1: Classify | None (pattern matching) | 0 |
| Phase 2: Diagnose | 4 Haiku agents in parallel | 4x |
| Phase 3: Correlate | None (analysis) | 0 |
| Phase 3b: Challenge (skip) | None | 0 |
| Phase 3b: Challenge (fast) | 1 Haiku agent | 1x |
| Phase 3b: Challenge (full) | 2 Haiku agents | 2x |
| Phase 4: Route | 1 Sonnet agent (SRE/fix/ci-fix) | 10x |
| Phase 5: Verify | 1 Haiku agent (health check) | 1x |

### By Severity (total cost including challenge)

| Severity | Challenge Mode | Total Cost | Notes |
|----------|---------------|------------|-------|
| CRITICAL + HIGH confidence | Skip | ~15x | No challenge overhead |
| CRITICAL + med/low confidence | Fast (1 Haiku) | ~16x | +1x for 30s sanity check |
| HIGH | Full (2 Haiku) | ~17x | +2x for 90s full challenge |
| MEDIUM | Full (2 Haiku) | ~17x | Same as HIGH |
| LOW | Full (2 Haiku) | ~17x | 120s budget, same agent cost |

**Break-even analysis:** Full challenge adds ~2x cost (2 Haiku agents). A single wrong routing — running `/sre debug` when the issue is a code bug — wastes a full Sonnet invocation (10x) plus the time to re-diagnose and re-route. Over 5 incidents, full challenge costs 5 × 2x = 10x, the price of one wasted Sonnet run, so it pays for itself if it catches one misrouting in every 5 incidents. Because a misroute also costs re-diagnosis time, the challenge still pays off even if it catches misroutes somewhat less often than that.
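
The arithmetic behind the By Severity table and the break-even figure, as a small sketch using the relative costs from the By Phase table (the names are hypothetical):

```python
# Relative costs from the By Phase table (Haiku agent = 1x, Sonnet agent = 10x).
BASE_COST = {"classify": 0, "diagnose": 4, "correlate": 0, "route": 10, "verify": 1}
CHALLENGE_COST = {"skip": 0, "fast": 1, "full": 2}
WASTED_SONNET_RUN = 10


def incident_cost(mode: str) -> int:
    """Total relative cost of one incident for a given challenge mode."""
    return sum(BASE_COST.values()) + CHALLENGE_COST[mode]


def break_even_incidents(mode: str, misroute_cost: int = WASTED_SONNET_RUN) -> float:
    """Incidents per caught misroute at which the challenge overhead equals the saving.

    Only the wasted Sonnet run is counted; adding re-diagnosis time raises
    misroute_cost, so the challenge breaks even with even rarer catches.
    """
    return misroute_cost / CHALLENGE_COST[mode]


for mode in ("skip", "fast", "full"):
    print(mode, incident_cost(mode))  # skip 15, fast 16, full 17
print(break_even_incidents("full"))   # 5.0
print(break_even_incidents("fast"))   # 10.0
```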

---

## Rules

- **CRITICAL = no questions.** Diagnose and act immediately.
- **Parallel diagnosis is mandatory.** Never diagnose sequentially — 4 agents at once.
- **Evidence before routing.** Don't guess which skill to use — correlate first.
- **Challenge before routing (unless CRITICAL + HIGH confidence).** 90 seconds of verification can save 30 minutes of wrong-path investigation.
- **Challenge uses Haiku. Never escalate challenge agents to Sonnet/Opus.** Speed and cost matter — Haiku is sufficient for adversarial review.
- **Escalation is automatic.** If `/sre` finds a code bug, it routes to `/fix` without asking.
- **Every incident gets a report.** Even if resolved in 30 seconds.
- **Knowledge base is always checked.** Similar past incidents save diagnosis time.
- **Verification after resolution.** Re-run health checks to confirm the fix worked.
- **Notify stakeholders.** Communication channels are checked and used at every phase.

---

## Stakeholder Communication

Incidents don't happen in a vacuum. Stakeholders need updates throughout — not just at the end.

### Communication Channels (auto-detect from project)

Check these sources to find configured channels:

| Signal | Channel | How to use |
|--------|---------|-----------|
| Discord MCP in `.claude/settings.json` | Discord | `mcp__discord-mcp__send-message(channel, message)` |
| `slack@claude-plugins-official` enabled | Slack | Send via the Slack plugin |
| `SLACK_WEBHOOK_URL` in env | Slack webhook | `curl -X POST -H 'Content-Type: application/json' -d '{"text":"..."}' "$SLACK_WEBHOOK_URL"` |
| `DISCORD_WEBHOOK_URL` in env | Discord webhook | `curl -X POST -H 'Content-Type: application/json' -d '{"content":"..."}' "$DISCORD_WEBHOOK_URL"` |
| `PAGERDUTY_*` in env | PagerDuty | PagerDuty API for escalation |
| `OPSGENIE_*` in env | OpsGenie | OpsGenie API for alerting |
| GitHub issue exists for the incident | GitHub | Comment on the issue with updates |
| `TEAMS_WEBHOOK_URL` in env | Microsoft Teams | Teams webhook API |
| Notion MCP configured | Notion | Create/update incident page |
| Linear MCP configured | Linear | Create/update incident issue |

**On first run, detect which channels are available and store them in the knowledge base.**
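
The environment-variable rows of the table can be detected without any MCP access. A minimal sketch; the channel names are hypothetical labels, and the MCP and plugin rows (which require reading `.claude/settings.json`) are left out.

```python
import os
from typing import List, Mapping, Optional

# Rows of the channel table that can be detected from environment variables alone.
EXACT_VARS = [
    ("SLACK_WEBHOOK_URL", "slack-webhook"),
    ("DISCORD_WEBHOOK_URL", "discord-webhook"),
    ("TEAMS_WEBHOOK_URL", "teams-webhook"),
]
PREFIX_VARS = [("PAGERDUTY_", "pagerduty"), ("OPSGENIE_", "opsgenie")]


def detect_env_channels(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List notification channels configured via environment variables."""
    env = os.environ if env is None else env
    # Exact variables must be set to a non-empty value.
    found = [name for var, name in EXACT_VARS if env.get(var)]
    # Prefix rows match any variable such as PAGERDUTY_TOKEN.
    found += [name for prefix, name in PREFIX_VARS if any(key.startswith(prefix) for key in env)]
    return found


print(detect_env_channels())
```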

### Communication Protocol

| Phase | Who to notify | What to say | Channel |
|-------|-------------|------------|---------|
| Phase 1 (Classify) | On-call / team | "Incident detected: {severity} — {description}. Investigating." | Discord/Slack #incidents |
| Phase 3 (Correlate) | On-call / team | "Diagnosis: {type} — {hypothesis}. Routing to {skill}." | Discord/Slack #incidents |
| Phase 4 (mid-fix) | Stakeholders if CRITICAL | "Update: root cause identified — {cause}. Fix in progress. ETA: {estimate}." | Discord/Slack #incidents + PagerDuty |
| Phase 5 (Resolved) | Everyone | "Resolved: {root cause}. Fix: {what was done}. Duration: {time}." | Discord/Slack #incidents + GitHub issue |
| Post-incident | Team lead | Full incident report with timeline, root cause, prevention | Knowledge base + Notion/Linear |

### Stakeholder Input During Incident

If the user (or a stakeholder via Discord/Slack) provides additional context during
the incident, incorporate it:

```
User: "actually it only affects users in EU"
  → Update scope: EU region only
  → Check: is there a regional configuration difference?
  → Update diagnosis with this context

User: "we just deployed a config change 10 minutes ago"
  → Highest suspect: the config change
  → Check the config diff
  → Skip broad diagnosis, focus on config

User: "don't revert, we need that deploy"
  → Respect the constraint
  → Find a forward fix instead of revert
  → Update incident notes with the constraint
```

---

## Data Source Integration

### Automatic Data Sources (always available)

| Source | What it provides | How to access |
|--------|-----------------|--------------|
| Git history | Recent commits, who changed what | `git log`, `git blame` |
| GitHub/GitLab | CI status, PRs, issues, deployments | `gh` CLI (GitHub), `glab` CLI (GitLab) |
| CLAUDE.md | Health endpoints, service topology | Read tool |
| Docker | Container health, resource usage | `docker ps`, `docker stats --no-stream` |
| Project logs | Application errors, access logs | Platform-specific CLI |

### MCP-Connected Data Sources (if configured)

| MCP Server | What it unlocks |
|-----------|----------------|
| Postgres MCP | Direct DB queries — check connection count, slow queries, table sizes |
| Redis MCP | Cache hit rates, memory usage, connected clients |
| MongoDB MCP | Collection stats, slow operations, replica set health |
| Sentry MCP | Error rates, affected users, stack traces, release health |
| Datadog MCP | APM traces, infrastructure metrics, log patterns |
| AWS MCP | CloudWatch metrics, ECS task health, RDS stats, Lambda errors |
| GCP MCP | Cloud Monitoring, Error Reporting, Trace |
| Cloudflare MCP | Edge analytics, WAF events, origin health |
| PagerDuty MCP | Active incidents, on-call schedule |

### Recommending Data Sources

If diagnosis is limited by missing data sources:

```
Incident triage limited — missing data sources:

⚠ No monitoring connected — can't check error rates or performance metrics
  Recommend: Sentry MCP (free tier) or Datadog MCP
  /calibrate can install this automatically

⚠ No database MCP — can't check query performance or connection health
  Recommend: Postgres MCP / Redis MCP based on your stack
  /calibrate can install this automatically

⚠ No alerting configured — team won't be notified of future incidents
  Recommend: PagerDuty or OpsGenie integration
```

These recommendations feed back to `/calibrate` — next time it runs, it includes
incident-driven recommendations alongside the standard ones.

---

## Adaptive Learning

### Learn from every incident

After resolution, analyze the incident for patterns:

```
# What to learn:
1. Classification accuracy — did we route to the right skill?
   If /sre was invoked but it turned out to be a code bug → improve type classification

2. Diagnosis speed — which of the 4 parallel checks found the answer?
   If KB lookup found a matching past incident → that was the fastest path
   If health check was the key signal → health checks are working well

3. Resolution effectiveness — did the routed skill fix it?
   If /fix resolved it → code bug pattern, add to bug-patterns.md
   If /sre resolved it → infra pattern, add to sre knowledge
   If manual intervention was needed → gap in automation

4. Time to resolution — how long from report to verified fix?
   Track per severity and type for trending
```

### Write learning to knowledge base

After every incident, append to `.claude/knowledge/skills/incident.md`:

```markdown
### {date} — {title} ({severity}, {type})
**Classified as**: {type} — {was this correct? yes/no}
**Routed to**: {skill}
**Root cause**: {summary}
**Resolution time**: {duration}
**Key signal**: {which Phase 2 check found the answer}
**Knowledge gap**: {what we didn't know that slowed us down}
**New pattern**: {if this is a new type of incident, describe it}
**Similar past**: {count of similar incidents — is this recurring?}
```

### Pattern Detection

After 5+ incidents are logged, check for patterns:

```
Incident patterns detected:
  ⚠ 3 incidents in backend/app/services/auth.py in last 30 days
    → Consider: comprehensive auth service refactor
  ⚠ 2 CI failures from dependency updates in last 2 weeks
    → Consider: pin dependencies, add lockfile check
  ⚠ Database slow query incidents increasing
    → Consider: add query monitoring, review indexes
```

Surface these in the `/onboard` briefing so the team sees systemic issues, not just
individual incidents.
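
A sketch of the first check in the sample output (3 or more incidents touching one file within 30 days), assuming each logged incident records its date and one implicated file. The record type and function name are hypothetical; the thresholds come from the sample output and the 5-incident minimum above.

```python
from collections import Counter
from datetime import date, timedelta
from typing import List, NamedTuple


class IncidentRecord(NamedTuple):
    day: date
    file: str  # hypothetical: the main file implicated by the root cause


def recurring_hotspots(incidents: List[IncidentRecord], today: date,
                       window_days: int = 30, threshold: int = 3) -> List[str]:
    """Files implicated in at least `threshold` incidents within the window."""
    if len(incidents) < 5:  # only look for patterns once 5+ incidents exist
        return []
    cutoff = today - timedelta(days=window_days)
    counts = Counter(i.file for i in incidents if i.day >= cutoff)
    return sorted(f for f, n in counts.items() if n >= threshold)
```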

### Improve classification over time

Read `.claude/knowledge/skills/incident.md` at the start of every incident. If past
incidents show that certain keywords were misclassified:

```
# From knowledge base:
# "auth timeout" was classified as INFRA but was actually CODE BUG (3 times)
# → Override: "auth.*timeout" → CODE BUG, not INFRA
```

The classification tables in Phase 1 are defaults. Knowledge base corrections
override them for this specific project.
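
Applying such overrides before falling back to the Phase 1 defaults could look like this. A minimal sketch: how overrides are stored in the knowledge base is not specified, so they are passed in as hypothetical `(regex, type)` pairs, matching is case-insensitive, and the first match wins.

```python
import re
from typing import List, Tuple

# (regex, corrected type) pairs; hypothetical, e.g. parsed from the knowledge base.
Override = Tuple[str, str]


def classify(description: str, default_type: str, overrides: List[Override]) -> str:
    """Return the first matching knowledge-base override, else the Phase 1 default."""
    for pattern, corrected_type in overrides:
        if re.search(pattern, description, re.IGNORECASE):
            return corrected_type
    return default_type


kb = [(r"auth.*timeout", "CODE BUG")]
print(classify("Auth service timeout on login", "INFRA", kb))  # CODE BUG
```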
|
|
818
|
+
|
|
819
|
+
---
|
|
820
|
+
|
|
821
|
+
## /calibrate Integration
|
|
822
|
+
|
|
823
|
+
### What /calibrate contributes to /incident
|
|
824
|
+
|
|
825
|
+
When `/calibrate` runs, it discovers:
|
|
826
|
+
- Deploy platform → /incident knows where to check logs
|
|
827
|
+
- Monitoring tools → /incident knows which MCP servers to query
|
|
828
|
+
- CI platform → /incident knows how to check CI status
|
|
829
|
+
- Health endpoints → /incident knows what to hit first
|
|
830
|
+
- Communication channels → /incident knows where to send alerts
|
|
831
|
+
|
|
832
|
+
### What /incident contributes to /calibrate
|
|
833
|
+
|
|
834
|
+
After incidents, /incident recommends:
|
|
835
|
+
- Missing monitoring → "add Sentry MCP"
|
|
836
|
+
- Missing alerting → "add PagerDuty integration"
|
|
837
|
+
- Missing health endpoints → "add /health to service X"
|
|
838
|
+
- Missing data sources → "add Postgres MCP for query diagnostics"
|
|
839
|
+
|
|
840
|
+
These get included in `/calibrate`'s recommendations next time it runs.
|
|
841
|
+
|
|
842
|
+
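Detecting these gaps can be sketched as a lookup over the capabilities /incident needed but did not find during diagnosis. The capability keys and `recommend_gaps` are hypothetical. The recommendation strings come from the list above, including its `service X` placeholder.

```python
# Capability /incident needed → recommendation handed to /calibrate.
RECOMMENDATIONS: dict[str, str] = {
    "monitoring": "add Sentry MCP",
    "alerting": "add PagerDuty integration",
    "health_endpoint": "add /health to service X",
    "query_diagnostics": "add Postgres MCP for query diagnostics",
}


def recommend_gaps(available: set[str]) -> list[str]:
    """Return a recommendation for every capability missing during diagnosis."""
    # Dict order is preserved, so output order is stable across runs.
    return [rec for cap, rec in RECOMMENDATIONS.items() if cap not in available]
```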
### Project Profile Usage

If `.claude/project-profile.md` exists:
- Read infrastructure section → skip platform detection
- Read external integrations → know which MCP servers are available
- Read domain model → understand which services are critical vs auxiliary

If it doesn't exist:
- Do full discovery (Phase 2 handles this)
- Recommend running `/calibrate`

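A minimal sketch of the profile check, assuming the profile is a markdown file split into `##` sections. The section names (`Infrastructure`, `External Integrations`, `Domain Model`), the helper names, and the returned dictionary shape are assumptions for illustration. The profile `/calibrate` actually writes may be organized differently.

```python
from pathlib import Path

PROFILE = Path(".claude/project-profile.md")


def parse_sections(markdown: str) -> dict[str, str]:
    """Split a markdown document into {heading: body} on '## ' headings."""
    sections: dict[str, list[str]] = {}
    current = None
    for line in markdown.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def load_context(root: Path = Path(".")) -> dict:
    """Use the project profile if present, otherwise fall back to discovery."""
    path = root / PROFILE
    if not path.exists():
        # No profile: Phase 2 does full discovery, and /calibrate is suggested.
        return {"discovery": "full", "recommend": "/calibrate"}
    sections = parse_sections(path.read_text(encoding="utf-8"))
    return {
        "discovery": "skip-platform-detection",
        "infrastructure": sections.get("Infrastructure", ""),
        "integrations": sections.get("External Integrations", ""),
        "domain_model": sections.get("Domain Model", ""),
    }
```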
---

## Workflow: Full Incident Lifecycle

```
Something is wrong
    │
    ▼
/incident "{description}"
    │
    ├── Phase 1: CLASSIFY (instant)
    │     Severity: CRITICAL / HIGH / MEDIUM / LOW
    │     Type: infra / CI / code bug / perf / data / auth
    │     → Notify team: "Incident detected, investigating"
    │
    ├── Phase 2: PARALLEL DIAGNOSIS (4 agents, < 60s)
    │     ├── Health endpoints (explore-light, 1x)
    │     ├── Recent deploys + CI (ops, 1x)
    │     ├── Error signals + logs (ops, 1x)
    │     └── Knowledge base lookup (ops, 1x)
    │     → Notify team: "Diagnosis complete, routing"
    │
    ├── Phase 3: CORRELATE
    │     Health × Deploy × CI × Errors = Diagnosis
    │     → Hypothesis + preliminary confidence (HIGH/MEDIUM/LOW)
    │
    ├── Phase 3b: CHALLENGE (adversarial swarm)
    │     ├── CRITICAL + HIGH conf → Skip (0s)
    │     ├── CRITICAL + med/low   → Fast: 1 Haiku devil's advocate (30s)
    │     └── HIGH/MEDIUM/LOW      → Full: 2 Haiku agents, 2-hop debate (90-120s)
    │           ├── devil-advocate: challenges primary hypothesis
    │           └── alt-hypothesis: cross-domain verification
    │
    ├── Phase 3c: VERDICT GATE
    │     Confidence score → PROCEED / MERGE / ESCALATE
    │     → If ESCALATE: present options [A] primary [B] challenger [C] both
    │
    ├── Phase 4: ROUTE + RESOLVE
    │     ├── Infra    → /sre debug (with all Phase 2 context)
    │     ├── CI       → /ci-fix (with branch + patterns)
    │     ├── Code bug → /fix (with error logs + similar incidents)
    │     ├── Perf     → /sre debug (perf focus)
    │     └── Local    → /restart or ops agent
    │     → Notify team: "Root cause: {X}. Fix in progress."
    │     → Accept stakeholder input: constraints, context, steering
    │
    ├── Phase 5: VERIFY + REPORT
    │     ├── Re-run health checks
    │     ├── Write incident report
    │     ├── Update knowledge base
    │     └── Notify team: "Resolved. Duration: {X}."
    │
    └── LEARN
          ├── Was classification correct?
          ├── Which diagnosis signal was key?
          ├── Is this a recurring pattern?
          └── What data sources were missing?
```
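The Phase 3b and Phase 4 branching in the lifecycle can be sketched as two small lookups. This is illustrative only and rests on a few assumptions:

- The third Phase 3b branch (`HIGH/MEDIUM/LOW`) is read as "any non-CRITICAL severity".
- `Local` maps to `/restart` only, leaving out the ops-agent alternative.
- Types without a route in the diagram (data, auth) return `None` for a human to decide.
- `challenge_mode`, `route`, and `ROUTES` are hypothetical names.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    CRITICAL = "CRITICAL"
    HIGH = "HIGH"
    MEDIUM = "MEDIUM"
    LOW = "LOW"


def challenge_mode(severity: Severity, confidence: str) -> str:
    """Phase 3b: decide how hard to challenge the primary hypothesis."""
    if severity is Severity.CRITICAL:
        # Critical incidents trade verification depth for speed.
        return "skip" if confidence == "HIGH" else "fast"  # fast: 1 devil's advocate
    return "full"  # 2 agents, 2-hop debate


# Phase 4: incident type → command that takes over resolution.
ROUTES = {
    "infra": "/sre debug",
    "ci": "/ci-fix",
    "code bug": "/fix",
    "perf": "/sre debug",
    "local": "/restart",
}


def route(incident_type: str) -> Optional[str]:
    """Return the command to hand off to, or None if no route is defined."""
    return ROUTES.get(incident_type.strip().lower())
```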