mindforge-cc 10.0.3 → 11.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (333)
  1. package/.mindforge/MINDFORGE-V2-SCHEMA.json +43 -10
  2. package/.mindforge/config.json +30 -2
  3. package/.mindforge/engine/cross-model-eval.md +74 -0
  4. package/.mindforge/engine/proactive/signal-detector.md +60 -0
  5. package/.mindforge/engine/proactive/suggestion-engine.md +100 -0
  6. package/.mindforge/personas/agent-architect.md +57 -0
  7. package/.mindforge/personas/agent-evaluator.md +162 -0
  8. package/.mindforge/personas/agent-memory-designer.md +157 -0
  9. package/.mindforge/personas/agent-ops-engineer.md +120 -0
  10. package/.mindforge/personas/agent-orchestrator.md +112 -0
  11. package/.mindforge/personas/ai-economist.md +57 -0
  12. package/.mindforge/personas/ai-safety-engineer.md +57 -0
  13. package/.mindforge/personas/analytics-engineer.md +57 -0
  14. package/.mindforge/personas/anti-pattern-hunter.md +61 -0
  15. package/.mindforge/personas/api-gateway-designer.md +132 -0
  16. package/.mindforge/personas/auth-engineer.md +112 -0
  17. package/.mindforge/personas/build-engineer.md +57 -0
  18. package/.mindforge/personas/business-analyst.md +56 -0
  19. package/.mindforge/personas/cache-architect.md +100 -0
  20. package/.mindforge/personas/causal-scientist.md +57 -0
  21. package/.mindforge/personas/cdn-architect.md +118 -0
  22. package/.mindforge/personas/change-agent.md +104 -0
  23. package/.mindforge/personas/code-narrator.md +52 -0
  24. package/.mindforge/personas/codegen-specialist.md +68 -0
  25. package/.mindforge/personas/communication-architect.md +102 -0
  26. package/.mindforge/personas/compliance-engineer.md +96 -0
  27. package/.mindforge/personas/consensus-engineer.md +116 -0
  28. package/.mindforge/personas/contract-tester.md +60 -192
  29. package/.mindforge/personas/data-architect.md +108 -0
  30. package/.mindforge/personas/data-mesh-architect.md +57 -0
  31. package/.mindforge/personas/data-pipeline-architect.md +120 -0
  32. package/.mindforge/personas/de-sloppifier.md +60 -0
  33. package/.mindforge/personas/debt-manager.md +66 -0
  34. package/.mindforge/personas/decision-architect.md +82 -51
  35. package/.mindforge/personas/deployment-captain.md +74 -0
  36. package/.mindforge/personas/design-system-lead.md +112 -0
  37. package/.mindforge/personas/dmux-orchestrator.md +75 -0
  38. package/.mindforge/personas/dx-engineer.md +96 -0
  39. package/.mindforge/personas/ecommerce-engineer.md +57 -0
  40. package/.mindforge/personas/edge-engineer.md +94 -0
  41. package/.mindforge/personas/edtech-architect.md +106 -0
  42. package/.mindforge/personas/embedding-architect.md +57 -0
  43. package/.mindforge/personas/environment-engineer.md +57 -0
  44. package/.mindforge/personas/eval-judge.md +55 -0
  45. package/.mindforge/personas/event-architect.md +102 -0
  46. package/.mindforge/personas/experiment-designer.md +138 -0
  47. package/.mindforge/personas/feature-store-engineer.md +57 -0
  48. package/.mindforge/personas/finops-analyst.md +66 -0
  49. package/.mindforge/personas/fintech-architect.md +57 -0
  50. package/.mindforge/personas/flutter-engineer.md +104 -0
  51. package/.mindforge/personas/gaming-engineer.md +57 -0
  52. package/.mindforge/personas/graphql-designer.md +73 -0
  53. package/.mindforge/personas/healthcare-engineer.md +57 -0
  54. package/.mindforge/personas/hiring-strategist.md +105 -0
  55. package/.mindforge/personas/hitl-architect.md +165 -0
  56. package/.mindforge/personas/i18n-architect.md +69 -0
  57. package/.mindforge/personas/iot-architect.md +105 -0
  58. package/.mindforge/personas/knowledge-curator.md +139 -0
  59. package/.mindforge/personas/knowledge-engineer.md +57 -0
  60. package/.mindforge/personas/lakehouse-architect.md +57 -0
  61. package/.mindforge/personas/llm-orchestrator.md +57 -0
  62. package/.mindforge/personas/logistics-architect.md +106 -0
  63. package/.mindforge/personas/market-analyst.md +53 -0
  64. package/.mindforge/personas/marketplace-engineer.md +105 -0
  65. package/.mindforge/personas/mcp-designer.md +54 -0
  66. package/.mindforge/personas/meeting-designer.md +104 -0
  67. package/.mindforge/personas/mentorship-lead.md +106 -0
  68. package/.mindforge/personas/migration-architect.md +57 -0
  69. package/.mindforge/personas/ml-ops-engineer.md +101 -0
  70. package/.mindforge/personas/mobile-architect.md +105 -0
  71. package/.mindforge/personas/mobile-security-engineer.md +106 -0
  72. package/.mindforge/personas/multi-tenancy-architect.md +71 -0
  73. package/.mindforge/personas/multimodal-engineer.md +57 -0
  74. package/.mindforge/personas/offline-specialist.md +105 -0
  75. package/.mindforge/personas/onboarding-navigator.md +63 -0
  76. package/.mindforge/personas/payments-engineer.md +135 -0
  77. package/.mindforge/personas/pipeline-engineer.md +115 -0
  78. package/.mindforge/personas/platform-engineer.md +97 -0
  79. package/.mindforge/personas/platform-lead.md +57 -0
  80. package/.mindforge/personas/privacy-engineer.md +57 -0
  81. package/.mindforge/personas/product-owner.md +56 -0
  82. package/.mindforge/personas/productivity-analyst.md +57 -0
  83. package/.mindforge/personas/prompt-architect.md +101 -0
  84. package/.mindforge/personas/proofreader.md +53 -0
  85. package/.mindforge/personas/pwa-architect.md +105 -0
  86. package/.mindforge/personas/quality-scorer.md +63 -0
  87. package/.mindforge/personas/react-native-engineer.md +106 -0
  88. package/.mindforge/personas/resilience-engineer.md +69 -0
  89. package/.mindforge/personas/rfc-architect.md +64 -0
  90. package/.mindforge/personas/saga-orchestrator.md +80 -0
  91. package/.mindforge/personas/secrets-engineer.md +57 -0
  92. package/.mindforge/personas/skill-smith.md +79 -0
  93. package/.mindforge/personas/sre-lead.md +107 -0
  94. package/.mindforge/personas/stream-engineer.md +57 -0
  95. package/.mindforge/personas/streaming-engineer.md +64 -0
  96. package/.mindforge/personas/swarm-templates.json +674 -44
  97. package/.mindforge/personas/system-designer.md +57 -0
  98. package/.mindforge/personas/team-coach.md +120 -0
  99. package/.mindforge/personas/tech-lead-coach.md +103 -0
  100. package/.mindforge/personas/technical-writer-lead.md +111 -0
  101. package/.mindforge/personas/vibe-checker.md +75 -0
  102. package/.mindforge/personas/worktree-manager.md +56 -0
  103. package/.mindforge/personas/zero-trust-engineer.md +113 -0
  104. package/.mindforge/skills/a11y-testing/SKILL.md +143 -0
  105. package/.mindforge/skills/agent-evaluation-framework/SKILL.md +227 -0
  106. package/.mindforge/skills/agent-memory-design/SKILL.md +199 -0
  107. package/.mindforge/skills/agent-orchestration-patterns/SKILL.md +129 -0
  108. package/.mindforge/skills/agent-tool-selection/SKILL.md +204 -0
  109. package/.mindforge/skills/ai-agent-deployment/SKILL.md +176 -0
  110. package/.mindforge/skills/ai-cost-management/SKILL.md +57 -0
  111. package/.mindforge/skills/ai-safety-alignment/SKILL.md +53 -0
  112. package/.mindforge/skills/analytics-instrumentation/SKILL.md +172 -0
  113. package/.mindforge/skills/api-gateway-patterns/SKILL.md +177 -0
  114. package/.mindforge/skills/api-marketplace/SKILL.md +56 -0
  115. package/.mindforge/skills/api-versioning/SKILL.md +100 -0
  116. package/.mindforge/skills/app-store-deployment/SKILL.md +44 -0
  117. package/.mindforge/skills/architecture-tradeoff-analysis/SKILL.md +97 -0
  118. package/.mindforge/skills/audit-logging/SKILL.md +140 -0
  119. package/.mindforge/skills/auth-patterns/SKILL.md +148 -0
  120. package/.mindforge/skills/autonomous-agent-harness/SKILL.md +218 -0
  121. package/.mindforge/skills/autonomous-agents/SKILL.md +59 -0
  122. package/.mindforge/skills/build-system-optimization/SKILL.md +54 -0
  123. package/.mindforge/skills/build-vs-buy/SKILL.md +80 -0
  124. package/.mindforge/skills/bundle-optimization/SKILL.md +174 -0
  125. package/.mindforge/skills/business-analyst/SKILL.md +82 -0
  126. package/.mindforge/skills/caching-strategies/SKILL.md +132 -0
  127. package/.mindforge/skills/capacity-planning/SKILL.md +96 -0
  128. package/.mindforge/skills/causal-inference/SKILL.md +42 -0
  129. package/.mindforge/skills/cdn-optimization/SKILL.md +212 -0
  130. package/.mindforge/skills/change-management/SKILL.md +106 -0
  131. package/.mindforge/skills/chaos-engineering/SKILL.md +99 -0
  132. package/.mindforge/skills/ci-cd-pipeline/SKILL.md +118 -0
  133. package/.mindforge/skills/cli-design/SKILL.md +118 -0
  134. package/.mindforge/skills/code-generation-patterns/SKILL.md +92 -0
  135. package/.mindforge/skills/code-review-methodology/SKILL.md +180 -0
  136. package/.mindforge/skills/code-tour/SKILL.md +145 -0
  137. package/.mindforge/skills/codebase-onboarding/SKILL.md +95 -0
  138. package/.mindforge/skills/compliance-as-code/SKILL.md +195 -0
  139. package/.mindforge/skills/conflict-resolution/SKILL.md +87 -0
  140. package/.mindforge/skills/connection-pooling/SKILL.md +151 -0
  141. package/.mindforge/skills/container-security/SKILL.md +151 -0
  142. package/.mindforge/skills/context-engineering/SKILL.md +114 -0
  143. package/.mindforge/skills/contract-testing/SKILL.md +85 -0
  144. package/.mindforge/skills/cost-estimation/SKILL.md +82 -0
  145. package/.mindforge/skills/cqrs-event-sourcing/SKILL.md +95 -0
  146. package/.mindforge/skills/cross-platform-testing/SKILL.md +43 -0
  147. package/.mindforge/skills/data-governance/SKILL.md +42 -0
  148. package/.mindforge/skills/data-lakehouse/SKILL.md +42 -0
  149. package/.mindforge/skills/data-mesh/SKILL.md +42 -0
  150. package/.mindforge/skills/data-modeling/SKILL.md +107 -0
  151. package/.mindforge/skills/data-pipeline-design/SKILL.md +171 -0
  152. package/.mindforge/skills/data-privacy-engineering/SKILL.md +42 -0
  153. package/.mindforge/skills/database-performance/SKILL.md +174 -0
  154. package/.mindforge/skills/database-sharding-advanced/SKILL.md +206 -0
  155. package/.mindforge/skills/de-sloppify/SKILL.md +120 -0
  156. package/.mindforge/skills/defense-in-depth/SKILL.md +84 -0
  157. package/.mindforge/skills/delegation-patterns/SKILL.md +123 -0
  158. package/.mindforge/skills/dependency-management/SKILL.md +94 -0
  159. package/.mindforge/skills/deployment-workflow/SKILL.md +135 -0
  160. package/.mindforge/skills/design-system/SKILL.md +113 -0
  161. package/.mindforge/skills/developer-onboarding/SKILL.md +99 -0
  162. package/.mindforge/skills/developer-productivity-metrics/SKILL.md +59 -0
  163. package/.mindforge/skills/distributed-consensus/SKILL.md +141 -0
  164. package/.mindforge/skills/dmux-workflows/SKILL.md +141 -0
  165. package/.mindforge/skills/dns-architecture/SKILL.md +167 -0
  166. package/.mindforge/skills/ecommerce-architecture/SKILL.md +41 -0
  167. package/.mindforge/skills/edge-computing/SKILL.md +91 -0
  168. package/.mindforge/skills/edtech-platform/SKILL.md +41 -0
  169. package/.mindforge/skills/email-deliverability/SKILL.md +177 -0
  170. package/.mindforge/skills/embedding-systems/SKILL.md +55 -0
  171. package/.mindforge/skills/environment-management/SKILL.md +54 -0
  172. package/.mindforge/skills/error-handling-architecture/SKILL.md +118 -0
  173. package/.mindforge/skills/estimation-techniques/SKILL.md +113 -0
  174. package/.mindforge/skills/eval-harness/SKILL.md +180 -0
  175. package/.mindforge/skills/event-driven-architecture/SKILL.md +162 -0
  176. package/.mindforge/skills/experiment-design/SKILL.md +139 -0
  177. package/.mindforge/skills/experiment-platform/SKILL.md +43 -0
  178. package/.mindforge/skills/feature-engineering/SKILL.md +42 -0
  179. package/.mindforge/skills/feature-flag-management/SKILL.md +183 -0
  180. package/.mindforge/skills/fine-tuning-workflow/SKILL.md +189 -0
  181. package/.mindforge/skills/fintech-patterns/SKILL.md +41 -0
  182. package/.mindforge/skills/flutter-architecture/SKILL.md +42 -0
  183. package/.mindforge/skills/gaming-backend/SKILL.md +41 -0
  184. package/.mindforge/skills/git-workflow-design/SKILL.md +129 -0
  185. package/.mindforge/skills/graceful-degradation/SKILL.md +95 -0
  186. package/.mindforge/skills/graphql-patterns/SKILL.md +243 -0
  187. package/.mindforge/skills/guardrails-and-safety/SKILL.md +137 -0
  188. package/.mindforge/skills/healthcare-systems/SKILL.md +40 -0
  189. package/.mindforge/skills/hiring-engineering/SKILL.md +119 -0
  190. package/.mindforge/skills/human-in-the-loop-design/SKILL.md +234 -0
  191. package/.mindforge/skills/i18n-architecture/SKILL.md +147 -0
  192. package/.mindforge/skills/idempotency-patterns/SKILL.md +84 -0
  193. package/.mindforge/skills/incident-communication/SKILL.md +96 -0
  194. package/.mindforge/skills/incident-management/SKILL.md +97 -0
  195. package/.mindforge/skills/infrastructure-as-code/SKILL.md +98 -0
  196. package/.mindforge/skills/instinct-clustering/SKILL.md +190 -0
  197. package/.mindforge/skills/internal-developer-platform/SKILL.md +51 -0
  198. package/.mindforge/skills/iot-platform/SKILL.md +41 -0
  199. package/.mindforge/skills/k8s-deployment/SKILL.md +358 -0
  200. package/.mindforge/skills/knowledge-graphs/SKILL.md +56 -0
  201. package/.mindforge/skills/knowledge-sharing-systems/SKILL.md +112 -0
  202. package/.mindforge/skills/llm-cost-optimization/SKILL.md +198 -0
  203. package/.mindforge/skills/llm-orchestration/SKILL.md +56 -0
  204. package/.mindforge/skills/load-testing/SKILL.md +84 -0
  205. package/.mindforge/skills/logistics-optimization/SKILL.md +40 -0
  206. package/.mindforge/skills/market-researcher/SKILL.md +99 -0
  207. package/.mindforge/skills/marketplace-trust/SKILL.md +40 -0
  208. package/.mindforge/skills/mcp-server-patterns/SKILL.md +264 -0
  209. package/.mindforge/skills/media-streaming/SKILL.md +41 -0
  210. package/.mindforge/skills/meeting-architecture/SKILL.md +146 -0
  211. package/.mindforge/skills/mentoring-patterns/SKILL.md +77 -0
  212. package/.mindforge/skills/microservices-patterns/SKILL.md +83 -0
  213. package/.mindforge/skills/migration-platform/SKILL.md +61 -0
  214. package/.mindforge/skills/migration-strategies/SKILL.md +129 -0
  215. package/.mindforge/skills/ml-feature-store/SKILL.md +56 -0
  216. package/.mindforge/skills/ml-monitoring/SKILL.md +42 -0
  217. package/.mindforge/skills/mobile-performance/SKILL.md +44 -0
  218. package/.mindforge/skills/mobile-security/SKILL.md +45 -0
  219. package/.mindforge/skills/model-evaluation/SKILL.md +53 -0
  220. package/.mindforge/skills/monorepo-management/SKILL.md +100 -0
  221. package/.mindforge/skills/multi-tenancy-patterns/SKILL.md +145 -0
  222. package/.mindforge/skills/multi-turn-conversation-design/SKILL.md +206 -0
  223. package/.mindforge/skills/multimodal-ai/SKILL.md +51 -0
  224. package/.mindforge/skills/mutation-testing/SKILL.md +97 -0
  225. package/.mindforge/skills/notification-system-design/SKILL.md +168 -0
  226. package/.mindforge/skills/observability-stack/SKILL.md +136 -0
  227. package/.mindforge/skills/offline-first-design/SKILL.md +43 -0
  228. package/.mindforge/skills/on-call-design/SKILL.md +111 -0
  229. package/.mindforge/skills/pagination-patterns/SKILL.md +230 -0
  230. package/.mindforge/skills/payment-integration/SKILL.md +176 -0
  231. package/.mindforge/skills/performance-reviews/SKILL.md +140 -0
  232. package/.mindforge/skills/platform-observability/SKILL.md +58 -0
  233. package/.mindforge/skills/platform-reliability/SKILL.md +52 -0
  234. package/.mindforge/skills/post-incident-learning/SKILL.md +96 -0
  235. package/.mindforge/skills/product-manager/SKILL.md +104 -0
  236. package/.mindforge/skills/progressive-web-app/SKILL.md +44 -0
  237. package/.mindforge/skills/prompt-engineering/SKILL.md +94 -0
  238. package/.mindforge/skills/proofreader/SKILL.md +158 -0
  239. package/.mindforge/skills/push-notification-architecture/SKILL.md +45 -0
  240. package/.mindforge/skills/python-performance/SKILL.md +183 -0
  241. package/.mindforge/skills/quality-audit/SKILL.md +171 -0
  242. package/.mindforge/skills/queue-design/SKILL.md +85 -0
  243. package/.mindforge/skills/rag-architecture/SKILL.md +176 -0
  244. package/.mindforge/skills/rate-limiting-design/SKILL.md +94 -0
  245. package/.mindforge/skills/react-native-patterns/SKILL.md +42 -0
  246. package/.mindforge/skills/react-performance/SKILL.md +229 -0
  247. package/.mindforge/skills/real-time-analytics/SKILL.md +42 -0
  248. package/.mindforge/skills/real-time-sync/SKILL.md +83 -0
  249. package/.mindforge/skills/responsive-native/SKILL.md +44 -0
  250. package/.mindforge/skills/responsive-patterns/SKILL.md +141 -0
  251. package/.mindforge/skills/rfc-pipeline/SKILL.md +114 -0
  252. package/.mindforge/skills/saas-multi-tenant/SKILL.md +41 -0
  253. package/.mindforge/skills/santa-method/SKILL.md +134 -0
  254. package/.mindforge/skills/search-implementation/SKILL.md +98 -0
  255. package/.mindforge/skills/secrets-platform/SKILL.md +56 -0
  256. package/.mindforge/skills/secrets-rotation/SKILL.md +173 -0
  257. package/.mindforge/skills/self-serve-infrastructure/SKILL.md +51 -0
  258. package/.mindforge/skills/serverless-patterns/SKILL.md +119 -0
  259. package/.mindforge/skills/skill-creator-meta/SKILL.md +146 -0
  260. package/.mindforge/skills/sprint-retrospective-facilitation/SKILL.md +112 -0
  261. package/.mindforge/skills/stakeholder-communication/SKILL.md +85 -0
  262. package/.mindforge/skills/state-management/SKILL.md +104 -0
  263. package/.mindforge/skills/stream-processing/SKILL.md +43 -0
  264. package/.mindforge/skills/streaming-architecture/SKILL.md +81 -0
  265. package/.mindforge/skills/supply-chain-security/SKILL.md +145 -0
  266. package/.mindforge/skills/synthetic-data-generation/SKILL.md +52 -0
  267. package/.mindforge/skills/system-design/SKILL.md +88 -0
  268. package/.mindforge/skills/team-topology-design/SKILL.md +107 -0
  269. package/.mindforge/skills/technical-debt-management/SKILL.md +86 -0
  270. package/.mindforge/skills/technical-interview-design/SKILL.md +98 -0
  271. package/.mindforge/skills/technical-leadership/SKILL.md +75 -0
  272. package/.mindforge/skills/technical-writing/SKILL.md +237 -0
  273. package/.mindforge/skills/technology-radar/SKILL.md +88 -0
  274. package/.mindforge/skills/testing-anti-patterns/SKILL.md +288 -0
  275. package/.mindforge/skills/tool-design/SKILL.md +138 -0
  276. package/.mindforge/skills/typescript-advanced/SKILL.md +198 -0
  277. package/.mindforge/skills/using-git-worktrees/SKILL.md +139 -0
  278. package/.mindforge/skills/verification-loop/SKILL.md +13 -1
  279. package/.mindforge/skills/vibe-security/SKILL.md +165 -0
  280. package/.mindforge/skills/visual-regression-testing/SKILL.md +97 -0
  281. package/.mindforge/skills/websocket-patterns/SKILL.md +203 -0
  282. package/.mindforge/skills/writing-plans/SKILL.md +170 -0
  283. package/.mindforge/skills/writing-skills/SKILL.md +216 -0
  284. package/.mindforge/skills/zero-trust-architecture/SKILL.md +166 -0
  285. package/CHANGELOG.md +240 -0
  286. package/MINDFORGE.md +4 -4
  287. package/README.md +49 -4
  288. package/RELEASENOTES.md +80 -0
  289. package/SECURITY.md +20 -8
  290. package/bin/autonomous/audit-writer.js +13 -0
  291. package/bin/autonomous/auto-runner.js +74 -16
  292. package/bin/autonomous/context-refactorer.js +26 -11
  293. package/bin/autonomous/state-manager.js +62 -6
  294. package/bin/autonomous/stuck-monitor.js +46 -7
  295. package/bin/autonomous/wave-executor.js +66 -25
  296. package/bin/dashboard/api-router.js +43 -0
  297. package/bin/dashboard/metrics-aggregator.js +28 -1
  298. package/bin/dashboard/server.js +67 -4
  299. package/bin/dashboard/sse-bridge.js +4 -4
  300. package/bin/engine/feedback-loop.js +8 -0
  301. package/bin/engine/intelligence-interlock.js +32 -15
  302. package/bin/engine/logic-drift-detector.js +2 -1
  303. package/bin/engine/nexus-tracer.js +3 -2
  304. package/bin/engine/remediation-engine.js +155 -32
  305. package/bin/engine/self-corrective-synthesizer.js +84 -10
  306. package/bin/engine/sre-manager.js +12 -4
  307. package/bin/engine/temporal-hub.js +131 -34
  308. package/bin/governance/approve.js +41 -5
  309. package/bin/governance/impact-analyzer.js +28 -0
  310. package/bin/governance/policy-engine.js +10 -3
  311. package/bin/governance/quantum-crypto.js +32 -19
  312. package/bin/governance/rbac-manager.js +74 -2
  313. package/bin/governance/ztai-manager.js +49 -7
  314. package/bin/hindsight-injector.js +3 -3
  315. package/bin/memory/eis-client.js +71 -34
  316. package/bin/memory/embedding-engine.js +61 -0
  317. package/bin/memory/knowledge-graph.js +58 -5
  318. package/bin/memory/knowledge-indexer.js +53 -6
  319. package/bin/memory/knowledge-store.js +22 -0
  320. package/bin/migrations/10.7.0-to-11.0.0.js +110 -0
  321. package/bin/migrations/schema-versions.js +13 -0
  322. package/bin/models/anthropic-provider.js +45 -0
  323. package/bin/models/cloud-broker.js +68 -20
  324. package/bin/models/gemini-provider.js +51 -0
  325. package/bin/models/model-client.js +20 -0
  326. package/bin/models/model-router.js +28 -8
  327. package/bin/models/openai-provider.js +44 -0
  328. package/bin/utils/file-io.js +63 -1
  329. package/bin/utils/index.js +58 -0
  330. package/docs/getting-started.md +1 -1
  331. package/docs/user-guide.md +2 -2
  332. package/package.json +2 -2
  333. package/.mindforge/personas/data-privacy-engineer.md +0 -187
@@ -0,0 +1,234 @@
1
+ ---
2
+ name: human-in-the-loop-design
3
+ version: 1.0.0
4
+ min_mindforge_version: 10.0.4
5
+ status: stable
6
+ triggers: human in the loop, escalation gate, approval threshold, confidence threshold, explanation quality, trust calibration, user override, hitl pattern, agent handoff, supervision design, human review trigger, autonomous boundary
7
+ compose: guardrails-and-safety
8
+ ---
9
+
10
+ # Skill — Human-in-the-Loop Design (Escalation & Supervision Architecture)
11
+
12
+ ## When this skill activates
13
+ When designing agent autonomy boundaries, building escalation gates, calibrating
14
+ confidence thresholds, or implementing approval workflows. Use for any system where
15
+ an AI agent must decide between acting autonomously and requesting human guidance.
16
+
17
+ Core principle: **Maximum VALUE, not maximum autonomy** — the goal is not to minimize
18
+ human involvement. The goal is to maximize the value delivered. Sometimes the highest-value
19
+ action is asking the human. The art is knowing WHEN.
20
+
21
+ ## Mandatory actions when this skill is active
22
+
23
+ ### Action Classification (Reversibility x Impact Matrix)
24
+
25
+ 1. **Classify every agent action:**
26
+ ```
27
+ | Impact \ Reversibility | Easily Reversible | Hard to Reverse | Irreversible |
28
+ |------------------------|------------------------|------------------------|------------------------|
29
+ | Low Impact | AUTONOMOUS | AUTONOMOUS | CONFIRM |
30
+ | Medium Impact | AUTONOMOUS | CONFIRM | APPROVE |
31
+ | High Impact | CONFIRM | APPROVE | APPROVE + WAIT |
32
+
33
+ Levels:
34
+ - AUTONOMOUS: Agent acts without asking (log for audit)
35
+ - CONFIRM: Agent acts but shows what it did (user can undo)
36
+ - APPROVE: Agent proposes, human approves before execution
37
+ - APPROVE + WAIT: Agent proposes, human approves, agent waits for explicit "go"
38
+ ```
39
+
40
+ 2. **Per-action classification examples:**
41
+ ```
42
+ AUTONOMOUS (act freely):
43
+ - Reading files
44
+ - Running read-only queries
45
+ - Searching codebases
46
+ - Generating suggestions (not applying them)
47
+
48
+ CONFIRM (act, show, allow undo):
49
+ - Editing existing files
50
+ - Creating new files in expected locations
51
+ - Running tests
52
+ - Installing dev dependencies
53
+
54
+ APPROVE (propose, wait for yes):
55
+ - Deleting files
56
+ - Modifying configuration
57
+ - Running destructive commands
58
+ - Changing auth/security code
59
+ - Making API calls with side effects
60
+
61
+ APPROVE + WAIT (high ceremony):
62
+ - Deploying to production
63
+ - Modifying database schema
64
+ - Changing payment logic
65
+ - Force-pushing to shared branches
66
+ - Deleting user data
67
+ ```
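The reversibility x impact matrix above can be read as a plain lookup table. The following is a minimal sketch, not part of the package: the names `Level`, `MATRIX`, and `classify`, the string keys, and the fallback to the most restrictive level for unknown inputs are all assumptions made for illustration.

```python
from enum import Enum


class Level(Enum):
    AUTONOMOUS = "AUTONOMOUS"
    CONFIRM = "CONFIRM"
    APPROVE = "APPROVE"
    APPROVE_WAIT = "APPROVE + WAIT"


# Keys are (impact, reversibility); values are copied from the matrix above.
MATRIX = {
    ("low", "easy"): Level.AUTONOMOUS,
    ("low", "hard"): Level.AUTONOMOUS,
    ("low", "irreversible"): Level.CONFIRM,
    ("medium", "easy"): Level.AUTONOMOUS,
    ("medium", "hard"): Level.CONFIRM,
    ("medium", "irreversible"): Level.APPROVE,
    ("high", "easy"): Level.CONFIRM,
    ("high", "hard"): Level.APPROVE,
    ("high", "irreversible"): Level.APPROVE_WAIT,
}


def classify(impact: str, reversibility: str) -> Level:
    """Return the supervision level for an action.

    Unknown inputs fall back to the most restrictive level; that fallback
    is an assumption of this sketch, not something the skill specifies.
    """
    return MATRIX.get((impact, reversibility), Level.APPROVE_WAIT)
```

Keeping the matrix as data rather than nested conditionals makes it straightforward to audit against the table and to adjust per project.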
68
+
69
+ ### Escalation Triggers
70
+
71
+ 3. **When to escalate (confidence-based):**
72
+ ```
73
+ Always escalate when:
74
+ - Confidence < 0.7 on the correct approach
75
+ - Action is irreversible AND high-impact
76
+ - Multiple valid approaches exist with no clear winner
77
+ - Task contradicts prior user guidance
78
+ - Security-sensitive code is being modified
79
+ - User's intent is ambiguous
80
+
81
+ Escalation format:
82
+ "I need your input on [X].
83
+ Context: [what I understand about the situation]
84
+ Options: [A, B, C with tradeoffs]
85
+ My recommendation: [preferred option + why]
86
+ What I'm uncertain about: [specific uncertainty]"
87
+ ```
88
+
89
+ Rules:
90
+ - ALWAYS explain WHY you're escalating (don't just say "I'm not sure")
91
+ - ALWAYS provide a recommendation (even when uncertain)
92
+ - ALWAYS state what additional context would resolve the uncertainty
93
+ - Never escalate without having done research first (don't be lazy)
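As an illustration of the triggers and the escalation format above, here is a small sketch. `ActionContext`, `escalation_reasons`, and `format_escalation` are hypothetical names, and modelling each trigger as a boolean field is an assumption; only the 0.7 threshold and the message layout come from the skill text.

```python
from dataclasses import dataclass


@dataclass
class ActionContext:
    confidence: float
    irreversible: bool = False
    high_impact: bool = False
    no_clear_winner: bool = False
    contradicts_guidance: bool = False
    security_sensitive: bool = False
    intent_ambiguous: bool = False


def escalation_reasons(ctx: ActionContext, threshold: float = 0.7) -> list:
    """Return the list of triggers that fired; an empty list means no escalation."""
    checks = [
        (ctx.confidence < threshold,
         f"confidence {ctx.confidence:.2f} is below {threshold:.2f}"),
        (ctx.irreversible and ctx.high_impact,
         "action is irreversible and high-impact"),
        (ctx.no_clear_winner, "multiple valid approaches with no clear winner"),
        (ctx.contradicts_guidance, "task contradicts prior user guidance"),
        (ctx.security_sensitive, "security-sensitive code is being modified"),
        (ctx.intent_ambiguous, "user's intent is ambiguous"),
    ]
    return [reason for triggered, reason in checks if triggered]


def format_escalation(topic, context, options, recommendation, uncertainty):
    """Render the escalation message in the format given above."""
    return "\n".join([
        f"I need your input on {topic}.",
        f"Context: {context}",
        f"Options: {options}",
        f"My recommendation: {recommendation}",
        f"What I'm uncertain about: {uncertainty}",
    ])
```

Returning the reasons rather than a bare boolean supports the rule that an escalation should always explain why it is happening.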
94
+
95
+ ### Approval Gate Design
96
+
97
+ 4. **Designing low-friction approval UX:**
98
+ ```
99
+ Principles:
100
+ - Fast: approval should take <5 seconds for clear cases
101
+ - Informative: show WHAT will happen, not just ask "ok?"
102
+ - Defaulted: suggest the likely answer (approve/reject)
103
+ - Skippable: allow bulk-approve for repetitive low-risk items
104
+ - Auditable: log every approval decision with timestamp and rationale
105
+
106
+ Good approval request:
107
+ "I'd like to add an index on users.email (migration file ready).
108
+ This will lock the table for ~2 seconds during deploy.
109
+ [Approve] [Reject] [Show migration SQL first]"
110
+
111
+ Bad approval request:
112
+ "Can I make a database change?"
113
+ ```
114
+
115
+ Rules:
116
+ - Show the EFFECT of the action, not just the action itself
117
+ - Provide enough context to decide without further research
118
+ - Offer a way to see more detail (for cautious reviewers)
119
+ - Default to the safe option (reject) for high-impact actions
120
+ - Time-box approvals: if no response in X hours, remind or escalate
121
+
122
+ ### Confidence Calibration
123
+
124
+ 5. **Ensuring confidence scores are meaningful:**
125
+ ```
126
+ Calibration goal:
127
+ When the agent says "I'm 90% confident" → it should be correct 90% of the time
128
+ When the agent says "I'm 50% confident" → it should be correct 50% of the time
129
+
130
+ Measuring calibration:
131
+ - Collect (confidence, actual_outcome) pairs from eval runs
132
+ - Plot calibration curve (stated confidence vs observed accuracy)
133
+ - Perfect calibration = diagonal line
134
+ - Overconfident = curve below diagonal (says 90%, is right 70%)
135
+ - Underconfident = curve above diagonal (says 50%, is right 80%)
136
+
137
+ Fixing miscalibration:
138
+ - Overconfident: raise the escalation threshold (escalate more)
138
+ - Underconfident: lower the escalation threshold (escalate less, trust yourself)
140
+ - Recalibrate after major model/prompt changes
141
+ ```
142
+
143
+ Rules:
144
+ - Calibrate quarterly (or after any major agent change)
145
+ - If overconfident: the agent is making unescalated mistakes → tighten boundaries
146
+ - If underconfident: the agent is annoying users with unnecessary escalations → loosen
147
+ - Track calibration as a first-class metric (alongside accuracy)
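One way to measure calibration from (confidence, actual_outcome) pairs is to bin them by stated confidence and compare each bin's mean confidence with its observed accuracy. The sketch below assumes equal-width bins and adds expected calibration error (ECE) as a single summary number; the function names and the choice of ECE are illustrative and not taken from the skill.

```python
def calibration_curve(pairs, n_bins=10):
    """Bin (confidence, correct) pairs by stated confidence.

    Returns one (mean_confidence, accuracy, count) tuple per non-empty bin.
    Accuracy below mean confidence indicates overconfidence in that bin.
    """
    bins = [[] for _ in range(n_bins)]
    for confidence, correct in pairs:
        index = min(int(confidence * n_bins), n_bins - 1)  # 1.0 goes in the last bin
        bins[index].append((confidence, correct))

    curve = []
    for bucket in bins:
        if bucket:
            mean_conf = sum(c for c, _ in bucket) / len(bucket)
            accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
            curve.append((mean_conf, accuracy, len(bucket)))
    return curve


def expected_calibration_error(pairs, n_bins=10):
    """Count-weighted average gap between confidence and accuracy across bins."""
    pairs = list(pairs)
    total = len(pairs)
    curve = calibration_curve(pairs, n_bins)
    return sum(count / total * abs(conf - acc) for conf, acc, count in curve)
```

For example, ten predictions at 0.9 confidence of which seven were correct land in one bin with accuracy 0.7, an overconfident agent with an error of about 0.2.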
148
+
149
+ ### Explanation Quality
150
+
151
+ 6. **How to explain escalations effectively:**
152
+ ```
153
+ Explanation structure:
154
+ 1. WHAT: what you're asking about (specific, concrete)
155
+ 2. WHY: why you can't decide autonomously (the uncertainty)
156
+ 3. OPTIONS: what the choices are (with tradeoffs)
157
+ 4. RECOMMENDATION: what you'd do if forced to decide
158
+ 5. CONTEXT_GAP: what information would let you decide next time
159
+
160
+ Good explanation:
161
+ "I found two approaches to implement caching (Redis vs in-memory).
162
+ Redis is more robust but adds infrastructure cost.
163
+ In-memory is simpler but won't survive restarts.
164
+ I'd lean toward Redis for production, but I don't know your infra budget.
165
+ If you tell me the acceptable monthly cost, I can decide this autonomously next time."
166
+
167
+ Bad explanation:
168
+ "Should I use Redis or in-memory caching?"
169
+ ```
170
+
171
+ ### Trust Building (Progressive Autonomy)
172
+
173
+ 7. **Earning autonomy over time:**
174
+ ```
175
+ Trust levels:
176
+ Level 1 — New agent (restrictive):
177
+ - APPROVE for any write operation
178
+ - CONFIRM for most read operations
179
+ - Escalation rate: high (~30% of actions)
180
+
181
+ Level 2 — Established (standard):
182
+ - AUTONOMOUS for reads and standard writes
183
+ - CONFIRM for destructive operations
184
+ - APPROVE for irreversible high-impact actions
185
+ - Escalation rate: moderate (~10%)
186
+
187
+ Level 3 — Trusted (permissive):
188
+ - AUTONOMOUS for most operations
189
+ - CONFIRM for irreversible actions
190
+ - APPROVE only for production deploys and security changes
191
+ - Escalation rate: low (~3%)
192
+
193
+ Level transitions:
194
+ - Promote: 20 consecutive successful autonomous actions without user correction
195
+ - Demote: 1 autonomous action that user explicitly reverses or flags as wrong
196
+ - Demotion is faster than promotion (trust is earned slowly, lost quickly)
197
+ ```
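The level transitions above amount to a small state machine. This sketch is hypothetical: `TrustTracker` is an invented name, and it assumes that a demotion drops exactly one level and that the success streak resets after every promotion or demotion, neither of which the skill states explicitly.

```python
class TrustTracker:
    MIN_LEVEL = 1
    MAX_LEVEL = 3
    PROMOTE_AFTER = 20  # consecutive uncorrected autonomous actions

    def __init__(self, level: int = 1):
        self.level = level
        self.streak = 0

    def record(self, corrected: bool) -> int:
        """Record one autonomous action and return the resulting trust level.

        corrected=True means the user reversed the action or flagged it as wrong.
        """
        if corrected:
            # A single correction demotes immediately (assumed: by one level).
            self.level = max(self.MIN_LEVEL, self.level - 1)
            self.streak = 0
        else:
            self.streak += 1
            if self.streak >= self.PROMOTE_AFTER and self.level < self.MAX_LEVEL:
                self.level += 1
                self.streak = 0
        return self.level
```

The asymmetry is visible in the code: promotion needs a full streak, while demotion needs only one call with `corrected=True`.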
198
+
199
+ ### Monitoring Escalation Health
200
+
201
+ 8. **Tracking escalation quality:**
202
+ ```
203
+ Metrics to monitor:
204
+ - Escalation rate: % of actions that require human input
205
+ - False escalation rate: % of escalations where human says "just do it"
206
+ - Missed escalation rate: % of autonomous actions that were wrong
207
+ - Approval latency: time between escalation and human response
208
+ - Rubber-stamp rate: % of approvals decided in <2 seconds (too fast = not reading)
209
+
210
+ Healthy ranges:
211
+ - Escalation rate: 5-15% (too low = risky, too high = annoying)
212
+ - False escalation rate: <20% (too high = boundaries too tight)
213
+ - Missed escalation rate: <2% (too high = boundaries too loose)
214
+ - Rubber-stamp rate: <30% (too high = approval fatigue, redesign needed)
215
+ ```
216
+
217
+ Rules:
218
+ - If rubber-stamp rate is high: reduce approval friction or widen autonomy
219
+ - If missed escalation rate is high: tighten boundaries immediately
220
+ - Review escalation metrics weekly (don't let them drift)
221
+ - Treat high rubber-stamp rate as a UX bug (users are being annoyed, not helped)
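The metrics and healthy ranges above can be checked mechanically. The following sketch assumes raw counts are already being collected; `EscalationStats`, `health_report`, and the metric keys are hypothetical names, while the thresholds are the ones listed under "Healthy ranges".

```python
from dataclasses import dataclass


@dataclass
class EscalationStats:
    total_actions: int
    escalations: int
    false_escalations: int   # escalations where the human said "just do it"
    autonomous_wrong: int    # autonomous actions that turned out to be wrong
    approvals: int
    fast_approvals: int      # approvals decided in under 2 seconds


def _ratio(part: int, whole: int) -> float:
    return part / whole if whole else 0.0


def health_report(s: EscalationStats):
    """Return (rates, flagged), where flagged lists metrics outside the healthy range."""
    autonomous = s.total_actions - s.escalations
    rates = {
        "escalation": _ratio(s.escalations, s.total_actions),
        "false_escalation": _ratio(s.false_escalations, s.escalations),
        "missed_escalation": _ratio(s.autonomous_wrong, autonomous),
        "rubber_stamp": _ratio(s.fast_approvals, s.approvals),
    }
    flagged = []
    if not 0.05 <= rates["escalation"] <= 0.15:
        flagged.append("escalation")
    if rates["false_escalation"] >= 0.20:
        flagged.append("false_escalation")
    if rates["missed_escalation"] >= 0.02:
        flagged.append("missed_escalation")
    if rates["rubber_stamp"] >= 0.30:
        flagged.append("rubber_stamp")
    return rates, flagged
```

A weekly review, as the rules suggest, could run this over the last seven days of counts and alert on any non-empty `flagged` list.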
222
+
223
+ ## Self-check before task completion
224
+
225
+ Before marking a task done when this skill was active:
226
+
227
+ - [ ] Are actions classified by reversibility x impact (autonomous/confirm/approve)?
228
+ - [ ] Do escalation triggers include: low confidence, irreversible actions, ambiguity?
229
+ - [ ] Is the approval UX low-friction (fast, informative, defaulted)?
230
+ - [ ] Are explanations structured (what, why, options, recommendation, context gap)?
231
+ - [ ] Is progressive autonomy designed (trust levels with promotion/demotion)?
232
+ - [ ] Are escalation health metrics defined (false escalation rate, rubber-stamp rate)?
233
+ - [ ] Is confidence calibration measured (says 90% → right 90%)?
234
+ - [ ] Does the guardrails-and-safety skill co-activate for safety-critical boundaries?
@@ -0,0 +1,147 @@
1
+ ---
2
+ name: i18n-architecture
3
+ version: 1.0.0
4
+ min_mindforge_version: 0.3.0
5
+ status: stable
6
+ triggers: i18n architecture, message catalog, pluralization rule, ICU message format, RTL layout, locale detection, translation loading, internationalization setup, language fallback, number formatting, date locale, translation management
7
+ ---
8
+
9
+ # Skill — Internationalization Architecture
10
+
11
+ ## When this skill activates
12
+ Any task involving multi-language support, locale handling, message catalogs,
13
+ RTL layouts, number/date formatting, or translation infrastructure.
14
+
15
+ ## Mandatory actions when this skill is active
16
+
17
+ ### Before implementing i18n
18
+ 1. Audit all user-facing strings in the codebase.
19
+ 2. Define the locale detection strategy.
20
+ 3. Choose a message format that handles plurals and gender.
21
+
22
+ ### Message format (ICU MessageFormat)
23
+
24
+ **Why ICU:**
25
+ - Handles plurals correctly across languages (some, such as Arabic, have six plural categories).
26
+ - Handles gender agreement.
27
+ - Handles select/choice patterns.
28
+ - Industry standard supported by most i18n libraries.
29
+
30
+ **Examples:**
31
+ ```
32
+ {count, plural,
33
+ =0 {No items}
34
+ one {# item}
35
+ other {# items}
36
+ }
37
+
38
+ {gender, select,
39
+ male {He liked your post}
40
+ female {She liked your post}
41
+ other {They liked your post}
42
+ }
43
+ ```
44
+
45
+ **Critical rule:** NEVER concatenate strings for messages.
46
+ - BAD: `"Hello " + name + ", you have " + count + " messages"`
47
+ - GOOD: `"Hello {name}, you have {count, plural, one {# message} other {# messages}}"`
48
+
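To illustrate why plural selection cannot be done by concatenation, here is a minimal sketch built on the standard `Intl.PluralRules` API. It is not an ICU MessageFormat parser (it ignores exact matches such as `=0`), and `formatPlural` and the `forms` shape are hypothetical names chosen for this example.

```javascript
// Pick the plural branch for a count using the built-in Intl.PluralRules API.
// `forms` maps CLDR plural categories (zero, one, two, few, many, other) to templates.
function formatPlural(locale, count, forms) {
  const category = new Intl.PluralRules(locale).select(count);
  const template = forms[category] ?? forms.other;
  return template.replace('#', new Intl.NumberFormat(locale).format(count));
}

const messageForms = { one: '# message', other: '# messages' };
console.log(formatPlural('en', 1, messageForms)); // "1 message"
console.log(formatPlural('en', 5, messageForms)); // "5 messages"
```

In practice an i18n library would handle this; the sketch only shows that the category comes from the locale, not from an `if (count === 1)` check.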
49
+ ### Catalog structure
50
+
51
+ **One file per locale, namespaced by feature:**
52
+ ```
53
+ locales/
54
+ en/
55
+ common.json
56
+ auth.json
57
+ dashboard.json
58
+ fr/
59
+ common.json
60
+ auth.json
61
+ dashboard.json
62
+ ```
63
+
64
+ **Rules:**
65
+ - Keys are semantic, not the English text (`auth.loginButton` not `"Log in"`).
66
+ - Flat keys with dot notation or nested objects — pick one, be consistent.
67
+ - Never store HTML in translation strings (use interpolation components).
68
+ - Keep a "base" locale (usually en) as the source of truth.
69
+
70
+ ### Loading strategy
71
+
72
+ **Lazy-load per route/namespace:**
73
+ - Do NOT load all locales upfront — only the active locale.
74
+ - Do NOT load all namespaces — only what the current route needs.
75
+ - Prefetch the next likely namespace on navigation intent.
76
+
77
+ **Implementation:**
78
+ ```javascript
79
+ // Load only when needed
80
+ const { default: messages } = await import(`./locales/${locale}/${namespace}.json`);
81
+ ```
82
+
83
+ **Fallback chain:**
84
+ - Specific locale (fr-CA) → base locale (fr) → default locale (en).
85
+ - Missing key in active locale → fall back, log warning in development.
86
+
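The fallback chain above can be sketched as follows. `fallbackChain` and `resolveMessage` are hypothetical helpers, the catalog shape is an assumption, and returning the key itself as a last resort is a choice made for this example; development-time warning logging is omitted for brevity.

```javascript
// Build the lookup order for a locale tag: "fr-CA" -> ["fr-CA", "fr", "en"].
function fallbackChain(locale, defaultLocale = 'en') {
  const chain = [];
  const parts = locale.split('-');
  while (parts.length > 0) {
    chain.push(parts.join('-'));
    parts.pop();
  }
  if (!chain.includes(defaultLocale)) chain.push(defaultLocale);
  return chain;
}

// Resolve a key against catalogs shaped like { 'fr-CA': { key: 'text' }, ... }.
function resolveMessage(catalogs, locale, key, defaultLocale = 'en') {
  for (const candidate of fallbackChain(locale, defaultLocale)) {
    const message = catalogs[candidate]?.[key];
    if (message !== undefined) return message;
  }
  return key; // last resort (assumption): surface the key so the gap is visible
}
```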
87
+ ### Locale detection
88
+
89
+ **Priority order:**
90
+ 1. User explicit preference (stored in profile/cookie).
91
+ 2. Accept-Language header (server-side).
92
+ 3. `navigator.language` (client-side).
93
+ 4. Geo-IP lookup (least reliable).
94
+ 5. Default locale (en).
95
+
96
+ **Rules:**
97
+ - Let users override detected locale at any time.
98
+ - Persist user choice across sessions.
99
+ - URL strategy: subdomain (fr.app.com) or path prefix (/fr/dashboard).
100
+
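The priority order above could be implemented roughly as follows. This is a sketch under assumptions: `detectLocale` and the `sources` field names are hypothetical, and `acceptLanguage` is assumed to be already parsed into an ordered array of locale tags.

```javascript
// Return the first supported locale, checking sources in the priority order above.
function detectLocale(sources, supported, defaultLocale = 'en') {
  const candidates = [
    sources.userPreference,            // 1. explicit choice (profile/cookie)
    ...(sources.acceptLanguage ?? []), // 2. parsed Accept-Language values
    sources.navigatorLanguage,         // 3. navigator.language
    sources.geoLocale,                 // 4. Geo-IP guess
  ];
  for (const candidate of candidates) {
    if (!candidate) continue;
    if (supported.includes(candidate)) return candidate;
    const base = candidate.split('-')[0]; // "fr-CA" also matches a supported "fr"
    if (supported.includes(base)) return base;
  }
  return defaultLocale; // 5. default locale
}
```

Because the explicit preference is checked first, a stored user choice always overrides anything detected from headers or geography.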
101
+ ### RTL layout support
102
+
103
+ **CSS logical properties (mandatory):**
104
+ - Use `margin-inline-start` not `margin-left`.
105
+ - Use `padding-inline-end` not `padding-right`.
106
+ - Use `inset-inline-start` not `left`.
107
+ - Use `border-inline-start` not `border-left`.
108
+
109
+ **HTML:**
110
+ - Set `dir="rtl"` on the `<html>` element based on locale.
111
+ - Use `dir="auto"` on user-generated content.
112
+
113
+ **Icons and images:**
114
+ - Mirror directional icons (arrows, progress indicators) in RTL.
115
+ - Do NOT mirror: logos, clocks, phone icons, checkmarks.
116
+
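Setting `dir` from the locale might look like the sketch below. The `RTL_LANGUAGES` list is deliberately short and is an assumption for illustration; `directionFor` is a hypothetical helper name.

```javascript
// Languages written right-to-left. This short list is an assumption for the
// sketch; extend it to match the locales the product actually ships.
const RTL_LANGUAGES = new Set(['ar', 'he', 'fa', 'ur']);

// Map a locale tag such as "ar-EG" to the value for the HTML `dir` attribute.
function directionFor(locale) {
  const language = locale.toLowerCase().split('-')[0];
  return RTL_LANGUAGES.has(language) ? 'rtl' : 'ltr';
}

// In a browser: document.documentElement.dir = directionFor(activeLocale);
```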
117
+ ### Number and date formatting
118
+
119
+ **Always use Intl APIs:**
120
+ ```javascript
121
+ // Numbers
122
+ new Intl.NumberFormat(locale, { style: 'currency', currency: 'USD' }).format(amount);
123
+
124
+ // Dates
125
+ new Intl.DateTimeFormat(locale, { dateStyle: 'long', timeStyle: 'short' }).format(date);
126
+
127
+ // Relative time
128
+ new Intl.RelativeTimeFormat(locale, { numeric: 'auto' }).format(-1, 'day');
129
+ ```
130
+
131
+ **Rules:**
132
+ - Never manually format numbers or dates with string templates.
133
+ - Store dates in UTC, display in user's timezone.
134
+ - Currency display must respect locale (symbol position, separator).
135
+
136
+ ### Translation management
137
+
138
+ - Use a translation management system (Crowdin, Lokalise, Phrase) for professional translations.
139
+ - Extract new keys automatically from code (i18next-parser, formatjs extract).
140
+ - CI check: fail if any locale is missing keys that exist in the base locale.
141
+ - Pseudo-localization in development to catch hardcoded strings and layout overflow.
142
+
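The CI check for missing keys can be sketched as a small comparison between the base catalog and each translated catalog. `flattenKeys` and `missingKeys` are hypothetical helpers; they assume catalogs are plain objects, flat or nested.

```javascript
// Flatten nested catalogs into dot-notation keys: { auth: { login: 'x' } } -> ['auth.login'].
function flattenKeys(catalog, prefix = '') {
  return Object.entries(catalog).flatMap(([key, value]) => {
    const path = prefix ? `${prefix}.${key}` : key;
    return value !== null && typeof value === 'object'
      ? flattenKeys(value, path)
      : [path];
  });
}

// Keys present in the base catalog but absent from the target catalog.
function missingKeys(baseCatalog, targetCatalog) {
  const targetKeys = new Set(flattenKeys(targetCatalog));
  return flattenKeys(baseCatalog).filter((key) => !targetKeys.has(key));
}
```

A CI script would load each locale file, call `missingKeys` against the base locale, and exit with a non-zero status if any list is non-empty.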
143
+ ## Self-check before task completion
144
+ - [ ] Did I follow the mandatory actions for this skill?
145
+ - [ ] Did I apply the patterns appropriate to the context?
146
+ - [ ] Did I verify the implementation meets the criteria above?
147
+ - [ ] Did I document decisions and trade-offs made?
@@ -0,0 +1,84 @@
1
+ ---
2
+ name: idempotency-patterns
3
+ version: 1.0.0
4
+ min_mindforge_version: 10.0.9
5
+ status: stable
6
+ triggers: idempotency pattern, idempotency key, exactly once semantic, deduplication strategy, replay safety, idempotent consumer, idempotent api, request deduplication, operation retry safety, duplicate detection, idempotent write, at-most-once processing
7
+ ---
8
+
9
+ # Skill — Idempotency Patterns
10
+
11
+ ## When this skill activates
12
+ Any task involving idempotent API design, idempotency keys, exactly-once semantics,
13
+ deduplication, replay safety, or retry-safe operations.
14
+
15
+ ## Mandatory actions when this skill is active
16
+
17
+ ### Before writing any code
18
+ 1. Identify which operations MUST be idempotent (any retryable operation).
19
+ 2. Determine key strategy (client-generated UUID in header).
20
+ 3. Choose storage backend (Redis for short-lived, DB for permanent records).
21
+
22
+ ### During implementation
23
+ - Accept keys via `Idempotency-Key` header on all POST and PATCH endpoints.
24
+ - Store complete response with the key (status + body, not just success flag).
25
+ - Set appropriate TTL on idempotency records.
26
+ - Handle concurrent duplicates (lock or 409 Conflict).
27
+
28
+ ### After implementation
29
+ - Test concurrent duplicate requests (race condition safety).
30
+ - Verify partial failures are NOT cached (only complete operations).
31
+ - Document which endpoints are idempotent and key requirements.
32
+
33
+ ## Core Flow
34
+
35
+ ```
36
+ 1. Receive request with Idempotency-Key
37
+ 2. Key exists in store? → YES: return cached response. NO: continue.
38
+ 3. Lock key atomically (a concurrent duplicate that cannot acquire the lock waits or gets 409 Conflict)
39
+ 4. Execute operation
40
+ 5. Store: key → {status_code, body, created_at}
41
+ 6. Release lock, return response
42
+ ```
43
+
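The core flow can be sketched with an in-memory store. This is an illustration only: `IdempotencyStore` is a hypothetical class, a `Map` stands in for Redis or a database, and the in-flight promise plays the role of the per-key lock (concurrent duplicates wait for the first result rather than receiving 409).

```javascript
// In-memory idempotency store. A real deployment would use Redis or a database.
class IdempotencyStore {
  constructor() {
    this.completed = new Map(); // key -> { statusCode, body, createdAt }
    this.inFlight = new Map();  // key -> Promise, acts as the per-key lock
  }

  // Runs `operation` at most once per key; duplicates get the stored response.
  async execute(key, operation) {
    const cached = this.completed.get(key);
    if (cached) return { ...cached, replayed: true };

    const pending = this.inFlight.get(key);
    if (pending) {
      const response = await pending; // wait for the request holding the lock
      return { ...response, replayed: true };
    }

    const run = (async () => {
      const { statusCode, body } = await operation();
      const record = { statusCode, body, createdAt: Date.now() };
      // Server errors are not stored, so a later retry can succeed.
      if (statusCode < 500) this.completed.set(key, record);
      return record;
    })();

    this.inFlight.set(key, run); // "lock" the key
    try {
      const response = await run;
      return { ...response, replayed: false };
    } finally {
      this.inFlight.delete(key); // release the lock
    }
  }
}
```

The `replayed` flag is what an HTTP layer would translate into the `Idempotent-Replayed: true` response header.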
44
+ ## Idempotency Key Design
45
+ - Client-generated UUID v4 or ULID. Same key = same intended operation.
46
+ - Scope per-endpoint AND per-user: `idempotency:{user}:{endpoint}:{key}`.
47
+ - Max length: 255 chars. Stable across retries of same business action.
48
+
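As a sketch of the scoping rule above, the storage key can be derived from the user, endpoint, and client-supplied key. `scopedIdempotencyKey` is a hypothetical helper, and rejecting an empty key is an assumption of this example.

```javascript
const MAX_KEY_LENGTH = 255;

// Build the scoped storage key: idempotency:{user}:{endpoint}:{key}
function scopedIdempotencyKey(userId, endpoint, clientKey) {
  if (typeof clientKey !== 'string' || clientKey.length === 0) {
    throw new Error('Idempotency-Key header is required');
  }
  if (clientKey.length > MAX_KEY_LENGTH) {
    throw new Error(`Idempotency-Key must be at most ${MAX_KEY_LENGTH} characters`);
  }
  return `idempotency:${userId}:${endpoint}:${clientKey}`;
}
```

Scoping by user and endpoint means two users who happen to send the same UUID never see each other's cached responses.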
49
+ ## Storage Options
50
+
51
+ - **Redis**: fast, TTL built-in, atomic via Lua. Use for API requests (24h TTL).
52
+ - **Database**: durable, queryable. Use for financial/audit operations. Needs cleanup job.
53
+
54
+ ## Database Patterns
55
+
56
+ - **INSERT ON CONFLICT DO NOTHING**: safe insert; an empty RETURNING result means the row already existed (duplicate).
57
+ - **Conditional UPDATE**: `WHERE version = $expected` — 0 rows = stale (reject).
58
+ - **Transactional Outbox**: atomically persist state + event, poll and publish.
59
+
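The conditional UPDATE pattern is easier to see with a concrete example. The sketch below simulates it against an in-memory `Map` rather than a real database; `conditionalUpdate` and the row shape are hypothetical.

```javascript
// In-memory stand-in for:
//   UPDATE ... SET ..., version = version + 1 WHERE id = $id AND version = $expected
// Returns the number of rows affected, as a SQL driver would.
function conditionalUpdate(table, id, expectedVersion, changes) {
  const row = table.get(id);
  if (!row || row.version !== expectedVersion) return 0; // stale or missing: reject
  table.set(id, { ...row, ...changes, version: row.version + 1 });
  return 1;
}
```

A retried write that still carries the old version affects 0 rows, so applying the same request twice cannot change the data twice.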
60
+ ## Consumer Idempotency
61
+ - Dedup table: `processed_events(event_id PK, processed_at)`.
62
+ - Flow: check if processed → BEGIN → process → INSERT dedup → COMMIT → ACK.
63
+ - TTL: retain dedup records for broker retention + buffer (e.g., 14 days).
64
+
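A minimal sketch of the consumer flow above, with a `Map` standing in for the `processed_events` table. `createConsumer` is a hypothetical helper; in a real system the processing and the dedup insert would share one database transaction, and the primary key on `event_id` is what guards against concurrent deliveries.

```javascript
// Wrap a handler so each event ID is processed at most once.
function createConsumer(handler) {
  const processedEvents = new Map(); // event_id -> processed_at

  return async function consume(event) {
    if (processedEvents.has(event.id)) return 'duplicate'; // already handled: just ACK
    await handler(event);                                   // process
    processedEvents.set(event.id, new Date());              // INSERT dedup record
    return 'processed';                                     // COMMIT, then ACK
  };
}
```

If the handler throws, no dedup record is written, so a redelivery of the same event is processed again rather than skipped.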
65
+ ## API Design
66
+
67
+ - **GET/PUT/DELETE**: naturally idempotent (safe to retry without keys).
68
+ - **POST/PATCH**: require explicit `Idempotency-Key` header.
69
+ - Response headers: `Idempotent-Replayed: true` for cached responses.
70
+
71
+ ## Error Handling
72
+ - 4xx client errors: cache (client should not retry same bad input).
73
+ - 5xx server errors: do NOT cache (may succeed on retry).
74
+ - Partial completion: do NOT cache (delete record, retry from scratch).
75
+
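The caching rules above reduce to a small predicate. `shouldCacheResponse` is a hypothetical helper, and the `completed` flag is assumed to be supplied by the caller once the operation has fully finished.

```javascript
// Decide whether a finished request's response may be stored under its key.
function shouldCacheResponse(statusCode, completed) {
  if (!completed) return false;        // partial completion: never cache
  if (statusCode >= 500) return false; // server error: may succeed on retry
  return true;                         // successes and 4xx client errors are replayable
}
```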
76
+ ## Self-check before task completion
77
+
78
+ - [ ] Are idempotency keys accepted on all non-idempotent endpoints?
79
+ - [ ] Is the complete response cached (status + body)?
80
+ - [ ] Are concurrent duplicates handled safely (lock or 409)?
81
+ - [ ] Is TTL set appropriately on idempotency records?
82
+ - [ ] Are server errors excluded from caching?
83
+ - [ ] Are DB writes using conflict-safe patterns?
84
+ - [ ] Are message consumers deduplicating by event ID?
@@ -0,0 +1,96 @@
1
+ ---
2
+ name: incident-communication
3
+ version: 1.0.0
4
+ min_mindforge_version: 10.3.0
5
+ status: stable
6
+ triggers: incident communication, war room coordination, customer incident message, postmortem facilitation, blameless culture, incident status page, outage communication, stakeholder notification incident, incident timeline, root cause communication, severity classification, incident bridge
7
+ compose:
8
+ - incident-management
9
+ ---
10
+
11
+ # Incident Communication
12
+
13
+ ## When this skill activates
14
+
15
+ This skill activates during production incidents when coordinating war rooms, writing customer-facing incident messages, updating status pages, communicating to stakeholders, facilitating postmortems, or establishing blameless culture. It applies to on-call engineers, incident commanders, and engineering managers responsible for incident response and communication.
16
+
17
+ ## Mandatory actions when this skill is active
18
+
19
+ ### Before incident communication begins
20
+
21
+ 1. **Classify severity immediately** — SEV1 (business-critical service down), SEV2 (degraded performance, partial outage), SEV3 (minor issue, workaround available), SEV4 (cosmetic issue, no user impact). Severity determines communication cadence and escalation.
22
+ 2. **Assign roles explicitly** — Incident Commander (coordinates response), Communications Lead (stakeholder updates), Tech Lead (drives technical resolution), Scribe (documents timeline). Ambiguous roles cause chaos.
23
+ 3. **Establish communication channels** — War room (Slack/Teams/Zoom for internal), status page (for customers), stakeholder channel (for execs/product). Don't mix internal and external comms.
24
+ 4. **Start the incident timeline immediately** — Document: when did it start, when was it detected, who's working on it, what's been tried, what's the current status. Real-time notes prevent memory loss.
25
+
26
+ ### During incident response
27
+
28
+ #### War Room Coordination
29
+
30
+ - **Run structured check-ins every 15-30 minutes** — Incident Commander asks: (1) What's the current status? (2) What are we trying next? (3) Do we need more people? Prevents chaos and ensures everyone is aligned.
31
+ - **Use threaded communication** — Main channel for status updates only. Threads for technical debugging. Don't pollute the main channel with noisy debugging logs.
32
+ - **Limit the war room to essential people** — Too many cooks slow resolution. Core team: 3-5 people. Observers can follow in a read-only channel.
33
+ - **Escalate when stuck** — If the team is stuck for >30 minutes, escalate. Call in the architect, the team that owns the upstream service, or the engineer who built the system. Ego has no place in incidents.
34
+ - **Declare resolution criteria upfront** — What does "fixed" mean? Service is healthy? All users can transact? Error rate below threshold? Prevents premature "all clear" declarations.
35
+
36
+ #### Customer-Facing Communication
37
+
38
+ - **Acknowledge fast, diagnose later** — Within 15 minutes of detection, post to status page: "We are investigating reports of [service] being unavailable. Updates to follow." Don't wait for root cause.
39
+ - **Use the 3-part update structure** — (1) Current status (what's broken), (2) Customer impact (what can't they do), (3) Next steps (when's the next update). No jargon, no excuses.
40
+ - **Update cadence by severity** — SEV1: every 30 minutes. SEV2: every 60 minutes. SEV3: every 2 hours. Customers hate silence more than bad news.
41
+ - **Avoid over-promising** — Don't say "Fixed in 10 minutes" if you're unsure. Say "Actively working on resolution. Next update in 30 minutes."
42
+ - **Post resolution message** — Once resolved, post: what broke, how long it lasted, how many users were impacted, what we did to fix it, what we're doing to prevent recurrence. Transparency builds trust.
43
+
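The 3-part structure and the per-severity cadence can be combined into a small template helper. This is a sketch: `buildStatusUpdate` and its field names are hypothetical, and the message wording is only an example.

```javascript
// Minutes between customer-facing updates, per the cadence above.
const UPDATE_INTERVAL_MINUTES = { SEV1: 30, SEV2: 60, SEV3: 120 };

// Compose a 3-part status page update: current status, customer impact, next steps.
function buildStatusUpdate({ severity, status, impact, postedAt }) {
  const interval = UPDATE_INTERVAL_MINUTES[severity];
  if (interval === undefined) {
    throw new Error(`No update cadence defined for ${severity}`);
  }
  const nextUpdateAt = new Date(postedAt.getTime() + interval * 60 * 1000);
  return {
    nextUpdateAt,
    message: [
      `Current status: ${status}`,
      `Customer impact: ${impact}`,
      `Next update in ${interval} minutes.`,
    ].join('\n'),
  };
}
```

Returning `nextUpdateAt` alongside the message lets the Communications Lead set a reminder so the promised update is not missed.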
44
+ #### Stakeholder Notification
45
+
46
+ - **Notify executives immediately for SEV1** — Execs need to know ASAP, especially if customers are complaining or revenue is impacted. One-line summary: "Service X is down. Y% of users impacted. We're on it."
47
+ - **Use BLUF (Bottom Line Up Front)** — Don't bury the lede. "The payment service is down" comes before "We suspect a database connection pool exhaustion."
48
+ - **Provide ETA cautiously** — If you estimate 2 hours to fix, tell stakeholders 3-4 hours. Better to resolve early than overpromise and underdeliver.
49
+ - **Summarize after resolution** — Once the incident is resolved, send a concise executive summary: what broke, how long, customer impact, resolution, next steps. Save the detailed RCA for the postmortem.
50
+
51
+ #### Incident Timeline Documentation
52
+
53
+ - **Scribe logs everything in real-time** — Timestamp every key event: detection, escalation, hypothesis tested, mitigation applied, resolution. Future-you will thank present-you.
54
+ - **Capture decisions and reasoning** — Don't just log "Rolled back to v1.3." Log "Rolled back to v1.3 because v1.4 introduced N+1 query causing DB saturation."
55
+ - **Include dead ends** — Document failed attempts: "Restarted service at 10:15am. Did not resolve issue." Prevents repeating failed approaches.
56
+ - **Link to relevant artifacts** — Logs, dashboards, PRs, alerts. Timeline should be a navigation hub, not a standalone document.
57
+
58
+ #### Root Cause Communication
59
+
60
+ - **Distinguish proximate cause from root cause** — Proximate: "The database ran out of connections." Root: "We didn't set a connection pool limit, so a spike in traffic exhausted connections." Root cause prevents recurrence.
61
+ - **Use the Five Whys** — Why did X happen? Because Y. Why did Y happen? Because Z. Repeat 5 times until you reach the systemic failure, not the surface symptom.
62
+ - **Avoid blame in RCA communication** — Don't say "Engineer A deployed bad code." Say "A deployment bypassed our automated testing due to a gap in CI pipeline." Focus on systems, not individuals.
63
+ - **Define prevention actions** — Every RCA must end with: what are we doing to prevent this from happening again? Action items with owners and deadlines.
64
+
65
+ #### Blameless Culture
66
+
67
+ - **Assume good intent** — Engineers don't cause incidents on purpose. Treat incidents as learning opportunities, not witch hunts.
68
+ - **Focus on systems, not people** — If one engineer's mistake caused an outage, the real failure is the system that allowed a single mistake to cascade. Fix the system.
69
+ - **Celebrate transparency** — When someone admits a mistake, praise their honesty. If people fear punishment, they hide mistakes until they explode.
70
+ - **Conduct blameless postmortems** — Postmortem facilitator enforces: no blame, no naming individuals (unless volunteering credit), focus on system improvements. If someone says "X person messed up," redirect: "What system gap allowed this to happen?"
71
+
72
+ #### Postmortem Facilitation
73
+
74
+ - **Schedule postmortem within 3-5 days** — Too soon: emotions are high, data is incomplete. Too late: memory fades, urgency disappears.
75
+ - **Invite all incident responders + key stakeholders** — Engineers who responded, on-call rotation, product/exec if customer-facing.
76
+ - **Use a structured template** — Summary, Timeline, Root Cause Analysis, What Went Well, What Went Poorly, Action Items (with owners and deadlines). Don't free-form it.
77
+ - **Timebox to 1 hour** — Longer meetings lose focus. If you can't cover it in 1 hour, schedule a follow-up.
78
+ - **Publish postmortem widely** — Share in engineering all-hands, team wikis, or public blog (if appropriate). Transparency accelerates learning across the org.
79
+
80
+ ### After incident resolution
81
+
82
+ - **Close the loop with customers** — If customers were impacted, follow up: apologize, explain what broke, what you're doing to prevent recurrence. Consider service credits if appropriate.
83
+ - **Track action items to completion** — 70% of postmortem action items don't get done. Assign a DRI (Directly Responsible Individual) and review progress in sprint planning.
84
+ - **Measure incident response effectiveness** — Track: detection time, resolution time, communication cadence, customer satisfaction. Improve the metrics that matter.
85
+ - **Update runbooks** — Every incident reveals gaps in runbooks. After resolution, update the runbook so future responders have better context.
86
+
87
+ ## Self-check before task completion
88
+
89
+ - [ ] Severity is classified (SEV1-4) and roles are assigned (IC, Comms Lead, Tech Lead, Scribe)
90
+ - [ ] War room check-ins happen every 15-30 minutes with structured updates
91
+ - [ ] Customer-facing communication acknowledges the issue within 15 minutes
92
+ - [ ] Status page updates use 3-part structure (status, impact, next steps) with no jargon
93
+ - [ ] Incident timeline is documented in real-time with timestamps and decisions
94
+ - [ ] Root cause distinguishes proximate cause from systemic root cause
95
+ - [ ] Postmortem is scheduled within 3-5 days and uses blameless facilitation
96
+ - [ ] Action items from postmortem have owners, deadlines, and are tracked to completion
@@ -0,0 +1,97 @@
1
+ ---
2
+ name: incident-management
3
+ version: 1.0.0
4
+ min_mindforge_version: 10.0.7
5
+ status: stable
6
+ triggers: incident management process, runbook authoring, severity classification framework, communication template design, on-call rotation design, escalation path design, blameless postmortem facilitation, incident timeline reconstruction, incident playbook, war room methodology, incident retrospective framework, pager rotation scheduling
7
+ compose:
8
+ - observability-stack
9
+ ---
10
+
11
+ # Incident Management
12
+
13
+ ## When this skill activates
14
+
15
+ This skill activates when the user is designing, implementing, or improving incident
16
+ management processes. This includes severity classification frameworks, runbook
17
+ authoring, on-call rotation scheduling, escalation path design, blameless postmortem
18
+ facilitation, war room methodology, communication templates for stakeholders, and
19
+ incident timeline reconstruction for retrospectives.
20
+
21
+ ## Mandatory actions
22
+
23
+ ### Before
24
+
25
+ 1. Identify the current incident management maturity (ad-hoc, documented, measured, optimized).
26
+ 2. Determine the team size, timezone coverage, and existing on-call tooling (PagerDuty, Opsgenie, etc.).
27
+ 3. Assess existing runbook coverage and documentation gaps.
28
+ 4. Review past incident frequency and mean-time-to-recovery (MTTR) trends.
29
+ 5. Confirm observability stack integration (alerts feed into incident workflow).
30
+
31
+ ### During
32
+
33
+ **Severity Framework:**
34
+ - **SEV1 (Critical):** Total service outage affecting all users. Revenue impact. All-hands response. Target acknowledgment: 5 minutes. Target resolution: 1 hour.
35
+ - **SEV2 (Major):** Significant degradation affecting many users. Key features unavailable. On-call + escalation. Target acknowledgment: 15 minutes. Target resolution: 4 hours.
36
+ - **SEV3 (Minor):** Partial degradation affecting a subset of users. Workarounds exist. On-call handles. Target acknowledgment: 30 minutes. Target resolution: 24 hours.
37
+ - **SEV4 (Low):** Minor issue with minimal user impact. Handled during business hours. Target resolution: 72 hours.
38
+ - Severity is determined by impact (users affected) x urgency (revenue/safety/compliance risk).
39
+
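The response targets in the severity framework can be encoded as data so that tooling can compute deadlines. The sketch below is illustrative; `SEVERITY_TARGETS` and `responseDeadlines` are hypothetical names.

```javascript
// Response targets in minutes, taken from the severity framework above.
// SEV4 has no acknowledgment target in the text, so it is left as null.
const SEVERITY_TARGETS = {
  SEV1: { acknowledgeMinutes: 5, resolveMinutes: 60 },
  SEV2: { acknowledgeMinutes: 15, resolveMinutes: 4 * 60 },
  SEV3: { acknowledgeMinutes: 30, resolveMinutes: 24 * 60 },
  SEV4: { acknowledgeMinutes: null, resolveMinutes: 72 * 60 },
};

// Compute the acknowledgment and resolution deadlines for an incident.
function responseDeadlines(severity, detectedAt) {
  const targets = SEVERITY_TARGETS[severity];
  if (!targets) throw new Error(`Unknown severity: ${severity}`);
  const addMinutes = (minutes) =>
    minutes === null ? null : new Date(detectedAt.getTime() + minutes * 60 * 1000);
  return {
    acknowledgeBy: addMinutes(targets.acknowledgeMinutes),
    resolveBy: addMinutes(targets.resolveMinutes),
  };
}
```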
40
+ **Runbook Template:**
41
+ - **Trigger Conditions:** What alert or symptom initiates this runbook.
42
+ - **Investigation Steps:** Ordered diagnostic commands and checks (copy-pasteable).
43
+ - **Mitigation Actions:** Immediate steps to restore service (rollback, failover, scale).
44
+ - **Escalation Criteria:** When to involve additional teams or bump severity.
45
+ - **Communication Templates:** Pre-written status page updates and stakeholder messages.
46
+ - **Verification:** How to confirm the incident is resolved.
47
+ - Keep runbooks in version control, review quarterly, update after every incident.
48
+
49
+ **On-Call Rotation:**
50
+ - Follow-the-sun model for distributed teams (no one is on call overnight).
51
+ - Maximum rotation length: 1 week. Longer causes burnout.
52
+ - Provide compensation (extra pay, time off, or both).
53
+ - Escalation after 15 minutes of no acknowledgment.
54
+ - Secondary on-call as backup for every primary.
55
+ - On-call handoff includes open incidents, recent changes, and known risks.
56
+
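The timeout-based escalation rule above can be sketched as a pure function. `currentResponder` and the `page` fields are hypothetical; real paging tools such as PagerDuty or Opsgenie implement this through their own escalation policies.

```javascript
const ESCALATION_TIMEOUT_MINUTES = 15;

// Who should hold the page right now: the primary, or the secondary after the timeout.
function currentResponder(page, now) {
  if (page.acknowledgedAt) return page.acknowledgedBy;
  const waitedMinutes = (now.getTime() - page.sentAt.getTime()) / 60000;
  return waitedMinutes >= ESCALATION_TIMEOUT_MINUTES ? page.secondary : page.primary;
}
```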
57
+ **War Room Methodology:**
58
+ - Designate an Incident Commander (IC) who coordinates and does not debug.
59
+ - IC assigns roles: Communications Lead, Technical Lead, Scribe.
60
+ - Communication cadence: status page updates every 15 minutes during SEV1/SEV2.
61
+ - Use a shared channel (Slack/Teams) with pinned timeline.
62
+ - No blame during active incident — focus on mitigation only.
63
+
64
+ **Postmortem Structure:**
65
+ - **Timeline:** Minute-by-minute reconstruction of events.
66
+ - **Impact:** Users affected, duration, revenue/SLA impact.
67
+ - **Root Cause:** The systemic failure that allowed the incident.
68
+ - **Contributing Factors:** What made detection/resolution harder.
69
+ - **Action Items:** Specific, assigned, time-boxed improvements.
70
+ - Blameless: focus on systems and processes, never individuals.
71
+ - Share postmortems broadly — learning is organizational.
72
+ - Track action item completion rate as a metric.
73
+
74
+ **Communication:**
75
+ - Status page updates every 15 minutes during active incidents.
76
+ - Internal stakeholder updates via designated channel.
77
+ - Customer-facing communication: acknowledge → investigate → mitigate → resolve.
78
+ - Post-resolution: summary email with impact and next steps.
79
+
80
+ ### After
81
+
82
+ 1. Verify runbooks are tested (tabletop exercises quarterly).
83
+ 2. Confirm escalation paths are current and contact information is valid.
84
+ 3. Validate on-call schedule has no coverage gaps.
85
+ 4. Review postmortem action item completion from previous incidents.
86
+ 5. Measure MTTR trends and identify systemic improvement opportunities.
87
+
88
+ ## Self-check before task completion
89
+
90
+ - [ ] Severity levels are clearly defined with response time targets.
91
+ - [ ] Runbooks follow the template and are actionable (copy-pasteable commands).
92
+ - [ ] On-call rotation is sustainable (max 1 week, compensation, follow-the-sun).
93
+ - [ ] Escalation paths have timeout-based auto-escalation.
94
+ - [ ] Postmortem template is blameless and includes action items.
95
+ - [ ] Communication cadence is defined for each severity level.
96
+ - [ ] All processes integrate with existing alerting and observability tooling.
97
+ - [ ] Quarterly review cadence is established for runbook freshness.