mindforge-cc 10.0.3 → 11.0.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (333)
  1. package/.mindforge/MINDFORGE-V2-SCHEMA.json +43 -10
  2. package/.mindforge/config.json +30 -2
  3. package/.mindforge/engine/cross-model-eval.md +74 -0
  4. package/.mindforge/engine/proactive/signal-detector.md +60 -0
  5. package/.mindforge/engine/proactive/suggestion-engine.md +100 -0
  6. package/.mindforge/personas/agent-architect.md +57 -0
  7. package/.mindforge/personas/agent-evaluator.md +162 -0
  8. package/.mindforge/personas/agent-memory-designer.md +157 -0
  9. package/.mindforge/personas/agent-ops-engineer.md +120 -0
  10. package/.mindforge/personas/agent-orchestrator.md +112 -0
  11. package/.mindforge/personas/ai-economist.md +57 -0
  12. package/.mindforge/personas/ai-safety-engineer.md +57 -0
  13. package/.mindforge/personas/analytics-engineer.md +57 -0
  14. package/.mindforge/personas/anti-pattern-hunter.md +61 -0
  15. package/.mindforge/personas/api-gateway-designer.md +132 -0
  16. package/.mindforge/personas/auth-engineer.md +112 -0
  17. package/.mindforge/personas/build-engineer.md +57 -0
  18. package/.mindforge/personas/business-analyst.md +56 -0
  19. package/.mindforge/personas/cache-architect.md +100 -0
  20. package/.mindforge/personas/causal-scientist.md +57 -0
  21. package/.mindforge/personas/cdn-architect.md +118 -0
  22. package/.mindforge/personas/change-agent.md +104 -0
  23. package/.mindforge/personas/code-narrator.md +52 -0
  24. package/.mindforge/personas/codegen-specialist.md +68 -0
  25. package/.mindforge/personas/communication-architect.md +102 -0
  26. package/.mindforge/personas/compliance-engineer.md +96 -0
  27. package/.mindforge/personas/consensus-engineer.md +116 -0
  28. package/.mindforge/personas/contract-tester.md +60 -192
  29. package/.mindforge/personas/data-architect.md +108 -0
  30. package/.mindforge/personas/data-mesh-architect.md +57 -0
  31. package/.mindforge/personas/data-pipeline-architect.md +120 -0
  32. package/.mindforge/personas/de-sloppifier.md +60 -0
  33. package/.mindforge/personas/debt-manager.md +66 -0
  34. package/.mindforge/personas/decision-architect.md +82 -51
  35. package/.mindforge/personas/deployment-captain.md +74 -0
  36. package/.mindforge/personas/design-system-lead.md +112 -0
  37. package/.mindforge/personas/dmux-orchestrator.md +75 -0
  38. package/.mindforge/personas/dx-engineer.md +96 -0
  39. package/.mindforge/personas/ecommerce-engineer.md +57 -0
  40. package/.mindforge/personas/edge-engineer.md +94 -0
  41. package/.mindforge/personas/edtech-architect.md +106 -0
  42. package/.mindforge/personas/embedding-architect.md +57 -0
  43. package/.mindforge/personas/environment-engineer.md +57 -0
  44. package/.mindforge/personas/eval-judge.md +55 -0
  45. package/.mindforge/personas/event-architect.md +102 -0
  46. package/.mindforge/personas/experiment-designer.md +138 -0
  47. package/.mindforge/personas/feature-store-engineer.md +57 -0
  48. package/.mindforge/personas/finops-analyst.md +66 -0
  49. package/.mindforge/personas/fintech-architect.md +57 -0
  50. package/.mindforge/personas/flutter-engineer.md +104 -0
  51. package/.mindforge/personas/gaming-engineer.md +57 -0
  52. package/.mindforge/personas/graphql-designer.md +73 -0
  53. package/.mindforge/personas/healthcare-engineer.md +57 -0
  54. package/.mindforge/personas/hiring-strategist.md +105 -0
  55. package/.mindforge/personas/hitl-architect.md +165 -0
  56. package/.mindforge/personas/i18n-architect.md +69 -0
  57. package/.mindforge/personas/iot-architect.md +105 -0
  58. package/.mindforge/personas/knowledge-curator.md +139 -0
  59. package/.mindforge/personas/knowledge-engineer.md +57 -0
  60. package/.mindforge/personas/lakehouse-architect.md +57 -0
  61. package/.mindforge/personas/llm-orchestrator.md +57 -0
  62. package/.mindforge/personas/logistics-architect.md +106 -0
  63. package/.mindforge/personas/market-analyst.md +53 -0
  64. package/.mindforge/personas/marketplace-engineer.md +105 -0
  65. package/.mindforge/personas/mcp-designer.md +54 -0
  66. package/.mindforge/personas/meeting-designer.md +104 -0
  67. package/.mindforge/personas/mentorship-lead.md +106 -0
  68. package/.mindforge/personas/migration-architect.md +57 -0
  69. package/.mindforge/personas/ml-ops-engineer.md +101 -0
  70. package/.mindforge/personas/mobile-architect.md +105 -0
  71. package/.mindforge/personas/mobile-security-engineer.md +106 -0
  72. package/.mindforge/personas/multi-tenancy-architect.md +71 -0
  73. package/.mindforge/personas/multimodal-engineer.md +57 -0
  74. package/.mindforge/personas/offline-specialist.md +105 -0
  75. package/.mindforge/personas/onboarding-navigator.md +63 -0
  76. package/.mindforge/personas/payments-engineer.md +135 -0
  77. package/.mindforge/personas/pipeline-engineer.md +115 -0
  78. package/.mindforge/personas/platform-engineer.md +97 -0
  79. package/.mindforge/personas/platform-lead.md +57 -0
  80. package/.mindforge/personas/privacy-engineer.md +57 -0
  81. package/.mindforge/personas/product-owner.md +56 -0
  82. package/.mindforge/personas/productivity-analyst.md +57 -0
  83. package/.mindforge/personas/prompt-architect.md +101 -0
  84. package/.mindforge/personas/proofreader.md +53 -0
  85. package/.mindforge/personas/pwa-architect.md +105 -0
  86. package/.mindforge/personas/quality-scorer.md +63 -0
  87. package/.mindforge/personas/react-native-engineer.md +106 -0
  88. package/.mindforge/personas/resilience-engineer.md +69 -0
  89. package/.mindforge/personas/rfc-architect.md +64 -0
  90. package/.mindforge/personas/saga-orchestrator.md +80 -0
  91. package/.mindforge/personas/secrets-engineer.md +57 -0
  92. package/.mindforge/personas/skill-smith.md +79 -0
  93. package/.mindforge/personas/sre-lead.md +107 -0
  94. package/.mindforge/personas/stream-engineer.md +57 -0
  95. package/.mindforge/personas/streaming-engineer.md +64 -0
  96. package/.mindforge/personas/swarm-templates.json +674 -44
  97. package/.mindforge/personas/system-designer.md +57 -0
  98. package/.mindforge/personas/team-coach.md +120 -0
  99. package/.mindforge/personas/tech-lead-coach.md +103 -0
  100. package/.mindforge/personas/technical-writer-lead.md +111 -0
  101. package/.mindforge/personas/vibe-checker.md +75 -0
  102. package/.mindforge/personas/worktree-manager.md +56 -0
  103. package/.mindforge/personas/zero-trust-engineer.md +113 -0
  104. package/.mindforge/skills/a11y-testing/SKILL.md +143 -0
  105. package/.mindforge/skills/agent-evaluation-framework/SKILL.md +227 -0
  106. package/.mindforge/skills/agent-memory-design/SKILL.md +199 -0
  107. package/.mindforge/skills/agent-orchestration-patterns/SKILL.md +129 -0
  108. package/.mindforge/skills/agent-tool-selection/SKILL.md +204 -0
  109. package/.mindforge/skills/ai-agent-deployment/SKILL.md +176 -0
  110. package/.mindforge/skills/ai-cost-management/SKILL.md +57 -0
  111. package/.mindforge/skills/ai-safety-alignment/SKILL.md +53 -0
  112. package/.mindforge/skills/analytics-instrumentation/SKILL.md +172 -0
  113. package/.mindforge/skills/api-gateway-patterns/SKILL.md +177 -0
  114. package/.mindforge/skills/api-marketplace/SKILL.md +56 -0
  115. package/.mindforge/skills/api-versioning/SKILL.md +100 -0
  116. package/.mindforge/skills/app-store-deployment/SKILL.md +44 -0
  117. package/.mindforge/skills/architecture-tradeoff-analysis/SKILL.md +97 -0
  118. package/.mindforge/skills/audit-logging/SKILL.md +140 -0
  119. package/.mindforge/skills/auth-patterns/SKILL.md +148 -0
  120. package/.mindforge/skills/autonomous-agent-harness/SKILL.md +218 -0
  121. package/.mindforge/skills/autonomous-agents/SKILL.md +59 -0
  122. package/.mindforge/skills/build-system-optimization/SKILL.md +54 -0
  123. package/.mindforge/skills/build-vs-buy/SKILL.md +80 -0
  124. package/.mindforge/skills/bundle-optimization/SKILL.md +174 -0
  125. package/.mindforge/skills/business-analyst/SKILL.md +82 -0
  126. package/.mindforge/skills/caching-strategies/SKILL.md +132 -0
  127. package/.mindforge/skills/capacity-planning/SKILL.md +96 -0
  128. package/.mindforge/skills/causal-inference/SKILL.md +42 -0
  129. package/.mindforge/skills/cdn-optimization/SKILL.md +212 -0
  130. package/.mindforge/skills/change-management/SKILL.md +106 -0
  131. package/.mindforge/skills/chaos-engineering/SKILL.md +99 -0
  132. package/.mindforge/skills/ci-cd-pipeline/SKILL.md +118 -0
  133. package/.mindforge/skills/cli-design/SKILL.md +118 -0
  134. package/.mindforge/skills/code-generation-patterns/SKILL.md +92 -0
  135. package/.mindforge/skills/code-review-methodology/SKILL.md +180 -0
  136. package/.mindforge/skills/code-tour/SKILL.md +145 -0
  137. package/.mindforge/skills/codebase-onboarding/SKILL.md +95 -0
  138. package/.mindforge/skills/compliance-as-code/SKILL.md +195 -0
  139. package/.mindforge/skills/conflict-resolution/SKILL.md +87 -0
  140. package/.mindforge/skills/connection-pooling/SKILL.md +151 -0
  141. package/.mindforge/skills/container-security/SKILL.md +151 -0
  142. package/.mindforge/skills/context-engineering/SKILL.md +114 -0
  143. package/.mindforge/skills/contract-testing/SKILL.md +85 -0
  144. package/.mindforge/skills/cost-estimation/SKILL.md +82 -0
  145. package/.mindforge/skills/cqrs-event-sourcing/SKILL.md +95 -0
  146. package/.mindforge/skills/cross-platform-testing/SKILL.md +43 -0
  147. package/.mindforge/skills/data-governance/SKILL.md +42 -0
  148. package/.mindforge/skills/data-lakehouse/SKILL.md +42 -0
  149. package/.mindforge/skills/data-mesh/SKILL.md +42 -0
  150. package/.mindforge/skills/data-modeling/SKILL.md +107 -0
  151. package/.mindforge/skills/data-pipeline-design/SKILL.md +171 -0
  152. package/.mindforge/skills/data-privacy-engineering/SKILL.md +42 -0
  153. package/.mindforge/skills/database-performance/SKILL.md +174 -0
  154. package/.mindforge/skills/database-sharding-advanced/SKILL.md +206 -0
  155. package/.mindforge/skills/de-sloppify/SKILL.md +120 -0
  156. package/.mindforge/skills/defense-in-depth/SKILL.md +84 -0
  157. package/.mindforge/skills/delegation-patterns/SKILL.md +123 -0
  158. package/.mindforge/skills/dependency-management/SKILL.md +94 -0
  159. package/.mindforge/skills/deployment-workflow/SKILL.md +135 -0
  160. package/.mindforge/skills/design-system/SKILL.md +113 -0
  161. package/.mindforge/skills/developer-onboarding/SKILL.md +99 -0
  162. package/.mindforge/skills/developer-productivity-metrics/SKILL.md +59 -0
  163. package/.mindforge/skills/distributed-consensus/SKILL.md +141 -0
  164. package/.mindforge/skills/dmux-workflows/SKILL.md +141 -0
  165. package/.mindforge/skills/dns-architecture/SKILL.md +167 -0
  166. package/.mindforge/skills/ecommerce-architecture/SKILL.md +41 -0
  167. package/.mindforge/skills/edge-computing/SKILL.md +91 -0
  168. package/.mindforge/skills/edtech-platform/SKILL.md +41 -0
  169. package/.mindforge/skills/email-deliverability/SKILL.md +177 -0
  170. package/.mindforge/skills/embedding-systems/SKILL.md +55 -0
  171. package/.mindforge/skills/environment-management/SKILL.md +54 -0
  172. package/.mindforge/skills/error-handling-architecture/SKILL.md +118 -0
  173. package/.mindforge/skills/estimation-techniques/SKILL.md +113 -0
  174. package/.mindforge/skills/eval-harness/SKILL.md +180 -0
  175. package/.mindforge/skills/event-driven-architecture/SKILL.md +162 -0
  176. package/.mindforge/skills/experiment-design/SKILL.md +139 -0
  177. package/.mindforge/skills/experiment-platform/SKILL.md +43 -0
  178. package/.mindforge/skills/feature-engineering/SKILL.md +42 -0
  179. package/.mindforge/skills/feature-flag-management/SKILL.md +183 -0
  180. package/.mindforge/skills/fine-tuning-workflow/SKILL.md +189 -0
  181. package/.mindforge/skills/fintech-patterns/SKILL.md +41 -0
  182. package/.mindforge/skills/flutter-architecture/SKILL.md +42 -0
  183. package/.mindforge/skills/gaming-backend/SKILL.md +41 -0
  184. package/.mindforge/skills/git-workflow-design/SKILL.md +129 -0
  185. package/.mindforge/skills/graceful-degradation/SKILL.md +95 -0
  186. package/.mindforge/skills/graphql-patterns/SKILL.md +243 -0
  187. package/.mindforge/skills/guardrails-and-safety/SKILL.md +137 -0
  188. package/.mindforge/skills/healthcare-systems/SKILL.md +40 -0
  189. package/.mindforge/skills/hiring-engineering/SKILL.md +119 -0
  190. package/.mindforge/skills/human-in-the-loop-design/SKILL.md +234 -0
  191. package/.mindforge/skills/i18n-architecture/SKILL.md +147 -0
  192. package/.mindforge/skills/idempotency-patterns/SKILL.md +84 -0
  193. package/.mindforge/skills/incident-communication/SKILL.md +96 -0
  194. package/.mindforge/skills/incident-management/SKILL.md +97 -0
  195. package/.mindforge/skills/infrastructure-as-code/SKILL.md +98 -0
  196. package/.mindforge/skills/instinct-clustering/SKILL.md +190 -0
  197. package/.mindforge/skills/internal-developer-platform/SKILL.md +51 -0
  198. package/.mindforge/skills/iot-platform/SKILL.md +41 -0
  199. package/.mindforge/skills/k8s-deployment/SKILL.md +358 -0
  200. package/.mindforge/skills/knowledge-graphs/SKILL.md +56 -0
  201. package/.mindforge/skills/knowledge-sharing-systems/SKILL.md +112 -0
  202. package/.mindforge/skills/llm-cost-optimization/SKILL.md +198 -0
  203. package/.mindforge/skills/llm-orchestration/SKILL.md +56 -0
  204. package/.mindforge/skills/load-testing/SKILL.md +84 -0
  205. package/.mindforge/skills/logistics-optimization/SKILL.md +40 -0
  206. package/.mindforge/skills/market-researcher/SKILL.md +99 -0
  207. package/.mindforge/skills/marketplace-trust/SKILL.md +40 -0
  208. package/.mindforge/skills/mcp-server-patterns/SKILL.md +264 -0
  209. package/.mindforge/skills/media-streaming/SKILL.md +41 -0
  210. package/.mindforge/skills/meeting-architecture/SKILL.md +146 -0
  211. package/.mindforge/skills/mentoring-patterns/SKILL.md +77 -0
  212. package/.mindforge/skills/microservices-patterns/SKILL.md +83 -0
  213. package/.mindforge/skills/migration-platform/SKILL.md +61 -0
  214. package/.mindforge/skills/migration-strategies/SKILL.md +129 -0
  215. package/.mindforge/skills/ml-feature-store/SKILL.md +56 -0
  216. package/.mindforge/skills/ml-monitoring/SKILL.md +42 -0
  217. package/.mindforge/skills/mobile-performance/SKILL.md +44 -0
  218. package/.mindforge/skills/mobile-security/SKILL.md +45 -0
  219. package/.mindforge/skills/model-evaluation/SKILL.md +53 -0
  220. package/.mindforge/skills/monorepo-management/SKILL.md +100 -0
  221. package/.mindforge/skills/multi-tenancy-patterns/SKILL.md +145 -0
  222. package/.mindforge/skills/multi-turn-conversation-design/SKILL.md +206 -0
  223. package/.mindforge/skills/multimodal-ai/SKILL.md +51 -0
  224. package/.mindforge/skills/mutation-testing/SKILL.md +97 -0
  225. package/.mindforge/skills/notification-system-design/SKILL.md +168 -0
  226. package/.mindforge/skills/observability-stack/SKILL.md +136 -0
  227. package/.mindforge/skills/offline-first-design/SKILL.md +43 -0
  228. package/.mindforge/skills/on-call-design/SKILL.md +111 -0
  229. package/.mindforge/skills/pagination-patterns/SKILL.md +230 -0
  230. package/.mindforge/skills/payment-integration/SKILL.md +176 -0
  231. package/.mindforge/skills/performance-reviews/SKILL.md +140 -0
  232. package/.mindforge/skills/platform-observability/SKILL.md +58 -0
  233. package/.mindforge/skills/platform-reliability/SKILL.md +52 -0
  234. package/.mindforge/skills/post-incident-learning/SKILL.md +96 -0
  235. package/.mindforge/skills/product-manager/SKILL.md +104 -0
  236. package/.mindforge/skills/progressive-web-app/SKILL.md +44 -0
  237. package/.mindforge/skills/prompt-engineering/SKILL.md +94 -0
  238. package/.mindforge/skills/proofreader/SKILL.md +158 -0
  239. package/.mindforge/skills/push-notification-architecture/SKILL.md +45 -0
  240. package/.mindforge/skills/python-performance/SKILL.md +183 -0
  241. package/.mindforge/skills/quality-audit/SKILL.md +171 -0
  242. package/.mindforge/skills/queue-design/SKILL.md +85 -0
  243. package/.mindforge/skills/rag-architecture/SKILL.md +176 -0
  244. package/.mindforge/skills/rate-limiting-design/SKILL.md +94 -0
  245. package/.mindforge/skills/react-native-patterns/SKILL.md +42 -0
  246. package/.mindforge/skills/react-performance/SKILL.md +229 -0
  247. package/.mindforge/skills/real-time-analytics/SKILL.md +42 -0
  248. package/.mindforge/skills/real-time-sync/SKILL.md +83 -0
  249. package/.mindforge/skills/responsive-native/SKILL.md +44 -0
  250. package/.mindforge/skills/responsive-patterns/SKILL.md +141 -0
  251. package/.mindforge/skills/rfc-pipeline/SKILL.md +114 -0
  252. package/.mindforge/skills/saas-multi-tenant/SKILL.md +41 -0
  253. package/.mindforge/skills/santa-method/SKILL.md +134 -0
  254. package/.mindforge/skills/search-implementation/SKILL.md +98 -0
  255. package/.mindforge/skills/secrets-platform/SKILL.md +56 -0
  256. package/.mindforge/skills/secrets-rotation/SKILL.md +173 -0
  257. package/.mindforge/skills/self-serve-infrastructure/SKILL.md +51 -0
  258. package/.mindforge/skills/serverless-patterns/SKILL.md +119 -0
  259. package/.mindforge/skills/skill-creator-meta/SKILL.md +146 -0
  260. package/.mindforge/skills/sprint-retrospective-facilitation/SKILL.md +112 -0
  261. package/.mindforge/skills/stakeholder-communication/SKILL.md +85 -0
  262. package/.mindforge/skills/state-management/SKILL.md +104 -0
  263. package/.mindforge/skills/stream-processing/SKILL.md +43 -0
  264. package/.mindforge/skills/streaming-architecture/SKILL.md +81 -0
  265. package/.mindforge/skills/supply-chain-security/SKILL.md +145 -0
  266. package/.mindforge/skills/synthetic-data-generation/SKILL.md +52 -0
  267. package/.mindforge/skills/system-design/SKILL.md +88 -0
  268. package/.mindforge/skills/team-topology-design/SKILL.md +107 -0
  269. package/.mindforge/skills/technical-debt-management/SKILL.md +86 -0
  270. package/.mindforge/skills/technical-interview-design/SKILL.md +98 -0
  271. package/.mindforge/skills/technical-leadership/SKILL.md +75 -0
  272. package/.mindforge/skills/technical-writing/SKILL.md +237 -0
  273. package/.mindforge/skills/technology-radar/SKILL.md +88 -0
  274. package/.mindforge/skills/testing-anti-patterns/SKILL.md +288 -0
  275. package/.mindforge/skills/tool-design/SKILL.md +138 -0
  276. package/.mindforge/skills/typescript-advanced/SKILL.md +198 -0
  277. package/.mindforge/skills/using-git-worktrees/SKILL.md +139 -0
  278. package/.mindforge/skills/verification-loop/SKILL.md +13 -1
  279. package/.mindforge/skills/vibe-security/SKILL.md +165 -0
  280. package/.mindforge/skills/visual-regression-testing/SKILL.md +97 -0
  281. package/.mindforge/skills/websocket-patterns/SKILL.md +203 -0
  282. package/.mindforge/skills/writing-plans/SKILL.md +170 -0
  283. package/.mindforge/skills/writing-skills/SKILL.md +216 -0
  284. package/.mindforge/skills/zero-trust-architecture/SKILL.md +166 -0
  285. package/CHANGELOG.md +240 -0
  286. package/MINDFORGE.md +4 -4
  287. package/README.md +49 -4
  288. package/RELEASENOTES.md +80 -0
  289. package/SECURITY.md +20 -8
  290. package/bin/autonomous/audit-writer.js +13 -0
  291. package/bin/autonomous/auto-runner.js +74 -16
  292. package/bin/autonomous/context-refactorer.js +26 -11
  293. package/bin/autonomous/state-manager.js +62 -6
  294. package/bin/autonomous/stuck-monitor.js +46 -7
  295. package/bin/autonomous/wave-executor.js +66 -25
  296. package/bin/dashboard/api-router.js +43 -0
  297. package/bin/dashboard/metrics-aggregator.js +28 -1
  298. package/bin/dashboard/server.js +67 -4
  299. package/bin/dashboard/sse-bridge.js +4 -4
  300. package/bin/engine/feedback-loop.js +8 -0
  301. package/bin/engine/intelligence-interlock.js +32 -15
  302. package/bin/engine/logic-drift-detector.js +2 -1
  303. package/bin/engine/nexus-tracer.js +3 -2
  304. package/bin/engine/remediation-engine.js +155 -32
  305. package/bin/engine/self-corrective-synthesizer.js +84 -10
  306. package/bin/engine/sre-manager.js +12 -4
  307. package/bin/engine/temporal-hub.js +131 -34
  308. package/bin/governance/approve.js +41 -5
  309. package/bin/governance/impact-analyzer.js +28 -0
  310. package/bin/governance/policy-engine.js +10 -3
  311. package/bin/governance/quantum-crypto.js +32 -19
  312. package/bin/governance/rbac-manager.js +74 -2
  313. package/bin/governance/ztai-manager.js +49 -7
  314. package/bin/hindsight-injector.js +3 -3
  315. package/bin/memory/eis-client.js +71 -34
  316. package/bin/memory/embedding-engine.js +61 -0
  317. package/bin/memory/knowledge-graph.js +58 -5
  318. package/bin/memory/knowledge-indexer.js +53 -6
  319. package/bin/memory/knowledge-store.js +22 -0
  320. package/bin/migrations/10.7.0-to-11.0.0.js +110 -0
  321. package/bin/migrations/schema-versions.js +13 -0
  322. package/bin/models/anthropic-provider.js +45 -0
  323. package/bin/models/cloud-broker.js +68 -20
  324. package/bin/models/gemini-provider.js +51 -0
  325. package/bin/models/model-client.js +20 -0
  326. package/bin/models/model-router.js +28 -8
  327. package/bin/models/openai-provider.js +44 -0
  328. package/bin/utils/file-io.js +63 -1
  329. package/bin/utils/index.js +58 -0
  330. package/docs/getting-started.md +1 -1
  331. package/docs/user-guide.md +2 -2
  332. package/package.json +2 -2
  333. package/.mindforge/personas/data-privacy-engineer.md +0 -187
@@ -0,0 +1,129 @@
1
+ ---
2
+ name: agent-orchestration-patterns
3
+ version: 1.0.0
4
+ min_mindforge_version: 10.0.7
5
+ status: stable
6
+ triggers: agent orchestration pattern, supervisor worker pattern, agent pipeline topology, agent debate pattern, agent consensus protocol, map reduce agents, handoff protocol design, multi-agent coordination, agent topology design, swarm pattern design, agent composition, failure propagation pattern
7
+ compose:
8
+ - tool-design
9
+ ---
10
+
11
+ # Agent Orchestration Patterns
12
+
13
+ ## When this skill activates
14
+
15
+ This skill activates when designing multi-agent systems, choosing coordination topologies, implementing handoff protocols, or debugging agent-to-agent communication failures. It applies to any system where two or more autonomous agents must collaborate, compete, or chain their outputs to accomplish a goal.
16
+
17
+ ## Mandatory actions when this skill is active
18
+
19
+ ### Before
20
+
21
+ 1. **Map the problem space** — Identify all subtasks. Determine which require sequential execution (dependencies) and which are independent (parallelizable).
22
+ 2. **Assess complexity** — Single-agent tasks masquerading as multi-agent problems waste coordination overhead. Only orchestrate when genuine specialization or parallelism is needed.
23
+ 3. **Define boundaries** — Each agent must have a clear responsibility boundary. Overlapping responsibilities cause conflicts. Gaps cause dropped tasks.
24
+ 4. **Choose state strategy** — Decide upfront: shared state (agents read/write common store) or isolated state (agents communicate only via messages).
25
+
26
+ ### During
27
+
28
+ #### Pattern Catalog
29
+
30
+ **1. Supervisor/Worker (Hub and Spoke)**
31
+ - **Topology** — One coordinator agent decomposes the task and dispatches subtasks to N worker agents. Workers report results back to the supervisor.
32
+ - **When to use** — Task is decomposable into independent units. Workers are interchangeable or specialized but non-overlapping.
33
+ - **Supervisor responsibilities** — Task decomposition, worker assignment, result aggregation, error handling, timeout enforcement.
34
+ - **Worker responsibilities** — Execute assigned subtask, report structured results, signal failure early.
35
+ - **Pitfall** — Supervisor becomes bottleneck. Mitigate with async dispatch and parallel worker execution.
36
+
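A minimal sketch of the supervisor/worker topology described above, assuming Python's `asyncio`. The names `run_worker` and `supervise` and the subtask strings are hypothetical and are not part of mindforge-cc. It illustrates async dispatch, per-worker timeout enforcement, and result aggregation.

```python
import asyncio

async def run_worker(subtask: str) -> dict:
    """Hypothetical worker: runs one subtask and reports a structured result."""
    await asyncio.sleep(0)  # stand-in for real work
    if subtask == "bad":
        raise ValueError("cannot process subtask")  # signal failure early
    return {"subtask": subtask, "status": "ok", "output": subtask.upper()}

async def supervise(subtasks: list, timeout_s: float = 5.0) -> dict:
    """Dispatch every subtask in parallel, enforce a timeout, aggregate results."""
    async def guarded(subtask: str) -> dict:
        try:
            return await asyncio.wait_for(run_worker(subtask), timeout=timeout_s)
        except Exception as exc:  # worker error or timeout
            return {"subtask": subtask, "status": "failed", "error": repr(exc)}

    # gather() returns results in the same order the subtasks were submitted.
    results = await asyncio.gather(*(guarded(s) for s in subtasks))
    return {
        "succeeded": [r for r in results if r["status"] == "ok"],
        "failed": [r for r in results if r["status"] == "failed"],
    }

report = asyncio.run(supervise(["lint", "bad", "test"]))
```

Because the supervisor awaits all workers concurrently, one slow or failing worker does not serialize the others, which is the mitigation the pitfall above points to.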
37
+ **2. Pipeline (Sequential Chain)**
38
+ - **Topology** — Agent A's output becomes Agent B's input. Linear flow through N stages.
39
+ - **When to use** — Tasks have natural ordering (research → draft → review → publish). Each stage transforms or enriches the previous output.
40
+ - **Stage contract** — Each stage must define its input schema and output schema. Type mismatch between stages is the most common pipeline failure.
41
+ - **Error handling** — Fail the pipeline on any stage failure. Partial results from earlier stages should be preserved for debugging.
42
+ - **Optimization** — Streaming between stages reduces latency. Agent B can begin processing as Agent A emits output.
43
+
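The stage contract and error-handling rules above can be sketched as follows. This is an illustration under simplifying assumptions: each schema is reduced to a plain Python type, and `Stage`, `PipelineResult`, and `run_pipeline` are hypothetical names, not mindforge-cc APIs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Stage:
    name: str
    input_type: type    # input schema, reduced to a Python type for brevity
    output_type: type   # output schema, same simplification
    run: Callable[[Any], Any]

@dataclass
class PipelineResult:
    ok: bool = True
    outputs: dict = field(default_factory=dict)  # per-stage outputs kept for debugging
    error: str = ""

def run_pipeline(stages: list, data: Any) -> PipelineResult:
    result = PipelineResult()
    for stage in stages:
        try:
            if not isinstance(data, stage.input_type):
                raise TypeError(f"expected {stage.input_type.__name__}, got {type(data).__name__}")
            data = stage.run(data)
            if not isinstance(data, stage.output_type):
                raise TypeError(f"produced {type(data).__name__}, promised {stage.output_type.__name__}")
        except Exception as exc:
            # Fail the whole pipeline, but keep what earlier stages produced.
            result.ok = False
            result.error = f"{stage.name}: {exc}"
            return result
        result.outputs[stage.name] = data
    return result

research = Stage("research", str, list, lambda topic: [f"fact about {topic}"])
draft = Stage("draft", list, str, lambda facts: " ".join(facts))
review = Stage("review", str, dict, lambda text: {"approved": bool(text)})

good = run_pipeline([research, draft, review], "caching")
bad = run_pipeline([research, review], "caching")  # review receives a list, not a str
```

In the `bad` run the type mismatch is caught at the stage boundary, and the `research` output is still available in `bad.outputs` for debugging.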
44
+ **3. Debate (Adversarial)**
45
+ - **Topology** — Two or more agents argue opposing positions. A synthesizer agent evaluates arguments and produces a final decision.
46
+ - **When to use** — High-stakes decisions where bias is a risk. Architecture choices, security reviews, strategic decisions.
47
+ - **Protocol** — Round 1: each debater states position with evidence. Round 2: each debater rebuts opponent's position. Round 3: synthesizer produces verdict with reasoning.
48
+ - **Constraint** — Debaters must not see each other's initial positions until after Round 1. Prevents anchoring.
49
+ - **Pitfall** — Debates can be unproductive without strict structure. Always time-box rounds.
50
+
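One possible shape for the three-round protocol above, with the Round 1 isolation constraint made explicit. The debaters and synthesizer here are stub functions invented for illustration; in a real system each would be an agent call, and time-boxing of rounds is not shown.

```python
def run_debate(question, debaters, synthesize):
    # Round 1: every debater answers from the question alone (others=None),
    # so no one is anchored by another position.
    positions = {name: argue(question, None) for name, argue in debaters.items()}
    # Round 2: each debater receives only the opponents' Round 1 positions.
    rebuttals = {
        name: argue(question, {k: v for k, v in positions.items() if k != name})
        for name, argue in debaters.items()
    }
    # Round 3: the synthesizer sees everything and returns the verdict.
    return synthesize(question, positions, rebuttals)

def pro(question, others):
    return "adopt: fewer moving parts" if others is None else f"rebuts {sorted(others)}"

def con(question, others):
    return "reject: migration risk" if others is None else f"rebuts {sorted(others)}"

def synthesizer(question, positions, rebuttals):
    return {"question": question, "positions": positions, "rebuttals": rebuttals}

verdict = run_debate("Adopt event sourcing?", {"pro": pro, "con": con}, synthesizer)
```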
51
+ **4. Consensus (Agreement Required)**
52
+ - **Topology** — All agents must agree on the output. Disagreement triggers re-evaluation.
53
+ - **When to use** — Safety-critical decisions. Deployment approvals. Security assessments. Changes where false positives are acceptable but false negatives are dangerous.
54
+ - **Protocol** — Each agent independently evaluates. If all approve: proceed. If any reject: block and surface the dissenting reasoning.
55
+ - **Threshold variants** — Unanimous (all agree), Majority (>50%), Supermajority (>66%), Quorum (minimum N must vote).
56
+ - **Pitfall** — Consensus is expensive. Reserve for decisions where the cost of a wrong answer far exceeds the cost of deliberation.
57
+
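The threshold variants above can be expressed as a small voting function. This sketch reads the >50% and >66% cut-offs literally from the list; the function name `decide`, the return shape, and the agent names are assumptions made for illustration.

```python
from fractions import Fraction

# Cut-offs read literally from the list above; the approval share must exceed them.
CUTOFFS = {"majority": Fraction(50, 100), "supermajority": Fraction(66, 100)}

def decide(votes: dict, rule: str = "unanimous", quorum: int = 1) -> dict:
    """votes maps agent name to True (approve) or False (reject)."""
    dissenters = sorted(agent for agent, approved in votes.items() if not approved)
    if not votes or len(votes) < quorum:
        return {"approved": False, "reason": "quorum not met", "dissenters": dissenters}
    share = Fraction(len(votes) - len(dissenters), len(votes))
    approved = not dissenters if rule == "unanimous" else share > CUTOFFS[rule]
    return {"approved": approved, "reason": rule, "dissenters": dissenters}

votes = {"security": True, "sre": True, "qa": False}
unanimous = decide(votes, "unanimous")
majority = decide(votes, "majority")
no_quorum = decide(votes, "majority", quorum=4)
```

The `dissenters` list is returned in every case so that a blocked decision can surface who rejected it; attaching each dissenter's reasoning is left out of the sketch.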
58
+ **5. MapReduce (Parallel Processing)**
59
+ - **Topology** — Map phase: split input into N chunks, dispatch to N parallel agents. Reduce phase: aggregate results into final output.
60
+ - **When to use** — Large inputs that can be processed independently (code review across files, document analysis, test execution).
61
+ - **Map function** — Must produce non-overlapping chunks. Overlap causes duplicate work or conflicting results.
62
+ - **Reduce function** — Must handle partial failures gracefully. If 1 of 10 map workers fails, the reduce should still produce useful output from the other 9.
63
+ - **Scaling** — Add more workers linearly. Bottleneck is the reduce step, not the map step.
64
+
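A sketch of the map and reduce requirements above, assuming threads from Python's `concurrent.futures` stand in for parallel agents. `chunk`, `review_chunk`, and `map_reduce` are hypothetical names, and the "review" simply records file-name lengths.

```python
from concurrent.futures import ThreadPoolExecutor

def chunk(items: list, n: int) -> list:
    """Split items into n contiguous, non-overlapping chunks."""
    size, extra = divmod(len(items), n)
    chunks, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def review_chunk(files: list) -> dict:
    """Hypothetical map worker: fails on files it cannot read."""
    if any(f.endswith(".bin") for f in files):
        raise ValueError("unreadable file")
    return {f: len(f) for f in files}

def map_reduce(items: list, n_workers: int) -> dict:
    chunks = chunk(items, n_workers)
    merged, gaps = {}, []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(review_chunk, c) for c in chunks]
        for c, fut in zip(chunks, futures):
            try:
                merged.update(fut.result())  # reduce: merge successful results
            except Exception as exc:
                gaps.append({"chunk": c, "error": repr(exc)})  # flag, do not abort
    return {"result": merged, "gaps": gaps}

summary = map_reduce(["a.py", "b.py", "c.bin", "d.py"], n_workers=2)
```

The reduce step keeps the output of the healthy worker and reports the failed chunk in `gaps`, so a caller can retry only the missing part.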
65
+ #### Handoff Protocol Design
66
+
67
+ Every agent-to-agent handoff must include a structured message:
68
+
69
+ ```json
70
+ {
71
+ "task_id": "unique-identifier",
72
+ "from_agent": "agent-name",
73
+ "to_agent": "agent-name",
74
+ "task": "clear description of what to do",
75
+ "context": "relevant background (minimal, not full history)",
76
+ "constraints": ["must not modify X", "timeout 30s"],
77
+ "acceptance_criteria": ["output matches schema Y", "all tests pass"],
78
+ "artifacts": ["file paths or data references"]
79
+ }
80
+ ```
81
+
82
+ - **Minimal context** — Send only what the receiving agent needs. Full history causes confusion and wastes tokens.
83
+ - **Explicit acceptance criteria** — The receiving agent must know when it has succeeded.
84
+ - **Typed artifacts** — Reference files or data by path/ID, not by embedding content in the message.
85
+
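A receiving agent could validate the handoff message before acting on it. The sketch below mirrors the JSON fields shown above; which fields count as required is an assumption of this example, and `Handoff` and `parse_handoff` are hypothetical names.

```python
import json
from dataclasses import dataclass, field

# Assumption for this sketch: these fields must be present and non-empty.
REQUIRED = ("task_id", "from_agent", "to_agent", "task", "acceptance_criteria")

@dataclass(frozen=True)
class Handoff:
    task_id: str
    from_agent: str
    to_agent: str
    task: str
    acceptance_criteria: list
    context: str = ""
    constraints: list = field(default_factory=list)
    artifacts: list = field(default_factory=list)  # paths or IDs, not embedded content

def parse_handoff(raw: str) -> Handoff:
    data = json.loads(raw)
    missing = [key for key in REQUIRED if not data.get(key)]
    if missing:
        raise ValueError(f"handoff missing required fields: {missing}")
    return Handoff(**data)  # unknown keys raise TypeError

message = parse_handoff(json.dumps({
    "task_id": "t-1",
    "from_agent": "planner",
    "to_agent": "coder",
    "task": "add input validation",
    "context": "API layer only",
    "constraints": ["must not modify X", "timeout 30s"],
    "acceptance_criteria": ["all tests pass"],
    "artifacts": ["src/api.py"],
}))
```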
86
+ #### Failure Propagation Strategies
87
+
88
+ | Strategy | Behavior | Use When |
89
+ |----------|----------|----------|
90
+ | Fail-fast | Abort immediately, surface error | Critical path, no recovery possible |
91
+ | Retry | Repeat N times with backoff | Transient failures (network, rate limits) |
92
+ | Escalate | Notify supervisor, request human input | Ambiguous failures, policy decisions |
93
+ | Degrade | Continue with partial results, flag gaps | Non-critical subtasks, best-effort acceptable |
94
+ | Circuit-break | Stop retrying after N failures, return cached/default | Dependency is unreliable |
95
+
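Two of the strategies in the table, Retry and Circuit-break, can be sketched in a few lines. The helper names, default counts, and delays are illustrative assumptions rather than values taken from mindforge-cc.

```python
import time

def retry(fn, attempts: int = 3, base_delay_s: float = 0.01, sleep=time.sleep):
    """Retry: call fn up to `attempts` times with exponential backoff."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay_s * 2 ** i)

class CircuitBreaker:
    """Circuit-break: after max_failures consecutive failures, stop calling
    the dependency and return the default value instead."""

    def __init__(self, max_failures: int = 3, default=None):
        self.max_failures = max_failures
        self.default = default
        self.failures = 0

    def call(self, fn):
        if self.failures >= self.max_failures:
            return self.default  # circuit open: dependency is skipped
        try:
            result = fn()
        except Exception:
            self.failures += 1
            return self.default
        self.failures = 0  # a success closes the circuit again
        return result
```

This breaker never probes the dependency again once it opens; a production version would usually add a cool-down period before retrying, which is omitted here for brevity.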
96
+ #### State Management
97
+
98
+ - **Shared state** — All agents read/write a common store (database, shared memory). Simpler but requires conflict resolution (optimistic locking, CRDTs).
99
+ - **Isolated state** — Agents maintain private state, communicate only via messages. Safer but requires explicit state transfer in handoffs.
100
+ - **Hybrid** — Shared read-only state (project context, configuration) + isolated write state (each agent's working memory). Best balance for most systems.
101
+
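A small illustration of the hybrid approach, assuming in-process agents: shared context is exposed through a read-only `types.MappingProxyType` view, while each agent writes only to its own working memory. The `Agent` class and the context keys are hypothetical.

```python
from types import MappingProxyType

class Agent:
    def __init__(self, name: str, shared: dict):
        self.name = name
        self.shared = MappingProxyType(shared)  # read-only view of shared context
        self.working = {}                       # private, writable working memory

project_context = {"repo": "example-repo", "language": "python"}
planner = Agent("planner", project_context)
coder = Agent("coder", project_context)

planner.working["plan"] = ["step 1", "step 2"]  # isolated write
try:
    coder.shared["language"] = "go"             # shared state rejects writes
except TypeError:
    write_blocked = True
else:
    write_blocked = False
```

The proxy is shallow: it blocks assignment to top-level keys, but a mutable value stored inside the shared dictionary could still be changed in place, so shared values are best kept immutable.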
102
+ #### Decision Matrix: When to Use Which Pattern
103
+
104
+ | Scenario | Pattern |
105
+ |----------|---------|
106
+ | Task decomposes into independent subtasks | MapReduce or Supervisor/Worker |
107
+ | Tasks must execute in order | Pipeline |
108
+ | High-stakes decision needs scrutiny | Debate or Consensus |
109
+ | One coordinator manages many executors | Supervisor/Worker |
110
+ | System must tolerate partial failures | MapReduce with degraded reduce |
111
+ | Speed is critical, tasks are independent | MapReduce with max parallelism |
112
+
113
+ ### After
114
+
115
+ 1. **Validate handoff contracts** — Test that each agent produces output matching the next agent's expected input schema.
116
+ 2. **Test failure modes** — Simulate each failure propagation path. Verify the system degrades gracefully, not catastrophically.
117
+ 3. **Measure overhead** — Coordination cost should be <20% of total execution time. If higher, simplify the topology.
118
+ 4. **Document topology** — Create a diagram showing agent relationships, handoff directions, and failure paths.
119
+
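For the overhead check in step 3, one simple approach is to tag time as either coordination or agent work and compare the totals. The `OverheadMeter` class below is a hypothetical helper. It sums tracked durations rather than measuring wall-clock time, so with parallel workers it is only an approximation.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

class OverheadMeter:
    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.totals = defaultdict(float)

    @contextmanager
    def track(self, kind: str):
        """kind is "coordination" (dispatch, handoffs, aggregation) or "work"."""
        start = self.clock()
        try:
            yield
        finally:
            self.totals[kind] += self.clock() - start

    def overhead_ratio(self) -> float:
        total = self.totals["coordination"] + self.totals["work"]
        return self.totals["coordination"] / total if total else 0.0

meter = OverheadMeter()
with meter.track("coordination"):
    handoff = {"task": "summarize"}      # building the handoff message
with meter.track("work"):
    result = handoff["task"].upper()     # stand-in for the agent's actual work
within_budget = meter.overhead_ratio() < 0.20
```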
120
+ ## Self-check before task completion
121
+
122
+ - [ ] Pattern choice is justified by task structure (not over-engineered)
123
+ - [ ] Each agent has clear, non-overlapping responsibilities
124
+ - [ ] Handoff protocol includes task, context, constraints, and acceptance criteria
125
+ - [ ] Failure propagation strategy is defined for every inter-agent connection
126
+ - [ ] State management approach is explicit (shared vs isolated vs hybrid)
127
+ - [ ] Coordination overhead is measured and acceptable (<20% of total time)
128
+ - [ ] All agent-to-agent contracts are typed and validated
129
+ - [ ] System degrades gracefully under partial failure conditions
@@ -0,0 +1,204 @@
1
+ ---
2
+ name: agent-tool-selection
3
+ version: 1.0.0
4
+ min_mindforge_version: 10.0.4
5
+ status: stable
6
+ triggers: agent tool selection, capability matching, tool routing, cost-aware tool choice, tool fallback chain, tool composition strategy, function selection, tool description optimization, tool disambiguation, tool preference, tool capability matrix, tool ranking
7
+ compose: tool-design
8
+ ---
9
+
10
+ # Skill — Agent Tool Selection (Intelligent Capability Routing)
11
+
12
+ ## When this skill activates
13
+ When designing tool selection logic for AI agents, optimizing tool descriptions,
14
+ building fallback chains, or analyzing tool usage patterns. Use for any scenario
15
+ where an agent must choose between multiple tools to accomplish a task.
16
+
17
+ Core principle: **Specificity over generality** — when two tools can both handle a
18
+ task, prefer the more specific one. It will be faster, cheaper, and more reliable.
19
+ A hammer works for screws, but a screwdriver is better.
20
+
21
+ ## Mandatory actions when this skill is active
22
+
23
+ ### Tool Selection Algorithm
24
+
25
+ 1. **Selection pipeline (in order):**
26
+ ```
27
+ Input: task description + available tools
28
+
29
+ Step 1 — Capability Match:
30
+ - For each tool: does its capability set cover the task requirements?
31
+ - Eliminate tools that CANNOT handle the task (hard filter)
32
+
33
+ Step 2 — Rank by Specificity:
34
+ - More specific tool > more general tool
35
+ - Example: "Read file" > "Bash (cat)" for reading files
36
+ - Specificity = how narrow is the tool's intended use case?
37
+
38
+ Step 3 — Rank by Cost-Efficiency:
39
+ - Among equally capable tools: prefer cheaper/faster
40
+ - Read (free, instant) > Bash cat (process spawn) > API call (network)
41
+
42
+ Step 4 — Rank by Reliability:
43
+ - Among equal cost: prefer higher success rate
44
+ - Check historical success rate per tool per task type
45
+
46
+ Step 5 — Verify Preconditions:
47
+ - Do the selected tool's preconditions hold?
48
+ - Example: Edit requires that the file was previously Read
49
+ - If preconditions not met: add prerequisite steps
50
+
51
+ Output: ordered list of tools to try (primary + fallbacks)
52
+ ```
53
+
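The five pipeline steps above can be sketched as a filter followed by a lexicographic sort. This is a minimal illustration, not MindForge's implementation: the `Tool` fields, the integer `specificity` score, and the `select_tools` helper are all hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    capabilities: frozenset
    specificity: int         # hypothetical score: higher = narrower intended use
    cost_tier: int           # 1 = free/instant ... 4 = expensive
    success_rate: float      # historical success rate in [0, 1]
    preconditions: tuple = ()


def select_tools(required, tools, satisfied=frozenset()):
    """Return [(tool_name, unmet_preconditions), ...], best candidate first."""
    # Step 1: hard filter on capability coverage.
    capable = [t for t in tools if required <= t.capabilities]
    # Steps 2-4: specificity first, then cost, then reliability.
    ranked = sorted(
        capable,
        key=lambda t: (-t.specificity, t.cost_tier, -t.success_rate),
    )
    # Step 5: report unmet preconditions so the caller can add prerequisite steps.
    return [
        (t.name, [p for p in t.preconditions if p not in satisfied])
        for t in ranked
    ]


# Illustrative tool set; the scores are made up for the example.
TOOLS = [
    Tool("Bash", frozenset({"read_file", "run_command"}), 1, 2, 0.95),
    Tool("Read", frozenset({"read_file"}), 3, 1, 0.99),
    Tool("Edit", frozenset({"edit_file"}), 3, 1, 0.97, ("file_read",)),
]

print(select_tools(frozenset({"read_file"}), TOOLS))
```

The returned list doubles as the "primary + fallbacks" output: the first entry is the primary tool and the rest are fallbacks in order.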
54
+ 2. **Decision matrix template:**
55
+ ```
56
+ | Task Type | Primary Tool | Fallback 1 | Fallback 2 | Anti-pattern |
57
+ |--------------------|--------------|--------------|------------|------------------|
58
+ | Read file content | Read | Bash (cat) | — | Grep (wrong use) |
59
+ | Search for pattern | Grep | Bash (grep) | Read + scan | Read all files |
60
+ | Edit existing file | Edit | Write | — | Bash (sed) |
61
+ | Create new file | Write | Bash (echo>) | — | Edit (no file) |
62
+ | Run tests | Bash | — | — | Read test output |
63
+ | Check file exists | Bash (ls) | Read (error) | — | Grep for path |
64
+ ```
65
+
66
+ ### Tool Description Optimization
67
+
68
+ 3. **Writing effective tool descriptions (they ARE prompts):**
69
+ ```
70
+ Good description (specific, with examples):
71
+ "Read a file from disk. Use when you need to see file contents.
72
+ Supports text, images, PDFs. Prefer over Bash cat/head/tail.
73
+ NOT for directories (use Bash ls)."
74
+
75
+ Bad description (vague):
76
+ "Reads things from the filesystem."
77
+
78
+ Good description (with when-to-use and when-NOT-to-use):
79
+ "Edit an existing file by replacing exact string matches.
80
+ Use when: modifying 1-5 specific locations in a file.
81
+ Do NOT use when: rewriting >50% of the file (use Write instead).
82
+ Requires: file must have been Read in this session first."
83
+
84
+ Bad description (missing boundaries):
85
+ "Edits files."
86
+ ```
87
+
88
+ Rules:
89
+ - Include WHEN to use (positive examples)
90
+ - Include when NOT to use (negative examples — prevents misuse)
91
+ - State preconditions explicitly
92
+ - Keep descriptions under 100 words (concise > comprehensive)
93
+ - Use concrete examples, not abstract capabilities
94
+
95
+ ### Cost-Aware Selection
96
+
97
+ 4. **Cost hierarchy (prefer cheaper when quality is equal):**
98
+ ```
99
+ Tier 1 — Free/Instant (prefer these):
100
+ - Read (file content)
101
+ - Edit (modify file)
102
+ - Write (create file)
103
+ - Grep (pattern search in known scope)
104
+
105
+ Tier 2 — Cheap/Fast (use when Tier 1 can't):
106
+ - Bash (shell commands — process spawn overhead)
107
+ - Glob (file path patterns)
108
+
109
+ Tier 3 — Moderate (use when necessary):
110
+ - LSP (language server queries)
111
+ - Web fetch (network requests)
112
+
113
+ Tier 4 — Expensive (use sparingly):
114
+ - Sub-agent spawn (full agent instantiation)
115
+ - Multi-file analysis (token-heavy)
116
+ - External API calls (rate-limited, costly)
117
+ ```
118
+
119
+ Rules:
120
+ - Always check if a Tier 1 tool can handle the task before reaching for Tier 3-4
121
+ - Track cumulative cost during a session (don't let tool costs compound silently)
122
+ - For repeated operations: batch when possible (one Bash with && vs many Bash calls)
123
+ - Cost includes: token consumption, time, API calls, compute resources
124
+
125
+ ### Fallback Chains
126
+
127
+ 5. **Designing robust fallback sequences:**
128
+ ```
129
+ Fallback chain structure:
130
+ 1. Try primary tool (most specific, cheapest)
131
+ 2. If fails (error, timeout, precondition unmet):
132
+ - Log failure reason
133
+ - Try fallback 1 (broader capability)
134
+ 3. If fallback 1 fails:
135
+ - Try fallback 2 (most general/expensive)
136
+ 4. If all fail:
137
+ - Escalate to user with: what was tried, why each failed, what's needed
138
+
139
+ Example:
140
+ Task: "Find where function X is defined"
141
+ 1. Grep (fast, pattern-based) → found? done
142
+ 2. LSP (semantic, language-aware) → found? done
143
+ 3. Bash find + grep (brute force) → found? done
144
+ 4. Escalate: "I couldn't locate function X. Can you point me to the file?"
145
+ ```
146
+
147
+ Rules:
148
+ - Fallback chains should be pre-defined per task type (not improvised)
149
+ - Each fallback should be DIFFERENT in approach (not just retry)
150
+ - Log which level of the chain succeeded (optimize primary over time)
151
+ - Max 3 fallback levels before escalation (avoid infinite retry loops)
152
+
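A fallback chain as described above might look like the following sketch. The `run_with_fallbacks` function and `AllToolsFailed` exception are hypothetical names; each chain entry is assumed to be a `(name, zero-argument callable)` pair that returns `None` when it finds nothing.

```python
class AllToolsFailed(Exception):
    """Raised when every level of the chain failed; carries the attempt log."""

    def __init__(self, attempts):
        super().__init__("all tools in the fallback chain failed")
        self.attempts = attempts


def run_with_fallbacks(chain, max_levels=3):
    """Try each (name, call) pair in order and stop at the first result."""
    attempts = []  # (tool name, failure reason) for every failed level
    for level, (name, call) in enumerate(chain[:max_levels], start=1):
        try:
            result = call()
        except Exception as exc:
            attempts.append((name, f"{type(exc).__name__}: {exc}"))
            continue
        if result is not None:
            # Record which level succeeded so the primary can be tuned later.
            return {"result": result, "tool": name, "level": level,
                    "failed": attempts}
        attempts.append((name, "no result"))
    # All levels exhausted: the caller escalates to the user with the log.
    raise AllToolsFailed(attempts)
```

The exception carries what was tried and why each attempt failed, which is the information the escalation message to the user needs.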
153
+ ### Tool Composition
154
+
155
+ 6. **Combining tools for complex tasks:**
156
+ ```
157
+ Composition patterns:
158
+
159
+ Sequential: Tool A output → Tool B input
160
+ Example: Grep (find file) → Read (get content) → Edit (modify)
161
+
162
+ Parallel: Tool A + Tool B independently → merge results
163
+ Example: Grep (find usages) + Read (get definition) → understand full context
164
+
165
+ Conditional: If Tool A succeeds → Tool B, else → Tool C
166
+ Example: If Read(file) succeeds → Edit, else → Write (file doesn't exist)
167
+
168
+ Iterative: Repeat Tool A until condition met
169
+ Example: Bash(test) → fails → Edit(fix) → Bash(test) → passes → done
170
+ ```
171
+
172
+ Rules:
173
+ - Plan composition BEFORE executing (don't improvise mid-chain)
174
+ - Minimize total tool calls (combine steps where possible)
175
+ - Never call the same tool twice with identical inputs unless the underlying state changed in between (otherwise cache/reuse results)
176
+ - If composition exceeds 5 sequential steps: consider if there's a more direct tool
177
+
178
+ ### Tool Disambiguation
179
+
180
+ 7. **When multiple tools seem equally valid:**
181
+ ```
182
+ Disambiguation criteria (in priority order):
183
+ 1. Fewer side effects: Read-only > Read-write (prefer observation over action)
184
+ 2. More specific: Narrow tool > broad tool (Edit > Bash sed)
185
+ 3. Cheaper: Less resource consumption > more
186
+ 4. More reversible: Undoable > permanent (Edit > Write for existing files)
187
+ 5. Better error messages: Tools with clear failure modes > opaque failures
188
+
189
+ If still tied after all criteria: pick the one that appears first in the
190
+ tool list (convention-based tie-breaking, prevents analysis paralysis)
191
+ ```
192
+
193
+ ## Self-check before task completion
194
+
195
+ Before marking a task done when this skill was active:
196
+
197
+ - [ ] Did I follow the selection pipeline (capability → specificity → cost → reliability → preconditions)?
198
+ - [ ] Are tool descriptions specific, with positive AND negative usage examples?
199
+ - [ ] Is cost hierarchy respected (cheaper tools preferred when quality is equal)?
200
+ - [ ] Are fallback chains defined for each critical task type (max 3 levels)?
201
+ - [ ] Is tool composition planned before execution (not improvised)?
202
+ - [ ] Are disambiguations resolved by: side effects → specificity → cost → reversibility?
203
+ - [ ] Are tool calls minimized (no redundant calls, batched where possible)?
204
+ - [ ] Is escalation to user defined as the final fallback (not infinite retry)?
@@ -0,0 +1,176 @@
1
+ ---
2
+ name: ai-agent-deployment
3
+ version: 1.0.0
4
+ min_mindforge_version: 10.1.1
5
+ status: stable
6
+ triggers: ai agent deployment, agent hosting, agent scaling, agent versioning, agent A/B testing, agent monitoring production, agent rollback, agent health check, agent cost production, agent performance monitoring, agent canary, agent shadow testing
7
+ compose: deployment-workflow
8
+ ---
9
+
10
+ # Skill — AI Agent Deployment
11
+
12
+ ## When this skill activates
13
+ Any task involving deploying AI agents to production, versioning agent configurations,
14
+ A/B testing agent variants, monitoring agent quality in production,
15
+ or managing the operational lifecycle of AI agents.
16
+
17
+ ## Mandatory actions when this skill is active
18
+
19
+ ### Before writing any code
20
+ 1. Define the agent version tuple: model + prompt + tools + config (all pinned together).
21
+ 2. Identify success metrics (quality, latency, cost, user satisfaction).
22
+ 3. Plan rollback strategy (instant version pointer switch).
23
+ 4. Design monitoring (token usage, error rate, quality signal).
24
+
25
+ ### During implementation
26
+ - Package agent as a versioned, immutable deployment artifact.
27
+ - Implement health check endpoint (synthetic task probe).
28
+ - Add structured logging for every agent action (input, output, tools used, tokens).
29
+ - Build traffic splitting capability for A/B and canary.
30
+ - Instrument cost tracking per-task and per-user.
31
+ - Implement graceful degradation (fallback to simpler model on failure).
32
+
33
+ ### After implementation
34
+ - Verify shadow test shows no regression vs current version.
35
+ - Confirm monitoring dashboards capture all key metrics.
36
+ - Test rollback procedure end-to-end.
37
+ - Validate cost projections against actual usage.
38
+ - Run synthetic probes for health verification.
39
+
40
+ ## Versioning Strategy
41
+
42
+ ### Agent Version = Immutable Tuple
43
+ ```json
44
+ {
45
+ "version": "agent-v2.3.1",
46
+ "model": "claude-sonnet-4-20250514",
47
+ "prompt_hash": "sha256:abc123...",
48
+ "tools": ["search_v2", "code_exec_v1", "web_browse_v3"],
49
+ "config": {
50
+ "temperature": 0.3,
51
+ "max_tokens": 4096,
52
+ "timeout_ms": 30000
53
+ }
54
+ }
55
+ ```
56
+
57
+ ### Rules
58
+ - Changing ANY component = new version.
59
+ - Never mutate a deployed version in place.
60
+ - Keep previous N versions warm for instant rollback.
61
+ - The version record lists every component (model, prompt hash, tools, config) for traceability.
62
+
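One way to enforce "changing ANY component = new version" is to derive a fingerprint from the pinned components. The sketch below is an assumption, not part of the skill: `version_fingerprint` is a hypothetical helper, and the 12-character truncation is an arbitrary choice for readability.

```python
import hashlib
import json


def version_fingerprint(spec):
    """Hash the pinned components so that any change yields a new fingerprint."""
    # Only the components that define behavior are hashed; the human-readable
    # "version" label is deliberately left out.
    components = {k: spec[k] for k in ("model", "prompt_hash", "tools", "config")}
    # Canonical JSON (sorted keys, fixed separators) keeps the hash stable
    # regardless of key order in the input dict.
    canonical = json.dumps(components, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Same shape as the tuple above; the prompt hash stays an elided placeholder.
SPEC = {
    "version": "agent-v2.3.1",
    "model": "claude-sonnet-4-20250514",
    "prompt_hash": "sha256:abc123...",
    "tools": ["search_v2", "code_exec_v1", "web_browse_v3"],
    "config": {"temperature": 0.3, "max_tokens": 4096, "timeout_ms": 30000},
}

print(version_fingerprint(SPEC))
```

A deployment pipeline could refuse to publish when the fingerprint changed but the version label did not.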
63
+ ## Hosting Patterns
64
+
65
+ ### Containerized (Recommended)
66
+ - Docker container with model client, prompt, tool implementations.
67
+ - Auto-scale on queue depth (not CPU — agents are I/O bound).
68
+ - GPU allocation only if running local inference.
69
+ - Isolate per-tenant for data separation.
70
+
71
+ ### Scaling Signals
72
+ | Signal | Scale Direction | Reason |
73
+ |--------|----------------|--------|
74
+ | Queue depth increasing | Scale up | Work is backing up |
75
+ | P95 latency rising | Scale up | Capacity insufficient |
76
+ | Queue empty for 5min | Scale down | Over-provisioned |
77
+ | Error rate > 5% | Pause scaling | Fix errors first |
78
+
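The scaling table can be read as a small decision function. This sketch makes two assumptions the table does not state: the error-rate check takes priority over the other signals, and "increasing" or "rising" means strictly increasing across the sample window. The function names are hypothetical.

```python
def _rising(samples):
    """True when every sample is strictly greater than the one before it."""
    return len(samples) >= 2 and all(b > a for a, b in zip(samples, samples[1:]))


def scaling_decision(queue_depths, p95_latencies_s, error_rate, idle_minutes):
    """Map the scaling signals to an action; samples are ordered oldest first."""
    if error_rate > 0.05:
        return "pause"        # fix errors before changing capacity
    if idle_minutes >= 5:
        return "scale_down"   # queue has been empty for 5 minutes
    if _rising(queue_depths) or _rising(p95_latencies_s):
        return "scale_up"     # work is backing up or capacity is insufficient
    return "hold"
```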
79
+ ## A/B Testing
80
+
81
+ ### Setup
82
+ 1. Define hypothesis (e.g., "new prompt reduces hallucination by 20%").
83
+ 2. Split traffic (e.g., 90/10 control/experiment).
84
+ 3. Run until the sample is large enough to detect the hypothesized effect (typically 1000+ samples per variant).
85
+ 4. Measure: quality score, latency, cost, user feedback.
86
+
87
+ ### Metrics to Compare
88
+ - Task success rate (did the agent complete the task correctly?).
89
+ - Token usage (cost proxy).
90
+ - Latency p50/p95/p99.
91
+ - Tool failure rate.
92
+ - User satisfaction signal (thumbs up/down, follow-up corrections).
93
+ - Hallucination rate (if measurable via ground truth).
94
+
95
+ ### Graduation Criteria
96
+ - Improvement statistically significant (p < 0.05).
97
+ - No regression in any critical metric.
98
+ - Cost increase acceptable (<20% for same quality).
99
+
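The graduation criteria can be checked mechanically. The sketch below assumes the compared metric is a success rate and uses a pooled two-proportion z-test, which is one common choice and not something the skill prescribes. Both function names are hypothetical.

```python
import math


def two_proportion_p_value(success_a, n_a, success_b, n_b):
    """Two-sided p-value for the difference between two success rates."""
    pooled = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # identical all-success or all-failure samples
    z = (success_b / n_b - success_a / n_a) / se
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


def ready_to_graduate(control_successes, control_n, variant_successes,
                      variant_n, has_critical_regression, cost_increase):
    """Apply the three graduation criteria; cost_increase is a fraction."""
    improved = variant_successes / variant_n > control_successes / control_n
    p = two_proportion_p_value(control_successes, control_n,
                               variant_successes, variant_n)
    return (improved and p < 0.05
            and not has_critical_regression
            and cost_increase < 0.20)
```

Because the p-value is two-sided, the explicit `improved` check is what prevents a significantly worse variant from graduating.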
100
+ ## Shadow Testing
101
+
102
+ ### Pattern
103
+ ```
104
+ User Request → Production Agent (responds to user)
105
+ → Shadow Agent (runs silently, output logged)
106
+ ```
107
+
108
+ ### Purpose
109
+ - Test new version against real traffic without user impact.
110
+ - Compare outputs offline (human eval or automated scoring).
111
+ - Detect regressions before any user sees them.
112
+
113
+ ### Rules
114
+ - Shadow agent output never reaches the user.
115
+ - Shadow uses same input but may have different model/prompt/tools.
116
+ - Compare at scale (1000+ requests) before promoting.
117
+ - Track divergence rate and categorize differences.
118
+
119
+ ## Monitoring
120
+
121
+ ### Key Metrics (Real-Time Dashboard)
122
+ | Metric | Alert Threshold | Action |
123
+ |--------|----------------|--------|
124
+ | Token usage/task | >2x baseline | Check for loops/verbose output |
125
+ | Latency p95 | >30s | Scale up or investigate bottleneck |
126
+ | Tool failure rate | >5% | Check tool availability |
127
+ | Hallucination rate | >3% | Rollback, investigate prompt |
128
+ | User negative feedback | >10% | Investigate, consider rollback |
129
+ | Cost per task | >$0.50 | Check for inefficiency |
130
+
131
+ ### Structured Logging
132
+ Every agent invocation must log:
133
+ - Request ID, user ID, agent version.
134
+ - Input (sanitized of PII).
135
+ - Output summary.
136
+ - Tools invoked and their results.
137
+ - Token counts (input, output, total).
138
+ - Latency breakdown (thinking, tool calls, generation).
139
+ - Success/failure determination.
140
+
141
+ ## Rollback
142
+
143
+ ### Instant Rollback
144
+ - Version pointer in config store (not redeployment).
145
+ - Switch pointer → immediate traffic to previous version.
146
+ - Keep N previous versions warm (containers running, ready).
147
+ - Make the rollback decision within 5 minutes of detecting a regression.
148
+
149
+ ### Rollback Triggers (Automatic)
150
+ - Error rate > 10% for 3 consecutive minutes.
151
+ - P95 latency > 60s for 5 minutes.
152
+ - User negative feedback spike (3x normal rate).
153
+
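The automatic triggers can be evaluated from per-minute metric samples. This is a sketch under assumptions: metrics arrive as one sample per minute (newest last), and `should_rollback` is a hypothetical helper that returns the name of the trigger that fired, or `None`.

```python
def should_rollback(error_rates, p95_latencies_s,
                    negative_feedback_rate, baseline_feedback_rate):
    """Return the trigger that fired, or None if no rollback is needed."""
    last_errors = error_rates[-3:]
    last_latencies = p95_latencies_s[-5:]
    # Error rate > 10% for 3 consecutive minutes.
    if len(last_errors) == 3 and all(r > 0.10 for r in last_errors):
        return "error_rate"
    # P95 latency > 60s for 5 minutes.
    if len(last_latencies) == 5 and all(s > 60 for s in last_latencies):
        return "p95_latency"
    # Negative feedback at 3x the normal rate.
    if (baseline_feedback_rate > 0
            and negative_feedback_rate >= 3 * baseline_feedback_rate):
        return "negative_feedback"
    return None
```

When a trigger fires, the rollback itself is the version pointer switch described above, not a redeployment.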
154
+ ## Health Checks
155
+
156
+ ### Synthetic Probes
157
+ - Run a known-good task against the agent every 5 minutes.
158
+ - Verify output matches expected structure.
159
+ - Check latency within bounds.
160
+ - Alert if probe fails 2 consecutive times.
161
+
162
+ ### Probe Design
163
+ - Task must be deterministic (or have verifiable structure).
164
+ - Must exercise core capabilities (reasoning + at least one tool).
165
+ - Must complete within health check timeout (10s recommended).
166
+ - Results logged for trend analysis.
167
+
168
+ ## Self-check
169
+ - [ ] Agent version tuple defined (model + prompt + tools + config).
170
+ - [ ] Health check probes running every 5 minutes.
171
+ - [ ] Monitoring covers: tokens, latency, errors, quality, cost.
172
+ - [ ] Rollback tested and confirmed instant.
173
+ - [ ] Shadow test shows no regression.
174
+ - [ ] A/B framework ready for future experiments.
175
+ - [ ] Cost per task tracked and within budget.
176
+ - [ ] Graceful degradation implemented for failures.
@@ -0,0 +1,57 @@
1
+ ---
2
+ name: ai-cost-management
3
+ version: 1.0.0
4
+ min_mindforge_version: 10.5.0
5
+ status: stable
6
+ triggers: AI cost management, token budget optimization, model cost selection, LLM response caching, batch inference processing, AI infrastructure cost, token usage monitoring, cost-aware model routing, inference cost reduction, GPU utilization optimization, AI spend tracking, cost per query optimization
7
+ compose:
8
+ - llm-cost-optimization
9
+ ---
10
+
11
+ # AI Cost Management & Optimization
12
+
13
+ ## When this skill activates
14
+
15
+ This skill activates when optimizing AI inference costs, implementing token budgeting, designing cost-aware model routing, or tracking AI spending at scale. It applies to production AI systems where compute costs are significant (>$10K/month) and must be controlled without degrading user experience.
16
+
17
+ ## Mandatory actions when this skill is active
18
+
19
+ ### Before writing any code
20
+
21
+ 1. **Establish cost baseline** — Measure current costs per dimension: cost per request, cost per user, cost per model, cost per task type. Identify cost drivers: which models, which tasks, which users consume the most budget. Focus optimization efforts on the top 20% of cost drivers (Pareto principle).
22
+ 2. **Set cost budgets and alerts** — Define acceptable cost thresholds per day, per week, per month. Set up alerts when spending exceeds thresholds (50%, 80%, 100%). Alerts prevent runaway costs from unexpected usage spikes or inefficient code.
23
+ 3. **Design cost attribution model** — Assign costs to cost centers: per-team, per-product, per-customer tier (free vs. paid). Enables chargeback (internal teams pay for their usage) and informs product decisions (should free tier have lower model quality to reduce costs?).
24
+ 4. **Identify optimization opportunities** — Rank opportunities by potential savings: caching (30-70% reduction for repetitive queries), model downgrade (10-50% reduction with minimal quality loss), batching (20-40% reduction via throughput optimization), prompt compression (10-30% reduction by reducing tokens).
25
+
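The 50%, 80%, and 100% alert thresholds from step 2 can be sketched as follows. `crossed_thresholds` is a hypothetical helper; it assumes the caller remembers which alerts were already sent in the current budget period.

```python
def crossed_thresholds(spend, budget, already_alerted=(),
                       thresholds=(0.5, 0.8, 1.0)):
    """Return the budget thresholds newly crossed by the current spend."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    used = spend / budget
    # Skip thresholds that already triggered an alert this period.
    return [t for t in thresholds if used >= t and t not in already_alerted]


# Example: $850 spent of a $1,000 daily budget, 50% alert already sent.
print(crossed_thresholds(850, 1000, already_alerted=(0.5,)))
```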
26
+ ### During implementation
27
+
28
+ - **Implement aggressive response caching** — Cache LLM responses keyed by prompt hash + model + hyperparameters. Cache hit rate >50% is achievable for many use cases. Use Redis or Memcached for low-latency lookups (<1ms). Set TTL based on content freshness requirements (hours for news, days for documentation, indefinite for static content).
29
+ - **Design cost-aware routing** — Route requests to the cheapest model that meets quality requirements. Example: simple classification → Haiku (about $0.25/MTok input), complex reasoning → Opus (about $15/MTok input); these are illustrative list prices, so check current pricing. Measure quality degradation when downgrading models, and accept the downgrade only if the accuracy drop is <2%.
30
+ - **Compress prompts aggressively** — Remove filler words, use abbreviations, compress whitespace. Test that compressed prompts produce equivalent outputs. Measure token reduction (1 − compressed tokens / original tokens). Target: 20-50% reduction with <1% quality loss.
31
+ - **Batch inference for throughput** — Group requests into batches (10-100 per batch) to maximize GPU utilization. Batching increases throughput (requests/second) and reduces cost per request (amortize fixed overhead). Trade-off: higher latency (wait for batch to fill) for lower cost.
32
+ - **Implement prompt caching (for supported models)** — Use prompt caching for repeated prompt prefixes (system message, context). Claude and GPT-4 support prompt caching. Cached input tokens are billed at a steep discount (up to 90%, depending on the provider). Ensure prompts are structured with stable prefixes and variable suffixes.
33
+ - **Track token usage per request** — Log input tokens, output tokens, and total cost per request. Aggregate by model, task type, user, and time. Identify outliers: queries with unusually high token counts (may indicate inefficient prompts or bugs). Set up monitoring dashboards.
34
+
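The response-caching bullet above (key by prompt hash + model + hyperparameters, TTL-based expiry, hit-rate tracking) can be sketched with an in-memory stand-in for Redis or Memcached. `cache_key` and `TTLCache` are hypothetical names, and whitespace normalization is only one possible way to make near-identical prompts share a key.

```python
import hashlib
import json
import time


def cache_key(prompt, model, params):
    """Key = hash of normalized prompt + model + hyperparameters."""
    normalized = " ".join(prompt.split())  # collapse whitespace differences
    payload = json.dumps(
        {"prompt": normalized, "model": model, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class TTLCache:
    """In-memory stand-in for Redis/Memcached that also tracks hit rate."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so expiry can be tested
        self.store = {}      # key -> (value, expiry time)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None and self.clock() < entry[1]:
            self.hits += 1
            return entry[0]
        self.store.pop(key, None)  # drop the expired entry, if any
        self.misses += 1
        return None

    def put(self, key, value):
        self.store[key] = (value, self.clock() + self.ttl)

    @property
    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Including the model and hyperparameters in the key prevents a response generated at one temperature or by one model from being served for a different configuration.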
35
+ ### After implementation
36
+
37
+ - **Measure cache hit rate** — Track % of requests served from cache. Target: >50% hit rate for production systems with repetitive queries. If lower, analyze cache misses: are prompts subtly different (normalize prompts), is TTL too short (increase TTL), or is traffic too diverse (caching won't help)?
38
+ - **Validate quality after cost optimization** — Compare model accuracy, user satisfaction, or business metrics before and after optimization. Acceptable thresholds: <2% accuracy drop, <5% user satisfaction drop. If degradation is higher, roll back optimizations.
39
+ - **Benchmark cost reduction** — Measure cost per request after optimizations vs. baseline. Target: 30-50% cost reduction from caching + routing + compression. Document savings in $/month and ROI (developer time spent vs. cost saved).
40
+ - **Monitor for cost anomalies** — Set up alerts for sudden cost spikes (>2x daily average) or unusual patterns (single user consuming 10x typical usage). Anomalies indicate bugs (infinite loops, retry storms) or abuse (attackers exploiting free tier).
41
+
42
+ ## Self-check before task completion
43
+
44
+ - [ ] Cost baseline is measured per request, user, model, and task type
45
+ - [ ] Cost budgets are set with alerts at 50%, 80%, 100% thresholds
46
+ - [ ] Cost attribution model assigns costs to teams, products, or customer tiers
47
+ - [ ] Optimization opportunities are ranked by potential savings and effort
48
+ - [ ] Response caching is implemented with cache hit rate tracked (target >50%)
49
+ - [ ] Cost-aware routing selects cheapest model that meets quality requirements
50
+ - [ ] Prompt compression achieves 20-50% token reduction with <1% quality loss
51
+ - [ ] Batch inference is implemented for throughput optimization (10-100 requests per batch)
52
+ - [ ] Prompt caching (if supported) is enabled for repeated prompt prefixes (up to 90% savings on cached tokens, depending on provider)
53
+ - [ ] Token usage is logged per request and aggregated by model, task, user, time
54
+ - [ ] Cache hit rate is validated at >50% for production systems with repetitive queries
55
+ - [ ] Model quality is validated post-optimization (<2% accuracy drop, <5% satisfaction drop)
56
+ - [ ] Cost reduction is benchmarked (target 30-50% reduction from baseline)
57
+ - [ ] Cost anomaly alerts are configured for spikes (>2x daily average) and abuse patterns
@@ -0,0 +1,53 @@
1
+ ---
2
+ name: ai-safety-alignment
3
+ version: 1.0.0
4
+ min_mindforge_version: 10.5.0
5
+ status: stable
6
+ triggers: AI safety implementation, AI alignment technique, output filtering AI, content moderation AI, bias detection model, AI guardrail implementation, harmful content prevention, AI red teaming, model safety evaluation, AI ethics implementation, alignment testing, responsible AI deployment
7
+ compose:
8
+ - guardrails-and-safety
9
+ ---
10
+
11
+ # AI Safety & Alignment
12
+
13
+ ## When this skill activates
14
+
15
+ This skill activates when implementing output filters, content moderation systems, bias detection, adversarial robustness, or alignment techniques for AI systems. It applies when deploying AI in production environments where safety failures have legal, reputational, or ethical consequences.
16
+
17
+ ## Mandatory actions when this skill is active
18
+
19
+ ### Before writing any code
20
+
21
+ 1. **Define harm taxonomy** — Categorize potential harms specific to your domain: toxic content, misinformation, bias (demographic, cultural), privacy leaks, prompt injection, jailbreaks, copyright violations. Prioritize by severity and likelihood. Not all harms are equal.
22
+ 2. **Establish safety thresholds** — Define numeric thresholds for each harm category: toxicity score <0.3, PII detection confidence >0.9, bias parity ratio 0.8-1.2. Thresholds must be tuned on representative data, not arbitrary guesses.
23
+ 3. **Select detection models** — Choose specialized classifiers per harm type: Perspective API or custom models for toxicity, named entity recognition for PII, fairness metrics for bias. General-purpose LLMs are too slow and expensive for real-time safety filtering.
24
+ 4. **Design layered defense** — Implement multiple safety layers: input validation (reject malicious prompts), model guardrails (constrain model behavior), output filtering (catch harmful completions), monitoring (detect safety failures post-deployment). Single-layer defense is insufficient.
25
+
26
+ ### During implementation
27
+
28
+ - **Implement input sanitization first** — Validate all user inputs before they reach the model. Reject or sanitize: SQL injection patterns, prompt injection attempts (e.g., "ignore previous instructions"), PII in prompts, excessive length, special characters that break parsing. Log rejected inputs for analysis.
29
+ - **Apply constitutional AI principles** — Constrain model behavior via system prompts: "You are a helpful assistant. You must not generate harmful, biased, or illegal content. If asked to do so, politely refuse and explain why." Test refusal behavior extensively.
30
+ - **Use classifier-guided generation** — For high-risk domains, run a safety classifier on every output before returning to the user. If toxicity/bias/PII is detected above threshold, retry generation with a modified prompt or return a safe default response.
31
+ - **Implement red-teaming as tests** — Create automated adversarial tests: jailbreak attempts, bias triggers, edge cases designed to elicit failures. Run these tests in CI/CD. Safety regressions must block deployment.
32
+ - **Log all safety events** — Record every safety filter activation: input rejected, output filtered, threshold exceeded. Include context: user ID, timestamp, input/output text, classifier scores. This data is critical for tuning thresholds and identifying attack patterns.
33
+ - **Design graceful degradation** — When safety filters trigger, provide user-friendly explanations: "I can't generate that content because [reason]." Do not expose internal classifier scores or filter logic (attackers use this to evade detection).
34
+
35
+ ### After implementation
36
+
37
+ - **Validate safety coverage** — Test the system with a held-out safety dataset: known toxic examples, bias triggers, jailbreak prompts. Measure recall (% of harmful content caught) and precision (% of flagged content that is truly harmful). Target: recall >95%, precision >90%.
38
+ - **Measure bias across demographics** — Evaluate model outputs for demographic parity, equalized odds, and calibration across protected attributes (race, gender, age). Use fairness toolkits (Fairlearn, AI Fairness 360). Bias gaps >20% require mitigation.
39
+ - **Conduct human red-teaming** — Hire external red-teamers to attempt jailbreaks and adversarial attacks. Automated tests miss creative attack vectors. Budget 20-40 hours of red-teaming per major release.
40
+ - **Monitor safety in production** — Track safety metrics over time: filter activation rate, false positive rate, user complaints about incorrect filtering. Safety degrades as attackers adapt and user behavior shifts.
41
+
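The recall and precision targets in the safety-coverage bullet above can be computed from a labeled held-out set. A minimal sketch, assuming `labels` marks truly harmful items and `flagged` marks what the filter caught; both function names are hypothetical.

```python
def precision_recall(labels, flagged):
    """labels[i]: item is truly harmful; flagged[i]: the filter flagged it."""
    tp = sum(1 for y, f in zip(labels, flagged) if y and f)
    fp = sum(1 for y, f in zip(labels, flagged) if not y and f)
    fn = sum(1 for y, f in zip(labels, flagged) if y and not f)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def meets_safety_targets(labels, flagged, min_recall=0.95, min_precision=0.90):
    """Check the targets stated above: recall >95%, precision >90%."""
    precision, recall = precision_recall(labels, flagged)
    return recall > min_recall and precision > min_precision
```

Recall is the share of harmful content that was caught, and precision is the share of flagged content that was truly harmful, so low recall means missed harms and low precision means over-blocking.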
42
+ ## Self-check before task completion
43
+
44
+ - [ ] Harm taxonomy is defined with severity ratings and likelihood estimates
45
+ - [ ] Safety thresholds are set per harm type and validated on representative data
46
+ - [ ] Input sanitization rejects malicious prompts before reaching the model
47
+ - [ ] Output filtering runs on every completion with <100ms latency overhead
48
+ - [ ] Constitutional AI constraints are embedded in system prompts and tested
49
+ - [ ] Automated red-teaming tests run in CI/CD and block deployment on failures
50
+ - [ ] Safety events are logged with full context for post-hoc analysis
51
+ - [ ] Bias is measured across demographics with fairness parity within acceptable bounds
52
+ - [ ] Human red-teaming has been conducted with documented attack attempts and mitigations
53
+ - [ ] Production monitoring tracks filter activation rate and false positive trends