mindforge-cc 10.0.2 → 10.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (322)
  1. package/.mindforge/config.json +73 -2
  2. package/.mindforge/engine/autonomous/cross-iteration-bridge.md +96 -0
  3. package/.mindforge/engine/cost-tracking/budget-enforcer.md +68 -0
  4. package/.mindforge/engine/cost-tracking/router.md +58 -0
  5. package/.mindforge/engine/cost-tracking/token-ledger.md +77 -0
  6. package/.mindforge/engine/council/council-protocol.md +96 -0
  7. package/.mindforge/engine/council/council-templates.md +85 -0
  8. package/.mindforge/engine/council/synthesis-engine.md +71 -0
  9. package/.mindforge/engine/cross-model-eval.md +74 -0
  10. package/.mindforge/engine/instincts/capture-engine.md +63 -0
  11. package/.mindforge/engine/instincts/instinct-schema.md +76 -0
  12. package/.mindforge/engine/instincts/promotion-engine.md +77 -0
  13. package/.mindforge/engine/proactive/signal-detector.md +60 -0
  14. package/.mindforge/engine/proactive/suggestion-engine.md +100 -0
  15. package/.mindforge/engine/skills/composition.md +83 -0
  16. package/.mindforge/engine/skills/loader.md +16 -0
  17. package/.mindforge/personas/agent-architect.md +57 -0
  18. package/.mindforge/personas/agent-evaluator.md +162 -0
  19. package/.mindforge/personas/agent-memory-designer.md +157 -0
  20. package/.mindforge/personas/agent-ops-engineer.md +120 -0
  21. package/.mindforge/personas/agent-orchestrator.md +112 -0
  22. package/.mindforge/personas/ai-economist.md +57 -0
  23. package/.mindforge/personas/ai-safety-engineer.md +57 -0
  24. package/.mindforge/personas/analytics-engineer.md +57 -0
  25. package/.mindforge/personas/anti-pattern-hunter.md +61 -0
  26. package/.mindforge/personas/api-gateway-designer.md +132 -0
  27. package/.mindforge/personas/auth-engineer.md +112 -0
  28. package/.mindforge/personas/build-engineer.md +57 -0
  29. package/.mindforge/personas/business-analyst.md +56 -0
  30. package/.mindforge/personas/cache-architect.md +100 -0
  31. package/.mindforge/personas/causal-scientist.md +57 -0
  32. package/.mindforge/personas/cdn-architect.md +118 -0
  33. package/.mindforge/personas/change-agent.md +104 -0
  34. package/.mindforge/personas/code-narrator.md +52 -0
  35. package/.mindforge/personas/codegen-specialist.md +68 -0
  36. package/.mindforge/personas/communication-architect.md +102 -0
  37. package/.mindforge/personas/compliance-engineer.md +96 -0
  38. package/.mindforge/personas/consensus-engineer.md +116 -0
  39. package/.mindforge/personas/contract-tester.md +60 -192
  40. package/.mindforge/personas/cost-optimizer.md +71 -0
  41. package/.mindforge/personas/council-architect.md +66 -0
  42. package/.mindforge/personas/council-critic.md +67 -0
  43. package/.mindforge/personas/council-pragmatist.md +71 -0
  44. package/.mindforge/personas/council-skeptic.md +73 -0
  45. package/.mindforge/personas/data-architect.md +108 -0
  46. package/.mindforge/personas/data-mesh-architect.md +57 -0
  47. package/.mindforge/personas/data-pipeline-architect.md +120 -0
  48. package/.mindforge/personas/de-sloppifier.md +60 -0
  49. package/.mindforge/personas/debt-manager.md +66 -0
  50. package/.mindforge/personas/decision-architect.md +82 -51
  51. package/.mindforge/personas/deployment-captain.md +74 -0
  52. package/.mindforge/personas/design-system-lead.md +112 -0
  53. package/.mindforge/personas/dmux-orchestrator.md +75 -0
  54. package/.mindforge/personas/doc-auditor.md +84 -0
  55. package/.mindforge/personas/dx-engineer.md +96 -0
  56. package/.mindforge/personas/ecommerce-engineer.md +57 -0
  57. package/.mindforge/personas/edge-engineer.md +94 -0
  58. package/.mindforge/personas/edtech-architect.md +106 -0
  59. package/.mindforge/personas/embedding-architect.md +57 -0
  60. package/.mindforge/personas/environment-engineer.md +57 -0
  61. package/.mindforge/personas/eval-judge.md +55 -0
  62. package/.mindforge/personas/event-architect.md +102 -0
  63. package/.mindforge/personas/experiment-designer.md +138 -0
  64. package/.mindforge/personas/feature-store-engineer.md +57 -0
  65. package/.mindforge/personas/finops-analyst.md +66 -0
  66. package/.mindforge/personas/fintech-architect.md +57 -0
  67. package/.mindforge/personas/flutter-engineer.md +104 -0
  68. package/.mindforge/personas/gaming-engineer.md +57 -0
  69. package/.mindforge/personas/graphql-designer.md +73 -0
  70. package/.mindforge/personas/healthcare-engineer.md +57 -0
  71. package/.mindforge/personas/hiring-strategist.md +105 -0
  72. package/.mindforge/personas/hitl-architect.md +165 -0
  73. package/.mindforge/personas/i18n-architect.md +69 -0
  74. package/.mindforge/personas/instinct-curator.md +83 -0
  75. package/.mindforge/personas/iot-architect.md +105 -0
  76. package/.mindforge/personas/knowledge-curator.md +139 -0
  77. package/.mindforge/personas/knowledge-engineer.md +57 -0
  78. package/.mindforge/personas/lakehouse-architect.md +57 -0
  79. package/.mindforge/personas/llm-orchestrator.md +57 -0
  80. package/.mindforge/personas/logistics-architect.md +106 -0
  81. package/.mindforge/personas/market-analyst.md +53 -0
  82. package/.mindforge/personas/marketplace-engineer.md +105 -0
  83. package/.mindforge/personas/mcp-designer.md +54 -0
  84. package/.mindforge/personas/meeting-designer.md +104 -0
  85. package/.mindforge/personas/mentorship-lead.md +106 -0
  86. package/.mindforge/personas/migration-architect.md +57 -0
  87. package/.mindforge/personas/ml-ops-engineer.md +101 -0
  88. package/.mindforge/personas/mobile-architect.md +105 -0
  89. package/.mindforge/personas/mobile-security-engineer.md +106 -0
  90. package/.mindforge/personas/multi-model-bridge.md +86 -0
  91. package/.mindforge/personas/multi-tenancy-architect.md +71 -0
  92. package/.mindforge/personas/multimodal-engineer.md +57 -0
  93. package/.mindforge/personas/offline-specialist.md +105 -0
  94. package/.mindforge/personas/onboarding-navigator.md +63 -0
  95. package/.mindforge/personas/payments-engineer.md +135 -0
  96. package/.mindforge/personas/pipeline-engineer.md +115 -0
  97. package/.mindforge/personas/platform-engineer.md +97 -0
  98. package/.mindforge/personas/platform-lead.md +57 -0
  99. package/.mindforge/personas/privacy-engineer.md +57 -0
  100. package/.mindforge/personas/product-owner.md +56 -0
  101. package/.mindforge/personas/productivity-analyst.md +57 -0
  102. package/.mindforge/personas/prompt-architect.md +101 -0
  103. package/.mindforge/personas/proofreader.md +53 -0
  104. package/.mindforge/personas/pwa-architect.md +105 -0
  105. package/.mindforge/personas/quality-scorer.md +63 -0
  106. package/.mindforge/personas/react-native-engineer.md +106 -0
  107. package/.mindforge/personas/resilience-engineer.md +69 -0
  108. package/.mindforge/personas/rfc-architect.md +64 -0
  109. package/.mindforge/personas/saga-orchestrator.md +80 -0
  110. package/.mindforge/personas/secrets-engineer.md +57 -0
  111. package/.mindforge/personas/skill-smith.md +79 -0
  112. package/.mindforge/personas/sre-lead.md +107 -0
  113. package/.mindforge/personas/stream-engineer.md +57 -0
  114. package/.mindforge/personas/streaming-engineer.md +64 -0
  115. package/.mindforge/personas/swarm-templates.json +695 -38
  116. package/.mindforge/personas/system-designer.md +57 -0
  117. package/.mindforge/personas/team-coach.md +120 -0
  118. package/.mindforge/personas/tech-lead-coach.md +103 -0
  119. package/.mindforge/personas/technical-writer-lead.md +111 -0
  120. package/.mindforge/personas/threat-modeler.md +82 -0
  121. package/.mindforge/personas/vibe-checker.md +75 -0
  122. package/.mindforge/personas/worktree-manager.md +56 -0
  123. package/.mindforge/personas/zero-trust-engineer.md +113 -0
  124. package/.mindforge/skills/a11y-testing/SKILL.md +143 -0
  125. package/.mindforge/skills/agent-evaluation-framework/SKILL.md +227 -0
  126. package/.mindforge/skills/agent-introspection-debugging/SKILL.md +88 -0
  127. package/.mindforge/skills/agent-loops/SKILL.md +84 -0
  128. package/.mindforge/skills/agent-memory-design/SKILL.md +199 -0
  129. package/.mindforge/skills/agent-orchestration-patterns/SKILL.md +129 -0
  130. package/.mindforge/skills/agent-tool-selection/SKILL.md +204 -0
  131. package/.mindforge/skills/ai-agent-deployment/SKILL.md +176 -0
  132. package/.mindforge/skills/ai-cost-management/SKILL.md +57 -0
  133. package/.mindforge/skills/ai-safety-alignment/SKILL.md +53 -0
  134. package/.mindforge/skills/analytics-instrumentation/SKILL.md +172 -0
  135. package/.mindforge/skills/api-gateway-patterns/SKILL.md +177 -0
  136. package/.mindforge/skills/api-marketplace/SKILL.md +56 -0
  137. package/.mindforge/skills/api-versioning/SKILL.md +100 -0
  138. package/.mindforge/skills/app-store-deployment/SKILL.md +44 -0
  139. package/.mindforge/skills/architecture-tradeoff-analysis/SKILL.md +97 -0
  140. package/.mindforge/skills/audit-logging/SKILL.md +140 -0
  141. package/.mindforge/skills/auth-patterns/SKILL.md +148 -0
  142. package/.mindforge/skills/autonomous-agent-harness/SKILL.md +218 -0
  143. package/.mindforge/skills/autonomous-agents/SKILL.md +59 -0
  144. package/.mindforge/skills/autonomous-loops/SKILL.md +105 -0
  145. package/.mindforge/skills/build-system-optimization/SKILL.md +54 -0
  146. package/.mindforge/skills/build-vs-buy/SKILL.md +80 -0
  147. package/.mindforge/skills/bundle-optimization/SKILL.md +174 -0
  148. package/.mindforge/skills/business-analyst/SKILL.md +82 -0
  149. package/.mindforge/skills/caching-strategies/SKILL.md +132 -0
  150. package/.mindforge/skills/capacity-planning/SKILL.md +96 -0
  151. package/.mindforge/skills/causal-inference/SKILL.md +42 -0
  152. package/.mindforge/skills/cdn-optimization/SKILL.md +212 -0
  153. package/.mindforge/skills/change-management/SKILL.md +106 -0
  154. package/.mindforge/skills/chaos-engineering/SKILL.md +99 -0
  155. package/.mindforge/skills/ci-cd-pipeline/SKILL.md +118 -0
  156. package/.mindforge/skills/cli-design/SKILL.md +118 -0
  157. package/.mindforge/skills/code-generation-patterns/SKILL.md +92 -0
  158. package/.mindforge/skills/code-review-methodology/SKILL.md +180 -0
  159. package/.mindforge/skills/code-tour/SKILL.md +145 -0
  160. package/.mindforge/skills/codebase-onboarding/SKILL.md +95 -0
  161. package/.mindforge/skills/compliance-as-code/SKILL.md +195 -0
  162. package/.mindforge/skills/conflict-resolution/SKILL.md +87 -0
  163. package/.mindforge/skills/connection-pooling/SKILL.md +151 -0
  164. package/.mindforge/skills/container-security/SKILL.md +151 -0
  165. package/.mindforge/skills/context-engineering/SKILL.md +114 -0
  166. package/.mindforge/skills/continuous-learning/SKILL.md +84 -0
  167. package/.mindforge/skills/contract-testing/SKILL.md +85 -0
  168. package/.mindforge/skills/cost-aware-routing/SKILL.md +83 -0
  169. package/.mindforge/skills/cost-estimation/SKILL.md +82 -0
  170. package/.mindforge/skills/council/SKILL.md +68 -0
  171. package/.mindforge/skills/cqrs-event-sourcing/SKILL.md +95 -0
  172. package/.mindforge/skills/cross-platform-testing/SKILL.md +43 -0
  173. package/.mindforge/skills/data-governance/SKILL.md +42 -0
  174. package/.mindforge/skills/data-lakehouse/SKILL.md +42 -0
  175. package/.mindforge/skills/data-mesh/SKILL.md +42 -0
  176. package/.mindforge/skills/data-modeling/SKILL.md +107 -0
  177. package/.mindforge/skills/data-pipeline-design/SKILL.md +171 -0
  178. package/.mindforge/skills/data-privacy-engineering/SKILL.md +42 -0
  179. package/.mindforge/skills/database-performance/SKILL.md +174 -0
  180. package/.mindforge/skills/database-sharding-advanced/SKILL.md +206 -0
  181. package/.mindforge/skills/de-sloppify/SKILL.md +120 -0
  182. package/.mindforge/skills/defense-in-depth/SKILL.md +84 -0
  183. package/.mindforge/skills/delegation-patterns/SKILL.md +123 -0
  184. package/.mindforge/skills/dependency-management/SKILL.md +94 -0
  185. package/.mindforge/skills/deployment-workflow/SKILL.md +135 -0
  186. package/.mindforge/skills/design-system/SKILL.md +113 -0
  187. package/.mindforge/skills/developer-onboarding/SKILL.md +99 -0
  188. package/.mindforge/skills/developer-productivity-metrics/SKILL.md +59 -0
  189. package/.mindforge/skills/distributed-consensus/SKILL.md +141 -0
  190. package/.mindforge/skills/dmux-workflows/SKILL.md +141 -0
  191. package/.mindforge/skills/dns-architecture/SKILL.md +167 -0
  192. package/.mindforge/skills/doc-health-audit/SKILL.md +102 -0
  193. package/.mindforge/skills/ecommerce-architecture/SKILL.md +41 -0
  194. package/.mindforge/skills/edge-computing/SKILL.md +91 -0
  195. package/.mindforge/skills/edtech-platform/SKILL.md +41 -0
  196. package/.mindforge/skills/email-deliverability/SKILL.md +177 -0
  197. package/.mindforge/skills/embedding-systems/SKILL.md +55 -0
  198. package/.mindforge/skills/environment-management/SKILL.md +54 -0
  199. package/.mindforge/skills/error-handling-architecture/SKILL.md +118 -0
  200. package/.mindforge/skills/estimation-techniques/SKILL.md +113 -0
  201. package/.mindforge/skills/eval-harness/SKILL.md +180 -0
  202. package/.mindforge/skills/event-driven-architecture/SKILL.md +162 -0
  203. package/.mindforge/skills/experiment-design/SKILL.md +139 -0
  204. package/.mindforge/skills/experiment-platform/SKILL.md +43 -0
  205. package/.mindforge/skills/feature-engineering/SKILL.md +42 -0
  206. package/.mindforge/skills/feature-flag-management/SKILL.md +183 -0
  207. package/.mindforge/skills/fine-tuning-workflow/SKILL.md +189 -0
  208. package/.mindforge/skills/fintech-patterns/SKILL.md +41 -0
  209. package/.mindforge/skills/flutter-architecture/SKILL.md +42 -0
  210. package/.mindforge/skills/gaming-backend/SKILL.md +41 -0
  211. package/.mindforge/skills/git-workflow-design/SKILL.md +129 -0
  212. package/.mindforge/skills/graceful-degradation/SKILL.md +95 -0
  213. package/.mindforge/skills/graphql-patterns/SKILL.md +243 -0
  214. package/.mindforge/skills/guardrails-and-safety/SKILL.md +137 -0
  215. package/.mindforge/skills/healthcare-systems/SKILL.md +40 -0
  216. package/.mindforge/skills/hiring-engineering/SKILL.md +119 -0
  217. package/.mindforge/skills/human-in-the-loop-design/SKILL.md +234 -0
  218. package/.mindforge/skills/i18n-architecture/SKILL.md +147 -0
  219. package/.mindforge/skills/idempotency-patterns/SKILL.md +84 -0
  220. package/.mindforge/skills/incident-communication/SKILL.md +96 -0
  221. package/.mindforge/skills/incident-management/SKILL.md +97 -0
  222. package/.mindforge/skills/infrastructure-as-code/SKILL.md +98 -0
  223. package/.mindforge/skills/instinct-clustering/SKILL.md +190 -0
  224. package/.mindforge/skills/internal-developer-platform/SKILL.md +51 -0
  225. package/.mindforge/skills/iot-platform/SKILL.md +41 -0
  226. package/.mindforge/skills/k8s-deployment/SKILL.md +358 -0
  227. package/.mindforge/skills/knowledge-graphs/SKILL.md +56 -0
  228. package/.mindforge/skills/knowledge-sharing-systems/SKILL.md +112 -0
  229. package/.mindforge/skills/llm-cost-optimization/SKILL.md +198 -0
  230. package/.mindforge/skills/llm-orchestration/SKILL.md +56 -0
  231. package/.mindforge/skills/load-testing/SKILL.md +84 -0
  232. package/.mindforge/skills/logistics-optimization/SKILL.md +40 -0
  233. package/.mindforge/skills/market-researcher/SKILL.md +99 -0
  234. package/.mindforge/skills/marketplace-trust/SKILL.md +40 -0
  235. package/.mindforge/skills/mcp-server-patterns/SKILL.md +264 -0
  236. package/.mindforge/skills/media-streaming/SKILL.md +41 -0
  237. package/.mindforge/skills/meeting-architecture/SKILL.md +146 -0
  238. package/.mindforge/skills/mentoring-patterns/SKILL.md +77 -0
  239. package/.mindforge/skills/microservices-patterns/SKILL.md +83 -0
  240. package/.mindforge/skills/migration-platform/SKILL.md +61 -0
  241. package/.mindforge/skills/migration-strategies/SKILL.md +129 -0
  242. package/.mindforge/skills/ml-feature-store/SKILL.md +56 -0
  243. package/.mindforge/skills/ml-monitoring/SKILL.md +42 -0
  244. package/.mindforge/skills/mobile-performance/SKILL.md +44 -0
  245. package/.mindforge/skills/mobile-security/SKILL.md +45 -0
  246. package/.mindforge/skills/model-evaluation/SKILL.md +53 -0
  247. package/.mindforge/skills/monorepo-management/SKILL.md +100 -0
  248. package/.mindforge/skills/multi-llm-consult/SKILL.md +75 -0
  249. package/.mindforge/skills/multi-tenancy-patterns/SKILL.md +145 -0
  250. package/.mindforge/skills/multi-turn-conversation-design/SKILL.md +206 -0
  251. package/.mindforge/skills/multimodal-ai/SKILL.md +51 -0
  252. package/.mindforge/skills/mutation-testing/SKILL.md +97 -0
  253. package/.mindforge/skills/notification-system-design/SKILL.md +168 -0
  254. package/.mindforge/skills/observability-stack/SKILL.md +136 -0
  255. package/.mindforge/skills/offline-first-design/SKILL.md +43 -0
  256. package/.mindforge/skills/on-call-design/SKILL.md +111 -0
  257. package/.mindforge/skills/pagination-patterns/SKILL.md +230 -0
  258. package/.mindforge/skills/payment-integration/SKILL.md +176 -0
  259. package/.mindforge/skills/performance-reviews/SKILL.md +140 -0
  260. package/.mindforge/skills/platform-observability/SKILL.md +58 -0
  261. package/.mindforge/skills/platform-reliability/SKILL.md +52 -0
  262. package/.mindforge/skills/post-incident-learning/SKILL.md +96 -0
  263. package/.mindforge/skills/product-manager/SKILL.md +104 -0
  264. package/.mindforge/skills/progressive-web-app/SKILL.md +44 -0
  265. package/.mindforge/skills/prompt-engineering/SKILL.md +94 -0
  266. package/.mindforge/skills/proofreader/SKILL.md +158 -0
  267. package/.mindforge/skills/push-notification-architecture/SKILL.md +45 -0
  268. package/.mindforge/skills/python-performance/SKILL.md +183 -0
  269. package/.mindforge/skills/quality-audit/SKILL.md +171 -0
  270. package/.mindforge/skills/queue-design/SKILL.md +85 -0
  271. package/.mindforge/skills/rag-architecture/SKILL.md +176 -0
  272. package/.mindforge/skills/rate-limiting-design/SKILL.md +94 -0
  273. package/.mindforge/skills/react-native-patterns/SKILL.md +42 -0
  274. package/.mindforge/skills/react-performance/SKILL.md +229 -0
  275. package/.mindforge/skills/real-time-analytics/SKILL.md +42 -0
  276. package/.mindforge/skills/real-time-sync/SKILL.md +83 -0
  277. package/.mindforge/skills/responsive-native/SKILL.md +44 -0
  278. package/.mindforge/skills/responsive-patterns/SKILL.md +141 -0
  279. package/.mindforge/skills/rfc-pipeline/SKILL.md +114 -0
  280. package/.mindforge/skills/saas-multi-tenant/SKILL.md +41 -0
  281. package/.mindforge/skills/santa-method/SKILL.md +134 -0
  282. package/.mindforge/skills/search-implementation/SKILL.md +98 -0
  283. package/.mindforge/skills/secrets-platform/SKILL.md +56 -0
  284. package/.mindforge/skills/secrets-rotation/SKILL.md +173 -0
  285. package/.mindforge/skills/self-serve-infrastructure/SKILL.md +51 -0
  286. package/.mindforge/skills/serverless-patterns/SKILL.md +119 -0
  287. package/.mindforge/skills/skill-creator-meta/SKILL.md +146 -0
  288. package/.mindforge/skills/sprint-retrospective-facilitation/SKILL.md +112 -0
  289. package/.mindforge/skills/stakeholder-communication/SKILL.md +85 -0
  290. package/.mindforge/skills/state-management/SKILL.md +104 -0
  291. package/.mindforge/skills/stream-processing/SKILL.md +43 -0
  292. package/.mindforge/skills/streaming-architecture/SKILL.md +81 -0
  293. package/.mindforge/skills/supply-chain-security/SKILL.md +145 -0
  294. package/.mindforge/skills/synthetic-data-generation/SKILL.md +52 -0
  295. package/.mindforge/skills/system-design/SKILL.md +88 -0
  296. package/.mindforge/skills/team-topology-design/SKILL.md +107 -0
  297. package/.mindforge/skills/technical-debt-management/SKILL.md +86 -0
  298. package/.mindforge/skills/technical-interview-design/SKILL.md +98 -0
  299. package/.mindforge/skills/technical-leadership/SKILL.md +75 -0
  300. package/.mindforge/skills/technical-writing/SKILL.md +237 -0
  301. package/.mindforge/skills/technology-radar/SKILL.md +88 -0
  302. package/.mindforge/skills/testing-anti-patterns/SKILL.md +288 -0
  303. package/.mindforge/skills/threat-modeling/SKILL.md +109 -0
  304. package/.mindforge/skills/tool-design/SKILL.md +138 -0
  305. package/.mindforge/skills/typescript-advanced/SKILL.md +198 -0
  306. package/.mindforge/skills/using-git-worktrees/SKILL.md +139 -0
  307. package/.mindforge/skills/verification-loop/SKILL.md +97 -0
  308. package/.mindforge/skills/vibe-security/SKILL.md +165 -0
  309. package/.mindforge/skills/visual-regression-testing/SKILL.md +97 -0
  310. package/.mindforge/skills/websocket-patterns/SKILL.md +203 -0
  311. package/.mindforge/skills/writing-plans/SKILL.md +170 -0
  312. package/.mindforge/skills/writing-skills/SKILL.md +216 -0
  313. package/.mindforge/skills/zero-trust-architecture/SKILL.md +166 -0
  314. package/CHANGELOG.md +195 -0
  315. package/MINDFORGE.md +4 -4
  316. package/README.md +2 -2
  317. package/RELEASENOTES.md +66 -0
  318. package/bin/installer-core.js +1 -1
  319. package/bin/wizard/theme.js +2 -2
  320. package/docs/commands-reference.md +18 -1
  321. package/package.json +2 -2
  322. package/.mindforge/personas/data-privacy-engineer.md +0 -187
@@ -0,0 +1,180 @@
+ ---
+ name: eval-harness
+ version: 1.0.0
+ min_mindforge_version: 10.0.4
+ status: stable
+ triggers: eval, evaluation, grading, pass at k, rubric, regression eval, capability eval, model judge, deterministic grading, LLM-as-judge, eval score, eval-driven, benchmark eval
+ ---
+
+ # Skill — Eval Harness (Systematic Evaluation Framework)
+
+ ## When this skill activates
+ When measuring, scoring, or validating system outputs against defined criteria.
+ Use for capability evaluation (can the system do X?), regression evaluation (does
+ a change break existing behavior?), or comparative evaluation (is version A better
+ than version B?). The eval harness ensures you define success BEFORE implementing,
+ not after.
+
+ Core principle: **Define-before-code** — write evaluation criteria before writing
+ the implementation they measure.
+
+ ## Mandatory actions when this skill is active
+
+ ### Before evaluation begins
+
+ 1. **Define the eval type:**
+ - **Capability eval**: Can the system perform task X at acceptable quality?
+ - **Regression eval**: Does this change preserve existing behavior?
+ - **Comparative eval**: Is output A better than output B on criterion C?
+
+ 2. **Write the eval config BEFORE implementation:**
+ ```
+ .mindforge/evals/[eval-name]/
+ ├── config.json # eval metadata, parameters, thresholds
+ ├── rubric.md # human-readable success criteria
+ ├── test-cases.json # input/expected-output pairs
+ └── results.jsonl # append-only results log
+ ```
+
+ 3. **Define success criteria in config.json** (`type` is one of `capability`, `regression`, `comparative`; `grader` is one of `code`, `model`, `human`):
+ ```json
+ {
+   "name": "eval-name",
+   "type": "capability",
+   "version": "1.0.0",
+   "created": "ISO-8601",
+   "thresholds": {
+     "pass_at_1": 0.8,
+     "pass_at_5": 0.95,
+     "pass_at_10": 0.99
+   },
+   "grader": "model",
+   "model_judge_config": {
+     "model": "claude-sonnet",
+     "rubric_path": "./rubric.md",
+     "temperature": 0.0
+   },
+   "test_case_count": 0,
+   "tags": []
+ }
+ ```
+
+ 4. **Write the rubric (rubric.md) with explicit scoring:**
+ - Each criterion gets a 1-5 scale with concrete examples at each level
+ - Define what a "pass" means (minimum score per criterion)
+ - Define what a "fail" looks like with specific examples
+ - Include edge cases that should be tested
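The sketch below, in TypeScript, types the config above and sanity-checks its thresholds before any eval runs. `EvalConfig` and `validateThresholds` are hypothetical names, not a MindForge API. The ordering check relies on a property of the metric itself: pass@k never decreases as k grows. A higher-k threshold set below a lower-k one is therefore already implied and can never be the binding constraint.

```typescript
// Hypothetical types mirroring the example config.json (not a MindForge API).
type EvalType = "capability" | "regression" | "comparative";
type GraderType = "code" | "model" | "human";

interface EvalConfig {
  name: string;
  type: EvalType;
  version: string;
  grader: GraderType;
  thresholds: Record<string, number>; // keys like "pass_at_1", "pass_at_5"
}

// Returns human-readable problems; an empty array means the thresholds look sane.
function validateThresholds(cfg: EvalConfig): string[] {
  const problems: string[] = [];
  const entries: { key: string; k: number; value: number }[] = [];

  for (const key of Object.keys(cfg.thresholds)) {
    const value = cfg.thresholds[key];
    const k = Number(key.replace(/^pass_at_/, ""));
    if (!Number.isInteger(k) || k < 1) {
      problems.push(`${key}: expected a key of the form pass_at_<k>`);
      continue;
    }
    if (value < 0 || value > 1) problems.push(`${key}: threshold must be in [0, 1]`);
    entries.push({ key, k, value });
  }
  if (!entries.some(e => e.k === 1)) problems.push("missing pass_at_1 baseline threshold");

  // pass@k never decreases as k grows, so a higher-k threshold below a lower-k one
  // is already implied by the lower-k threshold and usually signals a typo.
  entries.sort((a, b) => a.k - b.k);
  for (let i = 1; i < entries.length; i++) {
    if (entries[i].value < entries[i - 1].value) {
      problems.push(`${entries[i].key} (${entries[i].value}) < ${entries[i - 1].key} (${entries[i - 1].value})`);
    }
  }
  return problems;
}
```

Running this check when the config is written catches threshold typos before they silently weaken a gate.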
+
+ ### During evaluation
+
+ **Three Grader Types:**
+
+ **1. Code-Based (Deterministic):**
+ - Use when outputs have objectively verifiable properties
+ - Write assertion functions that return PASS/FAIL with evidence
+ - Examples: output matches regex, JSON schema validates, function returns expected value
+ - No ambiguity — the grader is a function, not a judgment call
+ - Always prefer code-based grading when possible (fastest, most reliable)
+
+ ```typescript
+ // Example code grader (types declared inline so the snippet is self-contained)
+ interface TestCase { status: string; minCount: number; }
+ interface GradeResult { pass: boolean; evidence: string; criterion: string; }
+ function grade(output: string, expected: TestCase): GradeResult {
+   let parsed: { status?: unknown; count?: unknown } = {};
+   try {
+     parsed = JSON.parse(output) ?? {};
+   } catch {
+     // Unparseable output is a deterministic FAIL, not a grader crash
+     return { pass: false, evidence: "output is not valid JSON", criterion: "structural-correctness" };
+   }
+   return {
+     pass: parsed.status === expected.status &&
+       typeof parsed.count === "number" && parsed.count >= expected.minCount,
+     evidence: `status=${parsed.status}, count=${parsed.count}`,
+     criterion: "structural-correctness"
+   };
+ }
+ ```
+
+ **2. Model-Based (LLM-as-Judge):**
+ - Use when outputs require semantic understanding (prose quality, code correctness, reasoning)
+ - Always provide the rubric in the judge prompt — never rely on implicit standards
+ - Use temperature 0.0 for judge calls (reduces run-to-run variance, but does not guarantee identical outputs)
+ - Run the judge 3x per item and take the majority PASS/FAIL vote (smooths out the remaining noise)
+ - Log the judge's reasoning alongside the score
+
+ ```
+ Judge prompt structure:
+ 1. Task description (what was the system asked to do?)
+ 2. Rubric (what does good look like? what does bad look like?)
+ 3. The output to grade
+ 4. Instruction: score 1-5 per criterion, explain each score, give overall PASS/FAIL
+ ```
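The next TypeScript sketch assembles the four-part judge prompt and applies the 3-run majority vote. `JudgeVerdict`, `buildJudgePrompt`, and `majorityVote` are hypothetical names. The model call itself is omitted because it depends on whichever client you use.

```typescript
// Hypothetical verdict shape returned by one judge run.
interface JudgeVerdict {
  scores: Record<string, number>; // 1-5 per rubric criterion
  reasoning: string;
  pass: boolean;
}

// Assembles the four-part judge prompt described above.
function buildJudgePrompt(task: string, rubric: string, output: string): string {
  return [
    `## Task\n${task}`,
    `## Rubric\n${rubric}`,
    `## Output to grade\n${output}`,
    "## Instruction\nScore each criterion 1-5, explain each score, then give an overall PASS or FAIL.",
  ].join("\n\n");
}

// Majority vote over an odd number of independent judge runs (e.g. 3).
function majorityVote(verdicts: JudgeVerdict[]): { pass: boolean; agreement: number; reasoning: string[] } {
  if (verdicts.length === 0 || verdicts.length % 2 === 0) {
    throw new Error("use an odd, non-zero number of judge runs to avoid ties");
  }
  const passes = verdicts.filter(v => v.pass).length;
  return {
    pass: passes > verdicts.length / 2,
    // Fraction of runs that sided with the majority (1 = unanimous)
    agreement: Math.max(passes, verdicts.length - passes) / verdicts.length,
    reasoning: verdicts.map(v => v.reasoning), // keep every run's reasoning for the log
  };
}
```

The vote also reports its agreement ratio. That makes split decisions (2 of 3) easy to route to human review.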
+
+ **3. Human-Based (Flag for Review):**
+ - Use when stakes are too high for automated judgment
+ - Generate a review queue with: input, output, rubric, suggested-score
+ - Human confirms or overrides the suggested score
+ - Track inter-rater reliability if multiple humans review
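For two reviewers, inter-rater reliability is commonly measured with Cohen's kappa, which discounts the agreement you would expect by chance. Below is a minimal TypeScript sketch for binary PASS/FAIL labels. The function name is illustrative. For more than two reviewers, Fleiss' kappa or Krippendorff's alpha is the usual choice.

```typescript
// Cohen's kappa for two reviewers who labeled the same items PASS (true) or FAIL (false).
// 1 = perfect agreement, 0 = no better than chance, negative = worse than chance.
function cohensKappa(a: boolean[], b: boolean[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("both reviewers must label the same non-empty set of items");
  }
  const n = a.length;
  let agree = 0, aPass = 0, bPass = 0;
  for (let i = 0; i < n; i++) {
    if (a[i] === b[i]) agree++;
    if (a[i]) aPass++;
    if (b[i]) bPass++;
  }
  const observed = agree / n;
  // Agreement expected by chance, from each reviewer's own PASS rate.
  const pa = aPass / n, pb = bPass / n;
  const expected = pa * pb + (1 - pa) * (1 - pb);
  // Both reviewers used one identical label for everything: kappa is undefined (0/0).
  if (expected === 1) return NaN;
  return (observed - expected) / (1 - expected);
}
```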
+
+ **pass@k Metrics:**
+ - Generate n ≥ k independent outputs for each test case (n ≥ 10 if you report pass@10)
+ - **pass@1**: Expected fraction of test cases solved by a single output (per test case, c/n)
+ - **pass@5**: Expected fraction of test cases where at least 1 of 5 sampled outputs passes
+ - **pass@10**: Expected fraction of test cases where at least 1 of 10 sampled outputs passes
+ - Formula (unbiased estimator, computed per test case, then averaged over all test cases): pass@k = 1 - C(n-c, k) / C(n, k), where n = outputs generated for that test case and c = outputs that passed
+ - Always report pass@1 (baseline) and at least one higher-k metric
+ - Use pass@1 for production readiness, pass@k for capability ceiling
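The binomial form of the estimator overflows for large n. An equivalent product form avoids that: C(n-c, k) / C(n, k) equals ∏ (1 - k/i) for i from n-c+1 to n. The TypeScript sketch below computes the estimate per test case and then averages across the suite. The function names are illustrative. With k = 1 the estimate reduces to c/n.

```typescript
// Unbiased pass@k for one test case: n outputs generated, c of them passed.
// Uses the product form of 1 - C(n-c, k) / C(n, k) to avoid huge binomials.
function passAtK(n: number, c: number, k: number): number {
  if (k > n) throw new Error(`need at least k=${k} outputs, got n=${n}`);
  if (c < 0 || c > n) throw new Error("c must be between 0 and n");
  if (n - c < k) return 1; // every size-k subset contains at least one passing output
  let probAllFail = 1;
  for (let i = n - c + 1; i <= n; i++) {
    probAllFail *= 1 - k / i;
  }
  return 1 - probAllFail;
}

// Suite-level metric: average the per-test-case estimates.
function suitePassAtK(results: { n: number; c: number }[], k: number): number {
  if (results.length === 0) throw new Error("no test cases");
  const total = results.reduce((sum, r) => sum + passAtK(r.n, r.c, k), 0);
  return total / results.length;
}
```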
+
+ **Result logging (results.jsonl; each record is written as one JSON object per line, pretty-printed here for readability):**
+ ```json
+ {
+   "timestamp": "ISO-8601",
+   "test_case_id": "tc-001",
+   "input": "...",
+   "output": "...",
+   "grader": "code",
+   "scores": {"criterion_a": 4, "criterion_b": 5},
+   "pass": true,
+   "evidence": "...",
+   "latency_ms": 0,
+   "model_version": "...",
+   "run_id": "uuid"
+ }
+ ```
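results.jsonl is append-only, so each record must be serialized to exactly one line. The TypeScript sketch below mirrors the record above; `toJsonlLine`, `parseJsonl`, and `groupByRun` are hypothetical helper names. The file I/O is left to the caller. In Node.js, `fs.appendFileSync(path, line)` appends without truncating the existing log.

```typescript
// Mirrors the results.jsonl record above; field names come from that example.
interface EvalResult {
  timestamp: string; // ISO-8601
  test_case_id: string;
  input: string;
  output: string;
  grader: "code" | "model" | "human";
  scores: Record<string, number>;
  pass: boolean;
  evidence: string;
  latency_ms: number;
  model_version: string;
  run_id: string;
}

// One compact JSON object per line (JSON.stringify escapes embedded newlines).
function toJsonlLine(result: EvalResult): string {
  return JSON.stringify(result) + "\n";
}

// Parses a results.jsonl body, skipping blank lines and reporting unparseable ones.
function parseJsonl(text: string): { records: EvalResult[]; badLines: number[] } {
  const records: EvalResult[] = [];
  const badLines: number[] = [];
  text.split("\n").forEach((line, idx) => {
    if (line.trim() === "") return;
    try {
      records.push(JSON.parse(line) as EvalResult);
    } catch {
      badLines.push(idx + 1); // 1-based line number, e.g. a truncated final write
    }
  });
  return { records, badLines };
}

// Groups records by run_id so the latest run can be compared with the previous one.
function groupByRun(records: EvalResult[]): Map<string, EvalResult[]> {
  const runs = new Map<string, EvalResult[]>();
  for (const r of records) {
    const list = runs.get(r.run_id) ?? [];
    list.push(r);
    runs.set(r.run_id, list);
  }
  return runs;
}
```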
+
+ ### After evaluation
+
+ 1. **Compute aggregate metrics:**
+ - Overall pass rate (pass@1, pass@5, pass@10)
+ - Per-criterion score distribution
+ - Failure mode clustering (what patterns cause failures?)
+ - Comparison to the previous run (regression detection)
+
+ 2. **Regression detection logic:**
+ - If pass@1 drops by more than 5 percentage points from the previous run: FLAG as regression
+ - If any previously-passing test case now fails: FLAG as regression
+ - If new failure modes appear that didn't exist before: FLAG as regression
+ - Regressions block shipping until investigated
+
+ 3. **Store results:**
+ - Append to results.jsonl (never overwrite)
+ - Update config.json with latest run metadata
+ - If regression detected: create `.mindforge/evals/[name]/REGRESSION.md`
+
+ 4. **Report format:**
+ ```
+ ## Eval Report: [eval-name]
+ - Type: capability | regression | comparative
+ - Run: [run-id] at [timestamp]
+ - Test cases: N total, P passed, F failed
+ - pass@1: X% | pass@5: Y% | pass@10: Z%
+ - Threshold: pass@1 >= T% → [MET / NOT MET]
+ - Regressions: [none | list]
+ - Top failure modes: [list with counts]
+ ```
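The three regression rules in step 2 reduce to a comparison of two run summaries. The TypeScript sketch below assumes both runs cover the same test cases and that failure modes are already clustered into string labels. `RunSummary` and `detectRegressions` are hypothetical names. Reading the 5% threshold as an absolute drop of 0.05 on a 0-1 scale is also an assumption.

```typescript
// Hypothetical per-run summary, derived from the results.jsonl records of one run_id.
interface RunSummary {
  passAt1: number;           // suite-level pass@1 on a 0-1 scale
  passingCases: Set<string>; // test_case_ids that passed
  failureModes: Set<string>; // labels produced by failure-mode clustering
}

// Applies the three regression rules; returns the reasons (empty array = no regression).
// Assumes both runs cover the same test cases; a deleted case would look like a new failure.
function detectRegressions(prev: RunSummary, curr: RunSummary, maxDrop = 0.05): string[] {
  const reasons: string[] = [];
  // Rule 1: pass@1 drop, read here as an absolute drop on the 0-1 scale.
  if (prev.passAt1 - curr.passAt1 > maxDrop) {
    reasons.push(`pass@1 dropped from ${prev.passAt1.toFixed(3)} to ${curr.passAt1.toFixed(3)}`);
  }
  // Rule 2: any previously passing case that no longer passes.
  prev.passingCases.forEach(id => {
    if (!curr.passingCases.has(id)) reasons.push(`previously passing case now fails: ${id}`);
  });
  // Rule 3: failure modes not seen in the previous run.
  curr.failureModes.forEach(mode => {
    if (!prev.failureModes.has(mode)) reasons.push(`new failure mode: ${mode}`);
  });
  return reasons;
}
```

A non-empty result maps directly onto the "Regressions" line of the report and onto creating REGRESSION.md.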
+
+ ## Self-check before task completion
+
+ Before marking a task done when this skill was active:
+
+ - [ ] Did I define success criteria BEFORE writing implementation code?
+ - [ ] Did I choose the appropriate grader type (preference order: code > model > human)?
+ - [ ] Did I track pass@k metrics (at minimum pass@1)?
+ - [ ] Did I run regression evals against previous results?
+ - [ ] Are results stored in `.mindforge/evals/[name]/results.jsonl`?
+ - [ ] If model-based grading: did I use temperature 0.0 and majority vote?
+ - [ ] Did I report failure modes, not just pass rates?
+ - [ ] Is the rubric explicit enough that another reviewer could grade independently?
@@ -0,0 +1,162 @@
+ ---
+ name: event-driven-architecture
+ version: 1.0.0
+ min_mindforge_version: 0.1.0
+ status: stable
+ triggers: event driven architecture, event bus, pub sub pattern, event schema design, ordering guarantee, exactly once delivery, dead letter topic, event sourcing integration, event catalog, event versioning strategy, event replay strategy, event consumer group
+ ---
+
+ # Skill — Event-Driven Architecture
+
+ ## When this skill activates
+ Any task involving event bus design, pub/sub patterns, message ordering,
+ delivery guarantees, dead letter handling, or event schema evolution.
+
+ ## Mandatory actions when this skill is active
+
+ ### Before writing any code
+ 1. Classify event types (domain, integration, or command events).
+ 2. Define delivery guarantees required for each event stream.
+ 3. Design the event schema with forward/backward compatibility in mind.
+
+ ### During implementation
+ - Make all consumers idempotent (safe to process the same event multiple times).
+ - Implement dead letter topic handling with alerting.
+ - Use partition keys to maintain ordering where required.
+
+ ### After implementation
+ - Register events in the event catalog with schema and owner.
+ - Add consumer lag monitoring.
+ - Document retry and failure handling in ARCHITECTURE.md.
+
+ ## Event Types
+
+ ### Domain Events
+ - Facts about what happened in a bounded context.
+ - Named in past tense: `OrderPlaced`, `PaymentProcessed`, `UserRegistered`.
+ - Owned by the producing domain — consumers must adapt.
+ - Immutable once published.
+
+ ### Integration Events
+ - Cross-boundary communication between services.
+ - May be transformed from domain events (different schema, less detail).
+ - Published on a shared event bus (Kafka, SNS, EventBridge).
+
+ ### Command Events
+ - A request for action (not a fact).
+ - Named as imperative: `ProcessPayment`, `SendNotification`.
+ - Exactly one consumer is expected to handle it.
+ - Requires acknowledgment/response.
+
+ ## Delivery Guarantees
+
+ ### At-Most-Once
+ - Fire and forget. No retries.
+ - Use for: metrics, analytics, non-critical notifications.
+ - Risk: message loss on failure.
+
+ ### At-Least-Once (Recommended Default)
+ - Retry until acknowledged.
+ - Consumers MUST be idempotent.
+ - Use for: most business events.
+ - Risk: duplicate processing (mitigated by idempotency).
+
+ ### Exactly-Once (Expensive)
+ - Requires a transactional outbox (producer side) + deduplication (consumer side): delivery is still at-least-once, and consumer-side deduplication makes processing effectively-once.
+ - Use for: financial transactions, inventory changes.
+ - Implementation: idempotency key + processed event log.
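The TypeScript sketch below shows the consumer side of the idempotency key + processed event log pattern. The names (`EventEnvelope`, `ProcessedEventLog`, `idempotent`) are hypothetical. The in-memory Set is only a stand-in. For the guarantee to survive crashes, the processed-event record and the business write must commit in the same database transaction.

```typescript
// Hypothetical event envelope; the idempotency key is assumed to be set by the producer.
interface EventEnvelope {
  eventId: string; // idempotency key, unique per logical event
  type: string;    // e.g. "OrderPlaced"
  payload: unknown;
}

// Processed-event log. In production this is a table written in the SAME transaction
// as the business state change; an in-memory Set stands in for it here.
class ProcessedEventLog {
  private seen = new Set<string>();
  has(id: string): boolean { return this.seen.has(id); }
  record(id: string): void { this.seen.add(id); }
}

// Wraps a handler so redelivered events (at-least-once) are applied only once.
function idempotent(
  log: ProcessedEventLog,
  handler: (e: EventEnvelope) => void,
): (e: EventEnvelope) => "applied" | "duplicate" {
  return (e) => {
    if (log.has(e.eventId)) return "duplicate"; // already processed: acknowledge and skip
    handler(e);            // apply the side effect...
    log.record(e.eventId); // ...then mark it processed (atomically with the effect in a real DB)
    return "applied";
  };
}
```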
+
+ ## Ordering Guarantees
+
+ ### Per-Partition Ordering
+ - Events with the same partition key are consumed in the order they were written to the partition.
+ - Partition key = entity ID (e.g., order_id, user_id).
+ - Different entities may be processed out of order (acceptable).
+
+ ### Global Ordering
+ - Extremely expensive — single partition = no parallelism.
+ - Almost never needed — design around per-entity ordering instead.
+
+ ### Kafka Partition Key Design
+ ```
+ topic: order-events
+ partition_key: order_id
+ result: all events for order-123 arrive in sequence
+ ```
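The partition-key rule can be illustrated with a small model of key-based partitioning. This is a sketch under an explicit assumption: MD5 from `hashlib` stands in for the client's partitioner. Kafka's default Java partitioner uses murmur2, so the partition numbers here will not match a real cluster; only the property (same key, same partition, arrival order preserved) carries over.

```python
"""Sketch of key-based partitioning: same key -> same partition.

Assumption: MD5 stands in for the real partitioner (Kafka's default Java
partitioner uses murmur2), so partition numbers differ from a real cluster.
"""
import hashlib
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def append_events(
    events: List[Tuple[str, str]], num_partitions: int
) -> Dict[int, List[Tuple[str, str]]]:
    """Append (order_id, event_name) pairs to per-partition logs in arrival order."""
    logs: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for order_id, name in events:
        logs[partition_for(order_id, num_partitions)].append((order_id, name))
    return dict(logs)
```

Events for other orders may share or not share the partition, but every event for `order-123` lands in one log, in the order it was produced.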
86
+
87
+ ## Schema Evolution
88
+
89
+ ### Compatibility Modes (Avro/Protobuf)
90
+ - **Backward compatible**: new schema can read old data (add optional fields).
91
+ - **Forward compatible**: old schema can read new data (ignore unknown fields).
92
+ - **Full compatible**: both directions (safest, most restrictive).
93
+
94
+ ### Rules for Safe Evolution
95
+ - Adding optional fields: always safe.
96
+ - Removing fields: only if no consumers depend on them.
97
+ - Renaming fields: treat as remove + add (breaking).
98
+ - Changing field types: breaking, except for the narrow promotions a format explicitly permits (e.g., Avro `int` to `long`).
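The evolution rules above can be turned into an automated pre-publish check. The following is a minimal sketch under a hypothetical simplification: a schema is modelled as `{field_name: (type_name, optional)}`, and every type change is conservatively flagged as breaking. A real schema registry applies the format's own resolution rules instead.

```python
"""Sketch: flag schema changes against the safe-evolution rules.

Assumption: a schema is {field_name: (type_name, optional)}. This sketch
conservatively treats every type change as breaking; a real registry applies
the format's own rules (e.g. Avro's int -> long promotion).
"""
from typing import Dict, List, Tuple

Schema = Dict[str, Tuple[str, bool]]


def review_change(old: Schema, new: Schema) -> List[str]:
    findings: List[str] = []
    for name, (new_type, optional) in new.items():
        if name not in old:
            if not optional:
                findings.append(f"BREAKING: new field '{name}' is required")
        elif old[name][0] != new_type:
            findings.append(f"BREAKING: '{name}' changed type {old[name][0]} -> {new_type}")
    for name in old:
        if name not in new:
            # A rename shows up as this removal plus a (required) addition.
            findings.append(f"CHECK: field '{name}' removed; confirm no consumer reads it")
    return findings
```

An empty result means the change only added optional fields; anything else needs review before the schema version is registered.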
99
+
100
+ ### Schema Registry
101
+ - Central registry of all event schemas with version history.
102
+ - Validates compatibility before allowing schema updates.
103
+ - Consumers resolve the schema by ID, which the producer embeds in each message (in a header or, with Confluent's serializers, as a prefix of the payload).
104
+
105
+ ## Consumer Groups
106
+
107
+ ### Competing Consumers (Scaling Pattern)
108
+ - Multiple instances in same group share the load.
109
+ - Each message processed by exactly one instance.
110
+ - Use for: order processing, notification sending.
111
+ - Scale by adding more consumers (up to partition count).
112
+
113
+ ### Broadcasting (Fan-Out Pattern)
114
+ - Each consumer group gets every message.
115
+ - Use for: audit logging, cache invalidation, analytics.
116
+ - Different groups process independently at their own pace.
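The difference between competing consumers and fan-out comes down to how partitions are handed out within and across groups. This sketch assumes a simple round-robin assignment in place of the broker's real assignor (Kafka ships range, round-robin, sticky and cooperative-sticky strategies); the function names are hypothetical.

```python
"""Sketch: how partitions are shared inside and across consumer groups.

Assumption: plain round-robin stands in for the broker's real assignor.
Within a group each partition has one owner; each group gets all partitions.
"""
from typing import Dict, List


def assign(partitions: List[int], consumers: List[str]) -> Dict[str, List[int]]:
    """Competing consumers: each partition goes to exactly one group member."""
    assignment: Dict[str, List[int]] = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment


def fan_out(
    partitions: List[int], groups: Dict[str, List[str]]
) -> Dict[str, Dict[str, List[int]]]:
    """Broadcasting: every group independently receives every partition."""
    return {group: assign(partitions, members) for group, members in groups.items()}
```

With three partitions and four consumers in one group, the fourth consumer receives nothing, which is why scaling stops at the partition count.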
117
+
118
+ ## Dead Letter Topics (DLT)
119
+
120
+ ### Flow
121
+ ```
122
+ message → consumer → FAIL → retry with backoff (3 attempts total) → FAIL → DLT → alert
123
+ ```
124
+
125
+ ### Requirements
126
+ - Every consumer MUST have a DLT configured.
127
+ - DLT messages retain full context (original message + error + attempt count).
128
+ - Alert on first DLT message (don't silently accumulate).
129
+ - Manual resolution workflow: inspect → fix → replay or discard.
130
+
131
+ ### Retry Strategy
132
+ - Attempt 1: immediate.
133
+ - Attempt 2: 1 second delay.
134
+ - Attempt 3: 10 second delay.
135
+ - After 3 failures: route to DLT.
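The retry schedule and DLT requirements above can be combined into one consumer wrapper. This is a sketch under assumptions: `handler` raises on failure, `dlt` is any list-like sink standing in for the dead letter topic, and `sleep` is injectable so tests do not actually wait. The delays follow this section's schedule (immediate, 1 s, 10 s).

```python
"""Sketch: the retry schedule above, then routing to a dead letter topic.

Assumptions: `handler` raises on failure, `dlt` is a list standing in for
the dead letter topic, and `sleep` is injectable for testing.
"""
import time
from typing import Callable, Dict, List, Optional

RETRY_DELAYS_S = [0.0, 1.0, 10.0]  # delay before attempts 1, 2 and 3


def consume_with_retry(
    message: Dict,
    handler: Callable[[Dict], None],
    dlt: List[Dict],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if handled; otherwise append full context to the DLT and return False."""
    last_error: Optional[Exception] = None
    for delay in RETRY_DELAYS_S:
        if delay:
            sleep(delay)
        try:
            handler(message)
            return True
        except Exception as exc:  # a real consumer would catch narrower errors
            last_error = exc
    # Keep the original message, the error and the attempt count together.
    dlt.append({"original": message, "error": repr(last_error), "attempts": len(RETRY_DELAYS_S)})
    # The first-DLT-message alert would be raised here.
    return False
```

Because the DLT record carries the original payload, the manual "inspect, fix, replay or discard" workflow can replay it through the same function.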
136
+
137
+ ## Event Catalog
138
+
139
+ Every event in the system must be registered:
140
+
141
+ | Field | Description |
142
+ |-------|-------------|
143
+ | Event name | `OrderPlaced` |
144
+ | Schema version | `v3` |
145
+ | Owner (team) | Order Service team |
146
+ | Producers | order-service |
147
+ | Consumers | notification-svc, analytics-svc, fulfillment-svc |
148
+ | Partition key | order_id |
149
+ | Delivery guarantee | at-least-once |
150
+ | Retention | 7 days |
151
+
152
+ ## Self-check before task completion
153
+
154
+ Before marking a task done when this skill was active:
155
+
156
+ - [ ] Did I read the full SKILL.md before starting? (Not just the triggers)
157
+ - [ ] Are all consumers idempotent?
158
+ - [ ] Is ordering guaranteed per entity via partition keys?
159
+ - [ ] Is a dead letter topic configured with alerting?
160
+ - [ ] Are event schemas registered in the catalog?
161
+ - [ ] Is schema evolution backward-compatible?
162
+ - [ ] Are consumer groups configured correctly (competing vs broadcasting)?
@@ -0,0 +1,139 @@
1
+ ---
2
+ name: experiment-design
3
+ version: 1.0.0
4
+ min_mindforge_version: 10.0.4
5
+ status: stable
6
+ triggers: experiment design, A/B testing architecture, statistical significance, sample size calculator, guardrail metric, experiment lifecycle, hypothesis testing, control variant, experiment analysis, metric sensitivity, experiment duration, randomization unit
7
+ ---
8
+
9
+ # Skill — Experiment Design (Rigorous A/B Testing Architecture)
10
+
11
+ ## When this skill activates
12
+ When designing, planning, or analyzing A/B tests, multivariate experiments, or
13
+ any controlled experiment that requires statistical rigor. Use for feature rollout
14
+ decisions, conversion optimization, pricing tests, or any scenario where you need
15
+ to measure the causal impact of a change.
16
+
17
+ Core principle: **Hypothesis-first** — never launch an experiment without a written
18
+ hypothesis that specifies the expected effect direction, magnitude, and mechanism.
19
+
20
+ ## Mandatory actions when this skill is active
21
+
22
+ ### Before experiment begins
23
+
24
+ 1. **Write the hypothesis in structured format:**
25
+ ```
26
+ If we [change X],
27
+ then [metric Y] will [improve/degrade] by [Z amount]
28
+ because [causal mechanism].
29
+ ```
30
+ - The hypothesis must be falsifiable
31
+ - The expected effect size must be realistic (based on prior data or industry benchmarks)
32
+ - The causal mechanism must be articulated (not just "it will be better")
33
+
34
+ 2. **Calculate sample size:**
35
+ ```
36
+ Inputs:
37
+ - Baseline conversion rate (current metric value)
38
+ - Minimum Detectable Effect (MDE): smallest improvement worth detecting
39
+ - Statistical significance level (alpha): typically 0.05
40
+ - Statistical power (1-beta): typically 0.80
41
+ - Number of variants (control + treatments)
42
+
43
+ Output:
44
+ - Required sample size per variant
45
+ - Estimated duration = required_N / daily_traffic_per_variant
46
+ ```
47
+
48
+ Rules:
49
+ - MDE should be the smallest PRACTICALLY significant effect (not just statistically significant)
50
+ - If duration > 8 weeks: increase MDE or find a higher-traffic surface
51
+ - Never compromise on power — underpowered experiments waste everyone's time
52
+
53
+ 3. **Define guardrail metrics:**
54
+ ```json
55
+ {
56
+ "primary_metric": "conversion_rate",
57
+ "secondary_metrics": ["revenue_per_user", "engagement_time"],
58
+ "guardrail_metrics": [
59
+ {"name": "page_load_time_p95", "threshold": "+200ms", "action": "stop"},
60
+ {"name": "error_rate", "threshold": "+0.5%", "action": "stop"},
61
+ {"name": "revenue_per_session", "threshold": "-2%", "action": "alert"}
62
+ ]
63
+ }
64
+ ```
65
+ - Guardrails are metrics that MUST NOT degrade beyond threshold
66
+ - Violation of a guardrail = experiment stopped regardless of primary metric
67
+ - Always include: performance, error rate, and revenue as guardrails
68
+
69
+ 4. **Choose randomization unit:**
70
+ - **User-level**: Default for most experiments (consistent experience across sessions)
71
+ - **Session-level**: For UI experiments where cross-session contamination is acceptable
72
+ - **Page-level**: Only for layout experiments with no carryover effects
73
+ - **Device-level**: When logged-out users make up a significant share of traffic
74
+ - Rule: randomization unit >= analysis unit (never analyze at user level if randomized at page level)
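The sample-size inputs in step 2 can be turned into a number with the standard normal-approximation formula for comparing two proportions. This is a sketch under stated assumptions: a two-sided test on conversion rates, MDE expressed as a relative lift over the baseline, and, with more than one treatment, alpha Bonferroni-split across the comparisons against control. Continuous metrics, variance reduction, or sequential designs need different formulas.

```python
"""Sketch: per-variant sample size for a conversion-rate A/B test.

Assumptions: two-sided two-proportion z-test (normal approximation), MDE as
a relative lift, alpha Bonferroni-split across treatment-vs-control comparisons.
"""
import math
from statistics import NormalDist


def sample_size_per_variant(
    baseline: float,
    relative_mde: float,
    alpha: float = 0.05,
    power: float = 0.80,
    num_variants: int = 2,
) -> int:
    comparisons = max(num_variants - 1, 1)  # each treatment vs control
    z_alpha = NormalDist().inv_cdf(1 - (alpha / comparisons) / 2)
    z_beta = NormalDist().inv_cdf(power)
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def duration_days(n_per_variant: int, daily_traffic_per_variant: int) -> int:
    return math.ceil(n_per_variant / daily_traffic_per_variant)
```

For example, a 10% baseline with a 10% relative MDE (10% to 11%) needs roughly 14,750 users per variant; at 2,000 users per variant per day that is 8 days, which the duration rules then round up to a full two weeks.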
75
+
76
+ ### During experiment
77
+
78
+ 1. **Minimum duration rules:**
79
+ - Run for at least 1 full business cycle (typically 7 days minimum)
80
+ - Recommended: 2 full weeks to capture weekday/weekend variation
81
+ - NEVER stop early because results "look significant" (peeking problem)
82
+ - If using sequential testing: define stopping rules BEFORE launch
83
+
84
+ 2. **Monitoring protocol:**
85
+ - Check guardrail metrics daily
86
+ - Do NOT check primary metric significance until planned end date
87
+ - If peeking is necessary: use group sequential methods with alpha spending
88
+ - Log any system issues that may contaminate results (outages, bugs, other launches)
89
+
90
+ 3. **Sample Ratio Mismatch (SRM) check:**
91
+ - Verify variant assignment matches the planned allocation (chi-square goodness-of-fit test; p < 0.001 indicates SRM)
92
+ - SRM invalidates the experiment — do not trust results
93
+ - Common causes: bot filtering, redirect failures, bucketing bugs
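The SRM check can be computed with the standard library for the common two-arm case. This sketch assumes exactly two variants and a planned control share; with one degree of freedom the chi-square p-value equals `erfc(sqrt(chi2 / 2))`. For more arms, a goodness-of-fit routine such as `scipy.stats.chisquare` is the usual tool.

```python
"""Sketch: Sample Ratio Mismatch check for a two-arm experiment.

Assumptions: exactly two variants and a planned control share. With one
degree of freedom, the chi-square p-value is erfc(sqrt(chi2 / 2)).
"""
import math


def srm_p_value(control: int, treatment: int, control_share: float = 0.5) -> float:
    total = control + treatment
    expected = (total * control_share, total * (1 - control_share))
    chi2 = sum((obs - exp) ** 2 / exp for obs, exp in zip((control, treatment), expected))
    return math.erfc(math.sqrt(chi2 / 2))


def has_srm(control: int, treatment: int, control_share: float = 0.5) -> bool:
    return srm_p_value(control, treatment, control_share) < 0.001
```

A 50,600 / 49,400 split of a planned 50/50 test gives chi-square = 14.4 and a p-value well below 0.001, so the results should not be trusted even though the imbalance is only 1.2%.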
94
+
95
+ ### After experiment (analysis)
96
+
97
+ 1. **Statistical analysis checklist:**
98
+ - [ ] Confirm no SRM
99
+ - [ ] Check primary metric: p-value < 0.05 AND confidence interval excludes 0
100
+ - [ ] Check practical significance: is the effect size large enough to matter?
101
+ - [ ] Check guardrail metrics: no violations
102
+ - [ ] Check segment consistency: does the effect hold across key segments?
103
+ - [ ] Check novelty/primacy effects: is the effect stable over time?
104
+
105
+ 2. **Decision framework:**
106
+ ```
107
+ IF p < 0.05 AND practical significance AND no guardrail violations:
108
+ → SHIP (roll out to 100%)
109
+ IF p < 0.05 BUT guardrail violation:
110
+ → ITERATE (fix guardrail issue, re-run)
111
+ IF p >= 0.05 AND confidence interval includes meaningful effects:
112
+ → EXTEND (underpowered, run longer or increase traffic)
113
+ IF p >= 0.05 AND confidence interval excludes meaningful effects:
114
+ → KILL (the change doesn't work, move on)
115
+ ```
116
+
117
+ 3. **Document the result:**
118
+ ```markdown
119
+ ## Experiment Result: [name]
120
+ - Hypothesis: [statement]
121
+ - Duration: [days] | Sample: [N per variant]
122
+ - Primary metric: [baseline] → [variant] ([+/-X%], p=[value])
123
+ - Guardrails: [all clear / violations]
124
+ - Decision: SHIP / ITERATE / EXTEND / KILL
125
+ - Learning: [what did we learn about user behavior?]
126
+ ```
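The SHIP / ITERATE / EXTEND / KILL rules in step 2 can be encoded so that analyses apply them consistently. This is a sketch under assumptions: `effect`, the confidence bounds, and `mde` share one unit; "practical significance" means `effect >= mde`; and a "meaningful effect" is any value at least `mde` away from zero. The framework does not cover a significant but practically negligible result, so the sketch returns `REVIEW` for that case rather than guessing.

```python
"""Sketch: the SHIP / ITERATE / EXTEND / KILL rules as a function.

Assumptions: effect, ci_low, ci_high and mde share one unit; practical
significance means effect >= mde; "meaningful" means at least mde from zero.
"""
from dataclasses import dataclass


@dataclass
class Result:
    p_value: float
    effect: float
    ci_low: float
    ci_high: float
    guardrail_violated: bool


def decide(r: Result, mde: float, alpha: float = 0.05) -> str:
    if r.p_value < alpha:
        if r.guardrail_violated:
            return "ITERATE"
        if r.effect >= mde:
            return "SHIP"
        return "REVIEW"  # significant but below MDE: not covered by the framework
    ci_includes_meaningful = r.ci_high >= mde or r.ci_low <= -mde
    return "EXTEND" if ci_includes_meaningful else "KILL"
```

Keeping the decision in code makes the "Decision:" line of the result template reproducible by another engineer.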
127
+
128
+ ## Self-check before task completion
129
+
130
+ Before marking a task done when this skill was active:
131
+
132
+ - [ ] Did I write a structured hypothesis with expected effect size and mechanism?
133
+ - [ ] Did I calculate required sample size based on MDE and baseline?
134
+ - [ ] Did I define guardrail metrics with explicit thresholds?
135
+ - [ ] Did I choose an appropriate randomization unit?
136
+ - [ ] Did I set minimum duration (>= 1 business cycle)?
137
+ - [ ] Did I plan for the peeking problem (no early stopping without sequential testing)?
138
+ - [ ] Did I document the decision framework (ship/iterate/extend/kill)?
139
+ - [ ] Is the experiment design reproducible by another engineer?
@@ -0,0 +1,43 @@
1
+ ---
2
+ name: experiment-platform
3
+ version: 1.0.0
4
+ min_mindforge_version: 10.6.0
5
+ status: stable
6
+ triggers: experiment platform design, experimentation infrastructure, statistical rigor experiment, guardrail metric design, experiment velocity, feature flag experiment, experiment analysis automation, sample size calculation, multi-variant testing, experiment platform lifecycle, experiment review process, sequential testing
7
+ compose: experiment-design
8
+ ---
9
+
10
+ # Skill — Experiment Platform
11
+
12
+ ## When this skill activates
13
+ This skill activates when building experimentation infrastructure, implementing statistical testing frameworks, or designing A/B testing platforms. Use when organizations need to scale testing velocity while maintaining statistical rigor.
14
+
15
+ ## Mandatory actions when this skill is active
16
+
17
+ ### Before writing any code
18
+ 1. Define experiment framework components: randomization service, exposure logging, metric computation pipeline, and statistical analysis engine
19
+ 2. Establish statistical rigor standards: minimum sample size, power (80%+), significance level (5%), minimum detectable effect, and multiple comparison corrections
20
+ 3. Design guardrail metrics framework: business health (revenue, retention), user experience (latency, errors), and ecosystem health (partner impact)
21
+ 4. Plan experiment lifecycle states: draft, review, running, paused, completed, archived with transition criteria and approval gates
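The lifecycle states in step 4 can be enforced as a small state machine. The allowed transitions and the gate on `completed` below are illustrative assumptions, since the skill lists the states but leaves the exact transition criteria to each team. The 14-day default mirrors the two-week recommendation in the experiment-design skill.

```python
"""Sketch: experiment lifecycle states with gated transitions.

Assumptions: the ALLOWED map and the completion gate (minimum runtime and
sample size) are illustrative, not prescribed by the skill.
"""
from enum import Enum
from typing import Dict, Set


class State(Enum):
    DRAFT = "draft"
    REVIEW = "review"
    RUNNING = "running"
    PAUSED = "paused"
    COMPLETED = "completed"
    ARCHIVED = "archived"


ALLOWED: Dict[State, Set[State]] = {
    State.DRAFT: {State.REVIEW},
    State.REVIEW: {State.DRAFT, State.RUNNING},  # reviewer approves or sends back
    State.RUNNING: {State.PAUSED, State.COMPLETED},
    State.PAUSED: {State.RUNNING, State.COMPLETED},
    State.COMPLETED: {State.ARCHIVED},
    State.ARCHIVED: set(),
}


def transition(current: State, target: State, runtime_days: int = 0,
               min_days: int = 14, samples: int = 0, min_samples: int = 0) -> State:
    if target not in ALLOWED[current]:
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    # Approval gate: results cannot be declared before minimum runtime and sample size.
    if target is State.COMPLETED and (runtime_days < min_days or samples < min_samples):
        raise ValueError("minimum runtime or sample size not reached")
    return target
```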
22
+
23
+ ### During implementation
24
+ - Implement consistent randomization using stable hashing (user_id + experiment_id) so that users see the same variant across sessions
25
+ - Build exposure logging that captures timestamp, user_id, experiment_id, variant, and context, to support accurate sample sizes and covariate adjustment
26
+ - Create metric computation pipeline with: numerator/denominator structure, winsorization for outliers, delta method for ratios, bootstrap for confidence intervals
27
+ - Design sequential testing capability for early stopping: alpha spending functions, futility boundaries, and minimum runtime requirements
28
+ - Implement stratified analysis for heterogeneous treatment effects: by platform, user segment, and geography, with interaction-effect testing
29
+ - Build automated guardrail checks: alert on significant negative movement in critical metrics with experiment auto-pause capability
30
+ - Create experiment metadata repository: hypothesis, success criteria, related experiments, learnings, and decision outcome for institutional knowledge
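The stable-hashing randomization described above can be sketched in a few lines. The choices here are illustrative assumptions rather than a prescribed scheme: SHA-256 over `"experiment_id:user_id"`, 10,000 buckets, and integer weights per variant. Including the experiment ID in the hash input decorrelates a user's assignments across experiments.

```python
"""Sketch: deterministic variant assignment by hashing user and experiment IDs.

Assumptions: SHA-256, 10,000 buckets and integer weights are illustrative.
The same (user_id, experiment_id) pair always maps to the same bucket.
"""
import hashlib
from typing import List, Tuple

BUCKETS = 10_000


def bucket(user_id: str, experiment_id: str) -> int:
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


def assign_variant(user_id: str, experiment_id: str, variants: List[Tuple[str, int]]) -> str:
    """variants: (name, weight) pairs whose weights must sum to BUCKETS."""
    if sum(w for _, w in variants) != BUCKETS:
        raise ValueError("weights must sum to BUCKETS")
    b = bucket(user_id, experiment_id)
    upper = 0
    for name, weight in variants:
        upper += weight
        if b < upper:
            return name
    raise AssertionError("unreachable: weights cover every bucket")
```

Because assignment is a pure function of the IDs, no assignment table is needed, and the exposure log can be audited by recomputing each user's bucket.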
31
+
32
+ ### After implementation
33
+ - Generate automated experiment scorecards: primary metric movement, guardrail status, statistical significance, practical significance, recommendation
34
+ - Build experiment catalog with search and discovery: hypothesis library, metric glossary, analysis templates, and historical results
35
+ - Create experimentation health dashboard: velocity (experiments/week), quality (statistical power distribution), impact (significant wins), and coverage (features tested)
36
+ - Document statistical methodology: test selection, variance reduction techniques, multiple comparison approach, and sequential testing procedures
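One common choice for the "multiple comparison approach" is the Holm-Bonferroni step-down procedure, sketched below. It controls the family-wise error rate and rejects at least every hypothesis plain Bonferroni would; which correction a platform documents remains a team decision, and the function name here is hypothetical.

```python
"""Sketch: Holm-Bonferroni correction across several metrics or variants.

Sort p-values ascending and compare the k-th smallest (0-indexed rank) with
alpha / (m - rank); stop at the first failure.
"""
from typing import Dict


def holm_reject(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    ordered = sorted(p_values.items(), key=lambda kv: kv[1])
    m = len(ordered)
    decisions = {name: False for name in p_values}
    for rank, (name, p) in enumerate(ordered):
        if p > alpha / (m - rank):
            break  # once one test fails, all larger p-values are retained
        decisions[name] = True
    return decisions
```

With p-values 0.01, 0.03 and 0.04 across three metrics, only the first is declared significant, even though all three are below 0.05 individually.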
37
+
38
+ ## Self-check before task completion
39
+ - [ ] Randomization service ensures stable assignment and balanced allocation across variants
40
+ - [ ] Exposure logging captures all necessary context for accurate analysis and debugging
41
+ - [ ] Statistical analysis engine implements proper corrections for multiple comparisons and peeking
42
+ - [ ] Guardrail metrics monitored automatically with alerting and experiment pause capability
43
+ - [ ] Experiment lifecycle enforces minimum runtime and sample size before declaring results
@@ -0,0 +1,42 @@
1
+ ---
2
+ name: feature-engineering
3
+ version: 1.0.0
4
+ min_mindforge_version: 10.6.0
5
+ status: stable
6
+ triggers: ML feature engineering workflow, feature selection method, feature transformation, feature importance analysis, automated feature discovery, feature scaling normalization, feature interaction, temporal feature extraction, text feature engineering, categorical encoding strategy, feature validation, domain feature creation
7
+ ---
8
+
9
+ # Skill — Feature Engineering
10
+
11
+ ## When this skill activates
12
+ This skill activates when building ML pipelines that require feature creation, transformation, or selection. Use when designing feature stores, implementing automated feature discovery, or optimizing model input representation.
13
+
14
+ ## Mandatory actions when this skill is active
15
+
16
+ ### Before writing any code
17
+ 1. Conduct exploratory data analysis to understand feature distributions, missing patterns, correlations, and domain-specific relationships
18
+ 2. Define feature engineering strategy: target encoding risks, temporal leakage prevention, train-test split boundaries, and cross-validation approach
19
+ 3. Document business logic for derived features with domain expert validation and interpretability requirements
20
+ 4. Establish feature quality metrics: null rates, cardinality, stability over time, and correlation with target variable
21
+
22
+ ### During implementation
23
+ - Implement feature transformations within sklearn Pipelines or similar frameworks to prevent train-test leakage
24
+ - Use scaling methods appropriate to the distribution (StandardScaler for roughly normal data, RobustScaler when outliers are present, QuantileTransformer for heavily skewed or non-parametric distributions)
25
+ - Create temporal features with proper lag handling: rolling windows, exponential smoothing, seasonal decomposition, time-since-event
26
+ - Encode categorical variables with a strategy matched to cardinality (one-hot for <10 categories, target encoding fitted out-of-fold for >50 categories to avoid target leakage, embeddings for very high cardinality)
27
+ - Generate interaction features guided by domain knowledge and feature importance: polynomial, ratio, difference, product features
28
+ - Handle missing values explicitly with strategy documented: imputation (mean/median/mode), indicator variables, or model-based imputation
29
+ - Validate feature importance using multiple methods: permutation importance, SHAP values, and univariate tests to identify top contributors
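Two of the leakage guards above (fit transformations on training data only; temporal features use only past values) can be shown without any ML library. This is a plain-Python sketch whose class and function names are hypothetical; it mirrors the idea sklearn Pipelines enforce when `fit` runs only on the training split, but it does not reproduce sklearn's API.

```python
"""Sketch: two leakage guards in plain Python.

1. Scaling parameters are fitted on training rows only, then reused.
2. A rolling-window feature uses only values strictly before the current row.
Names are hypothetical; they mirror, not reproduce, sklearn.
"""
from statistics import fmean, pstdev
from typing import List, Optional


class TrainOnlyScaler:
    def fit(self, train: List[float]) -> "TrainOnlyScaler":
        self.mean = fmean(train)
        self.std = pstdev(train) or 1.0  # guard against a constant column
        return self

    def transform(self, values: List[float]) -> List[float]:
        return [(v - self.mean) / self.std for v in values]


def lagged_rolling_mean(values: List[float], window: int) -> List[Optional[float]]:
    """Mean of up to `window` previous values; None when there is no history."""
    out: List[Optional[float]] = []
    for i in range(len(values)):
        history = values[max(0, i - window):i]  # excludes values[i] itself
        out.append(fmean(history) if history else None)
    return out
```

Test rows are only ever passed to `transform`, so their statistics never influence the fitted mean and standard deviation.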
30
+
31
+ ### After implementation
32
+ - Create feature documentation with schema definitions, transformation logic, expected ranges, and update frequency
33
+ - Build feature monitoring dashboards tracking distribution drift, missing rate changes, and correlation stability over time
34
+ - Generate feature store integration with versioning, metadata tracking, and point-in-time correctness for temporal joins
35
+ - Validate feature pipeline performance: transformation latency, memory usage, and batch vs online serving consistency
36
+
37
+ ## Self-check before task completion
38
+ - [ ] All features are computed within transformation pipelines to prevent train-test leakage
39
+ - [ ] Feature importance analysis identifies top 20 contributors with interpretable business meaning
40
+ - [ ] Temporal features respect time boundaries and use only historically available information
41
+ - [ ] Feature documentation includes transformation logic, expected distributions, and monitoring thresholds
42
+ - [ ] Feature validation tests confirm stability across different time periods and data segments