opencode-swarm-plugin 0.44.0 → 0.44.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (215)
  1. package/bin/swarm.serve.test.ts +6 -4
  2. package/bin/swarm.ts +18 -12
  3. package/dist/compaction-prompt-scoring.js +139 -0
  4. package/dist/eval-capture.js +12811 -0
  5. package/dist/hive.d.ts.map +1 -1
  6. package/dist/hive.js +14834 -0
  7. package/dist/index.d.ts +18 -0
  8. package/dist/index.d.ts.map +1 -1
  9. package/dist/index.js +7743 -62593
  10. package/dist/plugin.js +24052 -78907
  11. package/dist/swarm-orchestrate.d.ts.map +1 -1
  12. package/dist/swarm-prompts.d.ts.map +1 -1
  13. package/dist/swarm-prompts.js +39407 -0
  14. package/dist/swarm-review.d.ts.map +1 -1
  15. package/dist/swarm-validation.d.ts +127 -0
  16. package/dist/swarm-validation.d.ts.map +1 -0
  17. package/dist/validators/index.d.ts +7 -0
  18. package/dist/validators/index.d.ts.map +1 -0
  19. package/dist/validators/schema-validator.d.ts +58 -0
  20. package/dist/validators/schema-validator.d.ts.map +1 -0
  21. package/package.json +17 -5
  22. package/.changeset/swarm-insights-data-layer.md +0 -63
  23. package/.hive/analysis/eval-failure-analysis-2025-12-25.md +0 -331
  24. package/.hive/analysis/session-data-quality-audit.md +0 -320
  25. package/.hive/eval-results.json +0 -483
  26. package/.hive/issues.jsonl +0 -138
  27. package/.hive/memories.jsonl +0 -729
  28. package/.opencode/eval-history.jsonl +0 -327
  29. package/.turbo/turbo-build.log +0 -9
  30. package/CHANGELOG.md +0 -2286
  31. package/SCORER-ANALYSIS.md +0 -598
  32. package/docs/analysis/subagent-coordination-patterns.md +0 -902
  33. package/docs/analysis-socratic-planner-pattern.md +0 -504
  34. package/docs/planning/ADR-001-monorepo-structure.md +0 -171
  35. package/docs/planning/ADR-002-package-extraction.md +0 -393
  36. package/docs/planning/ADR-003-performance-improvements.md +0 -451
  37. package/docs/planning/ADR-004-message-queue-features.md +0 -187
  38. package/docs/planning/ADR-005-devtools-observability.md +0 -202
  39. package/docs/planning/ADR-007-swarm-enhancements-worktree-review.md +0 -168
  40. package/docs/planning/ADR-008-worker-handoff-protocol.md +0 -293
  41. package/docs/planning/ADR-009-oh-my-opencode-patterns.md +0 -353
  42. package/docs/planning/ADR-010-cass-inhousing.md +0 -1215
  43. package/docs/planning/ROADMAP.md +0 -368
  44. package/docs/semantic-memory-cli-syntax.md +0 -123
  45. package/docs/swarm-mail-architecture.md +0 -1147
  46. package/docs/testing/context-recovery-test.md +0 -470
  47. package/evals/ARCHITECTURE.md +0 -1189
  48. package/evals/README.md +0 -768
  49. package/evals/compaction-prompt.eval.ts +0 -149
  50. package/evals/compaction-resumption.eval.ts +0 -289
  51. package/evals/coordinator-behavior.eval.ts +0 -307
  52. package/evals/coordinator-session.eval.ts +0 -154
  53. package/evals/evalite.config.ts.bak +0 -15
  54. package/evals/example.eval.ts +0 -31
  55. package/evals/fixtures/cass-baseline.ts +0 -217
  56. package/evals/fixtures/compaction-cases.ts +0 -350
  57. package/evals/fixtures/compaction-prompt-cases.ts +0 -311
  58. package/evals/fixtures/coordinator-sessions.ts +0 -328
  59. package/evals/fixtures/decomposition-cases.ts +0 -105
  60. package/evals/lib/compaction-loader.test.ts +0 -248
  61. package/evals/lib/compaction-loader.ts +0 -320
  62. package/evals/lib/data-loader.evalite-test.ts +0 -289
  63. package/evals/lib/data-loader.test.ts +0 -345
  64. package/evals/lib/data-loader.ts +0 -281
  65. package/evals/lib/llm.ts +0 -115
  66. package/evals/scorers/compaction-prompt-scorers.ts +0 -145
  67. package/evals/scorers/compaction-scorers.ts +0 -305
  68. package/evals/scorers/coordinator-discipline.evalite-test.ts +0 -539
  69. package/evals/scorers/coordinator-discipline.ts +0 -325
  70. package/evals/scorers/index.test.ts +0 -146
  71. package/evals/scorers/index.ts +0 -328
  72. package/evals/scorers/outcome-scorers.evalite-test.ts +0 -27
  73. package/evals/scorers/outcome-scorers.ts +0 -349
  74. package/evals/swarm-decomposition.eval.ts +0 -121
  75. package/examples/commands/swarm.md +0 -745
  76. package/examples/plugin-wrapper-template.ts +0 -2515
  77. package/examples/skills/hive-workflow/SKILL.md +0 -212
  78. package/examples/skills/skill-creator/SKILL.md +0 -223
  79. package/examples/skills/swarm-coordination/SKILL.md +0 -292
  80. package/global-skills/cli-builder/SKILL.md +0 -344
  81. package/global-skills/cli-builder/references/advanced-patterns.md +0 -244
  82. package/global-skills/learning-systems/SKILL.md +0 -644
  83. package/global-skills/skill-creator/LICENSE.txt +0 -202
  84. package/global-skills/skill-creator/SKILL.md +0 -352
  85. package/global-skills/skill-creator/references/output-patterns.md +0 -82
  86. package/global-skills/skill-creator/references/workflows.md +0 -28
  87. package/global-skills/swarm-coordination/SKILL.md +0 -995
  88. package/global-skills/swarm-coordination/references/coordinator-patterns.md +0 -235
  89. package/global-skills/swarm-coordination/references/strategies.md +0 -138
  90. package/global-skills/system-design/SKILL.md +0 -213
  91. package/global-skills/testing-patterns/SKILL.md +0 -430
  92. package/global-skills/testing-patterns/references/dependency-breaking-catalog.md +0 -586
  93. package/opencode-swarm-plugin-0.30.7.tgz +0 -0
  94. package/opencode-swarm-plugin-0.31.0.tgz +0 -0
  95. package/scripts/cleanup-test-memories.ts +0 -346
  96. package/scripts/init-skill.ts +0 -222
  97. package/scripts/migrate-unknown-sessions.ts +0 -349
  98. package/scripts/validate-skill.ts +0 -204
  99. package/src/agent-mail.ts +0 -1724
  100. package/src/anti-patterns.test.ts +0 -1167
  101. package/src/anti-patterns.ts +0 -448
  102. package/src/compaction-capture.integration.test.ts +0 -257
  103. package/src/compaction-hook.test.ts +0 -838
  104. package/src/compaction-hook.ts +0 -1204
  105. package/src/compaction-observability.integration.test.ts +0 -139
  106. package/src/compaction-observability.test.ts +0 -187
  107. package/src/compaction-observability.ts +0 -324
  108. package/src/compaction-prompt-scorers.test.ts +0 -475
  109. package/src/compaction-prompt-scoring.ts +0 -300
  110. package/src/contributor-tools.test.ts +0 -133
  111. package/src/contributor-tools.ts +0 -201
  112. package/src/dashboard.test.ts +0 -611
  113. package/src/dashboard.ts +0 -462
  114. package/src/error-enrichment.test.ts +0 -403
  115. package/src/error-enrichment.ts +0 -219
  116. package/src/eval-capture.test.ts +0 -1015
  117. package/src/eval-capture.ts +0 -929
  118. package/src/eval-gates.test.ts +0 -306
  119. package/src/eval-gates.ts +0 -218
  120. package/src/eval-history.test.ts +0 -508
  121. package/src/eval-history.ts +0 -214
  122. package/src/eval-learning.test.ts +0 -378
  123. package/src/eval-learning.ts +0 -360
  124. package/src/eval-runner.test.ts +0 -223
  125. package/src/eval-runner.ts +0 -402
  126. package/src/export-tools.test.ts +0 -476
  127. package/src/export-tools.ts +0 -257
  128. package/src/hive.integration.test.ts +0 -2241
  129. package/src/hive.ts +0 -1628
  130. package/src/index.ts +0 -940
  131. package/src/learning.integration.test.ts +0 -1815
  132. package/src/learning.ts +0 -1079
  133. package/src/logger.test.ts +0 -189
  134. package/src/logger.ts +0 -135
  135. package/src/mandate-promotion.test.ts +0 -473
  136. package/src/mandate-promotion.ts +0 -239
  137. package/src/mandate-storage.integration.test.ts +0 -601
  138. package/src/mandate-storage.test.ts +0 -578
  139. package/src/mandate-storage.ts +0 -794
  140. package/src/mandates.ts +0 -540
  141. package/src/memory-tools.test.ts +0 -195
  142. package/src/memory-tools.ts +0 -344
  143. package/src/memory.integration.test.ts +0 -334
  144. package/src/memory.test.ts +0 -158
  145. package/src/memory.ts +0 -527
  146. package/src/model-selection.test.ts +0 -188
  147. package/src/model-selection.ts +0 -68
  148. package/src/observability-tools.test.ts +0 -359
  149. package/src/observability-tools.ts +0 -871
  150. package/src/output-guardrails.test.ts +0 -438
  151. package/src/output-guardrails.ts +0 -381
  152. package/src/pattern-maturity.test.ts +0 -1160
  153. package/src/pattern-maturity.ts +0 -525
  154. package/src/planning-guardrails.test.ts +0 -491
  155. package/src/planning-guardrails.ts +0 -438
  156. package/src/plugin.ts +0 -23
  157. package/src/post-compaction-tracker.test.ts +0 -251
  158. package/src/post-compaction-tracker.ts +0 -237
  159. package/src/query-tools.test.ts +0 -636
  160. package/src/query-tools.ts +0 -324
  161. package/src/rate-limiter.integration.test.ts +0 -466
  162. package/src/rate-limiter.ts +0 -774
  163. package/src/replay-tools.test.ts +0 -496
  164. package/src/replay-tools.ts +0 -240
  165. package/src/repo-crawl.integration.test.ts +0 -441
  166. package/src/repo-crawl.ts +0 -610
  167. package/src/schemas/cell-events.test.ts +0 -347
  168. package/src/schemas/cell-events.ts +0 -807
  169. package/src/schemas/cell.ts +0 -257
  170. package/src/schemas/evaluation.ts +0 -166
  171. package/src/schemas/index.test.ts +0 -199
  172. package/src/schemas/index.ts +0 -286
  173. package/src/schemas/mandate.ts +0 -232
  174. package/src/schemas/swarm-context.ts +0 -115
  175. package/src/schemas/task.ts +0 -161
  176. package/src/schemas/worker-handoff.test.ts +0 -302
  177. package/src/schemas/worker-handoff.ts +0 -131
  178. package/src/sessions/agent-discovery.test.ts +0 -137
  179. package/src/sessions/agent-discovery.ts +0 -112
  180. package/src/sessions/index.ts +0 -15
  181. package/src/skills.integration.test.ts +0 -1192
  182. package/src/skills.test.ts +0 -643
  183. package/src/skills.ts +0 -1549
  184. package/src/storage.integration.test.ts +0 -341
  185. package/src/storage.ts +0 -884
  186. package/src/structured.integration.test.ts +0 -817
  187. package/src/structured.test.ts +0 -1046
  188. package/src/structured.ts +0 -762
  189. package/src/swarm-decompose.test.ts +0 -188
  190. package/src/swarm-decompose.ts +0 -1302
  191. package/src/swarm-deferred.integration.test.ts +0 -157
  192. package/src/swarm-deferred.test.ts +0 -38
  193. package/src/swarm-insights.test.ts +0 -214
  194. package/src/swarm-insights.ts +0 -459
  195. package/src/swarm-mail.integration.test.ts +0 -970
  196. package/src/swarm-mail.ts +0 -739
  197. package/src/swarm-orchestrate.integration.test.ts +0 -282
  198. package/src/swarm-orchestrate.test.ts +0 -548
  199. package/src/swarm-orchestrate.ts +0 -3084
  200. package/src/swarm-prompts.test.ts +0 -1270
  201. package/src/swarm-prompts.ts +0 -2077
  202. package/src/swarm-research.integration.test.ts +0 -701
  203. package/src/swarm-research.test.ts +0 -698
  204. package/src/swarm-research.ts +0 -472
  205. package/src/swarm-review.integration.test.ts +0 -285
  206. package/src/swarm-review.test.ts +0 -879
  207. package/src/swarm-review.ts +0 -709
  208. package/src/swarm-strategies.ts +0 -407
  209. package/src/swarm-worktree.test.ts +0 -501
  210. package/src/swarm-worktree.ts +0 -575
  211. package/src/swarm.integration.test.ts +0 -2377
  212. package/src/swarm.ts +0 -38
  213. package/src/tool-adapter.integration.test.ts +0 -1221
  214. package/src/tool-availability.ts +0 -461
  215. package/tsconfig.json +0 -28
@@ -1 +1 @@
1
- {"version":3,"file":"swarm-review.d.ts","sourceRoot":"","sources":["../src/swarm-review.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;GAcG;AAGH,OAAO,EAAE,CAAC,EAAE,MAAM,KAAK,CAAC;AASxB;;GAEG;AACH,MAAM,WAAW,WAAW;IAC1B,IAAI,EAAE,MAAM,CAAC;IACb,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,KAAK,EAAE,MAAM,CAAC;IACd,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,eAAO,MAAM,iBAAiB;;;;;iBAK5B,CAAC;AAEH;;GAEG;AACH,MAAM,WAAW,YAAY;IAC3B,MAAM,EAAE,UAAU,GAAG,eAAe,CAAC;IACrC,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,MAAM,CAAC,EAAE,WAAW,EAAE,CAAC;IACvB,kBAAkB,CAAC,EAAE,MAAM,CAAC;CAC7B;AAED,eAAO,MAAM,kBAAkB;;;;;;;;;;;;;iBAkB5B,CAAC;AAEJ;;GAEG;AACH,MAAM,WAAW,cAAc;IAC7B,EAAE,EAAE,MAAM,CAAC;IACX,KAAK,EAAE,MAAM,CAAC;IACd,OAAO,CAAC,EAAE,MAAM,CAAC;CAClB;AAED;;GAEG;AACH,MAAM,WAAW,cAAc;IAC7B,EAAE,EAAE,MAAM,CAAC;IACX,KAAK,EAAE,MAAM,CAAC;CACf;AAED;;GAEG;AACH,MAAM,WAAW,mBAAmB;IAClC,OAAO,EAAE,MAAM,CAAC;IAChB,UAAU,EAAE,MAAM,CAAC;IACnB,gBAAgB,CAAC,EAAE,MAAM,CAAC;IAC1B,OAAO,EAAE,MAAM,CAAC;IAChB,UAAU,EAAE,MAAM,CAAC;IACnB,gBAAgB,CAAC,EAAE,MAAM,CAAC;IAC1B,aAAa,EAAE,MAAM,EAAE,CAAC;IACxB,IAAI,EAAE,MAAM,CAAC;IACb,sBAAsB,CAAC,EAAE,cAAc,EAAE,CAAC;IAC1C,gBAAgB,CAAC,EAAE,cAAc,EAAE,CAAC;CACrC;AAkDD;;;;;;;;;;GAUG;AACH,wBAAgB,oBAAoB,CAAC,OAAO,EAAE,mBAAmB,GAAG,MAAM,CAsGzE;AAmED;;;;;GAKG;AACH,eAAO,MAAM,YAAY;;;;;;;;;;;;;;CA+GvB,CAAC;AAEH;;;;GAIG;AACH,eAAO,MAAM,qBAAqB;;;;;;;;;;;;;;;;;;;;;CAkLhC,CAAC;AAMH;;GAEG;AACH,UAAU,gBAAgB;IACxB,QAAQ,EAAE,OAAO,CAAC;IAClB,QAAQ,EAAE,OAAO,CAAC;IAClB,aAAa,EAAE,MAAM,CAAC;IACtB,kBAAkB,EAAE,MAAM,CAAC;CAC5B;AAOD;;GAEG;AACH,wBAAgB,kBAAkB,CAAC,MAAM,EAAE,MAAM,GAAG,IAAI,CAGvD;AAED;;GAEG;AACH,wBAAgB,gBAAgB,CAAC,MAAM,EAAE,MAAM,GAAG,OAAO,CAGxD;AAED;;GAEG;AACH,wBAAgB,eAAe,CAAC,MAAM,EAAE,MAAM,GAAG,gBAAgB,CAQhE;AAED;;GAEG;AACH,wBAAgB,iBAAiB,CAAC,MAAM,EAAE,MAAM,GAAG,IAAI,CAGtD;AAED;;GAEG;AACH,wBAAgB,kBAAkB,CAAC,MAAM,EAAE,MAAM,GAAG,IAAI,CAEvD;AAMD,eAAO,MAAM,WAAW;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAGvB,CAAC"}
1
+ {"version":3,"file":"swarm-review.d.ts","sourceRoot":"","sources":["../src/swarm-review.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;GAcG;AAGH,OAAO,EAAE,CAAC,EAAE,MAAM,KAAK,CAAC;AASxB;;GAEG;AACH,MAAM,WAAW,WAAW;IAC1B,IAAI,EAAE,MAAM,CAAC;IACb,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,KAAK,EAAE,MAAM,CAAC;IACd,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,eAAO,MAAM,iBAAiB;;;;;iBAK5B,CAAC;AAEH;;GAEG;AACH,MAAM,WAAW,YAAY;IAC3B,MAAM,EAAE,UAAU,GAAG,eAAe,CAAC;IACrC,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,MAAM,CAAC,EAAE,WAAW,EAAE,CAAC;IACvB,kBAAkB,CAAC,EAAE,MAAM,CAAC;CAC7B;AAED,eAAO,MAAM,kBAAkB;;;;;;;;;;;;;iBAkB5B,CAAC;AAEJ;;GAEG;AACH,MAAM,WAAW,cAAc;IAC7B,EAAE,EAAE,MAAM,CAAC;IACX,KAAK,EAAE,MAAM,CAAC;IACd,OAAO,CAAC,EAAE,MAAM,CAAC;CAClB;AAED;;GAEG;AACH,MAAM,WAAW,cAAc;IAC7B,EAAE,EAAE,MAAM,CAAC;IACX,KAAK,EAAE,MAAM,CAAC;CACf;AAED;;GAEG;AACH,MAAM,WAAW,mBAAmB;IAClC,OAAO,EAAE,MAAM,CAAC;IAChB,UAAU,EAAE,MAAM,CAAC;IACnB,gBAAgB,CAAC,EAAE,MAAM,CAAC;IAC1B,OAAO,EAAE,MAAM,CAAC;IAChB,UAAU,EAAE,MAAM,CAAC;IACnB,gBAAgB,CAAC,EAAE,MAAM,CAAC;IAC1B,aAAa,EAAE,MAAM,EAAE,CAAC;IACxB,IAAI,EAAE,MAAM,CAAC;IACb,sBAAsB,CAAC,EAAE,cAAc,EAAE,CAAC;IAC1C,gBAAgB,CAAC,EAAE,cAAc,EAAE,CAAC;CACrC;AAkDD;;;;;;;;;;GAUG;AACH,wBAAgB,oBAAoB,CAAC,OAAO,EAAE,mBAAmB,GAAG,MAAM,CAsGzE;AAmED;;;;;GAKG;AACH,eAAO,MAAM,YAAY;;;;;;;;;;;;;;CA+HvB,CAAC;AAEH;;;;GAIG;AACH,eAAO,MAAM,qBAAqB;;;;;;;;;;;;;;;;;;;;;CAoNhC,CAAC;AAMH;;GAEG;AACH,UAAU,gBAAgB;IACxB,QAAQ,EAAE,OAAO,CAAC;IAClB,QAAQ,EAAE,OAAO,CAAC;IAClB,aAAa,EAAE,MAAM,CAAC;IACtB,kBAAkB,EAAE,MAAM,CAAC;CAC5B;AAOD;;GAEG;AACH,wBAAgB,kBAAkB,CAAC,MAAM,EAAE,MAAM,GAAG,IAAI,CAGvD;AAED;;GAEG;AACH,wBAAgB,gBAAgB,CAAC,MAAM,EAAE,MAAM,GAAG,OAAO,CAGxD;AAED;;GAEG;AACH,wBAAgB,eAAe,CAAC,MAAM,EAAE,MAAM,GAAG,gBAAgB,CAQhE;AAED;;GAEG;AACH,wBAAgB,iBAAiB,CAAC,MAAM,EAAE,MAAM,GAAG,IAAI,CAGtD;AAED;;GAEG;AACH,wBAAgB,kBAAkB,CAAC,MAAM,EAAE,MAAM,GAAG,IAAI,CAEvD;AAMD,eAAO,MAAM,WAAW;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAGvB,CAAC"}
@@ -0,0 +1,127 @@
1
+ /**
2
+ * Swarm Validation Hook Infrastructure
3
+ *
4
+ * Provides validation event types and hooks for post-swarm validation.
5
+ * Integrates with swarm-mail event sourcing to emit validation events.
6
+ *
7
+ * @module swarm-validation
8
+ */
9
+ import { z } from "zod";
10
+ /**
11
+ * Agent event type for validation events
12
+ *
13
+ * This is a minimal type that matches the swarm-mail AgentEvent interface
14
+ * for the validation events we emit.
15
+ */
16
+ type AgentEvent = {
17
+ type: "validation_started";
18
+ project_key: string;
19
+ timestamp: number;
20
+ epic_id: string;
21
+ swarm_id: string;
22
+ started_at: number;
23
+ } | {
24
+ type: "validation_issue";
25
+ project_key: string;
26
+ timestamp: string | number;
27
+ epic_id: string;
28
+ severity: "error" | "warning" | "info";
29
+ category: "schema_mismatch" | "missing_event" | "undefined_value" | "dashboard_render" | "websocket_delivery";
30
+ message: string;
31
+ location?: {
32
+ event_type?: string;
33
+ field?: string;
34
+ component?: string;
35
+ };
36
+ } | {
37
+ type: "validation_completed";
38
+ project_key: string;
39
+ timestamp: number;
40
+ epic_id: string;
41
+ swarm_id: string;
42
+ passed: boolean;
43
+ issue_count: number;
44
+ duration_ms: number;
45
+ };
46
+ /**
47
+ * Severity levels for validation issues
48
+ */
49
+ export declare const ValidationIssueSeverity: z.ZodEnum<{
50
+ error: "error";
51
+ info: "info";
52
+ warning: "warning";
53
+ }>;
54
+ /**
55
+ * Categories of validation issues
56
+ */
57
+ export declare const ValidationIssueCategory: z.ZodEnum<{
58
+ schema_mismatch: "schema_mismatch";
59
+ missing_event: "missing_event";
60
+ undefined_value: "undefined_value";
61
+ dashboard_render: "dashboard_render";
62
+ websocket_delivery: "websocket_delivery";
63
+ }>;
64
+ /**
65
+ * Validation issue with location context
66
+ */
67
+ export declare const ValidationIssueSchema: z.ZodObject<{
68
+ severity: z.ZodEnum<{
69
+ error: "error";
70
+ info: "info";
71
+ warning: "warning";
72
+ }>;
73
+ category: z.ZodEnum<{
74
+ schema_mismatch: "schema_mismatch";
75
+ missing_event: "missing_event";
76
+ undefined_value: "undefined_value";
77
+ dashboard_render: "dashboard_render";
78
+ websocket_delivery: "websocket_delivery";
79
+ }>;
80
+ message: z.ZodString;
81
+ location: z.ZodOptional<z.ZodObject<{
82
+ event_type: z.ZodOptional<z.ZodString>;
83
+ field: z.ZodOptional<z.ZodString>;
84
+ component: z.ZodOptional<z.ZodString>;
85
+ }, z.core.$strip>>;
86
+ }, z.core.$strip>;
87
+ export type ValidationIssue = z.infer<typeof ValidationIssueSchema>;
88
+ /**
89
+ * Context for validation execution
90
+ */
91
+ export interface ValidationContext {
92
+ /** Project key (path) */
93
+ project_key: string;
94
+ /** Epic ID being validated */
95
+ epic_id: string;
96
+ /** Swarm ID being validated */
97
+ swarm_id: string;
98
+ /** Validation start time */
99
+ started_at: Date;
100
+ /** Event emitter function */
101
+ emit: (event: AgentEvent) => Promise<void>;
102
+ }
103
+ /**
104
+ * Run post-swarm validation
105
+ *
106
+ * Emits validation_started, runs validators, emits validation_issue for each issue,
107
+ * and emits validation_completed with summary.
108
+ *
109
+ * @param ctx - Validation context
110
+ * @param events - Events to validate
111
+ * @returns Validation result with passed flag and issues
112
+ */
113
+ export declare function runPostSwarmValidation(ctx: ValidationContext, events: unknown[]): Promise<{
114
+ passed: boolean;
115
+ issues: ValidationIssue[];
116
+ }>;
117
+ /**
118
+ * Report a validation issue
119
+ *
120
+ * Emits a validation_issue event with the provided issue details.
121
+ *
122
+ * @param ctx - Validation context
123
+ * @param issue - Validation issue to report
124
+ */
125
+ export declare function reportIssue(ctx: ValidationContext, issue: ValidationIssue): Promise<void>;
126
+ export {};
127
+ //# sourceMappingURL=swarm-validation.d.ts.map
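The declarations above describe an emit-driven flow: one `validation_started` event, one `validation_issue` per problem found, then a `validation_completed` summary. The following is a minimal sketch of that sequence using local stand-in types that mirror a subset of the declared fields. `SketchEvent`, `SketchContext`, `findIssues`, and `runValidationSketch` are hypothetical names, and the rule that only `"error"` severity fails a run is an assumption; the package's actual implementation is not shown in this diff.

```typescript
// Local stand-ins mirroring part of the declared shapes; nothing is imported from the package.
type Severity = "error" | "warning" | "info";
type Category =
  | "schema_mismatch"
  | "missing_event"
  | "undefined_value"
  | "dashboard_render"
  | "websocket_delivery";

interface Issue {
  severity: Severity;
  category: Category;
  message: string;
}

type SketchEvent =
  | { type: "validation_started"; epic_id: string; started_at: number }
  | ({ type: "validation_issue"; epic_id: string } & Issue)
  | {
      type: "validation_completed";
      epic_id: string;
      passed: boolean;
      issue_count: number;
      duration_ms: number;
    };

interface SketchContext {
  epic_id: string;
  started_at: Date;
  emit: (event: SketchEvent) => Promise<void>;
}

// Hypothetical validator: flags any event that lacks a string `type` field.
function findIssues(events: unknown[]): Issue[] {
  const issues: Issue[] = [];
  events.forEach((event, index) => {
    const type = (event as { type?: unknown } | null)?.type;
    if (typeof type !== "string") {
      issues.push({
        severity: "error",
        category: "schema_mismatch",
        message: `event ${index} has no string "type"`,
      });
    }
  });
  return issues;
}

async function runValidationSketch(
  ctx: SketchContext,
  events: unknown[],
): Promise<{ passed: boolean; issues: Issue[] }> {
  const startedAt = ctx.started_at.getTime();
  await ctx.emit({ type: "validation_started", epic_id: ctx.epic_id, started_at: startedAt });

  const issues = findIssues(events);
  for (const issue of issues) {
    await ctx.emit({ type: "validation_issue", epic_id: ctx.epic_id, ...issue });
  }

  // Assumption: only "error" severity fails the run.
  const passed = issues.every((issue) => issue.severity !== "error");
  await ctx.emit({
    type: "validation_completed",
    epic_id: ctx.epic_id,
    passed,
    issue_count: issues.length,
    duration_ms: Date.now() - startedAt,
  });
  return { passed, issues };
}
```

Passing `emit` in through the context keeps the validation logic independent of the event store, so a test can collect events in an array instead of writing to swarm-mail.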
@@ -0,0 +1 @@
1
+ {"version":3,"file":"swarm-validation.d.ts","sourceRoot":"","sources":["../src/swarm-validation.ts"],"names":[],"mappings":"AAAA;;;;;;;GAOG;AACH,OAAO,EAAE,CAAC,EAAE,MAAM,KAAK,CAAC;AAExB;;;;;GAKG;AACH,KAAK,UAAU,GACX;IACE,IAAI,EAAE,oBAAoB,CAAC;IAC3B,WAAW,EAAE,MAAM,CAAC;IACpB,SAAS,EAAE,MAAM,CAAC;IAClB,OAAO,EAAE,MAAM,CAAC;IAChB,QAAQ,EAAE,MAAM,CAAC;IACjB,UAAU,EAAE,MAAM,CAAC;CACpB,GACD;IACE,IAAI,EAAE,kBAAkB,CAAC;IACzB,WAAW,EAAE,MAAM,CAAC;IACpB,SAAS,EAAE,MAAM,GAAG,MAAM,CAAC;IAC3B,OAAO,EAAE,MAAM,CAAC;IAChB,QAAQ,EAAE,OAAO,GAAG,SAAS,GAAG,MAAM,CAAC;IACvC,QAAQ,EACJ,iBAAiB,GACjB,eAAe,GACf,iBAAiB,GACjB,kBAAkB,GAClB,oBAAoB,CAAC;IACzB,OAAO,EAAE,MAAM,CAAC;IAChB,QAAQ,CAAC,EAAE;QACT,UAAU,CAAC,EAAE,MAAM,CAAC;QACpB,KAAK,CAAC,EAAE,MAAM,CAAC;QACf,SAAS,CAAC,EAAE,MAAM,CAAC;KACpB,CAAC;CACH,GACD;IACE,IAAI,EAAE,sBAAsB,CAAC;IAC7B,WAAW,EAAE,MAAM,CAAC;IACpB,SAAS,EAAE,MAAM,CAAC;IAClB,OAAO,EAAE,MAAM,CAAC;IAChB,QAAQ,EAAE,MAAM,CAAC;IACjB,MAAM,EAAE,OAAO,CAAC;IAChB,WAAW,EAAE,MAAM,CAAC;IACpB,WAAW,EAAE,MAAM,CAAC;CACrB,CAAC;AAMN;;GAEG;AACH,eAAO,MAAM,uBAAuB;;;;EAAuC,CAAC;AAE5E;;GAEG;AACH,eAAO,MAAM,uBAAuB;;;;;;EAMlC,CAAC;AAEH;;GAEG;AACH,eAAO,MAAM,qBAAqB;;;;;;;;;;;;;;;;;;;iBAWhC,CAAC;AAEH,MAAM,MAAM,eAAe,GAAG,CAAC,CAAC,KAAK,CAAC,OAAO,qBAAqB,CAAC,CAAC;AAMpE;;GAEG;AACH,MAAM,WAAW,iBAAiB;IAChC,yBAAyB;IACzB,WAAW,EAAE,MAAM,CAAC;IACpB,8BAA8B;IAC9B,OAAO,EAAE,MAAM,CAAC;IAChB,+BAA+B;IAC/B,QAAQ,EAAE,MAAM,CAAC;IACjB,4BAA4B;IAC5B,UAAU,EAAE,IAAI,CAAC;IACjB,6BAA6B;IAC7B,IAAI,EAAE,CAAC,KAAK,EAAE,UAAU,KAAK,OAAO,CAAC,IAAI,CAAC,CAAC;CAC5C;AAMD;;;;;;;;;GASG;AACH,wBAAsB,sBAAsB,CAC1C,GAAG,EAAE,iBAAiB,EACtB,MAAM,EAAE,OAAO,EAAE,GAChB,OAAO,CAAC;IAAE,MAAM,EAAE,OAAO,CAAC;IAAC,MAAM,EAAE,eAAe,EAAE,CAAA;CAAE,CAAC,CA+BzD;AAED;;;;;;;GAOG;AACH,wBAAsB,WAAW,CAC/B,GAAG,EAAE,iBAAiB,EACtB,KAAK,EAAE,eAAe,GACrB,OAAO,CAAC,IAAI,CAAC,CAWf"}
@@ -0,0 +1,7 @@
1
+ /**
2
+ * Validators - Event and schema validation utilities
3
+ *
4
+ * @module validators
5
+ */
6
+ export * from "./schema-validator.js";
7
+ //# sourceMappingURL=index.d.ts.map
@@ -0,0 +1 @@
1
+ {"version":3,"file":"index.d.ts","sourceRoot":"","sources":["../../src/validators/index.ts"],"names":[],"mappings":"AAAA;;;;GAIG;AAEH,cAAc,uBAAuB,CAAC"}
@@ -0,0 +1,58 @@
1
+ /**
2
+ * Event Schema Validator
3
+ *
4
+ * Validates emitted events against their Zod schemas.
5
+ * Catches:
6
+ * - Type mismatches
7
+ * - Missing required fields
8
+ * - Undefined values that could break UI rendering
9
+ * - Schema violations
10
+ *
11
+ * Used by:
12
+ * - Swarm event emission (validateEvent before emit)
13
+ * - Post-run validation (validateSwarmEvents for all events)
14
+ * - Debug tooling (identify schema drift)
15
+ */
16
+ import type { ZodError } from "zod";
17
+ export interface ValidationIssue {
18
+ severity: "error" | "warning";
19
+ category: "schema_mismatch" | "undefined_value" | "missing_field" | "type_error";
20
+ message: string;
21
+ location?: {
22
+ event_type?: string;
23
+ field?: string;
24
+ };
25
+ zodError?: ZodError;
26
+ }
27
+ export interface SchemaValidationResult {
28
+ valid: boolean;
29
+ issues: ValidationIssue[];
30
+ }
31
+ /**
32
+ * Validate a single event against its schema
33
+ *
34
+ * Usage:
35
+ * ```typescript
36
+ * const result = validateEvent(event);
37
+ * if (!result.valid) {
38
+ * console.error("Schema validation failed:", result.issues);
39
+ * }
40
+ * ```
41
+ */
42
+ export declare function validateEvent(event: unknown): SchemaValidationResult;
43
+ /**
44
+ * Validate all events from a swarm run
45
+ *
46
+ * Usage:
47
+ * ```typescript
48
+ * const { passed, issueCount } = await validateSwarmEvents(events);
49
+ * if (!passed) {
50
+ * console.error(`Found ${issueCount} validation issues`);
51
+ * }
52
+ * ```
53
+ */
54
+ export declare function validateSwarmEvents(events: unknown[]): Promise<{
55
+ passed: boolean;
56
+ issueCount: number;
57
+ }>;
58
+ //# sourceMappingURL=schema-validator.d.ts.map
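The doc comment lists the failure classes the validator is meant to catch: type mismatches, missing required fields, and undefined values. A minimal sketch of how those classes map onto the declared `category` values follows. It uses a hand-written field table instead of Zod schemas, so `REQUIRED_FIELDS`, `validateEventSketch`, and the choice to treat any issue as invalid are all assumptions made for illustration, not the package's behavior.

```typescript
interface SketchIssue {
  severity: "error" | "warning";
  category: "schema_mismatch" | "undefined_value" | "missing_field" | "type_error";
  message: string;
  location?: { event_type?: string; field?: string };
}

interface SketchResult {
  valid: boolean;
  issues: SketchIssue[];
}

// Hypothetical schema table: required field name -> expected `typeof` result.
const REQUIRED_FIELDS: Record<string, "string" | "number"> = {
  type: "string",
  project_key: "string",
  timestamp: "number",
};

function validateEventSketch(event: unknown): SketchResult {
  const issues: SketchIssue[] = [];
  if (typeof event !== "object" || event === null) {
    issues.push({ severity: "error", category: "type_error", message: "event is not an object" });
    return { valid: false, issues };
  }

  const record = event as Record<string, unknown>;
  const eventType = typeof record.type === "string" ? record.type : undefined;
  // Attach the event type to the location only when it is known.
  const at = (field: string) =>
    eventType === undefined ? { field } : { event_type: eventType, field };

  for (const [field, expected] of Object.entries(REQUIRED_FIELDS)) {
    if (!(field in record)) {
      issues.push({
        severity: "error",
        category: "missing_field",
        message: `missing "${field}"`,
        location: at(field),
      });
    } else if (record[field] === undefined) {
      // Present but undefined: the case the doc comment says can break UI rendering.
      issues.push({
        severity: "warning",
        category: "undefined_value",
        message: `"${field}" is undefined`,
        location: at(field),
      });
    } else if (typeof record[field] !== expected) {
      issues.push({
        severity: "error",
        category: "type_error",
        message: `"${field}" should be a ${expected}`,
        location: at(field),
      });
    }
  }

  // Assumption: any issue, including a warning, marks the event invalid.
  return { valid: issues.length === 0, issues };
}
```

Distinguishing a missing key from a key set to `undefined` matters here because the two serialize identically to JSON yet are reported under different categories.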
@@ -0,0 +1 @@
1
+ {"version":3,"file":"schema-validator.d.ts","sourceRoot":"","sources":["../../src/validators/schema-validator.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;GAcG;AAGH,OAAO,KAAK,EAAE,QAAQ,EAAE,MAAM,KAAK,CAAC;AAEpC,MAAM,WAAW,eAAe;IAC9B,QAAQ,EAAE,OAAO,GAAG,SAAS,CAAC;IAC9B,QAAQ,EACJ,iBAAiB,GACjB,iBAAiB,GACjB,eAAe,GACf,YAAY,CAAC;IACjB,OAAO,EAAE,MAAM,CAAC;IAChB,QAAQ,CAAC,EAAE;QACT,UAAU,CAAC,EAAE,MAAM,CAAC;QACpB,KAAK,CAAC,EAAE,MAAM,CAAC;KAChB,CAAC;IACF,QAAQ,CAAC,EAAE,QAAQ,CAAC;CACrB;AAED,MAAM,WAAW,sBAAsB;IACrC,KAAK,EAAE,OAAO,CAAC;IACf,MAAM,EAAE,eAAe,EAAE,CAAC;CAC3B;AAED;;;;;;;;;;GAUG;AACH,wBAAgB,aAAa,CAAC,KAAK,EAAE,OAAO,GAAG,sBAAsB,CAkDpE;AAgCD;;;;;;;;;;GAUG;AACH,wBAAsB,mBAAmB,CACvC,MAAM,EAAE,OAAO,EAAE,GAChB,OAAO,CAAC;IAAE,MAAM,EAAE,OAAO,CAAC;IAAC,UAAU,EAAE,MAAM,CAAA;CAAE,CAAC,CAWlD"}
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "opencode-swarm-plugin",
3
- "version": "0.44.0",
3
+ "version": "0.44.2",
4
4
  "description": "Multi-agent swarm coordination for OpenCode with learning capabilities, beads integration, and Agent Mail",
5
5
  "type": "module",
6
6
  "main": "./dist/index.js",
@@ -16,14 +16,27 @@
16
16
  "./plugin": {
17
17
  "import": "./dist/plugin.js",
18
18
  "types": "./dist/plugin.d.ts"
19
+ },
20
+ "./eval-capture": {
21
+ "import": "./dist/eval-capture.js",
22
+ "types": "./dist/eval-capture.d.ts"
23
+ },
24
+ "./compaction-prompt-scoring": {
25
+ "import": "./dist/compaction-prompt-scoring.js",
26
+ "types": "./dist/compaction-prompt-scoring.d.ts"
19
27
  }
20
28
  },
29
+ "files": [
30
+ "dist",
31
+ "bin",
32
+ "README.md"
33
+ ],
21
34
  "publishConfig": {
22
35
  "access": "public",
23
36
  "registry": "https://registry.npmjs.org/"
24
37
  },
25
38
  "scripts": {
26
- "build": "bun build ./src/index.ts --outdir ./dist --target node --external @electric-sql/pglite --external swarm-mail --external vitest --external @vitest/ui --external lightningcss && bun build ./src/plugin.ts --outfile ./dist/plugin.js --target node --external @electric-sql/pglite --external swarm-mail --external vitest --external @vitest/ui --external lightningcss && tsc",
39
+ "build": "bun run scripts/build.ts",
27
40
  "dev": "bun --watch src/index.ts",
28
41
  "test": "bun test --timeout 10000 src/anti-patterns.test.ts src/mandate-promotion.test.ts src/mandate-storage.test.ts src/output-guardrails.test.ts src/pattern-maturity.test.ts src/skills.test.ts src/structured.test.ts src/schemas/",
29
42
  "test:integration": "bun test --timeout 60000 src/*.integration.test.ts",
@@ -56,11 +69,10 @@
56
69
  "@types/minimatch": "^6.0.0",
57
70
  "ai": "6.0.0-beta.150",
58
71
  "bun-types": "^1.3.4",
59
- "evalite": "^1.0.0-beta.10",
72
+ "evalite": "^0.19.0",
60
73
  "pino-pretty": "^13.1.3",
61
74
  "turbo": "^2.6.3",
62
- "typescript": "^5.7.0",
63
- "vitest": "^4.0.15"
75
+ "typescript": "^5.7.0"
64
76
  },
65
77
  "peerDependencies": {
66
78
  "@opencode-ai/plugin": "^1.0.0"
@@ -1,63 +0,0 @@
1
- ---
2
- "opencode-swarm-plugin": minor
3
- ---
4
-
5
- ## 🧠 Swarm Insights: Data-Driven Decomposition
6
-
7
- > "It should allow the learner both to reflect on the quality of found solutions so that more effective cognitive schemata can be induced (including discriminations and generalizations) or further elaborated."
8
- >
9
- > — *Training Complex Cognitive Skills: A Four-Component Instructional Design Model for Technical Training*
10
-
11
- **What changed:**
12
-
13
- New data layer (`swarm-insights.ts`) aggregates learnings from swarm coordination events to inform future decompositions. Coordinators and workers now get concise, context-efficient summaries injected into their prompts.
14
-
15
- **Key exports:**
16
-
17
- - `getStrategyInsights(swarmMail, task)` - Strategy success rates and recommendations
18
- - Queries `subtask_outcome` events, calculates win/loss ratios
19
- - Returns: `{ strategy, successRate, totalAttempts, recommendation }`
20
- - Powers coordinator strategy selection with empirical data
21
-
22
- - `getFileInsights(swarmMail, files)` - File-specific gotchas from past failures
23
- - Identifies files with high failure rates
24
- - Returns: `{ file, failureCount, lastFailure, gotchas[] }`
25
- - Workers see warnings about tricky files before touching them
26
-
27
- - `getPatternInsights(swarmMail)` - Common failure patterns and anti-patterns
28
- - Detects recurring error types (type_error, timeout, conflict, test_failure)
29
- - Returns: `{ pattern, frequency, recommendation }`
30
- - Surfaces systemic issues for proactive prevention
31
-
32
- - `formatInsightsForPrompt(bundle, options)` - Context-aware formatting
33
- - Token budget enforcement (default 500 tokens, ~2000 chars)
34
- - Prioritizes top 3 strategies, 5 files, 3 patterns
35
- - Clean markdown output for prompt injection
36
-
37
- - `getCachedInsights(swarmMail, cacheKey, computeFn)` - 5-minute TTL caching
38
- - Prevents redundant queries during active swarms
39
- - Transparent cache miss fallback
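The caching behavior described for `getCachedInsights` (a 5-minute TTL with a fall-through to `computeFn` on a miss) can be sketched as below. This is an illustration under stated assumptions, not the removed `swarm-insights.ts` source: the class name `InsightsCacheSketch` and the injectable clock are hypothetical, and the sketch is synchronous for brevity where the real function may well be asynchronous.

```typescript
const FIVE_MINUTES_MS = 5 * 60 * 1000;

interface CacheEntry<T> {
  value: T;
  expiresAt: number;
}

class InsightsCacheSketch<T> {
  private readonly entries = new Map<string, CacheEntry<T>>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // The clock is injectable so expiry can be exercised without waiting.
  constructor(ttlMs: number = FIVE_MINUTES_MS, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(cacheKey: string, computeFn: () => T): T {
    const hit = this.entries.get(cacheKey);
    if (hit !== undefined && this.now() < hit.expiresAt) {
      return hit.value; // fresh entry: skip the query
    }
    // Miss or expired entry: recompute and store with a new expiry.
    const value = computeFn();
    this.entries.set(cacheKey, { value, expiresAt: this.now() + this.ttlMs });
    return value;
  }
}
```

A caller never needs to know whether a value came from the cache, which is one reading of the "transparent cache miss fallback" bullet.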
40
-
41
- **Why it matters:**
42
-
43
- Before this, coordinators decomposed tasks blind to past failures. "Split by file type" might have failed 8 times, but the coordinator would try it again. Workers would touch `auth/tokens.ts` without knowing it caused 3 prior failures.
44
-
45
- Now:
46
- - **Better decomposition**: Coordinator prompts show strategy success rates (e.g., "file-based: 85% success, feature-based: 40% - avoid")
47
- - **Fewer repeated mistakes**: Workers see file-specific warnings before editing
48
- - **Compounding learning**: Each swarm completion feeds the insights engine, improving future decompositions
49
- - **Context-efficient**: Hard token caps prevent insights from dominating prompt budgets
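To make the success-rate and token-cap points concrete, here is a minimal sketch of how win/loss tallies could become the prompt lines quoted above. The return shape follows the `{ strategy, successRate, totalAttempts, recommendation }` fields listed under key exports; everything else is assumed for illustration, including the `Outcome` type, the 50% "avoid" threshold, and the 4-characters-per-token conversion (taken from "500 tokens, ~2000 chars").

```typescript
interface Outcome {
  strategy: string;
  success: boolean;
}

interface StrategyInsight {
  strategy: string;
  successRate: number;
  totalAttempts: number;
  recommendation: string;
}

// Hypothetical rule: strategies below 50% success are marked "avoid".
function summarizeStrategies(outcomes: Outcome[]): StrategyInsight[] {
  const tally = new Map<string, { wins: number; total: number }>();
  for (const { strategy, success } of outcomes) {
    const entry = tally.get(strategy) ?? { wins: 0, total: 0 };
    entry.total += 1;
    if (success) entry.wins += 1;
    tally.set(strategy, entry);
  }
  return Array.from(tally.entries())
    .map(([strategy, { wins, total }]) => {
      const successRate = wins / total;
      return {
        strategy,
        successRate,
        totalAttempts: total,
        recommendation: successRate < 0.5 ? "avoid" : "prefer",
      };
    })
    .sort((a, b) => b.successRate - a.successRate);
}

// Keeps the top 3 strategies and truncates to the character budget.
function formatStrategiesForPrompt(insights: StrategyInsight[], maxTokens = 500): string {
  const maxChars = maxTokens * 4; // assumed ~4 characters per token
  const lines = insights
    .slice(0, 3)
    .map(
      (i) =>
        `- ${i.strategy}: ${Math.round(i.successRate * 100)}% success ` +
        `(${i.totalAttempts} attempts) - ${i.recommendation}`,
    );
  const text = ["## Strategy insights", ...lines].join("\n");
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}
```

Truncating by characters is a blunt cap; a real formatter would more likely drop whole entries so that no line is cut in half.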
50
-
51
- The swarm now learns from its mistakes, not just records them.
52
-
53
- **Data sources:**
54
- - Event store: `subtask_outcome`, `eval_finalized` events
55
- - Semantic memory: File-specific learnings (TODO: full integration)
56
- - Anti-pattern registry: Detection and inversion rules
57
-
58
- **Integration points:**
59
- - Coordinator prompts: Inject strategy insights during decomposition
60
- - Worker prompts: Inject file insights when subtasks are spawned
61
- - Learning layer: Confidence decay, pattern maturity, implicit feedback scoring
62
-
63
- This is the foundation for adaptive swarm intelligence - decomposition that gets smarter with every task completed.
@@ -1,331 +0,0 @@
1
- # Eval Failure Analysis Report
2
- **Date:** 2025-12-25
3
- **Analyst:** BrightStar
4
- **Cell:** opencode-swarm-plugin--ys7z8-mjlk7jsl4tt
5
- **Epic:** opencode-swarm-plugin--ys7z8-mjlk7js9bt1
6
-
7
- ## Executive Summary
8
-
9
- Two eval failures analyzed:
10
- - **example.eval.ts**: 0% score - structural bug in eval setup
11
- - **compaction-prompt.eval.ts**: 53% score - case sensitivity + missing forbidden tools
12
-
13
- Both are fixable with code changes. No test data quality issues.
14
-
15
- ---
16
-
17
- ## example.eval.ts - 0% Score
18
-
19
- ### Status
20
- ❌ **CRITICAL** - Complete failure (0%)
21
-
22
- ### Root Cause
23
- **Eval structure mismatch** between data provider and task function.
24
-
25
- ### Technical Details
26
-
27
- **File:** `evals/example.eval.ts`
28
- **Lines:** 14-30
29
-
30
- The eval has a fundamental flow error:
31
-
32
- ```typescript
33
- // Line 14-26: data() provides BOTH input AND expected output
34
- data: async () => {
35
- return [
36
- {
37
- input: "Test task", // ← String for task function
38
- output: JSON.stringify({ // ← Expected output (ignored!)
39
- epic: { title: "Test Epic", ... },
40
- subtasks: [...]
41
- }),
42
- },
43
- ];
44
- },
45
-
46
- // Line 28-30: task() does passthrough
47
- task: async (input) => {
48
- return input; // ← Returns "Test task" string, NOT the CellTree
49
- },
50
-
51
- // Line 31: Scorer expects CellTree JSON
52
- scorers: [subtaskIndependence],
53
- ```
54
-
55
- **What happens:**
56
- 1. Evalite passes `input` ("Test task") to task function
57
- 2. Task returns "Test task" string unchanged
58
- 3. Scorer `subtaskIndependence` receives "Test task"
59
- 4. Scorer tries to parse as CellTree JSON → **FAILS**
60
- 5. Score: 0%
61
-
62
- The `output` field in `data()` is ignored by Evalite - it's the `task()` return value that gets scored.
63
-
64
- ### Impact
65
- - Example eval is useless for validation
66
- - False signal that scorer infrastructure is broken (it's not)
67
- - Wastes CI time
68
-
69
- ### Proposed Fix
70
-
71
- **Option 1: Remove output from data (recommended)**
72
- ```typescript
73
- data: async () => {
74
- return [
75
- {
76
- input: {
77
- epic: { title: "Test Epic", description: "Test" },
78
- subtasks: [
79
- { title: "Subtask 1", files: ["a.ts"], estimated_complexity: 1 },
80
- { title: "Subtask 2", files: ["b.ts"], estimated_complexity: 1 },
81
- ],
82
- },
83
- },
84
- ];
85
- },
86
-
87
- task: async (input) => {
88
- return JSON.stringify(input); // Stringify the CellTree
89
- },
90
- ```
91
-
92
- **Option 2: Fix task to use output**
93
- ```typescript
94
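- // Keep data() as-is, but fix task (assumes the runner passes the data row as a second `context` argument; verify against the Evalite version in use):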
95
- task: async (input, context) => {
96
- return context.expected.output; // Use the output from data()
97
- },
98
- ```
99
-
100
- Option 1 is cleaner - task functions should generate output, not just pass through.
101
-
102
- ---
103
-
104
- ## compaction-prompt.eval.ts - 53% Score
105
-
106
- ### Status
107
- ⚠️ **DEGRADED** - Below target (53% vs 100% historical)
108
-
109
- ### Root Causes
110
-
111
- #### RC1: Case-Sensitive Forbidden Tool Patterns (15% weight)
112
-
113
- **File:** `src/compaction-prompt-scoring.ts`
114
- **Lines:** 213-218
115
-
116
- ```typescript
117
- const forbiddenTools = [
118
- /\bEdit\b/, // ← Requires capital E
119
- /\bWrite\b/, // ← Requires capital W
120
- /swarmmail_reserve/,
121
- /git commit/,
122
- ];
123
- ```
124
-
125
- **File:** `evals/fixtures/compaction-prompt-cases.ts`
126
- **Lines:** 76-83 (perfect fixture)
127
-
128
- ```
129
- - edit // ← lowercase e
130
- - write // ← lowercase w
131
- - bash (for file modifications)
132
- ```
133
-
134
- **Evidence:**
135
- ```javascript
136
- /\bEdit\b/.test("- Edit") // ✅ true
137
- /\bEdit\b/.test("- edit") // ❌ false (case mismatch; the word boundary itself matches)
138
- ```
139
-
140
- **Impact:**
141
- - Perfect fixture: 0/4 forbidden tools matched
142
- - Forbidden tools scorer: 0% (should be 75-100%)
143
- - Overall impact: 15% of total score lost
144
-
145
- #### RC2: Missing Forbidden Tools (15% weight)
146
-
147
- Scorer expects **4 tools**:
148
- 1. Edit (or edit)
149
- 2. Write (or write)
150
- 3. swarmmail_reserve
151
- 4. git commit
152
-
153
- Perfect fixture has **3 tools** (and case mismatch):
154
- 1. edit ❌ (lowercase)
155
- 2. write ❌ (lowercase)
156
- 3. bash ❌ (not in scorer's list)
157
-
158
- Missing: swarmmail_reserve, git commit
159
-
160
- **Impact:**
161
- - Even if case fixed, still only 2/4 tools = 50% on this scorer
162
- - Weighted: 50% × 15% = 7.5% contribution (should be 15%)
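
The RC1 and RC2 numbers can be checked with a short sketch. It assumes the scorer's result is the fraction of patterns found in the prompt, which is how the "0/4" and "2/4" figures above read. `matchedFraction` is a hypothetical helper, not a function from `compaction-prompt-scoring.ts`.

```typescript
// Forbidden-tool lines of the perfect fixture, as quoted above.
const fixtureText = [
  "- edit",
  "- write",
  "- bash (for file modifications)",
].join("\n");

// Current scorer patterns, as quoted above.
const strictPatterns: RegExp[] = [
  /\bEdit\b/,
  /\bWrite\b/,
  /swarmmail_reserve/,
  /git commit/,
];

// The same four patterns with only the `i` flag added (no new tools).
const relaxedPatterns: RegExp[] = strictPatterns.map(
  (pattern) => new RegExp(pattern.source, "i"),
);

// Hypothetical helper: fraction of patterns that occur in the text.
function matchedFraction(text: string, patterns: RegExp[]): number {
  const hits = patterns.filter((pattern) => pattern.test(text)).length;
  return hits / patterns.length;
}

// Weight of the forbidden-tools scorer in percent, per the RC headings.
const FORBIDDEN_TOOLS_WEIGHT = 15;

const strictFraction = matchedFraction(fixtureText, strictPatterns);
const relaxedFraction = matchedFraction(fixtureText, relaxedPatterns);
const strictContribution = strictFraction * FORBIDDEN_TOOLS_WEIGHT;
const relaxedContribution = relaxedFraction * FORBIDDEN_TOOLS_WEIGHT;

console.log({ strictContribution, relaxedContribution });
```

With the case fix alone the fixture recovers half of the scorer's weight. The other half depends on the fixture also listing `swarmmail_reserve` and `git commit`.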
163
-
164
- #### RC3: "bash" Not in Scorer's List
165
-
166
- Fixtures mention "bash (for file modifications)" as forbidden, but scorer doesn't check for it.
167
- This leaves the fixture and the scorer with no tools in common:
168
- - Fixture lists: edit, write, bash
169
- - Scorer checks: Edit, Write, swarmmail_reserve, git commit
170
- - Overlap: 0 tools (due to case)
171
-
172
- ### Score Breakdown - Perfect Fixture
173
-
174
- Expected (if 100%):
175
- ```
176
- epicIdSpecificity: 20% × 1.0 = 20%
177
- actionability: 20% × 1.0 = 20%
178
- coordinatorIdentity: 25% × 1.0 = 25%
179
- forbiddenToolsPresent: 15% × 1.0 = 15%
180
- postCompactionDiscipline: 20% × 1.0 = 20%
181
- ─────
182
- TOTAL: 100%
183
- ```
184
-
185
- Actual (current):
186
- ```
187
- epicIdSpecificity: 20% × 1.0 = 20% ✅
188
- actionability: 20% × 1.0 = 20% ✅
189
- coordinatorIdentity: 25% × 1.0 = 25% ✅
190
- forbiddenToolsPresent: 15% × 0.0 = 0% ❌ (0/4 matched)
191
- postCompactionDiscipline: 20% × 1.0 = 20% ✅
192
- ─────
193
- TOTAL: 85%
194
- ```
195
-
196
- Perfect fixture alone should score 85%, but overall eval is 53%.
197
- This means the 5 "bad" fixtures are pulling average down further (expected behavior).
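
The two breakdowns above are a plain weighted sum. The sketch below reproduces them, assuming each scorer returns a value between 0 and 1 and that the weights are the percentages listed. `ScorerResult` and `weightedTotal` are illustrative names, not part of the eval code.

```typescript
interface ScorerResult {
  name: string;
  weightPercent: number; // share of the total score, in percent
  score: number; // scorer result in the range 0..1
}

// Per-scorer results for the perfect fixture, as listed under "Actual (current)".
const perfectFixtureResults: ScorerResult[] = [
  { name: "epicIdSpecificity", weightPercent: 20, score: 1.0 },
  { name: "actionability", weightPercent: 20, score: 1.0 },
  { name: "coordinatorIdentity", weightPercent: 25, score: 1.0 },
  { name: "forbiddenToolsPresent", weightPercent: 15, score: 0.0 },
  { name: "postCompactionDiscipline", weightPercent: 20, score: 1.0 },
];

// Sum of weight × score over all scorers, in percent.
function weightedTotal(results: ScorerResult[]): number {
  return results.reduce(
    (sum, result) => sum + result.weightPercent * result.score,
    0,
  );
}

const currentTotal = weightedTotal(perfectFixtureResults);

// Same fixture with the forbidden-tools scorer fully satisfied.
const fixedTotal = weightedTotal(
  perfectFixtureResults.map((result) =>
    result.name === "forbiddenToolsPresent" ? { ...result, score: 1.0 } : result,
  ),
);

console.log({ currentTotal, fixedTotal });
```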
198
-
199
- ### Historical Context
200
-
201
- Semantic memory claims 100% score previously. Likely scenarios:
202
- 1. **Never actually ran** - aspiration documented before implementation
203
- 2. **Ran with different fixtures** - fixtures were updated after scorer was written
204
- 3. **Scorer was case-insensitive before** - regression in recent commit aa12943
205
-
206
- Commit aa12943 (2025-12-24) added the eval infrastructure. This is brand new code, which makes a regression (scenario 3) unlikely.
207
-
208
- ### Proposed Fixes
209
-
210
- #### Fix 1: Make Scorer Case-Insensitive (Recommended)
211
-
212
- **File:** `src/compaction-prompt-scoring.ts`
213
- **Lines:** 213-218
214
-
215
- ```typescript
216
- const forbiddenTools = [
217
- /\bedit\b/i, // Case insensitive with 'i' flag
218
- /\bwrite\b/i, // Case insensitive
219
- /\bbash\b/i, // Add bash (was missing)
220
- /swarmmail_reserve/i, // Keep, add 'i' for safety
221
- /git commit/i, // Keep, add 'i' for safety
222
- ];
223
- ```
224
-
225
- **Rationale:**
226
- - Coordinators might capitalize differently in prompts
227
- - Real prompts won't always match exact case
228
- - More robust matching
229
-
230
- #### Fix 2: Update Fixtures to Match Scorer (Alternative)
231
-
232
- **File:** `evals/fixtures/compaction-prompt-cases.ts`
233
- **Lines:** 76-83 (and all other fixtures)
234
-
235
- ```
236
- - Edit // Capital E
237
- - Write // Capital W
238
- - bash (for file modifications) // Keep or remove
239
- - swarmmail_reserve // ADD
240
- - git commit // ADD
241
- ```
242
-
243
- **Rationale:**
244
- - Keeps scorer strict (may catch real case issues)
245
- - Makes fixtures comprehensive (all 5 tools)
246
- - More explicit about what's forbidden
247
-
248
- #### Fix 3: Hybrid (Best of Both)
249
-
250
- 1. Make scorer case-insensitive (Fix 1)
251
- 2. Update fixtures to include all 5 tools (Fix 2)
252
- 3. Remove "bash" from fixtures if not in coordinator forbidden list
253
-
254
- ```typescript
255
- // Scorer (5 tools, case-insensitive):
256
- const forbiddenTools = [
257
- /\bedit\b/i,
258
- /\bwrite\b/i,
259
- /swarmmail_reserve/i,
260
- /git\s+commit/i,
261
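- /\bread\b/i, // Consider adding - coordinators shouldn't read, should check status (the fixture below would then need to list read too, or the perfect fixture may score only 4/5 on this scorer)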
262
- ];
263
- ```
264
-
265
- ```
266
- // Fixture:
267
- - Edit
268
- - Write
269
- - swarmmail_reserve (only workers reserve files)
270
- - git commit (workers commit their changes)
271
- ```
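
Before adopting the Fix 3 patterns, it may help to check how they behave on a few sample strings. The sample strings below are invented for illustration and do not come from the fixtures.

```typescript
const editPattern = /\bedit\b/i;
const gitCommitLiteral = /git commit/i; // single literal space
const gitCommitFlexible = /git\s+commit/i; // any run of whitespace
const readPattern = /\bread\b/i;

const checks = {
  // A list item naming the tool matches regardless of case.
  editListItem: editPattern.test("- Edit"),
  // The word boundaries keep "edited" from counting as "edit".
  editInsideWord: editPattern.test("edited files"),
  // Two spaces defeat the literal-space pattern but not \s+.
  literalDoubleSpace: gitCommitLiteral.test("git  commit"),
  flexibleDoubleSpace: gitCommitFlexible.test("git  commit"),
  // Ordinary prose also matches, even when "read" is not listed as forbidden.
  readInProse: readPattern.test("Read the epic status before acting"),
};

console.log(checks);
```

The last check is the one to watch: a pattern for a common word such as `read` can match prose that never lists it as a forbidden tool, so it may need a narrower pattern than the other entries.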
272
-
273
- ### Risk Assessment
274
-
275
- **If we fix this, will scores jump to 100%?**
276
-
277
- **Perfect fixture:** 85% → 100% (if all 4 tools matched)
278
- **Other fixtures:** Depends on their issues
279
-
280
- Looking at fixture expected values:
281
- - Fixture 0 (perfect): Should be 100%
282
- - Fixture 1 (placeholder): Should fail (expected)
283
- - Fixture 2 (generic): Should fail (expected)
284
- - Fixture 3 (weak identity): Should partially fail (expected)
285
- - Fixture 4 (missing forbidden): Should fail on forbidden tools only
286
- - Fixture 5 (wrong first tool): Should fail on discipline only
287
-
288
- Average across 6 fixtures: ~66% expected (not 100%)
289
-
290
- **So 53% → ~70-80%** is realistic after fixes (not 100%).
291
-
292
- Higher scores would require changing the bad fixtures too, but those are SUPPOSED to fail.
293
- The scorer is working correctly on those.
294
-
295
- ---
296
-
297
- ## Recommendations
298
-
299
- ### Immediate Actions (P0)
300
-
301
- 1. **Fix example.eval.ts structure** - 5 min fix, unblocks that eval
302
- 2. **Make forbidden tools case-insensitive** - 5 min fix, at most +15 points per fixture (this scorer's full weight)
303
- 3. **Add missing tools to fixtures** - 10 min, comprehensive coverage
304
-
305
- ### Medium-term Actions (P1)
306
-
307
- 4. **Verify 100% claim in semantic memory** - Check if historical data exists
308
- 5. **Document scorer expectations** - Add comments to fixtures explaining weights
309
- 6. **Add unit tests for scorers** - Test edge cases independently
310
-
311
- ### Long-term Actions (P2)
312
-
313
- 7. **Consider LLM-as-judge for semantic checks** - Case-insensitive by nature
314
- 8. **Add visual diff in eval output** - Show what's missing from prompts
315
- 9. **Create eval dashboard** - Track scores over time, detect regressions
316
-
317
- ---
318
-
319
- ## Conclusion
320
-
321
- Both evals have **code bugs, not test data issues**:
322
- - example.eval.ts: Structural bug (task/data mismatch)
323
- - compaction-prompt.eval.ts: Case sensitivity + incomplete tool list
324
-
325
- Fixes are straightforward and low-risk. After fixes, expect:
326
- - example.eval.ts: 0% → 100%
327
- - compaction-prompt.eval.ts: 53% → 70-80%
328
-
329
- The 100% historical score in semantic memory is likely aspirational - these evals are brand new (commit aa12943, Dec 24).
330
-
331
- **Ready to implement fixes or escalate for review?**