opencode-swarm-plugin 0.44.0 → 0.44.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/bin/swarm.serve.test.ts +6 -4
- package/bin/swarm.ts +16 -10
- package/dist/compaction-prompt-scoring.js +139 -0
- package/dist/eval-capture.js +12811 -0
- package/dist/hive.d.ts.map +1 -1
- package/dist/index.js +7644 -62599
- package/dist/plugin.js +23766 -78721
- package/dist/swarm-orchestrate.d.ts.map +1 -1
- package/dist/swarm-prompts.d.ts.map +1 -1
- package/dist/swarm-review.d.ts.map +1 -1
- package/package.json +17 -5
- package/.changeset/swarm-insights-data-layer.md +0 -63
- package/.hive/analysis/eval-failure-analysis-2025-12-25.md +0 -331
- package/.hive/analysis/session-data-quality-audit.md +0 -320
- package/.hive/eval-results.json +0 -483
- package/.hive/issues.jsonl +0 -138
- package/.hive/memories.jsonl +0 -729
- package/.opencode/eval-history.jsonl +0 -327
- package/.turbo/turbo-build.log +0 -9
- package/CHANGELOG.md +0 -2286
- package/SCORER-ANALYSIS.md +0 -598
- package/docs/analysis/subagent-coordination-patterns.md +0 -902
- package/docs/analysis-socratic-planner-pattern.md +0 -504
- package/docs/planning/ADR-001-monorepo-structure.md +0 -171
- package/docs/planning/ADR-002-package-extraction.md +0 -393
- package/docs/planning/ADR-003-performance-improvements.md +0 -451
- package/docs/planning/ADR-004-message-queue-features.md +0 -187
- package/docs/planning/ADR-005-devtools-observability.md +0 -202
- package/docs/planning/ADR-007-swarm-enhancements-worktree-review.md +0 -168
- package/docs/planning/ADR-008-worker-handoff-protocol.md +0 -293
- package/docs/planning/ADR-009-oh-my-opencode-patterns.md +0 -353
- package/docs/planning/ADR-010-cass-inhousing.md +0 -1215
- package/docs/planning/ROADMAP.md +0 -368
- package/docs/semantic-memory-cli-syntax.md +0 -123
- package/docs/swarm-mail-architecture.md +0 -1147
- package/docs/testing/context-recovery-test.md +0 -470
- package/evals/ARCHITECTURE.md +0 -1189
- package/evals/README.md +0 -768
- package/evals/compaction-prompt.eval.ts +0 -149
- package/evals/compaction-resumption.eval.ts +0 -289
- package/evals/coordinator-behavior.eval.ts +0 -307
- package/evals/coordinator-session.eval.ts +0 -154
- package/evals/evalite.config.ts.bak +0 -15
- package/evals/example.eval.ts +0 -31
- package/evals/fixtures/cass-baseline.ts +0 -217
- package/evals/fixtures/compaction-cases.ts +0 -350
- package/evals/fixtures/compaction-prompt-cases.ts +0 -311
- package/evals/fixtures/coordinator-sessions.ts +0 -328
- package/evals/fixtures/decomposition-cases.ts +0 -105
- package/evals/lib/compaction-loader.test.ts +0 -248
- package/evals/lib/compaction-loader.ts +0 -320
- package/evals/lib/data-loader.evalite-test.ts +0 -289
- package/evals/lib/data-loader.test.ts +0 -345
- package/evals/lib/data-loader.ts +0 -281
- package/evals/lib/llm.ts +0 -115
- package/evals/scorers/compaction-prompt-scorers.ts +0 -145
- package/evals/scorers/compaction-scorers.ts +0 -305
- package/evals/scorers/coordinator-discipline.evalite-test.ts +0 -539
- package/evals/scorers/coordinator-discipline.ts +0 -325
- package/evals/scorers/index.test.ts +0 -146
- package/evals/scorers/index.ts +0 -328
- package/evals/scorers/outcome-scorers.evalite-test.ts +0 -27
- package/evals/scorers/outcome-scorers.ts +0 -349
- package/evals/swarm-decomposition.eval.ts +0 -121
- package/examples/commands/swarm.md +0 -745
- package/examples/plugin-wrapper-template.ts +0 -2515
- package/examples/skills/hive-workflow/SKILL.md +0 -212
- package/examples/skills/skill-creator/SKILL.md +0 -223
- package/examples/skills/swarm-coordination/SKILL.md +0 -292
- package/global-skills/cli-builder/SKILL.md +0 -344
- package/global-skills/cli-builder/references/advanced-patterns.md +0 -244
- package/global-skills/learning-systems/SKILL.md +0 -644
- package/global-skills/skill-creator/LICENSE.txt +0 -202
- package/global-skills/skill-creator/SKILL.md +0 -352
- package/global-skills/skill-creator/references/output-patterns.md +0 -82
- package/global-skills/skill-creator/references/workflows.md +0 -28
- package/global-skills/swarm-coordination/SKILL.md +0 -995
- package/global-skills/swarm-coordination/references/coordinator-patterns.md +0 -235
- package/global-skills/swarm-coordination/references/strategies.md +0 -138
- package/global-skills/system-design/SKILL.md +0 -213
- package/global-skills/testing-patterns/SKILL.md +0 -430
- package/global-skills/testing-patterns/references/dependency-breaking-catalog.md +0 -586
- package/opencode-swarm-plugin-0.30.7.tgz +0 -0
- package/opencode-swarm-plugin-0.31.0.tgz +0 -0
- package/scripts/cleanup-test-memories.ts +0 -346
- package/scripts/init-skill.ts +0 -222
- package/scripts/migrate-unknown-sessions.ts +0 -349
- package/scripts/validate-skill.ts +0 -204
- package/src/agent-mail.ts +0 -1724
- package/src/anti-patterns.test.ts +0 -1167
- package/src/anti-patterns.ts +0 -448
- package/src/compaction-capture.integration.test.ts +0 -257
- package/src/compaction-hook.test.ts +0 -838
- package/src/compaction-hook.ts +0 -1204
- package/src/compaction-observability.integration.test.ts +0 -139
- package/src/compaction-observability.test.ts +0 -187
- package/src/compaction-observability.ts +0 -324
- package/src/compaction-prompt-scorers.test.ts +0 -475
- package/src/compaction-prompt-scoring.ts +0 -300
- package/src/contributor-tools.test.ts +0 -133
- package/src/contributor-tools.ts +0 -201
- package/src/dashboard.test.ts +0 -611
- package/src/dashboard.ts +0 -462
- package/src/error-enrichment.test.ts +0 -403
- package/src/error-enrichment.ts +0 -219
- package/src/eval-capture.test.ts +0 -1015
- package/src/eval-capture.ts +0 -929
- package/src/eval-gates.test.ts +0 -306
- package/src/eval-gates.ts +0 -218
- package/src/eval-history.test.ts +0 -508
- package/src/eval-history.ts +0 -214
- package/src/eval-learning.test.ts +0 -378
- package/src/eval-learning.ts +0 -360
- package/src/eval-runner.test.ts +0 -223
- package/src/eval-runner.ts +0 -402
- package/src/export-tools.test.ts +0 -476
- package/src/export-tools.ts +0 -257
- package/src/hive.integration.test.ts +0 -2241
- package/src/hive.ts +0 -1628
- package/src/index.ts +0 -940
- package/src/learning.integration.test.ts +0 -1815
- package/src/learning.ts +0 -1079
- package/src/logger.test.ts +0 -189
- package/src/logger.ts +0 -135
- package/src/mandate-promotion.test.ts +0 -473
- package/src/mandate-promotion.ts +0 -239
- package/src/mandate-storage.integration.test.ts +0 -601
- package/src/mandate-storage.test.ts +0 -578
- package/src/mandate-storage.ts +0 -794
- package/src/mandates.ts +0 -540
- package/src/memory-tools.test.ts +0 -195
- package/src/memory-tools.ts +0 -344
- package/src/memory.integration.test.ts +0 -334
- package/src/memory.test.ts +0 -158
- package/src/memory.ts +0 -527
- package/src/model-selection.test.ts +0 -188
- package/src/model-selection.ts +0 -68
- package/src/observability-tools.test.ts +0 -359
- package/src/observability-tools.ts +0 -871
- package/src/output-guardrails.test.ts +0 -438
- package/src/output-guardrails.ts +0 -381
- package/src/pattern-maturity.test.ts +0 -1160
- package/src/pattern-maturity.ts +0 -525
- package/src/planning-guardrails.test.ts +0 -491
- package/src/planning-guardrails.ts +0 -438
- package/src/plugin.ts +0 -23
- package/src/post-compaction-tracker.test.ts +0 -251
- package/src/post-compaction-tracker.ts +0 -237
- package/src/query-tools.test.ts +0 -636
- package/src/query-tools.ts +0 -324
- package/src/rate-limiter.integration.test.ts +0 -466
- package/src/rate-limiter.ts +0 -774
- package/src/replay-tools.test.ts +0 -496
- package/src/replay-tools.ts +0 -240
- package/src/repo-crawl.integration.test.ts +0 -441
- package/src/repo-crawl.ts +0 -610
- package/src/schemas/cell-events.test.ts +0 -347
- package/src/schemas/cell-events.ts +0 -807
- package/src/schemas/cell.ts +0 -257
- package/src/schemas/evaluation.ts +0 -166
- package/src/schemas/index.test.ts +0 -199
- package/src/schemas/index.ts +0 -286
- package/src/schemas/mandate.ts +0 -232
- package/src/schemas/swarm-context.ts +0 -115
- package/src/schemas/task.ts +0 -161
- package/src/schemas/worker-handoff.test.ts +0 -302
- package/src/schemas/worker-handoff.ts +0 -131
- package/src/sessions/agent-discovery.test.ts +0 -137
- package/src/sessions/agent-discovery.ts +0 -112
- package/src/sessions/index.ts +0 -15
- package/src/skills.integration.test.ts +0 -1192
- package/src/skills.test.ts +0 -643
- package/src/skills.ts +0 -1549
- package/src/storage.integration.test.ts +0 -341
- package/src/storage.ts +0 -884
- package/src/structured.integration.test.ts +0 -817
- package/src/structured.test.ts +0 -1046
- package/src/structured.ts +0 -762
- package/src/swarm-decompose.test.ts +0 -188
- package/src/swarm-decompose.ts +0 -1302
- package/src/swarm-deferred.integration.test.ts +0 -157
- package/src/swarm-deferred.test.ts +0 -38
- package/src/swarm-insights.test.ts +0 -214
- package/src/swarm-insights.ts +0 -459
- package/src/swarm-mail.integration.test.ts +0 -970
- package/src/swarm-mail.ts +0 -739
- package/src/swarm-orchestrate.integration.test.ts +0 -282
- package/src/swarm-orchestrate.test.ts +0 -548
- package/src/swarm-orchestrate.ts +0 -3084
- package/src/swarm-prompts.test.ts +0 -1270
- package/src/swarm-prompts.ts +0 -2077
- package/src/swarm-research.integration.test.ts +0 -701
- package/src/swarm-research.test.ts +0 -698
- package/src/swarm-research.ts +0 -472
- package/src/swarm-review.integration.test.ts +0 -285
- package/src/swarm-review.test.ts +0 -879
- package/src/swarm-review.ts +0 -709
- package/src/swarm-strategies.ts +0 -407
- package/src/swarm-worktree.test.ts +0 -501
- package/src/swarm-worktree.ts +0 -575
- package/src/swarm.integration.test.ts +0 -2377
- package/src/swarm.ts +0 -38
- package/src/tool-adapter.integration.test.ts +0 -1221
- package/src/tool-availability.ts +0 -461
- package/tsconfig.json +0 -28

package/dist/swarm-orchestrate.d.ts.map
CHANGED
@@ -1 +1 @@
-
{"version":3,"file":"swarm-orchestrate.d.ts","sourceRoot":"","sources":["../src/swarm-orchestrate.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;GAmBG;AAGH,OAAO,EAAE,CAAC,EAAE,MAAM,KAAK,CAAC;AAaxB,OAAO,EACL,KAAK,aAAa,EAEnB,MAAM,0BAA0B,CAAC;AAsDlC;;;;;;;;GAQG;AACH,wBAAgB,qBAAqB,CAAC,MAAM,EAAE;IAC5C,OAAO,EAAE,MAAM,CAAC;IAChB,WAAW,EAAE,MAAM,EAAE,CAAC;IACtB,cAAc,CAAC,EAAE,MAAM,EAAE,CAAC;IAC1B,sBAAsB,CAAC,EAAE,MAAM,EAAE,CAAC;IAClC,gBAAgB,CAAC,EAAE,MAAM,EAAE,CAAC;IAC5B,YAAY,EAAE,MAAM,CAAC;IACrB,SAAS,EAAE,MAAM,CAAC;IAClB,eAAe,CAAC,EAAE,MAAM,CAAC;IACzB,eAAe,CAAC,EAAE,MAAM,CAAC;CAC1B,GAAG,aAAa,CA4BhB;AAED;;;;;;;;;;;;;;;;;;;;;;;;GAwBG;AACH,wBAAgB,gBAAgB,CAC9B,aAAa,EAAE,MAAM,EAAE,EACvB,WAAW,EAAE,MAAM,EAAE,GACpB;IAAE,KAAK,EAAE,OAAO,CAAC;IAAC,UAAU,EAAE,MAAM,EAAE,CAAA;CAAE,CAqC1C;AAkaD;;;;;;;;;;GAUG;AACH,eAAO,MAAM,UAAU;;;;;;;;;;;;;CA8JrB,CAAC;AAEH;;;;GAIG;AACH,eAAO,MAAM,YAAY;;;;;;;;;;CAoFvB,CAAC;AAEH;;;;GAIG;AACH,eAAO,MAAM,cAAc;;;;;;;;;;;;;;;;;;;;;;;;;
+
{"version":3,"file":"swarm-orchestrate.d.ts","sourceRoot":"","sources":["../src/swarm-orchestrate.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;;;;;;GAmBG;AAGH,OAAO,EAAE,CAAC,EAAE,MAAM,KAAK,CAAC;AAaxB,OAAO,EACL,KAAK,aAAa,EAEnB,MAAM,0BAA0B,CAAC;AAsDlC;;;;;;;;GAQG;AACH,wBAAgB,qBAAqB,CAAC,MAAM,EAAE;IAC5C,OAAO,EAAE,MAAM,CAAC;IAChB,WAAW,EAAE,MAAM,EAAE,CAAC;IACtB,cAAc,CAAC,EAAE,MAAM,EAAE,CAAC;IAC1B,sBAAsB,CAAC,EAAE,MAAM,EAAE,CAAC;IAClC,gBAAgB,CAAC,EAAE,MAAM,EAAE,CAAC;IAC5B,YAAY,EAAE,MAAM,CAAC;IACrB,SAAS,EAAE,MAAM,CAAC;IAClB,eAAe,CAAC,EAAE,MAAM,CAAC;IACzB,eAAe,CAAC,EAAE,MAAM,CAAC;CAC1B,GAAG,aAAa,CA4BhB;AAED;;;;;;;;;;;;;;;;;;;;;;;;GAwBG;AACH,wBAAgB,gBAAgB,CAC9B,aAAa,EAAE,MAAM,EAAE,EACvB,WAAW,EAAE,MAAM,EAAE,GACpB;IAAE,KAAK,EAAE,OAAO,CAAC;IAAC,UAAU,EAAE,MAAM,EAAE,CAAA;CAAE,CAqC1C;AAkaD;;;;;;;;;;GAUG;AACH,eAAO,MAAM,UAAU;;;;;;;;;;;;;CA8JrB,CAAC;AAEH;;;;GAIG;AACH,eAAO,MAAM,YAAY;;;;;;;;;;CAoFvB,CAAC;AAEH;;;;GAIG;AACH,eAAO,MAAM,cAAc;;;;;;;;;;;;;;;;;;;;;;;;;CAoIzB,CAAC;AAEH;;;;;;;;GAQG;AACH,eAAO,MAAM,eAAe;;;;;;;;;;;;;;;;;;;;;;CA6E1B,CAAC;AAEH;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;GAuCG;AACH,eAAO,MAAM,cAAc;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAmwBzB,CAAC;AAEH;;;;;;;;;;;GAWG;AACH,eAAO,MAAM,oBAAoB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CA0K/B,CAAC;AAwBH;;;;;;;;;;;;;;GAcG;AACH,wBAAgB,gBAAgB,CAAC,IAAI,EAAE,MAAM,GAAG,MAAM,EAAE,CAUvD;AAED;;GAEG;AACH,MAAM,WAAW,wBAAwB;IACvC,uCAAuC;IACvC,WAAW,EAAE,MAAM,CAAC;IACpB,kCAAkC;IAClC,IAAI,EAAE,MAAM,CAAC;IACb,2CAA2C;IAC3C,MAAM,EAAE,MAAM,CAAC;IACf,mCAAmC;IACnC,aAAa,EAAE,kBAAkB,CAAC;CACnC;AAED;;GAEG;AACH,MAAM,WAAW,cAAc;IAC7B,6CAA6C;IAC7C,UAAU,EAAE,MAAM,EAAE,CAAC;IACrB,gDAAgD;IAChD,kBAAkB,EAAE,wBAAwB,EAAE,CAAC;IAC/C,yCAAyC;IACzC,SAAS,EAAE,MAAM,CAAC,MAAM,EAAE,MAAM,CAAC,CAAC;IAClC,mDAAmD;IACnD,UAAU,EAAE,MAAM,EAAE,CAAC;CACtB;AAED;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;GA0CG;AACH,wBAAsB,gBAAgB,CACpC,IAAI,EAAE,MAAM,EACZ,WAAW,EAAE,MAAM,EACnB,OAAO,CAAC,EAAE;IAAE,aAAa,CAAC,EAAE,OAAO,CAAA;CAAE,GACpC,OAAO,CAAC,cAAc,CAAC,CAgDzB;AAED;;;;;GAKG;AACH,eAAO,MAAM,oBAAoB;;
;;;;;;;;;;CAqC/B,CAAC;AAEH;;;;;;;;GAQG;AACH,eAAO,MAAM,sBAAsB;;;;;;;;;;;;;;;;;;;;;;;;CA6CjC,CAAC;AAEH;;;;;GAKG;AACH,eAAO,MAAM,uBAAuB;;;;;;;;;;CAmClC,CAAC;AAEH;;;;;GAKG;AACH,eAAO,MAAM,mBAAmB;;;;;;;;CAmB9B,CAAC;AAEH;;;;;;;;;;;;;;;;;GAiBG;AACH,eAAO,MAAM,mBAAmB;;;;;;;;;;;;;;;;;;;CAoJ9B,CAAC;AA4BH;;;;;;;;;;;;GAYG;AACH,eAAO,MAAM,gBAAgB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAwG3B,CAAC;AAEH;;;;;;;;GAQG;AACH,eAAO,MAAM,aAAa;;;;;;;;;;CAuGxB,CAAC;AAEH;;;;;;;;GAQG;AACH,eAAO,MAAM,WAAW;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAgMtB,CAAC;AAMH,eAAO,MAAM,gBAAgB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAe5B,CAAC"}

package/dist/swarm-prompts.d.ts.map
CHANGED
@@ -1 +1 @@
-
{"version":3,"file":"swarm-prompts.d.ts","sourceRoot":"","sources":["../src/swarm-prompts.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;GAYG;AAWH;;;;;GAKG;AACH,eAAO,MAAM,oBAAoB,s6EAkET,CAAC;AAEzB;;GAEG;AACH,eAAO,MAAM,6BAA6B,mxDAyDlB,CAAC;AAEzB;;;;;GAKG;AACH,eAAO,MAAM,cAAc,mkFAgFK,CAAC;AAEjC;;;;;;;GAOG;AACH,eAAO,MAAM,iBAAiB,goUAiUnB,CAAC;AAEZ;;;;;;;;;;;;;;;GAeG;AACH,eAAO,MAAM,kBAAkB,mgTAuQ9B,CAAC;AAEF;;;;;;;GAOG;AACH,eAAO,MAAM,iBAAiB,4pHA4GV,CAAC;AAErB;;;;;GAKG;AACH,eAAO,MAAM,iCAAiC,u+DAyE7C,CAAC;AAEF;;;;GAIG;AACH,eAAO,MAAM,iBAAiB,8jCAmCU,CAAC;AAMzC;;;;;;;GAOG;AACH,wBAAsB,qBAAqB,IAAI,OAAO,CAAC,MAAM,CAAC,CA8B7D;AAMD,UAAU,qBAAqB;IAC7B,IAAI,EAAE,aAAa,GAAG,QAAQ,CAAC;IAC/B,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,KAAK,CAAC,EAAE,MAAM,EAAE,CAAC;IACjB,MAAM,CAAC,EAAE,MAAM,CAAC;CACjB;AAED;;;;;;;;;;;;;GAaG;AACH,wBAAsB,iBAAiB,CACrC,OAAO,EAAE,qBAAqB,GAC7B,OAAO,CAAC,MAAM,CAAC,CAYjB;AA8ID;;GAEG;AACH,wBAAgB,sBAAsB,CAAC,MAAM,EAAE;IAC7C,WAAW,EAAE,MAAM,CAAC;IACpB,OAAO,EAAE,MAAM,CAAC;IAChB,UAAU,EAAE,MAAM,EAAE,CAAC;IACrB,YAAY,EAAE,MAAM,CAAC;IACrB,cAAc,EAAE,OAAO,CAAC;CACzB,GAAG,MAAM,CAaT;AAED;;GAEG;AACH,wBAAgB,uBAAuB,CAAC,MAAM,EAAE;IAC9C,IAAI,EAAE,MAAM,CAAC;IACb,WAAW,EAAE,MAAM,CAAC;CACrB,GAAG,MAAM,CAIT;AAED;;GAEG;AACH,wBAAsB,qBAAqB,CAAC,MAAM,EAAE;IAClD,OAAO,EAAE,MAAM,CAAC;IAChB,OAAO,EAAE,MAAM,CAAC;IAChB,aAAa,EAAE,MAAM,CAAC;IACtB,mBAAmB,EAAE,MAAM,CAAC;IAC5B,KAAK,EAAE,MAAM,EAAE,CAAC;IAChB,cAAc,CAAC,EAAE,MAAM,CAAC;IACxB,kBAAkB,CAAC,EAAE,MAAM,CAAC;IAC5B,aAAa,CAAC,EAAE,MAAM,CAAC;IACvB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,gBAAgB,CAAC,EAAE;QACjB,cAAc,CAAC,EAAE,MAAM,CAAC;QACxB,cAAc,CAAC,EAAE,MAAM,EAAE,CAAC;QAC1B,iBAAiB,CAAC,EAAE,MAAM,CAAC;KAC5B,CAAC;CACH,GAAG,OAAO,CAAC,MAAM,CAAC,CAuFlB;AAED;;GAEG;AACH,wBAAgB,mBAAmB,CAAC,MAAM,EAAE;IAC1C,UAAU,EAAE,MAAM,CAAC;IACnB,OAAO,EAAE,MAAM,CAAC;IAChB,OAAO,EAAE,MAAM,CAAC;IAChB,aAAa,EAAE,MAAM,CAAC;IACtB,mBAAmB,EAAE,MAAM,CAAC;IAC5B,KAAK,EAAE,MAAM,EAAE,CAAC;IAChB,cAAc,CAAC,EAAE,MAAM,CAAC;CACzB,GAAG,MAAM,CAUT;AAED;;GAEG;AACH,wBAAgB,sBAAsB,CAAC,MAAM,EAAE;IAC7C,OAAO,EAAE,MAAM,CAAC;IAC
hB,aAAa,EAAE,MAAM,CAAC;IACtB,aAAa,EAAE,MAAM,EAAE,CAAC;CACzB,GAAG,MAAM,CAMT;AAMD;;GAEG;AACH,eAAO,MAAM,oBAAoB;;;;;;;;;;;;;;;;;;;;;;CAoC/B,CAAC;AAEH;;;;;GAKG;AACH,eAAO,MAAM,mBAAmB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+
{"version":3,"file":"swarm-prompts.d.ts","sourceRoot":"","sources":["../src/swarm-prompts.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;GAYG;AAWH;;;;;GAKG;AACH,eAAO,MAAM,oBAAoB,s6EAkET,CAAC;AAEzB;;GAEG;AACH,eAAO,MAAM,6BAA6B,mxDAyDlB,CAAC;AAEzB;;;;;GAKG;AACH,eAAO,MAAM,cAAc,mkFAgFK,CAAC;AAEjC;;;;;;;GAOG;AACH,eAAO,MAAM,iBAAiB,goUAiUnB,CAAC;AAEZ;;;;;;;;;;;;;;;GAeG;AACH,eAAO,MAAM,kBAAkB,mgTAuQ9B,CAAC;AAEF;;;;;;;GAOG;AACH,eAAO,MAAM,iBAAiB,4pHA4GV,CAAC;AAErB;;;;;GAKG;AACH,eAAO,MAAM,iCAAiC,u+DAyE7C,CAAC;AAEF;;;;GAIG;AACH,eAAO,MAAM,iBAAiB,8jCAmCU,CAAC;AAMzC;;;;;;;GAOG;AACH,wBAAsB,qBAAqB,IAAI,OAAO,CAAC,MAAM,CAAC,CA8B7D;AAMD,UAAU,qBAAqB;IAC7B,IAAI,EAAE,aAAa,GAAG,QAAQ,CAAC;IAC/B,WAAW,CAAC,EAAE,MAAM,CAAC;IACrB,KAAK,CAAC,EAAE,MAAM,EAAE,CAAC;IACjB,MAAM,CAAC,EAAE,MAAM,CAAC;CACjB;AAED;;;;;;;;;;;;;GAaG;AACH,wBAAsB,iBAAiB,CACrC,OAAO,EAAE,qBAAqB,GAC7B,OAAO,CAAC,MAAM,CAAC,CAYjB;AA8ID;;GAEG;AACH,wBAAgB,sBAAsB,CAAC,MAAM,EAAE;IAC7C,WAAW,EAAE,MAAM,CAAC;IACpB,OAAO,EAAE,MAAM,CAAC;IAChB,UAAU,EAAE,MAAM,EAAE,CAAC;IACrB,YAAY,EAAE,MAAM,CAAC;IACrB,cAAc,EAAE,OAAO,CAAC;CACzB,GAAG,MAAM,CAaT;AAED;;GAEG;AACH,wBAAgB,uBAAuB,CAAC,MAAM,EAAE;IAC9C,IAAI,EAAE,MAAM,CAAC;IACb,WAAW,EAAE,MAAM,CAAC;CACrB,GAAG,MAAM,CAIT;AAED;;GAEG;AACH,wBAAsB,qBAAqB,CAAC,MAAM,EAAE;IAClD,OAAO,EAAE,MAAM,CAAC;IAChB,OAAO,EAAE,MAAM,CAAC;IAChB,aAAa,EAAE,MAAM,CAAC;IACtB,mBAAmB,EAAE,MAAM,CAAC;IAC5B,KAAK,EAAE,MAAM,EAAE,CAAC;IAChB,cAAc,CAAC,EAAE,MAAM,CAAC;IACxB,kBAAkB,CAAC,EAAE,MAAM,CAAC;IAC5B,aAAa,CAAC,EAAE,MAAM,CAAC;IACvB,YAAY,CAAC,EAAE,MAAM,CAAC;IACtB,gBAAgB,CAAC,EAAE;QACjB,cAAc,CAAC,EAAE,MAAM,CAAC;QACxB,cAAc,CAAC,EAAE,MAAM,EAAE,CAAC;QAC1B,iBAAiB,CAAC,EAAE,MAAM,CAAC;KAC5B,CAAC;CACH,GAAG,OAAO,CAAC,MAAM,CAAC,CAuFlB;AAED;;GAEG;AACH,wBAAgB,mBAAmB,CAAC,MAAM,EAAE;IAC1C,UAAU,EAAE,MAAM,CAAC;IACnB,OAAO,EAAE,MAAM,CAAC;IAChB,OAAO,EAAE,MAAM,CAAC;IAChB,aAAa,EAAE,MAAM,CAAC;IACtB,mBAAmB,EAAE,MAAM,CAAC;IAC5B,KAAK,EAAE,MAAM,EAAE,CAAC;IAChB,cAAc,CAAC,EAAE,MAAM,CAAC;CACzB,GAAG,MAAM,CAUT;AAED;;GAEG;AACH,wBAAgB,sBAAsB,CAAC,MAAM,EAAE;IAC7C,OAAO,EAAE,MAAM,CAAC;IAC
hB,aAAa,EAAE,MAAM,CAAC;IACtB,aAAa,EAAE,MAAM,EAAE,CAAC;CACzB,GAAG,MAAM,CAMT;AAMD;;GAEG;AACH,eAAO,MAAM,oBAAoB;;;;;;;;;;;;;;;;;;;;;;CAoC/B,CAAC;AAEH;;;;;GAKG;AACH,eAAO,MAAM,mBAAmB;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CA0I9B,CAAC;AAEH;;;;;GAKG;AACH,eAAO,MAAM,sBAAsB;;;;;;;;;;;;;;;;CAsDjC,CAAC;AAEH;;;;;GAKG;AACH,eAAO,MAAM,iBAAiB;;;;;;;;;;;;;;;;;;;;;;CA+I5B,CAAC;AAEH;;GAEG;AACH,eAAO,MAAM,uBAAuB;;;;;;;;;;;;CAoClC,CAAC;AAEH;;;;;GAKG;AACH,eAAO,MAAM,iBAAiB;;;;;;;;;;;;;;;;;;;;;;;CAsI5B,CAAC;AAEH,eAAO,MAAM,WAAW;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAOvB,CAAC"}

package/dist/swarm-review.d.ts.map
CHANGED
@@ -1 +1 @@
-
{"version":3,"file":"swarm-review.d.ts","sourceRoot":"","sources":["../src/swarm-review.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;GAcG;AAGH,OAAO,EAAE,CAAC,EAAE,MAAM,KAAK,CAAC;AASxB;;GAEG;AACH,MAAM,WAAW,WAAW;IAC1B,IAAI,EAAE,MAAM,CAAC;IACb,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,KAAK,EAAE,MAAM,CAAC;IACd,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,eAAO,MAAM,iBAAiB;;;;;iBAK5B,CAAC;AAEH;;GAEG;AACH,MAAM,WAAW,YAAY;IAC3B,MAAM,EAAE,UAAU,GAAG,eAAe,CAAC;IACrC,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,MAAM,CAAC,EAAE,WAAW,EAAE,CAAC;IACvB,kBAAkB,CAAC,EAAE,MAAM,CAAC;CAC7B;AAED,eAAO,MAAM,kBAAkB;;;;;;;;;;;;;iBAkB5B,CAAC;AAEJ;;GAEG;AACH,MAAM,WAAW,cAAc;IAC7B,EAAE,EAAE,MAAM,CAAC;IACX,KAAK,EAAE,MAAM,CAAC;IACd,OAAO,CAAC,EAAE,MAAM,CAAC;CAClB;AAED;;GAEG;AACH,MAAM,WAAW,cAAc;IAC7B,EAAE,EAAE,MAAM,CAAC;IACX,KAAK,EAAE,MAAM,CAAC;CACf;AAED;;GAEG;AACH,MAAM,WAAW,mBAAmB;IAClC,OAAO,EAAE,MAAM,CAAC;IAChB,UAAU,EAAE,MAAM,CAAC;IACnB,gBAAgB,CAAC,EAAE,MAAM,CAAC;IAC1B,OAAO,EAAE,MAAM,CAAC;IAChB,UAAU,EAAE,MAAM,CAAC;IACnB,gBAAgB,CAAC,EAAE,MAAM,CAAC;IAC1B,aAAa,EAAE,MAAM,EAAE,CAAC;IACxB,IAAI,EAAE,MAAM,CAAC;IACb,sBAAsB,CAAC,EAAE,cAAc,EAAE,CAAC;IAC1C,gBAAgB,CAAC,EAAE,cAAc,EAAE,CAAC;CACrC;AAkDD;;;;;;;;;;GAUG;AACH,wBAAgB,oBAAoB,CAAC,OAAO,EAAE,mBAAmB,GAAG,MAAM,CAsGzE;AAmED;;;;;GAKG;AACH,eAAO,MAAM,YAAY;;;;;;;;;;;;;;CA+
+
{"version":3,"file":"swarm-review.d.ts","sourceRoot":"","sources":["../src/swarm-review.ts"],"names":[],"mappings":"AAAA;;;;;;;;;;;;;;GAcG;AAGH,OAAO,EAAE,CAAC,EAAE,MAAM,KAAK,CAAC;AASxB;;GAEG;AACH,MAAM,WAAW,WAAW;IAC1B,IAAI,EAAE,MAAM,CAAC;IACb,IAAI,CAAC,EAAE,MAAM,CAAC;IACd,KAAK,EAAE,MAAM,CAAC;IACd,UAAU,CAAC,EAAE,MAAM,CAAC;CACrB;AAED,eAAO,MAAM,iBAAiB;;;;;iBAK5B,CAAC;AAEH;;GAEG;AACH,MAAM,WAAW,YAAY;IAC3B,MAAM,EAAE,UAAU,GAAG,eAAe,CAAC;IACrC,OAAO,CAAC,EAAE,MAAM,CAAC;IACjB,MAAM,CAAC,EAAE,WAAW,EAAE,CAAC;IACvB,kBAAkB,CAAC,EAAE,MAAM,CAAC;CAC7B;AAED,eAAO,MAAM,kBAAkB;;;;;;;;;;;;;iBAkB5B,CAAC;AAEJ;;GAEG;AACH,MAAM,WAAW,cAAc;IAC7B,EAAE,EAAE,MAAM,CAAC;IACX,KAAK,EAAE,MAAM,CAAC;IACd,OAAO,CAAC,EAAE,MAAM,CAAC;CAClB;AAED;;GAEG;AACH,MAAM,WAAW,cAAc;IAC7B,EAAE,EAAE,MAAM,CAAC;IACX,KAAK,EAAE,MAAM,CAAC;CACf;AAED;;GAEG;AACH,MAAM,WAAW,mBAAmB;IAClC,OAAO,EAAE,MAAM,CAAC;IAChB,UAAU,EAAE,MAAM,CAAC;IACnB,gBAAgB,CAAC,EAAE,MAAM,CAAC;IAC1B,OAAO,EAAE,MAAM,CAAC;IAChB,UAAU,EAAE,MAAM,CAAC;IACnB,gBAAgB,CAAC,EAAE,MAAM,CAAC;IAC1B,aAAa,EAAE,MAAM,EAAE,CAAC;IACxB,IAAI,EAAE,MAAM,CAAC;IACb,sBAAsB,CAAC,EAAE,cAAc,EAAE,CAAC;IAC1C,gBAAgB,CAAC,EAAE,cAAc,EAAE,CAAC;CACrC;AAkDD;;;;;;;;;;GAUG;AACH,wBAAgB,oBAAoB,CAAC,OAAO,EAAE,mBAAmB,GAAG,MAAM,CAsGzE;AAmED;;;;;GAKG;AACH,eAAO,MAAM,YAAY;;;;;;;;;;;;;;CA+HvB,CAAC;AAEH;;;;GAIG;AACH,eAAO,MAAM,qBAAqB;;;;;;;;;;;;;;;;;;;;;CAoNhC,CAAC;AAMH;;GAEG;AACH,UAAU,gBAAgB;IACxB,QAAQ,EAAE,OAAO,CAAC;IAClB,QAAQ,EAAE,OAAO,CAAC;IAClB,aAAa,EAAE,MAAM,CAAC;IACtB,kBAAkB,EAAE,MAAM,CAAC;CAC5B;AAOD;;GAEG;AACH,wBAAgB,kBAAkB,CAAC,MAAM,EAAE,MAAM,GAAG,IAAI,CAGvD;AAED;;GAEG;AACH,wBAAgB,gBAAgB,CAAC,MAAM,EAAE,MAAM,GAAG,OAAO,CAGxD;AAED;;GAEG;AACH,wBAAgB,eAAe,CAAC,MAAM,EAAE,MAAM,GAAG,gBAAgB,CAQhE;AAED;;GAEG;AACH,wBAAgB,iBAAiB,CAAC,MAAM,EAAE,MAAM,GAAG,IAAI,CAGtD;AAED;;GAEG;AACH,wBAAgB,kBAAkB,CAAC,MAAM,EAAE,MAAM,GAAG,IAAI,CAEvD;AAMD,eAAO,MAAM,WAAW;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;CAGvB,CAAC"}

package/package.json
CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "opencode-swarm-plugin",
-  "version": "0.44.
+  "version": "0.44.1",
   "description": "Multi-agent swarm coordination for OpenCode with learning capabilities, beads integration, and Agent Mail",
   "type": "module",
   "main": "./dist/index.js",
@@ -16,14 +16,27 @@
     "./plugin": {
       "import": "./dist/plugin.js",
       "types": "./dist/plugin.d.ts"
+    },
+    "./eval-capture": {
+      "import": "./dist/eval-capture.js",
+      "types": "./dist/eval-capture.d.ts"
+    },
+    "./compaction-prompt-scoring": {
+      "import": "./dist/compaction-prompt-scoring.js",
+      "types": "./dist/compaction-prompt-scoring.d.ts"
     }
   },
+  "files": [
+    "dist",
+    "bin",
+    "README.md"
+  ],
   "publishConfig": {
     "access": "public",
     "registry": "https://registry.npmjs.org/"
   },
   "scripts": {
-    "build": "bun build ./src/index.ts --outdir ./dist --target node --external @electric-sql/pglite --external swarm-mail --external
+    "build": "bun build ./src/index.ts --outdir ./dist --target node --external @electric-sql/pglite --external swarm-mail --external evalite && bun build ./src/plugin.ts --outfile ./dist/plugin.js --target node --external @electric-sql/pglite --external swarm-mail --external evalite && bun build ./src/eval-capture.ts --outfile ./dist/eval-capture.js --target node --external @electric-sql/pglite --external swarm-mail && bun build ./src/compaction-prompt-scoring.ts --outfile ./dist/compaction-prompt-scoring.js --target node --external @electric-sql/pglite --external swarm-mail && tsc",
     "dev": "bun --watch src/index.ts",
     "test": "bun test --timeout 10000 src/anti-patterns.test.ts src/mandate-promotion.test.ts src/mandate-storage.test.ts src/output-guardrails.test.ts src/pattern-maturity.test.ts src/skills.test.ts src/structured.test.ts src/schemas/",
     "test:integration": "bun test --timeout 60000 src/*.integration.test.ts",
@@ -56,11 +69,10 @@
     "@types/minimatch": "^6.0.0",
     "ai": "6.0.0-beta.150",
     "bun-types": "^1.3.4",
-    "evalite": "^
+    "evalite": "^0.19.0",
     "pino-pretty": "^13.1.3",
     "turbo": "^2.6.3",
-    "typescript": "^5.7.0"
-    "vitest": "^4.0.15"
+    "typescript": "^5.7.0"
   },
   "peerDependencies": {
     "@opencode-ai/plugin": "^1.0.0"
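With the new `files` allowlist, npm packs only `dist`, `bin`, and `README.md`, plus the files npm always includes, such as `package.json`. This is consistent with the `src/`, `docs/`, `evals/`, and other deletions listed at the top of this diff. The new subpath exports point into `dist/`, so one useful sanity check is that every export target falls under an allowlisted path.

The helper below is a hypothetical sketch, not part of the package. It treats `files` entries as plain paths rather than globs, and it only covers the three subpaths visible in this hunk.

```typescript
// Hypothetical helper (not part of opencode-swarm-plugin): reports export
// targets that would not be packed under a `files` allowlist.
// Entries are treated as plain paths; npm's glob semantics are not modeled.
interface ExportTarget {
  import?: string;
  types?: string;
}

interface Manifest {
  files: string[];
  exports: Record<string, ExportTarget>;
}

// "./dist/plugin.js" -> "dist/plugin.js"
function stripDotSlash(p: string): string {
  return p.replace(/^\.\//, "");
}

function unpackedExportTargets(manifest: Manifest): string[] {
  const allowed = manifest.files.map(stripDotSlash);
  const problems: string[] = [];
  for (const [subpath, target] of Object.entries(manifest.exports)) {
    for (const file of [target.import, target.types]) {
      if (file === undefined) continue;
      const rel = stripDotSlash(file);
      // Covered if it is an allowlisted file or sits inside an allowlisted directory.
      const covered = allowed.some(
        (entry) => rel === entry || rel.startsWith(`${entry}/`),
      );
      if (!covered) problems.push(`${subpath} -> ${file}`);
    }
  }
  return problems;
}

// The three subpath exports visible in the 0.44.1 hunk; the "." entry is omitted.
const manifest: Manifest = {
  files: ["dist", "bin", "README.md"],
  exports: {
    "./plugin": { import: "./dist/plugin.js", types: "./dist/plugin.d.ts" },
    "./eval-capture": {
      import: "./dist/eval-capture.js",
      types: "./dist/eval-capture.d.ts",
    },
    "./compaction-prompt-scoring": {
      import: "./dist/compaction-prompt-scoring.js",
      types: "./dist/compaction-prompt-scoring.d.ts",
    },
  },
};

console.log(unpackedExportTargets(manifest)); // []
```

The `${entry}/` prefix check prevents an entry like `dist` from accidentally covering a sibling directory such as `distribution/`.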

package/.changeset/swarm-insights-data-layer.md
DELETED
@@ -1,63 +0,0 @@
---
"opencode-swarm-plugin": minor
---

## 🧠 Swarm Insights: Data-Driven Decomposition

> "It should allow the learner both to reflect on the quality of found solutions so that more effective cognitive schemata can be induced (including discriminations and generalizations) or further elaborated."
>
> — *Training Complex Cognitive Skills: A Four-Component Instructional Design Model for Technical Training*

**What changed:**

New data layer (`swarm-insights.ts`) aggregates learnings from swarm coordination events to inform future decompositions. Coordinators and workers now get concise, context-efficient summaries injected into their prompts.

**Key exports:**

- `getStrategyInsights(swarmMail, task)` - Strategy success rates and recommendations
  - Queries `subtask_outcome` events, calculates win/loss ratios
  - Returns: `{ strategy, successRate, totalAttempts, recommendation }`
  - Powers coordinator strategy selection with empirical data

- `getFileInsights(swarmMail, files)` - File-specific gotchas from past failures
  - Identifies files with high failure rates
  - Returns: `{ file, failureCount, lastFailure, gotchas[] }`
  - Workers see warnings about tricky files before touching them

- `getPatternInsights(swarmMail)` - Common failure patterns and anti-patterns
  - Detects recurring error types (type_error, timeout, conflict, test_failure)
  - Returns: `{ pattern, frequency, recommendation }`
  - Surfaces systemic issues for proactive prevention

- `formatInsightsForPrompt(bundle, options)` - Context-aware formatting
  - Token budget enforcement (default 500 tokens, ~2000 chars)
  - Prioritizes top 3 strategies, 5 files, 3 patterns
  - Clean markdown output for prompt injection

- `getCachedInsights(swarmMail, cacheKey, computeFn)` - 5-minute TTL caching
  - Prevents redundant queries during active swarms
  - Transparent cache miss fallback
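To make the `getStrategyInsights` and `formatInsightsForPrompt` bullets concrete, here is a minimal sketch of computing per-strategy success rates and rendering the top entries under a character budget.

A few assumptions apply:
- `OutcomeEvent`, `strategyInsights`, and `formatStrategies` are hypothetical names, not the deleted module's API.
- The real `subtask_outcome` payload is not shown in this diff.
- The 4-characters-per-token ratio is inferred from the "500 tokens, ~2000 chars" note.

```typescript
// Hypothetical event shape; the real `subtask_outcome` payload is not shown in this diff.
interface OutcomeEvent {
  strategy: string;
  success: boolean;
}

interface StrategyInsight {
  strategy: string;
  successRate: number; // wins / attempts, in [0, 1]
  totalAttempts: number;
}

// Tally wins and attempts per strategy, then sort best-first.
function strategyInsights(events: OutcomeEvent[]): StrategyInsight[] {
  const tally = new Map<string, { wins: number; total: number }>();
  for (const e of events) {
    const t = tally.get(e.strategy) ?? { wins: 0, total: 0 };
    t.total += 1;
    if (e.success) t.wins += 1;
    tally.set(e.strategy, t);
  }
  return Array.from(tally, ([strategy, t]) => ({
    strategy,
    successRate: t.wins / t.total,
    totalAttempts: t.total,
  })).sort((a, b) => b.successRate - a.successRate);
}

// "500 tokens, ~2000 chars" implies roughly 4 characters per token.
const CHARS_PER_TOKEN = 4;

// Render the top N strategies, stopping before the character budget is exceeded.
function formatStrategies(insights: StrategyInsight[], maxTokens = 500, topN = 3): string {
  const budget = maxTokens * CHARS_PER_TOKEN;
  const header = "## Strategy insights";
  const lines = [header];
  let used = header.length;
  for (const s of insights.slice(0, topN)) {
    const line = `- ${s.strategy}: ${Math.round(s.successRate * 100)}% success (${s.totalAttempts} attempts)`;
    if (used + 1 + line.length > budget) break; // +1 for the joining newline
    lines.push(line);
    used += 1 + line.length;
  }
  return lines.join("\n");
}

const events: OutcomeEvent[] = [
  { strategy: "file-based", success: true },
  { strategy: "file-based", success: true },
  { strategy: "feature-based", success: false },
  { strategy: "feature-based", success: true },
];

console.log(formatStrategies(strategyInsights(events)));
// ## Strategy insights
// - file-based: 100% success (2 attempts)
// - feature-based: 50% success (2 attempts)
```

Sorting by success rate before slicing is what lets a `topN` cap of 3 keep the best strategies. Lines that would overflow the budget are dropped whole rather than cut mid-line, so the prompt never receives a half-rendered entry.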

**Why it matters:**

Before this, coordinators decomposed tasks blind to past failures. "Split by file type" might have failed 8 times, but the coordinator would try it again. Workers would touch `auth/tokens.ts` without knowing it caused 3 prior failures.

Now:
- **Better decomposition**: Coordinator prompts show strategy success rates (e.g., "file-based: 85% success, feature-based: 40% - avoid")
- **Fewer repeated mistakes**: Workers see file-specific warnings before editing
- **Compounding learning**: Each swarm completion feeds the insights engine, improving future decompositions
- **Context-efficient**: Hard token caps prevent insights from dominating prompt budgets

The swarm now learns from its mistakes, not just records them.

**Data sources:**
- Event store: `subtask_outcome`, `eval_finalized` events
- Semantic memory: File-specific learnings (TODO: full integration)
- Anti-pattern registry: Detection and inversion rules

**Integration points:**
- Coordinator prompts: Inject strategy insights during decomposition
- Worker prompts: Inject file insights when subtasks are spawned
- Learning layer: Confidence decay, pattern maturity, implicit feedback scoring

This is the foundation for adaptive swarm intelligence - decomposition that gets smarter with every task completed.

package/.hive/analysis/eval-failure-analysis-2025-12-25.md
DELETED
@@ -1,331 +0,0 @@
# Eval Failure Analysis Report
**Date:** 2025-12-25
**Analyst:** BrightStar
**Cell:** opencode-swarm-plugin--ys7z8-mjlk7jsl4tt
**Epic:** opencode-swarm-plugin--ys7z8-mjlk7js9bt1

## Executive Summary

Two eval failures analyzed:
- **example.eval.ts**: 0% score - structural bug in eval setup
- **compaction-prompt.eval.ts**: 53% score - case sensitivity + missing forbidden tools

Both are fixable with code changes. No test data quality issues.

---

## example.eval.ts - 0% Score

### Status
❌ **CRITICAL** - Complete failure (0%)

### Root Cause
**Eval structure mismatch** between data provider and task function.

### Technical Details

**File:** `evals/example.eval.ts`
**Lines:** 14-30

The eval has a fundamental flow error:

```typescript
// Line 14-26: data() provides BOTH input AND expected output
data: async () => {
  return [
    {
      input: "Test task", // ← String for task function
      output: JSON.stringify({ // ← Expected output (ignored!)
        epic: { title: "Test Epic", ... },
        subtasks: [...]
      }),
    },
  ];
},

// Line 28-30: task() does passthrough
task: async (input) => {
  return input; // ← Returns "Test task" string, NOT the CellTree
},

// Line 31: Scorer expects CellTree JSON
scorers: [subtaskIndependence],
```

**What happens:**
1. Evalite passes `input` ("Test task") to task function
2. Task returns "Test task" string unchanged
3. Scorer `subtaskIndependence` receives "Test task"
4. Scorer tries to parse as CellTree JSON → **FAILS**
5. Score: 0%

The `output` field in `data()` is ignored by Evalite - it's the `task()` return value that gets scored.
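The failure chain above can be reproduced with a simplified stand-in scorer. `independenceScore` and the `CellTree` interface below are hypothetical; the real `subtaskIndependence` scorer is not shown in this diff.

The sketch only illustrates one point. A scorer that parses its input as JSON scores the passthrough string as 0, while a stringified tree (like the one Option 1 under Proposed Fix produces) parses and can be scored.

```typescript
// Hypothetical stand-in for a scorer like `subtaskIndependence`; the real
// scorer's logic is not shown in this diff. It parses the task output as a
// cell tree and penalizes files claimed by more than one subtask.
interface Subtask {
  title: string;
  files: string[];
}

interface CellTree {
  epic: { title: string };
  subtasks: Subtask[];
}

function independenceScore(output: string): number {
  let tree: CellTree;
  try {
    tree = JSON.parse(output) as CellTree;
  } catch {
    return 0; // e.g. "Test task" is not JSON, so the eval bottoms out at 0%
  }
  if (!Array.isArray(tree?.subtasks) || tree.subtasks.length === 0) return 0;

  // Count how many subtasks claim each file.
  const owners = new Map<string, number>();
  for (const s of tree.subtasks) {
    for (const f of s.files ?? []) owners.set(f, (owners.get(f) ?? 0) + 1);
  }
  if (owners.size === 0) return 0;
  const shared = Array.from(owners.values()).filter((n) => n > 1).length;
  return 1 - shared / owners.size;
}

const tree: CellTree = {
  epic: { title: "Test Epic" },
  subtasks: [
    { title: "Subtask 1", files: ["a.ts"] },
    { title: "Subtask 2", files: ["b.ts"] },
  ],
};

console.log(independenceScore("Test task")); // 0: passthrough output
console.log(independenceScore(JSON.stringify(tree))); // 1: no shared files
```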
### Impact

- The example eval is useless for validation
- It gives a false signal that the scorer infrastructure is broken (it's not)
- It wastes CI time

### Proposed Fix

**Option 1: Remove `output` from `data()` (recommended)**

```typescript
data: async () => {
  return [
    {
      input: {
        epic: { title: "Test Epic", description: "Test" },
        subtasks: [
          { title: "Subtask 1", files: ["a.ts"], estimated_complexity: 1 },
          { title: "Subtask 2", files: ["b.ts"], estimated_complexity: 1 },
        ],
      },
    },
  ];
},

task: async (input) => {
  return JSON.stringify(input); // Stringify the CellTree
},
```

**Option 2: Fix the task to use `output`**

```typescript
// Keep data() as-is, but fix task:
task: async (input, context) => {
  return context.expected.output; // Use the output from data()
},
```

Option 1 is cleaner. The task generates the output that gets scored instead of passing a value through, and it does not depend on the task function receiving the other fields of the `data()` entry. Confirm that Evalite actually passes those fields before choosing Option 2.

---
## compaction-prompt.eval.ts - 53% Score

### Status

⚠️ **DEGRADED** - Below target (53% vs. 100% historical)

### Root Causes

#### RC1: Case-Sensitive Forbidden Tool Patterns (15% weight)

**File:** `src/compaction-prompt-scoring.ts`
**Lines:** 213-218

```typescript
const forbiddenTools = [
  /\bEdit\b/, // ← Requires capital E
  /\bWrite\b/, // ← Requires capital W
  /swarmmail_reserve/,
  /git commit/,
];
```

**File:** `evals/fixtures/compaction-prompt-cases.ts`
**Lines:** 76-83 (perfect fixture)

```
- edit // ← lowercase e
- write // ← lowercase w
- bash (for file modifications)
```

**Evidence:**

```javascript
/\bEdit\b/.test("- Edit") // ✅ true
/\bEdit\b/.test("- edit") // ❌ false (case mismatch; the word boundary itself matches)
```

**Impact:**

- Perfect fixture: 0/4 forbidden tools matched
- Forbidden tools scorer: 0% (should be 75-100%)
- Overall impact: this scorer's full 15% weight is lost
#### RC2: Missing Forbidden Tools (15% weight)

The scorer expects **4 tools**:

1. Edit (or edit)
2. Write (or write)
3. swarmmail_reserve
4. git commit

The perfect fixture has **3 tools** (with a case mismatch):

1. edit ❌ (lowercase)
2. write ❌ (lowercase)
3. bash ❌ (not in the scorer's list)

Missing: swarmmail_reserve, git commit

**Impact:**

- Even with the case fixed, only 2/4 tools match, so this scorer gives 50%
- Weighted: 50% × 15% = 7.5% contribution (should be 15%)
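
The RC1 and RC2 numbers can be reproduced in a few lines. This assumes the scorer reports the fraction of forbidden-tool patterns that match anywhere in the prompt. That assumption is consistent with the 0/4 and 2/4 figures, but the real code may differ. `forbiddenToolsScore` is a hypothetical helper, and the prompt text contains only the fixture lines quoted in RC1.

```typescript
// Assumed scoring rule: fraction of forbidden-tool patterns found in the prompt.
function forbiddenToolsScore(prompt: string, patterns: RegExp[]): number {
  const matched = patterns.filter((pattern) => pattern.test(prompt)).length;
  return matched / patterns.length;
}

// Current, case-sensitive patterns (RC1).
const currentPatterns = [/\bEdit\b/, /\bWrite\b/, /swarmmail_reserve/, /git commit/];

// The same four patterns with only the `i` flag added (RC1 fixed, RC2 not).
const caseFoldedPatterns = currentPatterns.map(
  (pattern) => new RegExp(pattern.source, pattern.flags + "i"),
);

// Forbidden-tools lines of the perfect fixture, without the `// ←` annotations.
const perfectFixture = ["- edit", "- write", "- bash (for file modifications)"].join("\n");

console.log(forbiddenToolsScore(perfectFixture, currentPatterns)); // 0
console.log(forbiddenToolsScore(perfectFixture, caseFoldedPatterns)); // 0.5
```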
#### RC3: "bash" Not in Scorer's List

The fixtures list "bash (for file modifications)" as forbidden, but the scorer doesn't check for it. As a result, the two lists don't overlap at all:

- Fixture lists: edit, write, bash
- Scorer checks: Edit, Write, swarmmail_reserve, git commit
- Overlap: 0 tools (due to case)

### Score Breakdown - Perfect Fixture

Expected (if 100%):

```
epicIdSpecificity:        20% × 1.0 = 20%
actionability:            20% × 1.0 = 20%
coordinatorIdentity:      25% × 1.0 = 25%
forbiddenToolsPresent:    15% × 1.0 = 15%
postCompactionDiscipline: 20% × 1.0 = 20%
─────
TOTAL: 100%
```

Actual (current):

```
epicIdSpecificity:        20% × 1.0 = 20% ✅
actionability:            20% × 1.0 = 20% ✅
coordinatorIdentity:      25% × 1.0 = 25% ✅
forbiddenToolsPresent:    15% × 0.0 =  0% ❌ (0/4 matched)
postCompactionDiscipline: 20% × 1.0 = 20% ✅
─────
TOTAL: 85%
```

The perfect fixture alone should score 85%, yet the overall eval scores 53%.
This means the 5 "bad" fixtures pull the average down further, which is expected behavior.
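
The breakdown is a weighted sum of per-scorer results. The sketch below reproduces it, assuming the eval combines the five scorers exactly as the tables show. The names and weights come from the tables; `weightedScore` is a hypothetical helper. The middle case is the perfect fixture with only the case sensitivity fixed (RC1). That still leaves RC2's 2/4 on the forbidden-tools scorer.

```typescript
type ScorerName =
  | "epicIdSpecificity"
  | "actionability"
  | "coordinatorIdentity"
  | "forbiddenToolsPresent"
  | "postCompactionDiscipline";

// Weights from the score breakdown tables (they sum to 1.0).
const WEIGHTS: Record<ScorerName, number> = {
  epicIdSpecificity: 0.2,
  actionability: 0.2,
  coordinatorIdentity: 0.25,
  forbiddenToolsPresent: 0.15,
  postCompactionDiscipline: 0.2,
};

// Weighted sum of per-scorer results (each in [0, 1]), as a percentage
// rounded to one decimal place to hide floating-point noise.
function weightedScore(scores: Record<ScorerName, number>): number {
  let total = 0;
  for (const name of Object.keys(WEIGHTS) as ScorerName[]) {
    total += WEIGHTS[name] * scores[name];
  }
  return Math.round(total * 1000) / 10;
}

const allPass: Record<ScorerName, number> = {
  epicIdSpecificity: 1,
  actionability: 1,
  coordinatorIdentity: 1,
  forbiddenToolsPresent: 1,
  postCompactionDiscipline: 1,
};

console.log(weightedScore({ ...allPass, forbiddenToolsPresent: 0 })); // 85 (current)
console.log(weightedScore({ ...allPass, forbiddenToolsPresent: 0.5 })); // 92.5 (RC1 fixed only)
console.log(weightedScore(allPass)); // 100
```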
### Historical Context

Semantic memory claims a previous score of 100%. Likely scenarios:

1. **Never actually ran** - the target was documented before implementation
2. **Ran with different fixtures** - fixtures were updated after the scorer was written
3. **Scorer was case-insensitive before** - a regression in recent commit aa12943

Commit aa12943 (2025-12-24) added the eval infrastructure, so this is brand-new code, which makes scenario 3 unlikely.

### Proposed Fixes

#### Fix 1: Make Scorer Case-Insensitive (Recommended)

**File:** `src/compaction-prompt-scoring.ts`
**Lines:** 213-218

```typescript
const forbiddenTools = [
  /\bedit\b/i, // Case-insensitive via the 'i' flag
  /\bwrite\b/i, // Case-insensitive
  /\bbash\b/i, // Add bash (was missing)
  /swarmmail_reserve/i, // Keep; add 'i' for safety
  /git commit/i, // Keep; add 'i' for safety
];
```

**Rationale:**

- Coordinators might capitalize tool names differently in prompts
- Real prompts won't always match the exact case
- More robust matching
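
Fix 1 alone does not give the perfect fixture the full 15%, because the fixture still never mentions `swarmmail_reserve` or `git commit`. Here is a quick check. It assumes the scorer returns the fraction of patterns that match, which the source does not confirm.

```typescript
// Fix 1 patterns: case-insensitive, with bash added.
const fix1Patterns = [
  /\bedit\b/i,
  /\bwrite\b/i,
  /\bbash\b/i,
  /swarmmail_reserve/i,
  /git commit/i,
];

// Forbidden-tools lines of the current perfect fixture (as quoted in RC1).
const perfectFixture = "- edit\n- write\n- bash (for file modifications)";

// Assumed rule: score = fraction of patterns matching anywhere in the prompt.
const matched = fix1Patterns.filter((pattern) => pattern.test(perfectFixture));
const forbiddenScore = matched.length / fix1Patterns.length;

console.log(matched.length, forbiddenScore); // 3 0.6
```

At 0.6 on this scorer, the perfect fixture would land at 85 + 15 × 0.6 = 94%. That gap is why Fix 3 pairs the scorer change with fixture updates.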
#### Fix 2: Update Fixtures to Match Scorer (Alternative)

**File:** `evals/fixtures/compaction-prompt-cases.ts`
**Lines:** 76-83 (and all other fixtures)

```
- Edit // Capital E
- Write // Capital W
- bash (for file modifications) // Keep or remove
- swarmmail_reserve // ADD
- git commit // ADD
```

**Rationale:**

- Keeps the scorer strict (may catch real case issues)
- Makes fixtures comprehensive (all 5 listed tools)
- More explicit about what's forbidden
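
Fix 2 can be checked the same way against the current, unchanged patterns. The sketch assumes fraction-of-patterns scoring and strips the `//` annotations from the fixture lines. The source does not confirm either detail.

```typescript
// Current, unchanged case-sensitive patterns (RC1).
const currentPatterns = [/\bEdit\b/, /\bWrite\b/, /swarmmail_reserve/, /git commit/];

// Fix 2 fixture lines, annotations removed.
const fix2Fixture = [
  "- Edit",
  "- Write",
  "- bash (for file modifications)",
  "- swarmmail_reserve",
  "- git commit",
].join("\n");

// Assumed rule: count the patterns that match anywhere in the prompt.
const matchedCount = currentPatterns.filter((pattern) => pattern.test(fix2Fixture)).length;

console.log(`${matchedCount}/${currentPatterns.length}`); // 4/4
```

The `bash` line has no effect on this score, because the current scorer has no pattern for it.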
#### Fix 3: Hybrid (Best of Both)

1. Make the scorer case-insensitive (Fix 1)
2. Update fixtures to include the full forbidden-tool list (Fix 2)
3. Remove "bash" from fixtures if it isn't on the coordinator's forbidden list

```typescript
// Scorer (5 tools, case-insensitive):
const forbiddenTools = [
  /\bedit\b/i,
  /\bwrite\b/i,
  /swarmmail_reserve/i,
  /git\s+commit/i,
  /\bread\b/i, // Consider adding - coordinators shouldn't read files; they should check status
];
```

```
// Fixture:
- Edit
- Write
- swarmmail_reserve (only workers reserve files)
- git commit (workers commit their changes)
```
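
Check one thing before adopting Fix 3 as written. The scorer gains a fifth pattern (`read`), but the proposed fixture doesn't list it. Assuming fraction-of-patterns scoring (which the source does not confirm), the perfect fixture would top out at 4/5. `fractionMatched` is a hypothetical helper, and the prose string is a made-up example.

```typescript
// Fix 3 scorer patterns (five, case-insensitive).
const hybridPatterns = [
  /\bedit\b/i,
  /\bwrite\b/i,
  /swarmmail_reserve/i,
  /git\s+commit/i,
  /\bread\b/i,
];

// Fix 3 fixture lines as proposed (no `read` entry).
const hybridFixture = [
  "- Edit",
  "- Write",
  "- swarmmail_reserve (only workers reserve files)",
  "- git commit (workers commit their changes)",
].join("\n");

// Assumed rule: fraction of patterns that match anywhere in the prompt.
function fractionMatched(prompt: string, patterns: RegExp[]): number {
  return patterns.filter((pattern) => pattern.test(prompt)).length / patterns.length;
}

console.log(fractionMatched(hybridFixture, hybridPatterns)); // 0.8

// Ordinary prose can also trip a common-word pattern like `read`.
const unrelatedProse = "Read the epic status before spawning workers.";
console.log(/\bread\b/i.test(unrelatedProse)); // true
```

Either the fixture needs a `read` entry, or the pattern should stay out. Also, `edit`, `write` and `read` are ordinary English words, so these patterns can match prose anywhere in the prompt, not only in the forbidden-tools list.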
### Risk Assessment

**If we fix this, will scores jump to 100%?**

**Perfect fixture:** 85% → 100% (if all 4 tools matched)

**Other fixtures:** Depends on their issues

Expected values for each fixture:

- Fixture 0 (perfect): Should score 100%
- Fixture 1 (placeholder): Should fail (expected)
- Fixture 2 (generic): Should fail (expected)
- Fixture 3 (weak identity): Should partially fail (expected)
- Fixture 4 (missing forbidden): Should fail on forbidden tools only
- Fixture 5 (wrong first tool): Should fail on discipline only

Expected average across the 6 fixtures: ~66% (not 100%)

**So 53% → ~70-80%** is a realistic result after the fixes (not 100%).

Higher scores would require the bad fixtures to pass, but they are SUPPOSED to fail.
The scorer handles them correctly.

---

## Recommendations

### Immediate Actions (P0)

1. **Fix example.eval.ts structure** - 5 min fix, unblocks that eval
2. **Make forbidden tools case-insensitive** - 5 min fix, +15-20% score boost
3. **Add missing tools to fixtures** - 10 min, comprehensive coverage

### Medium-term Actions (P1)

4. **Verify 100% claim in semantic memory** - Check whether historical data exists
5. **Document scorer expectations** - Add comments to fixtures explaining weights
6. **Add unit tests for scorers** - Test edge cases independently
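
Here is a framework-free sketch of the scorer unit tests suggested in item 6. It runs Fix 3's patterns (without the optional `read`) against edge cases raised in this analysis: capitalization, word boundaries, and extra whitespace in `git commit`. The case table and the `countMatches` helper are hypothetical. In the repo, these would move into whatever test runner the project uses.

```typescript
// Patterns under test: Fix 3's list, copied so the file is self-contained.
const forbiddenTools: RegExp[] = [
  /\bedit\b/i,
  /\bwrite\b/i,
  /swarmmail_reserve/i,
  /git\s+commit/i,
];

interface PatternCase {
  name: string;
  prompt: string;
  expectedMatches: number;
}

// Table-driven edge cases.
const cases: PatternCase[] = [
  { name: "capitalized names", prompt: "- Edit\n- Write", expectedMatches: 2 },
  { name: "lowercase names", prompt: "- edit\n- write", expectedMatches: 2 },
  { name: "word boundary: 'editor' is not 'edit'", prompt: "Open the editor", expectedMatches: 0 },
  { name: "extra whitespace in git commit", prompt: "never git  commit", expectedMatches: 1 },
  { name: "empty prompt", prompt: "", expectedMatches: 0 },
];

// Number of forbidden-tool patterns found in a prompt.
function countMatches(prompt: string): number {
  return forbiddenTools.filter((pattern) => pattern.test(prompt)).length;
}

const failures = cases.filter((c) => countMatches(c.prompt) !== c.expectedMatches);
for (const failure of failures) {
  console.error(`FAIL: ${failure.name}`);
}
console.log(`${cases.length - failures.length}/${cases.length} cases passed`); // 5/5 cases passed
```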
### Long-term Actions (P2)

7. **Consider LLM-as-judge for semantic checks** - Case-insensitive by nature
8. **Add a visual diff to eval output** - Show what's missing from prompts
9. **Create an eval dashboard** - Track scores over time, detect regressions
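
Here is a rough sketch of the regression check behind item 9. The `EvalRun` record shape, the window size and the 5-point tolerance are all assumptions. The sample runs are made-up values, not real results.

```typescript
// Hypothetical shape of one stored eval run (not the project's actual schema).
interface EvalRun {
  evalName: string;
  score: number; // percentage, 0-100
  timestamp: string; // ISO 8601, so string order is chronological
}

interface RegressionCheck {
  regressed: boolean;
  baseline: number | null; // mean of the preceding runs, if any
  latest: number | null;
}

// Flags a regression when the latest run falls more than `tolerance` points
// below the mean of up to `windowSize` preceding runs of the same eval.
function detectRegression(
  runs: EvalRun[],
  evalName: string,
  windowSize = 5,
  tolerance = 5,
): RegressionCheck {
  const history = runs
    .filter((run) => run.evalName === evalName)
    .sort((a, b) => (a.timestamp < b.timestamp ? -1 : a.timestamp > b.timestamp ? 1 : 0));

  if (history.length < 2) {
    const latest = history.length === 1 ? history[0].score : null;
    return { regressed: false, baseline: null, latest };
  }

  const latest = history[history.length - 1].score;
  const previous = history.slice(-windowSize - 1, -1);
  const baseline = previous.reduce((sum, run) => sum + run.score, 0) / previous.length;
  return { regressed: baseline - latest > tolerance, baseline, latest };
}

// Made-up sample runs for illustration only.
const sampleRuns: EvalRun[] = [
  { evalName: "example-eval", score: 90, timestamp: "2025-01-01T00:00:00Z" },
  { evalName: "example-eval", score: 88, timestamp: "2025-01-02T00:00:00Z" },
  { evalName: "example-eval", score: 70, timestamp: "2025-01-03T00:00:00Z" },
];

const result = detectRegression(sampleRuns, "example-eval");
console.log(result.regressed, result.baseline, result.latest); // true 89 70
```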
---

## Conclusion

Both evals fail because of **code bugs, not test data issues**:

- example.eval.ts: Structural bug (task/data mismatch)
- compaction-prompt.eval.ts: Case sensitivity + incomplete tool list

The fixes are straightforward and low-risk. After the fixes, expect:

- example.eval.ts: 0% → 100%
- compaction-prompt.eval.ts: 53% → 70-80%

The 100% historical score in semantic memory is likely aspirational - these evals are brand new (commit aa12943, Dec 24).

**Ready to implement fixes or escalate for review?**