maestro-flow 0.4.17 → 0.4.19

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (165)
  1. package/.agents/skills/maestro/SKILL.md +1 -1
  2. package/.agents/skills/maestro-analyze/SKILL.md +5 -0
  3. package/.agents/skills/maestro-blueprint/SKILL.md +5 -0
  4. package/.agents/skills/maestro-brainstorm/SKILL.md +5 -0
  5. package/.agents/skills/maestro-next/SKILL.md +254 -0
  6. package/.agents/skills/team-swarm/SKILL.md +180 -0
  7. package/.agents/skills/team-swarm/roles/analyst/role.md +187 -0
  8. package/.agents/skills/team-swarm/roles/ant/role.md +169 -0
  9. package/.agents/skills/team-swarm/roles/coordinator/commands/converge.md +146 -0
  10. package/.agents/skills/team-swarm/roles/coordinator/commands/init-swarm.md +136 -0
  11. package/.agents/skills/team-swarm/roles/coordinator/commands/iterate.md +232 -0
  12. package/.agents/skills/team-swarm/roles/coordinator/role.md +211 -0
  13. package/.agents/skills/team-swarm/roles/scorer/role.md +157 -0
  14. package/.agents/skills/team-swarm/scripts/aco.py +473 -0
  15. package/.agents/skills/team-swarm/scripts/pheromone.py +144 -0
  16. package/.agents/skills/team-swarm/scripts/scoring.py +92 -0
  17. package/.agents/skills/team-swarm/scripts/test_aco.py +475 -0
  18. package/.agents/skills/team-swarm/specs/ant-output-schema.md +119 -0
  19. package/.agents/skills/team-swarm/specs/convergence-criteria.md +106 -0
  20. package/.agents/skills/team-swarm/specs/pheromone-schema.md +123 -0
  21. package/.agents/skills/team-swarm/specs/swarm-config-template.json +71 -0
  22. package/.agents/skills/team-swarm/specs/swarm-protocol.md +117 -0
  23. package/.agy/skills/maestro/SKILL.md +1 -1
  24. package/.agy/skills/maestro-analyze/SKILL.md +5 -0
  25. package/.agy/skills/maestro-blueprint/SKILL.md +5 -0
  26. package/.agy/skills/maestro-brainstorm/SKILL.md +5 -0
  27. package/.agy/skills/maestro-next/SKILL.md +250 -0
  28. package/.agy/skills/team-swarm/SKILL.md +176 -0
  29. package/.agy/skills/team-swarm/roles/analyst/role.md +183 -0
  30. package/.agy/skills/team-swarm/roles/ant/role.md +165 -0
  31. package/.agy/skills/team-swarm/roles/coordinator/commands/converge.md +134 -0
  32. package/.agy/skills/team-swarm/roles/coordinator/commands/init-swarm.md +136 -0
  33. package/.agy/skills/team-swarm/roles/coordinator/commands/iterate.md +202 -0
  34. package/.agy/skills/team-swarm/roles/coordinator/role.md +209 -0
  35. package/.agy/skills/team-swarm/roles/scorer/role.md +153 -0
  36. package/.agy/skills/team-swarm/scripts/aco.py +473 -0
  37. package/.agy/skills/team-swarm/scripts/pheromone.py +144 -0
  38. package/.agy/skills/team-swarm/scripts/scoring.py +92 -0
  39. package/.agy/skills/team-swarm/scripts/test_aco.py +475 -0
  40. package/.agy/skills/team-swarm/specs/ant-output-schema.md +119 -0
  41. package/.agy/skills/team-swarm/specs/convergence-criteria.md +106 -0
  42. package/.agy/skills/team-swarm/specs/pheromone-schema.md +123 -0
  43. package/.agy/skills/team-swarm/specs/swarm-config-template.json +71 -0
  44. package/.agy/skills/team-swarm/specs/swarm-protocol.md +117 -0
  45. package/.claude/commands/maestro-analyze.md +5 -0
  46. package/.claude/commands/maestro-blueprint.md +5 -0
  47. package/.claude/commands/maestro-brainstorm.md +5 -0
  48. package/.claude/commands/maestro-next.md +252 -0
  49. package/.claude/commands/maestro.md +1 -1
  50. package/.claude/skills/team-swarm/SKILL.md +178 -0
  51. package/.claude/skills/team-swarm/roles/analyst/role.md +185 -0
  52. package/.claude/skills/team-swarm/roles/ant/role.md +167 -0
  53. package/.claude/skills/team-swarm/roles/coordinator/commands/converge.md +146 -0
  54. package/.claude/skills/team-swarm/roles/coordinator/commands/init-swarm.md +136 -0
  55. package/.claude/skills/team-swarm/roles/coordinator/commands/iterate.md +232 -0
  56. package/.claude/skills/team-swarm/roles/coordinator/role.md +209 -0
  57. package/.claude/skills/team-swarm/roles/scorer/role.md +155 -0
  58. package/.claude/skills/team-swarm/scripts/aco.py +473 -0
  59. package/.claude/skills/team-swarm/scripts/pheromone.py +144 -0
  60. package/.claude/skills/team-swarm/scripts/scoring.py +92 -0
  61. package/.claude/skills/team-swarm/scripts/test_aco.py +475 -0
  62. package/.claude/skills/team-swarm/specs/ant-output-schema.md +119 -0
  63. package/.claude/skills/team-swarm/specs/convergence-criteria.md +106 -0
  64. package/.claude/skills/team-swarm/specs/pheromone-schema.md +123 -0
  65. package/.claude/skills/team-swarm/specs/swarm-config-template.json +71 -0
  66. package/.claude/skills/team-swarm/specs/swarm-protocol.md +117 -0
  67. package/.codex/skills/learn-decompose/SKILL.md +34 -3
  68. package/.codex/skills/learn-retro/SKILL.md +31 -1
  69. package/.codex/skills/learn-second-opinion/SKILL.md +34 -4
  70. package/.codex/skills/maestro-analyze/SKILL.md +44 -5
  71. package/.codex/skills/maestro-blueprint/SKILL.md +5 -0
  72. package/.codex/skills/maestro-brainstorm/SKILL.md +46 -0
  73. package/.codex/skills/maestro-execute/SKILL.md +61 -5
  74. package/.codex/skills/maestro-milestone-audit/SKILL.md +64 -13
  75. package/.codex/skills/maestro-milestone-complete/SKILL.md +12 -0
  76. package/.codex/skills/maestro-next/SKILL.md +297 -0
  77. package/.codex/skills/maestro-plan/SKILL.md +36 -1
  78. package/.codex/skills/maestro-player/SKILL.md +25 -6
  79. package/.codex/skills/maestro-ralph/SKILL.md +17 -10
  80. package/.codex/skills/maestro-ralph-execute/SKILL.md +2 -1
  81. package/.codex/skills/maestro-roadmap/SKILL.md +35 -4
  82. package/.codex/skills/maestro-ui-codify/SKILL.md +38 -10
  83. package/.codex/skills/maestro-verify/SKILL.md +40 -5
  84. package/.codex/skills/manage-codebase-rebuild/SKILL.md +52 -5
  85. package/.codex/skills/manage-issue-discover/SKILL.md +106 -15
  86. package/.codex/skills/quality-auto-test/SKILL.md +70 -16
  87. package/.codex/skills/quality-debug/SKILL.md +139 -28
  88. package/.codex/skills/quality-refactor/SKILL.md +61 -11
  89. package/.codex/skills/quality-review/SKILL.md +45 -9
  90. package/.codex/skills/quality-test/SKILL.md +58 -3
  91. package/.codex/skills/security-audit/SKILL.md +38 -0
  92. package/.codex/skills/spec-map/SKILL.md +65 -8
  93. package/.codex/skills/team-coordinate/SKILL.md +28 -11
  94. package/.codex/skills/team-coordinate/specs/role-catalog.md +20 -0
  95. package/.codex/skills/team-lifecycle-v4/SKILL.md +23 -7
  96. package/.codex/skills/team-lifecycle-v4/instructions/agent-instruction.md +20 -0
  97. package/.codex/skills/team-quality-assurance/SKILL.md +40 -2
  98. package/.codex/skills/team-review/SKILL.md +42 -2
  99. package/.codex/skills/team-tech-debt/SKILL.md +45 -2
  100. package/.codex/skills/team-testing/SKILL.md +42 -2
  101. package/dashboard/dist-server/dashboard/src/server/wiki/search.d.ts +6 -4
  102. package/dashboard/dist-server/dashboard/src/server/wiki/search.js +50 -8
  103. package/dashboard/dist-server/dashboard/src/server/wiki/search.js.map +1 -1
  104. package/dashboard/dist-server/dashboard/src/server/wiki/virtual-wiki-adapters.d.ts +32 -0
  105. package/dashboard/dist-server/dashboard/src/server/wiki/virtual-wiki-adapters.js +294 -0
  106. package/dashboard/dist-server/dashboard/src/server/wiki/virtual-wiki-adapters.js.map +1 -1
  107. package/dashboard/dist-server/dashboard/src/server/wiki/wiki-indexer.d.ts +1 -0
  108. package/dashboard/dist-server/dashboard/src/server/wiki/wiki-indexer.js +35 -1
  109. package/dashboard/dist-server/dashboard/src/server/wiki/wiki-indexer.js.map +1 -1
  110. package/dashboard/dist-server/dashboard/src/server/wiki/wiki-indexer.test.js +235 -0
  111. package/dashboard/dist-server/dashboard/src/server/wiki/wiki-indexer.test.js.map +1 -1
  112. package/dist/src/commands/install.js +5 -1
  113. package/dist/src/commands/install.js.map +1 -1
  114. package/dist/src/i18n/locales/en.d.ts.map +1 -1
  115. package/dist/src/i18n/locales/en.js +9 -0
  116. package/dist/src/i18n/locales/en.js.map +1 -1
  117. package/dist/src/i18n/locales/zh.d.ts.map +1 -1
  118. package/dist/src/i18n/locales/zh.js +9 -0
  119. package/dist/src/i18n/locales/zh.js.map +1 -1
  120. package/dist/src/i18n/types.d.ts +3 -0
  121. package/dist/src/i18n/types.d.ts.map +1 -1
  122. package/dist/src/ralph/cmd-check.js +1 -1
  123. package/dist/src/ralph/cmd-check.js.map +1 -1
  124. package/dist/src/ralph/cmd-complete.js +1 -1
  125. package/dist/src/ralph/cmd-complete.js.map +1 -1
  126. package/dist/src/ralph/cmd-next.d.ts.map +1 -1
  127. package/dist/src/ralph/cmd-next.js +12 -4
  128. package/dist/src/ralph/cmd-next.js.map +1 -1
  129. package/dist/src/ralph/cmd-session.js +2 -2
  130. package/dist/src/ralph/cmd-session.js.map +1 -1
  131. package/dist/src/ralph/status-store.d.ts +8 -1
  132. package/dist/src/ralph/status-store.d.ts.map +1 -1
  133. package/dist/src/ralph/status-store.js +12 -2
  134. package/dist/src/ralph/status-store.js.map +1 -1
  135. package/dist/src/tools/store-knowhow.d.ts.map +1 -1
  136. package/dist/src/tools/store-knowhow.js +51 -64
  137. package/dist/src/tools/store-knowhow.js.map +1 -1
  138. package/dist/src/tui/install-ui/HooksConfig.d.ts +5 -1
  139. package/dist/src/tui/install-ui/HooksConfig.d.ts.map +1 -1
  140. package/dist/src/tui/install-ui/HooksConfig.js +5 -3
  141. package/dist/src/tui/install-ui/HooksConfig.js.map +1 -1
  142. package/dist/src/tui/install-ui/InstallConfirm.d.ts +2 -0
  143. package/dist/src/tui/install-ui/InstallConfirm.d.ts.map +1 -1
  144. package/dist/src/tui/install-ui/InstallConfirm.js +1 -1
  145. package/dist/src/tui/install-ui/InstallConfirm.js.map +1 -1
  146. package/dist/src/tui/install-ui/InstallExecution.d.ts +1 -0
  147. package/dist/src/tui/install-ui/InstallExecution.d.ts.map +1 -1
  148. package/dist/src/tui/install-ui/InstallExecution.js +26 -3
  149. package/dist/src/tui/install-ui/InstallExecution.js.map +1 -1
  150. package/dist/src/tui/install-ui/InstallFlow.d.ts +1 -1
  151. package/dist/src/tui/install-ui/InstallFlow.d.ts.map +1 -1
  152. package/dist/src/tui/install-ui/InstallFlow.js +76 -16
  153. package/dist/src/tui/install-ui/InstallFlow.js.map +1 -1
  154. package/dist/src/tui/install-ui/InstallHub.d.ts +2 -0
  155. package/dist/src/tui/install-ui/InstallHub.d.ts.map +1 -1
  156. package/dist/src/tui/install-ui/InstallHub.js +8 -0
  157. package/dist/src/tui/install-ui/InstallHub.js.map +1 -1
  158. package/dist/src/tui/install-ui/InstallResult.d.ts.map +1 -1
  159. package/dist/src/tui/install-ui/InstallResult.js +1 -1
  160. package/dist/src/tui/install-ui/InstallResult.js.map +1 -1
  161. package/dist/src/utils/update-notices.js +23 -0
  162. package/dist/src/utils/update-notices.js.map +1 -1
  163. package/package.json +1 -1
  164. package/workflows/finish-work.md +119 -0
  165. package/workflows/milestone-complete.md +23 -1
@@ -0,0 +1,92 @@
+ """Pluggable scoring module.
+
+ Two scorer types:
+ - ScriptScorer: runs user-defined Python rule on ant artifacts (deterministic)
+ - FallbackScorer: derives effective_score from self_score * self_confidence
+
+ LLM scorer is handled by the scorer worker role, not this script.
+ This module is invoked by aco.py when scoring.mode = "script" or as fallback.
+
+ Spec: ../specs/ant-output-schema.md (two-layer scoring)
+ """
+ from __future__ import annotations
+
+ import importlib.util
+ import json
+ from pathlib import Path
+ from typing import Dict, Optional
+
+
+ class BaseScorer:
+     def score(self, ant_artifact: dict) -> Optional[float]: # noqa: ARG002
+         raise NotImplementedError
+
+
+ class FallbackScorer(BaseScorer):
+     """Used when no verified_scores file exists.
+
+     effective_score = self_score * self_confidence * discount
+     """
+
+     def __init__(self, discount: float = 0.5):
+         self.discount = discount
+
+     def score(self, ant_artifact: dict) -> float:
+         s = ant_artifact.get("self_score", 0.0)
+         c = ant_artifact.get("self_confidence", 0.5)
+         return s * c * self.discount
+
+
+ class ScriptScorer(BaseScorer):
+     """Loads user-defined scoring rule from a Python file.
+
+     The rule file must define: `def score(ant_artifact: dict) -> float`
+     Returns a value in [0.0, 1.0].
+     """
+
+     def __init__(self, rule_path: Path):
+         spec = importlib.util.spec_from_file_location("user_score_rule", rule_path)
+         if spec is None or spec.loader is None:
+             raise ValueError(f"cannot load scoring rule from {rule_path}")
+         self.module = importlib.util.module_from_spec(spec)
+         spec.loader.exec_module(self.module)
+         if not hasattr(self.module, "score"):
+             raise ValueError(f"{rule_path} must define `score(ant_artifact) -> float`")
+
+     def score(self, ant_artifact: dict) -> float:
+         v = self.module.score(ant_artifact)
+         return max(0.0, min(1.0, float(v)))
+
+
+ def load_verified_scores(scores_file: Path) -> Dict[str, float]:
+     """Load pre-computed verified_scores from scorer role output (if exists)."""
+     if not scores_file.exists():
+         return {}
+     data = json.loads(scores_file.read_text())
+     return {
+         ant_id: entry["verified_score"]
+         for ant_id, entry in data.get("scores", {}).items()
+     }
+
+
+ def resolve_score(
+     ant_artifact: dict,
+     verified_scores: Dict[str, float],
+     script_scorer: Optional[ScriptScorer],
+     fallback: FallbackScorer,
+ ) -> tuple[float, str]:
+     """Return (score, source) using priority: verified > script > fallback."""
+     ant_id = ant_artifact.get("ant_id", "")
+     if ant_id in verified_scores:
+         return verified_scores[ant_id], "verified_llm"
+     if script_scorer is not None:
+         try:
+             return script_scorer.score(ant_artifact), "verified_script"
+         except Exception as e:
+             print(f"warning: script scorer failed for {ant_id}: {e}")
+     return fallback.score(ant_artifact), "fallback_self"
+
+
+ def hallucination_check(self_score: float, verified_score: float, threshold: float = 0.4) -> bool:
+     """True if self vs verified divergence exceeds threshold."""
+     return abs(self_score - verified_score) > threshold
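The priority order stated in `resolve_score` (verified > script > fallback) and the fallback arithmetic can be illustrated with a small standalone sketch. It does not import the package: `pick_score` and the sample artifact are hypothetical names used only for illustration, and the 0.5 discount and 0.4 divergence threshold simply mirror the defaults visible in the hunk above.

```python
from typing import Callable, Dict, Optional, Tuple


def pick_score(
    artifact: dict,
    verified: Dict[str, float],
    rule: Optional[Callable[[dict], float]] = None,
    discount: float = 0.5,
) -> Tuple[float, str]:
    """Hypothetical restatement of the verified > script > fallback order."""
    ant_id = artifact.get("ant_id", "")
    # 1. A verified score for this ant wins outright.
    if ant_id in verified:
        return verified[ant_id], "verified_llm"
    # 2. Otherwise try the user rule, clamped to [0.0, 1.0].
    if rule is not None:
        try:
            return max(0.0, min(1.0, float(rule(artifact)))), "verified_script"
        except Exception:
            pass  # fall through to the self-reported score
    # 3. Last resort: discounted self-report.
    s = artifact.get("self_score", 0.0)
    c = artifact.get("self_confidence", 0.5)
    return s * c * discount, "fallback_self"


artifact = {"ant_id": "ANT-1-1", "self_score": 0.8, "self_confidence": 0.6}
by_llm = pick_score(artifact, {"ANT-1-1": 0.2})
by_rule = pick_score(artifact, {}, rule=lambda a: a["self_score"] * 2)
by_self = pick_score(artifact, {})
# Divergence test in the spirit of hallucination_check (threshold 0.4).
diverges = abs(artifact["self_score"] - by_llm[0]) > 0.4
print(by_llm, by_rule, by_self, diverges)
```

In this sketch the same artifact yields three different sources depending on what is available, and the self-reported 0.8 against a verified 0.2 exceeds the 0.4 threshold.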
@@ -0,0 +1,475 @@
+ """End-to-end tests for team-swarm scripts.
+
+ Runs each scenario in a clean tmp directory and asserts on outputs.
+ No external test framework — runnable as `python test_aco.py`.
+ """
+ from __future__ import annotations
+
+ import json
+ import subprocess
+ import sys
+ import tempfile
+ from pathlib import Path
+ from typing import Optional
+
+ SCRIPT_DIR = Path(__file__).parent
+ ACO = SCRIPT_DIR / "aco.py"
+
+ # Import modules directly for unit-level tests
+ sys.path.insert(0, str(SCRIPT_DIR))
+ from pheromone import PheromoneState, edge_key # noqa: E402
+ from scoring import FallbackScorer, ScriptScorer, hallucination_check, resolve_score # noqa: E402
+
+
+ # ---------------------------------------------------------------------------
+ # Helpers
+ # ---------------------------------------------------------------------------
+
+ PASS = 0
+ FAIL = 0
+ FAILED_NAMES = []
+
+
+ def run_aco(session: Path, *args, expect_exit: int = 0) -> dict:
+     """Invoke aco.py CLI, return parsed stdout JSON."""
+     cmd = [sys.executable, str(ACO), "--session", str(session), *args]
+     proc = subprocess.run(cmd, capture_output=True, text=True)
+     if proc.returncode != expect_exit:
+         raise AssertionError(
+             f"exit={proc.returncode} (expected {expect_exit})\n"
+             f"cmd: {' '.join(cmd)}\nstdout: {proc.stdout}\nstderr: {proc.stderr}"
+         )
+     if not proc.stdout.strip():
+         return {}
+     return json.loads(proc.stdout.strip().splitlines()[-1])
+
+
+ def check(name: str, cond: bool, detail: str = ""):
+     global PASS, FAIL
+     if cond:
+         PASS += 1
+         print(f" PASS {name}")
+     else:
+         FAIL += 1
+         FAILED_NAMES.append(name)
+         print(f" FAIL {name} {detail}")
+
+
+ def section(title: str):
+     print(f"\n=== {title} ===")
+
+
+ def write_config(session: Path, overrides: Optional[dict] = None) -> dict:
+     cfg = {
+         "swarm": {"n_ants": 3, "max_iterations": 3, "elite_keep": 2},
+         "aco": {"alpha": 1.0, "beta": 2.0, "rho": 0.2, "q": 1.0,
+                 "tau_init": 1.0, "tau_min": 0.01, "tau_max": 10.0},
+         "task_space": {"type": "graph",
+                        "nodes": ["a", "b", "c", "d", "e"],
+                        "max_path_length": 3,
+                        "start_nodes": "any",
+                        "edges": "complete"},
+         "scoring": {"mode": "fallback", "self_score_discount": 0.5},
+         "ant_prompt": {"objective": "test", "evidence_requirements": []},
+         "convergence": {
+             "max_iterations": 3,
+             "stagnation": {"enabled": True, "patience": 2, "min_delta": 0.01},
+             "entropy_floor": {"enabled": True, "threshold": 0.1},
+             "target_score": {"enabled": True, "value": 0.95},
+         },
+     }
+     if overrides:
+         _deep_merge(cfg, overrides)
+     session.mkdir(parents=True, exist_ok=True)
+     (session / "swarm-config.json").write_text(json.dumps(cfg))
+     return cfg
+
+
+ def _deep_merge(base: dict, overrides: dict):
+     for k, v in overrides.items():
+         if isinstance(v, dict) and isinstance(base.get(k), dict):
+             _deep_merge(base[k], v)
+         else:
+             base[k] = v
+
+
+ def write_ant(session: Path, iteration: int, ant_idx: int,
+               path: list, self_score: float = 0.6, self_confidence: float = 0.7) -> Path:
+     artifacts = session / "artifacts"
+     artifacts.mkdir(exist_ok=True)
+     decisions = [
+         {"from": path[i], "to": path[i + 1], "rationale": "r",
+          "guided_by": "pheromone", "deviation_from_hint": False}
+         for i in range(len(path) - 1)
+     ]
+     art = {
+         "schema_version": "1.0",
+         "ant_id": f"ANT-{iteration}-{ant_idx}",
+         "iteration": iteration,
+         "assignment": {"start_node": path[0], "max_path_length": 3},
+         "path": path,
+         "path_decisions": decisions,
+         "self_score": self_score,
+         "self_confidence": self_confidence,
+         "evidence": [f"src/{path[-1]}.ts:{ant_idx}"],
+         "candidate_solution": {"type": "string", "summary": f"sol-{ant_idx}",
+                                "content": str(path)},
+     }
+     p = artifacts / f"ant-{iteration}-{ant_idx}.json"
+     p.write_text(json.dumps(art))
+     return p
+
+
+ # ---------------------------------------------------------------------------
+ # Unit tests — pheromone.py
+ # ---------------------------------------------------------------------------
+
+ def test_pheromone_unit():
+     section("pheromone.py unit")
+
+     s = PheromoneState.initialize(["a", "b", "c"], {})
+     check("init creates n*(n-1)/2 edges", len(s.tau) == 3, f"got {len(s.tau)}")
+     check("init uses default alpha=1.0", s.metadata["alpha"] == 1.0)
+     check("init uses default rho=0.2", s.metadata["rho"] == 0.2)
+     check("init all tau equal", len(set(s.tau.values())) == 1)
+
+     s.evaporate()
+     check("evaporate reduces by rho", abs(s.tau[edge_key("a", "b")] - 0.8) < 1e-9,
+           f"got {s.tau[edge_key('a', 'b')]}")
+
+     s.deposit(["a", "b", "c"], 0.5)
+     expected_ab = 0.8 + 0.5 * 1.0
+     check("deposit adds q*score per edge",
+           abs(s.tau[edge_key("a", "b")] - expected_ab) < 1e-9,
+           f"got {s.tau[edge_key('a', 'b')]}, expected {expected_ab}")
+
+     s.metadata["tau_max"] = 2.0
+     s.tau[edge_key("a", "b")] = 100.0
+     s.clip()
+     check("clip enforces tau_max", s.tau[edge_key("a", "b")] == 2.0)
+     check("clip enforces tau_min on small values",
+           all(v >= s.metadata["tau_min"] for v in s.tau.values()))
+
+     stats = s.stats()
+     check("stats has entropy field", "entropy" in stats)
+     check("stats entropy is positive", stats["entropy"] > 0)
+
+     probs = s.select_neighbors("a", ["a", "b", "c"])
+     check("select_neighbors excludes current node", "a" not in probs)
+     check("select_neighbors probs sum to 1",
+           abs(sum(probs.values()) - 1.0) < 1e-9, f"got {sum(probs.values())}")
+
+     empty = PheromoneState.initialize(["a"], {})
+     check("single-node init produces 0 edges", len(empty.tau) == 0)
+     check("empty stats handles 0-edge case", empty.stats()["entropy"] == 0.0)
+
+     s2 = PheromoneState.initialize(["a", "b", "c"], {})
+     with tempfile.TemporaryDirectory() as td:
+         p = Path(td) / "p.json"
+         s2.save(p)
+         s3 = PheromoneState.load(p)
+         check("save/load roundtrip preserves tau", s2.tau == s3.tau)
+         check("save/load preserves metadata", s2.metadata == s3.metadata)
+
+
+ # ---------------------------------------------------------------------------
+ # Unit tests — scoring.py
+ # ---------------------------------------------------------------------------
+
+ def test_scoring_unit():
+     section("scoring.py unit")
+
+     fb = FallbackScorer(discount=0.5)
+     artifact = {"self_score": 0.8, "self_confidence": 0.6}
+     expected = 0.8 * 0.6 * 0.5
+     check("FallbackScorer = self * conf * discount",
+           abs(fb.score(artifact) - expected) < 1e-9)
+
+     artifact_missing = {}
+     check("FallbackScorer handles missing fields",
+           fb.score(artifact_missing) == 0.0)
+
+     with tempfile.TemporaryDirectory() as td:
+         rule = Path(td) / "rule.py"
+         rule.write_text("def score(ant_artifact):\n return ant_artifact.get('self_score', 0) * 2\n")
+         ss = ScriptScorer(rule)
+         check("ScriptScorer loads user rule", ss.score({"self_score": 0.3}) == 0.6)
+         check("ScriptScorer clamps > 1.0", ss.score({"self_score": 0.9}) == 1.0)
+         check("ScriptScorer clamps < 0.0", ss.score({"self_score": -0.5}) == 0.0)
+
+     check("hallucination_check true at diff > 0.4",
+           hallucination_check(0.9, 0.4) is True)
+     check("hallucination_check false at diff < 0.4",
+           hallucination_check(0.5, 0.4) is False)
+
+     artifact_v = {"ant_id": "X", "self_score": 0.5, "self_confidence": 0.5}
+     score, src = resolve_score(artifact_v, {"X": 0.9}, None, fb)
+     check("resolve_score prefers verified", score == 0.9 and src == "verified_llm")
+
+     score, src = resolve_score(artifact_v, {}, None, fb)
+     check("resolve_score falls back when no verified",
+           src == "fallback_self")
+
+
+ # ---------------------------------------------------------------------------
+ # CLI integration — full pipeline (3 iterations, fallback scoring)
+ # ---------------------------------------------------------------------------
+
+ def test_full_pipeline_3_iterations():
+     section("aco.py — full 3-iteration pipeline (fallback scoring)")
+
+     with tempfile.TemporaryDirectory() as td:
+         session = Path(td)
+         write_config(session)
+
+         r = run_aco(session, "init")
+         check("init: status ok", r["status"] == "ok")
+         check("init: 5 nodes -> 10 edges", r["n_edges"] == 10)
+         check("init: pheromone file exists",
+               (session / "pheromone" / "current.json").exists())
+         check("init: task-space file exists",
+               (session / "task-space.json").exists())
+         check("init: init.json frozen",
+               (session / "pheromone" / "init.json").exists())
+
+         best_history = []
+         entropy_history = []
+
+         for k in range(1, 4):
+             sel = run_aco(session, "select", "--iter", str(k))
+             check(f"iter{k} select: 3 assignments", len(sel["assignments"]) == 3)
+             check(f"iter{k} select: ant_ids correct",
+                   all(a["ant_id"] == f"ANT-{k}-{i+1}" for i, a in enumerate(sel["assignments"])))
+             check(f"iter{k} select: edge_preferences not empty",
+                   all(a["edge_preferences"] for a in sel["assignments"]))
+
+             paths_by_quality = [
+                 (["a", "b", "c"], 0.9, 0.9),
+                 (["b", "d", "e"], 0.6, 0.7),
+                 (["c", "e", "a"], 0.4, 0.5),
+             ]
+             for i, (path, ss, sc) in enumerate(paths_by_quality, 1):
+                 write_ant(session, k, i, path, ss, sc)
+
+             up = run_aco(session, "update", "--iter", str(k))
+             check(f"iter{k} update: 3 ants processed", up["n_ants_processed"] == 3)
+             check(f"iter{k} update: best_score > 0", up["best_score"] > 0)
+             check(f"iter{k} update: stats has entropy", "entropy" in up["stats"])
+
+             best_history.append(up["best_score"])
+             entropy_history.append(up["stats"]["entropy"])
+
+             check(f"iter{k} history snapshot exists",
+                   (session / "pheromone" / "history" / f"{k}.json").exists())
+             check(f"iter{k} trails written",
+                   (session / "trails" / f"{k}.jsonl").exists())
+
+         check("best_score stable after iter1 (same ants each iter)",
+               best_history[0] == best_history[-1])
+         check("entropy decreases over iterations (concentration)",
+               entropy_history[0] >= entropy_history[-1],
+               f"start={entropy_history[0]:.3f} end={entropy_history[-1]:.3f}")
+
+         cv = run_aco(session, "converged")
+         check("converged: returns triggered_by list", isinstance(cv["triggered_by"], list))
+         check("converged: triggers stagnation (best unchanged)",
+               "stagnation" in cv["triggered_by"])
+         check("converged: also triggers max_iterations",
+               "max_iterations" in cv["triggered_by"])
+         check("converged: true after triggers", cv["converged"] is True)
+
+         rep = run_aco(session, "report")
+         check("report: has best", rep["best"] is not None)
+         check("report: top_k present", len(rep["top_k"]) > 0)
+         check("report: convergence curve len == iters",
+               len(rep["convergence_curve"]) == 3)
+         check("report: iterations_completed == 3", rep["iterations_completed"] == 3)
+
+
+ # ---------------------------------------------------------------------------
+ # CLI integration — target_score convergence
+ # ---------------------------------------------------------------------------
+
+ def test_target_score_convergence():
+     section("aco.py — target_score triggers early convergence")
+
+     with tempfile.TemporaryDirectory() as td:
+         session = Path(td)
+         write_config(session, {
+             "convergence": {"target_score": {"enabled": True, "value": 0.30}},
+         })
+         run_aco(session, "init")
+         run_aco(session, "select", "--iter", "1")
+         write_ant(session, 1, 1, ["a", "b"], 0.9, 0.9)
+         write_ant(session, 1, 2, ["b", "c"], 0.6, 0.6)
+         write_ant(session, 1, 3, ["c", "d"], 0.4, 0.4)
+         up = run_aco(session, "update", "--iter", "1")
+
+         check("best_score above target", up["best_score"] >= 0.30)
+         cv = run_aco(session, "converged")
+         check("converged after 1 iter (target hit)",
+               cv["converged"] is True and "target_score" in cv["triggered_by"])
+
+
+ # ---------------------------------------------------------------------------
+ # CLI integration — hallucination detection (verified scores file)
+ # ---------------------------------------------------------------------------
+
+ def test_hallucination_flagging():
+     section("aco.py — hallucination flagging with verified scores")
+
+     with tempfile.TemporaryDirectory() as td:
+         session = Path(td)
+         write_config(session)
+         run_aco(session, "init")
+         run_aco(session, "select", "--iter", "1")
+
+         write_ant(session, 1, 1, ["a", "b"], self_score=0.95, self_confidence=0.9)
+         write_ant(session, 1, 2, ["b", "c"], self_score=0.85, self_confidence=0.8)
+         write_ant(session, 1, 3, ["c", "d"], self_score=0.7, self_confidence=0.7)
+
+         scores_dir = session / "scores"
+         scores_dir.mkdir(exist_ok=True)
+         (scores_dir / "iter-1-scores.json").write_text(json.dumps({
+             "iteration": 1, "scorer_type": "llm",
+             "scores": {
+                 "ANT-1-1": {"verified_score": 0.20, "rationale": "weak"},
+                 "ANT-1-2": {"verified_score": 0.80, "rationale": "ok"},
+                 "ANT-1-3": {"verified_score": 0.65, "rationale": "ok"},
+             },
+         }))
+
+         up = run_aco(session, "update", "--iter", "1")
+         check("ANT-1-1 flagged as hallucination (|0.95-0.20|=0.75 > 0.4)",
+               "ANT-1-1" in up["hallucinations_flagged"])
+         check("ANT-1-2 NOT flagged (|0.85-0.80|=0.05 < 0.4)",
+               "ANT-1-2" not in up["hallucinations_flagged"])
+         check("ANT-1-3 NOT flagged (|0.7-0.65|=0.05 < 0.4)",
+               "ANT-1-3" not in up["hallucinations_flagged"])
+
+
+ # ---------------------------------------------------------------------------
+ # CLI integration — invalid artifact handling
+ # ---------------------------------------------------------------------------
+
+ def test_invalid_artifacts():
+     section("aco.py — invalid artifacts handled gracefully")
+
+     with tempfile.TemporaryDirectory() as td:
+         session = Path(td)
+         write_config(session)
+         run_aco(session, "init")
+         run_aco(session, "select", "--iter", "1")
+
+         write_ant(session, 1, 1, ["a", "b"])
+
+         artifacts = session / "artifacts"
+         (artifacts / "ant-1-2.json").write_text(json.dumps({
+             "schema_version": "1.0", "ant_id": "ANT-1-2", "iteration": 1,
+             "path": ["a", "ZZZ_NOT_A_NODE"],
+             "path_decisions": [{"from": "a", "to": "ZZZ_NOT_A_NODE", "rationale": "x",
+                                 "guided_by": "x", "deviation_from_hint": False}],
+             "self_score": 0.5, "self_confidence": 0.5,
+             "evidence": ["x"], "candidate_solution": {"summary": "x"},
+         }))
+
+         (artifacts / "ant-1-3.json").write_text("{ malformed json")
+
+         up = run_aco(session, "update", "--iter", "1")
+         check("only 1 valid ant processed (2 rejected)",
+               up["n_ants_processed"] == 1, f"got {up['n_ants_processed']}")
+
+
+ def test_config_validation():
+     section("aco.py — config validation error paths")
+
+     with tempfile.TemporaryDirectory() as td:
+         session = Path(td)
+         session.mkdir(exist_ok=True)
+         (session / "swarm-config.json").write_text(json.dumps({"task_space": {}, "aco": {}}))
+         r = run_aco(session, "init", expect_exit=2)
+         check("missing nodes -> exit 2", r["status"] == "error")
+         check("error message mentions nodes",
+               "nodes" in r["message"] or "auto_discover" in r["message"])
+
+     with tempfile.TemporaryDirectory() as td:
+         session = Path(td)
+         session.mkdir(exist_ok=True)
+         r = run_aco(session, "init", expect_exit=2)
+         check("missing config -> exit 2", r["status"] == "error")
+
+
+ def test_idempotent_update():
+     section("aco.py — update is idempotent (re-running same iter is safe)")
+
+     with tempfile.TemporaryDirectory() as td:
+         session = Path(td)
+         write_config(session, {"aco": {"rho": 0.0}})
+         run_aco(session, "init")
+         run_aco(session, "select", "--iter", "1")
+         write_ant(session, 1, 1, ["a", "b"], 0.8, 0.8)
+         write_ant(session, 1, 2, ["b", "c"], 0.6, 0.7)
+         write_ant(session, 1, 3, ["c", "d"], 0.4, 0.5)
+
+         up1 = run_aco(session, "update", "--iter", "1")
+         up2 = run_aco(session, "update", "--iter", "1")
+         check("update re-run keeps n_ants_processed stable",
+               up1["n_ants_processed"] == up2["n_ants_processed"])
+         check("update re-run keeps best ant stable",
+               up1["best_score"] == up2["best_score"])
+
+
+ def test_auto_discover_from_glob():
+     section("aco.py — auto_discover_from glob")
+
+     with tempfile.TemporaryDirectory() as td:
+         td_path = Path(td)
+         for name in ["alpha.txt", "beta.txt", "gamma.txt"]:
+             (td_path / name).write_text("data")
+
+         session = td_path / "session"
+         session.mkdir()
+         (session / "swarm-config.json").write_text(json.dumps({
+             "swarm": {"n_ants": 2}, "aco": {},
+             "task_space": {"auto_discover_from": str(td_path / "*.txt"),
+                            "max_path_length": 2},
+             "scoring": {"mode": "fallback"},
+             "ant_prompt": {"objective": "x"},
+             "convergence": {"max_iterations": 1},
+         }))
+         r = run_aco(session, "init")
+         check("auto_discover finds 3 files", r["n_nodes"] == 3,
+               f"got {r['n_nodes']}")
+
444
+
+ # ---------------------------------------------------------------------------
+ # Main
+ # ---------------------------------------------------------------------------
+
+ def main():
+     print("=" * 60)
+     print("team-swarm scripts test suite")
+     print("=" * 60)
+
+     test_pheromone_unit()
+     test_scoring_unit()
+     test_full_pipeline_3_iterations()
+     test_target_score_convergence()
+     test_hallucination_flagging()
+     test_invalid_artifacts()
+     test_config_validation()
+     test_idempotent_update()
+     test_auto_discover_from_glob()
+
+     print("\n" + "=" * 60)
+     print(f"Results: {PASS} passed, {FAIL} failed")
+     if FAILED_NAMES:
+         print("\nFailed tests:")
+         for name in FAILED_NAMES:
+             print(f" - {name}")
+     print("=" * 60)
+     sys.exit(0 if FAIL == 0 else 1)
+
+
+ if __name__ == "__main__":
+     main()
@@ -0,0 +1,119 @@
+ # Ant Output Schema
+
+ **The critical contract.** Every ant MUST write a JSON file matching this schema. Pheromone updates depend on it. Schema violation = ant output discarded + worker error reported.
+
+ ## File Path
+
+ ```
+ <session>/artifacts/ant-<iteration>-<ant_id>.json
+ ```
+
+ Example: `artifacts/ant-3-2.json` (ant id 2 in iteration 3)
+
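A minimal Python sketch of the naming rule above. It assumes the `<ant_id>` path segment is the ant's index within the iteration, as the `ant-3-2.json` example suggests, and that the JSON `ant_id` label follows `ANT-<iteration>-<index>`. The helper names `artifact_path` and `ant_label` are hypothetical and are not taken from `aco.py`.

```python
from pathlib import Path


def artifact_path(session: Path, iteration: int, ant_index: int) -> Path:
    """Build <session>/artifacts/ant-<iteration>-<ant_index>.json (assumed layout)."""
    return session / "artifacts" / f"ant-{iteration}-{ant_index}.json"


def ant_label(iteration: int, ant_index: int) -> str:
    """Build the label used in the JSON `ant_id` field (assumed format)."""
    return f"ANT-{iteration}-{ant_index}"


if __name__ == "__main__":
    print(artifact_path(Path("session"), 3, 2).as_posix())
    print(ant_label(3, 2))
```

Keeping both strings derived from the same two integers avoids the file name and the `ant_id` field drifting apart.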
+ ## Schema
+
+ ```json
+ {
+   "schema_version": "1.0",
+   "ant_id": "ANT-3-2",
+   "iteration": 3,
+   "assignment": {
+     "start_node": "node_a",
+     "max_path_length": 5
+   },
+   "path": ["node_a", "node_c", "node_f"],
+   "path_decisions": [
+     {
+       "from": "node_a",
+       "to": "node_c",
+       "rationale": "<one-line reason>",
+       "guided_by": "pheromone | heuristic | evidence",
+       "pheromone_weight": 0.30,
+       "deviation_from_hint": false
+     },
+     {
+       "from": "node_c",
+       "to": "node_f",
+       "rationale": "<one-line reason>",
+       "guided_by": "evidence",
+       "deviation_from_hint": true
+     }
+   ],
+   "self_score": 0.78,
+   "self_confidence": 0.6,
+   "cost_tokens": 1200,
+   "cost_seconds": 18,
+   "evidence": [
+     "src/foo.ts:42",
+     "tests/foo.spec.ts:18"
+   ],
+   "candidate_solution": {
+     "type": "<string|object|file_ref>",
+     "summary": "<one-line>",
+     "content": "<actual artifact content OR a path>"
+   },
+   "blockers": [],
+   "notes": "<optional free text, NOT used by pheromone update>"
+ }
+ ```
+
+ ## Required Fields (Validation)
+
+ | Field | Type | Required | Constraint |
+ |-------|------|----------|------------|
+ | `schema_version` | string | yes | must be `"1.0"` |
+ | `ant_id` | string | yes | matches assignment |
+ | `iteration` | int | yes | matches assignment |
+ | `path` | array of string | yes | len >= 1, all nodes ∈ task_space.nodes |
+ | `path_decisions` | array | yes | len = len(path) - 1 |
+ | `self_score` | float | yes | 0.0 ≤ x ≤ 1.0 |
+ | `self_confidence` | float | yes | 0.0 ≤ x ≤ 1.0 |
+ | `cost_tokens` | int | no | recommended for budget tracking |
+ | `evidence` | array of string | yes | min 1 entry (forces grounding) |
+ | `candidate_solution` | object | yes | non-empty `summary` |
+
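The table above can be read as a checklist. The following is a minimal sketch of such a check in Python, not the validator shipped in `aco.py`: the function name `validate_ant_output`, the error strings, and the `known_nodes` parameter (standing in for `task_space.nodes`) are all assumptions. The "matches assignment" constraints on `ant_id` and `iteration` are left out because they need the coordinator's assignment as an extra input.

```python
from typing import Any

REQUIRED_FIELDS = (
    "schema_version", "ant_id", "iteration", "path", "path_decisions",
    "self_score", "self_confidence", "evidence", "candidate_solution",
)


def _is_unit_interval(value: Any) -> bool:
    """True for a real number in [0.0, 1.0]; bool is rejected explicitly."""
    return (isinstance(value, (int, float)) and not isinstance(value, bool)
            and 0.0 <= value <= 1.0)


def validate_ant_output(doc: dict[str, Any], known_nodes: set[str]) -> list[str]:
    """Return a list of violations; an empty list means the document passes."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in doc]
    if errors:
        return errors  # the constraint checks below assume every field exists

    if doc["schema_version"] != "1.0":
        errors.append('schema_version must be "1.0"')

    path, decisions = doc["path"], doc["path_decisions"]
    if not isinstance(path, list) or not path:
        errors.append("path must contain at least one node")
    else:
        unknown = [node for node in path if node not in known_nodes]
        if unknown:
            errors.append(f"path has nodes outside the task space: {unknown}")
        if not isinstance(decisions, list) or len(decisions) != len(path) - 1:
            errors.append("path_decisions length must equal len(path) - 1")

    for name in ("self_score", "self_confidence"):
        if not _is_unit_interval(doc[name]):
            errors.append(f"{name} must be a number in [0.0, 1.0]")

    if not isinstance(doc["evidence"], list) or not doc["evidence"]:
        errors.append("evidence needs at least one entry")

    solution = doc["candidate_solution"]
    if not isinstance(solution, dict) or not solution.get("summary"):
        errors.append("candidate_solution.summary must be non-empty")
    return errors
```

Returning every violation at once, rather than raising on the first, gives an ant enough detail to fix its output in a single retry.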
+ ## Two-Layer Scoring
+
+ | Score | Source | Purpose |
+ |-------|--------|---------|
+ | `self_score` | Ant LLM self-report | Cheap early-stop signal; tracked but NOT used for pheromone update |
+ | `self_confidence` | Ant LLM self-report | Used to weight self_score when no verified_score is available |
+ | `verified_score` | scoring.py OR scorer role | **Authoritative input to pheromone update.** Written to separate file: `<session>/scores/iter-k-scores.json` |
+
+ If `verified_score` is missing for an ant (scorer disabled), pheromone update falls back to:
+ ```
+ effective_score = self_score * self_confidence * config.scoring.self_score_discount # default 0.5
+ ```
+
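The fallback rule can be written as a small function. This is a sketch under the assumption that a missing `verified_score` is represented as `None`; the function name `effective_score` and its signature are illustrative and not taken from `aco.py`.

```python
from typing import Optional


def effective_score(
    self_score: float,
    self_confidence: float,
    verified_score: Optional[float] = None,
    self_score_discount: float = 0.5,  # documented default of config.scoring.self_score_discount
) -> float:
    """Prefer the verified score; otherwise discount the ant's self-report."""
    if verified_score is not None:
        return verified_score
    return self_score * self_confidence * self_score_discount


if __name__ == "__main__":
    # Values from the schema example: 0.78 * 0.6 * 0.5, roughly 0.234
    print(round(effective_score(0.78, 0.6), 3))
    # A verified score always wins over the self-report
    print(effective_score(0.78, 0.6, verified_score=0.45))
```

With the default discount, an unverified ant can contribute at most half of what a verified ant with a perfect score would.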
+ ## verified_scores File
+
+ When scorer runs (script or LLM), produces:
+
+ ```json
+ {
+   "iteration": 3,
+   "scorer_type": "script | llm",
+   "scores": {
+     "ANT-3-1": { "verified_score": 0.82, "rationale": "..." },
+     "ANT-3-2": { "verified_score": 0.45, "rationale": "..." }
+   },
+   "computed_at": "2026-05-25T14:30:00Z"
+ }
+ ```
+
+ ## Hallucination Detection
+
+ `aco.py update` compares `self_score` vs `verified_score` per ant:
+ - `|self_score - verified_score| > 0.4` → flagged as `hallucination_suspected`
+ - Repeat offenders (≥ 3 across iterations) → `aco.py` reduces deposit on their paths by 50%
+
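The two rules above can be sketched as follows. The thresholds (0.4, 3 flags, 50%) come from the text; everything else is an assumption, including the function names and the choice to let the caller supply the running flag count, since the text does not say which identity "repeat offenders" are tracked by.

```python
GAP_THRESHOLD = 0.4   # |self_score - verified_score| above this is suspicious
REPEAT_LIMIT = 3      # flags across iterations before the penalty applies
PENALTY_FACTOR = 0.5  # "reduces deposit ... by 50%"


def flag_hallucinations(
    self_scores: dict[str, float],
    verified_scores: dict[str, float],
) -> list[str]:
    """Return the ids whose self-report strays too far from the verified score."""
    return sorted(
        ant_id
        for ant_id, self_score in self_scores.items()
        if ant_id in verified_scores
        and abs(self_score - verified_scores[ant_id]) > GAP_THRESHOLD
    )


def deposit_multiplier(flag_count: int) -> float:
    """Scale factor for the pheromone deposit, given the flags accumulated so far."""
    return PENALTY_FACTOR if flag_count >= REPEAT_LIMIT else 1.0


if __name__ == "__main__":
    flagged = flag_hallucinations(
        {"ANT-3-1": 0.9, "ANT-3-2": 0.5},
        {"ANT-3-1": 0.3, "ANT-3-2": 0.45},
    )
    print(flagged)
    print(deposit_multiplier(2), deposit_multiplier(3))
```

Ants without a verified score are skipped here, because there is nothing to compare the self-report against.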
+ ## Validation in Ant's Phase 4
+
+ Ant MUST self-validate before writing:
+ 1. JSON parses cleanly
+ 2. All required fields present
+ 3. `path` nodes exist in task-space.json
+ 4. `path_decisions` length = `len(path) - 1`
+ 5. Numeric ranges within bounds
+
+ Validation failure → retry once → if still failing, report `partial_completion` to coordinator.
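
The validate, retry once, then report flow could look like the sketch below. It assumes the ant's work is exposed as three callables; the name `write_with_one_retry` and the `"completed"` status string are hypothetical, and only `partial_completion` comes from the text.

```python
from typing import Callable


def write_with_one_retry(
    produce: Callable[[], dict],
    validate: Callable[[dict], list],
    write: Callable[[dict], None],
) -> str:
    """Write the first valid document out of at most two attempts."""
    for _attempt in range(2):  # the first try plus a single retry
        doc = produce()
        if not validate(doc):  # an empty violation list means the document is valid
            write(doc)
            return "completed"
    # Both attempts failed validation: nothing is written, the coordinator is told
    return "partial_completion"
```

Because `write` is only called after validation passes, an invalid artifact never reaches the `artifacts/` directory in this sketch.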