maestro-flow 0.4.20 → 0.4.21
- package/.agents/skills/maestro-ralph-execute/SKILL.md +2 -1
- package/.agents/skills/maestro-swarm-workflow/SKILL.md +27 -19
- package/.agents/skills/maestro-universal-workflow/SKILL.md +563 -0
- package/.agents/skills/team-adversarial-swarm/SKILL.md +235 -0
- package/.agents/skills/team-adversarial-swarm/scripts/aco.py +473 -0
- package/.agents/skills/team-adversarial-swarm/scripts/pheromone.py +144 -0
- package/.agents/skills/team-adversarial-swarm/scripts/scoring.py +92 -0
- package/.agents/skills/team-adversarial-swarm/scripts/test_aco.py +475 -0
- package/.agents/skills/team-adversarial-swarm/specs/ant-output-schema.md +115 -0
- package/.agents/skills/team-adversarial-swarm/specs/convergence-criteria.md +75 -0
- package/.agents/skills/team-adversarial-swarm/specs/pheromone-schema.md +90 -0
- package/.agents/skills/team-adversarial-swarm/specs/swarm-config-template.json +66 -0
- package/.agents/skills/team-adversarial-swarm/specs/swarm-protocol.md +105 -0
- package/.agents/skills/team-adversarial-swarm/workflows/wf-swarm-converge.js +197 -0
- package/.agents/skills/team-adversarial-swarm/workflows/wf-swarm-explore.js +194 -0
- package/.agents/skills/team-adversarial-swarm/workflows/wf-swarm-score.js +188 -0
- package/.agents/skills/team-adversarial-swarm/workflows/wf-swarm-synthesize.js +248 -0
- package/.agy/skills/maestro-ralph-execute/SKILL.md +2 -1
- package/.agy/skills/maestro-swarm-workflow/SKILL.md +27 -19
- package/.agy/skills/maestro-universal-workflow/SKILL.md +560 -0
- package/.agy/skills/team-adversarial-swarm/SKILL.md +244 -0
- package/.agy/skills/team-adversarial-swarm/scripts/aco.py +473 -0
- package/.agy/skills/team-adversarial-swarm/scripts/pheromone.py +144 -0
- package/.agy/skills/team-adversarial-swarm/scripts/scoring.py +92 -0
- package/.agy/skills/team-adversarial-swarm/scripts/test_aco.py +475 -0
- package/.agy/skills/team-adversarial-swarm/specs/ant-output-schema.md +115 -0
- package/.agy/skills/team-adversarial-swarm/specs/convergence-criteria.md +75 -0
- package/.agy/skills/team-adversarial-swarm/specs/pheromone-schema.md +90 -0
- package/.agy/skills/team-adversarial-swarm/specs/swarm-config-template.json +66 -0
- package/.agy/skills/team-adversarial-swarm/specs/swarm-protocol.md +105 -0
- package/.agy/skills/team-adversarial-swarm/workflows/wf-swarm-converge.js +197 -0
- package/.agy/skills/team-adversarial-swarm/workflows/wf-swarm-explore.js +194 -0
- package/.agy/skills/team-adversarial-swarm/workflows/wf-swarm-score.js +188 -0
- package/.agy/skills/team-adversarial-swarm/workflows/wf-swarm-synthesize.js +248 -0
- package/.claude/commands/maestro-ralph-execute.md +2 -1
- package/.claude/commands/maestro-swarm-workflow.md +27 -19
- package/.claude/commands/maestro-universal-workflow.md +561 -0
- package/.claude/skills/team-adversarial-swarm/SKILL.md +233 -0
- package/.claude/skills/team-adversarial-swarm/scripts/aco.py +473 -0
- package/.claude/skills/team-adversarial-swarm/scripts/pheromone.py +144 -0
- package/.claude/skills/team-adversarial-swarm/scripts/scoring.py +92 -0
- package/.claude/skills/team-adversarial-swarm/scripts/test_aco.py +475 -0
- package/.claude/skills/team-adversarial-swarm/specs/ant-output-schema.md +115 -0
- package/.claude/skills/team-adversarial-swarm/specs/convergence-criteria.md +75 -0
- package/.claude/skills/team-adversarial-swarm/specs/pheromone-schema.md +90 -0
- package/.claude/skills/team-adversarial-swarm/specs/swarm-config-template.json +66 -0
- package/.claude/skills/team-adversarial-swarm/specs/swarm-protocol.md +105 -0
- package/.claude/skills/team-adversarial-swarm/workflows/wf-swarm-converge.js +197 -0
- package/.claude/skills/team-adversarial-swarm/workflows/wf-swarm-explore.js +194 -0
- package/.claude/skills/team-adversarial-swarm/workflows/wf-swarm-score.js +188 -0
- package/.claude/skills/team-adversarial-swarm/workflows/wf-swarm-synthesize.js +248 -0
- package/dashboard/dist-server/dashboard/src/server/wiki/graph-analysis.js +1 -1
- package/dashboard/dist-server/dashboard/src/server/wiki/graph-analysis.js.map +1 -1
- package/dashboard/dist-server/dashboard/src/server/wiki/search.js +1 -1
- package/dashboard/dist-server/dashboard/src/server/wiki/search.js.map +1 -1
- package/dashboard/dist-server/dashboard/src/server/wiki/virtual-wiki-adapters.d.ts +1 -1
- package/dashboard/dist-server/dashboard/src/server/wiki/virtual-wiki-adapters.js +5 -5
- package/dashboard/dist-server/dashboard/src/server/wiki/virtual-wiki-adapters.js.map +1 -1
- package/dashboard/dist-server/dashboard/src/server/wiki/wiki-indexer.js +3 -3
- package/dashboard/dist-server/dashboard/src/server/wiki/wiki-indexer.js.map +1 -1
- package/dashboard/dist-server/src/graph/types.d.ts +111 -0
- package/dashboard/dist-server/src/graph/types.js +2 -0
- package/dashboard/dist-server/src/graph/types.js.map +1 -0
- package/dist/src/commands/install-backend.d.ts +0 -7
- package/dist/src/commands/install-backend.d.ts.map +1 -1
- package/dist/src/commands/install-backend.js +0 -14
- package/dist/src/commands/install-backend.js.map +1 -1
- package/dist/src/commands/install.d.ts.map +1 -1
- package/dist/src/commands/install.js +0 -18
- package/dist/src/commands/install.js.map +1 -1
- package/dist/src/commands/kg.d.ts +2 -2
- package/dist/src/commands/kg.d.ts.map +1 -1
- package/dist/src/commands/kg.js +150 -179
- package/dist/src/commands/kg.js.map +1 -1
- package/dist/src/graph/analyzers/fs-analyzer.d.ts +10 -0
- package/dist/src/graph/analyzers/fs-analyzer.d.ts.map +1 -0
- package/dist/src/graph/analyzers/fs-analyzer.js +959 -0
- package/dist/src/graph/analyzers/fs-analyzer.js.map +1 -0
- package/dist/src/graph/index.d.ts +6 -0
- package/dist/src/graph/index.d.ts.map +1 -0
- package/dist/src/graph/index.js +6 -0
- package/dist/src/graph/index.js.map +1 -0
- package/dist/src/graph/loader.d.ts +3 -0
- package/dist/src/graph/loader.d.ts.map +1 -0
- package/dist/src/graph/loader.js +12 -0
- package/dist/src/graph/loader.js.map +1 -0
- package/dist/src/graph/merger.d.ts +56 -0
- package/dist/src/graph/merger.d.ts.map +1 -0
- package/dist/src/graph/merger.js +896 -0
- package/dist/src/graph/merger.js.map +1 -0
- package/dist/src/graph/query.d.ts +7 -0
- package/dist/src/graph/query.d.ts.map +1 -0
- package/dist/src/graph/query.js +126 -0
- package/dist/src/graph/query.js.map +1 -0
- package/dist/src/graph/types.d.ts +112 -0
- package/dist/src/graph/types.d.ts.map +1 -0
- package/dist/src/graph/types.js +2 -0
- package/dist/src/graph/types.js.map +1 -0
- package/dist/src/i18n/locales/en.d.ts.map +1 -1
- package/dist/src/i18n/locales/en.js +0 -10
- package/dist/src/i18n/locales/en.js.map +1 -1
- package/dist/src/i18n/locales/zh.d.ts.map +1 -1
- package/dist/src/i18n/locales/zh.js +0 -10
- package/dist/src/i18n/locales/zh.js.map +1 -1
- package/dist/src/i18n/types.d.ts +0 -9
- package/dist/src/i18n/types.d.ts.map +1 -1
- package/dist/src/tui/install-ui/InstallConfirm.d.ts +0 -1
- package/dist/src/tui/install-ui/InstallConfirm.d.ts.map +1 -1
- package/dist/src/tui/install-ui/InstallConfirm.js +1 -1
- package/dist/src/tui/install-ui/InstallConfirm.js.map +1 -1
- package/dist/src/tui/install-ui/InstallExecution.d.ts +0 -1
- package/dist/src/tui/install-ui/InstallExecution.d.ts.map +1 -1
- package/dist/src/tui/install-ui/InstallExecution.js +0 -22
- package/dist/src/tui/install-ui/InstallExecution.js.map +1 -1
- package/dist/src/tui/install-ui/InstallFlow.d.ts +1 -1
- package/dist/src/tui/install-ui/InstallFlow.d.ts.map +1 -1
- package/dist/src/tui/install-ui/InstallFlow.js +5 -23
- package/dist/src/tui/install-ui/InstallFlow.js.map +1 -1
- package/dist/src/tui/install-ui/InstallHub.d.ts +0 -2
- package/dist/src/tui/install-ui/InstallHub.d.ts.map +1 -1
- package/dist/src/tui/install-ui/InstallHub.js +0 -6
- package/dist/src/tui/install-ui/InstallHub.js.map +1 -1
- package/dist/src/tui/install-ui/InstallResult.d.ts.map +1 -1
- package/dist/src/tui/install-ui/InstallResult.js +1 -1
- package/dist/src/tui/install-ui/InstallResult.js.map +1 -1
- package/dist/src/utils/update-notices.js +12 -0
- package/dist/src/utils/update-notices.js.map +1 -1
- package/package.json +1 -1
- package/workflows/swarm/wf-analyze.js +195 -34
- package/workflows/swarm/wf-brainstorm.js +225 -53
- package/workflows/swarm/wf-execute.js +199 -23
- package/workflows/swarm/wf-grill.js +181 -20
- package/workflows/swarm/wf-milestone-audit.js +178 -29
- package/workflows/swarm/wf-plan.js +288 -53
- package/workflows/swarm/wf-review.js +195 -80
- package/workflows/swarm/wf-verify.js +125 -28
@@ -0,0 +1,92 @@
+"""Pluggable scoring module.
+
+Two scorer types:
+- ScriptScorer: runs user-defined Python rule on ant artifacts (deterministic)
+- FallbackScorer: derives effective_score from self_score * self_confidence
+
+LLM scorer is handled by the scorer worker role, not this script.
+This module is invoked by aco.py when scoring.mode = "script" or as fallback.
+
+Spec: ../specs/ant-output-schema.md (two-layer scoring)
+"""
+from __future__ import annotations
+
+import importlib.util
+import json
+from pathlib import Path
+from typing import Dict, Optional
+
+
+class BaseScorer:
+    def score(self, ant_artifact: dict) -> Optional[float]:  # noqa: ARG002
+        raise NotImplementedError
+
+
+class FallbackScorer(BaseScorer):
+    """Used when no verified_scores file exists.
+
+    effective_score = self_score * self_confidence * discount
+    """
+
+    def __init__(self, discount: float = 0.5):
+        self.discount = discount
+
+    def score(self, ant_artifact: dict) -> float:
+        s = ant_artifact.get("self_score", 0.0)
+        c = ant_artifact.get("self_confidence", 0.5)
+        return s * c * self.discount
+
+
+class ScriptScorer(BaseScorer):
+    """Loads user-defined scoring rule from a Python file.
+
+    The rule file must define: `def score(ant_artifact: dict) -> float`
+    Returns a value in [0.0, 1.0].
+    """
+
+    def __init__(self, rule_path: Path):
+        spec = importlib.util.spec_from_file_location("user_score_rule", rule_path)
+        if spec is None or spec.loader is None:
+            raise ValueError(f"cannot load scoring rule from {rule_path}")
+        self.module = importlib.util.module_from_spec(spec)
+        spec.loader.exec_module(self.module)
+        if not hasattr(self.module, "score"):
+            raise ValueError(f"{rule_path} must define `score(ant_artifact) -> float`")
+
+    def score(self, ant_artifact: dict) -> float:
+        v = self.module.score(ant_artifact)
+        return max(0.0, min(1.0, float(v)))
+
+
+def load_verified_scores(scores_file: Path) -> Dict[str, float]:
+    """Load pre-computed verified_scores from scorer role output (if exists)."""
+    if not scores_file.exists():
+        return {}
+    data = json.loads(scores_file.read_text())
+    return {
+        ant_id: entry["verified_score"]
+        for ant_id, entry in data.get("scores", {}).items()
+    }
+
+
+def resolve_score(
+    ant_artifact: dict,
+    verified_scores: Dict[str, float],
+    script_scorer: Optional[ScriptScorer],
+    fallback: FallbackScorer,
+) -> tuple[float, str]:
+    """Return (score, source) using priority: verified > script > fallback."""
+    ant_id = ant_artifact.get("ant_id", "")
+    if ant_id in verified_scores:
+        return verified_scores[ant_id], "verified_llm"
+    if script_scorer is not None:
+        try:
+            return script_scorer.score(ant_artifact), "verified_script"
+        except Exception as e:
+            print(f"warning: script scorer failed for {ant_id}: {e}")
+    return fallback.score(ant_artifact), "fallback_self"
+
+
+def hallucination_check(self_score: float, verified_score: float, threshold: float = 0.4) -> bool:
+    """True if self vs verified divergence exceeds threshold."""
+    return abs(self_score - verified_score) > threshold
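The scoring hunk describes a three-level priority (verified, then script, then fallback) plus a divergence check between self-reported and verified scores. The sketch below is a standalone illustration of that behaviour, not the package's code: the names `clamp01`, `fallback_score`, `resolve`, and `diverges` are hypothetical, and a plain callable stands in for the rule file that `ScriptScorer` would load. The defaults (discount 0.5, missing confidence 0.5, threshold 0.4) are taken from the hunk.

```python
from typing import Callable, Dict, Optional, Tuple


def clamp01(value: float) -> float:
    """Clamp a raw rule output into [0.0, 1.0], as the script scorer does."""
    return max(0.0, min(1.0, float(value)))


def fallback_score(artifact: dict, discount: float = 0.5) -> float:
    """self_score * self_confidence * discount, using the hunk's defaults."""
    s = artifact.get("self_score", 0.0)
    c = artifact.get("self_confidence", 0.5)
    return s * c * discount


def resolve(
    artifact: dict,
    verified: Dict[str, float],
    rule: Optional[Callable[[dict], float]] = None,  # hypothetical stand-in for a rule file
    discount: float = 0.5,
) -> Tuple[float, str]:
    """Return (score, source) with priority verified > script > fallback."""
    ant_id = artifact.get("ant_id", "")
    if ant_id in verified:
        return verified[ant_id], "verified_llm"
    if rule is not None:
        try:
            return clamp01(rule(artifact)), "verified_script"
        except Exception as exc:  # a failing rule falls through to the fallback
            print(f"warning: rule failed for {ant_id}: {exc}")
    return fallback_score(artifact, discount), "fallback_self"


def diverges(self_score: float, verified_score: float, threshold: float = 0.4) -> bool:
    """True when the self-reported and verified scores differ by more than threshold."""
    return abs(self_score - verified_score) > threshold


if __name__ == "__main__":
    ant = {"ant_id": "X", "self_score": 0.8, "self_confidence": 0.5}
    print(resolve(ant, {"X": 0.9}))                # verified score wins
    print(resolve(ant, {}, rule=lambda a: 2.0))    # rule output is clamped
    print(resolve(ant, {}))                        # self-reported fallback
    print(diverges(0.95, 0.20), diverges(0.85, 0.80))
```

Because the fallback multiplies by both confidence and a discount, an unverified ant can score at most half of its self-reported value under the default discount, which keeps self-assessment from outweighing verified scores.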
@@ -0,0 +1,475 @@
+"""End-to-end tests for team-swarm scripts.
+
+Runs each scenario in a clean tmp directory and asserts on outputs.
+No external test framework — runnable as `python test_aco.py`.
+"""
+from __future__ import annotations
+
+import json
+import subprocess
+import sys
+import tempfile
+from pathlib import Path
+from typing import Optional
+
+SCRIPT_DIR = Path(__file__).parent
+ACO = SCRIPT_DIR / "aco.py"
+
+# Import modules directly for unit-level tests
+sys.path.insert(0, str(SCRIPT_DIR))
+from pheromone import PheromoneState, edge_key  # noqa: E402
+from scoring import FallbackScorer, ScriptScorer, hallucination_check, resolve_score  # noqa: E402
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+PASS = 0
+FAIL = 0
+FAILED_NAMES = []
+
+
+def run_aco(session: Path, *args, expect_exit: int = 0) -> dict:
+    """Invoke aco.py CLI, return parsed stdout JSON."""
+    cmd = [sys.executable, str(ACO), "--session", str(session), *args]
+    proc = subprocess.run(cmd, capture_output=True, text=True)
+    if proc.returncode != expect_exit:
+        raise AssertionError(
+            f"exit={proc.returncode} (expected {expect_exit})\n"
+            f"cmd: {' '.join(cmd)}\nstdout: {proc.stdout}\nstderr: {proc.stderr}"
+        )
+    if not proc.stdout.strip():
+        return {}
+    return json.loads(proc.stdout.strip().splitlines()[-1])
+
+
+def check(name: str, cond: bool, detail: str = ""):
+    global PASS, FAIL
+    if cond:
+        PASS += 1
+        print(f" PASS {name}")
+    else:
+        FAIL += 1
+        FAILED_NAMES.append(name)
+        print(f" FAIL {name} {detail}")
+
+
+def section(title: str):
+    print(f"\n=== {title} ===")
+
+
+def write_config(session: Path, overrides: Optional[dict] = None) -> dict:
+    cfg = {
+        "swarm": {"n_ants": 3, "max_iterations": 3, "elite_keep": 2},
+        "aco": {"alpha": 1.0, "beta": 2.0, "rho": 0.2, "q": 1.0,
+                "tau_init": 1.0, "tau_min": 0.01, "tau_max": 10.0},
+        "task_space": {"type": "graph",
+                       "nodes": ["a", "b", "c", "d", "e"],
+                       "max_path_length": 3,
+                       "start_nodes": "any",
+                       "edges": "complete"},
+        "scoring": {"mode": "fallback", "self_score_discount": 0.5},
+        "ant_prompt": {"objective": "test", "evidence_requirements": []},
+        "convergence": {
+            "max_iterations": 3,
+            "stagnation": {"enabled": True, "patience": 2, "min_delta": 0.01},
+            "entropy_floor": {"enabled": True, "threshold": 0.1},
+            "target_score": {"enabled": True, "value": 0.95},
+        },
+    }
+    if overrides:
+        _deep_merge(cfg, overrides)
+    session.mkdir(parents=True, exist_ok=True)
+    (session / "swarm-config.json").write_text(json.dumps(cfg))
+    return cfg
+
+
+def _deep_merge(base: dict, overrides: dict):
+    for k, v in overrides.items():
+        if isinstance(v, dict) and isinstance(base.get(k), dict):
+            _deep_merge(base[k], v)
+        else:
+            base[k] = v
+
+
+def write_ant(session: Path, iteration: int, ant_idx: int,
+              path: list, self_score: float = 0.6, self_confidence: float = 0.7) -> Path:
+    artifacts = session / "artifacts"
+    artifacts.mkdir(exist_ok=True)
+    decisions = [
+        {"from": path[i], "to": path[i + 1], "rationale": "r",
+         "guided_by": "pheromone", "deviation_from_hint": False}
+        for i in range(len(path) - 1)
+    ]
+    art = {
+        "schema_version": "1.0",
+        "ant_id": f"ANT-{iteration}-{ant_idx}",
+        "iteration": iteration,
+        "assignment": {"start_node": path[0], "max_path_length": 3},
+        "path": path,
+        "path_decisions": decisions,
+        "self_score": self_score,
+        "self_confidence": self_confidence,
+        "evidence": [f"src/{path[-1]}.ts:{ant_idx}"],
+        "candidate_solution": {"type": "string", "summary": f"sol-{ant_idx}",
+                               "content": str(path)},
+    }
+    p = artifacts / f"ant-{iteration}-{ant_idx}.json"
+    p.write_text(json.dumps(art))
+    return p
+
+
+# ---------------------------------------------------------------------------
+# Unit tests — pheromone.py
+# ---------------------------------------------------------------------------
+
+def test_pheromone_unit():
+    section("pheromone.py unit")
+
+    s = PheromoneState.initialize(["a", "b", "c"], {})
+    check("init creates n*(n-1)/2 edges", len(s.tau) == 3, f"got {len(s.tau)}")
+    check("init uses default alpha=1.0", s.metadata["alpha"] == 1.0)
+    check("init uses default rho=0.2", s.metadata["rho"] == 0.2)
+    check("init all tau equal", len(set(s.tau.values())) == 1)
+
+    s.evaporate()
+    check("evaporate reduces by rho", abs(s.tau[edge_key("a", "b")] - 0.8) < 1e-9,
+          f"got {s.tau[edge_key('a', 'b')]}")
+
+    s.deposit(["a", "b", "c"], 0.5)
+    expected_ab = 0.8 + 0.5 * 1.0
+    check("deposit adds q*score per edge",
+          abs(s.tau[edge_key("a", "b")] - expected_ab) < 1e-9,
+          f"got {s.tau[edge_key('a', 'b')]}, expected {expected_ab}")
+
+    s.metadata["tau_max"] = 2.0
+    s.tau[edge_key("a", "b")] = 100.0
+    s.clip()
+    check("clip enforces tau_max", s.tau[edge_key("a", "b")] == 2.0)
+    check("clip enforces tau_min on small values",
+          all(v >= s.metadata["tau_min"] for v in s.tau.values()))
+
+    stats = s.stats()
+    check("stats has entropy field", "entropy" in stats)
+    check("stats entropy is positive", stats["entropy"] > 0)
+
+    probs = s.select_neighbors("a", ["a", "b", "c"])
+    check("select_neighbors excludes current node", "a" not in probs)
+    check("select_neighbors probs sum to 1",
+          abs(sum(probs.values()) - 1.0) < 1e-9, f"got {sum(probs.values())}")
+
+    empty = PheromoneState.initialize(["a"], {})
+    check("single-node init produces 0 edges", len(empty.tau) == 0)
+    check("empty stats handles 0-edge case", empty.stats()["entropy"] == 0.0)
+
+    s2 = PheromoneState.initialize(["a", "b", "c"], {})
+    with tempfile.TemporaryDirectory() as td:
+        p = Path(td) / "p.json"
+        s2.save(p)
+        s3 = PheromoneState.load(p)
+        check("save/load roundtrip preserves tau", s2.tau == s3.tau)
+        check("save/load preserves metadata", s2.metadata == s3.metadata)
+
+
+# ---------------------------------------------------------------------------
+# Unit tests — scoring.py
+# ---------------------------------------------------------------------------
+
+def test_scoring_unit():
+    section("scoring.py unit")
+
+    fb = FallbackScorer(discount=0.5)
+    artifact = {"self_score": 0.8, "self_confidence": 0.6}
+    expected = 0.8 * 0.6 * 0.5
+    check("FallbackScorer = self * conf * discount",
+          abs(fb.score(artifact) - expected) < 1e-9)
+
+    artifact_missing = {}
+    check("FallbackScorer handles missing fields",
+          fb.score(artifact_missing) == 0.0)
+
+    with tempfile.TemporaryDirectory() as td:
+        rule = Path(td) / "rule.py"
+        rule.write_text("def score(ant_artifact):\n return ant_artifact.get('self_score', 0) * 2\n")
+        ss = ScriptScorer(rule)
+        check("ScriptScorer loads user rule", ss.score({"self_score": 0.3}) == 0.6)
+        check("ScriptScorer clamps > 1.0", ss.score({"self_score": 0.9}) == 1.0)
+        check("ScriptScorer clamps < 0.0", ss.score({"self_score": -0.5}) == 0.0)
+
+    check("hallucination_check true at diff > 0.4",
+          hallucination_check(0.9, 0.4) is True)
+    check("hallucination_check false at diff < 0.4",
+          hallucination_check(0.5, 0.4) is False)
+
+    artifact_v = {"ant_id": "X", "self_score": 0.5, "self_confidence": 0.5}
+    score, src = resolve_score(artifact_v, {"X": 0.9}, None, fb)
+    check("resolve_score prefers verified", score == 0.9 and src == "verified_llm")
+
+    score, src = resolve_score(artifact_v, {}, None, fb)
+    check("resolve_score falls back when no verified",
+          src == "fallback_self")
+
+
+# ---------------------------------------------------------------------------
+# CLI integration — full pipeline (3 iterations, fallback scoring)
+# ---------------------------------------------------------------------------
+
+def test_full_pipeline_3_iterations():
+    section("aco.py — full 3-iteration pipeline (fallback scoring)")
+
+    with tempfile.TemporaryDirectory() as td:
+        session = Path(td)
+        write_config(session)
+
+        r = run_aco(session, "init")
+        check("init: status ok", r["status"] == "ok")
+        check("init: 5 nodes -> 10 edges", r["n_edges"] == 10)
+        check("init: pheromone file exists",
+              (session / "pheromone" / "current.json").exists())
+        check("init: task-space file exists",
+              (session / "task-space.json").exists())
+        check("init: init.json frozen",
+              (session / "pheromone" / "init.json").exists())
+
+        best_history = []
+        entropy_history = []
+
+        for k in range(1, 4):
+            sel = run_aco(session, "select", "--iter", str(k))
+            check(f"iter{k} select: 3 assignments", len(sel["assignments"]) == 3)
+            check(f"iter{k} select: ant_ids correct",
+                  all(a["ant_id"] == f"ANT-{k}-{i+1}" for i, a in enumerate(sel["assignments"])))
+            check(f"iter{k} select: edge_preferences not empty",
+                  all(a["edge_preferences"] for a in sel["assignments"]))
+
+            paths_by_quality = [
+                (["a", "b", "c"], 0.9, 0.9),
+                (["b", "d", "e"], 0.6, 0.7),
+                (["c", "e", "a"], 0.4, 0.5),
+            ]
+            for i, (path, ss, sc) in enumerate(paths_by_quality, 1):
+                write_ant(session, k, i, path, ss, sc)
+
+            up = run_aco(session, "update", "--iter", str(k))
+            check(f"iter{k} update: 3 ants processed", up["n_ants_processed"] == 3)
+            check(f"iter{k} update: best_score > 0", up["best_score"] > 0)
+            check(f"iter{k} update: stats has entropy", "entropy" in up["stats"])
+
+            best_history.append(up["best_score"])
+            entropy_history.append(up["stats"]["entropy"])
+
+            check(f"iter{k} history snapshot exists",
+                  (session / "pheromone" / "history" / f"{k}.json").exists())
+            check(f"iter{k} trails written",
+                  (session / "trails" / f"{k}.jsonl").exists())
+
+        check("best_score stable after iter1 (same ants each iter)",
+              best_history[0] == best_history[-1])
+        check("entropy decreases over iterations (concentration)",
+              entropy_history[0] >= entropy_history[-1],
+              f"start={entropy_history[0]:.3f} end={entropy_history[-1]:.3f}")
+
+        cv = run_aco(session, "converged")
+        check("converged: returns triggered_by list", isinstance(cv["triggered_by"], list))
+        check("converged: triggers stagnation (best unchanged)",
+              "stagnation" in cv["triggered_by"])
+        check("converged: also triggers max_iterations",
+              "max_iterations" in cv["triggered_by"])
+        check("converged: true after triggers", cv["converged"] is True)
+
+        rep = run_aco(session, "report")
+        check("report: has best", rep["best"] is not None)
+        check("report: top_k present", len(rep["top_k"]) > 0)
+        check("report: convergence curve len == iters",
+              len(rep["convergence_curve"]) == 3)
+        check("report: iterations_completed == 3", rep["iterations_completed"] == 3)
+
+
+# ---------------------------------------------------------------------------
+# CLI integration — target_score convergence
+# ---------------------------------------------------------------------------
+
+def test_target_score_convergence():
+    section("aco.py — target_score triggers early convergence")
+
+    with tempfile.TemporaryDirectory() as td:
+        session = Path(td)
+        write_config(session, {
+            "convergence": {"target_score": {"enabled": True, "value": 0.30}},
+        })
+        run_aco(session, "init")
+        run_aco(session, "select", "--iter", "1")
+        write_ant(session, 1, 1, ["a", "b"], 0.9, 0.9)
+        write_ant(session, 1, 2, ["b", "c"], 0.6, 0.6)
+        write_ant(session, 1, 3, ["c", "d"], 0.4, 0.4)
+        up = run_aco(session, "update", "--iter", "1")
+
+        check("best_score above target", up["best_score"] >= 0.30)
+        cv = run_aco(session, "converged")
+        check("converged after 1 iter (target hit)",
+              cv["converged"] is True and "target_score" in cv["triggered_by"])
+
+
+# ---------------------------------------------------------------------------
+# CLI integration — hallucination detection (verified scores file)
+# ---------------------------------------------------------------------------
+
+def test_hallucination_flagging():
+    section("aco.py — hallucination flagging with verified scores")
+
+    with tempfile.TemporaryDirectory() as td:
+        session = Path(td)
+        write_config(session)
+        run_aco(session, "init")
+        run_aco(session, "select", "--iter", "1")
+
+        write_ant(session, 1, 1, ["a", "b"], self_score=0.95, self_confidence=0.9)
+        write_ant(session, 1, 2, ["b", "c"], self_score=0.85, self_confidence=0.8)
+        write_ant(session, 1, 3, ["c", "d"], self_score=0.7, self_confidence=0.7)
+
+        scores_dir = session / "scores"
+        scores_dir.mkdir(exist_ok=True)
+        (scores_dir / "iter-1-scores.json").write_text(json.dumps({
+            "iteration": 1, "scorer_type": "llm",
+            "scores": {
+                "ANT-1-1": {"verified_score": 0.20, "rationale": "weak"},
+                "ANT-1-2": {"verified_score": 0.80, "rationale": "ok"},
+                "ANT-1-3": {"verified_score": 0.65, "rationale": "ok"},
+            },
+        }))
+
+        up = run_aco(session, "update", "--iter", "1")
+        check("ANT-1-1 flagged as hallucination (|0.95-0.20|=0.75 > 0.4)",
+              "ANT-1-1" in up["hallucinations_flagged"])
+        check("ANT-1-2 NOT flagged (|0.85-0.80|=0.05 < 0.4)",
+              "ANT-1-2" not in up["hallucinations_flagged"])
+        check("ANT-1-3 NOT flagged (|0.7-0.65|=0.05 < 0.4)",
+              "ANT-1-3" not in up["hallucinations_flagged"])
+
+
+# ---------------------------------------------------------------------------
+# CLI integration — invalid artifact handling
+# ---------------------------------------------------------------------------
+
+def test_invalid_artifacts():
+    section("aco.py — invalid artifacts handled gracefully")
+
+    with tempfile.TemporaryDirectory() as td:
+        session = Path(td)
+        write_config(session)
+        run_aco(session, "init")
+        run_aco(session, "select", "--iter", "1")
+
+        write_ant(session, 1, 1, ["a", "b"])
+
+        artifacts = session / "artifacts"
+        (artifacts / "ant-1-2.json").write_text(json.dumps({
+            "schema_version": "1.0", "ant_id": "ANT-1-2", "iteration": 1,
+            "path": ["a", "ZZZ_NOT_A_NODE"],
+            "path_decisions": [{"from": "a", "to": "ZZZ_NOT_A_NODE", "rationale": "x",
+                                "guided_by": "x", "deviation_from_hint": False}],
+            "self_score": 0.5, "self_confidence": 0.5,
+            "evidence": ["x"], "candidate_solution": {"summary": "x"},
+        }))
+
+        (artifacts / "ant-1-3.json").write_text("{ malformed json")
+
+        up = run_aco(session, "update", "--iter", "1")
+        check("only 1 valid ant processed (2 rejected)",
+              up["n_ants_processed"] == 1, f"got {up['n_ants_processed']}")
+
+
+def test_config_validation():
+    section("aco.py — config validation error paths")
+
+    with tempfile.TemporaryDirectory() as td:
+        session = Path(td)
+        session.mkdir(exist_ok=True)
+        (session / "swarm-config.json").write_text(json.dumps({"task_space": {}, "aco": {}}))
+        r = run_aco(session, "init", expect_exit=2)
+        check("missing nodes -> exit 2", r["status"] == "error")
+        check("error message mentions nodes",
+              "nodes" in r["message"] or "auto_discover" in r["message"])
+
+    with tempfile.TemporaryDirectory() as td:
+        session = Path(td)
+        session.mkdir(exist_ok=True)
+        r = run_aco(session, "init", expect_exit=2)
+        check("missing config -> exit 2", r["status"] == "error")
+
+
402
|
+
def test_idempotent_update():
|
|
403
|
+
section("aco.py — update is idempotent (re-running same iter is safe)")
|
|
404
|
+
|
|
405
|
+
with tempfile.TemporaryDirectory() as td:
|
|
406
|
+
session = Path(td)
|
|
407
|
+
write_config(session, {"aco": {"rho": 0.0}})
|
|
408
|
+
run_aco(session, "init")
|
|
409
|
+
run_aco(session, "select", "--iter", "1")
|
|
410
|
+
write_ant(session, 1, 1, ["a", "b"], 0.8, 0.8)
|
|
411
|
+
write_ant(session, 1, 2, ["b", "c"], 0.6, 0.7)
|
|
412
|
+
write_ant(session, 1, 3, ["c", "d"], 0.4, 0.5)
|
|
413
|
+
|
|
414
|
+
up1 = run_aco(session, "update", "--iter", "1")
|
|
415
|
+
up2 = run_aco(session, "update", "--iter", "1")
|
|
416
|
+
check("update re-run keeps n_ants_processed stable",
|
|
417
|
+
up1["n_ants_processed"] == up2["n_ants_processed"])
|
|
418
|
+
check("update re-run keeps best ant stable",
|
|
419
|
+
up1["best_score"] == up2["best_score"])
|
|
420
|
+
|
|
421
|
+
|
+def test_auto_discover_from_glob():
+    section("aco.py — auto_discover_from glob")
+
+    with tempfile.TemporaryDirectory() as td:
+        td_path = Path(td)
+        for name in ["alpha.txt", "beta.txt", "gamma.txt"]:
+            (td_path / name).write_text("data")
+
+        session = td_path / "session"
+        session.mkdir()
+        (session / "swarm-config.json").write_text(json.dumps({
+            "swarm": {"n_ants": 2}, "aco": {},
+            "task_space": {"auto_discover_from": str(td_path / "*.txt"),
+                           "max_path_length": 2},
+            "scoring": {"mode": "fallback"},
+            "ant_prompt": {"objective": "x"},
+            "convergence": {"max_iterations": 1},
+        }))
+        r = run_aco(session, "init")
+        check("auto_discover finds 3 files", r["n_nodes"] == 3,
+              f"got {r['n_nodes']}")
+
+
+# ---------------------------------------------------------------------------
+# Main
+# ---------------------------------------------------------------------------
+
+def main():
+    print("=" * 60)
+    print("team-swarm scripts test suite")
+    print("=" * 60)
+
+    test_pheromone_unit()
+    test_scoring_unit()
+    test_full_pipeline_3_iterations()
+    test_target_score_convergence()
+    test_hallucination_flagging()
+    test_invalid_artifacts()
+    test_config_validation()
+    test_idempotent_update()
+    test_auto_discover_from_glob()
+
+    print("\n" + "=" * 60)
+    print(f"Results: {PASS} passed, {FAIL} failed")
+    if FAILED_NAMES:
+        print("\nFailed tests:")
+        for name in FAILED_NAMES:
+            print(f" - {name}")
+    print("=" * 60)
+    sys.exit(0 if FAIL == 0 else 1)
+
+
+if __name__ == "__main__":
+    main()
@@ -0,0 +1,115 @@
+# Ant Output Schema
+
+**The critical contract.** Every ant MUST produce JSON matching this schema.
+Pheromone updates and adversarial scoring depend on it.
+
+Inherited from team-swarm with adversarial scoring integration notes.
+
+## File Path
+
+```
+<session>/artifacts/ant-<iteration>-<ant_id>.json
+```
+
+## Schema
+
+```json
+{
+  "schema_version": "1.0",
+  "ant_id": "ANT-3-2",
+  "iteration": 3,
+  "assignment": {
+    "start_node": "node_a",
+    "max_path_length": 5
+  },
+  "path": ["node_a", "node_c", "node_f"],
+  "path_decisions": [
+    {
+      "from": "node_a",
+      "to": "node_c",
+      "rationale": "<one-line reason>",
+      "guided_by": "pheromone | heuristic | evidence",
+      "pheromone_weight": 0.30,
+      "deviation_from_hint": false
+    }
+  ],
+  "self_score": 0.78,
+  "self_confidence": 0.6,
+  "evidence": [
+    { "source": "src/foo.ts:42", "finding": "suspicious pattern", "strength": "strong" }
+  ],
+  "candidate_solution": {
+    "type": "string | object | file_ref",
+    "summary": "<one-line>",
+    "content": "<actual artifact>"
+  },
+  "blockers": [],
+  "notes": "<optional free text>"
+}
+```
+
+## Required Fields
+
+| Field | Type | Required | Constraint |
+|-------|------|----------|------------|
+| `schema_version` | string | yes | `"1.0"` |
+| `ant_id` | string | yes | matches assignment |
+| `iteration` | int | yes | matches assignment |
+| `path` | string[] | yes | len >= 1, all nodes ∈ task_space |
+| `path_decisions` | array | yes | len = len(path) - 1 |
+| `self_score` | float | yes | [0.0, 1.0] |
+| `self_confidence` | float | yes | [0.0, 1.0] |
+| `evidence` | array | yes | min 1 entry |
+| `candidate_solution` | object | yes | non-empty summary |
+
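The constraints in the Required Fields table can be checked mechanically. The following is a minimal sketch, not the validation that `aco.py` actually performs: the names `validate_ant_output`, `REQUIRED_FIELDS`, and `task_space_nodes` are hypothetical, and the "matches assignment" checks for `ant_id` and `iteration` are left out because they need the assignment record, which is not shown here.

```python
# Hypothetical helper: checks one ant output document against the
# Required Fields table. Not taken from aco.py.
REQUIRED_FIELDS = (
    "schema_version", "ant_id", "iteration", "path", "path_decisions",
    "self_score", "self_confidence", "evidence", "candidate_solution",
)


def _in_unit_interval(value):
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return 0.0 <= value <= 1.0


def validate_ant_output(doc, task_space_nodes):
    """Return a list of violations; an empty list means the document passes."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in doc]
    if errors:
        return errors

    if doc["schema_version"] != "1.0":
        errors.append('schema_version must be "1.0"')

    if isinstance(doc["iteration"], bool) or not isinstance(doc["iteration"], int):
        errors.append("iteration must be an int")

    path = doc["path"]
    if not isinstance(path, list) or len(path) < 1:
        errors.append("path must be a list with at least one node")
    else:
        unknown = [n for n in path
                   if not isinstance(n, str) or n not in task_space_nodes]
        if unknown:
            errors.append(f"path nodes outside task_space: {unknown}")
        decisions = doc["path_decisions"]
        if not isinstance(decisions, list) or len(decisions) != len(path) - 1:
            errors.append("path_decisions must have len(path) - 1 entries")

    for name in ("self_score", "self_confidence"):
        if not _in_unit_interval(doc[name]):
            errors.append(f"{name} must be a number in [0.0, 1.0]")

    if not isinstance(doc["evidence"], list) or len(doc["evidence"]) < 1:
        errors.append("evidence must contain at least one entry")

    solution = doc["candidate_solution"]
    summary = solution.get("summary") if isinstance(solution, dict) else None
    if not isinstance(summary, str) or not summary.strip():
        errors.append("candidate_solution.summary must be a non-empty string")

    return errors
```

Note that the sample document in the Schema section lists one `path_decisions` entry for a three-node `path`, so a strict check like this one would reject it; the sample is presumably abbreviated.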
+## Three-Layer Scoring (Adversarial Edition)
+
+| Layer | Source | Purpose |
+|-------|--------|---------|
+| `self_score` | Ant self-report | Cheap signal; NOT used for pheromone |
+| `adversarial_score` | wf-swarm-score (3-vote) | Prosecutor/defender/judge per ant |
+| `verified_score` | Calibrated from 3 votes | **Authoritative pheromone input** |
+
+In team-adversarial-swarm, `verified_score` is derived from 3 adversarial votes:
+```
+verified_score = prosecutor(0.25) + defender(0.25) + judge(0.50)  (weighted avg)
+```
+
+Calibrated across the full ant batch for consistency.
+
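The weighted average in the formula above can be sketched as follows. This is an illustration only: `VOTE_WEIGHTS` and `weighted_vote_score` are hypothetical names, and the batch calibration step is omitted because the spec does not give its formula, so a recorded `verified_score` may differ from this raw value.

```python
# Hypothetical sketch of the 3-vote weighted average; batch calibration
# is NOT modeled here.
VOTE_WEIGHTS = {"prosecutor": 0.25, "defender": 0.25, "judge": 0.50}


def weighted_vote_score(votes):
    """Combine prosecutor/defender/judge votes using the spec's weights."""
    missing = set(VOTE_WEIGHTS) - set(votes)
    if missing:
        raise ValueError(f"missing votes: {sorted(missing)}")
    return sum(weight * votes[role] for role, weight in VOTE_WEIGHTS.items())


raw = weighted_vote_score({"prosecutor": 0.55, "defender": 0.85, "judge": 0.76})
print(round(raw, 2))
```

Because the weights sum to 1.0, the result stays in [0.0, 1.0] whenever each vote does.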
+## Hallucination Detection
+
+Enhanced by adversarial scoring:
+- `|self_score - verified_score| > 0.3` → flagged (threshold lower than team-swarm's 0.4)
+- Prosecutor/defender vote spread > 0.5 → "controversial" flag
+- If >50% of ants flagged → coordinator pauses for user input
+
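The three rules above translate into a few comparisons. The sketch below is hypothetical (the names `flag_ant` and `should_pause` do not come from the package), and it assumes that "flagged" in the pause rule refers to the hallucination flag rather than the "controversial" flag.

```python
# Hypothetical sketch of the flagging rules; thresholds are from the spec.
HALLUCINATION_DELTA = 0.3   # |self_score - verified_score| above this is flagged
CONTROVERSY_SPREAD = 0.5    # prosecutor/defender spread above this is controversial
PAUSE_FRACTION = 0.5        # pause when more than this share of ants is flagged


def flag_ant(self_score, verified_score, votes):
    """Return the per-ant flags derived from one ant's scores and votes."""
    delta = abs(self_score - verified_score)
    spread = abs(votes["prosecutor"] - votes["defender"])
    return {
        "hallucination_flag": delta > HALLUCINATION_DELTA,
        "controversial": spread > CONTROVERSY_SPREAD,
        "self_vs_verified_delta": round(delta, 2),
    }


def should_pause(flags):
    """True when more than half of the ants carry a hallucination flag."""
    if not flags:
        return False
    flagged = sum(1 for f in flags if f["hallucination_flag"])
    return flagged / len(flags) > PAUSE_FRACTION
```

Both comparisons are strict, matching the `>` in the rules, so values that land exactly on a threshold are subject to floating-point rounding.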
+## Adversarial Scores File
+
+```json
+{
+  "iteration": 3,
+  "scorer_type": "adversarial_3vote",
+  "scores": {
+    "ANT-3-1": {
+      "verified_score": 0.72,
+      "rationale": "judge weighted average",
+      "votes": {
+        "prosecutor": 0.55,
+        "defender": 0.85,
+        "judge": 0.76
+      },
+      "hallucination_flag": false,
+      "self_vs_verified_delta": 0.06
+    }
+  },
+  "calibration": {
+    "mean": 0.68,
+    "std": 0.12,
+    "min": 0.45,
+    "max": 0.82,
+    "hallucination_rate": 0.2
+  },
+  "ranking": ["ANT-3-2", "ANT-3-1", "ANT-3-4", "ANT-3-3", "ANT-3-5"]
+}
+```
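As an illustration of how the `calibration` and `ranking` blocks of the scores file relate to the per-ant `scores`, here is a minimal sketch. It rests on several assumptions not confirmed by the spec: that `std` is the population standard deviation, that `hallucination_rate` is the share of ants with `hallucination_flag` set, and that `ranking` orders ants by descending `verified_score`. The function name `summarize_scores` is hypothetical.

```python
import statistics


def summarize_scores(scores):
    """Derive (calibration, ranking) from a mapping of ant_id to score entry.

    Assumptions (not from the spec): population std, ranking by
    descending verified_score.
    """
    if not scores:
        raise ValueError("scores must contain at least one ant")
    verified = [entry["verified_score"] for entry in scores.values()]
    flagged = sum(1 for entry in scores.values() if entry.get("hallucination_flag"))
    calibration = {
        "mean": round(statistics.fmean(verified), 2),
        "std": round(statistics.pstdev(verified), 2),
        "min": min(verified),
        "max": max(verified),
        "hallucination_rate": round(flagged / len(verified), 2),
    }
    ranking = sorted(scores, key=lambda ant_id: scores[ant_id]["verified_score"],
                     reverse=True)
    return calibration, ranking
```

A caller would pass the `scores` object loaded with `json.load` and compare the result with the stored `calibration` and `ranking` as a consistency check.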