harness-evolver 1.2.0 → 1.3.0

@@ -1,12 +1,26 @@
  ---
  name: harness-evolver-architect
  description: |
- Use this agent when the harness-evolver:architect skill needs to analyze a harness
- and recommend the optimal multi-agent topology. Reads code analysis signals, traces,
- and scores to produce a migration plan from current to recommended architecture.
- model: opus
+ Use this agent to analyze harness architecture and recommend optimal multi-agent topology.
+ Reads code analysis signals, traces, and scores to produce a migration plan.
+ tools: Read, Write, Bash, Grep, Glob
  ---

+ ## Bootstrap
+
+ If your prompt contains a `<files_to_read>` block, you MUST use the Read tool to load
+ every file listed there before performing any other actions.
+
+ ## Return Protocol
+
+ When done, end your response with:
+
+ ## ARCHITECTURE ANALYSIS COMPLETE
+ - **Current topology**: {topology}
+ - **Recommended**: {topology}
+ - **Confidence**: {low|medium|high}
+ - **Migration steps**: {N}
+
  # Harness Evolver — Architect Agent

  You are the architect in a Meta-Harness optimization system. Your job is to analyze a harness's current agent topology, assess whether it matches the task complexity, and recommend the optimal topology with a concrete migration plan.
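
The Return Protocol added in the hunk above gives the caller a fixed heading followed by `- **Key**: value` bullets. A minimal sketch of how a caller might read that block back, assuming the agent's response is available as a plain string; `parse_return_block` is a hypothetical helper, not part of the package, and the topology names in the usage lines are made up for illustration:

```python
import re


def parse_return_block(response: str, marker: str) -> dict[str, str]:
    """Collect the '- **Key**: value' bullets that follow a '## <marker>' heading."""
    lines = response.splitlines()
    try:
        start = next(i for i, line in enumerate(lines) if line.strip() == f"## {marker}")
    except StopIteration:
        raise ValueError(f"marker not found: {marker}") from None

    fields: dict[str, str] = {}
    for line in lines[start + 1:]:
        match = re.match(r"^\s*-\s+\*\*(.+?)\*\*:\s*(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
        elif line.strip():
            # Any other non-blank line ends the block.
            break
    return fields


# Hypothetical response text; "single-agent" and "pipeline" are illustrative values.
response = (
    "Analysis written to disk.\n\n"
    "## ARCHITECTURE ANALYSIS COMPLETE\n"
    "- **Current topology**: single-agent\n"
    "- **Recommended**: pipeline\n"
    "- **Confidence**: medium\n"
    "- **Migration steps**: 3\n"
)
summary = parse_return_block(response, "ARCHITECTURE ANALYSIS COMPLETE")
print(summary["Confidence"])
```

The same helper would apply to the `CRITIC REPORT COMPLETE` and `PROPOSAL COMPLETE` blocks, since they share the bullet layout.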
@@ -1,13 +1,27 @@
  ---
  name: harness-evolver-critic
  description: |
- Use this agent when scores converge suspiciously fast (>0.3 jump in one iteration
- or 1.0 reached in <3 iterations), or when the user wants to validate eval quality.
- Analyzes the eval script, harness outputs, and optionally uses LangSmith evaluators
- to cross-validate scores and identify eval weaknesses.
- model: opus
+ Use this agent to assess eval quality, detect eval gaming, and propose stricter evaluation.
+ Triggered when scores converge suspiciously fast or on user request.
+ tools: Read, Write, Bash, Grep, Glob
  ---

+ ## Bootstrap
+
+ If your prompt contains a `<files_to_read>` block, you MUST use the Read tool to load
+ every file listed there before performing any other actions.
+
+ ## Return Protocol
+
+ When done, end your response with:
+
+ ## CRITIC REPORT COMPLETE
+ - **Eval quality**: {weak|moderate|strong}
+ - **Gaming detected**: {yes|no}
+ - **Weaknesses found**: {N}
+ - **Improved eval written**: {yes|no}
+ - **Score with improved eval**: {score or N/A}
+
  # Harness Evolver — Critic Agent

  You are the critic in the Harness Evolver loop. Your job is to assess whether the eval
@@ -1,12 +1,27 @@
  ---
  name: harness-evolver-proposer
  description: |
- Use this agent when the harness-evolve skill needs to propose a new harness candidate.
- This agent navigates the .harness-evolver/ filesystem to diagnose failures in prior
- candidates and propose an improved harness. It is the core of the Meta-Harness optimization loop.
- model: opus
+ Use this agent when the evolve skill needs to propose a new harness candidate.
+ Navigates the .harness-evolver/ filesystem to diagnose failures and propose improvements.
+ tools: Read, Write, Edit, Bash, Glob, Grep
+ permissionMode: acceptEdits
  ---

+ ## Bootstrap
+
+ If your prompt contains a `<files_to_read>` block, you MUST use the Read tool to load
+ every file listed there before performing any other actions. These files are your context.
+
+ ## Return Protocol
+
+ When done, end your response with:
+
+ ## PROPOSAL COMPLETE
+ - **Version**: v{NNN}
+ - **Parent**: v{PARENT}
+ - **Change**: {one-sentence summary}
+ - **Expected impact**: {score prediction}
+
  # Harness Evolver — Proposer Agent

  You are the proposer in a Meta-Harness optimization loop. Your job is to analyze all prior harness candidates — their code, execution traces, and scores — and propose a new harness that improves on them.
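
The Bootstrap section added to all three agents relies on a `<files_to_read>` block in the prompt, and the skill prompts elsewhere in this diff mark some entries with an `(if exists)` suffix. The sketch below shows one way such a block could be turned into a list of paths; it is only an illustration of the convention, since the agents themselves are told to use the Read tool, and `files_to_read` is a hypothetical function name:

```python
import re

OPTIONAL_SUFFIX = "(if exists)"


def files_to_read(prompt: str) -> list[tuple[str, bool]]:
    """Return (path, optional) pairs from a <files_to_read> block, or [] if there is none."""
    block = re.search(r"<files_to_read>(.*?)</files_to_read>", prompt, re.DOTALL)
    if block is None:
        return []

    entries: list[tuple[str, bool]] = []
    for line in block.group(1).splitlines():
        line = line.strip()
        if not line.startswith("- "):
            continue  # skip blank lines and anything that is not a list item
        path = line[2:].strip()
        optional = path.endswith(OPTIONAL_SUFFIX)
        if optional:
            path = path[: -len(OPTIONAL_SUFFIX)].strip()
        entries.append((path, optional))
    return entries


prompt = """<objective>
Analyze the harness architecture.
</objective>

<files_to_read>
- .harness-evolver/config.json
- .harness-evolver/summary.json (if exists)
</files_to_read>
"""
for path, optional in files_to_read(prompt):
    print(path, "(optional)" if optional else "(required)")
```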
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "harness-evolver",
- "version": "1.2.0",
+ "version": "1.3.0",
  "description": "Meta-Harness-style autonomous harness optimization for Claude Code",
  "author": "Raphael Valdetaro",
  "license": "MIT",
@@ -28,80 +28,60 @@ TOOLS=$([ -d ".harness-evolver/tools" ] && echo ".harness-evolver/tools" || echo

  Use `$TOOLS` prefix for all tool calls below.

- ## Step 1: Run Architecture Analysis
+ ## What To Do

- Build the command based on what exists:
+ 1. Check `.harness-evolver/` exists.

+ 2. Run architecture analysis tool:
  ```bash
- CMD="python3 $TOOLS/analyze_architecture.py --harness .harness-evolver/baseline/harness.py"
-
- # Add traces from best version if evolution has run
- if [ -f ".harness-evolver/summary.json" ]; then
- BEST=$(python3 -c "import json; s=json.load(open('.harness-evolver/summary.json')); print(s.get('best',{}).get('version',''))")
- if [ -n "$BEST" ] && [ -d ".harness-evolver/harnesses/$BEST/traces" ]; then
- CMD="$CMD --traces-dir .harness-evolver/harnesses/$BEST/traces"
- fi
- CMD="$CMD --summary .harness-evolver/summary.json"
- fi
-
- CMD="$CMD -o .harness-evolver/architecture_signals.json"
-
- eval $CMD
+ python3 $TOOLS/analyze_architecture.py \
+ --harness .harness-evolver/baseline/harness.py \
+ -o .harness-evolver/architecture_signals.json
  ```

- Check exit code. If it fails, report the error and stop.
-
- ## Step 2: Spawn Architect Agent
-
- Spawn the `harness-evolver-architect` agent with:
-
- > Analyze the harness and recommend the optimal multi-agent topology.
- > Raw signals are at `.harness-evolver/architecture_signals.json`.
- > Write `.harness-evolver/architecture.json` and `.harness-evolver/architecture.md`.
-
- The architect agent will:
- 1. Read the signals JSON
- 2. Read the harness code and config
- 3. Classify the current topology
- 4. Assess if it matches task complexity
- 5. Recommend the optimal topology with migration steps
- 6. Write `architecture.json` and `architecture.md`
-
- ## Step 3: Report
-
- After the architect agent completes, read the outputs and print a summary:
-
+ If evolution has run, add trace and score data:
+ ```bash
+ python3 $TOOLS/analyze_architecture.py \
+ --harness .harness-evolver/harnesses/{best}/harness.py \
+ --traces-dir .harness-evolver/harnesses/{best}/traces \
+ --summary .harness-evolver/summary.json \
+ -o .harness-evolver/architecture_signals.json
  ```
- Architecture Analysis Complete
- ==============================
- Current topology: {current_topology}
- Recommended topology: {recommended_topology}
- Confidence: {confidence}
-
- Reasoning: {reasoning}
-
- Migration Path:
- 1. {step 1 description}
- 2. {step 2 description}
- ...

- Risks:
- - {risk 1}
- - {risk 2}
-
- Next: Run /harness-evolver:evolve the proposer will follow the migration path.
+ 3. Spawn the `harness-evolver-architect` agent:
+
+ ```xml
+ <objective>
+ Analyze the harness architecture and recommend the optimal multi-agent topology.
+ {If called from evolve: "The evolution loop stagnated/regressed after N iterations."}
+ {If called by user: "The user requested an architecture analysis."}
+ </objective>
+
+ <files_to_read>
+ - .harness-evolver/architecture_signals.json
+ - .harness-evolver/config.json
+ - .harness-evolver/baseline/harness.py
+ - .harness-evolver/summary.json (if exists)
+ - .harness-evolver/PROPOSER_HISTORY.md (if exists)
+ </files_to_read>
+
+ <output>
+ Write:
+ - .harness-evolver/architecture.json
+ - .harness-evolver/architecture.md
+ </output>
+
+ <success_criteria>
+ - Classifies current topology correctly
+ - Recommendation includes migration path with concrete steps
+ - Considers detected stack and API key availability
+ - Confidence rating is honest (low/medium/high)
+ </success_criteria>
  ```

- If the architect recommends no change (current = recommended), report:
-
- ```
- Architecture Analysis Complete
- ==============================
- Current topology: {topology} — looks optimal for these tasks.
- No architecture change recommended. Score: {score}
+ 4. Wait for `## ARCHITECTURE ANALYSIS COMPLETE`.

- The proposer can continue evolving within the current topology.
- ```
+ 5. Print summary: current -> recommended, confidence, migration steps.

  ## Arguments

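
The hunk above replaces a shell script that assembled the `analyze_architecture.py` command conditionally with two fixed invocations, leaving the choice between them to the caller. As a rough sketch of that choice, assuming `summary.json` stores the best version under `best.version` (as the removed shell one-liner read it) and that the best version's directory holds a `harness.py`; `build_analysis_command` is a hypothetical helper, not part of the package:

```python
import json
from pathlib import Path


def build_analysis_command(tools: str, root: Path = Path(".harness-evolver")) -> list[str]:
    """Pick the baseline or best-version invocation based on what exists on disk."""
    harness = root / "baseline" / "harness.py"
    extra: list[str] = []

    summary = root / "summary.json"
    if summary.is_file():
        # Assumed layout: {"best": {"version": "..."}}, as read by the removed shell code.
        best = json.loads(summary.read_text()).get("best", {}).get("version", "")
        best_dir = root / "harnesses" / best
        if best and (best_dir / "traces").is_dir():
            harness = best_dir / "harness.py"
            extra += ["--traces-dir", str(best_dir / "traces")]
        extra += ["--summary", str(summary)]

    return [
        "python3", f"{tools}/analyze_architecture.py",
        "--harness", str(harness),
        *extra,
        "-o", str(root / "architecture_signals.json"),
    ]


print(" ".join(build_analysis_command(".harness-evolver/tools")))
```

Returning an argument list keeps the command usable with `subprocess.run` without the `eval $CMD` step that the removed script needed.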
@@ -20,18 +20,42 @@ TOOLS=$([ -d ".harness-evolver/tools" ] && echo ".harness-evolver/tools" || echo

  ## What To Do

- 1. Read `summary.json` to check for suspicious patterns:
- - Score jump >0.3 in a single iteration
- - Score reached 1.0 in <3 iterations
- - All tasks suddenly pass after failing
+ 1. Read `summary.json` and identify the suspicious pattern (score jump, premature convergence).

  2. Spawn the `harness-evolver-critic` agent:
- > Analyze the eval quality for this harness evolution project.
- > Check if the eval at `.harness-evolver/eval/eval.py` is rigorous enough.
- > The best version is {version} with score {score} achieved in {iterations} iterations.
-
- 3. After the critic reports:
- - Show the eval quality assessment
- - If `eval_improved.py` was created, show the score comparison
- - Ask user: "Adopt the improved eval? This will re-baseline all scores."
- - If adopted: copy `eval_improved.py` to `eval/eval.py`, re-run baseline, update state
+
+ ```xml
+ <objective>
+ Analyze eval quality for this harness evolution project.
+ The best version is {version} with score {score} achieved in {iterations} iteration(s).
+ {Specific concern: "Score jumped from X to Y in one iteration" or "Perfect score in N iterations"}
+ </objective>
+
+ <files_to_read>
+ - .harness-evolver/eval/eval.py
+ - .harness-evolver/summary.json
+ - .harness-evolver/harnesses/{best_version}/scores.json
+ - .harness-evolver/harnesses/{best_version}/harness.py
+ - .harness-evolver/harnesses/{best_version}/proposal.md
+ - .harness-evolver/config.json
+ </files_to_read>
+
+ <output>
+ Write:
+ - .harness-evolver/critic_report.md (human-readable analysis)
+ - .harness-evolver/eval/eval_improved.py (if weaknesses found)
+ </output>
+
+ <success_criteria>
+ - Identifies specific weaknesses in eval.py with examples
+ - If gaming detected, shows exact tasks/outputs that expose the weakness
+ - Improved eval preserves the --results-dir/--tasks-dir/--scores interface
+ - Re-scores the best version with improved eval to quantify the difference
+ </success_criteria>
+ ```
+
+ 3. Wait for `## CRITIC REPORT COMPLETE`.
+
+ 4. Report findings to user. If `eval_improved.py` was written:
+ - Show score comparison (current eval vs improved eval)
+ - Ask: "Adopt the improved eval? This will affect future iterations."
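
Version 1.3.0 drops the explicit thresholds from step 1 (a jump of more than 0.3 in one iteration, or a score of 1.0 in fewer than 3 iterations) and only names the patterns. The sketch below shows how the 1.2.0 thresholds could be checked, assuming the caller has already extracted a per-iteration score list with the baseline at index 0; the layout of `summary.json` is not shown in this diff, so that extraction is left out, and `suspicious_patterns` is a hypothetical name:

```python
def suspicious_patterns(scores: list[float], jump: float = 0.3, fast: int = 3) -> list[str]:
    """Flag large single-iteration jumps and a perfect score reached in fewer than `fast` iterations.

    Assumes scores[0] is the baseline and scores[i] is the score after iteration i.
    """
    findings: list[str] = []

    for prev, cur in zip(scores, scores[1:]):
        if cur - prev > jump:
            findings.append(f"Score jumped from {prev} to {cur} in one iteration")

    perfect_at = next((i for i, s in enumerate(scores) if s >= 1.0), None)
    # A perfect baseline (index 0) is not counted as premature convergence.
    if perfect_at is not None and 0 < perfect_at < fast:
        findings.append(f"Perfect score in {perfect_at} iteration(s)")

    return findings


for concern in suspicious_patterns([0.4, 0.5, 1.0]):
    print(concern)
```

The returned strings follow the wording of the `{Specific concern: ...}` placeholder in the critic prompt, so they could be substituted into it directly.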
@@ -36,12 +36,38 @@ python3 -c "import json; s=json.load(open('.harness-evolver/summary.json')); pri

  ### 2. Propose

- Spawn the `harness-evolver-proposer` agent:
-
- > You are proposing iteration {i}. Create version {version} in `.harness-evolver/harnesses/{version}/`.
- > Working directory contains `.harness-evolver/` with all prior candidates and traces.
+ Spawn the `harness-evolver-proposer` agent with a structured prompt:
+
+ ```xml
+ <objective>
+ Propose harness version {version} that improves on the current best score of {best_score}.
+ </objective>
+
+ <files_to_read>
+ - .harness-evolver/summary.json
+ - .harness-evolver/PROPOSER_HISTORY.md
+ - .harness-evolver/config.json
+ - .harness-evolver/baseline/harness.py
+ - .harness-evolver/harnesses/{best_version}/harness.py
+ - .harness-evolver/harnesses/{best_version}/scores.json
+ - .harness-evolver/harnesses/{best_version}/proposal.md
+ </files_to_read>
+
+ <output>
+ Create directory .harness-evolver/harnesses/{version}/ containing:
+ - harness.py (the improved harness)
+ - config.json (parameters, copy from parent if unchanged)
+ - proposal.md (reasoning, must start with "Based on v{PARENT}")
+ </output>
+
+ <success_criteria>
+ - harness.py maintains CLI interface (--input, --output, --traces-dir, --config)
+ - proposal.md documents evidence-based reasoning
+ - Changes are motivated by trace analysis, not guesswork
+ </success_criteria>
+ ```

- The proposer creates: `harness.py`, `config.json`, `proposal.md`.
+ Wait for the agent to complete. Look for `## PROPOSAL COMPLETE` in the response.

  ### 3. Validate

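
The `<output>` block in the proposer prompt above names three files and requires `proposal.md` to start with "Based on v{PARENT}". A minimal sketch of a structural check on a proposed version directory follows; it is not the package's own Validate step, whose contents this diff does not show, and `check_proposal` is a hypothetical helper:

```python
from pathlib import Path

REQUIRED_FILES = ("harness.py", "config.json", "proposal.md")


def check_proposal(version_dir: Path) -> list[str]:
    """Return a list of problems with a proposed version directory (empty if none found)."""
    problems = [
        f"missing {name}" for name in REQUIRED_FILES if not (version_dir / name).is_file()
    ]

    proposal = version_dir / "proposal.md"
    if proposal.is_file():
        # The prompt requires the proposal to open with its parent version.
        if not proposal.read_text().lstrip().startswith("Based on v"):
            problems.append('proposal.md does not start with "Based on v"')

    return problems
```

A caller could run `check_proposal(Path(".harness-evolver/harnesses") / version)` after seeing `## PROPOSAL COMPLETE` and stop the iteration early if the list is non-empty.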
@@ -117,10 +143,35 @@ python3 $TOOLS/analyze_architecture.py \

  Then spawn the `harness-evolver-architect` agent:

- > The evolution loop has stagnated/regressed after {iterations} iterations (best: {best_score}).
- > Analyze the harness architecture and recommend a topology change.
- > Raw signals at `.harness-evolver/architecture_signals.json`.
- > Write `.harness-evolver/architecture.json` and `.harness-evolver/architecture.md`.
+ ```xml
+ <objective>
+ The evolution loop has {stagnated/regressed} after {iterations} iterations (best: {best_score}).
+ Analyze the harness architecture and recommend a topology change.
+ </objective>
+
+ <files_to_read>
+ - .harness-evolver/architecture_signals.json
+ - .harness-evolver/summary.json
+ - .harness-evolver/PROPOSER_HISTORY.md
+ - .harness-evolver/config.json
+ - .harness-evolver/harnesses/{best_version}/harness.py
+ - .harness-evolver/harnesses/{best_version}/scores.json
+ </files_to_read>
+
+ <output>
+ Write:
+ - .harness-evolver/architecture.json (structured recommendation)
+ - .harness-evolver/architecture.md (human-readable analysis)
+ </output>
+
+ <success_criteria>
+ - Recommendation includes concrete migration steps
+ - Each step is implementable in one proposer iteration
+ - Considers detected stack and available API keys
+ </success_criteria>
+ ```
+
+ Wait for `## ARCHITECTURE ANALYSIS COMPLETE`.

  After the architect completes, report: