harness-evolver 2.1.0 → 2.2.0
- package/package.json +1 -1
- package/skills/architect/SKILL.md +2 -10
- package/skills/critic/SKILL.md +2 -10
- package/skills/evolve/SKILL.md +11 -49
- package/skills/init/SKILL.md +2 -9
package/package.json
CHANGED
package/skills/architect/SKILL.md
CHANGED
@@ -48,21 +48,13 @@ python3 $TOOLS/analyze_architecture.py \
 -o .harness-evolver/architecture_signals.json
 ```
 
-3.
-```bash
-cat ~/.claude/agents/harness-evolver-architect.md
-```
-
-4. Dispatch using the Agent tool — include the agent definition in the prompt:
+3. Dispatch using the Agent tool with `subagent_type: "harness-evolver-architect"`:
 
 ```
 Agent(
+subagent_type: "harness-evolver-architect",
 description: "Architect: topology analysis",
 prompt: |
-<agent_instructions>
-{paste the FULL content of harness-evolver-architect.md here}
-</agent_instructions>
-
 <objective>
 Analyze the harness architecture and recommend the optimal multi-agent topology.
 {If called from evolve: "The evolution loop stagnated/regressed after N iterations."}
package/skills/critic/SKILL.md
CHANGED
@@ -22,21 +22,13 @@ TOOLS=$([ -d ".harness-evolver/tools" ] && echo ".harness-evolver/tools" || echo
 
 1. Read `summary.json` and identify the suspicious pattern (score jump, premature convergence).
 
-2.
-```bash
-cat ~/.claude/agents/harness-evolver-critic.md
-```
-
-3. Dispatch using the Agent tool — include the agent definition in the prompt:
+2. Dispatch using the Agent tool with `subagent_type: "harness-evolver-critic"`:
 
 ```
 Agent(
+subagent_type: "harness-evolver-critic",
 description: "Critic: analyze eval quality",
 prompt: |
-<agent_instructions>
-{paste the FULL content of harness-evolver-critic.md here}
-</agent_instructions>
-
 <objective>
 Analyze eval quality for this harness evolution project.
 The best version is {version} with score {score} achieved in {iterations} iteration(s).
package/skills/evolve/SKILL.md
CHANGED
@@ -73,28 +73,20 @@ These files are included in the proposer's `<files_to_read>` so it has real trac
 Spawn 3 proposer agents IN PARALLEL, each with a different evolutionary strategy.
 This follows the DGM/AlphaEvolve pattern: exploit + explore + crossover.
 
-
-```bash
-cat ~/.claude/agents/harness-evolver-proposer.md
-```
-
-Then determine parents for each strategy:
+Determine parents for each strategy:
 - **Exploiter parent**: current best version (from summary.json `best.version`)
 - **Explorer parent**: a non-best version with low offspring count (read summary.json history, pick one that scored >0 but is NOT the best and has NOT been parent to many children)
 - **Crossover parents**: best version + a different high-scorer from a different lineage
 
-Spawn all 3 using the Agent tool
+Spawn all 3 using the Agent tool with `subagent_type: "harness-evolver-proposer"`. The first 2 use `run_in_background: true`, the 3rd blocks:
 
 **Candidate A (Exploiter)** — `run_in_background: true`:
 ```
 Agent(
+subagent_type: "harness-evolver-proposer",
 description: "Proposer A (exploit): targeted fix for {version}",
 run_in_background: true,
 prompt: |
-<agent_instructions>
-{FULL content of harness-evolver-proposer.md}
-</agent_instructions>
-
 <strategy>
 APPROACH: exploitation
 You are the EXPLOITER. Make the SMALLEST, most targeted change that fixes
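The parent-selection rules in this hunk can be sketched in Python. This is a minimal sketch under stated assumptions, not the package's code: only `best.version` in `summary.json` comes from the source. The `history` entry shape (`version`, `score`, `parent`), the `max_children` cutoff, and reading "different lineage" as "different parent" are hypothetical choices made for illustration.

```python
from collections import Counter


def pick_parents(summary, max_children=1):
    """Choose parents for the exploiter, explorer, and crossover strategies.

    Assumed (hypothetical) shape: summary["history"] is a list of
    {"version", "score", "parent"} dicts. Only summary["best"]["version"]
    is named in the source.
    """
    best = summary["best"]["version"]
    history = summary["history"]
    by_version = {h["version"]: h for h in history}
    # Offspring count: how many versions list each version as their parent.
    children = Counter(h["parent"] for h in history if h.get("parent"))

    # Explorer: scored > 0, is not the best, and has few children so far.
    pool = [
        h for h in history
        if h["score"] > 0
        and h["version"] != best
        and children[h["version"]] <= max_children
    ]
    # Prefer the fewest offspring, then the higher score.
    pool.sort(key=lambda h: (children[h["version"]], -h["score"]))
    explorer = pool[0]["version"] if pool else best

    # Crossover: best plus the top scorer whose parent differs from best's parent.
    best_parent = by_version[best].get("parent")
    others = [
        h for h in history
        if h["version"] != best and h.get("parent") != best_parent
    ]
    others.sort(key=lambda h: -h["score"])
    partner = others[0]["version"] if others else explorer

    return {"exploiter": best, "explorer": explorer, "crossover": (best, partner)}
```

When no version qualifies as an explorer, the sketch falls back to the best version; the source does not say what the skill does in that case.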
@@ -130,13 +122,10 @@ Agent(
 **Candidate B (Explorer)** — `run_in_background: true`:
 ```
 Agent(
+subagent_type: "harness-evolver-proposer",
 description: "Proposer B (explore): bold change from {explorer_parent}",
 run_in_background: true,
 prompt: |
-<agent_instructions>
-{FULL content of harness-evolver-proposer.md}
-</agent_instructions>
-
 <strategy>
 APPROACH: exploration
 You are the EXPLORER. Try a FUNDAMENTALLY DIFFERENT approach.
@@ -172,12 +161,9 @@ Agent(
 **Candidate C (Crossover)** — blocks (last one):
 ```
 Agent(
+subagent_type: "harness-evolver-proposer",
 description: "Proposer C (crossover): combine {parent_a} + {parent_b}",
 prompt: |
-<agent_instructions>
-{FULL content of harness-evolver-proposer.md}
-</agent_instructions>
-
 <strategy>
 APPROACH: crossover
 You are the CROSSOVER agent. Combine the STRENGTHS of two different versions:
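The dispatch pattern across the three candidates (A and B with `run_in_background: true`, C blocking) resembles a fork-then-join. The sketch below is only an analogy using Python's `concurrent.futures`; it is not the Agent tool, and `dispatch` is a hypothetical stand-in for a single `Agent(...)` call.

```python
from concurrent.futures import ThreadPoolExecutor


def run_candidates(dispatch, strategies):
    """Run all but the last strategy in the background; block on the last.

    `dispatch` is a hypothetical callable standing in for one Agent(...) call.
    """
    *background, last = strategies
    with ThreadPoolExecutor(max_workers=max(1, len(background))) as pool:
        # Analogue of run_in_background: true for candidates A and B.
        futures = [pool.submit(dispatch, s) for s in background]
        # Candidate C runs in the caller, so the caller waits on it.
        last_result = dispatch(last)
        # Collect the background results once the blocking call returns.
        results = [f.result() for f in futures]
    return results + [last_result]
```

Making the last call blocking gives the caller a natural join point without a separate wait step.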
@@ -269,21 +255,13 @@ python3 $TOOLS/evaluate.py run \
 
 For each evaluated candidate, read its scores.json. If `eval_type` is `"pending-judge"` (combined_score == -1), the eval was a passthrough and needs judge scoring.
 
-
-```bash
-cat ~/.claude/agents/harness-evolver-judge.md
-```
-
-Spawn judge subagent for EACH candidate that needs judging:
+Spawn judge subagent with `subagent_type: "harness-evolver-judge"` for EACH candidate that needs judging:
 
 ```
 Agent(
+subagent_type: "harness-evolver-judge",
 description: "Judge: score {version}{suffix} outputs",
 prompt: |
-<agent_instructions>
-{FULL content of harness-evolver-judge.md}
-</agent_instructions>
-
 <objective>
 Score the outputs of harness version {version}{suffix} across all {N} tasks.
 </objective>
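The `pending-judge` check described in this hunk could look roughly like the following. It is a hedged sketch: the `eval_type` field and its `"pending-judge"` value come from the source, while the helper names and the one-`scores.json`-per-candidate-directory layout are assumptions.

```python
import json
from pathlib import Path


def needs_judge(scores):
    """True when the eval was a passthrough and still needs judge scoring.

    Per the source, such results carry eval_type "pending-judge"
    (with combined_score == -1).
    """
    return scores.get("eval_type") == "pending-judge"


def candidates_to_judge(candidate_dirs):
    """Return the candidate directories whose scores.json needs a judge.

    Assumes (hypothetically) one scores.json per candidate directory.
    """
    pending = []
    for d in candidate_dirs:
        path = Path(d) / "scores.json"
        if path.is_file() and needs_judge(json.loads(path.read_text())):
            pending.append(str(d))
    return pending
```

Each directory returned would then get its own judge subagent, matching "for EACH candidate that needs judging".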
@@ -354,21 +332,13 @@ python3 $TOOLS/evaluate.py run \
 --timeout 60
 ```
 
-
-```bash
-cat ~/.claude/agents/harness-evolver-critic.md
-```
-
-Then dispatch:
+Dispatch the critic agent:
 
 ```
 Agent(
+subagent_type: "harness-evolver-critic",
 description: "Critic: analyze eval quality",
 prompt: |
-<agent_instructions>
-{paste the FULL content of harness-evolver-critic.md here}
-</agent_instructions>
-
 <objective>
 EVAL GAMING DETECTED: Score jumped from {parent_score} to {score} in one iteration.
 Analyze the eval quality and propose a stricter eval.
@@ -431,21 +401,13 @@ python3 $TOOLS/analyze_architecture.py \
 -o .harness-evolver/architecture_signals.json
 ```
 
-
-```bash
-cat ~/.claude/agents/harness-evolver-architect.md
-```
-
-Then dispatch:
+Dispatch the architect agent:
 
 ```
 Agent(
+subagent_type: "harness-evolver-architect",
 description: "Architect: analyze topology after {stagnation/regression}",
 prompt: |
-<agent_instructions>
-{paste the FULL content of harness-evolver-architect.md here}
-</agent_instructions>
-
 <objective>
 The evolution loop has {stagnated/regressed} after {iterations} iterations (best: {best_score}).
 Analyze the harness architecture and recommend a topology change.
package/skills/init/SKILL.md
CHANGED
@@ -49,19 +49,12 @@ If NO eval exists:
 **Tasks** (`tasks/`): If test tasks exist, use them.
 
 If NO tasks exist:
--
-```bash
-cat ~/.claude/agents/harness-evolver-testgen.md
-```
-- Spawn testgen subagent:
+- Spawn testgen subagent with `subagent_type: "harness-evolver-testgen"`:
 ```
 Agent(
+subagent_type: "harness-evolver-testgen",
 description: "TestGen: generate test cases for this project",
 prompt: |
-<agent_instructions>
-{FULL content of harness-evolver-testgen.md}
-</agent_instructions>
-
 <objective>
 Generate 30 diverse test cases for this project. Write them to tasks/ directory.
 </objective>