harness-evolver 1.6.0 → 1.7.0
- package/package.json +1 -1
- package/skills/architect/SKILL.md +10 -2
- package/skills/critic/SKILL.md +10 -2
- package/skills/evolve/SKILL.md +34 -11
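The three SKILL.md changes listed above share one pattern: the `subagent_type` argument is dropped from each `Agent(...)` call, and the skill instead reads the agent definition file and pastes it into the prompt inside `<agent_instructions>` tags. A minimal Python sketch of that prompt assembly follows. The helper names `load_agent_definition` and `build_prompt` are hypothetical and not part of the package; the directory and the `harness-evolver-<role>.md` naming are taken from the `cat` commands in the diff.

```python
from pathlib import Path

# Assumption: agent definitions live under ~/.claude/agents/, as the
# `cat` commands in the diff suggest.
AGENTS_DIR = Path.home() / ".claude" / "agents"


def load_agent_definition(role: str, agents_dir: Path = AGENTS_DIR) -> str:
    """Read harness-evolver-<role>.md (role: architect, critic, or proposer)."""
    return (agents_dir / f"harness-evolver-{role}.md").read_text(encoding="utf-8")


def build_prompt(agent_definition: str, task: str) -> str:
    """Wrap the agent definition in <agent_instructions> tags, then append the task."""
    return (
        "<agent_instructions>\n"
        f"{agent_definition.strip()}\n"
        "</agent_instructions>\n"
        "\n"
        f"{task.strip()}\n"
    )
```

For example, `build_prompt(load_agent_definition("critic"), "<objective>...</objective>")` would yield the text to pass as the `prompt` field of the dispatch. The full definition is inlined rather than summarized, mirroring the "paste the FULL content" instruction in the new SKILL.md text.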
package/package.json
CHANGED
package/skills/architect/SKILL.md
CHANGED
@@ -48,13 +48,21 @@ python3 $TOOLS/analyze_architecture.py \
 -o .harness-evolver/architecture_signals.json
 ```

-3.
+3. Read the architect agent definition:
+```bash
+cat ~/.claude/agents/harness-evolver-architect.md
+```
+
+4. Dispatch using the Agent tool — include the agent definition in the prompt:

 ```
 Agent(
-subagent_type: "harness-evolver-architect",
 description: "Architect: topology analysis",
 prompt: |
+<agent_instructions>
+{paste the FULL content of harness-evolver-architect.md here}
+</agent_instructions>
+
 <objective>
 Analyze the harness architecture and recommend the optimal multi-agent topology.
 {If called from evolve: "The evolution loop stagnated/regressed after N iterations."}
package/skills/critic/SKILL.md
CHANGED
@@ -22,13 +22,21 @@ TOOLS=$([ -d ".harness-evolver/tools" ] && echo ".harness-evolver/tools" || echo

 1. Read `summary.json` and identify the suspicious pattern (score jump, premature convergence).

-2.
+2. Read the critic agent definition:
+```bash
+cat ~/.claude/agents/harness-evolver-critic.md
+```
+
+3. Dispatch using the Agent tool — include the agent definition in the prompt:

 ```
 Agent(
-subagent_type: "harness-evolver-critic",
 description: "Critic: analyze eval quality",
 prompt: |
+<agent_instructions>
+{paste the FULL content of harness-evolver-critic.md here}
+</agent_instructions>
+
 <objective>
 Analyze eval quality for this harness evolution project.
 The best version is {version} with score {score} achieved in {iterations} iteration(s).
package/skills/evolve/SKILL.md
CHANGED
@@ -50,15 +50,23 @@ These files are included in the proposer's `<files_to_read>` so it has real trac

 ### 2. Propose

-Dispatch a subagent using the **Agent tool
+Dispatch a subagent using the **Agent tool**.

-
+First, read the proposer agent definition to include in the prompt:
+```bash
+cat ~/.claude/agents/harness-evolver-proposer.md
+```
+
+Then dispatch the Agent with the agent definition + structured task:

 ```
 Agent(
-subagent_type: "harness-evolver-proposer",
 description: "Propose harness {version}",
 prompt: |
+<agent_instructions>
+{paste the FULL content of harness-evolver-proposer.md here}
+</agent_instructions>
+
 <objective>
 Propose harness version {version} that improves on the current best score of {best_score}.
 </objective>
@@ -73,7 +81,6 @@ Agent(
 - .harness-evolver/harnesses/{best_version}/proposal.md
 - .harness-evolver/langsmith_diagnosis.json (if exists — LangSmith failure analysis)
 - .harness-evolver/langsmith_stats.json (if exists — LangSmith aggregate stats)
-- .harness-evolver/context7_docs.md (if exists — current library documentation)
 - .harness-evolver/architecture.json (if exists — architect topology recommendation)
 </files_to_read>

@@ -87,13 +94,13 @@ Agent(
 <success_criteria>
 - harness.py maintains CLI interface (--input, --output, --traces-dir, --config)
 - proposal.md documents evidence-based reasoning
--
--
+- If proposing API changes, MUST use Context7 (resolve-library-id + get-library-docs) to verify current docs
+- Changes motivated by LangSmith trace data (in langsmith_diagnosis.json) when available
 </success_criteria>
 )
 ```

-Wait for `## PROPOSAL COMPLETE` in the response.
+Wait for `## PROPOSAL COMPLETE` in the response.

 ### 3. Validate

@@ -150,13 +157,21 @@ python3 $TOOLS/evaluate.py run \
 --timeout 60
 ```

-
+First read the critic agent definition:
+```bash
+cat ~/.claude/agents/harness-evolver-critic.md
+```
+
+Then dispatch:

 ```
 Agent(
-subagent_type: "harness-evolver-critic",
 description: "Critic: analyze eval quality",
 prompt: |
+<agent_instructions>
+{paste the FULL content of harness-evolver-critic.md here}
+</agent_instructions>
+
 <objective>
 EVAL GAMING DETECTED: Score jumped from {parent_score} to {score} in one iteration.
 Analyze the eval quality and propose a stricter eval.
@@ -219,13 +234,21 @@ python3 $TOOLS/analyze_architecture.py \
 -o .harness-evolver/architecture_signals.json
 ```

-
+First read the architect agent definition:
+```bash
+cat ~/.claude/agents/harness-evolver-architect.md
+```
+
+Then dispatch:

 ```
 Agent(
-subagent_type: "harness-evolver-architect",
 description: "Architect: analyze topology after {stagnation/regression}",
 prompt: |
+<agent_instructions>
+{paste the FULL content of harness-evolver-architect.md here}
+</agent_instructions>
+
 <objective>
 The evolution loop has {stagnated/regressed} after {iterations} iterations (best: {best_score}).
 Analyze the harness architecture and recommend a topology change.