harness-evolver 1.9.0 → 2.0.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/package.json +1 -1
- package/skills/evolve/SKILL.md +30 -4
package/package.json CHANGED
package/skills/evolve/SKILL.md CHANGED
@@ -206,15 +206,41 @@ Agent(
 )
 ```
 
-
+**Also spawn these additional candidates:**
 
-**
+**Candidate D (Prompt Specialist)** — `run_in_background: true`:
+Same as Exploiter but with a different focus:
+```
+<strategy>
+APPROACH: prompt-engineering
+You are the PROMPT SPECIALIST. Focus ONLY on improving the system prompt,
+few-shot examples, output format instructions, and prompt structure.
+Do NOT change the retrieval logic, pipeline structure, or code architecture.
+</strategy>
+```
+Output to: `.harness-evolver/harnesses/{version}d/`
+
+**Candidate E (Data/Retrieval Specialist)** — `run_in_background: true`:
+```
+<strategy>
+APPROACH: retrieval-optimization
+You are the RETRIEVAL SPECIALIST. Focus ONLY on improving how data is
+retrieved, filtered, ranked, and presented to the LLM.
+Do NOT change the system prompt text or output formatting.
+Improve: search logic, relevance scoring, cross-domain retrieval, chunking.
+</strategy>
+```
+Output to: `.harness-evolver/harnesses/{version}e/`
+
+Wait for all 5 to complete. The background agents will notify when done.
+
+**Minimum 3 candidates ALWAYS, even on iteration 1.** On iteration 1, the crossover agent uses baseline as both parents but with instruction to "combine the best retrieval strategy with the best prompt strategy from your analysis of the baseline." On iteration 2+, crossover uses two genuinely different parents.
 
-**
+**On iteration 3+**: If scores are improving, keep all 5 strategies. If stagnating, replace Candidate D with a "Radical" strategy that rewrites the harness from scratch.
 
 ### 3. Validate All Candidates
 
-For each candidate (a, b, c):
+For each candidate (a, b, c, d, e):
 ```bash
 python3 $TOOLS/evaluate.py validate --harness .harness-evolver/harnesses/{version}{suffix}/harness.py --config .harness-evolver/harnesses/{version}{suffix}/config.json
 ```
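
The validation step in the hunk above now covers five suffixes (a through e), and Candidate D is swapped for a "Radical" strategy only when scores stagnate on iteration 3 or later. The Python sketch below is one way to picture those two rules; it is not code from the package. The function names, the `"radical"` label, the `"v003"` version string, the `"tools"` fallback for `$TOOLS`, and the boolean `stagnating` flag are all hypothetical, since the diff does not show how versions are named or how stagnation is detected. Only the `evaluate.py validate --harness ... --config ...` command shape and the directory layout are taken from the diff.

```python
import os
import shlex

# Suffixes named in the updated SKILL.md text: a, b, c plus the new d and e.
CANDIDATE_SUFFIXES = ["a", "b", "c", "d", "e"]


def candidate_d_strategy(iteration: int, stagnating: bool) -> str:
    """Return a label for Candidate D's strategy.

    Hypothetical helper: D stays the prompt specialist unless scores are
    stagnating on iteration 3 or later, in which case it becomes "radical".
    """
    if iteration >= 3 and stagnating:
        return "radical"
    return "prompt-engineering"


def validate_commands(version: str, tools_dir: str, suffixes=CANDIDATE_SUFFIXES):
    """Build one `evaluate.py validate` argument list per candidate suffix."""
    commands = []
    for suffix in suffixes:
        harness_dir = f".harness-evolver/harnesses/{version}{suffix}"
        commands.append([
            "python3",
            f"{tools_dir}/evaluate.py",
            "validate",
            "--harness", f"{harness_dir}/harness.py",
            "--config", f"{harness_dir}/config.json",
        ])
    return commands


if __name__ == "__main__":
    # "tools" is an assumed fallback; the skill text only refers to $TOOLS.
    tools = os.environ.get("TOOLS", "tools")
    # "v003" is an illustrative version label, not a format the diff specifies.
    for argv in validate_commands("v003", tools):
        print(shlex.join(argv))
```

Running the script prints the five commands without executing them, which makes it easy to check the paths before any candidate is actually validated.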