harness-evolver 1.8.0 → 2.0.0

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "harness-evolver",
-   "version": "1.8.0",
+   "version": "2.0.0",
    "description": "Meta-Harness-style autonomous harness optimization for Claude Code",
    "author": "Raphael Valdetaro",
    "license": "MIT",
@@ -36,15 +36,34 @@ python3 -c "import json; s=json.load(open('.harness-evolver/summary.json')); pri
 
  ### 1.5. Gather LangSmith Traces (MANDATORY after every evaluation)
 
- **Run these commands unconditionally after EVERY evaluation** (including baseline). If langsmith-cli is not installed or there are no runs, the commands fail silently; that's fine. But you MUST attempt them.
+ **Run these commands unconditionally after EVERY evaluation** (including baseline). Do NOT guess project names; discover them.
+
+ **Step 1: Find the actual LangSmith project name**
 
  ```bash
- langsmith-cli --json runs list --project harness-evolver-{last_evaluated_version} --failed --fields id,name,error,inputs --limit 10 > .harness-evolver/langsmith_diagnosis.json 2>/dev/null || echo "[]" > .harness-evolver/langsmith_diagnosis.json
+ langsmith-cli --json projects list --name-pattern "harness-evolver*" --limit 10 2>/dev/null
+ ```
 
- langsmith-cli --json runs stats --project harness-evolver-{last_evaluated_version} > .harness-evolver/langsmith_stats.json 2>/dev/null || echo "{}" > .harness-evolver/langsmith_stats.json
+ This returns all projects matching the prefix. Pick the most recently updated one, or the one matching the current version. Save the project name:
+
+ ```bash
+ LS_PROJECT=$(langsmith-cli --json projects list --name-pattern "harness-evolver*" --limit 1 2>/dev/null | python3 -c "import sys,json; data=json.load(sys.stdin); print(data[0]['name'] if data else '')" 2>/dev/null || echo "")
  ```
 
- For the first iteration, use `baseline` as the version. For subsequent iterations, use the latest evaluated version.
+ If `LS_PROJECT` is empty, langsmith-cli is not available or no projects exist; skip to step 2.
+
+ **Step 2: Gather traces from the discovered project**
+
+ ```bash
+ if [ -n "$LS_PROJECT" ]; then
+     langsmith-cli --json runs list --project "$LS_PROJECT" --failed --fields id,name,error,inputs --limit 10 > .harness-evolver/langsmith_diagnosis.json 2>/dev/null || echo "[]" > .harness-evolver/langsmith_diagnosis.json
+     langsmith-cli --json runs stats --project "$LS_PROJECT" > .harness-evolver/langsmith_stats.json 2>/dev/null || echo "{}" > .harness-evolver/langsmith_stats.json
+     echo "$LS_PROJECT" > .harness-evolver/langsmith_project.txt
+ else
+     echo "[]" > .harness-evolver/langsmith_diagnosis.json
+     echo "{}" > .harness-evolver/langsmith_stats.json
+ fi
+ ```
 
  These files are included in the proposer's `<files_to_read>` so it has real trace data for diagnosis.
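
Note on the discovery step: the new instructions take the first project the CLI returns, while `evaluate.py` in this release also records the project name in `langsmith_project.txt`. A minimal Python sketch of one way to combine the two sources follows. `resolve_ls_project` is a hypothetical helper, not part of the package. It assumes the CLI output is a JSON list of objects with a `name` key (as the shell one-liner in the diff implies) and that the recorded file sits directly under the directory passed as `base_dir`.

```python
import json
import os


def resolve_ls_project(base_dir, projects_json, version=None):
    """Hypothetical helper: decide which LangSmith project to query.

    Order of preference (an assumption, not the package's behavior):
    1. the name recorded in <base_dir>/langsmith_project.txt, if present;
    2. a discovered project whose name ends with "-<version>";
    3. the first discovered project;
    4. an empty string, meaning "skip trace gathering".
    """
    recorded = os.path.join(base_dir, "langsmith_project.txt")
    if os.path.isfile(recorded):
        with open(recorded) as f:
            name = f.read().strip()
        if name:
            return name

    # Treat missing or malformed CLI output as "no projects found".
    try:
        projects = json.loads(projects_json) if projects_json else []
    except json.JSONDecodeError:
        return ""
    if not isinstance(projects, list):
        return ""

    names = [p["name"] for p in projects if isinstance(p, dict) and p.get("name")]
    if version:
        for name in names:
            if name.endswith(f"-{version}"):
                return name
    return names[0] if names else ""
```

An empty return value plays the same role as an empty `LS_PROJECT` in the shell version: the caller writes the `[]` and `{}` placeholder files instead of querying LangSmith.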
 
@@ -187,15 +206,41 @@ Agent(
  )
  ```
 
- Wait for all 3 to complete. The background agents will notify when done.
+ **Also spawn these additional candidates:**
+
+ **Candidate D (Prompt Specialist)** — `run_in_background: true`:
+ Same as Exploiter but with a different focus:
+ ```
+ <strategy>
+ APPROACH: prompt-engineering
+ You are the PROMPT SPECIALIST. Focus ONLY on improving the system prompt,
+ few-shot examples, output format instructions, and prompt structure.
+ Do NOT change the retrieval logic, pipeline structure, or code architecture.
+ </strategy>
+ ```
+ Output to: `.harness-evolver/harnesses/{version}d/`
+
+ **Candidate E (Data/Retrieval Specialist)** — `run_in_background: true`:
+ ```
+ <strategy>
+ APPROACH: retrieval-optimization
+ You are the RETRIEVAL SPECIALIST. Focus ONLY on improving how data is
+ retrieved, filtered, ranked, and presented to the LLM.
+ Do NOT change the system prompt text or output formatting.
+ Improve: search logic, relevance scoring, cross-domain retrieval, chunking.
+ </strategy>
+ ```
+ Output to: `.harness-evolver/harnesses/{version}e/`
+
+ Wait for all 5 to complete. The background agents will notify when done.
 
- **Special case iteration 1**: Only the exploiter and explorer can run (no second parent for crossover yet). Spawn 2 agents: exploiter (from baseline) and explorer (also from baseline but with bold strategy). Skip crossover.
+ **Minimum 3 candidates ALWAYS, even on iteration 1.** On iteration 1, the crossover agent uses baseline as both parents but with instruction to "combine the best retrieval strategy with the best prompt strategy from your analysis of the baseline." On iteration 2+, crossover uses two genuinely different parents.
 
- **Special case — iteration 2+**: All 3 strategies. Explorer parent = fitness-weighted random from history excluding current best.
+ **On iteration 3+**: If scores are improving, keep all 5 strategies. If stagnating, replace Candidate D with a "Radical" strategy that rewrites the harness from scratch.
 
  ### 3. Validate All Candidates
 
- For each candidate (a, b, c):
+ For each candidate (a, b, c, d, e):
  ```bash
  python3 $TOOLS/evaluate.py validate --harness .harness-evolver/harnesses/{version}{suffix}/harness.py --config .harness-evolver/harnesses/{version}{suffix}/config.json
  ```
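
Note on the validation step: with five candidates, the `{version}{suffix}` template expands to one `validate` call per suffix. The sketch below shows that expansion in Python. `build_validate_commands` and its parameters are hypothetical names for illustration only; the command-line shape is copied from the line in the diff.

```python
def build_validate_commands(tools_dir, version, suffixes="abcde",
                            root=".harness-evolver/harnesses"):
    """Hypothetical helper: expand the validate template for each candidate.

    Returns one argv list per suffix, mirroring
    `python3 $TOOLS/evaluate.py validate --harness ... --config ...`.
    """
    commands = []
    for suffix in suffixes:
        candidate_dir = f"{root}/{version}{suffix}"
        commands.append([
            "python3", f"{tools_dir}/evaluate.py", "validate",
            "--harness", f"{candidate_dir}/harness.py",
            "--config", f"{candidate_dir}/config.json",
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them; "tools" and "v002" are
    # placeholder values, not paths taken from the package.
    for argv in build_validate_commands("tools", "v002"):
        print(" ".join(argv))
```

Each returned list can be passed to `subprocess.run` as-is. Building argv lists rather than shell strings avoids quoting problems if a version label ever contains spaces.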
package/tools/evaluate.py CHANGED
@@ -118,12 +118,17 @@ def cmd_run(args):
      api_key = os.environ.get(ls.get("api_key_env", "LANGSMITH_API_KEY"), "")
      if api_key:
          version = os.path.basename(os.path.dirname(traces_dir))
+         ls_project = f"{ls.get('project_prefix', 'harness-evolver')}-{version}"
          langsmith_env = {
              **os.environ,
              "LANGCHAIN_TRACING_V2": "true",
              "LANGCHAIN_API_KEY": api_key,
-             "LANGCHAIN_PROJECT": f"{ls.get('project_prefix', 'harness-evolver')}-{version}",
+             "LANGCHAIN_PROJECT": ls_project,
          }
+         # Write the project name so the evolve skill knows where to find traces
+         ls_project_file = os.path.join(os.path.dirname(os.path.dirname(traces_dir)), "langsmith_project.txt")
+         with open(ls_project_file, "w") as f:
+             f.write(ls_project)
 
      for task_file in task_files:
          task_path = os.path.join(tasks_dir, task_file)
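
Note on the path arithmetic: both the project name and the location of `langsmith_project.txt` are derived from `traces_dir` with nested `os.path.dirname` calls, so the result depends on the shape of that path. The sketch below isolates the two expressions from the hunk so they can be tried on sample inputs. `derive_langsmith_paths` is a hypothetical wrapper, and the example directory layout is an assumption; the diff does not show what `traces_dir` looks like at runtime.

```python
import os


def derive_langsmith_paths(traces_dir, project_prefix="harness-evolver"):
    """Hypothetical wrapper around the two expressions added in cmd_run."""
    # Parent directory name of traces_dir is used as the version label.
    version = os.path.basename(os.path.dirname(traces_dir))
    ls_project = f"{project_prefix}-{version}"
    # Two levels above traces_dir is where the project-name file is written.
    ls_project_file = os.path.join(
        os.path.dirname(os.path.dirname(traces_dir)), "langsmith_project.txt"
    )
    return ls_project, ls_project_file


if __name__ == "__main__":
    # Assumed layout, for illustration only.
    sample = os.path.join(".harness-evolver", "harnesses", "v003", "traces")
    project, project_file = derive_langsmith_paths(sample)
    print(project)
    print(project_file)
```

Under this assumed layout the file lands next to the version directories, one level below `.harness-evolver/`, whereas the skill instructions write `.harness-evolver/langsmith_project.txt`. Whether the two paths coincide depends on the real layout. A trailing separator on `traces_dir` also shifts both results by one level, because `os.path.dirname` then only strips the separator.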