harness-evolver 0.2.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "harness-evolver",
-  "version": "0.2.0",
+  "version": "0.2.1",
   "description": "Meta-Harness-style autonomous harness optimization for Claude Code",
   "author": "Raphael Valdetaro Christi Cordeiro",
   "license": "MIT",
@@ -1,8 +1,8 @@
 ---
 name: harness-evolve-init
-description: "Initialize harness evolution in the current project. Auto-detects harness.py, eval.py, and tasks/ in the working directory."
-argument-hint: "[directory] [--harness <path>] [--eval <path>] [--tasks <path>]"
-allowed-tools: [Read, Write, Bash, Glob]
+description: "Initialize harness evolution in the current project. Scans the codebase, identifies the entry point, and helps create harness wrapper, eval script, and test cases if they don't exist."
+argument-hint: "[directory]"
+allowed-tools: [Read, Write, Edit, Bash, Glob, Grep, Agent]
 ---
 
 # /harness-evolve-init
@@ -12,39 +12,187 @@ Initialize the Harness Evolver for this project.
 ## Usage
 
 ```
-/harness-evolve-init              # auto-detect everything in CWD
-/harness-evolve-init ./my-project # auto-detect in a specific directory
-/harness-evolve-init --harness run.py # override one path, auto-detect the rest
+/harness-evolve-init              # setup in current directory
+/harness-evolve-init ./my-project # setup in a specific directory
 ```
 
-## How Auto-Detection Works
+## Your Job: A 3-Phase Setup Wizard
 
-The tool scans the directory for:
-1. **Exact names:** `harness.py`, `eval.py`, `tasks/`, `config.json`
-2. **Fuzzy fallback:** `*harness*`, `*agent*`, `*run*` for harness; `*eval*`, `*score*` for eval; any dir with JSON files containing `id`/`input` fields for tasks
+You are the intelligent layer. The init.py tool is dumb: it takes paths. Your job is to figure out what to pass it, creating files if needed.
 
-If all 3 are found, init proceeds immediately. If something is missing, it reports what's needed.
+### Phase 1: SCAN the project
 
-## What To Do
+Read the project structure to understand what exists:
 
-Run the init tool:
+```bash
+find . -maxdepth 3 -type f -name "*.py" | head -30
+ls -la
+```
+
+Look for:
+- **Entry point candidates:** `main.py`, `app.py`, `agent.py`, `graph.py`, `pipeline.py`, `bot.py`, `run.py`, or any file with an `if __name__` block
+- **Existing eval/test files:** `eval.py`, `test_*.py`, `score.py`, `judge.py`
+- **Existing test data:** `tasks/`, `tests/`, `data/`, `examples/`, `fixtures/`, any dir with JSON/JSONL files
+- **Config files:** `config.json`, `config.yaml`, `.env`
+- **Framework clues:** imports of langchain, langgraph, openai, anthropic, crewai, etc.
+
+Also run stack detection:
+```bash
+python3 ~/.harness-evolver/tools/detect_stack.py .
+```
+
+### Phase 2: CREATE what's missing
+
+There are 3 artifacts needed. For each, check if it exists or needs to be created.
+
+#### A. The Harness (`harness.py`)
+
+**If `harness.py` already exists with the right interface** (`--input`, `--output`): use it directly.
+
+**If the project has an entry point but NOT in our format:** Create a `harness.py` wrapper.
+
+Read the entry point to understand its input/output format, then generate a wrapper:
+
+```python
+#!/usr/bin/env python3
+"""Harness wrapper for [project name]. Generated by harness-evolve-init."""
+
+import argparse
+import json
+import sys
+
+# Import the actual project code
+from [entry_module] import [main_function_or_class]
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--input", required=True)
+    parser.add_argument("--output", required=True)
+    parser.add_argument("--traces-dir", default=None)
+    parser.add_argument("--config", default=None)
+    args = parser.parse_args()
+
+    task = json.load(open(args.input))
+    config = json.load(open(args.config)) if args.config else {}
+
+    # Call the actual project code
+    result = [call_the_project](task["input"], **config)
+
+    json.dump({"id": task["id"], "output": str(result)}, open(args.output, "w"))
+
+if __name__ == "__main__":
+    main()
+```
+
+Adapt this template based on what you learned from reading the entry point. Ask the user to confirm if you're unsure about the input/output mapping.
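For illustration, a filled-in sketch of that wrapper (hypothetical: it assumes a project function `answer(text)`, stubbed inline here so the example is self-contained and runnable — a real wrapper would import the project's actual entry point):

```python
#!/usr/bin/env python3
"""Hypothetical filled-in wrapper sketch; the inline stub stands in for the
real project code that the template's [entry_module] import would supply."""

import argparse
import json

def answer(text):
    # Inline stub standing in for the real project entry point.
    return text.upper()

def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--traces-dir", default=None)
    parser.add_argument("--config", default=None)
    args = parser.parse_args(argv)

    with open(args.input) as f:
        task = json.load(f)
    config = json.load(open(args.config)) if args.config else {}

    # Call the (stubbed) project code with any config overrides.
    result = answer(task["input"], **config)

    with open(args.output, "w") as f:
        json.dump({"id": task["id"], "output": str(result)}, f)

if __name__ == "__main__":
    # Demo run against a temporary task file; a real wrapper just calls main().
    import os, tempfile
    d = tempfile.mkdtemp()
    inp, out = os.path.join(d, "task.json"), os.path.join(d, "result.json")
    with open(inp, "w") as f:
        json.dump({"id": "task_001", "input": "hello"}, f)
    main(["--input", inp, "--output", out])
```

The key point is the contract, not the body: the wrapper reads one task JSON via `--input` and writes `{"id", "output"}` via `--output`.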
+
+#### B. The Eval (`eval.py`)
+
+**If `eval.py` exists:** use it.
+
+**If not:** Ask the user what "correct" means for their use case, then generate one:
+
+- **Classification/extraction:** exact match or fuzzy match
+- **Chatbot/QA:** LLM-as-judge (requires API key) or keyword matching
+- **Code generation:** execution-based (run the code, check output)
+- **RAG:** relevance scoring
+
+Start with the simplest eval that works. The evolver can iterate on the harness without a perfect eval — even a rough eval gives signal.
+
+```python
+#!/usr/bin/env python3
+"""Eval script for [project]. Generated by harness-evolve-init."""
+
+import argparse
+import json
+import os
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--results-dir", required=True)
+    parser.add_argument("--tasks-dir", required=True)
+    parser.add_argument("--scores", required=True)
+    args = parser.parse_args()
+
+    correct, total = 0, 0
+    per_task = {}
+
+    for fname in sorted(os.listdir(args.tasks_dir)):
+        if not fname.endswith(".json"):
+            continue
+        with open(os.path.join(args.tasks_dir, fname)) as f:
+            task = json.load(f)
+        task_id = task["id"]
+
+        result_path = os.path.join(args.results_dir, fname)
+        if not os.path.exists(result_path):
+            per_task[task_id] = {"score": 0.0, "error": "no output"}
+            total += 1
+            continue
+
+        with open(result_path) as f:
+            result = json.load(f)
+
+        # ADAPT THIS: define what "correct" means for this project
+        expected = task.get("expected", "")
+        actual = result.get("output", "")
+        match = actual.lower().strip() == expected.lower().strip()
+
+        per_task[task_id] = {"score": 1.0 if match else 0.0, "expected": expected, "actual": actual}
+        correct += int(match)
+        total += 1
+
+    accuracy = correct / total if total > 0 else 0.0
+    json.dump({
+        "combined_score": accuracy,
+        "accuracy": accuracy,
+        "total_tasks": total,
+        "correct": correct,
+        "per_task": per_task,
+    }, open(args.scores, "w"), indent=2)
+
+if __name__ == "__main__":
+    main()
+```
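As a sanity check on the contract above, a minimal sketch (stdlib only, function name invented for the illustration) that applies the same exact-match rule to in-memory pairs and returns the shape the template writes to `--scores`:

```python
import json

def score_pairs(pairs):
    """Score (expected, actual) string pairs with the template's exact-match
    rule, returning the same shape eval.py writes to the --scores file."""
    per_task = {}
    correct = 0
    for i, (expected, actual) in enumerate(pairs, start=1):
        match = actual.lower().strip() == expected.lower().strip()
        per_task[f"task_{i:03d}"] = {
            "score": 1.0 if match else 0.0,
            "expected": expected,
            "actual": actual,
        }
        correct += int(match)
    total = len(pairs)
    accuracy = correct / total if total > 0 else 0.0
    return {
        "combined_score": accuracy,
        "accuracy": accuracy,
        "total_tasks": total,
        "correct": correct,
        "per_task": per_task,
    }

# One normalized match, one miss -> combined_score 0.5.
print(json.dumps(score_pairs([("Paris", " paris "), ("4", "5")]), indent=2))
```

`combined_score` is presumably the headline number reported as the baseline; the rest of the fields are for debugging individual tasks.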
+
+#### C. Test Cases (`tasks/`)
+
+**If `tasks/` exists with JSON files:** use it.
+
+**If the project has test data in another format:** Convert it to our format.
+
+**If no test data exists:** Help create 5-10 test cases. Ask the user:
+> "What are typical inputs to your system? And what are the expected outputs? Give me 3-5 examples and I'll create the task files."
+
+Each task file is:
+```json
+{"id": "task_001", "input": "...", "expected": "...", "metadata": {}}
+```
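When the user supplies example pairs, a short sketch (helper name and paths are illustrative, not part of the package) for materializing them as task files in that format:

```python
import json
import os

def write_tasks(examples, tasks_dir="tasks"):
    """Write (input, expected) pairs as tasks/task_NNN.json files
    in the {"id", "input", "expected", "metadata"} task format."""
    os.makedirs(tasks_dir, exist_ok=True)
    paths = []
    for i, (inp, expected) in enumerate(examples, start=1):
        task = {"id": f"task_{i:03d}", "input": inp,
                "expected": expected, "metadata": {}}
        path = os.path.join(tasks_dir, f"task_{i:03d}.json")
        with open(path, "w") as f:
            json.dump(task, f, indent=2)
        paths.append(path)
    return paths

# Example: two user-supplied pairs.
# write_tasks([("What is 2+2?", "4"), ("Capital of France?", "Paris")])
```

Naming the files after the task ids keeps them aligned with the results the eval script looks up by filename.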
+
+### Phase 3: RUN init.py
+
+Once all 3 artifacts exist, run:
 
 ```bash
-python3 ~/.harness-evolver/tools/init.py {directory if provided} \
+python3 ~/.harness-evolver/tools/init.py \
+  --harness harness.py \
+  --eval eval.py \
+  --tasks tasks/ \
   --tools-dir ~/.harness-evolver/tools
 ```
 
-Add explicit flags only if the user provided them:
-- `--harness PATH` — override harness auto-detection
-- `--eval PATH` — override eval auto-detection
-- `--tasks PATH` — override tasks auto-detection
-- `--harness-config PATH` — optional config for the harness
+If a harness config exists, add `--harness-config config.json`.
 
-If `~/.harness-evolver/tools/init.py` does not exist, check `.harness-evolver/tools/init.py` (local override).
+If `~/.harness-evolver/tools/init.py` does not exist, check `.harness-evolver/tools/init.py`.
 
-After init completes, report:
-- What was detected (harness, eval, tasks)
+### After init completes, report:
+
+- What was detected vs created
+- Stack detected (libraries)
+- Integrations available (LangSmith, Context7)
 - Baseline score
-- Number of tasks
-- Integrations detected (LangSmith, Context7, stack)
-- Next step: run `/harness-evolve` to start the optimization loop
+- Next step: `/harness-evolve` to start the optimization loop
+
+## Key Principle
+
+**Don't ask the user to restructure their project.** You adapt to them. If they have a LangGraph graph in `src/graph.py`, you create a thin wrapper — you don't ask them to rename it to `harness.py`. The wrapper IS the harness.