harness-evolver 0.7.1 → 0.9.0
- package/README.md +123 -128
- package/bin/install.js +106 -10
- package/package.json +1 -1
package/README.md
CHANGED
|
@@ -4,46 +4,102 @@ End-to-end optimization of LLM agent harnesses, inspired by [Meta-Harness](https
|
|
|
4
4
|
|
|
5
5
|
**The harness is the 80% factor.** Changing just the scaffolding around a fixed LLM can produce a [6x performance gap](https://arxiv.org/abs/2603.28052) on the same benchmark. Harness Evolver automates the search for better harnesses using an autonomous propose-evaluate-iterate loop with full execution traces as feedback.
|
|
6
6
|
|
|
7
|
-
##
|
|
7
|
+
## Install
|
|
8
8
|
|
|
9
|
-
|
|
9
|
+
```bash
|
|
10
|
+
npx harness-evolver@latest
|
|
11
|
+
```
|
|
10
12
|
|
|
11
|
-
|
|
13
|
+
Select your runtime (Claude Code, Cursor, Codex, Windsurf) and scope (global/local). Then **restart your AI coding agent** for the skills to appear.
|
|
12
14
|
|
|
13
|
-
##
|
|
15
|
+
## Prerequisites
|
|
16
|
+
|
|
17
|
+
### API Keys (set in your shell before launching Claude Code)
|
|
18
|
+
|
|
19
|
+
The harness you're evolving may call LLM APIs. Set the keys your harness needs:
|
|
14
20
|
|
|
15
21
|
```bash
|
|
16
|
-
#
|
|
17
|
-
|
|
22
|
+
# Required: at least one LLM provider
|
|
23
|
+
export ANTHROPIC_API_KEY="sk-ant-..." # For Claude-based harnesses
|
|
24
|
+
export OPENAI_API_KEY="sk-..." # For OpenAI-based harnesses
|
|
25
|
+
export GEMINI_API_KEY="AIza..." # For Gemini-based harnesses
|
|
26
|
+
export OPENROUTER_API_KEY="sk-or-..." # For OpenRouter (multi-model)
|
|
27
|
+
|
|
28
|
+
# Optional: enhanced tracing
|
|
29
|
+
export LANGSMITH_API_KEY="lsv2_pt_..." # Auto-enables LangSmith tracing
|
|
30
|
+
```
|
|
31
|
+
|
|
32
|
+
The plugin auto-detects which keys are available during `/harness-evolver:init` and shows them. The proposer agent knows which APIs are available and uses them accordingly.
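As a rough illustration of the key check described above, the sketch below looks for the environment variables named in this README. It is an assumption about how such detection could work, not the plugin's actual code; `detect_api_keys` and `PROVIDER_KEYS` are hypothetical names.

```python
import os

# Environment variable names taken from the README; the labels are illustrative.
PROVIDER_KEYS = {
    "ANTHROPIC_API_KEY": "Anthropic",
    "OPENAI_API_KEY": "OpenAI",
    "GEMINI_API_KEY": "Gemini",
    "OPENROUTER_API_KEY": "OpenRouter",
    "LANGSMITH_API_KEY": "LangSmith (tracing)",
}


def detect_api_keys(env=None):
    """Return {variable: label} for every known key that is set and non-empty.

    Hypothetical helper: only presence is checked, key values are never returned.
    """
    env = os.environ if env is None else env
    return {name: label for name, label in PROVIDER_KEYS.items() if env.get(name)}


if __name__ == "__main__":
    found = detect_api_keys()
    for name, label in found.items():
        print(f"detected {label} via {name}")
    if not found:
        print("no API keys detected (fine for the mock-mode classifier example)")
```

Reporting only which variables are present, rather than their values, keeps secrets out of any config or log file the result is written to.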
|
|
33
|
+
|
|
34
|
+
**No API key needed for the example** — the classifier example uses keyword matching (mock mode), no LLM calls.
|
|
35
|
+
|
|
36
|
+
### Optional: Enhanced Integrations
|
|
37
|
+
|
|
38
|
+
```bash
|
|
39
|
+
# LangSmith — rich trace analysis for the proposer
|
|
40
|
+
uv tool install langsmith-cli && langsmith-cli auth login
|
|
41
|
+
|
|
42
|
+
# Context7 — up-to-date library documentation for the proposer
|
|
43
|
+
claude mcp add context7 -- npx -y @upstash/context7-mcp@latest
|
|
18
44
|
|
|
19
|
-
#
|
|
20
|
-
|
|
45
|
+
# LangChain Docs — LangChain/LangGraph-specific documentation
|
|
46
|
+
claude mcp add docs-langchain --transport http https://docs.langchain.com/mcp
|
|
21
47
|
```
|
|
22
48
|
|
|
23
49
|
## Quick Start
|
|
24
50
|
|
|
51
|
+
### Try the Example (no API key needed)
|
|
52
|
+
|
|
25
53
|
```bash
|
|
26
|
-
# 1. Copy the example
|
|
54
|
+
# 1. Copy the example
|
|
27
55
|
cp -r ~/.harness-evolver/examples/classifier ./my-classifier
|
|
28
56
|
cd my-classifier
|
|
29
57
|
|
|
30
|
-
# 2.
|
|
31
|
-
|
|
58
|
+
# 2. Open Claude Code
|
|
59
|
+
claude
|
|
60
|
+
|
|
61
|
+
# 3. Initialize — auto-detects harness.py, eval.py, tasks/
|
|
62
|
+
/harness-evolver:init
|
|
63
|
+
|
|
64
|
+
# 4. Run the evolution loop
|
|
65
|
+
/harness-evolver:evolve --iterations 3
|
|
32
66
|
|
|
33
|
-
#
|
|
34
|
-
/harness-
|
|
67
|
+
# 5. Check progress
|
|
68
|
+
/harness-evolver:status
|
|
69
|
+
```
|
|
70
|
+
|
|
71
|
+
### Use with Your Own Project
|
|
72
|
+
|
|
73
|
+
```bash
|
|
74
|
+
cd my-llm-project
|
|
75
|
+
claude
|
|
35
76
|
|
|
36
|
-
#
|
|
37
|
-
|
|
77
|
+
# Init scans your project, identifies the entry point,
|
|
78
|
+
# and helps create harness wrapper + eval + tasks if missing
|
|
79
|
+
/harness-evolver:init
|
|
80
|
+
|
|
81
|
+
# Run optimization
|
|
82
|
+
/harness-evolver:evolve --iterations 10
|
|
38
83
|
```
|
|
39
84
|
|
|
40
|
-
The
|
|
85
|
+
The init skill adapts to your project — if you have `graph.py` instead of `harness.py`, it creates a thin wrapper. If you don't have an eval script, it helps you write one.
|
|
86
|
+
|
|
87
|
+
## Available Commands
|
|
88
|
+
|
|
89
|
+
| Command | What it does |
|
|
90
|
+
|---|---|
|
|
91
|
+
| `/harness-evolver:init` | Scan project, create harness/eval/tasks, run baseline |
|
|
92
|
+
| `/harness-evolver:evolve` | Run the autonomous optimization loop |
|
|
93
|
+
| `/harness-evolver:status` | Show progress (scores, iterations, stagnation) |
|
|
94
|
+
| `/harness-evolver:compare` | Diff two versions with per-task analysis |
|
|
95
|
+
| `/harness-evolver:diagnose` | Deep trace analysis of a specific version |
|
|
96
|
+
| `/harness-evolver:deploy` | Copy the best harness back to your project |
|
|
41
97
|
|
|
42
98
|
## How It Works
|
|
43
99
|
|
|
44
100
|
```
|
|
45
101
|
┌─────────────────────────────┐
|
|
46
|
-
│
|
|
102
|
+
│ /harness-evolver:evolve │
|
|
47
103
|
│ (orchestrator skill) │
|
|
48
104
|
└──────────┬──────────────────┘
|
|
49
105
|
│
|
|
@@ -63,10 +119,10 @@ The classifier example runs in mock mode (no API key needed) and demonstrates th
|
|
|
63
119
|
scores.json
|
|
64
120
|
```
|
|
65
121
|
|
|
66
|
-
1. **Propose** — A proposer agent
|
|
67
|
-
2. **Evaluate** — The harness runs against every task. Traces are captured per-task (input, output, stdout, stderr, timing).
|
|
122
|
+
1. **Propose** — A proposer agent reads all prior candidates' code, execution traces, and scores. Diagnoses failure modes via counterfactual analysis and writes a new harness.
|
|
123
|
+
2. **Evaluate** — The harness runs against every task. Traces are captured per-task (input, output, stdout, stderr, timing). Your eval script scores the results.
|
|
68
124
|
3. **Update** — State files are updated with the new score, parent lineage, and regression detection.
|
|
69
|
-
4. **Repeat** —
|
|
125
|
+
4. **Repeat** — Until N iterations, stagnation (3 rounds without >1% improvement), or target score reached.
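The stopping conditions in step 4 can be sketched as a small function. This is a minimal sketch under stated assumptions, not the plugin's implementation: `stop_reason` is a hypothetical name, scores are assumed non-negative with `scores[0]` as the baseline, and ">1% improvement" is read as a relative gain over the best score seen before the last three rounds.

```python
def stop_reason(scores, max_iterations, target=None, patience=3, min_gain=0.01):
    """Return why the loop should stop, or None to keep iterating.

    scores[0] is the baseline score; each later entry is one iteration.
    """
    iterations = len(scores) - 1
    if target is not None and scores and max(scores) >= target:
        return "target"
    if iterations >= max_iterations:
        return "max_iterations"
    if len(scores) > patience:
        # Compare the best of the last `patience` rounds with the best before them.
        best_before = max(scores[:-patience])
        best_recent = max(scores[-patience:])
        if best_recent <= best_before * (1 + min_gain):
            return "stagnation"
    return None
```

For example, a history of `[0.5, 0.6, 0.6, 0.6, 0.6]` stops with `"stagnation"`, because the last three rounds never beat 0.6 by more than 1%.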
|
|
70
126
|
|
|
71
127
|
## The Harness Contract
|
|
72
128
|
|
|
@@ -78,8 +134,8 @@ python3 harness.py --input task.json --output result.json [--traces-dir DIR] [--
|
|
|
78
134
|
|
|
79
135
|
- `--input`: JSON with `{id, input, metadata}` (never sees expected answers)
|
|
80
136
|
- `--output`: JSON with `{id, output}`
|
|
81
|
-
- `--traces-dir`: optional directory for
|
|
82
|
-
- `--config`: optional JSON with evolvable parameters
|
|
137
|
+
- `--traces-dir`: optional directory for rich traces
|
|
138
|
+
- `--config`: optional JSON with evolvable parameters
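To make the contract concrete, here is a minimal sketch of a harness that accepts the four flags listed above. Only the flags and the `{id, input, metadata}` / `{id, output}` shapes come from the README; the keyword matcher, the config keys (`keywords`, `default_label`), and the `trace.json` file name are assumptions made for illustration.

```python
import argparse
import json
from pathlib import Path


def classify(text, config):
    """Toy keyword matcher standing in for real harness logic (hypothetical)."""
    keywords = config.get("keywords", {"urgent": ["emergency", "severe"]})
    lowered = text.lower()
    for label, words in keywords.items():
        if any(word in lowered for word in words):
            return label
    return config.get("default_label", "routine")


def run(input_path, output_path, traces_dir=None, config_path=None):
    task = json.loads(Path(input_path).read_text())
    config = json.loads(Path(config_path).read_text()) if config_path else {}
    output = classify(str(task["input"]), config)
    # The output file carries only the task id and the harness output.
    Path(output_path).write_text(json.dumps({"id": task["id"], "output": output}))
    if traces_dir:
        traces = Path(traces_dir)
        traces.mkdir(parents=True, exist_ok=True)
        # Trace file name and fields are illustrative, not a required format.
        (traces / "trace.json").write_text(json.dumps({"id": task["id"], "matched": output}))


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--traces-dir")
    parser.add_argument("--config")
    args = parser.parse_args(argv)
    run(args.input, args.output, args.traces_dir, args.config)
```

To use it as a script, call `main()` under the usual `if __name__ == "__main__":` guard. Keeping `run` separate from argument parsing makes the logic easy to call from tests.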
|
|
83
139
|
|
|
84
140
|
The eval script is also any executable:
|
|
85
141
|
|
|
@@ -87,165 +143,104 @@ The eval script is also any executable:
|
|
|
87
143
|
python3 eval.py --results-dir results/ --tasks-dir tasks/ --scores scores.json
|
|
88
144
|
```
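A scoring script matching the command above might look like the following sketch. The three flags come from the README; everything else is an assumption: that each task file carries an `expected` field, that results are one JSON file per task, and that `scores.json` holds an `accuracy` value plus a `per_task` map. Check the real formats before relying on it.

```python
import argparse
import json
from pathlib import Path


def score(results_dir, tasks_dir):
    """Exact-match accuracy; the `expected` field and score layout are assumed."""
    results = {}
    for path in Path(results_dir).glob("*.json"):
        result = json.loads(path.read_text())
        results[result["id"]] = result.get("output")
    per_task = {}
    for path in sorted(Path(tasks_dir).glob("*.json")):
        task = json.loads(path.read_text())
        # A missing result counts as a failure rather than being skipped.
        per_task[task["id"]] = 1.0 if results.get(task["id"]) == task["expected"] else 0.0
    accuracy = sum(per_task.values()) / len(per_task) if per_task else 0.0
    return {"accuracy": accuracy, "per_task": per_task}


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--results-dir", required=True)
    parser.add_argument("--tasks-dir", required=True)
    parser.add_argument("--scores", required=True)
    args = parser.parse_args(argv)
    scores = score(args.results_dir, args.tasks_dir)
    Path(args.scores).write_text(json.dumps(scores, indent=2))
```

Exact match suits a classifier; a different domain would swap in its own comparison inside `score`.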
|
|
89
145
|
|
|
90
|
-
|
|
146
|
+
Works with **any language, any framework, any domain**.
|
|
91
147
|
|
|
92
|
-
## Project Structure
|
|
148
|
+
## Project Structure (after init)
|
|
93
149
|
|
|
94
150
|
```
|
|
95
|
-
.harness-evolver/ # Created
|
|
96
|
-
├── config.json # Project config (harness cmd, eval
|
|
151
|
+
.harness-evolver/ # Created by /harness-evolver:init
|
|
152
|
+
├── config.json # Project config (harness cmd, eval, API keys detected)
|
|
97
153
|
├── summary.json # Source of truth (versions, scores, parents)
|
|
98
|
-
├── STATE.md # Human-readable status
|
|
154
|
+
├── STATE.md # Human-readable status
|
|
99
155
|
├── PROPOSER_HISTORY.md # Log of all proposals and outcomes
|
|
100
|
-
├── baseline/ # Original harness (read-only
|
|
101
|
-
│
|
|
102
|
-
│ └── config.json
|
|
156
|
+
├── baseline/ # Original harness (read-only)
|
|
157
|
+
│ └── harness.py
|
|
103
158
|
├── eval/
|
|
104
|
-
│ ├── eval.py #
|
|
105
|
-
│ └── tasks/ # Test cases
|
|
159
|
+
│ ├── eval.py # Your scoring script
|
|
160
|
+
│ └── tasks/ # Test cases
|
|
106
161
|
└── harnesses/
|
|
107
162
|
└── v001/
|
|
108
|
-
├── harness.py #
|
|
109
|
-
├──
|
|
110
|
-
├──
|
|
111
|
-
├── scores.json # Evaluation results
|
|
163
|
+
├── harness.py # Evolved candidate
|
|
164
|
+
├── proposal.md # Why this version was created
|
|
165
|
+
├── scores.json # How it scored
|
|
112
166
|
└── traces/ # Full execution traces
|
|
113
167
|
├── stdout.log
|
|
114
168
|
├── stderr.log
|
|
115
169
|
├── timing.json
|
|
116
170
|
└── task_001/
|
|
117
|
-
├── input.json
|
|
118
|
-
└── output.json
|
|
171
|
+
├── input.json
|
|
172
|
+
└── output.json
|
|
119
173
|
```
|
|
120
174
|
|
|
121
|
-
##
|
|
175
|
+
## The Proposer
|
|
122
176
|
|
|
123
|
-
|
|
177
|
+
The core of the system. 4-phase workflow from the Meta-Harness paper:
|
|
124
178
|
|
|
125
|
-
|
|
126
|
-
|
|
127
|
-
|
|
128
|
-
|
|
129
|
-
|
|
179
|
+
| Phase | What it does |
|
|
180
|
+
|---|---|
|
|
181
|
+
| **Orient** | Read `summary.json` + `PROPOSER_HISTORY.md`. Pick 2-3 versions to investigate. |
|
|
182
|
+
| **Diagnose** | Deep trace analysis. grep for errors, diff versions, counterfactual diagnosis. |
|
|
183
|
+
| **Propose** | Write new harness. Prefer additive changes after regressions. |
|
|
184
|
+
| **Document** | Write `proposal.md` with evidence. Update history. |
|
|
130
185
|
|
|
131
|
-
|
|
132
|
-
|---|---|---|
|
|
133
|
-
| **Skills** | `skills/harness-evolve-init/`, `skills/harness-evolve/`, `skills/harness-evolve-status/` | Slash commands that orchestrate the loop |
|
|
134
|
-
| **Agent** | `agents/harness-evolver-proposer.md` | The proposer — 4-phase workflow (orient, diagnose, propose, document) with 6 rules |
|
|
135
|
-
| **Tools** | `tools/evaluate.py`, `tools/state.py`, `tools/init.py`, `tools/detect_stack.py`, `tools/trace_logger.py` | CLI tools called via subprocess — zero LLM tokens spent on deterministic work |
|
|
136
|
-
| **Installer** | `bin/install.js`, `package.json` | Copies skills/agents/tools to the right locations |
|
|
137
|
-
| **Example** | `examples/classifier/` | 10-task medical classifier with mock mode |
|
|
186
|
+
**7 rules:** evidence-based changes, conservative after regression, don't repeat mistakes, one hypothesis at a time, maintain interface, prefer readability, use available API keys from environment.
|
|
138
187
|
|
|
139
188
|
## Integrations
|
|
140
189
|
|
|
141
|
-
### LangSmith (optional)
|
|
142
|
-
|
|
143
|
-
If `LANGSMITH_API_KEY` is set, the plugin automatically:
|
|
144
|
-
- Enables `LANGCHAIN_TRACING_V2` for auto-tracing of LangChain/LangGraph harnesses
|
|
145
|
-
- Detects [langsmith-cli](https://github.com/gigaverse-app/langsmith-cli) for the proposer to query traces directly
|
|
190
|
+
### LangSmith (optional, recommended for LangChain/LangGraph harnesses)
|
|
146
191
|
|
|
147
192
|
```bash
|
|
148
|
-
# Setup
|
|
149
193
|
export LANGSMITH_API_KEY=lsv2_...
|
|
150
194
|
uv tool install langsmith-cli && langsmith-cli auth login
|
|
151
|
-
|
|
152
|
-
# The proposer can then do:
|
|
153
|
-
langsmith-cli --json runs list --project harness-evolver-v003 --failed --fields id,name,error
|
|
154
|
-
langsmith-cli --json runs stats --project harness-evolver-v003
|
|
155
195
|
```
|
|
156
196
|
|
|
157
|
-
|
|
158
|
-
|
|
159
|
-
|
|
160
|
-
|
|
161
|
-
The plugin detects the harness's technology stack via AST analysis (17 libraries supported) and instructs the proposer to consult current documentation before proposing API changes.
|
|
197
|
+
When detected, the plugin:
|
|
198
|
+
- Sets `LANGCHAIN_TRACING_V2=true` automatically — all LLM calls are traced
|
|
199
|
+
- The proposer queries traces directly via `langsmith-cli`:
|
|
162
200
|
|
|
163
201
|
```bash
|
|
164
|
-
|
|
165
|
-
|
|
166
|
-
|
|
167
|
-
# The proposer automatically:
|
|
168
|
-
# 1. Reads config.json → stack.detected (e.g., LangChain, ChromaDB)
|
|
169
|
-
# 2. Queries Context7 for current docs before writing code
|
|
170
|
-
# 3. Annotates proposal.md with "API verified via Context7"
|
|
202
|
+
langsmith-cli --json runs list --project harness-evolver-v003 --failed --fields id,name,error
|
|
203
|
+
langsmith-cli --json runs stats --project harness-evolver-v003
|
|
171
204
|
```
|
|
172
205
|
|
|
173
|
-
|
|
174
|
-
|
|
175
|
-
### LangChain Docs MCP (optional)
|
|
206
|
+
### Context7 (optional, recommended for any library-heavy harness)
|
|
176
207
|
|
|
177
208
|
```bash
|
|
178
|
-
claude mcp add
|
|
209
|
+
claude mcp add context7 -- npx -y @upstash/context7-mcp@latest
|
|
179
210
|
```
|
|
180
211
|
|
|
181
|
-
|
|
182
|
-
|
|
183
|
-
## The Proposer
|
|
184
|
-
|
|
185
|
-
The proposer agent is the core of the system. It follows a 4-phase workflow derived from the Meta-Harness paper:
|
|
186
|
-
|
|
187
|
-
| Phase | Context % | What it does |
|
|
188
|
-
|---|---|---|
|
|
189
|
-
| **Orient** | ~6% | Read `summary.json` and `PROPOSER_HISTORY.md`. Decide which 2-3 versions to investigate. |
|
|
190
|
-
| **Diagnose** | ~80% | Deep trace analysis on selected versions. grep for errors, diff between good/bad versions, counterfactual diagnosis. |
|
|
191
|
-
| **Propose** | ~10% | Write new `harness.py` + `config.json`. Prefer additive changes after regressions. |
|
|
192
|
-
| **Document** | ~4% | Write `proposal.md` with evidence. Append to `PROPOSER_HISTORY.md`. |
|
|
193
|
-
|
|
194
|
-
**6 rules:**
|
|
195
|
-
1. Every change motivated by evidence (cite task ID, trace line, or score delta)
|
|
196
|
-
2. After regression, prefer additive changes
|
|
197
|
-
3. Don't repeat past mistakes (read PROPOSER_HISTORY.md)
|
|
198
|
-
4. One hypothesis at a time when possible
|
|
199
|
-
5. Maintain the CLI interface
|
|
200
|
-
6. Prefer readable harnesses over defensive ones
|
|
201
|
-
|
|
202
|
-
## Supported Libraries (Stack Detection)
|
|
203
|
-
|
|
204
|
-
The AST-based stack detector recognizes 17 libraries:
|
|
205
|
-
|
|
206
|
-
| Category | Libraries |
|
|
207
|
-
|---|---|
|
|
208
|
-
| **AI Frameworks** | LangChain, LangGraph, LlamaIndex, OpenAI, Anthropic, DSPy, CrewAI, AutoGen |
|
|
209
|
-
| **Vector Stores** | ChromaDB, Pinecone, Qdrant, Weaviate |
|
|
210
|
-
| **Web** | FastAPI, Flask, Pydantic |
|
|
211
|
-
| **Data** | Pandas, NumPy |
|
|
212
|
+
The plugin detects your stack via AST analysis (17 libraries: LangChain, LangGraph, OpenAI, Anthropic, ChromaDB, FastAPI, etc.) and instructs the proposer to consult current docs before proposing API changes.
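AST-based detection of this kind can be sketched with Python's standard `ast` module. The code below is an illustrative assumption, not the plugin's detector: `detect_stack` and `KNOWN_LIBRARIES` are hypothetical names, and the mapping covers only a few of the libraries named above.

```python
import ast

# Import root -> display name; a small illustrative subset, not the plugin's list.
KNOWN_LIBRARIES = {
    "langchain": "LangChain",
    "langgraph": "LangGraph",
    "openai": "OpenAI",
    "anthropic": "Anthropic",
    "chromadb": "ChromaDB",
    "fastapi": "FastAPI",
}


def detect_stack(source):
    """Return sorted display names of known libraries imported by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            roots = [alias.name.split(".")[0] for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 excludes relative imports such as `from . import x`.
            roots = [node.module.split(".")[0]]
        else:
            continue
        found.update(KNOWN_LIBRARIES[root] for root in roots if root in KNOWN_LIBRARIES)
    return sorted(found)
```

Parsing the syntax tree instead of searching the text means a library name that appears only inside a string or comment is not reported as a dependency.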
|
|
212
213
|
|
|
213
214
|
## Development
|
|
214
215
|
|
|
215
216
|
```bash
|
|
216
|
-
# Run all tests (41 tests, stdlib-only
|
|
217
|
+
# Run all tests (41 tests, stdlib-only)
|
|
217
218
|
python3 -m unittest discover -s tests -v
|
|
218
219
|
|
|
219
|
-
# Test
|
|
220
|
-
|
|
221
|
-
python3 harness.py --input tasks/task_001.json --output /tmp/result.json --config config.json
|
|
222
|
-
cat /tmp/result.json
|
|
220
|
+
# Test example manually
|
|
221
|
+
python3 examples/classifier/harness.py --input examples/classifier/tasks/task_001.json --output /tmp/result.json --config examples/classifier/config.json
|
|
223
222
|
|
|
224
|
-
#
|
|
223
|
+
# Install locally for development
|
|
225
224
|
node bin/install.js
|
|
226
225
|
```
|
|
227
226
|
|
|
228
|
-
## Comparison
|
|
227
|
+
## Comparison
|
|
229
228
|
|
|
230
|
-
| | Meta-Harness
|
|
229
|
+
| | Meta-Harness | A-Evolve | ECC | **Harness Evolver** |
|
|
231
230
|
|---|---|---|---|---|
|
|
232
231
|
| **Format** | Paper artifact | Framework (Docker) | Plugin (passive) | **Plugin (active)** |
|
|
233
|
-
| **Search
|
|
234
|
-
| **Context/iter** | 10M tokens | Variable | N/A | **Full filesystem** |
|
|
232
|
+
| **Search** | Code-space | Code-space | Prompt-space | **Code-space** |
|
|
235
233
|
| **Domain** | TerminalBench-2 | Coding benchmarks | Dev workflow | **Any domain** |
|
|
236
|
-
| **Install** | Manual Python | Docker CLI | `/plugin install` | **`npx
|
|
237
|
-
| **LangSmith** | No | No | No | **Yes
|
|
238
|
-
| **Context7** | No | No | No | **Yes
|
|
234
|
+
| **Install** | Manual Python | Docker CLI | `/plugin install` | **`npx`** |
|
|
235
|
+
| **LangSmith** | No | No | No | **Yes** |
|
|
236
|
+
| **Context7** | No | No | No | **Yes** |
|
|
239
237
|
|
|
240
238
|
## References
|
|
241
239
|
|
|
242
|
-
- [Meta-Harness
|
|
243
|
-
- [GSD (Get Shit Done)](https://github.com/gsd-build/get-shit-done) — CLI architecture inspiration
|
|
244
|
-
- [LangSmith CLI](https://github.com/gigaverse-app/langsmith-cli) — Trace analysis for the proposer
|
|
245
|
-
- [Context7](https://github.com/upstash/context7) — Documentation lookup via MCP
|
|
240
|
+
- [Meta-Harness paper (arxiv 2603.28052)](https://arxiv.org/abs/2603.28052) — Lee et al., 2026
|
|
246
241
|
- [Design Spec](docs/specs/2026-03-31-harness-evolver-design.md)
|
|
247
|
-
- [LangSmith Integration
|
|
248
|
-
- [Context7 Integration
|
|
242
|
+
- [LangSmith Integration](docs/specs/2026-03-31-langsmith-integration.md)
|
|
243
|
+
- [Context7 Integration](docs/specs/2026-03-31-context7-integration.md)
|
|
249
244
|
|
|
250
245
|
## License
|
|
251
246
|
|
package/bin/install.js
CHANGED
|
@@ -70,30 +70,43 @@ function checkPython() {
|
|
|
70
70
|
}
|
|
71
71
|
}
|
|
72
72
|
|
|
73
|
+
function checkCommand(cmd) {
|
|
74
|
+
try {
|
|
75
|
+
execSync(cmd, { stdio: "pipe" });
|
|
76
|
+
return true;
|
|
77
|
+
} catch {
|
|
78
|
+
return false;
|
|
79
|
+
}
|
|
80
|
+
}
|
|
81
|
+
|
|
73
82
|
function installForRuntime(runtimeDir, scope) {
|
|
74
83
|
const baseDir = scope === "local"
|
|
75
84
|
? path.join(process.cwd(), runtimeDir)
|
|
76
85
|
: path.join(HOME, runtimeDir);
|
|
77
86
|
|
|
78
|
-
const
|
|
87
|
+
const skillsDir = path.join(baseDir, "skills");
|
|
79
88
|
const agentsDir = path.join(baseDir, "agents");
|
|
80
89
|
|
|
81
|
-
// Skills →
|
|
82
|
-
// Claude Code expects commands/name.md, not commands/name/SKILL.md
|
|
90
|
+
// Skills → ~/.claude/skills/<skill-name>/SKILL.md (proper skills format)
|
|
83
91
|
const skillsSource = path.join(PLUGIN_ROOT, "skills");
|
|
84
92
|
if (fs.existsSync(skillsSource)) {
|
|
85
|
-
fs.mkdirSync(commandsDir, { recursive: true });
|
|
86
93
|
for (const skill of fs.readdirSync(skillsSource, { withFileTypes: true })) {
|
|
87
94
|
if (skill.isDirectory()) {
|
|
88
|
-
const
|
|
89
|
-
|
|
90
|
-
|
|
91
|
-
|
|
92
|
-
}
|
|
95
|
+
const src = path.join(skillsSource, skill.name);
|
|
96
|
+
const dest = path.join(skillsDir, "harness-evolver:" + skill.name);
|
|
97
|
+
copyDir(src, dest);
|
|
98
|
+
console.log(` ${GREEN}✓${RESET} Installed skill: harness-evolver:${skill.name}`);
|
|
93
99
|
}
|
|
94
100
|
}
|
|
95
101
|
}
|
|
96
102
|
|
|
103
|
+
// Cleanup old commands/ install (from previous versions)
|
|
104
|
+
const oldCommandsDir = path.join(baseDir, "commands", "harness-evolver");
|
|
105
|
+
if (fs.existsSync(oldCommandsDir)) {
|
|
106
|
+
fs.rmSync(oldCommandsDir, { recursive: true, force: true });
|
|
107
|
+
console.log(` ${GREEN}✓${RESET} Cleaned up old commands/ directory`);
|
|
108
|
+
}
|
|
109
|
+
|
|
97
110
|
// Agents → agents/
|
|
98
111
|
const agentsSource = path.join(PLUGIN_ROOT, "agents");
|
|
99
112
|
if (fs.existsSync(agentsSource)) {
|
|
@@ -219,7 +232,90 @@ async function main() {
|
|
|
219
232
|
fs.writeFileSync(versionPath, VERSION);
|
|
220
233
|
console.log(` ${GREEN}✓${RESET} VERSION ${VERSION}`);
|
|
221
234
|
|
|
222
|
-
console.log(`\n ${GREEN}Done!${RESET}
|
|
235
|
+
console.log(`\n ${GREEN}Done!${RESET} Restart Claude Code, then run ${BRIGHT_MAGENTA}/harness-evolver:init${RESET}\n`);
|
|
236
|
+
|
|
237
|
+
// Optional integrations
|
|
238
|
+
console.log(` ${YELLOW}Install optional integrations?${RESET}\n`);
|
|
239
|
+
console.log(` These enhance the proposer with rich traces and up-to-date documentation.\n`);
|
|
240
|
+
|
|
241
|
+
// LangSmith CLI
|
|
242
|
+
const hasLangsmithCli = checkCommand("langsmith-cli --version");
|
|
243
|
+
if (hasLangsmithCli) {
|
|
244
|
+
console.log(` ${GREEN}✓${RESET} langsmith-cli already installed`);
|
|
245
|
+
} else {
|
|
246
|
+
console.log(` ${BOLD}LangSmith CLI${RESET} — rich trace analysis (error rates, latency, token usage)`);
|
|
247
|
+
console.log(` ${DIM}uv tool install langsmith-cli && langsmith-cli auth login${RESET}`);
|
|
248
|
+
const lsAnswer = await ask(rl, `\n ${YELLOW}Install langsmith-cli? [y/N]:${RESET} `);
|
|
249
|
+
if (lsAnswer.trim().toLowerCase() === "y") {
|
|
250
|
+
console.log(`\n Installing langsmith-cli...`);
|
|
251
|
+
try {
|
|
252
|
+
execSync("uv tool install langsmith-cli", { stdio: "inherit" });
|
|
253
|
+
console.log(`\n ${GREEN}✓${RESET} langsmith-cli installed`);
|
|
254
|
+
console.log(` ${YELLOW}Run ${BOLD}langsmith-cli auth login${RESET}${YELLOW} to authenticate with your LangSmith API key.${RESET}\n`);
|
|
255
|
+
} catch {
|
|
256
|
+
console.log(`\n ${RED}Failed.${RESET} Install manually: uv tool install langsmith-cli\n`);
|
|
257
|
+
}
|
|
258
|
+
}
|
|
259
|
+
}
|
|
260
|
+
|
|
261
|
+
// Context7 MCP
|
|
262
|
+
const hasContext7 = (() => {
|
|
263
|
+
try {
|
|
264
|
+
for (const p of [path.join(HOME, ".claude", "settings.json"), path.join(HOME, ".claude.json")]) {
|
|
265
|
+
if (fs.existsSync(p)) {
|
|
266
|
+
const s = JSON.parse(fs.readFileSync(p, "utf8"));
|
|
267
|
+
if (s.mcpServers && (s.mcpServers.context7 || s.mcpServers.Context7)) return true;
|
|
268
|
+
}
|
|
269
|
+
}
|
|
270
|
+
} catch {}
|
|
271
|
+
return false;
|
|
272
|
+
})();
|
|
273
|
+
if (hasContext7) {
|
|
274
|
+
console.log(` ${GREEN}✓${RESET} Context7 MCP already configured`);
|
|
275
|
+
} else {
|
|
276
|
+
console.log(`\n ${BOLD}Context7 MCP${RESET} — up-to-date library documentation (LangChain, OpenAI, etc.)`);
|
|
277
|
+
console.log(` ${DIM}claude mcp add context7 -- npx -y @upstash/context7-mcp@latest${RESET}`);
|
|
278
|
+
const c7Answer = await ask(rl, `\n ${YELLOW}Install Context7 MCP? [y/N]:${RESET} `);
|
|
279
|
+
if (c7Answer.trim().toLowerCase() === "y") {
|
|
280
|
+
console.log(`\n Installing Context7 MCP...`);
|
|
281
|
+
try {
|
|
282
|
+
execSync("claude mcp add context7 -- npx -y @upstash/context7-mcp@latest", { stdio: "inherit" });
|
|
283
|
+
console.log(`\n ${GREEN}✓${RESET} Context7 MCP configured`);
|
|
284
|
+
} catch {
|
|
285
|
+
console.log(`\n ${RED}Failed.${RESET} Install manually: claude mcp add context7 -- npx -y @upstash/context7-mcp@latest\n`);
|
|
286
|
+
}
|
|
287
|
+
}
|
|
288
|
+
}
|
|
289
|
+
|
|
290
|
+
// LangChain Docs MCP
|
|
291
|
+
const hasLcDocs = (() => {
|
|
292
|
+
try {
|
|
293
|
+
for (const p of [path.join(HOME, ".claude", "settings.json"), path.join(HOME, ".claude.json")]) {
|
|
294
|
+
if (fs.existsSync(p)) {
|
|
295
|
+
const s = JSON.parse(fs.readFileSync(p, "utf8"));
|
|
296
|
+
if (s.mcpServers && (s.mcpServers["docs-langchain"] || s.mcpServers["LangChain Docs"])) return true;
|
|
297
|
+
}
|
|
298
|
+
}
|
|
299
|
+
} catch {}
|
|
300
|
+
return false;
|
|
301
|
+
})();
|
|
302
|
+
if (hasLcDocs) {
|
|
303
|
+
console.log(` ${GREEN}✓${RESET} LangChain Docs MCP already configured`);
|
|
304
|
+
} else {
|
|
305
|
+
console.log(`\n ${BOLD}LangChain Docs MCP${RESET} — LangChain/LangGraph/LangSmith documentation search`);
|
|
306
|
+
console.log(` ${DIM}claude mcp add docs-langchain --transport http https://docs.langchain.com/mcp${RESET}`);
|
|
307
|
+
const lcAnswer = await ask(rl, `\n ${YELLOW}Install LangChain Docs MCP? [y/N]:${RESET} `);
|
|
308
|
+
if (lcAnswer.trim().toLowerCase() === "y") {
|
|
309
|
+
console.log(`\n Installing LangChain Docs MCP...`);
|
|
310
|
+
try {
|
|
311
|
+
execSync("claude mcp add docs-langchain --transport http https://docs.langchain.com/mcp", { stdio: "inherit" });
|
|
312
|
+
console.log(`\n ${GREEN}✓${RESET} LangChain Docs MCP configured`);
|
|
313
|
+
} catch {
|
|
314
|
+
console.log(`\n ${RED}Failed.${RESET} Install manually: claude mcp add docs-langchain --transport http https://docs.langchain.com/mcp\n`);
|
|
315
|
+
}
|
|
316
|
+
}
|
|
317
|
+
}
|
|
318
|
+
|
|
223
319
|
console.log(`\n ${DIM}Quick start with example:${RESET}`);
|
|
224
320
|
console.log(` cp -r ~/.harness-evolver/examples/classifier ./my-project`);
|
|
225
321
|
console.log(` cd my-project && claude`);
|