harness-evolver 0.7.1 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +123 -128
  2. package/bin/install.js +106 -10
  3. package/package.json +1 -1
package/README.md CHANGED
@@ -4,46 +4,102 @@ End-to-end optimization of LLM agent harnesses, inspired by [Meta-Harness](https

  **The harness is the 80% factor.** Changing just the scaffolding around a fixed LLM can produce a [6x performance gap](https://arxiv.org/abs/2603.28052) on the same benchmark. Harness Evolver automates the search for better harnesses using an autonomous propose-evaluate-iterate loop with full execution traces as feedback.

- ## Why
+ ## Install

- Manual harness engineering is slow and doesn't scale. Existing optimizers work in prompt-space (OPRO, TextGrad, GEPA) or use compressed summaries. Meta-Harness showed that **code-space search with full diagnostic context** (10M+ tokens of traces) outperforms all of them by 10+ points.
+ ```bash
+ npx harness-evolver@latest
+ ```

- Harness Evolver brings that approach to any domain as a Claude Code plugin.
+ Select your runtime (Claude Code, Cursor, Codex, Windsurf) and scope (global/local). Then **restart your AI coding agent** for the skills to appear.

- ## Install
+ ## Prerequisites
+
+ ### API Keys (set in your shell before launching Claude Code)
+
+ The harness you're evolving may call LLM APIs. Set the keys your harness needs:

  ```bash
- # Via npx (recommended)
- npx harness-evolver@latest
+ # Required: at least one LLM provider
+ export ANTHROPIC_API_KEY="sk-ant-..." # For Claude-based harnesses
+ export OPENAI_API_KEY="sk-..." # For OpenAI-based harnesses
+ export GEMINI_API_KEY="AIza..." # For Gemini-based harnesses
+ export OPENROUTER_API_KEY="sk-or-..." # For OpenRouter (multi-model)
+
+ # Optional: enhanced tracing
+ export LANGSMITH_API_KEY="lsv2_pt_..." # Auto-enables LangSmith tracing
+ ```
+
+ The plugin auto-detects which keys are available during `/harness-evolver:init` and shows them. The proposer agent knows which APIs are available and uses them accordingly.
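The added README paragraph says `/harness-evolver:init` auto-detects which provider keys are set. That detection code is not part of this diff, so what follows is only a minimal Python sketch of such a check, assuming "available" simply means the environment variable is present and non-empty. `PROVIDER_KEYS` and `detect_api_keys` are hypothetical names, not the plugin's API.

```python
import os
from typing import Dict, Mapping, Optional

# Hypothetical mapping from environment variable to provider label.
# The variable names are the ones listed in the README diff.
PROVIDER_KEYS = {
    "ANTHROPIC_API_KEY": "anthropic",
    "OPENAI_API_KEY": "openai",
    "GEMINI_API_KEY": "gemini",
    "OPENROUTER_API_KEY": "openrouter",
    "LANGSMITH_API_KEY": "langsmith",
}


def detect_api_keys(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Report which keys are set without ever returning the secret values."""
    source = os.environ if env is None else env
    # Whitespace-only values are treated as missing.
    return {label: bool(source.get(var, "").strip()) for var, label in PROVIDER_KEYS.items()}


if __name__ == "__main__":
    for label, present in detect_api_keys().items():
        print(f"{label:<11} {'set' if present else 'missing'}")
```

Returning booleans rather than the values keeps the result safe to print or write into a config file.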
+
+ **No API key needed for the example** — the classifier example uses keyword matching (mock mode), no LLM calls.
+
+ ### Optional: Enhanced Integrations
+
+ ```bash
+ # LangSmith — rich trace analysis for the proposer
+ uv tool install langsmith-cli && langsmith-cli auth login
+
+ # Context7 — up-to-date library documentation for the proposer
+ claude mcp add context7 -- npx -y @upstash/context7-mcp@latest

- # Or as a Claude Code plugin
- /plugin install harness-evolver
+ # LangChain Docs LangChain/LangGraph-specific documentation
+ claude mcp add docs-langchain --transport http https://docs.langchain.com/mcp
  ```

  ## Quick Start

+ ### Try the Example (no API key needed)
+
  ```bash
- # 1. Copy the example into a working directory
+ # 1. Copy the example
  cp -r ~/.harness-evolver/examples/classifier ./my-classifier
  cd my-classifier

- # 2. Initialize (validates harness, evaluates baseline)
- /harness-evolve-init --harness harness.py --eval eval.py --tasks tasks/
+ # 2. Open Claude Code
+ claude
+
+ # 3. Initialize — auto-detects harness.py, eval.py, tasks/
+ /harness-evolver:init
+
+ # 4. Run the evolution loop
+ /harness-evolver:evolve --iterations 3

- # 3. Run the evolution loop
- /harness-evolve --iterations 5
+ # 5. Check progress
+ /harness-evolver:status
+ ```
+
+ ### Use with Your Own Project
+
+ ```bash
+ cd my-llm-project
+ claude

- # 4. Check progress anytime
- /harness-evolve-status
+ # Init scans your project, identifies the entry point,
+ # and helps create harness wrapper + eval + tasks if missing
+ /harness-evolver:init
+
+ # Run optimization
+ /harness-evolver:evolve --iterations 10
  ```

- The classifier example runs in mock mode (no API key needed) and demonstrates the full loop in under 2 minutes.
+ The init skill adapts to your project if you have `graph.py` instead of `harness.py`, it creates a thin wrapper. If you don't have an eval script, it helps you write one.
+
+ ## Available Commands
+
+ | Command | What it does |
+ |---|---|
+ | `/harness-evolver:init` | Scan project, create harness/eval/tasks, run baseline |
+ | `/harness-evolver:evolve` | Run the autonomous optimization loop |
+ | `/harness-evolver:status` | Show progress (scores, iterations, stagnation) |
+ | `/harness-evolver:compare` | Diff two versions with per-task analysis |
+ | `/harness-evolver:diagnose` | Deep trace analysis of a specific version |
+ | `/harness-evolver:deploy` | Copy the best harness back to your project |

  ## How It Works

  ```
  ┌─────────────────────────────┐
- /harness-evolve
+ /harness-evolver:evolve
  │ (orchestrator skill) │
  └──────────┬──────────────────┘

@@ -63,10 +119,10 @@ The classifier example runs in mock mode (no API key needed) and demonstrates th
  scores.json
  ```

- 1. **Propose** — A proposer agent (Claude Code subagent) reads all prior candidates' code, execution traces, and scores. It diagnoses failure modes via counterfactual analysis and writes a new harness.
- 2. **Evaluate** — The harness runs against every task. Traces are captured per-task (input, output, stdout, stderr, timing). The user's eval script scores the results.
+ 1. **Propose** — A proposer agent reads all prior candidates' code, execution traces, and scores. Diagnoses failure modes via counterfactual analysis and writes a new harness.
+ 2. **Evaluate** — The harness runs against every task. Traces are captured per-task (input, output, stdout, stderr, timing). Your eval script scores the results.
  3. **Update** — State files are updated with the new score, parent lineage, and regression detection.
- 4. **Repeat** — The loop continues until N iterations, stagnation (3 rounds without >1% improvement), or a target score is reached.
+ 4. **Repeat** — Until N iterations, stagnation (3 rounds without >1% improvement), or target score reached.
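The Repeat step stops on any of three conditions: N iterations, stagnation (3 rounds without more than 1% improvement), or a target score. The diff does not say whether "1%" is relative or absolute, or what it is measured against, so this Python sketch assumes a relative gain over the best score recorded before the last three rounds, with non-negative scores where higher is better. `should_stop` is a hypothetical helper, not the plugin's code.

```python
from typing import Optional, Sequence


def should_stop(
    history: Sequence[float],
    max_iterations: int,
    target: Optional[float] = None,
    patience: int = 3,
    min_rel_gain: float = 0.01,
) -> bool:
    """Decide whether the evolve loop should halt.

    history[0] is the baseline score; history[i] is the score of iteration i.
    """
    iterations_done = len(history) - 1
    if iterations_done >= max_iterations:
        return True  # iteration budget exhausted
    if target is not None and max(history) >= target:
        return True  # target score reached
    if iterations_done >= patience:
        best_before = max(history[:-patience])
        best_recent = max(history[-patience:])
        # Stagnation: none of the last `patience` rounds beat the earlier
        # best by more than min_rel_gain (1% by default).
        if best_recent <= best_before * (1 + min_rel_gain):
            return True
    return False
```

For example, `should_stop([0.50, 0.50, 0.501, 0.502], max_iterations=10)` is `True`: three rounds passed and the best of them (0.502) is within 1% of the baseline 0.50.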

  ## The Harness Contract

@@ -78,8 +134,8 @@ python3 harness.py --input task.json --output result.json [--traces-dir DIR] [--

  - `--input`: JSON with `{id, input, metadata}` (never sees expected answers)
  - `--output`: JSON with `{id, output}`
- - `--traces-dir`: optional directory for the harness to write rich traces
- - `--config`: optional JSON with evolvable parameters (model, temperature, etc.)
+ - `--traces-dir`: optional directory for rich traces
+ - `--config`: optional JSON with evolvable parameters
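The contract above is everything a harness has to satisfy. Below is a minimal Python sketch of a harness that follows it. The keyword rules, the `keywords` and `default_label` config keys, and the `decision.json` trace file are hypothetical, chosen only to exercise each flag; this is not the packaged classifier example.

```python
import argparse
import json
import os
from typing import Optional, Sequence

# Hypothetical keyword rules standing in for whatever logic is being evolved.
DEFAULT_KEYWORDS = {"cardiology": ["chest pain", "palpitations"]}


def classify(text: str, config: dict) -> str:
    """Toy keyword matcher; `keywords` and `default_label` are assumed config keys."""
    lowered = text.lower()
    for label, words in config.get("keywords", DEFAULT_KEYWORDS).items():
        if any(word in lowered for word in words):
            return label
    return config.get("default_label", "unknown")


def main(argv: Optional[Sequence[str]] = None) -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True)   # JSON: {id, input, metadata}
    parser.add_argument("--output", required=True)  # JSON: {id, output}
    parser.add_argument("--traces-dir")             # optional, for rich traces
    parser.add_argument("--config")                 # optional evolvable parameters
    args = parser.parse_args(argv)

    with open(args.input) as f:
        task = json.load(f)  # never contains the expected answer
    config = {}
    if args.config:
        with open(args.config) as f:
            config = json.load(f)

    output = classify(task["input"], config)
    with open(args.output, "w") as f:
        json.dump({"id": task["id"], "output": output}, f)

    if args.traces_dir:
        # Anything written here is extra evidence for the proposer.
        os.makedirs(args.traces_dir, exist_ok=True)
        with open(os.path.join(args.traces_dir, "decision.json"), "w") as f:
            json.dump({"id": task["id"], "output": output, "config": config}, f)
```

In a real `harness.py`, end the file with `if __name__ == "__main__": main()`; it is left out here so the module can be imported and tested without command-line arguments.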

  The eval script is also any executable:

@@ -87,165 +143,104 @@ The eval script is also any executable:
  python3 eval.py --results-dir results/ --tasks-dir tasks/ --scores scores.json
  ```
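The eval contract is only a command line. A minimal Python sketch of an `eval.py` that accepts it follows, under assumptions the diff does not state: each task file carries an `expected` field, each harness result sits in `--results-dir` under the same file name as its task, and `scores.json` holds a `combined_score` plus per-task scores. Those three conventions and the exact-match metric are all hypothetical.

```python
import argparse
import json
import os
from typing import Dict, Optional, Sequence


def score(results_dir: str, tasks_dir: str) -> Dict:
    """Exact-match accuracy over every task file in tasks_dir."""
    per_task: Dict[str, float] = {}
    for name in sorted(os.listdir(tasks_dir)):
        if not name.endswith(".json"):
            continue
        with open(os.path.join(tasks_dir, name)) as f:
            task = json.load(f)  # assumed to include an "expected" field
        output = None
        result_path = os.path.join(results_dir, name)  # assumed: same file name
        if os.path.exists(result_path):
            with open(result_path) as f:
                output = json.load(f).get("output")
        # A missing or wrong result scores 0.
        expected = task.get("expected")
        per_task[task["id"]] = 1.0 if output is not None and output == expected else 0.0
    combined = sum(per_task.values()) / len(per_task) if per_task else 0.0
    return {"combined_score": combined, "per_task": per_task}


def main(argv: Optional[Sequence[str]] = None) -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--results-dir", required=True)
    parser.add_argument("--tasks-dir", required=True)
    parser.add_argument("--scores", required=True)
    args = parser.parse_args(argv)
    with open(args.scores, "w") as f:
        json.dump(score(args.results_dir, args.tasks_dir), f, indent=2)
```

Scoring a missing result as 0 means a harness that crashes on a task is penalised rather than silently skipped.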

- This means Harness Evolver works with **any language, any framework, any domain**.
+ Works with **any language, any framework, any domain**.

- ## Project Structure
+ ## Project Structure (after init)

  ```
- .harness-evolver/ # Created in your project by /harness-evolve-init
- ├── config.json # Project config (harness cmd, eval cmd, evolution params)
+ .harness-evolver/ # Created by /harness-evolver:init
+ ├── config.json # Project config (harness cmd, eval, API keys detected)
  ├── summary.json # Source of truth (versions, scores, parents)
- ├── STATE.md # Human-readable status (generated)
+ ├── STATE.md # Human-readable status
  ├── PROPOSER_HISTORY.md # Log of all proposals and outcomes
- ├── baseline/ # Original harness (read-only reference)
- ├── harness.py
- │ └── config.json
+ ├── baseline/ # Original harness (read-only)
+ └── harness.py
  ├── eval/
- │ ├── eval.py # Scoring script
- │ └── tasks/ # Test cases (JSON files)
+ │ ├── eval.py # Your scoring script
+ │ └── tasks/ # Test cases
  └── harnesses/
  └── v001/
- ├── harness.py # Candidate code
- ├── config.json # Evolvable parameters
- ├── proposal.md # Proposer's reasoning
- ├── scores.json # Evaluation results
+ ├── harness.py # Evolved candidate
+ ├── proposal.md # Why this version was created
+ ├── scores.json # How it scored
  └── traces/ # Full execution traces
  ├── stdout.log
  ├── stderr.log
  ├── timing.json
  └── task_001/
- ├── input.json # What the harness received
- └── output.json # What the harness returned
+ ├── input.json
+ └── output.json
  ```

- ## Plugin Architecture
+ ## The Proposer

- Three-layer design inspired by [GSD](https://github.com/gsd-build/get-shit-done):
+ The core of the system. 4-phase workflow from the Meta-Harness paper:

- ```
- Layer 1: Skills + Agents (markdown) → AI orchestration
- Layer 2: Tools (Python stdlib-only) → Deterministic operations
- Layer 3: Installer (Node.js) → Distribution via npx
- ```
+ | Phase | What it does |
+ |---|---|
+ | **Orient** | Read `summary.json` + `PROPOSER_HISTORY.md`. Pick 2-3 versions to investigate. |
+ | **Diagnose** | Deep trace analysis. grep for errors, diff versions, counterfactual diagnosis. |
+ | **Propose** | Write new harness. Prefer additive changes after regressions. |
+ | **Document** | Write `proposal.md` with evidence. Update history. |

- | Component | Files | Purpose |
- |---|---|---|
- | **Skills** | `skills/harness-evolve-init/`, `skills/harness-evolve/`, `skills/harness-evolve-status/` | Slash commands that orchestrate the loop |
- | **Agent** | `agents/harness-evolver-proposer.md` | The proposer — 4-phase workflow (orient, diagnose, propose, document) with 6 rules |
- | **Tools** | `tools/evaluate.py`, `tools/state.py`, `tools/init.py`, `tools/detect_stack.py`, `tools/trace_logger.py` | CLI tools called via subprocess — zero LLM tokens spent on deterministic work |
- | **Installer** | `bin/install.js`, `package.json` | Copies skills/agents/tools to the right locations |
- | **Example** | `examples/classifier/` | 10-task medical classifier with mock mode |
138
187
 
139
188
  ## Integrations
140
189
 
141
- ### LangSmith (optional)
142
-
143
- If `LANGSMITH_API_KEY` is set, the plugin automatically:
144
- - Enables `LANGCHAIN_TRACING_V2` for auto-tracing of LangChain/LangGraph harnesses
145
- - Detects [langsmith-cli](https://github.com/gigaverse-app/langsmith-cli) for the proposer to query traces directly
190
+ ### LangSmith (optional, recommended for LangChain/LangGraph harnesses)
146
191
 
147
192
  ```bash
148
- # Setup
149
193
  export LANGSMITH_API_KEY=lsv2_...
150
194
  uv tool install langsmith-cli && langsmith-cli auth login
151
-
152
- # The proposer can then do:
153
- langsmith-cli --json runs list --project harness-evolver-v003 --failed --fields id,name,error
154
- langsmith-cli --json runs stats --project harness-evolver-v003
155
195
  ```
156
196
 
157
- No custom API client — the proposer uses `langsmith-cli` like it uses `grep` and `diff`.
158
-
159
- ### Context7 (optional)
160
-
161
- The plugin detects the harness's technology stack via AST analysis (17 libraries supported) and instructs the proposer to consult current documentation before proposing API changes.
197
+ When detected, the plugin:
198
+ - Sets `LANGCHAIN_TRACING_V2=true` automatically — all LLM calls are traced
199
+ - The proposer queries traces directly via `langsmith-cli`:
162
200
 
163
201
  ```bash
164
- # Setup
165
- claude mcp add context7 -- npx -y @upstash/context7-mcp@latest
166
-
167
- # The proposer automatically:
168
- # 1. Reads config.json → stack.detected (e.g., LangChain, ChromaDB)
169
- # 2. Queries Context7 for current docs before writing code
170
- # 3. Annotates proposal.md with "API verified via Context7"
202
+ langsmith-cli --json runs list --project harness-evolver-v003 --failed --fields id,name,error
203
+ langsmith-cli --json runs stats --project harness-evolver-v003
171
204
  ```
172
205
 
173
- Without Context7, the proposer uses model knowledge and annotates "API not verified against current docs."
174
-
175
- ### LangChain Docs MCP (optional)
206
+ ### Context7 (optional, recommended for any library-heavy harness)
176
207
 
177
208
  ```bash
178
- claude mcp add docs-langchain --transport http https://docs.langchain.com/mcp
209
+ claude mcp add context7 -- npx -y @upstash/context7-mcp@latest
179
210
  ```
180
211
 
181
- Complements Context7 with LangChain/LangGraph/LangSmith-specific documentation search.
182
-
183
- ## The Proposer
184
-
185
- The proposer agent is the core of the system. It follows a 4-phase workflow derived from the Meta-Harness paper:
186
-
187
- | Phase | Context % | What it does |
188
- |---|---|---|
189
- | **Orient** | ~6% | Read `summary.json` and `PROPOSER_HISTORY.md`. Decide which 2-3 versions to investigate. |
190
- | **Diagnose** | ~80% | Deep trace analysis on selected versions. grep for errors, diff between good/bad versions, counterfactual diagnosis. |
191
- | **Propose** | ~10% | Write new `harness.py` + `config.json`. Prefer additive changes after regressions. |
192
- | **Document** | ~4% | Write `proposal.md` with evidence. Append to `PROPOSER_HISTORY.md`. |
193
-
194
- **6 rules:**
195
- 1. Every change motivated by evidence (cite task ID, trace line, or score delta)
196
- 2. After regression, prefer additive changes
197
- 3. Don't repeat past mistakes (read PROPOSER_HISTORY.md)
198
- 4. One hypothesis at a time when possible
199
- 5. Maintain the CLI interface
200
- 6. Prefer readable harnesses over defensive ones
201
-
202
- ## Supported Libraries (Stack Detection)
203
-
204
- The AST-based stack detector recognizes 17 libraries:
205
-
206
- | Category | Libraries |
207
- |---|---|
208
- | **AI Frameworks** | LangChain, LangGraph, LlamaIndex, OpenAI, Anthropic, DSPy, CrewAI, AutoGen |
209
- | **Vector Stores** | ChromaDB, Pinecone, Qdrant, Weaviate |
210
- | **Web** | FastAPI, Flask, Pydantic |
211
- | **Data** | Pandas, NumPy |
212
+ The plugin detects your stack via AST analysis (17 libraries: LangChain, LangGraph, OpenAI, Anthropic, ChromaDB, FastAPI, etc.) and instructs the proposer to consult current docs before proposing API changes.
212
213
 
213
214
  ## Development
214
215
 
215
216
  ```bash
216
- # Run all tests (41 tests, stdlib-only, no pip install needed)
217
+ # Run all tests (41 tests, stdlib-only)
217
218
  python3 -m unittest discover -s tests -v
218
219
 
219
- # Test the example manually
220
- cd examples/classifier
221
- python3 harness.py --input tasks/task_001.json --output /tmp/result.json --config config.json
222
- cat /tmp/result.json
220
+ # Test example manually
221
+ python3 examples/classifier/harness.py --input examples/classifier/tasks/task_001.json --output /tmp/result.json --config examples/classifier/config.json
223
222
 
224
- # Run the installer locally
223
+ # Install locally for development
225
224
  node bin/install.js
226
225
  ```
227
226
 
228
- ## Comparison with Related Work
227
+ ## Comparison
229
228
 
230
- | | Meta-Harness (paper) | A-Evolve | ECC /evolve | **Harness Evolver** |
229
+ | | Meta-Harness | A-Evolve | ECC | **Harness Evolver** |
231
230
  |---|---|---|---|---|
232
231
  | **Format** | Paper artifact | Framework (Docker) | Plugin (passive) | **Plugin (active)** |
233
- | **Search space** | Code-space | Code-space | Prompt-space | **Code-space** |
234
- | **Context/iter** | 10M tokens | Variable | N/A | **Full filesystem** |
232
+ | **Search** | Code-space | Code-space | Prompt-space | **Code-space** |
235
233
  | **Domain** | TerminalBench-2 | Coding benchmarks | Dev workflow | **Any domain** |
236
- | **Install** | Manual Python | Docker CLI | `/plugin install` | **`npx` or `/plugin install`** |
237
- | **LangSmith** | No | No | No | **Yes (langsmith-cli)** |
238
- | **Context7** | No | No | No | **Yes (MCP)** |
234
+ | **Install** | Manual Python | Docker CLI | `/plugin install` | **`npx`** |
235
+ | **LangSmith** | No | No | No | **Yes** |
236
+ | **Context7** | No | No | No | **Yes** |
239
237
 
240
238
  ## References
241
239
 
242
- - [Meta-Harness: End-to-End Optimization of Model Harnesses](https://arxiv.org/abs/2603.28052) — Lee et al., 2026
243
- - [GSD (Get Shit Done)](https://github.com/gsd-build/get-shit-done) — CLI architecture inspiration
244
- - [LangSmith CLI](https://github.com/gigaverse-app/langsmith-cli) — Trace analysis for the proposer
245
- - [Context7](https://github.com/upstash/context7) — Documentation lookup via MCP
240
+ - [Meta-Harness paper (arxiv 2603.28052)](https://arxiv.org/abs/2603.28052) — Lee et al., 2026
246
241
  - [Design Spec](docs/specs/2026-03-31-harness-evolver-design.md)
247
- - [LangSmith Integration Spec](docs/specs/2026-03-31-langsmith-integration.md)
248
- - [Context7 Integration Spec](docs/specs/2026-03-31-context7-integration.md)
242
+ - [LangSmith Integration](docs/specs/2026-03-31-langsmith-integration.md)
243
+ - [Context7 Integration](docs/specs/2026-03-31-context7-integration.md)
249
244
 
250
245
  ## License
251
246
 
package/bin/install.js CHANGED
@@ -70,30 +70,43 @@ function checkPython() {
    }
  }

+ function checkCommand(cmd) {
+   try {
+     execSync(cmd, { stdio: "pipe" });
+     return true;
+   } catch {
+     return false;
+   }
+ }
+
  function installForRuntime(runtimeDir, scope) {
    const baseDir = scope === "local"
      ? path.join(process.cwd(), runtimeDir)
      : path.join(HOME, runtimeDir);

-   const commandsDir = path.join(baseDir, "commands", "harness-evolver");
+   const skillsDir = path.join(baseDir, "skills");
    const agentsDir = path.join(baseDir, "agents");

-   // Skills → commands/harness-evolver/ as flat .md files
-   // Claude Code expects commands/name.md, not commands/name/SKILL.md
+   // Skills → ~/.claude/skills/<skill-name>/SKILL.md (proper skills format)
    const skillsSource = path.join(PLUGIN_ROOT, "skills");
    if (fs.existsSync(skillsSource)) {
-     fs.mkdirSync(commandsDir, { recursive: true });
      for (const skill of fs.readdirSync(skillsSource, { withFileTypes: true })) {
        if (skill.isDirectory()) {
-         const skillMd = path.join(skillsSource, skill.name, "SKILL.md");
-         if (fs.existsSync(skillMd)) {
-           fs.copyFileSync(skillMd, path.join(commandsDir, skill.name + ".md"));
-           console.log(` ${GREEN}✓${RESET} Installed command: harness-evolver:${skill.name}`);
-         }
+         const src = path.join(skillsSource, skill.name);
+         const dest = path.join(skillsDir, "harness-evolver:" + skill.name);
+         copyDir(src, dest);
+         console.log(` ${GREEN}✓${RESET} Installed skill: harness-evolver:${skill.name}`);
        }
      }
    }

+   // Cleanup old commands/ install (from previous versions)
+   const oldCommandsDir = path.join(baseDir, "commands", "harness-evolver");
+   if (fs.existsSync(oldCommandsDir)) {
+     fs.rmSync(oldCommandsDir, { recursive: true, force: true });
+     console.log(` ${GREEN}✓${RESET} Cleaned up old commands/ directory`);
+   }
+
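The hunk above moves skills from flat `commands/harness-evolver/<name>.md` files to `skills/harness-evolver:<name>/` directories, removes the old directory, and adds a `checkCommand` probe. As a reading aid, here is a Python sketch of the same steps; `check_command` and `install_skills` are hypothetical names, not part of the package. One difference: `execSync` with a string runs through a shell, while the sketch passes an argument list, so no shell is involved. Like the installer, it puts `:` in directory names, which works on Linux and macOS but is not a valid filename character on Windows.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Sequence


def check_command(cmd: Sequence[str]) -> bool:
    """Like checkCommand(): True if the command starts and exits with status 0."""
    try:
        subprocess.run(list(cmd), stdout=subprocess.PIPE, stderr=subprocess.PIPE, check=True)
        return True
    except (OSError, subprocess.CalledProcessError):
        return False  # not installed, not executable, or non-zero exit


def install_skills(plugin_root: Path, base_dir: Path, prefix: str = "harness-evolver") -> List[str]:
    """Copy each skills/<name>/ directory to <base_dir>/skills/<prefix>:<name>/."""
    installed: List[str] = []
    skills_src = plugin_root / "skills"
    if skills_src.is_dir():
        for skill in sorted(p for p in skills_src.iterdir() if p.is_dir()):
            dest = base_dir / "skills" / f"{prefix}:{skill.name}"
            shutil.copytree(skill, dest, dirs_exist_ok=True)  # Python 3.8+
            installed.append(dest.name)
    # Remove the layout used by earlier versions (flat .md files under commands/<prefix>/).
    old_commands = base_dir / "commands" / prefix
    if old_commands.exists():
        shutil.rmtree(old_commands)
    return installed
```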
    // Agents → agents/
    const agentsSource = path.join(PLUGIN_ROOT, "agents");
    if (fs.existsSync(agentsSource)) {
@@ -219,7 +232,90 @@ async function main() {
    fs.writeFileSync(versionPath, VERSION);
    console.log(` ${GREEN}✓${RESET} VERSION ${VERSION}`);

-   console.log(`\n ${GREEN}Done!${RESET} Run ${BRIGHT_MAGENTA}/reload-plugins${RESET} in Claude Code, then ${BRIGHT_MAGENTA}/harness-evolver:init${RESET}`);
+   console.log(`\n ${GREEN}Done!${RESET} Restart Claude Code, then run ${BRIGHT_MAGENTA}/harness-evolver:init${RESET}\n`);
+
+   // Optional integrations
+   console.log(` ${YELLOW}Install optional integrations?${RESET}\n`);
+   console.log(` These enhance the proposer with rich traces and up-to-date documentation.\n`);
+
+   // LangSmith CLI
+   const hasLangsmithCli = checkCommand("langsmith-cli --version");
+   if (hasLangsmithCli) {
+     console.log(` ${GREEN}✓${RESET} langsmith-cli already installed`);
+   } else {
+     console.log(` ${BOLD}LangSmith CLI${RESET} — rich trace analysis (error rates, latency, token usage)`);
+     console.log(` ${DIM}uv tool install langsmith-cli && langsmith-cli auth login${RESET}`);
+     const lsAnswer = await ask(rl, `\n ${YELLOW}Install langsmith-cli? [y/N]:${RESET} `);
+     if (lsAnswer.trim().toLowerCase() === "y") {
+       console.log(`\n Installing langsmith-cli...`);
+       try {
+         execSync("uv tool install langsmith-cli", { stdio: "inherit" });
+         console.log(`\n ${GREEN}✓${RESET} langsmith-cli installed`);
+         console.log(` ${YELLOW}Run ${BOLD}langsmith-cli auth login${RESET}${YELLOW} to authenticate with your LangSmith API key.${RESET}\n`);
+       } catch {
+         console.log(`\n ${RED}Failed.${RESET} Install manually: uv tool install langsmith-cli\n`);
+       }
+     }
+   }
+
+   // Context7 MCP
+   const hasContext7 = (() => {
+     try {
+       for (const p of [path.join(HOME, ".claude", "settings.json"), path.join(HOME, ".claude.json")]) {
+         if (fs.existsSync(p)) {
+           const s = JSON.parse(fs.readFileSync(p, "utf8"));
+           if (s.mcpServers && (s.mcpServers.context7 || s.mcpServers.Context7)) return true;
+         }
+       }
+     } catch {}
+     return false;
+   })();
+   if (hasContext7) {
+     console.log(` ${GREEN}✓${RESET} Context7 MCP already configured`);
+   } else {
+     console.log(`\n ${BOLD}Context7 MCP${RESET} — up-to-date library documentation (LangChain, OpenAI, etc.)`);
+     console.log(` ${DIM}claude mcp add context7 -- npx -y @upstash/context7-mcp@latest${RESET}`);
+     const c7Answer = await ask(rl, `\n ${YELLOW}Install Context7 MCP? [y/N]:${RESET} `);
+     if (c7Answer.trim().toLowerCase() === "y") {
+       console.log(`\n Installing Context7 MCP...`);
+       try {
+         execSync("claude mcp add context7 -- npx -y @upstash/context7-mcp@latest", { stdio: "inherit" });
+         console.log(`\n ${GREEN}✓${RESET} Context7 MCP configured`);
+       } catch {
+         console.log(`\n ${RED}Failed.${RESET} Install manually: claude mcp add context7 -- npx -y @upstash/context7-mcp@latest\n`);
+       }
+     }
+   }
+
+   // LangChain Docs MCP
+   const hasLcDocs = (() => {
+     try {
+       for (const p of [path.join(HOME, ".claude", "settings.json"), path.join(HOME, ".claude.json")]) {
+         if (fs.existsSync(p)) {
+           const s = JSON.parse(fs.readFileSync(p, "utf8"));
+           if (s.mcpServers && (s.mcpServers["docs-langchain"] || s.mcpServers["LangChain Docs"])) return true;
+         }
+       }
+     } catch {}
+     return false;
+   })();
+   if (hasLcDocs) {
+     console.log(` ${GREEN}✓${RESET} LangChain Docs MCP already configured`);
+   } else {
+     console.log(`\n ${BOLD}LangChain Docs MCP${RESET} — LangChain/LangGraph/LangSmith documentation search`);
+     console.log(` ${DIM}claude mcp add docs-langchain --transport http https://docs.langchain.com/mcp${RESET}`);
+     const lcAnswer = await ask(rl, `\n ${YELLOW}Install LangChain Docs MCP? [y/N]:${RESET} `);
+     if (lcAnswer.trim().toLowerCase() === "y") {
+       console.log(`\n Installing LangChain Docs MCP...`);
+       try {
+         execSync("claude mcp add docs-langchain --transport http https://docs.langchain.com/mcp", { stdio: "inherit" });
+         console.log(`\n ${GREEN}✓${RESET} LangChain Docs MCP configured`);
+       } catch {
+         console.log(`\n ${RED}Failed.${RESET} Install manually: claude mcp add docs-langchain --transport http https://docs.langchain.com/mcp\n`);
+       }
+     }
+   }
+
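The installer decides whether Context7 or the LangChain Docs MCP is already configured by looking for a matching key under `mcpServers` in `~/.claude/settings.json` or `~/.claude.json`. Below is a Python sketch of that check; `has_mcp_server` is a hypothetical name. It differs from the JavaScript in one deliberate way. The installer wraps both files in a single `try`, so a malformed first file ends the search before the second is read, whereas the sketch checks each file independently.

```python
import json
from pathlib import Path
from typing import Iterable


def has_mcp_server(config_paths: Iterable[Path], names: Iterable[str]) -> bool:
    """True if any config file lists one of `names` under a top-level "mcpServers" object."""
    wanted = list(names)
    for path in config_paths:
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # missing or malformed file: keep checking the others
        servers = data.get("mcpServers") if isinstance(data, dict) else None
        if isinstance(servers, dict) and any(name in servers for name in wanted):
            return True
    return False


# Usage mirroring the installer's Context7 check:
# home = Path.home()
# has_mcp_server([home / ".claude" / "settings.json", home / ".claude.json"], ["context7", "Context7"])
```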
    console.log(`\n ${DIM}Quick start with example:${RESET}`);
    console.log(` cp -r ~/.harness-evolver/examples/classifier ./my-project`);
    console.log(` cd my-project && claude`);
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "harness-evolver",
-   "version": "0.7.1",
+   "version": "0.9.0",
    "description": "Meta-Harness-style autonomous harness optimization for Claude Code",
    "author": "Raphael Valdetaro",
    "license": "MIT",