npm - harness-evolver - Versions diffs - 4.0.1 → 4.0.3 - Mend

harness-evolver 4.0.1 → 4.0.3

This diff shows the contents of publicly available package versions released to one of the supported registries. It is provided for informational purposes only and reflects the changes between package versions as they appear in their respective public registries.

Files changed (6)

package/.claude-plugin/plugin.json +1 -1
package/README.md +4 -4
package/agents/evolver-proposer.md +0 -8
package/package.json +1 -1
package/skills/evolve/SKILL.md +4 -59
package/tools/synthesize_strategy.py +51 -5

package/.claude-plugin/plugin.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "harness-evolver",
   "description": "LangSmith-native autonomous agent optimization — evolves LLM agent code using multi-agent proposers, LangSmith experiments, and git worktrees",
-  "version": "4.0.1",
+  "version": "4.0.3",
   "author": {
     "name": "Raphael Valdetaro"
   },

package/README.md CHANGED

@@ -95,8 +95,8 @@ claude
 <td>Three-gate iteration triggers (score plateau, cost budget, convergence detection) replace blind N-iteration loops. State validation ensures config hasn't diverged from LangSmith.</td>
 </tr>
 <tr>
-<td><b>Self-Scheduling</b></td>
-<td>Background and cron-based evolution modes for unattended optimization. Schedule nightly runs and get notified on improvements.</td>
+<td><b>Background Mode</b></td>
+<td>Run all iterations in background while you continue working. Get notified on completion or significant improvements.</td>
 </tr>
 </table>
@@ -137,7 +137,7 @@ claude
   +- 1.8  Analyze per-task failures (adaptive briefings)
   +- 1.8a Synthesize strategy document (coordinator synthesis)
   +- 1.9  Prepare shared proposer context (KV cache-optimized prefix)
-  +- 2.   Spawn 5 proposers in parallel (per-strategy tool restrictions)
+  +- 2.   Spawn 5 proposers in parallel (each in a git worktree)
   +- 3.   Run target for each candidate (code-based evaluators)
   +- 3.5  Spawn evaluator agent (LLM-as-judge via langsmith-cli)
   +- 4.   Compare experiments -> select winner + per-task champion
@@ -165,7 +165,7 @@ Skills (markdown)
   └── /evolver:deploy   → tags and pushes
 Agents (markdown)
-  ├── Proposer (x5)     → modifies code in worktrees (per-strategy tool restrictions)
+  ├── Proposer (x5)     → modifies code in isolated git worktrees
   ├── Evaluator          → LLM-as-judge via langsmith-cli
   ├── Critic             → detects gaming + implements stricter evaluators
   ├── Architect          → ULTRAPLAN deep analysis (opus model)

package/agents/evolver-proposer.md CHANGED

@@ -148,14 +148,6 @@ Prioritize changes that fix real production failures over synthetic test failure
 4. **Commit your changes** — uncommitted changes are lost when the worktree is cleaned up
 5. **Write proposal.md** — the evolve skill reads this to understand what you did
-## Tool Restrictions
-Your available tools may be restricted based on your strategy:
-- **Exploit/Crossover/Failure-targeted**: Edit-only (no Write). Focus on modifying existing files.
-- **Explore**: Full access including Write. You may create new files if your approach requires it.
-If you need to create a file but only have Edit, restructure your approach to modify existing files instead.
 ## Return Protocol
 When done, end your response with:

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "harness-evolver",
-  "version": "4.0.1",
+  "version": "4.0.3",
   "description": "LangSmith-native autonomous agent optimization for Claude Code",
   "author": "Raphael Valdetaro",
   "license": "MIT",

package/skills/evolve/SKILL.md CHANGED

@@ -87,8 +87,7 @@ If iterations > 3, offer execution mode:
       "multiSelect": false,
       "options": [
         {"label": "Interactive", "description": "I'll watch. Show results after each iteration."},
-        {"label": "Background", "description": "Run all iterations in background. Notify on completion or significant improvement."},
-        {"label": "Scheduled", "description": "Schedule iterations to run on a cron (e.g., nightly optimization)."}
+        {"label": "Background", "description": "Run all iterations in background. Notify on completion or significant improvement."}
       ]
     }
   ]
@@ -98,35 +97,6 @@ If iterations > 3, offer execution mode:
 **If "Background" selected:**
 Run the evolution loop as a background task. Use the `run_in_background` parameter on the main loop execution.
-**If "Scheduled" selected:**
-Ask for schedule via AskUserQuestion:
-```json
-{
-  "questions": [
-    {
-      "question": "Schedule?",
-      "header": "Cron Schedule",
-      "multiSelect": false,
-      "options": [
-        {"label": "Every 6 hours", "description": "Run 1 iteration every 6 hours"},
-        {"label": "Nightly (2 AM)", "description": "Run iterations overnight"},
-        {"label": "Custom", "description": "Enter a cron expression"}
-      ]
-    }
-  ]
-}
-```
-Then create a cron trigger:
-```
-Use CronCreate tool to schedule:
-  - command: "/evolver:evolve --iterations 1 --no-interactive"
-  - schedule: {selected_cron}
-  - description: "Harness Evolver: scheduled optimization iteration"
-```
-Report: "Scheduled evolution iterations. Use `/evolver:status` to check progress. Cancel with CronDelete."
 ## The Loop
 Read config:
@@ -218,10 +188,11 @@ $EVOLVER_PY $TOOLS/synthesize_strategy.py \
     --trace-insights trace_insights.json \
     --best-results best_results.json \
     --evolution-memory evolution_memory.json \
+    --production-seed production_seed.json \
     --output strategy.md 2>/dev/null
 ```
-The `strategy.md` file is included in the proposer `<files_to_read>` block via the shared context (Step 1.9). This replaces raw data dumps with a synthesized, actionable document — proposers receive specific targets, not raw traces.
+The `strategy.md` file is included in the proposer `<files_to_read>` block via the shared context (Step 1.9). It synthesizes trace analysis, evolution memory, and production data into an actionable document. Proposers also receive `production_seed.json` directly for access to raw production traces.
 ### 1.9. Prepare Shared Proposer Context
@@ -233,6 +204,7 @@ SHARED_FILES_BLOCK="<files_to_read>
 - .evolver.json
 - strategy.md (if exists)
 - evolution_memory.md (if exists)
+- production_seed.json (if exists)
 - {entry_point_file}
 </files_to_read>"
@@ -311,33 +283,6 @@ APPROACH: {failure_targeted_or_efficiency}
 {adaptive_briefing_e}
 ```
-**Tool restrictions per strategy:**
-| Strategy | Allowed Tools | Rationale |
-|----------|--------------|-----------|
-| Exploit (A) | Read, Edit, Bash, Glob, Grep | No Write — can't create new files, only edit existing |
-| Explore (B) | Read, Write, Edit, Bash, Glob, Grep | Full access — may need new files for new architecture |
-| Crossover (C) | Read, Edit, Bash, Glob, Grep | No Write — combines existing patterns, doesn't create |
-| Failure-targeted (D, E) | Read, Edit, Bash, Glob, Grep | No Write — focused fixes on specific files |
-Apply via the `tools` parameter in each Agent() call. Example for exploit:
-```
-Agent(
-  subagent_type: "evolver-proposer",
-  tools: ["Read", "Edit", "Bash", "Glob", "Grep"],
-  ...
-)
-```
-For explore:
-```
-Agent(
-  subagent_type: "evolver-proposer",
-  tools: ["Read", "Write", "Edit", "Bash", "Glob", "Grep"],
-  ...
-)
-```
 Wait for all 5 to complete.
 **Stuck proposer detection**: If any proposer hasn't completed after 10 minutes, it may be stuck in a loop. The Claude Code runtime handles this via the agent's turn limit. If a proposer returns without committing changes, skip it — don't retry.
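
Note on the shared context hunk above: 4.0.3 adds `production_seed.json (if exists)` to the `<files_to_read>` block. In the package this block is a template that the skill fills in, so the snippet below is only a rough sketch of what the "(if exists)" convention could look like if resolved in code. The helper name `build_files_block`, the `OPTIONAL_FILES` list, and the `agent.py` entry point are hypothetical; only the file names and their order come from the lines visible in the hunk.

```python
from pathlib import Path
import tempfile

# File names as listed in the hunk; the "(if exists)" entries are optional.
OPTIONAL_FILES = ["strategy.md", "evolution_memory.md", "production_seed.json"]


def build_files_block(workdir, entry_point_file):
    """Hypothetical helper: list only the optional files that are present."""
    root = Path(workdir)
    entries = [".evolver.json"]
    entries += [name for name in OPTIONAL_FILES if (root / name).is_file()]
    entries.append(entry_point_file)
    body = "\n".join(f"- {entry}" for entry in entries)
    return f"<files_to_read>\n{body}\n</files_to_read>"


# Demo in a throwaway directory: only two of the optional files exist.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "strategy.md").write_text("# Strategy\n")
    (Path(tmp) / "production_seed.json").write_text("{}")
    block = build_files_block(tmp, "agent.py")  # "agent.py" is a placeholder

print(block)
```

With this layout a project that has no production traces produces the same block as in 4.0.1, which matches the "(if exists)" wording in the skill.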

package/tools/synthesize_strategy.py CHANGED

@@ -1,9 +1,9 @@
 #!/usr/bin/env python3
 """Synthesize evolution strategy document from trace analysis.
-Reads trace_insights.json, best_results.json, and evolution_memory.json
-to produce a targeted strategy document with specific file paths,
-line numbers, and concrete change recommendations for proposers.
+Reads trace_insights.json, best_results.json, evolution_memory.json,
+and production_seed.json to produce a targeted strategy document with
+specific file paths and concrete change recommendations for proposers.
 Usage:
     python3 synthesize_strategy.py \
@@ -11,6 +11,7 @@ Usage:
         --trace-insights trace_insights.json \
         --best-results best_results.json \
         --evolution-memory evolution_memory.json \
+        --production-seed production_seed.json \
         --output strategy.md
 """
@@ -42,7 +43,7 @@ def identify_target_files(config):
     return target_files
-def synthesize(config, insights, results, memory):
+def synthesize(config, insights, results, memory, production=None):
     """Produce strategy recommendations."""
     strategy = {
         "primary_targets": [],
@@ -94,6 +95,28 @@ def synthesize(config, insights, results, memory):
             for eid, data in failing[:10]
         ]
+    # Production trace data
+    if production:
+        prod_data = {}
+        stats = production.get("stats", {})
+        if stats:
+            prod_data["total_traces"] = stats.get("total_traces", 0)
+            prod_data["error_rate"] = stats.get("error_rate", 0)
+        categories = production.get("categories", [])
+        if categories:
+            prod_data["traffic_distribution"] = categories[:10]
+        neg = production.get("negative_feedback_inputs", [])
+        if neg:
+            prod_data["negative_feedback"] = neg[:5]
+        errors = production.get("error_patterns", production.get("errors", []))
+        if errors:
+            prod_data["production_errors"] = errors[:5] if isinstance(errors, list) else []
+        slow = production.get("slow_queries", [])
+        if slow:
+            prod_data["slow_queries"] = slow[:5]
+        if prod_data:
+            strategy["production"] = prod_data
     return strategy
@@ -142,6 +165,27 @@ def format_strategy_md(strategy, config):
             lines.append(f"- `{ex['example_id']}` (score: {score:.2f}): {preview}{error}")
         lines.append("")
+    prod = strategy.get("production", {})
+    if prod:
+        lines.append("## Production Insights")
+        if prod.get("total_traces"):
+            lines.append(f"- **Traces**: {prod['total_traces']} total, {prod.get('error_rate', 0):.1%} error rate")
+        if prod.get("traffic_distribution"):
+            lines.append(f"- **Traffic**: {', '.join(str(c) for c in prod['traffic_distribution'][:5])}")
+        if prod.get("negative_feedback"):
+            lines.append("- **Negative feedback inputs**:")
+            for nf in prod["negative_feedback"]:
+                lines.append(f"  - {str(nf)[:120]}")
+        if prod.get("production_errors"):
+            lines.append("- **Production errors**:")
+            for pe in prod["production_errors"]:
+                lines.append(f"  - {str(pe)[:120]}")
+        if prod.get("slow_queries"):
+            lines.append("- **Slow queries**:")
+            for sq in prod["slow_queries"]:
+                lines.append(f"  - {str(sq)[:120]}")
+        lines.append("")
     return "\n".join(lines)
@@ -151,6 +195,7 @@ def main():
     parser.add_argument("--trace-insights", default="trace_insights.json")
     parser.add_argument("--best-results", default="best_results.json")
     parser.add_argument("--evolution-memory", default="evolution_memory.json")
+    parser.add_argument("--production-seed", default="production_seed.json")
     parser.add_argument("--output", default="strategy.md")
     args = parser.parse_args()
@@ -160,8 +205,9 @@ def main():
     insights = load_json_safe(args.trace_insights)
     results = load_json_safe(args.best_results)
     memory = load_json_safe(args.evolution_memory)
+    production = load_json_safe(args.production_seed)
-    strategy = synthesize(config, insights, results, memory)
+    strategy = synthesize(config, insights, results, memory, production)
     md = format_strategy_md(strategy, config)
     with open(args.output, "w") as f:
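
Note on the production hunks above: the new branch in `synthesize()` trims `production_seed.json` to a handful of capped fields before `format_strategy_md()` renders them. The snippet below is a minimal sketch that lifts that branch into a standalone function so it can be run on its own. The function name `extract_production` and the sample payload are hypothetical; the keys, caps, and fallback from `error_patterns` to `errors` are copied from the diff.

```python
def extract_production(production):
    """Standalone copy of the production branch added to synthesize()."""
    prod_data = {}
    if not production:
        return prod_data
    stats = production.get("stats", {})
    if stats:
        prod_data["total_traces"] = stats.get("total_traces", 0)
        prod_data["error_rate"] = stats.get("error_rate", 0)
    categories = production.get("categories", [])
    if categories:
        prod_data["traffic_distribution"] = categories[:10]
    neg = production.get("negative_feedback_inputs", [])
    if neg:
        prod_data["negative_feedback"] = neg[:5]
    # Prefer "error_patterns"; fall back to "errors" when it is absent.
    errors = production.get("error_patterns", production.get("errors", []))
    if errors:
        # A truthy non-list value is replaced by an empty list, as in the diff.
        prod_data["production_errors"] = errors[:5] if isinstance(errors, list) else []
    slow = production.get("slow_queries", [])
    if slow:
        prod_data["slow_queries"] = slow[:5]
    return prod_data


# Hypothetical payload; real production_seed.json contents may differ.
sample_seed = {
    "stats": {"total_traces": 1200, "error_rate": 0.035},
    "categories": ["billing", "search", "onboarding"],
    "negative_feedback_inputs": [f"input {i}" for i in range(8)],
    "errors": ["TimeoutError in retriever"],
}

trimmed = extract_production(sample_seed)
# Same format string as the "Traces" line in format_strategy_md().
summary = (
    f"- **Traces**: {trimmed['total_traces']} total, "
    f"{trimmed.get('error_rate', 0):.1%} error rate"
)
print(summary)
```

Because `production` defaults to `None` in the new `synthesize()` signature, callers that pass only the original four arguments skip this branch and get a strategy without a `production` key.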