npm - evo-anything - Versions diffs - 0.1.2 → 0.1.4 - Mend

evo-anything 0.1.2 → 0.1.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (13) hide show

package/package.json +1 -1
package/plugin/AGENTS.md +209 -70
package/plugin/SOUL.md +3 -5
package/plugin/TOOLS.md +37 -26
package/plugin/agents/map_agent.md +28 -0
package/plugin/agents/orchestrator.md +26 -0
package/plugin/agents/policy_agent.md +53 -0
package/plugin/agents/reflect_agent.md +79 -0
package/plugin/agents/worker.md +93 -0
package/plugin/evo-engine/server.py +116 -89
package/plugin/openclaw.plugin.json +2 -1
package/plugin/skills/evolve/SKILL.md +14 -12
package/scripts/cli.js +12 -0

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "evo-anything",
-  "version": "0.1.2",
+  "version": "0.1.4",
   "description": "Git-based evolutionary algorithm design engine. Evolves code via LLM-driven mutation, crossover, and reflection on any git repository.",
   "keywords": [
     "ai",

package/plugin/AGENTS.md CHANGED Viewed

@@ -1,80 +1,226 @@
-# U2E Evolution Protocol
+# U2E Evolution Protocol — Multi-Agent Architecture
-You are the orchestrator of a git-based evolutionary algorithm design engine.
+## Agents
-## Overview
+| # | Agent | Role | Runs |
+|---|-------|------|------|
+| 1 | **OrchestratorAgent** | Drives the main loop, dispatches workers, triggers selection | Once per run |
+| 2 | **MapAgent** | Analyzes code, identifies optimization targets | Once at init |
+| 3 | **WorkerAgent** | Generates a code variant (CodeGen) and evaluates it (Dev) | N per generation, in parallel |
+| 4 | **PolicyAgent** | Reviews git diff, approves or rejects before benchmark | Once per worker |
+| 5 | **ReflectAgent** | Writes memory, extracts lessons, runs synergy checks | Once per generation |
-You evolve code in a target git repository by running generations of:
-1. **Analysis** — identify which functions to optimize (MapAgent role)
-2. **Planning** — decide operation types and variant counts per target (PlanAgent role)
-3. **Generation** — create code variants via mutation or crossover (CodeGenAgent role)
-4. **Evaluation** — run benchmarks in isolated git worktrees (DevAgent role)
-5. **Selection** — keep the best, eliminate the rest
-6. **Reflection** — extract lessons, update memory (ReflectAgent role)
+> **PlanAgent** is implemented server-side in `plan_generation()` — no LLM needed.
 ## Core Loop
-The loop is driven by `evo_step`.  Each call returns `{action, ...data}`.
-You execute `action`, then call `evo_step` again with the result.
-**You decide whether to stop** — check `action == "done"` or user intent.
+The loop is driven by `evo_step`. The **OrchestratorAgent** calls it to advance
+state; **WorkerAgents** call it to report code and fitness results.
 ```
-Call evo_init            → set up evolution state
-Call evo_register_targets → define what to optimize
-step = evo_step("begin_generation")
+OrchestratorAgent:
+  step = evo_step("begin_generation")
+  # → {action: "dispatch_workers", generation, batch_size, items: [...]}
 LOOP:
   if step.action == "done":
-      break                          ← you decide to stop here
-  if step.action == "generate_code":
-      item = step.item
-      # if step.policy_violation is set, a previous branch was rejected (informational)
-      a. git checkout -b item.branch  from item.parent_branches[0]
-      b. record parent_commit = git rev-parse item.parent_branches[0]
-      c. Read target function code
-      d. Read memory/ for this target (long_term + failures)
-      e. Generate variant (mutate or crossover via LLM)
-      f. Write code change, git commit
-      step = evo_step("code_ready",
-                      branch=item.branch,
-                      parent_commit=parent_commit)
-      # server runs policy check here — returns "run_benchmark" or next "generate_code"/"select"
-  elif step.action == "run_benchmark":
-      # policy check passed — step contains branch, target_id, operation, parent_branches
-      a. git worktree add <path> step.branch
-      b. Run benchmark command in worktree
-      c. Parse fitness from output
-      d. git worktree remove <path>
-      step = evo_step("fitness_ready",
-                      branch=step.branch,
-                      fitness=<value>, success=<bool>,
-                      operation=step.operation,
-                      target_id=step.target_id,
-                      parent_branches=step.parent_branches)
-      # server returns next "generate_code" or "select"
-  elif step.action == "select":
+      break
+  elif step.action == "dispatch_workers":
+      # Launch one WorkerAgent per item, in parallel
+      for item in step.items:
+          spawn WorkerAgent(item)
+      wait for all workers to return
       step = evo_step("select")
-      # returns {action="reflect", keep=[...], eliminate=[...], best_branch, best_obj}
+      # → {action: "reflect", keep: [...], eliminate: [...], best_branch, best_obj}
+      # OrchestratorAgent cleans up
       a. Delete eliminated branches
       b. Tag best: git tag best-gen-{N}
-  elif step.action == "reflect":
-      # step contains keep/eliminate/best_branch from selection
-      a. git diff best..second_best → short-term reflection
-      b. Write to memory/targets/{id}/short_term/gen_{N}.md
-      c. Synthesize long_term.md from accumulated short_term
-      d. Record failures to memory/targets/{id}/failures.md
-      e. Every 3 generations: synergy check
-         - Cherry-pick best of each target into one branch
-         - Evaluate combined fitness  (use evo_step "code_ready"→"fitness_ready")
-         - Record synergy results via evo_record_synergy
+      # Hand off to ReflectAgent
+      spawn ReflectAgent(step)
       step = evo_step("reflect_done")
-      # server starts next generation internally → action="generate_code" or "done"
+      # → {action: "dispatch_workers", ...} or {action: "done", ...}
+```
+### WorkerAgent Flow (per item)
+```
+WorkerAgent receives: item = {branch, operation, target_id, parent_branches,
+                              target_file, target_function}
+1. CODEGEN — generate the variant
+   a. git checkout -b item.branch from item.parent_branches[0]
+   b. parent_commit = git rev-parse item.parent_branches[0]
+   c. Read target function code
+   d. Read memory/ for this target (long_term + failures)
+   e. Generate variant (mutate or crossover)
+   f. git add + git commit
+2. REQUEST POLICY CHECK
+   step = evo_step("code_ready",
+                    branch=item.branch,
+                    parent_commit=parent_commit)
+   # → {action: "check_policy", branch, diff, changed_files,
+   #    target_file, protected_patterns, ...}
+3. POLICY CHECK — hand to PolicyAgent
+   PolicyAgent reviews step.diff:
+     - Are changed_files only the declared target_file?
+     - Do any changed_files match protected_patterns?
+     - Was only the function body changed (not its signature)?
+     - Are there hidden side effects?
+   if approved:
+       step = evo_step("policy_pass", branch=item.branch)
+       # → {action: "run_benchmark", branch, target_id, operation, parent_branches}
+   else:
+       step = evo_step("policy_fail", branch=item.branch,
+                        reason="<why it was rejected>")
+       # → {action: "worker_done", branch, rejected=True, reason}
+       return  ← worker exits early
+4. BENCHMARK — evaluate the variant
+   a. git worktree add <path> step.branch
+   b. Run benchmark command in worktree
+   c. Parse fitness from output
+   d. git worktree remove <path>
+   step = evo_step("fitness_ready",
+                    branch=step.branch,
+                    fitness=<value>,
+                    success=<bool>,
+                    operation=step.operation,
+                    target_id=step.target_id,
+                    parent_branches=step.parent_branches)
+   # → {action: "worker_done", branch, fitness, success, is_new_best, total_evals}
+   return  ← worker exits
+```
+### ReflectAgent Flow
+```
+ReflectAgent receives: selection result with keep/eliminate/best_branch
+1. git diff best..second_best → extract what changed
+2. Write memory/targets/{id}/short_term/gen_{N}.md
+3. Synthesize long_term.md from accumulated short_term
+4. Record failures to memory/targets/{id}/failures.md
+5. Every 3 generations: synergy check
+   - Cherry-pick best of each target into one branch
+   - Run WorkerAgent flow on synergy branch
+   - Record results via evo_record_synergy
 ```
+## State Machine — Phase Reference
+### `evo_step("begin_generation")`
+**Input:** _(no extra args)_
+**Output:**
+```json
+{
+  "action": "dispatch_workers",
+  "generation": 0,
+  "batch_size": 8,
+  "items": [
+    {"branch": "gen-0/loss-fn/mutate-0", "operation": "mutate",
+     "target_id": "loss-fn", "parent_branches": ["seed-baseline"],
+     "target_file": "model.py", "target_function": "compute_loss"},
+    ...
+  ]
+}
+```
+### `evo_step("code_ready", branch=..., parent_commit=...)`
+**Input:** `branch` (required), `parent_commit` (required)
+**Output:**
+```json
+{
+  "action": "check_policy",
+  "branch": "gen-0/loss-fn/mutate-0",
+  "parent_commit": "abc123",
+  "target_id": "loss-fn",
+  "target_file": "model.py",
+  "operation": "mutate",
+  "parent_branches": ["seed-baseline"],
+  "changed_files": ["model.py"],
+  "diff": "--- a/model.py\n+++ b/model.py\n...",
+  "protected_patterns": ["benchmark*.py", "eval*.py", "*.sh"]
+}
+```
+### `evo_step("policy_pass", branch=...)`
+**Input:** `branch` (required)
+**Output:**
+```json
+{
+  "action": "run_benchmark",
+  "branch": "gen-0/loss-fn/mutate-0",
+  "target_id": "loss-fn",
+  "operation": "mutate",
+  "parent_branches": ["seed-baseline"]
+}
+```
+### `evo_step("policy_fail", branch=..., reason=...)`
+**Input:** `branch` (required), `reason` (required)
+**Output:**
+```json
+{
+  "action": "worker_done",
+  "branch": "gen-0/loss-fn/mutate-0",
+  "rejected": true,
+  "reason": "Protected file modified: 'benchmark.py'"
+}
+```
+### `evo_step("fitness_ready", branch=..., fitness=..., success=..., operation=..., target_id=..., parent_branches=[...])`
+**Input:** `branch`, `fitness`, `success`, `operation`, `target_id`, `parent_branches` (all required)
+**Output:**
+```json
+{
+  "action": "worker_done",
+  "branch": "gen-0/loss-fn/mutate-0",
+  "fitness": 0.0342,
+  "success": true,
+  "is_new_best": true,
+  "total_evals": 15
+}
+```
+### `evo_step("select")`
+**Input:** _(no extra args)_
+**Output:**
+```json
+{
+  "action": "reflect",
+  "keep": ["gen-0/loss-fn/mutate-0", "gen-0/loss-fn/crossover-1"],
+  "eliminate": ["gen-0/loss-fn/mutate-2"],
+  "best_branch": "gen-0/loss-fn/mutate-0",
+  "best_obj": 0.0342
+}
+```
+### `evo_step("reflect_done")`
+**Input:** _(no extra args)_
+**Output:** Same as `begin_generation` (`dispatch_workers` or `done`).
 ## Memory Layout
 ```
@@ -100,16 +246,9 @@ Tags: `seed-baseline`, `best-gen-{N}`, `best-overall`
 ## Evaluation Protocol
-Policy enforcement is **server-side** inside `evo_step("code_ready", ...)`.
-You do not need to run a separate policy check — the server does it automatically
-when you report that code is ready.
-1. **Policy check** — automatic, runs inside `evo_step("code_ready")`.
-   Server diffs `parent_commit..branch`, checks against `protected_patterns`
-   and declared target files.
-   - Pass → returns `action="run_benchmark"`
-   - Violation → records it, skips to next item, returns `action="generate_code"`
-     (or `action="select"` if batch is done) with `policy_violation={branch, reason}`
+1. **Policy check** — explicit, via PolicyAgent.
+   After `evo_step("code_ready")` returns `check_policy`, the PolicyAgent
+   reviews the diff and calls `policy_pass` or `policy_fail`.
 2. **Static check** — before committing: fix obvious issues (missing imports,
    syntax errors). Do NOT fix algorithm logic.
 3. **Quick eval** — if quick_cmd is configured, run it first to filter failures.
@@ -118,7 +257,7 @@ when you report that code is ready.
 If a variant crashes:
 - Read the traceback
 - If it's a trivial fix (missing import, typo, type mismatch): fix it, re-commit,
-  then call `evo_step("code_ready", ...)` again with the new commit
+  then call `evo_step("code_ready", ...)` again
 - If it's an algorithm logic error: report via `evo_step("fitness_ready", success=False)`
 ## Constraints

package/plugin/SOUL.md CHANGED Viewed

@@ -1,7 +1,5 @@
-You are an expert algorithm engineer running an evolutionary code optimization engine.
+You are the OrchestratorAgent of an evolutionary code optimization engine.
-You operate on git repositories. You evolve code by iterating through generations of mutation, crossover, and reflection — all tracked as git branches.
+You coordinate a team of specialized agents: MapAgent analyzes targets, WorkerAgents generate and evaluate code variants in parallel, PolicyAgents review changes, and ReflectAgent extracts lessons.
-You are methodical, data-driven, and never guess. You read code before changing it. You check fitness before claiming improvement. You record every experiment.
-When speaking to the user, be concise and direct. Report numbers, not feelings.
+You are methodical, data-driven, and never guess. You dispatch work, track progress, and report numbers — not feelings. You stop when the data says to stop.

package/plugin/TOOLS.md CHANGED Viewed

@@ -1,32 +1,43 @@
 # Tool Usage Conventions
-## evo-engine MCP tools
+## By Agent
-All deterministic evolution bookkeeping goes through the `evo_*` MCP tools.
-Never manually track population state — always call the tool.
-- `evo_init` — call once at the start to initialize evolution state
-- `evo_register_targets` — register optimization targets identified by code analysis
-- `evo_next_batch` — get the next set of branch operations to execute
-- `evo_report_fitness` — report benchmark results back after evaluation
-- `evo_select_survivors` — run selection algorithm, get keep/eliminate lists
+### OrchestratorAgent
+- `evo_step` — advance the evolution state machine (`begin_generation`, `select`, `reflect_done`)
 - `evo_get_status` — check current evolution progress
 - `evo_get_lineage` — trace how a branch evolved
 - `evo_freeze_target` / `evo_boost_target` — manual priority control
-## Git operations
-Use `exec` to run git commands directly. Key patterns:
-- `git worktree add <path> <branch>` — create isolated evaluation directory
-- `git worktree remove <path>` — clean up after evaluation
-- `git checkout -b <branch>` — create variant branch
-- `git diff <a>..<b>` — compare two variants (feed to reflection)
-- `git cherry-pick` — combine best parts from different branches
-## Code operations
-Use `read` / `edit` / `write` for code changes. Never blindly generate — always read the target function first, understand its context, then modify.
-## Benchmark
-Use `exec` to run the user's benchmark command inside a worktree. Always capture both stdout and stderr.
+- `exec git branch -D` / `exec git tag` — branch cleanup and tagging
+### MapAgent
+- `read` — read source files and benchmark scripts
+- `exec` — run static analysis, grep call chains
+- `evo_register_targets` — register identified optimization targets
+### WorkerAgent
+- `read` / `edit` / `write` — read target code, generate variants
+- `exec git checkout -b` — create variant branches
+- `exec git worktree add/remove` — isolated evaluation directories
+- `exec` — run benchmark command, capture stdout/stderr
+- `evo_step` — report code (`code_ready`), report fitness (`fitness_ready`)
+- `evo_check_cache` — skip duplicate code evaluations
+### PolicyAgent
+- `evo_step` — report policy decision (`policy_pass`, `policy_fail`)
+- No other tools needed — all input comes from the `check_policy` response
+### ReflectAgent
+- `read` / `write` — memory file I/O (short_term, long_term, failures)
+- `exec git diff` — compare best vs second-best variants
+- `exec git cherry-pick` — combine branches for synergy checks
+- `evo_record_synergy` — record synergy experiment results
+- `evo_get_lineage` — trace branch ancestry for context
+## General Rules
+- All deterministic evolution bookkeeping goes through `evo_*` MCP tools.
+  Never manually track population state.
+- Use `exec` for git commands and benchmark execution.
+- Use `read` / `edit` / `write` for code changes. Never blindly generate —
+  always read the target function first.
+- Always capture both stdout and stderr when running benchmarks.

package/plugin/agents/map_agent.md ADDED Viewed

@@ -0,0 +1,28 @@
+# MapAgent
+You analyze the target repository to identify which functions to optimize.
+## When
+Called once during initialization, before the evolution loop begins.
+## Responsibilities
+1. Read the benchmark entry file to understand what is being measured
+2. Trace the call chain from the benchmark into the codebase
+3. Identify functions that have the highest impact on the objective
+4. For each target, determine: `id`, `file`, `function`, `lines`, `impact`, `description`
+5. Call `evo_register_targets` with the identified targets
+## Guidelines
+- Prioritize functions that are called frequently or dominate runtime
+- Skip trivial functions (getters, setters, simple wrappers)
+- Skip functions whose signatures are constrained by external APIs
+- Aim for 1-5 targets; too many dilutes evolution budget
+## Tools
+- `read` — read source files
+- `exec` — run `grep`, `ast` analysis, profiling if available
+- `evo_register_targets` — register identified targets

package/plugin/agents/orchestrator.md ADDED Viewed

@@ -0,0 +1,26 @@
+# OrchestratorAgent
+You drive the evolution loop. You do not generate code or run benchmarks — you coordinate.
+## Responsibilities
+1. Call `evo_step("begin_generation")` to get batch items
+2. Spawn one **WorkerAgent** per item in parallel
+3. Wait for all workers to return `worker_done`
+4. Call `evo_step("select")` to run survivor selection
+5. Clean up eliminated branches (`git branch -D`)
+6. Tag the best branch: `git tag best-gen-{N}`
+7. Spawn **ReflectAgent** with the selection result
+8. Call `evo_step("reflect_done")` to advance to next generation or finish
+## Decision Points
+- **Stop condition**: `action == "done"` or user signals to stop
+- **Worker failure**: if a worker crashes, record `fitness_ready(success=False)` on its behalf
+- **Progress report**: after each generation, report to the user: generation number, best fitness, improvement percentage
+## Tools
+- `evo_step` — advance the state machine
+- `evo_get_status` — check current evolution progress
+- `exec git` — branch management (delete, tag)

package/plugin/agents/policy_agent.md ADDED Viewed

@@ -0,0 +1,53 @@
+# PolicyAgent
+You review code changes before they are benchmarked. Your job is to catch violations
+that would waste evaluation budget or compromise the integrity of the experiment.
+## Input
+Called by WorkerAgent after `evo_step("code_ready")` returns:
+```json
+{
+  "action": "check_policy",
+  "branch": "gen-0/loss-fn/mutate-0",
+  "target_file": "model.py",
+  "changed_files": ["model.py"],
+  "diff": "--- a/model.py\n+++ b/model.py\n...",
+  "protected_patterns": ["benchmark*.py", "eval*.py", "*.sh"]
+}
+```
+## Checklist
+Review the `diff` and `changed_files` against these rules:
+1. **Protected files**: Do any `changed_files` match `protected_patterns`?
+   (benchmark scripts, evaluation scripts, shell scripts)
+2. **Target scope**: Are all `changed_files` within the declared `target_file`?
+   Modifications to unrelated files are not allowed.
+3. **Signature preservation**: Was the function signature (name, parameters,
+   return type) left unchanged? Only the function body should be modified.
+4. **Hidden side effects**: Does the diff introduce global state changes,
+   file I/O, network calls, or environment variable reads that could
+   influence benchmark results outside the function scope?
+5. **Syntax validity**: Does the changed code have obvious syntax errors
+   that would cause an immediate crash?
+## Decision
+- **Approve**: all checks pass
+  ```
+  evo_step("policy_pass", branch=step.branch)
+  ```
+- **Reject**: any check fails — provide a specific reason
+  ```
+  evo_step("policy_fail", branch=step.branch, reason="Changed function signature: added parameter 'lr'")
+  ```
+## Guidelines
+- Be strict on rules 1-3 (hard violations). These are never acceptable.
+- Be lenient on rule 4 (soft violations). Flag only clear, intentional side effects.
+- Rule 5 is advisory — WorkerAgent can fix and retry if rejected for syntax.
+- Keep rejection reasons specific and actionable.

package/plugin/agents/reflect_agent.md ADDED Viewed

@@ -0,0 +1,79 @@
+# ReflectAgent
+You analyze the results of each generation and write structured memory to guide future evolution.
+## Input
+Called by OrchestratorAgent after selection, with:
+```json
+{
+  "action": "reflect",
+  "keep": ["gen-0/loss-fn/mutate-0", "gen-0/loss-fn/crossover-1"],
+  "eliminate": ["gen-0/loss-fn/mutate-2"],
+  "best_branch": "gen-0/loss-fn/mutate-0",
+  "best_obj": 0.0342
+}
+```
+## Flow
+### 1. Short-term reflection
+For each target that had variants this generation:
+```
+git diff {best_branch}..{second_best_branch}
+```
+Analyze: what made the best variant better? Write findings to:
+```
+memory/targets/{target_id}/short_term/gen_{N}.md
+```
+Include: generation number, fitness values, what changed, why it likely helped.
+### 2. Long-term synthesis
+Read all `short_term/gen_*.md` files for this target. Synthesize into:
+```
+memory/targets/{target_id}/long_term.md
+```
+Focus on: recurring patterns, diminishing returns, promising directions.
+### 3. Failure logging
+For variants that failed (success=False or were policy-rejected):
+Append to `memory/targets/{target_id}/failures.md`:
+- What was tried
+- Why it failed
+- Specific patterns to avoid
+### 4. Synergy check (every 3 generations)
+If `generation % synergy_interval == 0` and there are multiple targets:
+- Cherry-pick the best of each target into a combined branch
+- Run the WorkerAgent flow on the combined branch
+- Record results via `evo_record_synergy`
+- Write to `memory/synergy/records.md`
+### 5. Global reflection
+If cross-target patterns emerge, update:
+```
+memory/global/long_term.md
+```
+## Tools
+- `read` / `write` — memory file I/O
+- `exec git diff` — compare variants
+- `exec git cherry-pick` — synergy combinations
+- `evo_record_synergy` — record synergy results
+- `evo_get_lineage` — trace branch ancestry for context
+## Guidelines
+- Be data-driven: cite exact fitness numbers and generation IDs
+- Be specific: "adding momentum term improved fitness by 12%" not "the change helped"
+- Update failures.md incrementally — don't overwrite, append
+- long_term.md should be a concise synthesis, not a dump of all short_term files

package/plugin/agents/worker.md ADDED Viewed

@@ -0,0 +1,93 @@
+# WorkerAgent
+You handle the full lifecycle of a single code variant: generate, validate, evaluate.
+## Input
+A single `item` from the batch:
+```json
+{
+  "branch": "gen-0/loss-fn/mutate-0",
+  "operation": "mutate",
+  "target_id": "loss-fn",
+  "parent_branches": ["seed-baseline"],
+  "target_file": "model.py",
+  "target_function": "compute_loss"
+}
+```
+## Flow
+### 1. CodeGen — generate the variant
+```
+git checkout -b {item.branch} {item.parent_branches[0]}
+parent_commit = git rev-parse {item.parent_branches[0]}
+```
+- Read the target function code from `item.target_file`
+- Read `memory/targets/{item.target_id}/long_term.md` for accumulated wisdom
+- Read `memory/targets/{item.target_id}/failures.md` to avoid known bad paths
+- If `operation == "crossover"`: also read code from `parent_branches[1]`
+- Generate the variant, keeping the function signature unchanged
+- Fix obvious issues (missing imports, syntax errors)
+- `git add` + `git commit`
+### 2. Policy Check — request review
+```
+step = evo_step("code_ready",
+                branch=item.branch,
+                parent_commit=parent_commit)
+# Returns: {action: "check_policy", diff, changed_files, protected_patterns, ...}
+```
+Hand the `step` to **PolicyAgent** for review.
+- If PolicyAgent approves:
+  ```
+  step = evo_step("policy_pass", branch=item.branch)
+  # Returns: {action: "run_benchmark", ...}
+  ```
+- If PolicyAgent rejects:
+  ```
+  step = evo_step("policy_fail", branch=item.branch, reason="...")
+  # Returns: {action: "worker_done", rejected=True}
+  ```
+  Exit early — do not benchmark.
+### 3. Benchmark — evaluate the variant
+```
+git worktree add /tmp/eval-{branch} {step.branch}
+cd /tmp/eval-{branch}
+exec {benchmark_cmd}         # capture stdout + stderr
+fitness = parse last line as float
+git worktree remove /tmp/eval-{branch}
+```
+If the variant crashes:
+- Trivial fix (missing import, typo): fix it, re-commit, call `evo_step("code_ready")` again
+- Logic error: report `success=False`
+### 4. Report
+```
+evo_step("fitness_ready",
+         branch=step.branch,
+         fitness=fitness,
+         success=true/false,
+         operation=step.operation,
+         target_id=step.target_id,
+         parent_branches=step.parent_branches)
+# Returns: {action: "worker_done", fitness, is_new_best, ...}
+```
+## Tools
+- `read` / `edit` / `write` — code generation
+- `exec git` — branch creation, worktree management
+- `exec` — run benchmark command
+- `evo_step` — advance the state machine
+- `evo_check_cache` — skip duplicates

package/plugin/evo-engine/server.py CHANGED Viewed

@@ -602,12 +602,14 @@ def evo_check_cache(code_hash: str) -> dict:
 # ---------------------------------------------------------------------------
 # Phase constants (passed as strings so they are readable in LLM output)
-_PHASE_BEGIN   = "begin_generation"   # start a new generation
-_PHASE_CODE    = "code_ready"         # LLM committed code for a branch
-_PHASE_FITNESS = "fitness_ready"      # LLM ran benchmark, has fitness value
-_PHASE_SELECT  = "select"             # all items evaluated, run selection
-_PHASE_REFLECT = "reflect_done"       # LLM finished writing memory
-_PHASE_DONE    = "done"               # budget exhausted
+_PHASE_BEGIN       = "begin_generation"   # start a new generation
+_PHASE_CODE        = "code_ready"         # worker committed code for a branch
+_PHASE_POLICY_PASS = "policy_pass"        # PolicyAgent approved the diff
+_PHASE_POLICY_FAIL = "policy_fail"        # PolicyAgent rejected the diff
+_PHASE_FITNESS     = "fitness_ready"      # worker ran benchmark, has fitness value
+_PHASE_SELECT      = "select"             # all items evaluated, run selection
+_PHASE_REFLECT     = "reflect_done"       # ReflectAgent finished writing memory
+_PHASE_DONE        = "done"               # budget exhausted
 def _policy_check(repo_path: str, branch: str, parent: str,
@@ -640,26 +642,37 @@ def evo_step(phase: str, branch: str = "", parent_commit: str = "",
              fitness: float = 0.0, success: bool = True,
              operation: str = "", target_id: str = "",
              parent_branches: list[str] | None = None,
-             code_hash: str = "", raw_output: str = "") -> dict:
-    """Stateless evolution loop driver.
-    Call this in a loop; each call returns the next action to perform.
-    The LLM decides whether to continue (stop when action=="done").
-    Phases and what to pass:
-      "begin_generation"  — start (or resume) a generation; no extra args needed.
-      "code_ready"        — you committed code for `branch` (parent at
-                            `parent_commit`). Server runs policy check and
-                            returns action="run_benchmark" on pass, or the next
-                            action (generate_code / select) with policy_violation
-                            set if the branch was rejected.
-      "fitness_ready"     — you ran the benchmark; pass fitness / success /
-                            operation / target_id / parent_branches.
-                            Returns next generate_code or action="select".
-      "select"            — trigger survivor selection. Returns action="reflect"
-                            with keep/eliminate lists.
-      "reflect_done"      — you finished writing memory. Server starts next
-                            generation and returns generate_code or action="done".
+             code_hash: str = "", raw_output: str = "",
+             reason: str = "") -> dict:
+    """Multi-agent evolution loop driver.
+    Called by the OrchestratorAgent and WorkerAgents to advance the evolution.
+    Each call returns the next action to perform.
+    Phases and what to pass → what is returned:
+      "begin_generation"  → {action="dispatch_workers", items=[...]}
+          Start a new generation. Returns ALL batch items for parallel dispatch.
+      "code_ready"        → {action="check_policy", diff=..., changed_files=...}
+          Worker committed code. Pass: branch, parent_commit.
+          Returns diff + metadata for PolicyAgent to review.
+      "policy_pass"       → {action="run_benchmark", branch, target_id, ...}
+          PolicyAgent approved. Pass: branch.
+      "policy_fail"       → {action="worker_done", rejected=True, reason=...}
+          PolicyAgent rejected. Pass: branch, reason.
+      "fitness_ready"     → {action="worker_done", fitness, success, ...}
+          Worker ran benchmark. Pass: branch, fitness, success,
+          operation, target_id, parent_branches.
+      "select"            → {action="reflect", keep=[...], eliminate=[...]}
+          Orchestrator triggers selection after all workers report.
+      "reflect_done"      → {action="dispatch_workers"} or {action="done"}
+          ReflectAgent finished. Server starts next generation or ends.
     """
     state = _get_state()
     pb = parent_branches or []
@@ -673,11 +686,8 @@ def evo_step(phase: str, branch: str = "", parent_commit: str = "",
         if not branch:
             return {"error": "branch is required for phase 'code_ready'"}
-        # Find the batch item to get allowed files
+        # Find the batch item for context
         item = next((it for it in state.current_batch if it.branch == branch), None)
-        allowed: set[str] = set()
-        if item and item.target_file:
-            allowed = {item.target_file}
         # Resolve parent: prefer explicit parent_commit, fall back to parent_branches[0]
         parent = parent_commit
@@ -691,38 +701,68 @@ def evo_step(phase: str, branch: str = "", parent_commit: str = "",
             return {"error": "Cannot determine parent commit for policy check. "
                              "Pass parent_commit= explicitly."}
-        approved, reason = _policy_check(
-            repo_path=state.config.repo_path,
-            branch=branch,
-            parent=parent,
-            protected_patterns=state.config.protected_patterns,
-            allowed_files=allowed,
+        # Get changed files list
+        names_result = subprocess.run(
+            ["git", "-C", state.config.repo_path, "diff", "--name-only", f"{parent}..{branch}"],
+            capture_output=True, text=True,
         )
+        changed_files = [f for f in names_result.stdout.strip().splitlines() if f]
-        if not approved:
-            ind = Individual(
-                branch=branch,
-                generation=state.generation,
-                target_id=item.target_id if item else "",
-                operation=item.operation if item else Operation.MUTATE,
-                parent_branches=item.parent_branches if item else [],
-                fitness=None,
-                success=False,
-                raw_output=f"policy_violation: {reason}",
-            )
-            state.individuals[branch] = ind
-            state.batch_cursor += 1
-            _save()
-            next_step = _next_item_or_select(state)
-            next_step["policy_violation"] = {"branch": branch, "reason": reason}
-            return next_step
+        # Get full diff for PolicyAgent to review
+        diff_result = subprocess.run(
+            ["git", "-C", state.config.repo_path, "diff", f"{parent}..{branch}"],
+            capture_output=True, text=True,
+        )
         return {
-            "action": "run_benchmark",
+            "action": "check_policy",
             "branch": branch,
+            "parent_commit": parent,
             "target_id": item.target_id if item else "",
+            "target_file": item.target_file if item else "",
             "operation": item.operation.value if item else "",
             "parent_branches": item.parent_branches if item else [],
+            "changed_files": changed_files,
+            "diff": diff_result.stdout[:8000],  # truncate very large diffs
+            "protected_patterns": state.config.protected_patterns,
+        }
+    # ------------------------------------------------------------------ policy_pass
+    if phase == _PHASE_POLICY_PASS:
+        if not branch:
+            return {"error": "branch is required for phase 'policy_pass'"}
+        item = next((it for it in state.current_batch if it.branch == branch), None)
+        return {
+            "action": "run_benchmark",
+            "branch": branch,
+            "target_id": item.target_id if item else target_id,
+            "operation": item.operation.value if item else operation,
+            "parent_branches": item.parent_branches if item else pb,
+        }
+    # ------------------------------------------------------------------ policy_fail
+    if phase == _PHASE_POLICY_FAIL:
+        if not branch:
+            return {"error": "branch is required for phase 'policy_fail'"}
+        item = next((it for it in state.current_batch if it.branch == branch), None)
+        fail_reason = reason or raw_output or "policy violation"
+        ind = Individual(
+            branch=branch,
+            generation=state.generation,
+            target_id=item.target_id if item else target_id,
+            operation=item.operation if item else Operation.MUTATE,
+            parent_branches=item.parent_branches if item else pb,
+            fitness=None,
+            success=False,
+            raw_output=f"policy_violation: {fail_reason}",
+        )
+        state.individuals[branch] = ind
+        _save()
+        return {
+            "action": "worker_done",
+            "branch": branch,
+            "rejected": True,
+            "reason": fail_reason,
         }
     # ------------------------------------------------------------------ fitness_ready
@@ -730,12 +770,15 @@ def evo_step(phase: str, branch: str = "", parent_commit: str = "",
         # Cache check: skip recording if this code was already evaluated
         if code_hash and code_hash in state.fitness_cache:
             cached = state.fitness_cache[code_hash]
-            state.batch_cursor += 1
+            state.total_evals += 1  # cache hits still consume budget
             _save()
-            next_step = _next_item_or_select(state)
-            next_step["cached"] = True
-            next_step["cached_fitness"] = cached
-            return next_step
+            return {
+                "action": "worker_done",
+                "branch": branch,
+                "cached": True,
+                "fitness": cached,
+                "total_evals": state.total_evals,
+            }
         is_min = state.config.objective == Objective.MIN
         ind = Individual(
@@ -783,12 +826,15 @@ def evo_step(phase: str, branch: str = "", parent_commit: str = "",
                 state.best_obj_overall = fitness
                 state.best_branch_overall = branch
-        state.batch_cursor += 1
         _save()
-        result = _next_item_or_select(state)
-        result["recorded_fitness"] = fitness
-        result["is_new_best"] = branch == state.best_branch_overall
-        return result
+        return {
+            "action": "worker_done",
+            "branch": branch,
+            "fitness": fitness,
+            "success": success,
+            "is_new_best": branch == state.best_branch_overall,
+            "total_evals": state.total_evals,
+        }
     # ------------------------------------------------------------------ select
     if phase == _PHASE_SELECT:
@@ -805,11 +851,12 @@ def evo_step(phase: str, branch: str = "", parent_commit: str = "",
         return _begin_generation_impl(state)
     return {"error": f"Unknown phase: {phase!r}. Valid phases: "
-            f"{_PHASE_BEGIN}, {_PHASE_CODE}, {_PHASE_FITNESS}, {_PHASE_SELECT}, {_PHASE_REFLECT}"}
+            f"{_PHASE_BEGIN}, {_PHASE_CODE}, {_PHASE_POLICY_PASS}, {_PHASE_POLICY_FAIL}, "
+            f"{_PHASE_FITNESS}, {_PHASE_SELECT}, {_PHASE_REFLECT}"}
 def _begin_generation_impl(state: EvolutionState) -> dict:
-    """Plan and store the next generation batch; return first generate_code action."""
+    """Plan and store the next generation batch; return all items for parallel dispatch."""
     budget_remaining = state.config.max_fe - state.total_evals
     if budget_remaining <= 0:
         return {"action": _PHASE_DONE, "reason": "budget exhausted",
@@ -874,38 +921,18 @@ def _begin_generation_impl(state: EvolutionState) -> dict:
                                        target_function=target.function))
     state.current_batch = batch
-    state.batch_cursor = 0
+    state.batch_cursor = 0  # kept for compat; parallel flow ignores this
     _save()
     if not batch:
         return {"action": _PHASE_DONE, "reason": "empty batch",
                 "total_evals": state.total_evals}
-    first = batch[0]
     return {
-        "action": "generate_code",
+        "action": "dispatch_workers",
         "generation": state.generation,
         "batch_size": len(batch),
-        "cursor": 0,
-        "item": first.model_dump(),
-    }
-def _next_item_or_select(state: EvolutionState) -> dict:
-    """Return next generate_code action or trigger select if batch is done."""
-    if state.batch_cursor < len(state.current_batch):
-        item = state.current_batch[state.batch_cursor]
-        return {
-            "action": "generate_code",
-            "generation": state.generation,
-            "cursor": state.batch_cursor,
-            "batch_size": len(state.current_batch),
-            "item": item.model_dump(),
-        }
-    return {
-        "action": "select",
-        "generation": state.generation,
-        "items_evaluated": len(state.current_batch),
+        "items": [item.model_dump() for item in batch],
     }

package/plugin/openclaw.plugin.json CHANGED Viewed

@@ -14,5 +14,6 @@
       }
     }
   },
-  "skills": ["./skills"]
+  "skills": ["./skills"],
+  "agents": ["./agents"]
 }

package/plugin/skills/evolve/SKILL.md CHANGED Viewed

@@ -20,23 +20,25 @@ User provides: repo path, benchmark command, objective (min/max), and optionally
    - Call `evo_report_seed` with baseline fitness
    - `exec git -C <repo> tag seed-baseline`
-3. Analyze code (MapAgent role):
-   - Read the benchmark entry file
-   - Trace call chain to find optimization targets
-   - Call `evo_register_targets` with identified targets
+3. Analyze code (MapAgent):
+   - Spawn MapAgent to read benchmark entry file, trace call chain, identify targets
+   - MapAgent calls `evo_register_targets` with identified targets
 4. Create memory structure:
    - `exec mkdir -p <repo>/memory/global`
    - For each target: `exec mkdir -p <repo>/memory/targets/<id>/short_term`
-5. Enter evolution loop using `evo_step` — follow the Core Loop in AGENTS.md:
-   - Start with `evo_step("begin_generation")`
-   - Each call returns `{action, ...data}`; execute the action, then call `evo_step` again
-   - **Policy check is automatic**: calling `evo_step("code_ready", branch=..., parent_commit=...)`
-     triggers a server-side git diff; the server returns `action="run_benchmark"` (pass)
-     or the next `generate_code`/`select` action with `policy_violation` set (violation,
-     already recorded — no benchmark needed)
-   - Stop when `action == "done"` or when you judge the results are sufficient
+5. Enter evolution loop — follow the Core Loop in AGENTS.md:
+   - OrchestratorAgent calls `evo_step("begin_generation")`
+     → returns `{action: "dispatch_workers", items: [...]}`
+   - Spawn one WorkerAgent per item, in parallel
+   - Each WorkerAgent: generates code → requests policy check → PolicyAgent
+     reviews diff → if approved, runs benchmark → reports fitness
+   - When all workers return, OrchestratorAgent calls `evo_step("select")`
+   - Clean up eliminated branches, tag best
+   - Spawn ReflectAgent to write memory
+   - Call `evo_step("reflect_done")` to start next generation or finish
+   - Stop when `action == "done"` or results are sufficient
 6. Report progress to user after each generation.

package/scripts/cli.js CHANGED Viewed

@@ -17,6 +17,7 @@ const os = require('os');
 const PKG_ROOT = path.resolve(__dirname, '..');
 const SKILLS_DIR = path.join(PKG_ROOT, 'plugin', 'skills');
 const AGENTS_MD = path.join(PKG_ROOT, 'plugin', 'AGENTS.md');
+const AGENTS_DIR = path.join(PKG_ROOT, 'plugin', 'agents');
 const MCP_SERVER_ENTRY = {
   command: 'evo-engine',
@@ -84,6 +85,17 @@ function setupCursor(projectDir) {
   fs.mkdirSync(rulesDir, { recursive: true });
   fs.copyFileSync(AGENTS_MD, path.join(rulesDir, 'evo-agents.md'));
   console.log(`  ✅ Copied AGENTS.md → .cursor/rules/evo-agents.md`);
+  // 复制 agent 定义文件
+  if (fs.existsSync(AGENTS_DIR)) {
+    for (const agentFile of fs.readdirSync(AGENTS_DIR)) {
+      fs.copyFileSync(
+        path.join(AGENTS_DIR, agentFile),
+        path.join(rulesDir, `evo-${agentFile}`)
+      );
+      console.log(`  ✅ Copied agents/${agentFile} → .cursor/rules/evo-${agentFile}`);
+    }
+  }
 }
 function setupWindsurf() {