npm - evo-anything - Versions diffs - 0.1.0 → 0.1.2 - Mend

evo-anything 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6)

package/package.json +1 -1
package/plugin/AGENTS.md +74 -33
package/plugin/evo-engine/models.py +9 -1
package/plugin/evo-engine/pyproject.toml +5 -2
package/plugin/evo-engine/server.py +315 -1
package/plugin/skills/evolve/SKILL.md +8 -2

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "evo-anything",
-  "version": "0.1.0",
+  "version": "0.1.2",
   "description": "Git-based evolutionary algorithm design engine. Evolves code via LLM-driven mutation, crossover, and reflection on any git repository.",
   "keywords": [
     "ai",

package/plugin/AGENTS.md CHANGED

@@ -14,36 +14,65 @@ You evolve code in a target git repository by running generations of:
 ## Core Loop
+The loop is driven by `evo_step`.  Each call returns `{action, ...data}`.
+You execute `action`, then call `evo_step` again with the result.
+**You decide whether to stop** — check `action == "done"` or user intent.
 ```
-Call evo_init → set up evolution state
+Call evo_init            → set up evolution state
 Call evo_register_targets → define what to optimize
-WHILE evo_get_status shows budget remaining:
-  1. Call evo_next_batch → get [{branch, op, target, parents}]
-  2. For each operation:
-     a. git checkout -b <branch> from parent
-     b. Read target function code
-     c. Read memory/ for this target (long_term + failures)
-     d. Generate variant (mutate or crossover via LLM)
-     e. Write code change, git commit
-  3. For each branch to evaluate:
-     a. git worktree add <path> <branch>
-     b. Run benchmark command in worktree
-     c. Parse fitness from output
-     d. Call evo_report_fitness with result
-     e. git worktree remove <path>
-  4. Call evo_select_survivors → get keep/eliminate lists
-  5. Delete eliminated branches
-  6. Tag best: git tag best-gen-{N}
-  7. Reflect:
-     a. git diff best..second_best → short-term reflection
-     b. Write to memory/targets/{id}/short_term/gen_{N}.md
-     c. Synthesize long_term.md from accumulated short_term
-     d. Record failures to memory/targets/{id}/failures.md
-  8. Every 3 generations: synergy check
-     a. Cherry-pick best of each target into one branch
-     b. Evaluate combined fitness
-     c. Record synergy results
+step = evo_step("begin_generation")
+LOOP:
+  if step.action == "done":
+      break                          ← you decide to stop here
+  if step.action == "generate_code":
+      item = step.item
+      # if step.policy_violation is set, a previous branch was rejected (informational)
+      a. git checkout -b item.branch  from item.parent_branches[0]
+      b. record parent_commit = git rev-parse item.parent_branches[0]
+      c. Read target function code
+      d. Read memory/ for this target (long_term + failures)
+      e. Generate variant (mutate or crossover via LLM)
+      f. Write code change, git commit
+      step = evo_step("code_ready",
+                      branch=item.branch,
+                      parent_commit=parent_commit)
+      # server runs policy check here — returns "run_benchmark" or next "generate_code"/"select"
+  elif step.action == "run_benchmark":
+      # policy check passed — step contains branch, target_id, operation, parent_branches
+      a. git worktree add <path> step.branch
+      b. Run benchmark command in worktree
+      c. Parse fitness from output
+      d. git worktree remove <path>
+      step = evo_step("fitness_ready",
+                      branch=step.branch,
+                      fitness=<value>, success=<bool>,
+                      operation=step.operation,
+                      target_id=step.target_id,
+                      parent_branches=step.parent_branches)
+      # server returns next "generate_code" or "select"
+  elif step.action == "select":
+      step = evo_step("select")
+      # returns {action="reflect", keep=[...], eliminate=[...], best_branch, best_obj}
+      a. Delete eliminated branches
+      b. Tag best: git tag best-gen-{N}
+  elif step.action == "reflect":
+      # step contains keep/eliminate/best_branch from selection
+      a. git diff best..second_best → short-term reflection
+      b. Write to memory/targets/{id}/short_term/gen_{N}.md
+      c. Synthesize long_term.md from accumulated short_term
+      d. Record failures to memory/targets/{id}/failures.md
+      e. Every 3 generations: synergy check
+         - Cherry-pick best of each target into one branch
+         - Evaluate combined fitness  (use evo_step "code_ready"→"fitness_ready")
+         - Record synergy results via evo_record_synergy
+      step = evo_step("reflect_done")
+      # server starts next generation internally → action="generate_code" or "done"
 ```
 ## Memory Layout
@@ -71,14 +100,26 @@ Tags: `seed-baseline`, `best-gen-{N}`, `best-overall`
 ## Evaluation Protocol
-1. **Static check** — read generated code, fix obvious issues (missing imports, syntax errors). Do NOT fix algorithm logic.
-2. **Quick eval** — if quick_cmd is configured, run it first to filter obvious failures.
-3. **Full eval** — run full benchmark only on candidates that pass quick eval.
+Policy enforcement is **server-side** inside `evo_step("code_ready", ...)`.
+You do not need to run a separate policy check — the server does it automatically
+when you report that code is ready.
+1. **Policy check** — automatic, runs inside `evo_step("code_ready")`.
+   Server diffs `parent_commit..branch`, checks against `protected_patterns`
+   and declared target files.
+   - Pass → returns `action="run_benchmark"`
+   - Violation → records it, skips to next item, returns `action="generate_code"`
+     (or `action="select"` if batch is done) with `policy_violation={branch, reason}`
+2. **Static check** — before committing: fix obvious issues (missing imports,
+   syntax errors). Do NOT fix algorithm logic.
+3. **Quick eval** — if quick_cmd is configured, run it first to filter failures.
+4. **Full eval** — run full benchmark only on candidates that pass quick eval.
 If a variant crashes:
 - Read the traceback
-- If it's a trivial fix (missing import, typo, type mismatch): fix it, re-commit, re-evaluate
-- If it's an algorithm logic error: mark as failed, record in failures.md
+- If it's a trivial fix (missing import, typo, type mismatch): fix it, re-commit,
+  then call `evo_step("code_ready", ...)` again with the new commit
+- If it's an algorithm logic error: report via `evo_step("fitness_ready", success=False)`
 ## Constraints
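The `evo_step` protocol in this AGENTS.md update amounts to a small dispatch loop on the agent side. Below is a minimal offline sketch with these assumptions. `FakeEngine` is a hypothetical in-memory stand-in for the MCP server. It only replays the phase sequence and does no git work, policy checking, budget accounting or selection. `run_loop` and its `generate`/`benchmark` callbacks are illustrative names and are not part of evo-anything.

```python
from typing import Callable


class FakeEngine:
    """Hypothetical stand-in for evo-engine: replays the action sequence only."""

    def __init__(self, branches_per_gen: int, generations: int) -> None:
        self.branches_per_gen = branches_per_gen
        self.generations = generations
        self.generation = 0
        self.cursor = 0

    def _item(self) -> dict:
        branch = f"gen-{self.generation}/t0/mutate-{self.cursor}"
        return {"action": "generate_code",
                "item": {"branch": branch, "parent_branches": ["seed"]}}

    def _next(self) -> dict:
        # After an item is evaluated, hand out the next one or request selection.
        self.cursor += 1
        if self.cursor < self.branches_per_gen:
            return self._item()
        return {"action": "select"}

    def step(self, phase: str, **kw) -> dict:
        if phase == "begin_generation":
            self.cursor = 0
            return self._item()
        if phase == "code_ready":
            return {"action": "run_benchmark", "branch": kw["branch"]}
        if phase == "fitness_ready":
            return self._next()
        if phase == "select":
            return {"action": "reflect", "keep": [], "eliminate": []}
        if phase == "reflect_done":
            self.generation += 1
            if self.generation >= self.generations:
                return {"action": "done"}
            return self.step("begin_generation")
        return {"error": f"Unknown phase: {phase!r}"}


def run_loop(step: Callable[..., dict],
             generate: Callable[[dict], str],
             benchmark: Callable[[str], float]) -> list[str]:
    """Drive the loop until the server answers action == "done"."""
    log: list[str] = []
    resp = step("begin_generation")
    while resp.get("action") != "done":
        action = resp.get("action")
        log.append(action)
        if action == "generate_code":
            item = resp["item"]
            # A real agent would branch, edit and commit here, returning the parent SHA.
            parent_commit = generate(item)
            resp = step("code_ready", branch=item["branch"],
                        parent_commit=parent_commit)
        elif action == "run_benchmark":
            fitness = benchmark(resp["branch"])
            resp = step("fitness_ready", branch=resp["branch"],
                        fitness=fitness, success=True)
        elif action == "select":
            resp = step("select")
        elif action == "reflect":
            # Memory files would be written before reporting reflect_done.
            resp = step("reflect_done")
        else:
            raise RuntimeError(f"unexpected response: {resp}")
    return log


engine = FakeEngine(branches_per_gen=2, generations=2)
trace = run_loop(engine.step, generate=lambda item: "abc123",
                 benchmark=lambda branch: 1.0)
print(trace)
```

With two items per generation, each generation contributes `generate_code`, `run_benchmark`, `generate_code`, `run_benchmark`, `select`, `reflect` to the trace. Unlike the real server, this stub stops after a fixed number of generations rather than when `max_fe` is used up.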

package/plugin/evo-engine/models.py CHANGED

@@ -67,7 +67,7 @@ class SurvivorResult(BaseModel):
     keep: list[str]
     eliminate: list[str]
     best_branch: str
-    best_obj: float
+    best_obj: Optional[float] = None
 class EvolutionConfig(BaseModel):
@@ -81,6 +81,11 @@ class EvolutionConfig(BaseModel):
     synergy_interval: int = 3
     top_k_survive: int = 5
     quick_cmd: Optional[str] = None
+    # Glob patterns for files that must never be modified by evolution
+    protected_patterns: list[str] = Field(default_factory=lambda: [
+        "benchmark*.py", "eval*.py", "evaluate*.py",
+        "run_eval*", "test_bench*", "*.sh",
+    ])
 class EvolutionState(BaseModel):
@@ -101,3 +106,6 @@ class EvolutionState(BaseModel):
     fitness_cache: dict[str, float] = Field(default_factory=dict)
     # Synergy records
     synergy_records: list[dict] = Field(default_factory=list)
+    # Current generation batch (stored server-side so LLM just passes a cursor)
+    current_batch: list[BatchItem] = Field(default_factory=list)
+    batch_cursor: int = 0  # index of the next unprocessed item in current_batch

package/plugin/evo-engine/pyproject.toml CHANGED

@@ -11,6 +11,9 @@ dependencies = [
 [project.scripts]
 evo-engine = "server:main"
+[tool.setuptools]
+py-modules = ["server", "models", "selection"]
 [build-system]
-requires = ["hatchling"]
-build-backend = "hatchling.build"
+requires = ["setuptools>=61"]
+build-backend = "setuptools.build_meta"

package/plugin/evo-engine/server.py CHANGED

@@ -9,9 +9,11 @@ The agent calls these tools; the LLM handles code generation and reflection.
 from __future__ import annotations
+import fnmatch
 import hashlib
 import json
 import os
+import subprocess
 from pathlib import Path
 from mcp.server.fastmcp import FastMCP
@@ -35,7 +37,7 @@ from selection import (
     update_temperatures,
 )
-mcp = FastMCP("evo-engine", description="U2E evolutionary algorithm bookkeeping")
+mcp = FastMCP("evo-engine", instructions="U2E evolutionary algorithm bookkeeping")
 # ---------------------------------------------------------------------------
 # State persistence
@@ -595,6 +597,318 @@ def evo_check_cache(code_hash: str) -> dict:
     return {"cached": False}
+# ---------------------------------------------------------------------------
+# evo_step — stateless loop driver
+# ---------------------------------------------------------------------------
+# Phase constants (passed as strings so they are readable in LLM output)
+_PHASE_BEGIN   = "begin_generation"   # start a new generation
+_PHASE_CODE    = "code_ready"         # LLM committed code for a branch
+_PHASE_FITNESS = "fitness_ready"      # LLM ran benchmark, has fitness value
+_PHASE_SELECT  = "select"             # all items evaluated, run selection
+_PHASE_REFLECT = "reflect_done"       # LLM finished writing memory
+_PHASE_DONE    = "done"               # budget exhausted
+def _policy_check(repo_path: str, branch: str, parent: str,
+                  protected_patterns: list[str],
+                  allowed_files: set[str]) -> tuple[bool, str]:
+    """Run git diff and check for policy violations.
+    Returns (approved: bool, reason: str).
+    """
+    result = subprocess.run(
+        ["git", "-C", repo_path, "diff", "--name-only", f"{parent}..{branch}"],
+        capture_output=True, text=True,
+    )
+    if result.returncode != 0:
+        return False, f"git diff failed: {result.stderr.strip()}"
+    changed = [f for f in result.stdout.strip().splitlines() if f]
+    for f in changed:
+        basename = os.path.basename(f)
+        for pat in protected_patterns:
+            if fnmatch.fnmatch(f, pat) or fnmatch.fnmatch(basename, pat):
+                return False, f"Protected file modified: {f!r} (pattern {pat!r})"
+        if allowed_files and f not in allowed_files:
+            return False, f"File outside optimization targets: {f!r}"
+    return True, ""
+@mcp.tool()
+def evo_step(phase: str, branch: str = "", parent_commit: str = "",
+             fitness: float = 0.0, success: bool = True,
+             operation: str = "", target_id: str = "",
+             parent_branches: list[str] | None = None,
+             code_hash: str = "", raw_output: str = "") -> dict:
+    """Stateless evolution loop driver.
+    Call this in a loop; each call returns the next action to perform.
+    The LLM decides whether to continue (stop when action=="done").
+    Phases and what to pass:
+      "begin_generation"  — start (or resume) a generation; no extra args needed.
+      "code_ready"        — you committed code for `branch` (parent at
+                            `parent_commit`). Server runs policy check and
+                            returns action="run_benchmark" on pass, or the next
+                            action (generate_code / select) with policy_violation
+                            set if the branch was rejected.
+      "fitness_ready"     — you ran the benchmark; pass fitness / success /
+                            operation / target_id / parent_branches.
+                            Returns next generate_code or action="select".
+      "select"            — trigger survivor selection. Returns action="reflect"
+                            with keep/eliminate lists.
+      "reflect_done"      — you finished writing memory. Server starts next
+                            generation and returns generate_code or action="done".
+    """
+    state = _get_state()
+    pb = parent_branches or []
+    # ------------------------------------------------------------------ begin
+    if phase == _PHASE_BEGIN:
+        return _begin_generation_impl(state)
+    # ------------------------------------------------------------------ code_ready
+    if phase == _PHASE_CODE:
+        if not branch:
+            return {"error": "branch is required for phase 'code_ready'"}
+        # Find the batch item to get allowed files
+        item = next((it for it in state.current_batch if it.branch == branch), None)
+        allowed: set[str] = set()
+        if item and item.target_file:
+            allowed = {item.target_file}
+        # Resolve parent: prefer explicit parent_commit, fall back to parent_branches[0]
+        parent = parent_commit
+        if not parent and item and item.parent_branches:
+            r = subprocess.run(
+                ["git", "-C", state.config.repo_path, "rev-parse", item.parent_branches[0]],
+                capture_output=True, text=True,
+            )
+            parent = r.stdout.strip() if r.returncode == 0 else item.parent_branches[0]
+        if not parent:
+            return {"error": "Cannot determine parent commit for policy check. "
+                             "Pass parent_commit= explicitly."}
+        approved, reason = _policy_check(
+            repo_path=state.config.repo_path,
+            branch=branch,
+            parent=parent,
+            protected_patterns=state.config.protected_patterns,
+            allowed_files=allowed,
+        )
+        if not approved:
+            ind = Individual(
+                branch=branch,
+                generation=state.generation,
+                target_id=item.target_id if item else "",
+                operation=item.operation if item else Operation.MUTATE,
+                parent_branches=item.parent_branches if item else [],
+                fitness=None,
+                success=False,
+                raw_output=f"policy_violation: {reason}",
+            )
+            state.individuals[branch] = ind
+            state.batch_cursor += 1
+            _save()
+            next_step = _next_item_or_select(state)
+            next_step["policy_violation"] = {"branch": branch, "reason": reason}
+            return next_step
+        return {
+            "action": "run_benchmark",
+            "branch": branch,
+            "target_id": item.target_id if item else "",
+            "operation": item.operation.value if item else "",
+            "parent_branches": item.parent_branches if item else [],
+        }
+    # ------------------------------------------------------------------ fitness_ready
+    if phase == _PHASE_FITNESS:
+        # Cache check: skip recording if this code was already evaluated
+        if code_hash and code_hash in state.fitness_cache:
+            cached = state.fitness_cache[code_hash]
+            state.batch_cursor += 1
+            _save()
+            next_step = _next_item_or_select(state)
+            next_step["cached"] = True
+            next_step["cached_fitness"] = cached
+            return next_step
+        is_min = state.config.objective == Objective.MIN
+        ind = Individual(
+            branch=branch,
+            generation=state.generation,
+            target_id=target_id,
+            operation=Operation(operation) if operation else Operation.MUTATE,
+            parent_branches=pb,
+            fitness=fitness,
+            success=success,
+            code_hash=code_hash,
+            raw_output=raw_output[:500] if raw_output else None,
+        )
+        state.individuals[branch] = ind
+        state.total_evals += 1
+        if code_hash:
+            state.fitness_cache[code_hash] = fitness
+        if target_id not in state.active_branches:
+            state.active_branches[target_id] = []
+        if success:
+            state.active_branches[target_id].append(branch)
+        if success and target_id in state.targets:
+            target = state.targets[target_id]
+            improved = (
+                target.current_best_obj is None
+                or (is_min and fitness < target.current_best_obj)
+                or (not is_min and fitness > target.current_best_obj)
+            )
+            if improved:
+                target.current_best_obj = fitness
+                target.current_best_branch = branch
+                target.stagnation_count = 0
+        if success:
+            if state.best_obj_overall is None:
+                state.best_obj_overall = fitness
+                state.best_branch_overall = branch
+            elif is_min and fitness < state.best_obj_overall:
+                state.best_obj_overall = fitness
+                state.best_branch_overall = branch
+            elif not is_min and fitness > state.best_obj_overall:
+                state.best_obj_overall = fitness
+                state.best_branch_overall = branch
+        state.batch_cursor += 1
+        _save()
+        result = _next_item_or_select(state)
+        result["recorded_fitness"] = fitness
+        result["is_new_best"] = branch == state.best_branch_overall
+        return result
+    # ------------------------------------------------------------------ select
+    if phase == _PHASE_SELECT:
+        result = evo_select_survivors()
+        result["action"] = "reflect"
+        return result
+    # ------------------------------------------------------------------ reflect_done
+    if phase == _PHASE_REFLECT:
+        budget_remaining = state.config.max_fe - state.total_evals
+        if budget_remaining <= 0:
+            return {"action": _PHASE_DONE, "reason": "budget exhausted",
+                    "total_evals": state.total_evals, "best_obj": state.best_obj_overall}
+        return _begin_generation_impl(state)
+    return {"error": f"Unknown phase: {phase!r}. Valid phases: "
+            f"{_PHASE_BEGIN}, {_PHASE_CODE}, {_PHASE_FITNESS}, {_PHASE_SELECT}, {_PHASE_REFLECT}"}
+def _begin_generation_impl(state: EvolutionState) -> dict:
+    """Plan and store the next generation batch; return first generate_code action."""
+    budget_remaining = state.config.max_fe - state.total_evals
+    if budget_remaining <= 0:
+        return {"action": _PHASE_DONE, "reason": "budget exhausted",
+                "total_evals": state.total_evals}
+    is_min = state.config.objective == Objective.MIN
+    plan = plan_generation(
+        targets=state.targets,
+        pop_size=state.config.pop_size,
+        mutation_rate=state.config.mutation_rate,
+        budget_remaining=budget_remaining,
+        synergy_interval=state.config.synergy_interval,
+        generation=state.generation,
+        is_minimize=is_min,
+    )
+    batch: list[BatchItem] = []
+    var_counter: dict[str, int] = {}
+    for item in plan:
+        tid = item["target_id"]
+        op = item["operation"]
+        count = item["count"]
+        for _ in range(count):
+            key = f"{tid}/{op.value}"
+            idx = var_counter.get(key, 0)
+            var_counter[key] = idx + 1
+            if op == Operation.SYNERGY:
+                b = f"gen-{state.generation}/synergy/{tid}-{idx}"
+                parts = tid.split("+")
+                parents_list = [
+                    state.targets[p].current_best_branch
+                    for p in parts
+                    if p in state.targets and state.targets[p].current_best_branch
+                ]
+                batch.append(BatchItem(branch=b, operation=op, target_id=tid,
+                                       parent_branches=parents_list,
+                                       target_file="", target_function=""))
+            else:
+                target = state.targets[tid]
+                b = f"gen-{state.generation}/{tid}/{op.value}-{idx}"
+                if op == Operation.CROSSOVER:
+                    active = state.active_branches.get(tid, [])
+                    active_inds = [
+                        state.individuals[br] for br in active
+                        if br in state.individuals and state.individuals[br].success
+                    ]
+                    pairs = random_select(active_inds, 1, is_minimize=is_min)
+                    if pairs:
+                        parents_list = [pairs[0][0].branch, pairs[0][1].branch]
+                    elif target.current_best_branch:
+                        parents_list = [target.current_best_branch]
+                    else:
+                        parents_list = [state.seed_branch]
+                else:
+                    parents_list = (
+                        [target.current_best_branch] if target.current_best_branch
+                        else [state.seed_branch]
+                    )
+                batch.append(BatchItem(branch=b, operation=op, target_id=tid,
+                                       parent_branches=parents_list,
+                                       target_file=target.file,
+                                       target_function=target.function))
+    state.current_batch = batch
+    state.batch_cursor = 0
+    _save()
+    if not batch:
+        return {"action": _PHASE_DONE, "reason": "empty batch",
+                "total_evals": state.total_evals}
+    first = batch[0]
+    return {
+        "action": "generate_code",
+        "generation": state.generation,
+        "batch_size": len(batch),
+        "cursor": 0,
+        "item": first.model_dump(),
+    }
+def _next_item_or_select(state: EvolutionState) -> dict:
+    """Return next generate_code action or trigger select if batch is done."""
+    if state.batch_cursor < len(state.current_batch):
+        item = state.current_batch[state.batch_cursor]
+        return {
+            "action": "generate_code",
+            "generation": state.generation,
+            "cursor": state.batch_cursor,
+            "batch_size": len(state.current_batch),
+            "item": item.model_dump(),
+        }
+    return {
+        "action": "select",
+        "generation": state.generation,
+        "items_evaluated": len(state.current_batch),
+    }
 # ---------------------------------------------------------------------------
 # Helpers
 # ---------------------------------------------------------------------------
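`_policy_check` applies two rules to every path reported by `git diff --name-only`. First, it rejects any path that matches a protected glob, testing both the full path and the basename. Second, when the batch item declares a target file, it rejects any other path. The sketch below applies the same rules to an already-computed list of changed paths, so it runs without a repository. `check_paths` is a hypothetical name.

```python
import fnmatch
import os

# Defaults copied from EvolutionConfig.protected_patterns in this release.
DEFAULT_PROTECTED = ["benchmark*.py", "eval*.py", "evaluate*.py",
                     "run_eval*", "test_bench*", "*.sh"]


def check_paths(changed: list[str], protected: list[str],
                allowed: set[str]) -> tuple[bool, str]:
    for path in changed:
        base = os.path.basename(path)
        for pat in protected:
            # Matching the basename catches protected files in subdirectories,
            # e.g. "bench/eval_utils.py" against "eval*.py".
            if fnmatch.fnmatch(path, pat) or fnmatch.fnmatch(base, pat):
                return False, f"Protected file modified: {path!r} (pattern {pat!r})"
        # An empty allow-set (synergy items have target_file="") skips this rule.
        if allowed and path not in allowed:
            return False, f"File outside optimization targets: {path!r}"
    return True, ""


print(check_paths(["src/solver.py"], DEFAULT_PROTECTED, {"src/solver.py"}))
print(check_paths(["bench/eval_utils.py"], DEFAULT_PROTECTED, {"src/solver.py"}))
print(check_paths(["src/helpers.py"], DEFAULT_PROTECTED, {"src/solver.py"}))
```

`fnmatch.fnmatch` applies `os.path.normcase`, so matching is case-insensitive on Windows and case-sensitive on POSIX. `fnmatch.fnmatchcase` would make the behaviour identical across platforms.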

package/plugin/skills/evolve/SKILL.md CHANGED

@@ -29,8 +29,14 @@ User provides: repo path, benchmark command, objective (min/max), and optionally
    - `exec mkdir -p <repo>/memory/global`
    - For each target: `exec mkdir -p <repo>/memory/targets/<id>/short_term`
-5. Enter evolution loop — follow the protocol in AGENTS.md:
-   - Call `evo_next_batch` → execute each operation → `evo_report_fitness` → `evo_select_survivors` → reflect → repeat
+5. Enter evolution loop using `evo_step` — follow the Core Loop in AGENTS.md:
+   - Start with `evo_step("begin_generation")`
+   - Each call returns `{action, ...data}`; execute the action, then call `evo_step` again
+   - **Policy check is automatic**: calling `evo_step("code_ready", branch=..., parent_commit=...)`
+     triggers a server-side git diff; the server returns `action="run_benchmark"` (pass)
+     or the next `generate_code`/`select` action with `policy_violation` set (violation,
+     already recorded — no benchmark needed)
+   - Stop when `action == "done"` or when you judge the results are sufficient
 6. Report progress to user after each generation.