evo-anything 0.1.0 → 0.1.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
 {
   "name": "evo-anything",
-  "version": "0.1.0",
+  "version": "0.1.2",
   "description": "Git-based evolutionary algorithm design engine. Evolves code via LLM-driven mutation, crossover, and reflection on any git repository.",
   "keywords": [
     "ai",
package/plugin/AGENTS.md CHANGED
@@ -14,36 +14,65 @@ You evolve code in a target git repository by running generations of:
 
 ## Core Loop
 
+The loop is driven by `evo_step`. Each call returns `{action, ...data}`.
+You execute `action`, then call `evo_step` again with the result.
+**You decide whether to stop** — check `action == "done"` or user intent.
+
 ```
-Call evo_init → set up evolution state
+Call evo_init → set up evolution state
 Call evo_register_targets → define what to optimize
-
-WHILE evo_get_status shows budget remaining:
-  1. Call evo_next_batch → get [{branch, op, target, parents}]
-  2. For each operation:
-     a. git checkout -b <branch> from parent
-     b. Read target function code
-     c. Read memory/ for this target (long_term + failures)
-     d. Generate variant (mutate or crossover via LLM)
-     e. Write code change, git commit
-  3. For each branch to evaluate:
-     a. git worktree add <path> <branch>
-     b. Run benchmark command in worktree
-     c. Parse fitness from output
-     d. Call evo_report_fitness with result
-     e. git worktree remove <path>
-  4. Call evo_select_survivors → get keep/eliminate lists
-  5. Delete eliminated branches
-  6. Tag best: git tag best-gen-{N}
-  7. Reflect:
-     a. git diff best..second_best → short-term reflection
-     b. Write to memory/targets/{id}/short_term/gen_{N}.md
-     c. Synthesize long_term.md from accumulated short_term
-     d. Record failures to memory/targets/{id}/failures.md
-  8. Every 3 generations: synergy check
-     a. Cherry-pick best of each target into one branch
-     b. Evaluate combined fitness
-     c. Record synergy results
+step = evo_step("begin_generation")
+
+LOOP:
+  if step.action == "done":
+    break                      ← you decide to stop here
+
+  if step.action == "generate_code":
+    item = step.item
+    # if step.policy_violation is set, a previous branch was rejected (informational)
+    a. git checkout -b item.branch from item.parent_branches[0]
+    b. record parent_commit = git rev-parse item.parent_branches[0]
+    c. Read target function code
+    d. Read memory/ for this target (long_term + failures)
+    e. Generate variant (mutate or crossover via LLM)
+    f. Write code change, git commit
+    step = evo_step("code_ready",
+                    branch=item.branch,
+                    parent_commit=parent_commit)
+    # server runs policy check here — returns "run_benchmark" or next "generate_code"/"select"
+
+  elif step.action == "run_benchmark":
+    # policy check passed; step contains branch, target_id, operation, parent_branches
+    a. git worktree add <path> step.branch
+    b. Run benchmark command in worktree
+    c. Parse fitness from output
+    d. git worktree remove <path>
+    step = evo_step("fitness_ready",
+                    branch=step.branch,
+                    fitness=<value>, success=<bool>,
+                    operation=step.operation,
+                    target_id=step.target_id,
+                    parent_branches=step.parent_branches)
+    # server returns next "generate_code" or "select"
+
+  elif step.action == "select":
+    step = evo_step("select")
+    # returns {action="reflect", keep=[...], eliminate=[...], best_branch, best_obj}
+    a. Delete eliminated branches
+    b. Tag best: git tag best-gen-{N}
+
+  elif step.action == "reflect":
+    # step contains keep/eliminate/best_branch from selection
+    a. git diff best..second_best → short-term reflection
+    b. Write to memory/targets/{id}/short_term/gen_{N}.md
+    c. Synthesize long_term.md from accumulated short_term
+    d. Record failures to memory/targets/{id}/failures.md
+    e. Every 3 generations: synergy check
+       - Cherry-pick best of each target into one branch
+       - Evaluate combined fitness (use evo_step "code_ready"→"fitness_ready")
+       - Record synergy results via evo_record_synergy
+    step = evo_step("reflect_done")
+    # server starts next generation internally → action="generate_code" or "done"
 ```
 
 ## Memory Layout
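On the agent side, the `evo_step` loop above is a dispatch on `step["action"]`. The following is a hypothetical sketch, not code from the package: it assumes `evo_step` is reachable as a plain Python callable (in the plugin it is an MCP tool call), and the `drive` function and handler mapping are illustrative names. Each handler is assumed to do the git and benchmark work for its action and return whatever the next `evo_step` call returns.

```python
from typing import Any, Callable

Step = dict[str, Any]
# Hypothetical: a handler performs the work for one action and returns the next step.
Handler = Callable[[Step], Step]


def drive(evo_step: Callable[..., Step], handlers: dict[str, Handler],
          max_calls: int = 10_000) -> Step:
    """Run the evo_step loop until the server returns action == "done".

    `evo_step` is assumed to be a plain callable wrapping the MCP tool.
    """
    step = evo_step("begin_generation")
    for _ in range(max_calls):
        if "error" in step:  # evo_step reports bad input as {"error": ...}
            raise RuntimeError(step["error"])
        action = step.get("action")
        if action == "done":  # budget exhausted or empty batch
            return step
        handler = handlers.get(action)
        if handler is None:
            raise ValueError(f"no handler for action {action!r}")
        step = handler(step)
    raise RuntimeError("max_calls reached before action == 'done'")
```

Because the server is stateless between calls, the agent can also simply stop calling `evo_step` on user intent; nothing forces the loop to reach `done`.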
@@ -71,14 +100,26 @@ Tags: `seed-baseline`, `best-gen-{N}`, `best-overall`
 
 ## Evaluation Protocol
 
-1. **Static check**: read generated code, fix obvious issues (missing imports, syntax errors). Do NOT fix algorithm logic.
-2. **Quick eval**: if quick_cmd is configured, run it first to filter obvious failures.
-3. **Full eval** — run full benchmark only on candidates that pass quick eval.
+Policy enforcement is **server-side** inside `evo_step("code_ready", ...)`.
+You do not need to run a separate policy check; the server does it automatically
+when you report that code is ready.
+
+1. **Policy check** — automatic, runs inside `evo_step("code_ready")`.
+   Server diffs `parent_commit..branch`, checks against `protected_patterns`
+   and declared target files.
+   - Pass → returns `action="run_benchmark"`
+   - Violation → records it, skips to next item, returns `action="generate_code"`
+     (or `action="select"` if batch is done) with `policy_violation={branch, reason}`
+2. **Static check** — before committing: fix obvious issues (missing imports,
+   syntax errors). Do NOT fix algorithm logic.
+3. **Quick eval** — if quick_cmd is configured, run it first to filter failures.
+4. **Full eval** — run full benchmark only on candidates that pass quick eval.
 
 If a variant crashes:
 - Read the traceback
-- If it's a trivial fix (missing import, typo, type mismatch): fix it, re-commit, re-evaluate
-- If it's an algorithm logic error: mark as failed, record in failures.md
+- If it's a trivial fix (missing import, typo, type mismatch): fix it, re-commit,
+  then call `evo_step("code_ready", ...)` again with the new commit
+- If it's an algorithm logic error: report via `evo_step("fitness_ready", success=False)`
 
 ## Constraints
 
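Steps 3 and 4 (quick eval as a filter, then the full benchmark) can be wrapped in a helper whose result maps onto the `fitness`, `success` and `raw_output` arguments of `evo_step("fitness_ready", ...)`. This is a hypothetical sketch: `evaluate` and `parse_fitness` are not part of the package, and it assumes the benchmark prints its fitness as the last non-empty line of stdout and that both commands are shell strings run inside the worktree.

```python
import subprocess
from typing import Optional


def parse_fitness(stdout: str) -> Optional[float]:
    """Assumed convention: the benchmark prints fitness on its last non-empty line."""
    lines = [ln.strip() for ln in stdout.splitlines() if ln.strip()]
    if not lines:
        return None
    try:
        return float(lines[-1])
    except ValueError:
        return None


def evaluate(worktree: str, full_cmd: str, quick_cmd: Optional[str] = None,
             timeout: float = 600.0) -> tuple[Optional[float], bool, str]:
    """Quick eval as a cheap filter (if configured), then the full benchmark.

    Returns (fitness, success, raw_output) for evo_step("fitness_ready", ...).
    """
    if quick_cmd:
        quick = subprocess.run(quick_cmd, shell=True, cwd=worktree,
                               capture_output=True, text=True, timeout=timeout)
        if quick.returncode != 0:
            # Filtered out by quick eval: no full run, no fitness.
            return None, False, (quick.stdout + quick.stderr)[-500:]
    full = subprocess.run(full_cmd, shell=True, cwd=worktree,
                          capture_output=True, text=True, timeout=timeout)
    fitness = parse_fitness(full.stdout)
    success = full.returncode == 0 and fitness is not None
    # Keep the tail of the output, where a traceback would end up.
    return fitness, success, (full.stdout + full.stderr)[-500:]
```

`evo_step` declares `fitness: float = 0.0`, so when `success` is False it is safer to omit `fitness` than to pass `None`.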
@@ -67,7 +67,7 @@ class SurvivorResult(BaseModel):
     keep: list[str]
     eliminate: list[str]
     best_branch: str
-    best_obj: float
+    best_obj: Optional[float] = None
 
 
 class EvolutionConfig(BaseModel):
@@ -81,6 +81,11 @@ class EvolutionConfig(BaseModel):
     synergy_interval: int = 3
     top_k_survive: int = 5
     quick_cmd: Optional[str] = None
+    # Glob patterns for files that must never be modified by evolution
+    protected_patterns: list[str] = Field(default_factory=lambda: [
+        "benchmark*.py", "eval*.py", "evaluate*.py",
+        "run_eval*", "test_bench*", "*.sh",
+    ])
 
 
 class EvolutionState(BaseModel):
  class EvolutionState(BaseModel):
@@ -101,3 +106,6 @@ class EvolutionState(BaseModel):
101
106
  fitness_cache: dict[str, float] = Field(default_factory=dict)
102
107
  # Synergy records
103
108
  synergy_records: list[dict] = Field(default_factory=list)
109
+ # Current generation batch (stored server-side so LLM just passes a cursor)
110
+ current_batch: list[BatchItem] = Field(default_factory=list)
111
+ batch_cursor: int = 0 # index of the next unprocessed item in current_batch
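Keeping `current_batch` and `batch_cursor` on the server means the agent only reports outcomes; the server hands out items by index. Below is a simplified mirror of `_next_item_or_select` plus the cursor bump that `evo_step` performs after every terminal outcome (recorded fitness, cache hit, policy violation). It uses plain dataclasses instead of the package's pydantic models, and all names here are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleItem:
    """Hypothetical stand-in for BatchItem (only the fields used here)."""
    branch: str
    target_id: str


@dataclass
class CursorState:
    """Hypothetical stand-in holding only the two fields added to EvolutionState."""
    current_batch: list[SimpleItem] = field(default_factory=list)
    batch_cursor: int = 0


def next_action(state: CursorState) -> dict:
    """Same branching as _next_item_or_select: next item, or 'select' when done."""
    if state.batch_cursor < len(state.current_batch):
        item = state.current_batch[state.batch_cursor]
        return {"action": "generate_code", "cursor": state.batch_cursor,
                "batch_size": len(state.current_batch), "branch": item.branch}
    return {"action": "select", "items_evaluated": len(state.current_batch)}


def finish_item(state: CursorState) -> dict:
    """Advance past the current item (as evo_step does after a recorded
    fitness, a cache hit, or a policy violation), then pick the next action."""
    state.batch_cursor += 1
    return next_action(state)
```

In the server code, `code_ready` looks its item up by branch name while `fitness_ready` only increments the cursor, so the design appears to assume the agent finishes items in the order they were handed out.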
@@ -11,6 +11,9 @@ dependencies = [
 [project.scripts]
 evo-engine = "server:main"
 
+[tool.setuptools]
+py-modules = ["server", "models", "selection"]
+
 [build-system]
-requires = ["hatchling"]
-build-backend = "hatchling.build"
+requires = ["setuptools>=61"]
+build-backend = "setuptools.build_meta"
@@ -9,9 +9,11 @@ The agent calls these tools; the LLM handles code generation and reflection.
 
 from __future__ import annotations
 
+import fnmatch
 import hashlib
 import json
 import os
+import subprocess
 from pathlib import Path
 
 from mcp.server.fastmcp import FastMCP
@@ -35,7 +37,7 @@ from selection import (
     update_temperatures,
 )
 
-mcp = FastMCP("evo-engine", description="U2E evolutionary algorithm bookkeeping")
+mcp = FastMCP("evo-engine", instructions="U2E evolutionary algorithm bookkeeping")
 
 # ---------------------------------------------------------------------------
 # State persistence
@@ -595,6 +597,318 @@ def evo_check_cache(code_hash: str) -> dict:
     return {"cached": False}
 
 
+# ---------------------------------------------------------------------------
+# evo_step — stateless loop driver
+# ---------------------------------------------------------------------------
+
+# Phase constants (passed as strings so they are readable in LLM output)
+_PHASE_BEGIN = "begin_generation"  # start a new generation
+_PHASE_CODE = "code_ready"  # LLM committed code for a branch
+_PHASE_FITNESS = "fitness_ready"  # LLM ran benchmark, has fitness value
+_PHASE_SELECT = "select"  # all items evaluated, run selection
+_PHASE_REFLECT = "reflect_done"  # LLM finished writing memory
+_PHASE_DONE = "done"  # budget exhausted
+
+
+def _policy_check(repo_path: str, branch: str, parent: str,
+                  protected_patterns: list[str],
+                  allowed_files: set[str]) -> tuple[bool, str]:
+    """Run git diff and check for policy violations.
+
+    Returns (approved: bool, reason: str).
+    """
+    result = subprocess.run(
+        ["git", "-C", repo_path, "diff", "--name-only", f"{parent}..{branch}"],
+        capture_output=True, text=True,
+    )
+    if result.returncode != 0:
+        return False, f"git diff failed: {result.stderr.strip()}"
+
+    changed = [f for f in result.stdout.strip().splitlines() if f]
+    for f in changed:
+        basename = os.path.basename(f)
+        for pat in protected_patterns:
+            if fnmatch.fnmatch(f, pat) or fnmatch.fnmatch(basename, pat):
+                return False, f"Protected file modified: {f!r} (pattern {pat!r})"
+        if allowed_files and f not in allowed_files:
+            return False, f"File outside optimization targets: {f!r}"
+    return True, ""
+
+
+@mcp.tool()
+def evo_step(phase: str, branch: str = "", parent_commit: str = "",
+             fitness: float = 0.0, success: bool = True,
+             operation: str = "", target_id: str = "",
+             parent_branches: list[str] | None = None,
+             code_hash: str = "", raw_output: str = "") -> dict:
+    """Stateless evolution loop driver.
+
+    Call this in a loop; each call returns the next action to perform.
+    The LLM decides whether to continue (stop when action=="done").
+
+    Phases and what to pass:
+      "begin_generation" — start (or resume) a generation; no extra args needed.
+      "code_ready"       — you committed code for `branch` (parent at
+                           `parent_commit`). Server runs policy check and
+                           returns action="run_benchmark" on pass, or the next
+                           action (generate_code / select) with policy_violation
+                           set if the branch was rejected.
+      "fitness_ready"    — you ran the benchmark; pass fitness / success /
+                           operation / target_id / parent_branches.
+                           Returns next generate_code or action="select".
+      "select"           — trigger survivor selection. Returns action="reflect"
+                           with keep/eliminate lists.
+      "reflect_done"     — you finished writing memory. Server starts next
+                           generation and returns generate_code or action="done".
+    """
+    state = _get_state()
+    pb = parent_branches or []
+
+    # ------------------------------------------------------------------ begin
+    if phase == _PHASE_BEGIN:
+        return _begin_generation_impl(state)
+
+    # ------------------------------------------------------------------ code_ready
+    if phase == _PHASE_CODE:
+        if not branch:
+            return {"error": "branch is required for phase 'code_ready'"}
+
+        # Find the batch item to get allowed files
+        item = next((it for it in state.current_batch if it.branch == branch), None)
+        allowed: set[str] = set()
+        if item and item.target_file:
+            allowed = {item.target_file}
+
+        # Resolve parent: prefer explicit parent_commit, fall back to parent_branches[0]
+        parent = parent_commit
+        if not parent and item and item.parent_branches:
+            r = subprocess.run(
+                ["git", "-C", state.config.repo_path, "rev-parse", item.parent_branches[0]],
+                capture_output=True, text=True,
+            )
+            parent = r.stdout.strip() if r.returncode == 0 else item.parent_branches[0]
+        if not parent:
+            return {"error": "Cannot determine parent commit for policy check. "
+                             "Pass parent_commit= explicitly."}
+
+        approved, reason = _policy_check(
+            repo_path=state.config.repo_path,
+            branch=branch,
+            parent=parent,
+            protected_patterns=state.config.protected_patterns,
+            allowed_files=allowed,
+        )
+
+        if not approved:
+            ind = Individual(
+                branch=branch,
+                generation=state.generation,
+                target_id=item.target_id if item else "",
+                operation=item.operation if item else Operation.MUTATE,
+                parent_branches=item.parent_branches if item else [],
+                fitness=None,
+                success=False,
+                raw_output=f"policy_violation: {reason}",
+            )
+            state.individuals[branch] = ind
+            state.batch_cursor += 1
+            _save()
+            next_step = _next_item_or_select(state)
+            next_step["policy_violation"] = {"branch": branch, "reason": reason}
+            return next_step
+
+        return {
+            "action": "run_benchmark",
+            "branch": branch,
+            "target_id": item.target_id if item else "",
+            "operation": item.operation.value if item else "",
+            "parent_branches": item.parent_branches if item else [],
+        }
+
+    # ------------------------------------------------------------------ fitness_ready
+    if phase == _PHASE_FITNESS:
+        # Cache check: skip recording if this code was already evaluated
+        if code_hash and code_hash in state.fitness_cache:
+            cached = state.fitness_cache[code_hash]
+            state.batch_cursor += 1
+            _save()
+            next_step = _next_item_or_select(state)
+            next_step["cached"] = True
+            next_step["cached_fitness"] = cached
+            return next_step
+
+        is_min = state.config.objective == Objective.MIN
+        ind = Individual(
+            branch=branch,
+            generation=state.generation,
+            target_id=target_id,
+            operation=Operation(operation) if operation else Operation.MUTATE,
+            parent_branches=pb,
+            fitness=fitness,
+            success=success,
+            code_hash=code_hash,
+            raw_output=raw_output[:500] if raw_output else None,
+        )
+        state.individuals[branch] = ind
+        state.total_evals += 1
+
+        if code_hash:
+            state.fitness_cache[code_hash] = fitness
+
+        if target_id not in state.active_branches:
+            state.active_branches[target_id] = []
+        if success:
+            state.active_branches[target_id].append(branch)
+
+        if success and target_id in state.targets:
+            target = state.targets[target_id]
+            improved = (
+                target.current_best_obj is None
+                or (is_min and fitness < target.current_best_obj)
+                or (not is_min and fitness > target.current_best_obj)
+            )
+            if improved:
+                target.current_best_obj = fitness
+                target.current_best_branch = branch
+                target.stagnation_count = 0
+
+        if success:
+            if state.best_obj_overall is None:
+                state.best_obj_overall = fitness
+                state.best_branch_overall = branch
+            elif is_min and fitness < state.best_obj_overall:
+                state.best_obj_overall = fitness
+                state.best_branch_overall = branch
+            elif not is_min and fitness > state.best_obj_overall:
+                state.best_obj_overall = fitness
+                state.best_branch_overall = branch
+
+        state.batch_cursor += 1
+        _save()
+        result = _next_item_or_select(state)
+        result["recorded_fitness"] = fitness
+        result["is_new_best"] = branch == state.best_branch_overall
+        return result
+
+    # ------------------------------------------------------------------ select
+    if phase == _PHASE_SELECT:
+        result = evo_select_survivors()
+        result["action"] = "reflect"
+        return result
+
+    # ------------------------------------------------------------------ reflect_done
+    if phase == _PHASE_REFLECT:
+        budget_remaining = state.config.max_fe - state.total_evals
+        if budget_remaining <= 0:
+            return {"action": _PHASE_DONE, "reason": "budget exhausted",
+                    "total_evals": state.total_evals, "best_obj": state.best_obj_overall}
+        return _begin_generation_impl(state)
+
+    return {"error": f"Unknown phase: {phase!r}. Valid phases: "
+                     f"{_PHASE_BEGIN}, {_PHASE_CODE}, {_PHASE_FITNESS}, {_PHASE_SELECT}, {_PHASE_REFLECT}"}
+
+
+def _begin_generation_impl(state: EvolutionState) -> dict:
+    """Plan and store the next generation batch; return first generate_code action."""
+    budget_remaining = state.config.max_fe - state.total_evals
+    if budget_remaining <= 0:
+        return {"action": _PHASE_DONE, "reason": "budget exhausted",
+                "total_evals": state.total_evals}
+
+    is_min = state.config.objective == Objective.MIN
+    plan = plan_generation(
+        targets=state.targets,
+        pop_size=state.config.pop_size,
+        mutation_rate=state.config.mutation_rate,
+        budget_remaining=budget_remaining,
+        synergy_interval=state.config.synergy_interval,
+        generation=state.generation,
+        is_minimize=is_min,
+    )
+
+    batch: list[BatchItem] = []
+    var_counter: dict[str, int] = {}
+    for item in plan:
+        tid = item["target_id"]
+        op = item["operation"]
+        count = item["count"]
+        for _ in range(count):
+            key = f"{tid}/{op.value}"
+            idx = var_counter.get(key, 0)
+            var_counter[key] = idx + 1
+            if op == Operation.SYNERGY:
+                b = f"gen-{state.generation}/synergy/{tid}-{idx}"
+                parts = tid.split("+")
+                parents_list = [
+                    state.targets[p].current_best_branch
+                    for p in parts
+                    if p in state.targets and state.targets[p].current_best_branch
+                ]
+                batch.append(BatchItem(branch=b, operation=op, target_id=tid,
+                                       parent_branches=parents_list,
+                                       target_file="", target_function=""))
+            else:
+                target = state.targets[tid]
+                b = f"gen-{state.generation}/{tid}/{op.value}-{idx}"
+                if op == Operation.CROSSOVER:
+                    active = state.active_branches.get(tid, [])
+                    active_inds = [
+                        state.individuals[br] for br in active
+                        if br in state.individuals and state.individuals[br].success
+                    ]
+                    pairs = random_select(active_inds, 1, is_minimize=is_min)
+                    if pairs:
+                        parents_list = [pairs[0][0].branch, pairs[0][1].branch]
+                    elif target.current_best_branch:
+                        parents_list = [target.current_best_branch]
+                    else:
+                        parents_list = [state.seed_branch]
+                else:
+                    parents_list = (
+                        [target.current_best_branch] if target.current_best_branch
+                        else [state.seed_branch]
+                    )
+                batch.append(BatchItem(branch=b, operation=op, target_id=tid,
+                                       parent_branches=parents_list,
+                                       target_file=target.file,
+                                       target_function=target.function))
+
+    state.current_batch = batch
+    state.batch_cursor = 0
+    _save()
+
+    if not batch:
+        return {"action": _PHASE_DONE, "reason": "empty batch",
+                "total_evals": state.total_evals}
+
+    first = batch[0]
+    return {
+        "action": "generate_code",
+        "generation": state.generation,
+        "batch_size": len(batch),
+        "cursor": 0,
+        "item": first.model_dump(),
+    }
+
+
+def _next_item_or_select(state: EvolutionState) -> dict:
+    """Return next generate_code action or trigger select if batch is done."""
+    if state.batch_cursor < len(state.current_batch):
+        item = state.current_batch[state.batch_cursor]
+        return {
+            "action": "generate_code",
+            "generation": state.generation,
+            "cursor": state.batch_cursor,
+            "batch_size": len(state.current_batch),
+            "item": item.model_dump(),
+        }
+    return {
+        "action": "select",
+        "generation": state.generation,
+        "items_evaluated": len(state.current_batch),
+    }
+
+
 # ---------------------------------------------------------------------------
 # Helpers
 # ---------------------------------------------------------------------------
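The `fitness_ready` branch applies one improvement rule to both the per-target best and the overall best. Pulled out as pure functions (hypothetical names, not part of the package), the rule and its tie-breaking become easy to test:

```python
from typing import Iterable, Optional


def improves(candidate: float, incumbent: Optional[float], minimize: bool) -> bool:
    """True if `candidate` should replace `incumbent` as the best objective.

    Same rule as the fitness_ready branch: no incumbent always counts as an
    improvement, and ties never do.
    """
    if incumbent is None:
        return True
    return candidate < incumbent if minimize else candidate > incumbent


def track_best(results: Iterable[tuple[str, float, bool]],
               minimize: bool) -> tuple[Optional[float], Optional[str]]:
    """Fold (branch, fitness, success) records into (best_obj, best_branch).

    Failed runs are skipped, as in evo_step, even though their fitness is stored.
    """
    best_obj: Optional[float] = None
    best_branch: Optional[str] = None
    for branch, fitness, success in results:
        if success and improves(fitness, best_obj, minimize):
            best_obj, best_branch = fitness, branch
    return best_obj, best_branch
```

Ties keep the incumbent. Failed runs never become best, although `evo_step` still records them as individuals and counts them toward `total_evals`.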
@@ -29,8 +29,14 @@ User provides: repo path, benchmark command, objective (min/max), and optionally
    - `exec mkdir -p <repo>/memory/global`
    - For each target: `exec mkdir -p <repo>/memory/targets/<id>/short_term`
 
-5. Enter evolution loop — follow the protocol in AGENTS.md:
-   - Call `evo_next_batch` → execute each operation → `evo_report_fitness` → `evo_select_survivors` → reflect → repeat
+5. Enter evolution loop using `evo_step` — follow the Core Loop in AGENTS.md:
+   - Start with `evo_step("begin_generation")`
+   - Each call returns `{action, ...data}`; execute the action, then call `evo_step` again
+   - **Policy check is automatic**: calling `evo_step("code_ready", branch=..., parent_commit=...)`
+     triggers a server-side git diff; the server returns `action="run_benchmark"` (pass)
+     or the next `generate_code`/`select` action with `policy_violation` set (violation,
+     already recorded — no benchmark needed)
+   - Stop when `action == "done"` or when you judge the results are sufficient
 
 6. Report progress to user after each generation.
 