@evo-hq/pi-evo 0.5.0-alpha.5 → 0.5.0-alpha.7

package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
    "name": "@evo-hq/pi-evo",
-   "version": "0.5.0-alpha.5",
+   "version": "0.5.0-alpha.7",
    "description": "Evo plugin for pi-coding-agent: optimize/discover/subagent skills + mid-run inject extension.",
    "publishConfig": {
      "access": "public"
@@ -2,13 +2,87 @@
  name: discover
  description: Initialize evo for the current repository by exploring the codebase, proposing unexplored optimization dimensions, constructing the benchmark inside a baseline worktree, and running the first experiment. Use when the user invokes /evo:discover, mentions setting up evo, wants to instrument a codebase for autonomous optimization, or asks to start a new evo run on a project.
  argument-hint: <optional context about what to optimize>
- evo_version: 0.5.0-alpha.5
+ evo_version: 0.5.0-alpha.7
  ---

  # Discover

  Internal procedure for `evo:discover`. The user only sees the user-facing prompts, the dashboard URL, and the baseline score -- everything else is the agent's choreography.

+ ## Evo surface
+
+ What you can invoke / dispatch / read. Each line is a triggering condition: if you're about to do X, pull/dispatch/read this. Don't preload -- act when the trigger fires.
+
+ ```
+ evo plugin
+
+ ├── Main thread (the orchestrator -- you, inside /evo:discover or /evo:optimize)
+ │   │
+ │   ├── Skills (Skill tool)
+ │   │   ├── evo:discover                  starting a new evo workspace / instrumenting a project
+ │   │   ├── evo:optimize                  after discover commits the baseline -- drives the loop.
+ │   │   │                                 Args: subagents=N (read sizing-the-round FIRST),
+ │   │   │                                       autonomous, subagents-only, budget=N, stall=N
+ │   │   ├── evo:finetuning                task is finetuning / post-training / training a model
+ │   │   └── evo:infra-setup               need a remote backend, pooled workspaces, lease/slot
+ │   │                                     management, or specific provider auth/setup
+ │   │
+ │   └── Subagents to dispatch (Task tool, subagent_type=...)
+ │       ├── evo:benchmark-reviewer        before the baseline run, or whenever the
+ │       │                                 benchmark command / harness changes
+ │       └── evo:ideator                   stalled, or every ~5 committed experiments.
+ │                                         One subagent per brief:
+ │                                         failure_analysis, literature, frontier_extrapolation
+
+ ├── Subagent thread (each subagent spawned by /optimize step 5)
+ │   │
+ │   ├── Skills (the subagent loads this on first turn -- the brief's first
+ │   │          sentence mandates it; not auto-loaded by the host)
+ │   │   └── evo:subagent                  load FIRST -- defines the iteration protocol
+ │   │                                     + brief field shape the subagent operates under
+ │   │
+ │   └── Subagents to dispatch (Task tool, subagent_type=...)
+ │       └── evo:verifier                  ALWAYS dispatch pre AND post every evo run.
+ │                                         Pre: ~30s static analysis before the experiment runs.
+ │                                         Post: result-validity audit after it commits.
+ │                                         Not optional. Not ad-hoc.
+
+ └── Key references (Read tool, on demand)
+     ├── discover/references/
+     │   ├── constructing-benchmark.md     designing + assembling a benchmark from scratch
+     │   ├── sdk_python.py / sdk_node.js   wiring per-task instrumentation -- preferred path
+     │   ├── inline_instrumentation.py     inline fallback when SDK can't be used.
+     │   │                                 Copy as-is; do not reimplement (file header
+     │   │                                 explains why)
+     │   ├── sizing-the-round.md           BEFORE invoking /evo:optimize with any
+     │   │                                 specific subagents=N. Single-GPU /
+     │   │                                 single-exclusive-resource -> subagents=1
+     │   ├── proposing-dimensions.md       choosing what to optimize when not obvious
+     │   └── instrumentation-contract.md   the format evo reads (result + traces shapes)
+     │
+     ├── finetuning/references/
+     │   ├── glue.md                       writing train.py -- I/O contract evo expects
+     │   ├── diagnostics.md                per-failure-mode diagnostics
+     │   ├── false-progress.md             what doesn't count as improvement
+     │   ├── trace-schema.md               per-task trace JSON schema for training runs
+     │   ├── rl/                           RL framework references
+     │   │   └── art.md                    ART (Algorithm-Refined Training)
+     │   ├── sft/                          SFT framework references
+     │   │   └── tinker.md                 Tinker SFT
+     │   └── serving/                      eval-time inference references
+     │       └── vllm.md                   vLLM serving config + LoRA-multi
+     │
+     ├── infra-setup/references/
+     │   └── provider-matrix.md            provider/backend summary (auth, setup, costs)
+     │
+     └── references/ (shared across skills)
+         ├── evo-wait.md                   any time you need to wait without burning
+         │                                 context (subagent completion, training,
+         │                                 ideators, GPU activity, any long-running)
+         ├── agent-sdk-reference.md        SDK API surface
+         └── cli-quick-reference.md        CLI subcommand cheat sheet
+ ```
+
  ## Host conventions

  This skill runs on any host that implements the Agent Skills spec. When the body uses generic phrases, apply the host's best-fit equivalent:
@@ -40,20 +114,20 @@ evo --version
  The output must be exactly:

  ```
- evo-hq-cli 0.5.0-alpha.5
+ evo-hq-cli 0.5.0-alpha.7
  ```

  Three outcomes:

  1. **Matches exactly** — continue to step 1.
  2. **Reports a different version** (`evo-hq-cli 0.4.2`, etc.) — the host refetched a newer/older skill bundle than the CLI on PATH. Drift breaks skills silently. Stop and tell the user:
- > Your installed evo CLI is on a different version than this skill (`0.5.0-alpha.5`). Run:
+ > Your installed evo CLI is on a different version than this skill (`0.5.0-alpha.7`). Run:
  > ```
- > uv tool install --force evo-hq-cli==0.5.0-alpha.5
+ > uv tool install --force evo-hq-cli==0.5.0-alpha.7
  > ```
  > Then re-invoke this skill.
  3. **`command not found`, or reports a different package** (commonly `evo 1.x` — the unrelated SLAM tool) — the CLI isn't installed. Tell the user:
- > `evo-hq-cli` isn't on your PATH. Install it: `uv tool install evo-hq-cli==0.5.0-alpha.5` (or `pipx install evo-hq-cli==0.5.0-alpha.5`). Then re-invoke this skill.
+ > `evo-hq-cli` isn't on your PATH. Install it: `uv tool install evo-hq-cli==0.5.0-alpha.7` (or `pipx install evo-hq-cli==0.5.0-alpha.7`). Then re-invoke this skill.

  Do not try to auto-install. Host sandbox + network policy may block it; leaving the install as a user action keeps failure modes clear.
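
For illustration only, the three outcomes in the hunk above could be told apart with a small Python helper. This is a sketch under stated assumptions: the name `classify_version_output` and its return labels are hypothetical and not part of evo, and the caller is assumed to pass the captured `evo --version` output, or `None` when the shell reported `command not found`.

```python
import re

# Version pinned by the alpha.7 skill bundle shown in this diff.
EXPECTED_VERSION = "0.5.0-alpha.7"
EXPECTED_LINE = f"evo-hq-cli {EXPECTED_VERSION}"


def classify_version_output(output):
    """Map `evo --version` output onto the three documented outcomes.

    Hypothetical helper. Returns "match", "version_drift", or "not_installed".
    """
    if output is None:
        # The command was not found on PATH.
        return "not_installed"
    line = output.strip()  # tolerate the trailing newline of captured stdout
    if line == EXPECTED_LINE:
        return "match"
    if re.fullmatch(r"evo-hq-cli \S+", line):
        # Right package, wrong version: skill/CLI drift.
        return "version_drift"
    # Some other package answered, e.g. the unrelated `evo 1.x` tool.
    return "not_installed"
```

In both non-matching cases the skill text asks the agent to stop and tell the user what to run, so a helper like this would only choose the message, never install anything.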
 
@@ -1,7 +1,7 @@
  ---
  name: infra-setup
  description: Non-user-invocable provider/setup reference for evo backend switching, prerequisite checks, and auth/install guidance.
- evo_version: 0.5.0-alpha.5
+ evo_version: 0.5.0-alpha.7
  ---

  # Infra Setup
@@ -2,13 +2,47 @@
  name: optimize
  description: Run the evo optimization loop with parallel subagents until interrupted.
  argument-hint: "[subagents=N] [budget=N] [stall=N]"
- evo_version: 0.5.0-alpha.5
+ evo_version: 0.5.0-alpha.7
  ---

  Run the `evo` optimization loop. Each round, the orchestrator writes structured briefs and spawns subagents that execute within them. Each subagent is semi-autonomous: it reads the pointer traces, forms the concrete edit, runs experiments, and can iterate within its branch. Runs until interrupted or the stall limit is reached.

  **This skill is the canonical loop for ALL post-discover work — including serial workloads.** If the workspace's resource profile forces width 1 (single GPU, single-process benchmark, etc.), you still invoke `/evo:optimize` -- just pass `subagents=1`. The loop's value is the STRUCTURE around each experiment (scan-subagent cross-cutting analysis between rounds, verifier pre/post hooks via the subagent skill, ideator spawning on stall, frontier reconciliation, stop-hook discipline), NOT just parallelism. Bypassing optimize because "I'm running serial work anyway" loses every piece of that structure -- you've reverted to ad-hoc experiment iteration with none of evo's loop benefits, just the bookkeeping.

+ ## Evo surface -- loop-relevant
+
+ You're inside `/evo:optimize`. Things you'll pull/dispatch during the loop:
+
+ ```
+ main thread (you)
+ ├── Skills (Skill tool)
+ │   └── evo:finetuning            before writing or changing any train.py
+
+ └── Subagents to dispatch (Task tool, subagent_type=...)
+     └── evo:ideator               stalled, or every ~5 committed experiments.
+                                   One subagent per brief:
+                                   failure_analysis, literature, frontier_extrapolation
+
+ subagent thread (each subagent spawned by step 5)
+ ├── evo:subagent skill            loaded by the subagent on first turn -- the brief's
+ │                                 first sentence mandates it (not auto-loaded)
+ └── evo:verifier subagent         MANDATORY pre AND post every evo run.
+                                   Pre: ~30s static analysis before the experiment runs.
+                                   Post: result-validity audit after it commits.
+
+ references (Read tool, on demand)
+ ├── discover/references/sizing-the-round.md       pick subagents=N
+ ├── references/evo-wait.md                        waiting without burning context
+ ├── finetuning/references/glue.md                 train.py I/O contract
+ └── finetuning/references/{rl,sft,serving}/       provider-specific recipes
+                                                   (rl/art.md, sft/tinker.md,
+                                                   serving/vllm.md)
+ ```
+
+ Full surface tree (orchestrator entry-point view, including benchmark-reviewer,
+ infra-setup, and the complete references catalogue) lives in `evo:discover`'s
+ "Evo surface" section.
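
The `evo:ideator` trigger listed in the added section ("stalled, or every ~5 committed experiments", one subagent per brief) can be read as a small predicate. The sketch below is not evo's implementation: `ideator_briefs_due` is a hypothetical name, the caller is assumed to decide what "stalled" means, and "~5" is approximated as exact multiples of 5.

```python
# Brief names taken from the surface tree in the diff.
IDEATOR_BRIEFS = ("failure_analysis", "literature", "frontier_extrapolation")


def ideator_briefs_due(committed_experiments, stalled, every=5):
    """Return the ideator briefs to dispatch now, one subagent per brief.

    Hypothetical helper: `stalled` is supplied by the caller, and the
    "every ~5 committed experiments" cadence is simplified to `every`.
    """
    periodic = committed_experiments > 0 and committed_experiments % every == 0
    if stalled or periodic:
        return list(IDEATOR_BRIEFS)
    return []
```

A caller would spawn one ideator subagent per returned brief and do nothing when the list is empty.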
+
  ## Host conventions

  This skill runs on any host that implements the Agent Skills spec. When the body uses generic phrases, apply the host's best-fit equivalent:
@@ -38,7 +72,7 @@ A user can override any of these with `/optimize [subagents=N] [budget=N] [stall

  **Picking `subagents` and `budget` is load-bearing -- do not skim.**

- Mandatory before the first round (and again any time the backend or benchmark changes): **READ `plugins/evo/skills/optimize/references/sizing-the-round.md` IN FULL.** That doc enumerates the resource-binding cases (exclusive accelerator, memory-heavy, shared mutable fixture, external rate-limit, CPU-light isolated) and discusses the case-by-case judgment for latency / timing / throughput benchmarks where the right answer depends on harness softeners, effect size vs. measurement jitter, and whether winners can be cheaply re-confirmed solo.
+ Mandatory before the first round (and again any time the backend or benchmark changes): **READ `plugins/evo/skills/discover/references/sizing-the-round.md` IN FULL.** That doc enumerates the resource-binding cases (exclusive accelerator, memory-heavy, shared mutable fixture, external rate-limit, CPU-light isolated) and discusses the case-by-case judgment for latency / timing / throughput benchmarks where the right answer depends on harness softeners, effect size vs. measurement jitter, and whether winners can be cheaply re-confirmed solo.

  Under-subscribing wastes wall-clock. Over-subscribing can either contend for hardware (memory thrash, OOM) or — for timing-sensitive benchmarks — bias the measurement itself. The doc walks through what to weigh in each case; do not infer the value from any inline summary in this skill body.
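
Once the sizing doc has been read and values chosen, they reach the skill as overrides of the form `subagents=N budget=N stall=N`. A minimal sketch of parsing such a string follows. It assumes the integer keys from the `argument-hint` and the bare flags `autonomous` and `subagents-only` from the discover surface tree; `parse_optimize_args` is a hypothetical name, and nothing here chooses a value on the user's behalf.

```python
# Keys from the skill's argument-hint, and bare flags from the surface tree.
INT_KEYS = {"subagents", "budget", "stall"}
FLAG_KEYS = {"autonomous", "subagents-only"}


def parse_optimize_args(arg_string):
    """Parse a whitespace-separated override string into a dict.

    Hypothetical helper, not evo's parser. Unknown tokens raise ValueError
    so that a typo is never silently ignored.
    """
    overrides = {}
    for token in arg_string.split():
        key, sep, value = token.partition("=")
        if sep and key in INT_KEYS:
            overrides[key] = int(value)  # ValueError on non-integer input
        elif not sep and key in FLAG_KEYS:
            overrides[key] = True
        else:
            raise ValueError(f"unrecognized argument: {token!r}")
    return overrides
```

For a single-GPU workspace the skill text says to pass `subagents=1`; with this sketch, `parse_optimize_args("subagents=1")` would yield `{"subagents": 1}`.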
 
@@ -1,7 +1,7 @@
  ---
  name: report
  description: Print the dashboard's dot chart (score over experiment order, status colors, best-path stair) inline in the terminal for every run in the workspace. Use when the user invokes /evo:report, asks for a quick score chart without opening the dashboard, or wants the scatter plot in chat output.
- evo_version: 0.5.0-alpha.5
+ evo_version: 0.5.0-alpha.7
  ---

  # Report
@@ -1,11 +1,52 @@
  ---
  name: subagent
- description: Internal protocol for evo optimization subagents. Loaded by subagents spawned from /optimize via their host's skill loader. Not for orchestrator use.
- evo_version: 0.5.0-alpha.5
+ description: Protocol that evo optimization subagents follow when dispatched from /optimize. Auto-loaded by spawned subagents via their host's skill loader. The orchestrator may also invoke this skill to understand the brief shape its dispatched subagents expect + what they're required to emit -- useful when writing briefs or debugging a subagent's behavior.
+ evo_version: 0.5.0-alpha.7
  ---

  # Evo Subagent Protocol

+ **Orchestrators reading for context**: this is the protocol your dispatched subagents follow. You don't act on it yourself -- write briefs that satisfy the four required fields described below, and rely on each spawned subagent to drive the loop on its end. Stop reading at "Host conventions" if you only need the brief shape; the rest is for the subagent.
+
+ ## Evo surface -- subagent perspective
+
+ What you can pull/dispatch/read as a subagent. Each line is a triggering condition.
+
+ ```
+ skills you may pull (Skill tool)
+ └── evo:finetuning                before writing or changing any train.py
+
+ subagents you dispatch (Task tool, subagent_type=...)
+ └── evo:verifier                  MANDATORY pre AND post every evo run.
+                                   Pre: ~30s static analysis BEFORE the experiment runs
+                                        (block on failure -- fix and retry).
+                                   Post: result-validity audit AFTER it commits.
+
+ references (Read tool, on demand)
+ ├── discover/references/
+ │   ├── sdk_python.py / sdk_node.js       wiring per-task instrumentation -- preferred
+ │   ├── inline_instrumentation.py         inline fallback. Copy as-is; do not reimplement
+ │   └── instrumentation-contract.md       the format evo reads (result + traces shapes)
+
+ ├── references/evo-wait.md                any time you need to wait -- training, eval,
+ │                                         any long-running condition. Use this instead
+ │                                         of `sleep N`; doesn't burn context.
+
+ └── finetuning/references/
+     ├── glue.md                           train.py I/O contract evo expects
+     ├── diagnostics.md                    per-failure-mode diagnostics
+     ├── false-progress.md                 what doesn't count as improvement
+     ├── trace-schema.md                   per-task trace JSON schema
+     ├── rl/art.md                         ART (Algorithm-Refined Training)
+     ├── sft/tinker.md                     Tinker SFT
+     └── serving/vllm.md                   vLLM serving config + LoRA-multi
+ ```
+
+ Orchestrator entry-point view (benchmark-reviewer, ideator, infra-setup, full
+ references catalogue) lives in `evo:discover`'s "Evo surface" section.
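
The verifier rule in the added section (pre-check that blocks on failure, then the run, then a post-run audit) describes an ordering that a short sketch can make concrete. Everything named here is an assumption for illustration: `run_with_verifier`, its callable parameters, and the `max_pre_checks` cap are hypothetical and do not describe how evo or the host's Task tool actually dispatch `evo:verifier`.

```python
def run_with_verifier(pre_check, fix, run_experiment, post_audit, max_pre_checks=3):
    """Run one experiment bracketed by pre and post verification.

    Hypothetical signature:
      pre_check()      -> bool  static analysis before the run
      fix()            -> None  repair whatever the pre-check flagged
      run_experiment() -> any   the experiment itself
      post_audit(r)    -> any   result-validity audit after the commit
    """
    checks = 0
    # Block on a failing pre-check: fix and retry, up to a cap (an assumption).
    while not pre_check():
        checks += 1
        if checks >= max_pre_checks:
            raise RuntimeError("pre-run verification still failing")
        fix()
    result = run_experiment()
    verdict = post_audit(result)  # always audited, never skipped
    return result, verdict
```

The point of the shape is that `run_experiment` is unreachable without a passing pre-check and that every result is paired with an audit verdict.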
+
+ ---
+
  You are an evo optimization subagent. The orchestrator has given you a **brief** with four fields:

  - **Objective** -- the bottleneck to attack and evidence for it (strategic, not edit-level)