@evo-hq/pi-evo 0.5.0-alpha.12 → 0.5.0-alpha.13
package/package.json
CHANGED
package/skills/discover/SKILL.md
CHANGED
@@ -2,7 +2,7 @@
 name: discover
 description: Initialize evo for the current repository by exploring the codebase, proposing unexplored optimization dimensions, constructing the benchmark inside a baseline worktree, and running the first experiment. Use when the user invokes /evo:discover, mentions setting up evo, wants to instrument a codebase for autonomous optimization, or asks to start a new evo run on a project.
 argument-hint: <optional context about what to optimize>
-evo_version: 0.5.0-alpha.
+evo_version: 0.5.0-alpha.13
 ---
 
 # Discover
@@ -116,20 +116,20 @@ evo --version
|
|
|
116
116
|
The output must be exactly:
|
|
117
117
|
|
|
118
118
|
```
|
|
119
|
-
evo-hq-cli 0.5.0-alpha.
|
|
119
|
+
evo-hq-cli 0.5.0-alpha.13
|
|
120
120
|
```
|
|
121
121
|
|
|
122
122
|
Three outcomes:
|
|
123
123
|
|
|
124
124
|
1. **Matches exactly** — continue to step 1.
|
|
125
125
|
2. **Reports a different version** (`evo-hq-cli 0.4.2`, etc.) — the host refetched a newer/older skill bundle than the CLI on PATH. Drift breaks skills silently. Stop and tell the user:
|
|
126
|
-
> Your installed evo CLI is on a different version than this skill (`0.5.0-alpha.
|
|
126
|
+
> Your installed evo CLI is on a different version than this skill (`0.5.0-alpha.13`). Run:
|
|
127
127
|
> ```
|
|
128
|
-
> uv tool install --force evo-hq-cli==0.5.0-alpha.
|
|
128
|
+
> uv tool install --force evo-hq-cli==0.5.0-alpha.13
|
|
129
129
|
> ```
|
|
130
130
|
> Then re-invoke this skill.
|
|
131
131
|
3. **`command not found`, or reports a different package** (commonly `evo 1.x` — the unrelated SLAM tool) — the CLI isn't installed. Tell the user:
|
|
132
|
-
> `evo-hq-cli` isn't on your PATH. Install it: `uv tool install evo-hq-cli==0.5.0-alpha.
|
|
132
|
+
> `evo-hq-cli` isn't on your PATH. Install it: `uv tool install evo-hq-cli==0.5.0-alpha.13` (or `pipx install evo-hq-cli==0.5.0-alpha.13`). Then re-invoke this skill.
|
|
133
133
|
|
|
134
134
|
Do not try to auto-install. Host sandbox + network policy may block it; leaving the install as a user action keeps failure modes clear.
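The three outcomes above can be sketched as a small classifier. This is a minimal illustration, not part of the skill: `classifyVersionOutput` and the outcome labels are hypothetical names, and the only facts taken from the diff are the expected string and the three cases.

```javascript
// Hypothetical helper (not part of evo): classify the text printed by `evo --version`.
const EXPECTED = 'evo-hq-cli 0.5.0-alpha.13'

function classifyVersionOutput(output) {
  const line = (output || '').trim()
  if (line === EXPECTED) return 'match'               // outcome 1: continue to step 1
  if (line.startsWith('evo-hq-cli ')) return 'drift'  // outcome 2: right package, wrong version
  return 'missing'                                    // outcome 3: not found, or another package (e.g. `evo 1.x`)
}
```

Under this sketch, `drift` and `missing` both end with a message to the user; neither attempts an install.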
|
|
135
135
|
|
package/skills/optimize/SKILL.md
CHANGED
@@ -2,7 +2,7 @@
 name: optimize
 description: Drive structured autoresearch iteration after evo:discover and the baseline commit -- scan-subagent cross-cutting analysis between rounds, frontier-based parent selection, ideator dispatch on stall, verifier pre/post hooks, annotation discipline. Width is set via subagents=N (1 for serial workloads, larger for parallel); the loop's structural value applies at any width.
 argument-hint: "[subagents=N] [budget=N] [stall=N]"
-evo_version: 0.5.0-alpha.
+evo_version: 0.5.0-alpha.13
 ---
 
 Run the `evo` optimization loop. Each round, the orchestrator writes structured briefs and spawns subagents that execute within them. Each subagent is semi-autonomous: it reads the pointer traces, forms the concrete edit, runs experiments, and can iterate within its branch. Runs until interrupted or the stall limit is reached.
@@ -102,17 +102,19 @@ evo defaults get subagents-only --json
|
|
|
102
102
|
|
|
103
103
|
As your **very first actions, before the loop**, resolve each and arm it: run `evo autonomous on` / `evo subagents-only on` when it resolves on, or `evo autonomous off` / `evo subagents-only off` when an explicit instruction or stored default turned it off. If a behavior resolves off — whether from the user's instruction this run or a stored default — say so in your opening message (e.g. "autonomous off — running one round at a time, as you asked") so it's never invisible.
|
|
104
104
|
|
|
105
|
-
**Orchestrator driver
|
|
105
|
+
**Orchestrator driver.** evo drives the loop two ways: a deterministic **dynamic workflow** (Claude Code only) or the **prose loop** below (every host). **On Claude Code the workflow is the DEFAULT — use it whenever it's available.** Resolve which as part of your very first actions:
|
|
106
106
|
|
|
107
|
-
1. `evo host show` —
|
|
108
|
-
2. `evo config get default-orchestrator` — `
|
|
107
|
+
1. `evo host show` — the workflow driver requires `claude-code`. If it prints `<not set>` (a pre-host workspace), determine your actual runtime from your own context (system prompt, env such as `CLAUDECODE=1`, self-identity): **only if you are genuinely Claude Code**, do the one-time host migration now (`evo host set claude-code`) and continue; if you are any other runtime, do NOT stamp the host here — leave it for Step 0.1 and use the prose loop.
|
|
108
|
+
2. `evo config get default-orchestrator` — `prose` is an explicit **opt-out** (honor it: use the prose loop). `workflow` **or unset** resolves to the workflow driver on Claude Code. An explicit user instruction this run still wins.
|
|
109
109
|
|
|
110
|
-
|
|
110
|
+
**Use the workflow** when host is `claude-code`, the value is not explicitly `prose`, AND the **Workflow tool is actually present in your available tools this session** — this is the default path, not opt-in. The availability check is load-bearing: **older Claude Code builds do not ship the Workflow tool**, so verify it's really in your toolset; do not assume it exists from the host alone. When (and only when) you will actually launch it, FIRST persist the choice so the rest of evo agrees (`evo config get` reflects it, and the autonomous stop-nudge auto-suppresses under the workflow): run `evo config set default-orchestrator workflow`. Then launch it once — do NOT drive the loop turn-by-turn:
|
|
111
111
|
|
|
112
112
|
- Call the **Workflow** tool with `scriptPath: ${CLAUDE_PLUGIN_ROOT}/skills/optimize/workflows/evo-optimize.js` and `args: {pluginRoot: "${CLAUDE_PLUGIN_ROOT}", subagents: <N>, budget: <N>, stall: <N>}`, using the round sizing you resolved above. **Pass all four keys explicitly — never omit one.** For `stall`, use the user's `/optimize stall=N` override if given, else the default 5. (The workflow's stop condition is the stall limit, so a dropped `stall` silently reverts it to 5.)
|
|
113
|
-
- Report the returned `runId` and tell the user to watch progress with `/workflows`. The workflow runs the round loop itself (orient → mandatory scan + cross-history axis check → ideators on stall/periodic → briefs → fan-out + verify → collect → frontier-select → stall); you do **not** execute "The Loop" section below.
|
|
113
|
+
- Report the returned `runId` and tell the user to watch progress with `/workflows`. The workflow runs the round loop itself (orient → mandatory scan + cross-history axis check → ideators on stall/periodic → briefs → fan-out + verify → collect → frontier-select → stall) plus the concurrent meta controller; you do **not** execute "The Loop" section below, and you do **not** need autonomous mode (the workflow self-drives; its stall limit is the stop).
|
|
114
114
|
|
|
115
|
-
|
|
115
|
+
Use **The Loop** below only when the workflow can't drive: host is not `claude-code`, `default-orchestrator` is explicitly `prose`, or the Workflow tool is unavailable (e.g. an older Claude Code build). The workflow is only an execution strategy over the same `evo` CLI; gates, frontier, dashboard, and recovery are identical either way.
|
|
116
|
+
|
|
117
|
+
**Reconcile config when you fall back to prose.** The stop-nudge that drives the prose loop is auto-suppressed whenever `default-orchestrator` is `workflow`. So if you fall back to the prose loop on Claude Code because the Workflow tool isn't available (older build) while `default-orchestrator` is still `workflow` from a prior run, you MUST set it back — `evo config set default-orchestrator prose` — and arm autonomous as usual. Otherwise the prose loop's stop-nudge stays suppressed and the run stalls after one round. Invariant to preserve: `default-orchestrator=workflow` in config iff the workflow is actually the driver this run.
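The driver choice and the config invariant described above can be read as one decision function. The sketch below is an assumption-laden illustration: `resolveDriver` and its argument names are hypothetical, the inputs stand in for the results of `evo host show`, `evo config get default-orchestrator`, and a check of the session toolset, and an explicit user instruction (which still wins) is left out.

```javascript
// Hypothetical decision helper; not part of the evo CLI or the skill.
// host: value from `evo host show`; configured: 'workflow' | 'prose' | undefined;
// hasWorkflowTool: whether the Workflow tool is actually in this session's toolset.
function resolveDriver({ host, configured, hasWorkflowTool }) {
  const useWorkflow = host === 'claude-code' && configured !== 'prose' && Boolean(hasWorkflowTool)
  // Invariant from the text: default-orchestrator=workflow iff the workflow is the driver.
  let configWrite = null
  if (useWorkflow) configWrite = 'workflow'                 // persist before launching
  else if (configured === 'workflow') configWrite = 'prose' // reconcile a stale value on fallback
  return { driver: useWorkflow ? 'workflow' : 'prose', configWrite }
}
```

A non-null `configWrite` corresponds to running `evo config set default-orchestrator <value>` before the loop starts.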
|
|
116
118
|
|
|
117
119
|
**Autonomous mode.** Off lets you stop naturally at a turn boundary — finish a round, report, and stop. On arms the stop-nudge: at every turn boundary you are re-prompted to keep driving the loop until the **stall** limit is hit or the user interrupts. Without it, the loop does NOT force-continue across turn boundaries. To stop an autonomous run, the user runs `evo autonomous off` or `evo exit-optimize-mode`.
|
|
118
120
|
|
|
@@ -5,7 +5,7 @@
|
|
|
5
5
|
* opt-in, Claude-Code-only driver; the prose skill remains the canonical, host-agnostic
|
|
6
6
|
* default. The workflow encodes the loop CONTROL: while/stall, mandatory scan + cross-history
|
|
7
7
|
* axis check, research escalation (ideators on stall / every ~5 commits), brief + diversity,
|
|
8
|
-
* fan-out + verify, collect + frontier-select. A concurrent
|
|
8
|
+
* fan-out + verify, collect + frontier-select. A concurrent META thread (Opus, self-paced,
|
|
9
9
|
* read-only) runs alongside the round loop via Promise.all — host + cross-history checks during
|
|
10
10
|
* rounds, feeding hints into the next brief. All domain work goes through the `evo` CLI inside
|
|
11
11
|
* agents — the script itself never touches the filesystem/shell.
|
|
@@ -27,7 +27,7 @@
 
 export const meta = {
   name: 'evo-optimize',
-  description: '
+  description: 'evo tree-search loop over the evo CLI (orient, scan, ideate-on-stall, brief, fan-out, verify, collect) with a concurrent meta controller that stops doomed experiments and restructures the workflow (logic flow + prompts) live.',
   phases: [
     { title: 'Orient', detail: 'read state + select frontier parents to extend' },
     { title: 'Scan', detail: 'mandatory parallel cross-cutting scan + structural aggregation (incl. cross-history axis check)' },
@@ -36,7 +36,8 @@ export const meta = {
|
|
|
36
36
|
{ title: 'Optimize', detail: 'parallel optimization subagents (evo new/run)' },
|
|
37
37
|
{ title: 'Verify', detail: 'validity audit + benchmark-noise confirm' },
|
|
38
38
|
{ title: 'Collect', detail: 'prune dead lineages, record cross-cutting notes' },
|
|
39
|
-
{ title: '
|
|
39
|
+
{ title: 'Meta', detail: 'concurrent controller (Opus) — host/cross-history checks, STOP doomed runs, suggest directions, AND restructure the workflow live (logic flow + prompts)' },
|
|
40
|
+
{ title: 'Meta-step', detail: 'extra agent step the meta injected into the round via an inject-step harness edit' },
|
|
40
41
|
],
|
|
41
42
|
}
|
|
42
43
|
|
|
@@ -180,8 +181,8 @@ const PREVERDICT = {
   },
 }
 
-//
-const
+// Meta tick output: work-quality hints (fed into the next brief) + runtime/host alerts (surfaced).
+const META_FINDINGS = {
   type: 'object',
   required: ['briefHints', 'alerts', 'stops'],
   properties: {
@@ -204,6 +205,35 @@ const ANALYST_FINDINGS = {
|
|
|
204
205
|
},
|
|
205
206
|
},
|
|
206
207
|
},
|
|
208
|
+
// FREE-WILL harness edits: the meta agent restructures the WORKFLOW itself, live
|
|
209
|
+
// (its logic flow + prompts). Applied directly each tick — no allow-list, no caps —
|
|
210
|
+
// and audited (editLog + run log + returned state). It edits only the search harness;
|
|
211
|
+
// it never touches the benchmark / grader / verifier (those are not part of the
|
|
212
|
+
// workflow it controls, so they stay fixed and the score stays comparable).
|
|
213
|
+
harnessEdits: {
|
|
214
|
+
type: 'array',
|
|
215
|
+
items: {
|
|
216
|
+
type: 'object',
|
|
217
|
+
required: ['op', 'rationale'],
|
|
218
|
+
properties: {
|
|
219
|
+
op: { enum: ['set-knob', 'toggle-phase', 'set-prompt', 'inject-step'] },
|
|
220
|
+
rationale: { type: 'string' }, // why this edit, with evidence
|
|
221
|
+
// set-knob — retune the loop's control flow
|
|
222
|
+
knob: { enum: ['width', 'budget', 'stall', 'ideateEvery', 'ideateStall'] },
|
|
223
|
+
value: { type: 'number' },
|
|
224
|
+
// toggle-phase — turn a discretionary phase on/off
|
|
225
|
+
phaseName: { enum: ['scan', 'ideate'] },
|
|
226
|
+
enabled: { type: 'boolean' },
|
|
227
|
+
// set-prompt — edit a prompt the workflow uses (append a directive or replace it wholesale)
|
|
228
|
+
target: { enum: ['state', 'scan', 'aggregate', 'brief', 'implement', 'run', 'ideator', 'collect'] },
|
|
229
|
+
mode: { enum: ['append', 'replace'] },
|
|
230
|
+
text: { type: 'string' },
|
|
231
|
+
// inject-step — add an extra agent step at a fixed seam each round
|
|
232
|
+
at: { enum: ['before-scan', 'after-scan', 'before-brief', 'after-collect'] },
|
|
233
|
+
label: { type: 'string' },
|
|
234
|
+
},
|
|
235
|
+
},
|
|
236
|
+
},
|
|
207
237
|
},
|
|
208
238
|
}
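For orientation, here is what a payload matching the `harnessEdits` schema above might look like, with a small shape check. This is a sketch under assumptions: `isWellFormedEdit` and the sample values are hypothetical, and the check covers only the two required fields and the `op` enum shown in the diff.

```javascript
// Hypothetical shape check; the workflow itself relies on the agent schema, not this function.
const OPS = ['set-knob', 'toggle-phase', 'set-prompt', 'inject-step']

function isWellFormedEdit(e) {
  return Boolean(e) && OPS.includes(e.op) && typeof e.rationale === 'string'
}

// Made-up sample of one quiet tick and one tick with a single edit.
const quietTick = { briefHints: [], alerts: [], stops: [], harnessEdits: [] }
const editTick = {
  briefHints: [],
  alerts: [],
  stops: [],
  harnessEdits: [
    { op: 'toggle-phase', phaseName: 'ideate', enabled: true, rationale: 'illustrative rationale text' },
  ],
}
```

Most ticks are expected to look like `quietTick`, since the prompt asks for edits only with concrete evidence.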
|
|
209
239
|
|
|
@@ -225,17 +255,17 @@ const LIMIT = Number(A.stall) || 5
|
|
|
225
255
|
const IDEATE_STALL = Math.max(1, Math.min(3, LIMIT - 1))
|
|
226
256
|
const IDEATE_EVERY_COMMITS = 5 // periodic research cadence (matches prose step 6b)
|
|
227
257
|
const PREVERIFY_MAX = 3 // pre-run verify <-> revise attempts before discarding a rigged edit
|
|
228
|
-
// Concurrent
|
|
229
|
-
const
|
|
230
|
-
const
|
|
231
|
-
const
|
|
232
|
-
const
|
|
233
|
-
// ends mid-wait it drops a sentinel the
|
|
234
|
-
// tick exits within ~
|
|
258
|
+
// Concurrent meta thread (runs alongside the round loop, NOT per-round).
|
|
259
|
+
const META_ENABLED = true
|
|
260
|
+
const META_MODEL = 'opus' // the meta always reasons with Opus (judgment-heavy)
|
|
261
|
+
const META_INTERVAL_S = 300 // self-pace: observe ~every 5 min, during rounds
|
|
262
|
+
const META_HOP_S = 15 // the wait is INTERRUPTIBLE in hops of this size: when the optimize loop
|
|
263
|
+
// ends mid-wait it drops a sentinel the meta polls, so the in-flight
|
|
264
|
+
// tick exits within ~META_HOP_S instead of stalling the run for the
|
|
235
265
|
// full interval (the script can't interrupt an agent's `sleep` directly).
|
|
236
|
-
const DONE_SENTINEL = '.evo/.wf_optimize_done' // optimize ->
|
|
266
|
+
const DONE_SENTINEL = '.evo/.wf_optimize_done' // optimize -> meta "loop is over" signal (a file,
|
|
237
267
|
// since the in-memory `done` flag isn't visible to the agent's process)
|
|
238
|
-
const
|
|
268
|
+
const META_MAX_FAILS = 3 // consecutive failed ticks before the advisory meta self-disables
|
|
239
269
|
// (guards against a hot-spin when ticks fail instantly, e.g. a bad schema)
|
|
240
270
|
// Experiments per scan agent. Heuristic for the prose "small enough to read in one pass" rule —
|
|
241
271
|
// the workflow can't recursively self-partition like the prose loop, so this is fixed up front.
|
|
@@ -357,7 +387,7 @@ function aggregatePrompt(ids) {
|
|
|
357
387
|
].join(' ')
|
|
358
388
|
}
|
|
359
389
|
|
|
360
|
-
function briefPrompt(state, findings, patterns, parents, ideated,
|
|
390
|
+
function briefPrompt(state, findings, patterns, parents, ideated, metaHints) {
|
|
361
391
|
return [
|
|
362
392
|
'You are the evo orchestrator\'s brief writer.',
|
|
363
393
|
'State summary:', state.summary || '',
|
|
@@ -368,10 +398,10 @@ function briefPrompt(state, findings, patterns, parents, ideated, analystHints)
|
|
|
368
398
|
? '\nFRESH IDEATOR PROPOSALS may be available — read `.evo/run_*/ideator/proposals.jsonl` and reconcile BEFORE writing: skip any whose technique was already tried (`evo discards --like "<keyword>"`); score the rest by expected_score_uplift x confidence (frontier_extrapolation > failure_analysis > literature, all else equal); let the top 1-2 become brief objectives, citing the proposal\'s hypothesis/technique. Proposals are advisory — if none beat the in-graph scan findings, ignore them.'
|
|
369
399
|
: '',
|
|
370
400
|
'\nIf the patterns include an "axis-warning", the current axis is saturated — target the ORTHOGONAL axis it names rather than iterating the plateaued one.',
|
|
371
|
-
(
|
|
372
|
-
? '\nLIVE
|
|
401
|
+
(metaHints && metaHints.length)
|
|
402
|
+
? '\nLIVE META SIGNALS (from the concurrent observer — fold relevant ones into objectives/boundaries, e.g. switch off a saturated axis, avoid a flagged dead direction): ' + JSON.stringify(metaHints)
|
|
373
403
|
: '',
|
|
374
|
-
`\nWrite up to ${
|
|
404
|
+
`\nWrite up to ${harness.width} briefs (use the full round width of ${harness.width} whenever you can find that many genuinely DISTINCT objectives — multiple briefs MAY branch from the SAME parent when fewer than ${harness.width} frontier parents exist, as long as each attacks a different surface; do not pad with redundant briefs). One per subagent, each with four fields:`,
|
|
375
405
|
'1. objective -- one sentence naming WHERE in system behavior the gain hides, with evidence; NO file/function/edit names.',
|
|
376
406
|
'2. parent -- which experiment id to branch from (choose from the selected parents).',
|
|
377
407
|
'3. boundaries -- what NOT to try and why (discarded approaches, gates not to regress, what adjacent briefs this round do).',
|
|
@@ -446,7 +476,7 @@ function runPrompt(expId, state) {
|
|
|
446
476
|
`CRITICAL ordering: if this experiment produces an output artifact through a build or training step (whatever your recipe declares — a checkpoint dir, adapter, merged model, index, etc.), run that step to COMPLETION and confirm the artifact exists BEFORE the real run. Never call \`evo run\` while that step is still in flight or before its output exists — evaluating a not-yet-produced artifact wastes the attempt. If the experiment warm-starts, the parent's reusable artifact is in EVO_PARENT_POLICY (start from it; do not redo from scratch).`,
|
|
447
477
|
`Then run \`evo run ${expId}\` to evaluate and (if it improves and passes gates) commit it.`,
|
|
448
478
|
'If it exits GATE_FAILED, do not fight the gate — report status=evaluated.',
|
|
449
|
-
'If `evo run` is terminated externally mid-flight (the concurrent
|
|
479
|
+
'If `evo run` is terminated externally mid-flight (the concurrent meta can STOP a doomed experiment — it aborts the run and discards this node with a diagnosis), do NOT retry: report status:none and stop. The diagnosis is already recorded as an annotation and will steer the next brief.',
|
|
450
480
|
`Return: expIds:["${expId}"]; status (committed|evaluated|failed|none); committedImprover = true ONLY if evo printed COMMITTED;`,
|
|
451
481
|
'bestExpId + bestScore (required when committedImprover is true); any gates added; learnings.',
|
|
452
482
|
].join(' ')
|
|
@@ -494,16 +524,16 @@ function collectPrompt(results, round) {
|
|
|
494
524
|
].join(' ')
|
|
495
525
|
}
|
|
496
526
|
|
|
497
|
-
// One
|
|
527
|
+
// One meta tick (a FRESH Opus agent each call — no memory across ticks, so `reported` carries
|
|
498
528
|
// the dedup state in the loop's closure). Read-only: observes host + cross-history signals DURING
|
|
499
529
|
// rounds, returns work-quality briefHints (folded into the next brief) + runtime alerts (surfaced).
|
|
500
|
-
function
|
|
530
|
+
function metaPrompt(ctx, intervalS, reported) {
|
|
501
531
|
return [
|
|
502
|
-
'You are the evo
|
|
503
|
-
'
|
|
532
|
+
'You are the evo META agent — an independent controller running CONCURRENTLY with the optimize loop.',
|
|
533
|
+
'You do NOT edit experiment code, run experiments, or touch the benchmark/grader/verifier. But you DO shape the optimize WORKFLOW: stop doomed experiments, suggest next directions (briefHints), AND restructure the running workflow itself — its logic flow + prompts — via harnessEdits (your distinctive power, detailed below).',
|
|
504
534
|
`FIRST pace yourself with an INTERRUPTIBLE wait, so you stop promptly when the optimize loop ends. Run this single Bash command with a tool timeout of at least ${(intervalS + 30) * 1000} ms:`,
|
|
505
|
-
` \`if [ -f ${DONE_SENTINEL} ]; then echo OPTIMIZE_DONE; else for i in $(seq 1 ${Math.ceil(intervalS /
|
|
506
|
-
`If that prints OPTIMIZE_DONE, the optimize loop has finished — return {"briefHints":[],"alerts":[]} immediately WITHOUT gathering any signals. Otherwise the full interval elapsed: now gather signals and report.`,
|
|
535
|
+
` \`if [ -f ${DONE_SENTINEL} ]; then echo OPTIMIZE_DONE; else for i in $(seq 1 ${Math.ceil(intervalS / META_HOP_S)}); do sleep ${META_HOP_S}; [ -f ${DONE_SENTINEL} ] && { echo OPTIMIZE_DONE; break; }; done; fi\``,
|
|
536
|
+
`If that prints OPTIMIZE_DONE, the optimize loop has finished — return {"briefHints":[],"alerts":[],"stops":[],"harnessEdits":[]} immediately WITHOUT gathering any signals. Otherwise the full interval elapsed: now gather signals and report.`,
|
|
507
537
|
`Current loop state: round=${ctx.round}, stall=${ctx.stall}/${LIMIT}, best=${ctx.bestScore}.`,
|
|
508
538
|
`Already reported (do NOT repeat — only emit findings NEW since these): ${JSON.stringify(reported || [])}.`,
|
|
509
539
|
'Walk these checks (skip any whose inputs are unavailable; cite evidence; nothing speculative):',
|
|
@@ -513,8 +543,12 @@ function analystPrompt(ctx, intervalS, reported) {
|
|
|
513
543
|
'- Stuck axis: from `evo tree`, 3+ structurally-distinct committed hypotheses plateaued at ~the same score → name the saturated axis + one orthogonal axis. BRIEF HINT.',
|
|
514
544
|
'- Dead direction / ignored mechanism: annotations repeatedly naming a mechanism the recent work ignores, or a direction that keeps regressing. BRIEF HINT.',
|
|
515
545
|
'- Heading toward failure (STOP): an in-flight experiment that is CLEARLY doomed or wasting the budget — a divergent / NaN / flatlined progress metric; projected completion beyond the remaining time budget; or a known-fatal signature (e.g. output the scorer cannot parse; a silent resource mis-placement that tanks throughput with no error; a corrupt input/format that invalidates the result). HIGH PRECISION ONLY: default to NOT stopping — recommend a STOP only with concrete evidence that finishing is wasted, and only for an experiment still `active`. Emit a stop with: expId; failureClass (build = the build/produce step is broken; eval = artifact is fine but scoring/serving is wrong; hypothesis = it runs but won\'t help); reason (the diagnosis + the evidence you saw); fixHint (what the NEXT experiment must change).',
|
|
516
|
-
'
|
|
517
|
-
'
|
|
546
|
+
'For STOPs you stay READ-ONLY: do NOT run `evo abort` / `evo discard` yourself. A gated enforcer acts on each stop — it aborts the run + its subprocess tree, annotates your diagnosis (so it outlives the worktree and feeds the next round), and discards with the failureClass so the partial artifact is preserved. A STOP is a diagnosed, recoverable stop, never a silent kill.',
|
|
547
|
+
'',
|
|
548
|
+
'HARNESS CONTROL (your distinctive power): you may restructure the optimize workflow itself, live, when you judge it will help — edits apply directly (free will) and take effect next round. Current harness state: ' + JSON.stringify(harnessSummary()) + '.',
|
|
549
|
+
'harnessEdits ops: (1) set-knob {knob: width|budget|stall|ideateEvery|ideateStall, value} — retune the loop (widen the round, deepen branches, change the stall limit or ideation cadence). (2) toggle-phase {phaseName: scan|ideate, enabled} — turn a phase off/on (e.g. skip scan when traces are uninformative; force ideation early). (3) set-prompt {target: state|scan|aggregate|brief|implement|run|ideator|collect, mode: append|replace, text} — edit the prompt that step uses (append a directive, or replace it wholesale). (4) inject-step {at: before-scan|after-scan|before-brief|after-collect, text, label} — add an extra agent step at that seam each round. Every edit needs a rationale citing the evidence.',
|
|
550
|
+
'HARD CONSTRAINT: edit ONLY the search harness above. NEVER propose edits to the benchmark, grader, scorer, held-out test, or any gate — those define how results are judged; if you change them the score stops meaning anything. Emit harnessEdits ONLY with concrete evidence the current workflow SHAPE is the bottleneck; most ticks should emit none.',
|
|
551
|
+
'Return {briefHints:[...], alerts:[...], stops:[...], harnessEdits:[...]}. briefHints feed the NEXT round\'s briefs; alerts surface to the user; each stop triggers the gated enforcer; each harnessEdit is applied directly to the running workflow. All-empty is fine — most ticks should be quiet.',
|
|
518
552
|
].join('\n')
|
|
519
553
|
}
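The pacing arithmetic in `metaPrompt` is easier to see in isolation. The sketch below rebuilds the wait command and the tool timeout from the constants in the diff; `waitCommand` and `toolTimeoutMs` are hypothetical names introduced for illustration.

```javascript
// Constants copied from the diff; the two helper functions are illustrative only.
const DONE_SENTINEL = '.evo/.wf_optimize_done'
const META_HOP_S = 15

// Build the interruptible wait: poll the sentinel once per hop instead of one long sleep.
function waitCommand(intervalS) {
  const hops = Math.ceil(intervalS / META_HOP_S)
  return `if [ -f ${DONE_SENTINEL} ]; then echo OPTIMIZE_DONE; else for i in $(seq 1 ${hops}); do sleep ${META_HOP_S}; [ -f ${DONE_SENTINEL} ] && { echo OPTIMIZE_DONE; break; }; done; fi`
}

// The prompt asks for a tool timeout of at least the interval plus 30 s of slack, in milliseconds.
function toolTimeoutMs(intervalS) {
  return (intervalS + 30) * 1000
}
```

With the default `META_INTERVAL_S = 300`, this gives 20 hops of 15 s and a timeout of 330000 ms, so a finished optimize loop is noticed within roughly one hop.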
|
|
520
554
|
|
|
@@ -535,8 +569,8 @@ function analystPrompt(ctx, intervalS, reported) {
|
|
|
535
569
|
async function runBrief(brief, state) {
|
|
536
570
|
let parent = brief.parent
|
|
537
571
|
let best = null
|
|
538
|
-
for (let depth = 0; depth <
|
|
539
|
-
const impl = await agent(implementPrompt(brief, parent, state), {
|
|
572
|
+
for (let depth = 0; depth < harness.budget; depth++) {
|
|
573
|
+
const impl = await agent(withHarnessPrompt('implement', implementPrompt(brief, parent, state)), {
|
|
540
574
|
schema: IMPL_RESULT, model: brief.hard ? 'opus' : 'sonnet', phase: 'Optimize', label: `impl:${parent}#${depth}`,
|
|
541
575
|
})
|
|
542
576
|
if (!impl || !impl.expId) break
|
|
@@ -560,7 +594,7 @@ async function runBrief(brief, state) {
|
|
|
560
594
|
}
|
|
561
595
|
|
|
562
596
|
// run -> evaluate + commit
|
|
563
|
-
const r = await agent(runPrompt(impl.expId, state), { schema: SUBAGENT_RESULT, phase: 'Optimize', label: `run:${impl.expId}` })
|
|
597
|
+
const r = await agent(withHarnessPrompt('run', runPrompt(impl.expId, state)), { schema: SUBAGENT_RESULT, phase: 'Optimize', label: `run:${impl.expId}` })
|
|
564
598
|
if (!r) break
|
|
565
599
|
|
|
566
600
|
// post-run validity audit (evo:verifier, post-phase) on committed improvers
|
|
@@ -595,51 +629,131 @@ let stall = 0
|
|
|
595
629
|
let round = 0
|
|
596
630
|
let lastIdeatedCommit = 0 // committedCount at the last ideator dispatch (periodic cadence)
|
|
597
631
|
let ideatedThisStall = false // fire ideators once per stall episode, not every stalled round
|
|
598
|
-
let lastBestScore = null // latest best score, surfaced to the concurrent
|
|
599
|
-
let done = false // set when the optimize loop ends -> stops the
|
|
600
|
-
const
|
|
632
|
+
let lastBestScore = null // latest best score, surfaced to the concurrent meta thread
|
|
633
|
+
let done = false // set when the optimize loop ends -> stops the meta thread
|
|
634
|
+
const metaSignals = [] // briefHints the meta pushes; drained into the next round's brief
|
|
635
|
+
|
|
636
|
+
// ---------------------------------------------------------------------------
|
|
637
|
+
// Mutable HARNESS (the round-plan + prompts the meta agent edits live, free-will).
|
|
638
|
+
// Initialized to the resolved defaults, so a run where the meta emits no edits behaves
|
|
639
|
+
// byte-identically to before. The optimize loop READS this object each round; the meta
|
|
640
|
+
// thread WRITES it via applyHarnessEdit. Single-threaded event loop => edits applied in a
|
|
641
|
+
// meta tick land between optimize-loop awaits and take effect at the next read (next round).
|
|
642
|
+
// Every edit is audited: harness.editLog + a run-log line + the workflow's return value.
|
|
643
|
+
// Scope boundary: this controls the SEARCH harness only — never the grader/verifier.
|
|
644
|
+
// ---------------------------------------------------------------------------
|
|
645
|
+
const harness = {
|
|
646
|
+
width: WIDTH,
|
|
647
|
+
budget: ITER,
|
|
648
|
+
stall: LIMIT,
|
|
649
|
+
ideateEvery: IDEATE_EVERY_COMMITS,
|
|
650
|
+
ideateStall: IDEATE_STALL,
|
|
651
|
+
phases: { scan: true, ideate: true },
|
|
652
|
+
prompts: {}, // target -> { mode: 'append'|'replace', text }
|
|
653
|
+
injectedSteps: [], // { at, prompt, label }
|
|
654
|
+
editLog: [], // audit trail: { round, op, ...spec, rationale }
|
|
655
|
+
}
|
|
656
|
+
|
|
657
|
+
// Apply a meta prompt override (append a directive, or replace wholesale) to a base prompt.
|
|
658
|
+
function withHarnessPrompt(target, baseText) {
|
|
659
|
+
const o = harness.prompts[target]
|
|
660
|
+
if (!o || !o.text) return baseText
|
|
661
|
+
return o.mode === 'replace'
|
|
662
|
+
? o.text
|
|
663
|
+
: baseText + '\n\n[META-ADDED DIRECTIVE — injected live by the meta agent]: ' + o.text
|
|
664
|
+
}
|
|
665
|
+
|
|
666
|
+
// Run any meta-injected extra steps registered at a given seam (inject-step op).
|
|
667
|
+
async function runInjected(at, ctxLabel) {
|
|
668
|
+
for (const s of harness.injectedSteps.filter((x) => x.at === at)) {
|
|
669
|
+
try {
|
|
670
|
+
await agent(s.prompt, { phase: 'Meta-step', label: s.label || `injected:${at}:${ctxLabel}` })
|
|
671
|
+
} catch (e) {
|
|
672
|
+
log(`META injected step (${at}) errored (ignored): ${(e && e.message) || e}`)
|
|
673
|
+
}
|
|
674
|
+
}
|
|
675
|
+
}
|
|
676
|
+
|
|
677
|
+
// Apply ONE harness edit with free will (no validation gate, no caps) — then audit it.
|
|
678
|
+
function applyHarnessEdit(e, atRound) {
|
|
679
|
+
if (!e || !e.op) return
|
|
680
|
+
const rec = { round: atRound, op: e.op, rationale: e.rationale || '' }
|
|
681
|
+
if (e.op === 'set-knob' && e.knob && typeof e.value === 'number') {
|
|
682
|
+
harness[e.knob] = e.value; rec.knob = e.knob; rec.value = e.value
|
|
683
|
+
} else if (e.op === 'toggle-phase' && e.phaseName) {
|
|
684
|
+
harness.phases[e.phaseName] = e.enabled !== false; rec.phaseName = e.phaseName; rec.enabled = harness.phases[e.phaseName]
|
|
685
|
+
} else if (e.op === 'set-prompt' && e.target && e.text) {
|
|
686
|
+
harness.prompts[e.target] = { mode: e.mode === 'replace' ? 'replace' : 'append', text: e.text }
|
|
687
|
+
rec.target = e.target; rec.mode = harness.prompts[e.target].mode
|
|
688
|
+
} else if (e.op === 'inject-step' && e.at && e.text) {
|
|
689
|
+
harness.injectedSteps.push({ at: e.at, prompt: e.text, label: e.label || `meta:${e.at}` }); rec.at = e.at; rec.label = e.label || `meta:${e.at}`
|
|
690
|
+
} else {
|
|
691
|
+
log(`META harness edit IGNORED (incomplete spec for op=${e.op}): ${JSON.stringify(e)}`); return
|
|
692
|
+
}
|
|
693
|
+
harness.editLog.push(rec)
|
|
694
|
+
log(`META HARNESS EDIT [r${atRound}] ${JSON.stringify(rec)}`)
|
|
695
|
+
}
|
|
696
|
+
|
|
697
|
+
function harnessSummary() {
|
|
698
|
+
return {
|
|
699
|
+
width: harness.width, budget: harness.budget, stall: harness.stall,
|
|
700
|
+
ideateEvery: harness.ideateEvery, ideateStall: harness.ideateStall,
|
|
701
|
+
phases: harness.phases,
|
|
702
|
+
promptsOverridden: Object.entries(harness.prompts).map(([k, v]) => `${k}:${v.mode}`),
|
|
703
|
+
injectedSteps: harness.injectedSteps.map((s) => `${s.at}:${s.label}`),
|
|
704
|
+
edits: harness.editLog.length,
|
|
705
|
+
}
|
|
706
|
+
}
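To see how edits flow from a meta tick into the next round, the following condensed stand-in applies a few edits to a toy harness. It is a sketch, not the shipped code: `applyEdit` and `withPrompt` are simplified re-statements of `applyHarnessEdit` and `withHarnessPrompt`, the `inject-step` op is omitted, and all starting values and edit contents are made up.

```javascript
// Toy harness with made-up defaults (the real one is seeded from WIDTH, ITER, LIMIT).
const harness = { width: 4, budget: 3, stall: 5, phases: { scan: true, ideate: true }, prompts: {}, editLog: [] }

// Simplified stand-in for applyHarnessEdit: apply one edit, then record it in the audit log.
function applyEdit(e, round) {
  if (e.op === 'set-knob' && e.knob && typeof e.value === 'number') harness[e.knob] = e.value
  else if (e.op === 'toggle-phase' && e.phaseName) harness.phases[e.phaseName] = e.enabled !== false
  else if (e.op === 'set-prompt' && e.target && e.text) harness.prompts[e.target] = { mode: e.mode === 'replace' ? 'replace' : 'append', text: e.text }
  else return false // incomplete spec: ignored, nothing logged
  harness.editLog.push({ round, op: e.op, rationale: e.rationale || '' })
  return true
}

// Simplified stand-in for withHarnessPrompt: append a directive or replace the prompt wholesale.
function withPrompt(target, base) {
  const o = harness.prompts[target]
  if (!o || !o.text) return base
  return o.mode === 'replace' ? o.text : base + '\n\n[META-ADDED DIRECTIVE]: ' + o.text
}

const applied = [
  { op: 'set-knob', knob: 'width', value: 6, rationale: 'sample' },
  { op: 'toggle-phase', phaseName: 'scan', enabled: false, rationale: 'sample' },
  { op: 'set-prompt', target: 'brief', mode: 'append', text: 'Prefer the orthogonal axis.', rationale: 'sample' },
  { op: 'set-knob', rationale: 'incomplete on purpose' },
].map((e) => applyEdit(e, 2))
```

Because the optimize loop reads the harness at the start of each round, an edit applied mid-round changes behavior only from the next read onward.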
|
|
601
707
|
|
|
602
|
-
log(`evo-optimize start: subagents=${WIDTH} budget=${ITER} stall=${LIMIT}
|
|
708
|
+
log(`evo-optimize start: subagents=${WIDTH} budget=${ITER} stall=${LIMIT} meta=${META_ENABLED ? META_MODEL : 'off'} | argsType=${typeof args} A.subagents=${A.subagents} A.budget=${A.budget} A.stall=${A.stall}`)
|
|
603
709
|
|
|
604
|
-
// The optimize round loop (runs concurrently with
|
|
710
|
+
// The optimize round loop (runs concurrently with metaLoop via Promise.all).
|
|
605
711
|
async function optimizeLoop() {
|
|
606
|
-
while (stall <
|
|
712
|
+
while (stall < harness.stall) {
|
|
607
713
|
round += 1
|
|
608
714
|
|
|
609
715
|
phase('Orient')
|
|
610
|
-
|
|
716
|
+
await runInjected('before-scan', `r${round}`) // meta seam (pre-orient/scan)
|
|
717
|
+
const state = await agent(withHarnessPrompt('state', statePrompt()), { schema: STATE, agentType: 'Explore', model: 'sonnet', phase: 'Orient', label: `state:r${round}` })
|
|
611
718
|
lastBestScore = state.bestScore
|
|
612
719
|
if (state.bestScore === state.ceiling) { log(`ceiling reached (best=${state.bestScore}) — stopping`); break }
|
|
613
|
-
const parents = (state.frontier || []).slice(0,
|
|
720
|
+
const parents = (state.frontier || []).slice(0, harness.width)
|
|
614
721
|
if (parents.length === 0) { log('no explorable frontier nodes — stopping'); break }
|
|
615
722
|
|
|
616
|
-
// N1 + N1.5 —
|
|
617
|
-
// (hard rule)
|
|
618
|
-
//
|
|
619
|
-
|
|
620
|
-
|
|
621
|
-
|
|
622
|
-
|
|
623
|
-
|
|
624
|
-
|
|
625
|
-
|
|
626
|
-
|
|
627
|
-
|
|
628
|
-
|
|
629
|
-
|
|
630
|
-
|
|
631
|
-
|
|
632
|
-
|
|
633
|
-
|
|
634
|
-
|
|
723
|
+
// N1 + N1.5 — parallel scan + structural aggregation (barrier). Scan normally runs EVERY round
|
|
724
|
+
// (hard rule), but the meta agent MAY disable it via a toggle-phase edit (free will) — when off,
|
|
725
|
+
// the round briefs from prior signals only. Round 1 falls back to the committed frontier.
|
|
726
|
+
let findings = []
|
|
727
|
+
let patterns = []
|
|
728
|
+
if (harness.phases.scan) {
|
|
729
|
+
phase('Scan')
|
|
730
|
+
const evaluatedIds = state.evaluatedIds || []
|
|
731
|
+
const frontierIds = (state.frontier || []).map((f) => f.id).filter(Boolean)
|
|
732
|
+
const scanTargets = evaluatedIds.length ? evaluatedIds : frontierIds
|
|
733
|
+
const batches = chunk(scanTargets, SCAN_BATCH)
|
|
734
|
+
const scanThunks = batches.map((b) => () => agent(withHarnessPrompt('scan', scanBrief(b)), { schema: FINDINGS, agentType: 'Explore', phase: 'Scan', label: `scan ${b.length}: ${batchLabel(b)}` }))
|
|
735
|
+
const aggregateIds = [...new Set([...evaluatedIds, ...frontierIds])]
|
|
736
|
+
const aggThunk = aggregateIds.length
|
|
737
|
+
? [() => agent(withHarnessPrompt('aggregate', aggregatePrompt(aggregateIds)), { schema: PATTERNS, agentType: 'Explore', phase: 'Scan', label: 'aggregate' })]
|
|
738
|
+
: []
|
|
739
|
+
const scanResults = (await parallel([...scanThunks, ...aggThunk])).filter(Boolean)
|
|
740
|
+
findings = scanResults.flatMap((r) => (r && r.findings) ? r.findings : [])
|
|
741
|
+
patterns = scanResults.flatMap((r) => (r && r.patterns) ? r.patterns : [])
|
|
742
|
+
} else {
|
|
743
|
+
log('scan phase disabled by meta — briefing from prior signals only')
|
|
744
|
+
}
|
|
745
|
+
await runInjected('after-scan', `r${round}`)
|
|
746
|
+
|
|
747
|
+
// N1.7 — research escalation (6b): on stall (before the hard limit) or every ~N commits, fire the
|
|
748
|
+
// three ideators in parallel. Gated by harness.phases.ideate + the harness cadence knobs (meta-tunable).
|
|
635
749
|
const commits = Number(state.committedCount) || 0
|
|
636
|
-
const stalledTrigger = stall >=
|
|
637
|
-
const periodicTrigger = commits - lastIdeatedCommit >=
|
|
750
|
+
const stalledTrigger = stall >= harness.ideateStall && !ideatedThisStall
|
|
751
|
+
const periodicTrigger = commits - lastIdeatedCommit >= harness.ideateEvery
|
|
638
752
|
let ideated = false
|
|
639
|
-
if (stalledTrigger || periodicTrigger) {
|
|
753
|
+
if (harness.phases.ideate && (stalledTrigger || periodicTrigger)) {
|
|
640
754
|
phase('Ideate')
|
|
641
755
|
await parallel(['frontier_extrapolation', 'failure_analysis', 'literature'].map((b) => () =>
|
|
642
|
-
agent(ideatorPrompt(b), { agentType: 'evo:ideator', phase: 'Ideate', label: `ideate:${b}` })))
|
|
756
|
+
agent(withHarnessPrompt('ideator', ideatorPrompt(b)), { agentType: 'evo:ideator', phase: 'Ideate', label: `ideate:${b}` })))
|
|
643
757
|
lastIdeatedCommit = commits
|
|
644
758
|
if (stalledTrigger) ideatedThisStall = true
|
|
645
759
|
ideated = true
|
|
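Note on the hunk above: the ideation gate can be read as a pure function of the harness knobs and the loop counters. The sketch below is a minimal illustration under that reading. `shouldIdeate` and the sample values are hypothetical and do not appear in the package; only the field names (`ideateStall`, `ideateEvery`, `phases.ideate`, `ideatedThisStall`, `lastIdeatedCommit`) are taken from the diff.

```javascript
// Hypothetical helper (not part of the package): mirrors the trigger logic in the hunk above.
function shouldIdeate(harness, s) {
  // Stall trigger: fires once per stall streak when the stall counter reaches the knob.
  const stalledTrigger = s.stall >= harness.ideateStall && !s.ideatedThisStall
  // Periodic trigger: fires when enough commits have landed since the last ideation.
  const periodicTrigger = s.commits - s.lastIdeatedCommit >= harness.ideateEvery
  // The phase toggle gates both triggers.
  const fire = Boolean(harness.phases.ideate && (stalledTrigger || periodicTrigger))
  return { stalledTrigger, periodicTrigger, fire }
}

// Illustrative knob values only; the real defaults are not shown in this diff.
const sampleHarness = { ideateStall: 2, ideateEvery: 5, phases: { ideate: true } }
const sampleState = { stall: 2, ideatedThisStall: false, commits: 3, lastIdeatedCommit: 0 }
console.log(shouldIdeate(sampleHarness, sampleState))
```

Because the knobs are read when each round evaluates the gate, an edit applied between rounds would change the next evaluation without any other code change, which appears to be the point of moving these values onto `harness`.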
@@ -647,10 +761,11 @@ async function optimizeLoop() {
|
|
|
647
761
|
}
|
|
648
762
|
|
|
649
763
|
// N2 — brief writer: reconciles ideator proposals (6c), acts on axis-warning, and folds in any
|
|
650
|
-
// live
|
|
764
|
+
// live meta hints accumulated since the last round; JS diversity dedupe afterwards.
|
|
765
|
+
await runInjected('before-brief', `r${round}`)
|
|
651
766
|
phase('Brief')
|
|
652
|
-
const
|
|
653
|
-
const briefOut = await agent(briefPrompt(state, findings, patterns, parents, ideated,
|
|
767
|
+
const metaHints = metaSignals.splice(0)
|
|
768
|
+
const briefOut = await agent(withHarnessPrompt('brief', briefPrompt(state, findings, patterns, parents, ideated, metaHints)), { schema: BRIEFS, phase: 'Brief', label: `briefs:r${round}` })
|
|
654
769
|
const briefs = dedupeBriefs((briefOut && briefOut.briefs) || [])
|
|
655
770
|
if (briefs.length === 0) { log('no briefs produced — stopping'); break }
|
|
656
771
|
|
|
@@ -659,7 +774,8 @@ async function optimizeLoop() {
|
|
|
659
774
|
|
|
660
775
|
// N5 — collect: prune dead lineages, record notes.
|
|
661
776
|
phase('Collect')
|
|
662
|
-
await agent(collectPrompt(results, round), { phase: 'Collect', label: `collect:r${round}` })
|
|
777
|
+
await agent(withHarnessPrompt('collect', collectPrompt(results, round)), { phase: 'Collect', label: `collect:r${round}` })
|
|
778
|
+
await runInjected('after-collect', `r${round}`)
|
|
663
779
|
|
|
664
780
|
// Loop control: stall resets only when this round produced a VERIFIED committed score that beats
|
|
665
781
|
// the PRIOR BEST in the metric direction (a beat-its-own-parent commit is branch progress, not a
|
|
@@ -675,55 +791,59 @@ async function optimizeLoop() {
|
|
|
675
791
|
log(`round ${round}: improved=${improved} roundBest=${roundBest} prevBest=${state.bestScore} stall=${stall}/${LIMIT} spent=${budget.spent()}`)
|
|
676
792
|
}
|
|
677
793
|
done = true
|
|
678
|
-
// Wake any in-flight
|
|
679
|
-
// makes the tick's interruptible wait exit within ~
|
|
680
|
-
if (
|
|
794
|
+
// Wake any in-flight meta tick now (its `sleep` can't see the in-memory `done`): the sentinel
|
|
795
|
+
// makes the tick's interruptible wait exit within ~META_HOP_S instead of running the full interval.
|
|
796
|
+
if (META_ENABLED) await agent(`mkdir -p .evo && : > ${DONE_SENTINEL} && echo signalled`, { phase: 'Collect', label: 'signal:optimize-done' })
|
|
681
797
|
log(`optimize loop finished after ${round} round(s), final stall=${stall}/${LIMIT}`)
|
|
682
798
|
return { rounds: round, finalStall: stall }
|
|
683
799
|
}
|
|
684
800
|
|
|
685
|
-
// Concurrent
|
|
801
|
+
// Concurrent meta thread (P1-sliver/P2-P5/P7): an independent, self-paced Opus observer that runs
|
|
686
802
|
// DURING rounds (not per-round). Each tick is a FRESH agent (no cross-tick memory), so `reported`
|
|
687
|
-
// holds the dedup state in this closure. Work-quality findings ->
|
|
803
|
+
// holds the dedup state in this closure. Work-quality findings -> metaSignals (next brief);
|
|
688
804
|
// runtime/host alerts -> the run log. Stops when optimizeLoop sets `done`.
|
|
689
|
-
// Gated ENFORCER for an
|
|
805
|
+
// Gated ENFORCER for a meta STOP: detect (meta) and act (this agent) stay separate. Verifies
|
|
690
806
|
// the experiment is still active, then aborts its run (driver + subprocess tree), annotates the
|
|
691
807
|
// diagnosis (survives the worktree + feeds the next round via knownLearnings), and discards with the
|
|
692
808
|
// failure class so the partial artifact is preserved + classified. A STOP is a diagnosed, recoverable
|
|
693
809
|
// stop — never a silent kill.
|
|
694
810
|
function enforceStopPrompt(s) {
|
|
695
811
|
return [
|
|
696
|
-
`A concurrent
|
|
812
|
+
`A concurrent meta flagged experiment ${s.expId} as heading toward failure and recommends STOPPING it. You are the gated ENFORCER — read-only except for the three evo commands below; do NOT edit code or run training.`,
|
|
697
813
|
`First VERIFY: run \`evo show ${s.expId}\`. Only proceed if its status is still \`active\`. If it is committed / evaluated / discarded / not found, do NOTHING and report skipped (it already resolved).`,
|
|
698
814
|
`If still active, run in order:`,
|
|
699
815
|
` 1. \`evo abort ${s.expId}\` — stop the evo run driver and its subprocess tree.`,
|
|
700
816
|
` 2. annotate the diagnosis so it outlives the worktree and feeds the next round: \`evo annotate ${s.expId} "STOPPED (${s.failureClass}): ${s.reason} | FIX: ${s.fixHint}"\` (quote carefully).`,
|
|
701
|
-
` 3. classify + preserve: \`evo discard ${s.expId} --force --failure-class ${s.failureClass} --reason "
|
|
817
|
+
` 3. classify + preserve: \`evo discard ${s.expId} --force --failure-class ${s.failureClass} --reason "meta stop: ${s.reason}"\` (--force because abort already killed the driver; declared artifacts are preserved).`,
|
|
702
818
|
`Report what you did (aborted / annotated / discarded) or that you skipped because it was no longer active. This is a diagnosed, recoverable stop, not a crash.`,
|
|
703
819
|
].join('\n')
|
|
704
820
|
}
|
|
705
821
|
|
|
706
|
-
async function
|
|
707
|
-
if (!
|
|
822
|
+
async function metaLoop() {
|
|
823
|
+
if (!META_ENABLED) return
|
|
708
824
|
const reported = [] // closure memory across the stateless ticks (caps re-alerting)
|
|
709
825
|
let t = 0
|
|
710
826
|
let fails = 0 // consecutive tick failures; trips the self-disable below
|
|
711
827
|
while (!done) {
|
|
712
828
|
t += 1
|
|
713
|
-
// The
|
|
829
|
+
// The meta is purely advisory and read-only: a failed tick must NEVER reject this loop and
|
|
714
830
|
// abort the optimizer. Swallow any tick error, log it, and continue (or exit if `done` flipped).
|
|
715
831
|
let tick = null
|
|
716
832
|
try {
|
|
717
|
-
tick = await agent(
|
|
718
|
-
agentType: 'Explore', model:
|
|
833
|
+
tick = await agent(metaPrompt({ round, stall, bestScore: lastBestScore }, META_INTERVAL_S, reported.slice(-30)), {
|
|
834
|
+
agentType: 'Explore', model: META_MODEL, schema: META_FINDINGS, phase: 'Meta', label: `meta#${t}`,
|
|
719
835
|
})
|
|
720
836
|
} catch (e) {
|
|
721
|
-
log(`
|
|
837
|
+
log(`META tick #${t} errored (ignored, optimize unaffected): ${(e && e.message) || e}`)
|
|
722
838
|
}
|
|
723
839
|
if (tick) {
|
|
724
840
|
fails = 0 // a real tick resets the failure streak
|
|
725
|
-
for (const h of (tick.briefHints || [])) {
|
|
726
|
-
for (const a of (tick.alerts || [])) { log(`
|
|
841
|
+
for (const h of (tick.briefHints || [])) { metaSignals.push(h); reported.push(h) }
|
|
842
|
+
for (const a of (tick.alerts || [])) { log(`META ALERT: ${a}`); reported.push(a) }
|
|
843
|
+
// HARNESS EDITS (new ability): the meta restructures the workflow itself live — applied
|
|
844
|
+
// directly with free will (no gate, no caps), audited via harness.editLog + the run log.
|
|
845
|
+
// Takes effect at the next round (the optimize loop reads `harness` at each round start).
|
|
846
|
+
for (const e of (tick.harnessEdits || [])) applyHarnessEdit(e, round)
|
|
727
847
|
// STOP recommendations: hand each to a gated enforcer (detect/act separation). The fix also
|
|
728
848
|
// feeds the next round's brief so the loop corrects rather than just abandons.
|
|
729
849
|
for (const s of (tick.stops || [])) {
|
|
@@ -731,32 +851,35 @@ async function analystLoop() {
|
|
|
731
851
|
const stopKey = `stop:${s.expId}`
|
|
732
852
|
if (reported.includes(stopKey)) continue // never re-enforce the same experiment
|
|
733
853
|
reported.push(stopKey)
|
|
734
|
-
log(`
|
|
735
|
-
|
|
854
|
+
log(`META STOP: ${s.expId} [${s.failureClass}] ${s.reason}`)
|
|
855
|
+
metaSignals.push(`Experiment ${s.expId} was stopped (${s.failureClass}): ${s.reason} — next: ${s.fixHint}`)
|
|
736
856
|
try {
|
|
737
|
-
await agent(enforceStopPrompt(s), { phase: '
|
|
857
|
+
await agent(enforceStopPrompt(s), { phase: 'Meta', label: `enforce-stop:${s.expId}` })
|
|
738
858
|
} catch (e) {
|
|
739
|
-
log(`
|
|
859
|
+
log(`META enforce-stop ${s.expId} errored (ignored): ${(e && e.message) || e}`)
|
|
740
860
|
}
|
|
741
861
|
}
|
|
742
|
-
} else if (++fails >=
|
|
862
|
+
} else if (++fails >= META_MAX_FAILS) {
|
|
743
863
|
// The pacing wait lives INSIDE the agent, so a tick that fails before sleeping (e.g. a schema
|
|
744
864
|
// reject) leaves nothing to pace the retry — left unchecked the loop hot-spins agents. The
|
|
745
|
-
//
|
|
746
|
-
log(`
|
|
865
|
+
// meta is optional, so after a short streak of failures, disable it for the rest of the run.
|
|
866
|
+
log(`META disabled after ${fails} consecutive failed ticks — optimize continues without it.`)
|
|
747
867
|
return
|
|
748
868
|
}
|
|
749
869
|
}
|
|
750
870
|
}
|
|
751
871
|
|
|
752
|
-
// Clear any stale sentinel from a prior run BEFORE the threads start, else the
|
|
872
|
+
// Clear any stale sentinel from a prior run BEFORE the threads start, else the meta's first wait
|
|
753
873
|
// would see it and exit instantly. The script can't touch the filesystem itself, so an agent does it.
|
|
754
|
-
if (
|
|
874
|
+
if (META_ENABLED) await agent(`rm -f ${DONE_SENTINEL}; echo cleared`, { phase: 'Orient', label: 'init:clear-sentinel' })
|
|
755
875
|
|
|
756
|
-
// optimizeLoop is the run's result;
|
|
876
|
+
// optimizeLoop is the run's result; metaLoop is advisory. The `.catch` is the definitive guard that
|
|
757
877
|
// the observer thread can NEVER reject the combined promise and fail an otherwise-good optimize run.
|
|
758
878
|
const [optimizeResult] = await Promise.all([
|
|
759
879
|
optimizeLoop(),
|
|
760
|
-
|
|
880
|
+
metaLoop().catch((e) => log(`META thread exited abnormally (ignored): ${(e && e.message) || e}`)),
|
|
761
881
|
])
|
|
762
|
-
|
|
882
|
+
// Surface the harness audit alongside the optimize result: final round-plan + every live edit the
|
|
883
|
+
// meta agent applied (knobs, phase toggles, prompt overrides, injected steps), so the run is fully
|
|
884
|
+
// reconstructable from the return value + the run log.
|
|
885
|
+
return { ...optimizeResult, harness: harnessSummary(), harnessEditLog: harness.editLog }
|
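Note on the audit trail returned above: the pattern is that every applied edit is appended to an edit log, incomplete specs are logged and skipped, and a summary is returned with the run result. The following is a cut-down sketch of that pattern, not the package's implementation. `demoHarness`, `applyEdit`, `summarize`, the initial shape of `rec`, and the sample edits are all assumptions made for illustration; only the `inject-step` branch visible in the diff is modelled.

```javascript
// Illustrative only: a reduced harness with an audited edit log.
const demoHarness = { injectedSteps: [], editLog: [] }
const logLines = []
const log = (msg) => logLines.push(msg)

function applyEdit(e, atRound) {
  const rec = { round: atRound, op: e.op } // assumed record shape; the diff does not show it
  if (e.op === 'inject-step' && e.at && e.text) {
    const label = e.label || `meta:${e.at}`
    demoHarness.injectedSteps.push({ at: e.at, prompt: e.text, label })
    rec.at = e.at
    rec.label = label
  } else {
    // Incomplete or unknown specs are logged and dropped, never partially applied.
    log(`edit IGNORED (incomplete spec for op=${e.op}): ${JSON.stringify(e)}`)
    return
  }
  demoHarness.editLog.push(rec)
  log(`EDIT [r${atRound}] ${JSON.stringify(rec)}`)
}

function summarize() {
  return {
    injectedSteps: demoHarness.injectedSteps.map((s) => `${s.at}:${s.label}`),
    edits: demoHarness.editLog.length,
  }
}

applyEdit({ op: 'inject-step', at: 'before-brief', text: 'Re-check the metric direction.' }, 3)
applyEdit({ op: 'inject-step', at: 'after-scan' }, 3) // missing text, so it is ignored
console.log(summarize())
```

Keeping the log append and the state mutation in the same function means the edit log cannot drift from what was actually applied, which is what makes a run reconstructable from the return value plus the run log.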
package/skills/report/SKILL.md
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: report
|
|
3
3
|
description: Print the dashboard's dot chart (score over experiment order, status colors, best-path stair) inline in the terminal for every run in the workspace. Use when the user invokes /evo:report, asks for a quick score chart without opening the dashboard, or wants the scatter plot in chat output.
|
|
4
|
-
evo_version: 0.5.0-alpha.
|
|
4
|
+
evo_version: 0.5.0-alpha.13
|
|
5
5
|
---
|
|
6
6
|
|
|
7
7
|
# Report
|
package/skills/subagent/SKILL.md
CHANGED
|
@@ -1,7 +1,7 @@
|
|
|
1
1
|
---
|
|
2
2
|
name: subagent
|
|
3
3
|
description: Protocol that evo optimization subagents follow when dispatched from /optimize. Auto-loaded by spawned subagents via their host's skill loader. The orchestrator may also invoke this skill to understand the brief shape its dispatched subagents expect + what they're required to emit -- useful when writing briefs or debugging a subagent's behavior.
|
|
4
|
-
evo_version: 0.5.0-alpha.
|
|
4
|
+
evo_version: 0.5.0-alpha.13
|
|
5
5
|
---
|
|
6
6
|
|
|
7
7
|
# Evo Subagent Protocol
|