@evo-hq/pi-evo 0.5.1 → 0.5.3
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json
CHANGED
package/skills/discover/SKILL.md
CHANGED
@@ -2,7 +2,7 @@
 name: discover
 description: Initialize evo for the current repository by exploring the codebase, proposing unexplored optimization dimensions, constructing the benchmark inside a baseline worktree, and running the first experiment. Use when the user invokes /evo:discover, mentions setting up evo, wants to instrument a codebase for autonomous optimization, or asks to start a new evo run on a project.
 argument-hint: <optional context about what to optimize>
-evo_version: 0.5.
+evo_version: 0.5.3
 ---

 # Discover
@@ -116,20 +116,20 @@ evo --version
 The output must be exactly:

 ```
-evo-hq-cli 0.5.
+evo-hq-cli 0.5.3
 ```

 Three outcomes:

 1. **Matches exactly** — continue to step 1.
 2. **Reports a different version** (`evo-hq-cli 0.4.2`, etc.) — the host refetched a newer/older skill bundle than the CLI on PATH. Drift breaks skills silently. Stop and tell the user:
-> Your installed evo CLI is on a different version than this skill (`0.5.
+> Your installed evo CLI is on a different version than this skill (`0.5.3`). Run:
 > ```
-> uv tool install --force evo-hq-cli==0.5.
+> uv tool install --force evo-hq-cli==0.5.3
 > ```
 > Then re-invoke this skill.
 3. **`command not found`, or reports a different package** (commonly `evo 1.x` — the unrelated SLAM tool) — the CLI isn't installed. Tell the user:
-> `evo-hq-cli` isn't on your PATH. Install it: `uv tool install evo-hq-cli==0.5.
+> `evo-hq-cli` isn't on your PATH. Install it: `uv tool install evo-hq-cli==0.5.3` (or `pipx install evo-hq-cli==0.5.3`). Then re-invoke this skill.

 Do not try to auto-install. Host sandbox + network policy may block it; leaving the install as a user action keeps failure modes clear.

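The three-way version check in the discover skill is a small decision rule applied to one line of output. The helper below is only a sketch. `classifyVersionOutput` is a hypothetical JavaScript function and is not part of evo or its skills. It takes the captured stdout of `evo --version` and the pinned version, then returns which outcome applies along with the command to give the user. It installs nothing, which matches the skill's no-auto-install rule.

```javascript
// Hypothetical helper (not part of evo): map `evo --version` stdout onto the
// three outcomes in the discover skill, plus the command to hand the user.
function classifyVersionOutput(stdout, expected) {
  const out = (stdout || '').trim()
  if (out === `evo-hq-cli ${expected}`) {
    return { outcome: 'match', fix: null }
  }
  if (out.startsWith('evo-hq-cli ')) {
    // Right package, wrong version: the skill bundle and the CLI have drifted.
    return { outcome: 'drift', fix: `uv tool install --force evo-hq-cli==${expected}` }
  }
  // Empty stdout (command not found goes to stderr) or an unrelated package like `evo 1.x`.
  return { outcome: 'missing', fix: `uv tool install evo-hq-cli==${expected}` }
}

console.log(classifyVersionOutput('evo-hq-cli 0.5.3\n', '0.5.3').outcome) // match
console.log(classifyVersionOutput('evo-hq-cli 0.4.2', '0.5.3').outcome) // drift
console.log(classifyVersionOutput('', '0.5.3').outcome) // missing
```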
package/skills/optimize/SKILL.md
CHANGED
@@ -2,7 +2,7 @@
 name: optimize
 description: Drive structured autoresearch iteration after evo:discover and the baseline commit -- scan-subagent cross-cutting analysis between rounds, frontier-based parent selection, ideator dispatch on stall, verifier pre/post hooks, annotation discipline. Width is set via subagents=N (1 for serial workloads, larger for parallel); the loop's structural value applies at any width.
 argument-hint: "[subagents=N] [budget=N] [stall=N]"
-evo_version: 0.5.
+evo_version: 0.5.3
 ---

 Run the `evo` optimization loop. Each round, the orchestrator writes structured briefs and spawns subagents that execute within them. Each subagent is semi-autonomous: it reads the pointer traces, forms the concrete edit, runs experiments, and can iterate within its branch. Runs until interrupted or the stall limit is reached.
@@ -102,17 +102,17 @@ evo defaults get subagents-only --json

 As your **very first actions, before the loop**, resolve each and arm it: run `evo autonomous on` / `evo subagents-only on` when it resolves on, or `evo autonomous off` / `evo subagents-only off` when an explicit instruction or stored default turned it off. If a behavior resolves off — whether from the user's instruction this run or a stored default — say so in your opening message (e.g. "autonomous off — running one round at a time, as you asked") so it's never invisible.

-**Orchestrator driver.** evo drives the loop two ways: a deterministic **dynamic workflow** (Claude Code only)
+**Orchestrator driver.** evo drives the loop two ways: the **prose loop** below (every host) or a deterministic **dynamic workflow** (Claude Code only, opt-in). **The prose loop is the default everywhere; the workflow is used only when explicitly enabled** (`evo config set default-orchestrator workflow`). Resolve which as part of your very first actions:

 1. `evo host show` — the workflow driver requires `claude-code`. If it prints `<not set>` (a pre-host workspace), determine your actual runtime from your own context (system prompt, env such as `CLAUDECODE=1`, self-identity): **only if you are genuinely Claude Code**, do the one-time host migration now (`evo host set claude-code`) and continue; if you are any other runtime, do NOT stamp the host here — leave it for Step 0.1 and use the prose loop.
-2. `evo config get default-orchestrator` — `
+2. `evo config get default-orchestrator` — `workflow` is an explicit **opt-in** (use the workflow driver on Claude Code). `prose` **or unset** resolves to the prose loop. An explicit user instruction this run still wins.

-**Use the workflow** when
+**Use the workflow** only when `default-orchestrator` is explicitly `workflow`, host is `claude-code`, AND the **Workflow tool is actually present in your available tools this session** — it is opt-in, never the default. The availability check is load-bearing: **older Claude Code builds do not ship the Workflow tool**, so verify it's really in your toolset; do not assume it exists from the host alone. Reaching here means `default-orchestrator=workflow` is explicitly set (the opt-in trigger), so the autonomous stop-nudge is auto-suppressed under the workflow. Launch it once, do NOT drive the loop turn-by-turn:

 - Call the **Workflow** tool with `scriptPath: ${CLAUDE_PLUGIN_ROOT}/skills/optimize/workflows/evo-optimize.js` and `args: {pluginRoot: "${CLAUDE_PLUGIN_ROOT}", subagents: <N>, budget: <N>, stall: <N>}`, using the round sizing you resolved above. **Pass all four keys explicitly — never omit one.** For `stall`, use the user's `/optimize stall=N` override if given, else the default 5. (The workflow's stop condition is the stall limit, so a dropped `stall` silently reverts it to 5.)
 - Report the returned `runId` and tell the user to watch progress with `/workflows`. The workflow runs the round loop itself (orient → mandatory scan + cross-history axis check → ideators on stall/periodic → briefs → fan-out + verify → collect → frontier-select → stall) plus the concurrent meta controller; you do **not** execute "The Loop" section below, and you do **not** need autonomous mode (the workflow self-drives; its stall limit is the stop).

-Use **The Loop** below
+Use **The Loop** below by default — it is the prose driver on every host, and the path whenever the workflow is not explicitly enabled (`default-orchestrator` unset or `prose`), the host is not `claude-code`, or the Workflow tool is unavailable (e.g. an older Claude Code build). The workflow is only an execution strategy over the same `evo` CLI; gates, frontier, dashboard, and recovery are identical either way.

 **Reconcile config when you fall back to prose.** The stop-nudge that drives the prose loop is auto-suppressed whenever `default-orchestrator` is `workflow`. So if you fall back to the prose loop on Claude Code because the Workflow tool isn't available (older build) while `default-orchestrator` is still `workflow` from a prior run, you MUST set it back — `evo config set default-orchestrator prose` — and arm autonomous as usual. Otherwise the prose loop's stop-nudge stays suppressed and the run stalls after one round. Invariant to preserve: `default-orchestrator=workflow` in config iff the workflow is actually the driver this run.

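Taken together, the driver rules reduce to one boolean and one invariant. The function below is a hypothetical sketch and is not evo code. `resolveDriver`, its parameter names, and its return shape are all assumptions, and the `<not set>` host migration is left out. Only the two `evo config set default-orchestrator` commands come from the skill text.

```javascript
// Hypothetical sketch of the optimize skill's driver rules (not evo code).
// Inputs stand in for `evo host show`, `evo config get default-orchestrator`,
// whether the Workflow tool is present this session, and any explicit user
// instruction given this run ('prose' | 'workflow' | undefined).
function resolveDriver({ host, defaultOrchestrator, workflowToolAvailable, userOverride }) {
  // Unset or `prose` resolves to the prose loop; a user instruction this run wins.
  const wanted = userOverride || (defaultOrchestrator === 'workflow' ? 'workflow' : 'prose')
  const driver =
    wanted === 'workflow' && host === 'claude-code' && workflowToolAvailable === true
      ? 'workflow'
      : 'prose'

  // Invariant: config says `workflow` iff the workflow actually drives this run.
  // Assumption: a user override that selects the workflow is written to config too.
  let reconcile = null
  if (driver === 'prose' && defaultOrchestrator === 'workflow') {
    reconcile = 'evo config set default-orchestrator prose' // else the stop-nudge stays suppressed
  } else if (driver === 'workflow' && defaultOrchestrator !== 'workflow') {
    reconcile = 'evo config set default-orchestrator workflow'
  }
  return { driver, reconcile }
}

// Stale config on an older Claude Code build that lacks the Workflow tool:
const stale = resolveDriver({
  host: 'claude-code', defaultOrchestrator: 'workflow', workflowToolAvailable: false,
})
console.log(stale.driver, stale.reconcile)
```

This stale-config case is the one the reconcile paragraph warns about. Because the workflow cannot run, the sketch falls back to prose and also asks for `default-orchestrator` to be reset. That reset keeps the prose loop's stop-nudge from staying suppressed.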
@@ -5,7 +5,7 @@
 * opt-in, Claude-Code-only driver; the prose skill remains the canonical, host-agnostic
 * default. The workflow encodes the loop CONTROL: while/stall, mandatory scan + cross-history
 * axis check, research escalation (ideators on stall / every ~5 commits), brief + diversity,
-* fan-out + verify, collect + frontier-select. A concurrent META thread (
+* fan-out + verify, collect + frontier-select. A concurrent META thread (session model, self-paced,
 * read-only) runs alongside the round loop via Promise.all — host + cross-history checks during
 * rounds, feeding hints into the next brief. All domain work goes through the `evo` CLI inside
 * agents — the script itself never touches the filesystem/shell.
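The header comment describes two loops that share one process through `Promise.all`. The following is a minimal sketch of that shape built on in-memory stand-ins. `runWithMeta`, `optimizeLoop`, and `metaLoop` are hypothetical, and a timer takes the place of real subagent work. A shared `done` flag stands in for both the workflow's `done` variable and the sentinel file that its Bash wait polls.

```javascript
// Hypothetical sketch of the concurrency shape: an optimize loop and a
// self-paced meta loop run side by side; a shared flag stops the meta.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function runWithMeta({ rounds, metaIntervalMs, hopMs }) {
  let done = false
  const events = []

  async function optimizeLoop() {
    for (let round = 1; round <= rounds; round++) {
      events.push(`round ${round}`)
      await sleep(20) // stand-in for one round of subagent work
    }
    done = true // tell the meta loop to stop
    return { rounds }
  }

  async function metaLoop() {
    let ticks = 0
    while (!done) {
      // Interruptible wait: sleep in short hops so a finished optimize loop
      // is noticed within one hop instead of one full interval.
      for (let waited = 0; waited < metaIntervalMs && !done; waited += hopMs) {
        await sleep(hopMs)
      }
      if (done) break
      ticks++
      events.push(`meta tick ${ticks}`) // observe + report would happen here
    }
    return ticks
  }

  const [result, metaTicks] = await Promise.all([optimizeLoop(), metaLoop()])
  return { ...result, metaTicks, events }
}

runWithMeta({ rounds: 3, metaIntervalMs: 15, hopMs: 5 }).then((r) => {
  console.log(r.rounds, r.metaTicks, r.events.join(', '))
})
```

The number of meta ticks depends on timer scheduling. What is deterministic is that `Promise.all` resolves only after the optimize loop finishes and the meta loop has seen `done`.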
@@ -36,7 +36,7 @@ export const meta = {
 { title: 'Optimize', detail: 'parallel optimization subagents (evo new/run)' },
 { title: 'Verify', detail: 'validity audit + benchmark-noise confirm' },
 { title: 'Collect', detail: 'prune dead lineages, record cross-cutting notes' },
-{ title: 'Meta', detail: 'concurrent controller (
+{ title: 'Meta', detail: 'concurrent controller (session model) — host/cross-history checks, STOP doomed runs, suggest directions, AND restructure the workflow live (logic flow + prompts)' },
 { title: 'Meta-step', detail: 'extra agent step the meta injected into the round via an inject-step harness edit' },
 ],
 }
@@ -188,6 +188,10 @@ const META_FINDINGS = {
 properties: {
 briefHints: { type: 'array', items: { type: 'string' } },
 alerts: { type: 'array', items: { type: 'string' } },
+// META-LOG: a working note to the meta's FUTURE ticks (each tick is a fresh agent — this is
+// its only reasoning memory). Observations, pending hypotheses, evidence trails, watch-items.
+// The loop accumulates these and feeds recent entries back into every tick's prompt.
+journal: { type: 'string' },
 // STOP recommendations for in-flight experiments that are clearly doomed.
 // A stop is NOT a crash: each carries the diagnosis + a fix so the gated
 // enforcer can abort, annotate (lesson outlives the worktree), and classify+
@@ -206,10 +210,10 @@ const META_FINDINGS = {
 },
 },
 // FREE-WILL harness edits: the meta agent restructures the WORKFLOW itself, live
-// (its logic flow + prompts
-// and audited (editLog + run log + returned state).
-//
-//
+// (its logic flow + prompts, including the two verifier-gate prompts). Applied directly
+// each tick — no allow-list, no caps — and audited (editLog + run log + returned state).
+// It never touches the benchmark / grader / scorer (those define how results are judged,
+// so they stay fixed and the score stays comparable).
 harnessEdits: {
 type: 'array',
 items: {
@@ -224,8 +228,10 @@ const META_FINDINGS = {
 // toggle-phase — turn a discretionary phase on/off
 phaseName: { enum: ['scan', 'ideate'] },
 enabled: { type: 'boolean' },
-// set-prompt — edit a prompt the workflow uses
-
+// set-prompt — edit a prompt the workflow uses. Appends ACCUMULATE (each adds a standing
+// directive); replace swaps the base prompt wholesale. preverify/audit cover the two
+// verifier gates (design-time cheating audit, post-run validity audit).
+target: { enum: ['state', 'scan', 'aggregate', 'brief', 'implement', 'run', 'ideator', 'collect', 'preverify', 'audit'] },
 mode: { enum: ['append', 'replace'] },
 text: { type: 'string' },
 // inject-step — add an extra agent step at a fixed seam each round
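The `target` enum now covers `preverify` and `audit`, and `mode` is limited to `append` or `replace`. The check below is a hypothetical validator that mirrors those two enums. The workflow itself relies on the `META_FINDINGS` schema for this check, and `applyHarnessEdit` treats any mode other than `replace` as `append`, so the sketch leaves `mode` optional. `'grader'` is deliberately absent from the targets, consistent with the rule that the benchmark, grader, and scorer stay fixed.

```javascript
// Hypothetical validator for a set-prompt harness edit. The allowed targets
// and modes are copied from the harnessEdits schema; the function is not evo code.
const PROMPT_TARGETS = ['state', 'scan', 'aggregate', 'brief', 'implement', 'run',
  'ideator', 'collect', 'preverify', 'audit']
const PROMPT_MODES = ['append', 'replace']

function checkSetPrompt(edit) {
  const errors = []
  if (!PROMPT_TARGETS.includes(edit.target)) errors.push(`unknown target: ${edit.target}`)
  // Mode may be omitted: applyHarnessEdit falls back to 'append'.
  if (edit.mode !== undefined && !PROMPT_MODES.includes(edit.mode)) {
    errors.push(`unknown mode: ${edit.mode}`)
  }
  if (typeof edit.text !== 'string' || edit.text.length === 0) {
    errors.push('text must be a non-empty string')
  }
  return errors
}

console.log(checkSetPrompt({ target: 'audit', mode: 'append', text: 'Flag hard-coded answers.' })) // []
console.log(checkSetPrompt({ target: 'grader', mode: 'replace', text: 'Accept everything.' }))
```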
@@ -257,7 +263,7 @@ const IDEATE_EVERY_COMMITS = 5 // periodic research cadence (matches prose ste
 const PREVERIFY_MAX = 3 // pre-run verify <-> revise attempts before discarding a rigged edit
 // Concurrent meta thread (runs alongside the round loop, NOT per-round).
 const META_ENABLED = true
-const META_MODEL = '
+const META_MODEL = 'inherit' // the meta reasons with the session model (judgment-heavy; never below the loop it supervises)
 const META_INTERVAL_S = 300 // self-pace: observe ~every 5 min, during rounds
 const META_HOP_S = 15 // the wait is INTERRUPTIBLE in hops of this size: when the optimize loop
 // ends mid-wait it drops a sentinel the meta polls, so the in-flight
@@ -524,18 +530,21 @@ function collectPrompt(results, round) {
 ].join(' ')
 }

-// One meta tick (a FRESH
-// the
+// One meta tick (a FRESH agent each call — no memory across ticks: `reported` carries the dedup
+// state and `metaJournal` the reasoning notes, both in the loop's closure). Read-only: observes
+// host + cross-history signals DURING
 // rounds, returns work-quality briefHints (folded into the next brief) + runtime alerts (surfaced).
-function metaPrompt(ctx, intervalS, reported) {
+function metaPrompt(ctx, intervalS, reported, journal) {
 return [
 'You are the evo META agent — an independent controller running CONCURRENTLY with the optimize loop.',
-'You do NOT edit experiment code, run experiments, or touch the benchmark/grader
+'You do NOT edit experiment code, run experiments, or touch the benchmark/grader. But you DO shape the optimize WORKFLOW: stop doomed experiments, suggest next directions (briefHints), AND restructure the running workflow itself — its logic flow + prompts, including the two verifier-gate prompts (preverify, audit) — via harnessEdits (your distinctive power, detailed below).',
 `FIRST pace yourself with an INTERRUPTIBLE wait, so you stop promptly when the optimize loop ends. Run this single Bash command with a tool timeout of at least ${(intervalS + 30) * 1000} ms:`,
 ` \`if [ -f ${DONE_SENTINEL} ]; then echo OPTIMIZE_DONE; else for i in $(seq 1 ${Math.ceil(intervalS / META_HOP_S)}); do sleep ${META_HOP_S}; [ -f ${DONE_SENTINEL} ] && { echo OPTIMIZE_DONE; break; }; done; fi\``,
 `If that prints OPTIMIZE_DONE, the optimize loop has finished — return {"briefHints":[],"alerts":[],"stops":[],"harnessEdits":[]} immediately WITHOUT gathering any signals. Otherwise the full interval elapsed: now gather signals and report.`,
 `Current loop state: round=${ctx.round}, stall=${ctx.stall}/${LIMIT}, best=${ctx.bestScore}.`,
 `Already reported (do NOT repeat — only emit findings NEW since these): ${JSON.stringify(reported || [])}.`,
+`META-LOG (notes your past ticks left for you — each tick is a fresh agent, this is your only reasoning memory): ${JSON.stringify(journal || [])}.`,
+'Optionally return `journal`: one concise working note to your future ticks — observations not yet actionable, pending hypotheses with the evidence so far, watch-items to re-check. Omit it when there is nothing worth carrying forward.',
 'Walk these checks (skip any whose inputs are unavailable; cite evidence; nothing speculative):',
 '- Zombie GPU: `nvidia-smi --query-compute-apps=pid,used_memory,process_name --format=csv,noheader` + `ps` — a PID holding >=4GB not tied to an active `evo run`. ALERT with a verify clause (do NOT kill).',
 '- Buried stderr warning: tail recent experiment stderr under `.evo/run_*/experiments/*/attempts/*/` for tokenizer / EOS / chat_template / parity-mismatch lines not already annotated. ALERT.',
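Two formulas drive the interruptible wait in `metaPrompt`. The hop count is `Math.ceil(intervalS / META_HOP_S)`, and the requested Bash tool timeout is `(intervalS + 30) * 1000`. The sketch below evaluates both with the diff's constants. `waitPlan` and its field names are hypothetical.

```javascript
// Arithmetic behind the meta's interruptible wait. Constants and formulas are
// copied from the workflow diff; `waitPlan` itself is a hypothetical helper.
const META_INTERVAL_S = 300
const META_HOP_S = 15

function waitPlan(intervalS, hopS) {
  const hops = Math.ceil(intervalS / hopS) // iterations of `sleep hopS` in the Bash loop
  return {
    hops,
    maxWaitS: hops * hopS, // full wait when the sentinel never appears
    worstStopLatencyS: hopS, // the sentinel is re-checked after every hop
    toolTimeoutMs: (intervalS + 30) * 1000, // Bash tool timeout the prompt requests
  }
}

const plan = waitPlan(META_INTERVAL_S, META_HOP_S)
console.log(plan.hops, plan.maxWaitS, plan.toolTimeoutMs) // 20 300 330000
```

Because the hop count rounds up, an interval that does not divide evenly overshoots slightly. For example, 310 s becomes 21 hops, or 315 s. The 30 s of headroom still covers that case with a 340000 ms timeout, and a finished optimize loop is noticed within one 15 s hop.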
@@ -546,8 +555,8 @@ function metaPrompt(ctx, intervalS, reported) {
 'For STOPs you stay READ-ONLY: do NOT run `evo abort` / `evo discard` yourself. A gated enforcer acts on each stop — it aborts the run + its subprocess tree, annotates your diagnosis (so it outlives the worktree and feeds the next round), and discards with the failureClass so the partial artifact is preserved. A STOP is a diagnosed, recoverable stop, never a silent kill.',
 '',
 'HARNESS CONTROL (your distinctive power): you may restructure the optimize workflow itself, live, when you judge it will help — edits apply directly (free will) and take effect next round. Current harness state: ' + JSON.stringify(harnessSummary()) + '.',
-'harnessEdits ops: (1) set-knob {knob: width|budget|stall|ideateEvery|ideateStall, value} — retune the loop (widen the round, deepen branches, change the stall limit or ideation cadence). (2) toggle-phase {phaseName: scan|ideate, enabled} — turn a phase off/on (e.g. skip scan when traces are uninformative; force ideation early). (3) set-prompt {target: state|scan|aggregate|brief|implement|run|ideator|collect, mode: append|replace, text} — edit the prompt that step uses (
-'HARD CONSTRAINT: edit ONLY the search harness above. NEVER propose edits to the benchmark, grader, scorer, held-out test, or any gate — those define how results are judged; if you change them the score stops meaning anything. Emit harnessEdits ONLY with concrete evidence the current workflow SHAPE is the bottleneck; most ticks should emit none.',
+'harnessEdits ops: (1) set-knob {knob: width|budget|stall|ideateEvery|ideateStall, value} — retune the loop (widen the round, deepen branches, change the stall limit or ideation cadence). (2) toggle-phase {phaseName: scan|ideate, enabled} — turn a phase off/on (e.g. skip scan when traces are uninformative; force ideation early). (3) set-prompt {target: state|scan|aggregate|brief|implement|run|ideator|collect|preverify|audit, mode: append|replace, text} — edit the prompt that step uses. Appends ACCUMULATE as standing directives (the current ones are visible in the harness state above — do not re-add them); replace swaps the base wholesale. Use preverify/audit to harden the verifier when you spot a cheat pattern the audit missed. (4) inject-step {at: before-scan|after-scan|before-brief|after-collect, text, label} — add an extra agent step at that seam each round. Every edit needs a rationale citing the evidence.',
+'HARD CONSTRAINT: edit ONLY the search harness above. NEVER propose edits to the benchmark, grader, scorer, held-out test, or any gate — those define how results are judged; if you change them the score stops meaning anything. Verifier prompt edits are the one sanctioned contact with judging. Emit harnessEdits ONLY with concrete evidence the current workflow SHAPE is the bottleneck; most ticks should emit none.',
 'Return {briefHints:[...], alerts:[...], stops:[...], harnessEdits:[...]}. briefHints feed the NEXT round\'s briefs; alerts surface to the user; each stop triggers the gated enforcer; each harnessEdit is applied directly to the running workflow. All-empty is fine — most ticks should be quiet.',
 ].join('\n')
 }
@@ -571,20 +580,20 @@ async function runBrief(brief, state) {
 let best = null
 for (let depth = 0; depth < harness.budget; depth++) {
 const impl = await agent(withHarnessPrompt('implement', implementPrompt(brief, parent, state)), {
-schema: IMPL_RESULT,
+schema: IMPL_RESULT, ...(brief.hard ? {} : { model: 'sonnet' }), phase: 'Optimize', label: `impl:${parent}#${depth}`,
 })
 if (!impl || !impl.expId) break

 // pre-verify <-> revise feedback loop (design-time cheating gate)
 let pv = null
 for (let v = 0; v < PREVERIFY_MAX; v++) {
-pv = await agent(preVerifyPrompt(impl.expId, impl.worktree), {
+pv = await agent(withHarnessPrompt('preverify', preVerifyPrompt(impl.expId, impl.worktree)), {
 schema: PREVERDICT, agentType: 'evo:verifier', phase: 'Verify', label: `preverify:${impl.expId}#${v}`,
 })
 if (pv && pv.pass) break
 if (v < PREVERIFY_MAX - 1) {
 await agent(revisePrompt(impl.expId, impl.worktree, pv && pv.findings), {
-
+...(brief.hard ? {} : { model: 'sonnet' }), phase: 'Optimize', label: `revise:${impl.expId}#${v}`,
 })
 }
 }
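Two behaviors in `runBrief` are easy to misread. First, the pre-verify loop makes at most `PREVERIFY_MAX` verifier calls, with a revise step between failures and never after the last one. Second, the conditional spread pins non-hard briefs to `'sonnet'` while hard briefs get no `model` override at all. The sketch below isolates both behaviors using synchronous stubs. `preverifyLoop`, `implOptions`, and the stub signatures are hypothetical.

```javascript
const PREVERIFY_MAX = 3 // value from the diff

// Hypothetical synchronous stand-in for the pre-verify <-> revise loop in runBrief.
// verify(attempt) returns { pass, findings }; revise(findings) patches the edit.
function preverifyLoop(verify, revise) {
  let pv = null
  let revisions = 0
  for (let v = 0; v < PREVERIFY_MAX; v++) {
    pv = verify(v)
    if (pv && pv.pass) break
    // Revise only when another verify attempt remains.
    if (v < PREVERIFY_MAX - 1) {
      revise(pv && pv.findings)
      revisions++
    }
  }
  return { passed: !!(pv && pv.pass), revisions }
}

// Agent options as built in runBrief: non-hard briefs are pinned to 'sonnet';
// hard briefs carry no model override.
function implOptions(brief) {
  return { ...(brief.hard ? {} : { model: 'sonnet' }), phase: 'Optimize' }
}

const rigged = preverifyLoop(() => ({ pass: false, findings: ['reads grader output'] }), () => {})
console.log(rigged) // { passed: false, revisions: 2 }
console.log(implOptions({ hard: true }).model, implOptions({}).model) // undefined sonnet
```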
@@ -603,7 +612,7 @@ async function runBrief(brief, state) {
 if (!r.bestExpId || typeof r.bestScore !== 'number') {
 valid = false
 } else {
-const audit = await agent(auditPrompt(r.bestExpId), { schema: VERDICT, agentType: 'evo:verifier', phase: 'Verify', label: `audit:${r.bestExpId}` })
+const audit = await agent(withHarnessPrompt('audit', auditPrompt(r.bestExpId)), { schema: VERDICT, agentType: 'evo:verifier', phase: 'Verify', label: `audit:${r.bestExpId}` })
 valid = !!(audit && audit.valid)
 if (valid && (Number(state.verifyRepeats) || 1) > 1) {
 log(`note: ${r.bestExpId} on a noisy benchmark (repeats=${state.verifyRepeats}); confirm-loop pending the evo rescore affordance — relying on the held-out gate`)
@@ -632,6 +641,7 @@ let ideatedThisStall = false // fire ideators once per stall episode, not ever
 let lastBestScore = null // latest best score, surfaced to the concurrent meta thread
 let done = false // set when the optimize loop ends -> stops the meta thread
 const metaSignals = [] // briefHints the meta pushes; drained into the next round's brief
+const metaJournal = [] // meta-log: { tick, round, note } working notes the meta writes to its future ticks

 // ---------------------------------------------------------------------------
 // Mutable HARNESS (the round-plan + prompts the meta agent edits live, free-will).
@@ -649,18 +659,19 @@ const harness = {
 ideateEvery: IDEATE_EVERY_COMMITS,
 ideateStall: IDEATE_STALL,
 phases: { scan: true, ideate: true },
-prompts: {}, // target -> {
+prompts: {}, // target -> { replace: string|null, appends: [string] } — appends accumulate
 injectedSteps: [], // { at, prompt, label }
 editLog: [], // audit trail: { round, op, ...spec, rationale }
 }

-// Apply
+// Apply meta prompt overrides to a base prompt: a replace (if any) swaps the base wholesale,
+// then every accumulated append rides on top as a standing directive.
 function withHarnessPrompt(target, baseText) {
 const o = harness.prompts[target]
-if (!o
-
-
-
+if (!o) return baseText
+let text = o.replace != null ? o.replace : baseText
+for (const a of o.appends) text += '\n\n[META-ADDED DIRECTIVE — injected live by the meta agent]: ' + a
+return text
 }

 // Run any meta-injected extra steps registered at a given seam (insert-step op).
@@ -683,8 +694,11 @@ function applyHarnessEdit(e, atRound) {
 } else if (e.op === 'toggle-phase' && e.phaseName) {
 harness.phases[e.phaseName] = e.enabled !== false; rec.phaseName = e.phaseName; rec.enabled = harness.phases[e.phaseName]
 } else if (e.op === 'set-prompt' && e.target && e.text) {
-
-
+const mode = e.mode === 'replace' ? 'replace' : 'append'
+const o = harness.prompts[e.target] || (harness.prompts[e.target] = { replace: null, appends: [] })
+if (mode === 'replace') o.replace = e.text
+else o.appends.push(e.text)
+rec.target = e.target; rec.mode = mode
 } else if (e.op === 'inject-step' && e.at && e.text) {
 harness.injectedSteps.push({ at: e.at, prompt: e.text, label: e.label || `meta:${e.at}` }); rec.at = e.at; rec.label = e.label || `meta:${e.at}`
 } else {
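The difference between accumulating and replacing can be exercised outside the workflow. The sketch below re-creates the set-prompt branch of `applyHarnessEdit` and the body of `withHarnessPrompt` from this diff around a local `harness` object. `applySetPrompt` is a hypothetical name, and the directive marker text is shortened.

```javascript
// Hypothetical re-creation of the set-prompt path from this diff, around a
// local `harness` object. Appends accumulate; a replace swaps only the base.
const harness = { prompts: {} }
const MARKER = '\n\n[META-ADDED DIRECTIVE]: ' // shortened from the workflow's marker text

// Mirrors the set-prompt branch of applyHarnessEdit: any mode other than
// 'replace' is treated as 'append'.
function applySetPrompt(e) {
  const mode = e.mode === 'replace' ? 'replace' : 'append'
  const o = harness.prompts[e.target] || (harness.prompts[e.target] = { replace: null, appends: [] })
  if (mode === 'replace') o.replace = e.text
  else o.appends.push(e.text)
  return mode
}

// Mirrors withHarnessPrompt: a replace (if set) swaps the base, then every
// accumulated append rides on top as a standing directive.
function withHarnessPrompt(target, baseText) {
  const o = harness.prompts[target]
  if (!o) return baseText
  let text = o.replace != null ? o.replace : baseText
  for (const a of o.appends) text += MARKER + a
  return text
}

applySetPrompt({ target: 'audit', text: 'Check for hard-coded expected outputs.' })
applySetPrompt({ target: 'audit', mode: 'replace', text: 'Stricter base audit prompt.' })
applySetPrompt({ target: 'audit', text: 'Reject edits that read held-out data.' })
console.log(withHarnessPrompt('audit', 'Default audit prompt.'))
console.log(withHarnessPrompt('scan', 'Default scan prompt.')) // Default scan prompt.
```

Appends live in their own array, so the first directive survives the later replace. The audit prompt therefore prints the replaced base followed by both directives, while an untouched target such as `scan` passes its base prompt through unchanged.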
@@ -699,7 +713,9 @@ function harnessSummary() {
 width: harness.width, budget: harness.budget, stall: harness.stall,
 ideateEvery: harness.ideateEvery, ideateStall: harness.ideateStall,
 phases: harness.phases,
-
+// Full directive texts included so fresh meta ticks see standing overrides and don't duplicate them.
+promptsOverridden: Object.entries(harness.prompts).map(([k, v]) =>
+({ target: k, replaced: v.replace != null, appends: v.appends })),
 injectedSteps: harness.injectedSteps.map((s) => `${s.at}:${s.label}`),
 edits: harness.editLog.length,
 }
@@ -830,14 +846,15 @@ async function metaLoop() {
 // abort the optimizer. Swallow any tick error, log it, and continue (or exit if `done` flipped).
 let tick = null
 try {
-tick = await agent(metaPrompt({ round, stall, bestScore: lastBestScore }, META_INTERVAL_S, reported.slice(-30)), {
-agentType: 'Explore',
+tick = await agent(metaPrompt({ round, stall, bestScore: lastBestScore }, META_INTERVAL_S, reported.slice(-30), metaJournal.slice(-20)), {
+agentType: 'Explore', schema: META_FINDINGS, phase: 'Meta', label: `meta#${t}`,
 })
 } catch (e) {
 log(`META tick #${t} errored (ignored, optimize unaffected): ${(e && e.message) || e}`)
 }
 if (tick) {
 fails = 0 // a real tick resets the failure streak
+if (tick.journal) metaJournal.push({ tick: t, round, note: tick.journal })
 for (const h of (tick.briefHints || [])) { metaSignals.push(h); reported.push(h) }
 for (const a of (tick.alerts || [])) { log(`META ALERT: ${a}`); reported.push(a) }
 // HARNESS EDITS (new ability): the meta restructures the workflow itself live — applied
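The META-LOG gives a stateless meta agent a bounded memory. The loop keeps every note, but each prompt sees only the newest twenty. A hypothetical sketch of that bookkeeping follows. `recordTick` and `journalForPrompt` are illustrative names; the `{ tick, round, note }` entry shape and the window of 20 come from the diff.

```javascript
// Hypothetical sketch of the META-LOG bookkeeping added in this release. Each
// meta tick is a fresh agent, so its notes survive only in this array; the
// loop feeds the newest entries into the next tick's prompt.
const JOURNAL_WINDOW = 20 // the diff passes metaJournal.slice(-20)
const metaJournal = []

// Record a tick's optional `journal` note; quiet ticks leave the log unchanged.
function recordTick(t, round, tick) {
  if (tick && tick.journal) metaJournal.push({ tick: t, round, note: tick.journal })
}

// What the next metaPrompt would embed: only the most recent notes.
function journalForPrompt() {
  return JSON.stringify(metaJournal.slice(-JOURNAL_WINDOW))
}

// Simulate 30 ticks where every fifth tick has nothing worth carrying forward.
for (let t = 1; t <= 30; t++) {
  recordTick(t, Math.ceil(t / 3), t % 5 === 0 ? {} : { journal: `note from tick ${t}` })
}
const recent = JSON.parse(journalForPrompt())
console.log(metaJournal.length, recent.length, recent[0].tick) // 24 20 6
```

The full array is also what the workflow now returns as `metaJournal`, so notes that scroll out of the prompt window remain available in the final result.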
@@ -882,4 +899,4 @@ const [optimizeResult] = await Promise.all([
 // Surface the harness audit alongside the optimize result: final round-plan + every live edit the
 // meta agent applied (knobs, phase toggles, prompt overrides, injected steps), so the run is fully
 // reconstructable from the return value + the run log.
-return { ...optimizeResult, harness: harnessSummary(), harnessEditLog: harness.editLog }
+return { ...optimizeResult, harness: harnessSummary(), harnessEditLog: harness.editLog, metaJournal }
package/skills/report/SKILL.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 name: report
 description: Print the dashboard's dot chart (score over experiment order, status colors, best-path stair) inline in the terminal for every run in the workspace. Use when the user invokes /evo:report, asks for a quick score chart without opening the dashboard, or wants the scatter plot in chat output.
-evo_version: 0.5.
+evo_version: 0.5.3
 ---

 # Report
package/skills/subagent/SKILL.md
CHANGED
@@ -1,7 +1,7 @@
 ---
 name: subagent
 description: Protocol that evo optimization subagents follow when dispatched from /optimize. Auto-loaded by spawned subagents via their host's skill loader. The orchestrator may also invoke this skill to understand the brief shape its dispatched subagents expect + what they're required to emit -- useful when writing briefs or debugging a subagent's behavior.
-evo_version: 0.5.
+evo_version: 0.5.3
 ---

 # Evo Subagent Protocol