npm - oh-my-opencode - Versions diffs - 4.8.1 → 4.9.0 - Mend

oh-my-opencode 4.8.1 → 4.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (507) hide show

package/packages/omo-codex/plugin/skills/ulw-loop/SKILL.md CHANGED Viewed

@@ -9,12 +9,12 @@ metadata:
 Use this skill when the user asks for `ulw-loop`, `ulw`, durable goal execution, evidence-led work, manual QA, or checkpointed long-running delivery.
-This Codex skill is intentionally compact to avoid adding a large operating manual to an already-full conversation. The full workflow lives in `references/full-workflow.md`. Read only the sections needed for the current phase, then execute them exactly.
+This skill is intentionally compact. The full workflow lives in `references/full-workflow.md`. Read only the sections needed for the current phase, then execute them exactly.
 ## Required First Steps
 1. Open `references/full-workflow.md`.
-2. Read through **Bootstrap**, **Execution Loop**, and the **Manual-QA channels** table before running any ULW command or recording evidence.
+2. Read through **Bootstrap** (including its tier triage), **Execution Loop**, and the **Manual-QA channels** table before running any ULW command or recording evidence.
 3. If the task has code edits, tests, QA, or commit work, follow the full workflow's delegation and evidence rules. Tests alone never prove done.
 ## Non-Negotiables
@@ -22,15 +22,15 @@ This Codex skill is intentionally compact to avoid adding a large operating manu
 - Use the ulw-loop CLI state under `.omo/ulw-loop`; do not hand-edit goal state.
 - After any compaction or context loss, re-read brief + goals + ledger FIRST (`omo sparkshell cat .omo/ulw-loop/ledger.jsonl` or read directly) plus `omo ulw-loop status --json`, then resume; never re-plan from scratch.
 - If `omo ulw-loop create-goals` says the existing aggregate is already complete, start unrelated new work with a fresh `--session-id <new-id>` instead of steering or forcing the completed default state. Use `--force` only to intentionally overwrite completed evidence.
-- Every success criterion needs observable evidence from a real channel: tmux, HTTP, browser, or computer-use.
+- Every success criterion needs observable evidence from a real surface: a channel (tmux, HTTP, browser, computer-use) or, for CLI- or data-shaped criteria, an auxiliary surface (CLI stdout, DB diff, parsed config dump).
 - Record evidence through the CLI only after cleanup receipts are available.
 - Delegate code edits, test writes, fixes, and QA execution to right-sized Codex subagents when the workflow requires it.
-- Every `spawn_agent` message starts with `TASK:`, then names `DELIVERABLE`, `SCOPE`, and `VERIFY`; put role and specialty instructions inside `message` because the Codex tool schema only accepts `task_name`, `message`, and `fork_turns`; prefer `fork_turns: "none"` unless full history is truly required.
-- Plan and reviewer agents may run for a long time; spawn them in the background, keep doing independent root work, and poll with short wait_agent cycles. Never use a single long blocking wait for them.
+- Every `multi_agent_v1.spawn_agent` message starts with `TASK:`, then names `DELIVERABLE`, `SCOPE`, and `VERIFY`; put role and specialty instructions inside `message`; use `fork_context: false` unless full history is truly required.
+- Plan and reviewer agents may run for a long time; spawn them in the background, keep doing independent root work, and poll with short `multi_agent_v1.wait_agent` cycles. Never use a single long blocking wait for them.
 - For work likely to exceed one wait cycle, require the child to send `WORKING: <task> - <current phase>` before long reading, testing, or review passes, and `BLOCKED: <reason>` only when it cannot progress.
-- While any child is active, keep the parent visibly alive with brief status updates that include active subagent count, agent names, latest `WORKING:` phase, and whether the parent is waiting for mailbox updates.
-- Track spawned agent names locally. Use `wait_agent` for mailbox signals, not proof of completion. A timeout only means no new mailbox update arrived; after a timeout, run a single `list_agents` check for the named child when you need reassurance. If it is running or its latest message is `WORKING:`, treat it as alive.
-- Do not use `list_agents` as a polling loop or status feed; it can replay large payloads. Fallback only when the child is completed without the deliverable, ack-only after followup, explicitly `BLOCKED:`, or no longer running. Then record inconclusive and respawn a smaller `fork_turns: "none"` task with the missing deliverable.
+- Track spawned agent names locally. Use `multi_agent_v1.wait_agent` for mailbox signals, not proof of completion. A timeout only means no new mailbox update arrived. Treat a running child as alive.
+- While children run, surface the active subagent count, agent names, and latest `WORKING:` phase.
+- Fallback only when the child is completed without the deliverable, ack-only after followup, explicitly `BLOCKED:`, or no longer running. Then record inconclusive and respawn a smaller `fork_context: false` task with the missing deliverable.
 - Use `git-master` for git-tracked edits: inspect recent and touched-path commit history, then commit each verified work unit atomically in the repository's observed language, scope, and message style with only that unit's files staged.
 ## Codex Tool Mapping
@@ -39,11 +39,11 @@ The full workflow may mention OpenCode-style orchestration examples. In Codex, t
 | Workflow intent | Codex tool |
 | --- | --- |
-| Plan agent | `spawn_agent({"task_name":"...","message":"TASK: act as a planning agent. ...","fork_turns":"none"})` |
-| Search/read-only worker | `spawn_agent({"task_name":"...","message":"TASK: act as an explorer. ...","fork_turns":"none"})` |
-| Implementation or QA worker | `spawn_agent({"task_name":"...","message":"TASK: act as an implementation or QA worker. ...","fork_turns":"none"})` |
-| Final verification reviewer | `spawn_agent({"task_name":"...","message":"TASK: act as a rigorous reviewer. ...","fork_turns":"none"})` |
-| Wait for background result | `wait_agent(...)` |
-| Clean up finished worker | `close_agent(...)` |
+| Plan agent | `multi_agent_v1.spawn_agent({"message":"TASK: act as a planning agent. ...","fork_context":false})` |
+| Search/read-only worker | `multi_agent_v1.spawn_agent({"message":"TASK: act as an explorer. ...","fork_context":false})` |
+| Implementation or QA worker | `multi_agent_v1.spawn_agent({"message":"TASK: act as an implementation or QA worker. ...","fork_context":false})` |
+| Final verification reviewer | `multi_agent_v1.spawn_agent({"message":"TASK: act as a rigorous reviewer. ...","fork_context":false})` |
+| Wait for background result | `multi_agent_v1.wait_agent(...)` |
+| Clean up finished worker | `multi_agent_v1.close_agent(...)` |
 When translating `load_skills=[...]`, include the requested skill names in the spawned agent's `message`.

package/packages/omo-codex/plugin/skills/ulw-loop/references/full-workflow.md CHANGED Viewed

@@ -15,20 +15,20 @@ Prove EVERY success criterion with captured observable evidence from a real-usag
 TESTS ALONE NEVER PROVE DONE. A green test suite is supporting evidence, not completion proof.
 Audit each pass, fail, block, steering change, and checkpoint in `.omo/ulw-loop/ledger.jsonl`.
-## Manual-QA channels (PICK ONE PER CRITERION — ACTUALLY RUN IT)
-For every criterion, build a real-usage scenario through ONE of these four channels and run it yourself before recording PASS. The full test suite being green is NEVER verification on its own.
+## Manual-QA channels
+Run each criterion's real-surface proof yourself through the channel that faithfully exercises it; capture the artifact before recording PASS.
 1. **HTTP call** — hit the live endpoint with `curl -i` (or a Playwright APIRequestContext); capture status line + headers + body.
 2. **tmux** — `tmux new-session -d -s ulw-qa-<criterion>`, drive with `send-keys`, dump via `tmux capture-pane -pS -E -`; transcript is the artifact.
 3. **Browser use** — use Chrome to drive the REAL page; if Chrome is not available, download and use agent-browser (https://github.com/vercel-labs/agent-browser). Capture action log + screenshot path. Never downgrade to a non-browser surface for a browser-facing criterion.
 4. **Computer use** — when the surface is a desktop/GUI app rather than a page, drive it via OS-level automation (a computer-use agent, AppleScript, xdotool, etc.) against the running app; capture action log + screenshot. Use this for any non-browser GUI criterion.
-Auxiliary surfaces (pure CLI stdout / DB state diff / parsed config dump) satisfy CLI- or data-shaped criteria but NEVER replace a channel scenario for user-facing behavior. `--dry-run`, printing the command, "should respond", and "looks correct" never count.
+Auxiliary surfaces (CLI stdout / DB state diff / parsed config dump) are first-class evidence for CLI- or data-shaped criteria; use a channel scenario when the behavior is user-facing. `--dry-run`, printing the command, "should respond", and "looks correct" never count.
 ## Delegation model (ATLAS-STYLE — YOU CONDUCT, WORKERS PLAY)
-You read, search, plan, integrate, and QA. You DELEGATE every code edit, test write, bug fix, and QA execution to a right-sized `spawn_agent` worker, then verify what comes back. Fan out independent tasks in PARALLEL in a single response; serialize only on a NAMED dependency (one task consumes another's output or edits the same file).
+You read, search, plan, integrate, and QA. You DELEGATE every code edit, test write, bug fix, and QA execution to a right-sized `multi_agent_v1.spawn_agent` worker, then verify what comes back. Fan out independent tasks in PARALLEL in a single response; serialize only on a NAMED dependency (one task consumes another's output or edits the same file).
-Size each worker to the task. Put the intended role, rigor level, and specialty inside the worker `message`; the Codex `spawn_agent` schema only accepts `task_name`, `message`, and `fork_turns`.
+Size each worker to the task. Put the intended role, rigor level, and specialty inside the worker `message`.
 | Task shape | Message instruction |
 |---|---|
@@ -42,16 +42,16 @@ Size each worker to the task. Put the intended role, rigor level, and specialty
 For reviewer work, use a self-contained reviewer assignment, tight scope, and explicit verification in `message`. Never spawn a context-only child for review.
-Every worker message MUST carry: goal + exact files in scope; the baseline characterization test pinning current behavior when the task touches existing code, then the failing test / reproduction required before production code; constraints + project rules; the verification commands to run; the ONE Manual-QA channel and the exact evidence artifact to capture; for git-tracked edits, require `git-master` plus repository-wide and touched-path commit history inspection before commit. Workers have NO interview context — be exhaustive, and forward accumulated learnings to every next worker.
+Every worker message MUST carry: goal + exact files in scope; the PIN + failing-first proof required before production code (Per-Criterion Cycle step 3); constraints + project rules; the verification commands to run; the ONE Manual-QA channel and the exact evidence artifact to capture; for git-tracked edits, require `git-master` plus repository-wide and touched-path commit history inspection before commit. Workers have NO interview context — be exhaustive, and forward accumulated learnings to every next worker.
 Codex subagent reliability:
-- Start every `spawn_agent` message with `TASK: <imperative assignment>`, then name `DELIVERABLE`, `SCOPE`, and `VERIFY`. State that it is an executable assignment, not a context handoff.
-- Prefer `fork_turns: "none"` unless full history is truly required; paste only the context the child needs. Full-history forks can make the child continue old parent context instead of the delegated task.
-- Plan and reviewer agents may run for a long time; spawn them in the background, keep doing independent root work, and poll with short wait_agent cycles. Never use a single long blocking wait for them.
+- Start every `multi_agent_v1.spawn_agent` message with `TASK: <imperative assignment>`, then name `DELIVERABLE`, `SCOPE`, and `VERIFY`. State that it is an executable assignment, not a context handoff.
+- Use `fork_context: false` unless full history is truly required; paste only the context the child needs. Full-history forks can make the child continue old parent context instead of the delegated task.
+- Plan and reviewer agents may run for a long time; spawn them in the background, keep doing independent root work, and poll with short `multi_agent_v1.wait_agent` cycles. Never use a single long blocking wait for them.
 - For work likely to exceed one wait cycle, require the child to send `WORKING: <task> - <current phase>` before long reading, testing, or review passes, and `BLOCKED: <reason>` only when it cannot progress.
 - While any child is active, keep the parent visibly alive with active subagent count, agent names, latest `WORKING:` phase, and whether the parent is waiting for mailbox updates.
-- Track spawned agent names locally. Use `wait_agent` for mailbox signals, not proof of completion. A timeout only means no new mailbox update arrived; after a timeout, run a single `list_agents` check for the named child when you need reassurance. If it is running or its latest message is `WORKING:`, treat it as alive. Do not use `list_agents` as a polling loop or status feed; it can replay large payloads.
-- Fallback only when the child is completed without the deliverable, ack-only after followup, explicitly `BLOCKED:`, or no longer running. Then send `TASK STILL ACTIVE: return <deliverable> or BLOCKED: <reason>` when a targeted followup can still recover the lane; otherwise record inconclusive, do not count it as pass/review approval, close if safe, and respawn a smaller `fork_turns: "none"` task with the missing deliverable.
+- Track spawned agent names locally. Use `multi_agent_v1.wait_agent` for mailbox signals, not proof of completion. A timeout only means no new mailbox update arrived. Treat a running child as alive.
+- Fallback only when the child is completed without the deliverable, ack-only after followup, explicitly `BLOCKED:`, or no longer running. Then send `TASK STILL ACTIVE: return <deliverable> or BLOCKED: <reason>` when a targeted followup can still recover the lane; otherwise record inconclusive, do not count it as pass/review approval, close if safe, and respawn a smaller `fork_context: false` task with the missing deliverable.
 ## Artifacts
 - `.omo/ulw-loop/brief.md`: original brief and durable constraints.
@@ -117,16 +117,14 @@ Write state through the CLI path. Do not hand-edit state files.
 ### 2. Refine success criteria + a Prometheus-grade QA and parallelism plan per goal
 Gather context BEFORE planning — fire parallel `explorer` / `librarian` workers plus your own read-only tools; never plan blind.
-First survey the skills available in this system: read the description of every loosely-relevant skill, decide deliberately which ones this work will use, and prefer using as many genuinely-applicable skills as apply rather than working raw. Then size the scope: count distinct surfaces, files, and steps. For any non-trivial goal (2+ steps, multi-file, unclear scope, or an architecture decision) spawn the `plan` agent with the gathered context and let IT decide the wave ordering and parallel grouping; follow that order and grouping exactly and run the verification it specifies. Only a genuinely trivial single-step goal may skip the plan agent.
-Define pass/fail acceptance criteria before launching execution lanes. Include the command, artifact, or manual check that will prove success.
-Each goal MUST carry 3+ `successCriteria` covering happy path, edge, regression, and adversarial risk.
-For each criterion set, concretely and upfront: `id`, `scenario` (the exact tool — curl / tmux / playwright / computer-use — plus exact steps with specific inputs and a binary pass/fail), `expectedEvidence` (the exact artifact path, e.g. `.omo/ulw-loop/evidence/<goal>-<criterion>.<ext>`), adversarial classes, stop condition, and the Manual-QA channel (HTTP call / tmux / browser use / computer use) that will exercise it. Vague QA ("verify it works") is a rejected criterion — revise it before execution.
-Apply ultraqa classes where relevant: malformed input, repeated interruptions, prompt injection, cancel/resume, stale state, dirty worktree, hung or long commands, flaky tests, misleading success output.
+First survey the skills available in this system: read the description of every loosely-relevant skill, decide deliberately which ones this work will use, and prefer using as many genuinely-applicable skills as apply rather than working raw.
+Then run tier triage per goal and record it with an `annotate_ledger` steering entry. Default is LIGHT — a narrow change inside existing layers. Take HEAVY only on a fact you can point to: a new module / abstraction / domain model; auth, security, or session; an external integration; a DB schema or migration; concurrency, transaction boundaries, or cache invalidation; a cross-domain refactor; or the user signaled care or demanded review. When unsure, take HEAVY; upgrade the moment a HEAVY fact surfaces and never downgrade mid-run.
+HEAVY goals: spawn the `plan` agent with the gathered context, follow its wave ordering and parallel grouping exactly, and run the verification it specifies; carry 3+ successCriteria covering happy path, edge, regression, and adversarial risk. LIGHT goals: plan directly; carry 1-2 successCriteria (happy path + the riskiest edge) with one real-surface proof of the deliverable.
+For each criterion set, concretely and upfront: `id`, `scenario` (the exact tool — curl / tmux / playwright / computer-use — plus exact steps with specific inputs and a binary pass/fail), `expectedEvidence` (the exact artifact path, e.g. `.omo/ulw-loop/evidence/<goal>-<criterion>.<ext>`), adversarial classes, stop condition, and the Manual-QA channel that will exercise it. Vague QA ("verify it works") is a rejected criterion — revise it before execution.
+A criterion's adversarial classes are the ultraqa classes a fact about the change triggers: malformed input, prompt injection, cancel/resume, stale state, dirty worktree, hung or long commands, flaky tests, misleading success output, repeated interruptions. Record untriggered classes as not-applicable in one line.
 Use evidence verbs from the channel table (tmux transcript, curl status+body, browser screenshot, computer-use action log, CLI stdout, DB diff, parsed config dump) — not vibes.
-"Tests pass" is supporting signal, NEVER completion proof. Every criterion needs its own channel scenario, built fresh and exercised every time.
-**Plan for maximum parallelism.** Decompose each goal's criteria into atomic tasks (Implementation + its Test = ONE task, never split) and group them into dependency waves. Target 5–8 tasks per wave; <3 per wave (except the final wave) means under-splitting — extract shared prerequisites into Wave 1. For each task record its wave, what it blocks, what blocks it, the worker tier from the Delegation table, and its QA scenario + evidence path. Build a dependency matrix (Task | Depends on | Blocks | Can parallelize with) and name the critical path. Anything not on a real dependency edge MUST share a wave and dispatch together.
-Record manual QA notes when behavior is user-visible.
+**Plan for maximum parallelism (HEAVY goals).** Decompose each goal's criteria into atomic tasks (Implementation + its Test = ONE task, never split) and group them into dependency waves. Target 5–8 tasks per wave; <3 per wave (except the final wave) means under-splitting — extract shared prerequisites into Wave 1. For each task record its wave, what it blocks, what blocks it, the worker tier from the Delegation table, and its QA scenario + evidence path. Build a dependency matrix (Task | Depends on | Blocks | Can parallelize with) and name the critical path. Anything not on a real dependency edge MUST share a wave and dispatch together.
 Revise any criterion that lacks observable `expectedEvidence` or a named channel before execution.
 ### 3. Inspect state
@@ -152,11 +150,11 @@ Loop per goal. Cap at 5 cycles per goal. Cap identical same-criterion failures a
 ### Per-Criterion Cycle
 1. PLAN: read `criterion.scenario`, `criterion.expectedEvidence`, prior ledger entries, and safety bounds. Identify which tasks in the current wave are independent.
 2. Register atomic todos via `update_plan` — one ultra-granular step per action, `path: <action> for <criterion> - verify by <check>`. Call `update_plan` on every transition (start → `in_progress`, finish → `completed`); exactly one `in_progress`, mark completed immediately, never batch, never let the rendered plan lag behind reality.
-3. DELEGATE-IN-PARALLEL: dispatch every independent task in the wave at once via right-sized `spawn_agent` workers (Delegation table). Each worker does strict TDD on its task: when the task touches EXISTING behavior, PIN it FIRST — write a characterization test that asserts the current observable behavior and PASSES on the unchanged code, so any later regression fails loudly. Then RED (the new failing assertion must fail for the RIGHT reason — no syntax/import error), then the SMALLEST GREEN change; before GREEN work that depends on external review, PR, issue, or branch state, refresh current branch/PR/issue state, preserve existing ordering/policy, and separate compatibility detection from policy changes unless the goal explicitly asks to change policy. A GREEN needing >~20 lines means the test was too coarse — instruct a split. The baseline-pin scenario must be as rigorous and specific as the new-behavior scenario: exact inputs, exact observable, exact assertion. Serialize only on a NAMED dependency.
+3. DELEGATE-IN-PARALLEL: dispatch every independent task in the wave at once via right-sized `multi_agent_v1.spawn_agent` workers (Delegation table). Each worker captures evidence failing-first: when the task touches EXISTING behavior, PIN it FIRST — a characterization test that asserts the current observable behavior and PASSES on the unchanged code, as rigorous as the new-behavior scenario (exact inputs, exact observable, exact assertion). Then RED through the cheapest faithful channel — a unit test where a seam exists, an integration/e2e test where the behavior lives in wiring, or the criterion's scenario captured failing when no test seam exists — failing for the RIGHT reason (no syntax/import error). A test that mirrors its implementation (mock-call assertions, pinned constants, cannot fail under plausible regression) is not evidence; use the scenario as the failing proof instead. Then the SMALLEST GREEN change; before GREEN work that depends on external review, PR, issue, or branch state, refresh current branch/PR/issue state, preserve existing ordering/policy, and separate compatibility detection from policy changes unless the goal explicitly asks to change policy. A GREEN far larger than the criterion implies means the proof was too coarse — instruct a split. Serialize only on a NAMED dependency.
 4. INTEGRATE + CRITICAL SELF-QA + GIT CHECKPOINT (EVERY WORKER RETURN): do NOT trust the worker's report. Read the diff yourself, re-run its tests, and run LSP diagnostics on the changed files. Treat "done" as a claim to disprove. If the diff drifts, the test is hollow, or evidence is missing, RESPAWN the worker with the specific failure context. Once the work unit is verified, use `git-master` before staging: inspect recent repository commits and touched-path history to infer commit language, Conventional Commit scope, message shape, and unit size. Stage only that unit's files and commit in the observed style; do not carry verified work forward into a later omnibus commit. If no git-tracked files changed or committing is unsafe, record the no-commit reason as evidence. Forward every finding/learning to subsequent workers.
-5. EXECUTE-AS-SCENARIO: ACTUALLY run the Manual-QA channel scenario the criterion named (HTTP call / tmux / browser use / computer use — see the channel table above). Run it yourself for the orchestrator check; for heavier flows dispatch a dedicated QA worker (`worker`, `gpt-5.5`, `high`) whose ONLY job is to drive the channel and write the artifact to the named evidence path. The unit suite being green is NEVER substitute. If the scenario FAILS, respawn the implementing worker with the captured failure — do not hand-patch around it.
+5. EXECUTE-AS-SCENARIO: ACTUALLY run the Manual-QA scenario the criterion named (channel table above). Run it yourself for the orchestrator check; for heavier flows dispatch a dedicated QA worker (`worker`, `gpt-5.5`, `high`) whose ONLY job is to drive the channel and write the artifact to the named evidence path. If the scenario FAILS, respawn the implementing worker with the captured failure — do not hand-patch around it.
 6. CAPTURE: collect the observable artifact path: transcript, stdout, screenshot, assertion, status+body, diff, or parsed dump. No artifact written at the evidence path — not done; record BLOCKED and respawn QA.
-7. CLEAN (PAIRED, NEVER SKIP): tear down every runtime artifact step 5 spawned BEFORE recording — server PIDs (`kill`, verify `kill -0` fails), `tmux` sessions (`tmux kill-session -t ulw-qa-<criterion>`; confirm `tmux ls`), browser / Playwright contexts (`.close()`), containers (`docker rm -f`), bound ports (`lsof -i :<port>` empty), temp sockets / files / dirs (`rm -rf` the `mktemp` paths), QA-only env vars, AND `close_agent` on every finished worker. Register each teardown as its own todo the moment the QA spawns the resource (scripts, tmux assets, browsers / agent-browser sessions, PIDs, ports) so none is forgotten. Embed a one-line cleanup receipt in the evidence string, e.g. `cleanup: killed 12345; tmux kill-session ulw-qa-foo; rm -rf /tmp/ulw.aB12cD; close_agent w-3`. Missing receipt → record BLOCKED, not PASS.
+7. CLEAN (PAIRED, NEVER SKIP): tear down every runtime artifact step 5 spawned BEFORE recording — server PIDs (`kill`, verify `kill -0` fails), `tmux` sessions (`tmux kill-session -t ulw-qa-<criterion>`; confirm `tmux ls`), browser / Playwright contexts (`.close()`), containers (`docker rm -f`), bound ports (`lsof -i :<port>` empty), temp sockets / files / dirs (`rm -rf` the `mktemp` paths), QA-only env vars, AND `multi_agent_v1.close_agent` on every finished worker. Register each teardown as its own todo the moment the QA spawns the resource (scripts, tmux assets, browsers / agent-browser sessions, PIDs, ports) so none is forgotten. Embed a one-line cleanup receipt in the evidence string, e.g. `cleanup: killed 12345; tmux kill-session ulw-qa-foo; rm -rf /tmp/ulw.aB12cD; multi_agent_v1.close_agent w-3`. Missing receipt → record BLOCKED, not PASS.
 8. RECORD exactly one result:
    - PASS: `omo ulw-loop record-evidence --goal-id <id> --criterion-id <id> --status pass --evidence "<observable> | <cleanup receipt>" --json`
    - FAIL: `omo ulw-loop record-evidence --goal-id <id> --criterion-id <id> --status fail --evidence "<observable> | <cleanup receipt>" --notes "<diagnosis>" --json`
@@ -178,7 +176,7 @@ Trigger only when one goal remains and all its criteria are passing.
 1. Run targeted verification for changed behavior.
 2. Run `ai-slop-cleaner` on changed files. If no relevant edits exist, record a passed no-op cleaner report.
 3. Rerun verification after cleanup.
-4. Judge the change size. Spawn a rigorous reviewer with `spawn_agent({"task_name":"final_verification_review","message":"TASK: act as a rigorous final verification reviewer. DELIVERABLE: approve or cite blockers. SCOPE: <changed files and goal>. VERIFY: inspect diff and verification evidence.","fork_turns":"none"})` only when the work is large or risky (multi-file, cross-cutting, new architecture, security/data surfaces, or you are unsure it is sound); for a small, local, low-risk change, do the review yourself and record `codeReview` with `evidence` starting `UNCONDITIONAL APPROVAL` plus a one-line justification of why the change was small enough to self-review.
+4. HEAVY tier — or any goal you are unsure is sound — spawns a rigorous reviewer with `multi_agent_v1.spawn_agent({"message":"TASK: act as a rigorous final verification reviewer. DELIVERABLE: approve or cite blockers. SCOPE: <changed files and goal>. VERIFY: inspect diff and verification evidence.","fork_context":false})`. LIGHT tier: review the diff yourself and record `codeReview` with `evidence` starting `UNCONDITIONAL APPROVAL` plus a one-line justification of why the tier held.
 5. Clean review means `codeReview.recommendation == "APPROVE"` and `codeReview.architectStatus == "CLEAR"`.
 6. If review is non-clean, run `omo ulw-loop record-review-blockers --goal-id <id> --title "<...>" --objective "<...>" --evidence "<review findings>" --codex-goal-json <snapshot> --json`.
 7. If clean, checkpoint final completion:
@@ -194,6 +192,7 @@ omo ulw-loop checkpoint --goal-id <id> --status complete --evidence "<e2e eviden
   "criteriaCoverage": { "totalCriteria": N, "passCount": N, "adversarialClassesCovered": ["malformed_input", "..."] }
 }
 ```
+A LIGHT goal with no triggered adversarial class records `"adversarialClassesCovered": ["none-applicable: <reason>"]`.
 ## Dynamic Steering
 Use steering only for structured evidence-backed mutation. Reject natural-language steering requests.
@@ -221,11 +220,11 @@ Structured prompt directives accepted: `OMO_ULW_LOOP_STEER: { ... }`, `omo.ulw-l
 7. Per-story Codex goal mode is opt-in only with `--codex-goal-mode per-story`; default is aggregate.
 8. Structured steering directives mutate state through validation; normal prose does not.
 9. Evidence MUST be observable from the real surface: tmux transcript, curl status+body, browser/Playwright assertion, CLI stdout, DB state diff, parsed config dump.
-10. Apply ultraqa's 9 adversarial classes where relevant per goal: malformed input, prompt injection, cancel/resume, stale state, dirty worktree, hung commands, flaky tests, misleading success output, repeated interruptions.
+10. Probe the adversarial classes each criterion's trigger facts name (list in Bootstrap step 2); record untriggered classes as not-applicable in one line.
 11. After completing an aggregate ulw-loop run, clear the Codex goal manually with `/goal clear` before starting another in the same session.
 12. The shell command emits a model-facing handoff; only the Codex agent calls `get_goal`, `create_goal`, or `update_goal` tools.
 13. NEVER record `--status pass` while a QA-spawned process, `tmux` session, browser context, bound port, container, or temp file / dir is still alive, or while any worker is still open. The evidence string MUST include the cleanup receipt. Leftover runtime state = BLOCKED, not PASS.
-14. DELEGATE all code edits, test writes, fixes, and QA execution to right-sized `spawn_agent` workers (Delegation table); you read, search, plan, integrate, and QA. NEVER record `--status pass` from a worker's self-report — only from evidence you re-verified yourself. Dispatch independent tasks in parallel; serialize only on a NAMED dependency.
+14. DELEGATE all code edits, test writes, fixes, and QA execution to right-sized `multi_agent_v1.spawn_agent` workers (Delegation table); you read, search, plan, integrate, and QA. NEVER record `--status pass` from a worker's self-report — only from evidence you re-verified yourself. Dispatch independent tasks in parallel; serialize only on a NAMED dependency.
 15. Every verified work unit that touched git-tracked files must leave either an atomic `git-master`-style commit hash or explicit no-commit blocker evidence before the next unit starts.
 ## Stop Rules

package/packages/omo-codex/plugin/skills/ulw-plan/SKILL.md CHANGED Viewed

@@ -14,7 +14,7 @@ This skill is intentionally compact. The full planning workflow lives in `refere
 ## Required First Steps
 1. Open `references/full-workflow.md`.
-2. Read **Phase 0 - Classify**, **Phase 1 - Ground**, and the **Approval gate** before you ask the user anything or draft a plan.
+2. Read **Phase 0 - Classify**, **Phase 1 - Ground**, **Phase 2 - Interview**, and the **Approval gate** before you ask the user anything or draft a plan.
 3. Internalize the loop: explore exhaustively, surface the genuine unknowns, ask, then wait for approval before planning.
 ## The Gate (non-negotiable behavior)
@@ -24,6 +24,15 @@ This skill is intentionally compact. The full planning workflow lives in `refere
 - **Wait for the user's explicit okay before generating the plan.** Never auto-transition from interview to plan generation. No plan file, no Metis gap-analysis, no execution until the user approves the approach.
 - **Planner scope only.** Write only `.omo/plans/<slug>.md` and `.omo/drafts/*.md`. Never edit source. If asked to "just do it", decline: you plan; a worker executes.
+## Interview Discipline (how to ask)
+Exploration answers facts; the user decides preferences, tradeoffs, and safety. Bring those decisions to the user EARLY and well-formed:
+- Every question must materially change the plan, confirm a load-bearing assumption, or choose between real tradeoffs. If a read-only search could answer it, asking is a failure.
+- Ask 1-3 narrow questions per turn, each with 2-4 concrete options and your recommended default first, grounded in a file path or finding you cite. A skipped question resolves to that default, recorded in the draft as an assumption.
+- Always ask test strategy (TDD / tests-after / none); agent-executed QA scenarios are included regardless.
+- Record every answer and decision in `.omo/drafts/<slug>.md` immediately; run the Phase 2 clearance check after every turn; never end a turn passively — end with the question or the explicit next step.
 ## Dynamic Adversarial Planning
 For architecture work, no-plan `$start-work` bootstrap, or requests that cite Discord / external repositories, use **dynamic adversarial workflow phases** before writing the final plan:
@@ -44,22 +53,22 @@ Subagent outputs are not success or approval without independent verification.
 You explore a LOT - fan out parallel read-only research before interviewing - but delegate with Codex discipline:
-- Every `spawn_agent` message starts with `TASK:`, then names `DELIVERABLE`, `SCOPE`, and `VERIFY`. Put role and specialty instructions inside `message`; the Codex tool schema only accepts `task_name`, `message`, and `fork_turns`. Prefer `fork_turns: "none"` unless full history is truly required.
-- Plan and reviewer agents may run for a long time; spawn them in the background, keep doing independent root work, and poll with short wait_agent cycles. Never use a single long blocking wait for them.
+- Every `multi_agent_v1.spawn_agent` message starts with `TASK:`, then names `DELIVERABLE`, `SCOPE`, and `VERIFY`. Put role and specialty instructions inside `message`. Use `fork_context: false` unless full history is truly required.
+- Plan and reviewer agents may run for a long time; spawn them in the background, keep doing independent root work, and poll with short `multi_agent_v1.wait_agent` cycles. Never use a single long blocking wait for them.
 - For work likely to exceed one wait cycle, require the child to send `WORKING: <task> - <current phase>` before long reading, testing, or review passes, and `BLOCKED: <reason>` only when it cannot progress.
 - While any child is active, keep yourself visibly alive with active subagent count, agent names, latest `WORKING:` phase, and whether you are waiting for mailbox updates.
-- Track spawned agent names locally. Use `wait_agent` for mailbox signals, not proof of completion. A timeout only means no new mailbox update arrived; after a timeout, run a single `list_agents` check for the named child when you need reassurance. If it is running or its latest message is `WORKING:`, treat it as alive.
-- Do not use `list_agents` as a polling loop or status feed; it can replay large payloads. Fallback only when the child is completed without the deliverable, ack-only after followup, explicitly `BLOCKED:`, or no longer running. Then record the lane inconclusive and respawn a smaller `fork_turns: "none"` task with the missing deliverable.
+- Track spawned agent names locally. Use `multi_agent_v1.wait_agent` for mailbox signals, not proof of completion. A timeout only means no new mailbox update arrived. Treat a running child as alive.
+- Fallback only when the child is completed without the deliverable, ack-only after followup, explicitly `BLOCKED:`, or no longer running. Then record the lane inconclusive and respawn a smaller `fork_context: false` task with the missing deliverable.
 ## Codex Tool Mapping
 | Planning intent | Codex tool |
 | --- | --- |
-| Internal codebase research | `spawn_agent({"task_name":"...","message":"TASK: act as an explorer. ...","fork_turns":"none"})` |
-| External docs / library research | `spawn_agent({"task_name":"...","message":"TASK: act as a librarian. ...","fork_turns":"none"})` |
-| Pre-plan gap analysis (after approval) | `spawn_agent({"task_name":"...","message":"TASK: act as a Metis gap-analysis reviewer. ...","fork_turns":"none"})` |
-| High-accuracy plan review (optional) | `spawn_agent({"task_name":"...","message":"TASK: act as a Momus plan reviewer. ...","fork_turns":"none"})` |
-| Wait for a research result | `wait_agent(...)` |
-| Release a finished subagent | `close_agent(...)` |
+| Internal codebase research | `multi_agent_v1.spawn_agent({"message":"TASK: act as an explorer. ...","fork_context":false})` |
+| External docs / library research | `multi_agent_v1.spawn_agent({"message":"TASK: act as a librarian. ...","fork_context":false})` |
+| Pre-plan gap analysis (after approval) | `multi_agent_v1.spawn_agent({"message":"TASK: act as a Metis gap-analysis reviewer. ...","fork_context":false})` |
+| High-accuracy plan review (optional) | `multi_agent_v1.spawn_agent({"message":"TASK: act as a Momus plan reviewer. ...","fork_context":false})` |
+| Wait for a research result | `multi_agent_v1.wait_agent(...)` |
+| Release a finished subagent | `multi_agent_v1.close_agent(...)` |
 Name any skills the child needs directly inside its `message`. Your plan goes to `.omo/plans/<slug>.md`; never split one request into multiple plans.

package/packages/omo-codex/plugin/skills/ulw-plan/references/full-workflow.md CHANGED Viewed

@@ -8,7 +8,7 @@ metadata:
 ## Role
 Prometheus, strategic planning consultant inside Codex. You turn a vague or large request into ONE decision-complete work plan a downstream worker can execute with zero further interview. You are a PLANNER, not an implementer: read, search, run read-only analysis, and write only `.omo/plans/<slug>.md` and `.omo/drafts/*.md`. Never edit product code; if asked to "just do it", decline and offer to plan.
-GPT-5.x style: outcome-first, evidence-bound, atomic decisions. Explore a lot. Ask little. Never plan blind, and never plan before the user approves.
+GPT-5.x style: outcome-first, evidence-bound, atomic decisions. Explore a lot; ask few, decisive questions. Never plan blind, and never plan before the user approves.
 ## North star
 A plan is **decision-complete** when the implementer needs ZERO judgment calls: every decision made, every ambiguity resolved, every pattern referenced with a concrete path.
@@ -22,8 +22,8 @@ Size your interview depth before diving in:
 ## Phase 1 - Ground (explore exhaustively BEFORE asking)
 Eliminate unknowns by discovering facts, not by asking the user. Before your first question, fan out parallel read-only research and keep working while it runs.
-- `spawn_agent({"task_name":"...","message":"TASK: act as an explorer. ...","fork_turns":"none"})` per internal aspect: existing patterns, conventions, similar implementations, naming/registration, test infrastructure. One agent per aspect.
-- `spawn_agent({"task_name":"...","message":"TASK: act as a librarian. ...","fork_turns":"none"})` per external aspect: official docs, API contracts, recommended patterns, pitfalls.
+- `multi_agent_v1.spawn_agent({"message":"TASK: act as an explorer. ...","fork_context":false})` per internal aspect: existing patterns, conventions, similar implementations, naming/registration, test infrastructure. One agent per aspect.
+- `multi_agent_v1.spawn_agent({"message":"TASK: act as a librarian. ...","fork_context":false})` per external aspect: official docs, API contracts, recommended patterns, pitfalls.
 - While they run, use direct read-only tools (`read`, `rg`, `ast_grep_search`, `lsp_*`) for immediate context. Do not idle.
 ### Dynamic workflow for architecture and bootstrap planning
@@ -44,11 +44,17 @@ Two kinds of unknowns:
 Exhaust exploration first. "I could not find it" is true only after you actually looked.
 ## Phase 2 - Interview (ask only what exploration cannot resolve)
-Record everything to `.omo/drafts/<slug>.md` as you go: confirmed requirements, decisions + rationale, research findings, open questions, scope IN / OUT. The draft is your durable memory across turns.
+Record everything to `.omo/drafts/<slug>.md` as you go: confirmed requirements (the user's exact words), decisions + rationale, research findings, open questions, scope IN / OUT. Update it after EVERY meaningful exchange - long interviews outlive your context, and plan generation reads the draft, not your memory.
-Ask focused questions ONLY for genuine unknowns surfaced by Phase 1: goal + definition of done, scope boundaries, preference tradeoffs, test strategy (TDD / tests-after / none - agent-executed QA is always included), and hard constraints. Every question must materially change the plan. Never ask what a read-only search would answer.
+Interview focus, informed by Phase 1 findings: goal + definition of done, scope boundaries (IN and explicitly OUT), technical approach ("I found pattern X at `src/path` - follow it?"), test strategy (TDD / tests-after / none - agent-executed QA is always included), and hard constraints.
-Keep each turn conversational: 3-6 sentences plus 1-3 questions. Never end a turn passively; end with the specific question or the explicit next step.
+Question rules:
+- Every question must materially change the plan, confirm a load-bearing assumption, or choose between real tradeoffs. Never ask what a read-only search would answer.
+- Ask 1-3 narrow questions per turn, each with 2-4 concrete options and your recommended default first with a one-line rationale. A question the user skips resolves to the recommended default, recorded in the draft as an assumption.
+- Ground each question in evidence: cite the file path or research finding that raised it, so the user decides from facts rather than guesses.
+- Keep each turn conversational: 3-6 sentences plus the questions. Never end a turn passively; end with the specific question or the explicit next step.
+Clearance check - run after EVERY interview turn: core objective defined? scope IN/OUT explicit? technical approach decided? test strategy confirmed? no critical ambiguity or blocking question left? Any NO -> that unmet item is your next question. All YES -> present the approval brief (see Approval gate) and stop; never jump from interview into writing the plan.
 ## Approval gate (DO NOT SKIP)
 When exploration is exhausted and the genuine unknowns are answered, do NOT auto-start planning. Present a short brief instead:
@@ -61,7 +67,7 @@ Then **wait for the user's explicit okay** before generating the plan. No Metis,
 Narrow `$start-work` bootstrap exception: if `$start-work` invoked this skill because there was no active Boulder work and no selectable plan, the user's `start work` request counts as approval to generate the plan and begin execution. Preserve the normal gate for ordinary `ulw-plan`; ask one focused question only if the objective is missing, destructive, or has a safety/product ambiguity that exploration cannot resolve.
 ## Phase 3 - Generate the plan (only after approval)
-1. **Metis gap analysis (mandatory):** `spawn_agent({"task_name":"metis_gap_analysis","message":"TASK: act as a Metis gap-analysis reviewer and review this planning session for gaps. DELIVERABLE: contradictions, missing constraints, scope-creep risks, unvalidated assumptions, missing acceptance criteria. VERIFY: each gap names a concrete fix.","fork_turns":"none"})`. Fold the findings in silently.
+1. **Metis gap analysis (mandatory):** `multi_agent_v1.spawn_agent({"message":"TASK: act as a Metis gap-analysis reviewer and review this planning session for gaps. DELIVERABLE: contradictions, missing constraints, scope-creep risks, unvalidated assumptions, missing acceptance criteria. VERIFY: each gap names a concrete fix.","fork_context":false})`. Fold the findings in silently.
 2. Write ONE plan to `.omo/plans/<slug>.md` using the template below. No "Phase 1 plan / Phase 2 plan" splits; 50+ todos is fine. Build it incrementally - skeleton first, then append todo batches - so output limits never truncate it; re-read the file to confirm completeness.
 3. **Self-review:** every todo has references + agent-executable acceptance criteria + QA scenarios; no business-logic assumption without evidence; zero acceptance criteria require a human.
@@ -116,14 +122,14 @@ Critical path: ...
 ```
 ## Phase 4 - High-accuracy review (optional)
-If the user wants maximum rigor, call `spawn_agent({"task_name":"momus_plan_review","message":"TASK: act as a Momus plan reviewer. DELIVERABLE: review .omo/plans/<slug>.md only. VERIFY: cite every required fix or approve.","fork_turns":"none"})` and pass ONLY the plan path in `message`. Fix every cited issue and resubmit until it approves.
+If the user wants maximum rigor, call `multi_agent_v1.spawn_agent({"message":"TASK: act as a Momus plan reviewer. DELIVERABLE: review .omo/plans/<slug>.md only. VERIFY: cite every required fix or approve.","fork_context":false})` and pass ONLY the plan path in `message`. Fix every cited issue and resubmit until it approves.
 ## Delegation discipline (Codex)
-- Every `spawn_agent` message starts with `TASK:`, then `DELIVERABLE`, `SCOPE`, `VERIFY`. Put role and specialty instructions inside `message`; the Codex tool schema only accepts `task_name`, `message`, and `fork_turns`. Prefer `fork_turns: "none"`.
-- Plan and reviewer agents may run for a long time; spawn them in the background, keep doing independent root work, and poll with short wait_agent cycles. Never use a single long blocking wait for them.
+- Every `multi_agent_v1.spawn_agent` message starts with `TASK:`, then `DELIVERABLE`, `SCOPE`, `VERIFY`. Put role and specialty instructions inside `message`. Use `fork_context: false` unless full history is truly required.
+- Plan and reviewer agents may run for a long time; spawn them in the background, keep doing independent root work, and poll with short `multi_agent_v1.wait_agent` cycles. Never use a single long blocking wait for them.
 - For work likely to exceed one wait cycle, require the child to send `WORKING: <task> - <current phase>` before long passes and `BLOCKED: <reason>` only when progress stops.
 - Keep yourself visibly alive while children run: active subagent count, agent names, latest `WORKING:` phase, and whether you are waiting on mailbox updates.
-- Use `wait_agent` for mailbox signals, not proof. A timeout only means no new mailbox update arrived; after a timeout, run a single `list_agents` check for the named child when you need reassurance. If it is running or its latest message is `WORKING:`, treat it as alive. Do not use `list_agents` as a polling loop. Fallback only when the child is completed without the deliverable, ack-only after followup, explicitly `BLOCKED:`, or no longer running; then mark the lane inconclusive and respawn a smaller `fork_turns: "none"` task with the missing deliverable. `close_agent` after integrating each result.
+- Use `multi_agent_v1.wait_agent` for mailbox signals, not proof. A timeout only means no new mailbox update arrived. Treat a running child as alive. Fallback only when the child is completed without the deliverable, ack-only after followup, explicitly `BLOCKED:`, or no longer running; then mark the lane inconclusive and respawn a smaller `fork_context: false` task with the missing deliverable. `multi_agent_v1.close_agent` after integrating each result.
 ## Stop rules
 - Plan file exists, template filled, every todo has references + acceptance + QA + commit, dependency matrix consistent: DONE.

package/packages/omo-codex/plugin/skills/visual-qa/SKILL.md CHANGED Viewed

@@ -9,17 +9,17 @@ This skill may include examples copied from the OpenCode harness. In Codex, do n
 | OpenCode example | Codex tool to use |
 | --- | --- |
-| `call_omo_agent(subagent_type="explore", ...)` | `spawn_agent({"task_name":"...","message":"TASK: act as an explorer. ...","fork_turns":"none"})` |
-| `call_omo_agent(subagent_type="librarian", ...)` | `spawn_agent({"task_name":"...","message":"TASK: act as a librarian. ...","fork_turns":"none"})` |
-| `task(subagent_type="plan", ...)` | `spawn_agent({"task_name":"...","message":"TASK: act as a planning agent. ...","fork_turns":"none"})` |
-| `task(subagent_type="oracle", ...)` for final verification | `spawn_agent({"task_name":"...","message":"TASK: act as a rigorous reviewer. ...","fork_turns":"none"})` |
-| `task(category="...", ...)` for implementation or QA | `spawn_agent({"task_name":"...","message":"TASK: act as an implementation or QA worker. ...","fork_turns":"none"})` |
-| `background_output(task_id="...")` | `wait_agent(...)` for mailbox signals; after a timeout, run one `list_agents` check for the named child if reassurance is needed |
-| `team_*(...)` | Use Codex native subagents plus `send_message`, `followup_task`, `wait_agent`, and `close_agent` |
+| `call_omo_agent(subagent_type="explore", ...)` | `multi_agent_v1.spawn_agent({"message":"TASK: act as an explorer. ...","agent_type":"explorer","fork_context":false})` |
+| `call_omo_agent(subagent_type="librarian", ...)` | `multi_agent_v1.spawn_agent({"message":"TASK: act as a librarian. ...","agent_type":"librarian","fork_context":false})` |
+| `task(subagent_type="plan", ...)` | `multi_agent_v1.spawn_agent({"message":"TASK: act as a planning agent. ...","agent_type":"plan","fork_context":false})` |
+| `task(subagent_type="oracle", ...)` for final verification | `multi_agent_v1.spawn_agent({"message":"TASK: act as a rigorous reviewer. ...","agent_type":"codex-ultrawork-reviewer","fork_context":false})` |
+| `task(category="...", ...)` for implementation or QA | `multi_agent_v1.spawn_agent({"message":"TASK: act as an implementation or QA worker. ...","fork_context":false})` |
+| `background_output(task_id="...")` | `multi_agent_v1.wait_agent(...)` for mailbox signals |
+| `team_*(...)` | Use Codex native subagents via `multi_agent_v1.spawn_agent`, `multi_agent_v1.send_input`, `multi_agent_v1.wait_agent`, and `multi_agent_v1.close_agent` |
-Codex full-history forks inherit parent context, so role-specific behavior must be described in a self-contained `message` and usually should use a non-full-history fork mode such as `fork_turns="none"`. Include any required conversation context, files, diffs, constraints, and requested skill names directly in the spawned agent's `message`. If a code block below conflicts with this section, this section wins.
+Role-specific behavior must be described in a self-contained `message`. Use `fork_context: false` to start the child with only the initial prompt (no parent history); use `fork_context: true` only when full parent history is truly required. Include any required conversation context, files, diffs, constraints, and requested skill names directly in the spawned agent's `message`. OMO installs these selectable agent roles into `~/.codex/agents/`: `explorer`, `librarian`, `plan`, `momus`, `metis`, and `codex-ultrawork-reviewer` — pass the matching name as `agent_type` so the child gets that role's model and instructions. On `multi_agent_v2` sessions the same `agent_type` applies (the OMO installer exposes it) with `fork_turns` instead of `fork_context`. If the spawn tool exposes no `agent_type` parameter, omit it and describe the role inside `message`. If a code block below conflicts with this section, this section wins.
-For work likely to exceed one wait cycle, require the child to send `WORKING: <task> - <current phase>` before long passes and `BLOCKED: <reason>` only when progress stops. A `wait_agent` timeout only means no new mailbox update arrived. Treat a running child or latest `WORKING:` message as alive. Do not use `list_agents` as a polling loop. Fallback only when the child is completed without the deliverable, ack-only after followup, explicitly `BLOCKED:`, or no longer running.
+For work likely to exceed one wait cycle, require the child to send `WORKING: <task> - <current phase>` before long passes and `BLOCKED: <reason>` only when progress stops. A `multi_agent_v1.wait_agent` timeout only means no new mailbox update arrived. Treat a running child as alive. Fallback only when the child is completed without the deliverable, ack-only after followup, explicitly `BLOCKED:`, or no longer running.
 # Visual QA - Dual-Oracle Web and TUI Verification

package/packages/omo-codex/plugin/test/aggregate-build.test.mjs CHANGED Viewed

@@ -18,8 +18,9 @@ test("#given aggregate plugin build script #when inspected #then hook status and
 	// then
 	assert.equal(
 		buildScript,
-		"node scripts/sync-hook-status-messages.mjs && node scripts/build-bundled-mcp-runtimes.mjs && node scripts/sync-skills.mjs && node ../scripts/sync-telemetry-component.mjs && node scripts/build-components.mjs",
+		"node scripts/sync-version.mjs && node scripts/sync-hook-status-messages.mjs && node scripts/build-bundled-mcp-runtimes.mjs && node scripts/sync-skills.mjs && node ../scripts/sync-telemetry-component.mjs && node scripts/build-components.mjs",
 	);
+	assert.match(buildScript, /^node scripts\/sync-version\.mjs &&/);
 	assert.equal(testScript, "node --test test/*.test.mjs");
 	assert(packageJson.workspaces.includes("components/ultrawork"));
 	assert.match(telemetrySyncScript, /syncTelemetryComponent/);

package/packages/omo-codex/plugin/test/aggregate-mcp.test.mjs CHANGED Viewed

@@ -33,7 +33,7 @@ test("#given aggregate MCP config #when inspected #then code MCPs reference pack
 	assert.match(bundledMcpBuildScript, /git-bash-mcp/);
 	assert.doesNotMatch(packageJson.scripts.build, /--workspaces/);
 	assert.equal(lspServer.command, "node");
-	assert.deepEqual(lspServer.args, ["../../lsp-tools-mcp/dist/cli.js", "mcp"]);
+	assert.deepEqual(lspServer.args, ["../../lsp-daemon/dist/cli.js", "mcp"]);
 	assert.equal(lspServer.cwd, ".");
 	assert.equal(astGrepServer.command, "node");
 	assert.deepEqual(astGrepServer.args, ["../../ast-grep-mcp/dist/cli.js", "mcp"]);

package/packages/omo-codex/plugin/test/aggregate-plugin-fixture.mjs CHANGED Viewed

@@ -70,14 +70,14 @@ export function findInvalidSpawnAgentRoleParameters(content) {
 	return [...content.matchAll(/spawn_agent\([^)\n]*(?:agent_type|model|reasoning_effort)\s*=/g)].map((match) => match[0]);
 }
-export function findSpawnAgentCallsWithoutForkTurnsNone(content) {
-	const missingForkTurns = [];
+export function findSpawnAgentCallsWithoutForkContextFalse(content) {
+	const missingForkContext = [];
 	const regex = /spawn_agent\(([^)]*)\)/g;
 	for (const match of content.matchAll(regex)) {
 		const call = match[0];
-		if (!/"fork_turns"\s*:\s*"none"|fork_turns:\s*"none"|fork_turns="none"/.test(call)) {
-			missingForkTurns.push(call);
+		if (!/"fork_context"\s*:\s*false|fork_context:\s*false|fork_context=false/.test(call)) {
+			missingForkContext.push(call);
 		}
 	}
-	return missingForkTurns;
+	return missingForkContext;
 }

package/packages/omo-codex/plugin/test/aggregate-skills.test.mjs CHANGED Viewed

@@ -5,7 +5,7 @@ import test from "node:test";
 import {
 	findInvalidSpawnAgentRoleParameters,
-	findSpawnAgentCallsWithoutForkTurnsNone,
+	findSpawnAgentCallsWithoutForkContextFalse,
 	root,
 } from "./aggregate-plugin-fixture.mjs";
@@ -27,7 +27,7 @@ test("#given synced skills with Codex compatibility guidance #when spawn_agent i
 	assert.deepEqual(invalidCalls, []);
 });
-test('#given synced skills and bundled rules #when role-specific agents are spawned #then they set fork_turns="none"', async () => {
+test('#given synced skills and bundled rules #when role-specific agents are spawned #then they set fork_context=false', async () => {
 	const skillsDir = join(root, "skills");
 	const skillEntries = await readdir(skillsDir, { withFileTypes: true });
 	const promptFiles = skillEntries
@@ -35,15 +35,15 @@ test('#given synced skills and bundled rules #when role-specific agents are spaw
 		.map((entry) => join(skillsDir, entry.name, "SKILL.md"));
 	promptFiles.push(join(root, "components", "rules", "bundled-rules", "hephaestus.md"));
-	const missingForkTurns = [];
+	const missingForkContext = [];
 	for (const promptPath of promptFiles) {
 		const content = await readFile(promptPath, "utf8");
-		for (const call of findSpawnAgentCallsWithoutForkTurnsNone(content)) {
-			missingForkTurns.push(`${basename(dirname(promptPath))}/${basename(promptPath)}: ${call}`);
+		for (const call of findSpawnAgentCallsWithoutForkContextFalse(content)) {
+			missingForkContext.push(`${basename(dirname(promptPath))}/${basename(promptPath)}: ${call}`);
 		}
 	}
-	assert.deepEqual(missingForkTurns, []);
+	assert.deepEqual(missingForkContext, []);
 });
 test("#given long-running orchestration prompts #when waiting on child agents #then parent liveness is surfaced", async () => {