npm - @kbediako/codex-orchestrator - Versions diffs - 0.1.31 → 0.1.33 - Mend

@kbediako/codex-orchestrator 0.1.31 → 0.1.33

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (32)

package/README.md +79 -9
package/dist/bin/codex-orchestrator.js +671 -66
package/dist/orchestrator/src/cli/codexCliSetup.js +1 -0
package/dist/orchestrator/src/cli/doctor.js +186 -7
package/dist/orchestrator/src/cli/doctorUsage.js +150 -8
package/dist/orchestrator/src/cli/init.js +1 -1
package/dist/orchestrator/src/cli/mcpEnable.js +392 -0
package/dist/orchestrator/src/cli/orchestrator.js +161 -2
package/dist/orchestrator/src/cli/rlmRunner.js +289 -35
package/dist/orchestrator/src/cli/run/manifest.js +31 -6
package/dist/orchestrator/src/cli/services/commandRunner.js +10 -2
package/dist/orchestrator/src/cli/services/runSummaryWriter.js +35 -0
package/dist/orchestrator/src/cli/skills.js +3 -8
package/dist/orchestrator/src/cli/utils/advancedAutopilot.js +114 -0
package/dist/orchestrator/src/cli/utils/codexCli.js +21 -0
package/dist/orchestrator/src/cli/utils/delegationGuardRunner.js +85 -8
package/dist/orchestrator/src/cli/utils/specGuardRunner.js +79 -19
package/dist/orchestrator/src/cloud/CodexCloudTaskExecutor.js +25 -6
package/dist/orchestrator/src/control-plane/request-builder.js +9 -8
package/dist/scripts/lib/pr-watch-merge.js +493 -4
package/docs/README.md +7 -5
package/package.json +1 -1
package/schemas/manifest.json +27 -0
package/skills/collab-deliberation/SKILL.md +6 -0
package/skills/collab-evals/SKILL.md +4 -0
package/skills/collab-subagents-first/SKILL.md +29 -7
package/skills/delegation-usage/DELEGATION_GUIDE.md +31 -5
package/skills/delegation-usage/SKILL.md +29 -4
package/skills/elegance-review/SKILL.md +14 -3
package/skills/standalone-review/SKILL.md +8 -2
package/templates/README.md +1 -1
package/templates/codex/AGENTS.md +12 -1

package/skills/collab-deliberation/SKILL.md CHANGED

@@ -7,6 +7,11 @@ description: Structure multi-agent brainstorming and deliberation (options, trad
 Use this skill when the user asks for brainstorming, tradeoffs, option comparison, or decision support before implementation. This skill is for ideas and decisions, not coding.
+## Terminology + feature gate (required)
+- In this skill, "collab" means multi-agent tool usage (`spawn_agent` / `wait` / `close_agent`).
+- Codex CLI feature gating is `features.multi_agent=true`; treat `collab` as legacy naming in some env/artifact keys.
+- For symbolic orchestration, existing key names remain `RLM_SYMBOLIC_COLLAB` and `manifest.collab_tool_calls`.
 ## Deliberation Default v1 (required)
 - Keep MCP as the lead control plane. Use collab/delegated subagents to generate and challenge options.
 - Run full deliberation when any hard-stop trigger is true:
@@ -84,4 +89,5 @@ Use this skill when the user asks for brainstorming, tradeoffs, option compariso
 - Do not present uncertainty as certainty.
 - Keep outputs concise and action-oriented.
 - If collab subagents are used, close lifecycle loops per id (`spawn_agent` -> `wait` -> `close_agent`) before finishing.
+- If collab subagents are used, always set explicit `agent_type` (omission defaults to `default`) and prefix spawned prompts with `[agent_type:<role>]`.
 - If you cannot close collab agents (missing ids) and spawn keeps failing, restart the session and re-run deliberation; keep work moving by doing solo deliberation meanwhile.
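
The explicit `agent_type` rule and the first-line role tag can be captured in one small payload builder. The following is a minimal TypeScript sketch, not code from the package: `buildSpawnArgs` and the `SpawnArgs` shape are hypothetical, with field names borrowed from the skill text; the real `spawn_agent` tool schema may differ.

```typescript
// Hypothetical shape of a spawn_agent call; field names follow the skill text,
// but the real tool schema may differ.
interface SpawnArgs {
  agent_type: string;
  message: string;
}

// Build spawn arguments with an explicit agent_type and a matching
// first-line [agent_type:<role>] tag, so the role never silently
// falls back to "default".
function buildSpawnArgs(role: string, brief: string): SpawnArgs {
  const trimmed = role.trim();
  if (trimmed === "") {
    throw new Error("agent_type must be set explicitly");
  }
  return {
    agent_type: trimmed,
    message: `[agent_type:${trimmed}]\n${brief}`,
  };
}

const args = buildSpawnArgs("explorer", "Compare options A and B; list tradeoffs.");
```

Deriving both the tag and `agent_type` from the same value means the two cannot drift apart.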

package/skills/collab-evals/SKILL.md CHANGED

@@ -9,6 +9,10 @@ Use this skill to run repeatable collab evaluation scenarios and record evidence
 ## Quick start
+0) Confirm feature readiness:
+- Run `codex features list` and verify `multi_agent` is enabled.
+- In this skill, "collab" refers to the same multi-agent tooling path; `collab` naming remains in legacy keys like `RLM_SYMBOLIC_COLLAB` and `manifest.collab_tool_calls`.
 1) Pick the scenario(s):
 - Large-context symbolic RLM with collab subcalls.
 - Multi-hour refactor with checkpoints.

package/skills/collab-subagents-first/SKILL.md CHANGED

@@ -11,6 +11,12 @@ Delegate as a manager, not as a pass-through. Split work into narrow streams, gi
 Note: If a global `collab-subagents-first` skill is installed, prefer that and fall back to this bundled skill.
+## Terminology + feature gate
+- Use "collab" as the workflow/tooling term for subagent calls (`spawn_agent` / `wait` / `close_agent`).
+- Codex CLI enablement is `features.multi_agent=true`; `collab` remains as legacy naming in fields like `RLM_SYMBOLIC_COLLAB` and `manifest.collab_tool_calls`.
+- Keep existing env/artifact key names as-is unless upstream explicitly changes those interfaces.
 ## Delegation gate
 Use subagents when any condition is true:
@@ -89,6 +95,8 @@ Skip subagents when all conditions are true:
   - `message` (plain text), or
   - `items` (structured input).
 - Do not send both `message` and `items` in one spawn call.
+- `spawn_agent` falls back to `default` when `agent_type` is omitted; always set `agent_type` explicitly.
+- Prefix spawned prompts with `[agent_type:<role>]` on line one so role intent is auditable from collab JSONL/manifests.
 - Use `items` when you need explicit structured context (for example `mention` paths like `app://...` or selected `skill` entries) instead of flattening everything into one long string.
 - Spawn returns an `agent_id` (thread id). Collab event rendering/picker labels are id-based today; do not depend on custom visible agent names.
 - To keep operator readability high despite id labels, encode the role clearly in your stream labels and first-line task brief (for example `review`, `tests`, `research`).
@@ -96,10 +104,13 @@ Skip subagents when all conditions are true:
 ## Collab lifecycle hygiene (required)
 When you use collab tools (`spawn_agent` / `wait` / `close_agent`):
-- Keep a local list of every returned `agent_id`.
-- For every successful `spawn_agent`, run `wait` and then `close_agent` for that same id.
-- Always close agents on error/timeout paths; do a final cleanup pass before finishing so no id is left unclosed.
-- If spawn fails with `agent thread limit reached`, stop spawning immediately, close any known ids, then retry once. If you still cannot spawn, proceed without collab (solo or via delegation) and explicitly note the degraded mode.
+- Keep an `open_agent_ids` ledger for the current turn/stage.
+- On successful `spawn_agent`, append the returned `agent_id` to `open_agent_ids` immediately.
+- For every successful spawn, run `wait` and then `close_agent` for that same id.
+- After each successful close, remove that id from `open_agent_ids`.
+- On timeout/error paths, close any id still in `open_agent_ids` before returning control.
+- Run a final close-sweep before handoff: iterate all remaining ids in `open_agent_ids`, call `close_agent`, then clear the list.
+- If spawn fails with `agent thread limit reached`, stop spawning immediately, run a close-sweep for all known ids, retry one time, and if it still fails proceed without collab (solo or delegation) while explicitly noting degraded mode.
 ## Required subagent contract
@@ -123,7 +134,7 @@ Reject and rerun when responses are:
 - Keep privileged/high-risk operations in the parent thread when interactive approval is required.
 - Subagents inherit core execution context (for example cwd/sandbox constraints), so include environment assumptions explicitly in each brief.
-## Review loop (standalone-review pairing)
+## Review loop (standalone + elegance pairing)
 Use a two-layer review loop:
@@ -137,6 +148,7 @@ Use a two-layer review loop:
 2) Parent independent review (required)
 - After integrating subagent work, run a standalone review from the parent.
 - Prefer the global `standalone-review` skill workflow for consistent checks.
+- For non-trivial diffs (about 2+ files or 40+ changed lines), run `elegance-review` in the same cycle before handoff/merge.
 Do not treat wrapper handoff-only output as a completed review.
@@ -151,11 +163,20 @@ Do not treat wrapper handoff-only output as a completed review.
 - Symptoms: missing collab/delegate tool-call evidence, framing/parsing errors, or unstable collab behavior after CLI upgrades.
 - Check versions first: `codex --version` and `codex-orchestrator --version`.
 - Confirm feature readiness: `codex-orchestrator doctor` (checks collab/cloud/delegation readiness and prints enablement commands).
-- CO repo refresh path (safe default): `scripts/codex-cli-refresh.sh --repo <codex-repo> --no-push`.
-- Rebuild managed CLI only: `codex-orchestrator codex setup --source <codex-repo> --yes --force`.
+- CO repo refresh path (safe default): `scripts/codex-cli-refresh.sh --repo <codex-repo> --align-only`.
+- Rebuild managed CLI only (optional): `codex-orchestrator codex setup --source <codex-repo> --yes --force`.
+- Managed routing is explicit opt-in: `export CODEX_CLI_USE_MANAGED=1` (stock/global `codex` remains default otherwise).
 - If local codex is materially behind upstream, sync before diagnosing collab behavior differences.
+- Built-in `explorer` may map to an older model profile; set `[agents.explorer]` without `config_file` so it inherits top-level `gpt-5.3-codex`, and reserve spark for optional `[agents.explorer_fast]` (text-only caveat).
+- For cloud-heavy streams, treat fallback as a safety net only; set `CODEX_ORCHESTRATOR_CLOUD_FALLBACK=deny` in fail-fast lanes.
 - If compatibility remains unstable, continue with non-collab execution path and document the degraded mode.
+## High-output guardrail (Playwright/browser tools)
+- Route Playwright-heavy work to a dedicated subagent stream so the parent thread does not absorb large browser logs/snapshots.
+- Keep raw Playwright output in artifacts and return only concise summary + evidence paths to the parent.
+- For these streams, explicitly close lifecycle loops (`spawn_agent` -> `wait` -> `close_agent`) before synthesis.
 ## Depth-limit guardrail
 - Collab spawn depth is bounded. At max depth, `spawn_agent` will fail and the branch must execute directly.
@@ -171,6 +192,7 @@ Do not treat wrapper handoff-only output as a completed review.
 - Do not keep long single-agent execution in parent when a focused subagent can own it.
 - Do not skip delegation solely because there is only one implementation stream; single-stream delegation is valid for context offload.
 - Do not rely on human-readable agent names in TUI labels for control flow; use stream ownership and evidence paths as source of truth.
+- Do not omit `agent_type` on `spawn_agent`; omission silently routes to `default`.
 - Do not end the parent work with unclosed collab agent ids.
 - Do not treat every delegated edit as "unexpected"; first verify whether the edit belongs to an active stream owner.
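
The `open_agent_ids` ledger, close-sweep, and single retry on `agent thread limit reached` can be sketched as follows. This is a hypothetical TypeScript model: `CollabTools` stands in for the `spawn_agent` / `wait` / `close_agent` tool calls and is made synchronous so the sketch runs on its own; `AgentLedger` is not a package API.

```typescript
// Hypothetical stand-in for the collab tool calls. The real tools are invoked
// by the model; a synchronous interface is assumed here so the sketch runs.
interface CollabTools {
  spawnAgent(agentType: string, message: string): string; // returns agent_id
  wait(agentId: string): string; // returns the subagent's result
  closeAgent(agentId: string): void;
}

const THREAD_LIMIT_ERROR = "agent thread limit reached";

class AgentLedger {
  readonly openAgentIds: string[] = [];
  private readonly tools: CollabTools;

  constructor(tools: CollabTools) {
    this.tools = tools;
  }

  // Spawn and record the id immediately. On the thread-limit error, run a
  // close-sweep and retry once; undefined means "continue in degraded mode".
  spawn(agentType: string, message: string): string | undefined {
    for (let attempt = 0; attempt < 2; attempt++) {
      try {
        const id = this.tools.spawnAgent(agentType, message);
        this.openAgentIds.push(id);
        return id;
      } catch (err) {
        const text = err instanceof Error ? err.message : String(err);
        if (!text.includes(THREAD_LIMIT_ERROR)) throw err;
        this.closeSweep();
      }
    }
    return undefined;
  }

  // wait, then close the same id, even if wait throws.
  waitAndClose(agentId: string): string {
    try {
      return this.tools.wait(agentId);
    } finally {
      this.close(agentId);
    }
  }

  // Remove the id from the ledger only after close_agent succeeds.
  close(agentId: string): void {
    this.tools.closeAgent(agentId);
    const index = this.openAgentIds.indexOf(agentId);
    if (index !== -1) this.openAgentIds.splice(index, 1);
  }

  // Close every id still in the ledger, then clear it.
  closeSweep(): void {
    for (const id of this.openAgentIds.slice()) {
      try {
        this.close(id);
      } catch {
        // keep sweeping; one failed close should not strand the others
      }
    }
    this.openAgentIds.length = 0;
  }
}
```

A caller would wrap each stage in `try { ... } finally { ledger.closeSweep(); }` so timeout and error paths also close every id before control returns; a `spawn` result of `undefined` signals that the stage should continue without further collab spawns and note the degraded mode.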

package/skills/delegation-usage/DELEGATION_GUIDE.md CHANGED

@@ -9,7 +9,7 @@ Use this guide for deeper context on delegation behavior, tool surfaces, and tro
 - It does **not** provide general tools itself; it only exposes `delegate.*` + optional `github.*` tools.
 - Child runs get tools based on `delegate.mode` + `delegate.tool_profile` + repo caps.
 - Delegation MCP stays enabled by default (only MCP on by default); disable it only when required by safety constraints.
-- Collab multi-agent mode is separate from delegation; for symbolic RLM subcalls, set `RLM_SYMBOLIC_COLLAB=1` and ensure a collab-capable Codex CLI. Collab tool calls are recorded in `manifest.collab_tool_calls`. If collab tools are unavailable in your CLI build, skip collab steps; delegation still works independently.
+- Multi-agent (collab tools) mode is separate from delegation; for symbolic RLM subcalls, set `RLM_SYMBOLIC_MULTI_AGENT=1` (legacy alias: `RLM_SYMBOLIC_COLLAB=1`) and ensure your Codex CLI has `features.multi_agent=true` (`collab` is a legacy alias/name in some keys). Collab tool calls are recorded in `manifest.collab_tool_calls`. If collab tools are unavailable in your CLI build, skip collab steps; delegation still works independently.
 ## Background-run pattern (preferred)
@@ -81,15 +81,25 @@ delegate.spawn({
 })
 ```
-## Collab lifecycle hygiene (required)
+## Multi-agent (collab tools) lifecycle hygiene (required)
 When using collab tools (`spawn_agent` / `wait` / `close_agent`):
 - Treat each spawned `agent_id` as a resource that must be closed.
+- `spawn_agent` falls back to `default` when `agent_type` is omitted; always set `agent_type` explicitly.
+- Prefix spawned prompts with `[agent_type:<role>]` on line one for auditable role routing.
 - For every successful spawn, run `wait` then `close_agent` for the same id.
-- Keep a local list of spawned ids and run a final cleanup pass before returning.
-- On timeout/error paths, still close known ids before reporting failure.
-- If you see `agent thread limit reached`, stop spawning immediately, close known ids, and retry only after cleanup.
+- Keep an `open_agent_ids` ledger and append each successful spawn id immediately.
+- Remove ids from `open_agent_ids` only after successful `close_agent`.
+- Run a final close-sweep before returning: close every id still in `open_agent_ids`, then clear it.
+- On timeout/error paths, run the same close-sweep before reporting failure.
+- If you see `agent thread limit reached`, stop spawning immediately, run close-sweep, retry once, and if still blocked continue in degraded mode (no further collab spawns).
+## Playwright stream hygiene
+- Run Playwright-heavy steps in a dedicated child stream; keep browser output out of parent chat.
+- Prefer artifact-first reporting (paths/manifests/screenshots) and a short synthesis instead of raw logs.
+- Keep lifecycle strict for browser streams too: `spawn_agent` -> `wait` -> `close_agent`.
 ## RLM budget overrides (recommended defaults)
@@ -125,9 +135,24 @@ Delegation MCP expects JSONL. Keep `codex-orchestrator` aligned with the current
 - Stock `codex` is the default path. If using a custom Codex fork, fast-forward from `upstream/main` regularly.
 - CO repo checkout only (helper is not shipped in npm): `scripts/codex-cli-refresh.sh --repo /path/to/codex --align-only`
 - CO repo checkout only (managed rebuild helper): `scripts/codex-cli-refresh.sh --repo /path/to/codex --force-rebuild`
+- Managed routing is opt-in: `export CODEX_CLI_USE_MANAGED=1` (without this, stock/global `codex` remains active).
 - Add `--no-push` only when you intentionally want local-only alignment without updating `origin/main`.
 - npm-safe alternative (no repo helper): `codex-orchestrator codex setup --source /path/to/codex --yes --force`
+## Agent role guard (recommended)
+- Built-in agent roles are `default`, `explorer`, `worker`; `researcher` is user-defined.
+- `spawn_agent` omission defaults to `default`; require explicit `agent_type` for every spawn.
+- For symbolic collab runs, include a first-line role tag in spawned prompts: `[agent_type:<role>]`.
+- Built-in `explorer` may map to an older model profile unless overridden in `~/.codex/config.toml`.
+- Recommended baseline:
+  - `model = "gpt-5.3-codex"`
+  - `model_reasoning_effort = "xhigh"`
+  - `[agents] max_threads = 8` (consider 12 only after stability checks)
+  - Set `[agents.explorer]` with no `config_file` so explorer inherits top-level `gpt-5.3-codex`.
+  - Add optional `[agents.explorer_fast]` for `gpt-5.3-codex-spark` (text-only caveat).
+  - Add `[agents.worker_complex]` for high-risk edits (`gpt-5.3-codex`, `xhigh`).
 ## Common failures
 - **Handshake failed / connection closed**: Usually an older binary (0.1.5) or framed responses.
@@ -136,5 +161,6 @@ Delegation MCP expects JSONL. Keep `codex-orchestrator` aligned with the current
 - **Missing control files**: delegate tools rely on `control_endpoint.json` in the run directory.
 - **Run identifiers**: status/pause/cancel require `manifest_path`; question queue requires `parent_manifest_path`.
 - **Collab payload mismatch**: `spawn_agent` calls fail if they include both `message` and `items`.
+- **Collab role routing drift**: if symbolic collab lifecycle validation reports missing/disallowed spawn roles, set explicit `agent_type` and add first-line `[agent_type:<role>]` tags.
 - **Collab depth limits**: recursive collab fan-out can fail near max depth; prefer shallow parent fan-out.
 - **Collab lifecycle leaks**: missing `close_agent` calls can exhaust thread slots and block future spawns (`agent thread limit reached`).
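
The new `RLM_SYMBOLIC_MULTI_AGENT` name and its legacy `RLM_SYMBOLIC_COLLAB` alias can be resolved in one place. This TypeScript sketch assumes that only `"1"` enables the mode and that the new name wins when both are set; neither rule is confirmed by this diff, and the package's own env handling is not shown here.

```typescript
type Env = Record<string, string | undefined>;

// Resolve the symbolic multi-agent toggle. Assumptions: only "1" enables it,
// and the new name takes precedence over the legacy alias when both are set.
function symbolicMultiAgentEnabled(env: Env): boolean {
  const current = env["RLM_SYMBOLIC_MULTI_AGENT"];
  if (current !== undefined) return current === "1";
  return env["RLM_SYMBOLIC_COLLAB"] === "1"; // legacy alias
}
```

In Node this would typically be called with `process.env`.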

package/skills/delegation-usage/SKILL.md CHANGED

@@ -11,18 +11,23 @@ Use this skill to operate delegation MCP tools with delegation enabled by defaul
 `delegation-usage` is the canonical delegation workflow skill. If `delegate-early` is present, treat it as a compatibility alias that should redirect to this skill.
-Collab multi-agent mode is separate from delegation. For symbolic RLM subcalls that use collab tools, set `RLM_SYMBOLIC_COLLAB=1` and ensure a collab-capable Codex CLI; collab tool calls are recorded in `manifest.collab_tool_calls`. If collab tools are unavailable in your CLI build, skip collab steps; delegation still works independently.
+Multi-agent (collab tools) mode is separate from delegation. For symbolic RLM subcalls that use collab tools, set `RLM_SYMBOLIC_MULTI_AGENT=1` (legacy alias: `RLM_SYMBOLIC_COLLAB=1`) and ensure your Codex CLI has `features.multi_agent=true` (`collab` is a legacy alias/name in some keys); collab tool calls are recorded in `manifest.collab_tool_calls`. If collab tools are unavailable in your CLI build, skip collab steps; delegation still works independently.
-## Collab realities in delegated runs (current behavior)
+## Multi-agent (collab tools) realities in delegated runs (current behavior)
 - `spawn_agent` accepts one input style per call: either `message` (plain text) or `items` (structured input).
 - Do not send both `message` and `items` in the same `spawn_agent` call.
+- `spawn_agent` falls back to `default` when `agent_type` is omitted; always set `agent_type` explicitly.
+- For auditable role routing, prefix spawned prompts with `[agent_type:<role>]` on the first line and keep it aligned with `agent_type`.
 - Spawn returns an `agent_id` (thread id). Current TUI collab rendering is id-based; do not depend on custom visible agent names.
 - Subagents spawned through collab run with approval effectively set to `never`; design child tasks to avoid approval/escalation requirements.
 - Collab spawn depth is bounded. Near/at max depth, recursive delegation can fail or collab can be disabled in children; prefer shallow parent fan-out.
 - **Lifecycle is mandatory:** for every successful `spawn_agent`, run `wait` and then `close_agent` for that same id before task completion.
-- Keep a local list of spawned ids and run a final cleanup pass so no agent id is left unclosed on timeout/error paths.
-- If spawn fails with `agent thread limit reached`, stop spawning, close any known ids first, then surface a concise recovery note.
+- Keep an `open_agent_ids` ledger and append ids immediately after each successful spawn.
+- Remove ids from `open_agent_ids` only after successful `close_agent`.
+- Run a final close-sweep before handoff: close every id still in `open_agent_ids`, then clear the ledger.
+- On timeout/error paths, execute the same close-sweep before returning.
+- If spawn fails with `agent thread limit reached`, stop spawning, run close-sweep for known ids, retry once, and if still blocked surface a concise degraded-mode recovery note.
 - In a shared checkout, spawned subagents may produce file edits. Treat edits inside that stream's declared ownership as expected delegated output, not external interference.
 - Before spawning, capture a baseline (`git status --porcelain`). After `wait`, diff against baseline and classify file changes by stream ownership.
 - Escalate "unexpected local edits" only when changed files are outside all active stream scopes (or when no subagent was active).
@@ -61,6 +66,7 @@ For runner + delegation coordination (short `--task` flow), see `docs/delegation
 - Delegate when the work spans >1 domain, touches more than ~2 files, needs verification/research, or is likely to run >10 minutes.
 - Spawn one delegate per workstream with narrow scope and acceptance criteria.
 - Keep delegation MCP enabled by default; enable other MCPs only when relevant to the task.
+- For Playwright-heavy browser flows, use a dedicated child stream and keep parent context lean: artifact-first evidence, short summary in chat, no raw log dumps.
 - Use `delegate.mode=question_only` unless the child truly needs full tool access.
 - Ask delegates for short, structured summaries and to write details into files/artifacts instead of long chat dumps.
 - Use `codex exec` only for pre-task triage (no task id yet) or when delegation is unavailable; copy outcomes into the spec once it exists.
@@ -72,6 +78,8 @@ For runner + delegation coordination (short `--task` flow), see `docs/delegation
 - Register the delegation server once:
   - Preferred: `codex-orchestrator setup --yes`
     - One-shot bootstrap (installs bundled skills + configures delegation/DevTools wiring).
+  - Optional low-friction MCP enable pass: `codex-orchestrator mcp enable --yes`
+    - Enables disabled MCP servers from existing Codex config entries (plan mode redacts env/secret values in displayed command lines).
   - `codex-orchestrator delegation setup --yes`
     - Delegation-only setup (wraps `codex mcp add delegation ...` and keeps wiring discoverable via `codex-orchestrator doctor`).
   - `codex mcp add delegation -- codex-orchestrator delegate-server`
@@ -94,9 +102,24 @@ For runner + delegation coordination (short `--task` flow), see `docs/delegation
 - Stock `codex` is the default path. If you use a custom Codex fork, fast-forward it regularly from `upstream/main`.
 - CO repo checkout only (helper is not shipped in npm): `scripts/codex-cli-refresh.sh --repo /path/to/codex --align-only`
 - CO repo checkout only (managed rebuild helper): `scripts/codex-cli-refresh.sh --repo /path/to/codex --force-rebuild`
+- Managed routing is explicit opt-in: `export CODEX_CLI_USE_MANAGED=1` (without this, stock/global `codex` stays active).
 - Add `--no-push` only when you intentionally want local-only alignment without updating `origin/main`.
 - npm-safe alternative (no repo helper): `codex-orchestrator codex setup --source /path/to/codex --yes --force`
+### 0a.1) Agent role guard (avoid stale built-in defaults)
+- Built-in roles are `default`, `explorer`, and `worker`. `researcher` is user-defined.
+- `spawn_agent` omission defaults to `default`; require explicit `agent_type` for every spawn.
+- For symbolic collab runs, include a first-line role tag in spawned prompts: `[agent_type:<role>]`.
+- Built-in `explorer` can map to an older model profile unless overridden; pin your own role config to keep latest-codex behavior.
+- Recommended baseline in `~/.codex/config.toml`:
+  - `model = "gpt-5.3-codex"`
+  - `model_reasoning_effort = "xhigh"`
+  - `[agents] max_threads = 8` (raise to 12 only after proving stability on your machine)
+  - `[agents.explorer]` with no `config_file` so built-in explorer inherits top-level `gpt-5.3-codex`
+  - Optional `[agents.explorer_fast]` -> `~/.codex/agents/explorer-fast.toml` (`gpt-5.3-codex-spark`, text-only)
+  - `[agents.worker_complex]` -> `~/.codex/agents/worker-complex.toml` (`gpt-5.3-codex`, `xhigh`)
 ### 0b) Background terminal bootstrap (required when MCP is disabled)
 When `delegate.*` is missing in the current session, immediately spawn a **background** Codex run with delegation enabled and hand it the narrow task. Use `codex exec` so it completes without interaction and you can capture output:
@@ -185,11 +208,13 @@ repeat:
 - **Long waits:** `wait_ms` never blocks longer than 10s per call; use polling.
 - **Long-running delegate.spawn:** Prefer `start_only=true` (default) to avoid tool-call timeouts. If you must use `start_only=false`, keep runs short or run long jobs outside delegation (no question queue).
 - **Cloud run branch mismatch:** cloud-mode orchestration against a local-only branch can fail with `couldn't find remote ref ...`; set `CODEX_CLOUD_BRANCH` to a pushed branch (typically `main`) before cloud execution.
+- **Cloud fallback dependence:** fallback should be a safety net, not the default path; for fail-fast cloud lanes, set `CODEX_ORCHESTRATOR_CLOUD_FALLBACK=deny`.
 - **Tool profile mismatch:** child tool profile must be allowed by repo policy; invalid or unsafe names are ignored.
 - **Confirmation misuse:** never pass `confirm_nonce` from model/tool input; it is runner‑injected only.
 - **Secrets exposure:** never include secrets/tokens/PII in delegate prompts or files.
 - **Missing control files:** delegate tools rely on `control_endpoint.json` in the run directory; older runs may not have it.
 - **Collab payload mismatch:** `spawn_agent` rejects calls that include both `message` and `items`.
+- **Collab role routing drift:** if symbolic collab lifecycle validation reports missing/disallowed spawn roles, set explicit `agent_type` and add first-line `[agent_type:<role>]` tags.
 - **Collab UI assumptions:** agent rows/records are id-based today; use explicit stream role text in prompts/artifacts for operator clarity.
 - **Collab lifecycle leaks:** missing `close_agent` calls accumulate open threads and can trigger `agent thread limit reached`; always finish `spawn -> wait -> close_agent` per id.
 - **False "unexpected edits" stops:** when a live subagent owns the touched files, treat those edits as expected output and continue with scope-aware review.
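
The baseline-and-classify step from the shared-checkout guidance can be sketched in TypeScript. The sketch assumes the default `git status --porcelain` (v1) line format (`XY path`, or `XY old -> new` for renames) and models stream ownership as hypothetical path prefixes; `parsePorcelain`, `classifyNewChanges`, and `StreamScope` are illustrative names.

```typescript
// Parse `git status --porcelain` (v1) output into a list of paths.
// Simplification: quoted paths (special characters) are not unescaped.
function parsePorcelain(output: string): string[] {
  const paths: string[] = [];
  for (const line of output.split("\n")) {
    if (line.length < 4) continue; // skip blank lines
    const rest = line.slice(3); // drop the two status chars and the space
    const arrow = rest.indexOf(" -> ");
    paths.push(arrow >= 0 ? rest.slice(arrow + 4) : rest); // renames: keep new path
  }
  return paths;
}

// Hypothetical ownership model: a stream owns files under its prefixes.
interface StreamScope {
  stream: string;
  prefixes: string[];
}

// Classify files that became dirty after the baseline by stream ownership.
function classifyNewChanges(
  baseline: string,
  current: string,
  scopes: StreamScope[],
): { owned: Record<string, string[]>; unexpected: string[] } {
  const before = new Set(parsePorcelain(baseline));
  const owned: Record<string, string[]> = {};
  const unexpected: string[] = [];
  for (const path of parsePorcelain(current)) {
    if (before.has(path)) continue; // already dirty before the spawn
    const scope = scopes.find((s) => s.prefixes.some((p) => path.startsWith(p)));
    if (scope) {
      (owned[scope.stream] ??= []).push(path);
    } else {
      unexpected.push(path);
    }
  }
  return { owned, unexpected };
}
```

Because it compares paths only, a file that was already dirty in the baseline and then edited again by a subagent is not reported; a stricter check could also compare file hashes.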

package/skills/elegance-review/SKILL.md CHANGED

@@ -14,20 +14,31 @@ Use this skill after non-trivial edits to verify the implementation is minimal,
 Run this skill whenever any condition is true:
 - You changed behavior across about 2+ files.
 - You added a new helper/module/pathway and could possibly collapse it.
+- You finished writing code for a non-trivial sub-goal and are about to lock the checkpoint.
 - You finished addressing review feedback and are preparing to hand off.
 - You are about to recommend merge/release.
 - The user explicitly asks for elegance/minimality/overengineering checks.
+- A standalone review just completed for a non-trivial diff.
 ## Quick start
-Focused uncommitted review:
+Compatibility guard (current Codex CLI behavior):
+- Do not combine `--uncommitted`, `--base`, or `--commit` with a custom prompt argument.
+- Use diff-scoped review without prompt, or prompt-only review without scope flags.
+Uncommitted diff:
 ```bash
-codex review --uncommitted "Find avoidable complexity, duplicate abstractions, and unnecessary indirection. Prioritize simplifications that preserve behavior."
+codex review --uncommitted
 ```
 Diff-vs-base review:
 ```bash
-codex review --base <branch> "Focus on smallest viable design and maintenance cost."
+codex review --base <branch>
+```
+Prompt-only pass (no diff flags):
+```bash
+codex review "Find avoidable complexity, duplicate abstractions, and unnecessary indirection. Prioritize simplifications that preserve behavior."
 ```
 ## Workflow
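
The compatibility guard can be enforced by construction: if a review request is modeled as a union in which a diff scope and a custom prompt are separate variants, the forbidden combination cannot be expressed. This hypothetical TypeScript sketch builds `codex review` argument lists; `reviewArgs` is illustrative, and `--commit` is assumed to take a commit SHA.

```typescript
// A review request is either diff-scoped or prompt-only, never both.
type ReviewRequest =
  | { kind: "uncommitted" }
  | { kind: "base"; branch: string }
  | { kind: "commit"; sha: string }
  | { kind: "prompt"; prompt: string };

// Build `codex review` arguments for one request variant.
function reviewArgs(request: ReviewRequest): string[] {
  switch (request.kind) {
    case "uncommitted":
      return ["review", "--uncommitted"];
    case "base":
      return ["review", "--base", request.branch];
    case "commit":
      return ["review", "--commit", request.sha];
    case "prompt":
      return ["review", request.prompt];
    default: {
      // Exhaustiveness check: adding a variant without a case fails to compile.
      const unhandled: never = request;
      throw new Error(`unhandled review request: ${JSON.stringify(unhandled)}`);
    }
  }
}
```

Passing the result to a process launcher as an argument array also avoids shell quoting problems with multi-word prompts.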

package/skills/standalone-review/SKILL.md CHANGED

@@ -15,14 +15,19 @@ Before implementation, use it to review the task/spec against the user’s inten
 Run this skill automatically whenever any condition is true:
 - You made code/config/script/test edits since the last standalone review.
 - You finished a meaningful chunk of work (default: behavior change or about 2+ files touched).
+- You finished a coding burst for a sub-goal and are about to validate, summarize, or switch streams.
 - You are about to report completion, propose merge, or answer "what's next?" with recommendations.
 - You addressed external feedback (PR reviews, bot comments, or CI-fix patches).
-- 45 minutes of active implementation elapsed without a standalone review.
+- A non-trivial open diff (about 2+ files or 40+ changed lines) has not had an elegance pass in the current cycle.
 If review execution is blocked, record why in task notes, then do manual diff review plus targeted tests before proceeding.
 ## Quick start
+Compatibility guard (current Codex CLI behavior):
+- Do not combine `--uncommitted`, `--base`, or `--commit` with a custom prompt argument.
+- Use diff-scoped review without prompt, or prompt-only review without scope flags.
 Uncommitted diff:
 ```
 codex review --uncommitted
@@ -52,7 +57,8 @@ codex review "Focus on correctness, regressions, edge cases; list missing tests.
 2) Run the review often
 - Follow the auto-trigger policy above (not optional).
 - Run after each meaningful chunk of work.
-- Prefer targeted focus prompts for WIP reviews.
+- Prefer targeted focus prompts for WIP reviews (prompt-only invocation).
+- For non-trivial diffs, pair this with `elegance-review` in the same cycle before handoff/merge.
 3) Capture actionable output
 - Prioritize correctness, regressions, and missing tests.
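
The "about 2+ files or 40+ changed lines" trigger for pairing with `elegance-review` can be computed from `git diff --numstat` output (tab-separated added count, deleted count, and path per file; binary files show `-` for both counts). In this TypeScript sketch, "changed lines" is interpreted as added plus deleted lines, and `isNonTrivialDiff` is a hypothetical helper rather than part of the skill.

```typescript
// Heuristic from the skill text: about 2+ files or 40+ changed lines.
// Input is `git diff --numstat` output: "<added>\t<deleted>\t<path>" per file,
// with "-" counts for binary files (counted as files, adding no lines).
function isNonTrivialDiff(numstat: string, minFiles = 2, minLines = 40): boolean {
  let files = 0;
  let changedLines = 0;
  for (const row of numstat.split("\n")) {
    const [added = "", deleted = "", path = ""] = row.split("\t");
    if (path === "") continue; // skip blank or malformed rows
    files += 1;
    const a = Number.parseInt(added, 10);
    const d = Number.parseInt(deleted, 10);
    changedLines += (Number.isNaN(a) ? 0 : a) + (Number.isNaN(d) ? 0 : d);
  }
  return files >= minFiles || changedLines >= minLines;
}
```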

package/templates/README.md CHANGED

@@ -13,4 +13,4 @@ repository and will not overwrite files unless you pass --force.
 Next steps (recommended):
   codex mcp add delegation -- codex-orchestrator delegate-server --repo /path/to/repo
-  codex-orchestrator codex setup   # optional: CO-managed Codex CLI for collab JSONL
+  codex-orchestrator codex setup   # optional: CO-managed Codex CLI (activate only when needed via CODEX_CLI_USE_MANAGED=1)
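
The opt-in routing described in the template note can be sketched as a binary-selection function. This TypeScript sketch is hypothetical: it assumes `CODEX_CLI_USE_MANAGED=1` is the only enabling value, takes the managed binary location as a parameter because this diff does not show where `codex setup` installs it, and falls back to stock `codex` when no managed build is available (the package may instead fail fast).

```typescript
type Env = Record<string, string | undefined>;

// Choose the Codex binary. Stock/global `codex` stays the default; the managed
// build is used only when CODEX_CLI_USE_MANAGED=1 and a managed path is known.
// managedPath is a hypothetical input: the install location is not in this diff.
function resolveCodexBinary(env: Env, managedPath?: string): string {
  if (env["CODEX_CLI_USE_MANAGED"] === "1" && managedPath) {
    return managedPath;
  }
  return "codex";
}
```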

package/templates/codex/AGENTS.md CHANGED

@@ -1,4 +1,4 @@
-<!-- codex:instruction-stamp 2408396e5cc9b25d5522b7064010a36a43007508072f3e0f051ab042370928a1 -->
+<!-- codex:instruction-stamp 4f9803271a8209cf58746c0a71d87421952a402c884cc0262a8765fa5c456128 -->
 # Agent Instructions (Template)
 ## Orchestrator-first workflow
@@ -28,6 +28,7 @@
 ## Deliberation Default (agent-first)
 - Keep MCP as the lead control plane. Use collab/delegated subagents for deliberation when ambiguity or impact is high.
+- Terminology: `collab` is the workflow/tooling name, while Codex CLI feature gating uses `features.multi_agent=true` (legacy alias/names like `RLM_SYMBOLIC_COLLAB` and `manifest.collab_tool_calls` still use `collab`).
 - Run full deliberation on any hard-stop trigger:
   - Irreversible/destructive changes with unclear rollback.
   - Auth/secrets/PII boundary changes.
@@ -47,6 +48,16 @@
   - `P1` high findings are hard-stop only when high-signal (clear evidence or corroboration).
   - `P2/P3` findings are tracked follow-ups.
+## Agent role baseline
+- Built-in roles are `default`, `explorer`, and `worker`; `researcher` is user-defined.
+- `spawn_agent` defaults to `default` when `agent_type` is omitted; always set `agent_type` explicitly.
+- For symbolic collab runs, prefix spawned prompts with `[agent_type:<role>]` on line one so role intent is auditable from JSONL/manifests.
+- Keep top-level defaults on latest codex by setting `model = "gpt-5.3-codex"` in `~/.codex/config.toml`.
+- Define a user `agents.explorer` role without `config_file` so built-in explorer inherits top-level model defaults.
+- Spark caveat: `gpt-5.3-codex-spark` is text-only.
+- Use `[agents] max_threads = 8` as the default baseline; raise to `12` only after proving stable tool/runtime behavior.
+- Add an explicit `worker_complex` role (`gpt-5.3-codex`, `xhigh`) for high-risk implementation streams.
 ## Completion discipline (patience-first)
 - Wait/poll for terminal state on long-running operations (CI checks, reviews, cloud jobs, orchestrator runs) before reporting completion.
 - Reset waiting windows when checks restart or new feedback appears.
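
The role baseline and the role-tag convention combine into a simple drift check for a spawn call, similar in spirit to the role validation the delegation guide mentions for symbolic collab runs. This TypeScript sketch is hypothetical (`roleDriftProblems` is not a package API); the user-defined role names passed in, such as `researcher` or `worker_complex`, depend on your own `~/.codex/config.toml`.

```typescript
// Built-in roles per the template; user-defined roles are supplied by the caller.
const BUILT_IN_ROLES = ["default", "explorer", "worker"];

// Report role-routing drift for one spawn call: omitted agent_type, an unknown
// role, or a first-line [agent_type:<role>] tag that is missing or mismatched.
function roleDriftProblems(
  agentType: string | undefined,
  message: string,
  userRoles: string[],
): string[] {
  const problems: string[] = [];
  if (!agentType) {
    problems.push("agent_type omitted (would fall back to default)");
    return problems;
  }
  if (!BUILT_IN_ROLES.includes(agentType) && !userRoles.includes(agentType)) {
    problems.push(`unknown role: ${agentType}`);
  }
  const firstLine = message.split("\n", 1)[0] ?? "";
  const match = /^\[agent_type:([^\]]+)\]$/.exec(firstLine.trim());
  if (!match) {
    problems.push("missing first-line [agent_type:<role>] tag");
  } else if (match[1] !== agentType) {
    problems.push(`tag ${match[1]} does not match agent_type ${agentType}`);
  }
  return problems;
}
```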