@kbediako/codex-orchestrator 0.1.31 → 0.1.33

Files changed (32)
  1. package/README.md +79 -9
  2. package/dist/bin/codex-orchestrator.js +671 -66
  3. package/dist/orchestrator/src/cli/codexCliSetup.js +1 -0
  4. package/dist/orchestrator/src/cli/doctor.js +186 -7
  5. package/dist/orchestrator/src/cli/doctorUsage.js +150 -8
  6. package/dist/orchestrator/src/cli/init.js +1 -1
  7. package/dist/orchestrator/src/cli/mcpEnable.js +392 -0
  8. package/dist/orchestrator/src/cli/orchestrator.js +161 -2
  9. package/dist/orchestrator/src/cli/rlmRunner.js +289 -35
  10. package/dist/orchestrator/src/cli/run/manifest.js +31 -6
  11. package/dist/orchestrator/src/cli/services/commandRunner.js +10 -2
  12. package/dist/orchestrator/src/cli/services/runSummaryWriter.js +35 -0
  13. package/dist/orchestrator/src/cli/skills.js +3 -8
  14. package/dist/orchestrator/src/cli/utils/advancedAutopilot.js +114 -0
  15. package/dist/orchestrator/src/cli/utils/codexCli.js +21 -0
  16. package/dist/orchestrator/src/cli/utils/delegationGuardRunner.js +85 -8
  17. package/dist/orchestrator/src/cli/utils/specGuardRunner.js +79 -19
  18. package/dist/orchestrator/src/cloud/CodexCloudTaskExecutor.js +25 -6
  19. package/dist/orchestrator/src/control-plane/request-builder.js +9 -8
  20. package/dist/scripts/lib/pr-watch-merge.js +493 -4
  21. package/docs/README.md +7 -5
  22. package/package.json +1 -1
  23. package/schemas/manifest.json +27 -0
  24. package/skills/collab-deliberation/SKILL.md +6 -0
  25. package/skills/collab-evals/SKILL.md +4 -0
  26. package/skills/collab-subagents-first/SKILL.md +29 -7
  27. package/skills/delegation-usage/DELEGATION_GUIDE.md +31 -5
  28. package/skills/delegation-usage/SKILL.md +29 -4
  29. package/skills/elegance-review/SKILL.md +14 -3
  30. package/skills/standalone-review/SKILL.md +8 -2
  31. package/templates/README.md +1 -1
  32. package/templates/codex/AGENTS.md +12 -1
@@ -7,6 +7,11 @@ description: Structure multi-agent brainstorming and deliberation (options, trad
7
7
 
8
8
  Use this skill when the user asks for brainstorming, tradeoffs, option comparison, or decision support before implementation. This skill is for ideas and decisions, not coding.
9
9
 
10
+ ## Terminology + feature gate (required)
11
+ - In this skill, "collab" means multi-agent tool usage (`spawn_agent` / `wait` / `close_agent`).
12
+ - Codex CLI feature gating is `features.multi_agent=true`; treat `collab` as legacy naming in some env/artifact keys.
13
+ - For symbolic orchestration, existing key names remain `RLM_SYMBOLIC_COLLAB` and `manifest.collab_tool_calls`.
14
+
10
15
  ## Deliberation Default v1 (required)
11
16
  - Keep MCP as the lead control plane. Use collab/delegated subagents to generate and challenge options.
12
17
  - Run full deliberation when any hard-stop trigger is true:
@@ -84,4 +89,5 @@ Use this skill when the user asks for brainstorming, tradeoffs, option compariso
84
89
  - Do not present uncertainty as certainty.
85
90
  - Keep outputs concise and action-oriented.
86
91
  - If collab subagents are used, close lifecycle loops per id (`spawn_agent` -> `wait` -> `close_agent`) before finishing.
92
+ - If collab subagents are used, always set explicit `agent_type` (omission defaults to `default`) and prefix spawned prompts with `[agent_type:<role>]`.
87
93
  - If you cannot close collab agents (missing ids) and spawn keeps failing, restart the session and re-run deliberation; keep work moving by doing solo deliberation meanwhile.
@@ -9,6 +9,10 @@ Use this skill to run repeatable collab evaluation scenarios and record evidence
9
9
 
10
10
  ## Quick start
11
11
 
12
+ 0) Confirm feature readiness:
13
+ - Run `codex features list` and verify `multi_agent` is enabled.
14
+ - In this skill, "collab" refers to the same multi-agent tooling path; `collab` naming remains in legacy keys like `RLM_SYMBOLIC_COLLAB` and `manifest.collab_tool_calls`.
15
+
12
16
  1) Pick the scenario(s):
13
17
  - Large-context symbolic RLM with collab subcalls.
14
18
  - Multi-hour refactor with checkpoints.
@@ -11,6 +11,12 @@ Delegate as a manager, not as a pass-through. Split work into narrow streams, gi
11
11
 
12
12
  Note: If a global `collab-subagents-first` skill is installed, prefer that and fall back to this bundled skill.
13
13
 
14
+ ## Terminology + feature gate
15
+
16
+ - Use "collab" as the workflow/tooling term for subagent calls (`spawn_agent` / `wait` / `close_agent`).
17
+ - Codex CLI enablement is `features.multi_agent=true`; `collab` remains as legacy naming in fields like `RLM_SYMBOLIC_COLLAB` and `manifest.collab_tool_calls`.
18
+ - Keep existing env/artifact key names as-is unless upstream explicitly changes those interfaces.
19
+
14
20
  ## Delegation gate
15
21
 
16
22
  Use subagents when any condition is true:
@@ -89,6 +95,8 @@ Skip subagents when all conditions are true:
89
95
  - `message` (plain text), or
90
96
  - `items` (structured input).
91
97
  - Do not send both `message` and `items` in one spawn call.
98
+ - `spawn_agent` falls back to `default` when `agent_type` is omitted; always set `agent_type` explicitly.
99
+ - Prefix spawned prompts with `[agent_type:<role>]` on line one so role intent is auditable from collab JSONL/manifests.
92
100
  - Use `items` when you need explicit structured context (for example `mention` paths like `app://...` or selected `skill` entries) instead of flattening everything into one long string.
93
101
  - Spawn returns an `agent_id` (thread id). Collab event rendering/picker labels are id-based today; do not depend on custom visible agent names.
94
102
  - To keep operator readability high despite id labels, encode the role clearly in your stream labels and first-line task brief (for example `review`, `tests`, `research`).
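A minimal sketch of the spawn payload rules in this hunk, assuming the payload is a plain dictionary. The helper name `build_spawn_args` and the dictionary shape are illustrative assumptions, not the tool's actual interface; only `agent_type`, `message`, `items`, and the `[agent_type:<role>]` tag come from the text above. Structured `items` are passed through untouched because their schema is not shown here.

```python
from typing import Any, Dict, List, Optional


def build_spawn_args(
    agent_type: str,
    message: Optional[str] = None,
    items: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Build a spawn payload (hypothetical shape) that follows the rules above."""
    if not agent_type:
        # Omitting agent_type would silently route to `default`.
        raise ValueError("agent_type must be set explicitly")
    if (message is None) == (items is None):
        # Exactly one input style per call: message or items, never both.
        raise ValueError("pass exactly one of message or items")

    payload: Dict[str, Any] = {"agent_type": agent_type}
    if message is not None:
        tag = f"[agent_type:{agent_type}]"
        # Put the role tag on line one so role intent is auditable later.
        payload["message"] = message if message.startswith(tag) else f"{tag}\n{message}"
    else:
        # Item schema is not specified in the text; pass it through as-is.
        payload["items"] = items
    return payload
```

Calling `build_spawn_args("explorer", message="Map the test layout.")` yields a message whose first line is the role tag, while passing both `message` and `items` raises before any spawn is attempted.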
@@ -96,10 +104,13 @@ Skip subagents when all conditions are true:
96
104
  ## Collab lifecycle hygiene (required)
97
105
 
98
106
  When you use collab tools (`spawn_agent` / `wait` / `close_agent`):
99
- - Keep a local list of every returned `agent_id`.
100
- - For every successful `spawn_agent`, run `wait` and then `close_agent` for that same id.
101
- - Always close agents on error/timeout paths; do a final cleanup pass before finishing so no id is left unclosed.
102
- - If spawn fails with `agent thread limit reached`, stop spawning immediately, close any known ids, then retry once. If you still cannot spawn, proceed without collab (solo or via delegation) and explicitly note the degraded mode.
107
+ - Keep an `open_agent_ids` ledger for the current turn/stage.
108
+ - On successful `spawn_agent`, append the returned `agent_id` to `open_agent_ids` immediately.
109
+ - For every successful spawn, run `wait` and then `close_agent` for that same id.
110
+ - After each successful close, remove that id from `open_agent_ids`.
111
+ - On timeout/error paths, close any id still in `open_agent_ids` before returning control.
112
+ - Run a final close-sweep before handoff: iterate all remaining ids in `open_agent_ids`, call `close_agent`, then clear the list.
113
+ - If spawn fails with `agent thread limit reached`, stop spawning immediately, run a close-sweep for all known ids, retry one time, and if it still fails proceed without collab (solo or delegation) while explicitly noting degraded mode.
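The ledger rules added in this hunk can be sketched roughly as follows. This is an illustration under stated assumptions, not the orchestrator's implementation: `spawn_agent`, `wait`, and `close_agent` are passed in as plain callables standing in for the real tool calls, and `ThreadLimitError` is a hypothetical exception representing the `agent thread limit reached` failure.

```python
from typing import Callable, List, Optional


class ThreadLimitError(RuntimeError):
    """Hypothetical stand-in for a spawn failure: `agent thread limit reached`."""


def close_sweep(open_agent_ids: List[str], close_agent: Callable[[str], None]) -> None:
    """Close every id still in the ledger; drop an id only after its close succeeds."""
    for agent_id in list(open_agent_ids):
        close_agent(agent_id)
        open_agent_ids.remove(agent_id)


def run_stream(
    prompt: str,
    spawn_agent: Callable[[str], str],
    wait: Callable[[str], str],
    close_agent: Callable[[str], None],
    open_agent_ids: List[str],
) -> Optional[str]:
    """Run spawn -> wait -> close for one stream; None means degraded mode."""
    try:
        try:
            agent_id = spawn_agent(prompt)
        except ThreadLimitError:
            # Stop spawning, sweep all known ids, then retry one time.
            close_sweep(open_agent_ids, close_agent)
            try:
                agent_id = spawn_agent(prompt)
            except ThreadLimitError:
                return None  # caller proceeds without collab and notes it
        open_agent_ids.append(agent_id)  # record immediately after spawn
        result = wait(agent_id)
        close_agent(agent_id)
        open_agent_ids.remove(agent_id)  # remove only after a successful close
        return result
    except Exception:
        # Timeout/error path: close anything still open before returning control.
        close_sweep(open_agent_ids, close_agent)
        raise
```

A final `close_sweep(open_agent_ids, close_agent)` before handoff covers any ids left by streams that were started outside `run_stream`.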
103
114
 
104
115
  ## Required subagent contract
105
116
 
@@ -123,7 +134,7 @@ Reject and rerun when responses are:
123
134
  - Keep privileged/high-risk operations in the parent thread when interactive approval is required.
124
135
  - Subagents inherit core execution context (for example cwd/sandbox constraints), so include environment assumptions explicitly in each brief.
125
136
 
126
- ## Review loop (standalone-review pairing)
137
+ ## Review loop (standalone + elegance pairing)
127
138
 
128
139
  Use a two-layer review loop:
129
140
 
@@ -137,6 +148,7 @@ Use a two-layer review loop:
137
148
  2) Parent independent review (required)
138
149
  - After integrating subagent work, run a standalone review from the parent.
139
150
  - Prefer the global `standalone-review` skill workflow for consistent checks.
151
+ - For non-trivial diffs (about 2+ files or 40+ changed lines), run `elegance-review` in the same cycle before handoff/merge.
140
152
 
141
153
  Do not treat wrapper handoff-only output as a completed review.
142
154
 
@@ -151,11 +163,20 @@ Do not treat wrapper handoff-only output as a completed review.
151
163
  - Symptoms: missing collab/delegate tool-call evidence, framing/parsing errors, or unstable collab behavior after CLI upgrades.
152
164
  - Check versions first: `codex --version` and `codex-orchestrator --version`.
153
165
  - Confirm feature readiness: `codex-orchestrator doctor` (checks collab/cloud/delegation readiness and prints enablement commands).
154
- - CO repo refresh path (safe default): `scripts/codex-cli-refresh.sh --repo <codex-repo> --no-push`.
155
- - Rebuild managed CLI only: `codex-orchestrator codex setup --source <codex-repo> --yes --force`.
166
+ - CO repo refresh path (safe default): `scripts/codex-cli-refresh.sh --repo <codex-repo> --align-only`.
167
+ - Rebuild managed CLI only (optional): `codex-orchestrator codex setup --source <codex-repo> --yes --force`.
168
+ - Managed routing is explicit opt-in: `export CODEX_CLI_USE_MANAGED=1` (stock/global `codex` remains default otherwise).
156
169
  - If local codex is materially behind upstream, sync before diagnosing collab behavior differences.
170
+ - Built-in `explorer` may map to an older model profile; set `[agents.explorer]` without `config_file` so it inherits top-level `gpt-5.3-codex`, and reserve spark for optional `[agents.explorer_fast]` (text-only caveat).
171
+ - For cloud-heavy streams, treat fallback as a safety net only; set `CODEX_ORCHESTRATOR_CLOUD_FALLBACK=deny` in fail-fast lanes.
157
172
  - If compatibility remains unstable, continue with non-collab execution path and document the degraded mode.
158
173
 
174
+ ## High-output guardrail (Playwright/browser tools)
175
+
176
+ - Route Playwright-heavy work to a dedicated subagent stream so the parent thread does not absorb large browser logs/snapshots.
177
+ - Keep raw Playwright output in artifacts and return only concise summary + evidence paths to the parent.
178
+ - For these streams, explicitly close lifecycle loops (`spawn_agent` -> `wait` -> `close_agent`) before synthesis.
179
+
159
180
  ## Depth-limit guardrail
160
181
 
161
182
  - Collab spawn depth is bounded. At max depth, `spawn_agent` will fail and the branch must execute directly.
@@ -171,6 +192,7 @@ Do not treat wrapper handoff-only output as a completed review.
171
192
  - Do not keep long single-agent execution in parent when a focused subagent can own it.
172
193
  - Do not skip delegation solely because there is only one implementation stream; single-stream delegation is valid for context offload.
173
194
  - Do not rely on human-readable agent names in TUI labels for control flow; use stream ownership and evidence paths as source of truth.
195
+ - Do not omit `agent_type` on `spawn_agent`; omission silently routes to `default`.
174
196
  - Do not end the parent work with unclosed collab agent ids.
175
197
  - Do not treat every delegated edit as "unexpected"; first verify whether the edit belongs to an active stream owner.
176
198
 
@@ -9,7 +9,7 @@ Use this guide for deeper context on delegation behavior, tool surfaces, and tro
9
9
  - It does **not** provide general tools itself; it only exposes `delegate.*` + optional `github.*` tools.
10
10
  - Child runs get tools based on `delegate.mode` + `delegate.tool_profile` + repo caps.
11
11
  - Delegation MCP stays enabled by default (only MCP on by default); disable it only when required by safety constraints.
12
- - Collab multi-agent mode is separate from delegation; for symbolic RLM subcalls, set `RLM_SYMBOLIC_COLLAB=1` and ensure a collab-capable Codex CLI. Collab tool calls are recorded in `manifest.collab_tool_calls`. If collab tools are unavailable in your CLI build, skip collab steps; delegation still works independently.
12
+ - Multi-agent (collab tools) mode is separate from delegation; for symbolic RLM subcalls, set `RLM_SYMBOLIC_MULTI_AGENT=1` (legacy alias: `RLM_SYMBOLIC_COLLAB=1`) and ensure your Codex CLI has `features.multi_agent=true` (`collab` is a legacy alias/name in some keys). Collab tool calls are recorded in `manifest.collab_tool_calls`. If collab tools are unavailable in your CLI build, skip collab steps; delegation still works independently.
13
13
 
14
14
  ## Background-run pattern (preferred)
15
15
 
@@ -81,15 +81,25 @@ delegate.spawn({
81
81
  })
82
82
  ```
83
83
 
84
- ## Collab lifecycle hygiene (required)
84
+ ## Multi-agent (collab tools) lifecycle hygiene (required)
85
85
 
86
86
  When using collab tools (`spawn_agent` / `wait` / `close_agent`):
87
87
 
88
88
  - Treat each spawned `agent_id` as a resource that must be closed.
89
+ - `spawn_agent` falls back to `default` when `agent_type` is omitted; always set `agent_type` explicitly.
90
+ - Prefix spawned prompts with `[agent_type:<role>]` on line one for auditable role routing.
89
91
  - For every successful spawn, run `wait` then `close_agent` for the same id.
90
- - Keep a local list of spawned ids and run a final cleanup pass before returning.
91
- - On timeout/error paths, still close known ids before reporting failure.
92
- - If you see `agent thread limit reached`, stop spawning immediately, close known ids, and retry only after cleanup.
92
+ - Keep an `open_agent_ids` ledger and append each successful spawn id immediately.
93
+ - Remove ids from `open_agent_ids` only after successful `close_agent`.
94
+ - Run a final close-sweep before returning: close every id still in `open_agent_ids`, then clear it.
95
+ - On timeout/error paths, run the same close-sweep before reporting failure.
96
+ - If you see `agent thread limit reached`, stop spawning immediately, run close-sweep, retry once, and if still blocked continue in degraded mode (no further collab spawns).
97
+
98
+ ## Playwright stream hygiene
99
+
100
+ - Run Playwright-heavy steps in a dedicated child stream; keep browser output out of parent chat.
101
+ - Prefer artifact-first reporting (paths/manifests/screenshots) and a short synthesis instead of raw logs.
102
+ - Keep lifecycle strict for browser streams too: `spawn_agent` -> `wait` -> `close_agent`.
93
103
 
94
104
  ## RLM budget overrides (recommended defaults)
95
105
 
@@ -125,9 +135,24 @@ Delegation MCP expects JSONL. Keep `codex-orchestrator` aligned with the current
125
135
  - Stock `codex` is the default path. If using a custom Codex fork, fast-forward from `upstream/main` regularly.
126
136
  - CO repo checkout only (helper is not shipped in npm): `scripts/codex-cli-refresh.sh --repo /path/to/codex --align-only`
127
137
  - CO repo checkout only (managed rebuild helper): `scripts/codex-cli-refresh.sh --repo /path/to/codex --force-rebuild`
138
+ - Managed routing is opt-in: `export CODEX_CLI_USE_MANAGED=1` (without this, stock/global `codex` remains active).
128
139
  - Add `--no-push` only when you intentionally want local-only alignment without updating `origin/main`.
129
140
  - npm-safe alternative (no repo helper): `codex-orchestrator codex setup --source /path/to/codex --yes --force`
130
141
 
142
+ ## Agent role guard (recommended)
143
+
144
+ - Built-in agent roles are `default`, `explorer`, `worker`; `researcher` is user-defined.
145
+ - `spawn_agent` omission defaults to `default`; require explicit `agent_type` for every spawn.
146
+ - For symbolic collab runs, include a first-line role tag in spawned prompts: `[agent_type:<role>]`.
147
+ - Built-in `explorer` may map to an older model profile unless overridden in `~/.codex/config.toml`.
148
+ - Recommended baseline:
149
+ - `model = "gpt-5.3-codex"`
150
+ - `model_reasoning_effort = "xhigh"`
151
+ - `[agents] max_threads = 8` (consider 12 only after stability checks)
152
+ - Set `[agents.explorer]` with no `config_file` so explorer inherits top-level `gpt-5.3-codex`.
153
+ - Add optional `[agents.explorer_fast]` for `gpt-5.3-codex-spark` (text-only caveat).
154
+ - Add `[agents.worker_complex]` for high-risk edits (`gpt-5.3-codex`, `xhigh`).
155
+
131
156
  ## Common failures
132
157
 
133
158
  - **Handshake failed / connection closed**: Usually an older binary (0.1.5) or framed responses.
@@ -136,5 +161,6 @@ Delegation MCP expects JSONL. Keep `codex-orchestrator` aligned with the current
136
161
  - **Missing control files**: delegate tools rely on `control_endpoint.json` in the run directory.
137
162
  - **Run identifiers**: status/pause/cancel require `manifest_path`; question queue requires `parent_manifest_path`.
138
163
  - **Collab payload mismatch**: `spawn_agent` calls fail if they include both `message` and `items`.
164
+ - **Collab role routing drift**: if symbolic collab lifecycle validation reports missing/disallowed spawn roles, set explicit `agent_type` and add first-line `[agent_type:<role>]` tags.
139
165
  - **Collab depth limits**: recursive collab fan-out can fail near max depth; prefer shallow parent fan-out.
140
166
  - **Collab lifecycle leaks**: missing `close_agent` calls can exhaust thread slots and block future spawns (`agent thread limit reached`).
@@ -11,18 +11,23 @@ Use this skill to operate delegation MCP tools with delegation enabled by defaul
11
11
 
12
12
  `delegation-usage` is the canonical delegation workflow skill. If `delegate-early` is present, treat it as a compatibility alias that should redirect to this skill.
13
13
 
14
- Collab multi-agent mode is separate from delegation. For symbolic RLM subcalls that use collab tools, set `RLM_SYMBOLIC_COLLAB=1` and ensure a collab-capable Codex CLI; collab tool calls are recorded in `manifest.collab_tool_calls`. If collab tools are unavailable in your CLI build, skip collab steps; delegation still works independently.
14
+ Multi-agent (collab tools) mode is separate from delegation. For symbolic RLM subcalls that use collab tools, set `RLM_SYMBOLIC_MULTI_AGENT=1` (legacy alias: `RLM_SYMBOLIC_COLLAB=1`) and ensure your Codex CLI has `features.multi_agent=true` (`collab` is a legacy alias/name in some keys); collab tool calls are recorded in `manifest.collab_tool_calls`. If collab tools are unavailable in your CLI build, skip collab steps; delegation still works independently.
15
15
 
16
- ## Collab realities in delegated runs (current behavior)
16
+ ## Multi-agent (collab tools) realities in delegated runs (current behavior)
17
17
 
18
18
  - `spawn_agent` accepts one input style per call: either `message` (plain text) or `items` (structured input).
19
19
  - Do not send both `message` and `items` in the same `spawn_agent` call.
20
+ - `spawn_agent` falls back to `default` when `agent_type` is omitted; always set `agent_type` explicitly.
21
+ - For auditable role routing, prefix spawned prompts with `[agent_type:<role>]` on the first line and keep it aligned with `agent_type`.
20
22
  - Spawn returns an `agent_id` (thread id). Current TUI collab rendering is id-based; do not depend on custom visible agent names.
21
23
  - Subagents spawned through collab run with approval effectively set to `never`; design child tasks to avoid approval/escalation requirements.
22
24
  - Collab spawn depth is bounded. Near/at max depth, recursive delegation can fail or collab can be disabled in children; prefer shallow parent fan-out.
23
25
  - **Lifecycle is mandatory:** for every successful `spawn_agent`, run `wait` and then `close_agent` for that same id before task completion.
24
- - Keep a local list of spawned ids and run a final cleanup pass so no agent id is left unclosed on timeout/error paths.
25
- - If spawn fails with `agent thread limit reached`, stop spawning, close any known ids first, then surface a concise recovery note.
26
+ - Keep an `open_agent_ids` ledger and append ids immediately after each successful spawn.
27
+ - Remove ids from `open_agent_ids` only after successful `close_agent`.
28
+ - Run a final close-sweep before handoff: close every id still in `open_agent_ids`, then clear the ledger.
29
+ - On timeout/error paths, execute the same close-sweep before returning.
30
+ - If spawn fails with `agent thread limit reached`, stop spawning, run close-sweep for known ids, retry once, and if still blocked surface a concise degraded-mode recovery note.
26
31
  - In a shared checkout, spawned subagents may produce file edits. Treat edits inside that stream's declared ownership as expected delegated output, not external interference.
27
32
  - Before spawning, capture a baseline (`git status --porcelain`). After `wait`, diff against baseline and classify file changes by stream ownership.
28
33
  - Escalate "unexpected local edits" only when changed files are outside all active stream scopes (or when no subagent was active).
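One way to picture the baseline-and-classify step described above: capture `git status --porcelain` text before spawning and again after `wait`, then map each newly changed path to a stream by path prefix. The function names and the `stream_scopes` mapping are hypothetical; the sketch also assumes porcelain v1 lines without quoted paths, and it only sees paths that were clean in the baseline.

```python
from typing import Dict, List, Set


def changed_paths(porcelain: str) -> Set[str]:
    """Extract paths from `git status --porcelain` (v1) text."""
    paths: Set[str] = set()
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue
        path = line[3:]  # two status characters, one space, then the path
        if " -> " in path:  # rename entries are written as "old -> new"
            path = path.split(" -> ", 1)[1]
        paths.add(path)
    return paths


def classify_changes(
    baseline: str, after: str, stream_scopes: Dict[str, List[str]]
) -> Dict[str, str]:
    """Map each newly changed path to its owning stream, or to 'unexpected'."""
    new_paths = changed_paths(after) - changed_paths(baseline)
    result: Dict[str, str] = {}
    for path in sorted(new_paths):
        owner = "unexpected"
        for stream, prefixes in stream_scopes.items():
            if any(path.startswith(prefix) for prefix in prefixes):
                owner = stream
                break
        result[path] = owner
    return result
```

Only paths classified as `unexpected` would warrant escalation; everything else is treated as expected output from the stream that owns that prefix.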
@@ -61,6 +66,7 @@ For runner + delegation coordination (short `--task` flow), see `docs/delegation
61
66
  - Delegate when the work spans >1 domain, touches more than ~2 files, needs verification/research, or is likely to run >10 minutes.
62
67
  - Spawn one delegate per workstream with narrow scope and acceptance criteria.
63
68
  - Keep delegation MCP enabled by default; enable other MCPs only when relevant to the task.
69
+ - For Playwright-heavy browser flows, use a dedicated child stream and keep parent context lean: artifact-first evidence, short summary in chat, no raw log dumps.
64
70
  - Use `delegate.mode=question_only` unless the child truly needs full tool access.
65
71
  - Ask delegates for short, structured summaries and to write details into files/artifacts instead of long chat dumps.
66
72
  - Use `codex exec` only for pre-task triage (no task id yet) or when delegation is unavailable; copy outcomes into the spec once it exists.
@@ -72,6 +78,8 @@ For runner + delegation coordination (short `--task` flow), see `docs/delegation
72
78
  - Register the delegation server once:
73
79
  - Preferred: `codex-orchestrator setup --yes`
74
80
  - One-shot bootstrap (installs bundled skills + configures delegation/DevTools wiring).
81
+ - Optional low-friction MCP enable pass: `codex-orchestrator mcp enable --yes`
82
+ - Enables disabled MCP servers from existing Codex config entries (plan mode redacts env/secret values in displayed command lines).
75
83
  - `codex-orchestrator delegation setup --yes`
76
84
  - Delegation-only setup (wraps `codex mcp add delegation ...` and keeps wiring discoverable via `codex-orchestrator doctor`).
77
85
  - `codex mcp add delegation -- codex-orchestrator delegate-server`
@@ -94,9 +102,24 @@ For runner + delegation coordination (short `--task` flow), see `docs/delegation
94
102
  - Stock `codex` is the default path. If you use a custom Codex fork, fast-forward it regularly from `upstream/main`.
95
103
  - CO repo checkout only (helper is not shipped in npm): `scripts/codex-cli-refresh.sh --repo /path/to/codex --align-only`
96
104
  - CO repo checkout only (managed rebuild helper): `scripts/codex-cli-refresh.sh --repo /path/to/codex --force-rebuild`
105
+ - Managed routing is explicit opt-in: `export CODEX_CLI_USE_MANAGED=1` (without this, stock/global `codex` stays active).
97
106
  - Add `--no-push` only when you intentionally want local-only alignment without updating `origin/main`.
98
107
  - npm-safe alternative (no repo helper): `codex-orchestrator codex setup --source /path/to/codex --yes --force`
99
108
 
109
+ ### 0a.1) Agent role guard (avoid stale built-in defaults)
110
+
111
+ - Built-in roles are `default`, `explorer`, and `worker`. `researcher` is user-defined.
112
+ - `spawn_agent` omission defaults to `default`; require explicit `agent_type` for every spawn.
113
+ - For symbolic collab runs, include a first-line role tag in spawned prompts: `[agent_type:<role>]`.
114
+ - Built-in `explorer` can map to an older model profile unless overridden; pin your own role config to keep latest-codex behavior.
115
+ - Recommended baseline in `~/.codex/config.toml`:
116
+ - `model = "gpt-5.3-codex"`
117
+ - `model_reasoning_effort = "xhigh"`
118
+ - `[agents] max_threads = 8` (raise to 12 only after proving stability on your machine)
119
+ - `[agents.explorer]` with no `config_file` so built-in explorer inherits top-level `gpt-5.3-codex`
120
+ - Optional `[agents.explorer_fast]` -> `~/.codex/agents/explorer-fast.toml` (`gpt-5.3-codex-spark`, text-only)
121
+ - `[agents.worker_complex]` -> `~/.codex/agents/worker-complex.toml` (`gpt-5.3-codex`, `xhigh`)
122
+
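As a rough illustration of the role guard above, the check below inspects a configuration that has already been parsed into a dictionary. The helper name `check_role_baseline` and the idea of linting the config this way are assumptions made for this sketch; the key names (`model`, `agents`, `max_threads`, `explorer`, `config_file`) and the recommended values are taken from the baseline listed above.

```python
from typing import Any, Dict, List


def check_role_baseline(config: Dict[str, Any]) -> List[str]:
    """Return human-readable notes where a parsed config departs from the baseline."""
    notes: List[str] = []
    if config.get("model") != "gpt-5.3-codex":
        notes.append("top-level model is not gpt-5.3-codex")

    agents = config.get("agents", {})
    explorer = agents.get("explorer")
    if explorer is None:
        notes.append("[agents.explorer] is missing; built-in explorer may use an older profile")
    elif "config_file" in explorer:
        notes.append("[agents.explorer] sets config_file; it will not inherit the top-level model")

    if agents.get("max_threads", 8) > 8:
        notes.append("max_threads is above 8; raise it only after stability checks")
    return notes
```

An empty list means the parsed config matches the parts of the baseline this sketch checks; it does not validate the optional `explorer_fast` or `worker_complex` roles.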
100
123
  ### 0b) Background terminal bootstrap (required when MCP is disabled)
101
124
 
102
125
  When `delegate.*` is missing in the current session, immediately spawn a **background** Codex run with delegation enabled and hand it the narrow task. Use `codex exec` so it completes without interaction and you can capture output:
@@ -185,11 +208,13 @@ repeat:
185
208
  - **Long waits:** `wait_ms` never blocks longer than 10s per call; use polling.
186
209
  - **Long-running delegate.spawn:** Prefer `start_only=true` (default) to avoid tool-call timeouts. If you must use `start_only=false`, keep runs short or run long jobs outside delegation (no question queue).
187
210
  - **Cloud run branch mismatch:** cloud-mode orchestration against a local-only branch can fail with `couldn't find remote ref ...`; set `CODEX_CLOUD_BRANCH` to a pushed branch (typically `main`) before cloud execution.
211
+ - **Cloud fallback dependence:** fallback should be a safety net, not the default path; for fail-fast cloud lanes, set `CODEX_ORCHESTRATOR_CLOUD_FALLBACK=deny`.
188
212
  - **Tool profile mismatch:** child tool profile must be allowed by repo policy; invalid or unsafe names are ignored.
189
213
  - **Confirmation misuse:** never pass `confirm_nonce` from model/tool input; it is runner‑injected only.
190
214
  - **Secrets exposure:** never include secrets/tokens/PII in delegate prompts or files.
191
215
  - **Missing control files:** delegate tools rely on `control_endpoint.json` in the run directory; older runs may not have it.
192
216
  - **Collab payload mismatch:** `spawn_agent` rejects calls that include both `message` and `items`.
217
+ - **Collab role routing drift:** if symbolic collab lifecycle validation reports missing/disallowed spawn roles, set explicit `agent_type` and add first-line `[agent_type:<role>]` tags.
193
218
  - **Collab UI assumptions:** agent rows/records are id-based today; use explicit stream role text in prompts/artifacts for operator clarity.
194
219
  - **Collab lifecycle leaks:** missing `close_agent` calls accumulate open threads and can trigger `agent thread limit reached`; always finish `spawn -> wait -> close_agent` per id.
195
220
  - **False "unexpected edits" stops:** when a live subagent owns the touched files, treat those edits as expected output and continue with scope-aware review.
@@ -14,20 +14,31 @@ Use this skill after non-trivial edits to verify the implementation is minimal,
14
14
  Run this skill whenever any condition is true:
15
15
  - You changed behavior across about 2+ files.
16
16
  - You added a new helper/module/pathway and could possibly collapse it.
17
+ - You finished writing code for a non-trivial sub-goal and are about to lock the checkpoint.
17
18
  - You finished addressing review feedback and are preparing to hand off.
18
19
  - You are about to recommend merge/release.
19
20
  - The user explicitly asks for elegance/minimality/overengineering checks.
21
+ - A standalone review just completed for a non-trivial diff.
20
22
 
21
23
  ## Quick start
22
24
 
23
- Focused uncommitted review:
25
+ Compatibility guard (current Codex CLI behavior):
26
+ - Do not combine `--uncommitted`, `--base`, or `--commit` with a custom prompt argument.
27
+ - Use diff-scoped review without prompt, or prompt-only review without scope flags.
28
+
29
+ Uncommitted diff:
24
30
  ```bash
25
- codex review --uncommitted "Find avoidable complexity, duplicate abstractions, and unnecessary indirection. Prioritize simplifications that preserve behavior."
31
+ codex review --uncommitted
26
32
  ```
27
33
 
28
34
  Diff-vs-base review:
29
35
  ```bash
30
- codex review --base <branch> "Focus on smallest viable design and maintenance cost."
36
+ codex review --base <branch>
37
+ ```
38
+
39
+ Prompt-only pass (no diff flags):
40
+ ```bash
41
+ codex review "Find avoidable complexity, duplicate abstractions, and unnecessary indirection. Prioritize simplifications that preserve behavior."
31
42
  ```
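The compatibility guard can be expressed as a small argument builder that refuses the combination the guard warns about. This is a sketch only: `build_review_argv` is a hypothetical helper, it covers `--uncommitted` and `--base` but leaves out `--commit` because its argument form is not shown here, and it allows one scope at a time purely to keep the example small.

```python
from typing import List, Optional


def build_review_argv(
    prompt: Optional[str] = None,
    uncommitted: bool = False,
    base: Optional[str] = None,
) -> List[str]:
    """Build a `codex review` argv that never mixes scope flags with a prompt."""
    scoped = uncommitted or base is not None
    if scoped and prompt:
        raise ValueError("use a diff-scoped review without a prompt, or a prompt without scope flags")
    if uncommitted and base is not None:
        # Simplification for this sketch: one scope per invocation.
        raise ValueError("choose one scope: uncommitted or base")

    argv = ["codex", "review"]
    if uncommitted:
        argv.append("--uncommitted")
    elif base is not None:
        argv.extend(["--base", base])
    elif prompt:
        argv.append(prompt)
    return argv
```

The resulting list can be handed to `subprocess.run` as-is, which avoids shell quoting problems with long prompt strings.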
32
43
 
33
44
  ## Workflow
@@ -15,14 +15,19 @@ Before implementation, use it to review the task/spec against the user’s inten
15
15
  Run this skill automatically whenever any condition is true:
16
16
  - You made code/config/script/test edits since the last standalone review.
17
17
  - You finished a meaningful chunk of work (default: behavior change or about 2+ files touched).
18
+ - You finished a coding burst for a sub-goal and are about to validate, summarize, or switch streams.
18
19
  - You are about to report completion, propose merge, or answer "what's next?" with recommendations.
19
20
  - You addressed external feedback (PR reviews, bot comments, or CI-fix patches).
20
- - 45 minutes of active implementation elapsed without a standalone review.
21
+ - A non-trivial open diff (about 2+ files or 40+ changed lines) has not had an elegance pass in the current cycle.
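The "about 2+ files or 40+ changed lines" trigger is a judgment threshold, but a rough mechanical check is possible. The sketch below assumes the input is the text produced by `git diff --numstat` (tab-separated added count, deleted count, and path, with `-` for binary files); the function name and the default thresholds as hard cutoffs are illustrative assumptions.

```python
def is_non_trivial_diff(numstat: str, min_files: int = 2, min_lines: int = 40) -> bool:
    """Apply the 2+ files / 40+ changed lines heuristic to `git diff --numstat` text."""
    files = 0
    lines = 0
    for row in numstat.splitlines():
        parts = row.split("\t", 2)
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        added, deleted, _path = parts
        files += 1
        # Binary files report "-" for both counts: count the file, not lines.
        if added.isdigit():
            lines += int(added)
        if deleted.isdigit():
            lines += int(deleted)
    return files >= min_files or lines >= min_lines
```

A single-file change of a few lines returns `False`, while either a second touched file or 40 total added-plus-deleted lines flips the result to `True`.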
21
22
 
22
23
  If review execution is blocked, record why in task notes, then do manual diff review plus targeted tests before proceeding.
23
24
 
24
25
  ## Quick start
25
26
 
27
+ Compatibility guard (current Codex CLI behavior):
28
+ - Do not combine `--uncommitted`, `--base`, or `--commit` with a custom prompt argument.
29
+ - Use diff-scoped review without prompt, or prompt-only review without scope flags.
30
+
26
31
  Uncommitted diff:
27
32
  ```
28
33
  codex review --uncommitted
@@ -52,7 +57,8 @@ codex review "Focus on correctness, regressions, edge cases; list missing tests.
52
57
  2) Run the review often
53
58
  - Follow the auto-trigger policy above (not optional).
54
59
  - Run after each meaningful chunk of work.
55
- - Prefer targeted focus prompts for WIP reviews.
60
+ - Prefer targeted focus prompts for WIP reviews (prompt-only invocation).
61
+ - For non-trivial diffs, pair this with `elegance-review` in the same cycle before handoff/merge.
56
62
 
57
63
  3) Capture actionable output
58
64
  - Prioritize correctness, regressions, and missing tests.
@@ -13,4 +13,4 @@ repository and will not overwrite files unless you pass --force.
13
13
 
14
14
  Next steps (recommended):
15
15
  codex mcp add delegation -- codex-orchestrator delegate-server --repo /path/to/repo
16
- codex-orchestrator codex setup # optional: CO-managed Codex CLI for collab JSONL
16
+ codex-orchestrator codex setup # optional: CO-managed Codex CLI (activate only when needed via CODEX_CLI_USE_MANAGED=1)
@@ -1,4 +1,4 @@
1
- <!-- codex:instruction-stamp 2408396e5cc9b25d5522b7064010a36a43007508072f3e0f051ab042370928a1 -->
1
+ <!-- codex:instruction-stamp 4f9803271a8209cf58746c0a71d87421952a402c884cc0262a8765fa5c456128 -->
2
2
  # Agent Instructions (Template)
3
3
 
4
4
  ## Orchestrator-first workflow
@@ -28,6 +28,7 @@
28
28
 
29
29
  ## Deliberation Default (agent-first)
30
30
  - Keep MCP as the lead control plane. Use collab/delegated subagents for deliberation when ambiguity or impact is high.
31
+ - Terminology: `collab` is the workflow/tooling name, while Codex CLI feature gating uses `features.multi_agent=true` (legacy alias/names like `RLM_SYMBOLIC_COLLAB` and `manifest.collab_tool_calls` still use `collab`).
31
32
  - Run full deliberation on any hard-stop trigger:
32
33
  - Irreversible/destructive changes with unclear rollback.
33
34
  - Auth/secrets/PII boundary changes.
@@ -47,6 +48,16 @@
47
48
  - `P1` high findings are hard-stop only when high-signal (clear evidence or corroboration).
48
49
  - `P2/P3` findings are tracked follow-ups.
49
50
 
51
+ ## Agent role baseline
52
+ - Built-in roles are `default`, `explorer`, and `worker`; `researcher` is user-defined.
53
+ - `spawn_agent` defaults to `default` when `agent_type` is omitted; always set `agent_type` explicitly.
54
+ - For symbolic collab runs, prefix spawned prompts with `[agent_type:<role>]` on line one so role intent is auditable from JSONL/manifests.
55
+ - Keep top-level defaults on latest codex by setting `model = "gpt-5.3-codex"` in `~/.codex/config.toml`.
56
+ - Define a user `agents.explorer` role without `config_file` so built-in explorer inherits top-level model defaults.
57
+ - Spark caveat: `gpt-5.3-codex-spark` is text-only.
58
+ - Use `[agents] max_threads = 8` as the default baseline; raise to `12` only after proving stable tool/runtime behavior.
59
+ - Add an explicit `worker_complex` role (`gpt-5.3-codex`, `xhigh`) for high-risk implementation streams.
60
+
50
61
  ## Completion discipline (patience-first)
51
62
  - Wait/poll for terminal state on long-running operations (CI checks, reviews, cloud jobs, orchestrator runs) before reporting completion.
52
63
  - Reset waiting windows when checks restart or new feedback appears.