npm - @kbediako/codex-orchestrator - Versions diffs - 0.1.32 → 0.1.33 - Mend

@kbediako/codex-orchestrator 0.1.32 → 0.1.33

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (32)

package/README.md +77 -9
package/dist/bin/codex-orchestrator.js +339 -59
package/dist/orchestrator/src/cli/codexCliSetup.js +1 -0
package/dist/orchestrator/src/cli/doctor.js +186 -7
package/dist/orchestrator/src/cli/doctorUsage.js +150 -8
package/dist/orchestrator/src/cli/init.js +1 -1
package/dist/orchestrator/src/cli/mcpEnable.js +392 -0
package/dist/orchestrator/src/cli/orchestrator.js +161 -2
package/dist/orchestrator/src/cli/rlmRunner.js +289 -35
package/dist/orchestrator/src/cli/run/manifest.js +31 -6
package/dist/orchestrator/src/cli/services/commandRunner.js +10 -2
package/dist/orchestrator/src/cli/services/runSummaryWriter.js +35 -0
package/dist/orchestrator/src/cli/skills.js +3 -8
package/dist/orchestrator/src/cli/utils/advancedAutopilot.js +114 -0
package/dist/orchestrator/src/cli/utils/codexCli.js +21 -0
package/dist/orchestrator/src/cli/utils/delegationGuardRunner.js +85 -8
package/dist/orchestrator/src/cli/utils/specGuardRunner.js +79 -19
package/dist/orchestrator/src/cloud/CodexCloudTaskExecutor.js +25 -6
package/dist/orchestrator/src/control-plane/request-builder.js +9 -8
package/dist/scripts/lib/pr-watch-merge.js +367 -3
package/docs/README.md +6 -5
package/package.json +1 -1
package/schemas/manifest.json +27 -0
package/skills/collab-deliberation/SKILL.md +6 -0
package/skills/collab-evals/SKILL.md +4 -0
package/skills/collab-subagents-first/SKILL.md +29 -7
package/skills/delegation-usage/DELEGATION_GUIDE.md +31 -5
package/skills/delegation-usage/SKILL.md +29 -4
package/skills/elegance-review/SKILL.md +14 -3
package/skills/standalone-review/SKILL.md +8 -2
package/templates/README.md +1 -1
package/templates/codex/AGENTS.md +12 -1

package/skills/delegation-usage/DELEGATION_GUIDE.md CHANGED

@@ -9,7 +9,7 @@ Use this guide for deeper context on delegation behavior, tool surfaces, and tro
 - It does **not** provide general tools itself; it only exposes `delegate.*` + optional `github.*` tools.
 - Child runs get tools based on `delegate.mode` + `delegate.tool_profile` + repo caps.
 - Delegation MCP stays enabled by default (only MCP on by default); disable it only when required by safety constraints.
-- Collab multi-agent mode is separate from delegation; for symbolic RLM subcalls, set `RLM_SYMBOLIC_COLLAB=1` and ensure a collab-capable Codex CLI. Collab tool calls are recorded in `manifest.collab_tool_calls`. If collab tools are unavailable in your CLI build, skip collab steps; delegation still works independently.
+- Multi-agent (collab tools) mode is separate from delegation; for symbolic RLM subcalls, set `RLM_SYMBOLIC_MULTI_AGENT=1` (legacy alias: `RLM_SYMBOLIC_COLLAB=1`) and ensure your Codex CLI has `features.multi_agent=true` (`collab` is a legacy alias/name in some keys). Collab tool calls are recorded in `manifest.collab_tool_calls`. If collab tools are unavailable in your CLI build, skip collab steps; delegation still works independently.
 ## Background-run pattern (preferred)
@@ -81,15 +81,25 @@ delegate.spawn({
 })
 ```
-## Collab lifecycle hygiene (required)
+## Multi-agent (collab tools) lifecycle hygiene (required)
 When using collab tools (`spawn_agent` / `wait` / `close_agent`):
 - Treat each spawned `agent_id` as a resource that must be closed.
+- `spawn_agent` falls back to `default` when `agent_type` is omitted; always set `agent_type` explicitly.
+- Prefix spawned prompts with `[agent_type:<role>]` on line one for auditable role routing.
 - For every successful spawn, run `wait` then `close_agent` for the same id.
-- Keep a local list of spawned ids and run a final cleanup pass before returning.
-- On timeout/error paths, still close known ids before reporting failure.
-- If you see `agent thread limit reached`, stop spawning immediately, close known ids, and retry only after cleanup.
+- Keep an `open_agent_ids` ledger and append each successful spawn id immediately.
+- Remove ids from `open_agent_ids` only after successful `close_agent`.
+- Run a final close-sweep before returning: close every id still in `open_agent_ids`, then clear it.
+- On timeout/error paths, run the same close-sweep before reporting failure.
+- If you see `agent thread limit reached`, stop spawning immediately, run close-sweep, retry once, and if still blocked continue in degraded mode (no further collab spawns).
+## Playwright stream hygiene
+- Run Playwright-heavy steps in a dedicated child stream; keep browser output out of parent chat.
+- Prefer artifact-first reporting (paths/manifests/screenshots) and a short synthesis instead of raw logs.
+- Keep lifecycle strict for browser streams too: `spawn_agent` -> `wait` -> `close_agent`.
 ## RLM budget overrides (recommended defaults)
@@ -125,9 +135,24 @@ Delegation MCP expects JSONL. Keep `codex-orchestrator` aligned with the current
 - Stock `codex` is the default path. If using a custom Codex fork, fast-forward from `upstream/main` regularly.
 - CO repo checkout only (helper is not shipped in npm): `scripts/codex-cli-refresh.sh --repo /path/to/codex --align-only`
 - CO repo checkout only (managed rebuild helper): `scripts/codex-cli-refresh.sh --repo /path/to/codex --force-rebuild`
+- Managed routing is opt-in: `export CODEX_CLI_USE_MANAGED=1` (without this, stock/global `codex` remains active).
 - Add `--no-push` only when you intentionally want local-only alignment without updating `origin/main`.
 - npm-safe alternative (no repo helper): `codex-orchestrator codex setup --source /path/to/codex --yes --force`
+## Agent role guard (recommended)
+- Built-in agent roles are `default`, `explorer`, `worker`; `researcher` is user-defined.
+- `spawn_agent` omission defaults to `default`; require explicit `agent_type` for every spawn.
+- For symbolic collab runs, include a first-line role tag in spawned prompts: `[agent_type:<role>]`.
+- Built-in `explorer` may map to an older model profile unless overridden in `~/.codex/config.toml`.
+- Recommended baseline:
+  - `model = "gpt-5.3-codex"`
+  - `model_reasoning_effort = "xhigh"`
+  - `[agents] max_threads = 8` (consider 12 only after stability checks)
+  - Set `[agents.explorer]` with no `config_file` so explorer inherits top-level `gpt-5.3-codex`.
+  - Add optional `[agents.explorer_fast]` for `gpt-5.3-codex-spark` (text-only caveat).
+  - Add `[agents.worker_complex]` for high-risk edits (`gpt-5.3-codex`, `xhigh`).
 ## Common failures
 - **Handshake failed / connection closed**: Usually an older binary (0.1.5) or framed responses.
@@ -136,5 +161,6 @@ Delegation MCP expects JSONL. Keep `codex-orchestrator` aligned with the current
 - **Missing control files**: delegate tools rely on `control_endpoint.json` in the run directory.
 - **Run identifiers**: status/pause/cancel require `manifest_path`; question queue requires `parent_manifest_path`.
 - **Collab payload mismatch**: `spawn_agent` calls fail if they include both `message` and `items`.
+- **Collab role routing drift**: if symbolic collab lifecycle validation reports missing/disallowed spawn roles, set explicit `agent_type` and add first-line `[agent_type:<role>]` tags.
 - **Collab depth limits**: recursive collab fan-out can fail near max depth; prefer shallow parent fan-out.
 - **Collab lifecycle leaks**: missing `close_agent` calls can exhaust thread slots and block future spawns (`agent thread limit reached`).
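
The lifecycle bullets added in this file (the `open_agent_ids` ledger, the close-sweep, and the single retry after `agent thread limit reached`) could be sketched roughly as below. This is an illustration only, not code from the package: `spawn_agent`, `wait`, and `close_agent` are the collab tool names used in the guide, while the `CollabLedger` class, the `ThreadLimitReached` exception, and the three injected callables are hypothetical stand-ins for however those tools are actually invoked.

```python
from typing import Callable, List, Optional


class ThreadLimitReached(Exception):
    """Hypothetical stand-in for an `agent thread limit reached` failure."""


class CollabLedger:
    """Tracks open agent ids so every successful spawn is eventually closed."""

    def __init__(self, spawn: Callable[[str, str], str],
                 wait: Callable[[str], str],
                 close: Callable[[str], None]) -> None:
        # The three callables are placeholders for the real collab tool calls.
        self._spawn, self._wait, self._close = spawn, wait, close
        self.open_agent_ids: List[str] = []
        self.degraded = False  # True once collab spawning is abandoned

    def _close_one(self, agent_id: str) -> None:
        """Close one agent; drop it from the ledger only after close succeeds."""
        self._close(agent_id)
        self.open_agent_ids.remove(agent_id)

    def close_sweep(self) -> None:
        """Close every id still in the ledger, leaving it empty."""
        for agent_id in list(self.open_agent_ids):
            self._close_one(agent_id)

    def run(self, agent_type: str, prompt: str) -> Optional[str]:
        """spawn -> wait -> close for one child; returns None in degraded mode."""
        if self.degraded:
            return None
        try:
            agent_id = self._spawn(agent_type, prompt)
        except ThreadLimitReached:
            self.close_sweep()  # free slots, then retry exactly once
            try:
                agent_id = self._spawn(agent_type, prompt)
            except ThreadLimitReached:
                self.degraded = True  # no further collab spawns
                return None
        self.open_agent_ids.append(agent_id)  # record immediately after spawn
        try:
            return self._wait(agent_id)
        finally:
            # Runs on success, timeout, and error paths alike.
            self._close_one(agent_id)
```

A caller that fans out several children at once would append each id after its spawn and call `close_sweep()` in a single `finally` block before returning, which is the "final close-sweep" the guide describes.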

package/skills/delegation-usage/SKILL.md CHANGED

@@ -11,18 +11,23 @@ Use this skill to operate delegation MCP tools with delegation enabled by defaul
 `delegation-usage` is the canonical delegation workflow skill. If `delegate-early` is present, treat it as a compatibility alias that should redirect to this skill.
-Collab multi-agent mode is separate from delegation. For symbolic RLM subcalls that use collab tools, set `RLM_SYMBOLIC_COLLAB=1` and ensure a collab-capable Codex CLI; collab tool calls are recorded in `manifest.collab_tool_calls`. If collab tools are unavailable in your CLI build, skip collab steps; delegation still works independently.
+Multi-agent (collab tools) mode is separate from delegation. For symbolic RLM subcalls that use collab tools, set `RLM_SYMBOLIC_MULTI_AGENT=1` (legacy alias: `RLM_SYMBOLIC_COLLAB=1`) and ensure your Codex CLI has `features.multi_agent=true` (`collab` is a legacy alias/name in some keys); collab tool calls are recorded in `manifest.collab_tool_calls`. If collab tools are unavailable in your CLI build, skip collab steps; delegation still works independently.
-## Collab realities in delegated runs (current behavior)
+## Multi-agent (collab tools) realities in delegated runs (current behavior)
 - `spawn_agent` accepts one input style per call: either `message` (plain text) or `items` (structured input).
 - Do not send both `message` and `items` in the same `spawn_agent` call.
+- `spawn_agent` falls back to `default` when `agent_type` is omitted; always set `agent_type` explicitly.
+- For auditable role routing, prefix spawned prompts with `[agent_type:<role>]` on the first line and keep it aligned with `agent_type`.
 - Spawn returns an `agent_id` (thread id). Current TUI collab rendering is id-based; do not depend on custom visible agent names.
 - Subagents spawned through collab run with approval effectively set to `never`; design child tasks to avoid approval/escalation requirements.
 - Collab spawn depth is bounded. Near/at max depth, recursive delegation can fail or collab can be disabled in children; prefer shallow parent fan-out.
 - **Lifecycle is mandatory:** for every successful `spawn_agent`, run `wait` and then `close_agent` for that same id before task completion.
-- Keep a local list of spawned ids and run a final cleanup pass so no agent id is left unclosed on timeout/error paths.
-- If spawn fails with `agent thread limit reached`, stop spawning, close any known ids first, then surface a concise recovery note.
+- Keep an `open_agent_ids` ledger and append ids immediately after each successful spawn.
+- Remove ids from `open_agent_ids` only after successful `close_agent`.
+- Run a final close-sweep before handoff: close every id still in `open_agent_ids`, then clear the ledger.
+- On timeout/error paths, execute the same close-sweep before returning.
+- If spawn fails with `agent thread limit reached`, stop spawning, run close-sweep for known ids, retry once, and if still blocked surface a concise degraded-mode recovery note.
 - In a shared checkout, spawned subagents may produce file edits. Treat edits inside that stream's declared ownership as expected delegated output, not external interference.
 - Before spawning, capture a baseline (`git status --porcelain`). After `wait`, diff against baseline and classify file changes by stream ownership.
 - Escalate "unexpected local edits" only when changed files are outside all active stream scopes (or when no subagent was active).
@@ -61,6 +66,7 @@ For runner + delegation coordination (short `--task` flow), see `docs/delegation
 - Delegate when the work spans >1 domain, touches more than ~2 files, needs verification/research, or is likely to run >10 minutes.
 - Spawn one delegate per workstream with narrow scope and acceptance criteria.
 - Keep delegation MCP enabled by default; enable other MCPs only when relevant to the task.
+- For Playwright-heavy browser flows, use a dedicated child stream and keep parent context lean: artifact-first evidence, short summary in chat, no raw log dumps.
 - Use `delegate.mode=question_only` unless the child truly needs full tool access.
 - Ask delegates for short, structured summaries and to write details into files/artifacts instead of long chat dumps.
 - Use `codex exec` only for pre-task triage (no task id yet) or when delegation is unavailable; copy outcomes into the spec once it exists.
@@ -72,6 +78,8 @@ For runner + delegation coordination (short `--task` flow), see `docs/delegation
 - Register the delegation server once:
   - Preferred: `codex-orchestrator setup --yes`
     - One-shot bootstrap (installs bundled skills + configures delegation/DevTools wiring).
+  - Optional low-friction MCP enable pass: `codex-orchestrator mcp enable --yes`
+    - Enables disabled MCP servers from existing Codex config entries (plan mode redacts env/secret values in displayed command lines).
   - `codex-orchestrator delegation setup --yes`
     - Delegation-only setup (wraps `codex mcp add delegation ...` and keeps wiring discoverable via `codex-orchestrator doctor`).
   - `codex mcp add delegation -- codex-orchestrator delegate-server`
@@ -94,9 +102,24 @@ For runner + delegation coordination (short `--task` flow), see `docs/delegation
 - Stock `codex` is the default path. If you use a custom Codex fork, fast-forward it regularly from `upstream/main`.
 - CO repo checkout only (helper is not shipped in npm): `scripts/codex-cli-refresh.sh --repo /path/to/codex --align-only`
 - CO repo checkout only (managed rebuild helper): `scripts/codex-cli-refresh.sh --repo /path/to/codex --force-rebuild`
+- Managed routing is explicit opt-in: `export CODEX_CLI_USE_MANAGED=1` (without this, stock/global `codex` stays active).
 - Add `--no-push` only when you intentionally want local-only alignment without updating `origin/main`.
 - npm-safe alternative (no repo helper): `codex-orchestrator codex setup --source /path/to/codex --yes --force`
+### 0a.1) Agent role guard (avoid stale built-in defaults)
+- Built-in roles are `default`, `explorer`, and `worker`. `researcher` is user-defined.
+- `spawn_agent` omission defaults to `default`; require explicit `agent_type` for every spawn.
+- For symbolic collab runs, include a first-line role tag in spawned prompts: `[agent_type:<role>]`.
+- Built-in `explorer` can map to an older model profile unless overridden; pin your own role config to keep latest-codex behavior.
+- Recommended baseline in `~/.codex/config.toml`:
+  - `model = "gpt-5.3-codex"`
+  - `model_reasoning_effort = "xhigh"`
+  - `[agents] max_threads = 8` (raise to 12 only after proving stability on your machine)
+  - `[agents.explorer]` with no `config_file` so built-in explorer inherits top-level `gpt-5.3-codex`
+  - Optional `[agents.explorer_fast]` -> `~/.codex/agents/explorer-fast.toml` (`gpt-5.3-codex-spark`, text-only)
+  - `[agents.worker_complex]` -> `~/.codex/agents/worker-complex.toml` (`gpt-5.3-codex`, `xhigh`)
 ### 0b) Background terminal bootstrap (required when MCP is disabled)
 When `delegate.*` is missing in the current session, immediately spawn a **background** Codex run with delegation enabled and hand it the narrow task. Use `codex exec` so it completes without interaction and you can capture output:
@@ -185,11 +208,13 @@ repeat:
 - **Long waits:** `wait_ms` never blocks longer than 10s per call; use polling.
 - **Long-running delegate.spawn:** Prefer `start_only=true` (default) to avoid tool-call timeouts. If you must use `start_only=false`, keep runs short or run long jobs outside delegation (no question queue).
 - **Cloud run branch mismatch:** cloud-mode orchestration against a local-only branch can fail with `couldn't find remote ref ...`; set `CODEX_CLOUD_BRANCH` to a pushed branch (typically `main`) before cloud execution.
+- **Cloud fallback dependence:** fallback should be a safety net, not the default path; for fail-fast cloud lanes, set `CODEX_ORCHESTRATOR_CLOUD_FALLBACK=deny`.
 - **Tool profile mismatch:** child tool profile must be allowed by repo policy; invalid or unsafe names are ignored.
 - **Confirmation misuse:** never pass `confirm_nonce` from model/tool input; it is runner‑injected only.
 - **Secrets exposure:** never include secrets/tokens/PII in delegate prompts or files.
 - **Missing control files:** delegate tools rely on `control_endpoint.json` in the run directory; older runs may not have it.
 - **Collab payload mismatch:** `spawn_agent` rejects calls that include both `message` and `items`.
+- **Collab role routing drift:** if symbolic collab lifecycle validation reports missing/disallowed spawn roles, set explicit `agent_type` and add first-line `[agent_type:<role>]` tags.
 - **Collab UI assumptions:** agent rows/records are id-based today; use explicit stream role text in prompts/artifacts for operator clarity.
 - **Collab lifecycle leaks:** missing `close_agent` calls accumulate open threads and can trigger `agent thread limit reached`; always finish `spawn -> wait -> close_agent` per id.
 - **False "unexpected edits" stops:** when a live subagent owns the touched files, treat those edits as expected output and continue with scope-aware review.
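
The baseline-and-classify step described in this skill (capture `git status --porcelain` before spawning, compare after `wait`, escalate only paths outside every active stream scope) might look like the sketch below. It assumes porcelain v1 output and represents stream ownership as path prefixes; the function names, the `scopes` mapping, and the `"unexpected"` bucket are hypothetical choices for illustration, not part of the package.

```python
from typing import Dict, List, Set


def porcelain_paths(output: str) -> Set[str]:
    """Extract paths from `git status --porcelain` (v1) output."""
    paths: Set[str] = set()
    for line in output.splitlines():
        if len(line) < 4:
            continue
        status, path = line[:2], line[3:]
        # Renames and copies are reported as "old -> new"; keep the new path.
        if ("R" in status or "C" in status) and " -> " in path:
            path = path.split(" -> ", 1)[1]
        paths.add(path.strip('"'))  # rough handling of quoted paths
    return paths


def classify_changes(baseline: str, after: str,
                     scopes: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Group paths that became dirty since the baseline by owning stream."""
    new_paths = sorted(porcelain_paths(after) - porcelain_paths(baseline))
    result: Dict[str, List[str]] = {"unexpected": []}
    for path in new_paths:
        owner = next(
            (stream for stream, prefixes in scopes.items()
             if any(path.startswith(prefix) for prefix in prefixes)),
            "unexpected",  # outside all active stream scopes
        )
        result.setdefault(owner, []).append(path)
    return result
```

Only a non-empty `"unexpected"` list would warrant escalation under the rule above. One limitation of comparing status output alone: a file that was already dirty at baseline and is edited again by a subagent will not show up as new, so a content-level diff would be needed to catch that case.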

package/skills/elegance-review/SKILL.md CHANGED

@@ -14,20 +14,31 @@ Use this skill after non-trivial edits to verify the implementation is minimal,
 Run this skill whenever any condition is true:
 - You changed behavior across about 2+ files.
 - You added a new helper/module/pathway and could possibly collapse it.
+- You finished writing code for a non-trivial sub-goal and are about to lock the checkpoint.
 - You finished addressing review feedback and are preparing to hand off.
 - You are about to recommend merge/release.
 - The user explicitly asks for elegance/minimality/overengineering checks.
+- A standalone review just completed for a non-trivial diff.
 ## Quick start
-Focused uncommitted review:
+Compatibility guard (current Codex CLI behavior):
+- Do not combine `--uncommitted`, `--base`, or `--commit` with a custom prompt argument.
+- Use diff-scoped review without prompt, or prompt-only review without scope flags.
+Uncommitted diff:
 ```bash
-codex review --uncommitted "Find avoidable complexity, duplicate abstractions, and unnecessary indirection. Prioritize simplifications that preserve behavior."
+codex review --uncommitted
 ```
 Diff-vs-base review:
 ```bash
-codex review --base <branch> "Focus on smallest viable design and maintenance cost."
+codex review --base <branch>
+```
+Prompt-only pass (no diff flags):
+```bash
+codex review "Find avoidable complexity, duplicate abstractions, and unnecessary indirection. Prioritize simplifications that preserve behavior."
 ```
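
For scripted use, the compatibility guard above (scope flags and a custom prompt are mutually exclusive) can be enforced before the CLI is ever invoked. The helper below is a minimal sketch: `build_review_argv` is a hypothetical name, and it assumes `--commit` takes a commit identifier as its value, which the skill text does not spell out. It only builds the argument list and does not run `codex`.

```python
from typing import List, Optional


def build_review_argv(prompt: Optional[str] = None, *,
                      uncommitted: bool = False,
                      base: Optional[str] = None,
                      commit: Optional[str] = None) -> List[str]:
    """Build a `codex review` argv that is either diff-scoped or prompt-only."""
    scope: List[str] = []
    if uncommitted:
        scope.append("--uncommitted")
    if base is not None:
        scope += ["--base", base]
    if commit is not None:
        # Assumption: --commit expects a commit identifier as its value.
        scope += ["--commit", commit]
    if scope and prompt:
        raise ValueError("use a scope flag or a prompt, not both")
    if not scope and not prompt:
        raise ValueError("pass a scope flag or a prompt")
    return ["codex", "review", *scope] + ([prompt] if prompt else [])
```

The returned list can be passed to `subprocess.run` as-is; keeping the check in one place means a misuse fails in the caller rather than at the CLI.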
 ## Workflow

package/skills/standalone-review/SKILL.md CHANGED

@@ -15,14 +15,19 @@ Before implementation, use it to review the task/spec against the user’s inten
 Run this skill automatically whenever any condition is true:
 - You made code/config/script/test edits since the last standalone review.
 - You finished a meaningful chunk of work (default: behavior change or about 2+ files touched).
+- You finished a coding burst for a sub-goal and are about to validate, summarize, or switch streams.
 - You are about to report completion, propose merge, or answer "what's next?" with recommendations.
 - You addressed external feedback (PR reviews, bot comments, or CI-fix patches).
-- 45 minutes of active implementation elapsed without a standalone review.
+- A non-trivial open diff (about 2+ files or 40+ changed lines) has not had an elegance pass in the current cycle.
 If review execution is blocked, record why in task notes, then do manual diff review plus targeted tests before proceeding.
 ## Quick start
+Compatibility guard (current Codex CLI behavior):
+- Do not combine `--uncommitted`, `--base`, or `--commit` with a custom prompt argument.
+- Use diff-scoped review without prompt, or prompt-only review without scope flags.
 Uncommitted diff:
 ```
 codex review --uncommitted
@@ -52,7 +57,8 @@ codex review "Focus on correctness, regressions, edge cases; list missing tests.
 2) Run the review often
 - Follow the auto-trigger policy above (not optional).
 - Run after each meaningful chunk of work.
-- Prefer targeted focus prompts for WIP reviews.
+- Prefer targeted focus prompts for WIP reviews (prompt-only invocation).
+- For non-trivial diffs, pair this with `elegance-review` in the same cycle before handoff/merge.
 3) Capture actionable output
 - Prioritize correctness, regressions, and missing tests.

package/templates/README.md CHANGED

@@ -13,4 +13,4 @@ repository and will not overwrite files unless you pass --force.
 Next steps (recommended):
   codex mcp add delegation -- codex-orchestrator delegate-server --repo /path/to/repo
-  codex-orchestrator codex setup   # optional: CO-managed Codex CLI for collab JSONL
+  codex-orchestrator codex setup   # optional: CO-managed Codex CLI (activate only when needed via CODEX_CLI_USE_MANAGED=1)

package/templates/codex/AGENTS.md CHANGED

@@ -1,4 +1,4 @@
-<!-- codex:instruction-stamp 2408396e5cc9b25d5522b7064010a36a43007508072f3e0f051ab042370928a1 -->
+<!-- codex:instruction-stamp 4f9803271a8209cf58746c0a71d87421952a402c884cc0262a8765fa5c456128 -->
 # Agent Instructions (Template)
 ## Orchestrator-first workflow
@@ -28,6 +28,7 @@
 ## Deliberation Default (agent-first)
 - Keep MCP as the lead control plane. Use collab/delegated subagents for deliberation when ambiguity or impact is high.
+- Terminology: `collab` is the workflow/tooling name, while Codex CLI feature gating uses `features.multi_agent=true` (legacy alias/names like `RLM_SYMBOLIC_COLLAB` and `manifest.collab_tool_calls` still use `collab`).
 - Run full deliberation on any hard-stop trigger:
   - Irreversible/destructive changes with unclear rollback.
   - Auth/secrets/PII boundary changes.
@@ -47,6 +48,16 @@
   - `P1` high findings are hard-stop only when high-signal (clear evidence or corroboration).
   - `P2/P3` findings are tracked follow-ups.
+## Agent role baseline
+- Built-in roles are `default`, `explorer`, and `worker`; `researcher` is user-defined.
+- `spawn_agent` defaults to `default` when `agent_type` is omitted; always set `agent_type` explicitly.
+- For symbolic collab runs, prefix spawned prompts with `[agent_type:<role>]` on line one so role intent is auditable from JSONL/manifests.
+- Keep top-level defaults on latest codex by setting `model = "gpt-5.3-codex"` in `~/.codex/config.toml`.
+- Define a user `agents.explorer` role without `config_file` so built-in explorer inherits top-level model defaults.
+- Spark caveat: `gpt-5.3-codex-spark` is text-only.
+- Use `[agents] max_threads = 8` as the default baseline; raise to `12` only after proving stable tool/runtime behavior.
+- Add an explicit `worker_complex` role (`gpt-5.3-codex`, `xhigh`) for high-risk implementation streams.
 ## Completion discipline (patience-first)
 - Wait/poll for terminal state on long-running operations (CI checks, reviews, cloud jobs, orchestrator runs) before reporting completion.
 - Reset waiting windows when checks restart or new feedback appears.
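
The "Agent role baseline" rules added in this template (explicit `agent_type` on every spawn, plus a first-line `[agent_type:<role>]` tag that matches it) lend themselves to a small pre-spawn check. The sketch below is illustrative only: the helper names are hypothetical, the character set accepted in a role name is an assumption, and the package's own lifecycle validation may work differently.

```python
import re
from typing import List, Optional, Set

# Built-in roles as listed in the template; user-defined roles can be added.
BUILT_IN_ROLES: Set[str] = {"default", "explorer", "worker"}

# Assumption: role names use letters, digits, underscores, and hyphens.
_TAG = re.compile(r"^\[agent_type:([A-Za-z0-9_-]+)\]$")


def tag_prompt(agent_type: str, prompt: str) -> str:
    """Prefix a prompt with the role tag on line one."""
    return f"[agent_type:{agent_type}]\n{prompt}"


def tagged_role(prompt: str) -> Optional[str]:
    """Return the role named by the first-line tag, or None if absent."""
    first_line = prompt.split("\n", 1)[0].strip()
    match = _TAG.match(first_line)
    return match.group(1) if match else None


def check_spawn(agent_type: Optional[str], prompt: str,
                allowed: Set[str]) -> List[str]:
    """List the role-routing problems for one planned spawn (empty means OK)."""
    problems: List[str] = []
    if not agent_type:
        problems.append("agent_type omitted (would fall back to default)")
    elif agent_type not in allowed:
        problems.append(f"role {agent_type!r} is not allowed")
    role = tagged_role(prompt)
    if role is None:
        problems.append("missing first-line [agent_type:<role>] tag")
    elif agent_type and role != agent_type:
        problems.append(f"tag {role!r} does not match agent_type {agent_type!r}")
    return problems
```

A user-defined role such as `worker_complex` would be checked by passing `BUILT_IN_ROLES | {"worker_complex"}` as `allowed`.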