@kbediako/codex-orchestrator 0.2.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (41)
  1. package/README.md +43 -83
  2. package/dist/bin/codex-orchestrator.js +2 -0
  3. package/dist/orchestrator/src/cli/adapters/CommandBuilder.js +50 -0
  4. package/dist/orchestrator/src/cli/adapters/cloudFailureDiagnostics.js +117 -5
  5. package/dist/orchestrator/src/cli/coStatusAttachCliShell.js +2 -2
  6. package/dist/orchestrator/src/cli/coStatusCliShell.js +28 -6
  7. package/dist/orchestrator/src/cli/codexCliShell.js +48 -1
  8. package/dist/orchestrator/src/cli/codexDefaultsSetup.js +217 -26
  9. package/dist/orchestrator/src/cli/control/controlHostSupervision.js +28 -6
  10. package/dist/orchestrator/src/cli/control/controlRuntime.js +17 -6
  11. package/dist/orchestrator/src/cli/control/controlStatusDashboard.js +6 -1
  12. package/dist/orchestrator/src/cli/control/selectedRunProjection.js +49 -2
  13. package/dist/orchestrator/src/cli/doctor.js +142 -48
  14. package/dist/orchestrator/src/cli/init.js +94 -1
  15. package/dist/orchestrator/src/cli/providerLinearChildLaneRunner.js +64 -1
  16. package/dist/orchestrator/src/cli/providerLinearWorkerRunner.js +1165 -69
  17. package/dist/orchestrator/src/cli/rlm/alignment.js +3 -3
  18. package/dist/orchestrator/src/cli/services/commandRunner.js +31 -0
  19. package/dist/orchestrator/src/cli/utils/cloudPreflight.js +202 -6
  20. package/dist/orchestrator/src/cli/utils/codexFeatures.js +60 -0
  21. package/dist/orchestrator/src/manager.js +74 -4
  22. package/dist/scripts/lib/docs-catalog.js +35 -1
  23. package/docs/README.md +333 -0
  24. package/docs/book/README.md +19 -0
  25. package/docs/book/codex-cli-0124-adoption.md +68 -0
  26. package/docs/book/local-hook-impact.md +73 -0
  27. package/docs/book/operations.md +60 -0
  28. package/docs/book/public-posture.md +34 -0
  29. package/docs/book/setup.md +91 -0
  30. package/docs/book/skills.md +11 -0
  31. package/docs/guides/codex-version-policy.md +104 -0
  32. package/docs/public/downstream-setup.md +25 -18
  33. package/package.json +4 -1
  34. package/plugins/codex-orchestrator/.codex-plugin/plugin.json +1 -1
  35. package/plugins/codex-orchestrator/launcher.mjs +6 -4
  36. package/schemas/manifest.json +17 -0
  37. package/skills/README.md +26 -0
  38. package/skills/collab-subagents-first/SKILL.md +1 -1
  39. package/skills/delegation-usage/DELEGATION_GUIDE.md +12 -7
  40. package/skills/delegation-usage/SKILL.md +13 -8
  41. package/templates/codex/AGENTS.md +12 -10
package/docs/README.md ADDED
@@ -0,0 +1,333 @@
1
+ # Codex Orchestrator (Repository Guide)
2
+
3
+ > Internal contributor guide. Public downstream docs live in `README.md` and `docs/public/`.
4
+
5
+ Codex Orchestrator is the coordination layer that glues together Codex-driven agents, run pipelines, approval policies, and evidence capture for multi-stage automation projects. It wraps a reusable orchestration core with a CLI that produces auditable manifests, integrates with control-plane validators, and syncs run results to downstream systems.
6
+
7
+ > **At a glance:** Every run starts from a task description, writes the active CLI manifest to `.runs/<task-id>/cli/<run-id>/manifest.json`, emits a persisted run summary at `.runs/<task-id>/<run-id>/manifest.json`, mirrors human-readable data to `out/<task-id>/`, and can optionally sync to a remote control plane. Pipelines define the concrete commands (build, lint, test, etc.) that execute for a given task.
8
+
9
+ ## Evaluation & Metrics
10
+ - Evaluation playbook: `docs/guides/evaluation-playbook.md`.
11
+ - Metrics reference: `docs/reference/metrics-collab-context-rot.md`.
12
+
13
+ ## Collab vs MCP
14
+ - Decision guide: `docs/guides/collab-vs-mcp.md`.
15
+
16
+ ## Public docs
17
+ - Public front door: `README.md`
18
+ - Downstream setup: `docs/public/downstream-setup.md`
19
+ - Provider onboarding: `docs/public/provider-onboarding.md`
20
+
21
+ ## Upstream Sync
22
+ - Codex CLI sync strategy: `docs/guides/upstream-codex-cli-sync.md`.
23
+
24
+ ## Current Posture
25
+ - Current CO-local ChatGPT-auth/appserver model posture: `gpt-5.5` / `xhigh` on Codex CLI `0.125.0` when the live access smoke check passes.
26
+ - Release-facing cloud/downstream pins remain evidence-gated in `docs/guides/codex-version-policy.md`; the exact CO-352 cloud blocker is that the configured environment id is not found.
27
+ - `gpt-5.5` / `xhigh` applies when available in ChatGPT-auth Codex sessions; keep `explorer_fast` on `gpt-5.3-codex-spark` for file/codebase search only.
28
+ - Portable packaged/generated defaults still keep `gpt-5.4` / `xhigh` as fallback values when `gpt-5.5`, API/cloud portability, or downstream/no-network access is not proven.
29
+ - `codex-orchestrator doctor` treats `gpt-5.5` as non-drift when `codex debug models` verifies current model access. Additive defaults keep fresh configs on portable fallback values unless `--auth-scope chatgpt` is explicitly requested after the live access smoke check passes, and they preserve compatible prior `gpt-5.5` role files without requiring extra marker metadata.
30
+ - Local default runtime is `appserver`; keep `--runtime-mode cli` as break-glass.
31
+ - Full posture and promotion gates live in `docs/guides/codex-version-policy.md`.
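The model posture above amounts to a small decision rule. The TypeScript sketch below illustrates it under stated assumptions: the function names, the `AuthScope` values, and passing live-access verification in as a boolean are hypothetical, not the package's actual setup or `doctor` code.

```typescript
// Hypothetical sketch of the model posture; not the real setup/doctor code.
type AuthScope = "chatgpt" | "unspecified";

interface RoleModel {
  model: string;
  reasoningEffort?: "xhigh";
}

const PREFERRED_MODEL = "gpt-5.5";
const PORTABLE_FALLBACK_MODEL = "gpt-5.4";
const EXPLORER_FAST_MODEL = "gpt-5.3-codex-spark";

// Fresh configs only move to gpt-5.5 when ChatGPT auth scope was explicitly
// requested and a live access smoke check has passed; otherwise they keep
// the portable fallback. Both paths use xhigh reasoning effort.
function defaultRoleModel(authScope: AuthScope, liveAccessVerified: boolean): RoleModel {
  const model =
    authScope === "chatgpt" && liveAccessVerified ? PREFERRED_MODEL : PORTABLE_FALLBACK_MODEL;
  return { model, reasoningEffort: "xhigh" };
}

// explorer_fast stays on the spark model for file/codebase search only.
function explorerFastModel(): RoleModel {
  return { model: EXPLORER_FAST_MODEL };
}

// doctor-style check: a configured gpt-5.5 counts as non-drift only once
// model access has been verified (for example via `codex debug models`).
function isVerifiedPreferredModel(configured: string, accessVerified: boolean): boolean {
  return configured === PREFERRED_MODEL && accessVerified;
}
```

Keeping the fallback as the default means a config generated without network access or ChatGPT auth stays portable; the preferred model is opt-in rather than assumed.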
32
+
33
+ ## Release Notes
34
+ - Shipped skills note: `docs/release-notes-template-addendum.md`.
35
+ - Canonical promoted sections: generated `Overview` and `Bug Fixes` become top-level release-note sections; generated `Documentation` remains under `Full Changelog`.
36
+ - Optional one-shot overview override: put release-specific narrative text in the signed annotated tag body before pushing the tag. The workflow reads the tag body for that release only and does not read `.github/release-overview.md`.
37
+
38
+ ## How It Works
39
+ - **Planner → Builder → Tester → Reviewer:** The core `TaskManager` (see `orchestrator/src/manager.ts`) wires together agent interfaces that decide *what* to run (planner), execute the selected pipeline stage (builder), verify results (tester), and give a final decision (reviewer).
40
+ - **Execution modes:** Each plan item can set the canonical `requires_cloud` flag; planner output still carries the legacy `requiresCloud` compatibility alias, but current code should prefer `requires_cloud`. Task metadata can set `execution.parallel`, and the mode policy picks `mcp` (local MCP runtime) or `cloud` execution accordingly. Cloud runs perform a quick preflight (env id, codex availability, optional remote branch) and, when preflight fails, fall back to `mcp` with both summary text and a structured `cloud_fallback` manifest block.
41
+ - **Runtime provider modes:** `runtimeMode=cli|appserver` is orthogonal to `executionMode`; local default runtime is `appserver` with `cli` break-glass support preserved. Explicit `executionMode=cloud + runtimeMode=appserver` remains unsupported and fails fast.
42
+ - **Advanced feature posture:** `js_repl` is enabled by default globally (local + cloud lanes). For deterministic cloud contracts, pin explicit feature lanes (`CODEX_CLOUD_ENABLE_FEATURES=js_repl` and separate `CODEX_CLOUD_DISABLE_FEATURES=js_repl` runs). Use `CODEX_CLOUD_DISABLE_FEATURES=js_repl` for task-scoped cloud break-glass; reserve `codex features disable js_repl` for global emergency toggles, and re-enable with `codex features enable js_repl`. `memories` remains scoped to explicit eval lanes (the legacy alias `memory_tool` is compatibility-only).
43
+ - **Event-driven persistence:** Milestones emit typed events on `EventBus`. `PersistenceCoordinator` captures run summaries in the task state store and writes manifests so nothing is lost if the process crashes.
44
+ - **CLI lifecycle:** `CodexOrchestrator` (in `orchestrator/src/cli/orchestrator.ts`) resolves instruction sources (`AGENTS.md`, `docs/AGENTS.md`, `.agent/AGENTS.md`), loads the chosen pipeline, executes each command stage via `runCommandStage`, and keeps heartbeats plus command status current inside the manifest (approval evidence will surface once prompt wiring lands).
45
+ - **Control-plane & scheduler integrations:** Optional validation (`control-plane/`) and scheduling (`scheduler/`) modules enrich manifests with drift checks, plan assignments, and remote run metadata.
46
+ - **Cloud sync (optional):** `orchestrator/src/sync/` includes a `CloudSyncWorker` + `CloudRunsClient`, but the default CLI does not wire cloud uploads yet; treat this as an integration point you enable explicitly.
47
+ - **Tool orchestration:** The shared `packages/orchestrator` toolkit handles approval prompts, sandbox retries, and tool run bookkeeping used by higher-level agents.
48
+
49
+ ```
50
+ Task input ─► Planner ─► Mode policy (mcp/cloud) ─► Builder ─► Tester ─► Reviewer ─► Run summary
51
+ │ │ │ │ │
52
+ │ │ │ │ └─► Control-plane validators / Scheduler hooks / Cloud sync
53
+ │ │ │ │
54
+ └─► EventBus ─► PersistenceCoordinator ─► .runs/ manifests ─► out/ audits
55
+
56
+ └─► Task state snapshots & guardrail evidence
57
+
58
+ Group execution (when `FEATURE_TFGRPO_GROUP=on`): repeat the Builder → Tester → Reviewer stages for prioritized subtasks until a stage fails or the list completes.
59
+ ```
60
+
61
+ - **Mode policy:** Defaults to `mcp` but upgrades to `cloud` whenever a subtask flags `requires_cloud` or task metadata enables parallel execution, ensuring builders/testers run in the correct environment before artifacts are produced.
62
+ - **Event-driven persistence:** Every `run:completed` event flows through `PersistenceCoordinator`, writing manifests under `.runs/<task-id>/<run-id>/` and keeping task-state snapshots current before downstream consumers (control-plane validators, scheduler hooks, optional cloud sync) ingest the data.
63
+ - **Optional group loop:** When the TF-GRPO feature flag is on, the manager processes the prioritized subtask list serially, stopping early if any Builder or Tester stage fails so reviewers only see runnable work with passing prerequisites.
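A minimal sketch of the mode policy, the cloud preflight fallback, and the unsupported cloud + appserver combination might look like this. The type and function names (`PlanItem`, `selectMode`, `applyPreflight`, `assertSupportedCombination`) are hypothetical, the preflight inputs are reduced to booleans, and giving canonical `requires_cloud` precedence over the legacy `requiresCloud` alias is an assumption.

```typescript
// Hypothetical sketch of the mode policy and cloud preflight fallback.
type ExecutionMode = "mcp" | "cloud";
type RuntimeMode = "cli" | "appserver";

interface PlanItem {
  requires_cloud?: boolean; // canonical flag
  requiresCloud?: boolean; // legacy compatibility alias
}

interface CloudPreflight {
  envIdFound: boolean;
  codexAvailable: boolean;
  remoteBranchOk: boolean; // treat as true when no branch check was requested
}

interface ModeDecision {
  mode: ExecutionMode;
  summary: string;
  cloud_fallback?: { reason: string };
}

function itemRequiresCloud(item: PlanItem): boolean {
  // Prefer the canonical field; consult the legacy alias only when it is unset.
  return item.requires_cloud ?? item.requiresCloud ?? false;
}

// Default to mcp; upgrade to cloud for requires_cloud items or parallel execution.
function selectMode(items: PlanItem[], parallel: boolean): ExecutionMode {
  return parallel || items.some(itemRequiresCloud) ? "cloud" : "mcp";
}

function applyPreflight(requested: ExecutionMode, preflight: CloudPreflight): ModeDecision {
  if (requested === "mcp") return { mode: "mcp", summary: "local mcp execution" };
  const failures: string[] = [];
  if (!preflight.envIdFound) failures.push("cloud environment id not found");
  if (!preflight.codexAvailable) failures.push("codex not available");
  if (!preflight.remoteBranchOk) failures.push("remote branch not found");
  if (failures.length === 0) return { mode: "cloud", summary: "cloud preflight passed" };
  const reason = failures.join("; ");
  // Fall back to mcp, keeping both summary text and a structured fallback block.
  return {
    mode: "mcp",
    summary: `cloud preflight failed (${reason}); falling back to mcp`,
    cloud_fallback: { reason },
  };
}

// executionMode and runtimeMode are otherwise orthogonal; this pair fails fast.
function assertSupportedCombination(mode: ExecutionMode, runtime: RuntimeMode): void {
  if (mode === "cloud" && runtime === "appserver") {
    throw new Error("executionMode=cloud with runtimeMode=appserver is unsupported");
  }
}
```

Recording the fallback both as text and as a structured field lets humans read the summary while tooling (for example `run-summary.json` consumers) can detect downgrades without parsing prose.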
64
+
65
+ ## Learning Pipeline (local snapshots + auto validation)
66
+ - Enabled per run with `LEARNING_PIPELINE_ENABLED=1`; after a successful stage, the CLI captures the working tree (tracked + untracked, git-ignored files excluded) into `.runs/<task-id>/cli/<run-id>/learning/<run-id>.tar.gz` and copies it to `.runs/learning-snapshots/<task-id>/<run-id>.tar.gz` by default (recorded as `learning.snapshot.storage_path`).
67
+ - Manifests record the tag, commit SHA, tarball digest/path, queue payload path, and validation status (`validated`, `snapshot_failed`, `stalled_snapshot`, `needs_manual_scenario`) under `learning.*` so reviewers can audit outcomes without external storage.
68
+ - Scenario synthesis replays the most recent successful command from the run (or prompt/diff fallback), writes `learning/scenario.json`, and automatically executes the commands; validation logs live at `learning/scenario-validation.log` and are stored in `learning.validation.log_path`.
69
+ - Override snapshot storage with `LEARNING_SNAPSHOT_DIR=/custom/dir` when needed; the default lives under `.runs/learning-snapshots/` (or `$CODEX_ORCHESTRATOR_RUNS_DIR/learning-snapshots/` when configured).
70
+ - Successful pipeline runs also persist lightweight experience records in `out/<task-id>/experiences.jsonl` using prompt-pack domains, so future runs can inject higher-signal context without requiring learning snapshots.
71
+ - Prompt-pack injections apply a minimum reward threshold (`TFGRPO_EXPERIENCE_MIN_REWARD`, default `0.1`) to avoid re-injecting low-signal records.
72
+ - In cloud execution mode, the orchestrator now injects a bounded subset of relevant prompt-pack experience snippets directly into the cloud task prompt, so persisted experience data can influence execution outcomes immediately.
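The snapshot storage and experience-threshold rules can be sketched as follows. This assumes `LEARNING_SNAPSHOT_DIR` takes precedence over `CODEX_ORCHESTRATOR_RUNS_DIR` and keeps the same `<task-id>/<run-id>.tar.gz` layout beneath it, and that the reward threshold is inclusive; the `ExperienceRecord` shape and function names are hypothetical.

```typescript
// Hypothetical sketch of snapshot storage resolution and experience filtering.
type Env = Record<string, string | undefined>;

function snapshotStoragePath(env: Env, taskId: string, runId: string): string {
  // Assumed precedence: explicit override, then the configured runs dir,
  // then the repo-relative default.
  const runsDir = env["CODEX_ORCHESTRATOR_RUNS_DIR"];
  const base =
    env["LEARNING_SNAPSHOT_DIR"] ??
    (runsDir ? `${runsDir}/learning-snapshots` : ".runs/learning-snapshots");
  return `${base}/${taskId}/${runId}.tar.gz`;
}

interface ExperienceRecord {
  domain: string;
  reward: number;
}

// TFGRPO_EXPERIENCE_MIN_REWARD defaults to 0.1 when unset or not numeric.
function minReward(env: Env): number {
  const raw = env["TFGRPO_EXPERIENCE_MIN_REWARD"];
  const parsed = raw === undefined || raw.trim() === "" ? NaN : Number(raw);
  return Number.isFinite(parsed) ? parsed : 0.1;
}

// Only records at or above the threshold are eligible for prompt-pack injection.
function injectableExperiences(records: ExperienceRecord[], env: Env): ExperienceRecord[] {
  const threshold = minReward(env);
  return records.filter((record) => record.reward >= threshold);
}
```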
73
+
74
+ ### How to run the learning pipeline locally
75
+ - Seed a normal run and keep manifests grouped by task:
76
+ ```bash
77
+ export MCP_RUNNER_TASK_ID=<task-id>
78
+ LEARNING_PIPELINE_ENABLED=1 npx @kbediako/codex-orchestrator start diagnostics --format json
79
+ ```
80
+ - The learning section is written only when the run succeeds; rerun the command with `LEARNING_SNAPSHOT_DIR=<abs-path>` to redirect tarball copies.
81
+
82
+ ## Repository Layout
83
+ - `orchestrator/` – Core orchestration runtime (`TaskManager`, event bus, persistence, CLI, control-plane hooks, scheduler, privacy guard).
84
+ - `packages/` – Shared libraries used by downstream projects (tool orchestrator, shared manifest schema, SDK shims, control-plane schema bundle).
85
+ - `patterns/`, `eslint-plugin-patterns/` – Codemod + lint infrastructure invoked during builds.
86
+ - `scripts/` – Operational helpers for repo contributors (e.g., `scripts/spec-guard.mjs`), not shipped in the npm package.
87
+ - `tasks/`, `docs/`, `.agent/` – Project planning artifacts that must stay in sync (`[ ]` → `[x]` checklists pointing to manifest evidence).
88
+ - `.runs/<task-id>/` – Per-task manifests, logs, metrics snapshots (`metrics.json`), and CLI run folders.
89
+ - `out/<task-id>/` – Human-friendly summaries and (when enabled) cloud-sync audit logs.
90
+
91
+ ## CLI Quick Start
92
+ 1. Install dependencies and build:
93
+ ```bash
94
+ npm install
95
+ npm run build
96
+ ```
97
+ 2. Set the task context so artifacts land in the right folder:
98
+ ```bash
99
+ export MCP_RUNNER_TASK_ID=<task-id>
100
+ ```
101
+ 3. Launch diagnostics (defaults to the configured pipeline):
102
+ ```bash
103
+ npx @kbediako/codex-orchestrator start diagnostics --format json
104
+ ```
105
+ > Tip: keep `FEATURE_TFGRPO_GROUP`, `TFGRPO_GROUP_SIZE`, and related TF-GRPO env vars **unset** when running diagnostics. Many tests assume grouped execution is off, and the TF-GRPO guardrails require `groupSize >= 2` and `groupSize <= fanOutCapacity`. Use the `tfgrpo-learning` pipeline instead when you need grouped TF-GRPO runs.
106
+ > HUD: add `--interactive` (or `--ui`) to view the read-only Ink HUD when stdout and stderr are TTYs, `TERM` is not `dumb`, and CI is off. Non-interactive or JSON runs skip the HUD automatically.
107
+ 4. Follow the run:
108
+ ```bash
109
+ npx @kbediako/codex-orchestrator status --run <run-id> --watch --interval 10
110
+ ```
111
+ 5. Attach the CLI manifest path (`.runs/<task-id>/cli/<run-id>/manifest.json`) when you complete checklist items; the TaskManager summary lives at `.runs/<task-id>/<run-id>/manifest.json`, metrics aggregate in `.runs/<task-id>/metrics.json`, and summaries land in `out/<task-id>/state.json`.
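The HUD conditions in step 3 reduce to a single predicate. This is a hypothetical sketch: `HudContext` and `shouldShowHud` are illustrative names, and how the CLI actually detects CI is an assumption here. In Node the TTY inputs would typically come from `process.stdout.isTTY` and `process.stderr.isTTY`.

```typescript
// Hypothetical sketch of the HUD eligibility rule; not the CLI's actual code.
interface HudContext {
  stdoutIsTTY: boolean;
  stderrIsTTY: boolean;
  term: string | undefined; // value of the TERM environment variable
  ci: boolean; // assumed to be derived from the environment
  jsonFormat: boolean; // e.g. `--format json`
  interactiveFlag: boolean; // `--interactive` or `--ui`
}

function shouldShowHud(ctx: HudContext): boolean {
  // The HUD is opt-in, and JSON output always skips it.
  if (!ctx.interactiveFlag || ctx.jsonFormat) return false;
  // Both streams must be terminals, TERM must not be "dumb", and CI must be off.
  return ctx.stdoutIsTTY && ctx.stderrIsTTY && ctx.term !== "dumb" && !ctx.ci;
}
```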
112
+
113
+ Use `npx @kbediako/codex-orchestrator resume --run <run-id>` to continue interrupted runs; the CLI verifies resume tokens, refreshes the plan, and updates the manifest safely before rerunning.
114
+
115
+ ## Companion Package Commands
116
+ - `codex-orchestrator mcp serve [--repo <path>] [--dry-run] [-- <extra args>]`: launch the MCP stdio server (delegates to `codex mcp-server`; stdout guard keeps protocol-only output, logs to stderr).
117
+ - `codex-orchestrator init codex [--cwd <path>] [--force]`: copy starter templates into a repo (includes `mcp-client.json`, `AGENTS.md`, downstream `.codex/config.toml` + `.codex/agents/*` role files sourced from `templates/codex/.codex/*`, and `codex.orchestrator.json`; no overwrite unless `--force`).
118
+ - `codex-orchestrator setup [--yes] [--refresh-skills]`: one-shot bootstrap for downstream users (installs bundled skills, configures delegation + DevTools wiring, and prints policy/usage guidance). By default, setup does not overwrite existing skills; add `--refresh-skills` when you want to replace existing bundled skill files.
119
+ - Canonical bundled skill roster lives in `skills/README.md`, with shipped-file parity enforced against `skills/`.
120
+ - `codex-orchestrator start [pipeline] [--auto-issue-log] [--repo-config-required]`: starts a pipeline run. `--auto-issue-log` writes failure bundles automatically (including setup failures before manifest creation); `--repo-config-required` disables packaged config fallback.
121
+ - `codex-orchestrator flow [--task <task-id>] [--auto-issue-log] [--repo-config-required]`: runs `docs-review` then `implementation-gate` in sequence; stops on the first failure. `--auto-issue-log` writes failure bundles automatically (including setup failures before manifest creation); `--repo-config-required` disables packaged config fallback.
122
+ - `codex-orchestrator doctor [--format json] [--usage] [--cloud-preflight] [--issue-log] [--apply]`: check optional tooling dependencies plus collab/cloud/delegation readiness and print enablement commands. `--usage` appends a local usage snapshot (scans `.runs/`) with adoption KPIs. `--issue-log` appends/creates `docs/codex-orchestrator-issues.md` (or `--issue-log-path`) and writes a JSON bundle under `out/<resolved-task>/doctor/issue-bundles/` with doctor context plus latest run context when available. `--apply` plans/applies quick fixes (use with `--yes`).
123
+ - `codex-orchestrator devtools setup [--yes]`: print DevTools MCP setup instructions (`--yes` applies `codex mcp add ...`).
124
+ - `codex-orchestrator delegation setup [--yes]`: configure delegation MCP wiring (`--yes` applies `codex mcp add ...`).
125
+ - `codex-orchestrator skills install [--force] [--only <skills>] [--codex-home <path>]`: install bundled skills into `$CODEX_HOME/skills`. When referencing a skill, prefer the globally installed copy and fall back to the bundled one; for example, use `$CODEX_HOME/skills/docs-first` when present, otherwise `skills/docs-first/SKILL.md`.
126
+ - `codex-orchestrator self-check --format json`: emit a safe JSON health payload for smoke tests.
127
+ - `codex-orchestrator --version`: print the package version.
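The skill lookup order described for `skills install` can be sketched like this, assuming POSIX-style paths. `resolveSkillPath` is a hypothetical helper; the existence check is injected (for example `fs.existsSync` in Node) so the logic stays testable without touching the filesystem.

```typescript
// Hypothetical sketch of global-first skill resolution with a bundled fallback.
function resolveSkillPath(
  skillName: string,
  codexHome: string,
  exists: (path: string) => boolean,
): string {
  const globalSkill = `${codexHome}/skills/${skillName}`;
  // Prefer the globally installed copy when it is present.
  if (exists(globalSkill)) return globalSkill;
  // Otherwise fall back to the SKILL.md bundled with the package.
  return `skills/${skillName}/SKILL.md`;
}
```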
128
+
129
+ ## Publishing (npm)
130
+ - Pack audit: `npm run pack:audit` (validates the tarball file list; run `npm run clean:dist && npm run build` first if `dist/` contains non-runtime artifacts).
131
+ - Pack smoke: `npm run pack:smoke` (installs the tarball in a temp mock repo, runs CLI behavior checks including `review` artifacts and `long-poll-wait` skill install, and validates delegate-server JSONL; uses network). Treat this as a spot-check gate; use `npm run pack:audit` for full tarball inventory validation.
132
+ - Release tags: `vX.Y.Z` or `vX.Y.Z-<prerelease>` must match `package.json` version, for example `vX.Y.Z-alpha.N`, `vX.Y.Z-beta.N`, or `vX.Y.Z-rc.N`.
133
+ - Dist-tags: stable releases publish to `latest`; prereleases publish with a dist-tag derived from the leading prerelease label before the first `.` or `-`, lowercased and sanitized. Examples: `alpha.1` -> `alpha`, `beta.1` -> `beta`, `rc.1` -> `rc`; empty or numeric-leading labels fall back to `next`. Prerelease tags create a GitHub prerelease.
134
+ - Publishing auth: workflow attempts OIDC trusted publishing first (`id-token: write` + `--provenance`), then falls back to `secrets.NPM_TOKEN` when OIDC is unavailable. `secrets.NPM_TOKEN` must be an npm automation token (not a token that requires OTP).
135
+ - Trusted publisher config: npm expects workflow filename `release.yml` (the file must exist at `.github/workflows/release.yml` on the default branch). Leave environment blank unless the publish job sets `environment: ...`.
136
+ - OIDC runtime prereqs: npm trusted publishing currently requires Node.js `22.14.0+` and npm `11.5.1+`; the publish job logs the runner versions, then runs the publish commands through `npx --yes npm@11.5.1` instead of mutating the runner-global npm install.
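The tag and dist-tag rules above can be expressed as two small functions. This is a sketch under assumptions: the names are hypothetical, and "sanitized" is interpreted here as dropping any character outside `[a-z0-9]` after lowercasing, which may differ from what the release workflow does.

```typescript
// Hypothetical sketch of release-tag validation and dist-tag derivation.
const TAG_PATTERN = /^v(\d+\.\d+\.\d+)(?:-(.+))?$/;

function parseReleaseTag(
  tag: string,
  packageVersion: string,
): { version: string; prerelease: string | undefined } {
  const match = TAG_PATTERN.exec(tag);
  if (!match) throw new Error(`tag ${tag} is not vX.Y.Z or vX.Y.Z-<prerelease>`);
  const version = tag.slice(1); // drop the leading "v"
  if (version !== packageVersion) {
    throw new Error(`tag ${tag} does not match package.json version ${packageVersion}`);
  }
  return { version, prerelease: match[2] };
}

function distTagFor(prerelease: string | undefined): string {
  // Stable releases publish to `latest`.
  if (prerelease === undefined) return "latest";
  // Leading label: everything before the first "." or "-", lowercased.
  const label = (prerelease.split(/[.-]/, 1)[0] ?? "").toLowerCase();
  // Assumed sanitization: keep only [a-z0-9].
  const sanitized = label.replace(/[^a-z0-9]/g, "");
  // Empty or numeric-leading labels fall back to `next`.
  if (sanitized === "" || /^[0-9]/.test(sanitized)) return "next";
  return sanitized;
}
```

For example, `v0.2.1-rc.1` against a `package.json` version of `0.2.1-rc.1` yields prerelease `rc.1`, which publishes under the `rc` dist-tag.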
137
+
138
+ ## Parallel Runs (Meta-Orchestration)
139
+ The orchestrator executes a single pipeline serially. “Parallelism” comes from running multiple orchestrator runs at the same time, ideally in separate git worktrees so builds/tests don’t contend for the same working tree outputs.
140
+
141
+ **Recommended pattern (one worktree per workstream)**
142
+ ```bash
143
+ git worktree add ../CO-stream-a HEAD
144
+ git worktree add ../CO-stream-b HEAD
145
+
146
+ # terminal A
147
+ cd ../CO-stream-a
148
+ export MCP_RUNNER_TASK_ID=<task-id>-a
149
+ npx @kbediako/codex-orchestrator start diagnostics --format json
150
+
151
+ # terminal B
152
+ cd ../CO-stream-b
153
+ export MCP_RUNNER_TASK_ID=<task-id>-b
154
+ npx @kbediako/codex-orchestrator start diagnostics --format json
155
+ ```
156
+
157
+ Notes:
158
+ - Use `--task <id>` instead of exporting `MCP_RUNNER_TASK_ID` when scripting runs.
159
+ - Release usage relies on the scoped package (`npx @kbediako/codex-orchestrator`); for local dev, use the repo CLI (`codex-orch` or `node ./bin/codex-orchestrator.ts`) so your changes are picked up. The unscoped `npx codex-orchestrator` is not published.
160
+ - Use `--parent-run <run-id>` to group related runs in manifests (optional).
161
+ - If worktrees aren’t possible, isolate artifacts with `CODEX_ORCHESTRATOR_RUNS_DIR` and `CODEX_ORCHESTRATOR_OUT_DIR`. Use `CODEX_ORCHESTRATOR_ROOT` to point the CLI at a repo root when invoking from outside the repo (optional; defaults to the current working directory). Avoid concurrent builds/tests in the same checkout.
162
+ - For a deeper runbook, see `.agent/SOPs/meta-orchestration.md`.
163
+
164
+ ### Codex CLI prompts
165
+ - Note: prompt installers and guardrail scripts live under `scripts/` and are repo-only (not included in the npm package).
166
+ - The custom prompts live outside the repo at `~/.codex/prompts/diagnostics.md` and `~/.codex/prompts/review-handoff.md`. Recreate those files on every fresh machine so `/prompts:diagnostics` and `/prompts:review-handoff` are available in the Codex CLI palette.
167
+ - Canonical diagnostics prompt + output expectations: `docs/diagnostics-prompt-guide.md` (keep in sync with `scripts/setup-codex-prompts.sh`).
168
+ - Standalone review guidance (`codex-orchestrator review` default, `npm run review` repo alias, plus direct `codex review` quick mode): `docs/standalone-review-guide.md`.
169
+ - These prompts are consumed by the Codex CLI UI only; the orchestrator does not read them. Keep updates synced across machines during onboarding.
170
+ - To install or refresh the prompts (repo-only), run `scripts/setup-codex-prompts.sh` (use `--force` to overwrite existing files).
171
+ - `/prompts:diagnostics` takes `TASK=<task-id> MANIFEST=<path> [NOTES=<free text>]`, exports `MCP_RUNNER_TASK_ID=$TASK`, runs `npx @kbediako/codex-orchestrator start diagnostics --format json`, tails `.runs/$TASK/cli/<run-id>/manifest.json` (or `npx @kbediako/codex-orchestrator status --run <run-id> --watch --interval 10`), and records evidence to `/tasks`, `docs/TASKS.md`, `.agent/task/...`, `.runs/$TASK/metrics.json`, and `out/$TASK/state.json` using `$MANIFEST`.
172
+ - `/prompts:review-handoff` takes `TASK=<task-id> MANIFEST=<path> NOTES=<goal + summary + risks + optional questions>`, re-exports `MCP_RUNNER_TASK_ID`, and (repo-only) runs `node scripts/delegation-guard.mjs`, `node scripts/spec-guard.mjs --dry-run`, `npm run lint`, `npm run test`, optional `npm run eval:test`, plus `codex-orchestrator review` (which wraps `codex review` against the current diff and includes the latest run manifest path as evidence). It also reminds you to log approvals in `$MANIFEST` and mirror the evidence to the same docs/metrics/state targets.
173
+ - In CI / `--no-interactive` pipelines (or when stdin is not a TTY, or `CODEX_REVIEW_NON_INTERACTIVE=1` / `CODEX_NON_INTERACTIVE=1` / `CODEX_NO_INTERACTIVE=1`), `codex-orchestrator review` prints the review handoff prompt (including evidence paths) and exits successfully instead of invoking `codex review`. Set `FORCE_CODEX_REVIEW=1` to run `codex review` in those environments.
174
+ - `codex-orchestrator review` keeps delegation MCP enabled by default; disable for troubleshooting with `CODEX_REVIEW_DISABLE_DELEGATION_MCP=1` (or `--disable-delegation-mcp`). Legacy disable control (`CODEX_REVIEW_ENABLE_DELEGATION_MCP=0`) remains supported.
175
+ - `codex-orchestrator review` allows unbounded runtime by default; set `CODEX_REVIEW_TIMEOUT_SECONDS`, `CODEX_REVIEW_STALL_TIMEOUT_SECONDS`, and/or `CODEX_REVIEW_STARTUP_LOOP_TIMEOUT_SECONDS` to opt into explicit guards (`0` disables each guard when set).
176
+ - `CODEX_REVIEW_STARTUP_LOOP_MIN_EVENTS` defaults to `8` when startup-loop timeout detection is enabled.
177
+ - `codex-orchestrator review` emits patience-first monitor checkpoints every 60 seconds by default; set `CODEX_REVIEW_MONITOR_INTERVAL_SECONDS=<seconds>` to tune cadence (`0` disables checkpoints).
178
+ - `codex-orchestrator review` detects large uncommitted scopes and injects a high-signal scope advisory into the review prompt; tune detection via `CODEX_REVIEW_LARGE_SCOPE_FILE_THRESHOLD` (default `25`) and `CODEX_REVIEW_LARGE_SCOPE_LINE_THRESHOLD` (default `1200`).
179
+ - Optional failure issue-bundle capture: set `CODEX_REVIEW_AUTO_ISSUE_LOG=1` (or pass `--auto-issue-log` to `codex-orchestrator review ...`).
180
+ - Always trigger diagnostics and review workflows through these prompts whenever you run the orchestrator so contributors consistently execute the required command sequences and capture auditable manifests.
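The non-interactive fallback and the opt-in guard semantics for `codex-orchestrator review` can be sketched as follows. This is hypothetical: the function names are illustrative, treating any non-empty `CI` value as CI is an assumption, non-numeric or negative guard values are assumed to leave a guard off, and invalid monitor intervals are assumed to keep the 60-second default.

```typescript
// Hypothetical sketch of review-mode selection and guard parsing.
type Env = Record<string, string | undefined>;

const NON_INTERACTIVE_VARS = [
  "CODEX_REVIEW_NON_INTERACTIVE",
  "CODEX_NON_INTERACTIVE",
  "CODEX_NO_INTERACTIVE",
];

function isNonInteractive(env: Env, stdinIsTTY: boolean, noInteractiveFlag: boolean): boolean {
  const ci = (env["CI"] ?? "") !== ""; // assumption: any non-empty CI value means CI
  return (
    noInteractiveFlag ||
    !stdinIsTTY ||
    ci ||
    NON_INTERACTIVE_VARS.some((name) => env[name] === "1")
  );
}

function reviewAction(
  env: Env,
  stdinIsTTY: boolean,
  noInteractiveFlag: boolean,
): "run-codex-review" | "print-handoff" {
  // FORCE_CODEX_REVIEW=1 overrides the non-interactive fallback.
  if (env["FORCE_CODEX_REVIEW"] === "1") return "run-codex-review";
  return isNonInteractive(env, stdinIsTTY, noInteractiveFlag) ? "print-handoff" : "run-codex-review";
}

// Runtime guards are opt-in: unset means unbounded, and 0 disables the guard.
function guardSeconds(raw: string | undefined): number | null {
  if (raw === undefined || raw.trim() === "") return null;
  const value = Number(raw);
  return Number.isFinite(value) && value > 0 ? value : null;
}

// Monitor checkpoints default to every 60 seconds; 0 disables them.
function monitorIntervalSeconds(raw: string | undefined): number | null {
  if (raw === undefined || raw.trim() === "") return 60;
  const value = Number(raw);
  if (value === 0) return null;
  return Number.isFinite(value) && value > 0 ? value : 60;
}
```

In Node these could be fed with `process.env` and `Boolean(process.stdin.isTTY)`; returning `null` for "no guard" keeps "unbounded" distinct from any real timeout value.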
181
+
182
+ ### Identifier Guardrails
183
+ - `MCP_RUNNER_TASK_ID` is no longer coerced or lowercased silently. The CLI calls the shared `sanitizeTaskId` helper and fails fast when the value contains control characters, traversal attempts, or Windows-reserved characters (`<`, `>`, `:`, `"`, `/`, `\`, `|`, `?`, `*`). Set the correct task ID in your environment *before* invoking the CLI.
184
+ - Run IDs used for manifest or artifact storage must come from the CLI (or pass the shared `sanitizeRunId` helper). Strings with colons, control characters, or `../` are rejected to ensure every run directory lives under `.runs/<task-id>/cli/<run-id>` (and legacy `mcp` mirrors) without risking traversal.
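A rough re-implementation of the task ID rules shows what "fails fast" means in practice. It is not the shared `sanitizeTaskId` helper: `validateTaskId` is a hypothetical name, and the exact control-character range and the empty-ID and `..` checks are assumptions.

```typescript
// Hypothetical validator mirroring the rules above; not the shared helper.
// Windows-reserved characters: < > : " / \ | ? *
const WINDOWS_RESERVED = /[<>:"\/\\|?*]/;
// Assumed control-character range: U+0000-U+001F and U+007F.
const CONTROL_CHARS = /[\u0000-\u001f\u007f]/;

function validateTaskId(raw: string): string {
  if (raw.length === 0) throw new Error("task id must not be empty");
  if (CONTROL_CHARS.test(raw)) throw new Error("task id contains control characters");
  if (WINDOWS_RESERVED.test(raw)) throw new Error("task id contains a Windows-reserved character");
  if (raw === "." || raw.includes("..")) throw new Error("task id looks like a path traversal");
  // Returned unchanged: no silent lowercasing or coercion.
  return raw;
}
```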
185
+
186
+ ### Delegation Guardrails
187
+ - `delegate.question.poll` clamps `wait_ms` to `MAX_QUESTION_POLL_WAIT_MS` (10s); each poll timeout is bounded by the remaining `wait_ms`.
188
+ - Confirm-to-act fallback only triggers on confirmation-specific errors (`error.code`), not generic tool failures.
189
+ - Tool profile entries used for MCP overrides are sanitized; only alphanumeric + `_`/`-` names are allowed (rejects `;`, `/`, `\n`, `=` and similar).
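The delegation guardrails can be sketched as below. The function names are hypothetical, and this sketch drops invalid tool profile entries; whether the real server silently drops them or rejects the request with an error is not specified above.

```typescript
// Hypothetical sketch of the delegation guardrails; names are illustrative.
const MAX_QUESTION_POLL_WAIT_MS = 10_000;

// Clamp the caller's requested wait to the 10s ceiling (negative/NaN -> 0).
function clampWaitMs(waitMs: number): number {
  if (!Number.isFinite(waitMs) || waitMs < 0) return 0;
  return Math.min(waitMs, MAX_QUESTION_POLL_WAIT_MS);
}

// Each poll attempt may not outlive the remaining wait budget.
function pollTimeoutMs(perPollTimeoutMs: number, deadlineMs: number, nowMs: number): number {
  return Math.max(0, Math.min(perPollTimeoutMs, deadlineMs - nowMs));
}

// Only alphanumeric names plus "_" and "-" are allowed.
const TOOL_PROFILE_NAME = /^[A-Za-z0-9_-]+$/;

function sanitizeToolProfile(entries: string[]): string[] {
  return entries.filter((entry) => TOOL_PROFILE_NAME.test(entry));
}
```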
190
+
191
+ ## Pipelines & Execution Plans
192
+ - Default pipelines live in `codex.orchestrator.json` (repository-specific) and `orchestrator/src/cli/pipelines/` (built-in defaults). Each stage is either a command (shell execution) or a nested pipeline.
193
+ - The `CommandPlanner` inspects the selected pipeline and target stage; you can pass `--target <stage-id>` (alias: `--target-stage`) or set `CODEX_ORCHESTRATOR_TARGET_STAGE` to focus on a specific step (e.g., rerun tests only).
194
+ - Stage execution records stdout/stderr logs, exit codes, optional summaries, and failure data directly into the manifest (`commands[]` array).
195
+ - Guardrails (repo-only): before review, run `node scripts/delegation-guard.mjs` and `node scripts/spec-guard.mjs --dry-run` to ensure delegation and spec freshness; the orchestrator tracks guardrail outcomes in the manifest (`guardrail_status`).
196
+
197
+ ## Approval & Sandbox Model
198
+ - Approval policies (`never`, `on-request`, `auto`, or custom strings) flow through `packages/orchestrator`. Tool invocations can require approval before sandbox elevation, and all prompts/decisions are persisted.
199
+ - Sandbox retries (for transient `mcp` or cloud failures) use exponential backoff with configurable classifiers, ensuring tools get multiple attempts without masking hard failures.
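Retry behavior of this kind is commonly implemented as exponential backoff with a pluggable classifier. The sketch below is a generic illustration, not the `packages/orchestrator` API: `withRetries`, the two-way `transient`/`hard` classification, and the doubling delay schedule are assumptions.

```typescript
// Hypothetical sketch of exponential-backoff retries with a failure classifier.
type Classifier = (error: unknown) => "transient" | "hard";

interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
  classify: Classifier;
  // Injectable sleep so tests can record delays instead of waiting.
  sleep?: (ms: number) => Promise<void>;
}

async function withRetries<T>(fn: (attempt: number) => Promise<T>, opts: RetryOptions): Promise<T> {
  const sleep =
    opts.sleep ?? ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));
  for (let attempt = 1; ; attempt++) {
    try {
      return await fn(attempt);
    } catch (error) {
      // Hard failures surface immediately; transient ones retry until attempts run out.
      if (opts.classify(error) === "hard" || attempt >= opts.maxAttempts) throw error;
      // Delay doubles per attempt: base, 2x base, 4x base, ...
      await sleep(opts.baseDelayMs * 2 ** (attempt - 1));
    }
  }
}
```

Keeping the classifier separate from the retry loop is what prevents backoff from masking hard failures: a permission denial, for example, is thrown on the first attempt rather than retried.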
200
+
201
+ ## Control Plane, Scheduler, and Cloud Sync
202
+ - `control-plane/` builds versioned requests (`buildRunRequestV2`) and validates manifests against remote expectations. Drift reports are appended to run summaries so reviewers see deviations.
203
+ - `scheduler/` resolves assignments, serializes plan data, and embeds scheduler state in manifests, making it easy to coordinate multi-stage work across agents.
204
+ - `sync/` contains the cloud upload client + worker, but is not wired into the default CLI yet. Configure credentials through the credential broker (`orchestrator/src/credentials/`) and wire `createCloudSyncWorker` to an `EventBus` if you need uploads.
205
+
206
+ ## Persistence & Observability
207
+ - `TaskStateStore` writes per-task snapshots with bounded lock retries; failures degrade gracefully while still writing the main manifest.
208
+ - `RunManifestWriter` generates the canonical manifest JSON for each run (mirrored under `.runs/`), while metrics appenders and summary writers keep `out/` up to date.
209
+ - `run-summary.json` now carries `usageKpi` run-level signals (cloud/collab/delegation/rlm indicators) and `cloudFallback` details when a cloud request is downgraded to MCP.
210
+ - `collab_tool_calls` in the manifest captures collab tool call JSONL lines extracted from command stdout (bounded by `CODEX_ORCHESTRATOR_COLLAB_MAX_EVENTS`, default 200; set 0 to disable capture). For `spawn_agent` calls, keep prompt-role intent explicit (first-line `[agent_type:<role>]`) and set `agent_type` when supported so routing remains auditable even when event payloads omit `agent_type`; keep `fork_context` disabled by default and enable it only for streams that require inherited thread history. When emitted upstream, `spawn_agent.fork_context` is persisted and summarized by `codex-orchestrator doctor --usage` counters (`true/false/unknown`) to support evidence-based policy decisions.
211
+ - Heartbeat files and timestamps guard against stalled runs. `orchestrator/src/cli/metrics/metricsRecorder.ts` aggregates command durations, exit codes, and guardrail stats for later review.
212
+ - Optional caps: `CODEX_ORCHESTRATOR_EXEC_EVENT_MAX_CHUNKS` limits captured exec chunk events per command (defaults to 500; set 0 for no cap), `CODEX_ORCHESTRATOR_TELEMETRY_MAX_EVENTS` caps in-memory telemetry events queued before flush (defaults to 1000; set 0 for no cap), and `CODEX_METRICS_PRIVACY_EVENTS_MAX` limits privacy decision events stored in `metrics.json` (-1 = no cap; `privacy_event_count` still reflects total).
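Two of the rules above, the first-line `[agent_type:<role>]` marker and the collab event cap, can be sketched as follows. The function names are hypothetical, and the role-name character set, the precedence of an explicit `agent_type` over the prompt marker, and keeping the first N events when capping are all assumptions.

```typescript
// Hypothetical sketch of role extraction and collab event capping.
const ROLE_MARKER = /^\[agent_type:([A-Za-z0-9_-]+)\]/;

// Read the role from the first prompt line, e.g. "[agent_type:explorer_fast]".
function promptRole(prompt: string): string | undefined {
  const firstLine = prompt.split("\n", 1)[0] ?? "";
  return ROLE_MARKER.exec(firstLine.trim())?.[1];
}

// Assumed precedence: an explicit agent_type wins over the prompt marker.
function effectiveRole(agentType: string | undefined, prompt: string): string | undefined {
  return agentType ?? promptRole(prompt);
}

// CODEX_ORCHESTRATOR_COLLAB_MAX_EVENTS: default 200, 0 disables capture.
function capCollabEvents<T>(events: T[], env: Record<string, string | undefined>): T[] {
  const raw = env["CODEX_ORCHESTRATOR_COLLAB_MAX_EVENTS"];
  const parsed = raw === undefined || raw.trim() === "" ? 200 : Number(raw);
  const max = Number.isFinite(parsed) && parsed >= 0 ? Math.floor(parsed) : 200;
  return max === 0 ? [] : events.slice(0, max);
}
```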
213
+
214
+ ## Customizing for New Projects
215
+ - Duplicate the templates under `/tasks`, `docs/`, and `.agent/` for your task ID and keep checklist status mirrored (`[ ]` → `[x]`) with links to the manifest that proves each outcome.
216
+ - Update `docs/PRD-<slug>.md`, `tasks/specs/<id>-<slug>.md`, and `docs/ACTION_PLAN-<slug>.md` with project details and evidence paths (`.runs/<task-id>/...`).
217
+ - Refresh `.agent/` SOPs with task-specific guardrails, escalation contacts, and artifact locations.
218
+ - Remove placeholder references in manifests/docs before merging so downstream teams see only live project data.
219
+
220
+ ## Development Workflow
221
+ Note: the commands below assume a source checkout; `scripts/` helpers are not included in the npm package.
222
+ | Command | Purpose |
223
+ | --- | --- |
224
+ | `npm run build` | Compiles TypeScript to `dist/` (required for packaging and running the CLI from `dist/`). |
225
+ | `npm run lint` | Lints orchestrator, adapters, shared packages. Auto-runs `node scripts/build-patterns-if-needed.mjs` so codemods compile when missing/outdated. |
226
+ | `npm run test:core` | Narrow Core Lane matrix via `vitest.config.core.ts`; excludes `adapters/**` and `evaluation/tests/**`. |
227
+ | `npm run test` | Default repo validation alias; runs `test:core` so the historical core-only surface stays explicit. |
228
+ | `npm run test:all` | Explicit broader Vitest matrix (`test:core` + `test:adapters`) without implicitly enabling the opt-in evaluation lane. |
229
+ | `npm run eval:test` | Optional evaluation-only harness lane; alias for `npm run test:evaluation`, for use when `evaluation/fixtures/**` or evaluation scope is in play. |
230
+ | `npm run docs:check` | Deterministically validates scripts/pipelines/paths referenced in agent-facing docs, current posture locks, bundled-skill roster parity, and the README front-door budget. |
231
+ | `npm run docs:freshness` | Validates docs registry coverage plus catalog class coverage and writes a class-separated report to `out/<task-id>/docs-freshness.json`. |
232
+ | `npm run repo:stewardship` | Audits every tracked file via `git ls-files`, classifies each tracked surface as `validate`, `update`, `delete`, or `retain_with_rationale`, and writes `out/<task-id>/repo-stewardship.json`. |
233
+ | `npm run ci:cloud-canary` | Runs the cloud canary harness (`scripts/cloud-canary-ci.mjs`) to verify cloud lifecycle manifest + run-summary evidence; credential-gated by `CODEX_CLOUD_ENV_ID` and optional auth secrets (`CODEX_CLOUD_BRANCH` defaults to `main`). Feature flags can be passed through with `CODEX_CLOUD_ENABLE_FEATURES` / `CODEX_CLOUD_DISABLE_FEATURES` (comma- or space-delimited, e.g. `sqlite,memories`). |
234
+ | `node scripts/delegation-guard.mjs` | Enforces subagent delegation evidence before review (repo-only). |
235
+ | `node scripts/spec-guard.mjs --dry-run` | Validates spec freshness; required before review (repo-only). |
236
+ | `node scripts/diff-budget.mjs` | Guards against oversized diffs before review (repo-only; defaults: 25 files / 1200 lines; supports explicit overrides). |
237
+ | `npm run pack:smoke` | Downstream simulation gate for npm consumers (tarball install in temp mock repo, `review` wrapper artifacts, delegate-server JSONL, and `skills install --only long-poll-wait`). Spot-check gate; pair with `npm run pack:audit` when you need full tarball inventory coverage. Core lane runs it automatically when downstream-facing paths change, and `.github/workflows/pack-smoke-backstop.yml` runs a weekly `main` backstop. |
238
+ | `codex-orchestrator review` | Runs the standalone review wrapper with task-scoped manifest evidence; delegation MCP is enabled by default (explicit disable available via `CODEX_REVIEW_DISABLE_DELEGATION_MCP=1` / `--disable-delegation-mcp`), runtime guards are opt-in via `CODEX_REVIEW_*` env vars, and patience-first checkpoints log by default (`CODEX_REVIEW_MONITOR_INTERVAL_SECONDS` tunes/disables). Large uncommitted scopes get an automatic prompt advisory (`CODEX_REVIEW_LARGE_SCOPE_FILE_THRESHOLD` / `CODEX_REVIEW_LARGE_SCOPE_LINE_THRESHOLD`). Optional auto failure issue logging via `CODEX_REVIEW_AUTO_ISSUE_LOG=1` or `--auto-issue-log`. |
239
+ | `npm run review` | Runs `codex review` with task-scoped manifest evidence; delegation MCP is enabled by default (explicit disable available via `CODEX_REVIEW_DISABLE_DELEGATION_MCP=1` / `--disable-delegation-mcp`), runtime guards are opt-in via `CODEX_REVIEW_*` env vars, and patience-first checkpoints log by default (`CODEX_REVIEW_MONITOR_INTERVAL_SECONDS` tunes/disables). Large uncommitted scopes get an automatic prompt advisory (`CODEX_REVIEW_LARGE_SCOPE_FILE_THRESHOLD` / `CODEX_REVIEW_LARGE_SCOPE_LINE_THRESHOLD`). Optional auto failure issue logging via `CODEX_REVIEW_AUTO_ISSUE_LOG=1` or `--auto-issue-log`. |
240
+
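The cloud canary row says `CODEX_CLOUD_ENABLE_FEATURES` / `CODEX_CLOUD_DISABLE_FEATURES` accept comma- or space-delimited lists. A sketch of that parsing, assuming duplicates and empty tokens should be dropped; `parseFeatureList` is a hypothetical name, and how `scripts/cloud-canary-ci.mjs` actually forwards the names is not shown here.

```typescript
// Sketch: split "sqlite,memories" or "sqlite memories" (or a mix) into names.
// Dropping duplicates and empty tokens is an assumption.
function parseFeatureList(raw: string | undefined): string[] {
  if (raw === undefined) return [];
  const names: string[] = [];
  for (const token of raw.split(/[\s,]+/)) {
    if (token !== "" && names.indexOf(token) === -1) names.push(token);
  }
  return names;
}
```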
241
+ Run `npm run build` to compile TypeScript before packaging or invoking the CLI directly from `dist/`.
242
+
243
+ ## Diff Budget
244
+
245
+ This repo enforces a small “diff budget” via `node scripts/diff-budget.mjs` to keep PRs reviewable and avoid accidental scope creep (repo-only).
246
+
247
+ - Defaults: 25 changed files / 1200 total lines changed (additions + deletions), excluding ignored paths.
248
+ - CI: `.github/workflows/core-lane.yml` runs the diff budget on pull requests and sets `BASE_SHA` to the PR base commit, so PR/base scope remains hard-gated.
249
+ - Local: run `node scripts/diff-budget.mjs` before `npm run review` (the review wrapper runs it automatically). Without an explicit base, the hard local gate uses the current working tree relative to `HEAD`; when `origin/main` exists and the broader stacked aggregate is larger, the script prints that aggregate as advisory context.
250
+ - If `--base`, `BASE_SHA`, or `DIFF_BUDGET_BASE` is provided but cannot be resolved, the script fails instead of downgrading to local auto mode or silently falling through to a lower-priority base source.
251
+
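A sketch of the budget arithmetic, assuming the input is `git diff --numstat` output (tab-separated added count, deleted count, path) and that landing exactly on 25 files or 1200 lines still passes. `evaluateDiffBudget` and the `isIgnored` predicate are hypothetical; the script's real ignore list is not documented here.

```typescript
// Sketch of the diff-budget arithmetic (defaults: 25 files / 1200 lines).
// Input format, the isIgnored predicate and inclusive limits are assumptions.
interface BudgetResult {
  files: number;
  lines: number;
  withinBudget: boolean;
}

function evaluateDiffBudget(
  numstat: string,
  isIgnored: (path: string) => boolean = () => false,
  maxFiles = 25,
  maxLines = 1200,
): BudgetResult {
  let files = 0;
  let lines = 0;
  for (const row of numstat.split("\n")) {
    if (row.trim() === "") continue;
    const [added, deleted, ...rest] = row.split("\t");
    if (isIgnored(rest.join("\t"))) continue;
    files += 1;
    // Binary files report "-" for both counts; they add 0 lines here.
    lines += (Number(added) || 0) + (Number(deleted) || 0);
  }
  return { files, lines, withinBudget: files <= maxFiles && lines <= maxLines };
}
```

Feeding it `git diff --numstat HEAD` output roughly approximates the default local mode (tracked working-tree changes relative to `HEAD`).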
252
+ ### Local usage
253
+ - Current working tree hard gate relative to `HEAD` (default local mode): `node scripts/diff-budget.mjs`
254
+ - Explicit PR/base scope: `node scripts/diff-budget.mjs --base <ref>`
255
+ - Commit-scoped mode (ignores working tree state): `node scripts/diff-budget.mjs --commit <sha>`
256
+
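The fail-closed base rule from the Diff Budget bullets can be sketched as below. Assumptions: precedence follows the order the sources are listed (`--base`, then `BASE_SHA`, then `DIFF_BUDGET_BASE`), an empty value counts as not provided, and `resolveRef` stands in for something like `git rev-parse --verify`. `chooseDiffBase` is hypothetical, and the `--commit <sha>` mode is not modeled.

```typescript
// Sketch of fail-closed base selection; hypothetical names.
// Assumed precedence: --base, then BASE_SHA, then DIFF_BUDGET_BASE.
type BaseChoice =
  | { mode: "base"; source: string; sha: string }
  | { mode: "working-tree-vs-HEAD" };

function chooseDiffBase(
  cliBase: string | undefined,
  env: Record<string, string | undefined>,
  resolveRef: (ref: string) => string | null,
): BaseChoice {
  const candidates: Array<[string, string | undefined]> = [
    ["--base", cliBase],
    ["BASE_SHA", env["BASE_SHA"]],
    ["DIFF_BUDGET_BASE", env["DIFF_BUDGET_BASE"]],
  ];
  for (const [source, ref] of candidates) {
    if (ref === undefined || ref === "") continue; // not provided
    const sha = resolveRef(ref);
    if (sha === null) {
      // Provided but unresolvable: fail rather than fall through to a lower source.
      throw new Error(`${source} "${ref}" could not be resolved`);
    }
    return { mode: "base", source, sha };
  }
  // Nothing provided: default local mode.
  return { mode: "working-tree-vs-HEAD" };
}
```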
257
+ ### Overrides (exceptional)
258
+ - Local: `DIFF_BUDGET_OVERRIDE_REASON="..." node scripts/diff-budget.mjs`
259
+ - CI: apply label `diff-budget-override` and add a PR body line `Diff budget override: <reason>` (label without a non-empty reason fails CI).
260
+
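A sketch of the CI override check, assuming the workflow hands over the PR's label names and body text; `ciOverrideReason` is hypothetical. It returns `null` when no override is requested and throws when the label is present without a non-empty reason.

```typescript
// Sketch: the `diff-budget-override` label only counts when the PR body has a
// non-empty "Diff budget override: <reason>" line. Input shapes are assumptions.
function ciOverrideReason(labels: string[], prBody: string): string | null {
  if (labels.indexOf("diff-budget-override") === -1) return null;
  const match = /^\s*Diff budget override:(.*)$/m.exec(prBody);
  const reason = match?.[1]?.trim() ?? "";
  if (reason === "") {
    throw new Error("diff-budget-override label requires a non-empty reason");
  }
  return reason;
}
```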
261
+ ## Review Handoff
262
+
263
+ Use an explicit handoff note for reviewers. `NOTES` is required for review runs; questions are optional:
264
+
265
+ `NOTES="<goal + summary + risks + optional questions>" npm run review` (repo-only; CI disables stdin; set `CODEX_REVIEW_NON_INTERACTIVE=1` to enforce locally).
266
+
267
+ Template: `Goal: ... | Summary: ... | Risks: ... | Questions (optional): ...`
268
+
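For scripted handoffs, the template can be assembled programmatically. A minimal sketch: the `HandoffNotes` shape and `formatReviewNotes` are hypothetical, and it assumes `(optional)` in the template marks the questions field as omittable rather than being literal label text. Fields are not escaped, so a `|` inside a field would blur the separators.

```typescript
// Sketch: build a NOTES string following the documented handoff template.
// HandoffNotes and formatReviewNotes are hypothetical names.
interface HandoffNotes {
  goal: string;
  summary: string;
  risks: string;
  questions?: string;
}

function formatReviewNotes(notes: HandoffNotes): string {
  const parts = [`Goal: ${notes.goal}`, `Summary: ${notes.summary}`, `Risks: ${notes.risks}`];
  // Questions are optional, so the segment is dropped when empty.
  if (notes.questions !== undefined && notes.questions.trim() !== "") {
    parts.push(`Questions: ${notes.questions}`);
  }
  return parts.join(" | ");
}
```

The result can be passed as `NOTES` in the child environment when spawning `npm run review`.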
269
+ To enable Chrome DevTools for review runs, set `CODEX_REVIEW_DEVTOOLS=1` (uses a codex config override; no repo scripts required).
270
+ Default to the standard `implementation-gate` for general reviews; enable DevTools only when the review needs Chrome DevTools capabilities (visual/layout checks, network/perf diagnostics). After fixing review feedback, rerun the same gate and include any follow-up questions in `NOTES`.
271
+ To run the full implementation gate with DevTools-enabled review, use `CODEX_REVIEW_DEVTOOLS=1 npx @kbediako/codex-orchestrator start implementation-gate --format json --no-interactive --task <task-id>`.
272
+
273
+ ## Frontend Testing
274
+ Frontend testing is a first-class pipeline with DevTools off by default. The shipped pipelines already set `CODEX_NON_INTERACTIVE=1`; add it explicitly for custom automation or when you want the `frontend-test` shortcut to suppress Codex prompts:
275
+ - `CODEX_NON_INTERACTIVE=1 npx @kbediako/codex-orchestrator start frontend-testing --format json --no-interactive --task <task-id>`
276
+ - `CODEX_NON_INTERACTIVE=1 CODEX_REVIEW_DEVTOOLS=1 npx @kbediako/codex-orchestrator start frontend-testing --format json --no-interactive --task <task-id>` (DevTools enabled)
277
+ - `CODEX_NON_INTERACTIVE=1 codex-orchestrator frontend-test` (shortcut; add `--devtools` to enable DevTools)
278
+
279
+ If you run the pipelines from this repo, run `npm run build` first so `dist/` stays current (the pipeline executes the compiled runner).
280
+
281
+ Note: the frontend-testing pipeline reads the shared `CODEX_REVIEW_DEVTOOLS` flag; prefer `--devtools` or `CODEX_REVIEW_DEVTOOLS=1` for explicit enablement.
282
+
283
+ Optional prompt overrides:
284
+ - `CODEX_FRONTEND_TEST_PROMPT` (inline prompt)
285
+ - `CODEX_FRONTEND_TEST_PROMPT_PATH` (path to a prompt file)
286
+
287
+ `--no-interactive` disables the HUD only; set `CODEX_NON_INTERACTIVE=1` when you need to suppress Codex prompts (e.g., shortcut runs or custom automation).
288
+
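A sketch of assembling a non-interactive frontend-testing invocation from code. The command, flags, and environment variables are the ones listed above; `frontendTestingInvocation` is hypothetical. As one way to keep DevTools off by default, it clears an inherited `CODEX_REVIEW_DEVTOOLS` unless DevTools is requested.

```typescript
// Sketch: command + environment for a non-interactive frontend-testing run.
// `frontendTestingInvocation` is hypothetical; flags and env vars are from the README.
interface FrontendRunOptions {
  taskId: string;
  devtools?: boolean;
}

interface Invocation {
  command: string;
  args: string[];
  env: Record<string, string | undefined>;
}

function frontendTestingInvocation(
  opts: FrontendRunOptions,
  baseEnv: Record<string, string | undefined> = {},
): Invocation {
  // --no-interactive only hides the HUD, so prompts are suppressed via env.
  const env: Record<string, string | undefined> = { ...baseEnv, CODEX_NON_INTERACTIVE: "1" };
  if (opts.devtools) {
    env["CODEX_REVIEW_DEVTOOLS"] = "1";
  } else {
    // Keep DevTools off unless explicitly requested (design choice of this sketch).
    delete env["CODEX_REVIEW_DEVTOOLS"];
  }
  return {
    command: "npx",
    args: [
      "@kbediako/codex-orchestrator", "start", "frontend-testing",
      "--format", "json", "--no-interactive", "--task", opts.taskId,
    ],
    env,
  };
}
```

The result maps onto Node's `child_process.spawn(command, args, { env, stdio: "inherit" })`.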
289
+ Check readiness with `codex-orchestrator doctor --format json` (reports DevTools skill + MCP config availability). Use `codex-orchestrator devtools setup` to print setup steps.
290
+
291
+ ## Linear Runtime Proof Handoff
292
+ - Use `codex-orchestrator linear runtime-proof --issue-id <issue-id> --origin <app-url> --format json` to inspect the permit posture for app-touching lanes before review handoff.
293
+ - When the permit allows a proof mode, rerun with `--kind <screenshot|external-link|video> --proof-url <reviewer-url>` plus optional `--title` / `--summary` to generate `handoff.workpad_markdown` and `handoff.pr_markdown`.
294
+ - The helper is intentionally fail-closed for reviewer handoff: unreadable permit files, unapproved origins, blocked proof kinds, and local-only artifact paths all return non-zero instead of pretending proof is review-ready.
295
+ - Screenshot and external-link proof are controlled independently through `compliance/permit.json` `runtime_proof.allow_screenshot` and `runtime_proof.allow_external_link`; video stays disabled unless `runtime_proof.allow_video` or legacy `allow_video_capture` explicitly enables it.
296
+ - Add `--reachability-mode dns-public` only when you want explicit worker-local DNS public-resolution evidence for the reviewer URL. The default deterministic path never depends on live DNS, and a dns-public pass is still only worker-local evidence, not a universal reviewer-reachability guarantee.
297
+
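The permit checks above can be sketched as follows. Key names come from this section; the sketch assumes the legacy `allow_video_capture` key sits in the same `runtime_proof` block and treats anything not explicitly `true` as denied, mirroring the helper's fail-closed stance. The type and function names are hypothetical.

```typescript
// Sketch of the permit rules for reviewer proof kinds; hypothetical names.
// Assumption: legacy `allow_video_capture` lives in the same runtime_proof block.
type ProofKind = "screenshot" | "external-link" | "video";

interface RuntimeProofFlags {
  allow_screenshot?: boolean;
  allow_external_link?: boolean;
  allow_video?: boolean;
  allow_video_capture?: boolean; // legacy key
}

interface Permit {
  runtime_proof?: RuntimeProofFlags;
}

function allowedProofKinds(permit: Permit): ProofKind[] {
  const flags: RuntimeProofFlags = permit.runtime_proof ?? {};
  const kinds: ProofKind[] = [];
  // Only an explicit `true` enables a kind (fail closed).
  if (flags.allow_screenshot === true) kinds.push("screenshot");
  if (flags.allow_external_link === true) kinds.push("external-link");
  if (flags.allow_video === true || flags.allow_video_capture === true) kinds.push("video");
  return kinds;
}

function assertProofAllowed(permit: Permit, kind: ProofKind): void {
  if (allowedProofKinds(permit).indexOf(kind) === -1) {
    throw new Error(`proof kind "${kind}" is not enabled by compliance/permit.json`);
  }
}
```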
298
+ ## Mirror Workflows
299
+ - `npm run mirror:fetch -- --project <name> [--dry-run] [--force]`: reads `packages/<project>/mirror.config.json` (origin, routes, asset roots, rewrite/block/allow lists), caches downloads **per project** under `.runs/<task>/mirror/<project>/cache`, strips tracker patterns, rewrites externals to `/external/<host>/...`, localizes OG/twitter preview images, rewrites share links off tracker-heavy hosts, and stages into `.runs/<task>/mirror/<project>/<timestamp>/staging/public` before promoting to `packages/<project>/public`. Non-origin assets fall back to Web Archive when the primary host is down; promotion is skipped if errors are detected unless `--force` is set. Manifests live at `.runs/<task>/mirror/<project>/<timestamp>/manifest.json` (warns when `MCP_RUNNER_TASK_ID` is unset; honors `compliance/permit.json` when present).
300
+ - `npm run mirror:serve -- --project <name> [--port <port>] [--csp <self|strict|off>] [--no-range]`: shared local-mirror server with traversal guard, HTML no-cache/asset immutability, optional CSP, optional Range support, and directory-listing blocks.
301
+ - `npm run mirror:check -- --project <name> [--port <port>]`: boots a temporary mirror server when needed and verifies all configured routes with Playwright, failing on outbound hosts outside the allowlist, tracker strings (gtag/gtm/analytics/hotjar/facebook/clarity/etc.), unresolved assets, absolute https:// references, or non-200 responses. Keep this opt-in and trigger it when `packages/<project>/public` changes.
302
+
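Two of the `mirror:fetch` / `mirror:check` rules, sketched in isolation: rewriting a non-origin URL to `/external/<host>/...` and scanning HTML for tracker strings. The functions are hypothetical, the pattern list only covers the examples named above, and keeping the query string in the rewritten path is an assumption.

```typescript
// Sketch of two mirror rules; hypothetical names, partial tracker list.
const TRACKER_PATTERNS = ["gtag", "gtm", "analytics", "hotjar", "facebook", "clarity"];

// Non-origin URLs map to /external/<host>/<path>; same-origin URLs keep their path.
function toExternalPath(rawUrl: string, origin: string): string {
  const url = new URL(rawUrl, origin);
  if (url.origin === new URL(origin).origin) {
    return url.pathname + url.search;
  }
  return `/external/${url.host}${url.pathname}${url.search}`;
}

// Case-insensitive substring scan, as a coarse stand-in for the real check.
function findTrackerHits(html: string): string[] {
  const lower = html.toLowerCase();
  return TRACKER_PATTERNS.filter((pattern) => lower.indexOf(pattern) !== -1);
}
```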
303
+ ## Hi-Fi Design Toolkit Captures
304
+ Use the hi-fi pipeline to snapshot complex marketing sites (motion, interactions, tokens) while keeping the repo cloneable:
305
+
306
+ 1. **Configure the source:** Update `design.config.yaml` → `pipelines.hi_fi_design_toolkit.sources` with the target URL, slug, title, and breakpoints (the repo defaults to an empty `sources` list until you add one).
307
+ 2. **Permit the domain:** Copy `compliance/permit.example.json` to `compliance/permit.json`, then add (or update) the matching record so Playwright, video capture, and live assets are explicitly approved for that origin.
308
+ 3. **Prep tooling:**
309
+ - `npm install && npm run build`
310
+ - `npm run setup:design-tools` (installs design-system deps); also make sure FFmpeg is available (`brew install ffmpeg` on macOS).
311
+ 4. **Run the pipeline:**
312
+ ```bash
313
+ export MCP_RUNNER_TASK_ID=<task-id>
314
+ npx @kbediako/codex-orchestrator start hi-fi-design-toolkit --format json --task <task-id>
315
+ ```
316
+ Manifests/logs/state land under `.runs/<task-id>/cli/<run-id>/`, while staged artifacts land under `.runs/<task-id>/<run-id>/artifacts/design-toolkit/` with human summaries mirrored to `out/<task-id>/`.
317
+ 5. **Validate the clone:** serve the staged reference directory, e.g.
318
+ ```bash
319
+ cd .runs/<task-id>/<run-id>/artifacts/design-toolkit/reference/<slug>
320
+ python3 -m http.server 4173
321
+ ```
322
+ The build now mirrors all `/assets/...` content and adds root shortcuts (`wp-content`, `wp-includes`, etc.) so even absolute WordPress paths work offline. A lightweight `codex-scroll-fallback` script only unlocks scrolling if the captured page never enables it.
323
+ 6. **Document learnings:** Drop run evidence into `docs/findings/<slug>.md` (see `docs/findings/slimdown-audit.md` for a current example) so reviewers know which manifest, artifacts, and diffs back each finding.
324
+
325
+ ## Extending the Orchestrator
326
+ - Add new agent strategies by implementing the planner/builder/tester/reviewer interfaces and wiring them into `TaskManager`.
327
+ - Register additional pipelines or override defaults through `codex.orchestrator.json`. Nested pipelines let you compose reusable command groups.
328
+ - Hook external systems by subscribing to `EventBus` events (plan/build/test/review/run) or by wiring optional integrations like `CloudSyncWorker`.
329
+ - Leverage the shared TypeScript definitions in `packages/shared` to keep manifest, metrics, and telemetry consumers aligned.
330
+
331
+ ---
332
+
333
+ When preparing a review (repo-only), always capture the latest manifest path, run `node scripts/delegation-guard.mjs` and `node scripts/spec-guard.mjs --dry-run`, and ensure checklist mirrors (`/tasks`, `docs/`, `.agent/`) point at the evidence generated by Codex Orchestrator. That keeps the automation trustworthy and auditable across projects.
@@ -0,0 +1,19 @@
1
+ # Codex Orchestrator Book
2
+
3
+ This folder keeps the long-form public and maintainer guidance out of the GitHub front door while preserving stable links for operators and reviewers.
4
+
5
+ ## Contents
6
+
7
+ - [Setup](setup.md): npm baseline, Codex marketplace/plugin install, rollback, downstream bootstrap, and provider onboarding links
8
+ - [Operations](operations.md): common commands, run artifacts, workflow modes, and review handoff expectations
9
+ - [Bundled Skills](skills.md): install behavior and pointer to the canonical roster in [skills/README.md](../../skills/README.md)
10
+ - [Public Posture](public-posture.md): current compatibility target, model/runtime posture, and evidence gates
11
+ - [Local Hook Impact](local-hook-impact.md): evidence for the local CO auto-continue hook and whether it affects subagents/provider agents
12
+ - [Codex CLI 0.124.0 Adoption Evidence](codex-cli-0124-adoption.md): historical CO-341/CO-345 evidence for the `0.124.0` step; see the canonical version policy for the current local ChatGPT-auth appserver/model posture, package/downstream-smoke `0.125.0` compatibility, and cloud-only `0.124.0` candidate split
13
+
14
+ ## Navigation Contract
15
+
16
+ - Keep the root [README.md](../../README.md) concise.
17
+ - Put detailed setup and posture guidance in this folder or in the focused public guides under [docs/public](../public/).
18
+ - Keep canonical version-policy decisions in [docs/guides/codex-version-policy.md](../guides/codex-version-policy.md) and summarize them here instead of duplicating the full policy.
19
+ - Keep task-specific evidence in the task packet; link to durable summaries when a future operator needs the decision context.
@@ -0,0 +1,68 @@
1
+ # CO-345 Evidence Book: Codex CLI 0.124.0 Adoption Evidence
2
+
3
+ Scope: CO-345 README/book evidence page. This page records the CO-341/CO-345 `codex-cli 0.124.0` adoption step and checks it against repo evidence and official OpenAI Codex docs. Current posture has since moved: release-facing package/downstream-smoke compatibility and local ChatGPT-auth/appserver posture now use `0.125.0`, while cloud execution remains separately pinned to `0.124.0`. This page does not change runtime defaults.
4
+
5
+ ## Bottom Line
6
+
7
+ CO adopted Codex CLI `0.124.0` as the repo compatibility target during CO-341/CO-345.
8
+
9
+ That adoption was intentionally narrow. It promoted `0.124.0` after CO-341 runtime, cloud, pack-smoke, and review evidence while keeping packaged/generated model defaults on portable `gpt-5.4` with `model_reasoning_effort = "xhigh"`. Local ChatGPT-auth `gpt-5.5` / `xhigh` remained a marker-backed local opt-in rather than the generic shipped default. Current local ChatGPT-auth appserver/model posture, package/downstream-smoke `0.125.0` compatibility, and the cloud-only `0.124.0` candidate split now live in `docs/guides/codex-version-policy.md`.
10
+
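The model posture recorded for this step can be expressed as a small decision function. A sketch only: `LocalPosture` and `effectiveModel` are hypothetical, it reflects the historical `0.124.0` step described here, and the current posture lives in `docs/guides/codex-version-policy.md`.

```typescript
// Sketch of the posture recorded for the 0.124.0 step (historical).
// LocalPosture and effectiveModel are hypothetical names.
interface LocalPosture {
  // Value of `[codex_orchestrator] local_model_opt_in` in the local config, if any.
  localModelOptIn?: string;
  // Whether a live access smoke for the local model has passed.
  liveAccessSmokePassed: boolean;
}

interface ModelChoice {
  model: "gpt-5.4" | "gpt-5.5";
  model_reasoning_effort: "xhigh";
}

function effectiveModel(posture: LocalPosture): ModelChoice {
  // Local gpt-5.5 needs both the marker and the live access smoke.
  if (posture.localModelOptIn === "gpt-5.5" && posture.liveAccessSmokePassed) {
    return { model: "gpt-5.5", model_reasoning_effort: "xhigh" };
  }
  // Otherwise keep the portable packaged/generated default.
  return { model: "gpt-5.4", model_reasoning_effort: "xhigh" };
}
```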
11
+ ## Evidence Boundary
12
+
13
+ Host-specific absolute paths and local account state stay in the CO-345 task packet, Linear workpad, and run artifacts. This shipped page records the portable adoption decision and the evidence classes without exposing operator-local paths.
14
+
15
+ ## Recorded Evidence Snapshot
16
+
17
+ Commands were run from the issue workspace or the active operator environment during CO-345/CO-341 evidence gathering.
18
+
19
+ | Evidence | Observation |
20
+ | --- | --- |
21
+ | `which codex` | The active executable was identified before posture checks. |
22
+ | `codex --version` | Active executable reports `codex-cli 0.124.0`. |
23
+ | `codex login status` | Local CLI auth state was checked before model/posture conclusions. |
24
+ | `codex debug models` | Live model catalog includes `gpt-5.4`, `gpt-5.5`, and `gpt-5.3-codex-spark`; `gpt-5.4` and `gpt-5.5` expose `low/medium/high/xhigh` reasoning levels. |
25
+ | `codex debug models --bundled` | Bundled catalog filtering found `gpt-5.4`; local `gpt-5.5` is not treated as a portable bundled default. |
26
+ | User-level Codex config | The inspected operator environment has an explicit local `gpt-5.5` / `xhigh` opt-in; this is not a packaged/generated default. |
27
+ | `codex features list` | Local feature list reports `multi_agent`, `plugins`, `apps`, `tool_search`, and `codex_hooks` as stable/enabled; `js_repl` and `memories` are experimental/enabled. |
28
+ | `codex exec --help` | Supports `[PROMPT]`, stdin appending, `resume`, `review`, `--output-schema`, `--json`, `--ignore-user-config`, and feature toggles. |
29
+ | `codex review --help` | Supports `[PROMPT]`, `--uncommitted`, `--base`, `--commit`, `--title`, and feature toggles. |
30
+
31
+ ## Official OpenAI Docs Context
32
+
33
+ Official Codex docs describe the CLI setup, ChatGPT/API-key auth, app-server APIs, model/config fields, feature flags, plugin marketplace operations, skills listing, and feature maturity levels. Those docs support treating the 0.124-era local surfaces as real capabilities, while still requiring repo-specific evidence before CO changes shipped defaults or provider-worker supervision.
34
+
35
+ Relevant docs:
36
+
37
+ - [Codex CLI setup](https://developers.openai.com/codex/cli#cli-setup)
38
+ - [Codex auth](https://developers.openai.com/codex/auth#sign-in-with-chatgpt)
39
+ - [Codex CLI reference: login](https://developers.openai.com/codex/cli/reference#codex-login)
40
+ - [Codex config reference](https://developers.openai.com/codex/config-reference#configtoml)
41
+ - [Codex app-server](https://developers.openai.com/codex/app-server)
42
+ - [Codex feature maturity](https://developers.openai.com/codex/feature-maturity)
43
+
44
+ ## Repo Adoption Matrix
45
+
46
+ | Surface | Current posture on `main` | Classification |
47
+ | --- | --- | --- |
48
+ | Compatibility target | This page records the previous `0.124.0` target evidence; current local ChatGPT-auth appserver/model posture, package/downstream-smoke `0.125.0` compatibility, and the cloud-only `0.124.0` candidate split live in `docs/guides/codex-version-policy.md`. | Historical evidence |
49
+ | Packaged/generated model defaults | `gpt-5.4` with `model_reasoning_effort = "xhigh"`. | Adopted, intentionally portable |
50
+ | Local `gpt-5.5` / `xhigh` | Allowed after live access smoke plus `[codex_orchestrator] local_model_opt_in = "gpt-5.5"`. | Adopted as local opt-in |
51
+ | Generic shipped `gpt-5.5` default | Not promoted because bundled/cloud/API portability remains unproven. | Held |
52
+ | Appserver runtime | Local appserver remains the default runtime path. | Adopted |
53
+ | `executionMode=cloud` + `runtimeMode=appserver` | Still fails fast as unsupported. | Held |
54
+ | Provider-worker supervision | Still uses `codex exec` / `codex exec resume` until a separate app-server control seam lands. | Held |
55
+ | `explorer_fast` | Remains `gpt-5.3-codex-spark` for file/codebase search only. | Adopted exception |
56
+ | Marketplace/plugin guidance | npm remains baseline; Codex `0.121.0` accepts `codex marketplace add` or `codex plugin marketplace add`, while `0.122.0+` uses `codex plugin marketplace add`. | Adopted |
57
+
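The marketplace row implies a version check before choosing a command form. A sketch, assuming `codex --version` prints `codex-cli X.Y.Z` as in the evidence table. Because `0.121.0` accepts both forms, the sketch uses `codex plugin marketplace add` for `0.121.0` and later, and returns `null` for older or unparseable versions, which the matrix does not cover. The function names are hypothetical.

```typescript
// Sketch: choose the marketplace-add command form from `codex --version` output.
// Per the matrix: 0.121.0 accepts both forms; 0.122.0+ uses `codex plugin marketplace add`.
// Versions below 0.121.0 are not covered, so they return null (assumption).
function parseCodexVersion(output: string): [number, number, number] | null {
  const m = /codex-cli\s+(\d+)\.(\d+)\.(\d+)/.exec(output);
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

function marketplaceAddCommand(versionOutput: string): string[] | null {
  const version = parseCodexVersion(versionOutput);
  if (version === null) return null;
  const [major, minor] = version;
  // The plugin form works on 0.121.0 and on every later version in the matrix;
  // treating any major > 0 the same way is an assumption.
  if (major > 0 || minor >= 121) return ["codex", "plugin", "marketplace", "add"];
  return null;
}
```

The caller appends the marketplace source argument; its form is not documented on this page.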
58
+ ## Follow-Up Assessment
59
+
60
+ CO-345 did not find a new unresolved `0.124.0` adoption blocker that belonged in a follow-up issue.
61
+
62
+ The meaningful holds are already intentional posture boundaries:
63
+
64
+ - Do not promote `gpt-5.5` as a generic shipped default from local ChatGPT-auth evidence alone.
65
+ - Do not move provider workers from `codex exec` / `codex exec resume` without a separate governed app-server control seam.
66
+ - Do not treat experimental or under-development feature flags as default CO behavior without task-scoped evidence.
67
+
68
+ Those holds were policy, not README-cleanup defects. Current posture and newer holds are recorded in `docs/guides/codex-version-policy.md`.
@@ -0,0 +1,73 @@
1
+ # CO-345 Evidence Book: Local Hook Impact
2
+
3
+ Date: 2026-04-24
4
+
5
+ Scope: docs-only child lane for CO-345. This page covers local Codex hook impact only. It does not change hook configuration, repo policy, README content, task packets, Linear state, or PR state.
6
+
7
+ ## Bottom Line
8
+
9
+ Local hooks are an ambient host-level input, not a repo-shipped CO behavior in this child lane.
10
+
11
+ The checked-out lane contains no repo-level Codex hooks config file and no repo-local Codex hook scripts. It does contain the tracked utility script `scripts/hooks/continue_co_orchestration.py`, but no repo config wires that script into Codex hooks in this lane. The inspected operator environment has user-level hook configuration under `${CODEX_HOME:-~/.codex}/hooks/`, and `codex features list` on the active `codex-cli 0.124.0` install reports `codex_hooks` as enabled.
12
+
13
+ Current conclusion: under the inspected state, the user-level `continue_co_orchestration.py` hook does not directly affect spawned subagents or Linear/provider agents. The hook emits a blocking auto-continue prompt only when all four conditions hold: hooks are enabled, the event `cwd` is inside the local CO checkout, no stop sentinel is present, and the event `session_id` matches the configured `root_session_id`. Because the inspected state has `root_session_id` set, other Codex sessions, subagent sessions, and provider-worker sessions with different ids fall through with `{"continue": true}`. If `root_session_id` is cleared later, the session check no longer narrows the hook, so it would apply to any hook-enabled Codex event inside the CO repo tree that meets the remaining conditions.
14
+
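A TypeScript re-sketch of the decision rules above, for readers comparing hook behavior across sessions. This is not the hook itself (the operative hook is a user-level Python script and is not shipped); the field names beyond `cwd`, `session_id`, and `root_session_id`, the POSIX-only path check, and the blocking result shape are assumptions.

```typescript
// Re-sketch of the documented decision rules; NOT the shipped or installed hook.
// Assumptions: field names beyond cwd/session_id/root_session_id, POSIX paths,
// and the placeholder shape of the blocking result.
interface HookEvent {
  cwd: string;
  session_id: string;
}

interface HookState {
  hooksEnabled: boolean;
  coCheckout: string; // root of the local CO checkout
  rootSessionId?: string; // configured root_session_id; empty/undefined = cleared
  stopSentinelPresent: boolean;
}

type HookDecision = { continue: true } | { block: true; prompt: string };

const PASS_THROUGH: HookDecision = { continue: true };

function isInside(child: string, root: string): boolean {
  const prefix = root.charAt(root.length - 1) === "/" ? root : root + "/";
  return child === root || child.slice(0, prefix.length) === prefix;
}

function decideHook(event: HookEvent, state: HookState): HookDecision {
  if (!state.hooksEnabled) return PASS_THROUGH;
  if (!isInside(event.cwd, state.coCheckout)) return PASS_THROUGH;
  if (state.stopSentinelPresent) return PASS_THROUGH;
  // With root_session_id set, only the root session is targeted. When it is
  // cleared, every hook-enabled session inside the repo tree gets this far.
  if (state.rootSessionId && event.session_id !== state.rootSessionId) return PASS_THROUGH;
  return { block: true, prompt: "<auto-continue orchestration prompt>" };
}

// The documented hook fails open on exceptions.
function decideHookSafely(event: HookEvent, state: HookState): HookDecision {
  try {
    return decideHook(event, state);
  } catch {
    return PASS_THROUGH;
  }
}
```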
15
+ ## Evidence Boundary
16
+
17
+ Host-specific absolute paths and local state values stay in the CO-345 task packet, Linear workpad, and run artifacts. This shipped page records the portable conclusion and the evidence classes without exposing operator-local paths.
18
+
19
+ ## Official Codex Hook Semantics
20
+
21
+ Official OpenAI Codex docs describe hooks as a lifecycle extensibility framework for running deterministic scripts inside the Codex loop. The docs identify the useful hook locations as a user-level `hooks.json` and a repo-level `.codex/hooks.json`; if more than one hook file exists, Codex loads all matching hooks, and a higher-precedence config layer does not replace lower-precedence hooks. The docs also note that matching hooks for the same event can run concurrently, and that hooks are behind the `features.codex_hooks` flag. Sources: [Hooks](https://developers.openai.com/codex/hooks), [Advanced configuration: Hooks](https://developers.openai.com/codex/config-advanced#hooks-experimental), [Config basics: Supported features](https://developers.openai.com/codex/config-basic#supported-features).
22
+
23
+ Important limits from the same docs:
24
+
25
+ | Hook area | Documented behavior | CO-345 impact |
26
+ | --- | --- | --- |
27
+ | Load path | Codex discovers hooks next to active config layers, including user-level and repo-level files. | A user-level hook can affect this lane even when the repo does not ship a hook file. |
28
+ | Multiple hooks | All matching hooks load; higher-precedence config does not replace lower-precedence hooks. | Adding a repo hook in a later issue would not automatically disable a user hook. |
29
+ | Command hooks | Multiple matching command hooks for one event launch concurrently. | Hook ordering should not be used as a correctness dependency. |
30
+ | `PreToolUse` | Current docs frame Bash interception as incomplete and a guardrail, not a complete enforcement boundary. | A hook can add safety signal but does not replace CO approval, sandbox, and review gates. |
31
+ | `PostToolUse` | It cannot undo side effects from a command that already ran. | It is evidence/continuation signal, not rollback. |
32
+ | Windows | Hooks are currently disabled on Windows in the docs. | Cross-platform claims need separate validation before repo-level hook adoption. |
33
+
34
+ ## Lane Evidence
35
+
36
+ Commands were run from the issue workspace only, unless noted.
37
+
38
+ | Evidence | Observation |
39
+ | --- | --- |
40
+ | `git status --short` | Clean before edits. |
41
+ | `find docs/book -maxdepth 2 -type f -print` | `docs/book/` did not exist before this child lane created the two scoped docs files. |
42
+ | `find . -maxdepth 4 -path '*hooks.json' -o -path '*/.codex/hooks/*' -o -path '*/hooks/*'` | No repo Codex hook config was found under `.codex`; `scripts/hooks/continue_co_orchestration.py` exists as a tracked utility/script surface and is not wired by repo config in this lane. |
43
+ | `find .codex -maxdepth 3 -type f -print` | `.codex/orchestrator.toml` exists and contains `[sandbox] network = true`; no repo hook config was present. |
44
+ | `codex features list` | Local `codex-cli 0.124.0` reports `codex_hooks` as `stable true`. |
45
+ | User-level Codex config | `codex_hooks` and `multi_agent` are enabled in the inspected operator environment. |
46
+ | User-level hook script | The installed user-level hook is the operative local hook; it differs from the tracked utility script and adds `root_session_id` scoping plus memory-citation handling. It checks repo containment, allows exact stop sentinels, and otherwise emits the auto-continue orchestration prompt. Exceptions fail open with `{"continue": true}`. |
47
+ | User-level hook state | Current state is enabled for the local CO checkout, and `root_session_id` is non-empty. |
48
+
49
+ ## Risk Posture
50
+
51
+ The local hook surface is a real source of run variance because user-level hooks can load outside the repo. That is useful for operator safety and local automation, but it is not portable evidence that CO itself ships or requires hooks.
52
+
53
+ The parent lane should classify hook-driven observations into three categories:
54
+
55
+ | Category | Treatment |
56
+ | --- | --- |
57
+ | Repo-local hook behavior | Requires committed or patch-visible repo-level Codex hook wiring. Not present in this child lane; the tracked `scripts/hooks/` utility is not active by itself. |
58
+ | User-local hook behavior | May affect local runs through user-level Codex hook config. In the inspected state it is scoped by a non-empty `root_session_id`, so different subagent/provider sessions fall through. |
59
+ | Official Codex hook capability | Cite OpenAI docs for expected semantics, but validate actual local behavior on the active CLI before depending on it. |
60
+
61
+ ## Recommended Parent Handling
62
+
63
+ - Preserve this page as evidence that this child lane found no repo-level Codex hook config, while separately noting the tracked `scripts/hooks/` utility script.
64
+ - Keep the local auto-continue hook out of shipped README/setup guidance. It is a local operator guard, not a downstream CO default.
65
+ - If a future issue wants broader local auto-continue behavior, require a separate governed lane because clearing `root_session_id` would broaden the hook to any hook-enabled Codex session inside the CO repo tree.
66
+ - If CO wants repo-governed hooks, open a separate docs-first implementation lane that owns repo-level hook configuration, hook scripts, cross-platform policy, and focused hook tests.
67
+ - For adoption canaries, compare a normal local run against a run with `--disable codex_hooks` when the goal is to isolate hook-driven behavior from Codex CLI behavior.
68
+
69
+ ## Sources
70
+
71
+ - OpenAI Codex Hooks: https://developers.openai.com/codex/hooks
72
+ - OpenAI Codex Advanced Configuration, Hooks: https://developers.openai.com/codex/config-advanced#hooks-experimental
73
+ - OpenAI Codex Config Basics, Supported features: https://developers.openai.com/codex/config-basic#supported-features
@@ -0,0 +1,60 @@
1
+ # Operations
2
+
3
+ ## Task-Scoped Runs
4
+
5
+ Use a task id for every governed run so manifests, metrics, and summaries are grouped consistently.
6
+
7
+ ```bash
8
+ export MCP_RUNNER_TASK_ID=<task-id>
9
+ codex-orchestrator start diagnostics --task <task-id> --format json
10
+ codex-orchestrator status --run <run-id> --watch --interval 10
11
+ ```
12
+
13
+ Run artifacts live under:
14
+
15
+ - `.runs/<task-id>/cli/<run-id>/manifest.json`
16
+ - `.runs/<task-id>/metrics.json`
17
+ - `out/<task-id>/state.json`
18
+
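A small sketch that derives those artifact paths for a task and run. `runArtifactPaths` is hypothetical, and rejecting ids that contain path separators is an added assumption, not documented behavior.

```typescript
// Sketch: derive the documented artifact locations for one task/run.
// Rejecting ids with separators or dot segments is an assumption.
interface RunArtifactPaths {
  manifest: string;
  metrics: string;
  state: string;
}

function runArtifactPaths(taskId: string, runId: string): RunArtifactPaths {
  for (const id of [taskId, runId]) {
    if (id === "" || id === "." || id === ".." || /[\\/]/.test(id)) {
      throw new Error(`invalid task or run id: "${id}"`);
    }
  }
  return {
    manifest: `.runs/${taskId}/cli/${runId}/manifest.json`,
    metrics: `.runs/${taskId}/metrics.json`,
    state: `out/${taskId}/state.json`,
  };
}
```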
19
+ ## Common Workflows
20
+
21
+ ```bash
22
+ codex-orchestrator flow --task <task-id>
23
+ codex-orchestrator review --task <task-id>
24
+ codex-orchestrator doctor --usage --window-days 30
25
+ codex-orchestrator co-status
26
+ codex-orchestrator control-host supervise status --format json
27
+ ```
28
+
29
+ `flow` runs the docs-review and implementation-gate sequence. `review` prepares reviewer handoff evidence and can execute Codex review when the environment is configured to force non-interactive review execution.
30
+
31
+ ## Execution Modes
32
+
33
+ - Default execution mode is `mcp`.
34
+ - Cloud mode is reserved for long-running, highly parallel, or locally constrained work after preflight confirms branch/ref, non-interactive setup, and required cloud secrets.
35
+ - `runtimeMode=cli|appserver` is independent of `executionMode=mcp|cloud`.
36
+ - Local appserver remains the expected default runtime path.
37
+ - `executionMode=cloud` with explicit `runtimeMode=appserver` is unsupported and should fail fast.
38
+
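The mode rules can be sketched as a validator. `resolveModes` is hypothetical. It assumes that a local run with no explicit runtime resolves to `appserver`, and it leaves the runtime unset for cloud runs because the default there is not documented on this page.

```typescript
// Sketch of the execution/runtime mode rules; hypothetical names.
type ExecutionMode = "mcp" | "cloud";
type RuntimeMode = "cli" | "appserver";

interface ModeRequest {
  executionMode?: ExecutionMode;
  runtimeMode?: RuntimeMode;
}

interface ResolvedModes {
  executionMode: ExecutionMode;
  // undefined: no runtime requested for a cloud run (default not documented here).
  runtimeMode: RuntimeMode | undefined;
}

function resolveModes(req: ModeRequest): ResolvedModes {
  const executionMode: ExecutionMode = req.executionMode ?? "mcp";
  // Fail fast on the unsupported combination.
  if (executionMode === "cloud" && req.runtimeMode === "appserver") {
    throw new Error("executionMode=cloud with runtimeMode=appserver is unsupported");
  }
  // The two axes are independent; local runs default to the appserver runtime.
  const runtimeMode = req.runtimeMode ?? (executionMode === "mcp" ? "appserver" : undefined);
  return { executionMode, runtimeMode };
}
```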
39
+ ## Validation Floor
40
+
41
+ For implementation work, use the repo-local gate list from `AGENTS.md`. For documentation-only README/book work, the targeted floor is:
42
+
43
+ ```bash
44
+ node scripts/spec-guard.mjs --dry-run
45
+ npm run docs:check
46
+ npm run docs:freshness
47
+ node scripts/diff-budget.mjs
48
+ ```
49
+
50
+ Add build, lint, tests, pack smoke, or runtime proof when the diff touches scripts, package surfaces, runtime behavior, or UI/app surfaces.
51
+
52
+ ## Review Handoff
53
+
54
+ Before handing an issue to `Human Review` or `In Review`, refresh the Linear workpad with:
55
+
56
+ - final implementation summary
57
+ - validation results
58
+ - standalone review or fallback review evidence
59
+ - explicit elegance/minimality pass result
60
+ - PR link and ready-review drain result when a PR exists