@kbediako/codex-orchestrator 0.2.0 → 0.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (41)

package/README.md +43 -83
package/dist/bin/codex-orchestrator.js +2 -0
package/dist/orchestrator/src/cli/adapters/CommandBuilder.js +50 -0
package/dist/orchestrator/src/cli/adapters/cloudFailureDiagnostics.js +117 -5
package/dist/orchestrator/src/cli/coStatusAttachCliShell.js +2 -2
package/dist/orchestrator/src/cli/coStatusCliShell.js +28 -6
package/dist/orchestrator/src/cli/codexCliShell.js +48 -1
package/dist/orchestrator/src/cli/codexDefaultsSetup.js +217 -26
package/dist/orchestrator/src/cli/control/controlHostSupervision.js +28 -6
package/dist/orchestrator/src/cli/control/controlRuntime.js +17 -6
package/dist/orchestrator/src/cli/control/controlStatusDashboard.js +6 -1
package/dist/orchestrator/src/cli/control/selectedRunProjection.js +49 -2
package/dist/orchestrator/src/cli/doctor.js +142 -48
package/dist/orchestrator/src/cli/init.js +94 -1
package/dist/orchestrator/src/cli/providerLinearChildLaneRunner.js +64 -1
package/dist/orchestrator/src/cli/providerLinearWorkerRunner.js +1165 -69
package/dist/orchestrator/src/cli/rlm/alignment.js +3 -3
package/dist/orchestrator/src/cli/services/commandRunner.js +31 -0
package/dist/orchestrator/src/cli/utils/cloudPreflight.js +202 -6
package/dist/orchestrator/src/cli/utils/codexFeatures.js +60 -0
package/dist/orchestrator/src/manager.js +74 -4
package/dist/scripts/lib/docs-catalog.js +35 -1
package/docs/README.md +333 -0
package/docs/book/README.md +19 -0
package/docs/book/codex-cli-0124-adoption.md +68 -0
package/docs/book/local-hook-impact.md +73 -0
package/docs/book/operations.md +60 -0
package/docs/book/public-posture.md +34 -0
package/docs/book/setup.md +91 -0
package/docs/book/skills.md +11 -0
package/docs/guides/codex-version-policy.md +104 -0
package/docs/public/downstream-setup.md +25 -18
package/package.json +4 -1
package/plugins/codex-orchestrator/.codex-plugin/plugin.json +1 -1
package/plugins/codex-orchestrator/launcher.mjs +6 -4
package/schemas/manifest.json +17 -0
package/skills/README.md +26 -0
package/skills/collab-subagents-first/SKILL.md +1 -1
package/skills/delegation-usage/DELEGATION_GUIDE.md +12 -7
package/skills/delegation-usage/SKILL.md +13 -8
package/templates/codex/AGENTS.md +12 -10

package/docs/README.md ADDED

@@ -0,0 +1,333 @@
+# Codex Orchestrator (Repository Guide)
+> Internal contributor guide. Public downstream docs live in `README.md` and `docs/public/`.
+Codex Orchestrator is the coordination layer that glues together Codex-driven agents, run pipelines, approval policies, and evidence capture for multi-stage automation projects. It wraps a reusable orchestration core with a CLI that produces auditable manifests, integrates with control-plane validators, and syncs run results to downstream systems.
+> **At a glance:** Every run starts from a task description, writes the active CLI manifest to `.runs/<task-id>/cli/<run-id>/manifest.json`, emits a persisted run summary at `.runs/<task-id>/<run-id>/manifest.json`, mirrors human-readable data to `out/<task-id>/`, and can optionally sync to a remote control plane. Pipelines define the concrete commands (build, lint, test, etc.) that execute for a given task.
+## Evaluation & Metrics
+- Evaluation playbook: `docs/guides/evaluation-playbook.md`.
+- Metrics reference: `docs/reference/metrics-collab-context-rot.md`.
+## Collab vs MCP
+- Decision guide: `docs/guides/collab-vs-mcp.md`.
+## Public docs
+- Public front door: `README.md`
+- Downstream setup: `docs/public/downstream-setup.md`
+- Provider onboarding: `docs/public/provider-onboarding.md`
+## Upstream Sync
+- Codex CLI sync strategy: `docs/guides/upstream-codex-cli-sync.md`.
+## Current Posture
+- Current CO-local ChatGPT-auth/appserver model posture: `gpt-5.5` / `xhigh` on Codex CLI `0.125.0` when live access smoke passes.
+- Release-facing cloud/downstream pins remain evidence-gated in `docs/guides/codex-version-policy.md`; the exact CO-352 cloud blocker is that the configured environment id is not found.
+- Current model posture is `gpt-5.5` / `xhigh` when available in ChatGPT-auth Codex sessions; keep `explorer_fast` on `gpt-5.3-codex-spark` for file/codebase search only.
+- Portable packaged/generated defaults still keep `gpt-5.4` / `xhigh` as fallback values when `gpt-5.5`, API/cloud portability, or downstream/no-network access is not proven.
+- `codex-orchestrator doctor` treats `gpt-5.5` as non-drift when `codex debug models` verifies current model access; additive defaults keep fresh configs on portable fallback values unless `--auth-scope chatgpt` is explicitly requested after live access smoke, and they preserve compatible prior `gpt-5.5` role files without requiring extra marker metadata.
+- Local default runtime is `appserver`; keep `--runtime-mode cli` as break-glass.
+- Full posture and promotion gates live in `docs/guides/codex-version-policy.md`.
+## Release Notes
+- Shipped skills note: `docs/release-notes-template-addendum.md`.
+- Canonical promoted sections: generated `Overview` and `Bug Fixes` become top-level release-note sections; generated `Documentation` remains under `Full Changelog`.
+- Optional one-shot overview override: put release-specific narrative text in the signed annotated tag body before pushing the tag. The workflow reads the tag body for that release only and does not read `.github/release-overview.md`.
+## How It Works
+- **Planner → Builder → Tester → Reviewer:** The core `TaskManager` (see `orchestrator/src/manager.ts`) wires together agent interfaces that decide *what* to run (planner), execute the selected pipeline stage (builder), verify results (tester), and give a final decision (reviewer).
+- **Execution modes:** Each plan item can flag canonical `requires_cloud`; planner output still carries legacy `requiresCloud` as a compatibility alias while current code should prefer `requires_cloud`. Task metadata can set `execution.parallel`, and the mode policy picks `mcp` (local MCP runtime) or `cloud` execution accordingly. Cloud runs perform a quick preflight (env id, codex availability, optional remote branch) and fall back to `mcp` with both summary text and a structured `cloud_fallback` manifest block when preflight fails.
+- **Runtime provider modes:** `runtimeMode=cli|appserver` is orthogonal to `executionMode`; local default runtime is `appserver` with `cli` break-glass support preserved. Explicit `executionMode=cloud + runtimeMode=appserver` remains unsupported and fails fast.
+- **Advanced feature posture:** `js_repl` is enabled by default globally (local + cloud lanes). For deterministic cloud contracts, pin explicit feature lanes (`CODEX_CLOUD_ENABLE_FEATURES=js_repl` and separate `CODEX_CLOUD_DISABLE_FEATURES=js_repl` runs). Use `CODEX_CLOUD_DISABLE_FEATURES=js_repl` for task-scoped cloud break-glass; reserve `codex features disable js_repl` for global emergency toggles and re-enable with `codex features enable js_repl`; `memories` remains scoped to explicit eval lanes (legacy alias `memory_tool` is compatibility-only).
+- **Event-driven persistence:** Milestones emit typed events on `EventBus`. `PersistenceCoordinator` captures run summaries in the task state store and writes manifests so nothing is lost if the process crashes.
+- **CLI lifecycle:** `CodexOrchestrator` (in `orchestrator/src/cli/orchestrator.ts`) resolves instruction sources (`AGENTS.md`, `docs/AGENTS.md`, `.agent/AGENTS.md`), loads the chosen pipeline, executes each command stage via `runCommandStage`, and keeps heartbeats plus command status current inside the manifest (approval evidence will surface once prompt wiring lands).
+- **Control-plane & scheduler integrations:** Optional validation (`control-plane/`) and scheduling (`scheduler/`) modules enrich manifests with drift checks, plan assignments, and remote run metadata.
+- **Cloud sync (optional):** `orchestrator/src/sync/` includes a `CloudSyncWorker` + `CloudRunsClient`, but the default CLI does not wire cloud uploads yet—treat this as an integration point you enable explicitly.
+- **Tool orchestration:** The shared `packages/orchestrator` toolkit handles approval prompts, sandbox retries, and tool run bookkeeping used by higher-level agents.
+```
+Task input ─► Planner ─► Mode policy (mcp/cloud) ─► Builder ─► Tester ─► Reviewer ─► Run summary
+                         │                    │            │            │              │
+                         │                    │            │            │              └─► Control-plane validators / Scheduler hooks / Cloud sync
+                         │                    │            │            │
+                         └─► EventBus ─► PersistenceCoordinator ─► .runs/ manifests ─► out/ audits
+                                                                   │
+                                                                   └─► Task state snapshots & guardrail evidence
+Group execution (when `FEATURE_TFGRPO_GROUP=on`): repeat the Builder → Tester → Reviewer stages for prioritized subtasks until a stage fails or the list completes.
+```
+- **Mode policy:** Defaults to `mcp` but upgrades to `cloud` whenever a subtask flags `requires_cloud` or task metadata enables parallel execution, ensuring builders/testers run in the correct environment before artifacts are produced.
+- **Event-driven persistence:** Every `run:completed` event flows through `PersistenceCoordinator`, writing manifests under `.runs/<task-id>/<run-id>/` and keeping task-state snapshots current before downstream consumers (control-plane validators, scheduler hooks, optional cloud sync) ingest the data.
+- **Optional group loop:** When the TF-GRPO feature flag is on, the manager processes the prioritized subtask list serially, stopping early if any Builder or Tester stage fails so reviewers only see runnable work with passing prerequisites.
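The mode policy described above can be sketched as a small pure function. This is an illustrative sketch, not the package's implementation: `selectExecutionMode`, `PlanItem`, and `TaskMetadata` are hypothetical names, and only the `requires_cloud` / `requiresCloud` flags and `execution.parallel` come from the guide. Preferring the canonical flag over the legacy alias when both are set is an assumption.

```typescript
type ExecutionMode = "mcp" | "cloud";

// Hypothetical shapes for illustration; only the field names come from the guide.
interface PlanItem {
  id: string;
  requires_cloud?: boolean; // canonical flag
  requiresCloud?: boolean; // legacy compatibility alias
}

interface TaskMetadata {
  execution?: { parallel?: boolean };
}

// Default to "mcp"; upgrade to "cloud" when the subtask asks for it
// or when task metadata enables parallel execution.
function selectExecutionMode(item: PlanItem, task: TaskMetadata): ExecutionMode {
  // Assumption: the canonical flag wins over the legacy alias when both are present.
  const wantsCloud = item.requires_cloud ?? item.requiresCloud ?? false;
  const parallel = task.execution?.parallel ?? false;
  return wantsCloud || parallel ? "cloud" : "mcp";
}

console.log(selectExecutionMode({ id: "build" }, {}));
console.log(selectExecutionMode({ id: "deploy", requiresCloud: true }, {}));
```

The sketch covers only mode selection; the cloud preflight and the fallback to `mcp` on preflight failure are separate steps that it does not model.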
+## Learning Pipeline (local snapshots + auto validation)
+- Enabled per run with `LEARNING_PIPELINE_ENABLED=1`; after a successful stage, the CLI captures the working tree (tracked + untracked, git-ignored files excluded) into `.runs/<task-id>/cli/<run-id>/learning/<run-id>.tar.gz` and copies it to `.runs/learning-snapshots/<task-id>/<run-id>.tar.gz` by default (recorded as `learning.snapshot.storage_path`).
+- Manifests record the tag, commit SHA, tarball digest/path, queue payload path, and validation status (`validated`, `snapshot_failed`, `stalled_snapshot`, `needs_manual_scenario`) under `learning.*` so reviewers can audit outcomes without external storage.
+- Scenario synthesis replays the most recent successful command from the run (or prompt/diff fallback), writes `learning/scenario.json`, and automatically executes the commands; validation logs live at `learning/scenario-validation.log` and are stored in `learning.validation.log_path`.
+- Override snapshot storage with `LEARNING_SNAPSHOT_DIR=/custom/dir` when needed; the default lives under `.runs/learning-snapshots/` (or `$CODEX_ORCHESTRATOR_RUNS_DIR/learning-snapshots/` when configured).
+- Successful pipeline runs also persist lightweight experience records in `out/<task-id>/experiences.jsonl` using prompt-pack domains, so future runs can inject higher-signal context without requiring learning snapshots.
+- Prompt-pack injections apply a minimum reward threshold (`TFGRPO_EXPERIENCE_MIN_REWARD`, default `0.1`) to avoid re-injecting low-signal records.
+- In cloud execution mode, the orchestrator now injects a bounded subset of relevant prompt-pack experience snippets directly into the cloud task prompt, so persisted experience data can influence execution outcomes immediately.
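The snapshot storage defaults and overrides listed above can be read as a simple precedence rule. The sketch below is an assumption-laden illustration: `snapshotCopyPath` and `SnapshotEnv` are hypothetical names, and it assumes the `<task-id>/<run-id>.tar.gz` layout is kept under an overridden directory and that empty environment values fall back to the defaults.

```typescript
// Environment values are passed in explicitly so the function stays pure.
interface SnapshotEnv {
  LEARNING_SNAPSHOT_DIR?: string;
  CODEX_ORCHESTRATOR_RUNS_DIR?: string;
}

// Resolve where the snapshot tarball copy would land.
// Precedence: LEARNING_SNAPSHOT_DIR, then <runs dir>/learning-snapshots,
// then the default .runs/learning-snapshots.
function snapshotCopyPath(taskId: string, runId: string, env: SnapshotEnv = {}): string {
  const runsDir = env.CODEX_ORCHESTRATOR_RUNS_DIR || ".runs";
  const baseDir = env.LEARNING_SNAPSHOT_DIR || `${runsDir}/learning-snapshots`;
  // Assumption: the per-task layout is the same under a custom directory.
  return `${baseDir}/${taskId}/${runId}.tar.gz`;
}

console.log(snapshotCopyPath("task-a", "run-1"));
console.log(snapshotCopyPath("task-a", "run-1", { LEARNING_SNAPSHOT_DIR: "/custom/dir" }));
```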
+### How to run the learning pipeline locally
+- Seed a normal run and keep manifests grouped by task:
+  ```bash
+  export MCP_RUNNER_TASK_ID=<task-id>
+  LEARNING_PIPELINE_ENABLED=1 npx @kbediako/codex-orchestrator start diagnostics --format json
+  ```
+- The learning section is written only when the run succeeds; rerun the command with `LEARNING_SNAPSHOT_DIR=<abs-path>` to redirect tarball copies.
+## Repository Layout
+- `orchestrator/` – Core orchestration runtime (`TaskManager`, event bus, persistence, CLI, control-plane hooks, scheduler, privacy guard).
+- `packages/` – Shared libraries used by downstream projects (tool orchestrator, shared manifest schema, SDK shims, control-plane schema bundle).
+- `patterns/`, `eslint-plugin-patterns/` – Codemod + lint infrastructure invoked during builds.
+- `scripts/` – Operational helpers for repo contributors (e.g., `scripts/spec-guard.mjs`), not shipped in the npm package.
+- `tasks/`, `docs/`, `.agent/` – Project planning artifacts that must stay in sync (`[ ]` → `[x]` checklists pointing to manifest evidence).
+- `.runs/<task-id>/` – Per-task manifests, logs, metrics snapshots (`metrics.json`), and CLI run folders.
+- `out/<task-id>/` – Human-friendly summaries and (when enabled) cloud-sync audit logs.
+## CLI Quick Start
+1. Install dependencies and build:
+   ```bash
+   npm install
+   npm run build
+   ```
+2. Set the task context so artifacts land in the right folder:
+   ```bash
+   export MCP_RUNNER_TASK_ID=<task-id>
+   ```
+3. Launch diagnostics (defaults to the configured pipeline):
+   ```bash
+   npx @kbediako/codex-orchestrator start diagnostics --format json
+   ```
+   > Tip: keep `FEATURE_TFGRPO_GROUP`, `TFGRPO_GROUP_SIZE`, and related TF-GRPO env vars **unset** when running diagnostics. Many tests assume grouped execution is off, and the TF-GRPO guardrails require `groupSize >= 2` and `groupSize <= fanOutCapacity`. Use the `tfgrpo-learning` pipeline instead when you need grouped TF-GRPO runs.
+   > HUD: add `--interactive` (or `--ui`) when stdout/stderr are TTYs, `TERM` is not `dumb`, and `CI` is off to view the read-only Ink HUD. Non-interactive or JSON runs skip the HUD automatically.
+4. Follow the run:
+   ```bash
+   npx @kbediako/codex-orchestrator status --run <run-id> --watch --interval 10
+   ```
+5. Attach the CLI manifest path (`.runs/<task-id>/cli/<run-id>/manifest.json`) when you complete checklist items; the TaskManager summary lives at `.runs/<task-id>/<run-id>/manifest.json`, metrics aggregate in `.runs/<task-id>/metrics.json`, and summaries land in `out/<task-id>/state.json`.
+Use `npx @kbediako/codex-orchestrator resume --run <run-id>` to continue interrupted runs; the CLI verifies resume tokens, refreshes the plan, and updates the manifest safely before rerunning.
+## Companion Package Commands
+- `codex-orchestrator mcp serve [--repo <path>] [--dry-run] [-- <extra args>]`: launch the MCP stdio server (delegates to `codex mcp-server`; stdout guard keeps protocol-only output, logs to stderr).
+- `codex-orchestrator init codex [--cwd <path>] [--force]`: copy starter templates into a repo (includes `mcp-client.json`, `AGENTS.md`, downstream `.codex/config.toml` + `.codex/agents/*` role files sourced from `templates/codex/.codex/*`, and `codex.orchestrator.json`; no overwrite unless `--force`).
+- `codex-orchestrator setup [--yes] [--refresh-skills]`: one-shot bootstrap for downstream users (installs bundled skills, configures delegation + DevTools wiring, and prints policy/usage guidance). By default, setup does not overwrite existing skills; add `--refresh-skills` when you want to replace existing bundled skill files.
+- Canonical bundled skill roster lives in `skills/README.md`, with shipped-file parity enforced against `skills/`.
+- `codex-orchestrator start [pipeline] [--auto-issue-log] [--repo-config-required]`: starts a pipeline run. `--auto-issue-log` writes failure bundles automatically (including setup failures before manifest creation); `--repo-config-required` disables packaged config fallback.
+- `codex-orchestrator flow [--task <task-id>] [--auto-issue-log] [--repo-config-required]`: runs `docs-review` then `implementation-gate` in sequence; stops on the first failure. `--auto-issue-log` writes failure bundles automatically (including setup failures before manifest creation); `--repo-config-required` disables packaged config fallback.
+- `codex-orchestrator doctor [--format json] [--usage] [--cloud-preflight] [--issue-log] [--apply]`: check optional tooling dependencies plus collab/cloud/delegation readiness and print enablement commands. `--usage` appends a local usage snapshot (scans `.runs/`) with adoption KPIs. `--issue-log` appends/creates `docs/codex-orchestrator-issues.md` (or `--issue-log-path`) and writes a JSON bundle under `out/<resolved-task>/doctor/issue-bundles/` with doctor context plus latest run context when available. `--apply` plans/applies quick fixes (use with `--yes`).
+- `codex-orchestrator devtools setup [--yes]`: print DevTools MCP setup instructions (`--yes` applies `codex mcp add ...`).
+- `codex-orchestrator delegation setup [--yes]`: configure delegation MCP wiring (`--yes` applies `codex mcp add ...`).
+- `codex-orchestrator skills install [--force] [--only <skills>] [--codex-home <path>]`: install bundled skills into `$CODEX_HOME/skills` (prefer global skills when installed; fall back to bundled skills, for example use `$CODEX_HOME/skills/docs-first` when present, otherwise `skills/docs-first/SKILL.md`).
+- `codex-orchestrator self-check --format json`: emit a safe JSON health payload for smoke tests.
+- `codex-orchestrator --version`: print the package version.
+## Publishing (npm)
+- Pack audit: `npm run pack:audit` (validates the tarball file list; run `npm run clean:dist && npm run build` first if `dist/` contains non-runtime artifacts).
+- Pack smoke: `npm run pack:smoke` (installs the tarball in a temp mock repo, runs CLI behavior checks including `review` artifacts and `long-poll-wait` skill install, and validates delegate-server JSONL; uses network). Treat this as a spot-check gate; use `npm run pack:audit` for full tarball inventory validation.
+- Release tags: `vX.Y.Z` or `vX.Y.Z-<prerelease>` must match `package.json` version, for example `vX.Y.Z-alpha.N`, `vX.Y.Z-beta.N`, or `vX.Y.Z-rc.N`.
+- Dist-tags: stable releases publish to `latest`; prereleases publish with a dist-tag derived from the leading prerelease label before the first `.` or `-`, lowercased and sanitized. Examples: `alpha.1` -> `alpha`, `beta.1` -> `beta`, `rc.1` -> `rc`; empty or numeric-leading labels fall back to `next`. Prerelease tags create a GitHub prerelease.
+- Publishing auth: workflow attempts OIDC trusted publishing first (`id-token: write` + `--provenance`), then falls back to `secrets.NPM_TOKEN` when OIDC is unavailable. `secrets.NPM_TOKEN` must be an npm automation token (not a token that requires OTP).
+- Trusted publisher config: npm expects workflow filename `release.yml` (the file must exist at `.github/workflows/release.yml` on the default branch). Leave environment blank unless the publish job sets `environment: ...`.
+- OIDC runtime prereqs: npm trusted publishing currently requires Node.js `22.14.0+` and npm `11.5.1+`; the publish job logs the runner versions, then runs the publish commands through `npx --yes npm@11.5.1` instead of mutating the runner-global npm install.
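The dist-tag rule in the list above is compact enough to express as a function. This is a minimal sketch under stated assumptions, not the release workflow's actual script: `distTagForVersion` is a hypothetical name, and "sanitized" is interpreted here as dropping any character outside `a-z0-9`, which the guide does not spell out.

```typescript
// Derive the npm dist-tag for a package.json version string.
function distTagForVersion(version: string): string {
  // Ignore build metadata (anything after "+") before looking for a prerelease.
  const core = version.split("+")[0] ?? version;
  const dash = core.indexOf("-");
  if (dash === -1) return "latest"; // stable release

  // Leading prerelease label: text before the first "." or "-".
  const prerelease = core.slice(dash + 1);
  const rawLabel = prerelease.split(/[.-]/)[0] ?? "";
  // Assumption: sanitizing means lowercasing and keeping only [a-z0-9].
  const label = rawLabel.toLowerCase().replace(/[^a-z0-9]/g, "");

  // Empty or numeric-leading labels fall back to "next".
  if (label === "" || /^[0-9]/.test(label)) return "next";
  return label;
}

for (const v of ["0.2.1", "0.3.0-alpha.1", "0.3.0-RC.2", "0.3.0-1"]) {
  console.log(v, "->", distTagForVersion(v));
}
```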
+## Parallel Runs (Meta-Orchestration)
+The orchestrator executes a single pipeline serially. “Parallelism” comes from running multiple orchestrator runs at the same time, ideally in separate git worktrees so builds/tests don’t contend for the same working tree outputs.
+**Recommended pattern (one worktree per workstream)**
+```bash
+git worktree add ../CO-stream-a HEAD
+git worktree add ../CO-stream-b HEAD
+# terminal A
+cd ../CO-stream-a
+export MCP_RUNNER_TASK_ID=<task-id>-a
+npx @kbediako/codex-orchestrator start diagnostics --format json
+# terminal B
+cd ../CO-stream-b
+export MCP_RUNNER_TASK_ID=<task-id>-b
+npx @kbediako/codex-orchestrator start diagnostics --format json
+```
+Notes:
+- Use `--task <id>` instead of exporting `MCP_RUNNER_TASK_ID` when scripting runs.
+- Release usage relies on the scoped package (`npx @kbediako/codex-orchestrator`); for local dev, use the repo CLI (`codex-orch` or `node ./bin/codex-orchestrator.ts`) so your changes are picked up. The unscoped `npx codex-orchestrator` is not published.
+- Use `--parent-run <run-id>` to group related runs in manifests (optional).
+- If worktrees aren’t possible, isolate artifacts with `CODEX_ORCHESTRATOR_RUNS_DIR` and `CODEX_ORCHESTRATOR_OUT_DIR`. Use `CODEX_ORCHESTRATOR_ROOT` to point the CLI at a repo root when invoking from outside the repo (optional; defaults to the current working directory). Avoid concurrent builds/tests in the same checkout.
+- For a deeper runbook, see `.agent/SOPs/meta-orchestration.md`.
+### Codex CLI prompts
+- Note: prompt installers and guardrail scripts live under `scripts/` and are repo-only (not included in the npm package).
+- The custom prompts live outside the repo at `~/.codex/prompts/diagnostics.md` and `~/.codex/prompts/review-handoff.md`. Recreate those files on every fresh machine so `/prompts:diagnostics` and `/prompts:review-handoff` are available in the Codex CLI palette.
+- Canonical diagnostics prompt + output expectations: `docs/diagnostics-prompt-guide.md` (keep in sync with `scripts/setup-codex-prompts.sh`).
+- Standalone review guidance (`codex-orchestrator review` default, `npm run review` repo alias, plus direct `codex review` quick mode): `docs/standalone-review-guide.md`.
+- These prompts are consumed by the Codex CLI UI only; the orchestrator does not read them. Keep updates synced across machines during onboarding.
+- To install or refresh the prompts (repo-only), run `scripts/setup-codex-prompts.sh` (use `--force` to overwrite existing files).
+- `/prompts:diagnostics` takes `TASK=<task-id> MANIFEST=<path> [NOTES=<free text>]`, exports `MCP_RUNNER_TASK_ID=$TASK`, runs `npx @kbediako/codex-orchestrator start diagnostics --format json`, tails `.runs/$TASK/cli/<run-id>/manifest.json` (or `npx @kbediako/codex-orchestrator status --run <run-id> --watch --interval 10`), and records evidence to `/tasks`, `docs/TASKS.md`, `.agent/task/...`, `.runs/$TASK/metrics.json`, and `out/$TASK/state.json` using `$MANIFEST`.
+- `/prompts:review-handoff` takes `TASK=<task-id> MANIFEST=<path> NOTES=<goal + summary + risks + optional questions>`, re-exports `MCP_RUNNER_TASK_ID`, and (repo-only) runs `node scripts/delegation-guard.mjs`, `node scripts/spec-guard.mjs --dry-run`, `npm run lint`, `npm run test`, optional `npm run eval:test`, plus `codex-orchestrator review` (which wraps `codex review` against the current diff and includes the latest run manifest path as evidence). It also reminds you to log approvals in `$MANIFEST` and mirror the evidence to the same docs/metrics/state targets.
+- In CI / `--no-interactive` pipelines (or when stdin is not a TTY, or `CODEX_REVIEW_NON_INTERACTIVE=1` / `CODEX_NON_INTERACTIVE=1` / `CODEX_NO_INTERACTIVE=1`), `codex-orchestrator review` prints the review handoff prompt (including evidence paths) and exits successfully instead of invoking `codex review`. Set `FORCE_CODEX_REVIEW=1` to run `codex review` in those environments.
+- `codex-orchestrator review` keeps delegation MCP enabled by default; disable for troubleshooting with `CODEX_REVIEW_DISABLE_DELEGATION_MCP=1` (or `--disable-delegation-mcp`). Legacy disable control (`CODEX_REVIEW_ENABLE_DELEGATION_MCP=0`) remains supported.
+- `codex-orchestrator review` allows unbounded runtime by default; set `CODEX_REVIEW_TIMEOUT_SECONDS`, `CODEX_REVIEW_STALL_TIMEOUT_SECONDS`, and/or `CODEX_REVIEW_STARTUP_LOOP_TIMEOUT_SECONDS` to opt into explicit guards (`0` disables each guard when set).
+- `CODEX_REVIEW_STARTUP_LOOP_MIN_EVENTS` defaults to `8` when startup-loop timeout detection is enabled.
+- `codex-orchestrator review` emits patience-first monitor checkpoints every 60 seconds by default; set `CODEX_REVIEW_MONITOR_INTERVAL_SECONDS=<seconds>` to tune cadence (`0` disables checkpoints).
+- `codex-orchestrator review` detects large uncommitted scopes and injects a high-signal scope advisory into the review prompt; tune detection via `CODEX_REVIEW_LARGE_SCOPE_FILE_THRESHOLD` (default `25`) and `CODEX_REVIEW_LARGE_SCOPE_LINE_THRESHOLD` (default `1200`).
+- Optional failure issue-bundle capture: set `CODEX_REVIEW_AUTO_ISSUE_LOG=1` (or pass `--auto-issue-log` to `codex-orchestrator review ...`).
+- Always trigger diagnostics and review workflows through these prompts whenever you run the orchestrator so contributors consistently execute the required command sequences and capture auditable manifests.
+### Identifier Guardrails
+- `MCP_RUNNER_TASK_ID` is no longer coerced or lowercased silently. The CLI calls the shared `sanitizeTaskId` helper and fails fast when the value contains control characters, traversal attempts, or Windows-reserved characters (`<`, `>`, `:`, `"`, `/`, `\`, `|`, `?`, `*`). Set the correct task ID in your environment *before* invoking the CLI.
+- Run IDs used for manifest or artifact storage must come from the CLI (or pass the shared `sanitizeRunId` helper). Strings with colons, control characters, or `../` are rejected to ensure every run directory lives under `.runs/<task-id>/cli/<run-id>` (and legacy `mcp` mirrors) without risking traversal.
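To make the fail-fast behavior concrete, here is a hedged sketch of a validator in the same spirit. It is not the shared `sanitizeTaskId` helper, whose exact rules are not shown in this guide; `assertSafeTaskId` is a hypothetical stand-in that rejects the character classes listed above, and rejecting empty values is an added assumption.

```typescript
// Windows-reserved characters named in the guide: < > : " / \ | ? *
const RESERVED_CHARS = /[<>:"/\\|?*]/;
// ASCII control characters (including DEL).
const CONTROL_CHARS = /[\u0000-\u001f\u007f]/;

// Return the id unchanged (no coercion or lowercasing) or throw.
function assertSafeTaskId(value: string): string {
  if (value.length === 0) {
    throw new Error("task id is empty"); // assumption: empty ids are invalid
  }
  if (CONTROL_CHARS.test(value)) {
    throw new Error("task id contains control characters");
  }
  if (RESERVED_CHARS.test(value)) {
    throw new Error("task id contains reserved characters");
  }
  // "../" is already caught by the "/" check; bare dot segments are traversal too.
  if (value === "." || value === "..") {
    throw new Error("task id is a traversal segment");
  }
  return value;
}

console.log(assertSafeTaskId("0101-My-Task"));
```

Returning the input untouched mirrors the point that the value is validated rather than silently rewritten, so a rejected ID has to be fixed in the environment before the CLI runs.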
+### Delegation Guardrails
+- `delegate.question.poll` clamps `wait_ms` to `MAX_QUESTION_POLL_WAIT_MS` (10s); each poll timeout is bounded by the remaining `wait_ms`.
+- Confirm-to-act fallback only triggers on confirmation-specific errors (`error.code`), not generic tool failures.
+- Tool profile entries used for MCP overrides are sanitized; only alphanumeric + `_`/`-` names are allowed (rejects `;`, `/`, `\n`, `=` and similar).
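Two of the guardrails above reduce to small checks. The sketch below is illustrative only: `clampWaitMs`, `pollTimeoutMs`, and `isSafeToolProfileName` are hypothetical names, the 10s value is taken from the guide, and treating negative or `NaN` waits as `0` is an assumption.

```typescript
const MAX_QUESTION_POLL_WAIT_MS = 10_000; // 10s cap named in the guide

// Clamp a requested wait to the maximum allowed by the poll tool.
function clampWaitMs(requestedMs: number): number {
  // Assumption: invalid or negative requests are treated as "do not wait".
  if (Number.isNaN(requestedMs) || requestedMs < 0) return 0;
  return Math.min(requestedMs, MAX_QUESTION_POLL_WAIT_MS);
}

// Bound a single poll's timeout by whatever wait budget remains.
function pollTimeoutMs(remainingWaitMs: number, perPollMs: number): number {
  return Math.max(0, Math.min(perPollMs, remainingWaitMs));
}

// Only alphanumeric characters plus "_" and "-" are allowed in profile names.
function isSafeToolProfileName(name: string): boolean {
  return /^[A-Za-z0-9_-]+$/.test(name);
}

console.log(clampWaitMs(30_000), pollTimeoutMs(1_500, 5_000));
console.log(isSafeToolProfileName("dev_tools-1"), isSafeToolProfileName("a;b"));
```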
+## Pipelines & Execution Plans
+- Default pipelines live in `codex.orchestrator.json` (repository-specific) and `orchestrator/src/cli/pipelines/` (built-in defaults). Each stage is either a command (shell execution) or a nested pipeline.
+- The `CommandPlanner` inspects the selected pipeline and target stage; you can pass `--target <stage-id>` (alias: `--target-stage`) or set `CODEX_ORCHESTRATOR_TARGET_STAGE` to focus on a specific step (e.g., rerun tests only).
+- Stage execution records stdout/stderr logs, exit codes, optional summaries, and failure data directly into the manifest (`commands[]` array).
+- Guardrails (repo-only): before review, run `node scripts/delegation-guard.mjs` and `node scripts/spec-guard.mjs --dry-run` to ensure delegation and spec freshness; the orchestrator tracks guardrail outcomes in the manifest (`guardrail_status`).
+## Approval & Sandbox Model
+- Approval policies (`never`, `on-request`, `auto`, or custom strings) flow through `packages/orchestrator`. Tool invocations can require approval before sandbox elevation, and all prompts/decisions are persisted.
+- Sandbox retries (for transient `mcp` or cloud failures) use exponential backoff with configurable classifiers, ensuring tools get multiple attempts without masking hard failures.
+## Control Plane, Scheduler, and Cloud Sync
+- `control-plane/` builds versioned requests (`buildRunRequestV2`) and validates manifests against remote expectations. Drift reports are appended to run summaries so reviewers see deviations.
+- `scheduler/` resolves assignments, serializes plan data, and embeds scheduler state in manifests, making it easy to coordinate multi-stage work across agents.
+- `sync/` contains the cloud upload client + worker, but is not wired into the default CLI yet. Configure credentials through the credential broker (`orchestrator/src/credentials/`) and wire `createCloudSyncWorker` to an `EventBus` if you need uploads.
+## Persistence & Observability
+- `TaskStateStore` writes per-task snapshots with bounded lock retries; failures degrade gracefully while still writing the main manifest.
+- `RunManifestWriter` generates the canonical manifest JSON for each run (mirrored under `.runs/`), while metrics appenders and summary writers keep `out/` up to date.
+- `run-summary.json` now carries `usageKpi` run-level signals (cloud/collab/delegation/rlm indicators) and `cloudFallback` details when a cloud request is downgraded to MCP.
+- `collab_tool_calls` in the manifest captures collab tool call JSONL lines extracted from command stdout (bounded by `CODEX_ORCHESTRATOR_COLLAB_MAX_EVENTS`, default 200; set 0 to disable capture). For `spawn_agent` calls, keep prompt-role intent explicit (first-line `[agent_type:<role>]`) and set `agent_type` when supported so routing remains auditable even when event payloads omit `agent_type`; keep `fork_context` disabled by default and enable it only for streams that require inherited thread history. When emitted upstream, `spawn_agent.fork_context` is persisted and summarized by `codex-orchestrator doctor --usage` counters (`true/false/unknown`) to support evidence-based policy decisions.
+- Heartbeat files and timestamps guard against stalled runs. `orchestrator/src/cli/metrics/metricsRecorder.ts` aggregates command durations, exit codes, and guardrail stats for later review.
+- Optional caps: `CODEX_ORCHESTRATOR_EXEC_EVENT_MAX_CHUNKS` limits captured exec chunk events per command (defaults to 500; set 0 for no cap), `CODEX_ORCHESTRATOR_TELEMETRY_MAX_EVENTS` caps in-memory telemetry events queued before flush (defaults to 1000; set 0 for no cap), and `CODEX_METRICS_PRIVACY_EVENTS_MAX` limits privacy decision events stored in `metrics.json` (-1 = no cap; `privacy_event_count` still reflects total).
+## Customizing for New Projects
+- Duplicate the templates under `/tasks`, `docs/`, and `.agent/` for your task ID and keep checklist status mirrored (`[ ]` → `[x]`) with links to the manifest that proves each outcome.
+- Update `docs/PRD-<slug>.md`, `tasks/specs/<id>-<slug>.md`, and `docs/ACTION_PLAN-<slug>.md` with project details and evidence paths (`.runs/<task-id>/...`).
+- Refresh `.agent/` SOPs with task-specific guardrails, escalation contacts, and artifact locations.
+- Remove placeholder references in manifests/docs before merging so downstream teams see only live project data.
+## Development Workflow
+Note: the commands below assume a source checkout; `scripts/` helpers are not included in the npm package.
+| Command | Purpose |
+| --- | --- |
+| `npm run build` | Compiles TypeScript to `dist/` (required for packaging and running the CLI from `dist/`). |
+| `npm run lint` | Lints orchestrator, adapters, shared packages. Auto-runs `node scripts/build-patterns-if-needed.mjs` so codemods compile when missing/outdated. |
+| `npm run test:core` | Narrow Core Lane matrix via `vitest.config.core.ts`; excludes `adapters/**` and `evaluation/tests/**`. |
+| `npm run test` | Default repo validation alias; runs `test:core` so the historical core-only surface stays explicit. |
+| `npm run test:all` | Explicit broader Vitest matrix (`test:core` + `test:adapters`) without implicitly enabling the opt-in evaluation lane. |
+| `npm run eval:test` | Optional evaluation-only harness lane; alias to `npm run test:evaluation` when `evaluation/fixtures/**` or evaluation scope is in play. |
+| `npm run docs:check` | Deterministically validates scripts/pipelines/paths referenced in agent-facing docs, current posture locks, bundled-skill roster parity, and the README front-door budget. |
+| `npm run docs:freshness` | Validates docs registry coverage plus catalog class coverage and writes a class-separated report to `out/<task-id>/docs-freshness.json`. |
+| `npm run repo:stewardship` | Audits every tracked file via `git ls-files`, classifies each tracked surface as `validate`, `update`, `delete`, or `retain_with_rationale`, and writes `out/<task-id>/repo-stewardship.json`. |
+| `npm run ci:cloud-canary` | Runs the cloud canary harness (`scripts/cloud-canary-ci.mjs`) to verify cloud lifecycle manifest + run-summary evidence; credential-gated by `CODEX_CLOUD_ENV_ID` and optional auth secrets (`CODEX_CLOUD_BRANCH` defaults to `main`). Feature flags can be passed through with `CODEX_CLOUD_ENABLE_FEATURES` / `CODEX_CLOUD_DISABLE_FEATURES` (comma- or space-delimited, e.g. `sqlite,memories`). |
+| `node scripts/delegation-guard.mjs` | Enforces subagent delegation evidence before review (repo-only). |
+| `node scripts/spec-guard.mjs --dry-run` | Validates spec freshness; required before review (repo-only). |
+| `node scripts/diff-budget.mjs` | Guards against oversized diffs before review (repo-only; defaults: 25 files / 1200 lines; supports explicit overrides). |
+| `npm run pack:smoke` | Downstream simulation gate for npm consumers (tarball install in temp mock repo, `review` wrapper artifacts, delegate-server JSONL, and `skills install --only long-poll-wait`). Spot-check gate; pair with `npm run pack:audit` when you need full tarball inventory coverage. Core lane runs it automatically when downstream-facing paths change, and `.github/workflows/pack-smoke-backstop.yml` runs a weekly `main` backstop. |
+| `codex-orchestrator review` | Runs the standalone review wrapper with task-scoped manifest evidence; delegation MCP is enabled by default (explicit disable available via `CODEX_REVIEW_DISABLE_DELEGATION_MCP=1` / `--disable-delegation-mcp`), runtime guards are opt-in via `CODEX_REVIEW_*` env vars, and patience-first checkpoints log by default (`CODEX_REVIEW_MONITOR_INTERVAL_SECONDS` tunes/disables). Large uncommitted scopes get an automatic prompt advisory (`CODEX_REVIEW_LARGE_SCOPE_FILE_THRESHOLD` / `CODEX_REVIEW_LARGE_SCOPE_LINE_THRESHOLD`). Optional auto failure issue logging via `CODEX_REVIEW_AUTO_ISSUE_LOG=1` or `--auto-issue-log`. |
+| `npm run review` | Runs `codex review` with task-scoped manifest evidence; delegation MCP is enabled by default (explicit disable available via `CODEX_REVIEW_DISABLE_DELEGATION_MCP=1` / `--disable-delegation-mcp`), runtime guards are opt-in via `CODEX_REVIEW_*` env vars, and patience-first checkpoints log by default (`CODEX_REVIEW_MONITOR_INTERVAL_SECONDS` tunes/disables). Large uncommitted scopes get an automatic prompt advisory (`CODEX_REVIEW_LARGE_SCOPE_FILE_THRESHOLD` / `CODEX_REVIEW_LARGE_SCOPE_LINE_THRESHOLD`). Optional auto failure issue logging via `CODEX_REVIEW_AUTO_ISSUE_LOG=1` or `--auto-issue-log`. |
+Run `npm run build` to compile TypeScript before packaging or invoking the CLI directly from `dist/`.
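
The `ci:cloud-canary` row above says the feature-flag variables accept comma- or space-delimited values. A minimal sketch of that parsing, assuming nothing beyond the stated delimiters (the helper name `parse_feature_list` is hypothetical and is not taken from the package):

```python
import re


def parse_feature_list(raw: str) -> list[str]:
    """Split a comma- or space-delimited feature string into feature names."""
    # Splitting on runs of commas/whitespace tolerates "a, b" and "a  b" alike;
    # empty fragments from leading or trailing delimiters are dropped.
    return [name for name in re.split(r"[,\s]+", raw.strip()) if name]


if __name__ == "__main__":
    print(parse_feature_list("sqlite,memories"))
```

Whether the real harness deduplicates or validates the names is not stated in the text, so this sketch does neither.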
+## Diff Budget
+This repo enforces a small “diff budget” via `node scripts/diff-budget.mjs` to keep PRs reviewable and avoid accidental scope creep (repo-only).
+- Defaults: 25 changed files / 1200 total lines changed (additions + deletions), excluding ignored paths.
+- CI: `.github/workflows/core-lane.yml` runs the diff budget on pull requests and sets `BASE_SHA` to the PR base commit, so PR/base scope remains hard-gated.
+- Local: run `node scripts/diff-budget.mjs` before `npm run review` (the review wrapper also runs it automatically). Without an explicit base, the hard local gate uses the current working tree relative to `HEAD`; when `origin/main` exists and the broader stacked aggregate is larger, the script prints that aggregate as advisory context.
+- If `--base`, `BASE_SHA`, or `DIFF_BUDGET_BASE` is provided but cannot be resolved, the script fails instead of downgrading to local auto mode or silently falling through to a lower-priority base source.
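
The base-resolution and budget rules above can be sketched as follows. This is an illustration only, not the contents of `scripts/diff-budget.mjs`: the helper names are hypothetical, the precedence order (`--base`, then `BASE_SHA`, then `DIFF_BUDGET_BASE`) is assumed from the order the text lists them, and treating the defaults as inclusive limits is also an assumption.

```python
from typing import Callable, Mapping, Optional

MAX_FILES = 25    # default changed-file budget from the text
MAX_LINES = 1200  # default total lines changed (additions + deletions)


def resolve_base(cli_base: Optional[str],
                 env: Mapping[str, str],
                 can_resolve: Callable[[str], bool]) -> Optional[str]:
    """Return the explicit base ref, or None to use local working-tree mode."""
    # Assumed precedence: --base, then BASE_SHA, then DIFF_BUDGET_BASE.
    for candidate in (cli_base, env.get("BASE_SHA"), env.get("DIFF_BUDGET_BASE")):
        if candidate:
            if not can_resolve(candidate):
                # Fail hard rather than fall through to a lower-priority source
                # or downgrade to local auto mode.
                raise ValueError(f"cannot resolve base ref: {candidate}")
            return candidate
    return None


def within_budget(files_changed: int, lines_changed: int) -> bool:
    # Assumption: a diff is over budget only when a count exceeds the default.
    return files_changed <= MAX_FILES and lines_changed <= MAX_LINES
```

Passing `can_resolve` as a callable keeps the sketch free of git calls; a real check would ask git whether the ref exists.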
+### Local usage
+- Current working tree hard gate relative to `HEAD` (default local mode): `node scripts/diff-budget.mjs`
+- Explicit PR/base scope: `node scripts/diff-budget.mjs --base <ref>`
+- Commit-scoped mode (ignores working tree state): `node scripts/diff-budget.mjs --commit <sha>`
+### Overrides (exceptional)
+- Local: `DIFF_BUDGET_OVERRIDE_REASON="..." node scripts/diff-budget.mjs`
+- CI: apply label `diff-budget-override` and add a PR body line `Diff budget override: <reason>` (label without a non-empty reason fails CI).
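
The CI override rule (label plus a non-empty reason line) might be checked along these lines. A sketch under stated assumptions: the function name is hypothetical, and exact-case matching of the `Diff budget override:` prefix at the start of a line is assumed rather than documented.

```python
import re
from typing import Iterable, Optional

OVERRIDE_LABEL = "diff-budget-override"
# [ \t]* (not \s*) keeps the match on one line, so an empty reason
# cannot borrow text from the following line.
REASON_LINE = re.compile(r"^Diff budget override:[ \t]*(\S.*)$", re.MULTILINE)


def ci_override_reason(labels: Iterable[str], pr_body: str) -> Optional[str]:
    """Return the override reason, None if no override was requested."""
    if OVERRIDE_LABEL not in set(labels):
        return None
    match = REASON_LINE.search(pr_body)
    if match is None:
        # Label without a non-empty reason is an error, mirroring the text.
        raise ValueError("override label present without a non-empty reason")
    return match.group(1).strip()
```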
+## Review Handoff
+Use an explicit handoff note for reviewers. `NOTES` is required for review runs; questions are optional:
+`NOTES="<goal + summary + risks + optional questions>" npm run review` (repo-only; CI disables stdin; set `CODEX_REVIEW_NON_INTERACTIVE=1` to enforce locally).
+Template: `Goal: ... | Summary: ... | Risks: ... | Questions (optional): ...`
+To enable Chrome DevTools for review runs, set `CODEX_REVIEW_DEVTOOLS=1` (uses a codex config override; no repo scripts required).
+Default to the standard `implementation-gate` for general reviews; enable DevTools only when the review needs Chrome DevTools capabilities (visual/layout checks, network/perf diagnostics). After fixing review feedback, rerun the same gate and include any follow-up questions in `NOTES`.
+To run the full implementation gate with DevTools-enabled review, use `CODEX_REVIEW_DEVTOOLS=1 npx @kbediako/codex-orchestrator start implementation-gate --format json --no-interactive --task <task-id>`.
+## Frontend Testing
+Frontend testing is a first-class pipeline with DevTools off by default. The shipped pipelines already set `CODEX_NON_INTERACTIVE=1`; add it explicitly for custom automation or when you want the `frontend-test` shortcut to suppress Codex prompts:
+- `CODEX_NON_INTERACTIVE=1 npx @kbediako/codex-orchestrator start frontend-testing --format json --no-interactive --task <task-id>`
+- `CODEX_NON_INTERACTIVE=1 CODEX_REVIEW_DEVTOOLS=1 npx @kbediako/codex-orchestrator start frontend-testing --format json --no-interactive --task <task-id>` (DevTools enabled)
+- `CODEX_NON_INTERACTIVE=1 codex-orchestrator frontend-test` (shortcut; add `--devtools` to enable DevTools)
+If you run the pipelines from this repo, run `npm run build` first so `dist/` stays current (the pipeline executes the compiled runner).
+Note: the frontend-testing pipeline reads the shared `CODEX_REVIEW_DEVTOOLS` flag; prefer `--devtools` or `CODEX_REVIEW_DEVTOOLS=1` for explicit enablement.
+Optional prompt overrides:
+- `CODEX_FRONTEND_TEST_PROMPT` (inline prompt)
+- `CODEX_FRONTEND_TEST_PROMPT_PATH` (path to a prompt file)
+`--no-interactive` disables the HUD only; set `CODEX_NON_INTERACTIVE=1` when you need to suppress Codex prompts (e.g., shortcut runs or custom automation).
+Check readiness with `codex-orchestrator doctor --format json` (reports DevTools skill + MCP config availability). Use `codex-orchestrator devtools setup` to print setup steps.
+## Linear Runtime Proof Handoff
+- Use `codex-orchestrator linear runtime-proof --issue-id <issue-id> --origin <app-url> --format json` to inspect the permit posture for app-touching lanes before review handoff.
+- When the permit allows a proof mode, rerun with `--kind <screenshot|external-link|video> --proof-url <reviewer-url>` plus optional `--title` / `--summary` to generate `handoff.workpad_markdown` and `handoff.pr_markdown`.
+- The helper is intentionally fail-closed for reviewer handoff: unreadable permit files, unapproved origins, blocked proof kinds, and local-only artifact paths all return non-zero instead of pretending proof is review-ready.
+- Screenshot and external-link proof are controlled independently through `compliance/permit.json` `runtime_proof.allow_screenshot` and `runtime_proof.allow_external_link`; video stays disabled unless `runtime_proof.allow_video` or legacy `allow_video_capture` explicitly enables it.
+- Add `--reachability-mode dns-public` only when you want explicit worker-local DNS public-resolution evidence for the reviewer URL. The default deterministic path never depends on live DNS, and a dns-public pass is still only worker-local evidence, not a universal reviewer-reachability guarantee.
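
The fail-closed permit rules above can be illustrated with a small check. This is a hedged sketch, not the helper's implementation: `proof_kind_allowed` is a hypothetical name, the permit is modeled as a single dict, and placing the legacy `allow_video_capture` flag at the top level of that dict is an assumption about a layout the text does not spell out.

```python
from typing import Any, Mapping


def proof_kind_allowed(permit: Mapping[str, Any], kind: str) -> bool:
    """Return True only when the permit explicitly enables the proof kind."""
    runtime_proof = permit.get("runtime_proof")
    if not isinstance(runtime_proof, dict):
        runtime_proof = {}  # missing or malformed section enables nothing
    if kind == "screenshot":
        return runtime_proof.get("allow_screenshot") is True
    if kind == "external-link":
        return runtime_proof.get("allow_external_link") is True
    if kind == "video":
        # Assumption: the legacy flag lives beside `runtime_proof`.
        return (runtime_proof.get("allow_video") is True
                or permit.get("allow_video_capture") is True)
    return False  # unknown kinds fail closed
```

Comparing with `is True` means truthy strings such as `"yes"` do not enable a proof kind, which matches the "explicitly enables" wording.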
+## Mirror Workflows
+- `npm run mirror:fetch -- --project <name> [--dry-run] [--force]`: reads `packages/<project>/mirror.config.json` (origin, routes, asset roots, rewrite/block/allow lists), caches downloads **per project** under `.runs/<task>/mirror/<project>/cache`, strips tracker patterns, rewrites externals to `/external/<host>/...`, localizes OG/twitter preview images, rewrites share links off tracker-heavy hosts, and stages into `.runs/<task>/mirror/<project>/<timestamp>/staging/public` before promoting to `packages/<project>/public`. Non-origin assets fall back to Web Archive when the primary host is down; promotion is skipped if errors are detected unless `--force` is set. Manifests live at `.runs/<task>/mirror/<project>/<timestamp>/manifest.json` (warns when `MCP_RUNNER_TASK_ID` is unset; honors `compliance/permit.json` when present).
+- `npm run mirror:serve -- --project <name> [--port <port>] [--csp <self|strict|off>] [--no-range]`: shared local-mirror server with traversal guard, HTML no-cache/asset immutability, optional CSP, optional Range support, and directory-listing blocks.
+- `npm run mirror:check -- --project <name> [--port <port>]`: boots a temporary mirror server when needed and verifies all configured routes with Playwright, failing on outbound hosts outside the allowlist, tracker strings (gtag/gtm/analytics/hotjar/facebook/clarity/etc.), unresolved assets, absolute https:// references, or non-200 responses. Keep this opt-in and trigger it when `packages/<project>/public` changes.
+## Hi-Fi Design Toolkit Captures
+Use the hi-fi pipeline to snapshot complex marketing sites (motion, interactions, tokens) while keeping the repo cloneable:
+1. **Configure the source:** Update `design.config.yaml` → `pipelines.hi_fi_design_toolkit.sources` with the target URL, slug, title, and breakpoints (the repo defaults to an empty `sources` list until you add one).
+2. **Permit the domain:** Copy `compliance/permit.example.json` to `compliance/permit.json`, then add (or update) the matching record so Playwright, video capture, and live assets are explicitly approved for that origin.
+3. **Prep tooling:**
+   - `npm install && npm run build`
+   - `npm run setup:design-tools` (installs design-system deps) and ensure FFmpeg is available (`brew install ffmpeg` on macOS).
+4. **Run the pipeline:**
+   ```bash
+   export MCP_RUNNER_TASK_ID=<task-id>
+   npx @kbediako/codex-orchestrator start hi-fi-design-toolkit --format json --task <task-id>
+   ```
+   Manifests/logs/state land under `.runs/<task-id>/cli/<run-id>/`, while staged artifacts land under `.runs/<task-id>/<run-id>/artifacts/design-toolkit/` with human summaries mirrored to `out/<task-id>/`.
+5. **Validate the clone:** serve the staged reference directory, e.g.
+   ```bash
+   cd .runs/<task-id>/<run-id>/artifacts/design-toolkit/reference/<slug>
+   python3 -m http.server 4173
+   ```
+   The build now mirrors all `/assets/...` content and adds root shortcuts (`wp-content`, `wp-includes`, etc.) so even absolute WordPress paths work offline. A lightweight `codex-scroll-fallback` script only unlocks scrolling if the captured page never enables it.
+6. **Document learnings:** Drop run evidence into `docs/findings/<slug>.md` (see `docs/findings/slimdown-audit.md` for a current example) so reviewers know which manifest, artifacts, and diffs back each finding.
+## Extending the Orchestrator
+- Add new agent strategies by implementing the planner/builder/tester/reviewer interfaces and wiring them into `TaskManager`.
+- Register additional pipelines or override defaults through `codex.orchestrator.json`. Nested pipelines let you compose reusable command groups.
+- Hook external systems by subscribing to `EventBus` events (plan/build/test/review/run) or by wiring optional integrations like `CloudSyncWorker`.
+- Leverage the shared TypeScript definitions in `packages/shared` to keep manifest, metrics, and telemetry consumers aligned.
+---
+When preparing a review (repo-only), always capture the latest manifest path, run `node scripts/delegation-guard.mjs` and `node scripts/spec-guard.mjs --dry-run`, and ensure checklist mirrors (`/tasks`, `docs/`, `.agent/`) point at the evidence generated by Codex Orchestrator. That keeps the automation trustworthy and auditable across projects.

package/docs/book/README.md ADDED

@@ -0,0 +1,19 @@
+# Codex Orchestrator Book
+This folder keeps the long-form public and maintainer guidance out of the GitHub front door while preserving stable links for operators and reviewers.
+## Contents
+- [Setup](setup.md): npm baseline, Codex marketplace/plugin install, rollback, downstream bootstrap, and provider onboarding links
+- [Operations](operations.md): common commands, run artifacts, workflow modes, and review handoff expectations
+- [Bundled Skills](skills.md): install behavior and pointer to the canonical roster in [skills/README.md](../../skills/README.md)
+- [Public Posture](public-posture.md): current compatibility target, model/runtime posture, and evidence gates
+- [Local Hook Impact](local-hook-impact.md): evidence for the local CO auto-continue hook and whether it affects subagents/provider agents
+- [Codex CLI 0.124.0 Adoption Evidence](codex-cli-0124-adoption.md): historical CO-341/CO-345 evidence for the `0.124.0` step; see the canonical version policy for the current local ChatGPT-auth appserver/model posture, package/downstream-smoke `0.125.0` compatibility, and cloud-only `0.124.0` candidate split
+## Navigation Contract
+- Keep the root [README.md](../../README.md) concise.
+- Put detailed setup and posture guidance in this folder or in the focused public guides under [docs/public](../public/).
+- Keep canonical version-policy decisions in [docs/guides/codex-version-policy.md](../guides/codex-version-policy.md) and summarize them here instead of duplicating the full policy.
+- Keep task-specific evidence in the task packet; link to durable summaries when a future operator needs the decision context.

package/docs/book/codex-cli-0124-adoption.md ADDED

@@ -0,0 +1,68 @@
+# CO-345 Evidence Book: Codex CLI 0.124.0 Adoption Evidence
+Scope: CO-345 README/book evidence page. This page preserves the CO-341/CO-345 `codex-cli 0.124.0` adoption step against repo evidence and official OpenAI Codex docs. Current posture has since moved: release-facing package/downstream-smoke compatibility and local ChatGPT-auth/appserver posture now use `0.125.0`, while cloud execution remains separately pinned to `0.124.0`. This page does not change runtime defaults.
+## Bottom Line
+CO adopted Codex CLI `0.124.0` as the repo compatibility target during CO-341/CO-345.
+That adoption was intentionally narrow. It promoted `0.124.0` after CO-341 runtime, cloud, pack-smoke, and review evidence while keeping packaged/generated model defaults on portable `gpt-5.4` with `model_reasoning_effort = "xhigh"`. Local ChatGPT-auth `gpt-5.5` / `xhigh` remained a marker-backed local opt-in rather than the generic shipped default. Current local ChatGPT-auth appserver/model posture, package/downstream-smoke `0.125.0` compatibility, and the cloud-only `0.124.0` candidate split now live in `docs/guides/codex-version-policy.md`.
+## Evidence Boundary
+Host-specific absolute paths and local account state stay in the CO-345 task packet, Linear workpad, and run artifacts. This shipped page records the portable adoption decision and the evidence classes without exposing operator-local paths.
+## Recorded Evidence Snapshot
+Commands were run from the issue workspace or the active operator environment during CO-345/CO-341 evidence gathering.
+| Evidence | Observation |
+| --- | --- |
+| `which codex` | The active executable was identified before posture checks. |
+| `codex --version` | Active executable reports `codex-cli 0.124.0`. |
+| `codex login status` | Local CLI auth state was checked before model/posture conclusions. |
+| `codex debug models` | Live model catalog includes `gpt-5.4`, `gpt-5.5`, and `gpt-5.3-codex-spark`; `gpt-5.4` and `gpt-5.5` expose `low/medium/high/xhigh` reasoning levels. |
+| `codex debug models --bundled` | Bundled catalog filtering found `gpt-5.4`; local `gpt-5.5` is not treated as a portable bundled default. |
+| User-level Codex config | The inspected operator environment has an explicit local `gpt-5.5` / `xhigh` opt-in; this is not a packaged/generated default. |
+| `codex features list` | Local feature list reports `multi_agent`, `plugins`, `apps`, `tool_search`, and `codex_hooks` as stable/enabled; `js_repl` and `memories` are experimental/enabled. |
+| `codex exec --help` | Supports `[PROMPT]`, stdin appending, `resume`, `review`, `--output-schema`, `--json`, `--ignore-user-config`, and feature toggles. |
+| `codex review --help` | Supports `[PROMPT]`, `--uncommitted`, `--base`, `--commit`, `--title`, and feature toggles. |
+## Official OpenAI Docs Context
+Official Codex docs describe the CLI setup, ChatGPT/API-key auth, app-server APIs, model/config fields, feature flags, plugin marketplace operations, skills listing, and feature maturity levels. Those docs support treating the 0.124-era local surfaces as real capabilities, while still requiring repo-specific evidence before CO changes shipped defaults or provider-worker supervision.
+Relevant docs:
+- [Codex CLI setup](https://developers.openai.com/codex/cli#cli-setup)
+- [Codex auth](https://developers.openai.com/codex/auth#sign-in-with-chatgpt)
+- [Codex CLI reference: login](https://developers.openai.com/codex/cli/reference#codex-login)
+- [Codex config reference](https://developers.openai.com/codex/config-reference#configtoml)
+- [Codex app-server](https://developers.openai.com/codex/app-server)
+- [Codex feature maturity](https://developers.openai.com/codex/feature-maturity)
+## Repo Adoption Matrix
+| Surface | Current posture on `main` | Classification |
+| --- | --- | --- |
+| Compatibility target | This page records the previous `0.124.0` target evidence; current local ChatGPT-auth appserver/model posture, package/downstream-smoke `0.125.0` compatibility, and the cloud-only `0.124.0` candidate split live in `docs/guides/codex-version-policy.md`. | Historical evidence |
+| Packaged/generated model defaults | `gpt-5.4` with `model_reasoning_effort = "xhigh"`. | Adopted, intentionally portable |
+| Local `gpt-5.5` / `xhigh` | Allowed after live access smoke plus `[codex_orchestrator] local_model_opt_in = "gpt-5.5"`. | Adopted as local opt-in |
+| Generic shipped `gpt-5.5` default | Not promoted because bundled/cloud/API portability remains unproven. | Held |
+| Appserver runtime | Local appserver remains the default runtime path. | Adopted |
+| `executionMode=cloud` + `runtimeMode=appserver` | Still fails fast as unsupported. | Held |
+| Provider-worker supervision | Still uses `codex exec` / `codex exec resume` until a separate app-server control seam lands. | Held |
+| `explorer_fast` | Remains `gpt-5.3-codex-spark` for file/codebase search only. | Adopted exception |
+| Marketplace/plugin guidance | npm remains baseline; Codex `0.121.0` accepts `codex marketplace add` or `codex plugin marketplace add`, while `0.122.0+` uses `codex plugin marketplace add`. | Adopted |
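
The marketplace row above distinguishes two Codex CLI version ranges. A minimal sketch of that selection, assuming plain `X.Y.Z` version strings (the function name is hypothetical, and versions the table does not mention are rejected rather than guessed):

```python
def marketplace_add_commands(version: str) -> list[str]:
    """Return the marketplace-add command forms the table lists for a version."""
    parts = tuple(int(piece) for piece in version.split("."))
    if parts >= (0, 122, 0):
        return ["codex plugin marketplace add"]
    if parts == (0, 121, 0):
        # 0.121.0 is documented as accepting either spelling.
        return ["codex marketplace add", "codex plugin marketplace add"]
    raise ValueError(f"version {version} is not covered by the table")
```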
+## Follow-Up Assessment
+CO-345 did not find a new unresolved `0.124.0` adoption blocker that belonged in a follow-up issue.
+The meaningful holds are already intentional posture boundaries:
+- Do not promote `gpt-5.5` as a generic shipped default from local ChatGPT-auth evidence alone.
+- Do not move provider workers from `codex exec` / `codex exec resume` without a separate governed app-server control seam.
+- Do not treat experimental or under-development feature flags as default CO behavior without task-scoped evidence.
+Those holds were policy, not README-cleanup defects. Current posture and newer holds are recorded in `docs/guides/codex-version-policy.md`.

package/docs/book/local-hook-impact.md ADDED

@@ -0,0 +1,73 @@
+# CO-345 Evidence Book: Local Hook Impact
+Date: 2026-04-24
+Scope: docs-only child lane for CO-345. This page covers local Codex hook impact only. It does not change hook configuration, repo policy, README content, task packets, Linear state, or PR state.
+## Bottom Line
+Local hooks are an ambient host-level input, not a repo-shipped CO behavior in this child lane.
+The checked-out lane contains no repo-level Codex hooks config file and no repo-local Codex hook scripts. It does contain the tracked utility script `scripts/hooks/continue_co_orchestration.py`, but no repo config wires that script into Codex hooks in this lane. The inspected operator environment has user-level hook configuration under `${CODEX_HOME:-~/.codex}/hooks/`, and `codex features list` on the active `codex-cli 0.124.0` install reports `codex_hooks` as enabled.
+Current conclusion: under the inspected state, the user-level `continue_co_orchestration.py` hook does not directly affect spawned subagents or Linear/provider agents. The hook emits a blocking auto-continue prompt only when all four conditions hold: hooks are enabled, the event `cwd` is inside the local CO checkout, no stop sentinel is present, and the event `session_id` matches the configured `root_session_id`. Because the inspected state has `root_session_id` set, other Codex sessions, subagent sessions, and provider-worker sessions with different ids fall through with `{"continue": true}`. If `root_session_id` is cleared later, the same hook would apply to any hook-enabled Codex event inside the CO repo tree.
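
The gating conditions in this conclusion can be sketched as a single decision function. This is not the installed hook: the state keys (`enabled`, `repo_root`), the `stop_requested` argument, and the shape of the blocking payload are all hypothetical; only the `{"continue": true}` fall-through, the four conditions, and the fail-open behavior come from this page.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, Mapping


def fall_through() -> Dict[str, Any]:
    return {"continue": True}


def decide(event: Mapping[str, Any], state: Mapping[str, Any],
           stop_requested: bool) -> Dict[str, Any]:
    """Return a blocking payload only when every gate passes."""
    try:
        if not state.get("enabled"):
            return fall_through()
        cwd = PurePosixPath(event["cwd"])
        repo_root = PurePosixPath(state["repo_root"])
        if cwd != repo_root and repo_root not in cwd.parents:
            return fall_through()  # event is outside the CO checkout
        if stop_requested:
            return fall_through()  # stop sentinel present
        root_id = state.get("root_session_id")
        # An empty root id skips this gate, which is why clearing it
        # broadens the hook to every session inside the repo tree.
        if root_id and event.get("session_id") != root_id:
            return fall_through()
        # Hypothetical blocking payload; the real shape is not shown here.
        return {"decision": "block", "reason": "auto-continue orchestration"}
    except Exception:
        return fall_through()  # exceptions fail open
```

With `root_session_id` set to `root-1`, an event from session `sub-2` falls through; with it cleared, the same event would be blocked.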
+## Evidence Boundary
+Host-specific absolute paths and local state values stay in the CO-345 task packet, Linear workpad, and run artifacts. This shipped page records the portable conclusion and the evidence classes without exposing operator-local paths.
+## Official Codex Hook Semantics
+Official OpenAI Codex docs describe hooks as a lifecycle extensibility framework for running deterministic scripts inside the Codex loop. The docs identify the useful hook locations as user-level `hooks.json` and repo-level `.codex/hooks.json`; if more than one hook file exists, Codex loads all matching hooks, and a higher-precedence config layer does not replace lower-precedence hooks. The docs also note that matching hooks for the same event can run concurrently, and that hooks are behind the `features.codex_hooks` flag. Sources: [Hooks](https://developers.openai.com/codex/hooks), [Advanced configuration: Hooks](https://developers.openai.com/codex/config-advanced#hooks-experimental), [Config basics: Supported features](https://developers.openai.com/codex/config-basic#supported-features).
+Important limits from the same docs:
+| Hook area | Documented behavior | CO-345 impact |
+| --- | --- | --- |
+| Load path | Codex discovers hooks next to active config layers, including user-level and repo-level files. | A user-level hook can affect this lane even when the repo does not ship a hook file. |
+| Multiple hooks | All matching hooks load; higher-precedence config does not replace lower-precedence hooks. | Adding a repo hook in a later issue would not automatically disable a user hook. |
+| Command hooks | Multiple matching command hooks for one event launch concurrently. | Hook ordering should not be used as a correctness dependency. |
+| `PreToolUse` | Current docs frame Bash interception as incomplete and a guardrail, not a complete enforcement boundary. | A hook can add safety signal but does not replace CO approval, sandbox, and review gates. |
+| `PostToolUse` | It cannot undo side effects from a command that already ran. | It is evidence/continuation signal, not rollback. |
+| Windows | Hooks are currently disabled on Windows in the docs. | Cross-platform claims need separate validation before repo-level hook adoption. |
+## Lane Evidence
+Commands were run from the issue workspace only, unless noted.
+| Evidence | Observation |
+| --- | --- |
+| `git status --short` | Clean before edits. |
+| `find docs/book -maxdepth 2 -type f -print` | `docs/book/` did not exist before this child lane created the two scoped docs files. |
+| `find . -maxdepth 4 -path '*hooks.json' -o -path '*/.codex/hooks/*' -o -path '*/hooks/*'` | No repo Codex hook config was found under `.codex`; `scripts/hooks/continue_co_orchestration.py` exists as a tracked utility/script surface and is not wired by repo config in this lane. |
+| `find .codex -maxdepth 3 -type f -print` | `.codex/orchestrator.toml` exists and contains `[sandbox] network = true`; no repo hook config was present. |
+| `codex features list` | Local `codex-cli 0.124.0` reports `codex_hooks` as `stable true`. |
+| User-level Codex config | `codex_hooks` and `multi_agent` are enabled in the inspected operator environment. |
+| User-level hook script | The installed user-level hook is the operative local hook; it differs from the tracked utility script and adds `root_session_id` scoping plus memory-citation handling. It checks repo containment, allows exact stop sentinels, and otherwise emits the auto-continue orchestration prompt. Exceptions fail open with `{"continue": true}`. |
+| User-level hook state | Current state is enabled for the local CO checkout, and `root_session_id` is non-empty. |
+## Risk Posture
+The local hook surface is a real source of run variance because user-level hooks can load outside the repo. That is useful for operator safety and local automation, but it is not portable evidence that CO itself ships or requires hooks.
+The parent lane should classify hook-driven observations into three categories:
+| Category | Treatment |
+| --- | --- |
+| Repo-local hook behavior | Requires committed or patch-visible repo-level Codex hook wiring. Not present in this child lane; the tracked `scripts/hooks/` utility is not active by itself. |
+| User-local hook behavior | May affect local runs through user-level Codex hook config. In the inspected state it is scoped by a non-empty `root_session_id`, so different subagent/provider sessions fall through. |
+| Official Codex hook capability | Cite OpenAI docs for expected semantics, but validate actual local behavior on the active CLI before depending on it. |
+## Recommended Parent Handling
+- Preserve this page as evidence that this child lane found no repo-level Codex hook config, while separately noting the tracked `scripts/hooks/` utility script.
+- Keep the local auto-continue hook out of shipped README/setup guidance. It is a local operator guard, not a downstream CO default.
+- If a future issue wants broader local auto-continue behavior, require a separate governed lane because clearing `root_session_id` would broaden the hook to any hook-enabled Codex session inside the CO repo tree.
+- If CO wants repo-governed hooks, open a separate docs-first implementation lane that owns repo-level hook configuration, hook scripts, cross-platform policy, and focused hook tests.
+- For adoption canaries, compare a normal local run against a run with `--disable codex_hooks` when the goal is to isolate hook-driven behavior from Codex CLI behavior.
+## Sources
+- OpenAI Codex Hooks: https://developers.openai.com/codex/hooks
+- OpenAI Codex Advanced Configuration, Hooks: https://developers.openai.com/codex/config-advanced#hooks-experimental
+- OpenAI Codex Config Basics, Supported features: https://developers.openai.com/codex/config-basic#supported-features

package/docs/book/operations.md ADDED

@@ -0,0 +1,60 @@
+# Operations
+## Task-Scoped Runs
+Use a task id for every governed run so manifests, metrics, and summaries are grouped consistently.
+```bash
+export MCP_RUNNER_TASK_ID=<task-id>
+codex-orchestrator start diagnostics --task <task-id> --format json
+codex-orchestrator status --run <run-id> --watch --interval 10
+```
+Run artifacts live under:
+- `.runs/<task-id>/cli/<run-id>/manifest.json`
+- `.runs/<task-id>/metrics.json`
+- `out/<task-id>/state.json`
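
For scripts that need to locate these artifacts, the layout above can be expressed as a small path builder. A sketch only: `run_artifact_paths` is a hypothetical helper, and the paths are relative to the repository root as the list implies.

```python
from pathlib import PurePosixPath
from typing import Dict


def run_artifact_paths(task_id: str, run_id: str) -> Dict[str, PurePosixPath]:
    """Build the documented artifact locations for one task and run."""
    runs = PurePosixPath(".runs") / task_id
    return {
        "manifest": runs / "cli" / run_id / "manifest.json",
        "metrics": runs / "metrics.json",           # per task, not per run
        "state": PurePosixPath("out") / task_id / "state.json",
    }
```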
+## Common Workflows
+```bash
+codex-orchestrator flow --task <task-id>
+codex-orchestrator review --task <task-id>
+codex-orchestrator doctor --usage --window-days 30
+codex-orchestrator co-status
+codex-orchestrator control-host supervise status --format json
+```
+`flow` runs the docs-review and implementation-gate sequence. `review` prepares reviewer handoff evidence and can execute Codex review when the environment is configured to force non-interactive review execution.
+## Execution Modes
+- Default execution mode is `mcp`.
+- Cloud mode is reserved for long-running, highly parallel, or locally constrained work after preflight confirms branch/ref, non-interactive setup, and required cloud secrets.
+- `runtimeMode=cli|appserver` is independent of `executionMode=mcp|cloud`.
+- Local appserver remains the expected default runtime path.
+- `executionMode=cloud` with explicit `runtimeMode=appserver` is unsupported and should fail fast.
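
The mode rules above amount to a small validation step. The following is an illustrative sketch, not the orchestrator's code: `check_modes` is a hypothetical name, and a `runtime_mode` of `None` stands for "not explicitly set", leaving the default runtime choice to the orchestrator.

```python
from typing import Optional, Tuple

EXECUTION_MODES = ("mcp", "cloud")
RUNTIME_MODES = ("cli", "appserver")


def check_modes(execution_mode: str = "mcp",
                runtime_mode: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Validate a mode pair and fail fast on the unsupported combination."""
    if execution_mode not in EXECUTION_MODES:
        raise ValueError(f"unknown executionMode: {execution_mode}")
    if runtime_mode is not None and runtime_mode not in RUNTIME_MODES:
        raise ValueError(f"unknown runtimeMode: {runtime_mode}")
    if execution_mode == "cloud" and runtime_mode == "appserver":
        # Only the explicit appserver request is rejected under cloud.
        raise ValueError("executionMode=cloud with runtimeMode=appserver is unsupported")
    return execution_mode, runtime_mode
```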
+## Validation Floor
+For implementation work, use the repo-local gate list from `AGENTS.md`. For documentation-only README/book work, the targeted floor is:
+```bash
+node scripts/spec-guard.mjs --dry-run
+npm run docs:check
+npm run docs:freshness
+node scripts/diff-budget.mjs
+```
+Add build, lint, tests, pack smoke, or runtime proof when the diff touches scripts, package surfaces, runtime behavior, or UI/app surfaces.
+## Review Handoff
+Before handing an issue to `Human Review` or `In Review`, refresh the Linear workpad with:
+- final implementation summary
+- validation results
+- standalone review or fallback review evidence
+- explicit elegance/minimality pass result
+- PR link and ready-review drain result when a PR exists