@kbediako/codex-orchestrator 0.1.2 → 0.1.4

Files changed (73)
  1. package/README.md +15 -8
  2. package/dist/bin/codex-orchestrator.js +252 -121
  3. package/dist/orchestrator/src/cli/config/delegationConfig.js +485 -0
  4. package/dist/orchestrator/src/cli/config/userConfig.js +86 -12
  5. package/dist/orchestrator/src/cli/control/confirmations.js +262 -0
  6. package/dist/orchestrator/src/cli/control/controlServer.js +1476 -0
  7. package/dist/orchestrator/src/cli/control/controlState.js +46 -0
  8. package/dist/orchestrator/src/cli/control/controlWatcher.js +222 -0
  9. package/dist/orchestrator/src/cli/control/delegationTokens.js +62 -0
  10. package/dist/orchestrator/src/cli/control/questions.js +106 -0
  11. package/dist/orchestrator/src/cli/delegationServer.js +1368 -0
  12. package/dist/orchestrator/src/cli/events/runEventStream.js +246 -0
  13. package/dist/orchestrator/src/cli/exec/context.js +9 -3
  14. package/dist/orchestrator/src/cli/exec/learning.js +5 -3
  15. package/dist/orchestrator/src/cli/exec/stageRunner.js +30 -5
  16. package/dist/orchestrator/src/cli/exec/summary.js +1 -1
  17. package/dist/orchestrator/src/cli/metrics/metricsAggregator.js +377 -147
  18. package/dist/orchestrator/src/cli/metrics/metricsRecorder.js +3 -5
  19. package/dist/orchestrator/src/cli/orchestrator.js +233 -47
  20. package/dist/orchestrator/src/cli/pipelines/index.js +13 -24
  21. package/dist/orchestrator/src/cli/rlm/prompt.js +31 -0
  22. package/dist/orchestrator/src/cli/rlm/runner.js +177 -0
  23. package/dist/orchestrator/src/cli/rlm/types.js +1 -0
  24. package/dist/orchestrator/src/cli/rlm/validator.js +159 -0
  25. package/dist/orchestrator/src/cli/rlmRunner.js +440 -0
  26. package/dist/orchestrator/src/cli/run/environment.js +4 -11
  27. package/dist/orchestrator/src/cli/run/manifest.js +7 -1
  28. package/dist/orchestrator/src/cli/run/manifestPersister.js +33 -3
  29. package/dist/orchestrator/src/cli/run/runPaths.js +14 -0
  30. package/dist/orchestrator/src/cli/services/commandRunner.js +2 -2
  31. package/dist/orchestrator/src/cli/services/controlPlaneService.js +3 -1
  32. package/dist/orchestrator/src/cli/services/execRuntime.js +1 -2
  33. package/dist/orchestrator/src/cli/services/pipelineResolver.js +33 -2
  34. package/dist/orchestrator/src/cli/services/runPreparation.js +7 -1
  35. package/dist/orchestrator/src/cli/services/schedulerService.js +1 -1
  36. package/dist/orchestrator/src/cli/utils/devtools.js +33 -2
  37. package/dist/orchestrator/src/cli/utils/specGuardRunner.js +3 -1
  38. package/dist/orchestrator/src/cli/utils/strings.js +8 -6
  39. package/dist/orchestrator/src/persistence/ExperienceStore.js +115 -58
  40. package/dist/orchestrator/src/persistence/PersistenceCoordinator.js +8 -8
  41. package/dist/orchestrator/src/persistence/TaskStateStore.js +3 -2
  42. package/dist/orchestrator/src/persistence/lockFile.js +26 -1
  43. package/dist/orchestrator/src/persistence/sanitizeIdentifier.js +1 -1
  44. package/dist/orchestrator/src/sync/CloudSyncWorker.js +17 -4
  45. package/dist/packages/orchestrator/src/exec/stdio.js +112 -0
  46. package/dist/packages/orchestrator/src/exec/unified-exec.js +1 -1
  47. package/dist/packages/orchestrator/src/index.js +1 -0
  48. package/dist/packages/orchestrator/src/telemetry/otel-exporter.js +21 -0
  49. package/dist/packages/shared/design-artifacts/writer.js +4 -14
  50. package/dist/packages/shared/streams/stdio.js +2 -112
  51. package/dist/packages/shared/utils/strings.js +17 -0
  52. package/dist/scripts/design/pipeline/advanced-assets.js +1 -1
  53. package/dist/scripts/design/pipeline/context.js +5 -5
  54. package/dist/scripts/design/pipeline/extract.js +9 -6
  55. package/dist/scripts/design/pipeline/{optionalDeps.js → optional-deps.js} +49 -38
  56. package/dist/scripts/design/pipeline/permit.js +59 -0
  57. package/dist/scripts/design/pipeline/toolkit/common.js +18 -32
  58. package/dist/scripts/design/pipeline/toolkit/reference.js +1 -1
  59. package/dist/scripts/design/pipeline/toolkit/snapshot.js +1 -1
  60. package/dist/scripts/design/pipeline/visual-regression.js +2 -11
  61. package/dist/scripts/lib/cli-args.js +53 -0
  62. package/dist/scripts/lib/docs-helpers.js +111 -0
  63. package/dist/scripts/lib/npm-pack.js +20 -0
  64. package/dist/scripts/lib/run-manifests.js +160 -0
  65. package/package.json +7 -2
  66. package/dist/orchestrator/src/cli/pipelines/defaultDiagnostics.js +0 -32
  67. package/dist/orchestrator/src/cli/pipelines/designReference.js +0 -72
  68. package/dist/orchestrator/src/cli/pipelines/hiFiDesignToolkit.js +0 -71
  69. package/dist/orchestrator/src/cli/utils/jsonlWriter.js +0 -10
  70. package/dist/orchestrator/src/control-plane/index.js +0 -3
  71. package/dist/orchestrator/src/persistence/identifierGuards.js +0 -1
  72. package/dist/orchestrator/src/persistence/writeAtomicFile.js +0 -4
  73. package/dist/orchestrator/src/scheduler/index.js +0 -1
package/README.md CHANGED
@@ -122,7 +122,7 @@ Notes:
  - These prompts are consumed by the Codex CLI UI only; the orchestrator does not read them. Keep updates synced across machines during onboarding.
  - To install or refresh the prompts (repo-only), run `scripts/setup-codex-prompts.sh` (use `--force` to overwrite existing files).
  - `/prompts:diagnostics` takes `TASK=<task-id> MANIFEST=<path> [NOTES=<free text>]`, exports `MCP_RUNNER_TASK_ID=$TASK`, runs `npx codex-orchestrator start diagnostics --format json`, tails `.runs/$TASK/cli/<run-id>/manifest.json` (or `npx codex-orchestrator status --watch`), and records evidence to `/tasks`, `docs/TASKS.md`, `.agent/task/...`, `.runs/$TASK/metrics.json`, and `out/$TASK/state.json` using `$MANIFEST`.
- - `/prompts:review-handoff` takes `TASK=<task-id> MANIFEST=<path> NOTES=<goal + summary + risks + optional questions>`, re-exports `MCP_RUNNER_TASK_ID`, and (repo-only) runs `node scripts/spec-guard.mjs --dry-run`, `npm run lint`, `npm run test`, optional `npm run eval:test`, plus `npm run review` (wraps `codex review` against the current diff and includes the latest run manifest path as evidence). It also reminds you to log approvals in `$MANIFEST` and mirror the evidence to the same docs/metrics/state targets.
+ - `/prompts:review-handoff` takes `TASK=<task-id> MANIFEST=<path> NOTES=<goal + summary + risks + optional questions>`, re-exports `MCP_RUNNER_TASK_ID`, and (repo-only) runs `node scripts/delegation-guard.mjs`, `node scripts/spec-guard.mjs --dry-run`, `npm run lint`, `npm run test`, optional `npm run eval:test`, plus `npm run review` (wraps `codex review` against the current diff and includes the latest run manifest path as evidence). It also reminds you to log approvals in `$MANIFEST` and mirror the evidence to the same docs/metrics/state targets.
  - In CI / `--no-interactive` pipelines (or when stdin is not a TTY), `npm run review` prints the review handoff prompt (including evidence paths) and exits successfully instead of invoking `codex review`. Set `FORCE_CODEX_REVIEW=1` to run `codex review` in those environments.
  - Always trigger diagnostics and review workflows through these prompts whenever you run the orchestrator so contributors consistently execute the required command sequences and capture auditable manifests.

@@ -130,11 +130,16 @@ Notes:
  - `MCP_RUNNER_TASK_ID` is no longer coerced or lowercased silently. The CLI calls the shared `sanitizeTaskId` helper and fails fast when the value contains control characters, traversal attempts, or Windows-reserved characters (`<`, `>`, `:`, `"`, `/`, `\`, `|`, `?`, `*`). Set the correct task ID in your environment *before* invoking the CLI.
  - Run IDs used for manifest or artifact storage must come from the CLI (or pass the shared `sanitizeRunId` helper). Strings with colons, control characters, or `../` are rejected to ensure every run directory lives under `.runs/<task-id>/cli/<run-id>` (and legacy `mcp` mirrors) without risking traversal.

+ ### Delegation Guardrails
+ - `delegate.question.poll` clamps `wait_ms` to `MAX_QUESTION_POLL_WAIT_MS` (10s); each poll timeout is bounded by the remaining `wait_ms`.
+ - Confirm-to-act fallback only triggers on confirmation-specific errors (`error.code`), not generic tool failures.
+ - Tool profile entries used for MCP overrides are sanitized; only alphanumeric + `_`/`-` names are allowed (rejects `;`, `/`, `\n`, `=` and similar).
+
  ## Pipelines & Execution Plans
  - Default pipelines live in `codex.orchestrator.json` (repository-specific) and `orchestrator/src/cli/pipelines/` (built-in defaults). Each stage is either a command (shell execution) or a nested pipeline.
  - The `CommandPlanner` inspects the selected pipeline and target stage; you can pass `--target <stage-id>` (alias: `--target-stage`) or set `CODEX_ORCHESTRATOR_TARGET_STAGE` to focus on a specific step (e.g., rerun tests only).
  - Stage execution records stdout/stderr logs, exit codes, optional summaries, and failure data directly into the manifest (`commands[]` array).
- - Guardrails (repo-only): before review, run `node scripts/spec-guard.mjs --dry-run` to ensure specs touched in the PR are current; the orchestrator tracks guardrail outcomes in the manifest (`guardrail_status`).
+ - Guardrails (repo-only): before review, run `node scripts/delegation-guard.mjs` and `node scripts/spec-guard.mjs --dry-run` to ensure delegation and spec freshness; the orchestrator tracks guardrail outcomes in the manifest (`guardrail_status`).

  ## Approval & Sandbox Model
  - Approval policies (`never`, `on-request`, `auto`, or custom strings) flow through `packages/orchestrator`. Tool invocations can require approval before sandbox elevation, and all prompts/decisions are persisted.
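The Delegation Guardrails bullets added in the hunk above describe two small validation rules: clamping a poll wait and allow-listing tool profile names. The following is a minimal sketch of what such rules could look like. Only the constant name `MAX_QUESTION_POLL_WAIT_MS` and its 10s value come from the README text; the helper names `clampPollWaitMs`, `boundPollTimeout`, and `isSafeToolProfileName` are hypothetical, and the package's actual implementation is not shown in this hunk.

```javascript
// Illustrative only: helper names are hypothetical, not the package's API.
const MAX_QUESTION_POLL_WAIT_MS = 10000; // 10s, per the README note

// Clamp a requested wait into the range [0, MAX_QUESTION_POLL_WAIT_MS].
function clampPollWaitMs(waitMs) {
  const requested = Number.isFinite(waitMs) ? waitMs : 0;
  return Math.min(Math.max(requested, 0), MAX_QUESTION_POLL_WAIT_MS);
}

// Bound a single poll timeout by whatever is left of the overall wait budget.
function boundPollTimeout(perPollTimeoutMs, remainingWaitMs) {
  return Math.max(0, Math.min(perPollTimeoutMs, remainingWaitMs));
}

// Accept only alphanumeric names plus `_` and `-`; anything else
// (`;`, `/`, newlines, `=`, ...) is rejected.
function isSafeToolProfileName(name) {
  return typeof name === 'string' && /^[A-Za-z0-9_-]+$/.test(name);
}
```

An allow-list regular expression is used here rather than a deny-list of known-bad characters, which matches the README's wording ("only alphanumeric + `_`/`-` names are allowed").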
@@ -149,7 +154,7 @@ Notes:
  - `TaskStateStore` writes per-task snapshots with bounded lock retries; failures degrade gracefully while still writing the main manifest.
  - `RunManifestWriter` generates the canonical manifest JSON for each run (mirrored under `.runs/`), while metrics appenders and summary writers keep `out/` up to date.
  - Heartbeat files and timestamps guard against stalled runs. `orchestrator/src/cli/metrics/metricsRecorder.ts` aggregates command durations, exit codes, and guardrail stats for later review.
- - Optional caps: `CODEX_ORCHESTRATOR_EXEC_EVENT_MAX_CHUNKS` limits captured exec chunk events per command (0 = no cap), and `CODEX_METRICS_PRIVACY_EVENTS_MAX` limits privacy decision events stored in `metrics.json` (-1 = no cap; `privacy_event_count` still reflects total).
+ - Optional caps: `CODEX_ORCHESTRATOR_EXEC_EVENT_MAX_CHUNKS` limits captured exec chunk events per command (defaults to 500; set 0 for no cap), `CODEX_ORCHESTRATOR_TELEMETRY_MAX_EVENTS` caps in-memory telemetry events queued before flush (defaults to 1000; set 0 for no cap), and `CODEX_METRICS_PRIVACY_EVENTS_MAX` limits privacy decision events stored in `metrics.json` (-1 = no cap; `privacy_event_count` still reflects total).

  ## Customizing for New Projects
  - Duplicate the templates under `/tasks`, `docs/`, and `.agent/` for your task ID and keep checklist status mirrored (`[ ]` → `[x]`) with links to the manifest that proves each outcome.
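The "Optional caps" bullet in the hunk above gives two variables a default and a "0 means no cap" convention. As a rough illustration of that convention, here is a sketch of how such a value might be resolved from the environment. The helper `resolveCap` is hypothetical, and falling back to the default on invalid or negative input is an assumption; the README only documents the defaults (500 and 1000) and the meaning of 0.

```javascript
// Hypothetical helper: resolve a "0 means no cap" limit from an env-style string.
function resolveCap(raw, defaultCap) {
  if (raw === undefined || raw.trim() === '') {
    return defaultCap; // variable not set: use the documented default
  }
  const parsed = Number.parseInt(raw, 10);
  if (!Number.isInteger(parsed) || parsed < 0) {
    return defaultCap; // assumption: invalid values fall back to the default
  }
  return parsed === 0 ? Infinity : parsed; // 0 disables the cap
}

// Variable names and defaults are taken from the README text above.
const execChunkCap = resolveCap(process.env.CODEX_ORCHESTRATOR_EXEC_EVENT_MAX_CHUNKS, 500);
const telemetryCap = resolveCap(process.env.CODEX_ORCHESTRATOR_TELEMETRY_MAX_EVENTS, 1000);
```

`CODEX_METRICS_PRIVACY_EVENTS_MAX` uses a different sentinel (-1 for no cap), so it would need its own resolution rule rather than this helper.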
@@ -166,6 +171,8 @@ Note: the commands below assume a source checkout; `scripts/` helpers are not in
  | `npm run test` | Vitest suite covering orchestration core, CLI services, and patterns. |
  | `npm run eval:test` | Optional evaluation harness (enable when `evaluation/fixtures/**` is populated). |
  | `npm run docs:check` | Deterministically validates scripts/pipelines/paths referenced in agent-facing docs. |
+ | `npm run docs:freshness` | Validates docs registry coverage + review recency; writes `out/<task-id>/docs-freshness.json`. |
+ | `node scripts/delegation-guard.mjs` | Enforces subagent delegation evidence before review (repo-only). |
  | `node scripts/spec-guard.mjs --dry-run` | Validates spec freshness; required before review (repo-only). |
  | `node scripts/diff-budget.mjs` | Guards against oversized diffs before review (repo-only; defaults: 25 files / 800 lines; supports explicit overrides). |
  | `npm run review` | Runs `codex review` with the latest run manifest path as evidence (repo-only; CI disables stdin; set `CODEX_REVIEW_NON_INTERACTIVE=1` to enforce locally). |
@@ -198,18 +205,18 @@ Use an explicit handoff note for reviewers. `NOTES` is required for review runs;
  Template: `Goal: ... | Summary: ... | Risks: ... | Questions (optional): ...`

  To enable Chrome DevTools for review runs, set `CODEX_REVIEW_DEVTOOLS=1` (uses a codex config override; no repo scripts required).
- Default to the standard `implementation-gate` for general reviews; use `implementation-gate-devtools` only when the review needs Chrome DevTools capabilities (visual/layout checks, network/perf diagnostics). After fixing review feedback, rerun the same gate and include any follow-up questions in `NOTES`.
- To run the full implementation gate with DevTools-enabled review, use `npx codex-orchestrator start implementation-gate-devtools --format json --no-interactive --task <task-id>`.
+ Default to the standard `implementation-gate` for general reviews; enable DevTools only when the review needs Chrome DevTools capabilities (visual/layout checks, network/perf diagnostics). After fixing review feedback, rerun the same gate and include any follow-up questions in `NOTES`.
+ To run the full implementation gate with DevTools-enabled review, use `CODEX_REVIEW_DEVTOOLS=1 npx codex-orchestrator start implementation-gate --format json --no-interactive --task <task-id>`.

  ## Frontend Testing
  Frontend testing is a first-class pipeline with DevTools off by default. The shipped pipelines already set `CODEX_NON_INTERACTIVE=1`; add it explicitly for custom automation or when you want the `frontend-test` shortcut to suppress Codex prompts:
  - `CODEX_NON_INTERACTIVE=1 npx codex-orchestrator start frontend-testing --format json --no-interactive --task <task-id>`
- - `CODEX_NON_INTERACTIVE=1 npx codex-orchestrator start frontend-testing-devtools --format json --no-interactive --task <task-id>` (DevTools enabled)
+ - `CODEX_NON_INTERACTIVE=1 CODEX_REVIEW_DEVTOOLS=1 npx codex-orchestrator start frontend-testing --format json --no-interactive --task <task-id>` (DevTools enabled)
  - `CODEX_NON_INTERACTIVE=1 codex-orchestrator frontend-test` (shortcut; add `--devtools` to enable DevTools)

  If you run the pipelines from this repo, run `npm run build` first so `dist/` stays current (the pipeline executes the compiled runner).

- Note: the frontend-testing pipelines toggle the shared `CODEX_REVIEW_DEVTOOLS` flag under the hood; prefer `--devtools` or the devtools pipeline instead of setting it manually.
+ Note: the frontend-testing pipeline reads the shared `CODEX_REVIEW_DEVTOOLS` flag; prefer `--devtools` or `CODEX_REVIEW_DEVTOOLS=1` for explicit enablement.

  Optional prompt overrides:
  - `CODEX_FRONTEND_TEST_PROMPT` (inline prompt)
@@ -254,4 +261,4 @@ Use the hi-fi pipeline to snapshot complex marketing sites (motion, interactions

  ---

- When preparing a review (repo-only), always capture the latest manifest path, run `node scripts/spec-guard.mjs --dry-run`, and ensure checklist mirrors (`/tasks`, `docs/`, `.agent/`) point at the evidence generated by Codex Orchestrator. That keeps the automation trustworthy and auditable across projects.
+ When preparing a review (repo-only), always capture the latest manifest path, run `node scripts/delegation-guard.mjs` and `node scripts/spec-guard.mjs --dry-run`, and ensure checklist mirrors (`/tasks`, `docs/`, `.agent/`) point at the evidence generated by Codex Orchestrator. That keeps the automation trustworthy and auditable across projects.
package/dist/bin/codex-orchestrator.js CHANGED
@@ -1,9 +1,12 @@
  #!/usr/bin/env node
+ import { readFile } from 'node:fs/promises';
+ import { basename, join } from 'node:path';
  import process from 'node:process';
  import { CodexOrchestrator } from '../orchestrator/src/cli/orchestrator.js';
  import { formatPlanPreview } from '../orchestrator/src/cli/utils/planFormatter.js';
  import { executeExecCommand } from '../orchestrator/src/cli/exec/command.js';
- import { resolveEnvironment, sanitizeTaskId } from '../orchestrator/src/cli/run/environment.js';
+ import { resolveEnvironmentPaths } from '../scripts/lib/run-manifests.js';
+ import { normalizeEnvironmentPaths, sanitizeTaskId } from '../orchestrator/src/cli/run/environment.js';
  import { RunEventEmitter } from '../orchestrator/src/cli/events/runEvents.js';
  import { evaluateInteractiveGate } from '../orchestrator/src/cli/utils/interactive.js';
  import { buildSelfCheckResult } from '../orchestrator/src/cli/selfCheck.js';
@@ -11,7 +14,10 @@ import { initCodexTemplates, formatInitSummary } from '../orchestrator/src/cli/i
  import { runDoctor, formatDoctorSummary } from '../orchestrator/src/cli/doctor.js';
  import { formatDevtoolsSetupSummary, runDevtoolsSetup } from '../orchestrator/src/cli/devtoolsSetup.js';
  import { loadPackageInfo } from '../orchestrator/src/cli/utils/packageInfo.js';
+ import { slugify } from '../orchestrator/src/cli/utils/strings.js';
  import { serveMcp } from '../orchestrator/src/cli/mcp.js';
+ import { startDelegationServer } from '../orchestrator/src/cli/delegationServer.js';
+ import { splitDelegationConfigOverrides } from '../orchestrator/src/cli/config/delegationConfig.js';
  async function main() {
      const args = process.argv.slice(2);
      const command = args.shift();
@@ -35,6 +41,9 @@ async function main() {
  case 'plan':
      await handlePlan(orchestrator, args);
      break;
+ case 'rlm':
+     await handleRlm(orchestrator, args);
+     break;
  case 'resume':
      await handleResume(orchestrator, args);
      break;
@@ -59,6 +68,10 @@ async function main() {
  case 'mcp':
      await handleMcp(args);
      break;
+ case 'delegate-server':
+ case 'delegation-server':
+     await handleDelegationServer(args);
+     break;
  case 'version':
      printVersion();
      break;
@@ -111,110 +124,133 @@ function resolveTargetStageId(flags) {
      }
      return undefined;
  }
+ function readStringFlag(flags, key) {
+     const value = flags[key];
+     if (typeof value !== 'string') {
+         return undefined;
+     }
+     const trimmed = value.trim();
+     return trimmed.length > 0 ? trimmed : undefined;
+ }
+ function applyRlmEnvOverrides(flags, goal) {
+     if (goal) {
+         process.env.RLM_GOAL = goal;
+     }
+     const validator = readStringFlag(flags, 'validator');
+     if (validator) {
+         process.env.RLM_VALIDATOR = validator;
+     }
+     const maxIterations = readStringFlag(flags, 'max-iterations');
+     if (maxIterations) {
+         process.env.RLM_MAX_ITERATIONS = maxIterations;
+     }
+     const maxMinutes = readStringFlag(flags, 'max-minutes');
+     if (maxMinutes) {
+         process.env.RLM_MAX_MINUTES = maxMinutes;
+     }
+     const roles = readStringFlag(flags, 'roles');
+     if (roles) {
+         process.env.RLM_ROLES = roles;
+     }
+ }
+ function resolveRlmTaskId(taskFlag) {
+     if (taskFlag) {
+         return sanitizeTaskId(taskFlag);
+     }
+     const envTask = process.env.MCP_RUNNER_TASK_ID?.trim();
+     if (envTask) {
+         return sanitizeTaskId(envTask);
+     }
+     const { repoRoot } = resolveEnvironmentPaths();
+     const repoName = basename(repoRoot);
+     const slug = slugify(repoName, 'adhoc');
+     return sanitizeTaskId(`rlm-${slug}`);
+ }
+ async function waitForManifestCompletion(manifestPath, intervalMs = 2000) {
+     const terminal = new Set(['succeeded', 'failed', 'cancelled']);
+     while (true) {
+         const raw = await readFile(manifestPath, 'utf8');
+         const manifest = JSON.parse(raw);
+         if (terminal.has(manifest.status)) {
+             return manifest;
+         }
+         await new Promise((resolve) => setTimeout(resolve, intervalMs));
+     }
+ }
+ async function readRlmState(statePath) {
+     try {
+         const raw = await readFile(statePath, 'utf8');
+         const parsed = JSON.parse(raw);
+         if (!parsed?.final) {
+             return null;
+         }
+         return { exitCode: parsed.final.exitCode, status: parsed.final.status };
+     }
+     catch {
+         return null;
+     }
+ }
  async function handleStart(orchestrator, rawArgs) {
      const { positionals, flags } = parseArgs(rawArgs);
      const pipelineId = positionals[0];
      const format = flags['format'] === 'json' ? 'json' : 'text';
-     const interactiveRequested = Boolean(flags['interactive'] || flags['ui']);
-     const interactiveDisabled = Boolean(flags['no-interactive']);
-     const runEvents = new RunEventEmitter();
-     const gate = evaluateInteractiveGate({
-         requested: interactiveRequested,
-         disabled: interactiveDisabled,
-         format,
-         stdoutIsTTY: process.stdout.isTTY === true,
-         stderrIsTTY: process.stderr.isTTY === true,
-         term: process.env.TERM ?? null
-     });
-     const hud = await maybeStartHud(gate, runEvents);
-     if (!gate.enabled && interactiveRequested && !interactiveDisabled && gate.reason) {
-         console.error(`[HUD disabled] ${gate.reason}`);
-     }
-     try {
+     if (pipelineId === 'rlm') {
+         const goal = readStringFlag(flags, 'goal');
+         applyRlmEnvOverrides(flags, goal);
+     }
+     await withRunUi(flags, format, async (runEvents) => {
+         let taskIdOverride = typeof flags['task'] === 'string' ? flags['task'] : undefined;
+         if (pipelineId === 'rlm') {
+             taskIdOverride = resolveRlmTaskId(taskIdOverride);
+             process.env.MCP_RUNNER_TASK_ID = taskIdOverride;
+             if (format !== 'json') {
+                 console.log(`Task: ${taskIdOverride}`);
+             }
+         }
          const result = await orchestrator.start({
              pipelineId,
-             taskId: typeof flags['task'] === 'string' ? flags['task'] : undefined,
+             taskId: taskIdOverride,
              parentRunId: typeof flags['parent-run'] === 'string' ? flags['parent-run'] : undefined,
              approvalPolicy: typeof flags['approval-policy'] === 'string' ? flags['approval-policy'] : undefined,
              targetStageId: resolveTargetStageId(flags),
              runEvents
          });
-         hud?.stop();
-         const payload = {
-             run_id: result.manifest.run_id,
-             status: result.manifest.status,
-             artifact_root: result.manifest.artifact_root,
-             manifest: `${result.manifest.artifact_root}/manifest.json`,
-             log_path: result.manifest.log_path
-         };
-         if (format === 'json') {
-             console.log(JSON.stringify(payload, null, 2));
-         }
-         else {
-             console.log(`Run started: ${payload.run_id}`);
-             console.log(`Status: ${payload.status}`);
-             console.log(`Manifest: ${payload.manifest}`);
-             console.log(`Log: ${payload.log_path}`);
-         }
-     }
-     finally {
-         hud?.stop();
-         runEvents.dispose();
-     }
+         emitRunOutput(result, format, 'Run started');
+     });
  }
  async function handleFrontendTest(orchestrator, rawArgs) {
      const { positionals, flags } = parseArgs(rawArgs);
      const format = flags['format'] === 'json' ? 'json' : 'text';
      const devtools = Boolean(flags['devtools']);
-     const interactiveRequested = Boolean(flags['interactive'] || flags['ui']);
-     const interactiveDisabled = Boolean(flags['no-interactive']);
-     const runEvents = new RunEventEmitter();
-     const gate = evaluateInteractiveGate({
-         requested: interactiveRequested,
-         disabled: interactiveDisabled,
-         format,
-         stdoutIsTTY: process.stdout.isTTY === true,
-         stderrIsTTY: process.stderr.isTTY === true,
-         term: process.env.TERM ?? null
-     });
-     const hud = await maybeStartHud(gate, runEvents);
-     if (!gate.enabled && interactiveRequested && !interactiveDisabled && gate.reason) {
-         console.error(`[HUD disabled] ${gate.reason}`);
-     }
      if (positionals.length > 0) {
          console.error(`[frontend-test] ignoring extra arguments: ${positionals.join(' ')}`);
      }
+     const originalDevtools = process.env.CODEX_REVIEW_DEVTOOLS;
+     if (devtools) {
+         process.env.CODEX_REVIEW_DEVTOOLS = '1';
+     }
      try {
-         const pipelineId = devtools ? 'frontend-testing-devtools' : 'frontend-testing';
-         const result = await orchestrator.start({
-             pipelineId,
-             taskId: typeof flags['task'] === 'string' ? flags['task'] : undefined,
-             parentRunId: typeof flags['parent-run'] === 'string' ? flags['parent-run'] : undefined,
-             approvalPolicy: typeof flags['approval-policy'] === 'string' ? flags['approval-policy'] : undefined,
-             targetStageId: resolveTargetStageId(flags),
-             runEvents
+         await withRunUi(flags, format, async (runEvents) => {
+             const result = await orchestrator.start({
+                 pipelineId: 'frontend-testing',
+                 taskId: typeof flags['task'] === 'string' ? flags['task'] : undefined,
+                 parentRunId: typeof flags['parent-run'] === 'string' ? flags['parent-run'] : undefined,
+                 approvalPolicy: typeof flags['approval-policy'] === 'string' ? flags['approval-policy'] : undefined,
+                 targetStageId: resolveTargetStageId(flags),
+                 runEvents
+             });
+             emitRunOutput(result, format, 'Run started');
          });
-         hud?.stop();
-         const payload = {
-             run_id: result.manifest.run_id,
-             status: result.manifest.status,
-             artifact_root: result.manifest.artifact_root,
-             manifest: `${result.manifest.artifact_root}/manifest.json`,
-             log_path: result.manifest.log_path
-         };
-         if (format === 'json') {
-             console.log(JSON.stringify(payload, null, 2));
-         }
-         else {
-             console.log(`Run started: ${payload.run_id}`);
-             console.log(`Status: ${payload.status}`);
-             console.log(`Manifest: ${payload.manifest}`);
-             console.log(`Log: ${payload.log_path}`);
-         }
      }
      finally {
-         hud?.stop();
-         runEvents.dispose();
+         if (devtools) {
+             if (originalDevtools === undefined) {
+                 delete process.env.CODEX_REVIEW_DEVTOOLS;
+             }
+             else {
+                 process.env.CODEX_REVIEW_DEVTOOLS = originalDevtools;
+             }
+         }
      }
  }
  async function handlePlan(orchestrator, rawArgs) {
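In the hunk above, `handleFrontendTest` now sets `CODEX_REVIEW_DEVTOOLS` for the duration of the run and restores the previous value in `finally`, taking care to delete the variable if it was originally unset. The same save/override/restore pattern can be sketched as a reusable helper. `withEnvOverride` is a hypothetical name used only for illustration; the package inlines this logic rather than exposing a helper.

```javascript
// Hypothetical helper (not part of the package): run `action` with an
// environment variable temporarily set, then restore the previous state.
async function withEnvOverride(name, value, action) {
  const original = process.env[name];
  process.env[name] = value;
  try {
    return await action();
  } finally {
    if (original === undefined) {
      delete process.env[name]; // the variable was unset before; unset it again
    } else {
      process.env[name] = original; // put the earlier value back
    }
  }
}
```

Assigning `undefined` to a `process.env` key would store the string `'undefined'`, which is why the unset case has to use `delete`, as the diff does.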
@@ -232,6 +268,47 @@ async function handlePlan(orchestrator, rawArgs) {
      }
      process.stdout.write(`${formatPlanPreview(result)}\n`);
  }
+ async function handleRlm(orchestrator, rawArgs) {
+     const { positionals, flags } = parseArgs(rawArgs);
+     const goalFromArgs = positionals.length > 0 ? positionals.join(' ') : undefined;
+     const goal = goalFromArgs ?? readStringFlag(flags, 'goal') ?? process.env.RLM_GOAL?.trim();
+     if (!goal) {
+         throw new Error('rlm requires a goal. Use: codex-orchestrator rlm \"<goal>\".');
+     }
+     const taskFlag = typeof flags['task'] === 'string' ? flags['task'] : undefined;
+     const taskId = resolveRlmTaskId(taskFlag);
+     process.env.MCP_RUNNER_TASK_ID = taskId;
+     applyRlmEnvOverrides(flags, goal);
+     console.log(`Task: ${taskId}`);
+     let startResult = null;
+     await withRunUi(flags, 'text', async (runEvents) => {
+         startResult = await orchestrator.start({
+             pipelineId: 'rlm',
+             taskId,
+             parentRunId: typeof flags['parent-run'] === 'string' ? flags['parent-run'] : undefined,
+             approvalPolicy: typeof flags['approval-policy'] === 'string' ? flags['approval-policy'] : undefined,
+             runEvents
+         });
+         emitRunOutput(startResult, 'text', 'Run started');
+     });
+     if (!startResult) {
+         throw new Error('rlm run failed to start.');
+     }
+     const resolvedStart = startResult;
+     const { repoRoot } = resolveEnvironmentPaths();
+     const manifestPath = join(repoRoot, resolvedStart.manifest.artifact_root, 'manifest.json');
+     const manifest = await waitForManifestCompletion(manifestPath);
+     const statePath = join(repoRoot, resolvedStart.manifest.artifact_root, 'rlm', 'state.json');
+     const rlmState = await readRlmState(statePath);
+     if (rlmState) {
+         console.log(`RLM status: ${rlmState.status}`);
+         process.exitCode = rlmState.exitCode;
+         return;
+     }
+     console.log(`RLM status: ${manifest.status}`);
+     console.error('RLM state file missing; treating as internal error.');
+     process.exitCode = 10;
+ }
  async function handleResume(orchestrator, rawArgs) {
      const { positionals, flags } = parseArgs(rawArgs);
      const runId = (flags['run'] ?? positionals[0]);
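`handleRlm` above waits on `waitForManifestCompletion`, which as added in this release polls in a `while (true)` loop with no deadline and lets any `readFile` or `JSON.parse` error propagate. For reviewers weighing that behavior, here is a sketch of a bounded variant. The name `pollUntilTerminal`, the injected `readManifest` callback, the `timeoutMs` option, and the choice to retry on read errors are all assumptions made for illustration; only the three terminal statuses and the 2000ms default interval come from the diff.

```javascript
// Hypothetical variant of the polling loop: same terminal-state check, plus a
// deadline and tolerance for transient read errors (both are assumptions).
const TERMINAL_STATUSES = new Set(['succeeded', 'failed', 'cancelled']);

async function pollUntilTerminal(readManifest, { intervalMs = 2000, timeoutMs = 60000 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    try {
      const manifest = await readManifest();
      if (TERMINAL_STATUSES.has(manifest.status)) {
        return manifest;
      }
    } catch {
      // The manifest may be missing or mid-write; retry until the deadline.
    }
    if (Date.now() >= deadline) {
      throw new Error(`manifest did not reach a terminal status within ${timeoutMs}ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Passing the reader in as a callback keeps the sketch free of file-system imports; in the CLI it would wrap the same `readFile` plus `JSON.parse` pair that the diff uses.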
@@ -239,22 +316,7 @@ async function handleResume(orchestrator, rawArgs) {
          throw new Error('resume requires --run <run-id>.');
      }
      const format = flags['format'] === 'json' ? 'json' : 'text';
-     const interactiveRequested = Boolean(flags['interactive'] || flags['ui']);
-     const interactiveDisabled = Boolean(flags['no-interactive']);
-     const runEvents = new RunEventEmitter();
-     const gate = evaluateInteractiveGate({
-         requested: interactiveRequested,
-         disabled: interactiveDisabled,
-         format,
-         stdoutIsTTY: process.stdout.isTTY === true,
-         stderrIsTTY: process.stderr.isTTY === true,
-         term: process.env.TERM ?? null
-     });
-     const hud = await maybeStartHud(gate, runEvents);
-     if (!gate.enabled && interactiveRequested && !interactiveDisabled && gate.reason) {
-         console.error(`[HUD disabled] ${gate.reason}`);
-     }
-     try {
+     await withRunUi(flags, format, async (runEvents) => {
          const result = await orchestrator.resume({
              runId,
              resumeToken: typeof flags['token'] === 'string' ? flags['token'] : undefined,
@@ -263,28 +325,8 @@ async function handleResume(orchestrator, rawArgs) {
              targetStageId: resolveTargetStageId(flags),
              runEvents
          });
-         hud?.stop();
-         const payload = {
-             run_id: result.manifest.run_id,
-             status: result.manifest.status,
-             artifact_root: result.manifest.artifact_root,
-             manifest: `${result.manifest.artifact_root}/manifest.json`,
-             log_path: result.manifest.log_path
-         };
-         if (format === 'json') {
-             console.log(JSON.stringify(payload, null, 2));
-         }
-         else {
-             console.log(`Run resumed: ${payload.run_id}`);
-             console.log(`Status: ${payload.status}`);
-             console.log(`Manifest: ${payload.manifest}`);
-             console.log(`Log: ${payload.log_path}`);
-         }
-     }
-     finally {
-         hud?.stop();
-         runEvents.dispose();
-     }
+         emitRunOutput(result, format, 'Run resumed');
+     });
  }
  async function handleStatus(orchestrator, rawArgs) {
      const { positionals, flags } = parseArgs(rawArgs);
@@ -315,6 +357,47 @@ async function maybeStartHud(gate, emitter) {
      const { startHud } = await import('../orchestrator/src/cli/ui/controller.js');
      return startHud({ emitter, footerNote: 'interactive HUD (read-only)' });
  }
+ async function withRunUi(flags, format, action) {
+     const interactiveRequested = Boolean(flags['interactive'] || flags['ui']);
+     const interactiveDisabled = Boolean(flags['no-interactive']);
+     const runEvents = new RunEventEmitter();
+     const gate = evaluateInteractiveGate({
+         requested: interactiveRequested,
+         disabled: interactiveDisabled,
+         format,
+         stdoutIsTTY: process.stdout.isTTY === true,
+         stderrIsTTY: process.stderr.isTTY === true,
+         term: process.env.TERM ?? null
+     });
+     const hud = await maybeStartHud(gate, runEvents);
+     if (!gate.enabled && interactiveRequested && !interactiveDisabled && gate.reason) {
+         console.error(`[HUD disabled] ${gate.reason}`);
+     }
+     try {
+         await action(runEvents);
+     }
+     finally {
+         hud?.stop();
+         runEvents.dispose();
+     }
+ }
+ function emitRunOutput(result, format, label) {
+     const payload = {
+         run_id: result.manifest.run_id,
+         status: result.manifest.status,
+         artifact_root: result.manifest.artifact_root,
+         manifest: `${result.manifest.artifact_root}/manifest.json`,
+         log_path: result.manifest.log_path
+     };
+     if (format === 'json') {
+         console.log(JSON.stringify(payload, null, 2));
+         return;
+     }
+     console.log(`${label}: ${payload.run_id}`);
+     console.log(`Status: ${payload.status}`);
+     console.log(`Manifest: ${payload.manifest}`);
+     console.log(`Log: ${payload.log_path}`);
+ }
  async function handleExec(rawArgs) {
      const parsed = parseExecArgs(rawArgs);
      if (parsed.commandTokens.length === 0) {
@@ -322,7 +405,7 @@ async function handleExec(rawArgs) {
     }
     const isInteractive = process.stdout.isTTY === true && process.stderr.isTTY === true;
     const outputMode = parsed.requestedMode ?? (isInteractive ? 'interactive' : 'jsonl');
-    const env = resolveEnvironment();
+    const env = normalizeEnvironmentPaths(resolveEnvironmentPaths());
     if (parsed.taskId) {
         env.taskId = sanitizeTaskId(parsed.taskId);
     }
@@ -429,6 +512,34 @@ async function handleMcp(rawArgs) {
     const dryRun = Boolean(flags['dry-run']);
     await serveMcp({ repoRoot, dryRun, extraArgs: positionals });
 }
+async function handleDelegationServer(rawArgs) {
+    const { flags } = parseArgs(rawArgs);
+    const repoRoot = typeof flags['repo'] === 'string' ? flags['repo'] : process.cwd();
+    const modeFlag = typeof flags['mode'] === 'string' ? flags['mode'] : undefined;
+    const overrideFlag = typeof flags['config'] === 'string'
+        ? flags['config']
+        : typeof flags['config-override'] === 'string'
+            ? flags['config-override']
+            : undefined;
+    const envMode = process.env.CODEX_DELEGATE_MODE?.trim();
+    const resolvedMode = modeFlag ?? envMode;
+    let mode;
+    if (resolvedMode) {
+        if (resolvedMode === 'full' || resolvedMode === 'question_only') {
+            mode = resolvedMode;
+        }
+        else {
+            console.warn(`Invalid delegate mode "${resolvedMode}". Falling back to config default.`);
+        }
+    }
+    const configOverrides = overrideFlag
+        ? splitDelegationConfigOverrides(overrideFlag).map((value) => ({
+            source: 'cli',
+            value
+        }))
+        : [];
+    await startDelegationServer({ repoRoot, mode, configOverrides });
+}
 function parseExecArgs(rawArgs) {
     const notifyTargets = [];
     let otelEndpoint = null;
@@ -550,6 +661,22 @@ Commands:
  --approval-policy <p> Record approval policy metadata.
  --format json Emit machine-readable output.
  --target <stage-id> Focus plan/build metadata on a specific stage (alias: --target-stage).
+ --goal "<goal>" When pipeline is rlm, set the RLM goal.
+ --validator <cmd|none> When pipeline is rlm, set the validator command.
+ --max-iterations <n> When pipeline is rlm, override max iterations.
+ --max-minutes <n> When pipeline is rlm, override max minutes.
+ --roles <single|triad> When pipeline is rlm, set role split.
+ --interactive | --ui Enable read-only HUD when running in a TTY.
+ --no-interactive Force disable HUD (default is off unless requested).
+
+ rlm "<goal>" Run RLM loop until validator passes.
+ --task <id> Override task identifier.
+ --validator <cmd|none> Set validator command or disable validation.
+ --max-iterations <n> Override max iterations (0 = unlimited with validator).
+ --max-minutes <n> Optional time-based guardrail in minutes.
+ --roles <single|triad> Choose single or triad role split.
+ --parent-run <id> Link run to parent run id.
+ --approval-policy <p> Record approval policy metadata.
  --interactive | --ui Enable read-only HUD when running in a TTY.
  --no-interactive Force disable HUD (default is off unless requested).
 
@@ -594,6 +721,10 @@ Commands:
  --yes Apply setup by running "codex mcp add ...".
  --format json Emit machine-readable output (dry-run only).
  mcp serve [--repo <path>] [--dry-run] [-- <extra args>]
+ delegate-server Run the delegation MCP server (stdio).
+ --repo <path> Repo root for config + manifests (default cwd).
+ --mode <full|question_only> Limit tool surface for child runs.
+ --config "<key>=<value>[;...]" Apply config overrides (repeat via separators).
  version | --version
 
  help Show this message.