mcp-agents 0.10.2 → 0.12.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (3)
  1. package/README.md +66 -4
  2. package/package.json +1 -1
  3. package/server.js +624 -63
package/README.md CHANGED
@@ -86,10 +86,11 @@ or Gemini during bridge calls.
  | `--model` | `gpt-5.5` | `model` |
  | `--model_reasoning_effort` | `xhigh` | `model_reasoning_effort` |
 
- Hardcoded defaults: `sandbox_mode=read-only`, `approval_policy=never`,
- `features.multi_agent=false`.
+ Other startup defaults: `sandbox_mode=workspace-write`, `approval_policy=never`
+ (both configurable via `--sandbox_mode` / `--approval_policy`, and steerable per
+ call); `features.multi_agent=false` is fixed.
 
- Startup flags (`--model`, `--model_reasoning_effort`) set the model and effort for the native Codex MCP server. Per-call `model` and `config` arguments are stripped from `tools/call` before they reach Codex, so a client cannot override the pinned model/effort (or the read-only/never sandbox config) for a single call. For example, this request:
+ Startup flags (`--model`, `--model_reasoning_effort`) set the model and effort for the native Codex MCP server. Per-call `model` and the model/effort keys inside a `config` override are stripped from `tools/call` before they reach Codex, so a client cannot override the pinned model/effort for a single call (`sandbox`, `cwd`, and `approval-policy` top-level and the matching `config` keys are intentionally left steerable per call). For example, this request:
 
  ```json
  {
@@ -101,6 +102,67 @@ Startup flags (`--model`, `--model_reasoning_effort`) set the model and effort f
 
  is forwarded to Codex as `{ "prompt": "Review this diff" }`. Change the model or effort at server startup instead.
 
+ **Goal injection.** You can give Codex a persistent objective. Set one at server
+ startup with `--goal "<text>"`, or per call with a `goal` argument in `tools/call`:
+
+ ```json
+ { "prompt": "Refactor the parser", "goal": "Keep the public API unchanged" }
+ ```
+
+ For the initial `codex` call the objective is injected into Codex's native
+ `developer-instructions` field (a developer-role message), so this is forwarded
+ to Codex as:
+
+ ```json
+ {
+ "prompt": "Refactor the parser",
+ "developer-instructions": "Persistent objective for this Codex thread (a standing goal — keep pursuing it across turns unless explicitly superseded):\nKeep the public API unchanged"
+ }
+ ```
+
+ A developer message persists for the whole thread, so `codex-reply` follow-ups
+ inherit the objective automatically. Because `codex-reply` has no
+ `developer-instructions` field, a per-call `goal` on a reply is instead added as
+ a concise `Reminder — standing objective for this thread: …` preamble on the
+ prompt. Any caller-supplied `developer-instructions` are preserved, with the
+ objective merged ahead of them.
+
+ The wrapper-only `goal` argument is always stripped before it reaches Codex (its
+ schema has no `goal`). A per-call `goal` overrides the `--goal` default for that
+ call; a per-call empty `goal` (`""`) suppresses the default for that one call; a
+ non-string `goal` is ignored (the `--goal` default still applies).
+
+ **Precedence within a thread.** The objective set on the initial `codex` call is
+ a developer-role message and persists for the whole thread, so it takes
+ precedence: a *different* `goal` supplied later on a `codex-reply` is only a
+ prompt-level reminder and will not reliably override the standing objective
+ (verified live — a reply goal that conflicts with the initial one is ignored in
+ favor of the standing one). The reply reminder works when it is *not* opposed by
+ a conflicting standing objective. To genuinely change the objective mid-stream,
+ start a new `codex` call rather than changing it on a `codex-reply`.
+
+ > **Note — this is not Codex's native `/goal`.** Codex's `/goal` slash command
+ > (durable, thread-scoped goal state with lifecycle/budget/evidence-based
+ > completion) is a TUI-only feature — it is parsed in the Codex terminal UI and
+ > is *not* reachable through `codex mcp-server`. Prefixing an MCP prompt with
+ > `/goal …` does **not** activate it; the text is just passed through as a user
+ > message. This wrapper therefore steers Codex with `developer-instructions`
+ > (the MCP-native vehicle for a standing objective), which is prompt/role
+ > conditioning, not the native goal-lifecycle subsystem.
+
+ **Idle watchdog.** The codex pass-through is transparent, so a Codex session that
+ stalls after doing work (e.g. its final model turn hangs, or it waits on an
+ elicitation the client never answers) would otherwise hang the caller's
+ `tools/call` forever. `--codex_idle_timeout <seconds>` (default `600`, `0`
+ disables) bounds this: if Codex emits nothing while a request is in flight for
+ that long, the wrapper returns a JSON-RPC error (`-32001`) for the open
+ request(s), kills the Codex process group, and exits — turning an unbounded stall
+ into a surfaced error. The timer resets on any Codex output or inbound client
+ activity and is suspended while the client backpressures stdout, so healthy long
+ or interactive runs are not killed. The wrapper also exits (instead of lingering)
+ if Codex dies or fails to start, so a dead Codex can never leave the caller
+ hanging.
+
  ## Integration with Claude Code
 
  Add entries to your project's `.mcp.json` (requires `npm i -g mcp-agents`):
@@ -133,7 +195,7 @@ Override codex defaults at server startup:
  }
  ```
 
- The model and effort are fixed at server startup. Per-call `model` and `config` arguments sent to the native `codex` tool are stripped before reaching Codex, so they cannot override the startup defaults.
+ The model and effort are fixed at server startup. Per-call `model` and the model/effort keys inside a `config` override sent to the native `codex` tool are stripped before reaching Codex, so they cannot override the startup model/effort (per-call `sandbox`/`cwd`/`approval-policy` are left intact). Add `"--goal", "<text>"` to `args` to inject a persistent objective into every Codex call (see [Goal injection](#codex-pass-through) above).
 
  Because the bridge runs in an isolated Codex home, inherited MCP servers from your normal
  `~/.codex/config.toml` are intentionally unavailable inside bridged Codex sessions.
package/package.json CHANGED
@@ -1,6 +1,6 @@
  {
  "name": "mcp-agents",
- "version": "0.10.2",
+ "version": "0.12.0",
  "description": "MCP server that wraps AI CLI tools (Claude Code, Gemini CLI, Codex CLI) for use by any MCP client",
  "type": "module",
  "bin": {
package/server.js CHANGED
@@ -31,6 +31,11 @@ const DEFAULT_CODEX_MODEL = "gpt-5.5";
  const DEFAULT_CODEX_MODEL_REASONING_EFFORT = "xhigh";
  const DEFAULT_CODEX_SANDBOX_MODE = "workspace-write";
  const DEFAULT_CODEX_APPROVAL_POLICY = "never";
+ // Idle watchdog for the codex pass-through: if a request is in flight and codex
+ // emits nothing on stdout/stderr for this long, the wrapper synthesizes a
+ // JSON-RPC error for the open request(s) and tears codex down — converting an
+ // unbounded post-completion stall into a surfaced error. 0 disables it.
+ const DEFAULT_CODEX_IDLE_TIMEOUT_MS = 600_000;
  const DEFAULT_CLAUDE_MODEL = "claude-opus-4-8";
  const DEFAULT_CLAUDE_EFFORT = "xhigh";
  // tools/call argument keys stripped from the codex pass-through so callers
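The watchdog comment above says the wrapper synthesizes a JSON-RPC error for the open request(s). A minimal sketch of that bookkeeping follows, assuming the in-flight design that appears later in this diff (a `Map` keyed by `typeof` plus id, with a `canceled` flag for ids the client gave up on). The factory name `createInFlightTracker` and the error `message` text are hypothetical; only the `-32001` code comes from the README.

```javascript
// Hypothetical sketch, not the package's code: track open JSON-RPC requests
// and build the synthetic errors an idle watchdog would emit for them.
function createInFlightTracker() {
  const inFlight = new Map();
  // Prefix with typeof so numeric 1 and string "1" stay distinct keys.
  const key = (id) => `${typeof id}:${id}`;
  return {
    add(id) {
      if (id == null) return; // notifications carry no id
      if (!inFlight.has(key(id))) inFlight.set(key(id), { id, canceled: false });
    },
    clear(id) {
      if (id != null) inFlight.delete(key(id));
    },
    cancel(id) {
      const entry = id == null ? undefined : inFlight.get(key(id));
      if (entry) entry.canceled = true; // still tracked, never answered
    },
    size() {
      return inFlight.size;
    },
    // One JSON-RPC error object per open, non-canceled request.
    timeoutErrors(message) {
      const out = [];
      for (const { id, canceled } of inFlight.values()) {
        if (!canceled) out.push({ jsonrpc: "2.0", id, error: { code: -32001, message } });
      }
      return out;
    },
  };
}

const t = createInFlightTracker();
t.add(1);
t.add("1");
t.add(2);
t.cancel(2); // client sent notifications/cancelled for id 2
t.clear(1); // a response for numeric id 1 arrived
console.log(t.size(), JSON.stringify(t.timeoutErrors("codex idle timeout")));
```

A plain object keyed by id would merge `1` and `"1"`, because object keys are coerced to strings; the explicit `typeof` prefix keeps the two apart regardless of the container used.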
@@ -61,7 +66,7 @@ const CODEX_STRIPPED_CONFIG_KEYS = [
  ];
  const MAX_BUFFER_BYTES = 10 * 1024 * 1024;
  const CLAUDE_EMPTY_OUTPUT_MAX_ATTEMPTS = 2;
- const SIGNAL_CODES = { SIGHUP: 1, SIGINT: 2, SIGTERM: 15 };
+ const SIGNAL_CODES = { SIGHUP: 1, SIGINT: 2, SIGKILL: 9, SIGTERM: 15 };
  const SHUTDOWN_TIMEOUT_MS = 3_000;
  let fatalShutdown;
 
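Later in this diff the signal handlers exit with `128 + SIGNAL_CODES[sig]`, the usual shell convention for the status of a process terminated by a signal. A small sketch of that mapping, using a copy of the updated table; the helper name `exitCodeForSignal` is hypothetical.

```javascript
// Copy of the table from the hunk above (signal name -> POSIX signal number).
const SIGNAL_CODES = { SIGHUP: 1, SIGINT: 2, SIGKILL: 9, SIGTERM: 15 };

// Hypothetical helper: exit status to report after being stopped by `sig`.
function exitCodeForSignal(sig) {
  // hasOwnProperty guards against inherited keys such as "toString".
  if (!Object.prototype.hasOwnProperty.call(SIGNAL_CODES, sig)) {
    throw new RangeError(`unmapped signal: ${sig}`);
  }
  return 128 + SIGNAL_CODES[sig];
}

for (const sig of Object.keys(SIGNAL_CODES)) {
  console.log(`${sig} -> ${exitCodeForSignal(sig)}`);
}
// SIGHUP -> 129, SIGINT -> 130, SIGKILL -> 137, SIGTERM -> 143
```

SIGKILL cannot be caught, which is why it is absent from the handler loop (`SIGTERM`, `SIGINT`, `SIGHUP`); the visible part of this diff does not show where the new table entry is consumed.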
@@ -187,6 +192,12 @@ Options:
  danger-full-access [default: ${DEFAULT_CODEX_SANDBOX_MODE}]
  --approval_policy <policy> Codex approval policy: untrusted, on-failure,
  on-request, never [default: ${DEFAULT_CODEX_APPROVAL_POLICY}]
+ --goal <text> Persistent objective injected into every Codex
+ call (as developer-instructions, or a prompt
+ reminder on codex-reply); per-call \`goal\` arg
+ overrides it [default: none]
+ --codex_idle_timeout <secs> Codex pass-through idle watchdog; 0 disables
+ [default: ${DEFAULT_CODEX_IDLE_TIMEOUT_MS / 1000}]
  --timeout <seconds> Default timeout per call [default: 300]
  --help, -h Show this help message
  --version, -v Show version number`);
@@ -195,8 +206,9 @@ Options:
  /**
  * Parse CLI flags from process.argv.
  * Handles --help, --version, --provider, --model, --model_reasoning_effort,
- * --sandbox_mode, --approval_policy, and unknown flags.
- * @returns {{ provider: string, model?: string, modelReasoningEffort?: string, sandboxMode?: string, approvalPolicy?: string, defaultTimeoutMs?: number }}
+ * --sandbox_mode, --approval_policy, --goal, --codex_idle_timeout, and unknown
+ * flags.
+ * @returns {{ provider: string, model?: string, modelReasoningEffort?: string, sandboxMode?: string, approvalPolicy?: string, goal?: string, codexIdleTimeoutMs?: number, defaultTimeoutMs?: number }}
  */
  function parseArgs() {
  const args = process.argv.slice(2);
@@ -205,6 +217,8 @@ function parseArgs() {
  let modelReasoningEffort;
  let sandboxMode;
  let approvalPolicy;
+ let goal;
+ let codexIdleTimeoutMs;
  let defaultTimeoutMs;
 
  for (let i = 0; i < args.length; i++) {
@@ -256,6 +270,28 @@ function parseArgs() {
  }
  approvalPolicy = args[++i];
  break;
+ case "--goal":
+ if (i + 1 >= args.length) {
+ process.stderr.write("error: --goal requires a value\n");
+ process.exit(1);
+ }
+ goal = args[++i];
+ break;
+ case "--codex_idle_timeout": {
+ if (i + 1 >= args.length) {
+ process.stderr.write("error: --codex_idle_timeout requires a value\n");
+ process.exit(1);
+ }
+ const secs = Number(args[++i]);
+ if (!Number.isFinite(secs) || secs < 0) {
+ process.stderr.write(
+ "error: --codex_idle_timeout must be a non-negative number\n",
+ );
+ process.exit(1);
+ }
+ codexIdleTimeoutMs = Math.round(secs * 1000);
+ break;
+ }
  case "--timeout": {
  if (i + 1 >= args.length) {
  process.stderr.write("error: --timeout requires a value\n");
@@ -281,6 +317,8 @@ function parseArgs() {
  modelReasoningEffort,
  sandboxMode,
  approvalPolicy,
+ goal,
+ codexIdleTimeoutMs,
  defaultTimeoutMs,
  };
  }
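The `--codex_idle_timeout` branch above converts seconds to whole milliseconds and rejects negative or non-finite input. The sketch below lifts that logic into standalone functions (both names hypothetical, and throwing instead of calling `process.exit`) so the edge cases can be checked, together with the `idleTimeoutMs ?? DEFAULT_CODEX_IDLE_TIMEOUT_MS` fallback that `runCodexPassthrough` applies further down.

```javascript
const DEFAULT_CODEX_IDLE_TIMEOUT_MS = 600_000;

// Same checks as the parseArgs branch, but throwing instead of exiting.
function parseIdleTimeoutMs(raw) {
  const secs = Number(raw);
  if (!Number.isFinite(secs) || secs < 0) {
    throw new RangeError("--codex_idle_timeout must be a non-negative number");
  }
  return Math.round(secs * 1000);
}

// `??` falls back only for undefined/null, so an explicit 0 ("disabled")
// survives; `||` would silently turn 0 back into the 600 s default.
function resolveIdleTimeoutMs(parsedMs) {
  return parsedMs ?? DEFAULT_CODEX_IDLE_TIMEOUT_MS;
}

console.log(parseIdleTimeoutMs("600")); // 600000
console.log(parseIdleTimeoutMs("1.5")); // 1500
console.log(resolveIdleTimeoutMs(undefined)); // 600000 (flag not given)
console.log(resolveIdleTimeoutMs(0)); // 0 (watchdog disabled)
```

Two inputs also land on 0 and therefore disable the watchdog rather than erroring: an empty value (for example an unset shell variable in `--codex_idle_timeout "$T"`), because `Number("")` is `0`, and a sub-millisecond value such as `0.0001`, because it rounds to 0.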
@@ -459,41 +497,98 @@ function createIsolatedCodexHome({
  approvalPolicy,
  }) {
  const codexHome = mkdtempSync(join(tmpdir(), "mcp-agents-codex-"));
- const sourceAuthPath = join(resolveCodexHome(), "auth.json");
- const targetAuthPath = join(codexHome, "auth.json");
- const configPath = join(codexHome, "config.toml");
+ // If auth copy or config write throws after the dir exists, remove the
+ // partially-prepared dir before rethrowing so it is never leaked.
+ try {
+ const sourceAuthPath = join(resolveCodexHome(), "auth.json");
+ const targetAuthPath = join(codexHome, "auth.json");
+ const configPath = join(codexHome, "config.toml");
+
+ if (existsSync(sourceAuthPath)) {
+ copyFileSync(sourceAuthPath, targetAuthPath);
+ }
+
+ writeFileSync(
+ configPath,
+ buildCodexBridgeConfig({
+ model,
+ modelReasoningEffort,
+ sandboxMode,
+ approvalPolicy,
+ }),
+ "utf8",
+ );
 
- if (existsSync(sourceAuthPath)) {
- copyFileSync(sourceAuthPath, targetAuthPath);
+ return codexHome;
+ } catch (err) {
+ try { rmSync(codexHome, { recursive: true, force: true }); } catch {}
+ throw err;
  }
+ }
 
- writeFileSync(
- configPath,
- buildCodexBridgeConfig({
- model,
- modelReasoningEffort,
- sandboxMode,
- approvalPolicy,
- }),
- "utf8",
- );
+ /**
+ * Build the text for codex's native `developer-instructions` field (a
+ * developer-role message) from a goal. This is the MCP-correct vehicle for a
+ * standing objective: it is higher-altitude than the user prompt and persists
+ * across the thread. It is NOT codex's `/goal` subsystem — that is a TUI-only
+ * slash command (parsed in codex-rs/tui, e.g. chatwidget/slash_dispatch.rs) and
+ * is not reachable through the MCP `codex`/`codex-reply` tool surface. Any
+ * caller-supplied developer instructions are preserved after the objective.
+ * @param {string} goal
+ * @param {string} [existing] caller-supplied developer-instructions, if any
+ * @returns {string}
+ */
+ function buildGoalDeveloperInstructions(goal, existing) {
+ const directive =
+ "Persistent objective for this Codex thread (a standing goal — keep " +
+ "pursuing it across turns unless explicitly superseded):\n" +
+ goal.trim();
+ const prior = typeof existing === "string" ? existing.trim() : "";
+ return prior ? `${directive}\n\n---\n\n${prior}` : directive;
+ }
 
- return codexHome;
+ /**
+ * Prepend a concise goal reminder to a prompt. Used for `codex-reply` turns,
+ * which expose no `developer-instructions` field, so the prompt is the only
+ * vehicle left to restate the standing objective. A blank goal leaves the
+ * prompt untouched.
+ * @param {string} prompt
+ * @param {string} goal
+ * @returns {string}
+ */
+ function applyGoalPreamble(prompt, goal) {
+ const trimmedGoal = (goal ?? "").trim();
+ const body = prompt ?? "";
+ if (!trimmedGoal) return body;
+ return `Reminder — standing objective for this thread: ${trimmedGoal}\n\n${body}`;
  }
 
  /**
  * Filter a single newline-delimited JSON-RPC message on its way to the codex
- * pass-through. Strips per-call model/effort overrides from `tools/call` so the
- * client cannot escape the pinned model/effort — both the top-level `model` arg
- * and the model-envelope keys inside a `config` override map. sandbox/cwd/
- * approval-policy (top-level and inside `config`) are intentionally left intact
- * so callers can steer them per call. Non-`tools/call`, unparseable, and
- * nothing-to-strip lines are returned byte-for-byte unchanged so the MCP framing
- * is preserved.
+ * pass-through. Two transforms, both confined to `tools/call`:
+ * 1. Strip per-call model/effort overrides — the top-level `model` arg and the
+ * model-envelope keys inside a `config` override map — so the client cannot
+ * escape the pinned model/effort. sandbox/cwd/approval-policy (top-level and
+ * inside `config`) are intentionally left intact so callers can steer them
+ * per call.
+ * 2. Goal injection — codex's native `/goal` is a TUI-only slash command, not
+ * reachable via MCP, so a wrapper-only `goal` arg is always stripped and the
+ * objective is injected the MCP-correct way: into `developer-instructions`
+ * (a developer-role message) for the initial `codex` call, or as a concise
+ * prompt reminder for a `codex-reply` turn (which has no
+ * `developer-instructions` field). A per-call `goal` overrides the
+ * server-wide `--goal` default (`opts.serverGoal`); only a string per-call
+ * goal overrides (a blank one suppresses the default for that call), while a
+ * non-string `goal` is dropped without disturbing the default.
+ * Non-`tools/call`, unparseable, and nothing-to-change lines are returned
+ * byte-for-byte unchanged so the MCP framing is preserved; any actual mutation
+ * re-serializes the message (the intended, framing-safe path for a changed
+ * message).
  * @param {string} line
+ * @param {{ serverGoal?: string }} [opts]
  * @returns {string}
  */
- function filterCodexToolCall(line) {
+ function filterCodexToolCall(line, opts = {}) {
  const trimmed = line.trim();
  if (!trimmed) return line;
 
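The two goal helpers above and the goal rules in the new `filterCodexToolCall` doc comment combine into a single per-line transform. The sketch below covers only the goal half (model/config stripping is omitted), abbreviates the injected wording, and assumes the tool arguments live in `params.arguments`, as the MCP `tools/call` request defines them (the code that extracts `args` is elided from this diff). The function name `injectGoal` is hypothetical.

```javascript
// Hypothetical sketch of the goal transform for one newline-delimited
// JSON-RPC message. Injected wording is abbreviated versus the real helpers.
function injectGoal(line, serverGoal) {
  let msg;
  try { msg = JSON.parse(line); } catch { return line; } // unparseable: pass through
  const args = msg?.method === "tools/call" ? msg.params?.arguments : undefined;
  if (!args || typeof args !== "object") return line;

  let changed = false;
  let goal = serverGoal;
  if ("goal" in args) {
    const perCall = args.goal;
    delete args.goal; // wrapper-only arg: never forwarded to Codex
    changed = true;
    if (typeof perCall === "string") goal = perCall; // "" suppresses the default
  }

  if (goal && goal.trim()) {
    if (msg.params.name === "codex") {
      // Objective first, any caller-supplied developer-instructions after it.
      const existing = args["developer-instructions"];
      const prior = typeof existing === "string" ? existing.trim() : "";
      const directive = `Persistent objective:\n${goal.trim()}`;
      args["developer-instructions"] = prior ? `${directive}\n\n---\n\n${prior}` : directive;
      changed = true;
    } else if (msg.params.name === "codex-reply" && typeof args.prompt === "string") {
      // Replies have no developer-instructions field: restate in the prompt.
      args.prompt = `Reminder: ${goal.trim()}\n\n${args.prompt}`;
      changed = true;
    }
  }
  // Unchanged messages keep their exact bytes; changed ones are re-serialized.
  return changed ? JSON.stringify(msg) : line;
}

// Helper for building test messages.
const call = (name, args) =>
  JSON.stringify({ jsonrpc: "2.0", id: 1, method: "tools/call", params: { name, arguments: args } });

console.log(injectGoal(call("codex", { prompt: "Refactor" }), "Keep API"));
console.log(injectGoal(call("codex-reply", { prompt: "Go on", goal: "" }), "Keep API"));
```

The second call shows why a stripped `goal` alone still counts as a change: the message must be re-serialized without the wrapper-only key even when nothing is injected.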
@@ -536,31 +631,83 @@ function filterCodexToolCall(line) {
  if (Object.keys(cfg).length === 0) delete args.config;
  }
 
- if (removed.length === 0) return line; // nothing pinned to strip — keep framing
+ // ── Goal injection ────────────────────────────────────────────────────────
+ // A per-call `goal` (any value) is always stripped — codex's schema has no
+ // `goal`, so it must never be forwarded. Only a STRING per-call goal counts as
+ // an override: a string (including "") replaces the server default for this
+ // call, so "" suppresses it. A non-string `goal` is malformed and is dropped
+ // without disturbing the configured server default. A blank effective goal
+ // injects nothing.
+ let goalLog;
+ let goalSource = "server";
+ let effectiveGoal = opts.serverGoal;
+ if ("goal" in args) {
+ const perCallGoal = args.goal;
+ delete args.goal;
+ goalLog = "stripped per-call goal arg";
+ if (typeof perCallGoal === "string") {
+ effectiveGoal = perCallGoal;
+ goalSource = "per-call";
+ }
+ }
+ if (effectiveGoal && effectiveGoal.trim()) {
+ if (msg.params?.name === "codex") {
+ // Initial `codex` call: the native developer-instructions field is the
+ // correct, thread-persistent vehicle for a standing objective.
+ args["developer-instructions"] = buildGoalDeveloperInstructions(
+ effectiveGoal,
+ args["developer-instructions"],
+ );
+ goalLog = `injected ${goalSource} goal into developer-instructions`;
+ } else if (msg.params?.name === "codex-reply" && typeof args.prompt === "string") {
+ // codex-reply has no developer-instructions field, so restate the
+ // objective as a concise prompt reminder. Any other (unknown/future) tool
+ // is left untouched — only the wrapper-only `goal` arg stripped above is
+ // removed, never the prompt — so the byte-for-byte invariant holds for
+ // tools this wrapper does not explicitly support.
+ args.prompt = applyGoalPreamble(args.prompt, effectiveGoal);
+ goalLog = `injected ${goalSource} goal into codex-reply prompt`;
+ }
+ }
 
- logErr(
- `[mcp-agents] codex passthrough: pinning model/effort, stripped: ${removed.join(", ")}`,
- );
+ if (removed.length === 0 && !goalLog) return line; // nothing changed — keep framing
+
+ if (removed.length > 0) {
+ logErr(
+ `[mcp-agents] codex passthrough: pinning model/effort, stripped: ${removed.join(", ")}`,
+ );
+ }
+ if (goalLog) {
+ logErr(`[mcp-agents] codex passthrough: ${goalLog}`);
+ }
  return JSON.stringify(msg);
  }
 
  /**
- * Spawn codex mcp-server as a pass-through. stdout/stderr flow straight back to
- * the client, but the client's stdin is intercepted line-by-line so per-call
- * model/config overrides are stripped before reaching codex.
- * @param {{ model?: string, modelReasoningEffort?: string, sandboxMode?: string, approvalPolicy?: string }} opts
+ * Spawn codex mcp-server as a pass-through. codex stdout is forwarded back to
+ * the client byte-for-byte, but the client's stdin is intercepted line-by-line
+ * so per-call model/config overrides are stripped before reaching codex. An
+ * idle watchdog converts an unbounded codex stall (no stdout/stderr while a
+ * request is in flight) into a synthesized JSON-RPC error so the caller never
+ * hangs forever.
+ * @param {{ model?: string, modelReasoningEffort?: string, sandboxMode?: string, approvalPolicy?: string, idleTimeoutMs?: number, goal?: string }} opts
  */
  function runCodexPassthrough({
  model,
  modelReasoningEffort,
  sandboxMode,
  approvalPolicy,
+ idleTimeoutMs,
+ goal,
  }) {
  const resolvedModel = model || DEFAULT_CODEX_MODEL;
  const resolvedModelReasoningEffort =
  modelReasoningEffort || DEFAULT_CODEX_MODEL_REASONING_EFFORT;
  const resolvedSandboxMode = sandboxMode || DEFAULT_CODEX_SANDBOX_MODE;
  const resolvedApprovalPolicy = approvalPolicy || DEFAULT_CODEX_APPROVAL_POLICY;
+ const resolvedIdleTimeoutMs = idleTimeoutMs ?? DEFAULT_CODEX_IDLE_TIMEOUT_MS;
+ // Server-wide default goal (string or undefined); per-call `goal` overrides it.
+ const resolvedGoal = goal;
  let isolatedCodexHome;
 
  try {
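The `runCodexPassthrough` doc comment above notes that codex stdout is now piped so it can be forwarded byte-for-byte and also observed for the watchdog. A minimal sketch of the observation side: it buffers chunks, decodes only complete newline-terminated frames (so a multi-byte UTF-8 character split across chunks is never decoded half-way), and reports the id of each JSON-RPC response. It deliberately omits the `MAX_BUFFER_BYTES` cap and the oversized-frame header scan that the real code adds; `createResponseObserver` is a hypothetical name.

```javascript
// Hypothetical sketch: observe a byte stream as newline-delimited JSON-RPC
// frames without altering it, reporting the id of every response frame.
function createResponseObserver(onResponseId) {
  let buf = Buffer.alloc(0);
  return (chunk) => {
    buf = buf.length ? Buffer.concat([buf, chunk]) : chunk;
    let nl;
    while ((nl = buf.indexOf(0x0a)) !== -1) {
      // Decode only a complete frame, then drop it from the buffer.
      const line = buf.subarray(0, nl).toString("utf8").trim();
      buf = buf.subarray(nl + 1);
      if (!line) continue;
      let msg;
      try { msg = JSON.parse(line); } catch { continue; } // not JSON: ignore
      const isResponse =
        msg && typeof msg === "object" && "id" in msg && ("result" in msg || "error" in msg);
      if (isResponse) onResponseId(msg.id);
    }
  };
}

const seen = [];
const observe = createResponseObserver((id) => seen.push(id));
// A response split across two chunks, followed by a notification (no id).
observe(Buffer.from('{"jsonrpc":"2.0","id":7,"res'));
observe(Buffer.from('ult":{}}\n{"jsonrpc":"2.0","method":"notifications/progress"}\n'));
console.log(seen);
```

In the wrapper, a reported id would be removed from the in-flight map; per the README, the idle timer itself resets on any Codex output, whether or not a chunk completes a frame.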
@@ -595,70 +742,480 @@ function runCodexPassthrough({
595
742
  `[mcp-agents] passthrough: codex ${args.join(" ")} ` +
596
743
  `(model=${resolvedModel}, reasoning_effort=${resolvedModelReasoningEffort}, ` +
597
744
  `sandbox_mode=${resolvedSandboxMode}, approval_policy=${resolvedApprovalPolicy}, ` +
598
- `isolated_home=true)`,
745
+ `goal=${resolvedGoal && resolvedGoal.trim() ? "set" : "none"}, ` +
746
+ `idle_timeout_ms=${resolvedIdleTimeoutMs}, isolated_home=true)`,
599
747
  );
600
748
 
601
749
  const child = spawn("codex", args, {
602
750
  env: { ...process.env, CODEX_HOME: isolatedCodexHome },
603
- // stdin is piped (not inherited) so we can strip per-call overrides;
604
- // stdout stays inherited so codex responses reach the client untouched.
605
- stdio: ["pipe", "inherit", "pipe"],
751
+ // stdin is piped so we can strip per-call overrides; stdout is piped (not
752
+ // inherited) so the wrapper can both forward responses byte-for-byte AND
753
+ // observe them for the idle watchdog. detached:true puts codex in its own
754
+ // process group so a stall is torn down group-wide (mirrors runCli).
755
+ detached: true,
756
+ stdio: ["pipe", "pipe", "pipe"],
606
757
  });
607
758
 
759
+ const NEWLINE = 0x0a;
760
+ // Clean the isolated home on any exit path, not just the ones we route through
761
+ // hardExit() (e.g. a global uncaughtException handler calling process.exit).
762
+ process.once("exit", () => cleanupIsolatedCodexHome());
763
+
764
+ // Install signal teardown IMMEDIATELY after spawn (before the heavier wiring
765
+ // below) so a signal in the startup window can never orphan the detached
766
+ // group. `finalize` is a forward reference — safe because the handler body
767
+ // only runs when a signal fires, which is after this synchronous setup
768
+ // completes and `finalize` is defined.
769
+ for (const sig of ["SIGTERM", "SIGINT", "SIGHUP"]) {
770
+ process.once(sig, () => {
771
+ finalize({
772
+ reason: `signal ${sig}`,
773
+ emit: false,
774
+ exitCode: 128 + SIGNAL_CODES[sig],
775
+ });
776
+ });
777
+ }
778
+
779
+ // ── In-flight request tracking ──────────────────────────────────────────
780
+ // Client requests (id + method) awaiting a codex response. Keyed by a
781
+ // type-preserving key so JSON-RPC `1` (number) and `"1"` (string) never
782
+ // collide. `canceled` marks ids the client gave up on (notifications/
783
+ // cancelled): we never synthesize a response for them, but they still count
784
+ // toward teardown so a canceled-but-wedged codex is not left running.
785
+ const inFlight = new Map();
786
+ const idKey = (id) => `${typeof id}:${id}`;
787
+ const addInFlight = (id) => {
788
+ if (id == null) return;
789
+ const key = idKey(id);
790
+ if (!inFlight.has(key)) inFlight.set(key, { id, canceled: false });
791
+ };
792
+ const clearInFlight = (id) => {
793
+ if (id != null) inFlight.delete(idKey(id));
794
+ };
795
+ const cancelInFlight = (id) => {
796
+ const entry = id == null ? undefined : inFlight.get(idKey(id));
797
+ if (entry) entry.canceled = true;
798
+ };
799
+ const hasEmittableInFlight = () => {
800
+ for (const entry of inFlight.values()) if (!entry.canceled) return true;
801
+ return false;
802
+ };
803
+
804
+ // ── Liveness / lifecycle state ──────────────────────────────────────────
805
+ let finalizing = false;
806
+ let exited = false;
807
+ let stdoutPaused = false; // process.stdout backpressured (downstream, not idle)
808
+ let idleTimer;
809
+ let lastForwardedByteWasNewline = true; // nothing forwarded yet
810
+ let stdoutObsBuf = Buffer.alloc(0); // observation copy of codex stdout
811
+ let skippingFrame = false; // mid-skip of an oversized stdout frame (resync at \n)
812
+ let droppedFrameResponseId; // partial oversized frame's classified id (cleared at its newline)
813
+ let observationDropLogged = false; // log the first observation-cap drop only
814
+
815
+ const killGroup = (signal) => {
816
+ try {
817
+ if (child.pid) process.kill(-child.pid, signal);
818
+ else child.kill(signal);
819
+ } catch {
820
+ try { child.kill(signal); } catch {}
821
+ }
822
+ };
823
+
824
+ const clearIdle = () => {
825
+ if (idleTimer) {
826
+ clearTimeout(idleTimer);
827
+ idleTimer = undefined;
828
+ }
829
+ };
830
+ const armIdle = () => {
831
+ clearIdle();
832
+ // No watchdog when disabled, while finalizing, or while downstream is
833
+ // backpressured (blocked downstream != idle upstream).
834
+ if (!(resolvedIdleTimeoutMs > 0) || finalizing || stdoutPaused) return;
835
+ idleTimer = setTimeout(onIdle, resolvedIdleTimeoutMs);
836
+ };
837
+ const resetIdle = armIdle;
838
+
839
+ // Parse one complete codex->client stdout frame (observation only — the raw
840
+ // bytes are forwarded separately). Clears an id once its result/error lands.
841
+ const observeOutgoingLine = (line) => {
842
+ const trimmed = line.trim();
843
+ if (!trimmed) return;
844
+ let msg;
845
+ try { msg = JSON.parse(trimmed); } catch { return; }
846
+ if (
847
+ msg && typeof msg === "object" && "id" in msg &&
848
+ ("result" in msg || "error" in msg)
849
+ ) {
850
+ clearInFlight(msg.id);
851
+ }
852
+ };
853
+
854
+ // Classify a (possibly oversized) frame from a bounded prefix: return the
855
+ // request id iff it is clearly a RESPONSE — a top-level "result"/"error" with
856
+ // the "id" appearing before it and no top-level "method" preceding it.
857
+ // Assumes codex's (serde_json) serialization order: a response is
858
+ // {jsonrpc,id,result|error} (id/result within the first handful of bytes), and
859
+ // a notification/request emits its top-level "method" before "params". Under
860
+ // that contract a nested "result"/"id" inside a non-response's params cannot be
861
+ // misread as a response. Only ever consulted for frames too large to buffer.
862
+ const FRAME_HEADER_SCAN = 8192;
863
+ const peekResponseId = (prefix) => {
864
+ const s = prefix
865
+ .subarray(0, Math.min(prefix.length, FRAME_HEADER_SCAN))
866
+ .toString("utf8");
867
+ const resultAt = s.search(/"(?:result|error)"\s*:/);
868
+ if (resultAt === -1) return undefined; // no result/error -> not a response
869
+ const methodAt = s.search(/"method"\s*:/);
870
+ if (methodAt !== -1 && methodAt < resultAt) return undefined; // request/notif
871
+ // Capture the full id TOKEN (number or quoted string) and JSON-decode it so
872
+ // the value matches what noteInbound stored via JSON.parse — otherwise an
873
+ // escaped string id (e.g. "a\\b") would not equal the tracked key.
874
+ const idMatch = s
875
+ .slice(0, resultAt)
876
+ .match(/"id"\s*:\s*(-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|"(?:[^"\\]|\\.)*")/);
877
+ if (!idMatch) return undefined;
878
+ try { return JSON.parse(idMatch[1]); } catch { return undefined; }
879
+ };
880
+
881
+ const logObservationDropOnce = () => {
882
+ if (!observationDropLogged) {
883
+ logErr(
884
+ "[mcp-agents] codex passthrough: stdout frame exceeded observation cap; " +
885
+ "classifying it via a bounded header scan (forwarding unaffected)",
886
+ );
887
+ observationDropLogged = true;
888
+ }
889
+ };
890
+
891
+ // Resolve a dropped frame's effect on id-tracking. The frame's raw bytes were
892
+ // already forwarded to the client. If a bounded header scan proves it is the
893
+ // RESPONSE for an in-flight id, clear exactly that id — so we neither
894
+ // double-respond with a synthetic error nor falsely idle-kill a healthy
895
+ // session once codex goes quiet. If it is NOT a response (notification /
896
+ // server->client request) or cannot be classified, leave the in-flight ids
897
+ // tracked so a genuine post-frame stall is still caught. ONLY call this once
898
+ // the frame is COMPLETE (its terminating newline has been seen): clearing on a
899
+ // still-partial frame would prematurely untrack an id whose response codex may
900
+ // never finish writing, re-introducing a hang.
901
+ const resolveDroppedFrame = (prefix) => {
902
+ const id = peekResponseId(prefix);
903
+ if (id !== undefined) clearInFlight(id);
904
+ };
905
+
906
+ // Accumulate codex stdout into the observation buffer and parse each complete
907
+ // frame to clear in-flight ids. Soft-bounded by MAX_BUFFER_BYTES so a
908
+ // pathologically large single frame cannot exhaust memory — the bound is
909
+ // approximate (a frame may transiently allocate up to one stream chunk beyond
910
+ // the cap before being dropped). The RAW bytes are always forwarded untouched
911
+ // by the caller regardless. A dropped frame is handled by onObservedFrameDropped().
912
+ const observeOutgoing = (chunk) => {
913
+ let data = chunk;
914
+ if (skippingFrame) {
915
+ const nl = data.indexOf(NEWLINE);
916
+ if (nl === -1) return; // still inside the oversized frame
917
+ // The oversized frame just COMPLETED. Apply the deferred clear now: if its
918
+ // header looked like a response, the response genuinely finished, so clear
919
+ // that id. (If codex had stalled mid-frame, this newline never arrives and
920
+ // the id stays tracked so the watchdog still catches the stall.)
921
+ skippingFrame = false;
922
+ if (droppedFrameResponseId !== undefined) {
923
+ clearInFlight(droppedFrameResponseId);
924
+ droppedFrameResponseId = undefined;
925
+ }
926
+ data = data.subarray(nl + 1); // resume parsing after the frame boundary
927
+ }
928
+ stdoutObsBuf = stdoutObsBuf.length ? Buffer.concat([stdoutObsBuf, data]) : data;
929
+ let nl;
930
+ while ((nl = stdoutObsBuf.indexOf(NEWLINE)) !== -1) {
931
+ if (nl > MAX_BUFFER_BYTES) {
932
+ // A COMPLETE frame larger than the cap: it fully arrived, so classify it
933
+ // from a bounded header prefix and clear its id now (no huge alloc).
934
+ logObservationDropOnce();
935
+ resolveDroppedFrame(stdoutObsBuf.subarray(0, nl));
936
+ stdoutObsBuf = stdoutObsBuf.subarray(nl + 1);
937
+ continue;
938
+ }
939
+ const line = stdoutObsBuf.subarray(0, nl).toString("utf8");
940
+ stdoutObsBuf = stdoutObsBuf.subarray(nl + 1);
941
+ observeOutgoingLine(line);
942
+ }
943
+ if (stdoutObsBuf.length > MAX_BUFFER_BYTES) {
944
+ // A PARTIAL frame already past the cap with no newline yet: classify the
945
+ // prefix but DEFER clearing to the frame's newline (above) — clearing now
946
+ // would untrack an id whose response codex might never finish, hanging it.
947
+ logObservationDropOnce();
948
+ droppedFrameResponseId = peekResponseId(stdoutObsBuf);
949
+ stdoutObsBuf = Buffer.alloc(0);
+ skippingFrame = true;
+ }
+ };
+
+ const hardExit = (code) => {
+ if (exited) return;
+ exited = true;
+ clearIdle();
+ cleanupIsolatedCodexHome();
+ process.exit(code);
+ };
+ const flushThenExit = (code) => {
+ if (exited) return;
+ if (process.stdout.writableLength === 0) {
+ hardExit(code);
+ return;
+ }
+ // Ref'd safety timer guarantees exit if 'drain' never fires (client gone).
+ const safety = setTimeout(() => hardExit(code), 2_000);
+ process.stdout.once("drain", () => {
+ clearTimeout(safety);
+ hardExit(code);
+ });
+ };
+
+ // Single, idempotent teardown. `emit` controls whether open (non-canceled)
+ // requests get a synthetic JSON-RPC error before exit. The detached group is
+ // killed on EVERY teardown path so codex and any descendants are never
+ // orphaned.
+ const finalize = ({ reason, emit, exitCode }) => {
+ if (finalizing) return;
+ finalizing = true;
+ clearIdle();
+ logErr(`[mcp-agents] codex passthrough finalize: ${reason}`);
+
+ // Stop forwarding further codex stdout so a late real response cannot race
+ // the synthetic error onto the wire after we've taken over the stream.
+ try { child.stdout?.pause(); } catch {}
+
+ // Kill the whole detached group so codex AND any descendants it spawned are
+ // reaped on EVERY teardown path — never orphaned. On abort paths (idle /
+ // signal / EPIPE / fatal) codex is still alive, so there is no PID-reuse
+ // risk; on a natural close/spawn-error this runs synchronously right after
+ // the child was reaped (a negligible reuse window) to clean up anything
+ // codex left behind in its group. A SIGKILL on an already-empty group is a
+ // harmless ESRCH (swallowed by killGroup).
+ killGroup("SIGKILL");
+
+ if (emit && hasEmittableInFlight()) {
+ // Framing recovery: if codex left a dangling partial line on the wire, try
+ // to parse it (it may itself be the real response) and terminate it with a
+ // newline so the synthetic frame cannot glue onto a half-written line.
+ if (stdoutObsBuf.length > 0) {
+ observeOutgoingLine(stdoutObsBuf.toString("utf8"));
+ stdoutObsBuf = Buffer.alloc(0);
+ try { process.stdout.write("\n"); } catch {}
+ lastForwardedByteWasNewline = true;
+ } else if (!lastForwardedByteWasNewline) {
+ try { process.stdout.write("\n"); } catch {}
+ lastForwardedByteWasNewline = true;
+ }
+
+ for (const entry of inFlight.values()) {
+ if (entry.canceled) continue;
+ const frame = {
+ jsonrpc: "2.0",
+ id: entry.id,
+ error: {
+ code: -32001,
+ message:
+ `mcp-agents: codex pass-through aborted before responding ` +
+ `(${reason}); the request was still open. Any applied edits may ` +
+ `exist — verify the tree.`,
+ },
+ };
+ try { process.stdout.write(`${JSON.stringify(frame)}\n`); } catch {}
+ }
+ }
+
+ flushThenExit(exitCode);
+ };
+
+ // Route the global uncaughtException/unhandledRejection handlers through the
+ // same teardown so codex's DETACHED group is always killed — otherwise those
+ // handlers call process.exit() directly and orphan codex (the 'exit' handler
+ // only deletes CODEX_HOME, it cannot reap a detached group).
+ fatalShutdown = (reason, code) =>
+ finalize({ reason: `fatal: ${reason}`, emit: true, exitCode: code ?? 1 });
+
+ function onIdle() {
+ idleTimer = undefined;
+ if (finalizing) return;
+ if (hasEmittableInFlight()) {
+ finalize({
+ reason: `idle timeout (${Math.round(resolvedIdleTimeoutMs / 1000)}s)`,
+ emit: true,
+ exitCode: 1,
+ });
+ return;
+ }
+ // Only canceled requests left -> tear down quietly. Nothing open at all ->
+ // healthy idle between calls, just re-arm.
+ if (inFlight.size > 0) {
+ finalize({
+ reason: "idle timeout (canceled-only)",
+ emit: false,
+ exitCode: 1,
+ });
+ } else {
+ armIdle();
+ }
+ }
+
  child.stderr.on("data", (chunk) => {
+ resetIdle();
  logErr(`[codex] ${chunk.toString().trimEnd()}`);
  });
 
+ // Forward codex stdout to the client byte-for-byte (raw Buffer) and keep a
+ // parallel observation buffer (split on the newline BYTE) to clear in-flight
+ // ids as their responses land. Raw chunks are forwarded; reconstructed lines
+ // are never written back.
+ child.stdout.on("data", (chunk) => {
+ if (finalizing) return; // stream ownership has been taken over
+ resetIdle();
+
+ // Forward the raw bytes FIRST so a bug in observation can never affect the
+ // byte-for-byte passthrough (observation is best-effort id-tracking only).
+ if (chunk.length > 0) {
+ lastForwardedByteWasNewline = chunk[chunk.length - 1] === NEWLINE;
+ }
+ const ok = process.stdout.write(chunk);
+ try {
+ observeOutgoing(chunk); // bounded parse-for-ids; never alters forwarded bytes
+ } catch (err) {
+ const msg = err instanceof Error ? err.message : String(err);
+ logErr(`[mcp-agents] codex passthrough: stdout observation error (ignored): ${msg}`);
+ }
+ if (!ok) {
+ // Downstream full: pause codex and suspend the idle watchdog until the
+ // client drains, so a slow reader is never mistaken for a stalled codex.
+ // Trade-off: a client that never drains keeps the request open with no
+ // watchdog — but a synthetic error could not be delivered to it anyway.
+ stdoutPaused = true;
+ clearIdle();
+ child.stdout.pause();
+ }
+ });
+
+ process.stdout.on("drain", () => {
+ if (!stdoutPaused) return;
+ stdoutPaused = false;
+ if (finalizing) return;
+ child.stdout.resume();
+ resetIdle();
+ });
+
+ process.stdout.on("error", (err) => {
+ // Client went away mid-write: nothing left to answer, tear codex down.
+ if (err && err.code === "EPIPE") {
+ finalize({ reason: "stdout EPIPE", emit: false, exitCode: 0 });
+ }
+ });
+
  // Pump client stdin -> codex stdin, splitting on the newline BYTE (0x0a) that
  // delimits MCP stdio JSON-RPC frames. Buffering raw bytes (not per-chunk
  // strings) avoids corrupting a multibyte UTF-8 sequence that straddles two
  // read chunks, which would otherwise break the byte-for-byte passthrough.
  child.stdin.on("error", () => {}); // ignore EPIPE if codex exits early
- const NEWLINE = 0x0a;
+
+ // Read-only inbound tracking: record client requests (id + method) as
+ // in-flight and honor cancellations. Never mutates what is forwarded —
+ // filterCodexToolCall remains the sole authority on the forwarded bytes.
+ const noteInbound = (line) => {
+ const trimmed = line.trim();
+ if (!trimmed) return;
+ let msg;
+ try { msg = JSON.parse(trimmed); } catch { return; }
+ if (!msg || typeof msg !== "object") return;
+ // (Watchdog liveness is reset at the byte level in the stdin 'data' handler,
+ // so even an elicitation response — bare id, no method — keeps a healthy
+ // interactive flow alive.)
+ if (msg.method === "notifications/cancelled") {
+ cancelInFlight(msg.params?.requestId);
+ return;
+ }
+ // A client message awaits a response iff it carries BOTH an id and a method.
+ // A bare id with no method is a *response* to a codex elicitation — skip it
+ // for in-flight tracking.
+ if (msg.id != null && typeof msg.method === "string") {
+ addInFlight(msg.id);
+ }
+ };
+
  let stdinBuf = Buffer.alloc(0);
  process.stdin.on("data", (chunk) => {
+ // ANY inbound bytes mean the client side of the exchange is alive — even a
+ // large/slow elicitation response arriving across chunks without a newline.
+ // Reset the watchdog here at the BYTE level (not per parsed line): a truly
+ // stalled exchange (codex silent AND client sending nothing) still produces
+ // no inbound, so the genuine stall is still caught.
+ resetIdle();
  stdinBuf = stdinBuf.length ? Buffer.concat([stdinBuf, chunk]) : chunk;
  let nl;
  while ((nl = stdinBuf.indexOf(NEWLINE)) !== -1) {
  const line = stdinBuf.subarray(0, nl).toString("utf8");
  stdinBuf = stdinBuf.subarray(nl + 1);
- child.stdin.write(`${filterCodexToolCall(line)}\n`);
+ noteInbound(line);
+ child.stdin.write(`${filterCodexToolCall(line, { serverGoal: resolvedGoal })}\n`);
  }
  });
  process.stdin.on("error", () => {});
  process.stdin.on("end", () => {
  if (stdinBuf.length > 0) {
- child.stdin.write(filterCodexToolCall(stdinBuf.toString("utf8")));
+ const line = stdinBuf.toString("utf8");
+ noteInbound(line);
+ child.stdin.write(filterCodexToolCall(line, { serverGoal: resolvedGoal }));
  }
  child.stdin.end();
  });
 
- for (const sig of ["SIGTERM", "SIGINT", "SIGHUP"]) {
- process.once(sig, () => {
- child.kill(sig);
- setTimeout(() => {
- child.kill("SIGKILL");
- cleanupIsolatedCodexHome();
- process.exit(128 + SIGNAL_CODES[sig]);
- }, 5000).unref();
- });
- }
-
  child.on("error", (err) => {
- cleanupIsolatedCodexHome();
  logErr(`[mcp-agents] failed to start codex: ${err.message}`);
- process.exitCode = 1;
+ // codex failed to start. The fix that matters is that we EXIT (instead of
+ // leaving a childless wrapper alive on the client's open stdin, which used
+ // to hang). `emit` synthesizes an error only if a request was already
+ // tracked; spawn 'error' usually fires before any stdin is read, so the
+ // client typically just sees the server exit — the conventional
+ // "server failed to start".
+ finalize({
+ reason: `codex spawn error: ${err.message}`,
+ emit: true,
+ exitCode: 1,
+ });
  });
 
- child.on("exit", (code, signal) => {
- cleanupIsolatedCodexHome();
- if (signal) {
- logErr(`[mcp-agents] codex killed by ${signal}`);
- process.exitCode = 128 + (SIGNAL_CODES[signal] ?? 0);
- } else {
- if (code !== 0) logErr(`[mcp-agents] codex exited with code ${code}`);
- process.exitCode = code ?? 1;
+ // codex death is handled via BOTH 'exit' and 'close':
+ // - 'exit' fires when the codex PROCESS terminates. A descendant that
+ // inherited codex's stdio can hold those pipes open, delaying or even
+ // preventing 'close' (and would be orphaned), so we kill the group here to
+ // reap it which also lets 'close' fire. A ref'd fallback guarantees
+ // teardown even if a descendant escaped the group (setsid) so 'close'
+ // never arrives.
+ // - 'close' fires once all stdio is drained, so codex's final response has
+ // been delivered and its id cleared — only THEN do we decide whether to
+ // synthesize, which avoids double-responding.
+ let childExitInfo = null;
+ const onChildGone = () => {
+ const code = childExitInfo?.code;
+ const signal = childExitInfo?.signal;
+ if (signal) logErr(`[mcp-agents] codex killed by ${signal}`);
+ else if (code != null && code !== 0) {
+ logErr(`[mcp-agents] codex exited with code ${code}`);
  }
+ finalize({
+ reason: signal ? `codex killed by ${signal}` : `codex exited (code ${code})`,
+ emit: true,
+ exitCode: signal ? 128 + (SIGNAL_CODES[signal] ?? 0) : (code ?? 1),
+ });
+ };
+
+ child.on("exit", (code, signal) => {
+ childExitInfo = { code, signal };
+ killGroup("SIGKILL");
+ setTimeout(onChildGone, 2_000);
+ });
+ child.on("close", (code, signal) => {
+ if (!childExitInfo) childExitInfo = { code, signal };
+ onChildGone();
  });
  }
 
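The stdout observer in the hunk above follows a reusable pattern. It splits a byte stream on the newline byte, decodes only complete frames, and caps how much of a single frame is buffered. When a frame overflows mid-stream, its report is deferred until that frame's newline actually arrives. Below is a minimal sketch that assumes Node's global `Buffer`. The names `createFrameObserver`, `onLine`, `onFrameDropped`, and `HEADER_BYTES` are hypothetical. Where `peekResponseId` extracts a response id, this sketch simply reports a copied prefix of the dropped frame.

```javascript
// Sketch only: createFrameObserver, onLine, onFrameDropped and HEADER_BYTES
// are hypothetical names chosen for illustration, not mcp-agents APIs.
const NEWLINE = 0x0a;
const HEADER_BYTES = 256; // bounded prefix kept from an oversized frame

function createFrameObserver({ maxBytes, onLine, onFrameDropped }) {
  let buf = Buffer.alloc(0);
  let skipping = false;     // discarding the tail of an oversized frame
  let droppedHeader = null; // its prefix, reported once its newline arrives

  return function observe(chunk) {
    let data = chunk;
    if (skipping) {
      const nl = data.indexOf(NEWLINE);
      if (nl === -1) return; // still inside the oversized frame
      // The oversized frame has now completed, so report it (deferred).
      skipping = false;
      onFrameDropped(droppedHeader);
      droppedHeader = null;
      data = data.subarray(nl + 1); // resume after the frame boundary
    }
    buf = buf.length ? Buffer.concat([buf, data]) : data;
    let nl;
    while ((nl = buf.indexOf(NEWLINE)) !== -1) {
      if (nl > maxBytes) {
        // Complete but oversized: report a small copied prefix instead.
        onFrameDropped(Buffer.from(buf.subarray(0, Math.min(nl, HEADER_BYTES))));
      } else {
        // Decode only whole frames, so a multibyte UTF-8 sequence split
        // across two chunks is never mangled.
        onLine(buf.subarray(0, nl).toString("utf8"));
      }
      buf = buf.subarray(nl + 1);
    }
    if (buf.length > maxBytes) {
      // Partial frame already past the cap: keep a copied prefix, drop the
      // rest, and defer the report until the frame's newline shows up.
      droppedHeader = Buffer.from(buf.subarray(0, HEADER_BYTES));
      buf = Buffer.alloc(0);
      skipping = true;
    }
  };
}

// Usage: a 16-byte cap, with one frame that overflows across chunks.
const seen = [];
const dropped = [];
const observe = createFrameObserver({
  maxBytes: 16,
  onLine: (line) => seen.push(line),
  onFrameDropped: (header) => dropped.push(header.toString("utf8")),
});
observe(Buffer.from('{"id":1}\n{"id":2,"x":"'));
observe(Buffer.from("a".repeat(16)));   // partial frame passes the cap
observe(Buffer.from('"}\n{"id":3}\n')); // its newline arrives here
observe(Buffer.from([0xc3]));           // "é" split across two chunks
observe(Buffer.from([0xa9, NEWLINE]));
```

Deferring the report mirrors the reasoning in the diff's comments. Acting as soon as a partial frame crosses the cap would untrack a response that codex might never finish.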
@@ -673,6 +1230,8 @@ async function main() {
  modelReasoningEffort,
  sandboxMode,
  approvalPolicy,
+ goal,
+ codexIdleTimeoutMs,
  defaultTimeoutMs,
  } = parseArgs();
  const backend = CLI_BACKENDS[providerName];
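The in-flight bookkeeping in the first hunk is spread across three places: `noteInbound` on the client side, id clearing in the stdout observer, and the three-way branch in `onIdle`. It can be modeled as one small pure class, which makes the rules easy to test in isolation. This is a minimal sketch with several assumptions:

- `InFlightTracker` and its methods are hypothetical.
- Keying ids by their JSON text is a design choice of this sketch.
- Treating a frame with an id, no `method`, and either `result` or `error` as a response is also an assumption, because `observeOutgoingLine` is not shown in this diff.

The error code -32001 matches the synthetic frame in `finalize`. It falls in the -32000 to -32099 range that JSON-RPC 2.0 reserves for implementation-defined server errors.

```javascript
// Sketch only: InFlightTracker and its methods are hypothetical. The rules
// follow the diff's comments (a request has both an id and a method;
// notifications/cancelled marks an id canceled). Clearing on "result" or
// "error" is an assumption about how responses are recognized.
class InFlightTracker {
  constructor() {
    this.entries = new Map(); // key -> { id, canceled }
  }

  // JSON-RPC ids may be numbers or strings; keying on the JSON text keeps
  // 1 and "1" distinct (a design choice for this sketch).
  static key(id) {
    return JSON.stringify(id);
  }

  static parse(line) {
    const trimmed = line.trim();
    if (!trimmed) return null;
    try {
      const msg = JSON.parse(trimmed);
      return msg && typeof msg === "object" ? msg : null;
    } catch {
      return null;
    }
  }

  // Client -> server traffic.
  noteInbound(line) {
    const msg = InFlightTracker.parse(line);
    if (!msg) return;
    if (msg.method === "notifications/cancelled") {
      const entry = this.entries.get(InFlightTracker.key(msg.params?.requestId));
      if (entry) entry.canceled = true;
      return;
    }
    // A bare id without a method answers a server-side request (e.g. an
    // elicitation), so nothing awaits a reply for it.
    if (msg.id != null && typeof msg.method === "string") {
      this.entries.set(InFlightTracker.key(msg.id), { id: msg.id, canceled: false });
    }
  }

  // Server -> client traffic: a response carries an id plus result/error.
  noteOutbound(line) {
    const msg = InFlightTracker.parse(line);
    if (!msg || msg.id == null || msg.method !== undefined) return;
    if ("result" in msg || "error" in msg) {
      this.entries.delete(InFlightTracker.key(msg.id));
    }
  }

  // Same three cases as onIdle: open requests, canceled-only, nothing open.
  idleDecision() {
    const open = [...this.entries.values()];
    if (open.some((e) => !e.canceled)) return "emit-errors";
    return open.length > 0 ? "quiet-teardown" : "rearm";
  }

  // One synthetic error frame per open, non-canceled request.
  syntheticErrors(reason) {
    return [...this.entries.values()]
      .filter((e) => !e.canceled)
      .map((e) =>
        JSON.stringify({
          jsonrpc: "2.0",
          id: e.id,
          error: { code: -32001, message: `aborted before responding (${reason})` },
        }),
      );
  }
}

// Usage
const tracker = new InFlightTracker();
tracker.noteInbound('{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{}}');
tracker.noteInbound('{"jsonrpc":"2.0","id":2,"method":"tools/call","params":{}}');
tracker.noteInbound('{"jsonrpc":"2.0","id":7,"result":{}}'); // elicitation reply: not tracked
tracker.noteInbound('{"jsonrpc":"2.0","method":"notifications/cancelled","params":{"requestId":2}}');
tracker.noteOutbound('{"jsonrpc":"2.0","id":1,"result":{"content":[]}}');
```

After this sequence, only the canceled request 2 remains, so the idle decision is a quiet teardown with no synthetic errors.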
@@ -690,6 +1249,8 @@ async function main() {
  modelReasoningEffort,
  sandboxMode,
  approvalPolicy,
+ goal,
+ idleTimeoutMs: codexIdleTimeoutMs,
  });
  return;
  }
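The `idleTimeoutMs` option passed here feeds the passthrough's idle watchdog. The diff shows how it is driven:

- Activity on codex stdout/stderr or client stdin calls `resetIdle()`.
- A back-pressured `write()` calls `clearIdle()`.
- The stdout `'drain'` handler resumes codex and calls `resetIdle()`.

The helpers themselves are not defined in this excerpt. The following is therefore only a minimal sketch of the pattern under assumed semantics. `createIdleWatchdog` and its methods are hypothetical. Ignoring resets while suspended is a choice made here, not confirmed mcp-agents behavior.

```javascript
// Sketch only: createIdleWatchdog and its methods are hypothetical names;
// setTimeout/clearTimeout are the only real APIs used.
function createIdleWatchdog({ timeoutMs, onIdle }) {
  let timer;
  let suspended = false;

  const clear = () => {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
  };

  // Restart the countdown on any sign of activity. Ignored while suspended,
  // so a slow reader is not mistaken for a stalled child (assumed policy).
  const reset = () => {
    if (suspended) return;
    clear();
    timer = setTimeout(() => {
      timer = undefined;
      onIdle();
    }, timeoutMs);
  };

  return {
    reset,
    clear,
    suspend() { // e.g. when stdout.write() returns false
      suspended = true;
      clear();
    },
    resume() { // e.g. on the stdout 'drain' event
      suspended = false;
      reset();
    },
    isArmed: () => timer !== undefined,
  };
}

// Usage
let idleFired = 0;
const watchdog = createIdleWatchdog({ timeoutMs: 20, onIdle: () => { idleFired += 1; } });
watchdog.reset();   // activity: countdown running
watchdog.suspend(); // back-pressure: countdown stopped
watchdog.reset();   // ignored while suspended
const armedWhileSuspended = watchdog.isArmed();
watchdog.resume();  // drained: countdown restarts
```

Stopping the countdown during back-pressure matches the trade-off noted in the stdout handler. A client that never drains leaves the request open with no watchdog, but a synthetic error could not be delivered to it anyway.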