mcp-agents 0.11.0 → 0.12.1

Files changed (3)
  1. package/README.md +61 -4
  2. package/package.json +1 -1
  3. package/server.js +401 -41
package/README.md CHANGED
@@ -86,10 +86,11 @@ or Gemini during bridge calls.
86
86
  | `--model` | `gpt-5.5` | `model` |
87
87
  | `--model_reasoning_effort` | `xhigh` | `model_reasoning_effort` |
88
88
 
89
- Hardcoded defaults: `sandbox_mode=read-only`, `approval_policy=never`,
90
- `features.multi_agent=false`.
89
+ Other startup defaults: `sandbox_mode=workspace-write`, `approval_policy=never`
90
+ (both configurable via `--sandbox_mode` / `--approval_policy`, and steerable per
91
+ call); `features.multi_agent=false` is fixed.
91
92
 
92
- Startup flags (`--model`, `--model_reasoning_effort`) set the model and effort for the native Codex MCP server. Per-call `model` and `config` arguments are stripped from `tools/call` before they reach Codex, so a client cannot override the pinned model/effort (or the read-only/never sandbox config) for a single call. For example, this request:
93
+ Startup flags (`--model`, `--model_reasoning_effort`) set the model and effort for the native Codex MCP server. Per-call `model` and the model/effort keys inside a `config` override are stripped from `tools/call` before they reach Codex, so a client cannot override the pinned model/effort for a single call (`sandbox`, `cwd`, and `approval-policy` top-level and the matching `config` keys are intentionally left steerable per call). For example, this request:
93
94
 
94
95
  ```json
95
96
  {
@@ -101,6 +102,62 @@ Startup flags (`--model`, `--model_reasoning_effort`) set the model and effort f
101
102
 
102
103
  is forwarded to Codex as `{ "prompt": "Review this diff" }`. Change the model or effort at server startup instead.
103
104
 
105
+ **Goal injection.** You can give Codex a persistent objective. Set one at server
106
+ startup with `--goal "<text>"`, or per call with a `goal` argument in `tools/call`:
107
+
108
+ ```json
109
+ { "prompt": "Refactor the parser", "goal": "Keep the public API unchanged" }
110
+ ```
111
+
112
+ For the initial `codex` call the objective is injected into Codex's native
113
+ `developer-instructions` field (a developer-role message), so this is forwarded
114
+ to Codex as:
115
+
116
+ ```json
117
+ {
118
+ "prompt": "Refactor the parser",
119
+ "developer-instructions": "Persistent objective for this Codex thread (a standing goal — keep pursuing it across turns unless explicitly superseded):\nKeep the public API unchanged"
120
+ }
121
+ ```
122
+
123
+ A developer message persists for the whole thread, so `codex-reply` follow-ups
124
+ inherit the objective automatically. Because `codex-reply` has no
125
+ `developer-instructions` field, a per-call `goal` on a reply is instead added as
126
+ a concise `Reminder — standing objective for this thread: …` preamble on the
127
+ prompt. Any caller-supplied `developer-instructions` are preserved, with the
128
+ objective merged ahead of them.
129
+
130
+ The wrapper-only `goal` argument is always stripped before it reaches Codex (it
131
+ is never a native Codex parameter). A per-call `goal` overrides the `--goal`
132
+ default for that call; a per-call empty `goal` (`""`) suppresses the default for
133
+ that one call; a non-string `goal` is ignored (the `--goal` default still
134
+ applies).
135
+
136
+ So a client's model knows it can pass `goal`, the pass-through advertises it: it
137
+ rewrites its own `tools/list` response to declare an optional `goal` property on
138
+ the `codex` and `codex-reply` tool schemas (models only generate arguments
139
+ declared in a tool's `inputSchema`). Only `properties` is augmented — `required`
140
+ and `additionalProperties` are left intact — and the rewrite touches only the
141
+ `tools/list` response; every other frame is forwarded byte-for-byte.
142
+
143
+ **Precedence within a thread.** The objective set on the initial `codex` call is
144
+ a developer-role message and persists for the whole thread, so it takes
145
+ precedence: a *different* `goal` supplied later on a `codex-reply` is only a
146
+ prompt-level reminder and will not reliably override the standing objective
147
+ (verified live — a reply goal that conflicts with the initial one is ignored in
148
+ favor of the standing one). The reply reminder works when it is *not* opposed by
149
+ a conflicting standing objective. To genuinely change the objective mid-stream,
150
+ start a new `codex` call rather than changing it on a `codex-reply`.
151
+
152
+ > **Note — this is not Codex's native `/goal`.** Codex's `/goal` slash command
153
+ > (durable, thread-scoped goal state with lifecycle/budget/evidence-based
154
+ > completion) is a TUI-only feature — it is parsed in the Codex terminal UI and
155
+ > is *not* reachable through `codex mcp-server`. Prefixing an MCP prompt with
156
+ > `/goal …` does **not** activate it; the text is just passed through as a user
157
+ > message. This wrapper therefore steers Codex with `developer-instructions`
158
+ > (the MCP-native vehicle for a standing objective), which is prompt/role
159
+ > conditioning, not the native goal-lifecycle subsystem.
160
+
104
161
  **Idle watchdog.** The codex pass-through is transparent, so a Codex session that
105
162
  stalls after doing work (e.g. its final model turn hangs, or it waits on an
106
163
  elicitation the client never answers) would otherwise hang the caller's
@@ -146,7 +203,7 @@ Override codex defaults at server startup:
146
203
  }
147
204
  ```
148
205
 
149
- The model and effort are fixed at server startup. Per-call `model` and `config` arguments sent to the native `codex` tool are stripped before reaching Codex, so they cannot override the startup defaults.
206
+ The model and effort are fixed at server startup. Per-call `model` and the model/effort keys inside a `config` override sent to the native `codex` tool are stripped before reaching Codex, so they cannot override the startup model/effort (per-call `sandbox`/`cwd`/`approval-policy` are left intact). Add `"--goal", "<text>"` to `args` to inject a persistent objective into every Codex call (see [Goal injection](#codex-pass-through) above).
150
207
 
151
208
  Because the bridge runs in an isolated Codex home, inherited MCP servers from your normal
152
209
  `~/.codex/config.toml` are intentionally unavailable inside bridged Codex sessions.
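The goal-precedence rules stated in the README changes above can be sketched as a small standalone function. This is a minimal illustration, not the package's code: `resolveEffectiveGoal` is a hypothetical name, and the logic is condensed from the documented rules (a string per-call `goal` overrides the `--goal` default, an empty string suppresses it, and a non-string value is ignored).

```javascript
// Hypothetical helper (not part of mcp-agents): condenses the documented
// precedence rules for the wrapper-only `goal` argument.
function resolveEffectiveGoal(args, serverGoal) {
  let effective = serverGoal;
  // Only a string per-call goal counts as an override; "" suppresses the default.
  if ("goal" in args && typeof args.goal === "string") {
    effective = args.goal;
  }
  // A blank or missing goal means nothing is injected.
  return typeof effective === "string" && effective.trim() ? effective.trim() : null;
}

// Assumed server-wide default, as if started with --goal "<text>".
const serverGoal = "Keep the public API unchanged";
const cases = [
  { prompt: "Refactor the parser" },                          // no per-call goal
  { prompt: "Refactor the parser", goal: "Add tests first" }, // string override
  { prompt: "Refactor the parser", goal: "" },                // suppress default
  { prompt: "Refactor the parser", goal: 42 },                // malformed, ignored
];
for (const args of cases) {
  console.log(JSON.stringify(args.goal), "->", resolveEffectiveGoal(args, serverGoal));
}
```

Under these assumptions the first and last cases fall back to the server default, the second uses the per-call goal, and the third resolves to `null`, so nothing would be injected for that call.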
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "mcp-agents",
3
- "version": "0.11.0",
3
+ "version": "0.12.1",
4
4
  "description": "MCP server that wraps AI CLI tools (Claude Code, Gemini CLI, Codex CLI) for use by any MCP client",
5
5
  "type": "module",
6
6
  "bin": {
package/server.js CHANGED
@@ -192,6 +192,10 @@ Options:
192
192
  danger-full-access [default: ${DEFAULT_CODEX_SANDBOX_MODE}]
193
193
  --approval_policy <policy> Codex approval policy: untrusted, on-failure,
194
194
  on-request, never [default: ${DEFAULT_CODEX_APPROVAL_POLICY}]
195
+ --goal <text> Persistent objective injected into every Codex
196
+ call (as developer-instructions, or a prompt
197
+ reminder on codex-reply); per-call \`goal\` arg
198
+ overrides it [default: none]
195
199
  --codex_idle_timeout <secs> Codex pass-through idle watchdog; 0 disables
196
200
  [default: ${DEFAULT_CODEX_IDLE_TIMEOUT_MS / 1000}]
197
201
  --timeout <seconds> Default timeout per call [default: 300]
@@ -202,8 +206,9 @@ Options:
202
206
  /**
203
207
  * Parse CLI flags from process.argv.
204
208
  * Handles --help, --version, --provider, --model, --model_reasoning_effort,
205
- * --sandbox_mode, --approval_policy, --codex_idle_timeout, and unknown flags.
206
- * @returns {{ provider: string, model?: string, modelReasoningEffort?: string, sandboxMode?: string, approvalPolicy?: string, codexIdleTimeoutMs?: number, defaultTimeoutMs?: number }}
209
+ * --sandbox_mode, --approval_policy, --goal, --codex_idle_timeout, and unknown
210
+ * flags.
211
+ * @returns {{ provider: string, model?: string, modelReasoningEffort?: string, sandboxMode?: string, approvalPolicy?: string, goal?: string, codexIdleTimeoutMs?: number, defaultTimeoutMs?: number }}
207
212
  */
208
213
  function parseArgs() {
209
214
  const args = process.argv.slice(2);
@@ -212,6 +217,7 @@ function parseArgs() {
212
217
  let modelReasoningEffort;
213
218
  let sandboxMode;
214
219
  let approvalPolicy;
220
+ let goal;
215
221
  let codexIdleTimeoutMs;
216
222
  let defaultTimeoutMs;
217
223
 
@@ -264,6 +270,13 @@ function parseArgs() {
264
270
  }
265
271
  approvalPolicy = args[++i];
266
272
  break;
273
+ case "--goal":
274
+ if (i + 1 >= args.length) {
275
+ process.stderr.write("error: --goal requires a value\n");
276
+ process.exit(1);
277
+ }
278
+ goal = args[++i];
279
+ break;
267
280
  case "--codex_idle_timeout": {
268
281
  if (i + 1 >= args.length) {
269
282
  process.stderr.write("error: --codex_idle_timeout requires a value\n");
@@ -304,6 +317,7 @@ function parseArgs() {
304
317
  modelReasoningEffort,
305
318
  sandboxMode,
306
319
  approvalPolicy,
320
+ goal,
307
321
  codexIdleTimeoutMs,
308
322
  defaultTimeoutMs,
309
323
  };
@@ -512,19 +526,69 @@ function createIsolatedCodexHome({
512
526
  }
513
527
  }
514
528
 
529
+ /**
530
+ * Build the text for codex's native `developer-instructions` field (a
531
+ * developer-role message) from a goal. This is the MCP-correct vehicle for a
532
+ * standing objective: it is higher-altitude than the user prompt and persists
533
+ * across the thread. It is NOT codex's `/goal` subsystem — that is a TUI-only
534
+ * slash command (parsed in codex-rs/tui, e.g. chatwidget/slash_dispatch.rs) and
535
+ * is not reachable through the MCP `codex`/`codex-reply` tool surface. Any
536
+ * caller-supplied developer instructions are preserved after the objective.
537
+ * @param {string} goal
538
+ * @param {string} [existing] caller-supplied developer-instructions, if any
539
+ * @returns {string}
540
+ */
541
+ function buildGoalDeveloperInstructions(goal, existing) {
542
+ const directive =
543
+ "Persistent objective for this Codex thread (a standing goal — keep " +
544
+ "pursuing it across turns unless explicitly superseded):\n" +
545
+ goal.trim();
546
+ const prior = typeof existing === "string" ? existing.trim() : "";
547
+ return prior ? `${directive}\n\n---\n\n${prior}` : directive;
548
+ }
549
+
550
+ /**
551
+ * Prepend a concise goal reminder to a prompt. Used for `codex-reply` turns,
552
+ * which expose no `developer-instructions` field, so the prompt is the only
553
+ * vehicle left to restate the standing objective. A blank goal leaves the
554
+ * prompt untouched.
555
+ * @param {string} prompt
556
+ * @param {string} goal
557
+ * @returns {string}
558
+ */
559
+ function applyGoalPreamble(prompt, goal) {
560
+ const trimmedGoal = (goal ?? "").trim();
561
+ const body = prompt ?? "";
562
+ if (!trimmedGoal) return body;
563
+ return `Reminder — standing objective for this thread: ${trimmedGoal}\n\n${body}`;
564
+ }
565
+
515
566
  /**
516
567
  * Filter a single newline-delimited JSON-RPC message on its way to the codex
517
- * pass-through. Strips per-call model/effort overrides from `tools/call` so the
518
- * client cannot escape the pinned model/effort — both the top-level `model` arg
519
- * and the model-envelope keys inside a `config` override map. sandbox/cwd/
520
- * approval-policy (top-level and inside `config`) are intentionally left intact
521
- * so callers can steer them per call. Non-`tools/call`, unparseable, and
522
- * nothing-to-strip lines are returned byte-for-byte unchanged so the MCP framing
523
- * is preserved.
568
+ * pass-through. Two transforms, both confined to `tools/call`:
569
+ * 1. Strip per-call model/effort overrides — the top-level `model` arg and the
570
+ * model-envelope keys inside a `config` override map — so the client cannot
571
+ * escape the pinned model/effort. sandbox/cwd/approval-policy (top-level and
572
+ * inside `config`) are intentionally left intact so callers can steer them
573
+ * per call.
574
+ * 2. Goal injection — codex's native `/goal` is a TUI-only slash command, not
575
+ * reachable via MCP, so a wrapper-only `goal` arg is always stripped and the
576
+ * objective is injected the MCP-correct way: into `developer-instructions`
577
+ * (a developer-role message) for the initial `codex` call, or as a concise
578
+ * prompt reminder for a `codex-reply` turn (which has no
579
+ * `developer-instructions` field). A per-call `goal` overrides the
580
+ * server-wide `--goal` default (`opts.serverGoal`); only a string per-call
581
+ * goal overrides (a blank one suppresses the default for that call), while a
582
+ * non-string `goal` is dropped without disturbing the default.
583
+ * Non-`tools/call`, unparseable, and nothing-to-change lines are returned
584
+ * byte-for-byte unchanged so the MCP framing is preserved; any actual mutation
585
+ * re-serializes the message (the intended, framing-safe path for a changed
586
+ * message).
524
587
  * @param {string} line
588
+ * @param {{ serverGoal?: string }} [opts]
525
589
  * @returns {string}
526
590
  */
527
- function filterCodexToolCall(line) {
591
+ function filterCodexToolCall(line, opts = {}) {
528
592
  const trimmed = line.trim();
529
593
  if (!trimmed) return line;
530
594
 
@@ -567,14 +631,100 @@ function filterCodexToolCall(line) {
567
631
  if (Object.keys(cfg).length === 0) delete args.config;
568
632
  }
569
633
 
570
- if (removed.length === 0) return line; // nothing pinned to strip — keep framing
634
+ // ── Goal injection ────────────────────────────────────────────────────────
635
+ // A per-call `goal` (any value) is always stripped — codex's schema has no
636
+ // `goal`, so it must never be forwarded. Only a STRING per-call goal counts as
637
+ // an override: a string (including "") replaces the server default for this
638
+ // call, so "" suppresses it. A non-string `goal` is malformed and is dropped
639
+ // without disturbing the configured server default. A blank effective goal
640
+ // injects nothing.
641
+ let goalLog;
642
+ let goalSource = "server";
643
+ let effectiveGoal = opts.serverGoal;
644
+ if ("goal" in args) {
645
+ const perCallGoal = args.goal;
646
+ delete args.goal;
647
+ goalLog = "stripped per-call goal arg";
648
+ if (typeof perCallGoal === "string") {
649
+ effectiveGoal = perCallGoal;
650
+ goalSource = "per-call";
651
+ }
652
+ }
653
+ if (effectiveGoal && effectiveGoal.trim()) {
654
+ if (msg.params?.name === "codex") {
655
+ // Initial `codex` call: the native developer-instructions field is the
656
+ // correct, thread-persistent vehicle for a standing objective.
657
+ args["developer-instructions"] = buildGoalDeveloperInstructions(
658
+ effectiveGoal,
659
+ args["developer-instructions"],
660
+ );
661
+ goalLog = `injected ${goalSource} goal into developer-instructions`;
662
+ } else if (msg.params?.name === "codex-reply" && typeof args.prompt === "string") {
663
+ // codex-reply has no developer-instructions field, so restate the
664
+ // objective as a concise prompt reminder. Any other (unknown/future) tool
665
+ // is left untouched — only the wrapper-only `goal` arg stripped above is
666
+ // removed, never the prompt — so the byte-for-byte invariant holds for
667
+ // tools this wrapper does not explicitly support.
668
+ args.prompt = applyGoalPreamble(args.prompt, effectiveGoal);
669
+ goalLog = `injected ${goalSource} goal into codex-reply prompt`;
670
+ }
671
+ }
571
672
 
572
- logErr(
573
- `[mcp-agents] codex passthrough: pinning model/effort, stripped: ${removed.join(", ")}`,
574
- );
673
+ if (removed.length === 0 && !goalLog) return line; // nothing changed — keep framing
674
+
675
+ if (removed.length > 0) {
676
+ logErr(
677
+ `[mcp-agents] codex passthrough: pinning model/effort, stripped: ${removed.join(", ")}`,
678
+ );
679
+ }
680
+ if (goalLog) {
681
+ logErr(`[mcp-agents] codex passthrough: ${goalLog}`);
682
+ }
575
683
  return JSON.stringify(msg);
576
684
  }
577
685
 
686
+ // Tools whose advertised inputSchema gains a wrapper-only `goal` property so a
687
+ // client's model knows it can pass one (the model only emits args declared in
688
+ // the schema). The arg is stripped inbound by filterCodexToolCall before it
689
+ // reaches codex; advertising it here is purely for discoverability.
690
+ const CODEX_GOAL_TOOLS = new Set(["codex", "codex-reply"]);
691
+ const CODEX_GOAL_PROPERTY_DESCRIPTION =
692
+ "Optional standing objective for this Codex session. mcp-agents injects it as " +
693
+ "`developer-instructions` (codex) or a prompt reminder (codex-reply); it is not a " +
694
+ "native Codex parameter. Overrides the server-wide --goal default for this call; " +
695
+ "pass an empty string to suppress that default.";
696
+
697
+ /**
698
+ * Mutate a parsed `tools/list` RESPONSE in place, adding a `goal` property to the
699
+ * advertised inputSchema of the `codex` and `codex-reply` tools. Returns true iff
700
+ * it added `goal` to at least one tool. Only `properties` is touched — `required`
701
+ * and `additionalProperties` are left intact (a declared property is not an
702
+ * "additional" one, so `additionalProperties:false` stays valid). Best-effort per
703
+ * tool: a target tool that already declares `goal` (idempotent) or whose
704
+ * inputSchema.properties is missing/malformed (drifted schema) is simply skipped;
705
+ * other valid targets are still augmented. Returns false (→ the caller forwards the
706
+ * original bytes byte-for-byte) for an error response, a non-array `result.tools`,
707
+ * or when no `codex`/`codex-reply` target was augmentable.
708
+ * @param {any} msg
709
+ * @returns {boolean}
710
+ */
711
+ function injectGoalIntoToolsListMessage(msg) {
712
+ const tools = msg?.result?.tools;
713
+ if (!Array.isArray(tools)) return false;
714
+ let changed = false;
715
+ for (const tool of tools) {
716
+ if (!tool || typeof tool !== "object" || !CODEX_GOAL_TOOLS.has(tool.name)) continue;
717
+ const schema = tool.inputSchema;
718
+ if (!schema || typeof schema !== "object" || Array.isArray(schema)) continue;
719
+ const props = schema.properties;
720
+ if (!props || typeof props !== "object" || Array.isArray(props)) continue;
721
+ if ("goal" in props) continue; // idempotent — respect an existing declaration
722
+ props.goal = { type: "string", description: CODEX_GOAL_PROPERTY_DESCRIPTION };
723
+ changed = true;
724
+ }
725
+ return changed;
726
+ }
727
+
578
728
  /**
579
729
  * Spawn codex mcp-server as a pass-through. codex stdout is forwarded back to
580
730
  * the client byte-for-byte, but the client's stdin is intercepted line-by-line
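The `tools/list` schema augmentation added in the hunk above can be exercised against a sample response. The sketch below is a condensed, hypothetical re-statement (`addGoalProperty` and the sample payload are made up for illustration; real Codex tool schemas declare more properties), intended only to show that `properties` gains `goal` while `required` and `additionalProperties` stay untouched.

```javascript
// Hypothetical, condensed version of the schema augmentation in the diff above.
const GOAL_TOOLS = new Set(["codex", "codex-reply"]);

function addGoalProperty(response) {
  const tools = response?.result?.tools;
  if (!Array.isArray(tools)) return false; // error response or malformed result
  let changed = false;
  for (const tool of tools) {
    if (!GOAL_TOOLS.has(tool?.name)) continue; // only codex / codex-reply
    const props = tool?.inputSchema?.properties;
    if (!props || typeof props !== "object" || Array.isArray(props)) continue;
    if ("goal" in props) continue; // idempotent: keep an existing declaration
    props.goal = { type: "string", description: "Optional standing objective" };
    changed = true;
  }
  return changed;
}

// Made-up sample response for illustration only.
const sample = {
  jsonrpc: "2.0",
  id: 1,
  result: {
    tools: [
      {
        name: "codex",
        inputSchema: {
          type: "object",
          properties: { prompt: { type: "string" } },
          required: ["prompt"],
          additionalProperties: false,
        },
      },
      { name: "other-tool", inputSchema: { type: "object", properties: {} } },
    ],
  },
};

const changed = addGoalProperty(sample);
console.log(changed, Object.keys(sample.result.tools[0].inputSchema.properties));
```

Calling the function a second time on the same object returns `false`, which in the real forwarder is the signal to pass the original bytes through unchanged.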
@@ -582,7 +732,7 @@ function filterCodexToolCall(line) {
582
732
  * idle watchdog converts an unbounded codex stall (no stdout/stderr while a
583
733
  * request is in flight) into a synthesized JSON-RPC error so the caller never
584
734
  * hangs forever.
585
- * @param {{ model?: string, modelReasoningEffort?: string, sandboxMode?: string, approvalPolicy?: string, idleTimeoutMs?: number }} opts
735
+ * @param {{ model?: string, modelReasoningEffort?: string, sandboxMode?: string, approvalPolicy?: string, idleTimeoutMs?: number, goal?: string }} opts
586
736
  */
587
737
  function runCodexPassthrough({
588
738
  model,
@@ -590,6 +740,7 @@ function runCodexPassthrough({
590
740
  sandboxMode,
591
741
  approvalPolicy,
592
742
  idleTimeoutMs,
743
+ goal,
593
744
  }) {
594
745
  const resolvedModel = model || DEFAULT_CODEX_MODEL;
595
746
  const resolvedModelReasoningEffort =
@@ -597,6 +748,8 @@ function runCodexPassthrough({
597
748
  const resolvedSandboxMode = sandboxMode || DEFAULT_CODEX_SANDBOX_MODE;
598
749
  const resolvedApprovalPolicy = approvalPolicy || DEFAULT_CODEX_APPROVAL_POLICY;
599
750
  const resolvedIdleTimeoutMs = idleTimeoutMs ?? DEFAULT_CODEX_IDLE_TIMEOUT_MS;
751
+ // Server-wide default goal (string or undefined); per-call `goal` overrides it.
752
+ const resolvedGoal = goal;
600
753
  let isolatedCodexHome;
601
754
 
602
755
  try {
@@ -631,6 +784,7 @@ function runCodexPassthrough({
631
784
  `[mcp-agents] passthrough: codex ${args.join(" ")} ` +
632
785
  `(model=${resolvedModel}, reasoning_effort=${resolvedModelReasoningEffort}, ` +
633
786
  `sandbox_mode=${resolvedSandboxMode}, approval_policy=${resolvedApprovalPolicy}, ` +
787
+ `goal=${resolvedGoal && resolvedGoal.trim() ? "set" : "none"}, ` +
634
788
  `idle_timeout_ms=${resolvedIdleTimeoutMs}, isolated_home=true)`,
635
789
  );
636
790
 
@@ -700,6 +854,18 @@ function runCodexPassthrough({
700
854
  let droppedFrameResponseId; // partial oversized frame's classified id (cleared at its newline)
701
855
  let observationDropLogged = false; // log the first observation-cap drop only
702
856
 
857
+ // ── tools/list goal-advertising rewrite (contained latch) ────────────────
858
+ // While a `tools/list` request id is outstanding the forwarder switches from
859
+ // raw passthrough to buffer-and-rewrite, injecting a `goal` property into the
860
+ // advertised codex/codex-reply schemas of that one response, then returns to
861
+ // raw. Observation above stays the SOLE authority for inFlight/the watchdog;
862
+ // this path only changes HOW bytes reach the wire.
863
+ const pendingToolsListIds = new Set(); // idKey(id) of outstanding tools/list requests (the latch)
864
+ let rewriteBuf = Buffer.alloc(0); // buffer-mode accumulator; holds ≤1 trailing partial after a flush
865
+ let rewriteSkipUntilNewline = false; // forwarding raw to the next newline (oversized frame or mode-boundary align)
866
+ let rewriteSkipReleaseId; // idKey to release when the skipped frame's newline lands (oversized response only)
867
+ let oversizedToolsListLogged = false; // log the first rewrite-cap drop only
868
+
703
869
  const killGroup = (signal) => {
704
870
  try {
705
871
  if (child.pid) process.kill(-child.pid, signal);
@@ -884,10 +1050,53 @@ function runCodexPassthrough({
884
1050
  killGroup("SIGKILL");
885
1051
 
886
1052
  if (emit && hasEmittableInFlight()) {
887
- // Framing recovery: if codex left a dangling partial line on the wire, try
888
- // to parse it (it may itself be the real response) and terminate it with a
889
- // newline so the synthetic frame cannot glue onto a half-written line.
890
- if (stdoutObsBuf.length > 0) {
1053
+ // Framing recovery. Precedence handles bytes WITHHELD by buffer mode (which
1054
+ // the plain stdoutObsBuf recovery would mis-handle). EVERY write here is
1055
+ // try/catch-guarded: finalize runs synchronously from close/exit/idle/signal
1056
+ // handlers, so an unguarded EPIPE would escape into uncaughtException ->
1057
+ // fatalShutdown -> a re-entrant finalize early-return, skipping
1058
+ // flushThenExit/process.exit and hanging the wrapper.
1059
+ if (rewriteSkipUntilNewline) {
1060
+ // Oversized/align mid-skip: head already forwarded raw, remainder
1061
+ // unrecoverable. Discard; the -32001 loop covers the still-open id.
1062
+ rewriteBuf = Buffer.alloc(0);
1063
+ rewriteSkipUntilNewline = false;
1064
+ stdoutObsBuf = Buffer.alloc(0);
1065
+ if (!lastForwardedByteWasNewline) {
1066
+ try { process.stdout.write("\n"); } catch {}
1067
+ lastForwardedByteWasNewline = true;
1068
+ }
1069
+ } else if (rewriteBuf.length > 0) {
1070
+ // A withheld buffered partial (never forwarded). If it parses as a COMPLETE
1071
+ // message (only its trailing newline missing) — possible only when the whole
1072
+ // frame arrived post-latch, so NONE of it is on the wire — deliver it
1073
+ // (rewritten if a pending tools/list response, else raw) + "\n" and clear its
1074
+ // id (no -32001). Otherwise (a mode-boundary tail — pre-empted by the
1075
+ // align-skip — or codex died mid-frame) discard; the -32001 loop covers it.
1076
+ const frameStr = rewriteBuf.toString("utf8");
1077
+ let outStr = null;
1078
+ try {
1079
+ const m = JSON.parse(frameStr);
1080
+ outStr = frameStr;
1081
+ if (
1082
+ m && typeof m === "object" && "id" in m &&
1083
+ ("result" in m || "error" in m) &&
1084
+ pendingToolsListIds.has(idKey(m.id)) && injectGoalIntoToolsListMessage(m)
1085
+ ) {
1086
+ outStr = JSON.stringify(m);
1087
+ }
1088
+ } catch { outStr = null; }
1089
+ rewriteBuf = Buffer.alloc(0);
1090
+ stdoutObsBuf = Buffer.alloc(0);
1091
+ if (outStr !== null) {
1092
+ try { process.stdout.write(`${outStr}\n`); } catch {}
1093
+ observeOutgoingLine(frameStr); // clear its id -> no synthetic error for it
1094
+ lastForwardedByteWasNewline = true;
1095
+ } else if (!lastForwardedByteWasNewline) {
1096
+ try { process.stdout.write("\n"); } catch {}
1097
+ lastForwardedByteWasNewline = true;
1098
+ }
1099
+ } else if (stdoutObsBuf.length > 0) {
891
1100
  observeOutgoingLine(stdoutObsBuf.toString("utf8"));
892
1101
  stdoutObsBuf = Buffer.alloc(0);
893
1102
  try { process.stdout.write("\n"); } catch {}
@@ -914,6 +1123,12 @@ function runCodexPassthrough({
914
1123
  }
915
1124
  }
916
1125
 
1126
+ // Hygiene: drop the rewrite latch/skip state (forwarding has stopped).
1127
+ pendingToolsListIds.clear();
1128
+ rewriteSkipUntilNewline = false;
1129
+ rewriteSkipReleaseId = undefined;
1130
+ rewriteBuf = Buffer.alloc(0);
1131
+
917
1132
  flushThenExit(exitCode);
918
1133
  };
919
1134
 
@@ -953,35 +1168,153 @@ function runCodexPassthrough({
953
1168
  logErr(`[codex] ${chunk.toString().trimEnd()}`);
954
1169
  });
955
1170
 
956
- // Forward codex stdout to the client byte-for-byte (raw Buffer) and keep a
957
- // parallel observation buffer (split on the newline BYTE) to clear in-flight
958
- // ids as their responses land. Raw chunks are forwarded; reconstructed lines
959
- // are never written back.
1171
+ const logRewriteDropOnce = () => {
1172
+ if (!oversizedToolsListLogged) {
1173
+ logErr(
1174
+ "[mcp-agents] codex passthrough: tools/list-window frame exceeded rewrite cap; " +
1175
+ "forwarding raw (goal not advertised on this response)",
1176
+ );
1177
+ oversizedToolsListLogged = true;
1178
+ }
1179
+ };
1180
+
1181
+ // Raw forward of one buffer plus the existing first-`!ok` backpressure handling
1182
+ // (pause codex + suspend the watchdog until drain). Returns the write result.
1183
+ // Used by BOTH the raw fast path and buffer mode, so the wire-state tracking and
1184
+ // backpressure contract live in exactly one place.
1185
+ const forwardChunk = (buf) => {
1186
+ if (buf.length === 0) return true;
1187
+ lastForwardedByteWasNewline = buf[buf.length - 1] === NEWLINE;
1188
+ const ok = process.stdout.write(buf);
1189
+ if (!ok && !stdoutPaused) {
1190
+ // Downstream full: pause codex and suspend the idle watchdog until the
1191
+ // client drains, so a slow reader is never mistaken for a stalled codex.
1192
+ stdoutPaused = true;
1193
+ clearIdle();
1194
+ child.stdout.pause();
1195
+ }
1196
+ return ok;
1197
+ };
1198
+
1199
+ // Once no tools/list id is outstanding (and not mid-skip), a trailing partial in
1200
+ // rewriteBuf is a NON-tools/list frame (no response expected), so it must not stay
1201
+ // withheld in buffer mode — raw mode forwards partials as they arrive, and
1202
+ // withholding it would byte-lose it if codex dies before its newline. Forward it
1203
+ // raw and drop back to the fast path. Called from BOTH paths that can clear the
1204
+ // latch: the end of flushRewriteBuf (a response completed) and noteInbound's
1205
+ // cancel branch (a tools/list was canceled on stdin, which never runs the flush).
1206
+ const returnToRawIfLatchClear = () => {
1207
+ if (
1208
+ !finalizing && pendingToolsListIds.size === 0 &&
1209
+ !rewriteSkipUntilNewline && rewriteBuf.length > 0
1210
+ ) {
1211
+ forwardChunk(rewriteBuf);
1212
+ rewriteBuf = Buffer.alloc(0);
1213
+ }
1214
+ };
1215
+
1216
+ // Flush every COMPLETE frame from rewriteBuf, rewriting only the matched
1217
+ // tools/list response and forwarding everything else byte-for-byte. NEVER
1218
+ // early-returns on backpressure: forwardChunk pauses codex on the first `!ok`,
1219
+ // but this chunk's frames are all queued (Node buffers regardless), so no
1220
+ // COMPLETE frame is ever stranded — exactly today's "one write(chunk), then
1221
+ // pause the source" semantics. After this returns rewriteBuf holds at most one
1222
+ // trailing INCOMPLETE partial.
1223
+ const flushRewriteBuf = () => {
1224
+ if (rewriteSkipUntilNewline) {
1225
+ const nl = rewriteBuf.indexOf(NEWLINE);
1226
+ if (nl === -1) {
1227
+ // Still inside the skipped/aligned frame: forward it all raw, stay skipping.
1228
+ forwardChunk(rewriteBuf);
1229
+ rewriteBuf = Buffer.alloc(0);
1230
+ return;
1231
+ }
1232
+ forwardChunk(rewriteBuf.subarray(0, nl + 1)); // forward through the newline raw
1233
+ rewriteBuf = rewriteBuf.subarray(nl + 1);
1234
+ if (rewriteSkipReleaseId !== undefined) {
1235
+ pendingToolsListIds.delete(rewriteSkipReleaseId);
1236
+ rewriteSkipReleaseId = undefined;
1237
+ }
1238
+ rewriteSkipUntilNewline = false;
1239
+ }
1240
+ let nl;
1241
+ while ((nl = rewriteBuf.indexOf(NEWLINE)) !== -1) {
1242
+ const frameBytes = rewriteBuf.subarray(0, nl + 1); // original bytes incl. delimiter
1243
+ rewriteBuf = rewriteBuf.subarray(nl + 1); // consume-first: never re-forward, never wedge
1244
+ if (nl > MAX_BUFFER_BYTES) {
1245
+ // Complete frame larger than the cap: forward raw without parsing (mirrors
1246
+ // observeOutgoing's oversized branch), releasing only a matching pending id.
1247
+ logRewriteDropOnce();
1248
+ const pid = peekResponseId(frameBytes);
1249
+ if (pid !== undefined && pendingToolsListIds.has(idKey(pid))) {
1250
+ pendingToolsListIds.delete(idKey(pid));
1251
+ }
1252
+ forwardChunk(frameBytes);
1253
+ continue;
1254
+ }
1255
+ let outBuf = frameBytes; // default: byte-for-byte
1256
+ try {
1257
+ const msg = JSON.parse(
1258
+ frameBytes.subarray(0, frameBytes.length - 1).toString("utf8"),
1259
+ );
1260
+ if (
1261
+ msg && typeof msg === "object" && "id" in msg &&
1262
+ ("result" in msg || "error" in msg) &&
1263
+ pendingToolsListIds.has(idKey(msg.id))
1264
+ ) {
1265
+ pendingToolsListIds.delete(idKey(msg.id));
1266
+ if (injectGoalIntoToolsListMessage(msg)) {
1267
+ outBuf = Buffer.from(`${JSON.stringify(msg)}\n`, "utf8");
1268
+ }
1269
+ }
1270
+ } catch {
1271
+ outBuf = frameBytes; // unparseable (mode-boundary tail / partial) — forward original bytes
1272
+ }
1273
+ forwardChunk(outBuf);
1274
+ }
1275
+ if (rewriteBuf.length > MAX_BUFFER_BYTES) {
1276
+ // Partial frame already past the cap with no newline: abandon rewriting for
1277
+ // THIS frame, forward what we have raw, and skip to its newline. Release only
1278
+ // a matching id, deferred to that newline.
1279
+ logRewriteDropOnce();
1280
+ const pid = peekResponseId(rewriteBuf);
1281
+ rewriteSkipReleaseId =
1282
+ pid !== undefined && pendingToolsListIds.has(idKey(pid)) ? idKey(pid) : undefined;
1283
+ forwardChunk(rewriteBuf);
1284
+ rewriteBuf = Buffer.alloc(0);
1285
+ rewriteSkipUntilNewline = true;
1286
+ }
1287
+ // Latch boundary: a response just completed may have emptied the latch — if so,
1288
+ // flush any trailing NON-tools/list partial raw and return to the fast path.
1289
+ returnToRawIfLatchClear();
1290
+ };
1291
+ const bufferModeForward = (chunk) => {
1292
+ rewriteBuf = rewriteBuf.length ? Buffer.concat([rewriteBuf, chunk]) : chunk;
1293
+ flushRewriteBuf();
1294
+ };
+
+  // Forward codex stdout to the client. Steady state is a byte-for-byte raw
+  // passthrough (forwardChunk); while a tools/list response is pending the
+  // forwarder buffers and rewrites that one frame (bufferModeForward) to advertise
+  // `goal`. Observation runs on the ORIGINAL bytes and stays the sole authority for
+  // clearing in-flight ids — by the time it runs, every complete frame in this
+  // chunk was already forwarded/queued, so it never leads forwarding.
   child.stdout.on("data", (chunk) => {
     if (finalizing) return; // stream ownership has been taken over
-    resetIdle();
+    resetIdle(); // UNCONDITIONAL, before the mode branch — buffer-mode activity must keep the watchdog alive
 
-    // Forward the raw bytes FIRST so a bug in observation can never affect the
-    // byte-for-byte passthrough (observation is best-effort id-tracking only).
-    if (chunk.length > 0) {
-      lastForwardedByteWasNewline = chunk[chunk.length - 1] === NEWLINE;
+    if (pendingToolsListIds.size > 0 || rewriteBuf.length > 0 || rewriteSkipUntilNewline) {
+      bufferModeForward(chunk);
+    } else {
+      forwardChunk(chunk);
     }
-    const ok = process.stdout.write(chunk);
+
     try {
       observeOutgoing(chunk); // bounded parse-for-ids; never alters forwarded bytes
     } catch (err) {
       const msg = err instanceof Error ? err.message : String(err);
       logErr(`[mcp-agents] codex passthrough: stdout observation error (ignored): ${msg}`);
     }
-    if (!ok) {
-      // Downstream full: pause codex and suspend the idle watchdog until the
-      // client drains, so a slow reader is never mistaken for a stalled codex.
-      // Trade-off: a client that never drains keeps the request open with no
-      // watchdog — but a synthetic error could not be delivered to it anyway.
-      stdoutPaused = true;
-      clearIdle();
-      child.stdout.pause();
-    }
   });
 
   process.stdout.on("drain", () => {
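Note on the hunk above: the buffer-mode path splits codex stdout into newline-delimited frames, rewrites only a response whose id is pending, and forwards everything else unchanged. The following is a minimal sketch of that idea, not the code from `server.js`. The names `createFrameRewriter`, `keyOf`, `rewrite`, and `forward` are hypothetical, and the size cap (`MAX_BUFFER_BYTES`) and skip-to-newline handling from the real change are deliberately left out.

```javascript
// Hypothetical sketch: names and shape are illustrative, not taken from server.js.
const NEWLINE = 0x0a;

// Assumed id normalisation so that 1 and "1" stay distinct keys.
const keyOf = (id) => `${typeof id}:${id}`;

function createFrameRewriter({ pendingIds, rewrite, forward }) {
  let buf = Buffer.alloc(0);
  return function onChunk(chunk) {
    buf = buf.length ? Buffer.concat([buf, chunk]) : chunk;
    let nl;
    while ((nl = buf.indexOf(NEWLINE)) !== -1) {
      const frame = buf.subarray(0, nl + 1); // includes the trailing newline
      buf = buf.subarray(nl + 1);
      let out = frame; // default: forward the original bytes
      try {
        const msg = JSON.parse(frame.subarray(0, frame.length - 1).toString("utf8"));
        const isResponse =
          msg && typeof msg === "object" && "id" in msg &&
          ("result" in msg || "error" in msg);
        if (isResponse && pendingIds.has(keyOf(msg.id))) {
          pendingIds.delete(keyOf(msg.id));
          // Re-serialise only when the callback reports that it changed the message.
          if (rewrite(msg)) out = Buffer.from(`${JSON.stringify(msg)}\n`, "utf8");
        }
      } catch {
        // Not valid JSON: `out` stays the original frame.
      }
      forward(out);
    }
  };
}

// Usage: one pending response split across two chunks, then a non-JSON line.
const pendingIds = new Set([keyOf(1)]);
const sent = [];
const onChunk = createFrameRewriter({
  pendingIds,
  rewrite: (msg) => {
    msg.result.tools.push({ name: "goal" });
    return true;
  },
  forward: (bytes) => sent.push(bytes.toString("utf8")),
});
onChunk(Buffer.from('{"jsonrpc":"2.0","id":1,"result":{"to'));
onChunk(Buffer.from('ols":[]}}\nnot json\n'));
```

In this sketch a partial frame stays in `buf` until its newline arrives, which is why the real change also needs a cap and a way back to the raw fast path.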
@@ -1018,7 +1351,16 @@ function runCodexPassthrough({
     // so even an elicitation response — bare id, no method — keeps a healthy
     // interactive flow alive.)
     if (msg.method === "notifications/cancelled") {
-      cancelInFlight(msg.params?.requestId);
+      const rid = msg.params?.requestId;
+      cancelInFlight(rid);
+      // A canceled/never-answered tools/list must not wedge buffer mode open. If
+      // this cancel cleared the last pending tools/list id while a NON-tools/list
+      // partial is withheld in rewriteBuf, flush it raw — otherwise a codex exit
+      // with only-canceled work would drop those bytes (finalize skips recovery).
+      if (rid != null) {
+        pendingToolsListIds.delete(idKey(rid));
+        returnToRawIfLatchClear();
+      }
       return;
     }
     // A client message awaits a response iff it carries BOTH an id and a method.
@@ -1026,6 +1368,22 @@ function runCodexPassthrough({
     // for in-flight tracking.
     if (msg.id != null && typeof msg.method === "string") {
       addInFlight(msg.id);
+      if (msg.method === "tools/list") {
+        // Arm the goal-advertising rewrite latch for this tools/list response. If
+        // buffer mode would START mid-frame (a pre-latch frame's head was already
+        // raw-forwarded and its newline hasn't arrived), first align by raw-skipping
+        // the orphan tail to its next newline — so the tail is forwarded
+        // byte-for-byte and never mis-parsed as a standalone frame nor byte-lost at
+        // finalize. Equivalent to today's raw behaviour for that straddled frame.
+        if (
+          pendingToolsListIds.size === 0 && rewriteBuf.length === 0 &&
+          !rewriteSkipUntilNewline && !lastForwardedByteWasNewline
+        ) {
+          rewriteSkipUntilNewline = true;
+          rewriteSkipReleaseId = undefined;
+        }
+        pendingToolsListIds.add(idKey(msg.id));
+      }
     }
   };
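Note on the two hunks above: a `tools/list` request arms the rewrite latch, and both a matching response and a `notifications/cancelled` for the same id must release it, so that buffer mode cannot stay open indefinitely. A minimal sketch of that bookkeeping follows. `createLatch`, `keyOf`, `noteResponse`, and `armed` are hypothetical names, and the mid-frame alignment step (`rewriteSkipUntilNewline`) is omitted.

```javascript
// Hypothetical sketch: createLatch and keyOf are illustrative names only.
const keyOf = (id) => `${typeof id}:${id}`;

function createLatch() {
  const pending = new Set();
  return {
    // Inspect one client-to-server JSON-RPC line.
    noteInbound(line) {
      let msg;
      try {
        msg = JSON.parse(line);
      } catch {
        return; // unparseable lines never touch the latch
      }
      if (!msg || typeof msg !== "object") return;
      if (msg.method === "notifications/cancelled") {
        const rid = msg.params?.requestId;
        if (rid != null) pending.delete(keyOf(rid)); // a cancel releases the latch
        return;
      }
      if (msg.id != null && msg.method === "tools/list") {
        pending.add(keyOf(msg.id)); // arm for this response
      }
    },
    // Call when the matching response has been forwarded.
    noteResponse(id) {
      pending.delete(keyOf(id));
    },
    get armed() {
      return pending.size > 0;
    },
  };
}

// Usage: arm with a tools/list request, then cancel it.
const latch = createLatch();
latch.noteInbound('{"jsonrpc":"2.0","id":7,"method":"tools/list"}');
const armedAfterRequest = latch.armed;
latch.noteInbound(
  '{"jsonrpc":"2.0","method":"notifications/cancelled","params":{"requestId":7}}',
);
const armedAfterCancel = latch.armed;
```

Keeping the pending ids in a set, rather than a single flag, lets several overlapping `tools/list` requests each release only their own entry.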
 
@@ -1043,7 +1401,7 @@ function runCodexPassthrough({
       const line = stdinBuf.subarray(0, nl).toString("utf8");
       stdinBuf = stdinBuf.subarray(nl + 1);
       noteInbound(line);
-      child.stdin.write(`${filterCodexToolCall(line)}\n`);
+      child.stdin.write(`${filterCodexToolCall(line, { serverGoal: resolvedGoal })}\n`);
     }
   });
   process.stdin.on("error", () => {});
@@ -1051,7 +1409,7 @@ function runCodexPassthrough({
     if (stdinBuf.length > 0) {
       const line = stdinBuf.toString("utf8");
       noteInbound(line);
-      child.stdin.write(filterCodexToolCall(line));
+      child.stdin.write(filterCodexToolCall(line, { serverGoal: resolvedGoal }));
     }
     child.stdin.end();
   });
@@ -1118,6 +1476,7 @@ async function main() {
     modelReasoningEffort,
     sandboxMode,
     approvalPolicy,
+    goal,
     codexIdleTimeoutMs,
     defaultTimeoutMs,
   } = parseArgs();
@@ -1136,6 +1495,7 @@ async function main() {
       modelReasoningEffort,
       sandboxMode,
       approvalPolicy,
+      goal,
       idleTimeoutMs: codexIdleTimeoutMs,
     });
     return;