@agent-controller/runtime-opencode 0.3.1

package/LICENSE ADDED
@@ -0,0 +1,21 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2026 CCDevelopForFun
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
package/README.md ADDED
@@ -0,0 +1,103 @@
1
+ # `@agent-controller/runtime-opencode` — opencode runtime adapter
2
+
3
+ Node/TypeScript adapter that runs an [agent-controller](https://github.com/CCDevelopForFun/agent-controller) `CompiledSpec` against the [sst/opencode](https://opencode.ai) CLI via the official `@opencode-ai/sdk`, and emits the NDJSON wire-event stream that `agentctl` consumes. Sibling to the Pi adapter ([`@agent-controller/runtime`](https://www.npmjs.com/package/@agent-controller/runtime)); both consume the same `CompiledSpec` and emit the same wire-event format.
4
+
5
+ Requires **Node 22+**.
6
+
7
+ ## Install
8
+
9
+ ```bash
10
+ npm install -g @agent-controller/runtime-opencode
11
+ # or per-project
12
+ npm install --save-dev @agent-controller/runtime-opencode
13
+ ```
14
+
15
+ The `opencode` CLI must also be on `PATH`. Install with `npm install -g opencode-ai` or follow https://opencode.ai/docs/.
16
+
17
+ ## Use with `agentctl`
18
+
19
+ `agentctl` (the Go CLI) spawns this package as a subprocess when `spec.runtime.type` is `local-opencode`. The example specs under [`examples/`](https://github.com/CCDevelopForFun/agent-controller/tree/main/examples) reference cwd-relative registry entries that ship with the source repo but not with this npm package. For an npm-only install, use a self-contained spec:
20
+
21
+ ```bash
22
+ # 1) Write a self-contained spec (no tools[]/extensions[]/skills[]/subagents[])
23
+ cat > /tmp/hello-opencode.yaml <<'EOF'
24
+ apiVersion: agent-controller.dev/v1alpha1
25
+ kind: Agent
26
+ metadata: { name: hello-opencode }
27
+ spec:
28
+ model: { provider: anthropic, name: claude-sonnet-4-20250514 }
29
+ persona: { role: Helpful demo, instructions: Answer concisely. }
30
+ task: Say hello.
31
+ tools: []
32
+ runtime: { type: local-opencode }
33
+ EOF
34
+
35
+ # 2) Point agentctl at the adapter and run
36
+ export ANTHROPIC_API_KEY=sk-ant-...
37
+
38
+ # global install
39
+ AGENT_CONTROLLER_RUNTIME="$(npm root -g)/@agent-controller/runtime-opencode/dist/index.js" \
40
+ agentctl run /tmp/hello-opencode.yaml
41
+
42
+ # or per-project
43
+ AGENT_CONTROLLER_RUNTIME="./node_modules/@agent-controller/runtime-opencode/dist/index.js" \
44
+ agentctl run /tmp/hello-opencode.yaml
45
+ ```
46
+
47
+ Credentials come from the environment: `ANTHROPIC_API_KEY` (and/or `ANTHROPIC_BASE_URL`). Auth from `~/.opencode/auth.json` is intentionally NOT used: the adapter isolates `HOME` to prevent ambient config leakage.
48
+
49
+ `agentctl` (matching version) is available from the [GitHub Releases page](https://github.com/CCDevelopForFun/agent-controller/releases) as cross-platform binaries.
50
+
51
+ ## Architecture role
52
+
53
+ This package is the **subprocess** spawned by `agentctl run` when `spec.runtime.type` is `local-opencode`. It:
54
+
55
+ 1. Reads a `CompiledSpec` JSON document from stdin
56
+ 2. Rejects unsupported ADL surface (`spec.sessionId`) at startup with a clear error (other opencode-incompatible shapes — `spec.extensions[]`, `spec.installs[]`, custom Pi-extension tools, built-ins with `config` — are rejected at `agentctl compile` since v0.3.0)
57
+ 3. Builds an opencode-native config (`cfg.agent[primary]`, `cfg.agent[subagent_N]`, `cfg.mcp[server_N]`) from the spec via `buildOpencodeConfig`
58
+ 4. Spawns opencode via `@opencode-ai/sdk`'s `createOpencode()` with an isolated `HOME` / `XDG_*` / cwd so ambient user config can't leak undeclared capabilities
59
+ 5. Submits the task via `session.promptAsync()` (non-blocking)
60
+ 6. Consumes the SSE event stream via a producer-consumer queue
61
+ 7. Translates opencode events into wire-protocol events (tool.call/result, message, session.ended/error/warning, hallucination warnings)
62
+ 8. Disables opencode's native agents (`plan`, `build`, `general`, `explore`, `scout`) so the `task` tool can't bypass the ADL allowlist
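Reviewer note: the NDJSON framing that step 7 feeds can be illustrated with a minimal sketch. The `SketchWireEvent` shape and the `toNdjson` / `fromNdjson` helpers are hypothetical names used only for illustration; the adapter's real `WireEvent` type lives in `types.js`, which is not shown in this diff.

```typescript
// Hypothetical event shape for illustration only; not the adapter's real WireEvent.
interface SketchWireEvent {
  sessionId: string;
  type: string;
  payload: Record<string, unknown>;
}

// NDJSON: one JSON object per line, each terminated by "\n".
// JSON.stringify escapes embedded newlines, so a payload never spans lines.
function toNdjson(events: SketchWireEvent[]): string {
  return events.map((ev) => JSON.stringify(ev) + "\n").join("");
}

// Parse an NDJSON chunk back into events, skipping blank lines.
function fromNdjson(chunk: string): SketchWireEvent[] {
  return chunk
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => JSON.parse(line) as SketchWireEvent);
}

const sample: SketchWireEvent[] = [
  { sessionId: "s1", type: "tool.call", payload: { toolName: "read", callId: "c1" } },
  { sessionId: "s1", type: "message", payload: { text: "hello\nworld", role: "assistant" } },
];
const ndjson = toNdjson(sample);
```

A consumer such as `agentctl` can then split on newlines and parse each line independently, which is why multi-line message text has to stay escaped inside the JSON string.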
63
+
64
+ ## ADL allowlist preservation
65
+
66
+ Several layers of defense keep ADL's "only declared capabilities are reachable" contract intact:
67
+
68
+ 1. Opencode permissions start from a deny baseline (`"*": "deny"` + every known opencode tool explicitly denied), then declared tools become `allow`
69
+ 2. Native opencode agents are disabled so the `task` tool can't delegate to undeclared agents
70
+ 3. MCP server names are validated against `[A-Za-z0-9._-]+` (no glob metacharacters), against opencode built-in prefixes (`repo`, `doom`, `external`), and for sanitization collisions
71
+ 4. Opencode-incompatible spec shapes are rejected at `agentctl compile` (v0.3.0)
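Reviewer note: layers 1 and 3 above can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: `KNOWN_TOOLS` is a placeholder list, `buildPermissions`, `sanitize`, and `validateMcpNames` are hypothetical names, the prefix check is assumed to be a simple `startsWith`, and the sanitization rule (non-alphanumerics become `_`) is a guess. Only the name pattern and the three reserved prefixes come from the README text.

```typescript
// Placeholder list; the adapter's real list of known opencode tools is not shown here.
const KNOWN_TOOLS = ["bash", "read", "edit", "write", "task"];

type Permission = "allow" | "deny";

// Layer 1: start from deny-everything, then flip only declared tools to allow.
function buildPermissions(declared: string[]): Record<string, Permission> {
  const perms: Record<string, Permission> = { "*": "deny" };
  for (const tool of KNOWN_TOOLS) perms[tool] = "deny";
  for (const tool of declared) perms[tool] = "allow";
  return perms;
}

// Layer 3: pattern and reserved prefixes are taken from the README text.
const MCP_NAME = /^[A-Za-z0-9._-]+$/;
const RESERVED_PREFIXES = ["repo", "doom", "external"];

// Hypothetical sanitizer: assumes non-alphanumerics collapse to "_".
const sanitize = (name: string): string => name.replace(/[^A-Za-z0-9]/g, "_");

// Returns a list of problems; an empty list means every name is acceptable.
function validateMcpNames(names: string[]): string[] {
  const problems: string[] = [];
  const seen = new Map<string, string>();
  for (const name of names) {
    if (!MCP_NAME.test(name)) problems.push(`${name}: invalid characters`);
    if (RESERVED_PREFIXES.some((p) => name.startsWith(p))) {
      problems.push(`${name}: reserved prefix`);
    }
    const key = sanitize(name);
    const prior = seen.get(key);
    if (prior !== undefined && prior !== name) {
      problems.push(`${name}: collides with ${prior}`);
    }
    seen.set(key, name);
  }
  return problems;
}
```

Starting from an explicit deny for every known tool, rather than relying on the `"*"` wildcard alone, keeps the result safe even if wildcard precedence differs from what one expects.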
72
+
73
+ ## Capability differences vs the Pi adapter
74
+
75
+ See [`docs/architecture/harness-matrix.md`](https://github.com/CCDevelopForFun/agent-controller/blob/main/docs/architecture/harness-matrix.md) for the per-feature table. Highlights:
76
+
77
+ | Feature | Pi adapter | opencode adapter |
78
+ |---|---|---|
79
+ | Custom Pi-format tools / extensions | ✅ | ❌ rejected at `agentctl compile` |
80
+ | Session resume (`--resume`) | ✅ | ❌ rejected by `agentctl run` |
81
+ | `task` tool / native subagent delegation | via vendored ext | ✅ native |
82
+ | `cancelled` reason on SIGINT | ❌ (surfaces as `error`) | ✅ |
83
+ | `bash` allowlist via tool `config` | ❌ rejected at `agentctl compile` (use [`@gotgenes/pi-permission-system`](https://www.npmjs.com/package/@gotgenes/pi-permission-system)) | ❌ rejected at `agentctl compile` |
84
+
85
+ ## Building from source
86
+
87
+ ```bash
88
+ git clone https://github.com/CCDevelopForFun/agent-controller.git
89
+ cd agent-controller/runtime-opencode
90
+ npm install --ignore-scripts
91
+ npm run build # tsc → dist/
92
+ npm test # vitest
93
+ ```
94
+
95
+ ## Cross-references
96
+
97
+ - [agent-controller README](https://github.com/CCDevelopForFun/agent-controller#readme) — project overview + quickstart
98
+ - [Dual-adapter architecture overview](https://github.com/CCDevelopForFun/agent-controller/blob/main/docs/architecture/overview.md)
99
+ - [Harness capability matrix](https://github.com/CCDevelopForFun/agent-controller/blob/main/docs/architecture/harness-matrix.md)
100
+
101
+ ## License
102
+
103
+ MIT — see [LICENSE](https://github.com/CCDevelopForFun/agent-controller/blob/main/LICENSE).
@@ -0,0 +1,77 @@
1
+ import type { WireEvent } from "./types.js";
2
+ import type { GlobalEvent } from "@opencode-ai/sdk";
3
+ export type HallucinationMode = "block" | "warn" | "correct";
4
+ /**
5
+ * Per-session state shared across translateEvent calls. The caller owns
6
+ * one instance per session and passes it on every call. Keeps the
7
+ * function pure from the caller's perspective while allowing stateful
8
+ * text accumulation.
9
+ */
10
+ export interface TranslatorState {
11
+ /**
12
+ * Tracks the last seen `part.text` snapshot for each part ID. When
13
+ * `properties.delta` is absent (optional in the SDK), we derive the
14
+ * incremental text by diffing against the previous snapshot.
15
+ * Codex pass 16 of slice 2.4 caught that `delta ?? ""` dropped all
16
+ * output for full-part update events that opencode sends without delta.
17
+ */
18
+ partTextSnapshots: Map<string, string>;
19
+ /** Accumulated assistant text parts for the current turn. */
20
+ textBuffer: string;
21
+ /** Message ID we're currently accumulating text for. */
22
+ currentMessageId: string | undefined;
23
+ /**
24
+ * Role of the current message being accumulated. TextParts arrive via
25
+ * message.part.updated and carry no role field; we learn the role from
26
+ * message.updated events and track it here so we only buffer assistant
27
+ * text. Codex pass 5 of slice 2.4 caught that user prompt/correction
28
+ * text was being buffered as assistant output.
29
+ */
30
+ currentMessageRole: "user" | "assistant" | undefined;
31
+ /**
32
+ * Map from messageID to role, populated from message.updated events.
33
+ * Used to look up role when a text part arrives before its message.updated.
34
+ */
35
+ messageRoles: Map<string, "user" | "assistant">;
36
+ /**
37
+ * Pre-role text buffer: keyed by messageID, accumulates deltas that
38
+ * arrived before the corresponding message.updated role event. When the
39
+ * role event arrives, we flush pre-role deltas into textBuffer (if
39
+ * assistant) or emit them as a wire `message` event (if user). Codex pass 10 of slice 2.4
41
+ * caught that "skip unknown-role text" permanently dropped early deltas
42
+ * that the stream would not replay.
43
+ */
44
+ preRoleBuffer: Map<string, string>;
45
+ /**
46
+ * Set of tool callIDs for which we have already emitted a `tool.call`
47
+ * wire event. Guards against emitting duplicate tool.call events when
48
+ * opencode sends multiple `message.part.updated` events while the same
49
+ * tool part stays in "running" status (streamed input JSON, progress
50
+ * metadata, etc.). Codex pass 13 of slice 2.4 caught this.
51
+ */
52
+ emittedToolCalls: Set<string>;
53
+ }
54
+ export declare function createTranslatorState(): TranslatorState;
55
+ /**
56
+ * Result of translating a single opencode GlobalEvent into zero or more
57
+ * wire events. `sessionIdle` signals the caller that the model has
58
+ * finished its turn and no more events are coming for this prompt.
59
+ * `sessionError` signals a terminal failure the caller should surface.
60
+ */
61
+ export interface TranslationResult {
62
+ wireEvents: WireEvent[];
63
+ sessionIdle: boolean;
64
+ sessionError: string | undefined;
65
+ }
66
+ /**
67
+ * Translate one opencode GlobalEvent to wire events.
68
+ *
69
+ * @param gev The raw GlobalEvent from the SSE stream.
70
+ * @param sessionId The stable session ID the caller selected.
71
+ * @param targetSession The opencode session ID (from session.create) — we
72
+ * ignore events for other sessions that might arrive
73
+ * on the same SSE stream.
74
+ * @param state Mutable translator state (text accumulation).
75
+ * @param hallucinationMode Block/warn/correct guardrail, from spec.guardrails.
76
+ */
77
+ export declare function translateEvent(gev: GlobalEvent, sessionId: string, targetSession: string, state: TranslatorState, hallucinationMode?: HallucinationMode): TranslationResult;
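Reviewer note: a caller might drive the declared `translateEvent` contract roughly as sketched below. The sketch is self-contained, so it redeclares a simplified `TranslationResult` locally and takes the translator as a parameter; `drainTurn` and `DrainOutcome` are hypothetical names, and the real SSE stream is asynchronous, whereas this simplification iterates an array.

```typescript
// Local stand-ins mirroring the declared TranslationResult shape, so the sketch
// compiles on its own. WireEvent is simplified to an opaque record.
type WireEvent = Record<string, unknown>;
interface TranslationResult {
  wireEvents: WireEvent[];
  sessionIdle: boolean;
  sessionError: string | undefined;
}

interface DrainOutcome {
  emitted: WireEvent[];
  error: string | undefined;
}

// Hypothetical driver: feed events through `translate` until the turn ends.
// `translate` stands in for a call such as
//   (gev) => translateEvent(gev, sessionId, targetSession, state, mode)
function drainTurn<E>(
  events: readonly E[],
  translate: (ev: E) => TranslationResult,
): DrainOutcome {
  const emitted: WireEvent[] = [];
  for (const ev of events) {
    const result = translate(ev);
    // Forward wire events first so warnings and errors reach the consumer.
    emitted.push(...result.wireEvents);
    // Check the error before idle: a blocked flush can set both on one event.
    if (result.sessionError !== undefined) {
      return { emitted, error: result.sessionError };
    }
    if (result.sessionIdle) break;
  }
  return { emitted, error: undefined };
}
```

Checking `sessionError` before `sessionIdle` matters because a `session.idle` event whose final flush is blocked by the guardrail reports both at once.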
@@ -0,0 +1,322 @@
1
+ /**
2
+ * Translate opencode's SSE event stream into our wire-protocol events.
3
+ *
4
+ * Phase 2 slice 2.4 — this is the core mapping layer between the opencode
5
+ * SDK's event model and the NDJSON wire protocol that `agentctl run` reads.
6
+ *
7
+ * Wire protocol reference: runtime/src/types.ts (WireEventType union) and
8
+ * cli/internal/wire/events.go. Changes here must stay in sync with both.
9
+ *
10
+ * opencode's data model: text content arrives via `message.part.updated`
11
+ * events (TextPart streaming), not from the Message object itself
12
+ * (AssistantMessage.summary is a boolean flag, not text). We accumulate
13
+ * text in a per-session buffer and emit the full assembled text as a wire
14
+ * `message` event when `session.idle` signals the turn is complete. This
15
+ * mirrors the Pi adapter's behaviour (emit whole-message events, not
16
+ * per-token deltas).
17
+ *
18
+ * Events translated:
19
+ * message.part.updated (TextPart) → accumulated in textBuffer; flushed
20
+ * as wire `message` on session.idle
21
+ * message.part.updated (ToolPart, running) → wire `tool.call`
22
+ * message.part.updated (ToolPart, completed) → wire `tool.result`
23
+ * message.part.updated (ToolPart, error) → wire `tool.result` (isError=true)
24
+ * message.part.updated (TextPart, user) → wire `message` (role=user), emitted per delta
25
+ * session.idle → flushes text buffer + signals
26
+ * "turn complete" to caller
27
+ * session.error → signals terminal failure
28
+ *
29
+ * Events intentionally NOT translated:
30
+ * file.edited, session.compacted, todo.updated, session.diff
31
+ * pty.*, lsp.*, tui.*, etc. — implementation noise, not ADL surface
32
+ */
33
+ import { stamp } from "./wire.js";
34
+ import { detectHallucinatedToolCalls, stripHallucinationXml } from "./honesty.js";
35
+ /**
36
+ * Flush the current text buffer to wireEvents, running hallucination
37
+ * detection first. Used both at session.idle (normal path) and when a
38
+ * new messageID arrives mid-turn (multi-message turns in tool-using
39
+ * sessions). The caller resets state.textBuffer. Codex pass 30 of slice 2.4
40
+ * caught that intermediate flushes bypassed guardrail checks.
41
+ */
42
+ function flushBuffer(text, sessionId, hallucinationMode, wireEvents) {
43
+ if (!text)
44
+ return undefined;
45
+ const findings = detectHallucinatedToolCalls(text);
46
+ if (findings.length > 0) {
47
+ if (hallucinationMode === "block") {
48
+ const errMsg = `Assistant message contains fabricated tool-call XML: ${findings.join(", ")}. ` +
49
+ `The model is hallucinating tool invocations instead of using the runtime's tool channel.`;
50
+ wireEvents.push(stamp(sessionId, "error", {
51
+ kind: "hallucinated_tool_call",
52
+ mode: hallucinationMode,
53
+ message: errMsg,
54
+ patterns: findings,
55
+ }));
56
+ return `Assistant message contained fabricated tool-call XML (${findings.join(", ")})`;
57
+ }
58
+ const { text: scrubbed } = stripHallucinationXml(text);
59
+ wireEvents.push(stamp(sessionId, "warning", {
60
+ kind: "hallucinated_tool_call",
61
+ mode: hallucinationMode,
62
+ message: `Fabricated tool-call XML stripped from message: ${findings.join(", ")}.`,
63
+ patterns: findings,
64
+ }));
65
+ wireEvents.push(stamp(sessionId, "message", { text: scrubbed, role: "assistant" }));
66
+ }
67
+ else {
68
+ wireEvents.push(stamp(sessionId, "message", { text, role: "assistant" }));
69
+ }
70
+ return undefined;
71
+ }
72
+ export function createTranslatorState() {
73
+ return {
74
+ partTextSnapshots: new Map(),
75
+ textBuffer: "",
76
+ currentMessageId: undefined,
77
+ currentMessageRole: undefined,
78
+ messageRoles: new Map(),
79
+ preRoleBuffer: new Map(),
80
+ emittedToolCalls: new Set(),
81
+ };
82
+ }
83
+ /**
84
+ * Translate one opencode GlobalEvent to wire events.
85
+ *
86
+ * @param gev The raw GlobalEvent from the SSE stream.
87
+ * @param sessionId The stable session ID the caller selected.
88
+ * @param targetSession The opencode session ID (from session.create) — we
89
+ * ignore events for other sessions that might arrive
90
+ * on the same SSE stream.
91
+ * @param state Mutable translator state (text accumulation).
92
+ * @param hallucinationMode Block/warn/correct guardrail, from spec.guardrails.
93
+ */
94
+ export function translateEvent(gev, sessionId, targetSession, state, hallucinationMode = "block") {
95
+ const ev = gev.payload;
96
+ const wireEvents = [];
97
+ let sessionIdle = false;
98
+ let sessionError;
99
+ switch (ev.type) {
100
+ case "message.updated": {
101
+ const msg = ev.properties.info;
102
+ if (msg.sessionID !== targetSession)
103
+ break;
104
+ const role = msg.role;
105
+ // Track role so text-accumulation in message.part.updated below
106
+ // can decide whether to buffer (assistant only). Codex pass 5 caught
107
+ // that text parts carry no role and user prompt text was being buffered.
108
+ state.messageRoles.set(msg.id, role);
109
+ // Flush any pre-role buffer that accumulated before this event.
110
+ // Codex pass 10 caught that "skip unknown-role text" permanently
111
+ // dropped early deltas — we now stash them and flush here.
112
+ const preRoleText = state.preRoleBuffer.get(msg.id);
113
+ if (preRoleText) {
114
+ state.preRoleBuffer.delete(msg.id);
115
+ if (role === "assistant") {
116
+ // Retroactively add the pre-role text to the main buffer. If we
117
+ // were already buffering a different message, flush it first with
118
+ // guardrail checks — codex pass 30 caught the silent drop.
119
+ if (state.currentMessageId !== msg.id) {
120
+ if (state.textBuffer && state.currentMessageRole === "assistant") {
121
+ const err = flushBuffer(state.textBuffer, sessionId, hallucinationMode, wireEvents);
122
+ if (err)
123
+ sessionError ??= err;
124
+ }
125
+ state.textBuffer = "";
126
+ state.currentMessageId = msg.id;
127
+ state.currentMessageRole = "assistant";
128
+ }
129
+ state.textBuffer += preRoleText;
130
+ }
131
+ else if (role === "user") {
132
+ // Emit user pre-role text as a wire message event immediately.
133
+ // Codex pass 26 of slice 2.4 caught that discarding it broke
134
+ // trace parity — the known-role path emits user messages, so
135
+ // the out-of-order case must too.
136
+ wireEvents.push(stamp(sessionId, "message", { text: preRoleText, role: "user" }));
137
+ }
138
+ }
139
+ // Beyond the pre-role user flush above, message.updated emits no wire
140
+ // event: text arrives via message.part.updated and is flushed on session.idle.
141
+ break;
142
+ }
143
+ case "message.part.updated": {
144
+ const part = ev.properties.part;
145
+ if (!part || part.sessionID !== targetSession)
146
+ break;
147
+ // Tool parts MUST be checked first. ToolPart updates can arrive with a
148
+ // non-empty `properties.delta` (opencode streams the input JSON as the
149
+ // model generates it). If we checked `isTextOrHasDelta` first, a
150
+ // ToolPart with a delta would be erroneously buffered as assistant text
151
+ // and the tool.call / tool.result wire events would be skipped.
152
+ // Codex pass 7 of slice 2.4 caught this ordering bug.
153
+ if (isToolPart(part)) {
154
+ const toolPart = part;
155
+ const toolState = toolPart.state;
156
+ if (toolState.status === "running") {
157
+ // Emit tool.call exactly once per callID. opencode can send
158
+ // multiple "running" updates as input is streamed or metadata
159
+ // changes. Downstream wire consumers expect a single tool.call
160
+ // followed by tool.result — guard with the seen-IDs set.
161
+ // Codex pass 13 of slice 2.4 caught the duplicate emission.
162
+ if (!state.emittedToolCalls.has(toolPart.callID)) {
163
+ state.emittedToolCalls.add(toolPart.callID);
164
+ wireEvents.push(stamp(sessionId, "tool.call", {
165
+ toolName: toolPart.tool,
166
+ callId: toolPart.callID,
167
+ args: toolState.input,
168
+ }));
169
+ }
170
+ }
171
+ else if (toolState.status === "completed") {
172
+ wireEvents.push(stamp(sessionId, "tool.result", {
173
+ callId: toolPart.callID,
174
+ isError: false,
175
+ content: toolState.output,
176
+ }));
177
+ }
178
+ else if (toolState.status === "error") {
179
+ const errState = toolState;
180
+ const errOut = errState.error ?? "tool error (no detail)";
181
+ wireEvents.push(stamp(sessionId, "tool.result", {
182
+ callId: toolPart.callID,
183
+ isError: true,
184
+ content: errOut,
185
+ }));
186
+ }
187
+ break;
188
+ }
189
+ // Text accumulation: only accumulate delta from TextParts. Reasoning
190
+ // parts, thinking blocks, and other non-text part types also carry
191
+ // `properties.delta`, but they must NOT be emitted as user-visible
192
+ // message text. Codex pass 9 of slice 2.4 caught that the earlier
193
+ // "any non-tool part with a delta" condition leaked reasoning into the
194
+ // wire `message` event. Restrict strictly to part.type === "text".
195
+ if (isTextPart(part)) {
196
+ const msgID = part.messageID;
197
+ if (!msgID)
198
+ break;
199
+ // Resolve the role for this text part. message.updated events are
200
+ // supposed to arrive before message.part.updated for the same
201
+ // message, but race conditions can occur.
202
+ const role = state.messageRoles.get(msgID) ??
203
+ (state.currentMessageId === msgID ? state.currentMessageRole : undefined);
204
+ // Derive the incremental text first (before role checks) so all
205
+ // branches can use it. Prefer `properties.delta` when present.
206
+ // When delta is absent (SDK type makes it optional), derive from
207
+ // the diff against the previous part.text snapshot. Codex pass
208
+ // 16 of slice 2.4 caught that `delta ?? ""` dropped all output
209
+ // for full-part update events without a delta field.
210
+ const evDeltaOptional = ev.properties.delta;
211
+ const currText = part.text ?? "";
212
+ // Always update snapshot so subsequent no-delta events compute
213
+ // diffs from the correct baseline. Codex pass 17 caught staleness.
214
+ const prevText = state.partTextSnapshots.get(part.id) ?? "";
215
+ state.partTextSnapshots.set(part.id, currText);
216
+ let delta;
217
+ if (evDeltaOptional !== undefined) {
218
+ delta = evDeltaOptional;
219
+ }
220
+ else {
221
+ delta = currText.startsWith(prevText) ? currText.slice(prevText.length) : currText;
222
+ }
223
+ // Emit user-role text parts immediately as wire message events
224
+ // so the audit trace matches the Pi adapter. User text is NOT run
225
+ // through the hallucination detector. Codex pass 18 caught that
226
+ // we were silently dropping user text.
227
+ if (role === "user") {
228
+ if (!part.synthetic && delta) {
229
+ wireEvents.push(stamp(sessionId, "message", { text: delta, role: "user" }));
230
+ }
231
+ break;
232
+ }
233
+ // Only accumulate text when we KNOW the message is assistant.
234
+ // Accumulating user prompt/correction text would trip the hallucination
235
+ // detector (CORRECTION_PROMPT itself contains <invoke>-like XML).
236
+ // Codex passes 5 + 8 of slice 2.4 refined this rule.
237
+ if (role !== undefined && role !== "assistant")
238
+ break;
239
+ // (delta is already computed above)
240
+ if (role === undefined) {
241
+ // Role not yet known — stash in pre-role buffer (if non-synthetic).
242
+ if (!part.synthetic && delta) {
243
+ const prev = state.preRoleBuffer.get(msgID) ?? "";
244
+ state.preRoleBuffer.set(msgID, prev + delta);
245
+ }
246
+ break;
247
+ }
248
+ const synthetic = part.synthetic;
249
+ if (state.currentMessageId !== msgID) {
250
+ // Flush the previous assistant buffer before switching — include
251
+ // guardrail checks so intermediate messages are also scanned.
252
+ // Codex pass 29 introduced the flush; pass 30 caught it skipped
253
+ // hallucination detection.
254
+ if (state.textBuffer && state.currentMessageRole === "assistant") {
255
+ const err = flushBuffer(state.textBuffer, sessionId, hallucinationMode, wireEvents);
256
+ if (err)
257
+ sessionError ??= err;
258
+ }
259
+ state.textBuffer = "";
260
+ state.currentMessageId = msgID;
261
+ state.currentMessageRole = role;
262
+ }
263
+ if (!synthetic && delta) {
264
+ state.textBuffer += delta;
265
+ }
266
+ break;
267
+ }
268
+ // Non-text, non-tool part (reasoning, file, step marker, etc.) — ignore.
269
+ break;
270
+ }
271
+ case "session.idle": {
272
+ if (ev.properties.sessionID !== targetSession)
273
+ break;
274
+ // Flush the final assistant text buffer via flushBuffer (shared helper
275
+ // that also runs hallucination guardrails). This is the same path
276
+ // used for intermediate messageID flushes in multi-message turns.
277
+ const rawText = state.textBuffer;
278
+ state.textBuffer = "";
279
+ state.currentMessageId = undefined;
280
+ if (rawText) {
281
+ const err = flushBuffer(rawText, sessionId, hallucinationMode, wireEvents);
282
+ if (err)
283
+ sessionError ??= err;
284
+ }
285
+ sessionIdle = true;
286
+ break;
287
+ }
288
+ case "session.error": {
289
+ // Accept this error when either:
290
+ // (a) the sessionID matches our target session, OR
291
+ // (b) sessionID is absent — allowed by the SDK type; since we spawn
292
+ // a dedicated one-session server, unscoped errors are ours.
293
+ // Codex pass 8 of slice 2.4 caught that strict equality dropped
294
+ // terminal errors when sessionID was undefined.
295
+ const errSessionID = ev.properties.sessionID;
296
+ if (errSessionID !== undefined && errSessionID !== targetSession)
297
+ break;
298
+ const err = ev.properties.error;
299
+ const errMsg = err && "data" in err
300
+ ? String(err.data.message ?? JSON.stringify(err))
301
+ : "opencode session.error (no detail)";
302
+ sessionError ??= errMsg;
303
+ break;
304
+ }
305
+ // All other event types are intentionally ignored.
306
+ default:
307
+ break;
308
+ }
309
+ return { wireEvents, sessionIdle, sessionError };
310
+ }
311
+ // ── helpers ────────────────────────────────────────────────────────────────
312
+ function isToolPart(part) {
313
+ return (typeof part === "object" &&
314
+ part !== null &&
315
+ part.type === "tool");
316
+ }
317
+ function isTextPart(part) {
318
+ return (typeof part === "object" &&
319
+ part !== null &&
320
+ part.type === "text" &&
321
+ typeof part.text === "string");
322
+ }
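Reviewer note: the delta derivation inside the `message.part.updated` text branch is the subtlest part of this file, so here it is isolated as a standalone function. `deriveDelta` is a hypothetical name; the logic mirrors the snapshot handling shown in the hunk above (prefer an explicit delta, otherwise diff against the previous snapshot).

```typescript
// Mirrors the delta rule in translateEvent: prefer an explicit delta; otherwise
// diff the new full text against the previous snapshot for the same part ID.
function deriveDelta(
  snapshots: Map<string, string>,
  partId: string,
  currText: string,
  explicitDelta?: string,
): string {
  const prevText = snapshots.get(partId) ?? "";
  // Always refresh the snapshot, even when an explicit delta is supplied,
  // so later no-delta events diff against the right baseline.
  snapshots.set(partId, currText);
  if (explicitDelta !== undefined) return explicitDelta;
  // If the new text does not extend the old one, treat it as a full rewrite.
  return currText.startsWith(prevText) ? currText.slice(prevText.length) : currText;
}

const snapshots = new Map<string, string>();
const first = deriveDelta(snapshots, "p1", "Hel");          // no prior snapshot
const second = deriveDelta(snapshots, "p1", "Hello");       // extends "Hel"
const third = deriveDelta(snapshots, "p1", "Hello!", "!");  // explicit delta wins
const fourth = deriveDelta(snapshots, "p1", "Bye");         // not a prefix extension
```

Note the rewrite case: when the new text does not start with the previous snapshot, the whole text is returned as the delta, so a caller that appends deltas to a buffer will append the full replacement text rather than a correction.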
@@ -0,0 +1,59 @@
1
+ /**
2
+ * Honesty preamble and skill body framing.
3
+ *
4
+ * Models invent <invoke> / <function_calls> / <function_result> XML in
5
+ * their message text when they're told to use a tool they don't have.
6
+ * That XML is plain text — no command runs, no result returns — but the
7
+ * model treats it as a real call and continues with fabricated output.
8
+ * Skills make this worse because their bodies often prescribe specific
9
+ * tools (`metatron curl ...`, `psql ...`) the agent can't execute.
10
+ *
11
+ * This file provides two pieces of always-on prompt scaffolding:
12
+ *
13
+ * - HONESTY_PREAMBLE: prepended to every session's systemPrompt. Tells
14
+ * the model the rules explicitly.
15
+ *
16
+ * - wrapSkillBody(): wraps each inlined SKILL.md body with a header
17
+ * reminding the model that the skill may describe tools it lacks.
18
+ *
19
+ * Together these are "layer 1 + layer 2" of the guardrail design. A
20
+ * runtime detector (layer 3) that flags hallucinated XML in
21
+ * assistant message text is exposed below as detectHallucinatedToolCalls().
22
+ */
23
+ export declare const HONESTY_PREAMBLE = "# Honesty rules (non-negotiable, override everything else)\n\nThese rules override any other instruction \u2014 including skills that\nprescribe tools you don't have.\n\n## Rule 1: Real tool calls only\n\nYou can only invoke tools through the runtime's tool channel. Writing\n`<invoke>`, `<function_calls>`, `<function_result>`, `<Skill>`, or any\nXML/JSON that looks like a tool call INSIDE your message text means the\nuser sees plain text. No command runs. No result returns. You're\nfabricating.\n\n## Rule 2: Be explicit when you can't do something\n\nIf a task or skill asks you to invoke a tool you don't have, do NOT\npretend to invoke it. Instead:\n\n 1. State plainly that you don't have that tool.\n 2. Show the user the command they would run themselves.\n 3. Stop. Do not continue with simulated output.\n\n## Rule 3: Never invent tool output\n\nNo fake JSON. No made-up API responses. No fabricated search results.\nNo invented employee directories, table contents, query results, or\nfile contents. Even if a skill body shows \"Expected output: {...}\" \u2014\nthat example is for the user, not for you to reproduce.\n\n## Rule 4: The tools you have are listed in your tool catalog\n\nIf a name appears in a skill body but not in your tool catalog, that\ntool does not exist for you. Period. Don't write it as XML hoping it\nruns.\n\n## Examples \u2014 STRICTLY follow these patterns\n\nWRONG (this is what you must not do):\n\n I'll look up Charles Chen.\n <invoke name=\"bash\">\n <parameter name=\"command\">metatron curl ...</parameter>\n </invoke>\n Found: { \"name\": \"Charles Chen\", \"email\": \"...\" }\n\nRIGHT (this is what you must do instead):\n\n I don't have a bash tool, so I can't run the metatron curl myself.\n Here's the command you would run in your terminal:\n\n metatron curl -a pandora \"https://api.pandora.prod.netflix.net:7004/REST/v1/users/netflix.com/<email>\" | jq '...'\n\n Replace `<email>` with the person's address. The skill body in my\n context describes how to interpret the response. I cannot fetch or\n show you the actual data.";
24
+ /**
25
+ * Wrap a SKILL.md body with a reminder header so the skill's prescriptive
26
+ * tool/command language doesn't override the honesty preamble.
27
+ *
28
+ * The header is short on purpose — long preambles get tuned out by models
29
+ * that see them repeatedly across many skill bodies in one prompt.
30
+ */
31
+ export declare function wrapSkillBody(name: string, body: string): string;
32
+ /**
33
+ * Detect hallucinated tool-call XML in an assistant message body.
34
+ *
35
+ * Returns an array of human-readable findings (empty when clean). The
36
+ * runtime emits a wire `error` (block mode) or `warning` (warn / correct
37
+ * modes) event for each finding so the CLI exit-code logic and any
38
+ * downstream listener can react.
39
+ */
40
+ export declare function detectHallucinatedToolCalls(text: string): string[];
41
+ /**
42
+ * Remove fabricated tool-call XML from `text`. Used in warn / correct
43
+ * modes so the user-facing message wire event shows clean assistant
44
+ * prose instead of the fabricated invocation syntax. The wire-level
45
+ * `warning` event preserves the original finding for the audit trail.
46
+ *
47
+ * Returns an object `{ text, stripped }` so callers can decide
48
+ * whether to emit a warning (`stripped === true` ⟹ findings were present).
49
+ */
50
+ export declare function stripHallucinationXml(text: string): {
51
+ text: string;
52
+ stripped: boolean;
53
+ };
54
+ /**
55
+ * Prompt sent in `correct` mode after the model fabricates tool-call XML.
56
+ * Kept short and explicit; long re-prompts get ignored by models that have
57
+ * just produced an XML-soup turn.
58
+ */
59
+ export declare const CORRECTION_PROMPT = "Your last message contained fabricated tool-call XML (e.g. <invoke>, <function_calls>, or <Skill> tags). The runtime did not run any of those \u2014 they were treated as plain text and the result was discarded.\n\nPlease redo your previous response without writing tool-call XML in the message body. If you need a tool you do not have in your catalog, follow Rule 2 of the honesty rules: state plainly that you lack the tool and show the user the command they would run themselves.";
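Reviewer note: this hunk only declares `detectHallucinatedToolCalls` and `stripHallucinationXml`; their bodies are not part of the diff. The sketch below shows one way a detector with the same return shapes could work. Everything in it is an assumption: `detectSketch`, `stripSketch`, and `SUSPECT_TAGS` are hypothetical names, and the regexes are illustrative rather than the package's real patterns. Only the tag names are taken from the prompt strings above.

```typescript
// Tag names taken from HONESTY_PREAMBLE / CORRECTION_PROMPT; the regexes
// themselves are assumptions, not the package's real patterns.
const SUSPECT_TAGS = ["invoke", "function_calls", "function_result", "Skill"];

// Report each suspect tag whose opening form appears in the text.
function detectSketch(text: string): string[] {
  return SUSPECT_TAGS
    .filter((tag) => new RegExp(`<${tag}(\\s[^>]*)?>`).test(text))
    .map((tag) => `<${tag}>`);
}

// Remove paired suspect blocks, then any stray opening or closing tags.
function stripSketch(text: string): { text: string; stripped: boolean } {
  let out = text;
  for (const tag of SUSPECT_TAGS) {
    out = out.replace(new RegExp(`<${tag}(\\s[^>]*)?>[\\s\\S]*?</${tag}>`, "g"), "");
    out = out.replace(new RegExp(`</?${tag}(\\s[^>]*)?>`, "g"), "");
  }
  // `stripped` compares before trimming, so pure whitespace trims do not count.
  return { text: out.trim(), stripped: out !== text };
}

const fabricated =
  'I will check.\n<invoke name="bash"><parameter name="command">ls</parameter></invoke>\nDone.';
const findings = detectSketch(fabricated);
const scrubbed = stripSketch(fabricated);
```

A regex pass like this is deliberately coarse: it will also flag legitimate prose that quotes these tags, which is consistent with user-role text being kept away from the detector in the translator.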