pi-prompt-template-model 0.7.2 → 0.8.0

package/CHANGELOG.md CHANGED
@@ -2,6 +2,29 @@
 
  ## [Unreleased]
 
+ ## [0.8.0] - 2026-04-21
+
+ ### Added
+ - Added first-class deterministic prompt-template execution for single prompt templates via `deterministic:` or shorthand `run` / `script` frontmatter. Templates can run one direct command or script before any optional LLM turn.
+ - Added configurable deterministic-step handoff policies: `always`, `never`, `on-success`, and `on-failure`.
+ - Added deterministic result cards that always show the executed command or script, resolved `cwd`, exit code, duration, and stdout/stderr previews.
+ - Added deterministic-step `env` support plus `nonInteractive` control for deploy/release-style scripts that need explicit environment variables or want to disable the default non-interactive guardrail env bundle.
+ - Added a visible deterministic completion message for `handoff: never`, so no-handoff runs still end with an explicit completion marker after the result card.
+
+ ### Changed
+ - When a deterministic prompt hands off to the model, the extension now prepends a generated `[Deterministic step]` block with structured execution metadata and truncated stdout/stderr previews before the prompt body.
+ - Deterministic stdout/stderr payloads are now capped before they are stored in message details, while preserving total line/character counts and truncation metadata for both the card UI and the LLM handoff block.
+
+ ### Fixed
+ - Added regression coverage for deterministic loader parsing, relative script resolution, handoff gating, and the no-handoff fast path.
+ - Deterministic timeouts now escalate from `SIGTERM` to `SIGKILL` if the child process does not exit within the post-timeout grace window.
+
+ ## [0.7.3] - 2026-04-14
+
+ ### Fixed
+ - `/chain-prompts` and chain templates now resolve plain prompt files without extension-specific frontmatter, so standard prompts like `double-check -> deslop` work in chain execution.
+ - Added regression coverage for plain-prompt chain resolution while keeping ordinary prompt-template command registration unchanged.
+
  ## [0.7.2] - 2026-04-04
 
  ### Changed
package/README.md CHANGED
@@ -353,6 +353,128 @@ Within a compare lineup, use `task` for a full per-slot override and `taskSuffix
 
  When a compare prompt uses `bestOfN.worktree: true`, all worker slots must resolve to the same `cwd`. Mixed worker `cwd` values are only allowed when worktree isolation is off. Worktree isolation is for the worker phase only; `bestOfN.finalApplier` always applies on the real branch (`compareCwd`).
 
+ ## Deterministic Steps
+
+ Prompt templates can run one deterministic command or script before any optional LLM turn. Use this when the first step should run as direct code instead of waiting on a model turn.
+
+ The flow is simple:
+
+ 1. Run one command or script.
+ 2. Always render a visible deterministic result card with the command, exit code, duration, and stdout/stderr previews.
+ 3. Optionally hand the structured result to the model as a `[Deterministic step]` preamble before the prompt body.
+ 4. If `handoff: never`, stop after the result card and a visible completion marker — no LLM turn happens.
+
+ That handoff preamble is intentionally structured and uses stable field names like `status`, `executionKind`, `command`, `cwd`, `exitCode`, `signal`, `durationMs`, `timedOut`, `lineCount`, `charCount`, `truncated`, `omittedChars`, and `preview`.
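
Because the field names are stable, a consumer could read the preamble header back into key/value pairs. The sketch below is illustrative only: `parsePreambleHeader` is a hypothetical helper that is not part of the package, the sample text is made up, and it assumes the header is a run of `key: value` lines after the `[Deterministic step]` marker that ends at the first blank line.

```typescript
// Hypothetical helper, not part of pi-prompt-template-model.
// Assumes the header is `key: value` lines that end at the first blank line.
function parsePreambleHeader(preamble: string): Record<string, string> {
  const lines = preamble.split("\n");
  const start = lines.indexOf("[Deterministic step]");
  const fields: Record<string, string> = {};
  if (start === -1) return fields;
  for (const line of lines.slice(start + 1)) {
    if (line === "") break; // a blank line ends the header
    const separator = line.indexOf(": ");
    if (separator === -1) continue; // skip anything that is not `key: value`
    fields[line.slice(0, separator)] = line.slice(separator + 2);
  }
  return fields;
}

// Made-up sample preamble for illustration.
const sample = [
  "[Deterministic step]",
  "status: failed",
  "executionKind: run",
  "command: git push origin HEAD:main",
  "cwd: /tmp/repo",
  "exitCode: 1",
  "durationMs: 412",
  "timedOut: false",
  "",
  "[stdout]",
  "lineCount: 0",
].join("\n");

const header = parsePreambleHeader(sample);
console.log(header.status, header.exitCode); // values come back as strings
```

Stopping at the first blank line keeps the per-stream `lineCount` and `charCount` fields from overwriting header fields in this simplified reader.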
+
+ V1 scope is intentionally narrow: deterministic execution only works on single prompt templates. It does not combine with chain templates, delegated/subagent prompts, `parallel`, or loops. At runtime, deterministic prompts explicitly reject `--loop`, `--subagent`, and `--fork` in v1.
+
+ ### Authoring forms
+
+ You can write deterministic steps as **top-level shorthand** or **nested under `deterministic:`**. Both are equivalent. Use shorthand for brevity, nested when you want everything grouped under one key.
+
+ **Top-level shorthand** — put `run`, `script`, `handoff`, `timeout`, `cwd`, `env`, and `nonInteractive` directly in frontmatter:
+
+ ```markdown
+ ---
+ run: git push origin HEAD:main
+ handoff: on-failure
+ timeout: 30000
+ ---
+ If the push failed, explain why and suggest the next step.
+ ```
+
+ You can also use `script:` as shorthand:
+
+ ```markdown
+ ---
+ script: ./scripts/ship.sh
+ handoff: always
+ timeout: 15000
+ ---
+ Summarize the script result.
+ ```
+
+ **Nested form** — group everything under `deterministic:`:
+
+ ```markdown
+ ---
+ model: claude-sonnet-4-20250514
+ deterministic:
+   script:
+     path: ./scripts/ship.sh
+     args:
+       - --fast
+   handoff: always
+   timeout: 15000
+   cwd: ~/src/my-repo
+ ---
+ Summarize the script result and call out anything risky.
+ ```
+
+ **Structured command form** — when you need explicit args instead of a single shell string, use `deterministic.run.command` with `args`:
+
+ ```markdown
+ ---
+ model: claude-sonnet-4-20250514
+ deterministic:
+   run:
+     command: git
+     args: [status, --short]
+   handoff: always
+ ---
+ Interpret the repo state.
+ ```
+
+ Do not mix top-level shorthand with nested `deterministic:` in the same prompt. Pick one style.
+
+ ### Model requirement
+
+ Deterministic prompts that hand off to the model (`handoff: always`, `on-success`, or `on-failure`) need a model to continue into. You can either:
+
+ - Add a `model:` field explicitly
+ - Omit `model:` and let the prompt inherit whatever model is currently active
+
+ `handoff: never` prompts do not need a model field because they never reach the LLM.
+
+ ### Handoff values
+
+ - `always` — always continue into the LLM after the deterministic card is emitted.
+ - `never` — stop after the deterministic card and completion marker.
+ - `on-success` — continue only when the command exits `0`.
+ - `on-failure` — continue only when the command exits non-zero.
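
The four handoff values reduce to a small decision over the exit code. The sketch below tabulates it for a few sample exit codes; `continuesToModel` is a name chosen here for illustration. The code `124` is included because the runner in this release reports it when a timed-out process ends without an exit code, so a timeout counts as a failure for `on-failure`.

```typescript
type Handoff = "always" | "never" | "on-success" | "on-failure";

// Illustrative name; the decision depends only on the policy and the exit code.
function continuesToModel(handoff: Handoff, exitCode: number): boolean {
  switch (handoff) {
    case "always": return true;
    case "never": return false;
    case "on-success": return exitCode === 0;
    case "on-failure": return exitCode !== 0;
  }
}

const policies: Handoff[] = ["always", "never", "on-success", "on-failure"];
for (const exitCode of [0, 1, 124]) {
  const continuing = policies.filter((policy) => continuesToModel(policy, exitCode));
  console.log(`exit ${exitCode}: ${continuing.join(", ")}`);
}
```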
+
+ Command descriptions in the slash-command picker show this feature as `deterministic-step:<handoff>`.
+
+ ### Timeout
+
+ `timeout` is in milliseconds. When a timeout fires, the runner sends `SIGTERM` first. If the process has still not exited after a 1-second grace window, the runner escalates to `SIGKILL`.
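
The escalation can be sketched without spawning a real process. Everything named below (`Killable`, `Schedule`, `armTimeout`) is hypothetical and exists only for this example; the timer is injected so the sequence can be checked without waiting. A real runner would cancel its timers when the process exits; the sketch models that with a simple flag instead.

```typescript
// Minimal stand-in for a child process: only kill() matters for this sketch.
interface Killable {
  kill(signal: "SIGTERM" | "SIGKILL"): boolean;
}

// Injectable timer so the flow can be exercised without real delays.
type Schedule = (callback: () => void, delayMs: number) => void;

// Hypothetical helper: arms SIGTERM at timeoutMs, then SIGKILL graceMs later.
// Returns a disarm function to call once the process has exited.
function armTimeout(
  child: Killable,
  timeoutMs: number,
  graceMs: number,
  schedule: Schedule = (callback, delayMs) => { setTimeout(callback, delayMs); },
): () => void {
  let exited = false;
  schedule(() => {
    if (exited) return;
    child.kill("SIGTERM");
    schedule(() => {
      if (!exited) child.kill("SIGKILL");
    }, graceMs);
  }, timeoutMs);
  return () => { exited = true; };
}

// Fake timer queue: records callbacks and runs them on demand.
const queue: Array<() => void> = [];
const fakeSchedule: Schedule = (callback) => { queue.push(callback); };
const signals: string[] = [];
const stubbornChild: Killable = {
  kill: (signal) => { signals.push(signal); return true; },
};

armTimeout(stubbornChild, 30_000, 1_000, fakeSchedule);
while (queue.length > 0) queue.shift()!(); // fire the timeout, then the grace timer
console.log(signals); // SIGTERM first, then SIGKILL
```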
+
+ ### Script path resolution
+
+ Relative script paths resolve from the prompt file's directory first. If no file exists there, they fall back to the step's working directory: the configured `cwd` when one is set, otherwise the command invocation `cwd`. Absolute script paths are used as-is.
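
The lookup order can be sketched as follows. `resolveScript` is a hypothetical name, the paths are made-up POSIX paths, and the existence check is injected so the example does not touch the real filesystem.

```typescript
import { dirname, isAbsolute, resolve } from "node:path";

// Hypothetical sketch of the lookup order described above.
// `exists` is injected so the example does not read the real filesystem.
function resolveScript(
  scriptPath: string,
  promptFilePath: string,
  cwd: string,
  exists: (candidate: string) => boolean,
): string {
  if (isAbsolute(scriptPath)) return scriptPath;
  const besidePrompt = resolve(dirname(promptFilePath), scriptPath);
  if (exists(besidePrompt)) return besidePrompt;
  return resolve(cwd, scriptPath); // fallback, even if nothing exists there either
}

// Made-up POSIX paths for illustration.
const onDisk = new Set(["/home/me/prompts/scripts/ship.sh"]);
const exists = (candidate: string) => onDisk.has(candidate);

const found = resolveScript("./scripts/ship.sh", "/home/me/prompts/ship.md", "/work/repo", exists);
const fallback = resolveScript("./tools/check.sh", "/home/me/prompts/ship.md", "/work/repo", exists);
console.log(found);
console.log(fallback);
```

Note that the fallback path is returned without an existence check, so a missing script surfaces as a spawn error rather than a resolution error.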
+
+ ### Environment and non-interactive mode
+
+ You can provide explicit environment variables and control the runner's non-interactive guardrails:
+
+ ```markdown
+ ---
+ deterministic:
+   run: ./deploy.sh
+   handoff: never
+   nonInteractive: false
+   env:
+     SPECIAL_TOKEN: abc123
+     RETRIES: 2
+ ---
+ ```
+
+ `nonInteractive` defaults to `true`. In that mode the runner keeps stdin ignored and adds a few guardrail environment defaults such as `CI=1`, `GIT_TERMINAL_PROMPT=0`, `PAGER=cat`, and `GIT_PAGER=cat`. Set `nonInteractive: false` when the command needs a more normal process environment and you explicitly want to opt out of those defaults. Explicit `env` values override the built-in defaults.
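
The precedence can be pictured as three layers merged in order: the parent environment, then the guardrail defaults, then explicit `env` values. The sketch below uses a hypothetical `mergeEnv` helper and made-up values to show which layer wins.

```typescript
type Env = Record<string, string | undefined>;

// Guardrail defaults named in the text above.
const NON_INTERACTIVE_DEFAULTS: Env = {
  CI: "1",
  GIT_TERMINAL_PROMPT: "0",
  PAGER: "cat",
  GIT_PAGER: "cat",
};

// Hypothetical helper: later spreads win, so explicit values override the
// guardrails, and the guardrails override whatever the parent process had.
function mergeEnv(parent: Env, explicit: Env, nonInteractive: boolean): Env {
  return {
    ...parent,
    ...(nonInteractive ? NON_INTERACTIVE_DEFAULTS : {}),
    ...explicit,
  };
}

// Made-up parent environment for illustration.
const parentEnv: Env = { PATH: "/usr/bin", PAGER: "less" };
const guarded = mergeEnv(parentEnv, { CI: "0" }, true);
const plain = mergeEnv(parentEnv, { CI: "0" }, false);
console.log(guarded.PAGER, guarded.CI); // cat 0
console.log(plain.PAGER, plain.CI); // less 0
```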
+
+ ### Output capping
+
+ Large stdout/stderr streams are capped before they are stored in the conversation card payload (16,000 characters per stream in this release). The card and the LLM handoff block both show the total character and line counts plus explicit truncation metadata when output was capped.
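
A simplified sketch of the idea: keep storing text until the cap is reached, but keep counting everything that arrives. `Capture` and `appendCapped` are hypothetical names, and the cap of 10 characters is deliberately tiny so the truncation is visible.

```typescript
interface Capture {
  text: string;       // stored, capped preview
  totalChars: number; // everything the process wrote
  truncated: boolean;
}

// Hypothetical helper: stores at most maxChars but keeps counting past the cap.
function appendCapped(capture: Capture, chunk: string, maxChars: number): Capture {
  const room = Math.max(0, maxChars - capture.text.length);
  const totalChars = capture.totalChars + chunk.length;
  return {
    text: capture.text + chunk.slice(0, room),
    totalChars,
    truncated: totalChars > maxChars,
  };
}

let capture: Capture = { text: "", totalChars: 0, truncated: false };
for (const chunk of ["hello ", "world ", "and more"]) {
  capture = appendCapped(capture, chunk, 10); // tiny cap so truncation is visible
}
const omitted = capture.totalChars - capture.text.length;
console.log(capture.text, capture.totalChars, omitted);
```

Tracking the total separately from the stored text is what lets a reader report how many characters were omitted without holding the full output in memory.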
+
  ## Loop Execution
 
  Run a template multiple times with `--loop`:
@@ -672,3 +794,4 @@ $@
  - In chains, model-less steps inherit the chain-start model snapshot, not the previous step's model. This is intentional for deterministic behavior.
  - Delegated `subagent` prompts require [pi-subagents](https://github.com/nicobailon/pi-subagents/).
  - `run-prompt` must be explicitly enabled with `/prompt-tool on`.
+ rompt-tool on`.
@@ -0,0 +1,136 @@
+ import type { MessageRenderOptions, Theme } from "@mariozechner/pi-coding-agent";
+ import { Box, Container, Spacer, Text } from "@mariozechner/pi-tui";
+ import { formatDeterministicExecution, type DeterministicExecutionResult } from "./deterministic-step.js";
+
+ interface DeterministicMessage {
+   content?: unknown;
+   details?: DeterministicExecutionResult;
+ }
+
+ interface DeterministicCompletionMessage {
+   content?: unknown;
+   details?: {
+     promptName: string;
+     exitCode: number;
+     timedOut: boolean;
+     status: "succeeded" | "failed";
+   };
+ }
+
+ const PREVIEW_LINES = 8;
+
+ function formatDuration(durationMs: number): string {
+   if (durationMs < 1_000) return `${durationMs}ms`;
+   if (durationMs < 10_000) return `${(durationMs / 1_000).toFixed(1)}s`;
+   return `${Math.round(durationMs / 1_000)}s`;
+ }
+
+ function buildCapturedOutputLabel(
+   label: string,
+   meta: { totalChars: number; totalLines: number; truncated: boolean },
+ ): string {
+   if (meta.totalChars === 0) return `${label} · empty`;
+   const lineCount = meta.totalLines;
+   const charCount = meta.totalChars.toLocaleString();
+   const truncated = meta.truncated ? " · capped" : "";
+   return `${label} · ${lineCount} line${lineCount === 1 ? "" : "s"} · ${charCount} chars${truncated}`;
+ }
+
+ function renderOutputSection(
+   box: Box,
+   label: string,
+   value: string,
+   meta: { totalChars: number; totalLines: number; truncated: boolean },
+   options: MessageRenderOptions,
+   theme: Theme,
+ ) {
+   box.addChild(new Text(theme.fg("toolTitle", buildCapturedOutputLabel(label, meta)), 0, 0));
+   if (!value) {
+     box.addChild(new Text(theme.fg("dim", "(empty)"), 0, 0));
+     return;
+   }
+   const lines = value.split("\n");
+   if (options.expanded || lines.length <= PREVIEW_LINES) {
+     box.addChild(new Text(theme.fg("toolOutput", value), 0, 0));
+     if (meta.truncated) {
+       box.addChild(new Text(theme.fg("warning", `\n... (stored preview capped, ${Math.max(0, meta.totalChars - value.length)} more chars hidden)`), 0, 0));
+     }
+     return;
+   }
+   box.addChild(new Text(theme.fg("toolOutput", lines.slice(0, PREVIEW_LINES).join("\n")), 0, 0));
+   box.addChild(new Text(theme.fg("warning", `\n... (${lines.length - PREVIEW_LINES} more lines hidden — Ctrl+O to expand)`), 0, 0));
+   if (meta.truncated) {
+     box.addChild(new Text(theme.fg("warning", `\n... (stored preview capped, ${Math.max(0, meta.totalChars - value.length)} more chars hidden)`), 0, 0));
+   }
+ }
+
+ export function renderDeterministicResult(message: DeterministicMessage, options: MessageRenderOptions, theme: Theme) {
+   const details = message.details;
+   const container = new Container();
+   container.addChild(new Spacer(1));
+   if (!details) {
+     container.addChild(new Text(theme.fg("warning", "Deterministic step message is missing details."), 0, 0));
+     return container;
+   }
+
+   const failed = details.exitCode !== 0;
+   const box = new Box(1, 1, (text: string) => theme.bg(failed ? "toolPendingBg" : "toolSuccessBg", text));
+   const icon = theme.fg(failed ? "error" : "success", failed ? "fail" : "ok");
+   const status = failed ? "failed" : "succeeded";
+   const title = formatDeterministicExecution(details.execution, details.resolvedScriptPath);
+   box.addChild(new Text(`${icon} ${theme.fg("toolTitle", theme.bold("deterministic"))} | ${status} · exit ${details.exitCode} · ${formatDuration(details.durationMs)}`, 0, 0));
+   box.addChild(new Spacer(1));
+   box.addChild(new Text(theme.fg("dim", `command: ${title}`), 0, 0));
+   if (details.resolvedScriptPath) {
+     box.addChild(new Text(theme.fg("dim", `script: ${details.resolvedScriptPath}`), 0, 0));
+   }
+   box.addChild(new Text(theme.fg("dim", `cwd: ${details.cwd}`), 0, 0));
+   if (details.signal) {
+     box.addChild(new Text(theme.fg("dim", `signal: ${details.signal}`), 0, 0));
+   }
+   if (details.timedOut) {
+     box.addChild(new Text(theme.fg("error", "timeout reached before the process exited"), 0, 0));
+   }
+   box.addChild(new Text(theme.fg("dim", `nonInteractive: ${details.nonInteractive ? "true" : "false"}`), 0, 0));
+   box.addChild(new Spacer(1));
+   renderOutputSection(box, "stdout", details.stdout, {
+     totalChars: details.stdoutTotalChars,
+     totalLines: details.stdoutTotalLines,
+     truncated: details.stdoutTruncated,
+   }, options, theme);
+   box.addChild(new Spacer(1));
+   renderOutputSection(box, "stderr", details.stderr, {
+     totalChars: details.stderrTotalChars,
+     totalLines: details.stderrTotalLines,
+     truncated: details.stderrTruncated,
+   }, options, theme);
+   container.addChild(box);
+   return container;
+ }
+
+ export function renderDeterministicCompletion(
+   message: DeterministicCompletionMessage,
+   _options: MessageRenderOptions,
+   theme: Theme,
+ ) {
+   const details = message.details;
+   const container = new Container();
+   container.addChild(new Spacer(1));
+   if (!details) {
+     container.addChild(new Text(theme.fg("warning", "Deterministic completion message is missing details."), 0, 0));
+     return container;
+   }
+
+   const failed = details.status === "failed";
+   const box = new Box(1, 1, (text: string) => theme.bg(failed ? "toolPendingBg" : "toolSuccessBg", text));
+   const icon = theme.fg(failed ? "error" : "success", failed ? "fail" : "ok");
+   box.addChild(new Text(`${icon} ${theme.fg("toolTitle", theme.bold("deterministic complete"))} | ${details.status} · exit ${details.exitCode}`, 0, 0));
+   box.addChild(new Spacer(1));
+   box.addChild(new Text(theme.fg("dim", `prompt: ${details.promptName}`), 0, 0));
+   box.addChild(new Text(theme.fg("dim", "model handoff: skipped"), 0, 0));
+   if (details.timedOut) {
+     box.addChild(new Text(theme.fg("error", "the command hit its timeout before completion"), 0, 0));
+   }
+   container.addChild(box);
+   return container;
+ }
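
For reference, the duration formatting in the renderer above has three bands: raw milliseconds below one second, one decimal place below ten seconds, and whole seconds beyond that. The snippet below copies `formatDuration` from the diff so it can run standalone; the sample inputs are arbitrary.

```typescript
// Copied from the renderer diff above so the example is self-contained.
function formatDuration(durationMs: number): string {
  if (durationMs < 1_000) return `${durationMs}ms`;
  if (durationMs < 10_000) return `${(durationMs / 1_000).toFixed(1)}s`;
  return `${Math.round(durationMs / 1_000)}s`;
}

// Arbitrary sample durations, one or more per band.
for (const ms of [412, 1_500, 9_999, 12_345]) {
  console.log(ms, formatDuration(ms));
}
```

One visible edge: a value just under ten seconds, such as 9,999 ms, rounds up within the one-decimal band and renders as `10.0s`.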
@@ -0,0 +1,309 @@
+ import { spawn } from "node:child_process";
+ import { existsSync } from "node:fs";
+ import { dirname, isAbsolute, resolve } from "node:path";
+ import type { PromptWithModel, DeterministicStep, DeterministicExecution, DeterministicEnv } from "./prompt-loader.js";
+
+ export const PROMPT_TEMPLATE_DETERMINISTIC_MESSAGE_TYPE = "prompt-template-deterministic";
+ export const PROMPT_TEMPLATE_DETERMINISTIC_COMPLETION_MESSAGE_TYPE = "prompt-template-deterministic-complete";
+
+ const DEFAULT_MAX_CAPTURE_STDOUT_CHARS = 16_000;
+ const DEFAULT_MAX_CAPTURE_STDERR_CHARS = 16_000;
+ const DEFAULT_TIMEOUT_KILL_AFTER_MS = 1_000;
+
+ interface CapturedOutput {
+   text: string;
+   totalChars: number;
+   totalNewlines: number;
+   trailingNewlineRun: number;
+   sawNonNewline: boolean;
+   truncated: boolean;
+   maxChars: number;
+ }
+
+ export interface DeterministicExecutionResult {
+   execution: DeterministicExecution;
+   cwd: string;
+   nonInteractive: boolean;
+   resolvedScriptPath?: string;
+   exitCode: number;
+   signal?: NodeJS.Signals;
+   stdout: string;
+   stdoutTotalChars: number;
+   stdoutTotalLines: number;
+   stdoutTruncated: boolean;
+   stderr: string;
+   stderrTotalChars: number;
+   stderrTotalLines: number;
+   stderrTruncated: boolean;
+   durationMs: number;
+   timedOut: boolean;
+ }
+
+ export interface DeterministicPreambleOptions {
+   maxStdoutChars?: number;
+   maxStderrChars?: number;
+ }
+
+ function createCapturedOutput(maxChars: number): CapturedOutput {
+   return {
+     text: "",
+     totalChars: 0,
+     totalNewlines: 0,
+     trailingNewlineRun: 0,
+     sawNonNewline: false,
+     truncated: false,
+     maxChars,
+   };
+ }
+
+ function appendCapturedOutput(output: CapturedOutput, chunk: string): void {
+   if (!chunk) return;
+   output.totalChars += chunk.length;
+   const newlines = chunk.match(/\n/g)?.length ?? 0;
+   output.totalNewlines += newlines;
+   if (/[^\n]/.test(chunk)) output.sawNonNewline = true;
+   const trailingRun = chunk.match(/\n+$/)?.[0].length ?? 0;
+   if (trailingRun === 0) {
+     output.trailingNewlineRun = 0;
+   } else if (trailingRun === chunk.length) {
+     output.trailingNewlineRun += trailingRun;
+   } else {
+     output.trailingNewlineRun = trailingRun;
+   }
+
+   if (output.text.length < output.maxChars) {
+     const remaining = output.maxChars - output.text.length;
+     output.text += chunk.slice(0, remaining);
+   }
+   if (output.totalChars > output.maxChars) output.truncated = true;
+ }
+
+ function capturedLineCount(output: Pick<CapturedOutput, "totalChars" | "sawNonNewline" | "totalNewlines" | "trailingNewlineRun">): number {
+   if (output.totalChars === 0) return 0;
+   if (!output.sawNonNewline) return 1;
+   return output.totalNewlines - output.trailingNewlineRun + 1;
+ }
+
+ function countLines(value: string): number {
+   if (!value) return 0;
+   const normalized = value.replace(/\n+$/g, "");
+   if (!normalized) return 1;
+   return normalized.split("\n").length;
+ }
+
+ function buildTextPreview(label: string, value: string, totalChars: number, maxChars: number): { text: string; truncated: boolean; omittedChars: number } {
+   const shownChars = Math.min(value.length, maxChars);
+   const preview = value.slice(0, shownChars);
+   const omittedChars = Math.max(0, totalChars - shownChars);
+   if (omittedChars === 0) {
+     return { text: preview, truncated: false, omittedChars: 0 };
+   }
+   return {
+     text: `${preview}\n...[${label} truncated, ${omittedChars} more chars omitted]`,
+     truncated: true,
+     omittedChars,
+   };
+ }
+
+ function shellQuote(value: string): string {
+   return `'${value.replace(/'/g, `'"'"'`)}'`;
+ }
+
+ export function formatDeterministicExecution(execution: DeterministicExecution, resolvedScriptPath?: string): string {
+   switch (execution.kind) {
+     case "run":
+       return execution.command;
+     case "command": {
+       const parts = [execution.command, ...execution.args].map((part) => shellQuote(part));
+       return execution.shell ? `${parts.join(" ")} (shell)` : parts.join(" ");
+     }
+     case "script": {
+       const scriptPath = resolvedScriptPath ?? execution.path;
+       const parts = [scriptPath, ...execution.args].map((part) => shellQuote(part));
+       return parts.join(" ");
+     }
+   }
+ }
+
+ export function shouldHandoffToLlm(step: DeterministicStep, result: Pick<DeterministicExecutionResult, "exitCode">): boolean {
+   switch (step.handoff) {
+     case "always": return true;
+     case "never": return false;
+     case "on-success": return result.exitCode === 0;
+     case "on-failure": return result.exitCode !== 0;
+   }
+ }
+
+ function buildOutputPreambleSectionFromResult(
+   label: "stdout" | "stderr",
+   value: string,
+   meta: { totalChars: number; totalLines: number },
+   maxChars: number,
+ ): string[] {
+   const preview = buildTextPreview(label, value, meta.totalChars, maxChars);
+   return [
+     `[${label}]`,
+     `lineCount: ${meta.totalLines}`,
+     `charCount: ${meta.totalChars}`,
+     `truncated: ${preview.truncated ? "true" : "false"}`,
+     preview.truncated ? `omittedChars: ${preview.omittedChars}` : undefined,
+     "preview:",
+     preview.text || "(empty)",
+   ];
+ }
+
+ export function buildDeterministicPreamble(
+   result: DeterministicExecutionResult,
+   options: DeterministicPreambleOptions = {},
+ ): string {
+   const maxStdoutChars = options.maxStdoutChars ?? 8_000;
+   const maxStderrChars = options.maxStderrChars ?? 4_000;
+   const command = formatDeterministicExecution(result.execution, result.resolvedScriptPath);
+   return [
+     "[Deterministic step]",
+     `status: ${result.exitCode === 0 ? "succeeded" : "failed"}`,
+     `executionKind: ${result.execution.kind}`,
+     `command: ${command.includes("\n") ? JSON.stringify(command) : command}`,
+     result.resolvedScriptPath ? `resolvedScript: ${result.resolvedScriptPath}` : undefined,
+     `cwd: ${result.cwd}`,
+     `nonInteractive: ${result.nonInteractive ? "true" : "false"}`,
+     `exitCode: ${result.exitCode}`,
+     result.signal ? `signal: ${result.signal}` : undefined,
+     `durationMs: ${result.durationMs}`,
+     `timedOut: ${result.timedOut ? "true" : "false"}`,
+     "",
+     ...buildOutputPreambleSectionFromResult("stdout", result.stdout, {
+       totalChars: result.stdoutTotalChars,
+       totalLines: result.stdoutTotalLines,
+     }, maxStdoutChars),
+     "",
+     ...buildOutputPreambleSectionFromResult("stderr", result.stderr, {
+       totalChars: result.stderrTotalChars,
+       totalLines: result.stderrTotalLines,
+     }, maxStderrChars),
+   ].filter((line): line is string => line !== undefined).join("\n");
+ }
+
+ function resolveScriptPath(prompt: Pick<PromptWithModel, "filePath">, cwd: string, execution: Extract<DeterministicExecution, { kind: "script" }>): string {
+   if (isAbsolute(execution.path)) return execution.path;
+   const promptRelative = resolve(dirname(prompt.filePath), execution.path);
+   if (existsSync(promptRelative)) return promptRelative;
+   return resolve(cwd, execution.path);
+ }
+
+ function buildDeterministicEnv(step: Pick<DeterministicStep, "env" | "nonInteractive">): NodeJS.ProcessEnv {
+   const nonInteractiveDefaults: DeterministicEnv = step.nonInteractive
+     ? {
+         CI: "1",
+         GIT_TERMINAL_PROMPT: "0",
+         PAGER: "cat",
+         GIT_PAGER: "cat",
+       }
+     : {};
+   return {
+     ...process.env,
+     ...nonInteractiveDefaults,
+     ...(step.env ?? {}),
+   };
+ }
+
+ function spawnProcess(command: string, args: string[], options: { cwd: string; shell?: boolean; env: NodeJS.ProcessEnv }) {
+   return spawn(command, args, {
+     cwd: options.cwd,
+     shell: options.shell ?? false,
+     env: options.env,
+     stdio: ["ignore", "pipe", "pipe"],
+   });
+ }
+
+ export async function runDeterministicStep(
+   prompt: Pick<PromptWithModel, "filePath">,
+   step: DeterministicStep,
+   cwd: string,
+ ): Promise<DeterministicExecutionResult> {
+   const startedAt = Date.now();
+   const execution = step.execution;
+   const resolvedCwd = step.cwd ?? cwd;
+   const env = buildDeterministicEnv(step);
+   const resolvedScriptPath = execution.kind === "script"
+     ? resolveScriptPath(prompt, resolvedCwd, execution)
+     : undefined;
+   const child = execution.kind === "run"
+     ? spawnProcess("/bin/bash", ["-lc", execution.command], { cwd: resolvedCwd, env })
+     : execution.kind === "command"
+       ? spawnProcess(execution.command, execution.args, { cwd: resolvedCwd, shell: execution.shell, env })
+       : spawnProcess(resolvedScriptPath!, execution.args, { cwd: resolvedCwd, env });
+
+   const stdout = createCapturedOutput(DEFAULT_MAX_CAPTURE_STDOUT_CHARS);
+   const stderr = createCapturedOutput(DEFAULT_MAX_CAPTURE_STDERR_CHARS);
+   let timedOut = false;
+   let timeoutKillHandle: NodeJS.Timeout | undefined;
+
+   child.stdout.on("data", (chunk) => {
+     appendCapturedOutput(stdout, chunk.toString());
+   });
+   child.stderr.on("data", (chunk) => {
+     appendCapturedOutput(stderr, chunk.toString());
+   });
+
+   const timeoutHandle = step.timeoutMs
+     ? setTimeout(() => {
+         timedOut = true;
+         child.kill("SIGTERM");
+         timeoutKillHandle = setTimeout(() => {
+           child.kill("SIGKILL");
+         }, DEFAULT_TIMEOUT_KILL_AFTER_MS);
+       }, step.timeoutMs)
+     : undefined;
+
+   return await new Promise((resolveResult) => {
+     let settled = false;
+     child.on("error", (error) => {
+       if (settled) return;
+       settled = true;
+       if (timeoutHandle) clearTimeout(timeoutHandle);
+       if (timeoutKillHandle) clearTimeout(timeoutKillHandle);
+       resolveResult({
+         execution,
+         cwd: resolvedCwd,
+         nonInteractive: step.nonInteractive,
+         resolvedScriptPath,
+         exitCode: 1,
+         stdout: stdout.text,
+         stdoutTotalChars: stdout.totalChars,
+         stdoutTotalLines: capturedLineCount(stdout),
+         stdoutTruncated: stdout.truncated,
+         stderr: stderr.text ? `${stderr.text}\n${error.message}` : error.message,
+         stderrTotalChars: stderr.totalChars + (stderr.text ? error.message.length + 1 : error.message.length),
+         stderrTotalLines: countLines(stderr.text ? `${stderr.text}\n${error.message}` : error.message),
+         stderrTruncated: stderr.truncated,
+         durationMs: Date.now() - startedAt,
+         timedOut,
+       });
+     });
+     child.on("close", (exitCode, signal) => {
+       if (settled) return;
+       settled = true;
+       if (timeoutHandle) clearTimeout(timeoutHandle);
+       if (timeoutKillHandle) clearTimeout(timeoutKillHandle);
+       resolveResult({
+         execution,
+         cwd: resolvedCwd,
+         nonInteractive: step.nonInteractive,
+         resolvedScriptPath,
+         exitCode: exitCode ?? (timedOut ? 124 : 1),
+         signal: signal ?? undefined,
+         stdout: stdout.text,
+         stdoutTotalChars: stdout.totalChars,
+         stdoutTotalLines: capturedLineCount(stdout),
+         stdoutTruncated: stdout.truncated,
+         stderr: stderr.text,
+         stderrTotalChars: stderr.totalChars,
+         stderrTotalLines: capturedLineCount(stderr),
+         stderrTruncated: stderr.truncated,
+         durationMs: Date.now() - startedAt,
+         timedOut,
+       });
+     });
+   });
+ }
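
The line counters in this file let the runner report a total line count without keeping the full output, even when chunk boundaries fall in the middle of a run of newlines. The standalone sketch below restates that counting logic under hypothetical names (`LineCounter`, `feed`, `countStreamed`) and feeds it a few hand-made chunk sequences; it is a demonstration of the idea, not the package's code.

```typescript
interface LineCounter {
  totalChars: number;
  totalNewlines: number;
  trailingNewlineRun: number; // newlines at the very end of everything seen so far
  sawNonNewline: boolean;
}

// Update the counters for one chunk without storing the chunk itself.
function feed(counter: LineCounter, chunk: string): void {
  if (!chunk) return;
  counter.totalChars += chunk.length;
  counter.totalNewlines += chunk.match(/\n/g)?.length ?? 0;
  if (/[^\n]/.test(chunk)) counter.sawNonNewline = true;
  const trailingRun = chunk.match(/\n+$/)?.[0].length ?? 0;
  if (trailingRun === chunk.length) {
    counter.trailingNewlineRun += trailingRun; // all-newline chunk extends the run
  } else {
    counter.trailingNewlineRun = trailingRun; // other text restarts the run
  }
}

// Trailing newlines terminate the last line; they do not start new ones.
function lineCount(counter: LineCounter): number {
  if (counter.totalChars === 0) return 0;
  if (!counter.sawNonNewline) return 1;
  return counter.totalNewlines - counter.trailingNewlineRun + 1;
}

function countStreamed(chunks: string[]): number {
  const counter: LineCounter = { totalChars: 0, totalNewlines: 0, trailingNewlineRun: 0, sawNonNewline: false };
  for (const chunk of chunks) feed(counter, chunk);
  return lineCount(counter);
}

console.log(countStreamed(["a\nb", "\n", "\n"])); // same text as "a\nb\n\n"
console.log(countStreamed(["a\n\nb"]));
```

The first call counts two lines because the trailing newlines arrive as separate chunks and extend the trailing run; the second counts three because the blank line sits between two non-empty lines.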