pi-interactive-shell 0.9.0 → 0.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,33 @@
 
 All notable changes to the `pi-interactive-shell` extension will be documented in this file.
 
+## [0.10.0] - 2026-03-13
+
+### Added
+- **Test harness** - Added vitest with 20 tests covering session queries, key encoding, notification formatting, headless monitor lifecycle, session manager, config/docs parity, and module loading.
+- **`gpt-5-4-prompting` skill** - New bundled skill with GPT-5.4 prompting best practices for Codex workflows.
+
+### Changed
+- **Architecture refactor** - Extracted shared logic into focused modules for better maintainability:
+  - `session-query.ts` - Unified output/query logic (rate limiting, incremental, drain, offset modes)
+  - `notification-utils.ts` - Message formatting for dispatch/hands-free notifications
+  - `handoff-utils.ts` - Snapshot/preview capture on session exit/transfer
+  - `runtime-coordinator.ts` - Centralized overlay/monitor/widget state management
+  - `pty-log.ts` - Raw output trimming and line slicing
+  - `pty-protocol.ts` - DSR cursor position query handling
+  - `spawn-helper.ts` - macOS node-pty permission fix
+  - `background-widget.ts` - TUI widget for background sessions
+- README, `SKILL.md`, install output, and the packaged Codex workflow examples now give a consistent account of dispatch as the recommended delegated mode, the current 8s quiet-threshold / 15s grace-period defaults, and the bundled prompt-skill surface.
+- The Codex workflow docs now point at the packaged `gpt-5-4-prompting`, `codex-5-3-prompting`, and `codex-cli` skills instead of describing a runtime fetch of the old 5.2 prompting guide.
+- Example prompts and skill docs are aligned around `gpt-5.4` as the default Codex model, with `gpt-5.3-codex` remaining the explicit opt-in fallback.
+- Renamed the `codex-5.3-prompting` example skill to `codex-5-3-prompting` (filesystem-friendly path).
+
+### Fixed
+- **Map iteration bug** - Fixed `disposeAllMonitors()` modifying the monitor Map during iteration, which could cause unpredictable behavior.
+- **Array iteration bug** - Fixed PTY listener notifications modifying the listener array during iteration when a listener unsubscribed itself.
+- **Missing runtime dependency** - Added `@sinclair/typebox` to dependencies (it was imported but not declared).
+- Documented the packaged prompt/skill onboarding path more clearly so users can either rely on the exported package metadata or copy the bundled examples into their own prompt and skill directories.
+
 ## [0.9.0] - 2026-02-23
 
 ### Added
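The two iteration fixes above share one pattern: snapshot the collection before iterating, so a callback that removes entries mid-loop cannot skip or double-visit the rest. A minimal sketch of that pattern (hypothetical names, not the extension's actual code):

```typescript
// Hypothetical sketch of the snapshot-before-iterate fix; the real
// disposeAllMonitors()/listener code in the extension differs.
function disposeAll(monitors: Map<string, { dispose: () => void }>): void {
  // Iterate a copy: dispose() may delete entries from the live Map.
  for (const monitor of [...monitors.values()]) {
    monitor.dispose();
  }
  monitors.clear();
}

function notifyAll(listeners: Array<() => void>): void {
  // Iterate a copy: a listener may splice itself out of the live array.
  for (const listener of [...listeners]) {
    listener();
  }
}
```

Iterating the live collection instead would let `Map.delete()` or `Array.splice()` shift entries under the loop, which is exactly the class of bug the changelog describes.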
package/README.md CHANGED
@@ -115,7 +115,7 @@ Attach to review full output: interactive_shell({ attach: "calm-reef" })
 
 The notification includes a brief tail (last 5 lines) and a reattach instruction. The PTY is preserved for 5 minutes so the agent can attach to review full scrollback.
 
-Dispatch defaults `autoExitOnQuiet: true` — the session gets a 30s startup grace period, then is killed after output goes silent (5s by default), which signals completion for task-oriented subagents. Tune the grace period with `handsFree: { gracePeriod: 60000 }` or opt out entirely with `handsFree: { autoExitOnQuiet: false }`.
+Dispatch defaults to `autoExitOnQuiet: true` — the session gets a 15s startup grace period, then is killed after output goes silent (8s by default), which signals completion for task-oriented subagents. Tune the grace period with `handsFree: { gracePeriod: 60000 }` or opt out entirely with `handsFree: { autoExitOnQuiet: false }`.
 
 The overlay still shows for the user, who can Ctrl+T to transfer output, Ctrl+B to background, take over by typing, or Ctrl+Q for more options.
 
@@ -151,7 +151,7 @@ interactive_shell({
 
 ### Auto-Exit on Quiet
 
-For fire-and-forget single-task delegations, enable auto-exit to kill the session after 5s of output silence:
+For fire-and-forget single-task delegations, enable auto-exit to kill the session after 8s of output silence:
 
 ```typescript
 interactive_shell({
@@ -161,7 +161,7 @@ interactive_shell({
 })
 ```
 
-A 30s startup grace period prevents the session from being killed before the subprocess has time to produce output. Customize it per-call with `gracePeriod`:
+A 15s startup grace period prevents the session from being killed before the subprocess has time to produce output. Customize it per-call with `gracePeriod`:
 
 ```typescript
 interactive_shell({
@@ -282,8 +282,8 @@ Configuration files (project overrides global):
   "completionNotifyMaxChars": 5000,
   "handsFreeUpdateMode": "on-quiet",
   "handsFreeUpdateInterval": 60000,
-  "handsFreeQuietThreshold": 5000,
-  "autoExitGracePeriod": 30000,
+  "handsFreeQuietThreshold": 8000,
+  "autoExitGracePeriod": 15000,
   "handsFreeUpdateMaxChars": 1500,
   "handsFreeMaxTotalChars": 100000,
   "handoffPreviewEnabled": true,
@@ -306,8 +306,8 @@ Configuration files (project overrides global):
 | `completionNotifyLines` | 50 | Lines in dispatch completion notification (10-500) |
 | `completionNotifyMaxChars` | 5000 | Max chars in completion notification (1KB-50KB) |
 | `handsFreeUpdateMode` | "on-quiet" | "on-quiet" or "interval" |
-| `handsFreeQuietThreshold` | 5000 | Silence duration before update (ms) |
-| `autoExitGracePeriod` | 30000 | Startup grace before `autoExitOnQuiet` kill (ms) |
+| `handsFreeQuietThreshold` | 8000 | Silence duration before update (ms) |
+| `autoExitGracePeriod` | 15000 | Startup grace before `autoExitOnQuiet` kill (ms) |
 | `handsFreeUpdateInterval` | 60000 | Max interval between updates (ms) |
 | `handsFreeUpdateMaxChars` | 1500 | Max chars per update |
 | `handsFreeMaxTotalChars` | 100000 | Total char budget for updates |
@@ -331,7 +331,7 @@ Full PTY. The subprocess thinks it's in a real terminal.
 
 ## Example Workflow: Plan, Implement, Review
 
-The `examples/prompts/` directory includes three prompt templates that chain together into a complete development workflow using Codex CLI. Each template instructs pi to gather context, generate a tailored meta prompt based on the [Codex prompting guide](https://developers.openai.com/cookbook/examples/gpt-5/gpt-5-2_prompting_guide.md), and launch Codex in an interactive overlay.
+The `examples/prompts/` directory includes three prompt templates that chain together into a complete development workflow using Codex CLI. Each template now loads the bundled `gpt-5-4-prompting` skill by default, falls back to `codex-5-3-prompting` when the user explicitly asks for Codex 5.3, and launches Codex in an interactive overlay.
 
 ### The Pipeline
 
@@ -347,14 +347,22 @@ Write a plan
 
 ### Installing the Templates
 
-Copy the prompt templates and Codex CLI skill to your pi config:
+Install the package first so pi can discover the bundled prompt and skill directories via the package metadata:
+
+```bash
+pi install npm:pi-interactive-shell
+```
+
+If you want your own slash commands and local skill copies, copy the examples into your agent config:
 
 ```bash
 # Prompt templates (slash commands)
 cp ~/.pi/agent/extensions/interactive-shell/examples/prompts/*.md ~/.pi/agent/prompts/
 
-# Codex CLI skill (teaches pi how to use codex flags, sandbox caveats, etc.)
+# Skills used by the templates
 cp -r ~/.pi/agent/extensions/interactive-shell/examples/skills/codex-cli ~/.pi/agent/skills/
+cp -r ~/.pi/agent/extensions/interactive-shell/examples/skills/gpt-5-4-prompting ~/.pi/agent/skills/
+cp -r ~/.pi/agent/extensions/interactive-shell/examples/skills/codex-5-3-prompting ~/.pi/agent/skills/
 ```
 
 ### Usage
@@ -388,9 +396,9 @@ Say you have a plan at `docs/auth-redesign-plan.md`:
 
 These templates demonstrate a "meta-prompt generation" pattern:
 
-1. **Pi gathers context** — reads the plan, runs git diff, fetches the Codex prompting guide
-2. **Pi generates a calibrated prompt** — tailored to the specific plan/diff, following the guide's best practices
-3. **Pi launches Codex in the overlay** — with explicit flags (`-m gpt-5.3-codex -c model_reasoning_effort="high" -a never`) and hands off control
+1. **Pi gathers context** — reads the plan, runs git diff, and loads the local `gpt-5-4-prompting` or `codex-5-3-prompting` skill
+2. **Pi generates a calibrated prompt** — tailored to the specific plan/diff, following the selected skill's best practices
+3. **Pi launches Codex in the overlay** — defaulting to `-m gpt-5.4 -a never` and switching to `-m gpt-5.3-codex -a never` only when the user explicitly asks for Codex 5.3
 
 The user watches Codex work in the overlay and can take over anytime (type to intervene, Ctrl+T to transfer output back to pi, Ctrl+Q for options).
 
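The dispatch auto-exit timing in the README above reduces to one rule: a session is never killed inside the startup grace window, and otherwise dies only after the quiet threshold of continuous silence. A sketch of that rule as the docs describe it (hypothetical helper, semantics assumed from the prose, not the extension's implementation):

```typescript
// Defaults from the README's configuration table.
const AUTO_EXIT_GRACE_MS = 15_000; // autoExitGracePeriod
const QUIET_THRESHOLD_MS = 8_000;  // handsFreeQuietThreshold

// Earliest instant an autoExitOnQuiet session may be killed: after
// QUIET_THRESHOLD_MS of silence, but never inside the grace window.
function earliestAutoExit(startedAtMs: number, lastOutputAtMs: number): number {
  return Math.max(startedAtMs + AUTO_EXIT_GRACE_MS, lastOutputAtMs + QUIET_THRESHOLD_MS);
}
```

A process that prints nothing still gets the full 15s grace window; one that keeps producing output keeps pushing the quiet deadline forward.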
package/SKILL.md CHANGED
@@ -5,7 +5,7 @@ description: Cheat sheet + workflow for launching interactive coding-agent CLIs
 
 # Interactive Shell (Skill)
 
-Last verified: 2026-01-18
+Last verified: 2026-03-12
 
 ## Foreground vs Background Subagents
 
@@ -165,7 +165,7 @@ interactive_shell({
   reason: "Security review",
   handsFree: { autoExitOnQuiet: true }
 })
-// Session auto-kills after ~5s of quiet
+// Session auto-kills after ~8s of quiet (after the startup grace period)
 // Read results from file:
 // read("/tmp/security-review.md")
 ```
@@ -0,0 +1,76 @@
+import { truncateToWidth, visibleWidth } from "@mariozechner/pi-tui";
+import { formatDuration } from "./types.js";
+import type { ShellSessionManager } from "./session-manager.js";
+
+export function setupBackgroundWidget(
+  ctx: { ui: { setWidget: Function }; hasUI?: boolean },
+  sessionManager: ShellSessionManager,
+): (() => void) | null {
+  if (!ctx.hasUI) return null;
+
+  let durationTimer: ReturnType<typeof setInterval> | null = null;
+  let tuiRef: { requestRender: () => void } | null = null;
+
+  const requestRender = () => tuiRef?.requestRender();
+  const unsubscribe = sessionManager.onChange(() => {
+    manageDurationTimer();
+    requestRender();
+  });
+
+  function manageDurationTimer() {
+    const sessions = sessionManager.list();
+    const hasRunning = sessions.some((s) => !s.session.exited);
+    if (hasRunning && !durationTimer) {
+      durationTimer = setInterval(requestRender, 10_000);
+    } else if (!hasRunning && durationTimer) {
+      clearInterval(durationTimer);
+      durationTimer = null;
+    }
+  }
+
+  ctx.ui.setWidget(
+    "bg-sessions",
+    (tui: any, theme: any) => {
+      tuiRef = tui;
+      return {
+        render: (width: number) => {
+          const sessions = sessionManager.list();
+          if (sessions.length === 0) return [];
+          const cols = width || tui.terminal?.columns || 120;
+          const lines: string[] = [];
+          for (const s of sessions) {
+            const exited = s.session.exited;
+            const dot = exited ? theme.fg("dim", "○") : theme.fg("accent", "●");
+            const id = theme.fg("dim", s.id);
+            const cmd = s.command.replace(/\s+/g, " ").trim();
+            const truncCmd = cmd.length > 60 ? cmd.slice(0, 57) + "..." : cmd;
+            const reason = s.reason ? theme.fg("dim", ` · ${s.reason}`) : "";
+            const status = exited ? theme.fg("dim", "exited") : theme.fg("success", "running");
+            const duration = theme.fg("dim", formatDuration(Date.now() - s.startedAt.getTime()));
+            const oneLine = ` ${dot} ${id} ${truncCmd}${reason} ${status} ${duration}`;
+            if (visibleWidth(oneLine) <= cols) {
+              lines.push(oneLine);
+            } else {
+              lines.push(truncateToWidth(` ${dot} ${id} ${cmd}`, cols, "…"));
+              lines.push(truncateToWidth(` ${status} ${duration}${reason}`, cols, "…"));
+            }
+          }
+          return lines;
+        },
+        invalidate: () => {},
+      };
+    },
+    { placement: "belowEditor" },
+  );
+
+  manageDurationTimer();
+
+  return () => {
+    unsubscribe();
+    if (durationTimer) {
+      clearInterval(durationTimer);
+      durationTimer = null;
+    }
+    ctx.ui.setWidget("bg-sessions", undefined);
+  };
+}
@@ -1,7 +1,11 @@
 ---
 description: Launch Codex CLI in overlay to fully implement an existing plan/spec document
 ---
-Load the `codex-5.3-prompting` and `codex-cli` skills. Then read the plan at `$1`.
+Determine which prompting skill to load based on model:
+- Default: Load `gpt-5-4-prompting` skill (for `gpt-5.4`)
+- If user explicitly requests Codex 5.3: Load `codex-5-3-prompting` skill (for `gpt-5.3-codex`)
+
+Also load the `codex-cli` skill. Then read the plan at `$1`.
 
 Analyze the plan to understand: how many files are created vs modified, whether there's a prescribed implementation order or prerequisites, what existing code is referenced, and roughly how large the implementation is.
 
@@ -17,9 +21,13 @@ Based on the prompting skill's best practices and the plan's content, generate a
 8. After implementing all files, do a self-review pass: re-read the plan from top to bottom and verify every requirement, every edge case, every design decision is addressed in the code. Check for: missing imports, type mismatches, unreachable code paths, inconsistent field names between modules, and any plan requirement that was overlooked.
 9. Do NOT commit or push. Write a summary listing every file created or modified, what was implemented in each, and any plan ambiguities that required judgment calls.
 
-The meta prompt should follow the prompting skill's patterns: clear system context, explicit scope and verbosity constraints, step-by-step instructions, and expected output format. Instruct Codex not to ask clarifying questions about things answerable by reading the plan or codebase — read first, then act. Keep progress updates brief and concrete (no narrating routine file reads or tool calls). Emphasize that the plan has already been thoroughly reviewed — the job is faithful execution, not second-guessing the design. Emphasize scope discipline GPT-5.3-Codex is aggressive about refactoring adjacent code if not explicitly fenced in.
+The meta prompt should follow the prompting skill's patterns: clear system context, explicit scope and verbosity constraints, step-by-step instructions, and expected output format. Instruct Codex not to ask clarifying questions about things answerable by reading the plan or codebase — read first, then act. Keep progress updates brief and concrete (no narrating routine file reads or tool calls). Emphasize that the plan has already been thoroughly reviewed — the job is faithful execution, not second-guessing the design. Emphasize scope discipline and verification requirements per the prompting skill.
+
+Determine the model flag:
+- Default: `-m gpt-5.4`
+- If user explicitly requests Codex 5.3: `-m gpt-5.3-codex`
 
-Then launch Codex CLI in the interactive shell overlay with that meta prompt using these flags: `-m gpt-5.3-codex -c model_reasoning_effort="high" -a never`.
+Then launch Codex CLI in the interactive shell overlay with that meta prompt using the chosen model flag plus `-a never`.
 
 Use `interactive_shell` with `mode: "dispatch"` for this delegated run (fire-and-forget with completion notification). Do NOT pass sandbox flags in interactive_shell. Dispatch mode only. End turn immediately. Do not poll. Wait for completion notification.
 
@@ -1,7 +1,11 @@
 ---
 description: Launch Codex CLI in overlay to review implemented code changes (optionally against a plan)
 ---
-Load the `codex-5.3-prompting` and `codex-cli` skills. Then determine the review scope:
+Determine which prompting skill to load based on model:
+- Default: Load `gpt-5-4-prompting` skill (for `gpt-5.4`)
+- If user explicitly requests Codex 5.3: Load `codex-5-3-prompting` skill (for `gpt-5.3-codex`)
+
+Also load the `codex-cli` skill. Then determine the review scope:
 
 - If `$1` looks like a file path (contains `/` or ends in `.md`): read it as the plan/spec these changes were based on. The diff scope is uncommitted changes vs HEAD, or if clean, the current branch vs main.
 - Otherwise: no plan file. Diff scope is the same. Treat all of `$@` as additional review context or focus areas.
@@ -18,9 +22,13 @@ Based on the prompting skill's best practices, the diff scope, and the optional
 6. Fix every issue found with direct code edits. Keep fixes scoped to the actual issues identified — do not expand into refactoring or restructuring code that wasn't flagged in the review. If adjacent code looks problematic, note it in the summary but don't touch it.
 7. After all fixes, write a clear summary listing what was found, what was fixed, and any remaining concerns that require human judgment.
 
-The meta prompt should follow the prompting skill's patterns: clear system context, explicit scope and verbosity constraints, step-by-step instructions, and expected output format. Instruct Codex not to ask clarifying questions — if intent is unclear, read the surrounding code for context instead of asking. Keep progress updates brief and concrete (no narrating routine file reads or tool calls). Emphasize thoroughness — read the actual code deeply before making judgments, question every assumption, and never rubber-stamp. GPT-5.3-Codex moves fast and can skim; the meta prompt must force it to slow down and read carefully before judging.
+The meta prompt should follow the prompting skill's patterns: clear system context, explicit scope and verbosity constraints, step-by-step instructions, and expected output format. Instruct Codex not to ask clarifying questions — if intent is unclear, read the surrounding code for context instead of asking. Keep progress updates brief and concrete (no narrating routine file reads or tool calls). Emphasize thoroughness — read the actual code deeply before making judgments, question every assumption, and never rubber-stamp. Emphasize scope discipline and verification requirements per the prompting skill.
+
+Determine the model flag:
+- Default: `-m gpt-5.4`
+- If user explicitly requests Codex 5.3: `-m gpt-5.3-codex`
 
-Then launch Codex CLI in the interactive shell overlay with that meta prompt using these flags: `-m gpt-5.3-codex -c model_reasoning_effort="high" -a never`.
+Then launch Codex CLI in the interactive shell overlay with that meta prompt using the chosen model flag plus `-a never`.
 
 Use `interactive_shell` with `mode: "dispatch"` for this delegated run (fire-and-forget with completion notification). Do NOT pass sandbox flags in interactive_shell. Dispatch mode only. End turn immediately. Do not poll. Wait for completion notification.
 
@@ -1,7 +1,11 @@
 ---
 description: Launch Codex CLI in overlay to review an implementation plan against the codebase
 ---
-Load the `codex-5.3-prompting` and `codex-cli` skills. Then read the plan at `$1`.
+Determine which prompting skill to load based on model:
+- Default: Load `gpt-5-4-prompting` skill (for `gpt-5.4`)
+- If user explicitly requests Codex 5.3: Load `codex-5-3-prompting` skill (for `gpt-5.3-codex`)
+
+Also load the `codex-cli` skill. Then read the plan at `$1`.
 
 Based on the prompting skill's best practices and the plan's content, generate a comprehensive meta prompt tailored for Codex CLI. The meta prompt should instruct Codex to:
 
@@ -12,9 +16,13 @@ Based on the prompting skill's best practices and the plan's content, generate a
 5. Identify any gaps, contradictions, incorrect assumptions, or missing steps.
 6. Make targeted edits to the plan file to fix issues found, adding inline notes where changes were made. Fix what's wrong — do not restructure or rewrite sections that are correct.
 
-The meta prompt should follow the prompting skill's patterns (clear system context, explicit constraints, step-by-step instructions, expected output format). Instruct Codex not to ask clarifying questions — read the codebase to resolve ambiguities instead of asking. Keep progress updates brief and concrete. GPT-5.3-Codex is eager and may restructure the plan beyond what's needed; constrain edits to actual issues found.
+The meta prompt should follow the prompting skill's patterns (clear system context, explicit constraints, step-by-step instructions, expected output format). Instruct Codex not to ask clarifying questions — read the codebase to resolve ambiguities instead of asking. Keep progress updates brief and concrete. Emphasize scope discipline and verification requirements per the prompting skill.
+
+Determine the model flag:
+- Default: `-m gpt-5.4`
+- If user explicitly requests Codex 5.3: `-m gpt-5.3-codex`
 
-Then launch Codex CLI in the interactive shell overlay with that meta prompt using these flags: `-m gpt-5.3-codex -c model_reasoning_effort="xhigh" -a never`.
+Then launch Codex CLI in the interactive shell overlay with that meta prompt using the chosen model flag plus `-a never`.
 
 Use `interactive_shell` with `mode: "dispatch"` for this delegated run (fire-and-forget with completion notification). Do NOT pass sandbox flags in interactive_shell. Dispatch mode only. End turn immediately. Do not poll. Wait for completion notification.
 
@@ -1,5 +1,5 @@
 ---
-name: codex-5.3-prompting
+name: codex-5-3-prompting
 description: How to write system prompts and instructions for GPT-5.3-Codex. Use when constructing or tuning prompts targeting Codex 5.3.
 ---
 
@@ -23,7 +23,7 @@ description: OpenAI Codex CLI reference. Use when running codex in interactive_s
 
 | Flag | Description |
 |------|-------------|
-| `-m, --model <model>` | Switch model (default: `gpt-5.3-codex`) |
+| `-m, --model <model>` | Switch model (default: `gpt-5.3-codex`). Options include `gpt-5.4` (newer, more thorough) and `gpt-5.3-codex` (faster) |
 | `-c <key=value>` | Override config.toml values (dotted paths, parsed as TOML) |
 | `-p, --profile <name>` | Use config profile from config.toml |
 | `-s, --sandbox <mode>` | Sandbox policy: `read-only`, `workspace-write`, `danger-full-access` |
@@ -71,15 +71,21 @@ Use explicit flags to control model and behavior per-run.
 For delegated fire-and-forget runs, prefer `mode: "dispatch"` so the agent is notified automatically when Codex completes.
 
 ```typescript
-// Delegated run with completion notification (recommended default)
+// Delegated run with gpt-5.4 (recommended for thorough work)
 interactive_shell({
-  command: 'codex -m gpt-5.3-codex -a never "Review this codebase for security issues"',
+  command: 'codex -m gpt-5.4 -a never "Review this codebase for security issues"',
   mode: "dispatch"
 })
 
-// Override reasoning effort for a single delegated run
+// Faster run with gpt-5.3-codex
 interactive_shell({
-  command: 'codex -m gpt-5.3-codex -c model_reasoning_effort="xhigh" -a never "Complex refactor task"',
+  command: 'codex -m gpt-5.3-codex -a never "Quick refactor task"',
+  mode: "dispatch"
+})
+
+// Override reasoning effort for complex tasks
+interactive_shell({
+  command: 'codex -m gpt-5.4 -c model_reasoning_effort="xhigh" -a never "Complex architecture review"',
   mode: "dispatch"
 })
 
@@ -0,0 +1,202 @@
1
+ ---
2
+ name: gpt-5-4-prompting
3
+ description: How to write system prompts and instructions for GPT-5.4. Use when constructing or tuning prompts targeting GPT-5.4.
4
+ ---
5
+
6
+ # GPT-5.4 Prompting Guide
7
+
8
+ GPT-5.4 unifies reasoning, coding, and agentic capabilities into a single frontier model. It's extremely persistent, highly token-efficient, and delivers more human-like outputs than its predecessors. However, it has new failure modes: it moves fast without solid plans, expands scope aggressively, and can prematurely declare tasks complete—sometimes falsely claiming success. Prompts must account for these behaviors.
9
+
10
+ ## Output shape
11
+
12
+ Always include.
13
+
14
+ ```
15
+ <output_verbosity_spec>
16
+ - Default: 3-6 sentences or <=5 bullets for typical answers.
17
+ - Simple yes/no questions: <=2 sentences.
18
+ - Complex multi-step or multi-file tasks:
19
+ - 1 short overview paragraph
20
+ - then <=5 bullets tagged: What changed, Where, Risks, Next steps, Open questions.
21
+ - Avoid long narrative paragraphs; prefer compact bullets and short sections.
22
+ - Do not rephrase the user's request unless it changes semantics.
23
+ </output_verbosity_spec>
24
+ ```
25
+
26
+ ## Scope constraints
27
+
28
+ Critical. GPT-5.4's primary failure mode is scope expansion—it adds features, refactors beyond the ask, and "helpfully" extends tasks. Fence it in hard.
29
+
30
+ ```
31
+ <design_and_scope_constraints>
32
+ - Implement EXACTLY and ONLY what the user requests. Nothing more.
33
+ - No extra features, no "while I'm here" improvements, no UX embellishments.
34
+ - Do NOT expand the task scope under any circumstances.
35
+ - If you notice adjacent issues or opportunities, note them in your summary but DO NOT act on them.
36
+ - If any instruction is ambiguous, choose the simplest valid interpretation.
37
+ - Style aligned to the existing design system. Do not invent new patterns.
38
+ - Do NOT invent colors, shadows, tokens, animations, or new UI elements unless explicitly requested.
39
+ </design_and_scope_constraints>
40
+ ```
41
+
42
+ ## Verification requirements
43
+
44
+ Critical. GPT-5.4 can declare tasks complete prematurely or claim success when the implementation is incorrect. Force explicit verification.
45
+
46
+ ```
47
+ <verification_requirements>
48
+ - Before declaring any task complete, perform explicit verification:
49
+ - Re-read the original requirements
50
+ - Check that every requirement is addressed in the actual code
51
+ - Run tests or validation steps if available
52
+ - Confirm the implementation actually works, don't assume
53
+ - Do NOT claim success based on intent—verify actual outcomes.
54
+ - If you cannot verify (no tests, can't run code), say so explicitly.
55
+ - When reporting completion, include concrete evidence: test results, verified file contents, or explicit acknowledgment of what couldn't be verified.
56
+ - If something failed or was skipped, say so clearly. Do not obscure failures.
57
+ </verification_requirements>
58
+ ```
59
+
60
+ ## Context loading
61
+
62
+ Always include. GPT-5.4 is faster and may skip reading in favor of acting. Force thoroughness.
63
+
64
+ ```
65
+ <context_loading>
66
+ - Read ALL files that will be modified—in full, not just the sections mentioned in the task.
67
+ - Also read key files they import from or that depend on them.
68
+ - Absorb surrounding patterns, naming conventions, error handling style, and architecture before writing any code.
69
+ - Do not ask clarifying questions about things that are answerable by reading the codebase.
70
+ - If modifying existing code, understand the full context before making changes.
71
+ </context_loading>
72
+ ```
73
+
74
+ ## Plan-first mode
75
+
76
+ Include for multi-file work, refactors, or tasks with ordering dependencies. GPT-5.4 produces good natural-language plans but may skip validation steps.
77
+
78
+ ```
79
+ <plan_first>
80
+ - Before writing any code, produce a brief implementation plan:
81
+ - Files to create vs. modify
82
+ - Implementation order and prerequisites
83
+ - Key design decisions and edge cases
84
+ - Acceptance criteria for "done"
85
+ - How you will verify each step
86
+ - Execute the plan step by step. After each step, verify it worked before proceeding.
87
+ - If the plan is provided externally, follow it faithfully—the job is execution, not second-guessing.
88
+ - Do NOT skip verification steps even if you're confident.
89
+ </plan_first>
90
+ ```
91
+
92
+ ## Long-context handling
93
+
94
+ GPT-5.4 supports up to 1M tokens, but accuracy degrades beyond ~512K. Handle long inputs carefully.
95
+
96
+ ```
97
+ <long_context_handling>
98
+ - For inputs longer than ~10k tokens:
99
+ - First, produce a short internal outline of the key sections relevant to the task.
100
+ - Re-state the constraints explicitly before answering.
101
+ - Anchor claims to sections ("In the 'Data Retention' section...") rather than speaking generically.
102
+ - If the answer depends on fine details (dates, thresholds, clauses), quote or paraphrase them.
103
+ - For very long contexts (200K+ tokens):
104
+ - Be extra vigilant about accuracy—retrieval quality degrades.
105
+ - Cross-reference claims against multiple sections.
106
+ - Prefer citing specific locations over making sweeping statements.
107
+ </long_context_handling>
108
+ ```
109
+
110
+ ## Tool usage
111
+
112
+ ```
113
+ <tool_usage_rules>
114
+ - Prefer tools over internal knowledge whenever:
115
+ - You need fresh or user-specific data (tickets, orders, configs, logs).
116
+ - You reference specific IDs, URLs, or document titles.
117
+ - Parallelize independent tool calls when possible to reduce latency.
118
+ - After any write/update tool call, verify the outcome—do not assume success.
119
+ - After any write/update tool call, briefly restate:
120
+ - What changed
121
+ - Where (ID or path)
122
+ - Verification performed or why verification was skipped
123
+ </tool_usage_rules>
124
+ ```
125
+
126
+ ## Backwards compatibility hedging
127
+
128
+ GPT-5.4 tends to preserve old patterns and add compatibility shims. Use **"cutover"** to signal a clean break.
129
+
130
+ Instead of:
131
+ > "Rewrite this and don't worry about backwards compatibility"
132
+
133
+ Say:
134
+ > "This is a cutover. No backwards compatibility. Rewrite using only Python 3.12+ features and current best practices. Do not preserve legacy code, polyfills, or deprecated patterns."
135
+
136
+ ## Quick reference
137
+
138
+ - **Constrain scope aggressively.** GPT-5.4 expands tasks beyond the ask. "ONLY what is requested, nothing more."
139
+ - **Force verification.** Don't trust "done"—require evidence. "Verify before claiming complete."
140
+ - **Use cutover language.** "Cutover," "no fallbacks," "exactly as specified" get cleaner results.
141
+ - **Plan mode helps.** Explicit plan-first prompts ensure verification steps.
142
+ - **Watch for false success claims.** In agent harnesses, add explicit validation steps. Don't let it self-report completion.
143
+ - **Steer mid-task.** GPT-5.4 handles redirects well. Be direct: "Stop. That's out of scope." / "Verify that actually worked."
144
+ - **Use domain jargon.** "Cutover," "golden-path," "no fallbacks," "domain split," "exactly as specified" trigger precise behavior.
145
+ - **Long context degrades.** Above ~512K tokens, cross-reference claims and cite specific sections.
146
+ - **Token efficiency is real.** 5.4 uses fewer tokens per problem—but verify it didn't skip steps to get there.
147
+
148
+ ## Example: implementation task prompt
+
+ ```
+ <system>
+ You are implementing a feature in an existing codebase. Follow these rules strictly.
+
+ <design_and_scope_constraints>
+ - Implement EXACTLY and ONLY what the user requests. Nothing more.
+ - No extra features, no "while I'm here" improvements.
+ - If you notice adjacent issues, note them in your summary but DO NOT act on them.
+ </design_and_scope_constraints>
+
+ <context_loading>
+ - Read ALL files that will be modified—in full.
+ - Also read key files they import from or depend on.
+ - Absorb patterns before writing any code.
+ </context_loading>
+
+ <verification_requirements>
+ - Before declaring complete, verify each requirement is addressed in actual code.
+ - Run tests if available. If not, state what couldn't be verified.
+ - Include concrete evidence of completion in your summary.
+ </verification_requirements>
+
+ <output_verbosity_spec>
+ - Brief updates only on major phases or blockers.
+ - Final summary: What changed, Where, Risks, Next steps.
+ </output_verbosity_spec>
+ </system>
+ ```
+
179
+ ## Example: code review prompt
+
+ ```
+ <system>
+ You are reviewing code changes. Be thorough but stay in scope.
+
+ <context_loading>
+ - Read every changed file in full, not just the diff hunks.
+ - Also read files they import from and key dependents.
+ </context_loading>
+
+ <review_scope>
+ - Review for: bugs, logic errors, race conditions, resource leaks, null hazards, error handling gaps, type mismatches, dead code, unused imports, pattern inconsistencies.
+ - Fix issues you find with direct code edits.
+ - Do NOT refactor or restructure code that wasn't flagged in the review.
+ - If adjacent code looks problematic, note it but don't touch it.
+ </review_scope>
+
+ <verification_requirements>
+ - After fixes, verify the code still works. Run tests if available.
+ - In your summary, list what was found, what was fixed, and what couldn't be verified.
+ </verification_requirements>
+ </system>
+ ```
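The tagged rule blocks in the two example prompts lend themselves to programmatic assembly. A minimal sketch of that idea — the `section` helper and the exact strings here are illustrative, not part of any API:

```typescript
// Hypothetical helper: wraps a list of rules in the XML-style tags
// used by the example prompts above.
function section(tag: string, rules: string[]): string {
  return `<${tag}>\n${rules.map((r) => `- ${r}`).join("\n")}\n</${tag}>`;
}

// Assemble a system prompt from the same building blocks as the
// implementation-task example (abbreviated to one rule per section).
const systemPrompt = [
  "You are implementing a feature in an existing codebase. Follow these rules strictly.",
  section("design_and_scope_constraints", [
    "Implement EXACTLY and ONLY what the user requests. Nothing more.",
  ]),
  section("verification_requirements", [
    "Before declaring complete, verify each requirement is addressed in actual code.",
  ]),
].join("\n\n");
```

Keeping each constraint family in its own tagged section makes it easy to reuse the scope and verification rules across different task prompts.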
@@ -0,0 +1,92 @@
+ import { mkdirSync, writeFileSync } from "node:fs";
+ import { join } from "node:path";
+ import { getAgentDir } from "@mariozechner/pi-coding-agent";
+ import type { InteractiveShellConfig } from "./config.js";
+ import type { InteractiveShellOptions, InteractiveShellResult } from "./types.js";
+ import type { PtyTerminalSession } from "./pty-session.js";
+
+ export function captureCompletionOutput(
+   session: PtyTerminalSession,
+   config: InteractiveShellConfig,
+ ): InteractiveShellResult["completionOutput"] {
+   const result = session.getTailLines({
+     lines: config.completionNotifyLines,
+     ansi: false,
+     maxChars: config.completionNotifyMaxChars,
+   });
+   return {
+     lines: result.lines,
+     totalLines: result.totalLinesInBuffer,
+     truncated: result.lines.length < result.totalLinesInBuffer || result.truncatedByChars,
+   };
+ }
+
+ export function captureTransferOutput(
+   session: PtyTerminalSession,
+   config: InteractiveShellConfig,
+ ): InteractiveShellResult["transferred"] {
+   const result = session.getTailLines({
+     lines: config.transferLines,
+     ansi: false,
+     maxChars: config.transferMaxChars,
+   });
+   return {
+     lines: result.lines,
+     totalLines: result.totalLinesInBuffer,
+     truncated: result.lines.length < result.totalLinesInBuffer || result.truncatedByChars,
+   };
+ }
+
+ export function maybeBuildHandoffPreview(
+   session: PtyTerminalSession,
+   when: "exit" | "detach" | "kill" | "timeout" | "transfer",
+   config: InteractiveShellConfig,
+   overrides?: Pick<InteractiveShellOptions, "handoffPreviewEnabled" | "handoffPreviewLines" | "handoffPreviewMaxChars">,
+ ): InteractiveShellResult["handoffPreview"] | undefined {
+   const enabled = overrides?.handoffPreviewEnabled ?? config.handoffPreviewEnabled;
+   if (!enabled) return undefined;
+   const lines = overrides?.handoffPreviewLines ?? config.handoffPreviewLines;
+   const maxChars = overrides?.handoffPreviewMaxChars ?? config.handoffPreviewMaxChars;
+   if (lines <= 0 || maxChars <= 0) return undefined;
+   const result = session.getTailLines({ lines, ansi: false, maxChars });
+   return { type: "tail", when, lines: result.lines };
+ }
+
+ export function maybeWriteHandoffSnapshot(
+   session: PtyTerminalSession,
+   when: "exit" | "detach" | "kill" | "timeout" | "transfer",
+   config: InteractiveShellConfig,
+   context: { command: string; cwd?: string },
+   overrides?: Pick<InteractiveShellOptions, "handoffSnapshotEnabled" | "handoffSnapshotLines" | "handoffSnapshotMaxChars">,
+ ): InteractiveShellResult["handoff"] | undefined {
+   const enabled = overrides?.handoffSnapshotEnabled ?? config.handoffSnapshotEnabled;
+   if (!enabled) return undefined;
+   const lines = overrides?.handoffSnapshotLines ?? config.handoffSnapshotLines;
+   const maxChars = overrides?.handoffSnapshotMaxChars ?? config.handoffSnapshotMaxChars;
+   if (lines <= 0 || maxChars <= 0) return undefined;
+
+   const baseDir = join(getAgentDir(), "cache", "interactive-shell");
+   mkdirSync(baseDir, { recursive: true });
+   const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
+   const pid = session.pid;
+   const filename = `snapshot-${timestamp}-pid${pid}.log`;
+   const transcriptPath = join(baseDir, filename);
+   const tailResult = session.getTailLines({
+     lines,
+     ansi: config.ansiReemit,
+     maxChars,
+   });
+   const header = [
+     `# interactive-shell snapshot (${when})`,
+     `time: ${new Date().toISOString()}`,
+     `command: ${context.command}`,
+     `cwd: ${context.cwd ?? ""}`,
+     `pid: ${pid}`,
+     `exitCode: ${session.exitCode ?? ""}`,
+     `signal: ${session.signal ?? ""}`,
+     `lines: ${tailResult.lines.length} (requested ${lines}, maxChars ${maxChars})`,
+     "",
+   ].join("\n");
+   writeFileSync(transcriptPath, header + tailResult.lines.join("\n") + "\n", { encoding: "utf-8" });
+   return { type: "snapshot", when, transcriptPath, linesWritten: tailResult.lines.length };
+ }
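The snapshot path in `maybeWriteHandoffSnapshot` hinges on making the ISO timestamp filesystem-safe: `:` is invalid in Windows filenames and extra dots confuse extension-based tooling, so both are replaced with `-`. A standalone sketch of that naming scheme — `snapshotFilename` is an illustrative extraction, not an export of the package:

```typescript
// Illustrative helper mirroring the naming scheme used by
// maybeWriteHandoffSnapshot: sanitize the ISO timestamp, then
// embed the PID so concurrent sessions never collide on a name.
function snapshotFilename(pid: number, date: Date = new Date()): string {
  const timestamp = date.toISOString().replace(/[:.]/g, "-");
  return `snapshot-${timestamp}-pid${pid}.log`;
}

// A fixed date yields a deterministic, colon-free name.
console.log(snapshotFilename(1234, new Date("2026-03-13T08:30:00.000Z")));
// → snapshot-2026-03-13T08-30-00-000Z-pid1234.log
```

Because the name is derived from wall-clock time plus PID rather than a counter, repeated snapshots of the same session sort chronologically in the cache directory without any bookkeeping.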