pi-interactive-shell 0.9.0 → 0.10.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/CHANGELOG.md CHANGED
@@ -2,6 +2,33 @@
 
 All notable changes to the `pi-interactive-shell` extension will be documented in this file.
 
+## [0.10.0] - 2026-03-13
+
+### Added
+- **Test harness** - Added vitest with 20 tests covering session queries, key encoding, notification formatting, headless monitor lifecycle, session manager, config/docs parity, and module loading.
+- **`gpt-5-4-prompting` skill** - New bundled skill with GPT-5.4 prompting best practices for Codex workflows.
+
+### Changed
+- **Architecture refactor** - Extracted shared logic into focused modules for better maintainability:
+  - `session-query.ts` - Unified output/query logic (rate limiting, incremental, drain, offset modes)
+  - `notification-utils.ts` - Message formatting for dispatch/hands-free notifications
+  - `handoff-utils.ts` - Snapshot/preview capture on session exit/transfer
+  - `runtime-coordinator.ts` - Centralized overlay/monitor/widget state management
+  - `pty-log.ts` - Raw output trimming and line slicing
+  - `pty-protocol.ts` - DSR cursor position query handling
+  - `spawn-helper.ts` - macOS node-pty permission fix
+  - `background-widget.ts` - TUI widget for background sessions
+- README, `SKILL.md`, install output, and the packaged Codex workflow examples now give a consistent account of dispatch as the recommended delegated mode, the current 8s quiet-threshold / 15s grace-period defaults, and the bundled prompt-skill surface.
+- The Codex workflow docs now point at the packaged `gpt-5-4-prompting`, `codex-5-3-prompting`, and `codex-cli` skills instead of describing a runtime fetch of the old 5.2 prompting guide.
+- Example prompts and skill docs are aligned around `gpt-5.4` as the default Codex model, with `gpt-5.3-codex` remaining the explicit opt-in fallback.
+- Renamed the `codex-5.3-prompting` example skill to `codex-5-3-prompting` (filesystem-friendly path).
+
+### Fixed
+- **Map iteration bug** - Fixed `disposeAllMonitors()` modifying the monitor Map during iteration, which could cause unpredictable behavior.
+- **Array iteration bug** - Fixed PTY listener notifications modifying the listener array during iteration when a listener unsubscribed itself.
+- **Missing runtime dependency** - Added `@sinclair/typebox` to dependencies (it was imported but not declared).
+- Documented the packaged prompt/skill onboarding path more clearly so users can either rely on the exported package metadata or copy the bundled examples into their own prompt and skill directories.
+
 ## [0.9.0] - 2026-02-23
 
 ### Added
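The two iteration fixes above share one pattern: snapshot the collection before iterating, so a callback that removes entries mid-loop cannot skip or double-visit the rest. A minimal sketch of that pattern (hypothetical names, not the extension's actual code):

```typescript
// Hypothetical sketch of the snapshot-before-iterate fix; the real
// disposeAllMonitors()/listener code in the extension differs.
function disposeAll(monitors: Map<string, { dispose: () => void }>): void {
  // Iterate a copy: dispose() may delete entries from the live Map.
  for (const monitor of [...monitors.values()]) {
    monitor.dispose();
  }
  monitors.clear();
}

function notifyAll(listeners: Array<() => void>): void {
  // Iterate a copy: a listener may splice itself out of the live array.
  for (const listener of [...listeners]) {
    listener();
  }
}
```

Iterating the live collection instead would let `Map.delete()` or `Array.splice()` shift entries under the loop, which is exactly the class of bug the changelog describes.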
package/README.md CHANGED
@@ -115,7 +115,7 @@ Attach to review full output: interactive_shell({ attach: "calm-reef" })
 
 The notification includes a brief tail (last 5 lines) and a reattach instruction. The PTY is preserved for 5 minutes so the agent can attach to review full scrollback.
 
-Dispatch defaults `autoExitOnQuiet: true` — the session gets a 30s startup grace period, then is killed after output goes silent (5s by default), which signals completion for task-oriented subagents. Tune the grace period with `handsFree: { gracePeriod: 60000 }` or opt out entirely with `handsFree: { autoExitOnQuiet: false }`.
+Dispatch defaults to `autoExitOnQuiet: true` — the session gets a 15s startup grace period, then is killed after output goes silent (8s by default), which signals completion for task-oriented subagents. Tune the grace period with `handsFree: { gracePeriod: 60000 }` or opt out entirely with `handsFree: { autoExitOnQuiet: false }`.
 
 The overlay still shows for the user, who can Ctrl+T to transfer output, Ctrl+B to background, take over by typing, or Ctrl+Q for more options.
 
@@ -151,7 +151,7 @@ interactive_shell({
 
 ### Auto-Exit on Quiet
 
-For fire-and-forget single-task delegations, enable auto-exit to kill the session after 5s of output silence:
+For fire-and-forget single-task delegations, enable auto-exit to kill the session after 8s of output silence:
 
 ```typescript
 interactive_shell({
@@ -161,7 +161,7 @@ interactive_shell({
 })
 ```
 
-A 30s startup grace period prevents the session from being killed before the subprocess has time to produce output. Customize it per-call with `gracePeriod`:
+A 15s startup grace period prevents the session from being killed before the subprocess has time to produce output. Customize it per-call with `gracePeriod`:
 
 ```typescript
 interactive_shell({
@@ -282,8 +282,8 @@ Configuration files (project overrides global):
   "completionNotifyMaxChars": 5000,
   "handsFreeUpdateMode": "on-quiet",
   "handsFreeUpdateInterval": 60000,
-  "handsFreeQuietThreshold": 5000,
-  "autoExitGracePeriod": 30000,
+  "handsFreeQuietThreshold": 8000,
+  "autoExitGracePeriod": 15000,
   "handsFreeUpdateMaxChars": 1500,
   "handsFreeMaxTotalChars": 100000,
   "handoffPreviewEnabled": true,
@@ -306,8 +306,8 @@ Configuration files (project overrides global):
 | `completionNotifyLines` | 50 | Lines in dispatch completion notification (10-500) |
 | `completionNotifyMaxChars` | 5000 | Max chars in completion notification (1KB-50KB) |
 | `handsFreeUpdateMode` | "on-quiet" | "on-quiet" or "interval" |
-| `handsFreeQuietThreshold` | 5000 | Silence duration before update (ms) |
-| `autoExitGracePeriod` | 30000 | Startup grace before `autoExitOnQuiet` kill (ms) |
+| `handsFreeQuietThreshold` | 8000 | Silence duration before update (ms) |
+| `autoExitGracePeriod` | 15000 | Startup grace before `autoExitOnQuiet` kill (ms) |
 | `handsFreeUpdateInterval` | 60000 | Max interval between updates (ms) |
 | `handsFreeUpdateMaxChars` | 1500 | Max chars per update |
 | `handsFreeMaxTotalChars` | 100000 | Total char budget for updates |
@@ -331,7 +331,7 @@ Full PTY. The subprocess thinks it's in a real terminal.
 
 ## Example Workflow: Plan, Implement, Review
 
-The `examples/prompts/` directory includes three prompt templates that chain together into a complete development workflow using Codex CLI. Each template instructs pi to gather context, generate a tailored meta prompt based on the [Codex prompting guide](https://developers.openai.com/cookbook/examples/gpt-5/gpt-5-2_prompting_guide.md), and launch Codex in an interactive overlay.
+The `examples/prompts/` directory includes three prompt templates that chain together into a complete development workflow using Codex CLI. Each template now loads the bundled `gpt-5-4-prompting` skill by default, falls back to `codex-5-3-prompting` when the user explicitly asks for Codex 5.3, and launches Codex in an interactive overlay.
 
 ### The Pipeline
 
@@ -347,14 +347,22 @@ Write a plan
 
 ### Installing the Templates
 
-Copy the prompt templates and Codex CLI skill to your pi config:
+Install the package first so pi can discover the bundled prompt and skill directories via the package metadata:
+
+```bash
+pi install npm:pi-interactive-shell
+```
+
+If you want your own slash commands and local skill copies, copy the examples into your agent config:
 
 ```bash
 # Prompt templates (slash commands)
 cp ~/.pi/agent/extensions/interactive-shell/examples/prompts/*.md ~/.pi/agent/prompts/
 
-# Codex CLI skill (teaches pi how to use codex flags, sandbox caveats, etc.)
+# Skills used by the templates
 cp -r ~/.pi/agent/extensions/interactive-shell/examples/skills/codex-cli ~/.pi/agent/skills/
+cp -r ~/.pi/agent/extensions/interactive-shell/examples/skills/gpt-5-4-prompting ~/.pi/agent/skills/
+cp -r ~/.pi/agent/extensions/interactive-shell/examples/skills/codex-5-3-prompting ~/.pi/agent/skills/
 ```
 
 ### Usage
@@ -388,9 +396,9 @@ Say you have a plan at `docs/auth-redesign-plan.md`:
 
 These templates demonstrate a "meta-prompt generation" pattern:
 
-1. **Pi gathers context** — reads the plan, runs git diff, fetches the Codex prompting guide
-2. **Pi generates a calibrated prompt** — tailored to the specific plan/diff, following the guide's best practices
-3. **Pi launches Codex in the overlay** — with explicit flags (`-m gpt-5.3-codex -c model_reasoning_effort="high" -a never`) and hands off control
+1. **Pi gathers context** — reads the plan, runs git diff, and loads the local `gpt-5-4-prompting` or `codex-5-3-prompting` skill
+2. **Pi generates a calibrated prompt** — tailored to the specific plan/diff, following the selected skill's best practices
+3. **Pi launches Codex in the overlay** — defaulting to `-m gpt-5.4 -a never` and switching to `-m gpt-5.3-codex -a never` only when the user explicitly asks for Codex 5.3
 
 The user watches Codex work in the overlay and can take over anytime (type to intervene, Ctrl+T to transfer output back to pi, Ctrl+Q for options).
 
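The dispatch auto-exit timing in the README above reduces to one rule: a session is never killed inside the startup grace window, and otherwise dies only after the quiet threshold of continuous silence. A sketch of that rule as the docs describe it (hypothetical helper, semantics assumed from the prose, not the extension's implementation):

```typescript
// Defaults from the README's configuration table.
const AUTO_EXIT_GRACE_MS = 15_000; // autoExitGracePeriod
const QUIET_THRESHOLD_MS = 8_000;  // handsFreeQuietThreshold

// Earliest instant an autoExitOnQuiet session may be killed: after
// QUIET_THRESHOLD_MS of silence, but never inside the grace window.
function earliestAutoExit(startedAtMs: number, lastOutputAtMs: number): number {
  return Math.max(startedAtMs + AUTO_EXIT_GRACE_MS, lastOutputAtMs + QUIET_THRESHOLD_MS);
}
```

A process that prints nothing still gets the full 15s grace window; one that keeps producing output keeps pushing the quiet deadline forward.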
package/SKILL.md CHANGED
@@ -5,7 +5,7 @@ description: Cheat sheet + workflow for launching interactive coding-agent CLIs
 
 # Interactive Shell (Skill)
 
-Last verified: 2026-01-18
+Last verified: 2026-03-12
 
 ## Foreground vs Background Subagents
 
@@ -165,7 +165,7 @@ interactive_shell({
   reason: "Security review",
   handsFree: { autoExitOnQuiet: true }
 })
-// Session auto-kills after ~5s of quiet
+// Session auto-kills after ~8s of quiet (after the startup grace period)
 // Read results from file:
 // read("/tmp/security-review.md")
 ```
@@ -0,0 +1,76 @@
+import { truncateToWidth, visibleWidth } from "@mariozechner/pi-tui";
+import { formatDuration } from "./types.js";
+import type { ShellSessionManager } from "./session-manager.js";
+
+export function setupBackgroundWidget(
+  ctx: { ui: { setWidget: Function }; hasUI?: boolean },
+  sessionManager: ShellSessionManager,
+): (() => void) | null {
+  if (!ctx.hasUI) return null;
+
+  let durationTimer: ReturnType<typeof setInterval> | null = null;
+  let tuiRef: { requestRender: () => void } | null = null;
+
+  const requestRender = () => tuiRef?.requestRender();
+  const unsubscribe = sessionManager.onChange(() => {
+    manageDurationTimer();
+    requestRender();
+  });
+
+  function manageDurationTimer() {
+    const sessions = sessionManager.list();
+    const hasRunning = sessions.some((s) => !s.session.exited);
+    if (hasRunning && !durationTimer) {
+      durationTimer = setInterval(requestRender, 10_000);
+    } else if (!hasRunning && durationTimer) {
+      clearInterval(durationTimer);
+      durationTimer = null;
+    }
+  }
+
+  ctx.ui.setWidget(
+    "bg-sessions",
+    (tui: any, theme: any) => {
+      tuiRef = tui;
+      return {
+        render: (width: number) => {
+          const sessions = sessionManager.list();
+          if (sessions.length === 0) return [];
+          const cols = width || tui.terminal?.columns || 120;
+          const lines: string[] = [];
+          for (const s of sessions) {
+            const exited = s.session.exited;
+            const dot = exited ? theme.fg("dim", "○") : theme.fg("accent", "●");
+            const id = theme.fg("dim", s.id);
+            const cmd = s.command.replace(/\s+/g, " ").trim();
+            const truncCmd = cmd.length > 60 ? cmd.slice(0, 57) + "..." : cmd;
+            const reason = s.reason ? theme.fg("dim", ` · ${s.reason}`) : "";
+            const status = exited ? theme.fg("dim", "exited") : theme.fg("success", "running");
+            const duration = theme.fg("dim", formatDuration(Date.now() - s.startedAt.getTime()));
+            const oneLine = ` ${dot} ${id} ${truncCmd}${reason} ${status} ${duration}`;
+            if (visibleWidth(oneLine) <= cols) {
+              lines.push(oneLine);
+            } else {
+              lines.push(truncateToWidth(` ${dot} ${id} ${cmd}`, cols, "…"));
+              lines.push(truncateToWidth(` ${status} ${duration}${reason}`, cols, "…"));
+            }
+          }
+          return lines;
+        },
+        invalidate: () => {},
+      };
+    },
+    { placement: "belowEditor" },
+  );
+
+  manageDurationTimer();
+
+  return () => {
+    unsubscribe();
+    if (durationTimer) {
+      clearInterval(durationTimer);
+      durationTimer = null;
+    }
+    ctx.ui.setWidget("bg-sessions", undefined);
+  };
+}
@@ -1,7 +1,11 @@
 ---
 description: Launch Codex CLI in overlay to fully implement an existing plan/spec document
 ---
-Load the `codex-5.3-prompting` and `codex-cli` skills. Then read the plan at `$1`.
+Determine which prompting skill to load based on model:
+- Default: Load `gpt-5-4-prompting` skill (for `gpt-5.4`)
+- If user explicitly requests Codex 5.3: Load `codex-5-3-prompting` skill (for `gpt-5.3-codex`)
+
+Also load the `codex-cli` skill. Then read the plan at `$1`.
 
 Analyze the plan to understand: how many files are created vs modified, whether there's a prescribed implementation order or prerequisites, what existing code is referenced, and roughly how large the implementation is.
 
@@ -17,9 +21,13 @@ Based on the prompting skill's best practices and the plan's content, generate a
 8. After implementing all files, do a self-review pass: re-read the plan from top to bottom and verify every requirement, every edge case, every design decision is addressed in the code. Check for: missing imports, type mismatches, unreachable code paths, inconsistent field names between modules, and any plan requirement that was overlooked.
 9. Do NOT commit or push. Write a summary listing every file created or modified, what was implemented in each, and any plan ambiguities that required judgment calls.
 
-The meta prompt should follow the prompting skill's patterns: clear system context, explicit scope and verbosity constraints, step-by-step instructions, and expected output format. Instruct Codex not to ask clarifying questions about things answerable by reading the plan or codebase — read first, then act. Keep progress updates brief and concrete (no narrating routine file reads or tool calls). Emphasize that the plan has already been thoroughly reviewed — the job is faithful execution, not second-guessing the design. Emphasize scope discipline GPT-5.3-Codex is aggressive about refactoring adjacent code if not explicitly fenced in.
+The meta prompt should follow the prompting skill's patterns: clear system context, explicit scope and verbosity constraints, step-by-step instructions, and expected output format. Instruct Codex not to ask clarifying questions about things answerable by reading the plan or codebase — read first, then act. Keep progress updates brief and concrete (no narrating routine file reads or tool calls). Emphasize that the plan has already been thoroughly reviewed — the job is faithful execution, not second-guessing the design. Emphasize scope discipline and verification requirements per the prompting skill.
+
+Determine the model flag:
+- Default: `-m gpt-5.4`
+- If user explicitly requests Codex 5.3: `-m gpt-5.3-codex`
 
-Then launch Codex CLI in the interactive shell overlay with that meta prompt using these flags: `-m gpt-5.3-codex -c model_reasoning_effort="high" -a never`.
+Then launch Codex CLI in the interactive shell overlay with that meta prompt using the chosen model flag plus `-a never`.
 
 Use `interactive_shell` with `mode: "dispatch"` for this delegated run (fire-and-forget with completion notification). Do NOT pass sandbox flags in interactive_shell. Dispatch mode only. End turn immediately. Do not poll. Wait for completion notification.
 
@@ -1,7 +1,11 @@
 ---
 description: Launch Codex CLI in overlay to review implemented code changes (optionally against a plan)
 ---
-Load the `codex-5.3-prompting` and `codex-cli` skills. Then determine the review scope:
+Determine which prompting skill to load based on model:
+- Default: Load `gpt-5-4-prompting` skill (for `gpt-5.4`)
+- If user explicitly requests Codex 5.3: Load `codex-5-3-prompting` skill (for `gpt-5.3-codex`)
+
+Also load the `codex-cli` skill. Then determine the review scope:
 
 - If `$1` looks like a file path (contains `/` or ends in `.md`): read it as the plan/spec these changes were based on. The diff scope is uncommitted changes vs HEAD, or if clean, the current branch vs main.
 - Otherwise: no plan file. Diff scope is the same. Treat all of `$@` as additional review context or focus areas.
@@ -18,9 +22,13 @@ Based on the prompting skill's best practices, the diff scope, and the optional
 6. Fix every issue found with direct code edits. Keep fixes scoped to the actual issues identified — do not expand into refactoring or restructuring code that wasn't flagged in the review. If adjacent code looks problematic, note it in the summary but don't touch it.
 7. After all fixes, write a clear summary listing what was found, what was fixed, and any remaining concerns that require human judgment.
 
-The meta prompt should follow the prompting skill's patterns: clear system context, explicit scope and verbosity constraints, step-by-step instructions, and expected output format. Instruct Codex not to ask clarifying questions — if intent is unclear, read the surrounding code for context instead of asking. Keep progress updates brief and concrete (no narrating routine file reads or tool calls). Emphasize thoroughness — read the actual code deeply before making judgments, question every assumption, and never rubber-stamp. GPT-5.3-Codex moves fast and can skim; the meta prompt must force it to slow down and read carefully before judging.
+The meta prompt should follow the prompting skill's patterns: clear system context, explicit scope and verbosity constraints, step-by-step instructions, and expected output format. Instruct Codex not to ask clarifying questions — if intent is unclear, read the surrounding code for context instead of asking. Keep progress updates brief and concrete (no narrating routine file reads or tool calls). Emphasize thoroughness — read the actual code deeply before making judgments, question every assumption, and never rubber-stamp. Emphasize scope discipline and verification requirements per the prompting skill.
+
+Determine the model flag:
+- Default: `-m gpt-5.4`
+- If user explicitly requests Codex 5.3: `-m gpt-5.3-codex`
 
-Then launch Codex CLI in the interactive shell overlay with that meta prompt using these flags: `-m gpt-5.3-codex -c model_reasoning_effort="high" -a never`.
+Then launch Codex CLI in the interactive shell overlay with that meta prompt using the chosen model flag plus `-a never`.
 
 Use `interactive_shell` with `mode: "dispatch"` for this delegated run (fire-and-forget with completion notification). Do NOT pass sandbox flags in interactive_shell. Dispatch mode only. End turn immediately. Do not poll. Wait for completion notification.
 
@@ -1,7 +1,11 @@
 ---
 description: Launch Codex CLI in overlay to review an implementation plan against the codebase
 ---
-Load the `codex-5.3-prompting` and `codex-cli` skills. Then read the plan at `$1`.
+Determine which prompting skill to load based on model:
+- Default: Load `gpt-5-4-prompting` skill (for `gpt-5.4`)
+- If user explicitly requests Codex 5.3: Load `codex-5-3-prompting` skill (for `gpt-5.3-codex`)
+
+Also load the `codex-cli` skill. Then read the plan at `$1`.
 
 Based on the prompting skill's best practices and the plan's content, generate a comprehensive meta prompt tailored for Codex CLI. The meta prompt should instruct Codex to:
 
@@ -12,9 +16,13 @@ Based on the prompting skill's best practices and the plan's content, generate a
 5. Identify any gaps, contradictions, incorrect assumptions, or missing steps.
 6. Make targeted edits to the plan file to fix issues found, adding inline notes where changes were made. Fix what's wrong — do not restructure or rewrite sections that are correct.
 
-The meta prompt should follow the prompting skill's patterns (clear system context, explicit constraints, step-by-step instructions, expected output format). Instruct Codex not to ask clarifying questions — read the codebase to resolve ambiguities instead of asking. Keep progress updates brief and concrete. GPT-5.3-Codex is eager and may restructure the plan beyond what's needed; constrain edits to actual issues found.
+The meta prompt should follow the prompting skill's patterns (clear system context, explicit constraints, step-by-step instructions, expected output format). Instruct Codex not to ask clarifying questions — read the codebase to resolve ambiguities instead of asking. Keep progress updates brief and concrete. Emphasize scope discipline and verification requirements per the prompting skill.
+
+Determine the model flag:
+- Default: `-m gpt-5.4`
+- If user explicitly requests Codex 5.3: `-m gpt-5.3-codex`
 
-Then launch Codex CLI in the interactive shell overlay with that meta prompt using these flags: `-m gpt-5.3-codex -c model_reasoning_effort="xhigh" -a never`.
+Then launch Codex CLI in the interactive shell overlay with that meta prompt using the chosen model flag plus `-a never`.
 
 Use `interactive_shell` with `mode: "dispatch"` for this delegated run (fire-and-forget with completion notification). Do NOT pass sandbox flags in interactive_shell. Dispatch mode only. End turn immediately. Do not poll. Wait for completion notification.
 
@@ -1,5 +1,5 @@
 ---
-name: codex-5.3-prompting
+name: codex-5-3-prompting
 description: How to write system prompts and instructions for GPT-5.3-Codex. Use when constructing or tuning prompts targeting Codex 5.3.
 ---
 
@@ -23,7 +23,7 @@ description: OpenAI Codex CLI reference. Use when running codex in interactive_s
 
 | Flag | Description |
 |------|-------------|
-| `-m, --model <model>` | Switch model (default: `gpt-5.3-codex`) |
+| `-m, --model <model>` | Switch model (default: `gpt-5.3-codex`). Options include `gpt-5.4` (newer, more thorough) and `gpt-5.3-codex` (faster) |
 | `-c <key=value>` | Override config.toml values (dotted paths, parsed as TOML) |
 | `-p, --profile <name>` | Use config profile from config.toml |
 | `-s, --sandbox <mode>` | Sandbox policy: `read-only`, `workspace-write`, `danger-full-access` |
@@ -71,15 +71,21 @@ Use explicit flags to control model and behavior per-run.
 For delegated fire-and-forget runs, prefer `mode: "dispatch"` so the agent is notified automatically when Codex completes.
 
 ```typescript
-// Delegated run with completion notification (recommended default)
+// Delegated run with gpt-5.4 (recommended for thorough work)
 interactive_shell({
-  command: 'codex -m gpt-5.3-codex -a never "Review this codebase for security issues"',
+  command: 'codex -m gpt-5.4 -a never "Review this codebase for security issues"',
   mode: "dispatch"
 })
 
-// Override reasoning effort for a single delegated run
+// Faster run with gpt-5.3-codex
 interactive_shell({
-  command: 'codex -m gpt-5.3-codex -c model_reasoning_effort="xhigh" -a never "Complex refactor task"',
+  command: 'codex -m gpt-5.3-codex -a never "Quick refactor task"',
+  mode: "dispatch"
+})
+
+// Override reasoning effort for complex tasks
+interactive_shell({
+  command: 'codex -m gpt-5.4 -c model_reasoning_effort="xhigh" -a never "Complex architecture review"',
   mode: "dispatch"
 })
 
@@ -0,0 +1,202 @@
1
+ ---
2
+ name: gpt-5-4-prompting
3
+ description: How to write system prompts and instructions for GPT-5.4. Use when constructing or tuning prompts targeting GPT-5.4.
4
+ ---
5
+
6
+ # GPT-5.4 Prompting Guide
7
+
8
+ GPT-5.4 unifies reasoning, coding, and agentic capabilities into a single frontier model. It's extremely persistent, highly token-efficient, and delivers more human-like outputs than its predecessors. However, it has new failure modes: it moves fast without solid plans, expands scope aggressively, and can prematurely declare tasks complete—sometimes falsely claiming success. Prompts must account for these behaviors.
9
+
10
+ ## Output shape
11
+
12
+ Always include.
13
+
14
+ ```
15
+ <output_verbosity_spec>
16
+ - Default: 3-6 sentences or <=5 bullets for typical answers.
17
+ - Simple yes/no questions: <=2 sentences.
18
+ - Complex multi-step or multi-file tasks:
19
+ - 1 short overview paragraph
20
+ - then <=5 bullets tagged: What changed, Where, Risks, Next steps, Open questions.
21
+ - Avoid long narrative paragraphs; prefer compact bullets and short sections.
22
+ - Do not rephrase the user's request unless it changes semantics.
23
+ </output_verbosity_spec>
24
+ ```
25
+
26
+ ## Scope constraints
27
+
28
+ Critical. GPT-5.4's primary failure mode is scope expansion—it adds features, refactors beyond the ask, and "helpfully" extends tasks. Fence it in hard.
29
+
30
+ ```
31
+ <design_and_scope_constraints>
32
+ - Implement EXACTLY and ONLY what the user requests. Nothing more.
33
+ - No extra features, no "while I'm here" improvements, no UX embellishments.
34
+ - Do NOT expand the task scope under any circumstances.
35
+ - If you notice adjacent issues or opportunities, note them in your summary but DO NOT act on them.
36
+ - If any instruction is ambiguous, choose the simplest valid interpretation.
37
+ - Style aligned to the existing design system. Do not invent new patterns.
38
+ - Do NOT invent colors, shadows, tokens, animations, or new UI elements unless explicitly requested.
39
+ </design_and_scope_constraints>
40
+ ```
41
+
42
+ ## Verification requirements
43
+
44
+ Critical. GPT-5.4 can declare tasks complete prematurely or claim success when the implementation is incorrect. Force explicit verification.
45
+
46
+ ```
47
+ <verification_requirements>
48
+ - Before declaring any task complete, perform explicit verification:
49
+ - Re-read the original requirements
50
+ - Check that every requirement is addressed in the actual code
51
+ - Run tests or validation steps if available
52
+ - Confirm the implementation actually works, don't assume
53
+ - Do NOT claim success based on intent—verify actual outcomes.
54
+ - If you cannot verify (no tests, can't run code), say so explicitly.
55
+ - When reporting completion, include concrete evidence: test results, verified file contents, or explicit acknowledgment of what couldn't be verified.
56
+ - If something failed or was skipped, say so clearly. Do not obscure failures.
57
+ </verification_requirements>
58
+ ```
59
+
60
+ ## Context loading
61
+
62
+ Always include. GPT-5.4 is faster and may skip reading in favor of acting. Force thoroughness.
63
+
64
+ ```
65
+ <context_loading>
66
+ - Read ALL files that will be modified—in full, not just the sections mentioned in the task.
67
+ - Also read key files they import from or that depend on them.
68
+ - Absorb surrounding patterns, naming conventions, error handling style, and architecture before writing any code.
69
+ - Do not ask clarifying questions about things that are answerable by reading the codebase.
70
+ - If modifying existing code, understand the full context before making changes.
71
+ </context_loading>
72
+ ```
73
+
74
+ ## Plan-first mode
75
+
76
+ Include for multi-file work, refactors, or tasks with ordering dependencies. GPT-5.4 produces good natural-language plans but may skip validation steps.
77
+
78
+ ```
79
+ <plan_first>
80
+ - Before writing any code, produce a brief implementation plan:
81
+ - Files to create vs. modify
82
+ - Implementation order and prerequisites
83
+ - Key design decisions and edge cases
84
+ - Acceptance criteria for "done"
85
+ - How you will verify each step
86
+ - Execute the plan step by step. After each step, verify it worked before proceeding.
87
+ - If the plan is provided externally, follow it faithfully—the job is execution, not second-guessing.
88
+ - Do NOT skip verification steps even if you're confident.
89
+ </plan_first>
90
+ ```
91
+
92
+ ## Long-context handling
93
+
94
+ GPT-5.4 supports up to 1M tokens, but accuracy degrades beyond ~512K. Handle long inputs carefully.
95
+
96
+ ```
97
+ <long_context_handling>
98
+ - For inputs longer than ~10k tokens:
99
+ - First, produce a short internal outline of the key sections relevant to the task.
100
+ - Re-state the constraints explicitly before answering.
101
+ - Anchor claims to sections ("In the 'Data Retention' section...") rather than speaking generically.
102
+ - If the answer depends on fine details (dates, thresholds, clauses), quote or paraphrase them.
103
+ - For very long contexts (200K+ tokens):
104
+ - Be extra vigilant about accuracy—retrieval quality degrades.
105
+ - Cross-reference claims against multiple sections.
106
+ - Prefer citing specific locations over making sweeping statements.
107
+ </long_context_handling>
108
+ ```
109
+
110
+ ## Tool usage
111
+
112
+ ```
113
+ <tool_usage_rules>
114
+ - Prefer tools over internal knowledge whenever:
115
+ - You need fresh or user-specific data (tickets, orders, configs, logs).
116
+ - You reference specific IDs, URLs, or document titles.
117
+ - Parallelize independent tool calls when possible to reduce latency.
118
+ - After any write/update tool call, verify the outcome—do not assume success.
119
+ - After any write/update tool call, briefly restate:
120
+ - What changed
121
+ - Where (ID or path)
122
+ - Verification performed or why verification was skipped
123
+ </tool_usage_rules>
124
+ ```
125
+
126
+ ## Backwards compatibility hedging
127
+
128
+ GPT-5.4 tends to preserve old patterns and add compatibility shims. Use **"cutover"** to signal a clean break.
129
+
130
+ Instead of:
131
+ > "Rewrite this and don't worry about backwards compatibility"
132
+
133
+ Say:
134
+ > "This is a cutover. No backwards compatibility. Rewrite using only Python 3.12+ features and current best practices. Do not preserve legacy code, polyfills, or deprecated patterns."
135
+
136
+ ## Quick reference
137
+
138
+ - **Constrain scope aggressively.** GPT-5.4 expands tasks beyond the ask. "ONLY what is requested, nothing more."
139
+ - **Force verification.** Don't trust "done"—require evidence. "Verify before claiming complete."
140
+ - **Use cutover language.** "Cutover," "no fallbacks," "exactly as specified" get cleaner results.
141
+ - **Plan mode helps.** Explicit plan-first prompts ensure verification steps.
142
+ - **Watch for false success claims.** In agent harnesses, add explicit validation steps. Don't let it self-report completion.
143
+ - **Steer mid-task.** GPT-5.4 handles redirects well. Be direct: "Stop. That's out of scope." / "Verify that actually worked."
144
+ - **Use domain jargon.** "Cutover," "golden-path," "no fallbacks," "domain split," "exactly as specified" trigger precise behavior.
145
+ - **Long context degrades.** Above ~512K tokens, cross-reference claims and cite specific sections.
146
+ - **Token efficiency is real.** 5.4 uses fewer tokens per problem—but verify it didn't skip steps to get there.
147
+
148
+ ## Example: implementation task prompt
+
+ ```
+ <system>
+ You are implementing a feature in an existing codebase. Follow these rules strictly.
+
+ <design_and_scope_constraints>
+ - Implement EXACTLY and ONLY what the user requests. Nothing more.
+ - No extra features, no "while I'm here" improvements.
+ - If you notice adjacent issues, note them in your summary but DO NOT act on them.
+ </design_and_scope_constraints>
+
+ <context_loading>
+ - Read ALL files that will be modified—in full.
+ - Also read key files they import from or depend on.
+ - Absorb patterns before writing any code.
+ </context_loading>
+
+ <verification_requirements>
+ - Before declaring complete, verify each requirement is addressed in actual code.
+ - Run tests if available. If not, state what couldn't be verified.
+ - Include concrete evidence of completion in your summary.
+ </verification_requirements>
+
+ <output_verbosity_spec>
+ - Brief updates only on major phases or blockers.
+ - Final summary: What changed, Where, Risks, Next steps.
+ </output_verbosity_spec>
+ </system>
+ ```
+
179
+ ## Example: code review prompt
+
+ ```
+ <system>
+ You are reviewing code changes. Be thorough but stay in scope.
+
+ <context_loading>
+ - Read every changed file in full, not just the diff hunks.
+ - Also read files they import from and key dependents.
+ </context_loading>
+
+ <review_scope>
+ - Review for: bugs, logic errors, race conditions, resource leaks, null hazards, error handling gaps, type mismatches, dead code, unused imports, pattern inconsistencies.
+ - Fix issues you find with direct code edits.
+ - Do NOT refactor or restructure code that wasn't flagged in the review.
+ - If adjacent code looks problematic, note it but don't touch it.
+ </review_scope>
+
+ <verification_requirements>
+ - After fixes, verify the code still works. Run tests if available.
+ - In your summary, list what was found, what was fixed, and what couldn't be verified.
+ </verification_requirements>
+ </system>
+ ```
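The tagged rule blocks in the two example prompts lend themselves to programmatic assembly. A minimal sketch of that idea — the `section` helper and the exact strings here are illustrative, not part of any API:

```typescript
// Hypothetical helper: wraps a list of rules in the XML-style tags
// used by the example prompts above.
function section(tag: string, rules: string[]): string {
  return `<${tag}>\n${rules.map((r) => `- ${r}`).join("\n")}\n</${tag}>`;
}

// Assemble a system prompt from the same building blocks as the
// implementation-task example (abbreviated to one rule per section).
const systemPrompt = [
  "You are implementing a feature in an existing codebase. Follow these rules strictly.",
  section("design_and_scope_constraints", [
    "Implement EXACTLY and ONLY what the user requests. Nothing more.",
  ]),
  section("verification_requirements", [
    "Before declaring complete, verify each requirement is addressed in actual code.",
  ]),
].join("\n\n");
```

Keeping each constraint family in its own tagged section makes it easy to reuse the scope and verification rules across different task prompts.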
@@ -0,0 +1,92 @@
+ import { mkdirSync, writeFileSync } from "node:fs";
+ import { join } from "node:path";
+ import { getAgentDir } from "@mariozechner/pi-coding-agent";
+ import type { InteractiveShellConfig } from "./config.js";
+ import type { InteractiveShellOptions, InteractiveShellResult } from "./types.js";
+ import type { PtyTerminalSession } from "./pty-session.js";
+
+ export function captureCompletionOutput(
+   session: PtyTerminalSession,
+   config: InteractiveShellConfig,
+ ): InteractiveShellResult["completionOutput"] {
+   const result = session.getTailLines({
+     lines: config.completionNotifyLines,
+     ansi: false,
+     maxChars: config.completionNotifyMaxChars,
+   });
+   return {
+     lines: result.lines,
+     totalLines: result.totalLinesInBuffer,
+     truncated: result.lines.length < result.totalLinesInBuffer || result.truncatedByChars,
+   };
+ }
+
+ export function captureTransferOutput(
+   session: PtyTerminalSession,
+   config: InteractiveShellConfig,
+ ): InteractiveShellResult["transferred"] {
+   const result = session.getTailLines({
+     lines: config.transferLines,
+     ansi: false,
+     maxChars: config.transferMaxChars,
+   });
+   return {
+     lines: result.lines,
+     totalLines: result.totalLinesInBuffer,
+     truncated: result.lines.length < result.totalLinesInBuffer || result.truncatedByChars,
+   };
+ }
+
+ export function maybeBuildHandoffPreview(
+   session: PtyTerminalSession,
+   when: "exit" | "detach" | "kill" | "timeout" | "transfer",
+   config: InteractiveShellConfig,
+   overrides?: Pick<InteractiveShellOptions, "handoffPreviewEnabled" | "handoffPreviewLines" | "handoffPreviewMaxChars">,
+ ): InteractiveShellResult["handoffPreview"] | undefined {
+   const enabled = overrides?.handoffPreviewEnabled ?? config.handoffPreviewEnabled;
+   if (!enabled) return undefined;
+   const lines = overrides?.handoffPreviewLines ?? config.handoffPreviewLines;
+   const maxChars = overrides?.handoffPreviewMaxChars ?? config.handoffPreviewMaxChars;
+   if (lines <= 0 || maxChars <= 0) return undefined;
+   const result = session.getTailLines({ lines, ansi: false, maxChars });
+   return { type: "tail", when, lines: result.lines };
+ }
+
+ export function maybeWriteHandoffSnapshot(
+   session: PtyTerminalSession,
+   when: "exit" | "detach" | "kill" | "timeout" | "transfer",
+   config: InteractiveShellConfig,
+   context: { command: string; cwd?: string },
+   overrides?: Pick<InteractiveShellOptions, "handoffSnapshotEnabled" | "handoffSnapshotLines" | "handoffSnapshotMaxChars">,
+ ): InteractiveShellResult["handoff"] | undefined {
+   const enabled = overrides?.handoffSnapshotEnabled ?? config.handoffSnapshotEnabled;
+   if (!enabled) return undefined;
+   const lines = overrides?.handoffSnapshotLines ?? config.handoffSnapshotLines;
+   const maxChars = overrides?.handoffSnapshotMaxChars ?? config.handoffSnapshotMaxChars;
+   if (lines <= 0 || maxChars <= 0) return undefined;
+
+   const baseDir = join(getAgentDir(), "cache", "interactive-shell");
+   mkdirSync(baseDir, { recursive: true });
+   const timestamp = new Date().toISOString().replace(/[:.]/g, "-");
+   const pid = session.pid;
+   const filename = `snapshot-${timestamp}-pid${pid}.log`;
+   const transcriptPath = join(baseDir, filename);
+   const tailResult = session.getTailLines({
+     lines,
+     ansi: config.ansiReemit,
+     maxChars,
+   });
+   const header = [
+     `# interactive-shell snapshot (${when})`,
+     `time: ${new Date().toISOString()}`,
+     `command: ${context.command}`,
+     `cwd: ${context.cwd ?? ""}`,
+     `pid: ${pid}`,
+     `exitCode: ${session.exitCode ?? ""}`,
+     `signal: ${session.signal ?? ""}`,
+     `lines: ${tailResult.lines.length} (requested ${lines}, maxChars ${maxChars})`,
+     "",
+   ].join("\n");
+   writeFileSync(transcriptPath, header + tailResult.lines.join("\n") + "\n", { encoding: "utf-8" });
+   return { type: "snapshot", when, transcriptPath, linesWritten: tailResult.lines.length };
+ }
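The snapshot path in `maybeWriteHandoffSnapshot` hinges on making the ISO timestamp filesystem-safe: `:` is invalid in Windows filenames and extra dots confuse extension-based tooling, so both are replaced with `-`. A standalone sketch of that naming scheme — `snapshotFilename` is an illustrative extraction, not an export of the package:

```typescript
// Illustrative helper mirroring the naming scheme used by
// maybeWriteHandoffSnapshot: sanitize the ISO timestamp, then
// embed the PID so concurrent sessions never collide on a name.
function snapshotFilename(pid: number, date: Date = new Date()): string {
  const timestamp = date.toISOString().replace(/[:.]/g, "-");
  return `snapshot-${timestamp}-pid${pid}.log`;
}

// A fixed date yields a deterministic, colon-free name.
console.log(snapshotFilename(1234, new Date("2026-03-13T08:30:00.000Z")));
// → snapshot-2026-03-13T08-30-00-000Z-pid1234.log
```

Because the name is derived from wall-clock time plus PID rather than a counter, repeated snapshots of the same session sort chronologically in the cache directory without any bookkeeping.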