pi-interactive-shell 0.8.2 → 0.10.0

package/CHANGELOG.md CHANGED
@@ -2,7 +2,53 @@
 
  All notable changes to the `pi-interactive-shell` extension will be documented in this file.
 
- ## [Unreleased]
+ ## [0.10.0] - 2026-03-13
+
+ ### Added
+ - **Test harness** - Added vitest with 20 tests covering session queries, key encoding, notification formatting, headless monitor lifecycle, session manager, config/docs parity, and module loading.
+ - **`gpt-5-4-prompting` skill** - New bundled skill with GPT-5.4 prompting best practices for Codex workflows.
+
+ ### Changed
+ - **Architecture refactor** - Extracted shared logic into focused modules for better maintainability:
+   - `session-query.ts` - Unified output/query logic (rate limiting, incremental, drain, offset modes)
+   - `notification-utils.ts` - Message formatting for dispatch/hands-free notifications
+   - `handoff-utils.ts` - Snapshot/preview capture on session exit/transfer
+   - `runtime-coordinator.ts` - Centralized overlay/monitor/widget state management
+   - `pty-log.ts` - Raw output trimming and line slicing
+   - `pty-protocol.ts` - DSR cursor position query handling
+   - `spawn-helper.ts` - macOS node-pty permission fix
+   - `background-widget.ts` - TUI widget for background sessions
+ - README, `SKILL.md`, install output, and the packaged Codex workflow examples are now consistent about dispatch being the recommended delegated mode, the current 8s quiet-threshold / 15s grace-period defaults, and the bundled prompt-skill surface.
+ - The Codex workflow docs now point at the packaged `gpt-5-4-prompting`, `codex-5-3-prompting`, and `codex-cli` skills instead of describing a runtime fetch of the old 5.2 prompting guide.
+ - Example prompts and skill docs are aligned around `gpt-5.4` as the default Codex model, with `gpt-5.3-codex` remaining the explicit opt-in fallback.
+ - Renamed the `codex-5.3-prompting` example skill to `codex-5-3-prompting` (filesystem-friendly path).
+
+ ### Fixed
+ - **Map iteration bug** - Fixed `disposeAllMonitors()` modifying the Map during iteration, which could cause unpredictable behavior.
+ - **Array iteration bug** - Fixed PTY listener notifications modifying arrays during iteration when a listener unsubscribed itself.
+ - **Missing runtime dependency** - Added `@sinclair/typebox` to dependencies (it was imported but not declared).
+ - Documented the packaged prompt/skill onboarding path more clearly so users can either rely on the exported package metadata or copy the bundled examples into their own prompt and skill directories.
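The two iteration fixes above share one pattern: iterate over a snapshot so callbacks that mutate the live collection cannot disturb the loop. A minimal sketch of that pattern (function and type names are illustrative, not the extension's actual internals):

```typescript
type Monitor = { dispose(): void };
type Listener = (chunk: string) => void;

// Dispose every monitor over a snapshot of the entries, so a dispose()
// that mutates the Map cannot disturb the live iteration.
function disposeAllMonitors(monitors: Map<string, Monitor>): void {
  for (const [id, monitor] of [...monitors.entries()]) {
    monitor.dispose();
    monitors.delete(id);
  }
}

// Notify listeners over a copy of the array, so a listener that
// unsubscribes itself (splicing the live array) cannot cause the
// listeners after it to be skipped.
function notifyListeners(listeners: Listener[], chunk: string): void {
  for (const listener of [...listeners]) {
    listener(chunk);
  }
}
```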
+
+ ## [0.9.0] - 2026-02-23
+
+ ### Added
+ - `examples/skills/codex-5.3-prompting/` skill with a GPT-5.3-Codex prompting guide -- self-contained best practices for verbosity control, scope discipline, forced upfront reading, plan mode, mid-task steering, context management, and reasoning-effort recommendations.
+ - **`interactive-shell:update` event** — All hands-free update callbacks now emit `pi.events.emit("interactive-shell:update", update)` with the full `HandsFreeUpdate` payload. Extensions can listen for quiet, exit, kill, and user-takeover events regardless of which code path started the session (blocking, non-blocking, or reattach).
+ - **`triggerTurn` on terminal events** — Non-blocking hands-free sessions now send `pi.sendMessage` with `triggerTurn: true` when the session exits, is killed, or the user takes over. Periodic "running" updates emit only on the event bus (cheap for extensions) without waking the agent.
+
+ ### Fixed
+ - **Quiet detection broken for TUI apps** — Ink-based CLIs (Claude Code, etc.) emit periodic ANSI-only PTY data (cursor blink, frame redraws) that reset the quiet timer on every event, preventing quiet detection from ever triggering. Data is now filtered through `stripVTControlCharacters`, and the quiet timer resets only when there is visible content. Fixed in both the overlay (`overlay-component.ts`) and the headless dispatch monitor (`headless-monitor.ts`). Also seeds the quiet timer at startup when `autoExitOnQuiet` is enabled, so sessions that never produce visible output are still killed after the grace period.
+ - **Lifecycle guard decoupled from callback** — The overlay used the presence of `onHandsFreeUpdate` as a proxy for "blocking tool call" when deciding whether to unregister sessions on completion. Wiring the callback in non-blocking paths (for event emission) would cause premature session cleanup. Introduced a `streamingMode` flag to separate "has update callback" from "should unregister on completion," so non-blocking sessions stay queryable after the callback fires.
+ - **`autoExitOnQuiet` broken in interval update mode** — The `onData` handler only reset the quiet timer in `on-quiet` mode, so `autoExitOnQuiet` never fired with `updateMode: "interval"`. Also, the interval timer's safety-net flush unconditionally stopped the quiet timer, preventing `autoExitOnQuiet` from firing if the interval flushed before the quiet threshold. Both are fixed: the data handler now resets the timer whenever `autoExitOnQuiet` is enabled, regardless of update mode, and the interval flush restarts (rather than stops) the quiet timer when `autoExitOnQuiet` is active.
+ - **RangeError on narrow terminals** — `render()` computed `width - 2` for border strings without a lower bound, causing `String.prototype.repeat()` to throw on negative counts when the terminal width was < 4. Clamped in both the main overlay and the reattach overlay. Fixes #2.
+ - **Hardcoded `~/.pi/agent` path** — Config loading, snapshot writing, and the install script all hardcoded `~/.pi/agent`, ignoring `PI_CODING_AGENT_DIR`. Now uses `getAgentDir()` from pi's API in all runtime paths and reads the env var in the install script. Fixes #1.
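The quiet-detection fix above hinges on one check: does a PTY chunk still contain anything after ANSI sequences are stripped? A minimal sketch using Node's `stripVTControlCharacters` (the helper name and wiring are illustrative; the extension's actual code in `overlay-component.ts`/`headless-monitor.ts` differs):

```typescript
import { stripVTControlCharacters } from "node:util";

// Treat a PTY chunk as "activity" only if visible characters remain
// after stripping VT/ANSI control sequences (cursor moves, clears,
// frame redraws). Ink-style TUIs emit control-only chunks constantly;
// those should not reset the quiet timer.
function hasVisibleContent(chunk: string): boolean {
  return stripVTControlCharacters(chunk).trim().length > 0;
}
```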
+
+ ### Changed
+ - Default `handsFreeQuietThreshold` increased from 5000ms to 8000ms and `autoExitGracePeriod` reduced from 30000ms to 15000ms. Both remain adjustable per-call via `handsFree.quietThreshold` and `handsFree.gracePeriod`, and via the config file.
+ - Dispatch mode is now the recommended default for delegated Codex runs. Updated `README.md`, `SKILL.md`, `tool-schema.ts`, `examples/skills/codex-cli/SKILL.md`, and all three Codex prompt templates to prefer `mode: "dispatch"` over hands-free for fire-and-forget delegations.
+ - Rewrote the `codex-5.3-prompting` skill from a descriptive model-behavior guide into a directive prompt-construction reference. Cut the behavioral-comparison, mid-task-steering, and context-management prose sections; reframed each prompt block with a one-line "include when X" directive so the agent knows what to inject and when.
+ - Added a "Backwards compatibility hedging" section to the `codex-5.3-prompting` skill covering the "cutover" keyword trick -- GPT-5.3-Codex inserts compatibility shims and fallback code even when told not to; using "cutover" + "no backwards compatibility" + "do not preserve legacy code" produces cleaner breaks than vague "don't worry about backwards compatibility" phrasing.
+ - Example prompts (`codex-implement-plan`, `codex-review-impl`, `codex-review-plan`) updated for GPT-5.3-Codex: load the `codex-5-3-prompting` and `codex-cli` skills instead of fetching the 5.2 guide URL at runtime, add scope-fencing instructions to counter 5.3's aggressive refactoring, add "don't ask clarifying questions" and "brief updates" constraints, and strengthen `codex-review-plan` to force reading the codebase files referenced in the plan and constrain edit scope.
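A project that prefers the old timings can restore them via the config keys documented in the README; for example, a sketch of a project-level `.pi/interactive-shell.json` override:

```json
{
  "handsFreeQuietThreshold": 5000,
  "autoExitGracePeriod": 30000
}
```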
 
  ## [0.8.2] - 2026-02-10
 
package/README.md CHANGED
@@ -49,7 +49,7 @@ Three modes control how the agent engages with a session:
 
  **Hands-free** returns immediately so the agent can do other work, but the agent must poll periodically to discover output and completion. Good for processes the agent needs to monitor and react to mid-flight, like watching build output and sending follow-up commands.
 
- **Dispatch** also returns immediately, but the agent doesn't poll at all. When the session completes — whether by natural exit, quiet detection, timeout, or user intervention — the agent gets woken up with a notification containing the tail output. This is the right mode for delegating a task to a subagent and moving on. Add `background: true` to skip the overlay entirely and run headless.
+ **Dispatch** also returns immediately, but the agent doesn't poll at all. When the session completes — whether by natural exit, quiet detection, timeout, or user intervention — the agent gets woken up with a notification containing the tail output. This is the right mode for delegating a task to a subagent and moving on. For fire-and-forget delegated runs and QA checks, prefer dispatch by default. Add `background: true` to skip the overlay entirely and run headless.
 
  ## Quick Start
 
@@ -115,7 +115,7 @@ Attach to review full output: interactive_shell({ attach: "calm-reef" })
 
  The notification includes a brief tail (last 5 lines) and a reattach instruction. The PTY is preserved for 5 minutes so the agent can attach to review full scrollback.
 
- Dispatch defaults `autoExitOnQuiet: true` — the session gets a 30s startup grace period, then is killed after output goes silent (5s by default), which signals completion for task-oriented subagents. Tune the grace period with `handsFree: { gracePeriod: 60000 }` or opt out entirely with `handsFree: { autoExitOnQuiet: false }`.
+ Dispatch defaults `autoExitOnQuiet: true` — the session gets a 15s startup grace period, then is killed after output goes silent (8s by default), which signals completion for task-oriented subagents. Tune the grace period with `handsFree: { gracePeriod: 60000 }` or opt out entirely with `handsFree: { autoExitOnQuiet: false }`.
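Putting the knobs above together, a dispatch call that overrides both defaults might look like this — a sketch following the README's own `interactive_shell` examples; only the fields shown in this document are used, and the elided fields are whatever the Quick Start examples pass:

```typescript
interactive_shell({
  // ...command and other fields as in the Quick Start examples...
  mode: "dispatch",
  reason: "QA check",
  handsFree: { quietThreshold: 8000, gracePeriod: 60000 },
})
```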
 
  The overlay still shows for the user, who can Ctrl+T to transfer output, Ctrl+B to background, take over by typing, or Ctrl+Q for more options.
 
@@ -151,7 +151,7 @@ interactive_shell({
 
  ### Auto-Exit on Quiet
 
- For fire-and-forget single-task delegations, enable auto-exit to kill the session after 5s of output silence:
+ For fire-and-forget single-task delegations, enable auto-exit to kill the session after 8s of output silence:
 
  ```typescript
  interactive_shell({
@@ -161,7 +161,7 @@ interactive_shell({
  })
  ```
 
- A 30s startup grace period prevents the session from being killed before the subprocess has time to produce output. Customize it per-call with `gracePeriod`:
+ A 15s startup grace period prevents the session from being killed before the subprocess has time to produce output. Customize it per-call with `gracePeriod`:
 
  ```typescript
  interactive_shell({
@@ -282,8 +282,8 @@ Configuration files (project overrides global):
    "completionNotifyMaxChars": 5000,
    "handsFreeUpdateMode": "on-quiet",
    "handsFreeUpdateInterval": 60000,
-   "handsFreeQuietThreshold": 5000,
-   "autoExitGracePeriod": 30000,
+   "handsFreeQuietThreshold": 8000,
+   "autoExitGracePeriod": 15000,
    "handsFreeUpdateMaxChars": 1500,
    "handsFreeMaxTotalChars": 100000,
    "handoffPreviewEnabled": true,
@@ -306,8 +306,8 @@ Configuration files (project overrides global):
  | `completionNotifyLines` | 50 | Lines in dispatch completion notification (10-500) |
  | `completionNotifyMaxChars` | 5000 | Max chars in completion notification (1KB-50KB) |
  | `handsFreeUpdateMode` | "on-quiet" | "on-quiet" or "interval" |
- | `handsFreeQuietThreshold` | 5000 | Silence duration before update (ms) |
- | `autoExitGracePeriod` | 30000 | Startup grace before `autoExitOnQuiet` kill (ms) |
+ | `handsFreeQuietThreshold` | 8000 | Silence duration before update (ms) |
+ | `autoExitGracePeriod` | 15000 | Startup grace before `autoExitOnQuiet` kill (ms) |
  | `handsFreeUpdateInterval` | 60000 | Max interval between updates (ms) |
  | `handsFreeUpdateMaxChars` | 1500 | Max chars per update |
  | `handsFreeMaxTotalChars` | 100000 | Total char budget for updates |
@@ -331,7 +331,7 @@ Full PTY. The subprocess thinks it's in a real terminal.
 
  ## Example Workflow: Plan, Implement, Review
 
- The `examples/prompts/` directory includes three prompt templates that chain together into a complete development workflow using Codex CLI. Each template instructs pi to gather context, generate a tailored meta prompt based on the [Codex prompting guide](https://developers.openai.com/cookbook/examples/gpt-5/gpt-5-2_prompting_guide.md), and launch Codex in an interactive overlay.
+ The `examples/prompts/` directory includes three prompt templates that chain together into a complete development workflow using Codex CLI. Each template now loads the bundled `gpt-5-4-prompting` skill by default, falls back to `codex-5-3-prompting` when the user explicitly asks for Codex 5.3, and launches Codex in an interactive overlay.
 
  ### The Pipeline
 
@@ -347,14 +347,22 @@ Write a plan
 
  ### Installing the Templates
 
- Copy the prompt templates and Codex CLI skill to your pi config:
+ Install the package first so pi can discover the bundled prompt and skill directories via the package metadata:
+
+ ```bash
+ pi install npm:pi-interactive-shell
+ ```
+
+ If you want your own slash commands and local skill copies, copy the examples into your agent config:
 
  ```bash
  # Prompt templates (slash commands)
  cp ~/.pi/agent/extensions/interactive-shell/examples/prompts/*.md ~/.pi/agent/prompts/
 
- # Codex CLI skill (teaches pi how to use codex flags, sandbox caveats, etc.)
+ # Skills used by the templates
  cp -r ~/.pi/agent/extensions/interactive-shell/examples/skills/codex-cli ~/.pi/agent/skills/
+ cp -r ~/.pi/agent/extensions/interactive-shell/examples/skills/gpt-5-4-prompting ~/.pi/agent/skills/
+ cp -r ~/.pi/agent/extensions/interactive-shell/examples/skills/codex-5-3-prompting ~/.pi/agent/skills/
  ```
 
  ### Usage
@@ -388,9 +396,9 @@ Say you have a plan at `docs/auth-redesign-plan.md`:
 
  These templates demonstrate a "meta-prompt generation" pattern:
 
- 1. **Pi gathers context** — reads the plan, runs git diff, fetches the Codex prompting guide
- 2. **Pi generates a calibrated prompt** — tailored to the specific plan/diff, following the guide's best practices
- 3. **Pi launches Codex in the overlay** — with explicit flags (`-m gpt-5.3-codex -c model_reasoning_effort="high" -a never`) and hands off control
+ 1. **Pi gathers context** — reads the plan, runs git diff, and loads the local `gpt-5-4-prompting` or `codex-5-3-prompting` skill
+ 2. **Pi generates a calibrated prompt** — tailored to the specific plan/diff, following the selected skill's best practices
+ 3. **Pi launches Codex in the overlay** — defaulting to `-m gpt-5.4 -a never` and switching to `-m gpt-5.3-codex -a never` only when the user explicitly asks for Codex 5.3
 
  The user watches Codex work in the overlay and can take over anytime (type to intervene, Ctrl+T to transfer output back to pi, Ctrl+Q for options).
 
package/SKILL.md CHANGED
@@ -5,7 +5,7 @@ description: Cheat sheet + workflow for launching interactive coding-agent CLIs
 
  # Interactive Shell (Skill)
 
- Last verified: 2026-01-18
+ Last verified: 2026-03-12
 
  ## Foreground vs Background Subagents
 
@@ -84,6 +84,8 @@ interactive_shell({
 
  Dispatch defaults `autoExitOnQuiet: true`. The agent can still query the sessionId if needed, but doesn't have to.
 
+ For fire-and-forget delegated runs (including QA-style delegated checks), prefer dispatch as the default mode.
+
  #### Background Dispatch (Headless)
  No overlay opens. Multiple headless dispatches can run concurrently:
 
@@ -163,7 +165,7 @@ interactive_shell({
    reason: "Security review",
    handsFree: { autoExitOnQuiet: true }
  })
- // Session auto-kills after ~5s of quiet
+ // Session auto-kills after ~8s of quiet (after the startup grace period)
  // Read results from file:
  // read("/tmp/security-review.md")
  ```
package/background-widget.ts ADDED
@@ -0,0 +1,76 @@
+ import { truncateToWidth, visibleWidth } from "@mariozechner/pi-tui";
+ import { formatDuration } from "./types.js";
+ import type { ShellSessionManager } from "./session-manager.js";
+
+ export function setupBackgroundWidget(
+ 	ctx: { ui: { setWidget: Function }; hasUI?: boolean },
+ 	sessionManager: ShellSessionManager,
+ ): (() => void) | null {
+ 	if (!ctx.hasUI) return null;
+
+ 	let durationTimer: ReturnType<typeof setInterval> | null = null;
+ 	let tuiRef: { requestRender: () => void } | null = null;
+
+ 	const requestRender = () => tuiRef?.requestRender();
+ 	const unsubscribe = sessionManager.onChange(() => {
+ 		manageDurationTimer();
+ 		requestRender();
+ 	});
+
+ 	function manageDurationTimer() {
+ 		const sessions = sessionManager.list();
+ 		const hasRunning = sessions.some((s) => !s.session.exited);
+ 		if (hasRunning && !durationTimer) {
+ 			durationTimer = setInterval(requestRender, 10_000);
+ 		} else if (!hasRunning && durationTimer) {
+ 			clearInterval(durationTimer);
+ 			durationTimer = null;
+ 		}
+ 	}
+
+ 	ctx.ui.setWidget(
+ 		"bg-sessions",
+ 		(tui: any, theme: any) => {
+ 			tuiRef = tui;
+ 			return {
+ 				render: (width: number) => {
+ 					const sessions = sessionManager.list();
+ 					if (sessions.length === 0) return [];
+ 					const cols = width || tui.terminal?.columns || 120;
+ 					const lines: string[] = [];
+ 					for (const s of sessions) {
+ 						const exited = s.session.exited;
+ 						const dot = exited ? theme.fg("dim", "○") : theme.fg("accent", "●");
+ 						const id = theme.fg("dim", s.id);
+ 						const cmd = s.command.replace(/\s+/g, " ").trim();
+ 						const truncCmd = cmd.length > 60 ? cmd.slice(0, 57) + "..." : cmd;
+ 						const reason = s.reason ? theme.fg("dim", ` · ${s.reason}`) : "";
+ 						const status = exited ? theme.fg("dim", "exited") : theme.fg("success", "running");
+ 						const duration = theme.fg("dim", formatDuration(Date.now() - s.startedAt.getTime()));
+ 						const oneLine = ` ${dot} ${id} ${truncCmd}${reason} ${status} ${duration}`;
+ 						if (visibleWidth(oneLine) <= cols) {
+ 							lines.push(oneLine);
+ 						} else {
+ 							lines.push(truncateToWidth(` ${dot} ${id} ${cmd}`, cols, "…"));
+ 							lines.push(truncateToWidth(` ${status} ${duration}${reason}`, cols, "…"));
+ 						}
+ 					}
+ 					return lines;
+ 				},
+ 				invalidate: () => {},
+ 			};
+ 		},
+ 		{ placement: "belowEditor" },
+ 	);
+
+ 	manageDurationTimer();
+
+ 	return () => {
+ 		unsubscribe();
+ 		if (durationTimer) {
+ 			clearInterval(durationTimer);
+ 			durationTimer = null;
+ 		}
+ 		ctx.ui.setWidget("bg-sessions", undefined);
+ 	};
+ }
package/config.ts CHANGED
@@ -1,6 +1,6 @@
  import { existsSync, readFileSync } from "node:fs";
- import { homedir } from "node:os";
  import { join } from "node:path";
+ import { getAgentDir } from "@mariozechner/pi-coding-agent";
 
  export interface InteractiveShellConfig {
  	exitAutoCloseDelay: number;
@@ -52,8 +52,8 @@ const DEFAULT_CONFIG: InteractiveShellConfig = {
  	// Hands-free mode defaults
  	handsFreeUpdateMode: "on-quiet" as const,
  	handsFreeUpdateInterval: 60000,
- 	handsFreeQuietThreshold: 5000,
- 	autoExitGracePeriod: 30000,
+ 	handsFreeQuietThreshold: 8000,
+ 	autoExitGracePeriod: 15000,
  	handsFreeUpdateMaxChars: 1500,
  	handsFreeMaxTotalChars: 100000,
  	// Query rate limiting (default 60 seconds between queries)
@@ -62,7 +62,7 @@ const DEFAULT_CONFIG: InteractiveShellConfig = {
 
  export function loadConfig(cwd: string): InteractiveShellConfig {
  	const projectPath = join(cwd, ".pi", "interactive-shell.json");
- 	const globalPath = join(homedir(), ".pi", "agent", "interactive-shell.json");
+ 	const globalPath = join(getAgentDir(), "interactive-shell.json");
 
  	let globalConfig: Partial<InteractiveShellConfig> = {};
  	let projectConfig: Partial<InteractiveShellConfig> = {};
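Runtime paths now go through `getAgentDir()` from pi's API, while the changelog says the install script reads `PI_CODING_AGENT_DIR` directly. A hedged sketch of that env-var resolution (`resolveAgentDirSketch` is illustrative, not the extension's code):

```typescript
import { homedir } from "node:os";
import { join } from "node:path";

// Honor PI_CODING_AGENT_DIR when set; otherwise fall back to the
// default ~/.pi/agent location. This mirrors the behavior the
// changelog describes for the install script.
function resolveAgentDirSketch(env: NodeJS.ProcessEnv = process.env): string {
  return env.PI_CODING_AGENT_DIR ?? join(homedir(), ".pi", "agent");
}
```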
package/examples/prompts/codex-implement-plan.md CHANGED
@@ -1,23 +1,34 @@
  ---
  description: Launch Codex CLI in overlay to fully implement an existing plan/spec document
  ---
- Read the Codex prompting guide at https://developers.openai.com/cookbook/examples/gpt-5/gpt-5-2_prompting_guide.md using fetch_content or web_search. Then read the plan at `$1`.
+ Determine which prompting skill to load based on model:
+ - Default: Load `gpt-5-4-prompting` skill (for `gpt-5.4`)
+ - If user explicitly requests Codex 5.3: Load `codex-5-3-prompting` skill (for `gpt-5.3-codex`)
+
+ Also load the `codex-cli` skill. Then read the plan at `$1`.
 
  Analyze the plan to understand: how many files are created vs modified, whether there's a prescribed implementation order or prerequisites, what existing code is referenced, and roughly how large the implementation is.
 
- Based on the prompting guide's best practices and the plan's content, generate a comprehensive meta prompt tailored for Codex CLI. The meta prompt should instruct Codex to:
+ Based on the prompting skill's best practices and the plan's content, generate a comprehensive meta prompt tailored for Codex CLI. The meta prompt should instruct Codex to:
 
  1. Read and internalize the full plan document. Identify every file to be created, every file to be modified, and any prerequisites or ordering constraints.
  2. Before writing any code, read all existing files that will be modified — in full, not just the sections mentioned in the plan. Also read key files they import from or that import them, to absorb the surrounding patterns, naming conventions, and architecture.
  3. If the plan specifies an implementation order or prerequisites (e.g., "extract module X before building Y"), follow that order exactly. Otherwise, implement bottom-up: shared utilities and types first, then the modules that depend on them, then integration/registration code last.
  4. Implement each piece completely. No stubs, no TODOs, no placeholder comments, no "implement this later" shortcuts. Every function body, every edge case handler, every error path described in the plan must be real code.
  5. Match existing code patterns exactly — same formatting, same import style, same error handling conventions, same naming. Read the surrounding codebase to absorb these patterns before writing. If the plan references patterns from specific files (e.g., "same pattern as X"), read those files and replicate the pattern faithfully.
- 6. Keep files reasonably sized. If a file grows beyond ~500 lines, split it as the plan describes or refactor into logical sub-modules.
- 7. After implementing all files, do a self-review pass: re-read the plan from top to bottom and verify every requirement, every edge case, every design decision is addressed in the code. Check for: missing imports, type mismatches, unreachable code paths, inconsistent field names between modules, and any plan requirement that was overlooked.
- 8. Do NOT commit or push. Write a summary listing every file created or modified, what was implemented in each, and any plan ambiguities that required judgment calls.
+ 6. Stay within scope. Do not refactor, rename, or restructure adjacent code that the plan does not mention. No "while I'm here" improvements. If something adjacent looks wrong, note it in the summary but do not touch it.
+ 7. Keep files reasonably sized. If a file grows beyond ~500 lines, split it as the plan describes or refactor into logical sub-modules.
+ 8. After implementing all files, do a self-review pass: re-read the plan from top to bottom and verify every requirement, every edge case, every design decision is addressed in the code. Check for: missing imports, type mismatches, unreachable code paths, inconsistent field names between modules, and any plan requirement that was overlooked.
+ 9. Do NOT commit or push. Write a summary listing every file created or modified, what was implemented in each, and any plan ambiguities that required judgment calls.
+
+ The meta prompt should follow the prompting skill's patterns: clear system context, explicit scope and verbosity constraints, step-by-step instructions, and expected output format. Instruct Codex not to ask clarifying questions about things answerable by reading the plan or codebase — read first, then act. Keep progress updates brief and concrete (no narrating routine file reads or tool calls). Emphasize that the plan has already been thoroughly reviewed — the job is faithful execution, not second-guessing the design. Emphasize scope discipline and verification requirements per the prompting skill.
+
+ Determine the model flag:
+ - Default: `-m gpt-5.4`
+ - If user explicitly requests Codex 5.3: `-m gpt-5.3-codex`
 
- The meta prompt should follow the Codex guide's structure: clear system context, explicit scope and verbosity constraints, step-by-step instructions, and expected output format. Emphasize that the plan has already been thoroughly reviewed — the job is faithful execution, not second-guessing the design.
+ Then launch Codex CLI in the interactive shell overlay with that meta prompt using the chosen model flag plus `-a never`.
 
- Then launch Codex CLI in the interactive shell overlay with that meta prompt using these flags: `-m gpt-5.3-codex -c model_reasoning_effort="high" -a never`. Do NOT pass sandbox flags in interactive_shell. End your turn immediately after launching -- do not poll the session. The user will manage the overlay directly.
+ Use `interactive_shell` with `mode: "dispatch"` for this delegated run (fire-and-forget with completion notification). Do NOT pass sandbox flags in interactive_shell. Dispatch mode only. End turn immediately. Do not poll. Wait for completion notification.
 
  $@
@@ -1,24 +1,35 @@
1
1
  ---
2
2
  description: Launch Codex CLI in overlay to review implemented code changes (optionally against a plan)
3
3
  ---
4
- Read the Codex prompting guide at https://developers.openai.com/cookbook/examples/gpt-5/gpt-5-2_prompting_guide.md using fetch_content or web_search. Then determine the review scope:
4
+ Determine which prompting skill to load based on model:
5
+ - Default: Load `gpt-5-4-prompting` skill (for `gpt-5.4`)
6
+ - If user explicitly requests Codex 5.3: Load `codex-5-3-prompting` skill (for `gpt-5.3-codex`)
7
+
8
+ Also load the `codex-cli` skill. Then determine the review scope:
5
9
 
6
10
  - If `$1` looks like a file path (contains `/` or ends in `.md`): read it as the plan/spec these changes were based on. The diff scope is uncommitted changes vs HEAD, or if clean, the current branch vs main.
7
11
  - Otherwise: no plan file. Diff scope is the same. Treat all of `$@` as additional review context or focus areas.
8
12
 
9
13
  Run the appropriate git diff to identify which files changed and how many lines are involved. This context helps you generate a better-calibrated meta prompt.
10
14
 
11
- Based on the prompting guide's best practices, the diff scope, and the optional plan, generate a comprehensive meta prompt tailored for Codex CLI. The meta prompt should instruct Codex to:
15
+ Based on the prompting skill's best practices, the diff scope, and the optional plan, generate a comprehensive meta prompt tailored for Codex CLI. The meta prompt should instruct Codex to:
12
16
 
13
17
  1. Identify all changed files via git diff, then read every changed file in full — not just the diff hunks. For each changed file, also read the files it imports from and key files that depend on it, to understand integration points and downstream effects.
14
18
  2. If a plan/spec was provided, read it and verify the implementation is complete — every requirement addressed, no steps skipped, nothing invented beyond scope, no partial stubs left behind.
15
19
  3. Review each changed file for: bugs, logic errors, race conditions, resource leaks (timers, event listeners, file handles, unclosed connections), null/undefined hazards, off-by-one errors, error handling gaps, type mismatches, dead code, unused imports/variables/parameters, unnecessary complexity, and inconsistency with surrounding code patterns and naming conventions.
16
20
  4. Trace key code paths end-to-end across function and file boundaries — verify data flows, state transitions, error propagation, and cleanup ordering. Don't evaluate functions in isolation.
17
21
  5. Check for missing or inadequate tests, stale documentation, and missing changelog entries.
18
- 6. Fix every issue found with direct code edits. After all fixes, write a clear summary listing what was found, what was fixed, and any remaining concerns that require human judgment.
22
+ 6. Fix every issue found with direct code edits. Keep fixes scoped to the actual issues identified do not expand into refactoring or restructuring code that wasn't flagged in the review. If adjacent code looks problematic, note it in the summary but don't touch it.
23
+ 7. After all fixes, write a clear summary listing what was found, what was fixed, and any remaining concerns that require human judgment.
24
+
25
+ The meta prompt should follow the prompting skill's patterns: clear system context, explicit scope and verbosity constraints, step-by-step instructions, and expected output format. Instruct Codex not to ask clarifying questions — if intent is unclear, read the surrounding code for context instead of asking. Keep progress updates brief and concrete (no narrating routine file reads or tool calls). Emphasize thoroughness — read the actual code deeply before making judgments, question every assumption, and never rubber-stamp. Emphasize scope discipline and verification requirements per the prompting skill.
26
+
27
+ Determine the model flag:
28
+ - Default: `-m gpt-5.4`
29
+ - If user explicitly requests Codex 5.3: `-m gpt-5.3-codex`
19
30
 
20
- The meta prompt should follow the Codex guide's structure: clear system context, explicit scope and verbosity constraints, step-by-step instructions, and expected output format. Emphasize thoroughness — read the actual code deeply before making judgments, question every assumption, and never rubber-stamp.
31
+ Then launch Codex CLI in the interactive shell overlay with that meta prompt using the chosen model flag plus `-a never`.
21
32
 
22
- Then launch Codex CLI in the interactive shell overlay with that meta prompt using these flags: `-m gpt-5.3-codex -c model_reasoning_effort="high" -a never`. Do NOT pass sandbox flags in interactive_shell. End your turn immediately after launching -- do not poll the session. The user will manage the overlay directly.
33
+ Use `interactive_shell` with `mode: "dispatch"` for this delegated run (fire-and-forget with completion notification). Do NOT pass sandbox flags in interactive_shell. Dispatch mode only. End turn immediately. Do not poll. Wait for completion notification.
23
34
 
24
35
  $@
@@ -1,19 +1,29 @@
  ---
  description: Launch Codex CLI in overlay to review an implementation plan against the codebase
  ---
- Read the Codex prompting guide at https://developers.openai.com/cookbook/examples/gpt-5/gpt-5-2_prompting_guide.md using fetch_content or web_search. Then read the plan at `$1`.
+ Determine which prompting skill to load based on model:
+ - Default: Load `gpt-5-4-prompting` skill (for `gpt-5.4`)
+ - If user explicitly requests Codex 5.3: Load `codex-5-3-prompting` skill (for `gpt-5.3-codex`)

- Based on the prompting guide's best practices and the plan's content, generate a comprehensive meta prompt tailored for Codex CLI. The meta prompt should instruct Codex to:
+ Also load the `codex-cli` skill. Then read the plan at `$1`.

- 1. Read and internalize the full plan
- 2. Systematically review the plan against the reference docs/links/code
- 3. Verify every assumption, file path, API shape, data flow, and integration point mentioned in the plan
- 4. Check that the plan's approach is logically sound, complete, and accounts for edge cases
- 5. Identify any gaps, contradictions, incorrect assumptions, or missing steps
- 6. Make direct edits to the plan file to fix any issues found, adding inline notes where changes were made
+ Based on the prompting skill's best practices and the plan's content, generate a comprehensive meta prompt tailored for Codex CLI. The meta prompt should instruct Codex to:

- The meta prompt should be structured according to the Codex guide's recommendations (clear system context, explicit constraints, step-by-step instructions, expected output format).
+ 1. Read and internalize the full plan. Then read every codebase file the plan references in full, not just the sections mentioned. Also read key files adjacent to those (imports, dependents) to understand the real state of the code the plan targets.
+ 2. Systematically review the plan against what the code actually looks like, not what the plan assumes it looks like.
+ 3. Verify every assumption, file path, API shape, data flow, and integration point mentioned in the plan against the actual codebase.
+ 4. Check that the plan's approach is logically sound, complete, and accounts for edge cases.
+ 5. Identify any gaps, contradictions, incorrect assumptions, or missing steps.
+ 6. Make targeted edits to the plan file to fix issues found, adding inline notes where changes were made. Fix what's wrong — do not restructure or rewrite sections that are correct.

- Then launch Codex CLI in the interactive shell overlay with that meta prompt using these flags: `-m gpt-5.3-codex -c model_reasoning_effort="xhigh" -a never`. Do NOT pass sandbox flags in interactive_shell. End your turn immediately after launching -- do not poll the session. The user will manage the overlay directly.
+ The meta prompt should follow the prompting skill's patterns (clear system context, explicit constraints, step-by-step instructions, expected output format). Instruct Codex not to ask clarifying questions; read the codebase to resolve ambiguities instead of asking. Keep progress updates brief and concrete. Emphasize scope discipline and verification requirements per the prompting skill.
+
+ Determine the model flag:
+ - Default: `-m gpt-5.4`
+ - If user explicitly requests Codex 5.3: `-m gpt-5.3-codex`
+
+ Then launch Codex CLI in the interactive shell overlay with that meta prompt using the chosen model flag plus `-a never`.
+
+ Use `interactive_shell` with `mode: "dispatch"` for this delegated run (fire-and-forget with completion notification). Do NOT pass sandbox flags in interactive_shell. Dispatch mode only. End turn immediately. Do not poll. Wait for completion notification.

  $@
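The model-selection and launch rules the workflow above spells out can be sketched programmatically. A minimal sketch: the helper name `codex_launch_args` and the argv-list shape are illustrative, not part of the package.

```python
# Illustrative sketch of the workflow's launch rules; the helper name
# `codex_launch_args` is hypothetical and not part of the package.
def codex_launch_args(meta_prompt: str, use_codex_5_3: bool = False) -> list[str]:
    # Default model is gpt-5.4; gpt-5.3-codex only on explicit user request.
    model = "gpt-5.3-codex" if use_codex_5_3 else "gpt-5.4"
    # Always pass -a never; sandbox flags are never passed via interactive_shell.
    return ["codex", "-m", model, "-a", "never", meta_prompt]

print(codex_launch_args("review the plan")[2])  # → gpt-5.4
```

The dispatch-mode rule (fire and forget, no polling) lives in the `interactive_shell` call, not in these flags, so it is not modeled here.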
@@ -0,0 +1,161 @@
+ ---
+ name: codex-5-3-prompting
+ description: How to write system prompts and instructions for GPT-5.3-Codex. Use when constructing or tuning prompts targeting Codex 5.3.
+ ---
+
+ # GPT-5.3-Codex Prompting Guide
+
+ GPT-5.3-Codex is fast, capable, and eager. It moves quickly and will skip reading, over-refactor, and drift scope if prompts aren't tight. Explicit constraints matter more than with GPT-5.2-Codex. Include the following blocks as needed when constructing system prompts.
+
+ ## Output shape
+
+ Always include. Controls verbosity and response structure.
+
+ ```
+ <output_verbosity_spec>
+ - Default: 3-6 sentences or <=5 bullets for typical answers.
+ - Simple yes/no questions: <=2 sentences.
+ - Complex multi-step or multi-file tasks:
+   - 1 short overview paragraph
+   - then <=5 bullets tagged: What changed, Where, Risks, Next steps, Open questions.
+ - Avoid long narrative paragraphs; prefer compact bullets and short sections.
+ - Do not rephrase the user's request unless it changes semantics.
+ </output_verbosity_spec>
+ ```
+
+ ## Scope constraints
+
+ Always include. GPT-5.3-Codex will add features, refactor adjacent code, and invent UI elements if you don't fence it in.
+
+ ```
+ <design_and_scope_constraints>
+ - Explore any existing design systems and understand them deeply.
+ - Implement EXACTLY and ONLY what the user requests.
+ - No extra features, no added components, no UX embellishments.
+ - Style aligned to the design system at hand.
+ - Do NOT invent colors, shadows, tokens, animations, or new UI elements unless requested or necessary.
+ - If any instruction is ambiguous, choose the simplest valid interpretation.
+ </design_and_scope_constraints>
+ ```
+
+ ## Context loading
+
+ Always include. GPT-5.3-Codex skips reading and starts writing if you don't force it.
+
+ ```
+ <context_loading>
+ - Read ALL files that will be modified -- in full, not just the sections mentioned in the task.
+ - Also read key files they import from or that depend on them.
+ - Absorb surrounding patterns, naming conventions, error handling style, and architecture before writing any code.
+ - Do not ask clarifying questions about things that are answerable by reading the codebase.
+ </context_loading>
+ ```
+
+ ## Plan-first mode
+
+ Include for multi-file work, large refactors, or any task with ordering dependencies.
+
+ ```
+ <plan_first>
+ - Before writing any code, produce a brief implementation plan:
+   - Files to create vs. modify
+   - Implementation order and prerequisites
+   - Key design decisions and edge cases
+   - Acceptance criteria for "done"
+ - Get the plan right first. Then implement step by step following the plan.
+ - If the plan is provided externally, follow it faithfully -- the job is execution, not second-guessing the design.
+ </plan_first>
+ ```
+
+ ## Long-context handling
+
+ Include when inputs exceed ~10k tokens (multi-chapter docs, long threads, multiple PDFs).
+
+ ```
+ <long_context_handling>
+ - For inputs longer than ~10k tokens:
+   - First, produce a short internal outline of the key sections relevant to the task.
+   - Re-state the constraints explicitly before answering.
+   - Anchor claims to sections ("In the 'Data Retention' section...") rather than speaking generically.
+   - If the answer depends on fine details (dates, thresholds, clauses), quote or paraphrase them.
+ </long_context_handling>
+ ```
+
+ ## Uncertainty and ambiguity
+
+ Include when the task involves underspecified requirements or hallucination-prone domains.
+
+ ```
+ <uncertainty_and_ambiguity>
+ - If the question is ambiguous or underspecified:
+   - Ask up to 1-3 precise clarifying questions, OR
+   - Present 2-3 plausible interpretations with clearly labeled assumptions.
+ - Never fabricate exact figures, line numbers, or external references when uncertain.
+ - When unsure, prefer "Based on the provided context..." over absolute claims.
+ </uncertainty_and_ambiguity>
+ ```
+
+ ## User updates
+
+ Include for agentic / long-running tasks.
+
+ ```
+ <user_updates_spec>
+ - Send brief updates (1-2 sentences) only when:
+   - You start a new major phase of work, or
+   - You discover something that changes the plan.
+ - Avoid narrating routine tool calls ("reading file...", "running tests...").
+ - Each update must include at least one concrete outcome ("Found X", "Confirmed Y", "Updated Z").
+ - Do not expand the task beyond what was asked; if you notice new work, call it out as optional.
+ </user_updates_spec>
+ ```
+
+ ## Tool usage
+
+ Include when the prompt involves tool-calling agents.
+
+ ```
+ <tool_usage_rules>
+ - Prefer tools over internal knowledge whenever:
+   - You need fresh or user-specific data (tickets, orders, configs, logs).
+   - You reference specific IDs, URLs, or document titles.
+ - Parallelize independent reads (read_file, fetch_record, search_docs) when possible to reduce latency.
+ - After any write/update tool call, briefly restate:
+   - What changed
+   - Where (ID or path)
+   - Any follow-up validation performed
+ </tool_usage_rules>
+ ```
+
+ ## Reasoning effort
+
+ Set `model_reasoning_effort` via Codex CLI: `-c model_reasoning_effort="high"`
+
+ | Task type | Effort |
+ |---|---|
+ | Simple code generation, formatting | `low` or `medium` |
+ | Standard implementation from clear specs | `high` |
+ | Complex refactors, plan review, architecture | `xhigh` |
+ | Code review (thorough) | `high` or `xhigh` |
+
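When building Codex invocations programmatically, the effort table above can be mirrored in a small lookup. A minimal sketch: the task-type keys and the `effort_flag` helper are hypothetical; only the effort values and the `-c model_reasoning_effort=...` flag shape come from the guide.

```python
# Hypothetical task-type keys; effort values taken from the table above.
EFFORT_BY_TASK = {
    "simple-codegen": "medium",
    "standard-implementation": "high",
    "complex-refactor": "xhigh",
    "plan-review": "xhigh",
    "thorough-code-review": "xhigh",
}

def effort_flag(task_type: str) -> str:
    # Rendered as the Codex CLI config override shown above.
    return f'-c model_reasoning_effort="{EFFORT_BY_TASK[task_type]}"'

print(effort_flag("plan-review"))  # → -c model_reasoning_effort="xhigh"
```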
+ ## Backwards compatibility hedging
+
+ GPT-5.3-Codex has a strong tendency to preserve old patterns, add compatibility shims, and provide fallback code "just in case" -- even when explicitly told not to worry about backwards compatibility. Vague instructions like "don't worry about backwards compatibility" get interpreted weakly; the model may still hedge.
+
+ Use **"cutover"** to signal a clean, irreversible break. It's a precise industry term that conveys finality and intentional deprecation -- no dual-support phase, no gradual migration, no preserving old behavior.
+
+ Instead of:
+ > "Rewrite this and don't worry about backwards compatibility"
+
+ Say:
+ > "This is a cutover. No backwards compatibility. Rewrite using only Python 3.12+ features and current best practices. Do not preserve legacy code, polyfills, or deprecated patterns."
+
+ ## Quick reference
+
+ - **Force reading first.** "Read all necessary files before you ask any dumb question."
+ - **Use plan mode.** Draft the full task with acceptance criteria before implementing.
+ - **Steer aggressively mid-task.** GPT-5.3-Codex handles redirects without losing context. Be direct: "Stop. Fix the actual cause." / "Simplest valid implementation only."
+ - **Constrain scope hard.** GPT-5.3-Codex will refactor aggressively if you don't fence it in.
+ - **Watch context burn.** Faster model = faster context consumption. Start fresh at ~40%.
+ - **Use domain jargon.** "Cutover," "golden-path," "no fallbacks," "domain split" get cleaner, faster responses.
+ - **Download libraries locally.** Tell it to read them for better context than relying on training data.