npm - pi-extensions - Versions diffs - 0.1.40 → 0.1.41 - Mend

pi-extensions 0.1.40 → 0.1.41

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (6) hide show

package/package.json +1 -1
package/session-recap/CHANGELOG.md +12 -0
package/session-recap/DESIGN.md +20 -12
package/session-recap/README.md +7 -3
package/session-recap/index.ts +63 -13
package/session-recap/package.json +1 -1

package/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "pi-extensions",
-  "version": "0.1.40",
+  "version": "0.1.41",
   "license": "MIT",
   "private": false,
   "keywords": [

package/session-recap/CHANGELOG.md CHANGED Viewed

@@ -1,5 +1,17 @@
 # Changelog
+## [0.1.3] - 2026-05-12
+### Fixed
+- Defer focus-triggered recaps while the agent is still active, matching Claude Code's away-summary pending behavior and avoiding duplicate/stale recaps during slow tool calls.
+- Cancel stale in-flight recap drafts when a new turn starts.
+- Skip `/resume` and `/fork` recap generation in headless/non-UI sessions.
+- Read registered flag values using bare flag names (for example `recap-idle-seconds`, not `--recap-idle-seconds`) so automatic trigger configuration actually takes effect.
+- Invoke recap generation with no reasoning, no prompt-cache retention, and `maxTokens: 256`.
+### Added
+- Add `--recap-during-active` to opt back into focus-triggered recaps while an agent turn is still running.
 ## [0.1.2] - 2026-05-07
 ### Changed

package/session-recap/DESIGN.md CHANGED Viewed

@@ -39,12 +39,12 @@ All four cancel each other cleanly via an `AbortController`; the next `input`,
 Default must not surprise users with auth/login issues.
-**Decision:** default to the **currently active model** with **`reasoning: "minimal"`** where supported. Trust the user's model choice — no auto-fallback to a cheap tier. If they're on Opus 4-7 the recap uses Opus 4-7. It's the only way to guarantee reliable generation across built-in + custom providers.
+**Decision:** default to the **currently active model**, but invoke it as a tiny throwaway completion: no tools, no Agent Skills, reasoning/thinking disabled, no prompt-cache retention, capped output. Trust the user's model choice — no auto-fallback to a cheap tier. If they're on Opus 4-7 the recap uses Opus 4-7. It's the only way to guarantee reliable generation across built-in + custom providers.
 - Primary: `ctx.model` (whatever the user is running right now).
-- Reasoning: pass `reasoningEffort: "minimal"` when `model.reasoning === true`; omit entirely otherwise.
-  - Pi's own `setThinkingLevel` already clamps to model capabilities — we follow the same rule.
-  - Some custom providers may not honour `reasoningEffort`; that's fine, they'll ignore it.
+- Reasoning: disabled for recap generation. Claude Code's away-summary path uses `thinkingConfig: { type: "disabled" }`; recap generation is similarly not worth reasoning tokens.
+- Cache: pass `cacheRetention: "none"` so providers do not add Anthropic cache-control markers or OpenAI prompt-cache session keys. We should not pay cache-write overhead for one-off recap prompts.
+- Output cap: `maxTokens: 256`. Avoid forcing `temperature: 0` because some reasoning/chat providers reject temperature on their Responses API even when we are not requesting reasoning.
 - Auth: `ctx.modelRegistry.getApiKeyAndHeaders(ctx.model)` — same primitive as every other pi call, so any OAuth / env-var / custom-provider credential the user already set up just works.
 - Custom / local models (via `pi.registerProvider`): same path. If the provider is registered and has a key, recap works. If not, we skip silently — never fail loudly.
 - No active model / no API key → skip silently, log to `console.error` for debugging.
@@ -55,7 +55,7 @@ Default must not surprise users with auth/login issues.
 **Trade-off we're accepting**
-- If the user is on a heavy model (Opus 4-7, GPT-5.4), each recap uses that model for a small one-liner task. Still cheap in absolute terms because the prompt is capped at ~12k chars and the output is one line, but not the cheapest option. We prefer "no auth surprise" over "always-cheapest".
+- If the user is on a heavy model (Opus 4-7, GPT-5.5), each recap uses that model for a small one-liner task. Still cheap in absolute terms because the prompt is capped at ~12k chars and the output is one line, but not the cheapest option. We prefer "no auth surprise" over "always-cheapest".
 ## Context fed to the model
@@ -99,7 +99,15 @@ guard against multi-line outputs.
 ## Edge cases
-### 1. Focus → defocus → focus again without user input
+### 1. Focus-out during long-running agent activity
+Claude Code's `useAwaySummary` waits until the terminal has been blurred for a fixed delay. If that delay expires while `isLoading` is still true, it sets a pending bit and generates only after loading finishes, as long as the terminal is still blurred. On focus-in it cancels the timer, aborts in-flight generation, and clears the pending bit.
+Pi's recap extension mirrors that by default: `agent_start`/`agent_end` maintain `agentActive`; focus-out while active sets `focusDraftAfterAgent`; generation is deferred until `agent_end` (or a safe `turn_end` check) and is cancelled if the user refocuses or starts new input. This avoids drafting against a half-written branch during slow tool calls, which could otherwise show one recap and then replace it with a later one once tool results land.
+Escape hatch: `--recap-during-active` restores the older behavior and allows focus-triggered drafts while the agent turn is still running. This is useful for users who want a mid-flight "what's happened so far" peek and accept the possibility of stale/discarded duplicate drafts.
+### 2. Focus → defocus → focus again without user input
 **Current behaviour:** `handleFocusOut` re-enters `generateAndShow` if there is no in-flight request and no `draftingForFocus` flag, even if `pendingRecap` is still a perfectly valid recap for the same session state. Wasteful, and may overwrite a good recap with an identical one.
@@ -110,7 +118,7 @@ guard against multi-line outputs.
 **Related:** also gate on "has any activity happened since the previous draft?" — if nothing, reuse; if yes, regenerate.
-### 2. Agent turn ends in error or abort
+### 3. Agent turn ends in error or abort
 **Question:** does `agent_end` fire reliably on user-Escape abort and on model/transport errors? Need to verify against pi's current behaviour. `turn_end` is documented as per-turn and should fire even on partial completion.
@@ -119,19 +127,19 @@ guard against multi-line outputs.
 - Focus-out path already works: `hasMeaningfulActivity` counts assistant words and tool calls, independent of success/failure. An aborted turn with partial work still qualifies.
 - Add a note in the recap prompt encouraging the model to mention "aborted" / "failed" state explicitly when present in the transcript, so the one-liner is honest (e.g. `recap: Started refactor of auth.ts; aborted before tests ran. Next: resume from middleware split.`).
-### 3. Terminal doesn't support DECSET ?1004
+### 4. Terminal doesn't support DECSET ?1004
 Idle fallback covers it. `--recap-disable-focus` lets the user opt out explicitly (in case the escape sequences cause weird ghost characters in a less-compliant terminal).
-### 4. tmux without `focus-events on`
+### 5. tmux without `focus-events on`
 tmux swallows focus events unless `set -g focus-events on` is set. Document in README. Idle fallback still works.
-### 5. Aborted-in-flight recap request
+### 6. Aborted-in-flight recap request
 Already handled: `AbortController` on every `complete()` call; cancelled on input / agent_start / session_shutdown / next trigger.
-### 6. Multiple pi sessions in the same terminal process
+### 7. Multiple pi sessions in the same terminal process
 Not applicable — pi is one process per terminal tab. The stdin listener we add is scoped to the process and cleaned up on `session_shutdown`.
@@ -145,7 +153,7 @@ Not applicable — pi is one process per terminal tab. The stdin listener we add
 ### Code
 - [x] Extension lives at `session-recap/index.ts`.
-- [x] Default model = `ctx.model` with `reasoning: "minimal"` via `completeSimple()` when the model advertises reasoning; `--recap-model` override.
+- [x] Default model = `ctx.model` via `completeSimple()` with no reasoning, no cache retention, capped output, and `--recap-model` override.
 - [x] Idle timer armed on `turn_end` (not `agent_end`) so error/abort turns still get a recap.
 - [x] `pendingRecap` + `lastDraftedLeafId` stamping; skip regen on focus-out if branch leaf hasn't changed.
 - [x] Prompt explicitly asks the model to mention aborted/errored turn state when present.

package/session-recap/README.md CHANGED Viewed

@@ -30,10 +30,12 @@ If focus events cause any weirdness in your terminal, run with `--recap-disable-
 ## Model
-Defaults to the **currently active model** in your Pi session with `reasoning: "minimal"` where the model supports it. This piggybacks on whatever auth you already have (including custom providers registered via `pi.registerProvider`), so there are no login surprises.
+Defaults to the **currently active model** in your Pi session, but with recap-specific low-cost settings. This piggybacks on whatever auth you already have (including custom providers registered via `pi.registerProvider`), so there are no login surprises.
-- Reasoning-capable model (Opus 4-7, GPT-5.4, etc.) → runs at minimal thinking for speed/cost.
-- Non-reasoning model → no reasoning params passed.
+- No tools or Agent Skills are loaded into the recap call — only the compact transcript below is sent.
+- Reasoning/thinking is disabled for the recap call.
+- Prompt cache writes/reads are disabled with `cacheRetention: "none"`.
+- Output is capped with `maxTokens: 256`.
 - No active model or missing API key → the recap is skipped silently.
 Override with `--recap-model "<provider>/<id>"` if you want a specific model regardless of the session's active one.
@@ -76,6 +78,7 @@ Filter to just this extension in `~/.pi/agent/settings.json`:
 | `--recap-idle-seconds <n>` | `45` | Seconds after `turn_end` before the idle-fallback recap fires. |
 | `--recap-focus-min-seconds <n>` | `3` | Minimum focus-out duration before a recap is revealed on refocus. |
 | `--recap-disable-focus` | `false` | Disable DECSET `?1004` focus reporting. Idle fallback still runs. |
+| `--recap-during-active` | `false` | Allow focus-triggered recaps while an agent turn is still running. This restores the older “peek mid-flight” behavior, at the cost of possible stale/discarded duplicate drafts. |
 | `--recap-disable` | `false` | Disable the automatic recap entirely. `/recap` still works. |
 | `--recap-model "<p/id>"` | (active model) | Override the default, e.g. `anthropic/claude-sonnet-4-6`. |
@@ -89,6 +92,7 @@ Filter to just this extension in `~/.pi/agent/settings.json`:
 - **Uses `turn_end`, not `agent_end`**, so a turn that errors or is aborted still gets recapped.
 - **No duplicate drafts**: the last-drafted branch-leaf is stamped; if you focus out / in repeatedly without any new session activity, the recap is reused rather than regenerated.
+- **Defers during active work by default**: if you focus away during a slow model/tool action, the focus recap waits until the agent finishes loading before drafting, matching Claude Code's away-summary behavior. Use `--recap-during-active` to allow mid-flight recaps instead.
 - **Aborts on new input**: any in-flight recap request is cancelled when you start typing or a new turn begins.
 - **No session persistence**: the recap lives only in the widget for the active session — nothing is stored.

package/session-recap/index.ts CHANGED Viewed

@@ -18,16 +18,17 @@
  * Also fires on `/resume` (session_start reason="resume") to recap where
  * the prior session left off.
  *
- * Model: defaults to the user's currently active model with
- * `reasoning: "minimal"` when the model advertises reasoning support. This
- * piggybacks on whatever auth the user already has configured (including
- * custom providers) so there are no login surprises. Override explicitly
+ * Model: defaults to the user's currently active model with reasoning/thinking
+ * disabled and cache writes disabled. This piggybacks on whatever auth the user
+ * already has configured (including custom providers) so there are no login
+ * surprises. Override explicitly
  * with `--recap-model "<provider>/<id>"` if you want a specific model.
  *
  * Flags:
  *   --recap-idle-seconds <n>      Seconds after turn_end for idle recap (default 45)
  *   --recap-focus-min-seconds <n> Min focus-out duration to show a recap (default 3)
  *   --recap-disable-focus         Disable DECSET ?1004 focus reporting
+ *   --recap-during-active         Allow focus recaps while an agent turn is still running
  *   --recap-disable               Disable the automatic recap entirely
  *   --recap-model <p/id>          Override the default (active) model
  *
@@ -227,9 +228,11 @@ async function generateRecap(
 			apiKey: auth.apiKey,
 			headers: auth.headers,
 			signal,
-			// Only request reasoning on reasoning-capable models. Non-reasoning
-			// models ignore unknown params but we keep this clean.
-			...(model.reasoning ? { reasoning: "minimal" as const } : {}),
+			// Recaps are tiny, throwaway UI hints. Do not pay to create/read prompt
+			// cache entries, and do not spend reasoning tokens. Claude Code's away
+			// summary path likewise disables thinking for this job.
+			cacheRetention: "none",
+			maxTokens: 256,
 		},
 	);
@@ -273,6 +276,11 @@ export default function (pi: ExtensionAPI) {
 		type: "boolean",
 		default: false,
 	});
+	pi.registerFlag("recap-during-active", {
+		description: "Allow focus-triggered recaps while an agent turn is still running",
+		type: "boolean",
+		default: false,
+	});
 	pi.registerFlag("recap-disable", {
 		description: "Disable the automatic session recap",
 		type: "boolean",
@@ -293,6 +301,14 @@ export default function (pi: ExtensionAPI) {
 	let activeController: AbortController | undefined;
 	let activeReason: RecapReason | undefined;
+	// Agent activity state. Claude Code's away recap deliberately avoids
+	// generating while a turn is still loading: if the user blurs during a slow
+	// tool call, it marks the recap as pending and waits for loading to finish.
+	// Mirroring that here prevents a draft from summarising the pre-result
+	// branch, then being replaced when the slow action finally commits output.
+	let agentActive = false;
+	let focusDraftAfterAgent = false;
 	// Focus reporting state.
 	let focusListener: ((chunk: Buffer) => void) | undefined;
 	let focusEnabled = false;
@@ -304,17 +320,18 @@ export default function (pi: ExtensionAPI) {
 	let lastDraftedLeafId: string | undefined;
 	const idleMs = (): number => {
-		const n = Number(pi.getFlag("--recap-idle-seconds") ?? DEFAULT_IDLE_SECONDS);
+		const n = Number(pi.getFlag("recap-idle-seconds") ?? DEFAULT_IDLE_SECONDS);
 		return Math.max(5, Number.isFinite(n) ? n : DEFAULT_IDLE_SECONDS) * 1000;
 	};
 	const focusMinMs = (): number => {
-		const n = Number(pi.getFlag("--recap-focus-min-seconds") ?? DEFAULT_FOCUS_MIN_SECONDS);
+		const n = Number(pi.getFlag("recap-focus-min-seconds") ?? DEFAULT_FOCUS_MIN_SECONDS);
 		return Math.max(0, Number.isFinite(n) ? n : DEFAULT_FOCUS_MIN_SECONDS) * 1000;
 	};
-	const isDisabled = (): boolean => Boolean(pi.getFlag("--recap-disable"));
-	const isFocusDisabled = (): boolean => Boolean(pi.getFlag("--recap-disable-focus"));
+	const isDisabled = (): boolean => Boolean(pi.getFlag("recap-disable"));
+	const isFocusDisabled = (): boolean => Boolean(pi.getFlag("recap-disable-focus"));
+	const allowDuringActive = (): boolean => Boolean(pi.getFlag("recap-during-active"));
 	const modelOverride = (): string | undefined => {
-		const v = String(pi.getFlag("--recap-model") ?? "").trim();
+		const v = String(pi.getFlag("recap-model") ?? "").trim();
 		return v.length > 0 ? v : undefined;
 	};
@@ -406,10 +423,27 @@ export default function (pi: ExtensionAPI) {
 	// --- focus reporting wiring -------------------------------------------
+	const maybeGenerateDeferredFocusRecap = (ctx: ExtensionContext) => {
+		if (!focusDraftAfterAgent) return;
+		if (focusedOutAt === undefined) return;
+		if (agentActive) return;
+		focusDraftAfterAgent = false;
+		void generateAndShow(ctx, { reason: "focus" });
+	};
 	const handleFocusOut = (ctx: ExtensionContext) => {
 		focusedOutAt = Date.now();
 		if (isDisabled() || activeController) return;
+		// If the user switches away during a long-running model/tool action, do
+		// not draft against the half-written branch. Claude Code sets a pending
+		// bit in this case and generates only after `isLoading` flips false; we do
+		// the same via agent_start/agent_end.
+		if (agentActive && !allowDuringActive()) {
+			focusDraftAfterAgent = true;
+			return;
+		}
 		// Skip regen if we already have a fresh recap for the current session
 		// state — regardless of whether it's still parked in pendingRecap or
 		// already shown in the widget. The stamp is invalidated on any new
@@ -425,6 +459,7 @@ export default function (pi: ExtensionAPI) {
 	const handleFocusIn = (ctx: ExtensionContext) => {
 		const outAt = focusedOutAt;
 		focusedOutAt = undefined;
+		focusDraftAfterAgent = false;
 		if (outAt === undefined) return; // spurious focus-in before we saw focus-out
 		const duration = Date.now() - outAt;
 		if (duration < focusMinMs()) {
@@ -509,6 +544,7 @@ export default function (pi: ExtensionAPI) {
 			focusEnabled = false;
 		}
 		focusedOutAt = undefined;
+		focusDraftAfterAgent = false;
 		pendingRecap = undefined;
 	};
@@ -519,23 +555,30 @@ export default function (pi: ExtensionAPI) {
 		// A new turn (successful or not) invalidates any prior draft.
 		lastDraftedLeafId = undefined;
 		scheduleRecap(ctx);
+		maybeGenerateDeferredFocusRecap(ctx);
 	});
 	pi.on("turn_start", async () => {
 		// Another turn is starting in the same agent loop — clear the idle timer
 		// we armed on the previous turn_end; it'll re-arm on the next turn_end.
+		// It also means any focus/idle draft racing in the background is stale.
 		clearTimer();
+		cancelActive();
+		pendingRecap = undefined;
+		lastDraftedLeafId = undefined;
 	});
 	pi.on("input", async (_event, ctx) => {
 		clearTimer();
 		cancelActive();
+		focusDraftAfterAgent = false;
 		pendingRecap = undefined;
 		lastDraftedLeafId = undefined;
 		clearRecap(ctx);
 	});
 	pi.on("agent_start", async (_event, ctx) => {
+		agentActive = true;
 		clearTimer();
 		cancelActive();
 		pendingRecap = undefined;
@@ -543,7 +586,14 @@ export default function (pi: ExtensionAPI) {
 		clearRecap(ctx);
 	});
+	pi.on("agent_end", async (_event, ctx) => {
+		agentActive = false;
+		maybeGenerateDeferredFocusRecap(ctx);
+	});
 	pi.on("session_shutdown", async () => {
+		agentActive = false;
+		focusDraftAfterAgent = false;
 		clearTimer();
 		cancelActive();
 		detachFocusReporting();
@@ -552,7 +602,7 @@ export default function (pi: ExtensionAPI) {
 	// Session start: wire up focus reporting; on resume, show a recap.
 	pi.on("session_start", async (event, ctx) => {
 		attachFocusReporting(ctx);
-		if (isDisabled()) return;
+		if (isDisabled() || !ctx.hasUI) return;
 		if (event.reason === "resume" || event.reason === "fork") {
 			setTimeout(() => {
 				void generateAndShow(ctx, { reason: "resume" });

package/session-recap/package.json CHANGED Viewed

@@ -1,6 +1,6 @@
 {
   "name": "@tmustier/pi-session-recap",
-  "version": "0.1.2",
+  "version": "0.1.3",
   "description": "One-line recap above the editor when you refocus a Pi session. Keeps you in flow when multi-agenting.",
   "license": "MIT",
   "author": "Thomas Mustier",