pi-extensions 0.1.40 → 0.1.41

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pi-extensions",
3
- "version": "0.1.40",
3
+ "version": "0.1.41",
4
4
  "license": "MIT",
5
5
  "private": false,
6
6
  "keywords": [
@@ -1,5 +1,17 @@
1
1
  # Changelog
2
2
 
3
+ ## [0.1.3] - 2026-05-12
4
+
5
+ ### Fixed
6
+ - Defer focus-triggered recaps while the agent is still active, matching Claude Code's away-summary pending behavior and avoiding duplicate/stale recaps during slow tool calls.
7
+ - Cancel stale in-flight recap drafts when a new turn starts.
8
+ - Skip `/resume` and `/fork` recap generation in headless/non-UI sessions.
9
+ - Read registered flag values using bare flag names (for example `recap-idle-seconds`, not `--recap-idle-seconds`) so automatic trigger configuration actually takes effect.
10
+ - Invoke recap generation with no reasoning, no prompt-cache retention, and `maxTokens: 256`.
11
+
12
+ ### Added
13
+ - Add `--recap-during-active` to opt back into focus-triggered recaps while an agent turn is still running.
14
+
3
15
  ## [0.1.2] - 2026-05-07
4
16
 
5
17
  ### Changed
@@ -39,12 +39,12 @@ All four cancel each other cleanly via an `AbortController`; the next `input`,
39
39
 
40
40
  Default must not surprise users with auth/login issues.
41
41
 
42
- **Decision:** default to the **currently active model** with **`reasoning: "minimal"`** where supported. Trust the user's model choice — no auto-fallback to a cheap tier. If they're on Opus 4-7 the recap uses Opus 4-7. It's the only way to guarantee reliable generation across built-in + custom providers.
42
+ **Decision:** default to the **currently active model**, but invoke it as a tiny throwaway completion: no tools, no Agent Skills, reasoning/thinking disabled, no prompt-cache retention, capped output. Trust the user's model choice — no auto-fallback to a cheap tier. If they're on Opus 4-7 the recap uses Opus 4-7. It's the only way to guarantee reliable generation across built-in + custom providers.
43
43
 
44
44
  - Primary: `ctx.model` (whatever the user is running right now).
45
- - Reasoning: pass `reasoningEffort: "minimal"` when `model.reasoning === true`; omit entirely otherwise.
46
- - Pi's own `setThinkingLevel` already clamps to model capabilities we follow the same rule.
47
- - Some custom providers may not honour `reasoningEffort`; that's fine, they'll ignore it.
45
+ - Reasoning: disabled for recap generation. Claude Code's away-summary path uses `thinkingConfig: { type: "disabled" }`; recap generation is similarly not worth reasoning tokens.
46
+ - Cache: pass `cacheRetention: "none"` so providers do not add Anthropic cache-control markers or OpenAI prompt-cache session keys. We should not pay cache-write overhead for one-off recap prompts.
47
+ - Output cap: `maxTokens: 256`. Avoid forcing `temperature: 0` because some reasoning/chat providers reject temperature on their Responses API even when we are not requesting reasoning.
48
48
  - Auth: `ctx.modelRegistry.getApiKeyAndHeaders(ctx.model)` — same primitive as every other pi call, so any OAuth / env-var / custom-provider credential the user already set up just works.
49
49
  - Custom / local models (via `pi.registerProvider`): same path. If the provider is registered and has a key, recap works. If not, we skip silently — never fail loudly.
50
50
  - No active model / no API key → skip silently, log to `console.error` for debugging.
@@ -55,7 +55,7 @@ Default must not surprise users with auth/login issues.
55
55
 
56
56
  **Trade-off we're accepting**
57
57
 
58
- - If the user is on a heavy model (Opus 4-7, GPT-5.4), each recap uses that model for a small one-liner task. Still cheap in absolute terms because the prompt is capped at ~12k chars and the output is one line, but not the cheapest option. We prefer "no auth surprise" over "always-cheapest".
58
+ - If the user is on a heavy model (Opus 4-7, GPT-5.5), each recap uses that model for a small one-liner task. Still cheap in absolute terms because the prompt is capped at ~12k chars and the output is one line, but not the cheapest option. We prefer "no auth surprise" over "always-cheapest".
59
59
 
60
60
  ## Context fed to the model
61
61
 
@@ -99,7 +99,15 @@ guard against multi-line outputs.
99
99
 
100
100
  ## Edge cases
101
101
 
102
- ### 1. Focus defocus focus again without user input
102
+ ### 1. Focus-out during long-running agent activity
103
+
104
+ Claude Code's `useAwaySummary` waits until the terminal has been blurred for a fixed delay. If that delay expires while `isLoading` is still true, it sets a pending bit and generates only after loading finishes, as long as the terminal is still blurred. On focus-in it cancels the timer, aborts in-flight generation, and clears the pending bit.
105
+
106
+ Pi's recap extension mirrors that by default: `agent_start`/`agent_end` maintain `agentActive`; focus-out while active sets `focusDraftAfterAgent`; generation is deferred until `agent_end` (or a safe `turn_end` check) and is cancelled if the user refocuses or starts new input. This avoids drafting against a half-written branch during slow tool calls, which could otherwise show one recap and then replace it with a later one once tool results land.
107
+
108
+ Escape hatch: `--recap-during-active` restores the older behavior and allows focus-triggered drafts while the agent turn is still running. This is useful for users who want a mid-flight "what's happened so far" peek and accept the possibility of stale/discarded duplicate drafts.
109
+
110
+ ### 2. Focus → defocus → focus again without user input
103
111
 
104
112
  **Current behaviour:** `handleFocusOut` re-enters `generateAndShow` if there is no in-flight request and no `draftingForFocus` flag, even if `pendingRecap` is still a perfectly valid recap for the same session state. Wasteful, and may overwrite a good recap with an identical one.
105
113
 
@@ -110,7 +118,7 @@ guard against multi-line outputs.
110
118
 
111
119
  **Related:** also gate on "has any activity happened since the previous draft?" — if nothing, reuse; if yes, regenerate.
112
120
 
113
- ### 2. Agent turn ends in error or abort
121
+ ### 3. Agent turn ends in error or abort
114
122
 
115
123
  **Question:** does `agent_end` fire reliably on user-Escape abort and on model/transport errors? Need to verify against pi's current behaviour. `turn_end` is documented as per-turn and should fire even on partial completion.
116
124
 
@@ -119,19 +127,19 @@ guard against multi-line outputs.
119
127
  - Focus-out path already works: `hasMeaningfulActivity` counts assistant words and tool calls, independent of success/failure. An aborted turn with partial work still qualifies.
120
128
  - Add a note in the recap prompt encouraging the model to mention "aborted" / "failed" state explicitly when present in the transcript, so the one-liner is honest (e.g. `recap: Started refactor of auth.ts; aborted before tests ran. Next: resume from middleware split.`).
121
129
 
122
- ### 3. Terminal doesn't support DECSET ?1004
130
+ ### 4. Terminal doesn't support DECSET ?1004
123
131
 
124
132
  Idle fallback covers it. `--recap-disable-focus` lets the user opt out explicitly (in case the escape sequences cause weird ghost characters in a less-compliant terminal).
125
133
 
126
- ### 4. tmux without `focus-events on`
134
+ ### 5. tmux without `focus-events on`
127
135
 
128
136
  tmux swallows focus events unless `set -g focus-events on` is set. Document in README. Idle fallback still works.
129
137
 
130
- ### 5. Aborted-in-flight recap request
138
+ ### 6. Aborted-in-flight recap request
131
139
 
132
140
  Already handled: `AbortController` on every `complete()` call; cancelled on input / agent_start / session_shutdown / next trigger.
133
141
 
134
- ### 6. Multiple pi sessions in the same terminal process
142
+ ### 7. Multiple pi sessions in the same terminal process
135
143
 
136
144
  Not applicable — pi is one process per terminal tab. The stdin listener we add is scoped to the process and cleaned up on `session_shutdown`.
137
145
 
@@ -145,7 +153,7 @@ Not applicable — pi is one process per terminal tab. The stdin listener we add
145
153
 
146
154
  ### Code
147
155
  - [x] Extension lives at `session-recap/index.ts`.
148
- - [x] Default model = `ctx.model` with `reasoning: "minimal"` via `completeSimple()` when the model advertises reasoning; `--recap-model` override.
156
+ - [x] Default model = `ctx.model` via `completeSimple()` with no reasoning, no cache retention, capped output, and `--recap-model` override.
149
157
  - [x] Idle timer armed on `turn_end` (not `agent_end`) so error/abort turns still get a recap.
150
158
  - [x] `pendingRecap` + `lastDraftedLeafId` stamping; skip regen on focus-out if branch leaf hasn't changed.
151
159
  - [x] Prompt explicitly asks the model to mention aborted/errored turn state when present.
@@ -30,10 +30,12 @@ If focus events cause any weirdness in your terminal, run with `--recap-disable-
30
30
 
31
31
  ## Model
32
32
 
33
- Defaults to the **currently active model** in your Pi session with `reasoning: "minimal"` where the model supports it. This piggybacks on whatever auth you already have (including custom providers registered via `pi.registerProvider`), so there are no login surprises.
33
+ Defaults to the **currently active model** in your Pi session, but with recap-specific low-cost settings. This piggybacks on whatever auth you already have (including custom providers registered via `pi.registerProvider`), so there are no login surprises.
34
34
 
35
- - Reasoning-capable model (Opus 4-7, GPT-5.4, etc.) runs at minimal thinking for speed/cost.
36
- - Non-reasoning model no reasoning params passed.
35
+ - No tools or Agent Skills are loaded into the recap call only the compact transcript below is sent.
36
+ - Reasoning/thinking is disabled for the recap call.
37
+ - Prompt cache writes/reads are disabled with `cacheRetention: "none"`.
38
+ - Output is capped with `maxTokens: 256`.
37
39
  - No active model or missing API key → the recap is skipped silently.
38
40
 
39
41
  Override with `--recap-model "<provider>/<id>"` if you want a specific model regardless of the session's active one.
@@ -76,6 +78,7 @@ Filter to just this extension in `~/.pi/agent/settings.json`:
76
78
  | `--recap-idle-seconds <n>` | `45` | Seconds after `turn_end` before the idle-fallback recap fires. |
77
79
  | `--recap-focus-min-seconds <n>` | `3` | Minimum focus-out duration before a recap is revealed on refocus. |
78
80
  | `--recap-disable-focus` | `false` | Disable DECSET `?1004` focus reporting. Idle fallback still runs. |
81
+ | `--recap-during-active` | `false` | Allow focus-triggered recaps while an agent turn is still running. This restores the older “peek mid-flight” behavior, at the cost of possible stale/discarded duplicate drafts. |
79
82
  | `--recap-disable` | `false` | Disable the automatic recap entirely. `/recap` still works. |
80
83
  | `--recap-model "<p/id>"` | (active model) | Override the default, e.g. `anthropic/claude-sonnet-4-6`. |
81
84
 
@@ -89,6 +92,7 @@ Filter to just this extension in `~/.pi/agent/settings.json`:
89
92
 
90
93
  - **Uses `turn_end`, not `agent_end`**, so a turn that errors or is aborted still gets recapped.
91
94
  - **No duplicate drafts**: the last-drafted branch-leaf is stamped; if you focus out / in repeatedly without any new session activity, the recap is reused rather than regenerated.
95
+ - **Defers during active work by default**: if you focus away during a slow model/tool action, the focus recap waits until the agent finishes loading before drafting, matching Claude Code's away-summary behavior. Use `--recap-during-active` to allow mid-flight recaps instead.
92
96
  - **Aborts on new input**: any in-flight recap request is cancelled when you start typing or a new turn begins.
93
97
  - **No session persistence**: the recap lives only in the widget for the active session — nothing is stored.
94
98
 
@@ -18,16 +18,17 @@
18
18
  * Also fires on `/resume` (session_start reason="resume") to recap where
19
19
  * the prior session left off.
20
20
  *
21
- * Model: defaults to the user's currently active model with
22
- * `reasoning: "minimal"` when the model advertises reasoning support. This
23
- * piggybacks on whatever auth the user already has configured (including
24
- * custom providers) so there are no login surprises. Override explicitly
21
+ * Model: defaults to the user's currently active model with reasoning/thinking
22
+ * disabled and cache writes disabled. This piggybacks on whatever auth the user
23
+ * already has configured (including custom providers) so there are no login
24
+ * surprises. Override explicitly
25
25
  * with `--recap-model "<provider>/<id>"` if you want a specific model.
26
26
  *
27
27
  * Flags:
28
28
  * --recap-idle-seconds <n> Seconds after turn_end for idle recap (default 45)
29
29
  * --recap-focus-min-seconds <n> Min focus-out duration to show a recap (default 3)
30
30
  * --recap-disable-focus Disable DECSET ?1004 focus reporting
31
+ * --recap-during-active Allow focus recaps while an agent turn is still running
31
32
  * --recap-disable Disable the automatic recap entirely
32
33
  * --recap-model <p/id> Override the default (active) model
33
34
  *
@@ -227,9 +228,11 @@ async function generateRecap(
227
228
  apiKey: auth.apiKey,
228
229
  headers: auth.headers,
229
230
  signal,
230
- // Only request reasoning on reasoning-capable models. Non-reasoning
231
- // models ignore unknown params but we keep this clean.
232
- ...(model.reasoning ? { reasoning: "minimal" as const } : {}),
231
+ // Recaps are tiny, throwaway UI hints. Do not pay to create/read prompt
232
+ // cache entries, and do not spend reasoning tokens. Claude Code's away
233
+ // summary path likewise disables thinking for this job.
234
+ cacheRetention: "none",
235
+ maxTokens: 256,
233
236
  },
234
237
  );
235
238
 
@@ -273,6 +276,11 @@ export default function (pi: ExtensionAPI) {
273
276
  type: "boolean",
274
277
  default: false,
275
278
  });
279
+ pi.registerFlag("recap-during-active", {
280
+ description: "Allow focus-triggered recaps while an agent turn is still running",
281
+ type: "boolean",
282
+ default: false,
283
+ });
276
284
  pi.registerFlag("recap-disable", {
277
285
  description: "Disable the automatic session recap",
278
286
  type: "boolean",
@@ -293,6 +301,14 @@ export default function (pi: ExtensionAPI) {
293
301
  let activeController: AbortController | undefined;
294
302
  let activeReason: RecapReason | undefined;
295
303
 
304
+ // Agent activity state. Claude Code's away recap deliberately avoids
305
+ // generating while a turn is still loading: if the user blurs during a slow
306
+ // tool call, it marks the recap as pending and waits for loading to finish.
307
+ // Mirroring that here prevents a draft from summarising the pre-result
308
+ // branch, then being replaced when the slow action finally commits output.
309
+ let agentActive = false;
310
+ let focusDraftAfterAgent = false;
311
+
296
312
  // Focus reporting state.
297
313
  let focusListener: ((chunk: Buffer) => void) | undefined;
298
314
  let focusEnabled = false;
@@ -304,17 +320,18 @@ export default function (pi: ExtensionAPI) {
304
320
  let lastDraftedLeafId: string | undefined;
305
321
 
306
322
  const idleMs = (): number => {
307
- const n = Number(pi.getFlag("--recap-idle-seconds") ?? DEFAULT_IDLE_SECONDS);
323
+ const n = Number(pi.getFlag("recap-idle-seconds") ?? DEFAULT_IDLE_SECONDS);
308
324
  return Math.max(5, Number.isFinite(n) ? n : DEFAULT_IDLE_SECONDS) * 1000;
309
325
  };
310
326
  const focusMinMs = (): number => {
311
- const n = Number(pi.getFlag("--recap-focus-min-seconds") ?? DEFAULT_FOCUS_MIN_SECONDS);
327
+ const n = Number(pi.getFlag("recap-focus-min-seconds") ?? DEFAULT_FOCUS_MIN_SECONDS);
312
328
  return Math.max(0, Number.isFinite(n) ? n : DEFAULT_FOCUS_MIN_SECONDS) * 1000;
313
329
  };
314
- const isDisabled = (): boolean => Boolean(pi.getFlag("--recap-disable"));
315
- const isFocusDisabled = (): boolean => Boolean(pi.getFlag("--recap-disable-focus"));
330
+ const isDisabled = (): boolean => Boolean(pi.getFlag("recap-disable"));
331
+ const isFocusDisabled = (): boolean => Boolean(pi.getFlag("recap-disable-focus"));
332
+ const allowDuringActive = (): boolean => Boolean(pi.getFlag("recap-during-active"));
316
333
  const modelOverride = (): string | undefined => {
317
- const v = String(pi.getFlag("--recap-model") ?? "").trim();
334
+ const v = String(pi.getFlag("recap-model") ?? "").trim();
318
335
  return v.length > 0 ? v : undefined;
319
336
  };
320
337
 
@@ -406,10 +423,27 @@ export default function (pi: ExtensionAPI) {
406
423
 
407
424
  // --- focus reporting wiring -------------------------------------------
408
425
 
426
+ const maybeGenerateDeferredFocusRecap = (ctx: ExtensionContext) => {
427
+ if (!focusDraftAfterAgent) return;
428
+ if (focusedOutAt === undefined) return;
429
+ if (agentActive) return;
430
+ focusDraftAfterAgent = false;
431
+ void generateAndShow(ctx, { reason: "focus" });
432
+ };
433
+
409
434
  const handleFocusOut = (ctx: ExtensionContext) => {
410
435
  focusedOutAt = Date.now();
411
436
  if (isDisabled() || activeController) return;
412
437
 
438
+ // If the user switches away during a long-running model/tool action, do
439
+ // not draft against the half-written branch. Claude Code sets a pending
440
+ // bit in this case and generates only after `isLoading` flips false; we do
441
+ // the same via agent_start/agent_end.
442
+ if (agentActive && !allowDuringActive()) {
443
+ focusDraftAfterAgent = true;
444
+ return;
445
+ }
446
+
413
447
  // Skip regen if we already have a fresh recap for the current session
414
448
  // state — regardless of whether it's still parked in pendingRecap or
415
449
  // already shown in the widget. The stamp is invalidated on any new
@@ -425,6 +459,7 @@ export default function (pi: ExtensionAPI) {
425
459
  const handleFocusIn = (ctx: ExtensionContext) => {
426
460
  const outAt = focusedOutAt;
427
461
  focusedOutAt = undefined;
462
+ focusDraftAfterAgent = false;
428
463
  if (outAt === undefined) return; // spurious focus-in before we saw focus-out
429
464
  const duration = Date.now() - outAt;
430
465
  if (duration < focusMinMs()) {
@@ -509,6 +544,7 @@ export default function (pi: ExtensionAPI) {
509
544
  focusEnabled = false;
510
545
  }
511
546
  focusedOutAt = undefined;
547
+ focusDraftAfterAgent = false;
512
548
  pendingRecap = undefined;
513
549
  };
514
550
 
@@ -519,23 +555,30 @@ export default function (pi: ExtensionAPI) {
519
555
  // A new turn (successful or not) invalidates any prior draft.
520
556
  lastDraftedLeafId = undefined;
521
557
  scheduleRecap(ctx);
558
+ maybeGenerateDeferredFocusRecap(ctx);
522
559
  });
523
560
 
524
561
  pi.on("turn_start", async () => {
525
562
  // Another turn is starting in the same agent loop — clear the idle timer
526
563
  // we armed on the previous turn_end; it'll re-arm on the next turn_end.
564
+ // It also means any focus/idle draft racing in the background is stale.
527
565
  clearTimer();
566
+ cancelActive();
567
+ pendingRecap = undefined;
568
+ lastDraftedLeafId = undefined;
528
569
  });
529
570
 
530
571
  pi.on("input", async (_event, ctx) => {
531
572
  clearTimer();
532
573
  cancelActive();
574
+ focusDraftAfterAgent = false;
533
575
  pendingRecap = undefined;
534
576
  lastDraftedLeafId = undefined;
535
577
  clearRecap(ctx);
536
578
  });
537
579
 
538
580
  pi.on("agent_start", async (_event, ctx) => {
581
+ agentActive = true;
539
582
  clearTimer();
540
583
  cancelActive();
541
584
  pendingRecap = undefined;
@@ -543,7 +586,14 @@ export default function (pi: ExtensionAPI) {
543
586
  clearRecap(ctx);
544
587
  });
545
588
 
589
+ pi.on("agent_end", async (_event, ctx) => {
590
+ agentActive = false;
591
+ maybeGenerateDeferredFocusRecap(ctx);
592
+ });
593
+
546
594
  pi.on("session_shutdown", async () => {
595
+ agentActive = false;
596
+ focusDraftAfterAgent = false;
547
597
  clearTimer();
548
598
  cancelActive();
549
599
  detachFocusReporting();
@@ -552,7 +602,7 @@ export default function (pi: ExtensionAPI) {
552
602
  // Session start: wire up focus reporting; on resume, show a recap.
553
603
  pi.on("session_start", async (event, ctx) => {
554
604
  attachFocusReporting(ctx);
555
- if (isDisabled()) return;
605
+ if (isDisabled() || !ctx.hasUI) return;
556
606
  if (event.reason === "resume" || event.reason === "fork") {
557
607
  setTimeout(() => {
558
608
  void generateAndShow(ctx, { reason: "resume" });
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@tmustier/pi-session-recap",
3
- "version": "0.1.2",
3
+ "version": "0.1.3",
4
4
  "description": "One-line recap above the editor when you refocus a Pi session. Keeps you in flow when multi-agenting.",
5
5
  "license": "MIT",
6
6
  "author": "Thomas Mustier",