@tintinweb/pi-subagents 0.6.3 → 0.7.0

package/CHANGELOG.md CHANGED
@@ -7,6 +7,28 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
  ## [Unreleased]
 
+ ## [0.7.0] - 2026-05-04
+
+ > **Heads-up — behavior changes:**
+ > - `subagents:completed`/`failed` event `tokens.total` now excludes `cacheRead` (previously double-counted across turns) — see Fixed [#38].
+ > - Cron `?` is now a wildcard (same as `*`), not "current time value" — affects Quartz-style expressions only.
+
+ ### Changed
+ - **`@mariozechner/pi-{ai,coding-agent,tui}` moved to `peerDependencies` (`>=0.70.5`).** Avoids duplicate framework instances when the host loads this extension.
+ - **`@sinclair/typebox` pinned from `latest` to `^0.34.49`** so installs are reproducible.
+ - **`croner` bumped 8 → 10.** Heads-up: in cron strings, `?` now means wildcard (same as `*`) instead of "current time value" — affects Quartz-style expressions only.
+
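The `?` change only affects cron strings that actually contain a `?`. As a minimal sketch for auditing stored expressions, the helpers below locate such fields and rewrite a bare `?` as `*`, which the changelog describes as equivalent under croner 10. Both function names are hypothetical and are not part of this package or of croner; fields are assumed to be whitespace-separated.

```typescript
// Hypothetical audit helpers (not part of pi-subagents or croner).

// Returns the zero-based indexes of the fields that contain "?".
function findQuestionMarkFields(expr: string): number[] {
  return expr
    .trim()
    .split(/\s+/)
    .map((field, index) => (field.includes("?") ? index : -1))
    .filter((index) => index >= 0);
}

// Rewrites fields that are exactly "?" as "*", making the new meaning explicit.
function toExplicitWildcards(expr: string): string {
  return expr
    .trim()
    .split(/\s+/)
    .map((field) => (field === "?" ? "*" : field))
    .join(" ");
}
```

An expression that relied on the old "current time value" meaning still needs a manual rewrite; these helpers only flag it.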
+ ### Added
+ - **Master switch for scheduling** — new `schedulingEnabled` setting (default `true`) under `/agents → Settings → Scheduling`. When set to `false`: the `schedule` parameter and its guideline are stripped from the `Agent` tool spec at registration (zero LLM-context cost), the scheduler does not bind to the session, the `/agents → Scheduled jobs` menu entry is hidden, and any in-flight scheduler is stopped immediately. The schema-level removal applies on next pi session; the runtime kill (menu, fire path) takes effect immediately. Persisted at `<cwd>/.pi/subagents.json`.
+ - **Schedule subagent spawns** — the `Agent` tool now accepts an optional `schedule` parameter. When set, the spawn registers a job that fires later instead of running immediately. Three formats: 6-field cron (`"0 0 9 * * 1"` — 9am every Monday), interval (`"5m"`, `"1h"`), or one-shot (`"+10m"` or ISO timestamp). Returns the job ID. Schedules are session-scoped — they reset on `/new`, restore on `/resume` (mirrors the persistence model of pi-chonky-tasks). Storage at `<cwd>/.pi/subagent-schedules/<sessionId>.json`, with PID-based file locking + atomic temp+rename for concurrent-instance safety. **Result delivery is identical to today's background-spawn completions**: when the scheduled agent finishes, the existing `subagent-notification` followUp path emits the result to the conversation — no new delivery code, no new message types. **Concurrency**: scheduled fires bypass `maxConcurrent` so a 5-minute interval can't be deferred behind 4 long-running manual agents. **Management**: `/agents` → "Scheduled jobs" lists active jobs and lets you cancel any one of them. Creation is via the `Agent` tool only — no parallel manual-create wizard in this iteration. **Events**: `subagents:scheduled` ({ type: "added" | "removed" | "updated" | "fired" | "error", … }) and `subagents:scheduler_ready` for cross-extension consumers. **Restrictions**: `schedule` is incompatible with `inherit_context` (no parent at fire time) and `resume` (schedules create fresh agents); forces `run_in_background: true`. Scheduler engine mirrors `pi-cron-schedule` (`croner` for cron, `setInterval`/`setTimeout` for interval/once); past one-shot timestamps and invalid cron expressions are caught at create time.
+ - **Context-window utilization indicator in the subagent overlay** — token count is now followed by a colored `(NN%)` showing how full the subagent's context is right now (`estimateContextTokens(messages) / model.contextWindow * 100`, sourced from upstream `contextUsage.percent`). Threshold colors: <70% dim, 70–85% warning, ≥85% error. Gracefully omitted when the model has no `contextWindow` declared, or right after compaction before the next assistant turn (`tokens` is `null` in that window). The same annotation slot also surfaces a compaction count `↻N` when the agent has compacted at least once — e.g. `12.3k token (84% · ↻3)` (percent + compactions joined with `·`), `12.3k token (↻1)` (compactions only, immediately post-compaction while percent is still null). The compaction glyph stays dim regardless; the percent's threshold color carries the urgency signal. Two live overlays get the annotations (running stats line; inspect-overlay header); post-completion notifications and result/event payloads only get the count (the indicator is no longer actionable once the agent is done).
+ - **Token usage and context% exposed to the parent agent** at every interaction surface — `get_subagent_result` adds `Context: NN%` to its stats line; `steer_subagent` returns a `Current state: 12.3k token · 5 tool uses · context 72% full` line so the steering agent knows whether it has room before sending more context; `task-notification` XML adds `<context_percent>NN</context_percent>` (omitted when null). All plain-text, no ANSI codes — designed for LLM consumption, not human display.
+ - **New `subagents:compacted` lifecycle event** fires when a subagent's session successfully compacts. Payload: `{ id, type, description, reason: "manual" | "threshold" | "overflow", tokensBefore, compactionCount }` — `tokensBefore` is upstream's pre-compaction context size estimate; `compactionCount` is the running total for this agent (also persisted on `AgentRecord.compactionCount` and surfaced in `get_subagent_result` / `steer_subagent` / `task-notification` when > 0). Aborted compactions don't fire. Routed through a new manager-level `onCompact` constructor callback, matching the existing `onStart` / `onComplete` pattern.
+
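For cross-extension consumers, the `subagents:compacted` payload described above can be typed and tracked roughly as follows. This is a sketch under assumptions: the field types (`string` for `id`, `type`, `description`) are inferred from the changelog rather than from published typings, and `TinyBus` is a stand-in written for this example because the subscription signature of `pi.events` is not shown in this diff.

```typescript
// Payload shape as listed in the changelog; field types are assumed.
type CompactedPayload = {
  id: string;
  type: string;
  description: string;
  reason: "manual" | "threshold" | "overflow";
  tokensBefore: number;
  compactionCount: number;
};

type Handler = (payload: unknown) => void;

// Stand-in event bus for this example only (NOT the real pi.events API).
class TinyBus {
  private handlers = new Map<string, Handler[]>();

  on(event: string, handler: Handler): void {
    const list = this.handlers.get(event) ?? [];
    list.push(handler);
    this.handlers.set(event, list);
  }

  emit(event: string, payload: unknown): void {
    for (const handler of this.handlers.get(event) ?? []) handler(payload);
  }
}

// Latest known compaction count per agent id.
const compactionsByAgent = new Map<string, number>();

function trackCompactions(bus: TinyBus): void {
  bus.on("subagents:compacted", (payload) => {
    const p = payload as CompactedPayload;
    // compactionCount is already a running total, so store it instead of incrementing.
    compactionsByAgent.set(p.id, p.compactionCount);
  });
}
```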
+ ### Fixed
+ - **Subagent token count was inflated 5–15× and reset mid-run** ([#38](https://github.com/tintinweb/pi-subagents/issues/38)). Two distinct bugs in the same field. (1) Upstream `getSessionStats().tokens.total` sums per-turn `cacheRead` across every assistant message — but each turn's `cacheRead` is the *cumulative* cached prefix re-read on that one API call, so summing N turns counts the prefix N times (quadratic inflation, very visible on long sessions). (2) Even with that fixed, anything derived from `session.state.messages` resets at compaction because upstream replaces the array via `this.agent.state.messages = sessionContext.messages`. Fix replaces all six display readers with a lifetime accumulator (`AgentRecord.lifetimeUsage` and `AgentActivity.lifetimeUsage` — `{ input, output, cacheWrite }`) fed by a new `onAssistantUsage` callback dispatched from `message_end` events in both `runAgent` and `resumeAgent`. The accumulator is independent of `state.messages` mutation, so it survives compaction; total = input + output + cacheWrite by construction (cacheRead deliberately excluded — same prefix-double-counting reason). The `subagents:completed`/`failed` event payload's `tokens` field is now also lifetime-accumulated for `input`, `output`, and `total` together (was: `total` lifetime, `input`/`output` session-derived → inconsistent after compaction).
+ - **ESC during a foreground `Agent` call now actually stops the subagent** ([#44](https://github.com/tintinweb/pi-subagents/pull/44) — thanks [@Zeng-Zer](https://github.com/Zeng-Zer)). Pi's interrupt path is `esc → agent.abort()` on the parent → `AbortSignal` delivered to every tool's `execute(toolCallId, params, signal, …)`, but the `Agent` tool dropped that signal on the floor: subagents ran on their own independent `AbortController` inside `AgentManager`, so the parent abort was invisible and the subagent kept running until natural completion or `max_turns`. Fix threads `signal` through `Agent.execute` → `manager.spawnAndWait()` → `SpawnOptions.signal`, and `AgentManager.startAgent()` now attaches an `{ once: true }` `"abort"` listener that calls `this.abort(id)` (which sets `status: "stopped"` and aborts the child controller). The listener is detached in both `.then` and `.catch` to avoid leaking on natural settle. **Scope:** foreground only — background agents intentionally outlive the parent tool call, so their spawn deliberately does not forward `signal`. Resume path (`AgentManager.resume()`) has the same blind spot and is tracked as a follow-up.
+
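The ESC fix follows a common pattern: forward a parent `AbortSignal` to a child `AbortController` through a one-shot listener, and detach the listener once the child settles on its own. A minimal standalone sketch of that pattern is below. `linkAbort` is a hypothetical name, and the early check for an already-aborted parent is an addition for this example that the diff does not show.

```typescript
// Forwards a parent abort to a child controller; returns a function that detaches the listener.
function linkAbort(parent: AbortSignal, child: AbortController): () => void {
  // Assumption for this sketch: handle a parent that was aborted before linking.
  if (parent.aborted) {
    child.abort();
    return () => {};
  }
  const onParentAbort = () => child.abort();
  // { once: true } removes the listener automatically after the first abort event.
  parent.addEventListener("abort", onParentAbort, { once: true });
  // Call this on natural completion so the parent signal does not keep a stale listener.
  return () => parent.removeEventListener("abort", onParentAbort);
}
```

Calling the returned function in both the success and the failure path mirrors the `.then` / `.catch` detach described in the changelog entry.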
  ## [0.6.3] - 2026-04-28
 
  ### Fixed
package/README.md CHANGED
@@ -28,8 +28,9 @@ https://github.com/user-attachments/assets/8685261b-9338-4fea-8dfe-1c590d5df543
  - **Skill preloading** — inject named skill files from `.pi/skills/` into agent system prompts
  - **Tool denylist** — block specific tools via `disallowed_tools` frontmatter
  - **Styled completion notifications** — background agent results render as themed, compact notification boxes (icon, stats, result preview) instead of raw XML. Expandable to show full output. Group completions render each agent individually
- - **Event bus** — lifecycle events (`subagents:created`, `started`, `completed`, `failed`, `steered`) emitted via `pi.events`, enabling other extensions to react to sub-agent activity
+ - **Event bus** — lifecycle events (`subagents:created`, `started`, `completed`, `failed`, `steered`, `compacted`) emitted via `pi.events`, enabling other extensions to react to sub-agent activity
  - **Cross-extension RPC** — other pi extensions can spawn and stop subagents via the `pi.events` event bus (`subagents:rpc:ping`, `subagents:rpc:spawn`, `subagents:rpc:stop`). Standardized reply envelopes with protocol versioning. Emits `subagents:ready` on load
+ - **Schedule subagents** — pass `schedule` to the `Agent` tool to fire on cron / interval / one-shot. Session-scoped jobs with PID-locked persistence; results land via the same `subagent-notification` followUp path as manual background completions; manage via `/agents → Scheduled jobs`
 
  ## Install
 
@@ -58,29 +59,67 @@ Agent({
 
  Foreground agents block until complete and return results inline. Background agents return an ID immediately and notify you on completion.
 
+ ### Scheduling
+
+ Add a `schedule` field to register the agent to fire later instead of running now:
+
+ ```
+ Agent({
+   subagent_type: "Explore",
+   prompt: "Look at recent commits and summarize what changed since last week",
+   description: "Weekly commit review",
+   schedule: "0 0 9 * * 1", // 9am every Monday (6-field cron)
+ })
+ ```
+
+ Schedule formats:
+
+ - **Cron** — 6-field (`second minute hour day-of-month month day-of-week`), e.g. `"0 0 9 * * 1"` for 9am every Monday, `"0 */15 * * * *"` for every 15 minutes.
+ - **Interval** — `"5m"`, `"1h"`, `"30s"`, `"2d"`. Fires repeatedly at that interval.
+ - **One-shot relative** — `"+10m"`, `"+2h"`, `"+1d"`. Fires once at that future time.
+ - **One-shot absolute** — full ISO timestamp, e.g. `"2026-12-25T09:00:00.000Z"`.
+
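The four formats listed above can be told apart by shape alone. The classifier below is a rough sketch of that idea, not the package's parser: `classifySchedule` and `durationToMs` are hypothetical names, the unit set `s`/`m`/`h`/`d` is taken from the README examples, and the cron branch only counts fields without validating them.

```typescript
type ScheduleKind = "cron" | "interval" | "once-relative" | "once-absolute" | "invalid";

// A positive integer followed by one unit letter, e.g. "5m" or "2d".
const DURATION = /^(\d+)([smhd])$/;

function classifySchedule(schedule: string): ScheduleKind {
  const value = schedule.trim();
  if (value.startsWith("+")) {
    return DURATION.test(value.slice(1)) ? "once-relative" : "invalid";
  }
  if (DURATION.test(value)) return "interval";
  // Six whitespace-separated fields: second minute hour day-of-month month day-of-week.
  if (value.split(/\s+/).length === 6) return "cron";
  if (/^\d{4}-\d{2}-\d{2}T/.test(value) && !Number.isNaN(Date.parse(value))) {
    return "once-absolute";
  }
  return "invalid";
}

// Converts "5m"-style durations to milliseconds; returns null for anything else.
function durationToMs(duration: string): number | null {
  const match = DURATION.exec(duration);
  if (!match) return null;
  const unitMs = { s: 1_000, m: 60_000, h: 3_600_000, d: 86_400_000 };
  return Number(match[1]) * unitMs[match[2] as "s" | "m" | "h" | "d"];
}
```

A real implementation would also reject past one-shot timestamps and invalid cron expressions at create time, as the changelog states the scheduler does.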
+ When a schedule fires, the spawn runs in background and its completion notification arrives in the conversation through the same `subagent-notification` followUp path as a manually-spawned background agent — your parent agent reasons about the result the same way.
+
+ Schedules are **session-scoped**: they reset on `/new` and restore on `/resume`. List and cancel via `/agents → Scheduled jobs` (creation is the `Agent` tool's job — there is no parallel manual-create wizard). Storage at `<cwd>/.pi/subagent-schedules/<sessionId>.json` with PID-based file locking for cross-instance safety.
+
+ **Disable the feature entirely**: `/agents → Settings → Scheduling → disabled` removes `schedule` from the `Agent` tool spec (no LLM-context cost), hides the menu entry, and stops any active scheduler. The schema-level removal takes effect on the next pi session; the runtime kill is immediate. Re-enable from the same menu.
+
+ Restrictions:
+ - `schedule` cannot be combined with `inherit_context` (no parent conversation exists at fire time) or `resume` (schedules create fresh agents).
+ - `run_in_background` is forced to `true`.
+ - Scheduled fires bypass the `maxConcurrent` queue so a 5-minute interval cannot be deferred behind long-running manual agents.
+ - **Headless `pi -p` doesn't wait for scheduled subagents.**
+
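The first two restrictions amount to a small validation step before a job is registered. The sketch below shows one way to express them. `SpawnParams` and `applyScheduleRestrictions` are hypothetical, and treating `resume` as a string agent ID is an assumption; only the parameter names come from the README.

```typescript
// Subset of Agent tool parameters relevant to scheduling (types assumed).
type SpawnParams = {
  schedule?: string;
  inherit_context?: boolean;
  resume?: string; // assumed to be an agent ID
  run_in_background?: boolean;
};

function applyScheduleRestrictions(params: SpawnParams): SpawnParams {
  // Unscheduled spawns are left untouched.
  if (params.schedule === undefined) return params;
  if (params.inherit_context) {
    throw new Error("schedule cannot be combined with inherit_context");
  }
  if (params.resume !== undefined) {
    throw new Error("schedule cannot be combined with resume");
  }
  // Scheduled spawns always run in the background, whatever the caller passed.
  return { ...params, run_in_background: true };
}
```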
  ## UI
 
  The extension renders a persistent widget above the editor showing all active agents:
 
  ```
  ● Agents
- ├─ ⠹ Agent Refactor auth module · ⟳5≤30 · 5 tool uses · 33.8k token · 12.3s
+ ├─ ⠹ Agent Refactor auth module · ⟳5≤30 · 5 tool uses · 33.8k token (62%) · 12.3s
  │ ⎿ editing 2 files…
- ├─ ⠹ Explore Find auth files · ⟳3 · 3 tool uses · 12.4k token · 4.1s
+ ├─ ⠹ Explore Find auth files · ⟳3 · 3 tool uses · 12.4k token (8%) · 4.1s
  │ ⎿ searching…
+ ├─ ⠹ Agent Long-running task · ⟳42 · 38 tool uses · 91.0k token (84% · ↻2) · 2m17s
+ │ ⎿ reading…
  └─ 2 queued
  ```
 
+ The token field is annotated with two optional signals inside parens:
+ - **`NN%`** — context-window utilization (color-coded: <70% dim, 70–85% warning, ≥85% error). Omitted when the model has no declared `contextWindow`, or briefly right after compaction.
+ - **`↻N`** — number of times the session has compacted, when > 0. Stays dim; the percent's color carries urgency.
+
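The annotation rules above can be captured in two small functions. This is an illustrative sketch, not the widget's rendering code: the function names are hypothetical, the severity labels stand in for whatever theme colors the UI uses, and rounding the percentage to an integer is an assumption.

```typescript
type Severity = "dim" | "warning" | "error";

// Thresholds from the README: <70% dim, 70–85% warning, ≥85% error.
function contextSeverity(percent: number): Severity {
  if (percent >= 85) return "error";
  if (percent >= 70) return "warning";
  return "dim";
}

// Builds the parenthesized suffix that follows the token count.
function formatTokenAnnotation(percent: number | null, compactions: number): string {
  const parts: string[] = [];
  // percent is null when no contextWindow is declared or right after compaction.
  if (percent !== null) parts.push(`${Math.round(percent)}%`);
  if (compactions > 0) parts.push(`↻${compactions}`);
  return parts.length > 0 ? ` (${parts.join(" · ")})` : "";
}
```

With these rules, `formatTokenAnnotation(84, 3)` yields ` (84% · ↻3)` and `formatTokenAnnotation(null, 1)` yields ` (↻1)`, matching the two example strings in the changelog.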
  Individual agent results render Claude Code-style in the conversation:
 
  | State | Example |
  |-------|---------|
- | **Running** | `⠹ ⟳3≤30 · 3 tool uses · 12.4k token` / `⎿ searching, reading 3 files…` |
- | **Completed** | `✓ ⟳8 · 5 tool uses · 33.8k token · 12.3s` / `⎿ Done` |
- | **Wrapped up** | `✓ ⟳50≤50 · 50 tool uses · 89.1k token · 45.2s` / `⎿ Wrapped up (turn limit)` |
- | **Stopped** | `■ ⟳3 · 3 tool uses · 12.4k token` / `⎿ Stopped` |
- | **Error** | `✗ ⟳3 · 3 tool uses · 12.4k token` / `⎿ Error: timeout` |
- | **Aborted** | `✗ ⟳55≤50 · 55 tool uses · 102.3k token` / `⎿ Aborted (max turns exceeded)` |
+ | **Running** | `⠹ ⟳3≤30 · 3 tool uses · 12.4k token (8%)` / `⎿ searching, reading 3 files…` |
+ | **Completed** | `✓ ⟳8 · 5 tool uses · 33.8k token (62%) · 12.3s` / `⎿ Done` |
+ | **Wrapped up** | `✓ ⟳50≤50 · 50 tool uses · 89.1k token (84% · ↻2) · 45.2s` / `⎿ Wrapped up (turn limit)` |
+ | **Stopped** | `■ ⟳3 · 3 tool uses · 12.4k token (8%)` / `⎿ Stopped` |
+ | **Error** | `✗ ⟳3 · 3 tool uses · 12.4k token (8%)` / `⎿ Error: timeout` |
+ | **Aborted** | `✗ ⟳55≤50 · 55 tool uses · 102.3k token (95% · ↻3)` / `⎿ Aborted (max turns exceeded)` |
 
  Completed results can be expanded (ctrl+o in pi) to show the full agent output inline.
 
@@ -304,13 +343,18 @@ Agent lifecycle events are emitted via `pi.events.emit()` so other extensions ca
  |-------|------|------------|
  | `subagents:created` | Background agent registered | `id`, `type`, `description`, `isBackground` |
  | `subagents:started` | Agent transitions to running (including queued→running) | `id`, `type`, `description` |
- | `subagents:completed` | Agent finished successfully | `id`, `type`, `durationMs`, `tokens`, `toolUses`, `result` |
+ | `subagents:completed` | Agent finished successfully | `id`, `type`, `durationMs`, `tokens` (lifetime `{ input, output, total }`), `toolUses`, `result` |
  | `subagents:failed` | Agent errored, stopped, or aborted | same as completed + `error`, `status` |
  | `subagents:steered` | Steering message sent | `id`, `message` |
+ | `subagents:compacted` | Agent's session successfully compacted | `id`, `type`, `description`, `reason` (`"manual"` / `"threshold"` / `"overflow"`), `tokensBefore`, `compactionCount` |
+ | `subagents:scheduled` | Schedule lifecycle change | `{ type: "added" \| "removed" \| "updated" \| "fired" \| "error", … }` (job/agentId/error fields per type) |
+ | `subagents:scheduler_ready` | Scheduler bound to session, enabled jobs armed | `sessionId`, `jobCount` |
  | `subagents:ready` | Extension loaded and RPC handlers registered | — |
  | `subagents:settings_loaded` | Persisted settings applied at extension init | `settings` (merged global + project) |
  | `subagents:settings_changed` | `/agents` → Settings mutation was applied | `settings`, `persisted` (`boolean` — `false` on write failure) |
 
+ `tokens.total` = `input + output + cacheWrite`. `cacheRead` is excluded — each turn's `cacheRead` is the cumulative cached prefix re-read on that one API call, so summing per-message would over-count it. Use `contextUsage.percent` (surfaced as `(NN%)` in the widget) for current context size.
+
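The over-count is easiest to see with numbers. The figures below are invented for illustration and do not come from a real session: four turns, each re-reading a cached prefix that grows by 300 tokens per turn. `lifetimeTotal` follows the `input + output + cacheWrite` definition above, while `naiveTotal` also sums `cacheRead` the way the pre-0.7.0 total did according to the changelog.

```typescript
type TurnUsage = { input: number; output: number; cacheWrite: number; cacheRead: number };

// Invented per-turn usage; cacheRead is the whole cached prefix re-read on that call.
const turns: TurnUsage[] = [
  { input: 100, output: 200, cacheWrite: 1000, cacheRead: 0 },
  { input: 100, output: 200, cacheWrite: 300, cacheRead: 1000 },
  { input: 100, output: 200, cacheWrite: 300, cacheRead: 1300 },
  { input: 100, output: 200, cacheWrite: 300, cacheRead: 1600 },
];

// 0.7.0 definition: cacheRead is deliberately left out.
function lifetimeTotal(usage: TurnUsage[]): number {
  return usage.reduce((sum, t) => sum + t.input + t.output + t.cacheWrite, 0);
}

// Summing cacheRead as well counts the same prefix once per turn.
function naiveTotal(usage: TurnUsage[]): number {
  return usage.reduce((sum, t) => sum + t.input + t.output + t.cacheWrite + t.cacheRead, 0);
}
```

Here `lifetimeTotal(turns)` is 3100 and `naiveTotal(turns)` is 7000, and the gap widens with every additional turn because each new `cacheRead` contains all earlier cache writes.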
  ## Cross-Extension RPC
 
  Other pi extensions can spawn and stop subagents programmatically via the `pi.events` event bus, without importing this package directly.
@@ -11,6 +11,11 @@ import { type ToolActivity } from "./agent-runner.js";
  import type { AgentRecord, IsolationMode, SubagentType, ThinkingLevel } from "./types.js";
  export type OnAgentComplete = (record: AgentRecord) => void;
  export type OnAgentStart = (record: AgentRecord) => void;
+ export type OnAgentCompact = (record: AgentRecord, info: CompactionInfo) => void;
+ export type CompactionInfo = {
+     reason: "manual" | "threshold" | "overflow";
+     tokensBefore: number;
+ };
  interface SpawnOptions {
      description: string;
      model?: Model<any>;
@@ -19,8 +24,16 @@ interface SpawnOptions {
      inheritContext?: boolean;
      thinkingLevel?: ThinkingLevel;
      isBackground?: boolean;
+     /**
+      * Skip the maxConcurrent queue check for this spawn — start immediately even
+      * if the configured concurrency limit would otherwise queue it. Used by the
+      * scheduler so a fired job can't be deferred past its trigger window.
+      */
+     bypassQueue?: boolean;
      /** Isolation mode — "worktree" creates a temp git worktree for the agent. */
      isolation?: IsolationMode;
+     /** Parent abort signal — when aborted, the subagent is also stopped. */
+     signal?: AbortSignal;
      /** Called on tool start/end with activity info (for streaming progress to UI). */
      onToolActivity?: (activity: ToolActivity) => void;
      /** Called on streaming text deltas from the assistant response. */
@@ -29,18 +42,27 @@ interface SpawnOptions {
      onSessionCreated?: (session: AgentSession) => void;
      /** Called at the end of each agentic turn with the cumulative count. */
      onTurnEnd?: (turnCount: number) => void;
+     /** Called once per assistant message_end with that message's usage delta. */
+     onAssistantUsage?: (usage: {
+         input: number;
+         output: number;
+         cacheWrite: number;
+     }) => void;
+     /** Called when the session successfully compacts. */
+     onCompaction?: (info: CompactionInfo) => void;
  }
  export declare class AgentManager {
      private agents;
      private cleanupInterval;
      private onComplete?;
      private onStart?;
+     private onCompact?;
      private maxConcurrent;
      /** Queue of background agents waiting to start. */
      private queue;
      /** Number of currently running background agents. */
      private runningBackground;
-     constructor(onComplete?: OnAgentComplete, maxConcurrent?: number, onStart?: OnAgentStart);
+     constructor(onComplete?: OnAgentComplete, maxConcurrent?: number, onStart?: OnAgentStart, onCompact?: OnAgentCompact);
      /** Update the max concurrent background agents limit. */
      setMaxConcurrent(n: number): void;
      getMaxConcurrent(): number;
@@ -7,6 +7,7 @@
  */
  import { randomUUID } from "node:crypto";
  import { resumeAgent, runAgent } from "./agent-runner.js";
+ import { addUsage } from "./usage.js";
  import { cleanupWorktree, createWorktree, pruneWorktrees, } from "./worktree.js";
  /** Default max concurrent background agents. */
  const DEFAULT_MAX_CONCURRENT = 4;
@@ -15,14 +16,16 @@ export class AgentManager {
      cleanupInterval;
      onComplete;
      onStart;
+     onCompact;
      maxConcurrent;
      /** Queue of background agents waiting to start. */
      queue = [];
      /** Number of currently running background agents. */
      runningBackground = 0;
-     constructor(onComplete, maxConcurrent = DEFAULT_MAX_CONCURRENT, onStart) {
+     constructor(onComplete, maxConcurrent = DEFAULT_MAX_CONCURRENT, onStart, onCompact) {
          this.onComplete = onComplete;
          this.onStart = onStart;
+         this.onCompact = onCompact;
          this.maxConcurrent = maxConcurrent;
          // Cleanup completed agents after 10 minutes (but keep sessions for resume)
          this.cleanupInterval = setInterval(() => this.cleanup(), 60_000);
@@ -51,10 +54,12 @@ export class AgentManager {
              toolUses: 0,
              startedAt: Date.now(),
              abortController,
+             lifetimeUsage: { input: 0, output: 0, cacheWrite: 0 },
+             compactionCount: 0,
          };
          this.agents.set(id, record);
          const args = { pi, ctx, type, prompt, options };
-         if (options.isBackground && this.runningBackground >= this.maxConcurrent) {
+         if (options.isBackground && !options.bypassQueue && this.runningBackground >= this.maxConcurrent) {
              // Queue it — will be started when a running agent completes
              this.queue.push({ id, args });
              return id;
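The changed condition reduces to a three-part predicate. The standalone restatement below makes the effect of `bypassQueue` easy to check in isolation. `shouldQueue` is a hypothetical helper written for this note; the package keeps the check inline.

```typescript
type QueueOptions = { isBackground?: boolean; bypassQueue?: boolean };

// True when a spawn has to wait in the queue instead of starting immediately.
function shouldQueue(options: QueueOptions, runningBackground: number, maxConcurrent: number): boolean {
  return Boolean(options.isBackground) && !options.bypassQueue && runningBackground >= maxConcurrent;
}
```

With four background agents running and a limit of four, a manual background spawn is queued, while a spawn marked `bypassQueue: true` (the scheduler's case, per the `SpawnOptions` doc comment) starts at once.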
@@ -69,6 +74,14 @@ export class AgentManager {
          if (options.isBackground)
              this.runningBackground++;
          this.onStart?.(record);
+         // Wire parent abort signal to stop the subagent when the parent is interrupted
+         let detachParentSignal;
+         if (options.signal) {
+             const onParentAbort = () => this.abort(id);
+             options.signal.addEventListener("abort", onParentAbort, { once: true });
+             detachParentSignal = () => options.signal.removeEventListener("abort", onParentAbort);
+         }
+         const detach = () => { detachParentSignal?.(); detachParentSignal = undefined; };
          // Worktree isolation: create a temporary git worktree if requested
          let worktreeCwd;
          let worktreeWarning = "";
@@ -100,6 +113,15 @@ export class AgentManager {
              },
              onTurnEnd: options.onTurnEnd,
              onTextDelta: options.onTextDelta,
+             onAssistantUsage: (usage) => {
+                 addUsage(record.lifetimeUsage, usage);
+                 options.onAssistantUsage?.(usage);
+             },
+             onCompaction: (info) => {
+                 record.compactionCount++;
+                 this.onCompact?.(record, info);
+                 options.onCompaction?.(info);
+             },
              onSessionCreated: (session) => {
                  record.session = session;
                  // Flush any steers that arrived before the session was ready
@@ -120,6 +142,7 @@ export class AgentManager {
              record.result = responseText;
              record.session = session;
              record.completedAt ??= Date.now();
+             detach();
              // Final flush of streaming output file
              if (record.outputCleanup) {
                  try {
@@ -151,6 +174,7 @@ export class AgentManager {
              }
              record.error = err instanceof Error ? err.message : String(err);
              record.completedAt ??= Date.now();
+             detach();
              // Final flush of streaming output file on error
              if (record.outputCleanup) {
                  try {
@@ -214,6 +238,13 @@ export class AgentManager {
                      if (activity.type === "end")
                          record.toolUses++;
                  },
+                 onAssistantUsage: (usage) => {
+                     addUsage(record.lifetimeUsage, usage);
+                 },
+                 onCompaction: (info) => {
+                     record.compactionCount++;
+                     this.onCompact?.(record, info);
+                 },
                  signal,
              });
              record.status = "completed";
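The body of `addUsage` (imported from `./usage.js`) is not part of this diff. Since both call sites discard its return value and pass `record.lifetimeUsage` as the first argument, it presumably adds the delta to that object in place. The version below is a guess at that shape and should not be read as the package's implementation; `totalTokens` is a hypothetical companion that applies the `input + output + cacheWrite` definition from the changelog.

```typescript
type LifetimeUsage = { input: number; output: number; cacheWrite: number };

// Assumed behavior: mutate the accumulator in place with one message's usage delta.
function addUsage(target: LifetimeUsage, delta: LifetimeUsage): void {
  target.input += delta.input;
  target.output += delta.output;
  target.cacheWrite += delta.cacheWrite;
}

// Hypothetical helper: lifetime total as defined in the changelog (no cacheRead).
function totalTokens(usage: LifetimeUsage): number {
  return usage.input + usage.output + usage.cacheWrite;
}
```

Because the accumulator lives on the record rather than being derived from `session.state.messages`, replacing that array during compaction does not reset it.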
@@ -38,6 +38,24 @@ export interface RunOptions {
      onSessionCreated?: (session: AgentSession) => void;
      /** Called at the end of each agentic turn with the cumulative count. */
      onTurnEnd?: (turnCount: number) => void;
+     /**
+      * Called once per assistant message_end with that message's usage delta.
+      * Lets callers maintain a lifetime accumulator that survives compaction
+      * (which replaces session.state.messages and resets stats-derived sums).
+      */
+     onAssistantUsage?: (usage: {
+         input: number;
+         output: number;
+         cacheWrite: number;
+     }) => void;
+     /**
+      * Called when the session successfully compacts. `tokensBefore` is upstream's
+      * pre-compaction context size estimate. Aborted compactions don't fire.
+      */
+     onCompaction?: (info: {
+         reason: "manual" | "threshold" | "overflow";
+         tokensBefore: number;
+     }) => void;
  }
  export interface RunResult {
      responseText: string;
@@ -53,6 +71,15 @@ export declare function runAgent(ctx: ExtensionContext, type: SubagentType, prom
  */
  export declare function resumeAgent(session: AgentSession, prompt: string, options?: {
      onToolActivity?: (activity: ToolActivity) => void;
+     onAssistantUsage?: (usage: {
+         input: number;
+         output: number;
+         cacheWrite: number;
+     }) => void;
+     onCompaction?: (info: {
+         reason: "manual" | "threshold" | "overflow";
+         tokensBefore: number;
+     }) => void;
      signal?: AbortSignal;
  }): Promise<string>;
  /**
@@ -261,6 +261,18 @@ export async function runAgent(ctx, type, prompt, options) {
          if (event.type === "tool_execution_end") {
              options.onToolActivity?.({ type: "end", toolName: event.toolName });
          }
+         if (event.type === "message_end" && event.message.role === "assistant") {
+             const u = event.message.usage;
+             if (u)
+                 options.onAssistantUsage?.({
+                     input: u.input ?? 0,
+                     output: u.output ?? 0,
+                     cacheWrite: u.cacheWrite ?? 0,
+                 });
+         }
+         if (event.type === "compaction_end" && !event.aborted && event.result) {
+             options.onCompaction?.({ reason: event.reason, tokensBefore: event.result.tokensBefore });
+         }
      });
      const collector = collectResponseText(session);
      const cleanupAbort = forwardAbortSignal(session, options.signal);
@@ -289,12 +301,24 @@ export async function runAgent(ctx, type, prompt, options) {
  export async function resumeAgent(session, prompt, options = {}) {
      const collector = collectResponseText(session);
      const cleanupAbort = forwardAbortSignal(session, options.signal);
-     const unsubToolUse = options.onToolActivity
+     const unsubEvents = (options.onToolActivity || options.onAssistantUsage || options.onCompaction)
          ? session.subscribe((event) => {
              if (event.type === "tool_execution_start")
-                 options.onToolActivity({ type: "start", toolName: event.toolName });
+                 options.onToolActivity?.({ type: "start", toolName: event.toolName });
              if (event.type === "tool_execution_end")
-                 options.onToolActivity({ type: "end", toolName: event.toolName });
+                 options.onToolActivity?.({ type: "end", toolName: event.toolName });
+             if (event.type === "message_end" && event.message.role === "assistant") {
+                 const u = event.message.usage;
+                 if (u)
+                     options.onAssistantUsage?.({
+                         input: u.input ?? 0,
+                         output: u.output ?? 0,
+                         cacheWrite: u.cacheWrite ?? 0,
+                     });
+             }
+             if (event.type === "compaction_end" && !event.aborted && event.result) {
+                 options.onCompaction?.({ reason: event.reason, tokensBefore: event.result.tokensBefore });
+             }
          })
          : () => { };
      try {
@@ -302,7 +326,7 @@ export async function resumeAgent(session, prompt, options = {}) {
      }
      finally {
          collector.unsubscribe();
-         unsubToolUse();
+         unsubEvents();
          cleanupAbort();
      }
      return collector.getText().trim() || getLastAssistantText(session);
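`runAgent` and `resumeAgent` now carry the same `message_end` and `compaction_end` handling. As an illustration of what that logic does, the sketch below restates it as one typed function. The event shapes are inferred from the fields the diff reads (`event.message.role`, `event.message.usage`, `event.aborted`, `event.reason`, `event.result.tokensBefore`) and are assumptions rather than upstream typings; `dispatchLifecycleEvent` is a hypothetical name.

```typescript
type UsageDelta = { input: number; output: number; cacheWrite: number };
type CompactionInfo = { reason: "manual" | "threshold" | "overflow"; tokensBefore: number };

// Assumed event shapes, limited to the fields the diff actually reads.
type SessionEvent =
  | { type: "message_end"; message: { role: string; usage?: Partial<UsageDelta> } }
  | { type: "compaction_end"; aborted: boolean; reason: CompactionInfo["reason"]; result?: { tokensBefore: number } }
  | { type: "tool_execution_start" | "tool_execution_end"; toolName: string };

type Callbacks = {
  onAssistantUsage?: (usage: UsageDelta) => void;
  onCompaction?: (info: CompactionInfo) => void;
};

function dispatchLifecycleEvent(event: SessionEvent, callbacks: Callbacks): void {
  if (event.type === "message_end" && event.message.role === "assistant") {
    const u = event.message.usage;
    // Missing counters default to 0 so the accumulator never receives undefined.
    if (u) {
      callbacks.onAssistantUsage?.({
        input: u.input ?? 0,
        output: u.output ?? 0,
        cacheWrite: u.cacheWrite ?? 0,
      });
    }
  }
  // Aborted compactions and compactions without a result are ignored.
  if (event.type === "compaction_end" && !event.aborted && event.result) {
    callbacks.onCompaction?.({ reason: event.reason, tokensBefore: event.result.tokensBefore });
  }
}
```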