@gotgenes/pi-subagents 16.0.0 → 16.1.1

@@ -0,0 +1,95 @@
1
+ ---
2
+ issue: 381
3
+ issue_title: "Replace ConcurrencyQueue with a thunk-based ConcurrencyLimiter"
4
+ ---
5
+
6
+ # Retro: #381 — Replace ConcurrencyQueue with a thunk-based ConcurrencyLimiter
7
+
8
+ ## Stage: Planning (2026-06-13T00:00:00Z)
9
+
10
+ ### Session summary
11
+
12
+ Produced a 3-step TDD plan to replace the ID-registry `ConcurrencyQueue` (with its `startAgent` back-edge and `markStarted`/`markFinished` relays) with a pure `ConcurrencyLimiter` that schedules thunks FIFO against a dynamic limit.
13
+ The design follows the architecture doc's Phase 17 Step 1 entry and the issue's revised framing closely; the plan adds concrete code sketches for `schedule`/`recheck`/`clear`, the manager call site, the simplified `waitForAll`, and `index.ts` wiring.
14
+
15
+ ### Observations
16
+
17
+ - Author is `gotgenes` (matches the gh CLI user), so the well-specified proposal was treated as the working hypothesis; the design is unambiguous (down to the architecture-doc Step 1), so the `ask_user` gate was skipped.
18
+ - Classified non-breaking: `ConcurrencyQueue`/`ConcurrencyLimiter` are internal — no public API, config, or observable behavior change.
19
+ The FIFO admission gate against `maxConcurrent` is preserved.
20
+ - Key design decision beyond the issue sketch: `clear()` must *settle* dropped pending promises (resolve them), not just drop the thunks.
21
+ Every `schedule()` promise becomes `record.promise`, and the post-spawn contract is that it always settles — dropping without resolving would strand a promise.
22
+ This costs a small `settle` handle per pending entry (a few lines beyond the issue's "~40 lines").
23
+ - Verified no production caller awaits a *queued* agent's promise in a blocking way (`get-result-tool.ts` guards on `status === "running"`; `spawnAndWait` is foreground/direct; `waitForAll` filters by status), confirming it is safe to give queued agents a real promise.
24
+ - Sequencing decision: the `SubagentManagerOptions.queue` → `limiter` swap breaks both call sites (`index.ts` + the manager test helper) and the old test file imports the deleted source, so step 2 is one atomic commit (migrate consumers + delete queue + delete old test).
25
+ - `bypassQueue` is kept as-is — it is in the published `SubagentsService` type bundle, so renaming would be breaking; deferred to Open Questions.
26
+ - Doc inventory: grep confirmed current-state references to update are the Mermaid lifecycle node, the layout listing, the "What the core owns" bullet, the Step 7 ([#378]) target filename, and the `package-pi-subagents` SKILL lifecycle-domain table.
27
+ `SKILL.md` line 80 (Phase 15 history) keeps `ConcurrencyQueue` as a historical record.
28
+
29
+ ## Stage: Implementation — TDD (2026-06-13T22:15:00Z)
30
+
31
+ ### Session summary
32
+
33
+ Executed all 3 planned TDD cycles: (1) added `ConcurrencyLimiter` + 13 unit tests, (2) migrated `SubagentManager`, `index.ts`, `subagent.ts` docstring, and the manager test helper to the limiter while deleting `concurrency-queue.ts` + its test in the same atomic commit, (3) updated `architecture.md` and the package SKILL.
34
+ Test count went 975 → 966 (−22 deleted queue tests, +13 new limiter tests); the full suite, `check`, `lint`, and `pnpm fallow dead-code` are all green.
35
+
36
+ ### Observations
37
+
38
+ - The plan held up cleanly — no surprises in the manager integration tests.
39
+ The `queueing and concurrency` describe block passed unchanged after only the `createManager` helper swap (real `ConcurrencyLimiter` instead of `ConcurrencyQueue` + forward-ref start callback), confirming those tests exercise behavior, not queue internals.
40
+ - One deviation: a 4th commit (`90135005`, `refactor:`) fixes a stale `// before startAgent / queue drain` comment at `src/index.ts:125` that the plan's grep inventory missed (it named no removed symbol, just deleted concepts).
41
+ The pre-completion reviewer caught it.
42
+ Committed separately rather than amending the non-HEAD refactor commit, since AGENTS.md discourages interactive rebase in this environment.
43
+ - ESLint `@typescript-eslint/no-floating-promises` fired on every bare `limiter.schedule(...)` in the limiter test (the queue's `enqueue` returned `void`; `schedule` returns a promise).
44
+ Resolved by prefixing unawaited calls with `void` — all such tasks either stay pending or resolve, so no unhandled rejection.
45
+ - The `clear()`-settles-pending-promises decision (made at planning) proved correct and is covered by a dedicated test ("resolves the promises of dropped pending tasks").
46
+ - Pre-completion reviewer: WARN (no FAILs).
47
+ Reviewer warnings: the single stale-comment finding at `index.ts:125` — now fixed in commit `90135005`.
48
+
49
+ ## Stage: Final Retrospective (2026-06-14T00:30:00Z)
50
+
51
+ ### Session summary
52
+
53
+ Shipped #381 across planning, TDD, and release: `pi-subagents` `16.0.0` → `16.1.0`, tag `pi-subagents-v16.1.0`.
54
+ Four commits landed (one `feat`, two `refactor`, one `docs`) plus two `docs(retro)` notes; CI passed first try, the issue was closed with an implemented-in summary, and the release-please PR was merged.
55
+ The plan — written down to code sketches — held up across all three TDD cycles with no design rework.
56
+
57
+ ### Observations
58
+
59
+ #### What went well
60
+
61
+ - The plan's fidelity paid off: the `clear()`-settles-pending-promises decision, the atomic step-2 sequencing (migrate consumers + delete queue + delete old test in one commit), and the `void`-prefix prediction for floating promises were all made at planning time and executed without surprise.
62
+ The `queueing and concurrency` manager tests passed unchanged after only the `createManager` helper swap, validating the planning claim that they exercise behavior, not queue internals.
63
+ - The pre-completion-reviewer (on `anthropic/claude-sonnet-4-6`, 161s, 21 tool uses) caught a stale comment at `src/index.ts:125` that all four deterministic gates (`check`, `lint`, `test`, `fallow dead-code`) passed over.
64
+ This is the backstop working exactly as intended — a judgment-model review surfacing residue that pattern-matchers cannot.
65
+ - Verification cadence was incremental, not end-loaded: file-scoped `vitest` + `biome` + `eslint` after step 1, `pnpm run check` immediately after the shared-interface change mid-step-2 (per the plan's own instruction), then lifecycle suite → full suite → full lint, then `rumdl` for the docs step, then the full gates + `fallow` before push.
66
+
67
+ #### What caused friction (agent side)
68
+
69
+ - `missing-context` (self/reviewer-caught) — the stale comment `// before startAgent / queue drain` at `src/index.ts:125` referenced two deleted concepts but was not cataloged in the plan's Module-Level Changes, despite the planning grep output having surfaced that exact line.
70
+ The grep hit was visible but never converted into a plan action or an explicit leave-as-is.
71
+ Impact: one small follow-up commit (`90135005`, `refactor:`); no rework, no design impact — the reviewer backstop absorbed it before ship.
72
+
73
+ #### What caused friction (user side)
74
+
75
+ - None.
76
+ The single user touchpoint — the release-timing gate in `/ship-issue` (release now vs. batch the Phase 17 sequence) — was strategic judgment the agent correctly deferred, not mechanical oversight.
77
+
78
+ ### Diagnostic details
79
+
80
+ - **Model-performance correlation** — one subagent dispatch (`pre-completion-reviewer`) on `anthropic/claude-sonnet-4-6`; appropriate match for judgment-heavy review, and it returned the session's only actionable finding.
81
+ - **Escalation-delay tracking** — no rabbit-holes; the lone lint error (`@typescript-eslint/no-floating-promises`, 18 sites) was resolved in a single test-file rewrite, far under the 5-call escalation threshold.
82
+ - **Unused-tool detection** — nothing under-tooled; `colgrep`/`grep` were used during planning exploration and the reviewer subagent was dispatched as designed.
83
+ - **Feedback-loop gap analysis** — no gap; verification ran after every cycle, with `pnpm run check` correctly invoked right after the shared-interface change rather than at end-of-session.
84
+
85
+ #### Process note (no inline change)
86
+
87
+ - The release-please PR merge required the documented `UNSTABLE` → `gh pr merge` fallback (step 6.4 of `/ship-issue`) because default-`GITHUB_TOKEN` release PRs never get checks.
88
+ This recurs every release; the prompt already handles it, so it is recorded here only as a standing pattern, not a friction point.
89
+
90
+ ### Changes made
91
+
92
+ 1. Added this Final Retrospective stage entry to `packages/pi-subagents/docs/retro/0381-replace-concurrency-queue-with-limiter.md`.
93
+ 2. No prompt or `AGENTS.md` changes — the operator chose retro-file-only, since the single friction (the stale `src/index.ts:125` comment) was a one-off execution slip already caught by the pre-completion-reviewer backstop, and the candidate grep-hit rule was judged not worth the prompt verbosity.
94
+
95
+ [#378]: https://github.com/gotgenes/pi-packages/issues/378
@@ -41,4 +41,44 @@ Test count went from 973 to 975 (+2 net new tests) across 59 test files.
41
41
  - Pre-completion reviewer: WARN — one finding: `.pi/skills/package-pi-subagents/SKILL.md` still said "prepends" for the `<active_agent>` tag; fixed in a follow-up `docs:` commit before shipping.
42
42
  - No deviations from the plan's Module-Level Changes list; no lockfile changes; fallow dead-code exited zero.
43
43
 
44
+ ## Stage: Final Retrospective (2026-06-14T01:11:10Z)
45
+
46
+ ### Session summary
47
+
48
+ Shipped #400 across three stages (Planning on `claude-opus-4-8`, TDD + Ship on `claude-sonnet-4-6`) as a single-function edit to `buildAgentPrompt()`'s replace branch plus tests and doc updates, released as `pi-subagents` v16.0.0 (major, breaking `perf!:`).
49
+ The run was clean end-to-end: two `ask_user` gates during planning, a 3-cycle TDD pass, one pre-completion WARN resolved before push, and a no-friction release-please merge.
50
+
51
+ ### Observations
52
+
53
+ #### What went well
54
+
55
+ - Cross-extension investigation on demand — when the operator asked mid-`ask_user` how the `genericBase` fallback interacts with `@gotgenes/pi-anthropic-auth`, the agent read that sibling repo's `system-prompt-shaping.ts` and `request-shaping.ts` and proved no new interaction (billing header prepended unconditionally; de-fingerprinting keys off `PI_DEFAULT_PROMPT_PREFIX`, absent from the neutral `genericBase`) before answering.
56
+ This converted an open worry into a documented Risk row rather than a deferred unknown.
57
+ - Emergent-scope surfacing — planning noticed that built-in `Explore`/`Plan` are replace-mode agents and so are visibly affected, then confirmed uniform application via a second `ask_user` instead of assuming.
58
+ - Autoformat discipline — after `pi-autoformat` touched `README.md` mid-edit, the agent re-read the region before the next edit (turns 49–50) rather than matching against stale layout, avoiding a failed `oldText`.
59
+
60
+ #### What caused friction (agent side)
61
+
62
+ - `missing-context` (planning) — the plan listed the README's Patch 3 `<active_agent>` "prepends" wording as a doc update but missed the identical Patch 3 description in `.pi/skills/package-pi-subagents/SKILL.md`.
63
+ Exact-grep during planning keyed on removed strings (`You are a pi coding agent sub-agent`, `prompt_mode`); the stale prose carried none of them, so the skill file's "prepends `<active_agent>`" line was not found.
64
+ Impact: the pre-completion reviewer caught it as a WARN, requiring one follow-up `docs:` commit (8e93d2a4) during TDD before push — no rework beyond that, and the safety net worked as designed.
65
+
66
+ #### What caused friction (user side)
67
+
68
+ - None — the operator's mid-planning OAuth question was a high-value redirect that strengthened the plan, not friction.
69
+
70
+ ### Diagnostic details
71
+
72
+ - **Model-performance correlation** — judgment-heavy planning ran on `claude-opus-4-8`; mechanical TDD execution and the deterministic ship steps ran on `claude-sonnet-4-6`.
73
+ Appropriate assignment in both directions; no mismatch.
74
+ - **Unused-tool detection** — the `colgrep` skill was loaded in planning but never used; exploration was all exact-symbol grep, which was correct for known symbols.
75
+ The one place it would have helped is the `missing-context` friction: a semantic search like "docs describing how the active_agent tag is added to the system prompt" would likely have surfaced both the README and the SKILL.md descriptions that symbol-grep missed.
76
+ - **Feedback-loop gap analysis** — verification ran incrementally throughout (green baseline before cycle 1, per-file `vitest` each cycle, full suite + `check` + `lint` + `fallow` after the last step).
77
+ No end-loaded verification.
78
+ - **Escalation-delay tracking** — no rabbit-holes; no error sequence exceeded one tool call.
79
+
80
+ ### Changes made
81
+
82
+ 1. `.pi/prompts/plan-issue.md` — extended the Module-Level Changes grep bullet: when a step reworks a documented mechanism's behavior (rather than removing a symbol), grep `.pi/skills/package-*/SKILL.md` for the mechanism name, since reworded prose carries no removed symbol to match.
83
+
44
84
  [#180]: https://github.com/gotgenes/pi-packages/issues/180
@@ -0,0 +1,49 @@
1
+ ---
2
+ issue: 403
3
+ issue_title: "Pressing Escape does not stop subagent/background agent"
4
+ ---
5
+
6
+ # Retro: #403 — Pressing Escape does not stop subagent/background agent
7
+
8
+ ## Stage: Planning (2026-06-14T00:00:00Z)
9
+
10
+ ### Session summary
11
+
12
+ Investigated the third-party bug report that ESC does not stop subagents and traced the abort path through both the package and the pinned Pi SDK peer deps.
13
+ Found that foreground subagents already receive the parent abort signal end-to-end, while background subagents are detached with no interrupt wiring — the reproducible bug.
14
+ Confirmed direction with the operator via `ask_user` (third-party gate): implement ESC-to-abort for both modes, with a foreground guard test, aborting all running and queued background agents.
15
+ Wrote and committed plan `0403-abort-subagents-on-interrupt.md`.
16
+
17
+ ### Observations
18
+
19
+ - Key SDK fact that de-risks the design: in `pi-agent-core` `agent.js`, each run creates a fresh `AbortController` and `finishRun()` discards it **without** aborting on normal completion.
20
+ So the parent signal's `abort` event fires only on a real ESC interrupt — latching `abortAll()` to it will not spuriously kill background agents at turn end.
21
+ - Chosen mechanism: a small `InterruptHandler` driven by `pi.on("turn_start", ...)`, re-latching `ctx.signal` each turn so the latch tracks the live per-run signal even across runs and tool-less turns.
22
+ `turn_start` was preferred over `tool_execution_start` because a background agent can outlive the run that spawned it; a turn-level latch still holds the current run's signal when the user interrupts a later tool-less turn.
23
+ - Reused the existing `manager.abortAll()` rather than adding `abortBackground()`.
24
+ Foreground agents are already aborted via their own `wireSignal`, so `abortAll()`'s overlap is redundant-but-harmless (status-guarded `abort()`, idempotent `markStopped`).
25
+ The manager does not store `isBackground` on the record, so distinguishing modes would need extra state — deferred as an Open Question.
26
+ - Classified as a non-breaking `fix:` (not `fix!:`): no config key, default, or output shape changes; detached-survives-ESC was a limitation, not a contract.
27
+ Noted the behavior change explicitly in Goals.
28
+ - Foreground path is believed already-correct from the code trace; the plan adds a regression guard in `subagent-session.test.ts` (`forwardAbortSignal` is currently untested for the parent-signal path) and will fix only if the guard fails.
29
+
30
+ ## Stage: Implementation — TDD (2026-06-14T18:00:00Z)
31
+
32
+ ### Session summary
33
+
34
+ Completed all three TDD cycles against a green baseline (967 tests).
35
+ Added the foreground-abort guard, implemented `InterruptHandler` + `turn_start` wiring, and updated the architecture doc.
36
+ Test count went from 967 to 975 (+8: 6 `InterruptHandler` unit tests, 2 foreground guard tests); `check`, `lint`, `test`, and `fallow dead-code` all pass.
37
+
38
+ ### Observations
39
+
40
+ - The foreground guard (Step 1) passed on the first run, confirming the planning-stage code trace: the parent signal already reaches the child `session.abort()` via `forwardAbortSignal`.
41
+ No code fix was needed, so it landed as `test:` exactly as the plan anticipated.
42
+ - `InterruptHandler` came out clean against the `code-design` heuristics — one field read from `ctx`, one method on a one-method `InterruptManager` interface, latch state owned internally, `{ once: true }` listener.
43
+ The reviewer's code-design check was PASS with no structural concerns.
44
+ - `abortAll()` gained a second narrow-interface consumer (the new handler) on top of the shutdown path; `fallow dead-code` stayed green, so its existing `fallow-ignore-next-line unused-class-member` comment was left untouched.
45
+ - Pre-completion reviewer: **WARN**.
46
+ - Reviewer warnings: stale source-file counts in `architecture.md`.
47
+ Fixed the current-state prose claim (`56` → `58` source files).
48
+ Left the fallow health-metrics snapshot rows (line ~650, `7,778 (57 files)`) intact — those are point-in-time analysis tables where the file count was computed alongside LOC and other metrics, so bumping one cell in isolation would desync the snapshot.
49
+ Amended the fix into the docs commit (not yet pushed).
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "@gotgenes/pi-subagents",
3
- "version": "16.0.0",
3
+ "version": "16.1.1",
4
4
  "type": "module",
5
5
  "exports": {
6
6
  ".": {
@@ -1,2 +1,3 @@
1
+ export { InterruptHandler } from "#src/handlers/interrupt";
1
2
  export { SessionLifecycleHandler } from "#src/handlers/lifecycle";
2
3
  export { ToolStartHandler } from "#src/handlers/tool-start";
@@ -0,0 +1,49 @@
+ /**
+  * turn_start event handler that aborts subagents on a parent interrupt (ESC).
+  *
+  * The parent agent loop creates a fresh AbortController per run and only aborts
+  * it on an explicit interrupt — never on normal completion. So latching to the
+  * current run's signal and aborting on its `abort` event fires exactly on ESC.
+  *
+  * `turn_start` carries the live per-run `ctx.signal`, so re-latching each turn
+  * keeps the handler tracking the current signal across runs and tool-less turns.
+  */
+
+ /** Narrow manager interface — only the method the interrupt handler calls. */
+ export interface InterruptManager {
+   abortAll(): number;
+ }
+
+ /** Minimal context shape — only the field the handler reads. */
+ interface InterruptCtx {
+   signal: AbortSignal | undefined;
+ }
+
+ /**
+  * Latches the current parent abort signal and aborts all subagents when it fires.
+  *
+  * The latch dedups by reference: most turns reuse the same signal (no-op); a new
+  * run's signal triggers a detach-and-rewire. The `abort` listener is one-shot.
+  */
+ export class InterruptHandler {
+   private latched?: AbortSignal;
+   private detach?: () => void;
+
+   constructor(private readonly manager: InterruptManager) {}
+
+   handleTurnStart(ctx: InterruptCtx): void {
+     const signal = ctx.signal;
+     if (signal === this.latched) return;
+
+     this.detach?.();
+     this.detach = undefined;
+     this.latched = signal;
+     if (!signal) return;
+
+     const onAbort = (): void => {
+       this.manager.abortAll();
+     };
+     signal.addEventListener("abort", onAbort, { once: true });
+     this.detach = () => signal.removeEventListener("abort", onAbort);
+   }
+ }
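
The latch behavior described in the header comment can be exercised in isolation. What follows is a minimal sketch, not code from the package: `FakeManager` and `InterruptHandlerSketch` are hypothetical names, the handler body is condensed from the hunk above, and the example assumes a runtime that provides a global `AbortController` (a recent Node.js or a browser).

```typescript
// Hypothetical stand-in for the real manager: it only counts abortAll() calls.
class FakeManager {
  calls = 0;
  abortAll(): number {
    this.calls++;
    return 0;
  }
}

// Condensed copy of the handler shown above, so the sketch runs standalone.
class InterruptHandlerSketch {
  private latched?: AbortSignal;
  private detach?: () => void;
  private readonly manager: { abortAll(): number };

  constructor(manager: { abortAll(): number }) {
    this.manager = manager;
  }

  handleTurnStart(ctx: { signal: AbortSignal | undefined }): void {
    const signal = ctx.signal;
    if (signal === this.latched) return; // same run: nothing to rewire

    this.detach?.(); // drop the listener on the previous run's signal
    this.detach = undefined;
    this.latched = signal;
    if (!signal) return;

    const onAbort = (): void => {
      this.manager.abortAll();
    };
    signal.addEventListener("abort", onAbort, { once: true });
    this.detach = () => signal.removeEventListener("abort", onAbort);
  }
}

const manager = new FakeManager();
const handler = new InterruptHandlerSketch(manager);

// Run 1: two turns share one signal, so the second call is a no-op.
const run1 = new AbortController();
handler.handleTurnStart({ signal: run1.signal });
handler.handleTurnStart({ signal: run1.signal });

// Run 2 brings a fresh controller; the handler detaches from run 1 and rewires.
const run2 = new AbortController();
handler.handleTurnStart({ signal: run2.signal });

run1.abort(); // stale signal: its listener was removed, so nothing fires
const afterStale = manager.calls;
run2.abort(); // live signal: abortAll() fires once
const afterLive = manager.calls;

console.log(afterStale, afterLive); // 0 1
```

Deduplicating by signal reference means the handler never needs to know when a run ends; it only reacts when `turn_start` hands it a signal it has not seen.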
package/src/index.ts CHANGED
@@ -22,9 +22,9 @@ import {
22
22
  } from "@earendil-works/pi-coding-agent";
23
23
  import { AgentTypeRegistry } from "#src/config/agent-types";
24
24
  import { loadCustomAgents } from "#src/config/custom-agents";
25
- import { SessionLifecycleHandler, ToolStartHandler } from "#src/handlers/index";
25
+ import { InterruptHandler, SessionLifecycleHandler, ToolStartHandler } from "#src/handlers/index";
26
26
  import { createChildLifecyclePublisher } from "#src/lifecycle/child-lifecycle";
27
- import { ConcurrencyQueue } from "#src/lifecycle/concurrency-queue";
27
+ import { ConcurrencyLimiter } from "#src/lifecycle/concurrency-limiter";
28
28
  import { createSubagentSession, type SubagentSessionDeps } from "#src/lifecycle/create-subagent-session";
29
29
  import { buildParentSnapshot } from "#src/lifecycle/parent-snapshot";
30
30
  import { SubagentManager, type SubagentManagerObserver } from "#src/lifecycle/subagent-manager";
@@ -66,12 +66,12 @@ export default function (pi: ExtensionAPI) {
66
66
  );
67
67
 
68
68
  // Settings: owns all three in-memory values and handles load/save/emit.
69
- // onMaxConcurrentChanged is wired to the queue directly (closure captures by reference).
69
+ // onMaxConcurrentChanged is wired to the limiter directly (closure captures by reference).
70
70
  const settings = new SettingsManager({
71
71
  emit: (event, payload) => pi.events.emit(event, payload),
72
72
  cwd: process.cwd(),
73
73
  agentDir: getAgentDir(),
74
- onMaxConcurrentChanged: () => queue.drain(),
74
+ onMaxConcurrentChanged: () => limiter.recheck(),
75
75
  });
76
76
  settings.load();
77
77
 
@@ -122,7 +122,7 @@ export default function (pi: ExtensionAPI) {
122
122
  });
123
123
  },
124
124
  onSubagentCreated(record) {
125
- // Emit created event for background agents (before startAgent / queue drain).
125
+ // Emit created event for background agents (before limiter admission).
126
126
  pi.events.emit("subagents:created", {
127
127
  id: record.id,
128
128
  type: record.type,
@@ -150,22 +150,15 @@ export default function (pi: ExtensionAPI) {
150
150
  lifecycle: createChildLifecyclePublisher((channel, data) => pi.events.emit(channel, data)),
151
151
  };
152
152
 
153
- // ConcurrencyQueue: scheduling extracted from SubagentManager.
154
- // startAgent callback forward-references manager via closure (safe: drain is never called during construction).
155
- const queue = new ConcurrencyQueue(
156
- () => settings.maxConcurrent,
157
- (id) => {
158
- const agent = manager.getRecord(id);
159
- if (agent?.status !== "queued") return;
160
- agent.promise = agent.run();
161
- },
162
- );
153
+ // ConcurrencyLimiter: schedules background run thunks FIFO against the limit.
154
+ // It knows nothing about agents or the manager; dependency direction is strictly manager → limiter.
155
+ const limiter = new ConcurrencyLimiter(() => settings.maxConcurrent);
163
156
 
164
157
  const manager = new SubagentManager({
165
158
  createSubagentSession: (params) => createSubagentSession(params, subagentSessionDeps),
166
159
  baseCwd: process.cwd(),
167
160
  observer,
168
- queue,
161
+ limiter,
169
162
  getRunConfig: () => settings,
170
163
  });
171
164
 
@@ -192,6 +185,10 @@ export default function (pi: ExtensionAPI) {
192
185
  const toolStart = new ToolStartHandler(runtime);
193
186
  pi.on("tool_execution_start", (event, ctx) => toolStart.handleToolExecutionStart(event, ctx));
194
187
 
188
+ // Abort all subagents when the parent agent loop is interrupted (ESC).
189
+ const interrupt = new InterruptHandler(manager);
190
+ pi.on("turn_start", (_event, ctx) => interrupt.handleTurnStart(ctx));
191
+
195
192
  // ---- Agent tool ----
196
193
 
197
194
  pi.registerTool(new AgentTool(manager, runtime, settings, registry, getAgentDir()).toToolDefinition());
@@ -0,0 +1,55 @@
+ /**
+  * concurrency-limiter.ts — FIFO admission gate for background work.
+  *
+  * Schedules run closures (thunks) against a dynamic limit, running them in
+  * scheduling order as slots free. The limiter knows nothing about agents, IDs,
+  * or the manager — it owns only the active count and the pending queue.
+  *
+  * Every scheduled promise settles: it follows the task's settlement when the
+  * task runs, or resolves early if clear() drops it before it starts.
+  */
+
+ export class ConcurrencyLimiter {
+   private active = 0;
+   private readonly pending: Array<{ start: () => void; settle: () => void }> = [];
+
+   constructor(private readonly getLimit: () => number) {}
+
+   /**
+    * Schedule a task to run FIFO once a slot is free.
+    * Returns a promise that settles with the task, or resolves early if the
+    * task is dropped by clear() before it starts.
+    */
+   schedule(task: () => Promise<void>): Promise<void> {
+     const { promise, resolve, reject } = Promise.withResolvers<void>(); // eslint-disable-line @typescript-eslint/no-invalid-void-type -- Promise.withResolvers<void> is valid; rule does not allow void in generic fn call type args
+     this.pending.push({
+       start: () => {
+         this.active++;
+         task()
+           .then(resolve, reject)
+           .finally(() => {
+             this.active--;
+             this.recheck();
+           });
+       },
+       settle: resolve,
+     });
+     this.recheck();
+     return promise;
+   }
+
+   /** Start pending tasks until the limit is reached. Call when the limit may have grown. */
+   recheck(): void {
+     while (this.active < this.getLimit()) {
+       const next = this.pending.shift();
+       if (!next) break;
+       next.start();
+     }
+   }
+
+   /** Drop all pending tasks, resolving their promises without running them. */
+   clear(): void {
+     const dropped = this.pending.splice(0);
+     for (const task of dropped) task.settle();
+   }
+ }
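
The three operations above (`schedule`, `recheck`, `clear`) can be seen working together in a small standalone run. This is a sketch under stated assumptions rather than the package's code: `LimiterSketch`, `makeTask`, `releases`, and `state` are hypothetical names, and the class swaps `Promise.withResolvers` for a plain `new Promise` executor so it also runs on older runtimes.

```typescript
// Condensed adaptation of the limiter above (new Promise instead of withResolvers).
class LimiterSketch {
  private active = 0;
  private readonly pending: Array<{ start: () => void; settle: () => void }> = [];
  private readonly getLimit: () => number;

  constructor(getLimit: () => number) {
    this.getLimit = getLimit;
  }

  schedule(task: () => Promise<void>): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      this.pending.push({
        start: () => {
          this.active++;
          task()
            .then(() => resolve(), reject)
            .finally(() => {
              this.active--;
              this.recheck();
            });
        },
        settle: () => resolve(),
      });
      this.recheck();
    });
  }

  recheck(): void {
    while (this.active < this.getLimit()) {
      const next = this.pending.shift();
      if (!next) break;
      next.start();
    }
  }

  clear(): void {
    for (const entry of this.pending.splice(0)) entry.settle();
  }
}

const order: string[] = []; // names of tasks in the order they actually started
const releases: Array<() => void> = []; // manual "finish" handles for started tasks
let limit = 1;
const limiter = new LimiterSketch(() => limit);

// Each hypothetical task records its start and then waits to be released.
const makeTask = (name: string) => (): Promise<void> => {
  order.push(name);
  return new Promise<void>((done) => releases.push(() => done()));
};

const state = { cSettled: false };
void limiter.schedule(makeTask("a")); // slot free: starts immediately
void limiter.schedule(makeTask("b")); // limit is 1: waits
void limiter.schedule(makeTask("c")).then(() => {
  state.cSettled = true;
}); // waits behind b

limit = 2;
limiter.recheck(); // the limit grew, so b is admitted; c still waits
const startedAfterRecheck = [...order];

limiter.clear(); // c is dropped: its promise resolves, its thunk never runs
for (const release of releases) release(); // let a and b finish
```

The dynamic limit is read through a getter on every `recheck()`, which is why raising `limit` only takes effect once something calls `recheck()`; in the diff above that call is wired to `onMaxConcurrentChanged`.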
@@ -2,18 +2,19 @@
2
2
  * subagent-manager.ts - Tracks subagents, background execution, resume support.
3
3
  *
4
4
  * Background agents are subject to a configurable concurrency limit (default: 4).
5
- * Excess agents are queued and auto-started as running agents complete.
6
- * Foreground agents bypass the queue (they block the parent anyway).
5
+ * Excess agents are scheduled on a ConcurrencyLimiter and auto-started as running
6
+ * agents complete. Foreground agents bypass the limiter (they block the parent anyway).
7
7
  */
8
8
 
9
9
  import { randomUUID } from "node:crypto";
10
10
  import type { Model } from "@earendil-works/pi-ai";
11
11
  import { debugLog } from "#src/debug";
12
- import type { ConcurrencyQueue } from "#src/lifecycle/concurrency-queue";
12
+ import type { ConcurrencyLimiter } from "#src/lifecycle/concurrency-limiter";
13
13
  import type { CreateSubagentSessionParams } from "#src/lifecycle/create-subagent-session";
14
14
  import type { ParentSnapshot } from "#src/lifecycle/parent-snapshot";
15
15
  import { Subagent, type SubagentLifecycleObserver } from "#src/lifecycle/subagent";
16
16
  import type { SubagentSession } from "#src/lifecycle/subagent-session";
17
+ import { SubagentState } from "#src/lifecycle/subagent-state";
17
18
  import type { WorkspaceProvider } from "#src/lifecycle/workspace";
18
19
 
19
20
  import type { RunConfig } from "#src/runtime";
@@ -31,8 +32,8 @@ export interface SubagentManagerObserver {
31
32
  export interface SubagentManagerOptions {
32
33
  /** Assembly factory that produces a born-complete SubagentSession per spawn. */
33
34
  createSubagentSession: (params: CreateSubagentSessionParams) => Promise<SubagentSession>;
34
- /** Concurrency queue: owns scheduling, limit checks, and drain logic. */
35
- queue: ConcurrencyQueue;
35
+ /** Concurrency limiter: schedules background run thunks FIFO against the limit. */
36
+ limiter: ConcurrencyLimiter;
36
37
  /** Base working directory handed to a workspace provider (the parent cwd). */
37
38
  baseCwd: string;
38
39
  getRunConfig?: () => RunConfig;
@@ -67,7 +68,7 @@ export class SubagentManager {
67
68
  private cleanupInterval: ReturnType<typeof setInterval>;
68
69
  private readonly observer?: SubagentManagerObserver;
69
70
  private readonly createSubagentSession: (params: CreateSubagentSessionParams) => Promise<SubagentSession>;
70
- private readonly queue: ConcurrencyQueue;
71
+ private readonly limiter: ConcurrencyLimiter;
71
72
  private readonly baseCwd: string;
72
73
  private getRunConfig?: () => RunConfig;
73
74
  private _workspaceProvider?: WorkspaceProvider;
@@ -79,7 +80,7 @@ export class SubagentManager {
79
80
 
80
81
  constructor(options: SubagentManagerOptions) {
81
82
  this.createSubagentSession = options.createSubagentSession;
82
- this.queue = options.queue;
83
+ this.limiter = options.limiter;
83
84
  this.baseCwd = options.baseCwd;
84
85
  this.observer = options.observer;
85
86
  this.getRunConfig = options.getRunConfig;
@@ -109,7 +110,6 @@ export class SubagentManager {
109
110
  private buildObserver(options: AgentSpawnConfig): SubagentLifecycleObserver {
110
111
  return {
111
112
  onStarted: (agent) => {
112
- if (options.isBackground) this.queue.markStarted();
113
113
  this.observer?.onSubagentStarted(agent);
114
114
  },
115
115
  onSessionCreated: options.observer?.onSessionCreated
@@ -117,7 +117,6 @@ export class SubagentManager {
117
117
  : undefined,
118
118
  onRunFinished: (agent) => {
119
119
  if (options.isBackground) {
120
- this.queue.markFinished();
121
120
  try { this.observer?.onSubagentCompleted(agent); } catch (err) { debugLog("onSubagentCompleted observer", err); }
122
121
  }
123
122
  },
@@ -142,23 +141,25 @@ export class SubagentManager {
142
141
  id,
143
142
  type,
144
143
  description: options.description,
145
- status: options.isBackground ? "queued" : "running",
146
- startedAt: Date.now(),
147
144
  invocation: options.invocation,
148
- // Run config
149
- snapshot,
150
- prompt,
151
- model: options.model,
152
- maxTurns: options.maxTurns,
153
- thinkingLevel: options.thinkingLevel,
154
- parentSession: options.parentSession,
155
- signal: options.signal,
156
- // Shared deps
157
- createSubagentSession: this.createSubagentSession,
158
- observer: this.buildObserver(options),
159
- getRunConfig: this.getRunConfig,
160
- baseCwd: this.baseCwd,
161
- getWorkspaceProvider: () => this._workspaceProvider,
145
+ state: new SubagentState({
146
+ status: options.isBackground ? "queued" : "running",
147
+ startedAt: Date.now(),
148
+ }),
149
+ execution: {
150
+ createSubagentSession: this.createSubagentSession,
151
+ snapshot,
152
+ prompt,
153
+ baseCwd: this.baseCwd,
154
+ observer: this.buildObserver(options),
155
+ getRunConfig: this.getRunConfig,
156
+ getWorkspaceProvider: () => this._workspaceProvider,
157
+ model: options.model,
158
+ maxTurns: options.maxTurns,
159
+ thinkingLevel: options.thinkingLevel,
160
+ parentSession: options.parentSession,
161
+ signal: options.signal,
162
+ },
162
163
  });
163
164
  this.agents.set(id, record);
164
165
 
@@ -166,9 +167,13 @@ export class SubagentManager {
166
167
  this.observer?.onSubagentCreated(record);
167
168
  }
168
169
 
169
- if (options.isBackground && !options.bypassQueue && this.queue.isFull()) {
170
- // Queue it - will be started when a running agent completes
171
- this.queue.enqueue(id);
170
+ if (options.isBackground && !options.bypassQueue) {
171
+ // Schedule on the limiter; started when a slot frees. The status guard
172
+ // makes an abort-while-queued task a no-op (Step 3 folds it into start()).
173
+ record.promise = this.limiter.schedule(() => {
174
+ if (record.status !== "queued") return Promise.resolve();
175
+ return record.run();
176
+ });
172
177
  return id;
173
178
  }
174
179
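
The status guard in the scheduled thunk is what lets an abort-while-queued leave its entry in the limiter instead of dequeuing it. A minimal sketch of that idea follows; `RecordSketch` and `guardedThunk` are hypothetical simplifications for illustration, not the package's `Subagent` record or manager code.

```typescript
type Status = "queued" | "running" | "stopped";

// Hypothetical, simplified record: tracks status and how often run() was called.
class RecordSketch {
  status: Status = "queued";
  runs = 0;

  markStopped(): void {
    this.status = "stopped";
  }

  run(): Promise<void> {
    this.status = "running";
    this.runs++;
    return Promise.resolve();
  }
}

// Builds a thunk shaped like the one handed to limiter.schedule() above:
// it re-checks the status at admission time, not at scheduling time.
function guardedThunk(record: RecordSketch): () => Promise<void> {
  return () => {
    if (record.status !== "queued") return Promise.resolve();
    return record.run();
  };
}

// Case 1: the agent is stopped while it is still waiting for a slot.
const stopped = new RecordSketch();
const thunkForStopped = guardedThunk(stopped);
stopped.markStopped();
void thunkForStopped(); // slot opens later: the guard makes this a no-op

// Case 2: the agent is still queued when its slot opens.
const live = new RecordSketch();
void guardedThunk(live)();

console.log(stopped.runs, live.runs); // 0 1
```

Because the no-op thunk resolves immediately, it occupies a slot only briefly and the limiter moves on to the next pending entry.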
 
@@ -221,9 +226,9 @@ export class SubagentManager {
221
226
  const record = this.agents.get(id);
222
227
  if (!record) return false;
223
228
 
224
- // Remove from queue if queued
229
+ // A queued agent has not started; mark it stopped. Its scheduled thunk
230
+ // becomes a no-op (status guard) when its slot finally opens.
225
231
  if (record.status === "queued") {
226
- this.queue.dequeue(id);
227
232
  record.markStopped();
228
233
  return true;
229
234
  }
@@ -269,43 +274,44 @@ export class SubagentManager {
269
274
  // fallow-ignore-next-line unused-class-member
270
275
  abortAll(): number {
271
276
  let count = 0;
272
- // Clear queued agents first
273
- for (const id of this.queue.queuedIds) {
274
- const record = this.agents.get(id);
275
- if (record) {
277
+ for (const record of this.agents.values()) {
278
+ if (record.status === "queued") {
276
279
  record.markStopped();
277
280
  count++;
281
+ } else if (record.abort()) {
282
+ count++;
278
283
  }
279
284
  }
280
- this.queue.clear();
281
- // Abort running agents
282
- for (const record of this.agents.values()) {
283
- if (record.abort()) count++;
284
- }
285
+ // Drop pending thunks (their promises resolve).
286
+ this.limiter.clear();
285
287
  return count;
286
288
  }
287
289
 
288
290
  /** Wait for all running and queued agents to complete (including queued ones). */
289
291
  // fallow-ignore-next-line unused-class-member
290
292
  async waitForAll(): Promise<void> {
291
- // Loop because queue.drain() respects the concurrency limit - as running
292
- // agents finish they start queued ones, which need awaiting too.
293
- // eslint-disable-next-line @typescript-eslint/no-unnecessary-condition -- intentional infinite loop with explicit break
294
- while (true) {
295
- this.queue.drain();
296
- const pending = [...this.agents.values()]
297
- .filter(r => r.status === "running" || r.status === "queued")
298
- .map(r => r.promise)
299
- .filter((p): p is Promise<void> => p != null);
300
- if (pending.length === 0) break;
293
+ // Every spawned agent has a settled-on-completion promise (the limiter starts
294
+ // queued ones as slots free), so a single allSettled covers the queued case.
295
+ // The loop only catches agents spawned during the wait.
296
+ let pending = this.pendingPromises();
297
+ while (pending.length > 0) {
301
298
  await Promise.allSettled(pending);
299
+ pending = this.pendingPromises();
302
300
  }
303
301
  }
304
302
 
303
+ /** Promises of all running/queued agents that have one. */
304
+ private pendingPromises(): Promise<void>[] {
305
+ return [...this.agents.values()]
306
+ .filter(r => r.status === "running" || r.status === "queued")
307
+ .map(r => r.promise)
308
+ .filter((p): p is Promise<void> => p != null);
309
+ }
310
+
305
311
  dispose() {
306
312
  clearInterval(this.cleanupInterval);
307
- // Clear queue
308
- this.queue.clear();
313
+ // Drop pending thunks
314
+ this.limiter.clear();
309
315
  for (const record of this.agents.values()) {
310
316
  record.disposeSession();
311
317
  }