npm - @gotgenes/pi-subagents - Versions diffs - 16.0.0 → 16.1.1 - Mend

@gotgenes/pi-subagents 16.0.0 → 16.1.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

package/CHANGELOG.md +14 -0
package/dist/public.d.ts +19 -22
package/docs/architecture/architecture.md +49 -17
package/docs/plans/0373-extract-subagent-state.md +250 -0
package/docs/plans/0381-replace-concurrency-queue-with-limiter.md +267 -0
package/docs/plans/0403-abort-subagents-on-interrupt.md +180 -0
package/docs/retro/0373-extract-subagent-state.md +94 -0
package/docs/retro/0381-replace-concurrency-queue-with-limiter.md +95 -0
package/docs/retro/0400-include-parent-prompt-in-replace-mode.md +40 -0
package/docs/retro/0403-abort-subagents-on-interrupt.md +49 -0
package/package.json +1 -1
package/src/handlers/index.ts +1 -0
package/src/handlers/interrupt.ts +49 -0
package/src/index.ts +13 -16
package/src/lifecycle/concurrency-limiter.ts +55 -0
package/src/lifecycle/subagent-manager.ts +57 -51
package/src/lifecycle/subagent-state.ts +156 -0
package/src/lifecycle/subagent.ts +86 -163
package/src/observation/record-observer.ts +15 -13
package/src/lifecycle/concurrency-queue.ts +0 -63

package/docs/retro/0381-replace-concurrency-queue-with-limiter.md ADDED

@@ -0,0 +1,95 @@
+---
+issue: 381
+issue_title: "Replace ConcurrencyQueue with a thunk-based ConcurrencyLimiter"
+---
+# Retro: #381 — Replace ConcurrencyQueue with a thunk-based ConcurrencyLimiter
+## Stage: Planning (2026-06-13T00:00:00Z)
+### Session summary
+Produced a 3-step TDD plan to replace the ID-registry `ConcurrencyQueue` (with its `startAgent` back-edge and `markStarted`/`markFinished` relays) with a pure `ConcurrencyLimiter` that schedules thunks FIFO against a dynamic limit.
+The design follows the architecture doc's Phase 17 Step 1 entry and the issue's revised framing closely; the plan adds concrete code sketches for `schedule`/`recheck`/`clear`, the manager call site, the simplified `waitForAll`, and `index.ts` wiring.
+### Observations
+- Author is `gotgenes` (matches the gh CLI user), so the well-specified proposal was treated as the working hypothesis; the design is unambiguous (down to the architecture-doc Step 1), so the `ask_user` gate was skipped.
+- Classified non-breaking: `ConcurrencyQueue`/`ConcurrencyLimiter` are internal — no public API, config, or observable behavior change.
+  The FIFO admission gate against `maxConcurrent` is preserved.
+- Key design decision beyond the issue sketch: `clear()` must *settle* dropped pending promises (resolve them), not just drop the thunks.
+  Every `schedule()` promise becomes `record.promise`, and the post-spawn contract is that it always settles — dropping without resolving would strand a promise.
+  This costs a small `settle` handle per pending entry (a few lines beyond the issue's "~40 lines").
+- Verified no production caller awaits a *queued* agent's promise in a blocking way (`get-result-tool.ts` guards on `status === "running"`; `spawnAndWait` is foreground/direct; `waitForAll` filters by status), confirming it is safe to give queued agents a real promise.
+- Sequencing decision: the `SubagentManagerOptions.queue` → `limiter` swap breaks both call sites (`index.ts` + the manager test helper) and the old test file imports the deleted source, so step 2 is one atomic commit (migrate consumers + delete queue + delete old test).
+- `bypassQueue` is kept as-is — it is in the published `SubagentsService` type bundle, so renaming would be breaking; deferred to Open Questions.
+- Doc inventory: grep confirmed current-state references to update are the Mermaid lifecycle node, the layout listing, the "What the core owns" bullet, the Step 7 ([#378]) target filename, and the `package-pi-subagents` SKILL lifecycle-domain table.
+  `SKILL.md` line 80 (Phase 15 history) keeps `ConcurrencyQueue` as a historical record.
+## Stage: Implementation — TDD (2026-06-13T22:15:00Z)
+### Session summary
+Executed all 3 planned TDD cycles: (1) added `ConcurrencyLimiter` + 13 unit tests, (2) migrated `SubagentManager`, `index.ts`, `subagent.ts` docstring, and the manager test helper to the limiter while deleting `concurrency-queue.ts` + its test in the same atomic commit, (3) updated `architecture.md` and the package SKILL.
+Test count went 975 → 966 (−22 deleted queue tests, +13 new limiter tests); the full suite, `check`, `lint`, and `pnpm fallow dead-code` are all green.
+### Observations
+- The plan held up cleanly — no surprises in the manager integration tests.
+  The `queueing and concurrency` describe block passed unchanged after only the `createManager` helper swap (real `ConcurrencyLimiter` instead of `ConcurrencyQueue` + forward-ref start callback), confirming those tests exercise behavior, not queue internals.
+- One deviation: a 4th commit (`90135005`, `refactor:`) fixes a stale `// before startAgent / queue drain` comment at `src/index.ts:125` that the plan's grep inventory missed (it named no removed symbol, just deleted concepts).
+  The pre-completion reviewer caught it.
+  Committed separately rather than amending the non-HEAD refactor commit, since AGENTS.md discourages interactive rebase in this environment.
+- ESLint `@typescript-eslint/no-floating-promises` fired on every bare `limiter.schedule(...)` in the limiter test (the queue's `enqueue` returned `void`; `schedule` returns a promise).
+  Resolved by prefixing unawaited calls with `void` — all such tasks either stay pending or resolve, so no unhandled rejection.
+- The `clear()`-settles-pending-promises decision (made at planning) proved correct and is covered by a dedicated test ("resolves the promises of dropped pending tasks").
+- Pre-completion reviewer: WARN (no FAILs).
+  Reviewer warnings: the single stale-comment finding at `index.ts:125` — now fixed in commit `90135005`.
+## Stage: Final Retrospective (2026-06-14T00:30:00Z)
+### Session summary
+Shipped #381 across planning, TDD, and release: `pi-subagents` `16.0.0` → `16.1.0`, tag `pi-subagents-v16.1.0`.
+Four commits landed (one `feat`, two `refactor`, one `docs`) plus two `docs(retro)` notes; CI passed first try, the issue was closed with an implemented-in summary, and the release-please PR was merged.
+The plan — written down to code sketches — held up across all three TDD cycles with no design rework.
+### Observations
+#### What went well
+- The plan's fidelity paid off: the `clear()`-settles-pending-promises decision, the atomic step-2 sequencing (migrate consumers + delete queue + delete old test in one commit), and the `void`-prefix prediction for floating promises were all made at planning time and executed without surprise.
+  The `queueing and concurrency` manager tests passed unchanged after only the `createManager` helper swap, validating the planning claim that they exercise behavior, not queue internals.
+- The pre-completion-reviewer (on `anthropic/claude-sonnet-4-6`, 161s, 21 tool uses) caught a stale comment at `src/index.ts:125` that all four deterministic gates (`check`, `lint`, `test`, `fallow dead-code`) passed over.
+  This is the backstop working exactly as intended — a judgment-model review surfacing residue that pattern-matchers cannot.
+- Verification cadence was incremental, not end-loaded: file-scoped `vitest` + `biome` + `eslint` after step 1, `pnpm run check` immediately after the shared-interface change mid-step-2 (per the plan's own instruction), then lifecycle suite → full suite → full lint, then `rumdl` for the docs step, then the full gates + `fallow` before push.
+#### What caused friction (agent side)
+- `missing-context` (self/reviewer-caught) — the stale comment `// before startAgent / queue drain` at `src/index.ts:125` referenced two deleted concepts but was not cataloged in the plan's Module-Level Changes, despite the planning grep output having surfaced that exact line.
+  The grep hit was visible but never converted into a plan action or an explicit leave-as-is.
+  Impact: one small follow-up commit (`90135005`, `refactor:`); no rework, no design impact — the reviewer backstop absorbed it before ship.
+#### What caused friction (user side)
+- None.
+  The single user touchpoint — the release-timing gate in `/ship-issue` (release now vs. batch the Phase 17 sequence) — was strategic judgment the agent correctly deferred, not mechanical oversight.
+### Diagnostic details
+- **Model-performance correlation** — one subagent dispatch (`pre-completion-reviewer`) on `anthropic/claude-sonnet-4-6`; appropriate match for judgment-heavy review, and it returned the session's only actionable finding.
+- **Escalation-delay tracking** — no rabbit-holes; the lone lint error (`@typescript-eslint/no-floating-promises`, 18 sites) was resolved in a single test-file rewrite, far under the 5-call escalation threshold.
+- **Unused-tool detection** — nothing under-tooled; `colgrep`/`grep` were used during planning exploration and the reviewer subagent was dispatched as designed.
+- **Feedback-loop gap analysis** — no gap; verification ran after every cycle, with `pnpm run check` correctly invoked right after the shared-interface change rather than at end-of-session.
+#### Process note (no inline change)
+- The release-please PR merge required the documented `UNSTABLE` → `gh pr merge` fallback (step 6.4 of `/ship-issue`) because default-`GITHUB_TOKEN` release PRs never get checks.
+  This recurs every release; the prompt already handles it, so it is recorded here only as a standing pattern, not a friction point.
+### Changes made
+1. Added this Final Retrospective stage entry to `packages/pi-subagents/docs/retro/0381-replace-concurrency-queue-with-limiter.md`.
+2. No prompt or `AGENTS.md` changes — the operator chose retro-file-only, since the single friction (the stale `src/index.ts:125` comment) was a one-off execution slip already caught by the pre-completion-reviewer backstop, and the candidate grep-hit rule was judged not worth the prompt verbosity.
+[#378]: https://github.com/gotgenes/pi-packages/issues/378

package/docs/retro/0400-include-parent-prompt-in-replace-mode.md CHANGED

@@ -41,4 +41,44 @@ Test count went from 973 to 975 (+2 net new tests) across 59 test files.
 - Pre-completion reviewer: WARN — one finding: `.pi/skills/package-pi-subagents/SKILL.md` still said "prepends" for the `<active_agent>` tag; fixed in a follow-up `docs:` commit before shipping.
 - No deviations from the plan's Module-Level Changes list; no lockfile changes; fallow dead-code exited zero.
+## Stage: Final Retrospective (2026-06-14T01:11:10Z)
+### Session summary
+Shipped #400 across three stages (Planning on `claude-opus-4-8`, TDD + Ship on `claude-sonnet-4-6`) as a single-function edit to `buildAgentPrompt()`'s replace branch plus tests and doc updates, released as `pi-subagents` v16.0.0 (major, breaking `perf!:`).
+The run was clean end-to-end: two `ask_user` gates during planning, a 3-cycle TDD pass, one pre-completion WARN resolved before push, and a no-friction release-please merge.
+### Observations
+#### What went well
+- Cross-extension investigation on demand — when the operator asked mid-`ask_user` how the `genericBase` fallback interacts with `@gotgenes/pi-anthropic-auth`, the agent read that sibling repo's `system-prompt-shaping.ts` and `request-shaping.ts` and proved no new interaction (billing header prepended unconditionally; de-fingerprinting keys off `PI_DEFAULT_PROMPT_PREFIX`, absent from the neutral `genericBase`) before answering.
+  This converted an open worry into a documented Risk row rather than a deferred unknown.
+- Emergent-scope surfacing — planning noticed that built-in `Explore`/`Plan` are replace-mode agents and so are visibly affected, then confirmed uniform application via a second `ask_user` instead of assuming.
+- Autoformat discipline — after `pi-autoformat` touched `README.md` mid-edit, the agent re-read the region before the next edit (turns 49–50) rather than matching against stale layout, avoiding a failed `oldText`.
+#### What caused friction (agent side)
+- `missing-context` (planning) — the plan listed the README's Patch 3 `<active_agent>` "prepends" wording as a doc update but missed the identical Patch 3 description in `.pi/skills/package-pi-subagents/SKILL.md`.
+  Exact-grep during planning keyed on removed strings (`You are a pi coding agent sub-agent`, `prompt_mode`); the stale prose carried none of them, so the skill file's "prepends `<active_agent>`" line was not found.
+  Impact: the pre-completion reviewer caught it as a WARN, requiring one follow-up `docs:` commit (8e93d2a4) during TDD before push — no rework beyond that, and the safety net worked as designed.
+#### What caused friction (user side)
+- None — the operator's mid-planning OAuth question was a high-value redirect that strengthened the plan, not friction.
+### Diagnostic details
+- **Model-performance correlation** — judgment-heavy planning ran on `claude-opus-4-8`; mechanical TDD execution and the deterministic ship steps ran on `claude-sonnet-4-6`.
+  Appropriate assignment in both directions; no mismatch.
+- **Unused-tool detection** — the `colgrep` skill was loaded in planning but never used; exploration was all exact-symbol grep, which was correct for known symbols.
+  The one place it would have helped is the `missing-context` friction: a semantic search like "docs describing how the active_agent tag is added to the system prompt" would likely have surfaced both the README and the SKILL.md descriptions that symbol-grep missed.
+- **Feedback-loop gap analysis** — verification ran incrementally throughout (green baseline before cycle 1, per-file `vitest` each cycle, full suite + `check` + `lint` + `fallow` after the last step).
+  No end-loaded verification.
+- **Escalation-delay tracking** — no rabbit-holes; no error sequence exceeded one tool call.
+### Changes made
+1. `.pi/prompts/plan-issue.md` — extended the Module-Level Changes grep bullet: when a step reworks a documented mechanism's behavior (rather than removing a symbol), grep `.pi/skills/package-*/SKILL.md` for the mechanism name, since reworded prose carries no removed symbol to match.
 [#180]: https://github.com/gotgenes/pi-packages/issues/180

package/docs/retro/0403-abort-subagents-on-interrupt.md ADDED

@@ -0,0 +1,49 @@
+---
+issue: 403
+issue_title: "Pressing Escape does not stop subagent/background agent"
+---
+# Retro: #403 — Pressing Escape does not stop subagent/background agent
+## Stage: Planning (2026-06-14T00:00:00Z)
+### Session summary
+Investigated the third-party bug report that ESC does not stop subagents and traced the abort path through both the package and the pinned Pi SDK peer deps.
+Found that foreground subagents already receive the parent abort signal end-to-end, while background subagents are detached with no interrupt wiring — the reproducible bug.
+Confirmed direction with the operator via `ask_user` (third-party gate): implement ESC-to-abort for both modes, with a foreground guard test, aborting all running and queued background agents.
+Wrote and committed plan `0403-abort-subagents-on-interrupt.md`.
+### Observations
+- Key SDK fact that de-risks the design: in `pi-agent-core` `agent.js`, each run creates a fresh `AbortController` and `finishRun()` discards it **without** aborting on normal completion.
+  So the parent signal's `abort` event fires only on a real ESC interrupt — latching `abortAll()` to it will not spuriously kill background agents at turn end.
+- Chosen mechanism: a small `InterruptHandler` driven by `pi.on("turn_start", ...)`, re-latching `ctx.signal` each turn so the latch tracks the live per-run signal even across runs and tool-less turns.
+  `turn_start` was preferred over `tool_execution_start` because a background agent can outlive the run that spawned it; a turn-level latch still holds the current run's signal when the user interrupts a later tool-less turn.
+- Reused the existing `manager.abortAll()` rather than adding `abortBackground()`.
+  Foreground agents are already aborted via their own `wireSignal`, so `abortAll()`'s overlap is redundant-but-harmless (status-guarded `abort()`, idempotent `markStopped`).
+  The manager does not store `isBackground` on the record, so distinguishing modes would need extra state — deferred as an Open Question.
+- Classified as a non-breaking `fix:` (not `fix!:`): no config key, default, or output shape changes; detached-survives-ESC was a limitation, not a contract.
+  Noted the behavior change explicitly in Goals.
+- Foreground path is believed already-correct from the code trace; the plan adds a regression guard in `subagent-session.test.ts` (`forwardAbortSignal` is currently untested for the parent-signal path) and will fix only if the guard fails.
+## Stage: Implementation — TDD (2026-06-14T18:00:00Z)
+### Session summary
+Completed all three TDD cycles against a green baseline (967 tests).
+Added the foreground-abort guard, implemented `InterruptHandler` + `turn_start` wiring, and updated the architecture doc.
+Test count went from 967 to 975 (+8: 6 `InterruptHandler` unit tests, 2 foreground guard tests); `check`, `lint`, `test`, and `fallow dead-code` all pass.
+### Observations
+- The foreground guard (Step 1) passed on the first run, confirming the planning-stage code trace: the parent signal already reaches the child `session.abort()` via `forwardAbortSignal`.
+  No code fix was needed, so it landed as `test:` exactly as the plan anticipated.
+- `InterruptHandler` came out clean against the `code-design` heuristics — one field read from `ctx`, one method on a one-method `InterruptManager` interface, latch state owned internally, `{ once: true }` listener.
+  The reviewer's code-design check was PASS with no structural concerns.
+- `abortAll()` gained a second narrow-interface consumer (the new handler) on top of the shutdown path; `fallow dead-code` stayed green, so its existing `fallow-ignore-next-line unused-class-member` comment was left untouched.
+- Pre-completion reviewer: **WARN**.
+- Reviewer warnings: stale source-file counts in `architecture.md`.
+  Fixed the current-state prose claim (`56` → `58` source files).
+  Left the fallow health-metrics snapshot rows (line ~650, `7,778 (57 files)`) intact — those are point-in-time analysis tables where the file count was computed alongside LOC and other metrics, so bumping one cell in isolation would desync the snapshot.
+  Amended the fix into the docs commit (not yet pushed).

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@gotgenes/pi-subagents",
-  "version": "16.0.0",
+  "version": "16.1.1",
   "type": "module",
   "exports": {
     ".": {

package/src/handlers/index.ts CHANGED

@@ -1,2 +1,3 @@
+export { InterruptHandler } from "#src/handlers/interrupt";
 export { SessionLifecycleHandler } from "#src/handlers/lifecycle";
 export { ToolStartHandler } from "#src/handlers/tool-start";

package/src/handlers/interrupt.ts ADDED

@@ -0,0 +1,49 @@
+/**
+ * turn_start event handler that aborts subagents on a parent interrupt (ESC).
+ *
+ * The parent agent loop creates a fresh AbortController per run and only aborts
+ * it on an explicit interrupt — never on normal completion. So latching to the
+ * current run's signal and aborting on its `abort` event fires exactly on ESC.
+ *
+ * `turn_start` carries the live per-run `ctx.signal`, so re-latching each turn
+ * keeps the handler tracking the current signal across runs and tool-less turns.
+ */
+/** Narrow manager interface — only the method the interrupt handler calls. */
+export interface InterruptManager {
+  abortAll(): number;
+}
+/** Minimal context shape — only the field the handler reads. */
+interface InterruptCtx {
+  signal: AbortSignal | undefined;
+}
+/**
+ * Latches the current parent abort signal and aborts all subagents when it fires.
+ *
+ * The latch dedups by reference: most turns reuse the same signal (no-op); a new
+ * run's signal triggers a detach-and-rewire. The `abort` listener is one-shot.
+ */
+export class InterruptHandler {
+  private latched?: AbortSignal;
+  private detach?: () => void;
+  constructor(private readonly manager: InterruptManager) {}
+  handleTurnStart(ctx: InterruptCtx): void {
+    const signal = ctx.signal;
+    if (signal === this.latched) return;
+    this.detach?.();
+    this.detach = undefined;
+    this.latched = signal;
+    if (!signal) return;
+    const onAbort = (): void => {
+      this.manager.abortAll();
+    };
+    signal.addEventListener("abort", onAbort, { once: true });
+    this.detach = () => signal.removeEventListener("abort", onAbort);
+  }
+}
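
The latch behavior described in the file header can be exercised in isolation. The following is a minimal sketch, not the package's test suite: `SketchInterruptHandler` restates the class from the diff above, while `fakeManager`, `abortCalls`, `run1`, and `run2` are hypothetical names introduced only for this illustration. It assumes a runtime with a global `AbortController` (for example, a current Node.js release).

```typescript
// Narrow manager interface, as in the diff above.
interface InterruptManager {
  abortAll(): number;
}

// Restated sketch of the handler (illustrative, not the shipped source).
class SketchInterruptHandler {
  private latched: AbortSignal | undefined;
  private detach: (() => void) | undefined;
  private readonly manager: InterruptManager;

  constructor(manager: InterruptManager) {
    this.manager = manager;
  }

  handleTurnStart(ctx: { signal: AbortSignal | undefined }): void {
    const signal = ctx.signal;
    if (signal === this.latched) return; // same signal: nothing to rewire
    this.detach?.(); // drop the listener on the previous run's signal
    this.detach = undefined;
    this.latched = signal;
    if (!signal) return;
    const onAbort = (): void => {
      this.manager.abortAll();
    };
    signal.addEventListener("abort", onAbort, { once: true });
    this.detach = () => signal.removeEventListener("abort", onAbort);
  }
}

// Hypothetical fake manager that only counts abortAll() calls.
let abortCalls = 0;
const fakeManager: InterruptManager = { abortAll: () => ++abortCalls };
const handler = new SketchInterruptHandler(fakeManager);

const run1 = new AbortController();
handler.handleTurnStart({ signal: run1.signal });
handler.handleTurnStart({ signal: run1.signal }); // deduped by reference

const run2 = new AbortController();
handler.handleTurnStart({ signal: run2.signal }); // detaches from run1

run1.abort(); // stale signal: listener was removed
const callsAfterStaleAbort = abortCalls;

run2.abort(); // live signal: abortAll() fires once
const callsAfterLiveAbort = abortCalls;
```

In this sketch, aborting the stale `run1` signal leaves the counter untouched, and aborting the currently latched `run2` signal triggers exactly one `abortAll()` call.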

package/src/index.ts CHANGED

@@ -22,9 +22,9 @@ import {
 } from "@earendil-works/pi-coding-agent";
 import { AgentTypeRegistry } from "#src/config/agent-types";
 import { loadCustomAgents } from "#src/config/custom-agents";
-import { SessionLifecycleHandler, ToolStartHandler } from "#src/handlers/index";
+import { InterruptHandler, SessionLifecycleHandler, ToolStartHandler } from "#src/handlers/index";
 import { createChildLifecyclePublisher } from "#src/lifecycle/child-lifecycle";
-import { ConcurrencyQueue } from "#src/lifecycle/concurrency-queue";
+import { ConcurrencyLimiter } from "#src/lifecycle/concurrency-limiter";
 import { createSubagentSession, type SubagentSessionDeps } from "#src/lifecycle/create-subagent-session";
 import { buildParentSnapshot } from "#src/lifecycle/parent-snapshot";
 import { SubagentManager, type SubagentManagerObserver } from "#src/lifecycle/subagent-manager";
@@ -66,12 +66,12 @@ export default function (pi: ExtensionAPI) {
   );
   // Settings: owns all three in-memory values and handles load/save/emit.
-  // onMaxConcurrentChanged is wired to the queue directly (closure captures by reference).
+  // onMaxConcurrentChanged is wired to the limiter directly (closure captures by reference).
   const settings = new SettingsManager({
     emit: (event, payload) => pi.events.emit(event, payload),
     cwd: process.cwd(),
     agentDir: getAgentDir(),
-    onMaxConcurrentChanged: () => queue.drain(),
+    onMaxConcurrentChanged: () => limiter.recheck(),
   });
   settings.load();
@@ -122,7 +122,7 @@ export default function (pi: ExtensionAPI) {
       });
     },
     onSubagentCreated(record) {
-      // Emit created event for background agents (before startAgent / queue drain).
+      // Emit created event for background agents (before limiter admission).
       pi.events.emit("subagents:created", {
         id: record.id,
         type: record.type,
@@ -150,22 +150,15 @@ export default function (pi: ExtensionAPI) {
     lifecycle: createChildLifecyclePublisher((channel, data) => pi.events.emit(channel, data)),
   };
-  // ConcurrencyQueue: scheduling extracted from SubagentManager.
-  // startAgent callback forward-references manager via closure (safe — drain is never called during construction).
-  const queue = new ConcurrencyQueue(
-    () => settings.maxConcurrent,
-    (id) => {
-      const agent = manager.getRecord(id);
-      if (agent?.status !== "queued") return;
-      agent.promise = agent.run();
-    },
-  );
+  // ConcurrencyLimiter: schedules background run thunks FIFO against the limit.
+  // It knows nothing about agents or the manager — dependency direction is strictly manager → limiter.
+  const limiter = new ConcurrencyLimiter(() => settings.maxConcurrent);
   const manager = new SubagentManager({
     createSubagentSession: (params) => createSubagentSession(params, subagentSessionDeps),
     baseCwd: process.cwd(),
     observer,
-    queue,
+    limiter,
     getRunConfig: () => settings,
   });
@@ -192,6 +185,10 @@ export default function (pi: ExtensionAPI) {
   const toolStart = new ToolStartHandler(runtime);
   pi.on("tool_execution_start", (event, ctx) => toolStart.handleToolExecutionStart(event, ctx));
+  // Abort all subagents when the parent agent loop is interrupted (ESC).
+  const interrupt = new InterruptHandler(manager);
+  pi.on("turn_start", (_event, ctx) => interrupt.handleTurnStart(ctx));
   // ---- Agent tool ----
   pi.registerTool(new AgentTool(manager, runtime, settings, registry, getAgentDir()).toToolDefinition());

package/src/lifecycle/concurrency-limiter.ts ADDED

@@ -0,0 +1,55 @@
+/**
+ * concurrency-limiter.ts — FIFO admission gate for background work.
+ *
+ * Schedules run closures (thunks) against a dynamic limit, running them in
+ * scheduling order as slots free. The limiter knows nothing about agents, IDs,
+ * or the manager — it owns only the active count and the pending queue.
+ *
+ * Every scheduled promise settles: it follows the task's settlement when the
+ * task runs, or resolves early if clear() drops it before it starts.
+ */
+export class ConcurrencyLimiter {
+	private active = 0;
+	private readonly pending: Array<{ start: () => void; settle: () => void }> = [];
+	constructor(private readonly getLimit: () => number) {}
+	/**
+	 * Schedule a task to run FIFO once a slot is free.
+	 * Returns a promise that settles with the task, or resolves early if the
+	 * task is dropped by clear() before it starts.
+	 */
+	schedule(task: () => Promise<void>): Promise<void> {
+		const { promise, resolve, reject } = Promise.withResolvers<void>(); // eslint-disable-line @typescript-eslint/no-invalid-void-type -- Promise.withResolvers<void> is valid; rule does not allow void in generic fn call type args
+		this.pending.push({
+			start: () => {
+				this.active++;
+				task()
+					.then(resolve, reject)
+					.finally(() => {
+						this.active--;
+						this.recheck();
+					});
+			},
+			settle: resolve,
+		});
+		this.recheck();
+		return promise;
+	}
+	/** Start pending tasks until the limit is reached. Call when the limit may have grown. */
+	recheck(): void {
+		while (this.active < this.getLimit()) {
+			const next = this.pending.shift();
+			if (!next) break;
+			next.start();
+		}
+	}
+	/** Drop all pending tasks, resolving their promises without running them. */
+	clear(): void {
+		const dropped = this.pending.splice(0);
+		for (const task of dropped) task.settle();
+	}
+}
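
To illustrate the FIFO admission gate and the effect of `recheck()` and `clear()`, here is a small self-contained sketch. `SketchLimiter` mirrors the class in the diff above but uses the plain `Promise` constructor instead of `Promise.withResolvers`, so it also runs on older runtimes. The names `started`, `releases`, `makeTask`, and the mutable `limit` variable are assumptions made for this example and do not appear in the package.

```typescript
// Illustrative restatement of the limiter (not the shipped source).
class SketchLimiter {
  private active = 0;
  private readonly pending: Array<{ start: () => void; settle: () => void }> = [];
  private readonly getLimit: () => number;

  constructor(getLimit: () => number) {
    this.getLimit = getLimit;
  }

  schedule(task: () => Promise<void>): Promise<void> {
    return new Promise<void>((resolve, reject) => {
      this.pending.push({
        start: () => {
          this.active++;
          task()
            .then(resolve, reject)
            .finally(() => {
              this.active--;
              this.recheck();
            });
        },
        settle: resolve, // used by clear() to resolve a dropped task
      });
      this.recheck();
    });
  }

  recheck(): void {
    while (this.active < this.getLimit()) {
      const next = this.pending.shift();
      if (!next) break;
      next.start();
    }
  }

  clear(): void {
    for (const task of this.pending.splice(0)) task.settle();
  }
}

// Hypothetical harness: tasks record their start order and stay pending
// until released manually.
const started: string[] = [];
const releases: Array<() => void> = [];
let limit = 2;
const limiter = new SketchLimiter(() => limit);
const makeTask = (name: string) => (): Promise<void> => {
  started.push(name);
  return new Promise<void>((resolve) => releases.push(resolve));
};

void limiter.schedule(makeTask("a")); // starts immediately
void limiter.schedule(makeTask("b")); // starts immediately
void limiter.schedule(makeTask("c")); // waits: limit of 2 reached
const startedAtLimit = started.join(",");

limit = 3;
limiter.recheck(); // limit grew, so "c" is admitted
const startedAfterRecheck = started.join(",");

void limiter.schedule(makeTask("d")); // waits: limit of 3 reached
limiter.clear(); // "d" is dropped; its promise resolves without running
limit = 10;
limiter.recheck(); // nothing left to admit
const startedAfterClear = started.join(",");

for (const release of releases) release(); // let the running tasks settle
```

The `void` prefix on each unawaited `schedule(...)` call follows the approach the retro describes for satisfying `@typescript-eslint/no-floating-promises`.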

package/src/lifecycle/subagent-manager.ts CHANGED

@@ -2,18 +2,19 @@
  * subagent-manager.ts - Tracks subagents, background execution, resume support.
  *
  * Background agents are subject to a configurable concurrency limit (default: 4).
- * Excess agents are queued and auto-started as running agents complete.
- * Foreground agents bypass the queue (they block the parent anyway).
+ * Excess agents are scheduled on a ConcurrencyLimiter and auto-started as running
+ * agents complete. Foreground agents bypass the limiter (they block the parent anyway).
  */
 import { randomUUID } from "node:crypto";
 import type { Model } from "@earendil-works/pi-ai";
 import { debugLog } from "#src/debug";
-import type { ConcurrencyQueue } from "#src/lifecycle/concurrency-queue";
+import type { ConcurrencyLimiter } from "#src/lifecycle/concurrency-limiter";
 import type { CreateSubagentSessionParams } from "#src/lifecycle/create-subagent-session";
 import type { ParentSnapshot } from "#src/lifecycle/parent-snapshot";
 import { Subagent, type SubagentLifecycleObserver } from "#src/lifecycle/subagent";
 import type { SubagentSession } from "#src/lifecycle/subagent-session";
+import { SubagentState } from "#src/lifecycle/subagent-state";
 import type { WorkspaceProvider } from "#src/lifecycle/workspace";
 import type { RunConfig } from "#src/runtime";
@@ -31,8 +32,8 @@ export interface SubagentManagerObserver {
 export interface SubagentManagerOptions {
   /** Assembly factory that produces a born-complete SubagentSession per spawn. */
   createSubagentSession: (params: CreateSubagentSessionParams) => Promise<SubagentSession>;
-  /** Concurrency queue — owns scheduling, limit checks, and drain logic. */
-  queue: ConcurrencyQueue;
+  /** Concurrency limiter — schedules background run thunks FIFO against the limit. */
+  limiter: ConcurrencyLimiter;
   /** Base working directory handed to a workspace provider (the parent cwd). */
   baseCwd: string;
   getRunConfig?: () => RunConfig;
@@ -67,7 +68,7 @@ export class SubagentManager {
   private cleanupInterval: ReturnType<typeof setInterval>;
   private readonly observer?: SubagentManagerObserver;
   private readonly createSubagentSession: (params: CreateSubagentSessionParams) => Promise<SubagentSession>;
-  private readonly queue: ConcurrencyQueue;
+  private readonly limiter: ConcurrencyLimiter;
   private readonly baseCwd: string;
   private getRunConfig?: () => RunConfig;
   private _workspaceProvider?: WorkspaceProvider;
@@ -79,7 +80,7 @@ export class SubagentManager {
   constructor(options: SubagentManagerOptions) {
     this.createSubagentSession = options.createSubagentSession;
-    this.queue = options.queue;
+    this.limiter = options.limiter;
     this.baseCwd = options.baseCwd;
     this.observer = options.observer;
     this.getRunConfig = options.getRunConfig;
@@ -109,7 +110,6 @@ export class SubagentManager {
   private buildObserver(options: AgentSpawnConfig): SubagentLifecycleObserver {
     return {
       onStarted: (agent) => {
-        if (options.isBackground) this.queue.markStarted();
         this.observer?.onSubagentStarted(agent);
       },
       onSessionCreated: options.observer?.onSessionCreated
@@ -117,7 +117,6 @@ export class SubagentManager {
         : undefined,
       onRunFinished: (agent) => {
         if (options.isBackground) {
-          this.queue.markFinished();
           try { this.observer?.onSubagentCompleted(agent); } catch (err) { debugLog("onSubagentCompleted observer", err); }
         }
       },
@@ -142,23 +141,25 @@ export class SubagentManager {
       id,
       type,
       description: options.description,
-      status: options.isBackground ? "queued" : "running",
-      startedAt: Date.now(),
       invocation: options.invocation,
-      // Run config
-      snapshot,
-      prompt,
-      model: options.model,
-      maxTurns: options.maxTurns,
-      thinkingLevel: options.thinkingLevel,
-      parentSession: options.parentSession,
-      signal: options.signal,
-      // Shared deps
-      createSubagentSession: this.createSubagentSession,
-      observer: this.buildObserver(options),
-      getRunConfig: this.getRunConfig,
-      baseCwd: this.baseCwd,
-      getWorkspaceProvider: () => this._workspaceProvider,
+      state: new SubagentState({
+        status: options.isBackground ? "queued" : "running",
+        startedAt: Date.now(),
+      }),
+      execution: {
+        createSubagentSession: this.createSubagentSession,
+        snapshot,
+        prompt,
+        baseCwd: this.baseCwd,
+        observer: this.buildObserver(options),
+        getRunConfig: this.getRunConfig,
+        getWorkspaceProvider: () => this._workspaceProvider,
+        model: options.model,
+        maxTurns: options.maxTurns,
+        thinkingLevel: options.thinkingLevel,
+        parentSession: options.parentSession,
+        signal: options.signal,
+      },
     });
     this.agents.set(id, record);
@@ -166,9 +167,13 @@ export class SubagentManager {
       this.observer?.onSubagentCreated(record);
     }
-    if (options.isBackground && !options.bypassQueue && this.queue.isFull()) {
-      // Queue it - will be started when a running agent completes
-      this.queue.enqueue(id);
+    if (options.isBackground && !options.bypassQueue) {
+      // Schedule on the limiter — started when a slot frees. The status guard
+      // makes an abort-while-queued task a no-op (Step 3 folds it into start()).
+      record.promise = this.limiter.schedule(() => {
+        if (record.status !== "queued") return Promise.resolve();
+        return record.run();
+      });
       return id;
     }
@@ -221,9 +226,9 @@ export class SubagentManager {
     const record = this.agents.get(id);
     if (!record) return false;
-    // Remove from queue if queued
+    // A queued agent has not started; mark it stopped. Its scheduled thunk
+    // becomes a no-op (status guard) when its slot finally opens.
     if (record.status === "queued") {
-      this.queue.dequeue(id);
       record.markStopped();
       return true;
     }
@@ -269,43 +274,44 @@ export class SubagentManager {
   // fallow-ignore-next-line unused-class-member
   abortAll(): number {
     let count = 0;
-    // Clear queued agents first
-    for (const id of this.queue.queuedIds) {
-      const record = this.agents.get(id);
-      if (record) {
+    for (const record of this.agents.values()) {
+      if (record.status === "queued") {
         record.markStopped();
         count++;
+      } else if (record.abort()) {
+        count++;
       }
     }
-    this.queue.clear();
-    // Abort running agents
-    for (const record of this.agents.values()) {
-      if (record.abort()) count++;
-    }
+    // Drop pending thunks (their promises resolve).
+    this.limiter.clear();
     return count;
   }
   /** Wait for all running and queued agents to complete (including queued ones). */
   // fallow-ignore-next-line unused-class-member
   async waitForAll(): Promise<void> {
-    // Loop because queue.drain() respects the concurrency limit - as running
-    // agents finish they start queued ones, which need awaiting too.
-    // eslint-disable-next-line @typescript-eslint/no-unnecessary-condition -- intentional infinite loop with explicit break
-    while (true) {
-      this.queue.drain();
-      const pending = [...this.agents.values()]
-        .filter(r => r.status === "running" || r.status === "queued")
-        .map(r => r.promise)
-        .filter((p): p is Promise<void> => p != null);
-      if (pending.length === 0) break;
+    // Every spawned agent has a settled-on-completion promise (the limiter starts
+    // queued ones as slots free), so a single allSettled covers the queued case.
+    // The loop only catches agents spawned during the wait.
+    let pending = this.pendingPromises();
+    while (pending.length > 0) {
       await Promise.allSettled(pending);
+      pending = this.pendingPromises();
     }
   }
+  /** Promises of all running/queued agents that have one. */
+  private pendingPromises(): Promise<void>[] {
+    return [...this.agents.values()]
+      .filter(r => r.status === "running" || r.status === "queued")
+      .map(r => r.promise)
+      .filter((p): p is Promise<void> => p != null);
+  }
   dispose() {
     clearInterval(this.cleanupInterval);
-    // Clear queue
-    this.queue.clear();
+    // Drop pending thunks
+    this.limiter.clear();
     for (const record of this.agents.values()) {
       record.disposeSession();
     }
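
The status guard used in the scheduled thunk (an agent stopped while queued becomes a no-op when its slot opens) can be sketched on its own. Everything below is hypothetical scaffolding for illustration: `SketchRecord`, `makeGuardedThunk`, and the plain array standing in for the limiter's pending queue are not part of the package.

```typescript
type Status = "queued" | "running" | "stopped";

// Hypothetical stand-in for a subagent record.
interface SketchRecord {
  status: Status;
  runCount: number;
}

// Builds a thunk that only runs the record if it is still queued
// at the moment the thunk is finally invoked.
function makeGuardedThunk(record: SketchRecord): () => Promise<void> {
  return () => {
    if (record.status !== "queued") return Promise.resolve(); // stopped meanwhile
    record.status = "running";
    record.runCount++;
    return Promise.resolve();
  };
}

const stoppedWhileQueued: SketchRecord = { status: "queued", runCount: 0 };
const stillQueued: SketchRecord = { status: "queued", runCount: 0 };

// A plain array stands in for the limiter's pending queue.
const pendingThunks = [
  makeGuardedThunk(stoppedWhileQueued),
  makeGuardedThunk(stillQueued),
];

// Equivalent of aborting a queued agent: it is marked stopped,
// but its thunk is left in the queue.
stoppedWhileQueued.status = "stopped";

// Later, slots open and the thunks are invoked in order.
for (const thunk of pendingThunks) void thunk();
```

Under these assumptions the stopped record never runs, while the record that was still queued transitions to running exactly once. This is why the manager no longer needs a `dequeue(id)` call.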