npm - @gotgenes/pi-subagents - Versions diffs - 5.1.0 → 5.3.0 - Mend

@gotgenes/pi-subagents 5.1.0 → 5.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (26)

package/CHANGELOG.md +37 -0
package/README.md +176 -133
package/docs/architecture/architecture.md +148 -92
package/docs/decisions/0001-deferred-patches.md +11 -5
package/docs/plans/0048-implement-subagents-api.md +2 -1
package/docs/plans/0049-remove-group-join-output-file-rpc.md +22 -5
package/docs/plans/0051-update-adr-0001-hard-fork.md +2 -1
package/docs/plans/0052-remove-scheduled-subagents.md +4 -2
package/docs/plans/0057-structured-debug-logging.md +22 -52
package/docs/plans/0069-create-subagent-runtime.md +345 -0
package/docs/plans/0071-extract-session-config-assembler.md +362 -0
package/docs/retro/0049-remove-group-join-output-file-rpc.md +15 -4
package/docs/retro/0051-update-adr-0001-hard-fork.md +7 -3
package/docs/retro/0053-extract-model-resolution-from-execute.md +14 -4
package/docs/retro/0054-decompose-index-into-modules.md +20 -5
package/docs/retro/0057-structured-debug-logging.md +77 -0
package/docs/retro/0069-create-subagent-runtime.md +43 -0
package/package.json +1 -1
package/src/agent-manager.ts +7 -0
package/src/agent-runner.ts +51 -189
package/src/debug.ts +4 -2
package/src/index.ts +37 -28
package/src/runtime.ts +62 -0
package/src/session-config.ts +263 -0
package/src/tools/agent-tool.ts +4 -2
package/src/ui/agent-menu.ts +16 -13

package/docs/plans/0071-extract-session-config-assembler.md ADDED

@@ -0,0 +1,362 @@
+---
+issue: 71
+issue_title: "refactor: extract pure agent-session assembler from agent-runner.ts"
+---
+# Extract session-config assembler from agent-runner
+## Problem Statement
+`agent-runner.ts` `runAgent()` is ~390 lines (post-#69 cleanup) and mixes three concerns:
+1. Configuration assembly — resolve model, detect env, build prompt extras, preload skills, build memory blocks, assemble system prompt, compute tool names (~200 lines).
+2. Session construction — create `DefaultResourceLoader`, call `createAgentSession`, filter tools, bind extensions (~100 lines).
+3. Runtime orchestration — subscribe to events, enforce turn limits, collect results (~90 lines).
+The configuration assembly is deterministic given resolved inputs and does not need an `AgentSession`.
+Because it is inlined in `runAgent()`, it cannot be unit-tested without mocking the entire Pi SDK (`createAgentSession`, `DefaultResourceLoader`, `SessionManager`, `SettingsManager`).
+## Goals
+- Extract a pure `assembleSessionConfig()` function into a new `src/session-config.ts` module.
+- The assembler takes resolved inputs (agent config, environment info, narrow context) and returns a data object with everything `runAgent()` needs to create the session.
+- Reduce `runAgent()` to an IO shell: call the assembler, create SDK objects, wire subscriptions, and run the event loop.
+- Add focused unit tests for the assembler covering model resolution fallback chain, skill preloading, memory block selection (read-write vs read-only), prompt mode, tool name assembly, and disallowed-tool computation.
+- No behavior change.
+## Non-Goals
+- Changing the `RunResult` shape or `RunOptions` interface.
+- Refactoring the event subscription / turn-limit logic (stays in `runAgent()`).
+- Extracting `resumeAgent` or `steerAgent`.
+- Modifying the public API surface (`service.ts`).
+## Background
+### Prior art
+`pi-permission-system` extracted `evaluate()` — a pure function of `(surface, pattern, ruleset)` — from `PermissionManager.checkPermission()`.
+That made permission decisions independently testable without filesystem access or a manager instance.
+This plan follows the same pattern: extract a pure core from an IO-heavy function.
+### Current `runAgent()` structure
+Lines 220–460 of `agent-runner.ts` break into these logical phases:
+| Phase                           | Lines (approx) | SDK dependency                                           |
+| ------------------------------- | -------------- | -------------------------------------------------------- |
+| Config + agentConfig lookup     | 224–225        | None (agent-types registry)                              |
+| effectiveCwd                    | 228            | None                                                     |
+| detectEnv                       | 230            | `pi.exec` (async IO)                                     |
+| parentSystemPrompt              | 233            | `ctx.getSystemPrompt()`                                  |
+| extensions / skills resolution  | 237–245        | None                                                     |
+| Skill preloading                | 247–252        | `preloadSkills` (filesystem)                             |
+| Tool names + memory             | 254–274        | None (agent-types registry)                              |
+| System prompt assembly          | 277–303        | `buildAgentPrompt` (pure)                                |
+| noSkills flag                   | 306            | None                                                     |
+| DefaultResourceLoader           | 308–320        | `DefaultResourceLoader` (SDK)                            |
+| Model resolution                | 323–324        | `ctx.modelRegistry` (narrow)                             |
+| Thinking level                  | 327            | None                                                     |
+| sessionOpts construction        | 329–345        | `SessionManager`, `SettingsManager`, `getAgentDir` (SDK) |
+| createAgentSession              | 347            | SDK                                                      |
+| Tool filtering + bindExtensions | 350–400        | `session.*` methods (SDK)                                |
+| Event subscriptions + prompt    | 402–460        | `session.*` methods (SDK)                                |
+Everything above the `DefaultResourceLoader` row, plus the model-resolution and thinking-level rows, is configuration assembly, which is deterministic given resolved inputs.
+The remaining rows from `DefaultResourceLoader` onward are SDK orchestration.
+### Modules the assembler will call
+All are internal to this package — not Pi SDK:
+- `agent-types.ts` — `getConfig()`, `getAgentConfig()`, `getToolNamesForType()`, `getMemoryToolNames()`, `getReadOnlyMemoryToolNames()`
+- `prompts.ts` — `buildAgentPrompt()`
+- `memory.ts` — `buildMemoryBlock()`, `buildReadOnlyMemoryBlock()`
+- `skill-loader.ts` — `preloadSkills()`
+- `default-agents.ts` — `DEFAULT_AGENTS` (fallback config)
+### Relevant constraints from AGENTS.md
+- Keep modules focused and composable (one concern per file).
+- Keep Pi SDK imports out of business-logic modules.
+- Prefer explicit configuration over hidden behavior.
+- Business logic should be pure functions wherever possible — keep IO at the edges.
+### Issue #69 status
+Issue #69 (`SubagentRuntime`) is implemented.
+Module-scope mutable state has been removed from `agent-runner.ts`.
+`defaultMaxTurns` and `graceTurns` flow through `RunOptions`.
+This plan builds on the post-#69 codebase.
+## Design Overview
+### Separation of concerns
+`detectEnv()` is the only async IO call in the assembly phase — it calls `pi.exec()` to check git state.
+The assembler is synchronous and takes `EnvInfo` as a pre-resolved parameter.
+`runAgent()` calls `detectEnv()` first, then calls the assembler, then does SDK work.
+### Narrow context interface
+The assembler does not accept `ExtensionContext` — it accepts a narrow interface with only the fields it reads:
+```typescript
+interface AssemblerContext {
+  /** Parent working directory (overridable via options.cwd). */
+  cwd: string;
+  /** Parent's effective system prompt (for append-mode agents). */
+  parentSystemPrompt: string;
+  /** Parent's current model instance (fallback when agent config has no model). */
+  parentModel?: Model<any>;
+  /** Model registry for resolving config.model strings. */
+  modelRegistry: ModelRegistry;
+}
+```
+`ModelRegistry` is a narrow interface (already exists in `model-resolver.ts`):
+```typescript
+interface ModelRegistry {
+  find(provider: string, modelId: string): Model<any> | undefined;
+  getAvailable?(): Model<any>[];
+}
+```
+Tests construct plain objects satisfying these interfaces — no SDK mocking needed.
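Illustration only, not part of the published file: a minimal sketch of what "plain objects satisfying these interfaces" might look like in a test. `StubModel`, `makeRegistry`, and `makeContext` are hypothetical names introduced here; `StubModel` stands in for the real `Model<any>` type, whose fields are not shown in this diff.

```typescript
// Hypothetical stand-in for Model<any>; only the fields this sketch reads.
interface StubModel {
  provider: string;
  id: string;
}

// Same shape as the plan's ModelRegistry, with StubModel substituted.
interface ModelRegistry {
  find(provider: string, modelId: string): StubModel | undefined;
  getAvailable?(): StubModel[];
}

// Same shape as the plan's AssemblerContext, with StubModel substituted.
interface AssemblerContext {
  cwd: string;
  parentSystemPrompt: string;
  parentModel?: StubModel;
  modelRegistry: ModelRegistry;
}

// Hypothetical helper: an in-memory registry backed by a fixed list.
function makeRegistry(models: StubModel[]): ModelRegistry {
  return {
    find: (provider, modelId) =>
      models.find((m) => m.provider === provider && m.id === modelId),
    getAvailable: () => [...models],
  };
}

// Hypothetical helper: a context with defaults that a test can override.
function makeContext(overrides: Partial<AssemblerContext> = {}): AssemblerContext {
  return {
    cwd: "/repo",
    parentSystemPrompt: "parent prompt",
    modelRegistry: makeRegistry([]),
    ...overrides,
  };
}
```

Because both helpers return object literals, a test would need no SDK module mocks to build its inputs, which appears to be the point of the narrow interfaces.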
+### Assembler signature
+```typescript
+function assembleSessionConfig(
+  type: SubagentType,
+  ctx: AssemblerContext,
+  options: AssemblerOptions,
+  env: EnvInfo,
+): SessionConfig;
+```
+`AssemblerOptions` is a narrow pick of `RunOptions`:
+```typescript
+interface AssemblerOptions {
+  cwd?: string;
+  isolated?: boolean;
+  model?: Model<any>;
+  thinkingLevel?: ThinkingLevel;
+}
+```
+### Return type
+```typescript
+interface SessionConfig {
+  /** Resolved working directory (options.cwd ?? ctx.cwd). */
+  effectiveCwd: string;
+  /** Fully-assembled system prompt string. */
+  systemPrompt: string;
+  /** Tool names for session creation and filtering. */
+  toolNames: string[];
+  /** Disallowed tool set from agent config (for filterActiveTools). */
+  disallowedSet: Set<string> | undefined;
+  /** Resolved extensions setting (for resource loader and tool filtering). */
+  extensions: boolean | string[];
+  /** Resolved model instance (or undefined → parent fallback). */
+  model: Model<any> | undefined;
+  /** Resolved thinking level (or undefined → inherit). */
+  thinkingLevel: ThinkingLevel | undefined;
+  /** Whether to skip skill loading in the resource loader. */
+  noSkills: boolean;
+  /** Prompt extras for transparency / debugging. */
+  extras: PromptExtras;
+}
+```
+### `resolveDefaultModel` moves to session-config.ts
+`resolveDefaultModel()` is a pure function that resolves model strings against a registry.
+It belongs in the assembler module alongside the other resolution logic.
+It becomes an internal function (not exported) — its behavior is tested through `assembleSessionConfig()`.
+### `filterActiveTools` stays in agent-runner.ts
+`filterActiveTools()` operates on a live session's active tool list.
+It runs twice (pre- and post-`bindExtensions`) and is an IO-layer concern.
+It stays in `agent-runner.ts` and consumes `toolNames`, `extensions`, and `disallowedSet` from the `SessionConfig` return.
+### `normalizeMaxTurns` stays in agent-runner.ts
+`normalizeMaxTurns()` is used in the turn-limit subscription callback — runtime orchestration, not config assembly.
+It stays in `agent-runner.ts`.
+### What runAgent() looks like after
+```typescript
+export async function runAgent(
+  ctx: ExtensionContext,
+  type: SubagentType,
+  prompt: string,
+  options: RunOptions,
+): Promise<RunResult> {
+  const effectiveCwd = options.cwd ?? ctx.cwd;
+  const env = await detectEnv(options.pi, effectiveCwd);
+  const config = assembleSessionConfig(type, {
+    cwd: ctx.cwd,
+    parentSystemPrompt: ctx.getSystemPrompt(),
+    parentModel: ctx.model,
+    modelRegistry: ctx.modelRegistry,
+  }, {
+    cwd: options.cwd,
+    isolated: options.isolated,
+    model: options.model,
+    thinkingLevel: options.thinkingLevel,
+  }, env);
+  // SDK orchestration: create loader, session, filter tools, bind, run
+  const agentDir = getAgentDir();
+  const loader = new DefaultResourceLoader({ ... });
+  await loader.reload();
+  const { session } = await createAgentSession({ ... });
+  // Tool filtering (two passes), bindExtensions, subscriptions, prompt
+  // ...same as today, using config.toolNames, config.disallowedSet, etc.
+}
+```
+Target: `runAgent()` drops to ~200 lines (down from ~390).
+### Edge cases
+- Unknown agent type: `getAgentConfig()` returns `undefined`.
+  The assembler falls back to `DEFAULT_AGENTS.get("general-purpose")` with `name: type`, matching the current `runAgent()` fallback.
+- Empty `builtinToolNames`: `getToolNamesForType()` already falls back to `BUILTIN_TOOL_NAMES`.
+- `isolated: true` overrides `extensions` and `skills` to `false` — same as today, now inside the assembler.
+- Memory block selection: write-capable agents (have `write` or `edit` in effective tool set, not denied) get read-write memory; others get read-only.
+  The denylist check uses `disallowedSet` from the agent config.
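Illustration only, not part of the published file: a minimal sketch of the isolation override and the memory-block selection rule described in the edge cases above. `applyIsolation` and `selectMemoryMode` are hypothetical names. The sketch assumes an agent counts as write-capable when at least one of `write` or `edit` is present in the tool set and not in the denylist; the plan does not spell out whether the real check is per tool.

```typescript
type MemoryMode = "read-write" | "read-only";

interface ResourceSettings {
  extensions: boolean | string[];
  skills: boolean | string[];
}

// Hypothetical: isolated mode forces both settings to false.
function applyIsolation(
  settings: ResourceSettings,
  isolated: boolean | undefined,
): ResourceSettings {
  return isolated ? { extensions: false, skills: false } : settings;
}

// Hypothetical: read-write memory only if a write-capable tool survives the denylist.
function selectMemoryMode(
  toolNames: string[],
  disallowedSet?: Set<string>,
): MemoryMode {
  const writeCapable = ["write", "edit"].some(
    (name) => toolNames.includes(name) && !disallowedSet?.has(name),
  );
  return writeCapable ? "read-write" : "read-only";
}
```

Under these assumptions, an agent with `write` in its tool set but `write` also in the denylist would receive the read-only block.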
+## Module-Level Changes
+### `src/session-config.ts` (new)
+- `AssemblerContext` interface — narrow context (cwd, parentSystemPrompt, parentModel, modelRegistry).
+- `AssemblerOptions` interface — narrow options subset (cwd, isolated, model, thinkingLevel).
+- `SessionConfig` interface — return type with all assembled configuration.
+- `assembleSessionConfig()` function — pure configuration assembly.
+- `resolveDefaultModel()` — moved from `agent-runner.ts` (internal, not exported).
+### `src/agent-runner.ts` (modified)
+- Import `assembleSessionConfig` and `SessionConfig` from `./session-config.js`.
+- Remove ~200 lines of configuration assembly from `runAgent()`.
+- Replace with a call to `assembleSessionConfig()` followed by SDK orchestration using the returned `SessionConfig`.
+- Remove `resolveDefaultModel()` (moved to session-config.ts).
+- `filterActiveTools()`, `normalizeMaxTurns()`, `collectResponseText()`, `getLastAssistantText()`, `forwardAbortSignal()` — all stay.
+- `RunOptions`, `RunResult`, `ToolActivity` — all stay (unchanged).
+### `test/session-config.test.ts` (new)
+- Unit tests for `assembleSessionConfig()` covering all assembly logic.
+- Tests use plain objects for `AssemblerContext` — no SDK mocks.
+- Mocks for `agent-types`, `prompts`, `memory`, `skill-loader` — simple function mocks.
+### `test/agent-runner.test.ts` (modified)
+- Existing tests stay as-is — they already mock the SDK and test the full `runAgent()` flow.
+- Tests that verified assembly details (e.g., `suppresses AGENTS.md/CLAUDE.md` or `passes effective cwd to the loader`) remain valid because `runAgent()` still does the SDK orchestration.
+- No tests are removed or rewritten.
+### `test/agent-runner-extension-tools.test.ts` (unchanged)
+- Tests extension-tool filtering via `filterActiveTools` — stays in `agent-runner.ts`.
+- No impact.
+## Test Impact Analysis
+### New unit tests enabled by the extraction
+1. Model resolution fallback chain — test that `assembleSessionConfig` returns the correct model for: explicit option model, config model string (valid/invalid), parent model fallback, and no model.
+2. Skill preloading — test that `skills: string[]` triggers `preloadSkills` and populates `extras.skillBlocks`; `skills: false` and `skills: true` skip preloading.
+3. Memory block selection — test read-write vs read-only memory based on tool availability and denylist interaction.
+4. Tool name assembly — test that `getToolNamesForType` result is augmented with memory tool names when memory is configured.
+5. Extensions / isolated interaction — test that `isolated: true` forces `extensions: false` and `skills: false`.
+6. System prompt assembly — test that `buildAgentPrompt` is called with the correct config, extras, and env.
+7. Disallowed tool set — test construction from `agentConfig.disallowedTools`.
+8. Unknown type fallback — test that missing `agentConfig` triggers the general-purpose fallback.
+9. Thinking level resolution — test explicit option vs config vs undefined.
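Illustration only, not part of the published file: a minimal sketch of the precedence that items 1 and 9 describe (explicit option, then config, then parent or undefined). `resolveModelSketch` and `resolveThinkingLevelSketch` are hypothetical names, `StubModel` stands in for `Model<any>`, and the `"provider/modelId"` string format for the config model is an assumption; the diff does not show how `resolveDefaultModel()` parses model strings.

```typescript
// Hypothetical stand-in for Model<any>.
interface StubModel {
  provider: string;
  id: string;
}

interface ModelRegistry {
  find(provider: string, modelId: string): StubModel | undefined;
}

// Precedence: explicit option, then config string via registry, then parent.
function resolveModelSketch(
  optionModel: StubModel | undefined,
  configModel: string | undefined,
  parentModel: StubModel | undefined,
  registry: ModelRegistry,
): StubModel | undefined {
  if (optionModel) return optionModel;
  if (configModel) {
    // Assumed format: "provider/modelId".
    const slash = configModel.indexOf("/");
    if (slash > 0) {
      const found = registry.find(
        configModel.slice(0, slash),
        configModel.slice(slash + 1),
      );
      if (found) return found;
    }
  }
  // An invalid or missing config model falls back to the parent (may be undefined).
  return parentModel;
}

// Precedence: explicit option, then config, then undefined (inherit).
function resolveThinkingLevelSketch(
  optionLevel: string | undefined,
  configLevel: string | undefined,
): string | undefined {
  return optionLevel ?? configLevel;
}
```

Each branch maps to one of the test cases listed above, so a table-driven test over these inputs would cover the whole chain.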
+### Existing tests that stay as-is
+All tests in `test/agent-runner.test.ts`, `test/agent-runner-extension-tools.test.ts`, and `test/agent-runner-settings.test.ts` continue to pass unchanged.
+They test the SDK orchestration layer which is not modified (only reduced in scope).
+The assembly logic they implicitly tested is now covered more thoroughly by `test/session-config.test.ts`.
+### Existing tests that could be simplified (future follow-up)
+Some `agent-runner.test.ts` tests verify assembly-layer behavior through the full `runAgent()` call (e.g., checking `defaultResourceLoaderCtor` args).
+These become redundant with the new assembler tests.
+Simplifying them is a separate follow-up — not part of this issue's scope.
+## TDD Order
+1. **Red: assembler returns correct defaults for a standard agent type.**
+   Create `test/session-config.test.ts` with a test that calls `assembleSessionConfig()` for the `"Explore"` type and asserts the returned `SessionConfig` shape: `effectiveCwd`, `systemPrompt`, `toolNames`, `extensions: false`, `noSkills: true`, `disallowedSet: undefined`.
+   Mock `agent-types`, `prompts`, `memory`, `skill-loader` at the module level.
+   This fails because `session-config.ts` does not exist yet.
+   Commit: `test: add session-config assembler test for default agent type`
+2. **Green: implement `assembleSessionConfig()` core path.**
+   Create `src/session-config.ts` with `AssemblerContext`, `AssemblerOptions`, `SessionConfig` interfaces and the `assembleSessionConfig()` function.
+   Implement the happy path: resolve config, compute effectiveCwd, resolve extensions/skills, build extras, build system prompt, compute toolNames, compute disallowedSet, resolve noSkills.
+   Tests go green.
+   Commit: `feat: add assembleSessionConfig in session-config.ts`
+3. **Red→Green: model resolution fallback chain.**
+   Add tests for: explicit option model wins, config model string resolves via registry, invalid config model falls back to parent, no model returns undefined.
+   Move `resolveDefaultModel()` from `agent-runner.ts` to `session-config.ts` (internal).
+   Commit: `test: model resolution fallback chain in session-config`
+4. **Red→Green: skill preloading paths.**
+   Add tests for: `skills: string[]` populates `extras.skillBlocks`, `skills: false` skips, `skills: true` skips preloading (loaded by resource loader instead), `isolated: true` forces skip.
+   Commit: `test: skill preloading paths in session-config`
+5. **Red→Green: memory block selection.**
+   Add tests for: agent with memory + write tools → read-write block, agent with memory + read-only tools → read-only block, agent with memory + denied write tools → read-only block, agent without memory → no block.
+   Commit: `test: memory block selection in session-config`
+6. **Red→Green: isolated mode, unknown type fallback, thinking level.**
+   Add tests for: `isolated: true` forces `extensions: false` and `noSkills: true`, unknown type falls back to general-purpose config, thinking level resolves from option > config > undefined.
+   Commit: `test: isolated mode, unknown type fallback, thinking level`
+7. **Refactor: wire `assembleSessionConfig` into `runAgent()`.**
+   Replace the configuration assembly block in `runAgent()` with a call to `assembleSessionConfig()`.
+   Use the returned `SessionConfig` fields to construct `DefaultResourceLoader`, `createAgentSession` opts, and `filterActiveTools` args.
+   Remove `resolveDefaultModel()` from `agent-runner.ts` (already moved in step 3).
+   Run full test suite — all existing `agent-runner.test.ts` tests pass unchanged.
+   Commit: `refactor: wire assembleSessionConfig into runAgent (#71)`
+8. **Verify acceptance criteria and clean up.**
+   Confirm `runAgent()` is ≤200 lines.
+   Confirm assembler tests run without mocking `AgentSession`, `ExtensionContext`, or Pi SDK types.
+   Confirm full test suite passes with no regressions.
+   Remove any dead imports.
+   Run `pnpm run check` for type safety.
+   Commit: `refactor: finalize session-config extraction (#71)`
+## Risks and Mitigations
+| Risk                                                                                                                            | Mitigation                                                                                                                                                                                                                                                              |
+| ------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| Assembly logic has subtle ordering dependencies (e.g., tool names must be computed before memory block selection)               | The assembler mirrors the exact order from `runAgent()` today; tests verify each dependency chain explicitly.                                                                                                                                                           |
+| Moving `resolveDefaultModel` changes import paths for any external consumer                                                     | `resolveDefaultModel` is not exported from the package — it is internal to `agent-runner.ts` today and internal to `session-config.ts` after the move. No external impact.                                                                                              |
+| Existing `agent-runner.test.ts` tests break when assembly is delegated                                                          | The tests mock `agent-types`, `prompts`, `memory`, `skill-loader` — the assembler calls the same functions through the same module paths, so existing mocks continue to intercept.                                                                                      |
+| `Model<any>` import from `@earendil-works/pi-ai` in the new module violates "keep Pi SDK imports out of business-logic modules" | `pi-ai` provides type-only interfaces (`Model`, `ThinkingLevel`) already used in `types.ts`. The constraint targets `pi-coding-agent` SDK types (`AgentSession`, `ExtensionContext`, `DefaultResourceLoader`). The assembler imports zero types from `pi-coding-agent`. |
+| The assembler's return type becomes a wide interface (9 fields)                                                                 | All fields are consumed by `runAgent()` — none are unused. The interface represents a single cohesive concept (session configuration). No consumer uses a subset; there is no narrowing opportunity.                                                                    |
+## Open Questions
+- Should `assembleSessionConfig` also resolve `effectiveCwd` internally (trivial: `options.cwd ?? ctx.cwd`) or should the caller pre-compute it?
+  The plan assumes the assembler computes it (self-contained), but `runAgent()` also needs `effectiveCwd` for `detectEnv()` before calling the assembler.
+  Resolution: `runAgent()` computes `effectiveCwd` once, passes it as `options.cwd` (already resolved) or as a separate parameter.
+  The assembler still computes `effectiveCwd` from its inputs, which produces the same value.
+  This duplication is benign — both paths yield `options.cwd ?? ctx.cwd`.
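Illustration only, not part of the published file: a minimal sketch of why computing `effectiveCwd` in both places is harmless. `resolveEffectiveCwd` is a hypothetical name for the expression `options.cwd ?? ctx.cwd` quoted above.

```typescript
// Hypothetical helper wrapping the expression from the plan.
function resolveEffectiveCwd(
  optionsCwd: string | undefined,
  ctxCwd: string,
): string {
  return optionsCwd ?? ctxCwd;
}

// First pass (caller side): no override, so the parent cwd is used.
const resolvedOnce = resolveEffectiveCwd(undefined, "/repo");

// Second pass (assembler side): feeding the resolved value back in as the
// override returns it unchanged, so the two computations cannot disagree.
const resolvedTwice = resolveEffectiveCwd(resolvedOnce, "/repo");
```

The same holds when the caller passes the raw `options.cwd` instead, since both sides then evaluate the identical expression on identical inputs.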

package/docs/retro/0049-remove-group-join-output-file-rpc.md CHANGED

@@ -23,15 +23,26 @@ A new issue (#61) was filed to port the output-file format to Pi's official JSON
 #### What caused friction (agent side)
-- `missing-context` — Included `output-file.ts` removal in the initial plan without questioning its debugging value, despite AGENTS.md's rule "Ask before removing functionality or changing defaults." The issue body explicitly listed it for removal so I followed the spec literally. Impact: required plan revision (amend commit), scope-narrowing comment on issue, and filing #61 — roughly 10 minutes of rework, but produced a better design.
+- `missing-context` — Included `output-file.ts` removal in the initial plan without questioning its debugging value, despite AGENTS.md's rule "Ask before removing functionality or changing defaults."
+  The issue body explicitly listed it for removal so I followed the spec literally.
+  Impact: required plan revision (amend commit), scope-narrowing comment on issue, and filing #61 — roughly 10 minutes of rework, but produced a better design.
-- `missing-context` — When asked whether output-file adheres to Pi's session format, searched the web (`web_search` for "Claude Code session JSONL format") instead of checking the local `~/development/pi/pi` monorepo. The user had to explicitly say "~/development/pi/pi has the code for Pi's JSONL format." Impact: one extra round-trip and less authoritative initial answer (Claude Code's format vs Pi's `SessionManager`). Self-identified after user redirect.
+- `missing-context` — When asked whether output-file adheres to Pi's session format, searched the web (`web_search` for "Claude Code session JSONL format") instead of checking the local `~/development/pi/pi` monorepo.
+  The user had to explicitly say "~/development/pi/pi has the code for Pi's JSONL format."
+  Impact: one extra round-trip and less authoritative initial answer (Claude Code's format vs Pi's `SessionManager`).
+  Self-identified after user redirect.
-- `instruction-violation` (self-identified) — Shell-escaped the `gh issue comment` body incorrectly; backtick-wrapped `src/output-file.ts` was interpreted by bash. Caught immediately via `gh issue view` and fixed with `--edit-last`. Impact: trivial — one extra command.
+- `instruction-violation` (self-identified) — Shell-escaped the `gh issue comment` body incorrectly; backtick-wrapped `src/output-file.ts` was interpreted by bash.
+  Caught immediately via `gh issue view` and fixed with `--edit-last`.
+  Impact: trivial — one extra command.
 #### What caused friction (user side)
-- The issue body listed output-file for removal without noting its debugging value. The user's "How confident are we in getting rid of the logging system?" intervention was the correction. If the issue had marked output-file removal as "tentative pending debugging value assessment," the plan would have surfaced it as a design decision from the start. Minor — the discussion was quick and productive.
+- The issue body listed output-file for removal without noting its debugging value.
+  The user's "How confident are we in getting rid of the logging system?" intervention was the correction.
+  If the issue had marked output-file removal as "tentative pending debugging value assessment," the plan would have surfaced it as a design decision from the start.
+  Minor — the discussion was quick and productive.
 ### Changes made

package/docs/retro/0051-update-adr-0001-hard-fork.md CHANGED

@@ -22,12 +22,16 @@ The change was planned, implemented, shipped, and released as `pi-subagents-v1.0
 #### What caused friction (agent side)
-- No friction observed. The task was unambiguous and the tooling well-suited.
+- No friction observed.
+  The task was unambiguous and the tooling well-suited.
 #### What caused friction (user side)
-- No friction observed. The session required no user input beyond invoking the three slash commands.
+- No friction observed.
+  The session required no user input beyond invoking the three slash commands.
 ### Follow-ups identified
-- The `package-pi-subagents` skill (`.pi/skills/package-pi-subagents/SKILL.md`) still frames the fork as "a friendly fork… carrying a small number of patches" with priorities like "stays as close to upstream as possible." This framing is now stale given the hard-fork commitment. A separate issue should update the skill to reflect the architecture document's posture.
+- The `package-pi-subagents` skill (`.pi/skills/package-pi-subagents/SKILL.md`) still frames the fork as "a friendly fork… carrying a small number of patches" with priorities like "stays as close to upstream as possible."
+  This framing is now stale given the hard-fork commitment.
+  A separate issue should update the skill to reflect the architecture document's posture.

package/docs/retro/0053-extract-model-resolution-from-execute.md CHANGED

@@ -17,14 +17,24 @@ Also fixed a pre-existing `rumdl` glob-quoting bug in `package.json` discovered
 #### What went well
-- Pre-existing lint bug surfaced and fixed: the `rumdl check '*.md' 'docs/**/*.md'` command in `package.json` used single-quoted globs that prevented shell expansion. Verified as pre-existing (reproduced on prior commit via `git stash`), cleanly isolated into its own `fix:` commit. This was a genuine find — the lint had been silently broken.
+- Pre-existing lint bug surfaced and fixed: the `rumdl check '*.md' 'docs/**/*.md'` command in `package.json` used single-quoted globs that prevented shell expansion.
+  Verified as pre-existing (reproduced on prior commit via `git stash`), cleanly isolated into its own `fix:` commit.
+  This was a genuine find — the lint had been silently broken.
 #### What caused friction (agent side)
-- `missing-context` — In step 6 (refactoring `index.ts`), replaced the `resolveModel` import with `resolveInvocationModel` without first checking whether `resolveModel` was still used elsewhere in the file. Two other call sites (`createSubagentsService` at line 386 and `getModelLabel` at line 1043) still needed it. The plan explicitly listed `getModelLabel` as a non-goal that continues using `resolveModel`, so the information was available. Caught immediately via `grep` after the edit and fixed in the same commit. Impact: one extra edit + grep cycle, no rework.
+- `missing-context` — In step 6 (refactoring `index.ts`), replaced the `resolveModel` import with `resolveInvocationModel` without first checking whether `resolveModel` was still used elsewhere in the file.
+  Two other call sites (`createSubagentsService` at line 386 and `getModelLabel` at line 1043) still needed it.
+  The plan explicitly listed `getModelLabel` as a non-goal that continues using `resolveModel`, so the information was available.
+  Caught immediately via `grep` after the edit and fixed in the same commit.
+  Impact: one extra edit + grep cycle, no rework.
-- `missing-context` — The plan's type definitions specified `model: unknown` for `ModelResolutionResult`, but downstream code in `index.ts` accesses `.id` and `.name` on the model and passes it where `Model<any>` is expected. The plan's risk section flagged this ("reducing but not eliminating the `any`"), yet the implementation went with `unknown` first, requiring a correction after `pnpm run check` failed with 4 type errors. Changed to `model: any` to match the existing `resolveModel` return type. Impact: one extra edit cycle within the same commit, no rework.
+- `missing-context` — The plan's type definitions specified `model: unknown` for `ModelResolutionResult`, but downstream code in `index.ts` accesses `.id` and `.name` on the model and passes it where `Model<any>` is expected.
+  The plan's risk section flagged this ("reducing but not eliminating the `any`"), yet the implementation went with `unknown` first, requiring a correction after `pnpm run check` failed with 4 type errors.
+  Changed to `model: any` to match the existing `resolveModel` return type.
+  Impact: one extra edit cycle within the same commit, no rework.
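Illustration only, not part of the published file: a minimal sketch of the `unknown` versus `any` trade-off described in the bullet above. `ResultWithUnknown`, `ResultWithAny`, and the two label functions are hypothetical, simplified shapes; the real `ModelResolutionResult` is not shown in this diff.

```typescript
// Hypothetical, simplified result shapes.
interface ResultWithUnknown {
  model: unknown;
}

interface ResultWithAny {
  model: any;
}

function labelFromAny(result: ResultWithAny): string {
  // Compiles as written: `any` switches off property checking.
  return String(result.model.id);
}

function labelFromUnknown(result: ResultWithUnknown): string {
  // Writing `result.model.id` directly would be a compile error here,
  // so the value has to be narrowed before the property is read.
  const model = result.model;
  if (typeof model === "object" && model !== null && "id" in model) {
    return String((model as { id: unknown }).id);
  }
  return "(unknown model)";
}
```

The narrowing in `labelFromUnknown` is the extra work every call site would need with `unknown`, which may be why the retro settled on `any` to match the existing return type.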
 #### What caused friction (user side)
-- None observed. The issue was well-scoped with clear acceptance criteria, making planning and execution straightforward.
+- None observed.
+  The issue was well-scoped with clear acceptance criteria, making planning and execution straightforward.

package/docs/retro/0054-decompose-index-into-modules.md CHANGED

@@ -18,20 +18,35 @@ Filed follow-up #66 (replace `as any` casts with proper SDK types) and #67 (flak
 #### What went well
-- Leaf-first extraction order worked cleanly — helpers, then renderer, then notification, then tools, then menu. Each step left the repo green with no cascading breakage.
+- Leaf-first extraction order worked cleanly — helpers, then renderer, then notification, then tools, then menu.
+  Each step left the repo green with no cascading breakage.
 - The `createNotificationSystem` factory pattern with arrow-closure capture of `widget` (assigned after `AgentManager` construction) preserved the existing deferred-reference semantics without restructuring initialization order.
 #### What caused friction (agent side)
-- `wrong-abstraction` — Applied the code-style skill's "keep Pi SDK imports out of business-logic modules" rule to tool/menu modules, which are SDK consumers, not business logic. Used `unknown` for `ExtensionContext`, `AgentSession`, `ModelRegistry` in factory dep interfaces, requiring 9 `as any` casts in `index.ts`. User caught this post-ship. Impact: filed #66 as a follow-up cleanup; the casts are cosmetic (no runtime effect) but degrade type safety. Fixed the code-style skill to clarify the boundary. (user-caught)
+- `wrong-abstraction` — Applied the code-style skill's "keep Pi SDK imports out of business-logic modules" rule to tool/menu modules, which are SDK consumers, not business logic.
+  Used `unknown` for `ExtensionContext`, `AgentSession`, `ModelRegistry` in factory dep interfaces, requiring 9 `as any` casts in `index.ts`.
+  User caught this post-ship.
+  Impact: filed #66 as a follow-up cleanup; the casts are cosmetic (no runtime effect) but degrade type safety.
+  Fixed the code-style skill to clarify the boundary. (user-caught)
-- `missing-context` — Four test files (`notification.test.ts`, `get-result-tool.test.ts`, `steer-tool.test.ts`, `agent-tool.test.ts`) omitted `compactionCount: 0` from `AgentRecord` factories. Caught at the final `pnpm run check` step, not during test writing. The testing skill already says "grep for ALL test files that construct a compatible mock." Impact: one extra fix cycle delegated to a subagent, no rework beyond that step. (self-identified)
+- `missing-context` — Four test files (`notification.test.ts`, `get-result-tool.test.ts`, `steer-tool.test.ts`, `agent-tool.test.ts`) omitted `compactionCount: 0` from `AgentRecord` factories.
+  Caught at the final `pnpm run check` step, not during test writing.
+  The testing skill already says "grep for ALL test files that construct a compatible mock."
+  Impact: one extra fix cycle delegated to a subagent, no rework beyond that step. (self-identified)
-- `other` — `Edit` tool failed 3 times matching the UTF-8 middle dot (`·`, U+00B7) in the steer tool's `stateParts.join(" · ")` line. The third attempt produced a partial match that left the file in a broken state (dangling orphan code after the replacement anchor). Required `git restore` and a fallback to `python3` line-range replacement. The same `python3` approach for the menu extraction lost the closing `}` of the default export function. Impact: ~5 minutes of rework across the two extraction steps, plus one `git restore`.
+- `other` — `Edit` tool failed 3 times matching the UTF-8 middle dot (`·`, U+00B7) in the steer tool's `stateParts.join(" · ")` line.
+  The third attempt produced a partial match that left the file in a broken state (dangling orphan code after the replacement anchor).
+  Required `git restore` and a fallback to `python3` line-range replacement.
+  The same `python3` approach for the menu extraction lost the closing `}` of the default export function.
+  Impact: ~5 minutes of rework across the two extraction steps, plus one `git restore`.
 #### What caused friction (user side)
-- The `as any` casts could have been caught earlier if the user had flagged the `unknown` types during the planning phase. However, the plan didn't prescribe exact interface types — that was an implementation decision. The user's post-ship review ("Why did we have to cast `as any`? Take a look at `packages/pi-permission-system/` as a model") was an efficient redirect that immediately scoped the investigation.
+- The `as any` casts could have been caught earlier if the user had flagged the `unknown` types during the planning phase.
+  However, the plan didn't prescribe exact interface types — that was an implementation decision.
+  The user's post-ship review ("Why did we have to cast `as any`?
+  Take a look at `packages/pi-permission-system/` as a model") was an efficient redirect that immediately scoped the investigation.
 ### Changes made

package/docs/retro/0057-structured-debug-logging.md ADDED

@@ -0,0 +1,77 @@
+---
+issue: 57
+issue_title: "feat: structured debug logging for silenced catch blocks"
+---
+# Retro: #57 — structured debug logging for silenced catch blocks
+## Final Retrospective (2026-05-19T10:30:00Z)
+### Session summary
+Added `src/debug.ts` with `debugLog` and `isDebug()`, then threaded `debugLog` into ~20 silent `catch` blocks across 9 files.
+All 7 TDD cycles went green on the first pass with no rework.
+Shipped as `pi-subagents-v5.1.0`, then followed up with a `refactor:` commit converting `DEBUG` (module-level constant) to `isDebug()` (function getter) during the retro.
+### Observations
+#### What went well
+- The plan's "Non-Goals" section correctly excluded `usage.ts` and `settings.ts` before implementation started, and a post-TDD `grep -rn 'catch\s*{'` confirmed only those two in-scope-excluded files remained.
+  Closing the loop with a verification query is worth repeating.
+- The scope of the change was so well-defined (the issue listed exact file names) that no `ask_user` call was needed during planning.
+#### What caused friction (agent side)
+- `missing-context` — When loading the `ask-user` skill I guessed `.pi/skills/ask-user/SKILL.md` before reading the actual `<location>` tag in `AGENTS.md`, triggering an ENOENT error and a follow-up `find` call.
+  Impact: 2 extra tool calls, no rework. (self-identified)
+- `other` — The plan's TDD Order step 1 stated *"the test skill documents this pattern"* for `vi.resetModules()` + dynamic import when testing module-level env constants — but the testing skill does not have that entry.
+  The aspiration was recorded rather than verified.
+  During the retro, the user's question ("should that be a function getter instead?") led to a better outcome: replace the module-level constant with `isDebug()` so `vi.stubEnv()` alone works, consistent with how every other `process.env` read in this codebase is structured.
+  Impact: one retro-phase `refactor:` commit; the approach shipped in `v5.1.0` was technically correct but unnecessarily complex to test.
+#### What caused friction (user side)
+- The initial issue proposal chose the module-level-constant pattern (common in Node.js tooling like the `debug` package).
+  A note in the issue or plan about preferring function-based env reads for testability would have caught this at design time rather than post-ship.
+  That said, the retro question was efficient — a single targeted redirect resolved it cleanly.
+### Changes made
+1. `packages/pi-subagents/src/debug.ts` — replaced `export const DEBUG` with `export function isDebug()`.
+2. `packages/pi-subagents/test/debug.test.ts` — simplified to static import + `vi.stubEnv()` only; removed all `vi.resetModules()` + dynamic `import()` calls.
+3. `.pi/skills/testing/SKILL.md` — added bullet: prefer reading `process.env` inside functions; `vi.stubEnv()` alone is insufficient for module-level constants.
+## Follow-up Retrospective (2026-05-19T11:15:00Z)
+### Session summary
+The user asked how many `process.*` reads exist in `pi-subagents`.
+Audit found 9 sites: 4 acceptable (wiring layer, detection functions, injectable defaults), 2 genuine injection gaps, and 1 mild case.
+Filed #76 (`AgentManager.dispose()` reads `process.cwd()` without a stored `cwd`) and #77 (`createAgentsMenuHandler` hardcodes `process.cwd()` when `AgentMenuDeps` already injects the personal-side equivalent).
+### Observations
+#### What went well
+- The `isDebug()` refactor naturally led the user to ask a broader design question about `process.*` access patterns, producing two well-scoped follow-up issues without manual triage.
+- The audit categorization (genuinely problematic vs. acceptable) was clean — presenting a table with verdicts per site let the user decide scope without re-reading source.
+#### What caused friction (agent side)
+- `premature-convergence` — The original plan accepted the module-level `DEBUG` constant without checking how the rest of the codebase reads `process.env`.
+  The code-style skill said "keep IO at the edges" but didn't name `process.*` specifically, so the rule wasn't applied.
+  Impact: one post-ship `refactor:` commit to replace `DEBUG` with `isDebug()`; the pattern was technically correct but inconsistent with codebase conventions. (user-caught)
+#### What caused friction (user side)
+- Nothing notable.
+  The user's two redirecting questions ("should that be a function?"
+  and "how many places access `process.*`?") were well-timed interventions that broadened scope productively.
+### Changes made
+1. `.pi/skills/code-style/SKILL.md` — added bullet: do not read `process.env`, `process.cwd()`, or `process.platform` inside library/utility functions; accept the value as a parameter.
+2. Filed #76 — inject `cwd` into `AgentManager` constructor.
+3. Filed #77 — add `projectAgentsDir` to `AgentMenuDeps`.
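
Both retrospectives in this file come back to the same point: a flag computed once at module load is a snapshot, while a function getter re-reads its source on every call. The sketch below illustrates that difference under stated assumptions. The variable name `PI_SUBAGENTS_DEBUG` is hypothetical (the source does not name the variable), and a plain object stands in for `process.env` so the example runs without Node typings; it is not the contents of `src/debug.ts`.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical variable name; the real one is not given in the source.
const DEBUG_VAR = "PI_SUBAGENTS_DEBUG";

// Stand-in for `process.env`, so the sketch has no Node dependency.
const env: Env = {};

// Module-level-constant style: evaluated once, when this line runs.
const DEBUG_AT_LOAD: boolean = env[DEBUG_VAR] === "1";

// Function-getter style: evaluated on every call. The source is a
// parameter, so a caller or test can pass its own environment object.
function isDebug(source: Env = env): boolean {
  return source[DEBUG_VAR] === "1";
}

// Simulate a test stubbing the variable after the module has loaded.
env[DEBUG_VAR] = "1";

// DEBUG_AT_LOAD is still false here; isDebug() now returns true.
```

This is why the constant needed a module reset plus dynamic import to test, while the getter needs only the environment stub.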

package/docs/retro/0069-create-subagent-runtime.md ADDED

@@ -0,0 +1,43 @@
+---
+issue: 69
+issue_title: "refactor: eliminate module-scope mutable state in pi-subagents — create SubagentRuntime"
+---
+# Retro: #69 — create SubagentRuntime
+## Final Retrospective (2026-05-19T16:47:00Z)
+### Session summary
+Planned, implemented, and shipped `SubagentRuntime` — a composition-root object that replaces module-scope mutable state in `agent-runner.ts` and closure-scoped state in `index.ts`.
+Six TDD steps completed with one deviation: `agent-tool.ts` and `agent-menu.ts` also imported the removed getter/setter exports, requiring unplanned fixes.
+Released as `pi-subagents-v5.2.0`.
+### Observations
+#### What went well
+- The lift-and-shift strategy (introduce `RunOptions` fields alongside module-scope fallback, wire consumers, then remove old path) kept the 460-test suite green through every intermediate commit.
+  No step broke existing tests.
+- `pnpm run check` caught the two missing downstream files (`agent-tool.ts`, `agent-menu.ts`) immediately after the removal step.
+  The typecheck-after-removal safety net worked exactly as intended.
+- The `pi-permission-system` prior art (`ExtensionRuntime` in #43) provided a clear structural template, reducing design decisions to near zero.
+#### What caused friction (agent side)
+- `missing-context` — The plan's Module-Level Changes listed `agent-runner.ts`, `agent-manager.ts`, and `index.ts` but missed `src/tools/agent-tool.ts` and `src/ui/agent-menu.ts`, both of which imported `getDefaultMaxTurns`/`setDefaultMaxTurns`/`getGraceTurns`/`setGraceTurns` from `agent-runner.ts`.
+  A grep for all importers of the removed symbols during planning would have caught this.
+  Impact: 4 extra files touched in step 5 (the two source files + their test helpers); no rework of earlier steps, but the commit scope was wider than planned. (self-identified at `pnpm run check` time)
+- `missing-context` — In step 3 (`agent-manager.test.ts`), checked `vi.mocked(runAgent).mock.calls[0]` without clearing the mock first.
+  The module-level `vi.mock("../src/agent-runner.js")` is shared across all describe blocks, so `calls[0]` picked up a stale invocation from an earlier test.
+  Impact: one debug cycle adding `vi.mocked(runAgent).mockClear()` after `resolvedRun()`. (self-identified)
+#### What caused friction (user side)
+- Nothing notable.
+  The plan was unambiguous, and the session ran without user intervention beyond the initial prompts.
+### Changes made
+1. `.pi/prompts/plan-issue.md` — added grep-importers rule to the Module-Level Changes bullet: when a step removes or renames an export, grep all `src/` and `test/` files for every removed symbol before finalizing the file list.
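
The grep-importers rule above can be approximated in code. The following is a minimal sketch, assuming file contents are already loaded into an in-memory map and that only named imports (`import { a, b } from "..."`) need to be detected. `findImporters` is a hypothetical helper, not part of the package, and it assumes symbol names are plain identifiers with no regex metacharacters.

```typescript
// For each removed symbol, list the files whose named-import clauses mention it.
// `files` maps a path to that file's source text.
function findImporters(
  files: Record<string, string>,
  removedSymbols: string[],
): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const symbol of removedSymbols) {
    // Match `import { ... symbol ... }`, including multi-line clauses.
    // Word boundaries keep `getGrace` from matching `getGraceTurns`.
    const pattern = new RegExp(`import\\s*\\{[^}]*\\b${symbol}\\b[^}]*\\}`);
    result[symbol] = Object.entries(files)
      .filter(([, source]) => pattern.test(source))
      .map(([path]) => path)
      .sort();
  }
  return result;
}
```

A plain `grep` over `src/` and `test/` does the same job during planning; the function form is only meant to make the check explicit. Namespace imports and re-exports are outside what this sketch detects.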

package/package.json CHANGED

@@ -1,6 +1,6 @@
 {
   "name": "@gotgenes/pi-subagents",
-  "version": "5.1.0",
+  "version": "5.3.0",
   "exports": {
     ".": "./src/service.ts"
   },