pi-crew 0.5.14 → 0.5.16

Files changed (43)
  1. package/CHANGELOG.md +117 -0
  2. package/README.md +1 -1
  3. package/docs/pi-crew-v0.5.16-audit-fix-plan.md +35 -0
  4. package/docs/pi-crew-v0.5.17-audit-fix-plan.md +80 -0
  5. package/docs/skills/REFERENCE.md +11 -0
  6. package/package.json +1 -1
  7. package/skills/iterative-audit/SKILL.md +330 -0
  8. package/src/extension/management.ts +1 -1
  9. package/src/extension/plan-orchestrate.ts +0 -1
  10. package/src/extension/register.ts +16 -7
  11. package/src/extension/registration/viewers.ts +1 -1
  12. package/src/extension/run-index.ts +1 -1
  13. package/src/extension/team-tool/explain.ts +0 -1
  14. package/src/extension/team-tool/handle-schedule.ts +0 -1
  15. package/src/extension/team-tool/health-monitor.ts +0 -1
  16. package/src/extension/team-tool/run.ts +2 -2
  17. package/src/extension/team-tool/status.ts +1 -1
  18. package/src/extension/team-tool.ts +2 -30
  19. package/src/observability/exporters/otlp-exporter.ts +11 -1
  20. package/src/runtime/child-pi.ts +1 -1
  21. package/src/runtime/crash-recovery.ts +1 -1
  22. package/src/runtime/crew-agent-records.ts +23 -3
  23. package/src/runtime/crew-hooks.ts +1 -1
  24. package/src/runtime/handoff-manager.ts +0 -1
  25. package/src/runtime/heartbeat-watcher.ts +1 -1
  26. package/src/runtime/live-session-runtime.ts +0 -1
  27. package/src/runtime/loop-gates.ts +0 -1
  28. package/src/runtime/mcp-proxy.ts +2 -2
  29. package/src/runtime/pipeline-runner.ts +1 -2
  30. package/src/runtime/task-runner/live-executor.ts +1 -2
  31. package/src/runtime/task-runner.ts +1 -1
  32. package/src/state/jsonl-writer.ts +24 -0
  33. package/src/state/locks.ts +66 -35
  34. package/src/state/run-metrics.ts +1 -2
  35. package/src/state/schedule.ts +13 -5
  36. package/src/state/state-store.ts +1 -1
  37. package/src/tools/safe-bash.ts +0 -1
  38. package/src/ui/crew-widget.ts +2 -2
  39. package/src/ui/render-diff.ts +1 -1
  40. package/src/ui/run-dashboard.ts +1 -2
  41. package/src/ui/tool-render.ts +20 -3
  42. package/src/utils/conflict-detect.ts +0 -1
  43. package/src/utils/gh-protocol.ts +0 -2
package/CHANGELOG.md CHANGED
@@ -1,5 +1,122 @@
1
1
  # Changelog
2
2
 
3
+ ## [0.5.16] — Rounds 22–31 Audit Fixes (2026-06-02)
4
+
5
+ ### Highlights
6
+ - **1 bug fix**: OTLP exporter `dispose()` now awaits in-flight push (bounded by 10s timeout)
7
+ - **269 new unit tests** across 16 previously-untested modules (Pattern #3)
8
+ - **72 unused imports removed** across 28 source files (Pattern #6)
9
+ - **2 defensive caps** for unbounded Maps (Pattern #2)
10
+ - **1 L1 fix**: `console.warn` → `logInternalError` in crew-hooks
11
+
12
+ ### Round 22: Defensive Caps (commit 85b3be6)
13
+ - Bounded `autoRecoveryLast` and `agentEventSeqCache` Maps to 1000 entries
14
+ - Eviction uses insertion-order oldest-first pattern
15
+
16
+ ### Round 23: Resource Cleanup (commit 4be2c4e)
17
+ - OTLP exporter `dispose()` now async, awaits in-flight push with 10s timeout
18
+ - Surveyed all setInterval/setTimeout, process.on, file watchers, event listeners, AbortControllers — all clean
19
+
20
+ ### Round 24: Test Coverage — discover-agents, markers, tiered-eval (commit cfe5242)
21
+ - 50 new tests: `sanitizeAgentSystemPrompt` (6 rules), `sanitizeGuidanceContent` (5 rules), `TieredEvalRunner` class
22
+
23
+ ### Round 25: Test Coverage — adaptive-plan, group-join (commit 89e1cf1)
24
+ - 42 new tests: `slug`, `extractAdaptivePlanJson`, `parseAdaptivePlan`, `repairAdaptivePlan`, `GroupJoinManager`
25
+
26
+ ### Round 26: Test Coverage — pi-args, i18n (commit 3669f24)
27
+ - 38 new tests: `applyThinkingSuffix`, `resolveCrewMaxDepth`, `t()`, `addTranslations`, `listLocales`
28
+
29
+ ### Round 27: Test Coverage — validation-types, live-extension-bridge (commit 44a2366)
30
+ - 36 new tests: `validateWithSeverity` strict/lenient modes, `buildExtensionBridge` mock session
31
+
32
+ ### Round 28: Test Coverage — direct-run, live-session-health (commit 339ac7d)
33
+ - 17 new tests: `isDirectRun`, `directTeamAndWorkflowFromRun`, `collectLiveSessionHealth`
34
+
35
+ ### Round 29: Test Coverage — process-status, task-claims (commit 405e05d)
36
+ - 43 new tests: `checkProcessLiveness`, `isActiveRunStatus`, full claim lifecycle
37
+
38
+ ### Round 30: Test Coverage — task-display, green-contract, session-utils (commit 7d065ca)
39
+ - 43 new tests: `shouldMaterializeAgent`, `taskById`, `waitingReason`, `greenLevelSatisfies`, `assertValidSessionId`
40
+
41
+ ### Round 31: Code Quality — unused imports + L1 fix (commit 35cc0e7)
42
+ - 72 unused imports removed across 28 source files
43
+ - `crew-hooks.ts`: `console.warn` → `logInternalError` for unknown event types
44
+
45
+ ### Stats
46
+ - Test suite: 2657 pass + 1 skip, 0 fail (was 2370 in v0.5.14; +287 net)
47
+ - TypeScript: 0 errors
48
+ - New test files: 13
49
+ - Files touched: 58
50
+
51
+ ## [0.5.15] — Round 20 + 21 Audit Fixes (2026-06-02)
52
+
53
+ ### Source tour
54
+ - Pulled latest `can1357/oh-my-pi` (1751 new commits since 2026-05-11) to working copy
55
+ - Surveyed extensibility, skill system, and security/performance changes via 3 parallel explorer agents
56
+ - Distilled 2 high-impact, immediately applicable patterns (Round 20)
57
+ - Identified 5 more upgrade opportunities; applied 5 in Round 21
58
+
59
+ ### Round 20: Lock token guard + tool-error sanitization (commit f448d7d)
60
+
61
+ #### 1. Per-process lock tokens (src/state/locks.ts)
62
+ - **Pattern source**: oh-my-pi commit `cd578a86d` (`file-lock.ts:13-152`)
63
+ - **Bug fixed**: "Losing contender wipes winner's lock" race when one process times out and steals a stale lock that the original holder is about to release
64
+ - Lock file now carries a UUID token. `releaseLock` refuses to `fs.rm` unless the stored token matches.
65
+ - 3 new tests in `test/unit/locks-race.test.ts`
66
+
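A minimal sketch of the token guard described above, not the actual `src/state/locks.ts` code. The function names `acquireLockSketch` and `releaseLockSketch`, the `pid` field, and the file layout are assumptions for illustration; only the UUID `token` and the "refuse to remove unless the token matches" rule come from the changelog.

```typescript
import { randomUUID } from "node:crypto";
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical lock payload; only `token` is confirmed by the changelog.
interface LockPayload {
  token: string;
  pid: number;
}

// Create the lock file exclusively and return the token identifying this holder.
function acquireLockSketch(lockPath: string): string {
  const payload: LockPayload = { token: randomUUID(), pid: process.pid };
  // The "wx" flag makes the write fail with EEXIST if the file already exists.
  fs.writeFileSync(lockPath, JSON.stringify(payload), { flag: "wx" });
  return payload.token;
}

// Remove the lock only if it still carries our token; report whether it was removed.
function releaseLockSketch(lockPath: string, token: string): boolean {
  let stored: Partial<LockPayload>;
  try {
    stored = JSON.parse(fs.readFileSync(lockPath, "utf8"));
  } catch {
    return false; // lock already gone or unreadable: nothing to release
  }
  if (stored.token !== token) return false; // another holder owns the lock now
  fs.rmSync(lockPath, { force: true });
  return true;
}

// Demo: a holder whose lock was stolen cannot wipe the new holder's lock.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "lock-sketch-"));
const demoLock = path.join(demoDir, "run.lock");
const staleToken = acquireLockSketch(demoLock);
fs.rmSync(demoLock); // a contender treats the lock as stale and steals it
const winnerToken = acquireLockSketch(demoLock);
const staleRelease = releaseLockSketch(demoLock, staleToken);
const winnerRelease = releaseLockSketch(demoLock, winnerToken);
console.log(staleRelease, winnerRelease); // false true
```

The read-compare-remove sequence in this sketch is not atomic; it narrows the race window rather than eliminating it.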
67
+ #### 2. Tool-error sanitization (src/ui/tool-render.ts)
68
+ - **Pattern source**: oh-my-pi `render-utils.ts:177-185` (`replaceTabs(truncateToWidth(clean, LINE_CAP))`)
69
+ - **Bug fixed**: Embedded tabs/newlines/long strings in tool errors break TUI border alignment
70
+ - Applied to `renderAgentProgress` and `renderAgentToolResult` (2 places)
71
+ - `replaceTabs` is now exported from `src/ui/render-diff.ts` for reuse
72
+ - 2 new tests in `test/unit/tool-render.test.ts`
73
+
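The sanitization chain above might look roughly like the following. This is a sketch under assumptions: the value of `LINE_CAP`, the helper `collapseNewlines`, and the bodies of `truncateToWidth` and `replaceTabs` are illustrative stand-ins, not the pi-crew or oh-my-pi implementations (a real TUI helper would likely measure display width rather than string length).

```typescript
// Hypothetical cap; the changelog names LINE_CAP but does not state its value.
const LINE_CAP = 80;

// Collapse CR/LF so a multi-line error stays on a single TUI row.
function collapseNewlines(text: string): string {
  return text.replace(/\r?\n|\r/g, " ");
}

// Naive truncation by UTF-16 code units, ending with an ellipsis when text is cut.
function truncateToWidth(text: string, width: number): string {
  return text.length <= width ? text : text.slice(0, Math.max(0, width - 1)) + "…";
}

// Expand tabs to a fixed number of spaces so borders stay aligned.
function replaceTabs(text: string, spaces = 2): string {
  return text.replace(/\t/g, " ".repeat(spaces));
}

// Same call order as the changelog expression: truncate first, then expand tabs.
// In this sketch, tabs that survive truncation can push the result past LINE_CAP.
function sanitizeToolError(raw: string): string {
  return replaceTabs(truncateToWidth(collapseNewlines(raw), LINE_CAP));
}

console.log(sanitizeToolError("ENOENT:\tno such file\nat line 1"));
```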
74
+ ### Round 21: L1 cleanup, lock kind, JSONL per-line cap, in-place loader test (commit 1bf120b)
75
+
76
+ #### 1. L1 cleanup in src/state/schedule.ts
77
+ - `console.warn` → `logInternalError` (consistency with rest of codebase)
78
+ - `require("node:fs")` → top-level `fs`/`path` imports
79
+ - 3 new tests in `test/unit/schedule-store.test.ts`
80
+
81
+ #### 2. Dead code sweep in src/state/locks.ts
82
+ - Removed misleadingly-named `readLockStateAsync` (sync I/O, called from async path) and its redundant call site
83
+ - Async path now mirrors sync path exactly: stale-check + release + sleep
84
+
85
+ #### 3. Lock file `kind` discriminator (forward compat)
86
+ - Lock JSON now includes `kind: "run" | "file"`
87
+ - `withRunLock` writes `kind="run"`; `withFileLockSync` writes `kind="file"`
88
+ - Old locks (no `kind` field) still work — `releaseLock` only reads `token`, so the discriminator is purely additive
89
+ - 3 new tests (kind for run, kind for file, back-compat with legacy locks)
90
+
91
+ #### 4. JSONL per-line cap (defensive, src/state/jsonl-writer.ts)
92
+ - Single huge line could exhaust memory during `redactJsonLine`
93
+ - New `DEFAULT_MAX_LINE_BYTES = 1MB`. Lines exceeding the cap are dropped and counted
94
+ - `logInternalError` fires on the first drop and every 100th drop thereafter
95
+ - 2 new tests in `test/unit/jsonl-writer.test.ts`
96
+
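A hedged sketch of the per-line cap described above. The class `CappedLineWriterSketch`, its in-memory `accepted` array, and the `onDrop` callback (standing in for `logInternalError`) are hypothetical. The exact reporting cadence is also an assumption: this sketch reports drop 1, then drops 100, 200, and so on.

```typescript
const DEFAULT_MAX_LINE_BYTES = 1024 * 1024; // 1 MB, the cap named in the changelog

class CappedLineWriterSketch {
  droppedLines = 0;
  readonly accepted: string[] = [];
  private readonly maxLineBytes: number;
  private readonly onDrop: (message: string) => void;

  constructor(
    maxLineBytes: number = DEFAULT_MAX_LINE_BYTES,
    onDrop: (message: string) => void = () => {},
  ) {
    this.maxLineBytes = maxLineBytes;
    this.onDrop = onDrop;
  }

  // Returns true if the line was accepted, false if it was dropped for exceeding the cap.
  write(line: string): boolean {
    // Measure UTF-8 bytes rather than string length, since the cap is in bytes.
    const bytes = new TextEncoder().encode(line).length;
    if (bytes > this.maxLineBytes) {
      this.droppedLines += 1;
      // Report the first drop and every 100th drop so logs are not flooded.
      if (this.droppedLines === 1 || this.droppedLines % 100 === 0) {
        this.onDrop(`dropped ${this.droppedLines} oversized line(s); latest was ${bytes} bytes`);
      }
      return false;
    }
    this.accepted.push(line); // a real writer would redact and append to a file here
    return true;
  }
}
```

Checking the size before any per-line processing is what keeps a single huge line from reaching the expensive step.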
97
+ #### 5. In-place extension loader integration test
98
+ - **Pattern source**: oh-my-pi commit `c5e3698f4` (changed how extensions are loaded)
99
+ - This test verifies pi-crew's `import.meta.url`-based skill path resolution still works with the new in-place loader
100
+ - 2 new tests in `test/integration/extension-skill-resolution.test.ts`
101
+
102
+ ### Summary
103
+ - **2 rounds** (Round 20 + 21)
104
+ - **2 commits**: `f448d7d` (Round 20) + `1bf120b` (Round 21)
105
+ - **10 new tests** across 4 test files
106
+ - **Total tests**: 50 pass + 1 skip, **0 fail** (was 49 in v0.5.14)
107
+ - **TypeScript**: 0 errors
108
+ - **Patterns adopted**: 5 from `can1357/oh-my-pi` post-2026-05-11
109
+
110
+ ### Patterns surveyed but not applied (low applicability for pi-crew)
111
+ - **Streaming JSON throttle** (3a733c480) — pi-crew has no streaming JSON parser
112
+ - **In-place state mutation** (3a733c480) — pi-crew's spreads are bounded (small N), not hot paths
113
+ - **Bounded row probing** (b522fde56) — pi-crew has no SQL queries
114
+ - **MCP reconnect storm circuit breaker** — pi-crew has no MCP reconnect logic
115
+ - **Drop `args` global from eval** (4ab40764d) — pi-crew's `dynamic-script-runner.ts` already safe
116
+ - **Shell-injection rejection in git specs** (22e564a85) — pi-crew has no plugin install path
117
+ - **NPM registry pinning** (9abce6e97) — pi-crew's `install.mjs` is config-only; user runs `pi install npm:pi-crew`
118
+ - **Extension flag shadow** (1fbc2cbd7) — pi-crew has no `registerFlag` calls
119
+
3
120
  ## [0.5.14] — Round 19 Audit Fixes (2026-06-02)
4
121
 
5
122
  ### Phase 1: Path validation in checkpoint.ts (MEDIUM security)
package/README.md CHANGED
@@ -9,7 +9,7 @@ npm: pi-crew
9
9
  repo: https://github.com/baphuongna/pi-crew
10
10
  ```
11
11
 
12
- **v0.5.14**: See [CHANGELOG.md](CHANGELOG.md).
12
+ **v0.5.15**: See [CHANGELOG.md](CHANGELOG.md).
13
13
 
14
14
  ### Security highlights (v0.5.5)
15
15
 
package/docs/pi-crew-v0.5.16-audit-fix-plan.md ADDED
@@ -0,0 +1,35 @@
1
+ # Round 22 Audit Fix Plan (Defensive Caps)
2
+
3
+ ## Findings
4
+
5
+ ### Issue 1: `autoRecoveryLast` Map grows unboundedly (MEDIUM, MEMORY)
6
+ - **File**: `src/extension/register.ts:484`
7
+ - **What**: Module-level `Map<string, number>` keyed by `${kind}_${runId}`. Holds cooldown timestamps for "recovery notifications" (5-minute gate per key).
8
+ - **Bug**: Entries are NEVER removed during a session. Each run contributes up to 4 keys (one per `maybeNotifyHealth` kind). Long-running pi sessions that run 1000+ teams accumulate 4000+ entries (~32KB).
9
+ - **Severity**: MEDIUM — silent memory growth in long-running process. Not a security issue.
10
+ - **Fix**: Add `AUTO_RECOVERY_LAST_MAX_ENTRIES` cap. Evict oldest insertion (matches the 5-min cooldown gate semantics — once the gate has expired, the entry is irrelevant). The eviction loop runs on each `set()` to amortize the cost.
11
+
12
+ ### Issue 2: `agentEventSeqCache` Map grows unboundedly (MEDIUM, MEMORY)
13
+ - **File**: `src/runtime/crew-agent-records.ts:265`
14
+ - **What**: Module-level `Map<string, { size, mtimeMs, seq }>` keyed by `filePath` (each agent event log). Caches the `.seq` sidecar value.
15
+ - **Bug**: Entries are NEVER removed. Each new agent task creates a new event log file, adding a cache entry. A long-running pi-crew process that spawns 1000s of agents accumulates 1000s of entries.
16
+ - **Severity**: MEDIUM — silent memory growth. Plus, stale entries mask filesystem changes (mtime/size won't reflect a re-created file).
17
+ - **Fix**: Add `AGENT_EVENT_SEQ_CACHE_MAX_ENTRIES` cap. Evict oldest insertion first (mirrors the `asyncAgentReaderCache` pattern at line 134-136 in the same file).
18
+
19
+ ## Plan (2 phases)
20
+
21
+ ### Phase 1: `autoRecoveryLast` defensive cap
22
+ - `src/extension/register.ts:484` — add `AUTO_RECOVERY_LAST_MAX_ENTRIES = 1000` constant
23
+ - Modify the `set()` site at line 1534 to evict oldest entries before inserting when size > cap
24
+ - Add test in `test/unit/auto-recovery-cap.test.ts`
25
+
26
+ ### Phase 2: `agentEventSeqCache` defensive cap
27
+ - `src/runtime/crew-agent-records.ts:265` — add `AGENT_EVENT_SEQ_CACHE_MAX_ENTRIES = 1000` constant
28
+ - Add helper function `setAgentEventSeqCache()` that wraps the `.set()` and evicts oldest entries
29
+ - Add test in `test/unit/crew-agent-records.test.ts` (or new file)
30
+
31
+ ## Expected impact
32
+ - 2 new tests, 0 regressions
33
+ - Total: 2 MEDIUM memory-leak fixes
34
+ - No public API changes
35
+ - Pattern: follows existing `NotificationRouter.SEEN_MAP_MAX_SIZE` and `asyncAgentReaderCache` patterns in the codebase
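The oldest-insertion eviction that both phases of this plan rely on can be sketched as follows. The helper name `setBounded` is hypothetical (the plan names `setAgentEventSeqCache()` for one of the two call sites), and deleting the key before re-inserting it is a design choice of this sketch, not something the plan specifies.

```typescript
const MAX_ENTRIES = 1000; // the cap value named in the plan

// Insert into a Map while keeping at most `maxEntries` keys, evicting oldest insertions first.
function setBounded<K, V>(map: Map<K, V>, key: K, value: V, maxEntries: number = MAX_ENTRIES): void {
  // Re-inserting an existing key moves it to the end of the iteration order,
  // so a refreshed entry counts as the newest one.
  map.delete(key);
  while (map.size >= maxEntries) {
    // Map iterates in insertion order, so the first key is the oldest.
    const oldest = map.keys().next();
    if (oldest.done) break;
    map.delete(oldest.value);
  }
  map.set(key, value);
}

// Demo: insert 1.5x the cap and confirm the oldest third was evicted.
const cooldowns = new Map<string, number>();
for (let i = 0; i < 1500; i++) setBounded(cooldowns, `health_run-${i}`, i);
console.log(cooldowns.size, cooldowns.has("health_run-0"), cooldowns.has("health_run-1499")); // 1000 false true
```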
package/docs/pi-crew-v0.5.17-audit-fix-plan.md ADDED
@@ -0,0 +1,80 @@
1
+ # Round 23 Audit Findings (Resource Cleanup)
2
+
3
+ ## Skill: iterative-audit (Pattern #7: Resource Cleanup)
4
+
5
+ ## Findings
6
+
7
+ ### Issue 1: OTLP exporter `inFlight` push not awaited on dispose (LOW)
8
+ - **File**: `src/observability/exporters/otlp-exporter.ts:80-86, 127-130`
9
+ - **What**: When `dispose()` is called, the interval timer is cleared but the in-flight `push()` continues to run until the 10s fetch timeout. The result is lost (not awaited).
10
+ - **Severity**: LOW — bounded by 10s fetch timeout. Not a real leak, just orphaned work.
11
+ - **Fix**: Make `dispose()` async. Await the in-flight push before returning.
12
+ - **Test**: 1 new test verifies `dispose()` waits for the in-flight push.
13
+
14
+ ## Patterns surveyed (all VERIFIED clean from source)
15
+
16
+ ### setInterval / setTimeout cleanup
17
+ | File | Resource | Cleanup | Status |
18
+ |------|----------|---------|--------|
19
+ | `register.ts:411` | `autoRepairTimer` | cleared on line 308, 402, 1102 | OK |
20
+ | `register.ts:442` | `tempReconcileTimer` | cleared on line 308, 402, 1102 | OK |
21
+ | `result-watcher.ts:80` | `pollTimer` | cleared in `stopPolling()` | OK |
22
+ | `result-watcher.ts:96` | `restartTimer` | cleared in `scheduleRestart()` and `stop()` | OK |
23
+ | `async-notifier.ts:101` | `state.interval` | cleared in `stopAsyncRunNotifier()` | OK |
24
+ | `subagent-tools.ts:228` | `timer` | cleanup function returned to caller | OK |
25
+ | `team-tool.ts:160` | `timer` | `stop()` method clears it | OK |
26
+ | `live-conversation-overlay.ts:55` | `pollTimer` | cleared in `close()` / `dispose()` | OK |
27
+ | `loaders.ts:127` | `timer` | cleared in `dispose()` | OK |
28
+ | `theme-adapter.ts:145` | `pollTimer` | cleared in unsubscribe (line 169) | OK |
29
+ | `delivery-coordinator.ts:169` | `ttlTimer` | cleared in `dispose()` | OK |
30
+ | `parent-guard.ts:61` | `guardInterval` | cleared in `stopParentGuard()` | OK |
31
+ | `scheduler.ts:88` | `t` (timer) | cleared on job removal | OK |
32
+ | `otlp-exporter.ts:80` | `timer` | cleared in `dispose()` (Round 23: also awaits inFlight) | OK |
33
+ | `team-runner.ts:67` | `interval` | local scope (per-run) | OK |
34
+ | `metric-sink.ts:68` | `timer` | cleared in `dispose()` (also closes fd) | OK |
35
+ | `handoff-manager.ts:203` | `cleanupTimer` | cleared in `dispose()` (also clears Maps) | OK |
36
+ | `live-session-runtime.ts:487` | `controlTimer` | cleared in `finally` block | OK |
37
+ | `budget-tracker.ts:231` | `abortInterval` | cleared on abort/exhausted | OK |
38
+ | `background-runner.ts:52, 74` | `interval` | local scope (process entry point) | OK |
39
+
40
+ ### process.on() signal handler registration
41
+ | File | Handlers | Guard | Status |
42
+ |------|----------|-------|--------|
43
+ | `crew-cleanup.ts:79, 84` | SIGTERM, SIGHUP | `signalHandlersRegistered` flag (Round 16) | OK |
44
+ | `background-runner.ts:107, 148, 175, 181, 194, 198` | many | process entry point (registered once per process) | OK |
45
+ | `event-log.ts:490-492` | exit, SIGTERM, SIGINT | module-level (ESM caches) | OK |
46
+ | `atomic-write.ts:265-267` | exit, SIGTERM, SIGINT | module-level (ESM caches) | OK |
47
+
48
+ ### File watchers
49
+ | File | Watcher | Cleanup | Status |
50
+ |------|---------|---------|--------|
51
+ | `register.ts:682, 686` | `crewWatcher`, `userCrewWatcher` | `closeWatcher()` in cleanup paths | OK |
52
+ | `result-watcher.ts` | `watcher` | `closeWatcher()` in `stop()` | OK |
53
+
54
+ ### Event listeners
55
+ | File | Listener | Cleanup | Status |
56
+ |------|----------|---------|--------|
57
+ | `event-bus.ts:on()` | deduped via Set | cleanup function returned | OK |
58
+ | `run-event-bus.ts:onAny()` etc. | deduped via Sets | cleanup function returned | OK |
59
+ | `phase-tracker.ts:dispose()` | EventEmitter | `removeAllListeners()` | OK |
60
+ | `team-tool.ts:72` | signal listener | `removeEventListener` in `finally` | OK |
61
+
62
+ ### AbortController
63
+ | File | Controller | Cleanup | Status |
64
+ |------|-----------|---------|--------|
65
+ | `team-tool.ts:68` | per-tool | aborted via signal listener, removed in `finally` | OK |
66
+ | `subagent-manager.ts:290` | per-run | cleaned in `cleanupRunSignal()` | OK |
67
+ | `cancellation-token.ts:17` | per-token | aborted via `#controller.abort()` | OK |
68
+ | `otlp-exporter.ts:106` | per-push | cleared in `finally` block | OK (also: dispose awaits inFlight) |
69
+
70
+ ## Plan (1 phase)
71
+
72
+ ### Phase 1: OTLP exporter `dispose()` awaits inFlight
73
+ - `src/observability/exporters/otlp-exporter.ts:127-130` — make `dispose()` async, await `this.inFlight`
74
+ - 1 new test in `test/unit/otlp-exporter.test.ts`
75
+
76
+ ## Expected impact
77
+ - 1 new test, 0 regressions
78
+ - Total: 1 LOW severity improvement
79
+ - No public API change (callers that don't await still get synchronous timer clear)
80
+ - Pattern: matches the existing `await` patterns elsewhere in the codebase
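A rough sketch of the Phase 1 change, assuming a simplified exporter. `PeriodicPusherSketch`, `flush()`, and the injected `push` callback are hypothetical; the real `otlp-exporter.ts` is not shown in this diff excerpt. The sketch keeps the property called out above: the timer is cleared synchronously, so callers that do not await `dispose()` still stop the interval.

```typescript
const DISPOSE_TIMEOUT_MS = 10_000; // the 10s bound named in the changelog

class PeriodicPusherSketch {
  private timer: ReturnType<typeof setInterval> | undefined;
  private inFlight: Promise<void> | undefined;
  private readonly push: () => Promise<void>;

  constructor(push: () => Promise<void>, intervalMs: number) {
    this.push = push;
    this.timer = setInterval(() => void this.flush(), intervalMs);
  }

  // Start a push unless one is already running; remember it so dispose() can await it.
  flush(): Promise<void> {
    if (!this.inFlight) {
      this.inFlight = this.push()
        .catch(() => {}) // a failed export must not make dispose() reject
        .finally(() => {
          this.inFlight = undefined;
        });
    }
    return this.inFlight;
  }

  async dispose(timeoutMs: number = DISPOSE_TIMEOUT_MS): Promise<void> {
    // Runs synchronously, before the first await.
    if (this.timer !== undefined) {
      clearInterval(this.timer);
      this.timer = undefined;
    }
    const pending = this.inFlight;
    if (!pending) return;
    // Wait for the in-flight push, but never longer than timeoutMs.
    let timeout: ReturnType<typeof setTimeout> | undefined;
    const expired = new Promise<void>((resolve) => {
      timeout = setTimeout(resolve, timeoutMs);
    });
    try {
      await Promise.race([pending, expired]);
    } finally {
      clearTimeout(timeout); // avoid keeping the process alive after the push settles
    }
  }
}
```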
package/docs/skills/REFERENCE.md CHANGED
@@ -38,6 +38,16 @@ multi-perspective-review (8-pass deep review)
38
38
  secure-agent-orchestration-review (security focus)
39
39
  ```
40
40
 
41
+ ### Multi-Round Audit (5-20 rounds)
42
+
43
+ ```
44
+ iterative-audit (round planning, 7 patterns, diminishing-returns)
45
+
46
+ multi-perspective-review (per round, optional)
47
+
48
+ verification-before-done (per round)
49
+ ```
50
+
41
51
  ---
42
52
 
43
53
  ## When to Invoke
@@ -48,6 +58,7 @@ secure-agent-orchestration-review (security focus)
48
58
  | Before claiming done | `verification-before-done` |
49
59
  | Code review (quick) | `scrutinize` |
50
60
  | Code review (deep) | `multi-perspective-review` |
61
+ | Multi-round audit (5-20 rounds) | `iterative-audit` |
51
62
  | Task delegation | `delegation-patterns` |
52
63
  | Complex multi-phase work | `orchestration` |
53
64
  | After bug is fixed | `post-mortem` |
package/package.json CHANGED
@@ -1,6 +1,6 @@
1
1
  {
2
2
  "name": "pi-crew",
3
- "version": "0.5.14",
3
+ "version": "0.5.16",
4
4
  "description": "Pi extension for coordinated AI teams, workflows, worktrees, and async task orchestration",
5
5
  "author": "baphuongna",
6
6
  "license": "MIT",
package/skills/iterative-audit/SKILL.md ADDED
@@ -0,0 +1,330 @@
1
+ ---
2
+ name: iterative-audit
3
+ description: "Iterative multi-round codebase audit with diminishing-returns detection. Run 5-20+ rounds, each focusing on one specific area. Built from 19 rounds of dogfooding pi-crew on itself."
4
+ triggers:
5
+ - "audit this codebase"
6
+ - "review everything"
7
+ - "find all bugs"
8
+ - "deep audit"
9
+ - "harden this"
10
+ - "iterate audit rounds"
11
+ - "multi-round review"
12
+ ---
13
+
14
+ # Iterative Audit
15
+
16
+ > Distilled from 19 rounds of auditing pi-crew on itself (v0.5.5 → v0.5.14):
17
+ > ~70 issues fixed, 286 tests added, 9 security improvements, 2 performance improvements.
18
+
19
+ The core insight: **a single round of audit finds the easy 30% of bugs**. The remaining 70% only surfaces through 5-20+ targeted rounds, each with a specific focus. After round 5+ you find HIGH severity bugs that round 1 missed. After round 10+ you find issues that no human reviewer would catch in a single pass.
20
+
21
+ ## Operating Stance
22
+
23
+ - **One focus per round.** Each round targets one of the 7 patterns below. Don't try to fix everything in one pass.
24
+ - **Source verification is mandatory.** Never trust audit docs or previous round reports — always read the actual code. ~30% of issues from prior rounds are false positives or already fixed.
25
+ - **Document every finding with file:line.** "Sandbox env allow-list" is useless. "src/runtime/sandbox.ts:70 — process.env full leak" is actionable.
26
+ - **Verify the team actually applied changes.** After any team run, run `git diff` and inspect. ~20% of team runs silently fail to apply changes.
27
+ - **Don't publish without explicit user confirmation.** Audit work compounds; releasing in the middle of a round leaves the codebase in a half-hardened state.
28
+
29
+ ## The 7 Patterns (rotate through these)
30
+
31
+ After 19 rounds, every issue found falls into one of these 7 categories. Use this to plan each round's focus.
32
+
33
+ ### 1. L1 Cleanup (decoration, low value, easy)
34
+ **What**: Replace `console.error` / `console.warn` / `process.stderr.write` with `logInternalError()` from `utils/internal-error.ts`.
35
+
36
+ **Why**: `console.error` may not be visible in JSON-RPC mode or when stderr is redirected. `logInternalError` is the project-wide pattern; missing it means errors are silently dropped.
37
+
38
+ **How to find them**:
39
+ ```bash
40
+ rg -n 'console\.(error|warn|log)' src/
41
+ rg -n 'process\.stderr\.write' src/
42
+ ```
43
+
44
+ **Rule**: Skip `internal-error.ts:5` itself (it's the implementation). Skip `background-runner.ts:146` (overrides `console.error` for testing). Skip `parent-guard.ts:37` (exit-time log must fire synchronously).
45
+
46
+ **Time per round**: 30 min for 5-10 callsites. Diminishing returns after round 1.
47
+
48
+ ### 2. Defensive Caps (memory safety, medium value, medium effort)
49
+ **What**: Find Maps, Sets, Arrays, and Queues that grow unboundedly. Add `MAX_*` constants and eviction logic.
50
+
51
+ **Why**: Long-running processes (background runners, extension reloads) accumulate state. Without caps, a busy period causes OOM.
52
+
53
+ **How to find them**:
54
+ ```bash
55
+ rg -n 'new Map\(' src/ # look for ones that are .set() repeatedly
56
+ rg -n 'new Set\(' src/
57
+ rg -n 'this\.\w+\.push\(' src/ # look for unbounded arrays
58
+ ```
59
+
60
+ **Common patterns**:
61
+ - `Semaphore.#queue` → add `MAX_QUEUE` cap (pi-crew: 10,000)
62
+ - `liveAgentManager.liveAgents` Map → add `MAX_LIVE_AGENTS` cap (pi-crew: 5,000)
63
+ - `OverflowRecoveryTracker.states` Map → add `MAX_TRACKED_STATES` cap (pi-crew: 5,000)
64
+ - `NotificationRouter.seen` Map → add `SEEN_MAP_MAX_SIZE` cap (pi-crew: 10,000)
65
+
66
+ **Eviction strategies** (in order of preference):
67
+ 1. **LRU by access time** — track `lastAccessAt` per entry
68
+ 2. **Oldest insertion** — Map's natural insertion order works (delete first key)
69
+ 3. **Terminal-state priority** — protect live entries, evict completed/failed/cancelled first
70
+
71
+ **Test pattern**: Verify cap by inserting 1.5× the max, confirm old entries are gone.
72
+
73
+ ### 3. Test Coverage Gaps (good value, low effort)
74
+ **What**: Find source files with zero direct unit tests.
75
+
76
+ **How to find them**:
77
+ ```bash
78
+ # For each src file, check if any test file imports it
79
+ for f in src/runtime/*.ts src/extension/*.ts; do
80
+ basename=$(basename "$f" .ts)
81
+ count=$(ls test/unit/${basename}*.test.ts 2>/dev/null | wc -l)
82
+ [ "$count" = "0" ] && echo "NO TEST: $f"
83
+ done
84
+ ```
85
+
86
+ **Prioritize**:
87
+ - Security-critical: `sandbox.ts`, `child-pi.ts`, `pi-spawn.ts`, `crew-cleanup.ts`
88
+ - Resource-management: `live-agent-manager.ts`, `semaphore.ts`, `overflow-recovery.ts`
89
+ - Public APIs: anything with `export class` or `export function`
90
+
91
+ **Don't test**: internal helpers, generated code, pure re-exports.
92
+
93
+ **Test categories** (in order of importance):
94
+ 1. **Path validation** (security) — `assertSafePathId`, path traversal rejection
95
+ 2. **Resource cleanup** — `dispose()` clears everything, listeners don't stack
96
+ 3. **Boundary conditions** — empty input, max-size, overflow
97
+ 4. **Callback lifecycle** — sync/async error handling, `resultConsumed` flag
98
+
99
+ ### 4. Security Hardening (high value, high effort)
100
+ **What**: Find places where untrusted input reaches dangerous sinks.
101
+
102
+ **Common sinks to audit**:
103
+ - `execSync(command)` → switch to `execFileSync(program, args[])`
104
+ - `eval()` / `Function()` / `vm.runInNewContext()` → avoid entirely
105
+ - `path.join(base, userInput)` → use `assertSafePathId(userInput)` first
106
+ - `process.env` access → use sanitized env with allow-list
107
+ - File writes to user-controlled paths → validate path is within allowed roots
108
+ - Child process spawn → use `cwd: knownDir`, sanitize env
109
+
110
+ **How to find them**:
111
+ ```bash
112
+ rg -n 'execSync\(' src/
113
+ rg -n 'exec\(' src/
114
+ rg -n 'eval\(|Function\(' src/
115
+ rg -n 'spawn\(' src/
116
+ rg -n 'path\.join\(' src/ | rg 'record\.|task\.|runId|agent\.'
117
+ ```
118
+
119
+ **Round 1**: Find all `execSync` and `exec`. Switch to `execFileSync(program, args)` (no shell).
120
+ **Round 2**: Audit env handling. Look for `process.env` access in hot paths. Add allow-list.
121
+ **Round 3**: Path traversal. For every `path.join(base, userInput)`, add `assertSafePathId()`.
122
+ **Round 4**: Subprocess safety. Verify all `spawn()` calls have: validated args, sanitized env, `cwd` set, signal handling, timeout.
123
+
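Two of the sink fixes listed above (validate before `path.join`, and `execFileSync` instead of `execSync`) can be sketched as follows. `assertSafePathIdSketch` is a hypothetical stand-in for pi-crew's `assertSafePathId`, whose real rules are not shown here; the allowed character set and length limit are assumptions, as are the helper names `safeJoin` and `runWithoutShell`.

```typescript
import { execFileSync } from "node:child_process";
import * as path from "node:path";

// Hypothetical validator: allow a conservative id alphabet and reject "..".
function assertSafePathIdSketch(id: string): string {
  if (!/^[A-Za-z0-9][A-Za-z0-9._-]{0,127}$/.test(id) || id.includes("..")) {
    throw new Error(`unsafe path id: ${JSON.stringify(id)}`);
  }
  return id;
}

// Join only after validation, then confirm the result still sits under `base`.
function safeJoin(base: string, id: string): string {
  const root = path.resolve(base);
  const resolved = path.resolve(root, assertSafePathIdSketch(id));
  const relative = path.relative(root, resolved);
  if (relative === "" || relative.startsWith("..") || path.isAbsolute(relative)) {
    throw new Error(`path escapes base: ${resolved}`);
  }
  return resolved;
}

// No shell is involved: arguments travel as an array, so metacharacters stay literal.
function runWithoutShell(program: string, args: string[], cwd: string): string {
  return execFileSync(program, args, { cwd, encoding: "utf8", timeout: 10_000 });
}

console.log(safeJoin("/tmp/runs", "run-1"));
```

The containment check after the join is defense in depth: it still holds if the id validator is later loosened.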
124
+ ### 5. Performance (medium value, medium effort)
125
+ **What**: Find O(N²) or worse algorithms, especially in hot paths.
126
+
127
+ **Common patterns**:
128
+ - Recomputing document frequency in search loops → precompute at construction
129
+ - `array.filter().map().filter()` in a loop → fuse into one pass
130
+ - `JSON.parse` of the same file repeatedly → cache
131
+ - `fs.statSync` per file in a directory scan → batch with `Dirent.isDirectory()`
132
+ - `setTimeout` busy-polling for state changes → use `fs.watch` or events
133
+
134
+ **How to find them**:
135
+ ```bash
136
+ # Look for nested loops over the same data
137
+ rg -nB 1 -A 5 'for.*of.*for' src/
138
+ # Look for polls
139
+ rg -n 'setTimeout.*poll' src/
140
+ rg -n 'pollIntervalMs' src/
141
+ ```
142
+
143
+ **Test pattern**: For precomputation fixes, write a perf test that creates 1000 docs, runs search, and asserts completion under 100ms.
144
+
145
+ ### 6. Code Quality (low value, easy)
146
+ **What**: Remove dead code, fix type misuse, add missing JSDoc.
147
+
148
+ **Common patterns**:
149
+ - Fields declared but never used (e.g., `seenCleanupCounter`)
150
+ - Unused imports
151
+ - Type assertions (`as any`, `as unknown as T`) that hide real issues
152
+ - Functions that always return the same value
153
+ - Catch blocks that swallow errors silently
154
+
155
+ **How to find them**:
156
+ ```bash
157
+ # Find fields/methods declared but never used
158
+ rg -n 'private \w+\s*=\s*' src/ | while read line; do
159
+ field=$(echo "$line" | grep -oP 'private \K\w+')
160
+ count=$(rg -c "\b$field\b" src/ 2>/dev/null | head -1)
161
+ [ "$count" = "1" ] && echo "DEAD: $line"
162
+ done
163
+ ```
164
+
165
+ ### 7. Resource Cleanup (medium value, medium effort)
166
+ **What**: Find places where listeners, timers, file handles, or other resources can leak.
167
+
168
+ **Common patterns**:
169
+ - `process.on('SIGTERM', ...)` registered multiple times → use module-level flag
170
+ - `setInterval` / `setTimeout` not cleared on shutdown → `dispose()` method
171
+ - `AbortController` not aborted in cleanup
172
+ - File watchers (`fs.watch`) not closed
173
+ - Event listeners (`emitter.on`) not removed
174
+
175
+ **How to find them**:
176
+ ```bash
177
+ rg -n 'process\.on\(' src/
178
+ rg -n 'setInterval\(' src/
179
+ rg -n 'setTimeout\(' src/ | rg -v 'setTimeout.*resolve' # filter out poll sleeps
180
+ rg -n 'fs\.watch\(' src/
181
+ ```
182
+
183
+ **Test pattern**: Call the registration function N times, verify listener count is 1.
184
+
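The module-level flag and its test pattern might be sketched like this. The flag name `signalHandlersRegistered` follows the guard mentioned in the Round 23 survey, while `registerSignalHandlers` and the stand-in emitter are assumptions. Because the flag is global rather than per-target, this sketch only suits a single target such as `process`.

```typescript
import { EventEmitter } from "node:events";

// Module-level flag: repeated registration calls (for example after a reload) become no-ops.
let signalHandlersRegistered = false;

// `process` is an EventEmitter, so it can be passed as `target` in real use.
function registerSignalHandlers(target: EventEmitter, onShutdown: () => void): void {
  if (signalHandlersRegistered) return;
  signalHandlersRegistered = true;
  target.on("SIGTERM", onShutdown);
  target.on("SIGHUP", onShutdown);
}

// Test pattern from above: register N times, then verify exactly one listener exists.
const fakeProcess = new EventEmitter(); // stand-in so the demo does not touch real signals
for (let i = 0; i < 5; i++) registerSignalHandlers(fakeProcess, () => {});
console.log(fakeProcess.listenerCount("SIGTERM")); // 1
```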
185
+ ## Round Workflow (use this for EVERY round)
186
+
187
+ ### Step 1: Pick a focus
188
+ Choose ONE of the 7 patterns above. Don't try to do multiple patterns in one round.
189
+
190
+ ### Step 2: Explore (read 3-5 files)
191
+ Read the actual source for the focus area. Don't trust prior audit docs.
192
+
193
+ ### Step 3: Verify from source
194
+ For each candidate issue:
195
+ - Read the file at the cited line
196
+ - Check if the issue is real (not a false positive)
197
+ - Check if it's already fixed
198
+ - Note the exact file:line and code snippet
199
+
200
+ ### Step 4: Create a plan doc
201
+ ```markdown
202
+ # Round N Audit Fix Plan
203
+ ## Findings
204
+ ### Issue 1: <file>:<line> — <title> (severity)
205
+ <File path and line numbers>
206
+ <Code snippet showing the issue>
207
+ <Rationale>
208
+
209
+ ## Plan (5 phases)
210
+ ### Phase 1: <action>
211
+ ### Phase 2: <action>
212
+ ...
213
+ ```
214
+
215
+ ### Step 5: Implement
216
+ - Make the fix
217
+ - Add tests (if applicable)
218
+ - Run typecheck: `npx tsc --noEmit`
219
+ - Run tests: `npm test`
220
+
221
+ ### Step 6: Commit + Release
222
+ - Commit with conventional message: `fix: round N - <summary>`
223
+ - Update CHANGELOG.md
224
+ - Bump version (patch)
225
+ - Push + npm publish
226
+ - Create GitHub release
227
+
228
+ ### Step 7: Decide: continue or stop?
229
+ After 5-10 rounds, evaluate:
230
+
231
+ **Continue if**:
232
+ - Last 2 rounds found HIGH or MEDIUM severity issues
233
+ - Test coverage is < 80% of modules
234
+ - User explicitly wants more
235
+
236
+ **Stop if**:
237
+ - Last 2 rounds found only LOW severity or L1 cleanup
238
+ - All patterns exhausted (you've done each at least once)
239
+ - Diminishing returns: more time spent planning than implementing
240
+
241
+ ## When to Use Teams vs. Do It Yourself
242
+
243
+ **Use teams** (via `team action='run', team='review'`) for:
244
+ - Initial broad audit (round 1)
245
+ - Security reviews (specialized `security-reviewer` agent)
246
+ - When you need 3+ perspectives (multi-explorer)
247
+
248
+ **Do it yourself** for:
249
+ - Round 2+ (you have context from prior rounds)
250
+ - Focused single-pattern work (L1 cleanup, test coverage)
251
+ - Small fixes (< 5 file edits)
252
+
253
+ **Teams often fail because**:
254
+ - 5-min heartbeat timeout for long-running runs (add `startTeamRunHeartbeat` if needed)
255
+ - Agent cancellations
256
+ - Hallucinated file:line references (always verify from source)
257
+
258
+ ## Common False Positives (audit findings to reject)
259
+
260
+ After 19 rounds, ~30% of audit findings are false positives. Common patterns:
261
+
262
+ 1. **"Double-merge in config"** — looks like a bug, but project config + user config merge is intentional
263
+ 2. **"as unknown as T in error handling"** — necessary for TypeScript's strict mode
264
+ 3. **"Auto-repair timer race"** — there's a guard like `cleanedUp || !currentCtx` you missed
265
+ 4. **"Already-validated input"** — validation is in the caller, not the callee
266
+ 5. **"Redundant null check"** — TypeScript narrowing doesn't always work for closures
267
+
268
+ **Always verify against source** before acting. If you're not sure, write a test that exercises the alleged bug path. If the test passes, it's a false positive.
269
+
270
+ ## Success Metrics
271
+
272
+ After each round, record:
273
+ - Issues found (real vs. false positive)
274
+ - Tests added
275
+ - Typecheck clean?
276
+ - Total test count delta
277
+
278
+ **Healthy round**: 3-8 real issues found, +20 to +50 tests added, all pass.
279
+
280
+ **Exhausted round**: 0-1 real issues found, 0 tests added, mostly L1 cleanup.
281
+
282
+ When you hit 2+ exhausted rounds in a row, **stop**.
283
+
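The stop rule above can be expressed as a small check. This is an illustrative sketch: the `RoundResult` shape and the function names are hypothetical, and "exhausted" is reduced to the two numeric thresholds given above (0-1 real issues, 0 tests added).

```typescript
// Hypothetical per-round record; only the two fields needed for the stop rule.
interface RoundResult {
  realIssues: number;
  testsAdded: number;
}

// "Exhausted" per the thresholds above: at most one real issue and no new tests.
function isExhausted(round: RoundResult): boolean {
  return round.realIssues <= 1 && round.testsAdded === 0;
}

// Stop once the most recent `streak` rounds were all exhausted.
function shouldStop(history: RoundResult[], streak = 2): boolean {
  if (history.length < streak) return false;
  return history.slice(-streak).every(isExhausted);
}

const auditHistory: RoundResult[] = [
  { realIssues: 5, testsAdded: 30 },
  { realIssues: 1, testsAdded: 0 },
  { realIssues: 0, testsAdded: 0 },
];
console.log(shouldStop(auditHistory)); // true
```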
284
+ ## Real Examples from 19 Rounds
285
+
286
+ | Round | Focus | Issues Found | Severity Range |
287
+ |-------|-------|--------------|----------------|
288
+ | 1-3 | Broad security audit | 11 | CRITICAL, HIGH |
289
+ | 4-6 | Race conditions, locks | 5 | HIGH |
290
+ | 7-9 | L1 cleanup, dead code | 12 | LOW |
291
+ | 10-12 | Defensive caps | 3 | MEDIUM |
292
+ | 13-15 | Security: execSync, sandbox | 9 | CRITICAL, HIGH |
293
+ | 16-18 | Test coverage, L1 | 30+ | LOW |
294
+ | 19 | Path validation, tests | 5 | MEDIUM |
295
+
296
+ **Pattern**: First 3 rounds find the most impactful issues. Rounds 4-15 find the rest. Rounds 16+ are diminishing returns (mostly test coverage and L1 cleanup).
297
+
298
+ ## Anti-Patterns to Avoid
299
+
300
+ - **Mega-rounds** (10+ files, 5+ categories) — too broad, low quality findings
301
+ - **Trusting audit docs** — always verify from source
302
+ - **Skipping typecheck** — type errors compound and become hard to debug later
303
+ - **Releasing mid-round** — leaves the codebase in a half-hardened state
304
+ - **No test for the fix** — every fix needs a test that would have caught the bug
305
+ - **Committing too late** — commit after each phase, not at the end of the round
306
+
307
+ ## Enforcement — Iterative Audit Gate
308
+
309
+ **Before reporting round findings, verify:**
310
+
311
+ - [ ] Round focus is ONE of the 7 patterns (not multiple)
312
+ - [ ] Each finding has a verified `file:line` reference (read the actual source)
313
+ - [ ] False positives filtered out (consult "Common False Positives" section)
314
+ - [ ] Severity assigned using the standard scale (CRITICAL / HIGH / MEDIUM / LOW)
315
+ - [ ] Plan doc created with phases and file:line evidence
316
+ - [ ] Typecheck clean: `npx tsc --noEmit` returns 0 errors
317
+ - [ ] All tests pass: `npm test` shows 0 failures
318
+ - [ ] Tests added for the fix (if applicable)
319
+ - [ ] Round results recorded: issues found, tests added, delta
320
+ - [ ] Decision logged: continue to next round or stop (with reason)
321
+
322
+ **If ANY answer is NO → Stop. Complete audit requirements before reporting round results.**
323
+
324
+ ## Related Skills
325
+
326
+ - `scrutinize` — Quick outsider-perspective review of a single change
327
+ - `multi-perspective-review` — 8-pass deep review for a single change
328
+ - `security-review` — Security-focused audit with detection authoring
329
+ - `verification-before-done` — Evidence before claim (use per round)
330
+ - `systematic-debugging` — When a finding reveals a real bug that needs deeper investigation
package/src/extension/management.ts CHANGED
@@ -3,7 +3,7 @@ import * as fs from "node:fs";
3
3
  import * as path from "node:path";
4
4
  import type { AgentConfig, ResourceSource, RoutingMetadata } from "../agents/agent-config.ts";
5
5
  import { serializeAgent } from "../agents/agent-serializer.ts";
6
- import { allAgents, discoverAgents } from "../agents/discover-agents.ts";
6
+ import { discoverAgents } from "../agents/discover-agents.ts";
7
7
  import type { TeamToolDetails } from "./team-tool-types.ts";
8
8
  import { toolResult, type PiTeamsToolResult } from "./tool-result.ts";
9
9
  import type { TeamToolParamsValue } from "../schema/team-tool-schema.ts";
package/src/extension/plan-orchestrate.ts CHANGED
@@ -6,7 +6,6 @@
6
6
  */
7
7
 
8
8
  import * as fs from "node:fs";
9
- import * as path from "node:path";
10
9
 
11
10
  /**
12
11
  * Tag → Agent chain mapping from ECC recommendations.