pi-crew 0.5.14 → 0.5.16
- package/CHANGELOG.md +117 -0
- package/README.md +1 -1
- package/docs/pi-crew-v0.5.16-audit-fix-plan.md +35 -0
- package/docs/pi-crew-v0.5.17-audit-fix-plan.md +80 -0
- package/docs/skills/REFERENCE.md +11 -0
- package/package.json +1 -1
- package/skills/iterative-audit/SKILL.md +330 -0
- package/src/extension/management.ts +1 -1
- package/src/extension/plan-orchestrate.ts +0 -1
- package/src/extension/register.ts +16 -7
- package/src/extension/registration/viewers.ts +1 -1
- package/src/extension/run-index.ts +1 -1
- package/src/extension/team-tool/explain.ts +0 -1
- package/src/extension/team-tool/handle-schedule.ts +0 -1
- package/src/extension/team-tool/health-monitor.ts +0 -1
- package/src/extension/team-tool/run.ts +2 -2
- package/src/extension/team-tool/status.ts +1 -1
- package/src/extension/team-tool.ts +2 -30
- package/src/observability/exporters/otlp-exporter.ts +11 -1
- package/src/runtime/child-pi.ts +1 -1
- package/src/runtime/crash-recovery.ts +1 -1
- package/src/runtime/crew-agent-records.ts +23 -3
- package/src/runtime/crew-hooks.ts +1 -1
- package/src/runtime/handoff-manager.ts +0 -1
- package/src/runtime/heartbeat-watcher.ts +1 -1
- package/src/runtime/live-session-runtime.ts +0 -1
- package/src/runtime/loop-gates.ts +0 -1
- package/src/runtime/mcp-proxy.ts +2 -2
- package/src/runtime/pipeline-runner.ts +1 -2
- package/src/runtime/task-runner/live-executor.ts +1 -2
- package/src/runtime/task-runner.ts +1 -1
- package/src/state/jsonl-writer.ts +24 -0
- package/src/state/locks.ts +66 -35
- package/src/state/run-metrics.ts +1 -2
- package/src/state/schedule.ts +13 -5
- package/src/state/state-store.ts +1 -1
- package/src/tools/safe-bash.ts +0 -1
- package/src/ui/crew-widget.ts +2 -2
- package/src/ui/render-diff.ts +1 -1
- package/src/ui/run-dashboard.ts +1 -2
- package/src/ui/tool-render.ts +20 -3
- package/src/utils/conflict-detect.ts +0 -1
- package/src/utils/gh-protocol.ts +0 -2
package/CHANGELOG.md
CHANGED
|
@@ -1,5 +1,122 @@
|
|
|
1
1
|
# Changelog
|
|
2
2
|
|
|
3
|
+
## [0.5.16] — Rounds 22–31 Audit Fixes (2026-06-02)
|
|
4
|
+
|
|
5
|
+
### Highlights
|
|
6
|
+
- **1 bug fix**: OTLP exporter `dispose()` now awaits in-flight push (bounded by 10s timeout)
|
|
7
|
+
- **269 new unit tests** across 16 previously-untested modules (Pattern #3)
|
|
8
|
+
- **72 unused imports removed** across 28 source files (Pattern #6)
|
|
9
|
+
- **2 defensive caps** for unbounded Maps (Pattern #2)
|
|
10
|
+
- **1 L1 fix**: `console.warn` → `logInternalError` in crew-hooks
|
|
11
|
+
|
|
12
|
+
### Round 22: Defensive Caps (commit 85b3be6)
|
|
13
|
+
- Bounded `autoRecoveryLast` and `agentEventSeqCache` Maps to 1000 entries
|
|
14
|
+
- Eviction uses insertion-order oldest-first pattern
|
|
15
|
+
|
|
16
|
+
### Round 23: Resource Cleanup (commit 4be2c4e)
|
|
17
|
+
- OTLP exporter `dispose()` now async, awaits in-flight push with 10s timeout
|
|
18
|
+
- Surveyed all setInterval/setTimeout, process.on, file watchers, event listeners, AbortControllers — all clean
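The Round 23 change can be pictured with a minimal sketch. The `PushExporter` class below is hypothetical (its name, fields, and constructor are assumptions, not pi-crew's exporter); it only illustrates a `dispose()` that clears the timer synchronously and then awaits any in-flight push, bounded by a timeout.

```typescript
// Hypothetical exporter; structure and names are assumptions for illustration only.
class PushExporter {
  private timer: ReturnType<typeof setInterval> | undefined;
  private inFlight: Promise<void> | undefined;
  private push: () => Promise<void>;
  private disposeTimeoutMs: number;

  constructor(push: () => Promise<void>, intervalMs: number, disposeTimeoutMs = 10_000) {
    this.push = push;
    this.disposeTimeoutMs = disposeTimeoutMs;
    this.timer = setInterval(() => void this.flush(), intervalMs);
  }

  /** Start a push unless one is already running, and remember it for dispose(). */
  flush(): Promise<void> {
    if (!this.inFlight) {
      this.inFlight = this.push()
        .catch(() => {}) // a failed push must not reject dispose()
        .finally(() => {
          this.inFlight = undefined;
        });
    }
    return this.inFlight;
  }

  /** Clear the timer right away, then wait for the pending push (bounded). */
  async dispose(): Promise<void> {
    if (this.timer) clearInterval(this.timer);
    this.timer = undefined;
    const pending = this.inFlight;
    if (!pending) return;
    let timeout: ReturnType<typeof setTimeout> | undefined;
    const cap = new Promise<void>((resolve) => {
      timeout = setTimeout(resolve, this.disposeTimeoutMs);
    });
    try {
      await Promise.race([pending, cap]);
    } finally {
      if (timeout) clearTimeout(timeout);
    }
  }
}
```

Because the timer is cleared before the first `await`, a caller that does not await `dispose()` still stops future pushes immediately; only the wait for the pending push is asynchronous.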
|
|
19
|
+
|
|
20
|
+
### Round 24: Test Coverage — discover-agents, markers, tiered-eval (commit cfe5242)
|
|
21
|
+
- 50 new tests: `sanitizeAgentSystemPrompt` (6 rules), `sanitizeGuidanceContent` (5 rules), `TieredEvalRunner` class
|
|
22
|
+
|
|
23
|
+
### Round 25: Test Coverage — adaptive-plan, group-join (commit 89e1cf1)
|
|
24
|
+
- 42 new tests: `slug`, `extractAdaptivePlanJson`, `parseAdaptivePlan`, `repairAdaptivePlan`, `GroupJoinManager`
|
|
25
|
+
|
|
26
|
+
### Round 26: Test Coverage — pi-args, i18n (commit 3669f24)
|
|
27
|
+
- 38 new tests: `applyThinkingSuffix`, `resolveCrewMaxDepth`, `t()`, `addTranslations`, `listLocales`
|
|
28
|
+
|
|
29
|
+
### Round 27: Test Coverage — validation-types, live-extension-bridge (commit 44a2366)
|
|
30
|
+
- 36 new tests: `validateWithSeverity` strict/lenient modes, `buildExtensionBridge` mock session
|
|
31
|
+
|
|
32
|
+
### Round 28: Test Coverage — direct-run, live-session-health (commit 339ac7d)
|
|
33
|
+
- 17 new tests: `isDirectRun`, `directTeamAndWorkflowFromRun`, `collectLiveSessionHealth`
|
|
34
|
+
|
|
35
|
+
### Round 29: Test Coverage — process-status, task-claims (commit 405e05d)
|
|
36
|
+
- 43 new tests: `checkProcessLiveness`, `isActiveRunStatus`, full claim lifecycle
|
|
37
|
+
|
|
38
|
+
### Round 30: Test Coverage — task-display, green-contract, session-utils (commit 7d065ca)
|
|
39
|
+
- 43 new tests: `shouldMaterializeAgent`, `taskById`, `waitingReason`, `greenLevelSatisfies`, `assertValidSessionId`
|
|
40
|
+
|
|
41
|
+
### Round 31: Code Quality — unused imports + L1 fix (commit 35cc0e7)
|
|
42
|
+
- 72 unused imports removed across 28 source files
|
|
43
|
+
- `crew-hooks.ts`: `console.warn` → `logInternalError` for unknown event types
|
|
44
|
+
|
|
45
|
+
### Stats
|
|
46
|
+
- Test suite: 2657 pass + 1 skip, 0 fail (was 2370 in v0.5.14; +287 net)
|
|
47
|
+
- TypeScript: 0 errors
|
|
48
|
+
- New test files: 13
|
|
49
|
+
- Files touched: 58
|
|
50
|
+
|
|
51
|
+
## [0.5.15] — Round 20 + 21 Audit Fixes (2026-06-02)
|
|
52
|
+
|
|
53
|
+
### Source tour
|
|
54
|
+
- Pulled latest `can1357/oh-my-pi` (1751 new commits since 2026-05-11) to working copy
|
|
55
|
+
- Surveyed extensibility, skill system, and security/performance changes via 3 parallel explorer agents
|
|
56
|
+
- Distilled 2 high-impact, immediately applicable patterns (Round 20)
|
|
57
|
+
- Identified 5 more upgrade opportunities; applied 5 in Round 21
|
|
58
|
+
|
|
59
|
+
### Round 20: Lock token guard + tool-error sanitization (commit f448d7d)
|
|
60
|
+
|
|
61
|
+
#### 1. Per-process lock tokens (src/state/locks.ts)
|
|
62
|
+
- **Pattern source**: oh-my-pi commit `cd578a86d` (`file-lock.ts:13-152`)
|
|
63
|
+
- **Bug fixed**: "Losing contender wipes winner's lock" race when one process times out and steals a stale lock that the original holder is about to release
|
|
64
|
+
- Lock file now carries a UUID token. `releaseLock` refuses to `fs.rm` unless the stored token matches.
|
|
65
|
+
- 3 new tests in `test/unit/locks-race.test.ts`
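A minimal sketch of the token guard, assuming a JSON lock file with `token` and `pid` fields. The function names, the `LockHandle` shape, and the `tempLockPath` demo helper are hypothetical and do not mirror the actual `src/state/locks.ts` API.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";
import { randomUUID } from "node:crypto";

interface LockHandle {
  lockPath: string;
  token: string;
}

/** Atomically create the lock file; the "wx" flag fails if it already exists. */
function acquireLock(lockPath: string): LockHandle | undefined {
  const token = randomUUID();
  try {
    fs.writeFileSync(lockPath, JSON.stringify({ token, pid: process.pid }), { flag: "wx" });
    return { lockPath, token };
  } catch (err) {
    if ((err as { code?: string }).code === "EEXIST") return undefined; // someone else holds it
    throw err;
  }
}

/** Remove the lock only if it still carries our token. */
function releaseLock(handle: LockHandle): boolean {
  let stored: { token?: string };
  try {
    stored = JSON.parse(fs.readFileSync(handle.lockPath, "utf8"));
  } catch {
    return false; // missing or unreadable: nothing of ours to remove
  }
  if (stored.token !== handle.token) return false; // lock now belongs to another holder
  fs.rmSync(handle.lockPath, { force: true });
  return true;
}

/** Demo helper (assumption): a lock path inside a fresh temp directory. */
function tempLockPath(): string {
  return path.join(fs.mkdtempSync(path.join(os.tmpdir(), "lock-demo-")), "run.lock");
}
```

In this sketch the read and the remove are two separate steps, so a small window remains between them; the token check narrows the race described above rather than eliminating every interleaving.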
|
|
66
|
+
|
|
67
|
+
#### 2. Tool-error sanitization (src/ui/tool-render.ts)
|
|
68
|
+
- **Pattern source**: oh-my-pi `render-utils.ts:177-185` (`replaceTabs(truncateToWidth(clean, LINE_CAP))`)
|
|
69
|
+
- **Bug fixed**: Embedded tabs/newlines/long strings in tool errors break TUI border alignment
|
|
70
|
+
- Applied to `renderAgentProgress` and `renderAgentToolResult` (2 places)
|
|
71
|
+
- `replaceTabs` is now exported from `src/ui/render-diff.ts` for reuse
|
|
72
|
+
- 2 new tests in `test/unit/tool-render.test.ts`
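The sanitization order named above, `replaceTabs(truncateToWidth(clean, LINE_CAP))`, can be sketched as follows. The helper bodies, the `LINE_CAP` value, and `sanitizeToolError` are assumptions for illustration; the real helpers presumably measure terminal display width rather than code points.

```typescript
const LINE_CAP = 120; // assumed value; the changelog does not state the real cap

/** Replace tabs with spaces so border column math stays predictable. */
function replaceTabs(text: string, tabWidth = 2): string {
  return text.replace(/\t/g, " ".repeat(tabWidth));
}

/** Naive truncation by code points, ending with an ellipsis when cut. */
function truncateToWidth(text: string, width: number): string {
  const chars = Array.from(text);
  if (chars.length <= width) return text;
  return chars.slice(0, Math.max(0, width - 1)).join("") + "…";
}

/** Collapse a raw tool error into one bounded line that is safe to draw inside a border. */
function sanitizeToolError(raw: string): string {
  const clean = raw.replace(/\r?\n|\r/g, " ").trim();
  return replaceTabs(truncateToWidth(clean, LINE_CAP));
}
```

Note that truncating before expanding tabs, as in the quoted pattern, means a line containing tabs can end up slightly wider than the cap once the tabs are expanded.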
|
|
73
|
+
|
|
74
|
+
### Round 21: L1 cleanup, lock kind, JSONL per-line cap, in-place loader test (commit 1bf120b)
|
|
75
|
+
|
|
76
|
+
#### 1. L1 cleanup in src/state/schedule.ts
|
|
77
|
+
- `console.warn` → `logInternalError` (consistency with rest of codebase)
|
|
78
|
+
- `require("node:fs")` → top-level `fs`/`path` imports
|
|
79
|
+
- 3 new tests in `test/unit/schedule-store.test.ts`
|
|
80
|
+
|
|
81
|
+
#### 2. Dead code sweep in src/state/locks.ts
|
|
82
|
+
- Removed misleadingly-named `readLockStateAsync` (sync I/O, called from async path) and its redundant call site
|
|
83
|
+
- Async path now mirrors sync path exactly: stale-check + release + sleep
|
|
84
|
+
|
|
85
|
+
#### 3. Lock file `kind` discriminator (forward compat)
|
|
86
|
+
- Lock JSON now includes `kind: "run" | "file"`
|
|
87
|
+
- `withRunLock` writes `kind="run"`; `withFileLockSync` writes `kind="file"`
|
|
88
|
+
- Old locks (no `kind` field) still work — `releaseLock` only reads `token`, so the discriminator is purely additive
|
|
89
|
+
- 3 new tests (kind for run, kind for file, back-compat with legacy locks)
|
|
90
|
+
|
|
91
|
+
#### 4. JSONL per-line cap (defensive, src/state/jsonl-writer.ts)
|
|
92
|
+
- Single huge line could exhaust memory during `redactJsonLine`
|
|
93
|
+
- New `DEFAULT_MAX_LINE_BYTES = 1MB`. Lines exceeding the cap are dropped and counted
|
|
94
|
+
- `logInternalError` fires on the first drop and every 100th drop thereafter
|
|
95
|
+
- 2 new tests in `test/unit/jsonl-writer.test.ts`
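A minimal sketch of the per-line cap, assuming a writer that takes a sink callback and a report callback (both hypothetical, as is the class name). "Every 100th drop" is interpreted here as drops number 100, 200, and so on; the actual writer may count differently.

```typescript
const DEFAULT_MAX_LINE_BYTES = 1024 * 1024; // 1 MB, per the changelog entry

class CappedLineWriter {
  droppedLines = 0;
  private sink: (line: string) => void;
  private report: (message: string) => void;
  private maxLineBytes: number;

  constructor(
    sink: (line: string) => void,
    report: (message: string) => void,
    maxLineBytes = DEFAULT_MAX_LINE_BYTES,
  ) {
    this.sink = sink;
    this.report = report;
    this.maxLineBytes = maxLineBytes;
  }

  /** Forward the line to the sink, or drop and count it when it exceeds the cap. */
  write(line: string): boolean {
    const bytes = new TextEncoder().encode(line).length;
    if (bytes > this.maxLineBytes) {
      this.droppedLines += 1;
      // Report the first drop, then only every 100th, to avoid flooding the error log.
      if (this.droppedLines === 1 || this.droppedLines % 100 === 0) {
        this.report(`dropped ${this.droppedLines} oversized line(s); last was ${bytes} bytes`);
      }
      return false;
    }
    this.sink(line);
    return true;
  }
}
```

Checking the size before any further processing is the point of the cap: the expensive step (redaction, in the changelog's description) never sees an oversized line.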
|
|
96
|
+
|
|
97
|
+
#### 5. In-place extension loader integration test
|
|
98
|
+
- **Pattern source**: oh-my-pi commit `c5e3698f4` (changed how extensions are loaded)
|
|
99
|
+
- This test verifies pi-crew's `import.meta.url`-based skill path resolution still works with the new in-place loader
|
|
100
|
+
- 2 new tests in `test/integration/extension-skill-resolution.test.ts`
|
|
101
|
+
|
|
102
|
+
### Summary
|
|
103
|
+
- **2 rounds** (Round 20 + 21)
|
|
104
|
+
- **2 commits**: `f448d7d` (Round 20) + `1bf120b` (Round 21)
|
|
105
|
+
- **10 new tests** across 4 test files
|
|
106
|
+
- **Total tests**: 50 pass + 1 skip, **0 fail** (was 49 in v0.5.14)
|
|
107
|
+
- **TypeScript**: 0 errors
|
|
108
|
+
- **Patterns adopted**: 5 from `can1357/oh-my-pi` post-2026-05-11
|
|
109
|
+
|
|
110
|
+
### Patterns surveyed but not applied (low applicability for pi-crew)
|
|
111
|
+
- **Streaming JSON throttle** (3a733c480) — pi-crew has no streaming JSON parser
|
|
112
|
+
- **In-place state mutation** (3a733c480) — pi-crew's spreads are bounded (small N), not hot paths
|
|
113
|
+
- **Bounded row probing** (b522fde56) — pi-crew has no SQL queries
|
|
114
|
+
- **MCP reconnect storm circuit breaker** — pi-crew has no MCP reconnect logic
|
|
115
|
+
- **Drop `args` global from eval** (4ab40764d) — pi-crew's `dynamic-script-runner.ts` already safe
|
|
116
|
+
- **Shell-injection rejection in git specs** (22e564a85) — pi-crew has no plugin install path
|
|
117
|
+
- **NPM registry pinning** (9abce6e97) — pi-crew's `install.mjs` is config-only; user runs `pi install npm:pi-crew`
|
|
118
|
+
- **Extension flag shadow** (1fbc2cbd7) — pi-crew has no `registerFlag` calls
|
|
119
|
+
|
|
3
120
|
## [0.5.14] — Round 19 Audit Fixes (2026-06-02)
|
|
4
121
|
|
|
5
122
|
### Phase 1: Path validation in checkpoint.ts (MEDIUM security)
|
package/README.md
CHANGED
package/docs/pi-crew-v0.5.16-audit-fix-plan.md
CHANGED
|
@@ -0,0 +1,35 @@
|
|
|
1
|
+
# Round 22 Audit Fix Plan (Defensive Caps)
|
|
2
|
+
|
|
3
|
+
## Findings
|
|
4
|
+
|
|
5
|
+
### Issue 1: `autoRecoveryLast` Map grows unboundedly (MEDIUM, MEMORY)
|
|
6
|
+
- **File**: `src/extension/register.ts:484`
|
|
7
|
+
- **What**: Module-level `Map<string, number>` keyed by `${kind}_${runId}`. Holds cooldown timestamps for "recovery notifications" (5-minute gate per key).
|
|
8
|
+
- **Bug**: Entries are NEVER removed during a session. Each run contributes up to 4 keys (one per `maybeNotifyHealth` kind). Long-running pi sessions that run 1000+ teams accumulate 4000+ entries (~32KB).
|
|
9
|
+
- **Severity**: MEDIUM — silent memory growth in long-running process. Not a security issue.
|
|
10
|
+
- **Fix**: Add `AUTO_RECOVERY_LAST_MAX_ENTRIES` cap. Evict oldest insertion (matches the 5-min cooldown gate semantics — once the gate has expired, the entry is irrelevant). The eviction loop runs on each `set()` to amortize the cost.
|
|
11
|
+
|
|
12
|
+
### Issue 2: `agentEventSeqCache` Map grows unboundedly (MEDIUM, MEMORY)
|
|
13
|
+
- **File**: `src/runtime/crew-agent-records.ts:265`
|
|
14
|
+
- **What**: Module-level `Map<string, { size, mtimeMs, seq }>` keyed by `filePath` (each agent event log). Caches the `.seq` sidecar value.
|
|
15
|
+
- **Bug**: Entries are NEVER removed. Each new agent task creates a new event log file, adding a cache entry. A long-running pi-crew process that spawns 1000s of agents accumulates 1000s of entries.
|
|
16
|
+
- **Severity**: MEDIUM — silent memory growth. Plus, stale entries mask filesystem changes (mtime/size won't reflect a re-created file).
|
|
17
|
+
- **Fix**: Add `AGENT_EVENT_SEQ_CACHE_MAX_ENTRIES` cap. Evict oldest insertion first (mirrors the `asyncAgentReaderCache` pattern at line 134-136 in the same file).
|
|
18
|
+
|
|
19
|
+
## Plan (2 phases)
|
|
20
|
+
|
|
21
|
+
### Phase 1: `autoRecoveryLast` defensive cap
|
|
22
|
+
- `src/extension/register.ts:484` — add `AUTO_RECOVERY_LAST_MAX_ENTRIES = 1000` constant
|
|
23
|
+
- Modify the `set()` site at line 1534 to evict oldest entries before inserting when size > cap
|
|
24
|
+
- Add test in `test/unit/auto-recovery-cap.test.ts`
|
|
25
|
+
|
|
26
|
+
### Phase 2: `agentEventSeqCache` defensive cap
|
|
27
|
+
- `src/runtime/crew-agent-records.ts:265` — add `AGENT_EVENT_SEQ_CACHE_MAX_ENTRIES = 1000` constant
|
|
28
|
+
- Add helper function `setAgentEventSeqCache()` that wraps the `.set()` and evicts oldest entries
|
|
29
|
+
- Add test in `test/unit/crew-agent-records.test.ts` (or new file)
|
|
30
|
+
|
|
31
|
+
## Expected impact
|
|
32
|
+
- 2 new tests, 0 regressions
|
|
33
|
+
- Total: 2 MEDIUM memory-leak fixes
|
|
34
|
+
- No public API changes
|
|
35
|
+
- Pattern: follows existing `NotificationRouter.SEEN_MAP_MAX_SIZE` and `asyncAgentReaderCache` patterns in the codebase
|
package/docs/pi-crew-v0.5.17-audit-fix-plan.md
CHANGED
|
@@ -0,0 +1,80 @@
|
|
|
1
|
+
# Round 23 Audit Findings (Resource Cleanup)
|
|
2
|
+
|
|
3
|
+
## Skill: iterative-audit (Pattern #7: Resource Cleanup)
|
|
4
|
+
|
|
5
|
+
## Findings
|
|
6
|
+
|
|
7
|
+
### Issue 1: OTLP exporter `inFlight` push not awaited on dispose (LOW)
|
|
8
|
+
- **File**: `src/observability/exporters/otlp-exporter.ts:80-86, 127-130`
|
|
9
|
+
- **What**: When `dispose()` is called, the interval timer is cleared but the in-flight `push()` continues to run until the 10s fetch timeout. The result is lost (not awaited).
|
|
10
|
+
- **Severity**: LOW — bounded by 10s fetch timeout. Not a real leak, just orphaned work.
|
|
11
|
+
- **Fix**: Make `dispose()` async. Await the in-flight push before returning.
|
|
12
|
+
- **Test**: 1 new test verifies `dispose()` waits for the in-flight push.
|
|
13
|
+
|
|
14
|
+
## Patterns surveyed (all VERIFIED clean from source)
|
|
15
|
+
|
|
16
|
+
### setInterval / setTimeout cleanup
|
|
17
|
+
| File | Resource | Cleanup | Status |
|
|
18
|
+
|------|----------|---------|--------|
|
|
19
|
+
| `register.ts:411` | `autoRepairTimer` | cleared on line 308, 402, 1102 | OK |
|
|
20
|
+
| `register.ts:442` | `tempReconcileTimer` | cleared on line 308, 402, 1102 | OK |
|
|
21
|
+
| `result-watcher.ts:80` | `pollTimer` | cleared in `stopPolling()` | OK |
|
|
22
|
+
| `result-watcher.ts:96` | `restartTimer` | cleared in `scheduleRestart()` and `stop()` | OK |
|
|
23
|
+
| `async-notifier.ts:101` | `state.interval` | cleared in `stopAsyncRunNotifier()` | OK |
|
|
24
|
+
| `subagent-tools.ts:228` | `timer` | cleanup function returned to caller | OK |
|
|
25
|
+
| `team-tool.ts:160` | `timer` | `stop()` method clears it | OK |
|
|
26
|
+
| `live-conversation-overlay.ts:55` | `pollTimer` | cleared in `close()` / `dispose()` | OK |
|
|
27
|
+
| `loaders.ts:127` | `timer` | cleared in `dispose()` | OK |
|
|
28
|
+
| `theme-adapter.ts:145` | `pollTimer` | cleared in unsubscribe (line 169) | OK |
|
|
29
|
+
| `delivery-coordinator.ts:169` | `ttlTimer` | cleared in `dispose()` | OK |
|
|
30
|
+
| `parent-guard.ts:61` | `guardInterval` | cleared in `stopParentGuard()` | OK |
|
|
31
|
+
| `scheduler.ts:88` | `t` (timer) | cleared on job removal | OK |
|
|
32
|
+
| `otlp-exporter.ts:80` | `timer` | cleared in `dispose()` (Round 23: also awaits inFlight) | OK |
|
|
33
|
+
| `team-runner.ts:67` | `interval` | local scope (per-run) | OK |
|
|
34
|
+
| `metric-sink.ts:68` | `timer` | cleared in `dispose()` (also closes fd) | OK |
|
|
35
|
+
| `handoff-manager.ts:203` | `cleanupTimer` | cleared in `dispose()` (also clears Maps) | OK |
|
|
36
|
+
| `live-session-runtime.ts:487` | `controlTimer` | cleared in `finally` block | OK |
|
|
37
|
+
| `budget-tracker.ts:231` | `abortInterval` | cleared on abort/exhausted | OK |
|
|
38
|
+
| `background-runner.ts:52, 74` | `interval` | local scope (process entry point) | OK |
|
|
39
|
+
|
|
40
|
+
### process.on() signal handler registration
|
|
41
|
+
| File | Handlers | Guard | Status |
|
|
42
|
+
|------|----------|-------|--------|
|
|
43
|
+
| `crew-cleanup.ts:79, 84` | SIGTERM, SIGHUP | `signalHandlersRegistered` flag (Round 16) | OK |
|
|
44
|
+
| `background-runner.ts:107, 148, 175, 181, 194, 198` | many | process entry point (registered once per process) | OK |
|
|
45
|
+
| `event-log.ts:490-492` | exit, SIGTERM, SIGINT | module-level (ESM caches) | OK |
|
|
46
|
+
| `atomic-write.ts:265-267` | exit, SIGTERM, SIGINT | module-level (ESM caches) | OK |
|
|
47
|
+
|
|
48
|
+
### File watchers
|
|
49
|
+
| File | Watcher | Cleanup | Status |
|
|
50
|
+
|------|---------|---------|--------|
|
|
51
|
+
| `register.ts:682, 686` | `crewWatcher`, `userCrewWatcher` | `closeWatcher()` in cleanup paths | OK |
|
|
52
|
+
| `result-watcher.ts` | `watcher` | `closeWatcher()` in `stop()` | OK |
|
|
53
|
+
|
|
54
|
+
### Event listeners
|
|
55
|
+
| File | Listener | Cleanup | Status |
|
|
56
|
+
|------|----------|---------|--------|
|
|
57
|
+
| `event-bus.ts:on()` | deduped via Set | cleanup function returned | OK |
|
|
58
|
+
| `run-event-bus.ts:onAny()` etc. | deduped via Sets | cleanup function returned | OK |
|
|
59
|
+
| `phase-tracker.ts:dispose()` | EventEmitter | `removeAllListeners()` | OK |
|
|
60
|
+
| `team-tool.ts:72` | signal listener | `removeEventListener` in `finally` | OK |
|
|
61
|
+
|
|
62
|
+
### AbortController
|
|
63
|
+
| File | Controller | Cleanup | Status |
|
|
64
|
+
|------|-----------|---------|--------|
|
|
65
|
+
| `team-tool.ts:68` | per-tool | aborted via signal listener, removed in `finally` | OK |
|
|
66
|
+
| `subagent-manager.ts:290` | per-run | cleaned in `cleanupRunSignal()` | OK |
|
|
67
|
+
| `cancellation-token.ts:17` | per-token | aborted via `#controller.abort()` | OK |
|
|
68
|
+
| `otlp-exporter.ts:106` | per-push | cleared in `finally` block | OK (also: dispose awaits inFlight) |
|
|
69
|
+
|
|
70
|
+
## Plan (1 phase)
|
|
71
|
+
|
|
72
|
+
### Phase 1: OTLP exporter `dispose()` awaits inFlight
|
|
73
|
+
- `src/observability/exporters/otlp-exporter.ts:127-130` — make `dispose()` async, await `this.inFlight`
|
|
74
|
+
- 1 new test in `test/unit/otlp-exporter.test.ts`
|
|
75
|
+
|
|
76
|
+
## Expected impact
|
|
77
|
+
- 1 new test, 0 regressions
|
|
78
|
+
- Total: 1 LOW severity improvement
|
|
79
|
+
- No public API change (callers that don't await still get synchronous timer clear)
|
|
80
|
+
- Pattern: matches the existing `await` patterns elsewhere in the codebase
|
package/docs/skills/REFERENCE.md
CHANGED
|
@@ -38,6 +38,16 @@ multi-perspective-review (8-pass deep review)
|
|
|
38
38
|
secure-agent-orchestration-review (security focus)
|
|
39
39
|
```
|
|
40
40
|
|
|
41
|
+
### Multi-Round Audit (5-20 rounds)
|
|
42
|
+
|
|
43
|
+
```
|
|
44
|
+
iterative-audit (round planning, 7 patterns, diminishing-returns)
|
|
45
|
+
↓
|
|
46
|
+
multi-perspective-review (per round, optional)
|
|
47
|
+
↓
|
|
48
|
+
verification-before-done (per round)
|
|
49
|
+
```
|
|
50
|
+
|
|
41
51
|
---
|
|
42
52
|
|
|
43
53
|
## When to Invoke
|
|
@@ -48,6 +58,7 @@ secure-agent-orchestration-review (security focus)
|
|
|
48
58
|
| Before claiming done | `verification-before-done` |
|
|
49
59
|
| Code review (quick) | `scrutinize` |
|
|
50
60
|
| Code review (deep) | `multi-perspective-review` |
|
|
61
|
+
| Multi-round audit (5-20 rounds) | `iterative-audit` |
|
|
51
62
|
| Task delegation | `delegation-patterns` |
|
|
52
63
|
| Complex multi-phase work | `orchestration` |
|
|
53
64
|
| After bug is fixed | `post-mortem` |
|
package/package.json
CHANGED
package/skills/iterative-audit/SKILL.md
CHANGED
|
@@ -0,0 +1,330 @@
|
|
|
1
|
+
---
|
|
2
|
+
name: iterative-audit
|
|
3
|
+
description: "Iterative multi-round codebase audit with diminishing-returns detection. Run 5-20+ rounds, each focusing on one specific area. Built from 19 rounds of dogfooding pi-crew on itself."
|
|
4
|
+
triggers:
|
|
5
|
+
- "audit this codebase"
|
|
6
|
+
- "review everything"
|
|
7
|
+
- "find all bugs"
|
|
8
|
+
- "deep audit"
|
|
9
|
+
- "harden this"
|
|
10
|
+
- "iterate audit rounds"
|
|
11
|
+
- "multi-round review"
|
|
12
|
+
---
|
|
13
|
+
|
|
14
|
+
# Iterative Audit
|
|
15
|
+
|
|
16
|
+
> Distilled from 19 rounds of auditing pi-crew on itself (v0.5.5 → v0.5.14):
|
|
17
|
+
> ~70 issues fixed, 286 tests added, 9 security improvements, 2 performance improvements.
|
|
18
|
+
|
|
19
|
+
The core insight: **a single round of audit finds the easy 30% of bugs**. The remaining 70% only surfaces through 5-20+ targeted rounds, each with a specific focus. After round 5+ you find HIGH severity bugs that round 1 missed. After round 10+ you find issues that no human reviewer would catch in a single pass.
|
|
20
|
+
|
|
21
|
+
## Operating Stance
|
|
22
|
+
|
|
23
|
+
- **One focus per round.** Each round targets one of the 7 patterns below. Don't try to fix everything in one pass.
|
|
24
|
+
- **Source verification is mandatory.** Never trust audit docs or previous round reports — always read the actual code. ~30% of issues from prior rounds are false positives or already fixed.
|
|
25
|
+
- **Document every finding with file:line.** "Sandbox env allow-list" is useless. "src/runtime/sandbox.ts:70 — process.env full leak" is actionable.
|
|
26
|
+
- **Verify the team actually applied changes.** After any team run, run `git diff` and inspect. ~20% of team runs silently fail to apply changes.
|
|
27
|
+
- **Don't publish without explicit user confirmation.** Audit work compounds; releasing in the middle of a round leaves the codebase in a half-hardened state.
|
|
28
|
+
|
|
29
|
+
## The 7 Patterns (rotate through these)
|
|
30
|
+
|
|
31
|
+
After 19 rounds, every issue found falls into one of these 7 categories. Use this to plan each round's focus.
|
|
32
|
+
|
|
33
|
+
### 1. L1 Cleanup (decoration, low value, easy)
|
|
34
|
+
**What**: Replace `console.error` / `console.warn` / `process.stderr.write` with `logInternalError()` from `utils/internal-error.ts`.
|
|
35
|
+
|
|
36
|
+
**Why**: `console.error` may not be visible in JSON-RPC mode or when stderr is redirected. `logInternalError` is the project-wide pattern; missing it means errors are silently dropped.
|
|
37
|
+
|
|
38
|
+
**How to find them**:
|
|
39
|
+
```bash
|
|
40
|
+
rg -n 'console\.(error|warn|log)' src/
|
|
41
|
+
rg -n 'process\.stderr\.write' src/
|
|
42
|
+
```
|
|
43
|
+
|
|
44
|
+
**Rule**: Skip `internal-error.ts:5` itself (it's the implementation). Skip `background-runner.ts:146` (overrides `console.error` for testing). Skip `parent-guard.ts:37` (exit-time log must fire synchronously).
|
|
45
|
+
|
|
46
|
+
**Time per round**: 30 min for 5-10 callsites. Diminishing returns after round 1.
|
|
47
|
+
|
|
48
|
+
### 2. Defensive Caps (memory safety, medium value, medium effort)
|
|
49
|
+
**What**: Find Maps, Sets, Arrays, and Queues that grow unboundedly. Add `MAX_*` constants and eviction logic.
|
|
50
|
+
|
|
51
|
+
**Why**: Long-running processes (background runners, extension reloads) accumulate state. Without caps, a busy period causes OOM.
|
|
52
|
+
|
|
53
|
+
**How to find them**:
|
|
54
|
+
```bash
|
|
55
|
+
rg -n 'new Map\(' src/ # look for ones that are .set() repeatedly
|
|
56
|
+
rg -n 'new Set\(' src/
|
|
57
|
+
rg -n 'this\.\w+\.push\(' src/ # look for unbounded arrays
|
|
58
|
+
```
|
|
59
|
+
|
|
60
|
+
**Common patterns**:
|
|
61
|
+
- `Semaphore.#queue` → add `MAX_QUEUE` cap (pi-crew: 10,000)
|
|
62
|
+
- `liveAgentManager.liveAgents` Map → add `MAX_LIVE_AGENTS` cap (pi-crew: 5,000)
|
|
63
|
+
- `OverflowRecoveryTracker.states` Map → add `MAX_TRACKED_STATES` cap (pi-crew: 5,000)
|
|
64
|
+
- `NotificationRouter.seen` Map → add `SEEN_MAP_MAX_SIZE` cap (pi-crew: 10,000)
|
|
65
|
+
|
|
66
|
+
**Eviction strategies** (in order of preference):
|
|
67
|
+
1. **LRU by access time** — track `lastAccessAt` per entry
|
|
68
|
+
2. **Oldest insertion** — Map's natural insertion order works (delete first key)
|
|
69
|
+
3. **Terminal-state priority** — protect live entries, evict completed/failed/cancelled first
|
|
70
|
+
|
|
71
|
+
**Test pattern**: Verify cap by inserting 1.5× the max, confirm old entries are gone.
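The "oldest insertion" strategy and its test pattern can be sketched in a few lines. `setBounded` and `MAX_ENTRIES` are hypothetical names; the sketch relies only on the fact that a JavaScript `Map` iterates its keys in insertion order.

```typescript
const MAX_ENTRIES = 1000; // assumed default cap for the example

/** Insert into the map, then evict the oldest insertions until the cap holds. */
function setBounded<K, V>(map: Map<K, V>, key: K, value: V, maxEntries = MAX_ENTRIES): void {
  map.set(key, value);
  while (map.size > maxEntries) {
    const oldest = map.keys().next(); // first key is the oldest insertion
    if (oldest.done) break;
    map.delete(oldest.value);
  }
}

// Test pattern from above: insert 1.5x the cap and confirm the oldest entries are gone.
const cache = new Map<number, string>();
for (let i = 0; i < 1500; i++) setBounded(cache, i, `entry-${i}`);
```

Updating an existing key with `set()` keeps its original position, so this sketch evicts by first insertion rather than by last write; an LRU variant would delete and re-insert the key on each access.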
|
|
72
|
+
|
|
73
|
+
### 3. Test Coverage Gaps (good value, low effort)
|
|
74
|
+
**What**: Find source files with zero direct unit tests.
|
|
75
|
+
|
|
76
|
+
**How to find them**:
|
|
77
|
+
```bash
|
|
78
|
+
# For each src file, check whether a same-named test file exists under test/unit/
|
|
79
|
+
for f in src/runtime/*.ts src/extension/*.ts; do
|
|
80
|
+
basename=$(basename "$f" .ts)
|
|
81
|
+
count=$(ls test/unit/${basename}*.test.ts 2>/dev/null | wc -l)
|
|
82
|
+
[ "$count" -eq 0 ] && echo "NO TEST: $f"
|
|
83
|
+
done
|
|
84
|
+
```
|
|
85
|
+
|
|
86
|
+
**Prioritize**:
|
|
87
|
+
- Security-critical: `sandbox.ts`, `child-pi.ts`, `pi-spawn.ts`, `crew-cleanup.ts`
|
|
88
|
+
- Resource-management: `live-agent-manager.ts`, `semaphore.ts`, `overflow-recovery.ts`
|
|
89
|
+
- Public APIs: anything with `export class` or `export function`
|
|
90
|
+
|
|
91
|
+
**Don't test**: internal helpers, generated code, pure re-exports.
|
|
92
|
+
|
|
93
|
+
**Test categories** (in order of importance):
|
|
94
|
+
1. **Path validation** (security) — `assertSafePathId`, path traversal rejection
|
|
95
|
+
2. **Resource cleanup** — `dispose()` clears everything, listeners don't stack
|
|
96
|
+
3. **Boundary conditions** — empty input, max-size, overflow
|
|
97
|
+
4. **Callback lifecycle** — sync/async error handling, `resultConsumed` flag
|
|
98
|
+
|
|
99
|
+
### 4. Security Hardening (high value, high effort)
|
|
100
|
+
**What**: Find places where untrusted input reaches dangerous sinks.
|
|
101
|
+
|
|
102
|
+
**Common sinks to audit**:
|
|
103
|
+
- `execSync(command)` → switch to `execFileSync(program, args[])`
|
|
104
|
+
- `eval()` / `Function()` / `vm.runInNewContext()` → avoid entirely
|
|
105
|
+
- `path.join(base, userInput)` → use `assertSafePathId(userInput)` first
|
|
106
|
+
- `process.env` access → use sanitized env with allow-list
|
|
107
|
+
- File writes to user-controlled paths → validate path is within allowed roots
|
|
108
|
+
- Child process spawn → use `cwd: knownDir`, sanitize env
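For the `path.join(base, userInput)` sink, a minimal sketch of ID validation plus a containment check follows. This `assertSafePathId` is a hypothetical stand-in written for the example (the allowed character set and length limit are assumptions) and is not the project's implementation; `resolveInside` is likewise an invented helper.

```typescript
import * as path from "node:path";

/** Hypothetical validator: accept only a single, plain path segment. */
function assertSafePathId(id: string): void {
  const plainSegment = /^[A-Za-z0-9][A-Za-z0-9._-]{0,127}$/;
  if (!plainSegment.test(id) || id.includes("..")) {
    throw new Error(`unsafe path id: ${JSON.stringify(id)}`);
  }
}

/** Join an untrusted id onto a trusted base directory, refusing anything that escapes it. */
function resolveInside(base: string, id: string): string {
  assertSafePathId(id);
  const root = path.resolve(base);
  const full = path.resolve(root, id);
  // Defense in depth: the resolved path must still sit under the base directory.
  if (!full.startsWith(root + path.sep)) {
    throw new Error(`path escapes base: ${full}`);
  }
  return full;
}
```

Validating the ID first keeps the error message specific, while the containment check catches anything the pattern might miss.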
|
|
109
|
+
|
|
110
|
+
**How to find them**:
|
|
111
|
+
```bash
|
|
112
|
+
rg -n 'execSync\(' src/
|
|
113
|
+
rg -n 'exec\(' src/
|
|
114
|
+
rg -n 'eval\(|Function\(' src/
|
|
115
|
+
rg -n 'spawn\(' src/
|
|
116
|
+
rg -n 'path\.join\(' src/ | rg 'record\.|task\.|runId|agent\.'
|
|
117
|
+
```
|
|
118
|
+
|
|
119
|
+
**Round 1**: Find all `execSync` and `exec`. Switch to `execFileSync(program, args)` (no shell).
|
|
120
|
+
**Round 2**: Audit env handling. Look for `process.env` access in hot paths. Add allow-list.
|
|
121
|
+
**Round 3**: Path traversal. For every `path.join(base, userInput)`, add `assertSafePathId()`.
|
|
122
|
+
**Round 4**: Subprocess safety. Verify all `spawn()` calls have: validated args, sanitized env, `cwd` set, signal handling, timeout.
|
|
123
|
+
|
|
124
|
+
### 5. Performance (medium value, medium effort)
|
|
125
|
+
**What**: Find O(N²) or worse algorithms, especially in hot paths.
|
|
126
|
+
|
|
127
|
+
**Common patterns**:
|
|
128
|
+
- Recomputing document frequency in search loops → precompute at construction
|
|
129
|
+
- `array.filter().map().filter()` in a loop → fuse into one pass
|
|
130
|
+
- `JSON.parse` of the same file repeatedly → cache
|
|
131
|
+
- `fs.statSync` per file in a directory scan → batch with `Dirent.isDirectory()`
|
|
132
|
+
- `setTimeout` busy-polling for state changes → use `fs.watch` or events
|
|
133
|
+
|
|
134
|
+
**How to find them**:
|
|
135
|
+
```bash
|
|
136
|
+
# Look for nested loops over the same data
|
|
137
|
+
rg -nB 1 -A 5 'for.*of.*for' src/
|
|
138
|
+
# Look for polls
|
|
139
|
+
rg -n 'setTimeout.*poll' src/
|
|
140
|
+
rg -n 'pollIntervalMs' src/
|
|
141
|
+
```
|
|
142
|
+
|
|
143
|
+
**Test pattern**: For precomputation fixes, write a perf test that creates 1000 docs, runs search, and asserts completion under 100ms.
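As an illustration of "precompute at construction", the sketch below builds a document-frequency table once so that each query is a map lookup instead of a scan over all documents. `TinyIndex`, `tokenize`, and the smoothed IDF formula are assumptions chosen for the example, not code from pi-crew.

```typescript
/** Lowercase and split on non-word characters. */
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

class TinyIndex {
  private docCount: number;
  private docFreq = new Map<string, number>();

  constructor(texts: string[]) {
    this.docCount = texts.length;
    // One pass at construction: count in how many documents each term appears.
    for (const text of texts) {
      const unique = new Set(tokenize(text));
      unique.forEach((term) => {
        this.docFreq.set(term, (this.docFreq.get(term) ?? 0) + 1);
      });
    }
  }

  /** O(1) lookup; no rescan of the corpus per query. */
  documentFrequency(term: string): number {
    return this.docFreq.get(term.toLowerCase()) ?? 0;
  }

  /** Smoothed inverse document frequency (one common variant). */
  idf(term: string): number {
    return Math.log((this.docCount + 1) / (this.documentFrequency(term) + 1)) + 1;
  }
}
```

A timing assertion like the one suggested above would wrap a search over such an index; the threshold is machine-dependent, so it is left out of this sketch.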
|
|
144
|
+
|
|
145
|
+
### 6. Code Quality (low value, easy)
|
|
146
|
+
**What**: Remove dead code, fix type misuse, add missing JSDoc.
|
|
147
|
+
|
|
148
|
+
**Common patterns**:
|
|
149
|
+
- Fields declared but never used (e.g., `seenCleanupCounter`)
|
|
150
|
+
- Unused imports
|
|
151
|
+
- Type assertions (`as any`, `as unknown as T`) that hide real issues
|
|
152
|
+
- Functions that always return the same value
|
|
153
|
+
- Catch blocks that swallow errors silently
|
|
154
|
+
|
|
155
|
+
**How to find them**:
|
|
156
|
+
```bash
|
|
157
|
+
# Find fields/methods declared but never used
|
|
158
|
+
rg -n 'private \w+\s*=\s*' src/ | while read line; do
|
|
159
|
+
field=$(echo "$line" | grep -oP 'private \K\w+')
|
|
160
|
+
count=$(rg -ow "$field" src/ 2>/dev/null | wc -l)
|
|
161
|
+
[ "$count" -eq 1 ] && echo "DEAD: $line"
|
|
162
|
+
done
|
|
163
|
+
```
|
|
164
|
+
|
|
165
|
+
### 7. Resource Cleanup (medium value, medium effort)
|
|
166
|
+
**What**: Find places where listeners, timers, file handles, or other resources can leak.
|
|
167
|
+
|
|
168
|
+
**Common patterns**:
|
|
169
|
+
- `process.on('SIGTERM', ...)` registered multiple times → use module-level flag
|
|
170
|
+
- `setInterval` / `setTimeout` not cleared on shutdown → `dispose()` method
|
|
171
|
+
- `AbortController` not aborted in cleanup
|
|
172
|
+
- File watchers (`fs.watch`) not closed
|
|
173
|
+
- Event listeners (`emitter.on`) not removed
|
|
174
|
+
|
|
175
|
+
**How to find them**:
|
|
176
|
+
```bash
|
|
177
|
+
rg -n 'process\.on\(' src/
|
|
178
|
+
rg -n 'setInterval\(' src/
|
|
179
|
+
rg -n 'setTimeout\(' src/ | rg -v 'setTimeout.*resolve' # filter out poll sleeps
|
|
180
|
+
rg -n 'fs\.watch\(' src/
|
|
181
|
+
```
|
|
182
|
+
|
|
183
|
+
**Test pattern**: Call the registration function N times, verify listener count is 1.
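A minimal sketch of the module-level flag guard and the matching test pattern. `registerShutdownHandlers` and `handlersRegistered` are hypothetical names; a plain `EventEmitter` stands in for `process` (which is itself an `EventEmitter`) so the example has no side effects on real signal handling.

```typescript
import { EventEmitter } from "node:events";

// Module-level guard: survives repeated calls, e.g. across extension reloads in one process.
let handlersRegistered = false;

/** Attach the shutdown handler once, no matter how many times this is called. */
function registerShutdownHandlers(target: EventEmitter, onShutdown: () => void): void {
  if (handlersRegistered) return;
  handlersRegistered = true;
  target.on("SIGTERM", onShutdown);
}

// Test pattern from above: register N times, expect exactly one listener.
const fakeProcess = new EventEmitter();
let shutdownCalls = 0;
for (let i = 0; i < 5; i++) {
  registerShutdownHandlers(fakeProcess, () => {
    shutdownCalls += 1;
  });
}
fakeProcess.emit("SIGTERM");
```

Without the flag, each call would stack another listener and the handler would run once per registration.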
|
|
184
|
+
|
|
185
|
+
## Round Workflow (use this for EVERY round)
|
|
186
|
+
|
|
187
|
+
### Step 1: Pick a focus
|
|
188
|
+
Choose ONE of the 7 patterns above. Don't try to do multiple patterns in one round.
|
|
189
|
+
|
|
190
|
+
### Step 2: Explore (read 3-5 files)
|
|
191
|
+
Read the actual source for the focus area. Don't trust prior audit docs.
|
|
192
|
+
|
|
193
|
+
### Step 3: Verify from source
|
|
194
|
+
For each candidate issue:
|
|
195
|
+
- Read the file at the cited line
|
|
196
|
+
- Check if the issue is real (not a false positive)
|
|
197
|
+
- Check if it's already fixed
|
|
198
|
+
- Note the exact file:line and code snippet
|
|
199
|
+
|
|
200
|
+
### Step 4: Create a plan doc
|
|
201
|
+
```markdown
|
|
202
|
+
# Round N Audit Fix Plan
|
|
203
|
+
## Findings
|
|
204
|
+
### Issue 1: <file>:<line> — <title> (severity)
|
|
205
|
+
<File path and line numbers>
|
|
206
|
+
<Code snippet showing the issue>
|
|
207
|
+
<Rationale>
|
|
208
|
+
|
|
209
|
+
## Plan (5 phases)
|
|
210
|
+
### Phase 1: <action>
|
|
211
|
+
### Phase 2: <action>
|
|
212
|
+
...
|
|
213
|
+
```
|
|
214
|
+
|
|
215
|
+
### Step 5: Implement
|
|
216
|
+
- Make the fix
|
|
217
|
+
- Add tests (if applicable)
|
|
218
|
+
- Run typecheck: `npx tsc --noEmit`
|
|
219
|
+
- Run tests: `npm test`
|
|
220
|
+
|
|
221
|
+
### Step 6: Commit + Release
|
|
222
|
+
- Commit with conventional message: `fix: round N - <summary>`
|
|
223
|
+
- Update CHANGELOG.md
|
|
224
|
+
- Bump version (patch)
|
|
225
|
+
- Push + npm publish (only after explicit user confirmation; see Operating Stance)
|
|
226
|
+
- Create GitHub release
|
|
227
|
+
|
|
228
|
+
### Step 7: Decide: continue or stop?
|
|
229
|
+
After 5-10 rounds, evaluate:
|
|
230
|
+
|
|
231
|
+
**Continue if**:
|
|
232
|
+
- Last 2 rounds found HIGH or MEDIUM severity issues
|
|
233
|
+
- Test coverage is < 80% of modules
|
|
234
|
+
- User explicitly wants more
|
|
235
|
+
|
|
236
|
+
**Stop if**:
|
|
237
|
+
- Last 2 rounds found only LOW severity or L1 cleanup
|
|
238
|
+
- All patterns exhausted (you've done each at least once)
|
|
239
|
+
- Diminishing returns: more time spent planning than implementing
|
|
240
|
+
|
|
241
|
+
## When to Use Teams vs. Do It Yourself
|
|
242
|
+
|
|
243
|
+
**Use teams** (via `team action='run', team='review'`) for:
|
|
244
|
+
- Initial broad audit (round 1)
|
|
245
|
+
- Security reviews (specialized `security-reviewer` agent)
|
|
246
|
+
- When you need 3+ perspectives (multi-explorer)
|
|
247
|
+
|
|
248
|
+
**Do it yourself** for:
|
|
249
|
+
- Round 2+ (you have context from prior rounds)
|
|
250
|
+
- Focused single-pattern work (L1 cleanup, test coverage)
|
|
251
|
+
- Small fixes (< 5 file edits)
|
|
252
|
+
|
|
253
|
+
**Teams often fail because**:
|
|
254
|
+
- 5-min heartbeat timeout for long-running runs (add `startTeamRunHeartbeat` if needed)
|
|
255
|
+
- Agent cancellations
|
|
256
|
+
- Hallucinated file:line references (always verify from source)
|
|
257
|
+
|
|
258
|
+
## Common False Positives (audit findings to reject)
|
|
259
|
+
|
|
260
|
+
Across the 19 rounds so far, roughly 30% of audit findings turned out to be false positives. Common patterns:
|
|
261
|
+
|
|
262
|
+
1. **"Double-merge in config"** — looks like a bug, but project config + user config merge is intentional
|
|
263
|
+
2. **"as unknown as T in error handling"** — necessary for TypeScript's strict mode
|
|
264
|
+
3. **"Auto-repair timer race"** — there's a guard like `cleanedUp || !currentCtx` you missed
|
|
265
|
+
4. **"Already-validated input"** — validation is in the caller, not the callee
|
|
266
|
+
5. **"Redundant null check"** — TypeScript narrowing doesn't always work for closures
|
|
267
|
+
|
|
268
|
+
**Always verify against source** before acting. If you're not sure, write a test that exercises the alleged bug path and asserts the correct behavior. If that test passes, the finding is a false positive.
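
As an illustration of testing an alleged bug path, here is a minimal sketch for the "auto-repair timer race" case. Everything in it is hypothetical: `createAutoRepair`, `tick`, and `cleanup` are invented for the example, and only the guard expression `cleanedUp || !currentCtx` is taken from the list above.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical component with the kind of guard an audit can overlook.
function createAutoRepair(repair: () => void) {
  let cleanedUp = false;
  let currentCtx: object | null = {};
  return {
    // Stands in for the timer callback the audit claims can race with cleanup.
    tick(): void {
      if (cleanedUp || !currentCtx) return; // the guard in question
      repair();
    },
    cleanup(): void {
      cleanedUp = true;
      currentCtx = null;
    },
  };
}

let repairs = 0;
const autoRepair = createAutoRepair(() => {
  repairs++;
});

autoRepair.tick(); // before cleanup: the repair runs
autoRepair.cleanup();
autoRepair.tick(); // alleged bug path: timer fires after cleanup

// The late tick was a no-op, so the "race" finding would be rejected.
assert.equal(repairs, 1);
```

If the assertion failed instead, the finding would be real and would move into the round's plan doc.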
|
|
269
|
+
|
|
270
|
+
## Success Metrics
|
|
271
|
+
|
|
272
|
+
After each round, record:
|
|
273
|
+
- Issues found (real vs. false positive)
|
|
274
|
+
- Tests added
|
|
275
|
+
- Typecheck clean?
|
|
276
|
+
- Total test count delta
|
|
277
|
+
|
|
278
|
+
**Healthy round**: 3-8 real issues found, +20 to +50 tests added, all pass.
|
|
279
|
+
|
|
280
|
+
**Exhausted round**: 0-1 real issues found, 0 tests added, mostly L1 cleanup.
|
|
281
|
+
|
|
282
|
+
When you hit 2+ exhausted rounds in a row, **stop**.
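
The stop rule above can be expressed as a small check over the recorded metrics. This is a sketch under stated assumptions: the `RoundRecord` shape and the function names are hypothetical, and the thresholds come directly from the "Exhausted round" definition (0-1 real issues, 0 tests added).

```typescript
import { strict as assert } from "node:assert";

// Hypothetical record of the per-round metrics listed above.
interface RoundRecord {
  round: number;
  realIssues: number;
  falsePositives: number;
  testsAdded: number;
  typecheckClean: boolean;
}

// "Exhausted round": 0-1 real issues found and 0 tests added.
function isExhausted(record: RoundRecord): boolean {
  return record.realIssues <= 1 && record.testsAdded === 0;
}

// Stop once the two most recent rounds are both exhausted.
function shouldStop(history: RoundRecord[]): boolean {
  if (history.length < 2) return false;
  return history.slice(-2).every(isExhausted);
}

const history: RoundRecord[] = [
  { round: 1, realIssues: 5, falsePositives: 2, testsAdded: 30, typecheckClean: true },
  { round: 2, realIssues: 1, falsePositives: 3, testsAdded: 0, typecheckClean: true },
];
assert.equal(shouldStop(history), false); // only one exhausted round so far

history.push({ round: 3, realIssues: 0, falsePositives: 1, testsAdded: 0, typecheckClean: true });
assert.equal(shouldStop(history), true); // two exhausted rounds in a row
```

The sketch covers only the exhausted-round rule. The severity-based and user-driven criteria from Step 7 would need additional fields.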
|
|
283
|
+
|
|
284
|
+
## Real Examples from 19 Rounds
|
|
285
|
+
|
|
286
|
+
| Round | Focus | Issues Found | Severity Range |
|
|
287
|
+
|-------|-------|--------------|----------------|
|
|
288
|
+
| 1-3 | Broad security audit | 11 | CRITICAL, HIGH |
|
|
289
|
+
| 4-6 | Race conditions, locks | 5 | HIGH |
|
|
290
|
+
| 7-9 | L1 cleanup, dead code | 12 | LOW |
|
|
291
|
+
| 10-12 | Defensive caps | 3 | MEDIUM |
|
|
292
|
+
| 13-15 | Security: execSync, sandbox | 9 | CRITICAL, HIGH |
|
|
293
|
+
| 16-18 | Test coverage, L1 | 30+ | LOW |
|
|
294
|
+
| 19 | Path validation, tests | 5 | MEDIUM |
|
|
295
|
+
|
|
296
|
+
**Pattern**: The first 3 rounds find the most impactful issues. Rounds 4-15 find the rest, including a second batch of CRITICAL and HIGH security issues in rounds 13-15. Rounds 16+ show diminishing returns (mostly test coverage and L1 cleanup).
|
|
297
|
+
|
|
298
|
+
## Anti-Patterns to Avoid
|
|
299
|
+
|
|
300
|
+
- **Mega-rounds** (10+ files, 5+ categories) — too broad, low quality findings
|
|
301
|
+
- **Trusting audit docs** — always verify from source
|
|
302
|
+
- **Skipping typecheck** — type errors compound and become hard to debug later
|
|
303
|
+
- **Releasing mid-round** — leaves the codebase in a half-hardened state
|
|
304
|
+
- **No test for the fix** — every fix needs a test that would have caught the bug
|
|
305
|
+
- **Committing too late** — commit after each phase, not at the end of the round
|
|
306
|
+
|
|
307
|
+
## Enforcement — Iterative Audit Gate
|
|
308
|
+
|
|
309
|
+
**Before reporting round findings, verify:**
|
|
310
|
+
|
|
311
|
+
- [ ] Round focus is ONE of the 7 patterns (not multiple)
|
|
312
|
+
- [ ] Each finding has a verified `file:line` reference (read the actual source)
|
|
313
|
+
- [ ] False positives filtered out (consult "Common False Positives" section)
|
|
314
|
+
- [ ] Severity assigned using the standard scale (CRITICAL / HIGH / MEDIUM / LOW)
|
|
315
|
+
- [ ] Plan doc created with phases and file:line evidence
|
|
316
|
+
- [ ] Typecheck clean: `npx tsc --noEmit` returns 0 errors
|
|
317
|
+
- [ ] All tests pass: `npm test` shows 0 failures
|
|
318
|
+
- [ ] Tests added for the fix (if applicable)
|
|
319
|
+
- [ ] Round results recorded: issues found, tests added, delta
|
|
320
|
+
- [ ] Decision logged: continue to next round or stop (with reason)
|
|
321
|
+
|
|
322
|
+
**If ANY answer is NO → Stop. Complete audit requirements before reporting round results.**
|
|
323
|
+
|
|
324
|
+
## Related Skills
|
|
325
|
+
|
|
326
|
+
- `scrutinize` — Quick outsider-perspective review of a single change
|
|
327
|
+
- `multi-perspective-review` — 8-pass deep review for a single change
|
|
328
|
+
- `security-review` — Security-focused audit with detection authoring
|
|
329
|
+
- `verification-before-done` — Evidence before claim (use per round)
|
|
330
|
+
- `systematic-debugging` — When a finding reveals a real bug that needs deeper investigation
|
|
@@ -3,7 +3,7 @@ import * as fs from "node:fs";
|
|
|
3
3
|
import * as path from "node:path";
|
|
4
4
|
import type { AgentConfig, ResourceSource, RoutingMetadata } from "../agents/agent-config.ts";
|
|
5
5
|
import { serializeAgent } from "../agents/agent-serializer.ts";
|
|
6
|
-
import {
|
|
6
|
+
import { discoverAgents } from "../agents/discover-agents.ts";
|
|
7
7
|
import type { TeamToolDetails } from "./team-tool-types.ts";
|
|
8
8
|
import { toolResult, type PiTeamsToolResult } from "./tool-result.ts";
|
|
9
9
|
import type { TeamToolParamsValue } from "../schema/team-tool-schema.ts";
|