pi-crew 0.8.14 → 0.9.1
- package/CHANGELOG.md +366 -0
- package/README.md +112 -2
- package/docs/FEATURE_INTAKE.md +1 -1
- package/docs/HARNESS.md +20 -19
- package/docs/PROJECT_REVIEW.md +132 -133
- package/docs/PROJECT_REVIEW_FIXES.md +130 -131
- package/docs/actions-reference.md +127 -121
- package/docs/architecture.md +1 -1
- package/docs/code-review-2026-05-11.md +134 -134
- package/docs/commands-reference.md +108 -106
- package/docs/comparison-pi-subagents-vs-pi-crew.md +105 -105
- package/docs/deep-review-report.md +1 -1
- package/docs/dynamic-workflows.md +90 -0
- package/docs/fixes/BATCH_A_H1_H2.md +17 -17
- package/docs/fixes/bug-007-async-notifier-stale-ctx.md +23 -23
- package/docs/followup-plan-2026-05-12.md +135 -135
- package/docs/followup-review-2026-05-12.md +86 -86
- package/docs/followup-review-round3-2026-05-12.md +123 -123
- package/docs/goals.md +59 -0
- package/docs/implementation-plan-top3.md +4 -4
- package/docs/issue-29-analysis.md +2 -2
- package/docs/oh-my-pi-research.md +154 -154
- package/docs/optimization-plan.md +2 -0
- package/docs/perf/baseline-2026-05.md +9 -9
- package/docs/perf/final-report-2026-05.md +2 -2
- package/docs/perf/sprint-1-report.md +2 -2
- package/docs/perf/sprint-2-report.md +1 -1
- package/docs/perf/upgrade-plan-2026-05.md +72 -72
- package/docs/pi-crew-bugs.md +230 -230
- package/docs/pi-crew-investigation-report.md +102 -102
- package/docs/pi-crew-test-round5.md +4 -4
- package/docs/runtime-analysis-child-vs-live.md +57 -57
- package/docs/runtime-migration-in-process-analysis.md +97 -97
- package/package.json +2 -4
- package/skills/orchestration/SKILL.md +11 -11
- package/src/agents/agent-config.ts +4 -0
- package/src/config/config.ts +39 -0
- package/src/config/types.ts +11 -0
- package/src/extension/action-suggestions.ts +2 -1
- package/src/extension/async-notifier.ts +10 -0
- package/src/extension/help.ts +14 -0
- package/src/extension/registration/commands.ts +27 -0
- package/src/extension/team-tool/destructive-gate.ts +1 -1
- package/src/extension/team-tool/goal-wrap.ts +288 -0
- package/src/extension/team-tool/goal.ts +405 -0
- package/src/extension/team-tool/run.ts +103 -4
- package/src/extension/team-tool/workflow-manage.ts +194 -0
- package/src/extension/team-tool.ts +20 -0
- package/src/hooks/types.ts +3 -1
- package/src/runtime/async-runner.ts +27 -2
- package/src/runtime/background-runner.ts +68 -19
- package/src/runtime/child-pi.ts +9 -1
- package/src/runtime/completion-guard.ts +1 -1
- package/src/runtime/dynamic-workflow-context.ts +450 -0
- package/src/runtime/dynamic-workflow-runner.ts +180 -0
- package/src/runtime/global-worker-cap.ts +96 -0
- package/src/runtime/goal-evaluator.ts +294 -0
- package/src/runtime/goal-loop-runner.ts +612 -0
- package/src/runtime/goal-state-store.ts +209 -0
- package/src/runtime/iteration-hooks.ts +2 -1
- package/src/runtime/pi-args.ts +10 -2
- package/src/runtime/post-checks.ts +2 -1
- package/src/runtime/result-extractor.ts +32 -0
- package/src/runtime/team-runner.ts +11 -1
- package/src/runtime/verification-gates.ts +88 -5
- package/src/runtime/verification-integrity.ts +110 -0
- package/src/runtime/verification-worktree.ts +136 -0
- package/src/runtime/workspace-lock.ts +448 -0
- package/src/schema/config-schema.ts +26 -0
- package/src/schema/team-tool-schema.ts +39 -4
- package/src/state/atomic-write.ts +9 -0
- package/src/state/contracts.ts +14 -0
- package/src/state/crew-init.ts +18 -5
- package/src/state/event-log.ts +7 -1
- package/src/state/state-store.ts +2 -0
- package/src/state/types.ts +82 -0
- package/src/state/worker-atomic-writer.ts +190 -0
- package/src/utils/env-allowlist.ts +30 -0
- package/src/utils/redaction.ts +104 -24
- package/src/utils/safe-paths.ts +55 -14
- package/src/workflows/discover-workflows.ts +25 -1
- package/src/workflows/workflow-config.ts +13 -0
- package/src/worktree/cleanup.ts +2 -1
- package/src/worktree/worktree-manager.ts +4 -3
- package/teams/parallel-research.team.md +1 -1
- package/workflows/examples/hello.dwf.ts +24 -0
package/docs/PROJECT_REVIEW.md
CHANGED
|
@@ -1,60 +1,60 @@
|
|
|
1
1
|
# pi-crew — Project Review
|
|
2
2
|
|
|
3
|
-
>
|
|
4
|
-
>
|
|
5
|
-
>
|
|
6
|
-
> Method:
|
|
3
|
+
> Review date: 2026-05-18
|
|
4
|
+
> Version: `pi-crew@0.2.19`
|
|
5
|
+
> Scope: the entire source (`index.ts`, `src/**`), config, tests, docs, scripts.
|
|
6
|
+
> Method: read source directly, cross-referenced against `AGENTS.md`/`docs/architecture.md`, ran `npm run typecheck` + `npm run test:unit`.
|
|
7
7
|
|
|
8
|
-
##
|
|
8
|
+
## Overview
|
|
9
9
|
|
|
10
|
-
`pi-crew`
|
|
10
|
+
`pi-crew` is a multi-agent orchestration Pi extension (teams + workflows + worktrees + async background runs), with a **durable-first model**: every run/task/event is persisted to disk (JSONL + atomic JSON writes) so that foreground, async background, dashboard, and crash recovery all read the same source of truth.
|
|
11
11
|
|
|
12
|
-
|
|
12
|
+
The codebase is mature, uses **TypeScript strict mode** (`noImplicitAny`, `strict: true`), has a broad test suite (1596 tests pass, 2 skipped, 0 failures), clear layered architecture (extension / runtime / state / worktree / utils), and many defensive notes ("3.1 backpressure", "2.10 cache", "P1 catch unhandled errors") indicating it has been iterated over many review rounds.
|
|
13
13
|
|
|
14
|
-
###
|
|
14
|
+
### Quick health-check results
|
|
15
15
|
|
|
16
|
-
|
|
|
16
|
+
| Category | Result |
|
|
17
17
|
|---|---|
|
|
18
18
|
| `npm run typecheck` (`tsc --noEmit` + strip-types import) | PASS |
|
|
19
19
|
| `npm run test:unit` (1598 tests / 128 suites) | 1596 pass · 2 skip · 0 fail (~90s) |
|
|
20
|
-
| `npm pack --dry-run` (
|
|
21
|
-
| Linter (ESLint) |
|
|
22
|
-
|
|
|
20
|
+
| `npm pack --dry-run` (via `npm run ci`) | Not checked in this session |
|
|
21
|
+
| Linter (ESLint) | No `lint` script; relies on `tsc strict` |
|
|
22
|
+
| Number of `.ts` files in `src/` | ~190 modules |
|
|
23
23
|
|
|
24
24
|
---
|
|
25
25
|
|
|
26
|
-
## 1.
|
|
26
|
+
## 1. Notable strengths
|
|
27
27
|
|
|
28
|
-
1. **
|
|
29
|
-
2. **
|
|
30
|
-
- `O_EXCL | O_CREAT | O_NOFOLLOW`
|
|
31
|
-
- `fstatSync` post-open
|
|
32
|
-
- Rename retry
|
|
33
|
-
- Coalesced variant `atomicWriteJsonCoalesced`
|
|
34
|
-
3. **Redaction (`utils/redaction.ts`)**
|
|
35
|
-
4. **Env sanitization (`utils/env-filter.ts`)**: secret-pattern deny-list
|
|
28
|
+
1. **Consistent path-safety**: `utils/safe-paths.ts` (`assertSafePathId`, `resolveContainedPath`, `resolveRealContainedPath`) is used uniformly in `state-store.ts`, `artifact-store.ts`, `mailbox.ts`. It has two layers: a string-based containment check and a real-path check (defends against symlink escape after mkdir).
|
|
29
|
+
2. **Multi-layered defensive atomic writes** (`state/atomic-write.ts`):
|
|
30
|
+
- `O_EXCL | O_CREAT | O_NOFOLLOW` when opening the temp file.
|
|
31
|
+
- `fstatSync` post-open to verify a regular file (defends against TOCTOU on Windows where `O_NOFOLLOW = 0`).
|
|
32
|
+
- Rename retry with exponential backoff + jitter (defends against lockstep starvation).
|
|
33
|
+
- Coalesced variant `atomicWriteJsonCoalesced` for high-frequency state writes; flush on `exit`/`SIGTERM`/`SIGINT`.
|
|
34
|
+
3. **Redaction (`utils/redaction.ts`)** handles many patterns: PEM private keys, Authorization headers, Bearer tokens, inline secret patterns, key-name matching (`apiKey`, `password`, `secret`, ...). Applied in `appendEvent`, `appendMailboxMessage`, `writeArtifact`, `appendTranscript`.
|
|
35
|
+
4. **Env sanitization (`utils/env-filter.ts`)**: default secret-pattern deny-list, allow-list mode for `worktree.setupHook` to pass only `PATH`, `HOME`, `PI_*`.
|
|
36
36
|
5. **Process kill tree** (`runtime/child-pi.ts`):
|
|
37
|
-
- Windows: `taskkill /T /F` + verify-after-2s + retry
|
|
38
|
-
- POSIX: `process.kill(-pid, "SIGTERM")` (process group)
|
|
39
|
-
- Lifecycle events
|
|
40
|
-
6. **Backpressure**: pause child stdout
|
|
41
|
-
7. **Lazy imports
|
|
42
|
-
8. **Run / task contract guards**: `shouldMergeTaskUpdate` (
|
|
43
|
-
9. **Crash & cancellation paths**: `executeTeamRun` catch-all
|
|
44
|
-
10. **
|
|
37
|
+
- Windows: `taskkill /T /F` + verify-after-2s + retry if PID is still alive.
|
|
38
|
+
- POSIX: `process.kill(-pid, "SIGTERM")` (process group) with an absolute-pid fallback; SIGKILL escalation after `HARD_KILL_MS`; fast-cancel SIGKILL after 200ms on user cancel.
|
|
39
|
+
- Lifecycle events have a structured shape `{ type, pid, exitCode?, error?, ts }`.
|
|
40
|
+
6. **Backpressure**: pause child stdout when more than 256KB is undrained.
|
|
41
|
+
7. **Lazy imports marked with `// LAZY:`** with a specific reason (reduces ~1.4s import cost at registration), plus a `check:lazy-imports` script to enforce it.
|
|
42
|
+
8. **Run / task contract guards**: `shouldMergeTaskUpdate` (prevents a stale snapshot from regressing terminal state), monotonic finishedAt, `canTransitionRunStatus`, plan-approval-gating for mutating tasks.
|
|
43
|
+
9. **Crash & cancellation paths**: `executeTeamRun` catch-all ensures the manifest/tasks transition to terminal on unhandled error (avoids "running forever"); `background-runner` has an `unhandledRejection` guard that writes `async.failed` before exit; `parent-guard` so the background runner dies when its parent dies.
|
|
44
|
+
10. **Very broad test coverage** for both happy paths and edge cases (yield, atomic-write retry, mergeTaskUpdates, mailbox validation, cancellation, model fallback...).
|
|
45
45
|
11. **Config**:
|
|
46
|
-
- Schema
|
|
47
|
-
- **Sanitize project-level config** (`sanitizeProjectConfig`):
|
|
46
|
+
- Schema validation via TypeBox with fuzzy suggestions for misspelled keys.
|
|
47
|
+
- **Sanitize project-level config** (`sanitizeProjectConfig`): strips sensitive keys (`executeWorkers`, `runtime.mode`, `worktree.setupHook`, `otlp.headers`, `agents.overrides`, …) from the project config, accepting them only from user config. This is an essential safeguard for an injected repo.
|
|
48
48
|
|
|
49
49
|
---
|
|
50
50
|
|
|
51
|
-
## 2. Bugs / Issues
|
|
51
|
+
## 2. Bugs / Issues found
|
|
52
52
|
|
|
53
|
-
>
|
|
53
|
+
> Classification: **HIGH** (can cause data loss / incorrectness), **MED** (correctness corner case / DX), **LOW** (improvement).
|
|
54
54
|
|
|
55
55
|
### HIGH
|
|
56
56
|
|
|
57
|
-
**H1. `event-log.ts` — silent loss
|
|
57
|
+
**H1. `event-log.ts` — silent loss when exceeding `MAX_EVENTS_BYTES` (50MB)**
|
|
58
58
|
```ts
|
|
59
59
|
// src/state/event-log.ts ~ appendEventInsideLock
|
|
60
60
|
if (fs.existsSync(eventsPath) && fs.statSync(eventsPath).size > MAX_EVENTS_BYTES) {
|
|
@@ -62,150 +62,150 @@ if (fs.existsSync(eventsPath) && fs.statSync(eventsPath).size > MAX_EVENTS_BYTES
|
|
|
62
62
|
return { ...fullEvent, metadata: { ...(fullEvent.metadata ?? {seq:0,...}), appended: false } };
|
|
63
63
|
}
|
|
64
64
|
```
|
|
65
|
-
-
|
|
66
|
-
-
|
|
65
|
+
- Problem: the event is dropped immediately (including terminal events like `run.failed`, `task.completed`) and `appendCounter` is also not incremented → `compactEventLog` (which only runs every 100 appends) is not triggered when it is needed most. Consequence: once the threshold is crossed, the log is "locked" silently until the next 100 appends trigger a rotation.
|
|
66
|
+
- Suggestion: when the threshold is hit, call `compactEventLog(eventsPath)` immediately, or rotate first then append; also prioritize letting terminal events (TERMINAL_EVENT_TYPES) through, since those events are part of the durable contract.
|
|
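The H1 suggestion (rotate first, then append, and never drop terminal events) can be sketched as follows. This is a minimal illustration, not the project's code: `rotateEventLog` is a hypothetical stand-in for `compactEventLog`, the contents of `TERMINAL_EVENT_TYPES` are assumed, and the `CrewEvent` shape is invented for the example.

```ts
import * as fs from "node:fs";

// Assumed values for illustration; the review only confirms a 50MB limit.
const MAX_EVENTS_BYTES = 50 * 1024 * 1024;
const TERMINAL_EVENT_TYPES = new Set(["run.failed", "run.completed", "task.completed"]);

interface CrewEvent {
  type: string;
  [key: string]: unknown;
}

// Hypothetical stand-in for compactEventLog: move the full log aside so appends can resume.
function rotateEventLog(eventsPath: string): void {
  fs.renameSync(eventsPath, `${eventsPath}.${Date.now()}.old`);
}

// Returns true when the event was written, false when it was dropped.
function appendWithOverflowPolicy(
  eventsPath: string,
  event: CrewEvent,
  maxBytes: number = MAX_EVENTS_BYTES,
): boolean {
  const size = fs.existsSync(eventsPath) ? fs.statSync(eventsPath).size : 0;
  if (size > maxBytes) {
    try {
      rotateEventLog(eventsPath);
    } catch {
      // Rotation failed: drop ordinary events, but still let terminal events through.
      if (!TERMINAL_EVENT_TYPES.has(event.type)) return false;
    }
  }
  fs.appendFileSync(eventsPath, `${JSON.stringify(event)}\n`, "utf-8");
  return true;
}
```

In the real module this logic would have to run inside the existing event-log lock, so that two processes do not rotate the same file at once.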
67
67
|
|
|
68
|
-
**H2. `mailbox.ts` — `appendMailboxMessage`
|
|
68
|
+
**H2. `mailbox.ts` — `appendMailboxMessage` has no cross-process lock**
|
|
69
69
|
```ts
|
|
70
70
|
fs.appendFileSync(mailboxFile(manifest, complete.direction, complete.taskId), `${JSON.stringify(...)}\n`, "utf-8");
|
|
71
71
|
```
|
|
72
|
-
-
|
|
73
|
-
-
|
|
72
|
+
- Problem: `appendFileSync` is not atomic across processes on Windows. Two background runners + foreground steering at the same time can interleave JSON lines → `parseMailboxMessage` skips them, messages are lost silently (reported later via `validateMailbox`).
|
|
73
|
+
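- Suggestion: use the existing `withEventLogLockSync` pattern for the mailbox, or use `atomicWriteFile` to rewrite (slower but atomic). At minimum, rely on `O_APPEND` semantics on POSIX (note that the `PIPE_BUF` atomicity guarantee is specified for pipes and FIFOs, so it should not be assumed for large appends to regular files) and add a lock on Windows.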
|
|
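One way to serialize mailbox appends across processes is an exclusive lock file, sketched below. The names `withFileLockSync` and `appendMailboxLineLocked` are hypothetical and this is not the project's `withEventLogLockSync`; the sketch also assumes a crashed holder's stale lock is cleaned up elsewhere.

```ts
import * as fs from "node:fs";

// Block the current thread for `ms` milliseconds without spinning the CPU.
function sleepSync(ms: number): void {
  Atomics.wait(new Int32Array(new SharedArrayBuffer(4)), 0, 0, ms);
}

// Run `fn` while holding an exclusive lock file. The "wx" flag makes open fail
// with EEXIST when another process already created the lock.
function withFileLockSync<T>(lockPath: string, fn: () => T, retries = 50, delayMs = 20): T {
  for (let attempt = 0; ; attempt++) {
    try {
      fs.closeSync(fs.openSync(lockPath, "wx"));
      break;
    } catch (error) {
      const code = (error as NodeJS.ErrnoException).code;
      if (code !== "EEXIST" || attempt >= retries) throw error;
      sleepSync(delayMs);
    }
  }
  try {
    return fn();
  } finally {
    // Release the lock even when fn throws.
    fs.rmSync(lockPath, { force: true });
  }
}

// Append one JSON line to the mailbox while holding the lock.
function appendMailboxLineLocked(mailboxPath: string, message: unknown): void {
  withFileLockSync(`${mailboxPath}.lock`, () => {
    fs.appendFileSync(mailboxPath, `${JSON.stringify(message)}\n`, "utf-8");
  });
}
```

The retry count and delay are arbitrary; a production version would likely also detect and break stale locks.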
74
74
|
|
|
75
|
-
**H3. `atomic-write.ts` — fallback `writeFileSync`
|
|
75
|
+
**H3. `atomic-write.ts` — fallback `writeFileSync` has no symlink guard**
|
|
76
76
|
```ts
|
|
77
77
|
try { renameWithRetry(tempPath, filePath); }
|
|
78
78
|
catch (renameError) {
|
|
79
|
-
try { fs.writeFileSync(filePath, content, "utf-8"); } //
|
|
79
|
+
try { fs.writeFileSync(filePath, content, "utf-8"); } // BYPASSES symlink guard
|
|
80
80
|
catch { throw renameError; }
|
|
81
81
|
}
|
|
82
82
|
```
|
|
83
|
-
-
|
|
84
|
-
-
|
|
83
|
+
- Problem: if rename fails with EPERM on Windows, the fallback goes directly to `writeFileSync(filePath)` — if `filePath` becomes a symlink between the `isSymlinkSafePath` check (top of function) and the fallback, the write follows the link. The time window is small but could be exploited by an adversary on a multi-user host.
|
|
84
|
+
- Suggestion: before the fallback, re-check `fs.lstatSync(filePath).isSymbolicLink()`. Or open an fd with `O_NOFOLLOW` and `O_TRUNC` then write.
|
|
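The second H3 suggestion (open with `O_NOFOLLOW` and `O_TRUNC`, then write) might look like the sketch below. `writeFileNoFollow` is a hypothetical helper name; the `lstatSync` pre-check is included because `O_NOFOLLOW` is not available on every platform, and it narrows the window rather than closing it there.

```ts
import * as fs from "node:fs";

// Overwrite `filePath` without following a symlink at the final path component.
function writeFileNoFollow(filePath: string, content: string): void {
  // Pre-check for platforms (e.g. Windows) where O_NOFOLLOW is unavailable.
  const existing = fs.lstatSync(filePath, { throwIfNoEntry: false });
  if (existing?.isSymbolicLink()) {
    throw new Error(`refusing to write through symlink: ${filePath}`);
  }
  const noFollow = fs.constants.O_NOFOLLOW ?? 0;
  const flags =
    fs.constants.O_WRONLY | fs.constants.O_CREAT | fs.constants.O_TRUNC | noFollow;
  // Where O_NOFOLLOW is supported, the open itself fails if the path became a symlink.
  const fd = fs.openSync(filePath, flags, 0o600);
  try {
    fs.writeFileSync(fd, content, "utf-8");
  } finally {
    fs.closeSync(fd);
  }
}
```

Used as the rename fallback, this would replace the direct `fs.writeFileSync(filePath, ...)` call shown in the snippet above.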
85
85
|
|
|
86
|
-
**H4. `team-runner.ts` —
|
|
86
|
+
**H4. `team-runner.ts` — function named `__test__mergeTaskUpdates` is used in production**
|
|
87
87
|
```ts
|
|
88
|
-
// Re-
|
|
88
|
+
// Re-exported and documented as test-only:
|
|
89
89
|
export function __test__mergeTaskUpdates(...) { ... }
|
|
90
|
-
//
|
|
90
|
+
// but called in executeTeamRunCore:
|
|
91
91
|
tasks = __test__mergeTaskUpdates(tasks, results);
|
|
92
92
|
```
|
|
93
|
-
-
|
|
94
|
-
-
|
|
93
|
+
- Problem: the `__test__` convention implies only tests should import it; this is actually the runner's core merge logic. Another developer might "clean up" this helper or change its behavior thinking it only affects tests → silent regression.
|
|
94
|
+
- Suggestion: rename to `mergeTaskUpdatesPreservingTerminal()` (or similar), keep `__test__mergeTaskUpdates` as an export-only alias for tests, add a comment.
|
|
95
95
|
|
|
96
96
|
### MED
|
|
97
97
|
|
|
98
98
|
**M1. `task-runner.ts` — `transcriptPath` reused across model fallback attempts**
|
|
99
|
-
-
|
|
100
|
-
-
|
|
99
|
+
- Each attempt appends to the same transcript file. `parsePiJsonOutput(fs.readFileSync(transcriptPath, "utf-8"))` parses everything → final text/usage may be mixed across attempts. `resultArtifact.content` takes `parsedOutput?.finalText`, which could be the final text of attempt 1 (which failed) if attempt 2 has no valid message_end.
|
|
100
|
+
- Suggestion: either use `transcripts/${task.id}.attempt-${i}.jsonl` per attempt, or clear the file at the start of each attempt if the policy is "last attempt wins".
|
|
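The per-attempt naming suggested for M1 could be derived with a small helper like this one. `attemptTranscriptPath` is a hypothetical name and the 1-based attempt index is an assumption; only the `transcripts/${task.id}.attempt-${i}.jsonl` pattern comes from the review.

```ts
import * as path from "node:path";

// Build a transcript path that is unique per model-fallback attempt (1-based).
function attemptTranscriptPath(transcriptsDir: string, taskId: string, attempt: number): string {
  if (!Number.isInteger(attempt) || attempt < 1) {
    throw new RangeError(`attempt must be a positive integer, got ${attempt}`);
  }
  return path.join(transcriptsDir, `${taskId}.attempt-${attempt}.jsonl`);
}
```

With one file per attempt, the parser only ever sees the events of the attempt it is evaluating, so a failed first attempt cannot supply the final text.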
101
101
|
|
|
102
|
-
**M2. `task-runner.ts` —
|
|
102
|
+
**M2. `task-runner.ts` — reads the entire transcript into memory for `transcriptArtifact`**
|
|
103
103
|
```ts
|
|
104
104
|
content: fs.readFileSync(transcriptPath, "utf-8"),
|
|
105
105
|
```
|
|
106
|
-
-
|
|
107
|
-
-
|
|
106
|
+
- For long-running tasks the transcript can be tens of MB. Even though `compactChildPiEvent` already reduces its size, the file is still unbounded. `MAX_CAPTURE_BYTES` only applies to in-memory `stdout/stderr`, not to the on-disk transcript.
|
|
107
|
+
- Suggestion: cap the transcript file size (rotate when exceeding a threshold) or have the artifact use a reference (path) instead of copying content.
|
|
108
108
|
|
|
109
|
-
**M3. `cleanup.ts` — `fs.statSync(worktreePath).isDirectory()`
|
|
109
|
+
**M3. `cleanup.ts` — `fs.statSync(worktreePath).isDirectory()` has no race guard**
|
|
110
110
|
```ts
|
|
111
111
|
for (const entry of fs.readdirSync(worktreeRoot)) {
|
|
112
112
|
const worktreePath = path.join(worktreeRoot, entry);
|
|
113
113
|
if (!fs.statSync(worktreePath).isDirectory()) continue;
|
|
114
114
|
```
|
|
115
|
-
-
|
|
116
|
-
-
|
|
115
|
+
- If the entry is deleted between `readdirSync` and `statSync`, it throws uncaught.
|
|
116
|
+
- Suggestion: wrap in `try { fs.statSync... } catch { continue; }` or use `fs.readdirSync(worktreeRoot, { withFileTypes: true })` then `entry.isDirectory()`.
|
|
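The `withFileTypes` variant suggested for M3 avoids the separate `statSync` call entirely. A minimal sketch, with `listWorktreeDirs` as a hypothetical helper name:

```ts
import * as fs from "node:fs";
import * as path from "node:path";

// List sub-directories of `worktreeRoot` without a per-entry statSync call,
// so an entry deleted after the listing cannot make the loop throw.
function listWorktreeDirs(worktreeRoot: string): string[] {
  let entries: fs.Dirent[];
  try {
    entries = fs.readdirSync(worktreeRoot, { withFileTypes: true });
  } catch {
    return []; // root is missing or unreadable: nothing to clean up
  }
  return entries
    .filter((entry) => entry.isDirectory())
    .map((entry) => path.join(worktreeRoot, entry.name));
}
```

Later operations on each returned path can still race with deletion, so they should tolerate `ENOENT` as well.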
117
117
|
|
|
118
|
-
**M4. `worktree-manager.ts` — `runSetupHook`
|
|
118
|
+
**M4. `worktree-manager.ts` — `runSetupHook` parses JSON only from the last line**
|
|
119
119
|
```ts
|
|
120
120
|
const lastLine = lines[lines.length - 1] ?? trimmed;
|
|
121
121
|
const parsed = JSON.parse(lastLine);
|
|
122
122
|
```
|
|
123
|
-
-
|
|
124
|
-
-
|
|
123
|
+
- If the hook outputs multi-line JSON (pretty-printed), only the last line is parsed → `syntheticPaths` are silently lost. A warning is logged, but the caller receives no signal that parsing failed.
|
|
124
|
+
- Suggestion: try parsing `trimmed` first, fall back to the last line. Or define a clear protocol (one-line JSON, terminator marker).
|
|
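The "parse `trimmed` first, fall back to the last line" order suggested for M4 can be sketched as a pure function. `parseHookOutput` is a hypothetical name, and returning `undefined` on failure is an assumption about how the caller wants to be told.

```ts
// Parse setup-hook stdout: try the whole output first (handles pretty-printed
// JSON), then fall back to the last line (handles log lines before the JSON).
function parseHookOutput(stdout: string): unknown {
  const trimmed = stdout.trim();
  if (trimmed === "") return undefined;
  try {
    return JSON.parse(trimmed);
  } catch {
    // Not a single JSON document; fall through to the last-line attempt.
  }
  const lines = trimmed.split(/\r?\n/);
  const lastLine = lines[lines.length - 1] ?? trimmed;
  try {
    return JSON.parse(lastLine);
  } catch {
    return undefined; // caller decides how to report unparseable output
  }
}
```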
125
125
|
|
|
126
|
-
**M5. `worktree-manager.ts` — `linkNodeModulesIfPresent`
|
|
126
|
+
**M5. `worktree-manager.ts` — `linkNodeModulesIfPresent` does not warn when `symlinkSync` fails**
|
|
127
127
|
```ts
|
|
128
128
|
try { fs.symlinkSync(...); return true; } catch { return false; }
|
|
129
129
|
```
|
|
130
|
-
-
|
|
131
|
-
-
|
|
130
|
+
- On Windows without the right to create symlinks (requires `SeCreateSymbolicLinkPrivilege`), the call fails silently and the agent runs without `node_modules`; module resolution may fail but the caller does not know.
|
|
131
|
+
- Suggestion: log the reason for failure (especially for non-admin Windows) via `logInternalError`, or return `{ linked, reason }`.
|
|
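Returning `{ linked, reason }` as suggested for M5 might look like the sketch below. `linkDirWithReason` and the `LinkResult` type are hypothetical; the real function may take different arguments.

```ts
import * as fs from "node:fs";

interface LinkResult {
  linked: boolean;
  reason?: string; // errno code (or message) when linking failed
}

// Create a directory symlink and report why it failed instead of swallowing the error.
function linkDirWithReason(target: string, linkPath: string): LinkResult {
  try {
    fs.symlinkSync(target, linkPath, "dir");
    return { linked: true };
  } catch (error) {
    const err = error as NodeJS.ErrnoException;
    return { linked: false, reason: err.code ?? err.message };
  }
}
```

The caller can then log the reason, for example `EPERM` on Windows accounts that lack the symlink privilege.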
132
132
|
|
|
133
|
-
**M6. `child-pi.ts` — `forcedFinalDrain`
|
|
133
|
+
**M6. `child-pi.ts` — `forcedFinalDrain` forces `exitCode: 0`**
|
|
134
134
|
```ts
|
|
135
135
|
const finalExitCode = forcedFinalDrain && !timeoutError ? 0 : exitCode;
|
|
136
136
|
```
|
|
137
|
-
-
|
|
138
|
-
-
|
|
137
|
+
- This logic (already explained in a comment) converts some exit ≠ 0 into 0 after the child sends the final assistant event. Edge case: child crashes during cleanup after the final event → still reports success. This could mask a memory leak or crash in the child Pi.
|
|
138
|
+
- Suggestion: add telemetry/metrics counting how often `forcedFinalDrain → 0` happens to detect regressions. Currently there is only a lifecycle event "final_drain" but no conversion metric.
|
|
139
139
|
|
|
140
|
-
**M7. `background-runner.ts` — `process.exit(130)`
|
|
140
|
+
**M7. `background-runner.ts` — `process.exit(130)` in the interrupt guard does not await flush**
|
|
141
141
|
```ts
|
|
142
142
|
if (last?.type === "interrupt" && last?.acknowledged !== true) {
|
|
143
143
|
appendEvent(...);
|
|
144
144
|
process.exit(130);
|
|
145
145
|
}
|
|
146
146
|
```
|
|
147
|
-
- `process.exit`
|
|
148
|
-
-
|
|
147
|
+
- `process.exit` runs the `'exit'` handler but does not await async ops (e.g., a pending `appendEventBuffered` Promise). `flushEventLogBuffer` registered on `'exit'` is sync so it's OK, but `terminateLiveAgentsForRun` is not. It could leak a live agent.
|
|
148
|
+
- Suggestion: `await terminateLiveAgentsForRun(...)` before exiting, or use `process.exitCode = 130` + return so cleanup runs normally.
|
|
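The M7 suggestion (await cleanup, then set `process.exitCode` instead of calling `process.exit`) can be sketched like this. `exitAfterCleanup` is a hypothetical helper, `terminate` stands in for a call such as `terminateLiveAgentsForRun(...)`, and the injectable `setExitCode` parameter exists only to make the sketch testable.

```ts
// Await async cleanup before signalling the interrupt exit code.
// Setting process.exitCode lets the event loop drain normally, unlike
// process.exit(), which does not wait for pending promises.
async function exitAfterCleanup(
  terminate: () => Promise<void>,
  setExitCode: (code: number) => void = (code) => {
    process.exitCode = code;
  },
): Promise<void> {
  try {
    await terminate();
  } finally {
    // 130 is the exit code used by the interrupt guard shown above.
    setExitCode(130);
  }
}
```

The caller would `return` right after awaiting this helper, so no further work runs in the interrupted runner.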
149
149
|
|
|
150
150
|
**M8. `state-store.ts` — manifest cache TTL invariant**
|
|
151
|
-
-
|
|
152
|
-
-
|
|
151
|
+
- The cache key is `stateRoot`, TTL 5 minutes. Path validation guards against manifest paths changing. But if the file mtime + size do not change (extremely rare but possible with coalesced atomic writes when the size & content are the same), the cache serves stale content.
|
|
152
|
+
- Suggestion: add a `contentHash` (a cheap fingerprint, for example of the first 32 bytes) to the cache key, or invalidate the cache in the `atomicWriteJsonCoalesced` flush callback.
|
|
153
153
|
|
|
154
|
-
**M9. `event-log.ts` — `sequenceCache`
|
|
155
|
-
-
|
|
156
|
-
-
|
|
154
|
+
**M9. `event-log.ts` — `sequenceCache` not invalidated when the file is truncated externally**
|
|
155
|
+
- If an external tool truncates `events.jsonl` (manual rotation), the cached `seq` stays high, so `nextSequence` can produce wrong sequence numbers. The existing guard (`cached.size === stat.size`) covers the same-size race, but if truncation happens between `statSync` and `appendFileSync`, two appends will get the same seq.
|
|
156
|
+
- Suggestion: `persistSequence` already uses an atomic write, so it can be trusted in this race. Add an integration test for external truncation.
|
|
157
157
|
|
|
158
158
|
**M10. `runtime-resolver` / config — `executeWorkers=false` default fallback path**
|
|
159
|
-
- `handleResume`
|
|
160
|
-
-
|
|
159
|
+
- `handleResume` has complex logic to re-evaluate `runtime.mode` when resuming scaffold runs. The 3-way logic (`resumeManifest.runtimeResolution?.safety === "explicit_dry_run"` + env var checks) easily leads to an edge case where the user expects actual workers but resume is still scaffold. Hard to test.
|
|
160
|
+
- Suggestion: refactor into a clear state machine `resolveResumeRuntime({ original, override, env })` with unit tests covering the full truth table.
|
|
161
161
|
|
|
162
162
|
### LOW
|
|
163
163
|
|
|
164
|
-
- **L1. `package.json`
|
|
165
|
-
- **L2. Many `JSON.stringify(value, null, 2)`
|
|
166
|
-
- **L3. `task-runner.ts`
|
|
167
|
-
- **L4. `registerYieldTool()`
|
|
168
|
-
- **L5. `atomic-write.ts` `atomicWriteJsonCoalesced`** — API
|
|
169
|
-
- **L6. Cancellation paths
|
|
170
|
-
- **L7. `management.ts` `handleUpdate` rename+write** sequence
|
|
171
|
-
- **L8. `child-pi.ts` mock paths
|
|
172
|
-
- **L9. `worktree-manager.ts` `findGitRoot` throws**
|
|
173
|
-
- **L10.
|
|
174
|
-
- **L11.
|
|
175
|
-
- **L12. `update-references-for-rename`
|
|
164
|
+
- **L1. `package.json` missing a `lint` script**; the global `AGENTS.md` has a convention `eslint --max-warnings=0`. Currently it only relies on `tsc strict`. Consider adding ESLint or Biome.
|
|
165
|
+
- **L2. Many `JSON.stringify(value, null, 2)` for metadata artifacts**. Pretty-printing 50+ artifact/task files costs I/O. Consider minified JSON for metadata; pretty only for summary/progress that users read.
|
|
166
|
+
- **L3. `task-runner.ts` creates ~13 artifacts per task** (prompt, result, inputs, coordination, skill, packet, verification, startup, permission, capability, prompt-pipeline, log, transcript, diff, diff-stat, output-validation). Each is an `atomicWriteFile` syscall. In a large run (50+ tasks), consolidating into fewer sub-artifacts would significantly reduce I/O.
|
|
167
|
+
- **L4. `registerYieldTool()` runs at module top level** (`task-runner.ts` line 35). Side effect on import — if the module is imported twice (e.g., jiti vs strip-types), `subprocessToolRegistry` could be duplicated. Check whether `subprocess-tool-registry.ts` is idempotent.
|
|
168
|
+
- **L5. `atomic-write.ts` `atomicWriteJsonCoalesced`** — the API has a significant caveat (read-after-write within the buffer window reads stale content). Large risk surface if a future dev forgets to call `flushPendingAtomicWrites()`. Consider adding a dedicated read API `readJsonFileWithCoalesceFlush()`.
|
|
169
|
+
- **L6. Cancellation paths have no counting metric**. There are observability events but no gauge for the number of tasks cancelled per run.
|
|
170
|
+
- **L7. `management.ts` `handleUpdate` rename+write** sequence has no rollback if writeFileSync fails after rename (a backup exists, but the user must manually restore). Could wrap in try/catch + auto-restore from backup.
|
|
171
|
+
- **L8. `child-pi.ts` mock paths read env `PI_TEAMS_MOCK_CHILD_PI`** — there should be a guard preventing accidental production activation (check `process.env.NODE_ENV === "test"` or a clear test flag).
|
|
172
|
+
- **L9. `worktree-manager.ts` `findGitRoot` throws** if cwd is not a git repo. `prepareTaskWorkspace` checks `workspaceMode` at the top of the function before calling it, so the ordering is fine, but the raw git error message ("not a git repository") propagates to the user, which is not user-friendly.
|
|
173
|
+
- **L10. The naming `crewRoot` vs `.crew/` vs `.pi/teams/`** is documented but easy to confuse. `projectCrewRoot` has three branches (existing `.crew` → `.crew`; existing `.pi` → `.pi/teams`; else → `.crew`). Tests cover it but a new dev reading the code can easily misunderstand.
|
|
174
|
+
- **L11. Some `let task: TeamTaskState = ...` is reassigned multiple times in `task-runner.ts`**. Hard to reason about. Consider refactoring into a reducer pattern.
|
|
175
|
+
- **L12. `update-references-for-rename` only updates team→agent and team.defaultWorkflow**, does not cover workflow→step.role or agent references in test fixtures. The comment acknowledges this. Still worth fixing so renames are safe.
|
|
176
176
|
|
|
177
177
|
---
|
|
178
178
|
|
|
179
179
|
## 3. Security review
|
|
180
180
|
|
|
181
|
-
|
|
|
181
|
+
| Item | Status | Notes |
|
|
182
182
|
|---|---|---|
|
|
183
|
-
| Path traversal | OK | `assertSafePathId`, `resolveContainedPath`, `resolveRealContainedPath`
|
|
184
|
-
| Symlink escape | OK (corner case H3) | `O_NOFOLLOW`, `lstatSync`, post-open `fstatSync`.
|
|
185
|
-
| Secret leak | OK | Redaction
|
|
186
|
-
| Code injection via setup hook | Mitigated | `runSetupHook`
|
|
187
|
-
| Untrusted project config | OK | `sanitizeProjectConfig`
|
|
183
|
+
| Path traversal | OK | `assertSafePathId`, `resolveContainedPath`, `resolveRealContainedPath` cover it fairly thoroughly. |
|
|
184
|
+
| Symlink escape | OK (corner case H3) | `O_NOFOLLOW`, `lstatSync`, post-open `fstatSync`. One fallback path skips the check (H3). |
|
|
185
|
+
| Secret leak | OK | Redaction applied at the event log, transcript, mailbox, artifact inputs. Env sanitization before spawning the child. |
|
|
186
|
+
| Code injection via setup hook | Mitigated | `runSetupHook` validates the file exists, uses `shell: false`, allow-lists env, 30s timeout. But it still executes user-provided code. Must trust the user. |
|
|
187
|
+
| Untrusted project config | OK | `sanitizeProjectConfig` strips sensitive keys before merging. |
|
|
188
188
|
| Process tree leak (zombie child Pi) | OK | `terminateActiveChildPiProcesses` + `parent-guard` + Windows `taskkill /T /F`. |
|
|
189
|
-
| DoS
|
|
190
|
-
| Event log injection | Mitigated | JSON.stringify
|
|
191
|
-
| Dependency surface |
|
|
189
|
+
| DoS via concurrency | OK | Default hard-cap; `allowUnboundedConcurrency=true` requires explicit opt-in + emits an event. |
|
|
190
|
+
| Event log injection | Mitigated | JSON.stringify per line; readEvents skips parse errors. There is a risk of corrupted JSON lines due to an `appendFileSync` race (H2 in mailbox, but the event log has a lock). |
|
|
191
|
+
| Dependency surface | Small | Only runtime deps: typebox, cli-highlight, diff, jiti. |
|
|
192
192
|
|
|
193
|
-
|
|
193
|
+
In summary: the security posture is **good**. The biggest issue is H2 (mailbox has no lock): messages can be lost silently if multiple processes race.
|
|
194
194
|
|
|
195
195
|
---
|
|
196
196
|
|
|
197
197
|
## 4. Performance review
|
|
198
198
|
|
|
199
|
-
- **Atomic write coalescer** (50ms window)
|
|
200
|
-
- **Manifest cache**
|
|
201
|
-
- **Lazy import boundaries**
|
|
202
|
-
- **`projectRootCache` TTL 30s**
|
|
199
|
+
- **Atomic write coalescer** (50ms window) has reduced I/O for high-frequency state writes.
|
|
200
|
+
- **Manifest cache** with mtime+size key avoids re-parsing when unchanged.
|
|
201
|
+
- **Lazy import boundaries** reduce import cost ~1.4s.
|
|
202
|
+
- **`projectRootCache` TTL 30s** reduces 14 `existsSync` × ancestor levels per render tick.
|
|
203
203
|
|
|
204
|
-
|
|
205
|
-
1.
|
|
206
|
-
2. `progress.md`
|
|
207
|
-
3. `parsePiJsonOutput(fs.readFileSync(transcriptPath))`
|
|
208
|
-
4. `aggregateUsage(tasks)`
|
|
204
|
+
Areas with optimization potential:
|
|
205
|
+
1. Each completed task produces ~13 artifacts (L3). 50 tasks = 650 atomic writes for metadata. Consider batching.
|
|
206
|
+
2. `progress.md` and `summary.md` are rewritten multiple times per batch (writeProgress in a loop). Coalescing is fine but `atomicWriteJsonCoalesced` could be used.
|
|
207
|
+
3. `parsePiJsonOutput(fs.readFileSync(transcriptPath))` runs each attempt, parsing the full transcript. Stream parsing is cheaper for large transcripts.
|
|
208
|
+
4. `aggregateUsage(tasks)` runs O(n) over tasks on each summary write.
|
|
209
209
|
|
|
210
210
|
---
|
|
211
211
|
|
|
@@ -214,40 +214,40 @@ Areas with optimization potential:
|
|
|
214
214
|
| Aspect | Note |
|
|
215
215
|
|---|---|
|
|
216
216
|
| TS strict | OK, `noImplicitAny` enforced. |
|
|
217
|
-
| Naming `__test__*` |
|
|
218
|
-
| File size | `team-runner.ts` (694
|
|
219
|
-
| Comment quality |
|
|
220
|
-
| Test layout | `test/unit/*.test.ts` + `test/integration/*.test.ts`.
|
|
221
|
-
| Hard-coded magic numbers |
|
|
222
|
-
| Error reporting | `logInternalError` consistent — best-effort,
|
|
223
|
-
| Docs sync | `docs/architecture.md`
|
|
217
|
+
| Naming `__test__*` | Some mixing of pure test utils and production helpers (H4). |
|
|
218
|
+
| File size | `team-runner.ts` (694 lines), `task-runner.ts` (440+ lines), `register.ts` (1k+ lines), `live-session-runtime.ts` (~750 lines) are all > 500 lines. AGENTS.md says "prefer small modules". |
|
|
219
|
+
| Comment quality | Good — there are "WHY" markers, version tags (`// 2.10`, `// H4`, `// 3.1`). |
|
|
220
|
+
| Test layout | `test/unit/*.test.ts` + `test/integration/*.test.ts`. Reasonable concurrency. |
|
|
221
|
+
| Hard-coded magic numbers | Mostly centralized in `config/defaults.ts`. |
|
|
222
|
+
| Error reporting | `logInternalError` is consistent — best-effort, does not throw. |
|
|
223
|
+
| Docs sync | `docs/architecture.md` matches the code (except some next-upgrade-roadmap items not yet implemented). |
|
|
224
224
|
|
|
225
225
|
---
|
|
226
226
|
|
|
227
|
-
## 6. Test-matrix
|
|
227
|
+
## 6. Test-matrix gaps (candidates for new tests)
|
|
228
228
|
|
|
229
|
-
- Cross-process race
|
|
230
|
-
- Event log overflow recovery (H1) —
|
|
231
|
-
- `forcedFinalDrain`
|
|
229
|
+
- Cross-process race on mailbox append (H2).
|
|
230
|
+
- Event log overflow recovery (H1) — ensure terminal events are still persisted when exceeding 50MB.
|
|
231
|
+
- `forcedFinalDrain` does not mask a real child crash (M6).
|
|
232
232
|
- Resume with mixed `runtime.mode` overrides (M10).
|
|
233
|
-
- Atomic-write coalesced + read-after-write within window —
|
|
233
|
+
- Atomic-write coalesced + read-after-write within the window — ensure documented behavior matches reality.
|
|
234
234
|
- `linkNodeModulesIfPresent` Windows non-admin fallback (M5).
|
|
235
235
|
- `runSetupHook` multi-line JSON output (M4).
|
|
236
236
|
|
|
237
237
|
---
|
|
238
238
|
|
|
239
|
-
## 7.
|
|
239
|
+
## 7. Suggested priorities (sorted)
|
|
240
240
|
|
|
241
|
-
1. **Fix H1** (event-log overflow): rotate
|
|
242
|
-
2. **Fix H2** (mailbox lock):
|
|
243
|
-
3. **Fix H3** (atomic-write symlink TOCTOU): re-check lstat
|
|
244
|
-
4. **Fix H4** (rename `__test__mergeTaskUpdates` → `mergeTaskUpdates`,
|
|
241
|
+
1. **Fix H1** (event-log overflow): rotate immediately when the threshold is crossed + prioritize terminal events.
|
|
242
|
+
2. **Fix H2** (mailbox lock): apply the `withEventLogLockSync` pattern to mailbox append.
|
|
243
|
+
3. **Fix H3** (atomic-write symlink TOCTOU): re-check lstat before the `writeFileSync` fallback.
|
|
244
|
+
4. **Fix H4** (rename `__test__mergeTaskUpdates` → `mergeTaskUpdates`, keep alias).
|
|
245
245
|
5. **M1/M2** transcript per-attempt + cap size.
|
|
246
|
-
6. **M3** race-safe `statSync`
|
|
247
|
-
7. **M6**
|
|
248
|
-
8. **L1**
|
|
249
|
-
9. **L3** batch artifact writes
|
|
250
|
-
10. **L12**
|
|
246
|
+
6. **M3** race-safe `statSync` in cleanup.
|
|
247
|
+
7. **M6** add a metric `crew.child.final_drain_force_zero_total`.
|
|
248
|
+
8. **L1** add ESLint or Biome for consistency (global AGENTS.md requires it).
|
|
249
|
+
9. **L3** batch artifact writes for metadata.
|
|
250
|
+
10. **L12** expand `updateReferencesForRename` for workflow→step + agent references.
|
|
251
251
|
|
|
252
252
|
---
|
|
253
253
|
|
|
@@ -259,13 +259,12 @@ node --experimental-strip-types -e "..." → PASS (strip-types import ok
|
|
|
259
259
|
node --test test/unit/*.test.ts → 1596 pass / 2 skip / 0 fail / 90s
|
|
260
260
|
```
|
|
261
261
|
|
|
262
|
-
|
|
262
|
+
There is no lint command in the project (only `tsc strict`); no `.eslintrc*` file was found.
|
|
263
263
|
|
|
264
264
|
---
|
|
265
265
|
|
|
266
|
-
## 9.
|
|
266
|
+
## 9. Conclusion
|
|
267
267
|
|
|
268
|
-
`pi-crew`
|
|
269
|
-
|
|
270
|
-
**Recommendation**: prioritize fixing H1–H4 and expanding tests for cross-process races (mailbox + event-log overflow). Next, consider adding a linter, batching metadata artifact writes, and refactoring some large orchestrator files (`register.ts`, `team-runner.ts`, `live-session-runtime.ts`) into sub-modules.
|
|
268
|
+
`pi-crew` is a **mature, highly disciplined** codebase, with many defensive layers against TOCTOU, races, and mid-write crashes. Test coverage is broad and the architecture is clear. The issues found are mainly edge-case correctness and hardening; there is no serious "broken core flow" vulnerability.
|
|
271
269
|
|
|
270
|
+
**Recommendation**: prioritize fixing H1–H4 and expanding tests for cross-process races (mailbox + event-log overflow). Next, consider adding a linter, batching metadata artifact writes, and refactoring some large orchestrator files (`register.ts`, `team-runner.ts`, `live-session-runtime.ts`) into sub-modules.
|