pi-crew 0.8.14 → 0.9.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (86)
  1. package/CHANGELOG.md +366 -0
  2. package/README.md +112 -2
  3. package/docs/FEATURE_INTAKE.md +1 -1
  4. package/docs/HARNESS.md +20 -19
  5. package/docs/PROJECT_REVIEW.md +132 -133
  6. package/docs/PROJECT_REVIEW_FIXES.md +130 -131
  7. package/docs/actions-reference.md +127 -121
  8. package/docs/architecture.md +1 -1
  9. package/docs/code-review-2026-05-11.md +134 -134
  10. package/docs/commands-reference.md +108 -106
  11. package/docs/comparison-pi-subagents-vs-pi-crew.md +105 -105
  12. package/docs/deep-review-report.md +1 -1
  13. package/docs/dynamic-workflows.md +90 -0
  14. package/docs/fixes/BATCH_A_H1_H2.md +17 -17
  15. package/docs/fixes/bug-007-async-notifier-stale-ctx.md +23 -23
  16. package/docs/followup-plan-2026-05-12.md +135 -135
  17. package/docs/followup-review-2026-05-12.md +86 -86
  18. package/docs/followup-review-round3-2026-05-12.md +123 -123
  19. package/docs/goals.md +59 -0
  20. package/docs/implementation-plan-top3.md +4 -4
  21. package/docs/issue-29-analysis.md +2 -2
  22. package/docs/oh-my-pi-research.md +154 -154
  23. package/docs/optimization-plan.md +2 -0
  24. package/docs/perf/baseline-2026-05.md +9 -9
  25. package/docs/perf/final-report-2026-05.md +2 -2
  26. package/docs/perf/sprint-1-report.md +2 -2
  27. package/docs/perf/sprint-2-report.md +1 -1
  28. package/docs/perf/upgrade-plan-2026-05.md +72 -72
  29. package/docs/pi-crew-bugs.md +230 -230
  30. package/docs/pi-crew-investigation-report.md +102 -102
  31. package/docs/pi-crew-test-round5.md +4 -4
  32. package/docs/runtime-analysis-child-vs-live.md +57 -57
  33. package/docs/runtime-migration-in-process-analysis.md +97 -97
  34. package/package.json +2 -4
  35. package/skills/orchestration/SKILL.md +11 -11
  36. package/src/agents/agent-config.ts +4 -0
  37. package/src/config/config.ts +39 -0
  38. package/src/config/types.ts +11 -0
  39. package/src/extension/action-suggestions.ts +2 -1
  40. package/src/extension/async-notifier.ts +10 -0
  41. package/src/extension/help.ts +14 -0
  42. package/src/extension/registration/commands.ts +27 -0
  43. package/src/extension/team-tool/destructive-gate.ts +1 -1
  44. package/src/extension/team-tool/goal-wrap.ts +288 -0
  45. package/src/extension/team-tool/goal.ts +405 -0
  46. package/src/extension/team-tool/run.ts +103 -4
  47. package/src/extension/team-tool/workflow-manage.ts +194 -0
  48. package/src/extension/team-tool.ts +20 -0
  49. package/src/hooks/types.ts +3 -1
  50. package/src/runtime/async-runner.ts +27 -2
  51. package/src/runtime/background-runner.ts +68 -19
  52. package/src/runtime/child-pi.ts +9 -1
  53. package/src/runtime/completion-guard.ts +1 -1
  54. package/src/runtime/dynamic-workflow-context.ts +450 -0
  55. package/src/runtime/dynamic-workflow-runner.ts +180 -0
  56. package/src/runtime/global-worker-cap.ts +96 -0
  57. package/src/runtime/goal-evaluator.ts +294 -0
  58. package/src/runtime/goal-loop-runner.ts +612 -0
  59. package/src/runtime/goal-state-store.ts +209 -0
  60. package/src/runtime/iteration-hooks.ts +2 -1
  61. package/src/runtime/pi-args.ts +10 -2
  62. package/src/runtime/post-checks.ts +2 -1
  63. package/src/runtime/result-extractor.ts +32 -0
  64. package/src/runtime/team-runner.ts +11 -1
  65. package/src/runtime/verification-gates.ts +88 -5
  66. package/src/runtime/verification-integrity.ts +110 -0
  67. package/src/runtime/verification-worktree.ts +136 -0
  68. package/src/runtime/workspace-lock.ts +448 -0
  69. package/src/schema/config-schema.ts +26 -0
  70. package/src/schema/team-tool-schema.ts +39 -4
  71. package/src/state/atomic-write.ts +9 -0
  72. package/src/state/contracts.ts +14 -0
  73. package/src/state/crew-init.ts +18 -5
  74. package/src/state/event-log.ts +7 -1
  75. package/src/state/state-store.ts +2 -0
  76. package/src/state/types.ts +82 -0
  77. package/src/state/worker-atomic-writer.ts +190 -0
  78. package/src/utils/env-allowlist.ts +30 -0
  79. package/src/utils/redaction.ts +104 -24
  80. package/src/utils/safe-paths.ts +55 -14
  81. package/src/workflows/discover-workflows.ts +25 -1
  82. package/src/workflows/workflow-config.ts +13 -0
  83. package/src/worktree/cleanup.ts +2 -1
  84. package/src/worktree/worktree-manager.ts +4 -3
  85. package/teams/parallel-research.team.md +1 -1
  86. package/workflows/examples/hello.dwf.ts +24 -0
package/CHANGELOG.md CHANGED
@@ -1,5 +1,341 @@
1
1
  # Changelog
2
2
 
3
+ ## [v0.9.1] — Windows essentials fix + cross-platform CI green (2026-06-22)
4
+
5
+ Patch release. No new features. Fixes a real Windows bug reported by a user,
6
+ plus the cross-platform CI failures that followed.
7
+
8
+ ### fix(windows): `${APPDATA}` npm-global resolution failure (root cause)
9
+
10
+ Reported symptom (Windows): running pi-crew created a phantom literal
11
+ `${APPDATA}/npm/` directory in the project root (containing `node_modules`,
12
+ `pi-crew`, `pi-crew.cmd`, `pi-crew.ps1`) and leaked a literal `${APPDATA}`
13
+ line into `.gitignore`.
14
+
15
+ Root cause: pi-crew's subprocess env sanitization used explicit allowlists
16
+ that stripped **all Windows-essential env vars** (`APPDATA`, `LOCALAPPDATA`,
17
+ `USERPROFILE`, `SystemRoot`, `ComSpec`, `TEMP`, `TMP`). When a child pi
18
+ process (or npm inside it) tried to resolve the npm-global prefix on Windows,
19
+ it used `%APPDATA%` (cmd expansion) / `${APPDATA}` (bash expansion), but
20
+ `APPDATA` was missing from the env — so the shell left the literal
21
+ `${APPDATA}` in place and operations created/ignored paths under that
22
+ literal name.
23
+
24
+ Fix: added the 7 Windows essentials to all 7 subprocess env allowlists
25
+ (child-pi, async-runner, verification-gates, post-checks, iteration-hooks,
26
+ worktree/cleanup, worktree/worktree-manager). (commit `a7ddc50`)
27
+
28
+ ### refactor(env): centralize Windows essentials + regression guard
29
+
30
+ The same 7 vars were duplicated inline across 9 call sites — easy to forget
31
+ on a new allowlist, with nothing preventing a future site from omitting them.
32
+
33
+ - New single source of truth: `WINDOWS_ESSENTIAL_ENV_VARS` in
34
+ `src/utils/env-allowlist.ts` (with full root-cause documentation).
35
+ - All 9 call sites now spread the constant instead of inlining (net −42/+19
36
+ lines, behavior unchanged).
37
+ - New regression test `test/unit/env-allowlist.test.ts`: scans ALL
38
+ `src/**/*.ts` files and fails if any hardcodes the 7 vars inline (the only
39
+ allowed location is the constant file). This catches any new allowlist that
40
+ forgets the constant — the exact regression that caused the bug.
41
+ (commit `6a0284c`)
42
+
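The centralization described above can be sketched roughly as follows. This is a minimal illustration, not the contents of `src/utils/env-allowlist.ts`: the constant name and the seven variable names come from the changelog, while `CHILD_ENV_ALLOWLIST`, `pickAllowedEnv`, and the sample values are hypothetical.

```typescript
// Constant name and members are taken from the changelog entry.
const WINDOWS_ESSENTIAL_ENV_VARS: readonly string[] = [
  "APPDATA", "LOCALAPPDATA", "USERPROFILE", "SystemRoot", "ComSpec", "TEMP", "TMP",
];

// Hypothetical call site: spread the shared constant instead of repeating names.
const CHILD_ENV_ALLOWLIST: readonly string[] = ["PATH", "HOME", ...WINDOWS_ESSENTIAL_ENV_VARS];

type Env = Record<string, string | undefined>;

// Hypothetical helper: copy only allowlisted variables that are set in the parent.
// Names are matched case-sensitively here, which is a simplification.
function pickAllowedEnv(parent: Env, allowlist: readonly string[]): Record<string, string> {
  const child: Record<string, string> = {};
  for (const name of allowlist) {
    const value = parent[name];
    if (value !== undefined) child[name] = value;
  }
  return child;
}

const childEnv = pickAllowedEnv(
  { PATH: "C:\\Windows", APPDATA: "C:\\Users\\me\\AppData\\Roaming", SOME_API_KEY: "secret" },
  CHILD_ENV_ALLOWLIST,
);
```

With `APPDATA` passed through, a child process can expand `%APPDATA%` normally; anything not on the list (here `SOME_API_KEY`) is still dropped.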
43
+ ### fix(ci): cross-platform CI green (ubuntu + macOS + Windows)
44
+
45
+ Five distinct bugs (containment, path, and timing) that only surfaced on non-ubuntu CI:
46
+
47
+ 1. **Windows 8.3 short-name paths** — `resolveWindowsCanonical()` used
48
+ non-native `realpathSync`, preserving the `RUNNER~1` vs `runneradmin`
49
+ form mismatch. A legitimately-contained dynamic workflow file was
50
+ rejected as "outside the allowed directories". Fixed by using
51
+ `realpathSync.native` (canonical long-name form) as the primary resolver.
52
+ (commit `e9e7137`)
53
+
54
+ 2. **ESM `file://` URLs** — two integration tests passed raw Windows paths
55
+ (`D:\…`) to native `import()`, which Node rejects on Windows
56
+ (`ERR_UNSUPPORTED_ESM_URL_SCHEME: protocol 'd:'`). Wrapped with
57
+ `pathToFileURL(…).href`. (commit `e9e7137`)
58
+
59
+ 3. **macOS symlink-ancestor** — `isSymlinkSafePath()` walked up the temp
60
+ path and hit `/var` (a symlink → `/private/var`). The old check compared
61
+ the resolved `/private/var` against the tmpdir
62
+ `/private/var/folders/…/T` — `/private/var` is an **ancestor**, not a
63
+ descendant, so it was wrongly rejected (5 macOS worker-atomic-writer
64
+ failures). Fixed by accepting a symlink whose target is a safe root, is
65
+ UNDER a safe root, OR is an ANCESTOR of a safe root. Added two behavioral
66
+ regression tests (symlink-ancestor accept + symlink-attack reject).
67
+ (commit `e9e7137`)
68
+
69
+ 4. **macOS `/var` containment** — `resolveContainedPath()` only
70
+ canonicalized paths on win32; on POSIX it compared raw paths, so base
71
+ (`/private/var`) vs target (`/var`) diverged → false "outside" rejection
72
+ (macOS dwf-setresult failure). Added platform-agnostic
73
+ `resolveCanonicalPath()`. Added a darwin-only regression test for the
74
+ real `/var` divergence. (commit `4821bb1`)
75
+
76
+ 5. **Windows wakeup timing** — `subagent-manager` polls the child run
77
+ manifest every 1000ms. On the slower Windows CI runner, child-process
78
+ spawn + first poll exceeded the test's 10s deadline (failed at 11.6s).
79
+ Bumped the mock-test deadline to 30s. (commit `4821bb1`)
80
+
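The acceptance rule from item 3 (target is a safe root, is under one, or is an ancestor of one) can be sketched as a pure path comparison. The function names below are hypothetical and the sketch assumes absolute, already-resolved POSIX paths; it does not reproduce `isSymlinkSafePath()` itself.

```typescript
// True when `candidate` equals `root` or lies somewhere beneath it.
function isSameOrUnder(candidate: string, root: string): boolean {
  const base = root.endsWith("/") ? root : root + "/";
  return candidate === root || candidate.startsWith(base);
}

// Hypothetical rule: accept a symlink target that is a safe root, is under a
// safe root, or is an ancestor of a safe root.
function isSymlinkTargetAcceptable(target: string, safeRoots: readonly string[]): boolean {
  return safeRoots.some(
    (root) =>
      isSameOrUnder(target, root) || // target is the root or a descendant of it
      isSameOrUnder(root, target),   // target is an ancestor of the root
  );
}
```

For the macOS case, `/private/var` is an ancestor of `/private/var/folders/…/T`, so the second branch accepts it, while an unrelated target such as `/etc` is rejected. A string-only rule like this would also treat `/` as an ancestor of everything; the sketch makes no claim about how the real check bounds that case.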
81
+ ### Verification
82
+
83
+ - tsc: 0
84
+ - Full test suite: 5207 tests, 0 fail on **all three** platforms (ubuntu,
85
+ macOS, Windows)
86
+ - CI run `27955398241`: success across ubuntu-latest, macos-latest,
87
+ windows-latest
88
+ - Regression tests added: env-allowlist scan, worker-atomic-writer symlink
89
+ ancestor/attack, safe-paths darwin `/var` divergence
90
+
91
+ ### Breaking changes
92
+
93
+ None. All fixes are additive or behavior-preserving. Windows users who hit
94
+ the `${APPDATA}` bug should upgrade.
95
+
96
+ ---
97
+
98
+ ## [v0.9.0] — goal loops + dynamic workflows (2026-06-18)
99
+
100
+ Two new features, both built on a shared `runKind` background-dispatch discriminator.
101
+
102
+ ### Phase 1.5 #4: TDZ fix — dynamic-workflow runs end-to-end via full pi pipeline (RFC 17 fix)
103
+
104
+ Live `team action='run' workflow='<dynamic>'` was failing with
105
+ `Dynamic workflow 'X' must export a default async function(ctx).` even
106
+ though the .dwf.ts loaded correctly via direct jiti. Root cause was NOT
107
+ in `dynamic-workflow-runner.ts` — it was a Temporal Dead Zone race in
108
+ `team-tool/run.ts` when loaded via the full pi extension pipeline
109
+ (`index.ts → register.ts → registration/team-tool.ts → team-tool.ts →
110
+ run.ts`).
111
+
112
+ **Race details**: jiti loads each .ts file inside an `async function
113
+ _module(...)` wrapper. Static `import { X } from "..."` statements
114
+ become `var _x = require(...)` calls. When a destructured `import` is
115
+ referenced inside a hoisted function before its `let` declaration line
116
+ runs, the reference hits TDZ.
117
+
118
+ **Fixes**:
119
+ - `src/extension/team-tool/run.ts`:
120
+ - `crewInitPromise`: `let` → `var` (avoids TDZ)
121
+ - `expandParallelResearchWorkflow`, `validateWorkflowForTeam`,
122
+ `normalizeSkillOverride`: convert to lazy dynamic imports at call site
123
+ - `src/state/crew-init.ts`:
124
+ - `CREW_README`: `const` → `function buildCrewReadme(): string` (function
125
+ declarations are fully hoisted)
126
+ - `updateGitignore`: convert usage to lazy dynamic import at call site
127
+
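The `const` → function-declaration change listed above relies on hoisting: a function declaration is available, body included, from the start of its scope, whereas a `let`/`const` binding cannot be read until its declaration line has run. A minimal sketch of the safe pattern follows; the function name echoes the changelog, but the body and the surrounding code are invented for illustration.

```typescript
// Safe even though the call appears before the declaration in source order:
// function declarations are hoisted together with their bodies.
const readmeAtLoad: string = buildCrewReadme();

function buildCrewReadme(): string {
  // Illustrative content only; the real README text is not shown in the diff.
  return ["# .crew", "State directory managed by pi-crew."].join("\n");
}

// By contrast, a module-level `const CREW_README = "..."` declared at this
// point would still be uninitialized for any hoisted function that ran
// earlier, and reading it there throws a ReferenceError on runtimes with
// native `let`/`const` semantics.
```

The lazy dynamic imports in the same fix list follow the same idea from the other direction: the binding is resolved only when the call site actually executes, after module evaluation has finished.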
128
+ **New test**: `test/integration/run-via-full-pipeline.test.ts` loads
129
+ `index.ts` via `jiti.import()` the way pi does, invokes `handleRun` with a
130
+ dynamic workflow params, and asserts no TDZ / ReferenceError is thrown.
131
+ Fails without the fix, passes with it.
132
+
133
+ **Verification**:
134
+ - 108 unit tests pass (goal, dwf, redaction, verification, worker-writer)
135
+ - New integration test passes
136
+ - Direct simulation of pi pipeline → `Dynamic workflow 'demo-hello'
137
+ completed` (was: `failed: must export a default async function`)
138
+
139
+ Closes RFC 17 §4 round-trip / investigated residual. See
140
+ `research-findings/goal-workflow/17-PHASE1.5-CRASH-INVESTIGATION-RFC.md`
141
+ for the full 8-attempt investigation log (gdb, strace, V8 report, sync
142
+ workarounds, worker-thread atomic writer, auto-downgrade — none
143
+ identified the real bug because they all skipped the full pi load path).
144
+
145
+ ### Phase 1.5 #3: V8 diagnostic report infrastructure + crash investigation closed
146
+
147
+ `PI_CREW_BG_REPORT_ON_FATAL=1` makes the background goal-loop runner spawn
148
+ with `--report-on-fatalerror --report-compact`. When V8 considers the
149
+ process state fatal, it writes a diagnostic report (native stack, JS stack,
150
+ libuv handles, environment) — crucial info that application-level signal
151
+ handlers cannot capture.
152
+
153
+ **Investigation result** (RFC 17, 5+ reproduction attempts): the multi-step
154
+ goal-wrap crash does NOT produce a V8 report. Combined with prior findings
155
+ (no signal via strace, no coredump, no OOM, parent process alive) the crash
156
+ signature is consistent with a V8/libuv internal abort that bypasses all
157
+ Node.js reporting mechanisms. Cannot be diagnosed or fixed at the
158
+ application level.
159
+
160
+ **Decision**: ship with the existing auto-downgrade safety net
161
+ (commit `6de5270`). Multi-step workflows silently run as normal team-runs
162
+ (no goal-wrap layer); single-step workflows (implementation) continue to
163
+ goal-wrap end-to-end. The `--report-on-fatalerror` infrastructure remains
164
+ in place for future investigation if a Node.js version change alters the
165
+ crash behavior.
166
+
167
+ See `research-findings/goal-workflow/17-PHASE1.5-CRASH-INVESTIGATION-RFC.md`
168
+ for the full 8-attempt investigation log and re-opening criteria.
169
+
170
+ ### Phase 1.5 #2: git-worktree verification sandbox (closes round-trip + invoked-script tamper)
171
+
172
+ `PI_CREW_VERIFICATION_WORKTREE=1` runs verification commands in a pristine
173
+ `git worktree` at the T_snap commit SHA. Closes the two remaining Med-High
174
+ tampering residuals from RFC 13 §6 that P1a manifest hashing cannot detect:
175
+
176
+ - **Round-trip manifest tamper (MAJ#2)**: worker edits `package.json`, runs
177
+ `npm test` (passes), reverts before T_verify_done → hash matches →
178
+ tamper undetected. With worktree: verification runs at original SHA →
179
+ worker edits invisible → tamper BLOCKED.
180
+ - **Invoked-script tampering**: worker rewrites a script the verification
181
+ command invokes; only MANIFEST_FILES are hashed → invisible. With
182
+ worktree: script is at original SHA → tamper BLOCKED.
183
+
184
+ Graceful fallback when ANY precondition fails (logged via
185
+ logInternalError "goal-loop.worktreeSandboxBypassed"): opt-out env,
186
+ not-a-git-repo, dirty index, git unavailable. NEVER blocks the goal loop.
187
+
188
+ Implementation:
189
+ - `src/runtime/verification-worktree.ts` (NEW, pure leaf module):
190
+ `isWorktreeSandboxEnabled`, `checkWorktreeSandboxAvailable`,
191
+ `prepareVerificationWorktree` (git worktree add --detach),
192
+ `withVerificationWorktree` (RAII cleanup, idempotent, finally-safe).
193
+ - `src/runtime/verification-gates.ts`: `executeVerificationCommands`
194
+ accepts optional `worktreeCwd` — spawns commands with that cwd.
195
+ - `src/runtime/goal-loop-runner.ts`: verification call site prepares
196
+ worktree at T_snap SHA when available; finally block always cleans up.
197
+ - `src/runtime/async-runner.ts`: PI_CREW_VERIFICATION_WORKTREE env
198
+ inherited by bg-runner.
199
+
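The "RAII cleanup, idempotent, finally-safe" property attributed to `withVerificationWorktree` can be illustrated with a small wrapper. This is a synchronous sketch with injected stand-ins: `WorktreeHandle`, `withSandbox`, and the `prepare` callback are hypothetical, and no real `git worktree` command is run.

```typescript
// Hypothetical handle: a directory plus the function that removes it.
interface WorktreeHandle {
  path: string;
  cleanup: () => void;
}

// Run `use` inside the prepared directory and always clean up afterwards,
// whether `use` returns or throws. Cleanup runs at most once per handle.
function withSandbox<T>(prepare: () => WorktreeHandle, use: (cwd: string) => T): T {
  const handle = prepare();
  let cleaned = false;
  const cleanupOnce = (): void => {
    if (cleaned) return; // idempotent: later calls are no-ops
    cleaned = true;
    handle.cleanup();
  };
  try {
    return use(handle.path);
  } finally {
    cleanupOnce();
  }
}
```

The real module presumably does this asynchronously around `git worktree add --detach` and the corresponding removal; the point of the sketch is only that the `finally` block makes cleanup independent of how the verification step exits.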
200
+ Tests: 12 new unit tests in `test/unit/verification-worktree.test.ts`
201
+ (flag opt-in, not-a-repo fallback, dirty-index fallback, clean-repo success,
202
+ pristine-checkout property = the security guarantee, RAII cleanup on success
203
+ + on exception, idempotent cleanup). All pass.
204
+ 5200 unit + 115 integration tests; no regression; tsc clean.
205
+
206
+ RFC: `research-findings/goal-workflow/16-PHASE1.5-WORKTREE-SANDBOX-RFC.md`
207
+
208
+ ### Phase 1.5 #1: sanitized-env verification (opt-in info-disclosure mitigation)
209
+
210
+ `PI_CREW_VERIFICATION_SANITIZE_ENV=1` strips model-provider secrets (and
211
+ everything else not in the essential-vars allowlist) from the env passed to
212
+ verification commands (`npm test`, `pytest`, etc.). Closes the info-disclosure
213
+ residual at the SOURCE — P1f redaction at artifact-write + judge-bound is
214
+ regex-best-effort against adversarial workers; this never gives the
215
+ verification process the secret in the first place.
216
+
217
+ Escape hatch: `PI_CREW_VERIFICATION_PRESERVE_ENV=KEY1,KEY2,...` lets users
218
+ explicitly opt specific env vars back in (audited via the env-filter.ts
219
+ allowlist validator). Essential non-secret vars (PATH, HOME, USER, SHELL,
220
+ LANG, XDG_*, NPM_CONFIG_*, etc.) are always preserved.
221
+
222
+ Allowlist: 25 essential vars. NO model-provider keys by default.
223
+ Inherited by bg-runner via async-runner.ts env allowlist.
224
+
225
+ Tests: 7 new unit tests in test/unit/verification-env-sanitize.test.ts
226
+ (3 flag checks + 4 integration tests spawning real `printenv` subprocesses).
227
+ All pass. 5188 unit + 115 integration tests; no regression.
228
+
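A rough sketch of the sanitize-plus-preserve behavior described in this entry is shown below. The two environment variable names come from the changelog; the allowlist contents are a small illustrative subset (not the real 25 entries), and `parsePreserveList` and `sanitizeVerificationEnv` are hypothetical names.

```typescript
// Illustrative subset of essential variables, by exact name and by prefix.
const ESSENTIAL_EXACT: readonly string[] = ["PATH", "HOME", "USER", "SHELL", "LANG"];
const ESSENTIAL_PREFIXES: readonly string[] = ["XDG_", "NPM_CONFIG_"];

type Env = Record<string, string | undefined>;

// Parse "KEY1,KEY2" into trimmed, non-empty names.
function parsePreserveList(raw: string | undefined): string[] {
  return (raw === undefined ? "" : raw)
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// Keep essential variables plus anything the user explicitly opted back in.
function sanitizeVerificationEnv(parent: Env): Record<string, string> {
  const preserved = new Set(parsePreserveList(parent["PI_CREW_VERIFICATION_PRESERVE_ENV"]));
  const out: Record<string, string> = {};
  for (const name of Object.keys(parent)) {
    const value = parent[name];
    if (value === undefined) continue;
    const essential =
      ESSENTIAL_EXACT.includes(name) || ESSENTIAL_PREFIXES.some((p) => name.startsWith(p));
    if (essential || preserved.has(name)) out[name] = value;
  }
  return out;
}
```

Because the filter is allowlist-based, a secret that never matches a redaction regex is still withheld; it only reaches the verification command if it is named in `PI_CREW_VERIFICATION_PRESERVE_ENV`.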
229
+ ### SAFETY: goal-wrap auto-downgrades multi-step workflows (no hidden crashes)
230
+
231
+ Multi-step workflows (default: 4 steps, fast-fix: 3 steps) crash
232
+ non-deterministically when run as goal-wrap worker turns in the background
233
+ goal-loop process — V8/libuv race during event-loop yields in team-runner
234
+ batch transition (see commit a9f6e09, RFC 15). Sync fs workarounds regress;
235
+ worker-thread isolation doesn't help.
236
+
237
+ When a user has goal-wrap enabled in config but the workflow is multi-step,
238
+ the team-run handler now **auto-downgrades**: skips the goal-wrap layer and
239
+ runs the workflow via the normal team-run path (foreground `executeTeamRun`
240
+ or background `spawnBackgroundTeamRun`, depending on `async`). The user gets
241
+ the run they asked for — no error, no hang, no need to remove config.
242
+
243
+ The bypass reason is logged via `logInternalError("team-tool.run.goalWrapBypassed", ...)`
244
+ for traceability (findable in debug logs / `internal-error.json`).
245
+
246
+ Single-step workflows (e.g. `implementation`, only the adaptive `assess`
247
+ step) continue to be goal-wrapped end-to-end.
248
+
249
+ Implementation:
250
+ - `shouldGoalWrap(cwd, workflow)` — pure decision function returning
251
+ `{enabled: true}` or `{enabled: false, reason, message}`. Reasons:
252
+ `config-off` (not enabled), `invalid-config` (malformed), `multi-step`
253
+ (more than `GOAL_WRAP_MAX_STEPS = 1` step).
254
+ - `run.ts` calls `shouldGoalWrap` after `isGoalWrapEnabled`; if disabled,
255
+ falls through to normal team-run path. The original `isGoalWrapEnabled`
256
+ fast path (config check only) is kept as a cheap pre-filter.
257
+ - 5 new unit tests in `test/unit/goal-wrap.test.ts` cover all 4 decisions
258
+ (config-off / invalid-config / multi-step refuse / single-step accept)
259
+ + the GOAL_WRAP_MAX_STEPS value invariant.
260
+
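The decision function described above maps naturally onto a discriminated union. The sketch below assumes a simplified input: the reason strings and `GOAL_WRAP_MAX_STEPS = 1` are from the changelog, but the real `shouldGoalWrap` takes `(cwd, workflow)` and reads config itself, so `decideGoalWrap`, its parameters, and the messages are hypothetical.

```typescript
const GOAL_WRAP_MAX_STEPS = 1;

type GoalWrapDecision =
  | { enabled: true }
  | { enabled: false; reason: "config-off" | "invalid-config" | "multi-step"; message: string };

// Hypothetical signature: the raw config value and the workflow's step count.
function decideGoalWrap(goalWrapConfig: unknown, stepCount: number): GoalWrapDecision {
  if (goalWrapConfig === undefined || goalWrapConfig === false) {
    return { enabled: false, reason: "config-off", message: "goal-wrap is not enabled" };
  }
  if (goalWrapConfig !== true) {
    return { enabled: false, reason: "invalid-config", message: "goal-wrap config is malformed" };
  }
  if (stepCount > GOAL_WRAP_MAX_STEPS) {
    return {
      enabled: false,
      reason: "multi-step",
      message: `workflow has ${stepCount} steps; goal-wrap supports at most ${GOAL_WRAP_MAX_STEPS}`,
    };
  }
  return { enabled: true };
}
```

A caller can then branch on `enabled` and, in the disabled case, log `reason` and `message` before falling through to the normal team-run path.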
261
+ ### Phase 1.5: worker-thread atomic writer (opt-in, infrastructure)
262
+
263
+ `PI_CREW_WORKER_ATOMIC_WRITER=1` routes `atomicWriteFileAsync` and
264
+ `appendEventAsync` through a dedicated worker thread that performs SYNC fs
265
+ ops with no internal yields. Implementation: `src/state/worker-atomic-writer.ts`.
266
+ 9 unit tests; 5169 existing tests pass; no regression.
267
+
268
+ **Test result**: worker writer does NOT fix the multi-step crash (verified
269
+ end-to-end with `default` workflow). The crash is NOT in fs writes — worker
270
+ writes complete successfully but the process still dies during batch
271
+ transition. Root cause is some other async operation yielding the main
272
+ event loop. See `research-findings/goal-workflow/15-PHASE1.5-WORKER-WRITER-RFC.md`
273
+ for full investigation notes.
274
+
275
+ The worker writer is kept as **infrastructure** — opt-in, well-tested, no
276
+ regression. It may help with future variants or concurrent-write contention.
277
+
278
+ ### Resolution: multi-step goal-wrap crash (3/3 tasks now complete end-to-end)
279
+
280
+ The silent crash at `atomicWriteFileAsync` of the inner turn's `manifest.json`
281
+ (size=7417) — which caused `team action='run' workflow='fast-fix'` (and other
282
+ multi-step builtins) to hang at "1/3" forever — is **resolved** as a side
283
+ effect of commit `d52cb81` ("fix(goal-wrap): persist async.pid on OUTER
284
+ goal-loop manifest"). The extra `atomicWriteJson(manifestPath, asyncGoalManifest)`
285
+ call in `startGoalWrappedRun` after `spawnBackgroundTeamRun` shifts timing
286
+ enough to avoid the underlying race condition.
287
+
288
+ Verified end-to-end with 3 consecutive runs of goal-wrapped fast-fix
289
+ (`fix test.js so npm test passes`): all completed 3/3 tasks in ~120s with
290
+ `npm test` PASS. The original deep-dive investigation (commit `a9f6e09`) is
291
+ preserved as a reference; the proximate crash trigger is a Node.js / V8 /
292
+ filesystem-level race that is not reliably reproducible in either direction.
293
+
294
+ The user-facing symptom (must kill pi to recover from 1/3 hang) is also
295
+ resolved: even if a future regression reintroduces the crash, async-notifier
296
+ will detect the dead background-runner within ~30s and emit `async.died` —
297
+ the user sees "Goal failed: Background runner died unexpectedly" instead of
298
+ an infinite "running" state.
299
+
300
+
301
+
302
+ ### `goal` — autonomous goal loop (P0a + P0 + P1)
303
+
304
+ - `team action='goal' config.subAction='start|status|pause|resume|stop|step|clear'`.
305
+ - A worker does a turn (`executeTeamRun`), then a separate LLM judge (synthesized
306
+ `goal-judge` AgentConfig with `disableTools:true` → Pi `--no-tools`) evaluates the
307
+ transcript + evidence and returns `{achieved, reason, evidenceRefs}`. On
308
+ not-achieved, the `reason` is composed into the next turn's `manifest.goal`.
309
+ - One manifest PER turn (status-transition invariants block reuse). Budget via
310
+ `collectRunMetrics`. `GoalLoopState` persisted at `<crewRoot>/state/goals/<goalId>.json`.
311
+ - Slash command `/team-goal`. Hooks: `before_goal_step`, `before_goal_abort`.
312
+ - Spec-driven: `research-findings/goal-workflow/00-SPEC.md` + `07-PLAN.md` v3.
313
+
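The worker/judge cycle in the bullets above can be sketched as a simple loop. This is a synchronous illustration with injected stand-ins, not `runGoalLoop`: the verdict fields mirror the changelog, while `runGoalLoopSketch`, `runTurn`, `judge`, and the outcome shape are hypothetical, and budget, pause/resume, and per-turn manifests are omitted.

```typescript
interface JudgeVerdict {
  achieved: boolean;
  reason: string;
  evidenceRefs: string[];
}

interface GoalOutcome {
  status: "achieved" | "max-turns";
  turns: number;
  lastReason: string;
}

function runGoalLoopSketch(
  objective: string,
  maxTurns: number,
  runTurn: (prompt: string) => string, // worker: returns a transcript
  judge: (objective: string, transcript: string) => JudgeVerdict,
): GoalOutcome {
  let prompt = objective;
  let lastReason = "";
  for (let turn = 1; turn <= maxTurns; turn++) {
    const transcript = runTurn(prompt);
    const verdict = judge(objective, transcript);
    lastReason = verdict.reason;
    if (verdict.achieved) return { status: "achieved", turns: turn, lastReason };
    // Compose the judge's reason into the next turn's goal text.
    prompt = `${objective}\n\nJudge feedback from turn ${turn}: ${verdict.reason}`;
  }
  return { status: "max-turns", turns: maxTurns, lastReason };
}
```

Keeping the judge as a separate function with no access to the worker's tools reflects the `disableTools:true` design: the judge only reads the transcript and evidence and returns a verdict.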
314
+ ### `workflow` — dynamic workflow scripts (P2 + P3)
315
+
316
+ - `.dwf.ts` scripts orchestrate subagents via `ctx.agent()` / `ctx.fanOut()` with
317
+ JS loops/branch/cross-review; only `ctx.setResult()` reaches the main context.
318
+ - Full `WorkflowCtx`: `agent`, `fanOut`, `review`, `retry`, `mail`, `gatherReplies`,
319
+ `renderTemplate`, `vars`, `setResult`.
320
+ - `team action='workflow-{create,get,list,save,delete}'`. `workflow-create`/`-delete`
321
+ ACE-gated via `destructive-gate.ts` (`confirm:true`, user-initiated only, path-
322
+ allowlisted via `resolveRealContainedPath`, content-validated).
323
+ - Capability-locked `WorkflowCtx` (Object.freeze + vm.runInNewContext);
324
+ `isolated-vm` deferred to v1.5.
325
+ - Slash command `/workflows`. Example: `workflows/examples/hello.dwf.ts`.
326
+
327
+ ### Shared infra (P0a)
328
+
329
+ - `manifest.runKind?: 'team-run' | 'goal-loop' | 'dynamic-workflow'` discriminator;
330
+ background-runner.ts dispatches to `executeTeamRun` / `runGoalLoop` /
331
+ `runDynamicWorkflow`. Default `'team-run'` (backward-compatible).
332
+
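The `runKind` dispatch can be pictured as an exhaustive switch with a backward-compatible default. In this sketch the three kind strings and the three runner names are from the changelog; `ManifestLike` and `selectRunner` are hypothetical, and the function returns the runner's name rather than calling it so the example stays self-contained.

```typescript
type RunKind = "team-run" | "goal-loop" | "dynamic-workflow";
type RunnerName = "executeTeamRun" | "runGoalLoop" | "runDynamicWorkflow";

// Hypothetical minimal manifest shape: only the discriminator matters here.
interface ManifestLike {
  runKind?: RunKind;
}

function selectRunner(manifest: ManifestLike): RunnerName {
  // Manifests written before the field existed have no runKind and are
  // treated as ordinary team runs.
  const kind: RunKind = manifest.runKind === undefined ? "team-run" : manifest.runKind;
  switch (kind) {
    case "team-run":
      return "executeTeamRun";
    case "goal-loop":
      return "runGoalLoop";
    case "dynamic-workflow":
      return "runDynamicWorkflow";
    default: {
      // Compile-time exhaustiveness check: adding a RunKind without a case fails here.
      const unreachable: never = kind;
      throw new Error(`unknown runKind: ${String(unreachable)}`);
    }
  }
}
```

The `never` assignment in the default branch is a common way to make the compiler flag any future `runKind` value that lacks a dispatch case.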
333
+ ### Other
334
+
335
+ - `AgentConfig.disableTools?: boolean` — pushes Pi `--no-tools` (capability-locked agents).
336
+ - `TEAM_EVENT_TYPES` += `goal.*` + `dwf.*` namespaces.
337
+ - New agent-config field, new event types, new hooks — all additive, no breaking changes.
338
+
3
339
  ## [0.8.12] — `team action=cleanup` now reverses `init` (Issue #35) (2026-06-17)
4
340
 
5
341
  `team action=cleanup` gained a **project-level mode** that reverses what
@@ -2653,3 +2989,33 @@ user's project-instructions file was out-of-scope and unnecessary.
2653
2989
 
2654
2990
  +4 regression tests (init does NOT create/modify AGENTS.md; API fields removed).
2655
2991
  typecheck clean; full suite 2972/0.
2992
+
2993
+ ## [Unreleased] — dead-dep cleanup + non-blocking fallow CI (2026-06-18)
2994
+
2995
+ Spotted by running `fallow` (deterministic Rust codebase intelligence) against
2996
+ the repo. Two genuine wins, plus an informational CI job that never blocks.
2997
+
2998
+ ### Removed (dead dependencies, verified unused)
2999
+ - **`typebox`** (`package.json:89`) — dead duplicate of `@sinclair/typebox`
3000
+ (which 10 source files actually import). `typebox` (plain) had **zero**
3001
+ imports anywhere in `src/`.
3002
+ - **`acorn`** (`package.json:84`) — **zero** runtime references in `src/`,
3003
+ `scripts/`, or `*.mjs`. Verified the only other package referencing it
3004
+ (`jiti`) lists it under its own `devDependencies` (for jiti's own tests), so
3005
+ it is not a runtime transitive need. `npm ls acorn` confirmed `pi-crew` was
3006
+ its sole parent.
3007
+
3008
+ Both removals verified: typecheck clean, full suite 2965/0.
3009
+
3010
+ ### CI: added `fallow-audit` job (non-blocking)
3011
+ - New job in `.github/workflows/ci.yml`: ubuntu-only, `continue-on-error: true`
3012
+ so it **never fails the build**.
3013
+ - Runs `fallow audit` (changed-code diff vs base ref) in JSON + human summary,
3014
+ uploads `fallow-audit-report` artifact (14-day retention).
3015
+ - Surfaced findings (dead code, circular deps, duplication, complexity
3016
+ hotspots, dependency hygiene) are for human/agent review, NOT a merge gate.
3017
+ - Rationale for non-blocking: fallow has high out-of-the-box noise (254 clone
3018
+ families, 379 hotspots) + a false positive on the tsx/jiti path-loading
3019
+ pattern (`jiti` flagged unused but is used via runtime path-loading). A
3020
+ blocking gate would create an unpaid maintenance backlog unsuitable for a
3021
+ solo-maintained extension.
package/README.md CHANGED
@@ -1,5 +1,35 @@
1
1
  # pi-crew
2
2
 
3
+ > ## ⚠️ IMPORTANT — Read before using
4
+ >
5
+ > **pi-crew is a sub-agent orchestration layer that was developed almost entirely
6
+ > by AI, for the author's own workflow.** It is **not** a hardened, audited
7
+ > product. Here's the honest framing:
8
+ >
9
+ > - **AI-generated code, limited human review.** The vast majority of pi-crew
10
+ > was written and iterated on by autonomous AI agents. While every change
11
+ > goes through static review + runtime tests, I (the author) have not
12
+ > line-by-line verified everything. There will be bugs, edge cases, and
13
+ > behaviors I haven't anticipated.
14
+ > - **It can spawn processes, run shell commands, and write files on your
15
+ > behalf.** Dynamic workflows (`.dwf.ts`) and goal loops run with the same
16
+ > privileges as your Pi session — treat any `.dwf.ts` like `node script.js`
17
+ > you downloaded from the internet.
18
+ > - **Built for *my* needs, not yours.** This scratches a personal itch. It
19
+ > likely won't fit every workflow, team setup, or risk tolerance — and
20
+ > that's fine.
21
+ >
22
+ > **If that sounds too risky, don't use it** — no hard feelings.
23
+ >
24
+ > **If you still want to use it**, the safest path is to **fork it, read the
25
+ > parts you'll touch, and adapt it to your own setup.** If you find a bug,
26
+ > a footgun, or a sharp edge, please open an issue or send a note — your
27
+ > feedback is genuinely appreciated. Thanks. ✌️
28
+ >
29
+ > See also: [SECURITY-ISSUES.md](SECURITY-ISSUES.md),
30
+ > [docs/dynamic-workflows.md](docs/dynamic-workflows.md#security-model-important)
31
+ > (trust model), and the [Known limitations](#known-limitations) section below.
32
+
3
33
  **Coordinate AI agent teams inside [Pi](https://github.com/nicekate/pi-coding-agent).**
4
34
 
5
35
  pi-crew is a Pi extension that orchestrates autonomous multi-agent workflows — research, implementation, review, testing, and more — with durable state, parallel execution, worktree isolation, and safe defaults.
@@ -9,13 +39,52 @@ npm: pi-crew
9
39
  repo: https://github.com/baphuongna/pi-crew
10
40
  ```
11
41
 
12
- **v0.8.11**: See [CHANGELOG.md](CHANGELOG.md).
42
+ **v0.9.0**: See [CHANGELOG.md](CHANGELOG.md).
13
43
 
14
- ### Highlights (v0.6.4 → v0.8.11)
44
+ ### Highlights (v0.6.4 → v0.9.0)
15
45
 
16
46
  A long arc of **trust, cliff-resilience, and robustness** work. Principle: *build
17
47
  trust and cliff-resilience, stay lean, delete before adding.*
18
48
 
49
+ #### v0.9.0 — goal loops + dynamic workflows (2026-06-18)
50
+ Two new features, both modeled on Claude Code, built on a shared `runKind`
51
+ background-dispatch discriminator.
52
+
53
+ - **🎯 Autonomous goal loops** — `team action='goal'` runs a self-directed
54
+ multi-turn loop: a **worker** does a turn, a separate **LLM judge**
55
+ (capability-locked, no tools) evaluates the transcript + verification against
56
+ the objective, and on "not-achieved" the reason is fed into the next turn's
57
+ prompt. Stops on `achieved` / `maxTurns` / budget / `BLOCKED:` / user `stop`.
58
+ See [docs/goals.md](docs/goals.md).
59
+ - **📜 Dynamic workflows (`.dwf.ts`)** — author orchestration as a TypeScript
60
+ script (JS loops/branch/cross-review) instead of a static step list. Runs in
61
+ the background, spawns subagents via `ctx.agent()`/`ctx.fanOut()`, holds
62
+ intermediate results in JS variables, and only `ctx.setResult()` reaches the
63
+ main context. `workflow-create`/`-delete` are ACE-gated (`confirm:true`,
64
+ user-confirmed). See [docs/dynamic-workflows.md](docs/dynamic-workflows.md).
65
+ - **🛡️ Goal-wrap** (RFC v0.5 vision) — apply the goal completion-guarantee to
66
+ existing builtin workflows (`implementation`, `fast-fix`, `default`) via
67
+ per-workflow `.crew/config.json` toggle. Single-step workflows goal-wrap
68
+ end-to-end; multi-step workflows auto-downgrade to a normal team-run because
69
+ they crash non-deterministically under the V8/libuv event-loop (see [Known
70
+ limitations](#known-limitations)).
71
+ - **🔐 Phase 1 integrity hardening** (P1a–P1g) — verification bookend snapshots,
72
+ anti-oscillation (`stuck` non-terminal + resumable), budget enforcement
73
+ (required or explicit opt-out), nonce-token feedback sanitization, secret
74
+ redaction at artifact-write (O(n) fix), global worker cap + workspace lock
75
+ (O_EXCL, startTime-safe). B2 confused-deputy (auto-detecting verification
76
+ commands) refused — user must declare verification explicitly.
77
+ - **🧪 Phase 1.5 fast-follow** — opt-in mitigation toggles for residual risks:
78
+ `PI_CREW_VERIFICATION_SANITIZE_ENV=1` (strip provider secrets from the
79
+ verification subprocess), `PI_CREW_VERIFICATION_WORKTREE=1` (run verification
80
+ in a pristine git worktree at the T_snap commit SHA),
81
+ `PI_CREW_BG_REPORT_ON_FATAL=1` (V8 diagnostic report on fatal).
82
+ - **🐛 TDZ fix** (Phase 1.5 #4) — live `team action='run' workflow='<dynamic>'`
83
+ was failing with a misleading "must export a default async function" error.
84
+ Root cause was a Temporal Dead Zone race in `team-tool/run.ts` when loaded via
85
+ the full Pi extension pipeline (`index.ts → … → run.ts`). Fixed by
86
+ `let`→`var` on the latch + lazy dynamic imports at call sites.
87
+
  #### v0.8.x — hardening & reliability (2026-06-17)
  - **🛠️ Split-scope install fix (v0.8.11)** — `team` runs no longer crash with
  `Cannot find module '@earendil-works/pi-coding-agent'` when pi-crew and pi
@@ -75,6 +144,8 @@ trust and cliff-resilience, stay lean, delete before adding.*
  - **Scheduled runs** — `schedule`/`scheduled` actions with cron, interval, and one-shot support; spawned runs tracked and auto-cancelled on job removal
  - **Plugin system** — framework-aware context injection (Next.js, Vite, Vitest) via plugin registry
  - **Health scoring** — penalty-based run health with time-series snapshots
+ - **Autonomous goal loops** (P0/P1) — `team action='goal'` runs an autonomous multi-turn loop: a worker does a turn, a separate LLM judge evaluates the transcript+evidence against the goal, and on "not-achieved" the reason is fed into the next turn's prompt. Stops on achieved / maxTurns / budget / blocked. Claude-Code-style `/goal`. See `docs/goals.md`.
+ - **Dynamic workflows** (P2/P3) — author orchestration as a `.dwf.ts` script (JS loops/branch/cross-review) instead of a static step list. The script runs in the background, calls subagents via `ctx.agent()`/`ctx.fanOut()`, holds intermediate results in JS variables, and only `ctx.setResult()` reaches the main context. `workflow-create`/`-delete`/`-save` require `confirm:true` at the tool-call layer (the only gate — a malicious agent that passes `confirm:true` programmatically bypasses it; this is postinstall-equivalent trust, not a human-in-the-loop dialog). See `docs/dynamic-workflows.md`.
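The goal-loop entry above describes a worker/judge feedback cycle with four stop conditions. The following is a minimal sketch of that control flow only. All names here (`runGoalLoop`, `Verdict`, `worker`, `judge`, the `cost` field) are hypothetical and chosen for illustration; the actual pi-crew implementation and its option names may differ, so treat `docs/goals.md` as the reference.

```typescript
// Hypothetical sketch of a worker/judge goal loop; not pi-crew's API.

type Verdict =
  | { status: "achieved" }
  | { status: "not-achieved"; reason: string }
  | { status: "blocked"; reason: string };

type StopReason = "achieved" | "maxTurns" | "budget" | "blocked";

interface TurnResult {
  transcript: string;
  cost: number; // assumed unit-less spend per turn
}

interface GoalLoopOptions {
  goal: string;
  maxTurns: number;
  budget: number;
  worker: (prompt: string) => Promise<TurnResult>;
  judge: (goal: string, transcript: string) => Promise<Verdict>;
}

async function runGoalLoop(
  opts: GoalLoopOptions
): Promise<{ stop: StopReason; turns: number }> {
  let prompt = opts.goal;
  let spent = 0;
  for (let turn = 1; turn <= opts.maxTurns; turn++) {
    const result = await opts.worker(prompt);
    spent += result.cost;
    // A separate judge evaluates the transcript against the goal.
    const verdict = await opts.judge(opts.goal, result.transcript);
    if (verdict.status === "achieved") return { stop: "achieved", turns: turn };
    if (verdict.status === "blocked") return { stop: "blocked", turns: turn };
    if (spent >= opts.budget) return { stop: "budget", turns: turn };
    // Not achieved: feed the judge's reason into the next turn's prompt.
    prompt = opts.goal + "\n\nPrevious attempt was judged not achieved: " + verdict.reason;
  }
  return { stop: "maxTurns", turns: opts.maxTurns };
}
```

Keeping the judge as a separate callback mirrors the "separate LLM judge" wording: the worker never grades its own output, and the loop itself only interprets the verdict.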
 
  ---
 
@@ -582,6 +653,8 @@ Stats: **366 source files** (70K lines) · **506 test files** (66K lines) · **4
  | [docs/troubleshooting.md](docs/troubleshooting.md) | Common errors, recovery, and error-code reference (E001–E012) |
  | [docs/architecture.md](docs/architecture.md) | Internal architecture + run flow |
  | [docs/runtime-flow.md](docs/runtime-flow.md) | Runtime execution details |
+ | [docs/goals.md](docs/goals.md) | **v0.9.0** Autonomous goal loops (`team action='goal'`) |
+ | [docs/dynamic-workflows.md](docs/dynamic-workflows.md) | **v0.9.0** `.dwf.ts` script runtime + trust model |
  | [docs/live-mailbox-runtime.md](docs/live-mailbox-runtime.md) | Mailbox + live-session runtime |
  | [docs/publishing.md](docs/publishing.md) | Release & publish process |
  | [docs/next-upgrade-roadmap.md](docs/next-upgrade-roadmap.md) | Future upgrade roadmap |
@@ -591,6 +664,43 @@ Research docs (not in package): [`docs/pi-crew-research/`](https://github.com/ba
 
  ---
 
+ ## Known limitations
+
+ This is AI-developed software built for a personal workflow. These are the
+ sharp edges I'm aware of — there are almost certainly others I'm not.
+
+ - **Multi-step goal-wrap crashes non-deterministically.** Goal-wrapping
+ multi-step builtin workflows (`fast-fix`, `default`) can hit a V8/libuv
+ event-loop race that kills the background process with no signal, no core,
+ and no V8 diagnostic report (8 investigation attempts: gdb, strace, perf,
+ `--report-on-fatalerror`, sync-fs workarounds, worker-thread atomic writer —
+ see `research-findings/goal-workflow/17-PHASE1.5-CRASH-INVESTIGATION-RFC.md`).
+ **Mitigation:** multi-step workflows silently auto-downgrade to a normal
+ team-run (no goal-wrap layer); single-step workflows (`implementation`)
+ goal-wrap end-to-end.
+ - **`.dwf.ts` scripts are NOT sandboxed in v1.** The `WorkflowCtx` is
+ `Object.freeze()`d, but the script runs in plain module scope with full
+ `require`/`import`/`process` access (postinstall-equivalent trust).
+ `isolated-vm` (real V8 isolate) is planned for a future release. Only place
+ `.dwf.ts` files you have reviewed. See
+ [docs/dynamic-workflows.md#security-model-important](docs/dynamic-workflows.md#security-model-important).
+ - **Editor/agent file caching.** After editing a loaded pi-crew source file,
+ restart the Pi session for changes to take effect (jiti in-memory cache).
+ Editing a `.dwf.ts` in place while a run is mid-flight can serve a stale
+ module body; rename the file or restart Pi to force a fresh load.
+ - **Verification integrity is best-effort against adversarial workers.** The
+ bookend snapshot (P1a) and git-worktree sandbox (Phase 1.5 #2, opt-in)
+ raise the bar, but a worker in the same process can still tamper with files
+ outside the snapshot window. Full isolation requires the planned sandbox.
+ - **Single maintainer + AI review.** Every change ships after 2+ consecutive
+ clean static-review rounds + runtime tests, but there's no independent human
+ audit. Fork and read before trusting anything that touches your data.
+
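The sandboxing limitation above can be illustrated with a small sketch showing why freezing a context object is not isolation: the frozen object itself cannot be reassigned, yet code that receives it still shares the host's global scope. The `ctx` object and `untrustedScript` function are hypothetical stand-ins; the real `WorkflowCtx` shape is not reproduced here.

```typescript
// Hypothetical stand-in for a frozen workflow context; not pi-crew's WorkflowCtx.
const ctx = Object.freeze({
  setResult(value: string): string {
    return value;
  },
});

// A "script" that is only handed ctx, yet runs in ordinary module scope.
function untrustedScript(c: typeof ctx): { ctxMutated: boolean; reachedGlobals: boolean } {
  try {
    // Frozen: this assignment throws in strict mode and is ignored otherwise.
    (c as any).setResult = () => "hijacked";
  } catch (_err) {
    // Expected in strict mode; the frozen object is unchanged either way.
  }
  const ctxMutated = c.setResult("x") !== "x";

  // Nothing about the freeze limits access to ambient globals.
  (globalThis as any).__dwfDemoLeak = "written by script";
  const reachedGlobals = (globalThis as any).__dwfDemoLeak === "written by script";

  return { ctxMutated, reachedGlobals };
}

console.log(untrustedScript(ctx));
```

The freeze protects the shape of the context object, not the process around it, which is why reviewing scripts before placing them is the stated guidance.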
+ If you hit any of these — or a new one — please
+ [open an issue](https://github.com/baphuongna/pi-crew/issues).
+
+ ---
+
  ## Acknowledgements
 
  `pi-crew` builds on ideas and selected MIT-licensed implementation patterns from `pi-subagents` and `oh-my-claudecode`, with conceptual inspiration from `oh-my-openagent`.
package/docs/FEATURE_INTAKE.md CHANGED
@@ -1,6 +1,6 @@
  # Feature Intake
 
- Mọi implementation prompt phải đi qua intake gate trước khi code changes.
+ Every implementation prompt must pass through the intake gate before code changes.
 
  ## Intake Flow
 
package/docs/HARNESS.md CHANGED
@@ -1,11 +1,12 @@
  # Harness
 
- pi-crew một Pi extension cho multi-agent orchestration. Harness này giúp
- agents humans phối hợp phát triển pi-crew một cách reliable, inspectable,
- dễ steer.
+ pi-crew is a Pi extension for multi-agent orchestration. This harness helps
+ agents and humans collaborate on developing pi-crew in a reliable, inspectable,
+ and easy-to-steer way.
 
- Product pi-crew chính nó. Harness môi trường operating để agents hiểu
- product, classify work, track decisions, và validate changes.
+ The product is pi-crew itself. The harness is the operating environment that
+ helps agents understand the product, classify work, track decisions, and
+ validate changes.
 
  ## Mental Model
 
@@ -36,26 +37,26 @@ Human intent (issue, prompt, request)
  Next intent
  ```
 
- Mỗi task 2 outputs:
+ Each task has 2 outputs:
  1. **Product delta**: code changes, test changes, API shape, config changes
  2. **Harness delta**: docs, decisions, test matrix updates, backlog items
 
  ## Source Hierarchy
 
- Agents đọc theo thứ tự:
+ Agents read in this order:
 
- 1. `AGENTS.md` — operating rules important paths
- 2. `docs/HARNESS.md` — file này, collaboration model
- 3. `docs/FEATURE_INTAKE.md` — trước khi biến request thành work
+ 1. `AGENTS.md` — operating rules and important paths
+ 2. `docs/HARNESS.md` — this file, the collaboration model
+ 3. `docs/FEATURE_INTAKE.md` — before turning a request into work
  4. `docs/product/` — current product contract
  5. `docs/ARCHITECTURE.md` — implementation shape
- 6. `docs/stories/` — active completed stories
+ 6. `docs/stories/` — active and completed stories
  7. `docs/TEST_MATRIX.md` — proof status
  8. `docs/decisions/` — why important choices were made
 
  ## Validation Ladder
 
- pi-crew đã validation commands:
+ pi-crew already has validation commands:
 
  | Level | Command | What it proves |
  |-------|---------|----------------|
@@ -68,14 +69,14 @@ Agents **must not** claim validation passes without running the actual command.
 
  ## Growth Rule
 
- Harness grows từ friction. Khi agent:
- - Bị confused về expected behavior
- - Phải repeat manual reasoning
- - Thiếu validation command
- - Discover missing rule
- - Thấy recurring failure pattern
+ The harness grows from friction. When an agent:
+ - Gets confused about expected behavior
+ - Has to repeat manual reasoning
+ - Lacks a validation command
+ - Discovers a missing rule
+ - Sees a recurring failure pattern
 
- Agent must improve harness directly hoặc propose trong `docs/HARNESS_BACKLOG.md`.
+ The agent must improve the harness directly or propose changes in `docs/HARNESS_BACKLOG.md`.
 
  ## Working Conventions