pi-crew 0.8.10 → 0.8.11
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +139 -0
- package/README.md +53 -44
- package/package.json +1 -1
- package/src/extension/register.ts +6 -0
- package/src/runtime/async-runner.ts +11 -1
- package/src/runtime/background-runner.ts +5 -0
- package/src/runtime/model-fallback.ts +23 -0
- package/src/runtime/peer-dep.ts +296 -0
- package/src/runtime/skill-instructions.ts +5 -1
- package/src/skills/discover-skills.ts +3 -1
package/CHANGELOG.md
CHANGED
@@ -1,5 +1,144 @@
 # Changelog
 
+## [0.8.11] — Split-scope install fix + transient-provider fallback (2026-06-17)
+
+Bundle of two independent fixes that were triaged from real user reports on
+2026-06-17. Both are robustness fixes for failure modes that previously
+killed team runs silently.
+
+### 1. `Cannot find module '@earendil-works/pi-coding-agent'` on Windows / global installs
+
+**Symptom:** every `team` action (run / parallel / plan) crashed ~1 minute
+after spawn, leaving all tasks permanently `queued`. The detached
+background team-runner child threw:
+```
+Error: Cannot find module '@earendil-works/pi-coding-agent'
+Require stack:
+- .../.pi/agent/npm/node_modules/pi-crew/src/runtime/skill-instructions.ts
+```
+
+**Root cause:** pi-crew (an extension) is installed under
+`~/.pi/agent/npm/node_modules/<ext>/`, but pi itself (the
+`@earendil-works/pi-coding-agent` package extensions import from) lives in a
+**separate** node_modules tree (nvm / `%APPDATA%\npm` / Volta / fnm /
+pnpm-global). Node's resolver only walks UP ancestor `node_modules`, so a
+static `import { getAgentDir } from "@earendil-works/pi-coding-agent"` in a
+file loaded by the spawned child crashes. This is the **default** layout for
+anyone who installs pi-crew via `pi install` — not a user misconfiguration.
+
+**Additional constraint:** pi-coding-agent ships as **ESM-only**
+(`type:module`, exports map with only an `import` condition). CJS
+`createRequire(dir)(name)` / `require.resolve("<pkg>/package.json")` both
+fail with `ERR_PACKAGE_PATH_NOT_EXPORTED` under node AND jiti/tsx (verified).
+The ONLY working load mechanism is a dynamic `import()` of the resolved ESM
+entry file URL.
+
+**Fix — NEW `src/runtime/peer-dep.ts`:**
+- `resolvePeerDep()` (sync): walks `node_modules` **manually** (bypasses the
+restrictive exports map) across 6 strategies — env hint
+(`PI_CREW_PEER_DEP_DIR`), this file, `process.argv[1]`, the node binary's
+global node_modules (covers nvm/Volta/fnm), `npm root -g`, and
+`%APPDATA%\npm`. Memoized.
+- `primePeerDep()` (async): dynamic `import(fileURL)` the resolved ESM entry,
+cache the module namespace. Memoized + retryable on failure.
+- `getAgentDir()` (sync): reads the REAL fork-aware `getAgentDir` from the
+primed cache; falls back to a computed default (`~/.pi/agent`, respecting
+`PI_CODING_AGENT_DIR`) if not primed — **NEVER throws**.
+
+**Rewired:**
+- `skill-instructions.ts`, `discover-skills.ts` — static peer-dep import →
+lazy `getAgentDir()` from `peer-dep.ts` (this is the crash site).
+- `background-runner.ts` — `primePeerDep()` before importing `team-runner`
+(child process).
+- `register.ts` — `primePeerDep()` at extension entry (main process).
+- `async-runner.ts` — propagate `PI_CREW_PEER_DEP_DIR` to children so they
+skip the ~200ms `npm root -g` probe.
+
+**Tests:** NEW `test/unit/peer-dep-resolver.test.ts` (9 cases) — env-hint
+resolution, manual node_modules walk past exports map, ESM dynamic-import
+loading, memoization, graceful fallback, `PI_CODING_AGENT_DIR` override,
+loadable fileURL under the child's loader.
+
+### 2. `500 api_error "unknown error, 999 (1000)"` aborted the run instead of falling back
+
+**Symptom:** when the model provider went hard-down with
+`500 {"type":"error","error":{"type":"api_error","message":"unknown
+error, 999 (1000)"}}`, the run died even when the user had configured a
+fallback model that would have worked.
+
+**Root cause:** pi has two safety layers. (1) pi-core provider-retry retries
+3× with exponential backoff — its regex already matches `500`. (2) pi-crew's
+`model-fallback` layer is the last safety net: when all 3 retries fail, it
+tries the next configured model. But `isRetryableModelFailure`'s pattern
+list covered 429 / rate-limit / 502-504 / overloaded / timeout and **MISSED**
+generic `500`, `api_error`, `unknown error`, and internal/server-error
+phrasings. So a transient provider outage was retried 3× then **aborted**
+instead of failing over.
+
+**Fix:** added to `RETRYABLE_MODEL_FAILURE_PATTERNS` —
+`\b500\b`, `\b501\b`, `api_error`, `unknown error`,
+`internal(?:_server)?[ _]error`, `server error`, `bad gateway`.
+
+`NON_RETRYABLE` (auth/billing/key) still wins — checked first in
+`isRetryableModelFailure` — so a transient-looking 500 wrapping an auth
+failure won't loop the chain.
+
+**Tests:** 4 regression tests in `test/unit/model-fallback.test.ts` covering
+the exact reported error, generic 5xx, auth-still-blocked, and undefined/empty.
+
+### Verification
+
+typecheck clean; peer-dep suite 9/9; model-* suite 57/57; full suite 0 real
+failures (1 known `result-watcher` fs.watch 10s timeout flake passes 7/7 in
+isolation — unrelated).
+
+## [0.8.10] — Pre-warm 3 repro-observed cold-start crash-variant modules (2026-06-17)
+
+The post-v0.8.9-restart 6-subagent repro surfaced 3 cold-start crash variants
+in one batch: `existsSync` (peer-dep, latched v0.8.1 + warmup v0.8.6),
+`effectiveRunConfig` (`team-tool/config-patch.ts`), `CREW_README`
+(`state/crew-init.ts`, latched v0.8.9). v0.8.6's warmup covered `team-tool.ts`
+transitively but not these specific modules explicitly — static-graph
+reachability isn't reliable under tsx/jiti interop + concurrent fanout (the
+`handleRun` latch serializes the CALL but not module-body instantiation of
+`run.ts`'s static deps).
+
+**Fix:** add the 3 repro-observed modules to `HOT_MODULE_SPECIFIERS` so their
+module bodies instantiate at single-threaded registration:
+`team-tool/run.ts`, `team-tool/config-patch.ts`, `workflows/validate-workflow.ts`.
+
+Repro verification: 6/6 subagents clean (was 1/6) under loaded code.
+
+## [0.8.9] — crew-init dynamic-import latch (kills CREW_README TDZ race) (2026-06-17)
+
+Module-scoped `loadCrewInit()` latch in `team-tool/run.ts` — concurrent `team`
+tool calls share ONE in-flight import promise. Added `crew-init.ts` to
+`HOT_MODULE_SPECIFIERS`. Targets the `CREW_README` TDZ variant observed in the
+post-v0.8.8 repro.
+
+## [0.8.8] — Cross-project leak cwd-scope barrier (2026-06-17)
+
+`collectInFlightRuns` filtered by STATUS only (queued/planning/running), not
+by project scope. Multiple Pi sessions in the same project shared
+`.crew/state/runs/`, so Session B's compaction picked up Session A's runs in
+OTHER projects and injected them into Session B's continuation prompt.
+
+The v0.8.8 (4bd6f5b) `ownerSessionId` filter was **unreliable** —
+`ctx.sessionId` is absent on pi 0.79.6 `ExtensionContext`.
+
+**Fix:** `isInProjectScope(run, queryCwd)` in `collectInFlightRuns` — keeps a
+run only if `findRepoRoot(run.cwd) === findRepoRoot(queryCwd)`. Reliable,
+version-independent. Filter at the consumption site, NOT in
+`listRecentRuns`/`collectActiveRuns` (the cross-project dashboard view stays
+unfiltered — 2 run-index tests pin that). Empirically verified: ambient
+status shows only current-project runs, zero foreign-project bleed.
+
+## [0.8.7] — Doctor runtime-warmup status (2026-06-17)
+
+`getRuntimeWarmupStatus()` diagnostic + a "Runtime warmup" section in
+`team doctor` showing started/completed/duration/error. "Not started" is NOT
+a doctor error (normal for direct unit-test calls).
+
 ## [0.8.6] — General cold-start race fix (runtime module-graph warmup) (2026-06-17)
 
 Fixes the `validateWorkflowForTeam` cold-start crash that v0.8.1 did NOT
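The root cause described in the 0.8.11 entry comes down to which directories a bare-specifier lookup consults. The following is a minimal TypeScript sketch of that ancestor walk, not code from pi-crew: `ancestorNodeModules` is a hypothetical helper, the paths in the usage are illustrative (a `/home/u` home directory), and it deliberately leaves out `NODE_PATH` and the legacy global folders that only the CommonJS loader checks. It uses `path.posix` so the result does not depend on the host OS.

```typescript
import * as path from "node:path";

// Directories a bare specifier lookup walks for a file: <dir>/node_modules for
// the file's directory and every ancestor, nearest first. POSIX-only sketch;
// NODE_PATH and the CommonJS-only global folders are deliberately left out.
export function ancestorNodeModules(fromFile: string): string[] {
  const lookups: string[] = [];
  let dir = path.posix.dirname(path.posix.resolve(fromFile));
  while (true) {
    // A directory that is itself named node_modules gets no nested lookup.
    if (path.posix.basename(dir) !== "node_modules") {
      lookups.push(path.posix.join(dir, "node_modules"));
    }
    const parent = path.posix.dirname(dir);
    if (parent === dir) break; // reached "/"
    dir = parent;
  }
  return lookups;
}
```

Called on `/home/u/.pi/agent/npm/node_modules/pi-crew/src/runtime/skill-instructions.ts`, every candidate sits under that file's own ancestors, so a package installed under an nvm, Volta, or fnm prefix is never reached. The manual walk in `peer-dep.ts` keeps this upward rule but starts it from extra bases.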
package/README.md
CHANGED
@@ -9,50 +9,48 @@ npm: pi-crew
 repo: https://github.com/baphuongna/pi-crew
 ```
 
-**v0.
-
-### Highlights (v0.6.4 → v0.
-
-
-
-
-
--
-
-
--
-
-
-
--
-
-
-
--
-
-
--
-
-
-
--
-
--
--
-
--
-
--
-
-
-
--
--
--
--
--
-- **State-store race fix** — manifest/tasks mtime false positive eliminated
-- **Orphan worker/temp cleanup** — 4-layer defense with session-scoped tracking
+**v0.8.11**: See [CHANGELOG.md](CHANGELOG.md).
+
+### Highlights (v0.6.4 → v0.8.11)
+
+A long arc of **trust, cliff-resilience, and robustness** work. Principle: *build
+trust and cliff-resilience, stay lean, delete before adding.*
+
+#### v0.8.x — hardening & reliability (2026-06-17)
+- **🛠️ Split-scope install fix (v0.8.11)** — `team` runs no longer crash with
+`Cannot find module '@earendil-works/pi-coding-agent'` when pi-crew and pi
+live in separate node_modules trees (the default for `pi install`). New
+`src/runtime/peer-dep.ts` resolves the ESM-only peer dep across 6 strategies.
+- **🔄 Model fallback on transient 5xx (v0.8.11)** — a hard-down provider
+(`500 api_error "unknown error"`) now triggers the configured fallback
+model instead of aborting the run. `isRetryableModelFailure` extended.
+- **🧊 Cold-start race eliminated (v0.8.6 → v0.8.10)** — under tsx, concurrent
+subagent spawns raced module instantiation (`existsSync` / `CREW_README` /
+`effectiveRunConfig` / `validateWorkflowForTeam`). Fixed graph-wide: warm at
+registration + gate at spawn boundaries + per-site latches. 6/6 repro clean.
+- **🔒 Cross-project leak fixed (v0.8.8)** — ambient status / compaction no
+longer bleed foreign-project runs into the current session. Cwd-scope
+barrier (`isInProjectScope`), version-independent.
+- **🩺 Doctor runtime-warmup status (v0.8.7)** — `team doctor` shows whether
+the module-graph warmup fired.
+- **🔍 Cold-verifier agent (v0.8.4)** — adversarial cross-check that re-derives
+claims WITHOUT trusting prior analysis, catching confirmation bias.
+- **⚡ Per-write validator (v0.8.5)** — zero-cost `JSON.parse` on every
+`write`/`edit`, appends a `🔴` blocker on malformed files.
+- **🎨 Terminal status (v0.8.3)** — tab title + Ghostty native progress bar.
+- **🧠 Skill confidence revived (v0.8.2)** — `adjustConfidence()` was dead
+code; the effectiveness system now actually learns.
+- **🔧 Tool-restriction unification (v0.8.0)** — single `resolveToolPolicy`
+across both spawn paths.
+- **🎯 F6/F1 interop granularity (v0.7.9)** — 7 skill roots, `.pi/agents/`
+tier, tool wildcards, `excludeExtensions` denylist.
+
+#### v0.7.0 — Phase 0 + Phase 1 roadmap
+- **🛡️ Compaction resilience (O10)** — in-flight runs survive auto-compact.
+- **💰 Cost visibility (O1)** — per-role token + cost attribution.
+- **✋ Plan-level HITL (O5)** — `requirePlanApproval` gates any workflow.
+- **🧠 Cross-run memory (O4)** — `.crew/knowledge.md` injected every run.
+- **🎯 Single-agent cliff hedge** — `team plan singleAgent=true`.
 
 ---
 
@@ -99,6 +97,17 @@ pi-crew # after npm install
 node ./pi-crew/install.mjs # from local clone
 ```
 
+> **Split-scope install note (v0.8.11+):** pi installs extensions under
+> `~/.pi/agent/npm/node_modules/<ext>/`, separate from pi's own
+> node_modules tree (nvm / `%APPDATA%\npm` / Volta / fnm). Since v0.8.11
+> pi-crew resolves the `@earendil-works/pi-coding-agent` peer dep robustly
+> across these layouts — no symlink/NODE_PATH workaround needed. If you ever
+> do hit `Cannot find module '@earendil-works/pi-coding-agent'`, set
+> `PI_CREW_PEER_DEP_DIR=<path to the pi-coding-agent package dir>` as a
+> one-line workaround (or install pi-crew in pi's own scope:
+> `npm install -g @earendil-works/pi-crew`).
+
+
 ---
 
 ## Quick Start
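Before exporting `PI_CREW_PEER_DEP_DIR`, it can help to confirm that the candidate directory really is the peer-dep package. The sketch below is a hypothetical pre-flight check, not part of pi-crew: it reads `<dir>/package.json` directly (so the package's exports map never comes into play), checks `name` against the two scopes listed in `PEER_DEP_NAMES`, and picks the ESM entry with a simplified version of the `exports["."]` / `main` lookup that `extractEsmMain` in `peer-dep.ts` performs.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Package names accepted as the peer dep (mirrors PEER_DEP_NAMES in peer-dep.ts).
const PEER_NAMES: readonly string[] = [
  "@earendil-works/pi-coding-agent",
  "@mariozechner/pi-coding-agent",
];

// Hypothetical pre-flight check: returns the package name and absolute ESM entry
// if `dir` looks like a usable pi-coding-agent install, otherwise undefined.
export function checkPeerDepDir(dir: string): { name: string; entry: string } | undefined {
  let pkg: any;
  try {
    // Raw file read: a restrictive exports map does not apply to this.
    pkg = JSON.parse(fs.readFileSync(path.join(dir, "package.json"), "utf-8"));
  } catch {
    return undefined; // no readable package.json here
  }
  if (typeof pkg?.name !== "string" || !PEER_NAMES.includes(pkg.name)) return undefined;
  // Simplified entry lookup: exports["."] as a string, or its import/default
  // condition, falling back to "main".
  const dot = pkg.exports?.["."];
  const rel = typeof dot === "string" ? dot : dot?.import ?? dot?.default ?? pkg.main;
  if (typeof rel !== "string") return undefined;
  const entry = path.resolve(dir, rel);
  return fs.existsSync(entry) ? { name: pkg.name, entry } : undefined;
}
```

If `checkPeerDepDir(candidate)` returns a result, that same `candidate` path is what the README note asks for as the variable's value.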
package/package.json
CHANGED
package/src/extension/register.ts
CHANGED
@@ -84,6 +84,7 @@ import { runEventBus } from "../ui/run-event-bus.ts";
 import { createTerminalStatusController, type TerminalStatusController } from "../ui/terminal-status.ts";
 import { extractPathFromInput, validateWrittenFile, buildValidationBlocker } from "../runtime/per-write-validator.ts";
 import { startRuntimeWarmup } from "../runtime/runtime-warmup.ts";
+import { primePeerDep } from "../runtime/peer-dep.ts";
 import { createRunSnapshotCache } from "../ui/run-snapshot-cache.ts";
 import { closeWatcher } from "../utils/fs-watch.ts";
 import { RunWatcherRegistry } from "../utils/run-watcher-registry.ts";
@@ -207,6 +208,11 @@ export function registerPiTeams(pi: ExtensionAPI): void {
 // Warming the graph here + awaiting it at spawn boundaries eliminates the
 // race window. See src/runtime/runtime-warmup.ts.
 startRuntimeWarmup();
+// FIX (split-scope install): preload the ESM peer dep so discover-skills /
+// skill-instructions can read the REAL getAgentDir (fork-aware) from cache.
+// Fire-and-forget: getAgentDir() falls back to a safe computed default until
+// this resolves. See src/runtime/peer-dep.ts.
+primePeerDep().catch(() => {});
 // Deploy bundled themes (crew-dark, crew-dracula, etc.) to ~/.pi/agent/themes/
 // so Pi's theme loader discovers them. Best-effort, idempotent.
 deployBundledThemes();
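`primePeerDep().catch(() => {})` can be fire-and-forget because readers fall back to a computed default until the cache fills. A minimal sketch of that read-cache-or-default shape follows, assuming (as the CHANGELOG states) that the default is `PI_CODING_AGENT_DIR` when set and `~/.pi/agent` otherwise. `setPrimedModule` and `getAgentDirSafe` are hypothetical names, and the real `getAgentDir()` body in `peer-dep.ts` is not part of this hunk.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// The one capability of the peer-dep module this sketch relies on.
type AgentDirSource = { getAgentDir(): string };

// Filled in by an async priming step once the dynamic import resolves.
let primed: AgentDirSource | undefined;

export function setPrimedModule(mod: AgentDirSource | undefined): void {
  primed = mod;
}

// Synchronous and never throws: prefer the primed (fork-aware) implementation,
// otherwise compute a default from the environment.
export function getAgentDirSafe(env: Record<string, string | undefined> = process.env): string {
  if (primed) {
    try {
      return primed.getAgentDir();
    } catch {
      // Fall through to the computed default.
    }
  }
  const override = env.PI_CODING_AGENT_DIR?.trim();
  return override ? override : path.join(os.homedir(), ".pi", "agent");
}
```

The trade-off is that a call made before priming finishes gets the computed default rather than a fork-specific directory, which is acceptable only when the two agree for the common install.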
package/src/runtime/async-runner.ts
CHANGED
@@ -6,6 +6,7 @@ import { fileURLToPath, pathToFileURL } from "node:url";
 import { logInternalError } from "../utils/internal-error.ts";
 import { appendEvent } from "../state/event-log.ts";
 import { sanitizeEnvSecrets } from "../utils/env-filter.ts";
+import { resolvePeerDepDir, PEER_DEP_DIR_ENV } from "./peer-dep.ts";
 import {
 registerWorker,
 unregisterWorker,
@@ -202,6 +203,15 @@ export async function spawnBackgroundTeamRun(manifest: TeamRunManifest): Promise
 // FIX: removed delete workarounds — with explicit allowlist, these vars
 // are no longer auto-leaked. Matches child-pi.ts.
 
+// FIX (split-scope install): pass the resolved peer-dep dir to the child so
+// it can resolve @earendil-works/pi-coding-agent WITHOUT the ~200ms
+// `npm root -g` probe. No-op when pi-crew and pi are co-located. See
+// src/runtime/peer-dep.ts.
+const peerDepDir = resolvePeerDepDir();
+const childEnv = peerDepDir
+? { ...filteredEnv, [PEER_DEP_DIR_ENV]: peerDepDir }
+: filteredEnv;
+
 const loader = resolveTypeScriptLoader();
 if (!loader) {
 const message = buildLoaderUnavailableMessage(packageRootFromRuntime());
@@ -227,7 +237,7 @@ export async function spawnBackgroundTeamRun(manifest: TeamRunManifest): Promise
 detached: true,
 setsid: true,
 stdio: ["ignore", "pipe", "pipe"],
-env:
+env: childEnv,
 windowsHide: true,
 } as unknown as Parameters<typeof spawn>[2];
 const child = spawn(process.execPath, command.args, spawnOpts);
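A propagated hint only saves the probe if the child consults it before anything expensive runs. One way to guarantee that is to have each strategy produce its candidate bases lazily and stop at the first hit. This is an illustrative sketch with hypothetical `Strategy` / `firstHit` names, not pi-crew's resolver; in the usage, the second strategy stands in for a costly `npm root -g` probe.

```typescript
// A named source of candidate base paths; `bases` runs only when reached.
export interface Strategy {
  label: string;
  bases: () => string[];
}

// Try strategies in order and return the first base that `tryBase` accepts.
// Later (costlier) strategies are never evaluated once an earlier one hits.
export function firstHit<T>(
  strategies: readonly Strategy[],
  tryBase: (base: string) => T | undefined,
): { label: string; base: string; value: T } | undefined {
  for (const strategy of strategies) {
    for (const base of strategy.bases()) {
      const value = tryBase(base);
      if (value !== undefined) return { label: strategy.label, base, value };
    }
  }
  return undefined;
}
```

With an env-hint strategy listed first, a set `PI_CREW_PEER_DEP_DIR` means the probe strategy's `bases()` is never called; when the hint is absent, the walk falls through to the probe exactly once.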
package/src/runtime/background-runner.ts
CHANGED
@@ -19,6 +19,7 @@ import {
 } from "../workflows/discover-workflows.ts";
 // Heavy runtime — lazy-loaded to avoid pulling team-runner into background-runner
 // at module load time. Only needed when a background run actually starts.
+import { primePeerDep } from "./peer-dep.ts";
 import type { executeTeamRun as ExecuteTeamRunFn } from "./team-runner.ts";
 import type { TeamRunManifest, TeamTaskState } from "../state/types.ts";
 
@@ -27,6 +28,10 @@ async function executeTeamRun(
 ...args: Parameters<typeof ExecuteTeamRunFn>
 ): Promise<Awaited<ReturnType<typeof ExecuteTeamRunFn>>> {
 if (!_cachedExecuteTeamRun) {
+// FIX (split-scope install): prime the ESM peer dep BEFORE team-runner is
+// imported, so its transitive skill-instructions.ts can read getAgentDir()
+// from the primed cache instead of crashing on `Cannot find module`.
+await primePeerDep().catch(() => {});
 // LAZY: avoid pulling team-runner into background-runner at module load time.
 const mod = await import("./team-runner.ts");
 _cachedExecuteTeamRun = mod.executeTeamRun;
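The `await primePeerDep()` call above relies on a shared in-flight promise that is cleared on failure, the same latch idea the 0.8.9 entry describes for `loadCrewInit()`. A generic sketch of that pattern, using a hypothetical `memoizeAsyncRetryable` helper: concurrent callers share a single load, a success is cached, and a rejection lets the next call retry.

```typescript
// Wrap an async loader so that concurrent callers share one in-flight load,
// a successful result is cached, and a failure clears the latch for a retry.
export function memoizeAsyncRetryable<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: { value: T } | undefined; // boxed so an `undefined` result still counts
  let inflight: Promise<T> | undefined;
  return () => {
    if (cached) return Promise.resolve(cached.value);
    if (inflight) return inflight;
    const attempt = (async () => {
      const value = await load(); // a synchronous throw in `load` becomes a rejection
      cached = { value };
      return value;
    })();
    inflight = attempt;
    // Clear on failure so a later caller starts a fresh attempt.
    attempt.catch(() => {
      inflight = undefined;
    });
    return attempt;
  };
}
```

Callers still see each rejection (the internal `.catch` only resets the latch), which is why both call sites in this diff attach their own `.catch(() => {})`.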
package/src/runtime/model-fallback.ts
CHANGED
@@ -186,6 +186,29 @@ const RETRYABLE_MODEL_FAILURE_PATTERNS = [
 /\b502\b/,
 /\b503\b/,
 /\b504\b/,
+//
+// Provider-side 5xx / generic api_error. The pi-core retry layer already
+// retries these (agent-session.ts matches `500|server error|internal error`),
+// but the pi-crew MODEL FALLBACK layer must ALSO treat them as retryable so
+// that when the provider is hard-down across all 3 provider retries, we fail
+// over to the next configured model instead of giving up. Reported case
+// (2026-06-17): `500 {"type":"error","error":{"type":"api_error",
+// "message":"unknown error, 999 (1000)"}}` — a transient provider outage that
+// should trigger the fallback chain, not abort.
+//
+// `api_error` is the OpenAI-compatible generic error type (vs rate_limit_error
+// / overloaded_error / etc.) and almost always means a transient server fault.
+//
+// `unknown error` is the body of the generic message; `internal`/`server`
+// catch the common phrasings. `\b500\b`/`\b501\b` catch the HTTP status in
+// the rendered error string.
+/\b500\b/,
+/\b501\b/,
+/api_error/i,
+/unknown error/i,
+/internal(?:_server)?[ _]error/i,
+/server error/i,
+/bad gateway/i,
 ];
 
 // These patterns indicate auth/key/billing issues that will never succeed on retry.
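The precedence rule from the CHANGELOG (auth/billing/key patterns are checked before the retryable list) can be sketched as below. Only the last seven retryable patterns come from this diff. The earlier retryable entries paraphrase the CHANGELOG's "429 / rate-limit / 502-504 / overloaded / timeout", and the non-retryable list is invented for illustration because the real one is not shown in this hunk. `isRetryableFailure` is a hypothetical stand-in for `isRetryableModelFailure`.

```typescript
// Retryable: the first seven paraphrase the pre-0.8.11 list; the last seven
// are the patterns added in this diff.
const RETRYABLE: readonly RegExp[] = [
  /\b429\b/,
  /rate.?limit/i,
  /\b502\b/,
  /\b503\b/,
  /\b504\b/,
  /overloaded/i,
  /timeout/i,
  /\b500\b/,
  /\b501\b/,
  /api_error/i,
  /unknown error/i,
  /internal(?:_server)?[ _]error/i,
  /server error/i,
  /bad gateway/i,
];

// Hypothetical auth/billing/key patterns (the real list is not in this diff).
const NON_RETRYABLE: readonly RegExp[] = [
  /\b401\b/,
  /\b403\b/,
  /invalid.*api.?key/i,
  /billing/i,
  /authentication/i,
];

// Non-retryable wins: a 500 that wraps an auth failure must not trigger fallback.
export function isRetryableFailure(message: string | undefined): boolean {
  if (!message) return false;
  if (NON_RETRYABLE.some((re) => re.test(message))) return false;
  return RETRYABLE.some((re) => re.test(message));
}
```

None of the patterns use the `g` flag, so `RegExp.prototype.test` carries no `lastIndex` state between calls and the shared arrays are safe to reuse.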
|
|
@@ -0,0 +1,296 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Robust resolution + async loading of the @earendil-works/pi-coding-agent
|
|
3
|
+
* peer dependency. Fixes the "Cannot find module '@earendil-works/pi-coding-agent'"
|
|
4
|
+
* crash that blocks ALL team runs when pi-crew and pi are installed in
|
|
5
|
+
* SEPARATE node_modules trees.
|
|
6
|
+
*
|
|
7
|
+
* PROBLEM (Windows / global installs — reported 2026-06-17)
|
|
8
|
+
* pi-crew is a pi EXTENSION. pi installs extensions under
|
|
9
|
+
* `~/.pi/agent/npm/node_modules/<ext>/`, but pi itself (the
|
|
10
|
+
* @earendil-works/pi-coding-agent package that extensions import from)
|
|
11
|
+
* usually lives in a DIFFERENT node_modules tree — a global one (nvm,
|
|
12
|
+
* %APPDATA%\npm, Volta, fnm, pnpm-global). Node's resolver only walks UP
|
|
13
|
+
* through ancestor `node_modules` of the importing file, so a file under
|
|
14
|
+
* `~/.pi/agent/npm/node_modules/pi-crew/...` CANNOT resolve a peer dep
|
|
15
|
+
* installed under `~/.nvm/.../lib/node_modules/`. Every static
|
|
16
|
+
* `import { X } from "@earendil-works/pi-coding-agent"` that executes inside
|
|
17
|
+
* a SPAWNED CHILD PROCESS (the detached background team runner started by
|
|
18
|
+
* async-runner.spawnBackgroundTeamRun) therefore crashes at module load,
|
|
19
|
+
* leaving all team runs permanently `queued`.
|
|
20
|
+
*
|
|
21
|
+
* ADDITIONAL CONSTRAINT (verified empirically 2026-06-17)
|
|
22
|
+
* pi-coding-agent ships as ESM-only (`"type":"module"`, exports map has only
|
|
23
|
+
* an `import` condition). CJS `require()` / `createRequire(dir)(name)` fails
|
|
24
|
+
* with ERR_PACKAGE_PATH_NOT_EXPORTED under plain node AND under jiti/tsx. The
|
|
25
|
+
* ONLY working load mechanism is a dynamic `import()` of the resolved ESM
|
|
26
|
+
* entry file URL. Hence: sync resolution of the DIR, async load of the MODULE.
|
|
27
|
+
*
|
|
28
|
+
* APPROACH
|
|
29
|
+
* - resolvePeerDep() (sync) — find the install dir across many layouts.
|
|
30
|
+
* - primePeerDep() (async) — dynamic-import the resolved entry, cache
|
|
31
|
+
* the module namespace. Memoized. Called
|
|
32
|
+
* once per process during bootstrap.
|
|
33
|
+
* - getAgentDir() (sync) — read the cached module's getAgentDir.
|
|
34
|
+
* Falls back to a computed default if the
|
|
35
|
+
* cache was never primed, so it NEVER throws.
|
|
36
|
+
*/
|
|
37
|
+
import * as fs from "node:fs";
|
|
38
|
+
import * as os from "node:os";
|
|
39
|
+
import * as path from "node:path";
|
|
40
|
+
import { fileURLToPath, pathToFileURL } from "node:url";
|
|
41
|
+
import { resolveNpmGlobalRoot } from "./pi-spawn.ts";
|
|
42
|
+
|
|
43
|
+
/**
|
|
44
|
+
* The pi-coding-agent peer dependency package name(s) we can be loaded by.
|
|
45
|
+
* @earendil-works is the canonical scope; @mariozechner is the historical fork.
|
|
46
|
+
*/
|
|
47
|
+
export const PEER_DEP_NAMES = [
|
|
48
|
+
"@earendil-works/pi-coding-agent",
|
|
49
|
+
"@mariozechner/pi-coding-agent",
|
|
50
|
+
] as const;
|
|
51
|
+
|
|
52
|
+
/**
|
|
53
|
+
* Env var a parent pi-crew process sets on spawned children so they can resolve
|
|
54
|
+
* the peer dep WITHOUT running `npm root -g` (~200ms probe). The resolver
|
|
55
|
+
* checks this FIRST. Absent (older parent, direct invocation, tests) → falls
|
|
56
|
+
* through to the probing strategies. Also lets users override the resolution
|
|
57
|
+
* explicitly as a last-resort fix.
|
|
58
|
+
*/
|
|
59
|
+
export const PEER_DEP_DIR_ENV = "PI_CREW_PEER_DEP_DIR";
|
|
60
|
+
|
|
61
|
+
type PeerDepModule = typeof import("@earendil-works/pi-coding-agent");
|
|
62
|
+
|
|
63
|
+
interface ResolvedPeerDep {
|
|
64
|
+
dir: string;
|
|
65
|
+
name: string;
|
|
66
|
+
/** file:// URL of the ESM entry (exports["."].import || main). */
|
|
67
|
+
mainUrl: string;
|
|
68
|
+
}
|
|
69
|
+
|
|
70
|
+
let cachedResolve: ResolvedPeerDep | undefined | null = null;
|
|
71
|
+
let cachedModule: PeerDepModule | undefined;
|
|
72
|
+
let primingPromise: Promise<PeerDepModule> | undefined;
|
|
73
|
+
|
|
74
|
+
/**
|
|
75
|
+
* Build the ordered list of "resolution bases" — paths to seed
|
|
76
|
+
* `createRequire(...).resolve()` from. Node walks UP `node_modules` from each
|
|
77
|
+
* base's directory, so any base inside (or beside) the peer dep's package
|
|
78
|
+
* tree will find it. Pure given env/process inputs; exported for unit tests.
|
|
79
|
+
*/
|
|
80
|
+
export function peerDepResolutionBases(): string[] {
|
|
81
|
+
const bases: string[] = [];
|
|
82
|
+
|
|
83
|
+
// 0. Parent-provided hint (fastest — no probe). Set by async-runner.
|
|
84
|
+
const envHint = process.env[PEER_DEP_DIR_ENV]?.trim();
|
|
85
|
+
if (envHint) bases.push(path.resolve(envHint));
|
|
86
|
+
|
|
87
|
+
// 1. This file's location — works when pi-crew and pi-coding-agent share a
|
|
88
|
+
// node_modules ancestor (the common co-located install).
|
|
89
|
+
bases.push(fileURLToPath(import.meta.url));
|
|
90
|
+
|
|
91
|
+
// 2. The entry script. In the PARENT (main pi process) argv[1] is pi's CLI
|
|
92
|
+
// script, which lives INSIDE pi-coding-agent's package → resolves. In a
|
|
93
|
+
// SPAWNED CHILD argv[1] is a pi-crew script → cheap miss, falls through.
|
|
94
|
+
const argv1 = process.argv[1];
|
|
95
|
+
if (argv1) bases.push(path.resolve(argv1));
|
|
96
|
+
|
|
97
|
+
// 3. The Node binary's global node_modules. Covers nvm / nvm-windows /
|
|
98
|
+
// Volta / fnm where pi-coding-agent is `npm i -g`'d: node is at
|
|
99
|
+
// <prefix>/bin/node and globals live at <prefix>/lib/node_modules.
|
|
100
|
+
try {
|
|
101
|
+
const execDir = path.dirname(fs.realpathSync.native(process.execPath));
|
|
102
|
+
bases.push(path.join(path.dirname(execDir), "lib", "node_modules"));
|
|
103
|
+
// Some layouts (Windows global, or a bare node_modules sibling of bin).
|
|
104
|
+
bases.push(path.join(execDir, "node_modules"));
|
|
105
|
+
} catch {
|
|
106
|
+
/* realpath best-effort */
|
|
107
|
+
}
|
|
108
|
+
|
|
109
|
+
// 4. `npm root -g` — the canonical cross-layout global root (memoized in
|
|
110
|
+
// pi-spawn.ts, ~200ms once). Derive the scoped package dirs from it.
|
|
111
|
+
const npmRoot = resolveNpmGlobalRoot();
|
|
112
|
+
if (npmRoot) {
|
|
113
|
+
for (const pkgName of PEER_DEP_NAMES) {
|
|
114
|
+
bases.push(path.join(npmRoot, ...pkgName.split("/")));
|
|
115
|
+
}
|
|
116
|
+
}
|
|
117
|
+
|
|
118
|
+
// 5. Windows %APPDATA%\npm static layout (legacy npm-global, pre-npm-root-g).
|
|
119
|
+
if (process.env.APPDATA) {
|
|
120
|
+
bases.push(path.join(process.env.APPDATA, "npm", "node_modules"));
|
|
121
|
+
}
|
|
122
|
+
|
|
123
|
+
return bases;
|
|
124
|
+
}
|
|
125
|
+
|
|
126
|
+
/** Pull the ESM entry path out of package.json (exports import || main). */
|
|
127
|
+
function extractEsmMain(pkg: unknown): string | undefined {
|
|
128
|
+
if (!pkg || typeof pkg !== "object") return undefined;
|
|
129
|
+
const p = pkg as Record<string, unknown>;
|
|
130
|
+
const exp = p.exports;
|
|
131
|
+
if (exp && typeof exp === "object") {
|
|
132
|
+
const dot = (exp as Record<string, unknown>)["."];
|
|
133
|
+
if (dot && typeof dot === "object") {
|
|
134
|
+
const d = dot as Record<string, unknown>;
|
|
135
|
+
const rel = d.import ?? d.default ?? d.module;
|
|
136
|
+
if (typeof rel === "string") return rel;
|
|
137
|
+
} else if (typeof dot === "string") {
|
|
138
|
+
return dot;
|
|
139
|
+
}
|
|
140
|
+
}
|
|
141
|
+
const main = p.main;
|
|
142
|
+
return typeof main === "string" ? main : undefined;
|
|
143
|
+
}
|
|
144
|
+
|
|
145
|
+
/**
|
|
146
|
+
* Walk the node_modules resolution algorithm MANUALLY from `start` looking for
|
|
147
|
+
* any of `names`. We do NOT use createRequire/require.resolve here because
|
|
148
|
+
* pi-coding-agent ships an ESM-only package with a restrictive exports map
|
|
149
|
+
* (only the `.` import condition) — `require.resolve("<pkg>/package.json")`
|
|
150
|
+
* and `require.resolve("<pkg>")` both throw ERR_PACKAGE_PATH_NOT_EXPORTED.
|
|
151
|
+
* Reading package.json directly from the walked dir sidesteps the exports map
|
|
152
|
+
* entirely (exports only governs subpath IMPORTS, not raw file reads).
|
|
153
|
+
*
|
|
154
|
+
* At each directory we check BOTH `<dir>/node_modules/<pkg>` (the standard
|
|
155
|
+
* container case) AND `<dir>/<pkg>` (handles a base that IS a node_modules
|
|
156
|
+
* dir, e.g. the output of `npm root -g`), then walk up to root.
|
|
157
|
+
*/
|
|
158
|
+
function findPackageDir(
|
|
159
|
+
start: string,
|
|
160
|
+
names: readonly string[],
|
|
161
|
+
): { dir: string; name: string } | undefined {
|
|
162
|
+
let dir = path.resolve(start);
|
|
163
|
+
try {
|
|
164
|
+
if (fs.statSync(dir).isFile()) dir = path.dirname(dir);
|
|
165
|
+
} catch {
|
|
166
|
+
/* treat as directory */
|
|
167
|
+
}
|
|
168
|
+
while (true) {
|
|
169
|
+
for (const name of names) {
|
|
170
|
+
const segs = name.split("/");
|
|
171
|
+
const candidates = [
|
|
172
|
+
path.join(dir, "node_modules", ...segs, "package.json"),
|
|
173
|
+
path.join(dir, ...segs, "package.json"),
|
|
174
|
+
];
|
|
175
|
+
for (const pkgJson of candidates) {
|
|
176
|
+
try {
|
|
177
|
+
const pkg = JSON.parse(fs.readFileSync(pkgJson, "utf-8"));
|
|
178
|
+
if (pkg?.name === name) {
|
|
179
|
+
return { dir: path.dirname(pkgJson), name };
|
|
180
|
+
}
|
|
181
|
+
} catch {
|
|
182
|
+
/* not present at this candidate */
|
|
183
|
+
}
|
|
184
|
+
}
|
|
185
|
+
}
|
|
186
|
+
const parent = path.dirname(dir);
|
|
187
|
+
if (parent === dir) break; // reached filesystem root
|
|
188
|
+
dir = parent;
|
|
189
|
+
}
|
|
190
|
+
return undefined;
|
|
191
|
+
}
|
|
192
|
+
|
|
193
|
+
function tryResolveFrom(base: string): ResolvedPeerDep | undefined {
|
|
194
|
+
const found = findPackageDir(base, PEER_DEP_NAMES);
|
|
195
|
+
if (!found) return undefined;
|
|
196
|
+
try {
|
|
197
|
+
const pkg = JSON.parse(
|
|
198
|
+
fs.readFileSync(path.join(found.dir, "package.json"), "utf-8"),
|
|
199
|
+
);
|
|
200
|
+
const mainRel = extractEsmMain(pkg);
|
|
201
|
+
if (!mainRel) return undefined;
|
|
202
|
+
const mainAbs = path.resolve(found.dir, mainRel);
|
|
203
|
+
if (!fs.existsSync(mainAbs)) return undefined;
|
|
204
|
+
return { dir: found.dir, name: found.name, mainUrl: pathToFileURL(mainAbs).href };
|
|
205
|
+
} catch {
|
|
206
|
+
return undefined;
|
|
207
|
+
}
|
|
208
|
+
}
|
|
209
|
+
|
|
210
|
+
/** Resolve the peer dep install dir + ESM entry URL. Memoized (sync). */
|
|
211
|
+
export function resolvePeerDep(): ResolvedPeerDep | undefined {
|
|
212
|
+
if (cachedResolve !== null) return cachedResolve ?? undefined;
|
|
213
|
+
for (const base of peerDepResolutionBases()) {
|
|
214
|
+
const found = tryResolveFrom(base);
|
|
215
|
+
if (found) {
|
|
216
|
+
cachedResolve = found;
|
|
217
|
+
return found;
|
|
218
|
+
}
|
|
219
|
+
}
|
|
220
|
+
cachedResolve = null; // mark attempted-and-failed; don't re-probe per call
|
|
221
|
+
return undefined;
|
|
222
|
+
}
```diff
+/** Just the install directory (for env-hint propagation to children). */
+export function resolvePeerDepDir(): string | undefined {
+  return resolvePeerDep()?.dir;
+}
+
+/**
+ * Dynamic-import the peer dep module, caching the namespace. Memoized via a
+ * shared promise so concurrent callers share one load. On failure the promise
+ * is cleared so a later caller can retry. Safe to call repeatedly.
+ */
+export function primePeerDep(): Promise<PeerDepModule> {
+  if (cachedModule) return Promise.resolve(cachedModule);
+  if (primingPromise) return primingPromise;
+  primingPromise = (async () => {
+    const resolved = resolvePeerDep();
+    if (!resolved) {
+      throw new Error(buildMissingMessage());
+    }
+    cachedModule = (await import(resolved.mainUrl)) as PeerDepModule;
+    return cachedModule;
+  })();
+  // Clear on failure so a later caller can retry (e.g. after env fix).
+  primingPromise.catch(() => {
+    primingPromise = undefined;
+  });
+  return primingPromise;
+}
+
+/** Async module accessor (primes if needed). */
+export async function loadPeerDep(): Promise<PeerDepModule> {
+  return primePeerDep();
+}
```
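`primePeerDep` layers two caches. The first is the loaded namespace, once it exists. The second is a shared in-flight promise, so concurrent callers trigger only one `import()`. A rejected promise is dropped from the cache so a later call can retry, while callers already awaiting it still see the error. The sketch below extracts that pattern into a hypothetical generic helper, `sharedOnce`, with `load` standing in for the resolve-and-`import()` step.

```typescript
// Share one in-flight load among concurrent callers, cache success forever,
// and forget failures so the next call starts a fresh attempt.
function sharedOnce<T>(load: () => Promise<T>): () => Promise<T> {
  let value: { v: T } | undefined; // boxed so falsy results still count as cached
  let pending: Promise<T> | undefined;
  return () => {
    if (value) return Promise.resolve(value.v);
    if (pending) return pending;
    pending = (async () => {
      const v = await load();
      value = { v };
      return v;
    })();
    // Attached before settlement, so it runs before other callers' handlers.
    // Callers holding `pending` still observe the rejection themselves.
    pending.catch(() => {
      pending = undefined;
    });
    return pending;
  };
}
```

The original can test `cachedModule` directly instead of boxing it, because a module namespace object is always truthy.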
```diff
+function buildMissingMessage(): string {
+  return (
+    `pi-crew could not resolve the @earendil-works/pi-coding-agent peer dependency.\n` +
+    `This usually means pi-crew and pi are installed in separate node_modules trees\n` +
+    `(e.g. pi-crew under ~/.pi/agent/npm/ but pi under an nvm/Volta/fnm global scope).\n` +
+    `Resolution bases tried:\n` +
+    peerDepResolutionBases().map((b) => ` - ${b}`).join("\n") +
+    `\nFix: install pi-crew in the SAME scope as pi, e.g.\n` +
+    ` npm install -g @earendil-works/pi-crew\n` +
+    `or set the env var ${PEER_DEP_DIR_ENV}=<path to the pi-coding-agent package dir>.`
+  );
+}
+
+/**
+ * Read the user agent dir via the REAL peer-dep getAgentDir (fork-aware:
+ * correct for pi, tau, and renamed forks). Sync; reads the primed cache.
+ *
+ * If the cache was never primed (e.g. called before bootstrap completes, or
+ * prime failed), falls back to a computed default so it NEVER throws. The
+ * default matches standard pi (`~/.pi/agent`) and respects the
+ * `PI_CODING_AGENT_DIR` override — correct for the overwhelmingly common
+ * case. Forks rely on the primed real function (register.ts primes at startup).
+ */
+export function getAgentDir(): string {
+  if (cachedModule?.getAgentDir) {
+    try {
+      return cachedModule.getAgentDir();
+    } catch {
+      /* fall through to computed default */
+    }
+  }
+  return process.env.PI_CODING_AGENT_DIR || path.join(os.homedir(), ".pi", "agent");
+}
+
+/** @internal — reset all caches for unit tests. */
+export function __resetPeerDepCacheForTest(): void {
+  cachedResolve = null;
+  cachedModule = undefined;
+  primingPromise = undefined;
+}
```
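`getAgentDir` never throws. It prefers the primed module's own `getAgentDir`, so forks that relocate their config dir are honored. Otherwise it falls back to `PI_CODING_AGENT_DIR` or `~/.pi/agent`. The sketch below restates that precedence as a pure function, `agentDirWithFallback`, which is hypothetical. For testability, it assumes the primed module and the environment are passed in as parameters instead of being read from module-level state.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical minimal view of the primed module: only the optional accessor.
interface AgentModuleLike {
  getAgentDir?: () => string;
}

// Precedence: primed module's getAgentDir, then PI_CODING_AGENT_DIR, then ~/.pi/agent.
function agentDirWithFallback(
  primed: AgentModuleLike | undefined,
  env: Record<string, string | undefined> = process.env,
): string {
  if (primed?.getAgentDir) {
    try {
      return primed.getAgentDir();
    } catch {
      // fall through to the computed default
    }
  }
  // `||` rather than `??`: an empty PI_CODING_AGENT_DIR is treated as unset.
  return env.PI_CODING_AGENT_DIR || path.join(os.homedir(), ".pi", "agent");
}
```

Because the default is computed locally, callers such as the skill loaders can use the function before priming finishes and simply get the standard-pi answer.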
```diff
@@ -22,7 +22,11 @@ const PACKAGE_SKILLS_DIR = path.resolve(
   "skills",
 );
 import * as os from "node:os";
-
+// peer-dep.ts resolves @earendil-works/pi-coding-agent robustly across install
+// layouts (extension-under-~/.pi + pi-under-global). A static `import { getAgentDir }`
+// here crashes detached child processes when pi-crew and pi live in separate
+// node_modules trees. See src/runtime/peer-dep.ts.
+import { getAgentDir } from "../runtime/peer-dep.ts";
 const MAX_SKILL_CHARS = 1500;
 const MAX_TOTAL_CHARS = 6000;
 const MAX_SKILL_NAME_CHARS = 80;
```
```diff
@@ -2,7 +2,9 @@ import * as fs from "node:fs";
 import * as os from "node:os";
 import * as path from "node:path";
 import { fileURLToPath } from "node:url";
-
+// peer-dep.ts resolves @earendil-works/pi-coding-agent robustly across install
+// layouts. See src/runtime/peer-dep.ts (split-scope install fix).
+import { getAgentDir } from "../runtime/peer-dep.ts";
 import { logInternalError } from "../utils/internal-error.ts";
 import { isSafePathId, resolveContainedPath, resolveRealContainedPath } from "../utils/safe-paths.ts";
 
```