@mjasnikovs/pi-task 0.7.1 → 0.7.3
- package/README.md +14 -40
- package/assets/pi-logo.svg +4 -0
- package/assets/pipeline.svg +90 -0
- package/assets/task-auto.svg +65 -0
- package/dist/shared/leaked-tool-call.d.ts +36 -0
- package/dist/shared/leaked-tool-call.js +60 -0
- package/dist/task/auto-io.d.ts +7 -0
- package/dist/task/auto-io.js +24 -13
- package/dist/task/auto-orchestrator.d.ts +6 -1
- package/dist/task/auto-orchestrator.js +15 -3
- package/dist/task/child-runner.d.ts +19 -1
- package/dist/task/child-runner.js +68 -17
- package/dist/task/failure-classifier.js +10 -1
- package/dist/task/orchestrator.d.ts +6 -1
- package/dist/task/orchestrator.js +9 -2
- package/dist/task/phases.js +4 -0
- package/dist/workers/html-clean.js +77 -9
- package/dist/workers/pi-worker-core.d.ts +6 -0
- package/dist/workers/pi-worker-core.js +36 -21
- package/dist/workers/pi-worker-fetch.js +5 -4
- package/package.json +2 -1
package/README.md
CHANGED
|
@@ -1,13 +1,15 @@
|
|
|
1
1
|
<div align="center">
|
|
2
2
|
|
|
3
|
-
|
|
3
|
+
<img src="./assets/pipeline.svg" alt="pi-task pipeline: a /task request runs through refine, research, grill, compose and critique, then the final spec is delivered to your main pi session in the same chat. Every phase boundary is persisted to .pi-tasks/TASK_NNNN.md, so the task is crash-safe and resumable." width="820"/>
|
|
4
|
+
|
|
5
|
+
# <img src="./assets/pi-logo.svg" alt="" height="30" align="top"/> pi-task
|
|
4
6
|
|
|
5
7
|
**Deterministic spec-orchestration for local models, with bundled web, docs, fetch, and worker sub-agent tools.**
|
|
6
8
|
|
|
7
9
|
[](https://www.npmjs.com/package/@mjasnikovs/pi-task)
|
|
8
10
|
[](./LICENSE)
|
|
9
11
|
[](https://www.npmjs.com/package/@earendil-works/pi-coding-agent)
|
|
10
|
-
[](#development)
|
|
11
13
|
[](./tsconfig.json)
|
|
12
14
|
|
|
13
15
|
</div>
|
|
@@ -16,24 +18,7 @@
|
|
|
16
18
|
|
|
17
19
|
## What it does
|
|
18
20
|
|
|
19
|
-
Local models drift. Ask one to plan a non-trivial change and it skips context, hallucinates APIs, and forgets what you actually asked. `pi-task` fixes this by **not trusting a single prompt**: it drives your request through a fixed, persisted pipeline of small, verifiable steps, then hands the main session a clean spec to execute.
|
|
20
|
-
|
|
21
|
-
```
|
|
22
|
-
/task add rate-limiting to the public API
|
|
23
|
-
│
|
|
24
|
-
▼
|
|
25
|
-
┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐
|
|
26
|
-
│  refine  │──▶│ research │──▶│  grill   │──▶│ compose  │──▶│ critique │
|
|
27
|
-
└──────────┘   └──────────┘   └──────────┘   └──────────┘   └──────────┘
|
|
28
|
-
sharpen the parallel clarifying assemble triage +
|
|
29
|
-
raw prompt sub-agents: questions the spec rewrite if
|
|
30
|
-
files · APIs · (auto- or the draft
|
|
31
|
-
context · you answer) isn't clean
|
|
32
|
-
tooling
|
|
33
|
-
│
|
|
34
|
-
▼
|
|
35
|
-
final spec ──▶ main pi session (you keep working in the same chat)
|
|
36
|
-
```
|
|
21
|
+
Local models drift. Ask one to plan a non-trivial change and it skips context, hallucinates APIs, and forgets what you actually asked. `pi-task` fixes this by **not trusting a single prompt**: it drives your request through a fixed, persisted pipeline of small, verifiable steps (shown above), then hands the main session a clean spec to execute.
|
|
37
22
|
|
|
38
23
|
Every phase boundary is written to `.pi-tasks/TASK_NNNN.md`, so a task survives a crash, a restart, or a `/task-cancel`; pick it back up with `/task-resume`.
|
|
39
24
|
|
|
@@ -51,7 +36,7 @@ Every phase boundary is written to `.pi-tasks/TASK_NNNN.md`, so a task survives
|
|
|
51
36
|
pi install npm:@mjasnikovs/pi-task
|
|
52
37
|
```
|
|
53
38
|
|
|
54
|
-
> Requires [`pi`](https://www.npmjs.com/package/@earendil-works/pi-coding-agent) (the Earendil coding agent) ≥ 0.
|
|
39
|
+
> Requires [`pi`](https://www.npmjs.com/package/@earendil-works/pi-coding-agent) (the Earendil coding agent) ≥ 0.78.
|
|
55
40
|
|
|
56
41
|
## Slash commands
|
|
57
42
|
|
|
@@ -82,22 +67,11 @@ The finished spec is delivered to your main `pi` conversation via `sendUserMessa
|
|
|
82
67
|
|
|
83
68
|
A real feature is usually several tasks, not one. `/task-auto` is a thin planner on top of the single-task pipeline:
|
|
84
69
|
|
|
85
|
-
|
|
86
|
-
|
|
87
|
-
|
|
88
|
-
|
|
89
|
-
|
|
90
|
-
│ clarify  │──▶│ decompose│──▶│ TASK_AUTO_…  │  resumable list of task titles
|
|
91
|
-
│  gray    │   │ → titles │   │ .md (titles) │
|
|
92
|
-
│  areas   │   └──────────┘   └──────┬───────┘
|
|
93
|
-
└──────────┘                         │
|
|
94
|
-
                       ┌─────────────▼─────────────┐
|
|
95
|
-
                       │ for each unchecked title  │
|
|
96
|
-
                       │ → full /task pipeline     │  (spec + implement)
|
|
97
|
-
                       │ → wait until it finishes  │
|
|
98
|
-
                       │ → check the box, next     │
|
|
99
|
-
                       └───────────────────────────┘
|
|
100
|
-
```
|
|
70
|
+
<div align="center">
|
|
71
|
+
|
|
72
|
+
<img src="./assets/task-auto.svg" alt="/task-auto plans a feature: it clarifies the gray areas, decomposes the answers into an ordered list of task titles written to TASK_AUTO_NNNN.md, then runs each unchecked title through the full /task pipeline one at a time, ticking the box before moving on." width="820"/>
|
|
73
|
+
|
|
74
|
+
</div>
|
|
101
75
|
|
|
102
76
|
- **It only produces titles.** All the depth (refine, research, grill, compose, critique) is `/task`'s job, run fresh per title. `/task-auto` never researches or specs anything itself.
|
|
103
77
|
- **Clarify first.** It asks the few clarifying questions whose answers change how the feature splits, then decomposes the answers into an ordered list of task titles written to `.pi-tasks/TASK_AUTO_NNNN.md`.
|
|
@@ -149,9 +123,9 @@ Runs a Brave Search query and returns a compact markdown list (title · URL · s
|
|
|
149
123
|
> **Requires** `BRAVE_SEARCH_API_KEY` (also accepted as `BRAVE_API_KEY`). Grab a free key at [api.search.brave.com/app/keys](https://api.search.brave.com/app/keys).
|
|
150
124
|
|
|
151
125
|
### `pi-worker-fetch`
|
|
152
|
-
Fetches a URL, cleans
|
|
126
|
+
Fetches a URL, cleans HTML to markdown ([Readability](https://github.com/mozilla/readability) + [Turndown](https://github.com/mixmark-io/turndown)), then hands it to an isolated child that extracts **only** the content answering your `query`. The parent never sees the raw page.
|
|
153
127
|
|
|
154
|
-
-
|
|
128
|
+
- HTML is cleaned; text formats (plain text, markdown, JSON, XML/feeds, `llms.txt`, …) pass through verbatim. Binary responses (PDFs, images, octet-streams) return a clear error.
|
|
155
129
|
- Bodies over 2 MB are rejected.
|
|
156
130
|
- The extraction child runs with `--no-tools` to mitigate visible-text prompt injection.
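The routing rules in the bullets above can be pictured with a small classifier. This is only an illustrative sketch, not the package's code: `classifyResponse`, the `Route` names, the MIME lists, and reading "2 MB" as 2 × 1024 × 1024 bytes are all assumptions made for the example.

```typescript
// Hypothetical sketch of the fetch routing rules; names and MIME lists are assumptions.
type Route = "clean-html" | "pass-through" | "reject-binary" | "reject-too-large";

// Assumption: "2 MB" is interpreted here as 2 MiB.
const MAX_BODY_BYTES = 2 * 1024 * 1024;

function classifyResponse(contentType: string, bodyBytes: number): Route {
  if (bodyBytes > MAX_BODY_BYTES) return "reject-too-large";
  // Drop parameters such as "; charset=utf-8" before comparing.
  const mime = (contentType.split(";")[0] ?? "").trim().toLowerCase();
  if (mime === "text/html" || mime === "application/xhtml+xml") return "clean-html";
  const isText =
    mime.startsWith("text/") ||
    mime === "application/json" ||
    mime === "application/xml" ||
    mime.endsWith("+json") ||
    mime.endsWith("+xml");
  // Anything that is not recognisably text is treated as binary.
  return isText ? "pass-through" : "reject-binary";
}

console.log(classifyResponse("text/html; charset=utf-8", 12_000)); // clean-html
console.log(classifyResponse("application/rss+xml", 4_000));       // pass-through
console.log(classifyResponse("application/pdf", 4_000));           // reject-binary
```

Checking the size limit first keeps oversized bodies away from any further processing, whatever their type.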
|
|
157
131
|
|
|
@@ -179,7 +153,7 @@ Tasks are persisted to `<cwd>/.pi-tasks/TASK_NNNN.md`. Add `.pi-tasks/` to your
|
|
|
179
153
|
|
|
180
154
|
```sh
|
|
181
155
|
bun install
|
|
182
|
-
bun test src/ #
|
|
156
|
+
bun test src/ # 559 tests across 46 files
|
|
183
157
|
bun run lint # prettier + eslint + tsc --noEmit
|
|
184
158
|
bun run build # tsc → dist/
|
|
185
159
|
```
|
package/assets/pi-logo.svg
ADDED
|
@@ -0,0 +1,4 @@
|
|
|
1
|
+
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 180 180" width="180" height="180" role="img" aria-label="pi-task logo">
|
|
2
|
+
<rect width="180" height="180" rx="38" fill="#1e1e2e"/>
|
|
3
|
+
<text x="90" y="130" font-family="Georgia, serif" font-size="100" text-anchor="middle" fill="#cba6f7">π</text>
|
|
4
|
+
</svg>
|
package/assets/pipeline.svg
ADDED
|
@@ -0,0 +1,90 @@
|
|
|
1
|
+
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 900 396" width="900" height="396" font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif" role="img" aria-label="pi-task pipeline: a /task request runs through refine, research, grill, compose and critique phases, then the final spec is delivered to your main pi session in the same chat. Every phase boundary is persisted to .pi-tasks/TASK_NNNN.md, so the task is crash-safe and resumable.">
|
|
2
|
+
<defs>
|
|
3
|
+
<marker id="ah" viewBox="0 0 10 10" refX="8.5" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
|
4
|
+
<path d="M0,0 L10,5 L0,10 z" fill="#6e7681"/>
|
|
5
|
+
</marker>
|
|
6
|
+
<style>
|
|
7
|
+
.title { font-size:17px; font-weight:600; fill:#e6edf3; }
|
|
8
|
+
.cap { font-size:11.5px; fill:#8b949e; }
|
|
9
|
+
.num { font-size:11px; font-weight:700; }
|
|
10
|
+
.code { font-family:ui-monospace,SFMono-Regular,Menlo,Consolas,monospace; }
|
|
11
|
+
.foot { font-size:11.5px; fill:#768390; font-style:italic; }
|
|
12
|
+
</style>
|
|
13
|
+
</defs>
|
|
14
|
+
|
|
15
|
+
<rect x="6" y="6" width="888" height="384" rx="16" fill="#0d1117" stroke="#30363d" stroke-width="1.5"/>
|
|
16
|
+
|
|
17
|
+
<!-- input -->
|
|
18
|
+
<rect x="220" y="32" width="460" height="38" rx="19" fill="#161b22" stroke="#7c3aed" stroke-width="1.5"/>
|
|
19
|
+
<text x="450" y="57" text-anchor="middle" class="code" font-size="14"><tspan fill="#a371f7" font-weight="700">/task</tspan><tspan fill="#c9d1d9" dx="7">add rate-limiting to the public API</tspan></text>
|
|
20
|
+
<line x1="450" y1="70" x2="450" y2="106" stroke="#6e7681" stroke-width="1.5" marker-end="url(#ah)"/>
|
|
21
|
+
|
|
22
|
+
<!-- phase row -->
|
|
23
|
+
<!-- refine -->
|
|
24
|
+
<g>
|
|
25
|
+
<rect x="32" y="118" width="150" height="72" rx="10" fill="#161b22" stroke="#30363d"/>
|
|
26
|
+
<rect x="32" y="118" width="5" height="72" rx="2.5" fill="#58a6ff"/>
|
|
27
|
+
<text x="50" y="138" class="num" fill="#58a6ff">1</text>
|
|
28
|
+
<text x="107" y="161" text-anchor="middle" class="title">refine</text>
|
|
29
|
+
</g>
|
|
30
|
+
<text x="107" y="210" text-anchor="middle" class="cap">sharpen the</text>
|
|
31
|
+
<text x="107" y="225" text-anchor="middle" class="cap">raw prompt</text>
|
|
32
|
+
|
|
33
|
+
<!-- research -->
|
|
34
|
+
<g>
|
|
35
|
+
<rect x="203.5" y="118" width="150" height="72" rx="10" fill="#161b22" stroke="#30363d"/>
|
|
36
|
+
<rect x="203.5" y="118" width="5" height="72" rx="2.5" fill="#39c5cf"/>
|
|
37
|
+
<text x="221.5" y="138" class="num" fill="#39c5cf">2</text>
|
|
38
|
+
<text x="278.5" y="161" text-anchor="middle" class="title">research</text>
|
|
39
|
+
</g>
|
|
40
|
+
<text x="278.5" y="210" text-anchor="middle" class="cap">parallel sub-agents:</text>
|
|
41
|
+
<text x="278.5" y="225" text-anchor="middle" class="cap">files · APIs ·</text>
|
|
42
|
+
<text x="278.5" y="240" text-anchor="middle" class="cap">context · tooling</text>
|
|
43
|
+
|
|
44
|
+
<!-- grill -->
|
|
45
|
+
<g>
|
|
46
|
+
<rect x="375" y="118" width="150" height="72" rx="10" fill="#161b22" stroke="#30363d"/>
|
|
47
|
+
<rect x="375" y="118" width="5" height="72" rx="2.5" fill="#d29922"/>
|
|
48
|
+
<text x="393" y="138" class="num" fill="#d29922">3</text>
|
|
49
|
+
<text x="450" y="161" text-anchor="middle" class="title">grill</text>
|
|
50
|
+
</g>
|
|
51
|
+
<text x="450" y="210" text-anchor="middle" class="cap">clarifying Q&amp;A</text>
|
|
52
|
+
<text x="450" y="225" text-anchor="middle" class="cap">(auto, or you answer)</text>
|
|
53
|
+
|
|
54
|
+
<!-- compose -->
|
|
55
|
+
<g>
|
|
56
|
+
<rect x="546.5" y="118" width="150" height="72" rx="10" fill="#161b22" stroke="#30363d"/>
|
|
57
|
+
<rect x="546.5" y="118" width="5" height="72" rx="2.5" fill="#3fb950"/>
|
|
58
|
+
<text x="564.5" y="138" class="num" fill="#3fb950">4</text>
|
|
59
|
+
<text x="621.5" y="161" text-anchor="middle" class="title">compose</text>
|
|
60
|
+
</g>
|
|
61
|
+
<text x="621.5" y="210" text-anchor="middle" class="cap">assemble</text>
|
|
62
|
+
<text x="621.5" y="225" text-anchor="middle" class="cap">the spec</text>
|
|
63
|
+
|
|
64
|
+
<!-- critique -->
|
|
65
|
+
<g>
|
|
66
|
+
<rect x="718" y="118" width="150" height="72" rx="10" fill="#161b22" stroke="#30363d"/>
|
|
67
|
+
<rect x="718" y="118" width="5" height="72" rx="2.5" fill="#a371f7"/>
|
|
68
|
+
<text x="736" y="138" class="num" fill="#a371f7">5</text>
|
|
69
|
+
<text x="793" y="161" text-anchor="middle" class="title">critique</text>
|
|
70
|
+
</g>
|
|
71
|
+
<text x="793" y="210" text-anchor="middle" class="cap">triage, rewrite</text>
|
|
72
|
+
<text x="793" y="225" text-anchor="middle" class="cap">if not clean</text>
|
|
73
|
+
|
|
74
|
+
<!-- inter-phase arrows -->
|
|
75
|
+
<line x1="182" y1="154" x2="201.5" y2="154" stroke="#6e7681" stroke-width="1.5" marker-end="url(#ah)"/>
|
|
76
|
+
<line x1="353.5" y1="154" x2="373" y2="154" stroke="#6e7681" stroke-width="1.5" marker-end="url(#ah)"/>
|
|
77
|
+
<line x1="525" y1="154" x2="544.5" y2="154" stroke="#6e7681" stroke-width="1.5" marker-end="url(#ah)"/>
|
|
78
|
+
<line x1="696.5" y1="154" x2="716" y2="154" stroke="#6e7681" stroke-width="1.5" marker-end="url(#ah)"/>
|
|
79
|
+
|
|
80
|
+
<!-- to result -->
|
|
81
|
+
<line x1="450" y1="252" x2="450" y2="292" stroke="#6e7681" stroke-width="1.5" marker-end="url(#ah)"/>
|
|
82
|
+
|
|
83
|
+
<!-- result -->
|
|
84
|
+
<rect x="120" y="296" width="660" height="48" rx="12" fill="#0f261a" stroke="#2ea043" stroke-width="1.5"/>
|
|
85
|
+
<text x="450" y="318" text-anchor="middle" font-size="14" font-weight="600" fill="#e6edf3">final spec → delivered to your main <tspan class="code" fill="#7ee787">pi</tspan> session</text>
|
|
86
|
+
<text x="450" y="334" text-anchor="middle" class="cap">same chat - no handoff, no copy-paste</text>
|
|
87
|
+
|
|
88
|
+
<!-- footnote -->
|
|
89
|
+
<text x="450" y="372" text-anchor="middle" class="foot">every phase boundary is persisted to <tspan class="code" font-style="normal">.pi-tasks/TASK_NNNN.md</tspan> - crash-safe &amp; resumable</text>
|
|
90
|
+
</svg>
|
package/assets/task-auto.svg
ADDED
|
@@ -0,0 +1,65 @@
|
|
|
1
|
+
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 900 332" width="900" height="332" font-family="-apple-system, BlinkMacSystemFont, 'Segoe UI', Helvetica, Arial, sans-serif" role="img" aria-label="/task-auto plans a feature: it clarifies the gray areas, decomposes the answers into an ordered list of task titles written to TASK_AUTO_NNNN.md, then runs each unchecked title through the full /task pipeline one at a time, ticking the box before moving on.">
|
|
2
|
+
<defs>
|
|
3
|
+
<marker id="ah2" viewBox="0 0 10 10" refX="8.5" refY="5" markerWidth="7" markerHeight="7" orient="auto-start-reverse">
|
|
4
|
+
<path d="M0,0 L10,5 L0,10 z" fill="#6e7681"/>
|
|
5
|
+
</marker>
|
|
6
|
+
<style>
|
|
7
|
+
.title { font-size:17px; font-weight:600; fill:#e6edf3; }
|
|
8
|
+
.titlec { font-size:15px; font-weight:600; fill:#e6edf3; font-family:ui-monospace,SFMono-Regular,Menlo,Consolas,monospace; }
|
|
9
|
+
.cap { font-size:11.5px; fill:#8b949e; }
|
|
10
|
+
.num { font-size:11px; font-weight:700; }
|
|
11
|
+
.code { font-family:ui-monospace,SFMono-Regular,Menlo,Consolas,monospace; }
|
|
12
|
+
</style>
|
|
13
|
+
</defs>
|
|
14
|
+
|
|
15
|
+
<rect x="6" y="6" width="888" height="320" rx="16" fill="#0d1117" stroke="#30363d" stroke-width="1.5"/>
|
|
16
|
+
|
|
17
|
+
<!-- input -->
|
|
18
|
+
<rect x="240" y="30" width="420" height="38" rx="19" fill="#161b22" stroke="#7c3aed" stroke-width="1.5"/>
|
|
19
|
+
<text x="450" y="55" text-anchor="middle" class="code" font-size="14"><tspan fill="#a371f7" font-weight="700">/task-auto</tspan><tspan fill="#c9d1d9" dx="7">add multi-tenant billing</tspan></text>
|
|
20
|
+
<line x1="450" y1="68" x2="450" y2="100" stroke="#6e7681" stroke-width="1.5" marker-end="url(#ah2)"/>
|
|
21
|
+
|
|
22
|
+
<!-- clarify -->
|
|
23
|
+
<g>
|
|
24
|
+
<rect x="32" y="110" width="200" height="72" rx="10" fill="#161b22" stroke="#30363d"/>
|
|
25
|
+
<rect x="32" y="110" width="5" height="72" rx="2.5" fill="#d29922"/>
|
|
26
|
+
<text x="50" y="130" class="num" fill="#d29922">1</text>
|
|
27
|
+
<text x="132" y="153" text-anchor="middle" class="title">clarify gray areas</text>
|
|
28
|
+
</g>
|
|
29
|
+
<text x="132" y="202" text-anchor="middle" class="cap">ask only what</text>
|
|
30
|
+
<text x="132" y="217" text-anchor="middle" class="cap">changes the split</text>
|
|
31
|
+
|
|
32
|
+
<!-- decompose -->
|
|
33
|
+
<g>
|
|
34
|
+
<rect x="350" y="110" width="200" height="72" rx="10" fill="#161b22" stroke="#30363d"/>
|
|
35
|
+
<rect x="350" y="110" width="5" height="72" rx="2.5" fill="#58a6ff"/>
|
|
36
|
+
<text x="368" y="130" class="num" fill="#58a6ff">2</text>
|
|
37
|
+
<text x="450" y="153" text-anchor="middle" class="title">decompose → titles</text>
|
|
38
|
+
</g>
|
|
39
|
+
<text x="450" y="202" text-anchor="middle" class="cap">ordered list of</text>
|
|
40
|
+
<text x="450" y="217" text-anchor="middle" class="cap">task titles only</text>
|
|
41
|
+
|
|
42
|
+
<!-- file -->
|
|
43
|
+
<g>
|
|
44
|
+
<rect x="668" y="110" width="200" height="72" rx="10" fill="#161b22" stroke="#30363d"/>
|
|
45
|
+
<rect x="668" y="110" width="5" height="72" rx="2.5" fill="#3fb950"/>
|
|
46
|
+
<text x="686" y="130" class="num" fill="#3fb950">3</text>
|
|
47
|
+
<text x="768" y="154" text-anchor="middle" class="titlec">TASK_AUTO_NNNN.md</text>
|
|
48
|
+
</g>
|
|
49
|
+
<text x="768" y="202" text-anchor="middle" class="cap">resumable checklist</text>
|
|
50
|
+
<text x="768" y="217" text-anchor="middle" class="cap">of titles</text>
|
|
51
|
+
|
|
52
|
+
<!-- arrows between -->
|
|
53
|
+
<line x1="232" y1="146" x2="348" y2="146" stroke="#6e7681" stroke-width="1.5" marker-end="url(#ah2)"/>
|
|
54
|
+
<line x1="550" y1="146" x2="666" y2="146" stroke="#6e7681" stroke-width="1.5" marker-end="url(#ah2)"/>
|
|
55
|
+
|
|
56
|
+
<!-- to loop -->
|
|
57
|
+
<line x1="768" y1="227" x2="768" y2="248" stroke="#6e7681" stroke-width="1.5"/>
|
|
58
|
+
<path d="M768,248 H450 V248" fill="none" stroke="#6e7681" stroke-width="1.5"/>
|
|
59
|
+
<line x1="450" y1="244" x2="450" y2="252" stroke="#6e7681" stroke-width="1.5" marker-end="url(#ah2)"/>
|
|
60
|
+
|
|
61
|
+
<!-- loop -->
|
|
62
|
+
<rect x="120" y="256" width="660" height="54" rx="12" fill="#0d1117" stroke="#a371f7" stroke-width="1.5"/>
|
|
63
|
+
<text x="450" y="278" text-anchor="middle" font-size="14" font-weight="600" fill="#e6edf3">↻ for each unchecked title</text>
|
|
64
|
+
<text x="450" y="296" text-anchor="middle" class="cap">run the full <tspan class="code" fill="#a371f7">/task</tspan> pipeline → implement → wait until done → tick the box → next</text>
|
|
65
|
+
</svg>
|
package/dist/shared/leaked-tool-call.d.ts
ADDED
|
@@ -0,0 +1,36 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Detect tool calls that leaked into a child's assistant *text* instead of
|
|
3
|
+
* being executed.
|
|
4
|
+
*
|
|
5
|
+
* Background: every child pi runs under `--mode json`; pi-task only ever treats
|
|
6
|
+
* a structured `tool_execution_start` event as a tool call (see
|
|
7
|
+
* shared/child-process.ts). When a local model emits a call in a markup dialect
|
|
8
|
+
* pi's harness doesn't recognise, e.g.
|
|
9
|
+
*
|
|
10
|
+
* <tool_call>
|
|
11
|
+
* <function=bash>
|
|
12
|
+
* <parameter=command>grep …</parameter>
|
|
13
|
+
* </function>
|
|
14
|
+
* </tool_call>
|
|
15
|
+
*
|
|
16
|
+
* pi passes the raw markup through as ordinary assistant text. The command never
|
|
17
|
+
* runs, no event fires, and pi-task's guards (loop detector, widget) never see
|
|
18
|
+
* it. The phase then "passes" on its only gates (non-empty text + exit 0) and
|
|
19
|
+
* the unexecuted call flows downstream: a silently skipped beat.
|
|
20
|
+
*
|
|
21
|
+
* This is fundamentally an upstream mismatch (model output format vs. pi's parser)
|
|
22
|
+
* that pi-task cannot fix. What it CAN do is notice the leaked markup and refuse
|
|
23
|
+
* to accept the turn, so the skip becomes visible instead of silent.
|
|
24
|
+
*/
|
|
25
|
+
export declare const MAX_LEAK_RETRIES = 2;
|
|
26
|
+
/**
|
|
27
|
+
* Return the offending marker string if `text` contains a leaked tool call, or
|
|
28
|
+
* null if it looks clean. The marker is suitable for logging and for naming the
|
|
29
|
+
* problem back to the model in a re-prompt hint.
|
|
30
|
+
*/
|
|
31
|
+
export declare function detectLeakedToolCall(text: string): string | null;
|
|
32
|
+
/**
|
|
33
|
+
* A correction hint to prepend to a re-spawn after a leak, naming the offending
|
|
34
|
+
* markup so the model stops repeating that exact mistake.
|
|
35
|
+
*/
|
|
36
|
+
export declare function leakedToolCallHint(marker: string): string;
|
package/dist/shared/leaked-tool-call.js
ADDED
|
@@ -0,0 +1,60 @@
|
|
|
1
|
+
/**
|
|
2
|
+
* Detect tool calls that leaked into a child's assistant *text* instead of
|
|
3
|
+
* being executed.
|
|
4
|
+
*
|
|
5
|
+
* Background: every child pi runs under `--mode json`; pi-task only ever treats
|
|
6
|
+
* a structured `tool_execution_start` event as a tool call (see
|
|
7
|
+
* shared/child-process.ts). When a local model emits a call in a markup dialect
|
|
8
|
+
* pi's harness doesn't recognise, e.g.
|
|
9
|
+
*
|
|
10
|
+
* <tool_call>
|
|
11
|
+
* <function=bash>
|
|
12
|
+
* <parameter=command>grep …</parameter>
|
|
13
|
+
* </function>
|
|
14
|
+
* </tool_call>
|
|
15
|
+
*
|
|
16
|
+
* pi passes the raw markup through as ordinary assistant text. The command never
|
|
17
|
+
* runs, no event fires, and pi-task's guards (loop detector, widget) never see
|
|
18
|
+
* it. The phase then "passes" on its only gates (non-empty text + exit 0) and
|
|
19
|
+
* the unexecuted call flows downstream: a silently skipped beat.
|
|
20
|
+
*
|
|
21
|
+
* This is fundamentally an upstream mismatch (model output format vs. pi's parser)
|
|
22
|
+
* that pi-task cannot fix. What it CAN do is notice the leaked markup and refuse
|
|
23
|
+
* to accept the turn, so the skip becomes visible instead of silent.
|
|
24
|
+
*/
|
|
25
|
+
// A child that wrote a tool call as plain text (wrong dialect, never executed)
|
|
26
|
+
// gets re-prompted with a correction hint up to this many times before the
|
|
27
|
+
// caller gives up. Mirrors MAX_LOOP_RESTARTS: 3 attempts total.
|
|
28
|
+
export const MAX_LEAK_RETRIES = 2;
|
|
29
|
+
// The Hermes-style wrapper a leaked call is most often wrapped in. pi-task never
|
|
30
|
+
// legitimately emits this tag, so its presence alone is a confident signal.
|
|
31
|
+
const TOOL_CALL_WRAPPER = /<tool_call\b[^>]*>/i;
|
|
32
|
+
// The "XML function call" dialect: <function=name> … <parameter=key>. Either tag
|
|
33
|
+
// alone is too weak (a stray "<function=x>" can appear in prose or source), so we
|
|
34
|
+
// require the structural pair before flagging it.
|
|
35
|
+
const FUNCTION_TAG = /<function=[\w.-]+\s*>/i;
|
|
36
|
+
const PARAMETER_TAG = /<parameter=[\w.-]+\s*>/i;
|
|
37
|
+
/**
|
|
38
|
+
* Return the offending marker string if `text` contains a leaked tool call, or
|
|
39
|
+
* null if it looks clean. The marker is suitable for logging and for naming the
|
|
40
|
+
* problem back to the model in a re-prompt hint.
|
|
41
|
+
*/
|
|
42
|
+
export function detectLeakedToolCall(text) {
|
|
43
|
+
const wrapper = TOOL_CALL_WRAPPER.exec(text);
|
|
44
|
+
if (wrapper)
|
|
45
|
+
return wrapper[0];
|
|
46
|
+
const fn = FUNCTION_TAG.exec(text);
|
|
47
|
+
if (fn && PARAMETER_TAG.test(text))
|
|
48
|
+
return fn[0];
|
|
49
|
+
return null;
|
|
50
|
+
}
|
|
51
|
+
/**
|
|
52
|
+
* A correction hint to prepend to a re-spawn after a leak, naming the offending
|
|
53
|
+
* markup so the model stops repeating that exact mistake.
|
|
54
|
+
*/
|
|
55
|
+
export function leakedToolCallHint(marker) {
|
|
56
|
+
return (`[SYSTEM NOTE: Your previous turn wrote a tool call as plain text (\`${marker}\`) `
|
|
57
|
+
+ `instead of invoking the tool, so it never ran and you proceeded without its result. `
|
|
58
|
+
+ `Invoke tools through the native tool-calling mechanism; never type `
|
|
59
|
+
+ `<tool_call>/<function=…>/<parameter=…> markup into your answer.]`);
|
|
60
|
+
}
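To see how the two detection rules behave, the sketch below re-declares the three patterns from the diff inside a standalone function and runs them over a few inputs. The function name `detectLeak` and the sample strings are invented for illustration; the regular expressions are copied from the listing above.

```typescript
// Standalone sketch: patterns copied from the diff, wrapper function and samples are hypothetical.
const TOOL_CALL_WRAPPER = /<tool_call\b[^>]*>/i;
const FUNCTION_TAG = /<function=[\w.-]+\s*>/i;
const PARAMETER_TAG = /<parameter=[\w.-]+\s*>/i;

function detectLeak(text: string): string | null {
  // Rule 1: the wrapper tag alone is treated as a confident signal.
  const wrapper = TOOL_CALL_WRAPPER.exec(text);
  if (wrapper) return wrapper[0];
  // Rule 2: a function tag only counts when a parameter tag is also present.
  const fn = FUNCTION_TAG.exec(text);
  if (fn && PARAMETER_TAG.test(text)) return fn[0];
  return null;
}

const samples = {
  wrapped: "Let me check.\n<tool_call>\n<function=bash>\n<parameter=command>ls</parameter>\n</function>\n</tool_call>",
  pairOnly: "<function=read>\n<parameter=path>README.md</parameter>\n</function>",
  strayTag: "The docs mention <function=x> in passing.",
  clean: "All phases completed without tool use.",
};

for (const [name, text] of Object.entries(samples)) {
  console.log(name, "->", detectLeak(text));
}
```

The `strayTag` sample shows why the structural pair is required: a lone `<function=x>` in prose is not flagged.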
|
package/dist/task/auto-io.d.ts
CHANGED
|
@@ -13,5 +13,12 @@ export declare function parseTaskList(body: string): TaskEntry[];
|
|
|
13
13
|
export declare function buildAutoBody(feature: string, clarifications: string, titles: string[]): string;
|
|
14
14
|
/** Check off the Nth checkbox line, stamping the produced TASK_NNNN id. */
|
|
15
15
|
export declare function checkOffTask(cwd: string, id: string, index: number, producedId: string, title: string): Promise<void>;
|
|
16
|
+
/**
|
|
17
|
+
* Stamp the inner TASK_NNNN id onto the Nth (still-unchecked) entry the moment
|
|
18
|
+
* the inner task is allocated. This links the AUTO entry to its in-progress
|
|
19
|
+
* inner task so /task-auto-resume can continue it from its saved phase instead
|
|
20
|
+
* of starting a brand-new task, matching how /task-resume behaves.
|
|
21
|
+
*/
|
|
22
|
+
export declare function stampTaskInProgress(cwd: string, id: string, index: number, producedId: string, title: string): Promise<void>;
|
|
16
23
|
/** Find the most-recently-updated resumable TASK_AUTO_* file, or null. */
|
|
17
24
|
export declare function findResumableAuto(cwd: string): Promise<string | null>;
|
package/dist/task/auto-io.js
CHANGED
|
@@ -53,17 +53,15 @@ export function parseTaskList(body) {
|
|
|
53
53
|
continue;
|
|
54
54
|
const done = m[1].toLowerCase() === 'x';
|
|
55
55
|
const rest = m[2].trim();
|
|
56
|
-
|
|
57
|
-
|
|
58
|
-
|
|
59
|
-
|
|
60
|
-
|
|
61
|
-
|
|
62
|
-
entries.push({ index, title: rest, done: true });
|
|
63
|
-
}
|
|
56
|
+
// A line carries a stamped TASK_NNNN id both when done (the completed
|
|
57
|
+
// inner task) and when merely started: an unchecked, stamped line is an
|
|
58
|
+
// in-progress entry whose inner task can be resumed.
|
|
59
|
+
const idm = PRODUCED_ID_RE.exec(rest);
|
|
60
|
+
if (idm) {
|
|
61
|
+
entries.push({ index, title: idm[2].trim(), done, producedId: idm[1] });
|
|
64
62
|
}
|
|
65
63
|
else {
|
|
66
|
-
entries.push({ index, title: rest, done
|
|
64
|
+
entries.push({ index, title: rest, done });
|
|
67
65
|
}
|
|
68
66
|
index++;
|
|
69
67
|
}
|
|
@@ -76,8 +74,8 @@ export function buildAutoBody(feature, clarifications, titles) {
|
|
|
76
74
|
+ `## clarifications\n\n${clarifications.trim() || '(none)'}\n\n`
|
|
77
75
|
+ `## tasks\n\n${tasks}\n`);
|
|
78
76
|
}
|
|
79
|
-
/**
|
|
80
|
-
|
|
77
|
+
/** Rewrite the Nth checkbox line of the "## tasks" section in place. */
|
|
78
|
+
async function rewriteTaskLine(cwd, id, index, render, label) {
|
|
81
79
|
const { body } = await readTaskFile(cwd, id);
|
|
82
80
|
const section = extractSection(body, 'tasks') ?? '';
|
|
83
81
|
const lines = section.split('\n');
|
|
@@ -87,15 +85,28 @@ export async function checkOffTask(cwd, id, index, producedId, title) {
|
|
|
87
85
|
continue;
|
|
88
86
|
seen++;
|
|
89
87
|
if (seen === index) {
|
|
90
|
-
lines[i] =
|
|
88
|
+
lines[i] = render();
|
|
91
89
|
break;
|
|
92
90
|
}
|
|
93
91
|
}
|
|
94
92
|
if (seen < index) {
|
|
95
|
-
throw new Error(
|
|
93
|
+
throw new Error(`${label}: index ${index} out of range in ${id} (only ${seen + 1} checkboxes found)`);
|
|
96
94
|
}
|
|
97
95
|
await setTaskSection(cwd, id, 'tasks', lines.join('\n'));
|
|
98
96
|
}
|
|
97
|
+
/** Check off the Nth checkbox line, stamping the produced TASK_NNNN id. */
|
|
98
|
+
export async function checkOffTask(cwd, id, index, producedId, title) {
|
|
99
|
+
await rewriteTaskLine(cwd, id, index, () => (producedId ? `- [x] ${producedId} ${title}` : `- [x] ${title}`), 'checkOffTask');
|
|
100
|
+
}
|
|
101
|
+
/**
|
|
102
|
+
* Stamp the inner TASK_NNNN id onto the Nth (still-unchecked) entry the moment
|
|
103
|
+
* the inner task is allocated. This links the AUTO entry to its in-progress
|
|
104
|
+
* inner task so /task-auto-resume can continue it from its saved phase instead
|
|
105
|
+
* of starting a brand-new task, matching how /task-resume behaves.
|
|
106
|
+
*/
|
|
107
|
+
export async function stampTaskInProgress(cwd, id, index, producedId, title) {
|
|
108
|
+
await rewriteTaskLine(cwd, id, index, () => `- [ ] ${producedId} ${title}`, 'stampTaskInProgress');
|
|
109
|
+
}
|
|
99
110
|
/** Find the most-recently-updated resumable TASK_AUTO_* file, or null. */
|
|
100
111
|
export async function findResumableAuto(cwd) {
|
|
101
112
|
await ensureTasksDir(cwd);
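The three states a checklist line can be in after this change (done, started but unfinished, not started) can be illustrated with a small parser. This is a sketch under assumptions: the diff does not show the package's checkbox pattern or `PRODUCED_ID_RE`, so `CHECKBOX_LINE` and `STAMPED_ID` below are guesses at their shape, and `parseChecklist` and `status` are hypothetical names.

```typescript
interface TaskEntry {
  index: number;
  title: string;
  done: boolean;
  producedId?: string;
}

// Assumed patterns: the package's real regexes are not visible in this diff.
const CHECKBOX_LINE = /^- \[([ xX])\]\s+(.*)$/;
const STAMPED_ID = /^(TASK_\d+)\s+(.*)$/;

function parseChecklist(section: string): TaskEntry[] {
  const entries: TaskEntry[] = [];
  let index = 0;
  for (const line of section.split("\n")) {
    const m = CHECKBOX_LINE.exec(line.trim());
    if (!m) continue; // skip anything that is not a checkbox line
    const done = (m[1] ?? "").toLowerCase() === "x";
    const rest = (m[2] ?? "").trim();
    const idm = STAMPED_ID.exec(rest);
    if (idm) {
      // A stamped id is kept whether or not the box is ticked.
      entries.push({ index, title: (idm[2] ?? "").trim(), done, producedId: idm[1] ?? "" });
    } else {
      entries.push({ index, title: rest, done });
    }
    index++;
  }
  return entries;
}

function status(entry: TaskEntry): "done" | "in-progress" | "pending" {
  if (entry.done) return "done";
  return entry.producedId ? "in-progress" : "pending";
}

const checklist = [
  "- [x] TASK_0007 Add tenants table",
  "- [ ] TASK_0008 Wire billing API",
  "- [ ] Write migration docs",
].join("\n");

for (const entry of parseChecklist(checklist)) {
  console.log(entry.index, status(entry), entry.title);
}
```

An unchecked line that carries an id is the new case: it marks an inner task that was allocated but did not finish.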
|
package/dist/task/auto-orchestrator.d.ts
CHANGED
|
@@ -7,7 +7,12 @@ import { type CommitResult } from './auto-commit.js';
|
|
|
7
7
|
*/
|
|
8
8
|
export interface AutoDeps {
|
|
9
9
|
runChild: (name: string, tools: string, prompt: string) => Promise<string>;
|
|
10
|
-
runTask: (ctx: ExtensionCommandContext, cwd: string, title: string
|
|
10
|
+
runTask: (ctx: ExtensionCommandContext, cwd: string, title: string, opts?: {
|
|
11
|
+
/** Resume this inner task id instead of allocating a fresh one. */
|
|
12
|
+
resumeId?: string;
|
|
13
|
+
/** Called with the inner task id once its file exists, before phases. */
|
|
14
|
+
onStart?: (taskId: string) => void | Promise<void>;
|
|
15
|
+
}) => Promise<RunSingleTaskResult>;
|
|
11
16
|
/** Snapshot the working tree into one commit after a task passes. */
|
|
12
17
|
commit: (cwd: string, message: string) => Promise<CommitResult>;
|
|
13
18
|
}
|
package/dist/task/auto-orchestrator.js
CHANGED
|
@@ -11,7 +11,7 @@ import { runSingleTask } from './orchestrator.js';
|
|
|
11
11
|
import { parseClarifyList, deriveTitle } from './parsers.js';
|
|
12
12
|
import { renderInlineMarkdown, stripInlineMarkdown } from './inline-markdown.js';
|
|
13
13
|
import { AUTO_CLARIFY_PROMPT, AUTO_DECOMPOSE_PROMPT } from './auto-prompts.js';
|
|
14
|
-
import { allocateAutoId, buildAutoBody, parseDecomposeList, parseTaskList, checkOffTask, findResumableAuto } from './auto-io.js';
|
|
14
|
+
import { allocateAutoId, buildAutoBody, parseDecomposeList, parseTaskList, checkOffTask, stampTaskInProgress, findResumableAuto } from './auto-io.js';
|
|
15
15
|
import { writeTaskFile, readTaskFile, updateTaskFrontMatter } from './task-io.js';
|
|
16
16
|
import { gitCommitAll } from './auto-commit.js';
|
|
17
17
|
import { runPhaseChild, USER_CANCELLED } from './child-runner.js';
|
|
@@ -176,7 +176,11 @@ function defaultDeps(ctx, cwd, signal, title) {
|
|
|
176
176
|
stopLoader();
|
|
177
177
|
}
|
|
178
178
|
},
|
|
179
|
-
runTask: (c, cwd2, t) => runSingleTask(c, cwd2, t, {
|
|
179
|
+
runTask: (c, cwd2, t, opts) => runSingleTask(c, cwd2, t, {
|
|
180
|
+
waitForImplementation: true,
|
|
181
|
+
resumeId: opts?.resumeId,
|
|
182
|
+
onStart: opts?.onStart
|
|
183
|
+
}),
|
|
180
184
|
commit: (cwd2, message) => gitCommitAll(cwd2, message, signal)
|
|
181
185
|
};
|
|
182
186
|
}
|
|
@@ -208,7 +212,15 @@ export async function runAutoLoop(ctx, cwd, id, deps) {
|
|
|
208
212
|
return;
|
|
209
213
|
}
|
|
210
214
|
active.ui.notify(`${id}: task ${next.index + 1}/${entries.length} - ${next.title}`, 'info');
|
|
211
|
-
|
|
215
|
+
// If this entry already has a stamped inner id, it was started in a
|
|
216
|
+
// previous (interrupted) run: resume it from its saved phase rather
|
|
217
|
+
// than spawning a fresh task. Otherwise stamp the freshly-allocated id
|
|
218
|
+
// onto the entry the moment it exists, so an interruption here is
|
|
219
|
+
// resumable too. This mirrors /task-resume's continue-don't-restart.
|
|
220
|
+
const res = await deps.runTask(active, cwd, next.title, {
|
|
221
|
+
resumeId: next.producedId,
|
|
222
|
+
onStart: next.producedId ? undefined : (innerId => stampTaskInProgress(cwd, id, next.index, innerId, next.title))
|
|
223
|
+
});
|
|
212
224
|
active = res.ctx ?? active;
|
|
213
225
|
if (res.sessionCancelled) {
|
|
214
226
|
active.ui.notify(`${id} paused - could not start a session. Run /task-auto-resume to retry.`, 'warning');
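The resume-or-stamp decision in the loop above can be modelled in memory. The sketch below is a simplified, synchronous stand-in (the real `runTask` is asynchronous and persists to disk); `Entry`, `RunOpts`, `fakeRunTask`, and `runLoop` are hypothetical names used only for illustration.

```typescript
interface Entry {
  title: string;
  done: boolean;
  producedId?: string | undefined;
}

interface RunOpts {
  resumeId?: string | undefined;
  onStart?: ((taskId: string) => void) | undefined;
}

let nextTaskNumber = 3; // arbitrary starting point for freshly allocated ids

// Stand-in for the task runner: reuses resumeId, otherwise allocates an id and reports it.
function fakeRunTask(title: string, opts: RunOpts): string {
  if (opts.resumeId !== undefined) return opts.resumeId;
  const id = `TASK_${String(nextTaskNumber++).padStart(4, "0")}`;
  opts.onStart?.(id); // the caller stamps the id before any phase runs
  return id;
}

function runLoop(entries: Entry[], log: string[]): void {
  for (const entry of entries) {
    if (entry.done) continue;
    const resuming = entry.producedId !== undefined;
    const id = fakeRunTask(entry.title, {
      resumeId: entry.producedId,
      // Only stamp when the entry has no id yet; a resumed entry keeps its id.
      onStart: resuming ? undefined : (innerId) => { entry.producedId = innerId; },
    });
    log.push(`${resuming ? "resumed" : "started"} ${id}`);
    entry.done = true;
  }
}

const demoEntries: Entry[] = [
  { title: "Add tenants table", done: true, producedId: "TASK_0001" },
  { title: "Wire billing API", done: false, producedId: "TASK_0002" },
  { title: "Write migration docs", done: false },
];
const demoLog: string[] = [];
runLoop(demoEntries, demoLog);
console.log(demoLog);
```

Stamping in `onStart`, before any phase runs, is what makes an interruption during the first phase recoverable in this model.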
|
package/dist/task/child-runner.d.ts
CHANGED
|
@@ -14,6 +14,8 @@ export interface PhaseRunResult {
|
|
|
14
14
|
exitCode: number;
|
|
15
15
|
stderr: string;
|
|
16
16
|
loopHit?: LoopHit;
|
|
17
|
+
/** Set when the assistant text contains an unexecuted, leaked tool call. */
|
|
18
|
+
leakedToolCall?: string;
|
|
17
19
|
}
|
|
18
20
|
export declare function childArgs(tools: string, prompt: string): string[];
|
|
19
21
|
export declare const USER_CANCELLED = "__user_cancelled__";
|
|
@@ -38,7 +40,13 @@ interface PhaseDeps {
|
|
|
38
40
|
spawn?: SpawnFn;
|
|
39
41
|
}
|
|
40
42
|
export type { PhaseDeps };
|
|
41
|
-
/**
|
|
43
|
+
/**
|
|
44
|
+
* Run a child pi and return its assistant text. Throws if exit code != 0.
|
|
45
|
+
*
|
|
46
|
+
* If the child leaks a tool call as plain text (wrong dialect, never executed),
|
|
47
|
+
* re-prompt with a correction hint up to MAX_LEAK_RETRIES times; if it keeps
|
|
48
|
+
* leaking, throw LeakedToolCallError rather than returning the unexecuted call.
|
|
49
|
+
*/
|
|
42
50
|
export declare function runPhaseChild(deps: PhaseDeps, name: string, tools: string, prompt: string): Promise<string>;
|
|
43
51
|
export declare function prependHint(hint: string | null, prompt: string): string;
|
|
44
52
|
/**
|
|
@@ -64,3 +72,13 @@ export declare class LoopExhaustedError extends Error {
|
|
|
64
72
|
readonly history: LoopHit[];
|
|
65
73
|
constructor(phase: string, history: LoopHit[]);
|
|
66
74
|
}
|
|
75
|
+
/**
|
|
76
|
+
* Thrown when a phase child repeatedly wrote a tool call as plain text (a markup
|
|
77
|
+
* dialect pi's harness didn't parse) instead of invoking it. The call never ran,
|
|
78
|
+
* so the phase output is untrustworthy - fail loudly rather than check it off.
|
|
79
|
+
*/
|
|
80
|
+
export declare class LeakedToolCallError extends Error {
|
|
81
|
+
readonly phase: string;
|
|
82
|
+
readonly marker: string;
|
|
83
|
+
constructor(phase: string, marker: string);
|
|
84
|
+
}
|
|
package/dist/task/child-runner.js
CHANGED
|
@@ -9,6 +9,7 @@ import { spawn } from 'node:child_process';
|
|
|
9
9
|
import { getPiInvocation } from '../shared/pi-invocation.js';
|
|
10
10
|
import { runChild as runChildUnified, CHILD_BASE_ARGS } from '../shared/child-process.js';
|
|
11
11
|
import { LoopDetector } from './loop-detector.js';
|
|
12
|
+
import { detectLeakedToolCall, leakedToolCallHint, MAX_LEAK_RETRIES } from '../shared/leaked-tool-call.js';
|
|
12
13
|
import { readSection, setTaskSection } from './task-file.js';
|
|
13
14
|
// ─── Loop detection constants ────────────────────────────────────────────────
|
|
14
15
|
// Defined here (not in phases.ts) to avoid a circular dependency:
|
|
@@ -57,28 +58,52 @@ export async function runChild(cwd, tools, prompt, signal, onLine, onContextUsag
|
|
|
57
58
|
return hit; // propagate to unified runner so it can kill
|
|
58
59
|
}
|
|
59
60
|
});
|
|
61
|
+
// Use `||` (not `??`) so an empty string from json-events mode falls
|
|
62
|
+
// back to raw stdout. Without this, a child that exits 0 but emits no
|
|
63
|
+
// assistant text (e.g. model API error swallowed in json mode) always
|
|
64
|
+
// fails with the unhelpful "X child produced no output" - the raw
|
|
65
|
+
// stdout/stderr that might contain the real error is discarded.
|
|
66
|
+
const text = result.text || result.stdout.trim();
|
|
60
67
|
return {
|
|
61
|
-
|
|
62
|
-
// back to raw stdout. Without this, a child that exits 0 but emits no
|
|
63
|
-
// assistant text (e.g. model API error swallowed in json mode) always
|
|
64
|
-
// fails with the unhelpful "X child produced no output" - the raw
|
|
65
|
-
// stdout/stderr that might contain the real error is discarded.
|
|
66
|
-
text: result.text || result.stdout.trim(),
|
|
68
|
+
text,
|
|
67
69
|
exitCode: result.exitCode,
|
|
68
70
|
stderr: result.stderr.trim(),
|
|
69
|
-
loopHit
|
|
71
|
+
loopHit,
|
|
72
|
+
// A tool call the model wrote as text (wrong dialect) never executed and
|
|
73
|
+
// sailed past the structured-event guards above; flag it so the wrappers
|
|
74
|
+
// can re-prompt instead of accepting the unexecuted call. Only meaningful
|
|
75
|
+
// when the run otherwise succeeded - a loop kill truncates text mid-stream.
|
|
76
|
+
leakedToolCall: loopHit ? undefined : (detectLeakedToolCall(text) ?? undefined)
|
|
70
77
|
};
|
|
71
78
|
}
|
|
72
|
-
/**
|
|
79
|
+
/**
|
|
80
|
+
* Run a child pi and return its assistant text. Throws if exit code != 0.
|
|
81
|
+
*
|
|
82
|
+
* If the child leaks a tool call as plain text (wrong dialect, never executed),
|
|
83
|
+
* re-prompt with a correction hint up to MAX_LEAK_RETRIES times; if it keeps
|
|
84
|
+
* leaking, throw LeakedToolCallError rather than returning the unexecuted call.
|
|
85
|
+
*/
|
|
73
86
|
export async function runPhaseChild(deps, name, tools, prompt) {
|
|
74
|
-
|
|
75
|
-
|
|
76
|
-
|
|
77
|
-
|
|
78
|
-
|
|
79
|
-
|
|
87
|
+
let hint = null;
|
|
88
|
+
for (let attempt = 0; attempt <= MAX_LEAK_RETRIES; attempt++) {
|
|
89
|
+
const r = await runChild(deps.cwd, tools, prependHint(hint, prompt), deps.signal, deps.onChildOutput, deps.onContextUsage, undefined, deps.spawn);
|
|
90
|
+
if (r.exitCode !== 0) {
|
|
91
|
+
throw new Error(`${name} child failed: ${r.stderr || '(no stderr)'}`);
|
|
92
|
+
}
|
|
93
|
+
if (r.text.trim().length === 0) {
|
|
94
|
+
throw new Error(`${name} child produced no output${r.stderr ? ' - stderr: ' + r.stderr : ''}`);
|
|
95
|
+
}
|
|
96
|
+
if (r.leakedToolCall) {
|
|
97
|
+
if (attempt === MAX_LEAK_RETRIES) {
|
|
98
|
+
throw new LeakedToolCallError(name, r.leakedToolCall);
|
|
99
|
+
}
|
|
100
|
+
hint = leakedToolCallHint(r.leakedToolCall);
|
|
101
|
+
continue;
|
|
102
|
+
}
|
|
103
|
+
return r.text;
|
|
80
104
|
}
|
|
81
|
-
|
|
105
|
+
// Unreachable: the loop returns clean text or throws on the final leak.
|
|
106
|
+
throw new LeakedToolCallError(name, '(unknown)');
|
|
82
107
|
}
|
|
83
108
|
function formatLoopHint(hit) {
|
|
84
109
|
const argsStr = JSON.stringify(hit.call.args);
|
|
@@ -106,12 +131,13 @@ async function appendLoopEvent(cwd, taskId, phase, hit, strike, outcome) {
|
|
|
106
131
|
*/
|
|
107
132
|
export async function runPhaseWithLoopGuard(deps, name, tools, buildPrompt) {
|
|
108
133
|
const loopHistory = [];
|
|
134
|
+
// Carries the correction hint (loop OR leaked-tool-call) into the next strike.
|
|
135
|
+
let nextHint = null;
|
|
109
136
|
for (let strike = 0; strike <= MAX_LOOP_RESTARTS; strike++) {
|
|
110
137
|
if (deps.signal.aborted)
|
|
111
138
|
throw new Error(USER_CANCELLED);
|
|
112
139
|
const detector = new LoopDetector(LOOP_WINDOW, LOOP_THRESHOLD);
|
|
113
|
-
const
|
|
114
|
-
const prompt = buildPrompt(hint);
|
|
140
|
+
const prompt = buildPrompt(nextHint);
|
|
115
141
|
const r = await runChild(deps.cwd, tools, prompt, deps.signal, deps.onChildOutput, deps.onContextUsage, call => detector.record(call), deps.spawn);
|
|
116
142
|
if (deps.signal.aborted)
|
|
117
143
|
throw new Error(USER_CANCELLED);
|
|
@@ -121,6 +147,7 @@ export async function runPhaseWithLoopGuard(deps, name, tools, buildPrompt) {
|
|
|
121
147
|
await appendLoopEvent(deps.cwd, deps.taskId, name, r.loopHit, strike + 1, isLastStrike ? 'phase failed' : 'restarted with hint');
|
|
122
148
|
if (isLastStrike)
|
|
123
149
|
throw new LoopExhaustedError(name, loopHistory);
|
|
150
|
+
nextHint = formatLoopHint(r.loopHit);
|
|
124
151
|
continue;
|
|
125
152
|
}
|
|
126
153
|
if (r.exitCode !== 0) {
|
|
@@ -129,6 +156,13 @@ export async function runPhaseWithLoopGuard(deps, name, tools, buildPrompt) {
|
|
|
129
156
|
if (r.text.trim().length === 0) {
|
|
130
157
|
throw new Error(`${name} child produced no output${r.stderr ? ' - stderr: ' + r.stderr : ''}`);
|
|
131
158
|
}
|
|
159
|
+
if (r.leakedToolCall) {
|
|
160
|
+
if (strike === MAX_LOOP_RESTARTS) {
|
|
161
|
+
throw new LeakedToolCallError(name, r.leakedToolCall);
|
|
162
|
+
}
|
|
163
|
+
nextHint = leakedToolCallHint(r.leakedToolCall);
|
|
164
|
+
continue;
|
|
165
|
+
}
|
|
132
166
|
return r.text;
|
|
133
167
|
}
|
|
134
168
|
throw new LoopExhaustedError(name, loopHistory);
|
|
@@ -160,3 +194,20 @@ export class LoopExhaustedError extends Error {
|
|
|
160
194
|
this.name = 'LoopExhaustedError';
|
|
161
195
|
}
|
|
162
196
|
}
|
|
197
|
+
// ─── LeakedToolCallError ─────────────────────────────────────────────────────
|
|
198
|
+
/**
|
|
199
|
+
* Thrown when a phase child repeatedly wrote a tool call as plain text (a markup
|
|
200
|
+
* dialect pi's harness didn't parse) instead of invoking it. The call never ran,
|
|
201
|
+
* so the phase output is untrustworthy - fail loudly rather than check it off.
|
|
202
|
+
*/
|
|
203
|
+
export class LeakedToolCallError extends Error {
|
|
204
|
+
phase;
|
|
205
|
+
marker;
|
|
206
|
+
constructor(phase, marker) {
|
|
207
|
+
super(`${phase} child wrote a tool call as text instead of invoking it `
|
|
208
|
+
+ `(${marker.trim()}) - it never ran`);
|
|
209
|
+
this.phase = phase;
|
|
210
|
+
this.marker = marker;
|
|
211
|
+
this.name = 'LeakedToolCallError';
|
|
212
|
+
}
|
|
213
|
+
}
|
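The re-prompt loop added to `runPhaseChild` can be illustrated with a simplified, synchronous sketch. Everything here is an assumption made for illustration: the real detector and hint live in `shared/leaked-tool-call.js`, which is not shown in this part of the diff, so `detectLeak`, `CORRECTION_HINT`, and the value of `MAX_RETRIES` are hypothetical.

```javascript
// Hypothetical retry budget; the package's MAX_LEAK_RETRIES may differ.
const MAX_RETRIES = 2;
// Hypothetical hint text; the package builds its own via leakedToolCallHint().
const CORRECTION_HINT = 'Invoke tools through the tool interface; do not write them as text.';

// Toy detector: treats a <tool_call>...</tool_call> span in the text as a leak.
function detectLeak(text) {
    const match = /<tool_call>[\s\S]*?<\/tool_call>/.exec(text);
    return match ? match[0] : null;
}

// `runOnce(prompt)` stands in for spawning a child and returning its text.
function runWithLeakRetry(runOnce, prompt) {
    let hint = null;
    for (let attempt = 0; attempt <= MAX_RETRIES; attempt++) {
        const text = runOnce(hint === null ? prompt : `${hint}\n\n${prompt}`);
        const leaked = detectLeak(text);
        if (leaked === null) return text; // clean output: accept it
        if (attempt === MAX_RETRIES) {
            // Out of retries: fail loudly instead of returning the unexecuted call.
            throw new Error(`leaked tool call: ${leaked}`);
        }
        hint = CORRECTION_HINT; // re-prompt with a correction on the next attempt
    }
    throw new Error('unreachable');
}
```

With `MAX_RETRIES = 2` the child runs at most three times: the initial attempt plus two corrected re-prompts.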
|
package/dist/task/failure-classifier.js
CHANGED
|
@@ -4,7 +4,7 @@
|
|
|
4
4
|
*/
|
|
5
5
|
import { updateTaskFrontMatter } from './task-file.js';
|
|
6
6
|
import { flashTerminalWidget } from './widget.js';
|
|
7
|
-
import { LoopExhaustedError, USER_CANCELLED } from './child-runner.js';
|
|
7
|
+
import { LoopExhaustedError, LeakedToolCallError, USER_CANCELLED } from './child-runner.js';
|
|
8
8
|
// ─── Classifier ──────────────────────────────────────────────────────────────
|
|
9
9
|
export function classifyFailure(err, aborted) {
|
|
10
10
|
const msg = err instanceof Error ? err.message : String(err);
|
|
@@ -20,6 +20,15 @@ export function classifyFailure(err, aborted) {
|
|
|
20
20
|
level: 'error'
|
|
21
21
|
};
|
|
22
22
|
}
|
|
23
|
+
if (err instanceof LeakedToolCallError) {
|
|
24
|
+
return {
|
|
25
|
+
state: 'failed',
|
|
26
|
+
reason: `leaked tool call in ${err.phase}: ${err.marker.trim()}`,
|
|
27
|
+
flash: 'leaked_tool_call',
|
|
28
|
+
notify: `failed: ${err.phase} wrote a tool call as text instead of running it - it never executed. Resume to retry.`,
|
|
29
|
+
level: 'error'
|
|
30
|
+
};
|
|
31
|
+
}
|
|
23
32
|
if (msg === 'no_verify_block') {
|
|
24
33
|
return {
|
|
25
34
|
state: 'failed',
|
|
package/dist/task/orchestrator.d.ts
CHANGED
|
@@ -25,6 +25,7 @@ export declare class TaskRunner {
|
|
|
25
25
|
private readonly _rawPrompt;
|
|
26
26
|
private readonly _resumeId;
|
|
27
27
|
private readonly _sendSpec;
|
|
28
|
+
private readonly _onStart;
|
|
28
29
|
private readonly _abort;
|
|
29
30
|
private readonly _startedAt;
|
|
30
31
|
private readonly _widgetState;
|
|
@@ -40,7 +41,7 @@ export declare class TaskRunner {
|
|
|
40
41
|
*/
|
|
41
42
|
private readonly _timings;
|
|
42
43
|
private _currentPhaseChildren;
|
|
43
|
-
constructor(ctx: ExtensionCommandContext, cwd: string, rawPrompt: string, resumeId?: string, sendSpec?: (spec: string) => Promise<void>, spawnFn?: SpawnFn);
|
|
44
|
+
constructor(ctx: ExtensionCommandContext, cwd: string, rawPrompt: string, resumeId?: string, sendSpec?: (spec: string) => Promise<void>, spawnFn?: SpawnFn, onStart?: (taskId: string) => void | Promise<void>);
|
|
44
45
|
get taskId(): string;
|
|
45
46
|
get signal(): AbortSignal;
|
|
46
47
|
/** Return the current widget state, or null if not started. */
|
|
@@ -64,6 +65,10 @@ export interface RunSingleTaskOptions {
|
|
|
64
65
|
resumeId?: string;
|
|
65
66
|
/** Test seam: spawn function forwarded to TaskRunner. */
|
|
66
67
|
spawnFn?: SpawnFn;
|
|
68
|
+
/** Called with the resolved task id once its file exists, before any phase
|
|
69
|
+
* work. Lets callers record the id (e.g. stamp the /task-auto entry) so an
|
|
70
|
+
* interrupted run can be resumed instead of restarted. */
|
|
71
|
+
onStart?: (taskId: string) => void | Promise<void>;
|
|
67
72
|
}
|
|
68
73
|
export interface RunSingleTaskResult {
|
|
69
74
|
taskId: string;
|
|
package/dist/task/orchestrator.js
CHANGED
|
@@ -45,6 +45,7 @@ export class TaskRunner {
|
|
|
45
45
|
_rawPrompt;
|
|
46
46
|
_resumeId;
|
|
47
47
|
_sendSpec;
|
|
48
|
+
_onStart;
|
|
48
49
|
_abort = new AbortController();
|
|
49
50
|
_startedAt;
|
|
50
51
|
_widgetState;
|
|
@@ -60,12 +61,13 @@ export class TaskRunner {
|
|
|
60
61
|
*/
|
|
61
62
|
_timings = [];
|
|
62
63
|
_currentPhaseChildren = null;
|
|
63
|
-
constructor(ctx, cwd, rawPrompt, resumeId, sendSpec, spawnFn) {
|
|
64
|
+
constructor(ctx, cwd, rawPrompt, resumeId, sendSpec, spawnFn, onStart) {
|
|
64
65
|
this._ctx = ctx;
|
|
65
66
|
this._cwd = cwd;
|
|
66
67
|
this._rawPrompt = rawPrompt;
|
|
67
68
|
this._resumeId = resumeId;
|
|
68
69
|
this._sendSpec = sendSpec;
|
|
70
|
+
this._onStart = onStart;
|
|
69
71
|
this._startedAt = Date.now();
|
|
70
72
|
// We'll populate id/title/phase lazily in run().
|
|
71
73
|
// Placeholder - real values set in run().
|
|
@@ -157,6 +159,11 @@ export class TaskRunner {
|
|
|
157
159
|
};
|
|
158
160
|
await writeTaskFile(cwd, fm, `\n## raw prompt\n\n${this._rawPrompt.trim() || '(none)'}\n`);
|
|
159
161
|
}
|
|
162
|
+
// Surface the resolved id now that the task file exists, so callers (e.g.
|
|
163
|
+
// the /task-auto loop) can link this run to their own bookkeeping before
|
|
164
|
+
// any phase work - and recover it if the session dies mid-pipeline.
|
|
165
|
+
if (this._onStart)
|
|
166
|
+
await this._onStart(id);
|
|
160
167
|
// Register as active.
|
|
161
168
|
this._widgetState.taskId = id;
|
|
162
169
|
this._widgetState.title = title;
|
|
@@ -279,7 +286,7 @@ export async function runSingleTask(ctx, cwd, rawPrompt, opts = {}) {
|
|
|
279
286
|
await newCtx.sendUserMessage(spec);
|
|
280
287
|
if (opts.waitForImplementation)
|
|
281
288
|
await newCtx.waitForIdle();
|
|
282
|
-
}, opts.spawnFn);
|
|
289
|
+
}, opts.spawnFn, opts.onStart);
|
|
283
290
|
await runner.run();
|
|
284
291
|
taskId = runner.taskId;
|
|
285
292
|
}
|
package/dist/task/phases.js
CHANGED
|
@@ -233,6 +233,10 @@ export async function phaseResearch(deps, refined, researchDeps = {}) {
|
|
|
233
233
|
if (result.text.trim().length === 0) {
|
|
234
234
|
throw new Error(`Research ${name} worker produced no output`);
|
|
235
235
|
}
|
|
236
|
+
if (result.leakedToolCall) {
|
|
237
|
+
throw new Error(`Research ${name} worker wrote a tool call as text instead of invoking it `
|
|
238
|
+
+ `(${result.leakedToolCall.trim()}) - it never ran`);
|
|
239
|
+
}
|
|
236
240
|
}
|
|
237
241
|
return `FILES\n${files.text}\n\nAPIS\n${apis.text}\n\nCONTEXT\n${context.text}\n\nTOOLING\n${tooling.text}`;
|
|
238
242
|
}
|
|
package/dist/workers/html-clean.js
CHANGED
|
@@ -1,3 +1,6 @@
|
|
|
1
|
+
import { readFileSync } from 'node:fs';
|
|
2
|
+
import { fileURLToPath } from 'node:url';
|
|
3
|
+
import { dirname, join } from 'node:path';
|
|
1
4
|
import { JSDOM } from 'jsdom';
|
|
2
5
|
import { Readability } from '@mozilla/readability';
|
|
3
6
|
import TurndownService from 'turndown';
|
|
@@ -29,8 +32,57 @@ export function cleanHtml(html, baseUrl) {
|
|
|
29
32
|
}
|
|
30
33
|
const DEFAULT_TIMEOUT_MS = 15_000;
|
|
31
34
|
const DEFAULT_MAX_BYTES = 2 * 1024 * 1024; // 2 MB
|
|
32
|
-
const PKG_VERSION =
|
|
35
|
+
const PKG_VERSION = readPkgVersion();
|
|
33
36
|
const USER_AGENT = `pi-worker/${PKG_VERSION} (+https://npmjs.com/package/@mjasnikovs/pi-worker)`;
|
|
37
|
+
// Read the version from package.json at runtime so the User-Agent never drifts
|
|
38
|
+
// out of sync with releases. Two levels up holds for both src/workers (tests)
|
|
39
|
+
// and dist/workers (build) since tsc preserves the layout under rootDir.
|
|
40
|
+
function readPkgVersion() {
|
|
41
|
+
try {
|
|
42
|
+
const here = dirname(fileURLToPath(import.meta.url));
|
|
43
|
+
const pkg = JSON.parse(readFileSync(join(here, '..', '..', 'package.json'), 'utf8'));
|
|
44
|
+
return typeof pkg.version === 'string' ? pkg.version : '0.0.0';
|
|
45
|
+
}
|
|
46
|
+
catch {
|
|
47
|
+
return '0.0.0';
|
|
48
|
+
}
|
|
49
|
+
}
|
|
50
|
+
// Decide how to handle a response based on its content-type. HTML is run through
|
|
51
|
+
// the readability/turndown pipeline; text-ish formats (markdown, plain text,
|
|
52
|
+
// JSON, XML/feeds) are already clean and pass through verbatim; binary formats
|
|
53
|
+
// (PDF, images, octet-stream, ...) are rejected. A missing content-type is treated
|
|
54
|
+
// as text - many plain-text endpoints (llms.txt, robots.txt) omit the header.
|
|
55
|
+
function classifyContentType(contentType) {
|
|
56
|
+
const mime = contentType.split(';')[0].trim().toLowerCase();
|
|
57
|
+
if (mime === '')
|
|
58
|
+
return 'text';
|
|
59
|
+
if (mime === 'text/html' || mime === 'application/xhtml+xml')
|
|
60
|
+
return 'html';
|
|
61
|
+
if (mime.startsWith('text/'))
|
|
62
|
+
return 'text';
|
|
63
|
+
if (mime === 'application/json' || mime.endsWith('+json'))
|
|
64
|
+
return 'text';
|
|
65
|
+
if (mime === 'application/xml' || mime.endsWith('+xml'))
|
|
66
|
+
return 'text';
|
|
67
|
+
if (mime === 'application/javascript' || mime === 'application/ecmascript')
|
|
68
|
+
return 'text';
|
|
69
|
+
return 'reject';
|
|
70
|
+
}
|
|
71
|
+
// Extract the charset from a content-type header, if present and supported by
|
|
72
|
+
// TextDecoder; otherwise fall back to UTF-8 so non-UTF-8 pages aren't mangled.
|
|
73
|
+
function decoderFor(contentType) {
|
|
74
|
+
const match = /charset=([^;]+)/i.exec(contentType);
|
|
75
|
+
const charset = match?.[1]?.trim().replace(/^["']|["']$/g, '');
|
|
76
|
+
if (charset) {
|
|
77
|
+
try {
|
|
78
|
+
return new TextDecoder(charset, { fatal: false });
|
|
79
|
+
}
|
|
80
|
+
catch {
|
|
81
|
+
// Unknown/unsupported label - fall through to UTF-8.
|
|
82
|
+
}
|
|
83
|
+
}
|
|
84
|
+
return new TextDecoder('utf-8', { fatal: false });
|
|
85
|
+
}
|
|
34
86
|
export class FetchAndCleanError extends Error {
|
|
35
87
|
kind;
|
|
36
88
|
cause;
|
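The two steps inside `decoderFor` (pull the charset label out of the header, then fall back to UTF-8 when `TextDecoder` rejects it) can be tried separately. A small sketch under the assumption that splitting them is useful for illustration; `charsetOf` and `safeDecoder` are hypothetical names, not exports of the package.

```javascript
// Extract the charset label from a content-type header, or null if absent.
// Uses the same regex and quote-stripping as decoderFor in the hunk above.
function charsetOf(contentType) {
    const match = /charset=([^;]+)/i.exec(contentType);
    return match?.[1]?.trim().replace(/^["']|["']$/g, '') ?? null;
}

// Build a decoder for the label, falling back to UTF-8 when the label is
// missing or not recognised (TextDecoder throws a RangeError for those).
function safeDecoder(label) {
    if (label) {
        try {
            return new TextDecoder(label, { fatal: false });
        } catch {
            // Unknown label: fall through to UTF-8.
        }
    }
    return new TextDecoder('utf-8', { fatal: false });
}
```

`safeDecoder(charsetOf(header))` then behaves like the combined function: a quoted or unquoted label is honoured when supported, and anything else decodes as UTF-8.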
|
@@ -77,15 +129,16 @@ export async function fetchAndClean(url, opts = {}) {
|
|
|
77
129
|
throw new FetchAndCleanError(`Fetch failed: HTTP ${response.status} ${response.statusText} for ${url}`, 'http-error');
|
|
78
130
|
}
|
|
79
131
|
const contentType = response.headers.get('content-type') ?? '';
|
|
80
|
-
|
|
81
|
-
|
|
132
|
+
const kind = classifyContentType(contentType);
|
|
133
|
+
if (kind === 'reject') {
|
|
134
|
+
throw new FetchAndCleanError(`${url} is ${contentType || 'unknown content type'}, not a text or HTML page that pi-worker-fetch can read.`, 'not-html');
|
|
82
135
|
}
|
|
83
136
|
const reader = response.body?.getReader();
|
|
84
137
|
if (!reader) {
|
|
85
138
|
throw new FetchAndCleanError(`Could not fetch ${url}: empty response body`, 'network');
|
|
86
139
|
}
|
|
87
|
-
const decoder =
|
|
88
|
-
let
|
|
140
|
+
const decoder = decoderFor(contentType);
|
|
141
|
+
let text = '';
|
|
89
142
|
let bytesRead = 0;
|
|
90
143
|
try {
|
|
91
144
|
while (true) {
|
|
@@ -99,10 +152,10 @@ export async function fetchAndClean(url, opts = {}) {
|
|
|
99
152
|
internalController.abort();
|
|
100
153
|
break;
|
|
101
154
|
}
|
|
102
|
-
|
|
155
|
+
text += decoder.decode(value, { stream: true });
|
|
103
156
|
}
|
|
104
157
|
}
|
|
105
|
-
|
|
158
|
+
text += decoder.decode();
|
|
106
159
|
}
|
|
107
160
|
catch (err) {
|
|
108
161
|
if (sizeExceeded) {
|
|
@@ -119,8 +172,15 @@ export async function fetchAndClean(url, opts = {}) {
|
|
|
119
172
|
throw new FetchAndCleanError(`${url} exceeds ${formatBytes(maxBytes)} size cap. Try a more specific URL.`, 'too-large');
|
|
120
173
|
}
|
|
121
174
|
const finalUrl = response.url || url;
|
|
122
|
-
|
|
123
|
-
|
|
175
|
+
if (kind === 'html') {
|
|
176
|
+
return cleanHtml(text, finalUrl);
|
|
177
|
+
}
|
|
178
|
+
// text-ish formats are already clean - return them verbatim.
|
|
179
|
+
return {
|
|
180
|
+
title: hostnameOf(finalUrl),
|
|
181
|
+
markdown: text.trim(),
|
|
182
|
+
finalUrl
|
|
183
|
+
};
|
|
124
184
|
}
|
|
125
185
|
finally {
|
|
126
186
|
clearTimeout(timeoutHandle);
|
|
@@ -128,6 +188,14 @@ export async function fetchAndClean(url, opts = {}) {
|
|
|
128
188
|
opts.signal.removeEventListener('abort', onUserAbort);
|
|
129
189
|
}
|
|
130
190
|
}
|
|
191
|
+
function hostnameOf(url) {
|
|
192
|
+
try {
|
|
193
|
+
return new URL(url).hostname;
|
|
194
|
+
}
|
|
195
|
+
catch {
|
|
196
|
+
return url;
|
|
197
|
+
}
|
|
198
|
+
}
|
|
131
199
|
function describeError(err) {
|
|
132
200
|
if (err instanceof Error)
|
|
133
201
|
return err.message;
|
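The switch to `decoder.decode(value, { stream: true })` followed by a final `decoder.decode()` matters when a multi-byte character is split across two network chunks. A minimal sketch of that behaviour, using only the standard `TextDecoder`; `decodeChunks` is a hypothetical helper written for this illustration.

```javascript
// Decode byte chunks incrementally, the way a fetch body reader yields them.
function decodeChunks(chunks, charset = 'utf-8') {
    const decoder = new TextDecoder(charset, { fatal: false });
    let text = '';
    for (const chunk of chunks) {
        // stream: true keeps an incomplete multi-byte sequence buffered
        // until the next chunk arrives instead of emitting U+FFFD.
        text += decoder.decode(chunk, { stream: true });
    }
    // A final call with no input flushes anything still buffered.
    text += decoder.decode();
    return text;
}

// "é" is 0xC3 0xA9 in UTF-8; here the two bytes land in different chunks.
const joined = decodeChunks([new Uint8Array([0x68, 0xc3]), new Uint8Array([0xa9])]);
console.log(joined); // hé
```

Decoding each chunk independently, without `stream: true`, would turn the split character into replacement characters.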
|
package/dist/workers/pi-worker-core.d.ts
CHANGED
|
@@ -24,5 +24,11 @@ export interface RunWorkerResult {
|
|
|
24
24
|
* elapsed when the child never produced output.
|
|
25
25
|
*/
|
|
26
26
|
workMs: number;
|
|
27
|
+
/**
|
|
28
|
+
* Set when the worker exhausted its re-prompts still leaking a tool call as
|
|
29
|
+
* text (wrong dialect, never executed). The caller must treat this as a
|
|
30
|
+
* failure rather than trusting the returned text.
|
|
31
|
+
*/
|
|
32
|
+
leakedToolCall?: string;
|
|
27
33
|
}
|
|
28
34
|
export declare function runWorker(input: RunWorkerInput): Promise<RunWorkerResult>;
|
|
package/dist/workers/pi-worker-core.js
CHANGED
|
@@ -1,6 +1,7 @@
|
|
|
1
1
|
import { getPiInvocation } from '../shared/pi-invocation.js';
|
|
2
2
|
import { CHILD_BASE_ARGS, runChildDefault } from '../shared/child-process.js';
|
|
3
3
|
import { LoopDetector } from '../task/loop-detector.js';
|
|
4
|
+
import { detectLeakedToolCall, leakedToolCallHint, MAX_LEAK_RETRIES } from '../shared/leaked-tool-call.js';
|
|
4
5
|
// `--mode json` makes pi emit structured events as they happen instead of
|
|
5
6
|
// buffering the assistant text and flushing on exit. That matters for the
|
|
6
7
|
// wait/work timing split: in text mode the first stdout chunk only arrives at
|
|
@@ -11,25 +12,39 @@ import { LoopDetector } from '../task/loop-detector.js';
|
|
|
11
12
|
const DEFAULT_TOOLS = 'read,grep,find,ls';
|
|
12
13
|
export async function runWorker(input) {
|
|
13
14
|
const tools = input.tools ?? DEFAULT_TOOLS;
|
|
14
|
-
const
|
|
15
|
-
|
|
16
|
-
|
|
17
|
-
|
|
18
|
-
|
|
19
|
-
|
|
20
|
-
|
|
21
|
-
|
|
22
|
-
|
|
23
|
-
|
|
24
|
-
|
|
25
|
-
|
|
26
|
-
|
|
27
|
-
|
|
28
|
-
|
|
29
|
-
|
|
30
|
-
|
|
31
|
-
|
|
32
|
-
|
|
33
|
-
|
|
34
|
-
|
|
15
|
+
const baseArgs = [...CHILD_BASE_ARGS, '--mode', 'json', '--tools', tools];
|
|
16
|
+
let hint = null;
|
|
17
|
+
for (let attempt = 0;; attempt++) {
|
|
18
|
+
const prompt = hint === null ? input.prompt : `${hint}\n\n${input.prompt}`;
|
|
19
|
+
const invocation = getPiInvocation([...baseArgs, prompt]);
|
|
20
|
+
const tStart = Date.now();
|
|
21
|
+
let tFirstByte = null;
|
|
22
|
+
const loopDetector = new LoopDetector(20, 5);
|
|
23
|
+
const result = await runChildDefault(invocation, input.cwd, input.signal, {
|
|
24
|
+
mode: 'json-events',
|
|
25
|
+
onFirstByte: () => (tFirstByte = Date.now()),
|
|
26
|
+
onToolCall: call => loopDetector.record(call)
|
|
27
|
+
}, input.spawn);
|
|
28
|
+
const tEnd = Date.now();
|
|
29
|
+
const waitMs = tFirstByte === null ? tEnd - tStart : tFirstByte - tStart;
|
|
30
|
+
const workMs = tFirstByte === null ? 0 : tEnd - tFirstByte;
|
|
31
|
+
const text = result.text ?? '';
|
|
32
|
+
// Only treat output as a leak on a clean, complete run - a non-zero exit
|
|
33
|
+
// or abort yields partial text the caller already handles, and detecting
|
|
34
|
+
// there would just mislabel the real failure.
|
|
35
|
+
const leaked = result.exitCode === 0 && !result.aborted ? detectLeakedToolCall(text) : null;
|
|
36
|
+
if (leaked && attempt < MAX_LEAK_RETRIES) {
|
|
37
|
+
hint = leakedToolCallHint(leaked);
|
|
38
|
+
continue;
|
|
39
|
+
}
|
|
40
|
+
return {
|
|
41
|
+
text,
|
|
42
|
+
exitCode: result.exitCode,
|
|
43
|
+
stderr: result.stderr.trim(),
|
|
44
|
+
aborted: result.aborted,
|
|
45
|
+
waitMs,
|
|
46
|
+
workMs,
|
|
47
|
+
...(leaked ? { leakedToolCall: leaked } : {})
|
|
48
|
+
};
|
|
49
|
+
}
|
|
35
50
|
}
|
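The wait/work split computed inside `runWorker` is plain arithmetic on three timestamps. A short sketch of the same two expressions as a standalone function; `splitTiming` is a hypothetical name used only here.

```javascript
// Split elapsed time into "waiting for the first byte" and "producing output".
// tFirstByte is null when the child never wrote anything.
function splitTiming(tStart, tFirstByte, tEnd) {
    if (tFirstByte === null) {
        // No output at all: the whole run counts as waiting.
        return { waitMs: tEnd - tStart, workMs: 0 };
    }
    return { waitMs: tFirstByte - tStart, workMs: tEnd - tFirstByte };
}

console.log(splitTiming(100, 250, 900)); // { waitMs: 150, workMs: 650 }
```

Because the timestamps are now taken inside the retry loop, each re-prompted attempt measures its own wait and work time, and the returned values describe the final attempt.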
|
package/dist/workers/pi-worker-fetch.js
CHANGED
|
@@ -14,10 +14,11 @@ export function registerPiWorkerFetch(pi, internals = {}) {
|
|
|
14
14
|
pi.registerTool({
|
|
15
15
|
name: 'pi-worker-fetch',
|
|
16
16
|
label: 'Pi Worker Fetch',
|
|
17
|
-
description: 'Fetch
|
|
18
|
-
+ '
|
|
19
|
-
+ '
|
|
20
|
-
+ '
|
|
17
|
+
description: 'Fetch a web page or text resource (HTML, markdown, plain text, JSON, '
|
|
18
|
+
+ 'XML/feeds), clean HTML to markdown, and hand it to an isolated child '
|
|
19
|
+
+ 'Pi session that extracts ONLY content answering `query`. Returns the '
|
|
20
|
+
+ 'focused answer. Use after `pi-worker-search` (or with a known URL) to '
|
|
21
|
+
+ 'avoid stuffing raw content into the main context.',
|
|
21
22
|
parameters: Params,
|
|
22
23
|
executionMode: 'parallel',
|
|
23
24
|
async execute(_toolCallId, params, signal, _onUpdate, ctx) {
|
package/package.json
CHANGED
|
@@ -1,12 +1,13 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "@mjasnikovs/pi-task",
|
|
3
|
-
"version": "0.7.
|
|
3
|
+
"version": "0.7.3",
|
|
4
4
|
"description": "Deterministic spec-orchestration for local models, with a bundled real-time remote web view and web/docs/fetch/worker subagent tools.",
|
|
5
5
|
"type": "module",
|
|
6
6
|
"main": "./dist/index.js",
|
|
7
7
|
"types": "./dist/index.d.ts",
|
|
8
8
|
"files": [
|
|
9
9
|
"dist",
|
|
10
|
+
"assets",
|
|
10
11
|
"README.md",
|
|
11
12
|
"LICENSE"
|
|
12
13
|
],
|