@sebastianandreasson/pi-autonomous-agents 0.5.0 → 0.5.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (2)
  1. package/README.md +249 -81
  2. package/package.json +1 -1
package/README.md CHANGED
@@ -1,58 +1,79 @@
- # PI Harness
+ # PI Autonomous Agents
 
- `pi-harness` is a portable CLI/workflow package for running a local PI-based unattended loop with:
+ `@sebastianandreasson/pi-autonomous-agents` is an npm package for running a bounded unattended [PI](https://pi.dev/) workflow inside another repository.
 
- - a `developer` pass
- - a fast verification step
- - a skeptical `tester` pass
- - optional periodic multimodal visual review
- - tester-owned final commit by default
+ It orchestrates:
 
- The package is intentionally generic. It does not know how to navigate or test a specific app on its own.
+ - a `developer` turn
+ - a fast local verification step
+ - an independent `tester` turn
+ - an optional focused `developerFix` turn when verification/tester finds a real issue
+ - optional periodic visual review from screenshots
 
- ## What Belongs In The Package
+ The package is intentionally generic. It handles supervision, prompts, runtime state, telemetry, retries, and guardrails. The consuming repo still owns its own tasks, instructions, tests, model endpoints, and screenshot capture flow.
 
- - supervisor/orchestration
- - PI adapter/runtime integration
+ ## Install
+
+ ```bash
+ npm install -D @sebastianandreasson/pi-autonomous-agents
+ ```
+
+ Then in the consuming repo, tell your agent:
+
+ ```text
+ Find SETUP.md in @sebastianandreasson/pi-autonomous-agents and set everything up for this repository.
+ ```
+
+ The package ships a top-level [SETUP.md](./SETUP.md) specifically for that workflow.
+
+ ## What This Package Owns
+
+ - unattended loop orchestration
+ - PI adapter integration
  - config loading
- - telemetry
- - loop guards, timeout guards, and retries
- - tester feedback + visual feedback handoff
- - optional legacy harness git finalize step for `commitMode: "plan"`
- - multimodal visual review client
+ - prompt assembly
+ - verification/tester/visual-review handoff
+ - timeout and loop guards
+ - telemetry and run summaries
+ - runtime isolation and stale-run recovery
 
- ## What Stays Per Project
+ ## What Each Repo Must Provide
 
  - `TODOS.md`
- - project instructions
- - browser tests
- - visual capture flow
- - app-specific verification commands
- - app/server startup scripts
+ - repo-specific `pi/DEVELOPER.md`
+ - repo-specific `pi/TESTER.md`
+ - a fast bounded `testCommand`
+ - model configuration that actually matches the local/cloud providers in use
+ - optionally a screenshot capture command for visual review
 
- ## Layout
+ ## Quick Start In A Repo
+
+ The normal setup shape is:
 
  ```text
- packages/pi-harness/
-   package.json
-   pi.config.json
-   templates/DEVELOPER.md
-   templates/TESTER.md
-   docs/PI_SUPERVISOR.md
-   src/
-     cli.mjs
-     pi-client.mjs
-     pi-config.mjs
-     pi-prompts.mjs
-     pi-repo.mjs
-     pi-report.mjs
-     pi-rpc-adapter.mjs
-     pi-supervisor.mjs
-     pi-telemetry.mjs
-     pi-visual-once.mjs
-     pi-visual-review.mjs
+ TODOS.md
+ pi.config.json
+ pi/
+   DEVELOPER.md
+   TESTER.md
+ ```
+
+ Typical scripts:
+
+ ```json
+ {
+   "scripts": {
+     "pi:mock": "PI_CONFIG_FILE=pi.config.json PI_TRANSPORT=mock PI_TEST_CMD= pi-harness once",
+     "pi:once": "PI_CONFIG_FILE=pi.config.json pi-harness once",
+     "pi:run": "PI_CONFIG_FILE=pi.config.json pi-harness run",
+     "pi:report": "PI_CONFIG_FILE=pi.config.json pi-harness report",
+     "pi:visual:once": "PI_CONFIG_FILE=pi.config.json pi-harness visual-once"
+   }
+ }
  ```
 
+ Start from [templates/pi.config.example.json](./templates/pi.config.example.json), [templates/DEVELOPER.md](./templates/DEVELOPER.md), [templates/TESTER.md](./templates/TESTER.md), and [templates/gitignore.fragment](./templates/gitignore.fragment).
+
  ## CLI
 
  ```bash
@@ -65,65 +86,212 @@ pi-harness adapter
  pi-harness visual-review-worker
  ```
 
- Use `PI_CONFIG_FILE` to point the harness at a project-local config file. If you do not provide one, the bundled generic `pi.config.json` is used as a fallback.
-
- ## Setup In Another Repo
-
- After installing the package:
+ Use `PI_CONFIG_FILE` to point at the repo-local config file:
 
  ```bash
- npm install -D @sebastianandreasson/pi-autonomous-agents
+ PI_CONFIG_FILE=pi.config.json pi-harness once
  ```
 
- you can tell another agent in that repo:
-
- ```text
- Find SETUP.md in @sebastianandreasson/pi-autonomous-agents and set everything up for this repository.
+ If `PI_CONFIG_FILE` is not set, the package falls back to the bundled generic [pi.config.json](./pi.config.json).
+
+ ## Core Workflow
+
+ Each real iteration works like this:
+
+ 1. `developer` implements one unchecked task from `TODOS.md`.
+ 2. The harness runs the configured fast verification command.
+ 3. If verification passes, `tester` reviews the change independently.
+ 4. If tester or verification fails, the findings go back to `developerFix` for one focused repair pass.
+ 5. If tester reaches `PASS`, tester creates the final commit directly by default.
+ 6. Every `N` successful iterations, optional visual review can inspect screenshots and veto the success if it finds a real problem.
+
+ The default commit model is `commitMode: "agent"`. The older harness-managed parsed commit-plan flow still exists as `commitMode: "plan"`, but it is now a compatibility mode rather than the default.
+
+ ## Recommended Model Setup
+
+ The package supports:
+
+ - one default text model via `piModel`
+ - one default visual-review model via `visualReviewModel`
+ - optional per-role overrides via `roleModels`
+ - per-model endpoint config in `models`
+
+ Typical pattern:
+
+ - local model for `developer`
+ - local model for `developerRetry`
+ - local model for `developerFix`
+ - local or slightly stronger model for `tester`
+ - stronger frontier model only for `visualReview`
+
+ Example:
+
+ ```json
+ {
+   "piModel": "local/text-model",
+   "visualReviewModel": "cloud/vision-model",
+   "models": {
+     "local/text-model": {
+       "baseUrl": "http://localhost:8000/v1",
+       "apiKey": "local",
+       "vision": false
+     },
+     "local/tester-model": {
+       "baseUrl": "http://localhost:8000/v1",
+       "apiKey": "local",
+       "vision": false
+     },
+     "cloud/vision-model": {
+       "baseUrl": "https://api.openai.com/v1",
+       "apiKeyEnv": "OPENAI_API_KEY",
+       "vision": true
+     }
+   },
+   "roleModels": {
+     "developer": "local/text-model",
+     "developerRetry": "local/text-model",
+     "developerFix": "local/text-model",
+     "tester": "local/tester-model",
+     "visualReview": "cloud/vision-model"
+   }
+ }
  ```
 
- The package ships a top-level [SETUP.md](./SETUP.md) specifically for that workflow.
+ Important:
+
+ - do not guess model ids
+ - if using a custom OpenAI-compatible provider, verify `<baseUrl>/models`
+ - if using PI models directly, verify `pi --list-models`
+ - if `PI_CODING_AGENT_DIR` points at a repo-local PI home, make sure it is bootstrapped and contains `models.json`
+
+ The harness now preflights those checks before starting a real run.
 
- If you want to wipe all harness-generated state and start over cleanly in a repo, run:
+ ## Important Config Fields
 
- ```bash
- PI_CONFIG_FILE=pi.config.json pi-harness clear-history
- ```
+ Common fields in `pi.config.json`:
+
+ - `taskFile`
+ - `developerInstructionsFile`
+ - `testerInstructionsFile`
+ - `transport`
+ - `adapterCommand`
+ - `piModel`
+ - `models`
+ - `roleModels`
+ - `commitMode`
+ - `promptMode`
+ - `testCommand`
+ - `visualReviewEnabled`
+ - `visualCaptureCommand`
+ - `continueAfterSeconds`
+ - `toolContinueAfterSeconds`
+ - `noEventTimeoutSeconds`
+ - `toolNoEventTimeoutSeconds`
+ - `largeFileWarningLines`
+ - `largeSpecWarningLines`
+
+ Key defaults:
+
+ - `transport`: `adapter`
+ - `commitMode`: `agent`
+ - `promptMode`: `compact`
+ - `piTools`: `read,edit,write,find,ls,bash`
+ - `continueAfterSeconds`: `300`
+ - `toolContinueAfterSeconds`: `900`
+ - `noEventTimeoutSeconds`: `900`
+ - `toolNoEventTimeoutSeconds`: `1800`
+
+ ## Prompt and Tooling Behavior
+
+ The package is optimized for local models by default:
+
+ - prompts are compacted before handoff
+ - changed-file lists and feedback excerpts are capped
+ - prompts prefer `read` for source inspection
+ - shell is intended for `git`, tests, and narrow diagnostics
+ - the adapter warns on obvious oversized shell-based file reads
+ - the supervisor emits large-file/spec warnings when touched files are getting risky
+
+ This is deliberate. Large monolith files, huge e2e specs, and broad TODO items are one of the main causes of local-model drift and retry loops.
+
+ Recommended repo shape:
+
+ - keep TODO items very small and implementation-shaped
+ - split giant stores/modules before they become constant edit hotspots
+ - split ever-growing end-to-end specs into scenario files
+ - keep the default `testCommand` to a bounded smoke check, not a multi-minute happy-path run
+
+ ## Runtime Isolation And Recovery
 
- The command removes configured harness history/runtime files and verifies that no configured history paths remain afterward.
+ Recent versions of the package isolate each run more aggressively:
 
- For prompt debugging, the harness also writes the exact assembled prompt for the current role to `.pi-last-prompt.txt` by default.
- For flow debugging, it also writes a machine-readable `.pi-last-iteration.json` summary with the selected task, tester verdict, commit-plan state, and terminal reason.
- For run isolation, the supervisor also maintains `.pi-runtime/active-run.json` and stores PI sessions plus per-run telemetry under `.pi-runtime/runs/<runId>/`.
+ - active ownership lock at `.pi-runtime/active-run.json`
+ - per-run runtime directory under `.pi-runtime/runs/<runId>/`
+ - per-run PI sessions and telemetry
+ - `runId` added to telemetry
+ - in-progress iteration state persisted before agent work starts
+ - stale run locks recovered when the owning PID is gone
+ - timeout cleanup kills the full spawned process group, not only the direct child
 
- ## Generic Contracts
+ That is meant to prevent orphaned timed-out agents or concurrent supervisors from corrupting shared state.
 
- - `taskFile`: usually `TODOS.md`
- - `developerInstructionsFile`: per-project developer instructions
- - `testerInstructionsFile`: per-project tester instructions
- - `roleModels`: optional per-role model overrides
- - `commitMode`: `agent` by default, `plan` only for legacy harness-managed commit parsing
- - `promptMode`: `compact` by default
- - `testCommand`: fast verification command
- - `visualCaptureCommand`: project-defined screenshot capture command
- - `visualFeedbackFile`: latest visual-review handoff
- - `testerFeedbackFile`: latest tester-review handoff
+ ## Debugging Artifacts
 
- For unattended loops, keep `testCommand` fast and bounded, such as a smoke suite. Long real-time Playwright happy-path specs belong in an explicit nightly or post-run lane, not the default developer/tester inner loop.
+ Useful files during a run:
 
- Keep TODO items extremely small and implementation-shaped when using weaker local models. Broad tasks tend to produce much longer turns, more retries, and more tester drift than narrow one-step tasks.
+ - `.pi-last-prompt.txt`
+   Exact assembled prompt for the current role.
+ - `.pi-last-output.txt`
+   Latest agent output snapshot.
+ - `.pi-last-verification.txt`
+   Latest verification output snapshot.
+ - `.pi-last-iteration.json`
+   Structured summary of the last completed iteration.
+ - `.pi-state.json`
+   Persistent harness state, including in-progress iteration data.
+ - `pi.log`
+   Main run log.
+ - `pi_telemetry.jsonl`
+ - `pi_telemetry.csv`
+ - `.pi-runtime/active-run.json`
+ - `.pi-runtime/runs/<runId>/...`
 
- The adapter heartbeat is PI-RPC-event based. Streaming shell output does not count as progress on its own, so long-running tools should rely on the tool-aware watchdog thresholds rather than terminal streaming.
+ `pi-harness report` summarizes recent telemetry and surfaces things like terminal reasons and large-file warnings.
 
- The supervisor now enforces single-run ownership per repo/config. If a stale run crashed mid-iteration, the next run recovers the unfinished iteration number from `.pi-state.json` instead of silently rolling forward.
+ ## Visual Review Contract
 
- `piModel` remains the default text model, but you can override specific roles with `roleModels` such as `developer`, `developerRetry`, `developerFix`, `tester`, and `visualReview`. `testerCommit` is only relevant if you opt back into `commitMode: "plan"`.
+ Visual review is optional and generic. The harness does not know how to navigate your app.
 
- By default, successful tester passes should stage and create the commit directly in the same PI turn. The old commit-plan parsing flow is still available as `commitMode: "plan"`, but it is now a compatibility mode rather than the default.
+ If enabled, your repo must provide a real screenshot capture command that writes a manifest under the configured capture directory. The manifest shape is documented in [docs/PI_SUPERVISOR.md](./docs/PI_SUPERVISOR.md).
 
- Prompt/context handoff is compact by default. The harness now caps prior feedback excerpts, changed-file lists, verification excerpts, and prompt note handoff. If needed, tune `maxPromptChangedFiles`, `maxVisualFeedbackLines`, `maxTesterFeedbackLines`, `maxPromptNotesLines`, and `maxVerificationExcerptLines`.
+ Visual review should be used as a periodic audit, not as the default inner-loop gate.
 
- The default coding tool mix is now safer for local models: `read,edit,write,find,ls,bash`. Prompts explicitly steer source inspection toward `read` and reserve shell usage for `git`, tests, and narrow diagnostics.
+ ## Resetting Harness State
 
- The harness also emits lightweight large-file warnings for touched source/spec files and carries them into `.pi-last-iteration.json`, `pi-harness report`, and relevant prompts. Tune `largeFileWarningLines` and `largeSpecWarningLines` if needed.
+ If you want to wipe harness-generated state and start fresh:
+
+ ```bash
+ PI_CONFIG_FILE=pi.config.json pi-harness clear-history
+ ```
+
+ That clears configured harness runtime/history artifacts and verifies they are gone. It does not remove project source files.
+
+ ## Docs
+
+ - [SETUP.md](./SETUP.md)
+   Agent-facing setup instructions for consuming repos.
+ - [docs/PI_SUPERVISOR.md](./docs/PI_SUPERVISOR.md)
+   More detailed flow, adapter, and runtime documentation.
+ - [templates/PROJECT_SETUP.md](./templates/PROJECT_SETUP.md)
+   Minimal consuming-repo layout summary.
+
+ ## Development
+
+ In this package repo:
+
+ ```bash
+ npm run check
+ npm test
+ ```
 
- The harness expects screenshot capture to produce a `manifest.json` plus image files under the configured visual capture directory.
+ The package requires Node `>=20`.
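The "Core Workflow" steps added to the README can be modelled as a small pure function. This is a minimal sketch, not the package's supervisor: `runIteration`, the shape of its `outcomes` argument, and the assumption that the single `developerFix` pass is re-checked by verification and `tester` are all illustrative. Retries, timeouts, telemetry, and what happens to an already-created commit when visual review vetoes are deliberately left out.

```javascript
// Hypothetical model of one iteration from the README's "Core Workflow" list.
// `runIteration`, the `outcomes` shape, and re-running the checks after
// developerFix are illustrative assumptions, not the package's API.
// `successCount` is the number of successful iterations before this one.
function runIteration(outcomes, successCount, visualEveryN) {
  const steps = ['developer'];
  let passed = false;

  // Attempt 0 checks the developer turn; attempt 1 checks the single
  // focused developerFix repair pass.
  for (let attempt = 0; attempt < 2 && !passed; attempt += 1) {
    if (attempt === 1) steps.push('developerFix');
    const result = outcomes.attempts[attempt] ?? { verify: false, tester: false };
    steps.push('verify');
    // The tester only reviews a change that passed fast verification.
    if (!result.verify) continue;
    steps.push('tester');
    passed = result.tester === true;
  }

  if (!passed) return { steps, success: false };

  // Default commitMode "agent": on PASS the tester commits directly.
  steps.push('commit');

  // Every N successful iterations, visual review may veto the success.
  if (visualEveryN > 0 && (successCount + 1) % visualEveryN === 0) {
    steps.push('visualReview');
    if (!outcomes.visualOk) return { steps, success: false };
  }
  return { steps, success: true };
}

// Verification fails once, then the developerFix pass is accepted.
const run = runIteration(
  { attempts: [{ verify: false, tester: false }, { verify: true, tester: true }], visualOk: true },
  0,
  3,
);
console.log(run.steps.join(' -> '));
// developer -> verify -> developerFix -> verify -> tester -> commit
```

Keeping the ordering logic free of I/O makes the README's rules (tester only after verification passes, at most one repair pass, visual review as a periodic veto rather than an inner-loop gate) easy to check without running any agents.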
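The "Recommended Model Setup" section added to the README describes a default text model (`piModel`), a default visual-review model (`visualReviewModel`), optional per-role overrides (`roleModels`), and per-model endpoints (`models`). A minimal sketch of how a role could be mapped to a model id and endpoint entry, assuming overrides always win and that only the `visualReview` role falls back to `visualReviewModel`. `resolveRoleModel` is a hypothetical helper, not an export of the package, and treating a missing `models` entry as allowed (for models PI serves itself) is also an assumption.

```javascript
// Hypothetical helper (not part of the package): picks the model id for a role
// and looks up its endpoint entry. Assumed precedence:
// roleModels[role] > visualReviewModel (visualReview only) > piModel.
function resolveRoleModel(config, role) {
  const override = config.roleModels?.[role];
  const fallback = role === 'visualReview' ? config.visualReviewModel : config.piModel;
  const id = override ?? fallback;
  if (!id) {
    throw new Error(`no model configured for role "${role}"`);
  }
  // Assumed optional: models served by PI itself may have no `models` entry.
  const endpoint = config.models?.[id] ?? null;
  return { id, endpoint };
}

// Trimmed version of the README example config.
const exampleConfig = {
  piModel: 'local/text-model',
  visualReviewModel: 'cloud/vision-model',
  models: {
    'local/text-model': { baseUrl: 'http://localhost:8000/v1', apiKey: 'local', vision: false },
    'local/tester-model': { baseUrl: 'http://localhost:8000/v1', apiKey: 'local', vision: false },
    'cloud/vision-model': { baseUrl: 'https://api.openai.com/v1', apiKeyEnv: 'OPENAI_API_KEY', vision: true },
  },
  roleModels: { tester: 'local/tester-model' },
};

console.log(resolveRoleModel(exampleConfig, 'tester').id); // local/tester-model
console.log(resolveRoleModel(exampleConfig, 'developerFix').id); // local/text-model
console.log(resolveRoleModel(exampleConfig, 'visualReview').endpoint.vision); // true
```

Failing loudly when no model resolves mirrors the README's advice not to guess model ids; whether the real harness throws, warns, or relies on its preflight checks instead is not stated in this diff.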
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "@sebastianandreasson/pi-autonomous-agents",
  "private": false,
- "version": "0.5.0",
+ "version": "0.5.1",
  "type": "module",
  "description": "Portable unattended PI harness for developer/tester/visual-review loops.",
  "license": "MIT",