@sebastianandreasson/pi-autonomous-agents 0.10.0 → 0.12.1

package/README.md CHANGED
@@ -190,10 +190,13 @@ Common fields in `pi.config.json`:
190
190
  - `testCommand`
191
191
  - `visualReviewEnabled`
192
192
  - `visualCaptureCommand`
193
+ - `failureArtifactDir`
193
194
  - `continueAfterSeconds`
194
195
  - `toolContinueAfterSeconds`
195
196
  - `noEventTimeoutSeconds`
196
197
  - `toolNoEventTimeoutSeconds`
198
+ - `sameFileLoopBudget`
199
+ - `loopHistoryLimit`
197
200
  - `largeFileWarningLines`
198
201
  - `largeSpecWarningLines`
199
202
 
@@ -207,6 +210,8 @@ Key defaults:
207
210
  - `toolContinueAfterSeconds`: `900`
208
211
  - `noEventTimeoutSeconds`: `900`
209
212
  - `toolNoEventTimeoutSeconds`: `1800`
213
+ - `sameFileLoopBudget`: `2`
214
+ - `loopHistoryLimit`: `25`
210
215
 
211
216
  ## Prompt and Tooling Behavior
212
217
 
@@ -217,6 +222,7 @@ The package is optimized for local models by default:
217
222
  - prompts prefer `read` for source inspection
218
223
  - shell is intended for `git`, tests, and narrow diagnostics
219
224
  - SDK transport carries forward oversized shell-read warnings and loop/timeout guards
225
+ - repeated same-file loop failures are remembered across iterations and escalate the next edit strategy
220
226
  - the supervisor emits large-file/spec warnings when touched files are getting risky
221
227
 
222
228
  This is deliberate. Large monolith files, huge e2e specs, and broad TODO items are one of the main causes of local-model drift and retry loops.
@@ -255,6 +261,8 @@ Useful files during a run:
255
261
  Latest verification output snapshot.
256
262
  - `.pi-last-iteration.json`
257
263
  Structured summary of the last completed iteration.
264
+ - `pi-output/failure-artifacts/`
265
+ Compact failure artifacts with command, exit code, changed files, tester summary, and output excerpt.
258
266
  - `.pi-state.json`
259
267
  Persistent harness state, including in-progress iteration data.
260
268
  - `pi.log`
@@ -264,7 +272,7 @@ Useful files during a run:
264
272
  - `.pi-runtime/active-run.json`
265
273
  - `.pi-runtime/runs/<runId>/...`
266
274
 
267
- `pi-harness report` summarizes recent telemetry and surfaces things like terminal reasons and large-file warnings.
275
+ `pi-harness report` summarizes recent telemetry and surfaces things like terminal reasons, large-file warnings, and recent failure artifacts.
268
276
 
269
277
  `pi-harness run` now also starts lightweight local web UI for orchestration flow by default. By default it listens on `127.0.0.1:4317`. Override with `PI_VISUALIZER_HOST` and `PI_VISUALIZER_PORT`. Set `PI_VISUALIZER=0` to disable embedded web UI for a run.
270
278
 
@@ -326,6 +334,13 @@ For local visualizer iteration against fake live SDK agent:
326
334
  npm run debug:live-ui
327
335
  ```
328
336
 
337
+ Scenario variants:
338
+
339
+ ```bash
340
+ node src/cli.mjs debug-live --reset --scenario noisy --task-count 24
341
+ node src/cli.mjs debug-live --reset --scenario retry
342
+ ```
343
+
329
344
  For React/Vite visualizer UI dev loop:
330
345
 
331
346
  ```bash
@@ -338,6 +353,8 @@ For production visualizer UI build:
338
353
  npm run build:visualizer:ui
339
354
  ```
340
355
 
356
+ Publish now auto-runs check, tests, and UI build via `prepublishOnly`.
357
+
341
358
  This seeds `.pi-debug/live-ui/`, runs harness there with streaming fake SDK fixture, hosts visualizer, and gives stable local repro loop for UI work. React app lives in `visualizer-ui/`. Visualizer server now serves built assets from `visualizer-ui/dist/` and falls back to build-instructions page if build artifacts are missing.
342
359
 
343
360
  See `docs/VISUALIZER_UI_PLAN.md` for migration plan.
@@ -62,7 +62,7 @@ The package reads `PI_CONFIG_FILE` if provided. Otherwise it falls back to the b
62
62
 
63
63
  Visualizer reads active-run lock, TODO file, per-run state, per-run iteration summary, per-run last output snapshot, live feed JSONL, and telemetry to show current stage plus historical runs.
64
64
 
65
- For local UI iteration in this package repo, use `pi-harness debug-live` to run against seeded fake live SDK sandbox.
65
+ For local UI iteration in this package repo, use `pi-harness debug-live` to run against seeded fake live SDK sandbox. Useful variants: `--scenario noisy`, `--scenario retry`, `--task-count 24`.
66
66
 
67
67
  ## Config Contract
68
68
 
@@ -80,10 +80,13 @@ Projects typically provide their own `pi.config.json` with fields such as:
80
80
  - `visualCaptureCommand`
81
81
  - `visualFeedbackFile`
82
82
  - `testerFeedbackFile`
83
+ - `failureArtifactDir`
83
84
  - `models`
84
85
  - `piModel`
85
86
  - `visualReviewModel`
86
87
  - `commitMode`
88
+ - `sameFileLoopBudget`
89
+ - `loopHistoryLimit`
87
90
 
88
91
  Model entries may carry their own OpenAI-compatible endpoint settings, so the PI text loop and the multimodal visual reviewer can point at different backends without changing code.
89
92
 
@@ -124,6 +127,10 @@ The default flow keeps commit ownership with the active agent:
124
127
  2. `tester` should review functionality and, on `PASS`, stage only the task-related files and create the commit directly.
125
128
  3. If the working tree is too messy to isolate safely, tester should return `VERDICT: BLOCKED` instead of guessing.
126
129
 
130
+ If tester returns `PASS` but leaves a dirty tree without creating the commit, the harness now treats that as a protocol error and automatically falls back to a commit-plan follow-up instead of stalling the iteration.
131
+
132
+ If tester edits files before finalization, the harness re-runs the configured smoke verification command immediately and records which files tester touched.
133
+
127
134
  If a repo explicitly needs the older harness-managed commit-plan flow, set `commitMode` to `plan`. In that mode, `testerCommit` and parsed commit plans are used as a compatibility path rather than the default.
128
135
 
129
136
  For source inspection, prompts prefer `read` and reserve shell usage for `git`, tests, and narrow diagnostics. Large shell file reads are more likely to truncate under context pressure than focused `read` calls.
@@ -175,6 +182,7 @@ SDK transport mitigates obvious local loops by watching agent and tool events:
175
182
 
176
183
  - repeated identical tool calls are aborted
177
184
  - repeated same-path churn is aborted
185
+ - repeated same-file loop targets are persisted in harness state and escalate the next retry strategy
178
186
  - a soft `continue` can be sent after inactivity
179
187
  - a separate tool-aware watchdog can tolerate long-running `bash` or browser work without treating the turn as dead
180
188
  - a hard no-event timeout aborts a wedged turn instead of hanging indefinitely
@@ -200,4 +208,6 @@ Each step records:
200
208
  - changed file count
201
209
  - verification status
202
210
  - retry count
211
+ - artifact path for compact failure diagnostics when available
212
+ - output excerpt for failed verification-style events
203
213
  - notes
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "@sebastianandreasson/pi-autonomous-agents",
3
3
  "private": false,
4
- "version": "0.10.0",
4
+ "version": "0.12.1",
5
5
  "type": "module",
6
6
  "description": "Portable unattended PI harness for developer/tester/visual-review loops.",
7
7
  "license": "MIT",
@@ -23,7 +23,8 @@
23
23
  "test": "node --test test/pi-heartbeat.test.mjs test/pi-lifecycle.test.mjs test/pi-role-models.test.mjs test/pi-flow.test.mjs test/pi-history.test.mjs test/pi-prompts.test.mjs test/pi-preflight.test.mjs test/pi-repo.test.mjs test/pi-sdk-supervisor.test.mjs test/pi-sdk-turn.test.mjs test/pi-telemetry.test.mjs test/pi-visualizer-shared.test.mjs",
24
24
  "debug:live-ui": "node src/cli.mjs debug-live --reset",
25
25
  "dev:visualizer:ui": "npm --prefix visualizer-ui run dev",
26
- "build:visualizer:ui": "npm --prefix visualizer-ui run build"
26
+ "build:visualizer:ui": "npm --prefix visualizer-ui run build",
27
+ "prepublishOnly": "npm run check && npm test && npm run build:visualizer:ui"
27
28
  },
28
29
  "files": [
29
30
  "src",
package/src/cli.mjs CHANGED
@@ -36,11 +36,14 @@ function main() {
36
36
  if (subcommand === 'once' || subcommand === 'run') {
37
37
  childArgs.push(subcommand)
38
38
  }
39
+ const childStdio = subcommand === 'once' || subcommand === 'run'
40
+ ? ['pipe', 'inherit', 'inherit']
41
+ : 'inherit'
39
42
 
40
43
  const child = spawn(process.execPath, childArgs, {
41
44
  cwd: process.cwd(),
42
45
  env: process.env,
43
- stdio: 'inherit',
46
+ stdio: childStdio,
44
47
  })
45
48
  registerOwnedChildProcess(child)
46
49
 
package/src/pi-config.mjs CHANGED
@@ -259,6 +259,7 @@ export function loadConfig(mode = 'once') {
259
259
  maxTesterFeedbackLines: readInt('PI_MAX_TESTER_FEEDBACK_LINES', file.maxTesterFeedbackLines, 32),
260
260
  maxPromptNotesLines: readInt('PI_MAX_PROMPT_NOTES_LINES', file.maxPromptNotesLines, 16),
261
261
  maxVerificationExcerptLines: readInt('PI_MAX_VERIFICATION_EXCERPT_LINES', file.maxVerificationExcerptLines, 40),
262
+ maxFailureArtifactLines: readInt('PI_MAX_FAILURE_ARTIFACT_LINES', file.maxFailureArtifactLines, 80),
262
263
  largeFileWarningLines: readInt('PI_LARGE_FILE_WARNING_LINES', file.largeFileWarningLines, 500),
263
264
  largeSpecWarningLines: readInt('PI_LARGE_SPEC_WARNING_LINES', file.largeSpecWarningLines, 300),
264
265
  piTools: readString('PI_TOOLS', file.piTools, 'read,edit,write,find,ls,bash'),
@@ -280,6 +281,8 @@ export function loadConfig(mode = 'once') {
280
281
  verificationTimeoutSeconds: readInt('PI_VERIFICATION_TIMEOUT', file.verificationTimeoutSeconds, 300),
281
282
  idleRetryLimit: readInt('PI_IDLE_RETRY_LIMIT', file.idleRetryLimit, 1),
282
283
  noChangeRetryLimit: readInt('PI_NO_CHANGE_RETRY_LIMIT', file.noChangeRetryLimit, 1),
284
+ sameFileLoopBudget: readInt('PI_SAME_FILE_LOOP_BUDGET', file.sameFileLoopBudget, 2),
285
+ loopHistoryLimit: readInt('PI_LOOP_HISTORY_LIMIT', file.loopHistoryLimit, 25),
283
286
  visualFeedbackFile: resolveFromCwd(
284
287
  cwd,
285
288
  'PI_VISUAL_FEEDBACK_FILE',
@@ -298,6 +301,12 @@ export function loadConfig(mode = 'once') {
298
301
  file.testerFeedbackHistoryDir,
299
302
  'pi-output/tester-feedback/history'
300
303
  ),
304
+ failureArtifactDir: resolveFromCwd(
305
+ cwd,
306
+ 'PI_FAILURE_ARTIFACT_DIR',
307
+ file.failureArtifactDir,
308
+ 'pi-output/failure-artifacts'
309
+ ),
301
310
  visualReviewHistoryDir: resolveFromCwd(
302
311
  cwd,
303
312
  'PI_VISUAL_REVIEW_HISTORY_DIR',
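Note on the new settings: `sameFileLoopBudget`, `loopHistoryLimit`, and `maxFailureArtifactLines` all use the existing `readInt(envName, fileValue, fallback)` call shape. The body of `readInt` is not part of this diff, so the following is only a sketch of how such a helper might resolve them. It assumes an environment variable wins over the `pi.config.json` value, which in turn wins over the default; the extra `env` parameter and that precedence order are assumptions made for illustration.

```javascript
// Hypothetical stand-in for the package's readInt helper (its real body is not in this diff).
// Assumed precedence: environment variable, then config-file value, then default.
function readInt(env, envName, fileValue, fallback) {
  for (const candidate of [env[envName], fileValue]) {
    const parsed = Number.parseInt(String(candidate ?? ''), 10)
    if (Number.isFinite(parsed)) {
      return parsed
    }
  }
  return fallback
}

// Sample inputs, made up for this example.
const env = { PI_LOOP_HISTORY_LIMIT: '50' }
const file = { sameFileLoopBudget: 3 }

const resolved = {
  sameFileLoopBudget: readInt(env, 'PI_SAME_FILE_LOOP_BUDGET', file.sameFileLoopBudget, 2),
  loopHistoryLimit: readInt(env, 'PI_LOOP_HISTORY_LIMIT', file.loopHistoryLimit, 25),
  maxFailureArtifactLines: readInt(env, 'PI_MAX_FAILURE_ARTIFACT_LINES', file.maxFailureArtifactLines, 80),
}

console.log(resolved)
```

Under these assumptions the file value applies to `sameFileLoopBudget`, the environment value applies to `loopHistoryLimit`, and `maxFailureArtifactLines` falls back to its default of `80`.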
@@ -12,11 +12,51 @@ const cliFile = path.join(scriptDir, 'cli.mjs')
12
12
  const fakePiFile = path.join(packageRoot, 'test', 'fixtures', 'fake-pi.mjs')
13
13
  const fakeLiveSdkFile = path.join(packageRoot, 'test', 'fixtures', 'fake-live-pi-sdk.mjs')
14
14
  const sandboxDir = path.join(packageRoot, '.pi-debug', 'live-ui')
15
+ const DEFAULT_TASK_COUNT = 12
15
16
 
16
17
  function shellQuote(value) {
17
18
  return JSON.stringify(String(value))
18
19
  }
19
20
 
21
+ function readFlagValue(flag) {
22
+ const index = process.argv.indexOf(flag)
23
+ if (index === -1) {
24
+ return ''
25
+ }
26
+ return String(process.argv[index + 1] ?? '').trim()
27
+ }
28
+
29
+ function readScenario() {
30
+ const value = readFlagValue('--scenario') || process.env.PI_FAKE_LIVE_SCENARIO || 'default'
31
+ return String(value).trim() || 'default'
32
+ }
33
+
34
+ function readTaskCount() {
35
+ const raw = Number.parseInt(readFlagValue('--task-count') || process.env.PI_DEBUG_TASK_COUNT || `${DEFAULT_TASK_COUNT}`, 10)
36
+ return Number.isFinite(raw) && raw > 0 ? raw : DEFAULT_TASK_COUNT
37
+ }
38
+
39
+ function buildTodoLines(taskCount) {
40
+ const lines = []
41
+ for (let index = 1; index <= taskCount; index += 1) {
42
+ const phase = index <= Math.ceil(taskCount / 3)
43
+ ? 'Phase 1'
44
+ : index <= Math.ceil((taskCount * 2) / 3)
45
+ ? 'Phase 2'
46
+ : 'Phase 3'
47
+ const label = `Fake live task ${index}`
48
+ if (lines.length === 0 || lines[lines.length - 1] !== `## ${phase}`) {
49
+ if (lines.length > 0) {
50
+ lines.push('')
51
+ }
52
+ lines.push(`## ${phase}`)
53
+ lines.push('')
54
+ }
55
+ lines.push(`- [ ] ${label}`)
56
+ }
57
+ return `${lines.join('\n')}\n`
58
+ }
59
+
20
60
  async function ensureRepo(cwd) {
21
61
  try {
22
62
  execFileSync('git', ['rev-parse', '--is-inside-work-tree'], { cwd, stdio: 'ignore' })
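Note on `buildTodoLines`: tracing the function by hand suggests that the header check compares against the last pushed line, which after the first iteration is always a `- [ ]` task line. If that reading is right, a `## Phase N` heading is emitted before every task rather than once per phase. Whether the harness treats repeated headings as a single phase is not visible in this diff. The sketch below reproduces the function as added and compares it with a hypothetical variant, `buildTodoLinesGrouped` (not in the package), that tracks the current phase explicitly.

```javascript
// buildTodoLines as added in this diff, reproduced unchanged.
function buildTodoLines(taskCount) {
  const lines = []
  for (let index = 1; index <= taskCount; index += 1) {
    const phase = index <= Math.ceil(taskCount / 3)
      ? 'Phase 1'
      : index <= Math.ceil((taskCount * 2) / 3)
        ? 'Phase 2'
        : 'Phase 3'
    const label = `Fake live task ${index}`
    if (lines.length === 0 || lines[lines.length - 1] !== `## ${phase}`) {
      if (lines.length > 0) {
        lines.push('')
      }
      lines.push(`## ${phase}`)
      lines.push('')
    }
    lines.push(`- [ ] ${label}`)
  }
  return `${lines.join('\n')}\n`
}

// Hypothetical variant (not in the package): remember the current phase
// and only emit a heading when the phase changes.
function buildTodoLinesGrouped(taskCount) {
  const lines = []
  let currentPhase = ''
  for (let index = 1; index <= taskCount; index += 1) {
    const phase = index <= Math.ceil(taskCount / 3)
      ? 'Phase 1'
      : index <= Math.ceil((taskCount * 2) / 3)
        ? 'Phase 2'
        : 'Phase 3'
    if (phase !== currentPhase) {
      if (lines.length > 0) {
        lines.push('')
      }
      lines.push(`## ${phase}`, '')
      currentPhase = phase
    }
    lines.push(`- [ ] Fake live task ${index}`)
  }
  return `${lines.join('\n')}\n`
}

// Count markdown headings in the generated TODO text.
const countHeaders = (text) => text.split('\n').filter((line) => line.startsWith('## ')).length

console.log(countHeaders(buildTodoLines(6)))        // 6: one heading per task
console.log(countHeaders(buildTodoLinesGrouped(6))) // 3: one heading per phase
```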
@@ -27,21 +67,11 @@ async function ensureRepo(cwd) {
27
67
  }
28
68
  }
29
69
 
30
- async function seedFiles(cwd) {
70
+ async function seedFiles(cwd, { taskCount, scenario }) {
31
71
  await fs.mkdir(path.join(cwd, 'pi'), { recursive: true })
32
- await fs.writeFile(path.join(cwd, 'TODOS.md'), [
33
- '## Phase 1',
34
- '',
35
- '- [ ] Fake live task one',
36
- '- [ ] Fake live task two',
37
- '- [ ] Fake live task three',
38
- '',
39
- '## Phase 2',
40
- '',
41
- '- [ ] Fake live task four',
42
- ].join('\n') + '\n', 'utf8')
43
- await fs.writeFile(path.join(cwd, 'DEVELOPER.md'), 'Developer instructions for local visualizer debugging.\n', 'utf8')
44
- await fs.writeFile(path.join(cwd, 'TESTER.md'), 'Tester instructions for local visualizer debugging.\n', 'utf8')
72
+ await fs.writeFile(path.join(cwd, 'TODOS.md'), buildTodoLines(taskCount), 'utf8')
73
+ await fs.writeFile(path.join(cwd, 'DEVELOPER.md'), `Developer instructions for local visualizer debugging.\nScenario: ${scenario}\n`, 'utf8')
74
+ await fs.writeFile(path.join(cwd, 'TESTER.md'), `Tester instructions for local visualizer debugging.\nScenario: ${scenario}\n`, 'utf8')
45
75
  await fs.writeFile(path.join(cwd, 'pi.config.json'), `${JSON.stringify({
46
76
  transport: 'sdk',
47
77
  taskFile: 'TODOS.md',
@@ -63,7 +93,7 @@ async function seedFiles(cwd) {
63
93
  toolContinueAfterSeconds: 3600,
64
94
  toolNoEventTimeoutSeconds: 3600,
65
95
  sleepBetweenSeconds: 1,
66
- maxIterations: 20,
96
+ maxIterations: Math.max(taskCount * 3, 20),
67
97
  }, null, 2)}\n`, 'utf8')
68
98
  }
69
99
 
@@ -78,17 +108,22 @@ async function ensureInitialCommit(cwd) {
78
108
 
79
109
  async function main() {
80
110
  const reset = process.argv.includes('--reset')
111
+ const scenario = readScenario()
112
+ const taskCount = readTaskCount()
113
+
81
114
  if (reset) {
82
115
  await fs.rm(sandboxDir, { recursive: true, force: true })
83
116
  }
84
117
 
85
118
  await fs.mkdir(sandboxDir, { recursive: true })
86
119
  await ensureRepo(sandboxDir)
87
- await seedFiles(sandboxDir)
120
+ await seedFiles(sandboxDir, { taskCount, scenario })
88
121
  await ensureInitialCommit(sandboxDir)
89
122
 
90
123
  process.stdout.write(`PI debug sandbox: ${sandboxDir}\n`)
91
124
  process.stdout.write(`Using fake live SDK fixture: ${fakeLiveSdkFile}\n`)
125
+ process.stdout.write(`Scenario: ${scenario}\n`)
126
+ process.stdout.write(`Task count: ${taskCount}\n`)
92
127
 
93
128
  const child = spawn(process.execPath, [cliFile, 'run'], {
94
129
  cwd: sandboxDir,
@@ -96,6 +131,7 @@ async function main() {
96
131
  ...process.env,
97
132
  PI_CONFIG_FILE: 'pi.config.json',
98
133
  PI_SDK_MODULE: fakeLiveSdkFile,
134
+ PI_FAKE_LIVE_SCENARIO: scenario,
99
135
  PI_VISUALIZER_HOST: process.env.PI_VISUALIZER_HOST || '127.0.0.1',
100
136
  PI_VISUALIZER_PORT: process.env.PI_VISUALIZER_PORT || '4317',
101
137
  },
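Note on flag handling: `debug-live` reads `--scenario` and `--task-count` with `PI_FAKE_LIVE_SCENARIO` and `PI_DEBUG_TASK_COUNT` as fallbacks. The sketch below repeats that lookup logic but takes `argv` and `env` as parameters instead of reading `process.argv` and `process.env`; that change is an assumption made so the behavior can be exercised in isolation.

```javascript
const DEFAULT_TASK_COUNT = 12

// Same lookup as readFlagValue in the diff, with argv passed in (assumption, for testability).
function readFlagValue(argv, flag) {
  const index = argv.indexOf(flag)
  if (index === -1) {
    return ''
  }
  return String(argv[index + 1] ?? '').trim()
}

function readScenario(argv, env) {
  const value = readFlagValue(argv, '--scenario') || env.PI_FAKE_LIVE_SCENARIO || 'default'
  return String(value).trim() || 'default'
}

function readTaskCount(argv, env) {
  const raw = Number.parseInt(
    readFlagValue(argv, '--task-count') || env.PI_DEBUG_TASK_COUNT || `${DEFAULT_TASK_COUNT}`,
    10,
  )
  return Number.isFinite(raw) && raw > 0 ? raw : DEFAULT_TASK_COUNT
}

console.log(readScenario(['--reset', '--scenario', 'noisy'], {})) // noisy
console.log(readTaskCount(['--task-count', '24'], {}))            // 24
console.log(readTaskCount([], { PI_DEBUG_TASK_COUNT: '6' }))      // 6
console.log(readTaskCount(['--task-count', '0'], {}))             // 12 (non-positive falls back)
console.log(readScenario(['--scenario=noisy'], {}))               // default (only the space-separated form is matched)
console.log(readScenario(['--scenario', '--reset'], {}))          // --reset (a flag without a value picks up the next token)
```

The last two cases are worth knowing when scripting the command: the `--flag=value` form is not recognized, and a flag given without a value consumes whatever token follows it.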
@@ -54,6 +54,16 @@ function formatLargeFileRiskHint(warnings) {
54
54
  return `\nLarge file risk in touched files:\n${lines}\nPrefer helper extraction, smaller scoped edits, or test splitting over broad in-place edits.\n`
55
55
  }
56
56
 
57
+ function formatLoopRecoveryHint(hints) {
58
+ const list = Array.isArray(hints) ? hints.filter(Boolean) : []
59
+ if (list.length === 0) {
60
+ return ''
61
+ }
62
+
63
+ const lines = list.slice(0, 3).map((hint) => `- ${hint}`).join('\n')
64
+ return `\nRecent loop-recovery constraints:\n${lines}\n`
65
+ }
66
+
57
67
  function displayPath(config, filePath) {
58
68
  const relativePath = path.relative(config.cwd, filePath)
59
69
  if (
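Note on `formatLoopRecoveryHint`: the helper drops falsy entries, keeps at most three hints, and returns an empty string when there is nothing to show, so the prompt templates can interpolate it unconditionally. A small sketch of that behavior follows; the hint strings are invented for illustration and are not taken from the package.

```javascript
// formatLoopRecoveryHint as added in this diff.
function formatLoopRecoveryHint(hints) {
  const list = Array.isArray(hints) ? hints.filter(Boolean) : []
  if (list.length === 0) {
    return ''
  }

  const lines = list.slice(0, 3).map((hint) => `- ${hint}`).join('\n')
  return `\nRecent loop-recovery constraints:\n${lines}\n`
}

// Hypothetical hints; the real strings are produced elsewhere in the harness.
const hints = [
  'Avoid re-editing src/app.mjs in place; extract a helper first.',
  '', // falsy entries are filtered out
  'Split the failing spec before retrying.',
  'Re-read the exact failing window with read.',
  'This fourth hint is dropped by the cap of three.',
]

const block = formatLoopRecoveryHint(hints)
console.log(block)
console.log(JSON.stringify(formatLoopRecoveryHint(undefined))) // ""
```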
@@ -119,6 +129,36 @@ function repoInstructionsAuthorityLine(config, instructionsFile, usesBundledInst
119
129
  return `Repo-local instructions in ${displayPath(config, instructionsFile)} are the primary role contract. Follow them over package defaults when they differ.\n`
120
130
  }
121
131
 
132
+ export function classifyTaskType(task) {
133
+ const text = String(task ?? '').trim().toLowerCase()
134
+ if (text === '') {
135
+ return 'general'
136
+ }
137
+
138
+ if (
139
+ /\b(write|add|create|implement|expand|improve|fix|update)\b.*\b(test|tests|coverage|regression test|spec|specs)\b/.test(text)
140
+ || /\b(test|tests|coverage|regression test|spec|specs)\b.*\b(write|add|create|implement|expand|improve|fix|update)\b/.test(text)
141
+ ) {
142
+ return 'test'
143
+ }
144
+
145
+ return 'general'
146
+ }
147
+
148
+ function formatTaskTypeGuidance(taskType) {
149
+ if (taskType !== 'test') {
150
+ return ''
151
+ }
152
+
153
+ return [
154
+ 'Test-task guidance:',
155
+ '- This TODO is primarily test-focused. Do not fail solely because changes are mostly or entirely tests.',
156
+ '- PASS if the new or updated test adds meaningful behavioral or regression coverage and verification passes.',
157
+ '- FAIL if the test is brittle, redundant, weakly asserted, or not tied to real behavior.',
158
+ '- Prefer checking whether the test would have failed before the change, or whether developer notes justify why missing coverage mattered.',
159
+ ].join('\n')
160
+ }
161
+
122
162
  function testerPassOwnershipRules(config) {
123
163
  if (config.commitMode === 'plan') {
124
164
  return {
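Note on `classifyTaskType`: the classifier is a keyword heuristic that looks for an action verb and a test-related noun in either order. The sketch below runs the function as added against a few TODO texts invented for this example, including one case where the `spec` keyword appears to produce a false positive.

```javascript
// classifyTaskType as added in this diff.
function classifyTaskType(task) {
  const text = String(task ?? '').trim().toLowerCase()
  if (text === '') {
    return 'general'
  }

  if (
    /\b(write|add|create|implement|expand|improve|fix|update)\b.*\b(test|tests|coverage|regression test|spec|specs)\b/.test(text)
    || /\b(test|tests|coverage|regression test|spec|specs)\b.*\b(write|add|create|implement|expand|improve|fix|update)\b/.test(text)
  ) {
    return 'test'
  }

  return 'general'
}

// Hypothetical TODO items.
const samples = [
  'Add regression tests for login redirect',  // test
  'Coverage: expand checkout edge cases',     // test (noun first, verb second)
  'Fix login redirect',                       // general
  'Update latest docs',                       // general ("latest" has no word boundary before "test")
  'Implement spec parser for OpenAPI files',  // test, although the task is not about tests
]

for (const sample of samples) {
  console.log(`${classifyTaskType(sample)}\t${sample}`)
}
```

Because a `test` classification softens the tester rules, TODO wording that mentions "spec" or "coverage" in a non-testing sense may be worth rephrasing.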
@@ -160,11 +200,13 @@ export function buildMainPrompt(config, options = {}) {
160
200
  config.developerInstructionsFile,
161
201
  config.usingBundledDeveloperInstructions,
162
202
  )
203
+ const loopRecoveryHint = formatLoopRecoveryHint(options.loopRecoveryHints)
163
204
 
164
205
  if (!config.usingBundledDeveloperInstructions) {
165
206
  return `Read ${taskFile} and ${instructionsFile}.
166
207
  ${authorityLine}${visualFeedbackSection}
167
208
  ${testerFeedbackSection}
209
+ ${loopRecoveryHint}
168
210
 
169
211
  Work only on the current phase.
170
212
  Select the first unchecked actionable checkbox in phase order.
@@ -190,6 +232,7 @@ Before stopping:
190
232
  return `Read ${taskFile} and ${instructionsFile}.
191
233
  ${authorityLine}${visualFeedbackSection}
192
234
  ${testerFeedbackSection}
235
+ ${loopRecoveryHint}
193
236
 
194
237
  Do one current-phase unchecked task.
195
238
 
@@ -224,12 +267,14 @@ export function buildFixPrompt(config, recentVerificationOutput, options = {}) {
224
267
  )
225
268
  const findings = clampLines(recentVerificationOutput, configMaxLines(config, 'maxVerificationExcerptLines', 40))
226
269
  const largeFileRiskHint = formatLargeFileRiskHint(options.largeFileWarnings)
270
+ const loopRecoveryHint = formatLoopRecoveryHint(options.loopRecoveryHints)
227
271
 
228
272
  if (!config.usingBundledDeveloperInstructions) {
229
273
  return `Read ${taskFile} and ${instructionsFile}.
230
274
  ${authorityLine}${visualFeedbackSection}
231
275
  ${testerFeedbackSection}
232
276
  ${largeFileRiskHint}
277
+ ${loopRecoveryHint}
233
278
 
234
279
  The tester step found a real problem in the current implementation. Fix only the product behavior related to the current phase and current task.
235
280
 
@@ -256,6 +301,7 @@ Before stopping:
256
301
  ${authorityLine}${visualFeedbackSection}
257
302
  ${testerFeedbackSection}
258
303
  ${largeFileRiskHint}
304
+ ${loopRecoveryHint}
259
305
 
260
306
  The tester step found a real problem in the current implementation. Fix only the product behavior related to the current phase and current task.
261
307
 
@@ -289,6 +335,7 @@ export function buildSteeringPrompt(config, reason, options = {}) {
289
335
  config.usingBundledDeveloperInstructions,
290
336
  )
291
337
  const largeFileRiskHint = formatLargeFileRiskHint(options.largeFileWarnings)
338
+ const loopRecoveryHint = formatLoopRecoveryHint(options.loopRecoveryHints)
292
339
 
293
340
  if (!config.usingBundledDeveloperInstructions) {
294
341
  return `Continue from the current repo state.
@@ -296,6 +343,7 @@ Read ${taskFile} and ${instructionsFile}.
296
343
  ${authorityLine}${visualFeedbackSection}
297
344
  ${testerFeedbackSection}
298
345
  ${largeFileRiskHint}
346
+ ${loopRecoveryHint}
299
347
 
300
348
  Reason for this follow-up: ${reason}
301
349
 
@@ -316,6 +364,7 @@ Read ${taskFile} and ${instructionsFile}.
316
364
  ${authorityLine}${visualFeedbackSection}
317
365
  ${testerFeedbackSection}
318
366
  ${largeFileRiskHint}
367
+ ${loopRecoveryHint}
319
368
 
320
369
  Reason for this follow-up: ${reason}
321
370
 
@@ -353,6 +402,9 @@ export function buildTesterPrompt(config, {
353
402
  developerNotes || '(none provided)',
354
403
  configMaxLines(config, 'maxPromptNotesLines', 16),
355
404
  )
405
+ const taskType = classifyTaskType(task)
406
+ const taskTypeLabel = taskType === 'test' ? 'test-focused' : 'general'
407
+ const taskTypeGuidance = formatTaskTypeGuidance(taskType)
356
408
  const verificationCommand = config.testCommand.trim() === '' ? '(not configured)' : config.testCommand
357
409
  const visualCaptureNote = config.visualReviewEnabled
358
410
  ? `\n- Keep the screenshot capture flow working so the harness still produces current visual artifacts for review.`
@@ -364,6 +416,7 @@ export function buildTesterPrompt(config, {
364
416
  )
365
417
  const passOwnership = testerPassOwnershipRules(config)
366
418
  const largeFileRiskHint = formatLargeFileRiskHint(largeFileWarnings)
419
+ const taskTypeRuleBlock = taskTypeGuidance === '' ? '' : `${taskTypeGuidance}\n`
367
420
 
368
421
  if (!config.usingBundledTesterInstructions) {
369
422
  return `Read ${taskFile} and ${instructionsFile}.
@@ -375,6 +428,7 @@ You are the TESTER role. You are reviewing the most recent developer work from a
375
428
 
376
429
  Current phase: ${phase}
377
430
  Current task: ${task}
431
+ Current task type: ${taskTypeLabel}
378
432
  Reason for this tester pass: ${reason}
379
433
 
380
434
  Developer notes:
@@ -391,7 +445,7 @@ Rules:
391
445
  - If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
392
446
  - If blocked or inconclusive, return VERDICT: BLOCKED.
393
447
  - Do not hide real bugs with brittle tests.
394
- - ${passOwnership.successRule.slice(2)}
448
+ ${taskTypeRuleBlock}- ${passOwnership.successRule.slice(2)}
395
449
  - ${passOwnership.isolationRule.slice(2)}
396
450
  - ${passOwnership.extraRule.slice(2)}${visualCaptureNote}
397
451
 
@@ -417,6 +471,7 @@ You are the TESTER role. You are reviewing the most recent developer work from a
417
471
 
418
472
  Current phase: ${phase}
419
473
  Current task: ${task}
474
+ Current task type: ${taskTypeLabel}
420
475
  Reason for this tester pass: ${reason}
421
476
 
422
477
  Developer notes:
@@ -433,7 +488,7 @@ ${indentBlock(innerLoopValidationRules(verificationCommand), '\t')}
433
488
  - Prefer one focused browser-driven review pass.
434
489
  - If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
435
490
  - Do not hide real bugs with brittle tests.
436
- - If blocked or inconclusive, return VERDICT: BLOCKED.
491
+ ${taskTypeGuidance === '' ? '' : `${indentBlock(taskTypeGuidance, '\t')}\n`} - If blocked or inconclusive, return VERDICT: BLOCKED.
437
492
  ${indentBlock(passOwnership.successRule, '\t')}
438
493
  ${indentBlock(passOwnership.isolationRule, '\t')}
439
494
  ${indentBlock(passOwnership.extraRule, '\t')}${visualCaptureNote}
package/src/pi-repo.mjs CHANGED
@@ -57,6 +57,7 @@ export async function readState(stateFile) {
57
57
  lastStatus: '',
58
58
  lastVerificationStatus: '',
59
59
  lastVisualStatus: '',
60
+ loopHistory: {},
60
61
  lastRunAt: '',
61
62
  runId: '',
62
63
  inProgress: null,
@@ -75,6 +76,7 @@ export async function readState(stateFile) {
75
76
  lastStatus: '',
76
77
  lastVerificationStatus: '',
77
78
  lastVisualStatus: '',
79
+ loopHistory: {},
78
80
  lastRunAt: '',
79
81
  runId: '',
80
82
  inProgress: null,
@@ -282,7 +284,8 @@ export function watchParentProcess(onParentExit, options = {}) {
282
284
  }
283
285
 
284
286
  const currentParentPid = normalizePid(process.ppid)
285
- if (currentParentPid === expectedParentPid && currentParentPid > 1) {
287
+ const parentStillRunning = isProcessRunning(expectedParentPid)
288
+ if (currentParentPid === expectedParentPid && currentParentPid > 1 && parentStillRunning) {
286
289
  return
287
290
  }
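Note on the parent watcher: the updated condition adds a liveness probe on top of the PID comparison. The diff does not state the motivation; one plausible reason is a platform where `process.ppid` keeps reporting the original parent after it has exited. `isProcessRunning` is referenced but not shown here, so the version below is a hypothetical sketch built on `process.kill(pid, 0)`, and `parentLooksAlive` is an illustrative name for the combined check.

```javascript
// Hypothetical isProcessRunning; the package's own implementation is not part of this diff.
function isProcessRunning(pid) {
  if (!Number.isInteger(pid) || pid <= 0) {
    return false
  }
  try {
    process.kill(pid, 0) // signal 0 sends nothing and only tests for existence
    return true
  } catch (error) {
    // EPERM means the process exists but we may not signal it.
    return error.code === 'EPERM'
  }
}

// Mirrors the updated condition: the watcher keeps waiting only if all three checks hold.
function parentLooksAlive(currentParentPid, expectedParentPid) {
  return currentParentPid === expectedParentPid
    && currentParentPid > 1
    && isProcessRunning(expectedParentPid)
}

console.log(isProcessRunning(process.pid))
console.log(parentLooksAlive(process.ppid, process.ppid))
```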
288
291
 
@@ -483,7 +486,7 @@ function countLines(text) {
483
486
  return normalized.split('\n').length
484
487
  }
485
488
 
486
- function isSpecLikeFile(filePath) {
489
+ export function isSpecLikeFile(filePath) {
487
490
  const normalized = String(filePath ?? '').replaceAll('\\', '/')
488
491
  return /(^|\/)(e2e|test|tests|spec|specs)\//.test(normalized)
489
492
  || /\.(spec|test)\.[cm]?[jt]sx?$/.test(normalized)
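Note on `isSpecLikeFile`: now that the helper is exported, its matching rules are easier to check directly. The sketch below covers only the two patterns visible in this hunk (the original function may continue past the hunk boundary), and the sample paths are invented for illustration.

```javascript
// Only the two patterns visible in the diff; the original may include more alternatives.
function isSpecLikeFile(filePath) {
  const normalized = String(filePath ?? '').replaceAll('\\', '/')
  return /(^|\/)(e2e|test|tests|spec|specs)\//.test(normalized)
    || /\.(spec|test)\.[cm]?[jt]sx?$/.test(normalized)
}

// Hypothetical paths.
const paths = [
  'e2e/login.js',                       // true: top-level e2e directory
  'src\\components\\Button.test.tsx',   // true: backslashes are normalized first
  'lib/parser.spec.mjs',                // true: .spec.mjs suffix
  'src/contest/rules.js',               // false: "contest" is not a "test" directory
  'fixtures/sample.test.json',          // false: suffix pattern only covers js/ts variants
  'src/app.mjs',                        // false
]

for (const filePath of paths) {
  console.log(`${isSpecLikeFile(filePath)}\t${filePath}`)
}
```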
package/src/pi-report.mjs CHANGED
@@ -46,6 +46,21 @@ async function main() {
46
46
  }
47
47
  }
48
48
 
49
+ const failureArtifacts = recent
50
+ .filter((event) => String(event.artifactPath ?? '').trim() !== '')
51
+ .slice(-5)
52
+
53
+ if (failureArtifacts.length > 0) {
54
+ console.log('\nFailure artifacts:')
55
+ for (const event of failureArtifacts) {
56
+ const excerpt = String(event.outputExcerpt ?? '').trim()
57
+ console.log(`- iteration ${event.iteration} ${event.kind}: ${event.artifactPath}`)
58
+ if (excerpt !== '') {
59
+ console.log(` excerpt: ${excerpt.split('\n')[0]}`)
60
+ }
61
+ }
62
+ }
63
+
49
64
  const last = recent.at(-1)
50
65
  if (!last) {
51
66
  return
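Note on the report output: the new block lists at most the five most recent telemetry events that carry an `artifactPath`, and prints only the first line of any `outputExcerpt`. The sketch below wraps the same selection logic in a function that returns lines instead of logging them (an assumption made for testability). The event field names come from the diff; the sample values, including the `kind` strings and artifact file names, are invented.

```javascript
// Same selection and formatting logic as the pi-report.mjs hunk, returning lines instead of logging.
function formatFailureArtifacts(recent) {
  const failureArtifacts = recent
    .filter((event) => String(event.artifactPath ?? '').trim() !== '')
    .slice(-5)

  const out = []
  if (failureArtifacts.length > 0) {
    out.push('\nFailure artifacts:')
    for (const event of failureArtifacts) {
      const excerpt = String(event.outputExcerpt ?? '').trim()
      out.push(`- iteration ${event.iteration} ${event.kind}: ${event.artifactPath}`)
      if (excerpt !== '') {
        out.push(`  excerpt: ${excerpt.split('\n')[0]}`)
      }
    }
  }
  return out
}

// Hypothetical telemetry events.
const recent = [
  {
    iteration: 3,
    kind: 'verification_failed',
    artifactPath: 'pi-output/failure-artifacts/iter-3.json',
    outputExcerpt: 'AssertionError: expected 200\n  at test.mjs:12',
  },
  { iteration: 4, kind: 'iteration_complete' }, // no artifactPath, so it is skipped
  {
    iteration: 5,
    kind: 'verification_failed',
    artifactPath: 'pi-output/failure-artifacts/iter-5.json',
    outputExcerpt: '',
  },
]

const reportLines = formatFailureArtifacts(recent)
console.log(reportLines.join('\n'))
```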