npm - @sebastianandreasson/pi-autonomous-agents - Versions diffs - 0.10.0 → 0.12.1 - Mend

@sebastianandreasson/pi-autonomous-agents 0.10.0 → 0.12.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (18)

package/README.md +18 -1
package/docs/PI_SUPERVISOR.md +11 -1
package/package.json +3 -2
package/src/cli.mjs +4 -1
package/src/pi-config.mjs +9 -0
package/src/pi-debug-live.mjs +52 -16
package/src/pi-prompts.mjs +57 -2
package/src/pi-repo.mjs +5 -2
package/src/pi-report.mjs +15 -0
package/src/pi-supervisor.mjs +559 -34
package/src/pi-telemetry.mjs +6 -1
package/src/pi-visualizer-server.mjs +23 -5
package/src/pi-visualizer-shared.mjs +29 -10
package/visualizer-ui/dist/assets/index-Bbj-UfL5.js +12 -0
package/visualizer-ui/dist/assets/index-CO5voAk0.css +1 -0
package/visualizer-ui/dist/index.html +2 -2
package/visualizer-ui/dist/assets/index-C398cGuP.js +0 -12
package/visualizer-ui/dist/assets/index-DuJxYqkl.css +0 -1

package/README.md CHANGED

@@ -190,10 +190,13 @@ Common fields in `pi.config.json`:
 - `testCommand`
 - `visualReviewEnabled`
 - `visualCaptureCommand`
+- `failureArtifactDir`
 - `continueAfterSeconds`
 - `toolContinueAfterSeconds`
 - `noEventTimeoutSeconds`
 - `toolNoEventTimeoutSeconds`
+- `sameFileLoopBudget`
+- `loopHistoryLimit`
 - `largeFileWarningLines`
 - `largeSpecWarningLines`
@@ -207,6 +210,8 @@ Key defaults:
 - `toolContinueAfterSeconds`: `900`
 - `noEventTimeoutSeconds`: `900`
 - `toolNoEventTimeoutSeconds`: `1800`
+- `sameFileLoopBudget`: `2`
+- `loopHistoryLimit`: `25`
 ## Prompt and Tooling Behavior
@@ -217,6 +222,7 @@ The package is optimized for local models by default:
 - prompts prefer `read` for source inspection
 - shell is intended for `git`, tests, and narrow diagnostics
 - SDK transport carries forward oversized shell-read warnings and loop/timeout guards
+- repeated same-file loop failures are remembered across iterations and escalate the next edit strategy
 - the supervisor emits large-file/spec warnings when touched files are getting risky
 This is deliberate. Large monolith files, huge e2e specs, and broad TODO items are one of the main causes of local-model drift and retry loops.
@@ -255,6 +261,8 @@ Useful files during a run:
   Latest verification output snapshot.
 - `.pi-last-iteration.json`
   Structured summary of the last completed iteration.
+- `pi-output/failure-artifacts/`
+  Compact failure artifacts with command, exit code, changed files, tester summary, and output excerpt.
 - `.pi-state.json`
   Persistent harness state, including in-progress iteration data.
 - `pi.log`
@@ -264,7 +272,7 @@ Useful files during a run:
 - `.pi-runtime/active-run.json`
 - `.pi-runtime/runs/<runId>/...`
-`pi-harness report` summarizes recent telemetry and surfaces things like terminal reasons and large-file warnings.
+`pi-harness report` summarizes recent telemetry and surfaces things like terminal reasons, large-file warnings, and recent failure artifacts.
 `pi-harness run` now also starts lightweight local web UI for orchestration flow by default. By default it listens on `127.0.0.1:4317`. Override with `PI_VISUALIZER_HOST` and `PI_VISUALIZER_PORT`. Set `PI_VISUALIZER=0` to disable embedded web UI for a run.
@@ -326,6 +334,13 @@ For local visualizer iteration against fake live SDK agent:
 npm run debug:live-ui
 ```
+Scenario variants:
+```bash
+node src/cli.mjs debug-live --reset --scenario noisy --task-count 24
+node src/cli.mjs debug-live --reset --scenario retry
+```
 For React/Vite visualizer UI dev loop:
 ```bash
@@ -338,6 +353,8 @@ For production visualizer UI build:
 npm run build:visualizer:ui
 ```
+Publish now auto-runs check, tests, and UI build via `prepublishOnly`.
 This seeds `.pi-debug/live-ui/`, runs harness there with streaming fake SDK fixture, hosts visualizer, and gives stable local repro loop for UI work. React app lives in `visualizer-ui/`. Visualizer server now serves built assets from `visualizer-ui/dist/` and falls back to build-instructions page if build artifacts are missing.
 See `docs/VISUALIZER_UI_PLAN.md` for migration plan.

package/docs/PI_SUPERVISOR.md CHANGED

@@ -62,7 +62,7 @@ The package reads `PI_CONFIG_FILE` if provided. Otherwise it falls back to the b
 Visualizer reads active-run lock, TODO file, per-run state, per-run iteration summary, per-run last output snapshot, live feed JSONL, and telemetry to show current stage plus historical runs.
-For local UI iteration in this package repo, use `pi-harness debug-live` to run against seeded fake live SDK sandbox.
+For local UI iteration in this package repo, use `pi-harness debug-live` to run against seeded fake live SDK sandbox. Useful variants: `--scenario noisy`, `--scenario retry`, `--task-count 24`.
 ## Config Contract
@@ -80,10 +80,13 @@ Projects typically provide their own `pi.config.json` with fields such as:
 - `visualCaptureCommand`
 - `visualFeedbackFile`
 - `testerFeedbackFile`
+- `failureArtifactDir`
 - `models`
 - `piModel`
 - `visualReviewModel`
 - `commitMode`
+- `sameFileLoopBudget`
+- `loopHistoryLimit`
 Model entries may carry their own OpenAI-compatible endpoint settings, so the PI text loop and the multimodal visual reviewer can point at different backends without changing code.
@@ -124,6 +127,10 @@ The default flow keeps commit ownership with the active agent:
 2. `tester` should review functionality and, on `PASS`, stage only the task-related files and create the commit directly.
 3. If the working tree is too messy to isolate safely, tester should return `VERDICT: BLOCKED` instead of guessing.
+If tester returns `PASS` but leaves a dirty tree without creating the commit, the harness now treats that as a protocol error and automatically falls back to a commit-plan follow-up instead of stalling the iteration.
+If tester edits files before finalization, the harness re-runs the configured smoke verification command immediately and records which files tester touched.
 If a repo explicitly needs the older harness-managed commit-plan flow, set `commitMode` to `plan`. In that mode, `testerCommit` and parsed commit plans are used as a compatibility path rather than the default.
 For source inspection, prompts prefer `read` and reserve shell usage for `git`, tests, and narrow diagnostics. Large shell file reads are more likely to truncate under context pressure than focused `read` calls.
@@ -175,6 +182,7 @@ SDK transport mitigates obvious local loops by watching agent and tool events:
 - repeated identical tool calls are aborted
 - repeated same-path churn is aborted
+- repeated same-file loop targets are persisted in harness state and escalate the next retry strategy
 - a soft `continue` can be sent after inactivity
 - a separate tool-aware watchdog can tolerate long-running `bash` or browser work without treating the turn as dead
 - a hard no-event timeout aborts a wedged turn instead of hanging indefinitely
@@ -200,4 +208,6 @@ Each step records:
 - changed file count
 - verification status
 - retry count
+- artifact path for compact failure diagnostics when available
+- output excerpt for failed verification-style events
 - notes

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "@sebastianandreasson/pi-autonomous-agents",
   "private": false,
-  "version": "0.10.0",
+  "version": "0.12.1",
   "type": "module",
   "description": "Portable unattended PI harness for developer/tester/visual-review loops.",
   "license": "MIT",
@@ -23,7 +23,8 @@
     "test": "node --test test/pi-heartbeat.test.mjs test/pi-lifecycle.test.mjs test/pi-role-models.test.mjs test/pi-flow.test.mjs test/pi-history.test.mjs test/pi-prompts.test.mjs test/pi-preflight.test.mjs test/pi-repo.test.mjs test/pi-sdk-supervisor.test.mjs test/pi-sdk-turn.test.mjs test/pi-telemetry.test.mjs test/pi-visualizer-shared.test.mjs",
     "debug:live-ui": "node src/cli.mjs debug-live --reset",
     "dev:visualizer:ui": "npm --prefix visualizer-ui run dev",
-    "build:visualizer:ui": "npm --prefix visualizer-ui run build"
+    "build:visualizer:ui": "npm --prefix visualizer-ui run build",
+    "prepublishOnly": "npm run check && npm test && npm run build:visualizer:ui"
   },
   "files": [
     "src",

package/src/cli.mjs CHANGED

@@ -36,11 +36,14 @@ function main() {
   if (subcommand === 'once' || subcommand === 'run') {
     childArgs.push(subcommand)
   }
+  const childStdio = subcommand === 'once' || subcommand === 'run'
+    ? ['pipe', 'inherit', 'inherit']
+    : 'inherit'
   const child = spawn(process.execPath, childArgs, {
     cwd: process.cwd(),
     env: process.env,
-    stdio: 'inherit',
+    stdio: childStdio,
   })
   registerOwnedChildProcess(child)
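
Note on the hunk above: Node's `stdio` option accepts either a single string or a three-element array ordered `[stdin, stdout, stderr]`. The sketch below isolates the new selection logic as a pure function so the two outcomes are easy to see. The function name `pickChildStdio` is hypothetical and does not appear in the package.

```javascript
// Hypothetical helper mirroring the childStdio expression added in cli.mjs.
function pickChildStdio(subcommand) {
  return subcommand === 'once' || subcommand === 'run'
    ? ['pipe', 'inherit', 'inherit'] // piped stdin, shared stdout/stderr
    : 'inherit'                      // all three streams shared with the parent
}

const runStdio = pickChildStdio('run')
const reportStdio = pickChildStdio('report')
console.log(runStdio, reportStdio)
```

With this change, `once` and `run` children get their own stdin pipe rather than the parent's stdin, while other subcommands keep the previous behavior.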

package/src/pi-config.mjs CHANGED

@@ -259,6 +259,7 @@ export function loadConfig(mode = 'once') {
     maxTesterFeedbackLines: readInt('PI_MAX_TESTER_FEEDBACK_LINES', file.maxTesterFeedbackLines, 32),
     maxPromptNotesLines: readInt('PI_MAX_PROMPT_NOTES_LINES', file.maxPromptNotesLines, 16),
     maxVerificationExcerptLines: readInt('PI_MAX_VERIFICATION_EXCERPT_LINES', file.maxVerificationExcerptLines, 40),
+    maxFailureArtifactLines: readInt('PI_MAX_FAILURE_ARTIFACT_LINES', file.maxFailureArtifactLines, 80),
     largeFileWarningLines: readInt('PI_LARGE_FILE_WARNING_LINES', file.largeFileWarningLines, 500),
     largeSpecWarningLines: readInt('PI_LARGE_SPEC_WARNING_LINES', file.largeSpecWarningLines, 300),
     piTools: readString('PI_TOOLS', file.piTools, 'read,edit,write,find,ls,bash'),
@@ -280,6 +281,8 @@ export function loadConfig(mode = 'once') {
     verificationTimeoutSeconds: readInt('PI_VERIFICATION_TIMEOUT', file.verificationTimeoutSeconds, 300),
     idleRetryLimit: readInt('PI_IDLE_RETRY_LIMIT', file.idleRetryLimit, 1),
     noChangeRetryLimit: readInt('PI_NO_CHANGE_RETRY_LIMIT', file.noChangeRetryLimit, 1),
+    sameFileLoopBudget: readInt('PI_SAME_FILE_LOOP_BUDGET', file.sameFileLoopBudget, 2),
+    loopHistoryLimit: readInt('PI_LOOP_HISTORY_LIMIT', file.loopHistoryLimit, 25),
     visualFeedbackFile: resolveFromCwd(
       cwd,
       'PI_VISUAL_FEEDBACK_FILE',
@@ -298,6 +301,12 @@ export function loadConfig(mode = 'once') {
       file.testerFeedbackHistoryDir,
       'pi-output/tester-feedback/history'
     ),
+    failureArtifactDir: resolveFromCwd(
+      cwd,
+      'PI_FAILURE_ARTIFACT_DIR',
+      file.failureArtifactDir,
+      'pi-output/failure-artifacts'
+    ),
     visualReviewHistoryDir: resolveFromCwd(
       cwd,
       'PI_VISUAL_REVIEW_HISTORY_DIR',
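
Note on the hunks above: the new settings use the same `readInt(envName, fileValue, fallback)` call shape as the existing ones. `readInt` itself is not part of this diff, so the following is only a sketch under the assumption that it prefers the environment variable, then the `pi.config.json` value, then the default. `readIntSketch` and the sample `env`/`file` objects are hypothetical.

```javascript
// Hypothetical stand-in for readInt; the real helper is not shown in this diff.
// Assumed precedence: environment variable, then config file value, then fallback.
function readIntSketch(env, envName, fileValue, fallback) {
  for (const candidate of [env[envName], fileValue]) {
    const parsed = Number.parseInt(String(candidate ?? ''), 10)
    if (Number.isFinite(parsed)) {
      return parsed
    }
  }
  return fallback
}

// Made-up inputs: one value from the environment, one from the config file.
const env = { PI_SAME_FILE_LOOP_BUDGET: '4' }
const file = { loopHistoryLimit: 10 }

const sameFileLoopBudget = readIntSketch(env, 'PI_SAME_FILE_LOOP_BUDGET', file.sameFileLoopBudget, 2)
const loopHistoryLimit = readIntSketch(env, 'PI_LOOP_HISTORY_LIMIT', file.loopHistoryLimit, 25)
const maxFailureArtifactLines = readIntSketch(env, 'PI_MAX_FAILURE_ARTIFACT_LINES', file.maxFailureArtifactLines, 80)

console.log({ sameFileLoopBudget, loopHistoryLimit, maxFailureArtifactLines })
// → { sameFileLoopBudget: 4, loopHistoryLimit: 10, maxFailureArtifactLines: 80 }
```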

package/src/pi-debug-live.mjs CHANGED

@@ -12,11 +12,51 @@ const cliFile = path.join(scriptDir, 'cli.mjs')
 const fakePiFile = path.join(packageRoot, 'test', 'fixtures', 'fake-pi.mjs')
 const fakeLiveSdkFile = path.join(packageRoot, 'test', 'fixtures', 'fake-live-pi-sdk.mjs')
 const sandboxDir = path.join(packageRoot, '.pi-debug', 'live-ui')
+const DEFAULT_TASK_COUNT = 12
 function shellQuote(value) {
   return JSON.stringify(String(value))
 }
+function readFlagValue(flag) {
+  const index = process.argv.indexOf(flag)
+  if (index === -1) {
+    return ''
+  }
+  return String(process.argv[index + 1] ?? '').trim()
+}
+function readScenario() {
+  const value = readFlagValue('--scenario') || process.env.PI_FAKE_LIVE_SCENARIO || 'default'
+  return String(value).trim() || 'default'
+}
+function readTaskCount() {
+  const raw = Number.parseInt(readFlagValue('--task-count') || process.env.PI_DEBUG_TASK_COUNT || `${DEFAULT_TASK_COUNT}`, 10)
+  return Number.isFinite(raw) && raw > 0 ? raw : DEFAULT_TASK_COUNT
+}
+function buildTodoLines(taskCount) {
+  const lines = []
+  for (let index = 1; index <= taskCount; index += 1) {
+    const phase = index <= Math.ceil(taskCount / 3)
+      ? 'Phase 1'
+      : index <= Math.ceil((taskCount * 2) / 3)
+        ? 'Phase 2'
+        : 'Phase 3'
+    const label = `Fake live task ${index}`
+    if (lines.length === 0 || lines[lines.length - 1] !== `## ${phase}`) {
+      if (lines.length > 0) {
+        lines.push('')
+      }
+      lines.push(`## ${phase}`)
+      lines.push('')
+    }
+    lines.push(`- [ ] ${label}`)
+  }
+  return `${lines.join('\n')}\n`
+}
 async function ensureRepo(cwd) {
   try {
     execFileSync('git', ['rev-parse', '--is-inside-work-tree'], { cwd, stdio: 'ignore' })
@@ -27,21 +67,11 @@ async function ensureRepo(cwd) {
   }
 }
-async function seedFiles(cwd) {
+async function seedFiles(cwd, { taskCount, scenario }) {
   await fs.mkdir(path.join(cwd, 'pi'), { recursive: true })
-  await fs.writeFile(path.join(cwd, 'TODOS.md'), [
-    '## Phase 1',
-    '',
-    '- [ ] Fake live task one',
-    '- [ ] Fake live task two',
-    '- [ ] Fake live task three',
-    '',
-    '## Phase 2',
-    '',
-    '- [ ] Fake live task four',
-  ].join('\n') + '\n', 'utf8')
-  await fs.writeFile(path.join(cwd, 'DEVELOPER.md'), 'Developer instructions for local visualizer debugging.\n', 'utf8')
-  await fs.writeFile(path.join(cwd, 'TESTER.md'), 'Tester instructions for local visualizer debugging.\n', 'utf8')
+  await fs.writeFile(path.join(cwd, 'TODOS.md'), buildTodoLines(taskCount), 'utf8')
+  await fs.writeFile(path.join(cwd, 'DEVELOPER.md'), `Developer instructions for local visualizer debugging.\nScenario: ${scenario}\n`, 'utf8')
+  await fs.writeFile(path.join(cwd, 'TESTER.md'), `Tester instructions for local visualizer debugging.\nScenario: ${scenario}\n`, 'utf8')
   await fs.writeFile(path.join(cwd, 'pi.config.json'), `${JSON.stringify({
     transport: 'sdk',
     taskFile: 'TODOS.md',
@@ -63,7 +93,7 @@ async function seedFiles(cwd) {
     toolContinueAfterSeconds: 3600,
     toolNoEventTimeoutSeconds: 3600,
     sleepBetweenSeconds: 1,
-    maxIterations: 20,
+    maxIterations: Math.max(taskCount * 3, 20),
   }, null, 2)}\n`, 'utf8')
 }
@@ -78,17 +108,22 @@ async function ensureInitialCommit(cwd) {
 async function main() {
   const reset = process.argv.includes('--reset')
+  const scenario = readScenario()
+  const taskCount = readTaskCount()
   if (reset) {
     await fs.rm(sandboxDir, { recursive: true, force: true })
   }
   await fs.mkdir(sandboxDir, { recursive: true })
   await ensureRepo(sandboxDir)
-  await seedFiles(sandboxDir)
+  await seedFiles(sandboxDir, { taskCount, scenario })
   await ensureInitialCommit(sandboxDir)
   process.stdout.write(`PI debug sandbox: ${sandboxDir}\n`)
   process.stdout.write(`Using fake live SDK fixture: ${fakeLiveSdkFile}\n`)
+  process.stdout.write(`Scenario: ${scenario}\n`)
+  process.stdout.write(`Task count: ${taskCount}\n`)
   const child = spawn(process.execPath, [cliFile, 'run'], {
     cwd: sandboxDir,
@@ -96,6 +131,7 @@ async function main() {
       ...process.env,
       PI_CONFIG_FILE: 'pi.config.json',
       PI_SDK_MODULE: fakeLiveSdkFile,
+      PI_FAKE_LIVE_SCENARIO: scenario,
       PI_VISUALIZER_HOST: process.env.PI_VISUALIZER_HOST || '127.0.0.1',
       PI_VISUALIZER_PORT: process.env.PI_VISUALIZER_PORT || '4317',
     },
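
Review note on `buildTodoLines` above: as written, the heading check compares the last pushed line against `## ${phase}`. That last line is always either a blank line or a task line, so the condition appears to hold on every iteration and a `## Phase N` heading would be emitted before each task. Whether the harness's TODO parser tolerates repeated headings is not visible in this diff. If one heading per phase is the intent, a minimal variant could track the current phase explicitly. `buildTodoLinesGrouped` is a hypothetical name and not part of the package.

```javascript
// Hypothetical variant of buildTodoLines that remembers the current phase,
// so each "## Phase N" heading is written once per phase.
function buildTodoLinesGrouped(taskCount) {
  const lines = []
  let currentPhase = ''
  for (let index = 1; index <= taskCount; index += 1) {
    // Same three-way split as the diff: first third, second third, remainder.
    const phase = index <= Math.ceil(taskCount / 3)
      ? 'Phase 1'
      : index <= Math.ceil((taskCount * 2) / 3)
        ? 'Phase 2'
        : 'Phase 3'
    if (phase !== currentPhase) {
      if (lines.length > 0) {
        lines.push('')
      }
      lines.push(`## ${phase}`, '')
      currentPhase = phase
    }
    lines.push(`- [ ] Fake live task ${index}`)
  }
  return `${lines.join('\n')}\n`
}

const todo = buildTodoLinesGrouped(6)
console.log(todo)
```

For `taskCount = 6` this yields three headings with two tasks under each.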

package/src/pi-prompts.mjs CHANGED

@@ -54,6 +54,16 @@ function formatLargeFileRiskHint(warnings) {
   return `\nLarge file risk in touched files:\n${lines}\nPrefer helper extraction, smaller scoped edits, or test splitting over broad in-place edits.\n`
 }
+function formatLoopRecoveryHint(hints) {
+  const list = Array.isArray(hints) ? hints.filter(Boolean) : []
+  if (list.length === 0) {
+    return ''
+  }
+  const lines = list.slice(0, 3).map((hint) => `- ${hint}`).join('\n')
+  return `\nRecent loop-recovery constraints:\n${lines}\n`
+}
 function displayPath(config, filePath) {
   const relativePath = path.relative(config.cwd, filePath)
   if (
@@ -119,6 +129,36 @@ function repoInstructionsAuthorityLine(config, instructionsFile, usesBundledInst
   return `Repo-local instructions in ${displayPath(config, instructionsFile)} are the primary role contract. Follow them over package defaults when they differ.\n`
 }
+export function classifyTaskType(task) {
+  const text = String(task ?? '').trim().toLowerCase()
+  if (text === '') {
+    return 'general'
+  }
+  if (
+    /\b(write|add|create|implement|expand|improve|fix|update)\b.*\b(test|tests|coverage|regression test|spec|specs)\b/.test(text)
+    || /\b(test|tests|coverage|regression test|spec|specs)\b.*\b(write|add|create|implement|expand|improve|fix|update)\b/.test(text)
+  ) {
+    return 'test'
+  }
+  return 'general'
+}
+function formatTaskTypeGuidance(taskType) {
+  if (taskType !== 'test') {
+    return ''
+  }
+  return [
+    'Test-task guidance:',
+    '- This TODO is primarily test-focused. Do not fail solely because changes are mostly or entirely tests.',
+    '- PASS if the new or updated test adds meaningful behavioral or regression coverage and verification passes.',
+    '- FAIL if the test is brittle, redundant, weakly asserted, or not tied to real behavior.',
+    '- Prefer checking whether the test would have failed before the change, or whether developer notes justify why missing coverage mattered.',
+  ].join('\n')
+}
 function testerPassOwnershipRules(config) {
   if (config.commitMode === 'plan') {
     return {
@@ -160,11 +200,13 @@ export function buildMainPrompt(config, options = {}) {
     config.developerInstructionsFile,
     config.usingBundledDeveloperInstructions,
   )
+  const loopRecoveryHint = formatLoopRecoveryHint(options.loopRecoveryHints)
   if (!config.usingBundledDeveloperInstructions) {
     return `Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
+${loopRecoveryHint}
 Work only on the current phase.
 Select the first unchecked actionable checkbox in phase order.
@@ -190,6 +232,7 @@ Before stopping:
   return `Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
+${loopRecoveryHint}
 Do one current-phase unchecked task.
@@ -224,12 +267,14 @@ export function buildFixPrompt(config, recentVerificationOutput, options = {}) {
   )
   const findings = clampLines(recentVerificationOutput, configMaxLines(config, 'maxVerificationExcerptLines', 40))
   const largeFileRiskHint = formatLargeFileRiskHint(options.largeFileWarnings)
+  const loopRecoveryHint = formatLoopRecoveryHint(options.loopRecoveryHints)
   if (!config.usingBundledDeveloperInstructions) {
     return `Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
 ${largeFileRiskHint}
+${loopRecoveryHint}
 The tester step found a real problem in the current implementation. Fix only the product behavior related to the current phase and current task.
@@ -256,6 +301,7 @@ Before stopping:
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
 ${largeFileRiskHint}
+${loopRecoveryHint}
 The tester step found a real problem in the current implementation. Fix only the product behavior related to the current phase and current task.
@@ -289,6 +335,7 @@ export function buildSteeringPrompt(config, reason, options = {}) {
     config.usingBundledDeveloperInstructions,
   )
   const largeFileRiskHint = formatLargeFileRiskHint(options.largeFileWarnings)
+  const loopRecoveryHint = formatLoopRecoveryHint(options.loopRecoveryHints)
   if (!config.usingBundledDeveloperInstructions) {
     return `Continue from the current repo state.
@@ -296,6 +343,7 @@ Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
 ${largeFileRiskHint}
+${loopRecoveryHint}
 Reason for this follow-up: ${reason}
@@ -316,6 +364,7 @@ Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
 ${largeFileRiskHint}
+${loopRecoveryHint}
 Reason for this follow-up: ${reason}
@@ -353,6 +402,9 @@ export function buildTesterPrompt(config, {
     developerNotes || '(none provided)',
     configMaxLines(config, 'maxPromptNotesLines', 16),
   )
+  const taskType = classifyTaskType(task)
+  const taskTypeLabel = taskType === 'test' ? 'test-focused' : 'general'
+  const taskTypeGuidance = formatTaskTypeGuidance(taskType)
   const verificationCommand = config.testCommand.trim() === '' ? '(not configured)' : config.testCommand
   const visualCaptureNote = config.visualReviewEnabled
     ? `\n- Keep the screenshot capture flow working so the harness still produces current visual artifacts for review.`
@@ -364,6 +416,7 @@ export function buildTesterPrompt(config, {
   )
   const passOwnership = testerPassOwnershipRules(config)
   const largeFileRiskHint = formatLargeFileRiskHint(largeFileWarnings)
+  const taskTypeRuleBlock = taskTypeGuidance === '' ? '' : `${taskTypeGuidance}\n`
   if (!config.usingBundledTesterInstructions) {
     return `Read ${taskFile} and ${instructionsFile}.
@@ -375,6 +428,7 @@ You are the TESTER role. You are reviewing the most recent developer work from a
 Current phase: ${phase}
 Current task: ${task}
+Current task type: ${taskTypeLabel}
 Reason for this tester pass: ${reason}
 Developer notes:
@@ -391,7 +445,7 @@ Rules:
 - If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
 - If blocked or inconclusive, return VERDICT: BLOCKED.
 - Do not hide real bugs with brittle tests.
-- ${passOwnership.successRule.slice(2)}
+${taskTypeRuleBlock}- ${passOwnership.successRule.slice(2)}
 - ${passOwnership.isolationRule.slice(2)}
 - ${passOwnership.extraRule.slice(2)}${visualCaptureNote}
@@ -417,6 +471,7 @@ You are the TESTER role. You are reviewing the most recent developer work from a
 Current phase: ${phase}
 Current task: ${task}
+Current task type: ${taskTypeLabel}
 Reason for this tester pass: ${reason}
 Developer notes:
@@ -433,7 +488,7 @@ ${indentBlock(innerLoopValidationRules(verificationCommand), '\t')}
 	- Prefer one focused browser-driven review pass.
 	- If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
 	- Do not hide real bugs with brittle tests.
-	- If blocked or inconclusive, return VERDICT: BLOCKED.
+${taskTypeGuidance === '' ? '' : `${indentBlock(taskTypeGuidance, '\t')}\n`}	- If blocked or inconclusive, return VERDICT: BLOCKED.
 ${indentBlock(passOwnership.successRule, '\t')}
 ${indentBlock(passOwnership.isolationRule, '\t')}
 ${indentBlock(passOwnership.extraRule, '\t')}${visualCaptureNote}
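
Usage note for `classifyTaskType` above: the function is reproduced here from the diff, minus the `export` keyword, so it can be run standalone. The sample task strings are made up for illustration. A task is labelled `test` only when an action verb and a test-related word both appear as whole words, in either order.

```javascript
// Copied from the pi-prompts.mjs hunk (without `export`) for a standalone run.
function classifyTaskType(task) {
  const text = String(task ?? '').trim().toLowerCase()
  if (text === '') {
    return 'general'
  }
  if (
    /\b(write|add|create|implement|expand|improve|fix|update)\b.*\b(test|tests|coverage|regression test|spec|specs)\b/.test(text)
    || /\b(test|tests|coverage|regression test|spec|specs)\b.*\b(write|add|create|implement|expand|improve|fix|update)\b/.test(text)
  ) {
    return 'test'
  }
  return 'general'
}

// Illustrative inputs only; none of these strings come from the package.
console.log(classifyTaskType('Add regression tests for the login flow'))   // → test
console.log(classifyTaskType('Coverage for the parser needs to improve'))  // → test
console.log(classifyTaskType('Fix login redirect bug'))                    // → general
console.log(classifyTaskType('Fix the contest page layout'))               // → general ("contest" is not a whole-word match)
console.log(classifyTaskType(''))                                          // → general
```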

package/src/pi-repo.mjs CHANGED

@@ -57,6 +57,7 @@ export async function readState(stateFile) {
       lastStatus: '',
       lastVerificationStatus: '',
       lastVisualStatus: '',
+      loopHistory: {},
       lastRunAt: '',
       runId: '',
       inProgress: null,
@@ -75,6 +76,7 @@ export async function readState(stateFile) {
       lastStatus: '',
       lastVerificationStatus: '',
       lastVisualStatus: '',
+      loopHistory: {},
       lastRunAt: '',
       runId: '',
       inProgress: null,
@@ -282,7 +284,8 @@ export function watchParentProcess(onParentExit, options = {}) {
     }
     const currentParentPid = normalizePid(process.ppid)
-    if (currentParentPid === expectedParentPid && currentParentPid > 1) {
+    const parentStillRunning = isProcessRunning(expectedParentPid)
+    if (currentParentPid === expectedParentPid && currentParentPid > 1 && parentStillRunning) {
       return
     }
@@ -483,7 +486,7 @@ function countLines(text) {
   return normalized.split('\n').length
 }
-function isSpecLikeFile(filePath) {
+export function isSpecLikeFile(filePath) {
   const normalized = String(filePath ?? '').replaceAll('\\', '/')
   return /(^|\/)(e2e|test|tests|spec|specs)\//.test(normalized)
     || /\.(spec|test)\.[cm]?[jt]sx?$/.test(normalized)
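
Note on the `watchParentProcess` hunk above: the guard now also requires `isProcessRunning(expectedParentPid)` to be true. That helper's body is not shown in this diff. A common way to probe liveness in Node is `process.kill(pid, 0)`, which performs the existence check without delivering a signal. The sketch below assumes that approach; `isProcessRunningSketch` and `parentLooksAlive` are hypothetical names.

```javascript
// Hypothetical liveness probe; the package's own isProcessRunning is not shown here.
function isProcessRunningSketch(pid) {
  if (!Number.isInteger(pid) || pid <= 0) {
    return false
  }
  try {
    // Signal 0 only tests whether the process exists and can be signalled.
    process.kill(pid, 0)
    return true
  } catch (error) {
    // EPERM: the process exists but belongs to another user, so treat it as running.
    return error.code === 'EPERM'
  }
}

// Mirrors the shape of the updated condition in watchParentProcess.
function parentLooksAlive(currentParentPid, expectedParentPid) {
  return currentParentPid === expectedParentPid
    && currentParentPid > 1
    && isProcessRunningSketch(expectedParentPid)
}

console.log(isProcessRunningSketch(process.pid)) // the current process is running
console.log(parentLooksAlive(1, 1))              // PID 1 is rejected by the `> 1` check
```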

package/src/pi-report.mjs CHANGED

@@ -46,6 +46,21 @@ async function main() {
     }
   }
+  const failureArtifacts = recent
+    .filter((event) => String(event.artifactPath ?? '').trim() !== '')
+    .slice(-5)
+  if (failureArtifacts.length > 0) {
+    console.log('\nFailure artifacts:')
+    for (const event of failureArtifacts) {
+      const excerpt = String(event.outputExcerpt ?? '').trim()
+      console.log(`- iteration ${event.iteration} ${event.kind}: ${event.artifactPath}`)
+      if (excerpt !== '') {
+        console.log(`  excerpt: ${excerpt.split('\n')[0]}`)
+      }
+    }
+  }
   const last = recent.at(-1)
   if (!last) {
     return
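
Illustration of the failure-artifact section added to the report above: the filter keeps events with a non-empty `artifactPath`, takes the last five, and prints only the first line of each `outputExcerpt`. The sketch below wraps the same logic in a function that returns lines instead of logging them. The function name, the sample events, and the artifact filename are all made up for illustration.

```javascript
// Same selection and formatting logic as the pi-report.mjs hunk,
// returning lines rather than printing them.
function describeFailureArtifacts(recent) {
  const lines = []
  const failureArtifacts = recent
    .filter((event) => String(event.artifactPath ?? '').trim() !== '')
    .slice(-5)
  for (const event of failureArtifacts) {
    const excerpt = String(event.outputExcerpt ?? '').trim()
    lines.push(`- iteration ${event.iteration} ${event.kind}: ${event.artifactPath}`)
    if (excerpt !== '') {
      lines.push(`  excerpt: ${excerpt.split('\n')[0]}`)
    }
  }
  return lines
}

// Hypothetical telemetry events; only the second carries an artifact path.
const sampleEvents = [
  { iteration: 1, kind: 'verification', artifactPath: '' },
  {
    iteration: 2,
    kind: 'verification',
    artifactPath: 'pi-output/failure-artifacts/example.json',
    outputExcerpt: 'AssertionError: expected 1\n  at example.test.mjs:3',
  },
  { iteration: 3, kind: 'tester' },
]

const reportLines = describeFailureArtifacts(sampleEvents)
console.log(reportLines.join('\n'))
```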