npm - @sebastianandreasson/pi-autonomous-agents - Versions diffs - 0.11.0 → 0.12.1 - Mend

@sebastianandreasson/pi-autonomous-agents 0.11.0 → 0.12.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

package/README.md +9 -1
package/docs/PI_SUPERVISOR.md +10 -0
package/package.json +1 -1
package/src/cli.mjs +4 -1
package/src/pi-config.mjs +9 -0
package/src/pi-prompts.mjs +19 -0
package/src/pi-repo.mjs +5 -2
package/src/pi-report.mjs +15 -0
package/src/pi-supervisor.mjs +522 -28
package/src/pi-telemetry.mjs +6 -1
package/visualizer-ui/dist/assets/index-Bbj-UfL5.js +12 -0
package/visualizer-ui/dist/assets/index-CO5voAk0.css +1 -0
package/visualizer-ui/dist/index.html +2 -2
package/visualizer-ui/dist/assets/index-C5V0jXPE.css +0 -1
package/visualizer-ui/dist/assets/index-CpHvuv0C.js +0 -12

package/README.md CHANGED

@@ -190,10 +190,13 @@ Common fields in `pi.config.json`:
 - `testCommand`
 - `visualReviewEnabled`
 - `visualCaptureCommand`
+- `failureArtifactDir`
 - `continueAfterSeconds`
 - `toolContinueAfterSeconds`
 - `noEventTimeoutSeconds`
 - `toolNoEventTimeoutSeconds`
+- `sameFileLoopBudget`
+- `loopHistoryLimit`
 - `largeFileWarningLines`
 - `largeSpecWarningLines`
@@ -207,6 +210,8 @@ Key defaults:
 - `toolContinueAfterSeconds`: `900`
 - `noEventTimeoutSeconds`: `900`
 - `toolNoEventTimeoutSeconds`: `1800`
+- `sameFileLoopBudget`: `2`
+- `loopHistoryLimit`: `25`
 ## Prompt and Tooling Behavior
@@ -217,6 +222,7 @@ The package is optimized for local models by default:
 - prompts prefer `read` for source inspection
 - shell is intended for `git`, tests, and narrow diagnostics
 - SDK transport carries forward oversized shell-read warnings and loop/timeout guards
+- repeated same-file loop failures are remembered across iterations and escalate the next edit strategy
 - the supervisor emits large-file/spec warnings when touched files are getting risky
 This is deliberate. Large monolith files, huge e2e specs, and broad TODO items are one of the main causes of local-model drift and retry loops.
@@ -255,6 +261,8 @@ Useful files during a run:
   Latest verification output snapshot.
 - `.pi-last-iteration.json`
   Structured summary of the last completed iteration.
+- `pi-output/failure-artifacts/`
+  Compact failure artifacts with command, exit code, changed files, tester summary, and output excerpt.
 - `.pi-state.json`
   Persistent harness state, including in-progress iteration data.
 - `pi.log`
@@ -264,7 +272,7 @@ Useful files during a run:
 - `.pi-runtime/active-run.json`
 - `.pi-runtime/runs/<runId>/...`
-`pi-harness report` summarizes recent telemetry and surfaces things like terminal reasons and large-file warnings.
+`pi-harness report` summarizes recent telemetry and surfaces things like terminal reasons, large-file warnings, and recent failure artifacts.
 `pi-harness run` now also starts lightweight local web UI for orchestration flow by default. By default it listens on `127.0.0.1:4317`. Override with `PI_VISUALIZER_HOST` and `PI_VISUALIZER_PORT`. Set `PI_VISUALIZER=0` to disable embedded web UI for a run.

package/docs/PI_SUPERVISOR.md CHANGED

@@ -80,10 +80,13 @@ Projects typically provide their own `pi.config.json` with fields such as:
 - `visualCaptureCommand`
 - `visualFeedbackFile`
 - `testerFeedbackFile`
+- `failureArtifactDir`
 - `models`
 - `piModel`
 - `visualReviewModel`
 - `commitMode`
+- `sameFileLoopBudget`
+- `loopHistoryLimit`
 Model entries may carry their own OpenAI-compatible endpoint settings, so the PI text loop and the multimodal visual reviewer can point at different backends without changing code.
@@ -124,6 +127,10 @@ The default flow keeps commit ownership with the active agent:
 2. `tester` should review functionality and, on `PASS`, stage only the task-related files and create the commit directly.
 3. If the working tree is too messy to isolate safely, tester should return `VERDICT: BLOCKED` instead of guessing.
+If tester returns `PASS` but leaves a dirty tree without creating the commit, the harness now treats that as a protocol error and automatically falls back to a commit-plan follow-up instead of stalling the iteration.
+If tester edits files before finalization, the harness re-runs the configured smoke verification command immediately and records which files tester touched.
 If a repo explicitly needs the older harness-managed commit-plan flow, set `commitMode` to `plan`. In that mode, `testerCommit` and parsed commit plans are used as a compatibility path rather than the default.
 For source inspection, prompts prefer `read` and reserve shell usage for `git`, tests, and narrow diagnostics. Large shell file reads are more likely to truncate under context pressure than focused `read` calls.
@@ -175,6 +182,7 @@ SDK transport mitigates obvious local loops by watching agent and tool events:
 - repeated identical tool calls are aborted
 - repeated same-path churn is aborted
+- repeated same-file loop targets are persisted in harness state and escalate the next retry strategy
 - a soft `continue` can be sent after inactivity
 - a separate tool-aware watchdog can tolerate long-running `bash` or browser work without treating the turn as dead
 - a hard no-event timeout aborts a wedged turn instead of hanging indefinitely
@@ -200,4 +208,6 @@ Each step records:
 - changed file count
 - verification status
 - retry count
+- artifact path for compact failure diagnostics when available
+- output excerpt for failed verification-style events
 - notes

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "@sebastianandreasson/pi-autonomous-agents",
   "private": false,
-  "version": "0.11.0",
+  "version": "0.12.1",
   "type": "module",
   "description": "Portable unattended PI harness for developer/tester/visual-review loops.",
   "license": "MIT",

package/src/cli.mjs CHANGED

@@ -36,11 +36,14 @@ function main() {
   if (subcommand === 'once' || subcommand === 'run') {
     childArgs.push(subcommand)
   }
+  const childStdio = subcommand === 'once' || subcommand === 'run'
+    ? ['pipe', 'inherit', 'inherit']
+    : 'inherit'
   const child = spawn(process.execPath, childArgs, {
     cwd: process.cwd(),
     env: process.env,
-    stdio: 'inherit',
+    stdio: childStdio,
   })
   registerOwnedChildProcess(child)
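For readers unfamiliar with the `stdio` option of Node's `child_process.spawn`: an array is read as `[stdin, stdout, stderr]`, where `'pipe'` creates a pipe between parent and child and `'inherit'` passes the parent's stream through. The sketch below isolates the selection logic from the hunk above. The helper name `chooseChildStdio` is hypothetical, and the diff itself does not state why stdin is now piped for `once` and `run`.

```javascript
// Mirrors the ternary added in cli.mjs; the function wrapper is illustrative only.
function chooseChildStdio(subcommand) {
  return subcommand === 'once' || subcommand === 'run'
    ? ['pipe', 'inherit', 'inherit'] // stdin piped, stdout/stderr shared with the parent
    : 'inherit' // all three streams shared with the parent
}

console.log(chooseChildStdio('run'))
console.log(chooseChildStdio('report'))
```

With the array form, the spawned child no longer reads directly from the parent's terminal input, while its output still appears in the parent's console.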

package/src/pi-config.mjs CHANGED

@@ -259,6 +259,7 @@ export function loadConfig(mode = 'once') {
     maxTesterFeedbackLines: readInt('PI_MAX_TESTER_FEEDBACK_LINES', file.maxTesterFeedbackLines, 32),
     maxPromptNotesLines: readInt('PI_MAX_PROMPT_NOTES_LINES', file.maxPromptNotesLines, 16),
     maxVerificationExcerptLines: readInt('PI_MAX_VERIFICATION_EXCERPT_LINES', file.maxVerificationExcerptLines, 40),
+    maxFailureArtifactLines: readInt('PI_MAX_FAILURE_ARTIFACT_LINES', file.maxFailureArtifactLines, 80),
     largeFileWarningLines: readInt('PI_LARGE_FILE_WARNING_LINES', file.largeFileWarningLines, 500),
     largeSpecWarningLines: readInt('PI_LARGE_SPEC_WARNING_LINES', file.largeSpecWarningLines, 300),
     piTools: readString('PI_TOOLS', file.piTools, 'read,edit,write,find,ls,bash'),
@@ -280,6 +281,8 @@ export function loadConfig(mode = 'once') {
     verificationTimeoutSeconds: readInt('PI_VERIFICATION_TIMEOUT', file.verificationTimeoutSeconds, 300),
     idleRetryLimit: readInt('PI_IDLE_RETRY_LIMIT', file.idleRetryLimit, 1),
     noChangeRetryLimit: readInt('PI_NO_CHANGE_RETRY_LIMIT', file.noChangeRetryLimit, 1),
+    sameFileLoopBudget: readInt('PI_SAME_FILE_LOOP_BUDGET', file.sameFileLoopBudget, 2),
+    loopHistoryLimit: readInt('PI_LOOP_HISTORY_LIMIT', file.loopHistoryLimit, 25),
     visualFeedbackFile: resolveFromCwd(
       cwd,
       'PI_VISUAL_FEEDBACK_FILE',
@@ -298,6 +301,12 @@ export function loadConfig(mode = 'once') {
       file.testerFeedbackHistoryDir,
       'pi-output/tester-feedback/history'
     ),
+    failureArtifactDir: resolveFromCwd(
+      cwd,
+      'PI_FAILURE_ARTIFACT_DIR',
+      file.failureArtifactDir,
+      'pi-output/failure-artifacts'
+    ),
     visualReviewHistoryDir: resolveFromCwd(
       cwd,
       'PI_VISUAL_REVIEW_HISTORY_DIR',
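The new options follow the existing `readInt(envName, fileValue, fallback)` call shape. The body of `readInt` is not part of this diff, so the sketch below only assumes the precedence its argument order suggests: environment variable first, then the `pi.config.json` value, then the built-in default. `readIntSketch` and its explicit `env` parameter are hypothetical.

```javascript
// Hypothetical stand-in for readInt; the real helper is not shown in the diff.
// Assumption: env beats the config file, and the config file beats the default.
function readIntSketch(env, envName, fileValue, fallback) {
  for (const candidate of [env[envName], fileValue]) {
    const parsed = Number.parseInt(String(candidate ?? ''), 10)
    if (Number.isFinite(parsed)) {
      return parsed
    }
  }
  return fallback
}

const file = { sameFileLoopBudget: 3 } // illustrative pi.config.json contents

// Environment override wins over the file value.
console.log(readIntSketch({ PI_SAME_FILE_LOOP_BUDGET: '5' }, 'PI_SAME_FILE_LOOP_BUDGET', file.sameFileLoopBudget, 2))
// No override: the file value is used.
console.log(readIntSketch({}, 'PI_SAME_FILE_LOOP_BUDGET', file.sameFileLoopBudget, 2))
// Neither is set: the default from the diff (25) applies.
console.log(readIntSketch({}, 'PI_LOOP_HISTORY_LIMIT', file.loopHistoryLimit, 25))
```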

package/src/pi-prompts.mjs CHANGED

@@ -54,6 +54,16 @@ function formatLargeFileRiskHint(warnings) {
   return `\nLarge file risk in touched files:\n${lines}\nPrefer helper extraction, smaller scoped edits, or test splitting over broad in-place edits.\n`
 }
+function formatLoopRecoveryHint(hints) {
+  const list = Array.isArray(hints) ? hints.filter(Boolean) : []
+  if (list.length === 0) {
+    return ''
+  }
+  const lines = list.slice(0, 3).map((hint) => `- ${hint}`).join('\n')
+  return `\nRecent loop-recovery constraints:\n${lines}\n`
+}
 function displayPath(config, filePath) {
   const relativePath = path.relative(config.cwd, filePath)
   if (
@@ -190,11 +200,13 @@ export function buildMainPrompt(config, options = {}) {
     config.developerInstructionsFile,
     config.usingBundledDeveloperInstructions,
   )
+  const loopRecoveryHint = formatLoopRecoveryHint(options.loopRecoveryHints)
   if (!config.usingBundledDeveloperInstructions) {
     return `Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
+${loopRecoveryHint}
 Work only on the current phase.
 Select the first unchecked actionable checkbox in phase order.
@@ -220,6 +232,7 @@ Before stopping:
   return `Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
+${loopRecoveryHint}
 Do one current-phase unchecked task.
@@ -254,12 +267,14 @@ export function buildFixPrompt(config, recentVerificationOutput, options = {}) {
   )
   const findings = clampLines(recentVerificationOutput, configMaxLines(config, 'maxVerificationExcerptLines', 40))
   const largeFileRiskHint = formatLargeFileRiskHint(options.largeFileWarnings)
+  const loopRecoveryHint = formatLoopRecoveryHint(options.loopRecoveryHints)
   if (!config.usingBundledDeveloperInstructions) {
     return `Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
 ${largeFileRiskHint}
+${loopRecoveryHint}
 The tester step found a real problem in the current implementation. Fix only the product behavior related to the current phase and current task.
@@ -286,6 +301,7 @@ Before stopping:
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
 ${largeFileRiskHint}
+${loopRecoveryHint}
 The tester step found a real problem in the current implementation. Fix only the product behavior related to the current phase and current task.
@@ -319,6 +335,7 @@ export function buildSteeringPrompt(config, reason, options = {}) {
     config.usingBundledDeveloperInstructions,
   )
   const largeFileRiskHint = formatLargeFileRiskHint(options.largeFileWarnings)
+  const loopRecoveryHint = formatLoopRecoveryHint(options.loopRecoveryHints)
   if (!config.usingBundledDeveloperInstructions) {
     return `Continue from the current repo state.
@@ -326,6 +343,7 @@ Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
 ${largeFileRiskHint}
+${loopRecoveryHint}
 Reason for this follow-up: ${reason}
@@ -346,6 +364,7 @@ Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
 ${largeFileRiskHint}
+${loopRecoveryHint}
 Reason for this follow-up: ${reason}
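To make the effect of `formatLoopRecoveryHint` concrete, the snippet below reproduces the function as added in this diff and runs it on sample input. The hint strings are invented for illustration. What the supervisor actually passes as `options.loopRecoveryHints` is not visible in this part of the diff.

```javascript
// Copied from the hunk above.
function formatLoopRecoveryHint(hints) {
  const list = Array.isArray(hints) ? hints.filter(Boolean) : []
  if (list.length === 0) {
    return ''
  }
  const lines = list.slice(0, 3).map((hint) => `- ${hint}`).join('\n')
  return `\nRecent loop-recovery constraints:\n${lines}\n`
}

// Illustrative hints only; empty entries are dropped and at most three are kept.
const sampleHints = [
  'Avoid rewriting src/app.mjs in one pass',
  '',
  'Prefer a narrow edit over a full-file write',
  'Extract a helper before retrying',
  'This fourth hint is cut off',
]

console.log(formatLoopRecoveryHint(sampleHints))
console.log(JSON.stringify(formatLoopRecoveryHint(undefined))) // prints ""
```

When no hints are present the function returns an empty string, so the `${loopRecoveryHint}` placeholder contributes only an empty line to each prompt template.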

package/src/pi-repo.mjs CHANGED

@@ -57,6 +57,7 @@ export async function readState(stateFile) {
       lastStatus: '',
       lastVerificationStatus: '',
       lastVisualStatus: '',
+      loopHistory: {},
       lastRunAt: '',
       runId: '',
       inProgress: null,
@@ -75,6 +76,7 @@ export async function readState(stateFile) {
       lastStatus: '',
       lastVerificationStatus: '',
       lastVisualStatus: '',
+      loopHistory: {},
       lastRunAt: '',
       runId: '',
       inProgress: null,
@@ -282,7 +284,8 @@ export function watchParentProcess(onParentExit, options = {}) {
     }
     const currentParentPid = normalizePid(process.ppid)
-    if (currentParentPid === expectedParentPid && currentParentPid > 1) {
+    const parentStillRunning = isProcessRunning(expectedParentPid)
+    if (currentParentPid === expectedParentPid && currentParentPid > 1 && parentStillRunning) {
       return
     }
@@ -483,7 +486,7 @@ function countLines(text) {
   return normalized.split('\n').length
 }
-function isSpecLikeFile(filePath) {
+export function isSpecLikeFile(filePath) {
   const normalized = String(filePath ?? '').replaceAll('\\', '/')
   return /(^|\/)(e2e|test|tests|spec|specs)\//.test(normalized)
     || /\.(spec|test)\.[cm]?[jt]sx?$/.test(normalized)
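The `watchParentProcess` hunk above now also requires `isProcessRunning(expectedParentPid)` before treating the parent as alive. That helper's body is not included in this diff. A common way to write such a probe in Node is to send signal `0`, which checks for the process without delivering a signal. The sketch below assumes that approach, and `isProcessRunningSketch` and `parentLooksAlive` are hypothetical names.

```javascript
// Hypothetical liveness probe; the package's isProcessRunning is not shown in this diff.
function isProcessRunningSketch(pid) {
  if (!Number.isInteger(pid) || pid <= 0) {
    return false
  }
  try {
    process.kill(pid, 0) // signal 0 only tests for existence
    return true
  } catch (error) {
    // EPERM means the process exists but cannot be signalled by this user.
    return error.code === 'EPERM'
  }
}

// Same shape as the updated condition: matching PID, not PID 1, and still running.
function parentLooksAlive(currentParentPid, expectedParentPid) {
  return currentParentPid === expectedParentPid
    && currentParentPid > 1
    && isProcessRunningSketch(expectedParentPid)
}

console.log(isProcessRunningSketch(process.pid)) // the current process is running
```

A plausible motivation, not stated in the diff, is that comparing `process.ppid` alone can miss a dead parent on platforms where the reported parent PID does not change after the parent exits.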

package/src/pi-report.mjs CHANGED

@@ -46,6 +46,21 @@ async function main() {
     }
   }
+  const failureArtifacts = recent
+    .filter((event) => String(event.artifactPath ?? '').trim() !== '')
+    .slice(-5)
+  if (failureArtifacts.length > 0) {
+    console.log('\nFailure artifacts:')
+    for (const event of failureArtifacts) {
+      const excerpt = String(event.outputExcerpt ?? '').trim()
+      console.log(`- iteration ${event.iteration} ${event.kind}: ${event.artifactPath}`)
+      if (excerpt !== '') {
+        console.log(`  excerpt: ${excerpt.split('\n')[0]}`)
+      }
+    }
+  }
   const last = recent.at(-1)
   if (!last) {
     return
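The report addition can be read as "take the last five recent events that carry an `artifactPath`, then print each with the first line of its excerpt". The sketch below restates that logic as a function that returns lines instead of printing, so it can be run on sample data. `formatFailureArtifacts`, the event `kind` values, and the artifact file name are illustrative assumptions, while the field names `artifactPath`, `outputExcerpt`, `iteration`, and `kind` come from the diff.

```javascript
// Restates the filtering and formatting from the hunk above as a pure function.
function formatFailureArtifacts(recent) {
  const lines = []
  const failureArtifacts = recent
    .filter((event) => String(event.artifactPath ?? '').trim() !== '')
    .slice(-5)
  for (const event of failureArtifacts) {
    const excerpt = String(event.outputExcerpt ?? '').trim()
    lines.push(`- iteration ${event.iteration} ${event.kind}: ${event.artifactPath}`)
    if (excerpt !== '') {
      lines.push(`  excerpt: ${excerpt.split('\n')[0]}`)
    }
  }
  return lines
}

// Illustrative telemetry events; real event kinds and file names may differ.
const sampleEvents = [
  { iteration: 1, kind: 'developer' }, // no artifactPath, so it is skipped
  {
    iteration: 2,
    kind: 'verification',
    artifactPath: 'pi-output/failure-artifacts/example.json',
    outputExcerpt: 'FAIL tests/login.spec.ts\n  expected 200, got 500',
  },
]

console.log(formatFailureArtifacts(sampleEvents).join('\n'))
```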