npm - @sebastianandreasson/pi-autonomous-agents - Versions diffs - 0.3.0 → 0.4.0 - Mend

@sebastianandreasson/pi-autonomous-agents 0.3.0 → 0.4.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

package/README.md +6 -2
package/SETUP.md +3 -0
package/docs/PI_SUPERVISOR.md +4 -2
package/package.json +1 -1
package/src/index.mjs +1 -0
package/src/pi-config.mjs +3 -1
package/src/pi-prompts.mjs +47 -0
package/src/pi-repo.mjs +59 -0
package/src/pi-report.mjs +11 -0
package/src/pi-rpc-adapter.mjs +42 -0
package/src/pi-supervisor.mjs +58 -1
package/src/pi-telemetry.mjs +2 -1
package/templates/DEVELOPER.md +3 -0
package/templates/TESTER.md +7 -4
package/templates/pi.config.example.json +2 -0

package/README.md CHANGED

@@ -6,7 +6,7 @@
 - a fast verification step
 - a skeptical `tester` pass
 - optional periodic multimodal visual review
-- harness-owned git finalization
+- tester-owned final commit by default
 The package is intentionally generic. It does not know how to navigate or test a specific app on its own.
@@ -18,7 +18,7 @@ The package is intentionally generic. It does not know how to navigate or test a
 - telemetry
 - loop guards, timeout guards, and retries
 - tester feedback + visual feedback handoff
-- harness-owned git finalize step
+- optional legacy harness git finalize step for `commitMode: "plan"`
 - multimodal visual review client
 ## What Stays Per Project
@@ -119,4 +119,8 @@ By default, successful tester passes should stage and create the commit directly
 Prompt/context handoff is compact by default. The harness now caps prior feedback excerpts, changed-file lists, verification excerpts, and prompt note handoff. If needed, tune `maxPromptChangedFiles`, `maxVisualFeedbackLines`, `maxTesterFeedbackLines`, `maxPromptNotesLines`, and `maxVerificationExcerptLines`.
+The default coding tool mix is now safer for local models: `read,edit,write,find,ls,bash`. Prompts explicitly steer source inspection toward `read` and reserve shell usage for `git`, tests, and narrow diagnostics.
+The harness also emits lightweight large-file warnings for touched source/spec files and carries them into `.pi-last-iteration.json`, `pi-harness report`, and relevant prompts. Tune `largeFileWarningLines` and `largeSpecWarningLines` if needed.
 The harness expects screenshot capture to produce a `manifest.json` plus image files under the configured visual capture directory.

package/SETUP.md CHANGED

@@ -47,6 +47,7 @@ If the repo uses another package manager already, use the repo-native equivalent
   - `developerInstructionsFile`: `pi/DEVELOPER.md`
   - `testerInstructionsFile`: `pi/TESTER.md`
   - `commitMode`: normally `agent`
+  - `promptMode`: normally `compact`
   - `testCommand`: a fast bounded verification command for this repo
   - `visualCaptureCommand`: only if this repo has a real screenshot capture flow
   - `models` / `piModel` / `visualReviewModel` / `roleModels`: configure the models actually available in this environment
@@ -125,6 +126,7 @@ Recommended pattern:
 - local or slightly stronger model for `tester`
 - stronger frontier model for `visualReview` only if available
 - keep `commitMode` as `agent` unless the repo explicitly needs legacy harness-managed commit-plan parsing
+- keep large-file thresholds sensible for local models (`largeFileWarningLines`, `largeSpecWarningLines`)
 Example shape:
@@ -192,6 +194,7 @@ For flow debugging, inspect `.pi-last-iteration.json` after a run. It summarizes
 - Do not enable visual review unless the repo actually has a usable capture command and model config.
 - Keep changes minimal and local to harness setup.
 - Prefer very small, implementation-shaped TODO items for local models. Broad tasks tend to create long turns, retries, and weak tester behavior.
+- Prefer `read` for code inspection and keep shell usage focused on `git`, tests, and narrow diagnostics, especially for weaker local models.
 ## What To Report Back

package/docs/PI_SUPERVISOR.md CHANGED

@@ -30,7 +30,7 @@ Main package files:
 - `src/pi-client.mjs`: transport layer
 - `src/pi-rpc-adapter.mjs`: built-in adapter from supervisor JSON to `pi --mode rpc`
 - `src/pi-config.mjs`: config loader
-- `src/pi-repo.mjs`: repo helpers, verification runner, git finalize step
+- `src/pi-repo.mjs`: repo helpers, verification runner, and optional legacy git finalize step
 - `src/pi-telemetry.mjs`: telemetry writer/reader
 - `src/pi-prompts.mjs`: default prompt builders
 - `src/pi-visual-review.mjs`: multimodal visual-review worker
@@ -126,7 +126,7 @@ Request shape:
   "runtimeDir": "/absolute/repo/path/.pi-runtime",
   "piCli": "pi",
   "model": "local/model-name",
-  "tools": "read,bash,edit,write,grep,find,ls",
+  "tools": "read,edit,write,find,ls,bash",
   "thinking": "",
   "noExtensions": false,
   "noSkills": false,
@@ -170,6 +170,8 @@ The default flow keeps commit ownership with the active agent:
 If a repo explicitly needs the older harness-managed commit-plan flow, set `commitMode` to `plan`. In that mode, `testerCommit` and parsed commit plans are used as a compatibility path rather than the default.
+For source inspection, prompts prefer `read` and reserve shell usage for `git`, tests, and narrow diagnostics. Large shell file reads are more likely to truncate under context pressure than focused `read` calls.
 ## Persistent Handoffs
 The harness persists two cross-iteration handoff files:

package/package.json CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "@sebastianandreasson/pi-autonomous-agents",
   "private": false,
-  "version": "0.3.0",
+  "version": "0.4.0",
   "type": "module",
   "description": "Portable unattended PI harness for developer/tester/visual-review loops.",
   "license": "MIT",

package/src/index.mjs CHANGED

@@ -10,4 +10,5 @@ export {
   runStartupPreflight,
 } from './pi-preflight.mjs'
 export { clearHarnessHistory, collectHistoryTargets } from './pi-history.mjs'
+export { collectLargeFileWarnings } from './pi-repo.mjs'
 export { runAgentTurn } from './pi-client.mjs'

package/src/pi-config.mjs CHANGED

@@ -258,7 +258,9 @@ export function loadConfig(mode = 'once') {
     maxTesterFeedbackLines: readInt('PI_MAX_TESTER_FEEDBACK_LINES', file.maxTesterFeedbackLines, 32),
     maxPromptNotesLines: readInt('PI_MAX_PROMPT_NOTES_LINES', file.maxPromptNotesLines, 16),
     maxVerificationExcerptLines: readInt('PI_MAX_VERIFICATION_EXCERPT_LINES', file.maxVerificationExcerptLines, 40),
-    piTools: readString('PI_TOOLS', file.piTools, 'read,bash,edit,write,grep,find,ls'),
+    largeFileWarningLines: readInt('PI_LARGE_FILE_WARNING_LINES', file.largeFileWarningLines, 500),
+    largeSpecWarningLines: readInt('PI_LARGE_SPEC_WARNING_LINES', file.largeSpecWarningLines, 300),
+    piTools: readString('PI_TOOLS', file.piTools, 'read,edit,write,find,ls,bash'),
     piThinking: readString('PI_THINKING', file.piThinking, ''),
     piNoExtensions: readBool('PI_NO_EXTENSIONS', file.piNoExtensions, false),
     piNoSkills: readBool('PI_NO_SKILLS', file.piNoSkills, false),

package/src/pi-prompts.mjs CHANGED

@@ -40,6 +40,20 @@ function formatChangedFilesSection(files, maxFiles) {
   return lines.join('\n')
 }
+function formatLargeFileRiskHint(warnings) {
+  const list = Array.isArray(warnings) ? warnings.filter(Boolean) : []
+  if (list.length === 0) {
+    return ''
+  }
+  const lines = list
+    .slice(0, 3)
+    .map((warning) => `- ${warning.file} (${warning.lineCount} lines${warning.kind === 'large_spec' ? ', spec' : ''})`)
+    .join('\n')
+  return `\nLarge file risk in touched files:\n${lines}\nPrefer helper extraction, smaller scoped edits, or test splitting over broad in-place edits.\n`
+}
 function displayPath(config, filePath) {
   const relativePath = path.relative(config.cwd, filePath)
   if (
@@ -160,6 +174,9 @@ Harness rules:
 - Start by checking git status so you know whether unrelated changes already exist.
 - Update code, config, and docs only as needed for the selected task.
 - Tick only the checkbox items that are actually completed.
+- Use read for source inspection. Use bash only for git, tests, and narrow diagnostics.
+- Do not build edits from large sed/grep output or from memory after partial shell reads.
+- If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
 - If blocked, add a brief note directly under the relevant task in ${taskFile} explaining the blocker, then stop.
 - Do not create the final commit during the developer pass.
 ${staleEditRecoveryRules()}
@@ -180,6 +197,9 @@ Rules:
 - Start with git status.
 - Select the first unchecked actionable checkbox in phase order.
 - Keep changes minimal and scoped.
+- Use read for source inspection. Use bash only for git, tests, and narrow diagnostics.
+- If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
+- Do not edit from memory after partial shell output.
 - Tick only completed items.
 - If blocked, note it under the task in ${taskFile} and stop.
 - Do not touch lockfiles, generated files, or unrelated assets.
@@ -203,11 +223,13 @@ export function buildFixPrompt(config, recentVerificationOutput, options = {}) {
     config.usingBundledDeveloperInstructions,
   )
   const findings = clampLines(recentVerificationOutput, configMaxLines(config, 'maxVerificationExcerptLines', 40))
+  const largeFileRiskHint = formatLargeFileRiskHint(options.largeFileWarnings)
   if (!config.usingBundledDeveloperInstructions) {
     return `Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
+${largeFileRiskHint}
 The tester step found a real problem in the current implementation. Fix only the product behavior related to the current phase and current task.
@@ -218,6 +240,9 @@ Harness rules:
 - Start by checking git status so you know which files are already dirty.
 - Do not paper over product bugs by weakening tests.
 - Keep changes minimal and focused on the failing behavior.
+- Use read for source inspection. Use bash only for git, tests, and narrow diagnostics.
+- If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
+- Do not edit from memory after partial shell output.
 - Do not perform speculative cleanup or unrelated refactors in this pass.
 - Do not create the final commit during the developer fix pass.
 ${staleEditRecoveryRules()}
@@ -230,6 +255,7 @@ Before stopping:
   return `Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
+${largeFileRiskHint}
 The tester step found a real problem in the current implementation. Fix only the product behavior related to the current phase and current task.
@@ -240,6 +266,9 @@ Rules:
 - Start with git status.
 - Keep the fix narrow.
 - Do not weaken tests to hide product bugs.
+- Use read for source inspection. Use bash only for git, tests, and narrow diagnostics.
+- If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
+- Do not edit from memory after partial shell output.
 - Do not perform speculative cleanup or unrelated refactors.
 - Do not create the final commit.
 ${staleEditRecoveryRules()}
@@ -259,12 +288,14 @@ export function buildSteeringPrompt(config, reason, options = {}) {
     config.developerInstructionsFile,
     config.usingBundledDeveloperInstructions,
   )
+  const largeFileRiskHint = formatLargeFileRiskHint(options.largeFileWarnings)
   if (!config.usingBundledDeveloperInstructions) {
     return `Continue from the current repo state.
 Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
+${largeFileRiskHint}
 Reason for this follow-up: ${reason}
@@ -272,9 +303,11 @@ Select the first unchecked actionable checkbox in the current phase, complete on
 Additional harness guardrails:
 - Start by checking git status.
+- Use read for source inspection. Use bash only for git, tests, and narrow diagnostics.
 - Do not repeat the same tool call over and over.
 - If you already read a file, use that context instead of rereading it unless something changed.
 - If an edit fails once, reread the file before retrying. Do not repeat the same exact edit attempt.
+- If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
 - If you are stuck, make the smallest decisive next action or stop and state the blocker.`
   }
@@ -282,15 +315,18 @@ Additional harness guardrails:
 Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
+${largeFileRiskHint}
 Reason for this follow-up: ${reason}
 Select the first unchecked actionable checkbox in the current phase, complete one coherent task, tick completed items, run verification, and stop.
 Additional guardrails:
+- Use read for source inspection. Use bash only for git, tests, and narrow diagnostics.
 - Do not repeat the same tool call over and over.
 - If you already read a file, use that context instead of rereading it unless something changed.
 - If an edit fails once, reread the file before retrying. Do not repeat the same exact edit attempt.
+- If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
 - Prefer the configured smoke verification path and one narrow targeted check over long full-flow Playwright specs.
 - If you are stuck, make the smallest decisive next action or stop and state the blocker.`
 }
@@ -303,6 +339,7 @@ export function buildTesterPrompt(config, {
   reason = 'tester_review',
   visualFeedback = '',
   testerFeedback = '',
+  largeFileWarnings = [],
 }) {
   const taskFile = displayPath(config, config.taskFile)
   const instructionsFile = displayPath(config, config.testerInstructionsFile)
@@ -326,11 +363,13 @@ export function buildTesterPrompt(config, {
     config.usingBundledTesterInstructions,
   )
   const passOwnership = testerPassOwnershipRules(config)
+  const largeFileRiskHint = formatLargeFileRiskHint(largeFileWarnings)
   if (!config.usingBundledTesterInstructions) {
     return `Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
+${largeFileRiskHint}
 You are the TESTER role. You are reviewing the most recent developer work from an independent quality and functionality perspective.
@@ -348,6 +387,8 @@ Rules:
 - Start with git status.
 - Follow repo-local tester instructions for what to verify and which commands to run.
 - Prefer one focused review pass.
+- Use read for source inspection. Use bash only for git, tests, and narrow diagnostics.
+- If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
 - If blocked or inconclusive, return VERDICT: BLOCKED.
 - Do not hide real bugs with brittle tests.
 - ${passOwnership.successRule.slice(2)}
@@ -370,6 +411,7 @@ Before stopping, end your final response with exactly one verdict line:
   return `Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
+${largeFileRiskHint}
 You are the TESTER role. You are reviewing the most recent developer work from an independent quality and functionality perspective.
@@ -385,9 +427,11 @@ ${changedFilesSection}
 	Rules:
 	- Start with git status.
+	- Use read for source inspection. Use bash only for git, tests, and narrow diagnostics.
 	- Run the repo verification command yourself: ${verificationCommand}
 ${indentBlock(innerLoopValidationRules(verificationCommand), '\t')}
 	- Prefer one focused browser-driven review pass.
+	- If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
 	- Do not hide real bugs with brittle tests.
 	- If blocked or inconclusive, return VERDICT: BLOCKED.
 ${indentBlock(passOwnership.successRule, '\t')}
@@ -415,6 +459,7 @@ export function buildCommitPrompt(config, {
   reason = 'tester_passed_without_commit',
   visualFeedback = '',
   testerFeedback = '',
+  largeFileWarnings = [],
 }) {
   const taskFile = displayPath(config, config.taskFile)
   const instructionsFile = displayPath(config, config.testerInstructionsFile)
@@ -433,10 +478,12 @@ export function buildCommitPrompt(config, {
     developerNotes || '(none provided)',
     configMaxLines(config, 'maxPromptNotesLines', 16),
   )
+  const largeFileRiskHint = formatLargeFileRiskHint(largeFileWarnings)
   return `Read ${taskFile} and ${instructionsFile}.
 ${authorityLine}${visualFeedbackSection}
 ${testerFeedbackSection}
+${largeFileRiskHint}
 You are the TESTER role. The implementation already passed functional review, but the final commit was not created.
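
To make the effect of the new `largeFileRiskHint` placeholder concrete, here is a minimal sketch that copies `formatLargeFileRiskHint` from the hunk above and renders it for a couple of warnings. The sample file names and line counts are hypothetical; only the function body comes from the diff.

```javascript
// Copied from the pi-prompts.mjs hunk above so the sketch runs on its own.
function formatLargeFileRiskHint(warnings) {
  const list = Array.isArray(warnings) ? warnings.filter(Boolean) : []
  if (list.length === 0) {
    return ''
  }
  const lines = list
    .slice(0, 3) // at most three files are listed in the prompt
    .map((warning) => `- ${warning.file} (${warning.lineCount} lines${warning.kind === 'large_spec' ? ', spec' : ''})`)
    .join('\n')
  return `\nLarge file risk in touched files:\n${lines}\nPrefer helper extraction, smaller scoped edits, or test splitting over broad in-place edits.\n`
}

// Hypothetical warnings, in the shape the diff shows collectLargeFileWarnings returning.
const sampleWarnings = [
  { file: 'src/game.js', lineCount: 812, kind: 'large_file' },
  { file: 'e2e/flow.spec.ts', lineCount: 420, kind: 'large_spec' },
]

const hint = formatLargeFileRiskHint(sampleWarnings)
const emptyHint = formatLargeFileRiskHint([])
console.log(hint)
```

With no warnings the function returns an empty string, so the `${largeFileRiskHint}` line in each prompt template collapses to a blank line rather than an empty section heading.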

package/src/pi-repo.mjs CHANGED

@@ -225,6 +225,65 @@ export function findFirstUncheckedTaskInfo(taskFile) {
   }
 }
+function countLines(text) {
+  const normalized = String(text ?? '')
+  if (normalized === '') {
+    return 0
+  }
+  return normalized.split('\n').length
+}
+function isSpecLikeFile(filePath) {
+  const normalized = String(filePath ?? '').replaceAll('\\', '/')
+  return /(^|\/)(e2e|test|tests|spec|specs)\//.test(normalized)
+    || /\.(spec|test)\.[cm]?[jt]sx?$/.test(normalized)
+}
+export function collectLargeFileWarnings(cwd, files, {
+  largeFileWarningLines = 500,
+  largeSpecWarningLines = 300,
+} = {}) {
+  const warnings = []
+  const seen = new Set()
+  for (const file of Array.isArray(files) ? files : []) {
+    const relativePath = String(file ?? '').trim()
+    if (relativePath === '' || seen.has(relativePath)) {
+      continue
+    }
+    seen.add(relativePath)
+    const absolutePath = path.resolve(cwd, relativePath)
+    let raw = ''
+    try {
+      raw = readFileSync(absolutePath, 'utf8')
+    } catch {
+      continue
+    }
+    const lineCount = countLines(raw)
+    const isSpec = isSpecLikeFile(relativePath)
+    if (isSpec && lineCount >= largeSpecWarningLines) {
+      warnings.push({
+        file: relativePath,
+        lineCount,
+        kind: 'large_spec',
+      })
+      continue
+    }
+    if (lineCount >= largeFileWarningLines) {
+      warnings.push({
+        file: relativePath,
+        lineCount,
+        kind: 'large_file',
+      })
+    }
+  }
+  return warnings.sort((left, right) => right.lineCount - left.lineCount)
+}
 export async function runShellCommand({
   cwd,
   command,
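
The threshold logic in `collectLargeFileWarnings` can be exercised without touching the file system. The sketch below copies `countLines` and `isSpecLikeFile` from the hunk above and adds a hypothetical helper, `classifyBySize`, that applies the same comparisons to text passed in directly; that helper is not part of the package.

```javascript
// Copied from the pi-repo.mjs hunk above.
function countLines(text) {
  const normalized = String(text ?? '')
  if (normalized === '') {
    return 0
  }
  return normalized.split('\n').length
}

function isSpecLikeFile(filePath) {
  const normalized = String(filePath ?? '').replaceAll('\\', '/')
  return /(^|\/)(e2e|test|tests|spec|specs)\//.test(normalized)
    || /\.(spec|test)\.[cm]?[jt]sx?$/.test(normalized)
}

// Hypothetical helper (not in the package): mirrors the threshold checks in
// collectLargeFileWarnings, but takes the file text instead of reading from disk.
function classifyBySize(relativePath, text, {
  largeFileWarningLines = 500,
  largeSpecWarningLines = 300,
} = {}) {
  const lineCount = countLines(text)
  if (isSpecLikeFile(relativePath) && lineCount >= largeSpecWarningLines) {
    return { file: relativePath, lineCount, kind: 'large_spec' }
  }
  if (lineCount >= largeFileWarningLines) {
    return { file: relativePath, lineCount, kind: 'large_file' }
  }
  return null
}

// 299 newline-terminated lines split into 300 segments (the last one is empty).
const body = 'x\n'.repeat(299)
const specResult = classifyBySize('src/app.test.ts', body)
const sourceResult = classifyBySize('src/app.ts', body)
console.log(specResult, sourceResult)
```

Because `countLines` splits on `\n`, a file that ends with a newline is counted one line higher than `wc -l` would report, which matters only for files sitting right at a threshold. Directory matching requires a full path segment, so `src/testing/util.js` is not treated as spec-like while `tests/util.js` is.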

package/src/pi-report.mjs CHANGED

@@ -35,6 +35,17 @@ async function main() {
     console.log(`- ${kind}: ${count}`)
   }
+  const iterationSummaries = recent.filter((event) => event.kind === 'iteration_summary')
+  const warningsByIteration = iterationSummaries
+    .filter((event) => String(event.riskWarnings ?? '').trim() !== '')
+  if (warningsByIteration.length > 0) {
+    console.log('\nLarge file warnings:')
+    for (const event of warningsByIteration.slice(-5)) {
+      console.log(`- iteration ${event.iteration}: ${event.riskWarnings}`)
+    }
+  }
   const last = recent.at(-1)
   if (!last) {
     return
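
As a rough illustration of the report filter added above, the following sketch applies the same selection (only `iteration_summary` events with a non-empty `riskWarnings`, last five) to a hand-written event list. The events, the `agent_turn` kind, and the `formatWarningReport` wrapper are hypothetical; the filter expressions follow the hunk.

```javascript
// Hypothetical wrapper around the filter logic shown in the pi-report.mjs hunk.
// Returns the lines the report would print instead of logging them directly.
function formatWarningReport(recent) {
  const iterationSummaries = recent.filter((event) => event.kind === 'iteration_summary')
  const warningsByIteration = iterationSummaries
    .filter((event) => String(event.riskWarnings ?? '').trim() !== '')
  return warningsByIteration
    .slice(-5) // only the five most recent iterations with warnings
    .map((event) => `- iteration ${event.iteration}: ${event.riskWarnings}`)
}

// Hypothetical telemetry events for illustration only.
const recentEvents = [
  { kind: 'iteration_summary', iteration: 1, riskWarnings: '' },
  { kind: 'agent_turn', iteration: 2, riskWarnings: 'src/game.js(812)' },
  { kind: 'iteration_summary', iteration: 2, riskWarnings: 'src/game.js(812)' },
  { kind: 'iteration_summary', iteration: 3 },
]

const reportLines = formatWarningReport(recentEvents)
console.log(reportLines.join('\n'))
```

Events of other kinds are ignored even if they carry a `riskWarnings` value, and a missing field is treated the same as an empty string.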

package/src/pi-rpc-adapter.mjs CHANGED

@@ -54,6 +54,44 @@ function extractToolTarget(toolName, args) {
   return ''
 }
+function extractShellCommand(args) {
+  if (!args || typeof args !== 'object') {
+    return ''
+  }
+  if (typeof args.command === 'string') {
+    return args.command
+  }
+  if (typeof args.cmd === 'string') {
+    return args.cmd
+  }
+  return ''
+}
+function isLargeShellRead(command) {
+  const text = String(command ?? '').trim()
+  if (text === '') {
+    return false
+  }
+  if (/^\s*cat\s+\S+/.test(text)) {
+    return true
+  }
+  const sedMatch = text.match(/sed\s+-n\s+['"]?(\d+)\s*,\s*(\d+)p['"]?/)
+  if (sedMatch) {
+    const start = Number.parseInt(sedMatch[1], 10)
+    const end = Number.parseInt(sedMatch[2], 10)
+    if (Number.isFinite(start) && Number.isFinite(end) && end >= start) {
+      return (end - start) >= 120
+    }
+  }
+  return false
+}
 function extractAssistantText(message) {
   if (!message || message.role !== 'assistant' || !Array.isArray(message.content)) {
     return ''
@@ -295,6 +333,7 @@ async function run() {
       activeToolName = String(data.toolName ?? '')
       activeToolStartedAt = Date.now()
       const target = extractToolTarget(data.toolName, data.args)
+      const shellCommand = data.toolName === 'bash' ? extractShellCommand(data.args) : ''
       if (signature === lastToolSignature) {
         repeatedToolCount += 1
       } else {
@@ -325,6 +364,9 @@ async function run() {
       }
       writeLive(`[PI tool:start] ${data.toolName}${suffix}\n`)
+      if (data.toolName === 'bash' && isLargeShellRead(shellCommand)) {
+        writeLive('[PI warning] large bash file read detected; prefer read or a smaller exact window to avoid truncated context.\n')
+      }
     }
     if (data.type === 'tool_execution_end') {
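
To see which shell commands trip the new `[PI warning]`, here is a small sketch that copies `isLargeShellRead` from the hunk above and runs it against a few commands. The sample commands are made up for illustration; the function body is taken from the diff.

```javascript
// Copied from the pi-rpc-adapter.mjs hunk above.
function isLargeShellRead(command) {
  const text = String(command ?? '').trim()
  if (text === '') {
    return false
  }
  // Any command that starts with `cat <something>` is treated as a large read.
  if (/^\s*cat\s+\S+/.test(text)) {
    return true
  }
  // `sed -n 'A,Bp'` is flagged when the range spans 120 lines or more (B - A >= 120).
  const sedMatch = text.match(/sed\s+-n\s+['"]?(\d+)\s*,\s*(\d+)p['"]?/)
  if (sedMatch) {
    const start = Number.parseInt(sedMatch[1], 10)
    const end = Number.parseInt(sedMatch[2], 10)
    if (Number.isFinite(start) && Number.isFinite(end) && end >= start) {
      return (end - start) >= 120
    }
  }
  return false
}

// Hypothetical commands an agent might issue through the bash tool.
const samples = [
  'cat src/app.js',
  "sed -n '1,200p' src/app.js",
  "sed -n '10,50p' src/app.js",
  "sed -n '1,120p' src/app.js",
  "sed -n '1,121p' src/app.js",
  'git status',
  'head -n 500 src/app.js',
]
const flagged = samples.filter((command) => isLargeShellRead(command))
console.log(flagged)
```

The check is a heuristic on the command text only: it covers a leading `cat` and `sed -n` ranges, so other ways of dumping a file (for example `head -n 500`) pass through without a warning, and the warning never blocks the tool call.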

package/src/pi-supervisor.mjs CHANGED

@@ -13,6 +13,7 @@ import {
 import { appendTelemetry, ensureTelemetryFiles } from './pi-telemetry.mjs'
 import {
   appendLog,
+  collectLargeFileWarnings,
   commitStagedFiles,
   didRepoChange,
   ensureFileExists,
@@ -79,6 +80,10 @@ function printTerminalSummary(config, summary) {
     lines.push(`[PI supervisor] notes=${summary.notes}`)
   }
+  if (Array.isArray(summary.largeFileWarnings) && summary.largeFileWarnings.length > 0) {
+    lines.push(`[PI supervisor] large_file_warnings=${formatLargeFileWarningsInline(summary.largeFileWarnings)}`)
+  }
   if (summary.terminalReason) {
     lines.push(`[PI supervisor] terminal_reason=${summary.terminalReason}`)
   }
@@ -162,6 +167,7 @@ function createIterationSummary({
   gitFinalizeStatus,
   visualStatus,
   terminalReason,
+  largeFileWarnings,
   sessionId,
   developerModel,
   testerModel,
@@ -180,6 +186,7 @@ function createIterationSummary({
     gitFinalizeStatus,
     visualStatus,
     terminalReason,
+    largeFileWarnings,
     sessionId,
     developerModel,
     testerModel,
@@ -191,6 +198,39 @@ function didInvocationCreateCommit(invocation) {
   return invocation?.beforeSnapshot?.head !== invocation?.afterSnapshot?.head
 }
+function mergeLargeFileWarnings(existing, incoming) {
+  const merged = new Map()
+  for (const warning of [...(existing || []), ...(incoming || [])]) {
+    if (!warning?.file) {
+      continue
+    }
+    const key = `${warning.kind}:${warning.file}`
+    const current = merged.get(key)
+    if (!current || Number(warning.lineCount) > Number(current.lineCount)) {
+      merged.set(key, warning)
+    }
+  }
+  return [...merged.values()].sort((left, right) => right.lineCount - left.lineCount)
+}
+function findLargeFileWarnings(config, files) {
+  return collectLargeFileWarnings(config.cwd, files, {
+    largeFileWarningLines: config.largeFileWarningLines,
+    largeSpecWarningLines: config.largeSpecWarningLines,
+  })
+}
+function formatLargeFileWarningsInline(warnings) {
+  const list = Array.isArray(warnings) ? warnings : []
+  if (list.length === 0) {
+    return ''
+  }
+  return list
+    .slice(0, 3)
+    .map((warning) => `${warning.file}(${warning.lineCount}${warning.kind === 'large_spec' ? ',spec' : ''})`)
+    .join(', ')
+}
 function clampPromptLines(text, maxLines) {
   const normalized = String(text ?? '').trim()
   if (normalized === '') {
@@ -644,6 +684,7 @@ async function runMainTurnWithRetries({ config, iteration, phase, sessionId, ses
     prompt = buildSteeringPrompt(config, reason, {
       visualFeedback: await readLatestVisualFeedback(config),
       testerFeedback: await readLatestTesterFeedback(config),
+      largeFileWarnings: findLargeFileWarnings(config, listChangedFiles(config.cwd)),
     })
     if (shouldRetryForTimeout || shouldRetryForNoChange) {
@@ -656,12 +697,14 @@ async function runMainTurnWithRetries({ config, iteration, phase, sessionId, ses
 }
 async function runFixTurn({ config, iteration, phase, sessionId, sessionFile, testerOutput }) {
+  const largeFileWarnings = findLargeFileWarnings(config, listChangedFiles(config.cwd))
   const fixPrompt = buildFixPrompt(
     config,
     clampPromptLines(testerOutput, Number(config.maxVerificationExcerptLines) || 40),
     {
       visualFeedback: await readLatestVisualFeedback(config),
       testerFeedback: await readLatestTesterFeedback(config),
+      largeFileWarnings,
     }
   )
   return await runAgentInvocation({
@@ -762,6 +805,7 @@ async function runTesterTurn({
   developerNotes,
   reason,
 }) {
+  const largeFileWarnings = findLargeFileWarnings(config, changedFiles)
   const prompt = buildTesterPrompt(config, {
     phase,
     task,
@@ -770,6 +814,7 @@ async function runTesterTurn({
     reason,
     visualFeedback: await readLatestVisualFeedback(config),
     testerFeedback: await readLatestTesterFeedback(config),
+    largeFileWarnings,
   })
   const invocation = await runAgentInvocation({
@@ -835,6 +880,7 @@ async function runTesterCommitTurn({
   developerNotes,
   reason,
 }) {
+  const largeFileWarnings = findLargeFileWarnings(config, changedFiles)
   const prompt = buildCommitPrompt(config, {
     phase,
     task,
@@ -843,6 +889,7 @@ async function runTesterCommitTurn({
     reason,
     visualFeedback: await readLatestVisualFeedback(config),
     testerFeedback: await readLatestTesterFeedback(config),
+    largeFileWarnings,
   })
   const invocation = await runAgentInvocation({
@@ -1054,6 +1101,7 @@ async function runIteration({ config, state, iteration }) {
         gitFinalizeStatus: 'not_run',
         visualStatus: 'not_run',
         terminalReason: 'all_tasks_complete',
+        largeFileWarnings: [],
         notes: 'No unchecked tasks remain in TODOS.md.',
         sessionId: state.sessionId || '',
         outputPath: config.lastAgentOutputFile,
@@ -1103,6 +1151,7 @@ async function runIteration({ config, state, iteration }) {
   let commitPlanFound = false
   let gitFinalizeStatus = 'not_run'
   let terminalReason = mainInvocation.result.terminalReason || ''
+  let largeFileWarnings = findLargeFileWarnings(config, mainInvocation.changedFiles)
   const noteParts = [`developer: ${mainInvocation.result.notes}`]
   if (mainInvocation.result.status === 'success' && config.transport === 'mock') {
@@ -1157,6 +1206,7 @@ async function runIteration({ config, state, iteration }) {
       testerVerdict = testerInvocation.testerVerdict
       commitPlanFound = testerInvocation.commitPlanFound === true
       terminalReason = testerInvocation.result.terminalReason || terminalReason
+      largeFileWarnings = mergeLargeFileWarnings(largeFileWarnings, findLargeFileWarnings(config, listChangedFiles(config.cwd)))
       noteParts.push(`tester: ${testerInvocation.result.notes}`)
       await writeTesterFeedback(config, {
         iteration,
@@ -1184,6 +1234,7 @@ async function runIteration({ config, state, iteration }) {
         testerVerdict = testerCommitInvocation.testerVerdict
         commitPlanFound = testerCommitInvocation.commitPlanFound === true
         terminalReason = testerCommitInvocation.result.terminalReason || terminalReason
+        largeFileWarnings = mergeLargeFileWarnings(largeFileWarnings, findLargeFileWarnings(config, listChangedFiles(config.cwd)))
         noteParts.push(`tester_commit: ${testerCommitInvocation.result.notes}`)
         await writeTesterFeedback(config, {
           iteration,
@@ -1241,6 +1292,7 @@ async function runIteration({ config, state, iteration }) {
       sessionFile = fixInvocation.result.sessionFile || sessionFile
       developerStatus = fixInvocation.result.status
       terminalReason = fixInvocation.result.terminalReason || 'developer_fix_incomplete'
+      largeFileWarnings = mergeLargeFileWarnings(largeFileWarnings, findLargeFileWarnings(config, listChangedFiles(config.cwd)))
       noteParts.push(`developer_fix: ${fixInvocation.result.notes}`)
       if (fixInvocation.result.status === 'success') {
@@ -1258,6 +1310,7 @@ async function runIteration({ config, state, iteration }) {
         testerVerdict = testerRecheck.testerVerdict
         commitPlanFound = testerRecheck.commitPlanFound === true
         terminalReason = testerRecheck.result.terminalReason || terminalReason
+        largeFileWarnings = mergeLargeFileWarnings(largeFileWarnings, findLargeFileWarnings(config, listChangedFiles(config.cwd)))
         noteParts.push(`tester_recheck: ${testerRecheck.result.notes}`)
         await writeTesterFeedback(config, {
           iteration,
@@ -1285,6 +1338,7 @@ async function runIteration({ config, state, iteration }) {
           testerVerdict = testerCommitInvocation.testerVerdict
           commitPlanFound = testerCommitInvocation.commitPlanFound === true
           terminalReason = testerCommitInvocation.result.terminalReason || terminalReason
+          largeFileWarnings = mergeLargeFileWarnings(largeFileWarnings, findLargeFileWarnings(config, listChangedFiles(config.cwd)))
           noteParts.push(`tester_commit: ${testerCommitInvocation.result.notes}`)
           await writeTesterFeedback(config, {
             iteration,
@@ -1436,7 +1490,7 @@ async function runIteration({ config, state, iteration }) {
   await appendLog(
     config.logFile,
-    `Finished iteration ${iteration} with status=${finalStatus} verification=${finalVerificationStatus} tester_verdict=${testerVerdict} commit_plan_found=${commitPlanFound} terminal_reason=${terminalReason}`
+    `Finished iteration ${iteration} with status=${finalStatus} verification=${finalVerificationStatus} tester_verdict=${testerVerdict} commit_plan_found=${commitPlanFound} terminal_reason=${terminalReason}${largeFileWarnings.length > 0 ? ` large_file_warnings=${formatLargeFileWarningsInline(largeFileWarnings)}` : ''}`
   )
   const iterationEndSnapshot = getRepoSnapshot(config.cwd)
@@ -1453,6 +1507,7 @@ async function runIteration({ config, state, iteration }) {
     gitFinalizeStatus,
     visualStatus,
     terminalReason,
+    largeFileWarnings,
     sessionId,
     developerModel: developerModelName,
     testerModel: testerModelName,
@@ -1486,6 +1541,7 @@ async function runIteration({ config, state, iteration }) {
     testerVerdict,
     commitPlanFound,
     terminalReason,
+    riskWarnings: formatLargeFileWarningsInline(largeFileWarnings),
     notes: noteParts.join(' | '),
   })
@@ -1504,6 +1560,7 @@ async function runIteration({ config, state, iteration }) {
       gitFinalizeStatus,
       visualStatus,
       terminalReason,
+      largeFileWarnings,
       notes: noteParts.join(' | '),
       sessionId,
       outputPath: config.lastAgentOutputFile,
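
The supervisor accumulates warnings across the developer, tester, fix, and commit turns of one iteration. The sketch below copies `mergeLargeFileWarnings` and `formatLargeFileWarningsInline` from the hunks above to show how repeated sightings of a file are collapsed and how the inline form used in logs and the `risk_warnings` telemetry column is produced. The input warnings are hypothetical.

```javascript
// Copied from the pi-supervisor.mjs hunk above.
function mergeLargeFileWarnings(existing, incoming) {
  const merged = new Map()
  for (const warning of [...(existing || []), ...(incoming || [])]) {
    if (!warning?.file) {
      continue
    }
    // One entry per kind + file; keep whichever sighting has the larger line count.
    const key = `${warning.kind}:${warning.file}`
    const current = merged.get(key)
    if (!current || Number(warning.lineCount) > Number(current.lineCount)) {
      merged.set(key, warning)
    }
  }
  return [...merged.values()].sort((left, right) => right.lineCount - left.lineCount)
}

function formatLargeFileWarningsInline(warnings) {
  const list = Array.isArray(warnings) ? warnings : []
  if (list.length === 0) {
    return ''
  }
  return list
    .slice(0, 3)
    .map((warning) => `${warning.file}(${warning.lineCount}${warning.kind === 'large_spec' ? ',spec' : ''})`)
    .join(', ')
}

// Hypothetical warnings: the developer turn saw src/a.js at 600 lines,
// and a later tester turn saw it grow to 650 and touched a large spec.
const afterDeveloper = [{ file: 'src/a.js', lineCount: 600, kind: 'large_file' }]
const afterTester = [
  { file: 'src/a.js', lineCount: 650, kind: 'large_file' },
  { file: 'e2e/flow.spec.ts', lineCount: 900, kind: 'large_spec' },
  { lineCount: 10 }, // no file name, so it is skipped
]

const mergedWarnings = mergeLargeFileWarnings(afterDeveloper, afterTester)
const inline = formatLargeFileWarningsInline(mergedWarnings)
console.log(inline)
```

Keeping the larger line count per file means the iteration summary reflects the biggest size observed during the iteration, and the inline form lists at most three files, largest first.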

package/src/pi-telemetry.mjs CHANGED

@@ -1,6 +1,6 @@
 import fs from 'node:fs/promises'
-const CSV_HEADER = 'timestamp,iteration,phase,kind,status,transport,session_id,timed_out,exit_code,duration_seconds,commit_before,commit_after,repo_changed,changed_files_count,verification_status,retry_count,role,model,tool_calls,tool_errors,message_updates,stop_reason,loop_detected,loop_signature,tester_verdict,commit_plan_found,terminal_reason,notes\n'
+const CSV_HEADER = 'timestamp,iteration,phase,kind,status,transport,session_id,timed_out,exit_code,duration_seconds,commit_before,commit_after,repo_changed,changed_files_count,verification_status,retry_count,role,model,tool_calls,tool_errors,message_updates,stop_reason,loop_detected,loop_signature,tester_verdict,commit_plan_found,terminal_reason,risk_warnings,notes\n'
 function csvEscape(value) {
   const text = String(value ?? '')
@@ -56,6 +56,7 @@ export async function appendTelemetry(config, event) {
     event.testerVerdict,
     event.commitPlanFound,
     event.terminalReason,
+    event.riskWarnings,
     event.notes,
   ].map(csvEscape).join(',')

package/templates/DEVELOPER.md CHANGED

@@ -20,6 +20,9 @@ Rules:
 - Use the configured smoke verification path as the fast inner-loop gate. Do not replace it with a long full-flow Playwright spec unless the task explicitly requires it.
 - If a long Playwright happy-path spec changes, validate with smoke plus one narrow targeted spec or deterministic state hook, not the entire full-flow run.
 - Reserve long full-flow Playwright specs for an explicit nightly or post-run lane, not the developer turn.
+- Use `read` for source inspection. Use shell only for `git`, tests, and narrow diagnostics.
+- If a snippet seems incomplete, reread a smaller exact window instead of another huge overlapping shell range.
+- Do not build edits from large `sed`/`grep` output or from memory after partial shell reads.
 - Trust tool output over your own guesses.
 - Do not repeatedly reread or rewrite the same file when one focused fix will do.
 - After one failed edit attempt, reread the file before retrying.

package/templates/TESTER.md CHANGED

@@ -7,7 +7,7 @@ Your job:
 - review the developer's change from an independent user-facing perspective
 - add or improve focused verification where needed
 - verify actual functionality, not just plausibility
-- produce a commit plan when the work is truly ready
+- create the final commit only when the work is truly ready
 Rules:
@@ -16,6 +16,9 @@ Rules:
 - Run the configured smoke verification command as the default inner-loop gate.
 - Do not run long full-flow Playwright happy-path specs in the tester turn unless the task explicitly requires them.
 - If a long spec changed, validate with smoke plus one narrow targeted spec or deterministic state setup instead of replaying the entire run.
+- Use `read` for source inspection. Use shell only for `git`, tests, and narrow diagnostics.
+- If a snippet seems incomplete, reread a smaller exact window instead of another huge overlapping shell range.
+- Do not build edits from large `sed`/`grep` output or from memory after partial shell reads.
 - Treat player-facing dead ends, missing affordances, broken progression, console/runtime failures, and unusable UI as real failures.
 - If the task affects menus, unlocks, progression, classes, routes, shops, onboarding, or gating, verify a fresh-save path.
 - Do not hide product bugs by weakening tests.
@@ -23,7 +26,7 @@ Rules:
 - After one failed edit attempt, reread the file before retrying.
 - Do not repeat the same exact oldText-based edit on the same file.
 - If visual review is enabled, maintain the screenshot capture flow and manifest expected by the harness.
-- If the change passes, do not run `git add` or `git commit` yourself. Provide a commit plan for the harness instead.
+- If the change passes, stage only the related files and create the commit yourself.
 - If the working tree cannot be isolated safely, return `VERDICT: BLOCKED`.
 Before stopping:
@@ -31,7 +34,7 @@ Before stopping:
 - include `Observed flow:`
 - include `Player-facing result:`
 - include `Regression check:`
+- if passing, include `COMMIT_CREATED: true`
 - if passing, include `COMMIT_MESSAGE: ...`
-- if passing, include `COMMIT_FILES:`
-- if passing, include one `- path/to/file` line per file
+- if passing, include `COMMIT_SHA: ...`
 - end with exactly one verdict line: `VERDICT: PASS`, `VERDICT: FAIL`, or `VERDICT: BLOCKED`

package/templates/pi.config.example.json CHANGED

@@ -6,6 +6,8 @@
   "testerInstructionsFile": "pi/TESTER.md",
   "commitMode": "agent",
   "promptMode": "compact",
+  "largeFileWarningLines": 500,
+  "largeSpecWarningLines": 300,
   "piModel": "local/text-model",
   "models": {
     "local/text-model": {