@sebastianandreasson/pi-autonomous-agents 0.3.0 → 0.4.0

package/README.md CHANGED
@@ -6,7 +6,7 @@
  - a fast verification step
  - a skeptical `tester` pass
  - optional periodic multimodal visual review
- - harness-owned git finalization
+ - tester-owned final commit by default
 
  The package is intentionally generic. It does not know how to navigate or test a specific app on its own.
 
@@ -18,7 +18,7 @@ The package is intentionally generic. It does not know how to navigate or test a
  - telemetry
  - loop guards, timeout guards, and retries
  - tester feedback + visual feedback handoff
- - harness-owned git finalize step
+ - optional legacy harness git finalize step for `commitMode: "plan"`
  - multimodal visual review client
 
  ## What Stays Per Project
@@ -119,4 +119,8 @@ By default, successful tester passes should stage and create the commit directly
 
  Prompt/context handoff is compact by default. The harness now caps prior feedback excerpts, changed-file lists, verification excerpts, and prompt note handoff. If needed, tune `maxPromptChangedFiles`, `maxVisualFeedbackLines`, `maxTesterFeedbackLines`, `maxPromptNotesLines`, and `maxVerificationExcerptLines`.
 
+ The default coding tool mix is now safer for local models: `read,edit,write,find,ls,bash`. Prompts explicitly steer source inspection toward `read` and reserve shell usage for `git`, tests, and narrow diagnostics.
+
+ The harness also emits lightweight large-file warnings for touched source/spec files and carries them into `.pi-last-iteration.json`, `pi-harness report`, and relevant prompts. Tune `largeFileWarningLines` and `largeSpecWarningLines` if needed.
+
  The harness expects screenshot capture to produce a `manifest.json` plus image files under the configured visual capture directory.
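The new default tool string can be compared with the 0.3.0 default (`read,bash,edit,write,grep,find,ls`, visible in the `pi-config.mjs` hunk) by treating both as sets. The short sketch below does that; `parseTools` and `diffTools` are hypothetical helpers, not package exports.

```javascript
// Both strings are copied from the 0.3.0 and 0.4.0 defaults in this diff.
const OLD_TOOLS = 'read,bash,edit,write,grep,find,ls'
const NEW_TOOLS = 'read,edit,write,find,ls,bash'

// Hypothetical helper: split a comma-separated tool list and drop empty entries.
function parseTools(csv) {
  return csv.split(',').map((name) => name.trim()).filter(Boolean)
}

// Hypothetical helper: report tools present in only one of the two lists.
function diffTools(before, after) {
  const oldSet = new Set(parseTools(before))
  const newSet = new Set(parseTools(after))
  return {
    removed: [...oldSet].filter((name) => !newSet.has(name)),
    added: [...newSet].filter((name) => !oldSet.has(name)),
  }
}

console.log(diffTools(OLD_TOOLS, NEW_TOOLS)) // prints { removed: [ 'grep' ], added: [] }
```

As a set, the only change is that `grep` is dropped; `bash` also moves to the end of the list. This diff does not say whether list order affects the agent.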
package/SETUP.md CHANGED
@@ -47,6 +47,7 @@ If the repo uses another package manager already, use the repo-native equivalent
  - `developerInstructionsFile`: `pi/DEVELOPER.md`
  - `testerInstructionsFile`: `pi/TESTER.md`
  - `commitMode`: normally `agent`
+ - `promptMode`: normally `compact`
  - `testCommand`: a fast bounded verification command for this repo
  - `visualCaptureCommand`: only if this repo has a real screenshot capture flow
  - `models` / `piModel` / `visualReviewModel` / `roleModels`: configure the models actually available in this environment
@@ -125,6 +126,7 @@ Recommended pattern:
  - local or slightly stronger model for `tester`
  - stronger frontier model for `visualReview` only if available
  - keep `commitMode` as `agent` unless the repo explicitly needs legacy harness-managed commit-plan parsing
+ - keep large-file thresholds sensible for local models (`largeFileWarningLines`, `largeSpecWarningLines`)
 
  Example shape:
 
@@ -192,6 +194,7 @@ For flow debugging, inspect `.pi-last-iteration.json` after a run. It summarizes
  - Do not enable visual review unless the repo actually has a usable capture command and model config.
  - Keep changes minimal and local to harness setup.
  - Prefer very small, implementation-shaped TODO items for local models. Broad tasks tend to create long turns, retries, and weak tester behavior.
+ - Prefer `read` for code inspection and keep shell usage focused on `git`, tests, and narrow diagnostics, especially for weaker local models.
 
  ## What To Report Back
 
@@ -30,7 +30,7 @@ Main package files:
  - `src/pi-client.mjs`: transport layer
  - `src/pi-rpc-adapter.mjs`: built-in adapter from supervisor JSON to `pi --mode rpc`
  - `src/pi-config.mjs`: config loader
- - `src/pi-repo.mjs`: repo helpers, verification runner, git finalize step
+ - `src/pi-repo.mjs`: repo helpers, verification runner, and optional legacy git finalize step
  - `src/pi-telemetry.mjs`: telemetry writer/reader
  - `src/pi-prompts.mjs`: default prompt builders
  - `src/pi-visual-review.mjs`: multimodal visual-review worker
@@ -126,7 +126,7 @@ Request shape:
  "runtimeDir": "/absolute/repo/path/.pi-runtime",
  "piCli": "pi",
  "model": "local/model-name",
- "tools": "read,bash,edit,write,grep,find,ls",
+ "tools": "read,edit,write,find,ls,bash",
  "thinking": "",
  "noExtensions": false,
  "noSkills": false,
@@ -170,6 +170,8 @@ The default flow keeps commit ownership with the active agent:
 
  If a repo explicitly needs the older harness-managed commit-plan flow, set `commitMode` to `plan`. In that mode, `testerCommit` and parsed commit plans are used as a compatibility path rather than the default.
 
+ For source inspection, prompts prefer `read` and reserve shell usage for `git`, tests, and narrow diagnostics. Large shell file reads are more likely to truncate under context pressure than focused `read` calls.
+
  ## Persistent Handoffs
 
  The harness persists two cross-iteration handoff files:
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "@sebastianandreasson/pi-autonomous-agents",
  "private": false,
- "version": "0.3.0",
+ "version": "0.4.0",
  "type": "module",
  "description": "Portable unattended PI harness for developer/tester/visual-review loops.",
  "license": "MIT",
package/src/index.mjs CHANGED
@@ -10,4 +10,5 @@ export {
  runStartupPreflight,
  } from './pi-preflight.mjs'
  export { clearHarnessHistory, collectHistoryTargets } from './pi-history.mjs'
+ export { collectLargeFileWarnings } from './pi-repo.mjs'
  export { runAgentTurn } from './pi-client.mjs'
package/src/pi-config.mjs CHANGED
@@ -258,7 +258,9 @@ export function loadConfig(mode = 'once') {
  maxTesterFeedbackLines: readInt('PI_MAX_TESTER_FEEDBACK_LINES', file.maxTesterFeedbackLines, 32),
  maxPromptNotesLines: readInt('PI_MAX_PROMPT_NOTES_LINES', file.maxPromptNotesLines, 16),
  maxVerificationExcerptLines: readInt('PI_MAX_VERIFICATION_EXCERPT_LINES', file.maxVerificationExcerptLines, 40),
- piTools: readString('PI_TOOLS', file.piTools, 'read,bash,edit,write,grep,find,ls'),
+ largeFileWarningLines: readInt('PI_LARGE_FILE_WARNING_LINES', file.largeFileWarningLines, 500),
+ largeSpecWarningLines: readInt('PI_LARGE_SPEC_WARNING_LINES', file.largeSpecWarningLines, 300),
+ piTools: readString('PI_TOOLS', file.piTools, 'read,edit,write,find,ls,bash'),
  piThinking: readString('PI_THINKING', file.piThinking, ''),
  piNoExtensions: readBool('PI_NO_EXTENSIONS', file.piNoExtensions, false),
  piNoSkills: readBool('PI_NO_SKILLS', file.piNoSkills, false),
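The call shape `readInt(envName, fileValue, default)` suggests a lookup order of environment variable, then config-file value, then built-in default. `readInt` itself is not part of this diff, so the sketch below is an assumption about that precedence rather than its implementation; `readIntSketch` and the sample inputs are hypothetical.

```javascript
// Hypothetical precedence sketch: env var first, then config-file value, then default.
// The real readInt in pi-config.mjs is not shown in this diff and may differ.
function readIntSketch(env, name, fileValue, fallback) {
  for (const candidate of [env[name], fileValue]) {
    // Missing values become '' and parse to NaN, so they are skipped.
    const parsed = Number.parseInt(String(candidate ?? ''), 10)
    if (Number.isFinite(parsed)) {
      return parsed
    }
  }
  return fallback
}

// Hypothetical inputs: the spec threshold comes from the environment,
// the general threshold from the config file.
const env = { PI_LARGE_SPEC_WARNING_LINES: '200' }
const file = { largeFileWarningLines: 800 }

console.log(readIntSketch(env, 'PI_LARGE_FILE_WARNING_LINES', file.largeFileWarningLines, 500)) // prints 800
console.log(readIntSketch(env, 'PI_LARGE_SPEC_WARNING_LINES', file.largeSpecWarningLines, 300)) // prints 200
console.log(readIntSketch({}, 'PI_LARGE_SPEC_WARNING_LINES', undefined, 300)) // prints 300
```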
@@ -40,6 +40,20 @@ function formatChangedFilesSection(files, maxFiles) {
  return lines.join('\n')
  }
 
+ function formatLargeFileRiskHint(warnings) {
+ const list = Array.isArray(warnings) ? warnings.filter(Boolean) : []
+ if (list.length === 0) {
+ return ''
+ }
+
+ const lines = list
+ .slice(0, 3)
+ .map((warning) => `- ${warning.file} (${warning.lineCount} lines${warning.kind === 'large_spec' ? ', spec' : ''})`)
+ .join('\n')
+
+ return `\nLarge file risk in touched files:\n${lines}\nPrefer helper extraction, smaller scoped edits, or test splitting over broad in-place edits.\n`
+ }
+
  function displayPath(config, filePath) {
  const relativePath = path.relative(config.cwd, filePath)
  if (
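To see what a prompt gains from `formatLargeFileRiskHint`, the helper from the hunk above can be run on sample input. The function body is copied from the diff; the two warning objects are hypothetical.

```javascript
// Copied from the hunk above.
function formatLargeFileRiskHint(warnings) {
  const list = Array.isArray(warnings) ? warnings.filter(Boolean) : []
  if (list.length === 0) {
    return ''
  }

  // At most three entries; spec files get a ", spec" marker.
  const lines = list
    .slice(0, 3)
    .map((warning) => `- ${warning.file} (${warning.lineCount} lines${warning.kind === 'large_spec' ? ', spec' : ''})`)
    .join('\n')

  return `\nLarge file risk in touched files:\n${lines}\nPrefer helper extraction, smaller scoped edits, or test splitting over broad in-place edits.\n`
}

// Hypothetical warnings, already sorted by lineCount as collectLargeFileWarnings returns them.
const sampleWarnings = [
  { file: 'src/store.js', lineCount: 640, kind: 'large_file' },
  { file: 'tests/app.spec.js', lineCount: 320, kind: 'large_spec' },
]

console.log(formatLargeFileRiskHint(sampleWarnings))
```

At most three warnings are listed. With an empty or missing list the helper returns `''`, so each prompt template that interpolates `${largeFileRiskHint}` on its own line gains only a blank line.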
@@ -160,6 +174,9 @@ Harness rules:
  - Start by checking git status so you know whether unrelated changes already exist.
  - Update code, config, and docs only as needed for the selected task.
  - Tick only the checkbox items that are actually completed.
+ - Use read for source inspection. Use bash only for git, tests, and narrow diagnostics.
+ - Do not build edits from large sed/grep output or from memory after partial shell reads.
+ - If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
  - If blocked, add a brief note directly under the relevant task in ${taskFile} explaining the blocker, then stop.
  - Do not create the final commit during the developer pass.
  ${staleEditRecoveryRules()}
@@ -180,6 +197,9 @@ Rules:
  - Start with git status.
  - Select the first unchecked actionable checkbox in phase order.
  - Keep changes minimal and scoped.
+ - Use read for source inspection. Use bash only for git, tests, and narrow diagnostics.
+ - If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
+ - Do not edit from memory after partial shell output.
  - Tick only completed items.
  - If blocked, note it under the task in ${taskFile} and stop.
  - Do not touch lockfiles, generated files, or unrelated assets.
@@ -203,11 +223,13 @@ export function buildFixPrompt(config, recentVerificationOutput, options = {}) {
  config.usingBundledDeveloperInstructions,
  )
  const findings = clampLines(recentVerificationOutput, configMaxLines(config, 'maxVerificationExcerptLines', 40))
+ const largeFileRiskHint = formatLargeFileRiskHint(options.largeFileWarnings)
 
  if (!config.usingBundledDeveloperInstructions) {
  return `Read ${taskFile} and ${instructionsFile}.
  ${authorityLine}${visualFeedbackSection}
  ${testerFeedbackSection}
+ ${largeFileRiskHint}
 
  The tester step found a real problem in the current implementation. Fix only the product behavior related to the current phase and current task.
 
@@ -218,6 +240,9 @@ Harness rules:
  - Start by checking git status so you know which files are already dirty.
  - Do not paper over product bugs by weakening tests.
  - Keep changes minimal and focused on the failing behavior.
+ - Use read for source inspection. Use bash only for git, tests, and narrow diagnostics.
+ - If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
+ - Do not edit from memory after partial shell output.
  - Do not perform speculative cleanup or unrelated refactors in this pass.
  - Do not create the final commit during the developer fix pass.
  ${staleEditRecoveryRules()}
@@ -230,6 +255,7 @@ Before stopping:
  return `Read ${taskFile} and ${instructionsFile}.
  ${authorityLine}${visualFeedbackSection}
  ${testerFeedbackSection}
+ ${largeFileRiskHint}
 
  The tester step found a real problem in the current implementation. Fix only the product behavior related to the current phase and current task.
 
@@ -240,6 +266,9 @@ Rules:
  - Start with git status.
  - Keep the fix narrow.
  - Do not weaken tests to hide product bugs.
+ - Use read for source inspection. Use bash only for git, tests, and narrow diagnostics.
+ - If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
+ - Do not edit from memory after partial shell output.
  - Do not perform speculative cleanup or unrelated refactors.
  - Do not create the final commit.
  ${staleEditRecoveryRules()}
@@ -259,12 +288,14 @@ export function buildSteeringPrompt(config, reason, options = {}) {
  config.developerInstructionsFile,
  config.usingBundledDeveloperInstructions,
  )
+ const largeFileRiskHint = formatLargeFileRiskHint(options.largeFileWarnings)
 
  if (!config.usingBundledDeveloperInstructions) {
  return `Continue from the current repo state.
  Read ${taskFile} and ${instructionsFile}.
  ${authorityLine}${visualFeedbackSection}
  ${testerFeedbackSection}
+ ${largeFileRiskHint}
 
  Reason for this follow-up: ${reason}
 
@@ -272,9 +303,11 @@ Select the first unchecked actionable checkbox in the current phase, complete on
 
  Additional harness guardrails:
  - Start by checking git status.
+ - Use read for source inspection. Use bash only for git, tests, and narrow diagnostics.
  - Do not repeat the same tool call over and over.
  - If you already read a file, use that context instead of rereading it unless something changed.
  - If an edit fails once, reread the file before retrying. Do not repeat the same exact edit attempt.
+ - If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
  - If you are stuck, make the smallest decisive next action or stop and state the blocker.`
  }
 
@@ -282,15 +315,18 @@ Additional harness guardrails:
  Read ${taskFile} and ${instructionsFile}.
  ${authorityLine}${visualFeedbackSection}
  ${testerFeedbackSection}
+ ${largeFileRiskHint}
 
  Reason for this follow-up: ${reason}
 
  Select the first unchecked actionable checkbox in the current phase, complete one coherent task, tick completed items, run verification, and stop.
 
  Additional guardrails:
+ - Use read for source inspection. Use bash only for git, tests, and narrow diagnostics.
  - Do not repeat the same tool call over and over.
  - If you already read a file, use that context instead of rereading it unless something changed.
  - If an edit fails once, reread the file before retrying. Do not repeat the same exact edit attempt.
+ - If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
  - Prefer the configured smoke verification path and one narrow targeted check over long full-flow Playwright specs.
  - If you are stuck, make the smallest decisive next action or stop and state the blocker.`
  }
@@ -303,6 +339,7 @@ export function buildTesterPrompt(config, {
  reason = 'tester_review',
  visualFeedback = '',
  testerFeedback = '',
+ largeFileWarnings = [],
  }) {
  const taskFile = displayPath(config, config.taskFile)
  const instructionsFile = displayPath(config, config.testerInstructionsFile)
@@ -326,11 +363,13 @@ export function buildTesterPrompt(config, {
  config.usingBundledTesterInstructions,
  )
  const passOwnership = testerPassOwnershipRules(config)
+ const largeFileRiskHint = formatLargeFileRiskHint(largeFileWarnings)
 
  if (!config.usingBundledTesterInstructions) {
  return `Read ${taskFile} and ${instructionsFile}.
  ${authorityLine}${visualFeedbackSection}
  ${testerFeedbackSection}
+ ${largeFileRiskHint}
 
  You are the TESTER role. You are reviewing the most recent developer work from an independent quality and functionality perspective.
 
@@ -348,6 +387,8 @@ Rules:
  - Start with git status.
  - Follow repo-local tester instructions for what to verify and which commands to run.
  - Prefer one focused review pass.
+ - Use read for source inspection. Use bash only for git, tests, and narrow diagnostics.
+ - If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
  - If blocked or inconclusive, return VERDICT: BLOCKED.
  - Do not hide real bugs with brittle tests.
  - ${passOwnership.successRule.slice(2)}
@@ -370,6 +411,7 @@ Before stopping, end your final response with exactly one verdict line:
  return `Read ${taskFile} and ${instructionsFile}.
  ${authorityLine}${visualFeedbackSection}
  ${testerFeedbackSection}
+ ${largeFileRiskHint}
 
  You are the TESTER role. You are reviewing the most recent developer work from an independent quality and functionality perspective.
 
@@ -385,9 +427,11 @@ ${changedFilesSection}
 
  Rules:
  - Start with git status.
+ - Use read for source inspection. Use bash only for git, tests, and narrow diagnostics.
  - Run the repo verification command yourself: ${verificationCommand}
  ${indentBlock(innerLoopValidationRules(verificationCommand), '\t')}
  - Prefer one focused browser-driven review pass.
+ - If a snippet seems incomplete, reread a smaller exact window with read instead of another large overlapping shell range.
  - Do not hide real bugs with brittle tests.
  - If blocked or inconclusive, return VERDICT: BLOCKED.
  ${indentBlock(passOwnership.successRule, '\t')}
@@ -415,6 +459,7 @@ export function buildCommitPrompt(config, {
  reason = 'tester_passed_without_commit',
  visualFeedback = '',
  testerFeedback = '',
+ largeFileWarnings = [],
  }) {
  const taskFile = displayPath(config, config.taskFile)
  const instructionsFile = displayPath(config, config.testerInstructionsFile)
@@ -433,10 +478,12 @@ export function buildCommitPrompt(config, {
  developerNotes || '(none provided)',
  configMaxLines(config, 'maxPromptNotesLines', 16),
  )
+ const largeFileRiskHint = formatLargeFileRiskHint(largeFileWarnings)
 
  return `Read ${taskFile} and ${instructionsFile}.
  ${authorityLine}${visualFeedbackSection}
  ${testerFeedbackSection}
+ ${largeFileRiskHint}
 
  You are the TESTER role. The implementation already passed functional review, but the final commit was not created.
 
package/src/pi-repo.mjs CHANGED
@@ -225,6 +225,65 @@ export function findFirstUncheckedTaskInfo(taskFile) {
  }
  }
 
+ function countLines(text) {
+ const normalized = String(text ?? '')
+ if (normalized === '') {
+ return 0
+ }
+ return normalized.split('\n').length
+ }
+
+ function isSpecLikeFile(filePath) {
+ const normalized = String(filePath ?? '').replaceAll('\\', '/')
+ return /(^|\/)(e2e|test|tests|spec|specs)\//.test(normalized)
+ || /\.(spec|test)\.[cm]?[jt]sx?$/.test(normalized)
+ }
+
+ export function collectLargeFileWarnings(cwd, files, {
+ largeFileWarningLines = 500,
+ largeSpecWarningLines = 300,
+ } = {}) {
+ const warnings = []
+ const seen = new Set()
+
+ for (const file of Array.isArray(files) ? files : []) {
+ const relativePath = String(file ?? '').trim()
+ if (relativePath === '' || seen.has(relativePath)) {
+ continue
+ }
+ seen.add(relativePath)
+
+ const absolutePath = path.resolve(cwd, relativePath)
+ let raw = ''
+ try {
+ raw = readFileSync(absolutePath, 'utf8')
+ } catch {
+ continue
+ }
+
+ const lineCount = countLines(raw)
+ const isSpec = isSpecLikeFile(relativePath)
+ if (isSpec && lineCount >= largeSpecWarningLines) {
+ warnings.push({
+ file: relativePath,
+ lineCount,
+ kind: 'large_spec',
+ })
+ continue
+ }
+
+ if (lineCount >= largeFileWarningLines) {
+ warnings.push({
+ file: relativePath,
+ lineCount,
+ kind: 'large_file',
+ })
+ }
+ }
+
+ return warnings.sort((left, right) => right.lineCount - left.lineCount)
+ }
+
  export async function runShellCommand({
  cwd,
  command,
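`collectLargeFileWarnings` reads each touched file from disk relative to `cwd` and silently skips files it cannot read. The sketch below keeps its classification rules but takes in-memory `{ file, text }` pairs so it runs without a repository. `classifyLargeFiles` and `makeText` are hypothetical names; `countLines` and `isSpecLikeFile` are copied from the hunk above.

```javascript
// Copied from the pi-repo.mjs hunk above.
function countLines(text) {
  const normalized = String(text ?? '')
  if (normalized === '') {
    return 0
  }
  return normalized.split('\n').length
}

// Copied from the pi-repo.mjs hunk above: test directories or *.spec/*.test files.
function isSpecLikeFile(filePath) {
  const normalized = String(filePath ?? '').replaceAll('\\', '/')
  return /(^|\/)(e2e|test|tests|spec|specs)\//.test(normalized)
    || /\.(spec|test)\.[cm]?[jt]sx?$/.test(normalized)
}

// Hypothetical in-memory variant of collectLargeFileWarnings (no disk reads, no dedup).
function classifyLargeFiles(entries, { largeFileWarningLines = 500, largeSpecWarningLines = 300 } = {}) {
  const warnings = []
  for (const { file, text } of entries) {
    const lineCount = countLines(text)
    if (isSpecLikeFile(file) && lineCount >= largeSpecWarningLines) {
      warnings.push({ file, lineCount, kind: 'large_spec' })
      continue
    }
    if (lineCount >= largeFileWarningLines) {
      warnings.push({ file, lineCount, kind: 'large_file' })
    }
  }
  // Largest files first, matching the original sort.
  return warnings.sort((left, right) => right.lineCount - left.lineCount)
}

// Hypothetical helper: build a text with exactly n lines and no trailing newline.
const makeText = (n) => Array.from({ length: n }, (_, i) => `line ${i + 1}`).join('\n')

const sample = [
  { file: 'src/app.js', text: makeText(420) },        // below 500: no warning
  { file: 'tests/app.spec.js', text: makeText(320) }, // spec at or above 300
  { file: 'src/store.js', text: makeText(640) },      // source at or above 500
]

console.log(classifyLargeFiles(sample))
```

Each file yields at most one warning, and a spec-like path is checked against `largeSpecWarningLines` first. Because `countLines` splits on `'\n'`, a file that ends with a newline counts one line more than `wc -l` reports.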
package/src/pi-report.mjs CHANGED
@@ -35,6 +35,17 @@ async function main() {
  console.log(`- ${kind}: ${count}`)
  }
 
+ const iterationSummaries = recent.filter((event) => event.kind === 'iteration_summary')
+ const warningsByIteration = iterationSummaries
+ .filter((event) => String(event.riskWarnings ?? '').trim() !== '')
+
+ if (warningsByIteration.length > 0) {
+ console.log('\nLarge file warnings:')
+ for (const event of warningsByIteration.slice(-5)) {
+ console.log(`- iteration ${event.iteration}: ${event.riskWarnings}`)
+ }
+ }
+
  const last = recent.at(-1)
  if (!last) {
  return
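The report section added above only looks at `iteration_summary` events whose `riskWarnings` string is non-empty, and prints the last five. The sketch below reproduces that filtering as a pure function. `formatWarningReport` and the sample events are hypothetical; the `riskWarnings` strings follow the inline format produced by `formatLargeFileWarningsInline` (`file(lines)` or `file(lines,spec)`).

```javascript
// Hypothetical pure version of the pi-report.mjs logic above.
function formatWarningReport(recent) {
  const warningsByIteration = recent
    .filter((event) => event.kind === 'iteration_summary')
    .filter((event) => String(event.riskWarnings ?? '').trim() !== '')
  // Keep only the five most recent summaries that carry warnings.
  return warningsByIteration
    .slice(-5)
    .map((event) => `- iteration ${event.iteration}: ${event.riskWarnings}`)
}

// Hypothetical telemetry events; only `kind`, `iteration`, and `riskWarnings` matter here.
const recent = [
  { kind: 'iteration_summary', iteration: 1, riskWarnings: '' },
  { kind: 'tool_call', iteration: 2 },
  { kind: 'iteration_summary', iteration: 2, riskWarnings: 'src/store.js(640)' },
  { kind: 'iteration_summary', iteration: 3, riskWarnings: 'tests/app.spec.js(320,spec)' },
]

for (const line of formatWarningReport(recent)) {
  console.log(line)
}
```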
@@ -54,6 +54,44 @@ function extractToolTarget(toolName, args) {
  return ''
  }
 
+ function extractShellCommand(args) {
+ if (!args || typeof args !== 'object') {
+ return ''
+ }
+
+ if (typeof args.command === 'string') {
+ return args.command
+ }
+
+ if (typeof args.cmd === 'string') {
+ return args.cmd
+ }
+
+ return ''
+ }
+
+ function isLargeShellRead(command) {
+ const text = String(command ?? '').trim()
+ if (text === '') {
+ return false
+ }
+
+ if (/^\s*cat\s+\S+/.test(text)) {
+ return true
+ }
+
+ const sedMatch = text.match(/sed\s+-n\s+['"]?(\d+)\s*,\s*(\d+)p['"]?/)
+ if (sedMatch) {
+ const start = Number.parseInt(sedMatch[1], 10)
+ const end = Number.parseInt(sedMatch[2], 10)
+ if (Number.isFinite(start) && Number.isFinite(end) && end >= start) {
+ return (end - start) >= 120
+ }
+ }
+
+ return false
+ }
+
  function extractAssistantText(message) {
  if (!message || message.role !== 'assistant' || !Array.isArray(message.content)) {
  return ''
@@ -295,6 +333,7 @@ async function run() {
  activeToolName = String(data.toolName ?? '')
  activeToolStartedAt = Date.now()
  const target = extractToolTarget(data.toolName, data.args)
+ const shellCommand = data.toolName === 'bash' ? extractShellCommand(data.args) : ''
  if (signature === lastToolSignature) {
  repeatedToolCount += 1
  } else {
@@ -325,6 +364,9 @@ async function run() {
  }
 
  writeLive(`[PI tool:start] ${data.toolName}${suffix}\n`)
+ if (data.toolName === 'bash' && isLargeShellRead(shellCommand)) {
+ writeLive('[PI warning] large bash file read detected; prefer read or a smaller exact window to avoid truncated context.\n')
+ }
  }
 
  if (data.type === 'tool_execution_end') {
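`isLargeShellRead` is a text heuristic on the bash command, so its coverage is easiest to see with a few inputs. The two function bodies below are copied from the hunk; the commands in `cases` are hypothetical examples.

```javascript
// Copied from the hunk above: accept either `command` or `cmd` in the tool args.
function extractShellCommand(args) {
  if (!args || typeof args !== 'object') {
    return ''
  }
  if (typeof args.command === 'string') {
    return args.command
  }
  if (typeof args.cmd === 'string') {
    return args.cmd
  }
  return ''
}

// Copied from the hunk above: flag a leading `cat <path>` or a `sed -n 'A,Bp'` span of 120+.
function isLargeShellRead(command) {
  const text = String(command ?? '').trim()
  if (text === '') {
    return false
  }
  if (/^\s*cat\s+\S+/.test(text)) {
    return true
  }
  const sedMatch = text.match(/sed\s+-n\s+['"]?(\d+)\s*,\s*(\d+)p['"]?/)
  if (sedMatch) {
    const start = Number.parseInt(sedMatch[1], 10)
    const end = Number.parseInt(sedMatch[2], 10)
    if (Number.isFinite(start) && Number.isFinite(end) && end >= start) {
      return (end - start) >= 120
    }
  }
  return false
}

// Hypothetical commands paired with the result the heuristic gives.
const cases = [
  ['cat src/app.js', true],
  ["sed -n '1,200p' src/app.js", true],
  ['sed -n 1,121p src/app.js', true],
  ['sed -n 1,120p src/app.js', false],
  ["sed -n '10,60p' src/app.js", false],
  ['head -n 400 src/app.js', false],
  ['git status', false],
]

for (const [command, expected] of cases) {
  console.log(`${expected ? 'warn' : 'ok  '}  ${command}`)
}
```

Only a leading `cat <path>` and an explicit `sed -n 'A,Bp'` range with `B - A >= 120` are flagged; a command such as `head -n 400 file` passes silently. The check only writes a `[PI warning]` line next to `[PI tool:start]`; it does not block the tool call.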
@@ -13,6 +13,7 @@ import {
  import { appendTelemetry, ensureTelemetryFiles } from './pi-telemetry.mjs'
  import {
  appendLog,
+ collectLargeFileWarnings,
  commitStagedFiles,
  didRepoChange,
  ensureFileExists,
@@ -79,6 +80,10 @@ function printTerminalSummary(config, summary) {
  lines.push(`[PI supervisor] notes=${summary.notes}`)
  }
 
+ if (Array.isArray(summary.largeFileWarnings) && summary.largeFileWarnings.length > 0) {
+ lines.push(`[PI supervisor] large_file_warnings=${formatLargeFileWarningsInline(summary.largeFileWarnings)}`)
+ }
+
  if (summary.terminalReason) {
  lines.push(`[PI supervisor] terminal_reason=${summary.terminalReason}`)
  }
@@ -162,6 +167,7 @@ function createIterationSummary({
  gitFinalizeStatus,
  visualStatus,
  terminalReason,
+ largeFileWarnings,
  sessionId,
  developerModel,
  testerModel,
@@ -180,6 +186,7 @@ function createIterationSummary({
  gitFinalizeStatus,
  visualStatus,
  terminalReason,
+ largeFileWarnings,
  sessionId,
  developerModel,
  testerModel,
@@ -191,6 +198,39 @@ function didInvocationCreateCommit(invocation) {
  return invocation?.beforeSnapshot?.head !== invocation?.afterSnapshot?.head
  }
 
+ function mergeLargeFileWarnings(existing, incoming) {
+ const merged = new Map()
+ for (const warning of [...(existing || []), ...(incoming || [])]) {
+ if (!warning?.file) {
+ continue
+ }
+ const key = `${warning.kind}:${warning.file}`
+ const current = merged.get(key)
+ if (!current || Number(warning.lineCount) > Number(current.lineCount)) {
+ merged.set(key, warning)
+ }
+ }
+ return [...merged.values()].sort((left, right) => right.lineCount - left.lineCount)
+ }
+
+ function findLargeFileWarnings(config, files) {
+ return collectLargeFileWarnings(config.cwd, files, {
+ largeFileWarningLines: config.largeFileWarningLines,
+ largeSpecWarningLines: config.largeSpecWarningLines,
+ })
+ }
+
+ function formatLargeFileWarningsInline(warnings) {
+ const list = Array.isArray(warnings) ? warnings : []
+ if (list.length === 0) {
+ return ''
+ }
+ return list
+ .slice(0, 3)
+ .map((warning) => `${warning.file}(${warning.lineCount}${warning.kind === 'large_spec' ? ',spec' : ''})`)
+ .join(', ')
+ }
+
  function clampPromptLines(text, maxLines) {
  const normalized = String(text ?? '').trim()
  if (normalized === '') {
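`mergeLargeFileWarnings` deduplicates by `kind:file` and keeps the larger `lineCount`. That matters because `runIteration` re-scans the changed files after each tester, fix, recheck, and tester-commit pass. The sketch below runs the two pure helpers from this hunk, copied verbatim, on hypothetical warning lists.

```javascript
// Copied from the hunk above: one entry per kind:file, keeping the larger lineCount.
function mergeLargeFileWarnings(existing, incoming) {
  const merged = new Map()
  for (const warning of [...(existing || []), ...(incoming || [])]) {
    if (!warning?.file) {
      continue
    }
    const key = `${warning.kind}:${warning.file}`
    const current = merged.get(key)
    if (!current || Number(warning.lineCount) > Number(current.lineCount)) {
      merged.set(key, warning)
    }
  }
  return [...merged.values()].sort((left, right) => right.lineCount - left.lineCount)
}

// Copied from the hunk above: compact form used in logs, the terminal summary, and telemetry.
function formatLargeFileWarningsInline(warnings) {
  const list = Array.isArray(warnings) ? warnings : []
  if (list.length === 0) {
    return ''
  }
  return list
    .slice(0, 3)
    .map((warning) => `${warning.file}(${warning.lineCount}${warning.kind === 'large_spec' ? ',spec' : ''})`)
    .join(', ')
}

// Hypothetical scans: the developer pass, then a later pass after src/store.js grew.
const existing = [{ file: 'src/store.js', lineCount: 640, kind: 'large_file' }]
const incoming = [
  { file: 'src/store.js', lineCount: 655, kind: 'large_file' },
  { file: 'tests/app.spec.js', lineCount: 320, kind: 'large_spec' },
  { file: 'src/api.js', lineCount: 510, kind: 'large_file' },
  { file: 'src/ui.js', lineCount: 505, kind: 'large_file' },
]

const merged = mergeLargeFileWarnings(existing, incoming)
console.log(merged.length) // prints 4
console.log(formatLargeFileWarningsInline(merged)) // prints src/store.js(655), src/api.js(510), src/ui.js(505)
```

Four distinct warnings survive the merge, but `formatLargeFileWarningsInline`, like the prompt hint, lists only the first three, so the smallest entry (here the spec file) is left out of the log line and the `riskWarnings` telemetry field.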
@@ -644,6 +684,7 @@ async function runMainTurnWithRetries({ config, iteration, phase, sessionId, ses
  prompt = buildSteeringPrompt(config, reason, {
  visualFeedback: await readLatestVisualFeedback(config),
  testerFeedback: await readLatestTesterFeedback(config),
+ largeFileWarnings: findLargeFileWarnings(config, listChangedFiles(config.cwd)),
  })
 
  if (shouldRetryForTimeout || shouldRetryForNoChange) {
@@ -656,12 +697,14 @@ async function runMainTurnWithRetries({ config, iteration, phase, sessionId, ses
  }
 
  async function runFixTurn({ config, iteration, phase, sessionId, sessionFile, testerOutput }) {
+ const largeFileWarnings = findLargeFileWarnings(config, listChangedFiles(config.cwd))
  const fixPrompt = buildFixPrompt(
  config,
  clampPromptLines(testerOutput, Number(config.maxVerificationExcerptLines) || 40),
  {
  visualFeedback: await readLatestVisualFeedback(config),
  testerFeedback: await readLatestTesterFeedback(config),
+ largeFileWarnings,
  }
  )
  return await runAgentInvocation({
@@ -762,6 +805,7 @@ async function runTesterTurn({
  developerNotes,
  reason,
  }) {
+ const largeFileWarnings = findLargeFileWarnings(config, changedFiles)
  const prompt = buildTesterPrompt(config, {
  phase,
  task,
@@ -770,6 +814,7 @@ async function runTesterTurn({
  reason,
  visualFeedback: await readLatestVisualFeedback(config),
  testerFeedback: await readLatestTesterFeedback(config),
+ largeFileWarnings,
  })
 
  const invocation = await runAgentInvocation({
@@ -835,6 +880,7 @@ async function runTesterCommitTurn({
  developerNotes,
  reason,
  }) {
+ const largeFileWarnings = findLargeFileWarnings(config, changedFiles)
  const prompt = buildCommitPrompt(config, {
  phase,
  task,
@@ -843,6 +889,7 @@ async function runTesterCommitTurn({
  reason,
  visualFeedback: await readLatestVisualFeedback(config),
  testerFeedback: await readLatestTesterFeedback(config),
+ largeFileWarnings,
  })
 
  const invocation = await runAgentInvocation({
@@ -1054,6 +1101,7 @@ async function runIteration({ config, state, iteration }) {
  gitFinalizeStatus: 'not_run',
  visualStatus: 'not_run',
  terminalReason: 'all_tasks_complete',
+ largeFileWarnings: [],
  notes: 'No unchecked tasks remain in TODOS.md.',
  sessionId: state.sessionId || '',
  outputPath: config.lastAgentOutputFile,
@@ -1103,6 +1151,7 @@ async function runIteration({ config, state, iteration }) {
  let commitPlanFound = false
  let gitFinalizeStatus = 'not_run'
  let terminalReason = mainInvocation.result.terminalReason || ''
+ let largeFileWarnings = findLargeFileWarnings(config, mainInvocation.changedFiles)
  const noteParts = [`developer: ${mainInvocation.result.notes}`]
 
  if (mainInvocation.result.status === 'success' && config.transport === 'mock') {
@@ -1157,6 +1206,7 @@ async function runIteration({ config, state, iteration }) {
  testerVerdict = testerInvocation.testerVerdict
  commitPlanFound = testerInvocation.commitPlanFound === true
  terminalReason = testerInvocation.result.terminalReason || terminalReason
+ largeFileWarnings = mergeLargeFileWarnings(largeFileWarnings, findLargeFileWarnings(config, listChangedFiles(config.cwd)))
  noteParts.push(`tester: ${testerInvocation.result.notes}`)
  await writeTesterFeedback(config, {
  iteration,
@@ -1184,6 +1234,7 @@ async function runIteration({ config, state, iteration }) {
  testerVerdict = testerCommitInvocation.testerVerdict
  commitPlanFound = testerCommitInvocation.commitPlanFound === true
  terminalReason = testerCommitInvocation.result.terminalReason || terminalReason
+ largeFileWarnings = mergeLargeFileWarnings(largeFileWarnings, findLargeFileWarnings(config, listChangedFiles(config.cwd)))
  noteParts.push(`tester_commit: ${testerCommitInvocation.result.notes}`)
  await writeTesterFeedback(config, {
  iteration,
@@ -1241,6 +1292,7 @@ async function runIteration({ config, state, iteration }) {
  sessionFile = fixInvocation.result.sessionFile || sessionFile
  developerStatus = fixInvocation.result.status
  terminalReason = fixInvocation.result.terminalReason || 'developer_fix_incomplete'
+ largeFileWarnings = mergeLargeFileWarnings(largeFileWarnings, findLargeFileWarnings(config, listChangedFiles(config.cwd)))
  noteParts.push(`developer_fix: ${fixInvocation.result.notes}`)
 
  if (fixInvocation.result.status === 'success') {
@@ -1258,6 +1310,7 @@ async function runIteration({ config, state, iteration }) {
  testerVerdict = testerRecheck.testerVerdict
  commitPlanFound = testerRecheck.commitPlanFound === true
  terminalReason = testerRecheck.result.terminalReason || terminalReason
+ largeFileWarnings = mergeLargeFileWarnings(largeFileWarnings, findLargeFileWarnings(config, listChangedFiles(config.cwd)))
  noteParts.push(`tester_recheck: ${testerRecheck.result.notes}`)
  await writeTesterFeedback(config, {
  iteration,
@@ -1285,6 +1338,7 @@ async function runIteration({ config, state, iteration }) {
  testerVerdict = testerCommitInvocation.testerVerdict
  commitPlanFound = testerCommitInvocation.commitPlanFound === true
  terminalReason = testerCommitInvocation.result.terminalReason || terminalReason
+ largeFileWarnings = mergeLargeFileWarnings(largeFileWarnings, findLargeFileWarnings(config, listChangedFiles(config.cwd)))
  noteParts.push(`tester_commit: ${testerCommitInvocation.result.notes}`)
  await writeTesterFeedback(config, {
  iteration,
@@ -1436,7 +1490,7 @@ async function runIteration({ config, state, iteration }) {
 
  await appendLog(
  config.logFile,
- `Finished iteration ${iteration} with status=${finalStatus} verification=${finalVerificationStatus} tester_verdict=${testerVerdict} commit_plan_found=${commitPlanFound} terminal_reason=${terminalReason}`
+ `Finished iteration ${iteration} with status=${finalStatus} verification=${finalVerificationStatus} tester_verdict=${testerVerdict} commit_plan_found=${commitPlanFound} terminal_reason=${terminalReason}${largeFileWarnings.length > 0 ? ` large_file_warnings=${formatLargeFileWarningsInline(largeFileWarnings)}` : ''}`
  )
 
  const iterationEndSnapshot = getRepoSnapshot(config.cwd)
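The log line gains a `large_file_warnings=` suffix only when the list is non-empty, and the same `formatLargeFileWarningsInline` output feeds the telemetry `riskWarnings` field. Its real format is not shown. A hypothetical sketch that keeps the value a single whitespace-free token, assuming the `{ path, lines, threshold }` warning shape and paths without spaces:

```javascript
// Hypothetical sketch: the real formatLargeFileWarningsInline is not part of this diff.
// Assumed warning shape: { path, lines, threshold }; paths contain no whitespace.
function formatLargeFileWarningsInline(warnings = []) {
  return warnings
    // Encode each warning as path:lines>threshold.
    .map((w) => `${w.path}:${w.lines}>${w.threshold}`)
    // ';' keeps the value a single key=value token; an empty list yields ''.
    .join(';')
}

const largeFileWarnings = [
  { path: 'src/game.js', lines: 610, threshold: 500 },
  { path: 'tests/flow.spec.js', lines: 340, threshold: 300 },
]
// Same conditional suffix pattern as the appendLog call in runIteration.
const suffix = largeFileWarnings.length > 0
  ? ` large_file_warnings=${formatLargeFileWarningsInline(largeFileWarnings)}`
  : ''
console.log(`Finished iteration 3 with status=success${suffix}`)
// prints "Finished iteration 3 with status=success large_file_warnings=src/game.js:610>500;tests/flow.spec.js:340>300"
```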
@@ -1453,6 +1507,7 @@ async function runIteration({ config, state, iteration }) {
  gitFinalizeStatus,
  visualStatus,
  terminalReason,
+ largeFileWarnings,
  sessionId,
  developerModel: developerModelName,
  testerModel: testerModelName,
@@ -1486,6 +1541,7 @@ async function runIteration({ config, state, iteration }) {
  testerVerdict,
  commitPlanFound,
  terminalReason,
+ riskWarnings: formatLargeFileWarningsInline(largeFileWarnings),
  notes: noteParts.join(' | '),
  })
 
@@ -1504,6 +1560,7 @@ async function runIteration({ config, state, iteration }) {
  gitFinalizeStatus,
  visualStatus,
  terminalReason,
+ largeFileWarnings,
  notes: noteParts.join(' | '),
  sessionId,
  outputPath: config.lastAgentOutputFile,
@@ -1,6 +1,6 @@
  import fs from 'node:fs/promises'
 
- const CSV_HEADER = 'timestamp,iteration,phase,kind,status,transport,session_id,timed_out,exit_code,duration_seconds,commit_before,commit_after,repo_changed,changed_files_count,verification_status,retry_count,role,model,tool_calls,tool_errors,message_updates,stop_reason,loop_detected,loop_signature,tester_verdict,commit_plan_found,terminal_reason,notes\n'
+ const CSV_HEADER = 'timestamp,iteration,phase,kind,status,transport,session_id,timed_out,exit_code,duration_seconds,commit_before,commit_after,repo_changed,changed_files_count,verification_status,retry_count,role,model,tool_calls,tool_errors,message_updates,stop_reason,loop_detected,loop_signature,tester_verdict,commit_plan_found,terminal_reason,risk_warnings,notes\n'
 
  function csvEscape(value) {
  const text = String(value ?? '')
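`risk_warnings` is added to `CSV_HEADER` between `terminal_reason` and `notes`, so `appendTelemetry` has to insert `event.riskWarnings` at the same position in each row, or every later column shifts by one. Only the first line of `csvEscape` is visible in this hunk. The sketch below assumes RFC 4180-style quoting and adds a hypothetical `buildCsvRow` guard that fails fast when header and row lengths disagree:

```javascript
// Sketch only: assumes RFC 4180-style quoting; the package's full csvEscape is not shown.
function csvEscape(value) {
  const text = String(value ?? '')
  // Quote fields containing a comma, quote, or line break, doubling embedded quotes.
  return /[",\r\n]/.test(text) ? `"${text.replaceAll('"', '""')}"` : text
}

// Hypothetical guard: throw if a row and the header have different column counts.
function buildCsvRow(headerLine, values) {
  const columnCount = headerLine.trim().split(',').length
  if (values.length !== columnCount) {
    throw new Error(`expected ${columnCount} values, got ${values.length}`)
  }
  return values.map(csvEscape).join(',')
}

// Tail of the new header: risk_warnings sits between terminal_reason and notes.
const header = 'terminal_reason,risk_warnings,notes\n'
const row = buildCsvRow(header, [
  'developer_fix_incomplete',
  'src/game.js:610>500',
  'developer_fix: retried once | tester_recheck: "ok", see log',
])
console.log(row)
// prints developer_fix_incomplete,src/game.js:610>500,"developer_fix: retried once | tester_recheck: ""ok"", see log"
```

Quoting matters here because the `notes` column joins agent notes with `' | '` and can contain commas or quotes copied from model output.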
@@ -56,6 +56,7 @@ export async function appendTelemetry(config, event) {
  event.testerVerdict,
  event.commitPlanFound,
  event.terminalReason,
+ event.riskWarnings,
  event.notes,
  ].map(csvEscape).join(',')
 
@@ -20,6 +20,9 @@ Rules:
  - Use the configured smoke verification path as the fast inner-loop gate. Do not replace it with a long full-flow Playwright spec unless the task explicitly requires it.
  - If a long Playwright happy-path spec changes, validate with smoke plus one narrow targeted spec or deterministic state hook, not the entire full-flow run.
  - Reserve long full-flow Playwright specs for an explicit nightly or post-run lane, not the developer turn.
+ - Use `read` for source inspection. Use shell only for `git`, tests, and narrow diagnostics.
+ - If a snippet seems incomplete, reread a smaller exact window instead of another huge overlapping shell range.
+ - Do not build edits from large `sed`/`grep` output or from memory after partial shell reads.
  - Trust tool output over your own guesses.
  - Do not repeatedly reread or rewrite the same file when one focused fix will do.
  - After one failed edit attempt, reread the file before retrying.
@@ -7,7 +7,7 @@ Your job:
  - review the developer's change from an independent user-facing perspective
  - add or improve focused verification where needed
  - verify actual functionality, not just plausibility
- - produce a commit plan when the work is truly ready
+ - create the final commit only when the work is truly ready
 
  Rules:
 
@@ -16,6 +16,9 @@ Rules:
  - Run the configured smoke verification command as the default inner-loop gate.
  - Do not run long full-flow Playwright happy-path specs in the tester turn unless the task explicitly requires them.
  - If a long spec changed, validate with smoke plus one narrow targeted spec or deterministic state setup instead of replaying the entire run.
+ - Use `read` for source inspection. Use shell only for `git`, tests, and narrow diagnostics.
+ - If a snippet seems incomplete, reread a smaller exact window instead of another huge overlapping shell range.
+ - Do not build edits from large `sed`/`grep` output or from memory after partial shell reads.
  - Treat player-facing dead ends, missing affordances, broken progression, console/runtime failures, and unusable UI as real failures.
  - If the task affects menus, unlocks, progression, classes, routes, shops, onboarding, or gating, verify a fresh-save path.
  - Do not hide product bugs by weakening tests.
@@ -23,7 +26,7 @@ Rules:
  - After one failed edit attempt, reread the file before retrying.
  - Do not repeat the same exact oldText-based edit on the same file.
  - If visual review is enabled, maintain the screenshot capture flow and manifest expected by the harness.
- - If the change passes, do not run `git add` or `git commit` yourself. Provide a commit plan for the harness instead.
+ - If the change passes, stage only the related files and create the commit yourself.
  - If the working tree cannot be isolated safely, return `VERDICT: BLOCKED`.
 
  Before stopping:
@@ -31,7 +34,7 @@ Before stopping:
  - include `Observed flow:`
  - include `Player-facing result:`
  - include `Regression check:`
+ - if passing, include `COMMIT_CREATED: true`
  - if passing, include `COMMIT_MESSAGE: ...`
- - if passing, include `COMMIT_FILES:`
- - if passing, include one `- path/to/file` line per file
+ - if passing, include `COMMIT_SHA: ...`
  - end with exactly one verdict line: `VERDICT: PASS`, `VERDICT: FAIL`, or `VERDICT: BLOCKED`
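The tester's closing report now names the commit it created (`COMMIT_CREATED`, `COMMIT_MESSAGE`, `COMMIT_SHA`) instead of listing files for the harness to commit. The harness code that parses this footer is not in this diff. A hypothetical parser, assuming one `KEY: value` field per line and a verdict on the last non-empty line as the prompt requires:

```javascript
// Hypothetical parser for the tester's closing report. Field names come from the
// TESTER prompt; the parsing rules are assumptions.
function parseTesterReport(output) {
  const lines = output.split(/\r?\n/).map((line) => line.trim()).filter(Boolean)
  // Return the value of the first "KEY: value" line, or null if absent.
  const field = (key) => {
    const line = lines.find((l) => l.startsWith(`${key}:`))
    return line ? line.slice(key.length + 1).trim() : null
  }
  // The prompt requires the verdict to be the final line.
  const last = lines.length > 0 ? lines[lines.length - 1] : ''
  const verdictMatch = /^VERDICT: (PASS|FAIL|BLOCKED)$/.exec(last)
  const verdict = verdictMatch ? verdictMatch[1] : null
  const commitCreated = field('COMMIT_CREATED') === 'true'
  const commitSha = field('COMMIT_SHA')
  return {
    verdict,
    commitCreated,
    commitMessage: field('COMMIT_MESSAGE'),
    commitSha,
    // Treat a PASS as committed only if both markers are present and the SHA looks like hex.
    committed: verdict === 'PASS' && commitCreated && /^[0-9a-f]{7,40}$/i.test(commitSha ?? ''),
  }
}

const report = parseTesterReport([
  'Observed flow: fresh save -> shop -> buy sword',
  'Player-facing result: sword appears in inventory',
  'Regression check: smoke passed',
  'COMMIT_CREATED: true',
  'COMMIT_MESSAGE: Fix shop purchase on fresh save',
  'COMMIT_SHA: 3f9c2ab',
  'VERDICT: PASS',
].join('\n'))
console.log(report.committed, report.commitSha)
// prints "true 3f9c2ab"
```

Requiring both `COMMIT_CREATED: true` and a hex-looking SHA before treating a PASS as committed is a conservative choice; a harness could additionally confirm the SHA against the repository with git.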
@@ -6,6 +6,8 @@
  "testerInstructionsFile": "pi/TESTER.md",
  "commitMode": "agent",
  "promptMode": "compact",
+ "largeFileWarningLines": 500,
+ "largeSpecWarningLines": 300,
  "piModel": "local/text-model",
  "models": {
  "local/text-model": {