prizmkit 1.1.70 → 1.1.74

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (57)
  1. package/bundled/VERSION.json +3 -3
  2. package/bundled/agents/prizm-dev-team-dev.md +11 -1
  3. package/bundled/dev-pipeline/lib/common.sh +427 -0
  4. package/bundled/dev-pipeline/lib/heartbeat.sh +101 -36
  5. package/bundled/dev-pipeline/run-feature.sh +109 -29
  6. package/bundled/dev-pipeline/scripts/parse-stream-progress.py +198 -3
  7. package/bundled/dev-pipeline/scripts/update-feature-status.py +27 -3
  8. package/bundled/dev-pipeline/templates/agent-prompts/dev-implement.md +21 -0
  9. package/bundled/dev-pipeline/templates/bootstrap-tier2.md +1 -1
  10. package/bundled/dev-pipeline/templates/bootstrap-tier3.md +5 -9
  11. package/bundled/dev-pipeline/templates/sections/feature-context.md +3 -18
  12. package/bundled/dev-pipeline/templates/sections/phase-commit-full.md +11 -0
  13. package/bundled/dev-pipeline/templates/sections/phase-commit.md +11 -0
  14. package/bundled/dev-pipeline/templates/sections/phase-context-snapshot-agent-suffix.md +1 -1
  15. package/bundled/dev-pipeline/templates/sections/phase-context-snapshot-base.md +6 -12
  16. package/bundled/dev-pipeline/templates/sections/phase-context-snapshot-lite-suffix.md +10 -3
  17. package/bundled/dev-pipeline/templates/sections/phase-implement-agent.md +1 -0
  18. package/bundled/dev-pipeline/templates/sections/phase-specify-plan-full.md +4 -8
  19. package/bundled/dev-pipeline-windows/lib/common.ps1 +61 -1
  20. package/bundled/dev-pipeline-windows/lib/pipeline.ps1 +325 -16
  21. package/bundled/dev-pipeline-windows/scripts/parse-stream-progress.py +198 -3
  22. package/bundled/dev-pipeline-windows/scripts/update-feature-status.py +27 -3
  23. package/bundled/dev-pipeline-windows/templates/agent-prompts/dev-implement.md +21 -0
  24. package/bundled/dev-pipeline-windows/templates/agent-prompts/reviewer-review.md +1 -1
  25. package/bundled/dev-pipeline-windows/templates/bootstrap-prompt.md +27 -0
  26. package/bundled/dev-pipeline-windows/templates/bootstrap-tier1.md +543 -14
  27. package/bundled/dev-pipeline-windows/templates/bootstrap-tier2.md +664 -14
  28. package/bundled/dev-pipeline-windows/templates/bootstrap-tier3.md +741 -14
  29. package/bundled/dev-pipeline-windows/templates/bugfix-bootstrap-prompt.md +2 -2
  30. package/bundled/dev-pipeline-windows/templates/feature-list-schema.json +1 -1
  31. package/bundled/dev-pipeline-windows/templates/refactor-bootstrap-prompt.md +1 -1
  32. package/bundled/dev-pipeline-windows/templates/refactor-list-schema.json +1 -1
  33. package/bundled/dev-pipeline-windows/templates/sections/context-budget-rules.md +3 -3
  34. package/bundled/dev-pipeline-windows/templates/sections/failure-capture.md +1 -1
  35. package/bundled/dev-pipeline-windows/templates/sections/feature-context.md +3 -18
  36. package/bundled/dev-pipeline-windows/templates/sections/phase-browser-verification-auto.md +239 -40
  37. package/bundled/dev-pipeline-windows/templates/sections/phase-browser-verification-opencli.md +75 -26
  38. package/bundled/dev-pipeline-windows/templates/sections/phase-browser-verification.md +142 -36
  39. package/bundled/dev-pipeline-windows/templates/sections/phase-commit-full.md +13 -2
  40. package/bundled/dev-pipeline-windows/templates/sections/phase-commit.md +12 -1
  41. package/bundled/dev-pipeline-windows/templates/sections/phase-context-snapshot-agent-suffix.md +1 -1
  42. package/bundled/dev-pipeline-windows/templates/sections/phase-context-snapshot-base.md +7 -17
  43. package/bundled/dev-pipeline-windows/templates/sections/phase-context-snapshot-lite-suffix.md +10 -3
  44. package/bundled/dev-pipeline-windows/templates/sections/phase-critic-plan-full.md +1 -1
  45. package/bundled/dev-pipeline-windows/templates/sections/phase-critic-plan.md +1 -1
  46. package/bundled/dev-pipeline-windows/templates/sections/phase-implement-agent.md +3 -1
  47. package/bundled/dev-pipeline-windows/templates/sections/phase-implement-full.md +7 -3
  48. package/bundled/dev-pipeline-windows/templates/sections/phase-implement-lite.md +1 -3
  49. package/bundled/dev-pipeline-windows/templates/sections/phase-plan-agent.md +1 -1
  50. package/bundled/dev-pipeline-windows/templates/sections/phase-plan-lite.md +1 -1
  51. package/bundled/dev-pipeline-windows/templates/sections/phase-review-agent.md +1 -1
  52. package/bundled/dev-pipeline-windows/templates/sections/phase-review-full.md +2 -2
  53. package/bundled/dev-pipeline-windows/templates/sections/phase-specify-plan-full.md +13 -17
  54. package/bundled/dev-pipeline-windows/templates/sections/phase0-test-baseline.md +2 -4
  55. package/bundled/dev-pipeline-windows/templates/sections/subagent-timeout-recovery.md +1 -1
  56. package/bundled/skills/_metadata.json +1 -1
  57. package/package.json +1 -1
@@ -77,13 +77,16 @@ FEATURE_LIST=""
77
77
  # Branch tracking (for cleanup on interrupt)
78
78
  _ORIGINAL_BRANCH=""
79
79
  _DEV_BRANCH_NAME=""
80
+ _SPAWN_FEATURE_SLUG=""
81
+ _SPAWN_EXIT_CODE=0
80
82
 
81
83
  # ============================================================
82
84
  # Shared: Spawn an AI CLI session and wait for result
83
85
  # ============================================================
84
86
 
85
87
  # Spawns an AI CLI session with heartbeat + timeout, waits for completion,
86
- # checks session status, and updates feature status.
88
+ # and checks session status. Canonical status updates happen after the caller
89
+ # returns to the original branch.
87
90
  #
88
91
  # Arguments:
89
92
  # $1 - feature_id
@@ -105,6 +108,9 @@ spawn_and_wait_session() {
105
108
  local session_log="$session_dir/logs/session.log"
106
109
  local progress_json="$session_dir/logs/progress.json"
107
110
 
111
+ _SPAWN_FEATURE_SLUG=""
112
+ _SPAWN_EXIT_CODE=0
113
+
108
114
  local effective_model="${feature_model:-$MODEL}"
109
115
  local cbc_pid
110
116
  prizm_start_ai_session "$bootstrap_prompt" "$session_log" "$effective_model"
@@ -144,6 +150,7 @@ spawn_and_wait_session() {
144
150
  if [[ $exit_code -eq 143 ]]; then
145
151
  exit_code=124
146
152
  fi
153
+ _SPAWN_EXIT_CODE="$exit_code"
147
154
 
148
155
  # Check for stale-kill marker (heartbeat killed the process due to no progress)
149
156
  local stale_kill_marker="$session_dir/logs/stale-kill.json"
@@ -174,7 +181,28 @@ spawn_and_wait_session() {
174
181
  project_root="$PROJECT_ROOT"
175
182
  local default_branch="$base_branch"
176
183
 
177
- if [[ $exit_code -eq 124 ]]; then
184
+ local semantic_finalized=false
185
+ local semantic_feature_slug=""
186
+ local semantic_commit_sha=""
187
+ local was_ai_runtime_error=false
188
+ if prizm_detect_ai_runtime_error "$session_log" "$progress_json"; then
189
+ was_ai_runtime_error=true
190
+ fi
191
+
192
+ if prizm_feature_semantically_complete "$feature_list" "$feature_id" "$project_root" "$default_branch" "$PRIZMKIT_DIR"; then
193
+ semantic_finalized=true
194
+ semantic_feature_slug="$PRIZM_SEMANTIC_FEATURE_SLUG"
195
+ semantic_commit_sha="$PRIZM_SEMANTIC_COMMIT_SHA"
196
+ if [[ $exit_code -ne 0 || "$was_stale_killed" == true || "$was_ai_runtime_error" == true ]]; then
197
+ log_warn "Session ended with a failure signal after semantic completion; treating as finalized success"
198
+ log_warn "Semantic completion commit: ${semantic_commit_sha:-unknown}"
199
+ fi
200
+ session_status="success"
201
+ elif [[ "$was_ai_runtime_error" == true ]]; then
202
+ log_warn "Session failed due to structured AI runtime/context error"
203
+ log_warn "AI runtime errors are retried without consuming code retry budget"
204
+ session_status="infra_error"
205
+ elif [[ $exit_code -eq 124 ]]; then
178
206
  log_warn "Session timed out after ${SESSION_TIMEOUT}s"
179
207
  session_status="timed_out"
180
208
  elif [[ "$was_infra_error" == true ]]; then
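The branch order in this hunk amounts to a precedence rule: semantic completion overrides every failure signal, a structured AI runtime error is checked next, and only then is exit code 124 read as a timeout. A minimal Python sketch of that ordering follows. `classify_session` and its parameter names are hypothetical, and the branches after the timeout check are outside this hunk, so they are collapsed into a placeholder value.

```python
def classify_session(exit_code, semantically_complete, ai_runtime_error):
    """Mirror the if/elif order of the hunk above (hypothetical helper)."""
    if semantically_complete:
        # A finalized feature commit exists, so later failure signals are ignored.
        return "success"
    if ai_runtime_error:
        # Retried without consuming the code retry budget.
        return "infra_error"
    if exit_code == 124:
        return "timed_out"
    # Remaining branches (infra error and others) are not shown in this hunk.
    return "other"


print(classify_session(124, True, True))    # success
print(classify_session(124, False, True))   # infra_error
print(classify_session(124, False, False))  # timed_out
```

The first call shows the notable case: a session that timed out and reported a runtime error is still treated as successful once the feature is semantically complete.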
@@ -222,15 +250,31 @@ spawn_and_wait_session() {
222
250
  # ── Post-success validation ──────────────────────────────────────────
223
251
  if [[ "$session_status" == "success" ]]; then
224
252
  if git -C "$project_root" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
225
- # Auto-commit any remaining dirty files produced during the session
226
253
  local dirty_files=""
227
254
  dirty_files=$(git -C "$project_root" status --porcelain 2>/dev/null || true)
228
255
  if [[ -n "$dirty_files" ]]; then
229
- log_info "Auto-committing remaining session artifacts..."
230
- git -C "$project_root" add -A 2>/dev/null || true
231
- git -C "$project_root" commit --no-verify --amend --no-edit 2>/dev/null \
232
- || git -C "$project_root" commit --no-verify -m "chore($feature_id): include remaining session artifacts" 2>/dev/null \
233
- || true
256
+ if [[ "$semantic_finalized" == true ]]; then
257
+ local post_completion_slug="$semantic_feature_slug"
258
+ if [[ -z "$post_completion_slug" ]]; then
259
+ post_completion_slug=$(prizm_feature_slug_from_list "$feature_list" "$feature_id" 2>/dev/null || true)
260
+ fi
261
+ if [[ -n "$post_completion_slug" ]] && prizm_preserve_post_completion_dirty "$project_root" "$PRIZMKIT_DIR/specs/${post_completion_slug}" "$feature_id" "$session_id"; then
262
+ log_warn "Post-completion dirty changes preserved under $PRIZMKIT_DIR/specs/${post_completion_slug}/"
263
+ log_warn "They were not included in the finalized feature commit."
264
+ else
265
+ log_warn "Could not safely preserve post-completion dirty changes; preserving dev branch for manual finalization"
266
+ session_status="finalization_needed"
267
+ fi
268
+ else
269
+ # Auto-commit any remaining dirty files produced during a normal
270
+ # clean success path. Semantic finalization explicitly avoids this
271
+ # so delayed post-commit findings cannot be merged into main.
272
+ log_info "Auto-committing remaining session artifacts..."
273
+ git -C "$project_root" add -A 2>/dev/null || true
274
+ git -C "$project_root" commit --no-verify --amend --no-edit 2>/dev/null \
275
+ || git -C "$project_root" commit --no-verify -m "chore($feature_id): include remaining session artifacts" 2>/dev/null \
276
+ || true
277
+ fi
234
278
  fi
235
279
  fi
236
280
  fi
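The post-success handling above can be read as a small decision table over three facts: whether the work tree is dirty, whether the feature was semantically finalized, and whether the leftover changes could be preserved. The Python sketch below restates that table. The function name, the action labels, and the boolean inputs are illustrative assumptions, not names from the package.

```python
def post_success_action(dirty, semantic_finalized, preserved_ok):
    """Return (action, session_status) for a session that ended as 'success'."""
    if not dirty:
        return ("none", "success")
    if not semantic_finalized:
        # Normal clean success: leftovers are folded into the feature commit.
        return ("auto_commit", "success")
    if preserved_ok:
        # Leftovers are saved under the spec directory, outside the commit.
        return ("preserve", "success")
    # Leftovers could not be preserved safely: keep the dev branch for a human.
    return ("keep_branch", "finalization_needed")


for case in [(False, False, False), (True, False, False),
             (True, True, True), (True, True, False)]:
    print(case, "->", post_success_action(*case))
```

Only the last combination changes the session status, which matches the hunk: auto-commit is reserved for the non-semantic path so that late changes are never merged into the finalized commit.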
@@ -242,7 +286,10 @@ spawn_and_wait_session() {
242
286
 
243
287
  # Write lightweight session summary for post-session inspection
244
288
  local feature_slug
245
- feature_slug=$(python3 -c "
289
+ if [[ -n "$semantic_feature_slug" ]]; then
290
+ feature_slug="$semantic_feature_slug"
291
+ else
292
+ feature_slug=$(python3 -c "
246
293
  import json, re, sys
247
294
  flist, fid = sys.argv[1], sys.argv[2]
248
295
  with open(flist) as f:
@@ -258,9 +305,11 @@ for feat in data.get('features', []):
258
305
  sys.exit(0)
259
306
  sys.exit(1)
260
307
  " "$feature_list" "$feature_id" 2>/dev/null) || {
261
- log_warn "Could not resolve feature slug for $feature_id — session summary and artifact validation will be skipped"
262
- feature_slug=""
263
- }
308
+ log_warn "Could not resolve feature slug for $feature_id — session summary and artifact validation will be skipped"
309
+ feature_slug=""
310
+ }
311
+ fi
312
+ _SPAWN_FEATURE_SLUG="$feature_slug"
264
313
 
265
314
  # Validate key artifacts exist after successful session
266
315
  if [[ "$session_status" == "success" && -n "$feature_slug" ]]; then
@@ -315,16 +364,6 @@ sys.exit(0)
315
364
  fi
316
365
  fi
317
366
 
318
- # Check if session produced a failure-log for future retries
319
- if [[ "$session_status" != "success" && -n "$feature_slug" ]]; then
320
- local failure_log="$PRIZMKIT_DIR/specs/${feature_slug}/failure-log.md"
321
- if [[ -f "$failure_log" ]]; then
322
- log_info "FAILURE_LOG: Session wrote failure-log.md — will be available to next retry"
323
- else
324
- log_info "FAILURE_LOG: No failure-log.md written by session"
325
- fi
326
- fi
327
-
328
367
  # Propagate completion notes for dependency context (only on success)
329
368
  if [[ "$session_status" == "success" && -n "$feature_slug" ]]; then
330
369
  local summary_path="$PRIZMKIT_DIR/specs/$feature_slug/completion-summary.json"
@@ -342,7 +381,45 @@ sys.exit(0)
342
381
  fi
343
382
  fi
344
383
 
345
- # Update feature status (do NOT commit on dev branch — commit happens after merge)
384
+ # Return status via global variable (avoids $() swallowing stdout)
385
+ _SPAWN_RESULT="$session_status"
386
+ }
387
+
388
+ finalize_feature_status_after_branch_return() {
389
+ local feature_id="$1"
390
+ local feature_list="$2"
391
+ local session_id="$3"
392
+ local session_status="$4"
393
+ local max_retries="$5"
394
+ local session_dir="$6"
395
+ local base_branch="${7:-main}"
396
+
397
+ local feature_slug="${_SPAWN_FEATURE_SLUG:-}"
398
+ local progress_json="$session_dir/logs/progress.json"
399
+ local stale_kill_marker="$session_dir/logs/stale-kill.json"
400
+ local exit_code="${_SPAWN_EXIT_CODE:-0}"
401
+
402
+ # Check if session produced a failure-log for future retries; synthesize one
403
+ # after branch return so canonical diagnostics live on the original branch.
404
+ if [[ "$session_status" != "success" && -n "$feature_slug" ]]; then
405
+ local failure_log="$PRIZMKIT_DIR/specs/${feature_slug}/failure-log.md"
406
+ local checkpoint_file_for_failure="$PRIZMKIT_DIR/specs/${feature_slug}/workflow-checkpoint.json"
407
+ if [[ -f "$failure_log" ]]; then
408
+ log_info "FAILURE_LOG: Session wrote failure-log.md — will be available to next retry"
409
+ else
410
+ prizm_synthesize_failure_log \
411
+ "$failure_log" "$feature_id" "$session_id" "$session_status" "$exit_code" \
412
+ "$stale_kill_marker" "$progress_json" "$checkpoint_file_for_failure" "$PROJECT_ROOT" "$base_branch"
413
+ if [[ -f "$failure_log" ]]; then
414
+ log_info "FAILURE_LOG: Runtime synthesized failure-log.md — will be available to next retry"
415
+ else
416
+ log_info "FAILURE_LOG: No failure-log.md written by session"
417
+ fi
418
+ fi
419
+ fi
420
+
421
+ # Update feature status on the original branch. The caller commits the
422
+ # resulting feature-list diff immediately after this helper returns.
346
423
  local update_output
347
424
  update_output=$(python3 "$SCRIPTS_DIR/update-feature-status.py" \
348
425
  --feature-list "$feature_list" \
@@ -357,9 +434,6 @@ sys.exit(0)
357
434
  }
358
435
 
359
436
  _SPAWN_ITEM_STATUS="$(printf '%s' "$update_output" | prizm_extract_update_new_status)"
360
-
361
- # Return status via global variable (avoids $() swallowing stdout)
362
- _SPAWN_RESULT="$session_status"
363
437
  }
364
438
 
365
439
  # ============================================================
@@ -896,7 +970,7 @@ else:
896
970
  else
897
971
  log_warn "Auto-merge failed — dev branch preserved: $_DEV_BRANCH_NAME"
898
972
  log_warn "Merge manually: git checkout $_ORIGINAL_BRANCH && git rebase $_DEV_BRANCH_NAME"
899
- _DEV_BRANCH_NAME=""
973
+ session_status="merge_conflict"
900
974
  fi
901
975
  elif [[ -n "$_DEV_BRANCH_NAME" ]]; then
902
976
  # Session failed — preserve dev branch for inspection
@@ -907,6 +981,9 @@ else:
907
981
  # GUARANTEED: always return to original branch regardless of success/failure/merge outcome
908
982
  branch_ensure_return "$_proj_root" "$_ORIGINAL_BRANCH"
909
983
 
984
+ finalize_feature_status_after_branch_return \
985
+ "$feature_id" "$feature_list" "$session_id" "$session_status" 999 "$session_dir" "$_ORIGINAL_BRANCH"
986
+
910
987
  # Commit feature status update on the original branch (after guaranteed return)
911
988
  if ! git -C "$_proj_root" diff --quiet "$feature_list" 2>/dev/null; then
912
989
  git -C "$_proj_root" add "$feature_list"
@@ -1318,7 +1395,6 @@ DEPLOY_PROMPT_EOF
1318
1395
  "$feature_id" "$feature_list" "$session_id" \
1319
1396
  "$bootstrap_prompt" "$session_dir" "$MAX_RETRIES" "$feature_model" "$_ORIGINAL_BRANCH"
1320
1397
  local session_status="$_SPAWN_RESULT"
1321
- local item_status_after_session="${_SPAWN_ITEM_STATUS:-}"
1322
1398
 
1323
1399
  # Merge per-feature dev branch back to original on success
1324
1400
  if [[ "$session_status" == "success" && -n "$_DEV_BRANCH_NAME" ]]; then
@@ -1327,7 +1403,7 @@ DEPLOY_PROMPT_EOF
1327
1403
  else
1328
1404
  log_warn "Auto-merge failed — dev branch preserved: $_DEV_BRANCH_NAME"
1329
1405
  log_warn "Merge manually: git checkout $_ORIGINAL_BRANCH && git rebase $_DEV_BRANCH_NAME"
1330
- _DEV_BRANCH_NAME=""
1406
+ session_status="merge_conflict"
1331
1407
  fi
1332
1408
  elif [[ -n "$_DEV_BRANCH_NAME" ]]; then
1333
1409
  # Session failed — preserve dev branch for inspection
@@ -1338,6 +1414,10 @@ DEPLOY_PROMPT_EOF
1338
1414
  # GUARANTEED: always return to original branch regardless of success/failure/merge outcome
1339
1415
  branch_ensure_return "$_proj_root" "$_ORIGINAL_BRANCH"
1340
1416
 
1417
+ finalize_feature_status_after_branch_return \
1418
+ "$feature_id" "$feature_list" "$session_id" "$session_status" "$MAX_RETRIES" "$session_dir" "$_ORIGINAL_BRANCH"
1419
+ local item_status_after_session="${_SPAWN_ITEM_STATUS:-}"
1420
+
1341
1421
  # Commit feature status update on the original branch (after guaranteed return)
1342
1422
  if ! git -C "$_proj_root" diff --quiet "$feature_list" 2>/dev/null; then
1343
1423
  git -C "$_proj_root" add "$feature_list"
@@ -17,6 +17,7 @@ The script runs until:
17
17
  import argparse
18
18
  import json
19
19
  import os
20
+ import re
20
21
  import signal
21
22
  import sys
22
23
  import tempfile
@@ -59,6 +60,58 @@ PHASE_KEYWORDS = {
59
60
  },
60
61
  }
61
62
 
63
+ CONTEXT_ERROR_PATTERNS = [
64
+ re.compile(pattern, re.IGNORECASE)
65
+ for pattern in (
66
+ r"context_too_large",
67
+ r"model_context_window_exceeded",
68
+ r"Your input exceeds the context window",
69
+ r"input exceeds the context window",
70
+ r"context window of this model",
71
+ r"context window exceeded",
72
+ r"invalid_request_error.*context window",
73
+ r"context window.*invalid_request_error",
74
+ )
75
+ ]
76
+
77
+ ERROR_CONTEXT_PATTERNS = [
78
+ re.compile(pattern, re.IGNORECASE)
79
+ for pattern in (
80
+ r"\bapi error\b",
81
+ r"invalid_request_error",
82
+ r"\bstatus\s*[:=]?\s*(400|413)\b",
83
+ r"\bapi_error_status\b",
84
+ r"\bapi_error_code\b",
85
+ r"\blast_result_is_error\b\s*[\"':=]*\s*true\b",
86
+ r"\bis_error\b\s*[\"':=]*\s*true\b",
87
+ )
88
+ ]
89
+
90
+
91
+ def _has_error_context(text):
92
+ """Return true when free text looks like a runtime/provider error."""
93
+ if not text:
94
+ return False
95
+ return any(pattern.search(text) for pattern in ERROR_CONTEXT_PATTERNS)
96
+
97
+
98
+ def detect_api_error_code(text, require_error_context=False):
99
+ """Return a normalized fatal/runtime error code from terminal text.
100
+
101
+ Structured terminal result/error events and raw stderr can be matched
102
+ directly. Ordinary assistant prose is noisier: it may mention the phrase
103
+ "input exceeds the context window" while explaining a test or recovery
104
+ rule, so callers can require additional error-like context there.
105
+ """
106
+ if not text:
107
+ return ""
108
+ if require_error_context and not _has_error_context(text):
109
+ return ""
110
+ for pattern in CONTEXT_ERROR_PATTERNS:
111
+ if pattern.search(text):
112
+ return "context_too_large"
113
+ return ""
114
+
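To illustrate why `require_error_context` exists, here is a trimmed, self-contained copy of the detection logic using only a subset of the patterns added above. The two sample strings are made up for the demonstration; the function body follows the diff, with `_has_error_context` inlined.

```python
import re

# Subset of the patterns added in this hunk.
CONTEXT_ERROR_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"context_too_large", r"input exceeds the context window")
]
ERROR_CONTEXT_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"\bapi error\b", r"invalid_request_error",
              r"\bstatus\s*[:=]?\s*(400|413)\b")
]


def detect_api_error_code(text, require_error_context=False):
    """Return 'context_too_large' or '' for a piece of terminal text."""
    if not text:
        return ""
    if require_error_context and not any(
        p.search(text) for p in ERROR_CONTEXT_PATTERNS
    ):
        return ""
    if any(p.search(text) for p in CONTEXT_ERROR_PATTERNS):
        return "context_too_large"
    return ""


# Hypothetical inputs: assistant prose versus a provider error line.
prose = "The test covers the case where input exceeds the context window."
stderr = "API Error: 400 Your input exceeds the context window"

print(repr(detect_api_error_code(prose)))                              # 'context_too_large'
print(repr(detect_api_error_code(prose, require_error_context=True)))  # ''
print(repr(detect_api_error_code(stderr, require_error_context=True))) # 'context_too_large'
```

Without the extra gate, ordinary prose that merely mentions the phrase would be classified as fatal; with it, only text that also carries an error marker is.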
62
115
 
63
116
  class ProgressTracker:
64
117
  """Tracks progress state from stream-json events."""
@@ -73,6 +126,12 @@ class ProgressTracker:
73
126
  self.tool_call_counts = Counter()
74
127
  self.total_tool_calls = 0
75
128
  self.last_text_snippet = ""
129
+ self.last_result_is_error = False
130
+ self.api_error_status = None
131
+ self.api_error_code = ""
132
+ self.terminal_result_text = ""
133
+ self.terminal_success_at = ""
134
+ self.fatal_error_code = ""
76
135
  self.is_active = True
77
136
  self.errors = []
78
137
  self.event_format = ""
@@ -164,11 +223,13 @@ class ProgressTracker:
164
223
  elif event_type == "turn.failed":
165
224
  error = event.get("error") or event.get("message") or "Codex turn failed"
166
225
  self.errors.append(str(error))
226
+ self._detect_terminal_error(str(error))
167
227
  self.current_tool = None
168
228
 
169
229
  elif event_type == "error":
170
230
  error = event.get("error") or event.get("message") or "Unknown error"
171
231
  self.errors.append(str(error))
232
+ self._detect_terminal_error(str(error))
172
233
 
173
234
  return
174
235
 
@@ -196,12 +257,51 @@ class ProgressTracker:
196
257
  if text.strip():
197
258
  self.last_text_snippet = text.strip()[:120]
198
259
  self._detect_phase(text)
260
+ self._detect_terminal_error(text, require_error_context=True)
199
261
 
200
262
  elif event_type == "tool_result" or event_type == "user":
201
263
  # tool_result contains output from tool execution
202
264
  self.event_format = self.event_format or "stream-json"
203
265
  self.is_active = True
204
266
 
267
+ # Check for error patterns in tool_result content (supports both formats):
268
+ # A) Top-level tool_result events: event["content"] is the result text
269
+ # B) Nested user events: event["message"]["content"][] has type=="tool_result" items
270
+ content_candidates = []
271
+
272
+ # Format A: top-level tool_result
273
+ if event_type == "tool_result":
274
+ content_candidates.append(str(event.get("content", "")))
275
+
276
+ # Format B: nested inside user event
277
+ if event_type == "user":
278
+ message = event.get("message", {})
279
+ content_list = message.get("content", [])
280
+ if isinstance(content_list, list):
281
+ for item in content_list:
282
+ if isinstance(item, dict) and item.get("type") == "tool_result":
283
+ content_candidates.append(str(item.get("content", "")))
284
+
285
+ for result_text in content_candidates:
286
+ if "shorter than the provided offset" in result_text:
287
+ self.errors.append({
288
+ "type": "read_offset_overflow",
289
+ "tool": self.current_tool,
290
+ "at": datetime.now(timezone.utc).isoformat(),
291
+ })
292
+ break # one error per event is enough
293
+ elif "Wasted call" in result_text:
294
+ self.errors.append({
295
+ "type": "wasted_call",
296
+ "tool": self.current_tool,
297
+ "at": datetime.now(timezone.utc).isoformat(),
298
+ })
299
+ break
300
+
301
+ # Keep only last 20 errors to prevent unbounded growth in progress.json
302
+ if len(self.errors) > 20:
303
+ self.errors = self.errors[-20:]
304
+
205
305
  elif event_type == "system":
206
306
  # System events (hooks, init, task notifications, etc.) — track but don't count as messages.
207
307
  self.event_format = self.event_format or "stream-json"
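The two event shapes handled above (a top-level `tool_result`, and `tool_result` items nested inside a `user` message) can be exercised in isolation. The sketch below assumes the event type is stored under a `"type"` key, and the sample events are invented for illustration. `tool_result_texts` and `classify` are hypothetical helper names that split the inline logic from the diff into two steps.

```python
def tool_result_texts(event):
    """Collect tool_result payloads from either event shape."""
    texts = []
    if event.get("type") == "tool_result":
        # Format A: the content sits directly on the event.
        texts.append(str(event.get("content", "")))
    elif event.get("type") == "user":
        # Format B: tool_result items nested in message.content.
        content = (event.get("message") or {}).get("content", [])
        if isinstance(content, list):
            for item in content:
                if isinstance(item, dict) and item.get("type") == "tool_result":
                    texts.append(str(item.get("content", "")))
    return texts


def classify(texts):
    """Return the first matching error type, or '' when nothing matches."""
    for text in texts:
        if "shorter than the provided offset" in text:
            return "read_offset_overflow"
        if "Wasted call" in text:
            return "wasted_call"
    return ""


event_a = {"type": "tool_result", "content": "Wasted call: file unchanged"}
event_b = {
    "type": "user",
    "message": {"content": [
        {"type": "text", "text": "ignored"},
        {"type": "tool_result",
         "content": "File is shorter than the provided offset"},
    ]},
}

print(classify(tool_result_texts(event_a)))  # wasted_call
print(classify(tool_result_texts(event_b)))  # read_offset_overflow
```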
@@ -274,6 +374,28 @@ class ProgressTracker:
274
374
  state.setdefault("subagent_type", "")
275
375
  self._update_claude_subagent_status_counts()
276
376
 
377
+ elif event_type == "result":
378
+ self.event_format = self.event_format or "stream-json"
379
+ self.is_active = False
380
+ result_text = event.get("result") or event.get("message") or ""
381
+ error_obj = event.get("error")
382
+ if isinstance(error_obj, dict):
383
+ error_text = " ".join(
384
+ str(error_obj.get(key) or "")
385
+ for key in ("type", "code", "message")
386
+ if error_obj.get(key)
387
+ )
388
+ result_text = " ".join(part for part in (str(result_text), error_text) if part)
389
+ api_error_code = event.get("api_error_code") or event.get("error_code") or ""
390
+ if isinstance(error_obj, dict) and not api_error_code:
391
+ api_error_code = error_obj.get("code") or error_obj.get("type") or ""
392
+ self._record_terminal_result(
393
+ text=str(result_text or ""),
394
+ is_error=bool(event.get("is_error")),
395
+ api_error_status=event.get("api_error_status"),
396
+ api_error_code=str(api_error_code or ""),
397
+ )
398
+
277
399
  # ── Claude API raw stream format ────────────────────────────
278
400
  elif event_type == "message_start":
279
401
  self.event_format = self.event_format or "stream-json"
@@ -316,6 +438,7 @@ class ProgressTracker:
316
438
  self.last_text_snippet = stripped[:120]
317
439
  # Try to detect phase from text
318
440
  self._detect_phase(text)
441
+ self._detect_terminal_error(text, require_error_context=True)
319
442
 
320
443
  elif delta_type == "input_json_delta":
321
444
  partial = delta.get("partial_json", "")
@@ -331,21 +454,73 @@ class ProgressTracker:
331
454
  self._extract_tool_summary(full_input)
332
455
  self._detect_phase(full_input)
333
456
  else:
334
- # Text block finished - detect phase from accumulated text
457
+ # Text block finished - detect phase and terminal errors from accumulated text
335
458
  if self._text_buffer:
336
459
  self._detect_phase(self._text_buffer)
460
+ self._detect_terminal_error(
461
+ self._text_buffer,
462
+ require_error_context=True,
463
+ )
337
464
  self._in_tool_use = False
338
465
  self._current_tool_input_parts = []
339
466
 
340
467
  elif event_type == "error":
341
468
  error_msg = event.get("error", {}).get("message", "Unknown error")
342
469
  self.errors.append(error_msg)
470
+ self._detect_terminal_error(str(error_msg))
343
471
 
344
472
  # Check for subagent indicator
345
473
  if event.get("parent_tool_use_id"):
346
474
  # This is a sub-agent event; tool name is still tracked normally
347
475
  pass
348
476
 
477
+ def _record_terminal_result(self, text="", is_error=False, api_error_status=None, api_error_code=""):
478
+ """Record a Claude Code terminal result event."""
479
+ terminal_text = str(text or "")
480
+ self.last_result_is_error = bool(is_error)
481
+ if api_error_status not in (None, ""):
482
+ try:
483
+ self.api_error_status = int(api_error_status)
484
+ except (TypeError, ValueError):
485
+ self.api_error_status = api_error_status
486
+ error_like_result = (
487
+ self.last_result_is_error
488
+ or api_error_status not in (None, "")
489
+ or bool(api_error_code)
490
+ or _has_error_context(terminal_text)
491
+ )
492
+ normalized_code = detect_api_error_code(
493
+ " ".join([str(api_error_code or ""), terminal_text]),
494
+ require_error_context=not error_like_result,
495
+ )
496
+ if normalized_code:
497
+ self.api_error_code = normalized_code
498
+ self.fatal_error_code = normalized_code
499
+ elif api_error_code:
500
+ self.api_error_code = str(api_error_code)
501
+ self.terminal_result_text = terminal_text[:1000]
502
+ if terminal_text.strip():
503
+ self.last_text_snippet = terminal_text.strip()[:120]
504
+ if not self.last_result_is_error and not self.fatal_error_code:
505
+ self.terminal_success_at = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
506
+ elif self.fatal_error_code:
507
+ self.errors.append(self.fatal_error_code)
508
+
509
+ def _detect_terminal_error(self, text, require_error_context=False):
510
+ """Detect fatal context-window errors from unstructured text."""
511
+ code = detect_api_error_code(
512
+ str(text or ""),
513
+ require_error_context=require_error_context,
514
+ )
515
+ if not code:
516
+ return
517
+ self.last_result_is_error = True
518
+ self.api_error_code = code
519
+ self.fatal_error_code = code
520
+ self.terminal_result_text = str(text or "")[:1000]
521
+ if text:
522
+ self.last_text_snippet = str(text).strip()[:120]
523
+
349
524
  def _detect_phase(self, text):
350
525
  """Detect pipeline phase from text content.
351
526
 
@@ -692,6 +867,12 @@ class ProgressTracker:
692
867
  "child_activity_signature": self.child_activity_signature,
693
868
  "last_child_activity_at": self.last_child_activity_at,
694
869
  "last_text_snippet": self.last_text_snippet,
870
+ "last_result_is_error": self.last_result_is_error,
871
+ "api_error_status": self.api_error_status,
872
+ "api_error_code": self.api_error_code,
873
+ "terminal_result_text": self.terminal_result_text,
874
+ "terminal_success_at": self.terminal_success_at,
875
+ "fatal_error_code": self.fatal_error_code,
695
876
  "is_active": self.is_active,
696
877
  "errors": self.errors[-10:], # Keep last 10 errors
697
878
  }
@@ -728,6 +909,12 @@ def tail_and_parse(session_log, progress_file, poll_interval=0.5):
728
909
  state["current_phase"],
729
910
  state["total_tool_calls"],
730
911
  state.get("child_activity_signature", ""),
912
+ state.get("last_result_is_error"),
913
+ state.get("api_error_status"),
914
+ state.get("api_error_code", ""),
915
+ state.get("fatal_error_code", ""),
916
+ state.get("terminal_result_text", ""),
917
+ tuple(state.get("errors", [])),
731
918
  )
732
919
 
733
920
  # Wait for log file to appear
@@ -752,11 +939,19 @@ def tail_and_parse(session_log, progress_file, poll_interval=0.5):
752
939
  event = json.loads(line)
753
940
  tracker.process_event(event)
754
941
  except json.JSONDecodeError:
755
- # Not a JSON line (could be stderr mixed in)
756
- # Use it as a text snippet if meaningful
942
+ # Not a JSON line (could be stderr mixed in). Use it as a
943
+ # text snippet and only treat it as terminal when it has a
944
+ # strong API/runtime error marker; ordinary assistant prose
945
+ # can discuss context limits without being fatal.
757
946
  stripped = line.strip()
758
947
  if stripped and len(stripped) > 5:
759
948
  tracker.last_text_snippet = stripped[:120]
949
+ tracker._detect_terminal_error(stripped, require_error_context=True)
950
+ current_state = tracker.to_dict()
951
+ current_state_key = state_key(current_state)
952
+ if current_state_key != last_write_state:
953
+ atomic_write_json(current_state, progress_file)
954
+ last_write_state = current_state_key
760
955
  continue
761
956
 
762
957
  # Write progress if state changed
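The hunk above calls `atomic_write_json` only when the state key changes, but the helper's body is not part of this diff. One common way to implement such a helper is to write to a temporary file in the same directory and rename it over the target. The sketch below assumes that approach and the `(data, path)` argument order seen at the call site. `write_if_changed` is a hypothetical wrapper around the change check.

```python
import json
import os
import tempfile


def atomic_write_json(data, path):
    """Write JSON to a temp file beside `path`, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(data, handle)
        os.replace(tmp_path, path)  # readers never see a partial file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise


def write_if_changed(state, path, last_key, key_fn):
    """Write only when the derived key differs; return the current key."""
    key = key_fn(state)
    if key != last_key:
        atomic_write_json(state, path)
    return key


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "progress.json")
    key = write_if_changed({"fatal_error_code": ""}, target, None,
                           lambda s: s["fatal_error_code"])
    key = write_if_changed({"fatal_error_code": "context_too_large"}, target,
                           key, lambda s: s["fatal_error_code"])
    with open(target) as handle:
        print(json.load(handle))
```

Including the new error fields in the key, as the diff does, is what lets a freshly detected fatal error trigger a write even when phase and tool counts are unchanged.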
@@ -49,6 +49,7 @@ SESSION_STATUS_VALUES = [
49
49
  "commit_missing",
50
50
  "docs_missing",
51
51
  "merge_conflict",
52
+ "finalization_needed",
52
53
  ]
53
54
 
54
55
  TERMINAL_STATUSES = {"completed", "failed", "skipped", "auto_skipped", "split"}
@@ -644,7 +645,25 @@ def action_update(args, feature_list_path, state_dir):
644
645
  fs["degraded_reason"] = session_status
645
646
  fs["resume_from_phase"] = None
646
647
  fs["sessions"] = []
647
- fs["last_session_id"] = None
648
+ if session_id:
649
+ fs["last_session_id"] = session_id
650
+ fs["last_failed_session_id"] = session_id
651
+
652
+ err = update_feature_in_list(feature_list_path, feature_id, new_status)
653
+ if err:
654
+ error_out("Failed to update .prizmkit/plans/feature-list.json: {}".format(err))
655
+ return
656
+ elif session_status == "finalization_needed":
657
+ # Runtime preserved dirty post-completion changes but could not safely
658
+ # clean them for automatic merge. Preserve the dev branch and stop for
659
+ # manual finalization instead of spending code retry budget.
660
+ new_status = "failed"
661
+ fs["degraded_reason"] = session_status
662
+ fs["resume_from_phase"] = None
663
+ fs["finalization_needed_count"] = fs.get("finalization_needed_count", 0) + 1
664
+ if session_id:
665
+ fs["last_session_id"] = session_id
666
+ fs["last_failed_session_id"] = session_id
648
667
 
649
668
  err = update_feature_in_list(feature_list_path, feature_id, new_status)
650
669
  if err:
@@ -657,6 +676,8 @@ def action_update(args, feature_list_path, state_dir):
657
676
  new_status = "pending"
658
677
  fs["infra_error_count"] = fs.get("infra_error_count", 0) + 1
659
678
  fs["last_infra_error_session_id"] = session_id
679
+ if session_id:
680
+ fs["last_session_id"] = session_id
660
681
  fs["resume_from_phase"] = None
661
682
 
662
683
  err = update_feature_in_list(feature_list_path, feature_id, new_status)
@@ -673,6 +694,9 @@ def action_update(args, feature_list_path, state_dir):
673
694
  new_status = "pending"
674
695
 
675
696
  fs["resume_from_phase"] = None
697
+ if session_id:
698
+ fs["last_session_id"] = session_id
699
+ fs["last_failed_session_id"] = session_id
676
700
  # Keep sessions list and last_session_id for debugging
677
701
 
678
702
  err = update_feature_in_list(feature_list_path, feature_id, new_status)
@@ -712,9 +736,9 @@ def action_update(args, feature_list_path, state_dir):
712
736
  }
713
737
  if auto_skipped_features:
714
738
  summary["auto_skipped"] = [info["feature_id"] for info in auto_skipped_features]
715
- if session_status in ("commit_missing", "docs_missing", "merge_conflict"):
739
+ if session_status in ("commit_missing", "docs_missing", "merge_conflict", "finalization_needed"):
716
740
  summary["degraded_reason"] = session_status
717
- summary["restart_policy"] = "finalization_retry"
741
+ summary["restart_policy"] = "manual_finalization" if session_status == "finalization_needed" else "finalization_retry"
718
742
  elif session_status == "infra_error":
719
743
  summary["restart_policy"] = "infra_retry"
720
744
  summary["infra_error_count"] = fs.get("infra_error_count", 0)
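The summary logic in this hunk maps a session status to a restart policy, with `finalization_needed` split off from the other degraded statuses. A compact Python restatement follows. The function name is hypothetical, and statuses not visible in this hunk return `None` rather than a guessed policy.

```python
def restart_policy(session_status):
    """Map a session status to the restart policy shown in the hunk above."""
    if session_status == "finalization_needed":
        # The dev branch is preserved and a human finishes the merge.
        return "manual_finalization"
    if session_status in ("commit_missing", "docs_missing", "merge_conflict"):
        return "finalization_retry"
    if session_status == "infra_error":
        return "infra_retry"
    return None  # other statuses are outside this hunk


for status in ("finalization_needed", "merge_conflict", "infra_error"):
    print(status, "->", restart_policy(status))
```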
@@ -1,5 +1,23 @@
1
1
  "Read {{DEV_SUBAGENT_PATH}}. Implement feature {{FEATURE_ID}} (slug: {{FEATURE_SLUG}}).
2
2
 
3
+ ## Task Summary Card
4
+
5
+ **Objective**: Implement {{FEATURE_TITLE}}.
6
+
7
+ **Primary files** (see context-snapshot.md Section 4 for full manifest):
8
+ - Review plan.md Tasks section for the complete task-to-file mapping.
9
+ - Each task's `— file:` suffix identifies the target file.
10
+
11
+ **Test command**: `{{TEST_CMD}}`
12
+
13
+ **Known baseline failures**: `{{BASELINE_FAILURES}}`
14
+
15
+ **DO NOT**:
16
+ - Re-read source files already listed in context-snapshot.md Section 4 File Manifest
17
+ - Create new files unless a plan.md task explicitly requires it
18
+ - Run git commands
19
+ - Use mock success data or fake rows in UI/tests
20
+
3
21
  ## Required Inputs
4
22
 
5
23
  1. Read `.prizmkit/specs/{{FEATURE_SLUG}}/context-snapshot.md` first.
@@ -35,6 +53,9 @@ Before returning, append `## Implementation Log` to `context-snapshot.md` with:
35
53
  - Carry forward the Dev-isolated subset: skip scaffold/generated files listed in `context-snapshot.md`; verify dependency versions before install/build commands that resolve dependencies; after build/compile commands, ensure outputs are ignored and never commit generated artifacts.
36
54
  - If tests fail, follow this Test Failure Recovery subset: classify failures as baseline, new regression, brittle test, or environment/tooling; fix new regressions and brittle tests while progress is being made; document baseline failures; write `failure-log.md` for blockers.
37
55
  - Do not run git commands; staging and commit are handled by the orchestrator.
56
+ - **Edit safety**: If an Edit fails with 'String to replace not found', grep for the target text before retrying. Never guess file offsets — verify them with a Read or grep first.
57
+ - **Read safety**: If 3 consecutive Reads to the same file return 'shorter than offset' or 'Wasted call', STOP and report BLOCKED.
58
+ - **Test early**: Run `{{TEST_CMD}}` after every 3 successful Edit operations. Capture output to /tmp/test-out.txt and grep for failures.
38
59
 
39
60
  Do not return success unless:
40
61
  1. implementation tasks are complete;
@@ -131,7 +131,7 @@ If MISSING — build it now:
131
131
  ```bash
132
132
  find . -maxdepth 2 -type d -not -path '*/node_modules/*' -not -path '*/.git/*' -not -path '*/dist/*' -not -path '*/build/*' -not -path '*/__pycache__/*' -not -path '*/vendor/*' | sed -e 's;[^/]*/;|____;g;s;____|; |;g'
133
133
  ```
134
- - **Section 3 — Prizm Context**: full content of root.prizm and relevant L1/L2 docs
134
+ - **Section 3 — Key TRAPS & RULES**: relevant TRAPS/RULES from prizm-docs (not full copies)
135
135
  - **Section 4 — File Manifest**: For each file relevant to this feature, list: file path, why it's needed (modify/reference/test), key interface signatures (function names + params + return types). Do NOT include full file content — agents read files on-demand. Format:
136
136
  ### Files to Modify
137
137
  | File | Why Needed | Key Interfaces |