loki-mode 7.41.4 → 7.42.0

package/README.md CHANGED
@@ -29,7 +29,7 @@ _The free, source-available autonomous coding agent by [Autonomi](https://www.au
  - **Production quality built in** -- 11 quality gates (`skills/quality-gates.md`), blind 3-reviewer code review (`run.sh:run_code_review()`), anti-sycophancy checks
  - **Standalone verification: `loki verify`** -- Run Loki's deterministic gates (build, tests, static analysis, secret scan, dependency audit) against any branch or PR diff, including code written by other agents or humans. CI-ready exit codes (0 VERIFIED, 1 CONCERNS, 2 BLOCKED), machine-readable evidence at `.loki/verify/evidence.json`. Inconclusive evidence is never reported as VERIFIED (v7.27.0).
  - **Living spec and pre-build interrogation** -- `loki spec` locks a spec and detects drift deterministically (`spec.lock`, `drift-report.json`, and a `SPEC_DRIFT` finding in `loki verify` with CI exit codes), so you can tell when the build diverges from what was agreed. `loki grill` runs a Devil's-Advocate interrogation of the spec before you build, surfacing gaps and contradictions early (v7.28.0).
- - **Mid-flight model switching + Claude Fable tier** -- switch the model a live run uses from the dashboard (applies at the next iteration, current run only), with Claude Fable available as a premium tier at its published $10/$50 per MTok (2x Opus). For every model lever (session pin to Fable, mid-flight override, architect pass) and every `LOKI_MAX_TIER` path, the `loki plan` quote, the dashboard's reported model, and the actual dispatched model agree, with the ceiling enforced (v7.31.0).
+ - **Mid-flight model switching** -- switch the model a live run uses from the dashboard (applies at the next iteration, current run only). A Fable tier lever exists in the CLI, dashboard, and override paths, but Claude Fable 5 is not yet available at the API, so selecting Fable currently collapses to Opus at every dispatch chokepoint and the `loki plan` quote reflects Opus accordingly. For every model lever (session pin, mid-flight override, architect pass) and every `LOKI_MAX_TIER` path, the `loki plan` quote, the dashboard's reported model, and the actual dispatched model agree, with the ceiling enforced (v7.31.0; Fable-to-Opus collapse v7.39.1).
  - **A calmer CLI** -- the help surface is ~20 grouped workflow entries instead of a 70-command wall; merged commands live on as aliases that forward byte-identically with a one-line stderr pointer, so no script breaks (v7.31.0).
  - **Guided first build: `loki quickstart`** -- four quick questions (setup check, one-line idea, template pick, plan review) and your build starts; pressing Enter through every step builds the sample Todo app. The plan step quotes the real cost/time estimate before anything is spent, and `loki demo` now confirms its estimate the same way. If no AI provider CLI is installed, Loki offers to install Claude Code (consent-gated, interactive terminals only) (v7.29.0).
  - **Live App Preview** -- The dashboard embeds the locally-running app in an iframe so you can interact with it immediately during a build. Use `loki preview` (alias `loki open`) to print the URL and open it in your browser. Local-first: no hosted service, no vendor lock (v7.24.0).
@@ -391,6 +391,23 @@ Run `loki --help` for all options. Full reference: [CLI Reference](wiki/CLI-Refe

  ---

+ <details>
+ <summary><strong>Configuration env vars (intelligent defaults, opt-out knobs)</strong></summary>
+
+ Loki Mode's accuracy and autonomy behaviors are default-on. Each is an opt-out escape hatch, not a setting you have to discover. The most relevant knobs from the v7.41.x accuracy/autonomy hardening:
+
+ | Env var | Default | Effect |
+ |---------|---------|--------|
+ | `LOKI_REVIEW_INCONCLUSIVE_BLOCK` | `1` | Blocks completion when a code-review round returns zero usable verdicts (an all-empty review proves nothing). Set `0` to record the inconclusive result without blocking. |
+ | `LOKI_COMPLETION_TEST_CAPTURE` | `1` | Captures fresh test results before the verified-completion evidence gate evaluates. Set `0` to skip the pre-gate capture. |
+ | `LOKI_AUTO_DOCS` | `true` | Generates the `.loki/docs/` suite before the documentation gate scores it (bounded: once per run when docs are missing, and again only when >10 commits stale). Set `false` to opt out. |
+ | `LOKI_CAVEMAN` | `1` (on) | Output-token compressor for free-form generation only (never trust-gate subcalls). Set `0` to opt out. |
+ | `LOKI_CAVEMAN_LEVEL` | inferred | Compression level for the compressor. Auto-inferred per invocation from the run's RARV tier; set explicitly (`lite` / `full` / `ultra`) to override the inference. |
+
+ This is a subset. See the [wiki](wiki/Home.md) for the full env-var reference and the RARV-C closure knobs (`LOKI_INJECT_FINDINGS`, `LOKI_OVERRIDE_COUNCIL`, `LOKI_AUTO_LEARNINGS`, `LOKI_HANDOFF_MD`).
+
+ </details>
+
  <details>
  <summary><strong>BMAD Method Integration</strong></summary>

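The opt-out semantics described in the new README table can be modelled in a few lines. This is a hypothetical sketch, not Loki's actual parsing: `DEFAULTS` and `knob_enabled` are illustrative names, and treating only `0` and `false` as "off" is an assumption based on the opt-out values the table lists.

```python
import os

# Defaults copied from the README table above; the dict itself is illustrative.
DEFAULTS = {
    "LOKI_REVIEW_INCONCLUSIVE_BLOCK": "1",
    "LOKI_COMPLETION_TEST_CAPTURE": "1",
    "LOKI_AUTO_DOCS": "true",
    "LOKI_CAVEMAN": "1",
}


def knob_enabled(name, env=None):
    """Return True unless the knob is explicitly set to an opt-out value."""
    env = os.environ if env is None else env
    value = env.get(name, DEFAULTS[name]).strip().lower()
    # Assumption: only "0" and "false" opt out; anything else leaves the knob on.
    return value not in ("0", "false")
```

Passing an explicit `env` mapping keeps the helper testable; with no argument it reads the process environment.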
package/SKILL.md CHANGED
@@ -3,7 +3,7 @@ name: loki-mode
  description: Autonomous spec-driven build system with a built-in trust layer. It does not call work done until it is verified (RARV-C closure loop, 11 quality gates, completion council, verified-completion evidence gate). Triggers on "Loki Mode". Takes a spec (PRD, GitHub issue, OpenAPI doc, etc.) to deployed product with minimal human intervention. Provider-agnostic. Requires --dangerously-skip-permissions flag.
  ---

- # Loki Mode v7.41.4
+ # Loki Mode v7.42.0

  **You are an autonomous agent. You make decisions. You do not ask questions. You do not stop.**

@@ -398,4 +398,4 @@ See `CHANGELOG.md` entries [7.5.7], [7.5.8], [7.5.13] for the per-fix list and r

  ---

- **v7.41.4 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**
+ **v7.42.0 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**
package/VERSION CHANGED
@@ -1 +1 @@
- 7.41.4
+ 7.42.0
@@ -710,8 +710,18 @@ print('true' if ratio > budget else 'false')
          ((member++))
      done

-     # Anti-sycophancy check: if unanimous APPROVE, run devil's advocate
+     # Anti-sycophancy check: if unanimous APPROVE, run devil's advocate.
+     #
+     # Audit-trail snapshots (these do NOT affect the live vote): capture whether
+     # the council was unanimous BEFORE the decrement below, and whether the DA
+     # actually fired and flipped the verdict. The transcript fields
+     # _ct_triggered/_ct_flipped used to be re-derived from approve_count AFTER
+     # this block decremented it, so on rounds where the DA fired AND flipped they
+     # were mis-recorded as false/false, corrupting the trust-metrics audit trail.
+     local _da_was_unanimous="false"
+     local _da_flipped="false"
      if [ $approve_count -eq $COUNCIL_SIZE ] && [ $COUNCIL_SIZE -ge 2 ]; then
+         _da_was_unanimous="true"
          log_warn "Unanimous approval detected - running anti-sycophancy check..."
          local contrarian_verdict
          contrarian_verdict=$(council_devils_advocate "$evidence_file" "$vote_dir")
@@ -731,6 +741,7 @@ print('true' if ratio > budget else 'false')
              log_warn "Overriding to require one more iteration for verification"
              approve_count=$((approve_count - 1))
              reject_count=$((reject_count + 1))
+             _da_flipped="true"
          fi
      fi

@@ -795,20 +806,18 @@ with open(state_file, 'w') as f:
              >/dev/null 2>&1 || true
      fi

-     # Write transcript for this council round (Path A: council_vote path)
+     # Write transcript for this council round (Path A: council_vote path).
+     #
+     # Drive contrarian_triggered/_flipped off the snapshots captured in the
+     # anti-sycophancy block ABOVE, not off the now-mutated approve_count. The DA
+     # fires exactly when the council was unanimous (_da_was_unanimous), and it
+     # flips exactly when it did not confirm the approval (_da_flipped). Re-deriving
+     # from approve_count was wrong because the flip path already decremented it,
+     # so triggered/flipped were both recorded as false on flip rounds.
      local _ct_outcome
      _ct_outcome=$([ $approve_count -ge $effective_threshold ] && echo "APPROVED" || echo "REJECTED")
-     local _ct_triggered="false"
-     local _ct_flipped="false"
-     if [ $approve_count -eq $COUNCIL_SIZE ] && [ $COUNCIL_SIZE -ge 2 ]; then
-         _ct_triggered="true"
-     fi
-     # contrarian_flipped: DA voted REJECT/CANNOT_VALIDATE causing approve_count drop
-     # Detect by checking if approve dropped from unanimous (COUNCIL_SIZE) to less
-     # We infer flip if triggered AND final approve < COUNCIL_SIZE
-     if [ "$_ct_triggered" = "true" ] && [ $approve_count -lt $COUNCIL_SIZE ]; then
-         _ct_flipped="true"
-     fi
+     local _ct_triggered="$_da_was_unanimous"
+     local _ct_flipped="$_da_flipped"
      council_write_transcript "${ITERATION_COUNT:-0}" "$_ct_outcome" "$_ct_triggered" "$_ct_flipped" "$effective_threshold"

      if [ $approve_count -ge $effective_threshold ]; then
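The audit-trail bug fixed in the two hunks above is easy to reproduce in isolation. The following is a minimal Python model of the shell logic, not the shipped code: `record_round` and its parameters are hypothetical names, and the devil's advocate is reduced to a boolean `da_confirms`.

```python
def record_round(votes, da_confirms):
    """Model one council round.

    Returns ((old_triggered, old_flipped), (new_triggered, new_flipped)),
    where "old" re-derives the fields after the decrement and "new" uses
    snapshots taken before it.
    """
    council_size = len(votes)
    approve_count = sum(1 for v in votes if v == "APPROVE")

    # Snapshots captured before any mutation (the fixed behaviour).
    was_unanimous = approve_count == council_size and council_size >= 2
    flipped = False
    if was_unanimous and not da_confirms:
        approve_count -= 1  # the devil's advocate overrides one approval
        flipped = True

    # Old derivation: inspect approve_count after it may have been decremented.
    old_triggered = approve_count == council_size and council_size >= 2
    old_flipped = old_triggered and approve_count < council_size

    return (old_triggered, old_flipped), (was_unanimous, flipped)
```

For a unanimous three-member round where the devil's advocate does not confirm, the old derivation yields `(False, False)` and the snapshots yield `(True, True)`. In this model the old `flipped` condition can never be true, because it requires the approval count to equal the council size and be below it at once.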
@@ -2045,7 +2054,23 @@ council_evaluate_member() {
      local role="$1"
      local criteria="${2:-general completion check}"
      local loki_dir="${TARGET_DIR:-.}/.loki"
-     local vote="COMPLETE"
+     # Trust-gate inversion (v7.41.5): the default is CONTINUE, not COMPLETE.
+     # The old default ("absence of a detected failure == COMPLETE") let a
+     # greenfield run with an empty .loki/ (no test logs, no queue, few TODOs)
+     # cause requirements_verifier + devils_advocate to vote COMPLETE while only
+     # test_auditor went CONTINUE -> 2-of-3 cleared the size-3 threshold and the
+     # heuristic council approved a project with ZERO positive evidence. We now
+     # require AFFIRMATIVE positive evidence before any member votes COMPLETE,
+     # mirroring the LLM prompt's "do not approve incomplete work" stance.
+     #
+     # Mechanics:
+     # - vote starts at CONTINUE.
+     # - The existing failure detectors set blocked=true (a hard "not done"
+     #   signal) and accumulate reasons.
+     # - At the end, vote flips to COMPLETE only if blocked==false AND the
+     #   member's positive signal holds. No positive signal => CONTINUE.
+     local vote="CONTINUE"
+     local blocked="false"
      local reasons=""

      # Check 1: Do tests pass? Look for test results in .loki/
  # Check 1: Do tests pass? Look for test results in .loki/
@@ -2067,7 +2092,7 @@ council_evaluate_member() {
          fi
      done
      if [ "$test_failures" -gt 0 ]; then
-         vote="CONTINUE"
+         blocked="true"
          reasons="${reasons}test failures found ($test_failures); "
      fi

 
@@ -2076,12 +2101,16 @@ council_evaluate_member() {
      local current_diff_hash
      current_diff_hash=$(git diff --stat HEAD 2>/dev/null | (md5sum 2>/dev/null || md5 -r 2>/dev/null) | cut -d' ' -f1 || echo "unknown")
      if [ "$COUNCIL_CONSECUTIVE_NO_CHANGE" -gt 0 ] && [ "$ITERATION_COUNT" -gt "$COUNCIL_MIN_ITERATIONS" ]; then
-         # Code has stopped changing -- stagnation, not necessarily done
-         # (BUG-QG-011: previously inverted -- forced CONTINUE when code was changing,
-         # which penalized active progress. Now: stagnation with no passing checks = CONTINUE)
-         if [ "$vote" = "COMPLETE" ]; then
-             : # Other checks passed despite stagnation -- allow COMPLETE
-         else
+         # Code has stopped changing -- stagnation, not necessarily done.
+         # (BUG-QG-011 history: previously inverted -- forced CONTINUE when code
+         # was changing, which penalized active progress.)
+         # Under the v7.41.5 affirmative-evidence default this is INFORMATIONAL
+         # only: stagnation by itself is neither a failure (do not set blocked)
+         # nor positive evidence (does not flip vote to COMPLETE). Whether the
+         # member completes is decided entirely by blocked + the positive-signal
+         # check below. Surface the stagnation note only when a real failure was
+         # also detected, so the reason string stays honest.
+         if [ "$blocked" = "true" ]; then
              reasons="${reasons}code stagnated with failing checks; "
          fi
      fi
@@ -2100,49 +2129,126 @@ council_evaluate_member() {
          done
      fi
      if [ "$error_count" -gt 0 ]; then
-         vote="CONTINUE"
+         blocked="true"
          reasons="${reasons}uncaught errors in logs ($error_count); "
      fi

-     # Role-specific checks
+     # --- Affirmative positive evidence (v7.41.5) ----------------------------
+     # Base positive signal, shared by all members: a structured test-results.json
+     # exists and is NOT red. This is the SAME file + parse the evidence hard gate
+     # uses (council_evidence_gate, see this file ~line 1543-1568), so the member
+     # vote and the gate agree on what "tests are not red" means instead of a
+     # fragile log grep. The completion route guarantees this file is written
+     # before the council votes via ensure_completion_test_evidence()
+     # (autonomy/run.sh): a project with a real runner records its true PASS/FAIL,
+     # and a project with no runner records {"runner":"none","pass":true}. A
+     # greenfield run with an empty .loki/ has NO such file -> no positive base ->
+     # the member stays CONTINUE.
+     #
+     # Parse verdict mirrors council_evidence_gate: runner=="none" => PASS,
+     # pass is False => FAIL, else PASS. Unparseable/missing => not present.
+     local tr_file="$loki_dir/quality/test-results.json"
+     local test_evidence="absent"   # absent | pass | fail
+     local test_runner_seen="none"
+     if [ -f "$tr_file" ]; then
+         local _tr_status
+         _tr_status=$(_TR_FILE="$tr_file" python3 -c "
+ import json, os, sys
+ try:
+     with open(os.environ['_TR_FILE']) as f:
+         d = json.load(f)
+ except (json.JSONDecodeError, IOError, KeyError, ValueError):
+     print('absent:none')
+     sys.exit(0)
+ runner = d.get('runner', 'none')
+ passed = d.get('pass', True)
+ if runner == 'none':
+     print('pass:none')
+ elif passed is False:
+     print('fail:%s' % runner)
+ else:
+     print('pass:%s' % runner)
+ " 2>/dev/null || echo "absent:none")
+         test_evidence="${_tr_status%%:*}"
+         test_runner_seen="${_tr_status#*:}"
+     fi
+     if [ "$test_evidence" = "fail" ]; then
+         # A red structured result is a hard failure for every member, on top of
+         # any log-derived failures already counted above.
+         blocked="true"
+         reasons="${reasons}structured test results red (runner '$test_runner_seen'); "
+     fi
+
+     # Per-member positive signal, evaluated on top of the shared base.
+     local positive="false"
      case "$role" in
          requirements_verifier)
-             # Check if pending tasks remain
+             # Positive: tests not red AND no pending tasks. A present queue file
+             # with pending>0 is a hard "not done"; an ABSENT queue file is not
+             # itself disqualifying (a legit run need not have one), it just means
+             # this member relies on the base test evidence.
+             local pending=0
              if [ -f "$loki_dir/queue/pending.json" ]; then
-                 local pending
                  pending=$(_QUEUE_FILE="$loki_dir/queue/pending.json" python3 -c "import json, os; print(len(json.load(open(os.environ['_QUEUE_FILE']))))" 2>/dev/null || echo "0")
                  if [ "$pending" -gt 0 ]; then
-                     vote="CONTINUE"
+                     blocked="true"
                      reasons="${reasons}$pending tasks still pending; "
                  fi
              fi
+             if [ "$test_evidence" = "pass" ] && [ "$pending" -eq 0 ]; then
+                 positive="true"
+             fi
              ;;
          test_auditor)
-             # Check if any test log exists at all
-             local has_tests=false
-             for f in "$loki_dir"/logs/test-*.log "$loki_dir"/logs/*test*.log; do
-                 [ -f "$f" ] && has_tests=true && break
-             done
-             if [ "$has_tests" = "false" ]; then
-                 vote="CONTINUE"
-                 reasons="${reasons}no test results found; "
+             # Positive requires a REAL passing test signal, not merely the
+             # absence of a failing one: a structured result with runner != none
+             # AND pass == true. {"runner":"none"} (no suite ran) is NOT positive
+             # test evidence for this member, and a missing file is not either, so
+             # a no-tests / greenfield project leaves test_auditor at CONTINUE.
+             if [ "$test_evidence" = "absent" ]; then
+                 reasons="${reasons}no structured test results found; "
+             elif [ "$test_runner_seen" = "none" ]; then
+                 reasons="${reasons}no real test suite ran (runner none); "
+             elif [ "$test_evidence" = "pass" ]; then
+                 positive="true"
              fi
              ;;
          devils_advocate)
-             # Check for TODO/FIXME markers
+             # Positive: tests not red AND a low TODO/FIXME density. A high marker
+             # count is a hard "not done"; a missing/absent test base means no
+             # positive evidence even when TODOs are low.
              local todo_count
              todo_count=$(grep -rl "TODO\|FIXME\|HACK\|XXX" . --include="*.ts" --include="*.js" --include="*.py" --include="*.sh" 2>/dev/null | wc -l | tr -d ' ')
              if [ "$todo_count" -gt 5 ]; then
-                 vote="CONTINUE"
+                 blocked="true"
                  reasons="${reasons}$todo_count files with TODO/FIXME markers; "
              fi
+             if [ "$test_evidence" = "pass" ] && [ "$todo_count" -le 5 ]; then
+                 positive="true"
+             fi
+             ;;
+         *)
+             # Unknown role: fall back to the shared base signal only.
+             if [ "$test_evidence" = "pass" ]; then
+                 positive="true"
+             fi
              ;;
      esac

+     # Final decision: COMPLETE only when nothing blocks AND positive evidence
+     # is present. Otherwise CONTINUE (the affirmative-evidence default).
+     if [ "$blocked" = "false" ] && [ "$positive" = "true" ]; then
+         vote="COMPLETE"
+     fi
+
      # Clean up trailing separator
      reasons="${reasons%; }"
      if [ -z "$reasons" ]; then
-         reasons="all checks passed for $role ($criteria)"
+         if [ "$vote" = "COMPLETE" ]; then
+             reasons="positive evidence present, no failures for $role ($criteria)"
+         else
+             reasons="no positive completion evidence for $role ($criteria)"
+         fi
      fi

      echo "$vote $reasons"
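The affirmative-evidence rule introduced in `council_evaluate_member` can be summarised as a small decision function. This is a sketch in Python under stated assumptions, not the shipped shell code: `member_vote` and its parameters are hypothetical, the log-derived failure checks are collapsed into one `log_failures` count, and `"pytest"` in the usage note is only an example runner name.

```python
def member_vote(role, test_results, pending=0, todo_files=0, log_failures=0):
    """Return 'COMPLETE' or 'CONTINUE' for one heuristic council member.

    test_results is the parsed test-results.json dict, or None when the file
    is missing or unparseable.
    """
    blocked = log_failures > 0

    # Shared base signal (absent | pass | fail), mirroring the embedded parser.
    if test_results is None:
        evidence, runner = "absent", "none"
    else:
        runner = test_results.get("runner", "none")
        if runner == "none":
            evidence = "pass"
        elif test_results.get("pass", True) is False:
            evidence = "fail"
        else:
            evidence = "pass"
    if evidence == "fail":
        blocked = True

    # Per-member positive signal on top of the shared base.
    if role == "requirements_verifier":
        blocked = blocked or pending > 0
        positive = evidence == "pass" and pending == 0
    elif role == "test_auditor":
        # Needs a real suite: runner "none" is not positive evidence here.
        positive = evidence == "pass" and runner != "none"
    elif role == "devils_advocate":
        blocked = blocked or todo_files > 5
        positive = evidence == "pass" and todo_files <= 5
    else:
        positive = evidence == "pass"

    return "COMPLETE" if (not blocked and positive) else "CONTINUE"
```

With `test_results=None` (the greenfield case) every role returns `CONTINUE`. With `{"runner": "none", "pass": True}`, `requirements_verifier` can return `COMPLETE` but `test_auditor` still returns `CONTINUE`, which matches the comments in the hunk above.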
@@ -317,14 +317,38 @@ hook_pre_healing_modify() {
      if [[ -f "$heal_dir/friction-map.json" ]]; then
          local blocked
          blocked=$(python3 -c "
- import json, sys
+ import json, os, sys
  file_path = sys.argv[1]
  strict = sys.argv[2] == 'true'
  with open(sys.argv[3]) as f:
      data = json.load(f)
+
+ # Path-aware match (not raw substring 'in', which over-matched app.py against
+ # myapp.py and under-matched src/foo.py against a foo.py:10 location). Friction
+ # locations are formatted 'path:line' (or just 'path'); strip a trailing
+ # ':<line>' then compare by basename and normalized path so the same file is
+ # matched regardless of how it was referenced.
+ def norm(p):
+     # Drop a trailing ':<line>' (and optional ':<col>') suffix from a location.
+     parts = p.rsplit(':', 1)
+     while len(parts) == 2 and parts[1].isdigit():
+         p = parts[0]
+         parts = p.rsplit(':', 1)
+     return p
+
+ def matches(target, loc):
+     loc = norm(loc)
+     if not target or not loc:
+         return False
+     # Exact normalized-path match, or same basename. Basename equality is the
+     # path-aware replacement for substring containment.
+     if os.path.normpath(target) == os.path.normpath(loc):
+         return True
+     return os.path.basename(target) == os.path.basename(loc)
+
  for friction in data.get('frictions', []):
      loc = friction.get('location', '')
-     if file_path in loc:
+     if matches(file_path, loc):
          cls = friction.get('classification', 'unknown')
          safe = friction.get('safe_to_remove', False)
          if cls in ('business_rule', 'unknown') and not safe:
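The difference between the old substring test and the new path-aware match shows up on the two cases named in the comment. The block below copies `norm` and `matches` from the hunk into a standalone script; the sample paths in the usage note are illustrative only.

```python
import os


def norm(p):
    # Drop a trailing ':<line>' (and optional ':<col>') suffix from a location.
    parts = p.rsplit(':', 1)
    while len(parts) == 2 and parts[1].isdigit():
        p = parts[0]
        parts = p.rsplit(':', 1)
    return p


def matches(target, loc):
    loc = norm(loc)
    if not target or not loc:
        return False
    # Exact normalized-path match, or same basename.
    if os.path.normpath(target) == os.path.normpath(loc):
        return True
    return os.path.basename(target) == os.path.basename(loc)


def old_match(target, loc):
    # The pre-7.42.0 behaviour: raw substring containment.
    return target in loc
```

`old_match("app.py", "myapp.py:10")` is true (an over-match) while `matches` rejects it; `old_match("src/foo.py", "foo.py:10")` is false (an under-match) while `matches` accepts it. One trade-off worth noting: because basename equality is sufficient, `matches("a/util.py", "b/util.py:3")` is also true, so same-named files in different directories are treated as the same friction location.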
@@ -343,9 +367,101 @@ print('OK')
          fi
      fi

+     # Capture a pre-edit snapshot so post_healing_modify can revert ONLY the
+     # healing edit on test failure (not unrelated uncommitted changes, and not
+     # via git checkout which discards everything). Keyed by file path.
+     _heal_snapshot_save "$heal_dir" "$file_path"
+
+     return 0
+ }
+
+ # Snapshot path helper: maps a target file path to its snapshot blob location.
+ # Uses a flat directory with the path's basename plus a hash of the full path
+ # to avoid collisions between same-named files in different directories.
+ _heal_snapshot_path() {
+     local heal_dir="$1"
+     local file_path="$2"
+     local key
+     key=$(printf '%s' "$file_path" | cksum | awk '{print $1"-"$2}')
+     printf '%s/snapshots/%s.%s' "$heal_dir" "$(basename "$file_path")" "$key"
+ }
+
+ # Save a pre-edit snapshot of file_path. If the file does not exist yet (the
+ # healing edit will CREATE it), write a sentinel marker instead so the revert
+ # path knows to remove the file rather than restore content.
+ #
+ # Pairing contract: hook_pre_healing_modify (which calls this) MUST run for a
+ # file before hook_post_healing_modify reverts it. The snapshot is refreshed on
+ # every pre call, so a post without a matching fresh pre could restore a stale
+ # blob. On the success path the snapshot is intentionally left in place; the
+ # next pre overwrites it.
+ _heal_snapshot_save() {
+     local heal_dir="$1"
+     local file_path="$2"
+     [[ -z "$file_path" ]] && return 0
+     local snap_dir="$heal_dir/snapshots"
+     mkdir -p "$snap_dir" 2>/dev/null || return 0
+     local snap
+     snap=$(_heal_snapshot_path "$heal_dir" "$file_path")
+     if [[ -f "$file_path" ]]; then
+         cp "$file_path" "$snap" 2>/dev/null || return 0
+         rm -f "$snap.absent" 2>/dev/null || true
+     else
+         # File does not exist pre-edit: record an "absent" marker, drop any
+         # stale content snapshot.
+         rm -f "$snap" 2>/dev/null || true
+         : > "$snap.absent" 2>/dev/null || true
+     fi
      return 0
  }

+ # Restore file_path from its pre-edit snapshot, reverting ONLY the healing edit.
+ # Echoes an accurate human-readable message describing what actually happened
+ # (content restored / healing-added file removed / could not revert). Returns 0
+ # when the revert succeeded as reported, 1 when it could not be performed.
+ _heal_snapshot_restore() {
+     local heal_dir="$1"
+     local file_path="$2"
+     if [[ -z "$file_path" ]]; then
+         echo "No file path given; nothing reverted."
+         return 1
+     fi
+     local snap
+     snap=$(_heal_snapshot_path "$heal_dir" "$file_path")
+
+     if [[ -f "$snap" ]]; then
+         # Pre-edit content snapshot exists: restore exactly that content, which
+         # preserves any unrelated uncommitted changes present before the edit.
+         if cp "$snap" "$file_path" 2>/dev/null; then
+             echo "Healing edit reverted to pre-edit snapshot."
+             return 0
+         fi
+         echo "Could not restore pre-edit snapshot for ${file_path}; file left as-is."
+         return 1
+     fi
+
+     if [[ -f "$snap.absent" ]]; then
+         # File did not exist pre-edit: the healing edit created it. Remove only
+         # that file, not unrelated state.
+         if [[ ! -e "$file_path" ]]; then
+             echo "Healing-added file ${file_path} no longer present; nothing to remove."
+             return 0
+         fi
+         if rm -f "$file_path" 2>/dev/null; then
+             echo "Healing-added file ${file_path} removed."
+             return 0
+         fi
+         echo "Could not remove healing-added file ${file_path}; file left as-is."
+         return 1
+     fi
+
+     # No snapshot was captured (pre_healing_modify did not run for this file).
+     # Be honest: do not claim a revert that did not happen, and do NOT fall back
+     # to a destructive git checkout.
+     echo "No pre-edit snapshot found for ${file_path}; could not revert (left as-is)."
+     return 1
+ }
+
  # Hook: post_healing_modify - runs AFTER agent modifies a file in healing mode
  # Verifies characterization tests still pass after modification
  hook_post_healing_modify() {
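The save/restore pairing added above follows a general snapshot-and-revert pattern. The sketch below restates it in Python to make the three outcomes (content restored, created file removed, nothing to revert) explicit. It is an illustration rather than a port: the function names are hypothetical, SHA-256 stands in for the `cksum` key used by the shell helper, and error handling is omitted.

```python
import hashlib
import os
import shutil


def snapshot_path(heal_dir, file_path):
    # Basename plus a hash of the full path, so same-named files do not collide.
    key = hashlib.sha256(file_path.encode()).hexdigest()[:12]
    name = "%s.%s" % (os.path.basename(file_path), key)
    return os.path.join(heal_dir, "snapshots", name)


def snapshot_save(heal_dir, file_path):
    snap = snapshot_path(heal_dir, file_path)
    os.makedirs(os.path.dirname(snap), exist_ok=True)
    if os.path.isfile(file_path):
        shutil.copyfile(file_path, snap)
        if os.path.exists(snap + ".absent"):
            os.remove(snap + ".absent")
    else:
        # File does not exist yet: record a marker and drop any stale content.
        if os.path.exists(snap):
            os.remove(snap)
        open(snap + ".absent", "w").close()


def snapshot_restore(heal_dir, file_path):
    """Return True if the edit was reverted, False if no snapshot existed."""
    snap = snapshot_path(heal_dir, file_path)
    if os.path.isfile(snap):
        shutil.copyfile(snap, file_path)  # restores pre-edit content exactly
        return True
    if os.path.isfile(snap + ".absent"):
        if os.path.exists(file_path):
            os.remove(file_path)  # the edit created the file, so remove it
        return True
    return False  # no snapshot: do not claim a revert that did not happen
```

Unlike `git checkout -- <file>`, restoring from the snapshot keeps whatever uncommitted content the file held before the healing edit, and it also works for untracked files.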
@@ -384,9 +500,17 @@ hook_post_healing_modify() {
          test_output=$(cat "$test_result_file")
          rm -f "$test_result_file"

-         # Revert the change - characterization tests must pass
-         git -C "$codebase_path" checkout -- "$file_path" 2>/dev/null || true
-         echo "HOOK_BLOCKED: Characterization tests failed after healing modification to ${file_path}. Change reverted."
+         # Revert ONLY the healing edit using the pre-edit snapshot captured by
+         # hook_pre_healing_modify. Do NOT use `git checkout -- "$file_path"`:
+         # that discards ALL uncommitted changes to the file (not just the
+         # healing edit) and silently no-ops for an untracked file while still
+         # claiming the change was reverted. Report exactly what happened.
+         local revert_msg
+         # _heal_snapshot_restore returns nonzero when it could not revert; we
+         # surface the outcome via its message (recorded below) rather than a
+         # code, and must not let a nonzero return abort under set -e.
+         revert_msg=$(_heal_snapshot_restore "$heal_dir" "$file_path") || true
+         echo "HOOK_BLOCKED: Characterization tests failed after healing modification to ${file_path}. ${revert_msg}"
          echo "Test output: ${test_output}"

          # Record failure in failure-modes.json
@@ -404,12 +528,12 @@ data.setdefault('modes', []).append({
      'trigger': 'healing_modification',
      'file': sys.argv[2],
      'behavior': 'Characterization tests failed after modification',
-     'recovery': 'Change automatically reverted',
+     'recovery': sys.argv[3],
      'is_intentional': False
  })
  with open(sys.argv[1], 'w') as f:
      json.dump(data, f, indent=2)
- " "$heal_dir/failure-modes.json" "$file_path" 2>/dev/null || true
+ " "$heal_dir/failure-modes.json" "$file_path" "$revert_msg" 2>/dev/null || true
      fi

      return 1