npm - loki-mode - Versions diffs - 7.41.4 → 7.42.0 - Mend

loki-mode 7.41.4 → 7.42.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (20)

package/README.md +18 -1
package/SKILL.md +2 -2
package/VERSION +1 -1
package/autonomy/completion-council.sh +143 -37
package/autonomy/hooks/migration-hooks.sh +131 -7
package/autonomy/loki +54 -43
package/autonomy/run.sh +1 -1
package/dashboard/__init__.py +1 -1
package/dashboard/server.py +102 -0
package/docs/INSTALLATION.md +70 -1
package/loki-ts/dist/loki.js +2 -2
package/mcp/__init__.py +1 -1
package/mcp/lsp_proxy.py +274 -89
package/memory/engine.py +15 -3
package/memory/storage.py +6 -0
package/package.json +1 -1
package/plugins/loki-mode/.claude-plugin/plugin.json +1 -1
package/references/core-workflow.md +7 -0
package/references/quality-control.md +6 -0
package/skills/agents.md +1 -0

package/README.md CHANGED

@@ -29,7 +29,7 @@ _The free, source-available autonomous coding agent by [Autonomi](https://www.au
 - **Production quality built in** -- 11 quality gates (`skills/quality-gates.md`), blind 3-reviewer code review (`run.sh:run_code_review()`), anti-sycophancy checks
 - **Standalone verification: `loki verify`** -- Run Loki's deterministic gates (build, tests, static analysis, secret scan, dependency audit) against any branch or PR diff, including code written by other agents or humans. CI-ready exit codes (0 VERIFIED, 1 CONCERNS, 2 BLOCKED), machine-readable evidence at `.loki/verify/evidence.json`. Inconclusive evidence is never reported as VERIFIED (v7.27.0).
 - **Living spec and pre-build interrogation** -- `loki spec` locks a spec and detects drift deterministically (`spec.lock`, `drift-report.json`, and a `SPEC_DRIFT` finding in `loki verify` with CI exit codes), so you can tell when the build diverges from what was agreed. `loki grill` runs a Devil's-Advocate interrogation of the spec before you build, surfacing gaps and contradictions early (v7.28.0).
-- **Mid-flight model switching + Claude Fable tier** -- switch the model a live run uses from the dashboard (applies at the next iteration, current run only), with Claude Fable available as a premium tier at its published $10/$50 per MTok (2x Opus). For every model lever (session pin to Fable, mid-flight override, architect pass) and every `LOKI_MAX_TIER` path, the `loki plan` quote, the dashboard's reported model, and the actual dispatched model agree, with the ceiling enforced (v7.31.0).
+- **Mid-flight model switching** -- switch the model a live run uses from the dashboard (applies at the next iteration, current run only). A Fable tier lever exists in the CLI, dashboard, and override paths, but Claude Fable 5 is not yet available at the API, so selecting Fable currently collapses to Opus at every dispatch chokepoint and the `loki plan` quote reflects Opus accordingly. For every model lever (session pin, mid-flight override, architect pass) and every `LOKI_MAX_TIER` path, the `loki plan` quote, the dashboard's reported model, and the actual dispatched model agree, with the ceiling enforced (v7.31.0; Fable-to-Opus collapse v7.39.1).
 - **A calmer CLI** -- the help surface is ~20 grouped workflow entries instead of a 70-command wall; merged commands live on as aliases that forward byte-identically with a one-line stderr pointer, so no script breaks (v7.31.0).
 - **Guided first build: `loki quickstart`** -- four quick questions (setup check, one-line idea, template pick, plan review) and your build starts; pressing Enter through every step builds the sample Todo app. The plan step quotes the real cost/time estimate before anything is spent, and `loki demo` now confirms its estimate the same way. If no AI provider CLI is installed, Loki offers to install Claude Code (consent-gated, interactive terminals only) (v7.29.0).
 - **Live App Preview** -- The dashboard embeds the locally-running app in an iframe so you can interact with it immediately during a build. Use `loki preview` (alias `loki open`) to print the URL and open it in your browser. Local-first: no hosted service, no vendor lock (v7.24.0).
@@ -391,6 +391,23 @@ Run `loki --help` for all options. Full reference: [CLI Reference](wiki/CLI-Refe
 ---
+<details>
+<summary><strong>Configuration env vars (intelligent defaults, opt-out knobs)</strong></summary>
+Loki Mode's accuracy and autonomy behaviors are default-on. Each is an opt-out escape hatch, not a setting you have to discover. The most relevant knobs from the v7.41.x accuracy/autonomy hardening:
+| Env var | Default | Effect |
+|---------|---------|--------|
+| `LOKI_REVIEW_INCONCLUSIVE_BLOCK` | `1` | Blocks completion when a code-review round returns zero usable verdicts (an all-empty review proves nothing). Set `0` to record the inconclusive result without blocking. |
+| `LOKI_COMPLETION_TEST_CAPTURE` | `1` | Captures fresh test results before the verified-completion evidence gate evaluates. Set `0` to skip the pre-gate capture. |
+| `LOKI_AUTO_DOCS` | `true` | Generates the `.loki/docs/` suite before the documentation gate scores it (bounded: once per run when docs are missing, and again only when >10 commits stale). Set `false` to opt out. |
+| `LOKI_CAVEMAN` | `1` (on) | Output-token compressor for free-form generation only (never trust-gate subcalls). Set `0` to opt out. |
+| `LOKI_CAVEMAN_LEVEL` | inferred | Compression level for the compressor. Auto-inferred per invocation from the run's RARV tier; set explicitly (`lite` / `full` / `ultra`) to override the inference. |
+This is a subset. See the [wiki](wiki/Home.md) for the full env-var reference and the RARV-C closure knobs (`LOKI_INJECT_FINDINGS`, `LOKI_OVERRIDE_COUNCIL`, `LOKI_AUTO_LEARNINGS`, `LOKI_HANDOFF_MD`).
+</details>
 <details>
 <summary><strong>BMAD Method Integration</strong></summary>

package/SKILL.md CHANGED

@@ -3,7 +3,7 @@ name: loki-mode
 description: Autonomous spec-driven build system with a built-in trust layer. It does not call work done until it is verified (RARV-C closure loop, 11 quality gates, completion council, verified-completion evidence gate). Triggers on "Loki Mode". Takes a spec (PRD, GitHub issue, OpenAPI doc, etc.) to deployed product with minimal human intervention. Provider-agnostic. Requires --dangerously-skip-permissions flag.
 ---
-# Loki Mode v7.41.4
+# Loki Mode v7.42.0
 **You are an autonomous agent. You make decisions. You do not ask questions. You do not stop.**
@@ -398,4 +398,4 @@ See `CHANGELOG.md` entries [7.5.7], [7.5.8], [7.5.13] for the per-fix list and r
 ---
-**v7.41.4 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**
+**v7.42.0 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**

package/VERSION CHANGED

@@ -1 +1 @@
-7.41.4
+7.42.0

package/autonomy/completion-council.sh CHANGED

@@ -710,8 +710,18 @@ print('true' if ratio > budget else 'false')
         ((member++))
     done
-    # Anti-sycophancy check: if unanimous APPROVE, run devil's advocate
+    # Anti-sycophancy check: if unanimous APPROVE, run devil's advocate.
+    #
+    # Audit-trail snapshots (these do NOT affect the live vote): capture whether
+    # the council was unanimous BEFORE the decrement below, and whether the DA
+    # actually fired and flipped the verdict. The transcript fields
+    # _ct_triggered/_ct_flipped used to be re-derived from approve_count AFTER
+    # this block decremented it, so on rounds where the DA fired AND flipped they
+    # were mis-recorded as false/false, corrupting the trust-metrics audit trail.
+    local _da_was_unanimous="false"
+    local _da_flipped="false"
     if [ $approve_count -eq $COUNCIL_SIZE ] && [ $COUNCIL_SIZE -ge 2 ]; then
+        _da_was_unanimous="true"
         log_warn "Unanimous approval detected - running anti-sycophancy check..."
         local contrarian_verdict
         contrarian_verdict=$(council_devils_advocate "$evidence_file" "$vote_dir")
@@ -731,6 +741,7 @@ print('true' if ratio > budget else 'false')
             log_warn "Overriding to require one more iteration for verification"
             approve_count=$((approve_count - 1))
             reject_count=$((reject_count + 1))
+            _da_flipped="true"
         fi
     fi
@@ -795,20 +806,18 @@ with open(state_file, 'w') as f:
             >/dev/null 2>&1 || true
     fi
-    # Write transcript for this council round (Path A: council_vote path)
+    # Write transcript for this council round (Path A: council_vote path).
+    #
+    # Drive contrarian_triggered/_flipped off the snapshots captured in the
+    # anti-sycophancy block ABOVE, not off the now-mutated approve_count. The DA
+    # fires exactly when the council was unanimous (_da_was_unanimous), and it
+    # flips exactly when it did not confirm the approval (_da_flipped). Re-deriving
+    # from approve_count was wrong because the flip path already decremented it,
+    # so triggered/flipped were both recorded as false on flip rounds.
     local _ct_outcome
     _ct_outcome=$([ $approve_count -ge $effective_threshold ] && echo "APPROVED" || echo "REJECTED")
-    local _ct_triggered="false"
-    local _ct_flipped="false"
-    if [ $approve_count -eq $COUNCIL_SIZE ] && [ $COUNCIL_SIZE -ge 2 ]; then
-        _ct_triggered="true"
-    fi
-    # contrarian_flipped: DA voted REJECT/CANNOT_VALIDATE causing approve_count drop
-    # Detect by checking if approve dropped from unanimous (COUNCIL_SIZE) to less
-    # We infer flip if triggered AND final approve < COUNCIL_SIZE
-    if [ "$_ct_triggered" = "true" ] && [ $approve_count -lt $COUNCIL_SIZE ]; then
-        _ct_flipped="true"
-    fi
+    local _ct_triggered="$_da_was_unanimous"
+    local _ct_flipped="$_da_flipped"
     council_write_transcript "${ITERATION_COUNT:-0}" "$_ct_outcome" "$_ct_triggered" "$_ct_flipped" "$effective_threshold"
     if [ $approve_count -ge $effective_threshold ]; then
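
Note on the two hunks above: the audit-trail fix is easier to see in isolation. The following is a minimal Python sketch, not the script's actual code (which is Bash). It assumes a simplified council in which a unanimous approval triggers the devil's advocate, and a non-confirming advocate moves one vote from approve to reject. The function names `run_devils_advocate`, `record_rederived`, and `record_snapshot` are hypothetical and exist only for this illustration.

```python
def run_devils_advocate(approve, reject, council_size, da_confirms):
    """Apply the anti-sycophancy step; return updated counts plus snapshots.

    Hypothetical simplification of the Bash block: the advocate fires only on
    a unanimous approval from a council of at least two members.
    """
    was_unanimous = approve == council_size and council_size >= 2
    flipped = False
    if was_unanimous and not da_confirms:
        approve -= 1
        reject += 1
        flipped = True
    return approve, reject, was_unanimous, flipped


def record_rederived(approve, council_size):
    """Old behaviour: infer triggered/flipped from the already-mutated count."""
    triggered = approve == council_size and council_size >= 2
    flipped = triggered and approve < council_size
    return triggered, flipped


def record_snapshot(was_unanimous, flipped):
    """New behaviour: report the values captured before the decrement."""
    return was_unanimous, flipped


# A size-3 council approves unanimously, then the advocate flips the verdict.
approve, reject, was_unanimous, flipped = run_devils_advocate(
    3, 0, 3, da_confirms=False
)
old_fields = record_rederived(approve, 3)
new_fields = record_snapshot(was_unanimous, flipped)
print("re-derived:", old_fields)
print("snapshot:  ", new_fields)
```

On a flip round the re-derived fields come out as `(False, False)` because the approve count is already below the council size by the time they are computed, while the snapshot fields report `(True, True)`. The live vote is identical in both cases; only the transcript differs.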
@@ -2045,7 +2054,23 @@ council_evaluate_member() {
     local role="$1"
     local criteria="${2:-general completion check}"
     local loki_dir="${TARGET_DIR:-.}/.loki"
-    local vote="COMPLETE"
+    # Trust-gate inversion (v7.41.5): the default is CONTINUE, not COMPLETE.
+    # The old default ("absence of a detected failure == COMPLETE") let a
+    # greenfield run with an empty .loki/ (no test logs, no queue, few TODOs)
+    # cause requirements_verifier + devils_advocate to vote COMPLETE while only
+    # test_auditor went CONTINUE -> 2-of-3 cleared the size-3 threshold and the
+    # heuristic council approved a project with ZERO positive evidence. We now
+    # require AFFIRMATIVE positive evidence before any member votes COMPLETE,
+    # mirroring the LLM prompt's "do not approve incomplete work" stance.
+    #
+    # Mechanics:
+    #   - vote starts at CONTINUE.
+    #   - The existing failure detectors set blocked=true (a hard "not done"
+    #     signal) and accumulate reasons.
+    #   - At the end, vote flips to COMPLETE only if blocked==false AND the
+    #     member's positive signal holds. No positive signal => CONTINUE.
+    local vote="CONTINUE"
+    local blocked="false"
     local reasons=""
     # Check 1: Do tests pass? Look for test results in .loki/
@@ -2067,7 +2092,7 @@ council_evaluate_member() {
         fi
     done
     if [ "$test_failures" -gt 0 ]; then
-        vote="CONTINUE"
+        blocked="true"
         reasons="${reasons}test failures found ($test_failures); "
     fi
@@ -2076,12 +2101,16 @@ council_evaluate_member() {
     local current_diff_hash
     current_diff_hash=$(git diff --stat HEAD 2>/dev/null | (md5sum 2>/dev/null || md5 -r 2>/dev/null) | cut -d' ' -f1 || echo "unknown")
     if [ "$COUNCIL_CONSECUTIVE_NO_CHANGE" -gt 0 ] && [ "$ITERATION_COUNT" -gt "$COUNCIL_MIN_ITERATIONS" ]; then
-        # Code has stopped changing -- stagnation, not necessarily done
-        # (BUG-QG-011: previously inverted -- forced CONTINUE when code was changing,
-        # which penalized active progress. Now: stagnation with no passing checks = CONTINUE)
-        if [ "$vote" = "COMPLETE" ]; then
-            : # Other checks passed despite stagnation -- allow COMPLETE
-        else
+        # Code has stopped changing -- stagnation, not necessarily done.
+        # (BUG-QG-011 history: previously inverted -- forced CONTINUE when code
+        # was changing, which penalized active progress.)
+        # Under the v7.41.5 affirmative-evidence default this is INFORMATIONAL
+        # only: stagnation by itself is neither a failure (do not set blocked)
+        # nor positive evidence (does not flip vote to COMPLETE). Whether the
+        # member completes is decided entirely by blocked + the positive-signal
+        # check below. Surface the stagnation note only when a real failure was
+        # also detected, so the reason string stays honest.
+        if [ "$blocked" = "true" ]; then
             reasons="${reasons}code stagnated with failing checks; "
         fi
     fi
@@ -2100,49 +2129,126 @@ council_evaluate_member() {
         done
     fi
     if [ "$error_count" -gt 0 ]; then
-        vote="CONTINUE"
+        blocked="true"
         reasons="${reasons}uncaught errors in logs ($error_count); "
     fi
-    # Role-specific checks
+    # --- Affirmative positive evidence (v7.41.5) ----------------------------
+    # Base positive signal, shared by all members: a structured test-results.json
+    # exists and is NOT red. This is the SAME file + parse the evidence hard gate
+    # uses (council_evidence_gate, see this file ~line 1543-1568), so the member
+    # vote and the gate agree on what "tests are not red" means instead of a
+    # fragile log grep. The completion route guarantees this file is written
+    # before the council votes via ensure_completion_test_evidence()
+    # (autonomy/run.sh): a project with a real runner records its true PASS/FAIL,
+    # and a project with no runner records {"runner":"none","pass":true}. A
+    # greenfield run with an empty .loki/ has NO such file -> no positive base ->
+    # the member stays CONTINUE.
+    #
+    # Parse verdict mirrors council_evidence_gate: runner=="none" => PASS,
+    # pass is False => FAIL, else PASS. Unparseable/missing => not present.
+    local tr_file="$loki_dir/quality/test-results.json"
+    local test_evidence="absent"   # absent | pass | fail
+    local test_runner_seen="none"
+    if [ -f "$tr_file" ]; then
+        local _tr_status
+        _tr_status=$(_TR_FILE="$tr_file" python3 -c "
+import json, os, sys
+try:
+    with open(os.environ['_TR_FILE']) as f:
+        d = json.load(f)
+except (json.JSONDecodeError, IOError, KeyError, ValueError):
+    print('absent:none')
+    sys.exit(0)
+runner = d.get('runner', 'none')
+passed = d.get('pass', True)
+if runner == 'none':
+    print('pass:none')
+elif passed is False:
+    print('fail:%s' % runner)
+else:
+    print('pass:%s' % runner)
+" 2>/dev/null || echo "absent:none")
+        test_evidence="${_tr_status%%:*}"
+        test_runner_seen="${_tr_status#*:}"
+    fi
+    if [ "$test_evidence" = "fail" ]; then
+        # A red structured result is a hard failure for every member, on top of
+        # any log-derived failures already counted above.
+        blocked="true"
+        reasons="${reasons}structured test results red (runner '$test_runner_seen'); "
+    fi
+    # Per-member positive signal, evaluated on top of the shared base.
+    local positive="false"
     case "$role" in
         requirements_verifier)
-            # Check if pending tasks remain
+            # Positive: tests not red AND no pending tasks. A present queue file
+            # with pending>0 is a hard "not done"; an ABSENT queue file is not
+            # itself disqualifying (a legit run need not have one), it just means
+            # this member relies on the base test evidence.
+            local pending=0
             if [ -f "$loki_dir/queue/pending.json" ]; then
-                local pending
                 pending=$(_QUEUE_FILE="$loki_dir/queue/pending.json" python3 -c "import json, os; print(len(json.load(open(os.environ['_QUEUE_FILE']))))" 2>/dev/null || echo "0")
                 if [ "$pending" -gt 0 ]; then
-                    vote="CONTINUE"
+                    blocked="true"
                     reasons="${reasons}$pending tasks still pending; "
                 fi
             fi
+            if [ "$test_evidence" = "pass" ] && [ "$pending" -eq 0 ]; then
+                positive="true"
+            fi
             ;;
         test_auditor)
-            # Check if any test log exists at all
-            local has_tests=false
-            for f in "$loki_dir"/logs/test-*.log "$loki_dir"/logs/*test*.log; do
-                [ -f "$f" ] && has_tests=true && break
-            done
-            if [ "$has_tests" = "false" ]; then
-                vote="CONTINUE"
-                reasons="${reasons}no test results found; "
+            # Positive requires a REAL passing test signal, not merely the
+            # absence of a failing one: a structured result with runner != none
+            # AND pass == true. {"runner":"none"} (no suite ran) is NOT positive
+            # test evidence for this member, and a missing file is not either, so
+            # a no-tests / greenfield project leaves test_auditor at CONTINUE.
+            if [ "$test_evidence" = "absent" ]; then
+                reasons="${reasons}no structured test results found; "
+            elif [ "$test_runner_seen" = "none" ]; then
+                reasons="${reasons}no real test suite ran (runner none); "
+            elif [ "$test_evidence" = "pass" ]; then
+                positive="true"
             fi
             ;;
         devils_advocate)
-            # Check for TODO/FIXME markers
+            # Positive: tests not red AND a low TODO/FIXME density. A high marker
+            # count is a hard "not done"; a missing/absent test base means no
+            # positive evidence even when TODOs are low.
             local todo_count
             todo_count=$(grep -rl "TODO\|FIXME\|HACK\|XXX" . --include="*.ts" --include="*.js" --include="*.py" --include="*.sh" 2>/dev/null | wc -l | tr -d ' ')
             if [ "$todo_count" -gt 5 ]; then
-                vote="CONTINUE"
+                blocked="true"
                 reasons="${reasons}$todo_count files with TODO/FIXME markers; "
             fi
+            if [ "$test_evidence" = "pass" ] && [ "$todo_count" -le 5 ]; then
+                positive="true"
+            fi
+            ;;
+        *)
+            # Unknown role: fall back to the shared base signal only.
+            if [ "$test_evidence" = "pass" ]; then
+                positive="true"
+            fi
             ;;
     esac
+    # Final decision: COMPLETE only when nothing blocks AND positive evidence
+    # is present. Otherwise CONTINUE (the affirmative-evidence default).
+    if [ "$blocked" = "false" ] && [ "$positive" = "true" ]; then
+        vote="COMPLETE"
+    fi
     # Clean up trailing separator
     reasons="${reasons%; }"
     if [ -z "$reasons" ]; then
-        reasons="all checks passed for $role ($criteria)"
+        if [ "$vote" = "COMPLETE" ]; then
+            reasons="positive evidence present, no failures for $role ($criteria)"
+        else
+            reasons="no positive completion evidence for $role ($criteria)"
+        fi
     fi
     echo "$vote $reasons"
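
Note on the `council_evaluate_member` hunks above: the final decision reduces to "COMPLETE only when nothing blocks and the member's positive signal holds". The sketch below restates that rule in Python under simplifying assumptions: log-derived failures are passed in as a `blocked` flag rather than scanned from `.loki/` files, and the test evidence is supplied directly as `absent`, `pass`, or `fail`. The function name `member_vote` and the runner value `"pytest"` are hypothetical.

```python
def member_vote(role, test_evidence, runner, blocked=False, pending=0, todo_files=0):
    """Return 'COMPLETE' or 'CONTINUE' for one heuristic council member.

    test_evidence is 'absent', 'pass', or 'fail'; runner is the recorded test
    runner name, or 'none' when no suite ran. Simplified illustration only.
    """
    if test_evidence == "fail":
        blocked = True  # a red structured result blocks every member

    if role == "requirements_verifier":
        if pending > 0:
            blocked = True
        positive = test_evidence == "pass" and pending == 0
    elif role == "test_auditor":
        # Needs a real passing suite; runner 'none' is not positive evidence.
        positive = test_evidence == "pass" and runner != "none"
    elif role == "devils_advocate":
        if todo_files > 5:
            blocked = True
        positive = test_evidence == "pass" and todo_files <= 5
    else:
        # Unknown role: shared base signal only.
        positive = test_evidence == "pass"

    return "COMPLETE" if (not blocked and positive) else "CONTINUE"


ROLES = ("requirements_verifier", "test_auditor", "devils_advocate")

greenfield = [member_vote(r, "absent", "none") for r in ROLES]
passing = [member_vote(r, "pass", "pytest") for r in ROLES]
no_runner = [member_vote(r, "pass", "none") for r in ROLES]

print("greenfield:", greenfield)
print("passing:   ", passing)
print("no runner: ", no_runner)
```

With no `test-results.json` at all, every member stays at CONTINUE, which is the greenfield case the comment describes. With a recorded `{"runner":"none","pass":true}` result, only `test_auditor` withholds its vote in this sketch.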

package/autonomy/hooks/migration-hooks.sh CHANGED

@@ -317,14 +317,38 @@ hook_pre_healing_modify() {
     if [[ -f "$heal_dir/friction-map.json" ]]; then
         local blocked
         blocked=$(python3 -c "
-import json, sys
+import json, os, sys
 file_path = sys.argv[1]
 strict = sys.argv[2] == 'true'
 with open(sys.argv[3]) as f:
     data = json.load(f)
+# Path-aware match (not raw substring 'in', which over-matched app.py against
+# myapp.py and under-matched src/foo.py against a foo.py:10 location). Friction
+# locations are formatted 'path:line' (or just 'path'); strip a trailing
+# ':<line>' then compare by basename and normalized path so the same file is
+# matched regardless of how it was referenced.
+def norm(p):
+    # Drop a trailing ':<line>' (and optional ':<col>') suffix from a location.
+    parts = p.rsplit(':', 1)
+    while len(parts) == 2 and parts[1].isdigit():
+        p = parts[0]
+        parts = p.rsplit(':', 1)
+    return p
+def matches(target, loc):
+    loc = norm(loc)
+    if not target or not loc:
+        return False
+    # Exact normalized-path match, or same basename. Basename equality is the
+    # path-aware replacement for substring containment.
+    if os.path.normpath(target) == os.path.normpath(loc):
+        return True
+    return os.path.basename(target) == os.path.basename(loc)
 for friction in data.get('frictions', []):
     loc = friction.get('location', '')
-    if file_path in loc:
+    if matches(file_path, loc):
         cls = friction.get('classification', 'unknown')
         safe = friction.get('safe_to_remove', False)
         if cls in ('business_rule', 'unknown') and not safe:
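
Note on the hunk above: the difference between the old substring test and the new path-aware match can be checked directly. The sketch below copies `norm` and `matches` from the added lines (with only the parameter names and docstrings changed) and runs them against a few example locations. The example paths are made up for illustration.

```python
import os


def norm(location):
    """Strip trailing ':<line>' and ':<col>' suffixes from a friction location."""
    parts = location.rsplit(":", 1)
    while len(parts) == 2 and parts[1].isdigit():
        location = parts[0]
        parts = location.rsplit(":", 1)
    return location


def matches(target, location):
    """Exact normalized-path match, or same basename."""
    location = norm(location)
    if not target or not location:
        return False
    if os.path.normpath(target) == os.path.normpath(location):
        return True
    return os.path.basename(target) == os.path.basename(location)


# (target file, friction location) pairs; all paths here are hypothetical.
cases = [
    ("app.py", "myapp.py:12"),
    ("src/foo.py", "foo.py:10"),
    ("src/foo.py", "./src/foo.py:10:4"),
    ("src/foo.py", "lib/foo.py"),
]
for target, loc in cases:
    print(f"{target!r} vs {loc!r}: substring={target in loc}, path-aware={matches(target, loc)}")
```

The first two cases are the over-match and under-match named in the diff comment. The last case shows a consequence of the basename fallback: two files with the same name in different directories also match, so a friction recorded against `lib/foo.py` applies to `src/foo.py` as well.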
@@ -343,9 +367,101 @@ print('OK')
         fi
     fi
+    # Capture a pre-edit snapshot so post_healing_modify can revert ONLY the
+    # healing edit on test failure (not unrelated uncommitted changes, and not
+    # via git checkout which discards everything). Keyed by file path.
+    _heal_snapshot_save "$heal_dir" "$file_path"
+    return 0
+}
+# Snapshot path helper: maps a target file path to its snapshot blob location.
+# Uses a flat directory with the path's basename plus a hash of the full path
+# to avoid collisions between same-named files in different directories.
+_heal_snapshot_path() {
+    local heal_dir="$1"
+    local file_path="$2"
+    local key
+    key=$(printf '%s' "$file_path" | cksum | awk '{print $1"-"$2}')
+    printf '%s/snapshots/%s.%s' "$heal_dir" "$(basename "$file_path")" "$key"
+}
+# Save a pre-edit snapshot of file_path. If the file does not exist yet (the
+# healing edit will CREATE it), write a sentinel marker instead so the revert
+# path knows to remove the file rather than restore content.
+#
+# Pairing contract: hook_pre_healing_modify (which calls this) MUST run for a
+# file before hook_post_healing_modify reverts it. The snapshot is refreshed on
+# every pre call, so a post without a matching fresh pre could restore a stale
+# blob. On the success path the snapshot is intentionally left in place; the
+# next pre overwrites it.
+_heal_snapshot_save() {
+    local heal_dir="$1"
+    local file_path="$2"
+    [[ -z "$file_path" ]] && return 0
+    local snap_dir="$heal_dir/snapshots"
+    mkdir -p "$snap_dir" 2>/dev/null || return 0
+    local snap
+    snap=$(_heal_snapshot_path "$heal_dir" "$file_path")
+    if [[ -f "$file_path" ]]; then
+        cp "$file_path" "$snap" 2>/dev/null || return 0
+        rm -f "$snap.absent" 2>/dev/null || true
+    else
+        # File does not exist pre-edit: record an "absent" marker, drop any
+        # stale content snapshot.
+        rm -f "$snap" 2>/dev/null || true
+        : > "$snap.absent" 2>/dev/null || true
+    fi
     return 0
 }
+# Restore file_path from its pre-edit snapshot, reverting ONLY the healing edit.
+# Echoes an accurate human-readable message describing what actually happened
+# (content restored / healing-added file removed / could not revert). Returns 0
+# when the revert succeeded as reported, 1 when it could not be performed.
+_heal_snapshot_restore() {
+    local heal_dir="$1"
+    local file_path="$2"
+    if [[ -z "$file_path" ]]; then
+        echo "No file path given; nothing reverted."
+        return 1
+    fi
+    local snap
+    snap=$(_heal_snapshot_path "$heal_dir" "$file_path")
+    if [[ -f "$snap" ]]; then
+        # Pre-edit content snapshot exists: restore exactly that content, which
+        # preserves any unrelated uncommitted changes present before the edit.
+        if cp "$snap" "$file_path" 2>/dev/null; then
+            echo "Healing edit reverted to pre-edit snapshot."
+            return 0
+        fi
+        echo "Could not restore pre-edit snapshot for ${file_path}; file left as-is."
+        return 1
+    fi
+    if [[ -f "$snap.absent" ]]; then
+        # File did not exist pre-edit: the healing edit created it. Remove only
+        # that file, not unrelated state.
+        if [[ ! -e "$file_path" ]]; then
+            echo "Healing-added file ${file_path} no longer present; nothing to remove."
+            return 0
+        fi
+        if rm -f "$file_path" 2>/dev/null; then
+            echo "Healing-added file ${file_path} removed."
+            return 0
+        fi
+        echo "Could not remove healing-added file ${file_path}; file left as-is."
+        return 1
+    fi
+    # No snapshot was captured (pre_healing_modify did not run for this file).
+    # Be honest: do not claim a revert that did not happen, and do NOT fall back
+    # to a destructive git checkout.
+    echo "No pre-edit snapshot found for ${file_path}; could not revert (left as-is)."
+    return 1
+}
 # Hook: post_healing_modify - runs AFTER agent modifies a file in healing mode
 # Verifies characterization tests still pass after modification
 hook_post_healing_modify() {
@@ -384,9 +500,17 @@ hook_post_healing_modify() {
         test_output=$(cat "$test_result_file")
         rm -f "$test_result_file"
-        # Revert the change - characterization tests must pass
-        git -C "$codebase_path" checkout -- "$file_path" 2>/dev/null || true
-        echo "HOOK_BLOCKED: Characterization tests failed after healing modification to ${file_path}. Change reverted."
+        # Revert ONLY the healing edit using the pre-edit snapshot captured by
+        # hook_pre_healing_modify. Do NOT use `git checkout -- "$file_path"`:
+        # that discards ALL uncommitted changes to the file (not just the
+        # healing edit) and silently no-ops for an untracked file while still
+        # claiming the change was reverted. Report exactly what happened.
+        local revert_msg
+        # _heal_snapshot_restore returns nonzero when it could not revert; we
+        # surface the outcome via its message (recorded below) rather than a
+        # code, and must not let a nonzero return abort under set -e.
+        revert_msg=$(_heal_snapshot_restore "$heal_dir" "$file_path") || true
+        echo "HOOK_BLOCKED: Characterization tests failed after healing modification to ${file_path}. ${revert_msg}"
         echo "Test output: ${test_output}"
         # Record failure in failure-modes.json
@@ -404,12 +528,12 @@ data.setdefault('modes', []).append({
     'trigger': 'healing_modification',
     'file': sys.argv[2],
     'behavior': 'Characterization tests failed after modification',
-    'recovery': 'Change automatically reverted',
+    'recovery': sys.argv[3],
     'is_intentional': False
 })
 with open(sys.argv[1], 'w') as f:
     json.dump(data, f, indent=2)
-" "$heal_dir/failure-modes.json" "$file_path" 2>/dev/null || true
+" "$heal_dir/failure-modes.json" "$file_path" "$revert_msg" 2>/dev/null || true
         fi
         return 1
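
Note on the snapshot-based revert above: the three outcomes of `_heal_snapshot_restore` (content restored, healing-added file removed, no snapshot) can be reproduced with a small Python sketch. This is an illustration under stated assumptions, not a port of the Bash helpers: `zlib.crc32` stands in for the `cksum` key and does not produce the same value, the messages are shortened, and the function names `snapshot_path`, `snapshot_save`, and `snapshot_restore` are hypothetical.

```python
import os
import shutil
import tempfile
import zlib


def snapshot_path(heal_dir, file_path):
    # Basename plus a checksum of the full path, so same-named files in
    # different directories do not collide. crc32 is a stand-in for `cksum`.
    key = zlib.crc32(file_path.encode("utf-8"))
    return os.path.join(heal_dir, "snapshots", f"{os.path.basename(file_path)}.{key}")


def snapshot_save(heal_dir, file_path):
    """Save pre-edit content, or an '.absent' marker if the file does not exist."""
    snap = snapshot_path(heal_dir, file_path)
    os.makedirs(os.path.dirname(snap), exist_ok=True)
    if os.path.isfile(file_path):
        shutil.copyfile(file_path, snap)
        if os.path.exists(snap + ".absent"):
            os.remove(snap + ".absent")
    else:
        if os.path.exists(snap):
            os.remove(snap)
        open(snap + ".absent", "w").close()


def snapshot_restore(heal_dir, file_path):
    """Return (ok, message) describing what actually happened."""
    snap = snapshot_path(heal_dir, file_path)
    if os.path.isfile(snap):
        shutil.copyfile(snap, file_path)
        return True, "content restored"
    if os.path.isfile(snap + ".absent"):
        if os.path.exists(file_path):
            os.remove(file_path)
            return True, "healing-added file removed"
        return True, "nothing to remove"
    return False, "no snapshot; left as-is"


with tempfile.TemporaryDirectory() as root:
    heal_dir = os.path.join(root, "heal")
    existing = os.path.join(root, "existing.py")
    created = os.path.join(root, "created.py")
    unpaired = os.path.join(root, "unpaired.py")

    with open(existing, "w") as f:
        f.write("uncommitted work\n")

    # Pre hook: snapshot both targets before the healing edit touches them.
    snapshot_save(heal_dir, existing)
    snapshot_save(heal_dir, created)

    # Simulated healing edits: one overwrite, one newly created file.
    with open(existing, "w") as f:
        f.write("healing edit\n")
    with open(created, "w") as f:
        f.write("new file\n")

    # Post hook on test failure: revert each file; `unpaired` never had a pre call.
    results = [snapshot_restore(heal_dir, p) for p in (existing, created, unpaired)]
    with open(existing) as f:
        restored_text = f.read()
    created_exists = os.path.exists(created)

print(results)
```

The existing file returns to its pre-edit content, including work that was never committed, which is what `git checkout -- "$file_path"` would have discarded. The file with no matching pre call is reported as not reverted instead of being claimed as reverted.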