loki-mode 7.27.0 → 7.28.1

package/README.md CHANGED
@@ -27,6 +27,7 @@
  - **Spec-driven, autonomous, with a built-in trust layer** -- Hand Loki a spec, walk away, come back to working code with tests. The full RARV-C closure loop (Reason - Act - Reflect - Verify - Close) runs until the work is actually done, not just attempted. The verified-completion evidence gate (`skills/quality-gates.md`) refuses any "done" claim on an empty git diff against the run-start commit, and blocks completion when tests run red, so "complete" means proven, not promised.
  - **Production quality built in** -- 11 quality gates (`skills/quality-gates.md`), blind 3-reviewer code review (`run.sh:run_code_review()`), anti-sycophancy checks
  - **Standalone verification: `loki verify`** -- Run Loki's deterministic gates (build, tests, static analysis, secret scan, dependency audit) against any branch or PR diff, including code written by other agents or humans. CI-ready exit codes (0 VERIFIED, 1 CONCERNS, 2 BLOCKED), machine-readable evidence at `.loki/verify/evidence.json`. Inconclusive evidence is never reported as VERIFIED (v7.27.0).
+ - **Living spec and pre-build interrogation** -- `loki spec` locks a spec and detects drift deterministically (`spec.lock`, `drift-report.json`, and a `SPEC_DRIFT` finding in `loki verify` with CI exit codes), so you can tell when the build diverges from what was agreed. `loki grill` runs a Devil's-Advocate interrogation of the spec before you build, surfacing gaps and contradictions early (v7.28.0).
  - **Live App Preview** -- The dashboard embeds the locally-running app in an iframe so you can interact with it immediately during a build. Use `loki preview` (alias `loki open`) to print the URL and open it in your browser. Local-first: no hosted service, no vendor lock (v7.24.0).
  - **Compose-first fullstack** -- When a spec needs more than one service (web + database + cache) Loki generates a 12-factor `docker-compose.yml` with healthchecks, `depends_on` wiring, env-var config, and a `.env.example`. The Live App Preview surfaces the web service URL (not a database port), and health reflects the web service's Docker healthcheck so a crashed app shows as crashed even when the database stays up. Single-service apps stay on a plain run command. All local-first, no hosted service (v7.26.0).
  - **Intelligent `loki start`** -- For interactive foreground runs the dashboard auto-opens in the browser (cross-platform; skipped in CI, SSH-without-TTY, and piped runs; opt out with `LOKI_NO_AUTO_OPEN=1`). The completion summary shows "Your app is live at <url>" so you know exactly where to try what Loki just built. The autonomous loop passes Claude Code's `--effort`, `--max-budget-usd`, and `--fallback-model` on every iteration (each gated on CLI support and individual opt-out env vars) for better long-run unattended execution (v7.25.0).
@@ -90,7 +91,7 @@ loki quick "build a landing page with a signup form"
  |--------|---------|-------|
  | **Bun (recommended)** | `bun install -g loki-mode` | Fastest startup for CLI commands. |
  | **Homebrew** | `brew tap asklokesh/tap && brew install loki-mode` | Auto-installs Bun as a dep |
- | **Docker** | `docker pull asklokesh/loki-mode:7.7.31 && docker run --rm asklokesh/loki-mode:7.7.31 start prd.md` | Bun pre-installed in image |
+ | **Docker** | `docker pull asklokesh/loki-mode:7.28.1 && docker run --rm asklokesh/loki-mode:7.28.1 start prd.md` | Bun pre-installed in image |
  | **npm (compat)** | `npm install -g loki-mode` | Works without Bun (bash fallback). Migrate any time with `loki self-update --to bun`. |
 
  **Upgrading:**
@@ -150,7 +151,7 @@ The next major release sunsets the Bash runtime entirely. There is no firm calen
  | Method | Command |
  |--------|---------|
  | **Homebrew** | `brew tap asklokesh/tap && brew install loki-mode` |
- | **Docker** | `docker pull asklokesh/loki-mode:7.7.31` |
+ | **Docker** | `docker pull asklokesh/loki-mode:7.28.1` |
  | **Inside Claude Code** | `claude --dangerously-skip-permissions` then type "Loki Mode" |
  | **Git clone** | `git clone https://github.com/asklokesh/loki-mode.git` |
 
package/SKILL.md CHANGED
@@ -3,7 +3,7 @@ name: loki-mode
  description: Autonomous spec-driven build system with a built-in trust layer. It does not call work done until it is verified (RARV-C closure loop, 11 quality gates, completion council, verified-completion evidence gate). Triggers on "Loki Mode". Takes a spec (PRD, GitHub issue, OpenAPI doc, etc.) to deployed product with minimal human intervention. Provider-agnostic. Requires --dangerously-skip-permissions flag.
  ---
 
- # Loki Mode v7.27.0
+ # Loki Mode v7.28.1
 
  **You are an autonomous agent. You make decisions. You do not ask questions. You do not stop.**
 
@@ -335,6 +335,15 @@ See `references/core-workflow.md` for the full RARV-C contract.
 
  ---
 
+ ## Trust-layer additions (v7.28.0)
+
+ Two completion-trust features extend the verification gates. Full details in `skills/quality-gates.md`.
+
+ - **Held-out spec evals:** ~25% of checklist items (deterministic `sha256(id)` order, `N >= 4`) are reserved into `.loki/checklist/held-out.json` and excluded from the build prompt feed; the completion council blocks if a held-out item fails. Opt out with `LOKI_HELDOUT_GATE=0`. Honest limit: this guards the prompt feed, not a sandbox; the reservation file is on disk and an agent with filesystem access can read it.
+ - **Inconclusive-baseline disclosure:** when the evidence gate cannot establish a diff baseline (`no_git_repo` / `no_run_start_sha`) it writes `.loki/state/evidence-inconclusive.json` and `COMPLETION.txt` carries an honest "not independently verified" line. It never blocks non-git projects; red tests still block.
+
+ ---
+
  ## Concurrency and Security Hardening (v7.5.7 - v7.5.13)
 
  Three back-to-back patches closed cross-process and security gaps. No user-facing behavior change on the default flow; verify via the cited paths.
@@ -383,4 +392,4 @@ See `CHANGELOG.md` entries [7.5.7], [7.5.8], [7.5.13] for the per-fix list and r
 
  ---
 
- **v7.27.0 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**
+ **v7.28.1 | [Autonomi](https://www.autonomi.dev/) flagship product | ~260 lines core**
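Reviewer sketch (not part of the package): the SKILL.md text above describes the held-out reservation only in outline (roughly 25% of items, deterministic `sha256(id)` order, `N >= 4`). The snippet below is a minimal illustration of how such a selection could work. The function name `select_held_out`, the rounding rule, and the reading of `N` as the total checklist size are assumptions made for illustration; the actual selection code is not shown in this diff.

```python
import hashlib


def select_held_out(item_ids, fraction=0.25, min_items=4):
    """Hypothetical helper: reserve about `fraction` of the checklist ids.

    Ordering by sha256(id) makes the choice deterministic and independent of
    the order in which items appear in the checklist.
    """
    unique_ids = sorted(set(item_ids))
    # Assumption: with fewer than `min_items` items nothing is reserved.
    if len(unique_ids) < min_items:
        return []
    ranked = sorted(
        unique_ids,
        key=lambda item_id: hashlib.sha256(item_id.encode("utf-8")).hexdigest(),
    )
    # Assumption: round down, but always reserve at least one item.
    count = max(1, int(len(ranked) * fraction))
    return ranked[:count]


if __name__ == "__main__":
    ids = ["item-%d" % i for i in range(8)]
    print(select_held_out(ids))
```

Because the ranking depends only on the ids, re-running the selection on an unchanged checklist reserves the same items, which is what makes the reservation file reproducible.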
package/VERSION CHANGED
@@ -1 +1 @@
- 7.27.0
+ 7.28.1
@@ -1098,19 +1098,24 @@ council_reverify_checklist() {
  council_checklist_gate() {
      local results_file=".loki/checklist/verification-results.json"
      local waivers_file=".loki/checklist/waivers.json"
+     local heldout_file=".loki/checklist/held-out.json"
 
      # No checklist = no gate (backwards compatible)
      if [ ! -f "$results_file" ]; then
          return 0
      fi
 
-     # Check for critical failures, excluding waived items
+     # Check for critical failures, excluding waived AND held-out items. Held-out
+     # items (v7.28.0) must NOT block here: they are evaluated separately by
+     # council_heldout_gate at the ship gate, and surfacing them in this gate's
+     # block report would leak their identity back into the build loop.
      local gate_result
-     gate_result=$(_RESULTS_FILE="$results_file" _WAIVERS_FILE="$waivers_file" python3 -c "
+     gate_result=$(_RESULTS_FILE="$results_file" _WAIVERS_FILE="$waivers_file" _HELDOUT_FILE="$heldout_file" python3 -c "
  import json, sys, os
 
  results_file = os.environ['_RESULTS_FILE']
  waivers_file = os.environ.get('_WAIVERS_FILE', '')
+ heldout_file = os.environ.get('_HELDOUT_FILE', '')
 
  try:
      with open(results_file) as f:
@@ -1129,12 +1134,22 @@ if waivers_file and os.path.exists(waivers_file):
      except (json.JSONDecodeError, KeyError):
          pass
 
- # Find critical failures not waived
+ # Load held-out item ids (excluded from this gate)
+ heldout_ids = set()
+ if heldout_file and os.path.exists(heldout_file):
+     try:
+         with open(heldout_file) as f:
+             heldout_ids = set(json.load(f).get('held_out', []))
+     except (json.JSONDecodeError, KeyError):
+         pass
+
+ # Find critical failures not waived and not held-out
  critical_failures = []
  for cat in results.get('categories', []):
      for item in cat.get('items', []):
          if item.get('priority') == 'critical' and item.get('status') == 'failing':
-             if item.get('id') not in waived_ids:
+             iid = item.get('id')
+             if iid not in waived_ids and iid not in heldout_ids:
                  critical_failures.append(item.get('title', item.get('id', 'unknown')))
 
  if critical_failures:
@@ -1187,6 +1202,221 @@ GATE_EOF
      return 0
  }
 
+ #===============================================================================
+ # Council Held-out Spec Eval Gate (v7.28.0) - anti-reward-hacking
+ #===============================================================================
+ # Held-out checklist items are reserved at PRD-checklist generation time and are
+ # excluded from the prompt feed the build loop sees (checklist_summary, the build
+ # prompt, and council_checklist_gate). The completion council evaluates them only
+ # here, at the ship gate. Scope of the guarantee: this protects the prompt feed,
+ # not a sandbox. .loki/checklist/held-out.json is plain on-disk JSON, so a
+ # non-cooperative agent with filesystem tools can read the reservation directly;
+ # the protection is against feeding held-out items to the loop, not isolation.
+ # The gate uses the SAME verification machinery the
+ # checklist already uses: council_reverify_checklist re-runs checklist-verify.py
+ # over the FULL checklist (including held-out items), so this gate just reads
+ # the held-out items' freshly-computed statuses from verification-results.json.
+ #
+ # A held-out item with status 'failing' blocks completion exactly like the
+ # evidence gate (return 1 = CONTINUE). Pending/inconclusive items pass through.
+ # Default-on ONLY when held-out items exist; opt out with LOKI_HELDOUT_GATE=0
+ # (byte-identical to prior behavior: no read, no write).
+ council_heldout_gate() {
+     # Knob first: opt-out is exact-as-today, before any file read or write.
+     [ "${LOKI_HELDOUT_GATE:-1}" = "0" ] && return 0
+
+     local results_file=".loki/checklist/verification-results.json"
+     local heldout_file=".loki/checklist/held-out.json"
+     local waivers_file=".loki/checklist/waivers.json"
+
+     # No held-out reservation = no gate (default-off when nothing reserved).
+     if [ ! -f "$heldout_file" ] || [ ! -f "$results_file" ]; then
+         return 0
+     fi
+
+     if [ -z "${COUNCIL_STATE_DIR:-}" ]; then
+         COUNCIL_STATE_DIR="${TARGET_DIR:-.}/.loki/council"
+     fi
+
+     # Evaluate held-out items against their freshly-verified statuses. Output is
+     # a single line "<verdict> <pass> <fail>" where verdict is NONE (no held-out
+     # items reserved, gate inert), STALE (ids reserved but ZERO matched current
+     # items -> reservation orphaned by a checklist regeneration), PASS, or BLOCK.
+     # The failing titles are NOT carried in this line (a checklist title may
+     # contain ':' or '|'); they are read separately from the held-out JSON block
+     # below in the BLOCK branch.
+     local gate_result
+     gate_result=$(_RESULTS_FILE="$results_file" _HELDOUT_FILE="$heldout_file" _WAIVERS_FILE="$waivers_file" python3 -c "
+ import json, sys, os
+
+ results_file = os.environ['_RESULTS_FILE']
+ heldout_file = os.environ['_HELDOUT_FILE']
+ waivers_file = os.environ.get('_WAIVERS_FILE', '')
+
+ try:
+     with open(results_file) as f:
+         results = json.load(f)
+     with open(heldout_file) as f:
+         heldout_ids = set(json.load(f).get('held_out', []))
+ except (json.JSONDecodeError, IOError, KeyError):
+     print('NONE 0 0')
+     sys.exit(0)
+
+ # No held-out items reserved (e.g. N<4): gate is inert. Emit NONE so the caller
+ # skips the trust-event entirely (no no-op heldout_eval pollution per round).
+ if not heldout_ids:
+     print('NONE 0 0')
+     sys.exit(0)
+
+ # Waived held-out items are not counted as failures (operator override path).
+ waived_ids = set()
+ if waivers_file and os.path.exists(waivers_file):
+     try:
+         with open(waivers_file) as f:
+             waived_ids = {w['item_id'] for w in json.load(f).get('waivers', []) if w.get('active', True)}
+     except (json.JSONDecodeError, KeyError):
+         pass
+
+ # HIGH-1(b): track how many held-out ids actually matched a current item. If the
+ # reservation lists ids but ZERO matched (orphaned after a checklist regen), the
+ # gate must NOT report PASS (that reads as evaluated-and-passed). 'matched' is
+ # distinct from passed/failed: an all-pending matched set legitimately yields
+ # passed=0 failed=0 and must stay PASS/pass-through, not STALE.
+ matched = 0
+ passed = 0
+ failed = 0
+ for cat in results.get('categories', []):
+     for item in cat.get('items', []):
+         iid = item.get('id', '')
+         if iid not in heldout_ids:
+             continue
+         matched += 1
+         if iid in waived_ids:
+             continue
+         status = item.get('status')
+         if status == 'verified':
+             passed += 1
+         elif status == 'failing':
+             failed += 1
+         # pending/inconclusive: pass-through (not counted as pass or fail block)
+
+ if matched == 0:
+     # Reservation is stale: ids exist but none map to a current item. Selection-
+     # side repair (checklist_select_heldout) fixes this next iteration; emit STALE
+     # so this round is recorded honestly rather than as a silent PASS.
+     print('STALE 0 0')
+     sys.exit(0)
+
+ verdict = 'BLOCK' if failed > 0 else 'PASS'
+ print('%s %d %d' % (verdict, passed, failed))
+ " 2>/dev/null || echo "NONE 0 0")
+
+     local verdict pass_count fail_count
+     read -r verdict pass_count fail_count <<< "$gate_result"
+     [ -z "$verdict" ] && verdict="NONE"
+     [ -z "$pass_count" ] && pass_count=0
+     [ -z "$fail_count" ] && fail_count=0
+
+     # NONE: no held-out items reserved -> gate inert, no trust-event, no block.
+     # LOW-5: still clear any stale block report so a prior BLOCK does not linger
+     # after the reservation is emptied (matches the PASS branch cleanup).
+     if [ "$verdict" = "NONE" ]; then
+         if [ -n "${COUNCIL_STATE_DIR:-}" ] && [ -f "$COUNCIL_STATE_DIR/heldout-block.json" ]; then
+             rm -f "$COUNCIL_STATE_DIR/heldout-block.json"
+         fi
+         return 0
+     fi
+
+     # STALE: reservation orphaned by a checklist regeneration (ids reserved but
+     # zero matched current items). Emit a STALE trust event so the round is not
+     # silently counted as a pass, warn, clear any stale block file (LOW-5), and
+     # return 0 (pass-through): blocking here would loop forever, and the
+     # selection-side repair re-selects valid ids on the next iteration.
+     if [ "$verdict" = "STALE" ]; then
+         log_warn "[Council] Held-out reservation is stale (checklist regenerated; reserved ids match no current item). Selection will re-select next iteration; not treating this as an evaluated PASS."
+         if type record_trust_event_bash &>/dev/null; then
+             record_trust_event_bash "heldout_eval" \
+                 "verdict=STALE" \
+                 "pass=0" \
+                 "fail=0" \
+                 >/dev/null 2>&1 || true
+         fi
+         if [ -n "${COUNCIL_STATE_DIR:-}" ] && [ -f "$COUNCIL_STATE_DIR/heldout-block.json" ]; then
+             rm -f "$COUNCIL_STATE_DIR/heldout-block.json"
+         fi
+         return 0
+     fi
+
+     # Trust-metrics: durable per-evaluation record (pass/fail counts). Emitted
+     # only when held-out items actually exist (verdict PASS or BLOCK).
+     if type record_trust_event_bash &>/dev/null; then
+         record_trust_event_bash "heldout_eval" \
+             "verdict=$verdict" \
+             "pass=$pass_count" \
+             "fail=$fail_count" \
+             >/dev/null 2>&1 || true
+     fi
+
+     if [ "$verdict" = "BLOCK" ]; then
+         # Read failing held-out titles directly from the data (colon/pipe-safe).
+         local titles_json titles_display
+         titles_json=$(_RESULTS_FILE="$results_file" _HELDOUT_FILE="$heldout_file" _WAIVERS_FILE="$waivers_file" python3 -c "
+ import json, os
+ results = json.load(open(os.environ['_RESULTS_FILE']))
+ heldout_ids = set(json.load(open(os.environ['_HELDOUT_FILE'])).get('held_out', []))
+ waived_ids = set()
+ wf = os.environ.get('_WAIVERS_FILE', '')
+ if wf and os.path.exists(wf):
+     try:
+         waived_ids = {w['item_id'] for w in json.load(open(wf)).get('waivers', []) if w.get('active', True)}
+     except Exception:
+         pass
+ titles = []
+ for cat in results.get('categories', []):
+     for item in cat.get('items', []):
+         iid = item.get('id', '')
+         if iid in heldout_ids and iid not in waived_ids and item.get('status') == 'failing':
+             titles.append(item.get('title', iid))
+ print(json.dumps(titles[:5]))
+ " 2>/dev/null || echo '[]')
+         titles_display=$(_T="$titles_json" python3 -c "
+ import json, os
+ try:
+     print(', '.join(json.loads(os.environ['_T'])))
+ except Exception:
+     print('')
+ " 2>/dev/null || echo "")
+         log_warn "[Council] Held-out gate BLOCKED: ${fail_count} held-out acceptance check(s) failing: ${titles_display}"
+         log_warn "[Council] Held-out checks are hidden from the build loop and verified only at completion. To opt out: set LOKI_HELDOUT_GATE=0"
+
+         mkdir -p "$COUNCIL_STATE_DIR" 2>/dev/null || true
+         local ho_file="$COUNCIL_STATE_DIR/heldout-block.json"
+         local ho_tmp="${ho_file}.tmp"
+         local timestamp
+         timestamp=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
+         cat > "$ho_tmp" << HELDOUT_EOF
+ {
+     "status": "blocked",
+     "blocked": true,
+     "blocked_at": "$timestamp",
+     "iteration": ${ITERATION_COUNT:-0},
+     "reason": "held_out_checks_failing",
+     "passed": $pass_count,
+     "failed": $fail_count,
+     "failures": $titles_json
+ }
+ HELDOUT_EOF
+         mv "$ho_tmp" "$ho_file"
+         return 1
+     fi
+
+     # Gate passes: remove any stale block report.
+     if [ -f "$COUNCIL_STATE_DIR/heldout-block.json" ]; then
+         rm -f "$COUNCIL_STATE_DIR/heldout-block.json"
+     fi
+     return 0
+ }
+
  #===============================================================================
  # Council Evidence Hard Gate (v7.19.1) - "verified completion"
  #===============================================================================
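Reviewer sketch (not part of the package): the verdict logic embedded in `council_heldout_gate` above can be read in isolation as a pure function. The version below mirrors the inline Python from the hunk (the NONE / STALE / PASS / BLOCK outcomes and the `matched` counter); the function name `heldout_verdict` and its argument shapes are hypothetical, chosen only so the four outcomes can be exercised without the surrounding bash.

```python
def heldout_verdict(results, heldout_ids, waived_ids=frozenset()):
    """Return (verdict, passed, failed) for the held-out items.

    `results` is assumed to have the same shape the gate reads from
    verification-results.json: {"categories": [{"items": [{"id", "status"}]}]}.
    """
    if not heldout_ids:
        # Nothing reserved: the gate is inert.
        return ("NONE", 0, 0)

    matched = passed = failed = 0
    for cat in results.get("categories", []):
        for item in cat.get("items", []):
            iid = item.get("id", "")
            if iid not in heldout_ids:
                continue
            matched += 1
            if iid in waived_ids:
                continue  # waived items count as matched, never as failures
            status = item.get("status")
            if status == "verified":
                passed += 1
            elif status == "failing":
                failed += 1
            # any other status (pending, inconclusive) passes through

    if matched == 0:
        # Ids were reserved but none exists in the current checklist.
        return ("STALE", 0, 0)
    return ("BLOCK" if failed > 0 else "PASS", passed, failed)


if __name__ == "__main__":
    sample = {"categories": [{"items": [
        {"id": "a", "status": "verified"},
        {"id": "b", "status": "failing"},
        {"id": "c", "status": "pending"},
    ]}]}
    print(heldout_verdict(sample, {"a", "b"}))
```

Keeping `matched` separate from `passed` and `failed` is what lets an all-pending reservation stay PASS while an orphaned reservation is reported as STALE.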
@@ -1224,13 +1454,20 @@ council_evidence_gate() {
      # read, so none is tracked (avoids SC2034 dead-assignment).
      local diff_fails="false"
      local diff_files=0
+     # v7.28.0: track WHY the diff baseline could not be established, so the
+     # inconclusive case is surfaced honestly instead of passing through silently.
+     # diff_inconclusive stays "false" on the conclusive branch below.
+     local diff_inconclusive="false"
+     local diff_inconclusive_reason=""
      if ! git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
          # No git repo => cannot prove fabrication => inconclusive => pass-through.
-         :
+         diff_inconclusive="true"
+         diff_inconclusive_reason="no_git_repo"
      elif [ -z "$base_sha" ]; then
          # No baseline captured (non-git/zero-commit run, or never set) =>
          # inconclusive => pass-through. Never false-block a legit first run.
-         :
+         diff_inconclusive="true"
+         diff_inconclusive_reason="no_run_start_sha"
      else
          # Count the UNION of three change sources (auto-commit is not guaranteed,
          # so committed-only would false-block a dirty-but-real working tree):
@@ -1308,6 +1545,40 @@ else:
      # Missing test-results.json (the else of the -f check) likewise leaves
      # test_fails="false" => inconclusive => pass-through (no file = no gate).
 
+     # --- v7.28.0: inconclusive-baseline lifecycle -------------------------------
+     # When the gate cannot establish a diff baseline (no git repo, or no run-start
+     # SHA) it does NOT block (would break non-git projects), but completion is no
+     # longer independently verified. Record that fact durably so the completion
+     # summary can surface one honest line, and emit a trust-event. The record is
+     # about the DIFF baseline only, so it is written regardless of the test
+     # outcome. On any CONCLUSIVE baseline we remove a stale record.
+     local inconclusive_file="${TARGET_DIR:-.}/.loki/state/evidence-inconclusive.json"
+     if [ "$diff_inconclusive" = "true" ]; then
+         mkdir -p "${TARGET_DIR:-.}/.loki/state" 2>/dev/null || true
+         local inc_tmp="${inconclusive_file}.tmp"
+         local inc_ts
+         inc_ts=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
+         cat > "$inc_tmp" << INCONCLUSIVE_EOF
+ {
+     "inconclusive": true,
+     "recorded_at": "$inc_ts",
+     "iteration": ${ITERATION_COUNT:-0},
+     "reason": "$diff_inconclusive_reason"
+ }
+ INCONCLUSIVE_EOF
+         mv "$inc_tmp" "$inconclusive_file" 2>/dev/null || rm -f "$inc_tmp" 2>/dev/null || true
+         if type record_trust_event_bash &>/dev/null; then
+             record_trust_event_bash "evidence_inconclusive" \
+                 "reason=$diff_inconclusive_reason" \
+                 >/dev/null 2>&1 || true
+         fi
+     else
+         # Conclusive baseline: clear any stale inconclusive record.
+         if [ -f "$inconclusive_file" ]; then
+             rm -f "$inconclusive_file"
+         fi
+     fi
+
      # --- Block decision: block iff DIFF FAILS or TEST FAILS ---
      if [ "$diff_fails" != "true" ] && [ "$test_fails" != "true" ]; then
          # Gate passes: remove any stale block report.
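Reviewer sketch (not part of the package): the inconclusive-baseline lifecycle added in the two hunks above amounts to "classify why no baseline exists, then write or clear one small JSON record". The Python below restates that flow under stated assumptions: `classify_baseline` and `sync_inconclusive_record` are hypothetical names, and the temp-file-then-rename step stands in for the `cat > tmp` plus `mv` pair in the bash.

```python
import json
import os
from datetime import datetime, timezone


def classify_baseline(in_git_repo, base_sha):
    """Return the inconclusive reason, or None when the baseline is conclusive."""
    if not in_git_repo:
        return "no_git_repo"
    if not base_sha:
        return "no_run_start_sha"
    return None


def sync_inconclusive_record(state_dir, reason, iteration=0):
    """Write the record when `reason` is set; remove a stale one otherwise."""
    path = os.path.join(state_dir, "evidence-inconclusive.json")
    if reason is None:
        if os.path.exists(path):
            os.remove(path)
        return None
    os.makedirs(state_dir, exist_ok=True)
    record = {
        "inconclusive": True,
        "recorded_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "iteration": iteration,
        "reason": reason,
    }
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as handle:
        json.dump(record, handle, indent=2)
    # Rename over the target so readers never see a half-written file.
    os.replace(tmp_path, path)
    return path
```

Neither branch affects the block decision: as in the diff, an inconclusive baseline is only recorded, and red tests remain the thing that blocks.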
@@ -2025,6 +2296,14 @@ council_evaluate() {
          return 1 # CONTINUE - can't complete with critical failures
      fi
 
+     # v7.28.0: held-out spec eval gate - verify the hidden acceptance checks the
+     # build loop never saw. Runs after the visible-checklist gate, using the
+     # statuses council_reverify_checklist just recomputed over the full checklist.
+     if ! council_heldout_gate; then
+         log_info "[Council] Completion blocked by held-out spec eval gate"
+         return 1 # CONTINUE - cannot complete with failing held-out checks
+     fi
+
      # Phase 2.5 (v7.19.1): evidence hard gate - block completion unless there is
      # real evidence that files changed AND tests are green.
      if ! council_evidence_gate; then
@@ -21,6 +21,7 @@ Usage:
  import argparse
  import json
  import os
+ import re
  import sys
  from datetime import datetime, timezone
  from pathlib import Path
@@ -61,13 +62,31 @@ def get_pricing(provider):
      return PRICING_BY_PROVIDER.get(provider, PRICING_BY_PROVIDER["claude"])
 
 
- def derive_project_slug():
-     """Derive Claude's project slug from cwd (matches Claude's naming convention)."""
-     cwd = os.getcwd()
-     # Claude uses: /Users/name/project -> -Users-name-project
+ def derive_naive_project_slug():
+     """Legacy slug rule: replace only '/' with '-'.
+
+     Kept for backward compatibility: stale sessions created before the
+     sanitization fix live under this naming. find_session_file falls back to
+     it when the correctly-sanitized slug dir does not exist.
+     """
+     cwd = os.path.realpath(os.getcwd())
      return "-" + cwd.lstrip("/").replace("/", "-")
 
 
+ def derive_project_slug():
+     """Derive Claude's project slug from cwd.
+
+     Claude Code sanitizes EVERY non-alphanumeric character in the realpath to
+     '-' (rule: re.sub(r'[^a-zA-Z0-9]', '-', path)). The earlier implementation
+     replaced only '/', so any path with underscores, dots, or other special
+     characters produced a slug that did not match Claude's real session dir,
+     silently zeroing token/cost capture. realpath resolves symlinks (e.g.
+     /tmp -> /private/tmp) to match Claude's own keying.
+     """
+     cwd = os.path.realpath(os.getcwd())
+     return "-" + re.sub(r"[^a-zA-Z0-9]", "-", cwd.lstrip("/"))
+
+
  def find_session_file(provider, session_file_arg=None):
      """Find the most recently modified session file for the given provider.
 
@@ -84,10 +103,16 @@ def find_session_file(provider, session_file_arg=None):
          return path if path.exists() else None
 
      if provider == "claude":
-         project_slug = derive_project_slug()
-         session_dir = Path.home() / ".claude" / "projects" / project_slug
+         projects_root = Path.home() / ".claude" / "projects"
+         session_dir = projects_root / derive_project_slug()
+         # Backward compatibility: if the correctly-sanitized slug dir does not
+         # exist but a stale session under the old naive slug does, use it.
          if not session_dir.is_dir():
-             return None
+             naive_dir = projects_root / derive_naive_project_slug()
+             if naive_dir.is_dir():
+                 session_dir = naive_dir
+             else:
+                 return None
          jsonl_files = sorted(session_dir.glob("*.jsonl"), key=lambda p: p.stat().st_mtime, reverse=True)
          return jsonl_files[0] if jsonl_files else None
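Reviewer sketch (not part of the package): the slug change above is easiest to see on a concrete path. The two helpers below apply the same string rules as `derive_naive_project_slug` and `derive_project_slug` from the diff, but take the path as an argument instead of reading the current working directory, so the difference can be shown without touching the filesystem. The names `naive_slug` and `sanitized_slug` and the example paths are illustrative only.

```python
import re


def naive_slug(path):
    """Legacy rule from the diff: only '/' becomes '-'."""
    return "-" + path.lstrip("/").replace("/", "-")


def sanitized_slug(path):
    """New rule from the diff: every non-alphanumeric character becomes '-'."""
    return "-" + re.sub(r"[^a-zA-Z0-9]", "-", path.lstrip("/"))


if __name__ == "__main__":
    for example in ("/Users/name/project", "/Users/name/my_app.v2"):
        print(example, naive_slug(example), sanitized_slug(example))
```

For a path made only of letters, digits, and slashes the two rules agree, which is why the old behavior went unnoticed on simple project paths; they diverge as soon as the path contains an underscore or a dot.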