loki-mode 7.26.0 → 7.28.0

@@ -158,6 +158,18 @@ pass, not PRD-semantic correctness (the council vote is the semantic check).
  The common false-block is a project that was ALREADY red before the run; the
  one-step opt-out is the escape hatch.
 
+ **Inconclusive-baseline disclosure (v7.28.0):** when the gate cannot establish a
+ diff baseline (reason `no_git_repo` or `no_run_start_sha`) it still passes
+ through (it never blocks a non-git project), but completion is no longer
+ independently verified. Instead of passing silently, the gate writes
+ `.loki/state/evidence-inconclusive.json` (recording the reason, iteration, and
+ timestamp) and emits an `evidence_inconclusive` trust event. The run summary in
+ `.loki/COMPLETION.txt` then carries one honest line:
+ `Evidence gate: inconclusive (<reason>) - completion not independently
+ verified`. The record is removed automatically on any later run that resolves a
+ conclusive baseline. This is a diff-baseline-only disclosure: red tests still
+ block completion independently, regardless of the inconclusive state.
+
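The disclosure flow described above can be sketched roughly as follows. This is a minimal illustration, not the package's implementation: the helper names (`record_inconclusive`, `summary_line`, `clear_inconclusive`) and the JSON field names are assumptions; only the file path, the two reason codes, and the summary wording come from the text. Trust-event emission is not modelled.

```bash
# Hypothetical helpers; the JSON field names are assumed, not confirmed.

# record_inconclusive REASON ITERATION [ROOT]
# Writes the inconclusive record under ROOT (default: current directory).
record_inconclusive() {
  local reason=$1 iteration=$2 root=${3:-.}
  case "$reason" in
    no_git_repo|no_run_start_sha) ;;                      # the two documented reasons
    *) echo "unknown reason: $reason" >&2; return 1 ;;
  esac
  mkdir -p "$root/.loki/state"
  printf '{"reason":"%s","iteration":%d,"timestamp":"%s"}\n' \
    "$reason" "$iteration" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    > "$root/.loki/state/evidence-inconclusive.json"
}

# summary_line REASON
# Prints the one-line disclosure carried by the run summary.
summary_line() {
  printf 'Evidence gate: inconclusive (%s) - completion not independently verified\n' "$1"
}

# clear_inconclusive [ROOT]
# Removes the record once a later run resolves a conclusive baseline.
clear_inconclusive() {
  rm -f "${1:-.}/.loki/state/evidence-inconclusive.json"
}
```

For example, `record_inconclusive no_git_repo 3` followed by `summary_line no_git_repo >> .loki/COMPLETION.txt` would leave both the record and the disclosure line in place until `clear_inconclusive` runs.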
  **Override-judge knobs (v7.5.4+):**
 
  ```bash
@@ -220,6 +232,40 @@ crash via the primitive's `finally` cleanup.
 
  ---
 
+ ## Held-out spec evals (v7.28.0, default-on when reserved)
+
+ Anti-reward-hacking for the checklist. Before the first verification,
+ `checklist_select_heldout` (`autonomy/prd-checklist.sh`) deterministically
+ reserves a slice of checklist items as held-out:
+ `count = clamp(round(0.25 * N), 1, 5)` for checklists with `N >= 4` items
+ (smaller checklists reserve nothing). Selection is reproducible, not random:
+ items are ranked by `sha256(id)` and the first `count` are taken, then written
+ once to `.loki/checklist/held-out.json` (idempotent: never reselected once
+ chosen).
+
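The selection rule above can be sketched as follows. This is an illustration under stated assumptions, not the package's `checklist_select_heldout`: the function names are hypothetical, rounding is assumed to be round-half-up, the hash is assumed to cover the raw ID bytes with no trailing newline, and ranking is assumed to be ascending by hex digest. It relies on `sha256sum` from GNU coreutils being on `PATH`.

```bash
# Hypothetical sketch of count = clamp(round(0.25 * N), 1, 5) for N >= 4.

# heldout_count N -> prints how many items to reserve.
heldout_count() {
  local n=$1 c
  if [ "$n" -lt 4 ]; then
    echo 0                       # smaller checklists reserve nothing
    return 0
  fi
  c=$(( (n + 2) / 4 ))           # integer form of round-half-up(0.25 * n) (assumed mode)
  if [ "$c" -lt 1 ]; then c=1; fi
  if [ "$c" -gt 5 ]; then c=5; fi
  echo "$c"
}

# select_heldout: reads item IDs (one per line) on stdin and prints the
# reserved IDs, ranked by sha256 of the ID so input order does not matter.
select_heldout() {
  local ids=() id count
  while IFS= read -r id; do
    if [ -n "$id" ]; then ids+=("$id"); fi
  done
  count=$(heldout_count "${#ids[@]}")
  if [ "$count" -eq 0 ]; then return 0; fi
  for id in "${ids[@]}"; do
    # Emit "<hex digest> <id>" so the digest is the sort key.
    printf '%s %s\n' "$(printf '%s' "$id" | sha256sum | cut -d' ' -f1)" "$id"
  done | LC_ALL=C sort | head -n "$count" | cut -d' ' -f2-
}
```

With eight item IDs on stdin, `select_heldout` prints two of them, and the same two regardless of the order in which the IDs arrive. Writing the result once to `.loki/checklist/held-out.json` and never reselecting is left out of the sketch.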
+ Held-out item IDs are EXCLUDED from everything the build loop sees: the checklist
+ summary, the visible counts, and the per-iteration checklist gate all omit them,
+ so the build agent cannot tune to those specific acceptance checks. The
+ completion council evaluates them only at the ship gate via
+ `council_heldout_gate` (`autonomy/completion-council.sh`): a held-out item whose
+ status is `failing` (and not waived) blocks completion exactly like any other
+ critical failure. Each evaluation records a `heldout_eval` trust event with the
+ verdict and pass/fail counts (no event is emitted when nothing is reserved).
+
+ ```bash
+ LOKI_HELDOUT_GATE=0 # opt out: the held-out gate never blocks completion.
+ # Default is on (1), and the gate is inert anyway when
+ # no held-out items were reserved (N < 4).
+ ```
+
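The blocking rule and the opt-out can be sketched together as below. This is a hypothetical stand-in for the decision only, not `council_heldout_gate` itself: the tab-separated input format (`id`, `status`, `waived`) and the literal `true` for a waived item are assumptions made for the example, and the `heldout_eval` trust event is not modelled.

```bash
# Hypothetical sketch: exit status 0 = pass, 1 = block completion.
# Input on stdin: one held-out item per line as "id<TAB>status<TAB>waived".
heldout_gate() {
  # Opt-out: with LOKI_HELDOUT_GATE=0 the gate never blocks (default is 1).
  if [ "${LOKI_HELDOUT_GATE:-1}" = "0" ]; then
    return 0
  fi
  local id status waived blocked=0
  while IFS=$'\t' read -r id status waived; do
    # Only a failing item that has not been waived blocks.
    if [ "$status" = "failing" ] && [ "$waived" != "true" ]; then
      echo "held-out item $id is failing" >&2
      blocked=1
    fi
  done
  return "$blocked"
}
```

An empty input passes, which matches the note that the gate is inert when nothing was reserved.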
+ Honest limit: this protects against the PROMPT FEED, not against filesystem
+ access. The reservation lives on disk at `.loki/checklist/held-out.json`; an
+ adversarial agent with read access to the working tree can open that file and
+ learn which items were held out. The guarantee is that held-out items are kept
+ out of the build loop's own prompt context, not that they are sandboxed.
+
+ ---
+
  ## Uncertainty-gated escalation (v7.19.2, default-on)
 
  When Loki is likely stuck or thrashing, it escalates proactively to the human