loki-mode 7.26.0 → 7.28.0
- package/README.md +15 -13
- package/SKILL.md +11 -2
- package/VERSION +1 -1
- package/autonomy/completion-council.sh +310 -6
- package/autonomy/context-tracker.py +32 -7
- package/autonomy/grill.sh +321 -0
- package/autonomy/lib/trust_metrics.py +636 -0
- package/autonomy/loki +142 -0
- package/autonomy/prd-checklist.sh +248 -14
- package/autonomy/run.sh +283 -32
- package/autonomy/spec.sh +646 -0
- package/autonomy/verify.sh +1130 -0
- package/dashboard/__init__.py +1 -1
- package/dashboard/static/index.html +1 -1
- package/docs/COMPARISON.md +9 -9
- package/docs/COMPETITIVE-ANALYSIS.md +18 -37
- package/docs/INSTALLATION.md +1 -1
- package/docs/auto-claude-comparison.md +9 -6
- package/docs/certification/01-core-concepts/lesson.md +3 -3
- package/docs/competitive/emergence-others-analysis.md +1 -1
- package/docs/competitive/replit-lovable-analysis.md +1 -1
- package/docs/cursor-comparison.md +1 -1
- package/docs/prd-purple-lab-platform.md +1 -1
- package/docs/show-hn-post.md +2 -2
- package/loki-ts/dist/loki.js +2 -2
- package/mcp/__init__.py +1 -1
- package/package.json +2 -1
- package/providers/codex.sh +3 -2
- package/references/agent-types.md +9 -9
- package/references/agents.md +8 -8
- package/references/business-ops.md +1 -1
- package/references/competitive-analysis.md +1 -1
- package/skills/agents.md +3 -3
- package/skills/providers.md +3 -3
- package/skills/quality-gates.md +46 -0
package/skills/quality-gates.md
CHANGED
@@ -158,6 +158,18 @@ pass, not PRD-semantic correctness (the council vote is the semantic check).
 The common false-block is a project that was ALREADY red before the run; the
 one-step opt-out is the escape hatch.
 
+**Inconclusive-baseline disclosure (v7.28.0):** when the gate cannot establish a
+diff baseline (reason `no_git_repo` or `no_run_start_sha`) it still passes
+through (it never blocks a non-git project), but completion is no longer
+independently verified. Instead of passing silently, the gate writes
+`.loki/state/evidence-inconclusive.json` (recording the reason, iteration, and
+timestamp) and emits an `evidence_inconclusive` trust event. The run summary in
+`.loki/COMPLETION.txt` then carries one honest line:
+`Evidence gate: inconclusive (<reason>) - completion not independently
+verified`. The record is removed automatically on any later run that resolves a
+conclusive baseline. This is a diff-baseline-only disclosure: red tests still
+block completion independently, regardless of the inconclusive state.
+
 **Override-judge knobs (v7.5.4+):**
 
 ```bash
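The inconclusive-baseline behaviour added in the hunk above can be pictured with a minimal Python sketch. It is not the package's implementation (the real gate lives in the shell scripts listed in this diff). The function name `record_baseline` and the JSON field names are hypothetical; the source only says that the reason, iteration, and timestamp are recorded, and that the record is removed once a later run resolves a conclusive baseline. The summary string is assumed to be a single line, since the wrap in the markdown looks like ordinary line filling.

```python
import json
import time
from pathlib import Path
from typing import Optional

# The two reasons the documentation names for an inconclusive diff baseline.
INCONCLUSIVE_REASONS = {"no_git_repo", "no_run_start_sha"}


def record_baseline(loki_dir: Path, reason: Optional[str], iteration: int) -> Optional[str]:
    """Write or clear the inconclusive record and return the summary line, if any.

    reason=None stands for "a conclusive baseline was resolved" (hypothetical convention).
    """
    record = loki_dir / "state" / "evidence-inconclusive.json"
    if reason is None:
        # A conclusive run removes any stale record left by an earlier run.
        record.unlink(missing_ok=True)
        return None
    if reason not in INCONCLUSIVE_REASONS:
        raise ValueError(f"unknown inconclusive reason: {reason}")
    record.parent.mkdir(parents=True, exist_ok=True)
    # Field names are assumptions made for this sketch.
    payload = {"reason": reason, "iteration": iteration, "timestamp": int(time.time())}
    record.write_text(json.dumps(payload))
    # The gate still passes through; it only discloses that nothing was verified.
    return f"Evidence gate: inconclusive ({reason}) - completion not independently verified"
```

Calling `record_baseline(Path(".loki"), "no_git_repo", 3)` would write the record and return the disclosure line, while a later call with `reason=None` deletes the record again. Red tests blocking completion is a separate check and is deliberately not modelled here.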
@@ -220,6 +232,40 @@ crash via the primitive's `finally` cleanup.
 
 ---
 
+## Held-out spec evals (v7.28.0, default-on when reserved)
+
+Anti-reward-hacking for the checklist. Before the first verification,
+`checklist_select_heldout` (`autonomy/prd-checklist.sh`) deterministically
+reserves a slice of checklist items as held-out:
+`count = clamp(round(0.25 * N), 1, 5)` for checklists with `N >= 4` items
+(smaller checklists reserve nothing). Selection is reproducible, not random:
+items are ranked by `sha256(id)` and the first `count` are taken, then written
+once to `.loki/checklist/held-out.json` (idempotent: never reselected once
+chosen).
+
+Held-out item IDs are EXCLUDED from everything the build loop sees: the checklist
+summary, the visible counts, and the per-iteration checklist gate all omit them,
+so the build agent cannot tune to those specific acceptance checks. The
+completion council evaluates them only at the ship gate via
+`council_heldout_gate` (`autonomy/completion-council.sh`): a held-out item whose
+status is `failing` (and not waived) blocks completion exactly like any other
+critical failure. Each evaluation records a `heldout_eval` trust event with the
+verdict and pass/fail counts (no event is emitted when nothing is reserved).
+
+```bash
+LOKI_HELDOUT_GATE=0 # opt out: the held-out gate never blocks completion.
+# Default is on (1), and the gate is inert anyway when
+# no held-out items were reserved (N < 4).
+```
+
+Honest limit: this protects against the PROMPT FEED, not against filesystem
+access. The reservation lives on disk at `.loki/checklist/held-out.json`; an
+adversarial agent with read access to the working tree can open that file and
+learn which items were held out. The guarantee is that held-out items are kept
+out of the build loop's own prompt context, not that they are sandboxed.
+
+---
+
 ## Uncertainty-gated escalation (v7.19.2, default-on)
 
 When Loki is likely stuck or thrashing, it escalates proactively to the human
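The held-out reservation rule and ship-gate check described in the added "Held-out spec evals" section can be sketched in Python as follows. This is an illustration under stated assumptions, not the code in `autonomy/prd-checklist.sh` or `autonomy/completion-council.sh`. The function names, the item dictionary shape (`id`, `status`, `waived`), the half-up rounding mode, and ranking by ascending hex digest are all assumptions; the source gives only the formula, the `sha256(id)` ranking, and the blocking condition.

```python
import hashlib
import math
import os


def heldout_count(n: int) -> int:
    """clamp(round(0.25 * N), 1, 5) for N >= 4; smaller checklists reserve nothing."""
    if n < 4:
        return 0
    rounded = math.floor(0.25 * n + 0.5)  # half-up rounding is an assumption
    return max(1, min(5, rounded))


def select_heldout(item_ids: list[str]) -> list[str]:
    """Rank ids by their sha256 hex digest and take the first `count`.

    The result depends only on the set of ids, not on their input order,
    which is what makes the selection reproducible rather than random.
    """
    ranked = sorted(item_ids, key=lambda i: hashlib.sha256(i.encode("utf-8")).hexdigest())
    return ranked[: heldout_count(len(item_ids))]


def heldout_gate_blocks(items: list[dict], heldout_ids: list[str]) -> bool:
    """True when a reserved item is failing and not waived, unless opted out."""
    if os.environ.get("LOKI_HELDOUT_GATE", "1") == "0":
        return False  # opt-out: the gate never blocks completion
    reserved = set(heldout_ids)
    return any(
        it["id"] in reserved and it["status"] == "failing" and not it.get("waived", False)
        for it in items
    )
```

Under these assumptions a checklist of 8 items reserves 2, a checklist of 20 or more reserves 5, and a checklist of 3 reserves none, so the gate is inert. Persisting the selection once to `.loki/checklist/held-out.json` and never reselecting is left out of the sketch.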