loki-mode 7.19.0 → 7.19.2
- package/SKILL.md +2 -2
- package/VERSION +1 -1
- package/autonomy/completion-council.sh +483 -0
- package/autonomy/config.example.yaml +26 -0
- package/autonomy/run.sh +103 -3
- package/dashboard/__init__.py +1 -1
- package/dashboard/server.py +40 -11
- package/dashboard/static/index.html +543 -497
- package/docs/INSTALLATION.md +1 -1
- package/docs/UNCERTAINTY-ESCALATION-PLAN.md +396 -0
- package/docs/VERIFIED-COMPLETION-PLAN.md +462 -0
- package/loki-ts/dist/loki.js +2 -2
- package/mcp/__init__.py +1 -1
- package/package.json +1 -1
- package/skills/quality-gates.md +115 -0
package/docs/INSTALLATION.md
CHANGED
@@ -0,0 +1,396 @@
# Uncertainty-Gated Escalation (Loki Mode v7.19.2)

Design only. No implementation code lands with this document. Every hook point
below was read from live source; line numbers drift, so the verified anchors in
section 1 are the contract a dev re-confirms before editing.

## Goal

When Loki is likely stuck or thrashing, escalate to the human PROACTIVELY via
the EXISTING pause + notify + handoff machinery, instead of silently burning
iterations until max-iterations. No new metacognition: reuse three proxy signals
that already exist. Escalate only when at least two of the three co-occur for N
consecutive rounds. Default-on, opt-out with LOKI_UNCERTAINTY_ESCALATION=0,
byte-identical when off.

## Architectural spine: split DECISION from ACTION

This is the load-bearing decision and everything else falls out of it.

- DECISION: a pure-ish function `uncertainty_should_escalate` lives in
  `autonomy/completion-council.sh` next to the other `council_*` state helpers.
  It reads ONLY persisted state (`state.json`, `convergence.log`, and its own
  `.loki/state/uncertainty.json`), mutates only its own state file, and returns
  rc 0 (escalate now) / rc 1 (do not). It fires NO notifications and touches NO
  PAUSE file. This makes it sourceable and testable exactly like
  `council_evidence_gate` (completion-council.sh:907): a test writes a fake
  `state.json` into a throwaway dir and asserts the return code, with zero real
  side effects on the developer's machine.
- ACTION: the run.sh call site (new region right after
  `council_track_iteration`, run.sh:12389-12391) interprets rc 0 and performs
  the side effects: loud terminal line, `write_structured_handoff`,
  `notify_intervention_needed`, write a `signals/UNCERTAINTY_ESCALATION` marker,
  and `touch .loki/PAUSE`. It also emits the perpetual-mode honesty line.

Consequence: the two code slices live in DIFFERENT files (decision in
completion-council.sh, action in run.sh), so the dev fleet can build them in
parallel without collision.

---

## 1. Verified hook points (read from live source)

All paths relative to repo root `/Users/lokesh/git/loki-mode`.

### Proxy 1 - circuit-breaker no-change counter
- Var declared: `autonomy/completion-council.sh:70` (`COUNCIL_CONSECUTIVE_NO_CHANGE=0`).
- Incremented: `completion-council.sh:178`; reset: `:180`. Driven by a combined
  hash of `git diff --stat HEAD` + staged diff + last commit hash
  (`:165-182`).
- Limit knob: `COUNCIL_STAGNATION_LIMIT` (`:56`, default 5).
- Persisted: written into `state.json` as `consecutive_no_change`
  (`completion-council.sh:232` -> `:237` json.dump). THIS is what the decision
  function reads (not the live shell var, which is out of scope in a sourced
  test).
- Updated every iteration via `council_track_iteration` (run.sh:12390).

### Proxy 2 - file-churn oscillation / reverts
- Existing data: `convergence.log` is appended at
  `completion-council.sh:215` with line format
  `timestamp|iteration|files_changed|consecutive_no_change|done_signals`.
  CRITICAL: `files_changed` (`:208`) is a COUNT
  (`git diff --name-only HEAD | wc -l`), NOT file identities. A count cannot
  detect "same files back and forth."
- The combined diff hash exists at `completion-council.sh:175`
  (`combined_hash`), persisted only transiently in the shell var
  `COUNCIL_LAST_DIFF_HASH` (`:73`, `:182`) - immediate-repeat only.
- DECISION (see section 5 limits): proxy 2 is implemented as DIFF-HASH
  RECURRENCE-AT-DISTANCE. We persist a small ring buffer (last
  ~6 hashes) of `combined_hash` in `uncertainty.json`. Proxy 2 fires when the
  current hash equals a hash seen 2+ rounds back (A -> B -> A pattern). The
  immediate repeat (A -> A) is already proxy 1, so recurrence-at-distance is the
  genuine oscillation/revert signal. This is a tiny, justified addition (one
  bounded array in an existing JSON file), NOT heavy new tracking. The hash to
  read is the same `combined_hash` proxy 1 already computes; the decision
  function recomputes it cheaply from `git diff --stat HEAD` or, preferably,
  `council_track_iteration` writes it into `state.json` (`last_diff_hash`) so the
  decision function stays pure (no git calls). See slice A for which.
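
Illustrative sketch of the ring check, not the shipped code. It assumes the ring
is passed oldest-to-newest and does not yet contain the current hash;
`ring_recurs_at_distance` and `ring_push` are hypothetical names. It takes one
possible reading of "seen 2+ rounds back": the current hash must have been seen
and then displaced by a different hash, so a flat A -> A -> A run stays proxy 1
only.

```bash
#!/usr/bin/env bash
# Hypothetical helper: ring_recurs_at_distance CURRENT_HASH [RING_ENTRY...]
# Returns 0 when CURRENT_HASH appeared earlier in the ring AND a different hash
# followed it (A -> B -> A). Returns 1 for unseen hashes and immediate repeats.
ring_recurs_at_distance() {
    local current=$1 seen_match=0 h
    shift
    for h in "$@"; do
        if [ "$h" = "$current" ]; then
            seen_match=1
        elif [ "$seen_match" -eq 1 ]; then
            return 0    # current hash was seen, then left, and is now back
        fi
    done
    return 1
}

# Hypothetical helper: ring_push NEW_HASH [RING_ENTRY...]
# Appends NEW_HASH and keeps only the newest 6 entries (the bound named above).
ring_push() {
    local new=$1
    shift
    set -- "$@" "$new"
    while [ "$#" -gt 6 ]; do shift; done
    printf '%s\n' "$*"
}

if ring_recurs_at_distance A A B; then echo "A after (A,B): p2 hot"; fi
if ! ring_recurs_at_distance A A; then echo "A after (A): immediate repeat, p2 cold"; fi
```

The second call is the A -> A case: it stays cold here because that signal already belongs to proxy 1.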

### Proxy 3 - persistent council split
- approve_count computed in `council_vote` (`completion-council.sh:270`,
  tallied `:388`, anti-sycophancy adjust `:417`).
- effective_threshold: `completion-council.sh:293`
  (`(COUNCIL_SIZE * 2 + 2) / 3`, the ceiling(2/3) formula).
- Persisted: each council round appends to `state['verdicts']`
  (`completion-council.sh:449-455`) with keys `iteration`, `timestamp`,
  `approve`, `reject`, `result` (`APPROVED`/`REJECTED`). NOTE: threshold is NOT
  stored. That is fine: `result == "REJECTED"` already encodes
  `approve < threshold`. A split round = `result == "REJECTED" AND approve >= 1`
  (council could not converge: at least one approver, still short of threshold).
  Do NOT go looking for a stored threshold; it is not there by design.
- CADENCE: `verdicts` only appends when the council actually VOTES, which is
  every `COUNCIL_CHECK_INTERVAL` OR when the circuit breaker forces a vote
  (`council_should_stop`, completion-council.sh:2045-2051; circuit check
  :2039-2043). So proxy 3 is STALE between votes. This is acceptable because in
  the stuck regime we care about, proxy 1 going hot
  (`consecutive_no_change >= COUNCIL_STAGNATION_LIMIT`) is exactly what TRIPS the
  circuit breaker (`council_circuit_breaker_triggered`,
  completion-council.sh:252) and forces a council vote, which refreshes proxy 3.
  Verified: `council_should_stop` sets `should_check=true` when
  `circuit_triggered=true` (:2047-2048). Document the between-votes staleness as
  a known limit (section 5).
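
A small worked sketch of the two rules above, for orientation only. It shows
that `(COUNCIL_SIZE * 2 + 2) / 3` in integer arithmetic equals ceil(2N/3), and
how a split round can be classified from `result` and `approve` alone. The
function names `council_threshold` and `is_split_round` are hypothetical.

```bash
#!/usr/bin/env bash
# ceil(2N/3) using shell integer division, per the formula quoted above.
council_threshold() {
    echo $(( ($1 * 2 + 2) / 3 ))
}

# Hypothetical classifier: is_split_round RESULT APPROVE_COUNT
# A split round is a rejection that still had at least one approver.
is_split_round() {
    [ "$1" = "REJECTED" ] && [ "$2" -ge 1 ]
}

for size in 3 4 5 7; do
    echo "COUNCIL_SIZE=$size -> threshold $(council_threshold "$size")"
done

if is_split_round REJECTED 1; then echo "REJECTED with 1 approver: split"; fi
if ! is_split_round REJECTED 0; then echo "REJECTED with 0 approvers: unanimous, not split"; fi
```

For council sizes 3, 4, 5 and 7 the thresholds come out as 2, 3, 4 and 5.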

### notify_intervention_needed
- `autonomy/run.sh:2328`. Signature: `notify_intervention_needed "$reason"`;
  thin wrapper over
  `send_notification "Intervention Needed" "$reason" "critical"`.

### PAUSE consume / clear path (perpetual-mode crux)
- Consumer: `check_human_intervention` (run.sh:12701), PAUSE branch
  `:12708`.
- Perpetual auto-clear: `:12711-12730`. In perpetual mode PAUSE is
  auto-cleared (`:12727 rm -f`) and `notify_intervention_needed` STILL fires
  (`:12726`). Only `BUDGET_EXCEEDED` (`:12712`) is carved out from
  auto-clear.
- Non-perpetual: PAUSE triggers `handle_pause` (run.sh:12842) and waits
  (`:12732-12742`).
- Consumed once per loop turn from the main loop: `check_human_intervention`
  is called at run.sh:11528, return-code switch `:11530-11533`
  (1 = restart loop, 2 = stop).
- IMPLICATION: escalation only WRITES PAUSE. The existing consumer halts (or, in
  perpetual mode, auto-clears + notifies). Perpetual degrade is therefore FREE -
  no new consumer logic. We detect perpetual at OUR site using the same vars
  (`AUTONOMY_MODE` / `PERPETUAL_MODE`, run.sh:12711) only to print the honest
  "notify-only; PAUSE will not halt this run" line.

### write_structured_handoff
- `autonomy/run.sh:8816`. Verified single live definition (the
  "active definition is below" comment at :8811 refers to
  `load_handoff_context`, not a second handoff def; grep shows one
  `write_structured_handoff()`). Signature:
  `write_structured_handoff "$reason"`; writes
  `.loki/memory/handoffs/<ts>.json` + `.md`.

### Loop point for the escalation check
- Slot the ACTION immediately AFTER `council_track_iteration` in the main loop:
  run.sh:12388-12391. At this point proxy 1 and proxy 2 are freshly written for
  this iteration, and proxy 3 is fresh exactly when it matters (circuit-forced
  vote). This is BEFORE the completion-promise / council checks
  (run.sh:12408+), so escalation is evaluated every iteration.

### Mirror precedent (action shape)
- Gate-escalation block run.sh:12308-12318 is the precedent to clone: write a
  `signals/` marker (`:12310`), call a handoff hook with its own opt-out
  (`:12314`), then `touch .loki/PAUSE` (`:12317`). Our action mirrors this with
  `write_structured_handoff` + `notify_intervention_needed` +
  `signals/UNCERTAINTY_ESCALATION` + `touch .loki/PAUSE`.

---

## 2. Escalation decision function design

### Inputs (all read from persisted state, no live shell vars)
1. `p1` = proxy 1 hot: from `state.json.consecutive_no_change`. Hot when
   `>= LOKI_UNCERTAINTY_NOCHANGE_MIN` (default = `COUNCIL_STAGNATION_LIMIT` - 1,
   i.e. "approaching circuit-breaker"). Reading slightly below the breaker limit
   lets us escalate BEFORE the breaker forces an end-state.
2. `p2` = proxy 2 hot: diff-hash recurrence-at-distance. Hot when the current
   `last_diff_hash` matches a hash at distance >= 2 in the ring buffer.
3. `p3` = proxy 3 hot: persistent split. Read the last `K` entries of
   `state.json.verdicts`; count consecutive trailing rounds where
   `result == "REJECTED" AND approve >= 1`. Hot when that run length
   `>= LOKI_UNCERTAINTY_SPLIT_ROUNDS` (default 2).
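
A minimal sketch of the `p3` run-length count, under stated assumptions: the
last `K` verdicts have already been flattened to one `RESULT APPROVE` pair per
line, oldest first (how they are extracted from `state.json` is not shown), and
`trailing_split_run` / `p3_hot` are hypothetical names.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "RESULT APPROVE" pairs on stdin, oldest first,
# and prints the length of the trailing run of split rounds.
trailing_split_run() {
    local run=0 result approve
    while read -r result approve; do
        if [ "$result" = "REJECTED" ] && [ "$approve" -ge 1 ]; then
            run=$((run + 1))
        else
            run=0    # any non-split round breaks the trailing streak
        fi
    done
    echo "$run"
}

# Hypothetical helper: p3 is hot when the trailing run reaches the knob value.
p3_hot() {
    local min=${LOKI_UNCERTAINTY_SPLIT_ROUNDS:-2}
    [ "$(trailing_split_run)" -ge "$min" ]
}

printf 'APPROVED 3\nREJECTED 1\nREJECTED 2\n' | trailing_split_run
```

Only the trailing streak counts: an `APPROVED` round, or a `REJECTED` round with zero approvers, resets the run to 0.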

### Co-occurrence + N-round debounce
- Per round (= per iteration; "round" is defined as one main-loop iteration),
  compute `hot_count = p1 + p2 + p3`.
- `co_occur = (hot_count >= 2)`.
- Maintain `consecutive_co_occur` in `uncertainty.json`:
  - if `co_occur`: increment; else reset to 0.
- Escalate (rc 0) when `consecutive_co_occur >= LOKI_UNCERTAINTY_ROUNDS`
  (the N knob, default 2; recommended range 2-3) AND not already escalated this
  episode (debounce flag, below).
- A single noisy proxy can NEVER escalate alone (requires hot_count >= 2).

### Debounce (escalate once per stuck-episode)
- `uncertainty.json` carries `escalated_episode: true|false`.
- On escalate, set `escalated_episode = true` and record
  `escalated_at_iteration`.
- Suppress re-fire while `escalated_episode == true`.
- RE-ARM (reset `escalated_episode = false` and `consecutive_co_occur = 0`) when
  `co_occur` becomes false in any later round (a proxy cleared => the episode is
  considered resolved; a new stuck episode may legitimately re-escalate). State
  the reset condition explicitly so a dev does not "helpfully" keep it latched.
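
The co-occurrence counter, the once-per-episode debounce, and the re-arm rule
can be sketched together as one small state machine. This is an illustration,
not the planned implementation: state is held in two shell variables standing
in for the `uncertainty.json` fields, and `uncertainty_step` is a hypothetical
name that takes the three proxy values (0 or 1) as arguments.

```bash
#!/usr/bin/env bash
# In-memory stand-ins for the uncertainty.json fields (assumption: the real
# function would load and store these through the state file).
consecutive_co_occur=0
escalated_episode=false

# Hypothetical: uncertainty_step P1 P2 P3   (each 0 or 1)
# Returns 0 = escalate now, 1 = do not.
uncertainty_step() {
    local rounds=${LOKI_UNCERTAINTY_ROUNDS:-2}
    local hot_count=$(( $1 + $2 + $3 ))
    if [ "$hot_count" -lt 2 ]; then
        consecutive_co_occur=0
        escalated_episode=false    # re-arm: the episode is considered resolved
        return 1
    fi
    consecutive_co_occur=$((consecutive_co_occur + 1))
    if [ "$escalated_episode" = true ]; then
        return 1                   # debounce: already escalated this episode
    fi
    if [ "$consecutive_co_occur" -ge "$rounds" ]; then
        escalated_episode=true
        return 0
    fi
    return 1
}

for round in "1 0 1" "1 0 1" "1 0 1" "1 0 0" "1 1 0" "1 1 0"; do
    # $round is intentionally unquoted so it splits into three arguments.
    if uncertainty_step $round; then
        echo "proxies [$round]: ESCALATE"
    else
        echo "proxies [$round]: no escalation"
    fi
done
```

With the default N of 2, the demo escalates on the second round, suppresses the third, re-arms on the fourth (only one proxy hot), and escalates again on the sixth.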

### State persistence
- File: `.loki/state/uncertainty.json` (singular; the `uncertainty-*.json` glob
  in the brief maps to this one file - keep it single to avoid an unbounded
  directory). Schema:
  ```json
  {
    "schema_version": "1.0.0",
    "consecutive_co_occur": 0,
    "escalated_episode": false,
    "escalated_at_iteration": 0,
    "diff_hash_ring": ["<h>", "<h>", "..."],
    "last_round_iteration": 0,
    "last_proxies": {"p1": false, "p2": false, "p3": false}
  }
  ```
- Ring buffer bounded to 6 entries (constant). All writes atomic temp+mv,
  mirroring evidence-block.json (`completion-council.sh:1059-1086`).
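
A generic sketch of the temp+mv pattern, assuming nothing about how the
evidence-block writer is actually coded. `atomic_write` is a hypothetical
helper; the temp file is created next to the target so the final `mv` is a
rename within one filesystem and a reader never sees a half-written file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: atomic_write TARGET   (new content arrives on stdin)
atomic_write() {
    local target=$1 tmp
    mkdir -p "$(dirname "$target")" || return 1
    # Same directory as the target, so mv is a rename rather than a copy.
    tmp=$(mktemp "${target}.tmp.XXXXXX") || return 1
    if cat > "$tmp" && mv -f "$tmp" "$target"; then
        return 0
    fi
    rm -f "$tmp"    # never leave a stray temp file behind on failure
    return 1
}

demo_dir=$(mktemp -d)
printf '{"consecutive_co_occur": 0, "escalated_episode": false}\n' \
    | atomic_write "$demo_dir/.loki/state/uncertainty.json"
cat "$demo_dir/.loki/state/uncertainty.json"
rm -rf "$demo_dir"
```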

### Knob-first byte-identical guard
First line of `uncertainty_should_escalate`, BEFORE any read or write:
```
[ "${LOKI_UNCERTAINTY_ESCALATION:-1}" = "0" ] && return 1
```
(rc 1 = do-not-escalate; mirrors `council_evidence_gate`'s knob-first guard at
completion-council.sh:909). When off: zero file reads, zero writes, zero state
file creation => byte-identical.

### Knobs summary (all opt-out / tunable, none required)
- `LOKI_UNCERTAINTY_ESCALATION` (default 1) - master on/off.
- `LOKI_UNCERTAINTY_ROUNDS` (default 2) - N consecutive co-occurrence rounds.
- `LOKI_UNCERTAINTY_NOCHANGE_MIN` (default `COUNCIL_STAGNATION_LIMIT - 1`) - p1
  threshold.
- `LOKI_UNCERTAINTY_SPLIT_ROUNDS` (default 2) - p3 split run length.
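
One way the four defaults could be resolved with plain parameter expansion,
shown as a sketch. `resolve_uncertainty_knobs` is a hypothetical name, and the
fallback of 5 for `COUNCIL_STAGNATION_LIMIT` simply repeats the default quoted
in section 1. The derived default for `LOKI_UNCERTAINTY_NOCHANGE_MIN` tracks
whatever stagnation limit is in effect, and an explicit value always wins.

```bash
#!/usr/bin/env bash
# Hypothetical helper: fill in any knob the environment did not set.
resolve_uncertainty_knobs() {
    : "${COUNCIL_STAGNATION_LIMIT:=5}"
    : "${LOKI_UNCERTAINTY_ESCALATION:=1}"
    : "${LOKI_UNCERTAINTY_ROUNDS:=2}"
    # Derived default: one below the circuit-breaker limit.
    : "${LOKI_UNCERTAINTY_NOCHANGE_MIN:=$((COUNCIL_STAGNATION_LIMIT - 1))}"
    : "${LOKI_UNCERTAINTY_SPLIT_ROUNDS:=2}"
}

resolve_uncertainty_knobs
echo "rounds=$LOKI_UNCERTAINTY_ROUNDS nochange_min=$LOKI_UNCERTAINTY_NOCHANGE_MIN split_rounds=$LOKI_UNCERTAINTY_SPLIT_ROUNDS"
```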

---

## 3. Disjoint dev slices (parallel-safe)

Binding constraints for EVERY slice: no version bumps (do not touch VERSION /
CHANGELOG), no git commits, no emojis, no em-dashes or en-dashes (ASCII hyphen
only), atomic temp+mv for all state writes, knob-first opt-out where the slice
touches the hot loop.

### Slice A - decision function + state schema (completion-council.sh)
- Region: add `uncertainty_should_escalate` and a tiny
  `_uncertainty_read_state` / `_uncertainty_write_state` pair near the other
  `council_*` state helpers (after `council_circuit_breaker_triggered`,
  i.e. around completion-council.sh:265, BEFORE `council_vote` at :270).
- Also add ONE line inside `council_track_iteration` to persist
  `state['last_diff_hash'] = combined_hash` (extend the python block at
  completion-council.sh:224-238 by adding the env var + one assignment) so the
  decision function reads the hash from state.json and stays pure (no git in the
  decision path). This is the only edit inside an existing function; keep it to a
  single key add to minimize collision with run.sh slice.
- Owns: `.loki/state/uncertainty.json` schema, ring buffer, co-occurrence +
  debounce logic, all four knobs' defaults.
- File-region disjoint from slice B (different file).

### Slice B - action + wiring (run.sh)
- Region: new block right after `council_track_iteration` call
  (run.sh:12389-12391).
- Logic:
  ```
  if type uncertainty_should_escalate >/dev/null 2>&1 && uncertainty_should_escalate; then
      # 1. loud terminal lines (section 6)
      # 2. write the signals/UNCERTAINTY_ESCALATION marker (as at run.sh:12310)
      write_structured_handoff "uncertainty_escalation"
      notify_intervention_needed "uncertainty_escalation"  # reason text illustrative
      touch .loki/PAUSE
      # 3. perpetual honesty line (section 6), only when perpetual mode is detected
  fi
  ```
- Clone the GATE_ESCALATION shape (run.sh:12308-12318) for marker + handoff +
  touch ordering.
- Perpetual detection: read `AUTONOMY_MODE` / `PERPETUAL_MODE`
  (same as run.sh:12711) ONLY to print the honest notify-only line.
- File-region disjoint from slices A, C, D.

### Slice C - tests (tests/test-uncertainty-escalation.sh)
- New file. Sources the real `uncertainty_should_escalate` from
  completion-council.sh, stubs `log_*`, runs per-case throwaway dirs. Models
  tests/test-evidence-gate.sh exactly. Asserts decision-only (no real notify /
  no real PAUSE because it calls the DECISION function, not the run.sh action).
- File-region disjoint (new file).

### Slice D - docs + knob registration
- Register the four knobs in the config-comment block (the env-var doc region
  around run.sh:91-128 and the yaml mapping near :282/:424) and
  `autonomy/config.example.yaml`. Add a short section to the user-facing docs.
- Keep edits to comment / config blocks; do not touch the hot loop. If this
  collides with slice B's run.sh edits, sequence D after B (the only soft
  dependency). Otherwise fully disjoint.

Recommended parallelism: A, C, D in parallel; B after A's function signature is
agreed (C can mock the signature meanwhile). 4 slices, 3 files + 1 new test +
docs.

---

## 4. Test plan (model: tests/test-evidence-gate.sh)

Harness: source the real completion-council.sh with `log_*` stubbed; call
`uncertainty_should_escalate` inside per-case `mktemp -d` dirs, each writing its
own `.loki/state/uncertainty.json` + `.loki/council/state.json` +
`.loki/council/convergence.log`. Assert BOTH rc and the mutated
`uncertainty.json` side effects. Loud SKIP (exit 0) if the function is not yet
defined (mirrors evidence-gate's absent-impl banner). Each case sets
`COUNCIL_STATE_DIR` and `ITERATION_COUNT` explicitly.
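
A sketch of the harness shape, assuming a stand-in for the function under test
so the example runs on its own. The stand-in models only the knob-first guard
and a placeholder state write; the real `uncertainty_should_escalate` would be
sourced from completion-council.sh instead. The case names are hypothetical.

```bash
#!/usr/bin/env bash
# Stand-in for the real function (assumption: only the knob-first guard and a
# placeholder state write are modelled here).
uncertainty_should_escalate() {
    [ "${LOKI_UNCERTAINTY_ESCALATION:-1}" = "0" ] && return 1
    mkdir -p .loki/state
    echo '{"consecutive_co_occur": 0}' > .loki/state/uncertainty.json
    return 1
}

# Opt-out case: rc must be 1 and the throwaway dir must be untouched.
case_opt_out_byte_identical() {
    local dir rc=0 before after
    dir=$(mktemp -d) || return 1
    before=$(find "$dir" | sort)
    ( cd "$dir" && LOKI_UNCERTAINTY_ESCALATION=0 uncertainty_should_escalate ) || rc=$?
    after=$(find "$dir" | sort)
    rm -rf "$dir"
    [ "$rc" -eq 1 ] && [ "$before" = "$after" ]
}

# Default-on case: rc is still 1 here, but the state file must now exist.
case_default_on_writes_state() {
    local dir rc=0 created=no
    dir=$(mktemp -d) || return 1
    ( cd "$dir" && unset LOKI_UNCERTAINTY_ESCALATION && uncertainty_should_escalate ) || rc=$?
    [ -f "$dir/.loki/state/uncertainty.json" ] && created=yes
    rm -rf "$dir"
    [ "$rc" -eq 1 ] && [ "$created" = yes ]
}

for c in case_opt_out_byte_identical case_default_on_writes_state; do
    if "$c"; then echo "PASS $c"; else echo "FAIL $c"; fi
done
```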

Cases:
1. PROXY READ - p1 only hot: `consecutive_no_change` >= min, hash unique,
   verdicts approved. Assert `last_proxies.p1 == true`, others false, rc 1
   (NO escalate on 1 proxy). Proves proxy 1 is read.
2. PROXY READ - p2 only hot: write a recurrence-at-distance hash ring
   (A,B,A) with p1 and p3 cold. Assert `p2 == true`, rc 1. Repeat with an
   immediate-repeat ring (A,A) and assert `p2 == false`. Proves proxy 2 is read
   from the ring, and that immediate-repeat (A,A) does NOT count as p2.
3. PROXY READ - p3 only hot: verdicts trailing K = REJECTED with approve>=1 for
   SPLIT_ROUNDS rounds. Assert `p3 == true`, rc 1. Proves proxy 3 reads
   `result`/`approve` (and does NOT require a stored threshold).
4. CO-OCCURRENCE x N escalates: set p1 + p3 hot for N consecutive calls
   (loop the function N times, advancing iteration). Assert rc 0 on the Nth
   call, `escalated_episode == true`. Proves >=2-for-N escalates.
5. 1-PROXY-NEVER: keep only one proxy hot for many rounds. Assert rc 1 every
   round, `escalated_episode == false`. Proves a single noisy proxy cannot
   escalate.
6. DEBOUNCE (no re-fire): after case-4 escalation, call again with the SAME hot
   proxies. Assert rc 1 (suppressed) while `escalated_episode == true`. Proves
   escalate-once-per-episode.
7. RE-ARM: after escalation, feed one round with co_occur false (clear a proxy),
   assert `escalated_episode == false` + `consecutive_co_occur == 0`; then feed
   N hot rounds again, assert rc 0. Proves reset-on-clear and re-escalation of a
   new episode.
8. OPT-OUT BYTE-IDENTICAL: `LOKI_UNCERTAINTY_ESCALATION=0`. Assert rc 1 AND that
   `.loki/state/uncertainty.json` is NOT created / NOT modified (snapshot the
   dir before/after; mtime + existence). Proves byte-identical when off.
9. PERPETUAL DEGRADE-TO-NOTIFY: this is a run.sh ACTION behavior, so test it as a
   thin integration shim: stub `notify_intervention_needed`, `handle_pause`,
   `handle_dashboard_crash` to record calls; set `AUTONOMY_MODE=perpetual`;
   `touch .loki/PAUSE`; call the real `check_human_intervention`
   (run.sh:12701). Assert PAUSE is auto-cleared AND notify was called (proves
   the degrade path is the EXISTING consumer at run.sh:12725-12727, so escalation
   degrades to notify-only under perpetual). This case sources run.sh's
   `check_human_intervention` with its deps stubbed, or asserts via a focused
   harness; if sourcing run.sh wholesale is impractical, assert the contract by
   reading the consumer branch and documenting it as a code-path test.

All cases: throwaway git repos isolated via `GIT_CONFIG_GLOBAL=/dev/null`
(mirror test-evidence-gate.sh:107-115). Skip-not-fail on missing git/python3.

---

## 5. Honest limits

- PERPETUAL-MODE = NOTIFY-ONLY. If Loki runs in perpetual / auto-continue mode,
  the existing consumer (`check_human_intervention`, run.sh:12725-12727)
  auto-clears PAUSE and continues. Escalation therefore DEGRADES to a
  notification (notify still fires) plus a handoff doc; it does NOT halt the run.
  We detect this at the action site and print it honestly. We deliberately do
  NOT add a no-auto-clear carve-out for our marker (the BUDGET_EXCEEDED carve-out
  at run.sh:12712 shows it is technically possible) because that is scope creep
  and would break "byte-identical when off." Out of scope for v7.19.2; candidate
  follow-up.
- PROXY 2 IS COUNT-BLIND BY ORIGIN. `convergence.log` stores `files_changed` as
  a count (completion-council.sh:208), not identities, so it cannot by itself see
  "same files back and forth." We approximate oscillation with diff-hash
  recurrence-at-distance, which catches A -> B -> A state cycling but CANNOT
  distinguish a genuine revert from a coincidental return to an identical tree
  state, and will MISS oscillation that changes content each pass (hash differs
  every round). It is a heuristic, not a true revert detector.
- PROXY 3 STALENESS BETWEEN VOTES. The verdicts array only updates on actual
  council votes (every `COUNCIL_CHECK_INTERVAL` or circuit-forced). Sampled every
  iteration, p3 can be stale between votes. We rely on the circuit-breaker
  coupling (proxy 1 hot forces a vote, refreshing p3) so p3 is fresh exactly in
  the regime we escalate on; outside that regime p3 may lag by up to
  `COUNCIL_CHECK_INTERVAL` iterations.
- PROXIES FALSE-FIRE AND MISS. All three are heuristics. A legitimately hard
  refactor that produces no net diff for several rounds while the council
  remains split can false-fire; a fast-thrashing failure that keeps changing
  different files with shifting hashes can be missed. Requiring >=2 co-occurring
  for N rounds reduces, but does not eliminate, false fires. The cost of a false
  fire is bounded: one notification + one handoff + one PAUSE (auto-cleared in
  perpetual), opt-out at the site.
- THESE ARE PROXIES, NOT TRUE METACOGNITION. The system does not know it is
  stuck; it infers stuckness from three correlated symptoms of stuckness. There
  is no model of confidence, no self-estimate of progress. This is intentional
  (no new metacognition) and is the honest ceiling on what this feature can
  claim.

---

## 6. Rails (the v7.19.1 evidence-gate rails, mirrored)

A default-on hook in the hot loop must be bounded, loud, and self-rescuing.

- BOUNDED: the decision function does O(1) work - reads two small JSON files,
  scans the last K verdicts and a 6-entry ring. No git subprocess in the decision
  path (hash comes from state.json via slice A's one-line add). No network. No
  unbounded loop. Cannot hang. The action runs at most ONCE per stuck episode
  (debounce), not every iteration.
- LOUD TERMINAL LINE at the escalation site (run.sh, slice B):
  ```
  log_error "[Uncertainty] Escalating to human: >=2 of 3 stuck-signals co-occurred for N rounds (no-change / oscillation / council-split). PAUSE written; handoff saved."
  log_warn "[Uncertainty] To opt out of proactive escalation: set LOKI_UNCERTAINTY_ESCALATION=0"
  ```
  And, only when perpetual, the honesty line:
  ```
  log_warn "[Uncertainty] Perpetual mode: PAUSE will be auto-cleared; this is notify-only and will NOT halt the run."
  ```
- OPT-OUT NAMED AT THE SITE: the opt-out env var is printed on the line above,
  right where escalation happens, so a terminal user with no dashboard can
  self-rescue in one step (mirrors completion-council.sh:1055).
- KNOB-FIRST: `LOKI_UNCERTAINTY_ESCALATION=0` short-circuits the decision
  function before any read/write (section 2), and `type ... >/dev/null` guards
  the run.sh call so an unbuilt function is a silent no-op. Byte-identical when
  off, proven by test case 8.