loki-mode 7.19.1 → 7.19.3
- package/SKILL.md +2 -2
- package/VERSION +1 -1
- package/autonomy/completion-council.sh +282 -0
- package/autonomy/config.example.yaml +26 -0
- package/autonomy/lib/proof-generator.py +20 -1
- package/autonomy/lib/proof-template.html +211 -16
- package/autonomy/run.sh +55 -0
- package/dashboard/__init__.py +1 -1
- package/docs/INSTALLATION.md +1 -1
- package/docs/SHAREABLE-PROOF-PLAN.md +194 -0
- package/docs/UNCERTAINTY-ESCALATION-PLAN.md +396 -0
- package/loki-ts/dist/loki.js +2 -2
- package/mcp/__init__.py +1 -1
- package/package.json +1 -1
- package/skills/quality-gates.md +85 -0
@@ -0,0 +1,396 @@
+# Uncertainty-Gated Escalation (Loki Mode v7.19.2)
+
+Design only. No implementation code lands with this document. Every hook point
+below was read from live source; line numbers drift, so the verified anchors in
+section 1 are the contract a dev re-confirms before editing.
+
+## Goal
+
+When Loki is likely stuck or thrashing, escalate to the human PROACTIVELY via
+the EXISTING pause + notify + handoff machinery, instead of silently burning
+iterations until max-iterations. No new metacognition: reuse three proxy signals
+that already exist. Escalate only when at least two of the three co-occur for N
+consecutive rounds. Default-on, opt-out with LOKI_UNCERTAINTY_ESCALATION=0,
+byte-identical when off.
+
+## Architectural spine: split DECISION from ACTION
+
+This is the load-bearing decision and everything else falls out of it.
+
+- DECISION: a pure-ish function `uncertainty_should_escalate` lives in
+  `autonomy/completion-council.sh` next to the other `council_*` state helpers.
+  It reads ONLY persisted state (`state.json`, `convergence.log`, and its own
+  `.loki/state/uncertainty.json`), mutates only its own state file, and returns
+  rc 0 (escalate now) / rc 1 (do not). It fires NO notifications and touches NO
+  PAUSE file. This makes it sourceable and testable exactly like
+  `council_evidence_gate` (completion-council.sh:907): a test writes a fake
+  `state.json` into a throwaway dir and asserts the return code, with zero real
+  side effects on the developer's machine.
+- ACTION: the run.sh call site (new region right after
+  `council_track_iteration`, run.sh:12389-12391) interprets rc 0 and performs
+  the side effects: loud terminal line, `write_structured_handoff`,
+  `notify_intervention_needed`, write a `signals/UNCERTAINTY_ESCALATION` marker,
+  and `touch .loki/PAUSE`. It also emits the perpetual-mode honesty line.
+
+Consequence: the two code slices live in DIFFERENT files (decision in
+completion-council.sh, action in run.sh), so the dev fleet can build them in
+parallel without collision.
+
+---
+
+## 1. Verified hook points (read from live source)
+
+All paths relative to repo root `/Users/lokesh/git/loki-mode`.
+
+### Proxy 1 - circuit-breaker no-change counter
+- Var declared: `autonomy/completion-council.sh:70` (`COUNCIL_CONSECUTIVE_NO_CHANGE=0`).
+- Incremented: `completion-council.sh:178`; reset: `:180`. Driven by a combined
+  hash of `git diff --stat HEAD` + staged diff + last commit hash
+  (`:165-182`).
+- Limit knob: `COUNCIL_STAGNATION_LIMIT` (`:56`, default 5).
+- Persisted: written into `state.json` as `consecutive_no_change`
+  (`completion-council.sh:232` -> `:237` json.dump). THIS is what the decision
+  function reads (not the live shell var, which is out of scope in a sourced
+  test).
+- Updated every iteration via `council_track_iteration` (run.sh:12390).
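
The proxy 1 bookkeeping described above can be sketched as follows. This is only an illustration under stated assumptions: `next_no_change` and `limit_reached` are hypothetical names, the reset-to-zero behaviour is inferred from the `=0` initial value, and the real logic lives inside `council_track_iteration`.

```sh
#!/bin/sh
# Hypothetical sketch of the proxy 1 counter; not the real council code.

# Print the next counter value given the previous and current diff hashes.
# Assumption: an unchanged hash increments, any change resets to 0.
next_no_change() {
  prev_hash=$1
  cur_hash=$2
  counter=$3
  if [ "$cur_hash" = "$prev_hash" ]; then
    echo $((counter + 1))
  else
    echo 0
  fi
}

# Succeed (rc 0) when the counter has reached the stagnation limit
# (COUNCIL_STAGNATION_LIMIT, default 5 per the text above).
limit_reached() {
  [ "$1" -ge "${COUNCIL_STAGNATION_LIMIT:-5}" ]
}

# Replay a sequence of hashes: four identical rounds, then a change.
counter=0
prev=""
for h in aaa aaa aaa aaa bbb; do
  counter=$(next_no_change "$prev" "$h" "$counter")
  prev=$h
  echo "hash=$h consecutive_no_change=$counter"
done
```

The counter only ever sees "same as last round or not", which is why an A -> B -> A cycle needs the separate proxy 2 treatment.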
+
+### Proxy 2 - file-churn oscillation / reverts
+- Existing data: `convergence.log` is appended at
+  `completion-council.sh:215` with line format
+  `timestamp|iteration|files_changed|consecutive_no_change|done_signals`.
+  CRITICAL: `files_changed` (`:208`) is a COUNT
+  (`git diff --name-only HEAD | wc -l`), NOT file identities. A count cannot
+  detect "same files back and forth."
+- The combined diff hash exists at `completion-council.sh:175`
+  (`combined_hash`), persisted only transiently in the shell var
+  `COUNCIL_LAST_DIFF_HASH` (`:73`, `:182`) - immediate-repeat only.
+- DECISION (see section 5 limits): proxy 2 is implemented as DIFF-HASH
+  RECURRENCE-AT-DISTANCE. We persist a small ring buffer (last
+  ~6 hashes) of `combined_hash` in `uncertainty.json`. Proxy 2 fires when the
+  current hash equals a hash seen 2+ rounds back (A -> B -> A pattern). The
+  immediate repeat (A -> A) is already proxy 1, so recurrence-at-distance is the
+  genuine oscillation/revert signal. This is a tiny, justified addition (one
+  bounded array in an existing JSON file), NOT heavy new tracking. The hash to
+  read is the same `combined_hash` proxy 1 already computes; the decision
+  function recomputes it cheaply from `git diff --stat HEAD` or, preferably,
+  `council_track_iteration` writes it into `state.json` (`last_diff_hash`) so the
+  decision function stays pure (no git calls). See slice A for which.
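
A minimal sketch of the recurrence-at-distance check, assuming the ring is passed as positional arguments (most recent first) rather than read from `uncertainty.json`. `p2_hot` and `RING_MAX` are hypothetical names, and treating the ring as six previous hashes is an assumption.

```sh
#!/bin/sh
# Hypothetical sketch of proxy 2 (diff-hash recurrence-at-distance).

RING_MAX=6  # ring size taken from the design text

# p2_hot CURRENT PREV1 PREV2 ... (previous hashes, most recent first).
# Succeeds (rc 0) when CURRENT equals a hash at distance >= 2.
p2_hot() {
  current=$1
  shift
  distance=0
  for h in "$@"; do
    distance=$((distance + 1))
    # Ignore anything older than the bounded ring.
    [ "$distance" -gt "$RING_MAX" ] && break
    # Distance 1 is the immediate repeat, which proxy 1 already covers.
    [ "$distance" -lt 2 ] && continue
    [ "$h" = "$current" ] && return 0
  done
  return 1
}

if p2_hot A B A; then
  echo "A -> B -> A: oscillation"
fi
if ! p2_hot A A; then
  echo "A -> A: immediate repeat, left to proxy 1"
fi
```

Under this literal reading a run such as A, A, A also matches at distance 2; whether that should count is left open by the text above.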
+
+### Proxy 3 - persistent council split
+- approve_count computed in `council_vote` (`completion-council.sh:270`,
+  tallied `:388`, anti-sycophancy adjust `:417`).
+- effective_threshold: `completion-council.sh:293`
+  (`(COUNCIL_SIZE * 2 + 2) / 3`, the ceiling(2/3) formula).
+- Persisted: each council round appends to `state['verdicts']`
+  (`completion-council.sh:449-455`) with keys `iteration`, `timestamp`,
+  `approve`, `reject`, `result` (`APPROVED`/`REJECTED`). NOTE: threshold is NOT
+  stored. That is fine: `result == "REJECTED"` already encodes
+  `approve < threshold`. A split round = `result == "REJECTED" AND approve >= 1`
+  (council could not converge: at least one approver, still short of threshold).
+  Do NOT go looking for a stored threshold; it is not there by design.
+- CADENCE: `verdicts` only appends when the council actually VOTES, which is
+  every `COUNCIL_CHECK_INTERVAL` OR when the circuit breaker forces a vote
+  (`council_should_stop`, completion-council.sh:2045-2051; circuit check
+  :2039-2043). So proxy 3 is STALE between votes. This is acceptable because in
+  the stuck regime we care about, proxy 1 going hot
+  (`consecutive_no_change >= COUNCIL_STAGNATION_LIMIT`) is exactly what TRIPS the
+  circuit breaker (`council_circuit_breaker_triggered`,
+  completion-council.sh:252) and forces a council vote, which refreshes proxy 3.
+  Verified: `council_should_stop` sets `should_check=true` when
+  `circuit_triggered=true` (:2047-2048). Document the between-votes staleness as
+  a known limit (section 5).
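
The threshold formula and the split-round rule can be checked with a few lines of shell arithmetic. This is a sketch only: `council_threshold` and `trailing_split_rounds` are hypothetical helpers, and the verdicts are assumed to be flattened to `result approve` lines instead of being read from the JSON `verdicts` array.

```sh
#!/bin/sh
# Hypothetical sketch of the proxy 3 inputs; not the real council code.

# Ceiling of two thirds of the council size, using integer arithmetic.
council_threshold() {
  echo $(( ($1 * 2 + 2) / 3 ))
}

# Count trailing split rounds from "result approve" lines on stdin,
# oldest first. A split round is REJECTED with at least one approver.
trailing_split_rounds() {
  run=0
  while read -r result approve; do
    if [ "$result" = "REJECTED" ] && [ "$approve" -ge 1 ]; then
      run=$((run + 1))
    else
      run=0
    fi
  done
  echo "$run"
}

echo "threshold for a council of 3: $(council_threshold 3)"
printf 'APPROVED 3\nREJECTED 1\nREJECTED 2\n' | trailing_split_rounds
```

A unanimous rejection (`REJECTED 0`) resets the run, since the council did converge in that round.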
+
+### notify_intervention_needed
+- `autonomy/run.sh:2328`. Signature: `notify_intervention_needed "$reason"`;
+  thin wrapper over `send_notification "Intervention Needed" "$reason"
+  "critical"`.
+
+### PAUSE consume / clear path (perpetual-mode crux)
+- Consumer: `check_human_intervention` (run.sh:12701), PAUSE branch
+  `:12708`.
+- Perpetual auto-clear: `:12711-12730`. In perpetual mode PAUSE is
+  auto-cleared (`:12727 rm -f`) and `notify_intervention_needed` STILL fires
+  (`:12726`). Only `BUDGET_EXCEEDED` (`:12712`) is carved out from
+  auto-clear.
+- Non-perpetual: PAUSE triggers `handle_pause` (run.sh:12842) and waits
+  (`:12732-12742`).
+- Consumed once per loop turn from the main loop: `check_human_intervention`
+  is called at run.sh:11528, return-code switch `:11530-11533`
+  (1 = restart loop, 2 = stop).
+- IMPLICATION: escalation only WRITES PAUSE. The existing consumer halts (or, in
+  perpetual mode, auto-clears + notifies). Perpetual degrade is therefore FREE -
+  no new consumer logic. We detect perpetual at OUR site using the same vars
+  (`AUTONOMY_MODE` / `PERPETUAL_MODE`, run.sh:12711) only to print the honest
+  "notify-only; PAUSE will not halt this run" line.
+
+### write_structured_handoff
+- `autonomy/run.sh:8816`. Verified single live definition (the
+  "active definition is below" comment at :8811 refers to
+  `load_handoff_context`, not a second handoff def; grep shows one
+  `write_structured_handoff()`). Signature:
+  `write_structured_handoff "$reason"`; writes
+  `.loki/memory/handoffs/<ts>.json` + `.md`.
+
+### Loop point for the escalation check
+- Slot the ACTION immediately AFTER `council_track_iteration` in the main loop:
+  run.sh:12388-12391. At this point proxy 1 and proxy 2 are freshly written for
+  this iteration, and proxy 3 is fresh exactly when it matters (circuit-forced
+  vote). This is BEFORE the completion-promise / council checks
+  (run.sh:12408+), so escalation is evaluated every iteration.
+
+### Mirror precedent (action shape)
+- Gate-escalation block run.sh:12308-12318 is the precedent to clone: write a
+  `signals/` marker (`:12310`), call a handoff hook with its own opt-out
+  (`:12314`), then `touch .loki/PAUSE` (`:12317`). Our action mirrors this with
+  `write_structured_handoff` + `notify_intervention_needed` +
+  `signals/UNCERTAINTY_ESCALATION` + `touch .loki/PAUSE`.
+
+---
+
+## 2. Escalation decision function design
+
+### Inputs (all read from persisted state, no live shell vars)
+1. `p1` = proxy 1 hot: from `state.json.consecutive_no_change`. Hot when
+   `>= LOKI_UNCERTAINTY_NOCHANGE_MIN` (default = `COUNCIL_STAGNATION_LIMIT` - 1,
+   i.e. "approaching circuit-breaker"). Reading slightly below the breaker limit
+   lets us escalate BEFORE the breaker forces an end-state.
+2. `p2` = proxy 2 hot: diff-hash recurrence-at-distance. Hot when the current
+   `last_diff_hash` matches a hash at distance >= 2 in the ring buffer.
+3. `p3` = proxy 3 hot: persistent split. Read the last `K` entries of
+   `state.json.verdicts`; count consecutive trailing rounds where
+   `result == "REJECTED" AND approve >= 1`. Hot when that run length
+   `>= LOKI_UNCERTAINTY_SPLIT_ROUNDS` (default 2).
+
+### Co-occurrence + N-round debounce
+- Per round (= per iteration; "round" is defined as one main-loop iteration),
+  compute `hot_count = p1 + p2 + p3`.
+- `co_occur = (hot_count >= 2)`.
+- Maintain `consecutive_co_occur` in `uncertainty.json`:
+  - if `co_occur`: increment; else reset to 0.
+- Escalate (rc 0) when `consecutive_co_occur >= LOKI_UNCERTAINTY_ROUNDS`
+  (the N knob, default 2; recommended range 2-3) AND not already escalated this
+  episode (debounce flag, below).
+- A single noisy proxy can NEVER escalate alone (requires hot_count >= 2).
+
+### Debounce (escalate once per stuck-episode)
+- `uncertainty.json` carries `escalated_episode: true|false`.
+- On escalate, set `escalated_episode = true` and record
+  `escalated_at_iteration`.
+- Suppress re-fire while `escalated_episode == true`.
+- RE-ARM (reset `escalated_episode = false` and `consecutive_co_occur = 0`) when
+  `co_occur` becomes false in any later round (a proxy cleared => the episode is
+  considered resolved; a new stuck episode may legitimately re-escalate). State
+  the reset condition explicitly so a dev does not "helpfully" keep it latched.
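
The co-occurrence, N-round debounce, and re-arm rules above amount to a small state machine. The sketch below keeps the two fields in shell variables for brevity, which is a simplification: the design persists them in `uncertainty.json`. `uncertainty_step` is a hypothetical name.

```sh
#!/bin/sh
# Hypothetical in-memory sketch of the co-occurrence + debounce rules.
# The real function would persist these two fields in uncertainty.json.

consecutive_co_occur=0
escalated_episode=false

# uncertainty_step P1 P2 P3 (each 0 or 1). rc 0 = escalate now, rc 1 = do not.
uncertainty_step() {
  hot_count=$(($1 + $2 + $3))
  if [ "$hot_count" -ge 2 ]; then
    consecutive_co_occur=$((consecutive_co_occur + 1))
  else
    # Co-occurrence cleared: reset the streak and re-arm the episode flag.
    consecutive_co_occur=0
    escalated_episode=false
    return 1
  fi
  if [ "$escalated_episode" = "true" ]; then
    return 1  # already escalated during this stuck episode
  fi
  if [ "$consecutive_co_occur" -ge "${LOKI_UNCERTAINTY_ROUNDS:-2}" ]; then
    escalated_episode=true
    return 0
  fi
  return 1
}

# Two consecutive rounds with p1 and p3 hot (default N = 2).
uncertainty_step 1 0 1 || echo "round 1: no escalation yet"
uncertainty_step 1 0 1 && echo "round 2: escalate"
uncertainty_step 1 0 1 || echo "round 3: suppressed by debounce"
```

Call the function directly rather than inside `$(...)`, since a subshell would discard the updated counters.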
+
+### State persistence
+- File: `.loki/state/uncertainty.json` (singular; the `uncertainty-*.json` glob
+  in the brief maps to this one file - keep it single to avoid an unbounded
+  directory). Schema:
+  ```json
+  {
+    "schema_version": "1.0.0",
+    "consecutive_co_occur": 0,
+    "escalated_episode": false,
+    "escalated_at_iteration": 0,
+    "diff_hash_ring": ["<h>", "<h>", "..."],
+    "last_round_iteration": 0,
+    "last_proxies": {"p1": false, "p2": false, "p3": false}
+  }
+  ```
+- Ring buffer bounded to 6 entries (constant). All writes atomic temp+mv,
+  mirroring evidence-block.json (`completion-council.sh:1059-1086`).
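
The atomic temp+mv write mentioned above could look roughly like this. It is a sketch, not the code at `completion-council.sh:1059-1086`: `write_state_atomic` and the temp-file prefix are hypothetical, and the temp file is created in the target directory on the assumption that the rename then stays on one filesystem.

```sh
#!/bin/sh
# Hypothetical sketch of an atomic temp+mv state write.

# write_state_atomic FILE CONTENT
# Write to a temp file in the same directory, then rename over the target so
# readers never observe a half-written file.
write_state_atomic() {
  target=$1
  content=$2
  dir=$(dirname "$target")
  mkdir -p "$dir" || return 1
  tmp=$(mktemp "$dir/.uncertainty.XXXXXX") || return 1
  if printf '%s\n' "$content" > "$tmp"; then
    mv -f "$tmp" "$target"
  else
    rm -f "$tmp"
    return 1
  fi
}

# Usage in a throwaway directory.
state_dir=$(mktemp -d)
write_state_atomic "$state_dir/.loki/state/uncertainty.json" \
  '{"schema_version": "1.0.0", "consecutive_co_occur": 0}'
cat "$state_dir/.loki/state/uncertainty.json"
rm -rf "$state_dir"
```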
+
+### Knob-first byte-identical guard
+First line of `uncertainty_should_escalate`, BEFORE any read or write:
+```
+[ "${LOKI_UNCERTAINTY_ESCALATION:-1}" = "0" ] && return 1
+```
+(rc 1 = do-not-escalate; mirrors `council_evidence_gate`'s knob-first guard at
+completion-council.sh:909). When off: zero file reads, zero writes, zero state
+file creation => byte-identical.
+
+### Knobs summary (all opt-out / tunable, none required)
+- `LOKI_UNCERTAINTY_ESCALATION` (default 1) - master on/off.
+- `LOKI_UNCERTAINTY_ROUNDS` (default 2) - N consecutive co-occurrence rounds.
+- `LOKI_UNCERTAINTY_NOCHANGE_MIN` (default `COUNCIL_STAGNATION_LIMIT - 1`) - p1
+  threshold.
+- `LOKI_UNCERTAINTY_SPLIT_ROUNDS` (default 2) - p3 split run length.
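
One way the four defaults might be resolved is sketched below. `resolve_uncertainty_knobs` and `uncertainty_enabled` are hypothetical helpers; only the variable names and default values come from the list above.

```sh
#!/bin/sh
# Hypothetical sketch: resolve the four knobs with the documented defaults.

resolve_uncertainty_knobs() {
  : "${LOKI_UNCERTAINTY_ESCALATION:=1}"
  : "${LOKI_UNCERTAINTY_ROUNDS:=2}"
  : "${LOKI_UNCERTAINTY_SPLIT_ROUNDS:=2}"
  # The p1 threshold defaults to one below the circuit-breaker limit.
  default_min=$(( ${COUNCIL_STAGNATION_LIMIT:-5} - 1 ))
  : "${LOKI_UNCERTAINTY_NOCHANGE_MIN:=$default_min}"
}

# Knob-first check: succeeds only when the feature is enabled.
uncertainty_enabled() {
  [ "${LOKI_UNCERTAINTY_ESCALATION:-1}" != "0" ]
}

resolve_uncertainty_knobs
echo "rounds=$LOKI_UNCERTAINTY_ROUNDS nochange_min=$LOKI_UNCERTAINTY_NOCHANGE_MIN"
```

The `:=` form only assigns when the variable is unset or empty, so an operator's explicit setting always wins over the default.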
+
+---
+
+## 3. Disjoint dev slices (parallel-safe)
+
+Binding constraints for EVERY slice: no version bumps (do not touch VERSION /
+CHANGELOG), no git commits, no emojis, no em-dashes or en-dashes (ASCII hyphen
+only), atomic temp+mv for all state writes, knob-first opt-out where the slice
+touches the hot loop.
+
+### Slice A - decision function + state schema (completion-council.sh)
+- Region: add `uncertainty_should_escalate` and a tiny
+  `_uncertainty_read_state` / `_uncertainty_write_state` pair near the other
+  `council_*` state helpers (after `council_circuit_breaker_triggered`,
+  i.e. around completion-council.sh:265, BEFORE `council_vote` at :270).
+- Also add ONE line inside `council_track_iteration` to persist
+  `state['last_diff_hash'] = combined_hash` (extend the python block at
+  completion-council.sh:224-238 by adding the env var + one assignment) so the
+  decision function reads the hash from state.json and stays pure (no git in the
+  decision path). This is the only edit inside an existing function; keep it to a
+  single key add to minimize collision with run.sh slice.
+- Owns: `.loki/state/uncertainty.json` schema, ring buffer, co-occurrence +
+  debounce logic, all four knobs' defaults.
+- File-region disjoint from slice B (different file).
+
+### Slice B - action + wiring (run.sh)
+- Region: new block right after `council_track_iteration` call
+  (run.sh:12389-12391).
+- Logic:
+  ```
+  if type uncertainty_should_escalate >/dev/null 2>&1 && uncertainty_should_escalate; then
+    # loud line (section 6), write_structured_handoff "uncertainty_escalation",
+    # notify_intervention_needed, signals/UNCERTAINTY_ESCALATION marker,
+    # touch .loki/PAUSE, perpetual honesty line.
+  fi
+  ```
+- Clone the GATE_ESCALATION shape (run.sh:12308-12318) for marker + handoff +
+  touch ordering.
+- Perpetual detection: read `AUTONOMY_MODE` / `PERPETUAL_MODE`
+  (same as run.sh:12711) ONLY to print the honest notify-only line.
+- File-region disjoint from slices A, C, D.
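
The slice B action can be sketched with stand-in stubs so the ordering is visible without sourcing run.sh. Everything here is an assumption for illustration: the stub bodies, the `uncertainty_escalate_action` name, the marker living under `.loki/signals/`, the timestamp written into the marker, and `PERPETUAL_MODE=true` as the perpetual value (`AUTONOMY_MODE=perpetual` is taken from test case 9).

```sh
#!/bin/sh
# Hypothetical sketch of the slice B action. The four helpers below are
# stand-in stubs for run.sh functions so the snippet runs on its own.
log_error() { echo "ERROR: $*"; }
log_warn() { echo "WARN: $*"; }
write_structured_handoff() { echo "handoff: $1"; }
notify_intervention_needed() { echo "notify: $1"; }

# uncertainty_escalate_action [LOKI_DIR]
# Order follows the gate-escalation precedent: marker, handoff, notify, PAUSE.
uncertainty_escalate_action() {
  loki_dir=${1:-.loki}
  reason="uncertainty_escalation"
  log_error "[Uncertainty] Escalating to human: >=2 of 3 stuck-signals co-occurred for N rounds."
  log_warn "[Uncertainty] To opt out of proactive escalation: set LOKI_UNCERTAINTY_ESCALATION=0"
  mkdir -p "$loki_dir/signals" || return 1
  # Assumption: the marker body is a UTC timestamp; the design only names the file.
  date -u +%Y-%m-%dT%H:%M:%SZ > "$loki_dir/signals/UNCERTAINTY_ESCALATION"
  write_structured_handoff "$reason"
  notify_intervention_needed "$reason"
  touch "$loki_dir/PAUSE"
  if [ "${AUTONOMY_MODE:-}" = "perpetual" ] || [ "${PERPETUAL_MODE:-}" = "true" ]; then
    log_warn "[Uncertainty] Perpetual mode: PAUSE will be auto-cleared; this is notify-only and will NOT halt the run."
  fi
}

# Usage in a throwaway directory.
demo_dir=$(mktemp -d)
uncertainty_escalate_action "$demo_dir/.loki"
rm -rf "$demo_dir"
```

Keeping the action in its own function is a choice made for this sketch; the design above describes an inline block at the call site.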
+
+### Slice C - tests (tests/test-uncertainty-escalation.sh)
+- New file. Sources the real `uncertainty_should_escalate` from
+  completion-council.sh, stubs `log_*`, runs per-case throwaway dirs. Models
+  tests/test-evidence-gate.sh exactly. Asserts decision-only (no real notify /
+  no real PAUSE because it calls the DECISION function, not the run.sh action).
+- File-region disjoint (new file).
+
+### Slice D - docs + knob registration
+- Register the four knobs in the config-comment block (the env-var doc region
+  around run.sh:91-128 and the yaml mapping near :282/:424) and
+  `autonomy/config.example.yaml`. Add a short section to the user-facing docs.
+- Keep edits to comment / config blocks; do not touch the hot loop. If this
+  collides with slice B's run.sh edits, sequence D after B (the only soft
+  dependency). Otherwise fully disjoint.
+
+Recommended parallelism: A, C, D in parallel; B after A's function signature is
+agreed (C can mock the signature meanwhile). 4 slices, 3 files + 1 new test +
+docs.
+
+---
+
+## 4. Test plan (model: tests/test-evidence-gate.sh)
+
+Harness: source the real completion-council.sh with `log_*` stubbed; call
+`uncertainty_should_escalate` inside per-case `mktemp -d` dirs, each writing its
+own `.loki/state/uncertainty.json` + `.loki/council/state.json` +
+`.loki/council/convergence.log`. Assert BOTH rc and the mutated
+`uncertainty.json` side effects. Loud SKIP (exit 0) if the function is not yet
+defined (mirrors evidence-gate's absent-impl banner). Each case sets
+`COUNCIL_STATE_DIR` and `ITERATION_COUNT` explicitly.
+
+Cases:
+1. PROXY READ - p1 only hot: `consecutive_no_change` >= min, hash unique,
+   verdicts approved. Assert `last_proxies.p1 == true`, others false, rc 1
+   (NO escalate on 1 proxy). Proves proxy 1 is read.
+2. PROXY READ - p2 only hot: write a recurrence-at-distance hash ring
+   (A,B,A), unique p1/p3. Assert `p2 == true`, rc 1. Proves proxy 2 is read
+   from the ring, and that immediate-repeat (A,A) does NOT count as p2.
+3. PROXY READ - p3 only hot: verdicts trailing K = REJECTED with approve>=1 for
+   SPLIT_ROUNDS rounds. Assert `p3 == true`, rc 1. Proves proxy 3 reads
+   `result`/`approve` (and does NOT require a stored threshold).
+4. CO-OCCURRENCE x N escalates: set p1 + p3 hot for N consecutive calls
+   (loop the function N times, advancing iteration). Assert rc 0 on the Nth
+   call, `escalated_episode == true`. Proves >=2-for-N escalates.
+5. 1-PROXY-NEVER: keep only one proxy hot for many rounds. Assert rc 1 every
+   round, `escalated_episode == false`. Proves a single noisy proxy cannot
+   escalate.
+6. DEBOUNCE (no re-fire): after case-4 escalation, call again with the SAME hot
+   proxies. Assert rc 1 (suppressed) while `escalated_episode == true`. Proves
+   escalate-once-per-episode.
+7. RE-ARM: after escalation, feed one round with co_occur false (clear a proxy),
+   assert `escalated_episode == false` + `consecutive_co_occur == 0`; then feed
+   N hot rounds again, assert rc 0. Proves reset-on-clear and re-escalation of a
+   new episode.
+8. OPT-OUT BYTE-IDENTICAL: `LOKI_UNCERTAINTY_ESCALATION=0`. Assert rc 1 AND that
+   `.loki/state/uncertainty.json` is NOT created / NOT modified (snapshot the
+   dir before/after; mtime + existence). Proves byte-identical when off.
+9. PERPETUAL DEGRADE-TO-NOTIFY: this is a run.sh ACTION behavior, so test it as a
+   thin integration shim: stub `notify_intervention_needed`, `handle_pause`,
+   `handle_dashboard_crash` to record calls; set `AUTONOMY_MODE=perpetual`;
+   `touch .loki/PAUSE`; call the real `check_human_intervention`
+   (run.sh:12701). Assert PAUSE is auto-cleared AND notify was called (proves
+   the degrade path is the EXISTING consumer at run.sh:12725-12727, so escalation
+   degrades to notify-only under perpetual). This case sources run.sh's
+   `check_human_intervention` with its deps stubbed, or asserts via a focused
+   harness; if sourcing run.sh wholesale is impractical, assert the contract by
+   reading the consumer branch and documenting it as a code-path test.
+
+All cases: throwaway git repos isolated via `GIT_CONFIG_GLOBAL=/dev/null`
+(mirror test-evidence-gate.sh:107-115). Skip-not-fail on missing git/python3.
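
As an illustration of the harness shape, here is a sketch of case 8 (opt-out leaves no trace). The `uncertainty_should_escalate` body below is a stand-in stub, not the real function, and `case_opt_out` is a hypothetical name; the real test would source completion-council.sh instead.

```sh
#!/bin/sh
# Hypothetical sketch of test case 8 (opt-out leaves no trace).
# The stub below stands in for the real uncertainty_should_escalate.
uncertainty_should_escalate() {
  [ "${LOKI_UNCERTAINTY_ESCALATION:-1}" = "0" ] && return 1
  mkdir -p .loki/state
  echo '{}' > .loki/state/uncertainty.json
  return 1
}

# Run the function with the knob off inside a throwaway dir and report
# PASS when rc is 1 and no state file appeared.
case_opt_out() {
  work=$(mktemp -d) || return 1
  (
    cd "$work" || exit 1
    LOKI_UNCERTAINTY_ESCALATION=0
    export LOKI_UNCERTAINTY_ESCALATION
    if uncertainty_should_escalate; then rc=0; else rc=$?; fi
    if [ "$rc" -eq 1 ] && [ ! -e .loki/state/uncertainty.json ]; then
      echo "PASS opt-out"
    else
      echo "FAIL opt-out (rc=$rc)"
      exit 1
    fi
  )
  status=$?
  rm -rf "$work"
  return "$status"
}

case_opt_out
```

The subshell keeps both the `cd` and the exported knob from leaking into later cases, which is one way to get the per-case isolation the plan asks for.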
+
+---
+
+## 5. Honest limits
+
+- PERPETUAL-MODE = NOTIFY-ONLY. If Loki runs in perpetual / auto-continue mode,
+  the existing consumer (`check_human_intervention`, run.sh:12725-12727)
+  auto-clears PAUSE and continues. Escalation therefore DEGRADES to a
+  notification (notify still fires) plus a handoff doc; it does NOT halt the run.
+  We detect this at the action site and print it honestly. We deliberately do
+  NOT add a no-auto-clear carve-out for our marker (the BUDGET_EXCEEDED carve-out
+  at run.sh:12712 shows it is technically possible) because that is scope creep
+  and would break "byte-identical when off." Out of scope for v7.19.2; candidate
+  follow-up.
+- PROXY 2 IS COUNT-BLIND BY ORIGIN. `convergence.log` stores `files_changed` as
+  a count (completion-council.sh:208), not identities, so it cannot by itself see
+  "same files back and forth." We approximate oscillation with diff-hash
+  recurrence-at-distance, which catches A -> B -> A state cycling but CANNOT
+  distinguish a genuine revert from a coincidental return to an identical tree
+  state, and will MISS oscillation that changes content each pass (hash differs
+  every round). It is a heuristic, not a true revert detector.
+- PROXY 3 STALENESS BETWEEN VOTES. The verdicts array only updates on actual
+  council votes (every `COUNCIL_CHECK_INTERVAL` or circuit-forced). Sampled every
+  iteration, p3 can be stale between votes. We rely on the circuit-breaker
+  coupling (proxy 1 hot forces a vote, refreshing p3) so p3 is fresh exactly in
+  the regime we escalate on; outside that regime p3 may lag by up to
+  `COUNCIL_CHECK_INTERVAL` iterations.
+- PROXIES FALSE-FIRE AND MISS. All three are heuristics. A legitimately hard
+  refactor that produces no net diff for several rounds while the council
+  remains split can false-fire; a fast-thrashing failure that keeps changing
+  different files with shifting hashes can be missed. Requiring >=2 co-occurring
+  for N rounds reduces, but does not eliminate, false fires. The cost of a false
+  fire is bounded: one notification + one handoff + one PAUSE (auto-cleared in
+  perpetual), opt-out at the site.
+- THESE ARE PROXIES, NOT TRUE METACOGNITION. The system does not know it is
+  stuck; it infers stuckness from three correlated symptoms of stuckness. There
+  is no model of confidence, no self-estimate of progress. This is intentional
+  (no new metacognition) and is the honest ceiling on what this feature can
+  claim.
+
+---
+
+## 6. Rails (the v7.19.1 evidence-gate rails, mirrored)
+
+A default-on hook in the hot loop must be bounded, loud, and self-rescuing.
+
+- BOUNDED: the decision function does O(1) work - reads two small JSON files,
+  scans the last K verdicts and a 6-entry ring. No git subprocess in the decision
+  path (hash comes from state.json via slice A's one-line add). No network. No
+  unbounded loop. Cannot hang. The action runs at most ONCE per stuck episode
+  (debounce), not every iteration.
+- LOUD TERMINAL LINE at the escalation site (run.sh, slice B):
+  ```
+  log_error "[Uncertainty] Escalating to human: >=2 of 3 stuck-signals co-occurred for N rounds (no-change / oscillation / council-split). PAUSE written; handoff saved."
+  log_warn "[Uncertainty] To opt out of proactive escalation: set LOKI_UNCERTAINTY_ESCALATION=0"
+  ```
+  And, only when perpetual, the honesty line:
+  ```
+  log_warn "[Uncertainty] Perpetual mode: PAUSE will be auto-cleared; this is notify-only and will NOT halt the run."
+  ```
+- OPT-OUT NAMED AT THE SITE: the opt-out env var is printed on the line above,
+  right where escalation happens, so a terminal user with no dashboard can
+  self-rescue in one step (mirrors completion-council.sh:1055).
+- KNOB-FIRST: `LOKI_UNCERTAINTY_ESCALATION=0` short-circuits the decision
+  function before any read/write (section 2), and `type ... >/dev/null` guards
+  the run.sh call so an unbuilt function is a silent no-op. Byte-identical when
+  off, proven by test case 8.
package/loki-ts/dist/loki.js
CHANGED
|
@@ -1,5 +1,5 @@
|
|
|
1
1
|
// @bun
|
|
2
|
-
var f8=Object.defineProperty;var u8=($)=>$;function c8($,Q){this[$]=u8.bind(null,Q)}var g=($,Q)=>{for(var K in Q)f8($,K,{get:Q[K],enumerable:!0,configurable:!0,set:c8.bind(Q,K)})};var k=($,Q)=>()=>($&&(Q=$($=0)),Q);var X1=import.meta.require;var F$={};g(F$,{lokiDir:()=>P,homeLokiDir:()=>o1,findRepoRootForVersion:()=>d1,REPO_ROOT:()=>f});import{resolve as n,dirname as l1}from"path";import{fileURLToPath as p8}from"url";import{existsSync as L1}from"fs";import{homedir as l8}from"os";function d8(){let $=j$;for(let Q=0;Q<6;Q++){if(L1(n($,"VERSION"))&&L1(n($,"autonomy/run.sh")))return $;let K=l1($);if(K===$)break;$=K}return n(j$,"..","..","..")}function d1($){let Q=$;for(let K=0;K<6;K++){if(L1(n(Q,"VERSION"))&&L1(n(Q,"autonomy/run.sh")))return Q;let Z=l1(Q);if(Z===Q)break;Q=Z}return n($,"..","..","..")}function P(){return process.env.LOKI_DIR??n(process.cwd(),".loki")}function o1(){return n(l8(),".loki")}var j$,f;var y=k(()=>{j$=l1(p8(import.meta.url));f=d8()});import{readFileSync as o8}from"fs";import{resolve as n8,dirname as a8}from"path";import{fileURLToPath as s8}from"url";function k1(){if($1!==null)return $1;let $="7.19.
|
|
2
|
+
var f8=Object.defineProperty;var u8=($)=>$;function c8($,Q){this[$]=u8.bind(null,Q)}var g=($,Q)=>{for(var K in Q)f8($,K,{get:Q[K],enumerable:!0,configurable:!0,set:c8.bind(Q,K)})};var k=($,Q)=>()=>($&&(Q=$($=0)),Q);var X1=import.meta.require;var F$={};g(F$,{lokiDir:()=>P,homeLokiDir:()=>o1,findRepoRootForVersion:()=>d1,REPO_ROOT:()=>f});import{resolve as n,dirname as l1}from"path";import{fileURLToPath as p8}from"url";import{existsSync as L1}from"fs";import{homedir as l8}from"os";function d8(){let $=j$;for(let Q=0;Q<6;Q++){if(L1(n($,"VERSION"))&&L1(n($,"autonomy/run.sh")))return $;let K=l1($);if(K===$)break;$=K}return n(j$,"..","..","..")}function d1($){let Q=$;for(let K=0;K<6;K++){if(L1(n(Q,"VERSION"))&&L1(n(Q,"autonomy/run.sh")))return Q;let Z=l1(Q);if(Z===Q)break;Q=Z}return n($,"..","..","..")}function P(){return process.env.LOKI_DIR??n(process.cwd(),".loki")}function o1(){return n(l8(),".loki")}var j$,f;var y=k(()=>{j$=l1(p8(import.meta.url));f=d8()});import{readFileSync as o8}from"fs";import{resolve as n8,dirname as a8}from"path";import{fileURLToPath as s8}from"url";function k1(){if($1!==null)return $1;let $="7.19.3";if(typeof $==="string"&&$.length>0)return $1=$,$1;try{let Q=a8(s8(import.meta.url)),K=d1(Q);$1=o8(n8(K,"VERSION"),"utf-8").trim()}catch{$1="unknown"}return $1}var $1=null;var n1=k(()=>{y()});var E$={};g(E$,{runOrThrow:()=>t8,run:()=>j,commandVersion:()=>i8,commandExists:()=>v,ShellError:()=>a1});async function j($,Q={}){let K=Bun.spawn({cmd:[...$],stdout:"pipe",stderr:"pipe",env:Q.env?{...process.env,...Q.env}:process.env,cwd:Q.cwd}),Z,z;if(Q.timeoutMs&&Q.timeoutMs>0)Z=setTimeout(()=>{try{K.kill("SIGTERM")}catch{}z=setTimeout(()=>{try{K.kill("SIGKILL")}catch{}},2000)},Q.timeoutMs);try{let[H,X,q]=await Promise.all([new Response(K.stdout).text(),new Response(K.stderr).text(),K.exited]);return{stdout:H,stderr:X,exitCode:q}}finally{if(Z)clearTimeout(Z);if(z)clearTimeout(z)}}async function t8($,Q={}){let K=await j($,Q);if(K.exitCode!==0)throw new 
a1(`command failed (${K.exitCode}): ${$.join(" ")}`,K.exitCode,K.stdout,K.stderr);return K}async function v($){let Q=r8($),K=await j(["sh","-c",`command -v ${Q}`],{timeoutMs:5000});if(K.exitCode===0)return K.stdout.trim()||null;return null}function r8($){if(!/^[A-Za-z0-9._/-]+$/.test($))throw Error(`refused to shell-escape suspect token: ${$}`);return $}async function i8($,Q="--version"){if(!await v($))return null;let Z=await j([$,Q],{timeoutMs:5000});if(Z.exitCode!==0)return null;return((Z.stdout||Z.stderr).split(/\r?\n/)[0]?.trim()??"")||null}var a1;var d=k(()=>{a1=class a1 extends Error{message;exitCode;stdout;stderr;constructor($,Q,K,Z){super($);this.message=$;this.exitCode=Q;this.stdout=K;this.stderr=Z;this.name="ShellError"}}});function a($){return e8?"":$}var e8,T,N,w,ZK,_,R,h,J;var c=k(()=>{e8=(process.env.NO_COLOR??"").length>0;T=a("\x1B[0;31m"),N=a("\x1B[0;32m"),w=a("\x1B[1;33m"),ZK=a("\x1B[0;34m"),_=a("\x1B[0;36m"),R=a("\x1B[1m"),h=a("\x1B[2m"),J=a("\x1B[0m")});import{existsSync as U7}from"fs";async function Q1(){if(B1!==void 0)return B1;let $="/opt/homebrew/bin/python3.12";if(U7($))return B1=$,$;let Q=await v("python3.12");if(Q)return B1=Q,Q;let K=await v("python3");return B1=K,K}async function K1($,Q={}){let K=await Q1();if(!K)return{stdout:"",stderr:"python3 not found",exitCode:127};return j([K,"-c",$],Q)}var B1;var H1=k(()=>{d()});var d$={};g(d$,{runStatus:()=>N7});import{existsSync as b,readFileSync as q1,readdirSync as v$,statSync as f$}from"fs";import{resolve as D,basename as P7}from"path";import{homedir as L7}from"os";async function j7(){if(await v("jq"))return!0;return process.stdout.write(`${T}Error: jq is required but not installed.${J}
|
|
3
3
|
`),process.stdout.write(`Install with:
|
|
4
4
|
`),process.stdout.write(` brew install jq (macOS)
|
|
5
5
|
`),process.stdout.write(` apt install jq (Debian/Ubuntu)
|
|
@@ -787,4 +787,4 @@ Set LOKI_LEGACY_BASH=1 to force the bash CLI for every command.
|
|
|
787
787
|
`),2}default:return process.stderr.write(`Unknown command: ${Q}
|
|
788
788
|
`),process.stderr.write(v8),2}}g$();process.on("SIGINT",()=>process.exit(130));process.on("SIGTERM",()=>process.exit(143));var p3=await c3(Bun.argv.slice(2));process.exit(p3);
|
|
789
789
|
|
|
790
|
-
//# debugId=
|
|
790
|
+
//# debugId=DB7AECDBE28F921664756E2164756E21
|
package/mcp/__init__.py
CHANGED
package/package.json
CHANGED
|
@@ -1,6 +1,6 @@
|
|
|
1
1
|
{
|
|
2
2
|
"name": "loki-mode",
|
|
3
|
-
"version": "7.19.
|
|
3
|
+
"version": "7.19.3",
|
|
4
4
|
"description": "Loki Mode by Autonomi. Autonomous spec-to-product system: takes a PRD, GitHub issue, OpenAPI/JSON/YAML, or one-line brief to a deployed app via the RARV-C closure loop with 11 quality gates. Provider-agnostic (Claude Code, OpenAI Codex, Cline, Aider).",
|
|
5
5
|
"keywords": [
|
|
6
6
|
"agent",
|
package/skills/quality-gates.md
CHANGED
|
@@ -202,6 +202,91 @@ crash via the primitive's `finally` cleanup.
|
|
|
202
202
|
|
|
203
203
|
---
|
|
204
204
|
|
|
205
|
+
## Uncertainty-gated escalation (v7.19.2, default-on)
|
|
206
|
+
|
|
207
|
+
When Loki is likely stuck or thrashing, it escalates proactively to the human
|
|
208
|
+
via the existing PAUSE + notification + handoff machinery, rather than silently
|
|
209
|
+
burning iterations until max-iterations. No new metacognition: the system
|
|
210
|
+
reuses three proxy signals that already exist and escalates only when at least
|
|
211
|
+
two of the three co-occur for N consecutive rounds.
|
|
212
|
+
|
|
213
|
+
### Trigger condition
|
|
214
|
+
|
|
215
|
+
Three proxy signals are evaluated each iteration:
|
|
216
|
+
|
|
217
|
+
- **Proxy 1 (no-change counter):** `consecutive_no_change` in council state.json
|
|
218
|
+
reaches `LOKI_UNCERTAINTY_NOCHANGE_MIN` (default: `COUNCIL_STAGNATION_LIMIT - 1`,
|
|
219
|
+
i.e. one below the circuit-breaker limit so escalation fires before the
|
|
220
|
+
breaker ends the run).
|
|
221
|
+
- **Proxy 2 (diff-hash oscillation):** the current iteration's combined diff
|
|
222
|
+
hash matches a hash seen 2+ rounds back in a bounded ring buffer (A -> B -> A
|
|
223
|
+
pattern). Detects oscillation/revert cycling; does not fire on the trivial
|
|
224
|
+
immediate-repeat case which proxy 1 already covers.
|
|
225
|
+
- **Proxy 3 (persistent council split):** the last `LOKI_UNCERTAINTY_SPLIT_ROUNDS`
|
|
226
|
+
consecutive council verdicts are all REJECTED-with-at-least-one-approver
|
|
227
|
+
(split verdict). The signal is stale between council votes but fresh exactly when proxy 1 is
|
|
228
|
+
hot, because a hot proxy 1 forces a circuit-breaker vote that refreshes verdicts.
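The recurrence-at-distance check behind proxy 2 can be sketched in a few lines of shell. This is a minimal illustration, not the shipped code: the function name `diff_hash_oscillates`, its argument layout, and the newest-first history string are all assumptions made for the example.

```bash
# Hypothetical helper, NOT Loki's actual implementation.
# $1 = current combined diff hash
# $2 = space-separated ring-buffer contents, newest first
# Succeeds (exit 0) when the current hash matches an entry at least two
# rounds back, i.e. an A -> B -> A pattern.
diff_hash_oscillates() {
    dho_current=$1
    # Word-split the history into this function's positional parameters.
    set -- $2
    # Skip the immediately preceding round: an immediate repeat is the
    # no-change case that proxy 1 already covers.
    [ "$#" -gt 0 ] && shift
    for dho_seen in "$@"; do
        [ "$dho_seen" = "$dho_current" ] && return 0
    done
    return 1
}

# Usage: previous round was "bbb", the round before that was "aaa".
if diff_hash_oscillates aaa "bbb aaa"; then
    echo "proxy 2 hot"
fi
```

Keeping the history newest-first makes "ignore the previous round" a single `shift`; a real ring buffer would also need to cap its length.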
|
|
229
|
+
|
|
230
|
+
Escalation fires when `hot_count >= 2` (at least two proxies hot simultaneously)
|
|
231
|
+
for `LOKI_UNCERTAINTY_ROUNDS` consecutive rounds AND the episode has not already
|
|
232
|
+
been escalated (one escalation per stuck-episode, with re-arm when co-occurrence
|
|
233
|
+
clears).
|
|
234
|
+
|
|
235
|
+
### Action
|
|
236
|
+
|
|
237
|
+
When the trigger condition is met, the run.sh action block:
|
|
238
|
+
|
|
239
|
+
1. Prints a loud terminal line with the opt-out env var.
|
|
240
|
+
2. Calls `write_structured_handoff "uncertainty_escalation"` (saves
|
|
241
|
+
`.loki/memory/handoffs/<ts>.json` and `.md`).
|
|
242
|
+
3. Calls `notify_intervention_needed` with a structured reason string.
|
|
243
|
+
4. Writes a `.loki/signals/UNCERTAINTY_ESCALATION` marker file.
|
|
244
|
+
5. Touches `.loki/PAUSE`.
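The five steps above could be sequenced roughly as follows. This is a sketch only: `escalate_uncertainty` is a hypothetical wrapper, the two stub functions stand in for the real `write_structured_handoff` and `notify_intervention_needed` (whose full signatures are not given here and are assumed), and writing the reason string into the marker file is an assumption.

```bash
# Stand-in stubs so the sketch runs on its own. The real functions live in
# Loki's autonomy scripts; their behavior here is assumed, not reproduced.
write_structured_handoff() { echo "handoff: $1"; }
notify_intervention_needed() { echo "notify: $1"; }

# Hypothetical wrapper. $1 = project root, $2 = structured reason string.
escalate_uncertainty() {
    eu_root=$1
    eu_reason=$2
    # 1. Loud terminal line that names the opt-out variable.
    echo "UNCERTAINTY ESCALATION: $eu_reason (disable with LOKI_UNCERTAINTY_ESCALATION=0)"
    # 2. Handoff document.
    write_structured_handoff "uncertainty_escalation"
    # 3. Notification.
    notify_intervention_needed "$eu_reason"
    # 4. Marker file (contents are an assumption of this sketch).
    mkdir -p "$eu_root/.loki/signals"
    echo "$eu_reason" > "$eu_root/.loki/signals/UNCERTAINTY_ESCALATION"
    # 5. Pause request, consumed by the existing intervention check.
    touch "$eu_root/.loki/PAUSE"
}
```

Touching `.loki/PAUSE` last means the handoff and notification already exist by the time anything reacts to the pause.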
|
|
245
|
+
|
|
246
|
+
### Knobs
|
|
247
|
+
|
|
248
|
+
```bash
|
|
249
|
+
LOKI_UNCERTAINTY_ESCALATION=0 # Disable entirely. Byte-identical when off:
|
|
250
|
+
# zero reads, zero writes, no state file.
|
|
251
|
+
# Default: 1 (enabled). Toggle value is 0/1,
|
|
252
|
+
# not false/true.
|
|
253
|
+
LOKI_UNCERTAINTY_ROUNDS=2 # Consecutive co-occurrence rounds required.
|
|
254
|
+
# Recommended range 2-3. Default: 2.
|
|
255
|
+
LOKI_UNCERTAINTY_NOCHANGE_MIN=N # Proxy 1 threshold. Unset = auto-computed as
|
|
256
|
+
# COUNCIL_STAGNATION_LIMIT - 1 (floored at 1).
|
|
257
|
+
LOKI_UNCERTAINTY_SPLIT_ROUNDS=2 # Proxy 3 trailing split-round run length.
|
|
258
|
+
# Default: 2.
|
|
259
|
+
```
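As an illustration of the auto-computed proxy 1 threshold, the default could be resolved like this. `resolve_nochange_min` is a hypothetical helper named for this example; only the rule itself (explicit value wins, otherwise `COUNCIL_STAGNATION_LIMIT - 1` floored at 1) comes from the knob description.

```bash
# Hypothetical helper, NOT Loki's actual code.
# $1 = COUNCIL_STAGNATION_LIMIT
# $2 = explicit LOKI_UNCERTAINTY_NOCHANGE_MIN (empty string when unset)
resolve_nochange_min() {
    # An explicit setting always wins.
    if [ -n "$2" ]; then
        echo "$2"
        return 0
    fi
    # Otherwise sit one below the circuit-breaker limit, never below 1.
    rnm_value=$(( $1 - 1 ))
    if [ "$rnm_value" -lt 1 ]; then
        rnm_value=1
    fi
    echo "$rnm_value"
}

# Usage with whatever limit the run is configured with.
resolve_nochange_min "${COUNCIL_STAGNATION_LIMIT:-5}" "${LOKI_UNCERTAINTY_NOCHANGE_MIN:-}"
```

The `:-5` fallback in the usage line is a placeholder for the example, not a statement about Loki's real default limit.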
|
|
260
|
+
|
|
261
|
+
The same settings are configurable via `config.yaml` under `completion.uncertainty.*` (see
|
|
262
|
+
`autonomy/config.example.yaml`).
|
|
263
|
+
|
|
264
|
+
### Honest limits
|
|
265
|
+
|
|
266
|
+
- **Perpetual-mode = notify-only by default.** `AUTONOMY_MODE` defaults to
|
|
267
|
+
`perpetual`. In perpetual mode the existing consumer (`check_human_intervention`)
|
|
268
|
+
auto-clears PAUSE and continues. Escalation therefore degrades to a notification
|
|
269
|
+
plus a handoff document; it does NOT halt the run. The terminal prints an explicit
|
|
270
|
+
warning at the escalation site: "Perpetual mode: PAUSE will be auto-cleared; this
|
|
271
|
+
is notify-only and will NOT halt the run."
|
|
272
|
+
- **Proxy 2 is count-blind by origin.** It approximates oscillation with
|
|
273
|
+
diff-hash recurrence-at-distance; it cannot distinguish a genuine revert from
|
|
274
|
+
a coincidental identical tree state, and misses oscillation where the hash
|
|
275
|
+
differs every round.
|
|
276
|
+
- **Proxy 3 is stale between council votes.** Verdicts are only appended when the
|
|
277
|
+
council actually votes (every `COUNCIL_CHECK_INTERVAL` or circuit-forced). In
|
|
278
|
+
practice proxy 3 is always fresh in the regime that matters (a hot proxy 1 forces a
|
|
279
|
+
vote), but it may lag by up to `COUNCIL_CHECK_INTERVAL` iterations otherwise.
|
|
280
|
+
- **These are heuristics, not true metacognition.** The system does not know it
|
|
281
|
+
is stuck; it infers stuckness from three correlated symptoms. A legitimately
|
|
282
|
+
hard refactor that produces no net diff for several rounds while the council
|
|
283
|
+
remains split can false-fire. Requiring at least two co-occurring signals for N consecutive rounds reduces
|
|
284
|
+
but does not eliminate false fires. The cost of a false fire is bounded: one
|
|
285
|
+
notification + one handoff + one PAUSE (auto-cleared in perpetual), opt-out
|
|
286
|
+
printed at the escalation site.
|
|
287
|
+
|
|
288
|
+
---
|
|
289
|
+
|
|
205
290
|
## Guardrails Execution Modes
|
|
206
291
|
|
|
207
292
|
- **Blocking**: Guardrail completes before agent starts (use for expensive operations)
|