loki-mode 7.18.3 → 7.19.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/SKILL.md +2 -2
- package/VERSION +1 -1
- package/autonomy/completion-council.sh +201 -0
- package/autonomy/run.sh +146 -3
- package/dashboard/__init__.py +1 -1
- package/dashboard/server.py +40 -11
- package/dashboard/static/index.html +543 -497
- package/docs/FAILURE-MEMORY-PLAN.md +424 -0
- package/docs/INSTALLATION.md +1 -1
- package/docs/VERIFIED-COMPLETION-PLAN.md +462 -0
- package/loki-ts/dist/loki.js +2 -2
- package/mcp/__init__.py +1 -1
- package/package.json +1 -1
- package/skills/quality-gates.md +30 -0
@@ -0,0 +1,424 @@
# Failure-Memory Loop - Implementation Plan

Release: next PATCH/MINOR (v7.18.4 or v7.19.0)
Status: DESIGN ONLY. No implementation code, no version bump, no commit in this plan.
Author: Architect (read-only planning pass; line numbers re-grepped on the live tree; key paths verified by running the real Python modules).

## Goal

Crashes and iteration failures become durable lessons that get injected into the
NEXT iteration's prompt, so Loki stops repeating the same mistake. Default-on
(`LOKI_FAILURE_MEMORY=1`, set to `0` to opt out). Local, zero new setup, zero
network. Builds on the already-shipped Phase 0 crash capture.

---

## IMPORTANT: this plan deviates from the task's literal Connector B design - with evidence

The task instructed Connector B to call `retrieve_anti_patterns(top_k=3)` and the
overall design to rely on consolidation turning failures into anti-patterns. I
implemented that path and tested it against the REAL modules. It does NOT close
the loop within a run. The deviation below is evidence-driven, not a preference.

### Why `retrieve_anti_patterns` cannot retrieve the failure within a run

Traced and then verified by running the modules:

1. Every ordinary non-zero iteration calls `loki_crash_capture` with a FIXED
   `error_class="IterationError"` (run.sh:12030-12038). So all plain failures
   carry the same error class.
2. If a failure were consolidated, `extract_anti_patterns` (consolidation.py:570)
   builds the anti-pattern body from `action_log[-3:]` and `resolutions`. But
   `auto_capture_episode` never populates `action_log` (run.sh:9454-9491) and our
   `ErrorEntry.resolution` is empty, so the ONLY non-empty searchable field is
   `pattern="Avoid: IterationError"` (consolidation.py:626-627). The rich
   `message` we compose in Connector A is DISCARDED by `extract_anti_patterns`.
3. `retrieve_anti_patterns(query=goal+" "+phase)` keyword-scores those words
   against `"avoid: iterationerror"` (retrieval.py:1567-1588). A real goal
   ("build a todo REST API") shares ZERO tokens with "IterationError" -> score 0
   -> not returned. Embeddings would also score near-zero. Keyword dominates on
   local setups (no `embedding_engine`).
4. All plain failures collapse into one `IterationError` group, so there is not
   even per-failure discrimination.

Empirical confirmation (ran the real `MemoryRetrieval` against a seeded failed
episode): `retrieve_anti_patterns('build a todo REST API ACT', top_k=3)`
returned `[]`. A direct recency read of the same store returned the lesson with
its full message. See "Verification performed" below.
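
The zero-overlap argument in step 3 can be illustrated with a small sketch. This is not the scoring code in retrieval.py: `tokens` and `keyword_score` are hypothetical stand-ins, and the sketch assumes keyword scoring reduces to counting shared lowercase tokens.

```python
import re

def tokens(text):
    """Lowercase alphanumeric tokens (hypothetical tokenizer, for illustration only)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def keyword_score(query, document):
    """Number of tokens the query and the document have in common."""
    return len(tokens(query) & tokens(document))

# The goal+phase query from the plan versus the only searchable anti-pattern field.
goal_query = "build a todo REST API ACT"
anti_pattern = "Avoid: IterationError"

score = keyword_score(goal_query, anti_pattern)
print(score)  # no shared tokens, so the anti-pattern is never ranked
```

Under that assumption, any goal that does not literally mention the error class scores zero against every consolidated failure, which matches the empty result reported above.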

### The fix: recency-scoped direct read (Connector B, revised)

Within a run the goal is constant, so the correct retrieval key is RECENCY
("what did I just fail at"), not goal-similarity. Connector B reads recent
FAILURE episodes directly from storage and formats their `errors_encountered`
(including the rich `message` Connector A composes), reusing the exact storage
API consolidation already uses (`list_episodes(since=)` + `load_episode`,
consolidation.py:172-182). The literal `retrieve_anti_patterns` call is KEPT as
a best-effort cross-run secondary (mostly empty locally; harmless), but it is NOT
what closes the loop.

This deviation should be visible to the reviewer: the task said "call
`retrieve_anti_patterns`"; verification shows that mechanism does not retrieve
within a run, so the within-run loop is closed by a recency read instead.

---

## Final design (validated against source + a live module run)

- CONNECTOR A - failure ingestion (run.sh `auto_capture_episode`): when
  `exit_code != 0` and the knob is on, read this iteration's scrubbed
  `.loki/crash/*.json`, map the whitelisted fields into an `ErrorEntry`, and
  attach it to the LIVE failed episode's `errors_encountered` BEFORE
  `engine.store_episode(trace)`. If telemetry is off (no crash file), SYNTHESIZE
  a minimal ErrorEntry from non-sensitive fields so the loop works regardless of
  telemetry state.
- CONNECTOR B - failure-aware retrieval (run.sh `retrieve_memory_context`): read
  recent FAILURE episodes directly and append a clearly labeled
  `PAST FAILURES TO AVOID:` block (error_type + composed message) to the memory
  context that `build_prompt` carries into the next iteration. Keep a best-effort
  `retrieve_anti_patterns` secondary for cross-run lessons.
- Connector C (per-iteration consolidation) is DROPPED. With the recency read it
  is not load-bearing (it could not produce a retrievable lesson anyway, per the
  evidence above), and the existing end-of-run consolidations
  (run.sh:12289/12338/12680) still provide cross-run semantic durability.
  Dropping it removes the perpetual-mode-volume, lock-contention, and
  index-staleness risks entirely and net-reduces code.
- Gate: `LOKI_FAILURE_MEMORY` (default 1). Both connectors no-op when `0`.
- Dual-route: bash is the engine. The Bun `build_prompt.ts` `retrieveMemoryContext`
  is an intentional empty stub; only static-line parity (if any) + fixture
  refresh applies - recommended: add no static line, so zero Bun change.

---

## Verification performed (so the reviewer can trust the deviation)

Ran the real `memory` modules against a temp `.loki/memory`:
- Stored a failed `EpisodeTrace` (outcome="failure", goal="build a todo REST API")
  with `ErrorEntry(error_type="IterationError", message="phase=ACT; signature:
  handler > parse > json.loads; fp=abc123def456", resolution="")`.
- `MemoryStorage.list_episodes(since=now-24h)` -> 1 episode; filtered
  outcome=="failure" -> 1; surfaced the lesson WITH its full message. (Connector B
  recency read works.)
- `MemoryRetrieval.retrieve_anti_patterns("build a todo REST API ACT", top_k=3)`
  -> `[]`. (Confirms the literal path does not retrieve within a run.)
- Confirmed `loki_crash_capture` fires on EVERY non-zero, non-signal iteration
  exit at run.sh:12030-12038 (before `auto_capture_episode` at 12255), so a
  scrubbed crash file normally exists for Connector A when telemetry is on.

---

## Exact files and functions to change

Line numbers re-grepped on the current tree; they drift - re-`grep -n` before editing.

### 1. autonomy/run.sh

#### 1a. CONNECTOR A - `auto_capture_episode` (def run.sh:9303; Python heredoc 9428-9524)

The episode is built and stored in ONE Python heredoc (`EpisodeTrace.create` at
9454; `engine.store_episode(trace)` at 9491). Inject the `ErrorEntry` INSIDE this
heredoc, after `trace.outcome = outcome` (9460) and before `store_episode` (9491).
Do not load-modify-restore on disk; that would race the store.

Bash, in the function body before the heredoc env block (~9420), gated:

```
# CONNECTOR A: locate this iteration's scrubbed crash file (failure only).
local _crash_json=""
if [ "${LOKI_FAILURE_MEMORY:-1}" != "0" ] && [ "$exit_code" -ne 0 ] \
   && [ -d "$target_dir/.loki/crash" ]; then
    _crash_json=$(ls -t "$target_dir/.loki/crash/"*.json 2>/dev/null | head -1 || true)
fi
```

Pass `_LOKI_CRASH_JSON="$_crash_json"`, `_LOKI_FAILURE_MEMORY="${LOKI_FAILURE_MEMORY:-1}"`,
and the already-available `_LOKI_RARV_PHASE` / `_LOKI_EXIT_CODE` into the heredoc
env block (9420-9427). Inside the heredoc, between 9460 and 9491:

```
if os.environ.get('_LOKI_FAILURE_MEMORY', '1') != '0' and outcome == 'failure':
    try:
        from memory.schemas import ErrorEntry
        crash_json_path = os.environ.get('_LOKI_CRASH_JSON', '')
        _err_type = 'IterationError'
        _message = ''
        if crash_json_path:
            with open(crash_json_path, 'r', encoding='utf-8') as _cf:
                _crash = json.load(_cf)
            _err_type = (_crash.get('error_class')
                         or _crash.get('friction_kind') or 'IterationError')
            _sig = _crash.get('stack_signature') or []
            _sig_str = ' > '.join(str(s) for s in _sig[:5]) if isinstance(_sig, list) else str(_sig)
            _phase = _crash.get('rarv_phase') or rarv_phase or ''
            _parts = []
            if _phase: _parts.append('phase=' + str(_phase))
            if _crash.get('friction_kind'): _parts.append('friction=' + str(_crash['friction_kind']))
            if _sig_str: _parts.append('signature: ' + _sig_str)
            if _crash.get('fingerprint'): _parts.append('fp=' + str(_crash['fingerprint'])[:12])
            _message = '; '.join(_parts) or 'iteration failed'
        else:
            # Telemetry-independent fallback: no crash file (e.g. telemetry off).
            # Synthesize from non-sensitive fields only. Nothing raw, no scrub needed.
            _ec = os.environ.get('_LOKI_EXIT_CODE', '')
            _message = 'phase=' + str(rarv_phase or '') + '; exit=' + str(_ec)
        trace.errors_encountered.append(ErrorEntry(
            error_type=str(_err_type), message=_message, resolution=''))
    except Exception:
        pass  # never block episode capture
```

Notes:
- `trace.outcome` is already "failure" for non-zero exit (9411-9413, 9460), so
  the failed-episode filter (consolidation.py:192) and Connector B's recency read
  both see it. No extra outcome wiring.
- REUSE the scrubbed crash file; never re-capture, never read raw. The fallback
  branch uses only `rarv_phase` + `exit_code` (no log text, no paths) so it is
  safe with NO scrubbing and works when telemetry is off.

#### 1b. CONNECTOR B - `retrieve_memory_context` (def run.sh:9031; Python heredoc 9045-9071)

Add `_LOKI_FAILURE_MEMORY="${LOKI_FAILURE_MEMORY:-1}"` to the env block at
9043-9044. After the existing `RELEVANT MEMORIES:` loop (9063-9068) and before
`PYEOF` (9071), add the recency read + best-effort secondary:

```
if os.environ.get('_LOKI_FAILURE_MEMORY', '1') != '0':
    # Tracks whether the heading was printed, so cross-run "(prior)" lines
    # never appear without the PAST FAILURES TO AVOID: heading above them.
    _header_done = False
    try:
        from memory.storage import MemoryStorage as _MS
        from memory.schemas import EpisodeTrace as _ET
        from datetime import datetime as _dt, timezone as _tz, timedelta as _td
        _s = storage if 'storage' in dir() else _MS(f'{target_dir}/.loki/memory')
        _since = _dt.now(_tz.utc) - _td(hours=24)
        _lessons = []
        for _eid in _s.list_episodes(since=_since, limit=50):
            _data = _s.load_episode(_eid)
            _ep = _ET.from_dict(_data) if isinstance(_data, dict) else _data
            if getattr(_ep, 'outcome', '') != 'failure':
                continue
            # Carry the episode timestamp so we can sort by true wall-clock
            # recency. list_episodes is newest-DAY-first, but within a day
            # files sort by a random uuid suffix in the id, NOT by time, so
            # slicing the raw order would drop the most-recent same-day
            # failure once a run has >3 in one day. Sort by timestamp instead.
            _ts = getattr(_ep, 'timestamp', None)
            _ts_key = _ts.isoformat() if hasattr(_ts, 'isoformat') else str(_ts or '')
            for _e in getattr(_ep, 'errors_encountered', []):
                _lessons.append((_ts_key, _e.error_type, _e.message))
        # newest first by true wall-clock timestamp, then take 3
        _lessons.sort(key=lambda _x: _x[0], reverse=True)
        _lessons = [(_t, _m) for (_k, _t, _m) in _lessons[:3]]
        if _lessons:
            print('')
            print('PAST FAILURES TO AVOID:')
            _header_done = True
            for _t, _m in _lessons:
                _line = '- ' + str(_t)[:80]
                if _m:
                    _line += ': ' + str(_m)[:160]
                print(_line)
    except Exception:
        pass
    # Best-effort cross-run secondary (mostly empty locally; harmless).
    try:
        _anti = retriever.retrieve_anti_patterns((goal + ' ' + phase).strip() or goal, top_k=3)
        for _a in _anti[:3]:
            _w = _a.get('what_fails') or _a.get('incorrect_approach') or _a.get('pattern', '')
            if _w:
                if not _header_done:
                    print('')
                    print('PAST FAILURES TO AVOID:')
                    _header_done = True
                print('- (prior) ' + str(_w)[:120])
    except Exception:
        pass
```

`build_prompt` captures this function's stdout into `memory_context` at
run.sh:9968. Confirm `list_episodes`/`load_episode` exist (storage.py:477/447 -
verified) and that the recency read surfaces an episode stored last iteration by
`engine.store_episode` (same-day `episodic/<date>/`; verified by live run).
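
The reason Connector B sorts by timestamp before slicing can be shown in isolation. The sketch below uses plain dicts as a hypothetical stand-in for stored episodes (it does not import `memory.storage`), and `recent_failure_lessons` is an illustrative name, not a function in the repo. The input list is deliberately in non-chronological order, mimicking same-day ids that sort by a random suffix.

```python
from datetime import datetime, timedelta, timezone

def recent_failure_lessons(episodes, now, window_hours=24, top_k=3):
    """Return up to top_k (error_type, message) pairs, newest failure first."""
    since = now - timedelta(hours=window_hours)
    lessons = []
    for ep in episodes:
        # Keep only failures inside the recency window.
        if ep["outcome"] != "failure" or ep["timestamp"] < since:
            continue
        for err in ep["errors"]:
            lessons.append((ep["timestamp"], err["error_type"], err["message"]))
    # Sort by wall-clock time, not by storage order, then cap the result.
    lessons.sort(key=lambda item: item[0], reverse=True)
    return [(etype, msg) for _, etype, msg in lessons[:top_k]]

now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)  # arbitrary fixed clock

def _failure(hours_ago, label):
    return {"outcome": "failure", "timestamp": now - timedelta(hours=hours_ago),
            "errors": [{"error_type": "IterationError", "message": label}]}

# Storage order is NOT time order.
episodes = [
    _failure(3, "third newest"),
    _failure(1, "newest"),
    _failure(30, "outside window"),
    {"outcome": "success", "timestamp": now - timedelta(minutes=30), "errors": []},
    _failure(4, "fourth newest"),
    _failure(2, "second newest"),
]

lessons = recent_failure_lessons(episodes, now)
for error_type, message in lessons:
    print(f"- {error_type}: {message}")
```

Slicing the first three entries of `episodes` without sorting would have returned the 3-hour-old and 1-hour-old failures plus one outside the window, and dropped the 2-hour-old one.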

### 2. loki-ts/src/runner/build_prompt.ts - PARITY ONLY (no logic port)

`retrieveMemoryContext` (build_prompt.ts:371-378) is an intentional empty stub
(returns "" unless `.loki/memory/index.json` exists, and "" even then; comment
374-377). Called once at build_prompt.ts:976. The DYNAMIC failure block is
environment-derived and already excluded from parity (stub returns "" and bash
returns "" when Python errors). Recommendation: add NO static instruction line ->
zero Bun change, parity stays green. If product insists on an explicit directive
line, add ONE static line, mirror it exactly in build_prompt.ts, and refresh
`loki-ts/tests/fixtures/build_prompt/*` via the repo's existing fixture-refresh
path (do not hand-edit fixtures). Update
`loki-ts/tests/parity/build_prompt.test.ts` only in that case.

---

## ErrorEntry field mapping (from the scrubbed crash whitelist)

Whitelist source: `autonomy/lib/crash_redact.py` `_WHITELIST` (lines 45-61).
`ErrorEntry` shape: `error_type`, `message`, `resolution` (schemas.py:105-117).
On-disk crash JSON is the post-scrub whitelist dict (crash_capture.py:194-200).
For ordinary failures the crash file is written at run.sh:12033 with
`error_class="IterationError"`.

| ErrorEntry field | Source crash field(s) | Mapping rule |
|------------------|------------------------|--------------|
| `error_type` | `error_class` -> else `friction_kind` -> `"IterationError"` | `error_class` is the sanitized class token (for ordinary failures it is "IterationError"; for friction records "Friction" with `friction_kind` carrying the kind). Becomes the displayed label. |
| `message` | composed from `rarv_phase` + `friction_kind` + `stack_signature` (first 5 frames) + `fingerprint` (first 12 chars) | THIS is the discriminating, retrievable content (Connector B surfaces it directly; `extract_anti_patterns` would have thrown it away). All from whitelisted fields; never raw stack text. |
| `resolution` | none at capture time | `""` (tolerated downstream). |

Telemetry-off fallback (no crash file): `error_type="IterationError"`,
`message="phase=<rarv_phase>; exit=<exit_code>"`, `resolution=""`. Uses only
non-sensitive fields -> no scrubbing required, no leak.
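
The mapping table and the fallback rule can be exercised as one pure function, which is also roughly the shape a unit test would target. This is a sketch under assumptions: the `ErrorEntry` dataclass here is a local stand-in with the three fields named above (not the real `memory.schemas` class), and `map_crash_to_error_entry` is a hypothetical helper name.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ErrorEntry:
    """Local stand-in with the three documented fields; not memory.schemas.ErrorEntry."""
    error_type: str
    message: str
    resolution: str = ""

def map_crash_to_error_entry(crash: Optional[dict], rarv_phase: str, exit_code: int) -> ErrorEntry:
    """Map a scrubbed crash dict (or None when no crash file exists) to an ErrorEntry."""
    if not crash:
        # Telemetry-off fallback: only phase and exit code, nothing raw.
        return ErrorEntry("IterationError", f"phase={rarv_phase}; exit={exit_code}")
    error_type = crash.get("error_class") or crash.get("friction_kind") or "IterationError"
    sig = crash.get("stack_signature") or []
    sig_str = " > ".join(str(s) for s in sig[:5]) if isinstance(sig, list) else str(sig)
    phase = crash.get("rarv_phase") or rarv_phase
    parts = []
    if phase:
        parts.append(f"phase={phase}")
    if crash.get("friction_kind"):
        parts.append("friction=" + str(crash["friction_kind"]))
    if sig_str:
        parts.append("signature: " + sig_str)
    if crash.get("fingerprint"):
        parts.append("fp=" + str(crash["fingerprint"])[:12])
    return ErrorEntry(str(error_type), "; ".join(parts) or "iteration failed")

crash = {
    "error_class": "IterationError",
    "rarv_phase": "ACT",
    "stack_signature": ["handler", "parse", "json.loads"],
    "fingerprint": "abc123def456789",
}
entry = map_crash_to_error_entry(crash, rarv_phase="ACT", exit_code=1)
fallback = map_crash_to_error_entry(None, rarv_phase="ACT", exit_code=1)
print(entry.message)
print(fallback.message)
```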

Not mapped (already on episode or not useful): `os`, `arch`, `loki_version`,
`node_version`, `bun_version`, `exit_code` (episode outcome already encodes
failure), `project_id_hash`, `rules_version`, `redactions_count`, `captured_at`.

All mapped values originate from the WHITELISTED file (or non-sensitive fallback
fields), so no new scrubbing is required and docs/PRIVACY.md is preserved.

---

## PAST FAILURES TO AVOID block: injection point and format

- Producer: `retrieve_memory_context` Python heredoc (run.sh, after 9068).
- Consumer: captured into `memory_context` at run.sh:9968, embedded in the prompt.
- Format (no emojis, no em dashes):

```
PAST FAILURES TO AVOID:
- <error_type, <=80 chars>: <message: phase / signature / fp, <=160 chars>
- ...
- (prior) <cross-run anti-pattern, <=120 chars>   # only if any exist
```

- Placement: AFTER `RELEVANT MEMORIES:` and the managed-store block, so positive
  memories come first and failures read as constraints. Capped at 3 recent
  lessons + up to 3 cross-run secondaries to bound prompt growth.
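
The format and caps above can be expressed as a small pure renderer, which makes the length bounds easy to assert. This is a sketch only: `render_failure_block` is a hypothetical name, and the real producer prints from inside the run.sh heredoc rather than returning a string. It returns an empty string when there is nothing to report, so no heading is emitted on its own.

```python
def render_failure_block(lessons, prior=()):
    """Render the block from (error_type, message) pairs and cross-run strings.

    lessons: list of (error_type, message), newest first; at most 3 are used.
    prior:   iterable of cross-run anti-pattern strings; at most 3 are used.
    """
    lines = []
    for error_type, message in lessons[:3]:
        line = "- " + str(error_type)[:80]
        if message:
            line += ": " + str(message)[:160]
        lines.append(line)
    for text in list(prior)[:3]:
        if text:
            lines.append("- (prior) " + str(text)[:120])
    if not lines:
        return ""
    return "\n".join(["PAST FAILURES TO AVOID:"] + lines)

block = render_failure_block(
    [("IterationError", "phase=ACT; exit=1")],
    prior=["Avoid: IterationError"],
)
print(block)
```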

---

## Default-on knob wiring (`LOKI_FAILURE_MEMORY`)

- Default 1 (on). Opt out with `LOKI_FAILURE_MEMORY=0`. Read via
  `${LOKI_FAILURE_MEMORY:-1}` (matches existing default-on knobs like
  `LOKI_INTELLIGENT_USAGE`, run.sh:12333).
- Gates (no-op when 0): Connector A crash lookup + ErrorEntry injection (bash
  guard + heredoc env check); Connector B recency read + secondary (heredoc env
  check).
- When off: failures do not attach an ErrorEntry; no `PAST FAILURES TO AVOID:`
  block is emitted; behavior reverts to current.
- INDEPENDENCE from telemetry (decided, not just documented): the crash-file
  WRITE at run.sh:12030 is gated by `loki_collection_enabled` (crash.sh:30). So a
  user with telemetry OFF but `LOKI_FAILURE_MEMORY=1` (default) would otherwise
  get a silently empty loop. Connector A's synthesized-fallback branch closes
  that gap, so the feature works regardless of telemetry state. Document the
  interaction AND ship the fallback.
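
The gate semantics are worth pinning down, because only the literal string `0` disables the feature. The helper below is a hypothetical illustration of the check both heredocs perform (`failure_memory_enabled` is not a function in the repo). Note that an unset or empty variable, and values such as `false`, all leave the feature on under this rule.

```python
import os

def failure_memory_enabled(env=None):
    """True unless LOKI_FAILURE_MEMORY is exactly the string '0'."""
    env = os.environ if env is None else env
    return env.get("LOKI_FAILURE_MEMORY", "1") != "0"

for value in (None, "1", "0", "", "false"):
    env = {} if value is None else {"LOKI_FAILURE_MEMORY": value}
    print(repr(value), failure_memory_enabled(env))
```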

---

## New tests

Reuse patterns from `tests/integration/test_rarv_c_memory_flow.sh` (behavioral
simulation exercising the real Python modules), `tests/crash/`, and
`tests/test-crash-cli.sh`.

### Test 1 (PRIMARY) - end-to-end, driven by the REAL input and the REAL query
New file: `tests/integration/test_failure_memory_loop.sh`

This test must NOT seed a query-matching record and must NOT put the error class
in the query (that is the mask that hid the original retrieval bug):
1. Input via the real path: write the crash file the way run.sh:12033 does
   (`error_class="IterationError"`, a stack producing a `stack_signature`), via
   `autonomy/lib/crash_capture.py` so the whitelist is authentic.
2. Connector A: build + store a failed `EpisodeTrace` with the ErrorEntry mapped
   from that crash file. Assert `errors_encountered` non-empty and `message`
   contains the signature/fingerprint.
3. Connector B: run the recency-read body with a goal that shares NO tokens with
   the error class (e.g. "build a todo REST API"). Assert stdout contains
   `PAST FAILURES TO AVOID:` AND the stack_signature/fingerprint text. If green
   only when the error class is in the query, the test is wrong.
4. Telemetry-off fallback: with no crash file, assert Connector A synthesizes an
   ErrorEntry (`phase=...; exit=...`) and Connector B still emits the block.

### Test 2 - knob off is inert
`LOKI_FAILURE_MEMORY=0`: no ErrorEntry attached; no `PAST FAILURES TO AVOID:`.

### Test 3 - Connector A mapping unit (Python, tests/memory/)
Feed crash JSON shapes (IterationError, Friction, ScrubError minimal, and the
no-file fallback) and assert the mapping (error_class vs friction_kind fallback;
empty resolution; message composed only from whitelisted/non-sensitive fields).

### Test 4 - privacy regression (tests/crash/ negative style)
Assert the ErrorEntry message and the rendered block contain none of: home path,
repo owner/name, email, IPv4/IPv6 (cannot, since inputs are whitelisted or the
non-sensitive fallback - guard test).
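
A guard of this kind might look like the sketch below. The patterns are illustrative assumptions, not the rules in `crash_redact.py`, and they are deliberately coarse: IPv6 and repo owner/name checks are omitted here because they need project-specific input. `find_leaks` is a hypothetical helper name.

```python
import re

# Coarse, illustrative detectors; a real guard would reuse the redaction rules.
_FORBIDDEN = {
    "home path": re.compile(r"(/Users/|/home/|[A-Za-z]:\\Users\\)"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
}

def find_leaks(text):
    """Return the sorted names of forbidden patterns found in text."""
    return sorted(name for name, pattern in _FORBIDDEN.items() if pattern.search(text))

clean = "phase=ACT; signature: handler > parse > json.loads; fp=abc123def456"
dirty = "open /Users/alice/x failed for a@b.io at 10.0.0.1"

print(find_leaks(clean))
print(find_leaks(dirty))
```

The negative test would then assert that `find_leaks` returns an empty list for both the ErrorEntry message and the rendered block.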

### Test 5 - Bun parity (only if a static line is added)
If a static directive line is added, extend
`loki-ts/tests/parity/build_prompt.test.ts` and refresh fixtures. If kept purely
dynamic (recommended), assert the existing parity suite passes unchanged.

---

## CHANGELOG entry (with honest "NOT tested" section)

```
### Added
- Failure-memory loop (LOKI_FAILURE_MEMORY, default on): iteration failures and
  crashes are surfaced into the next iteration's prompt under a
  "PAST FAILURES TO AVOID:" heading (error type + sanitized phase/stack-signature/
  fingerprint), so Loki stops repeating the same mistake. Local-only, zero new
  setup, zero network. Reuses Phase 0 scrubbed crash files (no re-capture, no raw
  data); works even with telemetry off via a non-sensitive fallback. Opt out with
  LOKI_FAILURE_MEMORY=0.

### Changed
- auto_capture_episode attaches a scrubbed (or non-sensitive fallback) ErrorEntry
  to the failed episode's errors_encountered (Connector A).
- retrieve_memory_context surfaces the most recent failure lessons by recency
  (Connector B), with a best-effort cross-run anti-pattern secondary.

### NOT tested (honest disclosure)
- Not validated on a real multi-iteration live run against a paid provider; the
  end-to-end test is a behavioral simulation against the real Python modules, not
  a full runner boot.
- Lesson usefulness is heuristic: the message carries phase + stack signature +
  fingerprint but no auto-derived fix/resolution, so guidance is "what failed,"
  not "how to fix." Whether that measurably reduces repeats is not quantified.
- Cross-run anti-pattern retrieval (the retrieve_anti_patterns secondary) is
  known to rarely match goal+phase queries (error class shares no goal tokens);
  it is kept best-effort and is not the loop-closer. Not precision-tested.
- Crash-file-to-episode matching uses most-recent-by-mtime; not tested under
  rapid multi-crash iterations.
- Bun route: failure-memory is intentionally not implemented (stub unchanged).
```

---

## Risks

| # | Risk | Likelihood | Impact | Mitigation |
|---|------|-----------|--------|------------|
| 1 | Crash-file-to-episode mismatch when multiple crash files exist in one iteration | Medium | Low | Connector A picks most-recent-by-mtime; the per-iteration crash write (12033) runs just before capture. Connector B's recency read is on EPISODES (not crash files), so even a wrong crash file only mislabels one lesson, not the loop. Future: match by fingerprint/timestamp. |
| 2 | Within-run loop closure (anti-pattern unreachable via goal query) | Was HIGH | High | RESOLVED by switching Connector B to a recency-scoped direct episode read (verified by live module run). The literal retrieve_anti_patterns path returned []; recency read returned the lesson. |
| 3 | Telemetry-off silently empties the loop (no crash file) | Was HIGH | High | RESOLVED by Connector A's non-sensitive synthesized-fallback ErrorEntry. Feature is now independent of telemetry state. |
| 4 | Retrieval relevance: recency may surface a failure unrelated to the current sub-goal | Low | Low | Within a run the goal is roughly constant, so recent failures are relevant by construction. Capped at 3. |
| 5 | Lesson quality is thin (no auto-resolution) | Medium | Medium | message carries phase + stack_signature + fingerprint (discriminating). Flagged NOT tested for repeat-reduction. Future: thread a resolution/fix once available. |
| 6 | Prompt bloat | Low | Low | <=3 recent + <=3 cross-run lines, each bounded (recent: <=80 + <=160 chars; cross-run: <=120 chars). No static line (Bun parity unchanged). |
| 7 | Privacy: lesson leaking raw data | Low | High | Inputs are whitelisted crash fields or non-sensitive fallback (phase/exit only). Guard test (Test 4) asserts no path/repo/email/IP. Local only, no egress. |
| 8 | Perpetual-mode volume / consolidation lock contention | Eliminated | n/a | Connector C (per-iteration consolidation) was DROPPED; only the existing end-of-run consolidations remain. Index-staleness (BUG-MEM-002) also moot for this feature since Connector B reads episodes, not the vector index. |

---

## Sequencing

1. Connector A (run.sh `auto_capture_episode` heredoc + bash crash lookup + fallback).
2. Connector B (run.sh `retrieve_memory_context` recency read + env + secondary).
3. Tests 1-4 (the simulation must pass with a non-matching goal query before any Bun work).
4. Bun parity decision (recommended: no static line -> no Bun change). Only if a
   static line is added: mirror in build_prompt.ts + refresh fixtures + Test 5.
5. CHANGELOG entry. (Version bump and commit are out of scope for this plan.)

## Critical Files for Implementation
- /Users/lokesh/git/loki-mode/autonomy/run.sh
- /Users/lokesh/git/loki-mode/memory/storage.py
- /Users/lokesh/git/loki-mode/memory/schemas.py
- /Users/lokesh/git/loki-mode/autonomy/lib/crash_redact.py
- /Users/lokesh/git/loki-mode/memory/retrieval.py