opencode-lore 0.2.9 → 0.3.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +37 -22
- package/package.json +2 -2
- package/src/agents-file.ts +1 -1
- package/src/index.ts +10 -7
package/README.md (CHANGED)

```diff
@@ -24,31 +24,45 @@ A **gradient context manager** decides how much of each tier to include in each
 
 > Scores below are on Claude Sonnet 4 (claude-sonnet-4-6). Results may vary with other models.
 
-###
+### Coding session recall
 
-
+20 questions across 2 real coding sessions (113K and 353K tokens), targeting specific facts at varying depths. Default mode simulates OpenCode's actual behavior: compaction of early messages + 80K-token tail window. Lore mode uses on-the-fly distillation + the `recall` tool for searching raw message history.
 
-
-|---------------------------|-----------|---------|
-| Single-session (user)     | 71.9%     | 93.8%     |
-| Single-session (prefs)    | 46.7%     | 86.7%     |
-| Single-session (assistant)| 91.1%     | 96.4%     |
-| Multi-session             | 76.9%     | 85.1%     |
-| Knowledge updates         | 84.7%     | 93.1%     |
-| Temporal reasoning        | 64.6%     | 81.9%     |
-| Abstention                | 53.3%     | 86.7%     |
-| **Overall**               | **72.6%** | **88.0%** |
+**Accuracy:**
 
-
+| Mode    | Score     | Accuracy  |
+|---------|-----------|-----------|
+| Default | 10/20     | 50.0%     |
+| Lore    | **17/20** | **85.0%** |
+
+**By question depth** (where in the session the answer lives):
+
+| Depth        | Default | Lore    | Gap   |
+|--------------|---------|---------|-------|
+| Early detail | 1/7     | **6/7** | +71pp |
+| Mid detail   | 3/5     | **5/5** | +40pp |
+| Late detail  | 6/7     | 6/7     | tied  |
+
+Early and mid details — specific numbers, file paths, design decisions, error messages — are what compaction loses and distillation preserves. Late details are in both modes' context windows, so they tie.
 
-
+**Cost:**
 
-| Metric
-
-
-
+| Metric             | Default  | Lore      | Factor       |
+|--------------------|----------|-----------|--------------|
+| Avg input/question | 126K tok | 50K tok   | 2.5x less    |
+| Total cost         | $8.14    | $1.87     | 4.4x cheaper |
+| Cost/correct       | $0.81    | **$0.11** | 7.4x cheaper |
 
-Lore's
+Lore's distilled context is smaller and more cacheable than raw tail windows, making it both more accurate and cheaper per correct answer.
+
+**Distillation compression:**
+
+| Session          | Messages | Tokens | Distilled to | Compression |
+|------------------|----------|--------|--------------|-------------|
+| cli-sentry-issue | 318      | 113K   | ~6K tokens   | 19x         |
+| cli-nightly      | 898      | 353K   | ~19K tokens  | 19x         |
+
+The eval is self-contained and reproducible: session transcripts are stored as JSON files with no database dependency. See [`eval/`](eval/) for the harness and data.
 
 ## How we got here
 
@@ -58,9 +72,11 @@ This plugin was built in a few intense sessions. Some highlights:
 
 **Markdown injection.** Property-based testing with fast-check revealed that user-generated content in facts (code fences, heading markers, thematic breaks) could break the markdown structure of the injected context, confusing the model.
 
-**v2 — observation logs.** Switching to Mastra's observer/reflector architecture with plain-text timestamped observation logs was the breakthrough
+**v2 — observation logs.** Switching to Mastra's observer/reflector architecture with plain-text timestamped observation logs was the breakthrough. The key insight: dated event logs preserve temporal relationships that structured JSON destroys.
+
+**Prompt refinements.** The push from 80% to 93.3% on the initial coding recall eval came from two observer prompt additions: "EXACT NUMBERS — NEVER APPROXIMATE" (the observer was rounding counts) and "BUG FIXES — ALWAYS RECORD" (early-session fixes were being compressed away during reflection).
 
-**
+**v3 — gradient fixes, caching, and proper eval.** A month of fixes (per-session gradient state, current-turn protection, cache.write calibration, prefix caching, LTM relevance scoring) shipped alongside a new self-contained eval harness. The old coding eval used DB-resident sessions that degraded over time as temporal pruning deleted messages. The new eval extracts full session transcripts into portable JSON files, distills on the fly with the current production prompt, seeds the DB for recall tool access, and compares against OpenCode's actual compaction behavior. This moved the coding eval from 15 questions on degraded data to 20 questions on clean 113K-353K token sessions — and confirmed the +35pp accuracy gap and 7x cost efficiency advantage.
 
 ## Installation
 
@@ -126,7 +142,6 @@ The assistant gets a `recall` tool that searches across stored messages and know
 
 - [How we solved the agent memory problem](https://www.sanity.io/blog/how-we-solved-the-agent-memory-problem) — Simen Svale at Sanity on the Nuum memory architecture: three-tier storage, distillation not summarization, recursive compression. The foundation this plugin is built on.
 - [Mastra Observational Memory](https://mastra.ai/research/observational-memory) — the observer/reflector architecture and the switch from structured JSON to timestamped observation logs that made v2 work.
 - [Mastra Memory source](https://github.com/mastra-ai/mastra/tree/main/packages/memory) — reference implementation.
-- [LongMemEval](https://arxiv.org/abs/2410.10813) — the evaluation benchmark (ICLR 2025) we used to measure progress.
 - [OpenCode](https://opencode.ai) — the coding agent this plugin extends.
 
 ## License
```
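The Compression column in the distillation table is plain division of raw tokens by distilled tokens, rounded. A quick sketch with the published figures (the function name is illustrative, not harness code):

```typescript
// Compression factor = raw session tokens / distilled tokens, rounded to the
// nearest integer — reproduces the "19x" column from the README table.
function compressionFactor(rawTokens: number, distilledTokens: number): number {
  return Math.round(rawTokens / distilledTokens);
}

console.log(compressionFactor(113_000, 6_000));  // cli-sentry-issue: 113K -> ~6K
console.log(compressionFactor(353_000, 19_000)); // cli-nightly: 353K -> ~19K
```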
package/package.json (CHANGED)

```diff
@@ -1,9 +1,9 @@
 {
   "name": "opencode-lore",
-  "version": "0.2.9",
+  "version": "0.3.0",
   "type": "module",
   "license": "MIT",
-  "description": "Three-tier memory architecture for OpenCode
+  "description": "Three-tier memory architecture for OpenCode \u2014 distillation, not summarization",
   "main": "src/index.ts",
   "exports": {
     ".": "./src/index.ts"
```
package/src/agents-file.ts (CHANGED)

```diff
@@ -175,7 +175,7 @@ function buildSection(projectPath: string): string {
   // Section heading
   out.push("## Long-term Knowledge");
 
-  for (const [category, items] of grouped) {
+  for (const [category, items] of [...grouped.entries()].sort((a, b) => a[0].localeCompare(b[0]))) {
     out.push("");
     out.push(`### ${category.charAt(0).toUpperCase() + category.slice(1)}`);
     out.push("");
```
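The one-line agents-file.ts change makes the injected section deterministic by sorting Map entries by category name before iterating, instead of relying on Map insertion order. A minimal self-contained sketch of the pattern, using a simplified `buildSectionLines` and hypothetical category data:

```typescript
// Sketch of deterministic category ordering: iterate a Map's entries in sorted
// key order so the rendered markdown is stable regardless of insertion order.
function buildSectionLines(grouped: Map<string, string[]>): string[] {
  const out: string[] = ["## Long-term Knowledge"];
  for (const [category, items] of [...grouped.entries()].sort((a, b) =>
    a[0].localeCompare(b[0]),
  )) {
    out.push("");
    // Capitalize the category for the subheading, as the real code does.
    out.push(`### ${category.charAt(0).toUpperCase() + category.slice(1)}`);
    for (const item of items) out.push(`- ${item}`);
  }
  return out;
}

// Hypothetical data inserted tooling-first; sorted output lists Style first.
const grouped = new Map<string, string[]>([
  ["tooling", ["uses bun for scripts"]],
  ["style", ["prefers early returns"]],
]);
console.log(buildSectionLines(grouped).join("\n"));
```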
package/src/index.ts (CHANGED)

```diff
@@ -234,6 +234,12 @@ export const LorePlugin: Plugin = async (ctx) => {
   }
 
   if (event.type === "session.error") {
+    // Skip eval/worker child sessions — only handle errors for real user sessions.
+    const errorSessionID = (event.properties as Record<string, unknown>).sessionID as
+      | string
+      | undefined;
+    if (errorSessionID && await shouldSkip(errorSessionID)) return;
+
     // Detect "prompt is too long" API errors and auto-recover:
     // 1. Force the gradient transform to escalate on the next call (skip layer 0/1)
     // 2. Force distillation to capture all temporal data before compaction
@@ -260,22 +266,19 @@
     );
 
     if (isPromptTooLong) {
-      const sessionID = (event.properties as Record<string, unknown>).sessionID as
-        | string
-        | undefined;
       console.error(
-        `[lore] detected 'prompt too long' error — forcing distillation + layer escalation (session: ${
+        `[lore] detected 'prompt too long' error — forcing distillation + layer escalation (session: ${errorSessionID?.substring(0, 16)})`,
       );
       // Force layer 2 on next transform — layers 0 and 1 were already too large.
       // The gradient at layers 2-4 will compress the context enough for the next turn.
       // Do NOT call session.summarize() here — it sends all messages to the model,
       // which would overflow again and create a stuck compaction loop.
-      setForceMinLayer(2,
+      setForceMinLayer(2, errorSessionID);
 
-      if (
+      if (errorSessionID) {
         // Force distillation to capture all undistilled messages into the temporal
         // store so they're preserved even if the session is later compacted manually.
-        await backgroundDistill(
+        await backgroundDistill(errorSessionID, true);
       }
     }
   }
```
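The new guard hoists the sessionID extraction to the top of the `session.error` branch: it narrows an optional string out of loosely typed event properties, then bails out early for sessions that should be skipped. A hedged sketch of that pattern; `shouldSkipSession` and the `eval_` prefix rule are hypothetical stand-ins for the plugin's actual `shouldSkip`:

```typescript
// Minimal event shape — the real plugin event type is richer.
type LoreEvent = { type: string; properties: unknown };

// Hypothetical skip rule: pretend eval/worker child sessions carry a marker prefix.
async function shouldSkipSession(id: string): Promise<boolean> {
  return id.startsWith("eval_");
}

async function handleSessionError(event: LoreEvent): Promise<"skipped" | "handled"> {
  // Narrow the optional sessionID out of the untyped properties bag.
  const errorSessionID = (event.properties as Record<string, unknown>).sessionID as
    | string
    | undefined;
  // Only real user sessions get error recovery; child sessions return early.
  if (errorSessionID && (await shouldSkipSession(errorSessionID))) return "skipped";
  return "handled";
}
```

Hoisting the extraction also lets the later `isPromptTooLong` recovery path reuse the same narrowed variable instead of re-reading `event.properties`.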