opencode-diane 0.0.5
- package/CHANGELOG.md +180 -0
- package/LICENSE +21 -0
- package/README.md +206 -0
- package/WIKI.md +1430 -0
- package/dist/index.d.ts +28 -0
- package/dist/index.js +1632 -0
- package/dist/ingest/adaptive.d.ts +47 -0
- package/dist/ingest/adaptive.js +182 -0
- package/dist/ingest/code-health.d.ts +58 -0
- package/dist/ingest/code-health.js +202 -0
- package/dist/ingest/code-map.d.ts +71 -0
- package/dist/ingest/code-map.js +670 -0
- package/dist/ingest/cross-refs.d.ts +59 -0
- package/dist/ingest/cross-refs.js +1207 -0
- package/dist/ingest/docs.d.ts +49 -0
- package/dist/ingest/docs.js +325 -0
- package/dist/ingest/git.d.ts +77 -0
- package/dist/ingest/git.js +390 -0
- package/dist/ingest/live-session.d.ts +101 -0
- package/dist/ingest/live-session.js +173 -0
- package/dist/ingest/project-notes.d.ts +28 -0
- package/dist/ingest/project-notes.js +102 -0
- package/dist/ingest/project.d.ts +35 -0
- package/dist/ingest/project.js +430 -0
- package/dist/ingest/session-snapshot.d.ts +63 -0
- package/dist/ingest/session-snapshot.js +94 -0
- package/dist/ingest/sessions.d.ts +29 -0
- package/dist/ingest/sessions.js +164 -0
- package/dist/ingest/tables.d.ts +52 -0
- package/dist/ingest/tables.js +360 -0
- package/dist/mining/skill-miner.d.ts +53 -0
- package/dist/mining/skill-miner.js +234 -0
- package/dist/search/bm25.d.ts +81 -0
- package/dist/search/bm25.js +334 -0
- package/dist/search/e5-embedder.d.ts +30 -0
- package/dist/search/e5-embedder.js +91 -0
- package/dist/search/embed-pass.d.ts +26 -0
- package/dist/search/embed-pass.js +43 -0
- package/dist/search/embedder.d.ts +58 -0
- package/dist/search/embedder.js +85 -0
- package/dist/search/inverted-index.d.ts +51 -0
- package/dist/search/inverted-index.js +139 -0
- package/dist/search/ppr.d.ts +44 -0
- package/dist/search/ppr.js +118 -0
- package/dist/search/tokenize.d.ts +26 -0
- package/dist/search/tokenize.js +98 -0
- package/dist/store/eviction.d.ts +16 -0
- package/dist/store/eviction.js +37 -0
- package/dist/store/repository.d.ts +222 -0
- package/dist/store/repository.js +420 -0
- package/dist/store/sqlite-store.d.ts +89 -0
- package/dist/store/sqlite-store.js +252 -0
- package/dist/store/vector-store.d.ts +66 -0
- package/dist/store/vector-store.js +160 -0
- package/dist/types.d.ts +385 -0
- package/dist/types.js +9 -0
- package/dist/utils/file-log.d.ts +87 -0
- package/dist/utils/file-log.js +215 -0
- package/dist/utils/peer-detection.d.ts +45 -0
- package/dist/utils/peer-detection.js +90 -0
- package/dist/utils/shell.d.ts +43 -0
- package/dist/utils/shell.js +110 -0
- package/dist/utils/usage-skill.d.ts +42 -0
- package/dist/utils/usage-skill.js +129 -0
- package/dist/utils/xlsx.d.ts +36 -0
- package/dist/utils/xlsx.js +270 -0
- package/grammars/tree-sitter-c.wasm +0 -0
- package/grammars/tree-sitter-c_sharp.wasm +0 -0
- package/grammars/tree-sitter-cpp.wasm +0 -0
- package/grammars/tree-sitter-css.wasm +0 -0
- package/grammars/tree-sitter-go.wasm +0 -0
- package/grammars/tree-sitter-html.wasm +0 -0
- package/grammars/tree-sitter-java.wasm +0 -0
- package/grammars/tree-sitter-javascript.wasm +0 -0
- package/grammars/tree-sitter-json.wasm +0 -0
- package/grammars/tree-sitter-php.wasm +0 -0
- package/grammars/tree-sitter-python.wasm +0 -0
- package/grammars/tree-sitter-rust.wasm +0 -0
- package/grammars/tree-sitter-typescript.wasm +0 -0
- package/package.json +80 -0
package/WIKI.md
ADDED
|
@@ -0,0 +1,1430 @@
|
|
|
1
|
+
# opencode-diane — Wiki
|
|
2
|
+
|
|
3
|
+
## What it is
|
|
4
|
+
|
|
5
|
+
A plugin for [OpenCode](https://opencode.ai) that gives the agent a
|
|
6
|
+
hierarchical, BM25-ranked memory store for **any git repository, in any
|
|
7
|
+
language**. It pre-fills itself from git history and project files,
|
|
8
|
+
lets the agent ingest past OpenCode sessions, and mines its own
|
|
9
|
+
contents into reusable `SKILL.md` files. No embeddings, no LLM
|
|
10
|
+
round-trips, no convention assumptions — by default. (Cross-lingual
|
|
11
|
+
semantic search is available as an explicit opt-in; see *Semantic
|
|
12
|
+
search*.)
|
|
13
|
+
|
|
14
|
+
The name is a *Twin Peaks* reference. Throughout the show, Dale
|
|
15
|
+
Cooper recorded his case notes for Diane, the recipient of his
|
|
16
|
+
investigation log. The plugin plays that role for a coding agent — a
|
|
17
|
+
persistent memory layer that holds everything it has observed about
|
|
18
|
+
the codebase. "Diane, I'm standing at the edge of a large
|
|
19
|
+
repository, and I have some thoughts on the commit history."
|
|
20
|
+
|
|
21
|
+
## Why it exists
|
|
22
|
+
|
|
23
|
+
The agent re-discovers the same things every session: which files
|
|
24
|
+
change together, what's in the build manifest, which files are
|
|
25
|
+
hotspots. Each rediscovery costs many tool calls. A small store of
|
|
26
|
+
compact, structural facts, queryable with BM25, replaces those
|
|
27
|
+
discoveries.
|
|
28
|
+
|
|
29
|
+
It's also a substrate for skill mining: after enough history the store
|
|
30
|
+
contains real patterns, and the miner turns clusters into OpenCode
|
|
31
|
+
`SKILL.md` files. Those are usable in the *same* session via the
|
|
32
|
+
`memory_skill` tool — no restart — and OpenCode also picks them up as
|
|
33
|
+
native skills on the next startup.
|
|
34
|
+
|
|
35
|
+
## No conventions — only structure
|
|
36
|
+
|
|
37
|
+
The hard rule: the plugin never interprets *culture*. It does not
|
|
38
|
+
parse commit messages for intent, does not assume a commit-message
|
|
39
|
+
style, does not reach into a language's semantics. Real repositories
|
|
40
|
+
often have no commit-message culture at all (`wip`, `.`, `更新`,
|
|
41
|
+
empty) — message-derived classification is noise dressed up as signal.
|
|
42
|
+
|
|
43
|
+
Everything the plugin derives is a **fact about what physically
|
|
44
|
+
happened or physically exists**:
|
|
45
|
+
|
|
46
|
+
- From git: per-commit diff *shape* (files touched, lines ±, files
|
|
47
|
+
created/deleted), file *co-change*, file *churn*, *recency*. The
|
|
48
|
+
commit subject is stored verbatim as searchable text — data, never
|
|
49
|
+
signal.
|
|
50
|
+
- From the tree: a file-extension census (the language signal emerges
|
|
51
|
+
from the data), the top-level layout, and recognised project/build/CI
|
|
52
|
+
files summarised **by format only** (JSON → keys, TOML → sections,
|
|
53
|
+
`Makefile` → targets, …). Recognising that a file is *named*
|
|
54
|
+
`Cargo.toml` is a fact, like knowing a file extension; interpreting
|
|
55
|
+
Rust's dependency model would be a convention, and the plugin does
|
|
56
|
+
not do that.
|
|
57
|
+
- From the language server (live): current diagnostics per file —
|
|
58
|
+
the compiler's / type-checker's own output, normalised by LSP
|
|
59
|
+
across 40+ languages. No heuristics.
|
|
60
|
+
- From tree-sitter (opt-in): per-file definition *signatures* — the
|
|
61
|
+
structural shape of the code, bodies stripped.
|
|
62
|
+
|
|
63
|
+
## Straight answers for a decision-maker
|
|
64
|
+
|
|
65
|
+
Short answers to the questions that decide whether this is worth
|
|
66
|
+
adding. Each links to the section with the full story.
|
|
67
|
+
|
|
68
|
+
**How is the memory structured?**
|
|
69
|
+
Every memory is one flat record — a `category`, a `subject`, verbatim
|
|
70
|
+
searchable `content`, structural `tags`, and bookkeeping fields
|
|
71
|
+
(`use_count`, `size_bytes`, `pinned`). There is no graph, no nesting,
|
|
72
|
+
no per-category schema. See [How the memory is
|
|
73
|
+
structured](#how-the-memory-is-structured).
|
|
74
|
+
|
|
75
|
+
**What does "hierarchical" mean here?**
|
|
76
|
+
Two address levels: a fixed top-level `category` (nine kinds — git
|
|
77
|
+
history, project facts, code map, …) and a free-form `subject` (a file
|
|
78
|
+
path, a task slug). Retrieval filters by either or both *before*
|
|
79
|
+
scoring, so narrowing to "the git history of `context.go`" costs
|
|
80
|
+
nothing. The hierarchy is those two filter levels — not a tree of
|
|
81
|
+
objects.
|
|
82
|
+
|
|
83
|
+
**What if the repo has no git history?**
|
|
84
|
+
The plugin still activates on any recognised manifest and still gives
|
|
85
|
+
you project facts, the code map, LSP code-health, and everything
|
|
86
|
+
session-driven — but the single largest source (per-commit memories,
|
|
87
|
+
co-change, churn, recency) is gone, so day one is thin. It grows
|
|
88
|
+
useful over sessions as snapshots and notes accumulate. `git init`
|
|
89
|
+
unlocks more than half the value. See [Without git
|
|
90
|
+
history](#without-git-history).
|
|
91
|
+
|
|
92
|
+
**What if commit messages are meaningless ("wip", "fix", ".")?**
|
|
93
|
+
By design this changes nothing about correctness. The plugin never
|
|
94
|
+
classifies a commit by its message — every tag comes from what the
|
|
95
|
+
commit physically did (files touched, lines ±, files created/deleted).
|
|
96
|
+
A terse message only means that one memory's searchable text is
|
|
97
|
+
low-signal; the diff-shape tags, co-change and churn are unaffected.
|
|
98
|
+
See [No conventions](#no-conventions--only-structure).
|
|
99
|
+
|
|
100
|
+
**How is it different from other memory plugins / approaches?**
|
|
101
|
+
By default it is deterministic: BM25 over a hand-built index — no
|
|
102
|
+
embeddings, no model, no API spend, fully reproducible and
|
|
103
|
+
inspectable. (An opt-in semantic-search mode adds an embedding model
|
|
104
|
+
for cross-lingual recall — off unless you enable it.) See [How it
|
|
105
|
+
compares](#how-it-compares).
|
|
106
|
+
|
|
107
|
+
**What token reduction can I actually expect?**
|
|
108
|
+
When a recall covers the task, 80–89 % measured on real repos with
|
|
109
|
+
history. That is a *ceiling*, not a promise — it assumes the recall is
|
|
110
|
+
relevant. It is lower on terse-history repos, mature/stable repos,
|
|
111
|
+
dynamic-dispatch code, and tiny repos. The `dry-run.mjs` script gives
|
|
112
|
+
your repo a GOOD / MODERATE / LOW verdict before you rely on it. See
|
|
113
|
+
[What token reduction to expect](#what-token-reduction-to-expect).
|
|
114
|
+
|
|
115
|
+
**What does it cost to run?**
|
|
116
|
+
The core plugin is ~77 KB with one small dependency, and a large store
|
|
117
|
+
costs a few hundred MB of RAM. The optional code map adds ~16 MB of
|
|
118
|
+
vendored grammar files. No GPU, no API key, no network. See
|
|
119
|
+
[Performance](#performance) and [Code map](#code-map).
|
|
120
|
+
|
|
121
|
+
**Is it production-ready?**
|
|
122
|
+
674 assertions across 24 test suites, ~90 % line coverage, verified
|
|
123
|
+
against the documented plugin contract and dry-run against real repos
|
|
124
|
+
in 30+ languages (code map covers 13 tree-sitter grammars; cross-refs
|
|
125
|
+
adds Pascal, Ruby, Perl, Elixir, Lua, Haskell, Scala, Kotlin, Swift,
|
|
126
|
+
Verilog, VHDL, COBOL, Fortran, Solidity, Smalltalk, Vim, Racket, Lisp,
|
|
127
|
+
and more). The one honest gap: it has not yet been run end-to-end inside
|
|
128
|
+
a live OpenCode *server* — see [Verifying it inside
|
|
129
|
+
a live OpenCode session](#verifying-it-inside-a-live-opencode-session).
|
|
130
|
+
|
|
131
|
+
## How the memory is structured
|
|
132
|
+
|
|
133
|
+
Every memory is **one flat record** — there is no per-category schema,
|
|
134
|
+
no nesting, no object graph. The shape, end to end:
|
|
135
|
+
|
|
136
|
+
```
|
|
137
|
+
one memory — the only record shape; every category uses it
|
|
138
|
+
─────────────────────────────────────────────────────────────────
|
|
139
|
+
id mem_mp45o0rc_c
|
|
140
|
+
category git-history one of 9 fixed kinds ┐ the two
|
|
141
|
+
subject src/context.go what the fact is about ┘ hierarchy
|
|
142
|
+
content "fix nil deref on flush" + diff shape levels
|
|
143
|
+
verbatim text — scored by BM25, never parsed
|
|
144
|
+
tags [single-file, tiny-diff, net-addition]
|
|
145
|
+
structural only — never derived from prose
|
|
146
|
+
source git:116c8060… provenance
|
|
147
|
+
pinned false true => never evicted
|
|
148
|
+
use_count 3 used_at … least-used pair ages out first
|
|
149
|
+
size_bytes 412 counts against the disk budget
|
|
150
|
+
created_at 2026-05-01T…
|
|
151
|
+
```
|
|
152
|
+
|
|
153
|
+
That uniformity is deliberate: one storage table, one inverted index,
|
|
154
|
+
one eviction rule, regardless of where a memory came from.
|
|
155
|
+
|
|
156
|
+
### The hierarchy is two filter levels
|
|
157
|
+
|
|
158
|
+
"Hierarchical" here means exactly two levels of address — nothing more
|
|
159
|
+
elaborate:
|
|
160
|
+
|
|
161
|
+
- **`category`** — a fixed, closed set of nine kinds. It says *what
|
|
162
|
+
type of fact* this is.
|
|
163
|
+
- **`subject`** — free-form. Usually a file path; sometimes a task
|
|
164
|
+
slug or a synthetic key like `<tree>` or `go.mod↔go.sum`. It says
|
|
165
|
+
*what the fact is about*.
|
|
166
|
+
|
|
167
|
+
```
|
|
168
|
+
the store
|
|
169
|
+
│
|
|
170
|
+
├─ git-history ······ commit / co-change / churn / recency memories
|
|
171
|
+
│ ├─ subject "src/context.go" a commit that touched it
|
|
172
|
+
│ ├─ subject "src/writer.go"
|
|
173
|
+
│ ├─ subject "go.mod↔go.sum" a co-change pair
|
|
174
|
+
│ └─ subject "context.go (churn)" a stability signal
|
|
175
|
+
│
|
|
176
|
+
├─ project-facts ···· manifests, tree census, README headline
|
|
177
|
+
│ ├─ subject "package.json"
|
|
178
|
+
│ └─ subject "<tree>"
|
|
179
|
+
│
|
|
180
|
+
├─ code-map ········· one signature digest per source file (opt-in)
|
|
181
|
+
├─ code-health ······ one LSP error/warning summary per file (live)
|
|
182
|
+
├─ session-snapshot · one per session — mental model, decisions
|
|
183
|
+
├─ session-trace ···· task + tool-trace summaries of past sessions
|
|
184
|
+
├─ agent-note ······· facts the agent chose to remember
|
|
185
|
+
├─ skill-mined ······ subject clusters promoted to SKILL.md
|
|
186
|
+
└─ custom ··········· anything stored with memory_remember
|
|
187
|
+
```
|
|
188
|
+
|
|
189
|
+
Retrieval can filter by `category`, by `subject`, or by both *before*
|
|
190
|
+
BM25 scoring runs — so "the git history of `context.go`" or "every
|
|
191
|
+
code-map entry" is a free narrowing, not a post-filter over a full
|
|
192
|
+
scan. That pre-score filter is the entire payoff of the hierarchy: it
|
|
193
|
+
makes a scoped recall as cheap as an unscoped one.
|
|
194
|
+
|
|
195
|
+
There is no third level and no cross-links between memories *as data*.
|
|
196
|
+
The one relationship the plugin uses — which files change together —
|
|
197
|
+
is itself stored as ordinary `git-history` memories (subject
|
|
198
|
+
`fileA↔fileB`) and consulted at query time as the co-change boost; it
|
|
199
|
+
is not a separate graph structure in the store.
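A minimal sketch of the flat record and the pre-score filter described in this section. The field names follow the record diagram; `MemoryRecord`, `RecallScope` and `narrowCandidates` are hypothetical names chosen for illustration, not the plugin's actual API. The linear filter only shows the ordering (filter first, score second); a real store would presumably answer the same question from an index.

```typescript
// Hypothetical types: the nine category names come from the diagram above.
type Category =
  | "git-history" | "project-facts" | "code-map" | "code-health"
  | "session-snapshot" | "session-trace" | "agent-note"
  | "skill-mined" | "custom";

// One flat record shape for every category (illustrative subset of fields).
interface MemoryRecord {
  id: string;
  category: Category;
  subject: string;
  content: string;
  tags: string[];
  pinned: boolean;
  useCount: number;
  sizeBytes: number;
}

interface RecallScope {
  category?: Category;
  subject?: string;
}

// Narrow the candidate set BEFORE any scoring runs.
function narrowCandidates(all: MemoryRecord[], scope: RecallScope): MemoryRecord[] {
  return all.filter((m) => {
    if (scope.category !== undefined && m.category !== scope.category) return false;
    if (scope.subject !== undefined && m.subject !== scope.subject) return false;
    return true;
  });
}

const sampleStore: MemoryRecord[] = [
  { id: "mem_a", category: "git-history", subject: "src/context.go",
    content: "fix nil deref on flush", tags: ["single-file", "tiny-diff"],
    pinned: false, useCount: 3, sizeBytes: 412 },
  { id: "mem_b", category: "git-history", subject: "src/writer.go",
    content: "buffer writes", tags: ["single-file"],
    pinned: false, useCount: 0, sizeBytes: 120 },
  { id: "mem_c", category: "project-facts", subject: "package.json",
    content: "keys: name, version, scripts", tags: [],
    pinned: false, useCount: 1, sizeBytes: 90 },
];

// "The git history of context.go": both filter levels applied together.
const scoped = narrowCandidates(sampleStore, {
  category: "git-history",
  subject: "src/context.go",
});
```

Only the records left in `scoped` would go on to BM25 scoring, which is what makes a scoped recall no more expensive than an unscoped one.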
|
|
200
|
+
|
|
201
|
+
## How it compares
|
|
202
|
+
|
|
203
|
+
The plugin is one specific point in the design space. What it trades,
|
|
204
|
+
against the common alternatives:
|
|
205
|
+
|
|
206
|
+
**vs. an embedding / vector memory.** The deliberate difference is
|
|
207
|
+
that the *default* uses *no model*. Retrieval is BM25 over a
|
|
208
|
+
hand-built inverted index — deterministic, reproducible, debuggable,
|
|
209
|
+
no GPU, no API spend. (Semantic search bridges to the embedding world
|
|
210
|
+
as an opt-in when you need cross-lingual recall — see *Semantic
|
|
211
|
+
search* — but it is off by default.) The cost of that choice
|
|
212
|
+
is real: BM25 matches *tokens*, so a query has to share words (or CJK
|
|
213
|
+
bigrams) with the memory — it will not catch a pure paraphrase the way
|
|
214
|
+
a vector search can. Three things blunt that: identifier-aware
|
|
215
|
+
tokenisation (so `getUserName` also matches `user` and `name`), the
|
|
216
|
+
co-change boost (structurally-related memories surface with no textual
|
|
217
|
+
match), and the fact that code search is mostly keyword search anyway.
|
|
218
|
+
If you specifically need semantic similarity over prose, an embedding
|
|
219
|
+
store is the better tool; for a fast, free, inspectable memory of a
|
|
220
|
+
codebase, this is.
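To make the tokenisation claims concrete, here is a rough sketch of identifier-aware splitting plus overlapping CJK bigrams. It is an illustration under assumptions, not the plugin's tokenizer: the stopword list, the CJK ranges and the name `tokenizeSketch` are all hypothetical.

```typescript
// Hypothetical stopword list; the plugin's real list is not shown in this document.
const STOPWORDS_SKETCH: string[] = ["the", "a", "an", "on", "of", "in", "to", "and", "is"];

// Kana, CJK ideographs and Hangul syllables (an assumed, simplified range set).
const CJK_RUN = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]+/g;

function tokenizeSketch(text: string): string[] {
  const tokens: string[] = [];

  // 1. Every CJK run becomes overlapping two-character tokens.
  const cjkRuns: string[] = text.match(CJK_RUN) || [];
  for (const run of cjkRuns) {
    for (let i = 0; i + 1 < run.length; i++) {
      tokens.push(run.slice(i, i + 2));
    }
  }

  // 2. Everything else: split on non-alphanumerics (this covers snake_case),
  //    then split camelCase while keeping the whole identifier as well.
  const latin = text.replace(CJK_RUN, " ");
  for (const word of latin.split(/[^A-Za-z0-9]+/)) {
    if (word.length === 0) continue;
    const parts = word.replace(/([a-z0-9])([A-Z])/g, "$1 $2").split(" ");
    const candidates = parts.length > 1 ? [word].concat(parts) : parts;
    for (const candidate of candidates) {
      const lower = candidate.toLowerCase();
      if (lower.length < 2) continue;                       // sub-2-char tokens dropped
      if (STOPWORDS_SKETCH.indexOf(lower) >= 0) continue;   // stopwords dropped
      tokens.push(lower);
    }
  }
  return tokens;
}

const identifierTokens = tokenizeSketch("getUserName");
const queryTokens = tokenizeSketch("nil deref on flush");
const cjkTokens = tokenizeSketch("更新日志");
```

With this scheme a query for `user` shares a token with a memory that only ever mentions `getUserName`, and a CJK commit subject contributes bigrams that BM25 can match on.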
|
|
221
|
+
|
|
222
|
+
**vs. aider's repo-map.** aider uses tree-sitter too, but the design
|
|
223
|
+
is different at every level.
|
|
224
|
+
|
|
225
|
+
*How it works (from the source).* The expensive step — tree-sitter
|
|
226
|
+
parsing and tag extraction — is cached persistently on disk
|
|
227
|
+
(`.aider.tags.cache.v{N}`, using `diskcache`) across sessions. What
|
|
228
|
+
is recomputed on each message turn is the *ranking*: a full PageRank
|
|
229
|
+
run (via NetworkX) on a symbol-reference dependency graph, where each
|
|
230
|
+
source file is a node and edges are weighted by how often one file
|
|
231
|
+
references symbols defined in another. The ranking is *personalised*
|
|
232
|
+
to the current turn — files in the active chat get a ×50 multiplier
|
|
233
|
+
on their outgoing edges; symbols mentioned in the current message get
|
|
234
|
+
×10; long compound identifiers (camelCase / snake_case, ≥ 8 chars)
|
|
235
|
+
×10; private `_`-prefixed symbols ×0.1. An in-session in-memory cache
|
|
236
|
+
short-circuits the PageRank when the inputs are unchanged. The ranked
|
|
237
|
+
tags are then fitted into the token budget by binary search.
|
|
238
|
+
|
|
239
|
+
*The budget is dynamic.* The default token cap is 1 024 (`--map-tokens`),
|
|
240
|
+
but when no files are in the chat the budget multiplies by
|
|
241
|
+
`map_mul_no_files=8` — up to ~8 192 tokens — so an empty chat gets a
|
|
242
|
+
much wider view of the whole repo.
|
|
243
|
+
|
|
244
|
+
*Where diane lands.* diane's default co-change boost is deliberately
|
|
245
|
+
one hop — direct neighbours only — which is cheaper and trivially
|
|
246
|
+
inspectable, but narrower than aider's whole-graph PageRank. The
|
|
247
|
+
`personalizedPageRank` option closes that gap: turned on, diane runs
|
|
248
|
+
its own Personalized PageRank (a restart-biased random walk seeded on
|
|
249
|
+
the query's textual hits) over the co-change graph, so relevance
|
|
250
|
+
reaches multi-hop files graded by graph distance. It is off by
|
|
251
|
+
default — the random walk is a per-recall iterative computation (a few
|
|
252
|
+
ms on a large graph) and less trivially traceable than one hop, so the
|
|
253
|
+
cheap, fully-inspectable path stays the default and PPR is there for
|
|
254
|
+
those who want the wider reach. The graph differs from aider's in
|
|
255
|
+
kind: aider's edges are *symbol references* (who calls whom), diane's
|
|
256
|
+
are *co-change* (what changes together in Git history) — structural
|
|
257
|
+
coupling rather than static call structure.
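For readers unfamiliar with the technique, the following is a small power-iteration sketch of Personalized PageRank over a co-change graph. It is not diane's implementation: the graph representation, the restart probability of 0.15, the iteration count and the file `src/buffer.go` are assumptions made for the example.

```typescript
// Hypothetical adjacency list: file -> { co-changed file -> weight }.
// Assumes every edge target is also a key of the graph.
type CoChangeGraph = Record<string, Record<string, number>>;

function personalizedPageRankSketch(
  graph: CoChangeGraph,
  seeds: string[],
  restart: number = 0.15,   // probability of jumping back to a seed (assumed value)
  iterations: number = 30
): Record<string, number> {
  const nodes = Object.keys(graph);
  const known = seeds.filter((s) => nodes.indexOf(s) >= 0);
  const restartVec: Record<string, number> = {};
  let rank: Record<string, number> = {};
  for (const n of nodes) {
    const w = known.indexOf(n) >= 0 ? 1 / known.length : 0;
    restartVec[n] = w;
    rank[n] = w;
  }
  for (let i = 0; i < iterations; i++) {
    const next: Record<string, number> = {};
    // Restart step: a fixed share of the mass always returns to the seeds.
    for (const n of nodes) next[n] = restart * (restartVec[n] ?? 0);
    for (const from of nodes) {
      const edges = graph[from] ?? {};
      const targets = Object.keys(edges);
      let total = 0;
      for (const t of targets) total += edges[t] ?? 0;
      const mass = (1 - restart) * (rank[from] ?? 0);
      if (total === 0) {
        // A node without edges hands its mass back to the seeds.
        for (const n of nodes) next[n] = (next[n] ?? 0) + mass * (restartVec[n] ?? 0);
        continue;
      }
      // Walk step: spread the rest along edges, proportional to weight.
      for (const t of targets) {
        next[t] = (next[t] ?? 0) + mass * ((edges[t] ?? 0) / total);
      }
    }
    rank = next;
  }
  return rank;
}

const coChangeDemo: CoChangeGraph = {
  "src/context.go": { "src/writer.go": 1 },
  "src/writer.go": { "src/context.go": 1, "src/buffer.go": 1 },
  "src/buffer.go": { "src/writer.go": 1 },
  "docs/intro.md": {},
};
// Seed the walk on the file the query hit textually.
const pprDemo = personalizedPageRankSketch(coChangeDemo, ["src/context.go"]);
```

In this toy graph the one-hop neighbour `src/writer.go` ends up with more rank than the two-hop `src/buffer.go`, and the unconnected `docs/intro.md` receives none, which is the multi-hop reach that the one-hop boost cannot provide.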
|
|
258
|
+
|
|
259
|
+
*Output format: source lines, not stripped signatures.* aider's output
|
|
260
|
+
(via `TreeContext` with "lines of interest") shows the actual source
|
|
261
|
+
lines of the referenced symbols — class attributes, multi-line
|
|
262
|
+
signatures, brief context — not just a single signature string per
|
|
263
|
+
definition. Richer context per symbol, but more tokens per symbol;
|
|
264
|
+
diane's code map is more compact, covering more files at a lower
|
|
265
|
+
per-file token cost.
|
|
266
|
+
|
|
267
|
+
*How diane's code map differs.* It does not track symbol references or
|
|
268
|
+
run a graph algorithm. Every file gets one flat signature digest; BM25
|
|
269
|
+
recall selects the most query-relevant digests at call time. The map
|
|
270
|
+
is available immediately (persisted from prefill) and the token cost
|
|
271
|
+
is predictable at every call. It is also only one of nine memory
|
|
272
|
+
categories — git history, past sessions, mined skills and snapshots
|
|
273
|
+
sit alongside it. The benchmark repo (`opencode-diane-benchmarks`)
|
|
274
|
+
compares the two maps directly on real repositories.
|
|
275
|
+
|
|
276
|
+
**vs. AGENTS.md / static context files.** Those are loaded into the
|
|
277
|
+
prompt *every turn* — a fixed, recurring token cost the model pays
|
|
278
|
+
whether or not it needs them. This plugin is *pull*, not push: a
|
|
279
|
+
memory costs tokens only on the turn it is recalled. The two are
|
|
280
|
+
complementary — AGENTS.md for guidance the model should always see,
|
|
281
|
+
diane for facts it needs only sometimes.
|
|
282
|
+
|
|
283
|
+
**vs. no memory at all.** Without a memory the agent re-runs the same
|
|
284
|
+
`git log`, `ls -R`, `grep` and file reads every session. That raw
|
|
285
|
+
discovery is the baseline the token-savings numbers below are measured
|
|
286
|
+
against.
|
|
287
|
+
|
|
288
|
+
## The pillars
|
|
289
|
+
|
|
290
|
+
**1. Hierarchical store.** Top-level `category` (`git-history`,
|
|
291
|
+
`project-facts`, `code-health`, `code-map`, `session-trace`,
|
|
292
|
+
`session-snapshot`, `agent-note`, `skill-mined`, `custom`) + free-form
|
|
293
|
+
`subject` (file path, task slug). Retrieval filters by both before
|
|
294
|
+
scoring, so narrowing is free.
|
|
295
|
+
|
|
296
|
+
**2. BM25 retrieval, co-change-boosted.** Pure-JS tokenizer with
|
|
297
|
+
camelCase / snake_case splitting for Latin text and **overlapping
|
|
298
|
+
bigrams for CJK** (Chinese, Japanese, Korean) — CJK has no word
|
|
299
|
+
delimiters, so an ASCII splitter would drop it entirely; bigrams give
|
|
300
|
+
BM25 units to match on, the same dependency-free approach Lucene's CJK
|
|
301
|
+
analyzer and SQLite FTS5 use (see *Multilingual retrieval*). Inverted
|
|
302
|
+
index, `k1=1.2 b=0.75`, plus a small log-of-useCount tiebreak. On top
|
|
303
|
+
of textual scoring, a one-hop **co-change boost**: a hit about file X
|
|
304
|
+
pulls in memories about files X is historically modified with —
|
|
305
|
+
structurally-related context a pure text match would miss. (With
|
|
306
|
+
`personalizedPageRank` on, that one hop becomes a full
|
|
307
|
+
restart-biased random walk over the co-change graph, reaching
|
|
308
|
+
multi-hop files — opt-in; see *How it compares*.) Recall
|
|
309
|
+
output is **token-budgeted**: ranked hits are packed to a ceiling
|
|
310
|
+
(default 1200) so a
|
|
311
|
+
call's context cost is predictable; an oversized sole hit is
|
|
312
|
+
content-truncated rather than allowed to blow the budget.
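The scoring step can be sketched as follows. This is a hedged illustration: `k1`, `b` and the `log1p(use_count) * 0.05` tiebreak come from the text, while the IDF variant (the Lucene-style `ln(1 + (N - df + 0.5) / (df + 0.5))`), the choice to add the tiebreak only to documents that matched, and the names `ScoredDoc` and `bm25ScoresSketch` are assumptions.

```typescript
interface ScoredDoc {
  id: string;
  tokens: string[];   // already tokenised content
  useCount: number;
}

function bm25ScoresSketch(
  query: string[],
  docs: ScoredDoc[],
  k1: number = 1.2,
  b: number = 0.75
): Record<string, number> {
  const n = docs.length;
  let totalLen = 0;
  for (const d of docs) totalLen += d.tokens.length;
  const avgLen = n > 0 ? totalLen / n : 0;

  // Document frequency of each query term.
  const df: Record<string, number> = {};
  for (const term of query) {
    let count = 0;
    for (const d of docs) if (d.tokens.indexOf(term) >= 0) count++;
    df[term] = count;
  }

  const scores: Record<string, number> = {};
  for (const d of docs) {
    let score = 0;
    for (const term of query) {
      let tf = 0;
      for (const t of d.tokens) if (t === term) tf++;
      if (tf === 0) continue;
      const dfTerm = df[term] ?? 0;
      // Assumed IDF variant; always positive.
      const idf = Math.log(1 + (n - dfTerm + 0.5) / (dfTerm + 0.5));
      // Length normalisation: longer-than-average documents are damped by b.
      const norm = tf + k1 * (1 - b + b * (d.tokens.length / avgLen));
      score += idf * ((tf * (k1 + 1)) / norm);
    }
    // Small usage tiebreak, applied only to documents that matched (assumption).
    if (score > 0) score += Math.log(1 + d.useCount) * 0.05;
    scores[d.id] = score;
  }
  return scores;
}

const bm25Demo = bm25ScoresSketch(["nil", "deref", "flush"], [
  { id: "A", tokens: ["fix", "nil", "deref", "flush"], useCount: 0 },
  { id: "B", tokens: ["fix", "nil", "deref", "flush"], useCount: 3 },
  { id: "C", tokens: ["update", "readme"], useCount: 10 },
]);
```

Documents `A` and `B` have identical text, so only the usage term separates them; `C` is heavily used but shares no token with the query and therefore scores zero.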
|
|
313
|
+
|
|
314
|
+
The retrieval path, end to end:
|
|
315
|
+
|
|
316
|
+
```
|
|
317
|
+
memory_recall("nil deref on flush", category?, subject?, prefer?)
|
|
318
|
+
│
|
|
319
|
+
▼ tokenize camelCase / snake_case split · CJK -> bigrams ·
|
|
320
|
+
│ stopwords dropped · sub-2-char tokens dropped
|
|
321
|
+
[nil, deref, flush]
|
|
322
|
+
│
|
|
323
|
+
▼ filter category / subject narrow the candidate set
|
|
324
|
+
│ BEFORE scoring — a scoped recall is free
|
|
325
|
+
│
|
|
326
|
+
▼ BM25 k1=1.2 b=0.75 + log1p(use_count)*0.05 tiebreak
|
|
327
|
+
│
|
|
328
|
+
▼ co-change a hit on context.go pulls in writer.go if history
|
|
329
|
+
│ boost shows them changing together (a direct text
|
|
330
|
+
│ match still outranks a co-change-surfaced one)
|
|
331
|
+
│
|
|
332
|
+
▼ prefer lean optional: gently up/down-rank code vs tests vs
|
|
333
|
+
│ history to match the query's intent
|
|
334
|
+
│
|
|
335
|
+
▼ token-budget ranked hits packed to <= tokenBudget (default
|
|
336
|
+
│ pack 1200); the remainder returned as an omitted
|
|
337
|
+
│ count; an oversized sole hit is truncated
|
|
338
|
+
▼
|
|
339
|
+
bounded, predictable result
|
|
340
|
+
```
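The final packing step of that path might look roughly like the sketch below. The default budget of 1200 is from the text; the four-characters-per-token estimate, the decision to stop at the first hit that does not fit, and the names `RankedHit` and `packToBudget` are assumptions for illustration.

```typescript
interface RankedHit {
  id: string;
  content: string;
}

// Hypothetical heuristic: roughly four characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

function packToBudget(
  hits: RankedHit[],           // already sorted best-first
  tokenBudget: number = 1200
): { packed: RankedHit[]; omitted: number } {
  const packed: RankedHit[] = [];
  let used = 0;
  for (const hit of hits) {
    const cost = estimateTokens(hit.content);
    if (used + cost > tokenBudget) {
      if (packed.length === 0) {
        // An oversized top hit is truncated rather than dropped.
        packed.push({ id: hit.id, content: hit.content.slice(0, tokenBudget * 4) });
      }
      break;
    }
    packed.push(hit);
    used += cost;
  }
  // Everything not packed is reported as a count, not silently lost.
  return { packed, omitted: hits.length - packed.length };
}

// Helper for the demo: a string of n "x" characters.
function filler(n: number): string {
  return new Array(n + 1).join("x");
}

const packedDemo = packToBudget([
  { id: "h1", content: filler(2000) },   // about 500 tokens
  { id: "h2", content: filler(2000) },
  { id: "h3", content: filler(2000) },   // would exceed 1200
]);
const oversizedDemo = packToBudget([{ id: "big", content: filler(10000) }]);
```

Here two hits fit and one is reported as omitted, while the single 10 000-character hit comes back cut down to the budget instead of exceeding it.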
|
|
341
|
+
|
|
342
|
+
**3. Structural pre-fill.** Walks the last 500 commits via
|
|
343
|
+
`git log --numstat --summary`; every non-merge commit becomes a memory
|
|
344
|
+
tagged purely by diff shape. Adds co-change, churn and recency
|
|
345
|
+
memories. Separately, censuses the file tree and summarises recognised
|
|
346
|
+
project files by format. Works identically on a Go, Rust, Python,
|
|
347
|
+
Elixir, or polyglot repo.
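As an illustration of tagging "purely by diff shape", the sketch below turns `--numstat` rows for one commit into tags. The row format (`added<TAB>deleted<TAB>path`, with `-` for binary files) is git's; the tag names are taken from the record diagram earlier in this page, but the 10-line threshold for `tiny-diff` and the function names are assumptions.

```typescript
interface NumstatEntry {
  path: string;
  added: number;
  deleted: number;
}

// Parse one "added<TAB>deleted<TAB>path" row of `git log --numstat` output.
function parseNumstatLine(line: string): NumstatEntry | null {
  const parts = line.split("\t");
  if (parts.length < 3) return null;
  const addedRaw = parts[0] ?? "";
  const deletedRaw = parts[1] ?? "";
  const path = parts.slice(2).join("\t");
  // Binary files are reported as "-"; count them as zero lines.
  const added = addedRaw === "-" ? 0 : parseInt(addedRaw, 10);
  const deleted = deletedRaw === "-" ? 0 : parseInt(deletedRaw, 10);
  if (isNaN(added) || isNaN(deleted)) return null;
  return { path: path, added: added, deleted: deleted };
}

// Derive tags from what the commit physically did, never from its message.
function diffShapeTags(entries: NumstatEntry[]): string[] {
  const tags: string[] = [];
  let added = 0;
  let deleted = 0;
  for (const e of entries) {
    added += e.added;
    deleted += e.deleted;
  }
  if (entries.length === 1) tags.push("single-file");
  if (added + deleted <= 10) tags.push("tiny-diff");   // hypothetical threshold
  if (added > deleted) tags.push("net-addition");
  return tags;
}

const numstatRows = ["7\t2\tsrc/context.go"];
const parsedRows: NumstatEntry[] = [];
for (const row of numstatRows) {
  const entry = parseNumstatLine(row);
  if (entry !== null) parsedRows.push(entry);
}
const shapeTags = diffShapeTags(parsedRows);
```

The commit subject never enters `diffShapeTags`, so a message of `wip` or `.` yields exactly the same tags as a descriptive one.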
|
|
348
|
+
|
|
349
|
+
What prefill does, on every startup:
|
|
350
|
+
|
|
351
|
+
```
|
|
352
|
+
OpenCode starts
|
|
353
|
+
│
|
|
354
|
+
▼ activate? — git repo OR a recognised manifest present?
|
|
355
|
+
│ if neither: log one idle line, register no tools
|
|
356
|
+
│
|
|
357
|
+
▼ prefill (background — the agent can query partial results at once)
|
|
358
|
+
│
|
|
359
|
+
├── git log --numstat --summary -> per-commit · co-change · churn · recency
|
|
360
|
+
├── walk the file tree ----------> extension census · layout · manifest digests
|
|
361
|
+
├── tree-sitter parse (opt-in) -> per-file signature digests (code-map)
|
|
362
|
+
├── past OpenCode sessions ------> task + tool-trace summaries
|
|
363
|
+
└── most recent session-snapshot > resume point logged
|
|
364
|
+
│
|
|
365
|
+
▼ store ready — every later session starts warm
|
|
366
|
+
```
|
|
367
|
+
|
|
368
|
+
**4. Live code-health.** Subscribes to OpenCode's
|
|
369
|
+
`lsp.client.diagnostics` event and keeps one `code-health` memory per
|
|
370
|
+
file reflecting its *current* error/warning count — re-reports
|
|
371
|
+
replace, not accumulate. Convention-free, language-agnostic, no new
|
|
372
|
+
dependency.
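"Replace, not accumulate" can be pictured with a keyed overwrite. The payload shape below is hypothetical (the real `lsp.client.diagnostics` event may carry different fields), as are the names `DiagnosticSketch` and `onDiagnostics`.

```typescript
// Hypothetical diagnostic shape, for illustration only.
interface DiagnosticSketch {
  severity: "error" | "warning" | "info";
  message: string;
}

interface HealthSummary {
  errors: number;
  warnings: number;
}

// One summary per file, keyed by path.
const healthByFile: Record<string, HealthSummary> = {};

function onDiagnostics(file: string, diagnostics: DiagnosticSketch[]): void {
  let errors = 0;
  let warnings = 0;
  for (const d of diagnostics) {
    if (d.severity === "error") errors++;
    else if (d.severity === "warning") warnings++;
  }
  // Overwrite: the summary reflects the latest report only.
  healthByFile[file] = { errors: errors, warnings: warnings };
}

// First report: one error, one warning.
onDiagnostics("src/context.go", [
  { severity: "error", message: "undefined: flushAll" },
  { severity: "warning", message: "unused variable" },
]);
// Second report after a fix: the error is gone, so the count drops to zero.
onDiagnostics("src/context.go", [
  { severity: "warning", message: "unused variable" },
]);
```

After the second report the entry for `src/context.go` shows zero errors, because each report replaces the previous summary instead of adding to it.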
|
|
373
|
+
|
|
374
|
+
**5. Code map (optional).** With `enableCodeMap` (on by default since v0.0.4, per *Configuration*), tree-sitter parses
|
|
375
|
+
each source file and stores the *signatures* of its definitions
|
|
376
|
+
(bodies stripped) — an Aider-style repo map, reachable via
|
|
377
|
+
`memory_code_map`. This is the one heavyweight, language-aware
|
|
378
|
+
feature; see *Code map* below.
|
|
379
|
+
|
|
380
|
+
**6. Session snapshots.** `memory_snapshot` records a session's
|
|
381
|
+
*understanding* — mental model, decisions, learned conventions — as a
|
|
382
|
+
pinned `session-snapshot` memory. Each snapshot tags the previous session's snapshot as
|
|
383
|
+
`parent`, so the set is a branchable history with no DAG structure
|
|
384
|
+
beyond the tags; a later or parallel session resumes from the latest.
|
|
385
|
+
See *Session snapshots* below.
|
|
386
|
+
|
|
387
|
+
**7. LFU disk budget.** Configurable byte cap (default 50 MB — see
|
|
388
|
+
*Configuration* and the *heap* note under *Performance*). After every
|
|
389
|
+
mutation, entries are evicted in ascending `(useCount, usedAt)` order until the store is back under the cap. Pinned
|
|
390
|
+
entries (including snapshots) are never evicted.
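A sketch of that eviction rule, assuming `usedAt` is a numeric timestamp and that sizes are tracked per entry; `EvictableEntry` and `evictToBudget` are hypothetical names rather than the plugin's own.

```typescript
interface EvictableEntry {
  id: string;
  pinned: boolean;
  useCount: number;
  usedAt: number;      // assumed: epoch milliseconds of last use
  sizeBytes: number;
}

function evictToBudget(
  entries: EvictableEntry[],
  maxBytes: number
): { kept: EvictableEntry[]; evicted: string[] } {
  let total = 0;
  for (const e of entries) total += e.sizeBytes;

  // Pinned entries are never candidates; the rest sort least-used, then oldest.
  const candidates = entries
    .filter((e) => !e.pinned)
    .sort((x, y) => x.useCount - y.useCount || x.usedAt - y.usedAt);

  const evicted: string[] = [];
  for (const c of candidates) {
    if (total <= maxBytes) break;
    total -= c.sizeBytes;
    evicted.push(c.id);
  }
  const kept = entries.filter((e) => evicted.indexOf(e.id) < 0);
  return { kept: kept, evicted: evicted };
}

const evictionDemo = evictToBudget(
  [
    { id: "snap", pinned: true, useCount: 0, usedAt: 10, sizeBytes: 400 },
    { id: "a", pinned: false, useCount: 0, usedAt: 100, sizeBytes: 300 },
    { id: "b", pinned: false, useCount: 0, usedAt: 200, sizeBytes: 300 },
    { id: "c", pinned: false, useCount: 5, usedAt: 50, sizeBytes: 300 },
  ],
  800
);
```

The pinned snapshot survives even though it has never been used, and `c` survives over the more recently touched `a` and `b` because use count is compared before recency.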
|
|
391
|
+
|
|
392
|
+
**8. Skill mining.** Clusters memories by `subject`. Clusters with
|
|
393
|
+
≥ 3 entries become `<root>/.opencode/skills/<slug>/SKILL.md`. Runs
|
|
394
|
+
in the background; the tool returns immediately. The mined skills are
|
|
395
|
+
usable in the same session through the `memory_skill` tool — no
|
|
396
|
+
restart — and OpenCode also loads them as native skills next start.
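The clustering rule can be sketched as a group-by on `subject` with a size threshold. The threshold of 3 and the output path pattern come from the text; the slug format, the relative path prefix and the function names are assumptions, and the real miner presumably does more than collect contents.

```typescript
interface MinedInput {
  subject: string;
  content: string;
}

// Hypothetical slug rule: lowercase, non-alphanumerics collapsed to "-".
function slugify(subject: string): string {
  return subject.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Group memories by subject and keep clusters that reach the minimum size.
function clusterBySubject(
  memories: MinedInput[],
  minCluster: number = 3
): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const m of memories) {
    const slug = slugify(m.subject);
    const bucket = groups[slug] ?? [];
    bucket.push(m.content);
    groups[slug] = bucket;
  }
  const skills: Record<string, string[]> = {};
  for (const slug of Object.keys(groups)) {
    const bucket = groups[slug] ?? [];
    if (bucket.length >= minCluster) {
      skills[".opencode/skills/" + slug + "/SKILL.md"] = bucket;
    }
  }
  return skills;
}

const minedDemo = clusterBySubject([
  { subject: "src/context.go", content: "fix nil deref on flush" },
  { subject: "src/context.go", content: "add flush timeout" },
  { subject: "src/context.go", content: "rename ctx field" },
  { subject: "src/writer.go", content: "buffer writes" },
  { subject: "src/writer.go", content: "close on error" },
]);
```

Only `src/context.go` reaches three entries here, so it is the only subject that would be promoted to a skill file.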
|
|
397
|
+
|
|
398
|
+
## The ten tools
|
|
399
|
+
|
|
400
|
+
| Tool | Purpose |
|
|
401
|
+
|---|---|
|
|
402
|
+
| `memory_recall(query, category?, subject?, prefer?, limit?, tokenBudget?)` | Search the store — co-change-boosted, token-budgeted. `prefer` ('code'/'tests'/'history') leans ranking to match query intent. The recall-first entry point. |
|
|
403
|
+
| `memory_code_map(query?, tokenBudget?)` | Aider-style file-signature map, ranked + budgeted. Needs `enableCodeMap`. |
|
|
404
|
+
| `memory_remember(subject, content, tags?)` | Save a fact for future turns. |
|
|
405
|
+
| `memory_snapshot(summary, decisions?, conventions?)` | Record this session's understanding for a later/parallel session to resume from. |
|
|
406
|
+
| `memory_outline()` | Counts per category — cheap orientation. |
|
|
407
|
+
| `memory_status()` | Size, byte usage vs budget, last-ingest timestamps. |
|
|
408
|
+
| `memory_ingest_sessions()` | Pull task + tool-trace summaries from past OpenCode sessions. |
|
|
409
|
+
| `memory_ingest_git()` | Re-scan git history for commits that have arrived since startup. Idempotent: `insertIfMissing` skips already-known commits. The plugin also auto-triggers this when a `bash` call moves HEAD; this tool is the explicit version. |
|
|
410
|
+
| `memory_mine_skills(reason?)` | Cluster memories into SKILL.md files. Background. |
|
|
411
|
+
| `memory_skill(name?)` | List the mined skill files, or load one into the conversation — so a skill mined this session is usable now, no restart. |
|
|
412
|
+
|
|
413
|
+
Tool descriptions are deliberately **directive** — they tell the
|
|
414
|
+
agent to recall *before* raw discovery and frame the token-cost
|
|
415
|
+
argument, since the description is the only prompt a plugin controls.
|
|
416
|
+
On top of that, a `tool.execute.before/after` pair provides a gentle
|
|
417
|
+
**recall-first nudge**: if the agent makes a couple of raw discovery
|
|
418
|
+
calls without ever touching a memory tool, one reminder is appended to
|
|
419
|
+
a discovery result. It fires at most once per session, never on
|
|
420
|
+
`read` output (file contents stay pristine), and goes silent the
|
|
421
|
+
moment any memory tool is used.
|
|
422
|
+
|
|
423
|
+
## Activation
|
|
424
|
+
|
|
425
|
+
Activates on any directory that is a git repository **or** contains at
|
|
426
|
+
least one recognised project/build file (a flat list of filenames
|
|
427
|
+
across ecosystems — no language logic). Otherwise it logs one idle
|
|
428
|
+
line and registers no tools. `forceActive: true` overrides this check.
|
|
429
|
+
|
|
430
|
+
## Without git history
|
|
431
|
+
|
|
432
|
+
Git is the largest single source: per-commit memories plus co-change,
|
|
433
|
+
churn and recency — and co-change is the entire backing for the
|
|
434
|
+
retrieval boost and the closest thing the plugin has to a graph.
|
|
435
|
+
Without git, none of that exists.
|
|
436
|
+
|
|
437
|
+
What remains git-independent:
|
|
438
|
+
|
|
439
|
+
- **Project facts** — manifests, build/CI files, tree census, README
|
|
440
|
+
headline. Real, but a modest slice: orientation, not history-derived
|
|
441
|
+
intelligence.
|
|
442
|
+
- **Code map** (`enableCodeMap`) — tree-sitter parses the file tree
|
|
443
|
+
directly and never touches git. On a non-git repo this is the main
|
|
444
|
+
source of actual codebase intelligence.
|
|
445
|
+
- **Code health** — LSP diagnostics, event-driven.
|
|
446
|
+
- **Session snapshots, agent notes, session ingestion, skill mining** —
|
|
447
|
+
all agent- and session-driven; they accumulate across sessions
|
|
448
|
+
regardless of git.
|
|
449
|
+
- The retrieval machinery itself — BM25, the inverted index, token
|
|
450
|
+
budgeting, the recall-first nudge — is entirely git-independent. It
|
|
451
|
+
simply has less to retrieve at first.
|
|
452
|
+
|
|
453
|
+
So the honest picture: **weak on a fresh non-git repo on day one**
|
|
454
|
+
(project facts alone are thin, and `measure-savings.mjs` will report
|
|
455
|
+
"inconclusive" there for exactly that reason), but **not useless over
|
|
456
|
+
time**, because snapshots, notes and traces build a store that recall
|
|
457
|
+
still operates on. `detectWorkableRepo` accepts a recognised manifest
|
|
458
|
+
*or* git, so a non-git Node/Rust/Python project still activates and
|
|
459
|
+
ingests project facts — only a directory with neither git nor a
|
|
460
|
+
manifest sits idle (and needs `forceActive`).
|
|
461
|
+
|
|
462
|
+
Recommendation: if you work without git, enable `enableCodeMap` and
|
|
463
|
+
use `memory_snapshot` / `memory_remember` deliberately — on a non-git
|
|
464
|
+
repo the store is only as good as what you and past sessions put in.
|
|
465
|
+
If the repo could be under git, `git init` unlocks more than half the
|
|
466
|
+
plugin's value and is the cheapest fix.
|
|
467
|
+
|
|
468
|
+
## Configuration
|
|
469
|
+
|
|
470
|
+
Defaults work without any config. To override, list the plugin as a
|
|
471
|
+
`[name, options]` tuple in `opencode.json`; OpenCode passes the
|
|
472
|
+
options straight through, and they're coerced defensively (bad keys
|
|
473
|
+
ignored, defaults applied).
|
|
474
|
+
|
|
475
|
+
```ts
interface UserConfig {
  maxMemoryDiskMB?: number  // default 50
  autoIngestOnStartup?: boolean  // default true
  gitHistoryDepth?: number  // default 500
  forceActive?: boolean  // default false
  skillsOutputDir?: string  // default ".opencode/skills"
  skillMiningMinCluster?: number  // default 3
  ingestSessions?: boolean  // default true
  enableCodeMap?: boolean  // default true; tree-sitter signatures (since v0.0.4)
  installUsageSkill?: boolean  // default true; write a using-memory skill on first startup
  ingestDocs?: boolean  // default true; index docs/ headings as section pointers
  ingestProjectNotes?: boolean  // default true; index AGENTS.md, CLAUDE.md, .cursorrules, …
  ingestTableHeaders?: boolean  // default true; index CSV / TSV / XLSX column headers
  ingestCrossRefs?: boolean  // default true; grammar-agnostic cross-file edges
  crossRefsRarityThreshold?: number  // default 3; max files a symbol can appear in to count
  enableNudgeHook?: boolean  // default true; see Coexisting plugins
  adaptive?: boolean  // default true; see Adaptive sizing
  enableSemanticSearch?: boolean  // default false; see Semantic search
  embeddingModel?: string  // default "Xenova/multilingual-e5-small"
  personalizedPageRank?: boolean  // default false; see "How it compares"
  recordSessionActivity?: boolean  // default true; record this session's edits + bash as a rolling memory
  bashFileTrackingMaxFiles?: number  // default 20; refresh code-map for files a bash call touched (0 = off)
  autoReingestGitOnHeadChange?: boolean  // default true; re-ingest git when bash moves HEAD
}
```
|
|
501
|
+
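The defensive coercion described above can be pictured roughly as follows. This is a minimal sketch under assumptions, not the plugin's actual code: `coerceOptions`, `CoercedOptions` and `DEFAULTS` are hypothetical names, and only three of the documented options are shown.

```typescript
// Hypothetical sketch: a three-option subset of the documented UserConfig.
interface CoercedOptions {
  maxMemoryDiskMB: number;
  enableCodeMap: boolean;
  skillsOutputDir: string;
}

// Default values copied from the UserConfig listing.
const DEFAULTS: CoercedOptions = {
  maxMemoryDiskMB: 50,
  enableCodeMap: true,
  skillsOutputDir: ".opencode/skills",
};

// Accept anything; keep a value only when it has the expected type.
// Unknown keys are never copied, so they are ignored by construction.
function coerceOptions(raw: unknown): CoercedOptions {
  const src: { [key: string]: unknown } =
    typeof raw === "object" && raw !== null ? (raw as { [key: string]: unknown }) : {};
  const disk = src["maxMemoryDiskMB"];
  const codeMap = src["enableCodeMap"];
  const dir = src["skillsOutputDir"];
  return {
    maxMemoryDiskMB:
      typeof disk === "number" && isFinite(disk) && disk > 0 ? disk : DEFAULTS.maxMemoryDiskMB,
    enableCodeMap: typeof codeMap === "boolean" ? codeMap : DEFAULTS.enableCodeMap,
    skillsOutputDir: typeof dir === "string" && dir.length > 0 ? dir : DEFAULTS.skillsOutputDir,
  };
}
```

With this shape, an input such as `{ maxMemoryDiskMB: "lots", bogus: 1 }` yields the defaults for every field, and the unrecognised `bogus` key simply never reaches the result.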
|
|
502
|
+
### Fine-grained tuning
|
|
503
|
+
|
|
504
|
+
Most users never set these — the defaults cover typical repos. They
|
|
505
|
+
exist for monorepos, documentation-heavy projects, and locked-down
|
|
506
|
+
environments where every walk needs an explicit ceiling.
|
|
507
|
+
|
|
508
|
+
| Option | Default | What it does |
|---|---|---|
| `docsMaxFiles` | `200` | Cap on `.md` / `.markdown` files walked under `docs/` plus conventional root docs (CHANGELOG, CONTRIBUTING, ARCHITECTURE, ROADMAP, …). Raise for documentation-heavy repos. |
| `docsBodyChars` | `240` | Characters of body text captured after each heading as the recall snippet. Longer values → richer context, larger memory entries. |
| `docsMaxHeadingLevel` | `3` | Deepest heading level indexed (`3` = H1–H3). Set `2` for only H1–H2, or `4`/`5` for deeper structure. Clamped to `[1, 6]`. |
| `notesMaxBytes` | `6144` | Maximum bytes read from each agent-instruction file (`AGENTS.md`, `CLAUDE.md`, `.cursorrules`, etc.). Raise for teams with detailed instructions. |
| `tablesMaxFiles` | `200` | Cap on table files (CSV / TSV / XLSX / XLS) walked per prefill pass. |
| `tablesMaxXlsxMB` | `50` | Skip XLSX/XLS files larger than this (in MB). Set `0` to skip all spreadsheets. |
| `tablesMaxColumns` | `40` | Maximum column headers listed per table/sheet. Wider tables get a `(N more)` note. |
| `crossRefsMaxFiles` | `2000` | Cap on files the cross-reference ingester walks per prefill. Raise for monorepos. |
| `crossRefsMaxEdges` | `10000` | Hard cap on cross-reference edges emitted per pass. Controls the coverage/noise trade-off on dense codebases. |
| `coChangeMinOccurrences` | `3` | Minimum commits in which two files must co-change before a co-change edge is recorded. Lower → denser graph on small/young repos; raise → tighter graph on busy repos. |
| `codeMapMaxFiles` | adaptive (`1500`/`4000`/`10000`) | Cap on source files the code-map ingester parses per pass. By default sized by adaptive tuning at startup (small / medium / large tier). Setting it explicitly *overrides the adaptive choice*, which is useful when you want deterministic behaviour. |
| `coChangeMaxCommits` | `5000` | Cap on git commits the co-change graph builder scans. Lower for faster startup; raise for deeper history. Adaptive sizing keeps this uniform across tiers in the current implementation; only `codeMapMaxFiles` varies by repo size. |
|
|
522
|
+
|
|
523
|
+
All numeric limits are clamped to a safe minimum (typically `1`,
|
|
524
|
+
sometimes `0` where "off" is meaningful) and rounded — garbage input
|
|
525
|
+
in `opencode.json` never breaks the plugin.
|
|
526
|
+
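A clamp-and-round step of the kind described could look like the sketch below. `clampLimit` is a hypothetical helper, and falling back to the default on non-numeric input is an assumption; the text only guarantees that garbage input does not break the plugin.

```typescript
// Hypothetical helper: clamp a user-supplied numeric limit into [min, max],
// round it to an integer, and fall back to the default on non-numeric input.
function clampLimit(value: unknown, fallback: number, min: number, max: number): number {
  if (typeof value !== "number" || !isFinite(value)) return fallback;
  return Math.min(max, Math.max(min, Math.round(value)));
}

// docsMaxHeadingLevel is documented as clamped to [1, 6] with default 3:
//   clampLimit(9, 3, 1, 6)            returns 6
//   clampLimit(2.4, 3, 1, 6)          returns 2
// tablesMaxXlsxMB documents 0 as "skip all spreadsheets", so its minimum is 0:
//   clampLimit(-4, 50, 0, Infinity)   returns 0
// A string where a number is expected falls back to the default:
//   clampLimit("many", 200, 1, Infinity) returns 200
```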
|
|
527
|
+
## Coexisting plugins
|
|
528
|
+
|
|
529
|
+
Diane is designed to run alongside other OpenCode plugins without
|
|
530
|
+
either side losing functionality. The two compatibility decisions
|
|
531
|
+
that can't be avoided are made automatically at startup, by reading
|
|
532
|
+
the `plugin` array in `opencode.json` (project-local first, then
|
|
533
|
+
`~/.config/opencode/opencode.json`) and matching against known peer
|
|
534
|
+
names.
|
|
535
|
+
|
|
536
|
+
### What gets adjusted, and why
|
|
537
|
+
|
|
538
|
+
**The `tool.execute.after` nudge hook (default ON).** When a memory
|
|
539
|
+
tool has gone unused after several discovery calls, Diane appends one
|
|
540
|
+
short reminder to the discovery tool's output. `oh-my-opencode` also
|
|
541
|
+
post-processes tool output (its `look_at` flow replaces grep/glob),
|
|
542
|
+
and two plugins both mutating `output.output` interleave
|
|
543
|
+
unpredictably — so when oh-my-opencode is listed in `opencode.json`
|
|
544
|
+
the nudge is turned off. `caveman` doesn't touch tool output (it
|
|
545
|
+
hooks `session.created` and `tui.prompt.append`), so the nudge stays
|
|
546
|
+
on alongside caveman.
|
|
547
|
+
|
|
548
|
+
**Mined-skill subdirectory prefix (default empty).**
|
|
549
|
+
`memory_mine_skills` writes to `.opencode/skills/<slug>/SKILL.md` —
|
|
550
|
+
the same directory OpenCode discovers skills from. `caveman` writes
|
|
551
|
+
fixed slugs into the same place (`caveman`, `caveman-commit`,
|
|
552
|
+
`caveman-review`, `caveman-help`, `caveman-compress`), and
|
|
553
|
+
`oh-my-opencode`'s skill system also lives there. When either is
|
|
554
|
+
detected, Diane prefixes its subdirs with `diane-` so collisions are
|
|
555
|
+
impossible AND `memory_skill` surfaces only Diane's slugs (the
|
|
556
|
+
peer's slugs are theirs to list, not ours). Standalone, no prefix is
|
|
557
|
+
applied — paths are byte-for-byte the documented default.
|
|
558
|
+
|
|
559
|
+
### The matrix
|
|
560
|
+
|
|
561
|
+
| Detected peer | nudge hook | mined-skill subdirs |
|---|---|---|
| none | on (default) | `<slug>/` |
| `oh-my-opencode` / `oh-my-openagent` / `oh-my-opencode-slim` | **off** | **`diane-<slug>/`** |
| `caveman` / `caveman-opencode` / `caveman-opencode-plugin` / `opencode-caveman` | on | **`diane-<slug>/`** |
| both | **off** | **`diane-<slug>/`** |
|
|
567
|
+
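The matrix can be read as a small pure function. The sketch below is illustrative only: `resolveCompat` and `CompatDecision` are hypothetical names, the input is assumed to be an already-extracted list of plugin names, and matching is assumed to be an exact string comparison. The field names `enableNudgeHook` and `minedSkillPrefix` follow the `plugin.active` event described later in this document.

```typescript
// Peer names copied from the matrix; detection here is an exact name match.
const OH_MY_NAMES = ["oh-my-opencode", "oh-my-openagent", "oh-my-opencode-slim"];
const CAVEMAN_NAMES = ["caveman", "caveman-opencode", "caveman-opencode-plugin", "opencode-caveman"];

interface CompatDecision {
  enableNudgeHook: boolean;
  minedSkillPrefix: string;
}

// Hypothetical resolver: `pluginNames` is the list of names from the `plugin`
// array; `explicit` holds a value the user set by hand, which always wins.
function resolveCompat(
  pluginNames: string[],
  explicit: { enableNudgeHook?: boolean } = {},
): CompatDecision {
  const ohMy = pluginNames.some((n) => OH_MY_NAMES.indexOf(n) !== -1);
  const caveman = pluginNames.some((n) => CAVEMAN_NAMES.indexOf(n) !== -1);
  return {
    // The nudge is switched off only when an oh-my-opencode variant is present.
    enableNudgeHook: explicit.enableNudgeHook !== undefined ? explicit.enableNudgeHook : !ohMy,
    // Either peer triggers the `diane-` prefix for mined-skill subdirectories.
    minedSkillPrefix: ohMy || caveman ? "diane-" : "",
  };
}
```

Keeping the decision in one function with no I/O makes each row of the matrix directly checkable.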
|
|
568
|
+
### Override
|
|
569
|
+
|
|
570
|
+
An explicit `enableNudgeHook` or `skillsOutputDir` in your `"diane"`
|
|
571
|
+
config beats the auto-detect — useful when you have a specific reason
|
|
572
|
+
to want the nudge on alongside oh-my-opencode, or to point mining at
|
|
573
|
+
a non-standard directory and accept your own collision policy. The
|
|
574
|
+
adjustments are also visible at runtime:
|
|
575
|
+
|
|
576
|
+
- The OpenCode log line at startup names the peers found and the
|
|
577
|
+
adjustments made (or "no compatibility adjustments needed" when
|
|
578
|
+
none were).
|
|
579
|
+
- The `plugin.active` event in the JSONL log carries
|
|
580
|
+
`peers: { ohMyOpencode, caveman, found: [...] }` plus the
|
|
581
|
+
resolved `enableNudgeHook` and `minedSkillPrefix`, so a support
|
|
582
|
+
thread can confirm what actually ran.
|
|
583
|
+
|
|
584
|
+
### What's not detected
|
|
585
|
+
|
|
586
|
+
Detection is **list-based, not behavioural**. A plugin that does the
|
|
587
|
+
same things oh-my-opencode does but isn't named anything we recognise
|
|
588
|
+
will get no special treatment from us. If you hit a clash with such a
|
|
589
|
+
plugin, set `enableNudgeHook: false` (and/or `skillsOutputDir`) in
|
|
590
|
+
your config and file an issue with the peer's name so we can add it
|
|
591
|
+
to the detection list.
|
|
592
|
+
|
|
593
|
+
## Adaptive sizing
|
|
594
|
+
|
|
595
|
+
The fixed defaults (gitHistoryDepth 500, a 4000-file code-map cap) are
|
|
596
|
+
a sensible middle — wasteful on a 50-commit toy, thin on a 100k-commit
|
|
597
|
+
monorepo. With `adaptive` on (the default), prefill closes that gap
|
|
598
|
+
from **one measured signal**: `git rev-list --count HEAD`, or a
|
|
599
|
+
bounded file count when there's no git. That signal sorts the repo
|
|
600
|
+
into one of three named tiers, and a lookup table picks the knobs:
|
|
601
|
+
|
|
602
|
+
| knob | small | medium | large |
|---|---|---|---|
| `gitHistoryDepth` | 250 | 500 | 1500 |
| code-map file cap | 1500 | 4000 | 10000 |
| co-change pass | on | on | skipped above 5000 commits |
|
|
607
|
+
|
|
608
|
+
**The disk budget is deliberately not in that table.** It used to be
|
|
609
|
+
(small/medium 5 MB, large 20 MB) — back when the default was a tight
|
|
610
|
+
5 MB that genuinely needed widening for big repos. The default is now
|
|
611
|
+
a generous 50 MB (see *Configuration*), which clears even a
|
|
612
|
+
depth-capped large repo's store (~6–8 MB) several times over, so there
|
|
613
|
+
is nothing left for adaptation to do: every tier carries the same
|
|
614
|
+
50 MB budget. To use more or less, set `maxMemoryDiskMB` explicitly.
|
|
615
|
+
|
|
616
|
+
**Co-change is the one pass that gets cut** on huge histories: its
|
|
617
|
+
pair-counting is O(commits × files²), the only super-linear step in
|
|
618
|
+
the plugin, so above the threshold it's skipped (commit/churn/recency
|
|
619
|
+
still run).
|
|
620
|
+
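To see where the files² factor comes from, here is a minimal pair-counting sketch. It is an illustration of the general technique, not the plugin's implementation; `countCoChanges` and `coChangeEdges` are hypothetical names, and each commit is assumed to be given as a plain list of file paths.

```typescript
// Count how often each unordered pair of files appears in the same commit.
// The nested i/j loops are the files² factor; the outer loop is the commits factor.
function countCoChanges(commits: string[][]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const files of commits) {
    // De-duplicate and sort so (a, b) and (b, a) share one key.
    const sorted = Array.from(new Set(files)).sort();
    for (let i = 0; i < sorted.length; i++) {
      for (let j = i + 1; j < sorted.length; j++) {
        const key = sorted[i] + "\u0000" + sorted[j];
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
    }
  }
  return counts;
}

// Keep only pairs seen in at least `minOccurrences` commits
// (the documented default for `coChangeMinOccurrences` is 3).
function coChangeEdges(commits: string[][], minOccurrences: number): string[][] {
  const edges: string[][] = [];
  countCoChanges(commits).forEach((n, key) => {
    if (n >= minOccurrences) edges.push(key.split("\u0000"));
  });
  return edges;
}
```

A single commit touching 1,000 files already produces roughly half a million pairs, which is why a cap on scanned commits and a skip on very long histories are the natural controls.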
|
|
621
|
+
One input, three tiers, a table — not a pile of heuristics — so the
|
|
622
|
+
behaviour stays inspectable: the chosen tier and every knob it moved
|
|
623
|
+
are logged each run (`prefill: repo tier=large (9000 commits) — …`).
|
|
624
|
+
Adaptation only fills knobs the user did **not** set explicitly; an
|
|
625
|
+
explicit config value always wins, including `maxMemoryDiskMB` set
|
|
626
|
+
below the 50 MB default. `adaptive: false` pins everything to the
|
|
627
|
+
fixed defaults.
|
|
628
|
+
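The "one input, three tiers, a table" design can be sketched as below. The knob values are copied from the tier table in this section; everything else is an assumption for illustration: `tierFor`, `tuneKnobs` and `TIER_TABLE` are hypothetical names, and the tier boundaries are not stated here, so the caller passes them in.

```typescript
type Tier = "small" | "medium" | "large";

interface Knobs {
  gitHistoryDepth: number;
  codeMapMaxFiles: number;
  runCoChange: boolean;
}

// Values copied from the tier table in this section.
const TIER_TABLE: Record<Tier, { gitHistoryDepth: number; codeMapMaxFiles: number }> = {
  small: { gitHistoryDepth: 250, codeMapMaxFiles: 1500 },
  medium: { gitHistoryDepth: 500, codeMapMaxFiles: 4000 },
  large: { gitHistoryDepth: 1500, codeMapMaxFiles: 10000 },
};

// ASSUMPTION: tier boundaries are not documented here, so they are parameters.
function tierFor(signal: number, mediumFrom: number, largeFrom: number): Tier {
  if (signal >= largeFrom) return "large";
  if (signal >= mediumFrom) return "medium";
  return "small";
}

// Fill only the knobs the user left unset; explicit values always win.
function tuneKnobs(
  tier: Tier,
  commits: number,
  explicit: Partial<Pick<Knobs, "gitHistoryDepth" | "codeMapMaxFiles">> = {},
): Knobs {
  const row = TIER_TABLE[tier];
  return {
    gitHistoryDepth: explicit.gitHistoryDepth ?? row.gitHistoryDepth,
    codeMapMaxFiles: explicit.codeMapMaxFiles ?? row.codeMapMaxFiles,
    // The co-change pass is skipped above 5000 commits, per the table.
    runCoChange: commits <= 5000,
  };
}
```

Because the result is a plain object derived from one number, logging the chosen tier and knobs on each run is enough to explain any behaviour change.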
|
|
629
|
+
When there's no git, the file count is the signal instead — same
|
|
630
|
+
mechanism, different sensor — so adaptive sizing still works on a
|
|
631
|
+
non-git repo.
|
|
632
|
+
|
|
633
|
+
## Code map
|
|
634
|
+
|
|
635
|
+
`enableCodeMap` turns on tree-sitter parsing of every source file
|
|
636
|
+
into its per-file structural shape. It is **on by default since
|
|
637
|
+
v0.0.4** — set `enableCodeMap: false` to disable it (the grammar
|
|
638
|
+
`.wasm` files are shipped regardless; the flag only controls whether
|
|
639
|
+
the ingester runs). It is the one deliberate exception to the
|
|
640
|
+
plugin's otherwise-lightweight design:
|
|
641
|
+
|
|
642
|
+
- Covers **thirteen languages**. Ten are extracted as definition
|
|
643
|
+
signatures (JavaScript, TypeScript, Python, Go, Rust, Java, C, C++,
|
|
644
|
+
C#, PHP); the other three get their own extractors since they have no
|
|
645
|
+
"definitions" — CSS → selectors and at-rules, JSON → top-level keys,
|
|
646
|
+
HTML → `id`-bearing and landmark elements.
|
|
647
|
+
- It adds `web-tree-sitter` (~300 KB) plus vendored grammar `.wasm`
|
|
648
|
+
(~16 MB total). Three grammars are most of that weight: C# (5.2 MB),
|
|
649
|
+
C++ (4.5 MB) and TypeScript (2.3 MB). With the grammars bundled, the package is
|
|
650
|
+
~16.5 MB rather than ~77 KB. Grammars load lazily — only for
|
|
651
|
+
languages actually present in the repo — but all `.wasm` ships in the
|
|
652
|
+
package; dropping a grammar you don't need is a small edit in
|
|
653
|
+
`code-map.ts` plus deleting one file.
|
|
654
|
+
- It is the only language-*aware* component: one small table maps
|
|
655
|
+
each grammar's node types to the kinds worth extracting. Files in a
|
|
656
|
+
language with no grammar are skipped; if `web-tree-sitter` fails to
|
|
657
|
+
load, code map degrades gracefully and the rest of the plugin is
|
|
658
|
+
unaffected.
|
|
659
|
+
- Measured: on a real 81-file Go repo the map cost ~45 tokens/file,
|
|
660
|
+
and a `memory_recall` + `memory_code_map` pair answered a "work on
|
|
661
|
+
feature X" scenario in ~700 tokens versus ~5,400 tokens of raw
|
|
662
|
+
discovery — an ~87 % reduction. Worth it or not is a per-setup
|
|
663
|
+
judgement, which is why `enableCodeMap: false` remains available as an opt-out.
|
|
664
|
+
|
|
665
|
+
## Session snapshots
|
|
666
|
+
|
|
667
|
+
The other categories hold *facts*; `session-snapshot` holds
|
|
668
|
+
*understanding* — a session's mental model, the decisions it made and
|
|
669
|
+
why, the conventions it learned that the code doesn't show. The agent
|
|
670
|
+
writes one with `memory_snapshot`; it is **pinned** (eviction-proof),
|
|
671
|
+
**one per session** (re-snapshotting replaces in place), and tags the
|
|
672
|
+
most recent other session's snapshot as `parent:<id>`.
|
|
673
|
+
|
|
674
|
+
Those `parent` tags are the entire mechanism — the snapshot set is a
|
|
675
|
+
branchable history with no DAG data structure, just edges in the tag
|
|
676
|
+
list. A later session resumes from the latest snapshot (prefill logs
|
|
677
|
+
the resume point); a parallel session reads the same shared store and
|
|
678
|
+
forks from the same point; a snapshot tagging an older parent is a
|
|
679
|
+
branch. It's the harness-side, no-model take on versioned agent
|
|
680
|
+
memory — continuity without embeddings or LLM summarisation.
|
|
681
|
+
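Because the history lives entirely in `parent:<id>` tags, walking it needs nothing more than a tag lookup. The sketch below assumes a snapshot can be reduced to an `id` plus a `tags` array; `SnapshotEntry`, `parentOf` and `lineage` are hypothetical names, not the plugin's API.

```typescript
interface SnapshotEntry {
  id: string;
  tags: string[];
}

// Read the parent id out of a snapshot's tag list, if it has one.
function parentOf(snapshot: SnapshotEntry): string | undefined {
  const tag = snapshot.tags.find((t) => t.indexOf("parent:") === 0);
  return tag === undefined ? undefined : tag.slice("parent:".length);
}

// Walk parent tags from `startId` back to the root. No graph structure is
// built; the tag list is the only source of edges. The visited set guards
// against an accidental cycle.
function lineage(snapshots: SnapshotEntry[], startId: string): string[] {
  const byId = new Map<string, SnapshotEntry>();
  for (const s of snapshots) byId.set(s.id, s);
  const chain: string[] = [];
  const seen = new Set<string>();
  let current: string | undefined = startId;
  while (current !== undefined && byId.has(current) && !seen.has(current)) {
    seen.add(current);
    chain.push(current);
    const node = byId.get(current);
    current = node === undefined ? undefined : parentOf(node);
  }
  return chain;
}
```

Two snapshots that name the same parent form a fork, and a snapshot naming an older parent forms a branch; both cases fall out of the same walk without special handling.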
|
|
682
|
+
## Performance
|
|
683
|
+
|
|
684
|
+
All hot paths are O(1) or O(n), never O(n²). The in-memory working set
|
|
685
|
+
is a `Map<id, Memory>`, so insert, lookup and delete are O(1) —
|
|
686
|
+
`removeMemory` and `applyEviction` (which run on the per-event
|
|
687
|
+
`upsertBySubject` path and after every write) were O(n) array
|
|
688
|
+
operations before the Map. `insertIfMissing` uses a composite-key
|
|
689
|
+
`Map` for O(1) idempotency; `totalBytes` is a running counter;
|
|
690
|
+
`countsByCategory` reads the index directly; eviction sorts once per
|
|
691
|
+
*batch*, not per insert.
|
|
692
|
+
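The constant-time operations named above can be illustrated with a toy store. This is a sketch under assumptions: `MemoryStoreSketch` and `MemoryEntry` are hypothetical, and the composite key is assumed to be category plus subject, which the text does not spell out.

```typescript
interface MemoryEntry {
  id: string;
  subject: string;
  category: string;
  sizeBytes: number;
}

// Hypothetical store: a Map keyed by id, a composite-key Map for idempotent
// inserts, and a running byte counter instead of a recomputed sum.
class MemoryStoreSketch {
  private byId = new Map<string, MemoryEntry>();
  private byKey = new Map<string, string>(); // "category\u0000subject" -> id
  totalBytes = 0;

  // O(1): skip the insert when the same category + subject already exists.
  insertIfMissing(entry: MemoryEntry): boolean {
    const key = entry.category + "\u0000" + entry.subject;
    if (this.byKey.has(key)) return false;
    this.byKey.set(key, entry.id);
    this.byId.set(entry.id, entry);
    this.totalBytes += entry.sizeBytes;
    return true;
  }

  // O(1): delete by id and keep the counter in step.
  removeMemory(id: string): boolean {
    const entry = this.byId.get(id);
    if (entry === undefined) return false;
    this.byId.delete(id);
    this.byKey.delete(entry.category + "\u0000" + entry.subject);
    this.totalBytes -= entry.sizeBytes;
    return true;
  }

  get size(): number {
    return this.byId.size;
  }
}
```

The point of the running counter is that a budget check after every write costs one comparison rather than a pass over the whole store.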
|
|
693
|
+
Persistence is a SQLite database (`bun:sqlite`) written behind a
|
|
694
|
+
debounced, failure-tolerant write-behind buffer: mutations record a
|
|
695
|
+
changed/deleted id, and the flush drains the buffer into one
|
|
696
|
+
transaction — a delta of only the changed rows, not a re-serialise of
|
|
697
|
+
the whole store the way the old JSON file did. The database is read
|
|
698
|
+
exactly once, at load; recall runs entirely against the in-memory
|
|
699
|
+
index and never touches it. At small scale this is not a speed win
|
|
700
|
+
over the JSON file — a ~1 MB store is cheap to rewrite wholesale, and
|
|
701
|
+
SQLite's per-transaction overhead is comparable. The win is at scale
|
|
702
|
+
and in the steady-state access pattern: on a 15,000-entry store,
|
|
703
|
+
touching a handful of memories and flushing costs ~4 ms (a delta of
|
|
704
|
+
the changed rows) versus ~40 ms for a JSON-style whole-file rewrite,
|
|
705
|
+
and that gap widens as the store grows — the incremental flush is
|
|
706
|
+
constant in the number of *changed* rows, the rewrite is linear in
|
|
707
|
+
the *whole* store. WAL mode also makes writes crash-safe and lets
|
|
708
|
+
parallel sessions share a repo. The migration is justified by that
|
|
709
|
+
scaling behaviour and crash-safety, not by small-store microbenchmarks.
|
|
710
|
+
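The write-behind idea, reduced to its essentials, is a set of dirty ids drained in one batch. The sketch below is illustrative: `WriteBehindSketch` is a hypothetical class, the `sink` callback stands in for the single SQLite transaction, the debounce timer is left out, and retrying after a failed flush is an assumed reading of "failure-tolerant".

```typescript
// Hypothetical write-behind buffer. `sink` receives only the rows that
// changed since the last successful flush, never the whole store.
class WriteBehindSketch<T> {
  private changed = new Map<string, T>();
  private deleted = new Set<string>();
  private sink: (upserts: [string, T][], deletes: string[]) => void;

  constructor(sink: (upserts: [string, T][], deletes: string[]) => void) {
    this.sink = sink;
  }

  markChanged(id: string, row: T): void {
    this.deleted.delete(id);
    this.changed.set(id, row);
  }

  markDeleted(id: string): void {
    this.changed.delete(id);
    this.deleted.add(id);
  }

  // Drain the buffer in one batch. On failure the pending ids stay queued so
  // a later flush can retry them; the error is swallowed, not rethrown.
  flush(): boolean {
    if (this.changed.size === 0 && this.deleted.size === 0) return true;
    try {
      this.sink(Array.from(this.changed.entries()), Array.from(this.deleted));
    } catch {
      return false;
    }
    this.changed.clear();
    this.deleted.clear();
    return true;
  }
}
```

Repeated writes to the same id collapse into one row per flush, which is why the flush cost follows the number of changed rows rather than the size of the store.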
|
|
711
|
+
### Scaling — measured
|
|
712
|
+
|
|
713
|
+
`scripts/stress-scale.mjs` builds stores of increasing size with
|
|
714
|
+
realistic content (a wide vocabulary, co-change tags) and measures
|
|
715
|
+
every cost that grows with size. Eviction is disabled so the table is
|
|
716
|
+
the raw curve. Representative numbers on a dev machine:
|
|
717
|
+
|
|
718
|
+
| memories | store on disk | insert | full flush | reload | recall ×100 | incr. flush | heap |
|--:|--:|--:|--:|--:|--:|--:|--:|
| 5 000 | 1.2 MB | 0.3 s | 18 ms | 0.2 s | 23 ms | 13 ms | ~100 MB |
| 15 000 | 3.6 MB | 0.8 s | 77 ms | 0.7 s | 51 ms | 18 ms | ~275 MB |
| 25 000 | 6.0 MB | 1.3 s | 126 ms | 1.2 s | 77 ms | 50 ms | ~440 MB |
|
|
723
|
+
|
|
724
|
+
Every cost scales **linearly** — there is no quadratic term. Recall
|
|
725
|
+
stays ~1–3 ms per query throughout (BM25 over the in-memory index;
|
|
726
|
+
latency tracks how many memories match the query terms, not the store
|
|
727
|
+
size). Incremental flush stays a small near-flat delta — that's the
|
|
728
|
+
SQLite write-behind win. `tests/scaling.test.ts` is a gated guard at
|
|
729
|
+
4 000 memories that would fail loudly if any of these went
|
|
730
|
+
super-linear.
|
|
731
|
+
|
|
732
|
+
The honest caveat is **heap**. The plugin holds the entire working
|
|
733
|
+
set in memory — the `byId` map, the inverted index (a term-frequency
|
|
734
|
+
map per memory, needed for BM25), and the co-change graph. That's
|
|
735
|
+
roughly 17 KB of heap per memory, ~70× the on-disk size. At a
|
|
736
|
+
realistic large store (~25k memories → ~440 MB) that's a chunky but
|
|
737
|
+
manageable footprint on a modern dev machine.
|
|
738
|
+
|
|
739
|
+
**The disk budget bounds RAM, not just disk.** Because heap tracks
|
|
740
|
+
memory count, and memory count tracks bytes stored, the byte budget is
|
|
741
|
+
effectively a RAM ceiling — about **70 MB of heap per 1 MB of
|
|
742
|
+
budget**, if the budget were ever filled. The default budget is 50 MB,
|
|
743
|
+
so the *theoretical* worst case is ~210k memories and ~3.5 GB of heap.
|
|
744
|
+
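The worst-case figures follow from two numbers quoted in this section, and the arithmetic is short enough to write out. The helper names below are hypothetical; the constants are the rounded values from the text, so the results are estimates rather than measurements.

```typescript
// Figures quoted in this section (rounded).
const HEAP_MB_PER_DISK_MB = 70; // ~70 MB of heap per 1 MB of stored memories
const DISK_MB_AT_25K = 6.0;     // 25 000 memories occupy about 6.0 MB on disk

// Rough heap ceiling if a budget of `budgetMB` were completely filled.
function heapCeilingMB(budgetMB: number): number {
  return budgetMB * HEAP_MB_PER_DISK_MB;
}

// Rough number of memories that fit in `budgetMB` at the measured density.
function memoriesAtBudget(budgetMB: number): number {
  return Math.round((budgetMB / DISK_MB_AT_25K) * 25000);
}

// heapCeilingMB(50)    is 3500 MB, i.e. about 3.5 GB
// heapCeilingMB(10)    is 700 MB
// memoriesAtBudget(50) is 208333, i.e. roughly 210k memories
```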
|
|
745
|
+
In practice a store never comes close. The git-history and code-map
|
|
746
|
+
ingesters are themselves depth-capped (≤ 1500 commits, ≤ 10 000
|
|
747
|
+
files), so a real store — even on a large repo — lands in the
|
|
748
|
+
**15–25k band: ~4–6 MB on disk, ~300–440 MB of heap**, far below the
|
|
749
|
+
50 MB budget. That is the point of the generous default: at 50 MB the
|
|
750
|
+
budget is a *safety valve* for a runaway monorepo, not a routine
|
|
751
|
+
clipper. The previous 5 MB default was small enough that a normal
|
|
752
|
+
large repo (~25k memories ≈ 6 MB) hit the ceiling and lost useful
|
|
753
|
+
memories to eviction every run; 50 MB ends that.
|
|
754
|
+
|
|
755
|
+
If you run on an unusually large monorepo and the heap footprint
|
|
756
|
+
matters, `maxMemoryDiskMB` is the single knob — set it **down** (e.g.
|
|
757
|
+
`10`, ~700 MB heap ceiling) to cap RAM hard, or **up** if you have the
|
|
758
|
+
memory and want a deeper store. The budget is the RAM dial.
|
|
759
|
+
|
|
760
|
+
The fuller answer for a store that genuinely outgrows RAM is to move
|
|
761
|
+
the search index itself onto disk. SQLite is already the durable store
|
|
762
|
+
here, and SQLite's FTS5 is a disk-resident full-text index with BM25
|
|
763
|
+
built in — so a future version could keep the inverted index in FTS5
|
|
764
|
+
rather than in the heap, holding only a small working set in memory and
|
|
765
|
+
letting the rest live on disk. That is a real architectural change (the
|
|
766
|
+
CJK bigram tokenisation would move to an FTS5 custom/trigram tokenizer,
|
|
767
|
+
and ranking would shift from the in-process scorer to FTS5's), so it is
|
|
768
|
+
deliberately scoped as separate future work rather than bolted on; for
|
|
769
|
+
now the byte budget plus depth-capped ingesters are what keep the
|
|
770
|
+
in-memory footprint bounded.
|
|
771
|
+
|
|
772
|
+
Confirmed on real large repositories: ingesting `redis` (1.8k files),
|
|
773
|
+
`rocksdb` (2.2k files) and `spring-framework` (11.4k files) produced
|
|
774
|
+
1.9k / 2.4k / 4.7k memories with a one-time background prefill of
|
|
775
|
+
~9 / ~17 / ~11 seconds. On `spring-framework` the code-map count
|
|
776
|
+
stopped at exactly 4 000 — the file cap doing its job, which is why an
|
|
777
|
+
11k-file repo prefilled no slower than a 2k-file one. The first session
|
|
778
|
+
gets partial recall until that prefill finishes; every session after is
|
|
779
|
+
warm.
|
|
780
|
+
|
|
781
|
+
## Rich logs
|
|
782
|
+
|
|
783
|
+
In addition to the human-readable lines that go to OpenCode's session
|
|
784
|
+
log (via `client.app.log`), the plugin writes a structured JSON-Lines
|
|
785
|
+
log to `os.tmpdir()/opencode-diane/` — typically
|
|
786
|
+
`/tmp/opencode-diane/` on Linux,
|
|
787
|
+
`/var/folders/.../T/opencode-diane/` on macOS. One file per process,
|
|
788
|
+
named `diane-<iso-timestamp>-pid<pid>.jsonl`, so parallel OpenCode
|
|
789
|
+
sessions never interleave.
|
|
790
|
+
|
|
791
|
+
**Inside Docker:** the default `os.tmpdir()` path is ephemeral
|
|
792
|
+
container storage — fine for ad-hoc runs but lost when the container
|
|
793
|
+
exits. Set `OPENCODE_DIANE_LOG_DIR` to a mounted path and the logs
|
|
794
|
+
flow to the host:
|
|
795
|
+
|
|
796
|
+
```bash
docker run \
  -e OPENCODE_DIANE_LOG_DIR=/logs \
  -v "$PWD/logs:/logs" \
  …
# then from outside the container:
python3 analyze-logs.py --dir ./logs --plain
```
|
|
804
|
+
|
|
805
|
+
The env var is the write-side override; `analyze-logs.py --dir` is
|
|
806
|
+
the read-side counterpart, so the two halves of the diagnostic loop
|
|
807
|
+
work together regardless of where the logs are.
|
|
808
|
+
|
|
809
|
+
Two record shapes share the file: prose `log()` lines and structured
|
|
810
|
+
`event()` records. Every record carries `ts` (ISO ms-precision),
|
|
811
|
+
`service`, and `root`. Prose lines add `level` (`debug`/`info`/`warn`/
|
|
812
|
+
`error`) and `message` (mirroring exactly what OpenCode's session log
|
|
813
|
+
shows). Events add `event` (a dotted name like `ingest.git`) and a
|
|
814
|
+
typed payload — counts, ms, ids. The header record is
|
|
815
|
+
`event: "session.start"` with the pid, Node version, platform, and
|
|
816
|
+
cwd, so opening the file in isolation always gives context.
|
|
817
|
+
|
|
818
|
+
The events fired today:
|
|
819
|
+
|
|
820
|
+
- `session.start` — header (pid, node, platform, cwd)
|
|
821
|
+
- `plugin.idle` — directory has no git history and no project files
|
|
822
|
+
- `plugin.active` — version, storeSize, bytesTotal, budgetBytes, feature flags
|
|
823
|
+
- `store.migration.failed` — the legacy `diane.json` → SQLite migration
|
|
824
|
+
hit an error (the cause is in the `reason` field). The plugin does
|
|
825
|
+
**not** crash on this: it starts with an empty database, leaves the
|
|
826
|
+
JSON file in place, and the next startup retries. Observed in the
|
|
827
|
+
field when running alongside heavyweight plugins (e.g. oh-my-opencode)
|
|
828
|
+
whose own startup contends for resources during ours.
|
|
829
|
+
- `prefill.start` / `prefill.complete` / `prefill.failed` (with ms)
|
|
830
|
+
- `adaptive.tuned` — the size signal and the chosen knobs
|
|
831
|
+
- `ingest.project`, `ingest.git`, `ingest.sessions`, `ingest.code-map`
|
|
832
|
+
/ `ingest.code-map.skipped` — each ingester's raw counts
|
|
833
|
+
- `snapshot.resume` — id and total count when resuming
|
|
834
|
+
- `eviction` — removed count, bytes after, trigger
|
|
835
|
+
- `tool.call` — one record per tool invocation, with `tool`, `ms`,
|
|
836
|
+
`ok`, `args` (truncated to ~500 chars per string field) and either
|
|
837
|
+
`result` (a per-tool summary like `{hits, omitted}` or `{id,
|
|
838
|
+
sizeBytes, bytesTotal}`) or `error` on failure
|
|
839
|
+
- `mining.complete` / `mining.failed` — the background outcome of
|
|
840
|
+
`memory_mine_skills` (the tool returns immediately; these fire when
|
|
841
|
+
the background job finishes)
|
|
842
|
+
|
|
843
|
+
Because every line is independently valid JSON, the file is greppable
|
|
844
|
+
*and* `jq`-able. Common queries:
|
|
845
|
+
|
|
846
|
+
```bash
# Tail the latest session (paths assume the default Linux log directory)
tail -f "$(ls -t /tmp/opencode-diane/*.jsonl | head -1)"

# Just the structured events from a specific run, in time order
jq -c 'select(.event)' /tmp/opencode-diane/diane-2026-05-15T*.jsonl

# Every tool call across all sessions, with timing
jq -c 'select(.event == "tool.call") | {tool, ms, ok}' /tmp/opencode-diane/*.jsonl

# Slow tool calls (> 100ms)
jq -c 'select(.event == "tool.call" and .ms > 100)' /tmp/opencode-diane/*.jsonl

# Find slow prefills (> 1 s)
jq -c 'select(.event == "prefill.complete" and .ms > 1000)' /tmp/opencode-diane/*.jsonl
```
|
|
862
|
+
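The same kind of filter is easy to express in code when `jq` is not available. The sketch below works on the log text as a string, so how the file is read is left to the caller. `LogLine`, `parseJsonl` and `slowToolCalls` are hypothetical names, and only the fields named in this section are typed.

```typescript
// Field names follow the record description in this section; any other
// payload fields are left untyped on purpose.
interface LogLine {
  ts?: string;
  event?: string;
  tool?: string;
  ms?: number;
  ok?: boolean;
}

// Parse JSON-Lines text, skipping blank or malformed lines.
function parseJsonl(text: string): LogLine[] {
  const out: LogLine[] = [];
  for (const line of text.split("\n")) {
    if (line.trim() === "") continue;
    try {
      const parsed: unknown = JSON.parse(line);
      if (typeof parsed === "object" && parsed !== null) out.push(parsed as LogLine);
    } catch {
      // Lines that are not valid JSON are ignored.
    }
  }
  return out;
}

// Counterpart of: jq 'select(.event == "tool.call" and .ms > 100)'
function slowToolCalls(records: LogLine[], minMs: number): LogLine[] {
  return records.filter(
    (r) => r.event === "tool.call" && typeof r.ms === "number" && r.ms > minMs,
  );
}
```

Prose `log()` lines carry no `event` field, so they drop out of the filter without any special case.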
|
|
863
|
+
### `analyze-logs.py`
|
|
864
|
+
|
|
865
|
+
A standalone Python script at the repo root that turns one or more
|
|
866
|
+
JSONL files into a report. Standalone means: stdlib only, no plugin
|
|
867
|
+
imports — you can copy the script to a machine that doesn't have the
|
|
868
|
+
plugin installed and analyse logs that came from one that does.
|
|
869
|
+
|
|
870
|
+
**Every report leads with a plain-language "What happened" summary.**
|
|
871
|
+
The raw log is a stream of dotted event names and typed payloads —
|
|
872
|
+
`prefill.complete`, `ingest.git scanned=1500`, `eviction removed=12` —
|
|
873
|
+
which is precise but assumes you know what each one means. The
|
|
874
|
+
analyzer's first job is to translate that into a numbered, jargon-free
|
|
875
|
+
account of what the plugin did and *why* it mattered, written for
|
|
876
|
+
someone who has never read the plugin's source. For example, instead
|
|
877
|
+
of `ingest.git scanned=1500 commitMemories=80` it writes: "it read
|
|
878
|
+
1,500 commits of Git history and turned them into 80 compact notes
|
|
879
|
+
about which files change together … this is what lets the AI answer
|
|
880
|
+
'what changed recently?' from memory instead of searching your files."
|
|
881
|
+
The technical sections (per-tool latency tables, the event timeline,
|
|
882
|
+
raw ingest counts) follow underneath for anyone who wants them.
|
|
883
|
+
|
|
884
|
+
`--plain` prints only that plain-language summary — the view for a
|
|
885
|
+
non-specialist or a quick "what did it just do?" check. `--json`
|
|
886
|
+
includes the same explanation as a string array per session, so an LLM
|
|
887
|
+
or downstream tool gets it too. Useful for bug reports
|
|
888
|
+
(`./analyze-logs.py --json > report.json` and attach it), quick local
|
|
889
|
+
debugging (`--timeline` shows the full chronological flow), or feeding
|
|
890
|
+
to an LLM as context. Examples:
|
|
891
|
+
|
|
892
|
+
```bash
./analyze-logs.py                        # plain summary + technical detail
./analyze-logs.py --plain                # plain-language summary only
./analyze-logs.py --tail 3 --timeline    # 3 newest, with chronological flow
./analyze-logs.py --json                 # JSON (carries the explanation too)
./analyze-logs.py --root /path/to/repo   # filter to one repo
./analyze-logs.py --quiet                # one-line-per-session summary
```
|
|
900
|
+
|
|
901
|
+
The plain-language explainer is covered by `tests/test_analyze_logs.py`
|
|
902
|
+
(Python `unittest`, stdlib only, wired into CI): the tests assert that
|
|
903
|
+
each major step is explained with its real numbers and its reason, and
|
|
904
|
+
that the plain output contains none of the raw event/field identifiers
|
|
905
|
+
— a machine-checkable proxy for "a non-specialist can read this".
|
|
906
|
+
|
|
907
|
+
The script is intentionally NOT bundled into the published npm
|
|
908
|
+
package — it's a development/debugging aid, not part of the runtime
|
|
909
|
+
plugin. It lives in the repo so it's there when you clone, and that's
|
|
910
|
+
the only coupling.
|
|
911
|
+
|
|
912
|
+
Reliability: writes are synchronous (`openSync` + `writeSync`), so a
|
|
913
|
+
line that "wrote" is on disk before the call returns — including
|
|
914
|
+
right before a crash, which is when these logs are most useful. A
|
|
915
|
+
write failure (disk full, permission lost mid-session) drops the fd
|
|
916
|
+
silently; the plugin keeps running and OpenCode's own log channel is
|
|
917
|
+
unaffected. A logger error never propagates.
|
|
918
|
+
|
|
919
|
+
Retention is the user's responsibility: the plugin never deletes its
|
|
920
|
+
own log files. On Linux they're cleared at reboot or by
|
|
921
|
+
`systemd-tmpfiles`; on macOS the periodic tmp cleaner removes them
|
|
922
|
+
after a few days of inactivity. For a manual sweep:
|
|
923
|
+
`rm /tmp/opencode-diane/*.jsonl`.
|
|
924
|
+
|
|
925
|
+
## Tests & CI
|
|
926
|
+
|
|
927
|
+
674 assertions across 24 test suites (covering storage, search, ingest,
|
|
928
|
+
cross-references, code-health, code-map, mining, sessions, adaptive tuning,
|
|
929
|
+
peer compatibility, configurable limits, and more). The ingest suite exercises real git fixtures
|
|
930
|
+
and a Rust project fixture; code-map parses a multi-language fixture
|
|
931
|
+
with the real grammars; the session-snapshot suite covers parent
|
|
932
|
+
linkage and pinned-survives-eviction; the plugin suite covers the
|
|
933
|
+
recall-first nudge hooks; the token-savings suite builds a fixture
|
|
934
|
+
repo with real history and asserts that recall is measurably cheaper
|
|
935
|
+
than raw discovery (see *Token savings*, below); the skill-activation
|
|
936
|
+
suite proves a skill mined mid-session is discoverable and loadable in
|
|
937
|
+
that same session, no restart; the scaling suite builds a 4 000-memory
|
|
938
|
+
store and guards correctness plus anti-quadratic timing ceilings (the
|
|
939
|
+
deep curve is `scripts/stress-scale.mjs` — see *Scaling*). Alongside
|
|
940
|
+
the Bun suites, `tests/test_analyze_logs.py` is a 12-test Python
|
|
941
|
+
(`unittest`, stdlib only) suite for the log analyzer's plain-language
|
|
942
|
+
explainer — it asserts the report stays legible to a non-specialist
|
|
943
|
+
(see *Rich logs*). CI runs typecheck →
|
|
944
|
+
lint (ESLint 9, type-aware) → build → tests → a smoke test of the
|
|
945
|
+
compiled `dist/` → a package-size guard → the Python analyzer tests,
|
|
946
|
+
all on the Bun runtime (with the preinstalled `python3` for the last
|
|
947
|
+
step), then a coverage job (`bun test --coverage`) enforces a
|
|
948
|
+
line/function coverage floor and uploads the lcov report. Coverage
|
|
949
|
+
sits around 90 % lines as Bun measures it. There is no Node version
|
|
950
|
+
matrix — OpenCode loads plugins under Bun, so Bun is what's tested.
|
|
951
|
+
The suites use a small self-contained assertion harness, so each runs
|
|
952
|
+
as a Bun script and self-gates on exit code.
|
|
953
|
+
|
|
954
|
+
A separate, informational workflow — `compare-aider` — is *not* part
|
|
955
|
+
of the merge gate. It's manually runnable (and runs monthly), installs
|
|
956
|
+
aider, and compares aider's tree-sitter repo-map to diane's
|
|
957
|
+
code map on a real repository, publishing the result to the run's job
|
|
958
|
+
summary. See *Token savings* below.
|
|
959
|
+
|
|
960
|
+
Verified unchanged against three real repositories — `rs/zerolog`
|
|
961
|
+
(Go), `BurntSushi/byteorder` (Rust), `petrovich/pytrovich` (Python) —
|
|
962
|
+
producing the same structural signals for each.

## Development & packaging

The plugin runs under Bun (the runtime OpenCode loads plugins in), so
the whole toolchain is Bun-based. `tsc` is still the build step — it
emits the `.d.ts` files the npm package ships — but it runs under Bun
like everything else.

```bash
bun install
bun run build            # tsc -p tsconfig.json — emits dist/ + .d.ts
bun run lint             # eslint src tests (type-aware; floating promises = error)
bun run test             # 674 assertions across 24 test suites
bun run smoke            # exercises the compiled dist/ as OpenCode would
bun run check:size       # fails if the package exceeds its size ceiling
bun run typecheck        # no emit
bun run coverage:check   # bun test --coverage, fails under the coverage floor
bun run test:analyzer    # python tests for the log analyzer's plain-language report
bun run verify:semantic  # optional: runs the real e5 model on a 9-language fixture set
```

CI (`.github/workflows/ci.yml`) runs typecheck → lint → build → test →
smoke → size-guard → Python analyzer tests on Bun, then a separate
coverage job. There is no Node version matrix — OpenCode loads plugins
under Bun, so Bun is what is tested.

To publish a new version:

```bash
bun run test && bun run smoke && bun run check:size  # pre-flight: all must pass
bun run clean && bun run build                       # also the prepublishOnly script
npm version <patch|minor|major>                      # bump version + git tag
npm publish --access public                          # npm is the registry
```

**The version lives in exactly one place:** `package.json#version`.
`npm version <patch|minor|major>` edits that field (and creates a
matching git tag). At plugin startup `src/index.ts` reads it from
that same `package.json` and the value flows from there to the
`plugin.active` log event (so the running version is in every
session's JSONL log) and to the `memory_status` tool's output (so an
agent can ask which version is loaded). There is no second place to
update — change `package.json#version`, rebuild, and every consumer
picks up the new number.

`bun pm pack --dry-run` lists exactly what would be packed; the `files`
allowlist in `package.json` limits the tarball to `dist/`, `grammars/`,
`README.md`, `WIKI.md`, and `LICENSE`. `check:size` runs that
`--dry-run` under the hood and fails CI if the unpacked size crosses
its ceiling or a vendored grammar goes missing — so a size regression
cannot ship silently.

## Token savings

The plugin's premise is that a token-budgeted recall is cheaper than
the raw discovery an agent would otherwise do. That claim is measured
two ways, both with zero API spend.

### What token reduction to expect

The honest range, from measured runs:

- **When a recall covers the task: 80–89 %.** Real-repo measurements
  (`measure-savings.mjs`): ~87 % on `zerolog`, ~89 % on `click`,
  ~85 % on `express` — ~8–11k tokens of raw discovery collapsing to
  ~1.1–1.2k of recall.
- That figure is a **ceiling, not a promise.** It is "tokens saved
  *if the recall is relevant*" — it is not a relevance score. A recall
  can be cheap and still mediocre (see the `express` case under
  *Real-world usefulness*).
- **Lower** on: terse-history repos (low-signal commit text), mature/
  stable repos (recent history is dependency bumps), dynamic-dispatch
  codebases (the code map extracts *declared* signatures), and very
  small repos (raw discovery was already cheap — reported as
  "inconclusive", not a loss).
- The gated test floor is deliberately conservative: a fixture
  end-to-end orientation must be **> 25 %** cheaper, and recall output
  must stay within ~2× its own token budget — so the plugin's
  footprint can never turn a saving into a cost.
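
The arithmetic behind these figures can be sketched in a few lines. This is only an illustration: the function names `savingsPercent` and `withinBudget` are hypothetical and not taken from diane's source, and the default slack of 2 simply mirrors the "~2×" wording in the list above.

```typescript
// Hypothetical helpers: names and signatures are illustrative, not diane's API.

/** Percentage of tokens saved when a recall replaces raw discovery. */
function savingsPercent(rawTokens: number, recallTokens: number): number {
  if (rawTokens <= 0) {
    throw new RangeError("rawTokens must be positive");
  }
  return ((rawTokens - recallTokens) * 100) / rawTokens;
}

/** True when recall output stays within `slack` times its token budget. */
function withinBudget(outputTokens: number, budget: number, slack = 2): boolean {
  return outputTokens <= budget * slack;
}

// Roughly 10k tokens of raw discovery collapsing to about 1.15k of recall.
const example = savingsPercent(10000, 1150); // 88.5

// The gated floor described above: an orientation must be more than 25 % cheaper.
const passesFloor = example > 25;
```

A percentage computed this way says nothing about relevance, which is why the list above treats it as a ceiling.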

Before trusting it on your repo, run `scripts/dry-run.mjs <repo>`: it
prints a **GOOD / MODERATE / LOW** verdict on the git-history signal
and shows real query results with their token cost. That verdict is
the answer for *your* repo, which no general percentage can give.

### How it is measured

`scripts/measure-savings.mjs <repo>` runs a realistic *without-plugin*
discovery recipe (recent git history, a tree listing, reading the
files whose names match the task, a grep), sums the token cost, then
runs the *with-plugin* memory calls and sums those. Both sides print
what they ran. It's honest about coverage: thin recall results assume
the agent still does part of the fallback discovery rather than
claiming an unrealistic 100 %, and a non-git repo with an empty store
is reported "inconclusive", not a win. Sample runs land around 80 % on
real repos with history; on a tiny repo the saving is modest, because
the baseline was already cheap — that's correct, not a failure.

`tests/token-savings.test.ts` turns the same method into gated
assertions on a fixture repo with real history: a single file's
history via recall vs `git log -p` (~5× cheaper), project facts via
recall vs reading the config files, and a whole end-to-end
orientation (>25 % fewer tokens). One case guards the floor — it
verifies recall output stays within ~2× its token budget, so the
plugin's own footprint can't run away and turn a "saving" into a cost.

For the code map specifically, the repo can compare against aider.
`scripts/dump-code-map.mjs <repo>` prints diane's full code map
as text; `aider --show-repo-map` prints aider's repo-map;
`scripts/compare-aider.mjs <aider-map> <diane-map>` reports token cost
and approximate coverage for both, with one tokenizer applied to each.
The `compare-aider` workflow runs the whole thing in CI. The report is
careful about what it shows: the two artifacts are different shapes —
aider's repo-map embeds critical source lines and is trimmed to
`--map-tokens` (default 1k) per turn; diane's code map is one
signature digest per file, recalled as a query-ranked subset — so the
figures are a coverage/footprint comparison of the full maps, not a
head-to-head of per-request context cost.

### `bun test` vs `bun run test`

These look the same and are not. `bun run test` is the canonical gate:
it runs each suite as a script (`bun tests/<name>.test.ts`), uses the
custom assertion harness, and self-gates on exit code per suite. Its
output is the 343-pass/0-fail summary.

`bun test` (Bun's native test runner) discovers `*.test.ts` files in
parallel and looks for `bun:test` registrations. Our suites use a
custom harness, so Bun reports "0 tests" — that's correct, not a bug.
The only places that invoke `bun test` are `coverage:check` (it needs
Bun's `--coverage` instrumentation) and the CI coverage job.

The most common confusion: `bun test` shows `1 fail / 1 error /
Cannot find module '@opencode-ai/plugin'`. That means your
`node_modules` is incomplete — usually a missing `bun install` on a
fresh checkout. The `coverage:check` preflight catches this case
explicitly and tells you to run `bun install`; if you see the error
from raw `bun test`, the answer is the same.
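
To make the "custom harness that self-gates on exit code" idea concrete, here is a minimal sketch of what such a harness could look like. It is not diane's actual harness: `createHarness`, `check`, `summary` and `exitCode` are hypothetical names chosen for illustration.

```typescript
// Hypothetical harness sketch: names are illustrative, not diane's test helpers.
interface Harness {
  check(name: string, condition: boolean): void;
  summary(): string;
  exitCode(): number;
  failures(): string[];
}

function createHarness(suite: string): Harness {
  let passed = 0;
  const failed: string[] = [];
  return {
    // Record one assertion; failures are collected rather than thrown.
    check(name, condition) {
      if (condition) {
        passed += 1;
      } else {
        failed.push(`${suite}: ${name}`);
      }
    },
    summary: () => `${passed} pass / ${failed.length} fail`,
    // 0 when every assertion held, 1 otherwise.
    exitCode: () => (failed.length === 0 ? 0 : 1),
    failures: () => failed.slice(),
  };
}

// Usage: a suite script would end by exiting with h.exitCode(),
// which is what lets each script gate itself.
const h = createHarness("example");
h.check("addition works", 1 + 1 === 2);
h.check("deliberately failing case", "a".length === 2);
```

Because nothing here registers with `bun:test`, Bun's native runner would find no tests in such a file, which matches the "0 tests" behaviour described above.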

## Multilingual retrieval

Retrieval works for non-Latin scripts, with CJK as the driving case.
The tokenizer handles two scripts in one pass: Latin/digit runs are
split identifier-aware (camelCase, snake_case), and **CJK runs — Han,
Hiragana, Katakana, Hangul — are emitted as overlapping bigrams**
(`数据库连接` → `数据`,`据库`,`库连`,`连接`). A mixed string like
`fix 数据库连接 bug` tokenizes to both `fix`/`bug` and the Chinese
bigrams. Indexing and querying share the tokenizer, so the two sides
always agree.
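
The tokenization described above can be sketched as follows. This is a simplified illustration, not diane's tokenizer: the name `tokenize` and the character ranges are assumptions, and the ranges cover only the common Basic Multilingual Plane blocks for Han, Hiragana, Katakana and Hangul.

```typescript
// Hypothetical sketch of a two-script tokenizer; not diane's implementation.
// Assumption: CJK is approximated by a few BMP ranges (kana, Han, Hangul syllables).
const RUN_PATTERN =
  /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]+|[A-Za-z0-9_]+/g;
const CJK_START = /^[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/;

function tokenize(text: string): string[] {
  const tokens: string[] = [];
  const runs = text.match(RUN_PATTERN) || [];
  for (const run of runs) {
    if (CJK_START.test(run)) {
      // CJK run: emit overlapping bigrams (a lone character is kept as-is).
      if (run.length === 1) tokens.push(run);
      for (let i = 0; i + 1 < run.length; i++) {
        tokens.push(run.slice(i, i + 2));
      }
    } else {
      // Latin/digit run: split snake_case, then camelCase, then lowercase.
      for (const part of run.split("_")) {
        const words = part.replace(/([a-z0-9])([A-Z])/g, "$1 $2").split(" ");
        for (const word of words) {
          if (word) tokens.push(word.toLowerCase());
        }
      }
    }
  }
  return tokens;
}

const mixed = tokenize("fix 数据库连接 bug");
// mixed is ["fix", "数据", "据库", "库连", "连接", "bug"]
```

Using one function for both indexing and querying is what keeps the two sides in agreement.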

This matters because CJK has no spaces between words: an ASCII
splitter treats every ideograph as a separator and **discards Chinese
text entirely** — Chinese commit messages would index to nothing and
Chinese queries would match nothing. Bigrams give BM25 overlapping
units to match on. A dry run on a real Chinese repository confirmed
`数据库索引` and `面试题` retrieve relevant Chinese commits where before
they returned nothing.

The honest tradeoff: bigrams are the *lightweight* approach — the same
n-gram idea behind Lucene's CJK analyzer (bigrams) and SQLite FTS5's
`trigram` tokenizer — chosen because they're deterministic and need no
dictionary or model. They take CJK recall from broken to working, but
they're not as precise as true word segmentation: a bigram can be
shared by unrelated words (`编程` is in both `并发编程` "concurrent
programming" and `AI 编程` "AI programming"), so partial-match false
positives happen — the same class of imprecision BM25 has for English.
A statistical segmenter (jieba-style) would be more precise, but its
dictionary alone is several MB and would break the package-size
budget, so bigrams are the right point on that curve for the *lexical*
index — which is the always-on default. (Genuine cross-lingual recall
is a different problem with its own opt-in answer, *Semantic search*,
below.)

One known refinement: the token-budget estimate is a flat ~4
chars/token heuristic, which *under*-counts CJK (CJK text typically
costs more model tokens per character than Latin text), so recall
packs marginally more CJK content than the budget intends. It's a
small imprecision in packing, not a correctness problem.
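
The packing imprecision is easy to see in a sketch. Both functions below are hypothetical: `estimateTokensFlat` mirrors the flat 4 chars/token heuristic described above, while `estimateTokensScriptAware` is one possible refinement that assumes one token per CJK character, which is a rough guess rather than a measured figure for any particular model.

```typescript
// Hypothetical estimators; neither is taken from diane's source.
const CJK_CHAR = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af]/g;

/** Flat heuristic: roughly four characters per token, regardless of script. */
function estimateTokensFlat(text: string): number {
  return Math.ceil(text.length / 4);
}

/** Possible refinement (assumption): count each CJK character as one token. */
function estimateTokensScriptAware(text: string): number {
  const cjkCount = (text.match(CJK_CHAR) || []).length;
  const otherChars = text.length - cjkCount;
  return cjkCount + Math.ceil(otherChars / 4);
}

const sample = "fix 数据库连接 bug"; // 13 characters, 5 of them CJK
const flat = estimateTokensFlat(sample); // 4
const aware = estimateTokensScriptAware(sample); // 7
```

The gap between the two numbers is the amount by which a flat estimate would let extra CJK content into a budgeted recall under this assumption.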

Note the scope of all the above: bigrams make retrieval work *within*
a language — a Chinese query finding Chinese text. They cannot do
*cross-lingual* recall — a Chinese or Russian query finding code
commented in English — because lexical search matches tokens, and
different scripts share none. That is a genuinely different problem,
and it has its own opt-in answer below.

## Semantic search

`enableSemanticSearch` (default **off**) adds opt-in **cross-lingual**
retrieval: a query in one language finding code and comments written
in another — e.g. a Russian or Chinese query surfacing an
English-commented function. Lexical BM25 structurally cannot do this
(a Russian query and English content share zero tokens); it needs an
embedding model that places the languages in one shared vector space.

**How it works.** With the flag on, the plugin loads a small
multilingual embedding model — `intfloat/e5` via the optional
`@huggingface/transformers` dependency, default
`Xenova/multilingual-e5-small` (~120 MB, 384-dim, 100+ languages,
downloaded and cached on first use). A background pass after prefill
embeds every memory and stores the vectors in a **separate**
`.opencode/diane-vectors.db`; the pass is incremental and crash-safe,
so each memory is embedded once and reused across sessions. On a
recall, the query is embedded and the two rankings — BM25 lexical and
vector similarity — are merged with reciprocal-rank fusion (RRF), the
standard position-only blend that needs no score calibration. The
recall path itself stays synchronous: only the query embedding is
async, done in the tool handler before the sync ranking.
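
Reciprocal-rank fusion itself is small enough to sketch. The code below is an illustration of the general technique, not diane's implementation: `rrfFuse` is a hypothetical name, and the constant `k = 60` is the value commonly used in the RRF literature, assumed here rather than read from the plugin.

```typescript
// Hypothetical RRF sketch; function name and k = 60 are assumptions.

/**
 * Merge several rankings (best first) into one.
 * Each item scores 1 / (k + rank) per ranking it appears in; only
 * positions matter, so BM25 scores and cosine similarities never
 * need to be put on a common scale.
 */
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores.entries())
    .sort((a, b) => b[1] - a[1])
    .map((entry) => entry[0]);
}

const lexical = ["m1", "m2", "m3"]; // BM25 order
const vector = ["m3", "m1", "m4"]; // vector-similarity order
const fused = rrfFuse([lexical, vector]);
// fused is ["m1", "m3", "m2", "m4"]: items found by both rankings rise to the top
```

Since the function is synchronous and takes already-computed rankings, it fits the description above where only the query embedding is awaited beforehand.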

**Off by default, and off means off.** When `enableSemanticSearch` is
false: no model is downloaded, `@huggingface/transformers` is never
imported (it is an *optional* peer dependency — a normal install never
pulls it in), no vector database is created, and `recallDetailed`
takes the byte-for-byte unchanged lexical path. The plugin's full
existing test suite runs with the feature off and is the regression
proof that the default path is untouched.

**Enabling it.**

```sh
bun add @huggingface/transformers   # the optional dependency
```

```jsonc
// opencode.json
["opencode-diane", { "enableSemanticSearch": true }]
```

If the flag is on but the dependency is missing or the model can't be
fetched, the plugin logs a warning and falls back to lexical search —
enabling the flag never breaks recall.

**Cost, honestly.** The model is a real dependency: a one-time ~120 MB
download, a few hundred MB of process RAM while loaded, and a
background embedding pass that takes a few minutes on a large store
the first time (incremental and cached thereafter). Each recall adds
one query embedding (~tens of ms on CPU) plus a brute-force cosine
scan (sub-millisecond at realistic store sizes). And it trades away
the plugin's signature property: BM25 is inspectable — you can see
*why* a hit matched — whereas an embedding match is a black box. That
is the deliberate tradeoff for crossing languages, which is why it is
opt-in rather than default.
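
For readers unfamiliar with the term, a brute-force cosine scan just compares the query vector with every stored vector. The sketch below illustrates that idea under simple assumptions: vectors are plain `number[]` arrays held in a `Map`, and the names `cosine` and `topK` are hypothetical rather than diane's.

```typescript
// Hypothetical sketch of a brute-force cosine scan; not diane's vector store.

/** Cosine similarity of two vectors; 0 when either has zero length. */
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Score every stored vector against the query and keep the k best ids. */
function topK(query: number[], vectors: Map<string, number[]>, k: number): string[] {
  const scored: Array<{ id: string; score: number }> = [];
  vectors.forEach((vec, id) => {
    scored.push({ id, score: cosine(query, vec) });
  });
  scored.sort((x, y) => y.score - x.score);
  return scored.slice(0, k).map((s) => s.id);
}

const store = new Map<string, number[]>([
  ["a", [0, 1]],
  ["b", [1, 1]],
  ["c", [2, 0]],
]);
const nearest = topK([1, 0], store, 2); // ["c", "b"]
```

The scan is linear in the number of memories, which is why it stays cheap at the store sizes discussed here and needs no index structure.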

**What is tested, and how.** diane's *pipeline* — the vector store,
RRF fusion, the recall gating, graceful degradation, and end-to-end
RU/EN/ZH cross-lingual retrieval — is covered in CI (`semantic.test.ts`)
by a deterministic stub embedder with a built-in trilingual concept
lexicon. The stub is used on purpose: the cross-lingual *quality* is a
property of Microsoft's e5 model, benchmarked by its authors, and CI
should not re-prove it by downloading 120 MB on every run. The real
model is verified separately by `scripts/verify-semantic.mjs` (run it
once where the Hugging Face Hub is reachable: `bun run verify:semantic`).
That script covers **nine languages on a two-tier scheme**: a *core*
tier of well-represented languages (English, Chinese, Russian,
Japanese, Spanish, Turkish) whose cross-lingual matches gate the exit
code, plus an *experimental* tier of low-resource Cyrillic languages
(Mongolian, Tajik, Kyrgyz) whose results are reported but do not fail
the script — an honest empirical view of how the model handles
languages it was trained on with very uneven amounts of data, rather
than a pretence that it handles all of them equally.

## Real-world usefulness — when it helps, when it doesn't

The plugin was dry-run against real repositories — `rs/zerolog` (Go),
`pallets/click` (Python), `expressjs/express` (JavaScript),
`BurntSushi/byteorder` (Rust), `Snailclimb/JavaGuide` (Chinese),
`redis/redis` (C), `facebook/rocksdb` (C++) and
`spring-projects/spring-framework` (Java, 11k files) — using
`scripts/dry-run.mjs` (ingests a checkout and shows the actual memories
and the results of realistic developer queries) and
`scripts/measure-savings.mjs` (models the token cost of raw discovery
versus a recall). The honest findings:

**Measured token savings.** When recall covers a task, the saving is
large: ~87 % on `zerolog`, ~89 % on `click`, ~85 % on `express` — raw
discovery of ~8–11k tokens collapsing to ~1.1–1.2k. But that number is
"tokens saved *if recall is relevant*". It is not a relevance score —
see the express case below, where the token count looks great while the
hits are mediocre. Treat the percentage as a ceiling, not a promise.

**It helps most on** repos with descriptive commit messages and
*statically-declared* code structure (Go, Rust, typed Java/Python),
under active development so recent history is substantive. On `zerolog`,
"error handling" and "logging configuration" surfaced genuinely relevant
commits and the code map gave compact, accurate API digests.

**It helps least on**:
- *Terse-commit repos.* Commit messages are stored verbatim — the
  plugin derives nothing from message style — so a history of "fix",
  "wip", "update" yields low-signal memories. `dry-run.mjs` prints a
  GOOD / MODERATE / LOW verdict so you know before relying on it.
- *Repos mid-mechanical-refactor.* A burst of renames or a doc
  migration produces many keyword-matching but signal-free commits. The
  git ingester detects **balanced churn** — additions ≈ deletions, the
  convention-free fingerprint of moved/reformatted content — and gives
  such commits no per-commit memory, just as merge commits get none.
  On `click`, mid-`.rst`→`.md` migration, this filtered ~5 % of
  commits; on `zerolog` only ~2.5 %.
- *Mature, stable repos.* On `express` (2000+ commits) recent history
  is dominated by dependency bumps, CI tweaks and test maintenance; the
  substantive architectural commits are old, possibly past the depth
  cap. Git-history memory is most valuable on actively-evolving code.
- *Dynamic-dispatch codebases.* The tree-sitter code map extracts
  *declared* signatures, so its quality tracks how statically a
  language declares its API. It is **strong on C, C++, Java, Go and
  Rust** — dry runs on `redis`, `rocksdb` and `spring-framework`
  produced accurate signatures (`static int checkStringLength(client
  *c…)`, C++ namespaces/templates/inheritance, Java classes/methods).
  It is **weak on idiomatic dynamic JavaScript**: `express` builds its
  real API (`app.get`, `req.body`…) through prototype mutation and
  higher-order functions, so the extractor finds little (`lib/request.js`
  → "1 definition").
- *Very small repos.* Little history → raw discovery was already cheap;
  `measure-savings.mjs` reports such cases as inconclusive, not a win.
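
The balanced-churn idea from the list above (additions roughly equal to deletions) can be sketched as a simple predicate. The thresholds here are purely hypothetical: the wiki does not state what tolerance or minimum size the git ingester uses, so `tolerance` and `minLines` are illustrative defaults and `isBalancedChurn` is an invented name.

```typescript
// Hypothetical sketch: thresholds and names are assumptions, not diane's values.
interface CommitStat {
  additions: number;
  deletions: number;
}

/**
 * True when a commit looks like moved or reformatted content:
 * additions and deletions are nearly equal relative to total churn.
 * Very small commits are never flagged, since a 3-line fix can be
 * balanced by coincidence.
 */
function isBalancedChurn(stat: CommitStat, tolerance = 0.1, minLines = 50): boolean {
  const total = stat.additions + stat.deletions;
  if (total < minLines) return false;
  return Math.abs(stat.additions - stat.deletions) / total <= tolerance;
}

const rename = isBalancedChurn({ additions: 500, deletions: 490 }); // true
const feature = isBalancedChurn({ additions: 300, deletions: 20 }); // false
const tinyFix = isBalancedChurn({ additions: 3, deletions: 3 }); // false
```

A check of this shape depends only on line counts, which is what makes it independent of commit-message conventions.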

**Keyword-on-filename bias.** Default retrieval is keyword BM25 (the
deliberate embedding-free default) and it scores file *paths* as well
as content, so a file *named* after a concept can outrank the real
implementation. This was the most consistent weakness across the dry
runs: on `express`, "routing and middleware" surfaced
`test/middleware.basic.js` and a benchmark over `lib/router/`; on
`rocksdb`, "write ahead log" surfaced test and bench files; on
`spring-framework`, "bean lifecycle" surfaced JUnit fixture classes
named `LifecycleBean.java`. The effect is *amplified* in
verbose-naming languages — Java's long descriptive class names mean
test and fixture classes match concept keywords strongly.

The mitigation is the `memory_recall` **`prefer`** option — a
query-dependent intent lean the calling agent sets from what the user
asked: `prefer:"code"` gently down-ranks test-pathed memories,
`prefer:"tests"` lifts them, `prefer:"history"` favours change history.
It is a mild score multiplier, deliberately **never a filter** — a
strongly-matching test still surfaces under `"code"`, just lower —
because sometimes the test really is what you want. On `spring`,
`prefer:"code"` lifted the real `InitDestroyAnnotationBeanPostProcessor`
above the JUnit fixtures for "bean lifecycle"; on `rocksdb` it separated
`db_write_test.cc` from the implementation. The test signal itself is
deliberately minimal and language-neutral — whether the word "test"
appears as a *token* of the path, which catches `test/` directories,
`_test.go` / `.test.ts` / `test_x.py` filenames alike without
enumerating any one ecosystem's convention. It is the agent — already
an LLM that understood the request in whatever natural language — that
decides the intent; the plugin hardcodes no query keywords. Run
`dry-run.mjs` on your own repo to see the lean in action.
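
A score lean of this kind could look roughly like the sketch below. Everything in it is an assumption made for illustration: the multiplier values (0.75 and 1.25), the `ScoredMemory` shape, and the function names are not taken from diane, and the path check is a simplified reading of "the word test appears as a token of the path" that ignores plural and camelCase forms.

```typescript
// Hypothetical sketch: multipliers, field names and helpers are assumptions.
type Prefer = "code" | "tests" | "history";

interface ScoredMemory {
  path: string; // file path the memory refers to ("" if none)
  isHistory: boolean; // true for git-history memories
  score: number; // base lexical score
}

/** True when "test" is a whole token of the path, not merely a substring. */
function isTestPath(path: string): boolean {
  return path
    .split(/[^A-Za-z0-9]+/)
    .some((token) => token.toLowerCase() === "test");
}

/** Scale the score by intent. The memory is never dropped, only re-weighted. */
function applyPrefer(memory: ScoredMemory, prefer?: Prefer): number {
  const testPath = isTestPath(memory.path);
  let multiplier = 1;
  if (prefer === "code" && testPath) multiplier = 0.75;
  if (prefer === "tests" && testPath) multiplier = 1.25;
  if (prefer === "history" && memory.isHistory) multiplier = 1.25;
  return memory.score * multiplier;
}

const testFile: ScoredMemory = { path: "db/db_write_test.cc", isHistory: false, score: 10 };
const implFile: ScoredMemory = { path: "db/db_impl.cc", isHistory: false, score: 9 };
// Under prefer:"code" the implementation (9) now outranks the test (7.5),
// yet the test is still present in the results.
const leaned = [applyPrefer(testFile, "code"), applyPrefer(implFile, "code")];
```

Matching on whole tokens rather than substrings is what keeps a path such as `lib/latest.js` from being treated as a test file.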

### Verifying it inside a live OpenCode session

The suites and smoke test exercise the plugin against the documented
plugin contract with a mock host; they do **not** run it end-to-end
inside a live OpenCode server — that gap is real and is best closed by
running it. A quick manual check, in a real repo under OpenCode:

1. Start OpenCode; confirm the plugin loads (no error; `memory_status`
   responds and, after prefill, reports a non-zero memory count).
2. Ask the agent something the history knows ("what changed recently
   around <area>"); confirm `memory_recall` is called and the results
   are relevant.
3. Run `memory_code_map` for a structural question; confirm the
   signatures are accurate.
4. Run `memory_mine_skills`, then `memory_skill` — confirm a skill
   mined this session lists and loads without a restart.
5. Skim a session log with `analyze-logs.py` to see the tool calls and
   their latencies.

## What it is not

- **Not a vector store by default.** Lexical BM25, no neural ranker —
  though cross-lingual semantic search is an explicit opt-in.
- **Not an LLM.** No model is bundled or called; everything is
  deterministic structure + BM25.
- **Not an unbounded archive.** A configurable disk budget (50 MB default); least-used facts age out via LFU eviction.
- **Not a substitute for AGENTS.md.** AGENTS.md is for fuzzy guidance every turn; this is for facts surfaced on demand.
- **Not lossy by intent.** The store keeps verbatim content; eviction only kicks in over budget.

## Live code-map refresh

When `enableCodeMap` is on and the agent modifies a source file using
OpenCode's `write`, `edit`, or `patch` tool, the plugin **re-indexes
that file's code-map memory immediately** — before the agent's next
tool call — so `memory_code_map` never serves stale signatures within
the same session.

How it works: the `tool.execute.before` hook records which file a
`write`/`edit` is about to change; the `tool.execute.after` hook
(which fires once the file is on disk in its new form) calls
`ingestCodeMapForFile`, a per-file variant of the prefill walk. It
reuses the already-warm tree-sitter engine (the wasm init and grammar
loads only happen once per session, at prefill), so a single-file
re-parse costs ~milliseconds. `upsertBySubject` replaces the old
code-map memory in place — no duplicates, no accumulation.
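
The "replace in place" behaviour can be pictured with a tiny in-memory model. This is only a sketch: the real store is SQLite, and apart from the method name `upsertBySubject`, which the text above mentions, the class, fields and signature here are assumptions.

```typescript
// Hypothetical in-memory model of upsert-by-subject; the real store is SQLite.
interface CodeMapMemory {
  subject: string; // assumed to be the file path
  content: string; // the signature digest for that file
}

class CodeMapStore {
  private bySubject = new Map<string, CodeMapMemory>();

  /** Insert the memory, or overwrite the existing one with the same subject. */
  upsertBySubject(memory: CodeMapMemory): void {
    this.bySubject.set(memory.subject, memory);
  }

  get(subject: string): CodeMapMemory | undefined {
    return this.bySubject.get(subject);
  }

  size(): number {
    return this.bySubject.size;
  }
}

const codeMap = new CodeMapStore();
codeMap.upsertBySubject({ subject: "src/a.ts", content: "function a(): void" });
// After an edit, the same subject is written again and replaces the old digest.
codeMap.upsertBySubject({ subject: "src/a.ts", content: "function a(x: number): void" });
```

Keying on the subject is what guarantees one code-map memory per file no matter how many times the file is edited in a session.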

**Bash-driven changes are also tracked (since v0.0.5).** After every
`bash` tool call the plugin runs `git status --porcelain` to find
files the shell command modified or created, then refreshes the
code-map for each — up to `bashFileTrackingMaxFiles` (default 20).
This closes the long-standing gap where `git checkout other-branch`,
`npm run format`, `cargo fmt --all`, or `sed -i …` would leave stale
signatures in the index. Deletions are skipped (there's no file on
disk to re-index); renames track the destination path. Set
`bashFileTrackingMaxFiles: 0` to opt out.

The cap matters: a `git checkout` between branches can touch thousands
of files, and re-indexing each synchronously would stall the next tool
call. The default 20 covers typical commit/format/codegen workflows
without that risk. The plugin logs a `debug` line when files are
skipped over the cap so it's visible in the JSONL log without flooding
the agent.
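
How the porcelain output might be turned into a capped refresh list is sketched below. The parser is a simplification written for illustration, not diane's code: `changedFilesFromPorcelain` is a hypothetical name, it handles only the short `XY path` and `XY old -> new` line shapes of `git status --porcelain`, and it ignores quoted paths.

```typescript
// Hypothetical sketch: a simplified parser for `git status --porcelain` output.
interface RefreshPlan {
  refresh: string[]; // paths to re-index now
  skipped: number; // paths left out because of the cap
}

function changedFilesFromPorcelain(output: string, maxFiles: number): RefreshPlan {
  if (maxFiles <= 0) return { refresh: [], skipped: 0 }; // tracking opted out

  const paths: string[] = [];
  for (const line of output.split("\n")) {
    if (line.length < 4) continue; // blank or malformed line
    const status = line.slice(0, 2); // two status letters, e.g. " M", "??", "R "
    let path = line.slice(3);
    if (status.indexOf("D") !== -1) continue; // deleted: nothing on disk to re-index
    const arrow = path.indexOf(" -> ");
    if (arrow !== -1) path = path.slice(arrow + 4); // rename: keep the destination
    paths.push(path);
  }

  const refresh = paths.slice(0, maxFiles);
  return { refresh, skipped: paths.length - refresh.length };
}

const porcelain =
  " M src/a.ts\n?? src/new.ts\n D src/gone.ts\nR  src/old.ts -> src/renamed.ts\n";
const plan = changedFilesFromPorcelain(porcelain, 20);
// plan.refresh is ["src/a.ts", "src/new.ts", "src/renamed.ts"], plan.skipped is 0
```

With a cap of 2 the same input would refresh the first two paths and report one skipped, which is the case the `debug` log line described above is meant to surface.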

## Live session reflection

Three behaviours, added in v0.0.5, keep what's happening *right now*
visible to the memory store:

**1. Live-session activity recording** (`recordSessionActivity`,
default on). The current session's file edits and bash commands roll
up into ONE memory under `session-trace` → `live:${sessionId}`,
upserted in place after each event. This lets the current session
recall "what have I touched so far" without scanning the OpenCode SDK,
and pre-seeds the trace so the moment this session becomes "past", a
successor sees it like any other. The memory is **not** pinned (it's
transient state — eligible for eviction). Content is bounded
(~4 KB) with a rolling list of recent bash commands; total counts
stay accurate even after detail truncation.

**2. Post-bash code-map freshness** — covered above.

**3. Auto git re-ingest on HEAD movement** (`autoReingestGitOnHeadChange`,
default on). After every `bash` call the plugin polls
`git rev-parse HEAD`; if HEAD moved (pull / merge / rebase / checkout
/ reset), it queues a background re-ingest of git history.
Idempotent — already-known commits are skipped via `insertIfMissing`,
so the cost is roughly linear in the number of *new* commits.
Concurrent triggers coalesce: only one re-ingest runs at a time, and
further detections re-arm the flag for the next poll. The
`memory_ingest_git` tool exposes the same logic as an explicit,
on-demand call for cases the auto-detect can't cover (a fetch-only
operation that brings new commits via another mechanism).

Together these three close the gaps surfaced by the v0.0.4 reflection
verdict: the current session's work, bash-driven file changes, and
post-merge commits are all visible to recall mid-session, not only
after a restart.
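
The coalescing behaviour described for HEAD-triggered re-ingest can be modelled as a small state machine. This is one possible reading of "only one re-ingest runs at a time, and further detections re-arm the flag for the next poll", not diane's code: the class `ReingestGate` and its methods are hypothetical.

```typescript
// Hypothetical sketch of trigger coalescing; names and structure are assumptions.
class ReingestGate {
  private lastHead: string | null = null;
  private running = false;
  private pending = false;

  /**
   * Call after each bash command with the current HEAD sha.
   * Returns true when the caller should start a background re-ingest now.
   */
  observeHead(sha: string): boolean {
    const moved = this.lastHead !== null && sha !== this.lastHead;
    this.lastHead = sha;
    if (moved) this.pending = true; // arm (or re-arm) the flag
    if (this.pending && !this.running) {
      this.pending = false;
      this.running = true;
      return true;
    }
    return false; // nothing new, or a re-ingest is already in flight
  }

  /** Call when the background re-ingest has finished. */
  finish(): void {
    this.running = false;
  }
}

const gate = new ReingestGate();
const first = gate.observeHead("aaa"); // false: first observation is the baseline
const second = gate.observeHead("bbb"); // true: HEAD moved, start a re-ingest
const third = gate.observeHead("ccc"); // false: moved again, but one is running
gate.finish();
const fourth = gate.observeHead("ccc"); // true: the re-armed flag fires on the next poll
```

Because the re-ingest itself is idempotent, firing once more than strictly necessary is harmless, so the gate only needs to prevent overlapping runs.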

## Compatibility

Built against `@opencode-ai/plugin@1.14.x`. Runs on the Bun runtime
(what OpenCode loads plugins under) — Bun ≥ 1.1. Uses documented
hooks only — `tool` for custom tools, `event` for `lsp.client.diagnostics`,
`tool.execute.before/after` for code-map refresh and the recall-first
nudge, `client.app.log` for session logs. Storage is a SQLite database
(`bun:sqlite`, built into the Bun runtime) you can inspect with any
SQLite client.

Coexists with other plugins. With a **hook-heavy plugin** alongside it
(e.g. `oh-my-opencode`), note that the recall-first nudge mutates
`output.output` in a `tool.execute.after` hook — if you'd rather not
have two plugins post-processing tool output, set
`enableNudgeHook: false`. The nudge effect is then suppressed;
**the hooks themselves remain registered** (they still run the
code-map refresh). If the other plugin already does AST/LSP code
intelligence, setting `enableCodeMap: false` avoids redundant work
(and the grammar-wasm parse overhead) while diane still covers the
persistent memory store, git-structure signals, session ingestion,
cross-references, and skill mining.

## License

MIT.