opencode-diane 0.0.5

Files changed (80)
  1. package/CHANGELOG.md +180 -0
  2. package/LICENSE +21 -0
  3. package/README.md +206 -0
  4. package/WIKI.md +1430 -0
  5. package/dist/index.d.ts +28 -0
  6. package/dist/index.js +1632 -0
  7. package/dist/ingest/adaptive.d.ts +47 -0
  8. package/dist/ingest/adaptive.js +182 -0
  9. package/dist/ingest/code-health.d.ts +58 -0
  10. package/dist/ingest/code-health.js +202 -0
  11. package/dist/ingest/code-map.d.ts +71 -0
  12. package/dist/ingest/code-map.js +670 -0
  13. package/dist/ingest/cross-refs.d.ts +59 -0
  14. package/dist/ingest/cross-refs.js +1207 -0
  15. package/dist/ingest/docs.d.ts +49 -0
  16. package/dist/ingest/docs.js +325 -0
  17. package/dist/ingest/git.d.ts +77 -0
  18. package/dist/ingest/git.js +390 -0
  19. package/dist/ingest/live-session.d.ts +101 -0
  20. package/dist/ingest/live-session.js +173 -0
  21. package/dist/ingest/project-notes.d.ts +28 -0
  22. package/dist/ingest/project-notes.js +102 -0
  23. package/dist/ingest/project.d.ts +35 -0
  24. package/dist/ingest/project.js +430 -0
  25. package/dist/ingest/session-snapshot.d.ts +63 -0
  26. package/dist/ingest/session-snapshot.js +94 -0
  27. package/dist/ingest/sessions.d.ts +29 -0
  28. package/dist/ingest/sessions.js +164 -0
  29. package/dist/ingest/tables.d.ts +52 -0
  30. package/dist/ingest/tables.js +360 -0
  31. package/dist/mining/skill-miner.d.ts +53 -0
  32. package/dist/mining/skill-miner.js +234 -0
  33. package/dist/search/bm25.d.ts +81 -0
  34. package/dist/search/bm25.js +334 -0
  35. package/dist/search/e5-embedder.d.ts +30 -0
  36. package/dist/search/e5-embedder.js +91 -0
  37. package/dist/search/embed-pass.d.ts +26 -0
  38. package/dist/search/embed-pass.js +43 -0
  39. package/dist/search/embedder.d.ts +58 -0
  40. package/dist/search/embedder.js +85 -0
  41. package/dist/search/inverted-index.d.ts +51 -0
  42. package/dist/search/inverted-index.js +139 -0
  43. package/dist/search/ppr.d.ts +44 -0
  44. package/dist/search/ppr.js +118 -0
  45. package/dist/search/tokenize.d.ts +26 -0
  46. package/dist/search/tokenize.js +98 -0
  47. package/dist/store/eviction.d.ts +16 -0
  48. package/dist/store/eviction.js +37 -0
  49. package/dist/store/repository.d.ts +222 -0
  50. package/dist/store/repository.js +420 -0
  51. package/dist/store/sqlite-store.d.ts +89 -0
  52. package/dist/store/sqlite-store.js +252 -0
  53. package/dist/store/vector-store.d.ts +66 -0
  54. package/dist/store/vector-store.js +160 -0
  55. package/dist/types.d.ts +385 -0
  56. package/dist/types.js +9 -0
  57. package/dist/utils/file-log.d.ts +87 -0
  58. package/dist/utils/file-log.js +215 -0
  59. package/dist/utils/peer-detection.d.ts +45 -0
  60. package/dist/utils/peer-detection.js +90 -0
  61. package/dist/utils/shell.d.ts +43 -0
  62. package/dist/utils/shell.js +110 -0
  63. package/dist/utils/usage-skill.d.ts +42 -0
  64. package/dist/utils/usage-skill.js +129 -0
  65. package/dist/utils/xlsx.d.ts +36 -0
  66. package/dist/utils/xlsx.js +270 -0
  67. package/grammars/tree-sitter-c.wasm +0 -0
  68. package/grammars/tree-sitter-c_sharp.wasm +0 -0
  69. package/grammars/tree-sitter-cpp.wasm +0 -0
  70. package/grammars/tree-sitter-css.wasm +0 -0
  71. package/grammars/tree-sitter-go.wasm +0 -0
  72. package/grammars/tree-sitter-html.wasm +0 -0
  73. package/grammars/tree-sitter-java.wasm +0 -0
  74. package/grammars/tree-sitter-javascript.wasm +0 -0
  75. package/grammars/tree-sitter-json.wasm +0 -0
  76. package/grammars/tree-sitter-php.wasm +0 -0
  77. package/grammars/tree-sitter-python.wasm +0 -0
  78. package/grammars/tree-sitter-rust.wasm +0 -0
  79. package/grammars/tree-sitter-typescript.wasm +0 -0
  80. package/package.json +80 -0
package/WIKI.md ADDED
# opencode-diane — Wiki

## What it is

A plugin for [OpenCode](https://opencode.ai) that gives the agent a
hierarchical, BM25-ranked memory store for **any git repository, in any
language**. It pre-fills itself from git history and project files,
lets the agent ingest past OpenCode sessions, and mines its own
contents into reusable `SKILL.md` files. No embeddings, no LLM
round-trips, no convention assumptions — by default. (Cross-lingual
semantic search is available as an explicit opt-in; see *Semantic
search*.)

The name is a *Twin Peaks* reference. Throughout the show, Dale
Cooper recorded his case notes for Diane, the recipient of his
investigation log. The plugin plays that role for a coding agent — a
persistent memory layer that holds everything it has observed about
the codebase. "Diane, I'm standing at the edge of a large
repository, and I have some thoughts on the commit history."

## Why it exists

The agent re-discovers the same things every session: which files
change together, what's in the build manifest, which files are
hotspots. Each rediscovery costs many tool calls. A small store of
compact, structural facts, queryable with BM25, replaces those
discoveries.

It's also a substrate for skill mining: after enough history the store
contains real patterns, and the miner turns clusters into OpenCode
`SKILL.md` files. Those are usable in the *same* session via the
`memory_skill` tool — no restart — and OpenCode also picks them up as
native skills on the next startup.

## No conventions — only structure

The hard rule: the plugin never interprets *culture*. It does not
parse commit messages for intent, does not assume a commit-message
style, does not reach into a language's semantics. Real repositories
often have no commit-message culture at all (`wip`, `.`, `更新`,
empty) — message-derived classification is noise dressed up as signal.

Everything the plugin derives is a **fact about what physically
happened or physically exists**:

- From git: per-commit diff *shape* (files touched, lines ±, files
  created/deleted), file *co-change*, file *churn*, *recency*. The
  commit subject is stored verbatim as searchable text — data, never
  signal.
- From the tree: a file-extension census (the language signal emerges
  from the data), the top-level layout, and recognised project/build/CI
  files summarised **by format only** (JSON → keys, TOML → sections,
  `Makefile` → targets, …). Recognising that a file is *named*
  `Cargo.toml` is a fact, like knowing a file extension; interpreting
  Rust's dependency model would be a convention, and the plugin does
  not do that.
- From the language server (live): current diagnostics per file —
  the compiler's / type-checker's own output, normalised by LSP
  across 40+ languages. No heuristics.
- From tree-sitter (opt-in): per-file definition *signatures* — the
  structural shape of the code, bodies stripped.
63
+ ## Straight answers for a decision-maker
64
+
65
+ Short answers to the questions that decide whether this is worth
66
+ adding. Each links to the section with the full story.
67
+
68
+ **How is the memory structured?**
69
+ Every memory is one flat record — a `category`, a `subject`, verbatim
70
+ searchable `content`, structural `tags`, and bookkeeping fields
71
+ (`use_count`, `size_bytes`, `pinned`). There is no graph, no nesting,
72
+ no per-category schema. See [How the memory is
73
+ structured](#how-the-memory-is-structured).
74
+
75
+ **What does "hierarchical" mean here?**
76
+ Two address levels: a fixed top-level `category` (nine kinds — git
77
+ history, project facts, code map, …) and a free-form `subject` (a file
78
+ path, a task slug). Retrieval filters by either or both *before*
79
+ scoring, so narrowing to "the git history of `context.go`" costs
80
+ nothing. The hierarchy is those two filter levels — not a tree of
81
+ objects.
82
+
83
+ **What if the repo has no git history?**
84
+ The plugin still activates on any recognised manifest and still gives
85
+ you project facts, the code map, LSP code-health, and everything
86
+ session-driven — but the single largest source (per-commit memories,
87
+ co-change, churn, recency) is gone, so day one is thin. It grows
88
+ useful over sessions as snapshots and notes accumulate. `git init`
89
+ unlocks more than half the value. See [Without git
90
+ history](#without-git-history).
91
+
92
+ **What if commit messages are meaningless ("wip", "fix", ".")?**
93
+ By design this changes nothing about correctness. The plugin never
94
+ classifies a commit by its message — every tag comes from what the
95
+ commit physically did (files touched, lines ±, files created/deleted).
96
+ A terse message only means that one memory's searchable text is
97
+ low-signal; the diff-shape tags, co-change and churn are unaffected.
98
+ See [No conventions](#no-conventions--only-structure).
99
+
100
+ **How is it different from other memory plugins / approaches?**
101
+ By default it is deterministic: BM25 over a hand-built index — no
102
+ embeddings, no model, no API spend, fully reproducible and
103
+ inspectable. (An opt-in semantic-search mode adds an embedding model
104
+ for cross-lingual recall — off unless you enable it.) See [How it
105
+ compares](#how-it-compares).
106
+
107
+ **What token reduction can I actually expect?**
108
+ When a recall covers the task, 80–89 % measured on real repos with
109
+ history. That is a *ceiling*, not a promise — it assumes the recall is
110
+ relevant. It is lower on terse-history repos, mature/stable repos,
111
+ dynamic-dispatch code, and tiny repos. The `dry-run.mjs` script gives
112
+ your repo a GOOD / MODERATE / LOW verdict before you rely on it. See
113
+ [What token reduction to expect](#what-token-reduction-to-expect).
114
+
115
+ **What does it cost to run?**
116
+ The core plugin is ~77 KB with one small dependency, and a large store
117
+ costs a few hundred MB of RAM. The optional code map adds ~16 MB of
118
+ vendored grammar files. No GPU, no API key, no network. See
119
+ [Performance](#performance) and [Code map](#code-map).
120
+
121
+ **Is it production-ready?**
122
+ 674 assertions across 24 test suites, ~90 % line coverage, verified
123
+ against the documented plugin contract and dry-run against real repos
124
+ in 30+ languages (code map covers 13 tree-sitter grammars; cross-refs
125
+ adds Pascal, Ruby, Perl, Elixir, Lua, Haskell, Scala, Kotlin, Swift,
126
+ Verilog, VHDL, COBOL, Fortran, Solidity, Smalltalk, Vim, Racket, Lisp,
127
+ and more). The one honest gap: it has not yet been run end-to-end inside
128
+ a live OpenCode *server* — see [Verifying it inside
129
+ a live OpenCode session](#verifying-it-inside-a-live-opencode-session).
130
+
## How the memory is structured

Every memory is **one flat record** — there is no per-category schema,
no nesting, no object graph. The shape, end to end:

```
one memory — the only record shape; every category uses it
─────────────────────────────────────────────────────────────────
id          mem_mp45o0rc_c
category    git-history                 one of 9 fixed kinds     ┐ the two
subject     src/context.go              what the fact is about   ┘ hierarchy
content     "fix nil deref on flush" + diff shape                  levels
            verbatim text — scored by BM25, never parsed
tags        [single-file, tiny-diff, net-addition]
            structural only — never derived from prose
source      git:116c8060…               provenance
pinned      false                       true => never evicted
use_count   3    used_at …              least-used pair ages out first
size_bytes  412                         counts against the disk budget
created_at  2026-05-01T…
```

That uniformity is deliberate: one storage table, one inverted index,
one eviction rule, regardless of where a memory came from.
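
The record above can also be written down as a TypeScript shape. This is a minimal sketch. The interface name `Memory`, the camelCase field names (the diagram shows `use_count`, `size_bytes` and `created_at`), the epoch-millisecond encoding of `usedAt` and the `isCategory` guard are all assumptions made for illustration. They are not the plugin's published `types.d.ts`.

```typescript
// Hypothetical TypeScript shape of one memory record, mirroring the diagram.
// Names and encodings are illustrative assumptions, not the plugin's types.
const CATEGORIES = [
  "git-history",
  "project-facts",
  "code-map",
  "code-health",
  "session-snapshot",
  "session-trace",
  "agent-note",
  "skill-mined",
  "custom",
] as const;

type Category = (typeof CATEGORIES)[number];

interface Memory {
  id: string;
  category: Category; // first hierarchy level: a closed set of nine kinds
  subject: string;    // second level: file path, task slug, or pair key
  content: string;    // verbatim text, scored by BM25, never parsed
  tags: string[];     // structural tags only
  source: string;     // provenance, e.g. a commit hash
  pinned: boolean;    // pinned records are never evicted
  useCount: number;   // LFU counter
  usedAt: number;     // last use, epoch milliseconds (assumed encoding)
  sizeBytes: number;  // counted against the disk budget
  createdAt: string;  // ISO-8601 timestamp
}

// Guard for untrusted input, such as a category string from a tool call.
function isCategory(value: string): value is Category {
  return (CATEGORIES as readonly string[]).indexOf(value) !== -1;
}
```

Because the category set is closed, a `category` filter can be checked up front. An unknown kind can be rejected before any scoring work happens.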

### The hierarchy is two filter levels

"Hierarchical" here means exactly two levels of address — nothing more
elaborate:

- **`category`** — a fixed, closed set of nine kinds. It says *what
  type of fact* this is.
- **`subject`** — free-form. Usually a file path; sometimes a task
  slug or a synthetic key like `<tree>` or `go.mod↔go.sum`. It says
  *what the fact is about*.

```
the store

├─ git-history ······ commit / co-change / churn / recency memories
│   ├─ subject "src/context.go"       a commit that touched it
│   ├─ subject "src/writer.go"
│   ├─ subject "go.mod↔go.sum"        a co-change pair
│   └─ subject "context.go (churn)"   a stability signal

├─ project-facts ···· manifests, tree census, README headline
│   ├─ subject "package.json"
│   └─ subject "<tree>"

├─ code-map ········· one signature digest per source file (opt-in)
├─ code-health ······ one LSP error/warning summary per file (live)
├─ session-snapshot · one per session — mental model, decisions
├─ session-trace ···· task + tool-trace summaries of past sessions
├─ agent-note ······· facts the agent chose to remember
├─ skill-mined ······ subject clusters promoted to SKILL.md
└─ custom ··········· anything stored with memory_remember
```

Retrieval can filter by `category`, by `subject`, or by both *before*
BM25 scoring runs — so "the git history of `context.go`" or "every
code-map entry" is a free narrowing, not a post-filter over a full
scan. That pre-score filter is the entire payoff of the hierarchy: it
makes a scoped recall as cheap as an unscoped one.
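
One way to make the pre-score filter a lookup rather than a scan is to bucket memories by `category` and by `subject` at insert time. The sketch below assumes that design. `ScopedStore`, `MemoryLite` and `Scope` are hypothetical names, and the plugin may narrow through its inverted index instead.

```typescript
// Hypothetical sketch: bucket memories at insert time so that narrowing by
// category and/or subject is a map lookup, done before any scoring runs.
interface MemoryLite {
  id: string;
  category: string;
  subject: string;
  content: string;
}

interface Scope {
  category?: string;
  subject?: string;
}

function push(map: Map<string, MemoryLite[]>, key: string, m: MemoryLite): void {
  const bucket = map.get(key);
  if (bucket) bucket.push(m);
  else map.set(key, [m]);
}

class ScopedStore {
  private all: MemoryLite[] = [];
  private byCategory = new Map<string, MemoryLite[]>();
  private bySubject = new Map<string, MemoryLite[]>();

  add(m: MemoryLite): void {
    this.all.push(m);
    push(this.byCategory, m.category, m);
    push(this.bySubject, m.subject, m);
  }

  // Candidate set handed to the scorer; out-of-scope memories never reach it.
  candidates(scope: Scope): MemoryLite[] {
    if (scope.category !== undefined && scope.subject !== undefined) {
      // Start from the (usually smaller) subject bucket, then check category.
      const bucket = this.bySubject.get(scope.subject) ?? [];
      return bucket.filter((m) => m.category === scope.category);
    }
    if (scope.category !== undefined) return this.byCategory.get(scope.category) ?? [];
    if (scope.subject !== undefined) return this.bySubject.get(scope.subject) ?? [];
    return this.all;
  }
}
```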

There is no third level and no cross-links between memories *as data*.
The one relationship the plugin uses — which files change together —
is itself stored as ordinary `git-history` memories (subject
`fileA↔fileB`) and consulted at query time as the co-change boost; it
is not a separate graph structure in the store.
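
Here is a minimal sketch of how co-change pairs could be counted from per-commit file lists and stored as ordinary memories with a `fileA↔fileB` subject. The threshold mirrors the `coChangeMinOccurrences` default of 3 from *Fine-grained tuning*. The function names, the sorted-pair convention and the content string are assumptions.

```typescript
// Hypothetical sketch: count unordered file pairs touched by the same commit,
// keep pairs seen in at least `minOccurrences` commits, and emit each one as
// an ordinary memory whose subject is "fileA↔fileB" (files in sorted order).
interface CoChangeMemory {
  category: "git-history";
  subject: string;
  content: string;
  count: number;
}

function coChangeMemories(commits: string[][], minOccurrences = 3): CoChangeMemory[] {
  const counts = new Map<string, number>();
  for (const files of commits) {
    const unique = Array.from(new Set(files)).sort();
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]}↔${unique[j]}`;
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
    }
  }
  const out: CoChangeMemory[] = [];
  counts.forEach((count, subject) => {
    if (count >= minOccurrences) {
      out.push({ category: "git-history", subject, content: `co-changed in ${count} commits`, count });
    }
  });
  return out.sort((x, y) => y.count - x.count || x.subject.localeCompare(y.subject));
}

// Query-time view: the direct neighbours of one file, read back from subjects.
function coChangeNeighbours(mems: CoChangeMemory[], file: string): string[] {
  const out: string[] = [];
  for (const m of mems) {
    const [a, b] = m.subject.split("↔");
    if (a === file) out.push(b);
    else if (b === file) out.push(a);
  }
  return out;
}
```

The pair count grows quadratically with the number of files in a commit. A real builder would likely need to guard against very large commits.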

## How it compares

The plugin is one specific point in the design space. What it trades,
against the common alternatives:

**vs. an embedding / vector memory.** The deliberate difference is
that the *default* uses *no model*. Retrieval is BM25 over a
hand-built inverted index — deterministic, reproducible, debuggable,
no GPU, no API spend. (Semantic search bridges to the embedding world
as an opt-in when you need cross-lingual recall — see *Semantic
search* — but it is off by default.) The cost of that choice
is real: BM25 matches *tokens*, so a query has to share words (or CJK
bigrams) with the memory — it will not catch a pure paraphrase the way
a vector search can. Three things blunt that: identifier-aware
tokenisation (so `getUserName` also matches `user` and `name`), the
co-change boost (structurally-related memories surface with no textual
match), and the fact that code search is mostly keyword search anyway.
If you specifically need semantic similarity over prose, an embedding
store is the better tool; for a fast, free, inspectable memory of a
codebase, this plugin is.
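
The identifier-aware tokenisation mentioned above, together with the CJK bigram rule from *The pillars*, can be sketched as follows. The stopword list, the Unicode ranges treated as CJK, and the choice to keep the whole identifier alongside its parts are assumptions. Only the splitting rules, the bigram rule and the two-character minimum come from this document.

```typescript
// Hypothetical stopword list; the plugin's real list is not given here.
const STOPWORDS = new Set(["the", "an", "of", "on", "in", "to", "and", "or", "is", "for"]);

// Approximate CJK detection by code point range.
function isCjk(code: number): boolean {
  return (
    (code >= 0x4e00 && code <= 0x9fff) || // CJK unified ideographs
    (code >= 0x3040 && code <= 0x30ff) || // hiragana and katakana
    (code >= 0xac00 && code <= 0xd7af)    // hangul syllables
  );
}

// getUserName -> get User Name; HTTPServer -> HTTP Server; user_name -> user name
function splitIdentifier(word: string): string[] {
  const parts: string[] = [];
  for (const chunk of word.split("_")) {
    const spaced = chunk
      .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2")
      .replace(/([a-z0-9])([A-Z])/g, "$1 $2");
    for (const p of spaced.split(" ")) if (p !== "") parts.push(p);
  }
  return parts;
}

function tokenize(text: string): string[] {
  const out: string[] = [];
  const emit = (t: string): void => {
    const lower = t.toLowerCase();
    if (lower.length >= 2 && !STOPWORDS.has(lower)) out.push(lower); // sub-2-char and stopwords dropped
  };
  const flushWord = (word: string): void => {
    const parts = splitIdentifier(word);
    if (parts.length > 1) emit(word); // keep the whole identifier as well (assumption)
    for (const p of parts) emit(p);
  };
  const flushCjk = (run: string): void => {
    for (let i = 0; i + 1 < run.length; i++) emit(run.slice(i, i + 2)); // overlapping bigrams
  };

  let word = "";
  let cjk = "";
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (isCjk(text.charCodeAt(i))) {
      flushWord(word); word = "";
      cjk += ch;
    } else if (/[A-Za-z0-9_]/.test(ch)) {
      flushCjk(cjk); cjk = "";
      word += ch;
    } else {
      flushWord(word); word = "";
      flushCjk(cjk); cjk = "";
    }
  }
  flushWord(word);
  flushCjk(cjk);
  return out;
}
```

With these rules, `tokenize("nil deref on flush")` yields the `[nil, deref, flush]` shown in the retrieval diagram. A three-character CJK run such as `数据库` yields two overlapping bigrams.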

**vs. aider's repo-map.** aider uses tree-sitter too, but the design
is different at every level.

*How it works (from the source).* The expensive step — tree-sitter
parsing and tag extraction — is cached persistently on disk
(`.aider.tags.cache.v{N}`, using `diskcache`) across sessions. What
is recomputed on each message turn is the *ranking*: a full PageRank
run (via NetworkX) on a symbol-reference dependency graph, where each
source file is a node and edges are weighted by how often one file
references symbols defined in another. The ranking is *personalised*
to the current turn — files in the active chat get a ×50 multiplier
on their outgoing edges; symbols mentioned in the current message get
×10; long compound identifiers (camelCase / snake_case, ≥ 8 chars)
×10; private `_`-prefixed symbols ×0.1. An in-session in-memory cache
short-circuits the PageRank when the inputs are unchanged. The ranked
tags are then fitted into the token budget by binary search.

*The budget is dynamic.* The default token cap is 1 024 (`--map-tokens`),
but when no files are in the chat the budget multiplies by
`map_mul_no_files=8` — up to ~8 192 tokens — so an empty chat gets a
much wider view of the whole repo.

*Where diane lands.* diane's default co-change boost is deliberately
one hop — direct neighbours only — which is cheaper and trivially
inspectable, but narrower than aider's whole-graph PageRank. The
`personalizedPageRank` option closes that gap: turned on, diane runs
its own Personalized PageRank (a restart-biased random walk seeded on
the query's textual hits) over the co-change graph, so relevance
reaches multi-hop files graded by graph distance. It is off by
default — the random walk is a per-recall iterative computation (a few
ms on a large graph) and less trivially traceable than one hop, so the
cheap, fully-inspectable path stays the default and PPR is there for
those who want the wider reach. The graph differs from aider's in
kind: aider's edges are *symbol references* (who calls whom), diane's
are *co-change* (what changes together in Git history) — structural
coupling rather than static call structure.
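
Personalized PageRank is a standard algorithm. The sketch below computes it by power iteration over a weighted, undirected co-change graph, restarting at the files behind the query's textual hits. The restart probability of 0.15, the 50-iteration cap and the names `addEdge` and `personalizedPageRank` are illustrative assumptions, not diane's actual parameters or API.

```typescript
// node -> (neighbour -> edge weight), e.g. weight = co-change count
type Graph = Map<string, Map<string, number>>;

function addEdge(g: Graph, a: string, b: string, w: number): void {
  if (!g.has(a)) g.set(a, new Map());
  if (!g.has(b)) g.set(b, new Map());
  g.get(a)!.set(b, w);
  g.get(b)!.set(a, w);
}

// Power-iteration Personalized PageRank: at each step the walker jumps back
// to a seed with probability `restart`, otherwise follows a weighted edge.
function personalizedPageRank(
  g: Graph,
  seeds: string[],   // files behind the query's textual hits
  restart = 0.15,    // illustrative value
  iterations = 50,   // illustrative cap
): Map<string, number> {
  const nodes: string[] = [];
  g.forEach((_, n) => nodes.push(n));
  const valid = seeds.filter((s) => g.has(s));
  const seedMass = new Map<string, number>();
  valid.forEach((s) => seedMass.set(s, (seedMass.get(s) ?? 0) + 1 / valid.length));

  let rank = new Map<string, number>(seedMass);
  for (let it = 0; it < iterations; it++) {
    const next = new Map<string, number>();
    nodes.forEach((n) => next.set(n, restart * (seedMass.get(n) ?? 0)));
    rank.forEach((r, n) => {
      const nbrs = g.get(n)!;
      let total = 0;
      nbrs.forEach((w) => { total += w; });
      if (total === 0) {
        // Isolated node: return its walk mass to the seeds so total mass stays 1.
        seedMass.forEach((m, s) => next.set(s, next.get(s)! + (1 - restart) * r * m));
        return;
      }
      nbrs.forEach((w, nbr) => next.set(nbr, next.get(nbr)! + (1 - restart) * r * (w / total)));
    });
    rank = next;
  }
  return rank;
}
```

With a low restart probability, a seed's only neighbour can outscore the seed itself, because every walk that leaves the seed lands there. On a simple chain of files, scores fall off hop by hop after that first neighbour. Files in a disconnected component score exactly zero.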

*Output format: source lines, not stripped signatures.* aider's output
(via `TreeContext` with "lines of interest") shows the actual source
lines of the referenced symbols — class attributes, multi-line
signatures, brief context — not just a single signature string per
definition. Richer context per symbol, but more tokens per symbol;
diane's code map is more compact, covering more files at a lower
per-file token cost.

*How diane's code map differs.* It does not track symbol references or
run a graph algorithm. Every file gets one flat signature digest; BM25
recall selects the most query-relevant digests at call time. The map
is available immediately (persisted from prefill) and the token cost
is predictable at every call. It is also only one of nine memory
categories — git history, past sessions, mined skills and snapshots
sit alongside it. The benchmark repo (`opencode-diane-benchmarks`)
compares the two maps directly on real repositories.

**vs. AGENTS.md / static context files.** Those are loaded into the
prompt *every turn* — a fixed, recurring token cost the model pays
whether or not it needs them. This plugin is *pull*, not push: a
memory costs tokens only on the turn it is recalled. The two are
complementary — AGENTS.md for guidance the model should always see,
diane for facts it needs only sometimes.

**vs. no memory at all.** Without a memory the agent re-runs the same
`git log`, `ls -R`, `grep` and file reads every session. That raw
discovery is the baseline the token-savings numbers below are measured
against.

## The pillars

**1. Hierarchical store.** Top-level `category` (`git-history`,
`project-facts`, `code-health`, `code-map`, `session-trace`,
`session-snapshot`, `agent-note`, `skill-mined`, `custom`) + free-form
`subject` (file path, task slug). Retrieval filters by both before
scoring, so narrowing is free.

**2. BM25 retrieval, co-change-boosted.** Pure-JS tokenizer with
camelCase / snake_case splitting for Latin text and **overlapping
bigrams for CJK** (Chinese, Japanese, Korean) — CJK has no word
delimiters, so an ASCII splitter would drop it entirely; bigrams give
BM25 units to match on, the same dependency-free approach Lucene's CJK
analyzer and SQLite FTS5 use (see *Multilingual retrieval*). Inverted
index, `k1=1.2 b=0.75`, plus a small log-of-useCount tiebreak. On top
of textual scoring, a one-hop **co-change boost**: a hit about file X
pulls in memories about files X is historically modified with —
structurally-related context a pure text match would miss. (With
`personalizedPageRank` on, that one hop becomes a full
restart-biased random walk over the co-change graph, reaching
multi-hop files — opt-in; see *How it compares*.) Recall
output is **token-budgeted**: ranked hits are packed to a ceiling
(default 1200) so a call's context cost is predictable; an oversized
sole hit is content-truncated rather than allowed to blow the budget.
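
The scoring step can be sketched as classic BM25 with the stated `k1=1.2` and `b=0.75`, plus the `log1p(useCount) * 0.05` tiebreak from the retrieval diagram. Two choices here are assumptions of this sketch: the Lucene-style IDF `ln(1 + (N - df + 0.5) / (df + 0.5))`, and applying the tiebreak only to documents that already match the query. `Doc` and `bm25Rank` are hypothetical names.

```typescript
interface Doc {
  id: string;
  tokens: string[]; // already tokenized content
  useCount: number;
}

function bm25Rank(docs: Doc[], query: string[], k1 = 1.2, b = 0.75): Array<{ id: string; score: number }> {
  const n = docs.length;
  if (n === 0) return [];
  const avgdl = docs.reduce((s, d) => s + d.tokens.length, 0) / n;

  // Document frequency: in how many docs each term appears at least once.
  const df = new Map<string, number>();
  for (const d of docs) new Set(d.tokens).forEach((t) => df.set(t, (df.get(t) ?? 0) + 1));

  const results = docs.map((d) => {
    const tf = new Map<string, number>();
    for (const t of d.tokens) tf.set(t, (tf.get(t) ?? 0) + 1);
    let score = 0;
    for (const q of query) {
      const f = tf.get(q) ?? 0;
      if (f === 0) continue;
      const dfq = df.get(q) ?? 0;
      const idf = Math.log(1 + (n - dfq + 0.5) / (dfq + 0.5));
      const norm = f + k1 * (1 - b + (b * d.tokens.length) / (avgdl || 1)); // length normalisation
      score += idf * ((f * (k1 + 1)) / norm);
    }
    // Small popularity tiebreak, only for documents with a textual match.
    if (score > 0) score += Math.log1p(d.useCount) * 0.05;
    return { id: d.id, score };
  });
  return results.filter((r) => r.score > 0).sort((x, y) => y.score - x.score);
}
```

The tiebreak is deliberately tiny. It mostly reorders hits whose textual scores are equal or nearly equal.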

The retrieval path, end to end:

```
memory_recall("nil deref on flush", category?, subject?, prefer?)

  ▼ tokenize       camelCase / snake_case split · CJK -> bigrams ·
  │                stopwords dropped · sub-2-char tokens dropped
                   [nil, deref, flush]

  ▼ filter         category / subject narrow the candidate set
  │                BEFORE scoring — a scoped recall is free

  ▼ BM25           k1=1.2 b=0.75 + log1p(use_count)*0.05 tiebreak

  ▼ co-change      a hit on context.go pulls in writer.go if history
  │ boost          shows them changing together (a direct text
  │                match still outranks a co-change-surfaced one)

  ▼ prefer lean    optional: gently up/down-rank code vs tests vs
  │                history to match the query's intent

  ▼ token-budget   ranked hits packed to <= tokenBudget (default
  │ pack           1200); the remainder returned as an omitted
  │                count; an oversized sole hit is truncated

  bounded, predictable result
```
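
The final packing step can be sketched as follows. Hits are taken in rank order until the next one would exceed `tokenBudget`. The rest are reported as an omitted count, and a top hit that alone exceeds the budget is truncated rather than dropped. Three things are assumptions: the four-characters-per-token estimate, stopping at the first hit that does not fit (rather than skipping ahead to smaller ones), and the names `Hit`, `Packed` and `packToBudget`.

```typescript
interface Hit {
  id: string;
  text: string;
}

interface Packed {
  hits: Hit[];
  omitted: number;    // ranked hits left out of the response
  truncated: boolean; // true when an oversized top hit was cut down
}

const CHARS_PER_TOKEN = 4; // rough heuristic, not the plugin's real counter
const estimateTokens = (s: string): number => Math.ceil(s.length / CHARS_PER_TOKEN);

function packToBudget(ranked: Hit[], tokenBudget = 1200): Packed {
  const hits: Hit[] = [];
  let used = 0;
  for (const h of ranked) {
    const cost = estimateTokens(h.text);
    if (used + cost > tokenBudget) break; // preserve rank order: stop at first misfit
    hits.push(h);
    used += cost;
  }
  if (hits.length === 0 && ranked.length > 0) {
    // The top hit alone exceeds the budget: truncate it instead of returning nothing.
    const top = ranked[0];
    const clipped = top.text.slice(0, tokenBudget * CHARS_PER_TOKEN);
    return { hits: [{ id: top.id, text: clipped }], omitted: ranked.length - 1, truncated: true };
  }
  return { hits, omitted: ranked.length - hits.length, truncated: false };
}
```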

**3. Structural pre-fill.** Walks the last 500 commits via
`git log --numstat --summary`; every non-merge commit becomes a memory
tagged purely by diff shape. Adds co-change, churn and recency
memories. Separately, censuses the file tree and summarises recognised
project files by format. Works identically on a Go, Rust, Python,
Elixir, or polyglot repo.
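
`git log --numstat --summary` prints one `added<TAB>deleted<TAB>path` line per file, with `-` in place of counts for binary files. It also prints `create mode` / `delete mode` summary lines. That is all a diff-shape classifier needs. The sketch below parses those lines and derives tags. `single-file`, `tiny-diff` and `net-addition` appear in this document. The other tag names and every numeric threshold are hypothetical.

```typescript
interface CommitShape {
  files: number;
  added: number;
  deleted: number;
  created: number; // files created (from --summary)
  removed: number; // files deleted (from --summary)
}

// Parse the numstat + summary lines that follow one commit header.
function parseCommitBody(lines: string[]): CommitShape {
  const shape: CommitShape = { files: 0, added: 0, deleted: 0, created: 0, removed: 0 };
  for (const line of lines) {
    const numstat = /^(\d+|-)\t(\d+|-)\t(.+)$/.exec(line);
    if (numstat) {
      shape.files++;
      if (numstat[1] !== "-") shape.added += Number(numstat[1]); // "-" = binary file
      if (numstat[2] !== "-") shape.deleted += Number(numstat[2]);
      continue;
    }
    if (/^\s*create mode /.test(line)) shape.created++;
    else if (/^\s*delete mode /.test(line)) shape.removed++;
  }
  return shape;
}

// Structural tags only; thresholds below are illustrative.
function shapeTags(s: CommitShape): string[] {
  const tags: string[] = [];
  tags.push(s.files === 1 ? "single-file" : s.files <= 5 ? "few-files" : "many-files");
  const churn = s.added + s.deleted;
  tags.push(churn <= 10 ? "tiny-diff" : churn <= 200 ? "medium-diff" : "large-diff");
  if (s.added > s.deleted) tags.push("net-addition");
  else if (s.deleted > s.added) tags.push("net-deletion");
  if (s.created > 0) tags.push("creates-files");
  if (s.removed > 0) tags.push("deletes-files");
  return tags;
}
```

Nothing here reads the commit message. The subject line would be stored verbatim as searchable text next to these tags.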

What prefill does, on every startup:

```
OpenCode starts

  ▼ activate? — git repo OR a recognised manifest present?
  │   if neither: log one idle line, register no tools

  ▼ prefill (background — the agent can query partial results at once)

  ├── git log --numstat --summary ---> per-commit · co-change · churn · recency
  ├── walk the file tree ------------> extension census · layout · manifest digests
  ├── tree-sitter parse (opt-in) ----> per-file signature digests (code-map)
  ├── past OpenCode sessions --------> task + tool-trace summaries
  └── most recent session-snapshot --> resume point logged

  ▼ store ready — every later session starts warm
```

**4. Live code-health.** Subscribes to OpenCode's
`lsp.client.diagnostics` event and keeps one `code-health` memory per
file reflecting its *current* error/warning count — re-reports
replace, not accumulate. Convention-free, language-agnostic, no new
dependency.

**5. Code map (opt-in).** With `enableCodeMap`, tree-sitter parses
each source file and stores the *signatures* of its definitions
(bodies stripped) — an Aider-style repo map, reachable via
`memory_code_map`. This is the one heavyweight, language-aware
feature; see *Code map* below.

**6. Session snapshots.** `memory_snapshot` records a session's
*understanding* — mental model, decisions, learned conventions — as a
pinned `session-snapshot` memory. Each snapshot tags the previous
session's snapshot as its `parent`, so the set is a branchable history
with no DAG structure beyond the tags; a later or parallel session
resumes from the latest. See *Session snapshots* below.

**7. LFU disk budget.** Configurable byte cap (default 50 MB — see
*Configuration* and the *heap* note under *Performance*). After every
mutation, entries are evicted in ascending `(useCount, usedAt)` order
until the store is back under the cap. Pinned entries (including
snapshots) are never evicted.
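
The LFU rule in pillar 7 is simple enough to state as code. This sketch assumes each entry carries `pinned`, `useCount`, `usedAt` (epoch milliseconds) and `sizeBytes`. `Evictable` and `evictToBudget` are hypothetical names.

```typescript
interface Evictable {
  id: string;
  pinned: boolean;
  useCount: number;
  usedAt: number;    // last use, epoch milliseconds
  sizeBytes: number;
}

// Evict unpinned entries, least-used first (ties: least recently used first),
// until the total size is at or under maxBytes.
function evictToBudget(entries: Evictable[], maxBytes: number): { kept: Evictable[]; evicted: string[] } {
  let total = entries.reduce((s, e) => s + e.sizeBytes, 0);
  const order = entries
    .filter((e) => !e.pinned)
    .sort((x, y) => x.useCount - y.useCount || x.usedAt - y.usedAt);
  const evicted = new Set<string>();
  for (const e of order) {
    if (total <= maxBytes) break;
    evicted.add(e.id);
    total -= e.sizeBytes;
  }
  return {
    kept: entries.filter((e) => !evicted.has(e.id)),
    evicted: order.filter((e) => evicted.has(e.id)).map((e) => e.id),
  };
}
```

If pinned entries alone exceed the cap, the loop ends with the store still over budget, because pinned entries are never candidates.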

**8. Skill mining.** Clusters memories by `subject`. Clusters with
≥ 3 entries become `<root>/.opencode/skills/<slug>/SKILL.md`. Runs
in the background; the tool returns immediately. The mined skills are
usable in the same session through the `memory_skill` tool — no
restart — and OpenCode also loads them as native skills next start.
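
The clustering rule above reduces to a count and a path template. It groups by `subject` and promotes clusters that reach `skillMiningMinCluster` entries (default 3) into `skillsOutputDir` (default `.opencode/skills`). Several things are assumptions: the slug rule, the `"skill"` fallback for subjects with no ASCII letters or digits, and the function names. The plugin's actual slugging and `SKILL.md` contents are not described here.

```typescript
// Hypothetical slug rule: lowercase, runs of non-alphanumerics become "-".
function slugify(subject: string): string {
  return subject.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "") || "skill";
}

// Paths of the SKILL.md files a mining pass would write (one per cluster).
function skillPaths(
  memories: Array<{ subject: string }>,
  root: string,
  minCluster = 3,
  outputDir = ".opencode/skills",
): string[] {
  const counts = new Map<string, number>();
  for (const m of memories) counts.set(m.subject, (counts.get(m.subject) ?? 0) + 1);
  const paths: string[] = [];
  counts.forEach((n, subject) => {
    if (n >= minCluster) paths.push(`${root}/${outputDir}/${slugify(subject)}/SKILL.md`);
  });
  return paths.sort();
}
```

A real implementation would also need to handle two subjects that slug to the same directory. This sketch does not.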

## The ten tools

| Tool | Purpose |
|---|---|
| `memory_recall(query, category?, subject?, prefer?, limit?, tokenBudget?)` | Search the store — co-change-boosted, token-budgeted. `prefer` ('code'/'tests'/'history') leans ranking to match query intent. The recall-first entry point. |
| `memory_code_map(query?, tokenBudget?)` | Aider-style file-signature map, ranked + budgeted. Needs `enableCodeMap`. |
| `memory_remember(subject, content, tags?)` | Save a fact for future turns. |
| `memory_snapshot(summary, decisions?, conventions?)` | Record this session's understanding for a later/parallel session to resume from. |
| `memory_outline()` | Counts per category — cheap orientation. |
| `memory_status()` | Size, byte usage vs budget, last-ingest timestamps. |
| `memory_ingest_sessions()` | Pull task + tool-trace summaries from past OpenCode sessions. |
| `memory_ingest_git()` | Re-scan git history for commits that arrived since startup. Idempotent — `insertIfMissing` skips already-known commits. The plugin also auto-triggers this when a `bash` call moves HEAD; this tool is the explicit version. |
| `memory_mine_skills(reason?)` | Cluster memories into SKILL.md files. Background. |
| `memory_skill(name?)` | List the mined skill files, or load one into the conversation — so a skill mined this session is usable now, no restart. |

Tool descriptions are deliberately **directive** — they tell the
agent to recall *before* raw discovery and frame the token-cost
argument, since the description is the only prompt a plugin controls.
On top of that, a `tool.execute.before/after` pair provides a gentle
**recall-first nudge**: if the agent makes a couple of raw discovery
calls without ever touching a memory tool, one reminder is appended to
a discovery result. It fires at most once per session, never on
`read` output (file contents stay pristine), and goes silent the
moment any memory tool is used.
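
The recall-first nudge is a small per-session state machine. The sketch below models only the after-hook and does not use OpenCode's real hook signatures. Which tools count as raw discovery, the threshold of two calls, and the reminder text are assumptions. `read` calls are counted but never annotated, which keeps file contents pristine.

```typescript
// Assumed set of "raw discovery" tools and reminder text.
const DISCOVERY_TOOLS = new Set(["bash", "grep", "glob", "list", "read"]);
const NUDGE_TEXT = "\n\n[diane] Tip: memory_recall may already know this.";

class RecallNudge {
  private discoveryCalls = 0;
  private memoryUsed = false;
  private fired = false;
  private readonly threshold: number;

  constructor(threshold = 2) {
    this.threshold = threshold;
  }

  // Called after each tool runs; returns the output, possibly annotated once.
  after(tool: string, output: string): string {
    if (tool.startsWith("memory_")) {
      this.memoryUsed = true; // silent for the rest of the session
      return output;
    }
    if (!DISCOVERY_TOOLS.has(tool)) return output;
    this.discoveryCalls++;
    if (this.memoryUsed || this.fired || tool === "read") return output;
    if (this.discoveryCalls < this.threshold) return output;
    this.fired = true; // at most once per session
    return output + NUDGE_TEXT;
  }
}
```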

## Activation

Activates on any directory that is a git repository **or** contains at
least one recognised project/build file (a flat list of filenames
across ecosystems — no language logic). Otherwise it logs one idle
line and registers no tools. `forceActive: true` overrides.

## Without git history

Git is the largest single source: per-commit memories plus co-change,
churn and recency — and co-change is the entire backing for the
retrieval boost and the closest thing the plugin has to a graph.
Without git, none of that exists.

What remains git-independent:

- **Project facts** — manifests, build/CI files, tree census, README
  headline. Real, but a modest slice: orientation, not history-derived
  intelligence.
- **Code map** (`enableCodeMap`) — tree-sitter parses the file tree
  directly and never touches git. On a non-git repo this is the main
  source of actual codebase intelligence.
- **Code health** — LSP diagnostics, event-driven.
- **Session snapshots, agent notes, session ingestion, skill mining** —
  all agent- and session-driven; they accumulate across sessions
  regardless of git.
- The retrieval machinery itself — BM25, the inverted index, token
  budgeting, the recall-first nudge — is entirely git-independent. It
  simply has less to retrieve at first.

So the honest picture: **weak on a fresh non-git repo on day one**
(project facts alone are thin — and `measure-savings.mjs` will report
"inconclusive" there for exactly that reason), but **not useless over
time**, because snapshots, notes and traces build a store that recall
still operates on. `detectWorkableRepo` accepts a recognised manifest
*or* git, so a non-git Node/Rust/Python project still activates and
ingests project facts — only a directory with neither git nor a
manifest sits idle (and needs `forceActive`).

Recommendation: if you work without git, enable `enableCodeMap` and
use `memory_snapshot` / `memory_remember` deliberately — on a non-git
repo the store is only as good as what you and past sessions put in.
If the repo could be under git, `git init` unlocks more than half the
plugin's value and is the cheapest fix.

## Configuration

Defaults work without any config. To override, list the plugin as a
`[name, options]` tuple in `opencode.json`; OpenCode passes the
options straight through, and they're coerced defensively (bad keys
ignored, defaults applied).

```ts
interface UserConfig {
  maxMemoryDiskMB?: number              // default 50
  autoIngestOnStartup?: boolean         // default true
  gitHistoryDepth?: number              // default 500
  forceActive?: boolean                 // default false
  skillsOutputDir?: string              // default ".opencode/skills"
  skillMiningMinCluster?: number        // default 3
  ingestSessions?: boolean              // default true
  enableCodeMap?: boolean               // default true — tree-sitter signatures (since v0.0.4)
  installUsageSkill?: boolean           // default true — write a using-memory skill on first startup
  ingestDocs?: boolean                  // default true — index docs/ headings as section pointers
  ingestProjectNotes?: boolean          // default true — index AGENTS.md, CLAUDE.md, .cursorrules, …
  ingestTableHeaders?: boolean          // default true — index CSV / TSV / XLSX column headers
  ingestCrossRefs?: boolean             // default true — grammar-agnostic cross-file edges
  crossRefsRarityThreshold?: number     // default 3 — max files a symbol can appear in to count
  enableNudgeHook?: boolean             // default true — see Coexisting plugins
  adaptive?: boolean                    // default true — see Adaptive sizing
  enableSemanticSearch?: boolean        // default false — see Semantic search
  embeddingModel?: string               // default "Xenova/multilingual-e5-small"
  personalizedPageRank?: boolean        // default false — see "How it compares"
  recordSessionActivity?: boolean       // default true — record this session's edits + bash as a rolling memory
  bashFileTrackingMaxFiles?: number     // default 20 — refresh code-map for files a bash call touched (0 = off)
  autoReingestGitOnHeadChange?: boolean // default true — re-ingest git when bash moves HEAD
}
```

### Fine-grained tuning

Most users never set these — the defaults cover typical repos. They
exist for monorepos, documentation-heavy projects, and locked-down
environments where every walk needs an explicit ceiling.

| Option | Default | What it does |
|---|---|---|
| `docsMaxFiles` | `200` | Cap on `.md` / `.markdown` files walked under `docs/` plus conventional root docs (CHANGELOG, CONTRIBUTING, ARCHITECTURE, ROADMAP, …). Raise for documentation-heavy repos. |
| `docsBodyChars` | `240` | Characters of body text captured after each heading as the recall snippet. Longer values → richer context, larger memory entries. |
| `docsMaxHeadingLevel` | `3` | Deepest heading level indexed (`3` = H1–H3). Set `2` for only H1–H2, or `4`/`5` for deeper structure. Clamped to `[1, 6]`. |
| `notesMaxBytes` | `6144` | Maximum bytes read from each agent-instruction file (`AGENTS.md`, `CLAUDE.md`, `.cursorrules`, etc.). Raise for teams with detailed instructions. |
| `tablesMaxFiles` | `200` | Cap on table files (CSV / TSV / XLSX / XLS) walked per prefill pass. |
| `tablesMaxXlsxMB` | `50` | Skip XLSX/XLS files larger than this (in MB). Set `0` to skip all spreadsheets. |
| `tablesMaxColumns` | `40` | Maximum column headers listed per table/sheet. Wider tables get a `(N more)` note. |
| `crossRefsMaxFiles` | `2000` | Cap on files the cross-reference ingester walks per prefill. Raise for monorepos. |
| `crossRefsMaxEdges` | `10000` | Hard cap on cross-reference edges emitted per pass. Controls the coverage/noise trade-off on dense codebases. |
| `coChangeMinOccurrences` | `3` | Minimum commits in which two files must co-change before a co-change edge is recorded. Lower → denser graph on small/young repos; raise → tighter graph on busy repos. |
| `codeMapMaxFiles` | adaptive (`1500`/`4000`/`10000`) | Cap on source files the code-map ingester parses per pass. By default sized by adaptive tuning at startup (small / medium / large tier). Setting it explicitly *overrides the adaptive choice* — useful when you want deterministic behaviour. |
| `coChangeMaxCommits` | `5000` | Cap on git commits the co-change graph builder scans. Lower for faster startup; raise for deeper history. Adaptive sizing keeps this uniform across tiers in the current implementation; only `codeMapMaxFiles` varies by repo size. |

All numeric limits are clamped to a safe minimum (typically `1`,
sometimes `0` where "off" is meaningful) and rounded — garbage input
in `opencode.json` never breaks the plugin.
526
+
527
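The clamping rule can be sketched in a few lines. This is a minimal illustration, not the plugin's code. `clampLimit` is a hypothetical name, and the wiki does not say exactly how garbage input is handled, so this sketch assumes that a non-numeric or non-finite value falls back to the option's default. The optional `max` covers bounded options such as `docsMaxHeadingLevel`.

```typescript
// Hypothetical helper: normalise one numeric limit read from opencode.json.
// Assumption: non-numeric or non-finite input falls back to the default.
function clampLimit(raw: unknown, fallback: number, min: number, max = Infinity): number {
  if (typeof raw !== "number" || !Number.isFinite(raw)) return fallback;
  const rounded = Math.round(raw);            // limits are whole numbers
  return Math.min(max, Math.max(min, rounded)); // never below the safe minimum
}

// Illustrative calls: min 1 for caps, min 0 where "off" is meaningful.
const tablesMaxFiles = clampLimit(-5, 200, 1);           // -> 1
const tablesMaxXlsxMB = clampLimit(0, 50, 0);            // -> 0 (skip spreadsheets)
const docsMaxFiles = clampLimit("lots", 200, 1);         // -> 200 (garbage -> default)
const docsMaxHeadingLevel = clampLimit(9, 3, 1, 6);      // -> 6 (clamped to [1, 6])
```

A bad value therefore degrades to either the safe bound or the default. It never throws.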
## Coexisting plugins

Diane is designed to run alongside other OpenCode plugins without
either side losing functionality. The two compatibility decisions
that can't be avoided are made automatically at startup, by reading
the `plugin` array in `opencode.json` (project-local first, then
`~/.config/opencode/opencode.json`) and matching against known peer
names.

### What gets adjusted, and why

**The `tool.execute.after` nudge hook (default ON).** When a memory
tool has gone unused after several discovery calls, Diane appends one
short reminder to the discovery tool's output. `oh-my-opencode` also
post-processes tool output (its `look_at` flow replaces grep/glob),
and two plugins both mutating `output.output` interleave
unpredictably — so when oh-my-opencode is listed in `opencode.json`
the nudge is turned off. `caveman` doesn't touch tool output (it
hooks `session.created` and `tui.prompt.append`), so the nudge stays
on alongside caveman.

**Mined-skill subdirectory prefix (default empty).**
`memory_mine_skills` writes to `.opencode/skills/<slug>/SKILL.md` —
the same directory OpenCode discovers skills from. `caveman` writes
fixed slugs into the same place (`caveman`, `caveman-commit`,
`caveman-review`, `caveman-help`, `caveman-compress`), and
`oh-my-opencode`'s skill system also lives there. When either is
detected, Diane prefixes its subdirs with `diane-` so collisions are
impossible AND `memory_skill` surfaces only Diane's slugs (the
peer's slugs are theirs to list, not ours). Standalone, no prefix is
applied — paths are byte-for-byte the documented default.

### The matrix

| Detected peer | nudge hook | mined-skill subdirs |
|---|---|---|
| none | on (default) | `<slug>/` |
| `oh-my-opencode` / `oh-my-openagent` / `oh-my-opencode-slim` | **off** | **`diane-<slug>/`** |
| `caveman` / `caveman-opencode` / `caveman-opencode-plugin` / `opencode-caveman` | on | **`diane-<slug>/`** |
| both | **off** | **`diane-<slug>/`** |

### Override

An explicit `enableNudgeHook` or `skillsOutputDir` in your `"diane"`
config beats the auto-detect — useful when you have a specific reason
to want the nudge on alongside oh-my-opencode, or to point mining at
a non-standard directory and accept your own collision policy. The
adjustments are also visible at runtime:

- The OpenCode log line at startup names the peers found and the
  adjustments made (or "no compatibility adjustments needed" when
  none were).
- The `plugin.active` event in the JSONL log carries
  `peers: { ohMyOpencode, caveman, found: [...] }` plus the
  resolved `enableNudgeHook` and `minedSkillPrefix`, so a support
  thread can confirm what actually ran.

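The matrix and the override rule combine into one small pure function. The sketch below is hypothetical: `resolveCompat`, `DianeOverrides` and the version-suffix stripping are illustrative names and choices. The peer names come from the matrix. One further assumption is that an explicit `skillsOutputDir` means "my directory, my collision policy", so no prefix is added in that case.

```typescript
// Hypothetical shapes: the real config reader is not shown in this wiki.
interface DianeOverrides {
  enableNudgeHook?: boolean;
  skillsOutputDir?: string;
}

interface CompatResult {
  found: string[];
  enableNudgeHook: boolean;
  minedSkillPrefix: string;
}

// Peer names taken from the matrix above.
const OH_MY_OPENCODE = ["oh-my-opencode", "oh-my-openagent", "oh-my-opencode-slim"];
const CAVEMAN = ["caveman", "caveman-opencode", "caveman-opencode-plugin", "opencode-caveman"];

// Assumption: entries may carry a version suffix ("name@1.2.3"); strip it.
// A scoped name's leading "@" is at index 0, so it is not treated as a separator.
function baseName(entry: string): string {
  const at = entry.lastIndexOf("@");
  return at > 0 ? entry.slice(0, at) : entry;
}

function resolveCompat(plugins: string[], overrides: DianeOverrides): CompatResult {
  const names = plugins.map(baseName);
  const ohMy = names.some((n) => OH_MY_OPENCODE.includes(n));
  const caveman = names.some((n) => CAVEMAN.includes(n));
  const found = names.filter((n) => OH_MY_OPENCODE.includes(n) || CAVEMAN.includes(n));
  return {
    found,
    // Explicit config beats auto-detect; otherwise only oh-my-opencode disables the nudge.
    enableNudgeHook: overrides.enableNudgeHook ?? !ohMy,
    // Either peer shares .opencode/skills, so either one triggers the prefix.
    minedSkillPrefix: overrides.skillsOutputDir !== undefined ? "" : ohMy || caveman ? "diane-" : "",
  };
}
```

The function is pure: detection and resolution happen in one place. That makes it easy to log the whole result, which is what the `plugin.active` event above does.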
### What's not detected

Detection is **list-based, not behavioural**. A plugin that does the
same things oh-my-opencode does but isn't named anything we recognise
will get no special treatment from us. If you hit a clash with such a
plugin, set `enableNudgeHook: false` (and/or `skillsOutputDir`) in
your config and file an issue with the peer's name so we can add it
to the detection list.

## Adaptive sizing

The fixed defaults (`gitHistoryDepth` 500, a 4000-file code-map cap) are
a sensible middle — wasteful on a 50-commit toy, thin on a 100k-commit
monorepo. With `adaptive` on (the default), prefill closes that gap
from **one measured signal**: `git rev-list --count HEAD`, or a
bounded file count when there's no git. That signal sorts the repo
into one of three named tiers, and a lookup table picks the knobs:

| knob | small | medium | large |
|---|---|---|---|
| `gitHistoryDepth` | 250 | 500 | 1500 |
| code-map file cap | 1500 | 4000 | 10000 |
| co-change pass | on | on | skipped above 5000 commits |

**The disk budget is deliberately not in that table.** It used to be
(small/medium 5 MB, large 20 MB) — back when the default was a tight
5 MB that genuinely needed widening for big repos. The default is now
a generous 50 MB (see *Configuration*), which clears even a
depth-capped large repo's store (~6–8 MB) several times over, so there
is nothing left for adaptation to do: every tier carries the same
50 MB budget. To use more or less, set `maxMemoryDiskMB` explicitly.

**Co-change is the one pass that gets cut** on huge histories: its
pair-counting is O(commits × files²), because every pair of files
touched by the same commit is counted. It is the only super-linear
step in the plugin, so above the threshold it's skipped (the
commit/churn/recency passes still run).

One input, three tiers, a table — not a pile of heuristics — so the
behaviour stays inspectable: the chosen tier and every knob it moved
are logged each run (`prefill: repo tier=large (9000 commits) — …`).
Adaptation only fills knobs the user did **not** set explicitly; an
explicit config value always wins, including `maxMemoryDiskMB` set
below the 50 MB default. `adaptive: false` pins everything to the
fixed defaults.

When there's no git, the file count is the signal instead — same
mechanism, different sensor — so adaptive sizing still works on a
non-git repo.

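The one-signal, three-tier, one-table design can be sketched as follows. The knob values are copied from the table above. The tier **boundaries are hypothetical**, because the wiki does not publish the real cut-offs, and `tierFor` and `tune` are illustrative names. The sketch shows the two stated rules: an explicit value always wins, and `adaptive: false` pins the fixed (medium-sized) defaults.

```typescript
type Tier = "small" | "medium" | "large";

interface Signal {
  commits?: number; // `git rev-list --count HEAD`, when there is git
  files?: number;   // bounded file count, when there is no git
}

interface ExplicitConfig {
  gitHistoryDepth?: number;
  codeMapMaxFiles?: number;
}

interface Tuned {
  tier: Tier;
  gitHistoryDepth: number;
  codeMapMaxFiles: number;
  coChange: boolean;
}

// Knob values copied from the adaptive-sizing table.
const KNOBS: Record<Tier, { gitHistoryDepth: number; codeMapMaxFiles: number }> = {
  small: { gitHistoryDepth: 250, codeMapMaxFiles: 1500 },
  medium: { gitHistoryDepth: 500, codeMapMaxFiles: 4000 },
  large: { gitHistoryDepth: 1500, codeMapMaxFiles: 10000 },
};

// HYPOTHETICAL cut-offs: the real tier boundaries are not documented here.
function tierFor(s: Signal): Tier {
  if (s.commits !== undefined) {
    return s.commits < 1000 ? "small" : s.commits < 5000 ? "medium" : "large";
  }
  const files = s.files ?? 0; // no git: same mechanism, different sensor
  return files < 500 ? "small" : files < 3000 ? "medium" : "large";
}

function tune(s: Signal, explicit: ExplicitConfig, adaptive = true): Tuned {
  // adaptive: false pins everything to the fixed (medium-sized) defaults.
  const tier: Tier = adaptive ? tierFor(s) : "medium";
  const row = KNOBS[tier];
  return {
    tier,
    // An explicit config value always wins; adaptation only fills the gaps.
    gitHistoryDepth: explicit.gitHistoryDepth ?? row.gitHistoryDepth,
    codeMapMaxFiles: explicit.codeMapMaxFiles ?? row.codeMapMaxFiles,
    // Co-change is skipped only for large histories above 5000 commits.
    coChange: !(adaptive && tier === "large" && (s.commits ?? 0) > 5000),
  };
}
```

The function returns the tier alongside the knobs, so a caller can log both in one line.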
## Code map

`enableCodeMap` turns on tree-sitter parsing of every source file
into its per-file structural shape. It is **on by default since
v0.0.4** — set `enableCodeMap: false` to disable it (the grammar
`.wasm` files are shipped regardless; the flag only controls whether
the ingester runs). It is the one deliberate exception to the
plugin's otherwise-lightweight design:

- Covers **thirteen languages**. Ten are extracted as definition
  signatures (JavaScript, TypeScript, Python, Go, Rust, Java, C, C++,
  C#, PHP); the other three get their own extractors since they have no
  "definitions" — CSS → selectors and at-rules, JSON → top-level keys,
  HTML → `id`-bearing and landmark elements.
- It adds `web-tree-sitter` (~300 KB) plus vendored grammar `.wasm`
  (~16 MB total). Three grammars are most of that weight: C# (5.2 MB),
  C++ (4.5 MB) and TypeScript (2.3 MB). Because the grammars ship
  whether or not the flag is on, the package is ~16.5 MB rather than
  the ~77 KB it would be without them. Grammars load lazily (only for
  languages actually present in the repo), but all `.wasm` ships in the
  package; dropping a grammar you don't need is a small edit in
  `code-map.ts` plus deleting one file.
- It is the only language-*aware* component: one small table maps
  each grammar's node types to the kinds worth extracting. Files in a
  language with no grammar are skipped; if `web-tree-sitter` fails to
  load, code map degrades gracefully and the rest of the plugin is
  unaffected.
- Measured: on a real 81-file Go repo the map cost ~45 tokens/file,
  and a `memory_recall` + `memory_code_map` pair answered a "work on
  feature X" scenario in ~700 tokens versus ~5,400 tokens of raw
  discovery, an ~87 % reduction. Whether that is worth it is a
  per-setup judgement; hence the `enableCodeMap: false` opt-out.

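The "one small table" idea is sketched below. It is illustrative only. The node type names (`function_declaration`, `function_item`, and so on) are real names from the public tree-sitter grammars, but the table, `signatures`, and the `DefNode` shape are hypothetical and are not the contents of `code-map.ts`. A real ingester would produce its nodes by walking the syntax tree from `web-tree-sitter`. Here they are a hand-written list, so the sketch runs without any grammar.

```typescript
// Illustrative only: per-grammar node types worth extracting (not the plugin's table).
type Kind = "function" | "method" | "class" | "type";

const NODE_KINDS: Record<string, Record<string, Kind>> = {
  javascript: { function_declaration: "function", method_definition: "method", class_declaration: "class" },
  python: { function_definition: "function", class_definition: "class" },
  go: { function_declaration: "function", method_declaration: "method", type_declaration: "type" },
  rust: { function_item: "function", struct_item: "type", enum_item: "type", trait_item: "type" },
};

const LANG_BY_EXT: Record<string, string> = { ".js": "javascript", ".py": "python", ".go": "go", ".rs": "rust" };

// Stand-in for a parsed definition node (a real ingester walks the syntax tree).
interface DefNode {
  type: string;
  name: string;
}

function languageOf(path: string): string | undefined {
  const dot = path.lastIndexOf(".");
  return dot >= 0 ? LANG_BY_EXT[path.slice(dot)] : undefined;
}

// One "kind name" line per definition, or null when no grammar covers the file
// (such files are skipped, not treated as errors).
function signatures(path: string, nodes: DefNode[]): string[] | null {
  const lang = languageOf(path);
  if (lang === undefined) return null;
  const table = NODE_KINDS[lang];
  return nodes
    .filter((n) => Object.prototype.hasOwnProperty.call(table, n.type))
    .map((n) => `${table[n.type]} ${n.name}`);
}
```

All language awareness sits in the two tables. Supporting another language means adding a grammar and a table row, not new control flow.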
## Session snapshots

The other categories hold *facts*; `session-snapshot` holds
*understanding* — a session's mental model, the decisions it made and
why, the conventions it learned that the code doesn't show. The agent
writes one with `memory_snapshot`; it is **pinned** (eviction-proof),
**one per session** (re-snapshotting replaces in place), and tags the
most recent other session's snapshot as `parent:<id>`.

Those `parent` tags are the entire mechanism — the snapshot set is a
branchable history with no DAG data structure, just edges in the tag
list. A later session resumes from the latest snapshot (prefill logs
the resume point); a parallel session reads the same shared store and
forks from the same point; a snapshot tagging an older parent is a
branch. It's the harness-side, no-model take on versioned agent
memory — continuity without embeddings or LLM summarisation.

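Because the history lives entirely in tags, the mechanism fits in a few functions. The sketch below assumes a hypothetical `Snapshot` shape with a creation time and a `tags` array. `writeSnapshot`, `parentOf`, `lineage`, and the `snap-<session>` id scheme are illustrative names, not the plugin's API. The id scheme is simply one way to get "one per session, replaced in place".

```typescript
// Hypothetical shape: the wiki only says snapshots are pinned, one per
// session, and tagged "parent:<id>".
interface Snapshot {
  id: string;
  sessionId: string;
  createdAt: number;
  tags: string[];
}

function parentOf(s: Snapshot): string | undefined {
  const tag = s.tags.find((t) => t.startsWith("parent:"));
  return tag?.slice("parent:".length);
}

// Write (or replace) this session's snapshot, tagging the most recent
// snapshot from any *other* session as its parent.
function writeSnapshot(store: Map<string, Snapshot>, sessionId: string, now: number): Snapshot {
  let parent: Snapshot | undefined;
  for (const s of store.values()) {
    if (s.sessionId !== sessionId && (!parent || s.createdAt > parent.createdAt)) parent = s;
  }
  const snap: Snapshot = {
    id: `snap-${sessionId}`, // same id per session => re-snapshotting replaces in place
    sessionId,
    createdAt: now,
    tags: parent ? [`parent:${parent.id}`] : [],
  };
  store.set(snap.id, snap);
  return snap;
}

// Follow parent edges back to the root; the guard stops on a cycle.
function lineage(store: Map<string, Snapshot>, id: string): string[] {
  const out: string[] = [];
  let cur = store.get(id);
  while (cur && !out.includes(cur.id)) {
    out.push(cur.id);
    const p = parentOf(cur);
    cur = p ? store.get(p) : undefined;
  }
  return out;
}
```

No separate graph structure is needed. Branches appear whenever two snapshots name the same parent, and a lineage is just a walk over tags.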
## Performance

All hot paths are O(1) or O(n), never O(n²). The in-memory working set
is a `Map<id, Memory>`, so insert, lookup and delete are O(1) —
`removeMemory` and `applyEviction` (which run on the per-event
`upsertBySubject` path and after every write) were O(n) array
operations before the Map. `insertIfMissing` uses a composite-key
`Map` for O(1) idempotency; `totalBytes` is a running counter;
`countsByCategory` reads the index directly; eviction sorts once per
*batch*, not per insert.

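The O(1) claims come from keeping every index as a `Map` and updating aggregates incrementally. A minimal sketch follows. It assumes a simplified `Memory` shape and uses string length as a stand-in for byte size. `MemoryStore`, its private fields, and the `(category, subject)` composite key are hypothetical; only the method names are borrowed from the prose above.

```typescript
interface Memory {
  id: string;
  category: string;
  subject: string;
  body: string;
}

// Minimal sketch of an O(1) in-memory store (hypothetical, not the plugin's code).
class MemoryStore {
  private byId = new Map<string, Memory>();
  private byKey = new Map<string, string>();  // composite key -> id, for idempotency
  private counts = new Map<string, number>(); // per-category counts, kept incrementally
  private bytes = 0;                          // running total, never recomputed by scanning

  get totalBytes(): number {
    return this.bytes;
  }

  // Assumption: the composite key is (category, subject).
  private static key(m: Memory): string {
    return `${m.category}\u0000${m.subject}`;
  }

  // Assumption: string length stands in for encoded byte size.
  private static size(m: Memory): number {
    return m.body.length;
  }

  insertIfMissing(m: Memory): boolean {
    const k = MemoryStore.key(m);
    if (this.byKey.has(k)) return false; // O(1) duplicate check
    this.byId.set(m.id, m);
    this.byKey.set(k, m.id);
    this.counts.set(m.category, (this.counts.get(m.category) ?? 0) + 1);
    this.bytes += MemoryStore.size(m);
    return true;
  }

  removeMemory(id: string): boolean {
    const m = this.byId.get(id);
    if (!m) return false;
    this.byId.delete(id); // O(1): no array search or splice
    this.byKey.delete(MemoryStore.key(m));
    this.counts.set(m.category, (this.counts.get(m.category) ?? 1) - 1);
    this.bytes -= MemoryStore.size(m);
    return true;
  }

  countsByCategory(): Record<string, number> {
    return Object.fromEntries(this.counts);
  }
}
```

Each operation touches a fixed number of map entries, regardless of how many memories the store holds.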
Persistence is a SQLite database (`bun:sqlite`) written behind a
debounced, failure-tolerant write-behind buffer: mutations record a
changed/deleted id, and the flush drains the buffer into one
transaction — a delta of only the changed rows, not a re-serialise of
the whole store the way the old JSON file did. The database is read
exactly once, at load; recall runs entirely against the in-memory
index and never touches it. At small scale this is not a speed win
over the JSON file — a ~1 MB store is cheap to rewrite wholesale, and
SQLite's per-transaction overhead is comparable. The win is at scale
and in the steady-state access pattern: on a 15,000-entry store,
touching a handful of memories and flushing costs ~4 ms (a delta of
the changed rows) versus ~40 ms for a JSON-style whole-file rewrite,
and that gap widens as the store grows — the incremental flush is
constant in the number of *changed* rows, the rewrite is linear in
the *whole* store. WAL mode also makes writes crash-safe and lets
parallel sessions share a repo. The migration is justified by that
scaling behaviour and crash-safety, not by small-store microbenchmarks.

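The write-behind pattern can be shown without SQLite. Mutations only record ids, a debounce timer coalesces bursts, and one flush hands the whole delta to a callback. In the plugin that callback would be a single `bun:sqlite` transaction; here it is any function. `WriteBehind`, `markChanged`, `markDeleted` and the 250 ms delay are hypothetical.

```typescript
// Minimal sketch of a debounced, failure-tolerant write-behind buffer.
type FlushFn = (changed: string[], deleted: string[]) => void;

class WriteBehind {
  private changed = new Set<string>();
  private deleted = new Set<string>();
  private timer: ReturnType<typeof setTimeout> | undefined;

  constructor(private readonly flushFn: FlushFn, private readonly delayMs = 250) {}

  markChanged(id: string): void {
    this.deleted.delete(id); // a re-write cancels a pending delete
    this.changed.add(id);
    this.schedule();
  }

  markDeleted(id: string): void {
    this.changed.delete(id); // no point writing a row we are about to delete
    this.deleted.add(id);
    this.schedule();
  }

  private schedule(): void {
    if (this.timer !== undefined) clearTimeout(this.timer); // debounce bursts
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  flush(): void {
    if (this.timer !== undefined) clearTimeout(this.timer);
    this.timer = undefined;
    if (this.changed.size === 0 && this.deleted.size === 0) return;
    try {
      // One call carries the whole delta (one transaction, in the plugin's case).
      this.flushFn([...this.changed], [...this.deleted]);
      this.changed.clear();
      this.deleted.clear();
    } catch {
      // Failure-tolerant: keep the ids so the next flush retries them.
    }
  }
}
```

The cost of a flush depends only on how many ids are pending, not on the size of the store. That is the "constant in the number of changed rows" property described above.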
### Scaling — measured

`scripts/stress-scale.mjs` builds stores of increasing size with
realistic content (a wide vocabulary, co-change tags) and measures
every cost that grows with size. Eviction is disabled so the table is
the raw curve. Representative numbers on a dev machine:

| memories | store on disk | insert | full flush | reload | recall ×100 | incr. flush | heap |
|--:|--:|--:|--:|--:|--:|--:|--:|
| 5 000 | 1.2 MB | 0.3 s | 18 ms | 0.2 s | 23 ms | 13 ms | ~100 MB |
| 15 000 | 3.6 MB | 0.8 s | 77 ms | 0.7 s | 51 ms | 18 ms | ~275 MB |
| 25 000 | 6.0 MB | 1.3 s | 126 ms | 1.2 s | 77 ms | 50 ms | ~440 MB |

Every cost scales **linearly** — there is no quadratic term. Recall
stays cheap throughout: 23–77 ms per 100 queries in the table, i.e.
under 1 ms per query (BM25 over the in-memory index; latency tracks
how many memories match the query terms, not the store size).
Incremental flush stays small (13–50 ms, against 18–126 ms for a full
flush); that's the SQLite write-behind win. `tests/scaling.test.ts` is
a gated guard at 4 000 memories that would fail loudly if any of these
went super-linear.

The honest caveat is **heap**. The plugin holds the entire working
set in memory — the `byId` map, the inverted index (a term-frequency
map per memory, needed for BM25), and the co-change graph. That's
roughly 17 KB of heap per memory, ~70× the on-disk size. At a
realistic large store (~25k memories → ~440 MB) that's a chunky but
manageable footprint on a modern dev machine.

**The disk budget bounds RAM, not just disk.** Because heap tracks
memory count, and memory count tracks bytes stored, the byte budget is
effectively a RAM ceiling — about **70 MB of heap per 1 MB of
budget**, if the budget were ever filled. The default budget is 50 MB,
so the *theoretical* worst case is ~210k memories and ~3.5 GB of heap.

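The ~70× ratio and the 50 MB worst case can be re-derived from the 25 000-memory row of the scaling table, taking 1 MB = 10⁶ B and 1 KB = 10³ B:

```math
\frac{6.0\,\text{MB}}{25\,000} \approx 240\ \text{B of disk per memory},
\qquad
\frac{440\,\text{MB}}{25\,000} \approx 17.6\ \text{KB of heap per memory}

\frac{17.6\,\text{KB}}{240\,\text{B}} \approx 73
\;\Rightarrow\;
\text{heap} \approx 70 \times \text{budget}

\frac{50\,\text{MB}}{240\,\text{B}} \approx 208\,000\ \text{memories},
\qquad
50\,\text{MB} \times 70 = 3.5\,\text{GB}
```

The prose rounds the measured 73× down to ~70×, so the heap figures quoted from it are slightly optimistic estimates, not bounds.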
In practice a store never comes close. The git-history and code-map
ingesters are themselves depth-capped (≤ 1500 commits, ≤ 10 000
files), so a real store — even on a large repo — lands in the
**15–25k band: ~4–6 MB on disk, ~300–440 MB of heap**, far below the
50 MB budget. That is the point of the generous default: at 50 MB the
budget is a *safety valve* for a runaway monorepo, not a routine
clipper. The previous 5 MB default was small enough that a normal
large repo (~25k memories ≈ 6 MB) hit the ceiling and lost useful
memories to eviction every run; 50 MB ends that.

If you run on an unusually large monorepo and the heap footprint
matters, `maxMemoryDiskMB` is the single knob — set it **down** (e.g.
`10`, ~700 MB heap ceiling) to cap RAM hard, or **up** if you have the
memory and want a deeper store. The budget is the RAM dial.

The fuller answer for a store that genuinely outgrows RAM is to move
the search index itself onto disk. SQLite is already the durable store
here, and SQLite's FTS5 is a disk-resident full-text index with BM25
built in — so a future version could keep the inverted index in FTS5
rather than in the heap, holding only a small working set in memory and
letting the rest live on disk. That is a real architectural change (the
CJK bigram tokenisation would move to an FTS5 custom/trigram tokenizer,
and ranking would shift from the in-process scorer to FTS5's), so it is
deliberately scoped as separate future work rather than bolted on; for
now the byte budget plus depth-capped ingesters are what keep the
in-memory footprint bounded.

Confirmed on real large repositories: ingesting `redis` (1.8k files),
`rocksdb` (2.2k files) and `spring-framework` (11.4k files) produced
1.9k / 2.4k / 4.7k memories with a one-time background prefill of
~9 / ~17 / ~11 seconds. On `spring-framework` the code-map count
stopped at exactly 4 000 — the file cap doing its job, which is why an
11k-file repo prefilled no slower than a 2k-file one. The first session
gets partial recall until that prefill finishes; every session after is
warm.

## Rich logs

In addition to the human-readable lines that go to OpenCode's session
log (via `client.app.log`), the plugin writes a structured JSON-Lines
log to `os.tmpdir()/opencode-diane/` — typically
`/tmp/opencode-diane/` on Linux,
`/var/folders/.../T/opencode-diane/` on macOS. One file per process,
named `diane-<iso-timestamp>-pid<pid>.jsonl`, so parallel OpenCode
sessions never interleave.

**Inside Docker:** the default `os.tmpdir()` path is ephemeral
container storage — fine for ad-hoc runs but lost when the container
exits. Set `OPENCODE_DIANE_LOG_DIR` to a mounted path and the logs
flow to the host:

```bash
# <opencode-image> is a placeholder for the image you run OpenCode in.
docker run \
  -e OPENCODE_DIANE_LOG_DIR=/logs \
  -v "$PWD/logs:/logs" \
  <opencode-image>

# then from outside the container:
python3 analyze-logs.py --dir ./logs --plain
```

The env var is the write-side override; `analyze-logs.py --dir` is
the read-side counterpart, so the two halves of the diagnostic loop
work together regardless of where the logs are.

Two record shapes share the file: prose `log()` lines and structured
`event()` records. Every record carries `ts` (ISO ms-precision),
`service`, and `root`. Prose lines add `level` (`debug`/`info`/`warn`/
`error`) and `message` (mirroring exactly what OpenCode's session log
shows). Events add `event` (a dotted name like `ingest.git`) and a
typed payload — counts, ms, ids. The header record is
`event: "session.start"` with the pid, Node version, platform, and
cwd, so opening the file in isolation always gives context.

The events fired today:

- `session.start` — header (pid, node, platform, cwd)
- `plugin.idle` — directory has no git history and no project files
- `plugin.active` — version, storeSize, bytesTotal, budgetBytes, feature flags
- `store.migration.failed` — the legacy `diane.json` → SQLite migration
  hit an error (the cause is in the `reason` field). The plugin does
  **not** crash on this: it starts with an empty database, leaves the
  JSON file in place, and the next startup retries. Observed in the
  field when running alongside heavyweight plugins (e.g. oh-my-opencode)
  whose own startup contends for resources during ours.
- `prefill.start` / `prefill.complete` / `prefill.failed` (with ms)
- `adaptive.tuned` — the size signal and the chosen knobs
- `ingest.project`, `ingest.git`, `ingest.sessions`, `ingest.code-map`
  / `ingest.code-map.skipped` — each ingester's raw counts
- `snapshot.resume` — id and total count when resuming
- `eviction` — removed count, bytes after, trigger
- `tool.call` — one record per tool invocation, with `tool`, `ms`,
  `ok`, `args` (truncated to ~500 chars per string field) and either
  `result` (a per-tool summary like `{hits, omitted}` or
  `{id, sizeBytes, bytesTotal}`) or `error` on failure
- `mining.complete` / `mining.failed` — the background outcome of
  `memory_mine_skills` (the tool returns immediately; these fire when
  the background job finishes)

Because every line is independently valid JSON, the file is greppable
*and* `jq`-able. Common queries (paths assume the Linux default log
directory; substitute your macOS tmpdir or `OPENCODE_DIANE_LOG_DIR`):

```bash
# Tail the latest session
tail -f "$(ls -t /tmp/opencode-diane/*.jsonl | head -1)"

# Just the structured events from a specific run, in time order
jq -c 'select(.event)' /tmp/opencode-diane/diane-2026-05-15T*.jsonl

# Every tool call across all sessions, with timing
jq -c 'select(.event == "tool.call") | {tool, ms, ok}' /tmp/opencode-diane/*.jsonl

# Slow tool calls (> 100ms)
jq -c 'select(.event == "tool.call" and .ms > 100)' /tmp/opencode-diane/*.jsonl

# Find slow prefills (> 1 s)
jq -c 'select(.event == "prefill.complete" and .ms > 1000)' /tmp/opencode-diane/*.jsonl
```

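Where `jq` isn't available, the same filtering takes a few lines of TypeScript. A minimal sketch: `parseJsonl` and `slowToolCalls` are hypothetical helpers. The record fields (`event`, `tool`, `ms`, `ok`) are the ones listed in the event catalogue above. Malformed lines are skipped rather than failing the whole file.

```typescript
// Field names follow the record description above; the helpers are hypothetical.
interface LogRecord {
  ts: string;
  event?: string;
  tool?: string;
  ms?: number;
  ok?: boolean;
  [key: string]: unknown;
}

// Parse JSON-Lines text, skipping blank or malformed lines.
function parseJsonl(text: string): LogRecord[] {
  const out: LogRecord[] = [];
  for (const line of text.split("\n")) {
    if (!line.trim()) continue;
    try {
      out.push(JSON.parse(line) as LogRecord);
    } catch {
      // ignore a damaged line instead of rejecting the whole file
    }
  }
  return out;
}

// Equivalent of: jq -c 'select(.event == "tool.call" and .ms > 100)'
function slowToolCalls(records: LogRecord[], thresholdMs = 100): LogRecord[] {
  return records.filter((r) => r.event === "tool.call" && (r.ms ?? 0) > thresholdMs);
}
```

To use it on a real log, pass in the text of one log file, for example `readFileSync(path, "utf8")` from `node:fs`.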
### `analyze-logs.py`

A standalone Python script at the repo root that turns one or more
JSONL files into a report. Standalone means: stdlib only, no plugin
imports — you can copy the script to a machine that doesn't have the
plugin installed and analyse logs that came from one that does.

**Every report leads with a plain-language "What happened" summary.**
The raw log is a stream of dotted event names and typed payloads —
`prefill.complete`, `ingest.git scanned=1500`, `eviction removed=12` —
which is precise but assumes you know what each one means. The
analyzer's first job is to translate that into a numbered, jargon-free
account of what the plugin did and *why* it mattered, written for
someone who has never read the plugin's source. For example, instead
of `ingest.git scanned=1500 commitMemories=80` it writes: "it read
1,500 commits of Git history and turned them into 80 compact notes
about which files change together … this is what lets the AI answer
'what changed recently?' from memory instead of searching your files."
The technical sections (per-tool latency tables, the event timeline,
raw ingest counts) follow underneath for anyone who wants them.

`--plain` prints only that plain-language summary — the view for a
non-specialist or a quick "what did it just do?" check. `--json`
includes the same explanation as a string array per session, so an LLM
or downstream tool gets it too. Typical uses are bug reports
(`./analyze-logs.py --json > report.json` and attach it), quick local
debugging (`--timeline` shows the full chronological flow), and
context for an LLM. Examples:

```bash
./analyze-logs.py                         # plain summary + technical detail
./analyze-logs.py --plain                 # plain-language summary only
./analyze-logs.py --tail 3 --timeline     # 3 newest, with chronological flow
./analyze-logs.py --json                  # JSON (carries the explanation too)
./analyze-logs.py --root /path/to/repo    # filter to one repo
./analyze-logs.py --quiet                 # one-line-per-session summary
```

The plain-language explainer is covered by `tests/test_analyze_logs.py`
(Python `unittest`, stdlib only, wired into CI): the tests assert that
each major step is explained with its real numbers and its reason, and
that the plain output contains none of the raw event/field identifiers
— a machine-checkable proxy for "a non-specialist can read this".

The script is intentionally NOT bundled into the published npm
package — it's a development/debugging aid, not part of the runtime
plugin. It lives in the repo so it's there when you clone, and that's
the only coupling.

Reliability: writes are synchronous (`openSync` + `writeSync`), so a
line that "wrote" has been handed to the operating system before the
call returns. It therefore survives a crash of the process, even one
right after the write, which is when these logs are most useful. A
write failure (disk full, permission lost mid-session) drops the fd
silently; the plugin keeps running and OpenCode's own log channel is
unaffected. A logger error never propagates.

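The naming, location and reliability rules above fit in one small class. This is a minimal sketch assuming Node-compatible `node:fs` APIs, which Bun provides. `JsonlLogger` is a hypothetical name, and replacing `:` in the timestamp is an assumption made here to keep file names portable. The sketch shows the three behaviours described: synchronous appends, one file per process, and a write failure that silently disables logging.

```typescript
import { closeSync, mkdirSync, openSync, writeSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Minimal sketch of a crash-friendly JSONL logger (hypothetical class).
class JsonlLogger {
  private fd: number | undefined;
  readonly path: string;

  constructor(dir = process.env.OPENCODE_DIANE_LOG_DIR ?? join(tmpdir(), "opencode-diane")) {
    const stamp = new Date().toISOString().replace(/:/g, "-"); // assumption: portable file name
    this.path = join(dir, `diane-${stamp}-pid${process.pid}.jsonl`); // one file per process
    try {
      mkdirSync(dir, { recursive: true });
      this.fd = openSync(this.path, "a");
    } catch {
      this.fd = undefined; // logging disabled; the caller keeps running
    }
  }

  event(name: string, payload: Record<string, unknown> = {}): void {
    if (this.fd === undefined) return;
    const line = JSON.stringify({ ts: new Date().toISOString(), event: name, ...payload }) + "\n";
    try {
      writeSync(this.fd, line); // handed to the OS before this returns
    } catch {
      this.fd = undefined; // drop the fd; a logger error never propagates
    }
  }

  close(): void {
    if (this.fd !== undefined) closeSync(this.fd);
    this.fd = undefined;
  }
}
```

Swallowing errors in both the constructor and `event` is deliberate: a diagnostic channel must never become the reason the plugin fails.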
Retention is the user's responsibility: the plugin never deletes its
own log files. On Linux they're cleared at reboot or by
`systemd-tmpfiles`; on macOS the periodic tmp cleaner removes them
after a few days of inactivity. For a manual sweep on Linux:
`rm /tmp/opencode-diane/*.jsonl`.

## Tests & CI

674 assertions across 24 test suites (covering storage, search, ingest,
cross-references, code-health, code-map, mining, sessions, adaptive
tuning, peer compatibility, configurable limits, and more). The ingest
suite exercises real git fixtures and a Rust project fixture; code-map
parses a multi-language fixture with the real grammars; the
session-snapshot suite covers parent linkage and
pinned-survives-eviction; the plugin suite covers the recall-first
nudge hooks; the token-savings suite builds a fixture repo with real
history and asserts that recall is measurably cheaper than raw
discovery (see *Token savings*, below); the skill-activation suite
proves a skill mined mid-session is discoverable and loadable in that
same session, no restart; the scaling suite builds a 4 000-memory
store and guards correctness plus anti-quadratic timing ceilings (the
deep curve is `scripts/stress-scale.mjs` — see *Scaling*). Alongside
the Bun suites, `tests/test_analyze_logs.py` is a 12-test Python
(`unittest`, stdlib only) suite for the log analyzer's plain-language
explainer — it asserts the report stays legible to a non-specialist
(see *Rich logs*). CI runs typecheck →
lint (ESLint 9, type-aware) → build → tests → a smoke test of the
compiled `dist/` → a package-size guard → the Python analyzer tests,
all on the Bun runtime (with the preinstalled `python3` for the last
step), then a coverage job (`bun test --coverage`) enforces a
line/function coverage floor and uploads the lcov report. Coverage
sits around 90 % lines as Bun measures it. There is no Node version
matrix — OpenCode loads plugins under Bun, so Bun is what's tested.
The suites use a small self-contained assertion harness, so each runs
as a Bun script and self-gates on exit code.

A separate, informational workflow — `compare-aider` — is *not* part
of the merge gate. It's manually runnable (and runs monthly), installs
aider, and compares aider's tree-sitter repo-map to diane's
code map on a real repository, publishing the result to the run's job
summary. See *Token savings* below.

Verified unchanged against three real repositories — `rs/zerolog`
(Go), `BurntSushi/byteorder` (Rust), `petrovich/pytrovich` (Python) —
producing the same structural signals for each.

## Development & packaging

The plugin runs under Bun (the runtime OpenCode loads plugins in), so
the whole toolchain is Bun-based. `tsc` is still the build step — it
emits the `.d.ts` files the npm package ships — but it runs under Bun
like everything else.

```bash
bun install
bun run build             # tsc -p tsconfig.json — emits dist/ + .d.ts
bun run lint              # eslint src tests (type-aware; floating promises = error)
bun run test              # 674 assertions across 24 test suites
bun run smoke             # exercises the compiled dist/ as OpenCode would
bun run check:size        # fails if the package exceeds its size ceiling
bun run typecheck         # no emit
bun run coverage:check    # bun test --coverage, fails under the coverage floor
bun run test:analyzer     # python tests for the log analyzer's plain-language report
bun run verify:semantic   # optional: runs the real e5 model on a 9-language fixture set
```

CI (`.github/workflows/ci.yml`) runs typecheck → lint → build → test →
smoke → size-guard → analyzer tests on Bun, then a separate coverage
job; see *Tests & CI* for what each step checks.

To publish a new version:

```bash
bun run test && bun run smoke && bun run check:size   # pre-flight: all must pass
bun run clean && bun run build                        # also the prepublishOnly script
npm version <patch|minor|major>                       # bump version + git tag
npm publish --access public                           # npm is the registry
```

**The version lives in exactly one place:** `package.json#version`.
`npm version <patch|minor|major>` edits that field (and creates a
matching git tag). At plugin startup `src/index.ts` reads it from
that same `package.json` and the value flows from there to the
`plugin.active` log event (so the running version is in every
session's JSONL log) and to the `memory_status` tool's output (so an
agent can ask which version is loaded). There is no second place to
update — change `package.json#version`, rebuild, and every consumer
picks up the new number.

`bun pm pack --dry-run` lists exactly what would be packed; the `files`
allowlist in `package.json` limits the tarball to `dist/`, `grammars/`,
`README.md`, `WIKI.md`, and `LICENSE`. `check:size` runs that
`--dry-run` under the hood and fails CI if the unpacked size crosses
its ceiling or a vendored grammar goes missing — so a size regression
cannot ship silently.

+ ## Token savings
1016
+
1017
+ The plugin's premise is that a token-budgeted recall is cheaper than
1018
+ the raw discovery an agent would otherwise do. That claim is measured
1019
+ two ways, both with zero API spend.
1020
+
1021
+ ### What token reduction to expect
1022
+
1023
+ The honest range, from measured runs:
1024
+
1025
+ - **When a recall covers the task: 80–89 %.** Real-repo measurements
1026
+ (`measure-savings.mjs`): ~87 % on `zerolog`, ~89 % on `click`,
1027
+ ~85 % on `express` — ~8–11k tokens of raw discovery collapsing to
1028
+ ~1.1–1.2k of recall.
1029
+ - That figure is a **ceiling, not a promise.** It is "tokens saved
1030
+ *if the recall is relevant*" — it is not a relevance score. A recall
1031
+ can be cheap and still mediocre (see the `express` case under
1032
+ *Real-world usefulness*).
1033
+ - **Lower** on: terse-history repos (low-signal commit text), mature/
1034
+ stable repos (recent history is dependency bumps), dynamic-dispatch
1035
+ codebases (the code map extracts *declared* signatures), and very
1036
+ small repos (raw discovery was already cheap — reported as
1037
+ "inconclusive", not a loss).
1038
+ - The gated test floor is deliberately conservative: a fixture
1039
+ end-to-end orientation must be **> 25 %** cheaper, and recall output
1040
+ must stay within ~2× its own token budget — so the plugin's
1041
+ footprint can never turn a saving into a cost.
1042
+
1043
+ Before trusting it on your repo, run `scripts/dry-run.mjs <repo>`: it
1044
+ prints a **GOOD / MODERATE / LOW** verdict on the git-history signal
1045
+ and shows real query results with their token cost. That verdict is
1046
+ the answer for *your* repo, which no general percentage can give.
1047
+
1048
+ ### How it is measured
1049
+
1050
+ `scripts/measure-savings.mjs <repo>` runs a realistic *without-plugin*
1051
+ discovery recipe (recent git history, a tree listing, reading the
1052
+ files whose names match the task, a grep), sums the token cost, then
1053
+ runs the *with-plugin* memory calls and sums those. Both sides print
1054
+ what they ran. It's honest about coverage: thin recall results assume
1055
+ the agent still does part of the fallback discovery rather than
1056
+ claiming an unrealistic 100 %, and a non-git repo with an empty store
1057
+ is reported "inconclusive", not a win. Sample runs land around 80 % on
1058
+ real repos with history; on a tiny repo the saving is modest, because
1059
+ the baseline was already cheap — that's correct, not a failure.
1060
+
1061
`tests/token-savings.test.ts` turns the same method into gated
assertions on a fixture repo with real history: a single file's
history via recall vs `git log -p` (~5× cheaper), project facts via
recall vs reading the config files, and a whole end-to-end
orientation (>25 % fewer tokens). One case guards the floor — it
verifies recall output stays within ~2× its token budget, so the
plugin's own footprint can't run away and turn a "saving" into a cost.

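The arithmetic behind the measurement and the two gated checks can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the code in `scripts/measure-savings.mjs` or `tests/token-savings.test.ts`: the function names, the `fallbackShare` parameter (how much raw discovery the agent is assumed to still do when recall is thin) and the `MIN_BASELINE_TOKENS` cut-off for "inconclusive" are all hypothetical.

```typescript
// Hypothetical sketch of the savings arithmetic; not the plugin's actual scripts.

interface SavingsInput {
  baselineTokens: number; // cost of the without-plugin discovery recipe
  recallTokens: number; // cost of the with-plugin memory calls
  fallbackShare: number; // 0..1, share of raw discovery still done when recall is thin (assumption)
}

type Verdict = { kind: "inconclusive" } | { kind: "measured"; saving: number };

const MIN_BASELINE_TOKENS = 2_000; // hypothetical "baseline was already cheap" cut-off

function measureSaving(input: SavingsInput, storeIsEmpty: boolean): Verdict {
  if (storeIsEmpty || input.baselineTokens < MIN_BASELINE_TOKENS) {
    return { kind: "inconclusive" };
  }
  // Thin recall does not claim 100 %: part of the fallback discovery is still paid for.
  const withPlugin = input.recallTokens + input.fallbackShare * input.baselineTokens;
  return { kind: "measured", saving: 1 - withPlugin / input.baselineTokens };
}

// Gate 1: an end-to-end orientation must be more than 25 % cheaper.
function passesOrientationGate(saving: number): boolean {
  return saving > 0.25;
}

// Gate 2: recall output must stay within ~2x its own token budget.
function withinFootprint(recallTokens: number, budgetTokens: number): boolean {
  return recallTokens <= 2 * budgetTokens;
}
```

With a 10k-token baseline and 1.2k tokens of recall, `measureSaving` reports 0.88; assuming half the raw discovery is still needed drops that to 0.38, which still clears the 25 % gate.
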
For the code map specifically, the repo ships scripts to compare it
against aider's repo-map. `scripts/dump-code-map.mjs <repo>` prints
diane's full code map as text; `aider --show-repo-map` prints aider's
repo-map; `scripts/compare-aider.mjs <aider-map> <diane-map>` reports
token cost and approximate coverage for both, with one tokenizer
applied to each. The `compare-aider` workflow runs the whole thing in
CI. The report is careful about what it shows: the two artifacts are
different shapes — aider's repo-map embeds critical source lines and
is trimmed to `--map-tokens` (default 1k) per turn; diane's code map
is one signature digest per file, recalled as a query-ranked subset —
so the figures are a coverage/footprint comparison of the full maps,
not a head-to-head of per-request context cost.

### `bun test` vs `bun run test`

These look the same and are not. `bun run test` is the canonical gate:
it runs each suite as a script (`bun tests/<name>.test.ts`), uses the
custom assertion harness, and self-gates on exit code per suite. Its
output is the 343-pass/0-fail summary.

`bun test` (Bun's native test runner) discovers `*.test.ts` files in
parallel and looks for `bun:test` registrations. Our suites use a
custom harness, so Bun reports "0 tests" — that's correct, not a bug.
The only places that invoke `bun test` are `coverage:check` (it needs
Bun's `--coverage` instrumentation) and CI gating.

The most common confusion: `bun test` shows
`1 fail / 1 error / Cannot find module '@opencode-ai/plugin'`. That means your
`node_modules` is incomplete — usually a missing `bun install` on a
fresh checkout. The `coverage:check` preflight catches this case
explicitly and tells you to run `bun install`; if you see the error
from raw `bun test`, the answer is the same.

## Multilingual retrieval

Retrieval works for non-Latin scripts, with CJK as the driving case.
The tokenizer handles two scripts in one pass: Latin/digit runs are
split identifier-aware (camelCase, snake_case), and **CJK runs — Han,
Hiragana, Katakana, Hangul — are emitted as overlapping bigrams**
(`数据库连接` → `数据`,`据库`,`库连`,`连接`). A mixed string like
`fix 数据库连接 bug` tokenizes to both `fix`/`bug` and the Chinese
bigrams. Indexing and querying share the tokenizer, so the two sides
always agree.

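A two-script tokenizer of this kind can be sketched as below. It is an illustration under assumptions, not diane's tokenizer: the Unicode script classes, the camelCase splitting rules and the choice to keep a lone CJK character as a unigram are all choices made for this sketch.

```typescript
// Sketch only: a two-script tokenizer in the style described above, not diane's code.

// Script classes treated as CJK (an assumption of this sketch).
const CJK = "\\p{Script=Han}\\p{Script=Hiragana}\\p{Script=Katakana}\\p{Script=Hangul}";
// A segment is a Latin/digit run or a CJK run; every other character separates.
const SEGMENT = new RegExp(`[A-Za-z0-9]+|[${CJK}]+`, "gu");
const STARTS_CJK = new RegExp(`^[${CJK}]`, "u");

// Identifier-aware split: "parseHTTPRequest" -> parse, http, request.
// snake_case is already split because "_" is not part of a Latin run.
function splitIdentifier(run: string): string[] {
  return run
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2") // lower-to-upper boundary
    .replace(/([A-Z]+)([A-Z][a-z])/g, "$1 $2") // end of an acronym
    .toLowerCase()
    .split(" ")
    .filter((t) => t.length > 0);
}

// Overlapping bigrams over code points: "数据库连接" -> 数据, 据库, 库连, 连接.
function cjkBigrams(run: string): string[] {
  const chars = Array.from(run);
  if (chars.length === 1) return chars; // a lone character stays a unigram
  const out: string[] = [];
  for (let i = 0; i + 1 < chars.length; i++) out.push(chars[i] + chars[i + 1]);
  return out;
}

// One pass over mixed text; indexing and querying would both call this.
function tokenize(text: string): string[] {
  const tokens: string[] = [];
  for (const match of text.matchAll(SEGMENT)) {
    const run = match[0];
    tokens.push(...(STARTS_CJK.test(run) ? cjkBigrams(run) : splitIdentifier(run)));
  }
  return tokens;
}
```

`tokenize("fix 数据库连接 bug")` yields `fix`, the four bigrams and `bug`. Because the same function runs at index and query time, a query for `数据库` produces `数据` and `据库`, both of which the indexed text also contains.
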
This matters because CJK has no spaces between words: an ASCII
splitter treats every ideograph as a separator and **discards Chinese
text entirely** — Chinese commit messages would index to nothing and
Chinese queries would match nothing. Bigrams give BM25 overlapping
units to match on. A dry run on a real Chinese repository confirmed
`数据库索引` and `面试题` retrieve relevant Chinese commits where before
they returned nothing.

The honest tradeoff: bigrams are the *lightweight* approach (the same
one Lucene's CJK analyzer takes; SQLite FTS5's built-in `trigram`
tokenizer is a close n-gram relative), chosen because they're
deterministic and need no dictionary or model. They take CJK recall
from broken to working, but they're not as precise as true word
segmentation: a bigram can be shared by unrelated words (`编程` is in
both `并发编程` "concurrent programming" and `AI 编程` "AI programming"),
so partial-match false positives happen — the same class of imprecision
BM25 has for English. A statistical segmenter (jieba-style) would be
more precise, but its dictionary alone is several MB and would break
the package-size budget, so bigrams are the right point on that curve
for the *lexical* index — which is the always-on default. (Genuine
cross-lingual recall is a different problem with its own opt-in
answer, *Semantic search*, below.)

One known refinement: the token-budget estimate is a flat ~4
chars/token heuristic, which *under*-counts CJK text (a CJK character
usually costs more model tokens than an average English character
does), so recall packs more CJK content than the budget intends. It's
an imprecision in packing, not a correctness problem.

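A flat character-based estimate and a greedy packer that trusts it might look like this. The names are hypothetical, and so is the rule that a hit which would overflow the budget is skipped rather than truncated.

```typescript
// Hypothetical sketch of a flat ~4 characters/token estimate and a greedy packer.

const CHARS_PER_TOKEN = 4;

// Counts code points, so each CJK ideograph counts as one character.
function estimateTokens(text: string): number {
  return Math.ceil(Array.from(text).length / CHARS_PER_TOKEN);
}

// Take ranked hits in order while their estimated cost fits the budget.
function packToBudget(hits: string[], budgetTokens: number): string[] {
  const packed: string[] = [];
  let used = 0;
  for (const hit of hits) {
    const cost = estimateTokens(hit);
    if (used + cost > budgetTokens) continue; // skip hits that would overflow
    packed.push(hit);
    used += cost;
  }
  return packed;
}
```

The estimator charges `数据库连接` two tokens, the same as eight ASCII letters; a typical model tokenizer spends more than two tokens on five ideographs, so the packer admits more CJK text than the budget intends.
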
Note the scope of all the above: bigrams make retrieval work *within*
a language — a Chinese query finding Chinese text. They cannot do
*cross-lingual* recall — a Chinese or Russian query finding code
commented in English — because lexical search matches tokens, and
different scripts share none. That is a genuinely different problem,
and it has its own opt-in answer below.

## Semantic search

`enableSemanticSearch` (default **off**) adds opt-in **cross-lingual**
retrieval: a query in one language finding code and comments written
in another — e.g. a Russian or Chinese query surfacing an
English-commented function. Lexical BM25 structurally cannot do this
(a Russian query and English content share zero tokens); it needs an
embedding model that places the languages in one shared vector space.

**How it works.** With the flag on, the plugin loads a small
multilingual embedding model — `intfloat/e5` via the optional
`@huggingface/transformers` dependency, default
`Xenova/multilingual-e5-small` (~120 MB, 384-dim, 100+ languages,
downloaded and cached on first use). A background pass after prefill
embeds every memory and stores the vectors in a **separate**
`.opencode/diane-vectors.db`; the pass is incremental and crash-safe,
so each memory is embedded once and reused across sessions. On a
recall, the query is embedded and the two rankings — BM25 lexical and
vector similarity — are merged with reciprocal-rank fusion (RRF), the
standard position-only blend that needs no score calibration. The
recall path itself stays synchronous: only the query embedding is
async, done in the tool handler before the sync ranking.

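Reciprocal-rank fusion itself is only a few lines. The sketch below uses k = 60, the constant commonly used with RRF; the value diane actually uses, and its tie-breaking, are assumptions here.

```typescript
// Minimal reciprocal-rank fusion: score(d) = sum over rankings of 1 / (k + rank(d)).
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Only positions matter: raw BM25 scores and cosine similarities never mix.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0])) // tie-break by id
    .map(([id]) => id);
}

// Usage: fuse a lexical ranking with a vector ranking.
const fused = reciprocalRankFusion([
  ["a", "b", "c"], // BM25 order
  ["b", "c", "a"], // cosine order
]);
console.log(fused);
```

Because only positions enter the sum, the two rankers' scores never have to be put on one scale; a document found by only one ranker still scores, just lower than one both rankers agree on.
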
**Off by default, and off means off.** When `enableSemanticSearch` is
false: no model is downloaded, `@huggingface/transformers` is never
imported (it is an *optional* peer dependency — a normal install never
pulls it in), no vector database is created, and `recallDetailed`
takes the byte-for-byte unchanged lexical path. The plugin's full
existing test suite runs with the feature off and is the regression
proof that the default path is untouched.

**Enabling it.**

```sh
bun add @huggingface/transformers # the optional dependency
```

```jsonc
// opencode.json
{
  "plugin": [
    ["opencode-diane", { "enableSemanticSearch": true }]
  ]
}
```

If the flag is on but the dependency is missing or the model can't be
fetched, the plugin logs a warning and falls back to lexical search —
enabling the flag never breaks recall.

**Cost, honestly.** The model is a real dependency: a one-time ~120 MB
download, a few hundred MB of process RAM while loaded, and a
background embedding pass that takes a few minutes on a large store
the first time (incremental and cached thereafter). Each recall adds
one query embedding (~tens of ms on CPU) plus a brute-force cosine
scan (sub-millisecond at realistic store sizes). And it trades away
the plugin's signature property: BM25 is inspectable — you can see
*why* a hit matched — whereas an embedding match is a black box. That
is the deliberate tradeoff for crossing languages, which is why it is
opt-in rather than default.

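The brute-force cosine scan is equally small: score every stored vector against the query and keep the best k. The names and the `Map`-of-`Float32Array` storage are assumptions for this sketch, not the layout of `.opencode/diane-vectors.db`.

```typescript
// Hypothetical sketch of a brute-force cosine top-k scan; not the plugin's code.

function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat it as unrelated.
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Compare the query with every stored vector and return the ids of the best k.
function topK(query: Float32Array, store: Map<string, Float32Array>, k: number): string[] {
  return [...store.entries()]
    .map(([id, vec]) => ({ id, score: cosine(query, vec) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((hit) => hit.id);
}
```

Each recall costs one pass of n × d multiply-adds (d = 384 for the default model), with no index structure to keep in sync as memories are added or evicted.
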
**What is tested, and how.** diane's *pipeline* — the vector store,
RRF fusion, the recall gating, graceful degradation, and end-to-end
RU/EN/ZH cross-lingual retrieval — is covered in CI (`semantic.test.ts`)
by a deterministic stub embedder with a built-in trilingual concept
lexicon. The stub is used on purpose: the cross-lingual *quality* is a
property of Microsoft's e5 model, benchmarked by its authors, and CI
should not re-prove it by downloading 120 MB on every run. The real
model is verified separately by `scripts/verify-semantic.mjs` (run it
once where the Hugging Face Hub is reachable: `bun run verify:semantic`).
That script covers **nine languages on a two-tier scheme**: a *core*
tier of well-represented languages (English, Chinese, Russian,
Japanese, Spanish, Turkish) whose cross-lingual matches gate the exit
code, plus an *experimental* tier of low-resource Cyrillic languages
(Mongolian, Tajik, Kyrgyz) whose results are reported but do not fail
the script — an honest empirical view of how the model handles
languages it was trained on with very uneven amounts of data, rather
than a pretence that it handles all of them equally.

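A stub embedder in the same spirit can be sketched as a concept lexicon: each dimension is a concept, and its English, Russian and Chinese surface forms all count toward it, so translations land on the same vector. The lexicon entries and the substring matching are illustrative assumptions; the actual lexicon in `semantic.test.ts` is not reproduced here.

```typescript
// Hypothetical deterministic stub embedder: one dimension per concept,
// with illustrative English / Russian / Chinese surface forms.
const LEXICON: string[][] = [
  ["database", "база данных", "数据库"], // dimension 0
  ["error", "ошибка", "错误"], // dimension 1
  ["cache", "кэш", "缓存"], // dimension 2
];

function stubEmbed(text: string): number[] {
  const lower = text.toLowerCase();
  // Count how many surface forms of each concept appear in the text.
  const vec = LEXICON.map((forms) => forms.filter((form) => lower.includes(form)).length);
  const norm = Math.sqrt(vec.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? vec : vec.map((x) => x / norm); // unit length, or all zeros
}

// Vectors are unit length, so the dot product is the cosine similarity.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}
```

Because the lexicon, not a model, decides which forms share a dimension, a test can assert exact cross-lingual matches deterministically; it says nothing about e5's quality, which is what the separate script checks.
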
## Real-world usefulness — when it helps, when it doesn't

The plugin was dry-run against real repositories — `rs/zerolog` (Go),
`pallets/click` (Python), `expressjs/express` (JavaScript),
`BurntSushi/byteorder` (Rust), `Snailclimb/JavaGuide` (Chinese),
`redis/redis` (C), `facebook/rocksdb` (C++) and
`spring-projects/spring-framework` (Java, 11k files) — using
`scripts/dry-run.mjs` (ingests a checkout and shows the actual memories
and the results of realistic developer queries) and
`scripts/measure-savings.mjs` (models the token cost of raw discovery
versus a recall). The honest findings:

**Measured token savings.** When recall covers a task, the saving is
large: ~87 % on `zerolog`, ~89 % on `click`, ~85 % on `express` — raw
discovery of ~8–11k tokens collapsing to ~1.1–1.2k. But that number is
"tokens saved *if recall is relevant*". It is not a relevance score —
see the express case below, where the token count looks great while the
hits are mediocre. Treat the percentage as a ceiling, not a promise.

**It helps most on** repos with descriptive commit messages and
*statically-declared* code structure (Go, Rust, typed Java/Python),
under active development so recent history is substantive. On `zerolog`,
"error handling" and "logging configuration" surfaced genuinely relevant
commits and the code map gave compact, accurate API digests.

**It helps least on**:
- *Terse-commit repos.* Commit messages are stored verbatim — the
  plugin derives nothing from message style — so a history of "fix",
  "wip", "update" yields low-signal memories. `dry-run.mjs` prints a
  GOOD / MODERATE / LOW verdict so you know before relying on it.
- *Repos mid-mechanical-refactor.* A burst of renames or a doc
  migration produces many keyword-matching but signal-free commits. The
  git ingester detects **balanced churn** — additions ≈ deletions, the
  convention-free fingerprint of moved/reformatted content — and gives
  such commits no per-commit memory, just as merge commits get none. On
  `click`, mid-`.rst`→`.md` migration, this filtered ~5 % of commits; on
  `zerolog` only ~2.5 %.
- *Mature, stable repos.* On `express` (2000+ commits) recent history
  is dominated by dependency bumps, CI tweaks and test maintenance; the
  substantive architectural commits are old, possibly past the depth
  cap. Git-history memory is most valuable on actively-evolving code.
- *Dynamic-dispatch codebases.* The tree-sitter code map extracts
  *declared* signatures, so its quality tracks how statically a
  language declares its API. It is **strong on C, C++, Java, Go and
  Rust** — dry runs on `redis`, `rocksdb` and `spring-framework`
  produced accurate signatures (`static int checkStringLength(client *c…)`,
  C++ namespaces/templates/inheritance, Java classes/methods).
  It is **weak on idiomatic dynamic JavaScript**: `express` builds its
  real API (`app.get`, `req.body`…) through prototype mutation and
  higher-order functions, so the extractor finds little (`lib/request.js`
  → "1 definition").
- *Very small repos.* Little history → raw discovery was already cheap;
  `measure-savings.mjs` reports such cases as inconclusive, not a win.

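A balanced-churn check needs only per-commit added and deleted line counts, for example summed from `git log --numstat`. The thresholds below (a minimum commit size and a 10 % imbalance tolerance) are illustrative assumptions, not the values the git ingester uses.

```typescript
// Hypothetical balanced-churn check: near-equal additions and deletions look like
// moved or reformatted content rather than a substantive change.

interface CommitStat {
  additions: number; // e.g. summed over files from `git log --numstat`
  deletions: number;
}

const MIN_CHANGED_LINES = 50; // assumption: tiny commits carry too little signal to judge
const MAX_IMBALANCE = 0.1; // assumption: |adds - dels| at most 10 % of the larger side

function isBalancedChurn({ additions, deletions }: CommitStat): boolean {
  if (additions + deletions < MIN_CHANGED_LINES) return false;
  const larger = Math.max(additions, deletions);
  return Math.abs(additions - deletions) <= MAX_IMBALANCE * larger;
}
```

A 500/498 reformat is flagged and would get no per-commit memory; a 120/3 feature commit is not flagged, and a 10/10 commit is too small to judge, so it keeps its memory.
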
**Keyword-on-filename bias.** Default retrieval is keyword BM25 (the
deliberate embedding-free default) and it scores file *paths* as well
as content, so a file *named* after a concept can outrank the real
implementation. This was the most consistent weakness across the dry
runs: on `express`, "routing and middleware" surfaced
`test/middleware.basic.js` and ranked a benchmark above `lib/router/`;
on `rocksdb`, "write ahead log" surfaced test and bench files; on
`spring-framework`, "bean lifecycle" surfaced JUnit fixture classes
named `LifecycleBean.java`. The effect is *amplified* in
verbose-naming languages — Java's long descriptive class names mean
test and fixture classes match concept keywords strongly.

The mitigation is the `memory_recall` **`prefer`** option — a
query-dependent intent lean the calling agent sets from what the user
asked: `prefer:"code"` gently down-ranks test-pathed memories,
`prefer:"tests"` lifts them, `prefer:"history"` favours change history.
It is a mild score multiplier, deliberately **never a filter** — a
strongly-matching test still surfaces under `"code"`, just lower —
because sometimes the test really is what you want. On
`spring-framework`, `prefer:"code"` lifted the real
`InitDestroyAnnotationBeanPostProcessor` above the JUnit fixtures for
"bean lifecycle"; on `rocksdb` it separated `db_write_test.cc` from the
implementation. The test signal itself is deliberately minimal and
language-neutral — whether the word "test" appears as a *token* of the
path, which catches `test/` directories, `_test.go` / `.test.ts` /
`test_x.py` filenames alike without enumerating any one ecosystem's
convention. It is the agent — already an LLM that understood the
request in whatever natural language — that decides the intent; the
plugin hardcodes no query keywords. Run `dry-run.mjs` on your own repo
to see the lean in action.

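The lean and the test signal can be sketched as follows. The multipliers (0.7 and 1.3), the hit shape and the camelCase split of path parts are assumptions for illustration; `prefer:"history"` is left out because it would key on the memory's kind rather than its path.

```typescript
// Hypothetical sketch of the `prefer` lean: a mild multiplier, never a filter.

type Prefer = "code" | "tests" | undefined;

interface Hit {
  path: string;
  score: number; // BM25 score before the lean
}

// Path tokens: split on non-alphanumerics, then on camelCase boundaries.
function pathTokens(path: string): string[] {
  return path
    .split(/[^A-Za-z0-9]+/)
    .flatMap((part) => part.replace(/([a-z0-9])([A-Z])/g, "$1 $2").split(" "))
    .map((t) => t.toLowerCase())
    .filter((t) => t.length > 0);
}

// The whole test signal: does "test" appear as a token of the path?
function isTestPath(path: string): boolean {
  return pathTokens(path).includes("test");
}

const LEAN: Record<"code" | "tests", { test: number; other: number }> = {
  code: { test: 0.7, other: 1 }, // gently down-rank test-pathed hits
  tests: { test: 1.3, other: 1 }, // gently lift them
};

function applyPrefer(hits: Hit[], prefer: Prefer): Hit[] {
  if (!prefer) return hits;
  const lean = LEAN[prefer];
  return hits
    .map((h) => ({ ...h, score: h.score * (isTestPath(h.path) ? lean.test : lean.other) }))
    .sort((a, b) => b.score - a.score); // re-rank; every hit is still returned
}
```

Under `"code"`, a test hit scoring 10 drops to 7 and falls below an implementation hit at 9, but it is still in the result list. In this sketch a plural `tests/` directory would not match the exact token check.
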
### Verifying it inside a live OpenCode session

The suites and smoke test exercise the plugin against the documented
plugin contract with a mock host; they do **not** run it end-to-end
inside a live OpenCode server — that gap is real and is best closed by
running it. A quick manual check, in a real repo under OpenCode:

1. Start OpenCode; confirm the plugin loads (no error; `memory_status`
   responds and, after prefill, reports a non-zero memory count).
2. Ask the agent something the history knows ("what changed recently
   around \<area\>"); confirm `memory_recall` is called and the results
   are relevant.
3. Run `memory_code_map` for a structural question; confirm the
   signatures are accurate.
4. Run `memory_mine_skills`, then `memory_skill` — confirm a skill
   mined this session lists and loads without a restart.
5. Skim a session log with `analyze-logs.py` to see the tool calls and
   their latencies.

## What it is not

- **Not a vector store by default.** Lexical BM25, no neural ranker —
  though cross-lingual semantic search is an explicit opt-in.
- **Not an LLM.** No LLM is bundled or called; by default, everything
  is deterministic structure + BM25.
- **Not an unbounded archive.** A configurable disk budget (50 MB default); least-used facts age out via LFU eviction.
- **Not a substitute for AGENTS.md.** AGENTS.md is for fuzzy guidance every turn; this is for facts surfaced on demand.
- **Not lossy by intent.** The store keeps verbatim content; eviction only kicks in over budget.

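LFU eviction against a byte budget can be sketched like this. The `StoredMemory` fields (`hits` as the usage count, `pinned` as an exemption) and the tie-break by id are assumptions, not diane's schema.

```typescript
// Hypothetical LFU eviction sketch: nothing is evicted until the store is over its
// byte budget; then the least-used unpinned memories go first.

interface StoredMemory {
  id: string;
  bytes: number;
  hits: number; // how often recall has returned this memory
  pinned: boolean; // pinned memories are never evicted
}

// Returns the ids to evict, in eviction order.
function evictToBudget(memories: StoredMemory[], budgetBytes: number): string[] {
  let total = memories.reduce((sum, m) => sum + m.bytes, 0);
  if (total <= budgetBytes) return []; // under budget: keep everything verbatim

  const candidates = memories
    .filter((m) => !m.pinned)
    .sort((a, b) => a.hits - b.hits || a.id.localeCompare(b.id)); // least-used first

  const evicted: string[] = [];
  for (const m of candidates) {
    if (total <= budgetBytes) break;
    evicted.push(m.id);
    total -= m.bytes;
  }
  return evicted;
}
```

Under budget the function evicts nothing, which is the "not lossy by intent" property; over budget (for example past the 50 MB default) it removes the least-used unpinned memories until the total fits.
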
## Live code-map refresh

When `enableCodeMap` is on and the agent modifies a source file using
OpenCode's `write`, `edit`, or `patch` tool, the plugin **re-indexes
that file's code-map memory immediately** — before the agent's next
tool call — so `memory_code_map` never serves stale signatures within
the same session.

How it works: the `tool.execute.before` hook records which file a
`write`/`edit` is about to change; the `tool.execute.after` hook
(which fires once the file is on disk in its new form) calls
`ingestCodeMapForFile`, a per-file variant of the prefill walk. It
reuses the already-warm tree-sitter engine (the wasm init and grammar
loads only happen once per session, at prefill), so a single-file
re-parse costs ~milliseconds. `upsertBySubject` replaces the old
code-map memory in place — no duplicates, no accumulation.

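The before/after pairing can be sketched as a small class. The event shape (`callID`, `tool`, `filePath`), the `reindexFile` callback and the `code-map:<path>` subject key are hypothetical stand-ins for OpenCode's hook arguments, `ingestCodeMapForFile` and `upsertBySubject`; they are not those APIs.

```typescript
// Hypothetical sketch of pairing before/after tool hooks by call id.

interface ToolCall {
  callID: string;
  tool: string;
  filePath?: string; // only known for file-writing tools
}

const FILE_TOOLS = new Set(["write", "edit", "patch"]);

class CodeMapRefresher {
  private pending = new Map<string, string>(); // callID -> file about to change
  readonly store = new Map<string, string>(); // subject -> code-map memory
  private readonly reindexFile: (path: string) => string;

  constructor(reindexFile: (path: string) => string) {
    this.reindexFile = reindexFile;
  }

  // Before the tool runs: remember which file this call will change.
  before(call: ToolCall): void {
    if (FILE_TOOLS.has(call.tool) && call.filePath) this.pending.set(call.callID, call.filePath);
  }

  // After the tool runs, the file is on disk in its new form.
  after(call: ToolCall): void {
    const path = this.pending.get(call.callID);
    if (path === undefined) return;
    this.pending.delete(call.callID);
    // Upsert by subject: the old memory for this file is replaced, not duplicated.
    this.store.set(`code-map:${path}`, this.reindexFile(path));
  }
}
```

Editing the same file twice leaves one code-map memory holding the latest signatures; calls to other tools are ignored.
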
**Bash-driven changes are also tracked (since v0.0.5).** After every
`bash` tool call the plugin runs `git status --porcelain` to find
files the shell command modified or created, then refreshes the
code-map for each — up to `bashFileTrackingMaxFiles` (default 20).
This closes the long-standing gap where `git checkout other-branch`,
`npm run format`, `cargo fmt --all`, or `sed -i …` would leave stale
signatures in the index. Deletions are skipped (there's no file on
disk to re-index); renames track the destination path. Set
`bashFileTrackingMaxFiles: 0` to opt out.

The cap matters: a `git checkout` between branches can touch thousands
of files, and re-indexing each synchronously would stall the next tool
call. The default 20 covers typical commit/format/codegen workflows
without that risk. The plugin logs a `debug` line when files are
skipped over the cap so it's visible in the JSONL log without flooding
the agent.

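Turning `git status --porcelain` output into a refresh list, with the deletion, rename and cap rules described above, might look like this. `planBashRefresh` is a hypothetical name, and the sketch reads the default (v1) porcelain format without `-z`, so quoted paths containing spaces or unusual characters are not handled.

```typescript
// Hypothetical sketch: plan code-map refreshes from `git status --porcelain` (v1) output.

interface BashRefreshPlan {
  files: string[]; // paths to re-index, at most maxFiles
  skipped: number; // paths left out because of the cap
}

function planBashRefresh(porcelain: string, maxFiles = 20): BashRefreshPlan {
  if (maxFiles <= 0) return { files: [], skipped: 0 }; // 0 opts out entirely
  const paths: string[] = [];
  for (const line of porcelain.split("\n")) {
    if (line.length < 4) continue; // blank or malformed line
    const status = line.slice(0, 2); // XY status columns
    if (status.includes("D")) continue; // deleted: nothing on disk to re-index
    let path = line.slice(3);
    const arrow = path.indexOf(" -> ");
    if (arrow !== -1) path = path.slice(arrow + 4); // rename: keep the destination
    paths.push(path);
  }
  return { files: paths.slice(0, maxFiles), skipped: Math.max(0, paths.length - maxFiles) };
}
```

With the default cap of 20, a branch switch that touches hundreds of files re-indexes the first 20 and reports the rest in `skipped`, which a caller could surface in a debug log line.
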
## Live session reflection

Three behaviours, added in v0.0.5, keep what's happening *right now*
visible to the memory store:

**1. Live-session activity recording** (`recordSessionActivity`,
default on). The current session's file edits and bash commands roll
up into ONE memory under `session-trace` → `live:${sessionId}`,
upserted in place after each event. This lets the current session
recall "what have I touched so far" without scanning the OpenCode SDK,
and pre-seeds the trace so the moment this session becomes "past", a
successor sees it like any other. The memory is **not** pinned (it's
transient state — eligible for eviction). Content is bounded
(~4 KB) with a rolling list of recent bash commands; total counts
stay accurate even after detail truncation.

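A record that keeps exact totals while capping its detail could be sketched as below. The class name, the rolling window of recent commands and truncating by characters to approximate the ~4 KB bound are all assumptions for this sketch.

```typescript
// Hypothetical sketch of a bounded live-session record: counters are exact,
// detail lists roll over, and the rendered text is capped by length.

class LiveSessionTrace {
  private files = new Set<string>();
  private recentBash: string[] = [];
  private bashTotal = 0;
  private readonly maxRecentBash: number;
  private readonly maxChars: number;

  constructor(maxRecentBash = 10, maxChars = 4096) {
    this.maxRecentBash = maxRecentBash;
    this.maxChars = maxChars; // approximates a ~4 KB bound by characters
  }

  recordEdit(path: string): void {
    this.files.add(path);
  }

  recordBash(command: string): void {
    this.bashTotal++; // exact, regardless of the rolling window below
    this.recentBash.push(command);
    if (this.recentBash.length > this.maxRecentBash) this.recentBash.shift();
  }

  // Content for the single live-session memory, upserted after each event.
  render(): string {
    const text = [
      `files edited: ${this.files.size}`,
      `bash commands: ${this.bashTotal}`,
      ...[...this.files].map((f) => `edit ${f}`),
      ...this.recentBash.map((c) => `$ ${c}`),
    ].join("\n");
    // Counts come first, so truncation only ever cuts detail.
    return text.length <= this.maxChars ? text : text.slice(0, this.maxChars);
  }
}
```

After 30 commands with a window of three, the rendered memory still says `bash commands: 30` but lists only the last three.
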
**2. Post-bash code-map freshness** — covered above.

**3. Auto git re-ingest on HEAD movement** (`autoReingestGitOnHeadChange`,
default on). After every `bash` call the plugin polls
`git rev-parse HEAD`; if HEAD moved (pull / merge / rebase / checkout /
reset), it queues a background re-ingest of git history. The re-ingest
is idempotent: already-known commits are skipped via `insertIfMissing`,
so the cost is roughly linear in the number of *new* commits.
Concurrent triggers coalesce: only one re-ingest runs at a time, and
further detections re-arm the flag for the next poll. The
`memory_ingest_git` tool exposes the same logic as an explicit,
on-demand call for cases the auto-detect can't cover (a fetch-only
operation that brings new commits via another mechanism).

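HEAD-movement detection and the coalescing re-ingest can be sketched as two small classes. The names are hypothetical, and one timing detail is an assumption: here a detection during a run schedules exactly one follow-up pass as soon as the run ends, whereas the plugin re-arms its flag for the next poll.

```typescript
// Hypothetical sketch: detect HEAD movement and coalesce concurrent re-ingests.

class HeadWatcher {
  private last: string | undefined;

  // Feed the output of `git rev-parse HEAD` after each bash call.
  moved(head: string): boolean {
    const changed = this.last !== undefined && this.last !== head;
    this.last = head; // the first observation only sets the baseline
    return changed;
  }
}

class CoalescingReingest {
  private inFlight: Promise<void> | null = null;
  private rearmed = false;
  private readonly reingest: () => Promise<void>;

  constructor(reingest: () => Promise<void>) {
    this.reingest = reingest;
  }

  trigger(): Promise<void> {
    if (this.inFlight) {
      this.rearmed = true; // a run is active: remember to go again once it ends
      return this.inFlight;
    }
    this.inFlight = this.loop();
    return this.inFlight;
  }

  private async loop(): Promise<void> {
    try {
      do {
        this.rearmed = false;
        await this.reingest(); // idempotent: known commits are skipped
      } while (this.rearmed);
    } finally {
      this.inFlight = null;
    }
  }
}
```

Three triggers in quick succession produce two runs: the one in progress and a single follow-up that picks up whatever the later detections saw.
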
Together these three close the gaps surfaced by the v0.0.4 reflection
verdict: the current session's work, bash-driven file changes, and
post-merge commits are all visible to recall mid-session, not only
after a restart.

## Compatibility

Built against `@opencode-ai/plugin@1.14.x`. Runs on the Bun runtime
(what OpenCode loads plugins under) — Bun ≥ 1.1. Uses documented
hooks only — `tool` for custom tools, `event` for `lsp.client.diagnostics`,
`tool.execute.before/after` for code-map refresh and the recall-first
nudge, `client.app.log` for session logs. Storage is a SQLite database
(`bun:sqlite`, built into the Bun runtime) you can inspect with any
SQLite client.

Coexists with other plugins. With a **hook-heavy plugin** alongside it
(e.g. `oh-my-opencode`), note that the recall-first nudge mutates
`output.output` in a `tool.execute.after` hook — if you'd rather not
have two plugins post-processing tool output, set
`enableNudgeHook: false`. The nudge effect is then suppressed;
**the hooks themselves remain registered** (they still run the
code-map refresh). If the other plugin already does AST/LSP code
intelligence, setting `enableCodeMap: false` avoids redundant work
(and the grammar-wasm parse overhead) while Diane still covers the
persistent memory store, git-structure signals, session ingestion,
cross-references, and skill mining.

## License

MIT.