@psiclawops/hypermem 0.8.4 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (99)
  1. package/CHANGELOG.md +33 -0
  2. package/INSTALL.md +203 -23
  3. package/README.md +139 -216
  4. package/bench/README.md +42 -0
  5. package/bench/data-access-bench.mjs +380 -0
  6. package/bin/hypermem-bench.mjs +2 -0
  7. package/bin/hypermem-doctor.mjs +412 -0
  8. package/bin/hypermem-model-audit.mjs +339 -0
  9. package/bin/hypermem-status.mjs +491 -70
  10. package/dist/adaptive-lifecycle.d.ts +81 -0
  11. package/dist/adaptive-lifecycle.d.ts.map +1 -0
  12. package/dist/adaptive-lifecycle.js +190 -0
  13. package/dist/background-indexer.js +9 -9
  14. package/dist/budget-policy.d.ts +1 -1
  15. package/dist/budget-policy.d.ts.map +1 -1
  16. package/dist/budget-policy.js +10 -5
  17. package/dist/cache.d.ts +4 -0
  18. package/dist/cache.d.ts.map +1 -1
  19. package/dist/cache.js +2 -0
  20. package/dist/composition-snapshot-integrity.d.ts +36 -0
  21. package/dist/composition-snapshot-integrity.d.ts.map +1 -0
  22. package/dist/composition-snapshot-integrity.js +131 -0
  23. package/dist/composition-snapshot-runtime.d.ts +59 -0
  24. package/dist/composition-snapshot-runtime.d.ts.map +1 -0
  25. package/dist/composition-snapshot-runtime.js +250 -0
  26. package/dist/composition-snapshot-store.d.ts +44 -0
  27. package/dist/composition-snapshot-store.d.ts.map +1 -0
  28. package/dist/composition-snapshot-store.js +117 -0
  29. package/dist/compositor.d.ts +125 -1
  30. package/dist/compositor.d.ts.map +1 -1
  31. package/dist/compositor.js +692 -44
  32. package/dist/cross-agent.d.ts +1 -1
  33. package/dist/cross-agent.js +17 -17
  34. package/dist/doc-chunk-store.d.ts +19 -0
  35. package/dist/doc-chunk-store.d.ts.map +1 -1
  36. package/dist/doc-chunk-store.js +56 -6
  37. package/dist/dreaming-promoter.d.ts +1 -1
  38. package/dist/dreaming-promoter.js +2 -2
  39. package/dist/hybrid-retrieval.d.ts +38 -0
  40. package/dist/hybrid-retrieval.d.ts.map +1 -1
  41. package/dist/hybrid-retrieval.js +86 -1
  42. package/dist/index.d.ts +15 -6
  43. package/dist/index.d.ts.map +1 -1
  44. package/dist/index.js +33 -7
  45. package/dist/knowledge-store.d.ts +4 -1
  46. package/dist/knowledge-store.d.ts.map +1 -1
  47. package/dist/knowledge-store.js +27 -4
  48. package/dist/library-schema.d.ts +12 -8
  49. package/dist/library-schema.d.ts.map +1 -1
  50. package/dist/library-schema.js +22 -8
  51. package/dist/message-store.d.ts.map +1 -1
  52. package/dist/message-store.js +7 -3
  53. package/dist/metrics-dashboard.d.ts +18 -1
  54. package/dist/metrics-dashboard.d.ts.map +1 -1
  55. package/dist/metrics-dashboard.js +52 -14
  56. package/dist/reranker.d.ts +1 -1
  57. package/dist/reranker.js +2 -2
  58. package/dist/schema.d.ts +1 -1
  59. package/dist/schema.d.ts.map +1 -1
  60. package/dist/schema.js +28 -1
  61. package/dist/seed.d.ts +1 -1
  62. package/dist/seed.d.ts.map +1 -1
  63. package/dist/seed.js +3 -1
  64. package/dist/session-flusher.d.ts +2 -2
  65. package/dist/session-flusher.js +2 -2
  66. package/dist/spawn-context.d.ts +1 -1
  67. package/dist/spawn-context.js +1 -1
  68. package/dist/topic-store.js +5 -5
  69. package/dist/topic-synthesizer.d.ts +20 -0
  70. package/dist/topic-synthesizer.d.ts.map +1 -1
  71. package/dist/topic-synthesizer.js +114 -4
  72. package/dist/trigger-registry.d.ts +1 -1
  73. package/dist/trigger-registry.d.ts.map +1 -1
  74. package/dist/trigger-registry.js +14 -6
  75. package/dist/types.d.ts +273 -3
  76. package/dist/types.d.ts.map +1 -1
  77. package/dist/version.d.ts +7 -7
  78. package/dist/version.d.ts.map +1 -1
  79. package/dist/version.js +17 -7
  80. package/docs/DIAGNOSTICS.md +205 -0
  81. package/docs/INTEGRATION_VALIDATION.md +186 -0
  82. package/docs/MIGRATION.md +9 -6
  83. package/docs/MIGRATION_GUIDE.md +125 -101
  84. package/docs/ROADMAP.md +238 -20
  85. package/docs/TUNING.md +30 -6
  86. package/install.sh +159 -408
  87. package/memory-plugin/LICENSE +190 -0
  88. package/memory-plugin/README.md +20 -0
  89. package/memory-plugin/dist/index.js +50 -0
  90. package/memory-plugin/package.json +2 -2
  91. package/package.json +18 -4
  92. package/plugin/LICENSE +190 -0
  93. package/plugin/README.md +20 -0
  94. package/plugin/dist/index.d.ts +55 -0
  95. package/plugin/dist/index.d.ts.map +1 -1
  96. package/plugin/dist/index.js +362 -42
  97. package/plugin/dist/index.js.map +1 -1
  98. package/plugin/package.json +2 -2
  99. package/scripts/install-runtime.mjs +13 -3
package/README.md CHANGED
@@ -8,7 +8,7 @@

  hypermem is a SQLite-backed runtime context engine for OpenClaw agents.

- **Quick install** (interactive, detects hardware, writes config):
+ **Quick install** (runtime staging + guided OpenClaw wiring):

  ```bash
  npm install @psiclawops/hypermem && npx hypermem-install
@@ -20,14 +20,23 @@ Or via the shell installer:
  curl -fsSL https://raw.githubusercontent.com/PsiClawOps/hypermem/main/install.sh | bash
  ```

- Or install manually via `npm install @psiclawops/hypermem` see [Installation](#installation) for plugin wiring, embedding setup, and step-by-step paths.
+ Or install manually via `npm install @psiclawops/hypermem` - see [Installation](#installation) for the full declarative plugin path, verification checkpoints, and setup variants.

+ Release operators should also read:
+
+ - [INSTALL.md](./INSTALL.md) - canonical fresh install and upgrade guide
+ - [docs/INTEGRATION_VALIDATION.md](./docs/INTEGRATION_VALIDATION.md) - end-to-end integration validation contract
+ - [docs/DIAGNOSTICS.md](./docs/DIAGNOSTICS.md) - status, model audit, compose, trim, and release diagnostics
+
+ A successful `hypermem-install` only stages the runtime. HyperMem is active only after OpenClaw config is wired, the gateway restarts, and logs show compose activity.

  ---

  ## The problem

- Every LLM conversation is composed at runtime. The model sees only what's in the prompt. It has no memory of prior sessions, no access to decisions made last week, no awareness of work that happened before this context window opened.
+ Your agent can feel sharp on day one, then start slipping as the work accumulates.
+
+ Not because the model got worse. Because each turn is composed at runtime, and the model only sees what made it into this prompt. It has no native memory of prior sessions, no direct access to last week's decisions, and no awareness of work that happened before this context window opened.

  Two questions make this concrete:

@@ -36,36 +45,38 @@ Two questions make this concrete:
  | *"What was Caesar's greatest military victory?"* | Training data | ✅ Answered correctly, no session context needed |
  | *"What did we decide about the retry logic last week?"* | Nothing (prior session is gone) | ❌ The decision existed only in that session |

- The difference isn't intelligence. It's what was in the prompt. Two failure modes follow:
+ The difference is not intelligence. It is prompt access. Three failure modes follow:

- **New-session amnesia.** The agent restarts and everything is gone. Decisions, preferences, work in progress: erased at the session boundary. Operators re-explain context. Agents re-ask questions already answered.
+ **New-session amnesia.** The agent restarts and the work disappears with the session. Decisions, preferences, and work in progress vanish at the boundary. Operators re-explain context. Agents re-ask questions that were already settled.

- **Compaction crunch.** Long sessions fill the context window. The runtime summarizes to make room. Specifics (tool output, exact decisions, file paths) are lost in the summary. The agent keeps running, but degraded.
+ **Compaction crunch.** Long sessions fill the context window. The runtime summarizes to make room. Specifics like tool output, exact decisions, and file paths are the first things to get flattened. The agent keeps running, but with less ground truth than it had a few turns ago.

- **Bloated context.** 128k tokens doesn't mean 128k of useful prompt. Without active curation, agents fill the window with stale history, redundant instructions, and memory that isn't relevant to this turn. A bigger context window just means more room to waste. The information is in the prompt somewhere, buried under content irrelevant to this turn.
+ **Bloated context.** 128k tokens does not mean 128k of useful prompt. Without active selection, agents fill the window with stale history, repeated instructions, and memory that does not matter to this turn. A bigger window just gives you more room to waste.

  ---

  ## What OpenClaw provides today

- OpenClaw addresses both failure modes with structured guidance files injected into every session:
+ OpenClaw already gives agents a stronger baseline than most stacks. It injects structured guidance into every session:

  | File | What it contributes | Survives session restart? |
  |---|---|---|
  | `SOUL.md` | Agent identity, voice, principles | ✅ always injected |
  | `USER.md` | User preferences, working style | ✅ always injected |
- | Task and workspace instruction files (for example AGENTS.md, job files, and related guidance) | ✅ always injected |
+ | Task/workspace instructions | `AGENTS.md`, job files, and related guidance | ✅ always injected |
  | `MEMORY.md` | Hand-curated decisions, facts, patterns | ✅ if manually maintained |

- These are powerful for identity and preferences. But the retry logic decision from last week? If nobody manually captured it into `MEMORY.md`, that session boundary erased it. The system is only as strong as its last manual update.
+ These files are strong at identity, user fit, and working style. They are not a durable memory system by themselves. If nobody copied the retry-logic decision into `MEMORY.md`, the next session does not know it happened.

- OpenClaw also ships compaction safeguards and hybrid file search. That's a solid baseline. It has limits. hypermem closes both gaps.
+ OpenClaw also ships compaction safeguards and hybrid file search. That is a solid baseline. What is still missing is durable recall across sessions, active prompt selection under pressure, and writing discipline that holds across long-running work. hypermem adds those layers.

  ---

  ## hypermem

- Four SQLite-backed memory databases, sub-millisecond retrieval, no external database services required. Runs in-process with local SQLite storage and local Nomic embeddings by default, with optional hosted embeddings for L3.
+ OpenClaw gives agents a strong starting shape: identity files, user guidance, task framing, compaction safeguards, and hybrid file search. What it does not add by default is durable recall across session boundaries. When a useful decision falls out of the prompt and nobody hand-copied it into `MEMORY.md`, it is gone.
+
+ hypermem closes that gap with four SQLite-backed memory layers that stay local, run in-process, and remain queryable across sessions. No external database service. No retrieval stack to babysit.

  | Layer | What it holds | Speed |
  |---|---|---|
@@ -74,22 +85,24 @@ Four SQLite-backed memory databases, sub-millisecond retrieval, no external data
  | **L3 Semantic** | Finds related content even when the words don't match. | 0.29ms |
  | **L4 Knowledge** | Facts, wiki pages, episodes, preferences. Shared across agents. | 0.09ms |

- Everything is retained. Storage survives session boundaries. The retry logic decision from last week, the deployment preferences from last month, the architecture choices from day one: all queryable, all available for composition.
+ Durable context stays in SQLite and remains queryable across session boundaries. The retry logic decision from last week, the deployment preferences from last month, and the architecture choices from day one can be pulled back in when they matter.

- **Session warming.** Before the first turn fires, hypermem pre-loads the agent's full working state from its SQLite-backed memory stores and hot `:memory:` cache: recent history, facts ranked by confidence and recency, active topic context, cached embeddings for fast semantic recall. The agent's first reply draws from everything that was in scope at the end of the last session. The agent picks up where it left off.
+ That changes OpenClaw in a few concrete ways. Starts are warm instead of blank because recent history, ranked facts, active topics, and cached semantic state are loaded before the first turn. Recall survives wording drift because FTS5, sqlite-vec, RRF fusion, and an optional reranker can recover the same idea through different phrasing. Time-aware facts can answer "last week" and "before the release" as retrieval problems instead of vague prompt guessing. Shared knowledge stops living in one agent's scratchpad because `library.db` holds facts, docs, episodes, preferences, fleet state, and output standards with visibility controls.

  ---

  ## hypercompositor

- Every memory system stores. Almost none compose.
+ Storage is only half the problem. The harder question is what actually reaches the model.
+
+ Most memory systems can save useful state. Far fewer can decide, turn by turn, what belongs in the prompt right now and what should stay on disk. Without that layer, long sessions bloat, tool output crowds out current work, and a larger context window just gives you more room to waste tokens.

- Your agent has four layers of stored context, but what shows up in the prompt? How much of the token budget goes to stale content? Who decides what's relevant to this specific turn?
+ hypercompositor queries all four memory layers in parallel, scores what matters for the current turn, and composes a fresh prompt inside a fixed budget. Content that does not fit is not destroyed. It stays in storage and can win its way back in when the topic returns.

- The hypercompositor queries all four layers in parallel on every turn and composes context within a fixed token budget. No transcript accumulates. No lossy transcript summarization. Amnesia isn't a storage problem; the memories exist, but nobody composed them into a coherent prompt. Compaction isn't inevitable; content that doesn't fit this turn stays in storage instead of being destroyed.
+ That changes OpenClaw at the prompt boundary. Selection replaces loss. Tool calls and results stay paired, recent turns stay readable, and older payloads compress by age instead of being flattened blindly. Quiet topics compile into structured wiki pages so the next turn can inject the decision trail without replaying raw transcript. Duplicate prompt spend drops because facts, doc chunks, semantic hits, and bootstrap content are fingerprinted before insertion. Subagents inherit a bounded handoff instead of a random slice of parent history.

- **Bigger context windows don't help if you fill them with stale history.**
- 128k tokens of stale history and irrelevant memory is worse than 32k of precisely selected content. 9 budget categories, priority-ordered, greedy-fill. Every token in the prompt earned its spot.
+ **A bigger context window does not fix bad composition.**
+ 128k tokens of stale history is worse than 32k of selected context. hypercompositor treats prompt space as a constrained resource, not a dumping ground.

  ### What the model actually sees

@@ -123,7 +136,7 @@ OpenClaw default hypercompositor
  ────────────────────────────────      ────────────────────────────────
  message → append to transcript        message → detect active topic
  transcript full → trim oldest         query 4 storage layers in parallel
- trimmed content → summarize (lossy)   budget allocator: 9 slots, fixed cap
+ trimmed content → summarize (lossy)   budget allocator: 10 slots, fixed cap
  send transcript to model              tool compression by turn age
  model responds → append again         keystone guard + hyperform profile
                                        composed prompt → model
@@ -148,7 +161,7 @@ When it fills: When budget is exceeded:

  High-signal turns are marked as keystones and survive pressure trimming ahead of ordinary history.

- The compositor fills 9 slots in priority order (system prompt → identity → hyperform → history → facts → wiki → semantic recall → cross-session → action summary). Each slot consumes tokens from the remaining budget before the next slot runs. Slots that don't fit this turn stay in storage, not destroyed.
+ The compositor fills 10 slots in priority order (system prompt → identity → hyperform → history → recent tools → keystones → wiki/knowledge → facts → semantic recall → reserve/action context). Each slot consumes tokens from the remaining budget before the next slot runs. Slots that do not fit this turn stay in storage, not destroyed.

  For the full fill order, budget formula, and all configuration knobs, see **[Tuning](#tuning)** below and **[docs/TUNING.md](./docs/TUNING.md)**.

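Editor's note: the priority-ordered greedy fill described in this hunk can be sketched in a few lines. This is an illustrative model only; the `Slot` shape, the `fillBudget` name, and the token counts are hypothetical, not hypermem's actual API.

```typescript
// Sketch of greedy, priority-ordered budget fill (hypothetical shapes).
interface Slot {
  name: string;
  tokens: number; // cost of injecting this slot's content
}

function fillBudget(slots: Slot[], budget: number): string[] {
  const included: string[] = [];
  let remaining = budget;
  for (const slot of slots) {       // slots arrive in priority order
    if (slot.tokens <= remaining) { // greedy: take the slot if it fits
      included.push(slot.name);
      remaining -= slot.tokens;
    }
    // slots that do not fit stay in storage; nothing is destroyed
  }
  return included;
}
```

The key property matches the prose: a slot that misses the budget this turn is skipped, not truncated, and a cheaper lower-priority slot can still land.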
@@ -156,9 +169,13 @@ For the full fill order, budget formula, and all configuration knobs, see **[Tun

  ## hyperform

- Raw model output has two problems. It drifts from your standards (sycophancy, hedging, pagination, formatting) and it drifts from your facts (confabulation, contradiction, stale claims). hyperform handles both: normalization enforces consistency, confabulation resistance checks output against what's actually stored.
+ Good memory is wasted if the model still writes like it has no standards.

- Consistent output isn't just aesthetic. A model that paginates short answers, preambles with filler, or inflates lists uses more output tokens per turn. Over hundreds of turns, that compounds into real cost. hyperform directives compress output at the source: fewer tokens generated means lower API spend per session, and less context pressure for subsequent turns.
+ OpenClaw can preserve identity and instruction. That does not guarantee consistent delivery. Models still drift into filler openings, hedging, bloated lists, pagination, and stale claims. Over long sessions that is not just annoying copy. It is token waste, weaker signal, and lower trust in what gets written back into memory.
+
+ hyperform adds a writing contract at prompt time. Output profiles inject shared standards before generation. Model directives correct known provider habits. Confabulation resistance checks candidate claims against stored facts before new memory is recorded.
+
+ That gives OpenClaw something it does not get from raw prompting alone: fleet-wide writing discipline, model-aware correction, and tighter claim hygiene at the memory boundary. The point is not to post-process prose into something artificial. The point is to make the first draft cleaner, shorter, and harder to contaminate with unsupported claims.

  ### Behavior standards

@@ -184,14 +201,14 @@ Model adaptation is only active at the `full` tier. At `light` and `standard`, m

  The `model_output_directives` table starts empty. You populate it with corrections for the models you run. See [docs/TUNING.md](./docs/TUNING.md#creating-custom-entries) for the schema and SQL examples.

- ### Before and after
+ ### Illustrative before and after

- The same prompt, GPT-5.4, with and without `hyperformProfile: "light"`:
+ The example below shows the intended effect of `hyperformProfile: "light"`. hyperform is prompt-time shaping, not a deterministic post-generation rewrite engine:

  ```
  Prompt: "How should I size my context window budget for a long-running agent session?"

- WITHOUT normalization (GPT-5.4 default):
+ WITHOUT hyperform shaping (GPT-5.4 default):
  Here are the key factors to consider when sizing your context window budget:

  **1. Session depth**
@@ -211,46 +228,12 @@ tool context, and leave ~30k as allocator reserve. hypermem handles slot competi
  automatically. Set `reserveFraction` to your preferred floor and let the compositor fill.
  ```

- **Confabulation resistance** checks output against stored facts before claims are recorded. No LLM call. Pattern matching against the fact corpus, with confidence scoring and contradiction detection. Unsupported claims are flagged, contradictions surface in diagnostics, and a confabulation risk score is attached to the stored episode.
+ **Confabulation resistance** checks stored claims against existing facts before new memory entries are recorded. No LLM call. Pattern matching against the fact corpus, with confidence scoring and contradiction detection. Unsupported claims are flagged, contradictions surface in diagnostics, and a confabulation risk score is attached to the stored episode.

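Editor's note: as a rough illustration of what LLM-free claim checking can look like, the sketch below scores claims by token overlap against a fact corpus. The function names, the 0.6 overlap threshold, and the risk formula are hypothetical; hypermem's actual confidence scoring and contradiction detection are richer than this.

```typescript
// Hypothetical sketch of pattern-matching claim support: no LLM call,
// just token overlap between each claim and the stored fact corpus.
function tokenize(s: string): Set<string> {
  return new Set(s.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// A claim counts as "supported" here if >= 60% of its tokens appear
// in at least one stored fact. Returns unsupported / total as a risk score.
function confabulationRisk(claims: string[], facts: string[]): number {
  const factTokens = facts.map(tokenize);
  let unsupported = 0;
  for (const claim of claims) {
    const ct = [...tokenize(claim)];
    const supported = factTokens.some(
      (ft) => ct.filter((t) => ft.has(t)).length / ct.length >= 0.6,
    );
    if (!supported) unsupported++;
  }
  return claims.length ? unsupported / claims.length : 0;
}
```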
  Set `compositor.hyperformProfile` to `light`, `standard`, or `full`. For tier selection guidance, configuration details, and custom entry creation, see **[Tuning](#tuning)** below and **[docs/TUNING.md](./docs/TUNING.md)**.

  ---

- ## What it solves
-
- ### Tool output that doesn't take over
-
- Agentic sessions generate massive tool output. Left unmanaged, old results crowd out current reasoning. hypermem compresses tool history by age: recent clusters stay full, older clusters are capped, and the oldest collapse to short stubs while preserving tool call/result integrity. The budget goes to current work, not last hour's npm test output.
-
- ### Knowledge that outlasts the conversation
-
- Most memory systems store what was said. hypermem synthesizes what was learned.
-
- When a topic goes quiet, hypermem compiles the thread into a structured wiki page: decisions, open questions, artifacts, participants. When the topic resurfaces, the agent gets a compact structured summary rather than a raw history replay.
-
- OpenClaw 2026.4.7 ships memory wiki for structured storage. hypermem goes further: wiki pages are synthesized automatically and injected by the compositor within token budget, backed by SQLite memory databases instead of an external cache service.
-
- ### Subagents that hit the ground running
-
- Spawned subagents inherit a bounded context block: recent parent turns, session-scoped documents, and relevant facts. Scope is isolated from the shared library. Documents are cleaned up on completion.
-
- ### Context that doesn't repeat itself
-
- Retrieval paths pull from four layers, trigger shortcuts, temporal indexes, open-domain FTS5, semantic recall, and cross-session summaries. Without dedup, the same fact surfaces through multiple paths and wastes budget on repetition.
-
- hypermem runs content fingerprint dedup across all compose-time retrieval. Every fact, temporal result, open-domain hit, and semantic recall entry is normalized and fingerprinted on a 120-char prefix. O(1) lookup in a shared set catches duplicates regardless of which retrieval path produced them, including rephrased near-duplicates that substring matching missed. Diagnostics track dedup counts and fingerprint collisions per compose call.
-
- Identity content (SOUL.md, USER.md, IDENTITY.md) and doc chunks already injected by OpenClaw's bootstrap are fingerprinted before retrieval runs, so the compositor never double-injects content the runtime already placed in the prompt.
-
- ### Integrity under failure
-
- The background indexer runs a startup integrity check against `library.db` on every boot. If the schema is corrupt, tables are missing, or critical indexes are damaged, the indexer enters circuit-breaker mode: it logs the failure, skips indexing for the session, and avoids cascading writes into a broken database. The agent still runs with cached and in-memory data while the operator is notified.
-
- SQL queries that interpolate datetime values are fully parameterized. FTS5 trigger terms are quoted to prevent injection through crafted content. These aren't theoretical: agentic sessions ingest arbitrary user and tool output into the fact store, and unparameterized queries on that path were a real attack surface.
-
- ---
-
  ## Pressure management

  hypermem manages context pressure automatically through four escalating paths. Most sessions never need manual intervention. For trigger thresholds and path details, see [Pressure management](#pressure-management-1) below.
@@ -280,7 +263,15 @@ No configuration required for any of these:

  ## Speed

- Benchmarked against a production database: 5,104 facts, 28,441 episodes, 847 knowledge entries, 42MB. 1,000 iterations, 50 warmup discarded, single-process isolation.
+ HyperMem ships a user-facing benchmark so operators can validate local memory access speed against their own dataset:
+
+ ```bash
+ hypermem-bench --iterations 1000 --warmup 50 --agent main
+ ```
+
+ The benchmark reports min, average, p50, p95, p99, and max timings for the storage paths present in the install: message hot-path lookups, session/conversation lookup, message FTS, facts, episodes, topics, fleet records, and doc chunks. It reads from `~/.openclaw/hypermem` by default, or from `HYPERMEM_DATA_DIR` / `--data-dir`.
+
+ Reference run, production database: 5,104 facts, 28,441 episodes, 847 knowledge entries, 42MB, 1,000 iterations, 50 warmup discarded, single-process isolation.

  | Operation | avg | p50 | p95 |
  |---|---|---|---|
@@ -292,9 +283,10 @@ Benchmarked against a production database: 5,104 facts, 28,441 episodes, 847 kno
  | L4 FTS5 + agentId filter | 0.07ms | 0.06ms | 0.10ms |
  | L4 knowledge query | 0.09ms | 0.08ms | 0.14ms |
  | Recency decay scoring (28 rows, in JS) | 0.003ms | 0.002ms | 0.005ms |
- > Query planner uses compound indexes on agentId + sort key; FTS5 performance improved 25% from baseline after index additions despite a 47% increase in stored data.

- L1 and L4 structured retrieval are sub-millisecond. Vector embeddings are computed asynchronously after the assistant replies and cached in the in-memory layer, not on the primary composition call path. Users never wait for an embedding computation.
+ L1 and L4 structured retrieval are sub-millisecond on this dataset. Vector embeddings are computed asynchronously after the assistant replies and cached for later recall; hosted reranker latency depends on the chosen provider and is measured separately from SQLite access timings.
+
+ For reproducible commands and interpretation notes, see **[docs/DIAGNOSTICS.md](./docs/DIAGNOSTICS.md#memory-access-benchmark)**.

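Editor's note: the reported statistics can be reproduced from raw timing samples with standard percentile math. A minimal sketch, assuming timings in milliseconds and a warmup prefix to discard; this is not hypermem-bench's actual implementation.

```typescript
// Nearest-rank percentile over an already-sorted sample array.
function percentile(sortedMs: number[], p: number): number {
  const idx = Math.min(
    sortedMs.length - 1,
    Math.ceil((p / 100) * sortedMs.length) - 1,
  );
  return sortedMs[Math.max(0, idx)];
}

// Drop warmup iterations, then summarize the remaining samples.
function summarize(timingsMs: number[], warmup: number) {
  const samples = [...timingsMs.slice(warmup)].sort((a, b) => a - b);
  const avg = samples.reduce((s, x) => s + x, 0) / samples.length;
  return {
    min: samples[0],
    avg,
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
    max: samples[samples.length - 1],
  };
}
```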
  ---

@@ -319,28 +311,36 @@ The Plugin column is the npm package name. The ID column is what goes in `plugin

  Retrieval follows a fixed pipeline on every compose call:

- 1. **Trigger registry** fires first. Nine pattern triggers check for exact-match shortcuts. If one hits, scoped FTS5 prefix queries (`word1* OR word2*`) run against L4 collections and return immediately.
- 2. **Semantic fallback** fires when no trigger matches. Bounded hybrid retrieval runs FTS5 + KNN in parallel, then merges via Reciprocal Rank Fusion (RRF). BM25 ranks and KNN cosine distances combine into a single ordered result.
- 3. **Noise floor** filters anything below RRF 0.008 before it reaches the compositor.
+ 1. **Active facts** are ranked by confidence and recency.
+ 2. **Temporal retrieval** runs when the query has time signals.
+ 3. **Open-domain retrieval** handles broad exploratory queries over indexed memory.
+ 4. **Knowledge and preference blocks** add structured library context.
+ 5. **Hybrid semantic recall** runs FTS5 and KNN/vector search, then merges candidates with Reciprocal Rank Fusion (RRF).
+ 6. **Optional reranking** reorders fused candidates when a reranker is configured. Supported providers include ZeroEntropy, OpenRouter, and Ollama. If the reranker is absent, fails, times out, or has too few candidates, HyperMem keeps the original RRF order.
+ 7. **Trigger-based doc retrieval** pulls doctrine, policy, and workspace chunks by trigger match, with semantic fallback on misses.
+ 8. **Session-scoped spawn context** and **cross-session context** are added when relevant.

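Editor's note: the Reciprocal Rank Fusion named in step 5 is a standard technique: each retriever contributes 1/(k + rank) per document, and documents are ordered by the summed score. A minimal sketch with the conventional k = 60; hypermem's internal constant and result shapes may differ.

```typescript
// RRF fusion sketch: score(d) = sum over rankings of 1 / (k + rank_i(d)).
// Documents appearing high in multiple rankings rise to the top.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, idx) => {
      const rank = idx + 1; // 1-based rank within this retriever's list
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A document ranked second by both FTS and KNN beats one ranked first by only a single retriever, which is the behavior hybrid retrieval wants.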
326
- FTS5 queries use compound indexes on `agentId + sort key` and prefix optimization (3+ chars, capped at 8 terms, OR queries). These indexes yielded a 25% read improvement over baseline despite a 47% increase in stored data.
323
+ Diagnostics expose reranker status, candidate count, and provider, so operators can tell whether a turn used RRF only or reranked retrieval. FTS5 queries use compound indexes on `agentId + sort key` and prefix optimization (3+ chars, capped at 8 terms, OR queries).
327
324
 
328
- ### Retrieval pipeline
325
+ ### Library and fleet data
329
326
 
330
327
  **L4: Library DB.** Per-agent storage can't hold shared knowledge. Facts established by one agent, wiki pages synthesized from cross-agent topics, shared registry state: these belong to the system, not one agent. One shared SQLite database:
 
 | Collection | What it holds |
 |---|---|
- | Facts | Claims with confidence scoring, domain, expiry, supersedes chains |
- | Knowledge | Domain/key/value structured data with full-text search |
- | Episodes | Significant events with impact scores and participant tracking |
- | Topics | Cross-session thread tracking and synthesized wiki pages |
- | Preferences | Operator behavioral patterns |
- | Fleet Registry | Agent registry with tier, org, and capability metadata |
- | System Registry | Service state and lifecycle |
- | Work Items | Work queue with status transitions and FTS5 |
- | Session Registry | Session lifecycle tracking |
- | Desired State | Per-agent config targets; compares running config against desired at gateway startup and surfaces drift for operator review |
+ | Facts | Claims with confidence, visibility, decay, temporal validity, and supersession chains |
+ | Knowledge / wiki | Domain knowledge and synthesized topic pages with full-text search |
+ | Episodes | Significant events, decisions, discoveries, participants, and source links |
+ | Topics | Cross-session thread tracking and topic lifecycle state |
+ | Preferences | Operator and agent behavior patterns |
+ | Documents | Chunked workspace/governance docs, doc sources, and trigger retrieval metadata |
+ | Knowledge graph | Links between facts, knowledge, topics, episodes, agents, and preferences |
+ | Fleet registry | Agents, orgs, tiers, capabilities, and fleet topology |
+ | Desired state | Per-agent config targets, config events, and drift detection |
+ | System / work state | Service state, system events, work items, and work events |
+ | Sessions | Session registry, lifecycle events, and extraction counters |
+ | Output standards | Fleet output standards, model directives, and output metrics |
+ | Temporal / expertise / audits | Temporal index, expertise patterns, contradiction audits, and indexer watermarks |
 
  Facts are ranked by `confidence × recencyDecay`, where decay is exponential with a configurable half-life: recent, high-confidence facts float to the top while stale entries yield budget to newer knowledge.
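The ranking rule above can be sketched in a few lines. This is a minimal illustration of `confidence × recencyDecay` with an exponential half-life, not the package's internal code; the `Fact` shape and the 30-day half-life default are assumptions.

```typescript
// Illustrative sketch of confidence × recencyDecay ranking.
// Field names and the half-life value are hypothetical.
interface Fact {
  claim: string;
  confidence: number; // 0..1 confidence score
  ageDays: number;    // days since the fact was recorded or updated
}

const HALF_LIFE_DAYS = 30; // configurable half-life (illustrative default)

// Exponential decay: the score halves every halfLifeDays.
function recencyDecay(ageDays: number, halfLifeDays = HALF_LIFE_DAYS): number {
  return Math.pow(2, -ageDays / halfLifeDays);
}

// Recent, high-confidence facts float to the top; stale entries sink.
function rankFacts(facts: Fact[]): Fact[] {
  return [...facts].sort(
    (a, b) =>
      b.confidence * recencyDecay(b.ageDays) -
      a.confidence * recencyDecay(a.ageDays),
  );
}
```

A 120-day-old fact at confidence 0.9 scores 0.9 × 2⁻⁴ ≈ 0.056, so a fresh 0.8-confidence fact (≈ 0.78) outranks it.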
 
@@ -371,7 +371,7 @@ Facts are ranked by `confidence × recencyDecay`, where decay is exponential wit
 
 keystone guard ──► high-signal turns survive pressure
 
- hyperform ──► output normalization directives
+ hyperform ──► output profile directives
 
 composed prompt
 
@@ -386,20 +386,19 @@ Slot-level budget allocation is shown in the [hypercompositor diagram](#what-the
 
 ## Requirements
 
- **Current release: hypermem 0.8.2.** Changelog: [CHANGELOG.md](./CHANGELOG.md)
+ **Current release: hypermem 0.9.0.** Changelog: [CHANGELOG.md](./CHANGELOG.md)
 
 | Requirement | Version | Notes |
 |---|---|---|
 | **Node.js** | `>=22.0.0` | Required for native `node:sqlite` module |
- | **better-sqlite3** | `^11.x` | Installed automatically via npm; powers L1 in-memory and L4 library |
 | **sqlite-vec** | `0.1.9` | Bundled; no separate install needed |
 
- SQLite is a library, not a service. All four layers run in-process with no external daemons. The nomic embedder on Ollama is the heaviest component, and it is lighter than pgvector or any hosted vector database.
+ SQLite is a library, not a service. All four layers run in-process with no external database daemon. Embeddings are optional: run without embeddings for lightweight FTS-only mode, use Ollama for local embeddings, or configure a hosted provider such as OpenRouter or Gemini.
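The three embedding modes correspond to three shapes of the `embedding` option. The values below are copied from examples elsewhere in this diff (the hosted API key is a placeholder there too), so treat them as starting points rather than a schema reference:

```typescript
// Three embedding modes, with values taken from this package's own examples.
// Pick exactly one when constructing HyperMem or writing config.json.
const ftsOnly = { provider: 'none' }; // lightweight mode: FTS5 keyword search only

const localOllama = { ollamaUrl: 'http://localhost:11434', model: 'nomic-embed-text' };

const hostedOpenAICompatible = {
  provider: 'openai',
  openaiApiKey: 'sk-or-...', // placeholder, as in the original example
  openaiBaseUrl: 'https://openrouter.ai/api/v1',
  model: 'qwen/qwen3-embedding-8b',
  dimensions: 4096,
  batchSize: 128,
};
```

Starting with `ftsOnly` and adding a provider later is explicitly supported; stored data is kept.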
 
 **Runtime version constants** (importable from the package):
 ```typescript
 import {
- ENGINE_VERSION, // '0.8.2'
+ ENGINE_VERSION, // '0.9.0'
 MIN_NODE_VERSION, // '22.0.0'
 SQLITE_VEC_VERSION, // '0.1.9'
 MAIN_SCHEMA_VERSION, // 10 (messages.db)
@@ -413,105 +412,77 @@ Schema versions are stamped into each database on startup and checked on open. A
 
 ## Installation
 
- **Requirements:** Node.js 22+, OpenClaw with context engine plugin support. No standalone SQLite install needed (uses Node 22 built-in `node:sqlite`). Embedding provider is optional for first install.
+ **Requirements:** Node.js 22+, OpenClaw with context engine plugin support. No standalone SQLite install is needed because HyperMem uses Node 22 `node:sqlite`. Embeddings are optional on first install.
 
- hypermem works two ways:
- - **As a library** — import directly into your own Node.js code. No OpenClaw required.
- - **As an OpenClaw plugin** — replaces the default context engine. Requires a running OpenClaw gateway.
+ This README is OpenClaw-first. For non-OpenClaw library usage, see **[INSTALL.md § Non-OpenClaw usage](./INSTALL.md#non-openclaw-usage)**.
 
- ### Library usage (no OpenClaw required)
+ ### OpenClaw quickstart
 
 ```bash
 npm install @psiclawops/hypermem
+ npx hypermem-install
 ```
 
- ```typescript
- import { HyperMem } from '@psiclawops/hypermem';
- import { join } from 'node:path';
- import { homedir } from 'node:os';
-
- const hm = await HyperMem.create({
- dataDir: join(homedir(), '.openclaw', 'hypermem'),
- embedding: { provider: 'none' },
- });
-
- await hm.recordUserMessage('my-agent', 'session-1', 'Hello');
- const composed = await hm.compose({
- agentId: 'my-agent',
- sessionKey: 'session-1',
- prompt: 'Hello',
- tokenBudget: 4000,
- provider: 'anthropic',
- });
- ```
+ `hypermem-install` stages the runtime payload into `~/.openclaw/plugins/hypermem`. It does **not** modify OpenClaw config and does **not** restart the gateway. HyperMem is active only after OpenClaw is wired, restarted, and compose activity appears in logs.
 
- That's it. No gateway, no plugins, no config files. See [API](#api) for the full interface.
+ Install states:
 
- ### OpenClaw plugin install (from source)
-
- > **Release note:** if the npm package you installed does not contain `install:runtime`, you are on an older public release. Use the source-clone path below or wait for `0.8.4+`.
-
- ```bash
- git clone https://github.com/PsiClawOps/hypermem.git
- cd hypermem
- npm install && npm run build
- npm --prefix plugin install && npm --prefix plugin run build
- npm --prefix memory-plugin install && npm --prefix memory-plugin run build
- npm run install:runtime
- ```
+ | State | Meaning |
+ |---|---|
+ | Package installed | npm package is present |
+ | Runtime staged | plugin payload copied into `~/.openclaw/plugins/hypermem` |
+ | OpenClaw wired | `plugins.load.paths`, `plugins.slots.contextEngine`, and `plugins.slots.memory` point at HyperMem |
+ | Runtime loaded | gateway restarted and both plugins loaded |
+ | Runtime active | logs show `hypermem initialized` and compose activity |
 
- `install:runtime` stages the runtime payload into `~/.openclaw/plugins/hypermem` and prints the exact config commands to wire the plugins. It does not finish wiring automatically. Before running them, create the data directory and config:
+ Minimal starter config for lightweight FTS-only mode:
 
 ```bash
 mkdir -p ~/.openclaw/hypermem
 cat > ~/.openclaw/hypermem/config.json <<'JSON'
 {
- "embedding": {
- "provider": "none"
+ "embedding": { "provider": "none" },
+ "compositor": {
+ "budgetFraction": 0.55,
+ "contextWindowReserve": 0.25,
+ "targetBudgetFraction": 0.50,
+ "warmHistoryBudgetFraction": 0.27,
+ "maxFacts": 25,
+ "maxHistoryMessages": 500,
+ "maxCrossSessionContext": 4000,
+ "maxRecentToolPairs": 3,
+ "maxProseToolPairs": 10,
+ "keystoneHistoryFraction": 0.15,
+ "keystoneMaxMessages": 12,
+ "wikiTokenCap": 500
 }
 }
 JSON
 ```
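For intuition about the fraction knobs, a back-of-envelope sketch, assuming each fraction multiplies directly against the model's context window (an assumption about the semantics; docs/TUNING.md is the authority):

```typescript
// Back-of-envelope reading of the starter fractions, ASSUMING they are
// simple multipliers on the model context window. Illustrative only.
const contextWindow = 128_000; // e.g. a 128k-token model

// contextWindowReserve 0.25 → 32,000 tokens held back
const reserved = Math.round(0.25 * contextWindow);
// budgetFraction 0.55 → 70,400-token compose ceiling
const budgetCap = Math.round(0.55 * contextWindow);

console.log({ reserved, budgetCap });
```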
 
- This sets lightweight mode (FTS5 keyword search, no embedding provider needed). Add an embedding provider later for semantic search without losing stored data. See [INSTALL.md](./INSTALL.md#embedding-providers) for options.
-
- Wire the plugins into OpenClaw:
-
- > **⚠️ Merge, don't overwrite.** If you already have values in `plugins.load.paths` or `plugins.allow`, check them first and include your existing entries alongside the new ones. Replacing the list drops whatever was there before.
- >
- > ```bash
- > openclaw config get plugins.allow
- > openclaw config get plugins.load.paths
- > ```
+ Then merge the staged plugin paths into OpenClaw config and set the slots:
 
 ```bash
- # Use a variable to avoid shell quote-escaping issues with $HOME:
+ openclaw config get plugins.load.paths
+ openclaw config get plugins.allow
+
 HYPERMEM_PATHS="[\"${HOME}/.openclaw/plugins/hypermem/plugin\",\"${HOME}/.openclaw/plugins/hypermem/memory-plugin\"]"
 openclaw config set plugins.load.paths "$HYPERMEM_PATHS" --strict-json
- # If you have existing load paths, merge them into the array in HYPERMEM_PATHS.
-
 openclaw config set plugins.slots.contextEngine hypercompositor
 openclaw config set plugins.slots.memory hypermem
 
- # Only set plugins.allow if your OpenClaw config already uses an allowlist.
- # If `openclaw config get plugins.allow` returns null, empty, or unset, skip this step.
- # If it returns an array, copy that array and append "hypercompositor" and "hypermem".
+ # Only if your install already uses plugins.allow: merge, do not replace.
 openclaw config set plugins.allow '["existing-plugin","hypercompositor","hypermem"]' --strict-json
 
 openclaw gateway restart
+ hypermem-doctor --fix-plan
+ hypermem-status --health
+ hypermem-model-audit --strict
 ```
 
- Do **not** replace a working `plugins.allow` list with only `['hypercompositor','hypermem']`. That can disable bundled CLI surfaces and channel plugins.
-
- Verify (run these commands from the repo clone directory — `bin/` is a relative path):
-
- ```bash
- openclaw plugins list # hypercompositor and hypermem should show as loaded
- node bin/hypermem-status.mjs --health # confirms database initialization
- openclaw logs --limit 50 | grep hypermem # should show "hypermem initialized"
- ```
+ `hypermem-doctor` is the read-only confidence check. It validates plugin wiring, runtime load state, recommended OpenClaw settings (for example `contextPruning.mode=off` and the GPT-5 personality overlay disabled), startup/bootstrap injection sizing, compaction safety settings, HyperMem data files, and model context-window overrides for GPT, OpenAI-compatible, and local gateways, then prints a reviewable fix plan without changing anything.
 
- If you see `falling back to default engine "legacy"` in the logs, the install is not active. Check [INSTALL.md troubleshooting](./INSTALL.md#troubleshooting-clean-installs).
+ Full install, upgrade, source-clone, embedding-provider, reranker, fleet-config, and rollback guidance lives in **[INSTALL.md](./INSTALL.md)**.
 
 ### One-line installer
 
@@ -519,9 +490,7 @@ If you see `falling back to default engine "legacy"` in the logs, the install is
 curl -fsSL https://raw.githubusercontent.com/PsiClawOps/hypermem/main/install.sh | bash
 ```
 
- Interactive: detects hardware, selects embedding tier, writes config, registers plugins.
-
- Full guide with embedding tiers, reranker setup, fleet config, and tuning: **[INSTALL.md](./INSTALL.md)**
+ The shell installer stages the runtime and prints merge-safe activation commands. It does not edit OpenClaw config or restart the gateway.
 
 ### Agent-assisted install
 
@@ -537,6 +506,8 @@ If you prefer, hand the install to your OpenClaw agent:
 
 ### Tuning
 
+ Do tuning **after** the install is verified active. If logs still show `legacy` fallback or no compose activity, you do not have a tuning problem yet. You have an install problem.
+
 Two independent surfaces: **context assembly** (what fills the context window) and **output shaping** (how the model writes). Pick a profile first. Most deployments adjust one or two settings on top.
 
 | Profile | Target window | Best for |
@@ -582,78 +553,30 @@ Or configure through `openclaw.json` (preferred for managed deployments):
 
 Plugin config in `openclaw.json` takes precedence over `config.json`. Both sources are merged, with plugin config winning on overlap. The config schema is validated on gateway start and visible via `openclaw config get plugins.entries.hypercompositor.config`.
 
- Full reference: **[docs/TUNING.md](./docs/TUNING.md)**
-
- ---
-
- ## API
-
- > **Note:** The examples below use placeholder agent names (`my-agent`, `agent1`, etc.). Replace these with your actual agent IDs from your OpenClaw config. Single-agent installs typically use `main`. Multi-agent fleets use whatever IDs you've configured. See [INSTALL.md § "Configure your fleet"](./INSTALL.md#step-5--configure-your-fleet-multi-agent-only) for details.
-
- ```typescript
- import { HyperMem } from '@psiclawops/hypermem';
- import { join } from 'node:path';
- import { homedir } from 'node:os';
-
- const hm = await HyperMem.create({
- dataDir: join(homedir(), '.openclaw', 'hypermem'),
- cache: { maxEntries: 10000 },
- // Local (Ollama):
- embedding: { ollamaUrl: 'http://localhost:11434', model: 'nomic-embed-text' },
- // Hosted (OpenRouter), recommended for installs without local GPU/CPU:
- // embedding: { provider: 'openai', openaiApiKey: 'sk-or-...', openaiBaseUrl: 'https://openrouter.ai/api/v1', model: 'qwen/qwen3-embedding-8b', dimensions: 4096, batchSize: 128 },
- });
-
- // Record and compose
- await hm.recordUserMessage('my-agent', 'agent:my-agent:webchat:main', 'How does drift detection work?');
-
- const composed = await hm.compose({
- agentId: 'my-agent',
- sessionKey: 'agent:my-agent:webchat:main',
- prompt: 'How does drift detection work?',
- tokenBudget: 4000,
- provider: 'anthropic',
- });
-
- // Refresh tool compression after each turn
- await hm.refreshCacheGradient('my-agent', 'agent:my-agent:webchat:main');
- ```
+ **Key tuning knobs:**
+ - `verboseLogging` — set to `true` in the compositor config to see per-turn budget resolution in the gateway logs (`budget source:` lines show which window size is active and why).
+ - `contextWindowOverrides` — override the detected context window per `"provider/model"` key when autodetect gives wrong results for custom, local, or finetuned models. Fixes all downstream budget fractions in one place.
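As an illustration of the `"provider/model"` key shape, a sketch of an override block (the placement under the compositor config and the model names are assumptions for illustration; check docs/TUNING.md for the exact location and keys):

```json
{
  "compositor": {
    "contextWindowOverrides": {
      "openai/my-finetuned-gpt": 128000,
      "ollama/my-local-llama": 32768
    }
  }
}
```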
 
- Spawning a subagent with parent context:
-
- ```typescript
- import { buildSpawnContext, MessageStore, DocChunkStore } from '@psiclawops/hypermem';
-
- const spawn = await buildSpawnContext(
- new MessageStore(hm.dbManager.getMessageDb('my-agent')),
- new DocChunkStore(hm.dbManager.getLibraryDb()),
- 'my-agent',
- { parentSessionKey: 'agent:my-agent:webchat:main', workingSnapshot: 12 }
- );
- ```
+ Full reference: **[docs/TUNING.md](./docs/TUNING.md)**
 
 ---
 
- ## CLI
+ ## API and CLI references
 
- `bin/hypermem-status.mjs` provides health checks and metrics from the command line:
+ The README keeps the interface surface short; use the detailed docs for exact examples and release validation commands.
 
- ```bash
- node bin/hypermem-status.mjs # full dashboard
- node bin/hypermem-status.mjs --agent my-agent # scoped to one agent
- node bin/hypermem-status.mjs --json # machine-readable output
- node bin/hypermem-status.mjs --health # health checks only (exit 1 on failure)
- ```
+ **Runtime API:** import `HyperMem` from `@psiclawops/hypermem` for direct Node.js use, custom tests, and non-OpenClaw integrations. See **[INSTALL.md § Non-OpenClaw usage](./INSTALL.md#non-openclaw-usage)** and the package's TypeScript declarations for the current interface.
 
- By default, `hypermem-status` looks for data in `~/.openclaw/hypermem`. If your data directory is elsewhere (e.g. testing in an isolated environment), set:
+ **Operator CLIs:**
 
 ```bash
- HYPERMEM_DATA_DIR=/path/to/data node bin/hypermem-status.mjs --health
+ hypermem-status --health
+ hypermem-status --master
+ hypermem-model-audit --strict
+ hypermem-bench --iterations 1000 --warmup 50 --agent main
 ```
 
- > **Fresh install note:** If no agent has run a session yet, `--health` will report "no sessions ingested" rather than a database error. This is expected. Send a test message to any agent, then re-run the health check.
-
- ---
+ Diagnostics and validation details: **[docs/DIAGNOSTICS.md](./docs/DIAGNOSTICS.md)** and **[docs/INTEGRATION_VALIDATION.md](./docs/INTEGRATION_VALIDATION.md)**.
 
 ## Pressure management
 
@@ -704,7 +627,7 @@ Full troubleshooting: **[INSTALL.md § Troubleshooting](./INSTALL.md#troubleshoo
 
 hypermem doesn't touch your existing memory data. Install it, switch the context engine, and migrate historical data on your own timeline.
 
- The migration guide includes worked examples showing how to bring data from OpenClaw built-in memory, Mem0, Honcho, QMD session exports, and Engram. Each example walks through the data model mapping, transformation steps, and validation. Adapt them to your setup.
+ The migration guide includes worked examples showing how to bring data from OpenClaw built-in memory, QMD, ClawText, Cognee, Mem0, Zep, Honcho, memory-lancedb, MEMORY.md files, and custom engines. Each path documents source mapping, dry-run expectations, activation, rollback, and post-migration validation. Adapter snippets are examples unless explicitly shipped as package binaries.
 
 All examples default to dry-run. Nothing is written until you add `--apply`.
 
@@ -715,7 +638,7 @@ Operator guide: **[docs/MIGRATION_GUIDE.md](./docs/MIGRATION_GUIDE.md)**
 
 ## Identity layer
 
- hypermem handles context and output normalization. The Agentic Cognitive Architecture handles identity: self-authored SOUL files, structured communication contracts, and identity persistence across sessions. Same team, complementary layers.
+ hypermem handles context assembly and output-profile shaping. The Agentic Cognitive Architecture handles identity: self-authored SOUL files, structured communication contracts, and identity persistence across sessions. Same team, complementary layers.
 
 Design guide: [PsiClawOps/AgenticCognitiveArchitecture](https://github.com/PsiClawOps/AgenticCognitiveArchitecture/)