@psiclawops/hypermem 0.8.5 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (87)
  1. package/CHANGELOG.md +26 -0
  2. package/INSTALL.md +132 -9
  3. package/README.md +119 -272
  4. package/bench/README.md +42 -0
  5. package/bench/data-access-bench.mjs +380 -0
  6. package/bin/hypermem-bench.mjs +2 -0
  7. package/bin/hypermem-doctor.mjs +412 -0
  8. package/bin/hypermem-model-audit.mjs +339 -0
  9. package/bin/hypermem-status.mjs +491 -70
  10. package/dist/adaptive-lifecycle.d.ts +81 -0
  11. package/dist/adaptive-lifecycle.d.ts.map +1 -0
  12. package/dist/adaptive-lifecycle.js +190 -0
  13. package/dist/budget-policy.d.ts +1 -1
  14. package/dist/budget-policy.d.ts.map +1 -1
  15. package/dist/budget-policy.js +10 -5
  16. package/dist/cache.d.ts +1 -0
  17. package/dist/cache.d.ts.map +1 -1
  18. package/dist/cache.js +2 -0
  19. package/dist/composition-snapshot-integrity.d.ts +36 -0
  20. package/dist/composition-snapshot-integrity.d.ts.map +1 -0
  21. package/dist/composition-snapshot-integrity.js +131 -0
  22. package/dist/composition-snapshot-runtime.d.ts +59 -0
  23. package/dist/composition-snapshot-runtime.d.ts.map +1 -0
  24. package/dist/composition-snapshot-runtime.js +250 -0
  25. package/dist/composition-snapshot-store.d.ts +44 -0
  26. package/dist/composition-snapshot-store.d.ts.map +1 -0
  27. package/dist/composition-snapshot-store.js +117 -0
  28. package/dist/compositor.d.ts +125 -1
  29. package/dist/compositor.d.ts.map +1 -1
  30. package/dist/compositor.js +692 -44
  31. package/dist/doc-chunk-store.d.ts +19 -0
  32. package/dist/doc-chunk-store.d.ts.map +1 -1
  33. package/dist/doc-chunk-store.js +56 -6
  34. package/dist/hybrid-retrieval.d.ts +38 -0
  35. package/dist/hybrid-retrieval.d.ts.map +1 -1
  36. package/dist/hybrid-retrieval.js +86 -1
  37. package/dist/index.d.ts +12 -3
  38. package/dist/index.d.ts.map +1 -1
  39. package/dist/index.js +28 -2
  40. package/dist/knowledge-store.d.ts +4 -1
  41. package/dist/knowledge-store.d.ts.map +1 -1
  42. package/dist/knowledge-store.js +27 -4
  43. package/dist/library-schema.d.ts +12 -8
  44. package/dist/library-schema.d.ts.map +1 -1
  45. package/dist/library-schema.js +22 -8
  46. package/dist/message-store.d.ts.map +1 -1
  47. package/dist/message-store.js +7 -3
  48. package/dist/metrics-dashboard.d.ts +18 -1
  49. package/dist/metrics-dashboard.d.ts.map +1 -1
  50. package/dist/metrics-dashboard.js +52 -14
  51. package/dist/reranker.d.ts +1 -1
  52. package/dist/reranker.js +2 -2
  53. package/dist/schema.d.ts +1 -1
  54. package/dist/schema.d.ts.map +1 -1
  55. package/dist/schema.js +28 -1
  56. package/dist/seed.d.ts.map +1 -1
  57. package/dist/seed.js +2 -0
  58. package/dist/topic-synthesizer.d.ts +20 -0
  59. package/dist/topic-synthesizer.d.ts.map +1 -1
  60. package/dist/topic-synthesizer.js +113 -3
  61. package/dist/trigger-registry.d.ts.map +1 -1
  62. package/dist/trigger-registry.js +10 -2
  63. package/dist/types.d.ts +271 -1
  64. package/dist/types.d.ts.map +1 -1
  65. package/dist/version.d.ts +7 -7
  66. package/dist/version.d.ts.map +1 -1
  67. package/dist/version.js +17 -7
  68. package/docs/DIAGNOSTICS.md +205 -0
  69. package/docs/INTEGRATION_VALIDATION.md +186 -0
  70. package/docs/MIGRATION.md +9 -6
  71. package/docs/MIGRATION_GUIDE.md +125 -101
  72. package/docs/ROADMAP.md +238 -20
  73. package/docs/TUNING.md +19 -5
  74. package/install.sh +152 -401
  75. package/memory-plugin/LICENSE +190 -0
  76. package/memory-plugin/README.md +20 -0
  77. package/memory-plugin/dist/index.js +50 -0
  78. package/memory-plugin/package.json +2 -2
  79. package/package.json +18 -4
  80. package/plugin/LICENSE +190 -0
  81. package/plugin/README.md +20 -0
  82. package/plugin/dist/index.d.ts +29 -0
  83. package/plugin/dist/index.d.ts.map +1 -1
  84. package/plugin/dist/index.js +288 -23
  85. package/plugin/dist/index.js.map +1 -1
  86. package/plugin/package.json +2 -2
  87. package/scripts/install-runtime.mjs +12 -1
package/README.md CHANGED
@@ -20,14 +20,23 @@ Or via the shell installer:
  curl -fsSL https://raw.githubusercontent.com/PsiClawOps/hypermem/main/install.sh | bash
  ```
 
- Or install manually via `npm install @psiclawops/hypermem` see [Installation](#installation) for the full declarative plugin path, verification checkpoints, and setup variants.
+ Or install manually via `npm install @psiclawops/hypermem` - see [Installation](#installation) for the full declarative plugin path, verification checkpoints, and setup variants.
 
+ Release operators should also read:
+
+ - [INSTALL.md](./INSTALL.md) - canonical fresh install and upgrade guide
+ - [docs/INTEGRATION_VALIDATION.md](./docs/INTEGRATION_VALIDATION.md) - end-to-end integration validation contract
+ - [docs/DIAGNOSTICS.md](./docs/DIAGNOSTICS.md) - status, model audit, compose, trim, and release diagnostics
+
+ A successful `hypermem-install` only stages the runtime. HyperMem is active only after OpenClaw config is wired, the gateway restarts, and logs show compose activity.
 
  ---
 
  ## The problem
 
- Every LLM conversation is composed at runtime. The model sees only what's in the prompt. It has no memory of prior sessions, no access to decisions made last week, no awareness of work that happened before this context window opened.
+ Your agent can feel sharp on day one, then start slipping as the work accumulates.
+
+ Not because the model got worse. Because each turn is composed at runtime, and the model only sees what made it into this prompt. It has no native memory of prior sessions, no direct access to last week's decisions, and no awareness of work that happened before this context window opened.
 
  Two questions make this concrete:
 
@@ -36,36 +45,38 @@ Two questions make this concrete:
  | *"What was Caesar's greatest military victory?"* | Training data | ✅ Answered correctly, no session context needed |
  | *"What did we decide about the retry logic last week?"* | Nothing (prior session is gone) | ❌ The decision existed only in that session |
 
- The difference isn't intelligence. It's what was in the prompt. Two failure modes follow:
+ The difference is not intelligence. It is prompt access. Three failure modes follow:
 
- **New-session amnesia.** The agent restarts and everything is gone. Decisions, preferences, work in progress: erased at the session boundary. Operators re-explain context. Agents re-ask questions already answered.
+ **New-session amnesia.** The agent restarts and the work disappears with the session. Decisions, preferences, and work in progress vanish at the boundary. Operators re-explain context. Agents re-ask questions that were already settled.
 
- **Compaction crunch.** Long sessions fill the context window. The runtime summarizes to make room. Specifics (tool output, exact decisions, file paths) are lost in the summary. The agent keeps running, but degraded.
+ **Compaction crunch.** Long sessions fill the context window. The runtime summarizes to make room. Specifics like tool output, exact decisions, and file paths are the first things to get flattened. The agent keeps running, but with less ground truth than it had a few turns ago.
 
- **Bloated context.** 128k tokens doesn't mean 128k of useful prompt. Without active curation, agents fill the window with stale history, redundant instructions, and memory that isn't relevant to this turn. A bigger context window just means more room to waste. The information is in the prompt somewhere, buried under content irrelevant to this turn.
+ **Bloated context.** 128k tokens does not mean 128k of useful prompt. Without active selection, agents fill the window with stale history, repeated instructions, and memory that does not matter to this turn. A bigger window just gives you more room to waste.
 
  ---
 
  ## What OpenClaw provides today
 
- OpenClaw addresses both failure modes with structured guidance files injected into every session:
+ OpenClaw already gives agents a stronger baseline than most stacks. It injects structured guidance into every session:
 
  | File | What it contributes | Survives session restart? |
  |---|---|---|
  | `SOUL.md` | Agent identity, voice, principles | ✅ always injected |
  | `USER.md` | User preferences, working style | ✅ always injected |
- | Task and workspace instruction files (for example AGENTS.md, job files, and related guidance) | ✅ always injected |
+ | Task/workspace instructions | `AGENTS.md`, job files, and related guidance | ✅ always injected |
  | `MEMORY.md` | Hand-curated decisions, facts, patterns | ✅ if manually maintained |
 
- These are powerful for identity and preferences. But the retry logic decision from last week? If nobody manually captured it into `MEMORY.md`, that session boundary erased it. The system is only as strong as its last manual update.
+ These files are strong at identity, user fit, and working style. They are not a durable memory system by themselves. If nobody copied the retry-logic decision into `MEMORY.md`, the next session does not know it happened.
 
- OpenClaw also ships compaction safeguards and hybrid file search. That's a solid baseline. It has limits. hypermem closes both gaps.
+ OpenClaw also ships compaction safeguards and hybrid file search. That is a solid baseline. What is still missing is durable recall across sessions, active prompt selection under pressure, and writing discipline that holds across long-running work. hypermem adds those layers.
 
  ---
 
  ## hypermem
 
- Four SQLite-backed memory databases, sub-millisecond retrieval, no external database services required. Runs in-process with local SQLite storage and local Nomic embeddings by default, with optional hosted embeddings for L3.
+ OpenClaw gives agents a strong starting shape: identity files, user guidance, task framing, compaction safeguards, and hybrid file search. What it does not add by default is durable recall across session boundaries. When a useful decision falls out of the prompt and nobody hand-copied it into `MEMORY.md`, it is gone.
+
+ hypermem closes that gap with four SQLite-backed memory layers that stay local, run in-process, and remain queryable across sessions. No external database service. No retrieval stack to babysit.
 
  | Layer | What it holds | Speed |
  |---|---|---|
@@ -74,22 +85,24 @@ Four SQLite-backed memory databases, sub-millisecond retrieval, no external data
  | **L3 Semantic** | Finds related content even when the words don't match. | 0.29ms |
  | **L4 Knowledge** | Facts, wiki pages, episodes, preferences. Shared across agents. | 0.09ms |
 
- Everything is retained. Storage survives session boundaries. The retry logic decision from last week, the deployment preferences from last month, the architecture choices from day one: all queryable, all available for composition.
+ Durable context stays in SQLite and remains queryable across session boundaries. The retry logic decision from last week, the deployment preferences from last month, and the architecture choices from day one can be pulled back in when they matter.
 
- **Session warming.** Before the first turn fires, hypermem pre-loads the agent's full working state from its SQLite-backed memory stores and hot `:memory:` cache: recent history, facts ranked by confidence and recency, active topic context, cached embeddings for fast semantic recall. The agent's first reply draws from everything that was in scope at the end of the last session. The agent picks up where it left off.
+ That changes OpenClaw in a few concrete ways. Starts are warm instead of blank because recent history, ranked facts, active topics, and cached semantic state are loaded before the first turn. Recall survives wording drift because FTS5, sqlite-vec, RRF fusion, and an optional reranker can recover the same idea through different phrasing. Time-aware facts can answer "last week" and "before the release" as retrieval problems instead of vague prompt guessing. Shared knowledge stops living in one agent's scratchpad because `library.db` holds facts, docs, episodes, preferences, fleet state, and output standards with visibility controls.
 
  ---
 
  ## hypercompositor
 
- Every memory system stores. Almost none compose.
+ Storage is only half the problem. The harder question is what actually reaches the model.
+
+ Most memory systems can save useful state. Far fewer can decide, turn by turn, what belongs in the prompt right now and what should stay on disk. Without that layer, long sessions bloat, tool output crowds out current work, and a larger context window just gives you more room to waste tokens.
 
- Your agent has four layers of stored context, but what shows up in the prompt? How much of the token budget goes to stale content? Who decides what's relevant to this specific turn?
+ hypercompositor queries all four memory layers in parallel, scores what matters for the current turn, and composes a fresh prompt inside a fixed budget. Content that does not fit is not destroyed. It stays in storage and can win its way back in when the topic returns.
 
- The hypercompositor queries all four layers in parallel on every turn and composes context within a fixed token budget. No transcript accumulates. No lossy transcript summarization. Amnesia isn't a storage problem; the memories exist, but nobody composed them into a coherent prompt. Compaction isn't inevitable; content that doesn't fit this turn stays in storage instead of being destroyed.
+ That changes OpenClaw at the prompt boundary. Selection replaces loss. Tool calls and results stay paired, recent turns stay readable, and older payloads compress by age instead of being flattened blindly. Quiet topics compile into structured wiki pages so the next turn can inject the decision trail without replaying raw transcript. Duplicate prompt spend drops because facts, doc chunks, semantic hits, and bootstrap content are fingerprinted before insertion. Subagents inherit a bounded handoff instead of a random slice of parent history.
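The fingerprint-before-insertion idea can be sketched as below. This is an illustrative sketch only: the helper names are invented here, and the 120-character prefix mirrors the dedup description elsewhere in this README rather than hypermem's actual API.

```typescript
// Sketch: content-fingerprint dedup across retrieval paths.
// Candidates from any path (facts, doc chunks, semantic hits, bootstrap
// content) are normalized, fingerprinted on a short prefix, and checked
// against one shared set, so duplicates are caught regardless of origin.
function fingerprint(text: string, prefixLen = 120): string {
  const normalized = text.toLowerCase().replace(/\s+/g, " ").trim();
  return normalized.slice(0, prefixLen);
}

function dedupe(candidates: string[], seen = new Set<string>()): string[] {
  const kept: string[] = [];
  for (const candidate of candidates) {
    const fp = fingerprint(candidate);
    if (seen.has(fp)) continue; // O(1) duplicate check in the shared set
    seen.add(fp);
    kept.push(candidate);
  }
  return kept;
}
```

Passing the same `seen` set to every retrieval path is what lets one fingerprint table suppress duplicates across all of them.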
 
- **Bigger context windows don't help if you fill them with stale history.**
- 128k tokens of stale history and irrelevant memory is worse than 32k of precisely selected content. 9 budget categories, priority-ordered, greedy-fill. Every token in the prompt earned its spot.
+ **A bigger context window does not fix bad composition.**
+ 128k tokens of stale history is worse than 32k of selected context. hypercompositor treats prompt space as a constrained resource, not a dumping ground.
 
  ### What the model actually sees
 
@@ -123,7 +136,7 @@ OpenClaw default hypercompositor
  ──────────────────────────────── ────────────────────────────────
  message → append to transcript message → detect active topic
  transcript full → trim oldest query 4 storage layers in parallel
- trimmed content → summarize (lossy) budget allocator: 9 slots, fixed cap
+ trimmed content → summarize (lossy) budget allocator: 10 slots, fixed cap
  send transcript to model tool compression by turn age
  model responds → append again keystone guard + hyperform profile
  composed prompt → model
@@ -148,7 +161,7 @@ When it fills: When budget is exceeded:
 
  High-signal turns are marked as keystones and survive pressure trimming ahead of ordinary history.
 
- The compositor fills 9 slots in priority order (system prompt → identity → hyperform → history → facts → wiki → semantic recall → cross-session → action summary). Each slot consumes tokens from the remaining budget before the next slot runs. Slots that don't fit this turn stay in storage, not destroyed.
+ The compositor fills 10 slots in priority order (system prompt → identity → hyperform → history → recent tools → keystones → wiki/knowledge → facts → semantic recall → reserve/action context). Each slot consumes tokens from the remaining budget before the next slot runs. Slots that do not fit this turn stay in storage, not destroyed.
 
  For the full fill order, budget formula, and all configuration knobs, see **[Tuning](#tuning)** below and **[docs/TUNING.md](./docs/TUNING.md)**.
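A priority-ordered greedy fill of this kind can be sketched roughly as follows. The slot shape and the `estimateTokens` heuristic are assumptions for illustration, not hypermem's real interfaces:

```typescript
// Sketch: greedy slot fill under a fixed token budget.
// Slots are visited in priority order; each item consumes from the
// remaining budget, and anything that does not fit stays in storage.
interface Slot {
  name: string;
  candidates: string[]; // already ranked, best first
}

function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4); // rough chars-per-token heuristic
}

function composePrompt(slots: Slot[], budget: number): string[] {
  const prompt: string[] = [];
  let remaining = budget;
  for (const slot of slots) {
    for (const item of slot.candidates) {
      const cost = estimateTokens(item);
      if (cost > remaining) continue; // skipped, not destroyed
      prompt.push(item);
      remaining -= cost;
    }
  }
  return prompt;
}
```

The key property is that skipping is non-destructive: an item that loses this turn is still in its store and can be selected on a later turn.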
 
@@ -156,9 +169,13 @@ For the full fill order, budget formula, and all configuration knobs, see **[Tun
 
  ## hyperform
 
- Raw model output has two problems. It drifts from your standards (sycophancy, hedging, pagination, formatting) and it drifts from your facts (confabulation, contradiction, stale claims). hyperform handles both: normalization enforces consistency, confabulation resistance checks output against what's actually stored.
+ Good memory is wasted if the model still writes like it has no standards.
+
+ OpenClaw can preserve identity and instruction. That does not guarantee consistent delivery. Models still drift into filler openings, hedging, bloated lists, pagination, and stale claims. Over long sessions that is not just annoying copy. It is token waste, weaker signal, and lower trust in what gets written back into memory.
 
- Consistent output isn't just aesthetic. A model that paginates short answers, preambles with filler, or inflates lists uses more output tokens per turn. Over hundreds of turns, that compounds into real cost. hyperform directives compress output at the source: fewer tokens generated means lower API spend per session, and less context pressure for subsequent turns.
+ hyperform adds a writing contract at prompt time. Output profiles inject shared standards before generation. Model directives correct known provider habits. Confabulation resistance checks candidate claims against stored facts before new memory is recorded.
+
+ That gives OpenClaw something it does not get from raw prompting alone: fleet-wide writing discipline, model-aware correction, and tighter claim hygiene at the memory boundary. The point is not to post-process prose into something artificial. The point is to make the first draft cleaner, shorter, and harder to contaminate with unsupported claims.
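Prompt-time shaping of this kind can be sketched as below. The tier names match the profiles this README describes (`light`, `standard`, `full`), but the directive strings and the function are invented for illustration; the real directives live in hypermem's tables:

```typescript
// Sketch: inject output-profile directives before generation.
// Standards are prepended to the system prompt, so the first draft is
// shaped at the source rather than rewritten after the fact.
type Profile = "light" | "standard" | "full";

// Hypothetical directive sets for illustration only.
const PROFILE_DIRECTIVES: Record<Profile, string[]> = {
  light: ["No filler openings.", "Do not paginate short answers."],
  standard: ["No filler openings.", "Do not paginate short answers.", "No inflated lists."],
  full: [
    "No filler openings.",
    "Do not paginate short answers.",
    "No inflated lists.",
    "Apply model-specific corrections.",
  ],
};

function shapeSystemPrompt(base: string, profile: Profile): string {
  const rules = PROFILE_DIRECTIVES[profile].map((d) => `- ${d}`).join("\n");
  return `${base}\n\nOutput standards:\n${rules}`;
}
```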
 
  ### Behavior standards
 
@@ -184,14 +201,14 @@ Model adaptation is only active at the `full` tier. At `light` and `standard`, m
 
  The `model_output_directives` table starts empty. You populate it with corrections for the models you run. See [docs/TUNING.md](./docs/TUNING.md#creating-custom-entries) for the schema and SQL examples.
 
- ### Before and after
+ ### Illustrative before and after
 
- The same prompt, GPT-5.4, with and without `hyperformProfile: "light"`:
+ The example below shows the intended effect of `hyperformProfile: "light"`. hyperform is prompt-time shaping, not a deterministic post-generation rewrite engine:
 
  ```
  Prompt: "How should I size my context window budget for a long-running agent session?"
 
- WITHOUT normalization (GPT-5.4 default):
+ WITHOUT hyperform shaping (GPT-5.4 default):
  Here are the key factors to consider when sizing your context window budget:
 
  **1. Session depth**
@@ -211,46 +228,12 @@ tool context, and leave ~30k as allocator reserve. hypermem handles slot competi
  automatically. Set `reserveFraction` to your preferred floor and let the compositor fill.
  ```
 
- **Confabulation resistance** checks output against stored facts before claims are recorded. No LLM call. Pattern matching against the fact corpus, with confidence scoring and contradiction detection. Unsupported claims are flagged, contradictions surface in diagnostics, and a confabulation risk score is attached to the stored episode.
+ **Confabulation resistance** checks stored claims against existing facts before new memory entries are recorded. No LLM call. Pattern matching against the fact corpus, with confidence scoring and contradiction detection. Unsupported claims are flagged, contradictions surface in diagnostics, and a confabulation risk score is attached to the stored episode.
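A pattern-matching claim check of this general shape could look roughly like the sketch below. The word-overlap rule and risk formula are invented for illustration; hypermem's actual scoring is richer, but the structural point holds: no LLM call, just the claim against the fact corpus.

```typescript
// Sketch: score how weakly a candidate claim is supported by stored
// facts. High risk = low overlap with any confident fact.
interface Fact {
  text: string;
  confidence: number; // 0..1
}

function claimRisk(claim: string, facts: Fact[]): number {
  const words = claim.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  let bestSupport = 0;
  for (const fact of facts) {
    const factText = fact.text.toLowerCase();
    const hits = words.filter((w) => factText.includes(w)).length;
    const overlap = words.length ? hits / words.length : 0;
    bestSupport = Math.max(bestSupport, overlap * fact.confidence);
  }
  return 1 - bestSupport; // attach to the stored episode; flag if high
}
```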
 
  Set `compositor.hyperformProfile` to `light`, `standard`, or `full`. For tier selection guidance, configuration details, and custom entry creation, see **[Tuning](#tuning)** below and **[docs/TUNING.md](./docs/TUNING.md)**.
 
  ---
 
- ## What it solves
-
- ### Tool output that doesn't take over
-
- Agentic sessions generate massive tool output. Left unmanaged, old results crowd out current reasoning. hypermem compresses tool history by age: recent clusters stay full, older clusters are capped, and the oldest collapse to short stubs while preserving tool call/result integrity. The budget goes to current work, not last hour's npm test output.
-
- ### Knowledge that outlasts the conversation
-
- Most memory systems store what was said. hypermem synthesizes what was learned.
-
- When a topic goes quiet, hypermem compiles the thread into a structured wiki page: decisions, open questions, artifacts, participants. When the topic resurfaces, the agent gets a compact structured summary rather than a raw history replay.
-
- OpenClaw 2026.4.7 ships memory wiki for structured storage. hypermem goes further: wiki pages are synthesized automatically and injected by the compositor within token budget, backed by SQLite memory databases instead of an external cache service.
-
- ### Subagents that hit the ground running
-
- Spawned subagents inherit a bounded context block: recent parent turns, session-scoped documents, and relevant facts. Scope is isolated from the shared library. Documents are cleaned up on completion.
-
- ### Context that doesn't repeat itself
-
- Retrieval paths pull from four layers, trigger shortcuts, temporal indexes, open-domain FTS5, semantic recall, and cross-session summaries. Without dedup, the same fact surfaces through multiple paths and wastes budget on repetition.
-
- hypermem runs content fingerprint dedup across all compose-time retrieval. Every fact, temporal result, open-domain hit, and semantic recall entry is normalized and fingerprinted on a 120-char prefix. O(1) lookup in a shared set catches duplicates regardless of which retrieval path produced them, including rephrased near-duplicates that substring matching missed. Diagnostics track dedup counts and fingerprint collisions per compose call.
-
- Identity content (SOUL.md, USER.md, IDENTITY.md) and doc chunks already injected by OpenClaw's bootstrap are fingerprinted before retrieval runs, so the compositor never double-injects content the runtime already placed in the prompt.
-
- ### Integrity under failure
-
- The background indexer runs a startup integrity check against `library.db` on every boot. If the schema is corrupt, tables are missing, or critical indexes are damaged, the indexer enters circuit-breaker mode: it logs the failure, skips indexing for the session, and avoids cascading writes into a broken database. The agent still runs with cached and in-memory data while the operator is notified.
-
- SQL queries that interpolate datetime values are fully parameterized. FTS5 trigger terms are quoted to prevent injection through crafted content. These aren't theoretical: agentic sessions ingest arbitrary user and tool output into the fact store, and unparameterized queries on that path were a real attack surface.
-
- ---
-
  ## Pressure management
 
  hypermem manages context pressure automatically through four escalating paths. Most sessions never need manual intervention. For trigger thresholds and path details, see [Pressure management](#pressure-management-1) below.
@@ -280,7 +263,15 @@ No configuration required for any of these:
 
  ## Speed
 
- Benchmarked against a production database: 5,104 facts, 28,441 episodes, 847 knowledge entries, 42MB. 1,000 iterations, 50 warmup discarded, single-process isolation.
+ HyperMem ships a user-facing benchmark so operators can validate local memory access speed against their own dataset:
+
+ ```bash
+ hypermem-bench --iterations 1000 --warmup 50 --agent main
+ ```
+
+ The benchmark reports min, average, p50, p95, p99, and max timings for the storage paths present in the install: message hot-path lookups, session/conversation lookup, message FTS, facts, episodes, topics, fleet records, and doc chunks. It reads from `~/.openclaw/hypermem` by default, or from `HYPERMEM_DATA_DIR` / `--data-dir`.
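Percentile figures like p50/p95/p99 in reports of this kind are typically read off sorted samples. A minimal sketch using the nearest-rank method (not necessarily the exact method `hypermem-bench` uses):

```typescript
// Sketch: nearest-rank percentile over benchmark timings (ms).
// Returns the smallest sample with at least p% of samples at or below it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}
```

With 1,000 iterations, p99 is then the 990th-fastest sample, which is why a handful of slow outliers shows up in p99 and max but barely moves p50.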
+
+ Reference run, production database: 5,104 facts, 28,441 episodes, 847 knowledge entries, 42MB, 1,000 iterations, 50 warmup discarded, single-process isolation.
 
  | Operation | avg | p50 | p95 |
  |---|---|---|---|
@@ -292,9 +283,10 @@ Benchmarked against a production database: 5,104 facts, 28,441 episodes, 847 kno
  | L4 FTS5 + agentId filter | 0.07ms | 0.06ms | 0.10ms |
  | L4 knowledge query | 0.09ms | 0.08ms | 0.14ms |
  | Recency decay scoring (28 rows, in JS) | 0.003ms | 0.002ms | 0.005ms |
- > Query planner uses compound indexes on agentId + sort key; FTS5 performance improved 25% from baseline after index additions despite a 47% increase in stored data.
 
- L1 and L4 structured retrieval are sub-millisecond. Vector embeddings are computed asynchronously after the assistant replies and cached in the in-memory layer, not on the primary composition call path. Users never wait for an embedding computation.
+ L1 and L4 structured retrieval are sub-millisecond on this dataset. Vector embeddings are computed asynchronously after the assistant replies and cached for later recall; hosted reranker latency depends on the chosen provider and is measured separately from SQLite access timings.
+
+ For reproducible commands and interpretation notes, see **[docs/DIAGNOSTICS.md](./docs/DIAGNOSTICS.md#memory-access-benchmark)**.
 
  ---
 
@@ -319,28 +311,36 @@ The Plugin column is the npm package name. The ID column is what goes in `plugin
 
  Retrieval follows a fixed pipeline on every compose call:
 
- 1. **Trigger registry** fires first. Nine pattern triggers check for exact-match shortcuts. If one hits, scoped FTS5 prefix queries (`word1* OR word2*`) run against L4 collections and return immediately.
- 2. **Semantic fallback** fires when no trigger matches. Bounded hybrid retrieval runs FTS5 + KNN in parallel, then merges via Reciprocal Rank Fusion (RRF). BM25 ranks and KNN cosine distances combine into a single ordered result.
- 3. **Noise floor** filters anything below RRF 0.008 before it reaches the compositor.
+ 1. **Active facts** are ranked by confidence and recency.
+ 2. **Temporal retrieval** runs when the query has time signals.
+ 3. **Open-domain retrieval** handles broad exploratory queries over indexed memory.
+ 4. **Knowledge and preference blocks** add structured library context.
+ 5. **Hybrid semantic recall** runs FTS5 and KNN/vector search, then merges candidates with Reciprocal Rank Fusion (RRF).
+ 6. **Optional reranking** reorders fused candidates when a reranker is configured. Supported providers include ZeroEntropy, OpenRouter, and Ollama. If the reranker is absent, fails, times out, or has too few candidates, HyperMem keeps the original RRF order.
+ 7. **Trigger-based doc retrieval** pulls doctrine, policy, and workspace chunks by trigger match, with semantic fallback on misses.
+ 8. **Session-scoped spawn context** and **cross-session context** are added when relevant.
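Reciprocal Rank Fusion, as used in step 5, combines ranked lists by summing `1/(k + rank)` per candidate across lists. A minimal sketch with the conventional `k = 60` (hypermem's actual constant is not stated in this README):

```typescript
// Sketch: Reciprocal Rank Fusion over ranked candidate lists,
// e.g. an FTS5 (BM25-ordered) list and a KNN (cosine-ordered) list.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // 1-based rank within this list
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Because only ranks are used, RRF needs no score normalization between BM25 and cosine distance, which is exactly why it suits fusing lexical and vector results.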
 
- FTS5 queries use compound indexes on `agentId + sort key` and prefix optimization (3+ chars, capped at 8 terms, OR queries). These indexes yielded a 25% read improvement over baseline despite a 47% increase in stored data.
+ Diagnostics expose reranker status, candidate count, and provider, so operators can tell whether a turn used RRF only or reranked retrieval. FTS5 queries use compound indexes on `agentId + sort key` and prefix optimization (3+ chars, capped at 8 terms, OR queries).
 
- ### Retrieval pipeline
+ ### Library and fleet data
 
  **L4: Library DB.** Per-agent storage can't hold shared knowledge. Facts established by one agent, wiki pages synthesized from cross-agent topics, shared registry state: these belong to the system, not one agent. One shared SQLite database:
 
  | Collection | What it holds |
  |---|---|
- | Facts | Claims with confidence scoring, domain, expiry, supersedes chains |
- | Knowledge | Domain/key/value structured data with full-text search |
- | Episodes | Significant events with impact scores and participant tracking |
- | Topics | Cross-session thread tracking and synthesized wiki pages |
- | Preferences | Operator behavioral patterns |
- | Fleet Registry | Agent registry with tier, org, and capability metadata |
- | System Registry | Service state and lifecycle |
- | Work Items | Work queue with status transitions and FTS5 |
- | Session Registry | Session lifecycle tracking |
- | Desired State | Per-agent config targets; compares running config against desired at gateway startup and surfaces drift for operator review |
+ | Facts | Claims with confidence, visibility, decay, temporal validity, and supersession chains |
+ | Knowledge / wiki | Domain knowledge and synthesized topic pages with full-text search |
+ | Episodes | Significant events, decisions, discoveries, participants, and source links |
+ | Topics | Cross-session thread tracking and topic lifecycle state |
+ | Preferences | Operator and agent behavior patterns |
+ | Documents | Chunked workspace/governance docs, doc sources, and trigger retrieval metadata |
+ | Knowledge graph | Links between facts, knowledge, topics, episodes, agents, and preferences |
+ | Fleet registry | Agents, orgs, tiers, capabilities, and fleet topology |
+ | Desired state | Per-agent config targets, config events, and drift detection |
+ | System / work state | Service state, system events, work items, and work events |
+ | Sessions | Session registry, lifecycle events, and extraction counters |
+ | Output standards | Fleet output standards, model directives, and output metrics |
+ | Temporal / expertise / audits | Temporal index, expertise patterns, contradiction audits, and indexer watermarks |
 
  Facts are ranked by `confidence × recencyDecay`, where decay is exponential with a configurable half-life: recent, high-confidence facts float to the top while stale entries yield budget to newer knowledge.
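Exponential half-life decay of that shape can be sketched as follows. The 30-day half-life here is an arbitrary example value, not hypermem's documented default:

```typescript
// Sketch: rank facts by confidence × exponential recency decay.
// The score halves every halfLifeDays since the fact was last touched.
interface RankedFact {
  text: string;
  confidence: number; // 0..1
  ageDays: number;
}

function decayScore(fact: RankedFact, halfLifeDays = 30): number {
  const decay = Math.pow(0.5, fact.ageDays / halfLifeDays);
  return fact.confidence * decay;
}

function rankFacts(facts: RankedFact[], halfLifeDays = 30): RankedFact[] {
  return [...facts].sort(
    (a, b) => decayScore(b, halfLifeDays) - decayScore(a, halfLifeDays)
  );
}
```

A shorter half-life makes the store forget faster; a longer one lets confident old facts keep outranking weak new ones.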
 
@@ -371,7 +371,7 @@ Facts are ranked by `confidence × recencyDecay`, where decay is exponential wit
 
  keystone guard ──► high-signal turns survive pressure
 
- hyperform ──► output normalization directives
+ hyperform ──► output profile directives
 
  composed prompt
 
@@ -386,20 +386,19 @@ Slot-level budget allocation is shown in the [hypercompositor diagram](#what-the
 
 ## Requirements
 
- **Current release: hypermem 0.8.5.** Changelog: [CHANGELOG.md](./CHANGELOG.md)
+ **Current release: hypermem 0.9.0.** Changelog: [CHANGELOG.md](./CHANGELOG.md)
 
 | Requirement | Version | Notes |
 |---|---|---|
 | **Node.js** | `>=22.0.0` | Required for native `node:sqlite` module |
- | **better-sqlite3** | `^11.x` | Installed automatically via npm; powers L1 in-memory and L4 library |
 | **sqlite-vec** | `0.1.9` | Bundled; no separate install needed |
 
- SQLite is a library, not a service. All four layers run in-process with no external daemons. The nomic embedder on Ollama is the heaviest component, and it is lighter than pgvector or any hosted vector database.
+ SQLite is a library, not a service. All four layers run in-process with no external database daemon. Embeddings are optional: use no embeddings for FTS-only lightweight mode, Ollama for local embeddings, or a hosted provider such as OpenRouter/Gemini when configured.
 
 **Runtime version constants** (importable from the package):
 ```typescript
 import {
-   ENGINE_VERSION,      // '0.8.5'
+   ENGINE_VERSION,      // '0.9.0'
   MIN_NODE_VERSION,    // '22.0.0'
   SQLITE_VEC_VERSION,  // '0.1.9'
   MAIN_SCHEMA_VERSION, // 10 (messages.db)
@@ -413,86 +412,36 @@ Schema versions are stamped into each database on startup and checked on open. A
 
 ## Installation
 
- **Requirements:** Node.js 22+, OpenClaw with context engine plugin support. No standalone SQLite install needed (uses Node 22 built-in `node:sqlite`). Embedding provider is optional for first install.
-
- hypermem works two ways:
- - **As a library** — import directly into your own Node.js code. No OpenClaw required.
- - **As an OpenClaw plugin** — replaces the default context engine. Requires a running OpenClaw gateway.
-
- ### Library usage (no OpenClaw required)
-
- ```bash
- npm install @psiclawops/hypermem
- ```
-
- ```typescript
- import { HyperMem } from '@psiclawops/hypermem';
- import { join } from 'node:path';
- import { homedir } from 'node:os';
-
- const hm = await HyperMem.create({
-   dataDir: join(homedir(), '.openclaw', 'hypermem'),
-   embedding: { provider: 'none' },
- });
-
- await hm.recordUserMessage('my-agent', 'session-1', 'Hello');
- const composed = await hm.compose({
-   agentId: 'my-agent',
-   sessionKey: 'session-1',
-   prompt: 'Hello',
-   tokenBudget: 4000,
-   provider: 'anthropic',
- });
- ```
-
- That's it. No gateway, no plugins, no config files. See [API](#api) for the full interface.
-
- ### OpenClaw plugin install
-
- **Install contract:** HyperMem plugin install has 4 distinct states. Treat them separately.
-
- 1. **Package installed**: `npm install @psiclawops/hypermem`
- 2. **Runtime staged**: `npx hypermem-install` or `npm run install:runtime`
- 3. **OpenClaw wired**: plugin paths, slots, and optional allowlist merged into config
- 4. **Runtime verified active**: gateway restarted, plugins loaded, compose logs visible
-
- If you only finish step 2, HyperMem is **not installed yet**. It is only staged.
+ **Requirements:** Node.js 22+, OpenClaw with context engine plugin support. No standalone SQLite install is needed because HyperMem uses Node 22 `node:sqlite`. Embeddings are optional on first install.
 
- > **Release note:** if the npm package you installed does not contain `hypermem-install` or `install:runtime`, you are on an older public release. Use the source-clone path below or wait for `0.8.5+`.
+ README is OpenClaw-first. For non-OpenClaw library usage, see **[INSTALL.md § Non-OpenClaw usage](./INSTALL.md#non-openclaw-usage)**.
 
- #### Path A: npm package, recommended for operators
+ ### OpenClaw quickstart
 
 ```bash
 npm install @psiclawops/hypermem
 npx hypermem-install
 ```
 
- `hypermem-install` stages the runtime payload into `~/.openclaw/plugins/hypermem`. It does **not** modify OpenClaw config and does **not** restart the gateway.
+ `hypermem-install` stages the runtime payload into `~/.openclaw/plugins/hypermem`. It does **not** modify OpenClaw config and does **not** restart the gateway. HyperMem is active only after OpenClaw is wired, restarted, and compose activity appears in logs.
 
- #### Path B: source clone, recommended for contributors
+ Install states:
 
- ```bash
- git clone https://github.com/PsiClawOps/hypermem.git
- cd hypermem
- npm install && npm run build
- npm --prefix plugin install && npm --prefix plugin run build
- npm --prefix memory-plugin install && npm --prefix memory-plugin run build
- npm run install:runtime
- ```
-
- Both install paths converge here. The runtime payload is now staged under `~/.openclaw/plugins/hypermem`, but HyperMem is still **not active** until OpenClaw is wired and restarted.
-
- #### Step 1, write the starter config
+ | State | Meaning |
+ |---|---|
+ | Package installed | npm package is present |
+ | Runtime staged | plugin payload copied into `~/.openclaw/plugins/hypermem` |
+ | OpenClaw wired | `plugins.load.paths`, `plugins.slots.contextEngine`, and `plugins.slots.memory` point at HyperMem |
+ | Runtime loaded | gateway restarted and both plugins loaded |
+ | Runtime active | logs show `hypermem initialized` and compose activity |
 
- Before wiring the plugins, create the data directory and write the current recommended starter config:
+ Minimal starter config for lightweight FTS-only mode:
 
 ```bash
 mkdir -p ~/.openclaw/hypermem
 cat > ~/.openclaw/hypermem/config.json <<'JSON'
 {
-   "embedding": {
-     "provider": "none"
-   },
+   "embedding": { "provider": "none" },
   "compositor": {
     "budgetFraction": 0.55,
     "contextWindowReserve": 0.25,
@@ -511,81 +460,29 @@ cat > ~/.openclaw/hypermem/config.json <<'JSON'
 JSON
 ```
 
- This keeps a fresh install in lightweight embedding mode while also applying the current recommended lean compositor baseline for OpenClaw operators. Add an embedding provider later for semantic search without losing stored data. See [INSTALL.md](./INSTALL.md#embedding-providers) and [docs/TUNING.md](./docs/TUNING.md) for adjustments.
-
- #### Step 2, inspect current OpenClaw plugin config
+ Then merge the staged plugin paths into OpenClaw config and set the slots:
 
 ```bash
 openclaw config get plugins.load.paths
 openclaw config get plugins.allow
- ```
-
- Record what is already there. You are going to **merge**, not replace.
-
- #### Step 3, wire the plugins into OpenClaw
 
- > **⚠️ Merge, don't overwrite.** If you already have values in `plugins.load.paths` or `plugins.allow`, check them first and include your existing entries alongside the new ones. Replacing the list drops whatever was there before.
- >
- > ```bash
- > openclaw config get plugins.allow
- > openclaw config get plugins.load.paths
- > ```
-
- ```bash
- # Use a variable to avoid shell quote-escaping issues with $HOME:
 HYPERMEM_PATHS="[\"${HOME}/.openclaw/plugins/hypermem/plugin\",\"${HOME}/.openclaw/plugins/hypermem/memory-plugin\"]"
 openclaw config set plugins.load.paths "$HYPERMEM_PATHS" --strict-json
- # If you have existing load paths, merge them into the array in HYPERMEM_PATHS.
-
 openclaw config set plugins.slots.contextEngine hypercompositor
 openclaw config set plugins.slots.memory hypermem
 
- # Only set plugins.allow if your OpenClaw config already uses an allowlist.
- # If `openclaw config get plugins.allow` returns null, empty, or unset, skip this step.
- # If it returns an array, copy that array and append "hypercompositor" and "hypermem".
+ # Only if your install already uses plugins.allow: merge, do not replace.
 openclaw config set plugins.allow '["existing-plugin","hypercompositor","hypermem"]' --strict-json
 
 openclaw gateway restart
+ hypermem-doctor --fix-plan
+ hypermem-status --health
+ hypermem-model-audit --strict
 ```
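The merge-don't-replace caveat for `plugins.allow` is worth automating. A merge-safe sketch (hypothetical: it assumes `openclaw config get plugins.allow` prints a JSON array, and is demonstrated against a literal value so it runs standalone):

```shell
# Stand-in for: current=$(openclaw config get plugins.allow)
current='["existing-plugin","some-channel"]'

# Append HyperMem entries without dropping what is already there.
merged=$(node -e '
  const cur = JSON.parse(process.argv[1]);
  const out = [...new Set([...cur, "hypercompositor", "hypermem"])];
  console.log(JSON.stringify(out));
' "$current")

echo "$merged"
# Then: openclaw config set plugins.allow "$merged" --strict-json
```

Using `Set` keeps the update idempotent: re-running it never duplicates entries already in the allowlist.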
 
- Do **not** replace a working `plugins.allow` list with only `['hypercompositor','hypermem']`. That can disable bundled CLI surfaces and channel plugins.
+ `hypermem-doctor` is the confidence check: it validates plugin wiring, runtime load state, recommended OpenClaw settings such as `contextPruning.mode=off`, GPT-5 personality overlay off, startup/bootstrap injection sizing, compaction safety settings, HyperMem data files, and model context-window overrides for GPT/OpenAI-compatible/local gateways. It is read-only and prints a reviewable fix plan.
 
- If `plugins.allow` is unset, null, or empty, leave it alone. Do **not** create a new allowlist unless your OpenClaw install already uses one.
-
- #### Step 4, restart the gateway
-
- ```bash
- openclaw gateway restart
- ```
-
- #### Step 5, verify install state
-
- Verification should answer which state you are in, not just whether one command succeeded.
-
- | State | What it means | How to verify |
- |---|---|---|
- | Runtime staged | files copied into plugin runtime dir | `ls ~/.openclaw/plugins/hypermem` |
- | Wired | OpenClaw config points at HyperMem | `openclaw config get plugins.slots.contextEngine`, `openclaw config get plugins.slots.memory` |
- | Loaded | gateway actually loaded both plugins | `openclaw plugins list` |
- | Healthy but empty | plugin active, no real session data yet | `node bin/hypermem-status.mjs --health` may report no sessions ingested |
- | Active | HyperMem is composing live turns | `openclaw logs --limit 50 \| grep hypermem` shows compose activity |
-
- Run these commands from the repo clone directory when using `bin/hypermem-status.mjs`, because `bin/` is a relative path:
-
- ```bash
- openclaw plugins list                      # hypercompositor and hypermem should show as loaded
- node bin/hypermem-status.mjs --health      # confirms database initialization
- openclaw logs --limit 50 | grep hypermem   # should show "hypermem initialized"
- ```
-
- Expected first-run outcomes:
-
- - `openclaw plugins list` shows `hypercompositor` and `hypermem` loaded
- - `node bin/hypermem-status.mjs --health` may say `no sessions ingested` on a fresh install, which is normal
- - logs should show `hypermem initialized`
- - logs should show compose activity after you send a real message to any agent
-
- If you see `falling back to default engine "legacy"` in the logs, the install is **not active**. Check [INSTALL.md troubleshooting](./INSTALL.md#troubleshooting-clean-installs).
+ Full install, upgrade, source-clone, embedding provider, reranker, fleet config, and rollback guidance lives in **[INSTALL.md](./INSTALL.md)**.
 
 ### One-line installer
 
@@ -593,9 +490,7 @@ If you see `falling back to default engine "legacy"` in the logs, the install is
 curl -fsSL https://raw.githubusercontent.com/PsiClawOps/hypermem/main/install.sh | bash
 ```
 
- Interactive: detects hardware, selects embedding tier, writes config, registers plugins.
-
- Full guide with installation states, merge-safe config wiring, embedding tiers, reranker setup, fleet config, and tuning: **[INSTALL.md](./INSTALL.md)**
+ The shell installer stages the runtime and prints merge-safe activation commands. It does not edit OpenClaw config or restart the gateway.
 
 ### Agent-assisted install
 
@@ -658,78 +553,30 @@ Or configure through `openclaw.json` (preferred for managed deployments):
 
 Plugin config in `openclaw.json` takes precedence over `config.json`. Both sources are merged, with plugin config winning on overlap. The config schema is validated on gateway start and visible via `openclaw config get plugins.entries.hypercompositor.config`.
 
- Full reference: **[docs/TUNING.md](./docs/TUNING.md)**
-
- ---
-
- ## API
-
- > **Note:** The examples below use placeholder agent names (`my-agent`, `agent1`, etc.). Replace these with your actual agent IDs from your OpenClaw config. Single-agent installs typically use `main`. Multi-agent fleets use whatever IDs you've configured. See [INSTALL.md § "Configure your fleet"](./INSTALL.md#step-5--configure-your-fleet-multi-agent-only) for details.
+ **Key tuning knobs:**
+ - `verboseLogging` — set to `true` in the compositor config to see per-turn budget resolution in the gateway logs (`budget source:` lines show which window size is active and why).
+ - `contextWindowOverrides` — override the detected context window per `"provider/model"` key when autodetect gives wrong results for custom, local, or finetuned models. Fixes all downstream budget fractions in one place.
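As an illustration, both knobs sit in the compositor block of `config.json`; the model keys and window sizes below are placeholders, not recommendations:

```json
{
  "compositor": {
    "verboseLogging": true,
    "contextWindowOverrides": {
      "openai/my-proxy-model": 128000,
      "ollama/my-finetune": 32768
    }
  }
}
```

With an override in place, downstream fractions such as `budgetFraction` and `contextWindowReserve` are computed from the corrected window rather than the misdetected one.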
 
- ```typescript
- import { HyperMem } from '@psiclawops/hypermem';
- import { join } from 'node:path';
- import { homedir } from 'node:os';
-
- const hm = await HyperMem.create({
-   dataDir: join(homedir(), '.openclaw', 'hypermem'),
-   cache: { maxEntries: 10000 },
-   // Local (Ollama):
-   embedding: { ollamaUrl: 'http://localhost:11434', model: 'nomic-embed-text' },
-   // Hosted (OpenRouter), recommended for installs without local GPU/CPU:
-   // embedding: { provider: 'openai', openaiApiKey: 'sk-or-...', openaiBaseUrl: 'https://openrouter.ai/api/v1', model: 'qwen/qwen3-embedding-8b', dimensions: 4096, batchSize: 128 },
- });
-
- // Record and compose
- await hm.recordUserMessage('my-agent', 'agent:my-agent:webchat:main', 'How does drift detection work?');
-
- const composed = await hm.compose({
-   agentId: 'my-agent',
-   sessionKey: 'agent:my-agent:webchat:main',
-   prompt: 'How does drift detection work?',
-   tokenBudget: 4000,
-   provider: 'anthropic',
- });
-
- // Refresh tool compression after each turn
- await hm.refreshCacheGradient('my-agent', 'agent:my-agent:webchat:main');
- ```
-
- Spawning a subagent with parent context:
-
- ```typescript
- import { buildSpawnContext, MessageStore, DocChunkStore } from '@psiclawops/hypermem';
-
- const spawn = await buildSpawnContext(
-   new MessageStore(hm.dbManager.getMessageDb('my-agent')),
-   new DocChunkStore(hm.dbManager.getLibraryDb()),
-   'my-agent',
-   { parentSessionKey: 'agent:my-agent:webchat:main', workingSnapshot: 12 }
- );
- ```
+ Full reference: **[docs/TUNING.md](./docs/TUNING.md)**
 
 ---
 
- ## CLI
+ ## API and CLI references
 
- `bin/hypermem-status.mjs` provides health checks and metrics from the command line:
+ README keeps the interface surface short. Use the detailed docs for exact examples and release validation commands.
 
- ```bash
- node bin/hypermem-status.mjs                    # full dashboard
- node bin/hypermem-status.mjs --agent my-agent   # scoped to one agent
- node bin/hypermem-status.mjs --json             # machine-readable output
- node bin/hypermem-status.mjs --health           # health checks only (exit 1 on failure)
- ```
+ **Runtime API:** import `HyperMem` from `@psiclawops/hypermem` for direct Node.js use, custom tests, and non-OpenClaw integrations. See **[INSTALL.md § Non-OpenClaw usage](./INSTALL.md#non-openclaw-usage)** and package TypeScript declarations for the current interface.
 
- By default, `hypermem-status` looks for data in `~/.openclaw/hypermem`. If your data directory is elsewhere (e.g. testing in an isolated environment), set:
+ **Operator CLIs:**
 
 ```bash
- HYPERMEM_DATA_DIR=/path/to/data node bin/hypermem-status.mjs --health
+ hypermem-status --health
+ hypermem-status --master
+ hypermem-model-audit --strict
+ hypermem-bench --iterations 1000 --warmup 50 --agent main
 ```
 
- > **Fresh install note:** If no agent has run a session yet, `--health` will report "no sessions ingested" rather than a database error. This is expected. Send a test message to any agent, then re-run the health check.
-
- ---
+ Diagnostics and validation details: **[docs/DIAGNOSTICS.md](./docs/DIAGNOSTICS.md)** and **[docs/INTEGRATION_VALIDATION.md](./docs/INTEGRATION_VALIDATION.md)**.
 
 ## Pressure management
 
@@ -780,7 +627,7 @@ Full troubleshooting: **[INSTALL.md § Troubleshooting](./INSTALL.md#troubleshoo
 
 hypermem doesn't touch your existing memory data. Install it, switch the context engine, and migrate historical data on your own timeline.
 
- The migration guide includes worked examples showing how to bring data from OpenClaw built-in memory, Mem0, Honcho, QMD session exports, and Engram. Each example walks through the data model mapping, transformation steps, and validation. Adapt them to your setup.
+ The migration guide includes worked examples showing how to bring data from OpenClaw built-in memory, QMD, ClawText, Cognee, Mem0, Zep, Honcho, memory-lancedb, MEMORY.md files, and custom engines. Each path documents source mapping, dry-run expectations, activation, rollback, and post-migration validation. Adapter snippets are examples unless explicitly shipped as package binaries.
 
 All examples default to dry-run. Nothing is written until you add `--apply`.
 
@@ -791,7 +638,7 @@ Operator guide: **[docs/MIGRATION_GUIDE.md](./docs/MIGRATION_GUIDE.md)**
 
 ## Identity layer
 
- hypermem handles context and output normalization. The Agentic Cognitive Architecture handles identity: self-authored SOUL files, structured communication contracts, and identity persistence across sessions. Same team, complementary layers.
+ hypermem handles context assembly and output-profile shaping. The Agentic Cognitive Architecture handles identity: self-authored SOUL files, structured communication contracts, and identity persistence across sessions. Same team, complementary layers.
 
 Design guide: [PsiClawOps/AgenticCognitiveArchitecture](https://github.com/PsiClawOps/AgenticCognitiveArchitecture/)