@psiclawops/hypermem 0.1.0 → 0.5.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (93)
  1. package/ARCHITECTURE.md +4 -3
  2. package/README.md +457 -174
  3. package/package.json +15 -5
  4. package/dist/background-indexer.d.ts +0 -117
  5. package/dist/background-indexer.d.ts.map +0 -1
  6. package/dist/background-indexer.js +0 -732
  7. package/dist/compaction-fence.d.ts +0 -89
  8. package/dist/compaction-fence.d.ts.map +0 -1
  9. package/dist/compaction-fence.js +0 -153
  10. package/dist/compositor.d.ts +0 -139
  11. package/dist/compositor.d.ts.map +0 -1
  12. package/dist/compositor.js +0 -1109
  13. package/dist/cross-agent.d.ts +0 -57
  14. package/dist/cross-agent.d.ts.map +0 -1
  15. package/dist/cross-agent.js +0 -254
  16. package/dist/db.d.ts +0 -131
  17. package/dist/db.d.ts.map +0 -1
  18. package/dist/db.js +0 -398
  19. package/dist/desired-state-store.d.ts +0 -100
  20. package/dist/desired-state-store.d.ts.map +0 -1
  21. package/dist/desired-state-store.js +0 -212
  22. package/dist/doc-chunk-store.d.ts +0 -115
  23. package/dist/doc-chunk-store.d.ts.map +0 -1
  24. package/dist/doc-chunk-store.js +0 -278
  25. package/dist/doc-chunker.d.ts +0 -99
  26. package/dist/doc-chunker.d.ts.map +0 -1
  27. package/dist/doc-chunker.js +0 -324
  28. package/dist/episode-store.d.ts +0 -48
  29. package/dist/episode-store.d.ts.map +0 -1
  30. package/dist/episode-store.js +0 -135
  31. package/dist/fact-store.d.ts +0 -57
  32. package/dist/fact-store.d.ts.map +0 -1
  33. package/dist/fact-store.js +0 -175
  34. package/dist/fleet-store.d.ts +0 -144
  35. package/dist/fleet-store.d.ts.map +0 -1
  36. package/dist/fleet-store.js +0 -276
  37. package/dist/hybrid-retrieval.d.ts +0 -60
  38. package/dist/hybrid-retrieval.d.ts.map +0 -1
  39. package/dist/hybrid-retrieval.js +0 -340
  40. package/dist/index.d.ts +0 -611
  41. package/dist/index.d.ts.map +0 -1
  42. package/dist/index.js +0 -1042
  43. package/dist/knowledge-graph.d.ts +0 -110
  44. package/dist/knowledge-graph.d.ts.map +0 -1
  45. package/dist/knowledge-graph.js +0 -305
  46. package/dist/knowledge-store.d.ts +0 -72
  47. package/dist/knowledge-store.d.ts.map +0 -1
  48. package/dist/knowledge-store.js +0 -241
  49. package/dist/library-schema.d.ts +0 -22
  50. package/dist/library-schema.d.ts.map +0 -1
  51. package/dist/library-schema.js +0 -717
  52. package/dist/message-store.d.ts +0 -76
  53. package/dist/message-store.d.ts.map +0 -1
  54. package/dist/message-store.js +0 -273
  55. package/dist/preference-store.d.ts +0 -54
  56. package/dist/preference-store.d.ts.map +0 -1
  57. package/dist/preference-store.js +0 -109
  58. package/dist/preservation-gate.d.ts +0 -82
  59. package/dist/preservation-gate.d.ts.map +0 -1
  60. package/dist/preservation-gate.js +0 -150
  61. package/dist/provider-translator.d.ts +0 -40
  62. package/dist/provider-translator.d.ts.map +0 -1
  63. package/dist/provider-translator.js +0 -349
  64. package/dist/rate-limiter.d.ts +0 -76
  65. package/dist/rate-limiter.d.ts.map +0 -1
  66. package/dist/rate-limiter.js +0 -179
  67. package/dist/redis.d.ts +0 -188
  68. package/dist/redis.d.ts.map +0 -1
  69. package/dist/redis.js +0 -534
  70. package/dist/schema.d.ts +0 -15
  71. package/dist/schema.d.ts.map +0 -1
  72. package/dist/schema.js +0 -203
  73. package/dist/secret-scanner.d.ts +0 -51
  74. package/dist/secret-scanner.d.ts.map +0 -1
  75. package/dist/secret-scanner.js +0 -248
  76. package/dist/seed.d.ts +0 -108
  77. package/dist/seed.d.ts.map +0 -1
  78. package/dist/seed.js +0 -177
  79. package/dist/system-store.d.ts +0 -73
  80. package/dist/system-store.d.ts.map +0 -1
  81. package/dist/system-store.js +0 -182
  82. package/dist/topic-store.d.ts +0 -45
  83. package/dist/topic-store.d.ts.map +0 -1
  84. package/dist/topic-store.js +0 -136
  85. package/dist/types.d.ts +0 -329
  86. package/dist/types.d.ts.map +0 -1
  87. package/dist/types.js +0 -9
  88. package/dist/vector-store.d.ts +0 -132
  89. package/dist/vector-store.d.ts.map +0 -1
  90. package/dist/vector-store.js +0 -498
  91. package/dist/work-store.d.ts +0 -112
  92. package/dist/work-store.d.ts.map +0 -1
  93. package/dist/work-store.js +0 -273
package/README.md CHANGED
@@ -1,243 +1,526 @@
- # HyperMem
+ <p align="center">
+   <img src="assets/logo.png" alt="hypermem" width="283" />
+ </p>

- Agent-centric memory system for OpenClaw. Four-layer architecture: Redis hot cache → per-agent message DB → per-agent vector DB → shared fleet library.
+ <p align="center"><em>Coherent agents. Every session.</em></p>

- **Status:** Core complete + context engine plugin shipped. 29 modules, ~12,300 lines, 419 tests across 11 suites. All passing.
+ ---

- ## What It Does
+ hypermem is a runtime context engine for OpenClaw agents.

- HyperMem replaces the default OpenClaw context assembly pipeline. Instead of the runtime managing conversation history and compaction, HyperMem owns the full prompt composition lifecycle:
+ ```bash
+ curl -fsSL https://raw.githubusercontent.com/PsiClawOps/hypermem/main/install.sh | bash
+ ```

- 1. **Records** every message to SQLite (L2) and Redis (L1) as it arrives
- 2. **Indexes** conversations and workspace files for semantic retrieval (L3)
- 3. **Composes** each LLM prompt fresh from storage — facts, knowledge, history, recall — within a strict token budget
- 4. **Owns compaction** — the runtime's legacy compaction is bypassed entirely; HyperMem handles its own context management

- This means agents get structured, budget-aware context every turn instead of a growing transcript that eventually gets summarized.
+ ---

- ## Architecture
+ ## The problem
+
+ Every LLM conversation is composed at runtime. The model sees only what's in the prompt. It has no memory of prior sessions, no access to decisions made last week, no awareness of work that happened before this context window opened.
+
+ Two questions make this concrete:
+
+ | Question | What the LLM has | What happens |
+ |---|---|---|
+ | *"What was Caesar's greatest military victory?"* | Training data | ✅ Answered correctly, no session context needed |
+ | *"What did we decide about the retry logic last week?"* | Nothing (prior session is gone) | ❌ The decision existed only in that session |
+
+ The difference isn't intelligence. It's what was in the prompt. Two failure modes follow:
+
+ **New-session amnesia.** The agent restarts and everything is gone. Decisions, preferences, work in progress: erased at the session boundary. Operators re-explain context. Agents re-ask questions already answered.
+
+ **Compaction crunch.** Long sessions fill the context window. The runtime summarizes to make room. Specifics (tool output, exact decisions, file paths) are lost in the summary. The agent keeps running, but degraded.
+
+ ---
+
+ ## What OpenClaw provides today
+
+ OpenClaw addresses both failure modes with structured guidance files injected into every session:
+
+ | File | What it contributes | Survives session restart? |
+ |---|---|---|
+ | `SOUL.md` | Agent identity, voice, principles | ✅ always injected |
+ | `USER.md` | User preferences, working style | ✅ always injected |
+ | Task and workspace instruction files (`AGENTS.md`, job files) | Task scope and related working guidance | ✅ always injected |
+ | `MEMORY.md` | Hand-curated decisions, facts, patterns | ✅ if manually maintained |
+
+ These are powerful for identity and preferences. But the retry logic decision from last week? If nobody manually captured it into `MEMORY.md`, that session boundary erased it. The system is only as strong as its last manual update.
+
+ OpenClaw also ships compaction safeguards and hybrid file search. That's a solid baseline, but it has limits, and hypermem closes both gaps.
+
+ ---
+
+ ## hypermem
+
+ Four storage layers, sub-millisecond retrieval, no external database services required. Runs in-process with local SQLite storage and local Nomic embeddings by default, with optional hosted embeddings for L3.

- See [ARCHITECTURE.md](./ARCHITECTURE.md) for the full design.
+ | Layer | What it holds | Speed |
+ |---|---|---|
+ | **L1 In-memory** | What the agent needs right now. Identity, recent history, active state. | 0.08ms |
+ | **L2 History** | Every conversation, queryable and concurrent-safe. Per-agent. | 0.13ms |
+ | **L3 Semantic** | Finds related content even when the words don't match. | 0.29ms |
+ | **L4 Knowledge** | Facts, wiki pages, episodes, preferences. Shared across agents. | 0.09ms |
+
+ Everything is retained. Storage survives session boundaries. The retry logic decision from last week, the deployment preferences from last month, the architecture choices from day one: all queryable, all available for composition.
+
+ **Session warming.** Before the first turn fires, hypermem pre-loads the agent's full working state from the in-memory SQLite cache: recent history, facts ranked by confidence and recency, active topic context, cached embeddings for fast semantic recall. The agent's first reply draws from everything that was in scope at the end of the last session. The agent picks up where it left off.
+
+ ---
+
+ ## hypercompositor
+
+ Every memory system stores. Almost none compose.
+
+ Your agent has four layers of stored context, but what shows up in the prompt? How much of the token budget goes to stale content? Who decides what's relevant to this specific turn?
+
+ The hypercompositor queries all four layers in parallel on every turn and composes context within a fixed token budget. No transcript accumulates. No lossy transcript summarization. Amnesia isn't a storage problem; the memories exist, but nobody composed them into a coherent prompt. Compaction isn't inevitable; content that doesn't fit this turn stays in storage instead of being destroyed.
+
+ **Bigger context windows don't help if you fill them with stale history.**
+ 128k tokens of stale history and irrelevant memory is worse than 32k of precisely selected content. 10 budget categories, priority-ordered, greedy-fill. Every token in the prompt earned its spot.
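The priority-ordered greedy fill is easy to picture in code. A minimal sketch, assuming slots arrive in priority order with a per-slot cap and a measure of available content; the slot names, numbers, and `allocate` helper are illustrative, not hypermem's actual categories or API:

```typescript
// Hypothetical sketch of a priority-ordered greedy budget allocator.
// Slot names and token counts are illustrative, not hypermem's real categories.
interface Slot {
  name: string;
  cap: number;       // maximum tokens this slot may claim
  available: number; // tokens of candidate content in storage
}

function allocate(budget: number, slots: Slot[]): Map<string, number> {
  const grants = new Map<string, number>();
  let remaining = budget;
  for (const slot of slots) {          // slots arrive in priority order
    const grant = Math.min(slot.cap, slot.available, remaining);
    grants.set(slot.name, grant);
    remaining -= grant;                // lower-priority slots compete for what is left
  }
  return grants;
}
```

High-priority slots claim tokens first; whatever a slot cannot claim stays in storage rather than being summarized away.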
+
+ ### What the model actually sees

+ Token budget allocation from a mature session (847 turns deep, 128k budget):
+
+ ```
+ What the model sees (92k of 128k tokens, 72% utilization):
+
+ ┌────────────────┬──────────────────────────┬──────────────┬───────────┬────────────┬────────────┬──────────────┬──────────┐
+ │ id/sys/user │ history │ recent tools │ keystones │ wiki/know. │ facts │ recall/sem. │ reserve │
+ │ tools 14,000 │ 46,000 │ 10,000 │ 3,600 │ 2,600 │ 2,200 │ 1,600 │ 12,000 │
+ │ │ 65-90 tool or 120-160 │ │ │ │ top ~28 │ │ │
+ └────────────────┴──────────────────────────┴──────────────┴───────────┴────────────┴────────────┴──────────────┴──────────┘
+ ◄────────────────────────────────────────────── 72% composed ──────────────────────────────────────────────►
+
+ What's in storage, not in this prompt:
+
+ L2 847 turns stored top 70-120 shown depending on turn density
+ L3 28,441 indexed episodes available via semantic search
+ L4 5,104 facts ranked by confidence × decay, top ~28 selected
+ L4 847 knowledge entries active-topic subset shown, rest on standby
+
+ Everything stays in storage. The compositor picks what's relevant right now.
+ Change the topic, and the next turn pulls different content from the same storage.
 ```
- L1 Redis Hot session cache (sub-ms reads, identity kernel, fleet cache)
- L2 Messages DB Per-agent conversation history (SQLite, rotatable)
- L3 Vectors DB Per-agent semantic search (sqlite-vec, 768d embeddings)
- L4 Library DB Fleet-wide structured knowledge (10 collections + knowledge graph)
+
+ ### Standard context engine vs. hypercompositor
+
 ```
+ Standard hypercompositor
+ ──────────────────────────────── ────────────────────────────────
+ message → append to transcript message → detect active topic
+ transcript full → trim oldest query 4 storage layers in parallel
+ trimmed content → summarize (lossy) budget allocator: 10 slots, fixed cap
+ send transcript to model tool compression by turn age
+ model responds → append again keystone guard + hyperform profile
+ composed prompt → model
+ ┌──────────────────┐ model responds → afterTurn ingest
+ │ loop until full │ → write back to all 4 layers
+ └──────────────────┘
+
+ When it fills: When budget is exceeded:
+ content is lost permanently content stays in storage
+ summaries are lossy not selected for this turn
+ no recovery path change topic back → retrieved again
+ ```
+
+ | | Standard | hypercompositor |
+ |---|---|---|
+ | Context source | Growing transcript | 4 independent storage layers |
+ | When context fills | Trim + summarize (lossy) | Budget allocation (lossless storage) |
+ | Old decisions | Lost after compaction | Retrievable via keystones + semantic recall |
+ | Topic changes | All history competes equally | Scoped retrieval by active topic |
+ | Tool output | Stays until trimmed | Cluster-compressed by age |
+ | Model swap mid-session | Re-count, hope it fits | Budget recomputed from new window size next turn |
+
+ High-signal turns are marked as keystones and survive pressure trimming ahead of ordinary history.
+
+ ---
+
+ ## hyperform
+
+ Raw model output has two problems. It drifts from your standards (sycophancy, hedging, pagination, formatting) and it drifts from your facts (confabulation, contradiction, stale claims). hyperform handles both.
+
+ **Normalization** shapes output to match a profile you define. Three presets ship with hypermem:
+
+ | Profile | Tokens | Covers |
+ |---|---|---|
+ | `light` | ~100 | Anti-sycophancy, em dash ban, AI vocab ban, length targets, evidence calibration |
+ | `standard` | ~250 | Full directive set plus pagination rules and hedging policy |
+ | `full` | ~400 | Complete normalization with full directive set and model-specific calibration |
+
+ The same prompt, GPT-5.4, with and without `outputProfile: "light"`:
+
+ ```
+ Prompt: "How should I size my context window budget for a long-running agent session?"
+
+ WITHOUT normalization (GPT-5.4 default):
+ Here are the key factors to consider when sizing your context window budget:
+
+ **1. Session depth**
+ Longer sessions accumulate more history...
+
+ **2. Tool output volume**
+ Agentic sessions generate significant tool output...
+
+ **3. Fact corpus size**
+ More stored facts means more retrieval candidates...
+
+ Would you like me to go deeper on any of these?
+
+ WITH outputProfile: "light":
+ For a 128k window: reserve 14k for identity/system, target 46k for history, 10k for recent
+ tool context, and leave ~30k as allocator reserve. hypermem handles slot competition
+ automatically -- set contextWindowReserve to your preferred floor and let the compositor fill.
+ ```
+
+ **Verification** checks claims against the fact corpus before they're recorded. No LLM call. Pattern matching against stored facts, with confidence scoring and contradiction detection. Unsupported claims are flagged, contradictions surface in diagnostics, and a confabulation risk score is attached to the stored episode.
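A toy version of that check, assuming facts stored as key/value pairs with confidence scores. The real verifier's pattern matching is richer; `verifyClaim` and the `Fact` shape here are hypothetical, for illustration only:

```typescript
// Illustrative sketch: check a claimed key/value pair against a stored fact corpus.
// No LLM call; just lookup, confidence scoring, and contradiction detection.
interface Fact { key: string; value: string; confidence: number; }

type Verdict = "supported" | "contradicted" | "unsupported";

function verifyClaim(key: string, value: string, corpus: Fact[]): { verdict: Verdict; confidence: number } {
  const matches = corpus.filter(f => f.key === key);
  if (matches.length === 0) return { verdict: "unsupported", confidence: 0 };
  // Judge against the highest-confidence stored fact for this key
  const best = matches.reduce((a, b) => (a.confidence >= b.confidence ? a : b));
  return best.value === value
    ? { verdict: "supported", confidence: best.confidence }
    : { verdict: "contradicted", confidence: best.confidence };
}
```

A "contradicted" verdict is what surfaces in diagnostics; "unsupported" feeds the confabulation risk score.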
+
+ ---
+
+ ## What it solves
+
+ ### Tool output that doesn't take over
+
+ Agentic sessions generate massive tool output. Left unmanaged, old results crowd out current reasoning. hypermem compresses tool history by age: recent clusters stay full, older clusters are capped, and the oldest collapse to short stubs while preserving tool call/result integrity. The budget goes to current work, not last hour's npm test output.
+
+ ### Knowledge that outlasts the conversation
+
+ Most memory systems store what was said. hypermem synthesizes what was learned.
+
+ When a topic goes quiet, hypermem compiles the thread into a structured wiki page: decisions, open questions, artifacts, participants. When the topic resurfaces, the agent gets a compact structured summary rather than a raw history replay.
+
+ OpenClaw 2026.4.7 ships memory wiki for structured storage. hypermem goes further: wiki pages are synthesized automatically and injected by the compositor within token budget.
+
+ ### Subagents that hit the ground running
+
+ Spawned subagents inherit a bounded context block: recent parent turns, session-scoped documents, and relevant facts. Scope is isolated from the shared library. Documents are cleaned up on completion.
+
+ ---
+
+ ## Pressure management
+
+ hypermem composes context fresh on every turn, but a long-running session still accumulates history in its JSONL transcript. When that grows large enough, incoming tool results have nowhere to land and get silently stripped. Three automatic paths handle this:
+
+ | Path | Trigger | Action |
+ |---|---|---|
+ | **Pressure-tiered tool-loop trim** | Any tool-loop turn | Measures projected occupancy before results land; trims large results at 80%+ and truncates the messages[] array for the current turn |
+ | **AfterTurn trim** | Every turn at >80% | Pre-emptive headroom cut after the assistant replies, before the next turn arrives |
+ | **Deep compaction** | compact() at >85% | Cuts in-memory cache to 25% budget and truncates JSONL to ~20% depth. Bypasses the normal reshape guard |
+
+ **The one thing these paths cannot fix:** a session whose JSONL transcript on disk is already at 98% when the gateway restarts. The JSONL loads into runtime context before any compaction runs. Check `session_status` on startup. If you're above 85%, start a fresh session.
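The three paths reduce to a dispatcher keyed on projected occupancy. A sketch under the thresholds stated above; the function and phase names are illustrative, not hypermem's internals:

```typescript
// Hypothetical dispatcher over the three pressure paths.
// Thresholds mirror the table (80% trim, 85% deep compaction).
type PressurePath = "none" | "toolLoopTrim" | "afterTurnTrim" | "deepCompaction";

function choosePath(occupancy: number, phase: "toolLoop" | "afterTurn" | "compact"): PressurePath {
  if (phase === "compact" && occupancy > 0.85) return "deepCompaction";  // compact() at >85%
  if (phase === "toolLoop" && occupancy >= 0.80) return "toolLoopTrim";  // trims at 80%+
  if (phase === "afterTurn" && occupancy > 0.80) return "afterTurnTrim"; // headroom cut at >80%
  return "none";
}
```

Below the thresholds nothing fires, which is why a transcript already at 98% on disk loads before any of these paths can help.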
+
+ ---
+
+ ## How it works
+
+ 1. **Record** each turn into SQLite and mirror hot session state into the in-memory cache.
+ 2. **Index** conversations and workspace files for exact and semantic recall.
+ 3. **Assemble** a fresh prompt from history, facts, document chunks, and library data within a strict budget.
+ 4. **Tune** tool-heavy history by turn age so old payloads don't crowd out current work.
+ 5. **Compile** stale topics into structured wiki pages for future recall without raw history replay.
+ 6. **Carry forward** scoped context into subagents when a task needs a narrower working set.
+
+ ### What runs automatically

- ### Key Components
+ No configuration required for any of these:

- - **Compositor** Assembles LLM prompts from all 4 layers with token budgeting. Each slot gets a proportional cap of remaining budget (facts 30%, knowledge 20%, preferences 10%, cross-session 20%). Post-assembly safety valve catches estimation drift and trims history to fit. Multi-provider output (Anthropic + OpenAI).
- - **Context Engine Plugin** OpenClaw plugin (`plugin/src/index.ts`) that registers HyperMem as a context engine. Owns compaction, translates between OpenClaw's runtime events and HyperMem's storage/composition pipeline. Drop-in replacement for the default context assembly.
- - **Hybrid Retrieval** Combined FTS5 full-text search + KNN vector similarity with Reciprocal Rank Fusion for recall quality.
- - **Doc Chunker** Section-aware markdown/file parser that splits workspace documents into semantically meaningful chunks for indexing.
- - **Workspace Seeder** Indexes workspace files (AGENTS.md, SOUL.md, POLICY.md, daily memory, etc.) into L4 collections with idempotent re-indexing and source-hash deduplication.
- - **Fleet Cache** — Redis hot layer for agent profiles + fleet summary. Cache-aside reads, write-through invalidation, bulk hydration on gateway startup.
- - **Knowledge Graph** — DAG traversal over entity relationships. BFS with depth/direction/type filters, shortest path, degree analytics.
- - **Rate Limiter** — Token-bucket for embedding API calls. Priority queue (high > normal > low) with reserved tokens for user-facing recall.
- - **Secret Scanner** — Scans content for API keys, tokens, and credentials before storage. Prevents accidental persistence of secrets.
- - **Provider Translator** — Converts between neutral internal format and Anthropic/OpenAI at the output boundary. Handles tool call ID round-tripping.
- - **Message Rotation** — Automatic rotation of message DBs at 100MB / 90 days. WAL checkpoint before rotate.
- - **Background Indexer** — LLM-powered fact/knowledge extraction from conversations (framework complete).
+ - **Semantic indexer:** indexes each session's turns for recall after activity drops off. Embeddings are computed asynchronously after the assistant replies and cached for subsequent turns, so compose calls hit cache rather than computing on demand
+ - **Topic synthesis:** compiles stale topics into structured wiki pages and promotes high-signal facts from the hot cache to pointer-format entries in MEMORY.md; both classifier-driven, no LLM call
+ - **Noise sweep:** removes low-signal or expired facts on a rolling basis
+ - **Tool decay:** compresses older tool history to free budget for current work
+ - **Keystone scoring:** evaluates each recorded turn for historical significance; high-signal turns are marked for preservation ahead of ordinary history during pressure trimming

- ### Library Collections (L4)
+ ---

- | Collection | Purpose |
+ ## Speed
+
+ Benchmarked against a production database: 5,104 facts, 28,441 episodes, 847 knowledge entries, 42MB. 1,000 iterations, 50 warmup discarded, single-process isolation.
+
+ | Operation | avg | p50 | p95 |
+ |---|---|---|---|
+ | L1 slot GET (SQLite in-memory) | 0.08ms | 0.07ms | 0.13ms |
+ | L1 history window (100 messages) | 0.13ms | 0.11ms | 0.19ms |
+ | L4 facts (top-28, confidence × decay) | 0.28ms | 0.26ms | 0.36ms |
+ | L4 facts + agentId filter | 0.31ms | 0.29ms | 0.40ms |
+ | L4 FTS5 keyword search | 0.06ms | 0.05ms | 0.08ms |
+ | L4 FTS5 + agentId filter | 0.07ms | 0.06ms | 0.10ms |
+ | L4 knowledge query | 0.09ms | 0.08ms | 0.14ms |
+ | Recency decay scoring (28 rows, in JS) | 0.003ms | 0.002ms | 0.005ms |
+ > Query planner uses compound indexes on agentId + sort key; FTS5 performance improved 25% from baseline after index additions despite a 47% increase in stored data.
+
+ L1 and L4 structured retrieval are sub-millisecond. Vector embeddings are computed asynchronously after the assistant replies and cached in the in-memory layer, not on the primary composition call path. Users never wait for an embedding computation.
+
+ ---
+
+ ## Architecture
+
+ hypermem plugs into OpenClaw as a context engine and owns the full prompt composition lifecycle. It registers as both `contextEngine` and `memory`, providing the standard memory slot interface alongside full prompt composition: `memory_search` routes through the official slot and shows correctly in `openclaw plugins list`.
+
+ **L1: SQLite in-memory.** Sub-millisecond hot reads, no network dependency, no daemon, no retry logic. Identity, compressed session history, cached embeddings, topic-scoped session and recall state, and agent registry data. The compositor hits this first on every turn.
+
+ **L2: Messages DB.** A single `MEMORY.md` file doesn't hold per-agent conversation history at scale. Thousands of turns across dozens of agents need queryable, concurrent-safe storage. Per-agent SQLite with WAL mode, auto-rotating at 100MB or 90 days. Full conversation history and session metadata. Rotated archives remain readable for recall.
+
+ **L3: Vectors DB.** Keyword search alone misses semantically related content. A decision recorded as "we chose exponential backoff" won't match a search for "what was the retry strategy" without vector similarity. Per-agent sqlite-vec database with KNN search over prior turns and indexed workspace documents. Reconstructable from L2 if lost. Supports two embedding providers: Ollama (local, default `nomic-embed-text`) or hosted via OpenRouter (recommended: `qwen/qwen3-embedding-8b`, 4096d, top of MTEB retrieval leaderboard).
+
+ Retrieval follows a fixed pipeline on every compose call:
+
+ 1. **Trigger registry** fires first. Nine pattern triggers check for exact-match shortcuts. If one hits, scoped FTS5 prefix queries (`word1* OR word2*`) run against L4 collections and return immediately.
+ 2. **Semantic fallback** fires when no trigger matches. Bounded hybrid retrieval runs FTS5 + KNN in parallel, then merges via Reciprocal Rank Fusion (RRF). BM25 ranks and KNN cosine distances combine into a single ordered result.
+ 3. **Noise floor** filters anything below RRF 0.008 before it reaches the compositor.
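The RRF merge in step 2 can be sketched over two ranked ID lists. The `k` constant below is the conventional RRF smoothing parameter and the 0.008 floor follows step 3; the `rrfMerge` helper itself is illustrative, not hypermem's API:

```typescript
// Sketch of Reciprocal Rank Fusion: merge two ranked result lists
// (FTS5/BM25 order and KNN order) into one score-ordered list.
function rrfMerge(ftsRanked: string[], knnRanked: string[], k = 60, floor = 0.008): string[] {
  const scores = new Map<string, number>();
  for (const list of [ftsRanked, knnRanked]) {
    list.forEach((id, rank) => {
      // Each list contributes 1 / (k + rank); items in both lists accumulate
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .filter(([, s]) => s >= floor)     // noise floor before the compositor
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

An item near the top of both lists beats an item at the top of only one, which is the point of fusing keyword and vector rankings.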
+
+ FTS5 queries use compound indexes on `agentId + sort key` and prefix optimization (3+ chars, capped at 8 terms, OR queries). These indexes yielded a 25% read improvement over baseline despite a 47% increase in stored data.
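A sketch of such a prefix-query builder under the stated constraints (terms of 3+ characters, at most 8, OR-joined); the `buildFtsQuery` helper name is hypothetical:

```typescript
// Build an FTS5 MATCH string like "word1* OR word2*" from free-form input.
// 3+ character terms only, capped at 8 terms, OR-joined, per the constraints above.
function buildFtsQuery(input: string, minLen = 3, maxTerms = 8): string {
  const terms = input
    .toLowerCase()
    .split(/[^a-z0-9]+/)       // crude tokenizer for illustration
    .filter(t => t.length >= minLen)
    .slice(0, maxTerms);
  return terms.map(t => `${t}*`).join(" OR ");
}
```

The resulting string is what would be passed to an FTS5 `MATCH` clause; prefix tokens let the index serve partial-word hits cheaply.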
+
+ **L4: Library DB.** Per-agent storage can't hold shared knowledge. Facts established by one agent, wiki pages synthesized from cross-agent topics, shared registry state: these belong to the system, not one agent. One shared SQLite database:
+
+ | Collection | What it holds |
 |---|---|
 | Facts | Verifiable claims with confidence, domain, expiry, supersedes chains |
- | Knowledge | Domain/key/value structured data with FTS |
- | Episodes | Significant events with impact and participants |
- | Topics | Cross-session thread tracking |
- | Preferences | Operator/user behavioral patterns |
- | Fleet Registry | Agent registry with tier, org, capabilities |
- | System Registry | Service state and lifecycle tracking |
+ | Knowledge | Domain/key/value structured data with full-text search |
+ | Episodes | Significant events with impact scores and participant tracking |
+ | Topics | Cross-session thread tracking and synthesized wiki pages |
+ | Preferences | Operator behavioral patterns |
+ | Fleet Registry | Agent registry with tier, org, and capability metadata |
+ | System Registry | Service state and lifecycle |
 | Work Items | Work queue with status transitions and FTS5 |
 | Session Registry | Session lifecycle tracking |
- | Desired State | Per-agent config with automatic drift detection |
+ | Desired State | Per-agent config targets; compares running config against desired at gateway startup and surfaces drift for operator review |

- ## Context Engine Plugin
+ Facts are ranked by `confidence × recencyDecay`, where decay is exponential with a configurable half-life: recent, high-confidence facts float to the top while stale entries yield budget to newer knowledge.
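As a worked sketch of that ranking formula (the 30-day half-life below is an assumed default for illustration; hypermem makes the half-life configurable):

```typescript
// Fact ranking sketch: score = confidence × exponential recency decay.
// decay halves every halfLifeDays; a fact scored today keeps its full confidence.
function factScore(confidence: number, ageDays: number, halfLifeDays = 30): number {
  const decay = Math.pow(0.5, ageDays / halfLifeDays); // 1.0 today, 0.5 after one half-life
  return confidence * decay;
}
```

Sorting a fact corpus by this score and taking the top ~28 is the selection step the budget tables above refer to.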

- The plugin (`plugin/`) is how HyperMem integrates with OpenClaw. It implements the `ContextEngine` interface:
+ **Secret scanner:** Before any fact, episode, or knowledge entry with `org`, `council`, or `fleet` visibility is written to L4, hypermem scans the content for credentials, API keys, tokens, and connection strings. Matches are downgraded to `private` scope rather than rejected; the write succeeds without the content reaching shared-visible storage.

- ```typescript
- // plugin registers as a context engine with:
- {
- id: 'hypermem',
- name: 'HyperMem Context Engine',
- version: '0.1.0',
- ownsCompaction: true, // runtime skips legacy compaction
- }
+ **The compositor** queries all four layers in parallel on each turn, applies per-slot token caps, and composes a provider-format context block. A safety valve catches estimation drift and trims post-composition. Because the budget is computed from the model's actual context window at compose time (resolved from the model string when the runtime doesn't pass `tokenBudget` explicitly), a mid-session model swap triggers a budget recompute on the next turn. Structured tool history is guarded from destructive persistence during a budget downshift.
+
+ **Tool compression** groups calls with results into atomic clusters via `clusterNeutralMessages()`. T0 preserves the current turn plus the two most recent completed turns at full fidelity, matching OpenClaw's native `keepLastAssistants: 3` baseline. Above 80% projected occupancy, large T0 results are head-and-tail trimmed with a structured trim note rather than dropped. Older clusters then enter the gradient: T1 caps at 6k per result, T2 at 800 chars, T3 at 150-char stubs. A pair-integrity guard ensures call-result clusters survive or drop together. `getTurnAge()` counts tool clusters correctly, and `toolPairMetrics` logs pair-integrity anomalies at the OpenClaw seam. When `deferToolPruning` is enabled and OpenClaw's native `contextPruning` is active, the native pruner handles tool result trimming instead.
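The age gradient can be sketched as a pair of helpers. The per-tier caps (full, 6k, 800, 150) and the head-and-tail trim note follow the text; the T1/T2 age boundaries and both function names are assumptions for illustration:

```typescript
// Hypothetical age-tier helpers for tool-result clusters.
// Caps match the gradient in the text; the age cutoffs for T1/T2 are assumed.
function tierFor(turnAge: number): 0 | 1 | 2 | 3 {
  if (turnAge <= 2) return 0;   // current turn + two most recent completed turns, full fidelity
  if (turnAge <= 5) return 1;   // assumed boundary
  if (turnAge <= 10) return 2;  // assumed boundary
  return 3;
}

function capResult(text: string, tier: 0 | 1 | 2 | 3): string {
  const caps = [Infinity, 6000, 800, 150];
  const cap = caps[tier];
  if (text.length <= cap) return text;
  // Head-and-tail trim with a structured note, rather than dropping the result
  const half = Math.floor((cap - 20) / 2);
  return `${text.slice(0, half)}\n[...trimmed...]\n${text.slice(-half)}`;
}
```

Because trimming operates per cluster, a call and its result are capped together, which is what the pair-integrity guard enforces.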
+
+ **canPersistReshapedHistory** guards the compositor from persisting structurally reshaped history back to the JSONL transcript. When structured tool history is present, budget downshifts are computed but not committed to storage, preventing a lower-context snapshot from overwriting the full history on disk.
+
+ ```
+ user message
+
+ topic detection ──► scope retrieval to active thread
+
+ ┌────┴────────────────────────────────────────────┐
+ │ query 4 layers (parallel) │
+ │ │
+ │ L1 in-memory L2 History L3 Vectors L4 Library │
+ │ hot state durable semantic facts/wiki │
+ │ 0.1ms 0.16ms 0.29ms 0.08ms │
+ └────┬────────────────────────────────────────────┘
+
+ budget allocator ──► 10 slots, fixed token cap
+
+ tool compression ──► clusters by age, T0 3 turns full → T1 6k → T2 800 → T3 150-char stub
+
+ keystone guard ──► high-signal turns survive pressure
+
+ hyperform ──► output normalization directives
+
+ composed prompt
+
+ model response
+
+ afterTurn ──► write back to all 4 layers (tool-result carrier messages persisted through recordAssistantMessage, not flattened into plain user text, so structured tool results remain recoverable in durable history)
 ```

- **Lifecycle hooks handled:**
- | Event | Action |
- |---|---|
- | `gateway:startup` | Init HyperMem, auto-rotate DBs, hydrate fleet cache |
- | `agent:bootstrap` | Warm session (history, facts, profile → Redis) |
- | `message:received` | Record user message to SQLite + Redis |
- | `message:sent` | Record assistant message to SQLite + Redis |
- | `context:compose` | Full four-layer prompt assembly within token budget |
+ Slot-level budget allocation is shown in the [hypercompositor diagram](#what-the-model-actually-sees) above. The 72% composition figure is typical for a warm mature session. Multi-agent sessions with active registry and cross-session wiki may run slightly higher.

- **Install:** Deployed as an OpenClaw managed hook at `~/.openclaw/hooks/hypermem-core/handler.js`. The plugin build step copies compiled output to this path.
+ ---
83
331
 
84
332
  ## Requirements

- - **Node.js 22+** (uses built-in `node:sqlite`)
- - **Redis 7+** (optional — all operations degrade gracefully to SQLite-only)
- - **Ollama** (optional — for embedding generation, model: `nomic-embed-text`, 768d)
- - **sqlite-vec** (optional — for vector search)
+ **Current release: hypermem 0.5.0.** Topic-aware memory and compiled-knowledge system, optimized to run light by default and scale up when operators need richer context.

- ## Quick Start
+ What 0.5.0 includes:
+ - Topic-aware context tracking
+ - Compiled knowledge / wiki-like synthesis and recall
+ - Metrics dashboard primitives
+ - Obsidian import and export
+ - Aligned runtime profiles: `light`, `standard`, `full`

- ```bash
- npm install
- npm run build # TypeScript compilation + hook deployment
+ | Requirement | Version | Notes |
+ |---|---|---|
+ | **Node.js** | `>=22.0.0` | Required for native `node:sqlite` module |
+ | **better-sqlite3** | `^11.x` | Installed automatically via npm; powers L1 in-memory and L4 library |
+ | **sqlite-vec** | `0.1.9` | Bundled; no separate install needed |

- # Run all tests (requires Redis on localhost:6379 for full suite)
- npm test
+ SQLite is a library, not a service. All four layers run in-process with no external daemons. The nomic embedder on Ollama is the heaviest component, and even that is lighter to operate than pgvector or a hosted vector database.

- # Quick smoke test (SQLite-only, no external deps)
- npm run test:quick
+ **Runtime version constants** (importable from the package):
+ ```typescript
+ import {
+   ENGINE_VERSION, // '0.5.0'
+   MIN_NODE_VERSION, // '22.0.0'
+   MIN_SQLITE_VERSION, // '3.35.0'
+   SQLITE_VEC_VERSION, // '0.1.9'
+   MAIN_SCHEMA_VERSION, // 6 (hypermem.db)
+   LIBRARY_SCHEMA_VERSION_EXPORT, // 12 (library.db)
+ } from '@psiclawops/hypermem';
  ```

- ### Test Suites
+ Schema versions are stamped into each database on startup and checked on open. A database created by an older engine version is migrated forward automatically. A database created by a newer engine version throws on open.
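That open-time check reduces to a three-way comparison. A minimal sketch (illustrative function; the real engine runs its migration code rather than returning a marker):

```typescript
// Sketch of the schema check described above, using the exported
// MAIN_SCHEMA_VERSION constant value. Names beyond that are illustrative.
const MAIN_SCHEMA_VERSION = 6;

function checkSchema(onDiskVersion: number): 'ok' | 'migrate' {
  if (onDiskVersion > MAIN_SCHEMA_VERSION) {
    // Written by a newer engine: refuse to open rather than risk corruption.
    throw new Error(`schema v${onDiskVersion} is newer than engine schema v${MAIN_SCHEMA_VERSION}`);
  }
  // Older databases are migrated forward; current ones open as-is.
  return onDiskVersion < MAIN_SCHEMA_VERSION ? 'migrate' : 'ok';
}
```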
+
+ ---
+
+ ## Installation

  ```bash
- npm test # All 11 suites (419 tests)
- npm run test:quick # smoke + library + compositor
+ git clone https://github.com/PsiClawOps/hypermem.git ~/.openclaw/workspace/repo/hypermem
+ cd ~/.openclaw/workspace/repo/hypermem
+ npm install && npm run build
+ npm --prefix plugin install && npm --prefix plugin run build
+
+ openclaw config set plugins.slots.contextEngine hypermem
+ openclaw config set plugins.slots.memory hypermem
+ openclaw config set plugins.load.paths '["~/.openclaw/workspace/repo/hypermem/plugin"]' --strict-json
+ openclaw gateway restart
  ```

- ## Data Directory
+ Or use the one-line installer:

+ ```bash
+ curl -fsSL https://raw.githubusercontent.com/PsiClawOps/hypermem/main/install.sh | bash
  ```
- ~/.openclaw/hypermem/
- ├── library.db               # Fleet-wide shared knowledge (L4)
- └── agents/
-     └── {agentId}/
-         ├── messages.db          # Current conversation DB (L2)
-         ├── messages_2026Q1.db   # Rotated archive (read-only)
-         └── vectors.db           # Semantic search index (L3)
+
+ **Requirements:** Node.js 22+, OpenClaw with context engine plugin support, and either Ollama (local) or an OpenRouter API key (hosted) for embeddings. No standalone SQLite install is required for the documented repo install: hypermem uses the SQLite bundled with Node 22 via `node:sqlite`, and `sqlite-vec` provides the platform-specific extension through npm dependencies.
+
+ Full guide with deployment-specific options: **[INSTALL.md](./INSTALL.md)**
+
+ ### Agent-assisted install
+
+ If you prefer, hand the install to your OpenClaw agent:
+
+ > "Install hypermem following INSTALL.md. I'm running a [solo / multi-agent] setup."
+
+ ### Tuning
+
+ hypermem ships three aligned operating profiles: `light`, `standard`, and `full`. Pick one and set `outputProfile` in your config. Everything else follows.
+
+ | Profile | Context window | Budget fraction | Best for |
+ |---|---|---|---|
+ | `light` | 64k | 0.50 | Single-agent installs, minimal parallel work |
+ | `standard` | 128k | 0.65 | Normal OpenClaw deployments |
+ | `full` | 200k+ | 0.55 | Large-context or multi-agent installs, maximum richness |
+
+ **Start with `light`** on 64k models or single-agent systems. Move to `standard` once the system has stable latency and headroom. Use `full` only when you want maximum context richness and have the budget for it.
+
+ Primary tuning knobs:
+
+ - **`targetBudgetFraction`**: caps total non-history context weight. Lower values force lighter composition.
+ - **`wikiTokenCap`**: caps the compiled-knowledge/wiki contribution.
+ - **`outputProfile`**: `light`, `standard`, or `full`. Controls how much hyperform guidance is injected per turn.
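The arithmetic behind `targetBudgetFraction` is a simple split. The names below mirror the knobs above, but the exact split is an assumption for illustration, not engine source:

```typescript
// Illustrative budget split: the fraction caps non-history context,
// and the remainder of the turn budget is left for conversation history.
function splitBudget(
  tokenBudget: number,
  targetBudgetFraction: number
): { nonHistory: number; history: number } {
  const nonHistory = Math.floor(tokenBudget * targetBudgetFraction);
  return { nonHistory, history: tokenBudget - nonHistory };
}
```

With the `standard` profile's 0.65 on a 60,000-token budget, non-history context would be capped at 39,000 tokens under this sketch.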
+
+ Drop a `config.json` at `~/.openclaw/hypermem/` to override compositor defaults; changes take effect on gateway restart:
+
+ ```json
+ {
+   "deferToolPruning": true,
+   "compositor": {
+     "defaultTokenBudget": 60000,
+     "maxFacts": 18,
+     "contextWindowReserve": 0.25,
+     "outputProfile": "standard"
+   }
+ }
  ```

+ Additional compositor knobs: `maxCrossSessionContext`, `maxRecentToolPairs`, `maxProseToolPairs`; see INSTALL.md for full descriptions.
+
+ `deferToolPruning: true` tells hypermem to skip its own T0/T1/T2/T3 gradient when OpenClaw's native `contextPruning` extension is active (Anthropic and Google providers). On those providers, OpenClaw's pruner handles tool-result trimming: ratio-driven at >30% context fill, soft-trimming head+tail for results over 4,000 chars, hard-clearing above 50k total, with the last 3 assistant turns always protected. hypermem's gradient remains active as a fallback for other providers (GPT-5.4, etc.). Default: `true` for Anthropic installs.
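The native pruner's rules, as summarized above, reduce to a small decision function. Thresholds come from the text; all names are illustrative, not OpenClaw's actual API:

```typescript
// Decision sketch of the native pruner rules described above.
type PruneAction = 'keep' | 'soft-trim' | 'hard-clear';

function pruneAction(opts: {
  contextFill: number;    // 0..1 fraction of the context window in use
  resultChars: number;    // size of this tool result
  totalToolChars: number; // combined size of all tool results
  turnsFromEnd: number;   // 0 = most recent assistant turn
}): PruneAction {
  if (opts.turnsFromEnd < 3) return 'keep';     // last 3 assistant turns protected
  if (opts.contextFill <= 0.3) return 'keep';   // pruning only above 30% fill
  if (opts.totalToolChars > 50_000) return 'hard-clear';
  if (opts.resultChars > 4_000) return 'soft-trim'; // keep head + tail
  return 'keep';
}
```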
+
+ `outputProfile` valid values: `"light"` (~100 tokens: anti-sycophancy, em dash ban, AI vocab ban, length targets, evidence calibration), `"standard"` (~250 tokens: full directive set plus pagination and hedging rules), `"full"` (~400 tokens: complete normalization with full directive set and model-specific calibration). Default: `"standard"`.
+
+ Context presets ship as named profiles importable from the package:
+
+ ```typescript
+ import { lightProfile, standardProfile, fullProfile } from '@psiclawops/hypermem';
+ ```
+
+ Pass one to `HyperMem.create()` as the base config. Full tuning notes are in INSTALL.md.
+
+ ---
+
  ## API

  ```typescript
  import { HyperMem } from '@psiclawops/hypermem';

  const hm = await HyperMem.create({
-   agentId: 'forge',
    dataDir: '~/.openclaw/hypermem',
-   redis: { host: 'localhost', port: 6379 },
-   ollama: { host: 'http://localhost:11434', model: 'nomic-embed-text' }
+   cache: { maxEntries: 10000 },
+   // Local (Ollama):
+   embedding: { ollamaUrl: 'http://localhost:11434', model: 'nomic-embed-text' },
+   // Hosted (OpenRouter), recommended for installs without local GPU/CPU:
+   // embedding: { provider: 'openai', openaiApiKey: 'sk-or-...', openaiBaseUrl: 'https://openrouter.ai/api/v1', model: 'qwen/qwen3-embedding-8b', dimensions: 4096, batchSize: 128 },
  });

- // Record messages
- await hm.recordUserMessage(conversationId, 'How does drift detection work?');
- await hm.recordAssistantMessage(conversationId, 'Drift detection compares...');
+ // Record and compose
+ await hm.recordUserMessage('forge', 'agent:forge:webchat:main', 'How does drift detection work?');

- // Compose prompt (all 4 layers, budget-aware)
  const composed = await hm.compose({
    agentId: 'forge',
    sessionKey: 'agent:forge:webchat:main',
+   prompt: 'How does drift detection work?',
    tokenBudget: 4000,
-   provider: 'anthropic'
+   provider: 'anthropic',
  });

- // Hybrid retrieval (FTS + vector)
- const results = await hm.hybridSearch('drift detection', {
-   limit: 10,
-   ftsWeight: 0.4,
-   vectorWeight: 0.6
- });
+ // Refresh tool compression after each turn
+ await hm.refreshCacheGradient('forge', 'agent:forge:webchat:main');
+ ```

- // Fleet operations
- await hm.upsertFleetAgent({ id: 'forge', displayName: 'Forge', tier: 'council' });
- await hm.setDesiredState('forge', 'model', 'anthropic/claude-opus-4-6', 'ragesaq');
- const drifted = await hm.getDriftedState();
+ Spawning a subagent with parent context:

- // Semantic search
- const similar = await hm.searchSimilar('drift detection', { limit: 5, threshold: 0.8 });
+ ```typescript
+ import { buildSpawnContext, MessageStore, DocChunkStore } from '@psiclawops/hypermem';
+
+ const spawn = await buildSpawnContext(
+   new MessageStore(hm.dbManager.getMessageDb('forge')),
+   new DocChunkStore(hm.dbManager.getLibraryDb()),
+   'forge',
+   { parentSessionKey: 'agent:forge:webchat:main', workingSnapshot: 12 }
+ );
+ ```
 
- // Knowledge graph
- await hm.addKnowledgeLink('fact', factId, 'knowledge', knowledgeId, 'supports');
- const related = await hm.traverseKnowledge('fact', factId, { maxDepth: 3 });
+ ---

- // Workspace indexing
- await hm.seedWorkspace('/path/to/workspace');
+ ## Data directory

- // Cleanup
- await hm.close();
+ ```text
+ ~/.openclaw/hypermem/
+ ├── library.db
+ └── agents/
+     └── {agentId}/
+         ├── messages.db
+         ├── messages_2026Q1.db   (rotated archive)
+         └── vectors.db
  ```

- ## Test Coverage
+ ---

- | Suite | Tests | What's Covered |
- |---|---|---|
- | smoke | 10 | End-to-end create/write/read/close, provider translation |
- | redis-integration | 24 | Redis slots, history, pub/sub |
- | cross-agent | 20 | Cross-agent queries, fleet search, visibility tiers |
- | vector-search | 33 | Embedding, KNN, batch indexing |
- | library | 71 | All L4 collections (facts, desired state) |
- | compositor | 50 | Four-layer composition, budgets, providers, safety valve |
- | fleet-cache | 32 | Redis fleet cache, hydration, cache-aside |
- | rotation | 29 | DB rotation, auto-rotate, collision handling |
- | knowledge-graph | 33 | DAG traversal, shortest path, analytics |
- | rate-limiter | 22 | Token bucket, priority, timeout, embedder |
- | doc-chunker | 105 | Markdown/file chunking, section-aware parsing, seeder |
- | **Total** | **419** | |
-
- ## Module Map
-
- 29 source files, ~12,300 lines:
-
- | Module | Lines | Layer | Purpose |
- |---|---|---|---|
- | `index.ts` | ~1,340 | All | Facade — all public API |
- | `compositor.ts` | ~1,030 | L1-L4 | Prompt assembly + token budgeting + safety valve |
- | `library-schema.ts` | ~780 | L4 | Library schema v5 + migrations |
- | `background-indexer.ts` | ~680 | L2-L4 | LLM-powered extraction framework |
- | `vector-store.ts` | ~600 | L3 | Semantic search + embedding |
- | `hybrid-retrieval.ts` | ~450 | L3-L4 | FTS5 + KNN with Reciprocal Rank Fusion |
- | `fleet-store.ts` | ~440 | L4 | Fleet registry + capabilities |
- | `db.ts` | ~440 | - | Database manager + rotation |
- | `knowledge-graph.ts` | ~420 | L4 | DAG traversal + shortest path |
- | `redis.ts` | ~400 | L1 | Redis operations + fleet cache |
- | `doc-chunker.ts` | ~400 | - | Section-aware markdown/file parser |
- | `work-store.ts` | ~400 | L4 | Work queue + FTS5 |
- | `provider-translator.ts` | ~390 | - | Neutral ↔ provider format conversion |
- | `doc-chunk-store.ts` | ~375 | L4 | Chunk storage + deduplication |
- | `message-store.ts` | ~370 | L2 | Conversation recording + querying |
- | `types.ts` | ~330 | - | Shared type definitions |
- | `cross-agent.ts` | ~330 | L2-L4 | Cross-agent knowledge queries + visibility |
- | `desired-state-store.ts` | ~310 | L4 | Config drift detection |
- | `knowledge-store.ts` | ~300 | L4 | Domain/key/value structured data |
- | `secret-scanner.ts` | ~285 | - | Credential/secret detection |
- | `system-store.ts` | ~250 | L4 | Service state tracking |
- | `seed.ts` | ~250 | L4 | Workspace seeder + collection inference |
- | `fact-store.ts` | ~230 | L4 | Facts with confidence + expiry |
- | `rate-limiter.ts` | ~230 | L3 | Token-bucket for embedding API |
- | `schema.ts` | ~200 | L2 | Messages schema + migrations |
- | `episode-store.ts` | ~180 | L4 | Significant event tracking |
- | `preference-store.ts` | ~170 | L4 | Operator behavioral patterns |
- | `topic-store.ts` | ~160 | L4 | Cross-session thread tracking |
- | `plugin/src/index.ts` | ~550 | - | OpenClaw context engine plugin |
-
- ## Roadmap
-
- - [x] ~~Document chunk ingestion pipeline (section-aware markdown parsing)~~
- - [x] ~~Workspace seeder with idempotent re-indexing~~
- - [x] ~~Hybrid retrieval (FTS5 + KNN with RRF)~~
- - [x] ~~Context engine plugin (OpenClaw integration)~~
- - [x] ~~Compositor safety valve for budget overrun~~
- - [x] ~~Own compaction (`ownsCompaction: true`)~~
- - [ ] Background indexer activation (LLM extraction from live conversations)
- - [ ] Versioned atomic re-indexing (source hash + transactional swap)
- - [ ] Bootstrap seed command (`hypermem seed --workspace`)
- - [ ] npm publish to registry
- - [ ] Live org registry (replace hardcoded `defaultOrgRegistry()` with library.db lookup)
- - [ ] Embedding model hot-swap (currently pinned to nomic-embed-text)
+ ## Migration
+
+ hypermem doesn't touch your existing memory data. Install it, switch the context engine, and migrate historical data on your own timeline.
+
+ The migration guide includes worked examples showing how to bring data from OpenClaw built-in memory, Mem0, Honcho, QMD session exports, and Engram. Each example walks through the data model mapping, transformation steps, and validation. Adapt them to your setup.
+
+ All examples default to dry-run. Nothing is written until you add `--apply`.
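The dry-run-by-default pattern the examples follow, as a hedged sketch. The `LegacyRecord` shape and function name are hypothetical; the real scripts live in the migration guide:

```typescript
// Sketch of dry-run-by-default: transform everything, write nothing
// unless apply is explicitly set (the --apply flag in the examples).
type LegacyRecord = { session: string; role: 'user' | 'assistant'; text: string };

function planMigration(records: LegacyRecord[], apply: boolean): string[] {
  const plan = records.map(
    (r) => `${apply ? 'WRITE' : 'DRY-RUN'} ${r.session} ${r.role}: ${r.text.slice(0, 40)}`
  );
  // With apply=true a real script would call hm.recordUserMessage /
  // hm.recordAssistantMessage here; in dry-run mode nothing is written.
  return plan;
}
```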
+
+ Operator guide: **[docs/MIGRATION_GUIDE.md](./docs/MIGRATION_GUIDE.md)**
+
+ ---
+
+ ## Identity layer
+
+ hypermem handles context and output normalization. The Agentic Cognitive Architecture handles identity: self-authored SOUL files, structured communication contracts, and identity persistence across sessions. Same team, complementary layers.
+
+ Design guide: [PsiClawOps/AgenticCognitiveArchitecture](https://github.com/PsiClawOps/AgenticCognitiveArchitecture/)
+
+ ---
 
  ## License

- Private PsiClawOps
+ Apache-2.0, [PsiClawOps](https://github.com/PsiClawOps)