@psiclawops/hypermem 0.8.4 → 0.9.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (99)
  1. package/CHANGELOG.md +33 -0
  2. package/INSTALL.md +203 -23
  3. package/README.md +139 -216
  4. package/bench/README.md +42 -0
  5. package/bench/data-access-bench.mjs +380 -0
  6. package/bin/hypermem-bench.mjs +2 -0
  7. package/bin/hypermem-doctor.mjs +412 -0
  8. package/bin/hypermem-model-audit.mjs +339 -0
  9. package/bin/hypermem-status.mjs +491 -70
  10. package/dist/adaptive-lifecycle.d.ts +81 -0
  11. package/dist/adaptive-lifecycle.d.ts.map +1 -0
  12. package/dist/adaptive-lifecycle.js +190 -0
  13. package/dist/background-indexer.js +9 -9
  14. package/dist/budget-policy.d.ts +1 -1
  15. package/dist/budget-policy.d.ts.map +1 -1
  16. package/dist/budget-policy.js +10 -5
  17. package/dist/cache.d.ts +4 -0
  18. package/dist/cache.d.ts.map +1 -1
  19. package/dist/cache.js +2 -0
  20. package/dist/composition-snapshot-integrity.d.ts +36 -0
  21. package/dist/composition-snapshot-integrity.d.ts.map +1 -0
  22. package/dist/composition-snapshot-integrity.js +131 -0
  23. package/dist/composition-snapshot-runtime.d.ts +59 -0
  24. package/dist/composition-snapshot-runtime.d.ts.map +1 -0
  25. package/dist/composition-snapshot-runtime.js +250 -0
  26. package/dist/composition-snapshot-store.d.ts +44 -0
  27. package/dist/composition-snapshot-store.d.ts.map +1 -0
  28. package/dist/composition-snapshot-store.js +117 -0
  29. package/dist/compositor.d.ts +125 -1
  30. package/dist/compositor.d.ts.map +1 -1
  31. package/dist/compositor.js +692 -44
  32. package/dist/cross-agent.d.ts +1 -1
  33. package/dist/cross-agent.js +17 -17
  34. package/dist/doc-chunk-store.d.ts +19 -0
  35. package/dist/doc-chunk-store.d.ts.map +1 -1
  36. package/dist/doc-chunk-store.js +56 -6
  37. package/dist/dreaming-promoter.d.ts +1 -1
  38. package/dist/dreaming-promoter.js +2 -2
  39. package/dist/hybrid-retrieval.d.ts +38 -0
  40. package/dist/hybrid-retrieval.d.ts.map +1 -1
  41. package/dist/hybrid-retrieval.js +86 -1
  42. package/dist/index.d.ts +15 -6
  43. package/dist/index.d.ts.map +1 -1
  44. package/dist/index.js +33 -7
  45. package/dist/knowledge-store.d.ts +4 -1
  46. package/dist/knowledge-store.d.ts.map +1 -1
  47. package/dist/knowledge-store.js +27 -4
  48. package/dist/library-schema.d.ts +12 -8
  49. package/dist/library-schema.d.ts.map +1 -1
  50. package/dist/library-schema.js +22 -8
  51. package/dist/message-store.d.ts.map +1 -1
  52. package/dist/message-store.js +7 -3
  53. package/dist/metrics-dashboard.d.ts +18 -1
  54. package/dist/metrics-dashboard.d.ts.map +1 -1
  55. package/dist/metrics-dashboard.js +52 -14
  56. package/dist/reranker.d.ts +1 -1
  57. package/dist/reranker.js +2 -2
  58. package/dist/schema.d.ts +1 -1
  59. package/dist/schema.d.ts.map +1 -1
  60. package/dist/schema.js +28 -1
  61. package/dist/seed.d.ts +1 -1
  62. package/dist/seed.d.ts.map +1 -1
  63. package/dist/seed.js +3 -1
  64. package/dist/session-flusher.d.ts +2 -2
  65. package/dist/session-flusher.js +2 -2
  66. package/dist/spawn-context.d.ts +1 -1
  67. package/dist/spawn-context.js +1 -1
  68. package/dist/topic-store.js +5 -5
  69. package/dist/topic-synthesizer.d.ts +20 -0
  70. package/dist/topic-synthesizer.d.ts.map +1 -1
  71. package/dist/topic-synthesizer.js +114 -4
  72. package/dist/trigger-registry.d.ts +1 -1
  73. package/dist/trigger-registry.d.ts.map +1 -1
  74. package/dist/trigger-registry.js +14 -6
  75. package/dist/types.d.ts +273 -3
  76. package/dist/types.d.ts.map +1 -1
  77. package/dist/version.d.ts +7 -7
  78. package/dist/version.d.ts.map +1 -1
  79. package/dist/version.js +17 -7
  80. package/docs/DIAGNOSTICS.md +205 -0
  81. package/docs/INTEGRATION_VALIDATION.md +186 -0
  82. package/docs/MIGRATION.md +9 -6
  83. package/docs/MIGRATION_GUIDE.md +125 -101
  84. package/docs/ROADMAP.md +238 -20
  85. package/docs/TUNING.md +30 -6
  86. package/install.sh +159 -408
  87. package/memory-plugin/LICENSE +190 -0
  88. package/memory-plugin/README.md +20 -0
  89. package/memory-plugin/dist/index.js +50 -0
  90. package/memory-plugin/package.json +2 -2
  91. package/package.json +18 -4
  92. package/plugin/LICENSE +190 -0
  93. package/plugin/README.md +20 -0
  94. package/plugin/dist/index.d.ts +55 -0
  95. package/plugin/dist/index.d.ts.map +1 -1
  96. package/plugin/dist/index.js +362 -42
  97. package/plugin/dist/index.js.map +1 -1
  98. package/plugin/package.json +2 -2
  99. package/scripts/install-runtime.mjs +13 -3
package/README.md CHANGED
@@ -8,7 +8,7 @@

  hypermem is a SQLite-backed runtime context engine for OpenClaw agents.

- **Quick install** (interactive, detects hardware, writes config):
+ **Quick install** (runtime staging + guided OpenClaw wiring):

  ```bash
  npm install @psiclawops/hypermem && npx hypermem-install
@@ -20,14 +20,23 @@ Or via the shell installer:
  curl -fsSL https://raw.githubusercontent.com/PsiClawOps/hypermem/main/install.sh | bash
  ```

- Or install manually via `npm install @psiclawops/hypermem` see [Installation](#installation) for plugin wiring, embedding setup, and step-by-step paths.
+ Or install manually via `npm install @psiclawops/hypermem` - see [Installation](#installation) for the full declarative plugin path, verification checkpoints, and setup variants.

+ Release operators should also read:
+
+ - [INSTALL.md](./INSTALL.md) - canonical fresh install and upgrade guide
+ - [docs/INTEGRATION_VALIDATION.md](./docs/INTEGRATION_VALIDATION.md) - end-to-end integration validation contract
+ - [docs/DIAGNOSTICS.md](./docs/DIAGNOSTICS.md) - status, model audit, compose, trim, and release diagnostics
+
+ A successful `hypermem-install` only stages the runtime. HyperMem is active only after OpenClaw config is wired, the gateway restarts, and logs show compose activity.

  ---

  ## The problem

- Every LLM conversation is composed at runtime. The model sees only what's in the prompt. It has no memory of prior sessions, no access to decisions made last week, no awareness of work that happened before this context window opened.
+ Your agent can feel sharp on day one, then start slipping as the work accumulates.
+
+ Not because the model got worse. Because each turn is composed at runtime, and the model only sees what made it into this prompt. It has no native memory of prior sessions, no direct access to last week's decisions, and no awareness of work that happened before this context window opened.

  Two questions make this concrete:

@@ -36,36 +45,38 @@ Two questions make this concrete:
  | *"What was Caesar's greatest military victory?"* | Training data | ✅ Answered correctly, no session context needed |
  | *"What did we decide about the retry logic last week?"* | Nothing (prior session is gone) | ❌ The decision existed only in that session |

- The difference isn't intelligence. It's what was in the prompt. Two failure modes follow:
+ The difference is not intelligence. It is prompt access. Three failure modes follow:

- **New-session amnesia.** The agent restarts and everything is gone. Decisions, preferences, work in progress: erased at the session boundary. Operators re-explain context. Agents re-ask questions already answered.
+ **New-session amnesia.** The agent restarts and the work disappears with the session. Decisions, preferences, and work in progress vanish at the boundary. Operators re-explain context. Agents re-ask questions that were already settled.

- **Compaction crunch.** Long sessions fill the context window. The runtime summarizes to make room. Specifics (tool output, exact decisions, file paths) are lost in the summary. The agent keeps running, but degraded.
+ **Compaction crunch.** Long sessions fill the context window. The runtime summarizes to make room. Specifics like tool output, exact decisions, and file paths are the first things to get flattened. The agent keeps running, but with less ground truth than it had a few turns ago.

- **Bloated context.** 128k tokens doesn't mean 128k of useful prompt. Without active curation, agents fill the window with stale history, redundant instructions, and memory that isn't relevant to this turn. A bigger context window just means more room to waste. The information is in the prompt somewhere, buried under content irrelevant to this turn.
+ **Bloated context.** 128k tokens does not mean 128k of useful prompt. Without active selection, agents fill the window with stale history, repeated instructions, and memory that does not matter to this turn. A bigger window just gives you more room to waste.

  ---

  ## What OpenClaw provides today

- OpenClaw addresses both failure modes with structured guidance files injected into every session:
+ OpenClaw already gives agents a stronger baseline than most stacks. It injects structured guidance into every session:

  | File | What it contributes | Survives session restart? |
  |---|---|---|
  | `SOUL.md` | Agent identity, voice, principles | ✅ always injected |
  | `USER.md` | User preferences, working style | ✅ always injected |
- | Task and workspace instruction files (for example AGENTS.md, job files, and related guidance) | ✅ always injected |
+ | Task/workspace instructions | `AGENTS.md`, job files, and related guidance | ✅ always injected |
  | `MEMORY.md` | Hand-curated decisions, facts, patterns | ✅ if manually maintained |

- These are powerful for identity and preferences. But the retry logic decision from last week? If nobody manually captured it into `MEMORY.md`, that session boundary erased it. The system is only as strong as its last manual update.
+ These files are strong at identity, user fit, and working style. They are not a durable memory system by themselves. If nobody copied the retry-logic decision into `MEMORY.md`, the next session does not know it happened.

- OpenClaw also ships compaction safeguards and hybrid file search. That's a solid baseline. It has limits. hypermem closes both gaps.
+ OpenClaw also ships compaction safeguards and hybrid file search. That is a solid baseline. What is still missing is durable recall across sessions, active prompt selection under pressure, and writing discipline that holds across long-running work. hypermem adds those layers.

  ---

  ## hypermem

- Four SQLite-backed memory databases, sub-millisecond retrieval, no external database services required. Runs in-process with local SQLite storage and local Nomic embeddings by default, with optional hosted embeddings for L3.
+ OpenClaw gives agents a strong starting shape: identity files, user guidance, task framing, compaction safeguards, and hybrid file search. What it does not add by default is durable recall across session boundaries. When a useful decision falls out of the prompt and nobody hand-copied it into `MEMORY.md`, it is gone.
+
+ hypermem closes that gap with four SQLite-backed memory layers that stay local, run in-process, and remain queryable across sessions. No external database service. No retrieval stack to babysit.

  | Layer | What it holds | Speed |
  |---|---|---|
@@ -74,22 +85,24 @@ Four SQLite-backed memory databases, sub-millisecond retrieval, no external data
  | **L3 Semantic** | Finds related content even when the words don't match. | 0.29ms |
  | **L4 Knowledge** | Facts, wiki pages, episodes, preferences. Shared across agents. | 0.09ms |

- Everything is retained. Storage survives session boundaries. The retry logic decision from last week, the deployment preferences from last month, the architecture choices from day one: all queryable, all available for composition.
+ Durable context stays in SQLite and remains queryable across session boundaries. The retry logic decision from last week, the deployment preferences from last month, and the architecture choices from day one can be pulled back in when they matter.

- **Session warming.** Before the first turn fires, hypermem pre-loads the agent's full working state from its SQLite-backed memory stores and hot `:memory:` cache: recent history, facts ranked by confidence and recency, active topic context, cached embeddings for fast semantic recall. The agent's first reply draws from everything that was in scope at the end of the last session. The agent picks up where it left off.
+ That changes OpenClaw in a few concrete ways. Starts are warm instead of blank because recent history, ranked facts, active topics, and cached semantic state are loaded before the first turn. Recall survives wording drift because FTS5, sqlite-vec, RRF fusion, and an optional reranker can recover the same idea through different phrasing. Time-aware facts can answer "last week" and "before the release" as retrieval problems instead of vague prompt guessing. Shared knowledge stops living in one agent's scratchpad because `library.db` holds facts, docs, episodes, preferences, fleet state, and output standards with visibility controls.

  ---

  ## hypercompositor

- Every memory system stores. Almost none compose.
+ Storage is only half the problem. The harder question is what actually reaches the model.
+
+ Most memory systems can save useful state. Far fewer can decide, turn by turn, what belongs in the prompt right now and what should stay on disk. Without that layer, long sessions bloat, tool output crowds out current work, and a larger context window just gives you more room to waste tokens.

- Your agent has four layers of stored context, but what shows up in the prompt? How much of the token budget goes to stale content? Who decides what's relevant to this specific turn?
+ hypercompositor queries all four memory layers in parallel, scores what matters for the current turn, and composes a fresh prompt inside a fixed budget. Content that does not fit is not destroyed. It stays in storage and can win its way back in when the topic returns.

- The hypercompositor queries all four layers in parallel on every turn and composes context within a fixed token budget. No transcript accumulates. No lossy transcript summarization. Amnesia isn't a storage problem; the memories exist, but nobody composed them into a coherent prompt. Compaction isn't inevitable; content that doesn't fit this turn stays in storage instead of being destroyed.
+ That changes OpenClaw at the prompt boundary. Selection replaces loss. Tool calls and results stay paired, recent turns stay readable, and older payloads compress by age instead of being flattened blindly. Quiet topics compile into structured wiki pages so the next turn can inject the decision trail without replaying raw transcript. Duplicate prompt spend drops because facts, doc chunks, semantic hits, and bootstrap content are fingerprinted before insertion. Subagents inherit a bounded handoff instead of a random slice of parent history.

- **Bigger context windows don't help if you fill them with stale history.**
- 128k tokens of stale history and irrelevant memory is worse than 32k of precisely selected content. 9 budget categories, priority-ordered, greedy-fill. Every token in the prompt earned its spot.
+ **A bigger context window does not fix bad composition.**
+ 128k tokens of stale history is worse than 32k of selected context. hypercompositor treats prompt space as a constrained resource, not a dumping ground.

  ### What the model actually sees

@@ -123,7 +136,7 @@ OpenClaw default hypercompositor
  ────────────────────────────────      ────────────────────────────────
  message → append to transcript        message → detect active topic
  transcript full → trim oldest         query 4 storage layers in parallel
- trimmed content → summarize (lossy)   budget allocator: 9 slots, fixed cap
+ trimmed content → summarize (lossy)   budget allocator: 10 slots, fixed cap
  send transcript to model              tool compression by turn age
  model responds → append again         keystone guard + hyperform profile
                                        composed prompt → model
@@ -148,7 +161,7 @@ When it fills: When budget is exceeded:

  High-signal turns are marked as keystones and survive pressure trimming ahead of ordinary history.

- The compositor fills 9 slots in priority order (system prompt → identity → hyperform → history → facts → wiki → semantic recall → cross-session → action summary). Each slot consumes tokens from the remaining budget before the next slot runs. Slots that don't fit this turn stay in storage, not destroyed.
+ The compositor fills 10 slots in priority order (system prompt → identity → hyperform → history → recent tools → keystones → wiki/knowledge → facts → semantic recall → reserve/action context). Each slot consumes tokens from the remaining budget before the next slot runs. Slots that do not fit this turn stay in storage, not destroyed.

  For the full fill order, budget formula, and all configuration knobs, see **[Tuning](#tuning)** below and **[docs/TUNING.md](./docs/TUNING.md)**.

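Editor's note: the priority-ordered greedy fill described in this hunk can be sketched in a few lines. This is an illustrative model only; the `Slot` shape, the `fillBudget` name, and the token counts are hypothetical, not hypermem's actual API.

```typescript
// Sketch of greedy, priority-ordered budget fill (hypothetical shapes).
interface Slot {
  name: string;
  tokens: number; // cost of injecting this slot's content
}

function fillBudget(slots: Slot[], budget: number): string[] {
  const included: string[] = [];
  let remaining = budget;
  for (const slot of slots) {       // slots arrive in priority order
    if (slot.tokens <= remaining) { // greedy: take the slot if it fits
      included.push(slot.name);
      remaining -= slot.tokens;
    }
    // slots that do not fit stay in storage; nothing is destroyed
  }
  return included;
}
```

The key property matches the prose: a slot that misses the budget this turn is skipped, not truncated, and a cheaper lower-priority slot can still land.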
@@ -156,9 +169,13 @@ For the full fill order, budget formula, and all configuration knobs, see **[Tun

  ## hyperform

- Raw model output has two problems. It drifts from your standards (sycophancy, hedging, pagination, formatting) and it drifts from your facts (confabulation, contradiction, stale claims). hyperform handles both: normalization enforces consistency, confabulation resistance checks output against what's actually stored.
+ Good memory is wasted if the model still writes like it has no standards.

- Consistent output isn't just aesthetic. A model that paginates short answers, preambles with filler, or inflates lists uses more output tokens per turn. Over hundreds of turns, that compounds into real cost. hyperform directives compress output at the source: fewer tokens generated means lower API spend per session, and less context pressure for subsequent turns.
+ OpenClaw can preserve identity and instruction. That does not guarantee consistent delivery. Models still drift into filler openings, hedging, bloated lists, pagination, and stale claims. Over long sessions that is not just annoying copy. It is token waste, weaker signal, and lower trust in what gets written back into memory.
+
+ hyperform adds a writing contract at prompt time. Output profiles inject shared standards before generation. Model directives correct known provider habits. Confabulation resistance checks candidate claims against stored facts before new memory is recorded.
+
+ That gives OpenClaw something it does not get from raw prompting alone: fleet-wide writing discipline, model-aware correction, and tighter claim hygiene at the memory boundary. The point is not to post-process prose into something artificial. The point is to make the first draft cleaner, shorter, and harder to contaminate with unsupported claims.

  ### Behavior standards

@@ -184,14 +201,14 @@ Model adaptation is only active at the `full` tier. At `light` and `standard`, m

  The `model_output_directives` table starts empty. You populate it with corrections for the models you run. See [docs/TUNING.md](./docs/TUNING.md#creating-custom-entries) for the schema and SQL examples.

- ### Before and after
+ ### Illustrative before and after

- The same prompt, GPT-5.4, with and without `hyperformProfile: "light"`:
+ The example below shows the intended effect of `hyperformProfile: "light"`. hyperform is prompt-time shaping, not a deterministic post-generation rewrite engine:

  ```
  Prompt: "How should I size my context window budget for a long-running agent session?"

- WITHOUT normalization (GPT-5.4 default):
+ WITHOUT hyperform shaping (GPT-5.4 default):
  Here are the key factors to consider when sizing your context window budget:

  **1. Session depth**
@@ -211,46 +228,12 @@ tool context, and leave ~30k as allocator reserve. hypermem handles slot competi
  automatically. Set `reserveFraction` to your preferred floor and let the compositor fill.
  ```

- **Confabulation resistance** checks output against stored facts before claims are recorded. No LLM call. Pattern matching against the fact corpus, with confidence scoring and contradiction detection. Unsupported claims are flagged, contradictions surface in diagnostics, and a confabulation risk score is attached to the stored episode.
+ **Confabulation resistance** checks stored claims against existing facts before new memory entries are recorded. No LLM call. Pattern matching against the fact corpus, with confidence scoring and contradiction detection. Unsupported claims are flagged, contradictions surface in diagnostics, and a confabulation risk score is attached to the stored episode.

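Editor's note: as a rough illustration of what LLM-free claim checking can look like, the sketch below scores claims by token overlap against a fact corpus. The function names, the 0.6 overlap threshold, and the risk formula are hypothetical; hypermem's actual confidence scoring and contradiction detection are richer than this.

```typescript
// Hypothetical sketch of pattern-matching claim support: no LLM call,
// just token overlap between each claim and the stored fact corpus.
function tokenize(s: string): Set<string> {
  return new Set(s.toLowerCase().match(/[a-z0-9]+/g) ?? []);
}

// A claim counts as "supported" here if >= 60% of its tokens appear
// in at least one stored fact. Returns unsupported / total as a risk score.
function confabulationRisk(claims: string[], facts: string[]): number {
  const factTokens = facts.map(tokenize);
  let unsupported = 0;
  for (const claim of claims) {
    const ct = [...tokenize(claim)];
    const supported = factTokens.some(
      (ft) => ct.filter((t) => ft.has(t)).length / ct.length >= 0.6,
    );
    if (!supported) unsupported++;
  }
  return claims.length ? unsupported / claims.length : 0;
}
```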
  Set `compositor.hyperformProfile` to `light`, `standard`, or `full`. For tier selection guidance, configuration details, and custom entry creation, see **[Tuning](#tuning)** below and **[docs/TUNING.md](./docs/TUNING.md)**.

  ---

- ## What it solves
-
- ### Tool output that doesn't take over
-
- Agentic sessions generate massive tool output. Left unmanaged, old results crowd out current reasoning. hypermem compresses tool history by age: recent clusters stay full, older clusters are capped, and the oldest collapse to short stubs while preserving tool call/result integrity. The budget goes to current work, not last hour's npm test output.
-
- ### Knowledge that outlasts the conversation
-
- Most memory systems store what was said. hypermem synthesizes what was learned.
-
- When a topic goes quiet, hypermem compiles the thread into a structured wiki page: decisions, open questions, artifacts, participants. When the topic resurfaces, the agent gets a compact structured summary rather than a raw history replay.
-
- OpenClaw 2026.4.7 ships memory wiki for structured storage. hypermem goes further: wiki pages are synthesized automatically and injected by the compositor within token budget, backed by SQLite memory databases instead of an external cache service.
-
- ### Subagents that hit the ground running
-
- Spawned subagents inherit a bounded context block: recent parent turns, session-scoped documents, and relevant facts. Scope is isolated from the shared library. Documents are cleaned up on completion.
-
- ### Context that doesn't repeat itself
-
- Retrieval paths pull from four layers, trigger shortcuts, temporal indexes, open-domain FTS5, semantic recall, and cross-session summaries. Without dedup, the same fact surfaces through multiple paths and wastes budget on repetition.
-
- hypermem runs content fingerprint dedup across all compose-time retrieval. Every fact, temporal result, open-domain hit, and semantic recall entry is normalized and fingerprinted on a 120-char prefix. O(1) lookup in a shared set catches duplicates regardless of which retrieval path produced them, including rephrased near-duplicates that substring matching missed. Diagnostics track dedup counts and fingerprint collisions per compose call.
-
- Identity content (SOUL.md, USER.md, IDENTITY.md) and doc chunks already injected by OpenClaw's bootstrap are fingerprinted before retrieval runs, so the compositor never double-injects content the runtime already placed in the prompt.
-
- ### Integrity under failure
-
- The background indexer runs a startup integrity check against `library.db` on every boot. If the schema is corrupt, tables are missing, or critical indexes are damaged, the indexer enters circuit-breaker mode: it logs the failure, skips indexing for the session, and avoids cascading writes into a broken database. The agent still runs with cached and in-memory data while the operator is notified.
-
- SQL queries that interpolate datetime values are fully parameterized. FTS5 trigger terms are quoted to prevent injection through crafted content. These aren't theoretical: agentic sessions ingest arbitrary user and tool output into the fact store, and unparameterized queries on that path were a real attack surface.
-
- ---
-
  ## Pressure management

  hypermem manages context pressure automatically through four escalating paths. Most sessions never need manual intervention. For trigger thresholds and path details, see [Pressure management](#pressure-management-1) below.
@@ -280,7 +263,15 @@ No configuration required for any of these:

  ## Speed

- Benchmarked against a production database: 5,104 facts, 28,441 episodes, 847 knowledge entries, 42MB. 1,000 iterations, 50 warmup discarded, single-process isolation.
+ HyperMem ships a user-facing benchmark so operators can validate local memory access speed against their own dataset:
+
+ ```bash
+ hypermem-bench --iterations 1000 --warmup 50 --agent main
+ ```
+
+ The benchmark reports min, average, p50, p95, p99, and max timings for the storage paths present in the install: message hot-path lookups, session/conversation lookup, message FTS, facts, episodes, topics, fleet records, and doc chunks. It reads from `~/.openclaw/hypermem` by default, or from `HYPERMEM_DATA_DIR` / `--data-dir`.
+
+ Reference run, production database: 5,104 facts, 28,441 episodes, 847 knowledge entries, 42MB, 1,000 iterations, 50 warmup discarded, single-process isolation.

  | Operation | avg | p50 | p95 |
  |---|---|---|---|
@@ -292,9 +283,10 @@ Benchmarked against a production database: 5,104 facts, 28,441 episodes, 847 kno
  | L4 FTS5 + agentId filter | 0.07ms | 0.06ms | 0.10ms |
  | L4 knowledge query | 0.09ms | 0.08ms | 0.14ms |
  | Recency decay scoring (28 rows, in JS) | 0.003ms | 0.002ms | 0.005ms |
- > Query planner uses compound indexes on agentId + sort key; FTS5 performance improved 25% from baseline after index additions despite a 47% increase in stored data.

- L1 and L4 structured retrieval are sub-millisecond. Vector embeddings are computed asynchronously after the assistant replies and cached in the in-memory layer, not on the primary composition call path. Users never wait for an embedding computation.
+ L1 and L4 structured retrieval are sub-millisecond on this dataset. Vector embeddings are computed asynchronously after the assistant replies and cached for later recall; hosted reranker latency depends on the chosen provider and is measured separately from SQLite access timings.
+
+ For reproducible commands and interpretation notes, see **[docs/DIAGNOSTICS.md](./docs/DIAGNOSTICS.md#memory-access-benchmark)**.

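Editor's note: the reported statistics can be reproduced from raw timing samples with standard percentile math. A minimal sketch, assuming timings in milliseconds and a warmup prefix to discard; this is not hypermem-bench's actual implementation.

```typescript
// Nearest-rank percentile over an already-sorted sample array.
function percentile(sortedMs: number[], p: number): number {
  const idx = Math.min(
    sortedMs.length - 1,
    Math.ceil((p / 100) * sortedMs.length) - 1,
  );
  return sortedMs[Math.max(0, idx)];
}

// Drop warmup iterations, then summarize the remaining samples.
function summarize(timingsMs: number[], warmup: number) {
  const samples = [...timingsMs.slice(warmup)].sort((a, b) => a - b);
  const avg = samples.reduce((s, x) => s + x, 0) / samples.length;
  return {
    min: samples[0],
    avg,
    p50: percentile(samples, 50),
    p95: percentile(samples, 95),
    p99: percentile(samples, 99),
    max: samples[samples.length - 1],
  };
}
```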
  ---

@@ -319,28 +311,36 @@ The Plugin column is the npm package name. The ID column is what goes in `plugin

  Retrieval follows a fixed pipeline on every compose call:

- 1. **Trigger registry** fires first. Nine pattern triggers check for exact-match shortcuts. If one hits, scoped FTS5 prefix queries (`word1* OR word2*`) run against L4 collections and return immediately.
- 2. **Semantic fallback** fires when no trigger matches. Bounded hybrid retrieval runs FTS5 + KNN in parallel, then merges via Reciprocal Rank Fusion (RRF). BM25 ranks and KNN cosine distances combine into a single ordered result.
- 3. **Noise floor** filters anything below RRF 0.008 before it reaches the compositor.
+ 1. **Active facts** are ranked by confidence and recency.
+ 2. **Temporal retrieval** runs when the query has time signals.
+ 3. **Open-domain retrieval** handles broad exploratory queries over indexed memory.
+ 4. **Knowledge and preference blocks** add structured library context.
+ 5. **Hybrid semantic recall** runs FTS5 and KNN/vector search, then merges candidates with Reciprocal Rank Fusion (RRF).
+ 6. **Optional reranking** reorders fused candidates when a reranker is configured. Supported providers include ZeroEntropy, OpenRouter, and Ollama. If the reranker is absent, fails, times out, or has too few candidates, HyperMem keeps the original RRF order.
+ 7. **Trigger-based doc retrieval** pulls doctrine, policy, and workspace chunks by trigger match, with semantic fallback on misses.
+ 8. **Session-scoped spawn context** and **cross-session context** are added when relevant.

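Editor's note: the Reciprocal Rank Fusion named in step 5 is a standard technique: each retriever contributes 1/(k + rank) per document, and documents are ordered by the summed score. A minimal sketch with the conventional k = 60; hypermem's internal constant and result shapes may differ.

```typescript
// RRF fusion sketch: score(d) = sum over rankings of 1 / (k + rank_i(d)).
// Documents appearing high in multiple rankings rise to the top.
function rrfFuse(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, idx) => {
      const rank = idx + 1; // 1-based rank within this retriever's list
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A document ranked second by both FTS and KNN beats one ranked first by only a single retriever, which is the behavior hybrid retrieval wants.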
326
- FTS5 queries use compound indexes on `agentId + sort key` and prefix optimization (3+ chars, capped at 8 terms, OR queries). These indexes yielded a 25% read improvement over baseline despite a 47% increase in stored data.
323
+ Diagnostics expose reranker status, candidate count, and provider, so operators can tell whether a turn used RRF only or reranked retrieval. FTS5 queries use compound indexes on `agentId + sort key` and prefix optimization (3+ chars, capped at 8 terms, OR queries).
327
324
 
328
- ### Retrieval pipeline
325
+ ### Library and fleet data
329
326
 
330
327
  **L4: Library DB.** Per-agent storage can't hold shared knowledge. Facts established by one agent, wiki pages synthesized from cross-agent topics, shared registry state: these belong to the system, not one agent. One shared SQLite database:
 
 | Collection | What it holds |
 |---|---|
- | Facts | Claims with confidence scoring, domain, expiry, supersedes chains |
- | Knowledge | Domain/key/value structured data with full-text search |
- | Episodes | Significant events with impact scores and participant tracking |
- | Topics | Cross-session thread tracking and synthesized wiki pages |
- | Preferences | Operator behavioral patterns |
- | Fleet Registry | Agent registry with tier, org, and capability metadata |
- | System Registry | Service state and lifecycle |
- | Work Items | Work queue with status transitions and FTS5 |
- | Session Registry | Session lifecycle tracking |
- | Desired State | Per-agent config targets; compares running config against desired at gateway startup and surfaces drift for operator review |
+ | Facts | Claims with confidence, visibility, decay, temporal validity, and supersession chains |
+ | Knowledge / wiki | Domain knowledge and synthesized topic pages with full-text search |
+ | Episodes | Significant events, decisions, discoveries, participants, and source links |
+ | Topics | Cross-session thread tracking and topic lifecycle state |
+ | Preferences | Operator and agent behavior patterns |
+ | Documents | Chunked workspace/governance docs, doc sources, and trigger retrieval metadata |
+ | Knowledge graph | Links between facts, knowledge, topics, episodes, agents, and preferences |
+ | Fleet registry | Agents, orgs, tiers, capabilities, and fleet topology |
+ | Desired state | Per-agent config targets, config events, and drift detection |
+ | System / work state | Service state, system events, work items, and work events |
+ | Sessions | Session registry, lifecycle events, and extraction counters |
+ | Output standards | Fleet output standards, model directives, and output metrics |
+ | Temporal / expertise / audits | Temporal index, expertise patterns, contradiction audits, and indexer watermarks |
 
  Facts are ranked by `confidence × recencyDecay`, where decay is exponential with a configurable half-life: recent, high-confidence facts float to the top while stale entries yield budget to newer knowledge.
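The ranking rule above can be sketched in a few lines. This is a minimal illustration of `confidence × recencyDecay` with an exponential half-life, not the package's internal code; the `Fact` shape and the 30-day half-life default are assumptions.

```typescript
// Illustrative sketch of confidence × recencyDecay ranking.
// Field names and the half-life value are hypothetical.
interface Fact {
  claim: string;
  confidence: number; // 0..1 confidence score
  ageDays: number;    // days since the fact was recorded or updated
}

const HALF_LIFE_DAYS = 30; // configurable half-life (illustrative default)

// Exponential decay: the score halves every halfLifeDays.
function recencyDecay(ageDays: number, halfLifeDays = HALF_LIFE_DAYS): number {
  return Math.pow(2, -ageDays / halfLifeDays);
}

// Recent, high-confidence facts float to the top; stale entries sink.
function rankFacts(facts: Fact[]): Fact[] {
  return [...facts].sort(
    (a, b) =>
      b.confidence * recencyDecay(b.ageDays) -
      a.confidence * recencyDecay(a.ageDays),
  );
}
```

A 120-day-old fact at confidence 0.9 scores 0.9 × 2⁻⁴ ≈ 0.056, so a fresh 0.8-confidence fact (≈ 0.78) outranks it.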
 
@@ -371,7 +371,7 @@ Facts are ranked by `confidence × recencyDecay`, where decay is exponential wit
 
 keystone guard ──► high-signal turns survive pressure
 
- hyperform ──► output normalization directives
+ hyperform ──► output profile directives
 
 composed prompt
 
@@ -386,20 +386,19 @@ Slot-level budget allocation is shown in the [hypercompositor diagram](#what-the
 
 ## Requirements
 
- **Current release: hypermem 0.8.2.** Changelog: [CHANGELOG.md](./CHANGELOG.md)
+ **Current release: hypermem 0.9.0.** Changelog: [CHANGELOG.md](./CHANGELOG.md)
 
 | Requirement | Version | Notes |
 |---|---|---|
 | **Node.js** | `>=22.0.0` | Required for native `node:sqlite` module |
- | **better-sqlite3** | `^11.x` | Installed automatically via npm; powers L1 in-memory and L4 library |
 | **sqlite-vec** | `0.1.9` | Bundled; no separate install needed |
 
- SQLite is a library, not a service. All four layers run in-process with no external daemons. The nomic embedder on Ollama is the heaviest component, and it is lighter than pgvector or any hosted vector database.
+ SQLite is a library, not a service. All four layers run in-process with no external database daemon. Embeddings are optional: run without embeddings for lightweight FTS-only mode, use Ollama for local embeddings, or configure a hosted provider such as OpenRouter or Gemini.
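The three embedding modes correspond to three shapes of the `embedding` option. The values below are copied from examples elsewhere in this diff (the hosted API key is a placeholder there too), so treat them as starting points rather than a schema reference:

```typescript
// Three embedding modes, with values taken from this package's own examples.
// Pick exactly one when constructing HyperMem or writing config.json.
const ftsOnly = { provider: 'none' }; // lightweight mode: FTS5 keyword search only

const localOllama = { ollamaUrl: 'http://localhost:11434', model: 'nomic-embed-text' };

const hostedOpenAICompatible = {
  provider: 'openai',
  openaiApiKey: 'sk-or-...', // placeholder, as in the original example
  openaiBaseUrl: 'https://openrouter.ai/api/v1',
  model: 'qwen/qwen3-embedding-8b',
  dimensions: 4096,
  batchSize: 128,
};
```

Starting with `ftsOnly` and adding a provider later is explicitly supported; stored data is kept.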
 
 **Runtime version constants** (importable from the package):
 ```typescript
 import {
- ENGINE_VERSION, // '0.8.2'
+ ENGINE_VERSION, // '0.9.0'
 MIN_NODE_VERSION, // '22.0.0'
 SQLITE_VEC_VERSION, // '0.1.9'
 MAIN_SCHEMA_VERSION, // 10 (messages.db)
@@ -413,105 +412,77 @@ Schema versions are stamped into each database on startup and checked on open. A
 
 ## Installation
 
- **Requirements:** Node.js 22+, OpenClaw with context engine plugin support. No standalone SQLite install needed (uses Node 22 built-in `node:sqlite`). Embedding provider is optional for first install.
+ **Requirements:** Node.js 22+, OpenClaw with context engine plugin support. No standalone SQLite install is needed because HyperMem uses Node 22 `node:sqlite`. Embeddings are optional on first install.
 
- hypermem works two ways:
- - **As a library** — import directly into your own Node.js code. No OpenClaw required.
- - **As an OpenClaw plugin** — replaces the default context engine. Requires a running OpenClaw gateway.
+ This README is OpenClaw-first. For non-OpenClaw library usage, see **[INSTALL.md § Non-OpenClaw usage](./INSTALL.md#non-openclaw-usage)**.
 
- ### Library usage (no OpenClaw required)
+ ### OpenClaw quickstart
 
 ```bash
 npm install @psiclawops/hypermem
+ npx hypermem-install
 ```
 
- ```typescript
- import { HyperMem } from '@psiclawops/hypermem';
- import { join } from 'node:path';
- import { homedir } from 'node:os';
-
- const hm = await HyperMem.create({
- dataDir: join(homedir(), '.openclaw', 'hypermem'),
- embedding: { provider: 'none' },
- });
-
- await hm.recordUserMessage('my-agent', 'session-1', 'Hello');
- const composed = await hm.compose({
- agentId: 'my-agent',
- sessionKey: 'session-1',
- prompt: 'Hello',
- tokenBudget: 4000,
- provider: 'anthropic',
- });
- ```
+ `hypermem-install` stages the runtime payload into `~/.openclaw/plugins/hypermem`. It does **not** modify OpenClaw config and does **not** restart the gateway. HyperMem is active only after OpenClaw is wired, restarted, and compose activity appears in logs.
 
- That's it. No gateway, no plugins, no config files. See [API](#api) for the full interface.
+ Install states:
 
- ### OpenClaw plugin install (from source)
-
- > **Release note:** if the npm package you installed does not contain `install:runtime`, you are on an older public release. Use the source-clone path below or wait for `0.8.4+`.
-
- ```bash
- git clone https://github.com/PsiClawOps/hypermem.git
- cd hypermem
- npm install && npm run build
- npm --prefix plugin install && npm --prefix plugin run build
- npm --prefix memory-plugin install && npm --prefix memory-plugin run build
- npm run install:runtime
- ```
+ | State | Meaning |
+ |---|---|
+ | Package installed | npm package is present |
+ | Runtime staged | plugin payload copied into `~/.openclaw/plugins/hypermem` |
+ | OpenClaw wired | `plugins.load.paths`, `plugins.slots.contextEngine`, and `plugins.slots.memory` point at HyperMem |
+ | Runtime loaded | gateway restarted and both plugins loaded |
+ | Runtime active | logs show `hypermem initialized` and compose activity |
 
- `install:runtime` stages the runtime payload into `~/.openclaw/plugins/hypermem` and prints the exact config commands to wire the plugins. It does not finish wiring automatically. Before running them, create the data directory and config:
+ Minimal starter config for lightweight FTS-only mode:
 
 ```bash
 mkdir -p ~/.openclaw/hypermem
 cat > ~/.openclaw/hypermem/config.json <<'JSON'
 {
- "embedding": {
- "provider": "none"
+ "embedding": { "provider": "none" },
+ "compositor": {
+ "budgetFraction": 0.55,
+ "contextWindowReserve": 0.25,
+ "targetBudgetFraction": 0.50,
+ "warmHistoryBudgetFraction": 0.27,
+ "maxFacts": 25,
+ "maxHistoryMessages": 500,
+ "maxCrossSessionContext": 4000,
+ "maxRecentToolPairs": 3,
+ "maxProseToolPairs": 10,
+ "keystoneHistoryFraction": 0.15,
+ "keystoneMaxMessages": 12,
+ "wikiTokenCap": 500
 }
 }
 JSON
 ```
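For intuition about the fraction knobs, a back-of-envelope sketch, assuming each fraction multiplies directly against the model's context window (an assumption about the semantics; docs/TUNING.md is the authority):

```typescript
// Back-of-envelope reading of the starter fractions, ASSUMING they are
// simple multipliers on the model context window. Illustrative only.
const contextWindow = 128_000; // e.g. a 128k-token model

// contextWindowReserve 0.25 → 32,000 tokens held back
const reserved = Math.round(0.25 * contextWindow);
// budgetFraction 0.55 → 70,400-token compose ceiling
const budgetCap = Math.round(0.55 * contextWindow);

console.log({ reserved, budgetCap });
```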
 
- This sets lightweight mode (FTS5 keyword search, no embedding provider needed). Add an embedding provider later for semantic search without losing stored data. See [INSTALL.md](./INSTALL.md#embedding-providers) for options.
-
- Wire the plugins into OpenClaw:
-
- > **⚠️ Merge, don't overwrite.** If you already have values in `plugins.load.paths` or `plugins.allow`, check them first and include your existing entries alongside the new ones. Replacing the list drops whatever was there before.
- >
- > ```bash
- > openclaw config get plugins.allow
- > openclaw config get plugins.load.paths
- > ```
+ Then merge the staged plugin paths into OpenClaw config and set the slots:
 
 ```bash
- # Use a variable to avoid shell quote-escaping issues with $HOME:
+ openclaw config get plugins.load.paths
+ openclaw config get plugins.allow
+
 HYPERMEM_PATHS="[\"${HOME}/.openclaw/plugins/hypermem/plugin\",\"${HOME}/.openclaw/plugins/hypermem/memory-plugin\"]"
 openclaw config set plugins.load.paths "$HYPERMEM_PATHS" --strict-json
- # If you have existing load paths, merge them into the array in HYPERMEM_PATHS.
-
 openclaw config set plugins.slots.contextEngine hypercompositor
 openclaw config set plugins.slots.memory hypermem
 
- # Only set plugins.allow if your OpenClaw config already uses an allowlist.
- # If `openclaw config get plugins.allow` returns null, empty, or unset, skip this step.
- # If it returns an array, copy that array and append "hypercompositor" and "hypermem".
+ # Only if your install already uses plugins.allow: merge, do not replace.
 openclaw config set plugins.allow '["existing-plugin","hypercompositor","hypermem"]' --strict-json
 
 openclaw gateway restart
+ hypermem-doctor --fix-plan
+ hypermem-status --health
+ hypermem-model-audit --strict
 ```
 
- Do **not** replace a working `plugins.allow` list with only `['hypercompositor','hypermem']`. That can disable bundled CLI surfaces and channel plugins.
-
- Verify (run these commands from the repo clone directory — `bin/` is a relative path):
-
- ```bash
- openclaw plugins list # hypercompositor and hypermem should show as loaded
- node bin/hypermem-status.mjs --health # confirms database initialization
- openclaw logs --limit 50 | grep hypermem # should show "hypermem initialized"
- ```
+ `hypermem-doctor` is the read-only confidence check. It validates plugin wiring, runtime load state, recommended OpenClaw settings (for example `contextPruning.mode=off` and the GPT-5 personality overlay disabled), startup/bootstrap injection sizing, compaction safety settings, HyperMem data files, and model context-window overrides for GPT, OpenAI-compatible, and local gateways, then prints a reviewable fix plan without changing anything.
 
- If you see `falling back to default engine "legacy"` in the logs, the install is not active. Check [INSTALL.md troubleshooting](./INSTALL.md#troubleshooting-clean-installs).
+ Full install, upgrade, source-clone, embedding-provider, reranker, fleet-config, and rollback guidance lives in **[INSTALL.md](./INSTALL.md)**.
 
 ### One-line installer
 
@@ -519,9 +490,7 @@ If you see `falling back to default engine "legacy"` in the logs, the install is
 curl -fsSL https://raw.githubusercontent.com/PsiClawOps/hypermem/main/install.sh | bash
 ```
 
- Interactive: detects hardware, selects embedding tier, writes config, registers plugins.
-
- Full guide with embedding tiers, reranker setup, fleet config, and tuning: **[INSTALL.md](./INSTALL.md)**
+ The shell installer stages the runtime and prints merge-safe activation commands. It does not edit OpenClaw config or restart the gateway.
 
 ### Agent-assisted install
 
@@ -537,6 +506,8 @@ If you prefer, hand the install to your OpenClaw agent:
 
 ### Tuning
 
+ Do tuning **after** the install is verified active. If logs still show `legacy` fallback or no compose activity, you do not have a tuning problem yet. You have an install problem.
+
 Two independent surfaces: **context assembly** (what fills the context window) and **output shaping** (how the model writes). Pick a profile first. Most deployments adjust one or two settings on top.
 
 | Profile | Target window | Best for |
@@ -582,78 +553,30 @@ Or configure through `openclaw.json` (preferred for managed deployments):
 
 Plugin config in `openclaw.json` takes precedence over `config.json`. Both sources are merged, with plugin config winning on overlap. The config schema is validated on gateway start and visible via `openclaw config get plugins.entries.hypercompositor.config`.
 
- Full reference: **[docs/TUNING.md](./docs/TUNING.md)**
-
- ---
-
- ## API
-
- > **Note:** The examples below use placeholder agent names (`my-agent`, `agent1`, etc.). Replace these with your actual agent IDs from your OpenClaw config. Single-agent installs typically use `main`. Multi-agent fleets use whatever IDs you've configured. See [INSTALL.md § "Configure your fleet"](./INSTALL.md#step-5--configure-your-fleet-multi-agent-only) for details.
-
- ```typescript
- import { HyperMem } from '@psiclawops/hypermem';
- import { join } from 'node:path';
- import { homedir } from 'node:os';
-
- const hm = await HyperMem.create({
- dataDir: join(homedir(), '.openclaw', 'hypermem'),
- cache: { maxEntries: 10000 },
- // Local (Ollama):
- embedding: { ollamaUrl: 'http://localhost:11434', model: 'nomic-embed-text' },
- // Hosted (OpenRouter), recommended for installs without local GPU/CPU:
- // embedding: { provider: 'openai', openaiApiKey: 'sk-or-...', openaiBaseUrl: 'https://openrouter.ai/api/v1', model: 'qwen/qwen3-embedding-8b', dimensions: 4096, batchSize: 128 },
- });
-
- // Record and compose
- await hm.recordUserMessage('my-agent', 'agent:my-agent:webchat:main', 'How does drift detection work?');
-
- const composed = await hm.compose({
- agentId: 'my-agent',
- sessionKey: 'agent:my-agent:webchat:main',
- prompt: 'How does drift detection work?',
- tokenBudget: 4000,
- provider: 'anthropic',
- });
-
- // Refresh tool compression after each turn
- await hm.refreshCacheGradient('my-agent', 'agent:my-agent:webchat:main');
- ```
+ **Key tuning knobs:**
+ - `verboseLogging` — set to `true` in the compositor config to see per-turn budget resolution in the gateway logs (`budget source:` lines show which window size is active and why).
+ - `contextWindowOverrides` — override the detected context window per `"provider/model"` key when autodetect gives wrong results for custom, local, or finetuned models. Fixes all downstream budget fractions in one place.
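As an illustration of the `"provider/model"` key shape, a sketch of an override block (the placement under the compositor config and the model names are assumptions for illustration; check docs/TUNING.md for the exact location and keys):

```json
{
  "compositor": {
    "contextWindowOverrides": {
      "openai/my-finetuned-gpt": 128000,
      "ollama/my-local-llama": 32768
    }
  }
}
```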
 
- Spawning a subagent with parent context:
-
- ```typescript
- import { buildSpawnContext, MessageStore, DocChunkStore } from '@psiclawops/hypermem';
-
- const spawn = await buildSpawnContext(
- new MessageStore(hm.dbManager.getMessageDb('my-agent')),
- new DocChunkStore(hm.dbManager.getLibraryDb()),
- 'my-agent',
- { parentSessionKey: 'agent:my-agent:webchat:main', workingSnapshot: 12 }
- );
- ```
+ Full reference: **[docs/TUNING.md](./docs/TUNING.md)**
 
 ---
 
- ## CLI
+ ## API and CLI references
 
- `bin/hypermem-status.mjs` provides health checks and metrics from the command line:
+ The README keeps the interface surface short; use the detailed docs for exact examples and release validation commands.
 
- ```bash
- node bin/hypermem-status.mjs # full dashboard
- node bin/hypermem-status.mjs --agent my-agent # scoped to one agent
- node bin/hypermem-status.mjs --json # machine-readable output
- node bin/hypermem-status.mjs --health # health checks only (exit 1 on failure)
- ```
+ **Runtime API:** import `HyperMem` from `@psiclawops/hypermem` for direct Node.js use, custom tests, and non-OpenClaw integrations. See **[INSTALL.md § Non-OpenClaw usage](./INSTALL.md#non-openclaw-usage)** and the package's TypeScript declarations for the current interface.
 
- By default, `hypermem-status` looks for data in `~/.openclaw/hypermem`. If your data directory is elsewhere (e.g. testing in an isolated environment), set:
+ **Operator CLIs:**
 
 ```bash
- HYPERMEM_DATA_DIR=/path/to/data node bin/hypermem-status.mjs --health
+ hypermem-status --health
+ hypermem-status --master
+ hypermem-model-audit --strict
+ hypermem-bench --iterations 1000 --warmup 50 --agent main
 ```
 
- > **Fresh install note:** If no agent has run a session yet, `--health` will report "no sessions ingested" rather than a database error. This is expected. Send a test message to any agent, then re-run the health check.
-
- ---
+ Diagnostics and validation details: **[docs/DIAGNOSTICS.md](./docs/DIAGNOSTICS.md)** and **[docs/INTEGRATION_VALIDATION.md](./docs/INTEGRATION_VALIDATION.md)**.
 
 ## Pressure management
 
@@ -704,7 +627,7 @@ Full troubleshooting: **[INSTALL.md § Troubleshooting](./INSTALL.md#troubleshoo
 
 hypermem doesn't touch your existing memory data. Install it, switch the context engine, and migrate historical data on your own timeline.
 
- The migration guide includes worked examples showing how to bring data from OpenClaw built-in memory, Mem0, Honcho, QMD session exports, and Engram. Each example walks through the data model mapping, transformation steps, and validation. Adapt them to your setup.
+ The migration guide includes worked examples showing how to bring data from OpenClaw built-in memory, QMD, ClawText, Cognee, Mem0, Zep, Honcho, memory-lancedb, MEMORY.md files, and custom engines. Each path documents source mapping, dry-run expectations, activation, rollback, and post-migration validation. Adapter snippets are examples unless explicitly shipped as package binaries.
 
 All examples default to dry-run. Nothing is written until you add `--apply`.
 
@@ -715,7 +638,7 @@ Operator guide: **[docs/MIGRATION_GUIDE.md](./docs/MIGRATION_GUIDE.md)**
 
 ## Identity layer
 
- hypermem handles context and output normalization. The Agentic Cognitive Architecture handles identity: self-authored SOUL files, structured communication contracts, and identity persistence across sessions. Same team, complementary layers.
+ hypermem handles context assembly and output-profile shaping. The Agentic Cognitive Architecture handles identity: self-authored SOUL files, structured communication contracts, and identity persistence across sessions. Same team, complementary layers.
 
 Design guide: [PsiClawOps/AgenticCognitiveArchitecture](https://github.com/PsiClawOps/AgenticCognitiveArchitecture/)