clawmem 0.1.0

Files changed (50)
  1. package/AGENTS.md +660 -0
  2. package/CLAUDE.md +660 -0
  3. package/LICENSE +21 -0
  4. package/README.md +993 -0
  5. package/SKILL.md +717 -0
  6. package/bin/clawmem +75 -0
  7. package/package.json +72 -0
  8. package/src/amem.ts +797 -0
  9. package/src/beads.ts +263 -0
  10. package/src/clawmem.ts +1849 -0
  11. package/src/collections.ts +405 -0
  12. package/src/config.ts +178 -0
  13. package/src/consolidation.ts +123 -0
  14. package/src/directory-context.ts +248 -0
  15. package/src/errors.ts +41 -0
  16. package/src/formatter.ts +427 -0
  17. package/src/graph-traversal.ts +247 -0
  18. package/src/hooks/context-surfacing.ts +317 -0
  19. package/src/hooks/curator-nudge.ts +89 -0
  20. package/src/hooks/decision-extractor.ts +639 -0
  21. package/src/hooks/feedback-loop.ts +214 -0
  22. package/src/hooks/handoff-generator.ts +345 -0
  23. package/src/hooks/postcompact-inject.ts +226 -0
  24. package/src/hooks/precompact-extract.ts +314 -0
  25. package/src/hooks/pretool-inject.ts +79 -0
  26. package/src/hooks/session-bootstrap.ts +324 -0
  27. package/src/hooks/staleness-check.ts +130 -0
  28. package/src/hooks.ts +367 -0
  29. package/src/indexer.ts +327 -0
  30. package/src/intent.ts +294 -0
  31. package/src/limits.ts +26 -0
  32. package/src/llm.ts +1175 -0
  33. package/src/mcp.ts +2138 -0
  34. package/src/memory.ts +336 -0
  35. package/src/mmr.ts +93 -0
  36. package/src/observer.ts +269 -0
  37. package/src/openclaw/engine.ts +283 -0
  38. package/src/openclaw/index.ts +221 -0
  39. package/src/openclaw/plugin.json +83 -0
  40. package/src/openclaw/shell.ts +207 -0
  41. package/src/openclaw/tools.ts +304 -0
  42. package/src/profile.ts +346 -0
  43. package/src/promptguard.ts +218 -0
  44. package/src/retrieval-gate.ts +106 -0
  45. package/src/search-utils.ts +127 -0
  46. package/src/server.ts +783 -0
  47. package/src/splitter.ts +325 -0
  48. package/src/store.ts +4062 -0
  49. package/src/validation.ts +67 -0
  50. package/src/watcher.ts +58 -0
package/AGENTS.md ADDED
@@ -0,0 +1,660 @@
# ClawMem — Agent Quick Reference

## Inference Services

ClawMem uses three `llama-server` instances for neural inference. By default, the `bin/clawmem` wrapper points at `localhost:8088/8089/8090`.

**Default (QMD native combo, any GPU or in-process):**

| Service | Port | Model | VRAM | Protocol |
|---|---|---|---|---|
| Embedding | 8088 | EmbeddingGemma-300M-Q8_0 | ~400MB | `/v1/embeddings` |
| LLM | 8089 | qmd-query-expansion-1.7B-q4_k_m | ~2.2GB | `/v1/chat/completions` |
| Reranker | 8090 | qwen3-reranker-0.6B-Q8_0 | ~1.3GB | `/v1/rerank` |

All three models auto-download via `node-llama-cpp` if no server is running (Metal on Apple Silicon, Vulkan where available, CPU as a last resort). Inference is fast with GPU acceleration (Metal/Vulkan) and significantly slower on CPU only.

**SOTA upgrade (12GB+ GPU):** CC-BY-NC-4.0 — non-commercial use only.

| Service | Port | Model | VRAM | Protocol |
|---|---|---|---|---|
| Embedding | 8088 | zembed-1-Q4_K_M | ~4.4GB | `/v1/embeddings` |
| LLM | 8089 | qmd-query-expansion-1.7B-q4_k_m | ~2.2GB | `/v1/chat/completions` |
| Reranker | 8090 | zerank-2-Q4_K_M | ~3.3GB | `/v1/rerank` |

Total ~10GB VRAM. zembed-1 (2560d, 32K context, SOTA retrieval) was distilled from zerank-2 via zELO, making the two an optimal pairing.

**Remote option:** Set `CLAWMEM_EMBED_URL`, `CLAWMEM_LLM_URL`, and `CLAWMEM_RERANK_URL` to the remote host. Set `CLAWMEM_NO_LOCAL_MODELS=true` to prevent surprise fallback downloads.

**Cloud embedding:** Set `CLAWMEM_EMBED_API_KEY` + `CLAWMEM_EMBED_URL` + `CLAWMEM_EMBED_MODEL` to use a cloud provider instead of a local GPU. Supported: Jina AI (recommended: `jina-embeddings-v5-text-small`, 1024d), OpenAI, Voyage, Cohere. Cloud mode enables batch embedding (50 fragments/request), provider-specific retrieval params auto-detected from the URL (Jina `task`, Voyage/Cohere `input_type`), server-side truncation, and adaptive TPM-aware pacing. Set `CLAWMEM_EMBED_TPM_LIMIT` to match your tier.
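As a concrete sketch, a Jina AI cloud setup might look like the following. The endpoint URL, key placeholder, and TPM value are illustrative assumptions — substitute your provider's actual endpoint and tier:

```shell
# Hypothetical cloud-embedding configuration — URL, key, and TPM limit
# are placeholders; check your provider's documentation.
export CLAWMEM_EMBED_URL="https://api.jina.ai/v1/embeddings"
export CLAWMEM_EMBED_API_KEY="jina_xxxxxxxx"               # sent as a Bearer token
export CLAWMEM_EMBED_MODEL="jina-embeddings-v5-text-small"
export CLAWMEM_EMBED_TPM_LIMIT=2000000                     # e.g. paid tier
```

With these set, ClawMem switches to cloud mode automatically (batching, server-side truncation, TPM pacing).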
**Qwen3 `/no_think` flag:** Qwen3 emits thinking tokens by default. ClawMem automatically appends `/no_think` to all prompts to get structured output.

### Model Recommendations

| Role | Default (QMD native) | SOTA Upgrade | Notes |
|---|---|---|---|
| Embedding | [EmbeddingGemma-300M-Q8_0](https://huggingface.co/ggml-org/embeddinggemma-300M-GGUF) (314MB, 768d) | [zembed-1-Q4_K_M](https://huggingface.co/Abhiray/zembed-1-Q4_K_M-GGUF) (2.4GB, 2560d) | zembed-1: 32K context, SOTA retrieval. `-ub` must match `-b`. |
| LLM | [qmd-query-expansion-1.7B-q4_k_m](https://huggingface.co/tobil/qmd-query-expansion-1.7B-gguf) (~1.1GB) | Same | QMD's Qwen3-1.7B finetune for query expansion. |
| Reranker | [qwen3-reranker-0.6B-Q8_0](https://huggingface.co/ggml-org/Qwen3-Reranker-0.6B-Q8_0-GGUF) (~600MB) | [zerank-2-Q4_K_M](https://huggingface.co/keisuke-miyako/zerank-2-gguf-q4_k_m) (2.4GB) | zerank-2: outperforms Cohere rerank-3.5. `-ub` must match `-b`. |

### Server Setup (all three use llama-server)

```bash
# === Default (QMD native combo) ===

# Embedding (--embeddings flag required)
llama-server -m embeddinggemma-300M-Q8_0.gguf \
  --embeddings --port 8088 --host 0.0.0.0 -ngl 99 -c 2048 --batch-size 2048

# LLM (QMD finetuned model)
llama-server -m qmd-query-expansion-1.7B-q4_k_m.gguf \
  --port 8089 --host 0.0.0.0 -ngl 99 -c 4096 --batch-size 512

# Reranker
llama-server -m Qwen3-Reranker-0.6B-Q8_0.gguf \
  --reranking --port 8090 --host 0.0.0.0 -ngl 99 -c 2048 --batch-size 512

# === SOTA upgrade (12GB+ GPU) — -ub must match -b for non-causal attention ===

# Embedding
llama-server -m zembed-1-Q4_K_M.gguf \
  --embeddings --port 8088 --host 0.0.0.0 -ngl 99 -c 8192 -b 2048 -ub 2048

# Reranker
llama-server -m zerank-2-Q4_K_M.gguf \
  --reranking --port 8090 --host 0.0.0.0 -ngl 99 -c 2048 -b 2048 -ub 2048
```

### Verify Endpoints

```bash
# Embedding
curl http://host:8088/v1/embeddings -d '{"input":"test","model":"embedding"}' -H 'Content-Type: application/json'

# LLM
curl http://host:8089/v1/models

# Reranker
curl http://host:8090/v1/models
```

## Environment Variable Reference

| Variable | Default (via wrapper) | Effect |
|---|---|---|
| `CLAWMEM_EMBED_URL` | `http://localhost:8088` | Embedding server URL. Local llama-server or cloud API; falls back to in-process `node-llama-cpp` if unset. |
| `CLAWMEM_EMBED_API_KEY` | (none) | API key for cloud embedding providers. Sent as a Bearer token. Enables cloud mode: skips client-side truncation, sends `truncate: true` + the `task` param (LoRA adapter selection for Jina v5), and activates batch embedding with adaptive TPM-aware pacing. |
| `CLAWMEM_EMBED_MODEL` | `embedding` | Model name for embedding requests. Override for cloud providers (e.g. `jina-embeddings-v5-text-small`). |
| `CLAWMEM_EMBED_MAX_CHARS` | `6000` | Max chars per embedding input. Default fits EmbeddingGemma (2048 tokens). Set to `1100` for granite-278m (512 tokens). Cloud providers skip truncation. |
| `CLAWMEM_EMBED_TPM_LIMIT` | `100000` | Tokens-per-minute limit for cloud embedding pacing. Match it to your provider tier: Free 100000, Paid 2000000, Premium 50000000. |
| `CLAWMEM_EMBED_DIMENSIONS` | (none) | Output dimensions for OpenAI `text-embedding-3-*` Matryoshka models (e.g. `512`, `1024`). Only sent when the URL contains `openai.com`. |
| `CLAWMEM_LLM_URL` | `http://localhost:8089` | LLM server for intent, expansion, and A-MEM. Falls back to `node-llama-cpp` if unset and `CLAWMEM_NO_LOCAL_MODELS=false`. |
| `CLAWMEM_RERANK_URL` | `http://localhost:8090` | Reranker server. Falls back to `node-llama-cpp` if unset and `CLAWMEM_NO_LOCAL_MODELS=false`. |
| `CLAWMEM_NO_LOCAL_MODELS` | `false` | Blocks `node-llama-cpp` from auto-downloading GGUF models. Set `true` for remote-only setups. |
| `CLAWMEM_VAULTS` | (none) | JSON map of vault name → SQLite path for multi-vault mode. E.g. `{"work":"~/.cache/clawmem/work.sqlite"}` |
| `CLAWMEM_ENABLE_AMEM` | enabled | A-MEM note construction + link generation during indexing. |
| `CLAWMEM_ENABLE_CONSOLIDATION` | disabled | Background worker that backfills unenriched docs. Needs a long-lived MCP process. |
| `CLAWMEM_CONSOLIDATION_INTERVAL` | 300000 | Worker interval in ms (minimum 15000). |

**Note:** The `bin/clawmem` wrapper sets all endpoint defaults. Always use the wrapper — never run `bun run src/clawmem.ts` directly. For remote GPU setups, add the same env vars to the watcher service via a systemd drop-in.
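Such a drop-in might look like the following sketch. The hostname `gpubox` is a placeholder; the unit name matches the watcher service defined under Background Services:

```shell
# Sketch: point an existing clawmem-watcher.service at a remote GPU host.
# "gpubox" is a placeholder hostname — substitute your own.
mkdir -p ~/.config/systemd/user/clawmem-watcher.service.d
cat > ~/.config/systemd/user/clawmem-watcher.service.d/remote.conf << 'EOF'
[Service]
Environment=CLAWMEM_EMBED_URL=http://gpubox:8088
Environment=CLAWMEM_LLM_URL=http://gpubox:8089
Environment=CLAWMEM_RERANK_URL=http://gpubox:8090
EOF
```

Run `systemctl --user daemon-reload` afterwards so systemd picks up the drop-in.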
## Quick Setup

```bash
# Install via npm
bun add -g clawmem   # or: npm install -g clawmem

# Or from source
git clone https://github.com/yoloshii/clawmem.git ~/clawmem
cd ~/clawmem && bun install
ln -sf ~/clawmem/bin/clawmem ~/.bun/bin/clawmem

# Bootstrap a vault (init + index + embed + hooks + MCP)
clawmem bootstrap ~/notes --name notes

# Or step by step:
./bin/clawmem init
./bin/clawmem collection add ~/notes --name notes
./bin/clawmem update --embed
./bin/clawmem setup hooks
./bin/clawmem setup mcp

# Verify
./bin/clawmem doctor   # Full health check
./bin/clawmem status   # Quick index status
```

### Background Services (systemd user units)

The watcher and embed timer keep the vault fresh automatically. Create these after setup:

```bash
# Create the systemd user directory
mkdir -p ~/.config/systemd/user

# clawmem-watcher.service — auto-indexes on .md changes
cat > ~/.config/systemd/user/clawmem-watcher.service << 'EOF'
[Unit]
Description=ClawMem file watcher — auto-indexes on .md changes
After=default.target

[Service]
Type=simple
ExecStart=%h/clawmem/bin/clawmem watch
Restart=on-failure
RestartSec=10

[Install]
WantedBy=default.target
EOF

# clawmem-embed.service — oneshot embedding sweep
cat > ~/.config/systemd/user/clawmem-embed.service << 'EOF'
[Unit]
Description=ClawMem embedding sweep

[Service]
Type=oneshot
ExecStart=%h/clawmem/bin/clawmem embed
EOF

# clawmem-embed.timer — daily at 04:00
cat > ~/.config/systemd/user/clawmem-embed.timer << 'EOF'
[Unit]
Description=ClawMem daily embedding sweep

[Timer]
OnCalendar=*-*-* 04:00:00
Persistent=true
RandomizedDelaySec=300

[Install]
WantedBy=timers.target
EOF

# Enable and start
systemctl --user daemon-reload
systemctl --user enable --now clawmem-watcher.service clawmem-embed.timer

# Persist across reboots (start without login)
loginctl enable-linger $(whoami)

# Verify
systemctl --user status clawmem-watcher.service clawmem-embed.timer
```

**Note:** The service files use `%h` (the home-directory specifier). If clawmem is installed elsewhere, update the `ExecStart` paths. For remote GPU setups, add `Environment=CLAWMEM_EMBED_URL=http://host:8088` etc. to both service files (the `bin/clawmem` wrapper sets the defaults).
---

## OpenClaw Integration: Memory System Configuration

When using ClawMem with OpenClaw, choose one of two deployment options:

### Option 1: ClawMem Exclusive (Recommended)

ClawMem handles 100% of memory operations via hooks + MCP tools. Zero redundancy.

**Benefits:**
- No context window waste (avoids 10-15% duplicate injection)
- Prevents OpenClaw's native memory from auto-initializing on updates
- All memory lives in ClawMem's hybrid search + graph traversal system

**Configuration:**
```bash
# Disable OpenClaw's native memory
openclaw config set agents.defaults.memorySearch.extraPaths "[]"

# Verify
openclaw config get agents.defaults.memorySearch
# Expected: {"extraPaths": []}

# Confirm no native memory index exists
ls ~/.openclaw/agents/main/memory/
# Expected: "No such file or directory"
```

**Memory distribution:**
- **Tier 2 (90%):** Hooks auto-inject context (session-bootstrap, context-surfacing, staleness-check, decision-extractor, handoff-generator, feedback-loop)
- **Tier 3 (10%):** Agent-initiated MCP tools (query, intent_search, find_causal_links, etc.)

### Option 2: Hybrid (ClawMem + Native)

Run both ClawMem and OpenClaw's native memory for redundancy.

**Configuration:**
```bash
openclaw config set agents.defaults.memorySearch.extraPaths '["~/documents", "~/notes"]'
```

**Tradeoffs:**
- ✅ Redundant recall from two independent systems
- ❌ 10-15% context window waste from duplicate facts
- ❌ Two memory indices to maintain

**Recommendation:** Use Option 1 unless you have a specific need for redundant memory systems.

---
## Memory Retrieval (90/10 Rule)

ClawMem hooks handle ~90% of retrieval automatically. Agent-initiated MCP calls cover the remaining ~10%.

### Tier 2 — Automatic (hooks, zero agent effort)

| Hook | Trigger | Budget | Content |
|------|---------|--------|---------|
| `context-surfacing` | UserPromptSubmit | profile-driven (default 800) | retrieval gate → profile-driven hybrid search (vector if `useVector`, timeout from profile) → FTS supplement → file-aware supplemental search (E13) → snooze filter → noise filter → spreading activation (E11: co-activated doc boost) → memory type diversification (E10) → tiered injection (HOT/WARM/COLD snippets) → `<vault-context>` + optional `<vault-routing>` hint. Budget, max results, vector timeout, and min score are all driven by `CLAWMEM_PROFILE`. |
| `postcompact-inject` | SessionStart (compact) | 1200 tokens | re-injects authoritative context after compaction: precompact state (600) + recent decisions (400) + antipatterns (150) + vault context (200) → `<vault-postcompact>` |
| `curator-nudge` | SessionStart | 200 tokens | surfaces curator report actions, nudges when the report is stale (>7 days) |
| `precompact-extract` | PreCompact | — | extracts decisions, file paths, open questions → writes `precompact-state.md` to auto-memory. Query-aware decision ranking. Reindexes the auto-memory collection. |
| `decision-extractor` | Stop | — | LLM extracts observations → `_clawmem/agent/observations/`, infers causal links, detects contradictions with prior decisions |
| `handoff-generator` | Stop | — | LLM summarizes the session → `_clawmem/agent/handoffs/` |
| `feedback-loop` | Stop | — | tracks referenced notes → boosts confidence, records usage relations + co-activations between co-referenced docs, tracks utility signals (surfaced-vs-referenced ratio for lifecycle automation) |

**Default behavior:** Read the injected `<vault-context>` first. If it is sufficient, answer immediately.

**Hook blind spots (by design):** Hooks filter out `_clawmem/` system artifacts, enforce score thresholds, and cap the token budget. Absence from `<vault-context>` does NOT mean absence from memory. If you expect a memory to exist but it wasn't surfaced, escalate to Tier 3.

### Tier 3 — Agent-Initiated (one targeted MCP call)

**Escalate ONLY when one of these three rules fires:**
1. **Low-specificity injection** — `<vault-context>` is empty or lacks the specific fact/chain the task requires. Hooks surface top-k by relevance; if the needed memory wasn't in the top-k, escalate.
2. **Cross-session question** — the task explicitly references prior sessions or decisions: "why did we decide X", "what changed since last time", "when did we start doing Y".
3. **Pre-irreversible check** — about to make a destructive or hard-to-reverse change (deletion, config change, architecture decision). Check the vault for prior decisions before proceeding.

All other retrieval is handled by Tier 2 hooks. Do NOT call MCP tools speculatively or "just to be thorough."
**Once escalated, route by query type:**

**PREFERRED:** `memory_retrieve(query)` — auto-classifies and routes to the optimal backend (query, intent_search, session_log, find_similar, or query_plan). Use this instead of manually choosing a tool below.

```
1a. General recall → query(query, compact=true, limit=20)
    Full hybrid: BM25 + vector + query expansion + deep reranking (4000 char).
    Supports compact, collection filter (comma-separated for multi-collection: "col1,col2"), intent, and candidateLimit.
    Default for most Tier 3 needs.
    Optional: intent="domain hint" for ambiguous queries (steers expansion, reranking, chunk selection, snippets).
    Optional: candidateLimit=N to tune precision/speed (default 30).
    BM25 strong-signal bypass: skips expansion when the top BM25 hit ≥ 0.85 with gap ≥ 0.15 (disabled when intent is provided).

1b. Causal/why/when/entity → intent_search(query, enable_graph_traversal=true)
    MAGMA intent classification + intent-weighted RRF + multi-hop graph traversal.
    Use DIRECTLY (not as a fallback) when the question is "why", "when", "how did X lead to Y",
    or needs entity-relationship traversal.
    Override auto-detection: force_intent="WHY"|"WHEN"|"ENTITY"|"WHAT"
    When to override:
      WHY — "why", "what led to", "rationale", "tradeoff", "decision behind"
      ENTITY — named component/person/service needing cross-doc linkage, not just keyword hits
      WHEN — timelines, first/last occurrence, "when did this change/regress"
    WHEN note: start with enable_graph_traversal=false (BM25-biased); fall back to query() if recall drifts.

    Choose 1a or 1b based on query type. They are parallel options, not sequential.

1c. Multi-topic/complex → query_plan(query, compact=true)
    Decomposes the query into 2-4 typed clauses (bm25/vector/graph), executes them in parallel, merges via RRF.
    Use when the query spans multiple topics or needs both keyword and semantic recall simultaneously.
    Falls back to single-query behavior for simple queries (planner returns 1 clause).

2. Progressive disclosure → multi_get("path1,path2") for full content of top hits

3. Spot checks → search(query) (BM25, 0 GPU) or vsearch(query) (vector, 1 GPU)

4. Chain tracing → find_causal_links(docid, direction="both", depth=5)
   Traverses causal edges between _clawmem/agent/observations/ docs (from decision-extractor).

5. Memory debugging → memory_evolution_status(docid)

6. Temporal context → timeline(docid, before=5, after=5, same_collection=false)
   Shows what was created/modified before and after a document.
   Use after search to understand the chronological neighborhood.
```
**Other tools:**
- `find_similar(docid)` — "what else relates to X". k-NN vector neighbors — discovers connections beyond keyword overlap.
- `session_log` — USE THIS for "last time", "yesterday", "what did we do". Do NOT use `query()` for cross-session questions.
- `profile` — user profile (static facts + dynamic context).
- `memory_forget(query)` — deactivate a memory by closest match.
- `memory_pin(query, unpin?)` — +0.3 composite boost. USE PROACTIVELY for constraints, architecture decisions, corrections.
- `memory_snooze(query, until?)` — USE PROACTIVELY when `<vault-context>` surfaces noise — snooze for 30 days.
- `build_graphs(temporal?, semantic?)` — build the temporal backbone + semantic graph after bulk ingestion. Not needed after routine indexing (A-MEM handles per-doc links).
- `beads_sync(project_path?)` — sync Beads issues from the Dolt backend (via the `bd` CLI) into memory. Usually automatic via the watcher.
- `query_plan(query, compact=true)` — USE THIS for multi-topic queries. `query()` searches the query as one blob — this splits topics and routes each optimally.
- `timeline(docid, before=5, after=5, same_collection=false)` — temporal neighborhood around a document. Progressive disclosure: search → timeline → get. Supports same-collection scoping and session correlation.
- `list_vaults()` — show configured vault names and paths. Empty in single-vault mode (the default).
- `vault_sync(vault, content_root, pattern?, collection_name?)` — index markdown from a directory into a named vault. Restricted-path validation rejects sensitive directories (`/etc/`, `/root/`, `.ssh`, `.env`, `credentials`, etc.).

### Multi-Vault

All tools accept an optional `vault` parameter — retrieval (query, search, vsearch, intent_search, memory_retrieve, query_plan), document access (get, multi_get, find_similar, find_causal_links, timeline, memory_evolution_status, session_log), mutations (memory_pin, memory_snooze, memory_forget), lifecycle (lifecycle_status, lifecycle_sweep, lifecycle_restore), and maintenance (status, reindex, index_stats, build_graphs). Omit `vault` for the default vault (single-vault mode). Named vaults are configured in `~/.config/clawmem/config.yaml` under `vaults:` or via the `CLAWMEM_VAULTS` env var. Vault paths support `~` expansion.
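A sketch of the `vaults:` section — the vault names and paths here are illustrative, not required values:

```yaml
# ~/.config/clawmem/config.yaml — hypothetical multi-vault layout
vaults:
  work: ~/.cache/clawmem/work.sqlite
  personal: ~/.cache/clawmem/personal.sqlite
```

The same mapping can be supplied as JSON via `CLAWMEM_VAULTS`, e.g. `{"work":"~/.cache/clawmem/work.sqlite"}`.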
### Memory Lifecycle

Pin, snooze, and forget are **manual MCP tools** — not automated. The agent should use them proactively when appropriate:

- **Pin** (`memory_pin`) — +0.3 composite boost, ensures persistent surfacing.
  - **Proactive triggers:** User says "remember this" / "don't forget" / "this is important". An architecture or critical design decision was just made. A user-stated preference or constraint should persist across sessions.
  - **Do NOT pin:** routine decisions, session-specific context, or observations that will naturally surface via recency.
- **Snooze** (`memory_snooze`) — temporarily hides a memory from context surfacing until a date.
  - **Proactive triggers:** A memory keeps surfacing but isn't relevant to current work. User says "not now" / "later" / "ignore this for now". Seasonal or time-boxed content (e.g., "revisit after launch").
- **Forget** (`memory_forget`) — permanently deactivates. Use sparingly.
  - Only when a memory is genuinely wrong or permanently obsolete. Prefer snooze for temporary suppression.
- **Contradictions auto-resolve:** When `decision-extractor` detects a new decision contradicting an old one, the old decision's confidence is lowered automatically. No manual intervention is needed for superseded decisions.

### Anti-Patterns

- Do NOT manually pick query/intent_search/search when `memory_retrieve` can auto-route.
- Do NOT call MCP tools every turn — the three rules above are the only gates.
- Do NOT re-search what's already in `<vault-context>`.
- Do NOT run `status` routinely. Only when retrieval feels broken or after a large ingestion.
- Do NOT pin everything — pin is for persistent high-priority items, not temporary boosting.
- Do NOT forget memories to "clean up" — let confidence decay and contradiction detection handle it naturally.
- Do NOT run `build_graphs` after every reindex — A-MEM creates per-doc links automatically. Only after bulk ingestion or when `intent_search` returns weak graph results.

## Tool Selection (one-liner)

```
ClawMem escalation: memory_retrieve(query) | query(compact=true) | intent_search(why/when/entity) | query_plan(multi-topic) → multi_get → search/vsearch (spot checks)
```

## Curator Agent

A maintenance agent for the Tier 3 operations the main agent typically neglects. Install with `clawmem setup curator`.

**Invoke:** "curate memory", "run curator", or "memory maintenance"

**6 phases:**
1. Health snapshot — status, index_stats, lifecycle_status, doctor
2. Lifecycle triage — pin high-value unpinned memories, snooze stale content, propose forget candidates (never auto-confirms)
3. Retrieval health check — 5 probes (BM25, vector, hybrid, intent/graph, lifecycle)
4. Maintenance — reflect (cross-session patterns), consolidate --dry-run (dedup candidates)
5. Graph rebuild — conditional on probe results and embedding state
6. Collection hygiene — orphan detection, content type distribution

**Safety rails:** Never auto-confirms forget. Never runs embed (that is the timer's job). Never modifies config.yaml. All destructive proposals require user approval.

## Query Optimization

The pipeline autonomously generates lex/vec/hyde variants, fuses BM25 + vector via RRF, and reranks with a cross-encoder. Agents do NOT choose search types — the pipeline handles fusion internally. The optimization levers are: **tool selection**, **query string quality**, **intent**, and **candidateLimit**.

### Tool Selection (highest impact)

Pick the lightest tool that satisfies the need:

| Tool | Cost | When |
|------|------|------|
| `search(q, compact=true)` | BM25 only, 0 GPU | Know the exact terms, spot-check, fast keyword lookup |
| `vsearch(q, compact=true)` | Vector only, 1 GPU call | Conceptual/fuzzy, don't know the vocabulary |
| `query(q, compact=true)` | Full hybrid, 3+ GPU calls | General recall, unsure which signal matters |
| `intent_search(q)` | Hybrid + graph | Why/entity chains (graph traversal), when queries (BM25-biased) |
| `query_plan(q)` | Hybrid + decomposition | Complex multi-topic queries needing parallel typed retrieval |

Use `search` for quick keyword spot-checks. Use `query` for general recall (the default Tier 3 workhorse). Use `intent_search` directly (not as a fallback) when the question is causal or relational.

### Query String Quality

The query string feeds BM25 directly (it probes first and can short-circuit the pipeline) and anchors the 2×-weighted original signal in RRF.

**For keyword recall (BM25):** 2-5 precise terms, no filler. Code identifiers work. BM25 AND's all terms as prefix matches (`perf` matches "performance") — there is no phrase search or negation syntax. A strong hit (≥ 0.85 with a gap ≥ 0.15) skips expansion — faster results.
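An illustrative contrast (the vault terms are hypothetical):

```
Good:  refund webhook retry
       # 3 precise, prefix-matched terms

Weak:  how do we handle it when refund webhooks need to be retried
       # filler words dilute BM25 scoring
```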
**For semantic recall (vector):** A full natural-language question, as specific as possible. `"in the payment service, how are refunds processed"` beats `"refunds"`.

**Do NOT write hypothetical-answer-style queries.** The expansion LLM already generates hyde variants internally. A long hypothetical dilutes BM25 scoring and duplicates what the pipeline does autonomously.

### Intent Parameter

Steers 5 autonomous stages: expansion, reranking, chunk selection, snippet extraction, and strong-signal bypass (disabled when intent is provided).

- Use when: a query term has multiple meanings in the vault, or the domain is known but the query alone is ambiguous.
- Do NOT use when: the query is already specific, the vault is single-domain, or you are using `search`/`vsearch` (intent only affects `query`).

Note: intent disables the BM25 strong-signal bypass, forcing full expansion + reranking. This is correct behavior — intent means the query is ambiguous, so keyword confidence alone is insufficient.

## Composite Scoring (automatic, applied to all search tools)

```
compositeScore = (0.50 × searchScore + 0.25 × recencyScore + 0.25 × confidenceScore) × qualityMultiplier × coActivationBoost
```

Where `qualityMultiplier = 0.7 + 0.6 × qualityScore` (range: 0.7× penalty to 1.3× boost).
`coActivationBoost = 1 + min(coCount/10, 0.15)` — documents frequently surfaced together get up to a 15% boost.
Length normalization: `1/(1 + 0.5 × log2(max(bodyLength/500, 1)))` — penalizes verbose entries, with a floor at 30%.
Frequency boost: `freqSignal = (revisions-1)×2 + (duplicates-1)`, `freqBoost = min(0.10, log1p(freqSignal)×0.03)`. Revision count is weighted 2× vs duplicate count. Capped at 10%.
Pinned documents get a +0.3 additive boost (capped at 1.0).
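A quick worked example of the default formula, using hypothetical scores (searchScore 0.8, recency 0.5, confidence 0.9, qualityScore 0.75, coCount 5, unpinned):

```shell
# base  = 0.50*0.8 + 0.25*0.5 + 0.25*0.9 = 0.75
# qmult = 0.7 + 0.6*0.75                 = 1.15
# cob   = 1 + min(5/10, 0.15)            = 1.15
awk 'BEGIN {
  base  = 0.50*0.8 + 0.25*0.5 + 0.25*0.9
  qmult = 0.7 + 0.6*0.75
  cob   = 1 + (0.5 < 0.15 ? 0.5 : 0.15)
  printf "%.4f\n", base * qmult * cob     # prints 0.9919
}'
```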
When recency intent is detected ("latest", "recent", "last session"), the weights shift:
```
compositeScore = (0.10 × searchScore + 0.70 × recencyScore + 0.20 × confidenceScore) × qualityMultiplier × coActivationBoost
```

| Content Type | Half-Life | Effect |
|--------------|-----------|--------|
| decision, hub | ∞ | Never decay |
| antipattern | ∞ | Never decay — accumulated negative patterns persist |
| project | 120 days | Slow decay |
| research | 90 days | Moderate decay |
| note | 60 days | Default |
| progress | 45 days | Faster decay |
| handoff | 30 days | Fast — recent matters most |

Half-lives extend up to 3× for frequently accessed memories (access reinforcement decays over 90 days).
Attention decay: non-durable types (handoff, progress, note, project) lose 5% confidence per week without access. Decision, hub, research, and antipattern are exempt.
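The weekly 5% loss compounds. A quick sanity check of the multiplier for a non-durable memory untouched for 10 weeks, assuming the decay is multiplicative:

```shell
awk 'BEGIN { printf "%.2f\n", 0.95 ^ 10 }'   # ≈ 0.60 of original confidence
```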
## Indexing & Graph Building

### What Gets Indexed (per collection in config.yaml, symlinked as index.yml)

- `**/MEMORY.md` — any depth
- `**/memory/**/*.md`, `**/memory/**/*.txt` — session logs
- `**/docs/**/*.md`, `**/docs/**/*.txt` — documentation
- `**/research/**/*.md`, `**/research/**/*.txt` — research dumps
- `**/YYYY-MM-DD*.md`, `**/YYYY-MM-DD*.txt` — date-format records
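As a sketch, a collection entry covering these patterns might look like the following — the key names are illustrative, so check the config that `clawmem init` actually generates:

```yaml
# Hypothetical collection entry in ~/.config/clawmem/config.yaml
collections:
  - name: notes
    root: ~/notes
    include:
      - "**/MEMORY.md"
      - "**/memory/**/*.md"
      - "**/docs/**/*.md"
      - "**/research/**/*.md"
```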
+
454
+ ### Excluded (even if pattern matches)
455
+
456
+ - `gits/`, `scraped/`, `.git/`, `node_modules/`, `dist/`, `build/`, `vendor/`
457
+
458
+ ### Indexing vs Embedding (important distinction)
459
+
460
+ **Infrastructure (Tier 1, no agent action needed):**
461
+ - **`clawmem-watcher`** — keeps index + A-MEM fresh (continuous, on `.md` change). Also watches `.beads/` — routes changes to `syncBeadsIssues()` which queries `bd` CLI for live Dolt data (auto-bridges deps into `memory_relations`). Does NOT embed.
462
+ - **`clawmem-embed` timer** — keeps embeddings fresh (daily). Idempotent, skips already-embedded fragments.
463
+
464
+ **Quality scoring:** Each document gets a `quality_score` (0.0–1.0) computed during indexing based on length, structure (headings, lists), decision keywords, correction keywords, and frontmatter richness. Applied as a multiplier in composite scoring.
465
+
466
+ **Impact of missing embeddings:** `vsearch`, `query` (vector component), `context-surfacing` (vector component), and `generateMemoryLinks()` (neighbor discovery) all depend on embeddings. If embeddings are missing, these degrade silently — BM25 still works, but vector recall and inter-doc link quality suffer.
467
+
468
+ **Agent escape hatches (rare):**
469
+ - `clawmem embed` via CLI if you just wrote a doc and need immediate vector recall in the next turn.
470
+ - Manual `reindex` only when immediate index freshness is required and watcher hasn't caught up.
471
+
472
+ ### Graph Population (memory_relations)
473
+
474
+ The `memory_relations` table is populated by multiple independent sources:
475
+
476
+ | Source | Edge Types | Trigger | Notes |
477
+ |--------|-----------|---------|-------|
+ | A-MEM `generateMemoryLinks()` | semantic, supporting, contradicts | Indexing (new docs only) | LLM-assessed confidence + reasoning. Requires embeddings for neighbor discovery. |
+ | A-MEM `inferCausalLinks()` | causal | Post-response (IO3 decision-extractor) | Links between `_clawmem/agent/observations/` docs, not arbitrary workspace docs. |
+ | Beads `syncBeadsIssues()` | causal, supporting, semantic | `beads_sync` MCP tool or watcher (.beads/ change) | Queries `bd` CLI (Dolt backend). Maps beads deps: blocks→causal, discovered-from→supporting, relates-to→semantic, plus conditional-blocks→causal, caused-by→causal, supersedes→supporting. Metadata: `{origin: "beads"}`. |
+ | `buildTemporalBackbone()` | temporal | `build_graphs` MCP tool (manual) | Creation-order edges between all active docs. |
+ | `buildSemanticGraph()` | semantic | `build_graphs` MCP tool (manual) | Pure cosine similarity. PK collision: `INSERT OR IGNORE` means A-MEM semantic edges take precedence if they exist first. |
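
Read as code, the beads row's dependency mapping is a plain lookup. A sketch (the `beadsEdgeType` helper is illustrative, not the actual `syncBeadsIssues()` internals):

```typescript
// Beads dependency type -> memory_relations edge type, per the table above.
type EdgeType = "causal" | "supporting" | "semantic";

const BEADS_EDGE_MAP: Record<string, EdgeType> = {
  "blocks": "causal",
  "conditional-blocks": "causal",
  "caused-by": "causal",
  "discovered-from": "supporting",
  "supersedes": "supporting",
  "relates-to": "semantic",
};

// Returns undefined for dependency types with no mapped edge.
function beadsEdgeType(depType: string): EdgeType | undefined {
  return BEADS_EDGE_MAP[depType];
}
```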
+
+ **Edge collision:** Both `generateMemoryLinks()` and `buildSemanticGraph()` insert `relation_type='semantic'`. PK is `(source_id, target_id, relation_type)` — first writer wins.
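
The first-writer-wins behavior follows directly from `INSERT OR IGNORE` on that composite key. A minimal in-memory sketch of the semantics (not the actual SQLite code):

```typescript
// Mimic INSERT OR IGNORE on PK (source_id, target_id, relation_type):
// a second insert with the same key is silently dropped, so the first writer wins.
interface Edge { sourceId: string; targetId: string; relationType: string; origin: string }

const edges = new Map<string, Edge>();

function insertOrIgnore(e: Edge): boolean {
  const pk = `${e.sourceId}|${e.targetId}|${e.relationType}`;
  if (edges.has(pk)) return false; // ignored; existing row untouched
  edges.set(pk, e);
  return true;
}
```

If `generateMemoryLinks()` writes a `semantic` edge for a pair first, a later `buildSemanticGraph()` insert for the same pair is a no-op; a `causal` edge for the same pair is a distinct key and still lands.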
+
+ **Graph traversal asymmetry:** `adaptiveTraversal()` traverses all edge types outbound (source→target) but only `semantic` and `entity` edges inbound (target→source). Temporal and causal edges are directional only.
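
The asymmetry amounts to a direction filter during neighbor expansion. A sketch (illustrative; the real `adaptiveTraversal()` beam search does more than this):

```typescript
// One expansion step honoring the asymmetry above: follow every edge type
// outbound (source -> target), but only semantic/entity edges inbound.
interface Relation { source: string; target: string; type: string }

const INBOUND_TYPES = new Set(["semantic", "entity"]);

function neighbors(node: string, relations: Relation[]): string[] {
  const out = new Set<string>();
  for (const r of relations) {
    if (r.source === node) {
      out.add(r.target);                 // outbound: all edge types
    } else if (r.target === node && INBOUND_TYPES.has(r.type)) {
      out.add(r.source);                 // inbound: semantic/entity only
    }
  }
  return [...out];
}
```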
+
+ ### When to Run `build_graphs`
+
+ - After **bulk ingestion** (many new docs at once) — adds temporal backbone and fills semantic gaps where A-MEM links are sparse.
+ - When `intent_search` for WHY/ENTITY returns **weak or obviously incomplete results** and you suspect graph sparsity.
+ - Do NOT run after every reindex. Routine indexing creates A-MEM links automatically for new docs.
+
+ ### When to Run `index_stats`
+
+ - After bulk ingestion to verify doc counts and embedding coverage.
+ - When retrieval quality seems degraded — check for unembedded docs or content type distribution issues.
+ - Do NOT run routinely.
+
+ ## Pipeline Details
+
+ ### `query` (default Tier 3 workhorse)
+
+ ```
+ User Query + optional intent hint
+ → BM25 Probe → Strong Signal Check (skip expansion if top hit ≥ 0.85 with gap ≥ 0.15; disabled when intent provided)
+ → Query Expansion (LLM generates text variants; intent steers expansion prompt)
+ → Parallel: BM25(original) + Vector(original) + BM25(each expanded) + Vector(each expanded)
+ → Original query lists get positional 2× weight in RRF; expanded get 1×
+ → Reciprocal Rank Fusion (k=60, top candidateLimit)
+ → Intent-Aware Chunk Selection (intent terms at 0.5× weight alongside query terms at 1.0×)
+ → Cross-Encoder Reranking (4000 char context; intent prepended to rerank query; chunk dedup; batch cap=4)
+ → Position-Aware Blending (α=0.75 top3, 0.60 mid, 0.40 tail)
+ → SAME Composite Scoring
+ → MMR Diversity Filter (Jaccard bigram similarity > 0.6 → demoted, not removed)
+ ```
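
The RRF stage with positional weighting can be sketched as follows (k=60 and the 2x/1x weights come from the pipeline above; the function itself is illustrative):

```typescript
// Weighted Reciprocal Rank Fusion: score(d) = sum over lists of weight / (k + rank).
// Original-query result lists carry weight 2, expanded-query lists weight 1.
type RankedList = { docs: string[]; weight: number };

function rrfFuse(lists: RankedList[], k = 60): Array<[string, number]> {
  const scores = new Map<string, number>();
  for (const { docs, weight } of lists) {
    docs.forEach((doc, rank) => {
      scores.set(doc, (scores.get(doc) ?? 0) + weight / (k + rank + 1));
    });
  }
  return [...scores.entries()].sort((a, b) => b[1] - a[1]);
}
```

Note the effect of fusion: a doc ranked second in the original list but also present in an expanded list can outscore the doc that tops the original list alone.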
+
+ ### `intent_search` (specialist for causal chains)
+
+ ```
+ User Query → Intent Classification (WHY/WHEN/ENTITY/WHAT)
+ → BM25 + Vector (intent-weighted RRF: boost BM25 for WHEN, vector for WHY)
+ → Graph Traversal (WHY/ENTITY only; multi-hop beam search over memory_relations)
+ Outbound: all edge types (semantic, supporting, contradicts, causal, temporal)
+ Inbound: semantic and entity only
+ Scores normalized to [0,1] before merge with search results
+ → Cross-Encoder Reranking (200 char context per doc; file-keyed score join)
+ → SAME Composite Scoring (uses stored confidence from contradiction detection + feedback)
+ ```
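
The normalization step before the merge can be sketched as min-max scaling. Note this is an assumption: the pipeline only states that traversal scores land in [0,1] before merging, not how they get there.

```typescript
// Scale graph-traversal scores into [0,1] so they are comparable with search scores.
// Min-max scaling is an assumed implementation; any map into [0,1] satisfies the pipeline.
function normalizeScores(scores: number[]): number[] {
  if (scores.length === 0) return [];
  const min = Math.min(...scores);
  const max = Math.max(...scores);
  if (max === min) return scores.map(() => 1); // degenerate case: all scores equal
  return scores.map(s => (s - min) / (max - min));
}
```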
+
+ ### Key Differences
+
+ | Aspect | `query` | `intent_search` |
+ |--------|---------|-----------------|
+ | Query expansion | Yes (skipped on strong BM25 signal) | No |
+ | Intent hint | Yes (`intent` param steers 5 stages) | Auto-detected (WHY/WHEN/ENTITY/WHAT) |
+ | Rerank context | 4000 chars/doc (intent-aware chunk selection) | 200 chars/doc |
+ | Chunk dedup | Yes (identical texts share single rerank call) | No |
+ | Graph traversal | No | Yes (WHY/ENTITY, multi-hop) |
+ | MMR diversity | Yes (`diverse=true` default) | No |
+ | `compact` param | Yes | No |
+ | `collection` filter | Yes | No |
+ | `candidateLimit` | Yes (default 30) | No |
+ | Best for | Most queries, progressive disclosure | Causal chains spanning multiple docs |
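
The MMR demotion check in `query` compares result texts by Jaccard similarity over bigrams. A sketch assuming word bigrams (the actual tokenization in `mmr.ts` may differ):

```typescript
// Jaccard similarity over word-bigram sets; pairs above 0.6 are demoted, not removed.
function wordBigrams(text: string): Set<string> {
  const words = text.toLowerCase().split(/\s+/).filter(Boolean);
  const grams = new Set<string>();
  for (let i = 0; i < words.length - 1; i++) grams.add(`${words[i]} ${words[i + 1]}`);
  return grams;
}

function jaccard(a: string, b: string): number {
  const A = wordBigrams(a), B = wordBigrams(b);
  if (A.size === 0 && B.size === 0) return 0;
  let inter = 0;
  for (const g of A) if (B.has(g)) inter++;
  return inter / (A.size + B.size - inter); // |A ∩ B| / |A ∪ B|
}

const DEMOTE_THRESHOLD = 0.6;
const shouldDemote = (a: string, b: string) => jaccard(a, b) > DEMOTE_THRESHOLD;
```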
+
+ ## Operational Issue Tracking
+
+ When encountering tool failures, instruction contradictions, retrieval gaps, or workflow friction that would benefit from a fix:
+
+ Write to `docs/issues/YYYY-MM-DD-<slug>.md` with: category, severity, what happened, what was expected, context, suggested fix.
+
+ **File structure:**
+ ```
+ # <title>
+ - Category: tool-failure | instruction-gap | workflow-friction | retrieval-gap | inconsistency
+ - Severity: critical | high | medium
+ - Status: open | resolved
+
+ ## Observed
+ ## Expected
+ ## Context
+ ## Suggested Fix
+ ```
+
+ **Triggers:** repeated tool error, instruction that contradicts observed behavior, retrieval consistently missing known content, workflow requiring unnecessary steps.
+
+ **Do NOT log:** one-off transient errors, user-caused issues, issues already recorded.
+
+ ## Troubleshooting
+
+ ```
+ Symptom: "Local model download blocked" error
+ → llama-server endpoint unreachable while CLAWMEM_NO_LOCAL_MODELS=true.
+ → Fix: Start the llama-server instance. Or set CLAWMEM_NO_LOCAL_MODELS=false for in-process fallback.
+
+ Symptom: Query expansion always fails / returns garbage
+ → On CPU-only systems, in-process inference is significantly slower and less reliable. Systems with GPU acceleration (Metal/Vulkan) handle these models well in-process.
+ → Fix: Run llama-server on a GPU. Even a low-end NVIDIA card handles 1.7B models.
+
+ Symptom: Vector search returns no results but BM25 works
+ → Missing embeddings. Watcher indexes but does NOT embed.
+ → Fix: Run `clawmem embed` or wait for the daily embed timer.
+
+ Symptom: llama-server crashes with "non-causal attention requires n_ubatch >= n_tokens"
+ → Embedding/reranking models use non-causal attention. When -b (batch) > -ub (ubatch), the assertion fails.
+ → Fix: Set -ub equal to -b (e.g. -b 2048 -ub 2048). Never omit -ub for embedding/reranking servers.
+
+ Symptom: context-surfacing hook returns empty
+ → Prompt too short (<20 chars), starts with `/`, or no docs score above threshold.
+ → Fix: Check `clawmem status` for doc counts. Check `clawmem embed` for embedding coverage.
+
+ Symptom: intent_search returns weak results for WHY/ENTITY
+ → Graph may be sparse (few A-MEM edges).
+ → Fix: Run `build_graphs` to add temporal backbone + semantic edges.
+
+ Symptom: Watcher logs events but collections show 0 docs after update/reindex
+ → Bun.Glob does not support brace expansion {a,b,c}. Collection patterns returned 0 files.
+ → Fixed 2026-02-12: indexer.ts splits brace patterns into individual Glob scans.
+
+ Symptom: Watcher fires events but wrong collection processes them (e.g., workspace instead of dharma-propagation)
+ → Collection prefix matching via Array.find() returns first match. Parent paths match before children.
+ → Fixed 2026-02-12: cmdWatch() sorts collections by path length descending (most specific first).
+
+ Symptom: reindex --force crashes with "UNIQUE constraint failed: documents.collection, documents.path"
+ → Force deactivates rows (active=0) but UNIQUE(collection, path) doesn't discriminate by active flag.
+ → Fixed 2026-02-12: indexer.ts checks for inactive rows and reactivates instead of inserting.
+
+ Symptom: embed crashes with "UNIQUE constraint failed on vectors_vec primary key" on restart
+ → vectors_vec is a vec0 virtual table — INSERT OR REPLACE is not supported by vec0.
+ → Fixed 2026-03-15: insertEmbedding() uses DELETE (try-catch) + INSERT instead of INSERT OR REPLACE.
+ → Embed can now resume after interrupted runs without --force.
+
+ Symptom: embed crashes with alternating "no such table: vectors_vec" / "table vectors_vec already exists"
+ → Dimension migration race: --force drops vectors_vec, ensureVecTable per-fragment drops+recreates on dimension
+ mismatch, causing rapid table existence flickering between fragments.
+ → Fixed 2026-03-15: ensureVecTable caches verified dimensions (vecTableDims), uses CREATE VIRTUAL TABLE IF NOT EXISTS,
+ and clearAllEmbeddings resets the cache. First fragment creates, rest skip the check.
+
+ Symptom: embed --force with a new model leaves 3 docs stuck as "Unembedded" while also reporting "All documents already embedded"
+ → First fragment (seq=0) failed during a crashed embed run. Later fragments succeeded.
+ getHashesNeedingFragments thinks the doc is done but status checks seq=0 specifically.
+ → Fix: Delete partial content_vectors + vectors_vec for the stuck hashes, then re-run embed (no --force).
+ The vec0 DELETE try-catch prevents cascading failures during the re-embed.
+
+ Symptom: CLI reindex/update falls back to node-llama-cpp Vulkan (not GPU server)
+ → GPU env vars only in systemd drop-in, not in wrapper script. CLI invocations missed them.
+ → Fixed 2026-02-12: bin/clawmem wrapper exports CLAWMEM_EMBED_URL/LLM_URL/RERANK_URL defaults.
+ ```
+
+ ## CLI Reference
+
+ Run `clawmem --help` for full command listing. Use this before guessing at commands or parameters.
+
+ **IO6 surface commands** (for daemon/`--print` mode integration):
+ ```bash
+ # IO6a: per-prompt context injection (pipe prompt on stdin)
+ echo "user query" | clawmem surface --context --stdin
+
+ # IO6b: per-session bootstrap injection (pipe session ID on stdin)
+ echo "session-id" | clawmem surface --bootstrap --stdin
+ ```
+
+ **Analysis commands:**
+ ```bash
+ clawmem reflect [N] # Cross-session reflection (last N days, default 14)
+ # Shows recurring themes, antipatterns, co-activation clusters
+ clawmem consolidate [--dry-run] # Find and archive duplicate low-confidence documents
+ # Uses Jaccard similarity within same collection
+ ```
+
+ ## Integration Notes
+
+ - QMD retrieval (BM25, vector, RRF, rerank, query expansion) is forked into ClawMem. Do not call standalone QMD tools.
+ - SAME (composite scoring), MAGMA (intent + graph), A-MEM (self-evolving notes) layer on top of QMD substrate.
+ - Three `llama-server` instances (embedding, LLM, reranker) on local or remote GPU. Wrapper defaults to `localhost:8088/8089/8090`.
+ - `CLAWMEM_NO_LOCAL_MODELS=false` (default) allows in-process LLM/reranker fallback via `node-llama-cpp`. Set `true` for remote-only setups to fail fast on unreachable endpoints.
+ - Consolidation worker (`CLAWMEM_ENABLE_CONSOLIDATION=true`) backfills unenriched docs with A-MEM notes + links. Only runs if the MCP process stays alive long enough to tick (every 5min).
+ - Beads integration: `syncBeadsIssues()` queries `bd` CLI (Dolt backend, v0.58.0+) for live issue data, creates markdown docs in `beads` collection, maps all dependency edge types into `memory_relations`, and triggers A-MEM enrichment for new docs. Watcher auto-triggers on `.beads/` directory changes; `beads_sync` MCP tool for manual sync. Requires `bd` binary on PATH or at `~/go/bin/bd`.
+ - HTTP REST API: `clawmem serve [--port 7438]` — optional REST server on localhost. Search, retrieval, lifecycle, and graph traversal. `POST /retrieve` mirrors `memory_retrieve` with auto-routing (keyword/semantic/causal/timeline/hybrid). `POST /search` provides direct mode selection. Bearer token auth via `CLAWMEM_API_TOKEN` env var (disabled if unset).
+ - OpenClaw ContextEngine plugin: `clawmem setup openclaw` — registers ClawMem as a native OpenClaw context engine. Uses `before_prompt_build` for retrieval (prompt-aware), `afterTurn()` for extraction, `compact()` for pre-compaction. Shares same vault as Claude Code hooks (dual-mode). SQLite busy_timeout=5000ms for concurrent access safety.