memory-crystal 0.2.0

Files changed (104)
  1. package/.env.example +20 -0
  2. package/CHANGELOG.md +6 -0
  3. package/LETTERS.md +22 -0
  4. package/LICENSE +21 -0
  5. package/README-ENTERPRISE.md +162 -0
  6. package/README-old.md +275 -0
  7. package/README.md +91 -0
  8. package/RELAY.md +88 -0
  9. package/TECHNICAL.md +379 -0
  10. package/ai/dev-updates/2026-02-25--cc-air--phase2-architecture-pivot.md +70 -0
  11. package/ai/dev-updates/2026-02-25--cc-air--phase2-worker-build.md +72 -0
  12. package/ai/dev-updates/2026-02-26--10-25-16--cc-mini--phase2-implementation.md +49 -0
  13. package/ai/dev-updates/2026-02-27--20-30-00--cc-mini--readme-overhaul-and-public-deploy.md +69 -0
  14. package/ai/notes/2026-02-26--cc-air--notes.md +412 -0
  15. package/ai/notes/2026-02-27--cc-mini--grok-feedback.md +44 -0
  16. package/ai/notes/2026-02-27--cc-mini--lesa-feedback.md +45 -0
  17. package/ai/notes/RESEARCH.md +1185 -0
  18. package/ai/notes/salience-research/README.md +29 -0
  19. package/ai/notes/salience-research/eurosla-salience-review.md +64 -0
  20. package/ai/notes/salience-research/full-research-summary.md +269 -0
  21. package/ai/notes/salience-research/salience-levels-diagram.png +0 -0
  22. package/ai/plan/2026-02-27--cc-mini--qr-pairing-spec.md +203 -0
  23. package/ai/plan/_archive/PLAN.md +194 -0
  24. package/ai/plan/_archive/PRD.md +1014 -0
  25. package/ai/plan/cc-plans-duplicates-from-dot-claude/2026-02-26--cc-mini--phase2-implementation-plan.md +245 -0
  26. package/ai/plan/dev-conventions-note.md +70 -0
  27. package/ai/plan/ldm-os-install-and-boot-architecture.md +285 -0
  28. package/ai/plan/memory-crystal-phase2-plan.md +192 -0
  29. package/ai/plan/memory-system-lay-of-the-land.md +214 -0
  30. package/ai/plan/phase2-ephemeral-relay.md +238 -0
  31. package/ai/plan/readme-first.md +68 -0
  32. package/ai/plan/roadmap.md +159 -0
  33. package/ai/todos/PUNCHLIST.md +44 -0
  34. package/ai/todos/README.md +31 -0
  35. package/ai/todos/inboxes/cc-air/2026-02-26--cc-air--post-relay-todos.md +85 -0
  36. package/ai/todos/inboxes/cc-mini/2026-02-26--cc-mini--phase2-status.md +100 -0
  37. package/ai/todos/inboxes/cc-mini/_archive/TODO.md +25 -0
  38. package/ai/todos/inboxes/parker/2026-02-25--cc-air--setup-checklist.md +139 -0
  39. package/ai/todos/inboxes/parker/2026-02-26--cc-mini--phase2-your-moves.md +72 -0
  40. package/dist/cc-hook.d.ts +1 -0
  41. package/dist/cc-hook.js +349 -0
  42. package/dist/chunk-3VFIJYS4.js +818 -0
  43. package/dist/chunk-52QE3YI3.js +1169 -0
  44. package/dist/chunk-AA3OPP4Z.js +432 -0
  45. package/dist/chunk-D3I3ZSE2.js +411 -0
  46. package/dist/chunk-EKSACBTJ.js +1070 -0
  47. package/dist/chunk-F3Y7EL7K.js +83 -0
  48. package/dist/chunk-JWZXYVET.js +1068 -0
  49. package/dist/chunk-KYVWO6ZM.js +1069 -0
  50. package/dist/chunk-L3VHARQH.js +413 -0
  51. package/dist/chunk-LOVAHSQV.js +411 -0
  52. package/dist/chunk-LQOYCAGG.js +446 -0
  53. package/dist/chunk-MK42FMEG.js +147 -0
  54. package/dist/chunk-NIJCVN3O.js +147 -0
  55. package/dist/chunk-O2UITJGH.js +465 -0
  56. package/dist/chunk-PEK6JH65.js +432 -0
  57. package/dist/chunk-PJ6FFKEX.js +77 -0
  58. package/dist/chunk-PLUBBZYR.js +800 -0
  59. package/dist/chunk-SGL6ISBJ.js +1061 -0
  60. package/dist/chunk-UNHVZB5G.js +411 -0
  61. package/dist/chunk-VAFTWSTE.js +1061 -0
  62. package/dist/chunk-XZ3S56RQ.js +1061 -0
  63. package/dist/chunk-Y72C7F6O.js +148 -0
  64. package/dist/cli.d.ts +1 -0
  65. package/dist/cli.js +325 -0
  66. package/dist/core.d.ts +188 -0
  67. package/dist/core.js +12 -0
  68. package/dist/crypto.d.ts +16 -0
  69. package/dist/crypto.js +18 -0
  70. package/dist/dev-update-SZ2Z4WCQ.js +6 -0
  71. package/dist/ldm.d.ts +17 -0
  72. package/dist/ldm.js +12 -0
  73. package/dist/mcp-server.d.ts +1 -0
  74. package/dist/mcp-server.js +250 -0
  75. package/dist/migrate.d.ts +1 -0
  76. package/dist/migrate.js +89 -0
  77. package/dist/mirror-sync.d.ts +1 -0
  78. package/dist/mirror-sync.js +130 -0
  79. package/dist/openclaw.d.ts +5 -0
  80. package/dist/openclaw.js +349 -0
  81. package/dist/poller.d.ts +1 -0
  82. package/dist/poller.js +272 -0
  83. package/dist/summarize.d.ts +19 -0
  84. package/dist/summarize.js +10 -0
  85. package/dist/worker.js +137 -0
  86. package/openclaw.plugin.json +11 -0
  87. package/package.json +40 -0
  88. package/scripts/migrate-lance-to-sqlite.mjs +217 -0
  89. package/skills/memory/SKILL.md +61 -0
  90. package/src/cc-hook.ts +447 -0
  91. package/src/cli.ts +356 -0
  92. package/src/core.ts +1472 -0
  93. package/src/crypto.ts +113 -0
  94. package/src/dev-update.ts +178 -0
  95. package/src/ldm.ts +117 -0
  96. package/src/mcp-server.ts +274 -0
  97. package/src/migrate.ts +104 -0
  98. package/src/mirror-sync.ts +175 -0
  99. package/src/openclaw.ts +250 -0
  100. package/src/poller.ts +345 -0
  101. package/src/summarize.ts +210 -0
  102. package/src/worker.ts +208 -0
  103. package/tsconfig.json +18 -0
  104. package/wrangler.toml +20 -0
@@ -0,0 +1,1014 @@
# Memory Crystal — Product Requirements Document

**Sovereign memory infrastructure for OpenClaw agents.**

Your memory. Your machine. Your rules.

---

## 1. Vision

Memory Crystal is a local-first, self-hosted memory system that gives OpenClaw agents (starting with Lesa) total recall — across conversations, documents, code, web pages, messages, and every source of knowledge that matters. No cloud dependency. No $19/month. No data leaving your machine.

This is the memory layer that Supermemory charges for, but built as sovereign infrastructure: an OpenClaw plugin backed by LanceDB and SQLite, running on your Mac mini, with every byte under your control.

### Design Philosophy

From WIP.computer's founding principles:

> "We do not dictate outcomes. We design conditions."

Memory Crystal creates the conditions for intelligence to emerge:

- **Foundation:** A unified memory store that captures everything
- **Constraint:** Privacy boundaries, data sovereignty, local-first
- **Intelligence:** Semantic search, knowledge graphs, memory evolution
- **Emergence:** An agent that never forgets, that connects dots across months of context

---

29
## 2. Problem Statement

### What exists today

Parker's OpenClaw setup has memory spread across five disconnected systems:

| System | What it stores | Limitation |
|--------|----------------|------------|
| `MEMORY.md` + workspace files | Curated long-term memory, daily logs | Manual, no search beyond grep |
| `context-embeddings` plugin | Conversation turns (2,673 chunks) | Brute-force cosine sim, no indexing, token-based chunking |
| `main.sqlite` (built-in) | Session transcripts + document embeddings | OpenClaw's built-in search, limited control |
| `lesa-bridge` MCP server | Exposes memory to Claude Code | Read-only bridge, no unified interface |
| Workspace `.md` files | Notes, research, observations | Flat files, no semantic understanding |

**Problems:**

1. **No unified search** — Each system has its own query interface. No single query can search across conversations, documents, and notes.
2. **No knowledge graph** — Facts are stored as flat text. "Parker works at WIP.computer" and "WIP.computer is building a music player" aren't connected.
3. **No memory evolution** — Old facts never get updated or deprecated. If Parker changes his mind, both old and new opinions coexist with equal weight.
4. **No ingestion pipeline** — Adding a new document or URL to memory is manual. No connectors for iMessage history, browser bookmarks, email, or local files.
5. **No vector indexing** — The context-embeddings plugin does brute-force cosine similarity over all chunks. This works at 2,673 chunks but won't scale to 100K+.
6. **Chunking is naive** — Fixed ~400-token chunks with overlap. No semantic boundaries, no code-aware chunking, no contextual enrichment.

### What Supermemory offers (and what we're replacing)

Supermemory ($19/mo, closed-source backend, Cloudflare-locked):

- Knowledge graph with auto-evolving memories
- Hybrid search (vector + keyword + reranking)
- Connectors: Gmail, Google Drive, Notion, OneDrive, S3, web crawler
- MCP server, browser extension, SDKs
- Sub-300ms search, scales to 50M tokens per user
- Memory operations: ADD, UPDATE, DELETE, NOOP (mem0-style)

**Their weakness:** Your data lives on their servers. Self-hosting requires an enterprise plan plus a Cloudflare account. The backend engine is closed-source.

---

65
## 3. Architecture

### Core Principle: LanceDB + SQLite, Local-First

No external databases. No Docker. No Postgres. Two embedded stores:

- **LanceDB** — Vector search + BM25 hybrid search (Apache Arrow format, disk-efficient, scales to 1M+)
- **SQLite** — Knowledge graph, metadata, memory records, connector state

Embedding calls default to **Ollama** (local, free, `nomic-embed-text-v1.5`) with OpenAI as a fallback for users without Ollama. See [RESEARCH.md](./RESEARCH.md) for the full comparison that led to this decision.

```
~/.openclaw/memory-crystal/
├── lance/                      ← LanceDB data directory
│   └── memories.lance/         ← Vector index + BM25 index (chunks, memories, entities)
├── crystal.db                  ← SQLite: knowledge graph, metadata, connector state
└── backups/                    ← Automatic daily backups

~/Documents/Projects/OpenClaw/memory-crystal/
├── src/
│   ├── index.ts                ← Plugin entry point (registers tools, services, CLI)
│   ├── db/
│   │   ├── lance.ts            ← LanceDB connection, table creation, hybrid search
│   │   ├── sqlite.ts           ← SQLite schema, migrations, graph queries
│   │   └── migrate.ts          ← Import from context-embeddings.sqlite
│   ├── embed.ts                ← Embedding provider (Ollama primary, OpenAI fallback)
│   ├── ingest/
│   │   ├── pipeline.ts         ← Universal ingestion pipeline
│   │   ├── chunker.ts          ← Smart chunking (semantic + code-aware)
│   │   ├── extractor.ts        ← Content extraction (URLs, PDFs, etc.)
│   │   └── enricher.ts         ← Contextual enrichment (Anthropic-style)
│   ├── memory/
│   │   ├── operations.ts       ← ADD / UPDATE / DELETE / NOOP logic
│   │   ├── graph.ts            ← Knowledge graph (entities + relationships)
│   │   ├── evolution.ts        ← Memory decay, dedup, consolidation
│   │   └── extract.ts          ← LLM-based fact extraction from text
│   ├── search/
│   │   ├── hybrid.ts           ← Vector + BM25 hybrid search
│   │   ├── rerank.ts           ← Cross-encoder reranking
│   │   └── query.ts            ← Query rewriting / HyDE
│   ├── connectors/
│   │   ├── conversations.ts    ← OpenClaw conversation capture (agent_end hook)
│   │   ├── imessage.ts         ← macOS iMessage history (chat.db)
│   │   ├── files.ts            ← Local file watcher (.md, .pdf, code)
│   │   ├── browser.ts          ← Chrome/Firefox/Safari history + bookmarks
│   │   ├── clipboard.ts        ← Clipboard history capture
│   │   ├── apple-notes.ts      ← Apple Notes via AppleScript bridge
│   │   └── web.ts              ← URL fetch + extract (via Tavily or direct)
│   ├── mcp/
│   │   └── server.ts           ← MCP server (replaces lesa-bridge)
│   └── cli/
│       └── commands.ts         ← CLI: status, search, ingest, connectors
├── openclaw.plugin.json
├── package.json
├── tsconfig.json
├── PRD.md                      ← This file
└── README.md
```

### Data Flow

```
Sources                      Ingestion           Storage                     Retrieval
───────                      ─────────           ───────                     ─────────
Conversations (agent_end) ─┐
iMessage (chat.db)        ─┤   ┌────────────┐    ┌─────────────────────┐    ┌───────────┐
Local files (.md, .pdf)   ─┤   │  Pipeline  │    │ LanceDB             │    │  Hybrid   │
Browser history           ─┼──▶│   Extract  │───▶│  chunks, memories,  │───▶│  Search   │
Apple Notes               ─┤   │   Chunk    │    │  entity_embeddings  │    │   Vector  │
Clipboard                 ─┤   │   Enrich   │    │ crystal.db          │    │   BM25    │
URLs (manual/Tavily)      ─┘   │   Embed    │    │  entities,          │    │   Rerank  │
                               │   Extract  │    │  relationships,     │    └─────┬─────┘
                               │   facts    │    │  sources,           │          │
                               └────────────┘    │  connectors         │    ┌─────▼─────┐
                                                 └─────────────────────┘    │   Agent   │
                                                                            │   Tools   │
                                                                            │   MCP     │
                                                                            │   CLI     │
                                                                            └───────────┘
```

---

151
## 4. Database Schema

Two stores: **LanceDB** for vector-indexed content, **SQLite** for graph + metadata.

### LanceDB Tables (vector search)

LanceDB stores embeddings as Apache Arrow columnar files with built-in IVF-PQ indexing and BM25 full-text search. All three content types are indexed:

**`chunks` table** — Raw content chunks

```
id: string           (nanoid)
source_id: string    (FK to SQLite sources)
text: string         (chunk text, BM25-indexed)
embedding: vector[768]  (nomic-embed-text-v1.5, or 1536 for OpenAI)
role: string         ('user', 'assistant', 'document', 'note')
metadata: string     (JSON: { turnIndex, lineRange, language, ... })
token_count: int32
created_at: int64    (unix timestamp)
updated_at: int64
```

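As a sketch, rows shaped for the `chunks` schema above could be built like this before being handed to LanceDB (for example via `db.createTable("chunks", rows)` in `@lancedb/lancedb`). `makeChunk` and its rough token estimate are illustrative, not the plugin's actual API:

```typescript
import { randomUUID } from "node:crypto";

interface ChunkRow {
  id: string;
  source_id: string;
  text: string;
  embedding: number[];   // vector[768]
  role: "user" | "assistant" | "document" | "note";
  metadata: string;      // JSON-encoded
  token_count: number;
  created_at: number;    // unix timestamp (ms)
  updated_at: number;
}

function makeChunk(
  sourceId: string,
  text: string,
  embedding: number[],
  role: ChunkRow["role"],
  metadata: Record<string, unknown> = {},
): ChunkRow {
  const now = Date.now();
  return {
    id: randomUUID(),    // schema says nanoid; UUID stands in here
    source_id: sourceId,
    text,
    embedding,
    role,
    metadata: JSON.stringify(metadata),
    // Rough estimate (~4 chars/token); real code would use a tokenizer.
    token_count: Math.ceil(text.length / 4),
    created_at: now,
    updated_at: now,
  };
}
```
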
172
**`memories` table** — Extracted facts (mem0-style)

```
id: string           (nanoid)
text: string         (BM25-indexed)
embedding: vector[768]
category: string     ('fact', 'preference', 'event', 'opinion', 'skill')
confidence: float64  (decays over time, boosted on re-confirmation)
source_ids: string   (JSON array of source chunk IDs)
supersedes: string   (ID of memory this one replaced)
status: string       ('active', 'deprecated', 'deleted')
created_at: int64
updated_at: int64
last_accessed: int64
```

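The `confidence` lifecycle (decay over time, boost on re-confirmation) can be sketched as below. The 90-day half-life and the halfway boost factor are illustrative assumptions, not values from this spec:

```typescript
// Assumed tuning constant: confidence halves every 90 days untouched.
const HALF_LIFE_DAYS = 90;

function decayedConfidence(confidence: number, daysSinceUpdate: number): number {
  return confidence * Math.pow(0.5, daysSinceUpdate / HALF_LIFE_DAYS);
}

function reconfirm(confidence: number, boost = 0.5): number {
  // Move part of the way toward full confidence, never exceeding 1.0.
  return Math.min(1, confidence + (1 - confidence) * boost);
}
```
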
187
**`entity_embeddings` table** — Entity vectors for semantic entity search

```
id: string           (nanoid, FK to SQLite entities)
name: string         (BM25-indexed)
description: string  (BM25-indexed)
embedding: vector[768]
```

195
### SQLite Tables (graph + metadata): `crystal.db`

### `entities` — Knowledge graph nodes

```sql
CREATE TABLE entities (
  id          TEXT PRIMARY KEY,
  name        TEXT NOT NULL,        -- "Parker Todd Brooks"
  type        TEXT,                 -- 'person', 'project', 'concept', 'tool', 'place'
  description TEXT,                 -- summary
  properties  TEXT,                 -- JSON: arbitrary key-value
  created_at  INTEGER NOT NULL,
  updated_at  INTEGER NOT NULL
);
```

### `relationships` — Knowledge graph edges (bi-temporal, inspired by Graphiti)

```sql
CREATE TABLE relationships (
  id            TEXT PRIMARY KEY,
  source_id     TEXT NOT NULL,      -- FK to entities
  target_id     TEXT NOT NULL,      -- FK to entities
  type          TEXT NOT NULL,      -- 'founded', 'works_on', 'uses', 'knows', 'prefers'
  description   TEXT,               -- natural language description
  weight        REAL DEFAULT 1.0,   -- strength/confidence
  temporal      TEXT,               -- 'current', 'past', 'planned'
  event_time    INTEGER,            -- when the fact actually occurred (T)
  evidence      TEXT,               -- JSON array of source chunk IDs
  valid_from    INTEGER NOT NULL,   -- ingestion time (T') — when we learned this
  valid_until   INTEGER,            -- set when superseded (NULL = still valid)
  superseded_by TEXT,               -- FK to new relationship that replaced this
  created_at    INTEGER NOT NULL,
  updated_at    INTEGER NOT NULL
);
```

### `sources` — Provenance tracking

```sql
CREATE TABLE sources (
  id          TEXT PRIMARY KEY,
  type        TEXT NOT NULL,        -- 'conversation', 'file', 'imessage', 'url', 'clipboard', 'apple_note', 'browser'
  uri         TEXT,                 -- file path, URL, session ID, etc.
  title       TEXT,
  connector   TEXT,                 -- which connector produced this
  metadata    TEXT,                 -- JSON: connector-specific state
  ingested_at INTEGER NOT NULL,
  chunk_count INTEGER DEFAULT 0
);
```

### `connectors` — Sync state for each connector

```sql
CREATE TABLE connectors (
  id        TEXT PRIMARY KEY,       -- 'imessage', 'browser-chrome', 'files', etc.
  enabled   INTEGER DEFAULT 1,
  last_sync INTEGER,                -- unix timestamp
  cursor    TEXT,                   -- connector-specific sync cursor (message ID, file mtime, etc.)
  config    TEXT,                   -- JSON: connector-specific config
  stats     TEXT                    -- JSON: { totalIngested, lastRunDuration, errors }
);
```

### Indexes

```sql
CREATE INDEX idx_entities_name ON entities(name);
CREATE INDEX idx_entities_type ON entities(type);
CREATE INDEX idx_relationships_source ON relationships(source_id);
CREATE INDEX idx_relationships_target ON relationships(target_id);
CREATE INDEX idx_relationships_valid ON relationships(valid_from, valid_until);
CREATE INDEX idx_sources_type ON sources(type);
CREATE INDEX idx_connectors_last_sync ON connectors(last_sync);
```

**Note:** Vector indexes (IVF-PQ) and BM25 full-text indexes are managed by LanceDB, not SQLite. This eliminates the need for sqlite-vec, FTS5, and manual RRF fusion — LanceDB's hybrid search handles it natively.

---

275
+
276
+ ## 5. Ingestion Pipeline
277
+
278
+ ### 5.1 Content Extraction
279
+
280
+ Every source goes through extraction first:
281
+
282
+ | Source | Extraction Method |
283
+ |--------|------------------|
284
+ | Conversations | Extract text from message objects (skip tool results optionally) |
285
+ | Files (.md) | Read directly |
286
+ | Files (.pdf) | `pdf-parse` or `pdfjs-dist` |
287
+ | Files (.ts/.js/.py/etc.) | Read directly, tag with language |
288
+ | URLs | Tavily extract API or `@mozilla/readability` + `jsdom` |
289
+ | iMessage | Read `~/Library/Messages/chat.db` SQLite directly |
290
+ | Browser history | Read Chrome/Firefox/Safari SQLite databases |
291
+ | Apple Notes | AppleScript bridge to read note contents |
292
+ | Clipboard | macOS `pbpaste` or clipboard monitoring daemon |
293
+
294
+ ### 5.2 Smart Chunking
295
+
296
+ **Not fixed-size.** Three chunking strategies depending on content type:
297
+
298
+ **Semantic chunking (prose, conversations):**
299
+ - Split at paragraph boundaries
300
+ - Use embedding similarity between adjacent paragraphs
301
+ - If similarity drops below threshold → chunk boundary
302
+ - Target: 200-600 tokens per chunk
303
+ - Preserves natural semantic units
304
+
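The similarity-drop rule above can be sketched as a pure function. In the real pipeline the embeddings would come from `embed.ts`; here they are inputs, the 0.6 threshold is an assumption, and the 200-600 token size clamp is omitted for brevity:

```typescript
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Group paragraphs into chunks, starting a new chunk where the
// similarity between adjacent paragraph embeddings drops.
function semanticChunks(
  paragraphs: string[],
  embeddings: number[][],
  threshold = 0.6,
): string[][] {
  const chunks: string[][] = [];
  let current: string[] = [];
  for (let i = 0; i < paragraphs.length; i++) {
    if (i > 0 && cosine(embeddings[i - 1], embeddings[i]) < threshold) {
      chunks.push(current);
      current = [];
    }
    current.push(paragraphs[i]);
  }
  if (current.length) chunks.push(current);
  return chunks;
}
```
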
305
**AST-aware chunking (code):**

- Use tree-sitter to parse into AST
- Extract semantic entities: functions, classes, interfaces, types
- Build scope tree (preserve nesting: `UserService > getUser`)
- Split at entity boundaries, not arbitrary line counts
- Prepend scope chain as context
- Supported: TypeScript, JavaScript, Python, Rust, Go, Java

**Recursive character chunking (fallback):**

- For content that doesn't fit above categories
- Split at sentence → paragraph → section boundaries
- Fixed ~400-token chunks with 80-token overlap
- Current context-embeddings approach (backward compatible)

### 5.3 Contextual Enrichment (Anthropic's Approach)

Before embedding, each chunk gets a context prefix generated by an LLM:

```
Input chunk: "We fixed the apiKey issue by leaving remote: {} empty"

Context prefix: "This chunk is from a conversation about OpenClaw memory search
configuration. It describes the fix for a bug where putting an apiKey in
memorySearch.remote blocked the environment variable fallback. The solution was
to leave the remote object empty and let the op-secrets plugin set
process.env.OPENAI_API_KEY from 1Password."

Stored text: [context prefix] + [original chunk]
```

This dramatically improves retrieval quality by making chunks self-contained. Uses Claude Haiku for cost efficiency (~$0.001 per chunk).

### 5.4 Embedding

- **Primary:** Local via Ollama — `nomic-embed-text-v1.5` (768 dimensions, 8K token context, beats text-embedding-3-small on MTEB, Matryoshka support for dimensionality reduction)
- **Fallback:** OpenAI `text-embedding-3-small` (1536 dimensions) for users without Ollama
- **CPU-only fallback:** `all-MiniLM-L6-v2` via ONNX (384 dimensions, only 256-token context, but ~15ms/chunk on CPU)
- **Batch processing:** Embed in batches of 100 to minimize overhead
- Embeddings stored in LanceDB (Apache Arrow columnar format, memory-mapped for near in-memory speed)

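A minimal sketch of the batching step, assuming Ollama's `/api/embed` endpoint (which accepts an array `input` and returns `{ embeddings }`). The model tag and localhost URL are defaults, not values pinned by this spec, and the network call is defined but not executed here:

```typescript
const BATCH_SIZE = 100;

function toBatches<T>(items: T[], size = BATCH_SIZE): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}

async function embedAll(texts: string[], model = "nomic-embed-text"): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of toBatches(texts)) {
    // Ollama's batch embedding endpoint; OpenAI fallback would go here.
    const res = await fetch("http://localhost:11434/api/embed", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ model, input: batch }),
    });
    const json = (await res.json()) as { embeddings: number[][] };
    vectors.push(...json.embeddings);
  }
  return vectors;
}
```
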
345
### 5.5 Fact Extraction (mem0-style)

After chunking + embedding, an LLM extracts structured facts:

```
Input: "Parker founded WIP.computer in 2026. The company has three products:
Lesa (agent service), LYLA (token system), and an unnamed music player."

Extracted memories:
- "Parker Todd Brooks founded WIP.computer in 2026" [fact]
- "WIP.computer has three products: Lesa, LYLA, and an unnamed music player" [fact]
- "LYLA is the token/currency system for the WIP.computer ecosystem" [fact]

Extracted entities:
- Parker Todd Brooks [person]
- WIP.computer [company]
- Lesa [product]
- LYLA [product]

Extracted relationships:
- Parker Todd Brooks --founded--> WIP.computer
- WIP.computer --has_product--> Lesa
- WIP.computer --has_product--> LYLA
- Lesa --type--> agent_service
- LYLA --type--> token_system
```

### 5.6 Memory Operations (ADD / UPDATE / DELETE / NOOP)

When new facts are extracted, they're compared against existing memories:

1. **Embed** the new fact
2. **Search** for semantically similar existing memories (top-5)
3. **LLM decides** which operation to apply:
   - **ADD** — No equivalent memory exists. Create new.
   - **UPDATE** — Existing memory has related but incomplete info. Merge.
   - **DELETE** — New info contradicts existing memory. Mark old as deprecated, create new.
   - **NOOP** — Memory already captured. Skip.

This is how memory evolves: "Parker uses Sonnet" gets updated to "Parker uses Sonnet as primary, with Opus for complex tasks" without creating duplicates.

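The steps above can be sketched with the LLM call abstracted behind a `decide` callback; everything else is plain data manipulation. All names here are illustrative, not the plugin's real API:

```typescript
type Op = "ADD" | "UPDATE" | "DELETE" | "NOOP";

interface Memory {
  id: string;
  text: string;
  status: "active" | "deprecated";
  supersedes?: string;
}

function reconcile(
  existing: Memory[],   // top-5 semantic neighbours of the new fact
  fact: string,
  decide: (neighbours: Memory[], fact: string) => { op: Op; targetId?: string },
): Memory[] {
  const { op, targetId } = decide(existing, fact);
  const next = existing.map((m) => ({ ...m }));
  switch (op) {
    case "ADD":
      next.push({ id: `m${next.length + 1}`, text: fact, status: "active" });
      break;
    case "UPDATE": {
      const target = next.find((m) => m.id === targetId);
      if (target) target.text = fact;   // merge: keep the richer statement
      break;
    }
    case "DELETE": {
      // Contradiction: deprecate the old memory, record the new one.
      const target = next.find((m) => m.id === targetId);
      if (target) target.status = "deprecated";
      next.push({ id: `m${next.length + 1}`, text: fact, status: "active", supersedes: targetId });
      break;
    }
    case "NOOP":
      break;
  }
  return next;
}
```
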
386
---

388
## 6. Search & Retrieval

### 6.1 Hybrid Search

Every query runs through two retrieval paths in parallel, followed by a fusion step:

1. **Hybrid search (LanceDB)** — Single query combining ANN vector search (IVF-PQ) + BM25 keyword search. LanceDB handles fusion natively — no manual RRF needed. Top-20.
2. **Graph traversal** — Find related entities, walk relationships in SQLite, gather connected memories
3. **Merge** — LanceDB results + graph-augmented context combined via Reciprocal Rank Fusion:

```
score(doc) = Σ 1/(k + rank_i(doc))   where k = 60
```

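The formula above, directly as code: each ranked list contributes `1 / (k + rank)` per document, with ranks starting at 1 and `k = 60`:

```typescript
function rrfFuse(rankings: string[][], k = 60): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((id, idx) => {
      // idx is 0-based; ranks in the formula start at 1.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + idx + 1));
    });
  }
  return [...scores.entries()]
    .map(([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}
```

Documents that appear high in several lists accumulate the most score, which is why RRF needs no score normalization across the vector and graph paths.
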
402
### 6.2 Query Rewriting

Before searching, the raw query is optionally rewritten:

- **Decomposition:** "What did Parker and I discuss about music and tokens?" → ["Parker music player vision", "LYLA token system", "UHI framework"]
- **HyDE:** Generate a hypothetical answer, embed that instead of the question (better for finding factual matches)

### 6.3 Reranking

After fusion, top candidates are reranked:

- **Option A (default):** Local cross-encoder model (`ms-marco-MiniLM-L-6-v2` via ONNX Runtime) — fast (~5ms per candidate), free, runs on CPU, significant quality improvement
- **Option B:** LLM-based reranking (Claude Haiku scores relevance 0-10) — higher quality, costs ~$0.0001/query
- **Option C:** No reranking (for speed, acceptable at small scale)

Default: Option A for all queries. Option B available as `reranking: "haiku"` config.

### 6.4 Graph-Augmented Retrieval

For entity-rich queries, the knowledge graph augments results:

1. Extract entities from the query
2. Find matching entity nodes
3. Traverse 1-2 hops of relationships
4. Include connected memories as additional context

This turns "tell me about WIP.computer" into a structured answer with products, people, philosophy, and status.

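A sketch of the hop-limited walk in step 3. Edges are assumed to be loaded from the `relationships` table (e.g. `SELECT source_id, target_id, type FROM relationships WHERE valid_until IS NULL`); the walk itself is shown over an in-memory edge list:

```typescript
interface Edge { source: string; target: string; type: string }

function traverse(edges: Edge[], start: string, maxHops = 2): Set<string> {
  const seen = new Set<string>([start]);
  let frontier = [start];
  for (let hop = 0; hop < maxHops; hop++) {
    const next: string[] = [];
    for (const node of frontier) {
      for (const e of edges) {
        // Follow edges in both directions; storage is directed.
        const neighbour = e.source === node ? e.target : e.target === node ? e.source : null;
        if (neighbour && !seen.has(neighbour)) {
          seen.add(neighbour);
          next.push(neighbour);
        }
      }
    }
    frontier = next;
  }
  seen.delete(start);
  return seen;   // entities within maxHops of the query entity
}
```
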
429
---

431
## 7. Connectors

### 7.1 Conversations (Primary — replaces context-embeddings)

**Hook:** `agent_end` (fires after every agent turn)

**Behavior:**

- Same as current context-embeddings plugin but with smart chunking + fact extraction
- Captures user/assistant turns, skips tool results (configurable)
- Extracts facts and updates knowledge graph after each turn
- Tracks per-session capture state to avoid re-processing

**Migration:** Import existing `context-embeddings.sqlite` chunks on first run.

### 7.2 iMessage History

**Source:** `~/Library/Messages/chat.db` (SQLite, read-only)

**Behavior:**

- Reads `message` table joined with `chat` and `handle`
- Filters by date range and/or chat ID
- Extracts text content (handles attributedBody NSKeyedArchiver format)
- Groups by conversation thread
- Incremental sync via `message.ROWID` cursor

**Privacy:** Only indexes conversations with Parker (configurable handle filter). Does NOT index group chats by default.

**Note:** Requires Full Disk Access permission for the process reading chat.db.

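A sketch of the incremental query builder for this connector. The column names (`message.ROWID`, `message.handle_id`, `handle.id`) follow chat.db's commonly documented layout but should be treated as assumptions, and attributedBody decoding is a separate step:

```typescript
function imessageQuery(sinceRowId: number, handleFilter?: string): { sql: string; params: unknown[] } {
  const params: unknown[] = [sinceRowId];
  let sql = `
    SELECT message.ROWID, message.text, message.date, handle.id AS handle
    FROM message
    JOIN handle ON handle.ROWID = message.handle_id
    WHERE message.ROWID > ?`;          // cursor: last synced ROWID
  if (handleFilter) {
    sql += ` AND handle.id = ?`;       // privacy: restrict to one contact
    params.push(handleFilter);
  }
  sql += ` ORDER BY message.ROWID ASC`;
  return { sql, params };
}
```

After each run, the connector would store the highest ROWID seen back into the `connectors.cursor` column.
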
460
### 7.3 Local Files

**Source:** Configurable directory paths (default: `~/.openclaw/workspace/`, `~/Documents/`)

**Behavior:**

- Watches for `.md`, `.txt`, `.pdf`, `.ts`, `.js`, `.py`, `.json` files
- Hashes files to detect changes (skip unchanged)
- Re-indexes on modification
- Respects `.gitignore` and custom exclude patterns
- Code files use AST-aware chunking

**Incremental:** File mtime-based cursor.

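The change-detection logic can be sketched as: mtime is the cheap cursor, and the content hash catches files that were touched but not actually changed. `fingerprint` and `needsReindex` are hypothetical helper names:

```typescript
import { createHash } from "node:crypto";
import { readFileSync, statSync } from "node:fs";

interface FileCursor { mtimeMs: number; sha256: string }

function hashContent(content: Buffer | string): string {
  return createHash("sha256").update(content).digest("hex");
}

function fingerprint(path: string): FileCursor {
  return { mtimeMs: statSync(path).mtimeMs, sha256: hashContent(readFileSync(path)) };
}

// Re-index only when mtime moved AND the content actually differs.
function needsReindex(path: string, prev?: FileCursor): boolean {
  if (!prev) return true;                                      // never seen
  if (statSync(path).mtimeMs === prev.mtimeMs) return false;   // cheap skip
  return fingerprint(path).sha256 !== prev.sha256;
}
```
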
473
### 7.4 Browser History & Bookmarks

**Source:** Chrome (`~/Library/Application Support/Google/Chrome/Default/History`), Firefox (`~/Library/Application Support/Firefox/Profiles/*/places.sqlite`), Safari (`~/Library/Safari/History.db`)

**Behavior:**

- Reads URL + title + visit timestamp
- Optionally fetches and extracts content from frequently visited URLs
- Bookmarks indexed with higher weight
- Incremental via visit timestamp cursor

**Note:** Chrome locks its History file while running. Copy to temp first.

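The copy-to-temp step can be sketched like this; `snapshotForRead` is a hypothetical helper name (`writeFileSync` is imported only for the usage below):

```typescript
import { copyFileSync, mkdtempSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join, basename } from "node:path";

function snapshotForRead(dbPath: string): string {
  const dir = mkdtempSync(join(tmpdir(), "crystal-"));
  const copy = join(dir, basename(dbPath));
  copyFileSync(dbPath, copy);   // Chrome holds a lock on the original
  return copy;                  // open this copy read-only instead
}
```

Usage would be `snapshotForRead(chromeHistoryPath)` followed by opening the returned path with a read-only SQLite connection, then deleting the temp copy.
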
485
### 7.5 Apple Notes

**Source:** AppleScript bridge (`osascript`)

**Behavior:**

- Lists all notes via `tell application "Notes" to get every note`
- Extracts title + body (HTML → markdown)
- Incremental via modification date

### 7.6 Clipboard History

**Source:** macOS pasteboard

**Behavior:**

- Optional: runs a lightweight daemon that polls `pbpaste` every N seconds
- Only captures text content over 50 characters
- Deduplicates against recent captures
- Useful for capturing URLs, code snippets, notes copied from other apps

### 7.7 Web / URL Ingestion

**Source:** Manual URL submission or Tavily extract

**Behavior:**

- Agent calls `crystal_ingest_url` tool with a URL
- Content extracted via Tavily extract API (if available) or `@mozilla/readability`
- Chunked, embedded, facts extracted
- Source tracked with URL for provenance

---

516
## 8. Agent Tools

### `crystal_search`

The primary search tool. Replaces `conversation_search` and `memory_search`.

```
Parameters:
  query: string          — Natural language query
  scope?: string[]       — Filter: ['conversations', 'documents', 'notes', 'web', 'messages']
  limit?: number         — Max results (default: 10)
  time_range?: string    — 'today', 'week', 'month', 'all' (default: 'all')
  include_graph?: bool   — Include knowledge graph context (default: true)

Returns:
  results: Array<{
    text: string,
    source: { type, uri, title, timestamp },
    score: number,
    related_memories?: string[],
    graph_context?: string
  }>
```

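As a sketch, the parameter shape above translates to TypeScript like this, together with the `time_range` to cutoff mapping the tool implies; `cutoffFor` is a hypothetical helper, not part of the spec:

```typescript
type TimeRange = "today" | "week" | "month" | "all";

interface CrystalSearchParams {
  query: string;
  scope?: ("conversations" | "documents" | "notes" | "web" | "messages")[];
  limit?: number;           // default 10
  time_range?: TimeRange;   // default "all"
  include_graph?: boolean;  // default true
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Earliest created_at (unix ms) a result may have, or 0 for no cutoff.
// "today"/"week"/"month" are read as rolling windows here (an assumption).
function cutoffFor(range: TimeRange, now: number): number {
  switch (range) {
    case "today": return now - DAY_MS;
    case "week":  return now - 7 * DAY_MS;
    case "month": return now - 30 * DAY_MS;
    case "all":   return 0;
  }
}
```
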
540
### `crystal_remember`

Store a new memory or fact explicitly.

```
Parameters:
  text: string        — The memory/fact to store
  category?: string   — 'fact', 'preference', 'event', 'opinion'

Returns:
  operation: 'added' | 'updated' | 'duplicate'
  memory_id: string
```

### `crystal_forget`

Mark a memory as deprecated.

```
Parameters:
  query: string    — Description of what to forget
  confirm?: bool   — Require confirmation (default: true)

Returns:
  forgotten: number — Count of memories deprecated
```

### `crystal_ingest_url`

Ingest a web page into memory.

```
Parameters:
  url: string
  title?: string

Returns:
  chunks: number
  memories: number
  entities: number
```

### `crystal_graph`

Query the knowledge graph directly.

```
Parameters:
  entity: string   — Entity name or description
  depth?: number   — Relationship traversal depth (default: 2)

Returns:
  entity: { name, type, description, properties }
  relationships: Array<{ target, type, description }>
  connected_memories: string[]
```

### `crystal_status`

Show memory system stats.

```
Returns:
  total_chunks: number
  total_memories: number
  total_entities: number
  total_relationships: number
  database_size: string
  connectors: Array<{ id, enabled, last_sync, total_ingested }>
```

---

613
## 9. MCP Server

Replaces the current `lesa-bridge` MCP server with a superset of tools:

| Current lesa-bridge tool | Memory Crystal replacement |
|--------------------------|----------------------------|
| `lesa_conversation_search` | `crystal_search` (scope: conversations) |
| `lesa_memory_search` | `crystal_search` (scope: documents, notes) |
| `lesa_read_workspace` | `crystal_search` + direct file read |

**New MCP tools:**

- `crystal_search` — Unified hybrid search
- `crystal_remember` — Store memories
- `crystal_forget` — Deprecate memories
- `crystal_ingest_url` — Ingest web content
- `crystal_graph` — Knowledge graph queries
- `crystal_status` — System stats

**MCP Resources** (automatic context injection):

- `memory://recent` — Last 24h of memories (injected at conversation start)
- `memory://graph` — Full knowledge graph summary
- `memory://entity/{name}` — Everything known about a specific entity

Registered via `claude mcp add` at user scope, available in all Claude Code sessions.

---

640
+ ## 10. CLI Commands
641
+
642
+ ```bash
643
+ # Status and diagnostics
644
+ openclaw crystal status # Show stats, connector status, DB size
645
+ openclaw crystal search "query" # Search from command line
646
+ openclaw crystal search "query" --scope conversations --limit 5
647
+
648
+ # Ingestion
649
+ openclaw crystal ingest <file> # Ingest a file or directory
650
+ openclaw crystal ingest <url> # Ingest a URL
651
+ openclaw crystal ingest --all # Run all enabled connectors
652
+
653
+ # Connectors
654
+ openclaw crystal connectors # List all connectors and status
655
+ openclaw crystal connectors enable imessage
656
+ openclaw crystal connectors disable clipboard
657
+ openclaw crystal connectors sync imessage # Force sync a connector
658
+ openclaw crystal connectors sync --all # Sync all
659
+
660
+ # Knowledge graph
661
+ openclaw crystal graph "Parker" # Show entity and relationships
662
+ openclaw crystal graph --stats # Graph statistics
663
+
664
+ # Migration
665
+ openclaw crystal migrate # Import from context-embeddings.sqlite
666
+
667
+ # Maintenance
668
+ openclaw crystal compact # Remove deprecated memories, optimize DB
669
+ openclaw crystal export # Export all memories as JSON (backup)
670
+ openclaw crystal import <file> # Import from backup
671
+ ```
672
+
673
+ ---
674
+
+ ## 11. Migration Strategy
+
+ ### Phase 1: Replace context-embeddings (backward compatible)
+
+ 1. Import all 2,673 chunks from `context-embeddings.sqlite` into LanceDB + `crystal.db`
+ 2. Register same `agent_end` hook for conversation capture
+ 3. Provide same `conversation_search` tool (aliased to `crystal_search`)
+ 4. Disable old `context-embeddings` plugin
+ 5. **Zero loss of existing data**
+
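The Phase 1 import loop can be sketched as follows. The embedder is a stub standing in for Ollama's nomic-embed endpoint, and all function names here are illustrative:

```typescript
// Sketch of the Phase 1 import step. Old 1536-dim vectors cannot be
// reused at 768 dims, so every chunk is re-embedded during import
// (a one-time cost; see §20, question 8).
interface LegacyChunk {
  id: string;
  text: string;
  embedding: number[]; // 1536-dim text-embedding-3-small vector (discarded)
}

interface CrystalChunk {
  id: string;
  text: string;
  embedding: number[]; // 768-dim nomic-embed vector
}

// Stub: the real implementation would call Ollama's embeddings API.
function embed768(text: string): number[] {
  return new Array(768).fill(0); // placeholder vector
}

function migrateChunk(chunk: LegacyChunk): CrystalChunk {
  return { id: chunk.id, text: chunk.text, embedding: embed768(chunk.text) };
}
```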
+ ### Phase 2: Add hybrid search + fact extraction
+
+ 6. LanceDB IVF-PQ indexing on all chunks (auto-enabled above threshold)
+ 7. LanceDB built-in BM25 full-text indexing
+ 8. Enable fact extraction pipeline (memories table)
+ 9. Run fact extraction over existing chunks (batch job)
+ 10. Add `crystal_remember` and `crystal_forget` tools
+
+ ### Phase 3: Knowledge graph
+
+ 11. Enable entity + relationship extraction
+ 12. Build graph from existing memories
+ 13. Add `crystal_graph` tool
+ 14. Add graph-augmented retrieval to search
+
+ ### Phase 4: Connectors
+
+ 15. Enable file watcher (workspace + Documents)
+ 16. Enable iMessage connector
+ 17. Enable browser history connector
+ 18. Enable Apple Notes connector
+ 19. Optional: clipboard daemon
+
+ ### Phase 5: MCP server + external access
+
+ 20. Build MCP server (replaces lesa-bridge)
+ 21. Register with Claude Code
+ 22. Add contextual enrichment to ingestion pipeline
+ 23. Add query rewriting to search
+
+ ---
+
+ ## 12. Technical Decisions
+
+ ### Why LanceDB + SQLite (not Postgres/pgvector, not single SQLite)
+
+ Research evaluated five vector stores (see [RESEARCH.md](./RESEARCH.md) §1). LanceDB won:
+
+ - **Embedded library** — `npm install @lancedb/lancedb`, no server process. Same deployment model as SQLite.
+ - **Native hybrid search** — BM25 + vector in one query, built in. No manual FTS5/sqlite-vec/RRF plumbing.
+ - **Scales to 1M+** — IVF-PQ indexing keeps query times flat (~25ms at 1M). sqlite-vec degrades linearly (~200ms at 1M).
+ - **Disk-efficient** — Apache Arrow columnar format, memory-mapped. Near in-memory speed from disk.
+ - **Used by Continue IDE** — Proven for exactly this use case (code + conversation memory).
+
+ SQLite handles the knowledge graph and metadata — it's better at relational queries, joins, and recursive CTEs for graph traversal.
+
+ **Why not single-file SQLite + sqlite-vec?** sqlite-vec is brute-force only (no ANN indexing), so query time grows linearly with dataset size: acceptable at 10K chunks (~2ms), unacceptable at 100K+ (~75ms). LanceDB stays flat. sqlite-vec also requires manual RRF fusion between separate vec0 and FTS5 queries — LanceDB does this natively.
+
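For reference, the "manual RRF plumbing" the sqlite-vec path would require looks roughly like this: reciprocal rank fusion over the two ranked ID lists. The `k = 60` smoothing constant is the conventional default from the RRF literature, not a value from this document:

```typescript
// Reciprocal Rank Fusion: merge two ranked result lists (e.g. vec0 and
// FTS5 output) by summing 1 / (k + rank) per document, then re-sorting.
// k = 60 is the customary default, not a value specified in this PRD.
function rrfFuse(vectorIds: string[], bm25Ids: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [vectorIds, bm25Ids]) {
    list.forEach((id, rank) => {
      // rank + 1 makes ranks 1-based, as in the standard RRF formula
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

A document appearing in both lists (like `"b"` below) accumulates score from each, which is what pushes agreed-upon hits to the top.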
+ ### Why mem0-style operations (not just append)
+
+ - Memories need to evolve. "Parker's primary model is Sonnet" should update, not duplicate.
+ - LLM-based ADD/UPDATE/DELETE/NOOP is proven (mem0 paper: arxiv.org/html/2504.19413v1)
+ - Keeps the memory corpus clean and current
+
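The operation contract can be sketched like this. In the real pipeline an LLM picks the operation; the exact-match heuristic below is a stand-in purely to show the data flow, and the `subject` key is an assumption of this sketch:

```typescript
// ADD/UPDATE/DELETE/NOOP contract from the mem0 pattern. The real
// pipeline asks an LLM to choose; this trivial subject-match heuristic
// only illustrates how memories evolve instead of duplicating.
type MemoryOp =
  | { op: "ADD"; fact: string }
  | { op: "UPDATE"; id: string; fact: string }
  | { op: "NOOP" };

interface StoredMemory {
  id: string;
  subject: string; // e.g. "parker.primary_model" (illustrative key scheme)
  fact: string;
}

function decideOp(existing: StoredMemory[], subject: string, fact: string): MemoryOp {
  const match = existing.find((m) => m.subject === subject);
  if (!match) return { op: "ADD", fact };
  if (match.fact === fact) return { op: "NOOP" };
  return { op: "UPDATE", id: match.id, fact }; // evolve, don't duplicate
}
```

So a change from "Sonnet" to "Opus" on the same subject produces an UPDATE of the stored row, never a second row.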
+ ### Why Anthropic-style contextual enrichment
+
+ - Raw chunks out of context are ambiguous ("We fixed the apiKey issue" — which apiKey?)
+ - Prepending a context summary reduces retrieval failure rates by 35-67% in Anthropic's research
+ - Cost is minimal: ~$0.001 per chunk with Haiku
+
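Mechanically, enrichment is just prepending a situating summary before embedding. The summary below is hardcoded for illustration; in the pipeline it would come from Haiku:

```typescript
// Contextual enrichment per the Anthropic pattern: prepend a short,
// LLM-generated situating summary to each chunk before embedding.
// The summary is hardcoded here; in practice Haiku generates it.
interface RawChunk {
  source: string;
  text: string;
}

function enrich(chunk: RawChunk, contextSummary: string): string {
  // The embedded text now carries its own context, so "the apiKey issue"
  // is no longer ambiguous at retrieval time.
  return `[Context: ${contextSummary}]\n\n${chunk.text}`;
}

const enriched = enrich(
  { source: "conversation", text: "We fixed the apiKey issue." },
  "Debugging 1Password secret resolution in the op-secrets plugin"
);
```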
+ ### Why tree-sitter for code chunking
+
+ - Fixed-size chunks split functions mid-body, losing semantic meaning
+ - AST-aware chunking keeps complete functions/classes together
+ - Prepending the scope chain means `getUser` is searchable as `UserService.getUser`
+ - Supermemory's `code-chunk` library proves this approach works
+
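The scope-chain idea can be sketched as below. The types and helper are illustrative, not `code-chunk`'s actual API:

```typescript
// Scope-chain prepending for AST-aware code chunks: the chunk's
// enclosing scopes are joined into a qualified name and prepended to
// the indexed text, so a search for "UserService.getUser" matches the
// method chunk. Illustrative types, not code-chunk's real API.
interface CodeChunk {
  scopeChain: string[]; // e.g. ["UserService", "getUser"]
  body: string;
}

function indexableText(chunk: CodeChunk): string {
  const qualified = chunk.scopeChain.join(".");
  return `// ${qualified}\n${chunk.body}`;
}

const indexed = indexableText({
  scopeChain: ["UserService", "getUser"],
  body: "getUser(id: string) { return this.repo.find(id); }",
});
```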
+ ### Why not a separate knowledge graph database (FalkorDB, Neo4j)
+
+ - Overkill for single-user, local memory
+ - SQLite tables with proper indexes handle graph queries fine
+ - Entity + relationship tables with recursive CTEs = basic graph traversal
+ - No additional infrastructure to manage
+
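What the recursive-CTE traversal amounts to, expressed as an in-memory BFS over relationship rows. Depth 2 mirrors the `graph.traversalDepth` default in §14; this is a sketch of the access pattern, not the real DAO:

```typescript
// Undirected BFS over entity/relationship rows, equivalent in effect to
// a depth-bounded recursive CTE over the relationships table.
interface Relationship {
  source: string;
  target: string;
  type: string;
}

function neighborhood(rels: Relationship[], start: string, maxDepth = 2): Set<string> {
  const seen = new Set<string>([start]);
  let frontier = [start];
  for (let depth = 0; depth < maxDepth; depth++) {
    const next: string[] = [];
    for (const rel of rels) {
      // Traverse each edge in both directions
      for (const [from, to] of [[rel.source, rel.target], [rel.target, rel.source]]) {
        if (frontier.includes(from) && !seen.has(to)) {
          seen.add(to);
          next.push(to);
        }
      }
    }
    frontier = next;
  }
  return seen;
}
```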
+ ---
+
+ ## 13. Dependencies
+
+ ```json
+ {
+   "dependencies": {
+     "@lancedb/lancedb": "^0.10.0",
+     "better-sqlite3": "^11.0.0",
+     "@1password/sdk": "^0.3.1",
+     "openai": "^4.0.0",
+     "nanoid": "^5.0.0",
+     "@anthropic-ai/sdk": "^0.30.0",
+     "@modelcontextprotocol/sdk": "^1.0.0",
+     "zod": "^3.22.0",
+     "apache-arrow": "^17.0.0"
+   },
+   "optionalDependencies": {
+     "@supermemory/code-chunk": "^1.0.0",
+     "onnxruntime-node": "^1.18.0",
+     "pdf-parse": "^1.1.1",
+     "@mozilla/readability": "^0.5.0",
+     "jsdom": "^24.0.0",
+     "chokidar": "^4.0.0"
+   },
+   "devDependencies": {
+     "tsup": "^8.0.0",
+     "typescript": "^5.4.0"
+   }
+ }
+ ```
+
+ ---
+
+ ## 14. Configuration
+
+ Plugin config in `openclaw.json`:
+
+ ```json
+ {
+   "plugins": {
+     "entries": {
+       "memory-crystal": {
+         "enabled": true,
+         "config": {
+           "dataDir": "~/.openclaw/memory-crystal",
+           "embedding": {
+             "provider": "ollama",
+             "model": "nomic-embed-text",
+             "dimensions": 768,
+             "ollamaBaseUrl": "http://localhost:11434",
+             "fallback": "openai"
+           },
+           "enrichment": {
+             "enabled": true,
+             "model": "claude-haiku"
+           },
+           "connectors": {
+             "conversations": { "enabled": true },
+             "files": {
+               "enabled": true,
+               "paths": ["~/.openclaw/workspace/", "~/Documents/"],
+               "extensions": [".md", ".txt", ".pdf"],
+               "exclude": ["node_modules", ".git", "dist"]
+             },
+             "imessage": {
+               "enabled": false,
+               "handles": []
+             },
+             "browser": {
+               "enabled": false,
+               "browsers": ["chrome"]
+             },
+             "apple_notes": { "enabled": false },
+             "clipboard": { "enabled": false }
+           },
+           "search": {
+             "hybridWeight": { "vector": 0.7, "bm25": 0.3 },
+             "reranking": "haiku",
+             "maxResults": 10
+           },
+           "factExtraction": {
+             "enabled": true,
+             "model": "claude-haiku"
+           },
+           "graph": {
+             "enabled": true,
+             "extractEntities": true,
+             "traversalDepth": 2
+           }
+         }
+       }
+     }
+   }
+ }
+ ```
+
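How the `search.hybridWeight` setting above would combine the two retrieval signals: a weighted sum of the vector-similarity and BM25 scores. The assumption that both scores arrive normalized to [0, 1] is this sketch's, not the document's:

```typescript
// Weighted combination of the two hybrid-search signals, per the
// `hybridWeight` config above. Both inputs are assumed normalized
// to [0, 1] before combination (an assumption of this sketch).
interface ScoredHit {
  id: string;
  vectorScore: number; // normalized semantic similarity
  bm25Score: number;   // normalized keyword relevance
}

function hybridScore(hit: ScoredHit, weights = { vector: 0.7, bm25: 0.3 }): number {
  return weights.vector * hit.vectorScore + weights.bm25 * hit.bm25Score;
}
```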
+ ---
+
+ ## 15. Performance Targets
+
+ | Metric | Target | Notes |
+ |--------|--------|-------|
+ | Search latency (10K chunks) | < 30ms | LanceDB IVF-PQ + built-in BM25 |
+ | Search latency (100K chunks) | < 50ms | IVF-PQ scales sublinearly (stays flat) |
+ | Search latency (1M chunks) | < 50ms | Same indexing, memory-mapped from disk |
+ | Ingestion throughput (local embed) | 100 chunks/sec | Ollama nomic-embed ~50ms/chunk on GPU |
+ | Ingestion throughput (API embed) | 50 chunks/sec | Bottleneck: OpenAI API |
+ | Fact extraction | 10 facts/sec | Bottleneck: LLM API |
+ | Database size per 10K chunks | ~30MB | 768-dim × 4 bytes × 10K ≈ 30MB + Arrow overhead |
+ | Startup time | < 2 seconds | LanceDB + SQLite connection + connector init |
+ | Memory overhead | < 80MB RSS | LanceDB memory-maps Arrow files |
+
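Sanity check on the database-size row: raw float32 vector storage alone, before Arrow overhead and chunk text, comes out at ~30 MB for 10K chunks:

```typescript
// Raw vector storage for the "~30MB per 10K chunks" estimate above:
// chunks × dimensions × 4 bytes per float32 component.
function rawVectorBytes(chunks: number, dims = 768): number {
  return chunks * dims * 4;
}

const mb = rawVectorBytes(10_000) / 1_000_000; // ≈ 30.7 MB before overhead
```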
+ ---
+
+ ## 16. Security & Privacy
+
+ - **All data local.** The database never leaves the machine.
+ - **API keys from 1Password.** The OpenAI key is resolved via the op-secrets plugin at startup.
+ - **iMessage requires Full Disk Access.** The user must explicitly grant this.
+ - **No telemetry.** Zero data sent anywhere except embedding/LLM API calls.
+ - **Connector isolation.** Each connector only accesses its declared source.
+ - **Sensitive content.** Clipboard and iMessage connectors are disabled by default.
+ - **Backup.** Two stores: `crystal.db` (SQLite) + the `lance/` directory. Copying `~/.openclaw/memory-crystal/` backs up everything. Automatic daily backups go to `backups/`.
+
+ ---
+
+ ## 17. Relationship to WIP.computer
+
+ Memory Crystal is infrastructure for Lesa specifically, but the patterns apply to WIP.computer's broader vision:
+
+ - **Namespace + Memory:** When agents have perfect recall, they can maintain creator identity context across every interaction
+ - **Creation-time attribution:** If the memory system tracks what influenced a creation, the rubric has better data for fair splits
+ - **Agent marketplace:** Trained agents (like "Debbie") need persistent memory of their training — Crystal is that layer
+ - **Multi-agent isolation:** Each spawned agent gets its own memory crystal (a separate DB file), with privacy boundaries by design
+
+ ---
+
+ ## 18. Development Phases & Effort Estimates
+
+ ### Phase 1: Core + Migration (Week 1)
+ - [ ] Project scaffold (package.json, tsconfig, plugin manifest)
+ - [ ] LanceDB setup + table creation (chunks, memories, entity_embeddings)
+ - [ ] SQLite schema + migrations (entities, relationships, sources, connectors)
+ - [ ] Embedding provider (Ollama nomic-embed primary, OpenAI fallback, 1Password resolution)
+ - [ ] Basic chunking (current approach, backward compatible)
+ - [ ] Import from context-embeddings.sqlite (re-embed with nomic-embed)
+ - [ ] LanceDB hybrid search (built-in vector + BM25)
+ - [ ] `crystal_search` agent tool
+ - [ ] `crystal_status` agent tool + CLI
+ - [ ] Conversation connector (agent_end hook)
+ - [ ] Disable context-embeddings, enable memory-crystal
+
+ ### Phase 2: Intelligence (Week 2)
+ - [ ] Fact extraction pipeline (LLM-based)
+ - [ ] Memory operations (ADD/UPDATE/DELETE/NOOP)
+ - [ ] `crystal_remember` + `crystal_forget` tools
+ - [ ] Semantic chunking for prose
+ - [ ] Contextual enrichment (Anthropic approach)
+ - [ ] Query rewriting
+
+ ### Phase 3: Knowledge Graph (Week 3)
+ - [ ] Entity extraction
+ - [ ] Relationship extraction
+ - [ ] Graph storage + traversal queries
+ - [ ] Graph-augmented search
+ - [ ] `crystal_graph` tool
+
+ ### Phase 4: Connectors (Weeks 3-4)
+ - [ ] File watcher connector
+ - [ ] iMessage connector
+ - [ ] Browser history connector
+ - [ ] Apple Notes connector
+ - [ ] Clipboard connector
+ - [ ] Web/URL connector
+ - [ ] Connector management CLI
+
+ ### Phase 5: MCP + Polish (Week 4)
+ - [ ] MCP server (replaces lesa-bridge)
+ - [ ] CLI refinement
+ - [ ] AST-aware code chunking (tree-sitter)
+ - [ ] Reranking
+ - [ ] Performance optimization
+ - [ ] Documentation
+
+ ---
+
+ ## 19. Success Criteria
+
+ 1. **"Crystal, what did Parker and I discuss about LYLA tokens?"** returns accurate, sourced results from conversations that happened weeks ago — even if they were compacted out of Lesa's context window.
+
+ 2. **Memory evolves.** If Parker changes his primary model from Sonnet to Opus, Crystal updates the fact rather than duplicating it.
+
+ 3. **Cross-source connections.** A search for "music player" returns results from conversations, WIP.computer docs, and browser history of relevant articles, with the knowledge graph showing connections to the UHI framework and LYLA.
+
+ 4. **Zero manual maintenance.** Connectors run automatically. Facts extract automatically. The graph builds automatically. Parker never has to add anything by hand.
+
+ 5. **Sub-second search.** Even at 100K+ chunks, search returns in under 300ms.
+
+ 6. **Total sovereignty.** Everything lives in one directory (`~/.openclaw/memory-crystal/`). Back it up, move it, encrypt it. No cloud dependency. No external servers.
+
+ ---
+
+ ## 20. Open Questions
+
+ 1. ~~**sqlite-vec vs. vectorlite vs. raw BLOB cosine**~~ **RESOLVED:** LanceDB. See [RESEARCH.md](./RESEARCH.md) §1 for the full comparison. Built-in hybrid search, IVF-PQ indexing, native TS SDK.
+
+ 2. ~~**Local embedding models**~~ **RESOLVED:** nomic-embed-text-v1.5 via Ollama as the default. Beats text-embedding-3-small on MTEB benchmarks, 8K token context, free. OpenAI as fallback. See [RESEARCH.md](./RESEARCH.md) §1.
+
+ 3. **Enrichment cost** — At 50K chunks, contextual enrichment via Haiku costs ~$50. Worth it? Could batch-process in the background.
+
+ 4. **iMessage NSKeyedArchiver format** — Parsing `attributedBody` is non-trivial. May need a Swift helper or an existing library.
+
+ 5. **Code chunking approach** — Research recommends `@supermemory/code-chunk` (tree-sitter based, TypeScript native) over raw tree-sitter bindings; it avoids native-module issues. Need to verify it works with OpenClaw's module loader.
+
+ 6. **Memory Crystal as name** — Parker to confirm. Alternatives: "Recall", "Engram", "Mnemon", "Crystal Memory", "Total Recall" (lol).
+
+ 7. ~~**Relationship to context-embeddings**~~ **RESOLVED:** Full replacement. Migration path defined in §11.
+
+ 8. **Re-embedding existing data** — The 2,673 existing chunks use text-embedding-3-small (1536 dims). Switching to nomic-embed (768 dims) means re-embedding everything during migration. One-time cost, ~2 minutes with Ollama.
+
+ 9. **LanceDB maturity** — The LanceDB TS SDK is newer than sqlite-vec. Need to monitor for stability issues. Fallback: keep sqlite-vec as a "lite mode" option.
+
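The back-of-envelope arithmetic behind questions 3 and 8, using the per-chunk figures quoted in this document (~$0.001/chunk enrichment, ~50ms/chunk local embedding):

```typescript
// Question 3: enrichment cost at 50K chunks, ~$0.001 per chunk.
const enrichmentCostUsd = 50_000 * 0.001; // ~$50

// Question 8: re-embedding 2,673 chunks at ~50ms each with Ollama.
const reembedMinutes = (2_673 * 0.05) / 60; // ~2.2 minutes
```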
+ ---
+
+ ## 21. References
+
+ ### Research & Prior Art
+ - [Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory](https://arxiv.org/html/2504.19413v1) — Extraction + update pipeline, ADD/UPDATE/DELETE/NOOP
+ - [Anthropic: Contextual Retrieval](https://www.anthropic.com/news/contextual-retrieval) — Chunk enrichment for better RAG
+ - [Graphiti: Real-Time Knowledge Graphs for AI Agents](https://github.com/getzep/graphiti) — Temporal knowledge graphs
+ - [Supermemory](https://supermemory.ai) — Hosted memory API (what we're replacing)
+ - [Supermemory code-chunk](https://github.com/supermemoryai/code-chunk) — AST-aware code chunking
+ - [Microsoft GraphRAG](https://github.com/microsoft/graphrag) — Graph-based RAG approach
+
+ ### Infrastructure
+ - [LanceDB](https://docs.lancedb.com/) — Embedded vector database with hybrid search (chosen over sqlite-vec)
+ - [sqlite-vec](https://github.com/asg017/sqlite-vec) — Vector search extension for SQLite (lite-mode fallback)
+ - [better-sqlite3](https://github.com/WiseLibs/better-sqlite3) — Synchronous SQLite for Node.js
+ - [nomic-embed-text](https://www.nomic.ai/blog/posts/nomic-embed-matryoshka) — Local embedding model (chosen as primary)
+ - [MCP SDK](https://github.com/modelcontextprotocol/typescript-sdk) — Model Context Protocol TypeScript SDK
+ - [ONNX Runtime](https://onnxruntime.ai/) — For local cross-encoder reranking
+
+ ### Parker's Existing Infrastructure
+ - `~/.openclaw/extensions/context-embeddings/` — Current conversation embedding plugin
+ - `~/Documents/Projects/Claude Code/lesa-bridge/` — Current MCP server
+ - `~/Documents/Projects/Claude Code/openclaw-1password/` — Secret management
+ - `~/Documents/Projects/OpenClaw/WIP.computer/` — Company vision docs
+
+ ---
+
+ *PRD written: 2026-02-08*
+ *Author: Claude Code (Opus 4.6) + Parker*
+ *Project: memory-crystal*
+ *Status: Draft — awaiting Parker's review*