memory-crystal 0.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.env.example +20 -0
- package/CHANGELOG.md +6 -0
- package/LETTERS.md +22 -0
- package/LICENSE +21 -0
- package/README-ENTERPRISE.md +162 -0
- package/README-old.md +275 -0
- package/README.md +91 -0
- package/RELAY.md +88 -0
- package/TECHNICAL.md +379 -0
- package/ai/dev-updates/2026-02-25--cc-air--phase2-architecture-pivot.md +70 -0
- package/ai/dev-updates/2026-02-25--cc-air--phase2-worker-build.md +72 -0
- package/ai/dev-updates/2026-02-26--10-25-16--cc-mini--phase2-implementation.md +49 -0
- package/ai/dev-updates/2026-02-27--20-30-00--cc-mini--readme-overhaul-and-public-deploy.md +69 -0
- package/ai/notes/2026-02-26--cc-air--notes.md +412 -0
- package/ai/notes/2026-02-27--cc-mini--grok-feedback.md +44 -0
- package/ai/notes/2026-02-27--cc-mini--lesa-feedback.md +45 -0
- package/ai/notes/RESEARCH.md +1185 -0
- package/ai/notes/salience-research/README.md +29 -0
- package/ai/notes/salience-research/eurosla-salience-review.md +64 -0
- package/ai/notes/salience-research/full-research-summary.md +269 -0
- package/ai/notes/salience-research/salience-levels-diagram.png +0 -0
- package/ai/plan/2026-02-27--cc-mini--qr-pairing-spec.md +203 -0
- package/ai/plan/_archive/PLAN.md +194 -0
- package/ai/plan/_archive/PRD.md +1014 -0
- package/ai/plan/cc-plans-duplicates-from-dot-claude/2026-02-26--cc-mini--phase2-implementation-plan.md +245 -0
- package/ai/plan/dev-conventions-note.md +70 -0
- package/ai/plan/ldm-os-install-and-boot-architecture.md +285 -0
- package/ai/plan/memory-crystal-phase2-plan.md +192 -0
- package/ai/plan/memory-system-lay-of-the-land.md +214 -0
- package/ai/plan/phase2-ephemeral-relay.md +238 -0
- package/ai/plan/readme-first.md +68 -0
- package/ai/plan/roadmap.md +159 -0
- package/ai/todos/PUNCHLIST.md +44 -0
- package/ai/todos/README.md +31 -0
- package/ai/todos/inboxes/cc-air/2026-02-26--cc-air--post-relay-todos.md +85 -0
- package/ai/todos/inboxes/cc-mini/2026-02-26--cc-mini--phase2-status.md +100 -0
- package/ai/todos/inboxes/cc-mini/_archive/TODO.md +25 -0
- package/ai/todos/inboxes/parker/2026-02-25--cc-air--setup-checklist.md +139 -0
- package/ai/todos/inboxes/parker/2026-02-26--cc-mini--phase2-your-moves.md +72 -0
- package/dist/cc-hook.d.ts +1 -0
- package/dist/cc-hook.js +349 -0
- package/dist/chunk-3VFIJYS4.js +818 -0
- package/dist/chunk-52QE3YI3.js +1169 -0
- package/dist/chunk-AA3OPP4Z.js +432 -0
- package/dist/chunk-D3I3ZSE2.js +411 -0
- package/dist/chunk-EKSACBTJ.js +1070 -0
- package/dist/chunk-F3Y7EL7K.js +83 -0
- package/dist/chunk-JWZXYVET.js +1068 -0
- package/dist/chunk-KYVWO6ZM.js +1069 -0
- package/dist/chunk-L3VHARQH.js +413 -0
- package/dist/chunk-LOVAHSQV.js +411 -0
- package/dist/chunk-LQOYCAGG.js +446 -0
- package/dist/chunk-MK42FMEG.js +147 -0
- package/dist/chunk-NIJCVN3O.js +147 -0
- package/dist/chunk-O2UITJGH.js +465 -0
- package/dist/chunk-PEK6JH65.js +432 -0
- package/dist/chunk-PJ6FFKEX.js +77 -0
- package/dist/chunk-PLUBBZYR.js +800 -0
- package/dist/chunk-SGL6ISBJ.js +1061 -0
- package/dist/chunk-UNHVZB5G.js +411 -0
- package/dist/chunk-VAFTWSTE.js +1061 -0
- package/dist/chunk-XZ3S56RQ.js +1061 -0
- package/dist/chunk-Y72C7F6O.js +148 -0
- package/dist/cli.d.ts +1 -0
- package/dist/cli.js +325 -0
- package/dist/core.d.ts +188 -0
- package/dist/core.js +12 -0
- package/dist/crypto.d.ts +16 -0
- package/dist/crypto.js +18 -0
- package/dist/dev-update-SZ2Z4WCQ.js +6 -0
- package/dist/ldm.d.ts +17 -0
- package/dist/ldm.js +12 -0
- package/dist/mcp-server.d.ts +1 -0
- package/dist/mcp-server.js +250 -0
- package/dist/migrate.d.ts +1 -0
- package/dist/migrate.js +89 -0
- package/dist/mirror-sync.d.ts +1 -0
- package/dist/mirror-sync.js +130 -0
- package/dist/openclaw.d.ts +5 -0
- package/dist/openclaw.js +349 -0
- package/dist/poller.d.ts +1 -0
- package/dist/poller.js +272 -0
- package/dist/summarize.d.ts +19 -0
- package/dist/summarize.js +10 -0
- package/dist/worker.js +137 -0
- package/openclaw.plugin.json +11 -0
- package/package.json +40 -0
- package/scripts/migrate-lance-to-sqlite.mjs +217 -0
- package/skills/memory/SKILL.md +61 -0
- package/src/cc-hook.ts +447 -0
- package/src/cli.ts +356 -0
- package/src/core.ts +1472 -0
- package/src/crypto.ts +113 -0
- package/src/dev-update.ts +178 -0
- package/src/ldm.ts +117 -0
- package/src/mcp-server.ts +274 -0
- package/src/migrate.ts +104 -0
- package/src/mirror-sync.ts +175 -0
- package/src/openclaw.ts +250 -0
- package/src/poller.ts +345 -0
- package/src/summarize.ts +210 -0
- package/src/worker.ts +208 -0
- package/tsconfig.json +18 -0
- package/wrangler.toml +20 -0
@@ -0,0 +1,1014 @@

# Memory Crystal — Product Requirements Document

**Sovereign memory infrastructure for OpenClaw agents.**

Your memory. Your machine. Your rules.

---

## 1. Vision

Memory Crystal is a local-first, self-hosted memory system that gives OpenClaw agents (starting with Lesa) total recall — across conversations, documents, code, web pages, messages, and every source of knowledge that matters. No cloud dependency. No $19/month. No data leaving your machine.

This is the memory layer that Supermemory charges for, but built as sovereign infrastructure: an OpenClaw plugin backed by LanceDB and SQLite, running on your Mac mini, with every byte under your control.

### Design Philosophy

From WIP.computer's founding principles:

> "We do not dictate outcomes. We design conditions."

Memory Crystal creates the conditions for intelligence to emerge:

- **Foundation:** A unified memory store that captures everything
- **Constraint:** Privacy boundaries, data sovereignty, local-first
- **Intelligence:** Semantic search, knowledge graphs, memory evolution
- **Emergence:** An agent that never forgets, that connects dots across months of context

---

## 2. Problem Statement

### What exists today

Parker's OpenClaw setup has memory spread across five disconnected systems:

| System | What it stores | Limitation |
|--------|---------------|------------|
| `MEMORY.md` + workspace files | Curated long-term memory, daily logs | Manual, no search beyond grep |
| `context-embeddings` plugin | Conversation turns (2,673 chunks) | Brute-force cosine similarity, no indexing, token-based chunking |
| `main.sqlite` (built-in) | Session transcripts + document embeddings | OpenClaw's built-in search, limited control |
| `lesa-bridge` MCP server | Exposes memory to Claude Code | Read-only bridge, no unified interface |
| Workspace `.md` files | Notes, research, observations | Flat files, no semantic understanding |

**Problems:**
1. **No unified search** — Each system has its own query interface. No single query can search across conversations, documents, and notes.
2. **No knowledge graph** — Facts are stored as flat text. "Parker works at WIP.computer" and "WIP.computer is building a music player" aren't connected.
3. **No memory evolution** — Old facts never get updated or deprecated. If Parker changes his mind, both old and new opinions coexist with equal weight.
4. **No ingestion pipeline** — Adding a new document or URL to memory is manual. No connectors for iMessage history, browser bookmarks, email, or local files.
5. **No vector indexing** — The context-embeddings plugin does brute-force cosine similarity over all chunks. This works at 2,673 chunks but won't scale to 100K+.
6. **Chunking is naive** — Fixed ~400-token chunks with overlap. No semantic boundaries, no code-aware chunking, no contextual enrichment.

### What Supermemory offers (and what we're replacing)

Supermemory ($19/mo, closed-source backend, Cloudflare-locked):
- Knowledge graph with auto-evolving memories
- Hybrid search (vector + keyword + reranking)
- Connectors: Gmail, Google Drive, Notion, OneDrive, S3, web crawler
- MCP server, browser extension, SDKs
- Sub-300ms search, scales to 50M tokens per user
- Memory operations: ADD, UPDATE, DELETE, NOOP (mem0-style)

**Their weakness:** Your data lives on their servers. Self-hosting requires an enterprise plan plus a Cloudflare account. The backend engine is closed-source.

---

## 3. Architecture

### Core Principle: LanceDB + SQLite, Local-First

No external databases. No Docker. No Postgres. Two embedded stores:
- **LanceDB** — Vector search + BM25 hybrid search (Apache Arrow format, disk-efficient, scales to 1M+ chunks)
- **SQLite** — Knowledge graph, metadata, memory records, connector state

Embedding calls default to **Ollama** (local, free, `nomic-embed-text-v1.5`) with OpenAI as a fallback for users without Ollama. See [RESEARCH.md](./RESEARCH.md) for the full comparison that led to this decision.

```
~/.openclaw/memory-crystal/
├── lance/                   ← LanceDB data directory
│   └── memories.lance/      ← Vector index + BM25 index (chunks, memories, entities)
├── crystal.db               ← SQLite: knowledge graph, metadata, connector state
└── backups/                 ← Automatic daily backups

~/Documents/Projects/OpenClaw/memory-crystal/
├── src/
│   ├── index.ts             ← Plugin entry point (registers tools, services, CLI)
│   ├── db/
│   │   ├── lance.ts         ← LanceDB connection, table creation, hybrid search
│   │   ├── sqlite.ts        ← SQLite schema, migrations, graph queries
│   │   └── migrate.ts       ← Import from context-embeddings.sqlite
│   ├── embed.ts             ← Embedding provider (Ollama primary, OpenAI fallback)
│   ├── ingest/
│   │   ├── pipeline.ts      ← Universal ingestion pipeline
│   │   ├── chunker.ts       ← Smart chunking (semantic + code-aware)
│   │   ├── extractor.ts     ← Content extraction (URLs, PDFs, etc.)
│   │   └── enricher.ts      ← Contextual enrichment (Anthropic-style)
│   ├── memory/
│   │   ├── operations.ts    ← ADD / UPDATE / DELETE / NOOP logic
│   │   ├── graph.ts         ← Knowledge graph (entities + relationships)
│   │   ├── evolution.ts     ← Memory decay, dedup, consolidation
│   │   └── extract.ts       ← LLM-based fact extraction from text
│   ├── search/
│   │   ├── hybrid.ts        ← Vector + BM25 hybrid search
│   │   ├── rerank.ts        ← Cross-encoder reranking
│   │   └── query.ts         ← Query rewriting / HyDE
│   ├── connectors/
│   │   ├── conversations.ts ← OpenClaw conversation capture (agent_end hook)
│   │   ├── imessage.ts      ← macOS iMessage history (chat.db)
│   │   ├── files.ts         ← Local file watcher (.md, .pdf, code)
│   │   ├── browser.ts       ← Chrome/Firefox/Safari history + bookmarks
│   │   ├── clipboard.ts     ← Clipboard history capture
│   │   ├── apple-notes.ts   ← Apple Notes via AppleScript bridge
│   │   └── web.ts           ← URL fetch + extract (via Tavily or direct)
│   ├── mcp/
│   │   └── server.ts        ← MCP server (replaces lesa-bridge)
│   └── cli/
│       └── commands.ts      ← CLI: status, search, ingest, connectors
├── openclaw.plugin.json
├── package.json
├── tsconfig.json
├── PRD.md                   ← This file
└── README.md
```

### Data Flow

```
Sources                     Ingestion           Storage                 Retrieval
────────                    ─────────           ───────                 ─────────
Conversations (agent_end) ─┐
iMessage (chat.db)        ─┤  ┌─────────────┐   ┌───────────────────┐
Local files (.md, .pdf)   ─┼─▶│  Pipeline   │──▶│ LanceDB + SQLite  │
Browser history           ─┤  │  Extract    │   │  chunks           │   ┌───────────┐
Apple Notes               ─┤  │  Chunk      │   │  memories         │──▶│  Hybrid   │
Clipboard                 ─┤  │  Enrich     │   │  entities         │   │  Search   │
URLs (manual/Tavily)      ─┘  │  Embed      │   │  relationships    │   │  Vector   │
                              │  Extract    │   └───────────────────┘   │  BM25     │
                              │  Facts      │                           │  Rerank   │
                              └─────────────┘                           └─────┬─────┘
                                                                              │
                                                                        ┌─────▼─────┐
                                                                        │   Agent   │
                                                                        │   Tools   │
                                                                        │   MCP     │
                                                                        │   CLI     │
                                                                        └───────────┘
```

---

## 4. Database Schema

Two stores: **LanceDB** for vector-indexed content, **SQLite** for graph + metadata.

### LanceDB Tables (vector search)

LanceDB stores embeddings as Apache Arrow columnar files with built-in IVF-PQ indexing and BM25 full-text search. All three content types are indexed:

**`chunks` table** — Raw content chunks
```
id:          string (nanoid)
source_id:   string (FK to SQLite sources)
text:        string (chunk text, BM25-indexed)
embedding:   vector[768] (nomic-embed-text-v1.5, or 1536 for OpenAI)
role:        string ('user', 'assistant', 'document', 'note')
metadata:    string (JSON: { turnIndex, lineRange, language, ... })
token_count: int32
created_at:  int64 (unix timestamp)
updated_at:  int64
```

**`memories` table** — Extracted facts (mem0-style)
```
id:            string (nanoid)
text:          string (BM25-indexed)
embedding:     vector[768]
category:      string ('fact', 'preference', 'event', 'opinion', 'skill')
confidence:    float64 (decays over time, boosted on re-confirmation)
source_ids:    string (JSON array of source chunk IDs)
supersedes:    string (ID of memory this one replaced)
status:        string ('active', 'deprecated', 'deleted')
created_at:    int64
updated_at:    int64
last_accessed: int64
```
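The confidence field's decay curve is not pinned down here. A minimal sketch, assuming simple exponential decay plus a re-confirmation boost; the 90-day half-life and the 0.5 boost factor are illustrative assumptions, not part of the spec:

```typescript
// Confidence decay sketch for the `memories` table. The exponential form,
// HALF_LIFE_DAYS, and the boost factor are assumptions — the PRD only says
// confidence "decays over time, boosted on re-confirmation".
const HALF_LIFE_DAYS = 90;

function decayedConfidence(confidence: number, lastAccessedMs: number, nowMs: number): number {
  const ageDays = (nowMs - lastAccessedMs) / 86_400_000;
  return confidence * Math.pow(0.5, ageDays / HALF_LIFE_DAYS);
}

function reconfirm(confidence: number): number {
  // Boost toward 1.0 without ever exceeding it.
  return confidence + (1 - confidence) * 0.5;
}
```

A memory untouched for one half-life drops to half its confidence; each re-confirmation closes half the gap to 1.0.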

**`entity_embeddings` table** — Entity vectors for semantic entity search
```
id:          string (nanoid, FK to SQLite entities)
name:        string (BM25-indexed)
description: string (BM25-indexed)
embedding:   vector[768]
```

### SQLite Tables (graph + metadata): `crystal.db`

### `entities` — Knowledge graph nodes

```sql
CREATE TABLE entities (
  id          TEXT PRIMARY KEY,
  name        TEXT NOT NULL,        -- "Parker Todd Brooks"
  type        TEXT,                 -- 'person', 'project', 'concept', 'tool', 'place'
  description TEXT,                 -- summary
  properties  TEXT,                 -- JSON: arbitrary key-value
  created_at  INTEGER NOT NULL,
  updated_at  INTEGER NOT NULL
);
```

### `relationships` — Knowledge graph edges (bi-temporal, inspired by Graphiti)

```sql
CREATE TABLE relationships (
  id            TEXT PRIMARY KEY,
  source_id     TEXT NOT NULL,      -- FK to entities
  target_id     TEXT NOT NULL,      -- FK to entities
  type          TEXT NOT NULL,      -- 'founded', 'works_on', 'uses', 'knows', 'prefers'
  description   TEXT,               -- natural language description
  weight        REAL DEFAULT 1.0,   -- strength/confidence
  temporal      TEXT,               -- 'current', 'past', 'planned'
  event_time    INTEGER,            -- when the fact actually occurred (T)
  evidence      TEXT,               -- JSON array of source chunk IDs
  valid_from    INTEGER NOT NULL,   -- ingestion time (T') — when we learned this
  valid_until   INTEGER,            -- set when superseded (NULL = still valid)
  superseded_by TEXT,               -- FK to new relationship that replaced this
  created_at    INTEGER NOT NULL,
  updated_at    INTEGER NOT NULL
);
```
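The bi-temporal columns make point-in-time queries cheap. A query sketch (the `:asof` named parameter is illustrative) for "which edges did we believe were valid at a given moment":

```sql
-- Edges we believed valid at time :asof (the ingestion-time axis, T').
-- Superseding an edge sets valid_until rather than deleting the row,
-- so history stays queryable.
SELECT r.*
FROM relationships r
WHERE r.valid_from <= :asof
  AND (r.valid_until IS NULL OR r.valid_until > :asof);
```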

### `sources` — Provenance tracking

```sql
CREATE TABLE sources (
  id          TEXT PRIMARY KEY,
  type        TEXT NOT NULL,        -- 'conversation', 'file', 'imessage', 'url', 'clipboard', 'apple_note', 'browser'
  uri         TEXT,                 -- file path, URL, session ID, etc.
  title       TEXT,
  connector   TEXT,                 -- which connector produced this
  metadata    TEXT,                 -- JSON: connector-specific state
  ingested_at INTEGER NOT NULL,
  chunk_count INTEGER DEFAULT 0
);
```

### `connectors` — Sync state for each connector

```sql
CREATE TABLE connectors (
  id        TEXT PRIMARY KEY,       -- 'imessage', 'browser-chrome', 'files', etc.
  enabled   INTEGER DEFAULT 1,
  last_sync INTEGER,                -- unix timestamp
  cursor    TEXT,                   -- connector-specific sync cursor (message ID, file mtime, etc.)
  config    TEXT,                   -- JSON: connector-specific config
  stats     TEXT                    -- JSON: { totalIngested, lastRunDuration, errors }
);
```

### Indexes

```sql
CREATE INDEX idx_entities_name ON entities(name);
CREATE INDEX idx_entities_type ON entities(type);
CREATE INDEX idx_relationships_source ON relationships(source_id);
CREATE INDEX idx_relationships_target ON relationships(target_id);
CREATE INDEX idx_relationships_valid ON relationships(valid_from, valid_until);
CREATE INDEX idx_sources_type ON sources(type);
CREATE INDEX idx_connectors_last_sync ON connectors(last_sync);
```

**Note:** Vector indexes (IVF-PQ) and BM25 full-text indexes are managed by LanceDB, not SQLite. This eliminates the need for sqlite-vec, FTS5, and manual RRF fusion — LanceDB's hybrid search handles it natively.

---

## 5. Ingestion Pipeline

### 5.1 Content Extraction

Every source goes through extraction first:

| Source | Extraction Method |
|--------|------------------|
| Conversations | Extract text from message objects (skip tool results optionally) |
| Files (.md) | Read directly |
| Files (.pdf) | `pdf-parse` or `pdfjs-dist` |
| Files (.ts/.js/.py/etc.) | Read directly, tag with language |
| URLs | Tavily extract API or `@mozilla/readability` + `jsdom` |
| iMessage | Read `~/Library/Messages/chat.db` SQLite directly |
| Browser history | Read Chrome/Firefox/Safari SQLite databases |
| Apple Notes | AppleScript bridge to read note contents |
| Clipboard | macOS `pbpaste` or clipboard monitoring daemon |

### 5.2 Smart Chunking

**Not fixed-size.** Three chunking strategies depending on content type:

**Semantic chunking (prose, conversations):**
- Split at paragraph boundaries
- Use embedding similarity between adjacent paragraphs
- If similarity drops below threshold → chunk boundary
- Target: 200-600 tokens per chunk
- Preserves natural semantic units

**AST-aware chunking (code):**
- Use tree-sitter to parse into an AST
- Extract semantic entities: functions, classes, interfaces, types
- Build a scope tree (preserve nesting: `UserService > getUser`)
- Split at entity boundaries, not arbitrary line counts
- Prepend the scope chain as context
- Supported: TypeScript, JavaScript, Python, Rust, Go, Java

**Recursive character chunking (fallback):**
- For content that doesn't fit the above categories
- Split at sentence → paragraph → section boundaries
- Fixed ~400-token chunks with 80-token overlap
- Matches the current context-embeddings approach (backward compatible)

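The semantic strategy reduces to one decision per adjacent-paragraph pair. A minimal sketch with cosine similarity; the 0.6 threshold and the `embed` stub are illustrative assumptions, not values from this spec:

```typescript
// Semantic chunking sketch: cut between adjacent paragraphs whose embedding
// similarity falls below a threshold. `embed` stands in for the real
// Ollama/OpenAI call; the 0.6 threshold is an assumption.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function semanticChunks(
  paragraphs: string[],
  embed: (p: string) => number[],
  threshold = 0.6,
): string[][] {
  const chunks: string[][] = [];
  let current: string[] = [];
  let prev: number[] | null = null;
  for (const p of paragraphs) {
    const v = embed(p);
    if (prev && cosine(prev, v) < threshold) {
      chunks.push(current);   // similarity dropped → start a new chunk
      current = [];
    }
    current.push(p);
    prev = v;
  }
  if (current.length) chunks.push(current);
  return chunks;
}
```

A production version would also enforce the 200-600 token target by merging tiny chunks and splitting oversized ones.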
### 5.3 Contextual Enrichment (Anthropic's Approach)

Before embedding, each chunk gets a context prefix generated by an LLM:

```
Input chunk: "We fixed the apiKey issue by leaving remote: {} empty"

Context prefix: "This chunk is from a conversation about OpenClaw memory search
configuration. It describes the fix for a bug where putting an apiKey in
memorySearch.remote blocked the environment variable fallback. The solution was
to leave the remote object empty and let the op-secrets plugin set
process.env.OPENAI_API_KEY from 1Password."

Stored text: [context prefix] + [original chunk]
```

This dramatically improves retrieval quality by making chunks self-contained. Claude Haiku generates the prefixes for cost efficiency (~$0.001 per chunk).

### 5.4 Embedding

- **Primary:** Local via Ollama — `nomic-embed-text-v1.5` (768 dimensions, 8K-token context, beats text-embedding-3-small on MTEB, Matryoshka support for dimensionality reduction)
- **Fallback:** OpenAI `text-embedding-3-small` (1536 dimensions) for users without Ollama
- **CPU-only fallback:** `all-MiniLM-L6-v2` via ONNX (384 dimensions, only 256-token context, but ~15ms/chunk on CPU)
- **Batch processing:** Embed in batches of 100 to minimize overhead
- Embeddings stored in LanceDB (Apache Arrow columnar format, memory-mapped for near in-memory speed)

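The batching step is provider-agnostic. A sketch of the batch split, with `embedBatch` as a stand-in for the real Ollama or OpenAI request:

```typescript
// Split texts into batches of 100 and embed each batch in one provider call.
// `embedBatch` is a stub for the real Ollama/OpenAI round-trip.
const BATCH_SIZE = 100;

async function embedAll(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (let i = 0; i < texts.length; i += BATCH_SIZE) {
    const batch = texts.slice(i, i + BATCH_SIZE);
    vectors.push(...await embedBatch(batch));   // one round-trip per 100 chunks
  }
  return vectors;
}
```

Order is preserved, so vector *i* always belongs to text *i*, which is what the ingestion pipeline relies on when writing chunk rows.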
### 5.5 Fact Extraction (mem0-style)

After chunking + embedding, an LLM extracts structured facts:

```
Input: "Parker founded WIP.computer in 2026. The company has three products:
Lesa (agent service), LYLA (token system), and an unnamed music player."

Extracted memories:
- "Parker Todd Brooks founded WIP.computer in 2026" [fact]
- "WIP.computer has three products: Lesa, LYLA, and an unnamed music player" [fact]
- "LYLA is the token/currency system for the WIP.computer ecosystem" [fact]

Extracted entities:
- Parker Todd Brooks [person]
- WIP.computer [company]
- Lesa [product]
- LYLA [product]

Extracted relationships:
- Parker Todd Brooks --founded--> WIP.computer
- WIP.computer --has_product--> Lesa
- WIP.computer --has_product--> LYLA
- Lesa --type--> agent_service
- LYLA --type--> token_system
```

### 5.6 Memory Operations (ADD / UPDATE / DELETE / NOOP)

When new facts are extracted, they're compared against existing memories:

1. **Embed** the new fact
2. **Search** for semantically similar existing memories (top-5)
3. **LLM decides** which operation to apply:
   - **ADD** — No equivalent memory exists. Create new.
   - **UPDATE** — Existing memory has related but incomplete info. Merge.
   - **DELETE** — New info contradicts existing memory. Mark old as deprecated, create new.
   - **NOOP** — Memory already captured. Skip.

This is how memory evolves: "Parker uses Sonnet" gets updated to "Parker uses Sonnet as primary, with Opus for complex tasks" without creating duplicates.

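A control-flow sketch of the four operations. The LLM decision itself is elided (it would pick the `op`); types and status values follow the `memories` schema, while the function and field names here are illustrative:

```typescript
// Apply one LLM-chosen memory operation to an in-memory store. In the plugin
// this would write to LanceDB/SQLite; the array stands in for that store.
type MemoryOp = "ADD" | "UPDATE" | "DELETE" | "NOOP";

interface Memory {
  id: string;
  text: string;
  status: "active" | "deprecated" | "deleted";
  supersedes?: string;
}

function applyOp(
  op: MemoryOp,
  fact: string,
  match: Memory | null,   // closest existing memory, if any
  store: Memory[],
  newId: string,
): void {
  switch (op) {
    case "ADD":
      store.push({ id: newId, text: fact, status: "active" });
      break;
    case "UPDATE":
      if (match) match.text = fact;    // merged text comes from the LLM
      break;
    case "DELETE":
      if (match) {
        match.status = "deprecated";   // keep the old fact for provenance
        store.push({ id: newId, text: fact, status: "active", supersedes: match.id });
      }
      break;
    case "NOOP":
      break;                           // already captured
  }
}
```

Note that DELETE never physically removes a row: the old memory is deprecated and linked via `supersedes`, preserving history.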
---

## 6. Search & Retrieval

### 6.1 Hybrid Search

Every query runs through three stages (the first two can run in parallel):

1. **Hybrid search (LanceDB)** — A single query combining ANN vector search (IVF-PQ) + BM25 keyword search. LanceDB handles fusion natively — no manual RRF needed. Top-20.
2. **Graph traversal** — Find related entities, walk relationships in SQLite, gather connected memories
3. **Merge** — LanceDB results + graph-augmented context combined via Reciprocal Rank Fusion:

```
score(doc) = Σ 1/(k + rank_i(doc))   where k = 60
```

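The merge step's RRF formula, written out (k = 60 as specified; document IDs stand in for full result objects):

```typescript
// Reciprocal Rank Fusion: score(doc) = Σ 1/(k + rank_i(doc)) over all ranked
// lists the doc appears in. Ranks are 1-based; absence contributes nothing.
function rrf(rankings: string[][], k = 60): Map<string, number> {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((doc, idx) => {
      scores.set(doc, (scores.get(doc) ?? 0) + 1 / (k + idx + 1));
    });
  }
  return scores;
}

function fuse(rankings: string[][], k = 60): string[] {
  return [...rrf(rankings, k).entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([doc]) => doc);
}
```

A document ranked #2 in both lists beats one ranked #1 in only a single list, which is exactly the consensus behavior the merge wants.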
### 6.2 Query Rewriting

Before searching, the raw query is optionally rewritten:

- **Decomposition:** "What did Parker and I discuss about music and tokens?" → ["Parker music player vision", "LYLA token system", "UHI framework"]
- **HyDE:** Generate a hypothetical answer and embed that instead of the question (better for finding factual matches)

### 6.3 Reranking

After fusion, top candidates are reranked:

- **Option A (default):** Local cross-encoder model (`ms-marco-MiniLM-L-6-v2` via ONNX Runtime) — fast (~5ms per candidate), free, runs on CPU, significant quality improvement
- **Option B:** LLM-based reranking (Claude Haiku scores relevance 0-10) — higher quality, costs ~$0.0001/query
- **Option C:** No reranking (for speed; acceptable at small scale)

Default: Option A for all queries. Option B is available via the `reranking: "haiku"` config.

### 6.4 Graph-Augmented Retrieval

For entity-rich queries, the knowledge graph augments results:

1. Extract entities from the query
2. Find matching entity nodes
3. Traverse 1-2 hops of relationships
4. Include connected memories as additional context

This turns "tell me about WIP.computer" into a structured answer with products, people, philosophy, and status.

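The 1-2 hop traversal is a bounded breadth-first search over `relationships` edges. A sketch over an in-memory edge list (the real version would query SQLite; the names here are illustrative):

```typescript
// Bounded BFS over knowledge-graph edges, treating edges as undirected for
// retrieval. In the plugin this would read the SQLite `relationships` table.
interface Edge { sourceId: string; targetId: string; type: string }

function neighborhood(start: string, edges: Edge[], maxHops = 2): Set<string> {
  const seen = new Set<string>([start]);
  let frontier = [start];
  for (let hop = 0; hop < maxHops; hop++) {
    const next: string[] = [];
    for (const id of frontier) {
      for (const e of edges) {
        const other =
          e.sourceId === id ? e.targetId : e.targetId === id ? e.sourceId : null;
        if (other && !seen.has(other)) {
          seen.add(other);
          next.push(other);
        }
      }
    }
    frontier = next;
  }
  return seen;
}
```

Memories whose evidence touches any node in the returned set become the "connected memories" added to the context.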
---

## 7. Connectors

### 7.1 Conversations (Primary — replaces context-embeddings)

**Hook:** `agent_end` (fires after every agent turn)

**Behavior:**
- Same as the current context-embeddings plugin, but with smart chunking + fact extraction
- Captures user/assistant turns, skips tool results (configurable)
- Extracts facts and updates the knowledge graph after each turn
- Tracks per-session capture state to avoid re-processing

**Migration:** Import existing `context-embeddings.sqlite` chunks on first run.

### 7.2 iMessage History

**Source:** `~/Library/Messages/chat.db` (SQLite, read-only)

**Behavior:**
- Reads the `message` table joined with `chat` and `handle`
- Filters by date range and/or chat ID
- Extracts text content (handles the attributedBody NSKeyedArchiver format)
- Groups by conversation thread
- Incremental sync via a `message.ROWID` cursor

**Privacy:** Only indexes conversations with Parker (configurable handle filter). Does NOT index group chats by default.

**Note:** Requires Full Disk Access permission for the process reading chat.db.

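One chat.db detail worth calling out for the connector: on recent macOS versions `message.date` is stored as nanoseconds since the Apple epoch (2001-01-01 UTC), not the Unix epoch. A conversion helper:

```typescript
// chat.db stores message.date as nanoseconds since 2001-01-01 UTC (the Apple
// / Core Data epoch). The two epochs differ by 978,307,200 seconds.
const APPLE_EPOCH_OFFSET_MS = 978_307_200_000;

function appleNsToUnixMs(appleNs: number): number {
  return appleNs / 1_000_000 + APPLE_EPOCH_OFFSET_MS;
}
```

Older macOS releases stored this column in whole seconds rather than nanoseconds, so a robust connector should sniff the magnitude before converting.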
### 7.3 Local Files

**Source:** Configurable directory paths (default: `~/.openclaw/workspace/`, `~/Documents/`)

**Behavior:**
- Watches for `.md`, `.txt`, `.pdf`, `.ts`, `.js`, `.py`, `.json` files
- Hashes files to detect changes (skips unchanged files)
- Re-indexes on modification
- Respects `.gitignore` and custom exclude patterns
- Code files use AST-aware chunking

**Incremental:** File mtime-based cursor.

### 7.4 Browser History & Bookmarks

**Source:** Chrome (`~/Library/Application Support/Google/Chrome/Default/History`), Firefox (`~/Library/Application Support/Firefox/Profiles/*/places.sqlite`), Safari (`~/Library/Safari/History.db`)

**Behavior:**
- Reads URL + title + visit timestamp
- Optionally fetches and extracts content from frequently visited URLs
- Bookmarks are indexed with higher weight
- Incremental via a visit-timestamp cursor

**Note:** Chrome locks its History file while running. Copy it to a temp location first.

### 7.5 Apple Notes

**Source:** AppleScript bridge (`osascript`)

**Behavior:**
- Lists all notes via `tell application "Notes" to get every note`
- Extracts title + body (HTML → markdown)
- Incremental via modification date

### 7.6 Clipboard History

**Source:** macOS pasteboard

**Behavior:**
- Optional: runs a lightweight daemon that polls `pbpaste` every N seconds
- Only captures text content over 50 characters
- Deduplicates against recent captures
- Useful for capturing URLs, code snippets, and notes copied from other apps

### 7.7 Web / URL Ingestion

**Source:** Manual URL submission or Tavily extract

**Behavior:**
- Agent calls the `crystal_ingest_url` tool with a URL
- Content is extracted via the Tavily extract API (if available) or `@mozilla/readability`
- Chunked, embedded, facts extracted
- Source tracked with the URL for provenance

---

## 8. Agent Tools

### `crystal_search`

The primary search tool. Replaces `conversation_search` and `memory_search`.

```
Parameters:
  query: string        — Natural language query
  scope?: string[]     — Filter: ['conversations', 'documents', 'notes', 'web', 'messages']
  limit?: number       — Max results (default: 10)
  time_range?: string  — 'today', 'week', 'month', 'all' (default: 'all')
  include_graph?: bool — Include knowledge graph context (default: true)

Returns:
  results: Array<{
    text: string,
    source: { type, uri, title, timestamp },
    score: number,
    related_memories?: string[],
    graph_context?: string
  }>
```

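The same contract expressed as TypeScript types, plus a defaults helper; the interface and function names are illustrative, while the field names and defaults follow the spec above:

```typescript
// Typed sketch of the crystal_search parameter contract. Names are
// illustrative; fields and defaults mirror the spec above.
interface CrystalSearchParams {
  query: string;
  scope?: ("conversations" | "documents" | "notes" | "web" | "messages")[];
  limit?: number;                                   // default: 10
  time_range?: "today" | "week" | "month" | "all";  // default: 'all'
  include_graph?: boolean;                          // default: true
}

function withDefaults(p: CrystalSearchParams) {
  return {
    ...p,
    limit: p.limit ?? 10,
    time_range: p.time_range ?? "all",
    include_graph: p.include_graph ?? true,
  };
}
```

Centralizing the defaults keeps the agent tool, MCP server, and CLI front ends consistent with one another.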
### `crystal_remember`

Store a new memory or fact explicitly.

```
Parameters:
  text: string      — The memory/fact to store
  category?: string — 'fact', 'preference', 'event', 'opinion'

Returns:
  operation: 'added' | 'updated' | 'duplicate'
  memory_id: string
```

### `crystal_forget`

Mark a memory as deprecated.

```
Parameters:
  query: string  — Description of what to forget
  confirm?: bool — Require confirmation (default: true)

Returns:
  forgotten: number — Count of memories deprecated
```

### `crystal_ingest_url`

Ingest a web page into memory.

```
Parameters:
  url: string
  title?: string

Returns:
  chunks: number
  memories: number
  entities: number
```

### `crystal_graph`

Query the knowledge graph directly.

```
Parameters:
  entity: string — Entity name or description
  depth?: number — Relationship traversal depth (default: 2)

Returns:
  entity: { name, type, description, properties }
  relationships: Array<{ target, type, description }>
  connected_memories: string[]
```

### `crystal_status`

Show memory system stats.

```
Returns:
  total_chunks: number
  total_memories: number
  total_entities: number
  total_relationships: number
  database_size: string
  connectors: Array<{ id, enabled, last_sync, total_ingested }>
```

---
|
|
612
|
+
|
|
613
|
+
## 9. MCP Server
|
|
614
|
+
|
|
615
|
+
Replaces the current `lesa-bridge` MCP server with a superset of tools:
|
|
616
|
+
|
|
617
|
+
| Current lesa-bridge tool | Memory Crystal replacement |
|
|
618
|
+
|-------------------------|---------------------------|
|
|
619
|
+
| `lesa_conversation_search` | `crystal_search` (scope: conversations) |
|
|
620
|
+
| `lesa_memory_search` | `crystal_search` (scope: documents, notes) |
|
|
621
|
+
| `lesa_read_workspace` | `crystal_search` + direct file read |
|
|
622
|
+
|
|
623
|
+
**New MCP tools:**
|
|
624
|
+
- `crystal_search` — Unified hybrid search
|
|
625
|
+
- `crystal_remember` — Store memories
|
|
626
|
+
- `crystal_forget` — Deprecate memories
|
|
627
|
+
- `crystal_ingest_url` — Ingest web content
|
|
628
|
+
- `crystal_graph` — Knowledge graph queries
|
|
629
|
+
- `crystal_status` — System stats
|
|
630
|
+
|
|
631
|
+
**MCP Resources** (automatic context injection):
|
|
632
|
+
- `memory://recent` — Last 24h of memories (injected at conversation start)
|
|
633
|
+
- `memory://graph` — Full knowledge graph summary
|
|
634
|
+
- `memory://entity/{name}` — Everything known about a specific entity

Registered via `claude mcp add` at user scope, available in all Claude Code sessions.
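The `memory://` resource scheme above implies a small URI router inside the server. A hypothetical parser — the type names are illustrative, not the server's actual code:

```typescript
type MemoryResource =
  | { kind: "recent" }
  | { kind: "graph" }
  | { kind: "entity"; name: string };

// Parse a memory:// URI into a typed resource request, or null if unrecognized.
function parseMemoryUri(uri: string): MemoryResource | null {
  const m = /^memory:\/\/(.+)$/.exec(uri);
  if (!m) return null;
  const path = m[1];
  if (path === "recent") return { kind: "recent" };
  if (path === "graph") return { kind: "graph" };
  const entity = /^entity\/(.+)$/.exec(path);
  if (entity) return { kind: "entity", name: decodeURIComponent(entity[1]) };
  return null;
}
```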

---

## 10. CLI Commands

```bash
# Status and diagnostics
openclaw crystal status                 # Show stats, connector status, DB size
openclaw crystal search "query"         # Search from command line
openclaw crystal search "query" --scope conversations --limit 5

# Ingestion
openclaw crystal ingest <file>          # Ingest a file or directory
openclaw crystal ingest <url>           # Ingest a URL
openclaw crystal ingest --all           # Run all enabled connectors

# Connectors
openclaw crystal connectors             # List all connectors and status
openclaw crystal connectors enable imessage
openclaw crystal connectors disable clipboard
openclaw crystal connectors sync imessage  # Force sync a connector
openclaw crystal connectors sync --all     # Sync all

# Knowledge graph
openclaw crystal graph "Parker"         # Show entity and relationships
openclaw crystal graph --stats          # Graph statistics

# Migration
openclaw crystal migrate                # Import from context-embeddings.sqlite

# Maintenance
openclaw crystal compact                # Remove deprecated memories, optimize DB
openclaw crystal export                 # Export all memories as JSON (backup)
openclaw crystal import <file>          # Import from backup
```

---

## 11. Migration Strategy

### Phase 1: Replace context-embeddings (backward compatible)

1. Import all 2,673 chunks from `context-embeddings.sqlite` into LanceDB + `crystal.db`
2. Register same `agent_end` hook for conversation capture
3. Provide same `conversation_search` tool (aliased to `crystal_search`)
4. Disable old `context-embeddings` plugin
5. **Zero loss of existing data**

### Phase 2: Add hybrid search + fact extraction

6. LanceDB IVF-PQ indexing on all chunks (auto-enabled above threshold)
7. LanceDB built-in BM25 full-text indexing
8. Enable fact extraction pipeline (memories table)
9. Run fact extraction over existing chunks (batch job)
10. Add `crystal_remember` and `crystal_forget` tools

### Phase 3: Knowledge graph

11. Enable entity + relationship extraction
12. Build graph from existing memories
13. Add `crystal_graph` tool
14. Add graph-augmented retrieval to search

### Phase 4: Connectors

15. Enable file watcher (workspace + Documents)
16. Enable iMessage connector
17. Enable browser history connector
18. Enable Apple Notes connector
19. Optional: clipboard daemon

### Phase 5: MCP server + external access

20. Build MCP server (replaces lesa-bridge)
21. Register with Claude Code
22. Add contextual enrichment to ingestion pipeline
23. Add query rewriting to search

---

## 12. Technical Decisions

### Why LanceDB + SQLite (not Postgres/pgvector, not single SQLite)

Research evaluated five vector stores (see [RESEARCH.md](./RESEARCH.md) §1). LanceDB won:

- **Embedded library** — `npm install @lancedb/lancedb`, no server process. Same deployment model as SQLite.
- **Native hybrid search** — BM25 + vector in one query, built-in. No manual FTS5/sqlite-vec/RRF plumbing.
- **Scales to 1M+** — IVF-PQ indexing keeps query times flat (~25ms at 1M). sqlite-vec degrades linearly (~200ms at 1M).
- **Disk-efficient** — Apache Arrow columnar format, memory-mapped. Near in-memory speed from disk.
- **Used by Continue IDE** — Proven for exactly this use case (code + conversation memory).

SQLite handles the knowledge graph and metadata — it's better at relational queries, joins, and recursive CTEs for graph traversal.

**Why not single-file SQLite + sqlite-vec?** sqlite-vec is brute-force only (no ANN indexing). Performance is linear with dataset size. Acceptable at 10K chunks (~2ms) but unacceptable at 100K+ (~75ms). LanceDB stays flat. Also, sqlite-vec requires manual RRF fusion between separate vec0 and FTS5 queries — LanceDB does this natively.
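For reference, the manual RRF fusion that LanceDB makes unnecessary looks roughly like this (standard Reciprocal Rank Fusion; k = 60 is the conventional constant, and the id-list shape is illustrative):

```typescript
// Fuse two ranked result lists (ids ordered best-first) with Reciprocal Rank Fusion.
// Each list contributes 1 / (k + rank + 1) per id; ids in both lists rise to the top.
function rrfFuse(vectorIds: string[], bm25Ids: string[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const list of [vectorIds, bm25Ids]) {
    list.forEach((id, rank) => {
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id]) => id);
}
```

Items found by both retrievers dominate: with sqlite-vec this fusion (and the two underlying queries) is hand-written plumbing; with LanceDB it is one hybrid query.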

### Why mem0-style operations (not just append)

- Memories need to evolve. "Parker's primary model is Sonnet" should update, not duplicate.
- LLM-based ADD/UPDATE/DELETE/NOOP is proven (mem0 paper: arxiv.org/html/2504.19413v1)
- Keeps memory corpus clean and current
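The ADD/UPDATE/DELETE/NOOP contract can be sketched as a reducer over the memory store. The LLM decides which operation applies; this only shows how the decision is applied, with illustrative types (soft-delete matching `crystal_forget` is an assumption):

```typescript
interface Memory {
  id: string;
  text: string;
  deprecated?: boolean;
}

type MemoryOp =
  | { op: "ADD"; text: string }
  | { op: "UPDATE"; id: string; text: string }
  | { op: "DELETE"; id: string }
  | { op: "NOOP" };

// Apply one LLM-decided operation to the memory list, returning a new list.
function applyOp(memories: Memory[], op: MemoryOp, nextId: () => string): Memory[] {
  switch (op.op) {
    case "ADD":
      return [...memories, { id: nextId(), text: op.text }];
    case "UPDATE":
      // Evolve in place: no duplicate row for a changed fact.
      return memories.map((m) => (m.id === op.id ? { ...m, text: op.text } : m));
    case "DELETE":
      // Soft-delete: mark deprecated rather than removing.
      return memories.map((m) => (m.id === op.id ? { ...m, deprecated: true } : m));
    case "NOOP":
      return memories;
  }
}
```

This is exactly the "update, not duplicate" behavior: changing Parker's primary model rewrites the existing fact.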

### Why Anthropic-style contextual enrichment

- Raw chunks out of context are ambiguous ("We fixed the apiKey issue" — which apiKey?)
- Prepending a context summary reduces retrieval failure rates by 35-67% (Anthropic's research)
- Cost is minimal: ~$0.001 per chunk with Haiku
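Mechanically, enrichment just prepends the model-written situating summary to the chunk before embedding. A sketch: the Haiku call that produces the summary is out of scope here, and the blank-line separator is an assumption:

```typescript
// Combine a raw chunk with its LLM-generated context summary, e.g.
// "From a conversation about the op-secrets plugin, 2026-02-05."
function enrichChunk(chunk: string, contextSummary: string): string {
  return `${contextSummary.trim()}\n\n${chunk}`;
}
```

The enriched text is what gets embedded and indexed; the ambiguous "apiKey issue" chunk now carries the context that disambiguates it.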

### Why tree-sitter for code chunking

- Fixed-size chunks split functions mid-body, losing semantic meaning
- AST-aware chunking keeps complete functions/classes together
- Scope chain prepended means "getUser" is searchable as "UserService.getUser"
- Supermemory's `code-chunk` library proves this approach works
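The scope-chain idea can be sketched independently of tree-sitter: given the enclosing scopes the parser recovered for a chunk, prepend the qualified path so search matches the full name (the comment format is illustrative):

```typescript
// Prefix a code chunk with its enclosing scope chain, so the chunk for
// getUser() is retrievable by the qualified name "UserService.getUser".
function withScopeChain(scopes: string[], chunk: string): string {
  return `// scope: ${scopes.join(".")}\n${chunk}`;
}
```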

### Why not a separate knowledge graph database (FalkorDB, Neo4j)

- Overkill for single-user, local memory
- SQLite tables with proper indexes handle graph queries fine
- Entity + relationship tables with recursive CTEs = basic graph traversal
- No additional infrastructure to manage
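The recursive-CTE traversal is equivalent to a depth-limited breadth-first walk; a sketch over an in-memory edge list, with the SQL shape it mirrors as a comment (the `relationships` table and column names are assumptions about the schema):

```typescript
interface Edge {
  source: string;
  target: string;
  type: string;
}

// SQL shape this mirrors (schema names assumed):
//   WITH RECURSIVE walk(entity, depth) AS (
//     SELECT :start, 0
//     UNION
//     SELECT r.target, w.depth + 1
//     FROM relationships r JOIN walk w ON r.source = w.entity
//     WHERE w.depth < :maxDepth
//   ) SELECT DISTINCT entity FROM walk;
function traverse(edges: Edge[], start: string, maxDepth = 2): Set<string> {
  const seen = new Set<string>([start]);
  let frontier = [start];
  for (let d = 0; d < maxDepth && frontier.length > 0; d++) {
    const next: string[] = [];
    for (const e of edges) {
      if (frontier.includes(e.source) && !seen.has(e.target)) {
        seen.add(e.target);
        next.push(e.target);
      }
    }
    frontier = next;
  }
  return seen;
}
```

With `maxDepth = 2` (the `crystal_graph` default), two hops of relationships are returned and anything farther is excluded.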

---

## 13. Dependencies

```json
{
  "dependencies": {
    "@lancedb/lancedb": "^0.10.0",
    "better-sqlite3": "^11.0.0",
    "@1password/sdk": "^0.3.1",
    "openai": "^4.0.0",
    "nanoid": "^5.0.0",
    "@anthropic-ai/sdk": "^0.30.0",
    "@modelcontextprotocol/sdk": "^1.0.0",
    "zod": "^3.22.0",
    "apache-arrow": "^17.0.0"
  },
  "optionalDependencies": {
    "@supermemory/code-chunk": "^1.0.0",
    "onnxruntime-node": "^1.18.0",
    "pdf-parse": "^1.1.1",
    "@mozilla/readability": "^0.5.0",
    "jsdom": "^24.0.0",
    "chokidar": "^4.0.0"
  },
  "devDependencies": {
    "tsup": "^8.0.0",
    "typescript": "^5.4.0"
  }
}
```

---

## 14. Configuration

Plugin config in `openclaw.json`:

```json
{
  "plugins": {
    "entries": {
      "memory-crystal": {
        "enabled": true,
        "config": {
          "dataDir": "~/.openclaw/memory-crystal",
          "embedding": {
            "provider": "ollama",
            "model": "nomic-embed-text",
            "dimensions": 768,
            "ollamaBaseUrl": "http://localhost:11434",
            "fallback": "openai"
          },
          "enrichment": {
            "enabled": true,
            "model": "claude-haiku"
          },
          "connectors": {
            "conversations": { "enabled": true },
            "files": {
              "enabled": true,
              "paths": ["~/.openclaw/workspace/", "~/Documents/"],
              "extensions": [".md", ".txt", ".pdf"],
              "exclude": ["node_modules", ".git", "dist"]
            },
            "imessage": {
              "enabled": false,
              "handles": []
            },
            "browser": {
              "enabled": false,
              "browsers": ["chrome"]
            },
            "apple_notes": { "enabled": false },
            "clipboard": { "enabled": false }
          },
          "search": {
            "hybridWeight": { "vector": 0.7, "bm25": 0.3 },
            "reranking": "haiku",
            "maxResults": 10
          },
          "factExtraction": {
            "enabled": true,
            "model": "claude-haiku"
          },
          "graph": {
            "enabled": true,
            "extractEntities": true,
            "traversalDepth": 2
          }
        }
      }
    }
  }
}
```

---

## 15. Performance Targets

| Metric | Target | Notes |
|--------|--------|-------|
| Search latency (10K chunks) | < 30ms | LanceDB IVF-PQ + built-in BM25 |
| Search latency (100K chunks) | < 50ms | IVF-PQ scales sublinearly (stays flat) |
| Search latency (1M chunks) | < 50ms | Same indexing, memory-mapped from disk |
| Ingestion throughput (local embed) | 100 chunks/sec | Ollama nomic-embed ~50ms/chunk on GPU |
| Ingestion throughput (API embed) | 50 chunks/sec | Bottleneck: OpenAI API |
| Fact extraction | 10 facts/sec | Bottleneck: LLM API |
| Database size per 10K chunks | ~30MB | 768-dim × 4 bytes × 10K = ~30MB + Arrow overhead |
| Startup time | < 2 seconds | LanceDB + SQLite connection + connector init |
| Memory overhead | < 80MB RSS | LanceDB memory-maps Arrow files |
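The database-size row can be checked directly; raw float32 vectors alone account for roughly 30 MB per 10K chunks, before Arrow metadata and indexes:

```typescript
// Raw vector payload: dimensions × 4 bytes (float32) × chunk count, in MB (10^6 bytes).
function vectorMegabytes(dims: number, chunks: number): number {
  return (dims * 4 * chunks) / 1_000_000;
}

const mb = vectorMegabytes(768, 10_000); // 30.72 MB of raw vectors
```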

---

## 16. Security & Privacy

- **All data local.** Database never leaves the machine.
- **API keys from 1Password.** OpenAI key resolved via op-secrets plugin at startup.
- **iMessage requires Full Disk Access.** User must explicitly grant this.
- **No telemetry.** Zero data sent anywhere except embedding/LLM API calls.
- **Connector isolation.** Each connector only accesses its declared source.
- **Sensitive content.** Clipboard and iMessage connectors disabled by default.
- **Backup.** Two stores: `crystal.db` (SQLite) + `lance/` directory. Copy the `~/.openclaw/memory-crystal/` directory. Automatic daily backups to `backups/`.

---

## 17. Relationship to WIP.computer

Memory Crystal is infrastructure for Lesa specifically, but the patterns apply to WIP.computer's broader vision:

- **Namespace + Memory:** When agents have perfect recall, they can maintain creator identity context across every interaction
- **Creation-time attribution:** If the memory system tracks what influenced a creation, the rubric has better data for fair splits
- **Agent marketplace:** Trained agents (like "Debbie") need persistent memory of their training — Crystal is that layer
- **Multi-agent isolation:** Each spawned agent gets its own memory crystal (separate DB file), with privacy boundaries by design

---

## 18. Development Phases & Effort Estimates

### Phase 1: Core + Migration (Week 1)
- [ ] Project scaffold (package.json, tsconfig, plugin manifest)
- [ ] LanceDB setup + table creation (chunks, memories, entity_embeddings)
- [ ] SQLite schema + migrations (entities, relationships, sources, connectors)
- [ ] Embedding provider (Ollama nomic-embed primary, OpenAI fallback, 1Password resolution)
- [ ] Basic chunking (current approach, backward compatible)
- [ ] Import from context-embeddings.sqlite (re-embed with nomic-embed)
- [ ] LanceDB hybrid search (built-in vector + BM25)
- [ ] `crystal_search` agent tool
- [ ] `crystal_status` agent tool + CLI
- [ ] Conversation connector (agent_end hook)
- [ ] Disable context-embeddings, enable memory-crystal

### Phase 2: Intelligence (Week 2)
- [ ] Fact extraction pipeline (LLM-based)
- [ ] Memory operations (ADD/UPDATE/DELETE/NOOP)
- [ ] `crystal_remember` + `crystal_forget` tools
- [ ] Semantic chunking for prose
- [ ] Contextual enrichment (Anthropic approach)
- [ ] Query rewriting

### Phase 3: Knowledge Graph (Week 3)
- [ ] Entity extraction
- [ ] Relationship extraction
- [ ] Graph storage + traversal queries
- [ ] Graph-augmented search
- [ ] `crystal_graph` tool

### Phase 4: Connectors (Week 3-4)
- [ ] File watcher connector
- [ ] iMessage connector
- [ ] Browser history connector
- [ ] Apple Notes connector
- [ ] Clipboard connector
- [ ] Web/URL connector
- [ ] Connector management CLI

### Phase 5: MCP + Polish (Week 4)
- [ ] MCP server (replaces lesa-bridge)
- [ ] CLI refinement
- [ ] AST-aware code chunking (tree-sitter)
- [ ] Reranking
- [ ] Performance optimization
- [ ] Documentation

---

## 19. Success Criteria

1. **"Crystal, what did Parker and I discuss about LYLA tokens?"** returns accurate, sourced results from conversations that happened weeks ago — even if they were compacted from Lesa's context window.

2. **Memory evolves.** If Parker changes his primary model from Sonnet to Opus, Crystal updates the fact — not duplicates it.

3. **Cross-source connections.** A search for "music player" returns results from conversations, WIP.computer docs, and browser history of relevant articles, with the knowledge graph showing connections to the UHI framework and LYLA.

4. **Zero manual maintenance.** Connectors run automatically. Facts extract automatically. The graph builds automatically. Parker never has to manually add anything.

5. **Sub-second search.** Even at 100K+ chunks, search returns in under 300ms.

6. **Total sovereignty.** Everything in one directory (`~/.openclaw/memory-crystal/`). Back it up, move it, encrypt it. No cloud dependency. No external servers.

---

## 20. Open Questions

1. ~~**sqlite-vec vs. vectorlite vs. raw BLOB cosine**~~ **RESOLVED:** LanceDB. See [RESEARCH.md](./RESEARCH.md) §1 for full comparison. Built-in hybrid search, IVF-PQ indexing, native TS SDK.

2. ~~**Local embedding models**~~ **RESOLVED:** nomic-embed-text-v1.5 via Ollama as default. Beats text-embedding-3-small on MTEB benchmarks, 8K token context, free. OpenAI as fallback. See [RESEARCH.md](./RESEARCH.md) §1.

3. **Enrichment cost** — At 50K chunks, contextual enrichment via Haiku costs ~$50. Worth it? Could batch-process in background.

4. **iMessage NSKeyedArchiver format** — Parsing `attributedBody` is non-trivial. May need a Swift helper or existing library.

5. **Code chunking approach** — Research recommends `@supermemory/code-chunk` (tree-sitter, TypeScript native) over raw tree-sitter bindings. Avoids native module issues. Need to verify it works with OpenClaw's module loader.

6. **Memory Crystal as name** — Parker to confirm. Alternatives: "Recall", "Engram", "Mnemon", "Crystal Memory", "Total Recall" (lol).

7. ~~**Relationship to context-embeddings**~~ **RESOLVED:** Full replacement. Migration path defined in §11.

8. **Re-embedding existing data** — The 2,673 existing chunks use text-embedding-3-small (1536 dim). Switching to nomic-embed (768 dim) means re-embedding everything during migration. One-time cost, ~2 minutes with Ollama.

9. **LanceDB maturity** — The LanceDB TS SDK is newer than sqlite-vec. Need to monitor for stability issues. Fallback: keep sqlite-vec as a "lite mode" option.

---

## 21. References

### Research & Prior Art
- [Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory](https://arxiv.org/html/2504.19413v1) — Extraction + update pipeline, ADD/UPDATE/DELETE/NOOP
- [Anthropic: Contextual Retrieval](https://www.anthropic.com/news/contextual-retrieval) — Chunk enrichment for better RAG
- [Graphiti: Real-Time Knowledge Graphs for AI Agents](https://github.com/getzep/graphiti) — Temporal knowledge graphs
- [Supermemory](https://supermemory.ai) — Hosted memory API (what we're replacing)
- [Supermemory code-chunk](https://github.com/supermemoryai/code-chunk) — AST-aware code chunking
- [Microsoft GraphRAG](https://github.com/microsoft/graphrag) — Graph-based RAG approach

### Infrastructure
- [LanceDB](https://docs.lancedb.com/) — Embedded vector database with hybrid search (chosen over sqlite-vec)
- [sqlite-vec](https://github.com/asg017/sqlite-vec) — Vector search extension for SQLite (lite-mode fallback)
- [better-sqlite3](https://github.com/WiseLibs/better-sqlite3) — Synchronous SQLite for Node.js
- [nomic-embed-text](https://www.nomic.ai/blog/posts/nomic-embed-matryoshka) — Local embedding model (chosen as primary)
- [@supermemory/code-chunk](https://github.com/supermemoryai/code-chunk) — AST-aware code chunking
- [MCP SDK](https://github.com/modelcontextprotocol/typescript-sdk) — Model Context Protocol
- [ONNX Runtime](https://onnxruntime.ai/) — For local cross-encoder reranking

### Parker's Existing Infrastructure
- `~/.openclaw/extensions/context-embeddings/` — Current conversation embedding plugin
- `~/Documents/Projects/Claude Code/lesa-bridge/` — Current MCP server
- `~/Documents/Projects/Claude Code/openclaw-1password/` — Secret management
- `~/Documents/Projects/OpenClaw/WIP.computer/` — Company vision docs

---

*PRD written: 2026-02-08*
*Author: Claude Code (Opus 4.6) + Parker*
*Project: memory-crystal*
*Status: Draft — awaiting Parker's review*