cozo-memory 1.0.4 → 1.0.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -28,6 +28,8 @@
  - [Development](#development)
  - [User Preference Profiling](#user-preference-profiling-mem0-style)
  - [Troubleshooting](#troubleshooting)
+ - [Roadmap](#roadmap)
+ - [Contributing](#contributing)
  - [License](#license)

  ## Quick Start
@@ -81,6 +83,10 @@ Now you can add the server to your MCP client (e.g. Claude Desktop).

  📦 **Export/Import (since v1.8)** - Export to JSON, Markdown, or Obsidian-ready ZIP; import from Mem0, MemGPT, Markdown, or native format

+ 📄 **PDF Support (since v1.9)** - Direct PDF ingestion with text extraction via pdfjs-dist; accepts either a `file_path` or inline `content`
+
+ 🕐 **Dual Timestamp Format (since v1.9)** - Write operations return timestamps as both Unix microseconds and ISO 8601 strings
+
  ### Detailed Features
  - **Hybrid Search (v0.7 Optimized)**: Combination of semantic search (HNSW), **Full-Text Search (FTS)**, and graph signals, merged via Reciprocal Rank Fusion (RRF).
  - **Full-Text Search (FTS)**: Native CozoDB v0.7 FTS indices with stemming, stopword filtering, and robust query sanitizing (cleaning of `+ - * / \ ( ) ? .`) for maximum stability.
@@ -224,7 +230,11 @@ graph LR

  ### Prerequisites
  - Node.js 20+ (recommended)
- - CozoDB native dependency is installed via `cozo-node`.
+ - **RAM: 1.7 GB minimum** (for default bge-m3 model)
+ - Model download: ~600 MB
+ - Runtime memory: ~1.1 GB
+ - For lower-spec machines, see [Embedding Model Options](#embedding-model-options) below
+ - CozoDB native dependency is installed via `cozo-node`

  ### Via npm (Easiest)

@@ -257,6 +267,62 @@ Notes:
  - On first start, `@xenova/transformers` downloads the embedding model (may take time).
  - Embeddings are processed on the CPU.

+ ### Embedding Model Options
+
+ CozoDB Memory supports multiple embedding models via the `EMBEDDING_MODEL` environment variable:
+
+ | Model | Download Size | RAM | Dimensions | Best For |
+ |-------|---------------|-----|------------|----------|
+ | `Xenova/bge-m3` (default) | ~600 MB | ~1.7 GB | 1024 | High accuracy, production use |
+ | `Xenova/all-MiniLM-L6-v2` | ~80 MB | ~400 MB | 384 | Low-spec machines, development |
+ | `Xenova/bge-small-en-v1.5` | ~130 MB | ~600 MB | 384 | Balanced performance |
+
+ **Configuration Options:**
+
+ **Option 1: Using `.env` file (Easiest for beginners)**
+
+ ```bash
+ # Copy the example file
+ cp .env.example .env
+
+ # Edit .env and set your preferred model
+ EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2
+ ```
+
+ **Option 2: MCP Server Config (For Claude Desktop / Kiro)**
+
+ ```json
+ {
+   "mcpServers": {
+     "cozo-memory": {
+       "command": "npx",
+       "args": ["cozo-memory"],
+       "env": {
+         "EMBEDDING_MODEL": "Xenova/all-MiniLM-L6-v2"
+       }
+     }
+   }
+ }
+ ```
+
+ **Option 3: Command Line**
+
+ ```bash
+ # Use lightweight model for development
+ EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2 npm run start
+ ```
+
+ **Download Model First (Recommended):**
+
+ ```bash
+ # Set model in .env or via command line, then:
+ EMBEDDING_MODEL=Xenova/all-MiniLM-L6-v2 npm run download-model
+ ```
+
+ **Note:** Changing models requires re-embedding existing data, because vectors produced by different models are not comparable. The model is downloaded once on first use.
+
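The table implies two rules. First, an unset `EMBEDDING_MODEL` falls back to `Xenova/bge-m3`. Second, stored vectors stay usable only with the model that produced them. Below is a minimal sketch of both rules, written in the same JavaScript as the package code. Several parts are hypothetical and are not cozo-memory APIs: `resolveEmbeddingModel`, `needsReembedding`, and the assumption that a deployment records which model produced its stored vectors. The sketch also rejects model names outside the table, which the real server may not do.

```javascript
// Hypothetical helpers; model names and dimensions copied from the table above.
const MODEL_DIMENSIONS = new Map([
  ["Xenova/bge-m3", 1024],
  ["Xenova/all-MiniLM-L6-v2", 384],
  ["Xenova/bge-small-en-v1.5", 384],
]);
const DEFAULT_MODEL = "Xenova/bge-m3";

// Pick the model from an env-like object, falling back to the documented default.
function resolveEmbeddingModel(env) {
  const model = (env.EMBEDDING_MODEL || "").trim() || DEFAULT_MODEL;
  const dimensions = MODEL_DIMENSIONS.get(model);
  if (dimensions === undefined) {
    // Assumption: this sketch only accepts the three documented models.
    throw new Error(`Unknown EMBEDDING_MODEL: ${model}`);
  }
  return { model, dimensions };
}

// Compare by model name, not by dimension: all-MiniLM-L6-v2 and
// bge-small-en-v1.5 are both 384-d but produce incompatible vectors.
// Assumes the caller knows which model produced the stored vectors.
function needsReembedding(storedModel, env) {
  return storedModel !== resolveEmbeddingModel(env).model;
}

// Usage: const { model, dimensions } = resolveEmbeddingModel(process.env);
```

The sketch compares model names rather than dimensions for a reason. A switch between the two 384-dimension models would pass a dimension check, yet it would still mix incompatible vector spaces.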
  ## Start / Integration

  ### MCP Server (stdio)
@@ -269,6 +335,67 @@ npm run start

  Default database path: `memory_db.cozo.db` in project root (created automatically).

+ ### CLI Tool
+
+ CozoDB Memory includes a full-featured CLI for all operations:
+
+ ```bash
+ # System operations
+ cozo-memory system health
+ cozo-memory system metrics
+
+ # Entity operations
+ cozo-memory entity create -n "MyEntity" -t "person" -m '{"age": 30}'
+ cozo-memory entity get -i <entity-id>
+ cozo-memory entity delete -i <entity-id>
+
+ # Observations
+ cozo-memory observation add -i <entity-id> -t "Some note"
+
+ # Relations
+ cozo-memory relation create --from <id1> --to <id2> --type "knows" -s 0.8
+
+ # Search
+ cozo-memory search query -q "search term" -l 10
+ cozo-memory search context -q "context query"
+
+ # Graph operations
+ cozo-memory graph explore -s <entity-id> -h 3
+ cozo-memory graph pagerank
+ cozo-memory graph communities
+
+ # Export/Import
+ cozo-memory export json -o backup.json --include-metadata --include-relationships --include-observations
+ cozo-memory export markdown -o notes.md
+ cozo-memory export obsidian -o vault.zip
+ cozo-memory import file -i data.json -f cozo
+
+ # All commands support -f json or -f pretty for output formatting
+ ```
+
+ ### TUI (Terminal User Interface)
+
+ Interactive TUI with mouse support powered by Python Textual:
+
+ ```bash
+ # Install Python dependencies (one-time)
+ pip install textual
+
+ # Launch TUI
+ npm run tui
+ # or directly:
+ cozo-memory-tui
+ ```
+
+ **TUI Features:**
+ - 🖱️ Full mouse support (click buttons, scroll, select inputs)
+ - ⌨️ Keyboard shortcuts (q=quit, h=help, r=refresh)
+ - 📊 Interactive menus for all operations
+ - 🎨 Rich terminal UI with colors and animations
+ - 📋 Real-time results display
+ - 🔍 Forms for entity creation, search, graph operations
+ - 📤 Export/Import wizards
+
  ### Claude Desktop Integration

  #### Using npx (Recommended)
@@ -335,6 +462,14 @@ DB_ENGINE=rocksdb npm run dev
  | **RocksDB** | Prepared & Tested | For high-performance or very large datasets. |
  | **MDBX** | Not supported | Requires manual build of `cozo-node` from source. |

+ ### Environment Variables
+
+ | Variable | Default | Description |
+ |----------|---------|-------------|
+ | `DB_ENGINE` | `sqlite` | Database backend: `sqlite` or `rocksdb` |
+ | `EMBEDDING_MODEL` | `Xenova/bge-m3` | Embedding model (see [Embedding Model Options](#embedding-model-options)) |
+ | `PORT` | `3001` | HTTP API bridge port (if using `npm run bridge`) |
+
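The sketch below reads the three variables with the defaults from the table, assuming the values arrive through `process.env`. `loadConfig` is hypothetical. Two of its checks are assumptions drawn from the backend table above, not documented server behavior: the range check on `PORT` and the rejection of engines other than `sqlite` and `rocksdb`.

```javascript
// Hypothetical loader for the documented environment variables.
const SUPPORTED_ENGINES = ["sqlite", "rocksdb"];

function loadConfig(env) {
  // Defaults taken from the Environment Variables table.
  const dbEngine = env.DB_ENGINE || "sqlite";
  if (!SUPPORTED_ENGINES.includes(dbEngine)) {
    // Assumption: MDBX is rejected because the backend table lists it as unsupported.
    throw new Error(`Unsupported DB_ENGINE "${dbEngine}"; expected ${SUPPORTED_ENGINES.join(" or ")}`);
  }
  const port = Number(env.PORT || "3001");
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT "${env.PORT}"`);
  }
  return {
    dbEngine,
    embeddingModel: env.EMBEDDING_MODEL || "Xenova/bge-m3",
    port, // only used by the HTTP API bridge
  };
}

// Usage: const config = loadConfig(process.env);
```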
  ---

  ## Data Model
@@ -367,7 +502,10 @@ Actions:
  - `create_relation`: `{ from_id, to_id, relation_type, strength?, metadata? }`
  - `run_transaction`: `{ operations: Array<{ action, params }> }` **(New v1.2)**: Executes multiple operations atomically.
  - `add_inference_rule`: `{ name, datalog }`
- - `ingest_file`: `{ format, content, entity_id?, entity_name?, entity_type?, chunking?, metadata?, observation_metadata?, deduplicate?, max_observations? }`
+ - `ingest_file`: `{ format, file_path?, content?, entity_id?, entity_name?, entity_type?, chunking?, metadata?, observation_metadata?, deduplicate?, max_observations? }`
+ - `format` options: `"markdown"`, `"json"`, `"pdf"` **(New v1.9)**
+ - `file_path`: Optional path to file on disk (alternative to `content` parameter)
+ - `content`: File content as string (required if `file_path` not provided)
  - `chunking` options: `"none"`, `"paragraphs"` (future: `"semantic"`)
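The `ingest_file` rules above can be checked client-side before calling the tool. They say that `format` is `"markdown"`, `"json"`, or `"pdf"`, and that `content` is required unless `file_path` is given. `validateIngestParams` is a hypothetical helper, not the server's own validator. It returns a list of problems rather than throwing, so that all of them can be reported at once.

```javascript
// Hypothetical pre-flight check mirroring the documented ingest_file rules.
const INGEST_FORMATS = ["markdown", "json", "pdf"];
const CHUNKING_MODES = ["none", "paragraphs"]; // "semantic" is listed as future

function validateIngestParams(params) {
  const errors = [];
  if (!INGEST_FORMATS.includes(params.format)) {
    errors.push(`format must be one of ${INGEST_FORMATS.join(", ")}`);
  }
  const hasPath = typeof params.file_path === "string" && params.file_path.length > 0;
  const hasContent = typeof params.content === "string" && params.content.length > 0;
  if (!hasPath && !hasContent) {
    errors.push("either file_path or content is required");
  }
  if (params.chunking !== undefined && !CHUNKING_MODES.includes(params.chunking)) {
    errors.push(`chunking must be one of ${CHUNKING_MODES.join(", ")}`);
  }
  return errors; // empty array means the parameters look valid
}
```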

  Important Details:
@@ -419,7 +557,7 @@ Example (Transitive Manager ⇒ Upper Manager):
  }
  ```

- Bulk Ingestion (Markdown/JSON):
+ Bulk Ingestion (Markdown/JSON/PDF):

  ```json
  {
@@ -433,6 +571,19 @@ Bulk Ingestion (Markdown/JSON):
  }
  ```

+ PDF Ingestion via File Path:
+
+ ```json
+ {
+   "action": "ingest_file",
+   "entity_name": "Research Paper",
+   "format": "pdf",
+   "file_path": "/path/to/document.pdf",
+   "chunking": "paragraphs",
+   "deduplicate": true
+ }
+ ```
+
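This README does not describe the `"paragraphs"` chunking mode in any more detail. The sketch below shows one plausible reading: splitting on blank lines. `chunkParagraphs` is hypothetical, and the server's splitter may differ. This is most likely for PDF text, where line breaks produced by extraction need not match paragraph boundaries.

```javascript
// Hypothetical "paragraphs" chunker: split on blank lines, join wrapped lines
// with single spaces, and drop empty chunks.
function chunkParagraphs(text) {
  return text
    .split(/\r?\n\s*\r?\n/)               // one or more blank lines end a paragraph
    .map((p) => p.replace(/\s+/g, " ").trim())
    .filter((p) => p.length > 0);
}

// Usage: chunkParagraphs(extractedText).forEach((chunk) => { /* one observation per chunk */ });
```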
  ### query_memory (Read)

  Actions:
@@ -660,6 +811,24 @@ Returns deletion statistics showing exactly what was removed.

  ## Technical Highlights

+ ### Dual Timestamp Format (v1.9)
+
+ All write operations (`create_entity`, `add_observation`, `create_relation`) return timestamps in both formats:
+ - `created_at`: Unix microseconds (CozoDB native format, precise for calculations)
+ - `created_at_iso`: ISO 8601 string (human-readable, e.g., `"2026-02-28T17:21:19.343Z"`)
+
+ Use the Unix value for time calculations and comparisons, and the ISO string for display and logging.
+
+ Example response:
+ ```json
+ {
+   "id": "...",
+   "created_at": 1772299279343000,
+   "created_at_iso": "2026-02-28T17:21:19.343Z",
+   "status": "Entity created"
+ }
+ ```
+
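Converting between the two timestamp forms takes a single division or multiplication by 1000. The sketch below uses hypothetical helper names. JavaScript `Date` has millisecond precision, so any sub-millisecond digits in a microsecond value are dropped.

```javascript
// Hypothetical helpers for the two documented timestamp forms.
function microsToIso(micros) {
  // Microseconds -> milliseconds for Date; sub-millisecond digits are truncated.
  return new Date(Math.floor(micros / 1000)).toISOString();
}

function isoToMicros(iso) {
  const ms = Date.parse(iso);
  if (Number.isNaN(ms)) throw new Error(`Invalid ISO timestamp: ${iso}`);
  return ms * 1000;
}

// Elapsed time is simplest on the numeric form (1 hour = 3.6e9 microseconds).
function hoursBetween(earlierMicros, laterMicros) {
  return (laterMicros - earlierMicros) / 3_600_000_000;
}
```

`microsToIso(1772299279343000)` yields `"2026-02-28T17:21:19.343Z"`, which matches the example response.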
  ### Local ONNX Embeddings (Transformers)

  Default Model: `Xenova/bge-m3` (1024 dimensions).
@@ -784,6 +953,45 @@ npx ts-node test-user-pref.ts
  - Use `health` action to check cache hit rates
  - Consider RocksDB backend for datasets > 100k entities

+ ## Roadmap
+
+ CozoDB Memory is actively developed. Here's what's planned:
+
+ ### Near-Term (v1.x)
+
+ - **GPU Acceleration** - CUDA support for embedding generation (targeting 10-50x faster)
+ - **Streaming Ingestion** - Real-time data ingestion from logs, APIs, webhooks
+ - **Advanced Chunking** - Semantic chunking for `ingest_file` (the planned `"semantic"` option)
+ - **Query Optimization** - Automatic query plan optimization for complex graph traversals
+ - **Additional Export Formats** - Notion, Roam Research, Logseq compatibility
+
+ ### Mid-Term (v2.x)
+
+ - **Multi-Modal Embeddings** - Image and audio embedding support via CLIP/Whisper
+ - **Distributed Mode** - Multi-node deployment with CozoDB clustering
+ - **Real-Time Sync** - WebSocket-based live updates for collaborative use cases
+ - **Advanced Inference** - Causal reasoning, temporal pattern detection
+ - **Web UI** - Optional web interface for memory exploration and visualization
+
+ ### Long-Term
+
+ - **Federated Learning** - Privacy-preserving model updates across instances
+ - **Custom Embedding Models** - Fine-tune embeddings on domain-specific data
+ - **Plugin System** - Extensible architecture for custom tools and integrations
+
+ ### Community Requests
+
+ Have a feature idea? Open an issue with the `enhancement` label or check [Low-Hanging-Fruit.md](Low-Hanging-Fruit.md) for quick wins you can contribute.
+
+ ## Contributing
+
+ Contributions are welcome! Please read [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines on:
+
+ - Setting up the development environment
+ - Coding standards and best practices
+ - Testing and documentation requirements
+ - Pull request process
+
  ## License

  This project is licensed under the Apache-2.0 License. See the [LICENSE](LICENSE) file for details.
@@ -35,11 +35,13 @@ app.post("/api/entities", async (req, res) => {
      try {
          // We use the same logic as in create_entity tool
          const id = (0, uuid_1.v4)();
-         const embedding = await memoryServer.embeddingService.embed(name + " " + type);
+         const content = name + " " + type;
+         const embedding = await memoryServer.embeddingService.embed(content);
+         const nameEmbedding = await memoryServer.embeddingService.embed(name);
          await memoryServer.db.run(`
-             ?[id, created_at, name, type, embedding, metadata] <- [
-                 [$id, "ASSERT", $name, $type, [${embedding.join(",")}], $metadata]
-             ] :put entity {id, created_at => name, type, embedding, metadata}
+             ?[id, created_at, name, type, embedding, name_embedding, metadata] <- [
+                 [$id, "ASSERT", $name, $type, [${embedding.join(",")}], [${nameEmbedding.join(",")}], $metadata]
+             ] :put entity {id, created_at => name, type, embedding, name_embedding, metadata}
          `, { id, name, type, metadata: metadata || {} });
          res.status(201).json({ id, name, type, metadata, status: "Entity created" });
      }
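In the updated handler, the two `embed` calls do not depend on each other, so they could also be issued together with `Promise.all`. The sketch below assumes an `embeddingService` object with an async `embed(text)` method shaped like `memoryServer.embeddingService`; `embedEntityFields` is hypothetical. Whether this is faster depends on the service: a single CPU-bound ONNX pipeline may still process the two inputs one after the other.

```javascript
// Hypothetical helper: embed the combined "name type" text and the bare name
// concurrently, returning both vectors for the :put entity row.
async function embedEntityFields(embeddingService, name, type) {
  const content = name + " " + type; // same combined text as the handler
  const [embedding, nameEmbedding] = await Promise.all([
    embeddingService.embed(content),
    embeddingService.embed(name),
  ]);
  return { embedding, nameEmbedding };
}
```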
@@ -0,0 +1,204 @@
+ "use strict";
+ /**
+  * Shared CLI command logic for both pure CLI and TUI
+  * Calls MemoryServer public methods directly
+  */
+ Object.defineProperty(exports, "__esModule", { value: true });
+ exports.CLICommands = void 0;
+ const index_js_1 = require("./index.js");
+ class CLICommands {
+     server;
+     initialized = false;
+     constructor() {
+         this.server = new index_js_1.MemoryServer();
+     }
+     async init() {
+         if (!this.initialized) {
+             await this.server.initPromise;
+             this.initialized = true;
+         }
+     }
+     async close() {
+         // CozoDB handles cleanup automatically
+     }
+     // Entity operations - use db directly
+     async createEntity(name, type, metadata) {
+         const { v4: uuidv4 } = await import('uuid');
+         const id = uuidv4();
+         const content = name + " " + type;
+         const embedding = await this.server.embeddingService.embed(content);
+         const nameEmbedding = await this.server.embeddingService.embed(name);
+         const now = Date.now() * 1000; // microseconds
+         const nowIso = new Date().toISOString();
+         await this.server.db.run(`
+             ?[id, created_at, name, type, embedding, name_embedding, metadata] <- [
+                 [$id, "ASSERT", $name, $type, [${embedding.join(",")}], [${nameEmbedding.join(",")}], $metadata]
+             ] :put entity {id, created_at => name, type, embedding, name_embedding, metadata}
+         `, { id, name, type, metadata: metadata || {} });
+         return { id, name, type, metadata, created_at: now, created_at_iso: nowIso, status: "Entity created" };
+     }
+     async getEntity(entityId) {
+         const entityRes = await this.server.db.run('?[id, name, type, metadata, ts] := *entity{id, name, type, metadata, created_at, @ "NOW"}, id = $id, ts = to_int(created_at)', { id: entityId });
+         if (entityRes.rows.length === 0) {
+             throw new Error("Entity not found");
+         }
+         const obsRes = await this.server.db.run('?[id, text, metadata, ts] := *observation{id, entity_id, text, metadata, created_at, @ "NOW"}, entity_id = $id, ts = to_int(created_at)', { id: entityId });
+         const relRes = await this.server.db.run(`
+             ?[target_id, type, strength, metadata, direction] := *relationship{from_id, to_id, relation_type: type, strength, metadata, @ "NOW"}, from_id = $id, target_id = to_id, direction = 'outgoing'
+             ?[target_id, type, strength, metadata, direction] := *relationship{from_id, to_id, relation_type: type, strength, metadata, @ "NOW"}, to_id = $id, target_id = from_id, direction = 'incoming'
+         `, { id: entityId });
+         return {
+             entity: {
+                 id: entityRes.rows[0][0],
+                 name: entityRes.rows[0][1],
+                 type: entityRes.rows[0][2],
+                 metadata: entityRes.rows[0][3],
+                 created_at: entityRes.rows[0][4]
+             },
+             observations: obsRes.rows.map((r) => ({ id: r[0], text: r[1], metadata: r[2], created_at: r[3] })),
+             relations: relRes.rows.map((r) => ({ target_id: r[0], type: r[1], strength: r[2], metadata: r[3], direction: r[4] }))
+         };
+     }
+     async deleteEntity(entityId) {
+         await this.server.db.run(`
+             { ?[id, created_at] := *observation{id, entity_id, created_at}, entity_id = $target_id :rm observation {id, created_at} }
+             { ?[from_id, to_id, relation_type, created_at] := *relationship{from_id, to_id, relation_type, created_at}, from_id = $target_id :rm relationship {from_id, to_id, relation_type, created_at} }
+             { ?[from_id, to_id, relation_type, created_at] := *relationship{from_id, to_id, relation_type, created_at}, to_id = $target_id :rm relationship {from_id, to_id, relation_type, created_at} }
+             { ?[id, created_at] := *entity{id, created_at}, id = $target_id :rm entity {id, created_at} }
+         `, { target_id: entityId });
+         return { status: "Entity and related data deleted" };
+     }
+     // Observation operations
+     async addObservation(entityId, text, metadata) {
+         const { v4: uuidv4 } = await import('uuid');
+         const id = uuidv4();
+         const embedding = await this.server.embeddingService.embed(text);
+         const now = Date.now() * 1000;
+         const nowIso = new Date().toISOString();
+         await this.server.db.run(`
+             ?[id, created_at, entity_id, text, embedding, metadata] <- [
+                 [$id, "ASSERT", $entity_id, $text, [${embedding.join(",")}], $metadata]
+             ] :put observation {id, created_at => entity_id, text, embedding, metadata}
+         `, { id, entity_id: entityId, text, metadata: metadata || {} });
+         return { id, entity_id: entityId, text, metadata, created_at: now, created_at_iso: nowIso, status: "Observation added" };
+     }
+     // Relation operations
+     async createRelation(fromId, toId, relationType, strength, metadata) {
+         const str = strength !== undefined ? strength : 1.0;
+         const now = Date.now() * 1000;
+         const nowIso = new Date().toISOString();
+         await this.server.db.run(`
+             ?[from_id, to_id, relation_type, created_at, strength, metadata] <- [
+                 [$from_id, $to_id, $relation_type, "ASSERT", $strength, $metadata]
+             ] :put relationship {from_id, to_id, relation_type, created_at => strength, metadata}
+         `, { from_id: fromId, to_id: toId, relation_type: relationType, strength: str, metadata: metadata || {} });
+         return { from_id: fromId, to_id: toId, relation_type: relationType, strength: str, metadata, created_at: now, created_at_iso: nowIso, status: "Relation created" };
+     }
+     // Search operations - call the server's hybrid search service directly
+     async search(query, limit = 10, entityTypes, includeEntities = true, includeObservations = true) {
+         // Bypasses the query_memory MCP tool and calls hybridSearch.search
+         const result = await this.server.hybridSearch.search({
+             query,
+             limit,
+             entityTypes,
+             includeEntities,
+             includeObservations
+         });
+         // Returned unchanged, including empty results
+         return result;
+     }
+     async advancedSearch(params) {
+         return await this.server.advancedSearch(params);
+     }
+     async context(query, contextWindow, timeRangeHours) {
+         // Use advancedSearch with appropriate parameters
+         return await this.server.advancedSearch({
+             query,
+             limit: contextWindow || 10,
+             timeRangeHours
+         });
+     }
+     // Graph operations
+     async explore(startEntity, endEntity, maxHops, relationTypes) {
+         // Use graph_walking or advancedSearch
+         if (endEntity) {
+             // Path finding
+             return await this.server.computeShortestPath({
+                 start_entity: startEntity,
+                 end_entity: endEntity
+             });
+         }
+         else {
+             // Graph exploration - use advancedSearch with graph constraints
+             return await this.server.advancedSearch({
+                 query: '',
+                 graphConstraints: {
+                     maxDepth: maxHops || 3,
+                     requiredRelations: relationTypes,
+                     targetEntityIds: [startEntity]
+                 }
+             });
+         }
+     }
+     async pagerank() {
+         return await this.server.recomputePageRank();
+     }
+     async communities() {
+         return await this.server.recomputeCommunities();
+     }
+     // System operations
+     async health() {
+         const entityCount = await this.server.db.run('?[count(id)] := *entity{id, @ "NOW"}');
+         const obsCount = await this.server.db.run('?[count(id)] := *observation{id, @ "NOW"}');
+         const relCount = await this.server.db.run('?[count(from_id)] := *relationship{from_id, @ "NOW"}');
+         return {
+             status: "healthy",
+             entities: entityCount.rows[0][0],
+             observations: obsCount.rows[0][0],
+             relationships: relCount.rows[0][0]
+         };
+     }
+     async metrics() {
+         // `metrics` is private in the TypeScript source; the compiled JS reads it directly
+         return this.server.metrics;
+     }
+     async exportMemory(format, options) {
+         const { ExportImportService } = await import('./export-import-service.js');
+         // Create a simple wrapper that implements DbService interface
+         const dbService = {
+             run: async (query, params) => {
+                 return await this.server.db.run(query, params);
+             }
+         };
+         const exportService = new ExportImportService(dbService);
+         return await exportService.exportMemory({
+             format,
+             includeMetadata: options?.includeMetadata,
+             includeRelationships: options?.includeRelationships,
+             includeObservations: options?.includeObservations,
+             entityTypes: options?.entityTypes,
+             since: options?.since
+         });
+     }
+     async importMemory(data, sourceFormat, options) {
+         const { ExportImportService } = await import('./export-import-service.js');
+         // Create a simple wrapper that implements DbService interface
+         const dbService = {
+             run: async (query, params) => {
+                 return await this.server.db.run(query, params);
+             }
+         };
+         const exportService = new ExportImportService(dbService);
+         return await exportService.importMemory(data, {
+             sourceFormat: sourceFormat,
+             mergeStrategy: options?.mergeStrategy || 'skip',
+             defaultEntityType: options?.defaultEntityType
+         });
+     }
+     async ingestFile(entityId, format, filePath, content, options) {
+         // This would need to be implemented similar to the MCP tool
+         // For now, return a placeholder
+         return { status: "not_implemented", message: "Use MCP server for file ingestion" };
+     }
+ }
+ exports.CLICommands = CLICommands;
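`CLICommands.createEntity(name, type, metadata)` takes the same three values that `cozo-memory entity create -n ... -t ... -m ...` accepts in the README. The real argument parser is not part of this diff. The sketch below is a hypothetical stand-in that handles only the short flags shown in the README; `parseEntityCreate` is not a cozo-memory function.

```javascript
// Hypothetical parser for `entity create -n <name> -t <type> [-m <json>]`,
// returning the positional arguments createEntity(name, type, metadata) expects.
function parseEntityCreate(argv) {
  const opts = {};
  for (let i = 0; i < argv.length; i += 2) {
    const flag = argv[i];
    const value = argv[i + 1];
    if (value === undefined) throw new Error(`Missing value for ${flag}`);
    if (flag === "-n") opts.name = value;
    else if (flag === "-t") opts.type = value;
    else if (flag === "-m") opts.metadata = JSON.parse(value); // metadata is a JSON object
    else throw new Error(`Unknown option ${flag}`);
  }
  if (!opts.name || !opts.type) throw new Error("-n and -t are required");
  return [opts.name, opts.type, opts.metadata];
}
```

Here is one possible wiring, assuming `cli` is a `CLICommands` instance and `process.argv` starts with node, the script, `entity`, and `create`. Call `const [name, type, metadata] = parseEntityCreate(process.argv.slice(4));`, then `await cli.init();` and `await cli.createEntity(name, type, metadata);`.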