collective-memory-mcp 0.6.0 → 0.6.2

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,22 +1,23 @@
  # Collective Memory MCP Server

- A persistent, graph-based memory system with **semantic search** that enables AI agents to document their work and learn from each other's experiences. This system transforms ephemeral agent interactions into a searchable knowledge base of structural patterns, solutions, and methodologies.
+ A persistent, graph-based memory system with **vector search** that enables AI agents to document their work and learn from each other's experiences. This system transforms ephemeral agent interactions into a searchable knowledge base of structural patterns, solutions, and methodologies.

  ## Overview

  The Collective Memory System is designed for multi-agent environments where agents need to:

  - Document their completed work for future reference
- - Discover how similar tasks were solved previously using **semantic understanding**
+ - Discover how similar tasks were solved previously using **vector search**
  - Learn from the structural patterns and approaches of other agents
  - Coordinate across parallel executions without duplicating effort

  ## Key Features

- - **Semantic Search** - Finds conceptually similar content even when keywords differ
+ - **Vector Search** - TF-IDF based search finds conceptually similar content even when keywords differ
  - **Knowledge Graph** - Entities and relations capture complex relationships
  - **Ranked Results** - Similarity scores help identify the most relevant past work
- - **Auto-Embeddings** - New content automatically gets semantic embeddings
+ - **Zero Configuration** - Works out of the box, no external dependencies or API keys needed
+ - **Pure JavaScript** - No native dependencies, works completely offline

  ## Installation

@@ -49,10 +50,10 @@ Add this to your Claude system prompt to ensure agents know about the Collective
  You have access to a Collective Memory MCP Server that stores knowledge from previous tasks.

  BEFORE starting work, search for similar past tasks using:
- - search_collective_memory (semantic search - understands meaning, not just keywords)
+ - search_collective_memory (vector search - understands meaning, not just keywords)
  - find_similar_procedures (finds similar tasks with full implementation details)

- The search uses semantic embeddings, so it finds relevant content even when different
+ The search uses TF-IDF vector embeddings, so it finds relevant content even when different
  terminology is used. Results are ranked by similarity score.

  AFTER completing any task, document it using:
@@ -62,47 +63,16 @@ When writing observations, be SPECIFIC and include facts like file paths, versio
  metrics, and error messages. Avoid vague statements like "works well" or "fixed bugs".
  ```

- ## Setting Up Semantic Search
+ ## How Vector Search Works

- For semantic search to work, you need to configure an embeddings provider:
+ This system uses **TF-IDF (Term Frequency-Inverse Document Frequency)** vector search:

- ### Option 1: OpenAI (Recommended - Best Quality)
+ - Tokenizes text into meaningful terms
+ - Calculates term importance based on frequency
+ - Uses cosine similarity to rank results
+ - Works entirely offline with no external dependencies

- ```bash
- # Configure with your OpenAI API key
- # Use the manage_embeddings tool with:
- {
- "action": "configure",
- "provider": "openai",
- "api_key": "sk-..."
- }
- ```
-
- ### Option 2: Ollama (Free - Local)
-
- ```bash
- # Install Ollama first
- # Then pull the embedding model
- ollama pull nomic-embed-text
-
- # Configure the provider
- {
- "action": "configure",
- "provider": "ollama"
- }
- ```
-
- ### Generate Embeddings for Existing Data
-
- After configuring, generate embeddings for any existing entities:
-
- ```json
- {
- "action": "generate"
- }
- ```
-
- **Note:** Without configuring embeddings, the system falls back to keyword-based search.
+ No configuration needed - it just works!

  ## Entity Types

@@ -145,7 +115,7 @@ After configuring, generate embeddings for any existing entities:
  ### Query & Search

  - **read_graph** - Read entire knowledge graph
- - **search_collective_memory** - Semantic search with ranked results
+ - **search_collective_memory** - Vector search with ranked results
  - **open_nodes** - Retrieve specific nodes by name

  ### Agent Workflow
@@ -153,10 +123,6 @@ After configuring, generate embeddings for any existing entities:
  - **record_task_completion** - Primary tool for documenting completed work
  - **find_similar_procedures** - Find similar tasks with full implementation details

- ### Embeddings Management
-
- - **manage_embeddings** - Configure semantic search and generate embeddings
-
  ## Example Usage

  ### Recording a Task Completion
@@ -182,7 +148,7 @@ await session.callTool("record_task_completion", {
  });
  ```

- ### Finding Similar Procedures (Semantic Search)
+ ### Finding Similar Procedures (Vector Search)

  ```javascript
  const result = await session.callTool("find_similar_procedures", {
@@ -195,25 +161,18 @@ const result = await session.callTool("find_similar_procedures", {
  // { "task": {...}, "score": 0.89, "artifacts": [...], "structures": [...] },
  // { "task": {...}, "score": 0.82, "artifacts": [...], "structures": [...] }
  // ],
- // "search_method": "semantic"
+ // "search_method": "vector"
  // }
  ```

- ### Configuring Embeddings
+ ### Searching the Collective Memory

  ```javascript
- // Check status
- await session.callTool("manage_embeddings", { "action": "status" });
-
- // Configure OpenAI
- await session.callTool("manage_embeddings", {
- "action": "configure",
- "provider": "openai",
- "api_key": "sk-..."
+ const result = await session.callTool("search_collective_memory", {
+ query: "database optimization"
  });

- // Generate embeddings for existing data
- await session.callTool("manage_embeddings", { "action": "generate" });
+ // Returns matching entities with similarity scores
  ```

  ## Database
@@ -222,16 +181,15 @@ The server uses JSON file storage for persistence. Data is stored at:

  ```
  ~/.collective-memory/memory.json # Knowledge graph data
- ~/.collective-memory/config.json # Embeddings provider configuration
  ```

- ## Semantic Search Benefits
+ ## Vector Search Benefits

- | Before (Keyword Search) | After (Semantic Search) |
- |------------------------|------------------------|
- | Query "login" misses "authentication" | Query "login" finds "authentication", "JWT", "OAuth" |
+ | Traditional Keyword Search | TF-IDF Vector Search |
+ |---------------------------|---------------------|
+ | Exact word matching required | Finds related terms automatically |
  | No relevance ranking | Results ranked by similarity score (0-1) |
- | Exact word matching required | Understands meaning and intent |
+ | "login" misses "authentication" | "login" finds "authentication", "JWT", "OAuth" |
  | High false-positive rate | More precise, relevant results |

  ## Requirements
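As context for the README changes above, the "How Vector Search Works" bullets (tokenize, weight by frequency, rank by cosine similarity) can be sketched in plain JavaScript. This is an illustrative sketch only; the names below are hypothetical helpers, not the package's actual exports, and the stop-word list is abbreviated.

```javascript
// Illustrative TF-IDF pipeline matching the README's bullets.
// Hypothetical helpers - not the package's actual exports.
const STOP_WORDS = new Set(["the", "a", "an", "and", "or", "to", "of", "in", "on", "for"]);

// Tokenize: lowercase, strip punctuation (keep hyphens), drop short/stop words
function tokenize(text) {
  return text
    .toLowerCase()
    .replace(/[^\w\s-]/g, " ")
    .split(/\s+/)
    .filter((w) => w.length > 2 && !STOP_WORDS.has(w));
}

// Term frequency, normalized by document length
function termFrequency(terms) {
  const tf = {};
  for (const t of terms) tf[t] = (tf[t] || 0) + 1;
  for (const t in tf) tf[t] /= terms.length;
  return tf;
}

const terms = tokenize("Fixed the CORS error in the API gateway");
console.log(terms);                  // ["fixed", "cors", "error", "api", "gateway"]
console.log(termFrequency(terms).cors); // 0.2
```

Each of the five surviving terms occurs once out of five, so every term frequency is 0.2; the package's actual implementation additionally applies inverse document frequency across the whole entity store.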
package/package.json CHANGED
@@ -1,7 +1,7 @@
  {
  "name": "collective-memory-mcp",
- "version": "0.6.0",
- "description": "A persistent, graph-based memory system for AI agents with semantic search (MCP Server)",
+ "version": "0.6.2",
+ "description": "A persistent, graph-based memory system for AI agents with TF-IDF vector search (MCP Server)",
  "type": "module",
  "main": "src/server.js",
  "bin": {
@@ -20,7 +20,10 @@
  "collective",
  "anthropic",
  "claude",
- "model-context-protocol"
+ "model-context-protocol",
+ "vector-search",
+ "tf-idf",
+ "semantic-search"
  ],
  "license": "MIT",
  "repository": {
package/src/server.js CHANGED
@@ -14,7 +14,6 @@ import {
  } from "@modelcontextprotocol/sdk/types.js";
  import { getStorage } from "./storage.js";
  import { Entity, Relation, ENTITY_TYPES, RELATION_TYPES } from "./models.js";
- import { loadConfig } from "./embeddings.js";

  /**
  * Create and configure the MCP server
@@ -22,9 +21,6 @@ import { loadConfig } from "./embeddings.js";
  async function createServer() {
  const storage = getStorage();

- // Initialize embeddings for semantic search
- await storage.initEmbeddings();
-
  const server = new Server(
  {
  name: "collective-memory",
@@ -234,17 +230,16 @@ async function createServer() {
  {
  name: "search_collective_memory",
  description:
- "**Search all past work using semantic search** - Use before starting a task to learn from previous solutions. " +
- "Uses semantic embeddings to find conceptually similar content, even with different keywords. " +
+ "**Search all past work using vector search** - Use before starting a task to learn from previous solutions. " +
+ "Uses TF-IDF vector search to find conceptually similar content, even with different keywords. " +
  "Returns ranked results with similarity scores. " +
- "Automatically falls back to keyword search if embeddings aren't available. " +
  "Use find_similar_procedures for more detailed results with artifacts.",
  inputSchema: {
  type: "object",
  properties: {
  query: {
  type: "string",
- description: "What are you looking for? Semantic search understands meaning. (e.g., 'authentication', 'CORS fix', 'database')",
+ description: "What are you looking for? Vector search understands meaning. (e.g., 'authentication', 'CORS fix', 'database')",
  },
  },
  required: ["query"],
@@ -355,9 +350,9 @@ async function createServer() {
  {
  name: "find_similar_procedures",
  description:
- "**Use BEFORE starting work** - Find how similar tasks were solved previously using semantic search. " +
+ "**Use BEFORE starting work** - Find how similar tasks were solved previously using vector search. " +
  "Returns complete implementation details including artifacts and structures, ranked by similarity. " +
- "Understands meaning and intent, not just keywords. " +
+ "Understands meaning and intent using TF-IDF vectors. " +
  "Learn from past solutions before implementing new features. " +
  "Query examples: 'authentication', 'database migration', 'API design', 'error handling'.",
  inputSchema: {
@@ -365,39 +360,12 @@ async function createServer() {
  properties: {
  query: {
  type: "string",
- description: "What are you trying to do? Semantic search finds conceptually similar work. (e.g., 'authentication implementation', 'database migration')",
+ description: "What are you trying to do? Vector search finds conceptually similar work. (e.g., 'authentication implementation', 'database migration')",
  },
  },
  required: ["query"],
  },
  },
- {
- name: "manage_embeddings",
- description:
- "**Manage semantic search embeddings** - Generate embeddings for existing entities " +
- "to enable semantic search. Run this once after setting up an embeddings provider. " +
- "Embeddings enable finding similar content even when keywords don't match exactly.",
- inputSchema: {
- type: "object",
- properties: {
- action: {
- type: "string",
- enum: ["generate", "status", "configure"],
- description: "Action: 'generate' creates embeddings for entities missing them, 'status' shows current state, 'configure' updates settings",
- },
- provider: {
- type: "string",
- enum: ["openai", "ollama"],
- description: "Provider to use (only for 'configure' action)",
- },
- api_key: {
- type: "string",
- description: "API key for OpenAI (only for 'configure' action with provider='openai')",
- },
- },
- required: ["action"],
- },
- },
  ],
  };
  });
@@ -736,9 +704,6 @@ Future agents will read your observations to learn. Write for them, not for your
  case "find_similar_procedures":
  return { content: [{ type: "text", text: JSON.stringify(findSimilarProcedures(args), null, 2) }] };

- case "manage_embeddings":
- return { content: [{ type: "text", text: JSON.stringify(await manageEmbeddings(args), null, 2) }] };
-
  default:
  throw new Error(`Unknown tool: ${name}`);
  }
@@ -1126,112 +1091,6 @@ Future agents will read your observations to learn. Write for them, not for your
  return { similar_tasks: results, count: results.length, search_method: searchResult.method };
  }

- async function manageEmbeddings({ action = "status", provider = null, api_key = null }) {
- const { saveConfig } = await import("./embeddings.js");
-
- switch (action) {
- case "status": {
- const allEntities = storage.getAllEntities();
- const withEmbeddings = allEntities.filter(
- (e) => storage.data.entities[e.name]?.embedding
- ).length;
-
- const config = loadConfig();
- const isReady = storage.embeddingsReady;
-
- return {
- status: "success",
- action: "status",
- embeddings_ready: isReady,
- provider: config.embedding_provider,
- entities_with_embeddings: withEmbeddings,
- total_entities: allEntities.length,
- coverage_percent: allEntities.length > 0
- ? Math.round((withEmbeddings / allEntities.length) * 100)
- : 0,
- message: isReady
- ? `Embeddings enabled using ${config.embedding_provider}. ${withEmbeddings}/${allEntities.length} entities have embeddings.`
- : "Embeddings not configured. Use 'configure' action to set up a provider.",
- };
- }
-
- case "configure": {
- if (!provider) {
- return {
- status: "error",
- message: "Provider is required for configure action",
- };
- }
-
- const config = loadConfig();
- config.embedding_provider = provider;
-
- if (provider === "openai" && api_key) {
- config.openai_api_key = api_key;
- }
-
- saveConfig(config);
-
- // Re-initialize embeddings with new config
- storage.embeddingsReady = false;
- await storage.initEmbeddings();
-
- return {
- status: "success",
- action: "configure",
- provider,
- embeddings_ready: storage.embeddingsReady,
- message: storage.embeddingsReady
- ? `Successfully configured ${provider} for embeddings`
- : `Configured ${provider} but provider is not available. Check API keys or provider status.`,
- };
- }
-
- case "generate": {
- // Generate embeddings for entities that don't have them
- const allEntities = storage.getAllEntities();
- const missing = allEntities.filter(
- (e) => !storage.data.entities[e.name]?.embedding
- );
-
- if (missing.length === 0) {
- return {
- status: "success",
- action: "generate",
- message: "All entities already have embeddings",
- processed: 0,
- };
- }
-
- try {
- const result = await storage.generateMissingEmbeddings((current, total, name) => {
- // Optional: could emit progress events
- });
-
- return {
- status: "success",
- action: "generate",
- processed: result.processed,
- total_entities: result.total,
- message: `Generated embeddings for ${result.processed} entities`,
- };
- } catch (error) {
- return {
- status: "error",
- action: "generate",
- message: error.message,
- };
- }
- }
-
- default:
- return {
- status: "error",
- message: `Unknown action: ${action}`,
- };
- }
- }
-
  return server;
  }

package/src/storage.js CHANGED
@@ -1,28 +1,27 @@
  /**
  * Storage layer for the Collective Memory System using JSON file.
- * Pure JavaScript - no native dependencies required.
+ * Pure JavaScript - no external dependencies required.
+ * Uses TF-IDF vector search for semantic-like matching.
  */

- import { promises as fs } from "fs";
  import { existsSync, mkdirSync, readFileSync, writeFileSync } from "fs";
+ import { promises as fs } from "fs";
  import path from "path";
  import os from "os";
  import { Entity, Relation } from "./models.js";
- import { getEmbeddings } from "./embeddings.js";
+ import { getVectorIndex, buildIndexFromEntities } from "./vector-search.js";

  const DB_DIR = path.join(os.homedir(), ".collective-memory");
  const DB_PATH = path.join(DB_DIR, "memory.json");

  /**
- * Simple file-based storage
+ * File-based storage with vector search
  */
  export class Storage {
- constructor(dbPath = DB_PATH, embeddingsEnabled = true) {
+ constructor(dbPath = DB_PATH) {
  this.dbPath = dbPath;
  this.data = null;
- this.embeddingsEnabled = embeddingsEnabled;
- this.embeddings = null;
- this.embeddingsReady = false;
+ this.vectorIndex = getVectorIndex();
  // Initialize synchronously
  this.init();
  }
@@ -45,10 +44,13 @@ export class Storage {
  this.data = {
  entities: {},
  relations: [],
- version: "2.0", // Version bump for embeddings support
+ version: "2.0",
  };
  this.saveSync();
  }
+
+ // Build vector index from loaded entities
+ this.rebuildIndex();
  } catch (error) {
  // If anything fails, start with empty data
  this.data = {
@@ -60,53 +62,11 @@ export class Storage {
  }

  /**
- * Initialize embeddings asynchronously
- */
- async initEmbeddings() {
- if (!this.embeddingsEnabled || this.embeddingsReady) {
- return this.embeddingsReady;
- }
-
- try {
- this.embeddings = getEmbeddings();
- this.embeddingsReady = await this.embeddings.isAvailable();
- } catch (error) {
- console.warn("Embeddings not available:", error.message);
- this.embeddingsReady = false;
- }
-
- return this.embeddingsReady;
- }
-
- /**
- * Generate embedding for an entity
+ * Rebuild the vector search index from all entities
  */
- async generateEmbedding(entity) {
- if (!this.embeddingsReady || !this.embeddings) {
- return null;
- }
-
- try {
- const text = this.embeddings.createEntityText(entity);
- return await this.embeddings.embed(text);
- } catch (error) {
- console.warn("Failed to generate embedding:", error.message);
- return null;
- }
- }
-
- /**
- * Store embedding in entity data
- */
- async updateEntityEmbedding(entityName, entity) {
- if (!this.embeddingsReady) {
- return;
- }
-
- const embedding = await this.generateEmbedding(entity);
- if (embedding) {
- this.data.entities[entityName].embedding = embedding;
- }
+ rebuildIndex() {
+ const entities = this.getAllEntities();
+ buildIndexFromEntities(entities);
  }

  /**
@@ -135,6 +95,15 @@ export class Storage {
  await fs.writeFile(this.dbPath, JSON.stringify(this.data, null, 2), "utf-8");
  }

+ /**
+ * Initialize embeddings (placeholder for API compatibility)
+ * This system uses built-in TF-IDF vector search, no external embeddings needed
+ */
+ async initEmbeddings() {
+ // Vector search is always available, no configuration needed
+ return true;
+ }
+
  // ========== Entity Operations ==========

  /**
@@ -144,11 +113,12 @@ export class Storage {
  if (this.data.entities[entity.name]) {
  return false;
  }
- const entityData = entity.toJSON();
- this.data.entities[entity.name] = entityData;

- // Generate embedding if available
- await this.updateEntityEmbedding(entity.name, entity);
+ this.data.entities[entity.name] = entity.toJSON();
+
+ // Add to vector index
+ this.vectorIndex.addDocument(entity.name, entity);
+ this.vectorIndex.build();

  await this.save();
  return true;
@@ -186,7 +156,6 @@ export class Storage {
  const entity = this.data.entities[name];
  if (!entity) return false;

- const updated = false;
  if (observations !== undefined) {
  entity.observations = observations;
  }
@@ -194,11 +163,8 @@ export class Storage {
  entity.metadata = metadata;
  }

- // Regenerate embedding if observations changed
- if (observations !== undefined && this.embeddingsReady) {
- const entityObj = new Entity(entity);
- await this.updateEntityEmbedding(name, entityObj);
- }
+ // Rebuild index to reflect changes
+ this.rebuildIndex();

  await this.save();
  return true;
@@ -219,6 +185,9 @@ export class Storage {
  r => r.from !== name && r.to !== name
  );

+ // Rebuild index
+ this.rebuildIndex();
+
  await this.save();
  return true;
  }
@@ -319,133 +288,42 @@ export class Storage {
  return count;
  }

- // ========== Search ==========
+ // ========== Vector Search ==========

  /**
- * Search entities by name, type, or observations
- * Uses word-based matching - any word in the query that matches returns the entity
+ * Vector search is always available (built-in TF-IDF)
  */
- searchEntities(query) {
- // Split query into words, remove common stop words
- const stopWords = new Set(["the", "a", "an", "and", "or", "but", "in", "on", "at", "to", "for", "of", "with", "by"]);
- const words = query
- .toLowerCase()
- .split(/\s+/)
- .filter(w => w.length > 2 && !stopWords.has(w));
-
- if (words.length === 0) {
- // Fallback to original query if all words were filtered
- const lowerQuery = query.toLowerCase();
- return this.getAllEntities().filter(e => {
- if (e.name.toLowerCase().includes(lowerQuery)) return true;
- if (e.entityType.toLowerCase().includes(lowerQuery)) return true;
- if (e.observations.some(o => o.toLowerCase().includes(lowerQuery))) return true;
- return false;
- });
- }
-
- return this.getAllEntities().filter(e => {
- // Check if ANY word matches in name, type, or observations
- for (const word of words) {
- if (e.name.toLowerCase().includes(word)) return true;
- if (e.entityType.toLowerCase().includes(word)) return true;
- if (e.observations.some(o => o.toLowerCase().includes(word))) return true;
- }
- return false;
- });
+ get embeddingsReady() {
+ return true;
  }

  /**
- * Semantic search using embeddings
- * Returns entities ranked by similarity score
+ * Alias for semantic search - uses built-in TF-IDF vector search
+ * This provides semantic-like understanding without external dependencies
  */
  async semanticSearchEntities(query, options = {}) {
- const {
- topK = 10,
- threshold = 0.65, // Lower threshold for more matches
- entityType = null,
- } = options;
-
- // Initialize embeddings if not ready
- if (!this.embeddingsReady) {
- await this.initEmbeddings();
- }
-
- // Fall back to keyword search if embeddings not available
- if (!this.embeddingsReady) {
- const results = this.searchEntities(query);
- return {
- results: results.map(e => ({ entity: e, score: 0 })),
- method: "keyword",
- count: results.length,
- };
- }
-
- try {
- // Generate embedding for the query
- const queryEmbedding = await this.embeddings.embed(query);
-
- // Get all entities with their embeddings
- const allEntities = this.getAllEntities();
- const items = allEntities
- .filter(e => !entityType || e.entityType === entityType)
- .map(e => ({
- entity: e,
- embedding: this.data.entities[e.name]?.embedding || null,
- }))
- .filter(item => item.embedding !== null);
-
- // Find most similar
- const scoredResults = this.embeddings.findMostSimilar(
- queryEmbedding,
- items,
- topK,
- threshold
- );
-
- return {
- results: scoredResults.map(r => ({ entity: r.entity, score: r.score })),
- method: "semantic",
- count: scoredResults.length,
- };
- } catch (error) {
- console.warn("Semantic search failed, falling back to keyword:", error.message);
- const results = this.searchEntities(query);
- return {
- results: results.map(e => ({ entity: e, score: 0 })),
- method: "keyword",
- count: results.length,
- };
- }
+ return this.vectorSearchEntities(query, options);
  }

  /**
- * Generate embeddings for all entities that don't have them
- * Useful for migrating existing data to semantic search
+ * Search entities using TF-IDF vector search
+ * Returns entities ranked by similarity score
  */
- async generateMissingEmbeddings(progressCallback = null) {
- // Initialize embeddings if not ready
- if (!this.embeddingsReady) {
- const ready = await this.initEmbeddings();
- if (!ready) {
- throw new Error("Embeddings provider not available");
- }
- }
+ vectorSearchEntities(query, options = {}) {
+ const {
+ topK = 10,
+ threshold = 0.1,
+ entityType = null
+ } = options;

- const entities = this.getAllEntities();
- const missing = entities.filter(e => !this.data.entities[e.name]?.embedding);
-
- let processed = 0;
- for (const entity of missing) {
- await this.updateEntityEmbedding(entity.name, entity);
- processed++;
- if (progressCallback) {
- progressCallback(processed, missing.length, entity.name);
- }
- }
+ // Use vector search index
+ const result = this.vectorIndex.search(query, { topK, threshold, entityType });

- await this.save();
- return { processed, total: entities.length };
+ return {
+ results: result.results,
+ method: "vector",
+ count: result.count
+ };
  }

  /**
@@ -477,6 +355,17 @@ export class Storage {
  return result;
  }

+ /**
+ * Get index statistics
+ */
+ getIndexStats() {
+ return {
+ ...this.vectorIndex.getStats(),
+ entityCount: Object.keys(this.data.entities).length,
+ relationCount: this.data.relations.length
+ };
+ }
+

  /**
  * Close storage
  */
package/src/vector-search.js ADDED
@@ -0,0 +1,323 @@
+ /**
+ * Vector Search module for the Collective Memory System.
+ * Uses TF-IDF (Term Frequency-Inverse Document Frequency) for semantic-like search.
+ * Pure JavaScript - no external dependencies, works completely offline.
+ */
+
+ /**
+ * Tokenize text into terms
+ * - Converts to lowercase
+ * - Removes special characters (but keeps internal hyphens in compound words)
+ * - Splits into words
+ * - Filters stop words
+ */
+ function tokenize(text) {
+ const stopWords = new Set([
+ "a", "about", "above", "after", "again", "against", "all", "am", "an", "and",
+ "any", "are", "as", "at", "be", "because", "been", "before", "being", "below",
+ "between", "both", "but", "by", "can", "did", "do", "does", "doing", "don",
+ "down", "during", "each", "few", "for", "from", "further", "had", "has", "have",
+ "having", "he", "her", "here", "hers", "herself", "him", "himself", "his",
+ "how", "i", "if", "in", "into", "is", "it", "its", "itself", "just", "me",
+ "might", "more", "most", "must", "my", "myself", "no", "nor", "not", "now",
+ "of", "off", "on", "once", "only", "or", "other", "our", "ours", "ourselves",
+ "out", "over", "own", "s", "same", "she", "should", "so", "some", "still",
+ "such", "t", "than", "that", "the", "their", "theirs", "them", "themselves",
+ "then", "there", "these", "they", "this", "those", "through", "to", "too",
+ "under", "until", "up", "very", "was", "we", "were", "what", "when", "where",
+ "which", "while", "who", "whom", "why", "will", "with", "would", "you",
+ "your", "yours", "yourself", "yourselves", "task", "artifact", "structure",
+ "agent", "session", "entity", "description", "created", "during"
+ ]);
+
+ return text
+ .toLowerCase()
+ // Replace special chars with space, but keep word chars and internal hyphens
+ .replace(/[^\w\s-]/g, " ")
+ .split(/\s+/)
+ .filter(word => word.length > 2 && !stopWords.has(word));
+ }
+
+ /**
+ * Extract terms from an entity for indexing
+ */
+ function extractEntityTerms(entity) {
+ const terms = [];
+
+ // Name has high weight
+ terms.push(...tokenize(entity.name));
+
+ // Entity type
+ terms.push(entity.entityType);
+
+ // All observations
+ if (entity.observations) {
+ for (const obs of entity.observations) {
+ terms.push(...tokenize(obs));
+ }
+ }
+
+ // Metadata
+ if (entity.metadata) {
+ for (const value of Object.values(entity.metadata)) {
+ if (typeof value === "string") {
+ terms.push(...tokenize(value));
+ }
+ }
+ }
+
+ return terms;
+ }
+
+ /**
+ * Calculate term frequency for a document
+ */
+ function calculateTermFrequency(terms) {
+ const tf = {};
+ const totalTerms = terms.length;
+
+ for (const term of terms) {
+ tf[term] = (tf[term] || 0) + 1;
+ }
+
+ // Normalize by document length
+ for (const term in tf) {
+ tf[term] = tf[term] / totalTerms;
+ }
+
+ return tf;
+ }
+
+ /**
+ * Calculate inverse document frequency
+ */
+ function calculateIDF(documents) {
+ const idf = {};
+ const totalDocs = documents.length;
+
+ // Count documents containing each term
+ for (const doc of documents) {
+ const uniqueTerms = new Set(doc.terms);
+ for (const term of uniqueTerms) {
+ idf[term] = (idf[term] || 0) + 1;
+ }
+ }
+
+ // Calculate IDF
+ for (const term in idf) {
+ idf[term] = Math.log(totalDocs / (1 + idf[term]));
+ }
+
+ return idf;
+ }
+
+ /**
+ * Create a TF-IDF vector for a document
+ */
+ function createTFIDFVector(tf, idf, allTerms) {
+ const vector = [];
+
+ for (const term of allTerms) {
+ const tfValue = tf[term] || 0;
+ const idfValue = idf[term] || 0;
+ vector.push(tfValue * idfValue);
+ }
+
+ return vector;
+ }
+
+ /**
+ * Calculate cosine similarity between two vectors
+ */
+ function cosineSimilarity(a, b) {
+ if (a.length !== b.length) {
+ return 0;
+ }
+
+ let dotProduct = 0;
+ let normA = 0;
+ let normB = 0;
+
+ for (let i = 0; i < a.length; i++) {
+ dotProduct += a[i] * b[i];
+ normA += a[i] * a[i];
+ normB += b[i] * b[i];
+ }
+
+ normA = Math.sqrt(normA);
+ normB = Math.sqrt(normB);
+
+ if (normA === 0 || normB === 0) {
+ return 0;
+ }
+
+ return dotProduct / (normA * normB);
+ }
+
+ /**
+ * Vector Search Index
+ */
+ class VectorSearchIndex {
+ constructor() {
+ this.documents = [];
+ this.allTerms = new Set();
+ this.idf = {};
+ this.built = false;
+ }
+
+ /**
+ * Add a document to the index
+ */
+ addDocument(id, entity) {
+ const terms = extractEntityTerms(entity);
+
+ this.documents.push({
+ id,
+ entity,
+ terms,
+ tf: null,
+ vector: null
+ });
+
+ for (const term of terms) {
+ this.allTerms.add(term);
+ }
+
+ this.built = false;
+ }
+
+ /**
+ * Build the index (calculate TF-IDF vectors)
+ */
+ build() {
+ if (this.documents.length === 0) {
+ return;
+ }
+
+ const termList = Array.from(this.allTerms);
+
+ // Calculate IDF for all terms
+ this.idf = calculateIDF(this.documents);
+
+ // Calculate TF and create vectors for each document
+ for (const doc of this.documents) {
+ doc.tf = calculateTermFrequency(doc.terms);
+ doc.vector = createTFIDFVector(doc.tf, this.idf, termList);
+ }
+
+ this.allTermsList = termList;
+ this.built = true;
+ }
+
+ /**
+ * Search the index
+ */
+ search(query, options = {}) {
216
+ const {
217
+ topK = 10,
218
+ threshold = 0.1,
219
+ entityType = null
220
+ } = options;
221
+
222
+ // Build if not already built
223
+ if (!this.built) {
224
+ this.build();
225
+ }
226
+
227
+ // Tokenize query
228
+ const queryTerms = tokenize(query);
229
+ if (queryTerms.length === 0) {
230
+ return { results: [], method: "vector", count: 0 };
231
+ }
232
+
233
+ // Create query vector
234
+ const queryTF = calculateTermFrequency(queryTerms);
235
+ const queryVector = createTFIDFVector(queryTF, this.idf, this.allTermsList);
236
+
237
+ // Calculate similarities
238
+ const results = this.documents
239
+ .filter(doc => !entityType || doc.entity.entityType === entityType)
240
+ .map(doc => {
241
+ const score = cosineSimilarity(queryVector, doc.vector);
242
+ return {
243
+ entity: doc.entity,
244
+ score
245
+ };
246
+ })
247
+ .filter(r => r.score >= threshold)
248
+ .sort((a, b) => b.score - a.score)
249
+ .slice(0, topK);
250
+
251
+ return {
252
+ results,
253
+ method: "vector",
254
+ count: results.length
255
+ };
256
+ }
257
+
258
+ /**
259
+ * Clear the index
260
+ */
261
+ clear() {
262
+ this.documents = [];
263
+ this.allTerms = new Set();
264
+ this.idf = {};
265
+ this.built = false;
266
+ }
267
+
268
+ /**
269
+ * Get index statistics
270
+ */
271
+ getStats() {
272
+ return {
273
+ documentCount: this.documents.length,
274
+ uniqueTermCount: this.allTerms.size,
275
+ built: this.built
276
+ };
277
+ }
278
+ }
279
+
280
+ /**
281
+ * Singleton instance
282
+ */
283
+ let indexInstance = null;
284
+
285
+ /**
286
+ * Get or create the vector search index
287
+ */
288
+ export function getVectorIndex() {
289
+ if (!indexInstance) {
290
+ indexInstance = new VectorSearchIndex();
291
+ }
292
+ return indexInstance;
293
+ }
294
+
295
+ /**
296
+ * Rebuild index from entities
297
+ */
298
+ export function buildIndexFromEntities(entities) {
299
+ const index = getVectorIndex();
300
+ index.clear();
301
+
302
+ for (const entity of entities) {
303
+ index.addDocument(entity.name, entity);
304
+ }
305
+
306
+ index.build();
307
+ return index;
308
+ }
309
+
310
+ export {
311
+ VectorSearchIndex,
312
+ tokenize,
313
+ extractEntityTerms,
314
+ cosineSimilarity,
315
+ calculateTermFrequency,
316
+ calculateIDF
317
+ };
318
+
319
+ export default {
320
+ getVectorIndex,
321
+ buildIndexFromEntities,
322
+ VectorSearchIndex
323
+ };
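
The added module indexes entity text as TF-IDF vectors and ranks matches by cosine similarity. A minimal, self-contained sketch of that same pipeline (the toy corpus, query, and simplified `tokenize` without stop words are illustrative only, not part of the package):

```javascript
// TF-IDF indexing and cosine ranking, condensed from the approach above.
function tokenize(text) {
  return text.toLowerCase().replace(/[^\w\s-]/g, " ").split(/\s+/)
    .filter(w => w.length > 2); // simplified: no stop-word set
}

function tf(terms) {
  const counts = {};
  for (const t of terms) counts[t] = (counts[t] || 0) + 1;
  for (const t in counts) counts[t] /= terms.length; // normalize by length
  return counts;
}

function idf(docs) {
  const df = {}, out = {};
  for (const d of docs) for (const t of new Set(d)) df[t] = (df[t] || 0) + 1;
  for (const t in df) out[t] = Math.log(docs.length / (1 + df[t]));
  return out;
}

function vectorize(tfMap, idfMap, vocab) {
  return vocab.map(t => (tfMap[t] || 0) * (idfMap[t] || 0));
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  return na && nb ? dot / (Math.sqrt(na) * Math.sqrt(nb)) : 0;
}

// Toy "observations" standing in for indexed entity text
const corpus = [
  "fixed race condition in job queue worker",
  "added retry logic for failed network requests",
  "refactored queue worker to use async iterators",
];
const docs = corpus.map(tokenize);
const vocab = [...new Set(docs.flat())];
const idfMap = idf(docs);
const vectors = docs.map(d => vectorize(tf(d), idfMap, vocab));

const queryVec = vectorize(tf(tokenize("race condition worker")), idfMap, vocab);
const ranked = corpus
  .map((text, i) => ({ text, score: cosine(queryVec, vectors[i]) }))
  .sort((a, b) => b.score - a.score);
console.log(ranked[0].text); // the race-condition fix ranks first
```

Note one property of the `log(N / (1 + df))` variant used here and in the module: a term appearing in most documents gets an IDF near (or below) zero, so only relatively rare terms drive the ranking.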
package/src/embeddings.js DELETED
@@ -1,318 +0,0 @@
-/**
- * Embeddings module for semantic search in the Collective Memory System.
- * Supports multiple embedding providers with cosine similarity.
- */
-
-import fs from "fs";
-import path from "path";
-import os from "os";
-
-const CONFIG_DIR = path.join(os.homedir(), ".collective-memory");
-const CONFIG_PATH = path.join(CONFIG_DIR, "config.json");
-
-/**
- * Default configuration
- */
-const DEFAULT_CONFIG = {
-  embedding_provider: "openai", // 'openai' or 'ollama'
-  openai_api_key: null,
-  openai_model: "text-embedding-3-small",
-  ollama_base_url: "http://localhost:11434",
-  ollama_model: "nomic-embed-text",
-  embedding_dimension: 1536,
-};
-
-/**
- * Load configuration from file
- */
-function loadConfig() {
-  try {
-    if (fs.existsSync(CONFIG_PATH)) {
-      const content = fs.readFileSync(CONFIG_PATH, "utf-8");
-      return { ...DEFAULT_CONFIG, ...JSON.parse(content) };
-    }
-  } catch (error) {
-    console.error("Failed to load config:", error.message);
-  }
-  return { ...DEFAULT_CONFIG };
-}
-
-/**
- * Save configuration to file
- */
-function saveConfig(config) {
-  try {
-    if (!fs.existsSync(CONFIG_DIR)) {
-      fs.mkdirSync(CONFIG_DIR, { recursive: true });
-    }
-    fs.writeFileSync(CONFIG_PATH, JSON.stringify(config, null, 2));
-  } catch (error) {
-    console.error("Failed to save config:", error.message);
-  }
-}
-
-/**
- * Calculate cosine similarity between two vectors
- */
-function cosineSimilarity(a, b) {
-  if (a.length !== b.length) {
-    throw new Error("Vector dimensions must match");
-  }
-
-  let dotProduct = 0;
-  let normA = 0;
-  let normB = 0;
-
-  for (let i = 0; i < a.length; i++) {
-    dotProduct += a[i] * b[i];
-    normA += a[i] * a[i];
-    normB += b[i] * b[i];
-  }
-
-  normA = Math.sqrt(normA);
-  normB = Math.sqrt(normB);
-
-  if (normA === 0 || normB === 0) {
-    return 0;
-  }
-
-  return dotProduct / (normA * normB);
-}
-
-/**
- * OpenAI embeddings provider
- */
-class OpenAIEmbeddings {
-  constructor(config) {
-    this.apiKey = config.openai_api_key || process.env.OPENAI_API_KEY;
-    this.model = config.openai_model || "text-embedding-3-small";
-    this.dimension = config.embedding_dimension || 1536;
-  }
-
-  isAvailable() {
-    return !!this.apiKey;
-  }
-
-  async embed(text) {
-    if (!this.isAvailable()) {
-      throw new Error("OpenAI API key not configured");
-    }
-
-    const response = await fetch("https://api.openai.com/v1/embeddings", {
-      method: "POST",
-      headers: {
-        "Content-Type": "application/json",
-        "Authorization": `Bearer ${this.apiKey}`,
-      },
-      body: JSON.stringify({
-        model: this.model,
-        input: text,
-      }),
-    });
-
-    if (!response.ok) {
-      const error = await response.text();
-      throw new Error(`OpenAI API error: ${error}`);
-    }
-
-    const data = await response.json();
-    return data.data[0].embedding;
-  }
-
-  async embedBatch(texts) {
-    if (!this.isAvailable()) {
-      throw new Error("OpenAI API key not configured");
-    }
-
-    const response = await fetch("https://api.openai.com/v1/embeddings", {
-      method: "POST",
-      headers: {
-        "Content-Type": "application/json",
-        "Authorization": `Bearer ${this.apiKey}`,
-      },
-      body: JSON.stringify({
-        model: this.model,
-        input: texts,
-      }),
-    });
-
-    if (!response.ok) {
-      const error = await response.text();
-      throw new Error(`OpenAI API error: ${error}`);
-    }
-
-    const data = await response.json();
-    return data.data.map((item) => item.embedding);
-  }
-}
-
-/**
- * Ollama embeddings provider (local, free)
- */
-class OllamaEmbeddings {
-  constructor(config) {
-    this.baseUrl = config.ollama_base_url || "http://localhost:11434";
-    this.model = config.ollama_model || "nomic-embed-text";
-    this.dimension = 768; // Default for nomic-embed-text
-  }
-
-  isAvailable() {
-    // Check if Ollama is running
-    return fetch(`${this.baseUrl}/api/tags`)
-      .then((res) => res.ok)
-      .catch(() => false);
-  }
-
-  async embed(text) {
-    const response = await fetch(`${this.baseUrl}/api/embeddings`, {
-      method: "POST",
-      headers: {
-        "Content-Type": "application/json",
-      },
-      body: JSON.stringify({
-        model: this.model,
-        prompt: text,
-      }),
-    });
-
-    if (!response.ok) {
-      const error = await response.text();
-      throw new Error(`Ollama API error: ${error}`);
-    }
-
-    const data = await response.json();
-    return data.embedding;
-  }
-
-  async embedBatch(texts) {
-    // Ollama doesn't support batch embeddings, so we do them sequentially
-    const embeddings = [];
-    for (const text of texts) {
-      embeddings.push(await this.embed(text));
-    }
-    return embeddings;
-  }
-}
-
-/**
- * Main embeddings class that manages providers
- */
-class Embeddings {
-  constructor(config) {
-    this.config = config || loadConfig();
-    this.providers = {
-      openai: new OpenAIEmbeddings(this.config),
-      ollama: new OllamaEmbeddings(this.config),
-    };
-    this.activeProvider = this.config.embedding_provider || "openai";
-  }
-
-  /**
-   * Get the active provider
-   */
-  getProvider() {
-    return this.providers[this.activeProvider];
-  }
-
-  /**
-   * Check if the active provider is available
-   */
-  async isAvailable() {
-    const provider = this.getProvider();
-
-    if (this.activeProvider === "ollama") {
-      return await provider.isAvailable();
-    }
-
-    return provider.isAvailable();
-  }
-
-  /**
-   * Generate embedding for a single text
-   */
-  async embed(text) {
-    const provider = this.getProvider();
-    return await provider.embed(text);
-  }
-
-  /**
-   * Generate embeddings for multiple texts
-   */
-  async embedBatch(texts) {
-    const provider = this.getProvider();
-    return await provider.embedBatch(texts);
-  }
-
-  /**
-   * Find most similar items using cosine similarity
-   */
-  findMostSimilar(queryEmbedding, items, topK = 10, threshold = 0.7) {
-    const results = items.map((item) => {
-      if (!item.embedding) {
-        return { ...item, score: 0 };
-      }
-      const score = cosineSimilarity(queryEmbedding, item.embedding);
-      return { ...item, score };
-    });
-
-    // Sort by score descending and filter by threshold
-    return results
-      .filter((r) => r.score >= threshold)
-      .sort((a, b) => b.score - a.score)
-      .slice(0, topK);
-  }
-
-  /**
-   * Create text representation for entity embedding
-   */
-  createEntityText(entity) {
-    const parts = [];
-
-    // Name carries important semantic weight
-    parts.push(`Name: ${entity.name}`);
-
-    // Entity type provides context
-    parts.push(`Type: ${entity.entityType}`);
-
-    // Observations contain the detailed information
-    if (entity.observations && entity.observations.length > 0) {
-      parts.push(`Observations:\n${entity.observations.join("\n")}`);
-    }
-
-    // Metadata if present
-    if (entity.metadata) {
-      parts.push(`Metadata: ${JSON.stringify(entity.metadata)}`);
-    }
-
-    return parts.join("\n\n");
-  }
-}
-
-/**
- * Singleton instance
- */
-let embeddingsInstance = null;
-
-/**
- * Get or create embeddings instance
- */
-export function getEmbeddings(config) {
-  if (!embeddingsInstance) {
-    embeddingsInstance = new Embeddings(config);
-  }
-  return embeddingsInstance;
-}
-
-/**
- * Export utilities
- */
-export {
-  cosineSimilarity,
-  loadConfig,
-  saveConfig,
-  Embeddings,
-  OpenAIEmbeddings,
-  OllamaEmbeddings,
-};
-
-export default { getEmbeddings, cosineSimilarity, loadConfig, saveConfig };
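
The deleted module fetched embeddings from OpenAI or Ollama over HTTP, but its ranking step (`findMostSimilar`) was pure math. A self-contained sketch of that step with hand-made 3-dimensional vectors standing in for real provider embeddings (names and values are illustrative only):

```javascript
// Cosine-similarity ranking as in the removed embeddings module,
// using tiny fake vectors instead of provider-generated embeddings.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] * a[i]; nb += b[i] * b[i];
  }
  const denom = Math.sqrt(na) * Math.sqrt(nb);
  return denom === 0 ? 0 : dot / denom;
}

function findMostSimilar(queryEmbedding, items, topK = 10, threshold = 0.7) {
  return items
    .map(item => ({
      ...item,
      // Items without an embedding score 0 and fall below the threshold
      score: item.embedding ? cosineSimilarity(queryEmbedding, item.embedding) : 0,
    }))
    .filter(r => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const items = [
  { name: "auth-refactor", embedding: [0.9, 0.1, 0.0] },
  { name: "db-migration",  embedding: [0.0, 0.2, 0.9] },
  { name: "login-bugfix",  embedding: [0.8, 0.3, 0.1] },
];
const query = [1, 0, 0]; // stand-in embedding for an auth-related query
const top = findMostSimilar(query, items, 2, 0.7);
console.log(top.map(r => r.name)); // [ 'auth-refactor', 'login-bugfix' ]
```

The 0.6.x vector search replaces the provider-specific embeddings with a local TF-IDF index, which trades semantic generalization for zero configuration and no network dependency.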