collective-memory-mcp 0.5.0 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
package/README.md CHANGED
@@ -1,16 +1,23 @@
1
1
  # Collective Memory MCP Server
2
2
 
3
- A persistent, graph-based memory system that enables AI agents to document their work and learn from each other's experiences. This system transforms ephemeral agent interactions into a searchable knowledge base of structural patterns, solutions, and methodologies.
3
+ A persistent, graph-based memory system with **semantic search** that enables AI agents to document their work and learn from each other's experiences. This system transforms ephemeral agent interactions into a searchable knowledge base of structural patterns, solutions, and methodologies.
4
4
 
5
5
  ## Overview
6
6
 
7
7
  The Collective Memory System is designed for multi-agent environments where agents need to:
8
8
 
9
9
  - Document their completed work for future reference
10
- - Discover how similar tasks were solved previously
10
+ - Discover how similar tasks were solved previously using **semantic understanding**
11
11
  - Learn from the structural patterns and approaches of other agents
12
12
  - Coordinate across parallel executions without duplicating effort
13
13
 
14
+ ## Key Features
15
+
16
+ - **Semantic Search** - Finds conceptually similar content even when keywords differ
17
+ - **Knowledge Graph** - Entities and relations capture complex relationships
18
+ - **Ranked Results** - Similarity scores help identify the most relevant past work
19
+ - **Auto-Embeddings** - New content automatically gets semantic embeddings
20
+
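The similarity scores behind ranked results come from cosine similarity between embedding vectors. A minimal runnable sketch of the scoring, mirroring the `cosineSimilarity` helper this release adds in `src/embeddings.js`:

```javascript
// Cosine similarity: 1 for identical direction, 0 for unrelated (orthogonal).
// Mirrors the cosineSimilarity helper added in src/embeddings.js.
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("Vector dimensions must match");
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

console.log(cosineSimilarity([3, 4], [3, 4])); // → 1
console.log(cosineSimilarity([1, 0], [0, 1])); // → 0
```

Real embeddings have hundreds of dimensions (768 for nomic-embed-text, 1536 for text-embedding-3-small), but the scoring is the same.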
14
21
  ## Installation
15
22
 
16
23
  ```bash
@@ -42,8 +49,11 @@ Add this to your Claude system prompt to ensure agents know about the Collective
42
49
  You have access to a Collective Memory MCP Server that stores knowledge from previous tasks.
43
50
 
44
51
  BEFORE starting work, search for similar past tasks using:
45
- - search_collective_memory
46
- - find_similar_procedures
52
+ - search_collective_memory (semantic search - understands meaning, not just keywords)
53
+ - find_similar_procedures (finds similar tasks with full implementation details)
54
+
55
+ The search uses semantic embeddings, so it finds relevant content even when different
56
+ terminology is used. Results are ranked by similarity score.
47
57
 
48
58
  AFTER completing any task, document it using:
49
59
  - record_task_completion
@@ -52,6 +62,48 @@ When writing observations, be SPECIFIC and include facts like file paths, versio
52
62
  metrics, and error messages. Avoid vague statements like "works well" or "fixed bugs".
53
63
  ```
54
64
 
65
+ ## Setting Up Semantic Search
66
+
67
+ For semantic search to work, you need to configure an embeddings provider:
68
+
69
+ ### Option 1: OpenAI (Recommended - Best Quality)
70
+
71
+ Use the `manage_embeddings` tool with your OpenAI API key:
72
+
73
+ ```json
74
+ {
75
+ "action": "configure",
76
+ "provider": "openai",
77
+ "api_key": "sk-..."
78
+ }
79
+ ```
80
+
81
+ ### Option 2: Ollama (Free - Local)
82
+
83
+ ```bash
84
+ # Install Ollama first
85
+ # Then pull the embedding model
86
+ ollama pull nomic-embed-text
87
+ ```
88
+
89
+ Then configure the provider with the `manage_embeddings` tool:
90
+
91
+ ```json
92
+ { "action": "configure", "provider": "ollama" }
93
+ ```
94
+
95
+ ### Generate Embeddings for Existing Data
96
+
97
+ After configuring, generate embeddings for any existing entities:
98
+
99
+ ```json
100
+ {
101
+ "action": "generate"
102
+ }
103
+ ```
104
+
105
+ **Note:** Without configuring embeddings, the system falls back to keyword-based search.
106
+
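The keyword fallback is word-based: the query is split into words, stop words and short tokens are dropped, and an entity matches if any remaining word appears in its name, type, or observations. A runnable sketch of that matching, simplified from `searchEntities` in `src/storage.js` (the entity here is illustrative):

```javascript
// Simplified keyword fallback, following searchEntities in src/storage.js:
// split the query, drop stop words and short tokens, match on any word.
const STOP_WORDS = new Set(["the", "a", "an", "and", "or", "for", "with"]);

function keywordMatch(query, entity) {
  const words = query
    .toLowerCase()
    .split(/\s+/)
    .filter((w) => w.length > 2 && !STOP_WORDS.has(w));
  const haystack = [entity.name, entity.entityType, ...entity.observations]
    .join(" ")
    .toLowerCase();
  return words.some((w) => haystack.includes(w));
}

const entity = {
  name: "auth-service",
  entityType: "task",
  observations: ["Implemented JWT authentication in src/auth.js"],
};
console.log(keywordMatch("fix the authentication flow", entity)); // → true
console.log(keywordMatch("database migration", entity));          // → false
```

Unlike semantic search, this only matches literal substrings, which is why results from the fallback carry no similarity score.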
55
107
  ## Entity Types
56
108
 
57
109
  | Type | Description |
@@ -93,7 +145,7 @@ metrics, and error messages. Avoid vague statements like "works well" or "fixed
93
145
  ### Query & Search
94
146
 
95
147
  - **read_graph** - Read entire knowledge graph
96
- - **search_collective_memory** - Natural language search
148
+ - **search_collective_memory** - Semantic search with ranked results
97
149
  - **open_nodes** - Retrieve specific nodes by name
98
150
 
99
151
  ### Agent Workflow
@@ -101,6 +153,10 @@ metrics, and error messages. Avoid vague statements like "works well" or "fixed
101
153
  - **record_task_completion** - Primary tool for documenting completed work
102
154
  - **find_similar_procedures** - Find similar tasks with full implementation details
103
155
 
156
+ ### Embeddings Management
157
+
158
+ - **manage_embeddings** - Configure semantic search and generate embeddings
159
+
104
160
  ## Example Usage
105
161
 
106
162
  ### Recording a Task Completion
@@ -126,12 +182,38 @@ await session.callTool("record_task_completion", {
126
182
  });
127
183
  ```
128
184
 
129
- ### Finding Similar Procedures
185
+ ### Finding Similar Procedures (Semantic Search)
130
186
 
131
187
  ```javascript
132
188
  const result = await session.callTool("find_similar_procedures", {
133
189
  query: "authentication implementation"
134
190
  });
191
+
192
+ // Returns ranked results with similarity scores:
193
+ // {
194
+ // "similar_tasks": [
195
+ // { "task": {...}, "score": 0.89, "artifacts": [...], "structures": [...] },
196
+ // { "task": {...}, "score": 0.82, "artifacts": [...], "structures": [...] }
197
+ // ],
198
+ // "search_method": "semantic"
199
+ // }
200
+ ```
201
+
202
+ ### Configuring Embeddings
203
+
204
+ ```javascript
205
+ // Check status
206
+ await session.callTool("manage_embeddings", { "action": "status" });
207
+
208
+ // Configure OpenAI
209
+ await session.callTool("manage_embeddings", {
210
+ "action": "configure",
211
+ "provider": "openai",
212
+ "api_key": "sk-..."
213
+ });
214
+
215
+ // Generate embeddings for existing data
216
+ await session.callTool("manage_embeddings", { "action": "generate" });
135
217
  ```
136
218
 
137
219
  ## Database
@@ -139,9 +221,19 @@ const result = await session.callTool("find_similar_procedures", {
139
221
  The server uses JSON file storage for persistence. Data is stored at:
140
222
 
141
223
  ```
142
- ~/.collective-memory/memory.json
224
+ ~/.collective-memory/memory.json # Knowledge graph data
225
+ ~/.collective-memory/config.json # Embeddings provider configuration
143
226
  ```
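The config file holds the provider settings defaulted in `src/embeddings.js`. A representative `config.json` after selecting the Ollama provider might look like this (values are illustrative):

```json
{
  "embedding_provider": "ollama",
  "openai_api_key": null,
  "openai_model": "text-embedding-3-small",
  "ollama_base_url": "http://localhost:11434",
  "ollama_model": "nomic-embed-text",
  "embedding_dimension": 1536
}
```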
144
227
 
228
+ ## Semantic Search Benefits
229
+
230
+ | Before (Keyword Search) | After (Semantic Search) |
231
+ |------------------------|------------------------|
232
+ | Query "login" misses "authentication" | Query "login" finds "authentication", "JWT", "OAuth" |
233
+ | No relevance ranking | Results ranked by similarity score (0-1) |
234
+ | Exact word matching required | Understands meaning and intent |
235
+ | High false-positive rate | More precise, relevant results |
236
+
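Ranking works by scoring every embedded entity against the query vector, discarding scores under a threshold, and keeping the top K. A toy sketch of that pipeline (the 2-D vectors stand in for real embeddings; the map/filter/sort/slice shape follows `findMostSimilar` in `src/embeddings.js`):

```javascript
// Score each item against the query, filter by threshold, sort descending,
// keep the top K. Toy vectors stand in for real 768/1536-dim embeddings.
function rank(queryEmbedding, items, topK = 10, threshold = 0.7) {
  const cosine = (a, b) => {
    let dot = 0, na = 0, nb = 0;
    for (let i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    const d = Math.sqrt(na) * Math.sqrt(nb);
    return d === 0 ? 0 : dot / d;
  };
  return items
    .map((it) => ({ ...it, score: cosine(queryEmbedding, it.embedding) }))
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}

const results = rank([1, 0], [
  { name: "close match", embedding: [0.9, 0.1] },
  { name: "unrelated", embedding: [0, 1] },
]);
console.log(results.map((r) => r.name)); // → [ "close match" ]
```

The "unrelated" item scores 0 and is dropped by the 0.7 threshold, which is how low-relevance entities stay out of the results.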
145
237
  ## Requirements
146
238
 
147
239
  - Node.js 18+
package/package.json CHANGED
@@ -1,7 +1,7 @@
1
1
  {
2
2
  "name": "collective-memory-mcp",
3
- "version": "0.5.0",
4
- "description": "A persistent, graph-based memory system for AI agents (MCP Server)",
3
+ "version": "0.6.0",
4
+ "description": "A persistent, graph-based memory system for AI agents with semantic search (MCP Server)",
5
5
  "type": "module",
6
6
  "main": "src/server.js",
7
7
  "bin": {
@@ -0,0 +1,318 @@
1
+ /**
2
+ * Embeddings module for semantic search in the Collective Memory System.
3
+ * Supports multiple embedding providers with cosine similarity.
4
+ */
5
+
6
+ import fs from "fs";
7
+ import path from "path";
8
+ import os from "os";
9
+
10
+ const CONFIG_DIR = path.join(os.homedir(), ".collective-memory");
11
+ const CONFIG_PATH = path.join(CONFIG_DIR, "config.json");
12
+
13
+ /**
14
+ * Default configuration
15
+ */
16
+ const DEFAULT_CONFIG = {
17
+ embedding_provider: "openai", // 'openai' or 'ollama'
18
+ openai_api_key: null,
19
+ openai_model: "text-embedding-3-small",
20
+ ollama_base_url: "http://localhost:11434",
21
+ ollama_model: "nomic-embed-text",
22
+ embedding_dimension: 1536,
23
+ };
24
+
25
+ /**
26
+ * Load configuration from file
27
+ */
28
+ function loadConfig() {
29
+ try {
30
+ if (fs.existsSync(CONFIG_PATH)) {
31
+ const content = fs.readFileSync(CONFIG_PATH, "utf-8");
32
+ return { ...DEFAULT_CONFIG, ...JSON.parse(content) };
33
+ }
34
+ } catch (error) {
35
+ console.error("Failed to load config:", error.message);
36
+ }
37
+ return { ...DEFAULT_CONFIG };
38
+ }
39
+
40
+ /**
41
+ * Save configuration to file
42
+ */
43
+ function saveConfig(config) {
44
+ try {
45
+ if (!fs.existsSync(CONFIG_DIR)) {
46
+ fs.mkdirSync(CONFIG_DIR, { recursive: true });
47
+ }
48
+ fs.writeFileSync(CONFIG_PATH, JSON.stringify(config, null, 2));
49
+ } catch (error) {
50
+ console.error("Failed to save config:", error.message);
51
+ }
52
+ }
53
+
54
+ /**
55
+ * Calculate cosine similarity between two vectors
56
+ */
57
+ function cosineSimilarity(a, b) {
58
+ if (a.length !== b.length) {
59
+ throw new Error("Vector dimensions must match");
60
+ }
61
+
62
+ let dotProduct = 0;
63
+ let normA = 0;
64
+ let normB = 0;
65
+
66
+ for (let i = 0; i < a.length; i++) {
67
+ dotProduct += a[i] * b[i];
68
+ normA += a[i] * a[i];
69
+ normB += b[i] * b[i];
70
+ }
71
+
72
+ normA = Math.sqrt(normA);
73
+ normB = Math.sqrt(normB);
74
+
75
+ if (normA === 0 || normB === 0) {
76
+ return 0;
77
+ }
78
+
79
+ return dotProduct / (normA * normB);
80
+ }
81
+
82
+ /**
83
+ * OpenAI embeddings provider
84
+ */
85
+ class OpenAIEmbeddings {
86
+ constructor(config) {
87
+ this.apiKey = config.openai_api_key || process.env.OPENAI_API_KEY;
88
+ this.model = config.openai_model || "text-embedding-3-small";
89
+ this.dimension = config.embedding_dimension || 1536;
90
+ }
91
+
92
+ isAvailable() {
93
+ return !!this.apiKey;
94
+ }
95
+
96
+ async embed(text) {
97
+ if (!this.isAvailable()) {
98
+ throw new Error("OpenAI API key not configured");
99
+ }
100
+
101
+ const response = await fetch("https://api.openai.com/v1/embeddings", {
102
+ method: "POST",
103
+ headers: {
104
+ "Content-Type": "application/json",
105
+ "Authorization": `Bearer ${this.apiKey}`,
106
+ },
107
+ body: JSON.stringify({
108
+ model: this.model,
109
+ input: text,
110
+ }),
111
+ });
112
+
113
+ if (!response.ok) {
114
+ const error = await response.text();
115
+ throw new Error(`OpenAI API error: ${error}`);
116
+ }
117
+
118
+ const data = await response.json();
119
+ return data.data[0].embedding;
120
+ }
121
+
122
+ async embedBatch(texts) {
123
+ if (!this.isAvailable()) {
124
+ throw new Error("OpenAI API key not configured");
125
+ }
126
+
127
+ const response = await fetch("https://api.openai.com/v1/embeddings", {
128
+ method: "POST",
129
+ headers: {
130
+ "Content-Type": "application/json",
131
+ "Authorization": `Bearer ${this.apiKey}`,
132
+ },
133
+ body: JSON.stringify({
134
+ model: this.model,
135
+ input: texts,
136
+ }),
137
+ });
138
+
139
+ if (!response.ok) {
140
+ const error = await response.text();
141
+ throw new Error(`OpenAI API error: ${error}`);
142
+ }
143
+
144
+ const data = await response.json();
145
+ return data.data.map((item) => item.embedding);
146
+ }
147
+ }
148
+
149
+ /**
150
+ * Ollama embeddings provider (local, free)
151
+ */
152
+ class OllamaEmbeddings {
153
+ constructor(config) {
154
+ this.baseUrl = config.ollama_base_url || "http://localhost:11434";
155
+ this.model = config.ollama_model || "nomic-embed-text";
156
+ this.dimension = 768; // Default for nomic-embed-text
157
+ }
158
+
159
+ async isAvailable() {
160
+ // Check whether the local Ollama server is responding
161
+ try {
162
+ const res = await fetch(`${this.baseUrl}/api/tags`);
+ return res.ok;
163
+ } catch {
+ return false;
+ }
164
+ }
165
+
166
+ async embed(text) {
167
+ const response = await fetch(`${this.baseUrl}/api/embeddings`, {
168
+ method: "POST",
169
+ headers: {
170
+ "Content-Type": "application/json",
171
+ },
172
+ body: JSON.stringify({
173
+ model: this.model,
174
+ prompt: text,
175
+ }),
176
+ });
177
+
178
+ if (!response.ok) {
179
+ const error = await response.text();
180
+ throw new Error(`Ollama API error: ${error}`);
181
+ }
182
+
183
+ const data = await response.json();
184
+ return data.embedding;
185
+ }
186
+
187
+ async embedBatch(texts) {
188
+ // Ollama doesn't support batch embeddings, so we do them sequentially
189
+ const embeddings = [];
190
+ for (const text of texts) {
191
+ embeddings.push(await this.embed(text));
192
+ }
193
+ return embeddings;
194
+ }
195
+ }
196
+
197
+ /**
198
+ * Main embeddings class that manages providers
199
+ */
200
+ class Embeddings {
201
+ constructor(config) {
202
+ this.config = config || loadConfig();
203
+ this.providers = {
204
+ openai: new OpenAIEmbeddings(this.config),
205
+ ollama: new OllamaEmbeddings(this.config),
206
+ };
207
+ this.activeProvider = this.config.embedding_provider || "openai";
208
+ }
209
+
210
+ /**
211
+ * Get the active provider
212
+ */
213
+ getProvider() {
214
+ return this.providers[this.activeProvider];
215
+ }
216
+
217
+ /**
218
+ * Check if the active provider is available
219
+ */
220
+ async isAvailable() {
221
+ const provider = this.getProvider();
222
+
223
+ if (this.activeProvider === "ollama") {
224
+ return await provider.isAvailable();
225
+ }
226
+
227
+ return provider.isAvailable();
228
+ }
229
+
230
+ /**
231
+ * Generate embedding for a single text
232
+ */
233
+ async embed(text) {
234
+ const provider = this.getProvider();
235
+ return await provider.embed(text);
236
+ }
237
+
238
+ /**
239
+ * Generate embeddings for multiple texts
240
+ */
241
+ async embedBatch(texts) {
242
+ const provider = this.getProvider();
243
+ return await provider.embedBatch(texts);
244
+ }
245
+
246
+ /**
247
+ * Find most similar items using cosine similarity
248
+ */
249
+ findMostSimilar(queryEmbedding, items, topK = 10, threshold = 0.7) {
250
+ const results = items.map((item) => {
251
+ if (!item.embedding) {
252
+ return { ...item, score: 0 };
253
+ }
254
+ const score = cosineSimilarity(queryEmbedding, item.embedding);
255
+ return { ...item, score };
256
+ });
257
+
258
+ // Sort by score descending and filter by threshold
259
+ return results
260
+ .filter((r) => r.score >= threshold)
261
+ .sort((a, b) => b.score - a.score)
262
+ .slice(0, topK);
263
+ }
264
+
265
+ /**
266
+ * Create text representation for entity embedding
267
+ */
268
+ createEntityText(entity) {
269
+ const parts = [];
270
+
271
+ // Name carries important semantic weight
272
+ parts.push(`Name: ${entity.name}`);
273
+
274
+ // Entity type provides context
275
+ parts.push(`Type: ${entity.entityType}`);
276
+
277
+ // Observations contain the detailed information
278
+ if (entity.observations && entity.observations.length > 0) {
279
+ parts.push(`Observations:\n${entity.observations.join("\n")}`);
280
+ }
281
+
282
+ // Metadata if present
283
+ if (entity.metadata) {
284
+ parts.push(`Metadata: ${JSON.stringify(entity.metadata)}`);
285
+ }
286
+
287
+ return parts.join("\n\n");
288
+ }
289
+ }
290
+
291
+ /**
292
+ * Singleton instance
293
+ */
294
+ let embeddingsInstance = null;
295
+
296
+ /**
297
+ * Get or create embeddings instance
298
+ */
299
+ export function getEmbeddings(config) {
300
+ if (!embeddingsInstance) {
301
+ embeddingsInstance = new Embeddings(config);
302
+ }
303
+ return embeddingsInstance;
304
+ }
305
+
306
+ /**
307
+ * Export utilities
308
+ */
309
+ export {
310
+ cosineSimilarity,
311
+ loadConfig,
312
+ saveConfig,
313
+ Embeddings,
314
+ OpenAIEmbeddings,
315
+ OllamaEmbeddings,
316
+ };
317
+
318
+ export default { getEmbeddings, cosineSimilarity, loadConfig, saveConfig };
package/src/server.js CHANGED
@@ -14,13 +14,17 @@ import {
14
14
  } from "@modelcontextprotocol/sdk/types.js";
15
15
  import { getStorage } from "./storage.js";
16
16
  import { Entity, Relation, ENTITY_TYPES, RELATION_TYPES } from "./models.js";
17
+ import { loadConfig } from "./embeddings.js";
17
18
 
18
19
  /**
19
20
  * Create and configure the MCP server
20
21
  */
21
- function createServer() {
22
+ async function createServer() {
22
23
  const storage = getStorage();
23
24
 
25
+ // Initialize embeddings for semantic search
26
+ await storage.initEmbeddings();
27
+
24
28
  const server = new Server(
25
29
  {
26
30
  name: "collective-memory",
@@ -230,16 +234,17 @@ function createServer() {
230
234
  {
231
235
  name: "search_collective_memory",
232
236
  description:
233
- "**Search all past work** - Use before starting a task to learn from previous solutions. " +
234
- "Searches entity names, types, and all observation content. " +
235
- "Returns matching entities with their relations. " +
237
+ "**Search all past work using semantic search** - Use before starting a task to learn from previous solutions. " +
238
+ "Uses semantic embeddings to find conceptually similar content, even with different keywords. " +
239
+ "Returns ranked results with similarity scores. " +
240
+ "Automatically falls back to keyword search if embeddings aren't available. " +
236
241
  "Use find_similar_procedures for more detailed results with artifacts.",
237
242
  inputSchema: {
238
243
  type: "object",
239
244
  properties: {
240
245
  query: {
241
246
  type: "string",
242
- description: "What are you looking for? (e.g., 'authentication', 'CORS fix', 'database')",
247
+ description: "What are you looking for? Semantic search understands meaning. (e.g., 'authentication', 'CORS fix', 'database')",
243
248
  },
244
249
  },
245
250
  required: ["query"],
@@ -350,8 +355,9 @@ function createServer() {
350
355
  {
351
356
  name: "find_similar_procedures",
352
357
  description:
353
- "**Use BEFORE starting work** - Find how similar tasks were solved previously. " +
354
- "Returns complete implementation details including artifacts and structures. " +
358
+ "**Use BEFORE starting work** - Find how similar tasks were solved previously using semantic search. " +
359
+ "Returns complete implementation details including artifacts and structures, ranked by similarity. " +
360
+ "Understands meaning and intent, not just keywords. " +
355
361
  "Learn from past solutions before implementing new features. " +
356
362
  "Query examples: 'authentication', 'database migration', 'API design', 'error handling'.",
357
363
  inputSchema: {
@@ -359,12 +365,39 @@ function createServer() {
359
365
  properties: {
360
366
  query: {
361
367
  type: "string",
362
- description: "What are you trying to do? (e.g., 'authentication implementation', 'database migration')",
368
+ description: "What are you trying to do? Semantic search finds conceptually similar work. (e.g., 'authentication implementation', 'database migration')",
363
369
  },
364
370
  },
365
371
  required: ["query"],
366
372
  },
367
373
  },
374
+ {
375
+ name: "manage_embeddings",
376
+ description:
377
+ "**Manage semantic search embeddings** - Generate embeddings for existing entities " +
378
+ "to enable semantic search. Run this once after setting up an embeddings provider. " +
379
+ "Embeddings enable finding similar content even when keywords don't match exactly.",
380
+ inputSchema: {
381
+ type: "object",
382
+ properties: {
383
+ action: {
384
+ type: "string",
385
+ enum: ["generate", "status", "configure"],
386
+ description: "Action: 'generate' creates embeddings for entities missing them, 'status' shows current state, 'configure' updates settings",
387
+ },
388
+ provider: {
389
+ type: "string",
390
+ enum: ["openai", "ollama"],
391
+ description: "Provider to use (only for 'configure' action)",
392
+ },
393
+ api_key: {
394
+ type: "string",
395
+ description: "API key for OpenAI (only for 'configure' action with provider='openai')",
396
+ },
397
+ },
398
+ required: ["action"],
399
+ },
400
+ },
368
401
  ],
369
402
  };
370
403
  });
@@ -703,6 +736,9 @@ Future agents will read your observations to learn. Write for them, not for your
703
736
  case "find_similar_procedures":
704
737
  return { content: [{ type: "text", text: JSON.stringify(await findSimilarProcedures(args), null, 2) }] };
705
738
 
739
+ case "manage_embeddings":
740
+ return { content: [{ type: "text", text: JSON.stringify(await manageEmbeddings(args), null, 2) }] };
741
+
706
742
  default:
707
743
  throw new Error(`Unknown tool: ${name}`);
708
744
  }
@@ -841,16 +877,19 @@ Future agents will read your observations to learn. Write for them, not for your
841
877
  };
842
878
  }
843
879
 
844
- function searchCollectiveMemory({ query = "" }) {
845
- const matchingEntities = storage.searchEntities(query);
880
+ async function searchCollectiveMemory({ query = "" }) {
881
+ // Use semantic search if available
882
+ const searchResult = await storage.semanticSearchEntities(query);
846
883
 
847
- const results = matchingEntities.map((entity) => {
884
+ const results = searchResult.results.map((item) => {
885
+ const entity = item.entity;
848
886
  const related = storage.getRelatedEntities(entity.name);
849
887
  return {
850
888
  name: entity.name,
851
889
  entityType: entity.entityType,
852
890
  observations: entity.observations,
853
891
  createdAt: entity.createdAt,
892
+ score: item.score,
854
893
  related_entities: related.connected.map((e) => ({
855
894
  name: e.name,
856
895
  entityType: e.entityType,
@@ -858,7 +897,11 @@ Future agents will read your observations to learn. Write for them, not for your
858
897
  };
859
898
  });
860
899
 
861
- return { matching_entities: results, count: results.length };
900
+ return {
901
+ matching_entities: results,
902
+ count: results.length,
903
+ search_method: searchResult.method,
904
+ };
862
905
  }
863
906
 
864
907
  function openNodes({ names = [] }) {
@@ -1032,20 +1075,13 @@ Future agents will read your observations to learn. Write for them, not for your
1032
1075
  };
1033
1076
  }
1034
1077
 
1035
- function findSimilarProcedures({ query = "" }) {
1036
- const searchQuery = query.toLowerCase();
1037
-
1038
- // Search for matching task entities
1039
- const allEntities = storage.getAllEntities();
1040
- const matchingTasks = allEntities.filter(
1041
- (e) =>
1042
- e.entityType === "task" &&
1043
- (e.name.toLowerCase().includes(searchQuery) ||
1044
- e.observations.some((obs) => obs.toLowerCase().includes(searchQuery)))
1045
- );
1078
+ async function findSimilarProcedures({ query = "" }) {
1079
+ // Use semantic search for tasks, falling back to keyword search
1080
+ const searchResult = await storage.semanticSearchEntities(query, { entityType: "task" });
1046
1081
 
1047
1082
  const results = [];
1048
- for (const task of matchingTasks) {
1083
+ for (const item of searchResult.results) {
1084
+ const task = item.entity;
1049
1085
  const taskRelations = storage.getRelations({ fromEntity: task.name });
1050
1086
 
1051
1087
  const artifacts = [];
@@ -1083,10 +1119,117 @@ Future agents will read your observations to learn. Write for them, not for your
1083
1119
  artifacts,
1084
1120
  structures,
1085
1121
  execution_context: executionContext,
1122
+ score: item.score,
1086
1123
  });
1087
1124
  }
1088
1125
 
1089
- return { similar_tasks: results, count: results.length };
1126
+ return { similar_tasks: results, count: results.length, search_method: searchResult.method };
1127
+ }
1128
+
1129
+ async function manageEmbeddings({ action = "status", provider = null, api_key = null }) {
1130
+ const { saveConfig } = await import("./embeddings.js");
1131
+
1132
+ switch (action) {
1133
+ case "status": {
1134
+ const allEntities = storage.getAllEntities();
1135
+ const withEmbeddings = allEntities.filter(
1136
+ (e) => storage.data.entities[e.name]?.embedding
1137
+ ).length;
1138
+
1139
+ const config = loadConfig();
1140
+ const isReady = storage.embeddingsReady;
1141
+
1142
+ return {
1143
+ status: "success",
1144
+ action: "status",
1145
+ embeddings_ready: isReady,
1146
+ provider: config.embedding_provider,
1147
+ entities_with_embeddings: withEmbeddings,
1148
+ total_entities: allEntities.length,
1149
+ coverage_percent: allEntities.length > 0
1150
+ ? Math.round((withEmbeddings / allEntities.length) * 100)
1151
+ : 0,
1152
+ message: isReady
1153
+ ? `Embeddings enabled using ${config.embedding_provider}. ${withEmbeddings}/${allEntities.length} entities have embeddings.`
1154
+ : "Embeddings not configured. Use 'configure' action to set up a provider.",
1155
+ };
1156
+ }
1157
+
1158
+ case "configure": {
1159
+ if (!provider) {
1160
+ return {
1161
+ status: "error",
1162
+ message: "Provider is required for configure action",
1163
+ };
1164
+ }
1165
+
1166
+ const config = loadConfig();
1167
+ config.embedding_provider = provider;
1168
+
1169
+ if (provider === "openai" && api_key) {
1170
+ config.openai_api_key = api_key;
1171
+ }
1172
+
1173
+ saveConfig(config);
1174
+
1175
+ // Re-initialize embeddings with new config
1176
+ storage.embeddingsReady = false;
1177
+ await storage.initEmbeddings();
1178
+
1179
+ return {
1180
+ status: "success",
1181
+ action: "configure",
1182
+ provider,
1183
+ embeddings_ready: storage.embeddingsReady,
1184
+ message: storage.embeddingsReady
1185
+ ? `Successfully configured ${provider} for embeddings`
1186
+ : `Configured ${provider} but provider is not available. Check API keys or provider status.`,
1187
+ };
1188
+ }
1189
+
1190
+ case "generate": {
1191
+ // Generate embeddings for entities that don't have them
1192
+ const allEntities = storage.getAllEntities();
1193
+ const missing = allEntities.filter(
1194
+ (e) => !storage.data.entities[e.name]?.embedding
1195
+ );
1196
+
1197
+ if (missing.length === 0) {
1198
+ return {
1199
+ status: "success",
1200
+ action: "generate",
1201
+ message: "All entities already have embeddings",
1202
+ processed: 0,
1203
+ };
1204
+ }
1205
+
1206
+ try {
1207
+ const result = await storage.generateMissingEmbeddings((current, total, name) => {
1208
+ // Optional: could emit progress events
1209
+ });
1210
+
1211
+ return {
1212
+ status: "success",
1213
+ action: "generate",
1214
+ processed: result.processed,
1215
+ total_entities: result.total,
1216
+ message: `Generated embeddings for ${result.processed} entities`,
1217
+ };
1218
+ } catch (error) {
1219
+ return {
1220
+ status: "error",
1221
+ action: "generate",
1222
+ message: error.message,
1223
+ };
1224
+ }
1225
+ }
1226
+
1227
+ default:
1228
+ return {
1229
+ status: "error",
1230
+ message: `Unknown action: ${action}`,
1231
+ };
1232
+ }
1090
1233
  }
1091
1234
 
1092
1235
  return server;
@@ -1096,7 +1239,7 @@ Future agents will read your observations to learn. Write for them, not for your
1096
1239
  * Main entry point
1097
1240
  */
1098
1241
  async function main() {
1099
- const server = createServer();
1242
+ const server = await createServer();
1100
1243
  const transport = new StdioServerTransport();
1101
1244
  await server.connect(transport);
1102
1245
  }
package/src/storage.js CHANGED
@@ -8,6 +8,7 @@ import { existsSync, mkdirSync, readFileSync, writeFileSync } from "fs";
8
8
  import path from "path";
9
9
  import os from "os";
10
10
  import { Entity, Relation } from "./models.js";
11
+ import { getEmbeddings } from "./embeddings.js";
11
12
 
12
13
  const DB_DIR = path.join(os.homedir(), ".collective-memory");
13
14
  const DB_PATH = path.join(DB_DIR, "memory.json");
@@ -16,9 +17,12 @@ const DB_PATH = path.join(DB_DIR, "memory.json");
16
17
  * Simple file-based storage
17
18
  */
18
19
  export class Storage {
19
- constructor(dbPath = DB_PATH) {
20
+ constructor(dbPath = DB_PATH, embeddingsEnabled = true) {
20
21
  this.dbPath = dbPath;
21
22
  this.data = null;
23
+ this.embeddingsEnabled = embeddingsEnabled;
24
+ this.embeddings = null;
25
+ this.embeddingsReady = false;
22
26
  // Initialize synchronously
23
27
  this.init();
24
28
  }
@@ -41,7 +45,7 @@ export class Storage {
41
45
  this.data = {
42
46
  entities: {},
43
47
  relations: [],
44
- version: "1.0",
48
+ version: "2.0", // Version bump for embeddings support
45
49
  };
46
50
  this.saveSync();
47
51
  }
@@ -50,11 +54,61 @@ export class Storage {
50
54
  this.data = {
51
55
  entities: {},
52
56
  relations: [],
53
- version: "1.0",
57
+ version: "2.0",
54
58
  };
55
59
  }
56
60
  }
57
61
 
62
+ /**
63
+ * Initialize embeddings asynchronously
64
+ */
65
+ async initEmbeddings() {
66
+ if (!this.embeddingsEnabled || this.embeddingsReady) {
67
+ return this.embeddingsReady;
68
+ }
69
+
70
+ try {
71
+ this.embeddings = getEmbeddings();
72
+ this.embeddingsReady = await this.embeddings.isAvailable();
73
+ } catch (error) {
74
+ console.warn("Embeddings not available:", error.message);
75
+ this.embeddingsReady = false;
76
+ }
77
+
78
+ return this.embeddingsReady;
79
+ }
80
+
81
+ /**
82
+ * Generate embedding for an entity
83
+ */
84
+ async generateEmbedding(entity) {
85
+ if (!this.embeddingsReady || !this.embeddings) {
86
+ return null;
87
+ }
88
+
89
+ try {
90
+ const text = this.embeddings.createEntityText(entity);
91
+ return await this.embeddings.embed(text);
92
+ } catch (error) {
93
+ console.warn("Failed to generate embedding:", error.message);
94
+ return null;
95
+ }
96
+ }
97
+
98
+ /**
99
+ * Store embedding in entity data
100
+ */
101
+ async updateEntityEmbedding(entityName, entity) {
102
+ if (!this.embeddingsReady) {
103
+ return;
104
+ }
105
+
106
+ const embedding = await this.generateEmbedding(entity);
107
+ if (embedding) {
108
+ this.data.entities[entityName].embedding = embedding;
109
+ }
110
+ }
111
+
58
112
  /**
59
113
  * Save data synchronously
60
114
  */
@@ -90,7 +144,12 @@ export class Storage {
90
144
  if (this.data.entities[entity.name]) {
91
145
  return false;
92
146
  }
93
- this.data.entities[entity.name] = entity.toJSON();
147
+ const entityData = entity.toJSON();
148
+ this.data.entities[entity.name] = entityData;
149
+
150
+ // Generate embedding if available
151
+ await this.updateEntityEmbedding(entity.name, entity);
152
+
94
153
  await this.save();
95
154
  return true;
96
155
  }
@@ -127,6 +186,7 @@ export class Storage {
127
186
  const entity = this.data.entities[name];
128
187
  if (!entity) return false;
129
188
 
130
190
  if (observations !== undefined) {
131
191
  entity.observations = observations;
132
192
  }
@@ -134,6 +194,12 @@ export class Storage {
134
194
  entity.metadata = metadata;
135
195
  }
136
196
 
197
+ // Regenerate embedding if observations changed
198
+ if (observations !== undefined && this.embeddingsReady) {
199
+ const entityObj = new Entity(entity);
200
+ await this.updateEntityEmbedding(name, entityObj);
201
+ }
202
+
137
203
  await this.save();
138
204
  return true;
139
205
  }
@@ -257,17 +323,131 @@ export class Storage {
257
323
 
258
324
  /**
259
325
  * Search entities by name, type, or observations
326
+ * Word-based matching: an entity matches if any significant query word appears in its name, type, or observations
260
327
  */
261
328
  searchEntities(query) {
262
- const lowerQuery = query.toLowerCase();
329
+ // Split query into words, remove common stop words
330
+ const stopWords = new Set(["the", "a", "an", "and", "or", "but", "in", "on", "at", "to", "for", "of", "with", "by"]);
331
+ const words = query
332
+ .toLowerCase()
333
+ .split(/\s+/)
334
+ .filter(w => w.length > 2 && !stopWords.has(w));
335
+
336
+ if (words.length === 0) {
337
+ // Fallback to original query if all words were filtered
338
+ const lowerQuery = query.toLowerCase();
339
+ return this.getAllEntities().filter(e => {
340
+ if (e.name.toLowerCase().includes(lowerQuery)) return true;
341
+ if (e.entityType.toLowerCase().includes(lowerQuery)) return true;
342
+ if (e.observations.some(o => o.toLowerCase().includes(lowerQuery))) return true;
343
+ return false;
344
+ });
345
+ }
346
+
263
347
  return this.getAllEntities().filter(e => {
264
- if (e.name.toLowerCase().includes(lowerQuery)) return true;
265
- if (e.entityType.toLowerCase().includes(lowerQuery)) return true;
266
- if (e.observations.some(o => o.toLowerCase().includes(lowerQuery))) return true;
348
+ // Check if ANY word matches in name, type, or observations
349
+ for (const word of words) {
350
+ if (e.name.toLowerCase().includes(word)) return true;
351
+ if (e.entityType.toLowerCase().includes(word)) return true;
352
+ if (e.observations.some(o => o.toLowerCase().includes(word))) return true;
353
+ }
267
354
  return false;
268
355
  });
269
356
  }
270
357
 
358
+ /**
359
+ * Semantic search using embeddings
360
+ * Returns entities ranked by similarity score
361
+ */
362
+ async semanticSearchEntities(query, options = {}) {
363
+ const {
364
+ topK = 10,
365
+ threshold = 0.65, // Lower threshold for more matches
366
+ entityType = null,
367
+ } = options;
368
+
369
+ // Initialize embeddings if not ready
370
+ if (!this.embeddingsReady) {
371
+ await this.initEmbeddings();
372
+ }
373
+
374
+ // Fall back to keyword search if embeddings not available
375
+ if (!this.embeddingsReady) {
376
+ const results = this.searchEntities(query);
377
+ return {
378
+ results: results.map(e => ({ entity: e, score: 0 })),
379
+ method: "keyword",
380
+ count: results.length,
381
+ };
382
+ }
383
+
384
+ try {
385
+ // Generate embedding for the query
386
+ const queryEmbedding = await this.embeddings.embed(query);
387
+
388
+ // Get all entities with their embeddings
389
+ const allEntities = this.getAllEntities();
390
+ const items = allEntities
391
+ .filter(e => !entityType || e.entityType === entityType)
392
+ .map(e => ({
393
+ entity: e,
394
+ embedding: this.data.entities[e.name]?.embedding || null,
395
+ }))
396
+ .filter(item => item.embedding !== null);
397
+
398
+ // Find most similar
399
+ const scoredResults = this.embeddings.findMostSimilar(
400
+ queryEmbedding,
401
+ items,
402
+ topK,
403
+ threshold
404
+ );
405
+
406
+ return {
407
+ results: scoredResults.map(r => ({ entity: r.entity, score: r.score })),
408
+ method: "semantic",
409
+ count: scoredResults.length,
410
+ };
411
+ } catch (error) {
412
+ console.warn("Semantic search failed, falling back to keyword:", error.message);
413
+ const results = this.searchEntities(query);
414
+ return {
415
+ results: results.map(e => ({ entity: e, score: 0 })),
416
+ method: "keyword",
417
+ count: results.length,
418
+ };
419
+ }
420
+ }
421
+
422
+ /**
423
+ * Generate embeddings for all entities that don't have them
424
+ * Useful for migrating existing data to semantic search
425
+ */
426
+ async generateMissingEmbeddings(progressCallback = null) {
427
+ // Initialize embeddings if not ready
428
+ if (!this.embeddingsReady) {
429
+ const ready = await this.initEmbeddings();
430
+ if (!ready) {
431
+ throw new Error("Embeddings provider not available");
432
+ }
433
+ }
434
+
435
+ const entities = this.getAllEntities();
436
+ const missing = entities.filter(e => !this.data.entities[e.name]?.embedding);
437
+
438
+ let processed = 0;
439
+ for (const entity of missing) {
440
+ await this.updateEntityEmbedding(entity.name, entity);
441
+ processed++;
442
+ if (progressCallback) {
443
+ progressCallback(processed, missing.length, entity.name);
444
+ }
445
+ }
446
+
447
+ await this.save();
448
+ return { processed, total: entities.length };
449
+ }
450
+
271
451
  /**
272
452
  * Get entities related to a given entity
273
453
  */