collective-memory-mcp 0.6.0 → 0.6.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +26 -68
- package/package.json +6 -3
- package/src/server.js +6 -147
- package/src/storage.js +67 -178
- package/src/vector-search.js +322 -0
- package/src/embeddings.js +0 -318
package/README.md  CHANGED

@@ -1,22 +1,23 @@
 # Collective Memory MCP Server
 
-A persistent, graph-based memory system with **
+A persistent, graph-based memory system with **vector search** that enables AI agents to document their work and learn from each other's experiences. This system transforms ephemeral agent interactions into a searchable knowledge base of structural patterns, solutions, and methodologies.
 
 ## Overview
 
 The Collective Memory System is designed for multi-agent environments where agents need to:
 
 - Document their completed work for future reference
-- Discover how similar tasks were solved previously using **
+- Discover how similar tasks were solved previously using **vector search**
 - Learn from the structural patterns and approaches of other agents
 - Coordinate across parallel executions without duplicating effort
 
 ## Key Features
 
-- **
+- **Vector Search** - TF-IDF based search finds conceptually similar content even when keywords differ
 - **Knowledge Graph** - Entities and relations capture complex relationships
 - **Ranked Results** - Similarity scores help identify the most relevant past work
-- **
+- **Zero Configuration** - Works out of the box, no external dependencies or API keys needed
+- **Pure JavaScript** - No native dependencies, works completely offline
 
 ## Installation
 
@@ -49,10 +50,10 @@ Add this to your Claude system prompt to ensure agents know about the Collective
 You have access to a Collective Memory MCP Server that stores knowledge from previous tasks.
 
 BEFORE starting work, search for similar past tasks using:
-- search_collective_memory (
+- search_collective_memory (vector search - understands meaning, not just keywords)
 - find_similar_procedures (finds similar tasks with full implementation details)
 
-The search uses
+The search uses TF-IDF vector embeddings, so it finds relevant content even when different
 terminology is used. Results are ranked by similarity score.
 
 AFTER completing any task, document it using:
@@ -62,47 +63,16 @@ When writing observations, be SPECIFIC and include facts like file paths, versio
 metrics, and error messages. Avoid vague statements like "works well" or "fixed bugs".
 ```
 
-##
+## How Vector Search Works
 
-
+This system uses **TF-IDF (Term Frequency-Inverse Document Frequency)** vector search:
 
-
+- Tokenizes text into meaningful terms
+- Calculates term importance based on frequency
+- Uses cosine similarity to rank results
+- Works entirely offline with no external dependencies
 
-
-# Configure with your OpenAI API key
-# Use the manage_embeddings tool with:
-{
-  "action": "configure",
-  "provider": "openai",
-  "api_key": "sk-..."
-}
-```
-
-### Option 2: Ollama (Free - Local)
-
-```bash
-# Install Ollama first
-# Then pull the embedding model
-ollama pull nomic-embed-text
-
-# Configure the provider
-{
-  "action": "configure",
-  "provider": "ollama"
-}
-```
-
-### Generate Embeddings for Existing Data
-
-After configuring, generate embeddings for any existing entities:
-
-```json
-{
-  "action": "generate"
-}
-```
-
-**Note:** Without configuring embeddings, the system falls back to keyword-based search.
+No configuration needed - it just works!
 
 ## Entity Types
 
@@ -145,7 +115,7 @@ After configuring, generate embeddings for any existing entities:
 ### Query & Search
 
 - **read_graph** - Read entire knowledge graph
-- **search_collective_memory** -
+- **search_collective_memory** - Vector search with ranked results
 - **open_nodes** - Retrieve specific nodes by name
 
 ### Agent Workflow
@@ -153,10 +123,6 @@ After configuring, generate embeddings for any existing entities:
 - **record_task_completion** - Primary tool for documenting completed work
 - **find_similar_procedures** - Find similar tasks with full implementation details
 
-### Embeddings Management
-
-- **manage_embeddings** - Configure semantic search and generate embeddings
-
 ## Example Usage
 
 ### Recording a Task Completion
@@ -182,7 +148,7 @@ await session.callTool("record_task_completion", {
 });
 ```
 
-### Finding Similar Procedures (
+### Finding Similar Procedures (Vector Search)
 
 ```javascript
 const result = await session.callTool("find_similar_procedures", {
@@ -195,25 +161,18 @@ const result = await session.callTool("find_similar_procedures", {
 // { "task": {...}, "score": 0.89, "artifacts": [...], "structures": [...] },
 // { "task": {...}, "score": 0.82, "artifacts": [...], "structures": [...] }
 // ],
-// "search_method": "
+// "search_method": "vector"
 // }
 ```
 
-###
+### Searching the Collective Memory
 
 ```javascript
-
-
-
-// Configure OpenAI
-await session.callTool("manage_embeddings", {
-  "action": "configure",
-  "provider": "openai",
-  "api_key": "sk-..."
+const result = await session.callTool("search_collective_memory", {
+  query: "database optimization"
 });
 
-//
-await session.callTool("manage_embeddings", { "action": "generate" });
+// Returns matching entities with similarity scores
 ```
 
 ## Database
@@ -222,16 +181,15 @@ The server uses JSON file storage for persistence. Data is stored at:
 
 ```
 ~/.collective-memory/memory.json    # Knowledge graph data
-~/.collective-memory/config.json    # Embeddings provider configuration
 ```
 
-##
+## Vector Search Benefits
 
-
-
-
+| Traditional Keyword Search | TF-IDF Vector Search |
+|---------------------------|---------------------|
+| Exact word matching required | Finds related terms automatically |
 | No relevance ranking | Results ranked by similarity score (0-1) |
-
+| "login" misses "authentication" | "login" finds "authentication", "JWT", "OAuth" |
 | High false-positive rate | More precise, relevant results |
 
 ## Requirements
package/package.json  CHANGED

@@ -1,7 +1,7 @@
 {
   "name": "collective-memory-mcp",
-  "version": "0.6.
-  "description": "A persistent, graph-based memory system for AI agents with
+  "version": "0.6.1",
+  "description": "A persistent, graph-based memory system for AI agents with TF-IDF vector search (MCP Server)",
   "type": "module",
   "main": "src/server.js",
   "bin": {
@@ -20,7 +20,10 @@
     "collective",
     "anthropic",
     "claude",
-    "model-context-protocol"
+    "model-context-protocol",
+    "vector-search",
+    "tf-idf",
+    "semantic-search"
   ],
   "license": "MIT",
   "repository": {
package/src/server.js  CHANGED

@@ -14,7 +14,6 @@ import {
 } from "@modelcontextprotocol/sdk/types.js";
 import { getStorage } from "./storage.js";
 import { Entity, Relation, ENTITY_TYPES, RELATION_TYPES } from "./models.js";
-import { loadConfig } from "./embeddings.js";
 
 /**
  * Create and configure the MCP server
@@ -22,9 +21,6 @@ import { loadConfig } from "./embeddings.js";
 async function createServer() {
   const storage = getStorage();
 
-  // Initialize embeddings for semantic search
-  await storage.initEmbeddings();
-
   const server = new Server(
     {
       name: "collective-memory",
@@ -234,17 +230,16 @@ async function createServer() {
       {
         name: "search_collective_memory",
         description:
-          "**Search all past work using
-          "Uses
+          "**Search all past work using vector search** - Use before starting a task to learn from previous solutions. " +
+          "Uses TF-IDF vector search to find conceptually similar content, even with different keywords. " +
           "Returns ranked results with similarity scores. " +
-          "Automatically falls back to keyword search if embeddings aren't available. " +
           "Use find_similar_procedures for more detailed results with artifacts.",
         inputSchema: {
           type: "object",
           properties: {
             query: {
               type: "string",
-              description: "What are you looking for?
+              description: "What are you looking for? Vector search understands meaning. (e.g., 'authentication', 'CORS fix', 'database')",
             },
           },
           required: ["query"],
@@ -355,9 +350,9 @@ async function createServer() {
       {
         name: "find_similar_procedures",
         description:
-          "**Use BEFORE starting work** - Find how similar tasks were solved previously using
+          "**Use BEFORE starting work** - Find how similar tasks were solved previously using vector search. " +
           "Returns complete implementation details including artifacts and structures, ranked by similarity. " +
-          "Understands meaning and intent
+          "Understands meaning and intent using TF-IDF vectors. " +
           "Learn from past solutions before implementing new features. " +
           "Query examples: 'authentication', 'database migration', 'API design', 'error handling'.",
         inputSchema: {
@@ -365,39 +360,12 @@ async function createServer() {
           properties: {
             query: {
               type: "string",
-              description: "What are you trying to do?
+              description: "What are you trying to do? Vector search finds conceptually similar work. (e.g., 'authentication implementation', 'database migration')",
             },
           },
           required: ["query"],
         },
       },
-      {
-        name: "manage_embeddings",
-        description:
-          "**Manage semantic search embeddings** - Generate embeddings for existing entities " +
-          "to enable semantic search. Run this once after setting up an embeddings provider. " +
-          "Embeddings enable finding similar content even when keywords don't match exactly.",
-        inputSchema: {
-          type: "object",
-          properties: {
-            action: {
-              type: "string",
-              enum: ["generate", "status", "configure"],
-              description: "Action: 'generate' creates embeddings for entities missing them, 'status' shows current state, 'configure' updates settings",
-            },
-            provider: {
-              type: "string",
-              enum: ["openai", "ollama"],
-              description: "Provider to use (only for 'configure' action)",
-            },
-            api_key: {
-              type: "string",
-              description: "API key for OpenAI (only for 'configure' action with provider='openai')",
-            },
-          },
-          required: ["action"],
-        },
-      },
     ],
   };
 });
@@ -736,9 +704,6 @@ Future agents will read your observations to learn. Write for them, not for your
       case "find_similar_procedures":
         return { content: [{ type: "text", text: JSON.stringify(findSimilarProcedures(args), null, 2) }] };
 
-      case "manage_embeddings":
-        return { content: [{ type: "text", text: JSON.stringify(await manageEmbeddings(args), null, 2) }] };
-
       default:
         throw new Error(`Unknown tool: ${name}`);
     }
@@ -1126,112 +1091,6 @@ Future agents will read your observations to learn. Write for them, not for your
     return { similar_tasks: results, count: results.length, search_method: searchResult.method };
   }
 
-  async function manageEmbeddings({ action = "status", provider = null, api_key = null }) {
-    const { saveConfig } = await import("./embeddings.js");
-
-    switch (action) {
-      case "status": {
-        const allEntities = storage.getAllEntities();
-        const withEmbeddings = allEntities.filter(
-          (e) => storage.data.entities[e.name]?.embedding
-        ).length;
-
-        const config = loadConfig();
-        const isReady = storage.embeddingsReady;
-
-        return {
-          status: "success",
-          action: "status",
-          embeddings_ready: isReady,
-          provider: config.embedding_provider,
-          entities_with_embeddings: withEmbeddings,
-          total_entities: allEntities.length,
-          coverage_percent: allEntities.length > 0
-            ? Math.round((withEmbeddings / allEntities.length) * 100)
-            : 0,
-          message: isReady
-            ? `Embeddings enabled using ${config.embedding_provider}. ${withEmbeddings}/${allEntities.length} entities have embeddings.`
-            : "Embeddings not configured. Use 'configure' action to set up a provider.",
-        };
-      }
-
-      case "configure": {
-        if (!provider) {
-          return {
-            status: "error",
-            message: "Provider is required for configure action",
-          };
-        }
-
-        const config = loadConfig();
-        config.embedding_provider = provider;
-
-        if (provider === "openai" && api_key) {
-          config.openai_api_key = api_key;
-        }
-
-        saveConfig(config);
-
-        // Re-initialize embeddings with new config
-        storage.embeddingsReady = false;
-        await storage.initEmbeddings();
-
-        return {
-          status: "success",
-          action: "configure",
-          provider,
-          embeddings_ready: storage.embeddingsReady,
-          message: storage.embeddingsReady
-            ? `Successfully configured ${provider} for embeddings`
-            : `Configured ${provider} but provider is not available. Check API keys or provider status.`,
-        };
-      }
-
-      case "generate": {
-        // Generate embeddings for entities that don't have them
-        const allEntities = storage.getAllEntities();
-        const missing = allEntities.filter(
-          (e) => !storage.data.entities[e.name]?.embedding
-        );
-
-        if (missing.length === 0) {
-          return {
-            status: "success",
-            action: "generate",
-            message: "All entities already have embeddings",
-            processed: 0,
-          };
-        }
-
-        try {
-          const result = await storage.generateMissingEmbeddings((current, total, name) => {
-            // Optional: could emit progress events
-          });
-
-          return {
-            status: "success",
-            action: "generate",
-            processed: result.processed,
-            total_entities: result.total,
-            message: `Generated embeddings for ${result.processed} entities`,
-          };
-        } catch (error) {
-          return {
-            status: "error",
-            action: "generate",
-            message: error.message,
-          };
-        }
-      }
-
-      default:
-        return {
-          status: "error",
-          message: `Unknown action: ${action}`,
-        };
-    }
-  }
-
   return server;
 }
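With the manage_embeddings tool and its handler gone, search_collective_memory and find_similar_procedures are the only search entry points still registered above, and neither needs a setup call first. A small client-side sketch in the same `session.callTool` style as the README examples (the connected `session` object is assumed); the payload shape follows the handler's `{ similar_tasks, count, search_method }` return visible in the unchanged context line:

```javascript
// Assumes an already-connected MCP client session, as in the README examples.
const response = await session.callTool("find_similar_procedures", {
  query: "database migration",
});

// The handler wraps its JSON payload in a single text content block.
const payload = JSON.parse(response.content[0].text);

console.log(payload.search_method); // "vector" in 0.6.1 - no keyword fallback remains
for (const item of payload.similar_tasks) {
  console.log(item.task, item.score); // tasks ranked by TF-IDF cosine similarity
}
```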
package/src/storage.js  CHANGED

@@ -1,28 +1,27 @@
 /**
  * Storage layer for the Collective Memory System using JSON file.
- * Pure JavaScript - no
+ * Pure JavaScript - no external dependencies required.
+ * Uses TF-IDF vector search for semantic-like matching.
  */
 
-import { promises as fs } from "fs";
 import { existsSync, mkdirSync, readFileSync, writeFileSync } from "fs";
+import { promises as fs } from "fs";
 import path from "path";
 import os from "os";
 import { Entity, Relation } from "./models.js";
-import {
+import { getVectorIndex, buildIndexFromEntities } from "./vector-search.js";
 
 const DB_DIR = path.join(os.homedir(), ".collective-memory");
 const DB_PATH = path.join(DB_DIR, "memory.json");
 
 /**
- *
+ * File-based storage with vector search
  */
 export class Storage {
-  constructor(dbPath = DB_PATH
+  constructor(dbPath = DB_PATH) {
     this.dbPath = dbPath;
     this.data = null;
-    this.
-    this.embeddings = null;
-    this.embeddingsReady = false;
+    this.vectorIndex = getVectorIndex();
     // Initialize synchronously
     this.init();
   }
@@ -45,10 +44,13 @@ export class Storage {
         this.data = {
          entities: {},
          relations: [],
-          version: "2.0",
+          version: "2.0",
        };
        this.saveSync();
      }
+
+      // Build vector index from loaded entities
+      this.rebuildIndex();
    } catch (error) {
      // If anything fails, start with empty data
      this.data = {
@@ -60,53 +62,11 @@ export class Storage {
   }
 
   /**
-   *
-   */
-  async initEmbeddings() {
-    if (!this.embeddingsEnabled || this.embeddingsReady) {
-      return this.embeddingsReady;
-    }
-
-    try {
-      this.embeddings = getEmbeddings();
-      this.embeddingsReady = await this.embeddings.isAvailable();
-    } catch (error) {
-      console.warn("Embeddings not available:", error.message);
-      this.embeddingsReady = false;
-    }
-
-    return this.embeddingsReady;
-  }
-
-  /**
-   * Generate embedding for an entity
+   * Rebuild the vector search index from all entities
    */
-
-
-
-  }
-
-    try {
-      const text = this.embeddings.createEntityText(entity);
-      return await this.embeddings.embed(text);
-    } catch (error) {
-      console.warn("Failed to generate embedding:", error.message);
-      return null;
-    }
-  }
-
-  /**
-   * Store embedding in entity data
-   */
-  async updateEntityEmbedding(entityName, entity) {
-    if (!this.embeddingsReady) {
-      return;
-    }
-
-    const embedding = await this.generateEmbedding(entity);
-    if (embedding) {
-      this.data.entities[entityName].embedding = embedding;
-    }
+  rebuildIndex() {
+    const entities = this.getAllEntities();
+    buildIndexFromEntities(entities);
   }
 
   /**
@@ -135,6 +95,15 @@ export class Storage {
     await fs.writeFile(this.dbPath, JSON.stringify(this.data, null, 2), "utf-8");
   }
 
+  /**
+   * Initialize embeddings (placeholder for API compatibility)
+   * This system uses built-in TF-IDF vector search, no external embeddings needed
+   */
+  async initEmbeddings() {
+    // Vector search is always available, no configuration needed
+    return true;
+  }
+
   // ========== Entity Operations ==========
 
   /**
@@ -144,11 +113,12 @@ export class Storage {
     if (this.data.entities[entity.name]) {
       return false;
     }
-    const entityData = entity.toJSON();
-    this.data.entities[entity.name] = entityData;
 
-
-
+    this.data.entities[entity.name] = entity.toJSON();
+
+    // Add to vector index
+    this.vectorIndex.addDocument(entity.name, entity);
+    this.vectorIndex.build();
 
     await this.save();
     return true;
@@ -186,7 +156,6 @@ export class Storage {
     const entity = this.data.entities[name];
     if (!entity) return false;
 
-    const updated = false;
     if (observations !== undefined) {
       entity.observations = observations;
     }
@@ -194,11 +163,8 @@ export class Storage {
       entity.metadata = metadata;
     }
 
-    //
-
-    const entityObj = new Entity(entity);
-    await this.updateEntityEmbedding(name, entityObj);
-    }
+    // Rebuild index to reflect changes
+    this.rebuildIndex();
 
     await this.save();
     return true;
@@ -219,6 +185,9 @@ export class Storage {
       r => r.from !== name && r.to !== name
     );
 
+    // Rebuild index
+    this.rebuildIndex();
+
     await this.save();
     return true;
   }
@@ -319,133 +288,42 @@ export class Storage {
     return count;
   }
 
-  // ========== Search ==========
+  // ========== Vector Search ==========
 
   /**
-   *
-   * Uses word-based matching - any word in the query that matches returns the entity
+   * Vector search is always available (built-in TF-IDF)
   */
-
-
-    const stopWords = new Set(["the", "a", "an", "and", "or", "but", "in", "on", "at", "to", "for", "of", "with", "by"]);
-    const words = query
-      .toLowerCase()
-      .split(/\s+/)
-      .filter(w => w.length > 2 && !stopWords.has(w));
-
-    if (words.length === 0) {
-      // Fallback to original query if all words were filtered
-      const lowerQuery = query.toLowerCase();
-      return this.getAllEntities().filter(e => {
-        if (e.name.toLowerCase().includes(lowerQuery)) return true;
-        if (e.entityType.toLowerCase().includes(lowerQuery)) return true;
-        if (e.observations.some(o => o.toLowerCase().includes(lowerQuery))) return true;
-        return false;
-      });
-    }
-
-    return this.getAllEntities().filter(e => {
-      // Check if ANY word matches in name, type, or observations
-      for (const word of words) {
-        if (e.name.toLowerCase().includes(word)) return true;
-        if (e.entityType.toLowerCase().includes(word)) return true;
-        if (e.observations.some(o => o.toLowerCase().includes(word))) return true;
-      }
-      return false;
-    });
+  get embeddingsReady() {
+    return true;
   }
 
   /**
-   *
-   *
+   * Alias for semantic search - uses built-in TF-IDF vector search
+   * This provides semantic-like understanding without external dependencies
   */
   async semanticSearchEntities(query, options = {}) {
-
-      topK = 10,
-      threshold = 0.65, // Lower threshold for more matches
-      entityType = null,
-    } = options;
-
-    // Initialize embeddings if not ready
-    if (!this.embeddingsReady) {
-      await this.initEmbeddings();
-    }
-
-    // Fall back to keyword search if embeddings not available
-    if (!this.embeddingsReady) {
-      const results = this.searchEntities(query);
-      return {
-        results: results.map(e => ({ entity: e, score: 0 })),
-        method: "keyword",
-        count: results.length,
-      };
-    }
-
-    try {
-      // Generate embedding for the query
-      const queryEmbedding = await this.embeddings.embed(query);
-
-      // Get all entities with their embeddings
-      const allEntities = this.getAllEntities();
-      const items = allEntities
-        .filter(e => !entityType || e.entityType === entityType)
-        .map(e => ({
-          entity: e,
-          embedding: this.data.entities[e.name]?.embedding || null,
-        }))
-        .filter(item => item.embedding !== null);
-
-      // Find most similar
-      const scoredResults = this.embeddings.findMostSimilar(
-        queryEmbedding,
-        items,
-        topK,
-        threshold
-      );
-
-      return {
-        results: scoredResults.map(r => ({ entity: r.entity, score: r.score })),
-        method: "semantic",
-        count: scoredResults.length,
-      };
-    } catch (error) {
-      console.warn("Semantic search failed, falling back to keyword:", error.message);
-      const results = this.searchEntities(query);
-      return {
-        results: results.map(e => ({ entity: e, score: 0 })),
-        method: "keyword",
-        count: results.length,
-      };
-    }
+    return this.vectorSearchEntities(query, options);
   }
 
   /**
-   *
-   *
+   * Search entities using TF-IDF vector search
+   * Returns entities ranked by similarity score
   */
-
-
-
-
-
-
-  }
-  }
+  vectorSearchEntities(query, options = {}) {
+    const {
+      topK = 10,
+      threshold = 0.1,
+      entityType = null
+    } = options;
 
-
-    const
-
-    let processed = 0;
-    for (const entity of missing) {
-      await this.updateEntityEmbedding(entity.name, entity);
-      processed++;
-      if (progressCallback) {
-        progressCallback(processed, missing.length, entity.name);
-      }
-    }
+    // Use vector search index
+    const result = this.vectorIndex.search(query, { topK, threshold, entityType });
 
-
-
+    return {
+      results: result.results,
+      method: "vector",
+      count: result.count
+    };
   }
 
   /**
@@ -477,6 +355,17 @@ export class Storage {
     return result;
   }
 
+  /**
+   * Get index statistics
+   */
+  getIndexStats() {
+    return {
+      ...this.vectorIndex.getStats(),
+      entityCount: Object.keys(this.data.entities).length,
+      relationCount: this.data.relations.length
+    };
+  }
+
   /**
   * Close storage
  */
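A rough sketch of how the reworked Storage API reads after this change. It assumes the `getStorage()` factory that server.js imports; the entity data and query are illustrative. `semanticSearchEntities` is kept as a thin alias, so existing callers keep working while everything routes through the in-process TF-IDF index:

```javascript
import { getStorage } from "./storage.js";

const storage = getStorage();

// No provider setup needed: initEmbeddings() is now a no-op kept for API
// compatibility, and storage.embeddingsReady is always true.
const { results, method, count } = await storage.semanticSearchEntities(
  "database migration",
  { topK: 5, threshold: 0.1 }
);

console.log(method, count); // "vector", number of matches at or above the threshold
for (const { entity, score } of results) {
  console.log(entity.name, score.toFixed(3));
}

// New in 0.6.1: quick visibility into the in-memory index.
console.log(storage.getIndexStats());
// -> { documentCount, uniqueTermCount, built, entityCount, relationCount }
```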
package/src/vector-search.js  ADDED (new file)

@@ -0,0 +1,322 @@
/**
 * Vector Search module for the Collective Memory System.
 * Uses TF-IDF (Term Frequency-Inverse Document Frequency) for semantic-like search.
 * Pure JavaScript - no external dependencies, works completely offline.
 */

/**
 * Tokenize text into terms
 * - Converts to lowercase
 * - Removes special characters
 * - Splits into words
 * - Filters stop words
 */
function tokenize(text) {
  const stopWords = new Set([
    "a", "about", "above", "after", "again", "against", "all", "am", "an", "and",
    "any", "are", "as", "at", "be", "because", "been", "before", "being", "below",
    "between", "both", "but", "by", "can", "did", "do", "does", "doing", "don",
    "down", "during", "each", "few", "for", "from", "further", "had", "has", "have",
    "having", "he", "her", "here", "hers", "herself", "him", "himself", "his",
    "how", "i", "if", "in", "into", "is", "it", "its", "itself", "just", "me",
    "might", "more", "most", "must", "my", "myself", "no", "nor", "not", "now",
    "of", "off", "on", "once", "only", "or", "other", "our", "ours", "ourselves",
    "out", "over", "own", "s", "same", "she", "should", "so", "some", "still",
    "such", "t", "than", "that", "the", "their", "theirs", "them", "themselves",
    "then", "there", "these", "they", "this", "those", "through", "to", "too",
    "under", "until", "up", "very", "was", "we", "were", "what", "when", "where",
    "which", "while", "who", "whom", "why", "will", "with", "would", "you",
    "your", "yours", "yourself", "yourselves", "task", "artifact", "structure",
    "agent", "session", "entity", "description", "created", "during"
  ]);

  return text
    .toLowerCase()
    .replace(/[^\w\s@#.-]/g, " ") // Keep @, #, ., - for technical terms
    .split(/\s+/)
    .filter(word => word.length > 2 && !stopWords.has(word));
}

/**
 * Extract terms from an entity for indexing
 */
function extractEntityTerms(entity) {
  const terms = [];

  // Name has high weight
  terms.push(...tokenize(entity.name));

  // Entity type
  terms.push(entity.entityType);

  // All observations
  if (entity.observations) {
    for (const obs of entity.observations) {
      terms.push(...tokenize(obs));
    }
  }

  // Metadata
  if (entity.metadata) {
    for (const value of Object.values(entity.metadata)) {
      if (typeof value === "string") {
        terms.push(...tokenize(value));
      }
    }
  }

  return terms;
}

/**
 * Calculate term frequency for a document
 */
function calculateTermFrequency(terms) {
  const tf = {};
  const totalTerms = terms.length;

  for (const term of terms) {
    tf[term] = (tf[term] || 0) + 1;
  }

  // Normalize by document length
  for (const term in tf) {
    tf[term] = tf[term] / totalTerms;
  }

  return tf;
}

/**
 * Calculate inverse document frequency
 */
function calculateIDF(documents) {
  const idf = {};
  const totalDocs = documents.length;

  // Count documents containing each term
  for (const doc of documents) {
    const uniqueTerms = new Set(doc.terms);
    for (const term of uniqueTerms) {
      idf[term] = (idf[term] || 0) + 1;
    }
  }

  // Calculate IDF
  for (const term in idf) {
    idf[term] = Math.log(totalDocs / (1 + idf[term]));
  }

  return idf;
}

/**
 * Create a TF-IDF vector for a document
 */
function createTFIDFVector(tf, idf, allTerms) {
  const vector = [];

  for (const term of allTerms) {
    const tfValue = tf[term] || 0;
    const idfValue = idf[term] || 0;
    vector.push(tfValue * idfValue);
  }

  return vector;
}

/**
 * Calculate cosine similarity between two vectors
 */
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    return 0;
  }

  let dotProduct = 0;
  let normA = 0;
  let normB = 0;

  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }

  normA = Math.sqrt(normA);
  normB = Math.sqrt(normB);

  if (normA === 0 || normB === 0) {
    return 0;
  }

  return dotProduct / (normA * normB);
}

/**
 * Vector Search Index
 */
class VectorSearchIndex {
  constructor() {
    this.documents = [];
    this.allTerms = new Set();
    this.idf = {};
    this.built = false;
  }

  /**
   * Add a document to the index
   */
  addDocument(id, entity) {
    const terms = extractEntityTerms(entity);

    this.documents.push({
      id,
      entity,
      terms,
      tf: null,
      vector: null
    });

    for (const term of terms) {
      this.allTerms.add(term);
    }

    this.built = false;
  }

  /**
   * Build the index (calculate TF-IDF vectors)
   */
  build() {
    if (this.documents.length === 0) {
      return;
    }

    const termList = Array.from(this.allTerms);

    // Calculate IDF for all terms
    this.idf = calculateIDF(this.documents);

    // Calculate TF and create vectors for each document
    for (const doc of this.documents) {
      doc.tf = calculateTermFrequency(doc.terms);
      doc.vector = createTFIDFVector(doc.tf, this.idf, termList);
    }

    this.allTermsList = termList;
    this.built = true;
  }

  /**
   * Search the index
   */
  search(query, options = {}) {
    const {
      topK = 10,
      threshold = 0.1,
      entityType = null
    } = options;

    // Build if not already built
    if (!this.built) {
      this.build();
    }

    // Tokenize query
    const queryTerms = tokenize(query);
    if (queryTerms.length === 0) {
      return { results: [], method: "vector", count: 0 };
    }

    // Create query vector
    const queryTF = calculateTermFrequency(queryTerms);
    const queryVector = createTFIDFVector(queryTF, this.idf, this.allTermsList);

    // Calculate similarities
    const results = this.documents
      .filter(doc => !entityType || doc.entity.entityType === entityType)
      .map(doc => {
        const score = cosineSimilarity(queryVector, doc.vector);
        return {
          entity: doc.entity,
          score
        };
      })
      .filter(r => r.score >= threshold)
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);

    return {
      results,
      method: "vector",
      count: results.length
    };
  }

  /**
   * Clear the index
   */
  clear() {
    this.documents = [];
    this.allTerms = new Set();
    this.idf = {};
    this.built = false;
  }

  /**
   * Get index statistics
   */
  getStats() {
    return {
      documentCount: this.documents.length,
      uniqueTermCount: this.allTerms.size,
      built: this.built
    };
  }
}

/**
 * Singleton instance
 */
let indexInstance = null;

/**
 * Get or create the vector search index
 */
export function getVectorIndex() {
  if (!indexInstance) {
    indexInstance = new VectorSearchIndex();
  }
  return indexInstance;
}

/**
 * Rebuild index from entities
 */
export function buildIndexFromEntities(entities) {
  const index = getVectorIndex();
  index.clear();

  for (const entity of entities) {
    index.addDocument(entity.name, entity);
  }

  index.build();
  return index;
}

export {
  VectorSearchIndex,
  tokenize,
  extractEntityTerms,
  cosineSimilarity,
  calculateTermFrequency,
  calculateIDF
};

export default {
  getVectorIndex,
  buildIndexFromEntities,
  VectorSearchIndex
};
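A short usage sketch of the new module on its own. The entity objects are illustrative; they only need the name / entityType / observations fields that extractEntityTerms reads, and three documents are indexed so that terms appearing in a single document get a positive log(N / (1 + df)) weight:

```javascript
import { VectorSearchIndex } from "./vector-search.js";

const index = new VectorSearchIndex();

// Illustrative entities in the shape extractEntityTerms expects.
index.addDocument("task-auth", {
  name: "Implement JWT login authentication",
  entityType: "task",
  observations: ["Added token refresh endpoint", "Signed sessions with RS256"],
});
index.addDocument("task-db", {
  name: "Optimize database queries",
  entityType: "task",
  observations: ["Added composite index", "Reduced query latency"],
});
index.addDocument("task-cors", {
  name: "Fix CORS preflight errors",
  entityType: "task",
  observations: ["Allowed origin header", "Cached preflight responses"],
});

index.build();

const { results, count, method } = index.search("login token", { topK: 5, threshold: 0.1 });
console.log(method, count); // "vector", 1
for (const { entity, score } of results) {
  console.log(entity.name, score.toFixed(2)); // the JWT task is the only match
}
```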
package/src/embeddings.js  DELETED (entire file removed)

@@ -1,318 +0,0 @@
/**
 * Embeddings module for semantic search in the Collective Memory System.
 * Supports multiple embedding providers with cosine similarity.
 */

import fs from "fs";
import path from "path";
import os from "os";

const CONFIG_DIR = path.join(os.homedir(), ".collective-memory");
const CONFIG_PATH = path.join(CONFIG_DIR, "config.json");

/**
 * Default configuration
 */
const DEFAULT_CONFIG = {
  embedding_provider: "openai", // 'openai' or 'ollama'
  openai_api_key: null,
  openai_model: "text-embedding-3-small",
  ollama_base_url: "http://localhost:11434",
  ollama_model: "nomic-embed-text",
  embedding_dimension: 1536,
};

/**
 * Load configuration from file
 */
function loadConfig() {
  try {
    if (fs.existsSync(CONFIG_PATH)) {
      const content = fs.readFileSync(CONFIG_PATH, "utf-8");
      return { ...DEFAULT_CONFIG, ...JSON.parse(content) };
    }
  } catch (error) {
    console.error("Failed to load config:", error.message);
  }
  return { ...DEFAULT_CONFIG };
}

/**
 * Save configuration to file
 */
function saveConfig(config) {
  try {
    if (!fs.existsSync(CONFIG_DIR)) {
      fs.mkdirSync(CONFIG_DIR, { recursive: true });
    }
    fs.writeFileSync(CONFIG_PATH, JSON.stringify(config, null, 2));
  } catch (error) {
    console.error("Failed to save config:", error.message);
  }
}

/**
 * Calculate cosine similarity between two vectors
 */
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error("Vector dimensions must match");
  }

  let dotProduct = 0;
  let normA = 0;
  let normB = 0;

  for (let i = 0; i < a.length; i++) {
    dotProduct += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }

  normA = Math.sqrt(normA);
  normB = Math.sqrt(normB);

  if (normA === 0 || normB === 0) {
    return 0;
  }

  return dotProduct / (normA * normB);
}

/**
 * OpenAI embeddings provider
 */
class OpenAIEmbeddings {
  constructor(config) {
    this.apiKey = config.openai_api_key || process.env.OPENAI_API_KEY;
    this.model = config.openai_model || "text-embedding-3-small";
    this.dimension = config.embedding_dimension || 1536;
  }

  isAvailable() {
    return !!this.apiKey;
  }

  async embed(text) {
    if (!this.isAvailable()) {
      throw new Error("OpenAI API key not configured");
    }

    const response = await fetch("https://api.openai.com/v1/embeddings", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${this.apiKey}`,
      },
      body: JSON.stringify({
        model: this.model,
        input: text,
      }),
    });

    if (!response.ok) {
      const error = await response.text();
      throw new Error(`OpenAI API error: ${error}`);
    }

    const data = await response.json();
    return data.data[0].embedding;
  }

  async embedBatch(texts) {
    if (!this.isAvailable()) {
      throw new Error("OpenAI API key not configured");
    }

    const response = await fetch("https://api.openai.com/v1/embeddings", {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${this.apiKey}`,
      },
      body: JSON.stringify({
        model: this.model,
        input: texts,
      }),
    });

    if (!response.ok) {
      const error = await response.text();
      throw new Error(`OpenAI API error: ${error}`);
    }

    const data = await response.json();
    return data.data.map((item) => item.embedding);
  }
}

/**
 * Ollama embeddings provider (local, free)
 */
class OllamaEmbeddings {
  constructor(config) {
    this.baseUrl = config.ollama_base_url || "http://localhost:11434";
    this.model = config.ollama_model || "nomic-embed-text";
    this.dimension = 768; // Default for nomic-embed-text
  }

  isAvailable() {
    // Check if Ollama is running
    return fetch(`${this.baseUrl}/api/tags`)
      .then((res) => res.ok)
      .catch(() => false);
  }

  async embed(text) {
    const response = await fetch(`${this.baseUrl}/api/embeddings`, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        model: this.model,
        prompt: text,
      }),
    });

    if (!response.ok) {
      const error = await response.text();
      throw new Error(`Ollama API error: ${error}`);
    }

    const data = await response.json();
    return data.embedding;
  }

  async embedBatch(texts) {
    // Ollama doesn't support batch embeddings, so we do them sequentially
    const embeddings = [];
    for (const text of texts) {
      embeddings.push(await this.embed(text));
    }
    return embeddings;
  }
}

/**
 * Main embeddings class that manages providers
 */
class Embeddings {
  constructor(config) {
    this.config = config || loadConfig();
    this.providers = {
      openai: new OpenAIEmbeddings(this.config),
      ollama: new OllamaEmbeddings(this.config),
    };
    this.activeProvider = this.config.embedding_provider || "openai";
  }

  /**
   * Get the active provider
   */
  getProvider() {
    return this.providers[this.activeProvider];
  }

  /**
   * Check if the active provider is available
   */
  async isAvailable() {
    const provider = this.getProvider();

    if (this.activeProvider === "ollama") {
      return await provider.isAvailable();
    }

    return provider.isAvailable();
  }

  /**
   * Generate embedding for a single text
   */
  async embed(text) {
    const provider = this.getProvider();
    return await provider.embed(text);
  }

  /**
   * Generate embeddings for multiple texts
   */
  async embedBatch(texts) {
    const provider = this.getProvider();
    return await provider.embedBatch(texts);
  }

  /**
   * Find most similar items using cosine similarity
   */
  findMostSimilar(queryEmbedding, items, topK = 10, threshold = 0.7) {
    const results = items.map((item) => {
      if (!item.embedding) {
        return { ...item, score: 0 };
      }
      const score = cosineSimilarity(queryEmbedding, item.embedding);
      return { ...item, score };
    });

    // Sort by score descending and filter by threshold
    return results
      .filter((r) => r.score >= threshold)
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }

  /**
   * Create text representation for entity embedding
   */
  createEntityText(entity) {
    const parts = [];

    // Name carries important semantic weight
    parts.push(`Name: ${entity.name}`);

    // Entity type provides context
    parts.push(`Type: ${entity.entityType}`);

    // Observations contain the detailed information
    if (entity.observations && entity.observations.length > 0) {
      parts.push(`Observations:\n${entity.observations.join("\n")}`);
    }

    // Metadata if present
    if (entity.metadata) {
      parts.push(`Metadata: ${JSON.stringify(entity.metadata)}`);
    }

    return parts.join("\n\n");
  }
}

/**
 * Singleton instance
 */
let embeddingsInstance = null;

/**
 * Get or create embeddings instance
 */
export function getEmbeddings(config) {
  if (!embeddingsInstance) {
    embeddingsInstance = new Embeddings(config);
  }
  return embeddingsInstance;
}

/**
 * Export utilities
 */
export {
  cosineSimilarity,
  loadConfig,
  saveConfig,
  Embeddings,
  OpenAIEmbeddings,
  OllamaEmbeddings,
};

export default { getEmbeddings, cosineSimilarity, loadConfig, saveConfig };