eigen-db 4.0.2 → 4.2.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +8 -0
- package/README.md +113 -20
- package/dist/eigen-db.js +242 -119
- package/dist/eigen-db.js.map +1 -1
- package/dist/eigen-db.umd.cjs +1 -1
- package/dist/eigen-db.umd.cjs.map +1 -1
- package/package.json +1 -1
- package/src/lib/__tests__/result-set.test.ts +53 -8
- package/src/lib/__tests__/vector-db.test.ts +437 -9
- package/src/lib/memory-manager.ts +8 -0
- package/src/lib/result-set.ts +29 -8
- package/src/lib/simd-binary.ts +1 -1
- package/src/lib/simd.wat +270 -128
- package/src/lib/types.ts +3 -1
- package/src/lib/vector-db.ts +240 -3
package/CHANGELOG.md (changed)
package/README.md (changed)

````diff
@@ -50,11 +50,11 @@ Notes:
 ```ts
 const queryVector = embeddingQuery;
 
-// Returns a plain array of { key,
+// Returns a plain array of { key, similarity } sorted by descending similarity
 const results = db.query(queryVector, { topK: 10 });
 
-for (const { key,
-  console.log(key,
+for (const { key, similarity } of results) {
+  console.log(key, similarity);
 }
 ```
 
````
````diff
@@ -64,15 +64,25 @@ For lazy iteration (useful for pagination or early stopping):
 const results = db.query(queryVector, { topK: 100, iterable: true });
 
 // Iterate and break early — keys are resolved on demand
-for (const { key,
-  if (
-  console.log(key,
+for (const { key, similarity } of results) {
+  if (similarity < 0.5) break;
+  console.log(key, similarity);
 }
 
 // Or spread into an array when you need all results
 const all = [...results];
 ```
 
+Use `minSimilarity` to automatically cut off results below a threshold:
+
+```ts
+// Only return results with similarity ≥ 0.7 (inclusive)
+const results = db.query(queryVector, { minSimilarity: 0.7 });
+
+// Works with iterable mode too — iteration stops early at the threshold
+const results = db.query(queryVector, { minSimilarity: 0.7, iterable: true });
+```
+
 ### 4) Persist and lifecycle
 
 ```ts
````
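The lazy mode in this hunk can be modeled as a generator that walks scores in descending order, resolves a key only when the consumer pulls the next item, and stops at the first score below `minSimilarity` (an inclusive bound, as documented). This is a minimal sketch under stated assumptions, not eigen-db's implementation: `lazyResults`, its `scores`/`resolveKey` parameters, and the full argsort up front are all hypothetical choices.

```typescript
interface ResultItem {
  key: string;
  similarity: number;
}

// Hypothetical helper (not an eigen-db export): yields results in
// descending similarity order. `resolveKey` is only called for items
// that the consumer actually pulls from the generator.
function* lazyResults(
  scores: Float32Array,
  resolveKey: (slot: number) => string,
  topK: number = Infinity,
  minSimilarity: number = -Infinity,
): Generator<ResultItem> {
  // Argsort: slot indices ordered by descending score.
  const order = Array.from(scores.keys()).sort((a, b) => scores[b] - scores[a]);
  const limit = Math.min(topK, order.length);
  for (let i = 0; i < limit; i++) {
    const slot = order[i];
    const similarity = scores[slot];
    // Scores are sorted, so the first one below the threshold ends iteration.
    if (similarity < minSimilarity) return;
    yield { key: resolveKey(slot), similarity };
  }
}
```

Because results arrive sorted, one score below the threshold implies every later one is too, so returning there is safe. Since `resolveKey` runs after the threshold check, breaking out of the loop early (as in the README's `if (similarity < 0.5) break;`) skips key lookups for everything not consumed.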
````diff
@@ -86,6 +96,55 @@ To delete all vectors and storage:
 await db.clear();
 ```
 
+### 5) Export and import
+
+Export the entire database as a streaming binary file:
+
+```ts
+const stream = await db.export(); // ReadableStream<Uint8Array>
+
+// In a browser — download as a file
+const response = new Response(stream);
+const blob = await response.blob();
+const url = URL.createObjectURL(blob);
+const a = document.createElement("a");
+a.href = url;
+a.download = "database.bin";
+a.click();
+```
+
+Import from a stream, replacing all existing data:
+
+```ts
+// From a File (e.g., <input type="file">)
+await db.import(file.stream());
+
+// From a fetch response
+const res = await fetch("/path/to/database.bin");
+await db.import(res.body!);
+```
+
+Notes:
+
+- `import()` replaces all existing data in the target database.
+- A dimension check is performed on import: the stream must contain data exported from a database with the same `dimensions` setting.
+- Both methods use the Web Streams API to avoid large heap allocations — vectors are streamed in 64KB chunks.
+
+## Similarity metric
+
+Similarity is the dot product of the query and stored vectors.
+
+- **With normalization enabled** (the default): vectors are L2-normalized before storage and query, so the dot product equals cosine similarity. Similarity ranges from **1** (identical) to **-1** (opposite), with **0** indicating orthogonal vectors.
+- **With normalization disabled** (`normalize: false`): the dot product is computed on raw vectors. The range depends on the magnitude of your vectors. Use this mode when your vectors are already normalized or when you want raw dot-product semantics.
+
+**When to normalize:**
+
+| Scenario | Normalize? | Notes |
+| --- | --- | --- |
+| Using embeddings from OpenAI, Cohere, etc. | `true` (default) | Embeddings may not be unit-length; normalization ensures cosine similarity. |
+| Vectors are already unit-length | Either | Setting `false` avoids redundant work. |
+| You need raw dot-product semantics | `false` | Similarity will be the raw dot product; range depends on vector magnitudes. |
+
 ## Full API Reference
 
 ## Exports
````
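The similarity section says L2 normalization makes the dot product equal cosine similarity. The standalone sketch below checks that claim on 2-D vectors; `l2Normalize` and `dot` are hypothetical helpers on plain `Float32Array`s, not eigen-db exports, and the library's WASM kernels may round differently.

```typescript
// Hypothetical helper: returns a copy of `v` scaled to unit L2 length.
function l2Normalize(v: Float32Array): Float32Array {
  let sumSq = 0;
  for (let i = 0; i < v.length; i++) sumSq += v[i] * v[i];
  const norm = Math.sqrt(sumSq);
  const out = new Float32Array(v.length);
  if (norm === 0) return out; // assumption: a zero vector stays all zeros
  for (let i = 0; i < v.length; i++) out[i] = v[i] / norm;
  return out;
}

// Hypothetical helper: raw dot product of two equal-length vectors.
function dot(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let s = 0;
  for (let i = 0; i < a.length; i++) s += a[i] * b[i];
  return s;
}

const q = new Float32Array([3, 4]); // length 5
const same = new Float32Array([6, 8]); // same direction, length 10
const opposite = new Float32Array([-3, -4]);
const orthogonal = new Float32Array([-4, 3]);

const raw = dot(q, same); // 50: grows with vector magnitude
const cosSame = dot(l2Normalize(q), l2Normalize(same)); // ~1 (same direction)
const cosOpp = dot(l2Normalize(q), l2Normalize(opposite)); // ~-1 (opposite)
const cosOrth = dot(l2Normalize(q), l2Normalize(orthogonal)); // ~0 (orthogonal)
console.log(raw, cosSame, cosOpp, cosOrth);
```

The raw score of 50 grows with vector length, which is the `normalize: false` behavior; after normalization the scores land in [-1, 1] regardless of magnitude. The zero-vector guard is an assumption of this sketch; the README does not say how eigen-db handles zero vectors.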
````diff
@@ -94,7 +153,7 @@ await db.clear();
 export { DB };
 export type { ResultItem };
 export { VectorCapacityExceededError };
-export type { OpenOptions, OpenOptionsInternal, SetOptions, QueryOptions,
+export type { OpenOptions, OpenOptionsInternal, SetOptions, QueryOptions, VectorInput };
 export { InMemoryStorageProvider, OPFSStorageProvider };
 export type { StorageProvider };
 ```
````
````diff
@@ -126,9 +185,9 @@ Opens (or creates) a database instance and loads persisted data.
 - `getMany(keys: string[]): (number[] | undefined)[]`
   - Batch lookup.
 - `query(value: VectorInput, options?: QueryOptions): ResultItem[]`
-  - Returns similarity
+  - Returns results sorted by descending similarity as a plain array.
   - Throws on dimension mismatch.
-- `query(value: VectorInput, options:
+- `query(value: VectorInput, options: QueryOptions & { iterable: true }): Iterable<ResultItem>`
   - With `{ iterable: true }`, returns a lazy iterable. Keys are resolved
     only as each item is consumed, enabling early stopping and pagination.
   - Throws on dimension mismatch.
````
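The two `query` entries in this hunk describe a TypeScript overload pair: passing `{ iterable: true }` selects `Iterable<ResultItem>`, and other options select `ResultItem[]`. A sketch of that pattern on a hypothetical `queryScores` function over already-scored items follows; the types mirror the README, but the body is an illustration, not eigen-db's source.

```typescript
interface ResultItem {
  key: string;
  similarity: number;
}

interface QueryOptions {
  topK?: number; // default: Infinity (all results)
  minSimilarity?: number; // inclusive lower bound
  iterable?: boolean;
}

// Overload 1: `{ iterable: true }` returns a lazy iterable.
function queryScores(
  scored: ResultItem[],
  options: QueryOptions & { iterable: true },
): Iterable<ResultItem>;
// Overload 2: anything else returns a plain array.
function queryScores(
  scored: ResultItem[],
  options?: QueryOptions & { iterable?: false },
): ResultItem[];
// Implementation signature (not callable directly by users).
function queryScores(
  scored: ResultItem[],
  options: QueryOptions = {},
): ResultItem[] | Iterable<ResultItem> {
  const { topK = Infinity, minSimilarity = -Infinity, iterable = false } = options;
  const sorted = [...scored].sort((a, b) => b.similarity - a.similarity);
  if (!iterable) {
    // Eager path: filter by the inclusive threshold, then cap at topK.
    return sorted.filter((r) => r.similarity >= minSimilarity).slice(0, topK);
  }
  // Lazy path: a generator that stops at topK or at the threshold.
  return (function* (): Generator<ResultItem> {
    let count = 0;
    for (const r of sorted) {
      if (count >= topK || r.similarity < minSimilarity) return;
      count++;
      yield r;
    }
  })();
}
```

One deliberate difference from the documented signature: the array overload here accepts `iterable?: false`, so a caller passing a widened `boolean` gets a compile-time error rather than an array-typed value that might actually be a generator.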
````diff
@@ -139,16 +198,23 @@ Opens (or creates) a database instance and loads persisted data.
   - Subsequent operations throw.
 - `clear(): Promise<void>`
   - Clears in-memory state and destroys storage for this DB.
+- `export(): Promise<ReadableStream<Uint8Array>>`
+  - Exports the entire database as a streaming binary. Vectors are streamed in 64KB chunks.
+- `import(stream: ReadableStream<Uint8Array>): Promise<void>`
+  - Imports data from a stream, replacing all existing data.
+  - Throws on dimension mismatch between the stream data and the database.
 
 ### `ResultItem`
 
 ```ts
 interface ResultItem {
   key: string;
-
+  similarity: number;
 }
 ```
 
+- `similarity` — The dot product of query and stored vectors. With normalization (default), this is cosine similarity: 1 = identical, -1 = opposite.
+
 ### Option types
 
 #### `OpenOptions`
````
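The export notes say vectors are streamed in 64KB chunks through the Web Streams API. A minimal sketch of that chunking, assuming the vectors sit in one contiguous `Float32Array` (eigen-db's actual binary format, header, and key encoding are not described in this diff), wraps the bytes in a pull-based `ReadableStream`. `streamInChunks` and `readAll` are hypothetical names.

```typescript
const CHUNK_BYTES = 64 * 1024; // 64KB, matching the README note

// Hypothetical helper: exposes a Float32Array's bytes as a stream of
// Uint8Array chunks of at most `chunkBytes` bytes each.
function streamInChunks(
  vectors: Float32Array,
  chunkBytes: number = CHUNK_BYTES,
): ReadableStream<Uint8Array> {
  const bytes = new Uint8Array(vectors.buffer, vectors.byteOffset, vectors.byteLength);
  let offset = 0;
  return new ReadableStream<Uint8Array>({
    // pull() runs only when the consumer wants more data.
    pull(controller) {
      if (offset >= bytes.length) {
        controller.close();
        return;
      }
      const end = Math.min(offset + chunkBytes, bytes.length);
      controller.enqueue(bytes.subarray(offset, end)); // a view, not a copy
      offset = end;
    },
  });
}

// Hypothetical helper: drains a stream into an array of chunks.
async function readAll(stream: ReadableStream<Uint8Array>): Promise<Uint8Array[]> {
  const reader = stream.getReader();
  const chunks: Uint8Array[] = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
  }
  return chunks;
}
```

Each chunk is a `subarray()` view, so the stream itself never builds a second full-size copy of the data, and because `pull()` only runs on demand, a consumer that reads slowly applies backpressure naturally.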
````diff
@@ -190,16 +256,10 @@ interface SetOptions {
 
 ```ts
 interface QueryOptions {
-  topK?: number; // default: all
+  topK?: number; // default: Infinity (all results)
+  minSimilarity?: number; // inclusive lower bound on similarity; results below this are excluded
   normalize?: boolean;
-
-```
-
-#### `IterableQueryOptions`
-
-```ts
-interface IterableQueryOptions extends QueryOptions {
-  iterable: true; // returns Iterable<ResultItem> instead of ResultItem[]
+  iterable?: boolean; // when true, returns Iterable<ResultItem> instead of ResultItem[]
 }
 ```
 
````
````diff
@@ -238,8 +298,41 @@ new InMemoryStorageProvider();
 
 Thrown when memory growth would exceed WASM 32-bit memory limits for the configured dimension size.
 
+## Benchmark results
+
+WASM SIMD vs pure JavaScript performance on 1536-dimensional vectors (OpenAI embedding size), measured with `vitest bench` (Node.js):
+
+| Operation | JS (ops/s) | WASM SIMD (ops/s) | Speedup |
+| --- | --- | --- | --- |
+| normalize (1536 dims) | 223,117 | 2,226,734 | **~10×** |
+| searchAll (100 vectors × 1536 dims) | 3,429 | 77,130 | **~22×** |
+| searchAll (1,000 vectors × 1536 dims) | 344 | 8,009 | **~23×** |
+| searchAll (10,000 vectors × 1536 dims) | 34 | 398 | **~12×** |
+
+The WASM SIMD layer uses 2-vector outer loop unrolling (halving query memory reads) and 4× inner loop unrolling with multiple independent accumulators.
+
+### Running benchmarks
+
+**Node.js** (via vitest):
+
+```bash
+npm run bench
+```
+
+**Browser**: start the dev server and navigate to the benchmark page:
+
+```bash
+npm run dev
+# Open http://localhost:5173/bench.html
+```
+
 ## Practical notes
 
-- Similarity is dot product; with normalization enabled (default), this behaves like cosine similarity.
+- Similarity is the dot product of query and stored vectors; with normalization enabled (default), this behaves like cosine similarity (1 = identical, -1 = opposite).
+- `topK` defaults to `Infinity`, returning all stored vectors sorted by similarity. Use `minSimilarity` to limit results by proximity.
 - Querying an empty database returns an empty array (`[]`).
 - `flush()` writes deduplicated state, and reopen preserves key-to-slot mapping.
+
+## Related
+
+- Just need cosine similarity? Try [fast-theta](https://github.com/chuanqisun/fast-theta).
````
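The README attributes the WASM SIMD speedup to 2-vector outer-loop unrolling and 4× inner-loop unrolling with independent accumulators. The scalar TypeScript sketch below shows the same loop shape without SIMD: each query element is loaded once and used against two stored vectors, and four accumulators per vector break the dependency chain on a single running sum. It illustrates the technique only and is not a port of `simd.wat`; `searchAllUnrolled` is a hypothetical name, and the sketch assumes vectors are packed row-major in one `Float32Array` with `dims` a multiple of 4.

```typescript
// Hypothetical scalar analogue of the unrolled kernel (no SIMD).
// `db` holds `count` vectors of length `dims`, packed row-major;
// out[v] receives the dot product of `query` with vector v.
function searchAllUnrolled(
  query: Float32Array,
  db: Float32Array,
  dims: number,
  out: Float32Array,
): void {
  if (dims % 4 !== 0) throw new Error("dims must be a multiple of 4");
  const count = db.length / dims;
  let v = 0;
  // Outer loop: two stored vectors per pass, so each query load serves both.
  for (; v + 1 < count; v += 2) {
    const baseA = v * dims;
    const baseB = baseA + dims;
    let a0 = 0, a1 = 0, a2 = 0, a3 = 0; // independent accumulators, vector A
    let b0 = 0, b1 = 0, b2 = 0, b3 = 0; // independent accumulators, vector B
    // Inner loop unrolled 4x.
    for (let i = 0; i < dims; i += 4) {
      const q0 = query[i], q1 = query[i + 1], q2 = query[i + 2], q3 = query[i + 3];
      a0 += q0 * db[baseA + i];
      a1 += q1 * db[baseA + i + 1];
      a2 += q2 * db[baseA + i + 2];
      a3 += q3 * db[baseA + i + 3];
      b0 += q0 * db[baseB + i];
      b1 += q1 * db[baseB + i + 1];
      b2 += q2 * db[baseB + i + 2];
      b3 += q3 * db[baseB + i + 3];
    }
    out[v] = a0 + a1 + a2 + a3;
    out[v + 1] = b0 + b1 + b2 + b3;
  }
  // Tail: with an odd vector count, one vector remains for a plain loop.
  if (v < count) {
    const base = v * dims;
    let s = 0;
    for (let i = 0; i < dims; i++) s += query[i] * db[base + i];
    out[v] = s;
  }
}
```

Whether a JavaScript JIT benefits from this shape varies; the speedups in the benchmark table come from the WASM SIMD kernels, not from this sketch.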