eigen-db 4.0.1 → 4.1.0
- package/CHANGELOG.md +8 -0
- package/README.md +70 -20
- package/dist/eigen-db.js +192 -191
- package/dist/eigen-db.js.map +1 -1
- package/dist/eigen-db.umd.cjs +1 -1
- package/dist/eigen-db.umd.cjs.map +1 -1
- package/package.json +1 -1
- package/src/lib/__tests__/result-set.test.ts +53 -8
- package/src/lib/__tests__/vector-db.test.ts +64 -9
- package/src/lib/index.ts +1 -8
- package/src/lib/result-set.ts +28 -8
- package/src/lib/simd-binary.ts +1 -1
- package/src/lib/simd.wat +270 -128
- package/src/lib/types.ts +4 -10
- package/src/lib/vector-db.ts +11 -14
package/CHANGELOG.md
CHANGED
package/README.md
CHANGED
@@ -50,11 +50,11 @@ Notes:
 ```ts
 const queryVector = embeddingQuery;
 
-// Returns a plain array of { key,
+// Returns a plain array of { key, distance } sorted by ascending distance
 const results = db.query(queryVector, { topK: 10 });
 
-for (const { key,
-  console.log(key,
+for (const { key, distance } of results) {
+  console.log(key, distance);
 }
 ```
 
@@ -64,15 +64,25 @@ For lazy iteration (useful for pagination or early stopping):
 const results = db.query(queryVector, { topK: 100, iterable: true });
 
 // Iterate and break early — keys are resolved on demand
-for (const { key,
-  if (
-  console.log(key,
+for (const { key, distance } of results) {
+  if (distance > 0.5) break;
+  console.log(key, distance);
 }
 
 // Or spread into an array when you need all results
 const all = [...results];
 ```
 
+Use `maxDistance` to automatically cut off results beyond a threshold:
+
+```ts
+// Only return results within distance 0.3 (inclusive)
+const results = db.query(queryVector, { maxDistance: 0.3 });
+
+// Works with iterable mode too — iteration stops early at the threshold
+const results = db.query(queryVector, { maxDistance: 0.3, iterable: true });
+```
+
 ### 4) Persist and lifecycle
 
 ```ts
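
The hunk above documents `maxDistance` as an inclusive bound that also stops lazy iteration early. A minimal sketch of how that behavior could work is shown here, assuming results are already sorted by ascending distance. `lazyResults`, `resolveKey`, and the slot layout are hypothetical names for illustration, not part of eigen-db's API.

```typescript
// Shape of a result item, mirroring the README's ResultItem.
interface ResultItem {
  key: string;
  distance: number;
}

// Lazily yields results from entries already sorted by ascending distance.
// `resolveKey` is a hypothetical callback mapping a slot index to its key;
// it is only called for items that are actually consumed.
function* lazyResults(
  sorted: { slot: number; distance: number }[],
  resolveKey: (slot: number) => string,
  topK: number = Infinity,
  maxDistance: number = Infinity,
): Generator<ResultItem> {
  let emitted = 0;
  for (const { slot, distance } of sorted) {
    // Inclusive bound: a distance exactly equal to maxDistance is still yielded.
    if (emitted >= topK || distance > maxDistance) return;
    yield { key: resolveKey(slot), distance };
    emitted++;
  }
}

const sorted = [
  { slot: 2, distance: 0.1 },
  { slot: 0, distance: 0.3 },
  { slot: 1, distance: 0.7 },
];

// Count key lookups to show that the item past the threshold is never resolved.
let lookups = 0;
const resolveKey = (slot: number): string => {
  lookups++;
  return `doc-${slot}`;
};

const within = [...lazyResults(sorted, resolveKey, Infinity, 0.3)];
console.log(within.map((r) => r.key)); // [ 'doc-2', 'doc-0' ]
console.log(lookups); // 2
```

Because the input is sorted, the first distance above the threshold ends iteration, so no later key needs to be resolved.
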
@@ -86,6 +96,21 @@ To delete all vectors and storage:
 await db.clear();
 ```
 
+## Distance metric
+
+Distance is defined as `1 - dotProduct(query, stored)`.
+
+- **With normalization enabled** (the default): vectors are L2-normalized before storage and query, so the dot product equals cosine similarity. Distance then equals **cosine distance**, ranging from **0** (identical) to **2** (opposite).
+- **With normalization disabled** (`normalize: false`): the dot product is computed on raw vectors. Distance is `1 - dotProduct`, which is not a standard metric and its range depends on the magnitude of your vectors. Use this mode when your vectors are already normalized or when you want raw dot-product semantics.
+
+**When to normalize:**
+
+| Scenario | Normalize? | Notes |
+| --- | --- | --- |
+| Using embeddings from OpenAI, Cohere, etc. | `true` (default) | Embeddings may not be unit-length; normalization ensures cosine distance. |
+| Vectors are already unit-length | Either | Setting `false` avoids redundant work. |
+| You need raw dot-product semantics | `false` | Distance will be `1 - dotProduct`; range depends on vector magnitudes. |
+
 ## Full API Reference
 
 ## Exports
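
To make the distance definition added above concrete, here is a small sketch that computes `1 - dotProduct` with and without L2 normalization. The helper names (`l2Normalize`, `dot`, `distance`) are illustrative assumptions and are not taken from the package source, which does this work in WASM.

```typescript
// L2-normalize a vector; a zero vector is returned unchanged to avoid dividing by zero.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Plain dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Distance as described in the README: 1 - dotProduct(query, stored).
function distance(query: number[], stored: number[], normalize = true): number {
  const q = normalize ? l2Normalize(query) : query;
  const s = normalize ? l2Normalize(stored) : stored;
  return 1 - dot(q, s);
}

console.log(distance([3, 4], [6, 8]));        // approximately 0 (same direction)
console.log(distance([1, 0], [0, 1]));        // 1 (orthogonal)
console.log(distance([1, 0], [-1, 0]));       // 2 (opposite)
console.log(distance([3, 4], [6, 8], false)); // -49 (raw dot product is 50)
```

The last call shows why the README says the range depends on vector magnitudes when normalization is off: the same pair of vectors that is "identical" under cosine distance yields a large negative value on raw inputs.
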
@@ -94,7 +119,7 @@ await db.clear();
 export { DB };
 export type { ResultItem };
 export { VectorCapacityExceededError };
-export type { OpenOptions, OpenOptionsInternal, SetOptions, QueryOptions,
+export type { OpenOptions, OpenOptionsInternal, SetOptions, QueryOptions, VectorInput };
 export { InMemoryStorageProvider, OPFSStorageProvider };
 export type { StorageProvider };
 ```
@@ -126,9 +151,9 @@ Opens (or creates) a database instance and loads persisted data.
 - `getMany(keys: string[]): (number[] | undefined)[]`
   - Batch lookup.
 - `query(value: VectorInput, options?: QueryOptions): ResultItem[]`
-  - Returns
+  - Returns results sorted by ascending distance as a plain array.
   - Throws on dimension mismatch.
-- `query(value: VectorInput, options:
+- `query(value: VectorInput, options: QueryOptions & { iterable: true }): Iterable<ResultItem>`
   - With `{ iterable: true }`, returns a lazy iterable. Keys are resolved
     only as each item is consumed, enabling early stopping and pagination.
   - Throws on dimension mismatch.
@@ -145,10 +170,12 @@ Opens (or creates) a database instance and loads persisted data.
 ```ts
 interface ResultItem {
   key: string;
-
+  distance: number;
 }
 ```
 
+- `distance` — Defined as `1 - dotProduct`. With normalization (default), this is cosine distance: 0 = identical, 2 = opposite.
+
 ### Option types
 
 #### `OpenOptions`
@@ -190,16 +217,10 @@ interface SetOptions {
 
 ```ts
 interface QueryOptions {
-  topK?: number; // default: all
+  topK?: number; // default: Infinity (all results)
+  maxDistance?: number; // inclusive upper bound on distance; results beyond this are excluded
   normalize?: boolean;
-
-```
-
-#### `IterableQueryOptions`
-
-```ts
-interface IterableQueryOptions extends QueryOptions {
-  iterable: true; // returns Iterable<ResultItem> instead of ResultItem[]
+  iterable?: boolean; // when true, returns Iterable<ResultItem> instead of ResultItem[]
 }
 ```
 
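
This hunk folds `IterableQueryOptions` into `QueryOptions`, while the API reference now types the iterable overload as `QueryOptions & { iterable: true }`. The sketch below shows how TypeScript overloads can keep the return type precise with a single options interface. The free function `query` over an in-memory array is a hypothetical stand-in for `db.query`, and its body is deliberately simplified (it is not lazy and ignores `maxDistance` and `normalize`); only the typing is the point.

```typescript
interface ResultItem {
  key: string;
  distance: number;
}

interface QueryOptions {
  topK?: number;
  maxDistance?: number;
  normalize?: boolean;
  iterable?: boolean;
}

// The more specific overload comes first so `{ iterable: true }` selects it.
function query(items: ResultItem[], options: QueryOptions & { iterable: true }): Iterable<ResultItem>;
function query(items: ResultItem[], options?: QueryOptions): ResultItem[];
function query(
  items: ResultItem[],
  options: QueryOptions = {},
): ResultItem[] | Iterable<ResultItem> {
  const { topK = Infinity, iterable = false } = options;
  // Sort a copy by ascending distance, then keep at most topK entries.
  const sorted = [...items].sort((a, b) => a.distance - b.distance).slice(0, topK);
  // Simplification: return the array's iterator rather than a lazy generator.
  return iterable ? sorted.values() : sorted;
}

const items: ResultItem[] = [
  { key: "b", distance: 0.4 },
  { key: "a", distance: 0.1 },
];

const asArray = query(items, { topK: 1 });           // typed as ResultItem[]
const asIterable = query(items, { iterable: true }); // typed as Iterable<ResultItem>

console.log(asArray.length);                      // 1
console.log([...asIterable].map((r) => r.key));   // [ 'a', 'b' ]
```

Callers that pass a `boolean` variable rather than the literal `true` fall through to the array overload at the type level, which is one trade-off of dropping the dedicated options interface.
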
@@ -238,8 +259,37 @@ new InMemoryStorageProvider();
 
 Thrown when memory growth would exceed WASM 32-bit memory limits for the configured dimension size.
 
+## Benchmark results
+
+WASM SIMD vs pure JavaScript performance on 1536-dimensional vectors (OpenAI embedding size), measured with `vitest bench` (Node.js):
+
+| Operation | JS (ops/s) | WASM SIMD (ops/s) | Speedup |
+| --- | --- | --- | --- |
+| normalize (1536 dims) | 223,117 | 2,226,734 | **~10×** |
+| searchAll (100 vectors × 1536 dims) | 3,429 | 77,130 | **~22×** |
+| searchAll (1,000 vectors × 1536 dims) | 344 | 8,009 | **~23×** |
+| searchAll (10,000 vectors × 1536 dims) | 34 | 398 | **~12×** |
+
+The WASM SIMD layer uses 2-vector outer loop unrolling (halving query memory reads) and 4× inner loop unrolling with multiple independent accumulators.
+
+### Running benchmarks
+
+**Node.js** (via vitest):
+
+```bash
+npm run bench
+```
+
+**Browser**: start the dev server and navigate to the benchmark page:
+
+```bash
+npm run dev
+# Open http://localhost:5173/bench.html
+```
+
 ## Practical notes
 
--
+- Distance is `1 - dotProduct`; with normalization enabled (default), this behaves like cosine distance (0 = identical, 2 = opposite).
+- `topK` defaults to `Infinity`, returning all stored vectors sorted by distance. Use `maxDistance` to limit results by proximity.
 - Querying an empty database returns an empty array (`[]`).
 - `flush()` writes deduplicated state, and reopen preserves key-to-slot mapping.
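
The benchmark section mentions 4× inner loop unrolling with multiple independent accumulators. The real implementation lives in `simd.wat` and is not shown in this hunk, so the following is only a scalar TypeScript illustration of the accumulator idea. `dotUnrolled4` is a hypothetical name, and the 2-vector outer unrolling is not covered.

```typescript
// Dot product with the inner loop unrolled 4x. Each accumulator forms its own
// dependency chain, so the four multiply-adds in one iteration do not wait on
// each other the way a single running sum would.
function dotUnrolled4(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  const n = a.length;
  const main = n - (n % 4); // largest multiple of 4 not exceeding n
  let s0 = 0, s1 = 0, s2 = 0, s3 = 0;
  let i = 0;
  for (; i < main; i += 4) {
    s0 += a[i] * b[i];
    s1 += a[i + 1] * b[i + 1];
    s2 += a[i + 2] * b[i + 2];
    s3 += a[i + 3] * b[i + 3];
  }
  // Tail loop for lengths that are not a multiple of 4.
  for (; i < n; i++) s0 += a[i] * b[i];
  return (s0 + s1) + (s2 + s3);
}

const a = new Float32Array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
const b = new Float32Array(10).fill(2);
console.log(dotUnrolled4(a, b)); // 110
```

Splitting the sum this way changes the order of floating-point additions, so results on real embeddings can differ from a single-accumulator loop in the last few bits.
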