eigen-db 4.0.1 → 4.1.0

package/CHANGELOG.md CHANGED
@@ -1,3 +1,11 @@
+ # v4.1.0
+
+ Added: performance optimization in WASM
+
+ # v4.0.2
+
+ Fixed: `query()` overload type hint issue
+
  # v4.0.1

  Fixed: the source code was stale
package/README.md CHANGED
@@ -50,11 +50,11 @@ Notes:
  ```ts
  const queryVector = embeddingQuery;

- // Returns a plain array of { key, score } sorted by similarity
+ // Returns a plain array of { key, distance } sorted by ascending distance
  const results = db.query(queryVector, { topK: 10 });

- for (const { key, score } of results) {
-   console.log(key, score);
+ for (const { key, distance } of results) {
+   console.log(key, distance);
  }
  ```

@@ -64,15 +64,25 @@ For lazy iteration (useful for pagination or early stopping):
  const results = db.query(queryVector, { topK: 100, iterable: true });

  // Iterate and break early — keys are resolved on demand
- for (const { key, score } of results) {
-   if (score < 0.5) break;
-   console.log(key, score);
+ for (const { key, distance } of results) {
+   if (distance > 0.5) break;
+   console.log(key, distance);
  }

  // Or spread into an array when you need all results
  const all = [...results];
  ```

+ Use `maxDistance` to automatically cut off results beyond a threshold:
+
+ ```ts
+ // Only return results within distance 0.3 (inclusive)
+ const results = db.query(queryVector, { maxDistance: 0.3 });
+
+ // Works with iterable mode too — iteration stops early at the threshold
+ const results = db.query(queryVector, { maxDistance: 0.3, iterable: true });
+ ```
+
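The early-stop behaviour described for `maxDistance` can be illustrated with a small standalone sketch. It assumes results arrive sorted by ascending distance, as the README states; `withinDistance` is a hypothetical helper written for illustration and is not part of eigen-db.

```typescript
interface ResultItem {
  key: string;
  distance: number;
}

// Hypothetical helper (not an eigen-db API): yields items from an
// ascending-distance sequence and stops at the first item beyond maxDistance.
// The bound is inclusive, matching the README comment.
function* withinDistance(
  sorted: Iterable<ResultItem>,
  maxDistance: number,
): Generator<ResultItem> {
  for (const item of sorted) {
    if (item.distance > maxDistance) return; // every later item is farther away
    yield item;
  }
}

const sample: ResultItem[] = [
  { key: "a", distance: 0.1 },
  { key: "b", distance: 0.3 },
  { key: "c", distance: 0.7 },
];

const keptKeys = Array.from(withinDistance(sample, 0.3)).map((r) => r.key);
console.log(keptKeys); // ["a", "b"]
```

Because the input is sorted, the first item past the threshold ends the iteration, so later items are never examined.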
  ### 4) Persist and lifecycle

  ```ts
@@ -86,6 +96,21 @@ To delete all vectors and storage:
  await db.clear();
  ```

+ ## Distance metric
+
+ Distance is defined as `1 - dotProduct(query, stored)`.
+
+ - **With normalization enabled** (the default): vectors are L2-normalized before storage and query, so the dot product equals cosine similarity. Distance then equals **cosine distance**, ranging from **0** (identical) to **2** (opposite).
+ - **With normalization disabled** (`normalize: false`): the dot product is computed on raw vectors. Distance is `1 - dotProduct`, which is not a standard metric and its range depends on the magnitude of your vectors. Use this mode when your vectors are already normalized or when you want raw dot-product semantics.
+
+ **When to normalize:**
+
+ | Scenario | Normalize? | Notes |
+ | --- | --- | --- |
+ | Using embeddings from OpenAI, Cohere, etc. | `true` (default) | Embeddings may not be unit-length; normalization ensures cosine distance. |
+ | Vectors are already unit-length | Either | Setting `false` avoids redundant work. |
+ | You need raw dot-product semantics | `false` | Distance will be `1 - dotProduct`; range depends on vector magnitudes. |
+
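To make the definition concrete, here is a minimal plain-TypeScript sketch of `1 - dotProduct` with and without L2 normalization. The function names (`l2Normalize`, `dot`, `distance`) are illustrative assumptions and do not reflect eigen-db internals, which run in WASM.

```typescript
// Illustrative only: these helpers are assumptions, not eigen-db APIs.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  // Leave the zero vector unchanged to avoid dividing by zero.
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

function dot(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// distance = 1 - dotProduct(query, stored), optionally on unit-length copies.
function distance(query: number[], stored: number[], normalize = true): number {
  const q = normalize ? l2Normalize(query) : query;
  const s = normalize ? l2Normalize(stored) : stored;
  return 1 - dot(q, s);
}

console.log(distance([3, 0], [5, 0]));        // 0: same direction
console.log(distance([1, 0], [0, 1]));        // 1: orthogonal
console.log(distance([2, 0], [-4, 0]));       // 2: opposite direction
console.log(distance([3, 0], [5, 0], false)); // -14: raw dot product is 15
```

The last call shows why the un-normalized value is not bounded to the 0 to 2 range: it scales with the magnitudes of the inputs.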
  ## Full API Reference

  ## Exports
@@ -94,7 +119,7 @@ await db.clear();
  export { DB };
  export type { ResultItem };
  export { VectorCapacityExceededError };
- export type { OpenOptions, OpenOptionsInternal, SetOptions, QueryOptions, IterableQueryOptions, VectorInput };
+ export type { OpenOptions, OpenOptionsInternal, SetOptions, QueryOptions, VectorInput };
  export { InMemoryStorageProvider, OPFSStorageProvider };
  export type { StorageProvider };
  ```
@@ -126,9 +151,9 @@ Opens (or creates) a database instance and loads persisted data.
  - `getMany(keys: string[]): (number[] | undefined)[]`
    - Batch lookup.
  - `query(value: VectorInput, options?: QueryOptions): ResultItem[]`
-   - Returns similarity-ranked results as a plain array.
+   - Returns results sorted by ascending distance as a plain array.
    - Throws on dimension mismatch.
- - `query(value: VectorInput, options: IterableQueryOptions): Iterable<ResultItem>`
+ - `query(value: VectorInput, options: QueryOptions & { iterable: true }): Iterable<ResultItem>`
    - With `{ iterable: true }`, returns a lazy iterable. Keys are resolved
      only as each item is consumed, enabling early stopping and pagination.
    - Throws on dimension mismatch.
@@ -145,10 +170,12 @@ Opens (or creates) a database instance and loads persisted data.
  ```ts
  interface ResultItem {
    key: string;
-   score: number;
+   distance: number;
  }
  ```

+ - `distance` — Defined as `1 - dotProduct`. With normalization (default), this is cosine distance: 0 = identical, 2 = opposite.
+
  ### Option types

  #### `OpenOptions`
@@ -190,16 +217,10 @@ interface SetOptions {

  ```ts
  interface QueryOptions {
-   topK?: number; // default: all vectors
+   topK?: number; // default: Infinity (all results)
+   maxDistance?: number; // inclusive upper bound on distance; results beyond this are excluded
    normalize?: boolean;
- }
- ```
-
- #### `IterableQueryOptions`
-
- ```ts
- interface IterableQueryOptions extends QueryOptions {
-   iterable: true; // returns Iterable<ResultItem> instead of ResultItem[]
+   iterable?: boolean; // when true, returns Iterable<ResultItem> instead of ResultItem[]
  }
  ```
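
With `IterableQueryOptions` removed, the return type presumably depends on an overload keyed on `QueryOptions & { iterable: true }`, as the documented signatures suggest. The sketch below shows how such overloads can be typed. It uses a hypothetical free function `query` over pre-sorted data rather than the real `DB.query`, and the default values are taken from the README comments.

```typescript
interface ResultItem {
  key: string;
  distance: number;
}

interface QueryOptions {
  topK?: number;
  maxDistance?: number;
  normalize?: boolean;
  iterable?: boolean;
}

// Hypothetical stand-in for DB.query; `sorted` is assumed to be in ascending
// distance order. The more specific overload is listed first.
function query(sorted: ResultItem[], options: QueryOptions & { iterable: true }): Iterable<ResultItem>;
function query(sorted: ResultItem[], options?: QueryOptions): ResultItem[];
function query(
  sorted: ResultItem[],
  options: QueryOptions = {},
): ResultItem[] | Iterable<ResultItem> {
  const { topK = Infinity, maxDistance = Infinity, iterable = false } = options;

  function* lazy(): Generator<ResultItem> {
    let count = 0;
    for (const item of sorted) {
      if (count >= topK || item.distance > maxDistance) return;
      count++;
      yield item;
    }
  }

  // Lazy mode hands back the generator; array mode drains it eagerly.
  return iterable ? lazy() : Array.from(lazy());
}

const stored: ResultItem[] = [
  { key: "a", distance: 0.1 },
  { key: "b", distance: 0.4 },
];

const asArray = query(stored, { topK: 1 });          // typed as ResultItem[]
const asIterable = query(stored, { iterable: true }); // typed as Iterable<ResultItem>
console.log(Array.isArray(asArray), Array.isArray(asIterable)); // true false
```

Listing the `iterable: true` overload first lets the compiler pick it only when the literal `true` is present, while all other option shapes fall through to the array-returning signature.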

@@ -238,8 +259,37 @@ new InMemoryStorageProvider();

  Thrown when memory growth would exceed WASM 32-bit memory limits for the configured dimension size.

+ ## Benchmark results
+
+ WASM SIMD vs pure JavaScript performance on 1536-dimensional vectors (OpenAI embedding size), measured with `vitest bench` (Node.js):
+
+ | Operation | JS (ops/s) | WASM SIMD (ops/s) | Speedup |
+ | --- | --- | --- | --- |
+ | normalize (1536 dims) | 223,117 | 2,226,734 | **~10×** |
+ | searchAll (100 vectors × 1536 dims) | 3,429 | 77,130 | **~22×** |
+ | searchAll (1,000 vectors × 1536 dims) | 344 | 8,009 | **~23×** |
+ | searchAll (10,000 vectors × 1536 dims) | 34 | 398 | **~12×** |
+
+ The WASM SIMD layer uses 2-vector outer loop unrolling (halving query memory reads) and 4× inner loop unrolling with multiple independent accumulators.
+
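The two unrolling ideas named above can be sketched in scalar TypeScript. This is only an analogue for illustration: the package's actual kernel is WASM SIMD and is not shown in this diff, and the function names `dotUnrolled4` and `dotTwoVectors` are assumptions.

```typescript
// Inner-loop unrolling: four independent accumulators, so consecutive
// multiply-adds do not all depend on a single running sum.
function dotUnrolled4(a: Float32Array, b: Float32Array): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  const n = a.length;
  const main = n - (n % 4); // largest multiple of 4 not exceeding n
  let s0 = 0, s1 = 0, s2 = 0, s3 = 0;
  for (let i = 0; i < main; i += 4) {
    s0 += a[i] * b[i];
    s1 += a[i + 1] * b[i + 1];
    s2 += a[i + 2] * b[i + 2];
    s3 += a[i + 3] * b[i + 3];
  }
  let tail = 0;
  for (let i = main; i < n; i++) tail += a[i] * b[i]; // leftover elements
  return (s0 + s1) + (s2 + s3) + tail;
}

// Outer-loop unrolling: score two stored vectors per pass, so each query
// element is loaded once and reused for both products.
function dotTwoVectors(
  q: Float32Array,
  v0: Float32Array,
  v1: Float32Array,
): [number, number] {
  if (q.length !== v0.length || q.length !== v1.length) {
    throw new Error("dimension mismatch");
  }
  let d0 = 0, d1 = 0;
  for (let i = 0; i < q.length; i++) {
    const x = q[i];
    d0 += x * v0[i];
    d1 += x * v1[i];
  }
  return [d0, d1];
}

const vecA = Float32Array.from([1, 2, 3, 4, 5, 6]);
const vecB = Float32Array.from([6, 5, 4, 3, 2, 1]);
console.log(dotUnrolled4(vecA, vecB));        // 56
console.log(dotTwoVectors(vecA, vecB, vecA)); // [56, 91]
```

Note that splitting a sum across several accumulators changes the order of floating-point additions, so results can differ from a naive loop in the last bits for non-integer data.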
+ ### Running benchmarks
+
+ **Node.js** (via vitest):
+
+ ```bash
+ npm run bench
+ ```
+
+ **Browser**: start the dev server and navigate to the benchmark page:
+
+ ```bash
+ npm run dev
+ # Open http://localhost:5173/bench.html
+ ```
+
  ## Practical notes

- - Similarity is dot product; with normalization enabled (default), this behaves like cosine similarity.
+ - Distance is `1 - dotProduct`; with normalization enabled (default), this behaves like cosine distance (0 = identical, 2 = opposite).
+ - `topK` defaults to `Infinity`, returning all stored vectors sorted by distance. Use `maxDistance` to limit results by proximity.
  - Querying an empty database returns an empty array (`[]`).
  - `flush()` writes deduplicated state, and reopen preserves key-to-slot mapping.
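
Since the old README described `score` as the dot product and the new one defines `distance` as `1 - dotProduct`, callers upgrading from 4.0.x can likely convert between the two with `score = 1 - distance`. The shim below is a sketch under that assumption. `toLegacy` and `minScoreToMaxDistance` are hypothetical helpers, not exports of eigen-db.

```typescript
interface ResultItem {
  key: string;
  distance: number;
}

// Shape returned before this release, per the removed README lines.
interface LegacyResultItem {
  key: string;
  score: number;
}

// Hypothetical shim: recovers the old dot-product score from a distance.
function toLegacy(item: ResultItem): LegacyResultItem {
  return { key: item.key, score: 1 - item.distance };
}

// Hypothetical helper: an old "score >= minScore" filter corresponds to
// "distance <= 1 - minScore", which fits the inclusive `maxDistance` option.
function minScoreToMaxDistance(minScore: number): number {
  return 1 - minScore;
}

const legacy = toLegacy({ key: "doc-1", distance: 0.25 });
console.log(legacy.score);               // 0.75
console.log(minScoreToMaxDistance(0.5)); // 0.5
```

Sort order flips as well: code that expected descending scores now receives ascending distances, which is the same ranking expressed from the other direction.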