eigen-db 4.3.0 → 5.0.0

package/CHANGELOG.md CHANGED
@@ -1,3 +1,11 @@
1
+ # v5.0.0
2
+
3
+ Changed: replaced `topK` with `limit` and `order` parameters in `query()` method
4
+
5
+ # v4.4.0
6
+
7
+ Added: `entries()`, `keys()`, `values()`, `delete()`, `has()` methods, and `dimensions` property
8
+
1
9
  # v4.3.0
2
10
 
3
11
  Added: option to choose between in memory or OPFS storage backends
package/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # Eigen DB
2
2
 
3
- High-performance vector database for the web.
3
+ High-performance vector database for the web, powered by WebAssembly.
4
4
 
5
5
  `eigen-db` stores and queries embedding vectors in-browser, using:
6
6
 
@@ -16,7 +16,7 @@ npm install eigen-db
16
16
 
17
17
  ## Guide: Set up and query
18
18
 
19
- ### 1) Open a database
19
+ ### Open a database
20
20
 
21
21
  ```ts
22
22
  import { DB } from "eigen-db";
@@ -39,7 +39,7 @@ const db = await DB.open({
39
39
  });
40
40
  ```
41
41
 
42
- ### 2) Insert vectors
42
+ ### Insert vectors
43
43
 
44
44
  ```ts
45
45
  db.set("doc:1", embedding1);
@@ -56,13 +56,40 @@ Notes:
56
56
  - Each vector must be a `number[]` (or `Float32Array`) with exactly `dimensions` elements.
57
57
  - Duplicate keys use last-write-wins semantics.
58
58
 
59
- ### 3) Query nearest vectors
59
+ ### Look up, check, and remove vectors
60
+
61
+ ```ts
62
+ db.get("doc:1"); // number[] | undefined
63
+ db.has("doc:1"); // true
64
+ db.delete("doc:1"); // true (removed), false (not found)
65
+ db.dimensions; // configured vector dimensions
66
+ db.size; // number of entries
67
+ ```
68
+
69
+ ### Iterate over the database
70
+
71
+ ```ts
72
+ // Iterate over all keys
73
+ for (const key of db.keys()) {
74
+ console.log(key);
75
+ }
76
+
77
+ // Iterate over all [key, value] pairs
78
+ for (const [key, vector] of db.entries()) {
79
+ console.log(key, vector);
80
+ }
81
+
82
+ // Spread into an array (uses Symbol.iterator, same as entries())
83
+ const all = [...db];
84
+ ```
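The lookup and iteration methods shown above follow the same protocol as the built-in `Map`. As a rough illustration only, the sketch below uses a hypothetical `TinyStore` class (not part of `eigen-db`, and not its implementation) backed by a `Map`. It shows the semantics the README describes: `get()` returns a copy, `delete()` reports whether the key existed, duplicate keys are last-write-wins, and `[Symbol.iterator]` yields the same pairs as `entries()`.

```typescript
// Hypothetical stand-in for illustration; this is NOT the eigen-db implementation.
class TinyStore implements Iterable<[string, number[]]> {
  private readonly data = new Map<string, Float32Array>();

  constructor(readonly dimensions: number) {}

  get size(): number {
    return this.data.size;
  }

  set(key: string, vector: ArrayLike<number>): void {
    if (vector.length !== this.dimensions) {
      throw new Error(`expected ${this.dimensions} dimensions, got ${vector.length}`);
    }
    // Last write wins for duplicate keys.
    this.data.set(key, Float32Array.from(vector));
  }

  get(key: string): number[] | undefined {
    const stored = this.data.get(key);
    // Return a copy, never the backing buffer.
    return stored ? Array.from(stored) : undefined;
  }

  has(key: string): boolean {
    return this.data.has(key);
  }

  delete(key: string): boolean {
    // true if an entry was removed, false if the key was not found.
    return this.data.delete(key);
  }

  keys(): IterableIterator<string> {
    return this.data.keys();
  }

  *entries(): IterableIterator<[string, number[]]> {
    for (const [key, stored] of this.data) {
      yield [key, Array.from(stored)];
    }
  }

  [Symbol.iterator](): IterableIterator<[string, number[]]> {
    return this.entries();
  }
}
```

Because the class implements `Iterable`, both `for (const [key, vector] of store)` and `[...store]` work without calling `entries()` explicitly.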
85
+
86
+ ### Query nearest vectors
60
87
 
61
88
  ```ts
62
89
  const queryVector = embeddingQuery;
63
90
 
64
91
  // Returns a plain array of { key, similarity } sorted by descending similarity
65
- const results = db.query(queryVector, { topK: 10 });
92
+ const results = db.query(queryVector, { limit: 10 });
66
93
 
67
94
  for (const { key, similarity } of results) {
68
95
  console.log(key, similarity);
@@ -72,7 +99,7 @@ for (const { key, similarity } of results) {
72
99
  For lazy iteration (useful for pagination or early stopping):
73
100
 
74
101
  ```ts
75
- const results = db.query(queryVector, { topK: 100, iterable: true });
102
+ const results = db.query(queryVector, { limit: 100, iterable: true });
76
103
 
77
104
  // Iterate and break early — keys are resolved on demand
78
105
  for (const { key, similarity } of results) {
@@ -84,17 +111,27 @@ for (const { key, similarity } of results) {
84
111
  const all = [...results];
85
112
  ```
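One way to picture "keys are resolved on demand" is a generator that walks pre-sorted scores and looks up a key only when the consumer pulls the next item. The sketch below illustrates that general technique under stated assumptions; it is not the library's code, and `lazyResults`, `slotScores`, and `resolveKey` are hypothetical names.

```typescript
interface ResultItem {
  key: string;
  similarity: number;
}

// Hypothetical helper: yields results lazily from per-slot scores.
// `resolveKey` stands in for whatever maps a storage slot to its key.
function* lazyResults(
  slotScores: ArrayLike<number>,
  resolveKey: (slot: number) => string,
  limit = Infinity,
): Generator<ResultItem> {
  // Sort slot indices by descending score; no keys are looked up yet.
  const order = Array.from({ length: slotScores.length }, (_, i) => i)
    .sort((a, b) => slotScores[b] - slotScores[a]);

  let emitted = 0;
  for (const slot of order) {
    if (emitted++ >= limit) return;
    // The key lookup happens only when the consumer asks for this item.
    yield { key: resolveKey(slot), similarity: slotScores[slot] };
  }
}

// Usage: stop after the first two results, so only two keys are resolved.
let lookups = 0;
const results = lazyResults([0.1, 0.9, 0.5, 0.7], (slot) => {
  lookups++;
  return `doc:${slot}`;
});

const firstTwo: ResultItem[] = [];
for (const item of results) {
  firstTwo.push(item);
  if (firstTwo.length === 2) break;
}
```

Breaking out of the loop ends the generator, so the remaining two slots never trigger a lookup.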
86
113
 
87
- Use `minSimilarity` to automatically cut off results below a threshold:
114
+ Use `minSimilarity` and `maxSimilarity` to filter results by a similarity range:
88
115
 
89
116
  ```ts
90
117
  // Only return results with similarity ≥ 0.7 (inclusive)
91
118
  const results = db.query(queryVector, { minSimilarity: 0.7 });
92
119
 
93
- // Works with iterable mode too; iteration stops early at the threshold
94
- const results = db.query(queryVector, { minSimilarity: 0.7, iterable: true });
120
+ // Only return results with similarity ≤ 0.5 (inclusive)
121
+ const results = db.query(queryVector, { maxSimilarity: 0.5 });
122
+
123
+ // Combine both for a range
124
+ const results = db.query(queryVector, { minSimilarity: 0.3, maxSimilarity: 0.8 });
125
+ ```
126
+
127
+ Use `order: "ascend"` to get the least similar results first (bottom-K):
128
+
129
+ ```ts
130
+ // Least similar results first
131
+ const bottomK = db.query(queryVector, { order: "ascend", limit: 10 });
95
132
  ```
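To make the interaction between `limit`, `order`, `minSimilarity`, and `maxSimilarity` concrete, here is a minimal sketch over an already-scored list. `selectResults` and `RangeOptions` are hypothetical names, and the order of operations (filter, then sort, then limit) is an assumption rather than something this README states.

```typescript
interface ResultItem {
  key: string;
  similarity: number;
}

interface RangeOptions {
  limit?: number;
  order?: "ascend" | "descend";
  minSimilarity?: number;
  maxSimilarity?: number;
}

// Hypothetical helper, not the eigen-db implementation.
function selectResults(scored: ResultItem[], options: RangeOptions = {}): ResultItem[] {
  const {
    limit = Infinity,
    order = "descend",
    minSimilarity = -Infinity,
    maxSimilarity = Infinity,
  } = options;

  // Both bounds are inclusive, as documented for QueryOptions.
  const inRange = scored.filter(
    (r) => r.similarity >= minSimilarity && r.similarity <= maxSimilarity,
  );

  const direction = order === "ascend" ? 1 : -1;
  inRange.sort((a, b) => direction * (a.similarity - b.similarity));

  // Assumption: the limit is applied after filtering and ordering.
  return inRange.slice(0, limit);
}

const scored: ResultItem[] = [
  { key: "a", similarity: 0.9 },
  { key: "b", similarity: 0.5 },
  { key: "c", similarity: 0.3 },
  { key: "d", similarity: 0.1 },
];

// Keeps "b" and "c": 0.3 is included because the lower bound is inclusive.
const range = selectResults(scored, { minSimilarity: 0.3, maxSimilarity: 0.8 });

// Keeps "d" then "c": the two least similar entries, lowest first.
const bottom2 = selectResults(scored, { order: "ascend", limit: 2 });
```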
96
133
 
97
- ### 4) Persist and lifecycle
134
+ ### Persist and lifecycle
98
135
 
99
136
  ```ts
100
137
  await db.flush(); // persist current state
@@ -107,7 +144,7 @@ To delete all vectors and storage:
107
144
  await db.clear();
108
145
  ```
109
146
 
110
- ### 5) Export and import
147
+ ### Export and import
111
148
 
112
149
  Export the entire database as a streaming binary file:
113
150
 
@@ -150,11 +187,11 @@ Similarity is the dot product of the query and stored vectors.
150
187
 
151
188
  **When to normalize:**
152
189
 
153
- | Scenario | Normalize? | Notes |
154
- | --- | --- | --- |
190
+ | Scenario | Normalize? | Notes |
191
+ | ------------------------------------------ | ---------------- | --------------------------------------------------------------------------- |
155
192
  | Using embeddings from OpenAI, Cohere, etc. | `true` (default) | Embeddings may not be unit-length; normalization ensures cosine similarity. |
156
- | Vectors are already unit-length | Either | Setting `false` avoids redundant work. |
157
- | You need raw dot-product semantics | `false` | Similarity will be the raw dot product; range depends on vector magnitudes. |
193
+ | Vectors are already unit-length | Either | Setting `false` avoids redundant work. |
194
+ | You need raw dot-product semantics | `false` | Similarity will be the raw dot product; range depends on vector magnitudes. |
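The difference between the rows above can be checked with a few lines of arithmetic. This is a standalone sketch with hypothetical helper names (`dot`, `toUnit`); it does not call `eigen-db`. It shows that the dot product of two vectors depends on their magnitudes, while the dot product of their unit-length copies equals their cosine similarity.

```typescript
// Plain dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

// Returns a unit-length copy (a zero vector is returned unchanged).
function toUnit(v: number[]): number[] {
  const norm = Math.sqrt(dot(v, v));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

const a = [3, 4];
const b = [4, 3];

// Raw dot product: 3*4 + 4*3 = 24, which grows with vector magnitude.
const rawDot = dot(a, b);

// After normalization: approximately 0.96, which depends only on direction.
const cosine = dot(toUnit(a), toUnit(b));
```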
158
195
 
159
196
  ## Full API Reference
160
197
 
@@ -183,6 +220,7 @@ Opens (or creates) a database instance and loads persisted data.
183
220
  #### Properties
184
221
 
185
222
  - `size: number` — current number of key-vector pairs
223
+ - `dimensions: number` — number of dimensions per vector
186
224
 
187
225
  #### Methods
188
226
 
@@ -191,10 +229,20 @@ Opens (or creates) a database instance and loads persisted data.
191
229
  - Throws on dimension mismatch.
192
230
  - `get(key: string): number[] | undefined`
193
231
  - Returns a copy of the stored vector.
232
+ - `has(key: string): boolean`
233
+ - Returns `true` if the key exists, `false` otherwise. O(1) lookup.
234
+ - `delete(key: string): boolean`
235
+ - Removes the entry for the given key. Returns `true` if the key existed, `false` otherwise.
194
236
  - `setMany(entries: [string, VectorInput][]): void`
195
237
  - Batch insert/update.
196
238
  - `getMany(keys: string[]): (number[] | undefined)[]`
197
239
  - Batch lookup.
240
+ - `keys(): IterableIterator<string>`
241
+ - Returns an iterable of all keys.
242
+ - `entries(): IterableIterator<[string, number[]]>`
243
+ - Returns an iterable of `[key, value]` pairs. Values are plain number array copies.
244
+ - `[Symbol.iterator](): IterableIterator<[string, number[]]>`
245
+ - Same as `entries()`. Enables `[...db]` and `for...of` iteration.
198
246
  - `query(value: VectorInput, options?: QueryOptions): ResultItem[]`
199
247
  - Returns results sorted by descending similarity as a plain array.
200
248
  - Throws on dimension mismatch.
@@ -265,8 +313,10 @@ interface SetOptions {
265
313
 
266
314
  ```ts
267
315
  interface QueryOptions {
268
- topK?: number; // default: Infinity (all results)
316
+ limit?: number; // default: Infinity (all results)
317
+ order?: "ascend" | "descend"; // default: "descend" (most similar first)
269
318
  minSimilarity?: number; // inclusive lower bound on similarity; results below this are excluded
319
+ maxSimilarity?: number; // inclusive upper bound on similarity; results above this are excluded
270
320
  normalize?: boolean;
271
321
  iterable?: boolean; // when true, returns Iterable<ResultItem> instead of ResultItem[]
272
322
  }
@@ -311,12 +361,12 @@ Thrown when memory growth would exceed WASM 32-bit memory limits for the configu
311
361
 
312
362
  WASM SIMD vs pure JavaScript performance on 1536-dimensional vectors (OpenAI embedding size), measured with `vitest bench` (Node.js):
313
363
 
314
- | Operation | JS (ops/s) | WASM SIMD (ops/s) | Speedup |
315
- | --- | --- | --- | --- |
316
- | normalize (1536 dims) | 223,117 | 2,226,734 | **~10×** |
317
- | searchAll (100 vectors × 1536 dims) | 3,429 | 77,130 | **~22×** |
318
- | searchAll (1,000 vectors × 1536 dims) | 344 | 8,009 | **~23×** |
319
- | searchAll (10,000 vectors × 1536 dims) | 34 | 398 | **~12×** |
364
+ | Operation | JS (ops/s) | WASM SIMD (ops/s) | Speedup |
365
+ | -------------------------------------- | ---------- | ----------------- | -------- |
366
+ | normalize (1536 dims) | 223,117 | 2,226,734 | **~10×** |
367
+ | searchAll (100 vectors × 1536 dims) | 3,429 | 77,130 | **~22×** |
368
+ | searchAll (1,000 vectors × 1536 dims) | 344 | 8,009 | **~23×** |
369
+ | searchAll (10,000 vectors × 1536 dims) | 34 | 398 | **~12×** |
320
370
 
321
371
  The WASM SIMD layer uses 2-vector outer loop unrolling (halving query memory reads) and 4× inner loop unrolling with multiple independent accumulators.
322
372
 
@@ -338,7 +388,8 @@ npm run dev
338
388
  ## Practical notes
339
389
 
340
390
  - Similarity is the dot product of query and stored vectors; with normalization enabled (default), this behaves like cosine similarity (1 = identical, -1 = opposite).
341
- - `topK` defaults to `Infinity`, returning all stored vectors sorted by similarity. Use `minSimilarity` to limit results by proximity.
391
+ - `limit` defaults to `Infinity`, returning all stored vectors sorted by similarity. Use `minSimilarity` and `maxSimilarity` to filter results by proximity range.
392
+ - `order` defaults to `"descend"` (most similar first). Use `"ascend"` to get least similar first.
342
393
  - Querying an empty database returns an empty array (`[]`).
343
394
  - `flush()` writes deduplicated state, and reopen preserves key-to-slot mapping.
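For code written against 4.x, the `topK` to `limit` rename mentioned in the notes above can be handled by a small adapter. The sketch below is one possible approach, not something the package ships: `LegacyQueryOptions` and `migrateQueryOptions` are hypothetical names, and the legacy shape is inferred only from what this diff shows.

```typescript
// Shape of the 4.x options, as far as this diff shows them (assumption).
interface LegacyQueryOptions {
  topK?: number;
  minSimilarity?: number;
  normalize?: boolean;
  iterable?: boolean;
}

// Shape of the 5.0.0 options, copied from the README's QueryOptions.
interface QueryOptions {
  limit?: number;
  order?: "ascend" | "descend";
  minSimilarity?: number;
  maxSimilarity?: number;
  normalize?: boolean;
  iterable?: boolean;
}

// Hypothetical adapter: renames `topK` to `limit` and leaves everything else alone.
function migrateQueryOptions(legacy: LegacyQueryOptions): QueryOptions {
  const { topK, ...rest } = legacy;
  return topK === undefined ? rest : { ...rest, limit: topK };
}

// migrated is { minSimilarity: 0.7, limit: 10 }
const migrated = migrateQueryOptions({ topK: 10, minSimilarity: 0.7 });
```

Since `order` defaults to `"descend"`, a migrated call keeps the most-similar-first ordering that `topK` produced.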
344
395
 
@@ -0,0 +1,20 @@
1
+ /**
2
+ * Pure JavaScript compute functions for vector operations.
3
+ * These serve as the reference implementation and fallback when WASM SIMD is unavailable.
4
+ */
5
+ /**
6
+ * Normalizes a vector in-place to unit length.
7
+ * After normalization, cosine similarity reduces to a simple dot product.
8
+ */
9
+ export declare function normalize(vec: Float32Array): void;
10
+ /**
11
+ * Computes dot products of query against all vectors in the database.
12
+ * Writes scores to the output array.
13
+ *
14
+ * @param query - Normalized query vector (length = dimensions)
15
+ * @param db - Contiguous flat array of normalized vectors (length = dbSize * dimensions)
16
+ * @param scores - Output array for dot product scores (length = dbSize)
17
+ * @param dbSize - Number of vectors in the database
18
+ * @param dimensions - Dimensionality of each vector
19
+ */
20
+ export declare function searchAll(query: Float32Array, db: Float32Array, scores: Float32Array, dbSize: number, dimensions: number): void;
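
The declarations above give only signatures. As a hedged illustration of what the doc comments describe, the following sketch fills in plain-TypeScript bodies with the same parameter lists. The function bodies are assumptions (the package's actual implementation is not part of this diff), including the choice to leave a zero vector untouched.

```typescript
// Illustrative stand-ins matching the declared signatures; not the package's code.
function normalize(vec: Float32Array): void {
  let sumSquares = 0;
  for (let i = 0; i < vec.length; i++) sumSquares += vec[i] * vec[i];
  const norm = Math.sqrt(sumSquares);
  if (norm === 0) return; // assumption: leave zero vectors untouched
  for (let i = 0; i < vec.length; i++) vec[i] /= norm;
}

function searchAll(
  query: Float32Array,
  db: Float32Array,
  scores: Float32Array,
  dbSize: number,
  dimensions: number,
): void {
  for (let row = 0; row < dbSize; row++) {
    const offset = row * dimensions; // start of this vector in the flat array
    let sum = 0;
    for (let d = 0; d < dimensions; d++) sum += query[d] * db[offset + d];
    scores[row] = sum;
  }
}

// Usage: two 2-dimensional unit vectors stored back to back.
const query = new Float32Array([1, 0]);
const flatDb = new Float32Array([1, 0, 0, 1]);
const scores = new Float32Array(2);
searchAll(query, flatDb, scores, 2, 2);
// scores now holds [1, 0]: identical direction, then orthogonal.
```

Storing all vectors in one contiguous `Float32Array` means vector `row` starts at index `row * dimensions`, so the scan needs no per-vector allocation.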