eigen-db 4.1.0 → 4.2.0

package/CHANGELOG.md CHANGED
@@ -1,3 +1,7 @@
+ # v4.2.0
+
+ Added: import/export in binary format
+
  # v4.1.0
  
  Added: performance optimization in WASM
package/README.md CHANGED
@@ -50,11 +50,11 @@ Notes:
  ```ts
  const queryVector = embeddingQuery;
  
- // Returns a plain array of { key, distance } sorted by ascending distance
+ // Returns a plain array of { key, similarity } sorted by descending similarity
  const results = db.query(queryVector, { topK: 10 });
  
- for (const { key, distance } of results) {
-   console.log(key, distance);
+ for (const { key, similarity } of results) {
+   console.log(key, similarity);
  }
  ```
 
@@ -64,23 +64,23 @@ For lazy iteration (useful for pagination or early stopping):
  const results = db.query(queryVector, { topK: 100, iterable: true });
  
  // Iterate and break early — keys are resolved on demand
- for (const { key, distance } of results) {
-   if (distance > 0.5) break;
-   console.log(key, distance);
+ for (const { key, similarity } of results) {
+   if (similarity < 0.5) break;
+   console.log(key, similarity);
  }
  
  // Or spread into an array when you need all results
  const all = [...results];
  ```
  
- Use `maxDistance` to automatically cut off results beyond a threshold:
+ Use `minSimilarity` to automatically cut off results below a threshold:
  
  ```ts
- // Only return results within distance 0.3 (inclusive)
- const results = db.query(queryVector, { maxDistance: 0.3 });
+ // Only return results with similarity of at least 0.7 (inclusive)
+ const results = db.query(queryVector, { minSimilarity: 0.7 });
  
  // Works with iterable mode too — iteration stops early at the threshold
- const results = db.query(queryVector, { maxDistance: 0.3, iterable: true });
+ const results = db.query(queryVector, { minSimilarity: 0.7, iterable: true });
  ```
  
  ### 4) Persist and lifecycle
@@ -96,20 +96,54 @@ To delete all vectors and storage:
  await db.clear();
  ```
  
- ## Distance metric
+ ### 5) Export and import
  
- Distance is defined as `1 - dotProduct(query, stored)`.
+ Export the entire database as a streaming binary file:
  
- - **With normalization enabled** (the default): vectors are L2-normalized before storage and query, so the dot product equals cosine similarity. Distance then equals **cosine distance**, ranging from **0** (identical) to **2** (opposite).
- - **With normalization disabled** (`normalize: false`): the dot product is computed on raw vectors. Distance is `1 - dotProduct`, which is not a standard metric and its range depends on the magnitude of your vectors. Use this mode when your vectors are already normalized or when you want raw dot-product semantics.
+ ```ts
+ const stream = await db.export(); // ReadableStream<Uint8Array>
+
+ // In a browser — download as a file
+ const response = new Response(stream);
+ const blob = await response.blob();
+ const url = URL.createObjectURL(blob);
+ const a = document.createElement("a");
+ a.href = url;
+ a.download = "database.bin";
+ a.click();
+ ```
+
+ Import from a stream, replacing all existing data:
+
+ ```ts
+ // From a File (e.g., <input type="file">)
+ await db.import(file.stream());
+
+ // From a fetch response
+ const res = await fetch("/path/to/database.bin");
+ await db.import(res.body!);
+ ```
+
+ Notes:
+
+ - `import()` replaces all existing data in the target database.
+ - A dimension check is performed on import: the stream must contain data exported from a database with the same `dimensions` setting.
+ - Both methods use the Web Streams API to avoid large heap allocations — vectors are streamed in 64KB chunks.
+
+ ## Similarity metric
+
+ Similarity is the dot product of the query and stored vectors.
+
+ - **With normalization enabled** (the default): vectors are L2-normalized before storage and query, so the dot product equals cosine similarity. Similarity ranges from **1** (identical) to **-1** (opposite), with **0** indicating orthogonal vectors.
+ - **With normalization disabled** (`normalize: false`): the dot product is computed on raw vectors. The range depends on the magnitude of your vectors. Use this mode when your vectors are already normalized or when you want raw dot-product semantics.
  
  **When to normalize:**
  
  | Scenario | Normalize? | Notes |
  | --- | --- | --- |
- | Using embeddings from OpenAI, Cohere, etc. | `true` (default) | Embeddings may not be unit-length; normalization ensures cosine distance. |
+ | Using embeddings from OpenAI, Cohere, etc. | `true` (default) | Embeddings may not be unit-length; normalization ensures cosine similarity. |
  | Vectors are already unit-length | Either | Setting `false` avoids redundant work. |
- | You need raw dot-product semantics | `false` | Distance will be `1 - dotProduct`; range depends on vector magnitudes. |
+ | You need raw dot-product semantics | `false` | Similarity will be the raw dot product; range depends on vector magnitudes. |
  
  ## Full API Reference
 
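Note on the `Similarity metric` hunk above: the definition it describes (dot product, optionally after L2 normalization) can be sketched in plain TypeScript. This is an illustrative sketch only and not eigen-db's implementation, which per the changelog runs in WASM; `l2Normalize`, `dotProduct`, and `cosineSimilarity` are hypothetical helper names, not part of the package API.

```ts
// Hypothetical helpers for illustration; not part of the eigen-db API.

/** Scale a vector to unit L2 length. A zero vector is returned as a copy. */
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

/** Raw dot product of two equal-length vectors. */
function dotProduct(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

/** Dot product after L2 normalization, which is cosine similarity. */
function cosineSimilarity(a: number[], b: number[]): number {
  return dotProduct(l2Normalize(a), l2Normalize(b));
}

const q = [3, 4];
console.log(cosineSimilarity(q, [6, 8]));   // same direction: approximately 1
console.log(cosineSimilarity(q, [-3, -4])); // opposite direction: approximately -1
console.log(cosineSimilarity(q, [-4, 3]));  // orthogonal: approximately 0
console.log(dotProduct(q, [6, 8]));         // raw dot product: grows with vector magnitude
```

The last line shows why the range is unbounded with `normalize: false`: doubling a stored vector doubles its raw dot product with the query, while the normalized value stays the same.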
@@ -151,7 +185,7 @@ Opens (or creates) a database instance and loads persisted data.
  - `getMany(keys: string[]): (number[] | undefined)[]`
    - Batch lookup.
  - `query(value: VectorInput, options?: QueryOptions): ResultItem[]`
-   - Returns results sorted by ascending distance as a plain array.
+   - Returns results sorted by descending similarity as a plain array.
    - Throws on dimension mismatch.
  - `query(value: VectorInput, options: QueryOptions & { iterable: true }): Iterable<ResultItem>`
    - With `{ iterable: true }`, returns a lazy iterable. Keys are resolved
@@ -164,17 +198,22 @@ Opens (or creates) a database instance and loads persisted data.
    - Subsequent operations throw.
  - `clear(): Promise<void>`
    - Clears in-memory state and destroys storage for this DB.
+ - `export(): Promise<ReadableStream<Uint8Array>>`
+   - Exports the entire database as a streaming binary. Vectors are streamed in 64KB chunks.
+ - `import(stream: ReadableStream<Uint8Array>): Promise<void>`
+   - Imports data from a stream, replacing all existing data.
+   - Throws on dimension mismatch between the stream data and the database.
  
  ### `ResultItem`
  
  ```ts
  interface ResultItem {
    key: string;
-   distance: number;
+   similarity: number;
  }
  ```
  
- - `distance` — Defined as `1 - dotProduct`. With normalization (default), this is cosine distance: 0 = identical, 2 = opposite.
+ - `similarity` — The dot product of query and stored vectors. With normalization (default), this is cosine similarity: 1 = identical, -1 = opposite.
  
  ### Option types
 
@@ -218,7 +257,7 @@ interface SetOptions {
  ```ts
  interface QueryOptions {
    topK?: number; // default: Infinity (all results)
-   maxDistance?: number; // inclusive upper bound on distance; results beyond this are excluded
+   minSimilarity?: number; // inclusive lower bound on similarity; results below this are excluded
    normalize?: boolean;
    iterable?: boolean; // when true, returns Iterable<ResultItem> instead of ResultItem[]
  }
@@ -289,7 +328,11 @@ npm run dev
  
  ## Practical notes
  
- - Distance is `1 - dotProduct`; with normalization enabled (default), this behaves like cosine distance (0 = identical, 2 = opposite).
- - `topK` defaults to `Infinity`, returning all stored vectors sorted by distance. Use `maxDistance` to limit results by proximity.
+ - Similarity is the dot product of query and stored vectors; with normalization enabled (default), this behaves like cosine similarity (1 = identical, -1 = opposite).
+ - `topK` defaults to `Infinity`, returning all stored vectors sorted by similarity. Use `minSimilarity` to limit results by proximity.
  - Querying an empty database returns an empty array (`[]`).
  - `flush()` writes deduplicated state, and reopen preserves key-to-slot mapping.
+
+ ## Related
+
+ - Just need cosine similarity? Try [fast-theta](https://github.com/chuanqisun/fast-theta).
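
Migration note for callers upgrading from 4.1.0: the two definitions quoted in this diff (`distance = 1 - dotProduct` before, `similarity = dotProduct` after) imply `similarity = 1 - distance`, so a `maxDistance` of `d` corresponds to a `minSimilarity` of `1 - d`. A minimal sketch of that conversion follows. `LegacyQueryOptions`, `migrateQueryOptions`, and `toLegacyResult` are hypothetical names introduced here for illustration and are not part of the package; the interfaces list only the fields relevant to the conversion.

```ts
// Shapes reduced to the fields this diff touches; names marked "Legacy" are hypothetical.
interface LegacyQueryOptions {
  topK?: number;
  maxDistance?: number; // 4.1.0: inclusive upper bound on distance
}

interface QueryOptions {
  topK?: number;
  minSimilarity?: number; // 4.2.0: inclusive lower bound on similarity
}

interface ResultItem {
  key: string;
  similarity: number;
}

/** Convert 4.1.0-style query options to the 4.2.0 shape. */
function migrateQueryOptions(old: LegacyQueryOptions): QueryOptions {
  const { maxDistance, ...rest } = old;
  // distance <= d is equivalent to similarity >= 1 - d
  return maxDistance === undefined ? rest : { ...rest, minSimilarity: 1 - maxDistance };
}

/** Recover the 4.1.0 `distance` value for code that still expects it. */
function toLegacyResult(item: ResultItem): { key: string; distance: number } {
  return { key: item.key, distance: 1 - item.similarity };
}

console.log(migrateQueryOptions({ topK: 10, maxDistance: 0.3 }));
console.log(toLegacyResult({ key: "doc-1", similarity: 0.9 }));
```

Descending similarity is the same ordering as ascending distance, so result order should not change after the upgrade. Results sitting exactly on the threshold may differ by floating-point rounding, since `1 - d` is not always exactly representable.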