neighbor 0.6.0 → 1.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +9 -0
- data/LICENSE.txt +1 -1
- data/README.md +148 -99
- data/lib/neighbor/model.rb +5 -11
- data/lib/neighbor/postgresql.rb +1 -1
- data/lib/neighbor/sqlite.rb +120 -8
- data/lib/neighbor/utils.rb +36 -7
- data/lib/neighbor/version.rb +1 -1
- data/lib/neighbor.rb +1 -0
- metadata +5 -5
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 1154f0138248270d73d1681962c7bfecc8f81e074104a24071d32d76d5405705
+  data.tar.gz: 23c823cc022efcfebf9533d0388dd3abaec0fca61cc7e0911a2972dfb3bf429b
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 67cc38e53cb43b8775561dd234cf073d2d45115b1c517aa32f890ab9f3d059106915158ee8bc08d2af9091e2107273ba9a65da6c66a1646afc419b62a8a669e0
+  data.tar.gz: 63cf2a9765043884726a59a6c058730d004be2b15bed0565be70579ef1717b322bb424e1311ced9a5685c8f45b7521333b0c66cca42c83cd52cb94cd13a7a5fd
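`checksums.yaml` records SHA256 and SHA512 digests of the `metadata.gz` and `data.tar.gz` members inside the `.gem` archive, which is a plain tar file. The following is a minimal sketch. It assumes the archive has already been unpacked locally, for example with `tar -xf neighbor-1.1.0.gem`. It recomputes the digests with Ruby's standard `digest` library so you can compare them with the values above. The helper name `gem_member_digests` is hypothetical.

```ruby
require "digest"

# Hypothetical helper: recompute the digests that checksums.yaml records
# for one member of an unpacked .gem archive (metadata.gz or data.tar.gz).
def gem_member_digests(path)
  {
    "SHA256" => Digest::SHA256.file(path).hexdigest,
    "SHA512" => Digest::SHA512.file(path).hexdigest
  }
end

# Example (assumes data.tar.gz is in the current directory):
# gem_member_digests("data.tar.gz")["SHA256"]
# should equal the data.tar.gz SHA256 entry listed above
```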
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,12 @@
+## 1.1.0 (2026-05-14)
+
+- Added experimental support for SQLite with no extension
+- Added experimental support for Vec1
+
+## 1.0.0 (2026-04-04)
+
+- Dropped support for Ruby < 3.3 and Active Record < 7.2
+
 ## 0.6.0 (2025-06-12)
 
 - Added support for MariaDB 11.8
data/LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -4,10 +4,12 @@ Nearest neighbor search for Rails
 
 Supports:
 
-- Postgres (
+- Postgres (pgvector and cube)
 - MariaDB 11.8
 - MySQL 9 (searching requires HeatWave) - experimental
-- SQLite
+- SQLite - experimental
+
+Also available for [Redis](https://github.com/ankane/neighbor-redis) and [S3 Vectors](https://github.com/ankane/neighbor-s3)
 
 [](https://github.com/ankane/neighbor/actions)
 
@@ -21,14 +23,7 @@ gem "neighbor"
 
 ### For Postgres
 
-Neighbor supports two extensions: [
-
-For cube, run:
-
-```sh
-rails generate neighbor:cube
-rails db:migrate
-```
+Neighbor supports two extensions for Postgres: [pgvector](https://github.com/pgvector/pgvector) and [cube](https://www.postgresql.org/docs/current/cube.html). cube ships with Postgres, while pgvector supports more dimensions and approximate nearest neighbor search.
 
 For pgvector, [install the extension](https://github.com/pgvector/pgvector#installation) and run:
 
@@ -37,18 +32,11 @@ rails generate neighbor:vector
 rails db:migrate
 ```
 
-
-
-Add this line to your application’s Gemfile:
-
-```ruby
-gem "sqlite-vec"
-```
-
-And run:
+For cube, run:
 
 ```sh
-rails generate neighbor:
+rails generate neighbor:cube
+rails db:migrate
 ```
 
 ## Getting Started
@@ -56,15 +44,15 @@ rails generate neighbor:sqlite
 Create a migration
 
 ```ruby
-class AddEmbeddingToItems < ActiveRecord::Migration[8.
+class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
   def change
-    # cube
-    add_column :items, :embedding, :cube
-
     # pgvector, MariaDB, and MySQL
     add_column :items, :embedding, :vector, limit: 3 # dimensions
 
-    #
+    # cube
+    add_column :items, :embedding, :cube
+
+    # SQLite
     add_column :items, :embedding, :binary
   end
 end
@@ -105,47 +93,14 @@ nearest_item.neighbor_distance
 
 See the additional docs for:
 
-- [cube](#cube)
 - [pgvector](#pgvector)
+- [cube](#cube)
 - [MariaDB](#mariadb)
 - [MySQL](#mysql)
-- [
+- [SQLite](#sqlite)
 
 Or check out some [examples](#examples)
 
-## cube
-
-### Distance
-
-Supported values are:
-
-- `euclidean`
-- `cosine`
-- `taxicab`
-- `chebyshev`
-
-For cosine distance with cube, vectors must be normalized before being stored.
-
-```ruby
-class Item < ApplicationRecord
-  has_neighbors :embedding, normalize: true
-end
-```
-
-For inner product with cube, see [this example](examples/disco/user_recs_cube.rb).
-
-### Dimensions
-
-The `cube` type can have up to 100 dimensions by default. See the [Postgres docs](https://www.postgresql.org/docs/current/cube.html) for how to increase this.
-
-For cube, it’s a good idea to specify the number of dimensions to ensure all records have the same number.
-
-```ruby
-class Item < ApplicationRecord
-  has_neighbors :embedding, dimensions: 3
-end
-```
-
 ## pgvector
 
 ### Distance
@@ -174,7 +129,7 @@ The `sparsevec` type can have up to 16,000 non-zero elements, and sparse vectors
 Add an approximate index to speed up queries. Create a migration with:
 
 ```ruby
-class AddIndexToItemsEmbedding < ActiveRecord::Migration[8.
+class AddIndexToItemsEmbedding < ActiveRecord::Migration[8.1]
   def change
     add_index :items, :embedding, using: :hnsw, opclass: :vector_l2_ops
     # or
@@ -202,7 +157,7 @@ Item.connection.execute("SET ivfflat.probes = 3")
 Use the `halfvec` type to store half-precision vectors
 
 ```ruby
-class AddEmbeddingToItems < ActiveRecord::Migration[8.
+class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
   def change
     add_column :items, :embedding, :halfvec, limit: 3 # dimensions
   end
@@ -214,7 +169,7 @@ end
 Index vectors at half precision for smaller indexes
 
 ```ruby
-class AddIndexToItemsEmbedding < ActiveRecord::Migration[8.
+class AddIndexToItemsEmbedding < ActiveRecord::Migration[8.1]
   def change
     add_index :items, "(embedding::halfvec(3)) halfvec_l2_ops", using: :hnsw
   end
@@ -232,7 +187,7 @@ Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean", preci
 Use the `bit` type to store binary vectors
 
 ```ruby
-class AddEmbeddingToItems < ActiveRecord::Migration[8.
+class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
   def change
     add_column :items, :embedding, :bit, limit: 3 # dimensions
   end
@@ -250,7 +205,7 @@ Item.nearest_neighbors(:embedding, "101", distance: "hamming").first(5)
 Use expression indexing for binary quantization
 
 ```ruby
-class AddIndexToItemsEmbedding < ActiveRecord::Migration[8.
+class AddIndexToItemsEmbedding < ActiveRecord::Migration[8.1]
   def change
     add_index :items, "(binary_quantize(embedding)::bit(3)) bit_hamming_ops", using: :hnsw
   end
@@ -262,7 +217,7 @@ end
 Use the `sparsevec` type to store sparse vectors
 
 ```ruby
-class AddEmbeddingToItems < ActiveRecord::Migration[8.
+class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
   def change
     add_column :items, :embedding, :sparsevec, limit: 3 # dimensions
   end
@@ -276,6 +231,39 @@ embedding = Neighbor::SparseVector.new({0 => 0.9, 1 => 1.3, 2 => 1.1}, 3)
 Item.nearest_neighbors(:embedding, embedding, distance: "euclidean").first(5)
 ```
 
+## cube
+
+### Distance
+
+Supported values are:
+
+- `euclidean`
+- `cosine`
+- `taxicab`
+- `chebyshev`
+
+For cosine distance with cube, vectors must be normalized before being stored.
+
+```ruby
+class Item < ApplicationRecord
+  has_neighbors :embedding, normalize: true
+end
+```
+
+For inner product with cube, see [this example](examples/disco/user_recs_cube.rb).
+
+### Dimensions
+
+The `cube` type can have up to 100 dimensions by default. See the [Postgres docs](https://www.postgresql.org/docs/current/cube.html) for how to increase this.
+
+For cube, it’s a good idea to specify the number of dimensions to ensure all records have the same number.
+
+```ruby
+class Item < ApplicationRecord
+  has_neighbors :embedding, dimensions: 3
+end
+```
+
 ## MariaDB
 
 ### Distance
@@ -291,7 +279,7 @@ Supported values are:
 Vector columns must use `null: false` to add a vector index
 
 ```ruby
-class CreateItems < ActiveRecord::Migration[8.
+class CreateItems < ActiveRecord::Migration[8.1]
   def change
     create_table :items do |t|
       t.vector :embedding, limit: 3, null: false
@@ -306,7 +294,7 @@ end
 Use the `bigint` type to store binary vectors
 
 ```ruby
-class AddEmbeddingToItems < ActiveRecord::Migration[8.
+class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
   def change
     add_column :items, :embedding, :bigint
   end
@@ -338,7 +326,7 @@ Note: The `DISTANCE()` function is [only available on HeatWave](https://dev.mysq
 Use the `binary` type to store binary vectors
 
 ```ruby
-class AddEmbeddingToItems < ActiveRecord::Migration[8.
+class AddEmbeddingToItems < ActiveRecord::Migration[8.1]
   def change
     add_column :items, :embedding, :binary
   end
@@ -351,20 +339,22 @@ Get the nearest neighbors by Hamming distance
 Item.nearest_neighbors(:embedding, "\x05", distance: "hamming").first(5)
 ```
 
-##
+## SQLite
 
 ### Distance
 
 Supported values are:
 
 - `euclidean`
+- `inner_product`
 - `cosine`
 - `taxicab`
 - `hamming`
+- `jaccard`
 
 ### Dimensions
 
-For
+For SQLite, it’s a good idea to specify the number of dimensions to ensure all records have the same number.
 
 ```ruby
 class Item < ApplicationRecord
@@ -372,12 +362,97 @@ class Item < ApplicationRecord
 end
 ```
 
-###
+### Int8 Vectors
+
+Use the `type` option for int8 vectors
+
+```ruby
+class Item < ApplicationRecord
+  has_neighbors :embedding, dimensions: 3, type: :int8
+end
+```
+
+### Binary Vectors
+
+Use the `type` option for binary vectors
+
+```ruby
+class Item < ApplicationRecord
+  has_neighbors :embedding, dimensions: 8, type: :bit
+end
+```
+
+Get the nearest neighbors by Hamming distance
+
+```ruby
+Item.nearest_neighbors(:embedding, "\x05", distance: "hamming").first(5)
+```
+
+### SQLite Extensions
+
+Improve performance with extensions:
+
+- [Vec1](#vec1)
+- [sqlite-vec](#sqlite-vec)
+
+### Vec1
+
+For [Vec1](https://sqlite.org/vec1/doc/trunk/doc/vec1.md), [build the extension](https://sqlite.org/vec1/doc/trunk/doc/vec1.md#2-building-the-extension) and create `config/initializers/neighbor.rb` with:
+
+```ruby
+Neighbor::SQLite.initialize!(extension: "/path/to/vec1.so")
+```
+
+This speeds up `euclidean` and `cosine` distance
+
+You can also use [virtual tables](https://sqlite.org/vec1/doc/trunk/doc/vec1intro.md#1-using-the-virtual-table)
+
+```ruby
+class CreateItems < ActiveRecord::Migration[8.1]
+  def change
+    # Rails 8+
+    create_virtual_table :items, :vec1, ["embedding", "id"]
+
+    # Rails < 8
+    execute "CREATE VIRTUAL TABLE items USING vec1(embedding, id)"
+  end
+end
+```
+
+You can optionally ignore any shadow tables that are created
+
+```ruby
+ActiveRecord::SchemaDumper.ignore_tables += [
+  "items_base", "items_config", "items_idx", "items_meta", "items_model"
+]
+```
+
+Get the `k` nearest neighbors
+
+```ruby
+Item.find_by_sql("SELECT * FROM items(vec1_from_json(?), ?)", [[1, 2, 3].to_json, {k: 5}.to_json])
+```
+
+### sqlite-vec
+
+For [sqlite-vec](https://github.com/asg017/sqlite-vec), add this line to your application’s Gemfile:
+
+```ruby
+gem "sqlite-vec"
+```
+
+And run:
+
+```sh
+rails generate neighbor:sqlite
+```
+
+This speeds up `euclidean`, `cosine`, `taxicab`, and `hamming` distance
 
 You can also use [virtual tables](https://alexgarcia.xyz/sqlite-vec/features/knn.html)
 
 ```ruby
-class
+class CreateItems < ActiveRecord::Migration[8.1]
   def change
     # Rails 8+
     create_virtual_table :items, :vec0, [
@@ -418,32 +493,6 @@ Filter by primary key
 Item.where(id: [2, 3]).where("embedding MATCH ?", [1, 2, 3].to_s).where(k: 5).order(:distance)
 ```
 
-### Int8 Vectors
-
-Use the `type` option for int8 vectors
-
-```ruby
-class Item < ApplicationRecord
-  has_neighbors :embedding, dimensions: 3, type: :int8
-end
-```
-
-### Binary Vectors
-
-Use the `type` option for binary vectors
-
-```ruby
-class Item < ApplicationRecord
-  has_neighbors :embedding, dimensions: 8, type: :bit
-end
-```
-
-Get the nearest neighbors by Hamming distance
-
-```ruby
-Item.nearest_neighbors(:embedding, "\x05", distance: "hamming").first(5)
-```
-
 ## Examples
 
 - [Embeddings](#openai-embeddings) with OpenAI
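The binary-vector examples in this README diff pass `"\x05"` as the query. With `type: :bit`, each byte packs 8 dimensions, so `"\x05"` is the bit pattern `00000101`. The plain-Ruby sketch below shows how two such strings compare. It follows the same arithmetic as the `neighbor_hamming_distance` and `neighbor_jaccard_distance` fallbacks added in `sqlite.rb`. The method names are hypothetical, and `ArgumentError` stands in for the `SQLite3::SQLException` raised there.

```ruby
# Hypothetical helpers mirroring the bit-vector fallbacks in sqlite.rb.
# Bit vectors are packed 8 dimensions per byte, most significant bit first.
def bits(str)
  str.unpack1("B*")
end

# Hamming distance: number of bit positions that differ.
def hamming(a, b)
  raise ArgumentError, "different vector dimensions" if a.bytesize != b.bytesize
  a.each_byte.zip(b.each_byte).sum { |x, y| (x ^ y).to_s(2).count("1") }
end

# Jaccard distance: 1 - |a AND b| / |a OR b|; 1.0 when no bits are shared.
def jaccard(a, b)
  raise ArgumentError, "different vector dimensions" if a.bytesize != b.bytesize
  ab = a.each_byte.zip(b.each_byte).sum { |x, y| (x & y).to_s(2).count("1") }
  aa = bits(a).count("1")
  bb = bits(b).count("1")
  ab == 0 ? 1.0 : 1.0 - (ab / (aa + bb - ab).to_f)
end

bits("\x05")            # => "00000101"
hamming("\x05", "\x03") # => 2
```

`hamming("\x05", "\x03")` is 2 because `0b101 ^ 0b011 = 0b110`. When two vectors share no set bits, the Jaccard fallback returns 1.0, which also covers a pair of all-zero vectors.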
data/lib/neighbor/model.rb
CHANGED
@@ -27,16 +27,8 @@ module Neighbor
           @neighbor_attributes[attribute_name] = {dimensions: dimensions, normalize: normalize, type: type&.to_sym}
         end
 
-
-
-            Neighbor::Attribute.new(cast_type: cast_type, model: self, type: type, attribute_name: name)
-          end
-        else
-          attribute_names.each do |attribute_name|
-            attribute attribute_name do |cast_type|
-              Neighbor::Attribute.new(cast_type: cast_type, model: self, type: type, attribute_name: attribute_name)
-            end
-          end
+        decorate_attributes(attribute_names) do |name, cast_type|
+          Neighbor::Attribute.new(cast_type: cast_type, model: self, type: type, attribute_name: name)
         end
 
         if normalize
@@ -135,8 +127,10 @@ module Neighbor
           neighbor_distance =
             if distance == "cosine" && normalize_required
               "POWER(#{order}, 2) / 2.0"
-            elsif
+            elsif distance == "inner_product"
               "(#{order}) * -1"
+            elsif adapter == :sqlite && order.start_with?("vec1_l2_distance")
+              "sqrt(#{order})"
             else
               order
             end
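The second `model.rb` hunk gives the inner-product branch an explicit `distance == "inner_product"` condition; the old condition is cut off in this view. It also adds a SQLite branch that wraps `vec1_l2_distance` in `sqrt(...)`. That suggests Vec1's function returns a squared Euclidean distance, but this is an inference from the diff, not something it states. The sketch below pulls the branch logic out into a standalone method. The name `neighbor_distance_sql` is hypothetical; in the gem, the expression is built inline.

```ruby
# Hypothetical standalone version of the neighbor_distance branch in model.rb.
# `order` is the SQL expression used for ORDER BY.
def neighbor_distance_sql(adapter, distance, order, normalize_required: false)
  if distance == "cosine" && normalize_required
    # For normalized vectors, cosine distance = (Euclidean distance)^2 / 2
    "POWER(#{order}, 2) / 2.0"
  elsif distance == "inner_product"
    # The ORDER BY expression is the negated inner product, so flip the sign
    "(#{order}) * -1"
  elsif adapter == :sqlite && order.start_with?("vec1_l2_distance")
    "sqrt(#{order})"
  else
    order
  end
end
```

Here is a check of the cosine branch. For unit vectors [1, 0] and [0, 1], the Euclidean distance is sqrt(2), and sqrt(2)^2 / 2 = 1. That equals the cosine distance, 1 - cos(90°).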
data/lib/neighbor/postgresql.rb
CHANGED
@@ -47,7 +47,7 @@ module Neighbor
 
     module ArrayMethods
       def type_cast_array(value, method, ...)
-        if (subtype.is_a?(Neighbor::Type::Vector) || subtype.is_a?(Neighbor::Type::Halfvec)) && method != :deserialize && value.is_a?(::Array) && value.all?
+        if (subtype.is_a?(Neighbor::Type::Vector) || subtype.is_a?(Neighbor::Type::Halfvec)) && method != :deserialize && value.is_a?(::Array) && value.all?(::Numeric)
           super(ArrayWrapper.new(value), method, ...)
         else
           super
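The `postgresql.rb` change makes the `ArrayWrapper` path require `value.all?(::Numeric)`. The removed line is cut off after `value.all?` in this view, so the exact previous check cannot be read here. For reference, Ruby's `all?` with a class argument tests `Numeric === element` for each element, while a bare `all?` tests only truthiness:

```ruby
# Enumerable#all?(pattern) checks `pattern === element` for every element;
# all? without an argument only checks that every element is truthy.
mixed = [1.5, "2", 3]
numeric = [1, 2.5, 3]

mixed.all?                 # => true  ("2" is truthy)
mixed.all?(::Numeric)      # => false ("2" is a String)
numeric.all?(::Numeric)    # => true
[1.0, nil].all?(::Numeric) # => false
```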
data/lib/neighbor/sqlite.rb
CHANGED
@@ -1,27 +1,139 @@
 module Neighbor
   module SQLite
+    class << self
+      attr_reader :extensions
+    end
+
     # note: this is a public API (unlike PostgreSQL and MySQL)
-    def self.initialize!
-
+    def self.initialize!(extension: :sqlite_vec)
+      if extension == :sqlite_vec
+        require "sqlite_vec"
+      elsif !extension.is_a?(String)
+        raise ArgumentError, "Unsupported extension"
+      end
+
+      (@extensions ||= []) << extension
+    end
+
+    def self.initialize_adapter!
+      @extensions ||= []
 
       require_relative "type/sqlite_vector"
       require_relative "type/sqlite_int8_vector"
 
-      require "sqlite_vec"
       require "active_record/connection_adapters/sqlite3_adapter"
-
       ActiveRecord::ConnectionAdapters::SQLite3Adapter.prepend(InstanceMethods)
+    end
+
+    def self.vec1?
+      extensions.any?(String)
+    end
+
+    def self.sqlite_vec?
+      extensions.include?(:sqlite_vec)
+    end
 
-
+    def self.setup_functions(db)
+      db.create_function("neighbor_l2_distance", 2) do |func, a, b, c|
+        func.result =
+          if a.nil? || b.nil?
+            nil
+          else
+            raise SQLite3::SQLException, "different vector dimensions" if a.bytesize != b.bytesize
+            fmt = c == 1 ? "c*" : "f*"
+            a = a.unpack(fmt)
+            b = b.unpack(fmt)
+            Math.sqrt(a.zip(b).sum { |ai, bi| (ai - bi)**2 })
+          end
+      end
+
+      db.create_function("neighbor_max_inner_product", 2) do |func, a, b, c|
+        func.result =
+          if a.nil? || b.nil?
+            nil
+          else
+            raise SQLite3::SQLException, "different vector dimensions" if a.bytesize != b.bytesize
+            fmt = c == 1 ? "c*" : "f*"
+            a = a.unpack(fmt)
+            b = b.unpack(fmt)
+            -a.zip(b).sum { |ai, bi| ai * bi }
+          end
+      end
+
+      db.create_function("neighbor_cosine_distance", 2) do |func, a, b, c|
+        func.result =
+          if a.nil? || b.nil?
+            nil
+          else
+            raise SQLite3::SQLException, "different vector dimensions" if a.bytesize != b.bytesize
+            fmt = c == 1 ? "c*" : "f*"
+            a = a.unpack(fmt)
+            b = b.unpack(fmt)
+            similarity = a.zip(b).sum { |ai, bi| ai * bi }
+            norma = a.sum { |v| v * v }
+            normb = b.sum { |v| v * v }
+            1.0 - (similarity / Math.sqrt(norma * normb)).clamp(-1.0, 1.0)
+          end
+      end
+
+      db.create_function("neighbor_l1_distance", 2) do |func, a, b, c|
+        func.result =
+          if a.nil? || b.nil?
+            nil
+          else
+            raise SQLite3::SQLException, "different vector dimensions" if a.bytesize != b.bytesize
+            fmt = c == 1 ? "c*" : "f*"
+            a = a.unpack(fmt)
+            b = b.unpack(fmt)
+            a.zip(b).sum { |ai, bi| (ai - bi).abs }
+          end
+      end
+
+      db.create_function("neighbor_hamming_distance", 2) do |func, a, b|
+        func.result =
+          if a.nil? || b.nil?
+            nil
+          else
+            raise SQLite3::SQLException, "different vector dimensions" if a.bytesize != b.bytesize
+            # TODO improve
+            a.each_byte.zip(b.each_byte).sum { |ai, bi| (ai ^ bi).to_s(2).count("1") }
+          end
+      end
+
+      db.create_function("neighbor_jaccard_distance", 2) do |func, a, b|
+        func.result =
+          if a.nil? || b.nil?
+            nil
+          else
+            raise SQLite3::SQLException, "different vector dimensions" if a.bytesize != b.bytesize
+            # TODO improve
+            ab = a.each_byte.zip(b.each_byte).sum { |ai, bi| (ai & bi).to_s(2).count("1") }
+            aa = a.unpack1("B*").count("1")
+            bb = b.unpack1("B*").count("1")
+            ab == 0 ? 1.0 : 1.0 - (ab / (aa + bb - ab).to_f)
+          end
+      end
     end
 
     module InstanceMethods
       def configure_connection
         super
         db = @raw_connection
-
-
-
+        SQLite.setup_functions(db)
+        if SQLite.extensions.any?
+          db.enable_load_extension(1)
+          begin
+            SQLite.extensions.each do |extension|
+              if extension == :sqlite_vec
+                SqliteVec.load(db)
+              else
+                db.load_extension(extension)
+              end
+            end
+          ensure
+            db.enable_load_extension(0)
+          end
+        end
       end
     end
   end
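When no extension is loaded, `setup_functions` registers Ruby implementations of each distance through `create_function`. The BLOBs are unpacked as 32-bit floats (`"f*"`). When the extra argument is `1`, they are unpacked as signed 8-bit integers (`"c*"`) instead; `utils.rb` passes that value for int8 columns. The sketch below reproduces the arithmetic outside SQLite so it can be checked in plain Ruby. The `FallbackDistances` module is hypothetical, and `ArgumentError` stands in for `SQLite3::SQLException`.

```ruby
# Hypothetical pure-Ruby mirror of the float/int8 fallback functions in sqlite.rb.
module FallbackDistances
  module_function

  # Unpack two BLOBs the same way the SQL functions do;
  # int8: true corresponds to passing 1 as the third SQL argument.
  def unpack_pair(a, b, int8: false)
    raise ArgumentError, "different vector dimensions" if a.bytesize != b.bytesize
    fmt = int8 ? "c*" : "f*"
    [a.unpack(fmt), b.unpack(fmt)]
  end

  def l2(a, b, int8: false)
    x, y = unpack_pair(a, b, int8: int8)
    Math.sqrt(x.zip(y).sum { |xi, yi| (xi - yi)**2 })
  end

  # Negated so that an ascending ORDER BY puts the largest inner product first.
  def max_inner_product(a, b, int8: false)
    x, y = unpack_pair(a, b, int8: int8)
    -(x.zip(y).sum { |xi, yi| xi * yi })
  end

  def cosine(a, b, int8: false)
    x, y = unpack_pair(a, b, int8: int8)
    dot = x.zip(y).sum { |xi, yi| xi * yi }
    norms = x.sum { |v| v * v } * y.sum { |v| v * v }
    1.0 - (dot / Math.sqrt(norms)).clamp(-1.0, 1.0)
  end

  def l1(a, b, int8: false)
    x, y = unpack_pair(a, b, int8: int8)
    x.zip(y).sum { |xi, yi| (xi - yi).abs }
  end
end

a = [1, 2, 3].pack("f*")
b = [4, 6, 3].pack("f*")
FallbackDistances.l2(a, b)                # => 5.0
FallbackDistances.l1(a, b)                # => 7.0
FallbackDistances.max_inner_product(a, b) # => -25.0
```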
data/lib/neighbor/utils.rb
CHANGED
@@ -76,13 +76,37 @@ module Neighbor
       when :sqlite
         case distance
         when "euclidean"
-
+          if SQLite.vec1?
+            "vec1_l2_distance"
+          elsif SQLite.sqlite_vec?
+            "vec_distance_L2"
+          else
+            "neighbor_l2_distance"
+          end
         when "cosine"
-
+          if SQLite.vec1?
+            "vec1_cos_distance"
+          elsif SQLite.sqlite_vec?
+            "vec_distance_cosine"
+          else
+            "neighbor_cosine_distance"
+          end
         when "taxicab"
-
+          if SQLite.sqlite_vec?
+            "vec_distance_L1"
+          else
+            "neighbor_l1_distance"
+          end
+        when "inner_product"
+          "neighbor_max_inner_product"
         when "hamming"
-
+          if SQLite.sqlite_vec?
+            "vec_distance_hamming"
+          else
+            "neighbor_hamming_distance"
+          end
+        when "jaccard"
+          "neighbor_jaccard_distance"
         end
       when :mariadb
         case column_type
@@ -158,10 +182,15 @@ module Neighbor
     def self.order(adapter, type, operator, quoted_attribute, query)
       case adapter
       when :sqlite
-
-
+        if operator.start_with?("neighbor")
+          if type == :bit
+            "#{operator}(#{quoted_attribute}, #{query})"
+          else
+            "#{operator}(#{quoted_attribute}, #{query}, #{type == :int8 ? 1 : 0})"
+          end
+        elsif type == :int8
           "#{operator}(vec_int8(#{quoted_attribute}), vec_int8(#{query}))"
-
+        elsif type == :bit
           "#{operator}(vec_bit(#{quoted_attribute}), vec_bit(#{query}))"
         else
           "#{operator}(#{quoted_attribute}, #{query})"
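In `utils.rb`, the SQLite function chosen for each distance now depends on which extensions were registered via `Neighbor::SQLite.initialize!`. A String path counts as Vec1 (`vec1?` is `extensions.any?(String)`), and `:sqlite_vec` counts as sqlite-vec. Vec1 is checked first. The sketch below condenses that table into a standalone method named `sqlite_operator`, which is hypothetical. It takes the extension list as an argument, whereas the gem reads `Neighbor::SQLite.extensions`. It also includes the argument list that `order` builds.

```ruby
# Hypothetical condensed version of the :sqlite branches in utils.rb.
# `extensions` plays the role of Neighbor::SQLite.extensions.
def sqlite_operator(distance, extensions)
  vec1 = extensions.any?(String)
  sqlite_vec = extensions.include?(:sqlite_vec)

  case distance
  when "euclidean"
    if vec1 then "vec1_l2_distance"
    elsif sqlite_vec then "vec_distance_L2"
    else "neighbor_l2_distance"
    end
  when "cosine"
    if vec1 then "vec1_cos_distance"
    elsif sqlite_vec then "vec_distance_cosine"
    else "neighbor_cosine_distance"
    end
  when "taxicab"
    sqlite_vec ? "vec_distance_L1" : "neighbor_l1_distance"
  when "inner_product"
    "neighbor_max_inner_product"
  when "hamming"
    sqlite_vec ? "vec_distance_hamming" : "neighbor_hamming_distance"
  when "jaccard"
    "neighbor_jaccard_distance"
  end
end

# For the neighbor_* fallbacks, non-bit types get a third argument
# that is 1 for int8 vectors and 0 otherwise.
def sqlite_order(type, operator, quoted_attribute, query)
  if operator.start_with?("neighbor")
    if type == :bit
      "#{operator}(#{quoted_attribute}, #{query})"
    else
      "#{operator}(#{quoted_attribute}, #{query}, #{type == :int8 ? 1 : 0})"
    end
  elsif type == :int8
    "#{operator}(vec_int8(#{quoted_attribute}), vec_int8(#{query}))"
  elsif type == :bit
    "#{operator}(vec_bit(#{quoted_attribute}), vec_bit(#{query}))"
  else
    "#{operator}(#{quoted_attribute}, #{query})"
  end
end

sqlite_operator("euclidean", [])                   # => "neighbor_l2_distance"
sqlite_operator("euclidean", ["/path/to/vec1.so"]) # => "vec1_l2_distance"
sqlite_operator("taxicab", ["/path/to/vec1.so"])   # => "neighbor_l1_distance"
sqlite_order(:int8, "neighbor_l2_distance", "embedding", "?") # => "neighbor_l2_distance(embedding, ?, 1)"
```

Vec1 only has entries for `euclidean` and `cosine` here, so the other distances fall back to the `neighbor_*` functions unless sqlite-vec is also registered.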
data/lib/neighbor/version.rb
CHANGED
data/lib/neighbor.rb
CHANGED
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: neighbor
 version: !ruby/object:Gem::Version
-  version:
+  version: 1.1.0
 platform: ruby
 authors:
 - Andrew Kane
@@ -15,14 +15,14 @@ dependencies:
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: '7.
+        version: '7.2'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: '7.
+        version: '7.2'
 email: andrew@ankane.org
 executables: []
 extensions: []
@@ -67,14 +67,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '3.
+      version: '3.3'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version:
+rubygems_version: 4.0.6
 specification_version: 4
 summary: Nearest neighbor search for Rails
 test_files: []
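The `metadata` hunks raise the runtime dependency requirement to `>= 7.2` and `required_ruby_version` to `>= 3.3`. This matches the 1.0.0 changelog entry that drops Ruby < 3.3 and Active Record < 7.2. RubyGems' `Gem::Requirement#satisfied_by?` can confirm whether a given version meets such a constraint. A quick sketch with arbitrary example versions:

```ruby
require "rubygems"

# Constraints taken from the 1.1.0 gemspec metadata
activerecord_req = Gem::Requirement.new(">= 7.2")
ruby_req = Gem::Requirement.new(">= 3.3")

activerecord_req.satisfied_by?(Gem::Version.new("7.1.5")) # => false
activerecord_req.satisfied_by?(Gem::Version.new("8.0.2")) # => true
ruby_req.satisfied_by?(Gem::Version.new(RUBY_VERSION))    # true on Ruby 3.3 or newer
```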