neighbor 0.4.1 → 0.4.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/README.md +48 -25
- data/lib/neighbor/type/cube.rb +2 -2
- data/lib/neighbor/type/halfvec.rb +2 -2
- data/lib/neighbor/type/sparsevec.rb +1 -1
- data/lib/neighbor/type/vector.rb +2 -2
- data/lib/neighbor/utils.rb +4 -0
- data/lib/neighbor/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-metadata.gz:
-data.tar.gz:
+metadata.gz: dfc4af6302c7098ea40f96e9d8a19706aff46a2506cad541ff18ee07fcd11019
+data.tar.gz: a79b59895ca3b99a7c048eddd20cb3602b2660425ab44463e7021ab763a26f62
 SHA512:
-metadata.gz:
-data.tar.gz:
+metadata.gz: 11081e687de4c79428351095477137f9140bc6c0363d09c54ece8fd5f7bbe2df802d740332f4474357f9e9e57157bd1f1f4dd3671c106d24dc7e01e2f0d84e2a
+data.tar.gz: f18d787b22df7bbc00c69b1f9f6262e19c0214f6ef28814a5c05d9e8c3dae357f32596994bef60112d3f53dac0c1b51f6c99af6345c7e01b2f26cef1a7b42226
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -14,7 +14,7 @@ gem "neighbor"
 
 ## Choose An Extension
 
-Neighbor supports two extensions: [cube](https://www.postgresql.org/docs/current/cube.html) and [
+Neighbor supports two extensions: [cube](https://www.postgresql.org/docs/current/cube.html) and [pgvector](https://github.com/pgvector/pgvector). cube ships with Postgres, while pgvector supports more dimensions and approximate nearest neighbor search.
 
 For cube, run:
 
@@ -23,7 +23,7 @@ rails generate neighbor:cube
 rails db:migrate
 ```
 
-For
+For pgvector, [install the extension](https://github.com/pgvector/pgvector#installation) and run:
 
 ```sh
 rails generate neighbor:vector
@@ -70,17 +70,30 @@ Get the nearest neighbors to a vector
 Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean").first(5)
 ```
 
-
+Records returned from `nearest_neighbors` will have a `neighbor_distance` attribute
+
+```ruby
+nearest_item = item.nearest_neighbors(:embedding, distance: "euclidean").first
+nearest_item.neighbor_distance
+```
+
+See the additional docs for:
+
+- [cube](#cube)
+- [pgvector](#pgvector)
+
+Or check out some [examples](#examples)
+
+## cube
+
+### Distance
 
 Supported values are:
 
 - `euclidean`
 - `cosine`
 - `taxicab`
-- `chebyshev`
-- `inner_product` (vector only)
-- `hamming` (vector only)
-- `jaccard` (vector only)
+- `chebyshev`
 
 For cosine distance with cube, vectors must be normalized before being stored.
 
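The cube distance names above refer to standard metrics. As a rough illustration only (plain Ruby, not how Neighbor or Postgres actually compute them), the sketch below defines each metric and a normalization step. It also shows why normalization matters for cosine with cube: for unit vectors, squared Euclidean distance equals twice the cosine distance, so the two orderings agree. All method names here are hypothetical.

```ruby
# Illustrative plain-Ruby versions of the cube distance metrics.
# These helpers are hypothetical and are not part of Neighbor's API.

def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

def taxicab(a, b)
  a.zip(b).sum { |x, y| (x - y).abs }
end

def chebyshev(a, b)
  a.zip(b).map { |x, y| (x - y).abs }.max
end

def norm(v)
  Math.sqrt(v.sum { |x| x * x })
end

# Cosine distance = 1 - cosine similarity.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  1.0 - dot / (norm(a) * norm(b))
end

# Scale a vector to unit length before storing it, as required
# for cosine distance with cube.
def normalize(v)
  n = norm(v)
  raise ArgumentError, "cannot normalize a zero vector" if n.zero?
  v.map { |x| x / n }
end

a = normalize([3.0, 4.0]) # => [0.6, 0.8]
b = normalize([1.0, 0.0]) # => [1.0, 0.0]
# For unit vectors, euclidean(a, b)**2 equals 2 * cosine_distance(a, b)
# up to floating-point rounding.
```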
@@ -90,18 +103,11 @@ class Item < ApplicationRecord
 end
 ```
 
-For inner product with cube, see [this example](examples/
-
-Records returned from `nearest_neighbors` will have a `neighbor_distance` attribute
-
-```ruby
-nearest_item = item.nearest_neighbors(:embedding, distance: "euclidean").first
-nearest_item.neighbor_distance
-```
+For inner product with cube, see [this example](examples/disco/user_recs_cube.rb).
 
-
+### Dimensions
 
-The cube
+The `cube` type can have up to 100 dimensions by default. See the [Postgres docs](https://www.postgresql.org/docs/current/cube.html) for how to increase this.
 
 For cube, it’s a good idea to specify the number of dimensions to ensure all records have the same number.
 
@@ -111,9 +117,26 @@ class Item < ApplicationRecord
 end
 ```
 
-##
+## pgvector
+
+### Distance
+
+Supported values are:
+
+- `euclidean`
+- `inner_product`
+- `cosine`
+- `taxicab`
+- `hamming`
+- `jaccard`
+
+### Dimensions
+
+The `vector` type can have up to 16,000 dimensions, and vectors with up to 2,000 dimensions can be indexed.
+
+### Indexing
 
-
+Add an approximate index to speed up queries. Create a migration with:
 
 ```ruby
 class AddIndexToItemsEmbedding < ActiveRecord::Migration[7.2]
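With pgvector, `hamming` and `jaccard` apply to binary (`bit`) vectors, which the README queries with strings such as `"101"`. A plain-Ruby sketch of what the two metrics measure, assuming equal-length strings of `0`s and `1`s; the helper names are hypothetical and the real computation happens inside Postgres.

```ruby
# Hypothetical helpers illustrating the binary-vector distances.

# Hamming distance: number of positions where the bits differ.
def hamming(a, b)
  raise ArgumentError, "length mismatch" unless a.length == b.length
  a.chars.zip(b.chars).count { |x, y| x != y }
end

# Jaccard distance: 1 - |A ∩ B| / |A ∪ B|, treating each "1" as set membership.
def jaccard(a, b)
  raise ArgumentError, "length mismatch" unless a.length == b.length
  pairs = a.chars.zip(b.chars)
  intersection = pairs.count { |x, y| x == "1" && y == "1" }
  union = pairs.count { |x, y| x == "1" || y == "1" }
  # Returning 0.0 for two all-zero vectors is a convention chosen for this sketch.
  union.zero? ? 0.0 : 1.0 - intersection.fdiv(union)
end

hamming("101", "100") # => 1
jaccard("101", "100") # => 0.5
```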
@@ -139,7 +162,7 @@ Or the number of probes with IVFFlat
 Item.connection.execute("SET ivfflat.probes = 3")
 ```
 
-
+### Half-Precision Vectors
 
 Use the `halfvec` type to store half-precision vectors
 
@@ -151,7 +174,7 @@ class AddEmbeddingToItems < ActiveRecord::Migration[7.2]
 end
 ```
 
-
+### Half-Precision Indexing
 
 Index vectors at half precision for smaller indexes
 
@@ -169,7 +192,7 @@ Get the nearest neighbors
 Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean", precision: "half").first(5)
 ```
 
-
+### Binary Vectors
 
 Use the `bit` type to store binary vectors
 
@@ -187,7 +210,7 @@ Get the nearest neighbors by Hamming distance
 Item.nearest_neighbors(:embedding, "101", distance: "hamming").first(5)
 ```
 
-
+### Binary Quantization
 
 Use expression indexing for binary quantization
 
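Binary quantization reduces each dimension of a float vector to a single bit. A rough client-side sketch, assuming the convention that positive components map to `1` and everything else to `0` (our understanding of pgvector's `binary_quantize`; confirm against the pgvector docs). The output has the same shape as the `"101"` Hamming query string above. `quantize_to_bits` is a hypothetical helper, not part of Neighbor.

```ruby
# Hypothetical helper: reduce a float vector to a bit string,
# setting a bit to 1 when the component is positive, else 0.
def quantize_to_bits(vector)
  vector.map { |x| x > 0 ? "1" : "0" }.join
end

quantize_to_bits([0.9, -1.3, 1.1]) # => "101"
```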
@@ -199,7 +222,7 @@ class AddIndexToItemsEmbedding < ActiveRecord::Migration[7.2]
 end
 ```
 
-
+### Sparse Vectors
 
 Use the `sparsevec` type to store sparse vectors
 
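pgvector's text format for `sparsevec` lists only the non-zero elements as 1-based `index:value` pairs followed by the total number of dimensions, for example `{1:1,3:2,5:3}/5`. A minimal sketch that builds that literal from a dense Ruby array; `to_sparsevec_literal` is hypothetical and is not Neighbor's own sparse vector API.

```ruby
# Hypothetical helper: convert a dense array into pgvector's sparsevec
# text format, "{index:value,...}/dimensions", with 1-based indices.
def to_sparsevec_literal(dense)
  pairs = dense.each_with_index
               .reject { |value, _i| value.zero? }
               .map { |value, i| "#{i + 1}:#{value}" }
  "{#{pairs.join(",")}}/#{dense.length}"
end

to_sparsevec_literal([1, 0, 2, 0, 3]) # => "{1:1,3:2,5:3}/5"
```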
@@ -543,7 +566,7 @@ movie = Movie.find_by(name: "Star Wars (1977)")
 movie.nearest_neighbors(:factors, distance: "cosine").first(5).map(&:name)
 ```
 
-See the complete code for [cube](examples/disco/item_recs_cube.rb) and [
+See the complete code for [cube](examples/disco/item_recs_cube.rb) and [pgvector](examples/disco/item_recs_vector.rb)
 
 ## History
 
data/lib/neighbor/type/cube.rb
CHANGED
@@ -6,7 +6,7 @@ module Neighbor
 end
 
 def serialize(value)
-if
+if Utils.array?(value)
 value = value.to_a
 if value.first.is_a?(Array)
 value = value.map { |v| serialize_point(v) }.join(", ")
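The cube `serialize` method above turns an array, or an array of arrays, into cube's text input, joining multiple points with `", "`. The body of `serialize_point` is not part of this diff, so the sketch below is a hypothetical stand-in that assumes each point is rendered in Postgres cube's `(x1, x2, ...)` syntax; in that syntax, two points describe the opposite corners of a box. The single-point `else` branch is also an assumption.

```ruby
# Hypothetical stand-in for serialize_point (its real body is not in this diff):
# render one point in Postgres cube syntax, e.g. "(1.0, 2.0)".
def serialize_point(point)
  "(#{point.map(&:to_f).join(", ")})"
end

# Mirrors the branch shown in the hunk: nested arrays become several
# points joined by ", ". The single-point else branch is assumed.
def cube_literal(value)
  value = value.to_a
  if value.first.is_a?(Array)
    value.map { |v| serialize_point(v) }.join(", ")
  else
    serialize_point(value)
  end
end

cube_literal([1, 2, 3])        # => "(1.0, 2.0, 3.0)"
cube_literal([[1, 2], [3, 4]]) # => "(1.0, 2.0), (3.0, 4.0)"
```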
@@ -20,7 +20,7 @@ module Neighbor
 private
 
 def cast_value(value)
-if
+if Utils.array?(value)
 value.to_a
 elsif value.is_a?(Numeric)
 [value]
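Both `serialize` and `cast_value` now delegate the "is this array-like?" check to `Utils.array?`, whose body is not shown here (`utils.rb` appears only as +4 -0). A hypothetical duck-typed version follows, assuming the intent is to accept any non-`nil` object that responds to `to_a`. `nil` is excluded explicitly because `nil.to_a` returns `[]`. The cast method follows the two branches visible in the hunk; its final `else` is a placeholder.

```ruby
# Hypothetical sketch only: the real Utils.array? body is not shown in the diff.
module SketchUtils
  # Treat any non-nil object that can convert itself with #to_a as array-like.
  def self.array?(value)
    !value.nil? && value.respond_to?(:to_a)
  end
end

# Follows the branches visible in the cube.rb hunk; the else branch is assumed.
def cast_cube_value(value)
  if SketchUtils.array?(value)
    value.to_a
  elsif value.is_a?(Numeric)
    [value]
  else
    raise ArgumentError, "can't cast #{value.class.name} to cube"
  end
end

cast_cube_value([1, 2, 3]) # => [1, 2, 3]
cast_cube_value(1..3)      # => [1, 2, 3] (a Range responds to to_a)
cast_cube_value(5)         # => [5]
```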
data/lib/neighbor/type/halfvec.rb
CHANGED
@@ -6,7 +6,7 @@ module Neighbor
 end
 
 def serialize(value)
-if
+if Utils.array?(value)
 value = "[#{value.to_a.map(&:to_f).join(",")}]"
 end
 super(value)
@@ -17,7 +17,7 @@ module Neighbor
 def cast_value(value)
 if value.is_a?(String)
 value[1..-1].split(",").map(&:to_f)
-elsif
+elsif Utils.array?(value)
 value.to_a
 else
 raise "can't cast #{value.class.name} to halfvec"
data/lib/neighbor/type/vector.rb
CHANGED
@@ -6,7 +6,7 @@ module Neighbor
 end
 
 def serialize(value)
-if
+if Utils.array?(value)
 value = "[#{value.to_a.map(&:to_f).join(",")}]"
 end
 super(value)
@@ -17,7 +17,7 @@ module Neighbor
 def cast_value(value)
 if value.is_a?(String)
 value[1..-1].split(",").map(&:to_f)
-elsif
+elsif Utils.array?(value)
 value.to_a
 else
 raise "can't cast #{value.class.name} to vector"
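The `vector` type serializes arrays as `[x,y,z]` and parses strings back by dropping the leading `[` and splitting on commas. The trailing `]` is harmless because `String#to_f` stops at the first non-numeric character. A standalone sketch of that round trip, copying the two expressions from the hunks into methods with hypothetical names (in Neighbor they live inside the type class):

```ruby
# Copies the expressions from the vector.rb hunks into standalone methods.
# The method names are hypothetical.
def serialize_vector(value)
  "[#{value.to_a.map(&:to_f).join(",")}]"
end

def parse_vector(value)
  # "[1.0,2.5,3.0]"[1..-1] is "1.0,2.5,3.0]", and "3.0]".to_f is 3.0
  value[1..-1].split(",").map(&:to_f)
end

text = serialize_vector([1, 2.5, 3]) # => "[1.0,2.5,3.0]"
parse_vector(text)                   # => [1.0, 2.5, 3.0]
```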
data/lib/neighbor/utils.rb
CHANGED
data/lib/neighbor/version.rb
CHANGED