neighbor 0.4.1 → 0.4.2
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/README.md +48 -25
- data/lib/neighbor/type/cube.rb +2 -2
- data/lib/neighbor/type/halfvec.rb +2 -2
- data/lib/neighbor/type/sparsevec.rb +1 -1
- data/lib/neighbor/type/vector.rb +2 -2
- data/lib/neighbor/utils.rb +4 -0
- data/lib/neighbor/version.rb +1 -1
- metadata +1 -1
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: dfc4af6302c7098ea40f96e9d8a19706aff46a2506cad541ff18ee07fcd11019
+  data.tar.gz: a79b59895ca3b99a7c048eddd20cb3602b2660425ab44463e7021ab763a26f62
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 11081e687de4c79428351095477137f9140bc6c0363d09c54ece8fd5f7bbe2df802d740332f4474357f9e9e57157bd1f1f4dd3671c106d24dc7e01e2f0d84e2a
+  data.tar.gz: f18d787b22df7bbc00c69b1f9f6262e19c0214f6ef28814a5c05d9e8c3dae357f32596994bef60112d3f53dac0c1b51f6c99af6345c7e01b2f26cef1a7b42226
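The digests in `checksums.yaml` cover the `metadata.gz` and `data.tar.gz` members of the `.gem` file, which is a tar archive. Below is a minimal sketch of how to recompute them with Ruby's standard `Digest` library and RubyGems' `Gem::Package::TarReader`. You could use it to check a downloaded gem against the values above. The helper name `gem_checksums` is hypothetical.

```ruby
require "digest"
require "rubygems/package"
require "stringio"

# Hypothetical helper: recompute the SHA256/SHA512 digests that
# checksums.yaml records for the members of a .gem (tar) archive.
def gem_checksums(io)
  sums = { "SHA256" => {}, "SHA512" => {} }
  Gem::Package::TarReader.new(io).each do |entry|
    # Only these two members are listed in checksums.yaml.
    next unless %w[metadata.gz data.tar.gz].include?(entry.full_name)

    data = entry.read || "" # an empty entry may read as nil
    sums["SHA256"][entry.full_name] = Digest::SHA256.hexdigest(data)
    sums["SHA512"][entry.full_name] = Digest::SHA512.hexdigest(data)
  end
  sums
end

# Usage (the file name is an assumption):
# File.open("neighbor-0.4.2.gem", "rb") { |f| pp gem_checksums(f) }
```

To verify a package, compare the returned hex strings with the `+` lines in the hunk above.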
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -14,7 +14,7 @@ gem "neighbor"
 
 ## Choose An Extension
 
-Neighbor supports two extensions: [cube](https://www.postgresql.org/docs/current/cube.html) and [
+Neighbor supports two extensions: [cube](https://www.postgresql.org/docs/current/cube.html) and [pgvector](https://github.com/pgvector/pgvector). cube ships with Postgres, while pgvector supports more dimensions and approximate nearest neighbor search.
 
 For cube, run:
 
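The new README text contrasts cube with pgvector's approximate nearest neighbor search. Exact search, by contrast, ranks every row by distance. The sketch below shows this in memory and is not Neighbor's implementation. The items are hypothetical hashes with an `:embedding` key, and the `neighbor_distance` key only mirrors the attribute name from the README.

```ruby
# Exact (brute-force) nearest neighbor search over in-memory items.
# Items are hypothetical hashes like { id: 1, embedding: [1.0, 1.0, 1.0] }.
def euclidean_distance(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Score every item, sort by distance, and keep the k closest.
def exact_nearest_neighbors(items, query, k)
  items
    .map { |item| item.merge(neighbor_distance: euclidean_distance(item[:embedding], query)) }
    .sort_by { |item| item[:neighbor_distance] }
    .first(k)
end

items = [
  { id: 1, embedding: [1.0, 1.0, 1.0] },
  { id: 2, embedding: [2.0, 2.0, 2.0] },
  { id: 3, embedding: [1.0, 1.0, 2.0] }
]
exact_nearest_neighbors(items, [0.9, 1.3, 1.1], 2).map { |item| item[:id] } # => [1, 3]
```

An approximate index trades some recall for avoiding this full scan.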
@@ -23,7 +23,7 @@ rails generate neighbor:cube
 rails db:migrate
 ```
 
-For
+For pgvector, [install the extension](https://github.com/pgvector/pgvector#installation) and run:
 
 ```sh
 rails generate neighbor:vector
@@ -70,17 +70,30 @@ Get the nearest neighbors to a vector
 Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean").first(5)
 ```
 
-
+Records returned from `nearest_neighbors` will have a `neighbor_distance` attribute
+
+```ruby
+nearest_item = item.nearest_neighbors(:embedding, distance: "euclidean").first
+nearest_item.neighbor_distance
+```
+
+See the additional docs for:
+
+- [cube](#cube)
+- [pgvector](#pgvector)
+
+Or check out some [examples](#examples)
+
+## cube
+
+### Distance
 
 Supported values are:
 
 - `euclidean`
 - `cosine`
 - `taxicab`
-- `chebyshev`
-- `inner_product` (vector only)
-- `hamming` (vector only)
-- `jaccard` (vector only)
+- `chebyshev`
 
 For cosine distance with cube, vectors must be normalized before being stored.
 
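The four cube distances listed above follow the standard formulas. They are sketched here in plain Ruby and are not the cube extension's own code. The last lines suggest one plausible reason for the normalization requirement, though how cosine is actually computed with cube is an assumption here. For unit-length vectors, the squared Euclidean distance equals twice the cosine distance, so the two give the same ranking.

```ruby
# Standard distance formulas for equal-length numeric arrays.
def euclidean(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

def taxicab(a, b)
  a.zip(b).sum { |x, y| (x - y).abs }
end

def chebyshev(a, b)
  a.zip(b).map { |x, y| (x - y).abs }.max
end

# Cosine distance: 1 minus the cosine of the angle between a and b.
def cosine(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  1 - dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
end

# Scale a vector to unit length ("normalized before being stored").
def normalize(v)
  norm = Math.sqrt(v.sum { |x| x * x })
  v.map { |x| x / norm }
end

a = [3.0, 4.0]
b = [4.0, 3.0]
euclidean(a, b) # => 1.4142135623730951
taxicab(a, b)   # => 2.0
chebyshev(a, b) # => 1.0

# For unit vectors: euclidean(na, nb)**2 is approximately 2 * cosine(a, b).
na = normalize(a)
nb = normalize(b)
euclidean(na, nb)**2
```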
@@ -90,18 +103,11 @@ class Item < ApplicationRecord
 end
 ```
 
-For inner product with cube, see [this example](examples/
-
-Records returned from `nearest_neighbors` will have a `neighbor_distance` attribute
-
-```ruby
-nearest_item = item.nearest_neighbors(:embedding, distance: "euclidean").first
-nearest_item.neighbor_distance
-```
+For inner product with cube, see [this example](examples/disco/user_recs_cube.rb).
 
-
+### Dimensions
 
-The cube
+The `cube` type can have up to 100 dimensions by default. See the [Postgres docs](https://www.postgresql.org/docs/current/cube.html) for how to increase this.
 
 For cube, it’s a good idea to specify the number of dimensions to ensure all records have the same number.
 
@@ -111,9 +117,26 @@ class Item < ApplicationRecord
 end
 ```
 
-##
+## pgvector
+
+### Distance
+
+Supported values are:
+
+- `euclidean`
+- `inner_product`
+- `cosine`
+- `taxicab`
+- `hamming`
+- `jaccard`
+
+### Dimensions
+
+The `vector` type can have up to 16,000 dimensions, and vectors with up to 2,000 dimensions can be indexed.
+
+### Indexing
 
-
+Add an approximate index to speed up queries. Create a migration with:
 
 ```ruby
 class AddIndexToItemsEmbedding < ActiveRecord::Migration[7.2]
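The pgvector list adds `inner_product`, `hamming`, and `jaccard`. Below is a plain-Ruby sketch of their standard definitions, not pgvector's implementation. It assumes binary vectors are written as `0`/`1` strings, like the `"101"` in the README's Hamming example. One practical difference: pgvector's `<#>` operator returns the negative inner product so that smaller values sort first, while the sketch returns the plain dot product.

```ruby
# Plain dot product of two equal-length numeric arrays.
def inner_product(a, b)
  a.zip(b).sum { |x, y| x * y }
end

# Number of positions where two equal-length bit strings differ.
def hamming(a, b)
  raise ArgumentError, "length mismatch" unless a.length == b.length

  a.chars.zip(b.chars).count { |x, y| x != y }
end

# 1 - |A & B| / |A | B|, treating each "1" position as a set member.
# Returning 0.0 when neither string has a bit set is this sketch's own choice.
def jaccard(a, b)
  raise ArgumentError, "length mismatch" unless a.length == b.length

  pairs = a.chars.zip(b.chars)
  both = pairs.count { |x, y| x == "1" && y == "1" }
  either = pairs.count { |x, y| x == "1" || y == "1" }
  either.zero? ? 0.0 : 1 - both.fdiv(either)
end

inner_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]) # => 32.0
hamming("101", "111")                           # => 1
jaccard("101", "111")                           # approximately 1/3
```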
@@ -139,7 +162,7 @@ Or the number of probes with IVFFlat
 Item.connection.execute("SET ivfflat.probes = 3")
 ```
 
-
+### Half-Precision Vectors
 
 Use the `halfvec` type to store half-precision vectors
 
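The context line `SET ivfflat.probes = 3` sets how many IVFFlat lists a query scans. pgvector's default is 1. The toy class below is a hypothetical sketch that illustrates the trade-off and is not pgvector's algorithm in full. Its centroids are supplied by hand, whereas IVFFlat chooses them with k-means when the index is built.

```ruby
# Toy inverted-file (IVF) index to illustrate what ivfflat.probes controls.
# Assumption: centroids are given; real IVFFlat trains them with k-means.
class ToyIvf
  def initialize(centroids)
    @centroids = centroids
    @lists = Array.new(centroids.size) { [] }
  end

  # Store each vector in the list of its nearest centroid.
  def add(id, vector)
    @lists[nearest_centroids(vector, 1).first] << [id, vector]
  end

  # Scan only the `probes` lists whose centroids are closest to the query.
  def search(query, k, probes: 1)
    nearest_centroids(query, probes)
      .flat_map { |i| @lists[i] }
      .sort_by { |_, vector| distance(vector, query) }
      .first(k)
      .map(&:first)
  end

  private

  def nearest_centroids(vector, n)
    @centroids.each_index.sort_by { |i| distance(@centroids[i], vector) }.first(n)
  end

  def distance(a, b)
    Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
  end
end

index = ToyIvf.new([[0.0, 0.0], [10.0, 10.0]])
index.add(1, [1.0, 1.0])
index.add(2, [9.0, 9.0])
index.add(3, [4.0, 4.0])

index.search([6.0, 6.0], 1)            # => [2]
index.search([6.0, 6.0], 1, probes: 2) # => [3]
```

With one probe, the query scans only the list of its nearest centroid and misses item 3, the true nearest neighbor. A second probe recovers the exact answer, at the cost of scanning more vectors.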
@@ -151,7 +174,7 @@ class AddEmbeddingToItems < ActiveRecord::Migration[7.2]
 end
 ```
 
-
+### Half-Precision Indexing
 
 Index vectors at half precision for smaller indexes
 
@@ -169,7 +192,7 @@ Get the nearest neighbors
 Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean", precision: "half").first(5)
 ```
 
-
+### Binary Vectors
 
 Use the `bit` type to store binary vectors
 
@@ -187,7 +210,7 @@ Get the nearest neighbors by Hamming distance
 Item.nearest_neighbors(:embedding, "101", distance: "hamming").first(5)
 ```
 
-
+### Binary Quantization
 
 Use expression indexing for binary quantization
 
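Binary quantization keeps one bit per dimension. The sketch below assumes the same rule as pgvector's `binary_quantize` function, where a bit is 1 when the element is positive. It produces bit strings of the form used in the Hamming example in this hunk, and its helper names are hypothetical.

```ruby
# Hypothetical helper: one bit per dimension, "1" when the value is positive.
def binary_quantize(vector)
  vector.map { |x| x > 0 ? "1" : "0" }.join
end

# Count differing bits between two equal-length bit strings.
def hamming_distance(a, b)
  a.chars.zip(b.chars).count { |x, y| x != y }
end

query = binary_quantize([0.9, -1.3, 1.1])  # => "101"
stored = binary_quantize([0.2, 0.4, -0.7]) # => "110"
hamming_distance(query, stored)            # => 2
```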
@@ -199,7 +222,7 @@ class AddIndexToItemsEmbedding < ActiveRecord::Migration[7.2]
 end
 ```
 
-
+### Sparse Vectors
 
 Use the `sparsevec` type to store sparse vectors
 
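pgvector's text format for `sparsevec` has two parts. It lists 1-based `index:value` pairs for the non-zero elements, then the number of dimensions, as in `{1:1,3:2,5:3}/5`. Below is a sketch of a hypothetical helper that builds that text from a dense Ruby array. It is not necessarily how Neighbor's `sparsevec` type serializes values.

```ruby
# Hypothetical helper: dense array -> pgvector sparsevec text, e.g. "{1:1,3:2,5:3}/5".
# Indices are 1-based and only non-zero elements are written.
def to_sparsevec_text(dense)
  pairs = dense.each_with_index
               .reject { |value, _| value.zero? }
               .map { |value, i| "#{i + 1}:#{value}" }
  "{#{pairs.join(",")}}/#{dense.size}"
end

to_sparsevec_text([1, 0, 2, 0, 3]) # => "{1:1,3:2,5:3}/5"
to_sparsevec_text([0.0, 1.5, 0.0]) # => "{2:1.5}/3"
```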
@@ -543,7 +566,7 @@ movie = Movie.find_by(name: "Star Wars (1977)")
 movie.nearest_neighbors(:factors, distance: "cosine").first(5).map(&:name)
 ```
 
-See the complete code for [cube](examples/disco/item_recs_cube.rb) and [
+See the complete code for [cube](examples/disco/item_recs_cube.rb) and [pgvector](examples/disco/item_recs_vector.rb)
 
 ## History
 
data/lib/neighbor/type/cube.rb
CHANGED
@@ -6,7 +6,7 @@ module Neighbor
       end
 
       def serialize(value)
-        if
+        if Utils.array?(value)
           value = value.to_a
           if value.first.is_a?(Array)
             value = value.map { |v| serialize_point(v) }.join(", ")
@@ -20,7 +20,7 @@ module Neighbor
       private
 
       def cast_value(value)
-        if
+        if Utils.array?(value)
           value.to_a
         elsif value.is_a?(Numeric)
           [value]
data/lib/neighbor/type/halfvec.rb
CHANGED
@@ -6,7 +6,7 @@ module Neighbor
       end
 
       def serialize(value)
-        if
+        if Utils.array?(value)
           value = "[#{value.to_a.map(&:to_f).join(",")}]"
         end
         super(value)
@@ -17,7 +17,7 @@ module Neighbor
       def cast_value(value)
         if value.is_a?(String)
           value[1..-1].split(",").map(&:to_f)
-        elsif
+        elsif Utils.array?(value)
           value.to_a
         else
           raise "can't cast #{value.class.name} to halfvec"
data/lib/neighbor/type/vector.rb
CHANGED
@@ -6,7 +6,7 @@ module Neighbor
       end
 
       def serialize(value)
-        if
+        if Utils.array?(value)
           value = "[#{value.to_a.map(&:to_f).join(",")}]"
         end
         super(value)
@@ -17,7 +17,7 @@ module Neighbor
       def cast_value(value)
         if value.is_a?(String)
           value[1..-1].split(",").map(&:to_f)
-        elsif
+        elsif Utils.array?(value)
           value.to_a
         else
           raise "can't cast #{value.class.name} to vector"
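In 0.4.2 the `cube`, `halfvec`, and `vector` types branch on `Utils.array?(value)`, but the contents of `utils.rb` are not shown on this page. The sketch below re-creates the `vector.rb` serialize/cast logic from the hunks above in plain Ruby. It uses a hypothetical `array_like?` as a stand-in for `Utils.array?`. Treating `Numo::NArray` as array-like is an assumption that this diff does not confirm.

```ruby
# Hypothetical stand-in for Utils.array?: real Arrays, plus Numo::NArray
# instances if the numo-narray gem happens to be loaded (an assumption).
def array_like?(value)
  value.is_a?(Array) || (defined?(Numo::NArray) && value.is_a?(Numo::NArray)) || false
end

# Mirrors the serialize branch in vector.rb: arrays become "[1.0,2.0,3.0]";
# anything else is passed through unchanged (the gem hands it to super).
def serialize_vector(value)
  array_like?(value) ? "[#{value.to_a.map(&:to_f).join(",")}]" : value
end

# Mirrors the cast branch: parse the text form, pass array-like values through.
# String#to_f ignores the trailing "]" left on the last element.
def cast_vector(value)
  if value.is_a?(String)
    value[1..-1].split(",").map(&:to_f)
  elsif array_like?(value)
    value.to_a
  else
    raise "can't cast #{value.class.name} to vector"
  end
end

text = serialize_vector([1, 2, 3]) # => "[1.0,2.0,3.0]"
cast_vector(text)                  # => [1.0, 2.0, 3.0]
```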
data/lib/neighbor/utils.rb
CHANGED
data/lib/neighbor/version.rb
CHANGED