neighbor 0.4.1 → 0.4.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 8aa6de2790d94de9411b0142836b2ad181a411e299fce4b98357b96ac4161183
- data.tar.gz: 2924d7f15f5b36bc89ee72372c1bfeb373d99481269696a9a9dcc41f90201f38
+ metadata.gz: dfc4af6302c7098ea40f96e9d8a19706aff46a2506cad541ff18ee07fcd11019
+ data.tar.gz: a79b59895ca3b99a7c048eddd20cb3602b2660425ab44463e7021ab763a26f62
  SHA512:
- metadata.gz: 2bc1b3ee6d5b1ee0ab175b017e753cf958bd8ceb1ef2a23ba769770dfebf54eec251ac59c8f5f3b6ca56efcbad1763c34622b94924a017622c2f78fc8740f762
- data.tar.gz: d946dda99833964582f63863b2d898fea6bf065312cf60aec873631df96195e1a54375606ad9c9cc0f767937cdb7ea38b0d9990efcbbeab15ccbb11f8a2020ef
+ metadata.gz: 11081e687de4c79428351095477137f9140bc6c0363d09c54ece8fd5f7bbe2df802d740332f4474357f9e9e57157bd1f1f4dd3671c106d24dc7e01e2f0d84e2a
+ data.tar.gz: f18d787b22df7bbc00c69b1f9f6262e19c0214f6ef28814a5c05d9e8c3dae357f32596994bef60112d3f53dac0c1b51f6c99af6345c7e01b2f26cef1a7b42226
data/CHANGELOG.md CHANGED
@@ -1,3 +1,7 @@
+ ## 0.4.2 (2024-08-27)
+
+ - Fixed error with `nil` values
+
  ## 0.4.1 (2024-08-26)
 
  - Added `precision` option
data/README.md CHANGED
@@ -14,7 +14,7 @@ gem "neighbor"
 
  ## Choose An Extension
 
- Neighbor supports two extensions: [cube](https://www.postgresql.org/docs/current/cube.html) and [vector](https://github.com/pgvector/pgvector). cube ships with Postgres, while vector supports more dimensions and approximate nearest neighbor search.
+ Neighbor supports two extensions: [cube](https://www.postgresql.org/docs/current/cube.html) and [pgvector](https://github.com/pgvector/pgvector). cube ships with Postgres, while pgvector supports more dimensions and approximate nearest neighbor search.
 
  For cube, run:
 
@@ -23,7 +23,7 @@ rails generate neighbor:cube
  rails db:migrate
  ```
 
- For vector, [install pgvector](https://github.com/pgvector/pgvector#installation) and run:
+ For pgvector, [install the extension](https://github.com/pgvector/pgvector#installation) and run:
 
  ```sh
  rails generate neighbor:vector
@@ -70,17 +70,30 @@ Get the nearest neighbors to a vector
  Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean").first(5)
  ```
 
- ## Distance
+ Records returned from `nearest_neighbors` will have a `neighbor_distance` attribute
+
+ ```ruby
+ nearest_item = item.nearest_neighbors(:embedding, distance: "euclidean").first
+ nearest_item.neighbor_distance
+ ```
+
+ See the additional docs for:
+
+ - [cube](#cube)
+ - [pgvector](#pgvector)
+
+ Or check out some [examples](#examples)
+
+ ## cube
+
+ ### Distance
 
  Supported values are:
 
  - `euclidean`
  - `cosine`
  - `taxicab`
- - `chebyshev` (cube only)
- - `inner_product` (vector only)
- - `hamming` (vector only)
- - `jaccard` (vector only)
+ - `chebyshev`
 
  For cosine distance with cube, vectors must be normalized before being stored.
 
@@ -90,18 +103,11 @@ class Item < ApplicationRecord
  end
  ```
 
- For inner product with cube, see [this example](examples/disco_user_recs_cube.rb).
-
- Records returned from `nearest_neighbors` will have a `neighbor_distance` attribute
-
- ```ruby
- nearest_item = item.nearest_neighbors(:embedding, distance: "euclidean").first
- nearest_item.neighbor_distance
- ```
+ For inner product with cube, see [this example](examples/disco/user_recs_cube.rb).
 
- ## Dimensions
+ ### Dimensions
 
- The cube data type can have up to 100 dimensions by default. See the [Postgres docs](https://www.postgresql.org/docs/current/cube.html) for how to increase this. The vector data type can have up to 16,000 dimensions, and vectors with up to 2,000 dimensions can be indexed.
+ The `cube` type can have up to 100 dimensions by default. See the [Postgres docs](https://www.postgresql.org/docs/current/cube.html) for how to increase this.
 
  For cube, it’s a good idea to specify the number of dimensions to ensure all records have the same number.
 
@@ -111,9 +117,26 @@ class Item < ApplicationRecord
  end
  ```
 
- ## Indexing
+ ## pgvector
+
+ ### Distance
+
+ Supported values are:
+
+ - `euclidean`
+ - `inner_product`
+ - `cosine`
+ - `taxicab`
+ - `hamming`
+ - `jaccard`
+
+ ### Dimensions
+
+ The `vector` type can have up to 16,000 dimensions, and vectors with up to 2,000 dimensions can be indexed.
+
+ ### Indexing
 
- For vector, add an approximate index to speed up queries. Create a migration with:
+ Add an approximate index to speed up queries. Create a migration with:
 
  ```ruby
  class AddIndexToItemsEmbedding < ActiveRecord::Migration[7.2]
@@ -139,7 +162,7 @@ Or the number of probes with IVFFlat
  Item.connection.execute("SET ivfflat.probes = 3")
  ```
 
- ## Half-Precision Vectors
+ ### Half-Precision Vectors
 
  Use the `halfvec` type to store half-precision vectors
 
@@ -151,7 +174,7 @@ class AddEmbeddingToItems < ActiveRecord::Migration[7.2]
  end
  ```
 
- ## Half-Precision Indexing
+ ### Half-Precision Indexing
 
  Index vectors at half precision for smaller indexes
 
@@ -169,7 +192,7 @@ Get the nearest neighbors
  Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean", precision: "half").first(5)
  ```
 
- ## Binary Vectors
+ ### Binary Vectors
 
  Use the `bit` type to store binary vectors
 
@@ -187,7 +210,7 @@ Get the nearest neighbors by Hamming distance
  Item.nearest_neighbors(:embedding, "101", distance: "hamming").first(5)
  ```
 
- ## Binary Quantization
+ ### Binary Quantization
 
  Use expression indexing for binary quantization
 
@@ -199,7 +222,7 @@ class AddIndexToItemsEmbedding < ActiveRecord::Migration[7.2]
  end
  ```
 
- ## Sparse Vectors
+ ### Sparse Vectors
 
  Use the `sparsevec` type to store sparse vectors
 
@@ -543,7 +566,7 @@ movie = Movie.find_by(name: "Star Wars (1977)")
  movie.nearest_neighbors(:factors, distance: "cosine").first(5).map(&:name)
  ```
 
- See the complete code for [cube](examples/disco/item_recs_cube.rb) and [vector](examples/disco/item_recs_vector.rb)
+ See the complete code for [cube](examples/disco/item_recs_cube.rb) and [pgvector](examples/disco/item_recs_vector.rb)
 
  ## History
 
@@ -6,7 +6,7 @@ module Neighbor
  end
 
  def serialize(value)
- if value.respond_to?(:to_a)
+ if Utils.array?(value)
  value = value.to_a
  if value.first.is_a?(Array)
  value = value.map { |v| serialize_point(v) }.join(", ")
@@ -20,7 +20,7 @@ module Neighbor
  private
 
  def cast_value(value)
- if value.respond_to?(:to_a)
+ if Utils.array?(value)
  value.to_a
  elsif value.is_a?(Numeric)
  [value]
@@ -6,7 +6,7 @@ module Neighbor
  end
 
  def serialize(value)
- if value.respond_to?(:to_a)
+ if Utils.array?(value)
  value = "[#{value.to_a.map(&:to_f).join(",")}]"
  end
  super(value)
@@ -17,7 +17,7 @@ module Neighbor
  def cast_value(value)
  if value.is_a?(String)
  value[1..-1].split(",").map(&:to_f)
- elsif value.respond_to?(:to_a)
+ elsif Utils.array?(value)
  value.to_a
  else
  raise "can't cast #{value.class.name} to halfvec"
@@ -19,7 +19,7 @@ module Neighbor
  value
  elsif value.is_a?(String)
  SparseVector.from_text(value)
- elsif value.respond_to?(:to_a)
+ elsif Utils.array?(value)
  value = SparseVector.new(value.to_a)
  else
  raise "can't cast #{value.class.name} to sparsevec"
@@ -6,7 +6,7 @@ module Neighbor
  end
 
  def serialize(value)
- if value.respond_to?(:to_a)
+ if Utils.array?(value)
  value = "[#{value.to_a.map(&:to_f).join(",")}]"
  end
  super(value)
@@ -17,7 +17,7 @@ module Neighbor
  def cast_value(value)
  if value.is_a?(String)
  value[1..-1].split(",").map(&:to_f)
- elsif value.respond_to?(:to_a)
+ elsif Utils.array?(value)
  value.to_a
  else
  raise "can't cast #{value.class.name} to vector"
@@ -38,5 +38,9 @@ module Neighbor
  # could also throw error
  norm > 0 ? value.map { |v| v / norm } : value
  end
+
+ def self.array?(value)
+ !value.nil? && value.respond_to?(:to_a)
+ end
  end
  end
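
Note on the `Utils.array?` change above: the changelog only says "Fixed error with `nil` values", so the following is a hedged reading of why the guard helps, not a statement of the exact failure. In Ruby, `nil` responds to `to_a` and converts to an empty array, so a bare `respond_to?(:to_a)` check treats `nil` like an array. The sketch below re-creates the new check in a standalone module (`UtilsSketch` and `old_check` are hypothetical names, not part of the gem) and compares it with the old test.

```ruby
# Standalone re-creation of the check added in 0.4.2; no gem required.
# UtilsSketch is a hypothetical name used only for this illustration.
module UtilsSketch
  def self.array?(value)
    !value.nil? && value.respond_to?(:to_a)
  end
end

# The condition used before 0.4.2, wrapped in a lambda for comparison.
old_check = ->(value) { value.respond_to?(:to_a) }

p old_check.call(nil)            # true: NilClass defines to_a
p nil.to_a                       # []
p UtilsSketch.array?(nil)        # false
p UtilsSketch.array?([1, 2, 3])  # true
```

With the old condition, a `nil` attribute would presumably have entered the array branch of `serialize` or `cast_value` as `[]`; with the new one it skips that branch.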
@@ -1,3 +1,3 @@
  module Neighbor
- VERSION = "0.4.1"
+ VERSION = "0.4.2"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: neighbor
  version: !ruby/object:Gem::Version
- version: 0.4.1
+ version: 0.4.2
  platform: ruby
  authors:
  - Andrew Kane