neighbor 0.4.1 → 0.4.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 8aa6de2790d94de9411b0142836b2ad181a411e299fce4b98357b96ac4161183
- data.tar.gz: 2924d7f15f5b36bc89ee72372c1bfeb373d99481269696a9a9dcc41f90201f38
+ metadata.gz: dfc4af6302c7098ea40f96e9d8a19706aff46a2506cad541ff18ee07fcd11019
+ data.tar.gz: a79b59895ca3b99a7c048eddd20cb3602b2660425ab44463e7021ab763a26f62
  SHA512:
- metadata.gz: 2bc1b3ee6d5b1ee0ab175b017e753cf958bd8ceb1ef2a23ba769770dfebf54eec251ac59c8f5f3b6ca56efcbad1763c34622b94924a017622c2f78fc8740f762
- data.tar.gz: d946dda99833964582f63863b2d898fea6bf065312cf60aec873631df96195e1a54375606ad9c9cc0f767937cdb7ea38b0d9990efcbbeab15ccbb11f8a2020ef
+ metadata.gz: 11081e687de4c79428351095477137f9140bc6c0363d09c54ece8fd5f7bbe2df802d740332f4474357f9e9e57157bd1f1f4dd3671c106d24dc7e01e2f0d84e2a
+ data.tar.gz: f18d787b22df7bbc00c69b1f9f6262e19c0214f6ef28814a5c05d9e8c3dae357f32596994bef60112d3f53dac0c1b51f6c99af6345c7e01b2f26cef1a7b42226
data/CHANGELOG.md CHANGED
@@ -1,3 +1,7 @@
+ ## 0.4.2 (2024-08-27)
+
+ - Fixed error with `nil` values
+
  ## 0.4.1 (2024-08-26)

  - Added `precision` option
data/README.md CHANGED
@@ -14,7 +14,7 @@ gem "neighbor"

  ## Choose An Extension

- Neighbor supports two extensions: [cube](https://www.postgresql.org/docs/current/cube.html) and [vector](https://github.com/pgvector/pgvector). cube ships with Postgres, while vector supports more dimensions and approximate nearest neighbor search.
+ Neighbor supports two extensions: [cube](https://www.postgresql.org/docs/current/cube.html) and [pgvector](https://github.com/pgvector/pgvector). cube ships with Postgres, while pgvector supports more dimensions and approximate nearest neighbor search.

  For cube, run:

@@ -23,7 +23,7 @@ rails generate neighbor:cube
  rails db:migrate
  ```

- For vector, [install pgvector](https://github.com/pgvector/pgvector#installation) and run:
+ For pgvector, [install the extension](https://github.com/pgvector/pgvector#installation) and run:

  ```sh
  rails generate neighbor:vector
@@ -70,17 +70,30 @@ Get the nearest neighbors to a vector
  Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean").first(5)
  ```

- ## Distance
+ Records returned from `nearest_neighbors` will have a `neighbor_distance` attribute
+
+ ```ruby
+ nearest_item = item.nearest_neighbors(:embedding, distance: "euclidean").first
+ nearest_item.neighbor_distance
+ ```
+
+ See the additional docs for:
+
+ - [cube](#cube)
+ - [pgvector](#pgvector)
+
+ Or check out some [examples](#examples)
+
+ ## cube
+
+ ### Distance

  Supported values are:

  - `euclidean`
  - `cosine`
  - `taxicab`
- - `chebyshev` (cube only)
- - `inner_product` (vector only)
- - `hamming` (vector only)
- - `jaccard` (vector only)
+ - `chebyshev`

  For cosine distance with cube, vectors must be normalized before being stored.

@@ -90,18 +103,11 @@ class Item < ApplicationRecord
  end
  ```

- For inner product with cube, see [this example](examples/disco_user_recs_cube.rb).
-
- Records returned from `nearest_neighbors` will have a `neighbor_distance` attribute
-
- ```ruby
- nearest_item = item.nearest_neighbors(:embedding, distance: "euclidean").first
- nearest_item.neighbor_distance
- ```
+ For inner product with cube, see [this example](examples/disco/user_recs_cube.rb).

- ## Dimensions
+ ### Dimensions

- The cube data type can have up to 100 dimensions by default. See the [Postgres docs](https://www.postgresql.org/docs/current/cube.html) for how to increase this. The vector data type can have up to 16,000 dimensions, and vectors with up to 2,000 dimensions can be indexed.
+ The `cube` type can have up to 100 dimensions by default. See the [Postgres docs](https://www.postgresql.org/docs/current/cube.html) for how to increase this.

  For cube, it’s a good idea to specify the number of dimensions to ensure all records have the same number.

@@ -111,9 +117,26 @@ class Item < ApplicationRecord
  end
  ```

- ## Indexing
+ ## pgvector
+
+ ### Distance
+
+ Supported values are:
+
+ - `euclidean`
+ - `inner_product`
+ - `cosine`
+ - `taxicab`
+ - `hamming`
+ - `jaccard`
+
+ ### Dimensions
+
+ The `vector` type can have up to 16,000 dimensions, and vectors with up to 2,000 dimensions can be indexed.
+
+ ### Indexing

- For vector, add an approximate index to speed up queries. Create a migration with:
+ Add an approximate index to speed up queries. Create a migration with:

  ```ruby
  class AddIndexToItemsEmbedding < ActiveRecord::Migration[7.2]
@@ -139,7 +162,7 @@ Or the number of probes with IVFFlat
  Item.connection.execute("SET ivfflat.probes = 3")
  ```

- ## Half-Precision Vectors
+ ### Half-Precision Vectors

  Use the `halfvec` type to store half-precision vectors

@@ -151,7 +174,7 @@ class AddEmbeddingToItems < ActiveRecord::Migration[7.2]
  end
  ```

- ## Half-Precision Indexing
+ ### Half-Precision Indexing

  Index vectors at half precision for smaller indexes

@@ -169,7 +192,7 @@ Get the nearest neighbors
  Item.nearest_neighbors(:embedding, [0.9, 1.3, 1.1], distance: "euclidean", precision: "half").first(5)
  ```

- ## Binary Vectors
+ ### Binary Vectors

  Use the `bit` type to store binary vectors

@@ -187,7 +210,7 @@ Get the nearest neighbors by Hamming distance
  Item.nearest_neighbors(:embedding, "101", distance: "hamming").first(5)
  ```

- ## Binary Quantization
+ ### Binary Quantization

  Use expression indexing for binary quantization

@@ -199,7 +222,7 @@ class AddIndexToItemsEmbedding < ActiveRecord::Migration[7.2]
  end
  ```

- ## Sparse Vectors
+ ### Sparse Vectors

  Use the `sparsevec` type to store sparse vectors

@@ -543,7 +566,7 @@ movie = Movie.find_by(name: "Star Wars (1977)")
  movie.nearest_neighbors(:factors, distance: "cosine").first(5).map(&:name)
  ```

- See the complete code for [cube](examples/disco/item_recs_cube.rb) and [vector](examples/disco/item_recs_vector.rb)
+ See the complete code for [cube](examples/disco/item_recs_cube.rb) and [pgvector](examples/disco/item_recs_vector.rb)

  ## History

@@ -6,7 +6,7 @@ module Neighbor
  end

  def serialize(value)
- if value.respond_to?(:to_a)
+ if Utils.array?(value)
  value = value.to_a
  if value.first.is_a?(Array)
  value = value.map { |v| serialize_point(v) }.join(", ")
@@ -20,7 +20,7 @@ module Neighbor
  private

  def cast_value(value)
- if value.respond_to?(:to_a)
+ if Utils.array?(value)
  value.to_a
  elsif value.is_a?(Numeric)
  [value]
@@ -6,7 +6,7 @@ module Neighbor
  end

  def serialize(value)
- if value.respond_to?(:to_a)
+ if Utils.array?(value)
  value = "[#{value.to_a.map(&:to_f).join(",")}]"
  end
  super(value)
@@ -17,7 +17,7 @@ module Neighbor
  def cast_value(value)
  if value.is_a?(String)
  value[1..-1].split(",").map(&:to_f)
- elsif value.respond_to?(:to_a)
+ elsif Utils.array?(value)
  value.to_a
  else
  raise "can't cast #{value.class.name} to halfvec"
@@ -19,7 +19,7 @@ module Neighbor
  value
  elsif value.is_a?(String)
  SparseVector.from_text(value)
- elsif value.respond_to?(:to_a)
+ elsif Utils.array?(value)
  value = SparseVector.new(value.to_a)
  else
  raise "can't cast #{value.class.name} to sparsevec"
@@ -6,7 +6,7 @@ module Neighbor
  end

  def serialize(value)
- if value.respond_to?(:to_a)
+ if Utils.array?(value)
  value = "[#{value.to_a.map(&:to_f).join(",")}]"
  end
  super(value)
@@ -17,7 +17,7 @@ module Neighbor
  def cast_value(value)
  if value.is_a?(String)
  value[1..-1].split(",").map(&:to_f)
- elsif value.respond_to?(:to_a)
+ elsif Utils.array?(value)
  value.to_a
  else
  raise "can't cast #{value.class.name} to vector"
@@ -38,5 +38,9 @@ module Neighbor
  # could also throw error
  norm > 0 ? value.map { |v| v / norm } : value
  end
+
+ def self.array?(value)
+ !value.nil? && value.respond_to?(:to_a)
+ end
  end
  end
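
Note on the `Utils.array?` hunks above: in Ruby, `nil` responds to `to_a` and `nil.to_a` returns an empty array, so the previous `value.respond_to?(:to_a)` check treated `nil` like an array. The sketch below is a standalone illustration of that behavior, not code from the gem. `NilGuardSketch`, `old_array?`, and `vector_text` are hypothetical names, and `vector_text` imitates only the string-building step of the vector `serialize` method shown in the diff (the real method goes on to call `super(value)`).

```ruby
# Standalone sketch; NilGuardSketch and its method names are assumptions
# made for illustration and do not exist in the neighbor gem.
module NilGuardSketch
  # The check used before this release.
  def self.old_array?(value)
    value.respond_to?(:to_a)
  end

  # The check added in this release, copied from the Utils hunk.
  def self.array?(value)
    !value.nil? && value.respond_to?(:to_a)
  end

  # Imitates only the string-building step of the vector serializer.
  # Values that fail the check are passed through unchanged.
  def self.vector_text(value, array_check)
    return value unless array_check.call(value)

    "[#{value.to_a.map(&:to_f).join(",")}]"
  end
end

old_check = NilGuardSketch.method(:old_array?)
new_check = NilGuardSketch.method(:array?)

p NilGuardSketch.old_array?(nil)                 # true: NilClass defines to_a
p NilGuardSketch.array?(nil)                     # false
p NilGuardSketch.vector_text(nil, old_check)     # "[]"
p NilGuardSketch.vector_text(nil, new_check)     # nil
p NilGuardSketch.vector_text([1, 2], new_check)  # "[1.0,2.0]"
```

Under the old check a `nil` value would have been turned into the text `[]` instead of being left as `nil`, which is plausibly the "error with `nil` values" mentioned in the changelog; the changelog itself does not give more detail.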
@@ -1,3 +1,3 @@
  module Neighbor
- VERSION = "0.4.1"
+ VERSION = "0.4.2"
  end
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: neighbor
  version: !ruby/object:Gem::Version
- version: 0.4.1
+ version: 0.4.2
  platform: ruby
  authors:
  - Andrew Kane