neighbor 0.3.0 → 0.3.2

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: a0ab3641b145d2f7b83ab5b9639b63a97243aeaadd82e057ff71d30440eccf86
4
- data.tar.gz: 0042cdebaff064ee3c27658f2a1edf348768bc7022e14fc12a15eabf1f4cb20f
3
+ metadata.gz: 0c8b5d19222742f33f51f2c30f9d03108ebd3ed99908a7e9dd5f4e49caa2e225
4
+ data.tar.gz: c9cfa942f2cdd8b9757c9ecfe5e89d0aced11263f8a559004ee15fa0c8adb3f4
5
5
  SHA512:
6
- metadata.gz: 0c500661e9557575032762af3f54b36259a4b87493cd0ed30c559e9fb1b21c18c70b4c410becd1095aed64bbd86d8aac527cbd498a14717708030905ecc84d6e
7
- data.tar.gz: 75a62b2a13ecb8a34bb6595b71d0aa5686111a390a32ab610c10bcaf8d1e688541b742073658498e20a4724a9f9a1e7ab718f0a2298a7214e5c1cbffdb47c2e2
6
+ metadata.gz: e9e0050031ce7691baa9242b3b6b5aa76afb1fe7c63575129e68b2f5c027143b3c08f68a7babfcf2a9b02f1d9327679f75e9c40b95ac2245ea7c8dd3025d3cdb
7
+ data.tar.gz: a9c505740cba454437617733d4025360848a16ef9a4c9c83fc16d5bc82a3e5521c77e3cba874ef3cf318cf3a1e319567958a6156481f7fd82ef72ebaa87d97eb
data/CHANGELOG.md CHANGED
@@ -1,3 +1,13 @@
1
+ ## 0.3.2 (2023-12-12)
2
+
3
+ - Added deprecation warning for `has_neighbors` without an attribute name
4
+ - Added deprecation warning for `nearest_neighbors` without an attribute name
5
+
6
+ ## 0.3.1 (2023-09-25)
7
+
8
+ - Added support for passing multiple attributes to `has_neighbors`
9
+ - Fixed error with `nearest_neighbors` scope with Ruby 3.2 and Active Record 6.1
10
+
1
11
  ## 0.3.0 (2023-07-24)
2
12
 
3
13
  - Dropped support for Ruby < 3 and Active Record < 6.1
data/README.md CHANGED
@@ -14,7 +14,7 @@ gem "neighbor"
14
14
 
15
15
  ## Choose An Extension
16
16
 
17
- Neighbor supports two extensions: [cube](https://www.postgresql.org/docs/current/cube.html) and [vector](https://github.com/pgvector/pgvector). cube ships with Postgres, while vector supports approximate nearest neighbor search.
17
+ Neighbor supports two extensions: [cube](https://www.postgresql.org/docs/current/cube.html) and [vector](https://github.com/pgvector/pgvector). cube ships with Postgres, while vector supports more dimensions and approximate nearest neighbor search.
18
18
 
19
19
  For cube, run:
20
20
 
@@ -35,7 +35,7 @@ rails db:migrate
35
35
  Create a migration
36
36
 
37
37
  ```ruby
38
- class AddNeighborVectorToItems < ActiveRecord::Migration[7.0]
38
+ class AddEmbeddingToItems < ActiveRecord::Migration[7.1]
39
39
  def change
40
40
  add_column :items, :embedding, :cube
41
41
  # or
@@ -114,21 +114,29 @@ end
114
114
  For vector, add an approximate index to speed up queries. Create a migration with:
115
115
 
116
116
  ```ruby
117
- class AddIndexToItemsNeighborVector < ActiveRecord::Migration[7.0]
117
+ class AddIndexToItemsEmbedding < ActiveRecord::Migration[7.1]
118
118
  def change
119
119
  add_index :items, :embedding, using: :ivfflat, opclass: :vector_l2_ops
120
+ # or with pgvector 0.5.0+
121
+ add_index :items, :embedding, using: :hnsw, opclass: :vector_l2_ops
120
122
  end
121
123
  end
122
124
  ```
123
125
 
124
126
  Use `:vector_cosine_ops` for cosine distance and `:vector_ip_ops` for inner product.
125
127
 
126
- Set the number of probes
128
+ Set the number of probes with IVFFlat
127
129
 
128
130
  ```ruby
129
131
  Item.connection.execute("SET ivfflat.probes = 3")
130
132
  ```
131
133
 
134
+ Or the size of the dynamic candidate list with HNSW
135
+
136
+ ```ruby
137
+ Item.connection.execute("SET hnsw.ef_search = 100")
138
+ ```
139
+
132
140
  ## Examples
133
141
 
134
142
  - [OpenAI Embeddings](#openai-embeddings)
@@ -139,14 +147,14 @@ Item.connection.execute("SET ivfflat.probes = 3")
139
147
  Generate a model
140
148
 
141
149
  ```sh
142
- rails generate model Article content:text embedding:vector{1536}
150
+ rails generate model Document content:text embedding:vector{1536}
143
151
  rails db:migrate
144
152
  ```
145
153
 
146
154
  And add `has_neighbors`
147
155
 
148
156
  ```ruby
149
- class Article < ApplicationRecord
157
+ class Document < ApplicationRecord
150
158
  has_neighbors :embedding
151
159
  end
152
160
  ```
@@ -184,18 +192,18 @@ embeddings = fetch_embeddings(input)
184
192
  Store the embeddings
185
193
 
186
194
  ```ruby
187
- articles = []
195
+ documents = []
188
196
  input.zip(embeddings) do |content, embedding|
189
- articles << {content: content, embedding: embedding}
197
+ documents << {content: content, embedding: embedding}
190
198
  end
191
- Article.insert_all!(articles) # use create! for Active Record < 6
199
+ Document.insert_all!(documents)
192
200
  ```
193
201
 
194
202
  And get similar articles
195
203
 
196
204
  ```ruby
197
- article = Article.first
198
- article.nearest_neighbors(:embedding, distance: "inner_product").first(5).map(&:content)
205
+ document = Document.first
206
+ document.nearest_neighbors(:embedding, distance: "cosine").first(5).map(&:content)
199
207
  ```
200
208
 
201
209
  See the [complete code](examples/openai_embeddings.rb)
@@ -1,7 +1,12 @@
1
1
  module Neighbor
2
2
  module Model
3
- def has_neighbors(attribute_name = :neighbor_vector, dimensions: nil, normalize: nil)
4
- attribute_name = attribute_name.to_sym
3
+ def has_neighbors(*attribute_names, dimensions: nil, normalize: nil)
4
+ if attribute_names.empty?
5
+ warn "[neighbor] has_neighbors without an attribute name is deprecated"
6
+ attribute_names << :neighbor_vector
7
+ else
8
+ attribute_names.map!(&:to_sym)
9
+ end
5
10
 
6
11
  class_eval do
7
12
  @neighbor_attributes ||= {}
@@ -19,15 +24,28 @@ module Neighbor
19
24
  end
20
25
  end
21
26
 
22
- raise Error, "has_neighbors already called for #{attribute_name.inspect}" if neighbor_attributes[attribute_name]
23
- @neighbor_attributes[attribute_name] = {dimensions: dimensions, normalize: normalize}
27
+ attribute_names.each do |attribute_name|
28
+ raise Error, "has_neighbors already called for #{attribute_name.inspect}" if neighbor_attributes[attribute_name]
29
+ @neighbor_attributes[attribute_name] = {dimensions: dimensions, normalize: normalize}
30
+
31
+ attribute attribute_name, Neighbor::Vector.new(dimensions: dimensions, normalize: normalize, model: self, attribute_name: attribute_name)
32
+ end
24
33
 
25
- attribute attribute_name, Neighbor::Vector.new(dimensions: dimensions, normalize: normalize, model: self, attribute_name: attribute_name)
34
+ return if @neighbor_attributes.size != attribute_names.size
26
35
 
27
- return if @neighbor_attributes.size != 1
36
+ scope :nearest_neighbors, ->(attribute_name, vector = nil, options = nil) {
37
+ # cannot use keyword arguments with scope with Ruby 3.2 and Active Record 6.1
38
+ # https://github.com/rails/rails/issues/46934
39
+ if options.nil? && vector.is_a?(Hash)
40
+ options = vector
41
+ vector = nil
42
+ end
43
+ raise ArgumentError, "missing keyword: :distance" unless options.is_a?(Hash) && options.key?(:distance)
44
+ distance = options.delete(:distance)
45
+ raise ArgumentError, "unknown keywords: #{options.keys.map(&:inspect).join(", ")}" if options.any?
28
46
 
29
- scope :nearest_neighbors, ->(attribute_name, vector = nil, distance:) {
30
47
  if vector.nil? && !attribute_name.nil? && attribute_name.respond_to?(:to_a)
48
+ warn "[neighbor] nearest_neighbors without an attribute name is deprecated"
31
49
  vector = attribute_name
32
50
  attribute_name = :neighbor_vector
33
51
  end
@@ -107,7 +125,11 @@ module Neighbor
107
125
  .order(Arel.sql(order))
108
126
  }
109
127
 
110
- def nearest_neighbors(attribute_name = :neighbor_vector, **options)
128
+ def nearest_neighbors(attribute_name = nil, **options)
129
+ if attribute_name.nil?
130
+ warn "[neighbor] nearest_neighbors without an attribute name is deprecated"
131
+ attribute_name = :neighbor_vector
132
+ end
111
133
  attribute_name = attribute_name.to_sym
112
134
  # important! check if neighbor attribute before calling send
113
135
  raise ArgumentError, "Invalid attribute" unless self.class.neighbor_attributes[attribute_name]
@@ -0,0 +1,38 @@
1
+ module Neighbor
2
+ module Type
3
+ class Cube < ActiveRecord::Type::String
4
+ def type
5
+ :cube
6
+ end
7
+
8
+ def cast(value)
9
+ if value.is_a?(Array)
10
+ if value.first.is_a?(Array)
11
+ value.map { |v| cast_point(v) }.join(", ")
12
+ else
13
+ cast_point(value)
14
+ end
15
+ else
16
+ super
17
+ end
18
+ end
19
+
20
+ # TODO uncomment in 0.4.0
21
+ # def deserialize(value)
22
+ # if value.nil?
23
+ # super
24
+ # elsif value.include?("),(")
25
+ # value[1..-1].split("),(").map { |v| v.split(",").map(&:to_f) }
26
+ # else
27
+ # value[1..-1].split(",").map(&:to_f)
28
+ # end
29
+ # end
30
+
31
+ private
32
+
33
+ def cast_point(value)
34
+ "(#{value.map(&:to_f).join(", ")})"
35
+ end
36
+ end
37
+ end
38
+ end
@@ -0,0 +1,14 @@
1
+ module Neighbor
2
+ module Type
3
+ class Vector < ActiveRecord::Type::String
4
+ def type
5
+ :vector
6
+ end
7
+
8
+ # TODO uncomment in 0.4.0
9
+ # def deserialize(value)
10
+ # value[1..-1].split(",").map(&:to_f) unless value.nil?
11
+ # end
12
+ end
13
+ end
14
+ end
@@ -1,3 +1,3 @@
1
1
  module Neighbor
2
- VERSION = "0.3.0"
2
+ VERSION = "0.3.2"
3
3
  end
data/lib/neighbor.rb CHANGED
@@ -10,10 +10,10 @@ module Neighbor
10
10
  module RegisterTypes
11
11
  def initialize_type_map(m = type_map)
12
12
  super
13
- m.register_type "cube", ActiveRecord::ConnectionAdapters::PostgreSQL::OID::SpecializedString.new(:cube)
13
+ m.register_type "cube", Type::Cube.new
14
14
  m.register_type "vector" do |_, _, sql_type|
15
15
  limit = extract_limit(sql_type)
16
- ActiveRecord::ConnectionAdapters::PostgreSQL::OID::SpecializedString.new(:vector, limit: limit)
16
+ Type::Vector.new(limit: limit)
17
17
  end
18
18
  end
19
19
  end
@@ -22,6 +22,8 @@ end
22
22
  ActiveSupport.on_load(:active_record) do
23
23
  require_relative "neighbor/model"
24
24
  require_relative "neighbor/vector"
25
+ require_relative "neighbor/type/cube"
26
+ require_relative "neighbor/type/vector"
25
27
 
26
28
  extend Neighbor::Model
27
29
 
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: neighbor
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.3.0
4
+ version: 0.3.2
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrew Kane
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2023-07-25 00:00:00.000000000 Z
11
+ date: 2023-12-12 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: activerecord
@@ -40,6 +40,8 @@ files:
40
40
  - lib/neighbor.rb
41
41
  - lib/neighbor/model.rb
42
42
  - lib/neighbor/railtie.rb
43
+ - lib/neighbor/type/cube.rb
44
+ - lib/neighbor/type/vector.rb
43
45
  - lib/neighbor/vector.rb
44
46
  - lib/neighbor/version.rb
45
47
  homepage: https://github.com/ankane/neighbor