neighbor 0.3.1 → 0.4.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 5818fdfa27cc4b0678fb125e82d2fa0a3066ea173674ca541987b9084a94ac14
-   data.tar.gz: e874541532172dce9932a98506bd5ce7866a0f5c3e600ffd9d6473f226829368
+   metadata.gz: '09edc5a7eebbf6b14f06cb51340c5def49117a318340b4d2265321a8ce6a0bec'
+   data.tar.gz: fc8c8319cf715612f195836c84861eb327765355a0430f2d58fb5ab57857844e
  SHA512:
-   metadata.gz: 96211da20caf70383018ca626cbd83bbda5e4ffaf95e5a9b5405686b192643916325f89a652b4ba9e34e17daa1b84e8045cc644b606bab8aa6ea339ed4d7ea0d
-   data.tar.gz: bf855dc34617d489618417b9c807969f828358de19462458b7c09e1743102a8b9967b32b09660fab35f35543b77d588297c69971edb7442af0995cdca3b6ff8e
+   metadata.gz: caa86d17e8a3f710988486264434767c33f8b197f9a8721d6dc762235a0bc959d5c186670f7518b9d628a771454861df1beb603a175ec804aa67cf6eb9e14361
+   data.tar.gz: 3ac9d60c57cc3e82b617820f205282b42684517070de22af5d94878959ef00e3758fb88f821ba3d3f2369602919a41d1706314f77436c8b3e5ef95acc38e3c17
data/CHANGELOG.md CHANGED
@@ -1,3 +1,21 @@
+ ## 0.4.0 (2024-06-25)
+
+ - Added support for `halfvec` and `sparsevec` types
+ - Added support for `taxicab`, `hamming`, and `jaccard` distances with `vector` extension
+ - Added deserialization for `cube` and `vector` columns without `has_neighbor`
+ - Added support for composite primary keys
+ - Changed `nearest_neighbors` to replace previous `order` scopes
+ - Changed `normalize` option to use `before_save` callback
+ - Changed dimensions and finite values checks to use Active Record validations
+ - Fixed issue with `nearest_neighbors` scope overriding `select` values
+ - Removed default attribute name
+ - Dropped support for Ruby < 3.1
+
+ ## 0.3.2 (2023-12-12)
+
+ - Added deprecation warning for `has_neighbors` without an attribute name
+ - Added deprecation warning for `nearest_neighbors` without an attribute name
+
  ## 0.3.1 (2023-09-25)
 
  - Added support for passing multiple attributes to `has_neighbors`
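The 0.4.0 entry above adds `taxicab`, `hamming`, and `jaccard` distances. As a rough illustration of what these three measures compute, here is a minimal plain-Ruby sketch. It is independent of the gem and of pgvector: the method names are hypothetical, bit vectors are modelled as strings of `0`/`1`, and the all-zero Jaccard case returns `1.0` purely as a convention chosen for this sketch.

```ruby
# Hypothetical helpers for illustration only; not part of neighbor or pgvector.

# Taxicab (L1) distance: sum of absolute component differences.
def taxicab_distance(a, b)
  a.zip(b).sum { |x, y| (x - y).abs }
end

# Hamming distance between two equal-length bit strings such as "1010":
# the number of positions where the bits differ.
def hamming_distance(a, b)
  a.chars.zip(b.chars).count { |x, y| x != y }
end

# Jaccard distance between two equal-length bit strings:
# 1 - (bits set in both) / (bits set in either).
def jaccard_distance(a, b)
  pairs = a.chars.zip(b.chars)
  both = pairs.count { |x, y| x == "1" && y == "1" }
  either = pairs.count { |x, y| x == "1" || y == "1" }
  # Convention assumed here: no set bits at all counts as maximally distant.
  either.zero? ? 1.0 : 1.0 - both.to_f / either
end

taxicab = taxicab_distance([1, 2, 3], [3, 2, 0])
hamming = hamming_distance("1010", "1001")
jaccard = jaccard_distance("1010", "1001")
```

In the gem itself these distances are evaluated by Postgres operators, as the model diff further down shows; the Ruby above only spells out the arithmetic.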
data/LICENSE.txt CHANGED
@@ -1,6 +1,6 @@
  The MIT License (MIT)
 
- Copyright (c) 2021-2023 Andrew Kane
+ Copyright (c) 2021-2024 Andrew Kane
 
  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -2,7 +2,7 @@
 
  Nearest neighbor search for Rails and Postgres
 
- [![Build Status](https://github.com/ankane/neighbor/workflows/build/badge.svg?branch=master)](https://github.com/ankane/neighbor/actions)
+ [![Build Status](https://github.com/ankane/neighbor/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/neighbor/actions)
 
  ## Installation
 
@@ -14,7 +14,7 @@ gem "neighbor"
 
  ## Choose An Extension
 
- Neighbor supports two extensions: [cube](https://www.postgresql.org/docs/current/cube.html) and [vector](https://github.com/pgvector/pgvector). cube ships with Postgres, while vector supports approximate nearest neighbor search.
+ Neighbor supports two extensions: [cube](https://www.postgresql.org/docs/current/cube.html) and [vector](https://github.com/pgvector/pgvector). cube ships with Postgres, while vector supports more dimensions and approximate nearest neighbor search.
 
  For cube, run:
 
@@ -35,7 +35,7 @@ rails db:migrate
  Create a migration
 
  ```ruby
- class AddNeighborVectorToItems < ActiveRecord::Migration[7.0]
+ class AddEmbeddingToItems < ActiveRecord::Migration[7.1]
    def change
      add_column :items, :embedding, :cube
      # or
@@ -114,27 +114,27 @@ end
  For vector, add an approximate index to speed up queries. Create a migration with:
 
  ```ruby
- class AddIndexToItemsNeighborVector < ActiveRecord::Migration[7.0]
+ class AddIndexToItemsEmbedding < ActiveRecord::Migration[7.1]
    def change
-     add_index :items, :embedding, using: :ivfflat, opclass: :vector_l2_ops
-     # or with pgvector 0.5.0+
      add_index :items, :embedding, using: :hnsw, opclass: :vector_l2_ops
+     # or
+     add_index :items, :embedding, using: :ivfflat, opclass: :vector_l2_ops
    end
  end
  ```
 
  Use `:vector_cosine_ops` for cosine distance and `:vector_ip_ops` for inner product.
 
- Set the number of probes with IVFFlat
+ Set the size of the dynamic candidate list with HNSW
 
  ```ruby
- Item.connection.execute("SET ivfflat.probes = 3")
+ Item.connection.execute("SET hnsw.ef_search = 100")
  ```
 
- Or the size of the dynamic candidate list with HNSW
+ Or the number of probes with IVFFlat
 
  ```ruby
- Item.connection.execute("SET hnsw.ef_search = 100")
+ Item.connection.execute("SET ivfflat.probes = 3")
  ```
 
  ## Examples
@@ -242,7 +242,7 @@ movies = []
  recommender.item_ids.each do |item_id|
    movies << {name: item_id, factors: recommender.item_factors(item_id)}
  end
- Movie.insert_all!(movies) # use create! for Active Record < 6
+ Movie.insert_all!(movies)
  ```
 
  And get similar movies
@@ -286,10 +286,5 @@ git clone https://github.com/ankane/neighbor.git
  cd neighbor
  bundle install
  createdb neighbor_test
-
- # cube
  bundle exec rake test
-
- # vector
- EXT=vector bundle exec rake test
  ```
@@ -1,3 +1,4 @@
+ require "rails/generators"
  require "rails/generators/active_record"
 
  module Neighbor
@@ -1,3 +1,4 @@
+ require "rails/generators"
  require "rails/generators/active_record"
 
  module Neighbor
@@ -2,10 +2,9 @@ module Neighbor
    module Model
      def has_neighbors(*attribute_names, dimensions: nil, normalize: nil)
        if attribute_names.empty?
-         attribute_names << :neighbor_vector
-       else
-         attribute_names.map!(&:to_sym)
+         raise ArgumentError, "has_neighbors requires an attribute name"
        end
+       attribute_names.map!(&:to_sym)
 
        class_eval do
          @neighbor_attributes ||= {}
@@ -26,29 +25,45 @@ module Neighbor
          attribute_names.each do |attribute_name|
            raise Error, "has_neighbors already called for #{attribute_name.inspect}" if neighbor_attributes[attribute_name]
            @neighbor_attributes[attribute_name] = {dimensions: dimensions, normalize: normalize}
-
-           attribute attribute_name, Neighbor::Vector.new(dimensions: dimensions, normalize: normalize, model: self, attribute_name: attribute_name)
          end
 
          return if @neighbor_attributes.size != attribute_names.size
 
-         scope :nearest_neighbors, ->(attribute_name, vector = nil, options = nil) {
-           # cannot use keyword arguments with scope with Ruby 3.2 and Active Record 6.1
-           # https://github.com/rails/rails/issues/46934
-           if options.nil? && vector.is_a?(Hash)
-             options = vector
-             vector = nil
+         validate do
+           self.class.neighbor_attributes.each do |k, v|
+             value = read_attribute(k)
+             next if value.nil?
+
+             column_info = self.class.columns_hash[k.to_s]
+             dimensions = v[:dimensions] || column_info&.limit
+
+             if !Neighbor::Utils.validate_dimensions(value, column_info&.type, dimensions).nil?
+               errors.add(k, "must have #{dimensions} dimensions")
+             end
+             if !Neighbor::Utils.validate_finite(value, column_info&.type)
+               errors.add(k, "must have finite values")
+             end
            end
+         end
+
+         # TODO move to normalizes when Active Record < 7.1 no longer supported
+         before_save do
+           self.class.neighbor_attributes.each do |k, v|
+             next unless v[:normalize]
+             value = read_attribute(k)
+             next if value.nil?
+             self[k] = Neighbor::Utils.normalize(value, column_info: self.class.columns_hash[k.to_s])
+           end
+         end
+
+         # cannot use keyword arguments with scope with Ruby 3.2 and Active Record 6.1
+         # https://github.com/rails/rails/issues/46934
+         scope :nearest_neighbors, ->(attribute_name, vector, options = nil) {
            raise ArgumentError, "missing keyword: :distance" unless options.is_a?(Hash) && options.key?(:distance)
            distance = options.delete(:distance)
            raise ArgumentError, "unknown keywords: #{options.keys.map(&:inspect).join(", ")}" if options.any?
 
-           if vector.nil? && !attribute_name.nil? && attribute_name.respond_to?(:to_a)
-             vector = attribute_name
-             attribute_name = :neighbor_vector
-           end
            attribute_name = attribute_name.to_sym
-
            options = neighbor_attributes[attribute_name]
            raise ArgumentError, "Invalid attribute" unless options
            normalize = options[:normalize]
@@ -60,10 +75,21 @@ module Neighbor
 
            quoted_attribute = "#{connection.quote_table_name(table_name)}.#{connection.quote_column_name(attribute_name)}"
 
-           column_info = klass.type_for_attribute(attribute_name).column_info
+           column_info = columns_hash[attribute_name.to_s]
+           column_type = column_info&.type
 
            operator =
-             if column_info[:type] == :vector
+             case column_type
+             when :bit
+               case distance
+               when "hamming"
+                 "<~>"
+               when "jaccard"
+                 "<%>"
+               when "hamming2"
+                 "#"
+               end
+             when :vector, :halfvec, :sparsevec
                case distance
                when "inner_product"
                  "<#>"
@@ -71,8 +97,10 @@ module Neighbor
                  "<=>"
                when "euclidean"
                  "<->"
+               when "taxicab"
+                 "<+>"
                end
-             else
+             when :cube
                case distance
                when "taxicab"
                  "<#>"
@@ -81,27 +109,27 @@ module Neighbor
                when "euclidean", "cosine"
                  "<->"
                end
+             else
+               raise ArgumentError, "Unsupported type: #{column_type}"
              end
 
            raise ArgumentError, "Invalid distance: #{distance}" unless operator
 
            # ensure normalize set (can be true or false)
-           if distance == "cosine" && column_info[:type] == :cube && normalize.nil?
+           if distance == "cosine" && column_type == :cube && normalize.nil?
              raise Neighbor::Error, "Set normalize for cosine distance with cube"
            end
 
-           vector = Neighbor::Vector.cast(vector, dimensions: dimensions, normalize: normalize, column_info: column_info)
-
-           # important! neighbor_vector should already be typecast
-           # but use to_f as extra safeguard against SQL injection
-           query =
-             if column_info[:type] == :vector
-               connection.quote("[#{vector.map(&:to_f).join(", ")}]")
-             else
-               "cube(array[#{vector.map(&:to_f).join(", ")}])"
-             end
+           column_attribute = klass.type_for_attribute(attribute_name)
+           vector = column_attribute.cast(vector)
+           Neighbor::Utils.validate(vector, dimensions: dimensions, column_info: column_info)
+           vector = Neighbor::Utils.normalize(vector, column_info: column_info) if normalize
 
+           query = connection.quote(column_attribute.serialize(vector))
            order = "#{quoted_attribute} #{operator} #{query}"
+           if operator == "#"
+             order = "bit_count(#{order})"
+           end
 
            # https://stats.stackexchange.com/questions/146221/is-cosine-similarity-identical-to-l2-normalized-euclidean-distance
            # with normalized vectors:
@@ -109,27 +137,28 @@ module Neighbor
            # cosine distance = 1 - cosine similarity
            # this transformation doesn't change the order, so only needed for select
            neighbor_distance =
-             if column_info[:type] != :vector && distance == "cosine"
+             if column_type == :cube && distance == "cosine"
                "POWER(#{order}, 2) / 2.0"
-             elsif column_info[:type] == :vector && distance == "inner_product"
+             elsif [:vector, :halfvec, :sparsevec].include?(column_type) && distance == "inner_product"
                "(#{order}) * -1"
              else
                order
              end
 
            # for select, use column_names instead of * to account for ignored columns
-           select(*column_names, "#{neighbor_distance} AS neighbor_distance")
+           select_columns = select_values.any? ? [] : column_names
+           select(*select_columns, "#{neighbor_distance} AS neighbor_distance")
              .where.not(attribute_name => nil)
-             .order(Arel.sql(order))
+             .reorder(Arel.sql(order))
          }
 
-         def nearest_neighbors(attribute_name = :neighbor_vector, **options)
+         def nearest_neighbors(attribute_name, **options)
            attribute_name = attribute_name.to_sym
-           # important! check if neighbor attribute before calling send
+           # important! check if neighbor attribute before accessing
            raise ArgumentError, "Invalid attribute" unless self.class.neighbor_attributes[attribute_name]
 
            self.class
-             .where.not(self.class.primary_key => self[self.class.primary_key])
+             .where.not(Array(self.class.primary_key).to_h { |k| [k, self[k]] })
              .nearest_neighbors(attribute_name, self[attribute_name], **options)
          end
        end
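For cube columns with cosine distance, the scope above selects `POWER(order, 2) / 2.0`, where `order` is the Euclidean distance. The identity behind it is that for unit vectors, the squared Euclidean distance equals 2 - 2(a·b), so half of it is 1 - (a·b), which is the cosine distance. The following plain-Ruby check is only a sketch of that arithmetic; the helper names are hypothetical and nothing here touches the gem or the database.

```ruby
# Hypothetical helpers for illustration only; not part of neighbor.

# Scale a vector to unit length (zero vectors are returned unchanged).
def l2_normalize(v)
  norm = Math.sqrt(v.sum { |x| x * x })
  norm > 0 ? v.map { |x| x / norm } : v
end

def euclidean_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Cosine distance = 1 - cosine similarity.
def cosine_distance(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  1.0 - dot / (norm_a * norm_b)
end

a = l2_normalize([1.0, 2.0, 3.0])
b = l2_normalize([4.0, 5.0, 6.0])

# Same shape as POWER(distance, 2) / 2.0 in the SQL expression.
from_euclidean = euclidean_distance(a, b)**2 / 2.0
direct = cosine_distance(a, b)
```

The two values agree up to floating-point error. Because squaring and halving is monotonic for non-negative distances, ordering by the raw Euclidean distance gives the same ranking, which matches the comment that the transformation is only needed for the selected value.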
@@ -1,16 +1,16 @@
  module Neighbor
    class Railtie < Rails::Railtie
      generators do
+       require "rails/generators/generated_attribute"
+
        # rails generate model Item embedding:vector{3}
-       if defined?(Rails::Generators::GeneratedAttribute)
-         Rails::Generators::GeneratedAttribute.singleton_class.prepend(Neighbor::GeneratedAttribute)
-       end
+       Rails::Generators::GeneratedAttribute.singleton_class.prepend(Neighbor::GeneratedAttribute)
      end
    end
 
    module GeneratedAttribute
      def parse_type_and_options(type, *, **)
-       if type =~ /\A(vector)\{(\d+)\}\z/
+       if type =~ /\A(vector|halfvec|sparsevec)\{(\d+)\}\z/
          return $1, limit: $2.to_i
        end
        super
@@ -0,0 +1,79 @@
+ module Neighbor
+   class SparseVector
+     attr_reader :dimensions, :indices, :values
+
+     NO_DEFAULT = Object.new
+
+     def initialize(value, dimensions = NO_DEFAULT)
+       if value.is_a?(Hash)
+         if dimensions == NO_DEFAULT
+           raise ArgumentError, "missing dimensions"
+         end
+         from_hash(value, dimensions)
+       else
+         unless dimensions == NO_DEFAULT
+           raise ArgumentError, "extra argument"
+         end
+         from_array(value)
+       end
+     end
+
+     def to_s
+       "{#{@indices.zip(@values).map { |i, v| "#{i.to_i + 1}:#{v.to_f}" }.join(",")}}/#{@dimensions.to_i}"
+     end
+
+     def to_a
+       arr = Array.new(dimensions, 0.0)
+       @indices.zip(@values) do |i, v|
+         arr[i] = v
+       end
+       arr
+     end
+
+     private
+
+     def from_hash(data, dimensions)
+       elements = data.select { |_, v| v != 0 }.sort
+       @dimensions = dimensions.to_i
+       @indices = elements.map { |v| v[0].to_i }
+       @values = elements.map { |v| v[1].to_f }
+     end
+
+     def from_array(arr)
+       arr = arr.to_a
+       @dimensions = arr.size
+       @indices = []
+       @values = []
+       arr.each_with_index do |v, i|
+         if v != 0
+           @indices << i
+           @values << v.to_f
+         end
+       end
+     end
+
+     class << self
+       def from_text(string)
+         elements, dimensions = string.split("/", 2)
+         indices = []
+         values = []
+         elements[1..-2].split(",").each do |e|
+           index, value = e.split(":", 2)
+           indices << index.to_i - 1
+           values << value.to_f
+         end
+         from_parts(dimensions.to_i, indices, values)
+       end
+
+       private
+
+       def from_parts(dimensions, indices, values)
+         vec = allocate
+         vec.instance_variable_set(:@dimensions, dimensions)
+         vec.instance_variable_set(:@indices, indices)
+         vec.instance_variable_set(:@values, values)
+         vec
+       end
+     end
+   end
+ end
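The class above serializes to the text form `{index:value,...}/dimensions`, with 1-based indices in the text and 0-based indices in Ruby. To make the round trip concrete without loading the gem, here is a standalone sketch of the same format. The two helper names are hypothetical, and the sketch works on plain arrays rather than on `Neighbor::SparseVector`.

```ruby
# Hypothetical helpers for illustration only; they mirror the text format
# used in the diff above but are not part of neighbor.

# Dense array -> sparse text, keeping only non-zero entries.
# Indices in the text are 1-based.
def sparse_to_text(array)
  elements = array.each_with_index.reject { |v, _| v == 0 }
  body = elements.map { |v, i| "#{i + 1}:#{v.to_f}" }.join(",")
  "{#{body}}/#{array.size}"
end

# Sparse text -> dense array of floats, filling gaps with 0.0.
def sparse_from_text(text)
  elements, dimensions = text.split("/", 2)
  array = Array.new(dimensions.to_i, 0.0)
  elements[1..-2].split(",").each do |e|
    index, value = e.split(":", 2)
    array[index.to_i - 1] = value.to_f
  end
  array
end

text = sparse_to_text([1, 0, 2, 0, 3, 0])
dense = sparse_from_text(text)
```

Here `text` is `"{1:1.0,3:2.0,5:3.0}/6"` and `dense` is the original array as floats.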
@@ -0,0 +1,42 @@
+ module Neighbor
+   module Type
+     class Cube < ActiveRecord::Type::Value
+       def type
+         :cube
+       end
+
+       def serialize(value)
+         if value.is_a?(Array)
+           if value.first.is_a?(Array)
+             value = value.map { |v| serialize_point(v) }.join(", ")
+           else
+             value = serialize_point(value)
+           end
+         end
+         super(value)
+       end
+
+       private
+
+       def cast_value(value)
+         if value.is_a?(Array)
+           value
+         elsif value.is_a?(Numeric)
+           [value]
+         elsif value.is_a?(String)
+           if value.include?("),(")
+             value[1..-1].split("),(").map { |v| v.split(",").map(&:to_f) }
+           else
+             value[1..-1].split(",").map(&:to_f)
+           end
+         else
+           raise "can't cast #{value.class.name} to cube"
+         end
+       end
+
+       def serialize_point(value)
+         "(#{value.map(&:to_f).join(", ")})"
+       end
+     end
+   end
+ end
@@ -0,0 +1,28 @@
+ module Neighbor
+   module Type
+     class Halfvec < ActiveRecord::Type::Value
+       def type
+         :halfvec
+       end
+
+       def serialize(value)
+         if value.is_a?(Array)
+           value = "[#{value.map(&:to_f).join(",")}]"
+         end
+         super(value)
+       end
+
+       private
+
+       def cast_value(value)
+         if value.is_a?(String)
+           value[1..-1].split(",").map(&:to_f)
+         elsif value.is_a?(Array)
+           value
+         else
+           raise "can't cast #{value.class.name} to halfvec"
+         end
+       end
+     end
+   end
+ end
@@ -0,0 +1,30 @@
+ module Neighbor
+   module Type
+     class Sparsevec < ActiveRecord::Type::Value
+       def type
+         :sparsevec
+       end
+
+       def serialize(value)
+         if value.is_a?(SparseVector)
+           value = "{#{value.indices.zip(value.values).map { |i, v| "#{i.to_i + 1}:#{v.to_f}" }.join(",")}}/#{value.dimensions.to_i}"
+         end
+         super(value)
+       end
+
+       private
+
+       def cast_value(value)
+         if value.is_a?(SparseVector)
+           value
+         elsif value.is_a?(String)
+           SparseVector.from_text(value)
+         elsif value.is_a?(Array)
+           value = SparseVector.new(value)
+         else
+           raise "can't cast #{value.class.name} to sparsevec"
+         end
+       end
+     end
+   end
+ end
@@ -0,0 +1,28 @@
+ module Neighbor
+   module Type
+     class Vector < ActiveRecord::Type::Value
+       def type
+         :vector
+       end
+
+       def serialize(value)
+         if value.is_a?(Array)
+           value = "[#{value.map(&:to_f).join(",")}]"
+         end
+         super(value)
+       end
+
+       private
+
+       def cast_value(value)
+         if value.is_a?(String)
+           value[1..-1].split(",").map(&:to_f)
+         elsif value.is_a?(Array)
+           value
+         else
+           raise "can't cast #{value.class.name} to vector"
+         end
+       end
+     end
+   end
+ end
@@ -0,0 +1,42 @@
+ module Neighbor
+   module Utils
+     def self.validate_dimensions(value, type, expected)
+       dimensions = type == :sparsevec ? value.dimensions : value.size
+       if expected && dimensions != expected
+         "Expected #{expected} dimensions, not #{dimensions}"
+       end
+     end
+
+     def self.validate_finite(value, type)
+       case type
+       when :bit
+         true
+       when :sparsevec
+         value.values.all?(&:finite?)
+       else
+         value.all?(&:finite?)
+       end
+     end
+
+     def self.validate(value, dimensions:, column_info:)
+       if (message = validate_dimensions(value, column_info&.type, dimensions || column_info&.limit))
+         raise Error, message
+       end
+
+       if !validate_finite(value, column_info&.type)
+         raise Error, "Values must be finite"
+       end
+     end
+
+     def self.normalize(value, column_info:)
+       raise Error, "Normalize not supported for type" unless [:cube, :vector, :halfvec].include?(column_info&.type)
+
+       norm = Math.sqrt(value.sum { |v| v * v })
+
+       # store zero vector as all zeros
+       # since NaN makes the distance always 0
+       # could also throw error
+       norm > 0 ? value.map { |v| v / norm } : value
+     end
+   end
+ end
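The `norm > 0` guard in `normalize` exists because a zero vector has norm 0, and dividing each component by it would yield NaN. A small standalone sketch of that behavior follows; `l2_normalize` is a hypothetical stand-in that omits the column type check, so it is not the gem's method.

```ruby
# Hypothetical stand-in for illustration only; not the gem's Utils.normalize.
def l2_normalize(vector)
  norm = Math.sqrt(vector.sum { |v| v * v })
  # A zero vector has norm 0; dividing would produce NaN in every component,
  # so it is returned unchanged instead.
  norm > 0 ? vector.map { |v| v / norm } : vector
end

unit = l2_normalize([3.0, 4.0])          # scaled to length 1
zero = l2_normalize([0.0, 0.0])          # left as all zeros
naive = [0.0, 0.0].map { |v| v / 0.0 }   # what an unguarded division gives
```

With the guard, `zero` stays `[0.0, 0.0]`, whereas every component of `naive` is NaN.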
@@ -1,3 +1,3 @@
  module Neighbor
-   VERSION = "0.3.1"
+   VERSION = "0.4.0"
  end
data/lib/neighbor.rb CHANGED
@@ -2,6 +2,8 @@
  require "active_support"
 
  # modules
+ require_relative "neighbor/sparse_vector"
+ require_relative "neighbor/utils"
  require_relative "neighbor/version"
 
  module Neighbor
@@ -10,10 +12,18 @@ module Neighbor
    module RegisterTypes
      def initialize_type_map(m = type_map)
        super
-       m.register_type "cube", ActiveRecord::ConnectionAdapters::PostgreSQL::OID::SpecializedString.new(:cube)
+       m.register_type "cube", Type::Cube.new
+       m.register_type "halfvec" do |_, _, sql_type|
+         limit = extract_limit(sql_type)
+         Type::Halfvec.new(limit: limit)
+       end
+       m.register_type "sparsevec" do |_, _, sql_type|
+         limit = extract_limit(sql_type)
+         Type::Sparsevec.new(limit: limit)
+       end
        m.register_type "vector" do |_, _, sql_type|
          limit = extract_limit(sql_type)
-         ActiveRecord::ConnectionAdapters::PostgreSQL::OID::SpecializedString.new(:vector, limit: limit)
+         Type::Vector.new(limit: limit)
        end
      end
    end
@@ -21,7 +31,10 @@ end
 
  ActiveSupport.on_load(:active_record) do
    require_relative "neighbor/model"
-   require_relative "neighbor/vector"
+   require_relative "neighbor/type/cube"
+   require_relative "neighbor/type/halfvec"
+   require_relative "neighbor/type/sparsevec"
+   require_relative "neighbor/type/vector"
 
    extend Neighbor::Model
 
@@ -29,10 +42,12 @@ ActiveSupport.on_load(:active_record) do
 
    # ensure schema can be dumped
    ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::NATIVE_DATABASE_TYPES[:cube] = {name: "cube"}
+   ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::NATIVE_DATABASE_TYPES[:halfvec] = {name: "halfvec"}
+   ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::NATIVE_DATABASE_TYPES[:sparsevec] = {name: "sparsevec"}
    ActiveRecord::ConnectionAdapters::PostgreSQLAdapter::NATIVE_DATABASE_TYPES[:vector] = {name: "vector"}
 
    # ensure schema can be loaded
-   ActiveRecord::ConnectionAdapters::TableDefinition.send(:define_column_methods, :cube, :vector)
+   ActiveRecord::ConnectionAdapters::TableDefinition.send(:define_column_methods, :cube, :halfvec, :sparsevec, :vector)
 
    # prevent unknown OID warning
    if ActiveRecord::VERSION::MAJOR >= 7
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: neighbor
  version: !ruby/object:Gem::Version
-   version: 0.3.1
+   version: 0.4.0
  platform: ruby
  authors:
  - Andrew Kane
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2023-09-26 00:00:00.000000000 Z
+ date: 2024-06-26 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: activerecord
@@ -40,7 +40,12 @@ files:
  - lib/neighbor.rb
  - lib/neighbor/model.rb
  - lib/neighbor/railtie.rb
- - lib/neighbor/vector.rb
+ - lib/neighbor/sparse_vector.rb
+ - lib/neighbor/type/cube.rb
+ - lib/neighbor/type/halfvec.rb
+ - lib/neighbor/type/sparsevec.rb
+ - lib/neighbor/type/vector.rb
+ - lib/neighbor/utils.rb
  - lib/neighbor/version.rb
  homepage: https://github.com/ankane/neighbor
  licenses:
@@ -54,14 +59,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
-       version: '3'
+       version: '3.1'
  required_rubygems_version: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubygems_version: 3.4.10
+ rubygems_version: 3.5.11
  signing_key:
  specification_version: 4
  summary: Nearest neighbor search for Rails and Postgres
@@ -1,65 +0,0 @@
- module Neighbor
-   class Vector < ActiveRecord::Type::Value
-     def initialize(dimensions:, normalize:, model:, attribute_name:)
-       super()
-       @dimensions = dimensions
-       @normalize = normalize
-       @model = model
-       @attribute_name = attribute_name
-     end
-
-     def self.cast(value, dimensions:, normalize:, column_info:)
-       value = value.to_a.map(&:to_f)
-
-       dimensions ||= column_info[:dimensions]
-       raise Error, "Expected #{dimensions} dimensions, not #{value.size}" if dimensions && value.size != dimensions
-
-       raise Error, "Values must be finite" unless value.all?(&:finite?)
-
-       if normalize
-         norm = Math.sqrt(value.sum { |v| v * v })
-
-         # store zero vector as all zeros
-         # since NaN makes the distance always 0
-         # could also throw error
-
-         # safe to update in-place since earlier map dups
-         value.map! { |v| v / norm } if norm > 0
-       end
-
-       value
-     end
-
-     def self.column_info(model, attribute_name)
-       attribute_name = attribute_name.to_s
-       column = model.columns.detect { |c| c.name == attribute_name }
-       {
-         type: column.try(:type),
-         dimensions: column.try(:limit)
-       }
-     end
-
-     # need to be careful to avoid loading column info before needed
-     def column_info
-       @column_info ||= self.class.column_info(@model, @attribute_name)
-     end
-
-     def cast(value)
-       self.class.cast(value, dimensions: @dimensions, normalize: @normalize, column_info: column_info) unless value.nil?
-     end
-
-     def serialize(value)
-       unless value.nil?
-         if column_info[:type] == :vector
-           "[#{cast(value).join(", ")}]"
-         else
-           "(#{cast(value).join(", ")})"
-         end
-       end
-     end
-
-     def deserialize(value)
-       value[1..-1].split(",").map(&:to_f) unless value.nil?
-     end
-   end
- end