neighbor-redis 0.2.1 → 0.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 7a5af4a80cc03cdce7567594144695b7bf5bd9a165884135c21eb10a316c3e17
-   data.tar.gz: 8700c04a90e034c099def3fec562aa982dafcbb87075e08a81fb13b66d74a703
+   metadata.gz: 94d78e38cc8f93df780741fe7c08964657e9f965ca746ffeb9b80fa5950663f6
+   data.tar.gz: 8c4ed1a76383f1a7f0d99712cce6b5d835472eabf48357ec1b7b9044f212e8ff
  SHA512:
-   metadata.gz: e035dba8d0fd5ed1fd27526db22ec466e9704f914e1e91d30665aa092ac0b76ae202a2c4178815159f14c5f9923ee96094b565338f238bbdc82b416175d91204
-   data.tar.gz: d26fced7f998bad66b2a20cb81327dbcfdbcfc4bdc5754efd93ccdd730ad6865fcbedea0e59ff059139ee8679f56202ab143c45a8ee8a3c28a88046e668b2501
+   metadata.gz: 83d4415cd858429b265e764f09fb34df4492a0fa93d94d6a766688a8e2f3a98ae9850102ca684322838a898c7c2fb430aa7fbc0082564679b775d7dce79d7e18
+   data.tar.gz: 8dd3a10a9fbaf716df69cc166cd4703e2a0c9489a3cddc58816e4ef17a0f0a5be7fe16da680553b4884606eaddc9e62fdbdea572b6891e44ed54da7335187018
data/CHANGELOG.md CHANGED
@@ -1,3 +1,14 @@
+ ## 0.3.0 (2025-09-12)
+
+ - Added support for vector sets
+ - Added support for SVS Vamana indexes
+ - Added support for metadata
+ - Added `info` and `count` methods to indexes
+ - Updated `add` and `remove` methods to return boolean
+ - Updated `add_all` method to return array of booleans
+ - Updated `create` and `promote` methods to return `nil`
+ - Dropped support for Ruby < 3.2
+
  ## 0.2.1 (2025-05-06)

  - Added support for Redis 8
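The changelog entries about `add`, `remove`, and `add_all` describe a change in return values. The following is a minimal sketch of how raw replies could be mapped to booleans, mirroring the expression used in this release's `index.rb`. The helper name `replies_to_booleans` and the sample replies are hypothetical, and no Redis server is contacted.

```ruby
# Hypothetical helper (not part of the gem's public API).
# String replies such as "OK" count as success only when they equal "OK";
# integer replies count as success when they are positive.
def replies_to_booleans(replies)
  replies.map { |v| v.is_a?(String) ? v == "OK" : v > 0 }
end

# Sample replies are made up for illustration.
p replies_to_booleans(["OK", 1, 0])
```

Under this mapping an integer reply of `0` becomes `false`, so callers that ignored the old return value may want to check how they treat a `false` result.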
data/README.md CHANGED
@@ -2,11 +2,13 @@

  Nearest neighbor search for Ruby and Redis

+ Supports Redis 8 [vector sets](https://redis.io/docs/latest/develop/data-types/vector-sets/) and RediSearch [vector indexes](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/)
+
  [![Build Status](https://github.com/ankane/neighbor-redis/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/neighbor-redis/actions)

  ## Installation

- First, install Redis with the [RediSearch](https://github.com/RediSearch/RediSearch) module. With Docker, use:
+ First, install Redis. With Docker, use:

  ```sh
  docker run -p 6379:6379 redis:8
@@ -29,11 +31,10 @@ Neighbor::Redis.client = RedisClient.config.new_pool
  Create an index

  ```ruby
- index = Neighbor::Redis::HNSWIndex.new("items", dimensions: 3, distance: "l2")
- index.create
+ index = Neighbor::Redis::VectorSet.new("items")
  ```

- Add items
+ Add vectors

  ```ruby
  index.add(1, [1, 1, 1])
@@ -41,105 +42,196 @@ index.add(2, [2, 2, 2])
  index.add(3, [1, 1, 2])
  ```

- Note: IDs are stored and returned as strings (uses less total memory)
+ Search for nearest neighbors to a vector

- Get the nearest neighbors to an item
+ ```ruby
+ index.search([1, 1, 1], count: 5)
+ ```
+
+ Search for nearest neighbors to a vector in the index

  ```ruby
- index.nearest(1, count: 5)
+ index.search_id(1, count: 5)
  ```

- Get the nearest neighbors to a vector
+ IDs are treated as strings by default, but can also be treated as integers

  ```ruby
- index.search([1, 1, 1], count: 5)
+ Neighbor::Redis::VectorSet.new("items", id_type: "integer")
  ```

- ## Distance
+ ## Operations

- Supported values are:
+ Add or update a vector

- - `l2`
- - `inner_product`
- - `cosine`
+ ```ruby
+ index.add(id, vector)
+ ```

- ## Index Types
+ Add or update multiple vectors
+
+ ```ruby
+ index.add_all(ids, vectors)
+ ```

- Hierarchical Navigable Small World (HNSW)
+ Get a vector

  ```ruby
- Neighbor::Redis::HNSWIndex.new(
-   name,
-   initial_cap: nil,
-   m: 16,
-   ef_construction: 200,
-   ef_runtime: 10,
-   epsilon: 0.01
- )
+ index.find(id)
  ```

- Flat
+ Remove a vector

  ```ruby
- Neighbor::Redis::FlatIndex.new(
-   name,
-   initial_cap: nil,
-   block_size: 1024
- )
+ index.remove(id)
  ```

- ## Additional Options
+ Remove multiple vectors

- Store vectors as double precision (instead of single precision)
+ ```ruby
+ index.remove_all(ids)
+ ```
+
+ Count vectors

  ```ruby
- Neighbor::Redis::HNSWIndex.new(name, type: "float64")
+ index.count
  ```

- Store vectors as JSON (instead of a hash/blob)
+ ## Metadata
+
+ Add a vector with metadata

  ```ruby
- Neighbor::Redis::HNSWIndex.new(name, redis_type: "json")
+ index.add(id, vector, metadata: {category: "A"})
  ```

- ## Changing Options
+ Add multiple vectors with metadata
+
+ ```ruby
+ index.add_all(ids, vectors, metadata: [{category: "A"}, {category: "B"}, ...])
+ ```

- Create a new index to change any index options
+ Get metadata for a vector

  ```ruby
- Neighbor::Redis::HNSWIndex.new("items-v2", **new_options)
+ index.metadata(id)
  ```

- ## Additional Operations
+ Get metadata with search results
+
+ ```ruby
+ index.search(vector, with_metadata: true)
+ ```

- Add multiple items
+ Set metadata

  ```ruby
- index.add_all(ids, embeddings)
+ index.set_metadata(id, {category: "B"})
  ```

- Get an item
+ Remove metadata

  ```ruby
- index.find(id)
+ index.remove_metadata(id)
+ ```
+
+ ## Index Types
+
+ [Vector sets](#vector-sets)
+
+ - use cosine distance
+ - use single-precision floats
+ - support exact and approximate search
+ - support quantization and dimensionality reduction
+
+ [Vector indexes](#vector-indexes)
+
+ - support L2, inner product, and cosine distance
+ - support single or double-precision floats
+ - support either exact (flat) or approximate (HNSW and SVS Vamana) search
+ - can support quantization and dimensionality reduction (SVS Vamana)
+ - require calling `create` before searching
+
+ ## Vector Sets
+
+ Create a vector set
+
+ ```ruby
+ Neighbor::Redis::VectorSet.new(name)
  ```

- Remove an item
+ Specify parameters

  ```ruby
- index.remove(id)
+ Neighbor::Redis::VectorSet.new(name, m: 16, ef_construction: 200, ef_search: 10)
  ```

- Remove multiple items
+ Use quantization (`int8` or `binary`)

  ```ruby
- index.remove_all(ids)
+ Neighbor::Redis::VectorSet.new(name, quantization: "int8")
  ```

- Drop the index
+ Use dimensionality reduction

  ```ruby
- index.drop
+ Neighbor::Redis::VectorSet.new(name, reduce: 2)
+ ```
+
+ Perform exact search
+
+ ```ruby
+ index.search(vector, exact: true)
+ ```
+
+ ## Vector Indexes
+
+ Create a vector index (`l2`, `inner_product`, or `cosine` distance)
+
+ ```ruby
+ index = Neighbor::Redis::HnswIndex.new(name, dimensions: 3, distance: "cosine")
+ index.create
+ ```
+
+ Store vectors as double precision (instead of single precision)
+
+ ```ruby
+ Neighbor::Redis::HnswIndex.new(name, type: "float64")
+ ```
+
+ Store vectors as JSON (instead of a hash/blob)
+
+ ```ruby
+ Neighbor::Redis::HnswIndex.new(name, redis_type: "json")
+ ```
+
+ ### Index Options
+
+ HNSW
+
+ ```ruby
+ Neighbor::Redis::HnswIndex.new(name, m: 16, ef_construction: 200, ef_search: 10)
+ ```
+
+ SVS Vamana - *Redis 8.2+*
+
+ ```ruby
+ Neighbor::Redis::SvsVamanaIndex.new(
+   name,
+   compression: nil,
+   construction_window_size: 200,
+   graph_max_degree: 32,
+   search_window_size: 10,
+   training_threshold: nil,
+   reduce: nil
+ )
+ ```
+
+ Flat
+
+ ```ruby
+ Neighbor::Redis::FlatIndex.new(name)
  ```

  ## Example
@@ -149,8 +241,7 @@ You can use Neighbor Redis for online item-based recommendations with [Disco](ht
  Create an index

  ```ruby
- index = Neighbor::Redis::HNSWIndex.new("movies", dimensions: 20, distance: "cosine")
- index.create
+ index = Neighbor::Redis::VectorSet.new("movies")
  ```

  Fit the recommender
@@ -170,14 +261,30 @@ index.add_all(recommender.item_ids, recommender.item_factors)
  And get similar movies

  ```ruby
- index.nearest("Star Wars (1977)").map { |v| v[:id] }
+ index.search_id("Star Wars (1977)").map { |v| v[:id] }
  ```

- See the [complete code](examples/disco_item_recs.rb)
+ See the complete code for [vector sets](examples/disco_item_recs_vs.rb) and [vector indexes](examples/disco_item_recs.rb)

  ## Reference

- - [Vector similarity](https://redis.io/docs/stack/search/reference/vectors/)
+ Get index info
+
+ ```ruby
+ index.info
+ ```
+
+ Check if an index exists
+
+ ```ruby
+ index.exists?
+ ```
+
+ Drop an index
+
+ ```ruby
+ index.drop
+ ```

  ## History
 
data/lib/neighbor/redis/hnsw_index.rb CHANGED
@@ -1,13 +1,22 @@
  module Neighbor
    module Redis
-     class HNSWIndex < Index
-       def initialize(*args, initial_cap: nil, m: nil, ef_construction: nil, ef_runtime: nil, epsilon: nil, **options)
+     class HnswIndex < Index
+       def initialize(
+         *args,
+         initial_cap: nil,
+         m: nil,
+         ef_construction: nil,
+         ef_search: nil,
+         ef_runtime: nil,
+         epsilon: nil,
+         **options
+       )
          super(*args, **options)
          @algorithm = "HNSW"
          @initial_cap = initial_cap
          @m = m
          @ef_construction = ef_construction
-         @ef_runtime = ef_runtime
+         @ef_runtime = ef_search || ef_runtime
          @epsilon = epsilon
        end
 
@@ -23,5 +32,7 @@ module Neighbor
          params
        end
      end
+
+     HNSWIndex = HnswIndex
    end
  end
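The hunks above rename the class to `HnswIndex`, keep `HNSWIndex` as a constant alias, and let `ef_search` take precedence over the older `ef_runtime` option. A minimal sketch of that backward-compatibility pattern follows; the `Sketch*` names are hypothetical stand-ins so the example runs without the gem or a Redis server.

```ruby
# Hypothetical stand-in class; only the option-precedence logic mirrors the diff.
class SketchHnswIndex
  attr_reader :ef_runtime

  def initialize(ef_search: nil, ef_runtime: nil)
    # The new option wins when both are given; the old one still works alone.
    @ef_runtime = ef_search || ef_runtime
  end
end

# A constant alias keeps the old class name usable.
SketchHNSWIndex = SketchHnswIndex

puts SketchHnswIndex.new(ef_search: 20, ef_runtime: 10).ef_runtime
puts SketchHNSWIndex.new(ef_runtime: 10).ef_runtime
```

Because the alias points at the same class object, instances created through either name are interchangeable.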
data/lib/neighbor/redis/index.rb CHANGED
@@ -1,20 +1,27 @@
  module Neighbor
    module Redis
      class Index
-       def initialize(name, dimensions:, distance:, type: "float32", redis_type: "hash")
+       def initialize(name, dimensions:, distance:, type: "float32", redis_type: "hash", id_type: "string")
          @index_name = index_name(name)
          @global_prefix = "neighbor:items:"
          @prefix = "#{@global_prefix}#{name}:"

-         @dimensions = dimensions
+         @dimensions = dimensions.to_i

          unless distance.nil?
            @distance_metric =
              case distance.to_s
-             when "l2", "cosine"
-               distance.to_s.upcase
+             when "l2"
+               "L2"
              when "inner_product"
                "IP"
+             when "cosine"
+               if Redis.server_type == :dragonfly
+                 # uses inner product instead of cosine distance?
+                 raise ArgumentError, "unsupported distance"
+               else
+                 "COSINE"
+               end
              else
                raise ArgumentError, "invalid distance"
              end
@@ -40,15 +47,25 @@ module Neighbor
          else
            raise ArgumentError, "invalid redis_type"
          end
+
+         @int_ids =
+           case id_type.to_s
+           when "string"
+             false
+           when "integer"
+             true
+           else
+             raise ArgumentError, "invalid id_type"
+           end
        end

-       def self.create(...)
-         index = new(...)
-         index.create
+       def self.create(*args, _schema: nil, **options)
+         index = new(*args, **options)
+         index.create(_schema:)
          index
        end

-       def create
+       def create(_schema: nil)
          params = {
            "TYPE" => @float64 ? "FLOAT64" : "FLOAT32",
            "DIM" => @dimensions,
@@ -59,82 +76,246 @@ module Neighbor
          command.push("ON", "JSON") if @json
          command.push("PREFIX", "1", @prefix, "SCHEMA")
          command.push("$.v", "AS") if @json
-         command.push("v", "VECTOR", @algorithm, params.size * 2, params)
-         ft_command { redis.call(*command) }
+         command.push("v", "VECTOR", @algorithm, params.size * 2)
+         params.each do |k, v|
+           command.push(k, v)
+         end
+
+         (_schema || {}).each do |k, v|
+           k = k.to_s
+           # TODO improve
+           if k == "v" || !k.match?(/\A\w+\z/)
+             raise ArgumentError, "invalid schema"
+           end
+           command.push("$.#{k}", "AS") if @json
+           command.push(k, v.to_s)
+           # TODO figure out how to handle separator for hashes
+           # command.push("SEPARATOR", "") if !@json
+         end
+
+         run_command(*command)
+         nil
+       rescue => e
+         raise Error, "RediSearch not installed" if e.message.include?("ERR unknown command 'FT.")
+         raise e
        end

        def exists?
-         redis.call("FT.INFO", @index_name)
+         run_command("FT.INFO", @index_name)
          true
        rescue ArgumentError
          # fix for invalid value for Float(): "-nan"
          true
        rescue => e
          message = e.message.downcase
-         raise unless message.include?("unknown index name") || message.include?("no such index")
+         raise e unless message.include?("unknown index name") || message.include?("no such index") || message.include?("not found")
          false
        end

-       def add(id, embedding)
-         add_all([id], [embedding])
+       def info
+         info = run_command("FT.INFO", @index_name)
+         if info.is_a?(Hash)
+           info
+         else
+           # for RESP2
+           info = hash_result(info)
+           ["index_definition", "gc_stats" ,"cursor_stats", "dialect_stats", "Index Errors"].each do |k|
+             info[k] = hash_result(info[k]) if info[k]
+           end
+           ["attributes", "field statistics"].each do |k|
+             info[k]&.map! { |v| hash_result(v) }
+           end
+           info["field statistics"]&.each do |v|
+             v["Index Errors"] = hash_result(v["Index Errors"]) if v["Index Errors"]
+           end
+           info
+         end
+       end
+
+       def count
+         info.fetch("num_docs").to_i
        end

-       def add_all(ids, embeddings)
-         ids = ids.to_a
-         embeddings = embeddings.to_a
+       def add(id, vector, metadata: nil)
+         add_all([id], [vector], metadata: metadata ? [metadata] : nil)[0]
+       end

-         raise ArgumentError, "different sizes" if ids.size != embeddings.size
+       def add_all(ids, vectors, metadata: nil)
+         # perform checks first to reduce chance of non-atomic updates
+         ids = ids.to_a.map { |v| item_id(v) }
+         vectors = vectors.to_a
+         metadata = metadata.to_a if metadata

-         embeddings.each { |e| check_dimensions(e) }
+         raise ArgumentError, "different sizes" if ids.size != vectors.size

-         redis.pipelined do |pipeline|
-           ids.zip(embeddings).each do |id, embedding|
-             if @json
-               pipeline.call("JSON.SET", item_key(id), "$", JSON.generate({v: embedding}))
-             else
-               pipeline.call("HSET", item_key(id), {v: to_binary(embedding)})
-             end
+         vectors.each { |e| check_dimensions(e) }
+
+         if metadata
+           raise ArgumentError, "different sizes" if metadata.size != ids.size
+
+           metadata = metadata.map { |v| v&.transform_keys(&:to_s) }
+           if metadata.any? { |v| v&.key?("v") }
+             # TODO improve
+             raise ArgumentError, "invalid metadata"
            end
          end
+
+         result =
+           client.pipelined do |pipeline|
+             ids.zip(vectors).each_with_index do |(id, vector), i|
+               attributes = metadata && metadata[i] || {}
+               if @json
+                 pipeline.call("JSON.SET", item_key(id), "$", JSON.generate(attributes.merge({"v" => vector})))
+               else
+                 pipeline.call("HSET", item_key(id), attributes.merge({"v" => to_binary(vector)}))
+               end
+             end
+           end
+         result.map { |v| v.is_a?(String) ? v == "OK" : v > 0 }
        end

-       def remove(id)
-         remove_all([id])
+       def member?(id)
+         key = item_key(id)
+
+         run_command("EXISTS", key) == 1
        end
+       alias_method :include?, :member?

-       def remove_all(ids)
-         redis.call("DEL", ids.map { |id| item_key(id) })
+       def remove(id)
+         remove_all([id]) == 1
        end

-       def search(embedding, count: 5)
-         check_dimensions(embedding)
+       def remove_all(ids)
+         keys = ids.to_a.map { |id| item_key(id) }

-         search_by_blob(to_binary(embedding), count)
+         run_command("DEL", *keys).to_i
        end

        def find(id)
+         key = item_key(id)
+
          if @json
-           s = redis.call("JSON.GET", item_key(id), "$.v")
+           s = run_command("JSON.GET", key, "$.v")
            JSON.parse(s)[0] if s
          else
-           from_binary(redis.call("HGET", item_key(id), "v"))
+           s = run_command("HGET", key, "v")
+           from_binary(s) if s
+         end
+       end
+
+       def metadata(id)
+         key = item_key(id)
+
+         if @json
+           v = run_command("JSON.GET", key)
+           JSON.parse(v).except("v") if v
+         else
+           v = hash_result(run_command("HGETALL", key))
+           v.except("v") if v.any?
+         end
+       end
+
+       def set_metadata(id, metadata)
+         key = item_key(id)
+
+         # TODO DRY with add_all
+         metadata = metadata.transform_keys(&:to_s)
+         raise ArgumentError, "invalid metadata" if metadata.key?("v")
+
+         if @json
+           # TODO use WATCH
+           keys = run_command("JSON.OBJKEYS", key)
+           return false unless keys
+
+           keys.each do |k|
+             next if k == "v"
+
+             # safe to modify in-place
+             metadata[k] = nil unless metadata.key?(k)
+           end
+
+           run_command("JSON.MERGE", key, "$", JSON.generate(metadata)) == "OK"
+         else
+           # TODO use WATCH
+           fields = run_command("HKEYS", key)
+           return false if fields.empty?
+
+           fields.delete("v")
+           if fields.any?
+             # TODO use MULTI
+             run_command("HDEL", key, *fields)
+           end
+
+           if metadata.any?
+             args = []
+             metadata.each do |k, v|
+               args.push(k, v)
+             end
+             run_command("HSET", key, *args) > 0
+           else
+             true
+           end
+         end
+       end
+
+       def remove_metadata(id)
+         key = item_key(id)
+
+         if @json
+           # TODO use WATCH
+           keys = run_command("JSON.OBJKEYS", key)
+           return false unless keys
+
+           keys.delete("v")
+           if keys.any?
+             # merge with null deletes key
+             run_command("JSON.MERGE", key, "$", JSON.generate(keys.to_h { |k| [k, nil] })) == "OK"
+           else
+             true
+           end
+         else
+           # TODO use WATCH
+           fields = run_command("HKEYS", key)
+           return false if fields.empty?
+
+           fields.delete("v")
+           if fields.any?
+             run_command("HDEL", key, *fields) > 0
+           else
+             true
+           end
          end
        end

-       def nearest(id, count: 5)
-         embedding =
+       def search(vector, count: 5, with_metadata: false, _filter: nil)
+         check_dimensions(vector)
+
+         search_command(to_binary(vector), count, with_metadata:, _filter:)
+       end
+
+       def search_id(id, count: 5, with_metadata: false, _filter: nil)
+         id = item_id(id)
+         key = item_key(id)
+
+         vector =
            if @json
-             s = redis.call("JSON.GET", item_key(id), "$.v")
+             s = run_command("JSON.GET", key, "$.v")
              to_binary(JSON.parse(s)[0]) if s
            else
-             redis.call("HGET", item_key(id), "v")
+             run_command("HGET", key, "v")
            end

-         unless embedding
+         unless vector
            raise Error, "Could not find item #{id}"
          end

-         search_by_blob(embedding, count + 1).reject { |v| v[:id] == id.to_s }.first(count)
+         search_command(vector, count + 1, with_metadata:, _filter:).reject { |v| v[:id] == id }.first(count)
+       end
+       alias_method :nearest, :search_id
+
+       def promote(alias_name)
+         run_command("FT.ALIASUPDATE", index_name(alias_name), @index_name)
+         nil
        end

        def drop
@@ -142,63 +323,81 @@ module Neighbor
          drop_keys
        end

-       def promote(alias_name)
-         redis.call("FT.ALIASUPDATE", index_name(alias_name), @index_name)
-       end
-
        private

        def index_name(name)
          if name.include?(":")
-           raise ArgumentError, "Invalid name"
+           raise ArgumentError, "invalid name"
          end

          "neighbor-idx-#{name}"
        end

-       def check_dimensions(embedding)
-         if embedding.size != @dimensions
+       def check_dimensions(vector)
+         if vector.size != @dimensions
            raise ArgumentError, "expected #{@dimensions} dimensions"
          end
        end

        def item_key(id)
-         "#{@prefix}#{id}"
+         "#{@prefix}#{item_id(id)}"
+       end
+
+       def item_id(id)
+         @int_ids ? Integer(id) : id.to_s
        end

-       def search_by_blob(blob, count)
-         resp = redis.call("FT.SEARCH", @index_name, "*=>[KNN #{count.to_i} @v $BLOB]", "PARAMS", "2", "BLOB", blob, "SORTBY", "__v_score", "DIALECT", "2")
-         resp.is_a?(Hash) ? parse_results_hash(resp) : parse_results_array(resp)
+       def search_command(blob, count, with_metadata:, _filter:)
+         filter = _filter ? "(#{_filter})" : "*"
+         return_args = with_metadata ? [] : ["RETURN", 1, "__v_score"]
+         resp = run_command("FT.SEARCH", @index_name, "#{filter}=>[KNN #{count.to_i} @v $BLOB AS __v_score]", "PARAMS", "2", "BLOB", blob, *search_sort_args, *return_args, "DIALECT", "2")
+         if resp.is_a?(Hash)
+           parse_results_hash(resp, with_metadata:)
+         else
+           parse_results_array(resp, with_metadata:)
+         end
        end

-       def parse_results_hash(resp)
+       def search_sort_args
+         @search_sort_args ||= Redis.server_type == :valkey ? [] : ["SORTBY", "__v_score"]
+       end
+
+       def parse_results_hash(resp, with_metadata:)
          prefix_length = nil
          resp["results"].map do |result|
            key = result["id"]
            info = result["extra_attributes"]
            prefix_length ||= find_prefix_length(key)
-           search_result(key, info, prefix_length)
+           search_result(key, info, prefix_length, with_metadata:)
          end
        end

-       def parse_results_array(resp)
+       def parse_results_array(resp, with_metadata:)
          prefix_length = nil
          resp.shift.times.map do |i|
            key, info = resp.shift(2)
-           info = info.each_slice(2).to_h
+           info = info.each_slice(2).to_h unless info.is_a?(Hash)
            prefix_length ||= find_prefix_length(key)
-           search_result(key, info, prefix_length)
+           search_result(key, info, prefix_length, with_metadata:)
          end
        end

-       def search_result(key, info, prefix_length)
+       def search_result(key, info, prefix_length, with_metadata:)
          score = info["__v_score"].to_f
          distance = calculate_distance(score)

-         {
-           id: key[prefix_length..-1],
+         result = {
+           id: item_id(key[prefix_length..-1]),
            distance: distance
          }
+         if with_metadata
+           if @json
+             result[:metadata] = JSON.parse(info["$"]).except("v")
+           else
+             result[:metadata] = info.except("v", "__v_score")
+           end
+         end
+         result
        end

        def calculate_distance(score)
@@ -218,19 +417,19 @@ module Neighbor
        end

        def drop_index
-         redis.call("FT.DROPINDEX", @index_name)
+         run_command("FT.DROPINDEX", @index_name)
        end

        def drop_keys
          cursor = 0
          begin
-           cursor, keys = redis.call("SCAN", cursor, "MATCH", "#{@prefix}*", "COUNT", 100)
-           redis.call("DEL", keys) if keys.any?
+           cursor, keys = run_command("SCAN", cursor, "MATCH", "#{@prefix}*", "COUNT", 100)
+           run_command("DEL", *keys) if keys.any?
          end while cursor != "0"
        end

-       def to_binary(embedding)
-         embedding.to_a.pack(pack_format)
+       def to_binary(vector)
+         vector.to_a.pack(pack_format)
        end

        def from_binary(s)
@@ -241,15 +440,18 @@ module Neighbor
          @pack_format ||= @float64 ? "d#{@dimensions}" : "f#{@dimensions}"
        end

-       # just use for create for now
-       def ft_command
-         yield
-       rescue => e
-         raise Error, "RediSearch not installed" if e.message.include?("ERR unknown command 'FT.")
-         raise
+       def hash_result(result)
+         result.is_a?(Array) ? result.each_slice(2).to_h : result
+       end
+
+       def run_command(*args)
+         if args.any? { |v| !(v.is_a?(String) || v.is_a?(Numeric)) }
+           raise TypeError, "unexpected argument type"
+         end
+         client.call(*args)
        end

-       def redis
+       def client
          Redis.client
        end
      end
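In this file, `create` now pushes each parameter key and value onto the command individually, and the new `run_command` rejects any argument that is not a `String` or `Numeric`. A minimal sketch of how those two changes fit together is shown below. The helper names `vector_field_args` and `check_command_args` are hypothetical, and nothing is sent to a server.

```ruby
# Hypothetical stand-in for the argument check in run_command (no Redis call is made).
def check_command_args(*args)
  if args.any? { |v| !(v.is_a?(String) || v.is_a?(Numeric)) }
    raise TypeError, "unexpected argument type"
  end
  args
end

# Flattens a params hash into alternating key/value arguments,
# in the same way the updated create method builds the vector field.
def vector_field_args(algorithm, params)
  command = ["v", "VECTOR", algorithm, params.size * 2]
  params.each do |k, v|
    command.push(k, v)
  end
  command
end

params = {"TYPE" => "FLOAT32", "DIM" => 3, "DISTANCE_METRIC" => "L2"}

# The flattened form contains only strings and numbers, so it passes the check.
p check_command_args(*vector_field_args("HNSW", params))

# Passing the Hash itself, as the old code did, is rejected by the check.
begin
  check_command_args("v", "VECTOR", "HNSW", params.size * 2, params)
rescue TypeError => e
  puts e.message
end
```

Flattening up front means the client never has to interpret a nested Hash argument, which appears to be the motivation for the stricter check.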
data/lib/neighbor/redis/svs_vamana_index.rb ADDED
@@ -0,0 +1,41 @@
+ module Neighbor
+   module Redis
+     class SvsVamanaIndex < Index
+       def initialize(
+         *args,
+         compression: nil,
+         construction_window_size: nil,
+         graph_max_degree: nil,
+         search_window_size: nil,
+         epsilon: nil,
+         training_threshold: nil,
+         reduce: nil,
+         **options
+       )
+         super(*args, **options)
+         @algorithm = "SVS-VAMANA"
+         @compression = compression
+         @construction_window_size = construction_window_size
+         @graph_max_degree = graph_max_degree
+         @search_window_size = search_window_size
+         @epsilon = epsilon
+         @training_threshold = training_threshold
+         @reduce = reduce
+       end
+
+       private
+
+       def create_params
+         params = {}
+         params["COMPRESSION"] = @compression if @compression
+         params["CONSTRUCTION_WINDOW_SIZE"] = @construction_window_size if @construction_window_size
+         params["GRAPH_MAX_DEGREE"] = @graph_max_degree if @graph_max_degree
+         params["SEARCH_WINDOW_SIZE"] = @search_window_size if @search_window_size
+         params["EPSILON"] = @epsilon if @epsilon
+         params["TRAINING_THRESHOLD"] = @training_threshold if @training_threshold
+         params["REDUCE"] = @reduce if @reduce
+         params
+       end
+     end
+   end
+ end
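`create_params` in the new SVS Vamana class only emits options that were actually set, so unset options fall back to whatever the server defaults are. A minimal sketch of that pattern as a free function is shown here; `SVS_OPTION_NAMES` and `svs_create_params` are hypothetical names used only for this illustration.

```ruby
# Hypothetical mapping from Ruby option names to the parameter names used in the diff.
SVS_OPTION_NAMES = {
  compression: "COMPRESSION",
  construction_window_size: "CONSTRUCTION_WINDOW_SIZE",
  graph_max_degree: "GRAPH_MAX_DEGREE",
  search_window_size: "SEARCH_WINDOW_SIZE",
  epsilon: "EPSILON",
  training_threshold: "TRAINING_THRESHOLD",
  reduce: "REDUCE"
}.freeze

# Builds a params hash that skips nil (unset) options.
def svs_create_params(**options)
  params = {}
  SVS_OPTION_NAMES.each do |key, redis_name|
    value = options[key]
    params[redis_name] = value if value
  end
  params
end

p svs_create_params(graph_max_degree: 32, search_window_size: 10, compression: nil)
```

Only the two options with values end up in the result, which keeps the generated command short and leaves the rest to the server.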
data/lib/neighbor/redis/vector_set.rb ADDED
@@ -0,0 +1,265 @@
+ module Neighbor
+   module Redis
+     class VectorSet
+       NO_DEFAULT = Object.new
+
+       def initialize(
+         name,
+         m: nil,
+         ef_construction: nil,
+         ef_search: nil,
+         epsilon: nil,
+         quantization: nil,
+         reduce: nil,
+         id_type: "string"
+       )
+         name = name.to_str
+         if name.include?(":")
+           raise ArgumentError, "invalid name"
+         end
+
+         @name = name
+         @m = m&.to_i
+         @ef_construction = ef_construction&.to_i
+         @ef_search = ef_search&.to_i
+         @epsilon = epsilon&.to_f
+
+         @quant_type =
+           case quantization&.to_s
+           when nil
+             "NOQUANT"
+           when "binary"
+             "BIN"
+           when "int8"
+             "Q8"
+           else
+             raise ArgumentError, "invalid quantization"
+           end
+
+         case id_type.to_s
+         when "string", "integer"
+           @int_ids = id_type == "integer"
+         else
+           raise ArgumentError, "invalid id_type"
+         end
+
+         @reduce_args = []
+         @reduce_args.push("REDUCE", reduce.to_i) if reduce
+
+         @add_args = []
+         @add_args.push("M", @m) if @m
+         @add_args.push("EF", @ef_construction) if @ef_construction
+       end
+
+       def exists?
+         !run_command("VINFO", key).nil?
+       end
+
+       def info
+         hash_result(run_command("VINFO", key))
+       end
+
+       def dimensions
+         run_command("VDIM", key)
+       rescue => e
+         raise e unless e.message.include?("key does not exist")
+         nil
+       end
+
+       def count
+         run_command("VCARD", key)
+       end
+
+       def add(id, vector, metadata: nil)
+         add_all([id], [vector], metadata: metadata ? [metadata] : nil)[0]
+       end
+
+       def add_all(ids, vectors, metadata: nil)
+         # perform checks first to reduce chance of non-atomic updates
+         ids = ids.to_a.map { |v| item_id(v) }
+         vectors = vectors.to_a
+         metadata = metadata.to_a if metadata
+
+         raise ArgumentError, "different sizes" if ids.size != vectors.size
+
+         if vectors.size > 1
+           dimensions = vectors.first.size
+           unless vectors.all? { |v| v.size == dimensions }
+             raise ArgumentError, "different dimensions"
+           end
+         end
+
+         if metadata
+           raise ArgumentError, "different sizes" if metadata.size != ids.size
+         end
+
+         result =
+           client.pipelined do |pipeline|
+             ids.zip(vectors).each_with_index do |(id, vector), i|
+               attributes = metadata[i] if metadata
+               attribute_args = []
+               attribute_args.push("SETATTR", JSON.generate(attributes)) if attributes
+               pipeline.call("VADD", key, *@reduce_args, "FP32", to_binary(vector), id, @quant_type, *attribute_args, *@add_args)
+             end
+           end
+         result.map { |v| bool_result(v) }
+       end
+
+       def member?(id)
+         id = item_id(id)
+
+         bool_result(run_command("VISMEMBER", key, id))
+       end
+       alias_method :include?, :member?
+
+       def remove(id)
+         id = item_id(id)
+
+         bool_result(run_command("VREM", key, id))
+       end
+
+       def remove_all(ids)
+         # perform checks first to reduce chance of non-atomic updates
+         ids = ids.to_a.map { |v| item_id(v) }
+
+         result =
+           client.pipelined do |pipeline|
+             ids.each do |id|
+               pipeline.call("VREM", key, id)
+             end
+           end
+         result.map { |v| bool_result(v) }
+       end
+
+       def find(id)
+         id = item_id(id)
+
+         run_command("VEMB", key, id)&.map(&:to_f)
+       end
+
+       def metadata(id)
+         id = item_id(id)
+
+         a = run_command("VGETATTR", key, id)
+         a ? JSON.parse(a) : nil
+       end
+
+       def set_metadata(id, metadata)
+         id = item_id(id)
+
+         bool_result(run_command("VSETATTR", key, id, JSON.generate(metadata)))
+       end
+
+       def remove_metadata(id)
+         id = item_id(id)
+
+         bool_result(run_command("VSETATTR", key, id, ""))
+       end
+
+       def search(vector, count: 5, with_metadata: false, ef_search: nil, exact: false, _filter: nil, _ef_filter: nil)
+         count = count.to_i
+
+         search_command(["FP32", to_binary(vector)], count:, with_metadata:, ef_search:, exact:, _filter:, _ef_filter:).map do |k, v|
+           search_result(k, v, with_metadata:)
+         end
+       end
+
+       def search_id(id, count: 5, with_metadata: false, ef_search: nil, exact: false, _filter: nil, _ef_filter: nil)
+         id = item_id(id)
+         count = count.to_i
+
+         result =
+           search_command(["ELE", id], count: count + 1, with_metadata:, ef_search:, exact:, _filter:, _ef_filter:).filter_map do |k, v|
+             if k != id.to_s
+               search_result(k, v, with_metadata:)
+             end
+           end
+         result.first(count)
+       end
+       alias_method :nearest, :search_id
+
+       def links(id)
+         id = item_id(id)
+
+         run_command("VLINKS", key, id, "WITHSCORES")&.map do |links|
+           hash_result(links).map do |k, v|
+             search_result(k, v)
+           end
+         end
+       end
+
+       def sample(n = NO_DEFAULT)
+         count = n == NO_DEFAULT ? 1 : n.to_i
+
+         result = run_command("VRANDMEMBER", key, count).map { |v| item_id(v) }
+         n == NO_DEFAULT ? result.first : result
+       end
+
+       def drop
+         bool_result(run_command("DEL", key))
+       end
+
+       private
+
+       def key
+         "neighbor:vs:#{@name}"
+       end
+
+       def item_id(id)
+         @int_ids ? Integer(id) : id.to_s
+       end
+
+       def to_binary(vector)
+         vector.pack("e*")
+       end
+
+       def search_command(args, count:, with_metadata:, ef_search:, exact:, _filter:, _ef_filter:)
+         ef_search = @ef_search if ef_search.nil?
+
+         args << "WITHATTRIBS" if with_metadata
+         args.push("EF", ef_search) if ef_search
+         args.push("EPSILON", @epsilon) if @epsilon
+         args.push("FILTER", _filter) if _filter
+         args.push("FILTER_EF", _ef_filter) if _ef_filter
+         args << "TRUTH" if exact
+
+         result = run_command("VSIM", key, *args, "WITHSCORES", "COUNT", count)
+         if result.is_a?(Array)
+           if with_metadata
+             result.each_slice(3).to_h { |v| [v[0], v[1..]] }
+           else
+             hash_result(result)
+           end
+         else
+           result
+         end
+       end
+
+       def search_result(k, v, with_metadata: false)
+         v, a = v if with_metadata
+         value = {id: item_id(k), distance: 2 * (1 - v.to_f)}
+         value.merge!(metadata: a ? JSON.parse(a) : {}) if with_metadata
+         value
+       end
+
+       def hash_result(result)
+         result.is_a?(Array) ? result.each_slice(2).to_h : result
+       end
+
+       def bool_result(result)
+         result == true || result == 1
+       end
+
+       def run_command(*args)
+         if args.any? { |v| !(v.is_a?(String) || v.is_a?(Numeric)) }
+           raise TypeError, "unexpected argument type"
+         end
+         client.call(*args)
+       end
+
+       def client
+         Redis.client
+       end
+     end
+   end
+ end
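The private `search_command` and `search_result` methods above turn a flat array reply into result hashes and convert each score with `2 * (1 - score)`. The sketch below reproduces that parsing for a reply that carries attributes, assuming the reply is a flat list of element, score, attributes triples and that a score of 1 means identical vectors. The helper name `parse_scored_reply` and the sample reply are hypothetical.

```ruby
require "json"

# Hypothetical helper: parses a flat reply of [element, score, attributes, ...] triples
# and converts each similarity score into a distance via 2 * (1 - score).
def parse_scored_reply(reply)
  reply.each_slice(3).map do |id, score, attributes|
    {
      id: id,
      distance: 2 * (1 - score.to_f),
      # A missing attributes entry becomes an empty hash, as in the diff.
      metadata: attributes ? JSON.parse(attributes) : {}
    }
  end
end

# Made-up reply for illustration; no Redis server is contacted.
reply = ["1", "1", nil, "3", "0.5", '{"category":"A"}']
p parse_scored_reply(reply)
```

Under the stated assumption about scores, an identical vector maps to distance 0 and a score of 0.5 maps to distance 1, which matches the 0 to 2 range of cosine distance.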
data/lib/neighbor/redis/version.rb CHANGED
@@ -1,5 +1,5 @@
  module Neighbor
    module Redis
-     VERSION = "0.2.1"
+     VERSION = "0.3.0"
    end
  end
data/lib/neighbor/redis.rb CHANGED
@@ -1,10 +1,15 @@
  # dependencies
  require "redis-client"

+ # stdlib
+ require "json"
+
  # modules
  require_relative "redis/index"
  require_relative "redis/flat_index"
  require_relative "redis/hnsw_index"
+ require_relative "redis/svs_vamana_index"
+ require_relative "redis/vector_set"
  require_relative "redis/version"

  module Neighbor
@@ -13,6 +18,21 @@ module Neighbor

      class << self
        attr_accessor :client
+
+       def server_type
+         unless defined?(@server_type)
+           info = client.call("INFO")
+           @server_type =
+             if info.include?("valkey_version")
+               :valkey
+             elsif info.include?("dragonfly_version")
+               :dragonfly
+             else
+               :redis
+             end
+         end
+         @server_type
+       end
      end
    end
  end
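The new `server_type` method classifies the server by looking for marker strings in the `INFO` output and memoizes the result. A minimal sketch of the classification step as a pure function follows; `server_type_from_info` is a hypothetical name and the sample `INFO` snippets are made up for illustration.

```ruby
# Hypothetical pure-function version of the detection logic (no server call).
def server_type_from_info(info)
  if info.include?("valkey_version")
    :valkey
  elsif info.include?("dragonfly_version")
    :dragonfly
  else
    # Anything without a known marker is treated as Redis.
    :redis
  end
end

# Made-up INFO snippets; real output contains many more fields.
p server_type_from_info("# Server\r\nredis_version:8.2.0\r\n")
p server_type_from_info("# Server\r\nredis_version:7.2.4\r\nvalkey_version:8.1.0\r\n")
p server_type_from_info("# Server\r\ndragonfly_version:df-v1.30.0\r\n")
```

Checking the more specific markers first matters if a compatible server also reports a `redis_version` field, which is an assumption in the second sample above.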
metadata CHANGED
@@ -1,7 +1,7 @@
  --- !ruby/object:Gem::Specification
  name: neighbor-redis
  version: !ruby/object:Gem::Version
-   version: 0.2.1
+   version: 0.3.0
  platform: ruby
  authors:
  - Andrew Kane
@@ -36,6 +36,8 @@ files:
  - lib/neighbor/redis/flat_index.rb
  - lib/neighbor/redis/hnsw_index.rb
  - lib/neighbor/redis/index.rb
+ - lib/neighbor/redis/svs_vamana_index.rb
+ - lib/neighbor/redis/vector_set.rb
  - lib/neighbor/redis/version.rb
  homepage: https://github.com/ankane/neighbor-redis
  licenses:
@@ -48,14 +50,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
-       version: '3.1'
+       version: '3.2'
  required_rubygems_version: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubygems_version: 3.6.7
+ rubygems_version: 3.6.9
  specification_version: 4
  summary: Nearest neighbor search for Ruby and Redis
  test_files: []