neighbor-redis 0.2.0 → 0.3.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 15f6a05f33831afcb282262440f8a4ccd73264c109ada3d5de1383af76cab304
-   data.tar.gz: 977b93e67d4b62db75837bf72dff4959a5b6aae8c940828273bee22563f172ad
+   metadata.gz: 94d78e38cc8f93df780741fe7c08964657e9f965ca746ffeb9b80fa5950663f6
+   data.tar.gz: 8c4ed1a76383f1a7f0d99712cce6b5d835472eabf48357ec1b7b9044f212e8ff
  SHA512:
-   metadata.gz: f6148b2717ddbb7cef8e9d762866b3c0d981be8e57e6a9e7c8a28f3a37b8402b5add0383e2676bc21b1c93dd687e10ae2fdaf9bfb31969adc80d7fb43def01a4
-   data.tar.gz: e7bc98f4436a8e48c84215e837461312bf0d610e9f5103a8d6f0ce39aa800970f5adb4c341ff691e21a059d30e4540eda4ef1df3d17f7d313923e6799860beb3
+   metadata.gz: 83d4415cd858429b265e764f09fb34df4492a0fa93d94d6a766688a8e2f3a98ae9850102ca684322838a898c7c2fb430aa7fbc0082564679b775d7dce79d7e18
+   data.tar.gz: 8dd3a10a9fbaf716df69cc166cd4703e2a0c9489a3cddc58816e4ef17a0f0a5be7fe16da680553b4884606eaddc9e62fdbdea572b6891e44ed54da7335187018
data/CHANGELOG.md CHANGED
@@ -1,3 +1,18 @@
+ ## 0.3.0 (2025-09-12)
+
+ - Added support for vector sets
+ - Added support for SVS Vamana indexes
+ - Added support for metadata
+ - Added `info` and `count` methods to indexes
+ - Updated `add` and `remove` methods to return boolean
+ - Updated `add_all` method to return array of booleans
+ - Updated `create` and `promote` methods to return `nil`
+ - Dropped support for Ruby < 3.2
+
+ ## 0.2.1 (2025-05-06)
+
+ - Added support for Redis 8
+
  ## 0.2.0 (2024-10-23)

  - Dropped support for Ruby < 3.1
data/LICENSE.txt CHANGED
@@ -1,6 +1,6 @@
  The MIT License (MIT)

- Copyright (c) 2023 Andrew Kane
+ Copyright (c) 2023-2025 Andrew Kane

  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -2,14 +2,16 @@

  Nearest neighbor search for Ruby and Redis

+ Supports Redis 8 [vector sets](https://redis.io/docs/latest/develop/data-types/vector-sets/) and RediSearch [vector indexes](https://redis.io/docs/latest/develop/ai/search-and-query/vectors/)
+
  [![Build Status](https://github.com/ankane/neighbor-redis/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/neighbor-redis/actions)

  ## Installation

- First, [install RediSearch](https://redis.io/docs/stack/search/quick_start/). With Docker, use:
+ First, install Redis. With Docker, use:

  ```sh
- docker run -p 6379:6379 redis/redis-stack-server
+ docker run -p 6379:6379 redis:8
  ```

  Add this line to your application’s Gemfile:
@@ -29,11 +31,10 @@ Neighbor::Redis.client = RedisClient.config.new_pool
  Create an index

  ```ruby
- index = Neighbor::Redis::HNSWIndex.new("items", dimensions: 3, distance: "l2")
- index.create
+ index = Neighbor::Redis::VectorSet.new("items")
  ```

- Add items
+ Add vectors

  ```ruby
  index.add(1, [1, 1, 1])
@@ -41,105 +42,196 @@ index.add(2, [2, 2, 2])
  index.add(3, [1, 1, 2])
  ```

- Note: IDs are stored and returned as strings (uses less total memory)
+ Search for nearest neighbors to a vector

- Get the nearest neighbors to an item
+ ```ruby
+ index.search([1, 1, 1], count: 5)
+ ```
+
+ Search for nearest neighbors to a vector in the index

  ```ruby
- index.nearest(1, count: 5)
+ index.search_id(1, count: 5)
  ```

- Get the nearest neighbors to a vector
+ IDs are treated as strings by default, but can also be treated as integers

  ```ruby
- index.search([1, 1, 1], count: 5)
+ Neighbor::Redis::VectorSet.new("items", id_type: "integer")
  ```

- ## Distance
+ ## Operations

- Supported values are:
+ Add or update a vector

- - `l2`
- - `inner_product`
- - `cosine`
+ ```ruby
+ index.add(id, vector)
+ ```

- ## Index Types
+ Add or update multiple vectors
+
+ ```ruby
+ index.add_all(ids, vectors)
+ ```

- Hierarchical Navigable Small World (HNSW)
+ Get a vector

  ```ruby
- Neighbor::Redis::HNSWIndex.new(
-   name,
-   initial_cap: nil,
-   m: 16,
-   ef_construction: 200,
-   ef_runtime: 10,
-   epsilon: 0.01
- )
+ index.find(id)
  ```

- Flat
+ Remove a vector

  ```ruby
- Neighbor::Redis::FlatIndex.new(
-   name,
-   initial_cap: nil,
-   block_size: 1024
- )
+ index.remove(id)
  ```

- ## Additional Options
+ Remove multiple vectors

- Store vectors as double precision (instead of single precision)
+ ```ruby
+ index.remove_all(ids)
+ ```
+
+ Count vectors

  ```ruby
- Neighbor::Redis::HNSWIndex.new(name, type: "float64")
+ index.count
  ```

- Store vectors as JSON (instead of a hash/blob)
+ ## Metadata
+
+ Add a vector with metadata

  ```ruby
- Neighbor::Redis::HNSWIndex.new(name, redis_type: "json")
+ index.add(id, vector, metadata: {category: "A"})
  ```

- ## Changing Options
+ Add multiple vectors with metadata
+
+ ```ruby
+ index.add_all(ids, vectors, metadata: [{category: "A"}, {category: "B"}, ...])
+ ```

- Create a new index to change any index options
+ Get metadata for a vector

  ```ruby
- Neighbor::Redis::HNSWIndex.new("items-v2", **new_options)
+ index.metadata(id)
  ```

- ## Additional Operations
+ Get metadata with search results
+
+ ```ruby
+ index.search(vector, with_metadata: true)
+ ```

- Add multiple items
+ Set metadata

  ```ruby
- index.add_all(ids, embeddings)
+ index.set_metadata(id, {category: "B"})
  ```

- Get an item
+ Remove metadata

  ```ruby
- index.find(id)
+ index.remove_metadata(id)
+ ```
+
+ ## Index Types
+
+ [Vector sets](#vector-sets)
+
+ - use cosine distance
+ - use single-precision floats
+ - support exact and approximate search
+ - support quantization and dimensionality reduction
+
+ [Vector indexes](#vector-indexes)
+
+ - support L2, inner product, and cosine distance
+ - support single or double-precision floats
+ - support either exact (flat) or approximate (HNSW and SVS Vamana) search
+ - can support quantization and dimensionality reduction (SVS Vamana)
+ - require calling `create` before searching
+
+ ## Vector Sets
+
+ Create a vector set
+
+ ```ruby
+ Neighbor::Redis::VectorSet.new(name)
  ```

- Remove an item
+ Specify parameters

  ```ruby
- index.remove(id)
+ Neighbor::Redis::VectorSet.new(name, m: 16, ef_construction: 200, ef_search: 10)
  ```

- Remove multiple items
+ Use quantization (`int8` or `binary`)

  ```ruby
- index.remove_all(ids)
+ Neighbor::Redis::VectorSet.new(name, quantization: "int8")
  ```

- Drop the index
+ Use dimensionality reduction

  ```ruby
- index.drop
+ Neighbor::Redis::VectorSet.new(name, reduce: 2)
+ ```
+
+ Perform exact search
+
+ ```ruby
+ index.search(vector, exact: true)
+ ```
+
+ ## Vector Indexes
+
+ Create a vector index (`l2`, `inner_product`, or `cosine` distance)
+
+ ```ruby
+ index = Neighbor::Redis::HnswIndex.new(name, dimensions: 3, distance: "cosine")
+ index.create
+ ```
+
+ Store vectors as double precision (instead of single precision)
+
+ ```ruby
+ Neighbor::Redis::HnswIndex.new(name, type: "float64")
+ ```
+
+ Store vectors as JSON (instead of a hash/blob)
+
+ ```ruby
+ Neighbor::Redis::HnswIndex.new(name, redis_type: "json")
+ ```
+
+ ### Index Options
+
+ HNSW
+
+ ```ruby
+ Neighbor::Redis::HnswIndex.new(name, m: 16, ef_construction: 200, ef_search: 10)
+ ```
+
+ SVS Vamana - *Redis 8.2+*
+
+ ```ruby
+ Neighbor::Redis::SvsVamanaIndex.new(
+   name,
+   compression: nil,
+   construction_window_size: 200,
+   graph_max_degree: 32,
+   search_window_size: 10,
+   training_threshold: nil,
+   reduce: nil
+ )
+ ```
+
+ Flat
+
+ ```ruby
+ Neighbor::Redis::FlatIndex.new(name)
  ```

  ## Example
@@ -149,8 +241,7 @@ You can use Neighbor Redis for online item-based recommendations with [Disco](ht
  Create an index

  ```ruby
- index = Neighbor::Redis::HNSWIndex.new("movies", dimensions: 20, distance: "cosine")
- index.create
+ index = Neighbor::Redis::VectorSet.new("movies")
  ```

  Fit the recommender
@@ -170,14 +261,30 @@ index.add_all(recommender.item_ids, recommender.item_factors)
  And get similar movies

  ```ruby
- index.nearest("Star Wars (1977)").map { |v| v[:id] }
+ index.search_id("Star Wars (1977)").map { |v| v[:id] }
  ```

- See the [complete code](examples/disco_item_recs.rb)
+ See the complete code for [vector sets](examples/disco_item_recs_vs.rb) and [vector indexes](examples/disco_item_recs.rb)

  ## Reference

- - [Vector similarity](https://redis.io/docs/stack/search/reference/vectors/)
+ Get index info
+
+ ```ruby
+ index.info
+ ```
+
+ Check if an index exists
+
+ ```ruby
+ index.exists?
+ ```
+
+ Drop an index
+
+ ```ruby
+ index.drop
+ ```

  ## History

@@ -1,13 +1,22 @@
  module Neighbor
    module Redis
-     class HNSWIndex < Index
-       def initialize(*args, initial_cap: nil, m: nil, ef_construction: nil, ef_runtime: nil, epsilon: nil, **options)
+     class HnswIndex < Index
+       def initialize(
+         *args,
+         initial_cap: nil,
+         m: nil,
+         ef_construction: nil,
+         ef_search: nil,
+         ef_runtime: nil,
+         epsilon: nil,
+         **options
+       )
          super(*args, **options)
          @algorithm = "HNSW"
          @initial_cap = initial_cap
          @m = m
          @ef_construction = ef_construction
-         @ef_runtime = ef_runtime
+         @ef_runtime = ef_search || ef_runtime
          @epsilon = epsilon
        end

@@ -23,5 +32,7 @@ module Neighbor
          params
        end
      end
+
+     HNSWIndex = HnswIndex
    end
  end
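The two changes in `HnswIndex` can be tried without Redis. Below is a minimal sketch that uses a hypothetical `ToyHnswIndex` class, which is not part of the gem. It mirrors the `@ef_runtime = ef_search || ef_runtime` precedence and the `HNSWIndex = HnswIndex` constant alias that keeps the 0.2.0 spelling working.

```ruby
# Hypothetical toy class; it mirrors only the keyword handling in HnswIndex#initialize.
class ToyHnswIndex
  attr_reader :ef_runtime

  def initialize(ef_search: nil, ef_runtime: nil)
    # ef_search (new in 0.3.0) takes precedence; ef_runtime is the fallback
    @ef_runtime = ef_search || ef_runtime
  end
end

# Assigning the class to a second constant creates an alias, not a copy.
ToyHNSWIndex = ToyHnswIndex

p ToyHnswIndex.new(ef_search: 20, ef_runtime: 10).ef_runtime # => 20
p ToyHnswIndex.new(ef_runtime: 10).ef_runtime                # => 10
p ToyHNSWIndex.equal?(ToyHnswIndex)                          # => true
p ToyHNSWIndex.name                                          # => "ToyHnswIndex"
```

The alias refers to the same class object, so code written against `HNSWIndex` keeps working. Ruby names a class after the first constant it is assigned to, so `name` reports the new spelling.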
@@ -1,20 +1,27 @@
  module Neighbor
    module Redis
      class Index
-       def initialize(name, dimensions:, distance:, type: "float32", redis_type: "hash")
+       def initialize(name, dimensions:, distance:, type: "float32", redis_type: "hash", id_type: "string")
          @index_name = index_name(name)
          @global_prefix = "neighbor:items:"
          @prefix = "#{@global_prefix}#{name}:"

-         @dimensions = dimensions
+         @dimensions = dimensions.to_i

          unless distance.nil?
            @distance_metric =
              case distance.to_s
-             when "l2", "cosine"
-               distance.to_s.upcase
+             when "l2"
+               "L2"
              when "inner_product"
                "IP"
+             when "cosine"
+               if Redis.server_type == :dragonfly
+                 # uses inner product instead of cosine distance?
+                 raise ArgumentError, "unsupported distance"
+               else
+                 "COSINE"
+               end
              else
                raise ArgumentError, "invalid distance"
              end
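The rewritten `case` maps the gem's distance names to RediSearch `DISTANCE_METRIC` values. It now raises for `cosine` when `Redis.server_type` reports Dragonfly. The following is a standalone sketch of that mapping. `distance_metric` and its `server_type:` parameter are hypothetical stand-ins for the code inside `Index#initialize` and for `Redis.server_type`.

```ruby
# Hypothetical standalone version of the distance mapping in Index#initialize.
# server_type stands in for Redis.server_type, which the gem queries itself.
def distance_metric(distance, server_type: :redis)
  case distance.to_s
  when "l2"
    "L2"
  when "inner_product"
    "IP"
  when "cosine"
    # 0.3.0 raises here on Dragonfly; the source comment suggests it may
    # treat COSINE as inner product
    raise ArgumentError, "unsupported distance" if server_type == :dragonfly
    "COSINE"
  else
    raise ArgumentError, "invalid distance"
  end
end

p distance_metric(:l2)             # => "L2"
p distance_metric("inner_product") # => "IP"
begin
  distance_metric("cosine", server_type: :dragonfly)
rescue ArgumentError => e
  puts e.message # => unsupported distance
end
```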
@@ -40,15 +47,25 @@ module Neighbor
            else
              raise ArgumentError, "invalid redis_type"
            end
+
+         @int_ids =
+           case id_type.to_s
+           when "string"
+             false
+           when "integer"
+             true
+           else
+             raise ArgumentError, "invalid id_type"
+           end
        end

-       def self.create(...)
-         index = new(...)
-         index.create
+       def self.create(*args, _schema: nil, **options)
+         index = new(*args, **options)
+         index.create(_schema:)
          index
        end

-       def create
+       def create(_schema: nil)
          params = {
            "TYPE" => @float64 ? "FLOAT64" : "FLOAT32",
            "DIM" => @dimensions,
@@ -59,81 +76,246 @@ module Neighbor
          command.push("ON", "JSON") if @json
          command.push("PREFIX", "1", @prefix, "SCHEMA")
          command.push("$.v", "AS") if @json
-         command.push("v", "VECTOR", @algorithm, params.size * 2, params)
-         ft_command { redis.call(*command) }
+         command.push("v", "VECTOR", @algorithm, params.size * 2)
+         params.each do |k, v|
+           command.push(k, v)
+         end
+
+         (_schema || {}).each do |k, v|
+           k = k.to_s
+           # TODO improve
+           if k == "v" || !k.match?(/\A\w+\z/)
+             raise ArgumentError, "invalid schema"
+           end
+           command.push("$.#{k}", "AS") if @json
+           command.push(k, v.to_s)
+           # TODO figure out how to handle separator for hashes
+           # command.push("SEPARATOR", "") if !@json
+         end
+
+         run_command(*command)
+         nil
+       rescue => e
+         raise Error, "RediSearch not installed" if e.message.include?("ERR unknown command 'FT.")
+         raise e
        end

        def exists?
-         redis.call("FT.INFO", @index_name)
+         run_command("FT.INFO", @index_name)
          true
        rescue ArgumentError
          # fix for invalid value for Float(): "-nan"
          true
        rescue => e
-         raise unless e.message.downcase.include?("unknown index name")
+         message = e.message.downcase
+         raise e unless message.include?("unknown index name") || message.include?("no such index") || message.include?("not found")
          false
        end

-       def add(id, embedding)
-         add_all([id], [embedding])
+       def info
+         info = run_command("FT.INFO", @index_name)
+         if info.is_a?(Hash)
+           info
+         else
+           # for RESP2
+           info = hash_result(info)
+           ["index_definition", "gc_stats" ,"cursor_stats", "dialect_stats", "Index Errors"].each do |k|
+             info[k] = hash_result(info[k]) if info[k]
+           end
+           ["attributes", "field statistics"].each do |k|
+             info[k]&.map! { |v| hash_result(v) }
+           end
+           info["field statistics"]&.each do |v|
+             v["Index Errors"] = hash_result(v["Index Errors"]) if v["Index Errors"]
+           end
+           info
+         end
+       end
+
+       def count
+         info.fetch("num_docs").to_i
        end

-       def add_all(ids, embeddings)
-         ids = ids.to_a
-         embeddings = embeddings.to_a
+       def add(id, vector, metadata: nil)
+         add_all([id], [vector], metadata: metadata ? [metadata] : nil)[0]
+       end

-         raise ArgumentError, "different sizes" if ids.size != embeddings.size
+       def add_all(ids, vectors, metadata: nil)
+         # perform checks first to reduce chance of non-atomic updates
+         ids = ids.to_a.map { |v| item_id(v) }
+         vectors = vectors.to_a
+         metadata = metadata.to_a if metadata

-         embeddings.each { |e| check_dimensions(e) }
+         raise ArgumentError, "different sizes" if ids.size != vectors.size

-         redis.pipelined do |pipeline|
-           ids.zip(embeddings).each do |id, embedding|
-             if @json
-               pipeline.call("JSON.SET", item_key(id), "$", JSON.generate({v: embedding}))
-             else
-               pipeline.call("HSET", item_key(id), {v: to_binary(embedding)})
-             end
+         vectors.each { |e| check_dimensions(e) }
+
+         if metadata
+           raise ArgumentError, "different sizes" if metadata.size != ids.size
+
+           metadata = metadata.map { |v| v&.transform_keys(&:to_s) }
+           if metadata.any? { |v| v&.key?("v") }
+             # TODO improve
+             raise ArgumentError, "invalid metadata"
            end
          end
+
+         result =
+           client.pipelined do |pipeline|
+             ids.zip(vectors).each_with_index do |(id, vector), i|
+               attributes = metadata && metadata[i] || {}
+               if @json
+                 pipeline.call("JSON.SET", item_key(id), "$", JSON.generate(attributes.merge({"v" => vector})))
+               else
+                 pipeline.call("HSET", item_key(id), attributes.merge({"v" => to_binary(vector)}))
+               end
+             end
+           end
+         result.map { |v| v.is_a?(String) ? v == "OK" : v > 0 }
        end

-       def remove(id)
-         remove_all([id])
+       def member?(id)
+         key = item_key(id)
+
+         run_command("EXISTS", key) == 1
        end
+       alias_method :include?, :member?

-       def remove_all(ids)
-         redis.call("DEL", ids.map { |id| item_key(id) })
+       def remove(id)
+         remove_all([id]) == 1
        end

-       def search(embedding, count: 5)
-         check_dimensions(embedding)
+       def remove_all(ids)
+         keys = ids.to_a.map { |id| item_key(id) }

-         search_by_blob(to_binary(embedding), count)
+         run_command("DEL", *keys).to_i
        end

        def find(id)
+         key = item_key(id)
+
          if @json
-           s = redis.call("JSON.GET", item_key(id), "$.v")
+           s = run_command("JSON.GET", key, "$.v")
            JSON.parse(s)[0] if s
          else
-           from_binary(redis.call("HGET", item_key(id), "v"))
+           s = run_command("HGET", key, "v")
+           from_binary(s) if s
+         end
+       end
+
+       def metadata(id)
+         key = item_key(id)
+
+         if @json
+           v = run_command("JSON.GET", key)
+           JSON.parse(v).except("v") if v
+         else
+           v = hash_result(run_command("HGETALL", key))
+           v.except("v") if v.any?
+         end
+       end
+
+       def set_metadata(id, metadata)
+         key = item_key(id)
+
+         # TODO DRY with add_all
+         metadata = metadata.transform_keys(&:to_s)
+         raise ArgumentError, "invalid metadata" if metadata.key?("v")
+
+         if @json
+           # TODO use WATCH
+           keys = run_command("JSON.OBJKEYS", key)
+           return false unless keys
+
+           keys.each do |k|
+             next if k == "v"
+
+             # safe to modify in-place
+             metadata[k] = nil unless metadata.key?(k)
+           end
+
+           run_command("JSON.MERGE", key, "$", JSON.generate(metadata)) == "OK"
+         else
+           # TODO use WATCH
+           fields = run_command("HKEYS", key)
+           return false if fields.empty?
+
+           fields.delete("v")
+           if fields.any?
+             # TODO use MULTI
+             run_command("HDEL", key, *fields)
+           end
+
+           if metadata.any?
+             args = []
+             metadata.each do |k, v|
+               args.push(k, v)
+             end
+             run_command("HSET", key, *args) > 0
+           else
+             true
+           end
+         end
+       end
+
+       def remove_metadata(id)
+         key = item_key(id)
+
+         if @json
+           # TODO use WATCH
+           keys = run_command("JSON.OBJKEYS", key)
+           return false unless keys
+
+           keys.delete("v")
+           if keys.any?
+             # merge with null deletes key
+             run_command("JSON.MERGE", key, "$", JSON.generate(keys.to_h { |k| [k, nil] })) == "OK"
+           else
+             true
+           end
+         else
+           # TODO use WATCH
+           fields = run_command("HKEYS", key)
+           return false if fields.empty?
+
+           fields.delete("v")
+           if fields.any?
+             run_command("HDEL", key, *fields) > 0
+           else
+             true
+           end
          end
        end

-       def nearest(id, count: 5)
-         embedding =
+       def search(vector, count: 5, with_metadata: false, _filter: nil)
+         check_dimensions(vector)
+
+         search_command(to_binary(vector), count, with_metadata:, _filter:)
+       end
+
+       def search_id(id, count: 5, with_metadata: false, _filter: nil)
+         id = item_id(id)
+         key = item_key(id)
+
+         vector =
            if @json
-             s = redis.call("JSON.GET", item_key(id), "$.v")
+             s = run_command("JSON.GET", key, "$.v")
              to_binary(JSON.parse(s)[0]) if s
            else
-             redis.call("HGET", item_key(id), "v")
+             run_command("HGET", key, "v")
            end

-         unless embedding
+         unless vector
            raise Error, "Could not find item #{id}"
          end

-         search_by_blob(embedding, count + 1).reject { |v| v[:id] == id.to_s }.first(count)
+         search_command(vector, count + 1, with_metadata:, _filter:).reject { |v| v[:id] == id }.first(count)
+       end
+       alias_method :nearest, :search_id
+
+       def promote(alias_name)
+         run_command("FT.ALIASUPDATE", index_name(alias_name), @index_name)
+         nil
        end

        def drop
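`Index#add_all` stringifies metadata keys and rejects the reserved key `v`, the vector field. It then converts each pipelined reply into a boolean. `Index#remove_metadata` clears JSON metadata by sending `JSON.MERGE` a patch whose values are `null`. The sketch below replays those three steps offline. `normalize_metadata`, `to_booleans`, and `removal_patch` are hypothetical helper names. The replies are hand-written values of the kind `JSON.SET` (`"OK"`) and `HSET` (a field count) return.

```ruby
require "json"

# Hypothetical helpers mirroring Index#add_all and the JSON branch of
# Index#remove_metadata; no Redis connection is involved.

# Metadata keys are stringified; "v" is reserved for the vector field.
def normalize_metadata(metadata)
  metadata = metadata.map { |m| m&.transform_keys(&:to_s) }
  raise ArgumentError, "invalid metadata" if metadata.any? { |m| m&.key?("v") }
  metadata
end

# JSON.SET replies "OK"; HSET replies with the number of newly added fields.
def to_booleans(replies)
  replies.map { |v| v.is_a?(String) ? v == "OK" : v > 0 }
end

# JSON.MERGE treats a null value as "delete this key".
def removal_patch(keys)
  JSON.generate(keys.reject { |k| k == "v" }.to_h { |k| [k, nil] })
end

p normalize_metadata([{category: "A"}, nil]).first.keys # => ["category"]
p to_booleans(["OK", 2, 0])                             # => [true, true, false]
puts removal_patch(["v", "category", "year"])           # {"category":null,"year":null}
```

`HSET` counts only fields it creates. As a result, re-adding an existing ID with hash storage and the same fields appears to yield `false`, while the JSON path yields `true`.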
@@ -141,63 +323,81 @@ module Neighbor
          drop_keys
        end

-       def promote(alias_name)
-         redis.call("FT.ALIASUPDATE", index_name(alias_name), @index_name)
-       end
-
        private

        def index_name(name)
          if name.include?(":")
-           raise ArgumentError, "Invalid name"
+           raise ArgumentError, "invalid name"
          end

          "neighbor-idx-#{name}"
        end

-       def check_dimensions(embedding)
-         if embedding.size != @dimensions
+       def check_dimensions(vector)
+         if vector.size != @dimensions
            raise ArgumentError, "expected #{@dimensions} dimensions"
          end
        end

        def item_key(id)
-         "#{@prefix}#{id}"
+         "#{@prefix}#{item_id(id)}"
+       end
+
+       def item_id(id)
+         @int_ids ? Integer(id) : id.to_s
        end

-       def search_by_blob(blob, count)
-         resp = redis.call("FT.SEARCH", @index_name, "*=>[KNN #{count.to_i} @v $BLOB]", "PARAMS", "2", "BLOB", blob, "SORTBY", "__v_score", "DIALECT", "2")
-         resp.is_a?(Hash) ? parse_results_hash(resp) : parse_results_array(resp)
+       def search_command(blob, count, with_metadata:, _filter:)
+         filter = _filter ? "(#{_filter})" : "*"
+         return_args = with_metadata ? [] : ["RETURN", 1, "__v_score"]
+         resp = run_command("FT.SEARCH", @index_name, "#{filter}=>[KNN #{count.to_i} @v $BLOB AS __v_score]", "PARAMS", "2", "BLOB", blob, *search_sort_args, *return_args, "DIALECT", "2")
+         if resp.is_a?(Hash)
+           parse_results_hash(resp, with_metadata:)
+         else
+           parse_results_array(resp, with_metadata:)
+         end
        end

-       def parse_results_hash(resp)
+       def search_sort_args
+         @search_sort_args ||= Redis.server_type == :valkey ? [] : ["SORTBY", "__v_score"]
+       end
+
+       def parse_results_hash(resp, with_metadata:)
          prefix_length = nil
          resp["results"].map do |result|
            key = result["id"]
            info = result["extra_attributes"]
            prefix_length ||= find_prefix_length(key)
-           search_result(key, info, prefix_length)
+           search_result(key, info, prefix_length, with_metadata:)
          end
        end

-       def parse_results_array(resp)
+       def parse_results_array(resp, with_metadata:)
          prefix_length = nil
          resp.shift.times.map do |i|
            key, info = resp.shift(2)
-           info = info.each_slice(2).to_h
+           info = info.each_slice(2).to_h unless info.is_a?(Hash)
            prefix_length ||= find_prefix_length(key)
-           search_result(key, info, prefix_length)
+           search_result(key, info, prefix_length, with_metadata:)
          end
        end

-       def search_result(key, info, prefix_length)
+       def search_result(key, info, prefix_length, with_metadata:)
          score = info["__v_score"].to_f
          distance = calculate_distance(score)

-         {
-           id: key[prefix_length..-1],
+         result = {
+           id: item_id(key[prefix_length..-1]),
            distance: distance
          }
+         if with_metadata
+           if @json
+             result[:metadata] = JSON.parse(info["$"]).except("v")
+           else
+             result[:metadata] = info.except("v", "__v_score")
+           end
+         end
+         result
        end

        def calculate_distance(score)
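This hunk adds `item_id`. `item_key`, `search_result`, and `search_id` now all route through it: it returns `Integer(id)` when the index uses `id_type: "integer"` and `id.to_s` otherwise. The hunk also builds the `FT.SEARCH` query from an optional filter and aliases the KNN score as `__v_score`. The sketch below replays both steps offline on a hand-written RESP2 reply. The `ResultParser` class, the reply values, and the `@category:{A}` filter are hypothetical. The filter assumes a TAG field declared through the underscore-prefixed `_schema:` option. The sketch does not model `find_prefix_length` or the omission of `SORTBY` on Valkey.

```ruby
# Hypothetical offline model of Index#search_command, #search_result, and #item_id.
class ResultParser
  def initialize(name, id_type: "string")
    @int_ids = id_type.to_s == "integer"
    @prefix = "neighbor:items:#{name}:"
  end

  def item_id(id)
    @int_ids ? Integer(id) : id.to_s
  end

  def item_key(id)
    "#{@prefix}#{item_id(id)}"
  end

  def knn_query(count, filter: nil)
    # a filter is parenthesized; without one, every document is a candidate
    head = filter ? "(#{filter})" : "*"
    "#{head}=>[KNN #{count.to_i} @v $BLOB AS __v_score]"
  end

  # RESP2 reply: [total, key1, [field, value, ...], key2, [...], ...]
  def parse(resp)
    resp = resp.dup # the gem shifts the reply in place; copy it here instead
    resp.shift.times.map do
      key, info = resp.shift(2)
      info = info.each_slice(2).to_h unless info.is_a?(Hash)
      {id: item_id(key.delete_prefix(@prefix)), score: info["__v_score"].to_f}
    end
  end
end

parser = ResultParser.new("items", id_type: "integer")
puts parser.knn_query(6)                          # *=>[KNN 6 @v $BLOB AS __v_score]
puts parser.knn_query(5, filter: "@category:{A}") # (@category:{A})=>[KNN 5 @v $BLOB AS __v_score]

reply = [2, parser.item_key(1), ["__v_score", "0"], parser.item_key(3), ["__v_score", "1"]]
hits = parser.parse(reply)
# search_id asks for count + 1 hits and drops the query item, comparing normalized IDs
p hits.reject { |h| h[:id] == 1 }.map { |h| h[:id] } # => [3]
```

Two details follow from `Integer(id)`. String IDs with a leading zero are read as octal (`Integer("010")` is `8`), and non-numeric IDs raise `ArgumentError`. Separately, `VectorSet#initialize` validates `id_type.to_s` but sets `@int_ids = id_type == "integer"`. Passing the symbol `:integer` to a `VectorSet` therefore keeps string IDs, whereas `Index` compares the normalized string.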
@@ -217,19 +417,19 @@ module Neighbor
        end

        def drop_index
-         redis.call("FT.DROPINDEX", @index_name)
+         run_command("FT.DROPINDEX", @index_name)
        end

        def drop_keys
          cursor = 0
          begin
-           cursor, keys = redis.call("SCAN", cursor, "MATCH", "#{@prefix}*", "COUNT", 100)
-           redis.call("DEL", keys) if keys.any?
+           cursor, keys = run_command("SCAN", cursor, "MATCH", "#{@prefix}*", "COUNT", 100)
+           run_command("DEL", *keys) if keys.any?
          end while cursor != "0"
        end

-       def to_binary(embedding)
-         embedding.to_a.pack(pack_format)
+       def to_binary(vector)
+         vector.to_a.pack(pack_format)
        end

        def from_binary(s)
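`Index#to_binary` packs with `pack_format`, which is `"f#{@dimensions}"`, or `"d..."` for `float64`. The new `VectorSet#to_binary` uses `pack("e*")` instead. In Ruby's `Array#pack`, `f` and `d` are native-endian single and double floats, and `e` is little-endian single precision. The two float32 encodings therefore agree on little-endian hosts, and `e*` packs every element rather than a fixed count. Here is a short sketch comparing them on an arbitrary sample vector.

```ruby
vector = [1.0, 1.0, 2.0]

# Index (float32): fixed element count taken from the configured dimensions
native = vector.pack("f#{vector.size}")
# VectorSet: "e" = little-endian single precision, "*" = all elements
little = vector.pack("e*")

p native.bytesize     # => 12 (4 bytes per float32)
p little.unpack("e*") # => [1.0, 1.0, 2.0]
p native == little    # true on little-endian hosts such as x86_64 and arm64

# float64 indexes use "d", 8 bytes per element
p vector.pack("d#{vector.size}").bytesize # => 24

# A fixed count silently ignores extra elements, one reason Index
# checks dimensions before packing.
p [1.0, 1.0, 2.0, 5.0].pack("f3").bytesize # => 12
```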
@@ -240,15 +440,18 @@ module Neighbor
          @pack_format ||= @float64 ? "d#{@dimensions}" : "f#{@dimensions}"
        end

-       # just use for create for now
-       def ft_command
-         yield
-       rescue => e
-         raise Error, "RediSearch not installed" if e.message.include?("ERR unknown command 'FT.")
-         raise
+       def hash_result(result)
+         result.is_a?(Array) ? result.each_slice(2).to_h : result
+       end
+
+       def run_command(*args)
+         if args.any? { |v| !(v.is_a?(String) || v.is_a?(Numeric)) }
+           raise TypeError, "unexpected argument type"
+         end
+         client.call(*args)
        end

-       def redis
+       def client
          Redis.client
        end
      end
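All direct commands now go through `run_command`. It raises `TypeError` for any argument that is not a `String` or `Numeric`, which is why `DEL` now splats its keys (`*keys`) where 0.2.0 passed an array. As far as I know, redis-client flattens Array and Hash arguments itself, so the guard appears intended to make an accidental nested value fail loudly; treat that reading as an assumption. `hash_result` turns RESP2's flat `[key, value, ...]` replies, such as those from `FT.INFO` or `HGETALL`, into a Hash. `check_args!` below is a hypothetical stand-in for the guard.

```ruby
# Hypothetical stand-ins for the private run_command guard and hash_result.
def check_args!(*args)
  if args.any? { |v| !(v.is_a?(String) || v.is_a?(Numeric)) }
    raise TypeError, "unexpected argument type"
  end
  args
end

def hash_result(result)
  # RESP2 returns maps as flat arrays; RESP3 already returns a Hash
  result.is_a?(Array) ? result.each_slice(2).to_h : result
end

keys = ["neighbor:items:items:1", "neighbor:items:items:2"]
p check_args!("DEL", *keys).size # => 3

begin
  check_args!("DEL", keys) # an un-splatted Array is rejected
rescue TypeError => e
  puts e.message # => unexpected argument type
end

p hash_result(["num_docs", "3", "max_doc_id", "3"])["num_docs"] # => "3"
p hash_result({"num_docs" => "3"})["num_docs"]                  # => "3"
```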
@@ -0,0 +1,41 @@
+ module Neighbor
+   module Redis
+     class SvsVamanaIndex < Index
+       def initialize(
+         *args,
+         compression: nil,
+         construction_window_size: nil,
+         graph_max_degree: nil,
+         search_window_size: nil,
+         epsilon: nil,
+         training_threshold: nil,
+         reduce: nil,
+         **options
+       )
+         super(*args, **options)
+         @algorithm = "SVS-VAMANA"
+         @compression = compression
+         @construction_window_size = construction_window_size
+         @graph_max_degree = graph_max_degree
+         @search_window_size = search_window_size
+         @epsilon = epsilon
+         @training_threshold = training_threshold
+         @reduce = reduce
+       end
+
+       private
+
+       def create_params
+         params = {}
+         params["COMPRESSION"] = @compression if @compression
+         params["CONSTRUCTION_WINDOW_SIZE"] = @construction_window_size if @construction_window_size
+         params["GRAPH_MAX_DEGREE"] = @graph_max_degree if @graph_max_degree
+         params["SEARCH_WINDOW_SIZE"] = @search_window_size if @search_window_size
+         params["EPSILON"] = @epsilon if @epsilon
+         params["TRAINING_THRESHOLD"] = @training_threshold if @training_threshold
+         params["REDUCE"] = @reduce if @reduce
+         params
+       end
+     end
+   end
+ end
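`SvsVamanaIndex#create_params` emits only the options that were set. `Index#create` now pushes each name/value pair individually after `params.size * 2`, the attribute count that RediSearch expects after the algorithm name. The sketch below assembles such a command as an array without contacting Redis. Some of it is assumption rather than diff content. That covers the base `TYPE`, `DIM`, and `DISTANCE_METRIC` entries and how they combine with `create_params`, both of which sit in the part of `create` elided from this diff. It also covers the start of the `FT.CREATE` command. The index name and prefix follow the `neighbor-idx-<name>` and `neighbor:items:<name>:` patterns shown above.

```ruby
# Hypothetical options for an SVS Vamana index; nil values are skipped
# the same way SvsVamanaIndex#create_params skips them.
options = {
  "COMPRESSION" => nil,
  "CONSTRUCTION_WINDOW_SIZE" => 200,
  "GRAPH_MAX_DEGREE" => 32,
  "SEARCH_WINDOW_SIZE" => 10
}

# Assumed base parameters; the diff elides how Index#create builds these.
params = {"TYPE" => "FLOAT32", "DIM" => 3, "DISTANCE_METRIC" => "COSINE"}
options.each { |k, v| params[k] = v if v }

command = ["FT.CREATE", "neighbor-idx-items", "PREFIX", "1", "neighbor:items:items:", "SCHEMA"]
# The number after the algorithm name counts the name/value tokens that follow.
command.push("v", "VECTOR", "SVS-VAMANA", params.size * 2)
params.each { |k, v| command.push(k, v) }

puts command.join(" ")
```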
@@ -0,0 +1,265 @@
1
+ module Neighbor
2
+ module Redis
3
+ class VectorSet
4
+ NO_DEFAULT = Object.new
5
+
6
+ def initialize(
7
+ name,
8
+ m: nil,
9
+ ef_construction: nil,
10
+ ef_search: nil,
11
+ epsilon: nil,
12
+ quantization: nil,
13
+ reduce: nil,
14
+ id_type: "string"
15
+ )
16
+ name = name.to_str
17
+ if name.include?(":")
18
+ raise ArgumentError, "invalid name"
19
+ end
20
+
21
+ @name = name
22
+ @m = m&.to_i
23
+ @ef_construction = ef_construction&.to_i
24
+ @ef_search = ef_search&.to_i
25
+ @epsilon = epsilon&.to_f
26
+
27
+ @quant_type =
28
+ case quantization&.to_s
29
+ when nil
30
+ "NOQUANT"
31
+ when "binary"
32
+ "BIN"
33
+ when "int8"
34
+ "Q8"
35
+ else
36
+ raise ArgumentError, "invalid quantization"
37
+ end
38
+
39
+ case id_type.to_s
40
+ when "string", "integer"
41
+ @int_ids = id_type == "integer"
42
+ else
43
+ raise ArgumentError, "invalid id_type"
44
+ end
45
+
46
+ @reduce_args = []
47
+ @reduce_args.push("REDUCE", reduce.to_i) if reduce
48
+
49
+ @add_args = []
50
+ @add_args.push("M", @m) if @m
51
+ @add_args.push("EF", @ef_construction) if @ef_construction
52
+ end
53
+
54
+ def exists?
55
+ !run_command("VINFO", key).nil?
56
+ end
57
+
58
+ def info
59
+ hash_result(run_command("VINFO", key))
60
+ end
61
+
62
+ def dimensions
63
+ run_command("VDIM", key)
64
+ rescue => e
65
+ raise e unless e.message.include?("key does not exist")
66
+ nil
67
+ end
68
+
69
+ def count
70
+ run_command("VCARD", key)
71
+ end
72
+
73
+ def add(id, vector, metadata: nil)
74
+ add_all([id], [vector], metadata: metadata ? [metadata] : nil)[0]
75
+ end
76
+
77
+ def add_all(ids, vectors, metadata: nil)
78
+ # perform checks first to reduce chance of non-atomic updates
79
+ ids = ids.to_a.map { |v| item_id(v) }
80
+ vectors = vectors.to_a
81
+ metadata = metadata.to_a if metadata
82
+
83
+ raise ArgumentError, "different sizes" if ids.size != vectors.size
84
+
85
+ if vectors.size > 1
86
+ dimensions = vectors.first.size
87
+ unless vectors.all? { |v| v.size == dimensions }
88
+ raise ArgumentError, "different dimensions"
89
+ end
90
+ end
91
+
92
+ if metadata
93
+ raise ArgumentError, "different sizes" if metadata.size != ids.size
94
+ end
95
+
96
+ result =
97
+ client.pipelined do |pipeline|
98
+ ids.zip(vectors).each_with_index do |(id, vector), i|
99
+ attributes = metadata[i] if metadata
100
+ attribute_args = []
101
+ attribute_args.push("SETATTR", JSON.generate(attributes)) if attributes
102
+ pipeline.call("VADD", key, *@reduce_args, "FP32", to_binary(vector), id, @quant_type, *attribute_args, *@add_args)
103
+ end
104
+ end
105
+ result.map { |v| bool_result(v) }
106
+ end
107
+
108
+ def member?(id)
109
+ id = item_id(id)
110
+
111
+ bool_result(run_command("VISMEMBER", key, id))
112
+ end
113
+ alias_method :include?, :member?
114
+
115
+ def remove(id)
116
+ id = item_id(id)
117
+
118
+ bool_result(run_command("VREM", key, id))
119
+ end
120
+
121
+ def remove_all(ids)
122
+ # perform checks first to reduce chance of non-atomic updates
123
+ ids = ids.to_a.map { |v| item_id(v) }
124
+
125
+ result =
126
+ client.pipelined do |pipeline|
127
+ ids.each do |id|
128
+ pipeline.call("VREM", key, id)
129
+ end
130
+ end
131
+ result.map { |v| bool_result(v) }
132
+ end
133
+
134
+ def find(id)
135
+ id = item_id(id)
136
+
137
+ run_command("VEMB", key, id)&.map(&:to_f)
138
+ end
139
+
140
+ def metadata(id)
141
+ id = item_id(id)
142
+
143
+ a = run_command("VGETATTR", key, id)
144
+ a ? JSON.parse(a) : nil
145
+ end
146
+
147
+ def set_metadata(id, metadata)
148
+ id = item_id(id)
149
+
150
+ bool_result(run_command("VSETATTR", key, id, JSON.generate(metadata)))
151
+ end
152
+
153
+ def remove_metadata(id)
154
+ id = item_id(id)
155
+
156
+ bool_result(run_command("VSETATTR", key, id, ""))
157
+ end
158
+
159
+ def search(vector, count: 5, with_metadata: false, ef_search: nil, exact: false, _filter: nil, _ef_filter: nil)
160
+ count = count.to_i
161
+
162
+ search_command(["FP32", to_binary(vector)], count:, with_metadata:, ef_search:, exact:, _filter:, _ef_filter:).map do |k, v|
163
+ search_result(k, v, with_metadata:)
164
+ end
165
+ end
166
+
167
+ def search_id(id, count: 5, with_metadata: false, ef_search: nil, exact: false, _filter: nil, _ef_filter: nil)
168
+ id = item_id(id)
169
+ count = count.to_i
170
+
171
+ result =
172
+ search_command(["ELE", id], count: count + 1, with_metadata:, ef_search:, exact:, _filter:, _ef_filter:).filter_map do |k, v|
173
+ if k != id.to_s
174
+ search_result(k, v, with_metadata:)
175
+ end
176
+ end
177
+ result.first(count)
178
+ end
179
+ alias_method :nearest, :search_id
180
+
181
+ def links(id)
182
+ id = item_id(id)
183
+
184
+ run_command("VLINKS", key, id, "WITHSCORES")&.map do |links|
185
+ hash_result(links).map do |k, v|
186
+ search_result(k, v)
187
+ end
188
+ end
189
+ end
190
+
191
+ def sample(n = NO_DEFAULT)
192
+ count = n == NO_DEFAULT ? 1 : n.to_i
193
+
194
+ result = run_command("VRANDMEMBER", key, count).map { |v| item_id(v) }
195
+ n == NO_DEFAULT ? result.first : result
196
+ end
197
+
198
+ def drop
199
+ bool_result(run_command("DEL", key))
200
+ end
201
+
202
+ private
203
+
204
+ def key
205
+ "neighbor:vs:#{@name}"
206
+ end
207
+
208
+ def item_id(id)
209
+ @int_ids ? Integer(id) : id.to_s
210
+ end
211
+
212
+ def to_binary(vector)
213
+ vector.pack("e*")
214
+ end
215
+
216
+ def search_command(args, count:, with_metadata:, ef_search:, exact:, _filter:, _ef_filter:)
217
+ ef_search = @ef_search if ef_search.nil?
218
+
219
+ args << "WITHATTRIBS" if with_metadata
220
+ args.push("EF", ef_search) if ef_search
221
+ args.push("EPSILON", @epsilon) if @epsilon
222
+ args.push("FILTER", _filter) if _filter
223
+ args.push("FILTER_EF", _ef_filter) if _ef_filter
224
+ args << "TRUTH" if exact
225
+
226
+ result = run_command("VSIM", key, *args, "WITHSCORES", "COUNT", count)
227
+ if result.is_a?(Array)
228
+ if with_metadata
229
+ result.each_slice(3).to_h { |v| [v[0], v[1..]] }
230
+ else
231
+ hash_result(result)
232
+ end
233
+ else
234
+ result
235
+ end
236
+ end
237
+
238
+ def search_result(k, v, with_metadata: false)
239
+ v, a = v if with_metadata
240
+ value = {id: item_id(k), distance: 2 * (1 - v.to_f)}
241
+ value.merge!(metadata: a ? JSON.parse(a) : {}) if with_metadata
242
+ value
243
+ end
244
+
245
+ def hash_result(result)
246
+ result.is_a?(Array) ? result.each_slice(2).to_h : result
247
+ end
248
+
249
+ def bool_result(result)
250
+ result == true || result == 1
251
+ end
252
+
253
+ def run_command(*args)
254
+ if args.any? { |v| !(v.is_a?(String) || v.is_a?(Numeric)) }
255
+ raise TypeError, "unexpected argument type"
256
+ end
257
+ client.call(*args)
258
+ end
259
+
260
+ def client
261
+ Redis.client
262
+ end
263
+ end
264
+ end
265
+ end
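The reply handling in `search_command`, `search_result`, and `search_id` can be exercised without a server. The sketch below assumes a RESP2-style flat `VSIM ... WITHSCORES` reply and a cosine-based vector set, where the score is a similarity in [0, 1] with 1 meaning identical vectors. If that score is (1 + cos θ) / 2, then `2 * (1 - score)` recovers the cosine distance 1 - cos θ. The helper names (`to_fp32_blob`, `pairs`, `score_to_distance`, `neighbors_excluding_self`) and the sample reply are hypothetical stand-ins for the private methods above. They are not part of the gem's API.

```ruby
# Hypothetical standalone helpers mirroring private methods in vector_set.rb;
# names and sample data are illustrative only.

# Encode a vector as little-endian 32-bit floats (the blob passed after "FP32")
def to_fp32_blob(vector)
  vector.pack("e*")
end

# Normalize a WITHSCORES reply: a flat [id, score, ...] array becomes a hash,
# and a reply that is already a hash (e.g. a RESP3 map) is returned unchanged
def pairs(result)
  result.is_a?(Array) ? result.each_slice(2).to_h : result
end

# Map a similarity score in [0, 1] to cosine distance in [0, 2]
# (assumes score = (1 + cos θ) / 2, so 2 * (1 - score) = 1 - cos θ)
def score_to_distance(score)
  2 * (1 - score.to_f)
end

# Query by an existing element: the caller asks VSIM for count + 1 results,
# this drops the element itself and trims back to count (as search_id does)
def neighbors_excluding_self(reply, self_id, count)
  pairs(reply)
    .reject { |id, _score| id == self_id }
    .map { |id, score| {id: id, distance: score_to_distance(score)} }
    .first(count)
end

blob = to_fp32_blob([1.0, 2.0, 3.0])
p blob.bytesize       # → 12
p blob.unpack("e*")   # → [1.0, 2.0, 3.0]

# Made-up reply for a query by element "a" with COUNT 3 (i.e. count + 1)
reply = ["a", "1", "b", "0.75", "c", "0.5"]
p neighbors_excluding_self(reply, "a", 2).map { |r| r[:distance] }  # → [0.5, 1.0]
```

Requesting one extra result is what lets `search_id` return exactly `count` neighbors even though the queried element usually comes back as its own nearest match.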
@@ -1,5 +1,5 @@
1
1
  module Neighbor
2
2
  module Redis
3
- VERSION = "0.2.0"
3
+ VERSION = "0.3.0"
4
4
  end
5
5
  end
@@ -1,10 +1,15 @@
1
1
  # dependencies
2
2
  require "redis-client"
3
3
 
4
+ # stdlib
5
+ require "json"
6
+
4
7
  # modules
5
8
  require_relative "redis/index"
6
9
  require_relative "redis/flat_index"
7
10
  require_relative "redis/hnsw_index"
11
+ require_relative "redis/svs_vamana_index"
12
+ require_relative "redis/vector_set"
8
13
  require_relative "redis/version"
9
14
 
10
15
  module Neighbor
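The new `require "json"` supports vector set metadata. `set_metadata` serializes a Ruby hash with `JSON.generate` before sending it with `VSETATTR`, and `metadata` parses the string returned by `VGETATTR`. `remove_metadata` sends an empty string, which the `VSETATTR` documentation describes as the way to clear attributes. The helpers below (`encode_metadata`, `decode_metadata`) are hypothetical and exist only to show the round trip without a server.

```ruby
require "json"

# Hypothetical helpers showing how metadata round-trips through the single
# string attribute that VSETATTR stores and VGETATTR returns
def encode_metadata(metadata)
  JSON.generate(metadata)
end

# A nil reply (no attribute set) stays nil, matching #metadata in vector_set.rb
def decode_metadata(attr)
  attr ? JSON.parse(attr) : nil
end

stored = encode_metadata({category: "books", year: 2025})
puts stored                  # → {"category":"books","year":2025}

decoded = decode_metadata(stored)
p decoded["category"]        # → "books"
p decoded.key?(:category)    # → false (symbol keys come back as strings)
p decode_metadata(nil)       # → nil
```

Note that symbol keys do not survive the round trip. Search results requested with `with_metadata: true` also differ slightly from this helper: `search_result` substitutes `{}` rather than `nil` when an element has no attributes.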
@@ -13,6 +18,21 @@ module Neighbor
13
18
 
14
19
  class << self
15
20
  attr_accessor :client
21
+
22
+ def server_type
23
+ unless defined?(@server_type)
24
+ info = client.call("INFO")
25
+ @server_type =
26
+ if info.include?("valkey_version")
27
+ :valkey
28
+ elsif info.include?("dragonfly_version")
29
+ :dragonfly
30
+ else
31
+ :redis
32
+ end
33
+ end
34
+ @server_type
35
+ end
16
36
  end
17
37
  end
18
38
  end
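`server_type` decides which backend it is talking to by scanning the raw `INFO` text for server-specific version fields. It checks `valkey_version` first, then `dragonfly_version`, and falls back to `:redis`. The result is memoized in `@server_type`, so `INFO` runs at most once per process. The sketch below is a hypothetical pure-function version (`detect_server_type`) that takes the `INFO` string directly, so the branching can be checked without a live connection. The sample `INFO` snippets are made up and heavily abbreviated.

```ruby
# Hypothetical pure-function version of Neighbor::Redis.server_type,
# operating on raw INFO text instead of calling the server
def detect_server_type(info)
  if info.include?("valkey_version")
    :valkey
  elsif info.include?("dragonfly_version")
    :dragonfly
  else
    :redis
  end
end

# Abbreviated, made-up INFO snippets for illustration only
redis_info = "# Server\r\nredis_version:8.0.0\r\n"
valkey_info = "# Server\r\nredis_version:7.2.4\r\nvalkey_version:8.0.1\r\n"

p detect_server_type(redis_info)   # → :redis
p detect_server_type(valkey_info)  # → :valkey
```

Because servers that aim for Redis compatibility may also report `redis_version`, the sketch keys on the server-specific field and treats plain Redis as the fallback.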
metadata CHANGED
@@ -1,14 +1,13 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: neighbor-redis
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.3.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrew Kane
8
- autorequire:
9
8
  bindir: bin
10
9
  cert_chain: []
11
- date: 2024-10-23 00:00:00.000000000 Z
10
+ date: 1980-01-02 00:00:00.000000000 Z
12
11
  dependencies:
13
12
  - !ruby/object:Gem::Dependency
14
13
  name: redis-client
@@ -24,7 +23,6 @@ dependencies:
24
23
  - - ">="
25
24
  - !ruby/object:Gem::Version
26
25
  version: '0'
27
- description:
28
26
  email: andrew@ankane.org
29
27
  executables: []
30
28
  extensions: []
@@ -38,12 +36,13 @@ files:
38
36
  - lib/neighbor/redis/flat_index.rb
39
37
  - lib/neighbor/redis/hnsw_index.rb
40
38
  - lib/neighbor/redis/index.rb
39
+ - lib/neighbor/redis/svs_vamana_index.rb
40
+ - lib/neighbor/redis/vector_set.rb
41
41
  - lib/neighbor/redis/version.rb
42
42
  homepage: https://github.com/ankane/neighbor-redis
43
43
  licenses:
44
44
  - MIT
45
45
  metadata: {}
46
- post_install_message:
47
46
  rdoc_options: []
48
47
  require_paths:
49
48
  - lib
@@ -51,15 +50,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
51
50
  requirements:
52
51
  - - ">="
53
52
  - !ruby/object:Gem::Version
54
- version: '3.1'
53
+ version: '3.2'
55
54
  required_rubygems_version: !ruby/object:Gem::Requirement
56
55
  requirements:
57
56
  - - ">="
58
57
  - !ruby/object:Gem::Version
59
58
  version: '0'
60
59
  requirements: []
61
- rubygems_version: 3.5.16
62
- signing_key:
60
+ rubygems_version: 3.6.9
63
61
  specification_version: 4
64
62
  summary: Nearest neighbor search for Ruby and Redis
65
63
  test_files: []