neighbor-redis 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
+ ---
+ SHA256:
+   metadata.gz: ca86068ccb3e4538661582c5cf25f58b522569ff2abc6fb69cb6c69f3171141e
+   data.tar.gz: 455e7ce9a5c112b3ffb2e6f5004f4cc0598033b1b1ca409fdec7fdecdd936c86
+ SHA512:
+   metadata.gz: 4ff369d81326391017aa7c8c98966d44200856f7fb7be32caa8dd773f55ab01a15674ed5bb2a58aa8bf8163604d26895516fcd4c29332caf5354dc101d5d7f54
+   data.tar.gz: 5d00a85e5265d01617299daa1b0ce5a3e67f4d570c6a512eee5bc6b4023d5dedfc93a8e8583689a54fbb1cafae8401cb76acc35242b6d215cd1883aa34add26f
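The `checksums.yaml` above lists SHA256 and SHA512 digests for `metadata.gz` and `data.tar.gz`, two members of the `.gem` file, which is a plain tar archive (it also carries `checksums.yaml.gz` itself). As a minimal sketch, assuming the gem has been downloaded locally (for example with `gem fetch neighbor-redis -v 0.1.0`), the digests can be re-checked with Ruby's standard library. `verify_checksums` and `gem_entries` are hypothetical helper names, not part of RubyGems.

```ruby
require "digest"
require "yaml"
require "zlib"
require "rubygems/package"

# Hypothetical helper: check each archive member against the digests in
# checksums.yaml. Returns {"SHA256:metadata.gz" => true, ...}.
def verify_checksums(entries, checksums)
  algorithms = {"SHA256" => Digest::SHA256, "SHA512" => Digest::SHA512}
  results = {}
  checksums.each do |algorithm, files|
    digest = algorithms.fetch(algorithm)
    files.each do |name, expected|
      results["#{algorithm}:#{name}"] = digest.hexdigest(entries.fetch(name)) == expected
    end
  end
  results
end

# Hypothetical helper: read the top-level members of a .gem (a plain tar
# archive) into a {name => bytes} hash.
def gem_entries(path)
  entries = {}
  File.open(path, "rb") do |io|
    Gem::Package::TarReader.new(io).each do |entry|
      entries[entry.full_name] = entry.read
    end
  end
  entries
end

# Usage, assuming neighbor-redis-0.1.0.gem is in the current directory:
#   entries = gem_entries("neighbor-redis-0.1.0.gem")
#   checksums = YAML.safe_load(Zlib.gunzip(entries["checksums.yaml.gz"]))
#   verify_checksums(entries, checksums).values.all?
```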
data/CHANGELOG.md ADDED
@@ -0,0 +1,3 @@
+ ## 0.1.0 (2023-02-21)
+
+ - First release
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
+ The MIT License (MIT)
+
+ Copyright (c) 2023 Andrew Kane
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in
+ all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,196 @@
+ # Neighbor Redis
+
+ Nearest neighbor search for Ruby and Redis
+
+ [![Build Status](https://github.com/ankane/neighbor-redis/workflows/build/badge.svg?branch=master)](https://github.com/ankane/neighbor-redis/actions)
+
+ ## Installation
+
+ First, [install RediSearch](https://redis.io/docs/stack/search/quick_start/). With Docker, use:
+
+ ```sh
+ docker run -p 6379:6379 redis/redis-stack-server:latest
+ ```
+
+ Add this line to your application’s Gemfile:
+
+ ```ruby
+ gem "neighbor-redis"
+ ```
+
+ And set the Redis client:
+
+ ```ruby
+ Neighbor::Redis.client = RedisClient.config.new_pool
+ ```
+
+ ## Getting Started
+
+ Create an index
+
+ ```ruby
+ index = Neighbor::Redis::HNSWIndex.new("items", dimensions: 3, distance: "l2")
+ index.create
+ ```
+
+ Add items
+
+ ```ruby
+ index.add(1, [1, 1, 1])
+ index.add(2, [2, 2, 2])
+ index.add(3, [1, 1, 2])
+ ```
+
+ Note: IDs are stored and returned as strings (this uses less total storage)
+
+ Get the nearest neighbors to an item
+
+ ```ruby
+ index.nearest(1, count: 5)
+ ```
+
+ Get the nearest neighbors to a vector
+
+ ```ruby
+ index.search([1, 1, 1], count: 5)
+ ```
+
+ ## Distance
+
+ Supported values are:
+
+ - `l2`
+ - `inner_product`
+ - `cosine`
+
+ ## Index Types
+
+ Hierarchical Navigable Small World (HNSW)
+
+ ```ruby
+ Neighbor::Redis::HNSWIndex.new(
+   name,
+   initial_cap: nil,
+   m: 16,
+   ef_construction: 200,
+   ef_runtime: 10,
+   epsilon: 0.01
+ )
+ ```
+
+ Flat
+
+ ```ruby
+ Neighbor::Redis::FlatIndex.new(
+   name,
+   initial_cap: nil,
+   block_size: 1024
+ )
+ ```
+
+ ## Additional Options
+
+ Store vectors as double precision (instead of single precision)
+
+ ```ruby
+ Neighbor::Redis::HNSWIndex.new(name, type: "float64")
+ ```
+
+ Store vectors as JSON (instead of a hash/blob)
+
+ ```ruby
+ Neighbor::Redis::HNSWIndex.new(name, redis_type: "json")
+ ```
+
+ ## Changing Options
+
+ Create a new index to change any index options
+
+ ```ruby
+ Neighbor::Redis::HNSWIndex.new("items-v2", **new_options)
+ ```
+
+ ## Additional Operations
+
+ Add multiple items
+
+ ```ruby
+ index.add_all(ids, embeddings)
+ ```
+
+ Remove an item
+
+ ```ruby
+ index.remove(id)
+ ```
+
+ Remove multiple items
+
+ ```ruby
+ index.remove_all(ids)
+ ```
+
+ Drop the index
+
+ ```ruby
+ index.drop
+ ```
+
+ ## Example
+
+ You can use Neighbor Redis for online item-based recommendations with [Disco](https://github.com/ankane/disco). We’ll use MovieLens data for this example.
+
+ Create an index
+
+ ```ruby
+ index = Neighbor::Redis::HNSWIndex.new("movies", dimensions: 20, distance: "cosine")
+ index.create
+ ```
+
+ Fit the recommender
+
+ ```ruby
+ data = Disco.load_movielens
+ recommender = Disco::Recommender.new(factors: 20)
+ recommender.fit(data)
+ ```
+
+ Store the item factors
+
+ ```ruby
+ index.add_all(recommender.item_ids, recommender.item_factors)
+ ```
+
+ And get similar movies
+
+ ```ruby
+ index.nearest("Star Wars (1977)").map { |v| v[:id] }
+ ```
+
+ See the [complete code](examples/disco_item_recs.rb)
+
+ ## Reference
+
+ - [Vector similarity](https://redis.io/docs/stack/search/reference/vectors/)
+
+ ## History
+
+ View the [changelog](CHANGELOG.md)
+
+ ## Contributing
+
+ Everyone is encouraged to help improve this project. Here are a few ways you can help:
+
+ - [Report bugs](https://github.com/ankane/neighbor-redis/issues)
+ - Fix bugs and [submit pull requests](https://github.com/ankane/neighbor-redis/pulls)
+ - Write, clarify, or fix documentation
+ - Suggest or add new features
+
+ To get started with development:
+
+ ```sh
+ git clone https://github.com/ankane/neighbor-redis.git
+ cd neighbor-redis
+ bundle install
+ bundle exec rake test
+ ```
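HNSW is an approximate index, so on small data it can be useful to compare `search` and `nearest` against an exact scan. The sketch below is a brute-force reference, not part of the gem (`exact_distance`, `exact_search`, and `exact_nearest` are hypothetical names). It follows the conventions visible in `index.rb`: IDs come back as strings, each result is an `{id:, distance:}` hash, `l2` is Euclidean distance, `cosine` is 1 minus cosine similarity, and `inner_product` is reported as the inner product itself, ordered largest first. Using the three items from Getting Started:

```ruby
# Exact (brute-force) nearest neighbors, for checking results on small data.
# Not part of neighbor-redis; method names are hypothetical.
def exact_distance(metric, a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  case metric
  when "l2"
    Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
  when "cosine"
    1 - dot / (Math.sqrt(a.sum { |x| x * x }) * Math.sqrt(b.sum { |x| x * x }))
  when "inner_product"
    dot # reported as the inner product itself: larger means more similar
  else
    raise ArgumentError, "invalid distance"
  end
end

def exact_search(items, query, metric:, count: 5)
  results = items.map { |id, v| {id: id.to_s, distance: exact_distance(metric, query, v)} }
  # Highest inner product first; smallest distance first otherwise
  results.sort_by! { |r| metric == "inner_product" ? -r[:distance] : r[:distance] }
  results.first(count)
end

# Like Index#nearest: query with the item's own vector, then drop the item
def exact_nearest(items, id, metric:, count: 5)
  exact_search(items, items.fetch(id), metric: metric, count: count + 1)
    .reject { |r| r[:id] == id.to_s }
    .first(count)
end

items = {1 => [1, 1, 1], 2 => [2, 2, 2], 3 => [1, 1, 2]}
p exact_search(items, [1, 1, 1], metric: "l2").map { |r| r[:id] } # => ["1", "3", "2"]
p exact_nearest(items, 1, metric: "l2").map { |r| r[:id] }        # => ["3", "2"]
```

An HNSW index may occasionally rank results differently from this exact scan; a `FlatIndex` searches exhaustively and should agree with it up to float32 rounding.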
data/lib/neighbor/redis/flat_index.rb ADDED
@@ -0,0 +1,21 @@
+ module Neighbor
+   module Redis
+     class FlatIndex < Index
+       def initialize(*args, initial_cap: nil, block_size: nil, **options)
+         super(*args, **options)
+         @algorithm = "FLAT"
+         @initial_cap = initial_cap
+         @block_size = block_size
+       end
+
+       private
+
+       def create_params
+         params = {}
+         params["INITIAL_CAP"] = @initial_cap if @initial_cap
+         params["BLOCK_SIZE"] = @block_size if @block_size
+         params
+       end
+     end
+   end
+ end
data/lib/neighbor/redis/hnsw_index.rb ADDED
@@ -0,0 +1,27 @@
+ module Neighbor
+   module Redis
+     class HNSWIndex < Index
+       def initialize(*args, initial_cap: nil, m: nil, ef_construction: nil, ef_runtime: nil, epsilon: nil, **options)
+         super(*args, **options)
+         @algorithm = "HNSW"
+         @initial_cap = initial_cap
+         @m = m
+         @ef_construction = ef_construction
+         @ef_runtime = ef_runtime
+         @epsilon = epsilon
+       end
+
+       private
+
+       def create_params
+         params = {}
+         params["INITIAL_CAP"] = @initial_cap if @initial_cap
+         params["M"] = @m if @m
+         params["EF_CONSTRUCTION"] = @ef_construction if @ef_construction
+         params["EF_RUNTIME"] = @ef_runtime if @ef_runtime
+         params["EPSILON"] = @epsilon if @epsilon
+         params
+       end
+     end
+   end
+ end
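Both index classes above only contribute `create_params`. `Index#create` in the next file merges those with `TYPE`, `DIM`, and `DISTANCE_METRIC`, passes `params.size * 2` as the attribute count that `FT.CREATE ... VECTOR` expects, and hands the hash to `redis-client`, which flattens it into key/value arguments. The sketch below rebuilds the same argument list for an `HNSWIndex` in plain Ruby so it can be inspected without a server. `ft_create_command` is a hypothetical name; it flattens the hash itself and hard-codes `FLOAT32` for brevity.

```ruby
# Hypothetical reconstruction of the FT.CREATE arguments that Index#create
# builds for an HNSWIndex (float32 vectors); nothing is sent to Redis.
def ft_create_command(name, dimensions:, metric:, json: false, initial_cap: nil, m: nil,
                      ef_construction: nil, ef_runtime: nil, epsilon: nil)
  raise ArgumentError, "Invalid name" if name.include?(":")

  # Like HNSWIndex#create_params, only options that were set are included
  hnsw = {
    "INITIAL_CAP" => initial_cap, "M" => m, "EF_CONSTRUCTION" => ef_construction,
    "EF_RUNTIME" => ef_runtime, "EPSILON" => epsilon
  }.compact
  params = {"TYPE" => "FLOAT32", "DIM" => dimensions, "DISTANCE_METRIC" => metric}.merge(hnsw)

  command = ["FT.CREATE", "neighbor-idx-#{name}"]
  command.push("ON", "JSON") if json
  command.push("PREFIX", "1", "neighbor:items:#{name}:", "SCHEMA")
  command.push("$.v", "AS") if json
  # The attribute count is two tokens (name and value) per parameter
  command.push("v", "VECTOR", "HNSW", params.size * 2, *params.flatten)
end

puts ft_create_command("items", dimensions: 3, metric: "L2").join(" ")
# prints: FT.CREATE neighbor-idx-items PREFIX 1 neighbor:items:items: SCHEMA v VECTOR HNSW 6 TYPE FLOAT32 DIM 3 DISTANCE_METRIC L2
puts ft_create_command("items", dimensions: 3, metric: "L2", json: true, m: 16).join(" ")
# prints: FT.CREATE neighbor-idx-items ON JSON PREFIX 1 neighbor:items:items: SCHEMA $.v AS v VECTOR HNSW 8 TYPE FLOAT32 DIM 3 DISTANCE_METRIC L2 M 16
```

A `FlatIndex` produces the same shape with `FLAT` in place of `HNSW` and its own `INITIAL_CAP`/`BLOCK_SIZE` entries.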
data/lib/neighbor/redis/index.rb ADDED
@@ -0,0 +1,236 @@
+ module Neighbor
+   module Redis
+     class Index
+       def initialize(name, dimensions:, distance:, type: "float32", redis_type: "hash")
+         @index_name = index_name(name)
+         @global_prefix = "neighbor:items:"
+         @prefix = "#{@global_prefix}#{name}:"
+
+         @dimensions = dimensions
+
+         unless distance.nil?
+           @distance_metric =
+             case distance.to_s
+             when "l2", "cosine"
+               distance.to_s.upcase
+             when "inner_product"
+               "IP"
+             else
+               raise ArgumentError, "invalid distance"
+             end
+         end
+
+         @float64 =
+           case type.to_s
+           when "float32"
+             false
+           when "float64"
+             true
+           else
+             raise ArgumentError, "invalid type"
+           end
+
+         @json =
+           case redis_type.to_s
+           when "hash"
+             false
+           when "json"
+             require "json"
+             true
+           else
+             raise ArgumentError, "invalid redis_type"
+           end
+       end
+
+       def self.create(...)
+         index = new(...)
+         index.create
+         index
+       end
+
+       def create
+         params = {
+           "TYPE" => @float64 ? "FLOAT64" : "FLOAT32",
+           "DIM" => @dimensions,
+           "DISTANCE_METRIC" => @distance_metric,
+         }.merge(create_params)
+
+         command = ["FT.CREATE", @index_name]
+         command.push("ON", "JSON") if @json
+         command.push("PREFIX", "1", @prefix, "SCHEMA")
+         command.push("$.v", "AS") if @json
+         command.push("v", "VECTOR", @algorithm, params.size * 2, params)
+         ft_command { redis.call(*command) }
+       end
+
+       def exists?
+         redis.call("FT.INFO", @index_name)
+         true
+       rescue ArgumentError
+         # fix for invalid value for Float(): "-nan"
+         true
+       rescue => e
+         raise unless e.message.include?("Unknown Index name")
+         false
+       end
+
+       def add(id, embedding)
+         add_all([id], [embedding])
+       end
+
+       def add_all(ids, embeddings)
+         ids = ids.to_a
+         embeddings = embeddings.to_a
+
+         raise ArgumentError, "different sizes" if ids.size != embeddings.size
+
+         embeddings.each { |e| check_dimensions(e) }
+
+         redis.pipelined do |pipeline|
+           ids.zip(embeddings).each do |id, embedding|
+             if @json
+               pipeline.call("JSON.SET", item_key(id), "$", JSON.generate({v: embedding}))
+             else
+               pipeline.call("HSET", item_key(id), {v: to_binary(embedding)})
+             end
+           end
+         end
+       end
+
+       def remove(id)
+         remove_all([id])
+       end
+
+       def remove_all(ids)
+         redis.call("DEL", ids.map { |id| item_key(id) })
+       end
+
+       def search(embedding, count: 5)
+         check_dimensions(embedding)
+
+         search_by_blob(to_binary(embedding), count)
+       end
+
+       def find(id)
+         if @json
+           s = redis.call("JSON.GET", item_key(id), "$.v")
+           JSON.parse(s)[0] if s
+         else
+           from_binary(redis.call("HGET", item_key(id), "v"))
+         end
+       end
+
+       def nearest(id, count: 5)
+         embedding =
+           if @json
+             s = redis.call("JSON.GET", item_key(id), "$.v")
+             to_binary(JSON.parse(s)[0]) if s
+           else
+             redis.call("HGET", item_key(id), "v")
+           end
+
+         unless embedding
+           raise Error, "Could not find item #{id}"
+         end
+
+         search_by_blob(embedding, count + 1).reject { |v| v[:id] == id.to_s }.first(count)
+       end
+
+       def drop
+         drop_index
+         drop_keys
+       end
+
+       def promote(alias_name)
+         redis.call("FT.ALIASUPDATE", index_name(alias_name), @index_name)
+       end
+
+       private
+
+       def index_name(name)
+         if name.include?(":")
+           raise ArgumentError, "Invalid name"
+         end
+
+         "neighbor-idx-#{name}"
+       end
+
+       def check_dimensions(embedding)
+         if embedding.size != @dimensions
+           raise ArgumentError, "expected #{@dimensions} dimensions"
+         end
+       end
+
+       def item_key(id)
+         "#{@prefix}#{id}"
+       end
+
+       def search_by_blob(blob, count)
+         resp = redis.call("FT.SEARCH", @index_name, "*=>[KNN #{count.to_i} @v $BLOB]", "PARAMS", "2", "BLOB", blob, "SORTBY", "__v_score", "LIMIT", "0", count.to_i, "DIALECT", "2")
+         len = resp.shift
+         prefix_length = nil
+         len.times.map do |i|
+           key, info = resp.shift(2)
+           info = info.each_slice(2).to_h
+           score = info["__v_score"].to_f
+           distance =
+             case @distance_metric
+             when "L2"
+               Math.sqrt(score)
+             when "IP"
+               (score * -1) + 1
+             else
+               score
+             end
+
+           prefix_length ||= find_prefix_length(key)
+
+           {
+             id: key[prefix_length..-1],
+             distance: distance
+           }
+         end
+       end
+
+       def find_prefix_length(key)
+         key[@global_prefix.length..-1].index(":") + @global_prefix.length + 1
+       end
+
+       def drop_index
+         redis.call("FT.DROPINDEX", @index_name)
+       end
+
+       def drop_keys
+         cursor = 0
+         begin
+           cursor, keys = redis.call("SCAN", cursor, "MATCH", "#{@prefix}*", "COUNT", 100)
+           redis.call("DEL", keys) if keys.any?
+         end while cursor != "0"
+       end
+
+       def to_binary(embedding)
+         embedding.to_a.pack(pack_format)
+       end
+
+       def from_binary(s)
+         s&.unpack(pack_format)
+       end
+
+       def pack_format
+         @pack_format ||= @float64 ? "d#{@dimensions}" : "f#{@dimensions}"
+       end
+
+       # just use for create for now
+       def ft_command
+         yield
+       rescue => e
+         raise Error, "RediSearch not installed" if e.message.include?("ERR unknown command 'FT.")
+         raise
+       end
+
+       def redis
+         Redis.client
+       end
+     end
+   end
+ end
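`search_by_blob` is where RediSearch scores become the `distance` values the README returns. The sketch below replays its decoding in plain Ruby, assuming the reply layout the method itself expects (a total count, then alternating keys and flat field lists) and, as the gem's `Math.sqrt` and `1 - score` imply, that RediSearch reports `L2` scores as squared Euclidean distances and `IP` scores as 1 minus the inner product. `parse_search_reply` is a hypothetical name, and the reply below is made up for illustration.

```ruby
GLOBAL_PREFIX = "neighbor:items:"

# Hypothetical replay of how Index#search_by_blob decodes an FT.SEARCH reply:
# [total, key1, [field, value, ...], key2, [...], ...]
def parse_search_reply(resp, metric)
  resp = resp.dup # the gem shifts its own reply; copy to keep the input intact
  len = resp.shift
  len.times.map do
    key, fields = resp.shift(2)
    score = fields.each_slice(2).to_h["__v_score"].to_f
    distance =
      case metric
      when "L2" then Math.sqrt(score) # squared Euclidean -> Euclidean
      when "IP" then 1 - score        # 1 - inner product -> inner product
      else score                      # COSINE: 1 - cosine similarity, as is
      end
    # Index names cannot contain ":", so the ID begins after the next ":"
    start = key.index(":", GLOBAL_PREFIX.length) + 1
    {id: key[start..], distance: distance}
  end
end

# Made-up reply for an L2 index named "items"
reply = [2, "neighbor:items:items:1", ["__v_score", "0"], "neighbor:items:items:3", ["__v_score", "1"]]
p parse_search_reply(reply, "L2").map { |r| [r[:id], r[:distance]] }
# => [["1", 0.0], ["3", 1.0]]
```

Because only the first `:` after the index name is treated as a separator, IDs that themselves contain `:` (or spaces, like `"Star Wars (1977)"`) come back intact.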
data/lib/neighbor/redis/version.rb ADDED
@@ -0,0 +1,5 @@
+ module Neighbor
+   module Redis
+     VERSION = "0.1.0"
+   end
+ end
data/lib/neighbor/redis.rb ADDED
@@ -0,0 +1,18 @@
+ # dependencies
+ require "redis-client"
+
+ # modules
+ require_relative "redis/index"
+ require_relative "redis/flat_index"
+ require_relative "redis/hnsw_index"
+ require_relative "redis/version"
+
+ module Neighbor
+   module Redis
+     class Error < StandardError; end
+
+     class << self
+       attr_accessor :client
+     end
+   end
+ end
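`Neighbor::Redis::Error`, defined above, is what `Index#ft_command` raises when Redis rejects an `FT.` command as unknown, which is how a missing RediSearch module surfaces during `create`. The sketch below shows that translation with a hypothetical stub client in place of `redis-client`, so it runs without a server; the `Demo` module, `StubClient`, and its error text are assumptions modeled on the string the gem checks for.

```ruby
module Demo
  class Error < StandardError; end

  # Hypothetical stand-in for a Redis server without RediSearch loaded;
  # the message imitates Redis's "unknown command" error.
  class StubClient
    def call(*command)
      raise "ERR unknown command '#{command.first}', with args beginning with: "
    end
  end

  # Same rescue logic as Index#ft_command
  def self.ft_command
    yield
  rescue => e
    raise Error, "RediSearch not installed" if e.message.include?("ERR unknown command 'FT.")
    raise
  end
end

client = Demo::StubClient.new
begin
  Demo.ft_command { client.call("FT.CREATE", "neighbor-idx-items") }
rescue Demo::Error => e
  puts e.message # prints RediSearch not installed
end
```

Errors from non-`FT.` commands are re-raised unchanged, so only the missing-module case is rewritten.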
data/lib/neighbor-redis.rb ADDED
@@ -0,0 +1 @@
+ require_relative "neighbor/redis"
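Across the library files above, storage encoding lives in `Index#to_binary`, `#from_binary`, and `#pack_format`: with the default `redis_type: "hash"`, each vector is packed into the hash field `v` with `Array#pack`, using native-endian `f` (single precision) or `d` (double precision, `type: "float64"`) repeated once per dimension, while `redis_type: "json"` stores `{"v": [...]}` and reads it back through the `$.v` path. The sketch reproduces both encodings in plain Ruby; the helper names mirror the gem's private methods but are standalone, hypothetical versions, and the `"[[1,1,2]]"` string stands in for an assumed `JSON.GET` reply.

```ruby
require "json"

# Hypothetical helpers mirroring Index#pack_format, #to_binary and #from_binary
def pack_format(dimensions, float64: false)
  float64 ? "d#{dimensions}" : "f#{dimensions}" # native-endian floats
end

def to_binary(embedding, float64: false)
  embedding.to_a.pack(pack_format(embedding.size, float64: float64))
end

def from_binary(blob, dimensions, float64: false)
  blob&.unpack(pack_format(dimensions, float64: float64)) # nil when the item is missing
end

v = [1, 1, 2]
p to_binary(v).bytesize                # => 12 (4 bytes per dimension)
p to_binary(v, float64: true).bytesize # => 24 (8 bytes per dimension)
p from_binary(to_binary(v), 3)         # => [1.0, 1.0, 2.0]

# redis_type: "json" stores {"v": [...]}; JSON.GET with the "$.v" path
# replies with an array of matches, hence the [0] in Index#find
puts JSON.generate({v: v})             # prints {"v":[1,1,2]}
p JSON.parse("[[1,1,2]]")[0]           # => [1, 1, 2]
```

Doubling the element size is the storage cost of `type: "float64"`; values unpacked from a float32 blob are rounded to single precision.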
metadata ADDED
@@ -0,0 +1,65 @@
+ --- !ruby/object:Gem::Specification
+ name: neighbor-redis
+ version: !ruby/object:Gem::Version
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Andrew Kane
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2023-02-21 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: redis-client
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - ">="
+       - !ruby/object:Gem::Version
+         version: '0'
+ description:
+ email: andrew@ankane.org
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - CHANGELOG.md
+ - LICENSE.txt
+ - README.md
+ - lib/neighbor-redis.rb
+ - lib/neighbor/redis.rb
+ - lib/neighbor/redis/flat_index.rb
+ - lib/neighbor/redis/hnsw_index.rb
+ - lib/neighbor/redis/index.rb
+ - lib/neighbor/redis/version.rb
+ homepage: https://github.com/ankane/neighbor-redis
+ licenses:
+ - MIT
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '2.7'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubygems_version: 3.4.6
+ signing_key:
+ specification_version: 4
+ summary: Nearest neighbor search for Ruby and Redis
+ test_files: []