neighbor-redis 0.1.0

checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: ca86068ccb3e4538661582c5cf25f58b522569ff2abc6fb69cb6c69f3171141e
4
+ data.tar.gz: 455e7ce9a5c112b3ffb2e6f5004f4cc0598033b1b1ca409fdec7fdecdd936c86
5
+ SHA512:
6
+ metadata.gz: 4ff369d81326391017aa7c8c98966d44200856f7fb7be32caa8dd773f55ab01a15674ed5bb2a58aa8bf8163604d26895516fcd4c29332caf5354dc101d5d7f54
7
+ data.tar.gz: 5d00a85e5265d01617299daa1b0ce5a3e67f4d570c6a512eee5bc6b4023d5dedfc93a8e8583689a54fbb1cafae8401cb76acc35242b6d215cd1883aa34add26f
data/CHANGELOG.md ADDED
@@ -0,0 +1,3 @@
1
+ ## 0.1.0 (2023-02-21)
2
+
3
+ - First release
data/LICENSE.txt ADDED
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2023 Andrew Kane
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in
13
+ all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
21
+ THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,196 @@
1
+ # Neighbor Redis
2
+
3
+ Nearest neighbor search for Ruby and Redis
4
+
5
+ [![Build Status](https://github.com/ankane/neighbor-redis/workflows/build/badge.svg?branch=master)](https://github.com/ankane/neighbor-redis/actions)
6
+
7
+ ## Installation
8
+
9
+ First, [install RediSearch](https://redis.io/docs/stack/search/quick_start/). With Docker, use:
10
+
11
+ ```sh
12
+ docker run -p 6379:6379 redis/redis-stack-server:latest
13
+ ```
14
+
15
+ Add this line to your application’s Gemfile:
16
+
17
+ ```ruby
18
+ gem "neighbor-redis"
19
+ ```
20
+
21
+ And set the Redis client:
22
+
23
+ ```ruby
24
+ Neighbor::Redis.client = RedisClient.config.new_pool
25
+ ```
26
+
27
+ ## Getting Started
28
+
29
+ Create an index
30
+
31
+ ```ruby
32
+ index = Neighbor::Redis::HNSWIndex.new("items", dimensions: 3, distance: "l2")
33
+ index.create
34
+ ```
35
+
36
+ Add items
37
+
38
+ ```ruby
39
+ index.add(1, [1, 1, 1])
40
+ index.add(2, [2, 2, 2])
41
+ index.add(3, [1, 1, 2])
42
+ ```
43
+
44
+ Note: IDs are stored and returned as strings (uses less total storage)
45
+
46
+ Get the nearest neighbors to an item
47
+
48
+ ```ruby
49
+ index.nearest(1, count: 5)
50
+ ```
51
+
52
+ Get the nearest neighbors to a vector
53
+
54
+ ```ruby
55
+ index.search([1, 1, 1], count: 5)
56
+ ```
57
+
58
+ ## Distance
59
+
60
+ Supported values are:
61
+
62
+ - `l2`
63
+ - `inner_product`
64
+ - `cosine`
65
+
66
+ ## Index Types
67
+
68
+ Hierarchical Navigable Small World (HNSW)
69
+
70
+ ```ruby
71
+ Neighbor::Redis::HNSWIndex.new(
72
+ name,
73
+ initial_cap: nil,
74
+ m: 16,
75
+ ef_construction: 200,
76
+ ef_runtime: 10,
77
+ epsilon: 0.01
78
+ )
79
+ ```
80
+
81
+ Flat
82
+
83
+ ```ruby
84
+ Neighbor::Redis::FlatIndex.new(
85
+ name,
86
+ initial_cap: nil,
87
+ block_size: 1024
88
+ )
89
+ ```
90
+
91
+ ## Additional Options
92
+
93
+ Store vectors as double precision (instead of single precision)
94
+
95
+ ```ruby
96
+ Neighbor::Redis::HNSWIndex.new(name, type: "float64")
97
+ ```
98
+
99
+ Store vectors as JSON (instead of a hash/blob)
100
+
101
+ ```ruby
102
+ Neighbor::Redis::HNSWIndex.new(name, redis_type: "json")
103
+ ```
104
+
105
+ ## Changing Options
106
+
107
+ Create a new index to change any index options
108
+
109
+ ```ruby
110
+ Neighbor::Redis::HNSWIndex.new("items-v2", **new_options)
111
+ ```
112
+
113
+ ## Additional Operations
114
+
115
+ Add multiple items
116
+
117
+ ```ruby
118
+ index.add_all(ids, embeddings)
119
+ ```
120
+
121
+ Remove an item
122
+
123
+ ```ruby
124
+ index.remove(id)
125
+ ```
126
+
127
+ Remove multiple items
128
+
129
+ ```ruby
130
+ index.remove_all(ids)
131
+ ```
132
+
133
+ Drop the index
134
+
135
+ ```ruby
136
+ index.drop
137
+ ```
138
+
139
+ ## Example
140
+
141
+ You can use Neighbor Redis for online item-based recommendations with [Disco](https://github.com/ankane/disco). We’ll use MovieLens data for this example.
142
+
143
+ Create an index
144
+
145
+ ```ruby
146
+ index = Neighbor::Redis::HNSWIndex.new("movies", dimensions: 20, distance: "cosine")
147
+ index.create
148
+ ```
149
+
150
+ Fit the recommender
151
+
152
+ ```ruby
153
+ data = Disco.load_movielens
154
+ recommender = Disco::Recommender.new(factors: 20)
155
+ recommender.fit(data)
156
+ ```
157
+
158
+ Store the item factors
159
+
160
+ ```ruby
161
+ index.add_all(recommender.item_ids, recommender.item_factors)
162
+ ```
163
+
164
+ And get similar movies
165
+
166
+ ```ruby
167
+ index.nearest("Star Wars (1977)").map { |v| v[:id] }
168
+ ```
169
+
170
+ See the [complete code](examples/disco_item_recs.rb)
171
+
172
+ ## Reference
173
+
174
+ - [Vector similarity](https://redis.io/docs/stack/search/reference/vectors/)
175
+
176
+ ## History
177
+
178
+ View the [changelog](CHANGELOG.md)
179
+
180
+ ## Contributing
181
+
182
+ Everyone is encouraged to help improve this project. Here are a few ways you can help:
183
+
184
+ - [Report bugs](https://github.com/ankane/neighbor-redis/issues)
185
+ - Fix bugs and [submit pull requests](https://github.com/ankane/neighbor-redis/pulls)
186
+ - Write, clarify, or fix documentation
187
+ - Suggest or add new features
188
+
189
+ To get started with development:
190
+
191
+ ```sh
192
+ git clone https://github.com/ankane/neighbor-redis.git
193
+ cd neighbor-redis
194
+ bundle install
195
+ bundle exec rake test
196
+ ```
data/lib/neighbor/redis/flat_index.rb ADDED
@@ -0,0 +1,21 @@
1
+ module Neighbor
2
+ module Redis
3
+ class FlatIndex < Index
4
+ def initialize(*args, initial_cap: nil, block_size: nil, **options)
5
+ super(*args, **options)
6
+ @algorithm = "FLAT"
7
+ @initial_cap = initial_cap
8
+ @block_size = block_size
9
+ end
10
+
11
+ private
12
+
13
+ def create_params
14
+ params = {}
15
+ params["INITIAL_CAP"] = @initial_cap if @initial_cap
16
+ params["BLOCK_SIZE"] = @block_size if @block_size
17
+ params
18
+ end
19
+ end
20
+ end
21
+ end
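Note on the flat index: `FlatIndex` only selects the `FLAT` algorithm and forwards `INITIAL_CAP` and `BLOCK_SIZE`; the search itself runs inside RediSearch. As a rough illustration of what an exhaustive search computes, here is a minimal plain-Ruby sketch. `flat_search` is a hypothetical helper that is not part of the gem, and it assumes L2 distance only.

```ruby
# Hypothetical helper, not part of the gem: exhaustive (flat) L2 search in plain Ruby.
def flat_search(items, query, count: 5)
  items
    .map do |id, vector|
      # Euclidean (L2) distance between the stored vector and the query
      distance = Math.sqrt(vector.zip(query).sum { |a, b| (a - b)**2 })
      { id: id.to_s, distance: distance }
    end
    .sort_by { |v| v[:distance] }
    .first(count)
end

# Same vectors as the README's "Add items" step
items = { 1 => [1, 1, 1], 2 => [2, 2, 2], 3 => [1, 1, 2] }
results = flat_search(items, [1, 1, 1], count: 2)
```

Every stored vector is compared against the query, so the cost grows linearly with the number of items. That is the trade-off a flat index makes for exact results.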
data/lib/neighbor/redis/hnsw_index.rb ADDED
@@ -0,0 +1,27 @@
1
+ module Neighbor
2
+ module Redis
3
+ class HNSWIndex < Index
4
+ def initialize(*args, initial_cap: nil, m: nil, ef_construction: nil, ef_runtime: nil, epsilon: nil, **options)
5
+ super(*args, **options)
6
+ @algorithm = "HNSW"
7
+ @initial_cap = initial_cap
8
+ @m = m
9
+ @ef_construction = ef_construction
10
+ @ef_runtime = ef_runtime
11
+ @epsilon = epsilon
12
+ end
13
+
14
+ private
15
+
16
+ def create_params
17
+ params = {}
18
+ params["INITIAL_CAP"] = @initial_cap if @initial_cap
19
+ params["M"] = @m if @m
20
+ params["EF_CONSTRUCTION"] = @ef_construction if @ef_construction
21
+ params["EF_RUNTIME"] = @ef_runtime if @ef_runtime
22
+ params["EPSILON"] = @epsilon if @epsilon
23
+ params
24
+ end
25
+ end
26
+ end
27
+ end
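Note on index creation: `create_params` only includes options that were explicitly set, so anything left as `nil` is omitted from `FT.CREATE` and the server default applies. The sketch below shows roughly how the argument list comes together for an HNSW index with hash storage and `float32` vectors, following the `create` method in `index.rb`. `build_create_command` is a hypothetical helper, and flattening the hash by hand is an assumption about how the client expands it.

```ruby
# Hypothetical helper mirroring how Index#create assembles FT.CREATE arguments.
def build_create_command(name, algorithm:, dimensions:, distance_metric:, options: {})
  params = {
    "TYPE" => "FLOAT32",
    "DIM" => dimensions,
    "DISTANCE_METRIC" => distance_metric
  }
  # Skip nil options so the server defaults apply, as create_params does
  options.each { |key, value| params[key] = value unless value.nil? }

  [
    "FT.CREATE", "neighbor-idx-#{name}",
    "PREFIX", "1", "neighbor:items:#{name}:",
    "SCHEMA", "v", "VECTOR", algorithm,
    params.size * 2, # number of attribute tokens that follow
    *params.flatten  # assumption: the hash is sent as alternating keys and values
  ]
end

command = build_create_command(
  "items",
  algorithm: "HNSW",
  dimensions: 3,
  distance_metric: "L2",
  options: { "M" => 16, "EF_CONSTRUCTION" => 200, "EF_RUNTIME" => nil }
)
```

Here `EF_RUNTIME` is dropped because it is `nil`, and the count before the attributes is twice the number of remaining parameters.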
data/lib/neighbor/redis/index.rb ADDED
@@ -0,0 +1,236 @@
1
+ module Neighbor
2
+ module Redis
3
+ class Index
4
+ def initialize(name, dimensions:, distance:, type: "float32", redis_type: "hash")
5
+ @index_name = index_name(name)
6
+ @global_prefix = "neighbor:items:"
7
+ @prefix = "#{@global_prefix}#{name}:"
8
+
9
+ @dimensions = dimensions
10
+
11
+ unless distance.nil?
12
+ @distance_metric =
13
+ case distance.to_s
14
+ when "l2", "cosine"
15
+ distance.to_s.upcase
16
+ when "inner_product"
17
+ "IP"
18
+ else
19
+ raise ArgumentError, "invalid distance"
20
+ end
21
+ end
22
+
23
+ @float64 =
24
+ case type.to_s
25
+ when "float32"
26
+ false
27
+ when "float64"
28
+ true
29
+ else
30
+ raise ArgumentError, "invalid type"
31
+ end
32
+
33
+ @json =
34
+ case redis_type.to_s
35
+ when "hash"
36
+ false
37
+ when "json"
38
+ require "json"
39
+ true
40
+ else
41
+ raise ArgumentError, "invalid redis_type"
42
+ end
43
+ end
44
+
45
+ def self.create(...)
46
+ index = new(...)
47
+ index.create
48
+ index
49
+ end
50
+
51
+ def create
52
+ params = {
53
+ "TYPE" => @float64 ? "FLOAT64" : "FLOAT32",
54
+ "DIM" => @dimensions,
55
+ "DISTANCE_METRIC" => @distance_metric,
56
+ }.merge(create_params)
57
+
58
+ command = ["FT.CREATE", @index_name]
59
+ command.push("ON", "JSON") if @json
60
+ command.push("PREFIX", "1", @prefix, "SCHEMA")
61
+ command.push("$.v", "AS") if @json
62
+ command.push("v", "VECTOR", @algorithm, params.size * 2, params)
63
+ ft_command { redis.call(*command) }
64
+ end
65
+
66
+ def exists?
67
+ redis.call("FT.INFO", @index_name)
68
+ true
69
+ rescue ArgumentError
70
+ # fix for invalid value for Float(): "-nan"
71
+ true
72
+ rescue => e
73
+ raise unless e.message.include?("Unknown Index name")
74
+ false
75
+ end
76
+
77
+ def add(id, embedding)
78
+ add_all([id], [embedding])
79
+ end
80
+
81
+ def add_all(ids, embeddings)
82
+ ids = ids.to_a
83
+ embeddings = embeddings.to_a
84
+
85
+ raise ArgumentError, "different sizes" if ids.size != embeddings.size
86
+
87
+ embeddings.each { |e| check_dimensions(e) }
88
+
89
+ redis.pipelined do |pipeline|
90
+ ids.zip(embeddings).each do |id, embedding|
91
+ if @json
92
+ pipeline.call("JSON.SET", item_key(id), "$", JSON.generate({v: embedding}))
93
+ else
94
+ pipeline.call("HSET", item_key(id), {v: to_binary(embedding)})
95
+ end
96
+ end
97
+ end
98
+ end
99
+
100
+ def remove(id)
101
+ remove_all([id])
102
+ end
103
+
104
+ def remove_all(ids)
105
+ redis.call("DEL", ids.map { |id| item_key(id) })
106
+ end
107
+
108
+ def search(embedding, count: 5)
109
+ check_dimensions(embedding)
110
+
111
+ search_by_blob(to_binary(embedding), count)
112
+ end
113
+
114
+ def find(id)
115
+ if @json
116
+ s = redis.call("JSON.GET", item_key(id), "$.v")
117
+ JSON.parse(s)[0] if s
118
+ else
119
+ from_binary(redis.call("HGET", item_key(id), "v"))
120
+ end
121
+ end
122
+
123
+ def nearest(id, count: 5)
124
+ embedding =
125
+ if @json
126
+ s = redis.call("JSON.GET", item_key(id), "$.v")
127
+ to_binary(JSON.parse(s)[0]) if s
128
+ else
129
+ redis.call("HGET", item_key(id), "v")
130
+ end
131
+
132
+ unless embedding
133
+ raise Error, "Could not find item #{id}"
134
+ end
135
+
136
+ search_by_blob(embedding, count + 1).reject { |v| v[:id] == id.to_s }.first(count)
137
+ end
138
+
139
+ def drop
140
+ drop_index
141
+ drop_keys
142
+ end
143
+
144
+ def promote(alias_name)
145
+ redis.call("FT.ALIASUPDATE", index_name(alias_name), @index_name)
146
+ end
147
+
148
+ private
149
+
150
+ def index_name(name)
151
+ if name.include?(":")
152
+ raise ArgumentError, "Invalid name"
153
+ end
154
+
155
+ "neighbor-idx-#{name}"
156
+ end
157
+
158
+ def check_dimensions(embedding)
159
+ if embedding.size != @dimensions
160
+ raise ArgumentError, "expected #{@dimensions} dimensions"
161
+ end
162
+ end
163
+
164
+ def item_key(id)
165
+ "#{@prefix}#{id}"
166
+ end
167
+
168
+ def search_by_blob(blob, count)
169
+ resp = redis.call("FT.SEARCH", @index_name, "*=>[KNN #{count.to_i} @v $BLOB]", "PARAMS", "2", "BLOB", blob, "SORTBY", "__v_score", "DIALECT", "2")
170
+ len = resp.shift
171
+ prefix_length = nil
172
+ len.times.map do |i|
173
+ key, info = resp.shift(2)
174
+ info = info.each_slice(2).to_h
175
+ score = info["__v_score"].to_f
176
+ distance =
177
+ case @distance_metric
178
+ when "L2"
179
+ Math.sqrt(score)
180
+ when "IP"
181
+ (score * -1) + 1
182
+ else
183
+ score
184
+ end
185
+
186
+ prefix_length ||= find_prefix_length(key)
187
+
188
+ {
189
+ id: key[prefix_length..-1],
190
+ distance: distance
191
+ }
192
+ end
193
+ end
194
+
195
+ def find_prefix_length(key)
196
+ key[@global_prefix.length..-1].index(":") + @global_prefix.length + 1
197
+ end
198
+
199
+ def drop_index
200
+ redis.call("FT.DROPINDEX", @index_name)
201
+ end
202
+
203
+ def drop_keys
204
+ cursor = 0
205
+ begin
206
+ cursor, keys = redis.call("SCAN", cursor, "MATCH", "#{@prefix}*", "COUNT", 100)
207
+ redis.call("DEL", keys) if keys.any?
208
+ end while cursor != "0"
209
+ end
210
+
211
+ def to_binary(embedding)
212
+ embedding.to_a.pack(pack_format)
213
+ end
214
+
215
+ def from_binary(s)
216
+ s.unpack(pack_format)
217
+ end
218
+
219
+ def pack_format
220
+ @pack_format ||= @float64 ? "d#{@dimensions}" : "f#{@dimensions}"
221
+ end
222
+
223
+ # just use for create for now
224
+ def ft_command
225
+ yield
226
+ rescue => e
227
+ raise Error, "RediSearch not installed" if e.message.include?("ERR unknown command 'FT.")
228
+ raise
229
+ end
230
+
231
+ def redis
232
+ Redis.client
233
+ end
234
+ end
235
+ end
236
+ end
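Note on storage helpers: with hash storage, `to_binary` and `from_binary` convert between Ruby arrays and packed float blobs, and `find_prefix_length` is used to strip the key prefix so that only the item ID is returned. The following is a minimal standalone sketch of both ideas. The method signatures are simplified for illustration and `item_id` is a hypothetical name, so treat it as an approximation of the private helpers rather than a copy.

```ruby
# Hypothetical standalone versions of the private helpers in Index.
GLOBAL_PREFIX = "neighbor:items:"

# Pack floats into a binary blob:
# "f" is single precision, "d" is double precision (native byte order).
def to_binary(embedding, float64: false)
  embedding.pack("#{float64 ? 'd' : 'f'}#{embedding.size}")
end

# Unpack a blob back into an array of floats.
def from_binary(blob, dimensions, float64: false)
  blob.unpack("#{float64 ? 'd' : 'f'}#{dimensions}")
end

# Recover the item ID from a key such as "neighbor:items:movies:Star Wars (1977)".
# Only the first ":" after the global prefix is treated as the separator,
# so IDs may contain colons themselves.
def item_id(key)
  rest = key[GLOBAL_PREFIX.length..]
  rest[(rest.index(":") + 1)..]
end

blob = to_binary([1.0, 2.5, -3.0])
```

A `float32` blob uses 4 bytes per dimension and a `float64` blob uses 8, which is the storage cost behind the `type: "float64"` option in the README.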
data/lib/neighbor/redis/version.rb ADDED
@@ -0,0 +1,5 @@
1
+ module Neighbor
2
+ module Redis
3
+ VERSION = "0.1.0"
4
+ end
5
+ end
data/lib/neighbor/redis.rb ADDED
@@ -0,0 +1,18 @@
1
+ # dependencies
2
+ require "redis-client"
3
+
4
+ # modules
5
+ require_relative "redis/index"
6
+ require_relative "redis/flat_index"
7
+ require_relative "redis/hnsw_index"
8
+ require_relative "redis/version"
9
+
10
+ module Neighbor
11
+ module Redis
12
+ class Error < StandardError; end
13
+
14
+ class << self
15
+ attr_accessor :client
16
+ end
17
+ end
18
+ end
data/lib/neighbor-redis.rb ADDED
@@ -0,0 +1 @@
1
+ require_relative "neighbor/redis"
metadata ADDED
@@ -0,0 +1,65 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: neighbor-redis
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Andrew Kane
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2023-02-21 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: redis-client
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ description:
28
+ email: andrew@ankane.org
29
+ executables: []
30
+ extensions: []
31
+ extra_rdoc_files: []
32
+ files:
33
+ - CHANGELOG.md
34
+ - LICENSE.txt
35
+ - README.md
36
+ - lib/neighbor-redis.rb
37
+ - lib/neighbor/redis.rb
38
+ - lib/neighbor/redis/flat_index.rb
39
+ - lib/neighbor/redis/hnsw_index.rb
40
+ - lib/neighbor/redis/index.rb
41
+ - lib/neighbor/redis/version.rb
42
+ homepage: https://github.com/ankane/neighbor-redis
43
+ licenses:
44
+ - MIT
45
+ metadata: {}
46
+ post_install_message:
47
+ rdoc_options: []
48
+ require_paths:
49
+ - lib
50
+ required_ruby_version: !ruby/object:Gem::Requirement
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ version: '2.7'
55
+ required_rubygems_version: !ruby/object:Gem::Requirement
56
+ requirements:
57
+ - - ">="
58
+ - !ruby/object:Gem::Version
59
+ version: '0'
60
+ requirements: []
61
+ rubygems_version: 3.4.6
62
+ signing_key:
63
+ specification_version: 4
64
+ summary: Nearest neighbor search for Ruby and Redis
65
+ test_files: []