predictor 2.2.0 → 2.3.0
- checksums.yaml +4 -4
- data/Changelog.md +6 -1
- data/README.md +15 -0
- data/Rakefile +2 -0
- data/benchmark/process.rb +47 -0
- data/lib/predictor.rb +1 -0
- data/lib/predictor/base.rb +70 -5
- data/lib/predictor/input_matrix.rb +14 -13
- data/lib/predictor/predictor.rb +115 -0
- data/lib/predictor/version.rb +1 -1
- data/spec/base_spec.rb +171 -120
- data/spec/input_matrix_spec.rb +5 -0
- metadata +3 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c267ff0bf82d11ccefe19ae4314a62b09e9300fa
+  data.tar.gz: 7eed86238af5297d8c32250c6ebf28f992b9e331
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: eaa28d8a14437dd11742499a10211827ebc212feeb99d421185676d4a06ed1253f67a5252e7e4a748c6f87c69b6d0ed136b461f1df7b557ef4c92514b399e2c3
+  data.tar.gz: c60c5a6572cb6c42ef4e9633e29a73e8baee84384b6dbc3371b9979e7b0a5fc2c6bc2b67dd64e0a2733a38fbadb1e8df73b4e024bdb22b501aaf88b1732d438d
data/Changelog.md
CHANGED
@@ -2,7 +2,12 @@
 Predictor Changelog
 =========
 
-2.
+2.3.0
+---------------------
+* The logic for processing item similarities was ported to a Lua script. Use `Predictor.processing_technique(:lua)` to use the Lua script for all similarity calculations, or use `MyRecommender.processing_technique(:lua)` to use it for specific recommenders. It is substantially faster than the default (old) Ruby mechanism, but has the disadvantage of blocking the Redis server while it runs.
+* An alternate method of calculating item similarities was added, which uses a ZUNIONSTORE across item sets. The results are similar to those achieved by the Ruby or Lua scripts, and it is even faster. Use `Predictor.processing_technique(:union)` to use the ZUNIONSTORE technique for all similarity calculations, or use `MyRecommender.processing_technique(:union)` to use it for specific recommenders.
+
+2.2.0 (2014-06-24)
 ---------------------
 * The namespace used for keys in Redis is now configurable on a global or per-class basis. See the readme for more information. If you were overriding the redis_prefix instance method before, it is recommended that you use the new redis_prefix class method instead.
 * Data stored in Redis is now namespaced by the class name of the recommender it is stored by. This change ensures that different recommenders with input matrices of the same name don't overwrite each others' data. After upgrading you'll need to either reindex your data in Redis or configure Predictor to use the naming system you were using before. If you were using the defaults before and you're not worried about matrix name collisions, you can mimic the old behavior with:
data/README.md
CHANGED
@@ -219,6 +219,21 @@ You can also configure the namespace used by each class you create:
 end
 ```
 
+Processing Items
+---------------------
+As of 2.3.0, there are multiple techniques available for processing item similarities. You can choose between them by setting a global default like `Predictor.processing_technique(:lua)`, or by setting a technique for certain classes like `CourseRecommender.processing_technique(:union)`. There are three options:
+- :ruby - This is the default, and is how Predictor calculated similarities before 2.3.0. With this technique the Jaccard and Sorensen calculations are performed in Ruby, with frequent calls to Redis to retrieve simple values. It is somewhat slow.
+- :lua - This option performs the Jaccard and Sorensen calculations in a Lua script on the Redis server. It is substantially faster than the :ruby technique, but blocks the Redis server while each set of calculations is run. The period of blocking will vary based on the size and disposition of your data, but each call may take up to several hundred milliseconds. If your application requires your Redis server to always return results quickly, and you're not able to simply run calculations during off-hours, you should use a different strategy.
+- :union - This option skips Jaccard and Sorensen entirely, and uses a simpler technique involving a ZUNIONSTORE across many item sets to calculate similarities. The results are different from, but similar to, the results of the Jaccard and Sorensen algorithms. It is even faster than the :lua option and does not have the same problem of blocking Redis for long periods of time, but before using it you should sample the output to ensure that it is good enough for your application.
+
+Predictor now contains a benchmarking script that you can use to compare the speed of these options. An example output from processing a relatively small dataset is:
+
+```
+ruby = 21.098 seconds
+lua = 2.106 seconds
+union = 0.741 seconds
+```
+
 Upgrading from 1.0 to 2.0
 ---------------------
 As mentioned, 2.0.0 is quite a bit different than 1.0.0, so simply upgrading with no changes likely won't work. My apologies for this. I promise this won't happen in future releases, as I'm much more confident in this Predictor release than the last. Anywho, upgrading really shouldn't be that much of a pain if you follow these steps:
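For reference, the two similarity measures the :ruby and :lua techniques compute can be sketched in plain Ruby over in-memory sets. This is a simplified illustration only (Predictor itself runs these calculations against Redis sets, not local `Set` objects):

```ruby
require 'set'

# Jaccard index: |A ∩ B| / |A ∪ B|
def jaccard_index(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# Sorensen coefficient: 2 * |A ∩ B| / (|A| + |B|)
def sorensen_coefficient(a, b)
  denom = a.size + b.size
  denom.zero? ? 0.0 : 2.0 * (a & b).size / denom
end

me     = Set.new(%w[foo bar fnord])
not_me = Set.new(%w[foo shmoo])

jaccard_index(me, not_me)         # => 0.25 (1 shared item out of 4 distinct)
sorensen_coefficient(me, not_me)  # => 0.4  (2 * 1 / (3 + 2))
```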
data/Rakefile
CHANGED
data/benchmark/process.rb
CHANGED
@@ -0,0 +1,47 @@
+namespace :benchmark do
+  task :process do
+    require 'predictor'
+    require 'pry'
+    require 'logger'
+
+    Predictor.redis = Redis.new #logger: Logger.new(STDOUT)
+    Predictor.redis_prefix "predictor-benchmark"
+
+    def flush!
+      keys = Predictor.redis.keys("predictor-benchmark*")
+      Predictor.redis.del(keys) if keys.any?
+    end
+
+    class ItemRecommender
+      include Predictor::Base
+
+      input_matrix :users, weight: 2.0
+      input_matrix :parts, weight: 1.0
+    end
+
+    flush!
+
+    items = (1..200).map { |i| "item-#{i}" }
+    users = (1..100).map { |i| "user-#{i}" }
+    parts = (1..100).map { |i| "part-#{i}" }
+
+    r = ItemRecommender.new
+
+    start = Time.now
+    users.each { |user| r.users.add_to_set user, *items.sample(40) }
+    parts.each { |part| r.parts.add_to_set part, *items.sample(40) }
+    elapsed = Time.now - start
+
+    puts "add_to_set = #{elapsed.round(3)} seconds"
+
+    [:ruby, :lua, :union].each do |technique|
+      start = Time.now
+      Predictor.processing_technique technique
+      r.process!
+      elapsed = Time.now - start
+      puts "#{technique} = #{elapsed.round(3)} seconds"
+    end
+
+    flush!
+  end
+end
data/lib/predictor.rb
CHANGED
data/lib/predictor/base.rb
CHANGED
@@ -46,6 +46,14 @@ module Predictor::Base
        to_s
      end
    end
+
+    def processing_technique(technique)
+      @technique = technique
+    end
+
+    def get_processing_technique
+      @technique || Predictor.get_processing_technique
+    end
   end
 
   def input_matrices
@@ -75,7 +83,7 @@ module Predictor::Base
     end
   end
 
-  def respond_to?(method)
+  def respond_to?(method, include_all = false)
     input_matrices.has_key?(method) ? true : super
   end
 
@@ -104,9 +112,11 @@ module Predictor::Base
     keys.empty? ? [] : (Predictor.redis.sunion(keys) - [item.to_s])
   end
 
-  def predictions_for(set=nil, item_set: nil, matrix_label: nil, with_scores: false, offset: 0, limit: -1, exclusion_set: [], boost: {})
+  def predictions_for(set=nil, item_set: nil, matrix_label: nil, with_scores: false, on: nil, offset: 0, limit: -1, exclusion_set: [], boost: {})
     fail "item_set or matrix_label and set is required" unless item_set || (matrix_label && set)
 
+    on = Array(on)
+
     if matrix_label
       matrix = input_matrices[matrix_label]
       item_set = Predictor.redis.smembers(matrix.redis_key(:items, set))
@@ -150,6 +160,13 @@ module Predictor::Base
       multi.zunionstore 'temp', item_keys, weights: weights
       multi.zrem 'temp', item_set if item_set.any?
       multi.zrem 'temp', exclusion_set if exclusion_set.length > 0
+
+      if on.any?
+        multi.zadd 'temp2', on.map{ |val| [0.0, val] }
+        multi.zinterstore 'temp', ['temp', 'temp2']
+        multi.del 'temp2'
+      end
+
       predictions = multi.zrevrange 'temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores
       multi.del 'temp'
     end
@@ -178,10 +195,58 @@ module Predictor::Base
   end
 
   def process_items!(*items)
-    items = items.flatten if items.count == 1 && items[0].is_a?(Array)
-
-
+    items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax
+
+    case self.class.get_processing_technique
+    when :lua
+      matrix_data = {}
+      input_matrices.each do |name, matrix|
+        matrix_data[name] = {weight: matrix.weight, measure: matrix.measure_name}
+      end
+      matrix_json = JSON.dump(matrix_data)
+
+      items.each do |item|
+        Predictor.process_lua_script(redis_key, matrix_json, similarity_limit, item)
+      end
+    when :union
+      items.each do |item|
+        keys = []
+        weights = []
+
+        input_matrices.each do |key, matrix|
+          k = matrix.redis_key(:sets, item)
+          item_keys = Predictor.redis.smembers(k).map { |set| matrix.redis_key(:items, set) }
+
+          counts = Predictor.redis.multi do |multi|
+            item_keys.each { |key| Predictor.redis.scard(key) }
+          end
+
+          item_keys.zip(counts).each do |key, count|
+            unless count.zero?
+              keys << key
+              weights << matrix.weight / count
+            end
+          end
+        end
+
+        Predictor.redis.multi do |multi|
+          key = redis_key(:similarities, item)
+          multi.del(key)
+
+          if keys.any?
+            multi.zunionstore(key, keys, weights: weights)
+            multi.zrem(key, item)
+            multi.zremrangebyrank(key, 0, -(similarity_limit + 1))
+            multi.zunionstore key, [key] # Rewrite zset for optimized storage.
+          end
+        end
+      end
+    else # Default to old behavior, processing things in Ruby.
+      items.each do |item|
+        related_items(item).each { |related_item| cache_similarity(item, related_item) }
+      end
     end
+
     return self
   end
 
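Predictor's :union technique scores each related item by summing weight / |set| over every set it shares with the processed item (the ZUNIONSTORE call with per-key weights). A minimal in-memory sketch of that scoring in plain Ruby, with hypothetical data and no Redis involved:

```ruby
# Each set containing the target item contributes matrix_weight / set_size
# to every other member of that set; ZUNIONSTORE sums those contributions.
def union_similarities(item, matrices)
  scores = Hash.new(0.0)
  matrices.each do |matrix|
    matrix[:sets].each_value do |members|
      next unless members.include?(item)
      members.each do |other|
        scores[other] += matrix[:weight] / members.size unless other == item
      end
    end
  end
  scores.sort_by { |_, score| -score }
end

matrices = [
  { weight: 2.0, sets: { 'set1' => %w[item1 item2], 'set2' => %w[item2 item3] } },
  { weight: 1.0, sets: { 'set3' => %w[item1 item2 item3] } }
]

union_similarities('item2', matrices)
# both item1 and item3 end up with 2.0/2 + 1.0/3 ≈ 1.33
```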
data/lib/predictor/input_matrix.rb
CHANGED
@@ -4,6 +4,10 @@ module Predictor
      @opts = opts
    end
 
+    def measure_name
+      @opts.fetch(:measure, :jaccard_index)
+    end
+
    def base
      @opts[:base]
    end
@@ -22,8 +26,16 @@ module Predictor
 
    def add_to_set(set, *items)
      items = items.flatten if items.count == 1 && items[0].is_a?(Array)
-
-
+      if items.any?
+        Predictor.redis.multi do |redis|
+          redis.sadd(parent_redis_key(:all_items), items)
+          redis.sadd(redis_key(:items, set), items)
+
+          items.each do |item|
+            # add the set to the item's set--inverting the sets
+            redis.sadd(redis_key(:sets, item), set)
+          end
+        end
      end
    end
 
@@ -64,7 +76,6 @@ module Predictor
    end
 
    def score(item1, item2)
-      measure_name = @opts.fetch(:measure, :jaccard_index)
      Distance.send(measure_name, redis_key(:sets, item1), redis_key(:sets, item2), Predictor.redis)
    end
 
@@ -72,15 +83,5 @@ module Predictor
      warn 'InputMatrix#calculate_jaccard is now deprecated. Use InputMatrix#score instead'
      Distance.jaccard_index(redis_key(:sets, item1), redis_key(:sets, item2), Predictor.redis)
    end
-
-    private
-
-    def add_single_nomulti(set, item)
-      Predictor.redis.sadd(parent_redis_key(:all_items), item)
-      Predictor.redis.sadd(redis_key(:items, set), item)
-      # add the set to the item's set--inverting the sets
-      Predictor.redis.sadd(redis_key(:sets, item), set)
-    end
-
  end
end
data/lib/predictor/predictor.rb
CHANGED
@@ -35,4 +35,119 @@ module Predictor
   def self.constantize(klass)
     Object.module_eval("Predictor::#{klass}", __FILE__, __LINE__)
   end
+
+  def self.processing_technique(algorithm)
+    @technique = algorithm
+  end
+
+  def self.get_processing_technique
+    @technique || :ruby
+  end
+
+  def self.process_lua_script(*args)
+    @process_sha ||= redis.script(:load, PROCESS_ITEMS_LUA_SCRIPT)
+    redis.evalsha(@process_sha, argv: args)
+  end
+
+  PROCESS_ITEMS_LUA_SCRIPT = <<-LUA
+    local redis_prefix = ARGV[1]
+    local input_matrices = cjson.decode(ARGV[2])
+    local similarity_limit = tonumber(ARGV[3])
+    local item = ARGV[4]
+    local keys = {}
+
+    for name, options in pairs(input_matrices) do
+      local key = table.concat({redis_prefix, name, 'sets', item}, ':')
+      local sets = redis.call('SMEMBERS', key)
+      for _, set in ipairs(sets) do
+        table.insert(keys, table.concat({redis_prefix, name, 'items', set}, ':'))
+      end
+    end
+
+    -- Account for empty tables.
+    if next(keys) == nil then
+      return nil
+    end
+
+    local related_items = redis.call('SUNION', unpack(keys))
+
+    local function add_similarity_if_necessary(item, similarity, score)
+      local store = true
+      local key = table.concat({redis_prefix, 'similarities', item}, ':')
+
+      if similarity_limit ~= nil then
+        local zrank = redis.call('ZRANK', key, similarity)
+
+        if zrank ~= nil then
+          local zcard = redis.call('ZCARD', key)
+
+          if zcard >= similarity_limit then
+            -- Similarity is not already stored and we are at limit of similarities.
+
+            local lowest_scored_item = redis.call('ZRANGEBYSCORE', key, '0', '+inf', 'withscores', 'limit', 0, 1)
+
+            if #lowest_scored_item > 0 then
+              -- If score is less than or equal to the lowest score, don't store it. Otherwise, make room by removing the lowest scored similarity
+              if score <= tonumber(lowest_scored_item[2]) then
+                store = false
+              else
+                redis.call('ZREM', key, lowest_scored_item[1])
+              end
+            end
+          end
+        end
+      end
+
+      if store then
+        redis.call('ZADD', key, score, similarity)
+      end
+    end
+
+    for i, related_item in ipairs(related_items) do
+      -- Disregard the current item.
+      if related_item ~= item then
+        local score = 0.0
+
+        for name, matrix in pairs(input_matrices) do
+          local s = 0.0
+
+          local key_1 = table.concat({redis_prefix, name, 'sets', item}, ':')
+          local key_2 = table.concat({redis_prefix, name, 'sets', related_item}, ':')
+
+          if matrix.measure == 'jaccard_index' then
+            local x = tonumber(redis.call('SINTERSTORE', 'temp', key_1, key_2))
+            local y = tonumber(redis.call('SUNIONSTORE', 'temp', key_1, key_2))
+            redis.call('DEL', 'temp')
+
+            if y > 0 then
+              s = s + (x / y)
+            end
+          elseif matrix.measure == 'sorensen_coefficient' then
+            local x = redis.call('SINTERSTORE', 'temp', key_1, key_2)
+            local y = redis.call('SCARD', key_1)
+            local z = redis.call('SCARD', key_2)
+
+            redis.call('DEL', 'temp')
+
+            local denom = y + z
+            if denom > 0 then
+              s = s + (2 * x / denom)
+            end
+          else
+            error("Bad matrix.measure: " .. matrix.measure)
+          end
+
+          score = score + (s * matrix.weight)
+        end
+
+        if score > 0 then
+          add_similarity_if_necessary(item, related_item, score)
+          add_similarity_if_necessary(related_item, item, score)
+        else
+          redis.call('ZREM', table.concat({redis_prefix, 'similarities', item}, ':'), related_item)
+          redis.call('ZREM', table.concat({redis_prefix, 'similarities', related_item}, ':'), item)
+        end
+      end
+    end
+  LUA
 end
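The add_similarity_if_necessary function in the script above stores a new similarity only when there is room under similarity_limit, or when its score beats the lowest stored score (which is then evicted). A plain-Ruby sketch of that pruning rule, using a Hash in place of the Redis sorted set (illustrative only, not Predictor's API):

```ruby
# Store (member, score) in a capped "sorted set": update freely if the member
# is already present or the cap isn't reached; otherwise evict the lowest
# scored entry, but only when the new score beats it.
def add_similarity(similarities, member, score, limit)
  if similarities.key?(member) || similarities.size < limit
    similarities[member] = score
  else
    lowest_member, lowest_score = similarities.min_by { |_, s| s }
    if score > lowest_score
      similarities.delete(lowest_member)
      similarities[member] = score
    end
  end
  similarities
end

sims = {}
add_similarity(sims, 'item3', 4.0, 2)
add_similarity(sims, 'item1', 2.5, 2)
add_similarity(sims, 'item9', 1.0, 2) # below the lowest stored score: dropped
add_similarity(sims, 'item7', 3.0, 2) # beats 2.5, so item1 is evicted
sims # => {"item3"=>4.0, "item7"=>3.0}
```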
data/lib/predictor/version.rb
CHANGED
data/spec/base_spec.rb
CHANGED
@@ -8,6 +8,9 @@ describe Predictor::Base do
     BaseRecommender.redis_prefix(nil)
     UserRecommender.input_matrices = {}
     UserRecommender.reset_similarity_limit!
+    BaseRecommender.processing_technique nil
+    UserRecommender.processing_technique nil
+    Predictor.processing_technique nil
   end
 
   describe "configuration" do
@@ -49,6 +52,14 @@ describe Predictor::Base do
       sm = BaseRecommender.new
       expect(sm.myinput).to be_a(Predictor::InputMatrix)
     end
+
+    it "should accept a custom processing_technique, or default to Predictor's default" do
+      BaseRecommender.get_processing_technique.should == :ruby
+      Predictor.processing_technique :lua
+      BaseRecommender.get_processing_technique.should == :lua
+      BaseRecommender.processing_technique :union
+      BaseRecommender.get_processing_technique.should == :union
+    end
   end
 
   describe "redis_key" do
@@ -202,7 +213,7 @@ describe Predictor::Base do
   end
 
   describe "predictions_for" do
-    it "
+    it "accepts an :on option to return scores of specific objects" do
      BaseRecommender.input_matrix(:users, weight: 4.0)
      BaseRecommender.input_matrix(:tags, weight: 1.0)
      sm = BaseRecommender.new
@@ -211,93 +222,171 @@ describe Predictor::Base do
      sm.users.add_to_set('another', "fnord", "other")
      sm.users.add_to_set('another', "nada")
      sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
-      sm.tags.add_to_set('tag2', "bar", "shmoo")
+      sm.tags.add_to_set('tag2', "bar", "shmoo", "other")
      sm.tags.add_to_set('tag3', "shmoo", "nada")
      sm.process!
-      predictions = sm.predictions_for('me', matrix_label: :users)
-      expect(predictions).to eq([
-      predictions = sm.predictions_for(
-      expect(predictions).to eq([
-      predictions = sm.predictions_for('me', matrix_label: :users,
-      expect(predictions).to eq([
-      predictions = sm.predictions_for(
-      expect(predictions).to eq([
+      predictions = sm.predictions_for('me', matrix_label: :users, on: 'other', with_scores: true)
+      expect(predictions).to eq([['other', 3.0]])
+      predictions = sm.predictions_for('me', matrix_label: :users, on: ['other'], with_scores: true)
+      expect(predictions).to eq([['other', 3.0]])
+      predictions = sm.predictions_for('me', matrix_label: :users, on: ['other', 'nada'], with_scores: true)
+      expect(predictions).to eq([['other', 3.0], ['nada', 2.0]])
+      predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], on: ['other', 'nada'], with_scores: true)
+      expect(predictions).to eq([['other', 3.0], ['nada', 2.0]])
+      predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], on: ['other', 'nada'])
+      expect(predictions).to eq(['other', 'nada'])
+      predictions = sm.predictions_for('me', matrix_label: :users, on: ['shmoo', 'other', 'nada'], offset: 1, limit: 1, with_scores: true)
+      expect(predictions).to eq([["other", 3.0]])
+      predictions = sm.predictions_for('me', matrix_label: :users, on: ['shmoo', 'other', 'nada'], offset: 1, with_scores: true)
+      expect(predictions).to eq([['other', 3.0], ['nada', 2.0]])
    end
+  end
 
-
-
-
-
-
-      sm.users.add_to_set('not_me', "foo", "shmoo")
-      sm.users.add_to_set('another', "fnord", "other")
-      sm.users.add_to_set('another', "nada")
-      sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
-      sm.tags.add_to_set('tag2', "bar", "shmoo")
-      sm.tags.add_to_set('tag3', "shmoo", "nada")
-      sm.process!
+  [:ruby, :lua, :union].each do |technique|
+    describe "predictions_for with #{technique} processing" do
+      before do
+        Predictor.processing_technique(technique)
+      end
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+      it "returns relevant predictions" do
+        BaseRecommender.input_matrix(:users, weight: 4.0)
+        BaseRecommender.input_matrix(:tags, weight: 1.0)
+        sm = BaseRecommender.new
+        sm.users.add_to_set('me', "foo", "bar", "fnord")
+        sm.users.add_to_set('not_me', "foo", "shmoo")
+        sm.users.add_to_set('another', "fnord", "other")
+        sm.users.add_to_set('another', "nada")
+        sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
+        sm.tags.add_to_set('tag2', "bar", "shmoo")
+        sm.tags.add_to_set('tag3', "shmoo", "nada")
+        sm.process!
+        predictions = sm.predictions_for('me', matrix_label: :users)
+        expect(predictions).to eq(["shmoo", "other", "nada"])
+        predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"])
+        expect(predictions).to eq(["shmoo", "other", "nada"])
+        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1)
+        expect(predictions).to eq(["other"])
+        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1)
+        expect(predictions).to eq(["other", "nada"])
+      end
+
+      it "accepts a :boost option" do
+        BaseRecommender.input_matrix(:users, weight: 4.0)
+        BaseRecommender.input_matrix(:tags, weight: 1.0)
+        sm = BaseRecommender.new
+        sm.users.add_to_set('me', "foo", "bar", "fnord")
+        sm.users.add_to_set('not_me', "foo", "shmoo")
+        sm.users.add_to_set('another', "fnord", "other")
+        sm.users.add_to_set('another', "nada")
+        sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
+        sm.tags.add_to_set('tag2', "bar", "shmoo")
+        sm.tags.add_to_set('tag3', "shmoo", "nada")
+        sm.process!
+
+        # Syntax #1: Tags passed as array, weights assumed to be 1.0
+        predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: ['tag3']})
+        expect(predictions).to eq(["shmoo", "nada", "other"])
+        predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], boost: {tags: ['tag3']})
+        expect(predictions).to eq(["shmoo", "nada", "other"])
+        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: ['tag3']})
+        expect(predictions).to eq(["nada"])
+        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: ['tag3']})
+        expect(predictions).to eq(["nada", "other"])
+
+        # Syntax #2: Weights explicitly set.
+        predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 1.0}})
+        expect(predictions).to eq(["shmoo", "nada", "other"])
+        predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], boost: {tags: {values: ['tag3'], weight: 1.0}})
+        expect(predictions).to eq(["shmoo", "nada", "other"])
+        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
+        expect(predictions).to eq(["nada"])
+        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
+        expect(predictions).to eq(["nada", "other"])
+
+        # Make sure weights are actually being passed to Redis.
+        shmoo, nada, other = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 10000.0}}, with_scores: true)
+        expect(shmoo[0]).to eq('shmoo')
+        expect(shmoo[1]).to be > 10000
+        expect(nada[0]).to eq('nada')
+        expect(nada[1]).to be > 10000
+        expect(other[0]).to eq('other')
+        expect(other[1]).to be < 10
+      end
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
+      it "accepts a :boost option, even with an empty item set" do
+        BaseRecommender.input_matrix(:users, weight: 4.0)
+        BaseRecommender.input_matrix(:tags, weight: 1.0)
+        sm = BaseRecommender.new
+        sm.users.add_to_set('not_me', "foo", "shmoo")
+        sm.users.add_to_set('another', "fnord", "other")
+        sm.users.add_to_set('another', "nada")
+        sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
+        sm.tags.add_to_set('tag2', "bar", "shmoo")
+        sm.tags.add_to_set('tag3', "shmoo", "nada")
+        sm.process!
+
+        # Syntax #1: Tags passed as array, weights assumed to be 1.0
+        predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: ['tag3']})
+        expect(predictions).to eq(["shmoo", "nada"])
+        predictions = sm.predictions_for(item_set: [], boost: {tags: ['tag3']})
+        expect(predictions).to eq(["shmoo", "nada"])
+        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: ['tag3']})
+        expect(predictions).to eq(["nada"])
+        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: ['tag3']})
+        expect(predictions).to eq(["nada"])
+
+        # Syntax #2: Weights explicitly set.
+        predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 1.0}})
+        expect(predictions).to eq(["shmoo", "nada"])
+        predictions = sm.predictions_for(item_set: [], boost: {tags: {values: ['tag3'], weight: 1.0}})
+        expect(predictions).to eq(["shmoo", "nada"])
+        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
+        expect(predictions).to eq(["nada"])
+        predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
+        expect(predictions).to eq(["nada"])
+      end
+    end
+
+    describe "process_items! with #{technique} processing" do
+      before do
+        Predictor.processing_technique(technique)
+      end
+
+      context "with no similarity_limit" do
+        it "calculates the similarity between the item and all related_items (other items in a set the given item is in)" do
+          BaseRecommender.input_matrix(:myfirstinput)
+          BaseRecommender.input_matrix(:mysecondinput)
+          BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
+          sm = BaseRecommender.new
+          sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
+          sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
+          sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
+          sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
+          expect(sm.similarities_for('item2')).to be_empty
+          sm.process_items!('item2')
+          similarities = sm.similarities_for('item2')
+          expect(similarities).to eq(["item3", "item1"])
+        end
+      end
+
+      context "with a similarity_limit" do
+        it "calculates the similarity between the item and all related_items (other items in a set the given item is in), but obeys the similarity_limit" do
+          BaseRecommender.input_matrix(:myfirstinput)
+          BaseRecommender.input_matrix(:mysecondinput)
+          BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
+          BaseRecommender.limit_similarities_to(1)
+          sm = BaseRecommender.new
+          sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
+          sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
+          sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
+          sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
+          expect(sm.similarities_for('item2')).to be_empty
+          sm.process_items!('item2')
+          similarities = sm.similarities_for('item2')
+          expect(similarities).to include("item3")
+          expect(similarities.length).to eq(1)
+        end
+      end
    end
  end
 
@@ -343,44 +432,6 @@ describe Predictor::Base do
    end
  end
 
-  describe "process_items!" do
-    context "with no similarity_limit" do
-      it "calculates the similarity between the item and all related_items (other items in a set the given item is in)" do
-        BaseRecommender.input_matrix(:myfirstinput)
-        BaseRecommender.input_matrix(:mysecondinput)
-        BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
-        sm = BaseRecommender.new
-        sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
-        sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
-        sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
-        sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
-        expect(sm.similarities_for('item2')).to be_empty
-        sm.process_items!('item2')
-        similarities = sm.similarities_for('item2', with_scores: true)
-        expect(similarities).to include(["item3", 4.0], ["item1", 2.5])
-      end
-    end
-
-    context "with a similarity_limit" do
-      it "calculates the similarity between the item and all related_items (other items in a set the given item is in), but obeys the similarity_limit" do
-        BaseRecommender.input_matrix(:myfirstinput)
-        BaseRecommender.input_matrix(:mysecondinput)
-        BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
-        BaseRecommender.limit_similarities_to(1)
-        sm = BaseRecommender.new
-        sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
-        sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
-        sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
-        sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
-        expect(sm.similarities_for('item2')).to be_empty
-        sm.process_items!('item2')
-        similarities = sm.similarities_for('item2', with_scores: true)
-        expect(similarities).to include(["item3", 4.0])
-        expect(similarities.length).to eq(1)
-      end
-    end
-  end
-
  describe "process!" do
    it "should call process_items for all_items's" do
      BaseRecommender.input_matrix(:anotherinput)
data/spec/input_matrix_spec.rb
CHANGED
@@ -85,6 +85,11 @@ describe Predictor::InputMatrix do
     expect(@matrix.items_for("item1")).to include("foo", "bar", "fnord", "blubb")
   end
 
+  it "does not crash if the set of items is empty" do
+    @matrix.add_to_set "item1"
+    @matrix.add_to_set "item1", []
+  end
+
   it "adds the key to each set member's 'items' set" do
     expect(@matrix.sets_for("foo")).not_to include("item1")
     expect(@matrix.sets_for("bar")).not_to include("item1")
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: predictor
 version: !ruby/object:Gem::Version
-  version: 2.
+  version: 2.3.0
 platform: ruby
 authors:
 - Pathgather
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-
+date: 2014-09-05 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: redis
@@ -92,6 +92,7 @@ files:
 - LICENSE
 - README.md
 - Rakefile
+- benchmark/process.rb
 - docs/READMEv1.md
 - lib/predictor.rb
 - lib/predictor/base.rb
|