predictor 2.2.0 → 2.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: f06d8361ac24ffaedb43dc650bba9af6ad62374a
- data.tar.gz: c2815b5b8a507026bae58773ac32a1d7188debcb
+ metadata.gz: c267ff0bf82d11ccefe19ae4314a62b09e9300fa
+ data.tar.gz: 7eed86238af5297d8c32250c6ebf28f992b9e331
  SHA512:
- metadata.gz: 2988190b65071a5d155974db67bc9815614720f57d9e5131ac429c0fcae5d7210527a07548aca73066a63fee542ee6edb7fb9209304041374df04304e72650ff
- data.tar.gz: aa8215990ac119de3ca275c9ab666cbe246ffa53f28f5c428361734d7a3a79186bffc007d83056d3b1511beead9ab9ec0aa1f941a97fc8d41378b48b61775b53
+ metadata.gz: eaa28d8a14437dd11742499a10211827ebc212feeb99d421185676d4a06ed1253f67a5252e7e4a748c6f87c69b6d0ed136b461f1df7b557ef4c92514b399e2c3
+ data.tar.gz: c60c5a6572cb6c42ef4e9633e29a73e8baee84384b6dbc3371b9979e7b0a5fc2c6bc2b67dd64e0a2733a38fbadb1e8df73b4e024bdb22b501aaf88b1732d438d
data/Changelog.md CHANGED
@@ -2,7 +2,12 @@
  Predictor Changelog
  =========

- 2.2.0 (Unreleased)
+ 2.3.0
+ ---------------------
+ * The logic for processing item similarities was ported to a Lua script. Use `Predictor.processing_technique(:lua)` to use the Lua script for all similarity calculations, or use `MyRecommender.processing_technique(:lua)` to use it for specific recommenders. It is substantially faster than the default Ruby mechanism, but has the disadvantage of blocking the Redis server while it runs.
+ * An alternate method of calculating item similarities was added, which uses a ZUNIONSTORE across item sets. The results are similar to those produced by the Ruby and Lua techniques, but the calculation is much faster. Use `Predictor.processing_technique(:union)` to use the ZUNIONSTORE technique for all similarity calculations, or use `MyRecommender.processing_technique(:union)` to use it for specific recommenders.
+
+ 2.2.0 (2014-06-24)
  ---------------------
  * The namespace used for keys in Redis is now configurable on a global or per-class basis. See the readme for more information. If you were overriding the redis_prefix instance method before, it is recommended that you use the new redis_prefix class method instead.
  * Data stored in Redis is now namespaced by the class name of the recommender it is stored by. This change ensures that different recommenders with input matrices of the same name don't overwrite each others' data. After upgrading you'll need to either reindex your data in Redis or configure Predictor to use the naming system you were using before. If you were using the defaults before and you're not worried about matrix name collisions, you can mimic the old behavior with:
data/README.md CHANGED
@@ -219,6 +219,21 @@ You can also configure the namespace used by each class you create:
  end
  ```

+ Processing Items
+ ---------------------
+ As of 2.3.0, there are multiple techniques available for processing item similarities. You can choose between them by setting a global default, like `Predictor.processing_technique(:lua)`, or by setting a technique for specific classes, like `CourseRecommender.processing_technique(:union)`. There are three possible values:
+ - :ruby - The default, and how Predictor calculated similarities before 2.3.0. With this technique, the Jaccard and Sorensen calculations are performed in Ruby, with frequent calls to Redis to retrieve simple values. It is somewhat slow.
+ - :lua - This option performs the Jaccard and Sorensen calculations in a Lua script on the Redis server. It is substantially faster than the :ruby technique, but blocks the Redis server while each set of calculations is run. The period of blocking will vary with the size and disposition of your data, but each call may take up to several hundred milliseconds. If your application requires your Redis server to always respond quickly, and you're not able to simply run calculations during off-hours, you should use a different technique.
+ - :union - This option skips Jaccard and Sorensen entirely and uses a simpler technique involving a ZUNIONSTORE across many item sets to calculate similarities. The results differ from, but are similar to, those of the Jaccard and Sorensen algorithms. It is even faster than the :lua option and does not block Redis for long periods, but before using it you should sample the output to ensure that it is good enough for your application.
+
+ Predictor now contains a benchmarking script that you can use to compare the speed of these options. Example output from processing a relatively small dataset:
+
+ ```
+ ruby = 21.098 seconds
+ lua = 2.106 seconds
+ union = 0.741 seconds
+ ```
+
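The difference between the Jaccard-based scoring and the `:union` technique can be sketched in plain Ruby. This is a toy model with hypothetical helpers, not Predictor's actual implementation: here each "set" is just a Ruby `Set` of items, Jaccard compares the sets two items belong to, and the union technique sums a `1 / set size` contribution from each shared set.

```ruby
require 'set'

sets = {
  'user1' => Set['item1', 'item2'],
  'user2' => Set['item2', 'item3'],
}

# The sets containing a given item (Predictor's inverted "sets" index).
def sets_containing(sets, item)
  sets.select { |_, members| members.include?(item) }
end

# Jaccard index between the two items' set memberships.
def jaccard(sets, a, b)
  sa = Set.new(sets_containing(sets, a).keys)
  sb = Set.new(sets_containing(sets, b).keys)
  union = (sa | sb).size
  union.zero? ? 0.0 : (sa & sb).size.to_f / union
end

# Union technique: each set containing `a` contributes 1/|set| to
# every other member of that set (with a matrix weight of 1.0).
def union_score(sets, a, b)
  sets_containing(sets, a).sum do |_, members|
    members.include?(b) ? 1.0 / members.size : 0.0
  end
end

jaccard(sets, 'item1', 'item2')     # => 0.5 ('user1' out of ['user1', 'user2'])
union_score(sets, 'item1', 'item2') # => 0.5 (1/2 from 'user1')
```

The two scores coincide here, but diverge as set sizes and overlaps vary, which is why sampling the `:union` output before adopting it is recommended.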
  Upgrading from 1.0 to 2.0
  ---------------------
  As mentioned, 2.0.0 is quite a bit different than 1.0.0, so simply upgrading with no changes likely won't work. My apologies for this. I promise this won't happen in future releases, as I'm much more confident in this Predictor release than the last. Anywho, upgrading really shouldn't be that much of a pain if you follow these steps:
data/Rakefile CHANGED
@@ -4,3 +4,5 @@ require 'rspec/core/rake_task'
  RSpec::Core::RakeTask.new(:spec)

  task :default => :spec
+
+ Dir["./benchmark/*.rb"].sort.each &method(:require)
data/benchmark/process.rb ADDED
@@ -0,0 +1,47 @@
+ namespace :benchmark do
+ task :process do
+ require 'predictor'
+ require 'pry'
+ require 'logger'
+
+ Predictor.redis = Redis.new #logger: Logger.new(STDOUT)
+ Predictor.redis_prefix "predictor-benchmark"
+
+ def flush!
+ keys = Predictor.redis.keys("predictor-benchmark*")
+ Predictor.redis.del(keys) if keys.any?
+ end
+
+ class ItemRecommender
+ include Predictor::Base
+
+ input_matrix :users, weight: 2.0
+ input_matrix :parts, weight: 1.0
+ end
+
+ flush!
+
+ items = (1..200).map { |i| "item-#{i}" }
+ users = (1..100).map { |i| "user-#{i}" }
+ parts = (1..100).map { |i| "part-#{i}" }
+
+ r = ItemRecommender.new
+
+ start = Time.now
+ users.each { |user| r.users.add_to_set user, *items.sample(40) }
+ parts.each { |part| r.parts.add_to_set part, *items.sample(40) }
+ elapsed = Time.now - start
+
+ puts "add_to_set = #{elapsed.round(3)} seconds"
+
+ [:ruby, :lua, :union].each do |technique|
+ start = Time.now
+ Predictor.processing_technique technique
+ r.process!
+ elapsed = Time.now - start
+ puts "#{technique} = #{elapsed.round(3)} seconds"
+ end
+
+ flush!
+ end
+ end
data/lib/predictor.rb CHANGED
@@ -1,3 +1,4 @@
+ require 'json'
  require "redis"
  require "predictor/predictor"
  require "predictor/distance"
@@ -46,6 +46,14 @@ module Predictor::Base
  to_s
  end
  end
+
+ def processing_technique(technique)
+ @technique = technique
+ end
+
+ def get_processing_technique
+ @technique || Predictor.get_processing_technique
+ end
  end

  def input_matrices
@@ -75,7 +83,7 @@ module Predictor::Base
  end
  end

- def respond_to?(method)
+ def respond_to?(method, include_all = false)
  input_matrices.has_key?(method) ? true : super
  end

@@ -104,9 +112,11 @@ module Predictor::Base
  keys.empty? ? [] : (Predictor.redis.sunion(keys) - [item.to_s])
  end

- def predictions_for(set=nil, item_set: nil, matrix_label: nil, with_scores: false, offset: 0, limit: -1, exclusion_set: [], boost: {})
+ def predictions_for(set=nil, item_set: nil, matrix_label: nil, with_scores: false, on: nil, offset: 0, limit: -1, exclusion_set: [], boost: {})
  fail "item_set or matrix_label and set is required" unless item_set || (matrix_label && set)

+ on = Array(on)
+
  if matrix_label
  matrix = input_matrices[matrix_label]
  item_set = Predictor.redis.smembers(matrix.redis_key(:items, set))
@@ -150,6 +160,13 @@ module Predictor::Base
  multi.zunionstore 'temp', item_keys, weights: weights
  multi.zrem 'temp', item_set if item_set.any?
  multi.zrem 'temp', exclusion_set if exclusion_set.length > 0
+
+ if on.any?
+ multi.zadd 'temp2', on.map{ |val| [0.0, val] }
+ multi.zinterstore 'temp', ['temp', 'temp2']
+ multi.del 'temp2'
+ end
+
  predictions = multi.zrevrange 'temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores
  multi.del 'temp'
  end
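The new `on:` option works by intersecting the prediction zset with a temporary zset whose members all have score 0.0, so only the requested members survive, keeping their original scores (ZINTERSTORE sums scores, and x + 0.0 = x). A minimal pure-Ruby sketch of that semantics, with a hypothetical `filter_predictions` helper that is not part of Predictor's API:

```ruby
# Toy model of ZINTERSTORE against a zero-scored filter set: keep only
# the requested members of a member => score hash, scores unchanged.
def filter_predictions(scores, on)
  on = Array(on)          # accept a single member or an array, like `on:`
  return scores if on.empty?
  scores.select { |member, _| on.include?(member) }
end

scores = { 'shmoo' => 4.0, 'other' => 3.0, 'nada' => 2.0 }
filter_predictions(scores, 'other')           # => {"other"=>3.0}
filter_predictions(scores, ['other', 'nada']) # => {"other"=>3.0, "nada"=>2.0}
```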
@@ -178,10 +195,58 @@ module Predictor::Base
  end

  def process_items!(*items)
- items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax
- items.each do |item|
- related_items(item).each{ |related_item| cache_similarity(item, related_item) }
+ items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax
+
+ case self.class.get_processing_technique
+ when :lua
+ matrix_data = {}
+ input_matrices.each do |name, matrix|
+ matrix_data[name] = {weight: matrix.weight, measure: matrix.measure_name}
+ end
+ matrix_json = JSON.dump(matrix_data)
+
+ items.each do |item|
+ Predictor.process_lua_script(redis_key, matrix_json, similarity_limit, item)
+ end
+ when :union
+ items.each do |item|
+ keys = []
+ weights = []
+
+ input_matrices.each do |key, matrix|
+ k = matrix.redis_key(:sets, item)
+ item_keys = Predictor.redis.smembers(k).map { |set| matrix.redis_key(:items, set) }
+
+ counts = Predictor.redis.multi do |multi|
+ item_keys.each { |key| Predictor.redis.scard(key) }
+ end
+
+ item_keys.zip(counts).each do |key, count|
+ unless count.zero?
+ keys << key
+ weights << matrix.weight / count
+ end
+ end
+ end
+
+ Predictor.redis.multi do |multi|
+ key = redis_key(:similarities, item)
+ multi.del(key)
+
+ if keys.any?
+ multi.zunionstore(key, keys, weights: weights)
+ multi.zrem(key, item)
+ multi.zremrangebyrank(key, 0, -(similarity_limit + 1))
+ multi.zunionstore key, [key] # Rewrite zset for optimized storage.
+ end
+ end
+ end
+ else # Default to old behavior, processing things in Ruby.
+ items.each do |item|
+ related_items(item).each { |related_item| cache_similarity(item, related_item) }
+ end
  end
+
  return self
  end

@@ -4,6 +4,10 @@ module Predictor
  @opts = opts
  end

+ def measure_name
+ @opts.fetch(:measure, :jaccard_index)
+ end
+
  def base
  @opts[:base]
  end
@@ -22,8 +26,16 @@ module Predictor

  def add_to_set(set, *items)
  items = items.flatten if items.count == 1 && items[0].is_a?(Array)
- Predictor.redis.multi do
- items.each { |item| add_single_nomulti(set, item) }
+ if items.any?
+ Predictor.redis.multi do |redis|
+ redis.sadd(parent_redis_key(:all_items), items)
+ redis.sadd(redis_key(:items, set), items)
+
+ items.each do |item|
+ # add the set to the item's set--inverting the sets
+ redis.sadd(redis_key(:sets, item), set)
+ end
+ end
  end
  end

@@ -64,7 +76,6 @@ module Predictor
  end

  def score(item1, item2)
- measure_name = @opts.fetch(:measure, :jaccard_index)
  Distance.send(measure_name, redis_key(:sets, item1), redis_key(:sets, item2), Predictor.redis)
  end

@@ -72,15 +83,5 @@ module Predictor
  warn 'InputMatrix#calculate_jaccard is now deprecated. Use InputMatrix#score instead'
  Distance.jaccard_index(redis_key(:sets, item1), redis_key(:sets, item2), Predictor.redis)
  end
-
- private
-
- def add_single_nomulti(set, item)
- Predictor.redis.sadd(parent_redis_key(:all_items), item)
- Predictor.redis.sadd(redis_key(:items, set), item)
- # add the set to the item's set--inverting the sets
- Predictor.redis.sadd(redis_key(:sets, item), set)
- end
-
  end
  end
@@ -35,4 +35,119 @@ module Predictor
  def self.constantize(klass)
  Object.module_eval("Predictor::#{klass}", __FILE__, __LINE__)
  end
+
+ def self.processing_technique(algorithm)
+ @technique = algorithm
+ end
+
+ def self.get_processing_technique
+ @technique || :ruby
+ end
+
+ def self.process_lua_script(*args)
+ @process_sha ||= redis.script(:load, PROCESS_ITEMS_LUA_SCRIPT)
+ redis.evalsha(@process_sha, argv: args)
+ end
+
+ PROCESS_ITEMS_LUA_SCRIPT = <<-LUA
+ local redis_prefix = ARGV[1]
+ local input_matrices = cjson.decode(ARGV[2])
+ local similarity_limit = tonumber(ARGV[3])
+ local item = ARGV[4]
+ local keys = {}
+
+ for name, options in pairs(input_matrices) do
+ local key = table.concat({redis_prefix, name, 'sets', item}, ':')
+ local sets = redis.call('SMEMBERS', key)
+ for _, set in ipairs(sets) do
+ table.insert(keys, table.concat({redis_prefix, name, 'items', set}, ':'))
+ end
+ end
+
+ -- Account for empty tables.
+ if next(keys) == nil then
+ return nil
+ end
+
+ local related_items = redis.call('SUNION', unpack(keys))
+
+ local function add_similarity_if_necessary(item, similarity, score)
+ local store = true
+ local key = table.concat({redis_prefix, 'similarities', item}, ':')
+
+ if similarity_limit ~= nil then
+ local zrank = redis.call('ZRANK', key, similarity)
+
+ if zrank ~= nil then
+ local zcard = redis.call('ZCARD', key)
+
+ if zcard >= similarity_limit then
+ -- Similarity is not already stored and we are at limit of similarities.
+
+ local lowest_scored_item = redis.call('ZRANGEBYSCORE', key, '0', '+inf', 'withscores', 'limit', 0, 1)
+
+ if #lowest_scored_item > 0 then
+ -- If score is less than or equal to the lowest score, don't store it. Otherwise, make room by removing the lowest scored similarity
+ if score <= tonumber(lowest_scored_item[2]) then
+ store = false
+ else
+ redis.call('ZREM', key, lowest_scored_item[1])
+ end
+ end
+ end
+ end
+ end
+
+ if store then
+ redis.call('ZADD', key, score, similarity)
+ end
+ end
+
+ for i, related_item in ipairs(related_items) do
+ -- Disregard the current item.
+ if related_item ~= item then
+ local score = 0.0
+
+ for name, matrix in pairs(input_matrices) do
+ local s = 0.0
+
+ local key_1 = table.concat({redis_prefix, name, 'sets', item}, ':')
+ local key_2 = table.concat({redis_prefix, name, 'sets', related_item}, ':')
+
+ if matrix.measure == 'jaccard_index' then
+ local x = tonumber(redis.call('SINTERSTORE', 'temp', key_1, key_2))
+ local y = tonumber(redis.call('SUNIONSTORE', 'temp', key_1, key_2))
+ redis.call('DEL', 'temp')
+
+ if y > 0 then
+ s = s + (x / y)
+ end
+ elseif matrix.measure == 'sorensen_coefficient' then
+ local x = redis.call('SINTERSTORE', 'temp', key_1, key_2)
+ local y = redis.call('SCARD', key_1)
+ local z = redis.call('SCARD', key_2)
+
+ redis.call('DEL', 'temp')
+
+ local denom = y + z
+ if denom > 0 then
+ s = s + (2 * x / denom)
+ end
+ else
+ error("Bad matrix.measure: " .. matrix.measure)
+ end
+
+ score = score + (s * matrix.weight)
+ end
+
+ if score > 0 then
+ add_similarity_if_necessary(item, related_item, score)
+ add_similarity_if_necessary(related_item, item, score)
+ else
+ redis.call('ZREM', table.concat({redis_prefix, 'similarities', item}, ':'), related_item)
+ redis.call('ZREM', table.concat({redis_prefix, 'similarities', related_item}, ':'), item)
+ end
+ end
+ end
+ LUA
  end
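For reference, the two measures the Lua script computes server-side reduce to simple set arithmetic. This is an illustrative sketch over plain Ruby `Set`s, not the gem's `Distance` module, which operates on Redis keys:

```ruby
require 'set'

# Jaccard index: |A ∩ B| / |A ∪ B| (the SINTERSTORE/SUNIONSTORE branch).
def jaccard_index(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# Sorensen coefficient: 2|A ∩ B| / (|A| + |B|) (the SCARD branch).
def sorensen_coefficient(a, b)
  denom = a.size + b.size
  denom.zero? ? 0.0 : 2.0 * (a & b).size / denom
end

a = Set['set1', 'set3', 'set4']
b = Set['set2', 'set3', 'set4']

jaccard_index(a, b)        # => 0.5 (2 shared out of 4 total)
sorensen_coefficient(a, b) # => 2 * 2 / 6 ≈ 0.667
```

As in the script, each matrix's measure is computed per related item and the results are summed weighted by the matrix's `weight`.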
@@ -1,3 +1,3 @@
  module Predictor
- VERSION = "2.2.0"
+ VERSION = "2.3.0"
  end
data/spec/base_spec.rb CHANGED
@@ -8,6 +8,9 @@ describe Predictor::Base do
  BaseRecommender.redis_prefix(nil)
  UserRecommender.input_matrices = {}
  UserRecommender.reset_similarity_limit!
+ BaseRecommender.processing_technique nil
+ UserRecommender.processing_technique nil
+ Predictor.processing_technique nil
  end

  describe "configuration" do
@@ -49,6 +52,14 @@ describe Predictor::Base do
  sm = BaseRecommender.new
  expect(sm.myinput).to be_a(Predictor::InputMatrix)
  end
+
+ it "should accept a custom processing_technique, or default to Predictor's default" do
+ BaseRecommender.get_processing_technique.should == :ruby
+ Predictor.processing_technique :lua
+ BaseRecommender.get_processing_technique.should == :lua
+ BaseRecommender.processing_technique :union
+ BaseRecommender.get_processing_technique.should == :union
+ end
  end

  describe "redis_key" do
@@ -202,7 +213,7 @@ describe Predictor::Base do
  end

  describe "predictions_for" do
- it "returns relevant predictions" do
+ it "accepts an :on option to return scores of specific objects" do
  BaseRecommender.input_matrix(:users, weight: 4.0)
  BaseRecommender.input_matrix(:tags, weight: 1.0)
  sm = BaseRecommender.new
@@ -211,93 +222,171 @@ describe Predictor::Base do
  sm.users.add_to_set('another', "fnord", "other")
  sm.users.add_to_set('another', "nada")
  sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
- sm.tags.add_to_set('tag2', "bar", "shmoo")
+ sm.tags.add_to_set('tag2', "bar", "shmoo", "other")
  sm.tags.add_to_set('tag3', "shmoo", "nada")
  sm.process!
- predictions = sm.predictions_for('me', matrix_label: :users)
- expect(predictions).to eq(["shmoo", "other", "nada"])
- predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"])
- expect(predictions).to eq(["shmoo", "other", "nada"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1)
- expect(predictions).to eq(["other"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1)
- expect(predictions).to eq(["other", "nada"])
+ predictions = sm.predictions_for('me', matrix_label: :users, on: 'other', with_scores: true)
+ expect(predictions).to eq([['other', 3.0]])
+ predictions = sm.predictions_for('me', matrix_label: :users, on: ['other'], with_scores: true)
+ expect(predictions).to eq([['other', 3.0]])
+ predictions = sm.predictions_for('me', matrix_label: :users, on: ['other', 'nada'], with_scores: true)
+ expect(predictions).to eq([['other', 3.0], ['nada', 2.0]])
+ predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], on: ['other', 'nada'], with_scores: true)
+ expect(predictions).to eq([['other', 3.0], ['nada', 2.0]])
+ predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], on: ['other', 'nada'])
+ expect(predictions).to eq(['other', 'nada'])
+ predictions = sm.predictions_for('me', matrix_label: :users, on: ['shmoo', 'other', 'nada'], offset: 1, limit: 1, with_scores: true)
+ expect(predictions).to eq([["other", 3.0]])
+ predictions = sm.predictions_for('me', matrix_label: :users, on: ['shmoo', 'other', 'nada'], offset: 1, with_scores: true)
+ expect(predictions).to eq([['other', 3.0], ['nada', 2.0]])
  end
+ end

- it "accepts a :boost option" do
- BaseRecommender.input_matrix(:users, weight: 4.0)
- BaseRecommender.input_matrix(:tags, weight: 1.0)
- sm = BaseRecommender.new
- sm.users.add_to_set('me', "foo", "bar", "fnord")
- sm.users.add_to_set('not_me', "foo", "shmoo")
- sm.users.add_to_set('another', "fnord", "other")
- sm.users.add_to_set('another', "nada")
- sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
- sm.tags.add_to_set('tag2', "bar", "shmoo")
- sm.tags.add_to_set('tag3', "shmoo", "nada")
- sm.process!
+ [:ruby, :lua, :union].each do |technique|
+ describe "predictions_for with #{technique} processing" do
+ before do
+ Predictor.processing_technique(technique)
+ end

- # Syntax #1: Tags passed as array, weights assumed to be 1.0
- predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: ['tag3']})
- expect(predictions).to eq(["shmoo", "nada", "other"])
- predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], boost: {tags: ['tag3']})
- expect(predictions).to eq(["shmoo", "nada", "other"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: ['tag3']})
- expect(predictions).to eq(["nada"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: ['tag3']})
- expect(predictions).to eq(["nada", "other"])
-
- # Syntax #2: Weights explicitly set.
- predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 1.0}})
- expect(predictions).to eq(["shmoo", "nada", "other"])
- predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], boost: {tags: {values: ['tag3'], weight: 1.0}})
- expect(predictions).to eq(["shmoo", "nada", "other"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
- expect(predictions).to eq(["nada"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
- expect(predictions).to eq(["nada", "other"])
-
- # Make sure weights are actually being passed to Redis.
- shmoo, nada, other = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 10000.0}}, with_scores: true)
- expect(shmoo[0]).to eq('shmoo')
- expect(shmoo[1]).to be > 10000
- expect(nada[0]).to eq('nada')
- expect(nada[1]).to be > 10000
- expect(other[0]).to eq('other')
- expect(other[1]).to be < 10
- end
-
- it "accepts a :boost option, even with an empty item set" do
- BaseRecommender.input_matrix(:users, weight: 4.0)
- BaseRecommender.input_matrix(:tags, weight: 1.0)
- sm = BaseRecommender.new
- sm.users.add_to_set('not_me', "foo", "shmoo")
- sm.users.add_to_set('another', "fnord", "other")
- sm.users.add_to_set('another', "nada")
- sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
- sm.tags.add_to_set('tag2', "bar", "shmoo")
- sm.tags.add_to_set('tag3', "shmoo", "nada")
- sm.process!
+ it "returns relevant predictions" do
+ BaseRecommender.input_matrix(:users, weight: 4.0)
+ BaseRecommender.input_matrix(:tags, weight: 1.0)
+ sm = BaseRecommender.new
+ sm.users.add_to_set('me', "foo", "bar", "fnord")
+ sm.users.add_to_set('not_me', "foo", "shmoo")
+ sm.users.add_to_set('another', "fnord", "other")
+ sm.users.add_to_set('another', "nada")
+ sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
+ sm.tags.add_to_set('tag2', "bar", "shmoo")
+ sm.tags.add_to_set('tag3', "shmoo", "nada")
+ sm.process!
+ predictions = sm.predictions_for('me', matrix_label: :users)
+ expect(predictions).to eq(["shmoo", "other", "nada"])
+ predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"])
+ expect(predictions).to eq(["shmoo", "other", "nada"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1)
+ expect(predictions).to eq(["other"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1)
+ expect(predictions).to eq(["other", "nada"])
+ end
+
+ it "accepts a :boost option" do
+ BaseRecommender.input_matrix(:users, weight: 4.0)
+ BaseRecommender.input_matrix(:tags, weight: 1.0)
+ sm = BaseRecommender.new
+ sm.users.add_to_set('me', "foo", "bar", "fnord")
+ sm.users.add_to_set('not_me', "foo", "shmoo")
+ sm.users.add_to_set('another', "fnord", "other")
+ sm.users.add_to_set('another', "nada")
+ sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
+ sm.tags.add_to_set('tag2', "bar", "shmoo")
+ sm.tags.add_to_set('tag3', "shmoo", "nada")
+ sm.process!
+
+ # Syntax #1: Tags passed as array, weights assumed to be 1.0
+ predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: ['tag3']})
+ expect(predictions).to eq(["shmoo", "nada", "other"])
+ predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], boost: {tags: ['tag3']})
+ expect(predictions).to eq(["shmoo", "nada", "other"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: ['tag3']})
+ expect(predictions).to eq(["nada"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: ['tag3']})
+ expect(predictions).to eq(["nada", "other"])
+
+ # Syntax #2: Weights explicitly set.
+ predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 1.0}})
+ expect(predictions).to eq(["shmoo", "nada", "other"])
+ predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], boost: {tags: {values: ['tag3'], weight: 1.0}})
+ expect(predictions).to eq(["shmoo", "nada", "other"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
+ expect(predictions).to eq(["nada"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
+ expect(predictions).to eq(["nada", "other"])
+
+ # Make sure weights are actually being passed to Redis.
+ shmoo, nada, other = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 10000.0}}, with_scores: true)
+ expect(shmoo[0]).to eq('shmoo')
+ expect(shmoo[1]).to be > 10000
+ expect(nada[0]).to eq('nada')
+ expect(nada[1]).to be > 10000
+ expect(other[0]).to eq('other')
+ expect(other[1]).to be < 10
+ end

- # Syntax #1: Tags passed as array, weights assumed to be 1.0
- predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: ['tag3']})
- expect(predictions).to eq(["shmoo", "nada"])
- predictions = sm.predictions_for(item_set: [], boost: {tags: ['tag3']})
- expect(predictions).to eq(["shmoo", "nada"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: ['tag3']})
- expect(predictions).to eq(["nada"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: ['tag3']})
- expect(predictions).to eq(["nada"])
-
- # Syntax #2: Weights explicitly set.
- predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 1.0}})
- expect(predictions).to eq(["shmoo", "nada"])
- predictions = sm.predictions_for(item_set: [], boost: {tags: {values: ['tag3'], weight: 1.0}})
- expect(predictions).to eq(["shmoo", "nada"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
- expect(predictions).to eq(["nada"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
- expect(predictions).to eq(["nada"])
+ it "accepts a :boost option, even with an empty item set" do
+ BaseRecommender.input_matrix(:users, weight: 4.0)
+ BaseRecommender.input_matrix(:tags, weight: 1.0)
+ sm = BaseRecommender.new
+ sm.users.add_to_set('not_me', "foo", "shmoo")
+ sm.users.add_to_set('another', "fnord", "other")
+ sm.users.add_to_set('another', "nada")
+ sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
+ sm.tags.add_to_set('tag2', "bar", "shmoo")
+ sm.tags.add_to_set('tag3', "shmoo", "nada")
+ sm.process!
+
+ # Syntax #1: Tags passed as array, weights assumed to be 1.0
+ predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: ['tag3']})
+ expect(predictions).to eq(["shmoo", "nada"])
+ predictions = sm.predictions_for(item_set: [], boost: {tags: ['tag3']})
+ expect(predictions).to eq(["shmoo", "nada"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: ['tag3']})
+ expect(predictions).to eq(["nada"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: ['tag3']})
+ expect(predictions).to eq(["nada"])
+
+ # Syntax #2: Weights explicitly set.
+ predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 1.0}})
+ expect(predictions).to eq(["shmoo", "nada"])
+ predictions = sm.predictions_for(item_set: [], boost: {tags: {values: ['tag3'], weight: 1.0}})
+ expect(predictions).to eq(["shmoo", "nada"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
+ expect(predictions).to eq(["nada"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
+ expect(predictions).to eq(["nada"])
+ end
+ end
+
+ describe "process_items! with #{technique} processing" do
+ before do
+ Predictor.processing_technique(technique)
+ end
+
+ context "with no similarity_limit" do
+ it "calculates the similarity between the item and all related_items (other items in a set the given item is in)" do
+ BaseRecommender.input_matrix(:myfirstinput)
+ BaseRecommender.input_matrix(:mysecondinput)
+ BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
+ sm = BaseRecommender.new
+ sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
+ sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
+ sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
+ sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
+ expect(sm.similarities_for('item2')).to be_empty
+ sm.process_items!('item2')
+ similarities = sm.similarities_for('item2')
+ expect(similarities).to eq(["item3", "item1"])
+ end
+ end
+
+ context "with a similarity_limit" do
+ it "calculates the similarity between the item and all related_items (other items in a set the given item is in), but obeys the similarity_limit" do
+ BaseRecommender.input_matrix(:myfirstinput)
+ BaseRecommender.input_matrix(:mysecondinput)
+ BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
+ BaseRecommender.limit_similarities_to(1)
+ sm = BaseRecommender.new
+ sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
+ sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
+ sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
+ sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
+ expect(sm.similarities_for('item2')).to be_empty
+ sm.process_items!('item2')
+ similarities = sm.similarities_for('item2')
+ expect(similarities).to include("item3")
+ expect(similarities.length).to eq(1)
+ end
+ end
  end
  end

@@ -343,44 +432,6 @@ describe Predictor::Base do
  end
  end

- describe "process_items!" do
- context "with no similarity_limit" do
- it "calculates the similarity between the item and all related_items (other items in a set the given item is in)" do
- BaseRecommender.input_matrix(:myfirstinput)
- BaseRecommender.input_matrix(:mysecondinput)
- BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
- sm = BaseRecommender.new
- sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
- sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
- sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
- sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
- expect(sm.similarities_for('item2')).to be_empty
- sm.process_items!('item2')
- similarities = sm.similarities_for('item2', with_scores: true)
- expect(similarities).to include(["item3", 4.0], ["item1", 2.5])
- end
- end
-
- context "with a similarity_limit" do
- it "calculates the similarity between the item and all related_items (other items in a set the given item is in), but obeys the similarity_limit" do
- BaseRecommender.input_matrix(:myfirstinput)
- BaseRecommender.input_matrix(:mysecondinput)
- BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
- BaseRecommender.limit_similarities_to(1)
- sm = BaseRecommender.new
- sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
- sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
- sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
- sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
- expect(sm.similarities_for('item2')).to be_empty
- sm.process_items!('item2')
- similarities = sm.similarities_for('item2', with_scores: true)
- expect(similarities).to include(["item3", 4.0])
- expect(similarities.length).to eq(1)
- end
- end
- end
-
  describe "process!" do
  it "should call process_items for all_items's" do
  BaseRecommender.input_matrix(:anotherinput)
@@ -85,6 +85,11 @@ describe Predictor::InputMatrix do
  expect(@matrix.items_for("item1")).to include("foo", "bar", "fnord", "blubb")
  end

+ it "does not crash if the set of items is empty" do
+ @matrix.add_to_set "item1"
+ @matrix.add_to_set "item1", []
+ end
+
  it "adds the key to each set member's 'items' set" do
  expect(@matrix.sets_for("foo")).not_to include("item1")
  expect(@matrix.sets_for("bar")).not_to include("item1")
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: predictor
  version: !ruby/object:Gem::Version
- version: 2.2.0
+ version: 2.3.0
  platform: ruby
  authors:
  - Pathgather
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-06-24 00:00:00.000000000 Z
+ date: 2014-09-05 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: redis
@@ -92,6 +92,7 @@ files:
  - LICENSE
  - README.md
  - Rakefile
+ - benchmark/process.rb
  - docs/READMEv1.md
  - lib/predictor.rb
  - lib/predictor/base.rb