predictor 2.2.0 → 2.3.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: f06d8361ac24ffaedb43dc650bba9af6ad62374a
- data.tar.gz: c2815b5b8a507026bae58773ac32a1d7188debcb
+ metadata.gz: c267ff0bf82d11ccefe19ae4314a62b09e9300fa
+ data.tar.gz: 7eed86238af5297d8c32250c6ebf28f992b9e331
  SHA512:
- metadata.gz: 2988190b65071a5d155974db67bc9815614720f57d9e5131ac429c0fcae5d7210527a07548aca73066a63fee542ee6edb7fb9209304041374df04304e72650ff
- data.tar.gz: aa8215990ac119de3ca275c9ab666cbe246ffa53f28f5c428361734d7a3a79186bffc007d83056d3b1511beead9ab9ec0aa1f941a97fc8d41378b48b61775b53
+ metadata.gz: eaa28d8a14437dd11742499a10211827ebc212feeb99d421185676d4a06ed1253f67a5252e7e4a748c6f87c69b6d0ed136b461f1df7b557ef4c92514b399e2c3
+ data.tar.gz: c60c5a6572cb6c42ef4e9633e29a73e8baee84384b6dbc3371b9979e7b0a5fc2c6bc2b67dd64e0a2733a38fbadb1e8df73b4e024bdb22b501aaf88b1732d438d
data/Changelog.md CHANGED
@@ -2,7 +2,12 @@
  Predictor Changelog
  =========

- 2.2.0 (Unreleased)
+ 2.3.0
+ ---------------------
+ * The logic for processing item similarities was ported to a Lua script. Use `Predictor.processing_technique(:lua)` to use the Lua script for all similarity calculations, or use `MyRecommender.processing_technique(:lua)` to use it for specific recommenders. It is substantially faster than the default Ruby mechanism, but has the disadvantage of blocking the Redis server while it runs.
+ * An alternate method of calculating item similarities was added, which uses a ZUNIONSTORE across item sets. The results are similar to those produced by the Ruby and Lua techniques, but the calculation is much faster. Use `Predictor.processing_technique(:union)` to use the ZUNIONSTORE technique for all similarity calculations, or use `MyRecommender.processing_technique(:union)` to use it for specific recommenders.
+
+ 2.2.0 (2014-06-24)
  ---------------------
  * The namespace used for keys in Redis is now configurable on a global or per-class basis. See the readme for more information. If you were overriding the redis_prefix instance method before, it is recommended that you use the new redis_prefix class method instead.
  * Data stored in Redis is now namespaced by the class name of the recommender it is stored by. This change ensures that different recommenders with input matrices of the same name don't overwrite each others' data. After upgrading you'll need to either reindex your data in Redis or configure Predictor to use the naming system you were using before. If you were using the defaults before and you're not worried about matrix name collisions, you can mimic the old behavior with:
data/README.md CHANGED
@@ -219,6 +219,21 @@ You can also configure the namespace used by each class you create:
  end
  ```

+ Processing Items
+ ---------------------
+ As of 2.3.0, there are multiple techniques available for processing item similarities. You can choose between them by setting a global default, like `Predictor.processing_technique(:lua)`, or by setting a technique for specific classes, like `CourseRecommender.processing_technique(:union)`. There are three possible values:
+ - :ruby - The default, and how Predictor calculated similarities before 2.3.0. With this technique, the Jaccard and Sorensen calculations are performed in Ruby, with frequent calls to Redis to retrieve simple values. It is somewhat slow.
+ - :lua - This option performs the Jaccard and Sorensen calculations in a Lua script on the Redis server. It is substantially faster than the :ruby technique, but blocks the Redis server while each set of calculations is run. The period of blocking will vary with the size and disposition of your data, but each call may take up to several hundred milliseconds. If your application requires your Redis server to always respond quickly, and you're not able to simply run calculations during off-hours, you should use a different technique.
+ - :union - This option skips Jaccard and Sorensen entirely and uses a simpler technique involving a ZUNIONSTORE across many item sets to calculate similarities. The results differ from, but are similar to, those of the Jaccard and Sorensen algorithms. It is even faster than the :lua option and does not block Redis for long periods, but before using it you should sample the output to ensure that it is good enough for your application.
+
+ Predictor now contains a benchmarking script that you can use to compare the speed of these options. Example output from processing a relatively small dataset:
+
+ ```
+ ruby = 21.098 seconds
+ lua = 2.106 seconds
+ union = 0.741 seconds
+ ```
+
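The difference between the Jaccard-based scoring and the `:union` technique can be sketched in plain Ruby. This is a toy model with hypothetical helpers, not Predictor's actual implementation: here each "set" is just a Ruby `Set` of items, Jaccard compares the sets two items belong to, and the union technique sums a `1 / set size` contribution from each shared set.

```ruby
require 'set'

sets = {
  'user1' => Set['item1', 'item2'],
  'user2' => Set['item2', 'item3'],
}

# The sets containing a given item (Predictor's inverted "sets" index).
def sets_containing(sets, item)
  sets.select { |_, members| members.include?(item) }
end

# Jaccard index between the two items' set memberships.
def jaccard(sets, a, b)
  sa = Set.new(sets_containing(sets, a).keys)
  sb = Set.new(sets_containing(sets, b).keys)
  union = (sa | sb).size
  union.zero? ? 0.0 : (sa & sb).size.to_f / union
end

# Union technique: each set containing `a` contributes 1/|set| to
# every other member of that set (with a matrix weight of 1.0).
def union_score(sets, a, b)
  sets_containing(sets, a).sum do |_, members|
    members.include?(b) ? 1.0 / members.size : 0.0
  end
end

jaccard(sets, 'item1', 'item2')     # => 0.5 ('user1' out of ['user1', 'user2'])
union_score(sets, 'item1', 'item2') # => 0.5 (1/2 from 'user1')
```

The two scores coincide here, but diverge as set sizes and overlaps vary, which is why sampling the `:union` output before adopting it is recommended.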
  Upgrading from 1.0 to 2.0
  ---------------------
  As mentioned, 2.0.0 is quite a bit different than 1.0.0, so simply upgrading with no changes likely won't work. My apologies for this. I promise this won't happen in future releases, as I'm much more confident in this Predictor release than the last. Anywho, upgrading really shouldn't be that much of a pain if you follow these steps:
data/Rakefile CHANGED
@@ -4,3 +4,5 @@ require 'rspec/core/rake_task'
  RSpec::Core::RakeTask.new(:spec)

  task :default => :spec
+
+ Dir["./benchmark/*.rb"].sort.each &method(:require)
data/benchmark/process.rb ADDED
@@ -0,0 +1,47 @@
+ namespace :benchmark do
+ task :process do
+ require 'predictor'
+ require 'pry'
+ require 'logger'
+
+ Predictor.redis = Redis.new #logger: Logger.new(STDOUT)
+ Predictor.redis_prefix "predictor-benchmark"
+
+ def flush!
+ keys = Predictor.redis.keys("predictor-benchmark*")
+ Predictor.redis.del(keys) if keys.any?
+ end
+
+ class ItemRecommender
+ include Predictor::Base
+
+ input_matrix :users, weight: 2.0
+ input_matrix :parts, weight: 1.0
+ end
+
+ flush!
+
+ items = (1..200).map { |i| "item-#{i}" }
+ users = (1..100).map { |i| "user-#{i}" }
+ parts = (1..100).map { |i| "part-#{i}" }
+
+ r = ItemRecommender.new
+
+ start = Time.now
+ users.each { |user| r.users.add_to_set user, *items.sample(40) }
+ parts.each { |part| r.parts.add_to_set part, *items.sample(40) }
+ elapsed = Time.now - start
+
+ puts "add_to_set = #{elapsed.round(3)} seconds"
+
+ [:ruby, :lua, :union].each do |technique|
+ start = Time.now
+ Predictor.processing_technique technique
+ r.process!
+ elapsed = Time.now - start
+ puts "#{technique} = #{elapsed.round(3)} seconds"
+ end
+
+ flush!
+ end
+ end
data/lib/predictor.rb CHANGED
@@ -1,3 +1,4 @@
+ require 'json'
  require "redis"
  require "predictor/predictor"
  require "predictor/distance"
@@ -46,6 +46,14 @@ module Predictor::Base
  to_s
  end
  end
+
+ def processing_technique(technique)
+ @technique = technique
+ end
+
+ def get_processing_technique
+ @technique || Predictor.get_processing_technique
+ end
  end

  def input_matrices
@@ -75,7 +83,7 @@ module Predictor::Base
  end
  end

- def respond_to?(method)
+ def respond_to?(method, include_all = false)
  input_matrices.has_key?(method) ? true : super
  end

@@ -104,9 +112,11 @@ module Predictor::Base
  keys.empty? ? [] : (Predictor.redis.sunion(keys) - [item.to_s])
  end

- def predictions_for(set=nil, item_set: nil, matrix_label: nil, with_scores: false, offset: 0, limit: -1, exclusion_set: [], boost: {})
+ def predictions_for(set=nil, item_set: nil, matrix_label: nil, with_scores: false, on: nil, offset: 0, limit: -1, exclusion_set: [], boost: {})
  fail "item_set or matrix_label and set is required" unless item_set || (matrix_label && set)

+ on = Array(on)
+
  if matrix_label
  matrix = input_matrices[matrix_label]
  item_set = Predictor.redis.smembers(matrix.redis_key(:items, set))
@@ -150,6 +160,13 @@ module Predictor::Base
  multi.zunionstore 'temp', item_keys, weights: weights
  multi.zrem 'temp', item_set if item_set.any?
  multi.zrem 'temp', exclusion_set if exclusion_set.length > 0
+
+ if on.any?
+ multi.zadd 'temp2', on.map{ |val| [0.0, val] }
+ multi.zinterstore 'temp', ['temp', 'temp2']
+ multi.del 'temp2'
+ end
+
  predictions = multi.zrevrange 'temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores
  multi.del 'temp'
  end
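The new `on:` option works by intersecting the prediction zset with a temporary zset whose members all have score 0.0, so only the requested members survive, keeping their original scores (ZINTERSTORE sums scores, and x + 0.0 = x). A minimal pure-Ruby sketch of that semantics, with a hypothetical `filter_predictions` helper that is not part of Predictor's API:

```ruby
# Toy model of ZINTERSTORE against a zero-scored filter set: keep only
# the requested members of a member => score hash, scores unchanged.
def filter_predictions(scores, on)
  on = Array(on)          # accept a single member or an array, like `on:`
  return scores if on.empty?
  scores.select { |member, _| on.include?(member) }
end

scores = { 'shmoo' => 4.0, 'other' => 3.0, 'nada' => 2.0 }
filter_predictions(scores, 'other')           # => {"other"=>3.0}
filter_predictions(scores, ['other', 'nada']) # => {"other"=>3.0, "nada"=>2.0}
```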
@@ -178,10 +195,58 @@ module Predictor::Base
  end

  def process_items!(*items)
- items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax
- items.each do |item|
- related_items(item).each{ |related_item| cache_similarity(item, related_item) }
+ items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax
+
+ case self.class.get_processing_technique
+ when :lua
+ matrix_data = {}
+ input_matrices.each do |name, matrix|
+ matrix_data[name] = {weight: matrix.weight, measure: matrix.measure_name}
+ end
+ matrix_json = JSON.dump(matrix_data)
+
+ items.each do |item|
+ Predictor.process_lua_script(redis_key, matrix_json, similarity_limit, item)
+ end
+ when :union
+ items.each do |item|
+ keys = []
+ weights = []
+
+ input_matrices.each do |key, matrix|
+ k = matrix.redis_key(:sets, item)
+ item_keys = Predictor.redis.smembers(k).map { |set| matrix.redis_key(:items, set) }
+
+ counts = Predictor.redis.multi do |multi|
+ item_keys.each { |key| Predictor.redis.scard(key) }
+ end
+
+ item_keys.zip(counts).each do |key, count|
+ unless count.zero?
+ keys << key
+ weights << matrix.weight / count
+ end
+ end
+ end
+
+ Predictor.redis.multi do |multi|
+ key = redis_key(:similarities, item)
+ multi.del(key)
+
+ if keys.any?
+ multi.zunionstore(key, keys, weights: weights)
+ multi.zrem(key, item)
+ multi.zremrangebyrank(key, 0, -(similarity_limit + 1))
+ multi.zunionstore key, [key] # Rewrite zset for optimized storage.
+ end
+ end
+ end
+ else # Default to old behavior, processing things in Ruby.
+ items.each do |item|
+ related_items(item).each { |related_item| cache_similarity(item, related_item) }
+ end
  end
+
  return self
  end

@@ -4,6 +4,10 @@ module Predictor
  @opts = opts
  end

+ def measure_name
+ @opts.fetch(:measure, :jaccard_index)
+ end
+
  def base
  @opts[:base]
  end
@@ -22,8 +26,16 @@ module Predictor

  def add_to_set(set, *items)
  items = items.flatten if items.count == 1 && items[0].is_a?(Array)
- Predictor.redis.multi do
- items.each { |item| add_single_nomulti(set, item) }
+ if items.any?
+ Predictor.redis.multi do |redis|
+ redis.sadd(parent_redis_key(:all_items), items)
+ redis.sadd(redis_key(:items, set), items)
+
+ items.each do |item|
+ # add the set to the item's set--inverting the sets
+ redis.sadd(redis_key(:sets, item), set)
+ end
+ end
  end
  end

@@ -64,7 +76,6 @@ module Predictor
  end

  def score(item1, item2)
- measure_name = @opts.fetch(:measure, :jaccard_index)
  Distance.send(measure_name, redis_key(:sets, item1), redis_key(:sets, item2), Predictor.redis)
  end

@@ -72,15 +83,5 @@ module Predictor
  warn 'InputMatrix#calculate_jaccard is now deprecated. Use InputMatrix#score instead'
  Distance.jaccard_index(redis_key(:sets, item1), redis_key(:sets, item2), Predictor.redis)
  end
-
- private
-
- def add_single_nomulti(set, item)
- Predictor.redis.sadd(parent_redis_key(:all_items), item)
- Predictor.redis.sadd(redis_key(:items, set), item)
- # add the set to the item's set--inverting the sets
- Predictor.redis.sadd(redis_key(:sets, item), set)
- end
-
  end
  end
@@ -35,4 +35,119 @@ module Predictor
  def self.constantize(klass)
  Object.module_eval("Predictor::#{klass}", __FILE__, __LINE__)
  end
+
+ def self.processing_technique(algorithm)
+ @technique = algorithm
+ end
+
+ def self.get_processing_technique
+ @technique || :ruby
+ end
+
+ def self.process_lua_script(*args)
+ @process_sha ||= redis.script(:load, PROCESS_ITEMS_LUA_SCRIPT)
+ redis.evalsha(@process_sha, argv: args)
+ end
+
+ PROCESS_ITEMS_LUA_SCRIPT = <<-LUA
+ local redis_prefix = ARGV[1]
+ local input_matrices = cjson.decode(ARGV[2])
+ local similarity_limit = tonumber(ARGV[3])
+ local item = ARGV[4]
+ local keys = {}
+
+ for name, options in pairs(input_matrices) do
+ local key = table.concat({redis_prefix, name, 'sets', item}, ':')
+ local sets = redis.call('SMEMBERS', key)
+ for _, set in ipairs(sets) do
+ table.insert(keys, table.concat({redis_prefix, name, 'items', set}, ':'))
+ end
+ end
+
+ -- Account for empty tables.
+ if next(keys) == nil then
+ return nil
+ end
+
+ local related_items = redis.call('SUNION', unpack(keys))
+
+ local function add_similarity_if_necessary(item, similarity, score)
+ local store = true
+ local key = table.concat({redis_prefix, 'similarities', item}, ':')
+
+ if similarity_limit ~= nil then
+ local zrank = redis.call('ZRANK', key, similarity)
+
+ if zrank ~= nil then
+ local zcard = redis.call('ZCARD', key)
+
+ if zcard >= similarity_limit then
+ -- Similarity is not already stored and we are at limit of similarities.
+
+ local lowest_scored_item = redis.call('ZRANGEBYSCORE', key, '0', '+inf', 'withscores', 'limit', 0, 1)
+
+ if #lowest_scored_item > 0 then
+ -- If score is less than or equal to the lowest score, don't store it. Otherwise, make room by removing the lowest scored similarity
+ if score <= tonumber(lowest_scored_item[2]) then
+ store = false
+ else
+ redis.call('ZREM', key, lowest_scored_item[1])
+ end
+ end
+ end
+ end
+ end
+
+ if store then
+ redis.call('ZADD', key, score, similarity)
+ end
+ end
+
+ for i, related_item in ipairs(related_items) do
+ -- Disregard the current item.
+ if related_item ~= item then
+ local score = 0.0
+
+ for name, matrix in pairs(input_matrices) do
+ local s = 0.0
+
+ local key_1 = table.concat({redis_prefix, name, 'sets', item}, ':')
+ local key_2 = table.concat({redis_prefix, name, 'sets', related_item}, ':')
+
+ if matrix.measure == 'jaccard_index' then
+ local x = tonumber(redis.call('SINTERSTORE', 'temp', key_1, key_2))
+ local y = tonumber(redis.call('SUNIONSTORE', 'temp', key_1, key_2))
+ redis.call('DEL', 'temp')
+
+ if y > 0 then
+ s = s + (x / y)
+ end
+ elseif matrix.measure == 'sorensen_coefficient' then
+ local x = redis.call('SINTERSTORE', 'temp', key_1, key_2)
+ local y = redis.call('SCARD', key_1)
+ local z = redis.call('SCARD', key_2)
+
+ redis.call('DEL', 'temp')
+
+ local denom = y + z
+ if denom > 0 then
+ s = s + (2 * x / denom)
+ end
+ else
+ error("Bad matrix.measure: " .. matrix.measure)
+ end
+
+ score = score + (s * matrix.weight)
+ end
+
+ if score > 0 then
+ add_similarity_if_necessary(item, related_item, score)
+ add_similarity_if_necessary(related_item, item, score)
+ else
+ redis.call('ZREM', table.concat({redis_prefix, 'similarities', item}, ':'), related_item)
+ redis.call('ZREM', table.concat({redis_prefix, 'similarities', related_item}, ':'), item)
+ end
+ end
+ end
+ LUA
  end
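For reference, the two measures the Lua script computes server-side reduce to simple set arithmetic. This is an illustrative sketch over plain Ruby `Set`s, not the gem's `Distance` module, which operates on Redis keys:

```ruby
require 'set'

# Jaccard index: |A ∩ B| / |A ∪ B| (the SINTERSTORE/SUNIONSTORE branch).
def jaccard_index(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# Sorensen coefficient: 2|A ∩ B| / (|A| + |B|) (the SCARD branch).
def sorensen_coefficient(a, b)
  denom = a.size + b.size
  denom.zero? ? 0.0 : 2.0 * (a & b).size / denom
end

a = Set['set1', 'set3', 'set4']
b = Set['set2', 'set3', 'set4']

jaccard_index(a, b)        # => 0.5 (2 shared out of 4 total)
sorensen_coefficient(a, b) # => 2 * 2 / 6 ≈ 0.667
```

As in the script, each matrix's measure is computed per related item and the results are summed weighted by the matrix's `weight`.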
@@ -1,3 +1,3 @@
  module Predictor
- VERSION = "2.2.0"
+ VERSION = "2.3.0"
  end
data/spec/base_spec.rb CHANGED
@@ -8,6 +8,9 @@ describe Predictor::Base do
  BaseRecommender.redis_prefix(nil)
  UserRecommender.input_matrices = {}
  UserRecommender.reset_similarity_limit!
+ BaseRecommender.processing_technique nil
+ UserRecommender.processing_technique nil
+ Predictor.processing_technique nil
  end

  describe "configuration" do
@@ -49,6 +52,14 @@ describe Predictor::Base do
  sm = BaseRecommender.new
  expect(sm.myinput).to be_a(Predictor::InputMatrix)
  end
+
+ it "should accept a custom processing_technique, or default to Predictor's default" do
+ BaseRecommender.get_processing_technique.should == :ruby
+ Predictor.processing_technique :lua
+ BaseRecommender.get_processing_technique.should == :lua
+ BaseRecommender.processing_technique :union
+ BaseRecommender.get_processing_technique.should == :union
+ end
  end

  describe "redis_key" do
@@ -202,7 +213,7 @@ describe Predictor::Base do
  end

  describe "predictions_for" do
- it "returns relevant predictions" do
+ it "accepts an :on option to return scores of specific objects" do
  BaseRecommender.input_matrix(:users, weight: 4.0)
  BaseRecommender.input_matrix(:tags, weight: 1.0)
  sm = BaseRecommender.new
@@ -211,93 +222,171 @@ describe Predictor::Base do
  sm.users.add_to_set('another', "fnord", "other")
  sm.users.add_to_set('another', "nada")
  sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
- sm.tags.add_to_set('tag2', "bar", "shmoo")
+ sm.tags.add_to_set('tag2', "bar", "shmoo", "other")
  sm.tags.add_to_set('tag3', "shmoo", "nada")
  sm.process!
- predictions = sm.predictions_for('me', matrix_label: :users)
- expect(predictions).to eq(["shmoo", "other", "nada"])
- predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"])
- expect(predictions).to eq(["shmoo", "other", "nada"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1)
- expect(predictions).to eq(["other"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1)
- expect(predictions).to eq(["other", "nada"])
+ predictions = sm.predictions_for('me', matrix_label: :users, on: 'other', with_scores: true)
+ expect(predictions).to eq([['other', 3.0]])
+ predictions = sm.predictions_for('me', matrix_label: :users, on: ['other'], with_scores: true)
+ expect(predictions).to eq([['other', 3.0]])
+ predictions = sm.predictions_for('me', matrix_label: :users, on: ['other', 'nada'], with_scores: true)
+ expect(predictions).to eq([['other', 3.0], ['nada', 2.0]])
+ predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], on: ['other', 'nada'], with_scores: true)
+ expect(predictions).to eq([['other', 3.0], ['nada', 2.0]])
+ predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], on: ['other', 'nada'])
+ expect(predictions).to eq(['other', 'nada'])
+ predictions = sm.predictions_for('me', matrix_label: :users, on: ['shmoo', 'other', 'nada'], offset: 1, limit: 1, with_scores: true)
+ expect(predictions).to eq([["other", 3.0]])
+ predictions = sm.predictions_for('me', matrix_label: :users, on: ['shmoo', 'other', 'nada'], offset: 1, with_scores: true)
+ expect(predictions).to eq([['other', 3.0], ['nada', 2.0]])
  end
+ end

- it "accepts a :boost option" do
- BaseRecommender.input_matrix(:users, weight: 4.0)
- BaseRecommender.input_matrix(:tags, weight: 1.0)
- sm = BaseRecommender.new
- sm.users.add_to_set('me', "foo", "bar", "fnord")
- sm.users.add_to_set('not_me', "foo", "shmoo")
- sm.users.add_to_set('another', "fnord", "other")
- sm.users.add_to_set('another', "nada")
- sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
- sm.tags.add_to_set('tag2', "bar", "shmoo")
- sm.tags.add_to_set('tag3', "shmoo", "nada")
- sm.process!
+ [:ruby, :lua, :union].each do |technique|
+ describe "predictions_for with #{technique} processing" do
+ before do
+ Predictor.processing_technique(technique)
+ end

- # Syntax #1: Tags passed as array, weights assumed to be 1.0
- predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: ['tag3']})
- expect(predictions).to eq(["shmoo", "nada", "other"])
- predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], boost: {tags: ['tag3']})
- expect(predictions).to eq(["shmoo", "nada", "other"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: ['tag3']})
- expect(predictions).to eq(["nada"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: ['tag3']})
- expect(predictions).to eq(["nada", "other"])
-
- # Syntax #2: Weights explicitly set.
- predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 1.0}})
- expect(predictions).to eq(["shmoo", "nada", "other"])
- predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], boost: {tags: {values: ['tag3'], weight: 1.0}})
- expect(predictions).to eq(["shmoo", "nada", "other"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
- expect(predictions).to eq(["nada"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
- expect(predictions).to eq(["nada", "other"])
-
- # Make sure weights are actually being passed to Redis.
- shmoo, nada, other = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 10000.0}}, with_scores: true)
- expect(shmoo[0]).to eq('shmoo')
- expect(shmoo[1]).to be > 10000
- expect(nada[0]).to eq('nada')
- expect(nada[1]).to be > 10000
- expect(other[0]).to eq('other')
- expect(other[1]).to be < 10
- end
-
- it "accepts a :boost option, even with an empty item set" do
- BaseRecommender.input_matrix(:users, weight: 4.0)
- BaseRecommender.input_matrix(:tags, weight: 1.0)
- sm = BaseRecommender.new
- sm.users.add_to_set('not_me', "foo", "shmoo")
- sm.users.add_to_set('another', "fnord", "other")
- sm.users.add_to_set('another', "nada")
- sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
- sm.tags.add_to_set('tag2', "bar", "shmoo")
- sm.tags.add_to_set('tag3', "shmoo", "nada")
- sm.process!
+ it "returns relevant predictions" do
+ BaseRecommender.input_matrix(:users, weight: 4.0)
+ BaseRecommender.input_matrix(:tags, weight: 1.0)
+ sm = BaseRecommender.new
+ sm.users.add_to_set('me', "foo", "bar", "fnord")
+ sm.users.add_to_set('not_me', "foo", "shmoo")
+ sm.users.add_to_set('another', "fnord", "other")
+ sm.users.add_to_set('another', "nada")
+ sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
+ sm.tags.add_to_set('tag2', "bar", "shmoo")
+ sm.tags.add_to_set('tag3', "shmoo", "nada")
+ sm.process!
+ predictions = sm.predictions_for('me', matrix_label: :users)
+ expect(predictions).to eq(["shmoo", "other", "nada"])
+ predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"])
+ expect(predictions).to eq(["shmoo", "other", "nada"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1)
+ expect(predictions).to eq(["other"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1)
+ expect(predictions).to eq(["other", "nada"])
+ end
+
+ it "accepts a :boost option" do
+ BaseRecommender.input_matrix(:users, weight: 4.0)
+ BaseRecommender.input_matrix(:tags, weight: 1.0)
+ sm = BaseRecommender.new
+ sm.users.add_to_set('me', "foo", "bar", "fnord")
+ sm.users.add_to_set('not_me', "foo", "shmoo")
+ sm.users.add_to_set('another', "fnord", "other")
+ sm.users.add_to_set('another', "nada")
+ sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
+ sm.tags.add_to_set('tag2', "bar", "shmoo")
+ sm.tags.add_to_set('tag3', "shmoo", "nada")
+ sm.process!
+
+ # Syntax #1: Tags passed as array, weights assumed to be 1.0
+ predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: ['tag3']})
+ expect(predictions).to eq(["shmoo", "nada", "other"])
+ predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], boost: {tags: ['tag3']})
+ expect(predictions).to eq(["shmoo", "nada", "other"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: ['tag3']})
+ expect(predictions).to eq(["nada"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: ['tag3']})
+ expect(predictions).to eq(["nada", "other"])
+
+ # Syntax #2: Weights explicitly set.
+ predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 1.0}})
+ expect(predictions).to eq(["shmoo", "nada", "other"])
+ predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"], boost: {tags: {values: ['tag3'], weight: 1.0}})
+ expect(predictions).to eq(["shmoo", "nada", "other"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
+ expect(predictions).to eq(["nada"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
+ expect(predictions).to eq(["nada", "other"])
+
+ # Make sure weights are actually being passed to Redis.
+ shmoo, nada, other = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 10000.0}}, with_scores: true)
+ expect(shmoo[0]).to eq('shmoo')
+ expect(shmoo[1]).to be > 10000
+ expect(nada[0]).to eq('nada')
+ expect(nada[1]).to be > 10000
+ expect(other[0]).to eq('other')
+ expect(other[1]).to be < 10
+ end

- # Syntax #1: Tags passed as array, weights assumed to be 1.0
- predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: ['tag3']})
- expect(predictions).to eq(["shmoo", "nada"])
- predictions = sm.predictions_for(item_set: [], boost: {tags: ['tag3']})
- expect(predictions).to eq(["shmoo", "nada"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: ['tag3']})
- expect(predictions).to eq(["nada"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: ['tag3']})
- expect(predictions).to eq(["nada"])
-
- # Syntax #2: Weights explicitly set.
- predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 1.0}})
- expect(predictions).to eq(["shmoo", "nada"])
- predictions = sm.predictions_for(item_set: [], boost: {tags: {values: ['tag3'], weight: 1.0}})
- expect(predictions).to eq(["shmoo", "nada"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
- expect(predictions).to eq(["nada"])
- predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
- expect(predictions).to eq(["nada"])
+ it "accepts a :boost option, even with an empty item set" do
+ BaseRecommender.input_matrix(:users, weight: 4.0)
+ BaseRecommender.input_matrix(:tags, weight: 1.0)
+ sm = BaseRecommender.new
+ sm.users.add_to_set('not_me', "foo", "shmoo")
+ sm.users.add_to_set('another', "fnord", "other")
+ sm.users.add_to_set('another', "nada")
+ sm.tags.add_to_set('tag1', "foo", "fnord", "shmoo")
+ sm.tags.add_to_set('tag2', "bar", "shmoo")
+ sm.tags.add_to_set('tag3', "shmoo", "nada")
+ sm.process!
+
+ # Syntax #1: Tags passed as array, weights assumed to be 1.0
+ predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: ['tag3']})
+ expect(predictions).to eq(["shmoo", "nada"])
+ predictions = sm.predictions_for(item_set: [], boost: {tags: ['tag3']})
+ expect(predictions).to eq(["shmoo", "nada"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: ['tag3']})
+ expect(predictions).to eq(["nada"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: ['tag3']})
+ expect(predictions).to eq(["nada"])
+
+ # Syntax #2: Weights explicitly set.
+ predictions = sm.predictions_for('me', matrix_label: :users, boost: {tags: {values: ['tag3'], weight: 1.0}})
+ expect(predictions).to eq(["shmoo", "nada"])
+ predictions = sm.predictions_for(item_set: [], boost: {tags: {values: ['tag3'], weight: 1.0}})
+ expect(predictions).to eq(["shmoo", "nada"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
+ expect(predictions).to eq(["nada"])
+ predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, boost: {tags: {values: ['tag3'], weight: 1.0}})
+ expect(predictions).to eq(["nada"])
+ end
+ end
+
+ describe "process_items! with #{technique} processing" do
+ before do
+ Predictor.processing_technique(technique)
+ end
+
+ context "with no similarity_limit" do
+ it "calculates the similarity between the item and all related_items (other items in a set the given item is in)" do
+ BaseRecommender.input_matrix(:myfirstinput)
+ BaseRecommender.input_matrix(:mysecondinput)
+ BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
+ sm = BaseRecommender.new
+ sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
+ sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
+ sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
+ sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
+ expect(sm.similarities_for('item2')).to be_empty
+ sm.process_items!('item2')
+ similarities = sm.similarities_for('item2')
+ expect(similarities).to eq(["item3", "item1"])
+ end
+ end
+
+ context "with a similarity_limit" do
+ it "calculates the similarity between the item and all related_items (other items in a set the given item is in), but obeys the similarity_limit" do
+ BaseRecommender.input_matrix(:myfirstinput)
+ BaseRecommender.input_matrix(:mysecondinput)
+ BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
+ BaseRecommender.limit_similarities_to(1)
+ sm = BaseRecommender.new
+ sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
+ sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
+ sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
+ sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
+ expect(sm.similarities_for('item2')).to be_empty
+ sm.process_items!('item2')
+ similarities = sm.similarities_for('item2')
+ expect(similarities).to include("item3")
+ expect(similarities.length).to eq(1)
+ end
+ end
  end
  end

@@ -343,44 +432,6 @@ describe Predictor::Base do
  end
  end

- describe "process_items!" do
- context "with no similarity_limit" do
- it "calculates the similarity between the item and all related_items (other items in a set the given item is in)" do
- BaseRecommender.input_matrix(:myfirstinput)
- BaseRecommender.input_matrix(:mysecondinput)
- BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
- sm = BaseRecommender.new
- sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
- sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
- sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
- sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
- expect(sm.similarities_for('item2')).to be_empty
- sm.process_items!('item2')
- similarities = sm.similarities_for('item2', with_scores: true)
- expect(similarities).to include(["item3", 4.0], ["item1", 2.5])
- end
- end
-
- context "with a similarity_limit" do
- it "calculates the similarity between the item and all related_items (other items in a set the given item is in), but obeys the similarity_limit" do
- BaseRecommender.input_matrix(:myfirstinput)
- BaseRecommender.input_matrix(:mysecondinput)
- BaseRecommender.input_matrix(:mythirdinput, weight: 3.0)
- BaseRecommender.limit_similarities_to(1)
- sm = BaseRecommender.new
- sm.myfirstinput.add_to_set 'set1', 'item1', 'item2'
- sm.mysecondinput.add_to_set 'set2', 'item2', 'item3'
- sm.mythirdinput.add_to_set 'set3', 'item2', 'item3'
- sm.mythirdinput.add_to_set 'set4', 'item1', 'item2', 'item3'
- expect(sm.similarities_for('item2')).to be_empty
- sm.process_items!('item2')
- similarities = sm.similarities_for('item2', with_scores: true)
- expect(similarities).to include(["item3", 4.0])
- expect(similarities.length).to eq(1)
- end
- end
- end
-
  describe "process!" do
  it "should call process_items for all_items's" do
  BaseRecommender.input_matrix(:anotherinput)
@@ -85,6 +85,11 @@ describe Predictor::InputMatrix do
  expect(@matrix.items_for("item1")).to include("foo", "bar", "fnord", "blubb")
  end

+ it "does not crash if the set of items is empty" do
+ @matrix.add_to_set "item1"
+ @matrix.add_to_set "item1", []
+ end
+
  it "adds the key to each set member's 'items' set" do
  expect(@matrix.sets_for("foo")).not_to include("item1")
  expect(@matrix.sets_for("bar")).not_to include("item1")
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: predictor
  version: !ruby/object:Gem::Version
- version: 2.2.0
+ version: 2.3.0
  platform: ruby
  authors:
  - Pathgather
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-06-24 00:00:00.000000000 Z
+ date: 2014-09-05 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: redis
@@ -92,6 +92,7 @@ files:
  - LICENSE
  - README.md
  - Rakefile
+ - benchmark/process.rb
  - docs/READMEv1.md
  - lib/predictor.rb
  - lib/predictor/base.rb