predictor 1.0.0
- checksums.yaml +7 -0
- data/Gemfile +10 -0
- data/LICENSE +20 -0
- data/README.md +176 -0
- data/Rakefile +12 -0
- data/lib/predictor.rb +4 -0
- data/lib/predictor/base.rb +148 -0
- data/lib/predictor/input_matrix.rb +138 -0
- data/lib/predictor/predictor.rb +21 -0
- data/lib/predictor/version.rb +3 -0
- data/predictor.gemspec +21 -0
- data/spec/base_spec.rb +224 -0
- data/spec/input_matrix_spec.rb +251 -0
- data/spec/predictor_spec.rb +15 -0
- data/spec/spec_helper.rb +40 -0
- metadata +91 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: 23c921a064f6dcb321d1948e051545298616f3b7
+  data.tar.gz: 501a6132f7ea81fa316faf5a5f14a7b0d28afcdb
+SHA512:
+  metadata.gz: 0b14e50f6df801912204a8a312124eba2e44888e8c5b8b1e8f3fe51808b0b3cad2e794b55633fb747cb60189d782312864c49c5105d03de5e1992f240a3528d8
+  data.tar.gz: 850b7d299e0f3ce4352fb3cd3423e96d598f83a72b5a0cf2d0cf27734f25e8951f2834cb40444931820721dce74bb4f029a82081765abb4069fa25d9a31fa04c
data/Gemfile
ADDED
data/LICENSE
ADDED
@@ -0,0 +1,20 @@
+The MIT License (MIT)
+
+Copyright (c) 2014 Pathgather
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,176 @@
|
|
1
|
+
=======
|
2
|
+
Predictor
|
3
|
+
=========
|
4
|
+
|
5
|
+
Fast and efficient recommendations and predictions using Ruby & Redis. Used in production over at [Pathgather](http://pathgather.com) to recommend content to users.
|
6
|
+
|
7
|
+
![](https://www.codeship.io/projects/5aeeedf0-6053-0131-2319-5ede98f174ff/status)
|
8
|
+
|
9
|
+
Originally forked from and based on [Recommendify](https://github.com/paulasmuth/recommendify) by Paul Asmuth, so a huge thanks to him for his contributions to Recommendify. Predictor has been almost completely rewritten to:
|
10
|
+
* Be much, much more performant and efficient by using Redis for most logic.
|
11
|
+
* Provide item similarities such as "Users that read this book also read ..."
|
12
|
+
* Provide personalized predictions based on a user's past history, such as "You read these 10 books, so you might also like to read ..."
|
13
|
+
|
14
|
+
At the moment, Predictor uses the [Jaccard index](http://en.wikipedia.org/wiki/Jaccard_index) to determine similarities between items. There are other ways to do this, which we intend to implement eventually, but if you want to beat us to the punch, pull requests are quite welcome :)
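
As a rough illustration of the Jaccard index itself (not Predictor's Redis-backed implementation), the score for two items is the size of the intersection of the sets they belong to divided by the size of the union. The sketch below works on plain in-memory sets; `jaccard_index` and the `takers` hash are hypothetical names, and the data mirrors the users example given under "Inputting Data".

```ruby
require "set"

# Jaccard index of two sets: |A ∩ B| / |A ∪ B|, or 0.0 when both are empty.
def jaccard_index(a, b)
  union = a | b
  return 0.0 if union.empty?
  (a & b).size.to_f / union.size
end

# Hypothetical data: which users have taken each course.
takers = {
  "course-1" => Set["user1", "user2"],
  "course-3" => Set["user1"],
  "course-4" => Set["user2"]
}

jaccard_index(takers["course-1"], takers["course-3"]) # one shared user out of two => 0.5
jaccard_index(takers["course-3"], takers["course-4"]) # no shared users => 0.0
```

Two courses taken by exactly the same users would score 1.0, and courses with no users in common score 0.0.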
|
15
|
+
|
16
|
+
Installation
|
17
|
+
---------------------
|
18
|
+
```ruby
|
19
|
+
gem install predictor
|
20
|
+
````
|
21
|
+
or in your Gemfile:
|
22
|
+
````
|
23
|
+
gem 'predictor'
|
24
|
+
```
|
25
|
+
Getting Started
|
26
|
+
---------------------
|
27
|
+
The first step is to configure Predictor with your Redis instance.
|
28
|
+
```ruby
|
29
|
+
# in config/initializers/predictor.rb
|
30
|
+
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"])
|
31
|
+
|
32
|
+
# Or, to improve performance, add hiredis as your driver (you'll need to install the hiredis gem first)
|
33
|
+
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"], :driver => :hiredis)
|
34
|
+
```
|
35
|
+
Inputting Data
|
36
|
+
---------------------
|
37
|
+
Create a class and include the Predictor::Base module. Define an input_matrix for each relationship you'd like to keep track of. This can be anything you think is a significant metric for the item: page views, purchases, categories the item belongs to, etc.
|
38
|
+
|
39
|
+
Below, we're building a recommender to recommend courses based off of:
|
40
|
+
* Users that have taken a course. If 2 courses were taken by the same user, this is 3 times as important to us as the courses sharing the same topic. This will lead to sets like:
|
41
|
+
* "user1" -> "course-1", "course-3",
|
42
|
+
* "user2" -> "course-1", "course-4"
|
43
|
+
* Tags and their courses. This will lead to sets like:
|
44
|
+
* "rails" -> "course-1", "course-2",
|
45
|
+
* "microeconomics" -> "course-3", "course-4"
|
46
|
+
* Topics and their courses. This will lead to sets like:
|
47
|
+
* "computer science" -> "course-1", "course-2",
|
48
|
+
* "economics and finance" -> "course-3", "course-4"
|
49
|
+
|
50
|
+
```ruby
|
51
|
+
class CourseRecommender
|
52
|
+
include Predictor::Base
|
53
|
+
|
54
|
+
input_matrix :users, weight: 3.0
|
55
|
+
input_matrix :tags, weight: 2.0
|
56
|
+
input_matrix :topics, weight: 1.0
|
57
|
+
end
|
58
|
+
```
|
59
|
+
|
60
|
+
Now, we just need to update our matrices when courses are created, users take a course, topics are changed, etc:
|
61
|
+
```ruby
|
62
|
+
recommender = CourseRecommender.new
|
63
|
+
|
64
|
+
# Add a single course to topic-1's items. If topic-1 already exists as a set ID, this just adds course-1 to the set
|
65
|
+
recommender.topics.add_single!("topic-1", "course-1")
|
66
|
+
|
67
|
+
# If your matrix is quite large, add_single! could take some time, as it must calculate the similarity scores
|
68
|
+
# for course-1 across all other courses. If this is the case, use add_single and process the item at a more
|
69
|
+
# convenient time, perhaps in a background job
|
70
|
+
recommender.topics.add_single("topic-1", "course-1")
|
71
|
+
recommender.topics.process_item!("course-1")
|
72
|
+
|
73
|
+
# Add an array of courses to tag-1. Again, these will simply be added to tag-1's existing set, if it exists.
|
74
|
+
# If not, the tag-1 set will be initialized with course-1 and course-2
|
75
|
+
recommender.tags.add_set!("tag-1", ["course-1", "course-2"])
|
76
|
+
|
77
|
+
# Or, just add the set and process whenever you like
|
78
|
+
recommender.tags.add_set("tag-1", ["course-1", "course-2"])
|
79
|
+
["course-1", "course-2"].each { |course| recommender.tags.process_item!(course) }
|
80
|
+
```
|
81
|
+
|
82
|
+
As noted above, it's important to remember that if you don't use the bang methods (add_set! and add_single!), you'll need to manually update your similarities (the bang methods will likely suffice for most use cases though). You can do so in a variety of ways.
|
83
|
+
* If you want to simply update the similarities for a single item in a specific matrix:
|
84
|
+
````
|
85
|
+
recommender.matrix.process_item!(item)
|
86
|
+
````
|
87
|
+
* If you want to update the similarities for all items in a specific matrix:
|
88
|
+
````
|
89
|
+
recommender.matrix.process!
|
90
|
+
````
|
91
|
+
* If you want to update the similarities for a single item in all matrices:
|
92
|
+
````
|
93
|
+
recommender.process_item!(item)
|
94
|
+
````
|
95
|
+
* If you want to update all similarities in all matrices:
|
96
|
+
````
|
97
|
+
recommender.process!
|
98
|
+
````
|
99
|
+
|
100
|
+
Retrieving Similarities and Recommendations
|
101
|
+
---------------------
|
102
|
+
Now that your matrices have been initialized with several relationships, you can start generating similarities and recommendations! First, let's start with similarities, which will use the weights we specify on each matrix to determine which courses share the most in common with a given course.
|
103
|
+
```ruby
|
104
|
+
recommender = CourseRecommender.new
|
105
|
+
|
106
|
+
# Return all similarities for course-1 (ordered by most similar to least).
|
107
|
+
recommender.similarities_for("course-1")
|
108
|
+
|
109
|
+
# Need to paginate? Not a problem! Specify an offset and a limit
|
110
|
+
recommender.similarities_for("course-1", offset: 10, limit: 10) # Gets similarities 11-20
|
111
|
+
|
112
|
+
# Want scores?
|
113
|
+
recommender.similarities_for("course-1", with_scores: true)
|
114
|
+
|
115
|
+
# Want to ignore a certain set of courses in similarities?
|
116
|
+
recommender.similarities_for("course-1", exclusion_set: ["course-2"])
|
117
|
+
```
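
To make the weighting concrete, here is a minimal in-memory sketch, assuming the combined score is the sum of each matrix's Jaccard score multiplied by that matrix's weight (which is what the `zunionstore ... weights:` call in `similarities_for` appears to do). The data and weights are taken from the "correctly weighs and sums input matrices" spec; `INPUTS`, `WEIGHTS`, `sets_for`, `jaccard`, and `weighted_similarity` are hypothetical names, not part of the gem.

```ruby
require "set"

# Input sets per matrix, copied from the spec: set ID => item IDs.
INPUTS = {
  users:  { "user1" => %w[c1 c2 c4], "user2" => %w[c3 c4] },
  tags:   { "tag1" => %w[c1 c2 c4], "tag2" => %w[c1 c4] },
  topics: { "topic1" => %w[c1 c4], "topic2" => %w[c2 c3] }
}
WEIGHTS = { users: 1.0, tags: 2.0, topics: 4.0 }

# Set IDs that contain the given item within one matrix.
def sets_for(matrix, item)
  INPUTS.fetch(matrix).select { |_, items| items.include?(item) }.keys.to_set
end

# Jaccard index of two sets, 0.0 when both are empty.
def jaccard(a, b)
  union = a | b
  union.empty? ? 0.0 : (a & b).size.to_f / union.size
end

# Sum of each matrix's Jaccard score multiplied by that matrix's weight.
def weighted_similarity(item1, item2)
  WEIGHTS.sum(0.0) do |matrix, weight|
    weight * jaccard(sets_for(matrix, item1), sets_for(matrix, item2))
  end
end

weighted_similarity("c1", "c4") # 1.0 * 0.5 + 2.0 * 1.0 + 4.0 * 1.0 => 6.5
weighted_similarity("c1", "c2") # 1.0 * 1.0 + 2.0 * 0.5 + 4.0 * 0.0 => 2.0
```

These values agree with the scores the spec expects from `similarities_for("c1", with_scores: true)`.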
|
118
|
+
|
119
|
+
The above examples are great for situations like "Users that viewed this also liked ...", but what if you wanted to recommend courses to a user based on the courses they've already taken? Not a problem!
|
120
|
+
```ruby
|
121
|
+
recommender = CourseRecommender.new
|
122
|
+
|
123
|
+
# User has taken course-1 and course-2. Let's see what else they might like...
|
124
|
+
recommender.predictions_for(item_set: ["course-1", "course-2"])
|
125
|
+
|
126
|
+
# Already have the set you need stored in an input matrix? In our case, we do (the users matrix stores the courses a user has taken), so we can just do:
|
127
|
+
recommender.predictions_for("user-1", matrix_label: :users)
|
128
|
+
|
129
|
+
# Paginate too!
|
130
|
+
recommender.predictions_for("user-1", matrix_label: :users, offset: 10, limit: 10)
|
131
|
+
|
132
|
+
# Gimme some scores and ignore course-2....that course-2 is one sketchy fella
recommender.predictions_for("user-1", matrix_label: :users, with_scores: true, exclusion_set: ["course-2"])
|
134
|
+
```
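
`predictions_for` also accepts a `normalize:` option (true by default) that the examples above do not show. The following is a simplified in-memory sketch of how it seems to work, judging from `item_score` in `base.rb`: each item's similarity scores are divided by the sum of that item's scores before being added up. `SIMILARITIES` and `predictions` are hypothetical names, and the score table is copied from the values the spec expects.

```ruby
# Combined similarity scores per item, as listed in the spec.
SIMILARITIES = {
  "c1" => { "c4" => 6.5, "c2" => 2.0 },
  "c2" => { "c3" => 4.0, "c1" => 2.0, "c4" => 1.5 },
  "c3" => { "c2" => 4.0, "c4" => 0.5 },
  "c4" => { "c1" => 6.5, "c2" => 1.5, "c3" => 0.5 }
}

# Rank items similar to anything in item_set, excluding item_set itself.
def predictions(item_set, normalize: true)
  totals = Hash.new(0.0)
  item_set.each do |item|
    neighbors = SIMILARITIES.fetch(item, {})
    # With normalization, an item's scores are scaled so that they sum to 1.
    divisor = (normalize && !neighbors.empty?) ? neighbors.values.sum : 1.0
    neighbors.each { |other, score| totals[other] += score / divisor }
  end
  item_set.each { |item| totals.delete(item) }
  totals.sort_by { |_, score| -score }
end

predictions(%w[c1 c2 c4], normalize: false) # => [["c3", 4.5]]
predictions(%w[c3 c4], normalize: false)    # => [["c1", 6.5], ["c2", 5.5]]
predictions(%w[c3 c4])                      # c2 (about 1.065) now ranks above c1 (about 0.765)
```

Normalization keeps an item with many strong similarities from dominating the result, which is why the order for the second user flips.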
|
135
|
+
|
136
|
+
Deleting Items
|
137
|
+
---------------------
|
138
|
+
If your data is deleted from your persistent storage, you certainly don't want to recommend that data to a user. To ensure that doesn't happen, simply call delete_item! on the individual matrix or recommender as a whole:
|
139
|
+
```ruby
|
140
|
+
recommender = CourseRecommender.new
|
141
|
+
|
142
|
+
# User removed course-1 from topic-1, but course-1 still exists
|
143
|
+
recommender.topics.delete_item!("course-1")
|
144
|
+
|
145
|
+
# course-1 was permanently deleted
|
146
|
+
recommender.delete_item!("course-1")
|
147
|
+
|
148
|
+
# Something crazy has happened, so let's just start fresh and wipe out all previously stored similarities:
|
149
|
+
recommender.clean!
|
150
|
+
```
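
Deleting an item has to touch both directions of the index, because each matrix stores the items in every set and also the sets each item belongs to. Below is a hedged in-memory sketch of that bookkeeping; `InMemoryMatrix` is a hypothetical stand-in for illustration only, and it leaves out the cached similarity scores that the real `delete_item!` also removes.

```ruby
require "set"

# Hypothetical in-memory stand-in for one input matrix.
class InMemoryMatrix
  attr_reader :items_by_set, :sets_by_item

  def initialize
    @items_by_set = Hash.new { |hash, key| hash[key] = Set.new }
    @sets_by_item = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Record the relationship in both directions.
  def add_single(set_id, item_id)
    @items_by_set[set_id] << item_id
    @sets_by_item[item_id] << set_id
  end

  # Remove the item from every set that contains it, then drop its own entry.
  def delete_item(item_id)
    sets = @sets_by_item.delete(item_id) || Set.new
    sets.each { |set_id| @items_by_set[set_id].delete(item_id) }
  end
end

matrix = InMemoryMatrix.new
matrix.add_single("topic-1", "course-1")
matrix.add_single("topic-1", "course-2")
matrix.delete_item("course-1")
matrix.items_by_set["topic-1"] # now contains only "course-2"
```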
|
151
|
+
|
152
|
+
Problems? Issues? Want to help out?
|
153
|
+
---------------------
|
154
|
+
Just submit a GitHub issue or pull request! We'd love to have you help out, as the most common library to use for this need, Recommendify, was last updated 2 years ago. We'll be sure to keep this maintained, but we could certainly use your help!
|
155
|
+
|
156
|
+
The MIT License (MIT)
|
157
|
+
---------------------
|
158
|
+
Copyright (c) 2014 Pathgather
|
159
|
+
|
160
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy of
|
161
|
+
this software and associated documentation files (the "Software"), to deal in
|
162
|
+
the Software without restriction, including without limitation the rights to
|
163
|
+
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
|
164
|
+
the Software, and to permit persons to whom the Software is furnished to do so,
|
165
|
+
subject to the following conditions:
|
166
|
+
|
167
|
+
The above copyright notice and this permission notice shall be included in all
|
168
|
+
copies or substantial portions of the Software.
|
169
|
+
|
170
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
171
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
|
172
|
+
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
|
173
|
+
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
|
174
|
+
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
|
175
|
+
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
|
176
|
+
|
data/Rakefile
ADDED
data/lib/predictor.rb
ADDED
data/lib/predictor/base.rb
ADDED
@@ -0,0 +1,148 @@
|
|
1
|
+
module Predictor::Base
|
2
|
+
def self.included(base)
|
3
|
+
base.extend(ClassMethods)
|
4
|
+
end
|
5
|
+
|
6
|
+
module ClassMethods
|
7
|
+
def input_matrix(key, opts={})
|
8
|
+
@matrices ||= {}
|
9
|
+
@matrices[key] = opts
|
10
|
+
end
|
11
|
+
|
12
|
+
def input_matrices=(val)
|
13
|
+
@matrices = val
|
14
|
+
end
|
15
|
+
|
16
|
+
def input_matrices
|
17
|
+
@matrices
|
18
|
+
end
|
19
|
+
end
|
20
|
+
|
21
|
+
def input_matrices
|
22
|
+
@input_matrices ||= Hash[self.class.input_matrices.map{ |key, opts|
|
23
|
+
opts.merge!(:key => key, :redis_prefix => redis_prefix)
|
24
|
+
[ key, Predictor::InputMatrix.new(opts) ]
|
25
|
+
}]
|
26
|
+
end
|
27
|
+
|
28
|
+
def redis_prefix
|
29
|
+
"predictor"
|
30
|
+
end
|
31
|
+
|
32
|
+
def redis_key(*append)
|
33
|
+
([redis_prefix] + append).flatten.compact.join(":")
|
34
|
+
end
|
35
|
+
|
36
|
+
def method_missing(method, *args)
|
37
|
+
if input_matrices.has_key?(method)
|
38
|
+
input_matrices[method]
|
39
|
+
else
|
40
|
+
raise NoMethodError.new(method.to_s)
|
41
|
+
end
|
42
|
+
end
|
43
|
+
|
44
|
+
def respond_to?(method)
|
45
|
+
input_matrices.has_key?(method) ? true : super
|
46
|
+
end
|
47
|
+
|
48
|
+
def all_items
|
49
|
+
Predictor.redis.sunion input_matrices.map{|k,m| m.redis_key(:all_items)}
|
50
|
+
end
|
51
|
+
|
52
|
+
def item_score(item, normalize)
|
53
|
+
if normalize
|
54
|
+
similarities = similarities_for(item, with_scores: true)
|
55
|
+
unless similarities.empty?
|
56
|
+
similarities.map{|x,y| y}.reduce(:+)
|
57
|
+
else
|
58
|
+
1
|
59
|
+
end
|
60
|
+
else
|
61
|
+
1
|
62
|
+
end
|
63
|
+
end
|
64
|
+
|
65
|
+
def predictions_for(set_id=nil, item_set: nil, matrix_label: nil, with_scores: false, normalize: true, offset: 0, limit: -1, exclusion_set: [])
|
66
|
+
fail "item_set or matrix_label and set_id is required" unless item_set || (matrix_label && set_id)
|
67
|
+
redis = Predictor.redis
|
68
|
+
|
69
|
+
if matrix_label
|
70
|
+
matrix = input_matrices[matrix_label]
|
71
|
+
item_set = redis.smembers(matrix.redis_key(:items, set_id))
|
72
|
+
end
|
73
|
+
|
74
|
+
item_keys = item_set.map do |item|
|
75
|
+
input_matrices.map{ |k,m| m.redis_key(:similarities, item) }
|
76
|
+
end.flatten
|
77
|
+
|
78
|
+
item_weights = item_set.map do |item|
|
79
|
+
score = item_score(item, normalize)
|
80
|
+
input_matrices.map{|k, m| m.weight/score }
|
81
|
+
end.flatten
|
82
|
+
|
83
|
+
unless item_keys.empty?
|
84
|
+
predictions = nil
|
85
|
+
redis.multi do |multi|
|
86
|
+
multi.zunionstore 'temp', item_keys, weights: item_weights
|
87
|
+
multi.zrem 'temp', item_set
|
88
|
+
multi.zrem 'temp', exclusion_set if exclusion_set.length > 0
|
89
|
+
predictions = multi.zrevrange 'temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores
|
90
|
+
multi.del 'temp'
|
91
|
+
end
|
92
|
+
return predictions.value
|
93
|
+
else
|
94
|
+
return []
|
95
|
+
end
|
96
|
+
end
|
97
|
+
|
98
|
+
def similarities_for(item, with_scores: false, offset: 0, limit: -1, exclusion_set: [])
|
99
|
+
keys = input_matrices.map{ |k,m| m.redis_key(:similarities, item) }
|
100
|
+
weights = input_matrices.map{ |k,m| m.weight }
|
101
|
+
neighbors = nil
|
102
|
+
unless keys.empty?
|
103
|
+
Predictor.redis.multi do |multi|
|
104
|
+
multi.zunionstore 'temp', keys, weights: weights
|
105
|
+
multi.zrem 'temp', exclusion_set if exclusion_set.length > 0
|
106
|
+
neighbors = multi.zrevrange('temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores)
|
107
|
+
multi.del 'temp'
|
108
|
+
end
|
109
|
+
return neighbors.value
|
110
|
+
else
|
111
|
+
return []
|
112
|
+
end
|
113
|
+
end
|
114
|
+
|
115
|
+
def sets_for(item)
|
116
|
+
keys = input_matrices.map{ |k,m| m.redis_key(:sets, item) }
|
117
|
+
Predictor.redis.sunion keys
|
118
|
+
end
|
119
|
+
|
120
|
+
def process!
|
121
|
+
input_matrices.each do |k,m|
|
122
|
+
m.process!
|
123
|
+
end
|
124
|
+
return self
|
125
|
+
end
|
126
|
+
|
127
|
+
def process_item!(item)
|
128
|
+
input_matrices.each do |k,m|
|
129
|
+
m.process_item!(item)
|
130
|
+
end
|
131
|
+
return self
|
132
|
+
end
|
133
|
+
|
134
|
+
def delete_item!(item_id)
|
135
|
+
input_matrices.each do |k,m|
|
136
|
+
m.delete_item!(item_id)
|
137
|
+
end
|
138
|
+
return self
|
139
|
+
end
|
140
|
+
|
141
|
+
def clean!
|
142
|
+
# now only flushes the keys for the instantiated recommender
|
143
|
+
keys = Predictor.redis.keys("#{self.redis_prefix}:*")
|
144
|
+
unless keys.empty?
|
145
|
+
Predictor.redis.del(keys)
|
146
|
+
end
|
147
|
+
end
|
148
|
+
end
|
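
A note on the dynamic matrix accessors above: `Predictor::Base` overrides `respond_to?` with a single parameter, so a call such as `respond_to?(:users, true)` would raise an `ArgumentError`. The sketch below shows the more conventional Ruby pairing of `method_missing` with `respond_to_missing?`. `MatrixHolder` is a hypothetical class used only for illustration; this is one possible alternative, not the gem's code.

```ruby
# Hypothetical holder that exposes each matrix as a reader method.
class MatrixHolder
  def initialize(matrices)
    @matrices = matrices
  end

  # Return the matrix named by the missing method, or defer to Ruby's default.
  def method_missing(name, *args, &block)
    return @matrices[name] if @matrices.key?(name)
    super
  end

  # Keeps respond_to? (with or without include_private) and method() consistent.
  def respond_to_missing?(name, include_private = false)
    @matrices.key?(name) || super
  end
end

holder = MatrixHolder.new(users: :users_matrix)
holder.users                # => :users_matrix
holder.respond_to?(:users)  # => true
holder.respond_to?(:fnord)  # => false
```

Calling `super` in `method_missing` also produces the standard `NoMethodError` message, including the receiver, for unknown names.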
data/lib/predictor/input_matrix.rb
ADDED
@@ -0,0 +1,138 @@
|
|
1
|
+
class Predictor::InputMatrix
|
2
|
+
def initialize(opts)
|
3
|
+
@opts = opts
|
4
|
+
end
|
5
|
+
|
6
|
+
def redis_key(*append)
|
7
|
+
([@opts.fetch(:redis_prefix), @opts.fetch(:key)] + append).flatten.compact.join(":")
|
8
|
+
end
|
9
|
+
|
10
|
+
def weight
|
11
|
+
(@opts[:weight] || 1).to_f
|
12
|
+
end
|
13
|
+
|
14
|
+
def add_set(set_id, item_ids)
|
15
|
+
Predictor.redis.multi do
|
16
|
+
item_ids.each { |item| add_single_nomulti(set_id, item) }
|
17
|
+
end
|
18
|
+
end
|
19
|
+
|
20
|
+
def add_set!(set_id, item_ids)
|
21
|
+
add_set(set_id, item_ids)
|
22
|
+
item_ids.each { |item_id| process_item!(item_id) }
|
23
|
+
end
|
24
|
+
|
25
|
+
def add_single(set_id, item_id)
|
26
|
+
Predictor.redis.multi do
|
27
|
+
add_single_nomulti(set_id, item_id)
|
28
|
+
end
|
29
|
+
end
|
30
|
+
|
31
|
+
def add_single!(set_id, item_id)
|
32
|
+
add_single(set_id, item_id)
|
33
|
+
process_item!(item_id)
|
34
|
+
end
|
35
|
+
|
36
|
+
def all_items
|
37
|
+
Predictor.redis.smembers(redis_key(:all_items))
|
38
|
+
end
|
39
|
+
|
40
|
+
def items_for(set)
|
41
|
+
Predictor.redis.smembers redis_key(:items, set)
|
42
|
+
end
|
43
|
+
|
44
|
+
def sets_for(item)
|
45
|
+
Predictor.redis.sunion redis_key(:sets, item)
|
46
|
+
end
|
47
|
+
|
48
|
+
def related_items(item_id)
|
49
|
+
sets = Predictor.redis.smembers(redis_key(:sets, item_id))
|
50
|
+
keys = sets.map { |set| redis_key(:items, set) }
|
51
|
+
if keys.length > 0
|
52
|
+
Predictor.redis.sunion(keys) - [item_id]
|
53
|
+
else
|
54
|
+
[]
|
55
|
+
end
|
56
|
+
end
|
57
|
+
|
58
|
+
def similarity(item1, item2)
|
59
|
+
Predictor.redis.zscore(redis_key(:similarities, item1), item2)
|
60
|
+
end
|
61
|
+
|
62
|
+
# calculate all similarities to other items in the matrix for item1
|
63
|
+
def similarities_for(item1, with_scores: false, offset: 0, limit: -1)
|
64
|
+
Predictor.redis.zrevrange(redis_key(:similarities, item1), offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores)
|
65
|
+
end
|
66
|
+
|
67
|
+
def process_item!(item)
|
68
|
+
cache_similarities_for(item)
|
69
|
+
end
|
70
|
+
|
71
|
+
def process!
|
72
|
+
all_items.each do |item|
|
73
|
+
process_item!(item)
|
74
|
+
end
|
75
|
+
end
|
76
|
+
|
77
|
+
# delete item_id from the matrix
|
78
|
+
def delete_item!(item_id)
|
79
|
+
Predictor.redis.srem(redis_key(:all_items), item_id)
|
80
|
+
Predictor.redis.watch(redis_key(:sets, item_id), redis_key(:similarities, item_id)) do
|
81
|
+
sets = Predictor.redis.smembers(redis_key(:sets, item_id))
|
82
|
+
items = Predictor.redis.zrange(redis_key(:similarities, item_id), 0, -1)
|
83
|
+
Predictor.redis.multi do |multi|
|
84
|
+
sets.each do |set|
|
85
|
+
multi.srem(redis_key(:items, set), item_id)
|
86
|
+
end
|
87
|
+
|
88
|
+
items.each do |item|
|
89
|
+
multi.zrem(redis_key(:similarities, item), item_id)
|
90
|
+
end
|
91
|
+
|
92
|
+
multi.del redis_key(:sets, item_id), redis_key(:similarities, item_id)
|
93
|
+
end
|
94
|
+
end
|
95
|
+
end
|
96
|
+
|
97
|
+
private
|
98
|
+
|
99
|
+
def add_single_nomulti(set_id, item_id)
|
100
|
+
Predictor.redis.sadd(redis_key(:all_items), item_id)
|
101
|
+
Predictor.redis.sadd(redis_key(:items, set_id), item_id)
|
102
|
+
# add the set_id to the item_id's set--inverting the sets
|
103
|
+
Predictor.redis.sadd(redis_key(:sets, item_id), set_id)
|
104
|
+
end
|
105
|
+
|
106
|
+
def cache_similarity(item1, item2)
|
107
|
+
score = calculate_jaccard(item1, item2)
|
108
|
+
|
109
|
+
if score > 0
|
110
|
+
Predictor.redis.multi do |multi|
|
111
|
+
multi.zadd(redis_key(:similarities, item1), score, item2)
|
112
|
+
multi.zadd(redis_key(:similarities, item2), score, item1)
|
113
|
+
end
|
114
|
+
end
|
115
|
+
end
|
116
|
+
|
117
|
+
def cache_similarities_for(item)
|
118
|
+
related_items(item).each do |related_item|
|
119
|
+
cache_similarity(item, related_item)
|
120
|
+
end
|
121
|
+
end
|
122
|
+
|
123
|
+
def calculate_jaccard(item1, item2)
|
124
|
+
x = nil
|
125
|
+
y = nil
|
126
|
+
Predictor.redis.multi do |multi|
|
127
|
+
x = multi.sinterstore 'temp', [redis_key(:sets, item1), redis_key(:sets, item2)]
|
128
|
+
y = multi.sunionstore 'temp', [redis_key(:sets, item1), redis_key(:sets, item2)]
|
129
|
+
multi.del 'temp'
|
130
|
+
end
|
131
|
+
|
132
|
+
if y.value > 0
|
133
|
+
return (x.value.to_f/y.value.to_f)
|
134
|
+
else
|
135
|
+
return 0.0
|
136
|
+
end
|
137
|
+
end
|
138
|
+
end
|
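
Both `similarities_for` methods and `predictions_for` turn the `offset:` and `limit:` options into the inclusive start and stop indices that `zrevrange` takes, using the expression `limit == -1 ? limit : offset + (limit - 1)`. Here is a small sketch of that arithmetic on a plain array; `range_bounds` and `ranked` are hypothetical names, and the array only stands in for a sorted set that is already ordered best-first.

```ruby
# Translate an offset/limit pair into inclusive start/stop indices.
# A limit of -1 means "through the last element".
def range_bounds(offset, limit)
  stop = limit == -1 ? -1 : offset + (limit - 1)
  [offset, stop]
end

# In-memory stand-in for a sorted set already ordered best-first.
ranked = (1..25).map { |n| "course-#{n}" }

start, stop = range_bounds(10, 10)
page = ranked[start..stop] # courses 11 through 20, matching the README's pagination example
```

Ruby array ranges treat a stop index of -1 as the last element, just as the Redis range commands do, which is why the array works as a stand-in here.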
data/lib/predictor/predictor.rb
ADDED
@@ -0,0 +1,21 @@
+module Predictor
+  @@redis = nil
+
+  def self.redis=(redis)
+    @@redis = redis
+  end
+
+  def self.redis
+    return @@redis unless @@redis.nil?
+    raise "redis not configured! - Predictor.redis = Redis.new"
+  end
+
+  def self.capitalize(str_or_sym)
+    str = str_or_sym.to_s.each_char.to_a
+    str.first.upcase + str[1..-1].join("").downcase
+  end
+
+  def self.constantize(klass)
+    Object.module_eval("Predictor::#{klass}", __FILE__, __LINE__)
+  end
+end
data/predictor.gemspec
ADDED
@@ -0,0 +1,21 @@
|
|
1
|
+
# -*- encoding: utf-8 -*-
|
2
|
+
require File.expand_path('../lib/predictor/version', __FILE__)
|
3
|
+
|
4
|
+
Gem::Specification.new do |s|
|
5
|
+
s.name = "predictor"
|
6
|
+
s.version = Predictor::VERSION
|
7
|
+
s.platform = Gem::Platform::RUBY
|
8
|
+
s.authors = ["Pathgather"]
|
9
|
+
s.email = ["tech@pathgather.com"]
|
10
|
+
s.homepage = "https://github.com/Pathgather/predictor"
|
11
|
+
s.description = s.summary = "Fast and efficient recommendations and predictions using Redis"
|
12
|
+
s.licenses = ["MIT"]
|
13
|
+
|
14
|
+
s.add_dependency "redis", ">= 3.0.0"
|
15
|
+
|
16
|
+
s.add_development_dependency "rspec", "~> 2.8.0"
|
17
|
+
|
18
|
+
s.files = `git ls-files`.split("\n") - [".gitignore", ".rspec", ".travis.yml"]
|
19
|
+
s.test_files = `git ls-files -- spec/*`.split("\n")
|
20
|
+
s.require_paths = ["lib"]
|
21
|
+
end
|
data/spec/base_spec.rb
ADDED
@@ -0,0 +1,224 @@
|
|
1
|
+
require ::File.expand_path('../spec_helper', __FILE__)
|
2
|
+
|
3
|
+
describe Predictor::Base do
|
4
|
+
class BaseRecommender
|
5
|
+
include Predictor::Base
|
6
|
+
end
|
7
|
+
|
8
|
+
before(:each) do
|
9
|
+
flush_redis!
|
10
|
+
BaseRecommender.input_matrices = {}
|
11
|
+
end
|
12
|
+
|
13
|
+
describe "configuration" do
|
14
|
+
it "should add an input_matrix by 'key'" do
|
15
|
+
BaseRecommender.input_matrix(:myinput)
|
16
|
+
BaseRecommender.input_matrices.keys.should == [:myinput]
|
17
|
+
end
|
18
|
+
|
19
|
+
it "should retrieve an input_matrix on a new instance" do
|
20
|
+
BaseRecommender.input_matrix(:myinput)
|
21
|
+
sm = BaseRecommender.new
|
22
|
+
lambda{ sm.myinput }.should_not raise_error
|
23
|
+
end
|
24
|
+
|
25
|
+
it "should retrieve an input_matrix on a new instance and correctly overload respond_to?" do
|
26
|
+
BaseRecommender.input_matrix(:myinput)
|
27
|
+
sm = BaseRecommender.new
|
28
|
+
sm.respond_to?(:process!).should be_true
|
29
|
+
sm.respond_to?(:myinput).should be_true
|
30
|
+
sm.respond_to?(:fnord).should be_false
|
31
|
+
end
|
32
|
+
|
33
|
+
it "should retrieve an input_matrix on a new instance and intialize the correct class" do
|
34
|
+
BaseRecommender.input_matrix(:myinput)
|
35
|
+
sm = BaseRecommender.new
|
36
|
+
sm.myinput.should be_a(Predictor::InputMatrix)
|
37
|
+
end
|
38
|
+
end
|
39
|
+
|
40
|
+
describe "process_item!" do
|
41
|
+
it "should call process_item! on each input_matrix" do
|
42
|
+
BaseRecommender.input_matrix(:myfirstinput)
|
43
|
+
BaseRecommender.input_matrix(:mysecondinput)
|
44
|
+
sm = BaseRecommender.new
|
45
|
+
sm.myfirstinput.should_receive(:process_item!).with("fnorditem").and_return([["fooitem",0.5]])
|
46
|
+
sm.mysecondinput.should_receive(:process_item!).with("fnorditem").and_return([["fooitem",0.5]])
|
47
|
+
sm.process_item!("fnorditem")
|
48
|
+
end
|
49
|
+
|
50
|
+
it "should call process_item! on each input_matrix and add all outputs to the similarity matrix" do
|
51
|
+
BaseRecommender.input_matrix(:myfirstinput)
|
52
|
+
BaseRecommender.input_matrix(:mysecondinput)
|
53
|
+
sm = BaseRecommender.new
|
54
|
+
sm.myfirstinput.should_receive(:process_item!).and_return([["fooitem",0.5]])
|
55
|
+
sm.mysecondinput.should_receive(:process_item!).and_return([["fooitem",0.75], ["baritem", 1.0]])
|
56
|
+
sm.process_item!("fnorditem")
|
57
|
+
end
|
58
|
+
|
59
|
+
it "should call process_item! on each input_matrix and add all outputs to the similarity matrix with weight" do
|
60
|
+
BaseRecommender.input_matrix(:myfirstinput, :weight => 4.0)
|
61
|
+
BaseRecommender.input_matrix(:mysecondinput)
|
62
|
+
sm = BaseRecommender.new
|
63
|
+
sm.myfirstinput.should_receive(:process_item!).and_return([["fooitem",0.5]])
|
64
|
+
sm.mysecondinput.should_receive(:process_item!).and_return([["fooitem",0.75], ["baritem", 1.0]])
|
65
|
+
sm.process_item!("fnorditem")
|
66
|
+
end
|
67
|
+
end
|
68
|
+
|
69
|
+
describe "all_items" do
|
70
|
+
it "should retrieve all items from all input matrices" do
|
71
|
+
BaseRecommender.input_matrix(:anotherinput)
|
72
|
+
BaseRecommender.input_matrix(:yetanotherinput)
|
73
|
+
sm = BaseRecommender.new
|
74
|
+
sm.anotherinput.add_set('a', ["foo", "bar"])
|
75
|
+
sm.yetanotherinput.add_set('b', ["fnord", "shmoo"])
|
76
|
+
sm.all_items.length.should == 4
|
77
|
+
sm.all_items.should include("foo", "bar", "fnord", "shmoo")
|
78
|
+
end
|
79
|
+
|
80
|
+
it "should retrieve all items from all input matrices (uniquely)" do
|
81
|
+
BaseRecommender.input_matrix(:anotherinput)
|
82
|
+
BaseRecommender.input_matrix(:yetanotherinput)
|
83
|
+
sm = BaseRecommender.new
|
84
|
+
sm.anotherinput.add_set('a', ["foo", "bar"])
|
85
|
+
sm.yetanotherinput.add_set('b', ["fnord", "bar"])
|
86
|
+
sm.all_items.length.should == 3
|
87
|
+
sm.all_items.should include("foo", "bar", "fnord")
|
88
|
+
end
|
89
|
+
end
|
90
|
+
|
91
|
+
describe "process!" do
|
92
|
+
it "should call process_item for all input_matrix.all_items's" do
|
93
|
+
BaseRecommender.input_matrix(:anotherinput)
|
94
|
+
BaseRecommender.input_matrix(:yetanotherinput)
|
95
|
+
sm = BaseRecommender.new
|
96
|
+
sm.anotherinput.add_set('a', ["foo", "bar"])
|
97
|
+
sm.yetanotherinput.add_set('b', ["fnord", "shmoo"])
|
98
|
+
sm.anotherinput.should_receive(:process!).exactly(1).times
|
99
|
+
sm.yetanotherinput.should_receive(:process!).exactly(1).times
|
100
|
+
sm.process!
|
101
|
+
end
|
102
|
+
end
|
103
|
+
|
104
|
+
describe "predictions_for" do
|
105
|
+
it "returns relevant predictions" do
|
106
|
+
BaseRecommender.input_matrix(:users, weight: 4.0)
|
107
|
+
BaseRecommender.input_matrix(:tags, weight: 1.0)
|
108
|
+
sm = BaseRecommender.new
|
109
|
+
sm.users.add_set('me', ["foo", "bar", "fnord"])
|
110
|
+
sm.users.add_set('not_me', ["foo", "shmoo"])
|
111
|
+
sm.users.add_set('another', ["fnord", "other"])
|
112
|
+
sm.users.add_set('another', ["nada"])
|
113
|
+
sm.tags.add_set('tag1', ["foo", "fnord", "shmoo"])
|
114
|
+
sm.tags.add_set('tag2', ["bar", "shmoo"])
|
115
|
+
sm.tags.add_set('tag3', ["shmoo", "nada"])
|
116
|
+
sm.process!
|
117
|
+
predictions = sm.predictions_for('me', matrix_label: :users)
|
118
|
+
predictions.should == ["shmoo", "other", "nada"]
|
119
|
+
predictions = sm.predictions_for(item_set: ["foo", "bar", "fnord"])
|
120
|
+
predictions.should == ["shmoo", "other", "nada"]
|
121
|
+
predictions = sm.predictions_for('me', matrix_label: :users, offset: 1, limit: 1)
|
122
|
+
predictions.should == ["other"]
|
123
|
+
predictions = sm.predictions_for('me', matrix_label: :users, offset: 1)
|
124
|
+
predictions.should == ["other", "nada"]
|
125
|
+
end
|
126
|
+
|
127
|
+
it "correctly normalizes predictions" do
|
128
|
+
BaseRecommender.input_matrix(:users, weight: 1.0)
|
129
|
+
BaseRecommender.input_matrix(:tags, weight: 2.0)
|
130
|
+
BaseRecommender.input_matrix(:topics, weight: 4.0)
|
131
|
+
|
132
|
+
sm = BaseRecommender.new
|
133
|
+
|
134
|
+
sm.users.add_set('user1', ["c1", "c2", "c4"])
|
135
|
+
sm.users.add_set('user2', ["c3", "c4"])
|
136
|
+
sm.topics.add_set('topic1', ["c1", "c4"])
|
137
|
+
sm.topics.add_set('topic2', ["c2", "c3"])
|
138
|
+
sm.tags.add_set('tag1', ["c1", "c2", "c4"])
|
139
|
+
sm.tags.add_set('tag2', ["c1", "c4"])
|
140
|
+
|
141
|
+
sm.process!
|
142
|
+
|
143
|
+
predictions = sm.predictions_for('user1', matrix_label: :users, with_scores: true, normalize: false)
|
144
|
+
predictions.should eq([["c3", 4.5]])
|
145
|
+
predictions = sm.predictions_for('user2', matrix_label: :users, with_scores: true, normalize: false)
|
146
|
+
predictions.should eq([["c1", 6.5], ["c2", 5.5]])
|
147
|
+
predictions = sm.predictions_for('user1', matrix_label: :users, with_scores: true, normalize: true)
|
148
|
+
predictions[0][0].should eq("c3")
|
149
|
+
predictions[0][1].should be_within(0.001).of(0.592)
|
150
|
+
predictions = sm.predictions_for('user2', matrix_label: :users, with_scores: true, normalize: true)
|
151
|
+      predictions[0][0].should eq("c2")
+      predictions[0][1].should be_within(0.001).of(1.065)
+      predictions[1][0].should eq("c1")
+      predictions[1][1].should be_within(0.001).of(0.764)
+    end
+  end
+
+  describe "similarities_for(item_id)" do
+    it "should not throw exception for non existing items" do
+      sm = BaseRecommender.new
+      sm.similarities_for("not_existing_item").length.should == 0
+    end
+
+    it "correctly weighs and sums input matrices" do
+      BaseRecommender.input_matrix(:users, weight: 1.0)
+      BaseRecommender.input_matrix(:tags, weight: 2.0)
+      BaseRecommender.input_matrix(:topics, weight: 4.0)
+
+      sm = BaseRecommender.new
+
+      sm.users.add_set('user1', ["c1", "c2", "c4"])
+      sm.users.add_set('user2', ["c3", "c4"])
+      sm.topics.add_set('topic1', ["c1", "c4"])
+      sm.topics.add_set('topic2', ["c2", "c3"])
+      sm.tags.add_set('tag1', ["c1", "c2", "c4"])
+      sm.tags.add_set('tag2', ["c1", "c4"])
+
+      sm.process!
+      sm.similarities_for("c1", with_scores: true).should eq([["c4", 6.5], ["c2", 2.0]])
+      sm.similarities_for("c2", with_scores: true).should eq([["c3", 4.0], ["c1", 2.0], ["c4", 1.5]])
+      sm.similarities_for("c3", with_scores: true).should eq([["c2", 4.0], ["c4", 0.5]])
+      sm.similarities_for("c4", with_scores: true, exclusion_set: ["c3"]).should eq([["c1", 6.5], ["c2", 1.5]])
+    end
+  end
+
+  describe "sets_for" do
+    it "should return all the sets the given item is in" do
+      BaseRecommender.input_matrix(:set1)
+      BaseRecommender.input_matrix(:set2)
+      sm = BaseRecommender.new
+      sm.set1.add_set "item1", ["foo", "bar"]
+      sm.set1.add_set "item2", ["nada", "bar"]
+      sm.set2.add_set "item3", ["bar", "other"]
+      sm.sets_for("bar").length.should == 3
+      sm.sets_for("bar").should include("item1", "item2", "item3")
+      sm.sets_for("other").should == ["item3"]
+    end
+  end
+
+  describe "delete_item!" do
+    it "should call delete_item on each input_matrix" do
+      BaseRecommender.input_matrix(:myfirstinput)
+      BaseRecommender.input_matrix(:mysecondinput)
+      sm = BaseRecommender.new
+      sm.myfirstinput.should_receive(:delete_item!).with("fnorditem")
+      sm.mysecondinput.should_receive(:delete_item!).with("fnorditem")
+      sm.delete_item!("fnorditem")
+    end
+  end
+
+  describe "clean!" do
+    it "should clean out the Redis storage for this Predictor" do
+      BaseRecommender.input_matrix(:set1)
+      BaseRecommender.input_matrix(:set2)
+      sm = BaseRecommender.new
+      sm.set1.add_set "item1", ["foo", "bar"]
+      sm.set1.add_set "item2", ["nada", "bar"]
+      sm.set2.add_set "item3", ["bar", "other"]
+      Predictor.redis.keys("#{sm.redis_prefix}:*").should_not be_empty
+      sm.clean!
+      Predictor.redis.keys("#{sm.redis_prefix}:*").should be_empty
+    end
+  end
+end
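The scores expected by the "correctly weighs and sums input matrices" example are consistent with summing, over all input matrices, each matrix's weight times the Jaccard index of the two items' set memberships. The following is a minimal in-memory sketch of that arithmetic, not the gem's Redis-backed implementation; `SketchMatrix`, `jaccard`, and `weighted_similarity` are hypothetical names introduced only for illustration.

```ruby
# Hypothetical stand-in for one input matrix: a weight plus a hash of
# set ID => member items. Nothing here talks to Redis.
SketchMatrix = Struct.new(:weight, :sets) do
  # IDs of the sets that contain the given item.
  def sets_for(item)
    sets.select { |_, items| items.include?(item) }.keys
  end
end

# Jaccard index of two arrays treated as sets: |A & B| / |A | B|.
def jaccard(a, b)
  union = (a | b).size
  union.zero? ? 0.0 : (a & b).size.to_f / union
end

# Sum of weight * Jaccard index across all matrices (assumed combination rule).
def weighted_similarity(matrices, item1, item2)
  matrices.sum { |m| m.weight * jaccard(m.sets_for(item1), m.sets_for(item2)) }
end

# Same data as the spec above.
matrices = [
  SketchMatrix.new(1.0, { "user1" => %w[c1 c2 c4], "user2" => %w[c3 c4] }),
  SketchMatrix.new(2.0, { "tag1" => %w[c1 c2 c4], "tag2" => %w[c1 c4] }),
  SketchMatrix.new(4.0, { "topic1" => %w[c1 c4], "topic2" => %w[c2 c3] })
]

puts weighted_similarity(matrices, "c1", "c4") # 1.0*0.5 + 2.0*1.0 + 4.0*1.0 = 6.5
puts weighted_similarity(matrices, "c2", "c3") # only topics overlap: 4.0*1.0 = 4.0
```

For `c1` and `c4`, the users matrix contributes 0.5 (one shared user out of two), while tags and topics each have identical memberships and contribute their full weights, giving 6.5.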
@@ -0,0 +1,251 @@
+require ::File.expand_path('../spec_helper', __FILE__)
+
+describe Predictor::InputMatrix do
+
+  before(:all) do
+    @matrix = Predictor::InputMatrix.new(:redis_prefix => "predictor-test", :key => "mymatrix")
+  end
+
+  before(:each) do
+    flush_redis!
+  end
+
+  it "should build the correct keys" do
+    @matrix.redis_key.should == "predictor-test:mymatrix"
+  end
+
+  it "should respond to add_set" do
+    @matrix.respond_to?(:add_set).should == true
+  end
+
+  it "should respond to add_single" do
+    @matrix.respond_to?(:add_single).should == true
+  end
+
+  it "should respond to similarities_for" do
+    @matrix.respond_to?(:similarities_for).should == true
+  end
+
+  it "should respond to all_items" do
+    @matrix.respond_to?(:all_items).should == true
+  end
+
+  describe "weight" do
+    it "returns the weight configured or a default of 1" do
+      @matrix.weight.should == 1.0 # default weight
+      matrix = Predictor::InputMatrix.new(redis_prefix: "predictor-test", key: "mymatrix", weight: 5.0)
+      matrix.weight.should == 5.0
+    end
+  end
+
+  describe "add_set" do
+    it "adds each member of the set to the 'all_items' set" do
+      @matrix.all_items.should_not include("foo", "bar", "fnord", "blubb")
+      @matrix.add_set "item1", ["foo", "bar", "fnord", "blubb"]
+      @matrix.all_items.should include("foo", "bar", "fnord", "blubb")
+    end
+
+    it "adds each member of the set to the key's 'sets' set" do
+      @matrix.items_for("item1").should_not include("foo", "bar", "fnord", "blubb")
+      @matrix.add_set "item1", ["foo", "bar", "fnord", "blubb"]
+      @matrix.items_for("item1").should include("foo", "bar", "fnord", "blubb")
+    end
+
+    it "adds the key to each set member's 'items' set" do
+      @matrix.sets_for("foo").should_not include("item1")
+      @matrix.sets_for("bar").should_not include("item1")
+      @matrix.sets_for("fnord").should_not include("item1")
+      @matrix.sets_for("blubb").should_not include("item1")
+      @matrix.add_set "item1", ["foo", "bar", "fnord", "blubb"]
+      @matrix.sets_for("foo").should include("item1")
+      @matrix.sets_for("bar").should include("item1")
+      @matrix.sets_for("fnord").should include("item1")
+      @matrix.sets_for("blubb").should include("item1")
+    end
+  end
+
+  describe "add_set!" do
+    it "calls add_set and process_item! for each item" do
+      @matrix.should_receive(:add_set).with("item1", ["foo", "bar"])
+      @matrix.should_receive(:process_item!).with("foo")
+      @matrix.should_receive(:process_item!).with("bar")
+      @matrix.add_set! "item1", ["foo", "bar"]
+    end
+  end
+
+  describe "add_single" do
+    it "adds the item to the 'all_items' set" do
+      @matrix.all_items.should_not include("foo")
+      @matrix.add_single "item1", "foo"
+      @matrix.all_items.should include("foo")
+    end
+
+    it "adds the item to the key's 'sets' set" do
+      @matrix.items_for("item1").should_not include("foo")
+      @matrix.add_single "item1", "foo"
+      @matrix.items_for("item1").should include("foo")
+    end
+
+    it "adds the key to the item's 'items' set" do
+      @matrix.sets_for("foo").should_not include("item1")
+      @matrix.add_single "item1", "foo"
+      @matrix.sets_for("foo").should include("item1")
+    end
+  end
+
+  describe "add_single!" do
+    it "calls add_single and process_item! for the item" do
+      @matrix.should_receive(:add_single).with("item1", "foo")
+      @matrix.should_receive(:process_item!).with("foo")
+      @matrix.add_single! "item1", "foo"
+    end
+  end
+
+  describe "all_items" do
+    it "returns all items across all sets in the input matrix" do
+      @matrix.add_set "item1", ["foo", "bar", "fnord", "blubb"]
+      @matrix.add_set "item2", ["foo", "bar", "snafu", "nada"]
+      @matrix.add_set "item3", ["nada"]
+      @matrix.all_items.should include("foo", "bar", "fnord", "blubb", "snafu", "nada")
+      @matrix.all_items.length.should == 6
+    end
+  end
+
+  describe "items_for" do
+    it "returns the items in the given set ID" do
+      @matrix.add_set "item1", ["foo", "bar", "fnord", "blubb"]
+      @matrix.items_for("item1").should include("foo", "bar", "fnord", "blubb")
+      @matrix.add_set "item2", ["foo", "bar", "snafu", "nada"]
+      @matrix.items_for("item2").should include("foo", "bar", "snafu", "nada")
+      @matrix.items_for("item1").should_not include("snafu", "nada")
+    end
+  end
+
+  describe "sets_for" do
+    it "returns the set IDs the given item is in" do
+      @matrix.add_set "item1", ["foo", "bar", "fnord", "blubb"]
+      @matrix.add_set "item2", ["foo", "bar", "snafu", "nada"]
+      @matrix.sets_for("foo").should include("item1", "item2")
+      @matrix.sets_for("snafu").should == ["item2"]
+    end
+  end
+
+  describe "related_items" do
+    it "returns the items in sets the given item is also in" do
+      @matrix.add_set "item1", ["foo", "bar", "fnord", "blubb"]
+      @matrix.add_set "item2", ["foo", "bar", "snafu", "nada"]
+      @matrix.add_set "item3", ["nada", "other"]
+      @matrix.related_items("bar").should include("foo", "fnord", "blubb", "snafu", "nada")
+      @matrix.related_items("bar").length.should == 5
+      @matrix.related_items("other").should == ["nada"]
+      @matrix.related_items("snafu").should include("foo", "bar", "nada")
+      @matrix.related_items("snafu").length.should == 3
+    end
+  end
+
+  describe "similarity" do
+    it "should calculate the correct similarity between two items" do
+      add_two_item_test_data!(@matrix)
+      @matrix.process!
+      @matrix.similarity("fnord", "blubb").should == 0.4
+      @matrix.similarity("blubb", "fnord").should == 0.4
+    end
+  end
+
+  describe "similarities_for" do
+    it "should calculate all similarities for an item (1/3)" do
+      add_three_item_test_data!(@matrix)
+      @matrix.process!
+      res = @matrix.similarities_for("fnord", with_scores: true)
+      res.length.should == 2
+      res[0].should == ["shmoo", 0.75]
+      res[1].should == ["blubb", 0.4]
+    end
+
+    it "should calculate all similarities for an item (2/3)" do
+      add_three_item_test_data!(@matrix)
+      @matrix.process!
+      res = @matrix.similarities_for("shmoo", with_scores: true)
+      res.length.should == 2
+      res[0].should == ["fnord", 0.75]
+      res[1].should == ["blubb", 0.2]
+    end
+
+
+    it "should calculate all similarities for an item (3/3)" do
+      add_three_item_test_data!(@matrix)
+      @matrix.process!
+      res = @matrix.similarities_for("blubb", with_scores: true)
+      res.length.should == 2
+      res[0].should == ["fnord", 0.4]
+      res[1].should == ["shmoo", 0.2]
+    end
+  end
+
+  describe "delete_item!" do
+    before do
+      @matrix.add_set "item1", ["foo", "bar", "fnord", "blubb"]
+      @matrix.add_set "item2", ["foo", "bar", "snafu", "nada"]
+      @matrix.add_set "item3", ["nada", "other"]
+      @matrix.process!
+    end
+
+    it "should delete the item from sets it is in" do
+      @matrix.items_for("item1").should include("bar")
+      @matrix.items_for("item2").should include("bar")
+      @matrix.sets_for("bar").should include("item1", "item2")
+      @matrix.delete_item!("bar")
+      @matrix.items_for("item1").should_not include("bar")
+      @matrix.items_for("item2").should_not include("bar")
+      @matrix.sets_for("bar").should be_empty
+    end
+
+    it "should delete the cached similarities for the item" do
+      @matrix.similarities_for("bar").should_not be_empty
+      @matrix.delete_item!("bar")
+      @matrix.similarities_for("bar").should be_empty
+    end
+
+    it "should delete the item from other cached similarities" do
+      @matrix.similarities_for("foo").should include("bar")
+      @matrix.delete_item!("bar")
+      @matrix.similarities_for("foo").should_not include("bar")
+    end
+
+    it "should delete the item from the all_items set" do
+      @matrix.all_items.should include("bar")
+      @matrix.delete_item!("bar")
+      @matrix.all_items.should_not include("bar")
+    end
+  end
+
+  it "should calculate the correct jaccard index" do
+    @matrix.add_set "item1", ["foo", "bar", "fnord", "blubb"]
+    @matrix.add_set "item2", ["bar", "fnord", "shmoo", "snafu"]
+    @matrix.add_set "item3", ["bar", "nada", "snafu"]
+
+    @matrix.send(:calculate_jaccard,
+      "bar",
+      "snafu"
+    ).should == 2.0/3.0
+  end
+
+  private
+
+  def add_two_item_test_data!(matrix)
+    matrix.add_set("user42", ["fnord", "blubb"])
+    matrix.add_set("user44", ["blubb"])
+    matrix.add_set("user46", ["fnord"])
+    matrix.add_set("user48", ["fnord", "blubb"])
+    matrix.add_set("user50", ["fnord"])
+  end
+
+  def add_three_item_test_data!(matrix)
+    matrix.add_set("user42", ["fnord", "blubb", "shmoo"])
+    matrix.add_set("user44", ["blubb"])
+    matrix.add_set("user46", ["fnord", "shmoo"])
+    matrix.add_set("user48", ["fnord", "blubb"])
+    matrix.add_set("user50", ["fnord", "shmoo"])
+  end
+
+end
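The `add_set`, `items_for`, `sets_for`, and `similarity` examples in this spec describe a two-way index (set ID to items, item to set IDs) with a Jaccard index computed over the item side. A rough in-memory sketch of that behaviour follows, assuming plain Ruby hashes in place of Redis sets; `SketchIndex` is a hypothetical class, not part of the gem, and it computes similarities on demand rather than caching them via `process!`.

```ruby
# Hypothetical in-memory analogue of an input matrix, for illustration only.
class SketchIndex
  def initialize
    @items_by_set = {} # set ID => items in that set
    @sets_by_item = {} # item   => IDs of the sets containing it
  end

  # Record every item under the set ID, and the set ID under every item.
  def add_set(set_id, items)
    items.each do |item|
      @items_by_set[set_id] = items_for(set_id) | [item]
      @sets_by_item[item]   = sets_for(item) | [set_id]
    end
  end

  def items_for(set_id)
    @items_by_set.fetch(set_id, [])
  end

  def sets_for(item)
    @sets_by_item.fetch(item, [])
  end

  def all_items
    @sets_by_item.keys
  end

  # Jaccard index: sets shared by both items / sets containing either item.
  def similarity(item1, item2)
    a = sets_for(item1)
    b = sets_for(item2)
    union = (a | b).size
    union.zero? ? 0.0 : (a & b).size.to_f / union
  end
end

# Same data as add_three_item_test_data! above.
index = SketchIndex.new
index.add_set("user42", %w[fnord blubb shmoo])
index.add_set("user44", %w[blubb])
index.add_set("user46", %w[fnord shmoo])
index.add_set("user48", %w[fnord blubb])
index.add_set("user50", %w[fnord shmoo])

puts index.similarity("fnord", "shmoo") # 3 shared of 4 total = 0.75
puts index.similarity("fnord", "blubb") # 2 shared of 5 total = 0.4
```

Keeping both directions of the index is what makes the Jaccard computation cheap: each item's set memberships are available directly, so only an intersection and a union are needed.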
@@ -0,0 +1,15 @@
+require ::File.expand_path('../spec_helper', __FILE__)
+
+describe Predictor do
+
+  it "should store a redis connection" do
+    Predictor.redis = "asd"
+    Predictor.redis.should == "asd"
+  end
+
+  it "should raise an exception if unconfigured redis connection is accessed" do
+    Predictor.redis = nil
+    lambda{ Predictor.redis }.should raise_error
+  end
+
+end
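The two examples in this spec imply a module-level connection accessor that stores whatever is assigned and raises when nothing has been configured. One way such an accessor could look is sketched below; `SketchConfig` and `NotConfigured` are hypothetical names, and the gem's own `lib/predictor/predictor.rb` may be written differently.

```ruby
# Hypothetical configuration module, for illustration only.
module SketchConfig
  # Raised when the connection is read before being assigned.
  class NotConfigured < StandardError; end

  class << self
    # SketchConfig.redis = some_connection
    attr_writer :redis

    # Returns the stored connection, or raises if it is nil or unset.
    def redis
      @redis || raise(NotConfigured, "redis not configured; assign SketchConfig.redis first")
    end
  end
end

SketchConfig.redis = "asd"
puts SketchConfig.redis # prints "asd"

SketchConfig.redis = nil
begin
  SketchConfig.redis
rescue SketchConfig::NotConfigured => e
  puts "raised: #{e.message}"
end
```

Raising a dedicated error class lets a spec assert on that class specifically, so the expectation cannot be satisfied by an unrelated error such as a `NameError` from a misspelled constant.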
data/spec/spec_helper.rb
ADDED
@@ -0,0 +1,40 @@
+require "rspec"
+require "redis"
+require "pry"
+
+require ::File.expand_path('../../lib/predictor', __FILE__)
+
+def flush_redis!
+  Predictor.redis = Redis.new
+  Predictor.redis.keys("predictor-test*").each do |k|
+    Predictor.redis.del(k)
+  end
+end
+
+module Predictor::Base
+
+  def redis_prefix
+    "predictor-test"
+  end
+
+end
+
+
+class TestRecommender
+  include Predictor::Base
+
+  input_matrix :jaccard_one
+
+end
+
+class Predictor::TestInputMatrix
+
+  def initialize(opts)
+    @opts = opts
+  end
+
+  def method_missing(method, *args)
+    @opts[method]
+  end
+
+end
metadata
ADDED
@@ -0,0 +1,91 @@
+--- !ruby/object:Gem::Specification
+name: predictor
+version: !ruby/object:Gem::Version
+  version: 1.0.0
+platform: ruby
+authors:
+- Pathgather
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2014-01-17 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: redis
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: 3.0.0
+  type: :runtime
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: 3.0.0
+- !ruby/object:Gem::Dependency
+  name: rspec
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: 2.8.0
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ~>
+      - !ruby/object:Gem::Version
+        version: 2.8.0
+description: Fast and efficient recommendations and predictions using Redis
+email:
+- tech@pathgather.com
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- Gemfile
+- LICENSE
+- README.md
+- Rakefile
+- lib/predictor.rb
+- lib/predictor/base.rb
+- lib/predictor/input_matrix.rb
+- lib/predictor/predictor.rb
+- lib/predictor/version.rb
+- predictor.gemspec
+- spec/base_spec.rb
+- spec/input_matrix_spec.rb
+- spec/predictor_spec.rb
+- spec/spec_helper.rb
+homepage: https://github.com/Pathgather/predictor
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.1.11
+signing_key:
+specification_version: 4
+summary: Fast and efficient recommendations and predictions using Redis
+test_files:
+- spec/base_spec.rb
+- spec/input_matrix_spec.rb
+- spec/predictor_spec.rb
+- spec/spec_helper.rb
+has_rdoc: