predictor 1.0.0 → 2.0.0.rc1
- checksums.yaml +4 -4
- data/Changelog.md +13 -0
- data/Gemfile +1 -1
- data/README.md +75 -32
- data/docs/READMEv1.md +206 -0
- data/lib/predictor/base.rb +128 -60
- data/lib/predictor/input_matrix.rb +29 -82
- data/lib/predictor/version.rb +1 -1
- data/spec/base_spec.rb +160 -94
- data/spec/input_matrix_spec.rb +30 -160
- data/spec/predictor_spec.rb +1 -1
- metadata +6 -4
checksums.yaml CHANGED

````diff
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 29f61606a156b3a6132dc9212f3d027492285d2e
+  data.tar.gz: acf97ff88f34ca518536e18b7753fe27e69956ec
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: cd25db9f133ee47f44703e8e6cab7d0daed59b9a035abfac75d733d6891d0c312981f76875b0278a9b66285c3f374650b01fe4466f45b51c0d42685dafae4783
+  data.tar.gz: 4b01b5f70cdb3d4de6267e72502b50e8c8c7210f3d13855477b9d1dd9a4556b57787bca2827ddc7ac3060c915f032a2b32a50b5feb13709b9ecd5a86167c092c
````
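The SHA1/SHA512 entries in checksums.yaml are hex digests of the gem's two inner archives. As a minimal sketch (the `archive_digests` helper below is illustrative, not part of the gem), the same values can be recomputed with Ruby's stdlib Digest:

```ruby
require "digest"

# Illustrative helper (not part of the gem): recompute the digests recorded
# in checksums.yaml for one of a gem's inner archives.
def archive_digests(path)
  data = File.binread(path)
  {
    "SHA1"   => Digest::SHA1.hexdigest(data),   # 40 hex chars
    "SHA512" => Digest::SHA512.hexdigest(data)  # 128 hex chars
  }
end
```

Comparing the returned values against the `metadata.gz` and `data.tar.gz` entries above verifies a downloaded copy of the gem.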
data/Changelog.md ADDED

Predictor Changelog
=========

2.0.0 (2014-03-07)
---------------------
**Rewrite of 1.0.0 and contains several breaking changes!**

Version 1.0.0 (which really should have been 0.0.1) contained several issues that made compatibility with v2 not worth the trouble. These include:
* In v1, similarities were cached per input_matrix, and Predictor::Base utilized those caches when determining similarities and predictions. This quickly ate up Redis memory with even a semi-large dataset, as each input_matrix had a significant memory requirement. v2 caches similarities at the root (Predictor::Base), which means you can add any number of input matrices with little impact on memory usage.
* Added the ability to limit the number of items stored in the similarity cache (via the 'limit_similarities_to' option). Now that similarities are cached at the root, this is possible and can greatly help memory usage.
* Removed bang methods from input_matrix (add_set!, add_single!, etc.). These previously called process! for you, but since the cache is no longer kept at the input_matrix level, process! has to be called at the root (Predictor::Base).
* Bug fix: a call to delete_item! on an input matrix didn't update the similarity cache.
* Other minor fixes.
data/Gemfile
CHANGED
data/README.md CHANGED

`````diff
@@ -2,7 +2,7 @@
 Predictor
 =========
 
-Fast and efficient recommendations and predictions using Ruby & Redis.
+Fast and efficient recommendations and predictions using Ruby & Redis. Developed by and used at [Pathgather](http://pathgather.com) to generate course similarities and content recommendations to users.
 
 ![](https://www.codeship.io/projects/5aeeedf0-6053-0131-2319-5ede98f174ff/status)
 
@@ -13,6 +13,10 @@ Originally forked and based on [Recommendify](https://github.com/paulasmuth/reco
 
 At the moment, Predictor uses the [Jaccard index](http://en.wikipedia.org/wiki/Jaccard_index) to determine similarities between items. There are other ways to do this, which we intend to implement eventually, but if you want to beat us to the punch, pull requests are quite welcome :)
 
+Notice
+---------------------
+This is the readme for Predictor 2.0, which contains a few breaking changes from 1.0. The 1.0 readme can be found [here](https://github.com/Pathgather/predictor/blob/master/docs/READMEv1.md). See below on how to upgrade to 2.0.
+
 Installation
 ---------------------
 ```ruby
@@ -29,9 +33,10 @@ First step is to configure Predictor with your Redis instance.
 # in config/initializers/predictor.rb
 Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"])
 
-# Or, to improve performance, add hiredis as your driver (you'll need to install the hiredis gem first
+# Or, to improve performance, add hiredis as your driver (you'll need to install the hiredis gem first)
 Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"], :driver => :hiredis)
 ```
+
 Inputting Data
 ---------------------
 Create a class and include the Predictor::Base module. Define an input_matrix for each relationship you'd like to keep track of. This can be anything you think is a significant metric for the item: page views, purchases, categories the item belongs to, etc.
@@ -51,6 +56,7 @@ Below, we're building a recommender to recommend courses based off of:
 class CourseRecommender
   include Predictor::Base
 
+  limit_similarities_to 500 # Optional, but if specified, Predictor only caches the top x similarities for an item at any given time. Can greatly help with efficient use of Redis memory.
   input_matrix :users, weight: 3.0
   input_matrix :tags, weight: 2.0
   input_matrix :topics, weight: 1.0
@@ -62,37 +68,21 @@ Now, we just need to update our matrices when courses are created, users take a
 recommender = CourseRecommender.new
 
 # Add a single course to topic-1's items. If topic-1 already exists as a set ID, this just adds course-1 to the set
-recommender.topics.add_single!("topic-1", "course-1")
-
-# If your matrix is quite large, add_single! could take some time, as it must calculate the similarity scores
-# for course-1 across all other courses. If this is the case, use add_single and process the item at a more
-# convenient time, perhaps in a background job
-recommender.topics.add_single("topic-1", "course-1")
-recommender.topics.process_item!("course-1")
-
-# Add an array of courses to tag-1. Again, these will simply be added to tag-1's existing set, if it exists.
-# If not, the tag-1 set will be initialized with course-1 and course-2
-recommender.tags.add_set!("tag-1", ["course-1", "course-2"])
+recommender.add_to_matrix!(:topics, "topic-1", "course-1")
 
-# Or, just add the set and process whenever you like
-recommender.tags.add_set("tag-1", ["course-1", "course-2"])
-["course-1", "course-2"].each { |course| recommender.topics.process_item!(course) }
+# If your dataset is even remotely large, add_to_matrix! could take some time, as it must calculate the similarity scores
+# for course-1 and other courses that share a set with course-1. If this is the case, use add_to_matrix and
+# process the items at a more convenient time, perhaps in a background job
+recommender.topics.add_to_set("topic-1", "course-1", "course-2") # Same as recommender.add_to_matrix(:topics, "topic-1", "course-1", "course-2")
+recommender.process_items!("course-1", "course-2")
 ```
 
-As noted above, it's important to remember that if you don't use the bang methods (add_set! and add_single!), you'll need to manually update your similarities (the bang methods will likely suffice for most use cases though). You can do so in a variety of ways.
-* If you want to simply update the similarities for a single item in a specific matrix:
-````
-recommender.matrix.process_item!(item)
-````
-* If you want to update the similarities for all items in a specific matrix:
-````
-recommender.matrix.process!
-````
-* If you want to update the similarities for a single item in all matrices:
+As noted above, it's important to remember that if you don't use the bang method 'add_to_matrix!', you'll need to manually update your similarities. If your dataset is even remotely large, you'll probably want to do this:
+* If you want to update the similarities for certain item(s):
 ````
-recommender.process_item!(item)
+recommender.process_items!(item1, item2, etc)
 ````
-* If you want to update all similarities in all matrices:
+* If you want to update all similarities for all items:
 ````
 recommender.process!
 ````
@@ -100,6 +90,9 @@ As noted above, it's important to remember that if you don't use the bang method
 Retrieving Similarities and Recommendations
 ---------------------
 Now that your matrices have been initialized with several relationships, you can start generating similarities and recommendations! First, let's start with similarities, which will use the weights we specify on each matrix to determine which courses share the most in common with a given course.
+
+![Course Alternative](http://pathgather.github.io/predictor/images/course-alts.png)
+
 ```ruby
 recommender = CourseRecommender.new
 
@@ -117,6 +110,9 @@ recommender.similarities_for("course-1", exclusion_set: ["course-2"])
 ```
 
 The above examples are great for situations like "Users that viewed this also liked ...", but what if you wanted to recommend courses to a user based on the courses they've already taken? Not a problem!
+
+![Course Recommendations](http://pathgather.github.io/predictor/images/suggested.png)
+
 ```ruby
 recommender = CourseRecommender.new
 
@@ -129,18 +125,18 @@ recommender.predictions_for("user-1", matrix_label: :users)
 # Paginate too!
 recommender.predictions_for("user-1", matrix_label: :users, offset: 10, limit: 10)
 
-# Gimme some scores and ignore user-2....that user-2 is one sketchy fella
-recommender.predictions_for("user-1", matrix_label: :users, with_scores: true, exclusion_set: ["user-2"])
+# Gimme some scores and ignore course-2....that course-2 is one sketchy fella
+recommender.predictions_for("user-1", matrix_label: :users, with_scores: true, exclusion_set: ["course-2"])
 ```
 
 Deleting Items
 ---------------------
-If your data is deleted from your persistent storage, you certainly don't want to recommend that data to a user. To ensure that doesn't happen, simply call delete_item! on the individual matrix or the recommender as a whole:
+If your data is deleted from your persistent storage, you certainly don't want to recommend it to a user. To ensure that doesn't happen, simply call delete_from_matrix! with the individual matrix or delete_item! if the item is completely gone:
 ```ruby
 recommender = CourseRecommender.new
 
 # User removed course-1 from topic-1, but course-1 still exists
-recommender.topics.delete_item!("course-1")
+recommender.delete_from_matrix!(:topics, "course-1")
 
 # course-1 was permanently deleted
 recommender.delete_item!("course-1")
@@ -149,6 +145,53 @@ recommender.delete_item!("course-1")
 recommender.clean!
 ```
 
+Limiting Similarities
+---------------------
+By default, Predictor caches all similarities for all items, with no limit. That means if you have 10,000 items, and each item is somehow related to the other, we'll have 10,000 sets each with 9,999 items. That's going to use Redis' memory quite quickly. To limit this, specify the limit_similarities_to option.
+```ruby
+class CourseRecommender
+  include Predictor::Base
+
+  limit_similarities_to 500
+  input_matrix :users, weight: 3.0
+  input_matrix :tags, weight: 2.0
+  input_matrix :topics, weight: 1.0
+end
+```
+
+This can really save a ton of memory. Just remember though, predictions fetched with the predictions_for call utilize the similarity caches, so if you're using predictions_for, make sure you set the limit high enough that intelligent predictions can be generated. If you aren't using predictions and are just using similarities, then feel free to set this to the maximum number of similarities you'd possibly want to show!
+
+Upgrading from 1.0 to 2.0
+---------------------
+As mentioned, 2.0.0 is quite a bit different from 1.0.0, so simply upgrading with no changes likely won't work. My apologies for this. I promise this won't happen in future releases, as I'm much more confident in this Predictor release than the last. Anywho, upgrading really shouldn't be that much of a pain if you follow these steps:
+
+* Change predictor.matrix.add_set! and predictor.matrix.add_single! calls to predictor.add_to_matrix!. For example:
+```ruby
+# Change
+predictor.topics.add_single!("topic-1", "course-1")
+# to
+predictor.add_to_matrix!(:topics, "topic-1", "course-1")
+
+# Change
+predictor.tags.add_set!("tag-1", ["course-1", "course-2"])
+# to
+predictor.add_to_matrix!(:tags, "tag-1", "course-1", "course-2")
+```
+* Change predictor.matrix.process! or predictor.matrix.process_item! calls to just predictor.process! or predictor.process_items!
+```ruby
+# Change
+predictor.topics.process_item!("course-1")
+# to
+predictor.process_items!("course-1")
+```
+* Change predictor.matrix.delete_item! calls to predictor.delete_from_matrix!. This will update similarities too, so you may want to queue this to run in a background job.
+```ruby
+# Change
+predictor.topics.delete_item!("course-1")
+# to delete_from_matrix! if you want to update similarities to account for the deleted item (in v1, this was a bug and didn't occur)
+predictor.delete_from_matrix!(:topics, "course-1")
+```
+
 Problems? Issues? Want to help out?
 ---------------------
 Just submit a GitHub issue or pull request! We'd love to have you help out, as the most common library to use for this need, Recommendify, was last updated 2 years ago. We'll be sure to keep this maintained, but we could certainly use your help!
`````
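Both versions of the README note that Predictor scores similarity with the Jaccard index. A minimal plain-Ruby sketch of that formula (Predictor itself computes it against Redis sets; `jaccard_index` here is an illustrative helper, not part of the gem's API):

```ruby
require "set"

# Illustrative helper, not the gem's API: Jaccard index of two membership
# lists, i.e. |A ∩ B| / |A ∪ B|.
def jaccard_index(a, b)
  a = a.to_set
  b = b.to_set
  union = (a | b).size
  return 0.0 if union.zero?
  (a & b).size.to_f / union
end

# Two courses taken by overlapping users share 2 of 4 distinct users:
jaccard_index(%w[user-1 user-2 user-3], %w[user-2 user-3 user-4]) # => 0.5
```

Each input matrix produces one such score, which is then multiplied by the matrix's weight and summed across matrices to give the cached similarity.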
data/docs/READMEv1.md ADDED

Predictor
=========

Fast and efficient recommendations and predictions using Ruby & Redis. Used in production over at [Pathgather](http://pathgather.com) to generate course similarities and content recommendations to users.

![](https://www.codeship.io/projects/5aeeedf0-6053-0131-2319-5ede98f174ff/status)

Originally forked and based on [Recommendify](https://github.com/paulasmuth/recommendify) by Paul Asmuth, so a huge thanks to him for his contributions to Recommendify. Predictor has been almost completely rewritten to
* Be much, much more performant and efficient by using Redis for most logic.
* Provide item similarities such as "Users that read this book also read ..."
* Provide personalized predictions based on a user's past history, such as "You read these 10 books, so you might also like to read ..."

At the moment, Predictor uses the [Jaccard index](http://en.wikipedia.org/wiki/Jaccard_index) to determine similarities between items. There are other ways to do this, which we intend to implement eventually, but if you want to beat us to the punch, pull requests are quite welcome :)

Installation
---------------------
```ruby
gem install predictor
```
or in your Gemfile:
```
gem 'predictor'
```
Getting Started
---------------------
First step is to configure Predictor with your Redis instance.
```ruby
# in config/initializers/predictor.rb
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"])

# Or, to improve performance, add hiredis as your driver (you'll need to install the hiredis gem first)
Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"], :driver => :hiredis)
```
Inputting Data
---------------------
Create a class and include the Predictor::Base module. Define an input_matrix for each relationship you'd like to keep track of. This can be anything you think is a significant metric for the item: page views, purchases, categories the item belongs to, etc.

Below, we're building a recommender to recommend courses based off of:
* Users that have taken a course. If 2 courses were taken by the same user, this is 3 times as important to us as the courses sharing the same topic. This will lead to sets like:
  * "user1" -> "course-1", "course-3"
  * "user2" -> "course-1", "course-4"
* Tags and their courses. This will lead to sets like:
  * "rails" -> "course-1", "course-2"
  * "microeconomics" -> "course-3", "course-4"
* Topics and their courses. This will lead to sets like:
  * "computer science" -> "course-1", "course-2"
  * "economics and finance" -> "course-3", "course-4"

```ruby
class CourseRecommender
  include Predictor::Base

  input_matrix :users, weight: 3.0
  input_matrix :tags, weight: 2.0
  input_matrix :topics, weight: 1.0
end
```

Now, we just need to update our matrices when courses are created, users take a course, topics are changed, etc:
```ruby
recommender = CourseRecommender.new

# Add a single course to topic-1's items. If topic-1 already exists as a set ID, this just adds course-1 to the set
recommender.topics.add_single!("topic-1", "course-1")

# If your matrix is quite large, add_single! could take some time, as it must calculate the similarity scores
# for course-1 across all other courses. If this is the case, use add_single and process the item at a more
# convenient time, perhaps in a background job
recommender.topics.add_single("topic-1", "course-1")
recommender.topics.process_item!("course-1")

# Add an array of courses to tag-1. Again, these will simply be added to tag-1's existing set, if it exists.
# If not, the tag-1 set will be initialized with course-1 and course-2
recommender.tags.add_set!("tag-1", ["course-1", "course-2"])

# Or, just add the set and process whenever you like
recommender.tags.add_set("tag-1", ["course-1", "course-2"])
["course-1", "course-2"].each { |course| recommender.topics.process_item!(course) }
```

As noted above, it's important to remember that if you don't use the bang methods (add_set! and add_single!), you'll need to manually update your similarities (the bang methods will likely suffice for most use cases though). You can do so in a variety of ways.
* If you want to simply update the similarities for a single item in a specific matrix:
```
recommender.matrix.process_item!(item)
```
* If you want to update the similarities for all items in a specific matrix:
```
recommender.matrix.process!
```
* If you want to update the similarities for a single item in all matrices:
```
recommender.process_item!(item)
```
* If you want to update all similarities in all matrices:
```
recommender.process!
```

Retrieving Similarities and Recommendations
---------------------
Now that your matrices have been initialized with several relationships, you can start generating similarities and recommendations! First, let's start with similarities, which will use the weights we specify on each matrix to determine which courses share the most in common with a given course.

![Course Alternative](http://pathgather.github.io/predictor/images/course-alts.png)

```ruby
recommender = CourseRecommender.new

# Return all similarities for course-1 (ordered by most similar to least).
recommender.similarities_for("course-1")

# Need to paginate? Not a problem! Specify an offset and a limit
recommender.similarities_for("course-1", offset: 10, limit: 10) # Gets similarities 11-20

# Want scores?
recommender.similarities_for("course-1", with_scores: true)

# Want to ignore a certain set of courses in similarities?
recommender.similarities_for("course-1", exclusion_set: ["course-2"])
```

The above examples are great for situations like "Users that viewed this also liked ...", but what if you wanted to recommend courses to a user based on the courses they've already taken? Not a problem!

![Course Recommendations](http://pathgather.github.io/predictor/images/suggested.png)

```ruby
recommender = CourseRecommender.new

# User has taken course-1 and course-2. Let's see what else they might like...
recommender.predictions_for(item_set: ["course-1", "course-2"])

# Already have the set you need stored in an input matrix? In our case, we do (the users matrix stores the courses a user has taken), so we can just do:
recommender.predictions_for("user-1", matrix_label: :users)

# Paginate too!
recommender.predictions_for("user-1", matrix_label: :users, offset: 10, limit: 10)

# Gimme some scores and ignore user-2....that user-2 is one sketchy fella
recommender.predictions_for("user-1", matrix_label: :users, with_scores: true, exclusion_set: ["user-2"])
```

Deleting Items
---------------------
If your data is deleted from your persistent storage, you certainly don't want to recommend that data to a user. To ensure that doesn't happen, simply call delete_item! on the individual matrix or the recommender as a whole:
```ruby
recommender = CourseRecommender.new

# User removed course-1 from topic-1, but course-1 still exists
recommender.topics.delete_item!("course-1")

# course-1 was permanently deleted
recommender.delete_item!("course-1")

# Something crazy has happened, so let's just start fresh and wipe out all previously stored similarities:
recommender.clean!
```

Memory Management
---------------------
Predictor works by caching the similarities for each item in each matrix, then computing overall similarities off those caches. With even a semi-large dataset, this can really eat up Redis's memory. To limit the number of similarities cached in each matrix, specify a similarity_limit option when defining the matrix.
```ruby
class CourseRecommender
  include Predictor::Base

  input_matrix :users, weight: 3.0, similarity_limit: 300
  input_matrix :tags, weight: 2.0, similarity_limit: 300
  input_matrix :topics, weight: 1.0, similarity_limit: 300
end
```

This will ensure that only the top 300 similarities for each item are cached in each matrix. This can greatly reduce your memory usage, and if you're just using Predictor for scenarios where you maybe show the top 5 or so similar items, then this can be hugely helpful. But note, **don't set similarity_limit to 5 in that case**. This simply limits the similarities cached in each matrix, but does not limit the similarities for an item across all matrices. That is computed (and can be limited) on the fly, and uses the similarity cache in each matrix. So, you need a large enough cache in each matrix to determine an intelligent similarity list across all matrices.

*Note*: This is a bit of a hack, and there are most certainly other ways to improve Predictor's memory usage for large datasets, but each appears to require a more significant change than the trivial implementation of similarity_limit above. PRs that experiment with these other ways are quite welcome :)

Oh, and if you decide to tinker with your limit to try and find a sweet spot, I added a helpful method to ensure limits are obeyed, to avoid regenerating all similarities. Of course, this only helps if you are decreasing the limit. If you're increasing it, you'll need to process similarities all over.
```ruby
recommender.users.ensure_similarity_limit_is_obeyed! # Remove similarities that disobey our current limit
recommender.tags.ensure_similarity_limit_is_obeyed!
recommender.topics.ensure_similarity_limit_is_obeyed!
```

Problems? Issues? Want to help out?
---------------------
Just submit a GitHub issue or pull request! We'd love to have you help out, as the most common library to use for this need, Recommendify, was last updated 2 years ago. We'll be sure to keep this maintained, but we could certainly use your help!

The MIT License (MIT)
---------------------
Copyright (c) 2014 Pathgather

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
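The prediction step described in both READMEs (union the similarity lists of everything in the user's set, sum the scores, drop items already in the set) can be sketched in plain Ruby. In the gem this happens inside Redis via ZUNIONSTORE and ZREM; `predict_from_sims` below is a hypothetical stand-in, not the gem's API:

```ruby
# Hypothetical stand-in for the Redis-backed predictions_for: `sims` maps each
# item to a hash of { similar_item => score }. Candidates accumulate the
# similarity scores of every item the user already has; items the user
# already has are then excluded, and the rest are sorted best-first.
def predict_from_sims(item_set, sims)
  scores = Hash.new(0.0)
  item_set.each do |item|
    (sims[item] || {}).each { |other, score| scores[other] += score }
  end
  scores.reject { |item, _| item_set.include?(item) } # drop already-seen items
        .sort_by { |_, score| -score }
        .map(&:first)
end

sims = {
  "course-1" => { "course-2" => 1.0, "course-3" => 0.5 },
  "course-2" => { "course-1" => 1.0, "course-3" => 2.0 }
}
predict_from_sims(["course-1", "course-2"], sims) # => ["course-3"]
```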
data/lib/predictor/base.rb CHANGED

````diff
@@ -9,6 +9,14 @@ module Predictor::Base
     @matrices[key] = opts
   end
 
+  def limit_similarities_to(val)
+    @similarity_limit = val
+  end
+
+  def similarity_limit
+    @similarity_limit
+  end
+
   def input_matrices=(val)
     @matrices = val
   end
@@ -29,6 +37,10 @@ module Predictor::Base
     "predictor"
   end
 
+  def similarity_limit
+    self.class.similarity_limit
+  end
+
   def redis_key(*append)
     ([redis_prefix] + append).flatten.compact.join(":")
   end
@@ -46,70 +58,60 @@ module Predictor::Base
   end
 
   def all_items
-    Predictor.redis.
+    Predictor.redis.smembers(redis_key(:all_items))
   end
 
-  def
-    if
-
-
-
-
-
-
-
+  def add_to_matrix(matrix, set, *items)
+    items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax
+    input_matrices[matrix].add_to_set(set, *items)
+  end
+
+  def add_to_matrix!(matrix, set, *items)
+    items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax
+    add_to_matrix(matrix, set, *items)
+    process_items!(*items)
+  end
+
+  def related_items(item)
+    keys = []
+    input_matrices.each do |key, matrix|
+      sets = Predictor.redis.smembers(matrix.redis_key(:sets, item))
+      keys.concat(sets.map { |set| matrix.redis_key(:items, set) })
     end
+
+    keys.empty? ? [] : (Predictor.redis.sunion(keys) - [item])
   end
 
-  def predictions_for(
-    fail "item_set or matrix_label and
-    redis = Predictor.redis
+  def predictions_for(set=nil, item_set: nil, matrix_label: nil, with_scores: false, offset: 0, limit: -1, exclusion_set: [])
+    fail "item_set or matrix_label and set is required" unless item_set || (matrix_label && set)
 
     if matrix_label
       matrix = input_matrices[matrix_label]
-      item_set = redis.smembers(matrix.redis_key(:items,
-    end
-
-    item_keys = item_set.map
-
-
-
-
-
-
-
-    unless item_keys.empty?
-      predictions = nil
-      redis.multi do |multi|
-        multi.zunionstore 'temp', item_keys, weights: item_weights
-        multi.zrem 'temp', item_set
-        multi.zrem 'temp', exclusion_set if exclusion_set.length > 0
-        predictions = multi.zrevrange 'temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores
-        multi.del 'temp'
-      end
-      return predictions.value
-    else
-      return []
+      item_set = Predictor.redis.smembers(matrix.redis_key(:items, set))
+    end
+
+    item_keys = item_set.map { |item| redis_key(:similarities, item) }
+    return [] if item_keys.empty?
+    predictions = nil
+    Predictor.redis.multi do |multi|
+      multi.zunionstore 'temp', item_keys
+      multi.zrem 'temp', item_set
+      multi.zrem 'temp', exclusion_set if exclusion_set.length > 0
+      predictions = multi.zrevrange 'temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores
+      multi.del 'temp'
     end
+    predictions.value
   end
 
   def similarities_for(item, with_scores: false, offset: 0, limit: -1, exclusion_set: [])
-    keys = input_matrices.map{ |k,m| m.redis_key(:similarities, item) }
-    weights = input_matrices.map{ |k,m| m.weight }
     neighbors = nil
-
-
-
-
-
-        multi.del 'temp'
-      end
-      return neighbors.value
-    else
-      return []
+    Predictor.redis.multi do |multi|
+      multi.zunionstore 'temp', [1, redis_key(:similarities, item)]
+      multi.zrem 'temp', exclusion_set if exclusion_set.length > 0
+      neighbors = multi.zrevrange('temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores)
+      multi.del 'temp'
     end
+    return neighbors.value
   end
 
   def sets_for(item)
@@ -117,32 +119,98 @@ module Predictor::Base
     Predictor.redis.sunion keys
   end
 
-  def
-
-
+  def process_item!(item)
+    process_items!(item) # Old method
+  end
+
+  def process_items!(*items)
+    items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax
+    items.each do |item|
+      related_items(item).each{ |related_item| cache_similarity(item, related_item) }
     end
     return self
   end
 
-  def
-
-      m.process_item!(item)
-    end
+  def process!
+    process_items!(*all_items)
     return self
   end
 
-  def
+  def delete_from_matrix!(matrix, item)
+    # Deleting from a specific matrix, so get related_items, delete, then update the similarity of those related_items
+    items = related_items(item)
+    input_matrices[matrix].delete_item(item)
+    items.each { |related_item| cache_similarity(item, related_item) }
+    return self
+  end
+
+  def delete_item!(item)
+    Predictor.redis.srem(redis_key(:all_items), item)
+    Predictor.redis.watch(redis_key(:similarities, item)) do
+      items = related_items(item)
+      Predictor.redis.multi do |multi|
+        items.each do |related_item|
+          multi.zrem(redis_key(:similarities, related_item), item)
+        end
+        multi.del redis_key(:similarities, item)
+      end
+    end
+
     input_matrices.each do |k,m|
-      m.delete_item
+      m.delete_item(item)
     end
     return self
   end
 
   def clean!
-    # now only flushes the keys for the instantiated recommender
     keys = Predictor.redis.keys("#{self.redis_prefix}:*")
     unless keys.empty?
      Predictor.redis.del(keys)
     end
   end
+
+  def ensure_similarity_limit_is_obeyed!
+    if similarity_limit
+      items = all_items
+      Predictor.redis.multi do |multi|
+        items.each do |item|
+          multi.zremrangebyrank(redis_key(:similarities, item), 0, -(similarity_limit))
+        end
+      end
+    end
+  end
+
+  private
+
+  def cache_similarity(item1, item2)
+    score = 0
+    input_matrices.each do |key, matrix|
+      score += (matrix.calculate_jaccard(item1, item2) * matrix.weight)
+    end
+    if score > 0
+      add_similarity_if_necessary(item1, item2, score)
+      add_similarity_if_necessary(item2, item1, score)
+    else
+      Predictor.redis.multi do |multi|
+        multi.zrem(redis_key(:similarities, item1), item2)
+        multi.zrem(redis_key(:similarities, item2), item1)
+      end
+    end
+  end
+
+  def add_similarity_if_necessary(item, similarity, score)
+    store = true
+    key = redis_key(:similarities, item)
+    if similarity_limit
+      if Predictor.redis.zrank(key, similarity).nil? && Predictor.redis.zcard(key) >= similarity_limit
+        # Similarity is not already stored and we are at limit of similarities
+        lowest_scored_item = Predictor.redis.zrangebyscore(key, "0", "+inf", limit: [0, 1], with_scores: true)
+        unless lowest_scored_item.empty?
+          # If score is less than or equal to the lowest score, don't store it. Otherwise, make room by removing the lowest scored similarity
+          score <= lowest_scored_item[0][1] ? store = false : Predictor.redis.zrem(key, lowest_scored_item[0][0])
+        end
+      end
+    end
+    Predictor.redis.zadd(key, score, similarity) if store
+  end
 end
````

(Several removed v1 lines were truncated in the source diff and are shown here as the fragments that survive.)
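The capped similarity cache in add_similarity_if_necessary follows one rule: once the limit is reached, a new similarity is stored only if its score beats the current lowest-scored entry, which it evicts. A plain-Ruby sketch of that rule, with a Hash standing in for the Redis sorted set (`add_with_limit` is an illustrative helper, not part of the gem):

```ruby
# Illustrative helper mirroring add_similarity_if_necessary, using a plain
# Hash of { item => score } instead of a Redis sorted set. Returns a new
# cache rather than mutating the input.
def add_with_limit(cache, item, score, limit)
  # Already present, or room left: just store/update the score.
  return cache.merge(item => score) if cache.key?(item) || cache.size < limit
  lowest_item, lowest_score = cache.min_by { |_, s| s }
  return cache if score <= lowest_score # too weak to displace anything
  cache.reject { |k, _| k == lowest_item }.merge(item => score)
end

cache = { "course-2" => 3.0, "course-3" => 1.0 }
add_with_limit(cache, "course-4", 2.0, 2) # => {"course-2"=>3.0, "course-4"=>2.0}
add_with_limit(cache, "course-5", 0.5, 2) # returns cache unchanged (0.5 <= 1.0)
```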