predictor 1.0.0 → 2.0.0.rc1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 23c921a064f6dcb321d1948e051545298616f3b7
- data.tar.gz: 501a6132f7ea81fa316faf5a5f14a7b0d28afcdb
+ metadata.gz: 29f61606a156b3a6132dc9212f3d027492285d2e
+ data.tar.gz: acf97ff88f34ca518536e18b7753fe27e69956ec
  SHA512:
- metadata.gz: 0b14e50f6df801912204a8a312124eba2e44888e8c5b8b1e8f3fe51808b0b3cad2e794b55633fb747cb60189d782312864c49c5105d03de5e1992f240a3528d8
- data.tar.gz: 850b7d299e0f3ce4352fb3cd3423e96d598f83a72b5a0cf2d0cf27734f25e8951f2834cb40444931820721dce74bb4f029a82081765abb4069fa25d9a31fa04c
+ metadata.gz: cd25db9f133ee47f44703e8e6cab7d0daed59b9a035abfac75d733d6891d0c312981f76875b0278a9b66285c3f374650b01fe4466f45b51c0d42685dafae4783
+ data.tar.gz: 4b01b5f70cdb3d4de6267e72502b50e8c8c7210f3d13855477b9d1dd9a4556b57787bca2827ddc7ac3060c915f032a2b32a50b5feb13709b9ecd5a86167c092c
@@ -0,0 +1,13 @@
+ =======
+ Predictor Changelog
+ =========
+ 2.0.0 (2014-03-07)
+ ---------------------
+ **Rewrite of 1.0.0 and contains several breaking changes!**
+
+ Version 1.0.0 (which really should have been 0.0.1) contained several issues that made compatibility with v2 not worth the trouble. These include:
+ * In v1, similarities were cached per input_matrix, and Predictor::Base utilized those caches when determining similarities and predictions. This quickly ate up Redis memory with even a semi-large dataset, as each input_matrix had a significant memory requirement. v2 caches similarities at the root (Predictor::Base), which means you can add any number of input matrices with little impact on memory usage.
+ * Added the ability to limit the number of items stored in the similarity cache (via the 'limit_similarities_to' option). Now that similarities are cached at the root, this is possible and can greatly help memory usage.
+ * Removed the bang methods from input_matrix (add_set!, add_single!, etc.). These previously called process! for you, but since the cache is no longer kept at the input_matrix level, process! has to be called at the root (Predictor::Base).
+ * Bug fix: a call to delete_item! on an input matrix now updates the similarity cache.
+ * Other minor fixes.
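The 'limit_similarities_to' option mentioned above caps how many similarities are kept per item, evicting the lowest-scored entry once the cap is hit. A minimal pure-Ruby sketch of that eviction policy (`CappedSimilarityCache` is a hypothetical name for illustration; the gem itself keeps these caches in Redis sorted sets):

```ruby
# Illustrative only: a plain-Ruby stand-in for the capped per-item similarity
# cache that limit_similarities_to enables. Not the gem's implementation.
class CappedSimilarityCache
  def initialize(limit)
    @limit = limit
    @cache = Hash.new { |h, k| h[k] = {} } # item => { other_item => score }
  end

  # Store a similarity score; once over the cap, evict the lowest-scored entry
  def add(item, other, score)
    sims = @cache[item]
    sims[other] = score
    if sims.size > @limit
      lowest = sims.min_by { |_, s| s }[0]
      sims.delete(lowest)
    end
  end

  # Similarities for an item, highest score first
  def similarities_for(item)
    @cache[item].sort_by { |_, s| -s }.map(&:first)
  end
end

cache = CappedSimilarityCache.new(2)
cache.add("course-1", "course-2", 0.9)
cache.add("course-1", "course-3", 0.5)
cache.add("course-1", "course-4", 0.7)
cache.similarities_for("course-1") # => ["course-2", "course-4"]
```

The trade-off is the one the changelog describes: a bounded cache keeps Redis memory flat no matter how many items are related, at the cost of forgetting weakly similar items.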
data/Gemfile CHANGED
@@ -1,4 +1,4 @@
- source :rubygems
+ source 'https://rubygems.org'
 
  gem "redis"
 
data/README.md CHANGED
@@ -2,7 +2,7 @@
  Predictor
  =========
 
- Fast and efficient recommendations and predictions using Ruby & Redis. Used in production over at [Pathgather](http://pathgather.com) to recommend content to users.
+ Fast and efficient recommendations and predictions using Ruby & Redis. Developed by and used at [Pathgather](http://pathgather.com) to generate course similarities and content recommendations for users.
 
  ![](https://www.codeship.io/projects/5aeeedf0-6053-0131-2319-5ede98f174ff/status)
 
@@ -13,6 +13,10 @@ Originally forked and based on [Recommendify](https://github.com/paulasmuth/reco
 
  At the moment, Predictor uses the [Jaccard index](http://en.wikipedia.org/wiki/Jaccard_index) to determine similarities between items. There are other ways to do this, which we intend to implement eventually, but if you want to beat us to the punch, pull requests are quite welcome :)
 
+ Notice
+ ---------------------
+ This is the readme for Predictor 2.0, which contains a few breaking changes from 1.0. The 1.0 readme can be found [here](https://github.com/Pathgather/predictor/blob/master/docs/READMEv1.md). See below for how to upgrade to 2.0.
+
  Installation
  ---------------------
  ```ruby
@@ -29,9 +33,10 @@ First step is to configure Predictor with your Redis instance.
  # in config/initializers/predictor.rb
  Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"])
 
- # Or, to improve performance, add hiredis as your driver (you'll need to install the hiredis gem first
+ # Or, to improve performance, add hiredis as your driver (you'll need to install the hiredis gem first)
  Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"], :driver => :hiredis)
  ```
+
  Inputting Data
  ---------------------
  Create a class and include the Predictor::Base module. Define an input_matrix for each relationship you'd like to keep track of. This can be anything you think is a significant metric for the item: page views, purchases, categories the item belongs to, etc.
@@ -51,6 +56,7 @@ Below, we're building a recommender to recommend courses based off of:
  class CourseRecommender
    include Predictor::Base
 
+   limit_similarities_to 500 # Optional; if specified, Predictor only caches the top 500 similarities for an item at any given time. This can greatly help with efficient use of Redis memory
    input_matrix :users, weight: 3.0
    input_matrix :tags, weight: 2.0
    input_matrix :topics, weight: 1.0
@@ -62,37 +68,21 @@ Now, we just need to update our matrices when courses are created, users take a
  recommender = CourseRecommender.new
 
  # Add a single course to topic-1's items. If topic-1 already exists as a set ID, this just adds course-1 to the set
- recommender.topics.add_single!("topic-1", "course-1")
-
- # If your matrix is quite large, add_single! could take some time, as it must calculate the similarity scores
- # for course-1 across all other courses. If this is the case, use add_single and process the item at a more
- # convenient time, perhaps in a background job
- recommender.topics.add_single("topic-1", "course-1")
- recommender.topics.process_item!("course-1")
-
- # Add an array of courses to tag-1. Again, these will simply be added to tag-1's existing set, if it exists.
- # If not, the tag-1 set will be initialized with course-1 and course-2
- recommender.tags.add_set!("tag-1", ["course-1", "course-2"])
+ recommender.add_to_matrix!(:topics, "topic-1", "course-1")
 
- # Or, just add the set and process whenever you like
- recommender.tags.add_set("tag-1", ["course-1", "course-2"])
- ["course-1", "course-2"].each { |course| recommender.topics.process_item!(course) }
+ # If your dataset is even remotely large, add_to_matrix! could take some time, as it must calculate the similarity scores
+ # for course-1 and the other courses that share a set with course-1. If this is the case, use add_to_matrix and
+ # process the items at a more convenient time, perhaps in a background job
+ recommender.topics.add_to_set("topic-1", "course-1", "course-2") # Same as recommender.add_to_matrix(:topics, "topic-1", "course-1", "course-2")
+ recommender.process_items!("course-1", "course-2")
  ```
 
- As noted above, it's important to remember that if you don't use the bang methods (add_set! and add_single!), you'll need to manually update your similarities (the bang methods will likely suffice for most use cases though). You can do so a variety of ways.
- * If you want to simply update the similarities for a single item in a specific matrix:
- ````
- recommender.matrix.process_item!(item)
- ````
- * If you want to update the similarities for all items in a specific matrix:
- ````
- recommender.matrix.process!
- ````
- * If you want to update the similarities for a single item in all matrices:
+ As noted above, it's important to remember that if you don't use the bang method 'add_to_matrix!', you'll need to manually update your similarities, likely in a background job if your dataset is even remotely large. You can do so in two ways:
+ * If you want to update the similarities for certain item(s):
  ````
- recommender.process_item!(item)
+ recommender.process_items!(item1, item2, etc)
  ````
- * If you want to update all similarities in all matrices:
+ * If you want to update all similarities for all items:
  ````
  recommender.process!
  ````
@@ -100,6 +90,9 @@ As noted above, it's important to remember that if you don't use the bang method
  Retrieving Similarities and Recommendations
  ---------------------
  Now that your matrices have been initialized with several relationships, you can start generating similarities and recommendations! First, let's start with similarities, which will use the weights we specify on each matrix to determine which courses share the most in common with a given course.
+
+ ![Course Alternative](http://pathgather.github.io/predictor/images/course-alts.png)
+
  ```ruby
  recommender = CourseRecommender.new
 
@@ -117,6 +110,9 @@ recommender.similarities_for("course-1", exclusion_set: ["course-2"])
  ```
 
  The above examples are great for situations like "Users that viewed this also liked ...", but what if you wanted to recommend courses to a user based on the courses they've already taken? Not a problem!
+
+ ![Course Recommendations](http://pathgather.github.io/predictor/images/suggested.png)
+
  ```ruby
  recommender = CourseRecommender.new
 
@@ -129,18 +125,18 @@ recommender.predictions_for("user-1", matrix_label: :users)
  # Paginate too!
  recommender.predictions_for("user-1", matrix_label: :users, offset: 10, limit: 10)
 
- # Gimme some scores and ignore user-2....that user-2 is one sketchy fella
- recommender.predictions_for("user-1", matrix_label: :users, with_scores: true, exclusion_set: ["user-2"])
+ # Gimme some scores and ignore course-2....that course-2 is one sketchy fella
+ recommender.predictions_for("user-1", matrix_label: :users, with_scores: true, exclusion_set: ["course-2"])
  ```
 
  Deleting Items
  ---------------------
- If your data is deleted from your persistent storage, you certainly don't want to recommend that data to a user. To ensure that doesn't happen, simply call delete_item! on the individual matrix or recommender as a whole:
+ If your data is deleted from your persistent storage, you certainly don't want to recommend it to a user. To ensure that doesn't happen, call delete_from_matrix! to remove the item from an individual matrix, or delete_item! if the item is completely gone:
  ```ruby
  recommender = CourseRecommender.new
 
  # User removed course-1 from topic-1, but course-1 still exists
- recommender.topics.delete_item!("course-1")
+ recommender.delete_from_matrix!(:topics, "course-1")
 
  # course-1 was permanently deleted
  recommender.delete_item!("course-1")
@@ -149,6 +145,53 @@ recommender.delete_item!("course-1")
  recommender.clean!
  ```
 
+ Limiting Similarities
+ ---------------------
+ By default, Predictor caches all similarities for all items, with no limit. That means if you have 10,000 items and each item is somehow related to every other, we'll have 10,000 sets, each with 9,999 items. That will eat up Redis's memory quite quickly. To limit this, specify the limit_similarities_to option.
+ ```ruby
+ class CourseRecommender
+   include Predictor::Base
+
+   limit_similarities_to 500
+   input_matrix :users, weight: 3.0
+   input_matrix :tags, weight: 2.0
+   input_matrix :topics, weight: 1.0
+ end
+ ```
+
+ This can really save a ton of memory. Just remember, though, that predictions fetched with the predictions_for call utilize the similarity caches, so if you're using predictions_for, make sure you set the limit high enough that intelligent predictions can be generated. If you aren't using predictions and are just using similarities, then feel free to set this to the maximum number of similarities you'd possibly want to show!
+
+ Upgrading from 1.0 to 2.0
+ ---------------------
+ As mentioned, 2.0.0 is quite a bit different from 1.0.0, so simply upgrading with no changes likely won't work. My apologies for this. I promise this won't happen in future releases, as I'm much more confident in this Predictor release than the last. Anywho, upgrading really shouldn't be that much of a pain if you follow these steps:
+
+ * Change predictor.matrix.add_set! and predictor.matrix.add_single! calls to predictor.add_to_matrix!. For example:
+ ```ruby
+ # Change
+ predictor.topics.add_single!("topic-1", "course-1")
+ # to
+ predictor.add_to_matrix!(:topics, "topic-1", "course-1")
+
+ # Change
+ predictor.tags.add_set!("tag-1", ["course-1", "course-2"])
+ # to
+ predictor.add_to_matrix!(:tags, "tag-1", "course-1", "course-2")
+ ```
+ * Change predictor.matrix.process! or predictor.matrix.process_item! calls to just predictor.process! or predictor.process_items!:
+ ```ruby
+ # Change
+ predictor.topics.process_item!("course-1")
+ # to
+ predictor.process_items!("course-1")
+ ```
+ * Change predictor.matrix.delete_item! calls to predictor.delete_from_matrix!. This will update similarities too, so you may want to queue it to run in a background job.
+ ```ruby
+ # Change
+ predictor.topics.delete_item!("course-1")
+ # to delete_from_matrix! if you want to update similarities to account for the deleted item (in v1, this was a bug and didn't occur)
+ predictor.delete_from_matrix!(:topics, "course-1")
+ ```
+
  Problems? Issues? Want to help out?
  ---------------------
  Just submit a GitHub issue or pull request! We'd love to have you help out, as the most common library to use for this need, Recommendify, was last updated 2 years ago. We'll be sure to keep this maintained, but we could certainly use your help!
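The README above notes that Predictor scores item similarity with the Jaccard index. As a quick pure-Ruby refresher (not the gem's internal code, which computes this inside Redis): the Jaccard index of two sets is the size of their intersection divided by the size of their union.

```ruby
require 'set'

# Jaccard index: |A ∩ B| / |A ∪ B|, from 0.0 (disjoint) to 1.0 (identical)
def jaccard(a, b)
  a = Set.new(a)
  b = Set.new(b)
  union = a | b
  return 0.0 if union.empty?
  (a & b).size.to_f / union.size
end

# Two courses sharing one of the three users between them score 1/3
jaccard(["user-1", "user-2"], ["user-2", "user-3"]) # => ~0.33
```

In the CourseRecommender example, this score would be computed per matrix (users, tags, topics) and combined using each matrix's weight.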
@@ -0,0 +1,206 @@
+ =======
+ Predictor
+ =========
+
+ Fast and efficient recommendations and predictions using Ruby & Redis. Used in production over at [Pathgather](http://pathgather.com) to generate course similarities and content recommendations for users.
+
+ ![](https://www.codeship.io/projects/5aeeedf0-6053-0131-2319-5ede98f174ff/status)
+
+ Originally forked and based on [Recommendify](https://github.com/paulasmuth/recommendify) by Paul Asmuth, so a huge thanks to him for his contributions to Recommendify. Predictor has been almost completely rewritten to
+ * Be much, much more performant and efficient by using Redis for most logic.
+ * Provide item similarities such as "Users that read this book also read ..."
+ * Provide personalized predictions based on a user's past history, such as "You read these 10 books, so you might also like to read ..."
+
+ At the moment, Predictor uses the [Jaccard index](http://en.wikipedia.org/wiki/Jaccard_index) to determine similarities between items. There are other ways to do this, which we intend to implement eventually, but if you want to beat us to the punch, pull requests are quite welcome :)
+
+ Installation
+ ---------------------
+ ```ruby
+ gem install predictor
+ ```
+ or in your Gemfile:
+ ```ruby
+ gem 'predictor'
+ ```
+ Getting Started
+ ---------------------
+ First step is to configure Predictor with your Redis instance.
+ ```ruby
+ # in config/initializers/predictor.rb
+ Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"])
+
+ # Or, to improve performance, add hiredis as your driver (you'll need to install the hiredis gem first)
+ Predictor.redis = Redis.new(:url => ENV["PREDICTOR_REDIS"], :driver => :hiredis)
+ ```
+ Inputting Data
+ ---------------------
+ Create a class and include the Predictor::Base module. Define an input_matrix for each relationship you'd like to keep track of. This can be anything you think is a significant metric for the item: page views, purchases, categories the item belongs to, etc.
+
+ Below, we're building a recommender to recommend courses based off of:
+ * Users that have taken a course. If 2 courses were taken by the same user, this is 3 times as important to us as the courses sharing the same topic. This will lead to sets like:
+   * "user1" -> "course-1", "course-3",
+   * "user2" -> "course-1", "course-4"
+ * Tags and their courses. This will lead to sets like:
+   * "rails" -> "course-1", "course-2",
+   * "microeconomics" -> "course-3", "course-4"
+ * Topics and their courses. This will lead to sets like:
+   * "computer science" -> "course-1", "course-2",
+   * "economics and finance" -> "course-3", "course-4"
+
+ ```ruby
+ class CourseRecommender
+   include Predictor::Base
+
+   input_matrix :users, weight: 3.0
+   input_matrix :tags, weight: 2.0
+   input_matrix :topics, weight: 1.0
+ end
+ ```
+
+ Now, we just need to update our matrices when courses are created, users take a course, topics are changed, etc:
+ ```ruby
+ recommender = CourseRecommender.new
+
+ # Add a single course to topic-1's items. If topic-1 already exists as a set ID, this just adds course-1 to the set
+ recommender.topics.add_single!("topic-1", "course-1")
+
+ # If your matrix is quite large, add_single! could take some time, as it must calculate the similarity scores
+ # for course-1 across all other courses. If this is the case, use add_single and process the item at a more
+ # convenient time, perhaps in a background job
+ recommender.topics.add_single("topic-1", "course-1")
+ recommender.topics.process_item!("course-1")
+
+ # Add an array of courses to tag-1. Again, these will simply be added to tag-1's existing set, if it exists.
+ # If not, the tag-1 set will be initialized with course-1 and course-2
+ recommender.tags.add_set!("tag-1", ["course-1", "course-2"])
+
+ # Or, just add the set and process whenever you like
+ recommender.tags.add_set("tag-1", ["course-1", "course-2"])
+ ["course-1", "course-2"].each { |course| recommender.tags.process_item!(course) }
+ ```
+
+ As noted above, it's important to remember that if you don't use the bang methods (add_set! and add_single!), you'll need to manually update your similarities (the bang methods will likely suffice for most use cases though). You can do so in a variety of ways:
+ * If you want to simply update the similarities for a single item in a specific matrix:
+ ````
+ recommender.matrix.process_item!(item)
+ ````
+ * If you want to update the similarities for all items in a specific matrix:
+ ````
+ recommender.matrix.process!
+ ````
+ * If you want to update the similarities for a single item in all matrices:
+ ````
+ recommender.process_item!(item)
+ ````
+ * If you want to update all similarities in all matrices:
+ ````
+ recommender.process!
+ ````
+
+ Retrieving Similarities and Recommendations
+ ---------------------
+ Now that your matrices have been initialized with several relationships, you can start generating similarities and recommendations! First, let's start with similarities, which will use the weights we specify on each matrix to determine which courses share the most in common with a given course.
+
+ ![Course Alternative](http://pathgather.github.io/predictor/images/course-alts.png)
+
+ ```ruby
+ recommender = CourseRecommender.new
+
+ # Return all similarities for course-1 (ordered by most similar to least).
+ recommender.similarities_for("course-1")
+
+ # Need to paginate? Not a problem! Specify an offset and a limit
+ recommender.similarities_for("course-1", offset: 10, limit: 10) # Gets similarities 11-20
+
+ # Want scores?
+ recommender.similarities_for("course-1", with_scores: true)
+
+ # Want to ignore a certain set of courses in similarities?
+ recommender.similarities_for("course-1", exclusion_set: ["course-2"])
+ ```
+
+ The above examples are great for situations like "Users that viewed this also liked ...", but what if you wanted to recommend courses to a user based on the courses they've already taken? Not a problem!
+
+ ![Course Recommendations](http://pathgather.github.io/predictor/images/suggested.png)
+
+ ```ruby
+ recommender = CourseRecommender.new
+
+ # User has taken course-1 and course-2. Let's see what else they might like...
+ recommender.predictions_for(item_set: ["course-1", "course-2"])
+
+ # Already have the set you need stored in an input matrix? In our case, we do (the users matrix stores the courses a user has taken), so we can just do:
+ recommender.predictions_for("user-1", matrix_label: :users)
+
+ # Paginate too!
+ recommender.predictions_for("user-1", matrix_label: :users, offset: 10, limit: 10)
+
+ # Gimme some scores and ignore user-2....that user-2 is one sketchy fella
+ recommender.predictions_for("user-1", matrix_label: :users, with_scores: true, exclusion_set: ["user-2"])
+ ```
+
+ Deleting Items
+ ---------------------
+ If your data is deleted from your persistent storage, you certainly don't want to recommend that data to a user. To ensure that doesn't happen, simply call delete_item! on the individual matrix or recommender as a whole:
+ ```ruby
+ recommender = CourseRecommender.new
+
+ # User removed course-1 from topic-1, but course-1 still exists
+ recommender.topics.delete_item!("course-1")
+
+ # course-1 was permanently deleted
+ recommender.delete_item!("course-1")
+
+ # Something crazy has happened, so let's just start fresh and wipe out all previously stored similarities:
+ recommender.clean!
+ ```
+
+ Memory Management
+ ---------------------
+ Predictor works by caching the similarities for each item in each matrix, then computing overall similarities off those caches. With an even semi-large dataset, this can really eat up Redis's memory. To limit the number of similarities cached in each matrix, specify a similarity_limit option when defining the matrix.
+ ```ruby
+ class CourseRecommender
+   include Predictor::Base
+
+   input_matrix :users, weight: 3.0, similarity_limit: 300
+   input_matrix :tags, weight: 2.0, similarity_limit: 300
+   input_matrix :topics, weight: 1.0, similarity_limit: 300
+ end
+ ```
+
+ This will ensure that only the top 300 similarities for each item are cached in each matrix. This can greatly reduce your memory usage, and if you're just using Predictor for scenarios where you show maybe the top 5 or so similar items, it can be hugely helpful. But note, **don't set similarity_limit to 5 in that case**. This simply limits the similarities cached in each matrix, but does not limit the similarities for an item across all matrices. That is computed (and can be limited) on the fly, and uses the similarity cache in each matrix. So, you need a large enough cache in each matrix to determine an intelligent similarity list across all matrices.
+
+ *Note*: This is a bit of a hack, and there are most certainly other ways to improve Predictor's memory usage for large datasets, but each appears to require a more significant change than the trivial implementation of similarity_limit above. PRs that experiment with these other ways are quite welcome :)
+
+ Oh, and if you decide to tinker with your limit to try to find a sweet spot, I added a helpful method to ensure limits are obeyed without regenerating all similarities. Of course, this only helps if you are decreasing the limit. If you're increasing it, you'll need to process similarities all over again.
+ ```ruby
+ recommender.users.ensure_similarity_limit_is_obeyed! # Remove similarities that disobey our current limit
+ recommender.tags.ensure_similarity_limit_is_obeyed!
+ recommender.topics.ensure_similarity_limit_is_obeyed!
+ ```
+
+ Problems? Issues? Want to help out?
+ ---------------------
+ Just submit a GitHub issue or pull request! We'd love to have you help out, as the most common library to use for this need, Recommendify, was last updated 2 years ago. We'll be sure to keep this maintained, but we could certainly use your help!
+
+ The MIT License (MIT)
+ ---------------------
+ Copyright (c) 2014 Pathgather
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
+ this software and associated documentation files (the "Software"), to deal in
+ the Software without restriction, including without limitation the rights to
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+ the Software, and to permit persons to whom the Software is furnished to do so,
+ subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+ FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+ COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+ IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
@@ -9,6 +9,14 @@ module Predictor::Base
    @matrices[key] = opts
  end
 
+ def limit_similarities_to(val)
+   @similarity_limit = val
+ end
+
+ def similarity_limit
+   @similarity_limit
+ end
+
  def input_matrices=(val)
    @matrices = val
  end
@@ -29,6 +37,10 @@ module Predictor::Base
    "predictor"
  end
 
+ def similarity_limit
+   self.class.similarity_limit
+ end
+
  def redis_key(*append)
    ([redis_prefix] + append).flatten.compact.join(":")
  end
@@ -46,70 +58,60 @@ module Predictor::Base
  end
 
  def all_items
-   Predictor.redis.sunion input_matrices.map{|k,m| m.redis_key(:all_items)}
+   Predictor.redis.smembers(redis_key(:all_items))
  end
 
- def item_score(item, normalize)
-   if normalize
-     similarities = similarities_for(item, with_scores: true)
-     unless similarities.empty?
-       similarities.map{|x,y| y}.reduce(:+)
-     else
-       1
-     end
-   else
-     1
+ def add_to_matrix(matrix, set, *items)
+   items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax
+   input_matrices[matrix].add_to_set(set, *items)
+ end
+
+ def add_to_matrix!(matrix, set, *items)
+   items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax
+   add_to_matrix(matrix, set, *items)
+   process_items!(*items)
+ end
+
+ def related_items(item)
+   keys = []
+   input_matrices.each do |key, matrix|
+     sets = Predictor.redis.smembers(matrix.redis_key(:sets, item))
+     keys.concat(sets.map { |set| matrix.redis_key(:items, set) })
    end
+
+   keys.empty? ? [] : (Predictor.redis.sunion(keys) - [item])
  end
 
- def predictions_for(set_id=nil, item_set: nil, matrix_label: nil, with_scores: false, normalize: true, offset: 0, limit: -1, exclusion_set: [])
-   fail "item_set or matrix_label and set_id is required" unless item_set || (matrix_label && set_id)
-   redis = Predictor.redis
+ def predictions_for(set=nil, item_set: nil, matrix_label: nil, with_scores: false, offset: 0, limit: -1, exclusion_set: [])
+   fail "item_set or matrix_label and set is required" unless item_set || (matrix_label && set)
 
    if matrix_label
      matrix = input_matrices[matrix_label]
-     item_set = redis.smembers(matrix.redis_key(:items, set_id))
-   end
-
-   item_keys = item_set.map do |item|
-     input_matrices.map{ |k,m| m.redis_key(:similarities, item) }
-   end.flatten
-
-   item_weights = item_set.map do |item|
-     score = item_score(item, normalize)
-     input_matrices.map{|k, m| m.weight/score }
-   end.flatten
-
-   unless item_keys.empty?
-     predictions = nil
-     redis.multi do |multi|
-       multi.zunionstore 'temp', item_keys, weights: item_weights
-       multi.zrem 'temp', item_set
-       multi.zrem 'temp', exclusion_set if exclusion_set.length > 0
-       predictions = multi.zrevrange 'temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores
-       multi.del 'temp'
-     end
-     return predictions.value
-   else
-     return []
+     item_set = Predictor.redis.smembers(matrix.redis_key(:items, set))
+   end
+
+   item_keys = item_set.map { |item| redis_key(:similarities, item) }
+   return [] if item_keys.empty?
+   predictions = nil
+   Predictor.redis.multi do |multi|
+     multi.zunionstore 'temp', item_keys
+     multi.zrem 'temp', item_set
+     multi.zrem 'temp', exclusion_set if exclusion_set.length > 0
+     predictions = multi.zrevrange 'temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores
+     multi.del 'temp'
    end
+   predictions.value
  end
 
  def similarities_for(item, with_scores: false, offset: 0, limit: -1, exclusion_set: [])
-   keys = input_matrices.map{ |k,m| m.redis_key(:similarities, item) }
-   weights = input_matrices.map{ |k,m| m.weight }
    neighbors = nil
-   unless keys.empty?
-     Predictor.redis.multi do |multi|
-       multi.zunionstore 'temp', keys, weights: weights
-       multi.zrem 'temp', exclusion_set if exclusion_set.length > 0
-       neighbors = multi.zrevrange('temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores)
-       multi.del 'temp'
-     end
-     return neighbors.value
-   else
-     return []
+   Predictor.redis.multi do |multi|
+     multi.zunionstore 'temp', [1, redis_key(:similarities, item)]
+     multi.zrem 'temp', exclusion_set if exclusion_set.length > 0
+     neighbors = multi.zrevrange('temp', offset, limit == -1 ? limit : offset + (limit - 1), with_scores: with_scores)
+     multi.del 'temp'
    end
+   return neighbors.value
  end
 
  def sets_for(item)
@@ -117,32 +119,98 @@ module Predictor::Base
    Predictor.redis.sunion keys
  end
 
- def process!
-   input_matrices.each do |k,m|
-     m.process!
+ def process_item!(item)
+   process_items!(item) # Old method
+ end
+
+ def process_items!(*items)
+   items = items.flatten if items.count == 1 && items[0].is_a?(Array) # Old syntax
+   items.each do |item|
+     related_items(item).each{ |related_item| cache_similarity(item, related_item) }
    end
    return self
  end
 
- def process_item!(item)
-   input_matrices.each do |k,m|
-     m.process_item!(item)
-   end
+ def process!
+   process_items!(*all_items)
    return self
  end
 
- def delete_item!(item_id)
+ def delete_from_matrix!(matrix, item)
+   # Deleting from a specific matrix, so get related_items, delete, then update the similarity of those related_items
+   items = related_items(item)
+   input_matrices[matrix].delete_item(item)
+   items.each { |related_item| cache_similarity(item, related_item) }
+   return self
+ end
+
+ def delete_item!(item)
+   Predictor.redis.srem(redis_key(:all_items), item)
+   Predictor.redis.watch(redis_key(:similarities, item)) do
+     items = related_items(item)
+     Predictor.redis.multi do |multi|
+       items.each do |related_item|
+         multi.zrem(redis_key(:similarities, related_item), item)
+       end
+       multi.del redis_key(:similarities, item)
+     end
+   end
+
    input_matrices.each do |k,m|
-     m.delete_item!(item_id)
+     m.delete_item(item)
    end
    return self
  end
 
  def clean!
-   # now only flushes the keys for the instantiated recommender
    keys = Predictor.redis.keys("#{self.redis_prefix}:*")
    unless keys.empty?
      Predictor.redis.del(keys)
    end
  end
+
+ def ensure_similarity_limit_is_obeyed!
+   if similarity_limit
+     items = all_items
+     Predictor.redis.multi do |multi|
+       items.each do |item|
+         multi.zremrangebyrank(redis_key(:similarities, item), 0, -(similarity_limit))
+       end
+     end
+   end
+ end
+
+ private
+
+ def cache_similarity(item1, item2)
+   score = 0
+   input_matrices.each do |key, matrix|
+     score += (matrix.calculate_jaccard(item1, item2) * matrix.weight)
+   end
+   if score > 0
+     add_similarity_if_necessary(item1, item2, score)
+     add_similarity_if_necessary(item2, item1, score)
+   else
+     Predictor.redis.multi do |multi|
+       multi.zrem(redis_key(:similarities, item1), item2)
+       multi.zrem(redis_key(:similarities, item2), item1)
+     end
+   end
+ end
+
+ def add_similarity_if_necessary(item, similarity, score)
+   store = true
+   key = redis_key(:similarities, item)
+   if similarity_limit
+     if Predictor.redis.zrank(key, similarity).nil? && Predictor.redis.zcard(key) >= similarity_limit
+       # Similarity is not already stored and we are at the limit of similarities
+       lowest_scored_item = Predictor.redis.zrangebyscore(key, "0", "+inf", limit: [0, 1], with_scores: true)
+       unless lowest_scored_item.empty?
+         # If score is less than or equal to the lowest score, don't store it. Otherwise, make room by removing the lowest-scored similarity
+         score <= lowest_scored_item[0][1] ? store = false : Predictor.redis.zrem(key, lowest_scored_item[0][0])
+       end
+     end
+   end
+   Predictor.redis.zadd(key, score, similarity) if store
+ end
  end
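The cache_similarity method in the diff above combines the per-matrix Jaccard scores into one weighted score. A pure-Ruby sketch of that combination, with a plain Hash-of-Sets standing in for the Redis-backed matrices (`weighted_similarity` and the `matrices` structure here are illustrative names, not the gem's API):

```ruby
require 'set'

# Jaccard index over two Sets: |A ∩ B| / |A ∪ B|
def jaccard(a, b)
  union = a | b
  union.empty? ? 0.0 : (a & b).size.to_f / union.size
end

# Overall similarity of two items: the weighted sum of their Jaccard
# indices across each input matrix (mirrors cache_similarity above).
def weighted_similarity(matrices, item1, item2)
  matrices.sum do |m|
    jaccard(m[:sets][item1] || Set.new, m[:sets][item2] || Set.new) * m[:weight]
  end
end

matrices = [
  { weight: 3.0, sets: { "course-1" => Set["user-1", "user-2"], "course-2" => Set["user-2"] } },
  { weight: 1.0, sets: { "course-1" => Set["topic-1"], "course-2" => Set["topic-1"] } }
]
weighted_similarity(matrices, "course-1", "course-2") # users: 0.5 * 3.0 + topics: 1.0 * 1.0 = 2.5
```

This is why heavily weighted matrices (users, weight 3.0) dominate the similarity ordering: a modest overlap in users outscores a perfect overlap in topics.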