predictor 2.0.0 → 2.1.0
- checksums.yaml +4 -4
- data/Changelog.md +6 -1
- data/README.md +18 -4
- data/lib/predictor/base.rb +11 -3
- data/lib/predictor/version.rb +1 -1
- data/spec/base_spec.rb +37 -4
- metadata +2 -2
checksums.yaml
CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ce936c35fc81b3f16b757d6e98a90e0a61c83c36
+  data.tar.gz: 839d3b0b314273ef857fd88a67f3a6e0b8d2c45f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 14e5aacf724794effdaf7e6f092e6b19be17b30885aefe2f71d1414607bf243006142346678aac70fecef0b794cc0e14cb80c71f385242b090f2e2f5328d3ec2
+  data.tar.gz: 7f8477ed02993b787b52e74805952f75eb9f45cab316aa4d191e64ffc89dce1f5a66f6c1aa68dea3457c4d5ce2fb85e311b32ea3d53193494138f052e2911f8d
```
data/Changelog.md
CHANGED

```diff
@@ -1,6 +1,11 @@
 =======
 Predictor Changelog
 =========
+
+2.1.0 (2014-06-19)
+---------------------
+* The similarity limit now defaults to 128, instead of being unlimited. This is intended to save space in Redis. See the Readme for more information. It is strongly recommended that you run `ensure_similarity_limit_is_obeyed!` to shrink existing similarity sets.
+
 2.0.0 (2014-04-17)
 ---------------------
 **Rewrite of 1.0.0 and contains several breaking changes!**
@@ -10,4 +15,4 @@ Version 1.0.0 (which really should have been 0.0.1) contained several issues tha
 * Added the ability to limit the number of items stored in the similarity cache (via the 'limit_similarities_to' option). Now that similarities are cached at the root, this is possible and can greatly help memory usage.
 * Removed bang methods from input_matrix (add_set!, and_single!, etc). These called process! for you previously, but since the cache is no longer kept at the input_matrix level, process! has to be called at the root (Recommender::Base)
 * Bug fix: Fixed bug where a call to delete_item! on the input matrix didn't update the similarity cache.
-* Other minor fixes.
+* Other minor fixes.
```
data/README.md
CHANGED

````diff
@@ -53,7 +53,6 @@ Below, we're building a recommender to recommend courses based off of:
 class CourseRecommender
   include Predictor::Base
 
-  limit_similarities_to 500 # Optional, but if specified, Predictor only caches the top x similarities for an item at any given time. Can greatly help with efficient use of Redis memory
   input_matrix :users, weight: 3.0
   input_matrix :tags, weight: 2.0
   input_matrix :topics, weight: 1.0, measure: :sorensen_coefficient # Use Sorenson over Jaccard
@@ -144,7 +143,8 @@ recommender.clean!
 
 Limiting Similarities
 ---------------------
-By default, Predictor caches
+By default, Predictor caches 128 similarities for each item. This is because this is the maximum size for the similarity sorted sets to be kept in a [memory-efficient format](http://redis.io/topics/memory-optimization). If you want to keep more similarities than that, and you don't mind using more memory, you may want to increase the similarity limit, like so:
+
 ```ruby
 class CourseRecommender
   include Predictor::Base
@@ -156,7 +156,21 @@ class CourseRecommender
 end
 ```
 
-
+The memory penalty can be heavy, though. In our testing, similarity caches for 1,000 objects varied in size like so:
+
+```
+limit_similarities_to(128) # 8.5 MB (this is the default)
+limit_similarities_to(129) # 22.74 MB
+limit_similarities_to(500) # 76.72 MB
+```
+
+If you decide you need to store more than 128 similarities, you may want to see the Redis documentation linked above and consider increasing `zset-max-ziplist-entries` in your configuration.
+
+Predictions fetched with the predictions_for call utilizes the similarity caches, so if you're using predictions_for, make sure you set the limit high enough so that intelligent predictions can be generated. If you aren't using predictions and are just using similarities, then feel free to set this to the maximum number of similarities you'd possibly want to show!
+
+You can also use `limit_similarities_to(nil)` to remove the limit entirely. This means if you have 10,000 items, and each item is somehow related to the other, you'll have 10,000 sets each with 9,999 items, which will run up your Redis bill quite quickly. Removing the limit is not recommended unless you're sure you know what you're doing.
+
+If at some point you decide to lower your similarity limits, you'll want to be sure to shrink the size of the sorted sets already in Redis. You can do this with `CourseRecommender.new.ensure_similarity_limit_is_obeyed!`.
 
 Upgrading from 1.0 to 2.0
 ---------------------
@@ -213,4 +227,4 @@ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
 FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
 COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
 IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
-CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
````
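The README's pointer to `zset-max-ziplist-entries` corresponds to a plain redis.conf directive (under the pre-Redis-7 naming current for this gem's era). A sketch of what raising it might look like — the values here are illustrative assumptions, not the gem's recommendation; see the Redis memory-optimization page linked above for the trade-offs:

```
# redis.conf -- illustrative values only
zset-max-ziplist-entries 512   # sorted sets with up to 512 entries stay ziplist-encoded
zset-max-ziplist-value 64      # ...provided each member is at most 64 bytes long
```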
data/lib/predictor/base.rb
CHANGED

```diff
@@ -10,11 +10,17 @@ module Predictor::Base
   end
 
   def limit_similarities_to(val)
-    @similarity_limit = val
+    @similarity_limit_set = true
+    @similarity_limit = val
   end
 
   def similarity_limit
-    @similarity_limit
+    @similarity_limit_set ? @similarity_limit : 128
+  end
+
+  def reset_similarity_limit!
+    @similarity_limit_set = nil
+    @similarity_limit = nil
   end
 
   def input_matrices=(val)
@@ -174,7 +180,9 @@ module Predictor::Base
     items = all_items
     Predictor.redis.multi do |multi|
       items.each do |item|
-
+        key = redis_key(:similarities, item)
+        multi.zremrangebyrank(key, 0, -(similarity_limit + 1))
+        multi.zunionstore key, [key] # Rewrite zset to take advantage of ziplist implementation.
       end
     end
   end
```
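The new `@similarity_limit_set` flag exists because `nil` is now a meaningful value (no limit), so a plain `@similarity_limit || 128` would silently re-enable the default whenever the limit was explicitly removed. A minimal standalone illustration of that tri-state pattern (my own sketch, not the gem's code):

```ruby
# Tri-state setting: unset (default applies), set to a value, or set to nil
# (meaning "no limit"). A simple `@limit || 128` could not distinguish
# "never set" from "explicitly set to nil".
class LimitConfig
  DEFAULT = 128

  def limit=(val)
    @set = true
    @limit = val
  end

  def limit
    @set ? @limit : DEFAULT
  end

  def reset!
    @set = nil
    @limit = nil
  end
end

c = LimitConfig.new
c.limit        # => 128 (default, nothing set yet)
c.limit = 500
c.limit        # => 500
c.limit = nil
c.limit        # => nil, not 128 -- the limit was explicitly removed
c.reset!
c.limit        # => 128 again
```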
data/lib/predictor/version.rb
CHANGED
data/spec/base_spec.rb
CHANGED

```diff
@@ -8,7 +8,7 @@ describe Predictor::Base do
   before(:each) do
     flush_redis!
     BaseRecommender.input_matrices = {}
-    BaseRecommender.
+    BaseRecommender.reset_similarity_limit!
   end
 
   describe "configuration" do
@@ -17,9 +17,18 @@ describe Predictor::Base do
       BaseRecommender.input_matrices.keys.should == [:myinput]
     end
 
-    it "should
-      BaseRecommender.
-
+    it "should default the similarity_limit to 128" do
+      BaseRecommender.similarity_limit.should == 128
+    end
+
+    it "should allow the similarity limit to be configured" do
+      BaseRecommender.limit_similarities_to(500)
+      BaseRecommender.similarity_limit.should == 500
+    end
+
+    it "should allow the similarity limit to be removed" do
+      BaseRecommender.limit_similarities_to(nil)
+      BaseRecommender.similarity_limit.should == nil
     end
 
     it "should retrieve an input_matrix on a new instance" do
@@ -287,4 +296,28 @@ describe Predictor::Base do
       Predictor.redis.keys("#{sm.redis_prefix}:*").should be_empty
     end
   end
+
+  describe "ensure_similarity_limit_is_obeyed!" do
+    it "should shorten similarities to the given limit and rewrite the zset" do
+      BaseRecommender.limit_similarities_to(nil)
+
+      BaseRecommender.input_matrix(:myfirstinput)
+      sm = BaseRecommender.new
+      sm.myfirstinput.add_to_set *(['set1'] + 130.times.map{|i| "item#{i}"})
+      sm.similarities_for('item2').should be_empty
+      sm.process_items!('item2')
+      sm.similarities_for('item2').length.should == 129
+
+      redis = Predictor.redis
+      key = sm.redis_key(:similarities, 'item2')
+      redis.zcard(key).should == 129
+      redis.object(:encoding, key).should == 'skiplist' # Inefficient
+
+      BaseRecommender.reset_similarity_limit!
+      sm.ensure_similarity_limit_is_obeyed!
+
+      redis.zcard(key).should == 128
+      redis.object(:encoding, key).should == 'ziplist' # Efficient
+    end
+  end
 end
```
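The trimming this spec exercises comes down to Redis's rank arithmetic: `ZREMRANGEBYRANK key 0 -(limit+1)` removes everything except the `limit` highest-scored members. A plain-Ruby model of that behavior (my own sketch for illustration, not code from the gem or from redis-rb):

```ruby
# Plain-Ruby model of ZREMRANGEBYRANK key 0 -(limit+1).
# A Redis sorted set ranks members from lowest score (rank 0) to highest
# (rank -1); removing ranks 0 .. -(limit+1) keeps only the top `limit` scores.
def trim_to_top(scores, limit)
  # scores: Hash of member => score, like a Redis zset.
  sorted = scores.sort_by { |_member, score| score }  # rank 0 = lowest score
  stop = sorted.length - (limit + 1)                  # Ruby index of rank -(limit+1)
  kept = stop < 0 ? sorted : sorted[(stop + 1)..-1]   # empty range if under the limit
  kept.to_h
end

similarities = (1..129).to_h { |i| ["item#{i}", i.to_f] }
trimmed = trim_to_top(similarities, 128)
trimmed.length        # => 128
trimmed.key?("item1") # => false (the lowest-scored member was dropped)
```

As in the spec's 129-member set, exactly one member (the lowest-scored) is removed when the limit is 128; sets already at or under the limit are untouched.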
metadata
CHANGED

```diff
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: predictor
 version: !ruby/object:Gem::Version
-  version: 2.0.0
+  version: 2.1.0
 platform: ruby
 authors:
 - Pathgather
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-
+date: 2014-06-19 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: redis
```