predictor 2.0.0 → 2.1.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: 22ffddcaceb5b1a189aa7f0e237fad7469cb537e
- data.tar.gz: 21c1b3aa6902f605c01e1de0cd91fa37c4e046e1
+ metadata.gz: ce936c35fc81b3f16b757d6e98a90e0a61c83c36
+ data.tar.gz: 839d3b0b314273ef857fd88a67f3a6e0b8d2c45f
  SHA512:
- metadata.gz: 1e6231526449192f10c34b73bee3500e52b13c7f4e9a3f6699b40c13e361dd7c098c28e199dd2eba2c8490532bf5c07f79af39027b865e31f7100fd5219ba855
- data.tar.gz: d07e1994e3221003cc17fda719f2964ab35478a0f396735018fab0a12e278831a710eca7512ea538328a6473c0296a3a79295a48322f5b215a8b535be7e962b4
+ metadata.gz: 14e5aacf724794effdaf7e6f092e6b19be17b30885aefe2f71d1414607bf243006142346678aac70fecef0b794cc0e14cb80c71f385242b090f2e2f5328d3ec2
+ data.tar.gz: 7f8477ed02993b787b52e74805952f75eb9f45cab316aa4d191e64ffc89dce1f5a66f6c1aa68dea3457c4d5ce2fb85e311b32ea3d53193494138f052e2911f8d
data/Changelog.md CHANGED
@@ -1,6 +1,11 @@
  =======
  Predictor Changelog
  =========
+
+ 2.1.0 (2014-06-19)
+ ---------------------
+ * The similarity limit now defaults to 128, instead of being unlimited. This is intended to save space in Redis. See the Readme for more information. It is strongly recommended that you run `ensure_similarity_limit_is_obeyed!` to shrink existing similarity sets.
+
  2.0.0 (2014-04-17)
  ---------------------
  **Rewrite of 1.0.0 and contains several breaking changes!**
@@ -10,4 +15,4 @@ Version 1.0.0 (which really should have been 0.0.1) contained several issues tha
  * Added the ability to limit the number of items stored in the similarity cache (via the 'limit_similarities_to' option). Now that similarities are cached at the root, this is possible and can greatly help memory usage.
  * Removed bang methods from input_matrix (add_set!, and_single!, etc). These called process! for you previously, but since the cache is no longer kept at the input_matrix level, process! has to be called at the root (Recommender::Base)
  * Bug fix: Fixed bug where a call to delete_item! on the input matrix didn't update the similarity cache.
- * Other minor fixes.
+ * Other minor fixes.
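
The 2.1.0 entry above recommends running `ensure_similarity_limit_is_obeyed!` after upgrading. A minimal sketch of that one-off step, assuming a recommender class like the `CourseRecommender` shown in the README below:

```ruby
# One-off step after upgrading to 2.1.0: trim existing similarity sets
# down to the configured limit (128 by default) and rewrite them.
recommender = CourseRecommender.new
recommender.ensure_similarity_limit_is_obeyed!
```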
data/README.md CHANGED
@@ -53,7 +53,6 @@ Below, we're building a recommender to recommend courses based off of:
  class CourseRecommender
  include Predictor::Base

- limit_similarities_to 500 # Optional, but if specified, Predictor only caches the top x similarities for an item at any given time. Can greatly help with efficient use of Redis memory
  input_matrix :users, weight: 3.0
  input_matrix :tags, weight: 2.0
  input_matrix :topics, weight: 1.0, measure: :sorensen_coefficient # Use Sorenson over Jaccard
@@ -144,7 +143,8 @@ recommender.clean!

  Limiting Similarities
  ---------------------
- By default, Predictor caches all similarities for all items, with no limit. That means if you have 10,000 items, and each item is somehow related to the other, we'll have 10,000 sets each with 9,999 items. That's going to use Redis' memory quite quickly. To limit this, specify the limit_similarities_to option.
+ By default, Predictor caches 128 similarities for each item, because that is the largest size at which Redis keeps the similarity sorted sets in a [memory-efficient format](http://redis.io/topics/memory-optimization). If you want to keep more similarities than that, and you don't mind using more memory, you can increase the similarity limit, like so:
+
  ```ruby
  class CourseRecommender
  include Predictor::Base
@@ -156,7 +156,21 @@ class CourseRecommender
  end
  ```

- This can really save a ton of memory. Just remember though, predictions fetched with the predictions_for call utilzes the similarity caches, so if you're using predictions_for, make sure you set the limit high enough so that intelligent predictions can be generated. If you aren't using predictions and are just using similarities, then feel free to set this to the maximum number of similarities you'd possibly want to show!
+ The memory penalty can be heavy, though. In our testing, similarity caches for 1,000 objects varied in size like so:
+
+ ```
+ limit_similarities_to(128) # 8.5 MB (this is the default)
+ limit_similarities_to(129) # 22.74 MB
+ limit_similarities_to(500) # 76.72 MB
+ ```
+
+ If you decide you need to store more than 128 similarities, you may want to see the Redis documentation linked above and consider increasing `zset-max-ziplist-entries` in your configuration, as sketched after this section.
+
+ Predictions fetched with the predictions_for call utilize the similarity caches, so if you're using predictions_for, make sure you set the limit high enough so that intelligent predictions can be generated. If you aren't using predictions and are just using similarities, then feel free to set this to the maximum number of similarities you'd possibly want to show!
+
+ You can also use `limit_similarities_to(nil)` to remove the limit entirely. This means if you have 10,000 items, and each item is somehow related to the other, you'll have 10,000 sets each with 9,999 items, which will run up your Redis bill quite quickly. Removing the limit is not recommended unless you're sure you know what you're doing.
+
+ If at some point you decide to lower your similarity limits, you'll want to be sure to shrink the size of the sorted sets already in Redis. You can do this with `CourseRecommender.new.ensure_similarity_limit_is_obeyed!`.

  Upgrading from 1.0 to 2.0
  ---------------------
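
For the `zset-max-ziplist-entries` note above: one way to raise the threshold at runtime is `CONFIG SET` through the gem's Redis connection. This is only a sketch — the parameter name comes from the Redis memory-optimization docs, the value 512 is purely illustrative, and a persistent change belongs in redis.conf instead:

```ruby
# Let sorted sets with up to 512 entries stay in the compact ziplist
# encoding (Redis' stock threshold is 128). Illustrative value only.
Predictor.redis.config(:set, 'zset-max-ziplist-entries', '512')

# Sets already stored as skiplists are not re-encoded automatically;
# ensure_similarity_limit_is_obeyed! rewrites each similarity set for you.
```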
@@ -213,4 +227,4 @@ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
  FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
  COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
  IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
- CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/lib/predictor/base.rb CHANGED
@@ -10,11 +10,17 @@ module Predictor::Base
  end

  def limit_similarities_to(val)
- @similarity_limit = val
+ @similarity_limit_set = true
+ @similarity_limit = val
  end

  def similarity_limit
- @similarity_limit
+ @similarity_limit_set ? @similarity_limit : 128
+ end
+
+ def reset_similarity_limit!
+ @similarity_limit_set = nil
+ @similarity_limit = nil
  end

  def input_matrices=(val)
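
The configuration methods above behave as follows at the class level (a sketch; `CourseRecommender` is the example class from the README, and the return values follow the new defaults in this diff):

```ruby
CourseRecommender.similarity_limit           # => 128 (default when never configured)
CourseRecommender.limit_similarities_to(500)
CourseRecommender.similarity_limit           # => 500
CourseRecommender.limit_similarities_to(nil)
CourseRecommender.similarity_limit           # => nil (no limit at all)
CourseRecommender.reset_similarity_limit!    # back to the 128 default
```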
@@ -174,7 +180,9 @@ module Predictor::Base
  items = all_items
  Predictor.redis.multi do |multi|
  items.each do |item|
- multi.zremrangebyrank(redis_key(:similarities, item), 0, -(similarity_limit))
+ key = redis_key(:similarities, item)
+ multi.zremrangebyrank(key, 0, -(similarity_limit + 1))
+ multi.zunionstore key, [key] # Rewrite zset to take advantage of ziplist implementation.
  end
  end
  end
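
Why `-(similarity_limit + 1)`: Redis ranks sorted-set members in ascending score order and negative ranks count back from the top, so removing ranks 0 through -(limit + 1) keeps exactly the `limit` highest-scored similarities (the previous `-(similarity_limit)` stop rank trimmed one member too many). The ZUNIONSTORE of the set onto itself then rewrites it so Redis can re-encode it compactly. A rough equivalent against a single key, using redis-rb directly (the key name is illustrative only):

```ruby
require 'redis'

redis = Redis.new
limit = 128
key   = 'predictor:similarities:item2' # illustrative key name

# Negative stop rank: keep only the `limit` members with the highest scores.
redis.zremrangebyrank(key, 0, -(limit + 1))

# Rewriting the set in place lets Redis pick the compact ziplist encoding
# again now that the set is back under the size threshold.
redis.zunionstore(key, [key])
```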
data/lib/predictor/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Predictor
- VERSION = "2.0.0"
+ VERSION = "2.1.0"
  end
data/spec/base_spec.rb CHANGED
@@ -8,7 +8,7 @@ describe Predictor::Base do
  before(:each) do
  flush_redis!
  BaseRecommender.input_matrices = {}
- BaseRecommender.limit_similarities_to(nil)
+ BaseRecommender.reset_similarity_limit!
  end

  describe "configuration" do
@@ -17,9 +17,18 @@ describe Predictor::Base do
  BaseRecommender.input_matrices.keys.should == [:myinput]
  end

- it "should allow a similarity limit" do
- BaseRecommender.limit_similarities_to(100)
- BaseRecommender.similarity_limit.should == 100
+ it "should default the similarity_limit to 128" do
+ BaseRecommender.similarity_limit.should == 128
+ end
+
+ it "should allow the similarity limit to be configured" do
+ BaseRecommender.limit_similarities_to(500)
+ BaseRecommender.similarity_limit.should == 500
+ end
+
+ it "should allow the similarity limit to be removed" do
+ BaseRecommender.limit_similarities_to(nil)
+ BaseRecommender.similarity_limit.should == nil
  end

  it "should retrieve an input_matrix on a new instance" do
@@ -287,4 +296,28 @@ describe Predictor::Base do
  Predictor.redis.keys("#{sm.redis_prefix}:*").should be_empty
  end
  end
+
+ describe "ensure_similarity_limit_is_obeyed!" do
+ it "should shorten similarities to the given limit and rewrite the zset" do
+ BaseRecommender.limit_similarities_to(nil)
+
+ BaseRecommender.input_matrix(:myfirstinput)
+ sm = BaseRecommender.new
+ sm.myfirstinput.add_to_set *(['set1'] + 130.times.map{|i| "item#{i}"})
+ sm.similarities_for('item2').should be_empty
+ sm.process_items!('item2')
+ sm.similarities_for('item2').length.should == 129
+
+ redis = Predictor.redis
+ key = sm.redis_key(:similarities, 'item2')
+ redis.zcard(key).should == 129
+ redis.object(:encoding, key).should == 'skiplist' # Inefficient
+
+ BaseRecommender.reset_similarity_limit!
+ sm.ensure_similarity_limit_is_obeyed!
+
+ redis.zcard(key).should == 128
+ redis.object(:encoding, key).should == 'ziplist' # Efficient
+ end
+ end
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: predictor
  version: !ruby/object:Gem::Version
- version: 2.0.0
+ version: 2.1.0
  platform: ruby
  authors:
  - Pathgather
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2014-04-17 00:00:00.000000000 Z
+ date: 2014-06-19 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: redis