predictor 2.0.0 → 2.1.0
- checksums.yaml +4 -4
- data/Changelog.md +6 -1
- data/README.md +18 -4
- data/lib/predictor/base.rb +11 -3
- data/lib/predictor/version.rb +1 -1
- data/spec/base_spec.rb +37 -4
- metadata +2 -2
checksums.yaml
CHANGED

```diff
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: ce936c35fc81b3f16b757d6e98a90e0a61c83c36
+  data.tar.gz: 839d3b0b314273ef857fd88a67f3a6e0b8d2c45f
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 14e5aacf724794effdaf7e6f092e6b19be17b30885aefe2f71d1414607bf243006142346678aac70fecef0b794cc0e14cb80c71f385242b090f2e2f5328d3ec2
+  data.tar.gz: 7f8477ed02993b787b52e74805952f75eb9f45cab316aa4d191e64ffc89dce1f5a66f6c1aa68dea3457c4d5ce2fb85e311b32ea3d53193494138f052e2911f8d
```
data/Changelog.md
CHANGED

```diff
@@ -1,6 +1,11 @@
 =======
 Predictor Changelog
 =========
+
+2.1.0 (2014-06-19)
+---------------------
+* The similarity limit now defaults to 128, instead of being unlimited. This is intended to save space in Redis. See the Readme for more information. It is strongly recommended that you run `ensure_similarity_limit_is_obeyed!` to shrink existing similarity sets.
+
 2.0.0 (2014-04-17)
 ---------------------
 **Rewrite of 1.0.0 and contains several breaking changes!**
@@ -10,4 +15,4 @@ Version 1.0.0 (which really should have been 0.0.1) contained several issues tha
 * Added the ability to limit the number of items stored in the similarity cache (via the 'limit_similarities_to' option). Now that similarities are cached at the root, this is possible and can greatly help memory usage.
 * Removed bang methods from input_matrix (add_set!, and_single!, etc). These called process! for you previously, but since the cache is no longer kept at the input_matrix level, process! has to be called at the root (Recommender::Base)
 * Bug fix: Fixed bug where a call to delete_item! on the input matrix didn't update the similarity cache.
-* Other minor fixes.
+* Other minor fixes.
```
data/README.md
CHANGED

````diff
@@ -53,7 +53,6 @@ Below, we're building a recommender to recommend courses based off of:
 class CourseRecommender
   include Predictor::Base
 
-  limit_similarities_to 500 # Optional, but if specified, Predictor only caches the top x similarities for an item at any given time. Can greatly help with efficient use of Redis memory
   input_matrix :users, weight: 3.0
   input_matrix :tags, weight: 2.0
   input_matrix :topics, weight: 1.0, measure: :sorensen_coefficient # Use Sorenson over Jaccard
@@ -144,7 +143,8 @@ recommender.clean!
 
 Limiting Similarities
 ---------------------
-By default, Predictor caches
+By default, Predictor caches 128 similarities for each item. This is because this is the maximum size for the similarity sorted sets to be kept in a [memory-efficient format](http://redis.io/topics/memory-optimization). If you want to keep more similarities than that, and you don't mind using more memory, you may want to increase the similarity limit, like so:
+
 ```ruby
 class CourseRecommender
   include Predictor::Base
@@ -156,7 +156,21 @@ class CourseRecommender
 end
 ```
 
-
+The memory penalty can be heavy, though. In our testing, similarity caches for 1,000 objects varied in size like so:
+
+```
+limit_similarities_to(128) # 8.5 MB (this is the default)
+limit_similarities_to(129) # 22.74 MB
+limit_similarities_to(500) # 76.72 MB
+```
+
+If you decide you need to store more than 128 similarities, you may want to see the Redis documentation linked above and consider increasing `zset-max-ziplist-entries` in your configuration.
+
+Predictions fetched with the predictions_for call utilizes the similarity caches, so if you're using predictions_for, make sure you set the limit high enough so that intelligent predictions can be generated. If you aren't using predictions and are just using similarities, then feel free to set this to the maximum number of similarities you'd possibly want to show!
+
+You can also use `limit_similarities_to(nil)` to remove the limit entirely. This means if you have 10,000 items, and each item is somehow related to the other, you'll have 10,000 sets each with 9,999 items, which will run up your Redis bill quite quickly. Removing the limit is not recommended unless you're sure you know what you're doing.
+
+If at some point you decide to lower your similarity limits, you'll want to be sure to shrink the size of the sorted sets already in Redis. You can do this with `CourseRecommender.new.ensure_similarity_limit_is_obeyed!`.
 
 Upgrading from 1.0 to 2.0
 ---------------------
@@ -213,4 +227,4 @@ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
 FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
 COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
 IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
-CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
````
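The README's pointer to `zset-max-ziplist-entries` corresponds to a plain redis.conf directive (under the pre-Redis-7 naming current for this gem's era). A sketch of what raising it might look like — the values here are illustrative assumptions, not the gem's recommendation; see the Redis memory-optimization page linked above for the trade-offs:

```
# redis.conf -- illustrative values only
zset-max-ziplist-entries 512   # sorted sets with up to 512 entries stay ziplist-encoded
zset-max-ziplist-value 64      # ...provided each member is at most 64 bytes long
```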
data/lib/predictor/base.rb
CHANGED

```diff
@@ -10,11 +10,17 @@ module Predictor::Base
   end
 
   def limit_similarities_to(val)
-    @similarity_limit = val
+    @similarity_limit_set = true
+    @similarity_limit = val
   end
 
   def similarity_limit
-    @similarity_limit
+    @similarity_limit_set ? @similarity_limit : 128
+  end
+
+  def reset_similarity_limit!
+    @similarity_limit_set = nil
+    @similarity_limit = nil
   end
 
   def input_matrices=(val)
@@ -174,7 +180,9 @@ module Predictor::Base
     items = all_items
     Predictor.redis.multi do |multi|
       items.each do |item|
-
+        key = redis_key(:similarities, item)
+        multi.zremrangebyrank(key, 0, -(similarity_limit + 1))
+        multi.zunionstore key, [key] # Rewrite zset to take advantage of ziplist implementation.
       end
     end
   end
```
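The new `@similarity_limit_set` flag exists because `nil` is now a meaningful value (no limit), so a plain `@similarity_limit || 128` would silently re-enable the default whenever the limit was explicitly removed. A minimal standalone illustration of that tri-state pattern (my own sketch, not the gem's code):

```ruby
# Tri-state setting: unset (default applies), set to a value, or set to nil
# (meaning "no limit"). A simple `@limit || 128` could not distinguish
# "never set" from "explicitly set to nil".
class LimitConfig
  DEFAULT = 128

  def limit=(val)
    @set = true
    @limit = val
  end

  def limit
    @set ? @limit : DEFAULT
  end

  def reset!
    @set = nil
    @limit = nil
  end
end

c = LimitConfig.new
c.limit        # => 128 (default, nothing set yet)
c.limit = 500
c.limit        # => 500
c.limit = nil
c.limit        # => nil, not 128 -- the limit was explicitly removed
c.reset!
c.limit        # => 128 again
```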
data/lib/predictor/version.rb
CHANGED
data/spec/base_spec.rb
CHANGED

```diff
@@ -8,7 +8,7 @@ describe Predictor::Base do
   before(:each) do
     flush_redis!
     BaseRecommender.input_matrices = {}
-    BaseRecommender.
+    BaseRecommender.reset_similarity_limit!
   end
 
   describe "configuration" do
@@ -17,9 +17,18 @@ describe Predictor::Base do
       BaseRecommender.input_matrices.keys.should == [:myinput]
     end
 
-    it "should
-      BaseRecommender.
-
+    it "should default the similarity_limit to 128" do
+      BaseRecommender.similarity_limit.should == 128
+    end
+
+    it "should allow the similarity limit to be configured" do
+      BaseRecommender.limit_similarities_to(500)
+      BaseRecommender.similarity_limit.should == 500
+    end
+
+    it "should allow the similarity limit to be removed" do
+      BaseRecommender.limit_similarities_to(nil)
+      BaseRecommender.similarity_limit.should == nil
     end
 
     it "should retrieve an input_matrix on a new instance" do
@@ -287,4 +296,28 @@ describe Predictor::Base do
       Predictor.redis.keys("#{sm.redis_prefix}:*").should be_empty
     end
   end
+
+  describe "ensure_similarity_limit_is_obeyed!" do
+    it "should shorten similarities to the given limit and rewrite the zset" do
+      BaseRecommender.limit_similarities_to(nil)
+
+      BaseRecommender.input_matrix(:myfirstinput)
+      sm = BaseRecommender.new
+      sm.myfirstinput.add_to_set *(['set1'] + 130.times.map{|i| "item#{i}"})
+      sm.similarities_for('item2').should be_empty
+      sm.process_items!('item2')
+      sm.similarities_for('item2').length.should == 129
+
+      redis = Predictor.redis
+      key = sm.redis_key(:similarities, 'item2')
+      redis.zcard(key).should == 129
+      redis.object(:encoding, key).should == 'skiplist' # Inefficient
+
+      BaseRecommender.reset_similarity_limit!
+      sm.ensure_similarity_limit_is_obeyed!
+
+      redis.zcard(key).should == 128
+      redis.object(:encoding, key).should == 'ziplist' # Efficient
+    end
+  end
 end
```
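The trimming this spec exercises comes down to Redis's rank arithmetic: `ZREMRANGEBYRANK key 0 -(limit+1)` removes everything except the `limit` highest-scored members. A plain-Ruby model of that behavior (my own sketch for illustration, not code from the gem or from redis-rb):

```ruby
# Plain-Ruby model of ZREMRANGEBYRANK key 0 -(limit+1).
# A Redis sorted set ranks members from lowest score (rank 0) to highest
# (rank -1); removing ranks 0 .. -(limit+1) keeps only the top `limit` scores.
def trim_to_top(scores, limit)
  # scores: Hash of member => score, like a Redis zset.
  sorted = scores.sort_by { |_member, score| score }  # rank 0 = lowest score
  stop = sorted.length - (limit + 1)                  # Ruby index of rank -(limit+1)
  kept = stop < 0 ? sorted : sorted[(stop + 1)..-1]   # empty range if under the limit
  kept.to_h
end

similarities = (1..129).to_h { |i| ["item#{i}", i.to_f] }
trimmed = trim_to_top(similarities, 128)
trimmed.length        # => 128
trimmed.key?("item1") # => false (the lowest-scored member was dropped)
```

As in the spec's 129-member set, exactly one member (the lowest-scored) is removed when the limit is 128; sets already at or under the limit are untouched.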
metadata
CHANGED

```diff
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: predictor
 version: !ruby/object:Gem::Version
-  version: 2.0.0
+  version: 2.1.0
 platform: ruby
 authors:
 - Pathgather
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2014-
+date: 2014-06-19 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: redis
```