disco 0.2.6 → 0.2.7 (RubyGems)

Files changed (6)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: a7823dbe0e68967c39a59f8cdc2fe577f4366b492e0559487606b74a7de1cc84
-  data.tar.gz: ba40e46b203e424eccb811c6b042c9a283356c42585b7e00123b4bb2f232b1e2
+  metadata.gz: 5f400f07839587b574ddcfa4c88335bfe20fcd876164b943e8094a35c3c1cfef
+  data.tar.gz: e2426b283146837d14be154ff0e67eb2505fd6587958b39212bf2dfe3bfccd80
 SHA512:
-  metadata.gz: ee43326933ac019b0bae631631ba79a7b1e03d1e9669361ef7722aa5a43b7bf2a2f49ccf8b098ab23539392fd09b83224c3cb9d340b80483179fabb45d62ee30
-  data.tar.gz: 9733820cc4e81b22cca51dbf89a02aa87e96cbbc1add753b2799878b5b50b549f2a27886dcfae387ad4cc158ce4bd651354f8bbd2514460ac07a60560ad5c455
+  metadata.gz: 2be9f24184036ec5b093de55640aebb60887ac59c566f37698fcba7a18daa15cf586566708def0060f80fc0747a50447538cf42fdf36024ae19ddac0de8b415c
+  data.tar.gz: 4682a5524a8cad4a247ec53f99c78e317d56ee55433bb2ad7806af4f2a9854bc016fd23564003f009dc69d0fdcf81949dc88c64d3cbe824a8e76fc5cae8abc7d
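
The digests above can be checked against the two archive entries they name. The following is a minimal sketch, assuming the raw bytes of `metadata.gz` and `data.tar.gz` have already been read from the unpacked gem; `sha256_mismatches` is a hypothetical helper, not part of RubyGems:

```ruby
require "digest"
require "yaml"

# Hypothetical helper (not a RubyGems API).
# checksums_yaml: the text of checksums.yaml
# contents: hash mapping an entry name (e.g. "metadata.gz") to its raw bytes
# Returns the names whose SHA256 digest differs from the recorded one.
def sha256_mismatches(checksums_yaml, contents)
  expected = YAML.safe_load(checksums_yaml).fetch("SHA256")
  expected.reject do |name, hex|
    Digest::SHA256.hexdigest(contents.fetch(name)) == hex.to_s
  end.keys
end
```

An empty result means every listed entry matches. In practice `contents` might be built with `File.binread` on each entry.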

data/CHANGELOG.md CHANGED

@@ -1,3 +1,7 @@
+## 0.2.7 (2021-08-06)
+- Added warning for `value`
 ## 0.2.6 (2021-02-24)
 - Improved performance

data/README.md CHANGED

@@ -35,17 +35,15 @@ recommender.fit([
 > IDs can be integers, strings, or any other data type
-If users don’t rate items directly (for instance, they’re purchasing items or reading posts), this is known as implicit feedback. Leave out the rating, or use a value like number of purchases, number of page views, or time spent on page:
+If users don’t rate items directly (for instance, they’re purchasing items or reading posts), this is known as implicit feedback. Leave out the rating.
 ```ruby
 recommender.fit([
-  {user_id: 1, item_id: 1, value: 1},
-  {user_id: 2, item_id: 1, value: 1}
+  {user_id: 1, item_id: 1},
+  {user_id: 2, item_id: 1}
 ])
 ```
-> Use `value` instead of `rating` for implicit feedback
 Get user-based recommendations - “users like you also liked”
 ```ruby
@@ -106,11 +104,10 @@ views = Ahoy::Event.
   count
 data =
-  views.map do |(user_id, post_id), count|
+  views.map do |(user_id, post_id), _|
     {
       user_id: user_id,
-      item_id: post_id,
-      value: count
+      item_id: post_id
     }
   end
 ```
@@ -201,7 +198,7 @@ bin = File.binread("recommender.bin")
 recommender = Marshal.load(bin)
 ```
-Alternatively, you can store only the factors and use a library like [Neighbor](https://github.com/ankane/neighbor)
+Alternatively, you can store only the factors and use a library like [Neighbor](https://github.com/ankane/neighbor). See the [examples](https://github.com/ankane/neighbor/tree/master/examples).
 ## Algorithms
@@ -282,22 +279,22 @@ gem 'faiss'
 Speed up the `user_recs` method with:
 ```ruby
-model.optimize_user_recs
+recommender.optimize_user_recs
 ```
 Speed up the `item_recs` method with:
 ```ruby
-model.optimize_item_recs
+recommender.optimize_item_recs
 ```
 Speed up the `similar_users` method with:
 ```ruby
-model.optimize_similar_users
+recommender.optimize_similar_users
 ```
-This should be called after fitting or loading the model.
+This should be called after fitting or loading the recommender.
 ## Reference
@@ -336,6 +333,28 @@ Thanks to:
 - [Implicit](https://github.com/benfred/implicit/) for serving as an initial reference for user and item similarity
 - [@dasch](https://github.com/dasch) for the gem name
+## Upgrading
+### 0.2.7
+There’s now a warning when passing `:value` with implicit feedback, as this has no effect on recommendations and can be removed. Earlier versions of the library incorrectly stated this was used.
+```ruby
+recommender.fit([
+  {user_id: 1, item_id: 1, value: 1},
+  {user_id: 2, item_id: 1, value: 3}
+])
+```
+to:
+```ruby
+recommender.fit([
+  {user_id: 1, item_id: 1},
+  {user_id: 2, item_id: 1}
+])
+```
 ## History
 View the [changelog](https://github.com/ankane/disco/blob/master/CHANGELOG.md)
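
For callers that build implicit-feedback rows programmatically, the upgrade note above amounts to dropping the `:value` key before calling `fit`. A minimal sketch, assuming the rows are plain hashes with no `:rating`; `strip_value` is a hypothetical helper, not part of Disco:

```ruby
# Hypothetical helper (not part of Disco): return copies of the rows
# without the :value key, leaving the original rows untouched.
def strip_value(rows)
  rows.map { |row| row.reject { |key, _| key == :value } }
end

rows = [
  {user_id: 1, item_id: 1, value: 1},
  {user_id: 2, item_id: 1, value: 3}
]

# cleaned holds only :user_id and :item_id and could be passed to recommender.fit
cleaned = strip_value(rows)
```

`Hash#reject` is used rather than `Hash#except` so the sketch also runs on Ruby versions before 3.0.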

data/lib/disco/recommender.rb CHANGED

@@ -22,6 +22,10 @@ module Disco
       # but may be confusing if they are all missing and later ones aren't
       @implicit = !train_set.any? { |v| v[:rating] }
+      if @implicit && train_set.any? { |v| v[:value] }
+        warn "[disco] WARNING: Passing `:value` with implicit feedback has no effect on recommendations and can be removed. Earlier versions of the library incorrectly stated this was used."
+      end
       # TODO improve performance
       # (catch exception instead of checking ahead of time)
       unless @implicit
@@ -34,7 +38,6 @@ module Disco
       @rated = Hash.new { |hash, key| hash[key] = {} }
       input = []
-      value_key = @implicit ? :value : :rating
       train_set.each do |v|
         # update maps and build matrix in single pass
         u = (@user_map[v[:user_id]] ||= @user_map.size)
@@ -42,7 +45,7 @@ module Disco
         @rated[u][i] = true
         # explicit will always have a value due to check_ratings
-        input << [u, i, v[value_key] || 1]
+        input << [u, i, @implicit ? 1 : v[:rating]]
       end
       @rated.default = nil
@@ -61,7 +64,7 @@ module Disco
         train_set.each do |v|
           i = @item_map[v[:item_id]]
           @item_count[i] += 1
-          @item_sum[i] += (v[value_key] || 1)
+          @item_sum[i] += (@implicit ? 1 : v[:rating])
         end
       end
@@ -76,7 +79,7 @@ module Disco
           u ||= -1
           i ||= -1
-          eval_set << [u, i, v[value_key] || 1]
+          eval_set << [u, i, @implicit ? 1 : v[:rating]]
         end
       end
@@ -138,8 +141,7 @@ module Disco
           predictions, ids = @user_recs_index.search(@user_factors[u, true].expand_dims(0), count + rated.size).map { |v| v[0, true] }
         else
           predictions = @item_factors.inner(@user_factors[u, true])
-          # TODO make sure reverse isn't hurting performance
-          indexes = predictions.sort_index.reverse
+          indexes = predictions.sort_index.reverse # reverse just creates view
           indexes = indexes[0...[count + rated.size, indexes.size].min] if count
           predictions = predictions[indexes]
           ids = indexes
@@ -179,19 +181,32 @@ module Disco
       raise "top_items not computed" unless @top_items
       if @implicit
-        scores = @item_count
+        scores = Numo::UInt64.cast(@item_count)
       else
         require "wilson_score"
         range = @min_rating..@max_rating
-        scores = @item_sum.zip(@item_count).map { |s, c| WilsonScore.rating_lower_bound(s / c, c, range) }
+        scores = Numo::DFloat.cast(@item_sum.zip(@item_count).map { |s, c| WilsonScore.rating_lower_bound(s / c, c, range) })
+        # TODO uncomment in 0.3.0
+        # wilson score with continuity correction
+        # https://en.wikipedia.org/wiki/Binomial_proportion_confidence_interval#Wilson_score_interval_with_continuity_correction
+        # z = 1.96 # 95% confidence
+        # range = @max_rating - @min_rating
+        # n = Numo::DFloat.cast(@item_count)
+        # phat = (Numo::DFloat.cast(@item_sum) - (@min_rating * n)) / range / n
+        # phat = (phat - (1 / 2 * n)).clip(0, 100) # continuity correction
+        # scores = (phat + z**2 / (2 * n) - z * Numo::DFloat::Math.sqrt((phat * (1 - phat) + z**2 / (4 * n)) / n)) / (1 + z**2 / n)
+        # scores = scores * range + @min_rating
       end
-      scores = scores.map.with_index.sort_by { |s, _| -s }
-      scores = scores.first(count) if count
-      item_ids = item_ids()
-      scores.map do |s, i|
-        {item_id: item_ids[i], score: s}
+      indexes = scores.sort_index.reverse
+      indexes = indexes[0...[count, indexes.size].min] if count
+      scores = scores[indexes]
+      keys = @item_map.keys
+      indexes.size.times.map do |i|
+        {item_id: keys[indexes[i]], score: scores[i]}
       end
     end
@@ -255,8 +270,9 @@ module Disco
         # inner product is cosine similarity with normalized vectors
         # https://github.com/facebookresearch/faiss/issues/95
         #
-        # TODO use non-exact index
+        # TODO use non-exact index in 0.3.0
         # https://github.com/facebookresearch/faiss/wiki/Faiss-indexes
+        # index = Faiss::IndexHNSWFlat.new(factors.shape[1], 32, :inner_product)
         index = Faiss::IndexFlatIP.new(factors.shape[1])
         # ids are from 0...total
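
The two behavioral changes in this file can be illustrated without the library. The sketch below is a simplified, standalone mirror of the hunks above using plain arrays instead of Numo; `build_input` and `top_by_score` are hypothetical names, and unlike the real code the sketch keeps raw IDs rather than mapping them to indexes:

```ruby
# Hypothetical stand-in for the training-input step in 0.2.7.
# Feedback is implicit when no row has a :rating; in that case every
# row gets the value 1 and any :value key is ignored (with a warning).
def build_input(train_set)
  implicit = train_set.none? { |v| v[:rating] }
  if implicit && train_set.any? { |v| v[:value] }
    warn "[sketch] :value has no effect with implicit feedback"
  end
  train_set.map { |v| [v[:user_id], v[:item_id], implicit ? 1 : v[:rating]] }
end

# Hypothetical stand-in for the new top-items ranking: sort indexes by
# descending score, keep the first `count`, then look up keys and scores.
def top_by_score(scores, keys, count = nil)
  indexes = scores.each_index.sort_by { |i| -scores[i] }
  indexes = indexes.first(count) if count
  indexes.map { |i| {item_id: keys[i], score: scores[i]} }
end
```

For example, `build_input([{user_id: 1, item_id: 1, value: 5}])` yields `[[1, 1, 1]]` and prints the warning to standard error, which is the behavior the changelog entry describes.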

data/lib/disco/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Disco
-  VERSION = "0.2.6"
+  VERSION = "0.2.7"
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: disco
 version: !ruby/object:Gem::Version
-  version: 0.2.6
+  version: 0.2.7
 platform: ruby
 authors:
 - Andrew Kane
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2021-02-24 00:00:00.000000000 Z
+date: 2021-08-06 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: libmf
@@ -76,7 +76,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.2.3
+rubygems_version: 3.2.22
 signing_key:
 specification_version: 4
 summary: Recommendations for Ruby and Rails using collaborative filtering