disco 0.1.3 → 0.2.5

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 33961b51cd8461f821c4622f5983b2ac6138cc3b70c9be8ef1d3a6e82c37ab9e
-   data.tar.gz: f4e8cdfa4efb354878c459b57b522a81cd3f0c81e4297c53f9dc88517b312ac8
+   metadata.gz: 8fbecb858b316ed39a9cb726263e182561cba6df498e6253d88c79ebec5cab05
+   data.tar.gz: 42eb38a6e4e0b3fc5a9452deae5a48676ae9a53e78eeb6197718a0c94bd02b6b
  SHA512:
-   metadata.gz: 2f4c207486e858a23480e52b4b9a479fd23b26f0259ef12e39b964d9d7f4cc0067f162207d88119f76414269d65e3ee3d7c675c46f5f143c5b016eacab6e888c
-   data.tar.gz: 2734c1dcc87c423566dd2f842ef7fdd1b7e3cbaa1ecac61dbfafdbc1769b43edca81d28ce60712008eee9d381d64c9e2dea71b210c1a10fecaef75696ee2fd05
+   metadata.gz: d0250346d75fba75064a29578f6bfd39f09ecf712ba2e505b97a4952b5ff8b31af307eb1b912e9b25cc3dc28dee0d096bea44b47bb2ef268859bb4171f0ef8b2
+   data.tar.gz: 7b341328c12885efd0ffece4201036bb9457caee80a48a99ba110af9a81bcf832bbc1e8f8f5f14e7fddffef2dd3f4643837e0d569c997ab0c2d9ae85e12422f7
data/CHANGELOG.md CHANGED
@@ -1,3 +1,35 @@
+ ## 0.2.5 (2021-02-20)
+
+ - Added `top_items` method
+ - Added `optimize_user_recs` method
+ - Added support for Faiss for `optimize_item_recs` and `optimize_similar_users` methods
+ - Added `rmse` method
+ - Improved performance
+
+ ## 0.2.4 (2021-02-15)
+
+ - Added `user_ids` and `item_ids` methods
+ - Added `user_id` argument to `user_factors`
+ - Added `item_id` argument to `item_factors`
+
+ ## 0.2.3 (2020-11-28)
+
+ - Added `predict` method
+ - Fixed bad recommendations and scores with `user_recs` and explicit feedback
+ - Fixed `item_ids` option for `user_recs`
+
+ ## 0.2.2 (n/a)
+
+ - Not available (released by previous gem owner)
+
+ ## 0.2.1 (2020-10-28)
+
+ - Fixed issue with `user_recs` returning rated items
+
+ ## 0.2.0 (2020-07-31)
+
+ - Changed score to always be between -1 and 1 for `item_recs` and `similar_users` (cosine similarity - this makes it easier to understand and consistent with `optimize_item_recs` and `optimize_similar_users`)
+
  ## 0.1.3 (2020-06-28)

  - Added support for Rover
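
The 0.2.0 entry above bounds scores to the range -1 to 1 because cosine similarity divides the dot product of two vectors by the product of their lengths. A minimal plain-Ruby sketch of that property follows; the `cosine_similarity` helper and the sample vectors are illustrative assumptions, not part of the gem.

```ruby
# Hypothetical helper, not part of Disco: cosine similarity of two
# equal-length numeric arrays. Assumes neither vector is all zeros.
def cosine_similarity(a, b)
  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |x| x * x })
  dot / (norm_a * norm_b)
end

same      = cosine_similarity([1.0, 2.0], [2.0, 4.0])   # same direction, close to 1
opposite  = cosine_similarity([1.0, 2.0], [-1.0, -2.0]) # opposite direction, close to -1
unrelated = cosine_similarity([1.0, 0.0], [0.0, 1.0])   # orthogonal, 0
```

Scaling either vector leaves the result unchanged, which is why the score is comparable across items and users regardless of factor magnitude.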
data/LICENSE.txt CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2019 Andrew Kane
+ Copyright (c) 2019-2021 Andrew Kane

  MIT License

data/README.md CHANGED
@@ -1,12 +1,12 @@
  # Disco

- :fire: Collaborative filtering for Ruby
+ :fire: Recommendations for Ruby and Rails using collaborative filtering

  - Supports user-based and item-based recommendations
  - Works with explicit and implicit feedback
  - Uses high-performance matrix factorization

- [![Build Status](https://travis-ci.org/ankane/disco.svg?branch=master)](https://travis-ci.org/ankane/disco)
+ [![Build Status](https://github.com/ankane/disco/workflows/build/badge.svg?branch=master)](https://github.com/ankane/disco/actions)

  ## Installation

@@ -46,13 +46,13 @@ recommender.fit([

  > Use `value` instead of rating for implicit feedback

- Get user-based (user-item) recommendations - “users like you also liked”
+ Get user-based recommendations - “users like you also liked”

  ```ruby
  recommender.user_recs(user_id)
  ```

- Get item-based (item-item) recommendations - “users who liked this item also liked”
+ Get item-based recommendations - “users who liked this item also liked”

  ```ruby
  recommender.item_recs(item_id)
@@ -64,10 +64,10 @@ Use the `count` option to specify the number of recommendations (default is 5)
  recommender.user_recs(user_id, count: 3)
  ```

- Get predicted ratings for specific items
+ Get predicted ratings for specific users and items

  ```ruby
- recommender.user_recs(user_id, item_ids: [1, 2, 3])
+ recommender.predict([{user_id: 1, item_id: 2}, {user_id: 2, item_id: 4}])
  ```

  Get similar users
@@ -101,14 +101,15 @@ recommender.item_recs("Star Wars (1977)")
  ```ruby
  views = Ahoy::Event.
    where(name: "Viewed post").
-   group(:user_id, "properties->>'post_id'") # postgres syntax
+   group(:user_id).
+   group("properties->>'post_id'"). # postgres syntax
    count

  data =
    views.map do |(user_id, post_id), count|
      {
        user_id: user_id,
-       post_id: post_id,
+       item_id: post_id,
        value: count
      }
    end
@@ -200,6 +201,8 @@ bin = File.binread("recommender.bin")
  recommender = Marshal.load(bin)
  ```

+ Alternatively, you can store only the factors and use a library like [Neighbor](https://github.com/ankane/neighbor)
+
  ## Algorithms

  Disco uses high-performance matrix factorization.
@@ -236,6 +239,16 @@ There are a number of ways to deal with this, but here are some common ones:
  - For user-based recommendations, show new users the most popular items.
  - For item-based recommendations, make content-based recommendations with a gem like [tf-idf-similarity](https://github.com/jpmckinney/tf-idf-similarity).

+ Get top items with:
+
+ ```ruby
+ recommender = Disco::Recommender.new(top_items: true)
+ recommender.fit(data)
+ recommender.top_items
+ ```
+
+ This uses [Wilson score](https://www.evanmiller.org/how-not-to-sort-by-average-rating.html) for explicit feedback (add [wilson_score](https://github.com/instacart/wilson_score) to your application’s Gemfile) and item frequency for implicit feedback.
+
  ## Data

  Data can be an array of hashes
@@ -256,23 +269,29 @@ Or a Daru data frame
  Daru::DataFrame.from_csv("ratings.csv")
  ```

- ## Faster Similarity
+ ## Performance [master]

- If you have a large number of users/items, you can use an approximate nearest neighbors library like [NGT](https://github.com/ankane/ngt) to speed up item-based recommendations and similar users.
+ If you have a large number of users or items, you can use an approximate nearest neighbors library like [Faiss](https://github.com/ankane/faiss) to improve the performance of certain methods.

  Add this line to your application’s Gemfile:

  ```ruby
- gem 'ngt', '>= 0.3.0'
+ gem 'faiss'
+ ```
+
+ Speed up the `user_recs` method with:
+
+ ```ruby
+ model.optimize_user_recs
  ```

- Speed up item-based recommendations with:
+ Speed up the `item_recs` method with:

  ```ruby
  model.optimize_item_recs
  ```

- Speed up similar users with:
+ Speed up the `similar_users` method with:

  ```ruby
  model.optimize_similar_users
@@ -282,19 +301,33 @@ This should be called after fitting or loading the model.

  ## Reference

+ Get ids
+
+ ```ruby
+ recommender.user_ids
+ recommender.item_ids
+ ```
+
  Get the global mean

  ```ruby
  recommender.global_mean
  ```

- Get the factors
+ Get factors

  ```ruby
  recommender.user_factors
  recommender.item_factors
  ```

+ Get factors for specific users and items
+
+ ```ruby
+ recommender.user_factors(user_id)
+ recommender.item_factors(item_id)
+ ```
+
  ## Credits

  Thanks to:
@@ -315,3 +348,12 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
  - Fix bugs and [submit pull requests](https://github.com/ankane/disco/pulls)
  - Write, clarify, or fix documentation
  - Suggest or add new features
+
+ To get started with development:
+
+ ```sh
+ git clone https://github.com/ankane/disco.git
+ cd disco
+ bundle install
+ bundle exec rake test
+ ```
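
The Ahoy hunk in this README diff renames `post_id` to `item_id` in the mapped rows, matching the `Missing item_id` check in the recommender. The sketch below shows the shape of that mapping without a database: the `views` hash stands in for what a grouped ActiveRecord `count` returns when grouping by two columns, and its ids and counts are made up for illustration.

```ruby
# Stand-in for the result of a two-column grouped count:
# keys are [user_id, post_id] pairs, values are view counts.
# The post ids are strings because they come out of a JSON text lookup.
views = {
  [1, "10"] => 3,
  [1, "11"] => 1,
  [2, "10"] => 5
}

data =
  views.map do |(user_id, post_id), count|
    {
      user_id: user_id,
      item_id: post_id, # the recommender looks up :item_id, not :post_id
      value: count      # implicit feedback uses :value instead of :rating
    }
  end
```

The resulting array of hashes is the same shape the README passes to `recommender.fit`.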
data/lib/disco.rb CHANGED
@@ -9,6 +9,7 @@ require "net/http"

  # modules
  require "disco/data"
+ require "disco/metrics"
  require "disco/recommender"
  require "disco/version"

data/lib/disco/data.rb CHANGED
@@ -36,8 +36,7 @@ module Disco

  return dest if File.exist?(dest)

- temp_dir ||= File.dirname(Tempfile.new("disco"))
- temp_path = "#{temp_dir}/#{Time.now.to_f}" # TODO better name
+ temp_path = "#{Dir.tmpdir}/disco-#{Time.now.to_f}" # TODO better name

  digest = Digest::SHA2.new

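
The replacement line depends on `Dir.tmpdir`, which Ruby defines in the `tmpdir` standard library. The sketch below builds a path in the same style and, as a purely hypothetical take on the `TODO better name` comment, a variant with a random suffix; the `SecureRandom` variant is an assumption and does not appear in the gem.

```ruby
require "tmpdir"        # provides Dir.tmpdir
require "securerandom"

# Path in the style of the new line in the diff
timestamp_path = "#{Dir.tmpdir}/disco-#{Time.now.to_f}"

# Hypothetical alternative: a random hex suffix avoids collisions
# between two calls made at the same instant.
random_path = File.join(Dir.tmpdir, "disco-#{SecureRandom.hex(8)}")
```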
data/lib/disco/metrics.rb ADDED
@@ -0,0 +1,10 @@
+ module Disco
+   module Metrics
+     class << self
+       def rmse(act, exp)
+         raise ArgumentError, "Size mismatch" if act.size != exp.size
+         Math.sqrt(act.zip(exp).sum { |a, e| (a - e)**2 } / act.size.to_f)
+       end
+     end
+   end
+ end
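
The new `rmse` method computes the square root of the mean squared difference between two equal-length arrays. To make that easy to try without installing the gem, the sketch below copies the same logic into a top-level method; with the gem loaded, the equivalent call would be `Disco::Metrics.rmse(act, exp)`. The sample numbers are made up.

```ruby
# Standalone copy of the rmse logic from the diff above
def rmse(act, exp)
  raise ArgumentError, "Size mismatch" if act.size != exp.size
  Math.sqrt(act.zip(exp).sum { |a, e| (a - e)**2 } / act.size.to_f)
end

# Identical inputs have no error
perfect = rmse([4.0, 3.0, 5.0], [4.0, 3.0, 5.0])

# Errors of 3 and 4: sqrt((9 + 16) / 2) = sqrt(12.5)
off = rmse([0.0, 0.0], [3.0, 4.0])
```

Because the errors are squared before averaging, one large miss raises the result more than several small ones.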
data/lib/disco/recommender.rb CHANGED
@@ -1,32 +1,33 @@
  module Disco
    class Recommender
-     attr_reader :global_mean, :item_factors, :user_factors
+     attr_reader :global_mean

-     def initialize(factors: 8, epochs: 20, verbose: nil)
+     def initialize(factors: 8, epochs: 20, verbose: nil, top_items: false)
        @factors = factors
        @epochs = epochs
        @verbose = verbose
+       @user_map = {}
+       @item_map = {}
+       @top_items = top_items
      end

      def fit(train_set, validation_set: nil)
        train_set = to_dataset(train_set)
        validation_set = to_dataset(validation_set) if validation_set

-       @implicit = !train_set.any? { |v| v[:rating] }
+       check_training_set(train_set)

+       @implicit = !train_set.any? { |v| v[:rating] }
        unless @implicit
-         ratings = train_set.map { |o| o[:rating] }
-         check_ratings(ratings)
-         @min_rating = ratings.min
-         @max_rating = ratings.max
+         check_ratings(train_set)
+         @min_rating, @max_rating = train_set.minmax_by { |o| o[:rating] }.map { |o| o[:rating] }

          if validation_set
-           check_ratings(validation_set.map { |o| o[:rating] })
+           check_ratings(validation_set)
          end
        end

-       check_training_set(train_set)
-       create_maps(train_set)
+       update_maps(train_set)

        @rated = Hash.new { |hash, key| hash[key] = {} }
        input = []
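
The hunk above replaces the intermediate `ratings` array with a single `minmax_by` pass over the training rows. A small sketch of that expression on made-up rows:

```ruby
# Made-up explicit-feedback rows
train_set = [
  {user_id: 1, item_id: 1, rating: 3},
  {user_id: 1, item_id: 2, rating: 5},
  {user_id: 2, item_id: 1, rating: 1}
]

# minmax_by returns the two rows with the smallest and largest rating;
# the trailing map pulls the rating out of each row.
min_rating, max_rating = train_set.minmax_by { |o| o[:rating] }.map { |o| o[:rating] }
```

This keeps the rating bounds used for clipping without allocating a separate array of every rating.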
@@ -41,6 +42,16 @@ module Disco
        end
        @rated.default = nil

+       if @top_items
+         @item_count = [0] * @item_map.size
+         @item_sum = [0.0] * @item_map.size
+         train_set.each do |v|
+           i = @item_map[v[:item_id]]
+           @item_count[i] += 1
+           @item_sum[i] += (v[value_key] || 1)
+         end
+       end
+
        eval_set = nil
        if validation_set
          eval_set = []
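
The `@top_items` branch above accumulates a count per item index, and for implicit feedback `top_items` ranks by that count. The sketch below reproduces the idea in plain Ruby on made-up rows. It uses `each_with_index` for the ranking step and local variables in place of the recommender's instance state, so it illustrates the approach rather than copying the gem's code.

```ruby
# Made-up implicit-feedback rows
train_set = [
  {user_id: 1, item_id: "a"},
  {user_id: 2, item_id: "b"},
  {user_id: 3, item_id: "b"},
  {user_id: 3, item_id: "c"},
  {user_id: 4, item_id: "b"},
  {user_id: 4, item_id: "c"}
]

# Give each item a dense index in order of first appearance
item_map = {}
train_set.each { |v| item_map[v[:item_id]] ||= item_map.size }

# Count occurrences per item index
item_count = [0] * item_map.size
train_set.each { |v| item_count[item_map[v[:item_id]]] += 1 }

# Rank by count, highest first, and map indexes back to item ids
item_ids = item_map.keys
top = item_count.each_with_index.sort_by { |s, _| -s }.first(2).map do |s, i|
  {item_id: item_ids[i], score: s}
end
```

For explicit feedback the gem instead scores each item with a Wilson lower bound on its average rating, which this sketch does not cover.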
@@ -67,67 +78,188 @@ module Disco
        @user_factors = model.p_factors(format: :numo)
        @item_factors = model.q_factors(format: :numo)

-       @user_index = nil
-       @item_index = nil
+       @user_recs_index = nil
+       @similar_users_index = nil
+       @similar_items_index = nil
+     end
+
+     # generates a prediction even if a user has already rated the item
+     def predict(data)
+       data = to_dataset(data)
+
+       u = data.map { |v| @user_map[v[:user_id]] }
+       i = data.map { |v| @item_map[v[:item_id]] }
+
+       new_index = data.each_index.select { |index| u[index].nil? || i[index].nil? }
+       new_index.each do |j|
+         u[j] = 0
+         i[j] = 0
+       end
+
+       predictions = @user_factors[u, true].inner(@item_factors[i, true])
+       predictions.inplace.clip(@min_rating, @max_rating) if @min_rating
+       predictions[new_index] = @global_mean
+       predictions.to_a
      end

      def user_recs(user_id, count: 5, item_ids: nil)
+       check_fit
        u = @user_map[user_id]

        if u
-         predictions = @global_mean + @item_factors.dot(@user_factors[u, true])
-         predictions.inplace.clip(@min_rating, @max_rating) if @min_rating
-
-         predictions =
-           @item_map.keys.zip(predictions).map do |item_id, pred|
-             {item_id: item_id, score: pred}
-           end
+         rated = item_ids ? {} : @rated[u]

          if item_ids
-           idx = item_ids.map { |i| @item_map[i] }.compact
-           predictions.values_at(*idx)
+           ids = Numo::NArray.cast(item_ids.map { |i| @item_map[i] }.compact)
+           return [] if ids.size == 0
+
+           predictions = @item_factors[ids, true].inner(@user_factors[u, true])
+           indexes = predictions.sort_index.reverse
+           indexes = indexes[0...[count + rated.size, indexes.size].min] if count
+           predictions = predictions[indexes]
+           ids = ids[indexes]
+         elsif @user_recs_index && count
+           predictions, ids = @user_recs_index.search(@user_factors[u, true].expand_dims(0), count + rated.size).map { |v| v[0, true] }
          else
-           @rated[u].keys.each do |i|
-             predictions.delete_at(i)
-           end
+           predictions = @item_factors.inner(@user_factors[u, true])
+           # TODO make sure reverse isn't hurting performance
+           indexes = predictions.sort_index.reverse
+           indexes = indexes[0...[count + rated.size, indexes.size].min] if count
+           predictions = predictions[indexes]
+           ids = indexes
          end

-         predictions.sort_by! { |pred| -pred[:score] } # already sorted by id
-         predictions = predictions.first(count) if count && !item_ids
-         predictions
+         predictions.inplace.clip(@min_rating, @max_rating) if @min_rating
+
+         keys = @item_map.keys
+         result = []
+         ids.each_with_index do |item_id, i|
+           next if rated[item_id]
+
+           result << {item_id: keys[item_id], score: predictions[i]}
+           break if result.size == count
+         end
+         result
+       elsif @top_items
+         top_items(count: count)
        else
-         # no items if user is unknown
-         # TODO maybe most popular items
          []
        end
      end

-     def optimize_similar_items
-       @item_index = create_index(@item_factors)
+     def similar_items(item_id, count: 5)
+       check_fit
+       similar(item_id, @item_map, item_norms, count, @similar_items_index)
      end
-     alias_method :optimize_item_recs, :optimize_similar_items
+     alias_method :item_recs, :similar_items

-     def optimize_similar_users
-       @user_index = create_index(@user_factors)
+     def similar_users(user_id, count: 5)
+       check_fit
+       similar(user_id, @user_map, user_norms, count, @similar_users_index)
      end

-     def similar_items(item_id, count: 5)
-       similar(item_id, @item_map, @item_factors, item_norms, count, @item_index)
+     def top_items(count: 5)
+       check_fit
+       raise "top_items not computed" unless @top_items
+
+       if @implicit
+         scores = @item_count
+       else
+         require "wilson_score"
+
+         range = @min_rating..@max_rating
+         scores = @item_sum.zip(@item_count).map { |s, c| WilsonScore.rating_lower_bound(s / c, c, range) }
+       end
+
+       scores = scores.map.with_index.sort_by { |s, _| -s }
+       scores = scores.first(count) if count
+       item_ids = item_ids()
+       scores.map do |s, i|
+         {item_id: item_ids[i], score: s}
+       end
      end
-     alias_method :item_recs, :similar_items

-     def similar_users(user_id, count: 5)
-       similar(user_id, @user_map, @user_factors, user_norms, count, @user_index)
+     def user_ids
+       @user_map.keys
+     end
+
+     def item_ids
+       @item_map.keys
+     end
+
+     def user_factors(user_id = nil)
+       if user_id
+         u = @user_map[user_id]
+         @user_factors[u, true] if u
+       else
+         @user_factors
+       end
+     end
+
+     def item_factors(item_id = nil)
+       if item_id
+         i = @item_map[item_id]
+         @item_factors[i, true] if i
+       else
+         @item_factors
+       end
+     end
+
+     def optimize_user_recs
+       check_fit
+       @user_recs_index = create_index(item_factors, library: "faiss")
+     end
+
+     def optimize_similar_items(library: nil)
+       check_fit
+       @similar_items_index = create_index(item_norms, library: library)
+     end
+     alias_method :optimize_item_recs, :optimize_similar_items
+
+     def optimize_similar_users(library: nil)
+       check_fit
+       @similar_users_index = create_index(user_norms, library: library)
      end

      private

-     def create_index(factors)
-       require "ngt"
+     # factors should already be normalized for similar users/items
+     def create_index(factors, library:)
+       # TODO make Faiss the default in 0.3.0
+       library ||= defined?(Faiss) && !defined?(Ngt) ? "faiss" : "ngt"
+
+       case library
+       when "faiss"
+         require "faiss"
+
+         # inner product is cosine similarity with normalized vectors
+         # https://github.com/facebookresearch/faiss/issues/95
+         #
+         # TODO use non-exact index
+         # https://github.com/facebookresearch/faiss/wiki/Faiss-indexes
+         index = Faiss::IndexFlatIP.new(factors.shape[1])
+
+         # ids are from 0...total
+         # https://github.com/facebookresearch/faiss/blob/96b740abedffc8f67389f29c2a180913941534c6/faiss/Index.h#L89
+         index.add(factors)
+
+         index
+       when "ngt"
+         require "ngt"

-       index = Ngt::Index.new(factors.shape[1], distance_type: "Cosine")
-       index.batch_insert(factors)
-       index
+         # could speed up search with normalized cosine
+         # https://github.com/yahoojapan/NGT/issues/36
+         index = Ngt::Index.new(factors.shape[1], distance_type: "Cosine")
+
+         # NGT normalizes so could call create_index with factors instead of norms
+         # but keep code simple for now
+         ids = index.batch_insert(factors)
+         raise "Unexpected ids. Please report a bug." if ids.first != 1 || ids.last != factors.shape[0]
+
+         index
+       else
+         raise ArgumentError, "Invalid library: #{library}"
+       end
      end

      def user_norms
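
The new `predict` method in the hunk above takes the inner product of each user's and item's factor rows, clips the result to the rating range for explicit feedback, and falls back to the global mean when the user or item is unknown. The sketch below walks through those three steps with plain arrays. Everything in it (the `sketch_predict` name, the maps, the factor values, and the mean) is a hand-made assumption for illustration, and it assumes explicit feedback so clipping always applies.

```ruby
# Hand-made stand-ins for a fitted model's state (all values are assumptions)
user_map     = {"alice" => 0, "bob" => 1}
item_map     = {"x" => 0, "y" => 1}
user_factors = [[1.0, 2.0], [0.5, 0.5]]
item_factors = [[1.0, 1.0], [3.0, 2.0]]
global_mean  = 3.5
min_rating, max_rating = 1.0, 5.0

# Hypothetical helper mirroring the steps of predict on plain arrays
def sketch_predict(rows, user_map, item_map, user_factors, item_factors, global_mean, min_rating, max_rating)
  rows.map do |row|
    u = user_map[row[:user_id]]
    i = item_map[row[:item_id]]
    # unknown user or item: fall back to the global mean
    next global_mean if u.nil? || i.nil?

    # dot product of the two factor rows, clipped to the rating range
    score = user_factors[u].zip(item_factors[i]).sum { |a, b| a * b }
    score.clamp(min_rating, max_rating)
  end
end

predictions = sketch_predict(
  [
    {user_id: "alice", item_id: "x"}, # 1*1 + 2*1 = 3.0, inside the range
    {user_id: "alice", item_id: "y"}, # 1*3 + 2*2 = 7.0, clipped to 5.0
    {user_id: "carol", item_id: "x"}  # unknown user, global mean
  ],
  user_map, item_map, user_factors, item_factors, global_mean, min_rating, max_rating
)
```

Unlike `user_recs`, this path never filters out items the user has already rated, which matches the comment on `predict` in the diff.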
@@ -139,63 +271,61 @@ module Disco
      end

      def norms(factors)
-       norms = Numo::DFloat::Math.sqrt((factors * factors).sum(axis: 1))
+       norms = Numo::SFloat::Math.sqrt((factors * factors).sum(axis: 1))
        norms[norms.eq(0)] = 1e-10 # no zeros
-       norms
+       factors / norms.expand_dims(1)
      end

-     def similar(id, map, factors, norms, count, index)
+     def similar(id, map, norm_factors, count, index)
        i = map[id]
-       if i
+
+       if i && norm_factors.shape[0] > 1
          if index && count
-           keys = map.keys
-           result = index.search(factors[i, true], size: count + 1)[1..-1]
-           result.map do |v|
-             {
-               # ids from batch_insert start at 1 instead of 0
-               item_id: keys[v[:id] - 1],
-               # convert cosine distance to cosine similarity
-               score: 1 - v[:distance]
-             }
+           if defined?(Faiss) && index.is_a?(Faiss::Index)
+             predictions, ids = index.search(norm_factors[i, true].expand_dims(0), count + 1).map { |v| v.to_a[0] }
+           else
+             result = index.search(norm_factors[i, true], size: count + 1)
+             # ids from batch_insert start at 1 instead of 0
+             ids = result.map { |v| v[:id] - 1 }
+             # convert cosine distance to cosine similarity
+             predictions = result.map { |v| 1 - v[:distance] }
            end
          else
-           predictions = factors.dot(factors[i, true]) / norms
-
-           predictions =
-             map.keys.zip(predictions).map do |item_id, pred|
-               {item_id: item_id, score: pred}
-             end
-
-           max_score = predictions.delete_at(i)[:score]
-           predictions.sort_by! { |pred| -pred[:score] } # already sorted by id
-           predictions = predictions.first(count) if count
-           # divide by max score to get cosine similarity
-           # only need to do for returned records
-           # could alternatively do cosine distance = 1 - cosine similarity
-           # predictions.each { |pred| pred[:score] /= max_score }
-           predictions
+           predictions = norm_factors.inner(norm_factors[i, true])
+           indexes = predictions.sort_index.reverse
+           indexes = indexes[0...[count + 1, indexes.size].min] if count
+           predictions = predictions[indexes]
+           ids = indexes
+         end
+
+         keys = map.keys
+
+         # TODO use user_id for similar_users in 0.3.0
+         key = :item_id
+
+         (1...ids.size).map do |i|
+           {key => keys[ids[i]], score: predictions[i]}
          end
        else
          []
        end
      end

-     def create_maps(train_set)
-       user_ids = train_set.map { |v| v[:user_id] }.uniq.sort
-       item_ids = train_set.map { |v| v[:item_id] }.uniq.sort
+     def update_maps(train_set)
+       raise ArgumentError, "Missing user_id" if train_set.any? { |v| v[:user_id].nil? }
+       raise ArgumentError, "Missing item_id" if train_set.any? { |v| v[:item_id].nil? }

-       raise ArgumentError, "Missing user_id" if user_ids.any?(&:nil?)
-       raise ArgumentError, "Missing item_id" if item_ids.any?(&:nil?)
-
-       @user_map = user_ids.zip(user_ids.size.times).to_h
-       @item_map = item_ids.zip(item_ids.size.times).to_h
+       train_set.each do |v|
+         @user_map[v[:user_id]] ||= @user_map.size
+         @item_map[v[:item_id]] ||= @item_map.size
+       end
      end

      def check_ratings(ratings)
-       unless ratings.all? { |r| !r.nil? }
+       unless ratings.all? { |r| !r[:rating].nil? }
          raise ArgumentError, "Missing ratings"
        end
-       unless ratings.all? { |r| r.is_a?(Numeric) }
+       unless ratings.all? { |r| r[:rating].is_a?(Numeric) }
          raise ArgumentError, "Ratings must be numeric"
        end
      end
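
`update_maps` replaces the old sorted, rebuilt-from-scratch maps with an incremental assignment: each unseen id gets the next free index and ids seen earlier keep theirs. A small sketch of that `||=` pattern follows, using a hypothetical `update_map` helper and made-up ids:

```ruby
# Hypothetical helper: assign each new id the next free index;
# ids already in the map keep their existing index.
def update_map(map, ids)
  ids.each { |id| map[id] ||= map.size }
  map
end

item_map = {}
update_map(item_map, ["b", "a", "b"]) # first pass
update_map(item_map, ["c", "a"])      # later pass on the same map
```

One visible consequence is that indexes follow order of first appearance rather than sorted order. Index `0` is truthy in Ruby, so `||=` does not reassign the first id.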
@@ -204,6 +334,10 @@ module Disco
        raise ArgumentError, "No training data" if train_set.empty?
      end

+     def check_fit
+       raise "Not fit" unless defined?(@implicit)
+     end
+
      def to_dataset(dataset)
        if defined?(Rover::DataFrame) && dataset.is_a?(Rover::DataFrame)
          # convert keys to symbols
@@ -239,6 +373,11 @@ module Disco
          obj[:max_rating] = @max_rating
        end

+       if @top_items
+         obj[:item_count] = @item_count
+         obj[:item_sum] = @item_sum
+       end
+
        obj
      end

@@ -255,6 +394,12 @@ module Disco
          @min_rating = obj[:min_rating]
          @max_rating = obj[:max_rating]
        end
+
+       @top_items = obj.key?(:item_count)
+       if @top_items
+         @item_count = obj[:item_count]
+         @item_sum = obj[:item_sum]
+       end
      end
    end
  end
data/lib/disco/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Disco
-   VERSION = "0.1.3"
+   VERSION = "0.2.5"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: disco
  version: !ruby/object:Gem::Version
- version: 0.1.3
+ version: 0.2.5
  platform: ruby
  authors:
  - Andrew Kane
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2020-06-29 00:00:00.000000000 Z
+ date: 2021-02-20 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: libmf
@@ -38,120 +38,8 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
- - !ruby/object:Gem::Dependency
- name: bundler
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: rake
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: minitest
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '5'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '5'
- - !ruby/object:Gem::Dependency
- name: activerecord
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: sqlite3
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: daru
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: rover-df
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: '0'
- - !ruby/object:Gem::Dependency
- name: ngt
- requirement: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: 0.2.3
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
- requirements:
- - - ">="
- - !ruby/object:Gem::Version
- version: 0.2.3
- description:
- email: andrew@chartkick.com
+ description:
+ email: andrew@ankane.org
  executables: []
  extensions: []
  extra_rdoc_files: []
@@ -163,6 +51,7 @@ files:
  - lib/disco.rb
  - lib/disco/data.rb
  - lib/disco/engine.rb
+ - lib/disco/metrics.rb
  - lib/disco/model.rb
  - lib/disco/recommender.rb
  - lib/disco/version.rb
@@ -172,7 +61,7 @@ homepage: https://github.com/ankane/disco
  licenses:
  - MIT
  metadata: {}
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -187,8 +76,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.1.2
- signing_key:
+ rubygems_version: 3.2.3
+ signing_key:
  specification_version: 4
- summary: Collaborative filtering for Ruby
+ summary: Recommendations for Ruby and Rails using collaborative filtering
  test_files: []