disco 0.1.3 → 0.2.5

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 33961b51cd8461f821c4622f5983b2ac6138cc3b70c9be8ef1d3a6e82c37ab9e
- data.tar.gz: f4e8cdfa4efb354878c459b57b522a81cd3f0c81e4297c53f9dc88517b312ac8
+ metadata.gz: 8fbecb858b316ed39a9cb726263e182561cba6df498e6253d88c79ebec5cab05
+ data.tar.gz: 42eb38a6e4e0b3fc5a9452deae5a48676ae9a53e78eeb6197718a0c94bd02b6b
  SHA512:
- metadata.gz: 2f4c207486e858a23480e52b4b9a479fd23b26f0259ef12e39b964d9d7f4cc0067f162207d88119f76414269d65e3ee3d7c675c46f5f143c5b016eacab6e888c
- data.tar.gz: 2734c1dcc87c423566dd2f842ef7fdd1b7e3cbaa1ecac61dbfafdbc1769b43edca81d28ce60712008eee9d381d64c9e2dea71b210c1a10fecaef75696ee2fd05
+ metadata.gz: d0250346d75fba75064a29578f6bfd39f09ecf712ba2e505b97a4952b5ff8b31af307eb1b912e9b25cc3dc28dee0d096bea44b47bb2ef268859bb4171f0ef8b2
+ data.tar.gz: 7b341328c12885efd0ffece4201036bb9457caee80a48a99ba110af9a81bcf832bbc1e8f8f5f14e7fddffef2dd3f4643837e0d569c997ab0c2d9ae85e12422f7
data/CHANGELOG.md CHANGED
@@ -1,3 +1,35 @@
+ ## 0.2.5 (2021-02-20)
+
+ - Added `top_items` method
+ - Added `optimize_similar_users` method
+ - Added support for Faiss for `optimize_item_recs` and `optimize_similar_users` methods
+ - Added `rmse` method
+ - Improved performance
+
+ ## 0.2.4 (2021-02-15)
+
+ - Added `user_ids` and `item_ids` methods
+ - Added `user_id` argument to `user_factors`
+ - Added `item_id` argument to `item_factors`
+
+ ## 0.2.3 (2020-11-28)
+
+ - Added `predict` method
+ - Fixed bad recommendations and scores with `user_recs` and explicit feedback
+ - Fixed `item_ids` option for `user_recs`
+
+ ## 0.2.2 (n/a)
+
+ - Not available (released by previous gem owner)
+
+ ## 0.2.1 (2020-10-28)
+
+ - Fixed issue with `user_recs` returning rated items
+
+ ## 0.2.0 (2020-07-31)
+
+ - Changed score to always be between -1 and 1 for `item_recs` and `similar_users` (cosine similarity - this makes it easier to understand and consistent with `optimize_item_recs` and `optimize_similar_users`)
+
  ## 0.1.3 (2020-06-28)
 
  - Added support for Rover
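
The 0.2.0 entry describes scores as cosine similarity in the range -1 to 1. A minimal sketch of that idea in plain Ruby, assuming small dense factor vectors. The `normalize` and `cosine` helpers are hypothetical names for illustration and are not part of Disco's API:

```ruby
# Hypothetical helpers for illustration; not part of Disco.

# Scale a vector to unit length, guarding against a zero norm.
def normalize(vec)
  norm = Math.sqrt(vec.sum { |x| x * x })
  norm = 1e-10 if norm.zero?
  vec.map { |x| x / norm }
end

# Cosine similarity is the inner product of the two unit vectors.
def cosine(a, b)
  normalize(a).zip(normalize(b)).sum { |x, y| x * y }
end

same = cosine([1.0, 2.0], [2.0, 4.0])        # parallel vectors
opposite = cosine([1.0, 2.0], [-1.0, -2.0])  # opposite direction
orthogonal = cosine([1.0, 0.0], [0.0, 1.0])  # perpendicular vectors
```

Parallel vectors score close to 1, opposite vectors close to -1, and perpendicular vectors close to 0, regardless of their lengths.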
data/LICENSE.txt CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2019 Andrew Kane
+ Copyright (c) 2019-2021 Andrew Kane
 
  MIT License
 
data/README.md CHANGED
@@ -1,12 +1,12 @@
1
1
  # Disco
2
2
 
3
- :fire: Collaborative filtering for Ruby
3
+ :fire: Recommendations for Ruby and Rails using collaborative filtering
4
4
 
5
5
  - Supports user-based and item-based recommendations
6
6
  - Works with explicit and implicit feedback
7
7
  - Uses high-performance matrix factorization
8
8
 
9
- [![Build Status](https://travis-ci.org/ankane/disco.svg?branch=master)](https://travis-ci.org/ankane/disco)
9
+ [![Build Status](https://github.com/ankane/disco/workflows/build/badge.svg?branch=master)](https://github.com/ankane/disco/actions)
10
10
 
11
11
  ## Installation
12
12
 
@@ -46,13 +46,13 @@ recommender.fit([
46
46
 
47
47
  > Use `value` instead of rating for implicit feedback
48
48
 
49
- Get user-based (user-item) recommendations - “users like you also liked”
49
+ Get user-based recommendations - “users like you also liked”
50
50
 
51
51
  ```ruby
52
52
  recommender.user_recs(user_id)
53
53
  ```
54
54
 
55
- Get item-based (item-item) recommendations - “users who liked this item also liked”
55
+ Get item-based recommendations - “users who liked this item also liked”
56
56
 
57
57
  ```ruby
58
58
  recommender.item_recs(item_id)
@@ -64,10 +64,10 @@ Use the `count` option to specify the number of recommendations (default is 5)
64
64
  recommender.user_recs(user_id, count: 3)
65
65
  ```
66
66
 
67
- Get predicted ratings for specific items
67
+ Get predicted ratings for specific users and items
68
68
 
69
69
  ```ruby
70
- recommender.user_recs(user_id, item_ids: [1, 2, 3])
70
+ recommender.predict([{user_id: 1, item_id: 2}, {user_id: 2, item_id: 4}])
71
71
  ```
72
72
 
73
73
  Get similar users
@@ -101,14 +101,15 @@ recommender.item_recs("Star Wars (1977)")
101
101
  ```ruby
102
102
  views = Ahoy::Event.
103
103
  where(name: "Viewed post").
104
- group(:user_id, "properties->>'post_id'") # postgres syntax
104
+ group(:user_id).
105
+ group("properties->>'post_id'"). # postgres syntax
105
106
  count
106
107
 
107
108
  data =
108
109
  views.map do |(user_id, post_id), count|
109
110
  {
110
111
  user_id: user_id,
111
- post_id: post_id,
112
+ item_id: post_id,
112
113
  value: count
113
114
  }
114
115
  end
@@ -200,6 +201,8 @@ bin = File.binread("recommender.bin")
200
201
  recommender = Marshal.load(bin)
201
202
  ```
202
203
 
204
+ Alternatively, you can store only the factors and use a library like [Neighbor](https://github.com/ankane/neighbor)
205
+
203
206
  ## Algorithms
204
207
 
205
208
  Disco uses high-performance matrix factorization.
@@ -236,6 +239,16 @@ There are a number of ways to deal with this, but here are some common ones:
236
239
  - For user-based recommendations, show new users the most popular items.
237
240
  - For item-based recommendations, make content-based recommendations with a gem like [tf-idf-similarity](https://github.com/jpmckinney/tf-idf-similarity).
238
241
 
242
+ Get top items with:
243
+
244
+ ```ruby
245
+ recommender = Disco::Recommender.new(top_items: true)
246
+ recommender.fit(data)
247
+ recommender.top_items
248
+ ```
249
+
250
+ This uses [Wilson score](https://www.evanmiller.org/how-not-to-sort-by-average-rating.html) for explicit feedback (add [wilson_score](https://github.com/instacart/wilson_score) to your application’s Gemfile) and item frequency for implicit feedback.
251
+
239
252
  ## Data
240
253
 
241
254
  Data can be an array of hashes
@@ -256,23 +269,29 @@ Or a Daru data frame
256
269
  Daru::DataFrame.from_csv("ratings.csv")
257
270
  ```
258
271
 
259
- ## Faster Similarity
272
+ ## Performance [master]
260
273
 
261
- If you have a large number of users/items, you can use an approximate nearest neighbors library like [NGT](https://github.com/ankane/ngt) to speed up item-based recommendations and similar users.
274
+ If you have a large number of users or items, you can use an approximate nearest neighbors library like [Faiss](https://github.com/ankane/faiss) to improve the performance of certain methods.
262
275
 
263
276
  Add this line to your application’s Gemfile:
264
277
 
265
278
  ```ruby
266
- gem 'ngt', '>= 0.3.0'
279
+ gem 'faiss'
280
+ ```
281
+
282
+ Speed up the `user_recs` method with:
283
+
284
+ ```ruby
285
+ model.optimize_user_recs
267
286
  ```
268
287
 
269
- Speed up item-based recommendations with:
288
+ Speed up the `item_recs` method with:
270
289
 
271
290
  ```ruby
272
291
  model.optimize_item_recs
273
292
  ```
274
293
 
275
- Speed up similar users with:
294
+ Speed up the `similar_users` method with:
276
295
 
277
296
  ```ruby
278
297
  model.optimize_similar_users
@@ -282,19 +301,33 @@ This should be called after fitting or loading the model.
282
301
 
283
302
  ## Reference
284
303
 
304
+ Get ids
305
+
306
+ ```ruby
307
+ recommender.user_ids
308
+ recommender.item_ids
309
+ ```
310
+
285
311
  Get the global mean
286
312
 
287
313
  ```ruby
288
314
  recommender.global_mean
289
315
  ```
290
316
 
291
- Get the factors
317
+ Get factors
292
318
 
293
319
  ```ruby
294
320
  recommender.user_factors
295
321
  recommender.item_factors
296
322
  ```
297
323
 
324
+ Get factors for specific users and items
325
+
326
+ ```ruby
327
+ recommender.user_factors(user_id)
328
+ recommender.item_factors(item_id)
329
+ ```
330
+
298
331
  ## Credits
299
332
 
300
333
  Thanks to:
@@ -315,3 +348,12 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
315
348
  - Fix bugs and [submit pull requests](https://github.com/ankane/disco/pulls)
316
349
  - Write, clarify, or fix documentation
317
350
  - Suggest or add new features
351
+
352
+ To get started with development:
353
+
354
+ ```sh
355
+ git clone https://github.com/ankane/disco.git
356
+ cd disco
357
+ bundle install
358
+ bundle exec rake test
359
+ ```
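
The Ahoy example in this README diff groups by two columns, so each key of the resulting hash is a `[user_id, post_id]` pair. A small sketch of the reshaping step in plain Ruby, assuming a hand-written `views` hash in place of the database query (the sample ids and counts are made up):

```ruby
# Stand-in for the result of the grouped count query: keys are
# [user_id, post_id] pairs, values are view counts (made-up sample data).
views = {
  [1, "10"] => 3,
  [1, "11"] => 1,
  [2, "10"] => 5
}

# Destructure each key into user_id and post_id, and emit the
# item_id/value shape used for implicit feedback.
data =
  views.map do |(user_id, post_id), count|
    {user_id: user_id, item_id: post_id, value: count}
  end
```

The key is `item_id` rather than `post_id`, matching the rename shown in the diff.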
data/lib/disco.rb CHANGED
@@ -9,6 +9,7 @@ require "net/http"
9
9
 
10
10
  # modules
11
11
  require "disco/data"
12
+ require "disco/metrics"
12
13
  require "disco/recommender"
13
14
  require "disco/version"
14
15
 
data/lib/disco/data.rb CHANGED
@@ -36,8 +36,7 @@ module Disco
36
36
 
37
37
  return dest if File.exist?(dest)
38
38
 
39
- temp_dir ||= File.dirname(Tempfile.new("disco"))
40
- temp_path = "#{temp_dir}/#{Time.now.to_f}" # TODO better name
39
+ temp_path = "#{Dir.tmpdir}/disco-#{Time.now.to_f}" # TODO better name
41
40
 
42
41
  digest = Digest::SHA2.new
43
42
 
data/lib/disco/metrics.rb ADDED
@@ -0,0 +1,10 @@
+ module Disco
+   module Metrics
+     class << self
+       def rmse(act, exp)
+         raise ArgumentError, "Size mismatch" if act.size != exp.size
+         Math.sqrt(act.zip(exp).sum { |a, e| (a - e)**2 } / act.size.to_f)
+       end
+     end
+   end
+ end
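
As a quick check of the formula, the sketch below repeats the same computation in a standalone module (named `RmseSketch` here, a hypothetical name, so it does not depend on the gem being installed) and applies it to made-up values:

```ruby
# Standalone copy of the RMSE computation for illustration; the module
# name RmseSketch is hypothetical.
module RmseSketch
  def self.rmse(act, exp)
    raise ArgumentError, "Size mismatch" if act.size != exp.size
    # mean of the squared errors, then the square root
    Math.sqrt(act.zip(exp).sum { |a, e| (a - e)**2 } / act.size.to_f)
  end
end

# Errors are 0, 0, and 2, so the result is sqrt(4 / 3)
error = RmseSketch.rmse([1, 2, 3], [1, 2, 5])
```

Passing arrays of different lengths raises `ArgumentError`, as in the diffed method.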
data/lib/disco/recommender.rb CHANGED
@@ -1,32 +1,33 @@
1
1
  module Disco
2
2
  class Recommender
3
- attr_reader :global_mean, :item_factors, :user_factors
3
+ attr_reader :global_mean
4
4
 
5
- def initialize(factors: 8, epochs: 20, verbose: nil)
5
+ def initialize(factors: 8, epochs: 20, verbose: nil, top_items: false)
6
6
  @factors = factors
7
7
  @epochs = epochs
8
8
  @verbose = verbose
9
+ @user_map = {}
10
+ @item_map = {}
11
+ @top_items = top_items
9
12
  end
10
13
 
11
14
  def fit(train_set, validation_set: nil)
12
15
  train_set = to_dataset(train_set)
13
16
  validation_set = to_dataset(validation_set) if validation_set
14
17
 
15
- @implicit = !train_set.any? { |v| v[:rating] }
18
+ check_training_set(train_set)
16
19
 
20
+ @implicit = !train_set.any? { |v| v[:rating] }
17
21
  unless @implicit
18
- ratings = train_set.map { |o| o[:rating] }
19
- check_ratings(ratings)
20
- @min_rating = ratings.min
21
- @max_rating = ratings.max
22
+ check_ratings(train_set)
23
+ @min_rating, @max_rating = train_set.minmax_by { |o| o[:rating] }.map { |o| o[:rating] }
22
24
 
23
25
  if validation_set
24
- check_ratings(validation_set.map { |o| o[:rating] })
26
+ check_ratings(validation_set)
25
27
  end
26
28
  end
27
29
 
28
- check_training_set(train_set)
29
- create_maps(train_set)
30
+ update_maps(train_set)
30
31
 
31
32
  @rated = Hash.new { |hash, key| hash[key] = {} }
32
33
  input = []
@@ -41,6 +42,16 @@ module Disco
41
42
  end
42
43
  @rated.default = nil
43
44
 
45
+ if @top_items
46
+ @item_count = [0] * @item_map.size
47
+ @item_sum = [0.0] * @item_map.size
48
+ train_set.each do |v|
49
+ i = @item_map[v[:item_id]]
50
+ @item_count[i] += 1
51
+ @item_sum[i] += (v[value_key] || 1)
52
+ end
53
+ end
54
+
44
55
  eval_set = nil
45
56
  if validation_set
46
57
  eval_set = []
@@ -67,67 +78,188 @@ module Disco
67
78
  @user_factors = model.p_factors(format: :numo)
68
79
  @item_factors = model.q_factors(format: :numo)
69
80
 
70
- @user_index = nil
71
- @item_index = nil
81
+ @user_recs_index = nil
82
+ @similar_users_index = nil
83
+ @similar_items_index = nil
84
+ end
85
+
86
+ # generates a prediction even if a user has already rated the item
87
+ def predict(data)
88
+ data = to_dataset(data)
89
+
90
+ u = data.map { |v| @user_map[v[:user_id]] }
91
+ i = data.map { |v| @item_map[v[:item_id]] }
92
+
93
+ new_index = data.each_index.select { |index| u[index].nil? || i[index].nil? }
94
+ new_index.each do |j|
95
+ u[j] = 0
96
+ i[j] = 0
97
+ end
98
+
99
+ predictions = @user_factors[u, true].inner(@item_factors[i, true])
100
+ predictions.inplace.clip(@min_rating, @max_rating) if @min_rating
101
+ predictions[new_index] = @global_mean
102
+ predictions.to_a
72
103
  end
73
104
 
74
105
  def user_recs(user_id, count: 5, item_ids: nil)
106
+ check_fit
75
107
  u = @user_map[user_id]
76
108
 
77
109
  if u
78
- predictions = @global_mean + @item_factors.dot(@user_factors[u, true])
79
- predictions.inplace.clip(@min_rating, @max_rating) if @min_rating
80
-
81
- predictions =
82
- @item_map.keys.zip(predictions).map do |item_id, pred|
83
- {item_id: item_id, score: pred}
84
- end
110
+ rated = item_ids ? {} : @rated[u]
85
111
 
86
112
  if item_ids
87
- idx = item_ids.map { |i| @item_map[i] }.compact
88
- predictions.values_at(*idx)
113
+ ids = Numo::NArray.cast(item_ids.map { |i| @item_map[i] }.compact)
114
+ return [] if ids.size == 0
115
+
116
+ predictions = @item_factors[ids, true].inner(@user_factors[u, true])
117
+ indexes = predictions.sort_index.reverse
118
+ indexes = indexes[0...[count + rated.size, indexes.size].min] if count
119
+ predictions = predictions[indexes]
120
+ ids = ids[indexes]
121
+ elsif @user_recs_index && count
122
+ predictions, ids = @user_recs_index.search(@user_factors[u, true].expand_dims(0), count + rated.size).map { |v| v[0, true] }
89
123
  else
90
- @rated[u].keys.each do |i|
91
- predictions.delete_at(i)
92
- end
124
+ predictions = @item_factors.inner(@user_factors[u, true])
125
+ # TODO make sure reverse isn't hurting performance
126
+ indexes = predictions.sort_index.reverse
127
+ indexes = indexes[0...[count + rated.size, indexes.size].min] if count
128
+ predictions = predictions[indexes]
129
+ ids = indexes
93
130
  end
94
131
 
95
- predictions.sort_by! { |pred| -pred[:score] } # already sorted by id
96
- predictions = predictions.first(count) if count && !item_ids
97
- predictions
132
+ predictions.inplace.clip(@min_rating, @max_rating) if @min_rating
133
+
134
+ keys = @item_map.keys
135
+ result = []
136
+ ids.each_with_index do |item_id, i|
137
+ next if rated[item_id]
138
+
139
+ result << {item_id: keys[item_id], score: predictions[i]}
140
+ break if result.size == count
141
+ end
142
+ result
143
+ elsif @top_items
144
+ top_items(count: count)
98
145
  else
99
- # no items if user is unknown
100
- # TODO maybe most popular items
101
146
  []
102
147
  end
103
148
  end
104
149
 
105
- def optimize_similar_items
106
- @item_index = create_index(@item_factors)
150
+ def similar_items(item_id, count: 5)
151
+ check_fit
152
+ similar(item_id, @item_map, item_norms, count, @similar_items_index)
107
153
  end
108
- alias_method :optimize_item_recs, :optimize_similar_items
154
+ alias_method :item_recs, :similar_items
109
155
 
110
- def optimize_similar_users
111
- @user_index = create_index(@user_factors)
156
+ def similar_users(user_id, count: 5)
157
+ check_fit
158
+ similar(user_id, @user_map, user_norms, count, @similar_users_index)
112
159
  end
113
160
 
114
- def similar_items(item_id, count: 5)
115
- similar(item_id, @item_map, @item_factors, item_norms, count, @item_index)
161
+ def top_items(count: 5)
162
+ check_fit
163
+ raise "top_items not computed" unless @top_items
164
+
165
+ if @implicit
166
+ scores = @item_count
167
+ else
168
+ require "wilson_score"
169
+
170
+ range = @min_rating..@max_rating
171
+ scores = @item_sum.zip(@item_count).map { |s, c| WilsonScore.rating_lower_bound(s / c, c, range) }
172
+ end
173
+
174
+ scores = scores.map.with_index.sort_by { |s, _| -s }
175
+ scores = scores.first(count) if count
176
+ item_ids = item_ids()
177
+ scores.map do |s, i|
178
+ {item_id: item_ids[i], score: s}
179
+ end
116
180
  end
117
- alias_method :item_recs, :similar_items
118
181
 
119
- def similar_users(user_id, count: 5)
120
- similar(user_id, @user_map, @user_factors, user_norms, count, @user_index)
182
+ def user_ids
183
+ @user_map.keys
184
+ end
185
+
186
+ def item_ids
187
+ @item_map.keys
188
+ end
189
+
190
+ def user_factors(user_id = nil)
191
+ if user_id
192
+ u = @user_map[user_id]
193
+ @user_factors[u, true] if u
194
+ else
195
+ @user_factors
196
+ end
197
+ end
198
+
199
+ def item_factors(item_id = nil)
200
+ if item_id
201
+ i = @item_map[item_id]
202
+ @item_factors[i, true] if i
203
+ else
204
+ @item_factors
205
+ end
206
+ end
207
+
208
+ def optimize_user_recs
209
+ check_fit
210
+ @user_recs_index = create_index(item_factors, library: "faiss")
211
+ end
212
+
213
+ def optimize_similar_items(library: nil)
214
+ check_fit
215
+ @similar_items_index = create_index(item_norms, library: library)
216
+ end
217
+ alias_method :optimize_item_recs, :optimize_similar_items
218
+
219
+ def optimize_similar_users(library: nil)
220
+ check_fit
221
+ @similar_users_index = create_index(user_norms, library: library)
121
222
  end
122
223
 
123
224
  private
124
225
 
125
- def create_index(factors)
126
- require "ngt"
226
+ # factors should already be normalized for similar users/items
227
+ def create_index(factors, library:)
228
+ # TODO make Faiss the default in 0.3.0
229
+ library ||= defined?(Faiss) && !defined?(Ngt) ? "faiss" : "ngt"
230
+
231
+ case library
232
+ when "faiss"
233
+ require "faiss"
234
+
235
+ # inner product is cosine similarity with normalized vectors
236
+ # https://github.com/facebookresearch/faiss/issues/95
237
+ #
238
+ # TODO use non-exact index
239
+ # https://github.com/facebookresearch/faiss/wiki/Faiss-indexes
240
+ index = Faiss::IndexFlatIP.new(factors.shape[1])
241
+
242
+ # ids are from 0...total
243
+ # https://github.com/facebookresearch/faiss/blob/96b740abedffc8f67389f29c2a180913941534c6/faiss/Index.h#L89
244
+ index.add(factors)
245
+
246
+ index
247
+ when "ngt"
248
+ require "ngt"
127
249
 
128
- index = Ngt::Index.new(factors.shape[1], distance_type: "Cosine")
129
- index.batch_insert(factors)
130
- index
250
+ # could speed up search with normalized cosine
251
+ # https://github.com/yahoojapan/NGT/issues/36
252
+ index = Ngt::Index.new(factors.shape[1], distance_type: "Cosine")
253
+
254
+ # NGT normalizes so could call create_index with factors instead of norms
255
+ # but keep code simple for now
256
+ ids = index.batch_insert(factors)
257
+ raise "Unexpected ids. Please report a bug." if ids.first != 1 || ids.last != factors.shape[0]
258
+
259
+ index
260
+ else
261
+ raise ArgumentError, "Invalid library: #{library}"
262
+ end
131
263
  end
132
264
 
133
265
  def user_norms
@@ -139,63 +271,61 @@ module Disco
139
271
  end
140
272
 
141
273
  def norms(factors)
142
- norms = Numo::DFloat::Math.sqrt((factors * factors).sum(axis: 1))
274
+ norms = Numo::SFloat::Math.sqrt((factors * factors).sum(axis: 1))
143
275
  norms[norms.eq(0)] = 1e-10 # no zeros
144
- norms
276
+ factors / norms.expand_dims(1)
145
277
  end
146
278
 
147
- def similar(id, map, factors, norms, count, index)
279
+ def similar(id, map, norm_factors, count, index)
148
280
  i = map[id]
149
- if i
281
+
282
+ if i && norm_factors.shape[0] > 1
150
283
  if index && count
151
- keys = map.keys
152
- result = index.search(factors[i, true], size: count + 1)[1..-1]
153
- result.map do |v|
154
- {
155
- # ids from batch_insert start at 1 instead of 0
156
- item_id: keys[v[:id] - 1],
157
- # convert cosine distance to cosine similarity
158
- score: 1 - v[:distance]
159
- }
284
+ if defined?(Faiss) && index.is_a?(Faiss::Index)
285
+ predictions, ids = index.search(norm_factors[i, true].expand_dims(0), count + 1).map { |v| v.to_a[0] }
286
+ else
287
+ result = index.search(norm_factors[i, true], size: count + 1)
288
+ # ids from batch_insert start at 1 instead of 0
289
+ ids = result.map { |v| v[:id] - 1 }
290
+ # convert cosine distance to cosine similarity
291
+ predictions = result.map { |v| 1 - v[:distance] }
160
292
  end
161
293
  else
162
- predictions = factors.dot(factors[i, true]) / norms
163
-
164
- predictions =
165
- map.keys.zip(predictions).map do |item_id, pred|
166
- {item_id: item_id, score: pred}
167
- end
168
-
169
- max_score = predictions.delete_at(i)[:score]
170
- predictions.sort_by! { |pred| -pred[:score] } # already sorted by id
171
- predictions = predictions.first(count) if count
172
- # divide by max score to get cosine similarity
173
- # only need to do for returned records
174
- # could alternatively do cosine distance = 1 - cosine similarity
175
- # predictions.each { |pred| pred[:score] /= max_score }
176
- predictions
294
+ predictions = norm_factors.inner(norm_factors[i, true])
295
+ indexes = predictions.sort_index.reverse
296
+ indexes = indexes[0...[count + 1, indexes.size].min] if count
297
+ predictions = predictions[indexes]
298
+ ids = indexes
299
+ end
300
+
301
+ keys = map.keys
302
+
303
+ # TODO use user_id for similar_users in 0.3.0
304
+ key = :item_id
305
+
306
+ (1...ids.size).map do |i|
307
+ {key => keys[ids[i]], score: predictions[i]}
177
308
  end
178
309
  else
179
310
  []
180
311
  end
181
312
  end
182
313
 
183
- def create_maps(train_set)
184
- user_ids = train_set.map { |v| v[:user_id] }.uniq.sort
185
- item_ids = train_set.map { |v| v[:item_id] }.uniq.sort
314
+ def update_maps(train_set)
315
+ raise ArgumentError, "Missing user_id" if train_set.any? { |v| v[:user_id].nil? }
316
+ raise ArgumentError, "Missing item_id" if train_set.any? { |v| v[:item_id].nil? }
186
317
 
187
- raise ArgumentError, "Missing user_id" if user_ids.any?(&:nil?)
188
- raise ArgumentError, "Missing item_id" if item_ids.any?(&:nil?)
189
-
190
- @user_map = user_ids.zip(user_ids.size.times).to_h
191
- @item_map = item_ids.zip(item_ids.size.times).to_h
318
+ train_set.each do |v|
319
+ @user_map[v[:user_id]] ||= @user_map.size
320
+ @item_map[v[:item_id]] ||= @item_map.size
321
+ end
192
322
  end
193
323
 
194
324
  def check_ratings(ratings)
195
- unless ratings.all? { |r| !r.nil? }
325
+ unless ratings.all? { |r| !r[:rating].nil? }
196
326
  raise ArgumentError, "Missing ratings"
197
327
  end
198
- unless ratings.all? { |r| r.is_a?(Numeric) }
328
+ unless ratings.all? { |r| r[:rating].is_a?(Numeric) }
199
329
  raise ArgumentError, "Ratings must be numeric"
200
330
  end
201
331
  end
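
The `update_maps` change in the hunk above assigns each unseen id the next free index, in first-seen order, instead of sorting all ids up front. A minimal sketch of that pattern in plain Ruby, with a hypothetical `add_ids` helper and made-up ids:

```ruby
# Hypothetical helper showing the first-seen index assignment pattern.
def add_ids(map, ids)
  ids.each do |id|
    # only unseen ids get a new index, equal to the current map size
    map[id] ||= map.size
  end
  map
end

user_map = {}
add_ids(user_map, ["b", "a", "b"])
add_ids(user_map, ["c", "a"])
```

Ids that were already mapped keep their index when more ids are added later.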
@@ -204,6 +334,10 @@ module Disco
204
334
  raise ArgumentError, "No training data" if train_set.empty?
205
335
  end
206
336
 
337
+ def check_fit
338
+ raise "Not fit" unless defined?(@implicit)
339
+ end
340
+
207
341
  def to_dataset(dataset)
208
342
  if defined?(Rover::DataFrame) && dataset.is_a?(Rover::DataFrame)
209
343
  # convert keys to symbols
@@ -239,6 +373,11 @@ module Disco
239
373
  obj[:max_rating] = @max_rating
240
374
  end
241
375
 
376
+ if @top_items
377
+ obj[:item_count] = @item_count
378
+ obj[:item_sum] = @item_sum
379
+ end
380
+
242
381
  obj
243
382
  end
244
383
 
@@ -255,6 +394,12 @@ module Disco
255
394
  @min_rating = obj[:min_rating]
256
395
  @max_rating = obj[:max_rating]
257
396
  end
397
+
398
+ @top_items = obj.key?(:item_count)
399
+ if @top_items
400
+ @item_count = obj[:item_count]
401
+ @item_sum = obj[:item_sum]
402
+ end
258
403
  end
259
404
  end
260
405
  end
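
For implicit feedback, the `top_items` code in this diff ranks items by how often they appear in the training data. A simplified sketch of that ranking in plain Ruby, using a hash of counts instead of the index arrays in the diff, with made-up events:

```ruby
# Made-up implicit feedback events for illustration.
events = [
  {user_id: 1, item_id: "a"},
  {user_id: 2, item_id: "a"},
  {user_id: 1, item_id: "b"},
  {user_id: 3, item_id: "a"},
  {user_id: 3, item_id: "c"},
  {user_id: 2, item_id: "c"}
]

# Count how often each item appears.
item_count = Hash.new(0)
events.each { |v| item_count[v[:item_id]] += 1 }

# Highest counts first, keep the top two.
top = item_count.sort_by { |_, c| -c }.first(2).map do |item_id, c|
  {item_id: item_id, score: c}
end
```

The explicit-feedback branch in the diff uses a Wilson score lower bound instead of the raw count; that part is not reproduced here.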
data/lib/disco/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Disco
-   VERSION = "0.1.3"
+   VERSION = "0.2.5"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: disco
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.3
4
+ version: 0.2.5
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrew Kane
8
- autorequire:
8
+ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2020-06-29 00:00:00.000000000 Z
11
+ date: 2021-02-20 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: libmf
@@ -38,120 +38,8 @@ dependencies:
38
38
  - - ">="
39
39
  - !ruby/object:Gem::Version
40
40
  version: '0'
41
- - !ruby/object:Gem::Dependency
42
- name: bundler
43
- requirement: !ruby/object:Gem::Requirement
44
- requirements:
45
- - - ">="
46
- - !ruby/object:Gem::Version
47
- version: '0'
48
- type: :development
49
- prerelease: false
50
- version_requirements: !ruby/object:Gem::Requirement
51
- requirements:
52
- - - ">="
53
- - !ruby/object:Gem::Version
54
- version: '0'
55
- - !ruby/object:Gem::Dependency
56
- name: rake
57
- requirement: !ruby/object:Gem::Requirement
58
- requirements:
59
- - - ">="
60
- - !ruby/object:Gem::Version
61
- version: '0'
62
- type: :development
63
- prerelease: false
64
- version_requirements: !ruby/object:Gem::Requirement
65
- requirements:
66
- - - ">="
67
- - !ruby/object:Gem::Version
68
- version: '0'
69
- - !ruby/object:Gem::Dependency
70
- name: minitest
71
- requirement: !ruby/object:Gem::Requirement
72
- requirements:
73
- - - ">="
74
- - !ruby/object:Gem::Version
75
- version: '5'
76
- type: :development
77
- prerelease: false
78
- version_requirements: !ruby/object:Gem::Requirement
79
- requirements:
80
- - - ">="
81
- - !ruby/object:Gem::Version
82
- version: '5'
83
- - !ruby/object:Gem::Dependency
84
- name: activerecord
85
- requirement: !ruby/object:Gem::Requirement
86
- requirements:
87
- - - ">="
88
- - !ruby/object:Gem::Version
89
- version: '0'
90
- type: :development
91
- prerelease: false
92
- version_requirements: !ruby/object:Gem::Requirement
93
- requirements:
94
- - - ">="
95
- - !ruby/object:Gem::Version
96
- version: '0'
97
- - !ruby/object:Gem::Dependency
98
- name: sqlite3
99
- requirement: !ruby/object:Gem::Requirement
100
- requirements:
101
- - - ">="
102
- - !ruby/object:Gem::Version
103
- version: '0'
104
- type: :development
105
- prerelease: false
106
- version_requirements: !ruby/object:Gem::Requirement
107
- requirements:
108
- - - ">="
109
- - !ruby/object:Gem::Version
110
- version: '0'
111
- - !ruby/object:Gem::Dependency
112
- name: daru
113
- requirement: !ruby/object:Gem::Requirement
114
- requirements:
115
- - - ">="
116
- - !ruby/object:Gem::Version
117
- version: '0'
118
- type: :development
119
- prerelease: false
120
- version_requirements: !ruby/object:Gem::Requirement
121
- requirements:
122
- - - ">="
123
- - !ruby/object:Gem::Version
124
- version: '0'
125
- - !ruby/object:Gem::Dependency
126
- name: rover-df
127
- requirement: !ruby/object:Gem::Requirement
128
- requirements:
129
- - - ">="
130
- - !ruby/object:Gem::Version
131
- version: '0'
132
- type: :development
133
- prerelease: false
134
- version_requirements: !ruby/object:Gem::Requirement
135
- requirements:
136
- - - ">="
137
- - !ruby/object:Gem::Version
138
- version: '0'
139
- - !ruby/object:Gem::Dependency
140
- name: ngt
141
- requirement: !ruby/object:Gem::Requirement
142
- requirements:
143
- - - ">="
144
- - !ruby/object:Gem::Version
145
- version: 0.2.3
146
- type: :development
147
- prerelease: false
148
- version_requirements: !ruby/object:Gem::Requirement
149
- requirements:
150
- - - ">="
151
- - !ruby/object:Gem::Version
152
- version: 0.2.3
153
- description:
154
- email: andrew@chartkick.com
41
+ description:
42
+ email: andrew@ankane.org
155
43
  executables: []
156
44
  extensions: []
157
45
  extra_rdoc_files: []
@@ -163,6 +51,7 @@ files:
163
51
  - lib/disco.rb
164
52
  - lib/disco/data.rb
165
53
  - lib/disco/engine.rb
54
+ - lib/disco/metrics.rb
166
55
  - lib/disco/model.rb
167
56
  - lib/disco/recommender.rb
168
57
  - lib/disco/version.rb
@@ -172,7 +61,7 @@ homepage: https://github.com/ankane/disco
172
61
  licenses:
173
62
  - MIT
174
63
  metadata: {}
175
- post_install_message:
64
+ post_install_message:
176
65
  rdoc_options: []
177
66
  require_paths:
178
67
  - lib
@@ -187,8 +76,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
187
76
  - !ruby/object:Gem::Version
188
77
  version: '0'
189
78
  requirements: []
190
- rubygems_version: 3.1.2
191
- signing_key:
79
+ rubygems_version: 3.2.3
80
+ signing_key:
192
81
  specification_version: 4
193
- summary: Collaborative filtering for Ruby
82
+ summary: Recommendations for Ruby and Rails using collaborative filtering
194
83
  test_files: []