disco 0.2.0 → 0.2.6

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 96836166496bb14ec8f973fb5e3709e0a28b7d5d8678608d074c0d7613076cc2
-   data.tar.gz: c68d12941fddc51a67938ef951be4ee809faef2e131de0f5856908d8ed1f93d9
+   metadata.gz: a7823dbe0e68967c39a59f8cdc2fe577f4366b492e0559487606b74a7de1cc84
+   data.tar.gz: ba40e46b203e424eccb811c6b042c9a283356c42585b7e00123b4bb2f232b1e2
  SHA512:
-   metadata.gz: 8b960dc961ead701713dbc7f9c104852355ac6d61f9f1f6e1cde43d1d4fa257b2c55859ca42896a6780f006d25f5ff613bf0261e601033db7d3c063f2a9f3d3e
-   data.tar.gz: 6c82413ce53a9100fc97f9a3849c6231ebee4945fd26a3e3a1150f8c3abcca915032d35f4373b83217fa786600181097c003d1d071aab9de00612baf4c4eaa99
+   metadata.gz: ee43326933ac019b0bae631631ba79a7b1e03d1e9669361ef7722aa5a43b7bf2a2f49ccf8b098ab23539392fd09b83224c3cb9d340b80483179fabb45d62ee30
+   data.tar.gz: 9733820cc4e81b22cca51dbf89a02aa87e96cbbc1add753b2799878b5b50b549f2a27886dcfae387ad4cc158ce4bd651354f8bbd2514460ac07a60560ad5c455
data/CHANGELOG.md CHANGED
@@ -1,3 +1,38 @@
+ ## 0.2.6 (2021-02-24)
+
+ - Improved performance
+ - Improved `inspect` method
+ - Fixed issue with `similar_users` and `item_recs` returning the original user/item
+ - Fixed error with `fit` after loading
+
+ ## 0.2.5 (2021-02-20)
+
+ - Added `top_items` method
+ - Added `optimize_similar_users` method
+ - Added support for Faiss for `optimize_item_recs` and `optimize_similar_users` methods
+ - Added `rmse` method
+ - Improved performance
+
+ ## 0.2.4 (2021-02-15)
+
+ - Added `user_ids` and `item_ids` methods
+ - Added `user_id` argument to `user_factors`
+ - Added `item_id` argument to `item_factors`
+
+ ## 0.2.3 (2020-11-28)
+
+ - Added `predict` method
+ - Fixed bad recommendations and scores with `user_recs` and explicit feedback
+ - Fixed `item_ids` option for `user_recs`
+
+ ## 0.2.2 (n/a)
+
+ - Not available (released by previous gem owner)
+
+ ## 0.2.1 (2020-10-28)
+
+ - Fixed issue with `user_recs` returning rated items
+
  ## 0.2.0 (2020-07-31)

  - Changed score to always be between -1 and 1 for `item_recs` and `similar_users` (cosine similarity - this makes it easier to understand and consistent with `optimize_item_recs` and `optimize_similar_users`)
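The 0.2.0 entry above says `item_recs` and `similar_users` scores are cosine similarities, which is why they fall between -1 and 1. As a rough illustration only (plain Ruby arrays rather than the gem's Numo factors, and `cosine_similarity` is a hypothetical helper, not part of Disco), the score could be computed like this:

```ruby
# Hypothetical helper for illustration; Disco itself works on Numo arrays.
# Returns the cosine of the angle between two equal-length numeric vectors.
def cosine_similarity(a, b)
  raise ArgumentError, "Size mismatch" if a.size != b.size

  dot = a.zip(b).sum { |x, y| x * y }
  norm_a = Math.sqrt(a.sum { |x| x * x })
  norm_b = Math.sqrt(b.sum { |y| y * y })
  # assumption: treat a zero vector as having no similarity to anything
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / (norm_a * norm_b)
end

puts cosine_similarity([1.0, 2.0], [2.0, 4.0])   # same direction
puts cosine_similarity([1.0, 0.0], [-1.0, 0.0])  # opposite direction
puts cosine_similarity([1.0, 0.0], [0.0, 1.0])   # orthogonal
```

Because each vector is divided by its own length, the result does not depend on the magnitude of the factors, only on their direction.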
data/LICENSE.txt CHANGED
@@ -1,4 +1,4 @@
- Copyright (c) 2019-2020 Andrew Kane
+ Copyright (c) 2019-2021 Andrew Kane

  MIT License

data/README.md CHANGED
@@ -1,12 +1,12 @@
  # Disco

- :fire: Collaborative filtering for Ruby
+ :fire: Recommendations for Ruby and Rails using collaborative filtering

  - Supports user-based and item-based recommendations
  - Works with explicit and implicit feedback
  - Uses high-performance matrix factorization

- [![Build Status](https://travis-ci.org/ankane/disco.svg?branch=master)](https://travis-ci.org/ankane/disco)
+ [![Build Status](https://github.com/ankane/disco/workflows/build/badge.svg?branch=master)](https://github.com/ankane/disco/actions)

  ## Installation

@@ -44,15 +44,15 @@ recommender.fit([
  ])
  ```

- > Use `value` instead of rating for implicit feedback
+ > Use `value` instead of `rating` for implicit feedback

- Get user-based (user-item) recommendations - “users like you also liked”
+ Get user-based recommendations - “users like you also liked”

  ```ruby
  recommender.user_recs(user_id)
  ```

- Get item-based (item-item) recommendations - “users who liked this item also liked”
+ Get item-based recommendations - “users who liked this item also liked”

  ```ruby
  recommender.item_recs(item_id)
@@ -64,10 +64,10 @@ Use the `count` option to specify the number of recommendations (default is 5)
  recommender.user_recs(user_id, count: 3)
  ```

- Get predicted ratings for specific items
+ Get predicted ratings for specific users and items

  ```ruby
- recommender.user_recs(user_id, item_ids: [1, 2, 3])
+ recommender.predict([{user_id: 1, item_id: 2}, {user_id: 2, item_id: 4}])
  ```

  Get similar users
@@ -101,7 +101,8 @@ recommender.item_recs("Star Wars (1977)")
  ```ruby
  views = Ahoy::Event.
    where(name: "Viewed post").
-   group(:user_id, "properties->>'post_id'"). # postgres syntax
+   group(:user_id).
+   group("properties->>'post_id'"). # postgres syntax
    count

  data =
@@ -200,6 +201,8 @@ bin = File.binread("recommender.bin")
  recommender = Marshal.load(bin)
  ```

+ Alternatively, you can store only the factors and use a library like [Neighbor](https://github.com/ankane/neighbor)
+
  ## Algorithms

  Disco uses high-performance matrix factorization.
@@ -236,6 +239,16 @@ There are a number of ways to deal with this, but here are some common ones:
  - For user-based recommendations, show new users the most popular items.
  - For item-based recommendations, make content-based recommendations with a gem like [tf-idf-similarity](https://github.com/jpmckinney/tf-idf-similarity).

+ Get top items with:
+
+ ```ruby
+ recommender = Disco::Recommender.new(top_items: true)
+ recommender.fit(data)
+ recommender.top_items
+ ```
+
+ This uses [Wilson score](https://www.evanmiller.org/how-not-to-sort-by-average-rating.html) for explicit feedback (add [wilson_score](https://github.com/instacart/wilson_score) to your application’s Gemfile) and item frequency for implicit feedback.
+
  ## Data

  Data can be an array of hashes
@@ -256,23 +269,29 @@ Or a Daru data frame
  Daru::DataFrame.from_csv("ratings.csv")
  ```

- ## Faster Similarity
+ ## Performance

- If you have a large number of users/items, you can use an approximate nearest neighbors library like [NGT](https://github.com/ankane/ngt) to speed up item-based recommendations and similar users.
+ If you have a large number of users or items, you can use an approximate nearest neighbors library like [Faiss](https://github.com/ankane/faiss) to improve the performance of certain methods.

  Add this line to your application’s Gemfile:

  ```ruby
- gem 'ngt', '>= 0.3.0'
+ gem 'faiss'
+ ```
+
+ Speed up the `user_recs` method with:
+
+ ```ruby
+ model.optimize_user_recs
  ```

- Speed up item-based recommendations with:
+ Speed up the `item_recs` method with:

  ```ruby
  model.optimize_item_recs
  ```

- Speed up similar users with:
+ Speed up the `similar_users` method with:

  ```ruby
  model.optimize_similar_users
@@ -282,19 +301,33 @@ This should be called after fitting or loading the model.

  ## Reference

+ Get ids
+
+ ```ruby
+ recommender.user_ids
+ recommender.item_ids
+ ```
+
  Get the global mean

  ```ruby
  recommender.global_mean
  ```

- Get the factors
+ Get factors

  ```ruby
  recommender.user_factors
  recommender.item_factors
  ```

+ Get factors for specific users and items
+
+ ```ruby
+ recommender.user_factors(user_id)
+ recommender.item_factors(item_id)
+ ```
+
  ## Credits

  Thanks to:
@@ -315,3 +348,12 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
  - Fix bugs and [submit pull requests](https://github.com/ankane/disco/pulls)
  - Write, clarify, or fix documentation
  - Suggest or add new features
+
+ To get started with development:
+
+ ```sh
+ git clone https://github.com/ankane/disco.git
+ cd disco
+ bundle install
+ bundle exec rake test
+ ```
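The README changes above state that `top_items` ranks by item frequency for implicit feedback. A minimal sketch of that idea in plain Ruby follows; it is not the gem's implementation, and `top_items_by_frequency` and the sample rows are hypothetical names used only for illustration:

```ruby
# Hypothetical helper: rank items by how often they appear in the feedback rows.
# Each row is a hash with at least an :item_id key.
def top_items_by_frequency(rows, count: 5)
  counts = Hash.new(0)
  rows.each { |row| counts[row[:item_id]] += 1 }

  counts
    .sort_by { |_item_id, seen| -seen } # most frequent first
    .first(count)
    .map { |item_id, seen| {item_id: item_id, score: seen} }
end

# Sample implicit feedback (assumed data, for illustration only)
views = [
  {user_id: 1, item_id: "a"},
  {user_id: 2, item_id: "a"},
  {user_id: 3, item_id: "a"},
  {user_id: 1, item_id: "b"},
  {user_id: 2, item_id: "b"},
  {user_id: 3, item_id: "c"}
]

p top_items_by_frequency(views, count: 2)
```

Ties are returned in no particular order here, since `sort_by` is not guaranteed to be stable.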
data/lib/disco.rb CHANGED
@@ -9,6 +9,7 @@ require "net/http"

  # modules
  require "disco/data"
+ require "disco/metrics"
  require "disco/recommender"
  require "disco/version"

data/lib/disco/metrics.rb ADDED
@@ -0,0 +1,10 @@
+ module Disco
+   module Metrics
+     class << self
+       def rmse(act, exp)
+         raise ArgumentError, "Size mismatch" if act.size != exp.size
+         Math.sqrt(act.zip(exp).sum { |a, e| (a - e)**2 } / act.size.to_f)
+       end
+     end
+   end
+ end
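The new `rmse` method above computes root mean squared error: the square root of the mean of the squared differences between two equal-length lists. The following standalone sketch repeats that formula outside the gem so it can be run without Disco installed; the top-level `rmse` helper and the sample numbers are illustrative assumptions:

```ruby
# Standalone copy of the formula for illustration (hypothetical top-level helper,
# mirroring the rmse method shown in the diff above).
def rmse(actual, expected)
  raise ArgumentError, "Size mismatch" if actual.size != expected.size

  squared_error = actual.zip(expected).sum { |a, e| (a - e)**2 }
  Math.sqrt(squared_error / actual.size.to_f)
end

# The errors are 1, 0 and -2, so the mean squared error is 5 / 3.
puts rmse([3, 4, 5], [2, 4, 7])
```

A lower value means the two lists agree more closely, and identical lists give 0.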
data/lib/disco/recommender.rb CHANGED
@@ -1,39 +1,44 @@
  module Disco
    class Recommender
-     attr_reader :global_mean, :item_factors, :user_factors
+     attr_reader :global_mean

-     def initialize(factors: 8, epochs: 20, verbose: nil)
+     def initialize(factors: 8, epochs: 20, verbose: nil, top_items: false)
        @factors = factors
        @epochs = epochs
        @verbose = verbose
+       @user_map = {}
+       @item_map = {}
+       @top_items = top_items
      end

      def fit(train_set, validation_set: nil)
        train_set = to_dataset(train_set)
        validation_set = to_dataset(validation_set) if validation_set

+       check_training_set(train_set)
+
+       # TODO option to set in initializer to avoid pass
+       # could also just check first few values
+       # but may be confusing if they are all missing and later ones aren't
        @implicit = !train_set.any? { |v| v[:rating] }

+       # TODO improve performance
+       # (catch exception instead of checking ahead of time)
        unless @implicit
-         ratings = train_set.map { |o| o[:rating] }
-         check_ratings(ratings)
-         @min_rating = ratings.min
-         @max_rating = ratings.max
+         check_ratings(train_set)

          if validation_set
-           check_ratings(validation_set.map { |o| o[:rating] })
+           check_ratings(validation_set)
          end
        end

-       check_training_set(train_set)
-       create_maps(train_set)
-
        @rated = Hash.new { |hash, key| hash[key] = {} }
        input = []
        value_key = @implicit ? :value : :rating
        train_set.each do |v|
-         u = @user_map[v[:user_id]]
-         i = @item_map[v[:item_id]]
+         # update maps and build matrix in single pass
+         u = (@user_map[v[:user_id]] ||= @user_map.size)
+         i = (@item_map[v[:item_id]] ||= @item_map.size)
          @rated[u][i] = true

          # explicit will always have a value due to check_ratings
@@ -41,6 +46,25 @@ module Disco
        end
        @rated.default = nil

+       # much more efficient than checking every value in another pass
+       raise ArgumentError, "Missing user_id" if @user_map.key?(nil)
+       raise ArgumentError, "Missing item_id" if @item_map.key?(nil)
+
+       # TODO improve performance
+       unless @implicit
+         @min_rating, @max_rating = train_set.minmax_by { |o| o[:rating] }.map { |o| o[:rating] }
+       end
+
+       if @top_items
+         @item_count = [0] * @item_map.size
+         @item_sum = [0.0] * @item_map.size
+         train_set.each do |v|
+           i = @item_map[v[:item_id]]
+           @item_count[i] += 1
+           @item_sum[i] += (v[value_key] || 1)
+         end
+       end
+
        eval_set = nil
        if validation_set
          eval_set = []
@@ -67,135 +91,258 @@ module Disco
        @user_factors = model.p_factors(format: :numo)
        @item_factors = model.q_factors(format: :numo)

-       @user_index = nil
-       @item_index = nil
+       @normalized_user_factors = nil
+       @normalized_item_factors = nil
+
+       @user_recs_index = nil
+       @similar_users_index = nil
+       @similar_items_index = nil
+     end
+
+     # generates a prediction even if a user has already rated the item
+     def predict(data)
+       data = to_dataset(data)
+
+       u = data.map { |v| @user_map[v[:user_id]] }
+       i = data.map { |v| @item_map[v[:item_id]] }
+
+       new_index = data.each_index.select { |index| u[index].nil? || i[index].nil? }
+       new_index.each do |j|
+         u[j] = 0
+         i[j] = 0
+       end
+
+       predictions = @user_factors[u, true].inner(@item_factors[i, true])
+       predictions.inplace.clip(@min_rating, @max_rating) if @min_rating
+       predictions[new_index] = @global_mean
+       predictions.to_a
      end

      def user_recs(user_id, count: 5, item_ids: nil)
+       check_fit
        u = @user_map[user_id]

        if u
-         predictions = @global_mean + @item_factors.dot(@user_factors[u, true])
-         predictions.inplace.clip(@min_rating, @max_rating) if @min_rating
-
-         predictions =
-           @item_map.keys.zip(predictions).map do |item_id, pred|
-             {item_id: item_id, score: pred}
-           end
+         rated = item_ids ? {} : @rated[u]

          if item_ids
-           idx = item_ids.map { |i| @item_map[i] }.compact
-           predictions.values_at(*idx)
+           ids = Numo::NArray.cast(item_ids.map { |i| @item_map[i] }.compact)
+           return [] if ids.size == 0
+
+           predictions = @item_factors[ids, true].inner(@user_factors[u, true])
+           indexes = predictions.sort_index.reverse
+           indexes = indexes[0...[count + rated.size, indexes.size].min] if count
+           predictions = predictions[indexes]
+           ids = ids[indexes]
+         elsif @user_recs_index && count
+           predictions, ids = @user_recs_index.search(@user_factors[u, true].expand_dims(0), count + rated.size).map { |v| v[0, true] }
          else
-           @rated[u].keys.each do |i|
-             predictions.delete_at(i)
-           end
+           predictions = @item_factors.inner(@user_factors[u, true])
+           # TODO make sure reverse isn't hurting performance
+           indexes = predictions.sort_index.reverse
+           indexes = indexes[0...[count + rated.size, indexes.size].min] if count
+           predictions = predictions[indexes]
+           ids = indexes
          end

-         predictions.sort_by! { |pred| -pred[:score] } # already sorted by id
-         predictions = predictions.first(count) if count && !item_ids
-         predictions
+         predictions.inplace.clip(@min_rating, @max_rating) if @min_rating
+
+         keys = @item_map.keys
+         result = []
+         ids.each_with_index do |item_id, i|
+           next if rated[item_id]
+
+           result << {item_id: keys[item_id], score: predictions[i]}
+           break if result.size == count
+         end
+         result
+       elsif @top_items
+         top_items(count: count)
        else
-         # no items if user is unknown
-         # TODO maybe most popular items
          []
        end
      end

-     def optimize_similar_items
-       @item_index = create_index(@item_factors)
+     def similar_items(item_id, count: 5)
+       check_fit
+       similar(item_id, @item_map, normalized_item_factors, count, @similar_items_index)
      end
-     alias_method :optimize_item_recs, :optimize_similar_items
+     alias_method :item_recs, :similar_items

-     def optimize_similar_users
-       @user_index = create_index(@user_factors)
+     def similar_users(user_id, count: 5)
+       check_fit
+       similar(user_id, @user_map, normalized_user_factors, count, @similar_users_index)
      end

-     def similar_items(item_id, count: 5)
-       similar(item_id, @item_map, @item_factors, item_norms, count, @item_index)
+     def top_items(count: 5)
+       check_fit
+       raise "top_items not computed" unless @top_items
+
+       if @implicit
+         scores = @item_count
+       else
+         require "wilson_score"
+
+         range = @min_rating..@max_rating
+         scores = @item_sum.zip(@item_count).map { |s, c| WilsonScore.rating_lower_bound(s / c, c, range) }
+       end
+
+       scores = scores.map.with_index.sort_by { |s, _| -s }
+       scores = scores.first(count) if count
+       item_ids = item_ids()
+       scores.map do |s, i|
+         {item_id: item_ids[i], score: s}
+       end
      end
-     alias_method :item_recs, :similar_items

-     def similar_users(user_id, count: 5)
-       similar(user_id, @user_map, @user_factors, user_norms, count, @user_index)
+     def user_ids
+       @user_map.keys
+     end
+
+     def item_ids
+       @item_map.keys
+     end
+
+     def user_factors(user_id = nil)
+       if user_id
+         u = @user_map[user_id]
+         @user_factors[u, true] if u
+       else
+         @user_factors
+       end
+     end
+
+     def item_factors(item_id = nil)
+       if item_id
+         i = @item_map[item_id]
+         @item_factors[i, true] if i
+       else
+         @item_factors
+       end
+     end
+
+     def optimize_user_recs
+       check_fit
+       @user_recs_index = create_index(item_factors, library: "faiss")
+     end
+
+     def optimize_similar_items(library: nil)
+       check_fit
+       @similar_items_index = create_index(normalized_item_factors, library: library)
+     end
+     alias_method :optimize_item_recs, :optimize_similar_items
+
+     def optimize_similar_users(library: nil)
+       check_fit
+       @similar_users_index = create_index(normalized_user_factors, library: library)
+     end
+
+     def inspect
+       to_s # for now
      end

      private

-     def create_index(factors)
-       require "ngt"
+     # factors should already be normalized for similar users/items
+     def create_index(factors, library:)
+       # TODO make Faiss the default in 0.3.0
+       library ||= defined?(Faiss) && !defined?(Ngt) ? "faiss" : "ngt"
+
+       case library
+       when "faiss"
+         require "faiss"
+
+         # inner product is cosine similarity with normalized vectors
+         # https://github.com/facebookresearch/faiss/issues/95
+         #
+         # TODO use non-exact index
+         # https://github.com/facebookresearch/faiss/wiki/Faiss-indexes
+         index = Faiss::IndexFlatIP.new(factors.shape[1])
+
+         # ids are from 0...total
+         # https://github.com/facebookresearch/faiss/blob/96b740abedffc8f67389f29c2a180913941534c6/faiss/Index.h#L89
+         index.add(factors)
+
+         index
+       when "ngt"
+         require "ngt"

-       index = Ngt::Index.new(factors.shape[1], distance_type: "Cosine")
-       index.batch_insert(factors)
-       index
+         # could speed up search with normalized cosine
+         # https://github.com/yahoojapan/NGT/issues/36
+         index = Ngt::Index.new(factors.shape[1], distance_type: "Cosine")
+
+         # NGT normalizes so could call create_index without normalized factors
+         # but keep code simple for now
+         ids = index.batch_insert(factors)
+         raise "Unexpected ids. Please report a bug." if ids.first != 1 || ids.last != factors.shape[0]
+
+         index
+       else
+         raise ArgumentError, "Invalid library: #{library}"
+       end
      end

-     def user_norms
-       @user_norms ||= norms(@user_factors)
+     def normalized_user_factors
+       @normalized_user_factors ||= normalize(@user_factors)
      end

-     def item_norms
-       @item_norms ||= norms(@item_factors)
+     def normalized_item_factors
+       @normalized_item_factors ||= normalize(@item_factors)
      end

-     def norms(factors)
+     def normalize(factors)
        norms = Numo::SFloat::Math.sqrt((factors * factors).sum(axis: 1))
        norms[norms.eq(0)] = 1e-10 # no zeros
-       norms
+       factors / norms.expand_dims(1)
      end

-     def similar(id, map, factors, norms, count, index)
+     def similar(id, map, norm_factors, count, index)
        i = map[id]
-       if i
+
+       if i && norm_factors.shape[0] > 1
          if index && count
-           keys = map.keys
-           result = index.search(factors[i, true], size: count + 1)[1..-1]
-           result.map do |v|
-             {
-               # ids from batch_insert start at 1 instead of 0
-               item_id: keys[v[:id] - 1],
-               # convert cosine distance to cosine similarity
-               score: 1 - v[:distance]
-             }
+           if defined?(Faiss) && index.is_a?(Faiss::Index)
+             predictions, ids = index.search(norm_factors[i, true].expand_dims(0), count + 1).map { |v| v.to_a[0] }
+           else
+             result = index.search(norm_factors[i, true], size: count + 1)
+             # ids from batch_insert start at 1 instead of 0
+             ids = result.map { |v| v[:id] - 1 }
+             # convert cosine distance to cosine similarity
+             predictions = result.map { |v| 1 - v[:distance] }
            end
          else
-           predictions = factors.dot(factors[i, true]) / norms
-
-           predictions =
-             map.keys.zip(predictions).map do |item_id, pred|
-               {item_id: item_id, score: pred}
-             end
-
-           max_score = predictions.delete_at(i)[:score]
-           predictions.sort_by! { |pred| -pred[:score] } # already sorted by id
-           predictions = predictions.first(count) if count
-           # divide by max score to get cosine similarity
-           # only need to do for returned records
-           predictions.each { |pred| pred[:score] /= max_score }
-           predictions
+           predictions = norm_factors.inner(norm_factors[i, true])
+           indexes = predictions.sort_index.reverse
+           indexes = indexes[0...[count + 1, indexes.size].min] if count
+           predictions = predictions[indexes]
+           ids = indexes
          end
-       else
-         []
-       end
-     end

-     def create_maps(train_set)
-       user_ids = train_set.map { |v| v[:user_id] }.uniq.sort
-       item_ids = train_set.map { |v| v[:item_id] }.uniq.sort
+         keys = map.keys
+
+         # TODO use user_id for similar_users in 0.3.0
+         key = :item_id

-       raise ArgumentError, "Missing user_id" if user_ids.any?(&:nil?)
-       raise ArgumentError, "Missing item_id" if item_ids.any?(&:nil?)
+         result = []
+         # items can have the same score
+         # so original item may not be at index 0
+         ids.each_with_index do |id, j|
+           next if id == i

-       @user_map = user_ids.zip(user_ids.size.times).to_h
-       @item_map = item_ids.zip(item_ids.size.times).to_h
+           result << {key => keys[id], score: predictions[j]}
+         end
+         result
+       else
+         []
+       end
      end

      def check_ratings(ratings)
-       unless ratings.all? { |r| !r.nil? }
-         raise ArgumentError, "Missing ratings"
+       unless ratings.all? { |r| !r[:rating].nil? }
+         raise ArgumentError, "Missing rating"
        end
-       unless ratings.all? { |r| r.is_a?(Numeric) }
-         raise ArgumentError, "Ratings must be numeric"
+       unless ratings.all? { |r| r[:rating].is_a?(Numeric) }
+         raise ArgumentError, "Rating must be numeric"
        end
      end

@@ -203,6 +350,10 @@ module Disco
        raise ArgumentError, "No training data" if train_set.empty?
      end

+     def check_fit
+       raise "Not fit" unless defined?(@implicit)
+     end
+
      def to_dataset(dataset)
        if defined?(Rover::DataFrame) && dataset.is_a?(Rover::DataFrame)
          # convert keys to symbols
@@ -230,7 +381,10 @@ module Disco
          rated: @rated,
          global_mean: @global_mean,
          user_factors: @user_factors,
-         item_factors: @item_factors
+         item_factors: @item_factors,
+         factors: @factors,
+         epochs: @epochs,
+         verbose: @verbose
        }

        unless @implicit
@@ -238,6 +392,11 @@ module Disco
          obj[:max_rating] = @max_rating
        end

+       if @top_items
+         obj[:item_count] = @item_count
+         obj[:item_sum] = @item_sum
+       end
+
        obj
      end

@@ -249,11 +408,20 @@ module Disco
        @global_mean = obj[:global_mean]
        @user_factors = obj[:user_factors]
        @item_factors = obj[:item_factors]
+       @factors = obj[:factors]
+       @epochs = obj[:epochs]
+       @verbose = obj[:verbose]

        unless @implicit
          @min_rating = obj[:min_rating]
          @max_rating = obj[:max_rating]
        end
+
+       @top_items = obj.key?(:item_count)
+       if @top_items
+         @item_count = obj[:item_count]
+         @item_sum = obj[:item_sum]
+       end
      end
    end
  end
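One change in the recommender diff above is that `fit` no longer calls `create_maps` (which collected, de-duplicated and sorted the ids first) and instead assigns each new id the next free index while it walks the training set. The sketch below shows that single-pass pattern in isolation; `build_maps` and the sample rows are hypothetical and only illustrate the `map[key] ||= map.size` idiom, not the gem's full `fit` logic:

```ruby
# Hypothetical helper illustrating the single-pass id mapping idiom.
# Returns the user map, the item map, and [user_index, item_index, rating] triples.
def build_maps(rows)
  user_map = {}
  item_map = {}

  triples = rows.map do |row|
    # the first time a key is seen it gets the current map size as its index
    u = (user_map[row[:user_id]] ||= user_map.size)
    i = (item_map[row[:item_id]] ||= item_map.size)
    [u, i, row[:rating]]
  end

  # a nil key can only exist if some row lacked the id, so one check covers every row
  raise ArgumentError, "Missing user_id" if user_map.key?(nil)
  raise ArgumentError, "Missing item_id" if item_map.key?(nil)

  [user_map, item_map, triples]
end

rows = [
  {user_id: "bob", item_id: "x", rating: 1},
  {user_id: "amy", item_id: "x", rating: 2},
  {user_id: "bob", item_id: "y", rating: 3}
]

user_map, item_map, triples = build_maps(rows)
p user_map
p item_map
p triples
```

With this approach indexes follow the order of first appearance rather than sorted order, which is one visible difference from the removed `create_maps`.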
data/lib/disco/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Disco
-   VERSION = "0.2.0"
+   VERSION = "0.2.6"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: disco
  version: !ruby/object:Gem::Version
-   version: 0.2.0
+   version: 0.2.6
  platform: ruby
  authors:
  - Andrew Kane
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2020-07-31 00:00:00.000000000 Z
+ date: 2021-02-24 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: libmf
@@ -38,120 +38,8 @@ dependencies:
      - - ">="
        - !ruby/object:Gem::Version
          version: '0'
- - !ruby/object:Gem::Dependency
-   name: bundler
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
-   type: :development
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
- - !ruby/object:Gem::Dependency
-   name: rake
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
-   type: :development
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
- - !ruby/object:Gem::Dependency
-   name: minitest
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '5'
-   type: :development
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '5'
- - !ruby/object:Gem::Dependency
-   name: activerecord
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
-   type: :development
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
- - !ruby/object:Gem::Dependency
-   name: sqlite3
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
-   type: :development
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
- - !ruby/object:Gem::Dependency
-   name: daru
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
-   type: :development
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
- - !ruby/object:Gem::Dependency
-   name: rover-df
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
-   type: :development
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: '0'
- - !ruby/object:Gem::Dependency
-   name: ngt
-   requirement: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: 0.3.0
-   type: :development
-   prerelease: false
-   version_requirements: !ruby/object:Gem::Requirement
-     requirements:
-     - - ">="
-       - !ruby/object:Gem::Version
-         version: 0.3.0
- description:
- email: andrew@chartkick.com
+ description:
+ email: andrew@ankane.org
  executables: []
  extensions: []
  extra_rdoc_files: []
@@ -163,6 +51,7 @@ files:
  - lib/disco.rb
  - lib/disco/data.rb
  - lib/disco/engine.rb
+ - lib/disco/metrics.rb
  - lib/disco/model.rb
  - lib/disco/recommender.rb
  - lib/disco/version.rb
@@ -172,7 +61,7 @@ homepage: https://github.com/ankane/disco
  licenses:
  - MIT
  metadata: {}
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -187,8 +76,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
      - !ruby/object:Gem::Version
        version: '0'
  requirements: []
- rubygems_version: 3.1.2
- signing_key:
+ rubygems_version: 3.2.3
+ signing_key:
  specification_version: 4
- summary: Collaborative filtering for Ruby
+ summary: Recommendations for Ruby and Rails using collaborative filtering
  test_files: []