disco 0.3.0 → 0.3.2

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 815bc7de802959be7093d9e0478d83a0cf49a522e72a2df928de86223799d83d
- data.tar.gz: cbfacf86f1e0507abe4df07b45f20bc3d06d682617c482419a05935186a61c15
+ metadata.gz: 645a5f169c78e36ee6394a71ba61ac611cb333176adff84812c8a25549e2ea28
+ data.tar.gz: 68caa44554bff09a39a68522ad5c8008164840186c08e0fdbad8f9e465855c89
  SHA512:
- metadata.gz: d0f3285b53cb8fe7e7d5ef30a970c632c52112c6b0503b8c81155f6cdb37583f036107b052c37019671355d0838512f904a61aeff5b69b7f6f8a2c1f4fabe785
- data.tar.gz: f1b9c5759d77c1f497a0ac09ccf455beda29417024c4cd8ba6c0f8fcbac3347ab233c9e8c558a75382ef3b41495b5b693495ab0533c3a084a416c1f75a38313b
+ metadata.gz: 104a016c693c0256cae13d4545df4112538a1f7a66c8790c38a1b434f144c8d2fb173d2428bbc9168e2581e0e9399d761083500fca69919c07f86b2f1e6582ee
+ data.tar.gz: 3d06f45fdcf63ea26fa48c18ec5304d22f3306da5bb3fa0a2bcd107455002dbfda56e0df22f61270c12374cb693ad6da2d747bd363c0ec001048f23690b3efe1
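The `checksums.yaml` entries above are SHA256 and SHA512 hex digests of the `metadata.gz` and `data.tar.gz` members of the packaged gem. A minimal sketch of checking such a digest with Ruby's standard `digest` library, assuming you have already extracted the member's bytes from the `.gem` archive (the helper name `checksum_matches?` is hypothetical):

```ruby
require "digest"

# Hypothetical helper: compares the hex digests of a byte string against
# expected values (e.g. copied from checksums.yaml). The SHA512 check is
# optional and only runs when an expected value is supplied.
def checksum_matches?(bytes, sha256:, sha512: nil)
  return false unless Digest::SHA256.hexdigest(bytes) == sha256

  sha512.nil? || Digest::SHA512.hexdigest(bytes) == sha512
end

# Usage with a well-known test vector: SHA256 of "abc"
expected = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
puts checksum_matches?("abc", sha256: expected)
```

In practice the bytes would come from reading the member out of the `.gem` file (a tar archive); that extraction step is left out here.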
data/CHANGELOG.md CHANGED
@@ -1,3 +1,11 @@
+ ## 0.3.2 (2022-09-26)
+
+ - Fixed issue when `fit` is called multiple times
+
+ ## 0.3.1 (2022-07-10)
+
+ - Added support for JSON serialization
+
  ## 0.3.0 (2022-03-22)
 
  - Changed `item_id` to `user_id` for `similar_users`
data/README.md CHANGED
@@ -183,17 +183,17 @@ For Rails < 6, speed up inserts by adding [activerecord-import](https://github.c
  If you’d prefer to perform recommendations on-the-fly, store the recommender
 
  ```ruby
- bin = Marshal.dump(recommender)
- File.binwrite("recommender.bin", bin)
+ json = recommender.to_json
+ File.write("recommender.json", json)
  ```
 
- > You can save it to a file, database, or any other storage system
+ The serialized recommender includes user activity from the training data (to avoid recommending previously rated items), so be sure to protect it. You can save it to a file, database, or any other storage system, or use a tool like [Trove](https://github.com/ankane/trove). Also, user and item IDs should be integers or strings for this.
 
  Load a recommender
 
  ```ruby
- bin = File.binread("recommender.bin")
- recommender = Marshal.load(bin)
+ json = File.read("recommender.json")
+ recommender = Disco::Recommender.load_json(json)
  ```
 
  Alternatively, you can store only the factors and use a library like [Neighbor](https://github.com/ankane/neighbor). See the [examples](https://github.com/ankane/neighbor/tree/master/examples).
@@ -223,7 +223,7 @@ recommender.fit(data, validation_set: validation_set)
 
  ## Cold Start
 
- Collaborative filtering suffers from the [cold start problem](https://www.yuspify.com/blog/cold-start-problem-recommender-systems/). It’s unable to make good recommendations without data on a user or item, which is problematic for new users and items.
+ Collaborative filtering suffers from the [cold start problem](https://en.wikipedia.org/wiki/Cold_start_(recommender_systems)). It’s unable to make good recommendations without data on a user or item, which is problematic for new users and items.
 
  ```ruby
  recommender.user_recs(new_user_id) # returns empty array
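The README's note that user and item IDs should be integers or strings follows from how JSON round-trips values: `to_json` writes the ID lists (`@user_map.keys`, `@item_map.keys`) and `load_json` rebuilds the maps from whatever `JSON.parse` returns. A small sketch with Ruby's standard `json` library, using made-up IDs, shows that integers and strings come back unchanged while a symbol comes back as a string, so a later lookup by the original symbol would miss:

```ruby
require "json"

# Made-up IDs as they might appear in training data
ids = [1, "item-2", :item_3]

# Serialize and parse back, as a JSON round trip would
restored = JSON.parse(JSON.generate(ids))

p restored # => [1, "item-2", "item_3"]

# Compare each original ID with its restored value
ids.zip(restored).each do |original, back|
  puts "#{original.inspect} -> #{back.inspect} (same: #{original == back})"
end
```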
@@ -36,6 +36,8 @@ module Disco
  end
  end
 
+ @user_map = {}
+ @item_map = {}
  @rated = Hash.new { |hash, key| hash[key] = {} }
  input = []
  train_set.each do |v|
@@ -56,6 +58,9 @@ module Disco
  # TODO improve performance
  unless @implicit
  @min_rating, @max_rating = train_set.minmax_by { |o| o[:rating] }.map { |o| o[:rating] }
+ else
+ @min_rating = nil
+ @max_rating = nil
  end
 
  if @top_items
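The two hunks above reset `@user_map`, `@item_map`, `@min_rating`, and `@max_rating` inside `fit`, which lines up with the 0.3.2 changelog entry about calling `fit` multiple times. The sketch below uses a hypothetical toy class (`ToyRecommender` is not Disco's code) to show the general point: instance variables survive between calls, so anything derived from one training set has to be cleared before the next fit.

```ruby
# Hypothetical toy class, not Disco's implementation
class ToyRecommender
  attr_reader :user_map, :min_rating, :max_rating

  def fit(train_set, implicit: false)
    # Start each fit from an empty ID map so IDs from a previous
    # training set are not carried over
    @user_map = {}
    train_set.each do |v|
      # Assign indexes in first-seen order
      @user_map[v[:user_id]] ||= @user_map.size
    end

    if implicit
      # Clear rating bounds left over from an earlier explicit fit
      @min_rating = nil
      @max_rating = nil
    else
      @min_rating, @max_rating = train_set.minmax_by { |o| o[:rating] }.map { |o| o[:rating] }
    end

    self
  end
end

r = ToyRecommender.new
r.fit([{ user_id: 1, rating: 1 }, { user_id: 2, rating: 5 }])
r.fit([{ user_id: 3 }], implicit: true)
p r.user_map.keys # => [3]
p r.min_rating    # => nil
```

Without the two resets, the second call would still report users 1 and 2 and a rating range of 1 to 5 from the first training set.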
@@ -255,6 +260,46 @@ module Disco
  to_s # for now
  end
 
+ def to_json
+ require "base64"
+ require "json"
+
+ obj = {
+ implicit: @implicit,
+ user_ids: @user_map.keys,
+ item_ids: @item_map.keys,
+ rated: @user_map.map { |_, u| (@rated[u] || {}).keys },
+ global_mean: @global_mean,
+ user_factors: Base64.strict_encode64(@user_factors.to_binary),
+ item_factors: Base64.strict_encode64(@item_factors.to_binary),
+ factors: @factors,
+ epochs: @epochs,
+ verbose: @verbose
+ }
+
+ unless @implicit
+ obj[:min_rating] = @min_rating
+ obj[:max_rating] = @max_rating
+ end
+
+ if @top_items
+ obj[:item_count] = @item_count
+ obj[:item_sum] = @item_sum
+ end
+
+ JSON.generate(obj)
+ end
+
+ def self.load_json(json)
+ require "json"
+
+ obj = JSON.parse(json)
+
+ recommender = new
+ recommender.send(:json_load, obj)
+ recommender
+ end
+
  private
 
  # factors should already be normalized for similar users/items
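`to_json` stores each factor matrix as Base64 of its raw binary (`to_binary`), and `json_load` restores it with `Numo::SFloat.from_binary(..., [rows, @factors])`, taking the row count from the ID lists because the raw bytes carry no shape. A minimal sketch of the same round trip without Numo, assuming plain Ruby arrays of rows stand in for `Numo::SFloat` and `Array#pack("f*")` (native-endian 32-bit floats) stands in for `to_binary`; the helper names are hypothetical:

```ruby
# Hypothetical helpers mirroring the to_json / json_load encoding.
# pack("m0") / unpack1("m0") is the same strict Base64 encoding that
# Base64.strict_encode64 / strict_decode64 produce.
def encode_factors(rows)
  [rows.flatten.pack("f*")].pack("m0")
end

def decode_factors(str, rows, cols)
  values = str.unpack1("m0").unpack("f*")
  unless values.size == rows * cols
    raise ArgumentError, "expected #{rows * cols} values, got #{values.size}"
  end

  # Reshape the flat list back into rows of `cols` values
  values.each_slice(cols).to_a
end

# Values chosen to be exactly representable as 32-bit floats
factors = [[0.5, -1.25, 2.0], [0.75, 3.5, -0.125]]
encoded = encode_factors(factors)
p decode_factors(encoded, 2, 3) # => [[0.5, -1.25, 2.0], [0.75, 3.5, -0.125]]
```

Because the factors are single precision, values that are not exactly representable as 32-bit floats come back rounded; the example avoids that by using dyadic fractions.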
@@ -434,5 +479,31 @@ module Disco
  @item_sum = obj[:item_sum]
  end
  end
+
+ def json_load(obj)
+ require "base64"
+
+ @implicit = obj["implicit"]
+ @user_map = obj["user_ids"].map.with_index.to_h
+ @item_map = obj["item_ids"].map.with_index.to_h
+ @rated = obj["rated"].map.with_index.to_h { |r, i| [i, r.to_h { |v| [v, true] }] }
+ @global_mean = obj["global_mean"].to_f
+ @factors = obj["factors"].to_i
+ @user_factors = Numo::SFloat.from_binary(Base64.strict_decode64(obj["user_factors"]), [@user_map.size, @factors])
+ @item_factors = Numo::SFloat.from_binary(Base64.strict_decode64(obj["item_factors"]), [@item_map.size, @factors])
+ @epochs = obj["epochs"].to_i
+ @verbose = obj["verbose"]
+
+ unless @implicit
+ @min_rating = obj["min_rating"]
+ @max_rating = obj["max_rating"]
+ end
+
+ @top_items = obj.key?("item_count")
+ if @top_items
+ @item_count = obj["item_count"]
+ @item_sum = obj["item_sum"]
+ end
+ end
  end
  end
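`to_json` writes the ID maps as ordered key lists and the rated items as one array of item indexes per user, and `json_load` rebuilds the hashes with `map.with_index.to_h`. The sketch below runs the same expressions on made-up toy state with Ruby's standard `json` library. It assumes, as the code appears to, that each map's insertion order matches its indexes, so a key's position in the list is its index.

```ruby
require "json"

# Made-up toy state shaped like the fields to_json reads
user_map = { "alice" => 0, "bob" => 1 }  # user ID -> row index
item_map = { 10 => 0, 20 => 1, 30 => 2 } # item ID -> row index
rated = { 0 => { 0 => true, 2 => true }, 1 => { 1 => true } } # user index -> rated item indexes

# Serialize: IDs as ordered arrays, rated items as one array per user
obj = {
  user_ids: user_map.keys,
  item_ids: item_map.keys,
  rated: user_map.map { |_, u| (rated[u] || {}).keys }
}
json = JSON.generate(obj)

# Deserialize with the same expressions json_load uses
parsed = JSON.parse(json)
restored_user_map = parsed["user_ids"].map.with_index.to_h
restored_item_map = parsed["item_ids"].map.with_index.to_h
restored_rated = parsed["rated"].map.with_index.to_h { |r, i| [i, r.to_h { |v| [v, true] }] }

p restored_user_map == user_map # => true
p restored_item_map == item_map # => true
p restored_rated == rated       # => true
```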
data/lib/disco/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Disco
- VERSION = "0.3.0"
+ VERSION = "0.3.2"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: disco
  version: !ruby/object:Gem::Version
- version: 0.3.0
+ version: 0.3.2
  platform: ruby
  authors:
  - Andrew Kane
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2022-03-22 00:00:00.000000000 Z
+ date: 2022-09-27 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: libmf