cmfrec 0.1.4 → 0.1.7

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: bb7b07ae46500a545f1a130dfc5648aa3f925f9b5766a6c70a1652c7b5732182
- data.tar.gz: e89a6d1900cda651dc6b0aac2899050e28680cddfb6b39b6b5eacfe467b59aad
+ metadata.gz: 8d16ab98cb7de22042eaf353a9d41d0b7a4214a631a373c553f73825418c026a
+ data.tar.gz: 9ab678a9d389b835b4dfd91d14c372d5acfef950bf068ac46d2d879af04f0fcc
  SHA512:
- metadata.gz: 117aa6952fe0ab8ddebfaece6655cf479a7adbab7d6f634e7d3428c72824a410812c037ae006366180a9691a6d160d8065b777a9c10a33a5ccfefedb28c99ec6
- data.tar.gz: 57985a055705b820226a2aa1451453383ee3509e43225f8fdb09e713c4530754b0b608f7d1b4814973b43e3d625f824f9f87939687d015b352cc8905f7b4f118
+ metadata.gz: 175d3c91056d2e8734af6961471c98be76e4d5f6b85faaecdfd3b39a220efafa70150983e9d74efb1a1211a29e6c867d6b1d7f482cc34c55500268d29b40158c
+ data.tar.gz: faaed621391ccc7d94f2e6309481b24f7db62fbe86eb6c1bbb35445dde0cabf32c41791af4ebcd25e530bdc1f6319f1b4477b6c4d6ca1bf77353d3a0c4ae8d5c
data/CHANGELOG.md CHANGED
@@ -1,3 +1,24 @@
+ ## 0.1.7 (2022-03-22)
+
+ - Improved ARM detection
+ - Fixed error with `load_movielens`
+ - Fixed duplicates in `item_info` with `load_movielens`
+
+ ## 0.1.6 (2021-08-12)
+
+ - Added `user_ids` and `item_ids` methods
+ - Added `user_id` argument to `user_factors`
+ - Added `item_id` argument to `item_factors`
+ - Added `user_id` argument to `user_bias`
+ - Added `item_id` argument to `item_bias`
+ - Added `item_ids` argument to `new_user_recs`
+ - Fixed order for `user_recs`
+
+ ## 0.1.5 (2021-08-10)
+
+ - Fixed issue with `user_recs` and `new_user_recs` returning rated items
+ - Fixed error with `new_user_recs`
+
  ## 0.1.4 (2021-02-04)

  - Added support for saving and loading recommenders
data/README.md CHANGED
@@ -1,4 +1,4 @@
- # cmfrec
+ # cmfrec Ruby

  :fire: Recommendations for Ruby, powered by [cmfrec](https://github.com/david-cortes/cmfrec)

@@ -6,7 +6,7 @@
  - Works with explicit and implicit feedback
  - Uses high-performance matrix factorization

- [![Build Status](https://github.com/ankane/cmfrec/workflows/build/badge.svg?branch=master)](https://github.com/ankane/cmfrec/actions)
+ [![Build Status](https://github.com/ankane/cmfrec-ruby/workflows/build/badge.svg?branch=master)](https://github.com/ankane/cmfrec-ruby/actions)

  ## Installation

@@ -58,8 +58,8 @@ Get recommendations for a new user

  ```ruby
  recommender.new_user_recs([
- {item_id: 1, value: 5},
- {item_id: 2, value: 3}
+ {item_id: 1, rating: 5},
+ {item_id: 2, rating: 3}
  ])
  ```

@@ -150,11 +150,7 @@ recommender.predict(ratings.last(20000))
  [Ahoy](https://github.com/ankane/ahoy) is a great source for implicit feedback

  ```ruby
- views = Ahoy::Event.
- where(name: "Viewed post").
- group(:user_id).
- group("properties->>'post_id'"). # postgres syntax
- count
+ views = Ahoy::Event.where(name: "Viewed post").group(:user_id).group_prop(:post_id).count

  data =
  views.map do |(user_id, post_id), count|
@@ -230,8 +226,17 @@ bin = File.binread("recommender.bin")
  recommender = Marshal.load(bin)
  ```

+ Alternatively, you can store only the factors and use a library like [Neighbor](https://github.com/ankane/neighbor). See the [examples](https://github.com/ankane/neighbor/tree/master/examples) for Disco, which has a similar API. For explicit feedback, you should [disable the bias](#explicit-feedback) with this approach.
+
  ## Reference

+ Get ids
+
+ ```ruby
+ recommender.user_ids
+ recommender.item_ids
+ ```
+
  Get the global mean

  ```ruby
@@ -262,22 +267,22 @@ Cmfrec.ffi_lib = "path/to/cmfrec.dll"

  ## History

- View the [changelog](https://github.com/ankane/cmfrec/blob/master/CHANGELOG.md)
+ View the [changelog](https://github.com/ankane/cmfrec-ruby/blob/master/CHANGELOG.md)

  ## Contributing

  Everyone is encouraged to help improve this project. Here are a few ways you can help:

- - [Report bugs](https://github.com/ankane/cmfrec/issues)
- - Fix bugs and [submit pull requests](https://github.com/ankane/cmfrec/pulls)
+ - [Report bugs](https://github.com/ankane/cmfrec-ruby/issues)
+ - Fix bugs and [submit pull requests](https://github.com/ankane/cmfrec-ruby/pulls)
  - Write, clarify, or fix documentation
  - Suggest or add new features

  To get started with development:

  ```sh
- git clone https://github.com/ankane/cmfrec.git
- cd cmfrec
+ git clone https://github.com/ankane/cmfrec-ruby.git
+ cd cmfrec-ruby
  bundle install
  bundle exec rake vendor:all
  bundle exec rake test
data/lib/cmfrec/data.rb CHANGED
@@ -3,11 +3,11 @@ module Cmfrec
  def load_movielens
  require "csv"

- data_path = download_file("ml-100k/u.data", "http://files.grouplens.org/datasets/movielens/ml-100k/u.data",
+ data_path = download_file("ml-100k/u.data", "https://files.grouplens.org/datasets/movielens/ml-100k/u.data",
  file_hash: "06416e597f82b7342361e41163890c81036900f418ad91315590814211dca490")
- user_path = download_file("ml-100k/u.user", "http://files.grouplens.org/datasets/movielens/ml-100k/u.user",
+ user_path = download_file("ml-100k/u.user", "https://files.grouplens.org/datasets/movielens/ml-100k/u.user",
  file_hash: "f120e114da2e8cf314fd28f99417c94ae9ddf1cb6db8ce0e4b5995d40e90e62c")
- item_path = download_file("ml-100k/u.item", "http://files.grouplens.org/datasets/movielens/ml-100k/u.item",
+ item_path = download_file("ml-100k/u.item", "https://files.grouplens.org/datasets/movielens/ml-100k/u.item",
  file_hash: "553841ebc7de3a0fd0d6b62a204ea30c1e651aacfb2814c7a6584ac52f2c5701")

  # convert u.item to utf-8
@@ -24,8 +24,13 @@ module Cmfrec

  item_info = []
  movies = {}
+ movie_names = {}
  genres = %w(unknown action adventure animation childrens comedy crime documentary drama fantasy filmnoir horror musical mystery romance scifi thriller war western)
  CSV.parse(movies_str, col_sep: "|", converters: [:numeric]) do |row|
+ # filter duplicates
+ next if movie_names[row[1]]
+ movie_names[row[1]] = true
+
  movies[row[0]] = row[1]
  item = {item_id: row[1], year: row[2] ? Date.parse(row[2]).year : 1970}
  genres.each_with_index do |genre, i|
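
The duplicate filter in this hunk keys on the movie title (`row[1]`), not on the numeric id, so a later row that repeats a title is skipped before it reaches `movies` or `item_info`. A minimal, self-contained sketch of that pattern follows. The sample rows and the `unique_titles` helper are invented for illustration and are not taken from `u.item` or the gem.

```ruby
require "csv"

# Hypothetical sample in a pipe-delimited id|title|release-date layout.
# "Duplicate Movie (1996)" deliberately appears under two different ids.
SAMPLE = <<~DATA
  1|First Movie (1995)|01-Jan-1995
  2|Duplicate Movie (1996)|01-Jan-1996
  3|Duplicate Movie (1996)|01-Jan-1996
DATA

# Returns an id => title hash, keeping only the first row for each title.
def unique_titles(str)
  seen = {}
  movies = {}
  CSV.parse(str, col_sep: "|", converters: [:numeric]) do |row|
    # skip rows whose title was already seen
    next if seen[row[1]]
    seen[row[1]] = true

    movies[row[0]] = row[1]
  end
  movies
end

MOVIES = unique_titles(SAMPLE)
p MOVIES
```

Note that in this sketch id 3 ends up with no entry at all, which is the price of de-duplicating by title.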
@@ -49,7 +54,10 @@ module Cmfrec
  private

  def download_file(fname, origin, file_hash:)
+ require "digest"
  require "fileutils"
+ require "net/http"
+ require "tmpdir"

  # TODO handle this better
  raise "No HOME" unless ENV["HOME"]
@@ -58,10 +66,6 @@ module Cmfrec

  return dest if File.exist?(dest)

- require "digest"
- require "net/http"
- require "tmpdir"
-
  temp_path = "#{Dir.tmpdir}/cmfrec-#{Time.now.to_f}" # TODO better name

  digest = Digest::SHA2.new
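
`download_file` takes a `file_hash:` and creates a `Digest::SHA2` object, which suggests the download is checked against a SHA-256 digest. The rest of the method is not shown in this diff, so the following is only a sketch of how such a check can be written with the standard library. `verify_sha256` and `demo_verify` are hypothetical names, and a temp file stands in for the download.

```ruby
require "digest"
require "tempfile"

# Hypothetical helper (not part of the gem): hash a file in chunks and
# compare the result against an expected SHA-256 hex digest.
def verify_sha256(path, expected_hex, chunk_size: 16 * 1024)
  digest = Digest::SHA2.new # 256-bit by default
  File.open(path, "rb") do |f|
    # read returns nil at end of file, which ends the loop
    while (chunk = f.read(chunk_size))
      digest.update(chunk)
    end
  end
  digest.hexdigest == expected_hex
end

# Writes a small temp file, then checks it against a matching and a
# non-matching digest.
def demo_verify
  Tempfile.create("cmfrec-sketch") do |f|
    f.binmode
    f.write("hello")
    f.flush
    expected = Digest::SHA256.hexdigest("hello")
    [verify_sha256(f.path, expected), verify_sha256(f.path, "0" * 64)]
  end
end

p demo_verify
```

Hashing in chunks keeps memory use flat regardless of file size, which matters more for larger datasets than for the MovieLens 100K files used here.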
data/lib/cmfrec/recommender.rb CHANGED
@@ -67,21 +67,10 @@ module Cmfrec
  user = @user_map[user_id]

  if user
- if item_ids
- # remove missing ids
- item_ids = item_ids.select { |v| @item_map[v] }
-
- data = item_ids.map { |v| {user_id: user_id, item_id: v} }
- scores = predict(data)
-
- item_ids.zip(scores).map do |item_id, score|
- {item_id: item_id, score: score}
- end
- else
- a_vec = @a[user * @k * Fiddle::SIZEOF_DOUBLE, @k * Fiddle::SIZEOF_DOUBLE]
- a_bias = @bias_a ? @bias_a[user * Fiddle::SIZEOF_DOUBLE, Fiddle::SIZEOF_DOUBLE].unpack1("d") : 0
- top_n(a_vec: a_vec, a_bias: a_bias, count: count)
- end
+ a_vec = @a[user * @k * Fiddle::SIZEOF_DOUBLE, @k * Fiddle::SIZEOF_DOUBLE]
+ a_bias = @bias_a ? @bias_a[user * Fiddle::SIZEOF_DOUBLE, Fiddle::SIZEOF_DOUBLE].unpack1("d") : 0
+ # @rated[user] will be nil for recommenders saved before 0.1.5
+ top_n(a_vec: a_vec, a_bias: a_bias, count: count, rated: (@rated[user] || {}).keys, item_ids: item_ids)
  else
  # no items if user is unknown
  # TODO maybe most popular items
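
After this change `user_recs` hands both `rated:` and `item_ids:` to `top_n`. Reading the `top_n` hunk further down, the selection rule appears to be: with an explicit candidate list, rank exactly those items and ignore `rated`; otherwise exclude rated items and cap `count` at what remains. The sketch below restates that rule in plain Ruby over an array of precomputed scores. It is an illustration only: the gem delegates ranking to the native cmfrec library, and `pick_top`, `SCORES`, and the index values are hypothetical.

```ruby
# Pure-Ruby illustration of the selection rule; indexes are internal item
# indexes (positions in `scores`), not external item ids.
def pick_top(scores, count:, rated: [], candidates: nil)
  if candidates
    # explicit candidate list: rank just those items and return all of them
    pool = candidates.select { |i| i < scores.size }
    count = pool.size
  else
    # no candidate list: everything except already-rated items
    pool = (0...scores.size).to_a - rated
  end
  # max_by(n) returns up to n elements, highest score first
  pool.max_by(count) { |i| scores[i] }
end

SCORES = [0.9, 0.1, 0.7, 0.4]

p pick_top(SCORES, count: 2, rated: [0])           # [2, 3]
p pick_top(SCORES, count: 1, candidates: [1, 3])   # [3, 1]
p pick_top(SCORES, count: 5, rated: [0, 1, 2, 3])  # []
```

The second call shows the `count = n_include` behavior from the diff: when candidates are given, the requested `count` is overridden and every candidate comes back, ordered by score.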
@@ -89,28 +78,35 @@ module Cmfrec
  end
  end

- # TODO add item_ids
- def new_user_recs(data, count: 5, user_info: nil)
+ def new_user_recs(data, count: 5, user_info: nil, item_ids: nil)
  check_fit

- a_vec, a_bias = factors_warm(data, user_info: user_info)
- top_n(a_vec: a_vec, a_bias: a_bias, count: count)
+ a_vec, a_bias, rated = factors_warm(data, user_info: user_info)
+ top_n(a_vec: a_vec, a_bias: a_bias, count: count, rated: rated, item_ids: item_ids)
+ end
+
+ def user_ids
+ @user_map.keys
  end

- def user_factors
- read_factors(@a, [@m, @m_u].max, @k_user + @k + @k_main)
+ def item_ids
+ @item_map.keys
  end

- def item_factors
- read_factors(@b, [@n, @n_i].max, @k_item + @k + @k_main)
+ def user_factors(user_id = nil)
+ read_factors(@a, [@m, @m_u].max, @k_user + @k + @k_main, user_id, @user_map)
  end

- def user_bias
- read_bias(@bias_a) if @bias_a
+ def item_factors(item_id = nil)
+ read_factors(@b, [@n, @n_i].max, @k_item + @k + @k_main, item_id, @item_map)
  end

- def item_bias
- read_bias(@bias_b) if @bias_b
+ def user_bias(user_id = nil)
+ read_bias(@bias_a, user_id, @user_map) if @bias_a
+ end
+
+ def item_bias(item_id = nil)
+ read_bias(@bias_b, item_id, @item_map) if @bias_b
  end

  def similar_items(item_id, count: 5)
@@ -191,11 +187,17 @@ module Cmfrec
  x_col = []
  x_val = []
  value_key = @implicit ? :value : :rating
+ @rated = Hash.new { |hash, key| hash[key] = {} }
  train_set.each do |v|
- x_row << @user_map[v[:user_id]]
- x_col << @item_map[v[:item_id]]
+ u = @user_map[v[:user_id]]
+ i = @item_map[v[:item_id]]
+ @rated[u][i] = true
+
+ x_row << u
+ x_col << i
  x_val << (v[value_key] || 1)
  end
+ @rated.default = nil

  @m = @user_map.size
  @n = @item_map.size
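
The `@rated` map is built with an auto-vivifying default block and then reset with `@rated.default = nil`. The diff does not say why. Two effects of that reset can be observed in plain Ruby, and either could be the motivation: lookups for unseen users stop inserting empty hashes, and the hash becomes serializable with `Marshal`, which the README uses for saving recommenders. A small sketch follows; `build_rated` and the sample pairs are hypothetical.

```ruby
# Build a nested "user index => {item index => true}" map with a default
# block, then drop the block once the map is complete.
def build_rated(pairs)
  rated = Hash.new { |hash, key| hash[key] = {} }
  pairs.each { |u, i| rated[u][i] = true }
  rated.default = nil # assigning a default value also clears the default proc
  rated
end

RATED = build_rated([[0, 1], [0, 2], [1, 2]])

# An unseen user now returns nil and leaves the hash untouched.
p RATED[5]       # nil
p RATED.key?(5)  # false

# Marshal refuses to dump a hash that still carries a default proc.
begin
  Marshal.dump(Hash.new { |hash, key| hash[key] = {} })
rescue TypeError => e
  puts "dump failed: #{e.message}"
end

# After the reset, a round trip through Marshal works.
p Marshal.load(Marshal.dump(RATED)) == RATED
```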
@@ -435,26 +437,59 @@ module Cmfrec
  end
  end

- def read_factors(ptr, d1, d2)
- arr = []
- offset = 0
+ def read_factors(ptr, d1, d2, id, map)
  width = d2 * Fiddle::SIZEOF_DOUBLE
- d1.times do |i|
- arr << ptr[offset, width].unpack("d*")
- offset += width
+ if id
+ i = map[id]
+ ptr[i * width, width].unpack("d*") if i
+ else
+ arr = []
+ offset = 0
+ d1.times do |i|
+ arr << ptr[offset, width].unpack("d*")
+ offset += width
+ end
+ arr
  end
- arr
  end

- def read_bias(ptr)
- real_array(ptr)
+ def read_bias(ptr, id, map)
+ if id
+ i = map[id]
+ ptr[i * Fiddle::SIZEOF_DOUBLE, Fiddle::SIZEOF_DOUBLE].unpack1("d") if i
+ else
+ real_array(ptr)
+ end
  end

- def top_n(a_vec:, a_bias:, count:)
- include_ix = nil
- n_include = 0
- exclude_ix = nil
- n_exclude = 0
+ def top_n(a_vec:, a_bias:, count:, rated: nil, item_ids: nil)
+ if item_ids
+ # remove missing ids
+ item_ids = item_ids.map { |v| @item_map[v] }.compact
+ return [] if item_ids.empty?
+
+ include_ix = int_ptr(item_ids)
+ n_include = item_ids.size
+
+ # TODO uncomment in 0.2.0
+ count = n_include # if n_include < count
+ else
+ include_ix = nil
+ n_include = 0
+ end
+
+ if rated && !item_ids
+ # assumes rated is unique and all items are known
+ # calling code is responsible for this
+ exclude_ix = int_ptr(rated)
+ n_exclude = rated.size
+ remaining = @item_map.size - n_exclude
+ return [] if remaining == 0
+ count = remaining if remaining < count
+ else
+ exclude_ix = nil
+ n_exclude = 0
+ end

  outp_ix = Fiddle::Pointer.malloc(count * Fiddle::SIZEOF_INT)
  outp_score = Fiddle::Pointer.malloc(count * Fiddle::SIZEOF_DOUBLE)
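
The single-id branch of `read_factors` looks up the row index in the id map, then slices `width` bytes starting at `i * width` and unpacks them as doubles. The same offset arithmetic can be tried without the native library by using a packed binary `String` in place of the `Fiddle::Pointer`. Everything named below (`factors_for`, `BUFFER`, `ID_MAP`, `K`) is a stand-in for illustration.

```ruby
# Stand-in for the native buffer: three rows of K = 2 doubles, row-major.
K = 2
DOUBLE_SIZE = [0.0].pack("d").bytesize # size of a native double in bytes
BUFFER = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0].pack("d*")
ID_MAP = {"a" => 0, "b" => 1, "c" => 2}

# Returns the factor row for one id, or nil when the id is unknown.
def factors_for(buffer, id, map, k)
  i = map[id]
  return nil unless i

  width = k * DOUBLE_SIZE
  # row i starts at byte offset i * width
  buffer.byteslice(i * width, width).unpack("d*")
end

p factors_for(BUFFER, "b", ID_MAP, K) # [3.0, 4.0]
p factors_for(BUFFER, "z", ID_MAP, K) # nil
```

Returning `nil` for an unknown id matches the `... if i` guard in the diff, so callers can distinguish "no such user or item" from an empty factor vector.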
@@ -484,6 +519,16 @@ module Cmfrec
  data = to_dataset(data)
  user_info = to_dataset(user_info) if user_info

+ # remove unknown items
+ data, unknown_data = data.partition { |d| @item_map[d[:item_id]] }
+
+ if unknown_data.any?
+ # TODO warn for unknown items?
+ # warn "[cmfrec] Unknown items: #{unknown_data.map { |d| d[:item_id] }.join(", ")}"
+ end
+
+ item_ids = data.map { |d| @item_map[d[:item_id]] }
+
  nnz = data.size
  a_vec = Fiddle::Pointer.malloc((@k_user + @k + @k_main) * Fiddle::SIZEOF_DOUBLE)
  bias_a = Fiddle::Pointer.malloc(Fiddle::SIZEOF_DOUBLE)
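
This hunk splits the incoming ratings into known and unknown items with `partition`, then maps the known ones to internal indexes. A later hunk passes those indexes, rather than the raw `item_id` values, to `int_ptr`. A standalone sketch of the split, with a hypothetical `ITEM_MAP` and `split_known` helper:

```ruby
# Hypothetical id => index map of the kind built during fitting
ITEM_MAP = {"A" => 0, "B" => 1}

# Splits rows into those with a known item and those without, and maps the
# known rows to their internal item indexes.
def split_known(data, item_map)
  # index 0 is truthy in Ruby, so the first item is not dropped by mistake
  known, unknown = data.partition { |d| item_map[d[:item_id]] }
  indexes = known.map { |d| item_map[d[:item_id]] }
  [known, unknown, indexes]
end

KNOWN, UNKNOWN, INDEXES = split_known(
  [{item_id: "A", rating: 5}, {item_id: "Z", rating: 3}, {item_id: "B", rating: 4}],
  ITEM_MAP
)

p KNOWN.map { |d| d[:item_id] }
p UNKNOWN
p INDEXES
```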
@@ -524,7 +569,7 @@ module Cmfrec
  check_ratings(ratings)
  end
  xa = real_ptr(ratings)
- x_col = int_ptr(data.map { |d| d[:item_id] })
+ x_col = int_ptr(item_ids)
  else
  xa = nil
  x_col = nil
@@ -587,7 +632,7 @@ module Cmfrec
  check_status FFI.factors_collective_explicit_single(*fiddle_args(args))
  end

- [a_vec, real_array(bias_a).first]
+ [a_vec, real_array(bias_a).first, item_ids.uniq]
  end

  # convert boolean to int
@@ -679,6 +724,7 @@ module Cmfrec
  # factors
  obj[:user_map] = @user_map
  obj[:item_map] = @item_map
+ obj[:rated] = @rated
  obj[:user_factors] = dump_ptr(@a)
  obj[:item_factors] = dump_ptr(@b)

@@ -726,6 +772,7 @@ module Cmfrec
  # factors
  @user_map = obj[:user_map]
  @item_map = obj[:item_map]
+ @rated = obj[:rated] || {}
  @a = load_ptr(obj[:user_factors])
  @b = load_ptr(obj[:item_factors])

data/lib/cmfrec/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Cmfrec
- VERSION = "0.1.4"
+ VERSION = "0.1.7"
  end
data/lib/cmfrec.rb CHANGED
@@ -19,7 +19,7 @@ module Cmfrec
  if Gem.win_platform?
  "cmfrec.dll"
  elsif RbConfig::CONFIG["host_os"] =~ /darwin/i
- if RbConfig::CONFIG["host_cpu"] =~ /arm/i
+ if RbConfig::CONFIG["host_cpu"] =~ /arm|aarch64/i
  "libcmfrec.arm64.dylib"
  else
  "libcmfrec.dylib"
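
The "Improved ARM detection" entry corresponds to widening the `host_cpu` pattern from `/arm/i` to `/arm|aarch64/i`, presumably because some Ruby builds report the CPU as `aarch64` instead of `arm64`. The pattern can be checked in isolation; `arm_cpu?` below is a hypothetical helper, not a method of the gem.

```ruby
require "rbconfig"

# Hypothetical helper using the same pattern as the diff. Defaults to the
# running interpreter's host_cpu but accepts any string for testing.
def arm_cpu?(host_cpu = RbConfig::CONFIG["host_cpu"])
  !!(host_cpu =~ /arm|aarch64/i)
end

p arm_cpu?("arm64")   # true
p arm_cpu?("aarch64") # true (did not match the old /arm/i pattern)
p arm_cpu?("x86_64")  # false
```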
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: cmfrec
  version: !ruby/object:Gem::Version
- version: 0.1.4
+ version: 0.1.7
  platform: ruby
  authors:
  - Andrew Kane
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2021-02-05 00:00:00.000000000 Z
+ date: 2022-03-22 00:00:00.000000000 Z
  dependencies: []
  description:
  email: andrew@ankane.org
@@ -28,7 +28,7 @@ files:
  - vendor/libcmfrec.arm64.dylib
  - vendor/libcmfrec.dylib
  - vendor/libcmfrec.so
- homepage: https://github.com/ankane/cmfrec
+ homepage: https://github.com/ankane/cmfrec-ruby
  licenses:
  - MIT
  metadata: {}
@@ -47,7 +47,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.2.3
+ rubygems_version: 3.3.7
  signing_key:
  specification_version: 4
  summary: Recommendations for Ruby using collective matrix factorization