cmfrec 0.1.4 → 0.1.7

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: bb7b07ae46500a545f1a130dfc5648aa3f925f9b5766a6c70a1652c7b5732182
4
- data.tar.gz: e89a6d1900cda651dc6b0aac2899050e28680cddfb6b39b6b5eacfe467b59aad
3
+ metadata.gz: 8d16ab98cb7de22042eaf353a9d41d0b7a4214a631a373c553f73825418c026a
4
+ data.tar.gz: 9ab678a9d389b835b4dfd91d14c372d5acfef950bf068ac46d2d879af04f0fcc
5
5
  SHA512:
6
- metadata.gz: 117aa6952fe0ab8ddebfaece6655cf479a7adbab7d6f634e7d3428c72824a410812c037ae006366180a9691a6d160d8065b777a9c10a33a5ccfefedb28c99ec6
7
- data.tar.gz: 57985a055705b820226a2aa1451453383ee3509e43225f8fdb09e713c4530754b0b608f7d1b4814973b43e3d625f824f9f87939687d015b352cc8905f7b4f118
6
+ metadata.gz: 175d3c91056d2e8734af6961471c98be76e4d5f6b85faaecdfd3b39a220efafa70150983e9d74efb1a1211a29e6c867d6b1d7f482cc34c55500268d29b40158c
7
+ data.tar.gz: faaed621391ccc7d94f2e6309481b24f7db62fbe86eb6c1bbb35445dde0cabf32c41791af4ebcd25e530bdc1f6319f1b4477b6c4d6ca1bf77353d3a0c4ae8d5c
data/CHANGELOG.md CHANGED
@@ -1,3 +1,24 @@
1
+ ## 0.1.7 (2022-03-22)
2
+
3
+ - Improved ARM detection
4
+ - Fixed error with `load_movielens`
5
+ - Fixed duplicates in `item_info` with `load_movielens`
6
+
7
+ ## 0.1.6 (2021-08-12)
8
+
9
+ - Added `user_ids` and `item_ids` methods
10
+ - Added `user_id` argument to `user_factors`
11
+ - Added `item_id` argument to `item_factors`
12
+ - Added `user_id` argument to `user_bias`
13
+ - Added `item_id` argument to `item_bias`
14
+ - Added `item_ids` argument to `new_user_recs`
15
+ - Fixed order for `user_recs`
16
+
17
+ ## 0.1.5 (2021-08-10)
18
+
19
+ - Fixed issue with `user_recs` and `new_user_recs` returning rated items
20
+ - Fixed error with `new_user_recs`
21
+
1
22
  ## 0.1.4 (2021-02-04)
2
23
 
3
24
  - Added support for saving and loading recommenders
data/README.md CHANGED
@@ -1,4 +1,4 @@
1
- # cmfrec
1
+ # cmfrec Ruby
2
2
 
3
3
  :fire: Recommendations for Ruby, powered by [cmfrec](https://github.com/david-cortes/cmfrec)
4
4
 
@@ -6,7 +6,7 @@
6
6
  - Works with explicit and implicit feedback
7
7
  - Uses high-performance matrix factorization
8
8
 
9
- [![Build Status](https://github.com/ankane/cmfrec/workflows/build/badge.svg?branch=master)](https://github.com/ankane/cmfrec/actions)
9
+ [![Build Status](https://github.com/ankane/cmfrec-ruby/workflows/build/badge.svg?branch=master)](https://github.com/ankane/cmfrec-ruby/actions)
10
10
 
11
11
  ## Installation
12
12
 
@@ -58,8 +58,8 @@ Get recommendations for a new user
58
58
 
59
59
  ```ruby
60
60
  recommender.new_user_recs([
61
- {item_id: 1, value: 5},
62
- {item_id: 2, value: 3}
61
+ {item_id: 1, rating: 5},
62
+ {item_id: 2, rating: 3}
63
63
  ])
64
64
  ```
65
65
 
@@ -150,11 +150,7 @@ recommender.predict(ratings.last(20000))
150
150
  [Ahoy](https://github.com/ankane/ahoy) is a great source for implicit feedback
151
151
 
152
152
  ```ruby
153
- views = Ahoy::Event.
154
- where(name: "Viewed post").
155
- group(:user_id).
156
- group("properties->>'post_id'"). # postgres syntax
157
- count
153
+ views = Ahoy::Event.where(name: "Viewed post").group(:user_id).group_prop(:post_id).count
158
154
 
159
155
  data =
160
156
  views.map do |(user_id, post_id), count|
@@ -230,8 +226,17 @@ bin = File.binread("recommender.bin")
230
226
  recommender = Marshal.load(bin)
231
227
  ```
232
228
 
229
+ Alternatively, you can store only the factors and use a library like [Neighbor](https://github.com/ankane/neighbor). See the [examples](https://github.com/ankane/neighbor/tree/master/examples) for Disco, which has a similar API. For explicit feedback, you should [disable the bias](#explicit-feedback) with this approach.
230
+
233
231
  ## Reference
234
232
 
233
+ Get ids
234
+
235
+ ```ruby
236
+ recommender.user_ids
237
+ recommender.item_ids
238
+ ```
239
+
235
240
  Get the global mean
236
241
 
237
242
  ```ruby
@@ -262,22 +267,22 @@ Cmfrec.ffi_lib = "path/to/cmfrec.dll"
262
267
 
263
268
  ## History
264
269
 
265
- View the [changelog](https://github.com/ankane/cmfrec/blob/master/CHANGELOG.md)
270
+ View the [changelog](https://github.com/ankane/cmfrec-ruby/blob/master/CHANGELOG.md)
266
271
 
267
272
  ## Contributing
268
273
 
269
274
  Everyone is encouraged to help improve this project. Here are a few ways you can help:
270
275
 
271
- - [Report bugs](https://github.com/ankane/cmfrec/issues)
272
- - Fix bugs and [submit pull requests](https://github.com/ankane/cmfrec/pulls)
276
+ - [Report bugs](https://github.com/ankane/cmfrec-ruby/issues)
277
+ - Fix bugs and [submit pull requests](https://github.com/ankane/cmfrec-ruby/pulls)
273
278
  - Write, clarify, or fix documentation
274
279
  - Suggest or add new features
275
280
 
276
281
  To get started with development:
277
282
 
278
283
  ```sh
279
- git clone https://github.com/ankane/cmfrec.git
280
- cd cmfrec
284
+ git clone https://github.com/ankane/cmfrec-ruby.git
285
+ cd cmfrec-ruby
281
286
  bundle install
282
287
  bundle exec rake vendor:all
283
288
  bundle exec rake test
data/lib/cmfrec/data.rb CHANGED
@@ -3,11 +3,11 @@ module Cmfrec
3
3
  def load_movielens
4
4
  require "csv"
5
5
 
6
- data_path = download_file("ml-100k/u.data", "http://files.grouplens.org/datasets/movielens/ml-100k/u.data",
6
+ data_path = download_file("ml-100k/u.data", "https://files.grouplens.org/datasets/movielens/ml-100k/u.data",
7
7
  file_hash: "06416e597f82b7342361e41163890c81036900f418ad91315590814211dca490")
8
- user_path = download_file("ml-100k/u.user", "http://files.grouplens.org/datasets/movielens/ml-100k/u.user",
8
+ user_path = download_file("ml-100k/u.user", "https://files.grouplens.org/datasets/movielens/ml-100k/u.user",
9
9
  file_hash: "f120e114da2e8cf314fd28f99417c94ae9ddf1cb6db8ce0e4b5995d40e90e62c")
10
- item_path = download_file("ml-100k/u.item", "http://files.grouplens.org/datasets/movielens/ml-100k/u.item",
10
+ item_path = download_file("ml-100k/u.item", "https://files.grouplens.org/datasets/movielens/ml-100k/u.item",
11
11
  file_hash: "553841ebc7de3a0fd0d6b62a204ea30c1e651aacfb2814c7a6584ac52f2c5701")
12
12
 
13
13
  # convert u.item to utf-8
@@ -24,8 +24,13 @@ module Cmfrec
24
24
 
25
25
  item_info = []
26
26
  movies = {}
27
+ movie_names = {}
27
28
  genres = %w(unknown action adventure animation childrens comedy crime documentary drama fantasy filmnoir horror musical mystery romance scifi thriller war western)
28
29
  CSV.parse(movies_str, col_sep: "|", converters: [:numeric]) do |row|
30
+ # filter duplicates
31
+ next if movie_names[row[1]]
32
+ movie_names[row[1]] = true
33
+
29
34
  movies[row[0]] = row[1]
30
35
  item = {item_id: row[1], year: row[2] ? Date.parse(row[2]).year : 1970}
31
36
  genres.each_with_index do |genre, i|
@@ -49,7 +54,10 @@ module Cmfrec
49
54
  private
50
55
 
51
56
  def download_file(fname, origin, file_hash:)
57
+ require "digest"
52
58
  require "fileutils"
59
+ require "net/http"
60
+ require "tmpdir"
53
61
 
54
62
  # TODO handle this better
55
63
  raise "No HOME" unless ENV["HOME"]
@@ -58,10 +66,6 @@ module Cmfrec
58
66
 
59
67
  return dest if File.exist?(dest)
60
68
 
61
- require "digest"
62
- require "net/http"
63
- require "tmpdir"
64
-
65
69
  temp_path = "#{Dir.tmpdir}/cmfrec-#{Time.now.to_f}" # TODO better name
66
70
 
67
71
  digest = Digest::SHA2.new
@@ -67,21 +67,10 @@ module Cmfrec
67
67
  user = @user_map[user_id]
68
68
 
69
69
  if user
70
- if item_ids
71
- # remove missing ids
72
- item_ids = item_ids.select { |v| @item_map[v] }
73
-
74
- data = item_ids.map { |v| {user_id: user_id, item_id: v} }
75
- scores = predict(data)
76
-
77
- item_ids.zip(scores).map do |item_id, score|
78
- {item_id: item_id, score: score}
79
- end
80
- else
81
- a_vec = @a[user * @k * Fiddle::SIZEOF_DOUBLE, @k * Fiddle::SIZEOF_DOUBLE]
82
- a_bias = @bias_a ? @bias_a[user * Fiddle::SIZEOF_DOUBLE, Fiddle::SIZEOF_DOUBLE].unpack1("d") : 0
83
- top_n(a_vec: a_vec, a_bias: a_bias, count: count)
84
- end
70
+ a_vec = @a[user * @k * Fiddle::SIZEOF_DOUBLE, @k * Fiddle::SIZEOF_DOUBLE]
71
+ a_bias = @bias_a ? @bias_a[user * Fiddle::SIZEOF_DOUBLE, Fiddle::SIZEOF_DOUBLE].unpack1("d") : 0
72
+ # @rated[user] will be nil for recommenders saved before 0.1.5
73
+ top_n(a_vec: a_vec, a_bias: a_bias, count: count, rated: (@rated[user] || {}).keys, item_ids: item_ids)
85
74
  else
86
75
  # no items if user is unknown
87
76
  # TODO maybe most popular items
@@ -89,28 +78,35 @@ module Cmfrec
89
78
  end
90
79
  end
91
80
 
92
- # TODO add item_ids
93
- def new_user_recs(data, count: 5, user_info: nil)
81
+ def new_user_recs(data, count: 5, user_info: nil, item_ids: nil)
94
82
  check_fit
95
83
 
96
- a_vec, a_bias = factors_warm(data, user_info: user_info)
97
- top_n(a_vec: a_vec, a_bias: a_bias, count: count)
84
+ a_vec, a_bias, rated = factors_warm(data, user_info: user_info)
85
+ top_n(a_vec: a_vec, a_bias: a_bias, count: count, rated: rated, item_ids: item_ids)
86
+ end
87
+
88
+ def user_ids
89
+ @user_map.keys
98
90
  end
99
91
 
100
- def user_factors
101
- read_factors(@a, [@m, @m_u].max, @k_user + @k + @k_main)
92
+ def item_ids
93
+ @item_map.keys
102
94
  end
103
95
 
104
- def item_factors
105
- read_factors(@b, [@n, @n_i].max, @k_item + @k + @k_main)
96
+ def user_factors(user_id = nil)
97
+ read_factors(@a, [@m, @m_u].max, @k_user + @k + @k_main, user_id, @user_map)
106
98
  end
107
99
 
108
- def user_bias
109
- read_bias(@bias_a) if @bias_a
100
+ def item_factors(item_id = nil)
101
+ read_factors(@b, [@n, @n_i].max, @k_item + @k + @k_main, item_id, @item_map)
110
102
  end
111
103
 
112
- def item_bias
113
- read_bias(@bias_b) if @bias_b
104
+ def user_bias(user_id = nil)
105
+ read_bias(@bias_a, user_id, @user_map) if @bias_a
106
+ end
107
+
108
+ def item_bias(item_id = nil)
109
+ read_bias(@bias_b, item_id, @item_map) if @bias_b
114
110
  end
115
111
 
116
112
  def similar_items(item_id, count: 5)
@@ -191,11 +187,17 @@ module Cmfrec
191
187
  x_col = []
192
188
  x_val = []
193
189
  value_key = @implicit ? :value : :rating
190
+ @rated = Hash.new { |hash, key| hash[key] = {} }
194
191
  train_set.each do |v|
195
- x_row << @user_map[v[:user_id]]
196
- x_col << @item_map[v[:item_id]]
192
+ u = @user_map[v[:user_id]]
193
+ i = @item_map[v[:item_id]]
194
+ @rated[u][i] = true
195
+
196
+ x_row << u
197
+ x_col << i
197
198
  x_val << (v[value_key] || 1)
198
199
  end
200
+ @rated.default = nil
199
201
 
200
202
  @m = @user_map.size
201
203
  @n = @item_map.size
@@ -435,26 +437,59 @@ module Cmfrec
435
437
  end
436
438
  end
437
439
 
438
- def read_factors(ptr, d1, d2)
439
- arr = []
440
- offset = 0
440
+ def read_factors(ptr, d1, d2, id, map)
441
441
  width = d2 * Fiddle::SIZEOF_DOUBLE
442
- d1.times do |i|
443
- arr << ptr[offset, width].unpack("d*")
444
- offset += width
442
+ if id
443
+ i = map[id]
444
+ ptr[i * width, width].unpack("d*") if i
445
+ else
446
+ arr = []
447
+ offset = 0
448
+ d1.times do |i|
449
+ arr << ptr[offset, width].unpack("d*")
450
+ offset += width
451
+ end
452
+ arr
445
453
  end
446
- arr
447
454
  end
448
455
 
449
- def read_bias(ptr)
450
- real_array(ptr)
456
+ def read_bias(ptr, id, map)
457
+ if id
458
+ i = map[id]
459
+ ptr[i * Fiddle::SIZEOF_DOUBLE, Fiddle::SIZEOF_DOUBLE].unpack1("d") if i
460
+ else
461
+ real_array(ptr)
462
+ end
451
463
  end
452
464
 
453
- def top_n(a_vec:, a_bias:, count:)
454
- include_ix = nil
455
- n_include = 0
456
- exclude_ix = nil
457
- n_exclude = 0
465
+ def top_n(a_vec:, a_bias:, count:, rated: nil, item_ids: nil)
466
+ if item_ids
467
+ # remove missing ids
468
+ item_ids = item_ids.map { |v| @item_map[v] }.compact
469
+ return [] if item_ids.empty?
470
+
471
+ include_ix = int_ptr(item_ids)
472
+ n_include = item_ids.size
473
+
474
+ # TODO uncomment in 0.2.0
475
+ count = n_include # if n_include < count
476
+ else
477
+ include_ix = nil
478
+ n_include = 0
479
+ end
480
+
481
+ if rated && !item_ids
482
+ # assumes rated is unique and all items are known
483
+ # calling code is responsible for this
484
+ exclude_ix = int_ptr(rated)
485
+ n_exclude = rated.size
486
+ remaining = @item_map.size - n_exclude
487
+ return [] if remaining == 0
488
+ count = remaining if remaining < count
489
+ else
490
+ exclude_ix = nil
491
+ n_exclude = 0
492
+ end
458
493
 
459
494
  outp_ix = Fiddle::Pointer.malloc(count * Fiddle::SIZEOF_INT)
460
495
  outp_score = Fiddle::Pointer.malloc(count * Fiddle::SIZEOF_DOUBLE)
@@ -484,6 +519,16 @@ module Cmfrec
484
519
  data = to_dataset(data)
485
520
  user_info = to_dataset(user_info) if user_info
486
521
 
522
+ # remove unknown items
523
+ data, unknown_data = data.partition { |d| @item_map[d[:item_id]] }
524
+
525
+ if unknown_data.any?
526
+ # TODO warn for unknown items?
527
+ # warn "[cmfrec] Unknown items: #{unknown_data.map { |d| d[:item_id] }.join(", ")}"
528
+ end
529
+
530
+ item_ids = data.map { |d| @item_map[d[:item_id]] }
531
+
487
532
  nnz = data.size
488
533
  a_vec = Fiddle::Pointer.malloc((@k_user + @k + @k_main) * Fiddle::SIZEOF_DOUBLE)
489
534
  bias_a = Fiddle::Pointer.malloc(Fiddle::SIZEOF_DOUBLE)
@@ -524,7 +569,7 @@ module Cmfrec
524
569
  check_ratings(ratings)
525
570
  end
526
571
  xa = real_ptr(ratings)
527
- x_col = int_ptr(data.map { |d| d[:item_id] })
572
+ x_col = int_ptr(item_ids)
528
573
  else
529
574
  xa = nil
530
575
  x_col = nil
@@ -587,7 +632,7 @@ module Cmfrec
587
632
  check_status FFI.factors_collective_explicit_single(*fiddle_args(args))
588
633
  end
589
634
 
590
- [a_vec, real_array(bias_a).first]
635
+ [a_vec, real_array(bias_a).first, item_ids.uniq]
591
636
  end
592
637
 
593
638
  # convert boolean to int
@@ -679,6 +724,7 @@ module Cmfrec
679
724
  # factors
680
725
  obj[:user_map] = @user_map
681
726
  obj[:item_map] = @item_map
727
+ obj[:rated] = @rated
682
728
  obj[:user_factors] = dump_ptr(@a)
683
729
  obj[:item_factors] = dump_ptr(@b)
684
730
 
@@ -726,6 +772,7 @@ module Cmfrec
726
772
  # factors
727
773
  @user_map = obj[:user_map]
728
774
  @item_map = obj[:item_map]
775
+ @rated = obj[:rated] || {}
729
776
  @a = load_ptr(obj[:user_factors])
730
777
  @b = load_ptr(obj[:item_factors])
731
778
 
@@ -1,3 +1,3 @@
1
1
  module Cmfrec
2
- VERSION = "0.1.4"
2
+ VERSION = "0.1.7"
3
3
  end
data/lib/cmfrec.rb CHANGED
@@ -19,7 +19,7 @@ module Cmfrec
19
19
  if Gem.win_platform?
20
20
  "cmfrec.dll"
21
21
  elsif RbConfig::CONFIG["host_os"] =~ /darwin/i
22
- if RbConfig::CONFIG["host_cpu"] =~ /arm/i
22
+ if RbConfig::CONFIG["host_cpu"] =~ /arm|aarch64/i
23
23
  "libcmfrec.arm64.dylib"
24
24
  else
25
25
  "libcmfrec.dylib"
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: cmfrec
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.4
4
+ version: 0.1.7
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrew Kane
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2021-02-05 00:00:00.000000000 Z
11
+ date: 2022-03-22 00:00:00.000000000 Z
12
12
  dependencies: []
13
13
  description:
14
14
  email: andrew@ankane.org
@@ -28,7 +28,7 @@ files:
28
28
  - vendor/libcmfrec.arm64.dylib
29
29
  - vendor/libcmfrec.dylib
30
30
  - vendor/libcmfrec.so
31
- homepage: https://github.com/ankane/cmfrec
31
+ homepage: https://github.com/ankane/cmfrec-ruby
32
32
  licenses:
33
33
  - MIT
34
34
  metadata: {}
@@ -47,7 +47,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
47
47
  - !ruby/object:Gem::Version
48
48
  version: '0'
49
49
  requirements: []
50
- rubygems_version: 3.2.3
50
+ rubygems_version: 3.3.7
51
51
  signing_key:
52
52
  specification_version: 4
53
53
  summary: Recommendations for Ruby using collective matrix factorization