cmfrec 0.1.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 2e9f45e0c3826b90788782ac8a0838476fc7849e75d91f2d5949b08b382a28c3
4
+ data.tar.gz: 2f44b988001bf2b23e3c5938b4c5e7b9089e8943d86aa42c86e72be0ff558ea1
5
+ SHA512:
6
+ metadata.gz: 592ef4363bc016da1a35a958f99dd13a0fc7e900ecbe77b3ba95fc4b4cbbe151397924ff90aab2e326ac3369ac07c0380c76b1d3a2ac71b390664118de0f8611
7
+ data.tar.gz: 27d3e1ce80e88af9e8b062837f68329b10718372be15c08d4cd2e056619d1cd9e5ed906e6070798c32993ccea2f617b706af5e6afb7cba4f3cfc70ea22393ba6
@@ -0,0 +1,3 @@
1
+ ## 0.1.0 (2020-11-27)
2
+
3
+ - First release
@@ -0,0 +1,24 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2020 David Cortes
4
+ Copyright (c) 2020 Andrew Kane
5
+
6
+ All rights reserved.
7
+
8
+ Permission is hereby granted, free of charge, to any person obtaining a copy
9
+ of this software and associated documentation files (the "Software"), to
10
+ deal in the Software without restriction, including without limitation the
11
+ rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
12
+ sell copies of the Software, and to permit persons to whom the Software is
13
+ furnished to do so, subject to the following conditions:
14
+
15
+ The above copyright notice and this permission notice shall be included in
16
+ all copies or substantial portions of the Software.
17
+
18
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
19
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
20
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
21
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
22
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
23
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
24
+ IN THE SOFTWARE.
@@ -0,0 +1,191 @@
1
+ # cmfrec
2
+
3
+ :fire: Recommendations for Ruby, powered by [cmfrec](https://github.com/david-cortes/cmfrec)
4
+
5
+ - Supports side information :tada:
6
+ - Works with explicit and implicit feedback
7
+ - Uses high-performance matrix factorization
8
+
9
+ Not available for Windows yet
10
+
11
+ [![Build Status](https://github.com/ankane/cmfrec/workflows/build/badge.svg?branch=master)](https://github.com/ankane/cmfrec/actions)
12
+
13
+ ## Installation
14
+
15
+ Add this line to your application’s Gemfile:
16
+
17
+ ```ruby
18
+ gem 'cmfrec'
19
+ ```
20
+
21
+ ## Getting Started
22
+
23
+ Create a recommender
24
+
25
+ ```ruby
26
+ recommender = Cmfrec::Recommender.new
27
+ ```
28
+
29
+ If users rate items directly, this is known as explicit feedback. Fit the recommender with:
30
+
31
+ ```ruby
32
+ recommender.fit([
33
+ {user_id: 1, item_id: 1, rating: 5},
34
+ {user_id: 2, item_id: 1, rating: 3}
35
+ ])
36
+ ```
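The following is a rough plain-Ruby sketch of what `fit` appears to do with explicit data before it calls the C library (see `create_maps` and the loop in `fit` in `recommender.rb`). Each ID is mapped to a 0-based index, and the ratings become sparse (row, column, value) triplets. The method name `build_triplets` is hypothetical. The IDs are sorted, which is why the IDs within one column must be comparable with each other.

```ruby
# Hypothetical helper mirroring create_maps + the triplet loop in Recommender#fit.
def build_triplets(train_set)
  # Sorted unique IDs -> 0-based indices (sorting requires comparable IDs)
  user_ids = train_set.map { |v| v[:user_id] }.uniq.sort
  item_ids = train_set.map { |v| v[:item_id] }.uniq.sort
  user_map = user_ids.each_with_index.to_h
  item_map = item_ids.each_with_index.to_h

  # Sparse COO triplets: one (row, col, value) per rating
  rows = train_set.map { |v| user_map[v[:user_id]] }
  cols = train_set.map { |v| item_map[v[:item_id]] }
  vals = train_set.map { |v| v[:rating] }
  [user_map, item_map, rows, cols, vals]
end

user_map, item_map, rows, cols, vals = build_triplets([
  {user_id: "b", item_id: 10, rating: 5},
  {user_id: "a", item_id: 10, rating: 3},
  {user_id: "a", item_id: 20, rating: 4}
])
p rows # => [1, 0, 0]
p cols # => [0, 0, 1]
p vals # => [5, 3, 4]
```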
37
+
38
+ > IDs can be integers, strings, or any other type whose values can be sorted against each other
39
+
40
+ If users don’t rate items directly (for instance, they’re purchasing items or reading posts), this is known as implicit feedback. Leave out the rating, or use a value like number of purchases, number of page views, or time spent on page:
41
+
42
+ ```ruby
43
+ recommender.fit([
44
+ {user_id: 1, item_id: 1, value: 1},
45
+ {user_id: 2, item_id: 1, value: 1}
46
+ ])
47
+ ```
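Going by the checks in `fit`, the choice between explicit and implicit mode is made from the data itself. The data is treated as implicit only when no row has a `rating`, and a missing `value` counts as 1. The sketch below shows this rule. The helper names are hypothetical. The gem also raises `ArgumentError` when some explicit rows lack a rating, and that check is left out here.

```ruby
# Hypothetical helpers mirroring the explicit/implicit detection in Recommender#fit.
def implicit?(train_set)
  # Implicit only if *no* row carries a :rating
  train_set.none? { |v| v[:rating] }
end

def feedback_values(train_set)
  key = implicit?(train_set) ? :value : :rating
  # A missing value defaults to 1 (e.g. "the user interacted once")
  train_set.map { |v| v[key] || 1 }
end

implicit_rows = [{user_id: 1, item_id: 1}, {user_id: 2, item_id: 1, value: 3}]
explicit_rows = [{user_id: 1, item_id: 1, rating: 5}]

p implicit?(implicit_rows)       # => true
p feedback_values(implicit_rows) # => [1, 3]
p implicit?(explicit_rows)       # => false
```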
48
+
49
+ > Use `value` instead of `rating` for implicit feedback
50
+
51
+ Get recommendations - “users like you also liked”
52
+
53
+ ```ruby
54
+ recommender.user_recs(user_id)
55
+ ```
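In the gem, `user_recs` hands the ranking to the C function `topN`. The sketch below is a plain-Ruby stand-in. It assumes the usual biased matrix factorization score, `global_mean + user_bias + item_bias[i] + dot(a_u, b_i)`, and returns the `count` best `[item_index, score]` pairs. The method names are hypothetical, and the gem also maps indices back to the original item IDs.

```ruby
# Hypothetical plain-Ruby stand-in for the topN call behind user_recs.
def dot(x, y)
  x.zip(y).sum { |a, b| a * b }
end

def rank_items(user_vec, user_bias, item_factors, item_bias, global_mean, count)
  scores = item_factors.each_with_index.map do |b, i|
    [i, global_mean + user_bias + item_bias[i] + dot(user_vec, b)]
  end
  # max_by(n) returns the n highest-scoring pairs, best first
  scores.max_by(count) { |_, s| s }
end

item_factors = [[1.0, 0.0], [0.0, 2.0], [0.5, 0.5]]
item_bias = [0.0, -0.5, 0.5]
p rank_items([1.0, 0.5], 0.5, item_factors, item_bias, 3.0, 2) # => [[2, 4.75], [0, 4.5]]
```

The user bias and global mean shift every score equally, so they change the reported scores but not the order.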
56
+
57
+ Get recommendations for a new user
58
+
59
+ ```ruby
60
+ recommender.new_user_recs([
61
+ {item_id: 1, value: 5},
62
+ {item_id: 2, value: 3}
63
+ ])
64
+ ```
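For a new user, `new_user_recs` calls `factors_collective_explicit_single` or `factors_collective_implicit_single` to compute a factor vector from the given ratings. The C routines also account for biases, side information, and implicit weighting. The sketch below shows only the textbook explicit ALS "fold-in". It assumes a ridge regression `a = (BᵀB + λI)⁻¹ Bᵀr` over the rated items and is not the library's exact computation. `fold_in` and `solve` are hypothetical names.

```ruby
# Hypothetical sketch: ridge-regression fold-in of a new user (biases and
# side information ignored). B = factors of rated items, r = ratings.

# Solve the square system m * x = v by Gaussian elimination with partial pivoting.
def solve(m, v)
  n = v.size
  a = m.each_with_index.map { |row, i| row + [v[i]] } # augmented matrix [m | v]
  n.times do |col|
    pivot = (col...n).max_by { |r| a[r][col].abs }
    a[col], a[pivot] = a[pivot], a[col]
    (col + 1...n).each do |r|
      f = a[r][col].fdiv(a[col][col])
      (col..n).each { |c| a[r][c] -= f * a[col][c] }
    end
  end
  # Back substitution
  x = Array.new(n, 0.0)
  (n - 1).downto(0) do |i|
    x[i] = (a[i][n] - (i + 1...n).sum { |j| a[i][j] * x[j] }).fdiv(a[i][i])
  end
  x
end

def fold_in(item_vecs, ratings, lambda_)
  k = item_vecs.first.size
  # B^T B + lambda * I
  btb = Array.new(k) do |i|
    Array.new(k) { |j| item_vecs.sum { |b| b[i] * b[j] } + (i == j ? lambda_ : 0.0) }
  end
  # B^T r
  btr = Array.new(k) { |i| item_vecs.zip(ratings).sum { |b, r| b[i] * r } }
  solve(btb, btr)
end

p fold_in([[1.0, 0.0], [0.0, 1.0]], [4.0, 2.0], 1.0) # => [2.0, 1.0]
```

The resulting vector can then be scored against every item in the same way as an existing user's factors.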
65
+
66
+ Use the `count` option to specify the number of recommendations (default is 5)
67
+
68
+ ```ruby
69
+ recommender.user_recs(user_id, count: 3)
70
+ ```
71
+
72
+ Get predicted ratings for specific items
73
+
74
+ ```ruby
75
+ recommender.user_recs(user_id, item_ids: [1, 2, 3])
76
+ ```
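When `item_ids:` is passed, `user_recs` drops any IDs it did not see during training and scores the rest with `predict_multiple`. The sketch below is a plain-Ruby approximation. It assumes the same biased score as above. In implicit mode the gem uses a global mean of 0 and no biases. `predict_items` and its arguments are hypothetical.

```ruby
# Hypothetical approximation of user_recs(user_id, item_ids: [...]).
def predict_items(item_ids, item_map, user_vec, user_bias, item_factors, item_bias, global_mean)
  # Unknown item IDs are silently removed, as in the gem
  item_ids.select { |id| item_map.key?(id) }.map do |id|
    i = item_map[id]
    dot = user_vec.zip(item_factors[i]).sum { |a, b| a * b }
    {item_id: id, score: global_mean + user_bias + item_bias[i] + dot}
  end
end

item_map = {"x" => 0, "y" => 1}
preds = predict_items(["y", "missing", "x"], item_map, [1.0, 0.5], 0.5,
                      [[1.0, 0.0], [0.0, 2.0]], [0.0, -0.5], 3.0)
p preds.map { |h| [h[:item_id], h[:score]] } # => [["y", 4.0], ["x", 4.5]]
```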
77
+
78
+ ## Side Information
79
+
80
+ Add side information about users, items, or both
81
+
82
+ ```ruby
83
+ user_info = [
84
+ {user_id: 1, a: 1, b: 1},
85
+ {user_id: 2, a: 1, b: 1},
86
+ ]
87
+ item_info = [
88
+ {item_id: 1, c: 1, d: 1},
89
+ {item_id: 2, c: 1, d: 1},
90
+ ]
91
+ recommender.fit(ratings, user_info: user_info, item_info: item_info)
92
+ ```
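Side information goes through `process_info` in `recommender.rb`. Each record becomes a row, each non-ID key becomes a column, and the values are stored as sparse (row, column, value) triplets. An ID that appears only in the side information gets a new row index after the existing ones. The sketch below follows that logic in plain Ruby. `side_info_triplets` is a hypothetical name.

```ruby
# Hypothetical mirror of Recommender#process_info.
def side_info_triplets(info, id_map, key)
  col_map = {}
  rows, cols, vals = [], [], []
  info.each do |record|
    id = record[key]
    raise ArgumentError, "Missing #{key}" unless id

    # Existing users/items keep their index; new ones are appended
    r = (id_map[id] ||= id_map.size)
    record.each do |k, v|
      next if k == key
      rows << r
      cols << (col_map[k] ||= col_map.size) # one column per attribute name
      vals << v
    end
  end
  [rows, cols, vals, col_map]
end

user_map = {1 => 0, 2 => 1}
rows, cols, vals, col_map = side_info_triplets(
  [{user_id: 1, a: 1, b: 1}, {user_id: 3, a: 0.5}], user_map, :user_id
)
p [rows, cols, vals] # => [[0, 0, 2], [0, 1, 0], [1, 1, 0.5]]
```

Note that `id_map` is modified in place, just as the gem extends `@user_map` and `@item_map`.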
93
+
94
+ Get recommendations for a new user with ratings and side information
95
+
96
+ ```ruby
97
+ ratings = [
98
+ {item_id: 1, rating: 5},
99
+ {item_id: 2, rating: 3}
100
+ ]
101
+ recommender.new_user_recs(ratings, user_info: {a: 1, b: 1})
102
+ ```
103
+
104
+ Get recommendations with only side information
105
+
106
+ ```ruby
107
+ recommender.new_user_recs([], user_info: {a: 1, b: 1})
108
+ ```
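For a new user, the `user_info` hash is encoded against the attribute columns learned during `fit` (see `factors_warm`). Any attribute that was not present at training time raises an error. Here is a minimal sketch of that lookup, with a hypothetical helper name.

```ruby
# Hypothetical mirror of the user_info encoding in factors_warm.
def encode_user_info(user_info, col_map)
  cols, vals = [], []
  user_info.each do |k, v|
    next if k == :user_id
    c = col_map[k]
    raise "Bad key: #{k}" unless c # same message as the gem
    cols << c
    vals << v
  end
  [cols, vals]
end

p encode_user_info({b: 2, a: 1}, {a: 0, b: 1}) # => [[1, 0], [2, 1]]
```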
109
+
110
+ ## Options
111
+
112
+ Specify the number of factors and epochs
113
+
114
+ ```ruby
115
+ Cmfrec::Recommender.new(factors: 8, epochs: 20)
116
+ ```
117
+
118
+ If recommendations look off, try changing `factors`. The default is 8, but 3 could be good for some applications and 300 good for others.
119
+
120
+ ### Explicit Feedback
121
+
122
+ Add implicit features
123
+
124
+ ```ruby
125
+ Cmfrec::Recommender.new(add_implicit_features: true)
126
+ ```
127
+
128
+ Disable bias
129
+
130
+ ```ruby
131
+ Cmfrec::Recommender.new(user_bias: false, item_bias: false)
132
+ ```
133
+
134
+ ## Data
135
+
136
+ Data can be an array of hashes
137
+
138
+ ```ruby
139
+ [{user_id: 1, item_id: 1, rating: 5}, {user_id: 2, item_id: 1, rating: 3}]
140
+ ```
141
+
142
+ Or a Rover data frame
143
+
144
+ ```ruby
145
+ Rover.read_csv("ratings.csv")
146
+ ```
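The recommender reads rows with symbol keys (`v[:user_id]`, `v[:item_id]`, ...), and `to_dataset` converts data frame column names to symbols. Rows loaded some other way with string keys would otherwise be rejected with "Missing user_id". A plain-Ruby conversion, offered as a suggestion:

```ruby
# String-keyed rows (e.g. parsed from a file) converted to the symbol keys the gem reads
rows = [
  {"user_id" => 1, "item_id" => 1, "rating" => 5},
  {"user_id" => 2, "item_id" => 1, "rating" => 3}
]
data = rows.map { |row| row.transform_keys(&:to_sym) }
# data.first == {user_id: 1, item_id: 1, rating: 5}
```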
147
+
148
+ ## Reference
149
+
150
+ Get the global mean
151
+
152
+ ```ruby
153
+ recommender.global_mean
154
+ ```
155
+
156
+ Get the factors
157
+
158
+ ```ruby
159
+ recommender.user_factors
160
+ recommender.item_factors
161
+ ```
162
+
163
+ Get the bias
164
+
165
+ ```ruby
166
+ recommender.user_bias
167
+ recommender.item_bias
168
+ ```
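`user_factors` and `item_factors` read a flat, row-major buffer of C doubles and slice it into one array per user or item (`read_factors` in `recommender.rb`). The number of rows is `[m, m_u].max`, so users or items that appear only in side information get rows too. In the sketch below, a packed `String` stands in for the `Fiddle::Pointer` so the example runs without the native library. `read_rows` is a hypothetical name.

```ruby
# Hypothetical stand-in for read_factors: slice a packed double buffer into rows.
def read_rows(buffer, rows, width)
  bytes = width * [0.0].pack("d").bytesize # bytes per row (8 per double on common platforms)
  Array.new(rows) { |i| buffer.byteslice(i * bytes, bytes).unpack("d*") }
end

buffer = [0.5, 1.0, -0.25, 2.0, 0.0, 1.5].pack("d*")
p read_rows(buffer, 2, 3) # => [[0.5, 1.0, -0.25], [2.0, 0.0, 1.5]]
```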
169
+
170
+ ## History
171
+
172
+ View the [changelog](https://github.com/ankane/cmfrec/blob/master/CHANGELOG.md)
173
+
174
+ ## Contributing
175
+
176
+ Everyone is encouraged to help improve this project. Here are a few ways you can help:
177
+
178
+ - [Report bugs](https://github.com/ankane/cmfrec/issues)
179
+ - Fix bugs and [submit pull requests](https://github.com/ankane/cmfrec/pulls)
180
+ - Write, clarify, or fix documentation
181
+ - Suggest or add new features
182
+
183
+ To get started with development:
184
+
185
+ ```sh
186
+ git clone https://github.com/ankane/cmfrec.git
187
+ cd cmfrec
188
+ bundle install
189
+ bundle exec rake vendor:all
190
+ bundle exec rake test
191
+ ```
@@ -0,0 +1,28 @@
1
+ # stdlib
2
+ require "etc"
3
+ require "fiddle/import"
4
+
5
+ # modules
6
+ require "cmfrec/recommender"
7
+ require "cmfrec/version"
8
+
9
+ module Cmfrec
10
+ class Error < StandardError; end
11
+
12
+ class << self
13
+ attr_accessor :ffi_lib
14
+ end
15
+ lib_name =
16
+ if Gem.win_platform?
17
+ "cmfrec.dll"
18
+ elsif RbConfig::CONFIG["host_os"] =~ /darwin/i
19
+ "libcmfrec.dylib"
20
+ else
21
+ "libcmfrec.so"
22
+ end
23
+ vendor_lib = File.expand_path("../vendor/#{lib_name}", __dir__)
24
+ self.ffi_lib = [vendor_lib]
25
+
26
+ # friendlier error message
27
+ autoload :FFI, "cmfrec/ffi"
28
+ end
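The platform check above chooses which vendored library file to load. It is pulled into a function below so it can be tried with different `host_os` strings. The `windows:` flag stands in for `Gem.win_platform?`, and the function name is hypothetical.

```ruby
# Hypothetical extraction of the lib_name logic in lib/cmfrec.rb.
def cmfrec_lib_name(host_os, windows: false)
  if windows
    "cmfrec.dll"
  elsif host_os =~ /darwin/i
    "libcmfrec.dylib"
  else
    "libcmfrec.so"
  end
end

p cmfrec_lib_name("darwin19")  # => "libcmfrec.dylib"
p cmfrec_lib_name("linux-gnu") # => "libcmfrec.so"
```

The gem ships only `libcmfrec.dylib` and `libcmfrec.so`, which matches the README note that Windows is not supported yet. `Cmfrec::FFI` is autoloaded and reads `Cmfrec.ffi_lib` when it is first referenced, so assigning a different array of paths before the first `fit` appears to be a way to point the gem at another build. This is inferred from the code, not documented.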
@@ -0,0 +1,26 @@
1
+ module Cmfrec
2
+ module FFI
3
+ extend Fiddle::Importer
4
+
5
+ libs = Cmfrec.ffi_lib.dup
6
+ begin
7
+ dlload Fiddle.dlopen(libs.shift)
8
+ rescue Fiddle::DLError => e
9
+ retry if libs.any?
10
+ raise e
11
+ end
12
+
13
+ typealias "bool", "char"
14
+ # determined by CMakeLists.txt
15
+ typealias "int_t", "int"
16
+ typealias "real_t", "double"
17
+
18
+ extern "int_t fit_collective_explicit_als(real_t *biasA, real_t *biasB, real_t *A, real_t *B, real_t *C, real_t *D, real_t *Ai, real_t *Bi, bool add_implicit_features, bool reset_values, int_t seed, real_t *glob_mean, real_t *U_colmeans, real_t *I_colmeans, int_t m, int_t n, int_t k, int_t X_row[], int_t X_col[], real_t *X, size_t nnz, real_t *Xfull, real_t *weight, bool user_bias, bool item_bias, real_t lam, real_t *lam_unique, real_t *U, int_t m_u, int_t p, real_t *II, int_t n_i, int_t q, int_t U_row[], int_t U_col[], real_t *U_sp, size_t nnz_U, int_t I_row[], int_t I_col[], real_t *I_sp, size_t nnz_I, bool NA_as_zero_X, bool NA_as_zero_U, bool NA_as_zero_I, int_t k_main, int_t k_user, int_t k_item, real_t w_main, real_t w_user, real_t w_item, real_t w_implicit, int_t niter, int_t nthreads, bool verbose, bool handle_interrupt, bool use_cg, int_t max_cg_steps, bool finalize_chol, bool nonneg, int_t max_cd_steps, bool nonneg_C, bool nonneg_D, bool precompute_for_predictions, bool include_all_X, real_t *B_plus_bias, real_t *precomputedBtB, real_t *precomputedTransBtBinvBt, real_t *precomputedBeTBeChol, real_t *precomputedBiTBi, real_t *precomputedTransCtCinvCt, real_t *precomputedCtCw)"
19
+ extern "int_t fit_collective_implicit_als(real_t *A, real_t *B, real_t *C, real_t *D, bool reset_values, int_t seed, real_t *U_colmeans, real_t *I_colmeans, int_t m, int_t n, int_t k, int_t X_row[], int_t X_col[], real_t *X, size_t nnz, real_t lam, real_t *lam_unique, real_t *U, int_t m_u, int_t p, real_t *II, int_t n_i, int_t q, int_t U_row[], int_t U_col[], real_t *U_sp, size_t nnz_U, int_t I_row[], int_t I_col[], real_t *I_sp, size_t nnz_I, bool NA_as_zero_U, bool NA_as_zero_I, int_t k_main, int_t k_user, int_t k_item, real_t w_main, real_t w_user, real_t w_item, real_t *w_main_multiplier, real_t alpha, bool adjust_weight, bool apply_log_transf, int_t niter, int_t nthreads, bool verbose, bool handle_interrupt, bool use_cg, int_t max_cg_steps, bool finalize_chol, bool nonneg, int_t max_cd_steps, bool nonneg_C, bool nonneg_D, bool precompute_for_predictions, real_t *precomputedBtB, real_t *precomputedBeTBe, real_t *precomputedBeTBeChol)"
20
+ extern "int_t factors_collective_explicit_single(real_t *a_vec, real_t *a_bias, real_t *u_vec, int_t p, real_t *u_vec_sp, int_t u_vec_X_col[], size_t nnz_u_vec, real_t *u_bin_vec, int_t pbin, bool NA_as_zero_U, bool NA_as_zero_X, bool nonneg, real_t *C, real_t *Cb, real_t glob_mean, real_t *biasB, real_t *U_colmeans, real_t *Xa, int_t X_col[], size_t nnz, real_t *Xa_dense, int_t n, real_t *weight, real_t *B, real_t *Bi, bool add_implicit_features, int_t k, int_t k_user, int_t k_item, int_t k_main, real_t lam, real_t *lam_unique, real_t w_main, real_t w_user, real_t w_implicit, int_t n_max, bool include_all_X, real_t *TransBtBinvBt, real_t *BtB, real_t *BeTBeChol, real_t *BiTBi, real_t *CtCw, real_t *TransCtCinvCt, real_t *B_plus_bias)"
21
+ extern "int_t factors_collective_implicit_single(real_t *a_vec, real_t *u_vec, int_t p, real_t *u_vec_sp, int_t u_vec_X_col[], size_t nnz_u_vec, bool NA_as_zero_U, bool nonneg, real_t *U_colmeans, real_t *B, int_t n, real_t *C, real_t *Xa, int_t X_col[], size_t nnz, int_t k, int_t k_user, int_t k_item, int_t k_main, real_t lam, real_t alpha, real_t w_main, real_t w_user, real_t w_main_multiplier, bool apply_log_transf, real_t *BeTBe, real_t *BtB, real_t *BeTBeChol)"
22
+ extern "void predict_multiple(real_t *restrict A, int_t k_user, real_t *restrict B, int_t k_item, real_t *restrict biasA, real_t *restrict biasB, real_t glob_mean, int_t k, int_t k_main, int_t m, int_t n, int_t predA[], int_t predB[], size_t nnz, real_t *restrict outp, int_t nthreads)"
23
+ extern "int_t predict_X_old_collective_explicit(int_t row[], int_t col[], real_t *restrict predicted, size_t n_predict, real_t *restrict A, real_t *restrict biasA, real_t *restrict B, real_t *restrict biasB, real_t glob_mean, int_t k, int_t k_user, int_t k_item, int_t k_main, int_t m, int_t n_max, int_t nthreads)"
24
+ extern "int_t topN(real_t *restrict a_vec, int_t k_user, real_t *restrict B, int_t k_item, real_t *restrict biasB, real_t glob_mean, real_t biasA, int_t k, int_t k_main, int_t *restrict include_ix, int_t n_include, int_t *restrict exclude_ix, int_t n_exclude, int_t *restrict outp_ix, real_t *restrict outp_score, int_t n_top, int_t n, int_t nthreads)"
25
+ end
26
+ end
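`ffi.rb` loads the first library in `Cmfrec.ffi_lib` that opens successfully. On each `Fiddle::DLError` it uses `retry` to move to the next candidate, and it re-raises the last error once the list is exhausted. The sketch below reproduces that pattern with a hypothetical loader lambda and error class in place of `Fiddle.dlopen`, so it runs without native libraries.

```ruby
# Hypothetical reproduction of the begin/rescue/retry fallback in ffi.rb.
class LoadFailed < StandardError; end

def load_first(candidates, loader)
  libs = candidates.dup
  begin
    loader.call(libs.shift)
  rescue LoadFailed => e
    retry if libs.any? # try the next candidate
    raise e            # nothing left: surface the last error
  end
end

loader = ->(path) { path.end_with?(".so") ? "loaded #{path}" : raise(LoadFailed, path) }
p load_first(["missing.dylib", "vendor/libcmfrec.so"], loader) # => "loaded vendor/libcmfrec.so"
```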
@@ -0,0 +1,548 @@
1
+ module Cmfrec
2
+ class Recommender
3
+ attr_reader :global_mean
4
+
5
+ def initialize(factors: 8, epochs: 10, verbose: true, user_bias: true, item_bias: true, add_implicit_features: false)
6
+ set_params(
7
+ k: factors,
8
+ niter: epochs,
9
+ verbose: verbose,
10
+ user_bias: user_bias,
11
+ item_bias: item_bias,
12
+ add_implicit_features: add_implicit_features
13
+ )
14
+ end
15
+
16
+ def fit(train_set, user_info: nil, item_info: nil)
17
+ train_set = to_dataset(train_set)
18
+
19
+ @implicit = !train_set.any? { |v| v[:rating] }
20
+ unless @implicit
21
+ ratings = train_set.map { |o| o[:rating] }
22
+ check_ratings(ratings)
23
+ end
24
+
25
+ check_training_set(train_set)
26
+ create_maps(train_set)
27
+
28
+ x_row = []
29
+ x_col = []
30
+ x_val = []
31
+ value_key = @implicit ? :value : :rating
32
+ train_set.each do |v|
33
+ x_row << @user_map[v[:user_id]]
34
+ x_col << @item_map[v[:item_id]]
35
+ x_val << (v[value_key] || 1)
36
+ end
37
+
38
+ @m = @user_map.size
39
+ @n = @item_map.size
40
+ nnz = train_set.size
41
+
42
+ x_row = int_ptr(x_row)
43
+ x_col = int_ptr(x_col)
44
+ x = real_ptr(x_val)
45
+
46
+ x_full = nil
47
+ weight = nil
48
+ lam_unique = nil
49
+
50
+ uu = nil
51
+ ii = nil
52
+
53
+ @user_info_map = {}
54
+ u_row, u_col, u_sp, nnz_u, @m_u, p_ = process_info(user_info, @user_map, @user_info_map, :user_id)
55
+
56
+ @item_info_map = {}
57
+ i_row, i_col, i_sp, nnz_i, @n_i, q = process_info(item_info, @item_map, @item_info_map, :item_id)
58
+
59
+ @precompute_for_predictions = false
60
+
61
+ # initialize w/ normal distribution
62
+ reset_values = true
63
+
64
+ @a = Fiddle::Pointer.malloc([@m, @m_u].max * (@k_user + @k + @k_main) * Fiddle::SIZEOF_DOUBLE)
65
+ @b = Fiddle::Pointer.malloc([@n, @n_i].max * (@k_item + @k + @k_main) * Fiddle::SIZEOF_DOUBLE)
66
+ @c = p_ > 0 ? Fiddle::Pointer.malloc(p_ * (@k_user + @k) * Fiddle::SIZEOF_DOUBLE) : nil
67
+ @d = q > 0 ? Fiddle::Pointer.malloc(q * (@k_item + @k) * Fiddle::SIZEOF_DOUBLE) : nil
68
+
69
+ @bias_a = nil
70
+ @bias_b = nil
71
+
72
+ u_colmeans = Fiddle::Pointer.malloc(p_ * Fiddle::SIZEOF_DOUBLE)
73
+ i_colmeans = Fiddle::Pointer.malloc(q * Fiddle::SIZEOF_DOUBLE)
74
+
75
+ if @implicit
76
+ @w_main_multiplier = 1.0
77
+ @alpha = 1.0
78
+ @adjust_weight = false # downweight?
79
+ @apply_log_transf = false
80
+
81
+ # different defaults
82
+ @lambda_ = 1e0
83
+ @w_user = 10
84
+ @w_item = 10
85
+ @finalize_chol = false
86
+
87
+ args = [
88
+ @a, @b,
89
+ @c, @d,
90
+ reset_values, @random_state,
91
+ u_colmeans, i_colmeans,
92
+ @m, @n, @k,
93
+ x_row, x_col, x, nnz,
94
+ @lambda_, lam_unique,
95
+ uu, @m_u, p_,
96
+ ii, @n_i, q,
97
+ u_row, u_col, u_sp, nnz_u,
98
+ i_row, i_col, i_sp, nnz_i,
99
+ @na_as_zero_user, @na_as_zero_item,
100
+ @k_main, @k_user, @k_item,
101
+ @w_main, @w_user, @w_item, real_ptr([@w_main_multiplier]),
102
+ @alpha, @adjust_weight, @apply_log_transf,
103
+ @niter, @nthreads, @verbose, @handle_interrupt,
104
+ @use_cg, @max_cg_steps, @finalize_chol,
105
+ @nonneg, @max_cd_steps, @nonneg_c, @nonneg_d,
106
+ @precompute_for_predictions,
107
+ nil, #precomputedBtB,
108
+ nil, #precomputedBeTBe,
109
+ nil #precomputedBeTBeChol
110
+ ]
111
+ check_status FFI.fit_collective_implicit_als(*fiddle_args(args))
112
+
113
+ @global_mean = 0
114
+ else
115
+ @bias_a = Fiddle::Pointer.malloc([@m, @m_u].max * Fiddle::SIZEOF_DOUBLE) if @user_bias
116
+ @bias_b = Fiddle::Pointer.malloc([@n, @n_i].max * Fiddle::SIZEOF_DOUBLE) if @item_bias
117
+
118
+ if @add_implicit_features
119
+ @ai = Fiddle::Pointer.malloc([@m, @m_u].max * (@k + @k_main) * Fiddle::SIZEOF_DOUBLE)
120
+ @bi = Fiddle::Pointer.malloc([@n, @n_i].max * (@k + @k_main) * Fiddle::SIZEOF_DOUBLE)
121
+ else
122
+ @ai = nil
123
+ @bi = nil
124
+ end
125
+
126
+ glob_mean = Fiddle::Pointer.malloc(Fiddle::SIZEOF_DOUBLE)
127
+
128
+ args = [
129
+ @bias_a, @bias_b,
130
+ @a, @b,
131
+ @c, @d,
132
+ @ai, @bi,
133
+ @add_implicit_features,
134
+ reset_values, @random_state,
135
+ glob_mean,
136
+ u_colmeans, i_colmeans,
137
+ @m, @n, @k,
138
+ x_row, x_col, x, nnz,
139
+ x_full,
140
+ weight,
141
+ @user_bias, @item_bias,
142
+ @lambda_, lam_unique,
143
+ uu, @m_u, p_,
144
+ ii, @n_i, q,
145
+ u_row, u_col, u_sp, nnz_u,
146
+ i_row, i_col, i_sp, nnz_i,
147
+ @na_as_zero, @na_as_zero_user, @na_as_zero_item,
148
+ @k_main, @k_user, @k_item,
149
+ @w_main, @w_user, @w_item, @w_implicit,
150
+ @niter, @nthreads, @verbose, @handle_interrupt,
151
+ @use_cg, @max_cg_steps, @finalize_chol,
152
+ @nonneg, @max_cd_steps, @nonneg_c, @nonneg_d,
153
+ @precompute_for_predictions,
154
+ @include_all_x,
155
+ nil, #B_plus_bias,
156
+ nil, #precomputedBtB,
157
+ nil, #precomputedTransBtBinvBt,
158
+ nil, #precomputedBeTBeChol,
159
+ nil, #precomputedBiTBi,
160
+ nil, #precomputedTransCtCinvCt,
161
+ nil #precomputedCtCw
162
+ ]
163
+ check_status FFI.fit_collective_explicit_als(*fiddle_args(args))
164
+
165
+ @global_mean = real_array(glob_mean).first
166
+ end
167
+
168
+ @u_colmeans = real_array(u_colmeans)
169
+ @i_colmeans = real_array(i_colmeans)
170
+ @u_colmeans_ptr = u_colmeans
171
+
172
+ self
173
+ end
174
+
175
+ def user_recs(user_id, count: 5, item_ids: nil)
176
+ check_fit
177
+ user = @user_map[user_id]
178
+
179
+ if user
180
+ if item_ids
181
+ # remove missing ids
182
+ item_ids = item_ids.select { |v| @item_map[v] }
183
+
184
+ pred_a = int_ptr([@user_map[user_id]] * item_ids.size)
185
+ pred_b = int_ptr(item_ids.map { |v| @item_map[v] })
186
+ nnz = item_ids.size
187
+ outp = Fiddle::Pointer.malloc(nnz * Fiddle::SIZEOF_DOUBLE)
188
+
189
+ FFI.predict_multiple(
190
+ @a, @k_user,
191
+ @b, @k_item,
192
+ @bias_a, @bias_b,
193
+ @global_mean,
194
+ @k, @k_main,
195
+ @m, @n,
196
+ pred_a, pred_b, nnz,
197
+ outp,
198
+ @nthreads
199
+ )
200
+
201
+ scores = real_array(outp)
202
+ item_ids.zip(scores).map do |item_id, score|
203
+ {item_id: item_id, score: score}
204
+ end
205
+ else
206
+ a_vec = @a[user * (@k_user + @k + @k_main) * Fiddle::SIZEOF_DOUBLE, (@k_user + @k + @k_main) * Fiddle::SIZEOF_DOUBLE]
207
+ a_bias = @bias_a ? @bias_a[user * Fiddle::SIZEOF_DOUBLE, Fiddle::SIZEOF_DOUBLE].unpack1("d") : 0
208
+ top_n(a_vec: a_vec, a_bias: a_bias, count: count)
209
+ end
210
+ else
211
+ # no items if user is unknown
212
+ # TODO maybe most popular items
213
+ []
214
+ end
215
+ end
216
+
217
+ # TODO add item_ids
218
+ def new_user_recs(data, count: 5, user_info: nil)
219
+ check_fit
220
+
221
+ a_vec, a_bias = factors_warm(data, user_info: user_info)
222
+ top_n(a_vec: a_vec, a_bias: a_bias, count: count)
223
+ end
224
+
225
+ def user_factors
226
+ read_factors(@a, [@m, @m_u].max, @k_user + @k + @k_main)
227
+ end
228
+
229
+ def item_factors
230
+ read_factors(@b, [@n, @n_i].max, @k_item + @k + @k_main)
231
+ end
232
+
233
+ def user_bias
234
+ read_bias(@bias_a) if @bias_a
235
+ end
236
+
237
+ def item_bias
238
+ read_bias(@bias_b) if @bias_b
239
+ end
240
+
241
+ private
242
+
243
+ def set_params(
244
+ k: 40, lambda_: 1e+1, method: "als", use_cg: true, user_bias: true,
245
+ item_bias: true, add_implicit_features: false,
246
+ k_user: 0, k_item: 0, k_main: 0,
247
+ w_main: 1.0, w_user: 1.0, w_item: 1.0, w_implicit: 0.5,
248
+ maxiter: 800, niter: 10, parallelize: "separate", corr_pairs: 4,
249
+ max_cg_steps: 3, finalize_chol: true,
250
+ na_as_zero: false, na_as_zero_user: false, na_as_zero_item: false,
251
+ nonneg: false, nonneg_c: false, nonneg_d: false, max_cd_steps: 100,
252
+ precompute_for_predictions: true, include_all_x: true,
253
+ use_float: false,
254
+ random_state: 1, verbose: true, print_every: 10,
255
+ handle_interrupt: true, produce_dicts: false,
256
+ copy_data: true, nthreads: -1
257
+ )
258
+
259
+ @k = k
260
+ @k_user = k_user
261
+ @k_item = k_item
262
+ @k_main = k_main
263
+ @lambda_ = lambda_
264
+ @w_main = w_main
265
+ @w_user = w_user
266
+ @w_item = w_item
267
+ @w_implicit = w_implicit
268
+ @user_bias = !!user_bias
269
+ @item_bias = !!item_bias
270
+ @method = method
271
+ @add_implicit_features = !!add_implicit_features
272
+ @use_cg = !!use_cg
273
+ @max_cg_steps = max_cg_steps.to_i
274
+ @max_cd_steps = max_cd_steps.to_i
275
+ @finalize_chol = !!finalize_chol
276
+ @maxiter = maxiter
277
+ @niter = niter
278
+ @parallelize = parallelize
279
+ @na_as_zero = !!na_as_zero
280
+ @na_as_zero_user = !!na_as_zero_user
281
+ @na_as_zero_item = !!na_as_zero_item
282
+ @nonneg = !!nonneg
283
+ @nonneg_c = !!nonneg_c
284
+ @nonneg_d = !!nonneg_d
285
+ @precompute_for_predictions = !!precompute_for_predictions
286
+ @include_all_x = !!include_all_x
287
+ @use_float = !!use_float
288
+ @verbose = !!verbose
289
+ @print_every = print_every
290
+ @corr_pairs = corr_pairs
291
+ @random_state = random_state.to_i
292
+ @produce_dicts = !!produce_dicts
293
+ @handle_interrupt = !!handle_interrupt
294
+ @copy_data = !!copy_data
295
+ nthreads = Etc.nprocessors if nthreads < 0
296
+ @nthreads = nthreads
297
+ end
298
+
299
+ def create_maps(train_set)
300
+ user_ids = train_set.map { |v| v[:user_id] }.uniq.sort
301
+ item_ids = train_set.map { |v| v[:item_id] }.uniq.sort
302
+
303
+ raise ArgumentError, "Missing user_id" if user_ids.any?(&:nil?)
304
+ raise ArgumentError, "Missing item_id" if item_ids.any?(&:nil?)
305
+
306
+ @user_map = user_ids.zip(user_ids.size.times).to_h
307
+ @item_map = item_ids.zip(item_ids.size.times).to_h
308
+ end
309
+
310
+ def check_ratings(ratings)
311
+ unless ratings.all? { |r| !r.nil? }
312
+ raise ArgumentError, "Missing ratings"
313
+ end
314
+ unless ratings.all? { |r| r.is_a?(Numeric) }
315
+ raise ArgumentError, "Ratings must be numeric"
316
+ end
317
+ end
318
+
319
+ def check_training_set(train_set)
320
+ raise ArgumentError, "No training data" if train_set.empty?
321
+ end
322
+
323
+ def check_fit
324
+ raise "Not fit" unless defined?(@implicit)
325
+ end
326
+
327
+ def to_dataset(dataset)
328
+ if defined?(Rover::DataFrame) && dataset.is_a?(Rover::DataFrame)
329
+ # convert keys to symbols
330
+ dataset = dataset.dup
331
+ dataset.keys.each do |k|
332
+ dataset[k.to_sym] ||= dataset.delete(k)
333
+ end
334
+ dataset.to_a
335
+ elsif defined?(Daru::DataFrame) && dataset.is_a?(Daru::DataFrame)
336
+ # convert keys to symbols
337
+ dataset = dataset.dup
338
+ new_names = dataset.vectors.to_a.map { |k| [k, k.to_sym] }.to_h
339
+ dataset.rename_vectors!(new_names)
340
+ dataset.to_a[0]
341
+ else
342
+ dataset
343
+ end
344
+ end
345
+
346
+ def read_factors(ptr, d1, d2)
347
+ arr = []
348
+ offset = 0
349
+ width = d2 * Fiddle::SIZEOF_DOUBLE
350
+ d1.times do |i|
351
+ arr << ptr[offset, width].unpack("d*")
352
+ offset += width
353
+ end
354
+ arr
355
+ end
356
+
357
+ def read_bias(ptr)
358
+ real_array(ptr)
359
+ end
360
+
361
+ def top_n(a_vec:, a_bias:, count:)
362
+ include_ix = nil
363
+ n_include = 0
364
+ exclude_ix = nil
365
+ n_exclude = 0
366
+
367
+ outp_ix = Fiddle::Pointer.malloc(count * Fiddle::SIZEOF_INT)
368
+ outp_score = Fiddle::Pointer.malloc(count * Fiddle::SIZEOF_DOUBLE)
369
+
370
+ check_status FFI.topN(
371
+ a_vec, @k_user,
372
+ @b, @k_item,
373
+ @bias_b, @global_mean, a_bias,
374
+ @k, @k_main,
375
+ include_ix, n_include,
376
+ exclude_ix, n_exclude,
377
+ outp_ix, outp_score,
378
+ count, @n,
379
+ @nthreads
380
+ )
381
+
382
+ imap = @item_map.map(&:reverse).to_h
383
+ item_ids = int_array(outp_ix).map { |v| imap[v] }
384
+ scores = real_array(outp_score)
385
+
386
+ item_ids.zip(scores).map do |item_id, score|
387
+ {item_id: item_id, score: score}
388
+ end
389
+ end
390
+
391
+ def factors_warm(data, user_info: nil)
392
+ data = to_dataset(data)
393
+ user_info = to_dataset(user_info) if user_info
394
+
395
+ nnz = data.size
396
+ a_vec = Fiddle::Pointer.malloc((@k_user + @k + @k_main) * Fiddle::SIZEOF_DOUBLE)
397
+ bias_a = Fiddle::Pointer.malloc(Fiddle::SIZEOF_DOUBLE)
398
+
399
+ u_vec_sp = []
400
+ u_vec_x_col = []
401
+ if user_info
402
+ user_info.each do |k, v|
403
+ next if k == :user_id
404
+
405
+ uc = @user_info_map[k]
406
+ raise "Bad key: #{k}" unless uc
407
+
408
+ u_vec_x_col << uc
409
+ u_vec_sp << v
410
+ end
411
+ end
412
+ p_ = @user_info_map.size
413
+ nnz_u_vec = u_vec_sp.size
414
+ u_vec_x_col = int_ptr(u_vec_x_col)
415
+ u_vec_sp = real_ptr(u_vec_sp)
416
+
417
+ u_vec = nil
418
+ u_bin_vec = nil
419
+ pbin = 0
420
+
421
+ weight = nil
422
+ lam_unique = nil
423
+ n_max = @n
424
+
425
+ if data.any?
426
+ if @implicit
427
+ ratings = data.map { |d| d[:value] || 1 }
428
+ else
429
+ ratings = data.map { |d| d[:rating] }
430
+ check_ratings(ratings)
431
+ end
432
+ xa = real_ptr(ratings)
433
+ x_col = int_ptr(data.map { |d| @item_map.fetch(d[:item_id]) })
434
+ else
435
+ xa = nil
436
+ x_col = nil
437
+ end
438
+ xa_dense = nil
439
+
440
+ if @implicit
441
+ args = [
442
+ a_vec,
443
+ u_vec, p_,
444
+ u_vec_sp, u_vec_x_col, nnz_u_vec,
445
+ @na_as_zero_user,
446
+ @nonneg,
447
+ @u_colmeans_ptr,
448
+ @b, @n, @c,
449
+ xa, x_col, nnz,
450
+ @k, @k_user, @k_item, @k_main,
451
+ @lambda_, @alpha,
452
+ @w_main, @w_user, @w_main_multiplier,
453
+ @apply_log_transf,
454
+ nil, #BeTBe,
455
+ nil, #BtB
456
+ nil #BeTBeChol
457
+ ]
458
+ check_status FFI.factors_collective_implicit_single(*fiddle_args(args))
459
+ else
460
+ cb = nil
461
+
462
+ args = [
463
+ a_vec, bias_a,
464
+ u_vec, p_,
465
+ u_vec_sp, u_vec_x_col, nnz_u_vec,
466
+ u_bin_vec, pbin,
467
+ @na_as_zero_user, @na_as_zero,
468
+ @nonneg,
469
+ @c, cb,
470
+ @global_mean, @bias_b, @u_colmeans_ptr,
471
+ xa, x_col, nnz, xa_dense,
472
+ @n, weight, @b, @bi,
473
+ @add_implicit_features,
474
+ @k, @k_user, @k_item, @k_main,
475
+ @lambda_, lam_unique,
476
+ @w_main, @w_user, @w_implicit,
477
+ n_max,
478
+ @include_all_x,
479
+ nil, #TransBtBinvBt,
480
+ nil, #BtB,
481
+ nil, #BeTBeChol,
482
+ nil, #BiTBi,
483
+ nil, #CtCw,
484
+ nil, #TransCtCinvCt,
485
+ nil #B_plus_bias
486
+ ]
487
+ check_status FFI.factors_collective_explicit_single(*fiddle_args(args))
488
+ end
489
+
490
+ [a_vec, @implicit ? 0 : real_array(bias_a).first]
491
+ end
492
+
493
+ # convert boolean to int
494
+ def fiddle_args(args)
495
+ args.map { |v| v == true || v == false ? (v ? 1 : 0) : v }
496
+ end
497
+
498
+ def check_status(ret_val)
499
+ case ret_val
500
+ when 0
501
+ # success
502
+ when 1
503
+ raise "Could not allocate sufficient memory"
504
+ else
505
+ raise "Bad status: #{ret_val}"
506
+ end
507
+ end
508
+
509
+ def process_info(info, map, info_map, key)
510
+ return [nil, nil, nil, 0, 0, 0] unless info
511
+
512
+ info = to_dataset(info)
513
+
514
+ row = []
515
+ col = []
516
+ val = []
517
+ info.each do |ri|
518
+ rk = ri[key]
519
+ raise ArgumentError, "Missing #{key}" unless rk
520
+
521
+ r = (map[rk] ||= map.size)
522
+ ri.each do |k, v|
523
+ next if k == key
524
+ row << r
525
+ col << (info_map[k] ||= info_map.size)
526
+ val << v
527
+ end
528
+ end
529
+ [int_ptr(row), int_ptr(col), real_ptr(val), val.size, map.size, info_map.size]
530
+ end
531
+
532
+ def int_ptr(v)
533
+ v.pack("i*")
534
+ end
535
+
536
+ def real_ptr(v)
537
+ v.pack("d*")
538
+ end
539
+
540
+ def int_array(ptr)
541
+ ptr.to_s(ptr.size).unpack("i*")
542
+ end
543
+
544
+ def real_array(ptr)
545
+ ptr.to_s(ptr.size).unpack("d*")
546
+ end
547
+ end
548
+ end
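The marshalling helpers at the bottom of `recommender.rb` are plain Ruby and can be tried without Fiddle. Booleans become `1`/`0` because the FFI signatures declare `bool` as `char`. Integer and real arrays are packed as native C `int`s (`"i*"`) and `double`s (`"d*"`). The copies below match the gem's versions.

```ruby
# Same logic as the private helpers in Recommender, shown standalone.
def fiddle_args(args)
  # true/false -> 1/0; everything else (numbers, pointers, nil) passes through
  args.map { |v| v == true || v == false ? (v ? 1 : 0) : v }
end

def int_ptr(v)
  v.pack("i*") # native-endian C ints
end

def real_ptr(v)
  v.pack("d*") # native-endian C doubles
end

p fiddle_args([true, false, 3, nil]) # => [1, 0, 3, nil]
p int_ptr([1, 2, 3]).unpack("i*")    # => [1, 2, 3]
p real_ptr([0.5, 2.0]).unpack("d*")  # => [0.5, 2.0]
```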
@@ -0,0 +1,3 @@
1
+ module Cmfrec
2
+ VERSION = "0.1.0"
3
+ end
@@ -0,0 +1,74 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2020 David Cortes
4
+
5
+ All rights reserved.
6
+
7
+ Permission is hereby granted, free of charge, to any person obtaining a copy
8
+ of this software and associated documentation files (the "Software"), to
9
+ deal in the Software without restriction, including without limitation the
10
+ rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
11
+ sell copies of the Software, and to permit persons to whom the Software is
12
+ furnished to do so, subject to the following conditions:
13
+
14
+ The above copyright notice and this permission notice shall be included in
15
+ all copies or substantial portions of the Software.
16
+
17
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
18
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
19
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
20
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
21
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
22
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
23
+ IN THE SOFTWARE.
24
+
25
+ ---
26
+
27
+ ANSI C implementation of vector operations.
28
+
29
+ Copyright (c) 2007-2010 Naoaki Okazaki
30
+ All rights reserved.
31
+
32
+ Permission is hereby granted, free of charge, to any person obtaining a copy
33
+ of this software and associated documentation files (the "Software"), to deal
34
+ in the Software without restriction, including without limitation the rights
35
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
36
+ copies of the Software, and to permit persons to whom the Software is
37
+ furnished to do so, subject to the following conditions:
38
+
39
+ The above copyright notice and this permission notice shall be included in
40
+ all copies or substantial portions of the Software.
41
+
42
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
43
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
44
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
45
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
46
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
47
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
48
+ THE SOFTWARE.
49
+
50
+ ---
51
+
52
+ C library of Limited memory BFGS (L-BFGS).
53
+
54
+ Copyright (c) 1990, Jorge Nocedal
55
+ Copyright (c) 2007-2010 Naoaki Okazaki
56
+ All rights reserved.
57
+
58
+ Permission is hereby granted, free of charge, to any person obtaining a copy
59
+ of this software and associated documentation files (the "Software"), to deal
60
+ in the Software without restriction, including without limitation the rights
61
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
62
+ copies of the Software, and to permit persons to whom the Software is
63
+ furnished to do so, subject to the following conditions:
64
+
65
+ The above copyright notice and this permission notice shall be included in
66
+ all copies or substantial portions of the Software.
67
+
68
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
69
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
70
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
71
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
72
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
73
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
74
+ THE SOFTWARE.
Binary file
Binary file
metadata ADDED
@@ -0,0 +1,52 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: cmfrec
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Andrew Kane
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2020-11-28 00:00:00.000000000 Z
12
+ dependencies: []
13
+ description:
14
+ email: andrew@chartkick.com
15
+ executables: []
16
+ extensions: []
17
+ extra_rdoc_files: []
18
+ files:
19
+ - CHANGELOG.md
20
+ - LICENSE.txt
21
+ - README.md
22
+ - lib/cmfrec.rb
23
+ - lib/cmfrec/ffi.rb
24
+ - lib/cmfrec/recommender.rb
25
+ - lib/cmfrec/version.rb
26
+ - vendor/LICENSE.txt
27
+ - vendor/libcmfrec.dylib
28
+ - vendor/libcmfrec.so
29
+ homepage: https://github.com/ankane/cmfrec
30
+ licenses:
31
+ - MIT
32
+ metadata: {}
33
+ post_install_message:
34
+ rdoc_options: []
35
+ require_paths:
36
+ - lib
37
+ required_ruby_version: !ruby/object:Gem::Requirement
38
+ requirements:
39
+ - - ">="
40
+ - !ruby/object:Gem::Version
41
+ version: '2.5'
42
+ required_rubygems_version: !ruby/object:Gem::Requirement
43
+ requirements:
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: '0'
47
+ requirements: []
48
+ rubygems_version: 3.1.4
49
+ signing_key:
50
+ specification_version: 4
51
+ summary: Recommendations for Ruby using collective matrix factorization
52
+ test_files: []