cmfrec 0.1.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA256:
3
+ metadata.gz: 2e9f45e0c3826b90788782ac8a0838476fc7849e75d91f2d5949b08b382a28c3
4
+ data.tar.gz: 2f44b988001bf2b23e3c5938b4c5e7b9089e8943d86aa42c86e72be0ff558ea1
5
+ SHA512:
6
+ metadata.gz: 592ef4363bc016da1a35a958f99dd13a0fc7e900ecbe77b3ba95fc4b4cbbe151397924ff90aab2e326ac3369ac07c0380c76b1d3a2ac71b390664118de0f8611
7
+ data.tar.gz: 27d3e1ce80e88af9e8b062837f68329b10718372be15c08d4cd2e056619d1cd9e5ed906e6070798c32993ccea2f617b706af5e6afb7cba4f3cfc70ea22393ba6
@@ -0,0 +1,3 @@
1
+ ## 0.1.0 (2020-11-27)
2
+
3
+ - First release
@@ -0,0 +1,24 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2020 David Cortes
4
+ Copyright (c) 2020 Andrew Kane
5
+
6
+ All rights reserved.
7
+
8
+ Permission is hereby granted, free of charge, to any person obtaining a copy
9
+ of this software and associated documentation files (the "Software"), to
10
+ deal in the Software without restriction, including without limitation the
11
+ rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
12
+ sell copies of the Software, and to permit persons to whom the Software is
13
+ furnished to do so, subject to the following conditions:
14
+
15
+ The above copyright notice and this permission notice shall be included in
16
+ all copies or substantial portions of the Software.
17
+
18
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
19
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
20
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
21
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
22
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
23
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
24
+ IN THE SOFTWARE.
@@ -0,0 +1,191 @@
1
+ # cmfrec
2
+
3
+ :fire: Recommendations for Ruby, powered by [cmfrec](https://github.com/david-cortes/cmfrec)
4
+
5
+ - Supports side information :tada:
6
+ - Works with explicit and implicit feedback
7
+ - Uses high-performance matrix factorization
8
+
9
+ Not available for Windows yet
10
+
11
+ [![Build Status](https://github.com/ankane/cmfrec/workflows/build/badge.svg?branch=master)](https://github.com/ankane/cmfrec/actions)
12
+
13
+ ## Installation
14
+
15
+ Add this line to your application’s Gemfile:
16
+
17
+ ```ruby
18
+ gem 'cmfrec'
19
+ ```
20
+
21
+ ## Getting Started
22
+
23
+ Create a recommender
24
+
25
+ ```ruby
26
+ recommender = Cmfrec::Recommender.new
27
+ ```
28
+
29
+ If users rate items directly, this is known as explicit feedback. Fit the recommender with:
30
+
31
+ ```ruby
32
+ recommender.fit([
33
+ {user_id: 1, item_id: 1, rating: 5},
34
+ {user_id: 2, item_id: 1, rating: 3}
35
+ ])
36
+ ```
37
+
38
+ > IDs can be integers, strings, or any other data type
39
+
40
+ If users don’t rate items directly (for instance, they’re purchasing items or reading posts), this is known as implicit feedback. Leave out the rating, or use a value like number of purchases, number of page views, or time spent on page:
41
+
42
+ ```ruby
43
+ recommender.fit([
44
+ {user_id: 1, item_id: 1, value: 1},
45
+ {user_id: 2, item_id: 1, value: 1}
46
+ ])
47
+ ```
48
+
49
+ > Use `value` instead of rating for implicit feedback
50
+
51
+ Get recommendations - “users like you also liked”
52
+
53
+ ```ruby
54
+ recommender.user_recs(user_id)
55
+ ```
56
+
57
+ Get recommendations for a new user
58
+
59
+ ```ruby
60
+ recommender.new_user_recs([
61
+ {item_id: 1, value: 5},
62
+ {item_id: 2, value: 3}
63
+ ])
64
+ ```
65
+
66
+ Use the `count` option to specify the number of recommendations (default is 5)
67
+
68
+ ```ruby
69
+ recommender.user_recs(user_id, count: 3)
70
+ ```
71
+
72
+ Get predicted ratings for specific items
73
+
74
+ ```ruby
75
+ recommender.user_recs(user_id, item_ids: [1, 2, 3])
76
+ ```
77
+
78
+ ## Side Information
79
+
80
+ Add side information about users, items, or both
81
+
82
+ ```ruby
83
+ user_info = [
84
+ {user_id: 1, a: 1, b: 1},
85
+ {user_id: 2, a: 1, b: 1},
86
+ ]
87
+ item_info = [
88
+ {item_id: 1, c: 1, d: 1},
89
+ {item_id: 2, c: 1, d: 1},
90
+ ]
91
+ recommender.fit(ratings, user_info: user_info, item_info: item_info)
92
+ ```
93
+
94
+ Get recommendations for a new user with ratings and side information
95
+
96
+ ```ruby
97
+ ratings = [
98
+ {item_id: 1, rating: 5},
99
+ {item_id: 2, rating: 3}
100
+ ]
101
+ recommender.new_user_recs(ratings, user_info: {a: 1, b: 1})
102
+ ```
103
+
104
+ Get recommendations with only side information
105
+
106
+ ```ruby
107
+ recommender.new_user_recs([], user_info: {a: 1, b: 1})
108
+ ```
109
+
110
+ ## Options
111
+
112
+ Specify the number of factors and epochs
113
+
114
+ ```ruby
115
+ Cmfrec::Recommender.new(factors: 8, epochs: 20)
116
+ ```
117
+
118
+ If recommendations look off, try changing `factors`. The default is 8, but 3 could be good for some applications and 300 good for others.
119
+
120
+ ### Explicit Feedback
121
+
122
+ Add implicit features
123
+
124
+ ```ruby
125
+ Cmfrec::Recommender.new(add_implicit_features: true)
126
+ ```
127
+
128
+ Disable bias
129
+
130
+ ```ruby
131
+ Cmfrec::Recommender.new(user_bias: false, item_bias: false)
132
+ ```
133
+
134
+ ## Data
135
+
136
+ Data can be an array of hashes
137
+
138
+ ```ruby
139
+ [{user_id: 1, item_id: 1, rating: 5}, {user_id: 2, item_id: 1, rating: 3}]
140
+ ```
141
+
142
+ Or a Rover data frame
143
+
144
+ ```ruby
145
+ Rover.read_csv("ratings.csv")
146
+ ```
147
+
148
+ ## Reference
149
+
150
+ Get the global mean
151
+
152
+ ```ruby
153
+ recommender.global_mean
154
+ ```
155
+
156
+ Get the factors
157
+
158
+ ```ruby
159
+ recommender.user_factors
160
+ recommender.item_factors
161
+ ```
162
+
163
+ Get the bias
164
+
165
+ ```ruby
166
+ recommender.user_bias
167
+ recommender.item_bias
168
+ ```
169
+
170
+ ## History
171
+
172
+ View the [changelog](https://github.com/ankane/cmfrec/blob/master/CHANGELOG.md)
173
+
174
+ ## Contributing
175
+
176
+ Everyone is encouraged to help improve this project. Here are a few ways you can help:
177
+
178
+ - [Report bugs](https://github.com/ankane/cmfrec/issues)
179
+ - Fix bugs and [submit pull requests](https://github.com/ankane/cmfrec/pulls)
180
+ - Write, clarify, or fix documentation
181
+ - Suggest or add new features
182
+
183
+ To get started with development:
184
+
185
+ ```sh
186
+ git clone https://github.com/ankane/cmfrec.git
187
+ cd cmfrec
188
+ bundle install
189
+ bundle exec rake vendor:all
190
+ bundle exec rake test
191
+ ```
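The README's Reference section exposes `global_mean`, `user_factors`, `item_factors`, `user_bias`, and `item_bias`. As a rough sketch, assuming the standard biased matrix factorization model (global mean plus user bias plus item bias plus the dot product of the two factor vectors) and no side information, a predicted rating could be recombined from those pieces as follows. The helper `predicted_score` and all numbers are hypothetical and are not part of the gem.

```ruby
# Hypothetical helper, not part of the cmfrec gem: recombines the fitted
# pieces under the ASSUMPTION of a standard biased matrix factorization model.
def predicted_score(global_mean:, user_bias:, item_bias:, user_factors:, item_factors:)
  # Dot product of the user's and the item's latent factor vectors
  dot = user_factors.zip(item_factors).sum { |u, i| u * i }
  global_mean + user_bias + item_bias + dot
end

# Made-up values standing in for one user row and one item row
score = predicted_score(
  global_mean: 3.5,
  user_bias: 0.25,
  item_bias: -0.5,
  user_factors: [1.0, 2.0],
  item_factors: [0.5, 0.25]
)
puts score
```

With implicit feedback the gem sets the global mean to 0 and allocates no bias vectors, so under the same assumption only the dot product would remain.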
@@ -0,0 +1,28 @@
1
+ # stdlib
2
+ require "etc"
3
+ require "fiddle/import"
4
+
5
+ # modules
6
+ require "cmfrec/recommender"
7
+ require "cmfrec/version"
8
+
9
+ module Cmfrec
10
+ class Error < StandardError; end
11
+
12
+ class << self
13
+ attr_accessor :ffi_lib
14
+ end
15
+ lib_name =
16
+ if Gem.win_platform?
17
+ "cmfrec.dll"
18
+ elsif RbConfig::CONFIG["host_os"] =~ /darwin/i
19
+ "libcmfrec.dylib"
20
+ else
21
+ "libcmfrec.so"
22
+ end
23
+ vendor_lib = File.expand_path("../vendor/#{lib_name}", __dir__)
24
+ self.ffi_lib = [vendor_lib]
25
+
26
+ # friendlier error message
27
+ autoload :FFI, "cmfrec/ffi"
28
+ end
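The platform branch in `lib/cmfrec.rb` above picks the vendored shared library by operating system. A minimal sketch of the same decision as a pure function is shown here so it can be exercised without loading anything; `shared_library_name` and its `windows:` flag are hypothetical names, while the gem itself reads `Gem.win_platform?` and `RbConfig::CONFIG["host_os"]` directly.

```ruby
# Hypothetical helper mirroring the branch in lib/cmfrec.rb.
# host_os stands in for RbConfig::CONFIG["host_os"], and windows: stands in
# for Gem.win_platform?.
def shared_library_name(host_os, windows: false)
  if windows
    "cmfrec.dll"
  elsif host_os =~ /darwin/i
    "libcmfrec.dylib"
  else
    "libcmfrec.so"
  end
end

puts shared_library_name("darwin20.1.0")
puts shared_library_name("linux-gnu")
puts shared_library_name("mingw32", windows: true)
```

Because `ffi_lib` is an accessor and `Cmfrec::FFI` is autoloaded, it appears that the candidate paths can be replaced before the first call that touches `Cmfrec::FFI`. This is an inference from the code above rather than documented behavior.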
@@ -0,0 +1,26 @@
1
+ module Cmfrec
2
+ module FFI
3
+ extend Fiddle::Importer
4
+
5
+ libs = Cmfrec.ffi_lib.dup
6
+ begin
7
+ dlload Fiddle.dlopen(libs.shift)
8
+ rescue Fiddle::DLError => e
9
+ retry if libs.any?
10
+ raise e
11
+ end
12
+
13
+ typealias "bool", "char"
14
+ # determined by CMakeLists.txt
15
+ typealias "int_t", "int"
16
+ typealias "real_t", "double"
17
+
18
+ extern "int_t fit_collective_explicit_als(real_t *biasA, real_t *biasB, real_t *A, real_t *B, real_t *C, real_t *D, real_t *Ai, real_t *Bi, bool add_implicit_features, bool reset_values, int_t seed, real_t *glob_mean, real_t *U_colmeans, real_t *I_colmeans, int_t m, int_t n, int_t k, int_t X_row[], int_t X_col[], real_t *X, size_t nnz, real_t *Xfull, real_t *weight, bool user_bias, bool item_bias, real_t lam, real_t *lam_unique, real_t *U, int_t m_u, int_t p, real_t *II, int_t n_i, int_t q, int_t U_row[], int_t U_col[], real_t *U_sp, size_t nnz_U, int_t I_row[], int_t I_col[], real_t *I_sp, size_t nnz_I, bool NA_as_zero_X, bool NA_as_zero_U, bool NA_as_zero_I, int_t k_main, int_t k_user, int_t k_item, real_t w_main, real_t w_user, real_t w_item, real_t w_implicit, int_t niter, int_t nthreads, bool verbose, bool handle_interrupt, bool use_cg, int_t max_cg_steps, bool finalize_chol, bool nonneg, int_t max_cd_steps, bool nonneg_C, bool nonneg_D, bool precompute_for_predictions, bool include_all_X, real_t *B_plus_bias, real_t *precomputedBtB, real_t *precomputedTransBtBinvBt, real_t *precomputedBeTBeChol, real_t *precomputedBiTBi, real_t *precomputedTransCtCinvCt, real_t *precomputedCtCw)"
19
+ extern "int_t fit_collective_implicit_als(real_t *A, real_t *B, real_t *C, real_t *D, bool reset_values, int_t seed, real_t *U_colmeans, real_t *I_colmeans, int_t m, int_t n, int_t k, int_t X_row[], int_t X_col[], real_t *X, size_t nnz, real_t lam, real_t *lam_unique, real_t *U, int_t m_u, int_t p, real_t *II, int_t n_i, int_t q, int_t U_row[], int_t U_col[], real_t *U_sp, size_t nnz_U, int_t I_row[], int_t I_col[], real_t *I_sp, size_t nnz_I, bool NA_as_zero_U, bool NA_as_zero_I, int_t k_main, int_t k_user, int_t k_item, real_t w_main, real_t w_user, real_t w_item, real_t *w_main_multiplier, real_t alpha, bool adjust_weight, bool apply_log_transf, int_t niter, int_t nthreads, bool verbose, bool handle_interrupt, bool use_cg, int_t max_cg_steps, bool finalize_chol, bool nonneg, int_t max_cd_steps, bool nonneg_C, bool nonneg_D, bool precompute_for_predictions, real_t *precomputedBtB, real_t *precomputedBeTBe, real_t *precomputedBeTBeChol)"
20
+ extern "int_t factors_collective_explicit_single(real_t *a_vec, real_t *a_bias, real_t *u_vec, int_t p, real_t *u_vec_sp, int_t u_vec_X_col[], size_t nnz_u_vec, real_t *u_bin_vec, int_t pbin, bool NA_as_zero_U, bool NA_as_zero_X, bool nonneg, real_t *C, real_t *Cb, real_t glob_mean, real_t *biasB, real_t *U_colmeans, real_t *Xa, int_t X_col[], size_t nnz, real_t *Xa_dense, int_t n, real_t *weight, real_t *B, real_t *Bi, bool add_implicit_features, int_t k, int_t k_user, int_t k_item, int_t k_main, real_t lam, real_t *lam_unique, real_t w_main, real_t w_user, real_t w_implicit, int_t n_max, bool include_all_X, real_t *TransBtBinvBt, real_t *BtB, real_t *BeTBeChol, real_t *BiTBi, real_t *CtCw, real_t *TransCtCinvCt, real_t *B_plus_bias)"
21
+ extern "int_t factors_collective_implicit_single(real_t *a_vec, real_t *u_vec, int_t p, real_t *u_vec_sp, int_t u_vec_X_col[], size_t nnz_u_vec, bool NA_as_zero_U, bool nonneg, real_t *U_colmeans, real_t *B, int_t n, real_t *C, real_t *Xa, int_t X_col[], size_t nnz, int_t k, int_t k_user, int_t k_item, int_t k_main, real_t lam, real_t alpha, real_t w_main, real_t w_user, real_t w_main_multiplier, bool apply_log_transf, real_t *BeTBe, real_t *BtB, real_t *BeTBeChol)"
22
+ extern "void predict_multiple(real_t *restrict A, int_t k_user, real_t *restrict B, int_t k_item, real_t *restrict biasA, real_t *restrict biasB, real_t glob_mean, int_t k, int_t k_main, int_t m, int_t n, int_t predA[], int_t predB[], size_t nnz, real_t *restrict outp, int_t nthreads)"
23
+ extern "int_t predict_X_old_collective_explicit(int_t row[], int_t col[], real_t *restrict predicted, size_t n_predict, real_t *restrict A, real_t *restrict biasA, real_t *restrict B, real_t *restrict biasB, real_t glob_mean, int_t k, int_t k_user, int_t k_item, int_t k_main, int_t m, int_t n_max, int_t nthreads)"
24
+ extern "int_t topN(real_t *restrict a_vec, int_t k_user, real_t *restrict B, int_t k_item, real_t *restrict biasB, real_t glob_mean, real_t biasA, int_t k, int_t k_main, int_t *restrict include_ix, int_t n_include, int_t *restrict exclude_ix, int_t n_exclude, int_t *restrict outp_ix, real_t *restrict outp_score, int_t n_top, int_t n, int_t nthreads)"
25
+ end
26
+ end
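The `begin`/`rescue`/`retry` block at the top of `lib/cmfrec/ffi.rb` tries each candidate path in turn and re-raises only when none is left. The sketch below isolates that control flow under stated assumptions: `LoadFailure` stands in for `Fiddle::DLError`, the block stands in for `Fiddle.dlopen`, and `load_first` is a hypothetical name.

```ruby
# Hypothetical stand-in for Fiddle::DLError
class LoadFailure < StandardError; end

# Try each candidate in order; the block plays the role of Fiddle.dlopen.
# Returns the block's value for the first candidate that loads.
def load_first(candidates)
  remaining = candidates.dup
  begin
    yield remaining.shift
  rescue LoadFailure
    # Move on to the next candidate, or give up with the last error
    retry if remaining.any?
    raise
  end
end

attempts = []
handle = load_first(["missing.so", "also_missing.so", "present.so"]) do |path|
  attempts << path
  raise LoadFailure, "cannot open #{path}" unless path == "present.so"
  "handle:#{path}"
end
puts handle
```

Since each retry shifts one entry off the duplicated list, the caller's array is left untouched and the error that surfaces is the one from the final candidate.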
@@ -0,0 +1,548 @@
1
+ module Cmfrec
2
+ class Recommender
3
+ attr_reader :global_mean
4
+
5
+ def initialize(factors: 8, epochs: 10, verbose: true, user_bias: true, item_bias: true, add_implicit_features: false)
6
+ set_params(
7
+ k: factors,
8
+ niter: epochs,
9
+ verbose: verbose,
10
+ user_bias: user_bias,
11
+ item_bias: item_bias,
12
+ add_implicit_features: add_implicit_features
13
+ )
14
+ end
15
+
16
+ def fit(train_set, user_info: nil, item_info: nil)
17
+ train_set = to_dataset(train_set)
18
+
19
+ @implicit = !train_set.any? { |v| v[:rating] }
20
+ unless @implicit
21
+ ratings = train_set.map { |o| o[:rating] }
22
+ check_ratings(ratings)
23
+ end
24
+
25
+ check_training_set(train_set)
26
+ create_maps(train_set)
27
+
28
+ x_row = []
29
+ x_col = []
30
+ x_val = []
31
+ value_key = @implicit ? :value : :rating
32
+ train_set.each do |v|
33
+ x_row << @user_map[v[:user_id]]
34
+ x_col << @item_map[v[:item_id]]
35
+ x_val << (v[value_key] || 1)
36
+ end
37
+
38
+ @m = @user_map.size
39
+ @n = @item_map.size
40
+ nnz = train_set.size
41
+
42
+ x_row = int_ptr(x_row)
43
+ x_col = int_ptr(x_col)
44
+ x = real_ptr(x_val)
45
+
46
+ x_full = nil
47
+ weight = nil
48
+ lam_unique = nil
49
+
50
+ uu = nil
51
+ ii = nil
52
+
53
+ @user_info_map = {}
54
+ u_row, u_col, u_sp, nnz_u, @m_u, p_ = process_info(user_info, @user_map, @user_info_map, :user_id)
55
+
56
+ @item_info_map = {}
57
+ i_row, i_col, i_sp, nnz_i, @n_i, q = process_info(item_info, @item_map, @item_info_map, :item_id)
58
+
59
+ @precompute_for_predictions = false
60
+
61
+ # initialize w/ normal distribution
62
+ reset_values = true
63
+
64
+ @a = Fiddle::Pointer.malloc([@m, @m_u].max * (@k_user + @k + @k_main) * Fiddle::SIZEOF_DOUBLE)
65
+ @b = Fiddle::Pointer.malloc([@n, @n_i].max * (@k_item + @k + @k_main) * Fiddle::SIZEOF_DOUBLE)
66
+ @c = p_ > 0 ? Fiddle::Pointer.malloc(p_ * (@k_user + @k) * Fiddle::SIZEOF_DOUBLE) : nil
67
+ @d = q > 0 ? Fiddle::Pointer.malloc(q * (@k_item + @k) * Fiddle::SIZEOF_DOUBLE) : nil
68
+
69
+ @bias_a = nil
70
+ @bias_b = nil
71
+
72
+ u_colmeans = Fiddle::Pointer.malloc(p_ * Fiddle::SIZEOF_DOUBLE)
73
+ i_colmeans = Fiddle::Pointer.malloc(q * Fiddle::SIZEOF_DOUBLE)
74
+
75
+ if @implicit
76
+ @w_main_multiplier = 1.0
77
+ @alpha = 1.0
78
+ @adjust_weight = false # downweight?
79
+ @apply_log_transf = false
80
+
81
+ # different defaults
82
+ @lambda_ = 1e0
83
+ @w_user = 10
84
+ @w_item = 10
85
+ @finalize_chol = false
86
+
87
+ args = [
88
+ @a, @b,
89
+ @c, @d,
90
+ reset_values, @random_state,
91
+ u_colmeans, i_colmeans,
92
+ @m, @n, @k,
93
+ x_row, x_col, x, nnz,
94
+ @lambda_, lam_unique,
95
+ uu, @m_u, p_,
96
+ ii, @n_i, q,
97
+ u_row, u_col, u_sp, nnz_u,
98
+ i_row, i_col, i_sp, nnz_i,
99
+ @na_as_zero_user, @na_as_zero_item,
100
+ @k_main, @k_user, @k_item,
101
+ @w_main, @w_user, @w_item, real_ptr([@w_main_multiplier]),
102
+ @alpha, @adjust_weight, @apply_log_transf,
103
+ @niter, @nthreads, @verbose, @handle_interrupt,
104
+ @use_cg, @max_cg_steps, @finalize_chol,
105
+ @nonneg, @max_cd_steps, @nonneg_c, @nonneg_d,
106
+ @precompute_for_predictions,
107
+ nil, #precomputedBtB,
108
+ nil, #precomputedBeTBe,
109
+ nil #precomputedBeTBeChol
110
+ ]
111
+ check_status FFI.fit_collective_implicit_als(*fiddle_args(args))
112
+
113
+ @global_mean = 0
114
+ else
115
+ @bias_a = Fiddle::Pointer.malloc([@m, @m_u].max * Fiddle::SIZEOF_DOUBLE) if @user_bias
116
+ @bias_b = Fiddle::Pointer.malloc([@n, @n_i].max * Fiddle::SIZEOF_DOUBLE) if @item_bias
117
+
118
+ if @add_implicit_features
119
+ @ai = Fiddle::Pointer.malloc([@m, @m_u].max * (@k + @k_main) * Fiddle::SIZEOF_DOUBLE)
120
+ @bi = Fiddle::Pointer.malloc([@n, @n_i].max * (@k + @k_main) * Fiddle::SIZEOF_DOUBLE)
121
+ else
122
+ @ai = nil
123
+ @bi = nil
124
+ end
125
+
126
+ glob_mean = Fiddle::Pointer.malloc(Fiddle::SIZEOF_DOUBLE)
127
+
128
+ args = [
129
+ @bias_a, @bias_b,
130
+ @a, @b,
131
+ @c, @d,
132
+ @ai, @bi,
133
+ @add_implicit_features,
134
+ reset_values, @random_state,
135
+ glob_mean,
136
+ u_colmeans, i_colmeans,
137
+ @m, @n, @k,
138
+ x_row, x_col, x, nnz,
139
+ x_full,
140
+ weight,
141
+ @user_bias, @item_bias,
142
+ @lambda_, lam_unique,
143
+ uu, @m_u, p_,
144
+ ii, @n_i, q,
145
+ u_row, u_col, u_sp, nnz_u,
146
+ i_row, i_col, i_sp, nnz_i,
147
+ @na_as_zero, @na_as_zero_user, @na_as_zero_item,
148
+ @k_main, @k_user, @k_item,
149
+ @w_main, @w_user, @w_item, @w_implicit,
150
+ @niter, @nthreads, @verbose, @handle_interrupt,
151
+ @use_cg, @max_cg_steps, @finalize_chol,
152
+ @nonneg, @max_cd_steps, @nonneg_c, @nonneg_d,
153
+ @precompute_for_predictions,
154
+ @include_all_x,
155
+ nil, #B_plus_bias,
156
+ nil, #precomputedBtB,
157
+ nil, #precomputedTransBtBinvBt,
158
+ nil, #precomputedBeTBeChol,
159
+ nil, #precomputedBiTBi,
160
+ nil, #precomputedTransCtCinvCt,
161
+ nil #precomputedCtCw
162
+ ]
163
+ check_status FFI.fit_collective_explicit_als(*fiddle_args(args))
164
+
165
+ @global_mean = real_array(glob_mean).first
166
+ end
167
+
168
+ @u_colmeans = real_array(u_colmeans)
169
+ @i_colmeans = real_array(i_colmeans)
170
+ @u_colmeans_ptr = u_colmeans
171
+
172
+ self
173
+ end
174
+
175
+ def user_recs(user_id, count: 5, item_ids: nil)
176
+ check_fit
177
+ user = @user_map[user_id]
178
+
179
+ if user
180
+ if item_ids
181
+ # remove missing ids
182
+ item_ids = item_ids.select { |v| @item_map[v] }
183
+
184
+ pred_a = int_ptr([@user_map[user_id]] * item_ids.size)
185
+ pred_b = int_ptr(item_ids.map { |v| @item_map[v] })
186
+ nnz = item_ids.size
187
+ outp = Fiddle::Pointer.malloc(nnz * Fiddle::SIZEOF_DOUBLE)
188
+
189
+ FFI.predict_multiple(
190
+ @a, @k_user,
191
+ @b, @k_item,
192
+ @bias_a, @bias_b,
193
+ @global_mean,
194
+ @k, @k_main,
195
+ @m, @n,
196
+ pred_a, pred_b, nnz,
197
+ outp,
198
+ @nthreads
199
+ )
200
+
201
+ scores = real_array(outp)
202
+ item_ids.zip(scores).map do |item_id, score|
203
+ {item_id: item_id, score: score}
204
+ end
205
+ else
206
+ a_vec = @a[user * @k * Fiddle::SIZEOF_DOUBLE, @k * Fiddle::SIZEOF_DOUBLE]
207
+ a_bias = @bias_a ? @bias_a[user * Fiddle::SIZEOF_DOUBLE, Fiddle::SIZEOF_DOUBLE].unpack1("d") : 0
208
+ top_n(a_vec: a_vec, a_bias: a_bias, count: count)
209
+ end
210
+ else
211
+ # no items if user is unknown
212
+ # TODO maybe most popular items
213
+ []
214
+ end
215
+ end
216
+
217
+ # TODO add item_ids
218
+ def new_user_recs(data, count: 5, user_info: nil)
219
+ check_fit
220
+
221
+ a_vec, a_bias = factors_warm(data, user_info: user_info)
222
+ top_n(a_vec: a_vec, a_bias: a_bias, count: count)
223
+ end
224
+
225
+ def user_factors
226
+ read_factors(@a, [@m, @m_u].max, @k_user + @k + @k_main)
227
+ end
228
+
229
+ def item_factors
230
+ read_factors(@b, [@n, @n_i].max, @k_item + @k + @k_main)
231
+ end
232
+
233
+ def user_bias
234
+ read_bias(@bias_a) if @bias_a
235
+ end
236
+
237
+ def item_bias
238
+ read_bias(@bias_b) if @bias_b
239
+ end
240
+
241
+ private
242
+
243
+ def set_params(
244
+ k: 40, lambda_: 1e+1, method: "als", use_cg: true, user_bias: true,
245
+ item_bias: true, add_implicit_features: false,
246
+ k_user: 0, k_item: 0, k_main: 0,
247
+ w_main: 1.0, w_user: 1.0, w_item: 1.0, w_implicit: 0.5,
248
+ maxiter: 800, niter: 10, parallelize: "separate", corr_pairs: 4,
249
+ max_cg_steps: 3, finalize_chol: true,
250
+ na_as_zero: false, na_as_zero_user: false, na_as_zero_item: false,
251
+ nonneg: false, nonneg_c: false, nonneg_d: false, max_cd_steps: 100,
252
+ precompute_for_predictions: true, include_all_x: true,
253
+ use_float: false,
254
+ random_state: 1, verbose: true, print_every: 10,
255
+ handle_interrupt: true, produce_dicts: false,
256
+ copy_data: true, nthreads: -1
257
+ )
258
+
259
+ @k = k
260
+ @k_user = k_user
261
+ @k_item = k_item
262
+ @k_main = k_main
263
+ @lambda_ = lambda_
264
+ @w_main = w_main
265
+ @w_user = w_user
266
+ @w_item = w_item
267
+ @w_implicit = w_implicit
268
+ @user_bias = !!user_bias
269
+ @item_bias = !!item_bias
270
+ @method = method
271
+ @add_implicit_features = !!add_implicit_features
272
+ @use_cg = !!use_cg
273
+ @max_cg_steps = max_cg_steps.to_i
274
+ @max_cd_steps = max_cd_steps.to_i
275
+ @finalize_chol = !!finalize_chol
276
+ @maxiter = maxiter
277
+ @niter = niter
278
+ @parallelize = parallelize
279
+ @na_as_zero = !!na_as_zero
280
+ @na_as_zero_user = !!na_as_zero_user
281
+ @na_as_zero_item = !!na_as_zero_item
282
+ @nonneg = !!nonneg
283
+ @nonneg_c = !!nonneg_c
284
+ @nonneg_d = !!nonneg_d
285
+ @precompute_for_predictions = !!precompute_for_predictions
286
+ @include_all_x = true
287
+ @use_float = !!use_float
288
+ @verbose = !!verbose
289
+ @print_every = print_every
290
+ @corr_pairs = corr_pairs
291
+ @random_state = random_state.to_i
292
+ @produce_dicts = !!produce_dicts
293
+ @handle_interrupt = !!handle_interrupt
294
+ @copy_data = !!copy_data
295
+ nthreads = Etc.nprocessors if nthreads < 0
296
+ @nthreads = nthreads
297
+ end
298
+
299
+ def create_maps(train_set)
300
+ user_ids = train_set.map { |v| v[:user_id] }.uniq.sort
301
+ item_ids = train_set.map { |v| v[:item_id] }.uniq.sort
302
+
303
+ raise ArgumentError, "Missing user_id" if user_ids.any?(&:nil?)
304
+ raise ArgumentError, "Missing item_id" if item_ids.any?(&:nil?)
305
+
306
+ @user_map = user_ids.zip(user_ids.size.times).to_h
307
+ @item_map = item_ids.zip(item_ids.size.times).to_h
308
+ end
309
+
310
+ def check_ratings(ratings)
311
+ unless ratings.all? { |r| !r.nil? }
312
+ raise ArgumentError, "Missing ratings"
313
+ end
314
+ unless ratings.all? { |r| r.is_a?(Numeric) }
315
+ raise ArgumentError, "Ratings must be numeric"
316
+ end
317
+ end
318
+
319
+ def check_training_set(train_set)
320
+ raise ArgumentError, "No training data" if train_set.empty?
321
+ end
322
+
323
+ def check_fit
324
+ raise "Not fit" unless defined?(@implicit)
325
+ end
326
+
327
+ def to_dataset(dataset)
328
+ if defined?(Rover::DataFrame) && dataset.is_a?(Rover::DataFrame)
329
+ # convert keys to symbols
330
+ dataset = dataset.dup
331
+ dataset.keys.each do |k, v|
332
+ dataset[k.to_sym] ||= dataset.delete(k)
333
+ end
334
+ dataset.to_a
335
+ elsif defined?(Daru::DataFrame) && dataset.is_a?(Daru::DataFrame)
336
+ # convert keys to symbols
337
+ dataset = dataset.dup
338
+ new_names = dataset.vectors.to_a.map { |k| [k, k.to_sym] }.to_h
339
+ dataset.rename_vectors!(new_names)
340
+ dataset.to_a[0]
341
+ else
342
+ dataset
343
+ end
344
+ end
345
+
346
+ def read_factors(ptr, d1, d2)
347
+ arr = []
348
+ offset = 0
349
+ width = d2 * Fiddle::SIZEOF_DOUBLE
350
+ d1.times do |i|
351
+ arr << ptr[offset, width].unpack("d*")
352
+ offset += width
353
+ end
354
+ arr
355
+ end
356
+
357
+ def read_bias(ptr)
358
+ real_array(ptr)
359
+ end
360
+
361
+ def top_n(a_vec:, a_bias:, count:)
362
+ include_ix = nil
363
+ n_include = 0
364
+ exclude_ix = nil
365
+ n_exclude = 0
366
+
367
+ outp_ix = Fiddle::Pointer.malloc(count * Fiddle::SIZEOF_INT)
368
+ outp_score = Fiddle::Pointer.malloc(count * Fiddle::SIZEOF_DOUBLE)
369
+
370
+ check_status FFI.topN(
371
+ a_vec, @k_user,
372
+ @b, @k_item,
373
+ @bias_b, @global_mean, a_bias,
374
+ @k, @k_main,
375
+ include_ix, n_include,
376
+ exclude_ix, n_exclude,
377
+ outp_ix, outp_score,
378
+ count, @n,
379
+ @nthreads
380
+ )
381
+
382
+ imap = @item_map.map(&:reverse).to_h
383
+ item_ids = int_array(outp_ix).map { |v| imap[v] }
384
+ scores = real_array(outp_score)
385
+
386
+ item_ids.zip(scores).map do |item_id, score|
387
+ {item_id: item_id, score: score}
388
+ end
389
+ end
390
+
391
+ def factors_warm(data, user_info: nil)
392
+ data = to_dataset(data)
393
+ user_info = to_dataset(user_info) if user_info
394
+
395
+ nnz = data.size
396
+ a_vec = Fiddle::Pointer.malloc((@k_user + @k + @k_main) * Fiddle::SIZEOF_DOUBLE)
397
+ bias_a = Fiddle::Pointer.malloc(Fiddle::SIZEOF_DOUBLE)
398
+
399
+ u_vec_sp = []
400
+ u_vec_x_col = []
401
+ if user_info
402
+ user_info.each do |k, v|
403
+ next if k == :user_id
404
+
405
+ uc = @user_info_map[k]
406
+ raise "Bad key: #{k}" unless uc
407
+
408
+ u_vec_x_col << uc
409
+ u_vec_sp << v
410
+ end
411
+ end
412
+ p_ = @user_info_map.size
413
+ nnz_u_vec = u_vec_sp.size
414
+ u_vec_x_col = int_ptr(u_vec_x_col)
415
+ u_vec_sp = real_ptr(u_vec_sp)
416
+
417
+ u_vec = nil
418
+ u_bin_vec = nil
419
+ pbin = 0
420
+
421
+ weight = nil
422
+ lam_unique = nil
423
+ n_max = @n
424
+
425
+ if data.any?
426
+ if @implicit
427
+ ratings = data.map { |d| d[:value] || 1 }
428
+ else
429
+ ratings = data.map { |d| d[:rating] }
430
+ check_ratings(ratings)
431
+ end
432
+ xa = real_ptr(ratings)
433
+ x_col = int_ptr(data.map { |d| @item_map.fetch(d[:item_id]) }) # translate item IDs to column indices; raises KeyError for unknown items
434
+ else
435
+ xa = nil
436
+ x_col = nil
437
+ end
438
+ xa_dense = nil
439
+
440
+ if @implicit
441
+ args = [
442
+ a_vec,
443
+ u_vec, p_,
444
+ u_vec_sp, u_vec_x_col, nnz_u_vec,
445
+ @na_as_zero_user,
446
+ @nonneg,
447
+ @u_colmeans_ptr,
448
+ @b, @n, @c,
449
+ xa, x_col, nnz,
450
+ @k, @k_user, @k_item, @k_main,
451
+ @lambda_, @alpha,
452
+ @w_main, @w_user, @w_main_multiplier,
453
+ @apply_log_transf,
454
+ nil, #BeTBe,
455
+ nil, #BtB
456
+ nil #BeTBeChol
457
+ ]
458
+ check_status FFI.factors_collective_implicit_single(*fiddle_args(args))
459
+ else
460
+ cb = nil
461
+
462
+ args = [
463
+ a_vec, bias_a,
464
+ u_vec, p_,
465
+ u_vec_sp, u_vec_x_col, nnz_u_vec,
466
+ u_bin_vec, pbin,
467
+ @na_as_zero_user, @na_as_zero,
468
+ @nonneg,
469
+ @c, cb,
470
+ @global_mean, @bias_b, @u_colmeans_ptr,
471
+ xa, x_col, nnz, xa_dense,
472
+ @n, weight, @b, @bi,
473
+ @add_implicit_features,
474
+ @k, @k_user, @k_item, @k_main,
475
+ @lambda_, lam_unique,
476
+ @w_main, @w_user, @w_implicit,
477
+ n_max,
478
+ @include_all_x,
479
+ nil, #TransBtBinvBt,
480
+ nil, #BtB,
481
+ nil, #BeTBeChol,
482
+ nil, #BiTBi,
483
+ nil, #CtCw,
484
+ nil, #TransCtCinvCt,
485
+ nil #B_plus_bias
486
+ ]
487
+ check_status FFI.factors_collective_explicit_single(*fiddle_args(args))
488
+ end
489
+
490
+ [a_vec, real_array(bias_a).first]
491
+ end
492
+
493
+ # convert boolean to int
494
+ def fiddle_args(args)
495
+ args.map { |v| v == true || v == false ? (v ? 1 : 0) : v }
496
+ end
497
+
498
+ def check_status(ret_val)
499
+ case ret_val
500
+ when 0
501
+ # success
502
+ when 1
503
+ raise "Could not allocate sufficient memory"
504
+ else
505
+ raise "Bad status: #{ret_val}"
506
+ end
507
+ end
508
+
509
+ def process_info(info, map, info_map, key)
510
+ return [nil, nil, nil, 0, 0, 0] unless info
511
+
512
+ info = to_dataset(info)
513
+
514
+ row = []
515
+ col = []
516
+ val = []
517
+ info.each do |ri|
518
+ rk = ri[key]
519
+ raise ArgumentError, "Missing #{key}" unless rk
520
+
521
+ r = (map[rk] ||= map.size)
522
+ ri.each do |k, v|
523
+ next if k == key
524
+ row << r
525
+ col << (info_map[k] ||= info_map.size)
526
+ val << v
527
+ end
528
+ end
529
+ [int_ptr(row), int_ptr(col), real_ptr(val), val.size, map.size, info_map.size]
530
+ end
531
+
532
+ def int_ptr(v)
533
+ v.pack("i*")
534
+ end
535
+
536
+ def real_ptr(v)
537
+ v.pack("d*")
538
+ end
539
+
540
+ def int_array(ptr)
541
+ ptr.to_s(ptr.size).unpack("i*")
542
+ end
543
+
544
+ def real_array(ptr)
545
+ ptr.to_s(ptr.size).unpack("d*")
546
+ end
547
+ end
548
+ end
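`create_maps` and the start of `fit` in `lib/cmfrec/recommender.rb` turn arbitrary user and item IDs into dense zero-based indices, then pack the resulting coordinate triplets into binary strings for the C library. The sketch below reproduces just that preprocessing with made-up ratings and local variable names chosen for illustration; it does not call the gem or the native library.

```ruby
# Made-up training data; IDs may be strings or integers
ratings = [
  {user_id: "b", item_id: 20, rating: 5},
  {user_id: "a", item_id: 10, rating: 3},
  {user_id: "b", item_id: 10, rating: 4}
]

# Map each distinct ID to a dense zero-based index, in sorted order
user_ids = ratings.map { |r| r[:user_id] }.uniq.sort
item_ids = ratings.map { |r| r[:item_id] }.uniq.sort
user_map = user_ids.each_with_index.to_h
item_map = item_ids.each_with_index.to_h

# Coordinate (row, column, value) triplets of the sparse ratings matrix
x_row = ratings.map { |r| user_map[r[:user_id]] }
x_col = ratings.map { |r| item_map[r[:item_id]] }
x_val = ratings.map { |r| r[:rating] }

# Pack into native int / double buffers, as int_ptr and real_ptr do
packed_rows = x_row.pack("i*")
packed_vals = x_val.pack("d*")

p user_map
p item_map
p packed_vals.unpack("d*")
```

Sorting before indexing makes the mapping deterministic for a given set of IDs, which is presumably why the gem sorts, but it also means all IDs in one column must be mutually comparable.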
@@ -0,0 +1,3 @@
1
+ module Cmfrec
2
+ VERSION = "0.1.0"
3
+ end
@@ -0,0 +1,74 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2020 David Cortes
4
+
5
+ All rights reserved.
6
+
7
+ Permission is hereby granted, free of charge, to any person obtaining a copy
8
+ of this software and associated documentation files (the "Software"), to
9
+ deal in the Software without restriction, including without limitation the
10
+ rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
11
+ sell copies of the Software, and to permit persons to whom the Software is
12
+ furnished to do so, subject to the following conditions:
13
+
14
+ The above copyright notice and this permission notice shall be included in
15
+ all copies or substantial portions of the Software.
16
+
17
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
18
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
19
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
20
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
21
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
22
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
23
+ IN THE SOFTWARE.
24
+
25
+ ---
26
+
27
+ ANSI C implementation of vector operations.
28
+
29
+ Copyright (c) 2007-2010 Naoaki Okazaki
30
+ All rights reserved.
31
+
32
+ Permission is hereby granted, free of charge, to any person obtaining a copy
33
+ of this software and associated documentation files (the "Software"), to deal
34
+ in the Software without restriction, including without limitation the rights
35
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
36
+ copies of the Software, and to permit persons to whom the Software is
37
+ furnished to do so, subject to the following conditions:
38
+
39
+ The above copyright notice and this permission notice shall be included in
40
+ all copies or substantial portions of the Software.
41
+
42
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
43
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
44
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
45
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
46
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
47
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
48
+ THE SOFTWARE.
49
+
50
+ ---
51
+
52
+ C library of Limited memory BFGS (L-BFGS).
53
+
54
+ Copyright (c) 1990, Jorge Nocedal
55
+ Copyright (c) 2007-2010 Naoaki Okazaki
56
+ All rights reserved.
57
+
58
+ Permission is hereby granted, free of charge, to any person obtaining a copy
59
+ of this software and associated documentation files (the "Software"), to deal
60
+ in the Software without restriction, including without limitation the rights
61
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
62
+ copies of the Software, and to permit persons to whom the Software is
63
+ furnished to do so, subject to the following conditions:
64
+
65
+ The above copyright notice and this permission notice shall be included in
66
+ all copies or substantial portions of the Software.
67
+
68
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
69
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
70
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
71
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
72
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
73
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
74
+ THE SOFTWARE.
Binary file
Binary file
metadata ADDED
@@ -0,0 +1,52 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: cmfrec
3
+ version: !ruby/object:Gem::Version
4
+ version: 0.1.0
5
+ platform: ruby
6
+ authors:
7
+ - Andrew Kane
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2020-11-28 00:00:00.000000000 Z
12
+ dependencies: []
13
+ description:
14
+ email: andrew@chartkick.com
15
+ executables: []
16
+ extensions: []
17
+ extra_rdoc_files: []
18
+ files:
19
+ - CHANGELOG.md
20
+ - LICENSE.txt
21
+ - README.md
22
+ - lib/cmfrec.rb
23
+ - lib/cmfrec/ffi.rb
24
+ - lib/cmfrec/recommender.rb
25
+ - lib/cmfrec/version.rb
26
+ - vendor/LICENSE.txt
27
+ - vendor/libcmfrec.dylib
28
+ - vendor/libcmfrec.so
29
+ homepage: https://github.com/ankane/cmfrec
30
+ licenses:
31
+ - MIT
32
+ metadata: {}
33
+ post_install_message:
34
+ rdoc_options: []
35
+ require_paths:
36
+ - lib
37
+ required_ruby_version: !ruby/object:Gem::Requirement
38
+ requirements:
39
+ - - ">="
40
+ - !ruby/object:Gem::Version
41
+ version: '2.5'
42
+ required_rubygems_version: !ruby/object:Gem::Requirement
43
+ requirements:
44
+ - - ">="
45
+ - !ruby/object:Gem::Version
46
+ version: '0'
47
+ requirements: []
48
+ rubygems_version: 3.1.4
49
+ signing_key:
50
+ specification_version: 4
51
+ summary: Recommendations for Ruby using collective matrix factorization
52
+ test_files: []