rumale 0.22.2 → 0.22.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 703a6895f4218ca45c5d5ae5e86559b077cf1be213d4939eb1e9ab94eac4621d
4
- data.tar.gz: 5862466e565d1e6030c35494b5028ae980a47d373e90050c62266055fcecd374
3
+ metadata.gz: 2bcd9baeafc1a271f75ccd74123f50ebd9d4fbe9065c2583f376c562f8e49155
4
+ data.tar.gz: 937dda6bbe4c41953f1e6eb1ea205eaa54277ae9f4202fa8a1e7e789348a76ad
5
5
  SHA512:
6
- metadata.gz: 988d55c681a102e0c65b9133c6aeafc049e33755955f959d6e6046f5601dd192af881424355a2b373ed2e7a5a16b74236698aef5372e09584b10fe28d1b7bc21
7
- data.tar.gz: adc58efa3b46d9fc1a87ddb2a4df32472507d61f21a3a0eb07026068cc5e41af166fb0a0f8ae23f1b23aec649b22835a50edbed79d35255e8cc231b82b31eb8c
6
+ metadata.gz: cbad4cc283bb449116b360bc4ef8002928add3399005bcc30aaccdf95ea03233f0d035862de643b4aa4d688eedbeaaa7dc029c67a2336156d7e03c9435468cfa
7
+ data.tar.gz: 83bfa0f53d7c0e094f271bfb3ddfef21ca58d41d77e1278886b5e26216a5b614629c9be33bc587bccc62e280612c75dbd0356fce772a727ed8cc003f86a03976
@@ -0,0 +1 @@
1
+ service_name: github-ci
@@ -0,0 +1,28 @@
1
+ name: coverage
2
+
3
+ on:
4
+ push:
5
+ branches: [ main ]
6
+ pull_request:
7
+ branches: [ main ]
8
+
9
+ jobs:
10
+ coverage:
11
+ runs-on: ubuntu-20.04
12
+ steps:
13
+ - uses: actions/checkout@v2
14
+ - name: Install BLAS and LAPACK
15
+ run: sudo apt-get install -y libopenblas-dev liblapacke-dev
16
+ - name: Set up Ruby 2.7
17
+ uses: actions/setup-ruby@v1
18
+ with:
19
+ ruby-version: '2.7'
20
+ - name: Build and test with Rake
21
+ run: |
22
+ gem install bundler
23
+ bundle install
24
+ bundle exec rake
25
+ - name: Coveralls GitHub Action
26
+ uses: coverallsapp/github-action@v1.1.2
27
+ with:
28
+ github-token: ${{ secrets.GITHUB_TOKEN }}
data/.gitignore CHANGED
@@ -16,6 +16,7 @@
16
16
  tags
17
17
  .DS_Store
18
18
  .ruby-version
19
+ iterate.dat
19
20
  /spec/dump_dbl.t
20
21
  /spec/dump_int.t
21
22
  /spec/dump_mult_dbl.t
@@ -1,3 +1,12 @@
1
+ # 0.22.3
2
+ - Add regressor class for non-negative least square method.
3
+ - [NNLS](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/NNLS.html)
4
+ - Add lbfgs solver to [Ridge](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/Ridge.html) and [LinearRegression](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/LinearRegression.html).
5
+ - In version 0.23.0, these classes will be changed to attempt to optimize with 'svd' or 'lbfgs' solver if 'auto' is given to
6
+ the solver parameter. If you use 'sgd' solver, you need specify it explicitly.
7
+ - Add GC guard to native extension codes.
8
+ - Update API documentation.
9
+
1
10
  # 0.22.2
2
11
  - Add classifier and regressor classes for stacking method.
3
12
  - [StackingClassifier](https://yoshoku.github.io/rumale/doc/Rumale/Ensemble/StackingClassifier.html)
data/Gemfile CHANGED
@@ -13,4 +13,5 @@ gem 'rubocop', '~> 1.0'
13
13
  gem 'rubocop-performance', '~> 1.8'
14
14
  gem 'rubocop-rake', '~> 0.5'
15
15
  gem 'rubocop-rspec', '~> 2.0'
16
- gem 'simplecov', '~> 0.19'
16
+ gem 'simplecov', '~> 0.21'
17
+ gem 'simplecov-lcov', '~> 0.8'
@@ -1,4 +1,4 @@
1
- Copyright (c) 2017-2020 Atsushi Tatsuma
1
+ Copyright (c) 2017-2021 Atsushi Tatsuma
2
2
  All rights reserved.
3
3
 
4
4
  Redistribution and use in source and binary forms, with or without
data/README.md CHANGED
@@ -3,6 +3,7 @@
3
3
  ![Rumale](https://dl.dropboxusercontent.com/s/joxruk2720ur66o/rumale_header_400.png)
4
4
 
5
5
  [![Build Status](https://github.com/yoshoku/rumale/workflows/build/badge.svg)](https://github.com/yoshoku/rumale/actions?query=workflow%3Abuild)
6
+ [![Coverage Status](https://coveralls.io/repos/github/yoshoku/rumale/badge.svg?branch=main)](https://coveralls.io/github/yoshoku/rumale?branch=main)
6
7
  [![Gem Version](https://badge.fury.io/rb/rumale.svg)](https://badge.fury.io/rb/rumale)
7
8
  [![BSD 2-Clause License](https://img.shields.io/badge/License-BSD%202--Clause-orange.svg)](https://github.com/yoshoku/rumale/blob/main/LICENSE.txt)
8
9
  [![Documentation](https://img.shields.io/badge/api-reference-blue.svg)](https://yoshoku.github.io/rumale/doc/)
@@ -176,7 +177,7 @@ For example, using the [OpenBLAS](https://github.com/xianyi/OpenBLAS) speeds up
176
177
 
177
178
  Install OpenBLAS library.
178
179
 
179
- Mac:
180
+ macOS:
180
181
 
181
182
  ```bash
182
183
  $ brew install openblas
@@ -185,12 +186,13 @@ $ brew install openblas
185
186
  Ubuntu:
186
187
 
187
188
  ```bash
188
- $ sudo apt-get install gcc gfortran
189
- $ wget https://github.com/xianyi/OpenBLAS/archive/v0.3.5.tar.gz
190
- $ tar xzf v0.3.5.tar.gz
191
- $ cd OpenBLAS-0.3.5
192
- $ make USE_OPENMP=1
193
- $ sudo make PREFIX=/usr/local install
189
+ $ sudo apt-get install libopenblas-dev liblapacke-dev
190
+ ```
191
+
192
+ Windows (MSYS2):
193
+
194
+ ```bash
195
+ $ pacman -S mingw-w64-x86_64-ruby mingw-w64-x86_64-openblas mingw-w64-x86_64-lapack
194
196
  ```
195
197
 
196
198
  Install Numo::Linalg gem.
@@ -206,6 +208,37 @@ require 'numo/linalg/autoloader'
206
208
  require 'rumale'
207
209
  ```
208
210
 
211
+ ### Numo::OpenBLAS
212
+ [Numo::OpenBLAS](https://github.com/yoshoku/numo-openblas) downloads and builds OpenBLAS during installation
213
+ and uses that as a background library for Numo::Linalg.
214
+
215
+ Install compilers for building OpenBLAS.
216
+
217
+ macOS:
218
+
219
+ ```bash
220
+ $ brew install gcc gfortran make
221
+ ```
222
+
223
+ Ubuntu:
224
+
225
+ ```bash
226
+ $ sudo apt-get install gcc gfortran make
227
+ ```
228
+
229
+ Install Numo::OpenBLAS gem.
230
+
231
+ ```bash
232
+ $ gem install numo-openblas
233
+ ```
234
+
235
+ Load Numo::OpenBLAS gem instead of Numo::Linalg.
236
+
237
+ ```ruby
238
+ require 'numo/openblas'
239
+ require 'rumale'
240
+ ```
241
+
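The README text above offers two alternative backends (Numo::Linalg with a system OpenBLAS, or Numo::OpenBLAS). A minimal sketch of choosing whichever is installed at load time follows; `require_first_available` is a hypothetical helper written for this note, not part of Rumale or Numo, and the order of preference is an assumption.

```ruby
# Hypothetical helper (not part of Rumale): try each library in order and
# return the name of the first one that loads, or nil if none is installed.
def require_first_available(candidates)
  candidates.find do |name|
    begin
      require name
      true
    rescue LoadError
      false
    end
  end
end

# Prefer Numo::OpenBLAS, fall back to the Numo::Linalg autoloader.
backend = require_first_available(%w[numo/openblas numo/linalg/autoloader])
warn 'No linear algebra backend found; Rumale would fall back to slower code paths.' if backend.nil?
# require 'rumale' would follow here in a real script.
```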
209
242
  ### Parallel
210
243
  Several estimators in Rumale support parallel processing.
211
244
  Parallel processing in Rumale is realized by [Parallel](https://github.com/grosser/parallel) gem,
@@ -227,6 +260,10 @@ When -1 is given to n_jobs parameter, all processors are used.
227
260
  estimator = Rumale::Ensemble::RandomForestClassifier.new(n_jobs: -1, random_seed: 1)
228
261
  ```
229
262
 
263
+ ## Related Projects
264
+ - [Rumale::SVM](https://github.com/yoshoku/rumale-svm) provides support vector machine algorithms in LIBSVM and LIBLINEAR with Rumale interface.
265
+ - [Rumale::Torch](https://github.com/yoshoku/rumale-torch) provides the learning and inference by the neural network defined in torch.rb with Rumale interface.
266
+
230
267
  ## Novelties
231
268
 
232
269
  * [Rumale SHOP](https://suzuri.jp/yoshoku)
@@ -257,10 +257,13 @@ find_split_params_cls(VALUE self, VALUE criterion, VALUE impurity, VALUE order,
257
257
  split_opts_cls opts = { StringValuePtr(criterion), NUM2LONG(n_classes), NUM2DBL(impurity) };
258
258
  VALUE params = na_ndloop3(&ndf, &opts, 3, order, features, labels);
259
259
  VALUE results = rb_ary_new2(4);
260
- rb_ary_store(results, 0, DBL2NUM(((double*)na_get_pointer_for_read(params))[0]));
261
- rb_ary_store(results, 1, DBL2NUM(((double*)na_get_pointer_for_read(params))[1]));
262
- rb_ary_store(results, 2, DBL2NUM(((double*)na_get_pointer_for_read(params))[2]));
263
- rb_ary_store(results, 3, DBL2NUM(((double*)na_get_pointer_for_read(params))[3]));
260
+ double* params_ptr = (double*)na_get_pointer_for_read(params);
261
+ rb_ary_store(results, 0, DBL2NUM(params_ptr[0]));
262
+ rb_ary_store(results, 1, DBL2NUM(params_ptr[1]));
263
+ rb_ary_store(results, 2, DBL2NUM(params_ptr[2]));
264
+ rb_ary_store(results, 3, DBL2NUM(params_ptr[3]));
265
+ RB_GC_GUARD(params);
266
+ RB_GC_GUARD(criterion);
264
267
  return results;
265
268
  }
266
269
 
@@ -375,10 +378,13 @@ find_split_params_reg(VALUE self, VALUE criterion, VALUE impurity, VALUE order,
375
378
  split_opts_reg opts = { StringValuePtr(criterion), NUM2DBL(impurity) };
376
379
  VALUE params = na_ndloop3(&ndf, &opts, 3, order, features, targets);
377
380
  VALUE results = rb_ary_new2(4);
378
- rb_ary_store(results, 0, DBL2NUM(((double*)na_get_pointer_for_read(params))[0]));
379
- rb_ary_store(results, 1, DBL2NUM(((double*)na_get_pointer_for_read(params))[1]));
380
- rb_ary_store(results, 2, DBL2NUM(((double*)na_get_pointer_for_read(params))[2]));
381
- rb_ary_store(results, 3, DBL2NUM(((double*)na_get_pointer_for_read(params))[3]));
381
+ double* params_ptr = (double*)na_get_pointer_for_read(params);
382
+ rb_ary_store(results, 0, DBL2NUM(params_ptr[0]));
383
+ rb_ary_store(results, 1, DBL2NUM(params_ptr[1]));
384
+ rb_ary_store(results, 2, DBL2NUM(params_ptr[2]));
385
+ rb_ary_store(results, 3, DBL2NUM(params_ptr[3]));
386
+ RB_GC_GUARD(params);
387
+ RB_GC_GUARD(criterion);
382
388
  return results;
383
389
  }
384
390
 
@@ -464,8 +470,10 @@ find_split_params_grad_reg
464
470
  double opts[3] = { NUM2DBL(sum_gradient), NUM2DBL(sum_hessian), NUM2DBL(reg_lambda) };
465
471
  VALUE params = na_ndloop3(&ndf, opts, 4, order, features, gradients, hessians);
466
472
  VALUE results = rb_ary_new2(2);
467
- rb_ary_store(results, 0, DBL2NUM(((double*)na_get_pointer_for_read(params))[0]));
468
- rb_ary_store(results, 1, DBL2NUM(((double*)na_get_pointer_for_read(params))[1]));
473
+ double* params_ptr = (double*)na_get_pointer_for_read(params);
474
+ rb_ary_store(results, 0, DBL2NUM(params_ptr[0]));
475
+ rb_ary_store(results, 1, DBL2NUM(params_ptr[1]));
476
+ RB_GC_GUARD(params);
469
477
  return results;
470
478
  }
471
479
 
@@ -497,6 +505,9 @@ node_impurity_cls(VALUE self, VALUE criterion, VALUE y_nary, VALUE n_elements_,
497
505
 
498
506
  xfree(histogram);
499
507
 
508
+ RB_GC_GUARD(y_nary);
509
+ RB_GC_GUARD(criterion);
510
+
500
511
  return ret;
501
512
  }
502
513
 
@@ -531,6 +542,8 @@ node_impurity_reg(VALUE self, VALUE criterion, VALUE y)
531
542
 
532
543
  xfree(sum_vec);
533
544
 
545
+ RB_GC_GUARD(criterion);
546
+
534
547
  return ret;
535
548
  }
536
549
 
@@ -30,6 +30,7 @@ require 'rumale/linear_model/linear_regression'
30
30
  require 'rumale/linear_model/ridge'
31
31
  require 'rumale/linear_model/lasso'
32
32
  require 'rumale/linear_model/elastic_net'
33
+ require 'rumale/linear_model/nnls'
33
34
  require 'rumale/kernel_machine/kernel_svc'
34
35
  require 'rumale/kernel_machine/kernel_pca'
35
36
  require 'rumale/kernel_machine/kernel_fda'
@@ -11,13 +11,15 @@ module Rumale
11
11
 
12
12
  private
13
13
 
14
- def enable_linalg?
14
+ def enable_linalg?(warning: true)
15
15
  if defined?(Numo::Linalg).nil?
16
- warn('If you want to use features that depend on Numo::Linalg, you should install and load Numo::Linalg in advance.')
16
+ warn('If you want to use features that depend on Numo::Linalg, you should install and load Numo::Linalg in advance.') if warning
17
17
  return false
18
18
  end
19
19
  if Numo::Linalg::VERSION < '0.1.4'
20
- warn('The loaded Numo::Linalg does not implement the methods required by Rumale. Please load Numo::Linalg version 0.1.4 or later.')
20
+ if warning
21
+ warn('The loaded Numo::Linalg does not implement the methods required by Rumale. Please load Numo::Linalg version 0.1.4 or later.')
22
+ end
21
23
  return false
22
24
  end
23
25
  true
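The version guard in this hunk compares version strings with `<`, which in Ruby is a lexicographic comparison. That works for the versions involved today, but it would misorder a two-digit component. A hedged sketch of the same check using `Gem::Version` is shown below; `new_enough?` is a hypothetical name, and the version `0.1.10` is purely illustrative.

```ruby
require 'rubygems' # normally loaded already; provides Gem::Version

# Hypothetical helper: true when the loaded version meets the minimum.
def new_enough?(loaded, minimum = '0.1.4')
  Gem::Version.new(loaded) >= Gem::Version.new(minimum)
end

# Plain string comparison treats '0.1.10' as older than '0.1.4',
# because the character '1' sorts before '4'.
string_says_too_old = '0.1.10' < '0.1.4'

# Gem::Version compares each numeric segment, so 10 > 4.
version_says_ok = new_enough?('0.1.10')

puts "string comparison rejects 0.1.10: #{string_says_too_old}"
puts "Gem::Version accepts 0.1.10: #{version_says_ok}"
```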
@@ -81,7 +81,7 @@ module Rumale
81
81
  # Fit the model with given training data.
82
82
  #
83
83
  # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for fitting the model.
84
- # @param y [Numo::Int32] (shape: [n_samples, n_outputs]) The target values to be used for fitting the model.
84
+ # @param y [Numo::DFloat] (shape: [n_samples, n_outputs]) The target values to be used for fitting the model.
85
85
  # @return [ElasticNet] The learned regressor itself.
86
86
  def fit(x, y)
87
87
  x = check_convert_sample_array(x)
@@ -77,7 +77,7 @@ module Rumale
77
77
  # Fit the model with given training data.
78
78
  #
79
79
  # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for fitting the model.
80
- # @param y [Numo::Int32] (shape: [n_samples, n_outputs]) The target values to be used for fitting the model.
80
+ # @param y [Numo::DFloat] (shape: [n_samples, n_outputs]) The target values to be used for fitting the model.
81
81
  # @return [Lasso] The learned regressor itself.
82
82
  def fit(x, y)
83
83
  x = check_convert_sample_array(x)
@@ -6,7 +6,8 @@ require 'rumale/base/regressor'
6
6
  module Rumale
7
7
  module LinearModel
8
8
  # LinearRegression is a class that implements ordinary least square linear regression
9
- # with stochastic gradient descent (SGD) optimization or singular value decomposition (SVD).
9
+ # with stochastic gradient descent (SGD) optimization,
10
+ # singular value decomposition (SVD), or L-BFGS optimization.
10
11
  #
11
12
  # @example
12
13
  # estimator =
@@ -41,31 +42,32 @@ module Rumale
41
42
  #
42
43
  # @param learning_rate [Float] The initial value of learning rate.
43
44
  # The learning rate decreases as the iteration proceeds according to the equation: learning_rate / (1 + decay * t).
44
- # If solver = 'svd', this parameter is ignored.
45
+ # If solver is not 'sgd', this parameter is ignored.
45
46
  # @param decay [Float] The smoothing parameter for decreasing learning rate as the iteration proceeds.
46
47
  # If nil is given, the decay sets to 'learning_rate'.
47
- # If solver = 'svd', this parameter is ignored.
48
+ # If solver is not 'sgd', this parameter is ignored.
48
49
  # @param momentum [Float] The momentum factor.
49
- # If solver = 'svd', this parameter is ignored.
50
+ # If solver is not 'sgd', this parameter is ignored.
50
51
  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
51
52
  # @param bias_scale [Float] The scale of the bias term.
52
53
  # @param max_iter [Integer] The maximum number of epochs that indicates
53
54
  # how many times the whole data is given to the training process.
54
- # If solver = 'svd', this parameter is ignored.
55
+ # If solver is 'svd', this parameter is ignored.
55
56
  # @param batch_size [Integer] The size of the mini batches.
56
- # If solver = 'svd', this parameter is ignored.
57
+ # If solver is not 'sgd', this parameter is ignored.
57
58
  # @param tol [Float] The tolerance of loss for terminating optimization.
58
- # If solver = 'svd', this parameter is ignored.
59
- # @param solver [String] The algorithm to calculate weights. ('auto', 'sgd' or 'svd').
59
+ # If solver is 'svd', this parameter is ignored.
60
+ # @param solver [String] The algorithm to calculate weights. ('auto', 'sgd', 'svd' or 'lbfgs').
60
61
  # 'auto' chooses the 'svd' solver if Numo::Linalg is loaded. Otherwise, it chooses the 'sgd' solver.
61
62
  # 'sgd' uses the stochastic gradient descent optimization.
62
63
  # 'svd' performs singular value decomposition of samples.
64
+ # 'lbfgs' uses the L-BFGS method for optimization.
63
65
  # @param n_jobs [Integer] The number of jobs for running the fit method in parallel.
64
66
  # If nil is given, the method does not execute in parallel.
65
67
  # If zero or less is given, it becomes equal to the number of processors.
66
- # This parameter is ignored if the Parallel gem is not loaded.
68
+ # This parameter is ignored if the Parallel gem is not loaded or solver is not 'sgd'.
67
69
  # @param verbose [Boolean] The flag indicating whether to output loss during iteration.
68
- # If solver = 'svd', this parameter is ignored.
70
+ # If solver is 'svd', this parameter is ignored.
69
71
  # @param random_seed [Integer] The seed value using to initialize the random generator.
70
72
  def initialize(learning_rate: 0.01, decay: nil, momentum: 0.9,
71
73
  fit_bias: true, bias_scale: 1.0, max_iter: 1000, batch_size: 50, tol: 1e-4,
@@ -80,9 +82,9 @@ module Rumale
80
82
  super()
81
83
  @params.merge!(method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h)
82
84
  @params[:solver] = if solver == 'auto'
83
- load_linalg? ? 'svd' : 'sgd'
85
+ enable_linalg?(warning: false) ? 'svd' : 'sgd'
84
86
  else
85
- solver != 'svd' ? 'sgd' : 'svd' # rubocop:disable Style/NegatedIfElseCondition
87
+ solver.match?(/^svd$|^sgd$|^lbfgs$/) ? solver : 'sgd'
86
88
  end
87
89
  @params[:decay] ||= @params[:learning_rate]
88
90
  @params[:random_seed] ||= srand
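The solver selection in this hunk can be sketched in isolation as follows. `normalize_solver` and `SUPPORTED_SOLVERS` are hypothetical names, and the `linalg_available` flag stands in for `enable_linalg?(warning: false)`. The sketch uses an explicit list instead of the regular expression; in Ruby, `^` and `$` anchor at line boundaries, so the pattern in the diff also matches a multi-line string whose first line is a solver name.

```ruby
SUPPORTED_SOLVERS = %w[svd sgd lbfgs].freeze

# Hypothetical stand-in for the constructor logic: resolve 'auto' and
# fall back to 'sgd' for any unrecognised solver name.
def normalize_solver(solver, linalg_available: false)
  return linalg_available ? 'svd' : 'sgd' if solver == 'auto'

  SUPPORTED_SOLVERS.include?(solver) ? solver : 'sgd'
end

# The line-anchored pattern from the diff accepts this string as a whole.
multi_line_matches = "svd\nextra".match?(/^svd$|^sgd$|^lbfgs$/)

puts normalize_solver('lbfgs')
puts normalize_solver('auto', linalg_available: true)
puts normalize_solver('newton')
```

Using `\A` and `\z` in the pattern would have the same effect as the list lookup; either choice is a matter of taste.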
@@ -95,15 +97,17 @@ module Rumale
95
97
  # Fit the model with given training data.
96
98
  #
97
99
  # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for fitting the model.
98
- # @param y [Numo::Int32] (shape: [n_samples, n_outputs]) The target values to be used for fitting the model.
100
+ # @param y [Numo::DFloat] (shape: [n_samples, n_outputs]) The target values to be used for fitting the model.
99
101
  # @return [LinearRegression] The learned regressor itself.
100
102
  def fit(x, y)
101
103
  x = check_convert_sample_array(x)
102
104
  y = check_convert_tvalue_array(y)
103
105
  check_sample_tvalue_size(x, y)
104
106
 
105
- if @params[:solver] == 'svd' && enable_linalg?
107
+ if @params[:solver] == 'svd' && enable_linalg?(warning: false)
106
108
  fit_svd(x, y)
109
+ elsif @params[:solver] == 'lbfgs'
110
+ fit_lbfgs(x, y)
107
111
  else
108
112
  fit_sgd(x, y)
109
113
  end
@@ -124,24 +128,46 @@ module Rumale
124
128
 
125
129
  def fit_svd(x, y)
126
130
  x = expand_feature(x) if fit_bias?
127
-
128
131
  w = Numo::Linalg.pinv(x, driver: 'svd').dot(y)
132
+ @weight_vec, @bias_term = single_target?(y) ? split_weight(w) : split_weight_mult(w)
133
+ end
129
134
 
130
- is_single_target_vals = y.shape[1].nil?
131
- if @params[:fit_bias]
132
- @weight_vec = is_single_target_vals ? w[0...-1].dup : w[0...-1, true].dup
133
- @bias_term = is_single_target_vals ? w[-1] : w[-1, true].dup
134
- else
135
- @weight_vec = w.dup
136
- @bias_term = is_single_target_vals ? 0 : Numo::DFloat.zeros(y.shape[1])
135
+ def fit_lbfgs(x, y)
136
+ fnc = proc do |w, x, y| # rubocop:disable Lint/ShadowingOuterLocalVariable
137
+ n_samples, n_features = x.shape
138
+ w = w.reshape(y.shape[1], n_features) unless y.shape[1].nil?
139
+ z = x.dot(w.transpose)
140
+ d = z - y
141
+ loss = (d**2).sum.fdiv(n_samples)
142
+ gradient = 2.fdiv(n_samples) * d.transpose.dot(x)
143
+ [loss, gradient.flatten.dup]
137
144
  end
138
- end
139
145
 
140
- def fit_sgd(x, y)
141
- n_outputs = y.shape[1].nil? ? 1 : y.shape[1]
146
+ x = expand_feature(x) if fit_bias?
147
+
142
148
  n_features = x.shape[1]
149
+ n_outputs = single_target?(y) ? 1 : y.shape[1]
150
+
151
+ res = Lbfgsb.minimize(
152
+ fnc: fnc, jcb: true, x_init: init_weight(n_features, n_outputs), args: [x, y],
153
+ maxiter: @params[:max_iter], factr: @params[:tol] / Lbfgsb::DBL_EPSILON,
154
+ verbose: @params[:verbose] ? 1 : -1
155
+ )
156
+
157
+ @weight_vec, @bias_term =
158
+ if single_target?(y)
159
+ split_weight(res[:x])
160
+ else
161
+ split_weight_mult(res[:x].reshape(n_outputs, n_features).transpose)
162
+ end
163
+ end
143
164
 
144
- if n_outputs > 1
165
+ def fit_sgd(x, y)
166
+ if single_target?(y)
167
+ @weight_vec, @bias_term = partial_fit(x, y)
168
+ else
169
+ n_outputs = y.shape[1]
170
+ n_features = x.shape[1]
145
171
  @weight_vec = Numo::DFloat.zeros(n_outputs, n_features)
146
172
  @bias_term = Numo::DFloat.zeros(n_outputs)
147
173
  if enable_parallel?
@@ -150,20 +176,23 @@ module Rumale
150
176
  else
151
177
  n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = partial_fit(x, y[true, n]) }
152
178
  end
153
- else
154
- @weight_vec, @bias_term = partial_fit(x, y)
155
179
  end
156
180
  end
157
181
 
158
- def fit_bias?
159
- @params[:fit_bias] == true
182
+ def single_target?(y)
183
+ y.ndim == 1
160
184
  end
161
185
 
162
- def load_linalg?
163
- return false if defined?(Numo::Linalg).nil?
164
- return false if Numo::Linalg::VERSION < '0.1.4'
186
+ def init_weight(n_features, n_outputs)
187
+ Rumale::Utils.rand_normal([n_outputs, n_features], @rng.dup).flatten.dup
188
+ end
165
189
 
166
- true
190
+ def split_weight_mult(w)
191
+ if fit_bias?
192
+ [w[0...-1, true].dup, w[-1, true].dup]
193
+ else
194
+ [w.dup, Numo::DFloat.zeros(w.shape[1])]
195
+ end
167
196
  end
168
197
  end
169
198
  end
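The objective passed to `Lbfgsb.minimize` in `fit_lbfgs` returns the mean squared error together with its gradient. A minimal single-target sketch in plain Ruby arrays (no Numo, so the function name and the sample data are assumptions made for illustration) shows the same formulas and checks the analytic gradient against a central finite difference.

```ruby
# Mean squared error and its gradient for a single target:
#   loss = sum((x w - y)^2) / n,  gradient = (2 / n) * x^T (x w - y)
def squared_loss_and_gradient(w, x, y)
  n = x.size
  d = x.each_with_index.map { |row, i| row.zip(w).sum { |a, b| a * b } - y[i] }
  loss = d.sum { |v| v**2 } / n.to_f
  gradient = w.each_index.map { |j| 2.0 / n * x.each_index.sum { |i| d[i] * x[i][j] } }
  [loss, gradient]
end

# Central finite difference of the loss, used only to verify the gradient.
def numeric_gradient(w, x, y, eps = 1e-6)
  w.each_index.map do |j|
    up = w.dup
    down = w.dup
    up[j] += eps
    down[j] -= eps
    (squared_loss_and_gradient(up, x, y)[0] - squared_loss_and_gradient(down, x, y)[0]) / (2 * eps)
  end
end

x = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.5]]
y = [1.0, 0.0, 2.0]
w = [0.3, -0.7]

_loss, analytic = squared_loss_and_gradient(w, x, y)
numeric = numeric_gradient(w, x, y)
max_gap = analytic.zip(numeric).map { |a, b| (a - b).abs }.max
puts "largest analytic/numeric gap: #{max_gap}"
```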
@@ -0,0 +1,137 @@
1
+ # frozen_string_literal: true
2
+
3
+ require 'lbfgsb'
4
+
5
+ require 'rumale/base/base_estimator'
6
+ require 'rumale/base/regressor'
7
+
8
+ module Rumale
9
+ module LinearModel
10
+ # NNLS is a class that implements non-negative least squares regression.
11
+ # NNLS solves least squares problem under non-negative constraints on the coefficient using L-BFGS-B method.
12
+ #
13
+ # @example
14
+ # estimator = Rumale::LinearModel::NNLS.new(reg_param: 0.01, random_seed: 1)
15
+ # estimator.fit(training_samples, traininig_values)
16
+ # results = estimator.predict(testing_samples)
17
+ #
18
+ class NNLS
19
+ include Base::BaseEstimator
20
+ include Base::Regressor
21
+
22
+ # Return the weight vector.
23
+ # @return [Numo::DFloat] (shape: [n_outputs, n_features])
24
+ attr_reader :weight_vec
25
+
26
+ # Return the bias term (a.k.a. intercept).
27
+ # @return [Numo::DFloat] (shape: [n_outputs])
28
+ attr_reader :bias_term
29
+
30
+ # Returns the number of iterations when converged.
31
+ # @return [Integer]
32
+ attr_reader :n_iter
33
+
34
+ # Return the random generator for initializing weight.
35
+ # @return [Random]
36
+ attr_reader :rng
37
+
38
+ # Create a new regressor with non-negative least squares method.
39
+ #
40
+ # @param reg_param [Float] The regularization parameter for L2 regularization term.
41
+ # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
42
+ # @param bias_scale [Float] The scale of the bias term.
43
+ # @param max_iter [Integer] The maximum number of epochs that indicates
44
+ # how many times the whole data is given to the training process.
45
+ # @param tol [Float] The tolerance of loss for terminating optimization.
46
+ # If solver = 'svd', this parameter is ignored.
47
+ # @param verbose [Boolean] The flag indicating whether to output loss during iteration.
48
+ # @param random_seed [Integer] The seed value using to initialize the random generator.
49
+ def initialize(reg_param: 1.0, fit_bias: true, bias_scale: 1.0,
50
+ max_iter: 1000, tol: 1e-4, verbose: false, random_seed: nil)
51
+ check_params_numeric(reg_param: reg_param, bias_scale: bias_scale, max_iter: max_iter, tol: tol)
52
+ check_params_boolean(fit_bias: fit_bias, verbose: verbose)
53
+ check_params_numeric_or_nil(random_seed: random_seed)
54
+ check_params_positive(reg_param: reg_param, max_iter: max_iter)
55
+ @params = method(:initialize).parameters.each_with_object({}) { |(_, prm), obj| obj[prm] = binding.local_variable_get(prm) }
56
+ @params[:random_seed] ||= srand
57
+ @n_iter = nil
58
+ @weight_vec = nil
59
+ @bias_term = nil
60
+ @rng = Random.new(@params[:random_seed])
61
+ end
62
+
63
+ # Fit the model with given training data.
64
+ #
65
+ # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for fitting the model.
66
+ # @param y [Numo::DFloat] (shape: [n_samples, n_outputs]) The target values to be used for fitting the model.
67
+ # @return [NonneagtiveLeastSquare] The learned regressor itself.
68
+ def fit(x, y)
69
+ x = check_convert_sample_array(x)
70
+ y = check_convert_tvalue_array(y)
71
+ check_sample_tvalue_size(x, y)
72
+
73
+ x = expand_feature(x) if fit_bias?
74
+
75
+ n_features = x.shape[1]
76
+ n_outputs = single_target?(y) ? 1 : y.shape[1]
77
+
78
+ w_init = Rumale::Utils.rand_normal([n_outputs, n_features], @rng.dup).flatten.dup
79
+ w_init[w_init.lt(0)] = 0
80
+ bounds = Numo::DFloat.zeros(n_outputs * n_features, 2)
81
+ bounds.shape[0].times { |n| bounds[n, 1] = Float::INFINITY }
82
+
83
+ res = Lbfgsb.minimize(
84
+ fnc: method(:nnls_fnc), jcb: true, x_init: w_init, args: [x, y, @params[:reg_param]], bounds: bounds,
85
+ maxiter: @params[:max_iter], factr: @params[:tol] / Lbfgsb::DBL_EPSILON, verbose: @params[:verbose] ? 1 : -1
86
+ )
87
+
88
+ @n_iter = res[:n_iter]
89
+ w = single_target?(y) ? res[:x] : res[:x].reshape(n_outputs, n_features).transpose
90
+
91
+ if fit_bias?
92
+ @weight_vec = single_target?(y) ? w[0...-1].dup : w[0...-1, true].dup
93
+ @bias_term = single_target?(y) ? w[-1] : w[-1, true].dup
94
+ else
95
+ @weight_vec = w.dup
96
+ @bias_term = single_target?(y) ? 0 : Numo::DFloat.zeros(y.shape[1])
97
+ end
98
+
99
+ self
100
+ end
101
+
102
+ # Predict values for samples.
103
+ #
104
+ # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The samples to predict the values.
105
+ # @return [Numo::DFloat] (shape: [n_samples, n_outputs]) Predicted values per sample.
106
+ def predict(x)
107
+ x = check_convert_sample_array(x)
108
+ x.dot(@weight_vec.transpose) + @bias_term
109
+ end
110
+
111
+ private
112
+
113
+ def nnls_fnc(w, x, y, alpha)
114
+ n_samples, n_features = x.shape
115
+ w = w.reshape(y.shape[1], n_features) unless y.shape[1].nil?
116
+ z = x.dot(w.transpose)
117
+ d = z - y
118
+ loss = (d**2).sum.fdiv(n_samples) + alpha * (w * w).sum
119
+ gradient = 2.fdiv(n_samples) * d.transpose.dot(x) + 2.0 * alpha * w
120
+ [loss, gradient.flatten.dup]
121
+ end
122
+
123
+ def expand_feature(x)
124
+ n_samples = x.shape[0]
125
+ Numo::NArray.hstack([x, Numo::DFloat.ones([n_samples, 1]) * @params[:bias_scale]])
126
+ end
127
+
128
+ def fit_bias?
129
+ @params[:fit_bias] == true
130
+ end
131
+
132
+ def single_target?(y)
133
+ y.ndim == 1
134
+ end
135
+ end
136
+ end
137
+ end
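The new NNLS class minimises the regularised squared loss subject to non-negative coefficients by passing bounds to L-BFGS-B. To illustrate what the constraint does without that dependency, the sketch below swaps in a much simpler technique, projected gradient descent, on plain Ruby arrays. It is not the gem's algorithm; the function name, step size, iteration count, and data are all assumptions chosen for this example.

```ruby
# Projected gradient descent for
#   minimise  sum((x w - y)^2) / n + alpha * sum(w^2)   subject to  w >= 0.
# After every gradient step, negative coefficients are clipped to zero.
def nnls_projected_gradient(x, y, alpha: 0.0, step: 0.1, n_iter: 2000)
  n = x.size
  w = Array.new(x.first.size, 0.0)
  n_iter.times do
    d = x.each_with_index.map { |row, i| row.zip(w).sum { |a, b| a * b } - y[i] }
    gradient = w.each_index.map do |j|
      2.0 / n * x.each_index.sum { |i| d[i] * x[i][j] } + 2.0 * alpha * w[j]
    end
    w = w.each_index.map { |j| [w[j] - step * gradient[j], 0.0].max }
  end
  w
end

# Ordinary least squares on this data gives w = [1, -1];
# the non-negativity constraint moves the solution to [0.5, 0].
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, -1.0, 0.0]
w = nnls_projected_gradient(x, y)
puts w.inspect
```

The second coefficient ends exactly at zero because it is clipped on every iteration, which is the behaviour the lower bound of 0 enforces in the bounded solver as well.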
@@ -1,12 +1,15 @@
1
1
  # frozen_string_literal: true
2
2
 
3
+ require 'lbfgsb'
4
+
3
5
  require 'rumale/linear_model/base_sgd'
4
6
  require 'rumale/base/regressor'
5
7
 
6
8
  module Rumale
7
9
  module LinearModel
8
10
  # Ridge is a class that implements Ridge Regression
9
- # with stochastic gradient descent (SGD) optimization or singular value decomposition (SVD).
11
+ # with stochastic gradient descent (SGD) optimization,
12
+ # singular value decomposition (SVD), or L-BFGS optimization.
10
13
  #
11
14
  # @example
12
15
  # estimator =
@@ -41,32 +44,33 @@ module Rumale
41
44
  #
42
45
  # @param learning_rate [Float] The initial value of learning rate.
43
46
  # The learning rate decreases as the iteration proceeds according to the equation: learning_rate / (1 + decay * t).
44
- # If solver = 'svd', this parameter is ignored.
47
+ # If solver is not 'sgd', this parameter is ignored.
45
48
  # @param decay [Float] The smoothing parameter for decreasing learning rate as the iteration proceeds.
46
49
  # If nil is given, the decay sets to 'reg_param * learning_rate'.
47
- # If solver = 'svd', this parameter is ignored.
50
+ # If solver is not 'sgd', this parameter is ignored.
48
51
  # @param momentum [Float] The momentum factor.
49
- # If solver = 'svd', this parameter is ignored.
52
+ # If solver is not 'sgd', this parameter is ignored.
50
53
  # @param reg_param [Float] The regularization parameter.
51
54
  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
52
55
  # @param bias_scale [Float] The scale of the bias term.
53
56
  # @param max_iter [Integer] The maximum number of epochs that indicates
54
57
  # how many times the whole data is given to the training process.
55
- # If solver = 'svd', this parameter is ignored.
58
+ # If solver is 'svd', this parameter is ignored.
56
59
  # @param batch_size [Integer] The size of the mini batches.
57
- # If solver = 'svd', this parameter is ignored.
60
+ # If solver is not 'sgd', this parameter is ignored.
58
61
  # @param tol [Float] The tolerance of loss for terminating optimization.
59
- # If solver = 'svd', this parameter is ignored.
60
- # @param solver [String] The algorithm to calculate weights. ('auto', 'sgd' or 'svd').
62
+ # If solver is 'svd', this parameter is ignored.
63
+ # @param solver [String] The algorithm to calculate weights. ('auto', 'sgd', 'svd', or 'lbfgs').
61
64
  # 'auto' chooses the 'svd' solver if Numo::Linalg is loaded. Otherwise, it chooses the 'sgd' solver.
62
65
  # 'sgd' uses the stochastic gradient descent optimization.
63
66
  # 'svd' performs singular value decomposition of samples.
67
+ # 'lbfgs' uses the L-BFGS method for optimization.
64
68
  # @param n_jobs [Integer] The number of jobs for running the fit method in parallel.
65
69
  # If nil is given, the method does not execute in parallel.
66
70
  # If zero or less is given, it becomes equal to the number of processors.
67
- # This parameter is ignored if the Parallel gem is not loaded or the solver is 'svd'.
71
+ # This parameter is ignored if the Parallel gem is not loaded or solver is not 'sgd'.
68
72
  # @param verbose [Boolean] The flag indicating whether to output loss during iteration.
69
- # If solver = 'svd', this parameter is ignored.
73
+ # If solver is 'svd', this parameter is ignored.
70
74
  # @param random_seed [Integer] The seed value using to initialize the random generator.
71
75
  def initialize(learning_rate: 0.01, decay: nil, momentum: 0.9,
72
76
  reg_param: 1.0, fit_bias: true, bias_scale: 1.0,
@@ -83,9 +87,9 @@ module Rumale
83
87
  super()
84
88
  @params.merge!(method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h)
85
89
  @params[:solver] = if solver == 'auto'
86
- load_linalg? ? 'svd' : 'sgd'
90
+ enable_linalg?(warning: false) ? 'svd' : 'sgd'
87
91
  else
88
- solver != 'svd' ? 'sgd' : 'svd' # rubocop:disable Style/NegatedIfElseCondition
92
+ solver.match?(/^svd$|^sgd$|^lbfgs$/) ? solver : 'sgd'
89
93
  end
90
94
  @params[:decay] ||= @params[:reg_param] * @params[:learning_rate]
91
95
  @params[:random_seed] ||= srand
@@ -99,15 +103,17 @@ module Rumale
99
103
  # Fit the model with given training data.
100
104
  #
101
105
  # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for fitting the model.
102
- # @param y [Numo::Int32] (shape: [n_samples, n_outputs]) The target values to be used for fitting the model.
106
+ # @param y [Numo::DFloat] (shape: [n_samples, n_outputs]) The target values to be used for fitting the model.
103
107
  # @return [Ridge] The learned regressor itself.
104
108
  def fit(x, y)
105
109
  x = check_convert_sample_array(x)
106
110
  y = check_convert_tvalue_array(y)
107
111
  check_sample_tvalue_size(x, y)
108
112
 
109
- if @params[:solver] == 'svd' && enable_linalg?
113
+ if @params[:solver] == 'svd' && enable_linalg?(warning: false)
110
114
  fit_svd(x, y)
115
+ elsif @params[:solver] == 'lbfgs'
116
+ fit_lbfgs(x, y)
111
117
  else
112
118
  fit_sgd(x, y)
113
119
  end
@@ -127,27 +133,51 @@ module Rumale
127
133
  private
128
134
 
129
135
  def fit_svd(x, y)
130
- samples = @params[:fit_bias] ? expand_feature(x) : x
136
+ x = expand_feature(x) if fit_bias?
131
137
 
132
- s, u, vt = Numo::Linalg.svd(samples, driver: 'sdd', job: 'S')
138
+ s, u, vt = Numo::Linalg.svd(x, driver: 'sdd', job: 'S')
133
139
  d = (s / (s**2 + @params[:reg_param])).diag
134
140
  w = vt.transpose.dot(d).dot(u.transpose).dot(y)
135
141
 
136
- is_single_target_vals = y.shape[1].nil?
137
- if @params[:fit_bias]
138
- @weight_vec = is_single_target_vals ? w[0...-1].dup : w[0...-1, true].dup
139
- @bias_term = is_single_target_vals ? w[-1] : w[-1, true].dup
140
- else
141
- @weight_vec = w.dup
142
- @bias_term = is_single_target_vals ? 0 : Numo::DFloat.zeros(y.shape[1])
143
- end
142
+ @weight_vec, @bias_term = single_target?(y) ? split_weight(w) : split_weight_mult(w)
144
143
  end
145
144
 
146
- def fit_sgd(x, y)
147
- n_outputs = y.shape[1].nil? ? 1 : y.shape[1]
145
+ def fit_lbfgs(x, y)
146
+ fnc = proc do |w, x, y, a| # rubocop:disable Lint/ShadowingOuterLocalVariable
147
+ n_samples, n_features = x.shape
148
+ w = w.reshape(y.shape[1], n_features) unless y.shape[1].nil?
149
+ z = x.dot(w.transpose)
150
+ d = z - y
151
+ loss = (d**2).sum.fdiv(n_samples) + a * (w * w).sum
152
+ gradient = 2.fdiv(n_samples) * d.transpose.dot(x) + 2.0 * a * w
153
+ [loss, gradient.flatten.dup]
154
+ end
155
+
156
+ x = expand_feature(x) if fit_bias?
157
+
148
158
  n_features = x.shape[1]
159
+ n_outputs = single_target?(y) ? 1 : y.shape[1]
160
+
161
+ res = Lbfgsb.minimize(
162
+ fnc: fnc, jcb: true, x_init: init_weight(n_features, n_outputs), args: [x, y, @params[:reg_param]],
163
+ maxiter: @params[:max_iter], factr: @params[:tol] / Lbfgsb::DBL_EPSILON,
164
+ verbose: @params[:verbose] ? 1 : -1
165
+ )
166
+
167
+ @weight_vec, @bias_term =
168
+ if single_target?(y)
169
+ split_weight(res[:x])
170
+ else
171
+ split_weight_mult(res[:x].reshape(n_outputs, n_features).transpose)
172
+ end
173
+ end
149
174
 
150
- if n_outputs > 1
175
+ def fit_sgd(x, y)
176
+ if single_target?(y)
177
+ @weight_vec, @bias_term = partial_fit(x, y)
178
+ else
179
+ n_outputs = y.shape[1]
180
+ n_features = x.shape[1]
151
181
  @weight_vec = Numo::DFloat.zeros(n_outputs, n_features)
152
182
  @bias_term = Numo::DFloat.zeros(n_outputs)
153
183
  if enable_parallel?
@@ -156,16 +186,23 @@ module Rumale
156
186
  else
157
187
  n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = partial_fit(x, y[true, n]) }
158
188
  end
159
- else
160
- @weight_vec, @bias_term = partial_fit(x, y)
161
189
  end
162
190
  end
163
191
 
164
- def load_linalg?
165
- return false if defined?(Numo::Linalg).nil?
166
- return false if Numo::Linalg::VERSION < '0.1.4'
192
+ def single_target?(y)
193
+ y.ndim == 1
194
+ end
195
+
196
+ def init_weight(n_features, n_outputs)
197
+ Rumale::Utils.rand_normal([n_outputs, n_features], @rng.dup).flatten.dup
198
+ end
167
199
 
168
- true
200
+ def split_weight_mult(w)
201
+ if fit_bias?
202
+ [w[0...-1, true].dup, w[-1, true].dup]
203
+ else
204
+ [w.dup, Numo::DFloat.zeros(w.shape[1])]
205
+ end
169
206
  end
170
207
  end
171
208
  end
lib/rumale/validation.rb CHANGED
@@ -27,6 +27,7 @@ module Rumale
27
27
  y
28
28
  end
29
29
 
30
+ # @deprecated Use check_convert_sample_array instead of this method.
30
31
  # @!visibility private
31
32
  def check_sample_array(x)
32
33
  raise TypeError, 'Expect class of sample matrix to be Numo::DFloat' unless x.is_a?(Numo::DFloat)
@@ -35,6 +36,7 @@ module Rumale
35
36
  nil
36
37
  end
37
38
 
39
+ # @deprecated Use check_convert_label_array instead of this method.
38
40
  # @!visibility private
39
41
  def check_label_array(y)
40
42
  raise TypeError, 'Expect class of label vector to be Numo::Int32' unless y.is_a?(Numo::Int32)
@@ -43,6 +45,7 @@ module Rumale
43
45
  nil
44
46
  end
45
47
 
48
+ # @deprecated Use check_convert_tvalue_array instead of this method.
46
49
  # @!visibility private
47
50
  def check_tvalue_array(y)
48
51
  raise TypeError, 'Expect class of target value vector to be Numo::DFloat' unless y.is_a?(Numo::DFloat)
@@ -64,49 +67,58 @@ module Rumale
64
67
  nil
65
68
  end
66
69
 
70
+ # TODO: Better to replace with RBS in the future.
67
71
  # @!visibility private
68
72
  def check_params_type(type, params = {})
69
73
  params.each { |k, v| raise TypeError, "Expect class of #{k} to be #{type}" unless v.is_a?(type) }
70
74
  nil
71
75
  end
72
76
 
77
+ # TODO: Better to replace with RBS in the future.
73
78
  # @!visibility private
74
79
  def check_params_type_or_nil(type, params = {})
75
80
  params.each { |k, v| raise TypeError, "Expect class of #{k} to be #{type} or nil" unless v.is_a?(type) || v.is_a?(NilClass) }
76
81
  nil
77
82
  end
78
83
 
84
+ # TODO: Better to replace with RBS in the future.
79
85
  # @!visibility private
80
86
  def check_params_numeric(params = {})
81
87
  check_params_type(Numeric, params)
82
88
  end
83
89
 
90
+ # TODO: Better to replace with RBS in the future.
84
91
  # @!visibility private
85
92
  def check_params_numeric_or_nil(params = {})
86
93
  check_params_type_or_nil(Numeric, params)
87
94
  end
88
95
 
96
+ # @deprecated Use check_params_numeric instead of this method.
89
97
  # @!visibility private
90
98
  def check_params_float(params = {})
91
99
  check_params_type(Float, params)
92
100
  end
93
101
 
102
+ # @deprecated Use check_params_numeric instead of this method.
94
103
  # @!visibility private
95
104
  def check_params_integer(params = {})
96
105
  check_params_type(Integer, params)
97
106
  end
98
107
 
108
+ # TODO: Better to replace with RBS in the future.
99
109
  # @!visibility private
100
110
  def check_params_string(params = {})
101
111
  check_params_type(String, params)
102
112
  end
103
113
 
114
+ # TODO: Better to replace with RBS in the future.
104
115
  # @!visibility private
105
116
  def check_params_boolean(params = {})
106
117
  params.each { |k, v| raise TypeError, "Expect class of #{k} to be Boolean" unless v.is_a?(FalseClass) || v.is_a?(TrueClass) }
107
118
  nil
108
119
  end
109
120
 
121
+ # TODO: Better to replace with RBS in the future.
110
122
  # @!visibility private
111
123
  def check_params_positive(params = {})
112
124
  params.compact.each { |k, v| raise ArgumentError, "Expect #{k} to be positive value" if v.negative? }
lib/rumale/version.rb CHANGED
@@ -3,5 +3,5 @@
3
3
  # Rumale is a machine learning library in Ruby.
4
4
  module Rumale
5
5
  # The version of Rumale you are using.
6
- VERSION = '0.22.2'
6
+ VERSION = '0.22.3'
7
7
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: rumale
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.22.2
4
+ version: 0.22.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - yoshoku
8
- autorequire:
8
+ autorequire:
9
9
  bindir: exe
10
10
  cert_chain: []
11
- date: 2021-01-10 00:00:00.000000000 Z
11
+ date: 2021-01-23 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: numo-narray
@@ -57,7 +57,9 @@ extensions:
57
57
  - ext/rumale/extconf.rb
58
58
  extra_rdoc_files: []
59
59
  files:
60
+ - ".coveralls.yml"
60
61
  - ".github/workflows/build.yml"
62
+ - ".github/workflows/coverage.yml"
61
63
  - ".gitignore"
62
64
  - ".rspec"
63
65
  - ".rubocop.yml"
@@ -141,6 +143,7 @@ files:
141
143
  - lib/rumale/linear_model/lasso.rb
142
144
  - lib/rumale/linear_model/linear_regression.rb
143
145
  - lib/rumale/linear_model/logistic_regression.rb
146
+ - lib/rumale/linear_model/nnls.rb
144
147
  - lib/rumale/linear_model/ridge.rb
145
148
  - lib/rumale/linear_model/svc.rb
146
149
  - lib/rumale/linear_model/svr.rb
@@ -211,7 +214,7 @@ metadata:
211
214
  source_code_uri: https://github.com/yoshoku/rumale
212
215
  documentation_uri: https://yoshoku.github.io/rumale/doc/
213
216
  bug_tracker_uri: https://github.com/yoshoku/rumale/issues
214
- post_install_message:
217
+ post_install_message:
215
218
  rdoc_options: []
216
219
  require_paths:
217
220
  - lib
@@ -226,8 +229,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
226
229
  - !ruby/object:Gem::Version
227
230
  version: '0'
228
231
  requirements: []
229
- rubygems_version: 3.1.4
230
- signing_key:
232
+ rubygems_version: 3.2.3
233
+ signing_key:
231
234
  specification_version: 4
232
235
  summary: Rumale is a machine learning library in Ruby. Rumale provides machine learning
233
236
  algorithms with interfaces similar to Scikit-Learn in Python.