rumale 0.16.1 → 0.17.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 30476b58c5c5b39567f1cb3c8346a7c354fbf8d30401555fa2e02995021b759d
- data.tar.gz: 6f664b0c279e0fef2dc47e608cdc2737318274b45017d6d60f0dd516aa2ebb48
+ metadata.gz: d1071dfdccfc177ea5902e5e1b09fce084fd4b6ce403fae6797e6b93c3f826ad
+ data.tar.gz: 30768881f5c826f59dbcca0b17a1192dbdc17ca835c8bcc626e874391131bf92
  SHA512:
- metadata.gz: aa51f865e4995901e5587e3089fae724a57022d96c95d2b852cfde99f85f9aae7035c4edfe6c4a7899c22674778e1bfc0332ef83b6f234a8c9e8aa982e55e833
- data.tar.gz: 55e209725a0c716b1f450bed025fceefe36dafe278b96648ac60079e3968778840bc4e1e75ff4181abafc4f98eb93cc40d8e2e0e3b5bf078bc94cf8b9a5dc50d
+ metadata.gz: e748eedf78b040a7dbe1a1b744f87f1c1e9c3ae751417711eccd5f69dca68335f1a206b9258503da70a8486c3d8588d61ae78081ebdecc6a8ee40f85383a319f
+ data.tar.gz: 2abae603660179e05f8341ab5351fb9e028549674bb13901e4cae4dfd13c99995de0de3c63f8c75182acd155a1d2171b02e4db74fcaa08c1108a1a0e92ad3eee
@@ -15,7 +15,7 @@ AllCops:
  Style/Documentation:
  Enabled: false

- Metrics/LineLength:
+ Layout/LineLength:
  Max: 145
  IgnoredPatterns: ['(\A|\s)#']

@@ -43,7 +43,7 @@ Metrics/BlockLength:
  - 'spec/**/*'

  Metrics/ParameterLists:
- Max: 10
+ Max: 15

  Security/MarshalLoad:
  Enabled: false
@@ -1,3 +1,25 @@
+ # 0.17.0
+ ## Breaking changes
+ - Fix all linear model estimators to use the new abstract class ([BaseSGD](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/BaseSGD.html)) introduced in version 0.16.1.
+ The major differences from the old abstract class are that
+ the optimizer of LinearModel estimators is fixed to mini-batch SGD with a momentum term,
+ the max_iter parameter indicates the number of epochs instead of the maximum number of iterations,
+ the fit_bias parameter is true by default, and elastic-net style regularization can be used.
+ Note that there are additions and changes to hyperparameters.
+ Models trained with earlier versions may need to be re-trained, with their hyperparameters adjusted.
+   - [LogisticRegression](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/LogisticRegression.html)
+   - [SVC](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/SVC.html)
+   - [LinearRegression](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/LinearRegression.html)
+   - [Ridge](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/Ridge.html)
+   - [Lasso](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/Lasso.html)
+   - [SVR](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/SVR.html)
+ - Change the default value of the solver parameter on LinearRegression and Ridge to 'auto'.
+ If Numo::Linalg is loaded, 'svd' is selected as the solver; otherwise, 'sgd' is selected.
+ - The meaning of the `max_iter` parameter of the factorization machine estimators
+ has been changed from the maximum number of iterations to the number of epochs.
+   - [FactorizationMachineClassifier](https://yoshoku.github.io/rumale/doc/Rumale/PolynomialModel/FactorizationMachineClassifier.html)
+   - [FactorizationMachineRegressor](https://yoshoku.github.io/rumale/doc/Rumale/PolynomialModel/FactorizationMachineRegressor.html)
+
  # 0.16.1
  - Add regressor class for [ElasticNet](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/ElasticNet.html).
  - Add new linear model abstract class.
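To illustrate the breaking changes listed above, here is a minimal sketch of constructing a linear model against the 0.17.0 parameter lists that appear later in this diff. The parameter names are taken from the new initializers; `training_samples` and `training_labels` are placeholders for your own data, and the values are illustrative rather than recommendations.

```ruby
require 'rumale'

# 0.16.x style (the optimizer keyword no longer exists in 0.17.0):
#   svc = Rumale::LinearModel::SVC.new(reg_param: 1.0, max_iter: 1000, batch_size: 20, optimizer: nil)

# 0.17.0 style: max_iter now counts epochs, fit_bias defaults to true,
# and elastic-net style regularization is available via penalty/l1_ratio.
svc = Rumale::LinearModel::SVC.new(
  learning_rate: 0.01,   # initial SGD learning rate
  penalty: 'elasticnet', # 'l1', 'l2', or 'elasticnet'
  l1_ratio: 0.5,         # only used when penalty is 'elasticnet'
  reg_param: 1.0,
  max_iter: 200,         # number of epochs over the whole training data
  batch_size: 50,
  tol: 1e-4,
  random_seed: 1
)
svc.fit(training_samples, training_labels) # placeholders for your data
```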
data/README.md CHANGED
@@ -6,7 +6,7 @@
  [![Coverage Status](https://coveralls.io/repos/github/yoshoku/rumale/badge.svg?branch=master)](https://coveralls.io/github/yoshoku/rumale?branch=master)
  [![Gem Version](https://badge.fury.io/rb/rumale.svg)](https://badge.fury.io/rb/rumale)
  [![BSD 2-Clause License](https://img.shields.io/badge/License-BSD%202--Clause-orange.svg)](https://github.com/yoshoku/rumale/blob/master/LICENSE.txt)
- [![Documentation](http://img.shields.io/badge/docs-rdoc.info-blue.svg)](https://yoshoku.github.io/rumale/doc/)
+ [![Documentation](https://img.shields.io/badge/api-reference-blue.svg)](https://yoshoku.github.io/rumale/doc/)

  Rumale (**Ru**by **ma**chine **le**arning) is a machine learning library in Ruby.
  Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python.
@@ -37,6 +37,10 @@ Or install it yourself as:

  $ gem install rumale

+ ## Documentation
+
+ - [Rumale API Documentation](https://yoshoku.github.io/rumale/doc/)
+
  ## Usage

  ### Example 1. XOR data
@@ -95,7 +99,7 @@ transformer = Rumale::KernelApproximation::RBF.new(gamma: 0.0001, n_components:
  transformed = transformer.fit_transform(samples)

  # Train linear SVM classifier.
- classifier = Rumale::LinearModel::SVC.new(reg_param: 0.0001, max_iter: 1000, batch_size: 50, random_seed: 1)
+ classifier = Rumale::LinearModel::SVC.new(reg_param: 0.0001, random_seed: 1)
  classifier.fit(transformed, labels)

  # Save the model.
@@ -132,7 +136,7 @@ Execution of the above scripts result in the following.
  ```bash
  $ ruby train.rb
  $ ruby test.rb
- Accuracy: 98.4%
+ Accuracy: 98.7%
  ```

  ### Example 3. Cross-validation
@@ -144,7 +148,7 @@ require 'rumale'
  samples, labels = Rumale::Dataset.load_libsvm_file('pendigits')

  # Define the estimator to be evaluated.
- lr = Rumale::LinearModel::LogisticRegression.new(reg_param: 0.0001, random_seed: 1)
+ lr = Rumale::LinearModel::LogisticRegression.new(learning_rate: 0.00001, reg_param: 0.0001, random_seed: 1)

  # Define the evaluation measure, splitting strategy, and cross validation.
  ev = Rumale::EvaluationMeasure::LogLoss.new
@@ -163,7 +167,7 @@ Execution of the above scripts result in the following.

  ```bash
  $ ruby cross_validation.rb
- 5-CV mean log-loss: 0.476
+ 5-CV mean log-loss: 0.355
  ```

  ### Example 4. Pipeline
@@ -176,7 +180,7 @@ samples, labels = Rumale::Dataset.load_libsvm_file('pendigits')

  # Construct pipeline with kernel approximation and SVC.
  rbf = Rumale::KernelApproximation::RBF.new(gamma: 0.0001, n_components: 800, random_seed: 1)
- svc = Rumale::LinearModel::SVC.new(reg_param: 0.0001, max_iter: 1000, random_seed: 1)
+ svc = Rumale::LinearModel::SVC.new(reg_param: 0.0001, random_seed: 1)
  pipeline = Rumale::Pipeline::Pipeline.new(steps: { trns: rbf, clsf: svc })

  # Define the splitting strategy and cross validation.
@@ -195,7 +199,7 @@ Execution of the above scripts result in the following.

  ```bash
  $ ruby pipeline.rb
- 5-CV mean accuracy: 99.2 %
+ 5-CV mean accuracy: 99.6 %
  ```

  ## Speeding up
@@ -5,10 +5,15 @@ require 'rumale/optimizer/nadam'

  module Rumale
  module LinearModel
+ # @note
+ #   In version 0.17.0, a new linear model abstract class called BaseSGD is introduced.
+ #   BaseLinearModel is deprecated and will be removed in the future.
+ #
  # BaseLinearModel is an abstract class for implementation of linear estimator
  # with mini-batch stochastic gradient descent optimization.
  # This class is used for internal process.
  class BaseLinearModel
+ # :nocov:
  include Base::BaseEstimator

  # Initialize a linear estimator.
@@ -26,6 +31,7 @@ module Rumale
  # @param random_seed [Integer] The seed value using to initialize the random generator.
  def initialize(reg_param: 1.0, fit_bias: false, bias_scale: 1.0,
  max_iter: 1000, batch_size: 10, optimizer: nil, n_jobs: nil, random_seed: nil)
+ warn 'warning: BaseLinearModel is deprecated. Use BaseSGD instead.'
  @params = {}
  @params[:reg_param] = reg_param
  @params[:fit_bias] = fit_bias
@@ -88,6 +94,7 @@ module Rumale
  [weight, 0.0]
  end
  end
+ # :nocov:
  end
  end
  end
@@ -99,6 +99,61 @@ module Rumale
99
99
  2.fdiv(y.shape[0]) * (out - y)
100
100
  end
101
101
  end
102
+
103
+ # @!visibility private
104
+ # LogLoss is a class that calculates logistic loss for logistic regression.
105
+ class LogLoss
106
+ # @!visibility private
107
+ def loss(out, y)
108
+ Numo::NMath.log(1 + Numo::NMath.exp(-y * out)).sum.fdiv(y.shape[0])
109
+ end
110
+
111
+ # @!visibility private
112
+ def dloss(out, y)
113
+ y / (1 + Numo::NMath.exp(-y * out)) - y
114
+ end
115
+ end
116
+
117
+ # @!visibility private
118
+ # HingeLoss is a class that calculates hinge loss for support vector classifier.
119
+ class HingeLoss
120
+ # @!visibility private
121
+ def loss(out, y)
122
+ out.class.maximum(0.0, 1 - y * out).sum.fdiv(y.shape[0])
123
+ end
124
+
125
+ # @!visibility private
126
+ def dloss(out, y)
127
+ tids = (y * out).lt(1)
128
+ d = Numo::DFloat.zeros(y.shape[0])
129
+ d[tids] = -y[tids] if tids.count.positive?
130
+ d
131
+ end
132
+ end
133
+
134
+ # @!visibility private
135
+ # EpsilonInsensitive is a class that calculates epsilon insensitive for support vector regressor.
136
+ class EpsilonInsensitive
137
+ # @!visibility private
138
+ def initialize(epsilon: 0.1)
139
+ @epsilon = epsilon
140
+ end
141
+
142
+ # @!visibility private
143
+ def loss(out, y)
144
+ out.class.maximum(0.0, (y - out).abs - @epsilon).sum.fdiv(y.shape[0])
145
+ end
146
+
147
+ # @!visibility private
148
+ def dloss(out, y)
149
+ d = Numo::DFloat.zeros(y.shape[0])
150
+ tids = (out - y).gt(@epsilon)
151
+ d[tids] = 1 if tids.count.positive?
152
+ tids = (y - out).gt(@epsilon)
153
+ d[tids] = -1 if tids.count.positive?
154
+ d
155
+ end
156
+ end
102
157
  end
103
158
 
104
159
  # BaseSGD is an abstract class for implementation of linear model with mini-batch stochastic gradient descent (SGD) optimization.
@@ -59,13 +59,13 @@ module Rumale
  # @param random_seed [Integer] The seed value using to initialize the random generator.
  def initialize(learning_rate: 0.01, decay: nil, momentum: 0.9,
  reg_param: 1.0, l1_ratio: 0.5, fit_bias: true, bias_scale: 1.0,
- max_iter: 100, batch_size: 50, tol: 1e-4,
+ max_iter: 200, batch_size: 50, tol: 1e-4,
  n_jobs: nil, verbose: false, random_seed: nil)
  check_params_numeric(learning_rate: learning_rate, momentum: momentum,
  reg_param: reg_param, l1_ratio: l1_ratio, bias_scale: bias_scale,
  max_iter: max_iter, batch_size: batch_size, tol: tol)
  check_params_boolean(fit_bias: fit_bias, verbose: verbose)
- check_params_numeric_or_nil(decay: nil, n_jobs: n_jobs, random_seed: random_seed)
+ check_params_numeric_or_nil(decay: decay, n_jobs: n_jobs, random_seed: random_seed)
  check_params_positive(learning_rate: learning_rate, reg_param: reg_param, max_iter: max_iter, batch_size: batch_size)
  super()
  @params.merge!(method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h)
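The parameter documentation added throughout this diff describes the learning-rate schedule learning_rate / (1 + decay * t), with decay defaulting to reg_param * learning_rate for the regularized estimators. A tiny standalone sketch of that schedule, using the default values shown above (plain Ruby, no Rumale API involved):

```ruby
# Illustration of the decay schedule from the @param docs.
learning_rate = 0.01
reg_param     = 1.0
decay         = reg_param * learning_rate # default used when decay is nil

5.times do |t|
  puts format('step %d: learning rate %.6f', t, learning_rate / (1 + decay * t))
end
# step 0: 0.010000, step 1: 0.009901, step 2: 0.009804, ...
```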
@@ -1,6 +1,6 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'rumale/linear_model/base_linear_model'
3
+ require 'rumale/linear_model/base_sgd'
4
4
  require 'rumale/base/regressor'
5
5
 
6
6
  module Rumale
@@ -10,14 +10,15 @@ module Rumale
10
10
  #
11
11
  # @example
12
12
  # estimator =
13
- # Rumale::LinearModel::Lasso.new(reg_param: 0.1, max_iter: 1000, batch_size: 20, random_seed: 1)
13
+ # Rumale::LinearModel::Lasso.new(reg_param: 0.1, max_iter: 500, batch_size: 20, random_seed: 1)
14
14
  # estimator.fit(training_samples, traininig_values)
15
15
  # results = estimator.predict(testing_samples)
16
16
  #
17
17
  # *Reference*
18
18
  # - S. Shalev-Shwartz and Y. Singer, "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM," Proc. ICML'07, pp. 807--814, 2007.
19
+ # - Y. Tsuruoka, J. Tsujii, and S. Ananiadou, "Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty," Proc. ACL'09, pp. 477--485, 2009.
19
20
  # - L. Bottou, "Large-Scale Machine Learning with Stochastic Gradient Descent," Proc. COMPSTAT'10, pp. 177--186, 2010.
20
- class Lasso < BaseLinearModel
21
+ class Lasso < BaseSGD
21
22
  include Base::Regressor
22
23
 
23
24
  # Return the weight vector.
@@ -34,25 +35,43 @@ module Rumale
34
35
 
35
36
  # Create a new Lasso regressor.
36
37
  #
38
+ # @param learning_rate [Float] The initial value of learning rate.
39
+ # The learning rate decreases as the iteration proceeds according to the equation: learning_rate / (1 + decay * t).
40
+ # @param decay [Float] The smoothing parameter for decreasing learning rate as the iteration proceeds.
41
+ # If nil is given, the decay sets to 'reg_param * learning_rate'.
42
+ # @param momentum [Float] The momentum factor.
37
43
  # @param reg_param [Float] The regularization parameter.
38
44
  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
39
45
  # @param bias_scale [Float] The scale of the bias term.
40
- # @param max_iter [Integer] The maximum number of iterations.
46
+ # @param max_iter [Integer] The maximum number of epochs that indicates
47
+ # how many times the whole data is given to the training process.
41
48
  # @param batch_size [Integer] The size of the mini batches.
42
- # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
43
- # If nil is given, Nadam is used.
49
+ # @param tol [Float] The tolerance of loss for terminating optimization.
44
50
  # @param n_jobs [Integer] The number of jobs for running the fit method in parallel.
45
51
  # If nil is given, the method does not execute in parallel.
46
52
  # If zero or less is given, it becomes equal to the number of processors.
47
53
  # This parameter is ignored if the Parallel gem is not loaded.
54
+ # @param verbose [Boolean] The flag indicating whether to output loss during iteration.
48
55
  # @param random_seed [Integer] The seed value using to initialize the random generator.
49
- def initialize(reg_param: 1.0, fit_bias: false, bias_scale: 1.0, max_iter: 1000, batch_size: 10, optimizer: nil,
50
- n_jobs: nil, random_seed: nil)
51
- check_params_numeric(reg_param: reg_param, bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
52
- check_params_boolean(fit_bias: fit_bias)
53
- check_params_numeric_or_nil(n_jobs: n_jobs, random_seed: random_seed)
54
- check_params_positive(reg_param: reg_param, max_iter: max_iter, batch_size: batch_size)
55
- super
56
+ def initialize(learning_rate: 0.01, decay: nil, momentum: 0.9,
57
+ reg_param: 1.0, fit_bias: true, bias_scale: 1.0,
58
+ max_iter: 200, batch_size: 50, tol: 1e-4,
59
+ n_jobs: nil, verbose: false, random_seed: nil)
60
+ check_params_numeric(learning_rate: learning_rate, momentum: momentum,
61
+ reg_param: reg_param, bias_scale: bias_scale,
62
+ max_iter: max_iter, batch_size: batch_size, tol: tol)
63
+ check_params_boolean(fit_bias: fit_bias, verbose: verbose)
64
+ check_params_numeric_or_nil(decay: decay, n_jobs: n_jobs, random_seed: random_seed)
65
+ check_params_positive(learning_rate: learning_rate, reg_param: reg_param, max_iter: max_iter, batch_size: batch_size)
66
+ super()
67
+ @params.merge!(method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h)
68
+ @params[:decay] ||= @params[:reg_param] * @params[:learning_rate]
69
+ @params[:random_seed] ||= srand
70
+ @rng = Random.new(@params[:random_seed])
71
+ @penalty_type = L1_PENALTY
72
+ @loss_func = LinearModel::Loss::MeanSquaredError.new
73
+ @weight_vec = nil
74
+ @bias_term = nil
56
75
  end
57
76
 
58
77
  # Fit the model with given training data.
@@ -91,54 +110,6 @@ module Rumale
91
110
  x = check_convert_sample_array(x)
92
111
  x.dot(@weight_vec.transpose) + @bias_term
93
112
  end
94
-
95
- # Dump marshal data.
96
- # @return [Hash] The marshal data about Lasso.
97
- def marshal_dump
98
- { params: @params,
99
- weight_vec: @weight_vec,
100
- bias_term: @bias_term,
101
- rng: @rng }
102
- end
103
-
104
- # Load marshal data.
105
- # @return [nil]
106
- def marshal_load(obj)
107
- @params = obj[:params]
108
- @weight_vec = obj[:weight_vec]
109
- @bias_term = obj[:bias_term]
110
- @rng = obj[:rng]
111
- nil
112
- end
113
-
114
- private
115
-
116
- def partial_fit(x, y)
117
- n_features = @params[:fit_bias] ? x.shape[1] + 1 : x.shape[1]
118
- @left_weight = Numo::DFloat.zeros(n_features)
119
- @right_weight = Numo::DFloat.zeros(n_features)
120
- @left_optimizer = @params[:optimizer].dup
121
- @right_optimizer = @params[:optimizer].dup
122
- super
123
- end
124
-
125
- def calc_loss_gradient(x, y, weight)
126
- 2.0 * (x.dot(weight) - y)
127
- end
128
-
129
- def calc_new_weight(_optimizer, x, _weight, loss_gradient)
130
- @left_weight = round_weight(@left_optimizer.call(@left_weight, calc_weight_gradient(loss_gradient, x)))
131
- @right_weight = round_weight(@right_optimizer.call(@right_weight, calc_weight_gradient(-loss_gradient, x)))
132
- @left_weight - @right_weight
133
- end
134
-
135
- def calc_weight_gradient(loss_gradient, data)
136
- ((@params[:reg_param] + loss_gradient).expand_dims(1) * data).mean(0)
137
- end
138
-
139
- def round_weight(weight)
140
- 0.5 * (weight + weight.abs)
141
- end
142
113
  end
143
114
  end
144
115
  end
@@ -1,16 +1,16 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'rumale/linear_model/base_linear_model'
3
+ require 'rumale/linear_model/base_sgd'
4
4
  require 'rumale/base/regressor'
5
5
 
6
6
  module Rumale
7
7
  module LinearModel
8
8
  # LinearRegression is a class that implements ordinary least square linear regression
9
- # with mini-batch stochastic gradient descent optimization or singular value decomposition.
9
+ # with stochastic gradient descent (SGD) optimization or singular value decomposition (SVD).
10
10
  #
11
11
  # @example
12
12
  # estimator =
13
- # Rumale::LinearModel::LinearRegression.new(max_iter: 1000, batch_size: 20, random_seed: 1)
13
+ # Rumale::LinearModel::LinearRegression.new(max_iter: 500, batch_size: 20, random_seed: 1)
14
14
  # estimator.fit(training_samples, traininig_values)
15
15
  # results = estimator.predict(testing_samples)
16
16
  #
@@ -19,7 +19,10 @@ module Rumale
19
19
  # estimator = Rumale::LinearModel::LinearRegression.new(solver: 'svd')
20
20
  # estimator.fit(training_samples, traininig_values)
21
21
  # results = estimator.predict(testing_samples)
22
- class LinearRegression < BaseLinearModel
22
+ #
23
+ # *Reference*
24
+ # - L. Bottou, "Large-Scale Machine Learning with Stochastic Gradient Descent," Proc. COMPSTAT'10, pp. 177--186, 2010.
25
+ class LinearRegression < BaseSGD
23
26
  include Base::Regressor
24
27
 
25
28
  # Return the weight vector.
@@ -36,34 +39,57 @@ module Rumale
36
39
 
37
40
  # Create a new ordinary least square linear regressor.
38
41
  #
42
+ # @param learning_rate [Float] The initial value of learning rate.
43
+ # The learning rate decreases as the iteration proceeds according to the equation: learning_rate / (1 + decay * t).
44
+ # If solver = 'svd', this parameter is ignored.
45
+ # @param decay [Float] The smoothing parameter for decreasing learning rate as the iteration proceeds.
46
+ # If nil is given, the decay sets to 'learning_rate'.
47
+ # If solver = 'svd', this parameter is ignored.
48
+ # @param momentum [Float] The momentum factor.
49
+ # If solver = 'svd', this parameter is ignored.
39
50
  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
40
51
  # @param bias_scale [Float] The scale of the bias term.
41
- # @param max_iter [Integer] The maximum number of iterations.
52
+ # @param max_iter [Integer] The maximum number of epochs that indicates
53
+ # how many times the whole data is given to the training process.
42
54
  # If solver = 'svd', this parameter is ignored.
43
55
  # @param batch_size [Integer] The size of the mini batches.
44
56
  # If solver = 'svd', this parameter is ignored.
45
- # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
46
- # If nil is given, Nadam is used.
57
+ # @param tol [Float] The tolerance of loss for terminating optimization.
47
58
  # If solver = 'svd', this parameter is ignored.
48
- # @param solver [String] The algorithm to calculate weights. ('sgd' or 'svd').
59
+ # @param solver [String] The algorithm to calculate weights. ('auto', 'sgd' or 'svd').
60
+ # 'auto' chooses the 'svd' solver if Numo::Linalg is loaded. Otherwise, it chooses the 'sgd' solver.
49
61
  # 'sgd' uses the stochastic gradient descent optimization.
50
62
  # 'svd' performs singular value decomposition of samples.
51
63
  # @param n_jobs [Integer] The number of jobs for running the fit method in parallel.
52
64
  # If nil is given, the method does not execute in parallel.
53
65
  # If zero or less is given, it becomes equal to the number of processors.
54
66
  # This parameter is ignored if the Parallel gem is not loaded.
67
+ # @param verbose [Boolean] The flag indicating whether to output loss during iteration.
68
+ # If solver = 'svd', this parameter is ignored.
55
69
  # @param random_seed [Integer] The seed value using to initialize the random generator.
56
- def initialize(fit_bias: false, bias_scale: 1.0, max_iter: 1000, batch_size: 10, optimizer: nil,
57
- solver: 'sgd', n_jobs: nil, random_seed: nil)
58
- check_params_numeric(bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
59
- check_params_boolean(fit_bias: fit_bias)
70
+ def initialize(learning_rate: 0.01, decay: nil, momentum: 0.9,
71
+ fit_bias: true, bias_scale: 1.0, max_iter: 200, batch_size: 50, tol: 1e-4,
72
+ solver: 'auto',
73
+ n_jobs: nil, verbose: false, random_seed: nil)
74
+ check_params_numeric(learning_rate: learning_rate, momentum: momentum,
75
+ bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
76
+ check_params_boolean(fit_bias: fit_bias, verbose: verbose)
60
77
  check_params_string(solver: solver)
61
- check_params_numeric_or_nil(n_jobs: n_jobs, random_seed: random_seed)
62
- check_params_positive(max_iter: max_iter, batch_size: batch_size)
63
- keywd_args = method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h.merge(reg_param: 0.0)
64
- keywd_args.delete(:solver)
65
- super(**keywd_args)
66
- @params[:solver] = solver != 'svd' ? 'sgd' : 'svd'
78
+ check_params_numeric_or_nil(decay: decay, n_jobs: n_jobs, random_seed: random_seed)
79
+ check_params_positive(learning_rate: learning_rate, max_iter: max_iter, batch_size: batch_size)
80
+ super()
81
+ @params.merge!(method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h)
82
+ @params[:solver] = if solver == 'auto'
83
+ load_linalg? ? 'svd' : 'sgd'
84
+ else
85
+ solver != 'svd' ? 'sgd' : 'svd'
86
+ end
87
+ @params[:decay] ||= @params[:learning_rate]
88
+ @params[:random_seed] ||= srand
89
+ @rng = Random.new(@params[:random_seed])
90
+ @loss_func = LinearModel::Loss::MeanSquaredError.new
91
+ @weight_vec = nil
92
+ @bias_term = nil
67
93
  end
68
94
 
69
95
  # Fit the model with given training data.
@@ -94,33 +120,12 @@ module Rumale
94
120
  x.dot(@weight_vec.transpose) + @bias_term
95
121
  end
96
122
 
97
- # Dump marshal data.
98
- # @return [Hash] The marshal data about LinearRegression.
99
- def marshal_dump
100
- { params: @params,
101
- weight_vec: @weight_vec,
102
- bias_term: @bias_term,
103
- rng: @rng }
104
- end
105
-
106
- # Load marshal data.
107
- # @return [nil]
108
- def marshal_load(obj)
109
- @params = obj[:params]
110
- @weight_vec = obj[:weight_vec]
111
- @bias_term = obj[:bias_term]
112
- @rng = obj[:rng]
113
- nil
114
- end
115
-
116
123
  private
117
124
 
118
125
  def fit_svd(x, y)
119
- samples = @params[:fit_bias] ? expand_feature(x) : x
126
+ x = expand_feature(x) if fit_bias?
120
127
 
121
- s, u, vt = Numo::Linalg.svd(samples, driver: 'sdd', job: 'S')
122
- d = (s / s**2).diag
123
- w = vt.transpose.dot(d).dot(u.transpose).dot(y)
128
+ w = Numo::Linalg.pinv(x, driver: 'svd').dot(y)
124
129
 
125
130
  is_single_target_vals = y.shape[1].nil?
126
131
  if @params[:fit_bias]
@@ -150,8 +155,14 @@ module Rumale
150
155
  end
151
156
  end
152
157
 
153
- def calc_loss_gradient(x, y, weight)
154
- 2.0 * (x.dot(weight) - y)
158
+ def fit_bias?
159
+ @params[:fit_bias] == true
160
+ end
161
+
162
+ def load_linalg?
163
+ return false if defined?(Numo::Linalg).nil?
164
+ return false if Numo::Linalg::VERSION < '0.1.4'
165
+ true
155
166
  end
156
167
  end
157
168
  end
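The `solver: 'auto'` resolution added above (via `load_linalg?`, which requires Numo::Linalg >= 0.1.4) can be exercised as in the rough sketch below. It assumes the numo-linalg gem is installed; `x_train` and `y_train` are placeholder Numo arrays, and the `params` reader comes from the estimator base class.

```ruby
require 'numo/linalg/autoloader' # load Numo::Linalg before constructing the estimator,
require 'rumale'                 # since the solver is resolved in initialize

# With the new default solver: 'auto', the regressor picks 'svd' when
# Numo::Linalg (>= 0.1.4) is loaded and falls back to 'sgd' otherwise.
reg = Rumale::LinearModel::LinearRegression.new
reg.fit(x_train, y_train)  # x_train / y_train: placeholder Numo::DFloat arrays
p reg.params[:solver]      # => "svd" here; "sgd" without Numo::Linalg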
@@ -1,12 +1,12 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'rumale/linear_model/base_linear_model'
3
+ require 'rumale/linear_model/base_sgd'
4
4
  require 'rumale/base/classifier'
5
5
 
6
6
  module Rumale
7
7
  module LinearModel
8
8
  # LogisticRegression is a class that implements Logistic Regression
9
- # with mini-batch stochastic gradient descent optimization.
9
+ # with stochastic gradient descent optimization.
10
10
  # For multiclass classification problem, it uses one-vs-the-rest strategy.
11
11
  #
12
12
  # Rumale::SVM provides Logistic Regression based on LIBLINEAR.
@@ -15,13 +15,15 @@ module Rumale
15
15
  #
16
16
  # @example
17
17
  # estimator =
18
- # Rumale::LinearModel::LogisticRegression.new(reg_param: 1.0, max_iter: 1000, batch_size: 20, random_seed: 1)
18
+ # Rumale::LinearModel::LogisticRegression.new(reg_param: 1.0, max_iter: 200, batch_size: 50, random_seed: 1)
19
19
  # estimator.fit(training_samples, traininig_labels)
20
20
  # results = estimator.predict(testing_samples)
21
21
  #
22
22
  # *Reference*
23
23
  # - S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter, "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM," Mathematical Programming, vol. 127 (1), pp. 3--30, 2011.
24
- class LogisticRegression < BaseLinearModel
24
+ # - Y. Tsuruoka, J. Tsujii, and S. Ananiadou, "Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty," Proc. ACL'09, pp. 477--485, 2009.
25
+ # - L. Bottou, "Large-Scale Machine Learning with Stochastic Gradient Descent," Proc. COMPSTAT'10, pp. 177--186, 2010.
26
+ class LogisticRegression < BaseSGD
25
27
  include Base::Classifier
26
28
 
27
29
  # Return the weight vector for Logistic Regression.
@@ -42,26 +44,53 @@ module Rumale
42
44
 
43
45
  # Create a new classifier with Logistic Regression by the SGD optimization.
44
46
  #
47
+ # @param learning_rate [Float] The initial value of learning rate.
48
+ # The learning rate decreases as the iteration proceeds according to the equation: learning_rate / (1 + decay * t).
49
+ # @param decay [Float] The smoothing parameter for decreasing learning rate as the iteration proceeds.
50
+ # If nil is given, the decay sets to 'reg_param * learning_rate'.
51
+ # @param momentum [Float] The momentum factor.
52
+ # @param penalty [String] The regularization type to be used ('l1', 'l2', and 'elasticnet').
53
+ # @param l1_ratio [Float] The elastic-net type regularization mixing parameter.
54
+ # If penalty set to 'l2' or 'l1', this parameter is ignored.
55
+ # If l1_ratio = 1, the regularization is similar to Lasso.
56
+ # If l1_ratio = 0, the regularization is similar to Ridge.
57
+ # If 0 < l1_ratio < 1, the regularization is a combination of L1 and L2.
45
58
  # @param reg_param [Float] The regularization parameter.
46
59
  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
47
60
  # @param bias_scale [Float] The scale of the bias term.
48
61
  # If fit_bias is true, the feature vector v becoms [v; bias_scale].
49
- # @param max_iter [Integer] The maximum number of iterations.
62
+ # @param max_iter [Integer] The maximum number of epochs that indicates
63
+ # how many times the whole data is given to the training process.
50
64
  # @param batch_size [Integer] The size of the mini batches.
51
- # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
52
- # If nil is given, Nadam is used.
65
+ # @param tol [Float] The tolerance of loss for terminating optimization.
53
66
  # @param n_jobs [Integer] The number of jobs for running the fit and predict methods in parallel.
54
67
  # If nil is given, the methods do not execute in parallel.
55
68
  # If zero or less is given, it becomes equal to the number of processors.
56
69
  # This parameter is ignored if the Parallel gem is not loaded.
70
+ # @param verbose [Boolean] The flag indicating whether to output loss during iteration.
57
71
  # @param random_seed [Integer] The seed value using to initialize the random generator.
58
- def initialize(reg_param: 1.0, fit_bias: false, bias_scale: 1.0,
59
- max_iter: 1000, batch_size: 20, optimizer: nil, n_jobs: nil, random_seed: nil)
60
- check_params_numeric(reg_param: reg_param, bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
61
- check_params_boolean(fit_bias: fit_bias)
62
- check_params_numeric_or_nil(n_jobs: n_jobs, random_seed: random_seed)
63
- check_params_positive(reg_param: reg_param, bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
64
- super
72
+ def initialize(learning_rate: 0.01, decay: nil, momentum: 0.9,
73
+ penalty: 'l2', reg_param: 1.0, l1_ratio: 0.5,
74
+ fit_bias: true, bias_scale: 1.0,
75
+ max_iter: 200, batch_size: 50, tol: 1e-4,
76
+ n_jobs: nil, verbose: false, random_seed: nil)
77
+ check_params_numeric(learning_rate: learning_rate, momentum: momentum,
78
+ reg_param: reg_param, l1_ratio: l1_ratio, bias_scale: bias_scale,
79
+ max_iter: max_iter, batch_size: batch_size, tol: tol)
80
+ check_params_boolean(fit_bias: fit_bias, verbose: verbose)
81
+ check_params_string(penalty: penalty)
82
+ check_params_numeric_or_nil(decay: decay, n_jobs: n_jobs, random_seed: random_seed)
83
+ check_params_positive(learning_rate: learning_rate, reg_param: reg_param,
84
+ bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
85
+ super()
86
+ @params.merge!(method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h)
87
+ @params[:decay] ||= @params[:reg_param] * @params[:learning_rate]
88
+ @params[:random_seed] ||= srand
89
+ @rng = Random.new(@params[:random_seed])
90
+ @penalty_type = @params[:penalty]
91
+ @loss_func = LinearModel::Loss::LogLoss.new
92
+ @weight_vec = nil
93
+ @bias_term = nil
65
94
  @classes = nil
66
95
  end
67
96
 
@@ -148,33 +177,8 @@ module Rumale
148
177
  probs
149
178
  end
150
179
 
151
- # Dump marshal data.
152
- # @return [Hash] The marshal data about LogisticRegression.
153
- def marshal_dump
154
- { params: @params,
155
- weight_vec: @weight_vec,
156
- bias_term: @bias_term,
157
- classes: @classes,
158
- rng: @rng }
159
- end
160
-
161
- # Load marshal data.
162
- # @return [nil]
163
- def marshal_load(obj)
164
- @params = obj[:params]
165
- @weight_vec = obj[:weight_vec]
166
- @bias_term = obj[:bias_term]
167
- @classes = obj[:classes]
168
- @rng = obj[:rng]
169
- nil
170
- end
171
-
172
180
  private
173
181
 
174
- def calc_loss_gradient(x, y, weight)
175
- y / (Numo::NMath.exp(-y * x.dot(weight)) + 1.0) - y
176
- end
177
-
178
182
  def multiclass_problem?
179
183
  @classes.size > 2
180
184
  end
@@ -1,16 +1,16 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'rumale/linear_model/base_linear_model'
3
+ require 'rumale/linear_model/base_sgd'
4
4
  require 'rumale/base/regressor'
5
5
 
6
6
  module Rumale
7
7
  module LinearModel
8
8
  # Ridge is a class that implements Ridge Regression
9
- # with mini-batch stochastic gradient descent optimization or singular value decomposition.
9
+ # with stochastic gradient descent (SGD) optimization or singular value decomposition (SVD).
10
10
  #
11
11
  # @example
12
12
  # estimator =
13
- # Rumale::LinearModel::Ridge.new(reg_param: 0.1, max_iter: 1000, batch_size: 20, random_seed: 1)
13
+ # Rumale::LinearModel::Ridge.new(reg_param: 0.1, max_iter: 500, batch_size: 20, random_seed: 1)
14
14
  # estimator.fit(training_samples, traininig_values)
15
15
  # results = estimator.predict(testing_samples)
16
16
  #
@@ -19,7 +19,10 @@ module Rumale
19
19
  # estimator = Rumale::LinearModel::Ridge.new(reg_param: 0.1, solver: 'svd')
20
20
  # estimator.fit(training_samples, traininig_values)
21
21
  # results = estimator.predict(testing_samples)
22
- class Ridge < BaseLinearModel
22
+ #
23
+ # *Reference*
24
+ # - L. Bottou, "Large-Scale Machine Learning with Stochastic Gradient Descent," Proc. COMPSTAT'10, pp. 177--186, 2010.
25
+ class Ridge < BaseSGD
23
26
  include Base::Regressor
24
27
 
25
28
  # Return the weight vector.
@@ -36,35 +39,61 @@ module Rumale
36
39
 
37
40
  # Create a new Ridge regressor.
38
41
  #
42
+ # @param learning_rate [Float] The initial value of learning rate.
43
+ # The learning rate decreases as the iteration proceeds according to the equation: learning_rate / (1 + decay * t).
44
+ # If solver = 'svd', this parameter is ignored.
45
+ # @param decay [Float] The smoothing parameter for decreasing learning rate as the iteration proceeds.
46
+ # If nil is given, the decay sets to 'reg_param * learning_rate'.
47
+ # If solver = 'svd', this parameter is ignored.
48
+ # @param momentum [Float] The momentum factor.
49
+ # If solver = 'svd', this parameter is ignored.
39
50
  # @param reg_param [Float] The regularization parameter.
40
51
  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
41
52
  # @param bias_scale [Float] The scale of the bias term.
42
- # @param max_iter [Integer] The maximum number of iterations.
53
+ # @param max_iter [Integer] The maximum number of epochs that indicates
54
+ # how many times the whole data is given to the training process.
43
55
  # If solver = 'svd', this parameter is ignored.
44
56
  # @param batch_size [Integer] The size of the mini batches.
45
57
  # If solver = 'svd', this parameter is ignored.
46
- # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
47
- # If nil is given, Nadam is used.
58
+ # @param tol [Float] The tolerance of loss for terminating optimization.
48
59
  # If solver = 'svd', this parameter is ignored.
49
- # @param solver [String] The algorithm to calculate weights. ('sgd' or 'svd').
60
+ # @param solver [String] The algorithm to calculate weights. ('auto', 'sgd' or 'svd').
61
+ # 'auto' chooses the 'svd' solver if Numo::Linalg is loaded. Otherwise, it chooses the 'sgd' solver.
50
62
  # 'sgd' uses the stochastic gradient descent optimization.
51
63
  # 'svd' performs singular value decomposition of samples.
52
64
  # @param n_jobs [Integer] The number of jobs for running the fit method in parallel.
53
65
  # If nil is given, the method does not execute in parallel.
54
66
  # If zero or less is given, it becomes equal to the number of processors.
55
67
  # This parameter is ignored if the Parallel gem is not loaded or the solver is 'svd'.
68
+ # @param verbose [Boolean] The flag indicating whether to output loss during iteration.
69
+ # If solver = 'svd', this parameter is ignored.
56
70
  # @param random_seed [Integer] The seed value using to initialize the random generator.
57
- def initialize(reg_param: 1.0, fit_bias: false, bias_scale: 1.0, max_iter: 1000, batch_size: 10, optimizer: nil,
58
- solver: 'sgd', n_jobs: nil, random_seed: nil)
59
- check_params_numeric(reg_param: reg_param, bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
60
- check_params_boolean(fit_bias: fit_bias)
71
+ def initialize(learning_rate: 0.01, decay: nil, momentum: 0.9,
72
+ reg_param: 1.0, fit_bias: true, bias_scale: 1.0,
73
+ max_iter: 200, batch_size: 50, tol: 1e-4,
74
+ solver: 'auto',
75
+ n_jobs: nil, verbose: false, random_seed: nil)
76
+ check_params_numeric(learning_rate: learning_rate, momentum: momentum,
77
+ reg_param: reg_param, bias_scale: bias_scale,
78
+ max_iter: max_iter, batch_size: batch_size, tol: tol)
79
+ check_params_boolean(fit_bias: fit_bias, verbose: verbose)
61
80
  check_params_string(solver: solver)
62
- check_params_numeric_or_nil(n_jobs: n_jobs, random_seed: random_seed)
63
- check_params_positive(reg_param: reg_param, max_iter: max_iter, batch_size: batch_size)
64
- keywd_args = method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h
65
- keywd_args.delete(:solver)
66
- super(**keywd_args)
67
- @params[:solver] = solver != 'svd' ? 'sgd' : 'svd'
81
+ check_params_numeric_or_nil(decay: decay, n_jobs: n_jobs, random_seed: random_seed)
82
+ check_params_positive(learning_rate: learning_rate, reg_param: reg_param, max_iter: max_iter, batch_size: batch_size)
83
+ super()
84
+ @params.merge!(method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h)
85
+ @params[:solver] = if solver == 'auto'
86
+ load_linalg? ? 'svd' : 'sgd'
87
+ else
88
+ solver != 'svd' ? 'sgd' : 'svd'
89
+ end
90
+ @params[:decay] ||= @params[:reg_param] * @params[:learning_rate]
91
+ @params[:random_seed] ||= srand
92
+ @rng = Random.new(@params[:random_seed])
93
+ @penalty_type = L2_PENALTY
94
+ @loss_func = LinearModel::Loss::MeanSquaredError.new
95
+ @weight_vec = nil
96
+ @bias_term = nil
68
97
  end
69
98
 
70
99
  # Fit the model with given training data.
@@ -95,25 +124,6 @@ module Rumale
95
124
  x.dot(@weight_vec.transpose) + @bias_term
96
125
  end
97
126
 
98
- # Dump marshal data.
99
- # @return [Hash] The marshal data about Ridge.
100
- def marshal_dump
101
- { params: @params,
102
- weight_vec: @weight_vec,
103
- bias_term: @bias_term,
104
- rng: @rng }
105
- end
106
-
107
- # Load marshal data.
108
- # @return [nil]
109
- def marshal_load(obj)
110
- @params = obj[:params]
111
- @weight_vec = obj[:weight_vec]
112
- @bias_term = obj[:bias_term]
113
- @rng = obj[:rng]
114
- nil
115
- end
116
-
117
127
  private
118
128
 
119
129
  def fit_svd(x, y)
@@ -151,8 +161,10 @@ module Rumale
151
161
  end
152
162
  end
153
163
 
154
- def calc_loss_gradient(x, y, weight)
155
- 2.0 * (x.dot(weight) - y)
164
+ def load_linalg?
165
+ return false if defined?(Numo::Linalg).nil?
166
+ return false if Numo::Linalg::VERSION < '0.1.4'
167
+ true
156
168
  end
157
169
  end
158
170
  end
@@ -1,6 +1,6 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'rumale/linear_model/base_linear_model'
3
+ require 'rumale/linear_model/base_sgd'
4
4
  require 'rumale/base/classifier'
5
5
  require 'rumale/probabilistic_output'
6
6
 
@@ -8,7 +8,7 @@ module Rumale
8
8
  # This module consists of the classes that implement generalized linear models.
9
9
  module LinearModel
10
10
  # SVC is a class that implements Support Vector Classifier
11
- # with mini-batch stochastic gradient descent optimization.
11
+ # with stochastic gradient descent optimization.
12
12
  # For multiclass classification problem, it uses one-vs-the-rest strategy.
13
13
  #
14
14
  # Rumale::SVM provides linear support vector classifier based on LIBLINEAR.
@@ -17,13 +17,15 @@ module Rumale
17
17
  #
18
18
  # @example
19
19
  # estimator =
20
- # Rumale::LinearModel::SVC.new(reg_param: 1.0, max_iter: 1000, batch_size: 20, random_seed: 1)
20
+ # Rumale::LinearModel::SVC.new(reg_param: 1.0, max_iter: 200, batch_size: 50, random_seed: 1)
21
21
  # estimator.fit(training_samples, traininig_labels)
22
22
  # results = estimator.predict(testing_samples)
23
23
  #
24
24
  # *Reference*
25
25
  # - S. Shalev-Shwartz and Y. Singer, "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM," Proc. ICML'07, pp. 807--814, 2007.
26
- class SVC < BaseLinearModel
26
+ # - Y. Tsuruoka, J. Tsujii, and S. Ananiadou, "Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty," Proc. ACL'09, pp. 477--485, 2009.
27
+ # - L. Bottou, "Large-Scale Machine Learning with Stochastic Gradient Descent," Proc. COMPSTAT'10, pp. 177--186, 2010.
28
+ class SVC < BaseSGD
27
29
  include Base::Classifier
28
30
 
29
31
  # Return the weight vector for SVC.
@@ -44,31 +46,56 @@ module Rumale
44
46
 
45
47
  # Create a new classifier with Support Vector Machine by the SGD optimization.
46
48
  #
49
+ # @param learning_rate [Float] The initial value of learning rate.
50
+ # The learning rate decreases as the iteration proceeds according to the equation: learning_rate / (1 + decay * t).
51
+ # @param decay [Float] The smoothing parameter for decreasing learning rate as the iteration proceeds.
52
+ # If nil is given, the decay sets to 'reg_param * learning_rate'.
53
+ # @param momentum [Float] The momentum factor.
54
+ # @param penalty [String] The regularization type to be used ('l1', 'l2', and 'elasticnet').
55
+ # @param l1_ratio [Float] The elastic-net type regularization mixing parameter.
56
+ # If penalty set to 'l2' or 'l1', this parameter is ignored.
57
+ # If l1_ratio = 1, the regularization is similar to Lasso.
58
+ # If l1_ratio = 0, the regularization is similar to Ridge.
59
+ # If 0 < l1_ratio < 1, the regularization is a combination of L1 and L2.
47
60
  # @param reg_param [Float] The regularization parameter.
48
61
  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
49
62
  # @param bias_scale [Float] The scale of the bias term.
50
- # @param max_iter [Integer] The maximum number of iterations.
63
+ # @param max_iter [Integer] The maximum number of epochs that indicates
64
+ # how many times the whole data is given to the training process.
51
65
  # @param batch_size [Integer] The size of the mini batches.
66
+ # @param tol [Float] The tolerance of loss for terminating optimization.
52
67
  # @param probability [Boolean] The flag indicating whether to perform probability estimation.
53
- # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
54
- # If nil is given, Nadam is used.
55
68
  # @param n_jobs [Integer] The number of jobs for running the fit and predict methods in parallel.
56
69
  # If nil is given, the methods do not execute in parallel.
57
70
  # If zero or less is given, it becomes equal to the number of processors.
58
71
  # This parameter is ignored if the Parallel gem is not loaded.
72
+ # @param verbose [Boolean] The flag indicating whether to output loss during iteration.
59
73
  # @param random_seed [Integer] The seed value using to initialize the random generator.
60
- def initialize(reg_param: 1.0, fit_bias: false, bias_scale: 1.0,
61
- max_iter: 1000, batch_size: 20, probability: false, optimizer: nil, n_jobs: nil, random_seed: nil)
62
- check_params_numeric(reg_param: reg_param, bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
63
- check_params_boolean(fit_bias: fit_bias, probability: probability)
64
- check_params_numeric_or_nil(n_jobs: n_jobs, random_seed: random_seed)
65
- check_params_positive(reg_param: reg_param, bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
66
- keywd_args = method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h
67
- keywd_args.delete(:probability)
68
- super(**keywd_args)
69
- @params[:probability] = probability
70
- @prob_param = nil
74
+ def initialize(learning_rate: 0.01, decay: nil, momentum: 0.9,
75
+ penalty: 'l2', reg_param: 1.0, l1_ratio: 0.5,
76
+ fit_bias: true, bias_scale: 1.0,
77
+ max_iter: 200, batch_size: 50, tol: 1e-4,
78
+ probability: false,
79
+ n_jobs: nil, verbose: false, random_seed: nil)
80
+ check_params_numeric(learning_rate: learning_rate, momentum: momentum,
81
+ reg_param: reg_param, l1_ratio: l1_ratio, bias_scale: bias_scale,
82
+ max_iter: max_iter, batch_size: batch_size, tol: tol)
83
+ check_params_boolean(fit_bias: fit_bias, verbose: verbose, probability: probability)
84
+ check_params_string(penalty: penalty)
85
+ check_params_numeric_or_nil(decay: decay, n_jobs: n_jobs, random_seed: random_seed)
86
+ check_params_positive(learning_rate: learning_rate, reg_param: reg_param,
87
+ bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
88
+ super()
89
+ @params.merge!(method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h)
90
+ @params[:decay] ||= @params[:reg_param] * @params[:learning_rate]
91
+ @params[:random_seed] ||= srand
92
+ @rng = Random.new(@params[:random_seed])
93
+ @penalty_type = @params[:penalty]
94
+ @loss_func = LinearModel::Loss::HingeLoss.new
95
+ @weight_vec = nil
96
+ @bias_term = nil
71
97
  @classes = nil
98
+ @prob_param = nil
72
99
  end
73
100
 
74
101
  # Fit the model with given training data.
@@ -165,29 +192,6 @@ module Rumale
165
192
  end
166
193
  end
167
194
 
168
- # Dump marshal data.
169
- # @return [Hash] The marshal data about SVC.
170
- def marshal_dump
171
- { params: @params,
172
- weight_vec: @weight_vec,
173
- bias_term: @bias_term,
174
- prob_param: @prob_param,
175
- classes: @classes,
176
- rng: @rng }
177
- end
178
-
179
- # Load marshal data.
180
- # @return [nil]
181
- def marshal_load(obj)
182
- @params = obj[:params]
183
- @weight_vec = obj[:weight_vec]
184
- @bias_term = obj[:bias_term]
185
- @prob_param = obj[:prob_param]
186
- @classes = obj[:classes]
187
- @rng = obj[:rng]
188
- nil
189
- end
190
-
191
195
  private
192
196
 
193
197
  def partial_fit(x, bin_y)
@@ -200,13 +204,6 @@ module Rumale
200
204
  [w, b, p]
201
205
  end
202
206
 
203
- def calc_loss_gradient(x, y, weight)
204
- target_ids = (x.dot(weight) * y).lt(1.0).where
205
- grad = Numo::DFloat.zeros(@params[:batch_size])
206
- grad[target_ids] = -y[target_ids]
207
- grad
208
- end
209
-
210
207
  def multiclass_problem?
211
208
  @classes.size > 2
212
209
  end
@@ -1,12 +1,12 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'rumale/linear_model/base_linear_model'
3
+ require 'rumale/linear_model/base_sgd'
4
4
  require 'rumale/base/regressor'
5
5
 
6
6
  module Rumale
7
7
  module LinearModel
8
8
  # SVR is a class that implements Support Vector Regressor
9
- # with mini-batch stochastic gradient descent optimization.
9
+ # with stochastic gradient descent optimization.
10
10
  #
11
11
  # Rumale::SVM provides linear and kernel support vector regressor based on LIBLINEAR and LIBSVM.
12
12
  # If you prefer execution speed, you should use Rumale::SVM::LinearSVR.
@@ -14,13 +14,15 @@ module Rumale
14
14
  #
15
15
  # @example
16
16
  # estimator =
17
- # Rumale::LinearModel::SVR.new(reg_param: 1.0, epsilon: 0.1, max_iter: 1000, batch_size: 20, random_seed: 1)
17
+ # Rumale::LinearModel::SVR.new(reg_param: 1.0, epsilon: 0.1, max_iter: 200, batch_size: 50, random_seed: 1)
18
18
  # estimator.fit(training_samples, traininig_target_values)
19
19
  # results = estimator.predict(testing_samples)
20
20
  #
21
21
  # *Reference*
22
- # 1. S. Shalev-Shwartz and Y. Singer, "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM," Proc. ICML'07, pp. 807--814, 2007.
23
- class SVR < BaseLinearModel
22
+ # - S. Shalev-Shwartz and Y. Singer, "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM," Proc. ICML'07, pp. 807--814, 2007.
23
+ # - Y. Tsuruoka, J. Tsujii, and S. Ananiadou, "Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty," Proc. ACL'09, pp. 477--485, 2009.
24
+ # - L. Bottou, "Large-Scale Machine Learning with Stochastic Gradient Descent," Proc. COMPSTAT'10, pp. 177--186, 2010.
25
+ class SVR < BaseSGD
24
26
  include Base::Regressor
25
27
 
26
28
  # Return the weight vector for SVR.
@@ -37,30 +39,54 @@ module Rumale
37
39
 
38
40
  # Create a new regressor with Support Vector Machine by the SGD optimization.
39
41
  #
42
+ # @param learning_rate [Float] The initial value of learning rate.
43
+ # The learning rate decreases as the iteration proceeds according to the equation: learning_rate / (1 + decay * t).
44
+ # @param decay [Float] The smoothing parameter for decreasing learning rate as the iteration proceeds.
45
+ # If nil is given, the decay sets to 'reg_param * learning_rate'.
46
+ # @param momentum [Float] The momentum factor.
47
+ # @param penalty [String] The regularization type to be used ('l1', 'l2', and 'elasticnet').
48
+ # @param l1_ratio [Float] The elastic-net type regularization mixing parameter.
49
+ # If penalty set to 'l2' or 'l1', this parameter is ignored.
50
+ # If l1_ratio = 1, the regularization is similar to Lasso.
51
+ # If l1_ratio = 0, the regularization is similar to Ridge.
52
+ # If 0 < l1_ratio < 1, the regularization is a combination of L1 and L2.
40
53
  # @param reg_param [Float] The regularization parameter.
41
54
  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
42
55
  # @param bias_scale [Float] The scale of the bias term.
43
56
  # @param epsilon [Float] The margin of tolerance.
44
- # @param max_iter [Integer] The maximum number of iterations.
57
+ # @param max_iter [Integer] The maximum number of epochs that indicates
58
+ # how many times the whole data is given to the training process.
45
59
  # @param batch_size [Integer] The size of the mini batches.
46
- # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
47
- # If nil is given, Nadam is used.
60
+ # @param tol [Float] The tolerance of loss for terminating optimization.
48
61
  # @param n_jobs [Integer] The number of jobs for running the fit method in parallel.
49
62
  # If nil is given, the method does not execute in parallel.
50
63
  # If zero or less is given, it becomes equal to the number of processors.
51
64
  # This parameter is ignored if the Parallel gem is not loaded.
65
+ # @param verbose [Boolean] The flag indicating whether to output loss during iteration.
52
66
  # @param random_seed [Integer] The seed value using to initialize the random generator.
53
- def initialize(reg_param: 1.0, fit_bias: false, bias_scale: 1.0, epsilon: 0.1,
54
- max_iter: 1000, batch_size: 20, optimizer: nil, n_jobs: nil, random_seed: nil)
55
- check_params_numeric(reg_param: reg_param, bias_scale: bias_scale, epsilon: epsilon, max_iter: max_iter, batch_size: batch_size)
56
- check_params_boolean(fit_bias: fit_bias)
57
- check_params_numeric_or_nil(n_jobs: n_jobs, random_seed: random_seed)
58
- check_params_positive(reg_param: reg_param, bias_scale: bias_scale, epsilon: epsilon,
67
+ def initialize(learning_rate: 0.01, decay: nil, momentum: 0.9,
68
+ penalty: 'l2', reg_param: 1.0, l1_ratio: 0.5,
69
+ fit_bias: true, bias_scale: 1.0,
70
+ epsilon: 0.1,
71
+ max_iter: 200, batch_size: 50, tol: 1e-4,
72
+ n_jobs: nil, verbose: false, random_seed: nil)
73
+ check_params_numeric(learning_rate: learning_rate, momentum: momentum,
74
+ reg_param: reg_param, bias_scale: bias_scale, epsilon: epsilon,
75
+ max_iter: max_iter, batch_size: batch_size, tol: tol)
76
+ check_params_boolean(fit_bias: fit_bias, verbose: verbose)
77
+ check_params_numeric_or_nil(decay: decay, n_jobs: n_jobs, random_seed: random_seed)
78
+ check_params_positive(learning_rate: learning_rate, reg_param: reg_param,
79
+ bias_scale: bias_scale, epsilon: epsilon,
59
80
  max_iter: max_iter, batch_size: batch_size)
60
- keywd_args = method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h
61
- keywd_args.delete(:epsilon)
62
- super(**keywd_args)
63
- @params[:epsilon] = epsilon
81
+ super()
82
+ @params.merge!(method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h)
83
+ @params[:decay] ||= @params[:reg_param] * @params[:learning_rate]
84
+ @params[:random_seed] ||= srand
85
+ @rng = Random.new(@params[:random_seed])
86
+ @penalty_type = @params[:penalty]
87
+ @loss_func = LinearModel::Loss::EpsilonInsensitive.new(epsilon: @params[:epsilon])
88
+ @weight_vec = nil
89
+ @bias_term = nil
64
90
  end
65
91
 
66
92
  # Fit the model with given training data.
@@ -100,35 +126,6 @@ module Rumale
100
126
  x = check_convert_sample_array(x)
101
127
  x.dot(@weight_vec.transpose) + @bias_term
102
128
  end
103
-
104
- # Dump marshal data.
105
- # @return [Hash] The marshal data about SVR.
106
- def marshal_dump
107
- { params: @params,
108
- weight_vec: @weight_vec,
109
- bias_term: @bias_term,
110
- rng: @rng }
111
- end
112
-
113
- # Load marshal data.
114
- # @return [nil]
115
- def marshal_load(obj)
116
- @params = obj[:params]
117
- @weight_vec = obj[:weight_vec]
118
- @bias_term = obj[:bias_term]
119
- @rng = obj[:rng]
120
- nil
121
- end
122
-
123
- private
124
-
125
- def calc_loss_gradient(x, y, weight)
126
- z = x.dot(weight)
127
- grad = Numo::DFloat.zeros(@params[:batch_size])
128
- grad[(z - y).gt(@params[:epsilon]).where] = 1
129
- grad[(y - z).gt(@params[:epsilon]).where] = -1
130
- grad
131
- end
132
129
  end
133
130
  end
134
131
  end
@@ -17,7 +17,8 @@ module Rumale
17
17
  # @param loss [String] The loss function ('hinge' or 'logistic' or nil).
18
18
  # @param reg_param_linear [Float] The regularization parameter for linear model.
19
19
  # @param reg_param_factor [Float] The regularization parameter for factor matrix.
20
- # @param max_iter [Integer] The maximum number of iterations.
20
+ # @param max_iter [Integer] The maximum number of epochs that indicates
21
+ # how many times the whole data is given to the training process.
21
22
  # @param batch_size [Integer] The size of the mini batches.
22
23
  # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
23
24
  # If nil is given, Nadam is used.
@@ -27,7 +28,7 @@ module Rumale
27
28
  # This parameter is ignored if the Parallel gem is not loaded.
28
29
  # @param random_seed [Integer] The seed value using to initialize the random generator.
29
30
  def initialize(n_factors: 2, loss: nil, reg_param_linear: 1.0, reg_param_factor: 1.0,
30
- max_iter: 1000, batch_size: 10, optimizer: nil, n_jobs: nil, random_seed: nil)
31
+ max_iter: 200, batch_size: 50, optimizer: nil, n_jobs: nil, random_seed: nil)
31
32
  @params = {}
32
33
  @params[:n_factors] = n_factors
33
34
  @params[:loss] = loss unless loss.nil?
@@ -51,27 +52,29 @@ module Rumale
51
52
  def partial_fit(x, y)
52
53
  # Initialize some variables.
53
54
  n_samples, n_features = x.shape
54
- rand_ids = [*0...n_samples].shuffle(random: @rng.dup)
55
+ sub_rng = @rng.dup
55
56
  weight_vec = Numo::DFloat.zeros(n_features + 1)
56
57
  factor_mat = Numo::DFloat.zeros(@params[:n_factors], n_features)
57
58
  weight_optimizer = @params[:optimizer].dup
58
59
  factor_optimizers = Array.new(@params[:n_factors]) { @params[:optimizer].dup }
59
60
  # Start optimization.
60
61
  @params[:max_iter].times do |_t|
61
- # Random sampling.
62
- subset_ids = rand_ids.shift(@params[:batch_size])
63
- rand_ids.concat(subset_ids)
64
- data = x[subset_ids, true]
65
- ex_data = expand_feature(data)
66
- targets = y[subset_ids]
67
- # Calculate gradients for loss function.
68
- loss_grad = loss_gradient(data, ex_data, targets, factor_mat, weight_vec)
69
- next if loss_grad.ne(0.0).count.zero?
70
- # Update each parameter.
71
- weight_vec = weight_optimizer.call(weight_vec, weight_gradient(loss_grad, ex_data, weight_vec))
72
- @params[:n_factors].times do |n|
73
- factor_mat[n, true] = factor_optimizers[n].call(factor_mat[n, true],
74
- factor_gradient(loss_grad, data, factor_mat[n, true]))
62
+ sample_ids = [*0...n_samples]
63
+ sample_ids.shuffle!(random: sub_rng)
64
+ until (subset_ids = sample_ids.shift(@params[:batch_size])).empty?
65
+ # Sampling.
66
+ sub_x = x[subset_ids, true]
67
+ sub_y = y[subset_ids]
68
+ ex_sub_x = expand_feature(sub_x)
69
+ # Calculate gradients for loss function.
70
+ loss_grad = loss_gradient(sub_x, ex_sub_x, sub_y, factor_mat, weight_vec)
71
+ next if loss_grad.ne(0.0).count.zero?
72
+ # Update each parameter.
73
+ weight_vec = weight_optimizer.call(weight_vec, weight_gradient(loss_grad, ex_sub_x, weight_vec))
74
+ @params[:n_factors].times do |n|
75
+ factor_mat[n, true] = factor_optimizers[n].call(factor_mat[n, true],
76
+ factor_gradient(loss_grad, sub_x, factor_mat[n, true]))
77
+ end
75
78
  end
76
79
  end
77
80
  [factor_mat, *split_weight_vec_bias(weight_vec)]
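For readers comparing the old and new training loops in the hunk above: the rewritten partial_fit now treats max_iter as epochs and, within each epoch, walks the shuffled sample indices mini-batch by mini-batch until they are exhausted. A standalone sketch of that pattern (names and sizes are illustrative, not Rumale API):

```ruby
n_samples  = 100
batch_size = 10
max_iter   = 5                 # interpreted as the number of epochs
rng        = Random.new(1)

max_iter.times do |_epoch|
  sample_ids = [*0...n_samples]
  sample_ids.shuffle!(random: rng)
  # Consume the shuffled ids batch by batch until the epoch is exhausted.
  until (subset_ids = sample_ids.shift(batch_size)).empty?
    # ... compute the gradient on the rows in subset_ids and update the weights ...
  end
end
```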
@@ -14,7 +14,7 @@ module Rumale
14
14
  # estimator =
15
15
  # Rumale::PolynomialModel::FactorizationMachineClassifier.new(
16
16
  # n_factors: 10, loss: 'hinge', reg_param_linear: 0.001, reg_param_factor: 0.001,
17
- # max_iter: 5000, batch_size: 50, random_seed: 1)
17
+ # max_iter: 500, batch_size: 50, random_seed: 1)
18
18
  # estimator.fit(training_samples, traininig_labels)
19
19
  # results = estimator.predict(testing_samples)
20
20
  #
@@ -50,7 +50,8 @@ module Rumale
50
50
  # @param loss [String] The loss function ('hinge' or 'logistic').
51
51
  # @param reg_param_linear [Float] The regularization parameter for linear model.
52
52
  # @param reg_param_factor [Float] The regularization parameter for factor matrix.
53
- # @param max_iter [Integer] The maximum number of iterations.
53
+ # @param max_iter [Integer] The maximum number of epochs that indicates
54
+ # how many times the whole data is given to the training process.
54
55
  # @param batch_size [Integer] The size of the mini batches.
55
56
  # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
56
57
  # If nil is given, Nadam is used.
@@ -60,7 +61,7 @@ module Rumale
60
61
  # This parameter is ignored if the Parallel gem is not loaded.
61
62
  # @param random_seed [Integer] The seed value using to initialize the random generator.
62
63
  def initialize(n_factors: 2, loss: 'hinge', reg_param_linear: 1.0, reg_param_factor: 1.0,
63
- max_iter: 1000, batch_size: 10, optimizer: nil, n_jobs: nil, random_seed: nil)
64
+ max_iter: 200, batch_size: 50, optimizer: nil, n_jobs: nil, random_seed: nil)
64
65
  check_params_numeric(reg_param_linear: reg_param_linear, reg_param_factor: reg_param_factor,
65
66
  n_factors: n_factors, max_iter: max_iter, batch_size: batch_size)
66
67
  check_params_string(loss: loss)
@@ -12,7 +12,7 @@ module Rumale
12
12
  # estimator =
13
13
  # Rumale::PolynomialModel::FactorizationMachineRegressor.new(
14
14
  # n_factors: 10, reg_param_linear: 0.1, reg_param_factor: 0.1,
15
- # max_iter: 5000, batch_size: 50, random_seed: 1)
15
+ # max_iter: 500, batch_size: 50, random_seed: 1)
16
16
  # estimator.fit(training_samples, traininig_values)
17
17
  # results = estimator.predict(testing_samples)
18
18
  #
@@ -43,7 +43,8 @@ module Rumale
43
43
  # @param n_factors [Integer] The maximum number of iterations.
44
44
  # @param reg_param_linear [Float] The regularization parameter for linear model.
45
45
  # @param reg_param_factor [Float] The regularization parameter for factor matrix.
46
- # @param max_iter [Integer] The maximum number of iterations.
46
+ # @param max_iter [Integer] The maximum number of epochs that indicates
47
+ # how many times the whole data is given to the training process.
47
48
  # @param batch_size [Integer] The size of the mini batches.
48
49
  # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
49
50
  # If nil is given, Nadam is used.
@@ -53,7 +54,7 @@ module Rumale
53
54
  # This parameter is ignored if the Parallel gem is not loaded.
54
55
  # @param random_seed [Integer] The seed value using to initialize the random generator.
55
56
  def initialize(n_factors: 2, reg_param_linear: 1.0, reg_param_factor: 1.0,
56
- max_iter: 1000, batch_size: 10, optimizer: nil, n_jobs: nil, random_seed: nil)
57
+ max_iter: 200, batch_size: 50, optimizer: nil, n_jobs: nil, random_seed: nil)
57
58
  check_params_numeric(reg_param_linear: reg_param_linear, reg_param_factor: reg_param_factor,
58
59
  n_factors: n_factors, max_iter: max_iter, batch_size: batch_size)
59
60
  check_params_numeric_or_nil(n_jobs: n_jobs, random_seed: random_seed)
@@ -3,5 +3,5 @@
  # Rumale is a machine learning library in Ruby.
  module Rumale
  # The version of Rumale you are using.
- VERSION = '0.16.1'
+ VERSION = '0.17.0'
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: rumale
  version: !ruby/object:Gem::Version
- version: 0.16.1
+ version: 0.17.0
  platform: ruby
  authors:
  - yoshoku
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2020-01-11 00:00:00.000000000 Z
+ date: 2020-01-18 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: numo-narray