rumale 0.16.1 → 0.17.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 30476b58c5c5b39567f1cb3c8346a7c354fbf8d30401555fa2e02995021b759d
- data.tar.gz: 6f664b0c279e0fef2dc47e608cdc2737318274b45017d6d60f0dd516aa2ebb48
+ metadata.gz: d1071dfdccfc177ea5902e5e1b09fce084fd4b6ce403fae6797e6b93c3f826ad
+ data.tar.gz: 30768881f5c826f59dbcca0b17a1192dbdc17ca835c8bcc626e874391131bf92
  SHA512:
- metadata.gz: aa51f865e4995901e5587e3089fae724a57022d96c95d2b852cfde99f85f9aae7035c4edfe6c4a7899c22674778e1bfc0332ef83b6f234a8c9e8aa982e55e833
- data.tar.gz: 55e209725a0c716b1f450bed025fceefe36dafe278b96648ac60079e3968778840bc4e1e75ff4181abafc4f98eb93cc40d8e2e0e3b5bf078bc94cf8b9a5dc50d
+ metadata.gz: e748eedf78b040a7dbe1a1b744f87f1c1e9c3ae751417711eccd5f69dca68335f1a206b9258503da70a8486c3d8588d61ae78081ebdecc6a8ee40f85383a319f
+ data.tar.gz: 2abae603660179e05f8341ab5351fb9e028549674bb13901e4cae4dfd13c99995de0de3c63f8c75182acd155a1d2171b02e4db74fcaa08c1108a1a0e92ad3eee
@@ -15,7 +15,7 @@ AllCops:
  Style/Documentation:
  Enabled: false

- Metrics/LineLength:
+ Layout/LineLength:
  Max: 145
  IgnoredPatterns: ['(\A|\s)#']

@@ -43,7 +43,7 @@ Metrics/BlockLength:
  - 'spec/**/*'

  Metrics/ParameterLists:
- Max: 10
+ Max: 15

  Security/MarshalLoad:
  Enabled: false
@@ -1,3 +1,25 @@
+ # 0.17.0
+ ## Breaking changes
+ - Fix all linear model estimators to use the new abstract class ([BaseSGD](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/BaseSGD.html)) introduced in version 0.16.1.
+   The major differences from the old abstract class are that
+   the optimizer of LinearModel estimators is fixed to mini-batch SGD with a momentum term,
+   the `max_iter` parameter indicates the number of epochs instead of the maximum number of iterations,
+   the fit_bias parameter is true by default, and elastic-net style regularization can be used.
+   Note that there are additions and changes to hyperparameters (a usage sketch follows this changelog excerpt).
+   Models trained with earlier versions may need to be re-trained and their hyperparameters adjusted.
+   - [LogisticRegression](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/LogisticRegression.html)
+   - [SVC](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/SVC.html)
+   - [LinearRegression](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/LinearRegression.html)
+   - [Ridge](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/Ridge.html)
+   - [Lasso](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/Lasso.html)
+   - [SVR](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/SVR.html)
+ - Change the default value of the solver parameter on LinearRegression and Ridge to 'auto'.
+   If Numo::Linalg is loaded, 'svd' is selected as the solver; otherwise, 'sgd' is selected.
+ - The meaning of the `max_iter` parameter of the factorization machine estimators
+   has been changed from the maximum number of iterations to the number of epochs.
+   - [FactorizationMachineClassifier](https://yoshoku.github.io/rumale/doc/Rumale/PolynomialModel/FactorizationMachineClassifier.html)
+   - [FactorizationMachineRegressor](https://yoshoku.github.io/rumale/doc/Rumale/PolynomialModel/FactorizationMachineRegressor.html)
+
  # 0.16.1
  - Add regressor class for [ElasticNet](https://yoshoku.github.io/rumale/doc/Rumale/LinearModel/ElasticNet.html).
  - Add new linear model abstract class.
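To make the breaking changes above concrete, here is a minimal usage sketch of the 0.17.0-style constructors. The data variables (`samples`, `labels`) and the specific hyperparameter values are illustrative assumptions, not part of the release notes.

```ruby
require 'rumale'

# With 0.17.0, max_iter counts epochs, fit_bias defaults to true, and
# elastic-net style regularization is available on the SGD-based models.
svc = Rumale::LinearModel::SVC.new(
  penalty: 'elasticnet', l1_ratio: 0.3, reg_param: 0.0001,
  max_iter: 200, batch_size: 50, tol: 1e-4, random_seed: 1
)
svc.fit(samples, labels) # samples/labels are assumed Numo arrays

# LinearRegression and Ridge now default to solver: 'auto':
# 'svd' is chosen when Numo::Linalg is loaded, 'sgd' otherwise.
reg = Rumale::LinearModel::Ridge.new(reg_param: 0.1, solver: 'auto', random_seed: 1)
```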
data/README.md CHANGED
@@ -6,7 +6,7 @@
  [![Coverage Status](https://coveralls.io/repos/github/yoshoku/rumale/badge.svg?branch=master)](https://coveralls.io/github/yoshoku/rumale?branch=master)
  [![Gem Version](https://badge.fury.io/rb/rumale.svg)](https://badge.fury.io/rb/rumale)
  [![BSD 2-Clause License](https://img.shields.io/badge/License-BSD%202--Clause-orange.svg)](https://github.com/yoshoku/rumale/blob/master/LICENSE.txt)
- [![Documentation](http://img.shields.io/badge/docs-rdoc.info-blue.svg)](https://yoshoku.github.io/rumale/doc/)
+ [![Documentation](https://img.shields.io/badge/api-reference-blue.svg)](https://yoshoku.github.io/rumale/doc/)

  Rumale (**Ru**by **ma**chine **le**arning) is a machine learning library in Ruby.
  Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python.
@@ -37,6 +37,10 @@ Or install it yourself as:

  $ gem install rumale

+ ## Documentation
+
+ - [Rumale API Documentation](https://yoshoku.github.io/rumale/doc/)
+
  ## Usage

  ### Example 1. XOR data
@@ -95,7 +99,7 @@ transformer = Rumale::KernelApproximation::RBF.new(gamma: 0.0001, n_components:
  transformed = transformer.fit_transform(samples)

  # Train linear SVM classifier.
- classifier = Rumale::LinearModel::SVC.new(reg_param: 0.0001, max_iter: 1000, batch_size: 50, random_seed: 1)
+ classifier = Rumale::LinearModel::SVC.new(reg_param: 0.0001, random_seed: 1)
  classifier.fit(transformed, labels)

  # Save the model.
@@ -132,7 +136,7 @@ Execution of the above scripts result in the following.
  ```bash
  $ ruby train.rb
  $ ruby test.rb
- Accuracy: 98.4%
+ Accuracy: 98.7%
  ```

  ### Example 3. Cross-validation
@@ -144,7 +148,7 @@ require 'rumale'
  samples, labels = Rumale::Dataset.load_libsvm_file('pendigits')

  # Define the estimator to be evaluated.
- lr = Rumale::LinearModel::LogisticRegression.new(reg_param: 0.0001, random_seed: 1)
+ lr = Rumale::LinearModel::LogisticRegression.new(learning_rate: 0.00001, reg_param: 0.0001, random_seed: 1)

  # Define the evaluation measure, splitting strategy, and cross validation.
  ev = Rumale::EvaluationMeasure::LogLoss.new
@@ -163,7 +167,7 @@ Execution of the above scripts result in the following.

  ```bash
  $ ruby cross_validation.rb
- 5-CV mean log-loss: 0.476
+ 5-CV mean log-loss: 0.355
  ```

  ### Example 4. Pipeline
@@ -176,7 +180,7 @@ samples, labels = Rumale::Dataset.load_libsvm_file('pendigits')

  # Construct pipeline with kernel approximation and SVC.
  rbf = Rumale::KernelApproximation::RBF.new(gamma: 0.0001, n_components: 800, random_seed: 1)
- svc = Rumale::LinearModel::SVC.new(reg_param: 0.0001, max_iter: 1000, random_seed: 1)
+ svc = Rumale::LinearModel::SVC.new(reg_param: 0.0001, random_seed: 1)
  pipeline = Rumale::Pipeline::Pipeline.new(steps: { trns: rbf, clsf: svc })

  # Define the splitting strategy and cross validation.
@@ -195,7 +199,7 @@ Execution of the above scripts result in the following.

  ```bash
  $ ruby pipeline.rb
- 5-CV mean accuracy: 99.2 %
+ 5-CV mean accuracy: 99.6 %
  ```

  ## Speeding up
@@ -5,10 +5,15 @@ require 'rumale/optimizer/nadam'

  module Rumale
  module LinearModel
+ # @note
+ # In version 0.17.0, a new linear model abstract class called BaseSGD is introduced.
+ # BaseLinearModel is deprecated and will be removed in the future.
+ #
  # BaseLinearModel is an abstract class for implementation of linear estimator
  # with mini-batch stochastic gradient descent optimization.
  # This class is used for internal process.
  class BaseLinearModel
+ # :nocov:
  include Base::BaseEstimator

  # Initialize a linear estimator.
@@ -26,6 +31,7 @@ module Rumale
  # @param random_seed [Integer] The seed value using to initialize the random generator.
  def initialize(reg_param: 1.0, fit_bias: false, bias_scale: 1.0,
  max_iter: 1000, batch_size: 10, optimizer: nil, n_jobs: nil, random_seed: nil)
+ warn 'warning: BaseLinearModel is deprecated. Use BaseSGD instead.'
  @params = {}
  @params[:reg_param] = reg_param
  @params[:fit_bias] = fit_bias
@@ -88,6 +94,7 @@ module Rumale
  [weight, 0.0]
  end
  end
+ # :nocov:
  end
  end
  end
@@ -99,6 +99,61 @@ module Rumale
  2.fdiv(y.shape[0]) * (out - y)
  end
  end
+
+ # @!visibility private
+ # LogLoss is a class that calculates logistic loss for logistic regression.
+ class LogLoss
+ # @!visibility private
+ def loss(out, y)
+ Numo::NMath.log(1 + Numo::NMath.exp(-y * out)).sum.fdiv(y.shape[0])
+ end
+
+ # @!visibility private
+ def dloss(out, y)
+ y / (1 + Numo::NMath.exp(-y * out)) - y
+ end
+ end
+
+ # @!visibility private
+ # HingeLoss is a class that calculates hinge loss for support vector classifier.
+ class HingeLoss
+ # @!visibility private
+ def loss(out, y)
+ out.class.maximum(0.0, 1 - y * out).sum.fdiv(y.shape[0])
+ end
+
+ # @!visibility private
+ def dloss(out, y)
+ tids = (y * out).lt(1)
+ d = Numo::DFloat.zeros(y.shape[0])
+ d[tids] = -y[tids] if tids.count.positive?
+ d
+ end
+ end
+
+ # @!visibility private
+ # EpsilonInsensitive is a class that calculates epsilon-insensitive loss for support vector regressor.
+ class EpsilonInsensitive
+ # @!visibility private
+ def initialize(epsilon: 0.1)
+ @epsilon = epsilon
+ end
+
+ # @!visibility private
+ def loss(out, y)
+ out.class.maximum(0.0, (y - out).abs - @epsilon).sum.fdiv(y.shape[0])
+ end
+
+ # @!visibility private
+ def dloss(out, y)
+ d = Numo::DFloat.zeros(y.shape[0])
+ tids = (out - y).gt(@epsilon)
+ d[tids] = 1 if tids.count.positive?
+ tids = (y - out).gt(@epsilon)
+ d[tids] = -1 if tids.count.positive?
+ d
+ end
+ end
  end

  # BaseSGD is an abstract class for implementation of linear model with mini-batch stochastic gradient descent (SGD) optimization.
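As a sanity check of the loss classes added above, here is a toy snippet that evaluates the hinge loss the same way `HingeLoss#loss` does. It is an illustrative sketch assuming only the numo-narray gem; the input values are made up.

```ruby
require 'numo/narray'

# out: raw decision values, y: labels in {-1, +1}.
# Hinge loss averages max(0, 1 - y * out) over the batch.
out = Numo::DFloat[2.0, -0.5, 0.3]
y   = Numo::DFloat[1, -1, 1]
loss = Numo::DFloat.maximum(0.0, 1 - y * out).sum.fdiv(y.shape[0])
puts loss # => 0.4 (per-sample hinge values: 0.0, 0.5, 0.7)
```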
@@ -59,13 +59,13 @@ module Rumale
  # @param random_seed [Integer] The seed value using to initialize the random generator.
  def initialize(learning_rate: 0.01, decay: nil, momentum: 0.9,
  reg_param: 1.0, l1_ratio: 0.5, fit_bias: true, bias_scale: 1.0,
- max_iter: 100, batch_size: 50, tol: 1e-4,
+ max_iter: 200, batch_size: 50, tol: 1e-4,
  n_jobs: nil, verbose: false, random_seed: nil)
  check_params_numeric(learning_rate: learning_rate, momentum: momentum,
  reg_param: reg_param, l1_ratio: l1_ratio, bias_scale: bias_scale,
  max_iter: max_iter, batch_size: batch_size, tol: tol)
  check_params_boolean(fit_bias: fit_bias, verbose: verbose)
- check_params_numeric_or_nil(decay: nil, n_jobs: n_jobs, random_seed: random_seed)
+ check_params_numeric_or_nil(decay: decay, n_jobs: n_jobs, random_seed: random_seed)
  check_params_positive(learning_rate: learning_rate, reg_param: reg_param, max_iter: max_iter, batch_size: batch_size)
  super()
  @params.merge!(method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h)
@@ -1,6 +1,6 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'rumale/linear_model/base_linear_model'
3
+ require 'rumale/linear_model/base_sgd'
4
4
  require 'rumale/base/regressor'
5
5
 
6
6
  module Rumale
@@ -10,14 +10,15 @@ module Rumale
10
10
  #
11
11
  # @example
12
12
  # estimator =
13
- # Rumale::LinearModel::Lasso.new(reg_param: 0.1, max_iter: 1000, batch_size: 20, random_seed: 1)
13
+ # Rumale::LinearModel::Lasso.new(reg_param: 0.1, max_iter: 500, batch_size: 20, random_seed: 1)
14
14
  # estimator.fit(training_samples, traininig_values)
15
15
  # results = estimator.predict(testing_samples)
16
16
  #
17
17
  # *Reference*
18
18
  # - S. Shalev-Shwartz and Y. Singer, "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM," Proc. ICML'07, pp. 807--814, 2007.
19
+ # - Y. Tsuruoka, J. Tsujii, and S. Ananiadou, "Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty," Proc. ACL'09, pp. 477--485, 2009.
19
20
  # - L. Bottou, "Large-Scale Machine Learning with Stochastic Gradient Descent," Proc. COMPSTAT'10, pp. 177--186, 2010.
20
- class Lasso < BaseLinearModel
21
+ class Lasso < BaseSGD
21
22
  include Base::Regressor
22
23
 
23
24
  # Return the weight vector.
@@ -34,25 +35,43 @@ module Rumale
34
35
 
35
36
  # Create a new Lasso regressor.
36
37
  #
38
+ # @param learning_rate [Float] The initial value of learning rate.
39
+ # The learning rate decreases as the iteration proceeds according to the equation: learning_rate / (1 + decay * t).
40
+ # @param decay [Float] The smoothing parameter for decreasing learning rate as the iteration proceeds.
41
+ # If nil is given, the decay sets to 'reg_param * learning_rate'.
42
+ # @param momentum [Float] The momentum factor.
37
43
  # @param reg_param [Float] The regularization parameter.
38
44
  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
39
45
  # @param bias_scale [Float] The scale of the bias term.
40
- # @param max_iter [Integer] The maximum number of iterations.
46
+ # @param max_iter [Integer] The maximum number of epochs that indicates
47
+ # how many times the whole data is given to the training process.
41
48
  # @param batch_size [Integer] The size of the mini batches.
42
- # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
43
- # If nil is given, Nadam is used.
49
+ # @param tol [Float] The tolerance of loss for terminating optimization.
44
50
  # @param n_jobs [Integer] The number of jobs for running the fit method in parallel.
45
51
  # If nil is given, the method does not execute in parallel.
46
52
  # If zero or less is given, it becomes equal to the number of processors.
47
53
  # This parameter is ignored if the Parallel gem is not loaded.
54
+ # @param verbose [Boolean] The flag indicating whether to output loss during iteration.
48
55
  # @param random_seed [Integer] The seed value using to initialize the random generator.
49
- def initialize(reg_param: 1.0, fit_bias: false, bias_scale: 1.0, max_iter: 1000, batch_size: 10, optimizer: nil,
50
- n_jobs: nil, random_seed: nil)
51
- check_params_numeric(reg_param: reg_param, bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
52
- check_params_boolean(fit_bias: fit_bias)
53
- check_params_numeric_or_nil(n_jobs: n_jobs, random_seed: random_seed)
54
- check_params_positive(reg_param: reg_param, max_iter: max_iter, batch_size: batch_size)
55
- super
56
+ def initialize(learning_rate: 0.01, decay: nil, momentum: 0.9,
57
+ reg_param: 1.0, fit_bias: true, bias_scale: 1.0,
58
+ max_iter: 200, batch_size: 50, tol: 1e-4,
59
+ n_jobs: nil, verbose: false, random_seed: nil)
60
+ check_params_numeric(learning_rate: learning_rate, momentum: momentum,
61
+ reg_param: reg_param, bias_scale: bias_scale,
62
+ max_iter: max_iter, batch_size: batch_size, tol: tol)
63
+ check_params_boolean(fit_bias: fit_bias, verbose: verbose)
64
+ check_params_numeric_or_nil(decay: decay, n_jobs: n_jobs, random_seed: random_seed)
65
+ check_params_positive(learning_rate: learning_rate, reg_param: reg_param, max_iter: max_iter, batch_size: batch_size)
66
+ super()
67
+ @params.merge!(method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h)
68
+ @params[:decay] ||= @params[:reg_param] * @params[:learning_rate]
69
+ @params[:random_seed] ||= srand
70
+ @rng = Random.new(@params[:random_seed])
71
+ @penalty_type = L1_PENALTY
72
+ @loss_func = LinearModel::Loss::MeanSquaredError.new
73
+ @weight_vec = nil
74
+ @bias_term = nil
56
75
  end
57
76
 
58
77
  # Fit the model with given training data.
@@ -91,54 +110,6 @@ module Rumale
91
110
  x = check_convert_sample_array(x)
92
111
  x.dot(@weight_vec.transpose) + @bias_term
93
112
  end
94
-
95
- # Dump marshal data.
96
- # @return [Hash] The marshal data about Lasso.
97
- def marshal_dump
98
- { params: @params,
99
- weight_vec: @weight_vec,
100
- bias_term: @bias_term,
101
- rng: @rng }
102
- end
103
-
104
- # Load marshal data.
105
- # @return [nil]
106
- def marshal_load(obj)
107
- @params = obj[:params]
108
- @weight_vec = obj[:weight_vec]
109
- @bias_term = obj[:bias_term]
110
- @rng = obj[:rng]
111
- nil
112
- end
113
-
114
- private
115
-
116
- def partial_fit(x, y)
117
- n_features = @params[:fit_bias] ? x.shape[1] + 1 : x.shape[1]
118
- @left_weight = Numo::DFloat.zeros(n_features)
119
- @right_weight = Numo::DFloat.zeros(n_features)
120
- @left_optimizer = @params[:optimizer].dup
121
- @right_optimizer = @params[:optimizer].dup
122
- super
123
- end
124
-
125
- def calc_loss_gradient(x, y, weight)
126
- 2.0 * (x.dot(weight) - y)
127
- end
128
-
129
- def calc_new_weight(_optimizer, x, _weight, loss_gradient)
130
- @left_weight = round_weight(@left_optimizer.call(@left_weight, calc_weight_gradient(loss_gradient, x)))
131
- @right_weight = round_weight(@right_optimizer.call(@right_weight, calc_weight_gradient(-loss_gradient, x)))
132
- @left_weight - @right_weight
133
- end
134
-
135
- def calc_weight_gradient(loss_gradient, data)
136
- ((@params[:reg_param] + loss_gradient).expand_dims(1) * data).mean(0)
137
- end
138
-
139
- def round_weight(weight)
140
- 0.5 * (weight + weight.abs)
141
- end
142
113
  end
143
114
  end
144
115
  end
@@ -1,16 +1,16 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'rumale/linear_model/base_linear_model'
3
+ require 'rumale/linear_model/base_sgd'
4
4
  require 'rumale/base/regressor'
5
5
 
6
6
  module Rumale
7
7
  module LinearModel
8
8
  # LinearRegression is a class that implements ordinary least square linear regression
9
- # with mini-batch stochastic gradient descent optimization or singular value decomposition.
9
+ # with stochastic gradient descent (SGD) optimization or singular value decomposition (SVD).
10
10
  #
11
11
  # @example
12
12
  # estimator =
13
- # Rumale::LinearModel::LinearRegression.new(max_iter: 1000, batch_size: 20, random_seed: 1)
13
+ # Rumale::LinearModel::LinearRegression.new(max_iter: 500, batch_size: 20, random_seed: 1)
14
14
  # estimator.fit(training_samples, traininig_values)
15
15
  # results = estimator.predict(testing_samples)
16
16
  #
@@ -19,7 +19,10 @@ module Rumale
19
19
  # estimator = Rumale::LinearModel::LinearRegression.new(solver: 'svd')
20
20
  # estimator.fit(training_samples, traininig_values)
21
21
  # results = estimator.predict(testing_samples)
22
- class LinearRegression < BaseLinearModel
22
+ #
23
+ # *Reference*
24
+ # - L. Bottou, "Large-Scale Machine Learning with Stochastic Gradient Descent," Proc. COMPSTAT'10, pp. 177--186, 2010.
25
+ class LinearRegression < BaseSGD
23
26
  include Base::Regressor
24
27
 
25
28
  # Return the weight vector.
@@ -36,34 +39,57 @@ module Rumale
36
39
 
37
40
  # Create a new ordinary least square linear regressor.
38
41
  #
42
+ # @param learning_rate [Float] The initial value of learning rate.
43
+ # The learning rate decreases as the iteration proceeds according to the equation: learning_rate / (1 + decay * t).
44
+ # If solver = 'svd', this parameter is ignored.
45
+ # @param decay [Float] The smoothing parameter for decreasing learning rate as the iteration proceeds.
46
+ # If nil is given, the decay sets to 'learning_rate'.
47
+ # If solver = 'svd', this parameter is ignored.
48
+ # @param momentum [Float] The momentum factor.
49
+ # If solver = 'svd', this parameter is ignored.
39
50
  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
40
51
  # @param bias_scale [Float] The scale of the bias term.
41
- # @param max_iter [Integer] The maximum number of iterations.
52
+ # @param max_iter [Integer] The maximum number of epochs that indicates
53
+ # how many times the whole data is given to the training process.
42
54
  # If solver = 'svd', this parameter is ignored.
43
55
  # @param batch_size [Integer] The size of the mini batches.
44
56
  # If solver = 'svd', this parameter is ignored.
45
- # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
46
- # If nil is given, Nadam is used.
57
+ # @param tol [Float] The tolerance of loss for terminating optimization.
47
58
  # If solver = 'svd', this parameter is ignored.
48
- # @param solver [String] The algorithm to calculate weights. ('sgd' or 'svd').
59
+ # @param solver [String] The algorithm to calculate weights. ('auto', 'sgd' or 'svd').
60
+ # 'auto' chooses the 'svd' solver if Numo::Linalg is loaded. Otherwise, it chooses the 'sgd' solver.
49
61
  # 'sgd' uses the stochastic gradient descent optimization.
50
62
  # 'svd' performs singular value decomposition of samples.
51
63
  # @param n_jobs [Integer] The number of jobs for running the fit method in parallel.
52
64
  # If nil is given, the method does not execute in parallel.
53
65
  # If zero or less is given, it becomes equal to the number of processors.
54
66
  # This parameter is ignored if the Parallel gem is not loaded.
67
+ # @param verbose [Boolean] The flag indicating whether to output loss during iteration.
68
+ # If solver = 'svd', this parameter is ignored.
55
69
  # @param random_seed [Integer] The seed value using to initialize the random generator.
56
- def initialize(fit_bias: false, bias_scale: 1.0, max_iter: 1000, batch_size: 10, optimizer: nil,
57
- solver: 'sgd', n_jobs: nil, random_seed: nil)
58
- check_params_numeric(bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
59
- check_params_boolean(fit_bias: fit_bias)
70
+ def initialize(learning_rate: 0.01, decay: nil, momentum: 0.9,
71
+ fit_bias: true, bias_scale: 1.0, max_iter: 200, batch_size: 50, tol: 1e-4,
72
+ solver: 'auto',
73
+ n_jobs: nil, verbose: false, random_seed: nil)
74
+ check_params_numeric(learning_rate: learning_rate, momentum: momentum,
75
+ bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
76
+ check_params_boolean(fit_bias: fit_bias, verbose: verbose)
60
77
  check_params_string(solver: solver)
61
- check_params_numeric_or_nil(n_jobs: n_jobs, random_seed: random_seed)
62
- check_params_positive(max_iter: max_iter, batch_size: batch_size)
63
- keywd_args = method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h.merge(reg_param: 0.0)
64
- keywd_args.delete(:solver)
65
- super(**keywd_args)
66
- @params[:solver] = solver != 'svd' ? 'sgd' : 'svd'
78
+ check_params_numeric_or_nil(decay: decay, n_jobs: n_jobs, random_seed: random_seed)
79
+ check_params_positive(learning_rate: learning_rate, max_iter: max_iter, batch_size: batch_size)
80
+ super()
81
+ @params.merge!(method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h)
82
+ @params[:solver] = if solver == 'auto'
83
+ load_linalg? ? 'svd' : 'sgd'
84
+ else
85
+ solver != 'svd' ? 'sgd' : 'svd'
86
+ end
87
+ @params[:decay] ||= @params[:learning_rate]
88
+ @params[:random_seed] ||= srand
89
+ @rng = Random.new(@params[:random_seed])
90
+ @loss_func = LinearModel::Loss::MeanSquaredError.new
91
+ @weight_vec = nil
92
+ @bias_term = nil
67
93
  end
68
94
 
69
95
  # Fit the model with given training data.
@@ -94,33 +120,12 @@ module Rumale
94
120
  x.dot(@weight_vec.transpose) + @bias_term
95
121
  end
96
122
 
97
- # Dump marshal data.
98
- # @return [Hash] The marshal data about LinearRegression.
99
- def marshal_dump
100
- { params: @params,
101
- weight_vec: @weight_vec,
102
- bias_term: @bias_term,
103
- rng: @rng }
104
- end
105
-
106
- # Load marshal data.
107
- # @return [nil]
108
- def marshal_load(obj)
109
- @params = obj[:params]
110
- @weight_vec = obj[:weight_vec]
111
- @bias_term = obj[:bias_term]
112
- @rng = obj[:rng]
113
- nil
114
- end
115
-
116
123
  private
117
124
 
118
125
  def fit_svd(x, y)
119
- samples = @params[:fit_bias] ? expand_feature(x) : x
126
+ x = expand_feature(x) if fit_bias?
120
127
 
121
- s, u, vt = Numo::Linalg.svd(samples, driver: 'sdd', job: 'S')
122
- d = (s / s**2).diag
123
- w = vt.transpose.dot(d).dot(u.transpose).dot(y)
128
+ w = Numo::Linalg.pinv(x, driver: 'svd').dot(y)
124
129
 
125
130
  is_single_target_vals = y.shape[1].nil?
126
131
  if @params[:fit_bias]
@@ -150,8 +155,14 @@ module Rumale
150
155
  end
151
156
  end
152
157
 
153
- def calc_loss_gradient(x, y, weight)
154
- 2.0 * (x.dot(weight) - y)
158
+ def fit_bias?
159
+ @params[:fit_bias] == true
160
+ end
161
+
162
+ def load_linalg?
163
+ return false if defined?(Numo::Linalg).nil?
164
+ return false if Numo::Linalg::VERSION < '0.1.4'
165
+ true
155
166
  end
156
167
  end
157
168
  end
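The 'svd' solver hunk above replaces the hand-rolled SVD reconstruction with a pseudo-inverse. A minimal standalone sketch of that least-squares computation follows; it assumes the numo-narray and numo-linalg gems are installed, and the data is made up for illustration.

```ruby
require 'numo/narray'
require 'numo/linalg/autoloader'

# Ordinary least squares via the Moore-Penrose pseudo-inverse,
# the same call the 'svd' solver now uses: w = pinv(X).dot(y).
x = Numo::DFloat.new(100, 3).rand
w_true = Numo::DFloat[1.5, -2.0, 0.5]
y = x.dot(w_true)

w = Numo::Linalg.pinv(x, driver: 'svd').dot(y)
# w approximately recovers w_true on this noise-free toy data
```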
@@ -1,12 +1,12 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'rumale/linear_model/base_linear_model'
3
+ require 'rumale/linear_model/base_sgd'
4
4
  require 'rumale/base/classifier'
5
5
 
6
6
  module Rumale
7
7
  module LinearModel
8
8
  # LogisticRegression is a class that implements Logistic Regression
9
- # with mini-batch stochastic gradient descent optimization.
9
+ # with stochastic gradient descent optimization.
10
10
  # For multiclass classification problem, it uses one-vs-the-rest strategy.
11
11
  #
12
12
  # Rumale::SVM provides Logistic Regression based on LIBLINEAR.
@@ -15,13 +15,15 @@ module Rumale
15
15
  #
16
16
  # @example
17
17
  # estimator =
18
- # Rumale::LinearModel::LogisticRegression.new(reg_param: 1.0, max_iter: 1000, batch_size: 20, random_seed: 1)
18
+ # Rumale::LinearModel::LogisticRegression.new(reg_param: 1.0, max_iter: 200, batch_size: 50, random_seed: 1)
19
19
  # estimator.fit(training_samples, traininig_labels)
20
20
  # results = estimator.predict(testing_samples)
21
21
  #
22
22
  # *Reference*
23
23
  # - S. Shalev-Shwartz, Y. Singer, N. Srebro, and A. Cotter, "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM," Mathematical Programming, vol. 127 (1), pp. 3--30, 2011.
24
- class LogisticRegression < BaseLinearModel
24
+ # - Y. Tsuruoka, J. Tsujii, and S. Ananiadou, "Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty," Proc. ACL'09, pp. 477--485, 2009.
25
+ # - L. Bottou, "Large-Scale Machine Learning with Stochastic Gradient Descent," Proc. COMPSTAT'10, pp. 177--186, 2010.
26
+ class LogisticRegression < BaseSGD
25
27
  include Base::Classifier
26
28
 
27
29
  # Return the weight vector for Logistic Regression.
@@ -42,26 +44,53 @@ module Rumale
42
44
 
43
45
  # Create a new classifier with Logisitc Regression by the SGD optimization.
44
46
  #
47
+ # @param learning_rate [Float] The initial value of learning rate.
48
+ # The learning rate decreases as the iteration proceeds according to the equation: learning_rate / (1 + decay * t).
49
+ # @param decay [Float] The smoothing parameter for decreasing learning rate as the iteration proceeds.
50
+ # If nil is given, the decay sets to 'reg_param * learning_rate'.
51
+ # @param momentum [Float] The momentum factor.
52
+ # @param penalty [String] The regularization type to be used ('l1', 'l2', and 'elasticnet').
53
+ # @param l1_ratio [Float] The elastic-net type regularization mixing parameter.
54
+ # If penalty set to 'l2' or 'l1', this parameter is ignored.
55
+ # If l1_ratio = 1, the regularization is similar to Lasso.
56
+ # If l1_ratio = 0, the regularization is similar to Ridge.
57
+ # If 0 < l1_ratio < 1, the regularization is a combination of L1 and L2.
45
58
  # @param reg_param [Float] The regularization parameter.
46
59
  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
47
60
  # @param bias_scale [Float] The scale of the bias term.
48
61
  # If fit_bias is true, the feature vector v becoms [v; bias_scale].
49
- # @param max_iter [Integer] The maximum number of iterations.
62
+ # @param max_iter [Integer] The maximum number of epochs that indicates
63
+ # how many times the whole data is given to the training process.
50
64
  # @param batch_size [Integer] The size of the mini batches.
51
- # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
52
- # If nil is given, Nadam is used.
65
+ # @param tol [Float] The tolerance of loss for terminating optimization.
53
66
  # @param n_jobs [Integer] The number of jobs for running the fit and predict methods in parallel.
54
67
  # If nil is given, the methods do not execute in parallel.
55
68
  # If zero or less is given, it becomes equal to the number of processors.
56
69
  # This parameter is ignored if the Parallel gem is not loaded.
70
+ # @param verbose [Boolean] The flag indicating whether to output loss during iteration.
57
71
  # @param random_seed [Integer] The seed value using to initialize the random generator.
58
- def initialize(reg_param: 1.0, fit_bias: false, bias_scale: 1.0,
59
- max_iter: 1000, batch_size: 20, optimizer: nil, n_jobs: nil, random_seed: nil)
60
- check_params_numeric(reg_param: reg_param, bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
61
- check_params_boolean(fit_bias: fit_bias)
62
- check_params_numeric_or_nil(n_jobs: n_jobs, random_seed: random_seed)
63
- check_params_positive(reg_param: reg_param, bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
64
- super
72
+ def initialize(learning_rate: 0.01, decay: nil, momentum: 0.9,
73
+ penalty: 'l2', reg_param: 1.0, l1_ratio: 0.5,
74
+ fit_bias: true, bias_scale: 1.0,
75
+ max_iter: 200, batch_size: 50, tol: 1e-4,
76
+ n_jobs: nil, verbose: false, random_seed: nil)
77
+ check_params_numeric(learning_rate: learning_rate, momentum: momentum,
78
+ reg_param: reg_param, l1_ratio: l1_ratio, bias_scale: bias_scale,
79
+ max_iter: max_iter, batch_size: batch_size, tol: tol)
80
+ check_params_boolean(fit_bias: fit_bias, verbose: verbose)
81
+ check_params_string(penalty: penalty)
82
+ check_params_numeric_or_nil(decay: decay, n_jobs: n_jobs, random_seed: random_seed)
83
+ check_params_positive(learning_rate: learning_rate, reg_param: reg_param,
84
+ bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
85
+ super()
86
+ @params.merge!(method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h)
87
+ @params[:decay] ||= @params[:reg_param] * @params[:learning_rate]
88
+ @params[:random_seed] ||= srand
89
+ @rng = Random.new(@params[:random_seed])
90
+ @penalty_type = @params[:penalty]
91
+ @loss_func = LinearModel::Loss::LogLoss.new
92
+ @weight_vec = nil
93
+ @bias_term = nil
65
94
  @classes = nil
66
95
  end
67
96
 
@@ -148,33 +177,8 @@ module Rumale
148
177
  probs
149
178
  end
150
179
 
151
- # Dump marshal data.
152
- # @return [Hash] The marshal data about LogisticRegression.
153
- def marshal_dump
154
- { params: @params,
155
- weight_vec: @weight_vec,
156
- bias_term: @bias_term,
157
- classes: @classes,
158
- rng: @rng }
159
- end
160
-
161
- # Load marshal data.
162
- # @return [nil]
163
- def marshal_load(obj)
164
- @params = obj[:params]
165
- @weight_vec = obj[:weight_vec]
166
- @bias_term = obj[:bias_term]
167
- @classes = obj[:classes]
168
- @rng = obj[:rng]
169
- nil
170
- end
171
-
172
180
  private
173
181
 
174
- def calc_loss_gradient(x, y, weight)
175
- y / (Numo::NMath.exp(-y * x.dot(weight)) + 1.0) - y
176
- end
177
-
178
182
  def multiclass_problem?
179
183
  @classes.size > 2
180
184
  end
@@ -1,16 +1,16 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'rumale/linear_model/base_linear_model'
3
+ require 'rumale/linear_model/base_sgd'
4
4
  require 'rumale/base/regressor'
5
5
 
6
6
  module Rumale
7
7
  module LinearModel
8
8
  # Ridge is a class that implements Ridge Regression
9
- # with mini-batch stochastic gradient descent optimization or singular value decomposition.
9
+ # with stochastic gradient descent (SGD) optimization or singular value decomposition (SVD).
10
10
  #
11
11
  # @example
12
12
  # estimator =
13
- # Rumale::LinearModel::Ridge.new(reg_param: 0.1, max_iter: 1000, batch_size: 20, random_seed: 1)
13
+ # Rumale::LinearModel::Ridge.new(reg_param: 0.1, max_iter: 500, batch_size: 20, random_seed: 1)
14
14
  # estimator.fit(training_samples, traininig_values)
15
15
  # results = estimator.predict(testing_samples)
16
16
  #
@@ -19,7 +19,10 @@ module Rumale
19
19
  # estimator = Rumale::LinearModel::Ridge.new(reg_param: 0.1, solver: 'svd')
20
20
  # estimator.fit(training_samples, traininig_values)
21
21
  # results = estimator.predict(testing_samples)
22
- class Ridge < BaseLinearModel
22
+ #
23
+ # *Reference*
24
+ # - L. Bottou, "Large-Scale Machine Learning with Stochastic Gradient Descent," Proc. COMPSTAT'10, pp. 177--186, 2010.
25
+ class Ridge < BaseSGD
23
26
  include Base::Regressor
24
27
 
25
28
  # Return the weight vector.
@@ -36,35 +39,61 @@ module Rumale
36
39
 
37
40
  # Create a new Ridge regressor.
38
41
  #
42
+ # @param learning_rate [Float] The initial value of learning rate.
43
+ # The learning rate decreases as the iteration proceeds according to the equation: learning_rate / (1 + decay * t).
44
+ # If solver = 'svd', this parameter is ignored.
45
+ # @param decay [Float] The smoothing parameter for decreasing learning rate as the iteration proceeds.
46
+ # If nil is given, the decay sets to 'reg_param * learning_rate'.
47
+ # If solver = 'svd', this parameter is ignored.
48
+ # @param momentum [Float] The momentum factor.
49
+ # If solver = 'svd', this parameter is ignored.
39
50
  # @param reg_param [Float] The regularization parameter.
40
51
  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
41
52
  # @param bias_scale [Float] The scale of the bias term.
42
- # @param max_iter [Integer] The maximum number of iterations.
53
+ # @param max_iter [Integer] The maximum number of epochs that indicates
54
+ # how many times the whole data is given to the training process.
43
55
  # If solver = 'svd', this parameter is ignored.
44
56
  # @param batch_size [Integer] The size of the mini batches.
45
57
  # If solver = 'svd', this parameter is ignored.
46
- # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
47
- # If nil is given, Nadam is used.
58
+ # @param tol [Float] The tolerance of loss for terminating optimization.
48
59
  # If solver = 'svd', this parameter is ignored.
49
- # @param solver [String] The algorithm to calculate weights. ('sgd' or 'svd').
60
+ # @param solver [String] The algorithm to calculate weights. ('auto', 'sgd' or 'svd').
61
+ # 'auto' chooses the 'svd' solver if Numo::Linalg is loaded. Otherwise, it chooses the 'sgd' solver.
50
62
  # 'sgd' uses the stochastic gradient descent optimization.
51
63
  # 'svd' performs singular value decomposition of samples.
52
64
  # @param n_jobs [Integer] The number of jobs for running the fit method in parallel.
53
65
  # If nil is given, the method does not execute in parallel.
54
66
  # If zero or less is given, it becomes equal to the number of processors.
55
67
  # This parameter is ignored if the Parallel gem is not loaded or the solver is 'svd'.
68
+ # @param verbose [Boolean] The flag indicating whether to output loss during iteration.
69
+ # If solver = 'svd', this parameter is ignored.
56
70
  # @param random_seed [Integer] The seed value using to initialize the random generator.
57
- def initialize(reg_param: 1.0, fit_bias: false, bias_scale: 1.0, max_iter: 1000, batch_size: 10, optimizer: nil,
58
- solver: 'sgd', n_jobs: nil, random_seed: nil)
59
- check_params_numeric(reg_param: reg_param, bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
60
- check_params_boolean(fit_bias: fit_bias)
71
+ def initialize(learning_rate: 0.01, decay: nil, momentum: 0.9,
72
+ reg_param: 1.0, fit_bias: true, bias_scale: 1.0,
73
+ max_iter: 200, batch_size: 50, tol: 1e-4,
74
+ solver: 'auto',
75
+ n_jobs: nil, verbose: false, random_seed: nil)
76
+ check_params_numeric(learning_rate: learning_rate, momentum: momentum,
77
+ reg_param: reg_param, bias_scale: bias_scale,
78
+ max_iter: max_iter, batch_size: batch_size, tol: tol)
79
+ check_params_boolean(fit_bias: fit_bias, verbose: verbose)
61
80
  check_params_string(solver: solver)
62
- check_params_numeric_or_nil(n_jobs: n_jobs, random_seed: random_seed)
63
- check_params_positive(reg_param: reg_param, max_iter: max_iter, batch_size: batch_size)
64
- keywd_args = method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h
65
- keywd_args.delete(:solver)
66
- super(**keywd_args)
67
- @params[:solver] = solver != 'svd' ? 'sgd' : 'svd'
81
+ check_params_numeric_or_nil(decay: decay, n_jobs: n_jobs, random_seed: random_seed)
82
+ check_params_positive(learning_rate: learning_rate, reg_param: reg_param, max_iter: max_iter, batch_size: batch_size)
83
+ super()
84
+ @params.merge!(method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h)
85
+ @params[:solver] = if solver == 'auto'
86
+ load_linalg? ? 'svd' : 'sgd'
87
+ else
88
+ solver != 'svd' ? 'sgd' : 'svd'
89
+ end
90
+ @params[:decay] ||= @params[:reg_param] * @params[:learning_rate]
91
+ @params[:random_seed] ||= srand
92
+ @rng = Random.new(@params[:random_seed])
93
+ @penalty_type = L2_PENALTY
94
+ @loss_func = LinearModel::Loss::MeanSquaredError.new
95
+ @weight_vec = nil
96
+ @bias_term = nil
68
97
  end
69
98
 
70
99
  # Fit the model with given training data.
@@ -95,25 +124,6 @@ module Rumale
95
124
  x.dot(@weight_vec.transpose) + @bias_term
96
125
  end
97
126
 
98
- # Dump marshal data.
99
- # @return [Hash] The marshal data about Ridge.
100
- def marshal_dump
101
- { params: @params,
102
- weight_vec: @weight_vec,
103
- bias_term: @bias_term,
104
- rng: @rng }
105
- end
106
-
107
- # Load marshal data.
108
- # @return [nil]
109
- def marshal_load(obj)
110
- @params = obj[:params]
111
- @weight_vec = obj[:weight_vec]
112
- @bias_term = obj[:bias_term]
113
- @rng = obj[:rng]
114
- nil
115
- end
116
-
117
127
  private
118
128
 
119
129
  def fit_svd(x, y)
@@ -151,8 +161,10 @@ module Rumale
151
161
  end
152
162
  end
153
163
 
154
- def calc_loss_gradient(x, y, weight)
155
- 2.0 * (x.dot(weight) - y)
164
+ def load_linalg?
165
+ return false if defined?(Numo::Linalg).nil?
166
+ return false if Numo::Linalg::VERSION < '0.1.4'
167
+ true
156
168
  end
157
169
  end
158
170
  end
@@ -1,6 +1,6 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'rumale/linear_model/base_linear_model'
3
+ require 'rumale/linear_model/base_sgd'
4
4
  require 'rumale/base/classifier'
5
5
  require 'rumale/probabilistic_output'
6
6
 
@@ -8,7 +8,7 @@ module Rumale
8
8
  # This module consists of the classes that implement generalized linear models.
9
9
  module LinearModel
10
10
  # SVC is a class that implements Support Vector Classifier
11
- # with mini-batch stochastic gradient descent optimization.
11
+ # with stochastic gradient descent optimization.
12
12
  # For multiclass classification problem, it uses one-vs-the-rest strategy.
13
13
  #
14
14
  # Rumale::SVM provides linear support vector classifier based on LIBLINEAR.
@@ -17,13 +17,15 @@ module Rumale
17
17
  #
18
18
  # @example
19
19
  # estimator =
20
- # Rumale::LinearModel::SVC.new(reg_param: 1.0, max_iter: 1000, batch_size: 20, random_seed: 1)
20
+ # Rumale::LinearModel::SVC.new(reg_param: 1.0, max_iter: 200, batch_size: 50, random_seed: 1)
21
21
  # estimator.fit(training_samples, traininig_labels)
22
22
  # results = estimator.predict(testing_samples)
23
23
  #
24
24
  # *Reference*
25
25
  # - S. Shalev-Shwartz and Y. Singer, "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM," Proc. ICML'07, pp. 807--814, 2007.
26
- class SVC < BaseLinearModel
26
+ # - Y. Tsuruoka, J. Tsujii, and S. Ananiadou, "Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty," Proc. ACL'09, pp. 477--485, 2009.
27
+ # - L. Bottou, "Large-Scale Machine Learning with Stochastic Gradient Descent," Proc. COMPSTAT'10, pp. 177--186, 2010.
28
+ class SVC < BaseSGD
27
29
  include Base::Classifier
28
30
 
29
31
  # Return the weight vector for SVC.
@@ -44,31 +46,56 @@ module Rumale
44
46
 
45
47
  # Create a new classifier with Support Vector Machine by the SGD optimization.
46
48
  #
49
+ # @param learning_rate [Float] The initial value of learning rate.
50
+ # The learning rate decreases as the iteration proceeds according to the equation: learning_rate / (1 + decay * t).
51
+ # @param decay [Float] The smoothing parameter for decreasing learning rate as the iteration proceeds.
52
+ # If nil is given, the decay sets to 'reg_param * learning_rate'.
53
+ # @param momentum [Float] The momentum factor.
54
+ # @param penalty [String] The regularization type to be used ('l1', 'l2', and 'elasticnet').
55
+ # @param l1_ratio [Float] The elastic-net type regularization mixing parameter.
56
+ # If penalty set to 'l2' or 'l1', this parameter is ignored.
57
+ # If l1_ratio = 1, the regularization is similar to Lasso.
58
+ # If l1_ratio = 0, the regularization is similar to Ridge.
59
+ # If 0 < l1_ratio < 1, the regularization is a combination of L1 and L2.
47
60
  # @param reg_param [Float] The regularization parameter.
48
61
  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
49
62
  # @param bias_scale [Float] The scale of the bias term.
50
- # @param max_iter [Integer] The maximum number of iterations.
63
+ # @param max_iter [Integer] The maximum number of epochs that indicates
64
+ # how many times the whole data is given to the training process.
51
65
  # @param batch_size [Integer] The size of the mini batches.
66
+ # @param tol [Float] The tolerance of loss for terminating optimization.
52
67
  # @param probability [Boolean] The flag indicating whether to perform probability estimation.
53
- # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
54
- # If nil is given, Nadam is used.
55
68
  # @param n_jobs [Integer] The number of jobs for running the fit and predict methods in parallel.
56
69
  # If nil is given, the methods do not execute in parallel.
57
70
  # If zero or less is given, it becomes equal to the number of processors.
58
71
  # This parameter is ignored if the Parallel gem is not loaded.
72
+ # @param verbose [Boolean] The flag indicating whether to output loss during iteration.
59
73
  # @param random_seed [Integer] The seed value using to initialize the random generator.
60
- def initialize(reg_param: 1.0, fit_bias: false, bias_scale: 1.0,
61
- max_iter: 1000, batch_size: 20, probability: false, optimizer: nil, n_jobs: nil, random_seed: nil)
62
- check_params_numeric(reg_param: reg_param, bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
63
- check_params_boolean(fit_bias: fit_bias, probability: probability)
64
- check_params_numeric_or_nil(n_jobs: n_jobs, random_seed: random_seed)
65
- check_params_positive(reg_param: reg_param, bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
66
- keywd_args = method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h
67
- keywd_args.delete(:probability)
68
- super(**keywd_args)
69
- @params[:probability] = probability
70
- @prob_param = nil
74
+ def initialize(learning_rate: 0.01, decay: nil, momentum: 0.9,
75
+ penalty: 'l2', reg_param: 1.0, l1_ratio: 0.5,
76
+ fit_bias: true, bias_scale: 1.0,
77
+ max_iter: 200, batch_size: 50, tol: 1e-4,
78
+ probability: false,
79
+ n_jobs: nil, verbose: false, random_seed: nil)
80
+ check_params_numeric(learning_rate: learning_rate, momentum: momentum,
81
+ reg_param: reg_param, l1_ratio: l1_ratio, bias_scale: bias_scale,
82
+ max_iter: max_iter, batch_size: batch_size, tol: tol)
83
+ check_params_boolean(fit_bias: fit_bias, verbose: verbose, probability: probability)
84
+ check_params_string(penalty: penalty)
85
+ check_params_numeric_or_nil(decay: decay, n_jobs: n_jobs, random_seed: random_seed)
86
+ check_params_positive(learning_rate: learning_rate, reg_param: reg_param,
87
+ bias_scale: bias_scale, max_iter: max_iter, batch_size: batch_size)
88
+ super()
89
+ @params.merge!(method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h)
90
+ @params[:decay] ||= @params[:reg_param] * @params[:learning_rate]
91
+ @params[:random_seed] ||= srand
92
+ @rng = Random.new(@params[:random_seed])
93
+ @penalty_type = @params[:penalty]
94
+ @loss_func = LinearModel::Loss::HingeLoss.new
95
+ @weight_vec = nil
96
+ @bias_term = nil
71
97
  @classes = nil
98
+ @prob_param = nil
72
99
  end
73
100
 
74
101
  # Fit the model with given training data.
@@ -165,29 +192,6 @@ module Rumale
165
192
  end
166
193
  end
167
194
 
168
- # Dump marshal data.
169
- # @return [Hash] The marshal data about SVC.
170
- def marshal_dump
171
- { params: @params,
172
- weight_vec: @weight_vec,
173
- bias_term: @bias_term,
174
- prob_param: @prob_param,
175
- classes: @classes,
176
- rng: @rng }
177
- end
178
-
179
- # Load marshal data.
180
- # @return [nil]
181
- def marshal_load(obj)
182
- @params = obj[:params]
183
- @weight_vec = obj[:weight_vec]
184
- @bias_term = obj[:bias_term]
185
- @prob_param = obj[:prob_param]
186
- @classes = obj[:classes]
187
- @rng = obj[:rng]
188
- nil
189
- end
190
-
191
195
  private
192
196
 
193
197
  def partial_fit(x, bin_y)
@@ -200,13 +204,6 @@ module Rumale
200
204
  [w, b, p]
201
205
  end
202
206
 
203
- def calc_loss_gradient(x, y, weight)
204
- target_ids = (x.dot(weight) * y).lt(1.0).where
205
- grad = Numo::DFloat.zeros(@params[:batch_size])
206
- grad[target_ids] = -y[target_ids]
207
- grad
208
- end
209
-
210
207
  def multiclass_problem?
211
208
  @classes.size > 2
212
209
  end
@@ -1,12 +1,12 @@
1
1
  # frozen_string_literal: true
2
2
 
3
- require 'rumale/linear_model/base_linear_model'
3
+ require 'rumale/linear_model/base_sgd'
4
4
  require 'rumale/base/regressor'
5
5
 
6
6
  module Rumale
7
7
  module LinearModel
8
8
  # SVR is a class that implements Support Vector Regressor
9
- # with mini-batch stochastic gradient descent optimization.
9
+ # with stochastic gradient descent optimization.
10
10
  #
11
11
  # Rumale::SVM provides linear and kernel support vector regressor based on LIBLINEAR and LIBSVM.
12
12
  # If you prefer execution speed, you should use Rumale::SVM::LinearSVR.
@@ -14,13 +14,15 @@ module Rumale
14
14
  #
15
15
  # @example
16
16
  # estimator =
17
- # Rumale::LinearModel::SVR.new(reg_param: 1.0, epsilon: 0.1, max_iter: 1000, batch_size: 20, random_seed: 1)
17
+ # Rumale::LinearModel::SVR.new(reg_param: 1.0, epsilon: 0.1, max_iter: 200, batch_size: 50, random_seed: 1)
18
18
  # estimator.fit(training_samples, traininig_target_values)
19
19
  # results = estimator.predict(testing_samples)
20
20
  #
21
21
  # *Reference*
22
- # 1. S. Shalev-Shwartz and Y. Singer, "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM," Proc. ICML'07, pp. 807--814, 2007.
23
- class SVR < BaseLinearModel
22
+ # - S. Shalev-Shwartz and Y. Singer, "Pegasos: Primal Estimated sub-GrAdient SOlver for SVM," Proc. ICML'07, pp. 807--814, 2007.
23
+ # - Y. Tsuruoka, J. Tsujii, and S. Ananiadou, "Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty," Proc. ACL'09, pp. 477--485, 2009.
24
+ # - L. Bottou, "Large-Scale Machine Learning with Stochastic Gradient Descent," Proc. COMPSTAT'10, pp. 177--186, 2010.
25
+ class SVR < BaseSGD
24
26
  include Base::Regressor
25
27
 
26
28
  # Return the weight vector for SVR.
@@ -37,30 +39,54 @@ module Rumale
37
39
 
38
40
  # Create a new regressor with Support Vector Machine by the SGD optimization.
39
41
  #
42
+ # @param learning_rate [Float] The initial value of learning rate.
43
+ # The learning rate decreases as the iteration proceeds according to the equation: learning_rate / (1 + decay * t).
44
+ # @param decay [Float] The smoothing parameter for decreasing learning rate as the iteration proceeds.
45
+ # If nil is given, the decay sets to 'reg_param * learning_rate'.
46
+ # @param momentum [Float] The momentum factor.
47
+ # @param penalty [String] The regularization type to be used ('l1', 'l2', and 'elasticnet').
48
+ # @param l1_ratio [Float] The elastic-net type regularization mixing parameter.
49
+ # If penalty set to 'l2' or 'l1', this parameter is ignored.
50
+ # If l1_ratio = 1, the regularization is similar to Lasso.
51
+ # If l1_ratio = 0, the regularization is similar to Ridge.
52
+ # If 0 < l1_ratio < 1, the regularization is a combination of L1 and L2.
40
53
  # @param reg_param [Float] The regularization parameter.
41
54
  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
42
55
  # @param bias_scale [Float] The scale of the bias term.
43
56
  # @param epsilon [Float] The margin of tolerance.
44
- # @param max_iter [Integer] The maximum number of iterations.
57
+ # @param max_iter [Integer] The maximum number of epochs that indicates
58
+ # how many times the whole data is given to the training process.
45
59
  # @param batch_size [Integer] The size of the mini batches.
46
- # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
47
- # If nil is given, Nadam is used.
60
+ # @param tol [Float] The tolerance of loss for terminating optimization.
48
61
  # @param n_jobs [Integer] The number of jobs for running the fit method in parallel.
49
62
  # If nil is given, the method does not execute in parallel.
50
63
  # If zero or less is given, it becomes equal to the number of processors.
51
64
  # This parameter is ignored if the Parallel gem is not loaded.
65
+ # @param verbose [Boolean] The flag indicating whether to output loss during iteration.
52
66
  # @param random_seed [Integer] The seed value using to initialize the random generator.
53
- def initialize(reg_param: 1.0, fit_bias: false, bias_scale: 1.0, epsilon: 0.1,
54
- max_iter: 1000, batch_size: 20, optimizer: nil, n_jobs: nil, random_seed: nil)
55
- check_params_numeric(reg_param: reg_param, bias_scale: bias_scale, epsilon: epsilon, max_iter: max_iter, batch_size: batch_size)
56
- check_params_boolean(fit_bias: fit_bias)
57
- check_params_numeric_or_nil(n_jobs: n_jobs, random_seed: random_seed)
58
- check_params_positive(reg_param: reg_param, bias_scale: bias_scale, epsilon: epsilon,
67
+ def initialize(learning_rate: 0.01, decay: nil, momentum: 0.9,
68
+ penalty: 'l2', reg_param: 1.0, l1_ratio: 0.5,
69
+ fit_bias: true, bias_scale: 1.0,
70
+ epsilon: 0.1,
71
+ max_iter: 200, batch_size: 50, tol: 1e-4,
72
+ n_jobs: nil, verbose: false, random_seed: nil)
73
+ check_params_numeric(learning_rate: learning_rate, momentum: momentum,
74
+ reg_param: reg_param, bias_scale: bias_scale, epsilon: epsilon,
75
+ max_iter: max_iter, batch_size: batch_size, tol: tol)
76
+ check_params_boolean(fit_bias: fit_bias, verbose: verbose)
77
+ check_params_numeric_or_nil(decay: decay, n_jobs: n_jobs, random_seed: random_seed)
78
+ check_params_positive(learning_rate: learning_rate, reg_param: reg_param,
79
+ bias_scale: bias_scale, epsilon: epsilon,
59
80
  max_iter: max_iter, batch_size: batch_size)
60
- keywd_args = method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h
61
- keywd_args.delete(:epsilon)
62
- super(**keywd_args)
63
- @params[:epsilon] = epsilon
81
+ super()
82
+ @params.merge!(method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h)
83
+ @params[:decay] ||= @params[:reg_param] * @params[:learning_rate]
84
+ @params[:random_seed] ||= srand
85
+ @rng = Random.new(@params[:random_seed])
86
+ @penalty_type = @params[:penalty]
87
+ @loss_func = LinearModel::Loss::EpsilonInsensitive.new(epsilon: @params[:epsilon])
88
+ @weight_vec = nil
89
+ @bias_term = nil
64
90
  end
65
91
 
66
92
  # Fit the model with given training data.
@@ -100,35 +126,6 @@ module Rumale
100
126
  x = check_convert_sample_array(x)
101
127
  x.dot(@weight_vec.transpose) + @bias_term
102
128
  end
103
-
104
- # Dump marshal data.
105
- # @return [Hash] The marshal data about SVR.
106
- def marshal_dump
107
- { params: @params,
108
- weight_vec: @weight_vec,
109
- bias_term: @bias_term,
110
- rng: @rng }
111
- end
112
-
113
- # Load marshal data.
114
- # @return [nil]
115
- def marshal_load(obj)
116
- @params = obj[:params]
117
- @weight_vec = obj[:weight_vec]
118
- @bias_term = obj[:bias_term]
119
- @rng = obj[:rng]
120
- nil
121
- end
122
-
123
- private
124
-
125
- def calc_loss_gradient(x, y, weight)
126
- z = x.dot(weight)
127
- grad = Numo::DFloat.zeros(@params[:batch_size])
128
- grad[(z - y).gt(@params[:epsilon]).where] = 1
129
- grad[(y - z).gt(@params[:epsilon]).where] = -1
130
- grad
131
- end
132
129
  end
133
130
  end
134
131
  end
@@ -17,7 +17,8 @@ module Rumale
17
17
  # @param loss [String] The loss function ('hinge' or 'logistic' or nil).
18
18
  # @param reg_param_linear [Float] The regularization parameter for linear model.
19
19
  # @param reg_param_factor [Float] The regularization parameter for factor matrix.
20
- # @param max_iter [Integer] The maximum number of iterations.
20
+ # @param max_iter [Integer] The maximum number of epochs that indicates
21
+ # how many times the whole data is given to the training process.
21
22
  # @param batch_size [Integer] The size of the mini batches.
22
23
  # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
23
24
  # If nil is given, Nadam is used.
@@ -27,7 +28,7 @@ module Rumale
27
28
  # This parameter is ignored if the Parallel gem is not loaded.
28
29
  # @param random_seed [Integer] The seed value using to initialize the random generator.
29
30
  def initialize(n_factors: 2, loss: nil, reg_param_linear: 1.0, reg_param_factor: 1.0,
30
- max_iter: 1000, batch_size: 10, optimizer: nil, n_jobs: nil, random_seed: nil)
31
+ max_iter: 200, batch_size: 50, optimizer: nil, n_jobs: nil, random_seed: nil)
31
32
  @params = {}
32
33
  @params[:n_factors] = n_factors
33
34
  @params[:loss] = loss unless loss.nil?
@@ -51,27 +52,29 @@ module Rumale
51
52
  def partial_fit(x, y)
52
53
  # Initialize some variables.
53
54
  n_samples, n_features = x.shape
54
- rand_ids = [*0...n_samples].shuffle(random: @rng.dup)
55
+ sub_rng = @rng.dup
55
56
  weight_vec = Numo::DFloat.zeros(n_features + 1)
56
57
  factor_mat = Numo::DFloat.zeros(@params[:n_factors], n_features)
57
58
  weight_optimizer = @params[:optimizer].dup
58
59
  factor_optimizers = Array.new(@params[:n_factors]) { @params[:optimizer].dup }
59
60
  # Start optimization.
60
61
  @params[:max_iter].times do |_t|
61
- # Random sampling.
62
- subset_ids = rand_ids.shift(@params[:batch_size])
63
- rand_ids.concat(subset_ids)
64
- data = x[subset_ids, true]
65
- ex_data = expand_feature(data)
66
- targets = y[subset_ids]
67
- # Calculate gradients for loss function.
68
- loss_grad = loss_gradient(data, ex_data, targets, factor_mat, weight_vec)
69
- next if loss_grad.ne(0.0).count.zero?
70
- # Update each parameter.
71
- weight_vec = weight_optimizer.call(weight_vec, weight_gradient(loss_grad, ex_data, weight_vec))
72
- @params[:n_factors].times do |n|
73
- factor_mat[n, true] = factor_optimizers[n].call(factor_mat[n, true],
74
- factor_gradient(loss_grad, data, factor_mat[n, true]))
62
+ sample_ids = [*0...n_samples]
63
+ sample_ids.shuffle!(random: sub_rng)
64
+ until (subset_ids = sample_ids.shift(@params[:batch_size])).empty?
65
+ # Sampling.
66
+ sub_x = x[subset_ids, true]
67
+ sub_y = y[subset_ids]
68
+ ex_sub_x = expand_feature(sub_x)
69
+ # Calculate gradients for loss function.
70
+ loss_grad = loss_gradient(sub_x, ex_sub_x, sub_y, factor_mat, weight_vec)
71
+ next if loss_grad.ne(0.0).count.zero?
72
+ # Update each parameter.
73
+ weight_vec = weight_optimizer.call(weight_vec, weight_gradient(loss_grad, ex_sub_x, weight_vec))
74
+ @params[:n_factors].times do |n|
75
+ factor_mat[n, true] = factor_optimizers[n].call(factor_mat[n, true],
76
+ factor_gradient(loss_grad, sub_x, factor_mat[n, true]))
77
+ end
75
78
  end
76
79
  end
77
80
  [factor_mat, *split_weight_vec_bias(weight_vec)]
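The partial_fit hunk above switches from resampling a rotating index list to an epoch-wise pass: shuffle the sample indices once per epoch, then consume them in batch_size chunks. A minimal standalone sketch of that loop in plain Ruby (sizes are made up; the actual parameter update is elided):

```ruby
# Epoch-wise mini-batch iteration in the style of the hunk above.
n_samples  = 10
batch_size = 4
max_iter   = 2          # number of epochs
sub_rng    = Random.new(1)

max_iter.times do
  sample_ids = [*0...n_samples]
  sample_ids.shuffle!(random: sub_rng)
  until (subset_ids = sample_ids.shift(batch_size)).empty?
    # A real implementation would slice the data with subset_ids here
    # and update the weight vector and factor matrix on that mini-batch.
    p subset_ids
  end
end
```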
@@ -14,7 +14,7 @@ module Rumale
14
14
  # estimator =
15
15
  # Rumale::PolynomialModel::FactorizationMachineClassifier.new(
16
16
  # n_factors: 10, loss: 'hinge', reg_param_linear: 0.001, reg_param_factor: 0.001,
17
- # max_iter: 5000, batch_size: 50, random_seed: 1)
17
+ # max_iter: 500, batch_size: 50, random_seed: 1)
18
18
  # estimator.fit(training_samples, traininig_labels)
19
19
  # results = estimator.predict(testing_samples)
20
20
  #
@@ -50,7 +50,8 @@ module Rumale
50
50
  # @param loss [String] The loss function ('hinge' or 'logistic').
51
51
  # @param reg_param_linear [Float] The regularization parameter for linear model.
52
52
  # @param reg_param_factor [Float] The regularization parameter for factor matrix.
53
- # @param max_iter [Integer] The maximum number of iterations.
53
+ # @param max_iter [Integer] The maximum number of epochs that indicates
54
+ # how many times the whole data is given to the training process.
54
55
  # @param batch_size [Integer] The size of the mini batches.
55
56
  # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
56
57
  # If nil is given, Nadam is used.
@@ -60,7 +61,7 @@ module Rumale
60
61
  # This parameter is ignored if the Parallel gem is not loaded.
61
62
  # @param random_seed [Integer] The seed value using to initialize the random generator.
62
63
  def initialize(n_factors: 2, loss: 'hinge', reg_param_linear: 1.0, reg_param_factor: 1.0,
63
- max_iter: 1000, batch_size: 10, optimizer: nil, n_jobs: nil, random_seed: nil)
64
+ max_iter: 200, batch_size: 50, optimizer: nil, n_jobs: nil, random_seed: nil)
64
65
  check_params_numeric(reg_param_linear: reg_param_linear, reg_param_factor: reg_param_factor,
65
66
  n_factors: n_factors, max_iter: max_iter, batch_size: batch_size)
66
67
  check_params_string(loss: loss)
@@ -12,7 +12,7 @@ module Rumale
12
12
  # estimator =
13
13
  # Rumale::PolynomialModel::FactorizationMachineRegressor.new(
14
14
  # n_factors: 10, reg_param_linear: 0.1, reg_param_factor: 0.1,
15
- # max_iter: 5000, batch_size: 50, random_seed: 1)
15
+ # max_iter: 500, batch_size: 50, random_seed: 1)
16
16
  # estimator.fit(training_samples, traininig_values)
17
17
  # results = estimator.predict(testing_samples)
18
18
  #
@@ -43,7 +43,8 @@ module Rumale
43
43
  # @param n_factors [Integer] The number of factors.
44
44
  # @param reg_param_linear [Float] The regularization parameter for linear model.
45
45
  # @param reg_param_factor [Float] The regularization parameter for factor matrix.
46
- # @param max_iter [Integer] The maximum number of iterations.
46
+ # @param max_iter [Integer] The maximum number of epochs that indicates
47
+ # how many times the whole data is given to the training process.
47
48
  # @param batch_size [Integer] The size of the mini batches.
48
49
  # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
49
50
  # If nil is given, Nadam is used.
@@ -53,7 +54,7 @@ module Rumale
53
54
  # This parameter is ignored if the Parallel gem is not loaded.
54
55
  # @param random_seed [Integer] The seed value using to initialize the random generator.
55
56
  def initialize(n_factors: 2, reg_param_linear: 1.0, reg_param_factor: 1.0,
56
- max_iter: 1000, batch_size: 10, optimizer: nil, n_jobs: nil, random_seed: nil)
57
+ max_iter: 200, batch_size: 50, optimizer: nil, n_jobs: nil, random_seed: nil)
57
58
  check_params_numeric(reg_param_linear: reg_param_linear, reg_param_factor: reg_param_factor,
58
59
  n_factors: n_factors, max_iter: max_iter, batch_size: batch_size)
59
60
  check_params_numeric_or_nil(n_jobs: n_jobs, random_seed: random_seed)
@@ -3,5 +3,5 @@
  # Rumale is a machine learning library in Ruby.
  module Rumale
  # The version of Rumale you are using.
- VERSION = '0.16.1'
+ VERSION = '0.17.0'
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: rumale
  version: !ruby/object:Gem::Version
- version: 0.16.1
+ version: 0.17.0
  platform: ruby
  authors:
  - yoshoku
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2020-01-11 00:00:00.000000000 Z
+ date: 2020-01-18 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: numo-narray