rumale 0.13.0 → 0.13.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: cdd9f655f3a1f95edc68476dd614deac7718ef65
- data.tar.gz: f979c070a55fc8fb2afce0c04b1fa26bf92aa2af
+ metadata.gz: ce88d7170fd676377227427a0be90f8bdb1a9c97
+ data.tar.gz: 04f0d07e6d098768eda726fc82f864420678e427
  SHA512:
- metadata.gz: 94a6aed271a2f0da786544aaef2dc81b62c357f7dbd552c5ebbdd6a67c39b3584b9b139a6ea8be269a030ca2c7ec0f5852ccc3990492d3f4aabf55ab46172d9a
- data.tar.gz: b89de3059991cecfe92d492193287777cd8278715ee55e52868005a8df5489005f02ed862b8a782111e0cc8f76da67b1965b690e5a5876c7877025d7657f6be3
+ metadata.gz: 203444f0e7d833946f67c2ee922e02a48b7174c20eac84480e190f8749e150e0c5ed18e3d7b7d30480e565483b5a5b51d1990cced7e09b5db027d8c508fa4313
+ data.tar.gz: e608c97fc0d29c018c778f9cc96cd53b0edff927c5631bd3b0cb606ee93f4e8c647ed2c76e7835b49f1933c0e5aeccb1ffbda4fe9aec59a2689f7bde4a28e103
@@ -1,3 +1,11 @@
+ # 0.13.1
+ - Add class for Shared Nearest Neighbor clustering.
+ - Add function for calculating the Manhattan distance to Rumale::PairwiseMetric.
+ - Add the metric parameter that specifies the distance metric to Rumale::Clustering::DBSCAN.
+ - Add the solver parameter that specifies the optimization algorithm to Rumale::LinearModel::LinearRegression.
+ - Add the solver parameter that specifies the optimization algorithm to Rumale::LinearModel::Ridge.
+ - Fix bug where the principal components NArray of Rumale::Decomposition::PCA was not 1-dimensional (ndim 1) when n_components is 1.
+
  # 0.13.0
  - Introduce [Numo::Linalg](https://github.com/ruby-numo/numo-linalg) to use linear algebra algorithms on the optimization.
  - Add the solver parameter that specifies the optimization algorithm to Rumale::Decomposition::PCA.
data/README.md CHANGED
@@ -6,14 +6,14 @@
  [![Coverage Status](https://coveralls.io/repos/github/yoshoku/rumale/badge.svg?branch=master)](https://coveralls.io/github/yoshoku/rumale?branch=master)
  [![Gem Version](https://badge.fury.io/rb/rumale.svg)](https://badge.fury.io/rb/rumale)
  [![BSD 2-Clause License](https://img.shields.io/badge/License-BSD%202--Clause-orange.svg)](https://github.com/yoshoku/rumale/blob/master/LICENSE.txt)
- [![Documentation](http://img.shields.io/badge/docs-rdoc.info-blue.svg)](https://www.rubydoc.info/gems/rumale/0.13.0)
+ [![Documentation](http://img.shields.io/badge/docs-rdoc.info-blue.svg)](https://www.rubydoc.info/gems/rumale/0.13.1)
 
  Rumale (**Ru**by **ma**chine **le**arning) is a machine learning library in Ruby.
  Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python.
  Rumale supports Linear / Kernel Support Vector Machine,
  Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine,
  Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor classifier,
- K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering,
+ K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering,
  Multidimensional Scaling, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.
 
  This project was formerly known as "SVMKit".
@@ -60,6 +60,7 @@ require 'rumale/clustering/k_means'
  require 'rumale/clustering/k_medoids'
  require 'rumale/clustering/gaussian_mixture'
  require 'rumale/clustering/dbscan'
+ require 'rumale/clustering/snn'
  require 'rumale/clustering/power_iteration'
  require 'rumale/decomposition/pca'
  require 'rumale/decomposition/nmf'
@@ -7,7 +7,6 @@ require 'rumale/pairwise_metric'
  module Rumale
  module Clustering
  # DBSCAN is a class that implements DBSCAN cluster analysis.
- # The current implementation uses the Euclidean distance for analyzing the clusters.
  #
  # @example
  # analyzer = Rumale::Clustering::DBSCAN.new(eps: 0.5, min_samples: 5)
@@ -31,12 +30,17 @@ module Rumale
  #
  # @param eps [Float] The radius of neighborhood.
  # @param min_samples [Integer] The number of neighbor samples to be used for the criterion whether a point is a core point.
- def initialize(eps: 0.5, min_samples: 5)
+ # @param metric [String] The metric to calculate the distances.
+ # If metric is 'euclidean', Euclidean distance is calculated for distance between points.
+ # If metric is 'precomputed', the fit and fit_predict methods expect to be given a distance matrix.
+ def initialize(eps: 0.5, min_samples: 5, metric: 'euclidean')
  check_params_float(eps: eps)
  check_params_integer(min_samples: min_samples)
+ check_params_string(metric: metric)
  @params = {}
  @params[:eps] = eps
  @params[:min_samples] = min_samples
+ @params[:metric] = metric == 'precomputed' ? 'precomputed' : 'euclidean'
  @core_sample_ids = nil
  @labels = nil
  end
@@ -46,19 +50,23 @@ module Rumale
  # @overload fit(x) -> DBSCAN
  #
  # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for cluster analysis.
+ # If the metric is 'precomputed', x must be a square distance matrix (shape: [n_samples, n_samples]).
  # @return [DBSCAN] The learned cluster analyzer itself.
  def fit(x, _y = nil)
  check_sample_array(x)
+ raise ArgumentError, 'Expect the input distance matrix to be square.' if @params[:metric] == 'precomputed' && x.shape[0] != x.shape[1]
  partial_fit(x)
  self
  end
 
  # Analysis clusters and assign samples to clusters.
  #
- # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for cluster analysis.
+ # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The samples to be used for cluster analysis.
+ # If the metric is 'precomputed', x must be a square distance matrix (shape: [n_samples, n_samples]).
  # @return [Numo::Int32] (shape: [n_samples]) Predicted cluster label per sample.
  def fit_predict(x)
  check_sample_array(x)
+ raise ArgumentError, 'Expect the input distance matrix to be square.' if @params[:metric] == 'precomputed' && x.shape[0] != x.shape[1]
  partial_fit(x)
  labels
  end
@@ -84,19 +92,24 @@ module Rumale
 
  def partial_fit(x)
  cluster_id = 0
- n_samples = x.shape[0]
+ metric_mat = calc_pairwise_metrics(x)
+ n_samples = metric_mat.shape[0]
  @core_sample_ids = []
  @labels = Numo::Int32.zeros(n_samples) - 2
- n_samples.times do |q|
- next if @labels[q] >= -1
- cluster_id += 1 if expand_cluster(x, q, cluster_id)
+ n_samples.times do |query_id|
+ next if @labels[query_id] >= -1
+ cluster_id += 1 if expand_cluster(metric_mat, query_id, cluster_id)
  end
  @core_sample_ids = Numo::Int32[*@core_sample_ids.flatten]
  nil
  end
 
- def expand_cluster(x, query_id, cluster_id)
- target_ids = region_query(x[query_id, true], x)
+ def calc_pairwise_metrics(x)
+ @params[:metric] == 'precomputed' ? x : Rumale::PairwiseMetric.euclidean_distance(x)
+ end
+
+ def expand_cluster(metric_mat, query_id, cluster_id)
+ target_ids = region_query(metric_mat[query_id, true])
  if target_ids.size < @params[:min_samples]
  @labels[query_id] = -1
  false
@@ -105,7 +118,7 @@ module Rumale
  @core_sample_ids.push(target_ids.dup)
  target_ids.delete(query_id)
  while (m = target_ids.shift)
- neighbor_ids = region_query(x[m, true], x)
+ neighbor_ids = region_query(metric_mat[m, true])
  next if neighbor_ids.size < @params[:min_samples]
  neighbor_ids.each do |n|
  target_ids.push(n) if @labels[n] < -1
@@ -116,9 +129,8 @@ module Rumale
  end
  end
 
- def region_query(query, targets)
- distance_arr = PairwiseMetric.euclidean_distance(query.expand_dims(0), targets)[0, true]
- distance_arr.lt(@params[:eps]).where.to_a
+ def region_query(metric_arr)
+ metric_arr.lt(@params[:eps]).where.to_a
  end
  end
  end
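After this change, DBSCAN computes one pairwise matrix up front (or takes it as given with `metric: 'precomputed'`), and `region_query` reduces to thresholding one row of that matrix. The following is a minimal plain-Ruby sketch of that flow. It assumes the distance matrix is a nested Array rather than a `Numo::DFloat`. The helper names `dbscan_precomputed` and `region_query` are hypothetical and only mirror the structure of `partial_fit`/`expand_cluster`, using the same label convention (-2 unvisited, -1 noise).

```ruby
# Hypothetical plain-Ruby sketch, not Rumale's code: DBSCAN over a precomputed
# distance matrix given as nested arrays. Labels follow the diff's convention:
# -2 = not visited yet, -1 = noise, 0, 1, ... = cluster ids.

# Mirrors region_query(metric_arr): indices whose distance is below eps.
def region_query(dist_row, eps)
  dist_row.each_index.select { |j| dist_row[j] < eps }
end

def dbscan_precomputed(dist_mat, eps:, min_samples:)
  n = dist_mat.size
  unless dist_mat.all? { |row| row.size == n }
    raise ArgumentError, 'Expect the input distance matrix to be square.'
  end

  labels = Array.new(n, -2)
  cluster_id = 0
  n.times do |query_id|
    next if labels[query_id] >= -1 # already marked as noise or clustered

    target_ids = region_query(dist_mat[query_id], eps)
    if target_ids.size < min_samples
      labels[query_id] = -1 # not a core point; may become a border point later
      next
    end

    # query_id is a core point: grow a new cluster from its neighborhood.
    target_ids.each { |t| labels[t] = cluster_id }
    target_ids.delete(query_id)
    while (m = target_ids.shift)
      neighbor_ids = region_query(dist_mat[m], eps)
      next if neighbor_ids.size < min_samples # m is a border point

      neighbor_ids.each do |k|
        target_ids.push(k) if labels[k] < -1 # expand only through unvisited points
        labels[k] = cluster_id if labels[k] <= -1
      end
    end
    cluster_id += 1
  end
  labels
end
```

For example, with `pts = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0]` and `dist = pts.map { |a| pts.map { |b| (a - b).abs } }`, calling `dbscan_precomputed(dist, eps: 0.5, min_samples: 2)` groups the two tight runs and marks the isolated point as noise. Note that a point's own zero distance counts toward `min_samples`, as it does in the diff's `lt(eps)` query.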
@@ -29,8 +29,8 @@ module Rumale
  # Create a new cluster analyzer with K-Medoids method.
  #
  # @param n_clusters [Integer] The number of clusters.
- # @param metric [String] The metric to calculate the distances in original space.
- # If metric is 'euclidean', Euclidean distance is calculated for distance in original space.
+ # @param metric [String] The metric to calculate the distances.
+ # If metric is 'euclidean', Euclidean distance is calculated for distance between points.
  # If metric is 'precomputed', the fit and fit_transform methods expect to be given a distance matrix.
  # @param init [String] The initialization method for centroids ('random' or 'k-means++').
  # @param max_iter [Integer] The maximum number of iterations.
@@ -0,0 +1,76 @@
+ # frozen_string_literal: true
+
+ require 'rumale/pairwise_metric'
+ require 'rumale/clustering/dbscan'
+
+ module Rumale
+ module Clustering
+ # SNN is a class that implements Shared Nearest Neighbor cluster analysis.
+ # The SNN method is a variation of DBSCAN that uses similarity based on k-nearest neighbors as a metric.
+ #
+ # @example
+ # analyzer = Rumale::Clustering::SNN.new(n_neighbors: 10, eps: 5, min_samples: 5)
+ # cluster_labels = analyzer.fit_predict(samples)
+ #
+ # *Reference*
+ # - L. Ertoz, M. Steinbach, and V. Kumar, "Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data," Proc. SDM'03, pp. 47--58, 2003.
+ # - M. E. Houle, H.-P. Kriegel, P. Kroger, E. Schubert, and A. Zimek, "Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?," Proc. SSDBM'10, pp. 482--500, 2010.
+ class SNN < DBSCAN
+ # Create a new cluster analyzer with Shared Nearest Neighbor method.
+ #
+ # @param n_neighbors [Integer] The number of neighbors to be used for finding k-nearest neighbors.
+ # @param eps [Integer] The threshold value for finding connected components based on similarity.
+ # @param min_samples [Integer] The number of neighbor samples to be used for the criterion whether a point is a core point.
+ # @param metric [String] The metric to calculate the distances.
+ # If metric is 'euclidean', Euclidean distance is calculated for distance between points.
+ # If metric is 'precomputed', the fit and fit_predict methods expect to be given a distance matrix.
+ def initialize(n_neighbors: 10, eps: 5, min_samples: 5, metric: 'euclidean')
+ check_params_integer(n_neighbors: n_neighbors, min_samples: min_samples)
+ check_params_string(metric: metric)
+ @params = {}
+ @params[:n_neighbors] = n_neighbors
+ @params[:eps] = eps
+ @params[:min_samples] = min_samples
+ @params[:metric] = metric == 'precomputed' ? 'precomputed' : 'euclidean'
+ @core_sample_ids = nil
+ @labels = nil
+ end
+
+ # Analyze clusters with given training data.
+ #
+ # @overload fit(x) -> SNN
+ # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for cluster analysis.
+ # If the metric is 'precomputed', x must be a square distance matrix (shape: [n_samples, n_samples]).
+ # @return [SNN] The learned cluster analyzer itself.
+ def fit(x, _y = nil)
+ super
+ end
+
+ # Analyze clusters and assign samples to clusters.
+ #
+ # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The samples to be used for cluster analysis.
+ # If the metric is 'precomputed', x must be a square distance matrix (shape: [n_samples, n_samples]).
+ # @return [Numo::Int32] (shape: [n_samples]) Predicted cluster label per sample.
+ def fit_predict(x)
+ super
+ end
+
+ private
+
+ def calc_pairwise_metrics(x)
+ distance_mat = @params[:metric] == 'precomputed' ? x : Rumale::PairwiseMetric.euclidean_distance(x)
+ n_samples = distance_mat.shape[0]
+ adjacency_mat = Numo::DFloat.zeros(n_samples, n_samples)
+ n_samples.times do |n|
+ neighbor_ids = distance_mat[n, true].sort_index[0...@params[:n_neighbors]]
+ adjacency_mat[n, neighbor_ids] = 1
+ end
+ adjacency_mat.dot(adjacency_mat.transpose)
+ end
+
+ def region_query(similarity_arr)
+ similarity_arr.gt(@params[:eps]).where.to_a
+ end
+ end
+ end
+ end
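SNN reuses DBSCAN's loop but swaps the metric. `calc_pairwise_metrics` builds a 0/1 k-nearest-neighbor adjacency matrix A, where row i marks the `n_neighbors` closest points and i itself is included because its distance is 0. It then returns A·Aᵀ, whose (i, j) entry counts the neighbors that i and j share. `region_query` keeps entries strictly greater than `eps`, so a larger value means "more similar", the opposite of DBSCAN's `lt(eps)`. Below is a plain-Ruby sketch of that computation, assuming nested arrays instead of Numo. The names `snn_similarity` and `snn_region_query` are hypothetical. Ties in distance are broken by index here, which may differ from Numo's `sort_index`.

```ruby
# Hypothetical plain-Ruby sketch of the shared-nearest-neighbor similarity
# computed by SNN#calc_pairwise_metrics (nested arrays instead of Numo).
def snn_similarity(dist_mat, n_neighbors)
  n = dist_mat.size
  # 0/1 adjacency: row i flags the n_neighbors nearest points of i
  # (i itself is included, since its distance to itself is 0).
  adjacency = dist_mat.map do |row|
    nearest = row.each_index.sort_by { |j| [row[j], j] }.first(n_neighbors)
    flags = Array.new(n, 0)
    nearest.each { |j| flags[j] = 1 }
    flags
  end
  # adjacency * adjacency^T: entry (i, j) counts neighbors shared by i and j.
  Array.new(n) do |i|
    Array.new(n) do |j|
      adjacency[i].zip(adjacency[j]).sum { |a, b| a * b }
    end
  end
end

# Mirrors SNN#region_query: indices whose similarity is strictly above eps.
def snn_region_query(similarity_row, eps)
  similarity_row.each_index.select { |j| similarity_row[j] > eps }
end
```

With `n_neighbors: 3` on the 1-D points `[0.0, 1.0, 2.0, 10.0, 11.0, 12.0]`, each point shares all 3 neighbors with the members of its own group and none with the other group. So `eps: 2` separates the two groups even though they are far apart in absolute distance.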
@@ -80,7 +80,8 @@ module Rumale
  covariance_mat = centered_x.transpose.dot(centered_x) / (n_samples - 1)
  if @params[:solver] == 'evd' && enable_linalg?
  _, evecs = Numo::Linalg.eigh(covariance_mat, vals_range: (n_features - @params[:n_components])...n_features)
- @components = evecs.reverse(1).transpose.dup
+ comps = evecs.reverse(1).transpose
+ @components = @params[:n_components] == 1 ? comps[0, true].dup : comps.dup
  else
  @params[:n_components].times do
  comp_vec = Rumale::Utils.rand_uniform(n_features, sub_rng)
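The PCA fix only changes the shape of `@components`. The eigenvectors returned by `eigh` are columns in ascending eigenvalue order, as the `reverse(1)` implies. Reversing the columns and transposing yields one component per row with the largest variance first. When `n_components` is 1, the new code takes row 0 so that the result has ndim 1. Below is a plain-Ruby sketch of just this reshaping step, assuming the eigenvectors arrive as a nested Array (rows = features, columns = components). The name `components_from_evecs` is hypothetical.

```ruby
# Hypothetical sketch of the reshaping done after eigh in PCA#fit.
# evecs: n_features rows x n_components columns, columns in ascending eigenvalue order.
def components_from_evecs(evecs, n_components)
  # evecs.reverse(1): reverse the column order so the largest eigenvalue comes first.
  reversed = evecs.map(&:reverse)
  # .transpose: one component per row -> shape [n_components, n_features].
  comps = reversed.transpose
  # The 0.13.1 fix: a single component is returned as a flat vector (comps[0, true]).
  n_components == 1 ? comps[0].dup : comps
end
```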
@@ -6,7 +6,7 @@ require 'rumale/base/regressor'
  module Rumale
  module LinearModel
  # LinearRegression is a class that implements ordinary least square linear regression
- # with mini-batch stochastic gradient descent optimization.
+ # with mini-batch stochastic gradient descent optimization or singular value decomposition.
  #
  # @example
  # estimator =
@@ -14,6 +14,11 @@ module Rumale
  # estimator.fit(training_samples, training_values)
  # results = estimator.predict(testing_samples)
  #
+ # # If Numo::Linalg is installed, you can specify 'svd' for the solver option.
+ # require 'numo/linalg/autoloader'
+ # estimator = Rumale::LinearModel::LinearRegression.new(solver: 'svd')
+ # estimator.fit(training_samples, training_values)
+ # results = estimator.predict(testing_samples)
  class LinearRegression < BaseLinearModel
  include Base::Regressor
 
@@ -34,23 +39,32 @@ module Rumale
  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
  # @param bias_scale [Float] The scale of the bias term.
  # @param max_iter [Integer] The maximum number of iterations.
+ # If solver = 'svd', this parameter is ignored.
  # @param batch_size [Integer] The size of the mini batches.
+ # If solver = 'svd', this parameter is ignored.
  # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
  # If nil is given, Nadam is used.
+ # If solver = 'svd', this parameter is ignored.
+ # @param solver [String] The algorithm to calculate weights. ('sgd' or 'svd').
+ # 'sgd' uses the stochastic gradient descent optimization.
+ # 'svd' performs singular value decomposition of samples.
  # @param n_jobs [Integer] The number of jobs for running the fit method in parallel.
  # If nil is given, the method does not execute in parallel.
  # If zero or less is given, it becomes equal to the number of processors.
  # This parameter is ignored if the Parallel gem is not loaded.
  # @param random_seed [Integer] The seed value using to initialize the random generator.
  def initialize(fit_bias: false, bias_scale: 1.0, max_iter: 1000, batch_size: 10, optimizer: nil,
- n_jobs: nil, random_seed: nil)
+ solver: 'sgd', n_jobs: nil, random_seed: nil)
  check_params_float(bias_scale: bias_scale)
  check_params_integer(max_iter: max_iter, batch_size: batch_size)
  check_params_boolean(fit_bias: fit_bias)
+ check_params_string(solver: solver)
  check_params_type_or_nil(Integer, n_jobs: n_jobs, random_seed: random_seed)
  check_params_positive(max_iter: max_iter, batch_size: batch_size)
  keywd_args = method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h.merge(reg_param: 0.0)
+ keywd_args.delete(:solver)
  super(keywd_args)
+ @params[:solver] = solver != 'svd' ? 'sgd' : 'svd'
  end
 
  # Fit the model with given training data.
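The constructor forwards its own keyword arguments to the base class by reflection. `method(:initialize).parameters` lists the keyword names, and `binding.local_variable_get` reads their current values. `:solver` is then removed before calling `super`, presumably because the base-class initializer does not accept that keyword. The solver is stored in `@params` only afterwards, with any value other than `'svd'` falling back to `'sgd'`. Below is a self-contained sketch of the idiom, using hypothetical classes `Base` and `Regressor` as stand-ins for `BaseLinearModel` and `LinearRegression`. It calls `super(**keywd_args)` because Ruby 3 no longer converts a trailing Hash argument into keywords the way the diff's `super(keywd_args)` relies on.

```ruby
# Hypothetical stand-ins for BaseLinearModel / LinearRegression.
class Base
  attr_reader :params

  def initialize(reg_param: 1.0, max_iter: 1000)
    @params = { reg_param: reg_param, max_iter: max_iter }
  end
end

class Regressor < Base
  def initialize(max_iter: 1000, solver: 'sgd')
    # Collect every keyword parameter of this method together with its current value.
    keywd_args = method(:initialize).parameters
                                    .map { |_t, arg| [arg, binding.local_variable_get(arg)] }
                                    .to_h.merge(reg_param: 0.0)
    # Base#initialize does not accept :solver, so drop it before forwarding.
    keywd_args.delete(:solver)
    super(**keywd_args)
    # Any value other than 'svd' falls back to 'sgd'.
    @params[:solver] = solver != 'svd' ? 'sgd' : 'svd'
  end
end
```

For example, `Regressor.new(max_iter: 50, solver: 'svd').params` holds `reg_param: 0.0`, `max_iter: 50` and `solver: 'svd'`, while an unknown solver name silently becomes `'sgd'`.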
@@ -63,20 +77,10 @@ module Rumale
  check_tvalue_array(y)
  check_sample_tvalue_size(x, y)
 
- n_outputs = y.shape[1].nil? ? 1 : y.shape[1]
- n_features = x.shape[1]
-
- if n_outputs > 1
- @weight_vec = Numo::DFloat.zeros(n_outputs, n_features)
- @bias_term = Numo::DFloat.zeros(n_outputs)
- if enable_parallel?
- models = parallel_map(n_outputs) { |n| partial_fit(x, y[true, n]) }
- n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = models[n] }
- else
- n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = partial_fit(x, y[true, n]) }
- end
+ if @params[:solver] == 'svd' && enable_linalg?
+ fit_svd(x, y)
  else
- @weight_vec, @bias_term = partial_fit(x, y)
+ fit_sgd(x, y)
  end
 
  self
@@ -112,6 +116,41 @@ module Rumale
 
  private
 
+ def fit_svd(x, y)
+ samples = @params[:fit_bias] ? expand_feature(x) : x
+
+ s, u, vt = Numo::Linalg.svd(samples, driver: 'sdd', job: 'S')
+ d = (s / s**2).diag
+ w = vt.transpose.dot(d).dot(u.transpose).dot(y)
+
+ is_single_target_vals = y.shape[1].nil?
+ if @params[:fit_bias]
+ @weight_vec = is_single_target_vals ? w[0...-1].dup : w[0...-1, true].dup
+ @bias_term = is_single_target_vals ? w[-1] : w[-1, true].dup
+ else
+ @weight_vec = w.dup
+ @bias_term = is_single_target_vals ? 0 : Numo::DFloat.zeros(y.shape[1])
+ end
+ end
+
+ def fit_sgd(x, y)
+ n_outputs = y.shape[1].nil? ? 1 : y.shape[1]
+ n_features = x.shape[1]
+
+ if n_outputs > 1
+ @weight_vec = Numo::DFloat.zeros(n_outputs, n_features)
+ @bias_term = Numo::DFloat.zeros(n_outputs)
+ if enable_parallel?
+ models = parallel_map(n_outputs) { |n| partial_fit(x, y[true, n]) }
+ n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = models[n] }
+ else
+ n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = partial_fit(x, y[true, n]) }
+ end
+ else
+ @weight_vec, @bias_term = partial_fit(x, y)
+ end
+ end
+
  def calc_loss_gradient(x, y, weight)
  2.0 * (x.dot(weight) - y)
  end
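The formula in `fit_svd` can be read off the thin SVD (`job: 'S'`) of the sample matrix X, which includes the appended bias column when `fit_bias` is true. Writing X = U diag(s) Vᵀ, the product `vt.transpose.dot(d).dot(u.transpose).dot(y)` with `d = (s / s**2).diag` is the pseudo-inverse solution. For X with full column rank, this is the unique minimizer of the squared error. For multi-target y, the same expression is applied column by column.

```latex
\begin{aligned}
X &= U\,\operatorname{diag}(s)\,V^{\top}, \\
w &= V\,\operatorname{diag}\!\left(\frac{s_i}{s_i^{2}}\right) U^{\top} y
   = V\,\operatorname{diag}\!\left(\frac{1}{s_i}\right) U^{\top} y
   = X^{+} y
   = \arg\min_{w} \lVert X w - y \rVert_2^{2}.
\end{aligned}
```

The entry s_i / s_i² equals 1/s_i only for s_i ≠ 0. An exactly zero singular value would make it 0/0, whereas the textbook pseudo-inverse uses 0 for that entry.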
@@ -6,7 +6,7 @@ require 'rumale/base/regressor'
  module Rumale
  module LinearModel
  # Ridge is a class that implements Ridge Regression
- # with mini-batch stochastic gradient descent optimization.
+ # with mini-batch stochastic gradient descent optimization or singular value decomposition.
  #
  # @example
  # estimator =
@@ -14,6 +14,11 @@ module Rumale
  # estimator.fit(training_samples, training_values)
  # results = estimator.predict(testing_samples)
  #
+ # # If Numo::Linalg is installed, you can specify 'svd' for the solver option.
+ # require 'numo/linalg/autoloader'
+ # estimator = Rumale::LinearModel::Ridge.new(reg_param: 0.1, solver: 'svd')
+ # estimator.fit(training_samples, training_values)
+ # results = estimator.predict(testing_samples)
  class Ridge < BaseLinearModel
  include Base::Regressor
 
@@ -35,22 +40,32 @@ module Rumale
  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
  # @param bias_scale [Float] The scale of the bias term.
  # @param max_iter [Integer] The maximum number of iterations.
+ # If solver = 'svd', this parameter is ignored.
  # @param batch_size [Integer] The size of the mini batches.
+ # If solver = 'svd', this parameter is ignored.
  # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
  # If nil is given, Nadam is used.
+ # If solver = 'svd', this parameter is ignored.
+ # @param solver [String] The algorithm to calculate weights. ('sgd' or 'svd').
+ # 'sgd' uses the stochastic gradient descent optimization.
+ # 'svd' performs singular value decomposition of samples.
  # @param n_jobs [Integer] The number of jobs for running the fit method in parallel.
  # If nil is given, the method does not execute in parallel.
  # If zero or less is given, it becomes equal to the number of processors.
- # This parameter is ignored if the Parallel gem is not loaded.
+ # This parameter is ignored if the Parallel gem is not loaded or the solver is 'svd'.
  # @param random_seed [Integer] The seed value using to initialize the random generator.
  def initialize(reg_param: 1.0, fit_bias: false, bias_scale: 1.0, max_iter: 1000, batch_size: 10, optimizer: nil,
- n_jobs: nil, random_seed: nil)
+ solver: 'sgd', n_jobs: nil, random_seed: nil)
  check_params_float(reg_param: reg_param, bias_scale: bias_scale)
  check_params_integer(max_iter: max_iter, batch_size: batch_size)
  check_params_boolean(fit_bias: fit_bias)
+ check_params_string(solver: solver)
  check_params_type_or_nil(Integer, n_jobs: n_jobs, random_seed: random_seed)
  check_params_positive(reg_param: reg_param, max_iter: max_iter, batch_size: batch_size)
- super
+ keywd_args = method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h
+ keywd_args.delete(:solver)
+ super(keywd_args)
+ @params[:solver] = solver != 'svd' ? 'sgd' : 'svd'
  end
 
  # Fit the model with given training data.
@@ -63,20 +78,10 @@ module Rumale
  check_tvalue_array(y)
  check_sample_tvalue_size(x, y)
 
- n_outputs = y.shape[1].nil? ? 1 : y.shape[1]
- n_features = x.shape[1]
-
- if n_outputs > 1
- @weight_vec = Numo::DFloat.zeros(n_outputs, n_features)
- @bias_term = Numo::DFloat.zeros(n_outputs)
- if enable_parallel?
- models = parallel_map(n_outputs) { |n| partial_fit(x, y[true, n]) }
- n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = models[n] }
- else
- n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = partial_fit(x, y[true, n]) }
- end
+ if @params[:solver] == 'svd' && enable_linalg?
+ fit_svd(x, y)
  else
- @weight_vec, @bias_term = partial_fit(x, y)
+ fit_sgd(x, y)
  end
 
  self
@@ -112,6 +117,41 @@ module Rumale
 
  private
 
+ def fit_svd(x, y)
+ samples = @params[:fit_bias] ? expand_feature(x) : x
+
+ s, u, vt = Numo::Linalg.svd(samples, driver: 'sdd', job: 'S')
+ d = (s / (s**2 + @params[:reg_param])).diag
+ w = vt.transpose.dot(d).dot(u.transpose).dot(y)
+
+ is_single_target_vals = y.shape[1].nil?
+ if @params[:fit_bias]
+ @weight_vec = is_single_target_vals ? w[0...-1].dup : w[0...-1, true].dup
+ @bias_term = is_single_target_vals ? w[-1] : w[-1, true].dup
+ else
+ @weight_vec = w.dup
+ @bias_term = is_single_target_vals ? 0 : Numo::DFloat.zeros(y.shape[1])
+ end
+ end
+
+ def fit_sgd(x, y)
+ n_outputs = y.shape[1].nil? ? 1 : y.shape[1]
+ n_features = x.shape[1]
+
+ if n_outputs > 1
+ @weight_vec = Numo::DFloat.zeros(n_outputs, n_features)
+ @bias_term = Numo::DFloat.zeros(n_outputs)
+ if enable_parallel?
+ models = parallel_map(n_outputs) { |n| partial_fit(x, y[true, n]) }
+ n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = models[n] }
+ else
+ n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = partial_fit(x, y[true, n]) }
+ end
+ else
+ @weight_vec, @bias_term = partial_fit(x, y)
+ end
+ end
+
  def calc_loss_gradient(x, y, weight)
  2.0 * (x.dot(weight) - y)
  end
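Ridge's `fit_svd` differs from LinearRegression's only in the filter `s / (s**2 + reg_param)`. This gives w = V diag(s_i / (s_i² + α)) Uᵀ y, which is algebraically equal to the normal-equation solution (XᵀX + αI)⁻¹ Xᵀ y for α > 0. The Ruby standard library has no SVD, so the sketch below swaps in that equivalent normal-equation system, solved by Gaussian elimination. It is not an SVD implementation. With `fit_bias: true` it appends a column of ones, like `expand_feature` with `bias_scale` = 1.0 (an assumption). Because that column is part of X, its weight is penalized too, just as in the diff. `ridge_fit` and `solve` are hypothetical names, and the solver is not hardened against singular systems.

```ruby
# Hypothetical sketch: ridge weights from the normal equations
# (X^T X + alpha I) w = X^T y, equivalent to the SVD filter s / (s**2 + alpha).

# Solve a * w = b with Gaussian elimination and partial pivoting.
def solve(a, b)
  n = b.size
  m = a.each_with_index.map { |row, i| row.dup << b[i] } # augmented matrix [a | b]
  n.times do |col|
    pivot = (col...n).max_by { |r| m[r][col].abs }
    m[col], m[pivot] = m[pivot], m[col]
    (col + 1...n).each do |r|
      factor = m[r][col] / m[col][col]
      (col..n).each { |c| m[r][c] -= factor * m[col][c] }
    end
  end
  w = Array.new(n, 0.0)
  (n - 1).downto(0) do |i|
    acc = (i + 1...n).sum { |c| m[i][c] * w[c] }
    w[i] = (m[i][n] - acc) / m[i][i]
  end
  w
end

def ridge_fit(x, y, alpha:, fit_bias: false)
  # Append a bias column of ones, like expand_feature with bias_scale = 1.0.
  samples = fit_bias ? x.map { |row| row + [1.0] } : x
  d = samples.first.size
  gram = Array.new(d) do |i|
    Array.new(d) { |j| samples.sum { |row| row[i] * row[j] } + (i == j ? alpha : 0.0) }
  end
  rhs = Array.new(d) { |i| samples.zip(y).sum { |row, t| row[i] * t } }
  w = solve(gram, rhs)
  # Split off the bias weight the same way fit_svd does (w[0...-1], w[-1]).
  fit_bias ? [w[0...-1], w[-1]] : [w, 0.0]
end
```

With x = [[1.0], [2.0], [3.0]] and y = [2.0, 4.0, 6.0], `alpha: 0.0` recovers the weight 2.0. With `alpha: 1.0`, the weight shrinks to 28/15 ≈ 1.867, which shows the effect of the `+ reg_param` term in the denominator.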
@@ -18,6 +18,24 @@ module Rumale
  Numo::NMath.sqrt(squared_error(x, y).abs)
  end
 
+ # Calculate the pairwise Manhattan distances between x and y.
+ #
+ # @param x [Numo::DFloat] (shape: [n_samples_x, n_features])
+ # @param y [Numo::DFloat] (shape: [n_samples_y, n_features])
+ # @return [Numo::DFloat] (shape: [n_samples_x, n_samples_x] or [n_samples_x, n_samples_y] if y is given)
+ def manhattan_distance(x, y = nil)
+ y = x if y.nil?
+ Rumale::Validation.check_sample_array(x)
+ Rumale::Validation.check_sample_array(y)
+ n_samples_x = x.shape[0]
+ n_samples_y = y.shape[0]
+ distance_mat = Numo::DFloat.zeros(n_samples_x, n_samples_y)
+ n_samples_x.times do |n|
+ distance_mat[n, true] = (y - x[n, true]).abs.sum(axis: 1)
+ end
+ distance_mat
+ end
+
  # Calculate the pairwise squared errors between x and y.
  #
  # @param x [Numo::DFloat] (shape: [n_samples_x, n_features])
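`manhattan_distance` fills the result one row at a time. `y - x[n, true]` broadcasts the n-th row of x against every row of y, and `.abs.sum(axis: 1)` turns each difference row into an L1 distance. Below is a hypothetical plain-Ruby counterpart on nested arrays that shows the same row-by-row structure without Numo.

```ruby
# Hypothetical plain-Ruby counterpart of PairwiseMetric.manhattan_distance.
def manhattan_distance(x, y = nil)
  y = x if y.nil?
  x.map do |xr|
    # One result row: L1 distance from xr to every row of y,
    # i.e. (y - x[n, true]).abs.sum(axis: 1) in the Numo version.
    y.map { |yr| xr.zip(yr).sum { |a, b| (a - b).abs } }
  end
end
```

For instance, the distance between [1.0, 2.0] and [3.0, -1.0] is |1 - 3| + |2 - (-1)| = 5.0. Calling it with only x returns a symmetric matrix with zeros on the diagonal.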
@@ -27,11 +45,6 @@ module Rumale
  y = x if y.nil?
  Rumale::Validation.check_sample_array(x)
  Rumale::Validation.check_sample_array(y)
- # sum_x_vec = (x**2).sum(1)
- # sum_y_vec = (y**2).sum(1)
- # dot_xy_mat = x.dot(y.transpose)
- # dot_xy_mat * -2.0 + sum_x_vec.tile(y.shape[0], 1).transpose + sum_y_vec.tile(x.shape[0], 1)
- #
  n_features = x.shape[1]
  one_vec = Numo::DFloat.ones(n_features).expand_dims(1)
  sum_x_vec = (x**2).dot(one_vec)
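The deleted comment documented the identity behind `squared_error`: ‖x_i − y_j‖² = ‖x_i‖² + ‖y_j‖² − 2·x_i·y_j. This identity lets the whole matrix be assembled from row norms and one matrix product, and the remaining code computes those row norms as `(x**2).dot(one_vec)`. Rounding in this expansion can leave tiny negative entries, which is presumably why `euclidean_distance` takes `.abs` before `sqrt`. Below is a plain-Ruby sketch of the identity. `squared_error_expanded` is a hypothetical name, and nested arrays stand in for Numo.

```ruby
# Hypothetical sketch of the expansion used for pairwise squared errors:
# ||x_i - y_j||^2 = ||x_i||^2 + ||y_j||^2 - 2 * x_i . y_j
def squared_error_expanded(x, y = nil)
  y = x if y.nil?
  sq_norm = ->(v) { v.sum { |e| e * e } }
  dot = ->(u, v) { u.zip(v).sum { |a, b| a * b } }
  x.map do |xr|
    y.map { |yr| sq_norm.(xr) + sq_norm.(yr) - 2.0 * dot.(xr, yr) }
  end
end
```

Comparing it with the direct sum of squared differences on small inputs is a quick way to confirm the identity.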
@@ -3,5 +3,5 @@
  # Rumale is a machine learning library in Ruby.
  module Rumale
  # The version of Rumale you are using.
- VERSION = '0.13.0'
+ VERSION = '0.13.1'
  end
@@ -19,7 +19,7 @@ Gem::Specification.new do |spec|
  Rumale currently supports Linear / Kernel Support Vector Machine,
  Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine,
  Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor algorithm,
- K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering,
+ K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering,
  Multidimensional Scaling, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.
  MSG
  spec.homepage = 'https://github.com/yoshoku/rumale'
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: rumale
  version: !ruby/object:Gem::Version
- version: 0.13.0
+ version: 0.13.1
  platform: ruby
  authors:
  - yoshoku
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2019-08-24 00:00:00.000000000 Z
+ date: 2019-09-01 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: numo-narray
@@ -128,7 +128,7 @@ description: |
  Rumale currently supports Linear / Kernel Support Vector Machine,
  Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine,
  Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor algorithm,
- K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering,
+ K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering,
  Multidimensional Scaling, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.
  email:
  - yoshoku@outlook.com
@@ -166,6 +166,7 @@ files:
  - lib/rumale/clustering/k_means.rb
  - lib/rumale/clustering/k_medoids.rb
  - lib/rumale/clustering/power_iteration.rb
+ - lib/rumale/clustering/snn.rb
  - lib/rumale/dataset.rb
  - lib/rumale/decomposition/nmf.rb
  - lib/rumale/decomposition/pca.rb