rumale 0.13.0 → 0.13.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
-   metadata.gz: cdd9f655f3a1f95edc68476dd614deac7718ef65
-   data.tar.gz: f979c070a55fc8fb2afce0c04b1fa26bf92aa2af
+   metadata.gz: ce88d7170fd676377227427a0be90f8bdb1a9c97
+   data.tar.gz: 04f0d07e6d098768eda726fc82f864420678e427
  SHA512:
-   metadata.gz: 94a6aed271a2f0da786544aaef2dc81b62c357f7dbd552c5ebbdd6a67c39b3584b9b139a6ea8be269a030ca2c7ec0f5852ccc3990492d3f4aabf55ab46172d9a
-   data.tar.gz: b89de3059991cecfe92d492193287777cd8278715ee55e52868005a8df5489005f02ed862b8a782111e0cc8f76da67b1965b690e5a5876c7877025d7657f6be3
+   metadata.gz: 203444f0e7d833946f67c2ee922e02a48b7174c20eac84480e190f8749e150e0c5ed18e3d7b7d30480e565483b5a5b51d1990cced7e09b5db027d8c508fa4313
+   data.tar.gz: e608c97fc0d29c018c778f9cc96cd53b0edff927c5631bd3b0cb606ee93f4e8c647ed2c76e7835b49f1933c0e5aeccb1ffbda4fe9aec59a2689f7bde4a28e103
@@ -1,3 +1,11 @@
+ # 0.13.1
+ - Add class for Shared Nearest Neighbor clustering.
+ - Add function for calculation of manhattan distance to Rumale::PairwiseMetric.
+ - Add metric parameter that specifies distance metric to Rumale::Clustering::DBSCAN.
+ - Add the solver parameter that specifies the optimization algorithm to Rumale::LinearModel::LinearRegression.
+ - Add the solver parameter that specifies the optimization algorithm to Rumale::LinearModel::Ridge.
+ - Fix bug that the ndim of NArray of 1-dimensional principal components is not 1.
+
  # 0.13.0
  - Introduce [Numo::Linalg](https://github.com/ruby-numo/numo-linalg) to use linear algebra algorithms on the optimization.
  - Add the solver parameter that specifies the optimization algorithm to Rumale::Decomposition::PCA.
data/README.md CHANGED
@@ -6,14 +6,14 @@
  [![Coverage Status](https://coveralls.io/repos/github/yoshoku/rumale/badge.svg?branch=master)](https://coveralls.io/github/yoshoku/rumale?branch=master)
  [![Gem Version](https://badge.fury.io/rb/rumale.svg)](https://badge.fury.io/rb/rumale)
  [![BSD 2-Clause License](https://img.shields.io/badge/License-BSD%202--Clause-orange.svg)](https://github.com/yoshoku/rumale/blob/master/LICENSE.txt)
- [![Documentation](http://img.shields.io/badge/docs-rdoc.info-blue.svg)](https://www.rubydoc.info/gems/rumale/0.13.0)
+ [![Documentation](http://img.shields.io/badge/docs-rdoc.info-blue.svg)](https://www.rubydoc.info/gems/rumale/0.13.1)

  Rumale (**Ru**by **ma**chine **le**arning) is a machine learning library in Ruby.
  Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python.
  Rumale supports Linear / Kernel Support Vector Machine,
  Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine,
  Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor classifier,
- K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering,
+ K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering,
  Multidimensional Scaling, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.

  This project was formerly known as "SVMKit".
@@ -60,6 +60,7 @@ require 'rumale/clustering/k_means'
  require 'rumale/clustering/k_medoids'
  require 'rumale/clustering/gaussian_mixture'
  require 'rumale/clustering/dbscan'
+ require 'rumale/clustering/snn'
  require 'rumale/clustering/power_iteration'
  require 'rumale/decomposition/pca'
  require 'rumale/decomposition/nmf'
@@ -7,7 +7,6 @@ require 'rumale/pairwise_metric'
  module Rumale
    module Clustering
      # DBSCAN is a class that implements DBSCAN cluster analysis.
-     # The current implementation uses the Euclidean distance for analyzing the clusters.
      #
      # @example
      # analyzer = Rumale::Clustering::DBSCAN.new(eps: 0.5, min_samples: 5)
@@ -31,12 +30,17 @@ module Rumale
        #
        # @param eps [Float] The radius of neighborhood.
        # @param min_samples [Integer] The number of neighbor samples to be used for the criterion whether a point is a core point.
-       def initialize(eps: 0.5, min_samples: 5)
+       # @param metric [String] The metric to calculate the distances.
+       # If metric is 'euclidean', Euclidean distance is calculated for distance between points.
+       # If metric is 'precomputed', the fit and fit_transform methods expect to be given a distance matrix.
+       def initialize(eps: 0.5, min_samples: 5, metric: 'euclidean')
          check_params_float(eps: eps)
          check_params_integer(min_samples: min_samples)
+         check_params_string(metric: metric)
          @params = {}
          @params[:eps] = eps
          @params[:min_samples] = min_samples
+         @params[:metric] = metric == 'precomputed' ? 'precomputed' : 'euclidean'
          @core_sample_ids = nil
          @labels = nil
        end
@@ -46,19 +50,23 @@ module Rumale
        # @overload fit(x) -> DBSCAN
        #
        # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for cluster analysis.
+       # If the metric is 'precomputed', x must be a square distance matrix (shape: [n_samples, n_samples]).
        # @return [DBSCAN] The learned cluster analyzer itself.
        def fit(x, _y = nil)
          check_sample_array(x)
+         raise ArgumentError, 'Expect the input distance matrix to be square.' if @params[:metric] == 'precomputed' && x.shape[0] != x.shape[1]
          partial_fit(x)
          self
        end

        # Analysis clusters and assign samples to clusters.
        #
-       # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for cluster analysis.
+       # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The samples to be used for cluster analysis.
+       # If the metric is 'precomputed', x must be a square distance matrix (shape: [n_samples, n_samples]).
        # @return [Numo::Int32] (shape: [n_samples]) Predicted cluster label per sample.
        def fit_predict(x)
          check_sample_array(x)
+         raise ArgumentError, 'Expect the input distance matrix to be square.' if @params[:metric] == 'precomputed' && x.shape[0] != x.shape[1]
          partial_fit(x)
          labels
        end
@@ -84,19 +92,24 @@ module Rumale

        def partial_fit(x)
          cluster_id = 0
-         n_samples = x.shape[0]
+         metric_mat = calc_pairwise_metrics(x)
+         n_samples = metric_mat.shape[0]
          @core_sample_ids = []
          @labels = Numo::Int32.zeros(n_samples) - 2
-         n_samples.times do |q|
-           next if @labels[q] >= -1
-           cluster_id += 1 if expand_cluster(x, q, cluster_id)
+         n_samples.times do |query_id|
+           next if @labels[query_id] >= -1
+           cluster_id += 1 if expand_cluster(metric_mat, query_id, cluster_id)
          end
          @core_sample_ids = Numo::Int32[*@core_sample_ids.flatten]
          nil
        end

-       def expand_cluster(x, query_id, cluster_id)
-         target_ids = region_query(x[query_id, true], x)
+       def calc_pairwise_metrics(x)
+         @params[:metric] == 'precomputed' ? x : Rumale::PairwiseMetric.euclidean_distance(x)
+       end
+
+       def expand_cluster(metric_mat, query_id, cluster_id)
+         target_ids = region_query(metric_mat[query_id, true])
          if target_ids.size < @params[:min_samples]
            @labels[query_id] = -1
            false
@@ -105,7 +118,7 @@ module Rumale
            @core_sample_ids.push(target_ids.dup)
            target_ids.delete(query_id)
            while (m = target_ids.shift)
-             neighbor_ids = region_query(x[m, true], x)
+             neighbor_ids = region_query(metric_mat[m, true])
              next if neighbor_ids.size < @params[:min_samples]
              neighbor_ids.each do |n|
                target_ids.push(n) if @labels[n] < -1
@@ -116,9 +129,8 @@ module Rumale
          end
        end

-       def region_query(query, targets)
-         distance_arr = PairwiseMetric.euclidean_distance(query.expand_dims(0), targets)[0, true]
-         distance_arr.lt(@params[:eps]).where.to_a
+       def region_query(metric_arr)
+         metric_arr.lt(@params[:eps]).where.to_a
        end
      end
    end
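The DBSCAN hunks above move the distance computation out of `region_query`, so the clustering loop only ever sees one row of a distance matrix. As a rough illustration of that idea, here is a minimal sketch of DBSCAN over a precomputed matrix. It uses plain Ruby arrays instead of Numo::NArray, marks unvisited points with `nil` rather than `-2`, and the names `dbscan_precomputed` and `region_query` are hypothetical helpers for this sketch, not Rumale's API.

```ruby
# Indices of all points strictly closer than eps (mirrors the `lt` comparison in the diff).
def region_query(distance_row, eps)
  distance_row.each_index.select { |i| distance_row[i] < eps }
end

# Minimal DBSCAN on a square distance matrix given as nested arrays.
# Returns one label per sample: -1 for noise, 0, 1, ... for clusters.
def dbscan_precomputed(distance_mat, eps:, min_samples:)
  n_samples = distance_mat.size
  unless distance_mat.all? { |row| row.size == n_samples }
    raise ArgumentError, 'Expect the input distance matrix to be square.'
  end

  labels = Array.new(n_samples) # nil means "not visited yet"
  cluster_id = 0
  n_samples.times do |query_id|
    next unless labels[query_id].nil?

    neighbors = region_query(distance_mat[query_id], eps)
    if neighbors.size < min_samples
      labels[query_id] = -1 # provisional noise; may be relabeled as a border point
      next
    end

    labels[query_id] = cluster_id
    queue = neighbors - [query_id]
    until queue.empty?
      m = queue.shift
      if labels[m] == -1
        labels[m] = cluster_id # noise reached from a core point becomes a border point
        next
      end
      next unless labels[m].nil?

      labels[m] = cluster_id
      m_neighbors = region_query(distance_mat[m], eps)
      queue.concat(m_neighbors) if m_neighbors.size >= min_samples
    end
    cluster_id += 1
  end
  labels
end

points = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 20.0]
distance_mat = points.map { |a| points.map { |b| (a - b).abs } }
p dbscan_precomputed(distance_mat, eps: 0.5, min_samples: 2)
# => [0, 0, 0, 1, 1, 1, -1]
```

Because the loop depends only on matrix rows, any dissimilarity can be plugged in by building the matrix beforehand, which is what `metric: 'precomputed'` allows in the diff.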
@@ -29,8 +29,8 @@ module Rumale
        # Create a new cluster analyzer with K-Medoids method.
        #
        # @param n_clusters [Integer] The number of clusters.
-       # @param metric [String] The metric to calculate the distances in original space.
-       # If metric is 'euclidean', Euclidean distance is calculated for distance in original space.
+       # @param metric [String] The metric to calculate the distances.
+       # If metric is 'euclidean', Euclidean distance is calculated for distance between points.
        # If metric is 'precomputed', the fit and fit_transform methods expect to be given a distance matrix.
        # @param init [String] The initialization method for centroids ('random' or 'k-means++').
        # @param max_iter [Integer] The maximum number of iterations.
@@ -0,0 +1,76 @@
+ # frozen_string_literal: true
+
+ require 'rumale/pairwise_metric'
+ require 'rumale/clustering/dbscan'
+
+ module Rumale
+   module Clustering
+     # SNN is a class that implements Shared Nearest Neighbor cluster analysis.
+     # The SNN method is a variation of DBSCAN that uses similarity based on k-nearest neighbors as a metric.
+     #
+     # @example
+     # analyzer = Rumale::Clustering::SNN.new(n_neighbors: 10, eps: 5, min_samples: 5)
+     # cluster_labels = analyzer.fit_predict(samples)
+     #
+     # *Reference*
+     # - L. Ertoz, M. Steinbach, and V. Kumar, "Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data," Proc. SDM'03, pp. 47--58, 2003.
+     # - M E. Houle, H-P. Kriegel, P. Kroger, E. Schubert, and A. Zimek, "Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?," Proc. SSDBM'10, pp. 482--500, 2010.
+     class SNN < DBSCAN
+       # Create a new cluster analyzer with Shared Nearest Neighbor method.
+       #
+       # @param n_neighbors [Integer] The number of neighbors to be used for finding k-nearest neighbors.
+       # @param eps [Integer] The threshold value for finding connected components based on similarity.
+       # @param min_samples [Integer] The number of neighbor samples to be used for the criterion whether a point is a core point.
+       # @param metric [String] The metric to calculate the distances.
+       # If metric is 'euclidean', Euclidean distance is calculated for distance between points.
+       # If metric is 'precomputed', the fit and fit_transform methods expect to be given a distance matrix.
+       def initialize(n_neighbors: 10, eps: 5, min_samples: 5, metric: 'euclidean')
+         check_params_integer(n_neighbors: n_neighbors, min_samples: min_samples)
+         check_params_string(metric: metric)
+         @params = {}
+         @params[:n_neighbors] = n_neighbors
+         @params[:eps] = eps
+         @params[:min_samples] = min_samples
+         @params[:metric] = metric == 'precomputed' ? 'precomputed' : 'euclidean'
+         @core_sample_ids = nil
+         @labels = nil
+       end
+
+       # Analysis clusters with given training data.
+       #
+       # @overload fit(x) -> SNN
+       # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for cluster analysis.
+       # If the metric is 'precomputed', x must be a square distance matrix (shape: [n_samples, n_samples]).
+       # @return [SNN] The learned cluster analyzer itself.
+       def fit(x, _y = nil)
+         super
+       end
+
+       # Analysis clusters and assign samples to clusters.
+       #
+       # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The samples to be used for cluster analysis.
+       # If the metric is 'precomputed', x must be a square distance matrix (shape: [n_samples, n_samples]).
+       # @return [Numo::Int32] (shape: [n_samples]) Predicted cluster label per sample.
+       def fit_predict(x)
+         super
+       end
+
+       private
+
+       def calc_pairwise_metrics(x)
+         distance_mat = @params[:metric] == 'precomputed' ? x : Rumale::PairwiseMetric.euclidean_distance(x)
+         n_samples = distance_mat.shape[0]
+         adjacency_mat = Numo::DFloat.zeros(n_samples, n_samples)
+         n_samples.times do |n|
+           neighbor_ids = distance_mat[n, true].sort_index[0...@params[:n_neighbors]]
+           adjacency_mat[n, neighbor_ids] = 1
+         end
+         adjacency_mat.dot(adjacency_mat.transpose)
+       end
+
+       def region_query(similarity_arr)
+         similarity_arr.gt(@params[:eps]).where.to_a
+       end
+     end
+   end
+ end
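The new SNN class overrides two hooks: `calc_pairwise_metrics` turns distances into a shared-neighbor count, and `region_query` selects samples whose similarity is greater than `eps` instead of distances smaller than `eps`. The following is a minimal sketch of that similarity in plain Ruby arrays. The helper names `snn_similarity` and `snn_region_query` are hypothetical, and set intersection stands in for the `adjacency_mat.dot(adjacency_mat.transpose)` product used in the diff (for a 0/1 adjacency matrix the two give the same counts).

```ruby
# Shared-nearest-neighbor similarity from a square distance matrix (nested arrays).
# Entry [i][j] counts how many of the n_neighbors nearest points i and j have in common.
def snn_similarity(distance_mat, n_neighbors)
  n_samples = distance_mat.size
  # Each point is normally its own nearest neighbor because its self-distance is 0.
  neighbor_sets = distance_mat.map do |row|
    row.each_index.sort_by { |i| row[i] }.first(n_neighbors)
  end
  Array.new(n_samples) do |i|
    Array.new(n_samples) { |j| (neighbor_sets[i] & neighbor_sets[j]).size }
  end
end

# Indices whose similarity to the query is strictly greater than eps (mirrors `gt` in the diff).
def snn_region_query(similarity_row, eps)
  similarity_row.each_index.select { |i| similarity_row[i] > eps }
end

points = [0.0, 1.0, 3.0, 10.0, 11.0, 13.0]
distance_mat = points.map { |a| points.map { |b| (a - b).abs } }
similarity = snn_similarity(distance_mat, 2)
p similarity[0]
p snn_region_query(similarity[0], 1)
```

Note that `eps` here is a count threshold on shared neighbors, which is why the diff documents it as an Integer for SNN while DBSCAN documents it as a Float radius.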
@@ -80,7 +80,8 @@ module Rumale
          covariance_mat = centered_x.transpose.dot(centered_x) / (n_samples - 1)
          if @params[:solver] == 'evd' && enable_linalg?
            _, evecs = Numo::Linalg.eigh(covariance_mat, vals_range: (n_features - @params[:n_components])...n_features)
-           @components = evecs.reverse(1).transpose.dup
+           comps = evecs.reverse(1).transpose
+           @components = @params[:n_components] == 1 ? comps[0, true].dup : comps.dup
          else
            @params[:n_components].times do
              comp_vec = Rumale::Utils.rand_uniform(n_features, sub_rng)
@@ -6,7 +6,7 @@ require 'rumale/base/regressor'
  module Rumale
    module LinearModel
      # LinearRegression is a class that implements ordinary least square linear regression
-     # with mini-batch stochastic gradient descent optimization.
+     # with mini-batch stochastic gradient descent optimization or singular value decomposition.
      #
      # @example
      # estimator =
@@ -14,6 +14,11 @@ module Rumale
      # estimator.fit(training_samples, traininig_values)
      # results = estimator.predict(testing_samples)
      #
+     # # If Numo::Linalg is installed, you can specify 'svd' for the solver option.
+     # require 'numo/linalg/autoloader'
+     # estimator = Rumale::LinearModel::LinearRegression.new(solver: 'svd')
+     # estimator.fit(training_samples, traininig_values)
+     # results = estimator.predict(testing_samples)
      class LinearRegression < BaseLinearModel
        include Base::Regressor

@@ -34,23 +39,32 @@ module Rumale
        # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
        # @param bias_scale [Float] The scale of the bias term.
        # @param max_iter [Integer] The maximum number of iterations.
+       # If solver = 'svd', this parameter is ignored.
        # @param batch_size [Integer] The size of the mini batches.
+       # If solver = 'svd', this parameter is ignored.
        # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
        # If nil is given, Nadam is used.
+       # If solver = 'svd', this parameter is ignored.
+       # @param solver [String] The algorithm to calculate weights. ('sgd' or 'svd').
+       # 'sgd' uses the stochastic gradient descent optimization.
+       # 'svd' performs singular value decomposition of samples.
        # @param n_jobs [Integer] The number of jobs for running the fit method in parallel.
        # If nil is given, the method does not execute in parallel.
        # If zero or less is given, it becomes equal to the number of processors.
        # This parameter is ignored if the Parallel gem is not loaded.
        # @param random_seed [Integer] The seed value using to initialize the random generator.
        def initialize(fit_bias: false, bias_scale: 1.0, max_iter: 1000, batch_size: 10, optimizer: nil,
-                      n_jobs: nil, random_seed: nil)
+                      solver: 'sgd', n_jobs: nil, random_seed: nil)
          check_params_float(bias_scale: bias_scale)
          check_params_integer(max_iter: max_iter, batch_size: batch_size)
          check_params_boolean(fit_bias: fit_bias)
+         check_params_string(solver: solver)
          check_params_type_or_nil(Integer, n_jobs: n_jobs, random_seed: random_seed)
          check_params_positive(max_iter: max_iter, batch_size: batch_size)
          keywd_args = method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h.merge(reg_param: 0.0)
+         keywd_args.delete(:solver)
          super(keywd_args)
+         @params[:solver] = solver != 'svd' ? 'sgd' : 'svd'
        end

        # Fit the model with given training data.
@@ -63,20 +77,10 @@ module Rumale
          check_tvalue_array(y)
          check_sample_tvalue_size(x, y)

-         n_outputs = y.shape[1].nil? ? 1 : y.shape[1]
-         n_features = x.shape[1]
-
-         if n_outputs > 1
-           @weight_vec = Numo::DFloat.zeros(n_outputs, n_features)
-           @bias_term = Numo::DFloat.zeros(n_outputs)
-           if enable_parallel?
-             models = parallel_map(n_outputs) { |n| partial_fit(x, y[true, n]) }
-             n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = models[n] }
-           else
-             n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = partial_fit(x, y[true, n]) }
-           end
+         if @params[:solver] == 'svd' && enable_linalg?
+           fit_svd(x, y)
          else
-           @weight_vec, @bias_term = partial_fit(x, y)
+           fit_sgd(x, y)
          end

          self
@@ -112,6 +116,41 @@ module Rumale

        private

+       def fit_svd(x, y)
+         samples = @params[:fit_bias] ? expand_feature(x) : x
+
+         s, u, vt = Numo::Linalg.svd(samples, driver: 'sdd', job: 'S')
+         d = (s / s**2).diag
+         w = vt.transpose.dot(d).dot(u.transpose).dot(y)
+
+         is_single_target_vals = y.shape[1].nil?
+         if @params[:fit_bias]
+           @weight_vec = is_single_target_vals ? w[0...-1].dup : w[0...-1, true].dup
+           @bias_term = is_single_target_vals ? w[-1] : w[-1, true].dup
+         else
+           @weight_vec = w.dup
+           @bias_term = is_single_target_vals ? 0 : Numo::DFloat.zeros(y.shape[1])
+         end
+       end
+
+       def fit_sgd(x, y)
+         n_outputs = y.shape[1].nil? ? 1 : y.shape[1]
+         n_features = x.shape[1]
+
+         if n_outputs > 1
+           @weight_vec = Numo::DFloat.zeros(n_outputs, n_features)
+           @bias_term = Numo::DFloat.zeros(n_outputs)
+           if enable_parallel?
+             models = parallel_map(n_outputs) { |n| partial_fit(x, y[true, n]) }
+             n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = models[n] }
+           else
+             n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = partial_fit(x, y[true, n]) }
+           end
+         else
+           @weight_vec, @bias_term = partial_fit(x, y)
+         end
+       end
+
        def calc_loss_gradient(x, y, weight)
          2.0 * (x.dot(weight) - y)
        end
@@ -6,7 +6,7 @@ require 'rumale/base/regressor'
  module Rumale
    module LinearModel
      # Ridge is a class that implements Ridge Regression
-     # with mini-batch stochastic gradient descent optimization.
+     # with mini-batch stochastic gradient descent optimization or singular value decomposition.
      #
      # @example
      # estimator =
@@ -14,6 +14,11 @@ module Rumale
      # estimator.fit(training_samples, traininig_values)
      # results = estimator.predict(testing_samples)
      #
+     # # If Numo::Linalg is installed, you can specify 'svd' for the solver option.
+     # require 'numo/linalg/autoloader'
+     # estimator = Rumale::LinearModel::Ridge.new(reg_param: 0.1, solver: 'svd')
+     # estimator.fit(training_samples, traininig_values)
+     # results = estimator.predict(testing_samples)
      class Ridge < BaseLinearModel
        include Base::Regressor

@@ -35,22 +40,32 @@ module Rumale
        # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
        # @param bias_scale [Float] The scale of the bias term.
        # @param max_iter [Integer] The maximum number of iterations.
+       # If solver = 'svd', this parameter is ignored.
        # @param batch_size [Integer] The size of the mini batches.
+       # If solver = 'svd', this parameter is ignored.
        # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
        # If nil is given, Nadam is used.
+       # If solver = 'svd', this parameter is ignored.
+       # @param solver [String] The algorithm to calculate weights. ('sgd' or 'svd').
+       # 'sgd' uses the stochastic gradient descent optimization.
+       # 'svd' performs singular value decomposition of samples.
        # @param n_jobs [Integer] The number of jobs for running the fit method in parallel.
        # If nil is given, the method does not execute in parallel.
        # If zero or less is given, it becomes equal to the number of processors.
-       # This parameter is ignored if the Parallel gem is not loaded.
+       # This parameter is ignored if the Parallel gem is not loaded or the solver is 'svd'.
        # @param random_seed [Integer] The seed value using to initialize the random generator.
        def initialize(reg_param: 1.0, fit_bias: false, bias_scale: 1.0, max_iter: 1000, batch_size: 10, optimizer: nil,
-                      n_jobs: nil, random_seed: nil)
+                      solver: 'sgd', n_jobs: nil, random_seed: nil)
          check_params_float(reg_param: reg_param, bias_scale: bias_scale)
          check_params_integer(max_iter: max_iter, batch_size: batch_size)
          check_params_boolean(fit_bias: fit_bias)
+         check_params_string(solver: solver)
          check_params_type_or_nil(Integer, n_jobs: n_jobs, random_seed: random_seed)
          check_params_positive(reg_param: reg_param, max_iter: max_iter, batch_size: batch_size)
-         super
+         keywd_args = method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h
+         keywd_args.delete(:solver)
+         super(keywd_args)
+         @params[:solver] = solver != 'svd' ? 'sgd' : 'svd'
        end

        # Fit the model with given training data.
@@ -63,20 +78,10 @@ module Rumale
          check_tvalue_array(y)
          check_sample_tvalue_size(x, y)

-         n_outputs = y.shape[1].nil? ? 1 : y.shape[1]
-         n_features = x.shape[1]
-
-         if n_outputs > 1
-           @weight_vec = Numo::DFloat.zeros(n_outputs, n_features)
-           @bias_term = Numo::DFloat.zeros(n_outputs)
-           if enable_parallel?
-             models = parallel_map(n_outputs) { |n| partial_fit(x, y[true, n]) }
-             n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = models[n] }
-           else
-             n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = partial_fit(x, y[true, n]) }
-           end
+         if @params[:solver] == 'svd' && enable_linalg?
+           fit_svd(x, y)
          else
-           @weight_vec, @bias_term = partial_fit(x, y)
+           fit_sgd(x, y)
          end

          self
@@ -112,6 +117,41 @@ module Rumale

        private

+       def fit_svd(x, y)
+         samples = @params[:fit_bias] ? expand_feature(x) : x
+
+         s, u, vt = Numo::Linalg.svd(samples, driver: 'sdd', job: 'S')
+         d = (s / (s**2 + @params[:reg_param])).diag
+         w = vt.transpose.dot(d).dot(u.transpose).dot(y)
+
+         is_single_target_vals = y.shape[1].nil?
+         if @params[:fit_bias]
+           @weight_vec = is_single_target_vals ? w[0...-1].dup : w[0...-1, true].dup
+           @bias_term = is_single_target_vals ? w[-1] : w[-1, true].dup
+         else
+           @weight_vec = w.dup
+           @bias_term = is_single_target_vals ? 0 : Numo::DFloat.zeros(y.shape[1])
+         end
+       end
+
+       def fit_sgd(x, y)
+         n_outputs = y.shape[1].nil? ? 1 : y.shape[1]
+         n_features = x.shape[1]
+
+         if n_outputs > 1
+           @weight_vec = Numo::DFloat.zeros(n_outputs, n_features)
+           @bias_term = Numo::DFloat.zeros(n_outputs)
+           if enable_parallel?
+             models = parallel_map(n_outputs) { |n| partial_fit(x, y[true, n]) }
+             n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = models[n] }
+           else
+             n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = partial_fit(x, y[true, n]) }
+           end
+         else
+           @weight_vec, @bias_term = partial_fit(x, y)
+         end
+       end
+
        def calc_loss_gradient(x, y, weight)
          2.0 * (x.dot(weight) - y)
        end
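The `fit_svd` methods added to LinearRegression and Ridge compute the weights as `vt.transpose.dot(d).dot(u.transpose).dot(y)`, where `d` is a diagonal matrix built from the singular values. A short derivation of why that form arises, assuming the Ridge objective is the usual squared error plus an L2 penalty weighted by `reg_param` (written as λ below; the diff itself does not state the objective) and writing the thin SVD of the sample matrix as X = U S Vᵀ:

```latex
\begin{aligned}
X &= U S V^{\top}, \qquad U^{\top}U = I,\quad V^{\top}V = I,\quad S = \operatorname{diag}(s_1,\dots,s_r) \\
\hat{w} &= \arg\min_{w}\ \lVert Xw - y\rVert_2^2 + \lambda \lVert w\rVert_2^2
         = \left(X^{\top}X + \lambda I\right)^{-1} X^{\top} y \\
        &= V \left(S^2 + \lambda I\right)^{-1} S\, U^{\top} y
         = V \operatorname{diag}\!\left(\frac{s_i}{s_i^2 + \lambda}\right) U^{\top} y
\end{aligned}
```

With λ = 0 the diagonal entries reduce to s_i / s_i² = 1 / s_i, which is the pseudo-inverse solution used by the LinearRegression version of `fit_svd`. That reduction assumes every singular value is nonzero; a zero singular value would make the entry 0/0.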
@@ -18,6 +18,24 @@ module Rumale
          Numo::NMath.sqrt(squared_error(x, y).abs)
        end

+       # Calculate the pairwise manhattan distances between x and y.
+       #
+       # @param x [Numo::DFloat] (shape: [n_samples_x, n_features])
+       # @param y [Numo::DFloat] (shape: [n_samples_y, n_features])
+       # @return [Numo::DFloat] (shape: [n_samples_x, n_samples_x] or [n_samples_x, n_samples_y] if y is given)
+       def manhattan_distance(x, y = nil)
+         y = x if y.nil?
+         Rumale::Validation.check_sample_array(x)
+         Rumale::Validation.check_sample_array(y)
+         n_samples_x = x.shape[0]
+         n_samples_y = y.shape[0]
+         distance_mat = Numo::DFloat.zeros(n_samples_x, n_samples_y)
+         n_samples_x.times do |n|
+           distance_mat[n, true] = (y - x[n, true]).abs.sum(axis: 1)
+         end
+         distance_mat
+       end
+
        # Calculate the pairwise squared errors between x and y.
        #
        # @param x [Numo::DFloat] (shape: [n_samples_x, n_features])
@@ -27,11 +45,6 @@ module Rumale
          y = x if y.nil?
          Rumale::Validation.check_sample_array(x)
          Rumale::Validation.check_sample_array(y)
-         # sum_x_vec = (x**2).sum(1)
-         # sum_y_vec = (y**2).sum(1)
-         # dot_xy_mat = x.dot(y.transpose)
-         # dot_xy_mat * -2.0 + sum_x_vec.tile(y.shape[0], 1).transpose + sum_y_vec.tile(x.shape[0], 1)
-         #
          n_features = x.shape[1]
          one_vec = Numo::DFloat.ones(n_features).expand_dims(1)
          sum_x_vec = (x**2).dot(one_vec)
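The `manhattan_distance` method added in the pairwise-metric hunks above fills one row of the result per sample by summing absolute coordinate differences. The same computation can be sketched with plain nested arrays; the function below is an illustrative stand-in that assumes rectangular input and skips the validation the Rumale method performs.

```ruby
# Pairwise Manhattan (L1) distances between the rows of x and the rows of y.
# When y is omitted, distances are computed among the rows of x itself.
def manhattan_distance(x, y = nil)
  y = x if y.nil?
  x.map do |row_x|
    y.map { |row_y| row_x.zip(row_y).sum { |a, b| (a - b).abs } }
  end
end

samples = [[0, 0], [1, 2], [3, 1]]
p manhattan_distance(samples)
# => [[0, 3, 4], [3, 0, 3], [4, 3, 0]]
p manhattan_distance(samples, [[1, 1]])
# => [[2], [1], [2]]
```

A matrix produced this way is the kind of input that `metric: 'precomputed'` in DBSCAN or SNN expects, since the result is square when `y` is omitted.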
@@ -3,5 +3,5 @@
  # Rumale is a machine learning library in Ruby.
  module Rumale
    # The version of Rumale you are using.
-   VERSION = '0.13.0'
+   VERSION = '0.13.1'
  end
@@ -19,7 +19,7 @@ Gem::Specification.new do |spec|
      Rumale currently supports Linear / Kernel Support Vector Machine,
      Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine,
      Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor algorithm,
-     K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering,
+     K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering,
      Multidimensional Scaling, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.
    MSG
    spec.homepage = 'https://github.com/yoshoku/rumale'
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: rumale
  version: !ruby/object:Gem::Version
-   version: 0.13.0
+   version: 0.13.1
  platform: ruby
  authors:
  - yoshoku
  autorequire:
  bindir: exe
  cert_chain: []
- date: 2019-08-24 00:00:00.000000000 Z
+ date: 2019-09-01 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
    name: numo-narray
@@ -128,7 +128,7 @@ description: |
    Rumale currently supports Linear / Kernel Support Vector Machine,
    Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine,
    Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor algorithm,
-   K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering,
+   K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering,
    Multidimensional Scaling, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.
  email:
  - yoshoku@outlook.com
@@ -166,6 +166,7 @@ files:
  - lib/rumale/clustering/k_means.rb
  - lib/rumale/clustering/k_medoids.rb
  - lib/rumale/clustering/power_iteration.rb
+ - lib/rumale/clustering/snn.rb
  - lib/rumale/dataset.rb
  - lib/rumale/decomposition/nmf.rb
  - lib/rumale/decomposition/pca.rb