rumale 0.13.0 → 0.13.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +8 -0
- data/README.md +2 -2
- data/lib/rumale.rb +1 -0
- data/lib/rumale/clustering/dbscan.rb +25 -13
- data/lib/rumale/clustering/k_medoids.rb +2 -2
- data/lib/rumale/clustering/snn.rb +76 -0
- data/lib/rumale/decomposition/pca.rb +2 -1
- data/lib/rumale/linear_model/linear_regression.rb +54 -15
- data/lib/rumale/linear_model/ridge.rb +57 -17
- data/lib/rumale/pairwise_metric.rb +18 -5
- data/lib/rumale/version.rb +1 -1
- data/rumale.gemspec +1 -1
- metadata +4 -3
    
        checksums.yaml
    CHANGED
    
    | @@ -1,7 +1,7 @@ | |
| 1 1 | 
             
            ---
         | 
| 2 2 | 
             
            SHA1:
         | 
| 3 | 
            -
              metadata.gz:  | 
| 4 | 
            -
              data.tar.gz:  | 
| 3 | 
            +
              metadata.gz: ce88d7170fd676377227427a0be90f8bdb1a9c97
         | 
| 4 | 
            +
              data.tar.gz: 04f0d07e6d098768eda726fc82f864420678e427
         | 
| 5 5 | 
             
            SHA512:
         | 
| 6 | 
            -
              metadata.gz:  | 
| 7 | 
            -
              data.tar.gz:  | 
| 6 | 
            +
              metadata.gz: 203444f0e7d833946f67c2ee922e02a48b7174c20eac84480e190f8749e150e0c5ed18e3d7b7d30480e565483b5a5b51d1990cced7e09b5db027d8c508fa4313
         | 
| 7 | 
            +
              data.tar.gz: e608c97fc0d29c018c778f9cc96cd53b0edff927c5631bd3b0cb606ee93f4e8c647ed2c76e7835b49f1933c0e5aeccb1ffbda4fe9aec59a2689f7bde4a28e103
         | 
    
        data/CHANGELOG.md
    CHANGED
    
    | @@ -1,3 +1,11 @@ | |
| 1 | 
            +
            # 0.13.1
         | 
| 2 | 
            +
            - Add class for Shared Nearest Neighbor clustering.
         | 
| 3 | 
            +
            - Add function for calculation of manhattan distance to Rumale::PairwiseMetric.
         | 
| 4 | 
            +
            - Add metric parameter that specifies distance metric to Rumale::Clustering::DBSCAN.
         | 
| 5 | 
            +
            - Add the solver parameter that specifies the optimization algorithm to Rumale::LinearModel::LinearRegression.
         | 
| 6 | 
            +
            - Add the solver parameter that specifies the optimization algorithm to Rumale::LinearModel::Ridge.
         | 
| 7 | 
            +
            - Fix bug that the ndim of NArray of 1-dimensional principal components is not 1.
         | 
| 8 | 
            +
             | 
| 1 9 | 
             
            # 0.13.0
         | 
| 2 10 | 
             
            - Introduce [Numo::Linalg](https://github.com/ruby-numo/numo-linalg) to use linear algebra algorithms on the optimization.
         | 
| 3 11 | 
             
            - Add the solver parameter that specifies the optimization algorithm to Rumale::Decomposition::PCA.
         | 
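
The 0.13.1 changelog entry above mentions a Manhattan distance function in Rumale::PairwiseMetric, but the corresponding pairwise_metric.rb hunk is not shown in this part of the diff. As a rough illustration of what a pairwise Manhattan (L1) distance computes, here is a plain-Ruby sketch. The helper name `manhattan_distance_matrix` and the use of nested Arrays instead of Numo::DFloat are assumptions made for this example, not Rumale's API.

```ruby
# Hypothetical helper (not part of Rumale): pairwise Manhattan (L1) distances
# between the rows of x and the rows of y, using plain nested Arrays.
def manhattan_distance_matrix(x, y = x)
  x.map do |a|
    y.map do |b|
      # Sum of absolute coordinate differences between the two points.
      a.zip(b).sum { |p, q| (p - q).abs }
    end
  end
end

points = [[0.0, 0.0], [1.0, 2.0], [-3.0, 1.0]]
distances = manhattan_distance_matrix(points)
distances.each { |row| puts row.inspect }
```

The result is a symmetric matrix with zeros on the diagonal, which is the shape a 'precomputed' metric input would also need.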
    
        data/README.md
    CHANGED
    
    | @@ -6,14 +6,14 @@ | |
| 6 6 | 
             
            [](https://coveralls.io/github/yoshoku/rumale?branch=master)
         | 
| 7 7 | 
             
            [](https://badge.fury.io/rb/rumale)
         | 
| 8 8 | 
             
            [](https://github.com/yoshoku/rumale/blob/master/LICENSE.txt)
         | 
| 9 | 
            -
            [](https://www.rubydoc.info/gems/rumale/0.13. | 
| 9 | 
            +
            [](https://www.rubydoc.info/gems/rumale/0.13.1)
         | 
| 10 10 |  | 
| 11 11 | 
             
            Rumale (**Ru**by **ma**chine **le**arning) is a machine learning library in Ruby.
         | 
| 12 12 | 
             
            Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python.
         | 
| 13 13 | 
             
            Rumale supports Linear / Kernel Support Vector Machine,
         | 
| 14 14 | 
             
            Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine,
         | 
| 15 15 | 
             
            Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor classifier,
         | 
| 16 | 
            -
            K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering,
         | 
| 16 | 
            +
            K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering,
         | 
| 17 17 | 
             
            Multidimensional Scaling, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.
         | 
| 18 18 |  | 
| 19 19 | 
             
            This project was formerly known as "SVMKit".
         | 
    
        data/lib/rumale.rb
    CHANGED
    
    | @@ -60,6 +60,7 @@ require 'rumale/clustering/k_means' | |
| 60 60 | 
             
            require 'rumale/clustering/k_medoids'
         | 
| 61 61 | 
             
            require 'rumale/clustering/gaussian_mixture'
         | 
| 62 62 | 
             
            require 'rumale/clustering/dbscan'
         | 
| 63 | 
            +
            require 'rumale/clustering/snn'
         | 
| 63 64 | 
             
            require 'rumale/clustering/power_iteration'
         | 
| 64 65 | 
             
            require 'rumale/decomposition/pca'
         | 
| 65 66 | 
             
            require 'rumale/decomposition/nmf'
         | 
    
        data/lib/rumale/clustering/dbscan.rb
    CHANGED
    
    | @@ -7,7 +7,6 @@ require 'rumale/pairwise_metric' | |
| 7 7 | 
             
            module Rumale
         | 
| 8 8 | 
             
              module Clustering
         | 
| 9 9 | 
             
                # DBSCAN is a class that implements DBSCAN cluster analysis.
         | 
| 10 | 
            -
                # The current implementation uses the Euclidean distance for analyzing the clusters.
         | 
| 11 10 | 
             
                #
         | 
| 12 11 | 
             
                # @example
         | 
| 13 12 | 
             
                #   analyzer = Rumale::Clustering::DBSCAN.new(eps: 0.5, min_samples: 5)
         | 
| @@ -31,12 +30,17 @@ module Rumale | |
| 31 30 | 
             
                  #
         | 
| 32 31 | 
             
                  # @param eps [Float] The radius of neighborhood.
         | 
| 33 32 | 
             
                  # @param min_samples [Integer] The number of neighbor samples to be used for the criterion whether a point is a core point.
         | 
| 34 | 
            -
                   | 
| 33 | 
            +
                  # @param metric [String] The metric to calculate the distances.
         | 
| 34 | 
            +
                  #   If metric is 'euclidean', Euclidean distance is calculated for distance between points.
         | 
| 35 | 
            +
                  #   If metric is 'precomputed', the fit and fit_transform methods expect to be given a distance matrix.
         | 
| 36 | 
            +
                  def initialize(eps: 0.5, min_samples: 5, metric: 'euclidean')
         | 
| 35 37 | 
             
                    check_params_float(eps: eps)
         | 
| 36 38 | 
             
                    check_params_integer(min_samples: min_samples)
         | 
| 39 | 
            +
                    check_params_string(metric: metric)
         | 
| 37 40 | 
             
                    @params = {}
         | 
| 38 41 | 
             
                    @params[:eps] = eps
         | 
| 39 42 | 
             
                    @params[:min_samples] = min_samples
         | 
| 43 | 
            +
                    @params[:metric] = metric == 'precomputed' ? 'precomputed' : 'euclidean'
         | 
| 40 44 | 
             
                    @core_sample_ids = nil
         | 
| 41 45 | 
             
                    @labels = nil
         | 
| 42 46 | 
             
                  end
         | 
| @@ -46,19 +50,23 @@ module Rumale | |
| 46 50 | 
             
                  # @overload fit(x) -> DBSCAN
         | 
| 47 51 | 
             
                  #
         | 
| 48 52 | 
             
                  # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for cluster analysis.
         | 
| 53 | 
            +
                  #   If the metric is 'precomputed', x must be a square distance matrix (shape: [n_samples, n_samples]).
         | 
| 49 54 | 
             
                  # @return [DBSCAN] The learned cluster analyzer itself.
         | 
| 50 55 | 
             
                  def fit(x, _y = nil)
         | 
| 51 56 | 
             
                    check_sample_array(x)
         | 
| 57 | 
            +
                    raise ArgumentError, 'Expect the input distance matrix to be square.' if @params[:metric] == 'precomputed' && x.shape[0] != x.shape[1]
         | 
| 52 58 | 
             
                    partial_fit(x)
         | 
| 53 59 | 
             
                    self
         | 
| 54 60 | 
             
                  end
         | 
| 55 61 |  | 
| 56 62 | 
             
                  # Analysis clusters and assign samples to clusters.
         | 
| 57 63 | 
             
                  #
         | 
| 58 | 
            -
                  # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The  | 
| 64 | 
            +
                  # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The samples to be used for cluster analysis.
         | 
| 65 | 
            +
                  #   If the metric is 'precomputed', x must be a square distance matrix (shape: [n_samples, n_samples]).
         | 
| 59 66 | 
             
                  # @return [Numo::Int32] (shape: [n_samples]) Predicted cluster label per sample.
         | 
| 60 67 | 
             
                  def fit_predict(x)
         | 
| 61 68 | 
             
                    check_sample_array(x)
         | 
| 69 | 
            +
                    raise ArgumentError, 'Expect the input distance matrix to be square.' if @params[:metric] == 'precomputed' && x.shape[0] != x.shape[1]
         | 
| 62 70 | 
             
                    partial_fit(x)
         | 
| 63 71 | 
             
                    labels
         | 
| 64 72 | 
             
                  end
         | 
| @@ -84,19 +92,24 @@ module Rumale | |
| 84 92 |  | 
| 85 93 | 
             
                  def partial_fit(x)
         | 
| 86 94 | 
             
                    cluster_id = 0
         | 
| 87 | 
            -
                     | 
| 95 | 
            +
                    metric_mat = calc_pairwise_metrics(x)
         | 
| 96 | 
            +
                    n_samples = metric_mat.shape[0]
         | 
| 88 97 | 
             
                    @core_sample_ids = []
         | 
| 89 98 | 
             
                    @labels = Numo::Int32.zeros(n_samples) - 2
         | 
| 90 | 
            -
                    n_samples.times do | | 
| 91 | 
            -
                      next if @labels[ | 
| 92 | 
            -
                      cluster_id += 1 if expand_cluster( | 
| 99 | 
            +
                    n_samples.times do |query_id|
         | 
| 100 | 
            +
                      next if @labels[query_id] >= -1
         | 
| 101 | 
            +
                      cluster_id += 1 if expand_cluster(metric_mat, query_id, cluster_id)
         | 
| 93 102 | 
             
                    end
         | 
| 94 103 | 
             
                    @core_sample_ids = Numo::Int32[*@core_sample_ids.flatten]
         | 
| 95 104 | 
             
                    nil
         | 
| 96 105 | 
             
                  end
         | 
| 97 106 |  | 
| 98 | 
            -
                  def  | 
| 99 | 
            -
                     | 
| 107 | 
            +
                  def calc_pairwise_metrics(x)
         | 
| 108 | 
            +
                    @params[:metric] == 'precomputed' ? x : Rumale::PairwiseMetric.euclidean_distance(x)
         | 
| 109 | 
            +
                  end
         | 
| 110 | 
            +
             | 
| 111 | 
            +
                  def expand_cluster(metric_mat, query_id, cluster_id)
         | 
| 112 | 
            +
                    target_ids = region_query(metric_mat[query_id, true])
         | 
| 100 113 | 
             
                    if target_ids.size < @params[:min_samples]
         | 
| 101 114 | 
             
                      @labels[query_id] = -1
         | 
| 102 115 | 
             
                      false
         | 
| @@ -105,7 +118,7 @@ module Rumale | |
| 105 118 | 
             
                      @core_sample_ids.push(target_ids.dup)
         | 
| 106 119 | 
             
                      target_ids.delete(query_id)
         | 
| 107 120 | 
             
                      while (m = target_ids.shift)
         | 
| 108 | 
            -
                        neighbor_ids = region_query( | 
| 121 | 
            +
                        neighbor_ids = region_query(metric_mat[m, true])
         | 
| 109 122 | 
             
                        next if neighbor_ids.size < @params[:min_samples]
         | 
| 110 123 | 
             
                        neighbor_ids.each do |n|
         | 
| 111 124 | 
             
                          target_ids.push(n) if @labels[n] < -1
         | 
| @@ -116,9 +129,8 @@ module Rumale | |
| 116 129 | 
             
                    end
         | 
| 117 130 | 
             
                  end
         | 
| 118 131 |  | 
| 119 | 
            -
                  def region_query( | 
| 120 | 
            -
                     | 
| 121 | 
            -
                    distance_arr.lt(@params[:eps]).where.to_a
         | 
| 132 | 
            +
                  def region_query(metric_arr)
         | 
| 133 | 
            +
                    metric_arr.lt(@params[:eps]).where.to_a
         | 
| 122 134 | 
             
                  end
         | 
| 123 135 | 
             
                end
         | 
| 124 136 | 
             
              end
         | 
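
The dbscan.rb changes above let the analyzer run on a precomputed square distance matrix, with the neighborhood test reduced to "distance below eps" on one row of that matrix. The following plain-Ruby sketch shows that idea under stated assumptions: `dbscan_precomputed` is a hypothetical name, nested Arrays stand in for Numo arrays, and the cluster expansion is a textbook DBSCAN loop rather than a copy of Rumale's `expand_cluster`. The label convention (-2 unvisited, -1 noise, 0 and up for cluster ids) mirrors what is visible in the diff.

```ruby
# Sketch only (hypothetical helper, not Rumale's implementation):
# DBSCAN on a precomputed, square distance matrix given as nested Arrays.
def dbscan_precomputed(dist, eps:, min_samples:)
  n_samples = dist.size
  unless dist.all? { |row| row.size == n_samples }
    raise ArgumentError, 'Expect the input distance matrix to be square.'
  end

  # Indices whose distance in the given row is below eps (the point itself included).
  region_query = ->(row) { row.each_index.select { |j| row[j] < eps } }

  labels = Array.new(n_samples, -2) # -2 = unvisited, -1 = noise
  cluster_id = 0
  n_samples.times do |query_id|
    next if labels[query_id] >= -1

    seeds = region_query.call(dist[query_id])
    if seeds.size < min_samples
      labels[query_id] = -1 # provisional noise; may later become a border point
      next
    end

    labels[query_id] = cluster_id
    queue = seeds - [query_id]
    until queue.empty?
      m = queue.shift
      next if labels[m] >= 0 # already assigned to a cluster

      was_unvisited = labels[m] == -2
      labels[m] = cluster_id
      next unless was_unvisited # former noise is a border point, not a core point

      neighbors = region_query.call(dist[m])
      queue.concat(neighbors) if neighbors.size >= min_samples
    end
    cluster_id += 1
  end
  labels
end

points = [0, 1, 2, 10, 11, 12, 30]
dist = points.map { |p| points.map { |q| (p - q).abs } }
labels = dbscan_precomputed(dist, eps: 1.5, min_samples: 2)
puts labels.inspect # => [0, 0, 0, 1, 1, 1, -1]
```

With Rumale itself, the equivalent call would pass `metric: 'precomputed'` to `Rumale::Clustering::DBSCAN.new` and hand the distance matrix to `fit_predict`, as documented in the hunks above.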
    
        data/lib/rumale/clustering/k_medoids.rb
    CHANGED
    
    | @@ -29,8 +29,8 @@ module Rumale | |
| 29 29 | 
             
                  # Create a new cluster analyzer with K-Medoids method.
         | 
| 30 30 | 
             
                  #
         | 
| 31 31 | 
             
                  # @param n_clusters [Integer] The number of clusters.
         | 
| 32 | 
            -
                  # @param metric [String] The metric to calculate the distances | 
| 33 | 
            -
                  #   If metric is 'euclidean', Euclidean distance is calculated for distance  | 
| 32 | 
            +
                  # @param metric [String] The metric to calculate the distances.
         | 
| 33 | 
            +
                  #   If metric is 'euclidean', Euclidean distance is calculated for distance between points.
         | 
| 34 34 | 
             
                  #   If metric is 'precomputed', the fit and fit_transform methods expect to be given a distance matrix.
         | 
| 35 35 | 
             
                  # @param init [String] The initialization method for centroids ('random' or 'k-means++').
         | 
| 36 36 | 
             
                  # @param max_iter [Integer] The maximum number of iterations.
         | 
    
        data/lib/rumale/clustering/snn.rb
    ADDED
    
    | @@ -0,0 +1,76 @@ | |
| 1 | 
            +
            # frozen_string_literal: true
         | 
| 2 | 
            +
             | 
| 3 | 
            +
            require 'rumale/pairwise_metric'
         | 
| 4 | 
            +
            require 'rumale/clustering/dbscan'
         | 
| 5 | 
            +
             | 
| 6 | 
            +
            module Rumale
         | 
| 7 | 
            +
              module Clustering
         | 
| 8 | 
            +
                # SNN is a class that implements Shared Nearest Neighbor cluster analysis.
         | 
| 9 | 
            +
                # The SNN method is a variation of DBSCAN that uses similarity based on k-nearest neighbors as a metric.
         | 
| 10 | 
            +
                #
         | 
| 11 | 
            +
                # @example
         | 
| 12 | 
            +
                #   analyzer = Rumale::Clustering::SNN.new(n_neighbors: 10, eps: 5, min_samples: 5)
         | 
| 13 | 
            +
                #   cluster_labels = analyzer.fit_predict(samples)
         | 
| 14 | 
            +
                #
         | 
| 15 | 
            +
                # *Reference*
         | 
| 16 | 
            +
                # - L. Ertoz, M. Steinbach, and V. Kumar, "Finding Clusters of Different Sizes, Shapes, and Densities in Noisy, High Dimensional Data," Proc. SDM'03, pp. 47--58, 2003.
         | 
| 17 | 
            +
                # - M E. Houle, H-P. Kriegel, P. Kroger, E. Schubert, and A. Zimek, "Can Shared-Neighbor Distances Defeat the Curse of Dimensionality?," Proc. SSDBM'10, pp. 482--500, 2010.
         | 
| 18 | 
            +
                class SNN < DBSCAN
         | 
| 19 | 
            +
                  # Create a new cluster analyzer with Shared Nearest Neighbor method.
         | 
| 20 | 
            +
                  #
         | 
| 21 | 
            +
                  # @param n_neighbors [Integer] The number of neighbors to be used for finding k-nearest neighbors.
         | 
| 22 | 
            +
                  # @param eps [Integer] The threshold value for finding connected components based on similarity.
         | 
| 23 | 
            +
                  # @param min_samples [Integer] The number of neighbor samples to be used for the criterion whether a point is a core point.
         | 
| 24 | 
            +
                  # @param metric [String] The metric to calculate the distances.
         | 
| 25 | 
            +
                  #   If metric is 'euclidean', Euclidean distance is calculated for distance between points.
         | 
| 26 | 
            +
                  #   If metric is 'precomputed', the fit and fit_transform methods expect to be given a distance matrix.
         | 
| 27 | 
            +
                  def initialize(n_neighbors: 10, eps: 5, min_samples: 5, metric: 'euclidean')
         | 
| 28 | 
            +
                    check_params_integer(n_neighbors: n_neighbors, min_samples: min_samples)
         | 
| 29 | 
            +
                    check_params_string(metric: metric)
         | 
| 30 | 
            +
                    @params = {}
         | 
| 31 | 
            +
                    @params[:n_neighbors] = n_neighbors
         | 
| 32 | 
            +
                    @params[:eps] = eps
         | 
| 33 | 
            +
                    @params[:min_samples] = min_samples
         | 
| 34 | 
            +
                    @params[:metric] = metric == 'precomputed' ? 'precomputed' : 'euclidean'
         | 
| 35 | 
            +
                    @core_sample_ids = nil
         | 
| 36 | 
            +
                    @labels = nil
         | 
| 37 | 
            +
                  end
         | 
| 38 | 
            +
             | 
| 39 | 
            +
                  # Analysis clusters with given training data.
         | 
| 40 | 
            +
                  #
         | 
| 41 | 
            +
                  # @overload fit(x) -> SNN
         | 
| 42 | 
            +
                  #   @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for cluster analysis.
         | 
| 43 | 
            +
                  #     If the metric is 'precomputed', x must be a square distance matrix (shape: [n_samples, n_samples]).
         | 
| 44 | 
            +
                  # @return [SNN] The learned cluster analyzer itself.
         | 
| 45 | 
            +
                  def fit(x, _y = nil)
         | 
| 46 | 
            +
                    super
         | 
| 47 | 
            +
                  end
         | 
| 48 | 
            +
             | 
| 49 | 
            +
                  # Analysis clusters and assign samples to clusters.
         | 
| 50 | 
            +
                  #
         | 
| 51 | 
            +
                  # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The samples to be used for cluster analysis.
         | 
| 52 | 
            +
                  #   If the metric is 'precomputed', x must be a square distance matrix (shape: [n_samples, n_samples]).
         | 
| 53 | 
            +
                  # @return [Numo::Int32] (shape: [n_samples]) Predicted cluster label per sample.
         | 
| 54 | 
            +
                  def fit_predict(x)
         | 
| 55 | 
            +
                    super
         | 
| 56 | 
            +
                  end
         | 
| 57 | 
            +
             | 
| 58 | 
            +
                  private
         | 
| 59 | 
            +
             | 
| 60 | 
            +
                  def calc_pairwise_metrics(x)
         | 
| 61 | 
            +
                    distance_mat = @params[:metric] == 'precomputed' ? x : Rumale::PairwiseMetric.euclidean_distance(x)
         | 
| 62 | 
            +
                    n_samples = distance_mat.shape[0]
         | 
| 63 | 
            +
                    adjacency_mat = Numo::DFloat.zeros(n_samples, n_samples)
         | 
| 64 | 
            +
                    n_samples.times do |n|
         | 
| 65 | 
            +
                      neighbor_ids = distance_mat[n, true].sort_index[0...@params[:n_neighbors]]
         | 
| 66 | 
            +
                      adjacency_mat[n, neighbor_ids] = 1
         | 
| 67 | 
            +
                    end
         | 
| 68 | 
            +
                    adjacency_mat.dot(adjacency_mat.transpose)
         | 
| 69 | 
            +
                  end
         | 
| 70 | 
            +
             | 
| 71 | 
            +
                  def region_query(similarity_arr)
         | 
| 72 | 
            +
                    similarity_arr.gt(@params[:eps]).where.to_a
         | 
| 73 | 
            +
                  end
         | 
| 74 | 
            +
                end
         | 
| 75 | 
            +
              end
         | 
| 76 | 
            +
            end
         | 
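
In snn.rb above, `calc_pairwise_metrics` builds a 0/1 k-nearest-neighbor adjacency matrix and multiplies it by its transpose, so each entry counts the neighbors two samples share. `region_query` then keeps the samples whose similarity is greater than eps, which is why eps is an integer threshold here rather than a radius. A minimal plain-Ruby sketch of the similarity step follows. The name `snn_similarity` and the nested-Array representation are assumptions for illustration, and ties are broken by index, which may differ from Numo's `sort_index`.

```ruby
# Sketch only (hypothetical helper, not Rumale's code): shared-nearest-neighbor
# similarity computed from a square distance matrix given as nested Arrays.
# Entry [i][j] is the number of samples found in both the k-nearest-neighbor
# list of i and that of j, i.e. what A * A^T yields for a 0/1 adjacency matrix A.
def snn_similarity(dist, n_neighbors:)
  knn = dist.map do |row|
    # Indices of the n_neighbors smallest distances; a sample is its own nearest neighbor.
    row.each_index.sort_by { |j| [row[j], j] }.first(n_neighbors)
  end
  knn.map { |a| knn.map { |b| (a & b).size } }
end

points = [0, 1, 2, 10, 11, 12]
dist = points.map { |p| points.map { |q| (p - q).abs } }
similarity = snn_similarity(dist, n_neighbors: 3)
similarity.each { |row| puts row.inspect }
```

In this toy data the two groups share all three neighbors internally and none across groups, so a threshold such as `eps: 2` would separate them.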
    
        data/lib/rumale/decomposition/pca.rb
    CHANGED
    
    | @@ -80,7 +80,8 @@ module Rumale | |
| 80 80 | 
             
                    covariance_mat = centered_x.transpose.dot(centered_x) / (n_samples - 1)
         | 
| 81 81 | 
             
                    if @params[:solver] == 'evd' && enable_linalg?
         | 
| 82 82 | 
             
                      _, evecs = Numo::Linalg.eigh(covariance_mat, vals_range: (n_features - @params[:n_components])...n_features)
         | 
| 83 | 
            -
                       | 
| 83 | 
            +
                      comps = evecs.reverse(1).transpose
         | 
| 84 | 
            +
                      @components = @params[:n_components] == 1 ? comps[0, true].dup : comps.dup
         | 
| 84 85 | 
             
                    else
         | 
| 85 86 | 
             
                      @params[:n_components].times do
         | 
| 86 87 | 
             
                        comp_vec = Rumale::Utils.rand_uniform(n_features, sub_rng)
         | 
    
        data/lib/rumale/linear_model/linear_regression.rb
    CHANGED
    
    | @@ -6,7 +6,7 @@ require 'rumale/base/regressor' | |
| 6 6 | 
             
            module Rumale
         | 
| 7 7 | 
             
              module LinearModel
         | 
| 8 8 | 
             
                # LinearRegression is a class that implements ordinary least square linear regression
         | 
| 9 | 
            -
                # with mini-batch stochastic gradient descent optimization.
         | 
| 9 | 
            +
                # with mini-batch stochastic gradient descent optimization or singular value decomposition.
         | 
| 10 10 | 
             
                #
         | 
| 11 11 | 
             
                # @example
         | 
| 12 12 | 
             
                #   estimator =
         | 
| @@ -14,6 +14,11 @@ module Rumale | |
| 14 14 | 
             
                #   estimator.fit(training_samples, traininig_values)
         | 
| 15 15 | 
             
                #   results = estimator.predict(testing_samples)
         | 
| 16 16 | 
             
                #
         | 
| 17 | 
            +
                #   # If Numo::Linalg is installed, you can specify 'svd' for the solver option.
         | 
| 18 | 
            +
                #   require 'numo/linalg/autoloader'
         | 
| 19 | 
            +
                #   estimator = Rumale::LinearModel::LinearRegression.new(solver: 'svd')
         | 
| 20 | 
            +
                #   estimator.fit(training_samples, traininig_values)
         | 
| 21 | 
            +
                #   results = estimator.predict(testing_samples)
         | 
| 17 22 | 
             
                class LinearRegression < BaseLinearModel
         | 
| 18 23 | 
             
                  include Base::Regressor
         | 
| 19 24 |  | 
| @@ -34,23 +39,32 @@ module Rumale | |
| 34 39 | 
             
                  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
         | 
| 35 40 | 
             
                  # @param bias_scale [Float] The scale of the bias term.
         | 
| 36 41 | 
             
                  # @param max_iter [Integer] The maximum number of iterations.
         | 
| 42 | 
            +
                  #   If solver = 'svd', this parameter is ignored.
         | 
| 37 43 | 
             
                  # @param batch_size [Integer] The size of the mini batches.
         | 
| 44 | 
            +
                  #   If solver = 'svd', this parameter is ignored.
         | 
| 38 45 | 
             
                  # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
         | 
| 39 46 | 
             
                  #   If nil is given, Nadam is used.
         | 
| 47 | 
            +
                  #   If solver = 'svd', this parameter is ignored.
         | 
| 48 | 
            +
                  # @param solver [String] The algorithm to calculate weights. ('sgd' or 'svd').
         | 
| 49 | 
            +
                  #   'sgd' uses the stochastic gradient descent optimization.
         | 
| 50 | 
            +
                  #   'svd' performs singular value decomposition of samples.
         | 
| 40 51 | 
             
                  # @param n_jobs [Integer] The number of jobs for running the fit method in parallel.
         | 
| 41 52 | 
             
                  #   If nil is given, the method does not execute in parallel.
         | 
| 42 53 | 
             
                  #   If zero or less is given, it becomes equal to the number of processors.
         | 
| 43 54 | 
             
                  #   This parameter is ignored if the Parallel gem is not loaded.
         | 
| 44 55 | 
             
                  # @param random_seed [Integer] The seed value using to initialize the random generator.
         | 
| 45 56 | 
             
                  def initialize(fit_bias: false, bias_scale: 1.0, max_iter: 1000, batch_size: 10, optimizer: nil,
         | 
| 46 | 
            -
                                 n_jobs: nil, random_seed: nil)
         | 
| 57 | 
            +
                                 solver: 'sgd', n_jobs: nil, random_seed: nil)
         | 
| 47 58 | 
             
                    check_params_float(bias_scale: bias_scale)
         | 
| 48 59 | 
             
                    check_params_integer(max_iter: max_iter, batch_size: batch_size)
         | 
| 49 60 | 
             
                    check_params_boolean(fit_bias: fit_bias)
         | 
| 61 | 
            +
                    check_params_string(solver: solver)
         | 
| 50 62 | 
             
                    check_params_type_or_nil(Integer, n_jobs: n_jobs, random_seed: random_seed)
         | 
| 51 63 | 
             
                    check_params_positive(max_iter: max_iter, batch_size: batch_size)
         | 
| 52 64 | 
             
                    keywd_args = method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h.merge(reg_param: 0.0)
         | 
| 65 | 
            +
                    keywd_args.delete(:solver)
         | 
| 53 66 | 
             
                    super(keywd_args)
         | 
| 67 | 
            +
                    @params[:solver] = solver != 'svd' ? 'sgd' : 'svd'
         | 
| 54 68 | 
             
                  end
         | 
| 55 69 |  | 
| 56 70 | 
             
                  # Fit the model with given training data.
         | 
| @@ -63,20 +77,10 @@ module Rumale | |
| 63 77 | 
             
                    check_tvalue_array(y)
         | 
| 64 78 | 
             
                    check_sample_tvalue_size(x, y)
         | 
| 65 79 |  | 
| 66 | 
            -
                     | 
| 67 | 
            -
             | 
| 68 | 
            -
             | 
| 69 | 
            -
                    if n_outputs > 1
         | 
| 70 | 
            -
                      @weight_vec = Numo::DFloat.zeros(n_outputs, n_features)
         | 
| 71 | 
            -
                      @bias_term = Numo::DFloat.zeros(n_outputs)
         | 
| 72 | 
            -
                      if enable_parallel?
         | 
| 73 | 
            -
                        models = parallel_map(n_outputs) { |n| partial_fit(x, y[true, n]) }
         | 
| 74 | 
            -
                        n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = models[n] }
         | 
| 75 | 
            -
                      else
         | 
| 76 | 
            -
                        n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = partial_fit(x, y[true, n]) }
         | 
| 77 | 
            -
                      end
         | 
| 80 | 
            +
                    if @params[:solver] == 'svd' && enable_linalg?
         | 
| 81 | 
            +
                      fit_svd(x, y)
         | 
| 78 82 | 
             
                    else
         | 
| 79 | 
            -
                       | 
| 83 | 
            +
                      fit_sgd(x, y)
         | 
| 80 84 | 
             
                    end
         | 
| 81 85 |  | 
| 82 86 | 
             
                    self
         | 
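
When `solver: 'svd'` is selected and Numo::Linalg is available, `fit` dispatches to the private `fit_svd` method added in this file. Read from that code, the weights appear to be the pseudoinverse least-squares solution. The notation below is chosen for this note and does not come from the source: X is the sample matrix (with a bias column appended when `fit_bias` is true), s the vector of singular values, and y the target values.

```latex
\begin{aligned}
X &= U \,\operatorname{diag}(s)\, V^{\top}
  && \text{(thin SVD, as returned with job: 'S')} \\
D &= \operatorname{diag}\!\left(\frac{s_i}{s_i^{2}}\right)
   = \operatorname{diag}\!\left(\frac{1}{s_i}\right) \\
w &= V D\, U^{\top} y = X^{+} y
\end{aligned}
```

When `fit_bias` is true, the last row of w is taken as the bias term and the remaining rows as the weight vector. Note that the expression s / s**2 assumes every singular value is nonzero.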
| @@ -112,6 +116,41 @@ module Rumale | |
| 112 116 |  | 
| 113 117 | 
             
                  private
         | 
| 114 118 |  | 
| 119 | 
            +
                  def fit_svd(x, y)
         | 
| 120 | 
            +
                    samples = @params[:fit_bias] ? expand_feature(x) : x
         | 
| 121 | 
            +
             | 
| 122 | 
            +
                    s, u, vt = Numo::Linalg.svd(samples, driver: 'sdd', job: 'S')
         | 
| 123 | 
            +
                    d = (s / s**2).diag
         | 
| 124 | 
            +
                    w = vt.transpose.dot(d).dot(u.transpose).dot(y)
         | 
| 125 | 
            +
             | 
| 126 | 
            +
                    is_single_target_vals = y.shape[1].nil?
         | 
| 127 | 
            +
                    if @params[:fit_bias]
         | 
| 128 | 
            +
                      @weight_vec = is_single_target_vals ? w[0...-1].dup : w[0...-1, true].dup
         | 
| 129 | 
            +
                      @bias_term = is_single_target_vals ? w[-1] : w[-1, true].dup
         | 
| 130 | 
            +
                    else
         | 
| 131 | 
            +
                      @weight_vec = w.dup
         | 
| 132 | 
            +
                      @bias_term = is_single_target_vals ? 0 : Numo::DFloat.zeros(y.shape[1])
         | 
| 133 | 
            +
                    end
         | 
| 134 | 
            +
                  end
         | 
| 135 | 
            +
             | 
| 136 | 
            +
                  def fit_sgd(x, y)
         | 
| 137 | 
            +
                    n_outputs = y.shape[1].nil? ? 1 : y.shape[1]
         | 
| 138 | 
            +
                    n_features = x.shape[1]
         | 
| 139 | 
            +
             | 
| 140 | 
            +
                    if n_outputs > 1
         | 
| 141 | 
            +
                      @weight_vec = Numo::DFloat.zeros(n_outputs, n_features)
         | 
| 142 | 
            +
                      @bias_term = Numo::DFloat.zeros(n_outputs)
         | 
| 143 | 
            +
                      if enable_parallel?
         | 
| 144 | 
            +
                        models = parallel_map(n_outputs) { |n| partial_fit(x, y[true, n]) }
         | 
| 145 | 
            +
                        n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = models[n] }
         | 
| 146 | 
            +
                      else
         | 
| 147 | 
            +
                        n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = partial_fit(x, y[true, n]) }
         | 
| 148 | 
            +
                      end
         | 
| 149 | 
            +
                    else
         | 
| 150 | 
            +
                      @weight_vec, @bias_term = partial_fit(x, y)
         | 
| 151 | 
            +
                    end
         | 
| 152 | 
            +
                  end
         | 
| 153 | 
            +
             | 
| 115 154 | 
             
                  def calc_loss_gradient(x, y, weight)
         | 
| 116 155 | 
             
                    2.0 * (x.dot(weight) - y)
         | 
| 117 156 | 
             
                  end
         | 
    
        data/lib/rumale/linear_model/ridge.rb
    CHANGED
    
    | @@ -6,7 +6,7 @@ require 'rumale/base/regressor' | |
| 6 6 | 
             
            module Rumale
         | 
| 7 7 | 
             
              module LinearModel
         | 
| 8 8 | 
             
                # Ridge is a class that implements Ridge Regression
         | 
| 9 | 
            -
                # with mini-batch stochastic gradient descent optimization.
         | 
| 9 | 
            +
                # with mini-batch stochastic gradient descent optimization or singular value decomposition.
         | 
| 10 10 | 
             
                #
         | 
| 11 11 | 
             
                # @example
         | 
| 12 12 | 
             
                #   estimator =
         | 
| @@ -14,6 +14,11 @@ module Rumale | |
| 14 14 | 
             
                #   estimator.fit(training_samples, traininig_values)
         | 
| 15 15 | 
             
                #   results = estimator.predict(testing_samples)
         | 
| 16 16 | 
             
                #
         | 
| 17 | 
            +
                #   # If Numo::Linalg is installed, you can specify 'svd' for the solver option.
         | 
| 18 | 
            +
                #   require 'numo/linalg/autoloader'
         | 
| 19 | 
            +
                #   estimator = Rumale::LinearModel::Ridge.new(reg_param: 0.1, solver: 'svd')
         | 
| 20 | 
            +
                #   estimator.fit(training_samples, traininig_values)
         | 
| 21 | 
            +
                #   results = estimator.predict(testing_samples)
         | 
| 17 22 | 
             
                class Ridge < BaseLinearModel
         | 
| 18 23 | 
             
                  include Base::Regressor
         | 
| 19 24 |  | 
| @@ -35,22 +40,32 @@ module Rumale | |
| 35 40 | 
             
                  # @param fit_bias [Boolean] The flag indicating whether to fit the bias term.
         | 
| 36 41 | 
             
                  # @param bias_scale [Float] The scale of the bias term.
         | 
| 37 42 | 
             
                  # @param max_iter [Integer] The maximum number of iterations.
         | 
| 43 | 
            +
                  #   If solver = 'svd', this parameter is ignored.
         | 
| 38 44 | 
             
                  # @param batch_size [Integer] The size of the mini batches.
         | 
| 45 | 
            +
                  #   If solver = 'svd', this parameter is ignored.
         | 
| 39 46 | 
             
                  # @param optimizer [Optimizer] The optimizer to calculate adaptive learning rate.
         | 
| 40 47 | 
             
                  #   If nil is given, Nadam is used.
         | 
| 48 | 
            +
                  #   If solver = 'svd', this parameter is ignored.
         | 
| 49 | 
            +
                  # @param solver [String] The algorithm to calculate weights. ('sgd' or 'svd').
         | 
| 50 | 
            +
                  #   'sgd' uses the stochastic gradient descent optimization.
         | 
| 51 | 
            +
                  #   'svd' performs singular value decomposition of samples.
         | 
| 41 52 | 
             
                  # @param n_jobs [Integer] The number of jobs for running the fit method in parallel.
         | 
| 42 53 | 
             
                  #   If nil is given, the method does not execute in parallel.
         | 
| 43 54 | 
             
                  #   If zero or less is given, it becomes equal to the number of processors.
         | 
| 44 | 
            -
                  #   This parameter is ignored if the Parallel gem is not loaded.
         | 
| 55 | 
            +
                  #   This parameter is ignored if the Parallel gem is not loaded or the solver is 'svd'.
         | 
| 45 56 | 
             
                  # @param random_seed [Integer] The seed value using to initialize the random generator.
         | 
| 46 57 | 
             
                  def initialize(reg_param: 1.0, fit_bias: false, bias_scale: 1.0, max_iter: 1000, batch_size: 10, optimizer: nil,
         | 
| 47 | 
            -
                                 n_jobs: nil, random_seed: nil)
         | 
| 58 | 
            +
                                 solver: 'sgd', n_jobs: nil, random_seed: nil)
         | 
| 48 59 | 
             
                    check_params_float(reg_param: reg_param, bias_scale: bias_scale)
         | 
| 49 60 | 
             
                    check_params_integer(max_iter: max_iter, batch_size: batch_size)
         | 
| 50 61 | 
             
                    check_params_boolean(fit_bias: fit_bias)
         | 
| 62 | 
            +
                    check_params_string(solver: solver)
         | 
| 51 63 | 
             
                    check_params_type_or_nil(Integer, n_jobs: n_jobs, random_seed: random_seed)
         | 
| 52 64 | 
             
                    check_params_positive(reg_param: reg_param, max_iter: max_iter, batch_size: batch_size)
         | 
| 53 | 
            -
                     | 
| 65 | 
            +
                    keywd_args = method(:initialize).parameters.map { |_t, arg| [arg, binding.local_variable_get(arg)] }.to_h
         | 
| 66 | 
            +
                    keywd_args.delete(:solver)
         | 
| 67 | 
            +
                    super(keywd_args)
         | 
| 68 | 
            +
                    @params[:solver] = solver != 'svd' ? 'sgd' : 'svd'
         | 
| 54 69 | 
             
                  end
         | 
| 55 70 |  | 
| 56 71 | 
             
                  # Fit the model with given training data.
         | 
| @@ -63,20 +78,10 @@ module Rumale | |
| 63 78 | 
             
                    check_tvalue_array(y)
         | 
| 64 79 | 
             
                    check_sample_tvalue_size(x, y)
         | 
| 65 80 |  | 
| 66 | 
            -
                     | 
| 67 | 
            -
             | 
| 68 | 
            -
             | 
| 69 | 
            -
                    if n_outputs > 1
         | 
| 70 | 
            -
                      @weight_vec = Numo::DFloat.zeros(n_outputs, n_features)
         | 
| 71 | 
            -
                      @bias_term = Numo::DFloat.zeros(n_outputs)
         | 
| 72 | 
            -
                      if enable_parallel?
         | 
| 73 | 
            -
                        models = parallel_map(n_outputs) { |n| partial_fit(x, y[true, n]) }
         | 
| 74 | 
            -
                        n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = models[n] }
         | 
| 75 | 
            -
                      else
         | 
| 76 | 
            -
                        n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = partial_fit(x, y[true, n]) }
         | 
| 77 | 
            -
                      end
         | 
| 81 | 
            +
                    if @params[:solver] == 'svd' && enable_linalg?
         | 
| 82 | 
            +
                      fit_svd(x, y)
         | 
| 78 83 | 
             
                    else
         | 
| 79 | 
            -
                       | 
| 84 | 
            +
                      fit_sgd(x, y)
         | 
| 80 85 | 
             
                    end
         | 
| 81 86 |  | 
| 82 87 | 
             
                    self
         | 
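The constructor and fit changes above combine into a simple selection rule: any solver value other than 'svd' is stored as 'sgd', and the SVD path is taken only when the linear algebra backend is available. The following is a minimal sketch of that rule in plain Ruby, not the library's code. The names `normalize_solver` and `choose_fit_method` are hypothetical, and `linalg_available` is an assumed stand-in for the library's `enable_linalg?` check, whose body is not shown in this diff.

```ruby
# Hypothetical helper mirroring the constructor line
#   @params[:solver] = solver != 'svd' ? 'sgd' : 'svd'
# Anything that is not exactly 'svd' falls back to 'sgd'.
def normalize_solver(solver)
  solver == 'svd' ? 'svd' : 'sgd'
end

# Hypothetical dispatch mirroring the condition in fit.
# linalg_available stands in for enable_linalg? (an assumption).
def choose_fit_method(solver, linalg_available)
  if normalize_solver(solver) == 'svd' && linalg_available
    :fit_svd
  else
    :fit_sgd
  end
end

choose_fit_method('svd', true)   # => :fit_svd
choose_fit_method('svd', false)  # => :fit_sgd
choose_fit_method('other', true) # => :fit_sgd
```

Under this reading, requesting 'svd' without Numo::Linalg loaded would quietly use stochastic gradient descent rather than raise an error.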
| @@ -112,6 +117,41 @@ module Rumale | |
| 112 117 |  | 
| 113 118 | 
             
                  private
         | 
| 114 119 |  | 
| 120 | 
            +
                  def fit_svd(x, y)
         | 
| 121 | 
            +
                    samples = @params[:fit_bias] ? expand_feature(x) : x
         | 
| 122 | 
            +
             | 
| 123 | 
            +
                    s, u, vt = Numo::Linalg.svd(samples, driver: 'sdd', job: 'S')
         | 
| 124 | 
            +
                    d = (s / (s**2 + @params[:reg_param])).diag
         | 
| 125 | 
            +
                    w = vt.transpose.dot(d).dot(u.transpose).dot(y)
         | 
| 126 | 
            +
             | 
| 127 | 
            +
                    is_single_target_vals = y.shape[1].nil?
         | 
| 128 | 
            +
                    if @params[:fit_bias]
         | 
| 129 | 
            +
                      @weight_vec = is_single_target_vals ? w[0...-1].dup : w[0...-1, true].dup
         | 
| 130 | 
            +
                      @bias_term = is_single_target_vals ? w[-1] : w[-1, true].dup
         | 
| 131 | 
            +
                    else
         | 
| 132 | 
            +
                      @weight_vec = w.dup
         | 
| 133 | 
            +
                      @bias_term = is_single_target_vals ? 0 : Numo::DFloat.zeros(y.shape[1])
         | 
| 134 | 
            +
                    end
         | 
| 135 | 
            +
                  end
         | 
| 136 | 
            +
             | 
| 137 | 
            +
                  def fit_sgd(x, y)
         | 
| 138 | 
            +
                    n_outputs = y.shape[1].nil? ? 1 : y.shape[1]
         | 
| 139 | 
            +
                    n_features = x.shape[1]
         | 
| 140 | 
            +
             | 
| 141 | 
            +
                    if n_outputs > 1
         | 
| 142 | 
            +
                      @weight_vec = Numo::DFloat.zeros(n_outputs, n_features)
         | 
| 143 | 
            +
                      @bias_term = Numo::DFloat.zeros(n_outputs)
         | 
| 144 | 
            +
                      if enable_parallel?
         | 
| 145 | 
            +
                        models = parallel_map(n_outputs) { |n| partial_fit(x, y[true, n]) }
         | 
| 146 | 
            +
                        n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = models[n] }
         | 
| 147 | 
            +
                      else
         | 
| 148 | 
            +
                        n_outputs.times { |n| @weight_vec[n, true], @bias_term[n] = partial_fit(x, y[true, n]) }
         | 
| 149 | 
            +
                      end
         | 
| 150 | 
            +
                    else
         | 
| 151 | 
            +
                      @weight_vec, @bias_term = partial_fit(x, y)
         | 
| 152 | 
            +
                    end
         | 
| 153 | 
            +
                  end
         | 
| 154 | 
            +
             | 
| 115 155 | 
             
                  def calc_loss_gradient(x, y, weight)
         | 
| 116 156 | 
             
                    2.0 * (x.dot(weight) - y)
         | 
| 117 157 | 
             
                  end
         | 
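The two fit_svd methods in this diff differ only in the diagonal matrix built from the singular values: linear regression uses s / s**2, while ridge uses s / (s**2 + reg_param). The sketch below compares those per-singular-value factors on plain Ruby arrays. It is an illustration under assumptions, not the library's implementation: the function names and the sample singular values are made up, and no actual decomposition is performed.

```ruby
# Factors used by the linear regression solver: s / s**2, i.e. 1 / s.
def least_squares_factors(singular_values)
  singular_values.map { |s| s / s**2 }
end

# Factors used by the ridge solver: s / (s**2 + reg_param).
def ridge_factors(singular_values, reg_param)
  singular_values.map { |s| s / (s**2 + reg_param) }
end

# Illustrative singular values (assumed, not taken from real data).
singular_values = [3.0, 1.0, 0.1]

ols = least_squares_factors(singular_values)
ridge = ridge_factors(singular_values, 0.1)

singular_values.each_with_index do |s, i|
  puts format('s = %.1f  least squares: %.4f  ridge: %.4f', s, ols[i], ridge[i])
end
```

With a positive reg_param every ridge factor is smaller than its least squares counterpart, and the gap is largest for the smallest singular value, which is where the unregularized factor 1 / s grows without bound.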
    
        data/lib/rumale/pairwise_metric.rb
    CHANGED
    
    | @@ -18,6 +18,24 @@ module Rumale | |
| 18 18 | 
             
                    Numo::NMath.sqrt(squared_error(x, y).abs)
         | 
| 19 19 | 
             
                  end
         | 
| 20 20 |  | 
| 21 | 
            +
                  # Calculate the pairwise manhattan distances between x and y.
         | 
| 22 | 
            +
                  #
         | 
| 23 | 
            +
                  # @param x [Numo::DFloat] (shape: [n_samples_x, n_features])
         | 
| 24 | 
            +
                  # @param y [Numo::DFloat] (shape: [n_samples_y, n_features])
         | 
| 25 | 
            +
                  # @return [Numo::DFloat] (shape: [n_samples_x, n_samples_x] or [n_samples_x, n_samples_y] if y is given)
         | 
| 26 | 
            +
                  def manhattan_distance(x, y = nil)
         | 
| 27 | 
            +
                    y = x if y.nil?
         | 
| 28 | 
            +
                    Rumale::Validation.check_sample_array(x)
         | 
| 29 | 
            +
                    Rumale::Validation.check_sample_array(y)
         | 
| 30 | 
            +
                    n_samples_x = x.shape[0]
         | 
| 31 | 
            +
                    n_samples_y = y.shape[0]
         | 
| 32 | 
            +
                    distance_mat = Numo::DFloat.zeros(n_samples_x, n_samples_y)
         | 
| 33 | 
            +
                    n_samples_x.times do |n|
         | 
| 34 | 
            +
                      distance_mat[n, true] = (y - x[n, true]).abs.sum(axis: 1)
         | 
| 35 | 
            +
                    end
         | 
| 36 | 
            +
                    distance_mat
         | 
| 37 | 
            +
                  end
         | 
| 38 | 
            +
             | 
| 21 39 | 
             
                  # Calculate the pairwise squared errors between x and y.
         | 
| 22 40 | 
             
                  #
         | 
| 23 41 | 
             
                  # @param x [Numo::DFloat] (shape: [n_samples_x, n_features])
         | 
| @@ -27,11 +45,6 @@ module Rumale | |
| 27 45 | 
             
                    y = x if y.nil?
         | 
| 28 46 | 
             
                    Rumale::Validation.check_sample_array(x)
         | 
| 29 47 | 
             
                    Rumale::Validation.check_sample_array(y)
         | 
| 30 | 
            -
                    # sum_x_vec = (x**2).sum(1)
         | 
| 31 | 
            -
                    # sum_y_vec = (y**2).sum(1)
         | 
| 32 | 
            -
                    # dot_xy_mat = x.dot(y.transpose)
         | 
| 33 | 
            -
                    # dot_xy_mat * -2.0 + sum_x_vec.tile(y.shape[0], 1).transpose + sum_y_vec.tile(x.shape[0], 1)
         | 
| 34 | 
            -
                    #
         | 
| 35 48 | 
             
                    n_features = x.shape[1]
         | 
| 36 49 | 
             
                    one_vec = Numo::DFloat.ones(n_features).expand_dims(1)
         | 
| 37 50 | 
             
                    sum_x_vec = (x**2).dot(one_vec)
         | 
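The manhattan_distance method added above fills one row of the distance matrix per sample in x by summing absolute coordinate differences against every row of y. A minimal sketch of the same computation with nested Ruby arrays instead of Numo::DFloat follows. The name `manhattan_distance_sketch` is hypothetical, and the input validation performed by the library is omitted.

```ruby
# Pairwise Manhattan (L1) distances between the rows of x and the rows of y.
# When y is omitted, distances are computed between the rows of x itself.
def manhattan_distance_sketch(x, y = nil)
  y = x if y.nil?
  x.map do |row_x|
    y.map do |row_y|
      # Sum of absolute differences over the features.
      row_x.zip(row_y).sum { |a, b| (a - b).abs }
    end
  end
end

x = [[0.0, 0.0], [1.0, 2.0]]
y = [[3.0, 4.0]]

manhattan_distance_sketch(x, y) # => [[7.0], [4.0]]
manhattan_distance_sketch(x)    # => [[0.0, 3.0], [3.0, 0.0]]
```

The result has one row per sample in x and one column per sample in y, matching the shape described in the method's @return comment.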
    
        data/lib/rumale/version.rb
    CHANGED
    
    
    
        data/rumale.gemspec
    CHANGED
    
    | @@ -19,7 +19,7 @@ Gem::Specification.new do |spec| | |
| 19 19 | 
             
                Rumale currently supports Linear / Kernel Support Vector Machine,
         | 
| 20 20 | 
             
                Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine,
         | 
| 21 21 | 
             
                Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor algorithm,
         | 
| 22 | 
            -
                K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering,
         | 
| 22 | 
            +
                K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering,
         | 
| 23 23 | 
             
                Multidimensional Scaling, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.
         | 
| 24 24 | 
             
              MSG
         | 
| 25 25 | 
             
              spec.homepage      = 'https://github.com/yoshoku/rumale'
         | 
    
        metadata
    CHANGED
    
    | @@ -1,14 +1,14 @@ | |
| 1 1 | 
             
            --- !ruby/object:Gem::Specification
         | 
| 2 2 | 
             
            name: rumale
         | 
| 3 3 | 
             
            version: !ruby/object:Gem::Version
         | 
| 4 | 
            -
              version: 0.13. | 
| 4 | 
            +
              version: 0.13.1
         | 
| 5 5 | 
             
            platform: ruby
         | 
| 6 6 | 
             
            authors:
         | 
| 7 7 | 
             
            - yoshoku
         | 
| 8 8 | 
             
            autorequire: 
         | 
| 9 9 | 
             
            bindir: exe
         | 
| 10 10 | 
             
            cert_chain: []
         | 
| 11 | 
            -
            date: 2019- | 
| 11 | 
            +
            date: 2019-09-01 00:00:00.000000000 Z
         | 
| 12 12 | 
             
            dependencies:
         | 
| 13 13 | 
             
            - !ruby/object:Gem::Dependency
         | 
| 14 14 | 
             
              name: numo-narray
         | 
| @@ -128,7 +128,7 @@ description: | | |
| 128 128 | 
             
              Rumale currently supports Linear / Kernel Support Vector Machine,
         | 
| 129 129 | 
             
              Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine,
         | 
| 130 130 | 
             
              Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor algorithm,
         | 
| 131 | 
            -
              K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering,
         | 
| 131 | 
            +
              K-Means, K-Medoids, Gaussian Mixture Model, DBSCAN, SNN, Power Iteration Clustering,
         | 
| 132 132 | 
             
              Multidimensional Scaling, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.
         | 
| 133 133 | 
             
            email:
         | 
| 134 134 | 
             
            - yoshoku@outlook.com
         | 
| @@ -166,6 +166,7 @@ files: | |
| 166 166 | 
             
            - lib/rumale/clustering/k_means.rb
         | 
| 167 167 | 
             
            - lib/rumale/clustering/k_medoids.rb
         | 
| 168 168 | 
             
            - lib/rumale/clustering/power_iteration.rb
         | 
| 169 | 
            +
            - lib/rumale/clustering/snn.rb
         | 
| 169 170 | 
             
            - lib/rumale/dataset.rb
         | 
| 170 171 | 
             
            - lib/rumale/decomposition/nmf.rb
         | 
| 171 172 | 
             
            - lib/rumale/decomposition/pca.rb
         |