rumale 0.12.3 → 0.12.4
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +4 -0
- data/README.md +3 -2
- data/lib/rumale/dataset.rb +2 -2
- data/lib/rumale/manifold/mds.rb +175 -0
- data/lib/rumale/version.rb +1 -1
- data/lib/rumale.rb +1 -0
- data/rumale.gemspec +2 -1
- metadata +5 -3
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: fd6aef33fee80a240c1cad6a61189f9cb3a93034
+  data.tar.gz: 166db39ecff891c22648998d0524fee38b1fb906
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: a9ef86ae0e3c7f9477bbf4efd3feb05e0fa67bdddb3b6cb15bd9b6ae54aece4c9507156e228736cee0727a0377b934cd6f5a15597064df9a129938da91423316
+  data.tar.gz: 89f0bec8f13f504bc1620c13af6ae1e40475d2b1c1addf026f7510c33711401f7ab721d564b1bd9c7755a20c240bba0881473b20e13875c4f0eb45904dc572f0
data/CHANGELOG.md
CHANGED
data/README.md
CHANGED
@@ -6,14 +6,15 @@
 [](https://coveralls.io/github/yoshoku/rumale?branch=master)
 [](https://badge.fury.io/rb/rumale)
 [](https://github.com/yoshoku/rumale/blob/master/LICENSE.txt)
-[](https://www.rubydoc.info/gems/rumale/0.12.
+[](https://www.rubydoc.info/gems/rumale/0.12.4)

 Rumale (**Ru**by **ma**chine **le**arning) is a machine learning library in Ruby.
 Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python.
 Rumale supports Linear / Kernel Support Vector Machine,
 Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine,
 Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor classifier,
-K-Means, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering,
+K-Means, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering,
+Mutidimensional Scaling, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.

 This project was formerly known as "SVMKit".
 If you are using SVMKit, please install Rumale and replace `SVMKit` constants with `Rumale`.
data/lib/rumale/dataset.rb
CHANGED
@@ -56,7 +56,7 @@ module Rumale
 # @param noise [Float] The standard deviaion of gaussian noise added to the data.
 #   If nil is given, no noise is added.
 # @param factor [Float] The scale factor between inner and outer circles. The interval of factor is (0, 1).
-# @random_seed [Integer] The seed value using to initialize the random generator.
+# @param random_seed [Integer] The seed value using to initialize the random generator.
 def make_circles(n_samples, shuffle: true, noise: nil, factor: 0.8, random_seed: nil)
   Rumale::Validation.check_params_integer(n_samples: n_samples)
   Rumale::Validation.check_params_boolean(shuffle: shuffle)
@@ -95,7 +95,7 @@ module Rumale
 # @param shuffle [Boolean] The flag indicating whether to shuffle the dataset
 # @param noise [Float] The standard deviaion of gaussian noise added to the data.
 #   If nil is given, no noise is added.
-# @random_seed [Integer] The seed value using to initialize the random generator.
+# @param random_seed [Integer] The seed value using to initialize the random generator.
 def make_moons(n_samples, shuffle: true, noise: nil, random_seed: nil)
   Rumale::Validation.check_params_integer(n_samples: n_samples)
   Rumale::Validation.check_params_boolean(shuffle: shuffle)
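The change above is documentation-only: the mistyped YARD tag `@random_seed` becomes `@param random_seed` for both `make_circles` and `make_moons`; the method signatures are unchanged. A minimal usage sketch of the documented keyword, assuming the `rumale` gem is installed and that both helpers return a `[samples, labels]` pair of Numo arrays like other Rumale dataset generators:

```ruby
require 'rumale'

# Two interleaving circles; random_seed (the keyword documented by the fixed @param tag)
# makes the gaussian noise and shuffling reproducible.
x, y = Rumale::Dataset.make_circles(200, noise: 0.05, factor: 0.5, random_seed: 42)
puts x.shape.inspect          # expected: [200, 2]
puts y.to_a.uniq.sort.inspect # expected: the two class labels

# The same keyword on the half-moons generator.
mx, my = Rumale::Dataset.make_moons(200, noise: 0.05, random_seed: 42)
puts mx.shape.inspect         # expected: [200, 2]
puts my.to_a.uniq.sort.inspect
```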
data/lib/rumale/manifold/mds.rb
ADDED
@@ -0,0 +1,175 @@
+# frozen_string_literal: true
+
+require 'rumale/base/base_estimator'
+require 'rumale/base/transformer'
+require 'rumale/utils'
+require 'rumale/pairwise_metric'
+require 'rumale/decomposition/pca'
+
+module Rumale
+  module Manifold
+    # MDS is a class that implements Metric Multidimensional Scaling (MDS)
+    # with Scaling by MAjorizing a COmplicated Function (SMACOF) algorithm.
+    #
+    # @example
+    #   mds = Rumale::Manifold::MDS.new(init: 'pca', max_iter: 500, random_seed: 1)
+    #   representations = mds.fit_transform(samples)
+    #
+    # *Reference*
+    # - P J. F. Groenen and M. van de Velden, "Multidimensional Scaling by Majorization: A Review," J. of Statistical Software, Vol. 73 (8), 2016.
+    class MDS
+      include Base::BaseEstimator
+      include Base::Transformer
+
+      # Return the data in representation space.
+      # @return [Numo::DFloat] (shape: [n_samples, n_components])
+      attr_reader :embedding
+
+      # Return the stress function value after optimization.
+      # @return [Float]
+      attr_reader :stress
+
+      # Return the number of iterations run for optimization
+      # @return [Integer]
+      attr_reader :n_iter
+
+      # Return the random generator.
+      # @return [Random]
+      attr_reader :rng
+
+      # Create a new transformer with MDS.
+      #
+      # @param n_components [Integer] The number of dimensions on representation space.
+      # @param metric [String] The metric to calculate the distances in original space.
+      #   If metric is 'euclidean', Euclidean distance is calculated for distance in original space.
+      #   If metric is 'precomputed', the fit and fit_transform methods expect to be given a distance matrix.
+      # @param init [String] The init is a method to initialize the representaion space.
+      #   If init is 'random', the representaion space is initialized with normal random variables.
+      #   If init is 'pca', the result of principal component analysis as the initial value of the representation space.
+      # @param max_iter [Integer] The maximum number of iterations.
+      # @param tol [Float] The tolerance of stress value for terminating optimization.
+      #   If tol is nil, it does not use stress value as a criterion for terminating the optimization.
+      # @param verbose [Boolean] The flag indicating whether to output stress value during iteration.
+      # @param random_seed [Integer] The seed value using to initialize the random generator.
+      def initialize(n_components: 2, metric: 'euclidean', init: 'random',
+                     max_iter: 300, tol: nil, verbose: false, random_seed: nil)
+        check_params_integer(n_components: n_components, max_iter: max_iter)
+        check_params_string(metric: metric, init: init)
+        check_params_boolean(verbose: verbose)
+        check_params_type_or_nil(Float, tol: tol)
+        check_params_type_or_nil(Integer, random_seed: random_seed)
+        check_params_positive(n_components: n_components, max_iter: max_iter)
+        @params = {}
+        @params[:n_components] = n_components
+        @params[:max_iter] = max_iter
+        @params[:tol] = tol
+        @params[:metric] = metric
+        @params[:init] = init
+        @params[:verbose] = verbose
+        @params[:random_seed] = random_seed
+        @params[:random_seed] ||= srand
+        @rng = Random.new(@params[:random_seed])
+        @embedding = nil
+        @stress = nil
+        @n_iter = nil
+      end
+
+      # Fit the model with given training data.
+      #
+      # @overload fit(x) -> MDS
+      #
+      # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for fitting the model.
+      #   If the metric is 'precomputed', x must be a square distance matrix (shape: [n_samples, n_samples]).
+      # @return [MDS] The learned transformer itself.
+      def fit(x, _not_used = nil)
+        check_sample_array(x)
+        raise ArgumentError, 'Expect the input distance matrix to be square.' if @params[:metric] == 'precomputed' && x.shape[0] != x.shape[1]
+        # initialize some varibales.
+        n_samples = x.shape[0]
+        hi_distance_mat = @params[:metric] == 'precomputed' ? x : Rumale::PairwiseMetric.euclidean_distance(x)
+        @embedding = init_embedding(x)
+        lo_distance_mat = Rumale::PairwiseMetric.euclidean_distance(@embedding)
+        @stress = calc_stress(hi_distance_mat, lo_distance_mat)
+        @n_iter = 0
+        # perform optimization.
+        @params[:max_iter].times do |t|
+          # guttman tarnsform.
+          ratio = hi_distance_mat / lo_distance_mat
+          ratio[ratio.diag_indices] = 0.0
+          ratio[lo_distance_mat.eq(0)] = 0.0
+          tmp_mat = -ratio
+          tmp_mat[tmp_mat.diag_indices] += ratio.sum(axis: 1)
+          @embedding = 1.fdiv(n_samples) * tmp_mat.dot(@embedding)
+          # check convergence.
+          new_stress = calc_stress(hi_distance_mat, lo_distance_mat)
+          if terminate?(@stress, new_stress)
+            @stress = new_stress
+            break
+          end
+          # next step.
+          @n_iter = t + 1
+          @stress = new_stress
+          lo_distance_mat = Rumale::PairwiseMetric.euclidean_distance(@embedding)
+          puts "[MDS] stress function after #{@n_iter} iterations: #{@stress}" if @params[:verbose] && (@n_iter % 100).zero?
+        end
+        self
+      end
+
+      # Fit the model with training data, and then transform them with the learned model.
+      #
+      # @overload fit_transform(x) -> Numo::DFloat
+      #
+      # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for fitting the model.
+      #   If the metric is 'precomputed', x must be a square distance matrix (shape: [n_samples, n_samples]).
+      # @return [Numo::DFloat] (shape: [n_samples, n_components]) The transformed data
+      def fit_transform(x, _not_used = nil)
+        fit(x)
+        @embedding.dup
+      end
+
+      # Dump marshal data.
+      # @return [Hash] The marshal data.
+      def marshal_dump
+        { params: @params,
+          embedding: @embedding,
+          stress: @stress,
+          n_iter: @n_iter,
+          rng: @rng }
+      end
+
+      # Load marshal data.
+      # @return [nil]
+      def marshal_load(obj)
+        @params = obj[:params]
+        @embedding = obj[:embedding]
+        @stress = obj[:stress]
+        @n_iter = obj[:n_iter]
+        @rng = obj[:rng]
+        nil
+      end
+
+      private
+
+      def init_embedding(x)
+        if @params[:init] == 'pca' && @params[:metric] == 'euclidean'
+          pca = Rumale::Decomposition::PCA.new(n_components: @params[:n_components], random_seed: @params[:random_seed])
+          pca.fit_transform(x)
+        else
+          n_samples = x.shape[0]
+          sub_rng = @rng.dup
+          Rumale::Utils.rand_uniform([n_samples, @params[:n_components]], sub_rng) - 0.5
+        end
+      end
+
+      def terminate?(old_stress, new_stress)
+        return false if @params[:tol].nil?
+        return false if old_stress.nil?
+        (old_stress - new_stress).abs <= @params[:tol]
+      end
+
+      def calc_stress(hi_distance_mat, lo_distance_mat)
+        ((hi_distance_mat - lo_distance_mat)**2).sum.fdiv(2)
+      end
+    end
+  end
+end
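The new `Rumale::Manifold::MDS` transformer minimizes the squared difference between the pairwise distances in the original space and in the embedding (the value computed by `calc_stress`), updating the embedding with repeated Guttman transforms (SMACOF). A short usage sketch based on the constructor and accessors defined above, assuming the `rumale` gem is installed; the toy data from `Rumale::Dataset.make_circles` and the printed shapes are illustrative only:

```ruby
require 'rumale'

# Toy data: 100 noisy two-circle samples (the class labels are ignored here).
samples, = Rumale::Dataset.make_circles(100, noise: 0.05, factor: 0.4, random_seed: 1)

# Mirrors the @example in the class documentation; n_components defaults to 2.
mds = Rumale::Manifold::MDS.new(init: 'pca', max_iter: 500, random_seed: 1)
embedded = mds.fit_transform(samples)

puts embedded.shape.inspect # expected: [100, 2]
puts mds.stress             # final stress value after optimization
puts mds.n_iter             # number of SMACOF iterations actually run
```

With `metric: 'precomputed'`, `fit` and `fit_transform` instead expect a square distance matrix, as noted in the method documentation.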
data/lib/rumale/version.rb
CHANGED
data/lib/rumale.rb
CHANGED
@@ -63,6 +63,7 @@ require 'rumale/clustering/power_iteration'
 require 'rumale/decomposition/pca'
 require 'rumale/decomposition/nmf'
 require 'rumale/manifold/tsne'
+require 'rumale/manifold/mds'
 require 'rumale/preprocessing/l2_normalizer'
 require 'rumale/preprocessing/min_max_scaler'
 require 'rumale/preprocessing/max_abs_scaler'
data/rumale.gemspec
CHANGED
@@ -19,7 +19,8 @@ Gem::Specification.new do |spec|
 Rumale currently supports Linear / Kernel Support Vector Machine,
 Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine,
 Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor algorithm,
-K-Means, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering,
+K-Means, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering,
+Multidimensional Scaling, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.
 MSG
 spec.homepage = 'https://github.com/yoshoku/rumale'
 spec.license = 'BSD-2-Clause'
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rumale
 version: !ruby/object:Gem::Version
-  version: 0.12.
+  version: 0.12.4
 platform: ruby
 authors:
 - yoshoku
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2019-06-
+date: 2019-06-29 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: numo-narray
@@ -114,7 +114,8 @@ description: |
   Rumale currently supports Linear / Kernel Support Vector Machine,
   Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine,
   Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor algorithm,
-  K-Means, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering,
+  K-Means, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering,
+  Multidimensional Scaling, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.
 email:
 - yoshoku@outlook.com
 executables: []
@@ -187,6 +188,7 @@ files:
 - lib/rumale/linear_model/ridge.rb
 - lib/rumale/linear_model/svc.rb
 - lib/rumale/linear_model/svr.rb
+- lib/rumale/manifold/mds.rb
 - lib/rumale/manifold/tsne.rb
 - lib/rumale/model_selection/cross_validation.rb
 - lib/rumale/model_selection/grid_search_cv.rb