RubyGems - rumale - Versions diffs - 0.12.2 → 0.12.3 - Mend

rumale 0.12.2 → 0.12.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (10)

checksums.yaml +4 -4
data/CHANGELOG.md +4 -0
data/README.md +2 -2
data/lib/rumale.rb +1 -0
data/lib/rumale/clustering/power_iteration.rb +129 -0
data/lib/rumale/dataset.rb +78 -0
data/lib/rumale/pairwise_metric.rb +1 -2
data/lib/rumale/version.rb +1 -1
data/rumale.gemspec +1 -1
metadata +5 -4

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz: 59005b59f6a6a195fbe200260e0c74008fa532fe
-  data.tar.gz: 00e40ea656556bd5a42bf7d96674e9b758ec7460
+  metadata.gz: 6d9c7691afd71e50df0c05d726a535a7f5dd426f
+  data.tar.gz: 3c2ac53df9060b7ff8abc62717c2ae06c1adebca
 SHA512:
-  metadata.gz: 59ef5edcd1b435260e79792ed592d3b044a17c83ed23705c5f065cd46d916bb761cc8b4e1759c76cf86febf8839925ba53322ede8b29f57bdb7e7d656f92104b
-  data.tar.gz: 2ef45761c87c14882532c27e957d83adc2e40719df03a8cb497da09429713ddf946d1e6ba8bdcb73536cf6daed49c098c851b3bae2a7d828dd3efad42a712360
+  metadata.gz: 9686efe5c0126f29672b60047af8e604e4811916fd86a0f9151fe6e9b6e7ee292f4bbd33eae1dab8b2af1941910a52fbb68a0938f280bbce4e4c57faedf3215b
+  data.tar.gz: 2bf6c6c1fb42ab8290cc5471c4417f4aac8a07fdc23c607ffc10855bf3941f9b532e669374e5ad237f67bf5495902290284fc0697143c721e484e6e4ac39753a

data/CHANGELOG.md CHANGED

@@ -1,3 +1,7 @@
+# 0.12.3
+- Add class for Power Iteration clustering.
+- Add classes for artificial dataset generation.
 # 0.12.2
 - Add class for cluster analysis with Gaussian Mixture Model.
 - Add encoder class for categorical features.

data/README.md CHANGED

@@ -6,14 +6,14 @@
 [![Coverage Status](https://coveralls.io/repos/github/yoshoku/rumale/badge.svg?branch=master)](https://coveralls.io/github/yoshoku/rumale?branch=master)
 [![Gem Version](https://badge.fury.io/rb/rumale.svg)](https://badge.fury.io/rb/rumale)
 [![BSD 2-Clause License](https://img.shields.io/badge/License-BSD%202--Clause-orange.svg)](https://github.com/yoshoku/rumale/blob/master/LICENSE.txt)
-[![Documentation](http://img.shields.io/badge/docs-rdoc.info-blue.svg)](https://www.rubydoc.info/gems/rumale/0.12.2)
+[![Documentation](http://img.shields.io/badge/docs-rdoc.info-blue.svg)](https://www.rubydoc.info/gems/rumale/0.12.3)
 Rumale (**Ru**by **ma**chine **le**arning) is a machine learning library in Ruby.
 Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python.
 Rumale supports Linear / Kernel Support Vector Machine,
 Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine,
 Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor classifier,
-K-Means, DBSCAN, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.
+K-Means, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.
 This project was formerly known as "SVMKit".
 If you are using SVMKit, please install Rumale and replace `SVMKit` constants with `Rumale`.

data/lib/rumale.rb CHANGED

@@ -59,6 +59,7 @@ require 'rumale/ensemble/extra_trees_regressor'
 require 'rumale/clustering/k_means'
 require 'rumale/clustering/gaussian_mixture'
 require 'rumale/clustering/dbscan'
+require 'rumale/clustering/power_iteration'
 require 'rumale/decomposition/pca'
 require 'rumale/decomposition/nmf'
 require 'rumale/manifold/tsne'

data/lib/rumale/clustering/power_iteration.rb ADDED

@@ -0,0 +1,129 @@
+# frozen_string_literal: true
+require 'rumale/base/base_estimator'
+require 'rumale/base/cluster_analyzer'
+require 'rumale/pairwise_metric'
+module Rumale
+  module Clustering
+    # PowerIteration is a class that implements power iteration clustering.
+    #
+    # @example
+    #   analyzer = Rumale::Clustering::PowerIteration.new(n_clusters: 10, gamma: 8.0, max_iter: 1000)
+    #   cluster_labels = analyzer.fit_predict(samples)
+    #
+    # *Reference*
+    # - F. Lin and W. W. Cohen, "Power Iteration Clustering," Proc. ICML'10, pp. 655--662, 2010.
+    class PowerIteration
+      include Base::BaseEstimator
+      include Base::ClusterAnalyzer
+      # Return the data in embedded space.
+      # @return [Numo::DFloat] (shape: [n_samples])
+      attr_reader :embedding
+      # Return the number of iterations run for optimization.
+      # @return [Integer]
+      attr_reader :n_iter
+      # Create a new cluster analyzer with power iteration clustering.
+      #
+      # @param n_clusters [Integer] The number of clusters.
+      # @param affinity [String] The representation of affinity matrix ('rbf' or 'precomputed').
+      # @param gamma [Float] The parameter of rbf kernel, if nil it is 1 / n_features.
+      #   If affinity = 'precomputed', this parameter is ignored.
+      # @param init [String] The initialization method for centroids of K-Means clustering ('random' or 'k-means++').
+      # @param max_iter [Integer] The maximum number of iterations.
+      # @param tol [Float] The tolerance of termination criterion.
+      # @param eps [Float] A small value close to zero to avoid zero division error.
+      # @param random_seed [Integer] The seed value used to initialize the random generator.
+      def initialize(n_clusters: 8, affinity: 'rbf', gamma: nil, init: 'k-means++', max_iter: 1000, tol: 1.0e-8, eps: 1.0e-5, random_seed: nil)
+        check_params_integer(n_clusters: n_clusters, max_iter: max_iter)
+        check_params_float(tol: tol, eps: eps)
+        check_params_string(affinity: affinity, init: init)
+        check_params_type_or_nil(Float, gamma: gamma)
+        check_params_type_or_nil(Integer, random_seed: random_seed)
+        check_params_positive(n_clusters: n_clusters, max_iter: max_iter, tol: tol, eps: eps)
+        @params = {}
+        @params[:n_clusters] = n_clusters
+        @params[:affinity] = affinity
+        @params[:gamma] = gamma
+        @params[:init] = init == 'random' ? 'random' : 'k-means++'
+        @params[:max_iter] = max_iter
+        @params[:tol] = tol
+        @params[:eps] = eps
+        @params[:random_seed] = random_seed
+        @params[:random_seed] ||= srand
+        @embedding = nil
+        @n_iter = nil
+      end
+      # Analyze clusters with given training data.
+      #
+      # @overload fit(x) -> PowerIteration
+      #
+      # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for cluster analysis.
+      #   If the affinity is 'precomputed', x must be a square affinity matrix (shape: [n_samples, n_samples]).
+      # @return [PowerIteration] The learned cluster analyzer itself.
+      def fit(x, _y = nil)
+        check_sample_array(x)
+        raise ArgumentError, 'Expect the input affinity matrix to be square.' if @params[:affinity] == 'precomputed' && x.shape[0] != x.shape[1]
+        # initialize some variables.
+        affinity_mat = @params[:affinity] == 'precomputed' ? x : Rumale::PairwiseMetric.rbf_kernel(x, nil, @params[:gamma])
+        affinity_mat[affinity_mat.diag_indices] = 0.0
+        n_samples = affinity_mat.shape[0]
+        tol = @params[:tol].fdiv(n_samples)
+        # calculate normalized affinity matrix.
+        degrees = affinity_mat.sum(axis: 1)
+        normalized_affinity_mat = (1.0 / degrees).diag.dot(affinity_mat)
+        # initialize embedding space.
+        @embedding = degrees / degrees.sum
+        # optimization
+        @n_iter = 0
+        error = Numo::DFloat.ones(n_samples)
+        @params[:max_iter].times do |t|
+          @n_iter = t + 1
+          new_embedding = normalized_affinity_mat.dot(@embedding)
+          new_embedding /= new_embedding.abs.sum
+          new_error = (new_embedding - @embedding).abs
+          break if (new_error - error).abs.max <= tol
+          @embedding = new_embedding
+          error = new_error
+        end
+        self
+      end
+      # Analyze clusters and assign samples to clusters.
+      #
+      # @param x [Numo::DFloat] (shape: [n_samples, n_features]) The training data to be used for cluster analysis.
+      #   If the affinity is 'precomputed', x must be a square affinity matrix (shape: [n_samples, n_samples]).
+      # @return [Numo::Int32] (shape: [n_samples]) Predicted cluster label per sample.
+      def fit_predict(x)
+        check_sample_array(x)
+        fit(x)
+        kmeans = Rumale::Clustering::KMeans.new(
+          n_clusters: @params[:n_clusters], init: @params[:init],
+          max_iter: @params[:max_iter], tol: @params[:tol], random_seed: @params[:random_seed]
+        )
+        kmeans.fit_predict(@embedding.expand_dims(1))
+      end
+      # Dump marshal data.
+      # @return [Hash] The marshal data.
+      def marshal_dump
+        { params: @params,
+          embedding: @embedding,
+          n_iter: @n_iter }
+      end
+      # Load marshal data.
+      # @return [nil]
+      def marshal_load(obj)
+        @params = obj[:params]
+        @embedding = obj[:embedding]
+        @n_iter = obj[:n_iter]
+        nil
+      end
+    end
+  end
+end
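
The loop in `fit` above can be followed without Numo by writing it out on plain arrays. The following is a minimal sketch, assuming a symmetric affinity matrix with non-negative entries and no all-zero rows; `power_iteration_embedding` is a hypothetical helper name, not part of Rumale, and the K-Means step that `fit_predict` runs on the embedding is left out.

```ruby
# Plain-Ruby sketch of the power-iteration loop (stdlib only; not the Rumale implementation).
# affinity: symmetric Array-of-Arrays with non-negative entries and non-zero row sums (assumed).
def power_iteration_embedding(affinity, max_iter: 1000, tol: 1.0e-8)
  n = affinity.size
  # Zero the diagonal, as the diff does before normalizing.
  a = affinity.each_with_index.map do |row, i|
    row.each_with_index.map { |v, j| i == j ? 0.0 : v.to_f }
  end
  degrees = a.map(&:sum)
  # Row-normalize the affinity matrix: W = D^-1 A.
  w = a.each_with_index.map { |row, i| row.map { |v| v / degrees[i] } }
  # Start from the degree vector scaled to sum to one.
  total = degrees.sum
  embedding = degrees.map { |d| d / total }
  error = Array.new(n, 1.0)
  scaled_tol = tol / n
  n_iter = 0
  max_iter.times do |t|
    n_iter = t + 1
    # One power-iteration step followed by L1 normalization.
    product = w.map { |row| row.zip(embedding).sum { |wij, vj| wij * vj } }
    norm = product.sum(&:abs)
    candidate = product.map { |v| v / norm }
    new_error = candidate.zip(embedding).map { |c, e| (c - e).abs }
    # Stop once the per-element change in the error has flattened out.
    break if new_error.zip(error).map { |ne, e| (ne - e).abs }.max <= scaled_tol
    embedding = candidate
    error = new_error
  end
  [embedding, n_iter]
end
```

The returned embedding is one value per sample; clustering those scalars (for example with K-Means, as `fit_predict` does) yields the labels.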

data/lib/rumale/dataset.rb CHANGED

@@ -1,6 +1,7 @@
 # frozen_string_literal: true
 require 'csv'
+require 'rumale/validation'
 module Rumale
   # Module for loading and saving a dataset file.
@@ -48,6 +49,83 @@ module Rumale
         end
       end
+      # Generate a two-dimensional data set consisting of an inner circle and an outer circle.
+      #
+      # @param n_samples [Integer] The number of samples.
+      # @param shuffle [Boolean] The flag indicating whether to shuffle the dataset.
+      # @param noise [Float] The standard deviation of gaussian noise added to the data.
+      #   If nil is given, no noise is added.
+      # @param factor [Float] The scale factor between inner and outer circles. The interval of factor is (0, 1).
+      # @param random_seed [Integer] The seed value used to initialize the random generator.
+      def make_circles(n_samples, shuffle: true, noise: nil, factor: 0.8, random_seed: nil)
+        Rumale::Validation.check_params_integer(n_samples: n_samples)
+        Rumale::Validation.check_params_boolean(shuffle: shuffle)
+        Rumale::Validation.check_params_type_or_nil(Float, noise: noise)
+        Rumale::Validation.check_params_float(factor: factor)
+        Rumale::Validation.check_params_type_or_nil(Integer, random_seed: random_seed)
+        raise ArgumentError, 'The number of samples must be more than 1.' if n_samples <= 1
+        raise RangeError, 'The interval of factor is (0, 1).' if factor <= 0 || factor >= 1
+        # initialize some variables.
+        rs = random_seed
+        rs ||= srand
+        rng = Random.new(rs)
+        n_samples_out = n_samples.fdiv(2).to_i
+        n_samples_in = n_samples - n_samples_out
+        # make two circles.
+        linsp_out = Numo::DFloat.linspace(0, 2 * Math::PI, n_samples_out)
+        linsp_in = Numo::DFloat.linspace(0, 2 * Math::PI, n_samples_in)
+        circle_out = Numo::DFloat[Numo::NMath.cos(linsp_out), Numo::NMath.sin(linsp_out)].transpose
+        circle_in = Numo::DFloat[Numo::NMath.cos(linsp_in), Numo::NMath.sin(linsp_in)].transpose
+        x = Numo::DFloat.vstack([circle_out, factor * circle_in])
+        y = Numo::Int32.hstack([Numo::Int32.zeros(n_samples_out), Numo::Int32.ones(n_samples_in)])
+        # shuffle data indices.
+        if shuffle
+          rand_ids = [*0...n_samples].shuffle(random: rng.dup)
+          x = x[rand_ids, true].dup
+          y = y[rand_ids].dup
+        end
+        # add gaussian noise.
+        x += Rumale::Utils.rand_normal(x.shape, rng.dup, 0.0, noise) unless noise.nil?
+        [x, y]
+      end
+      # Generate a two-dimensional data set consisting of two shifted half circles.
+      #
+      # @param n_samples [Integer] The number of samples.
+      # @param shuffle [Boolean] The flag indicating whether to shuffle the dataset.
+      # @param noise [Float] The standard deviation of gaussian noise added to the data.
+      #   If nil is given, no noise is added.
+      # @param random_seed [Integer] The seed value used to initialize the random generator.
+      def make_moons(n_samples, shuffle: true, noise: nil, random_seed: nil)
+        Rumale::Validation.check_params_integer(n_samples: n_samples)
+        Rumale::Validation.check_params_boolean(shuffle: shuffle)
+        Rumale::Validation.check_params_type_or_nil(Float, noise: noise)
+        Rumale::Validation.check_params_type_or_nil(Integer, random_seed: random_seed)
+        raise ArgumentError, 'The number of samples must be more than 1.' if n_samples <= 1
+        # initialize some variables.
+        rs = random_seed
+        rs ||= srand
+        rng = Random.new(rs)
+        n_samples_out = n_samples.fdiv(2).to_i
+        n_samples_in = n_samples - n_samples_out
+        # make two half circles.
+        linsp_out = Numo::DFloat.linspace(0, Math::PI, n_samples_out)
+        linsp_in = Numo::DFloat.linspace(0, Math::PI, n_samples_in)
+        circle_out = Numo::DFloat[Numo::NMath.cos(linsp_out), Numo::NMath.sin(linsp_out)].transpose
+        circle_in = Numo::DFloat[1 - Numo::NMath.cos(linsp_in), 1 - Numo::NMath.sin(linsp_in) - 0.5].transpose
+        x = Numo::DFloat.vstack([circle_out, circle_in])
+        y = Numo::Int32.hstack([Numo::Int32.zeros(n_samples_out), Numo::Int32.ones(n_samples_in)])
+        # shuffle data indices.
+        if shuffle
+          rand_ids = [*0...n_samples].shuffle(random: rng.dup)
+          x = x[rand_ids, true].dup
+          y = y[rand_ids].dup
+        end
+        # add gaussian noise.
+        x += Rumale::Utils.rand_normal(x.shape, rng.dup, 0.0, noise) unless noise.nil?
+        [x, y]
+      end
       private
       def parse_libsvm_line(line, zero_based)
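
The geometry behind `make_circles` can be reproduced with the standard library alone. This is a rough sketch under stated assumptions: `circles_sketch` and its `seed` keyword are hypothetical names, the gaussian noise step is omitted, and plain arrays stand in for the Numo arrays used in the diff.

```ruby
# Plain-Ruby sketch of the two-circle geometry (stdlib only; hypothetical helper, not Rumale's API).
def circles_sketch(n_samples, factor: 0.8, seed: 1)
  raise ArgumentError, 'n_samples must be at least 2' if n_samples < 2
  raise RangeError, 'factor must lie in (0, 1)' if factor <= 0 || factor >= 1

  n_out = n_samples / 2
  n_in = n_samples - n_out
  # Evenly spaced angles over [0, 2*pi] with both endpoints included, like a linspace.
  angles = lambda do |n|
    n == 1 ? [0.0] : Array.new(n) { |i| 2 * Math::PI * i / (n - 1) }
  end
  # Outer circle has radius 1 (label 0); inner circle is scaled by factor (label 1).
  outer = angles.call(n_out).map { |t| [Math.cos(t), Math.sin(t)] }
  inner = angles.call(n_in).map { |t| [factor * Math.cos(t), factor * Math.sin(t)] }
  points = outer + inner
  labels = Array.new(n_out, 0) + Array.new(n_in, 1)
  # Shuffle points and labels with the same seeded permutation so pairs stay aligned.
  order = (0...n_samples).to_a.shuffle(random: Random.new(seed))
  [order.map { |i| points[i] }, order.map { |i| labels[i] }]
end
```

Calling `circles_sketch(10, factor: 0.5, seed: 42)` twice returns the same points and labels, which mirrors the reason the diff seeds its own `Random` instance.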

data/lib/rumale/pairwise_metric.rb CHANGED

@@ -52,8 +52,7 @@ module Rumale
         Rumale::Validation.check_sample_array(x)
         Rumale::Validation.check_sample_array(y)
         Rumale::Validation.check_params_float(gamma: gamma)
-        distance_matrix = euclidean_distance(x, y)
-        Numo::NMath.exp((distance_matrix**2) * -gamma)
+        Numo::NMath.exp(-gamma * squared_error(x, y).abs)
       end
       # Calculate the linear kernel between x and y.
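
The hunk above swaps "square the euclidean distance" for a direct squared-error computation; both give the RBF kernel value exp(-gamma * ||x - y||^2). A minimal plain-Ruby sketch of that formula follows. The helper names `squared_distance` and `rbf_kernel_sketch` are hypothetical, and the `.abs` in the diff (presumably a guard against tiny negative values from rounding) is not needed here because the sum of squares is computed term by term.

```ruby
# Plain-Ruby sketch of the RBF kernel (stdlib only; hypothetical helpers, not Rumale's API).

# Squared euclidean distance between two equal-length vectors.
def squared_distance(a, b)
  a.zip(b).sum { |ai, bi| (ai - bi)**2 }
end

# Kernel matrix K with K[i][j] = exp(-gamma * ||x[i] - y[j]||^2).
def rbf_kernel_sketch(x, y, gamma)
  x.map { |xi| y.map { |yj| Math.exp(-gamma * squared_distance(xi, yj)) } }
end
```

Each sample has kernel value 1.0 with itself, which is why `fit` in the power iteration class zeroes the diagonal before computing degrees.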

data/lib/rumale/version.rb CHANGED

@@ -3,5 +3,5 @@
 # Rumale is a machine learning library in Ruby.
 module Rumale
   # The version of Rumale you are using.
-  VERSION = '0.12.2'
+  VERSION = '0.12.3'
 end

data/rumale.gemspec CHANGED

@@ -19,7 +19,7 @@ Gem::Specification.new do |spec|
     Rumale currently supports Linear / Kernel Support Vector Machine,
     Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine,
     Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor algorithm,
-    K-Means, DBSCAN, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.
+    K-Means, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.
   MSG
   spec.homepage      = 'https://github.com/yoshoku/rumale'
   spec.license       = 'BSD-2-Clause'

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rumale
 version: !ruby/object:Gem::Version
-  version: 0.12.2
+  version: 0.12.3
 platform: ruby
 authors:
 - yoshoku
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2019-06-15 00:00:00.000000000 Z
+date: 2019-06-22 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: numo-narray
@@ -114,7 +114,7 @@ description: |
   Rumale currently supports Linear / Kernel Support Vector Machine,
   Logistic Regression, Linear Regression, Ridge, Lasso, Factorization Machine,
   Naive Bayes, Decision Tree, AdaBoost, Gradient Tree Boosting, Random Forest, Extra-Trees, K-nearest neighbor algorithm,
-  K-Means, DBSCAN, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.
+  K-Means, Gaussian Mixture Model, DBSCAN, Power Iteration Clustering, t-SNE, Principal Component Analysis, and Non-negative Matrix Factorization.
 email:
 - yoshoku@outlook.com
 executables: []
@@ -149,6 +149,7 @@ files:
 - lib/rumale/clustering/dbscan.rb
 - lib/rumale/clustering/gaussian_mixture.rb
 - lib/rumale/clustering/k_means.rb
+- lib/rumale/clustering/power_iteration.rb
 - lib/rumale/dataset.rb
 - lib/rumale/decomposition/nmf.rb
 - lib/rumale/decomposition/pca.rb
@@ -249,7 +250,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
       version: '0'
 requirements: []
 rubyforge_project:
-rubygems_version: 2.5.2.3
+rubygems_version: 2.6.14.4
 signing_key:
 specification_version: 4
 summary: Rumale is a machine learning library in Ruby. Rumale provides machine learning