rumale 0.11.0 → 0.12.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +22 -0
- data/README.md +100 -4
- data/lib/rumale/clustering/k_means.rb +3 -2
- data/lib/rumale/decomposition/nmf.rb +3 -2
- data/lib/rumale/decomposition/pca.rb +2 -5
- data/lib/rumale/ensemble/ada_boost_classifier.rb +3 -2
- data/lib/rumale/ensemble/ada_boost_regressor.rb +3 -2
- data/lib/rumale/ensemble/extra_trees_classifier.rb +2 -1
- data/lib/rumale/ensemble/extra_trees_regressor.rb +2 -1
- data/lib/rumale/ensemble/gradient_boosting_classifier.rb +2 -1
- data/lib/rumale/ensemble/gradient_boosting_regressor.rb +2 -1
- data/lib/rumale/ensemble/random_forest_classifier.rb +5 -4
- data/lib/rumale/ensemble/random_forest_regressor.rb +5 -4
- data/lib/rumale/kernel_approximation/rbf.rb +3 -2
- data/lib/rumale/kernel_machine/kernel_svc.rb +2 -1
- data/lib/rumale/linear_model/base_linear_model.rb +1 -1
- data/lib/rumale/manifold/tsne.rb +2 -1
- data/lib/rumale/model_selection/k_fold.rb +2 -1
- data/lib/rumale/model_selection/shuffle_split.rb +3 -2
- data/lib/rumale/model_selection/stratified_k_fold.rb +4 -3
- data/lib/rumale/model_selection/stratified_shuffle_split.rb +3 -2
- data/lib/rumale/nearest_neighbors/k_neighbors_classifier.rb +1 -1
- data/lib/rumale/nearest_neighbors/k_neighbors_regressor.rb +1 -1
- data/lib/rumale/pipeline/pipeline.rb +1 -1
- data/lib/rumale/polynomial_model/base_factorization_machine.rb +1 -1
- data/lib/rumale/tree/base_decision_tree.rb +1 -1
- data/lib/rumale/tree/decision_tree_classifier.rb +1 -0
- data/lib/rumale/tree/decision_tree_regressor.rb +1 -0
- data/lib/rumale/tree/extra_tree_classifier.rb +1 -1
- data/lib/rumale/tree/extra_tree_regressor.rb +1 -1
- data/lib/rumale/tree/gradient_tree_regressor.rb +2 -1
- data/lib/rumale/utils.rb +6 -2
- data/lib/rumale/version.rb +1 -1
- metadata +2 -2
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA1:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: f662b1bf4abdb9aba9c978362094d80f59fcb390
+  data.tar.gz: eb5087ce4b4f2dfdc8e789c340139dd7d36693e0
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 8418aa3932962b135c3a9725262e84b741825ed2491e98dba508e04ea4104d7abe0a9938d6248d9405bc7d9793d1f128a7e37c89804a79367626a47d3fa6a773
+  data.tar.gz: 40fff97c335d5720eaf1c90b45ed39531c61827777797a3a4bbc5c8b5f4b8df9b3814290686c959f9f7a9726966187a064b521850139124c9be2c64189d1d29f
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,25 @@
+# 0.12.0
+## Breaking changes
+- For reproducibility, Rumale has changed to not reuse the same random number generator repeatedly within an estimator.
+In the training phase, estimators use a copy of the random number generator created in the initialize method.
+Even with the same algorithm and the same data, the order of random number generation
+can cause slight differences in learning results.
+With this change, executing the fit method multiple times
+on the same data always produces the same learning result.
+
+```ruby
+svc = Rumale::LinearModel::SVC.new(random_seed: 0)
+svc.fit(x, y)
+a = svc.weight_vec
+svc.fit(x, y)
+b = svc.weight_vec
+err = ((a - b)**2).mean
+
+# In version 0.11.0 or earlier, this may print false,
+# but from this version it always prints true.
+puts(err < 1e-4)
+```
+
 # 0.11.0
 - Introduce [Parallel gem](https://github.com/grosser/parallel) to improve execution speed for one-vs-the-rest and bagging methods.
 - Add the n_jobs parameter that specifies the number of jobs for parallel processing in some estimators belonging to Rumale::LinearModel, Rumale::PolynomialModel, and Rumale::Ensemble.
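The pattern behind this change appears in every per-file hunk below: fit duplicates the estimator's generator and draws from the copy. A minimal sketch of the idea using only the Ruby standard library (ShuffleEstimator is a hypothetical class, not Rumale code):

```ruby
# Hypothetical example: Random#dup copies the generator's internal state,
# so every fit call replays the same random sequence while the
# estimator's own @rng is never advanced.
class ShuffleEstimator
  def initialize(random_seed: 0)
    @rng = Random.new(random_seed)
  end

  def fit(n_samples)
    sub_rng = @rng.dup # the 0.12.0 pattern: draw from a copy
    [*0...n_samples].shuffle(random: sub_rng)
  end
end

est = ShuffleEstimator.new(random_seed: 0)
puts(est.fit(5) == est.fit(5)) # => true; shuffling with @rng directly would print false
```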
data/README.md
CHANGED
@@ -6,7 +6,7 @@
 [](https://coveralls.io/github/yoshoku/rumale?branch=master)
 [](https://badge.fury.io/rb/rumale)
 [](https://github.com/yoshoku/rumale/blob/master/LICENSE.txt)
-[](https://www.rubydoc.info/gems/rumale/0.11.0)
+[](https://www.rubydoc.info/gems/rumale/0.12.0)
 
 Rumale (**Ru**by **ma**chine **le**arning) is a machine learning library in Ruby.
 Rumale provides machine learning algorithms with interfaces similar to Scikit-Learn in Python.
@@ -36,7 +36,43 @@ Or install it yourself as:
 
 ## Usage
 
-### Example 1.
+### Example 1. XOR data
+First, let's classify simple XOR data.
+In Rumale, feature vectors and labels are represented by [Numo::NArray](https://github.com/ruby-numo/numo-narray).
+
+```ruby
+require 'rumale'
+
+# Prepare XOR data.
+features = [[0, 0], [0, 1], [1, 0], [1, 1]]
+labels = [0, 1, 1, 0]
+
+# Convert Ruby Array into Numo::NArray.
+x = Numo::DFloat.asarray(features)
+y = Numo::Int32.asarray(labels)
+
+# Train classifier with nearest neighbor rule.
+estimator = Rumale::NearestNeighbors::KNeighborsClassifier.new(n_neighbors: 1)
+estimator.fit(x, y)
+
+# Predict labels.
+p y
+p estimator.predict(x)
+```
+
+Executing the above script results in the following output.
+
+```ruby
+Numo::Int32#shape=[4]
+[0, 1, 1, 0]
+Numo::Int32#shape=[4]
+[0, 1, 1, 0]
+```
+
+The basic usage of Rumale is to first train the model with the fit method
+and then estimate with the predict method.
+
+### Example 2. Pendigits dataset classification
 
 Rumale provides a function for loading a libsvm-format dataset file.
 We start by downloading the pendigits dataset from the LIBSVM Data website.
@@ -99,7 +135,7 @@ $ ruby test.rb
 Accuracy: 98.4%
 ```
 
-### Example
+### Example 3. Cross-validation
 
 ```ruby
 require 'rumale'
@@ -130,7 +166,7 @@ $ ruby cross_validation.rb
 5-CV mean log-loss: 0.476
 ```
 
-### Example
+### Example 4. Pipeline
 
 ```ruby
 require 'rumale'
@@ -162,6 +198,66 @@ $ ruby pipeline.rb
 5-CV mean accuracy: 99.2 %
 ```
 
+## Speeding up
+
+### Numo::Linalg
+Loading [Numo::Linalg](https://github.com/ruby-numo/numo-linalg) allows Numo::NArray to perform matrix products using BLAS libraries.
+For example, using [OpenBLAS](https://github.com/xianyi/OpenBLAS) speeds up many estimators in Rumale.
+
+Install the OpenBLAS library.
+
+Mac:
+
+```bash
+$ brew install openblas --with-openmp
+```
+
+Ubuntu:
+
+```bash
+$ sudo apt-get install gcc gfortran
+$ wget https://github.com/xianyi/OpenBLAS/archive/v0.3.5.tar.gz
+$ tar xzf v0.3.5.tar.gz
+$ cd OpenBLAS-0.3.5
+$ make USE_OPENMP=1
+$ sudo make PREFIX=/usr/local install
+```
+
+Install the Numo::Linalg gem.
+
+```bash
+$ gem install numo-linalg
+```
+
+In your Ruby script, you only need to require the autoloader module of Numo::Linalg.
+
+```ruby
+require 'numo/linalg/autoloader'
+require 'rumale'
+```
+
+### Parallel
+Several estimators in Rumale support parallel processing.
+Parallel processing in Rumale is realized with the [Parallel](https://github.com/grosser/parallel) gem, so install and load it.
+
+```bash
+$ gem install parallel
+```
+
+```ruby
+require 'parallel'
+require 'rumale'
+```
+
+Estimators that support parallel processing have an n_jobs parameter.
+When -1 is given to the n_jobs parameter, all processors are used.
+
+```ruby
+estimator = Rumale::Ensemble::RandomForestClassifier.new(n_jobs: -1, random_seed: 1)
+```
+
+
 ## Development
 
 After checking out the repo, run `bin/setup` to install dependencies. Then, run `rake spec` to run the tests. You can also run `bin/console` for an interactive prompt that will allow you to experiment.
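Combining the two tips above: a sketch assuming the parallel gem is installed, with toy data standing in for the pendigits arrays (adding `require 'numo/linalg/autoloader'` would additionally enable the BLAS speed-up):

```ruby
require 'parallel' # enables the n_jobs parameter
require 'rumale'

# Toy data in place of a real dataset.
x = Numo::DFloat.new(100, 4).rand
y = Numo::Int32.asarray(Array.new(100) { |i| i < 50 ? 0 : 1 })

estimator = Rumale::Ensemble::RandomForestClassifier.new(n_jobs: -1, random_seed: 1)
estimator.fit(x, y)
puts(estimator.score(x, y)) # mean accuracy on the training data
```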
data/lib/rumale/clustering/k_means.rb
CHANGED
@@ -120,7 +120,8 @@ module Rumale
       def init_cluster_centers(x)
         # random initialize
         n_samples = x.shape[0]
-        rand_id = [*0...n_samples].sample(@params[:n_clusters], random: @rng)
+        sub_rng = @rng.dup
+        rand_id = [*0...n_samples].sample(@params[:n_clusters], random: sub_rng)
         @cluster_centers = x[rand_id, true].dup
         return unless @params[:init] == 'k-means++'
         # k-means++ initialize
@@ -129,7 +130,7 @@ module Rumale
           min_distances = distance_matrix.flatten[distance_matrix.min_index(axis: 1)]
           probs = min_distances**2 / (min_distances**2).sum
           cum_probs = probs.cumsum
-          selected_id = cum_probs.gt(@rng.rand).where.to_a.first
+          selected_id = cum_probs.gt(sub_rng.rand).where.to_a.first
           @cluster_centers[n, true] = x[selected_id, true].dup
         end
       end
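For reference, the k-means++ draw in the second hunk is an inverse-CDF sample: one uniform draw picks the first index whose cumulative probability exceeds it. A runnable sketch with toy distances:

```ruby
require 'numo/narray'

# Pick index i with probability proportional to min_distances[i]**2,
# using a single uniform draw against the cumulative probabilities.
min_distances = Numo::DFloat[3.0, 1.0, 2.0]
probs = min_distances**2 / (min_distances**2).sum
cum_probs = probs.cumsum
sub_rng = Random.new(1)
selected_id = cum_probs.gt(sub_rng.rand).where.to_a.first
p selected_id # index 0 is most likely, since it has the largest distance
```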
data/lib/rumale/decomposition/nmf.rb
CHANGED
@@ -113,8 +113,9 @@ module Rumale
       # initialize some variables.
       n_samples, n_features = x.shape
       scale = Math.sqrt(x.mean / @params[:n_components])
-      @components = Rumale::Utils.rand_uniform([@params[:n_components], n_features], @rng) * scale if update_comps
-      coefficients = Rumale::Utils.rand_uniform([n_samples, @params[:n_components]], @rng) * scale
+      sub_rng = @rng.dup
+      @components = Rumale::Utils.rand_uniform([@params[:n_components], n_features], sub_rng) * scale if update_comps
+      coefficients = Rumale::Utils.rand_uniform([n_samples, @params[:n_components]], sub_rng) * scale
       # optimization.
       @params[:max_iter].times do
         # update
data/lib/rumale/decomposition/pca.rb
CHANGED
@@ -63,13 +63,14 @@ module Rumale
       # initialize some variables.
       @components = nil
       n_samples, n_features = x.shape
+      sub_rng = @rng.dup
       # centering.
       @mean = x.mean(0)
       centered_x = x - @mean
       # optimization.
       covariance_mat = centered_x.transpose.dot(centered_x) / (n_samples - 1)
       @params[:n_components].times do
-        comp_vec = random_vec(n_features)
+        comp_vec = Rumale::Utils.rand_uniform(n_features, sub_rng)
         @params[:max_iter].times do
           updated = orthogonalize(covariance_mat.dot(comp_vec))
           break if (updated.dot(comp_vec) - 1).abs < @params[:tol]
@@ -139,10 +140,6 @@ module Rumale
       end
       pcvec / Math.sqrt((pcvec**2).sum.abs) + 1.0e-12
     end
-
-    def random_vec(n_features)
-      Numo::DFloat[*(Array.new(n_features) { @rng.rand })]
-    end
   end
 end
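The removed private helper random_vec is replaced by the shared utility, which takes an explicit generator so fit can pass a duplicated RNG. A small sketch of the replacement call (toy size):

```ruby
require 'rumale'

# Builds a length-8 vector of uniform draws, as the power-iteration
# loop above does for each principal component.
sub_rng = Random.new(1)
comp_vec = Rumale::Utils.rand_uniform(8, sub_rng)
p comp_vec.shape # => [8]
```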
data/lib/rumale/ensemble/ada_boost_classifier.rb
CHANGED
@@ -95,6 +95,7 @@ module Rumale
       @params[:max_features] = [[1, @params[:max_features]].max, n_features].min
       @classes = Numo::Int32.asarray(y.to_a.uniq.sort)
       n_classes = @classes.shape[0]
+      sub_rng = @rng.dup
       ## Boosting.
       classes_arr = @classes.to_a
       y_codes = Numo::DFloat.zeros(n_samples, n_classes) - 1.fdiv(n_classes - 1)
@@ -102,12 +103,12 @@ module Rumale
       observation_weights = Numo::DFloat.zeros(n_samples) + 1.fdiv(n_samples)
       @params[:n_estimators].times do |_t|
         # Fit classifier.
-        ids = Rumale::Utils.choice_ids(n_samples, observation_weights, @rng)
+        ids = Rumale::Utils.choice_ids(n_samples, observation_weights, sub_rng)
         break if y[ids].to_a.uniq.size != n_classes
         tree = Tree::DecisionTreeClassifier.new(
           criterion: @params[:criterion], max_depth: @params[:max_depth],
           max_leaf_nodes: @params[:max_leaf_nodes], min_samples_leaf: @params[:min_samples_leaf],
-          max_features: @params[:max_features], random_seed: @rng.rand(Rumale::Values.int_max)
+          max_features: @params[:max_features], random_seed: sub_rng.rand(Rumale::Values.int_max)
         )
         tree.fit(x[ids, true], y[ids])
         # Calculate estimator error.
data/lib/rumale/ensemble/ada_boost_regressor.rb
CHANGED
@@ -102,14 +102,15 @@ module Rumale
       @estimators = []
       @estimator_weights = []
       @feature_importances = Numo::DFloat.zeros(n_features)
+      sub_rng = @rng.dup
       # Construct forest.
       @params[:n_estimators].times do |_t|
         # Fit weak learner.
-        ids = Rumale::Utils.choice_ids(n_samples, observation_weights, @rng)
+        ids = Rumale::Utils.choice_ids(n_samples, observation_weights, sub_rng)
         tree = Tree::DecisionTreeRegressor.new(
           criterion: @params[:criterion], max_depth: @params[:max_depth],
           max_leaf_nodes: @params[:max_leaf_nodes], min_samples_leaf: @params[:min_samples_leaf],
-          max_features: @params[:max_features], random_seed: @rng.rand(Rumale::Values.int_max)
+          max_features: @params[:max_features], random_seed: sub_rng.rand(Rumale::Values.int_max)
         )
         tree.fit(x[ids, true], y[ids])
         p = tree.predict(x)
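AdaBoost's resampling step draws training indices in proportion to the observation weights. A sketch of the helper call with toy weights (the sampled ids vary with the seed):

```ruby
require 'rumale'

# Weighted resampling: indices are drawn in proportion to the weights,
# here from a duplicated generator as in the hunks above.
weights = Numo::DFloat[0.1, 0.6, 0.3]
sub_rng = Random.new(1)
ids = Rumale::Utils.choice_ids(3, weights, sub_rng)
p ids # e.g. [1, 1, 2]; index 1 carries most of the weight
```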
data/lib/rumale/ensemble/extra_trees_classifier.rb
CHANGED
@@ -80,8 +80,9 @@ module Rumale
       @params[:max_features] = Math.sqrt(n_features).to_i unless @params[:max_features].is_a?(Integer)
       @params[:max_features] = [[1, @params[:max_features]].max, n_features].min
       @classes = Numo::Int32.asarray(y.to_a.uniq.sort)
+      sub_rng = @rng.dup
       # Construct trees.
-      rng_seeds = Array.new(@params[:n_estimators]) { @rng.rand(Rumale::Values.int_max) }
+      rng_seeds = Array.new(@params[:n_estimators]) { sub_rng.rand(Rumale::Values.int_max) }
       @estimators = if enable_parallel?
                       parallel_map(@params[:n_estimators]) { |n| plant_tree(rng_seeds[n]).fit(x, y) }
                     else
data/lib/rumale/ensemble/extra_trees_regressor.rb
CHANGED
@@ -75,8 +75,9 @@ module Rumale
       n_features = x.shape[1]
       @params[:max_features] = Math.sqrt(n_features).to_i unless @params[:max_features].is_a?(Integer)
       @params[:max_features] = [[1, @params[:max_features]].max, n_features].min
+      sub_rng = @rng.dup
       # Construct forest.
-      rng_seeds = Array.new(@params[:n_estimators]) { @rng.rand(Rumale::Values.int_max) }
+      rng_seeds = Array.new(@params[:n_estimators]) { sub_rng.rand(Rumale::Values.int_max) }
       @estimators = if enable_parallel?
                       parallel_map(@params[:n_estimators]) { |n| plant_tree(rng_seeds[n]).fit(x, y) }
                     else
data/lib/rumale/ensemble/gradient_boosting_classifier.rb
CHANGED
@@ -216,10 +216,11 @@ module Rumale
       n_sub_samples = [n_samples, [(n_samples * @params[:subsample]).to_i, 1].max].min
       whole_ids = Array.new(n_samples) { |v| v }
       y_pred = Numo::DFloat.ones(n_samples) * init_pred
+      sub_rng = @rng.dup
       # grow trees.
       @params[:n_estimators].times do |_t|
         # subsampling
-        ids = whole_ids.sample(n_sub_samples, random: @rng)
+        ids = whole_ids.sample(n_sub_samples, random: sub_rng)
         x_sub = x[ids, true]
         y_sub = y[ids]
         y_pred_sub = y_pred[ids]
data/lib/rumale/ensemble/gradient_boosting_regressor.rb
CHANGED
@@ -178,10 +178,11 @@ module Rumale
       n_sub_samples = [n_samples, [(n_samples * @params[:subsample]).to_i, 1].max].min
       whole_ids = Array.new(n_samples) { |v| v }
       y_pred = Numo::DFloat.ones(n_samples) * init_pred
+      sub_rng = @rng.dup
       # grow trees.
       @params[:n_estimators].times do |_t|
         # subsampling
-        ids = whole_ids.sample(n_sub_samples, random: @rng)
+        ids = whole_ids.sample(n_sub_samples, random: sub_rng)
         x_sub = x[ids, true]
         y_sub = y[ids]
         y_pred_sub = y_pred[ids]
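The subsampling step above is plain Array#sample without replacement, made reproducible by the explicit generator. A stdlib-only sketch:

```ruby
# Stochastic gradient boosting subsamples row indices each round;
# passing the generator explicitly keeps the rounds reproducible.
whole_ids = Array.new(8) { |v| v }
sub_rng = Random.new(1)
ids = whole_ids.sample(4, random: sub_rng)
p ids
```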
data/lib/rumale/ensemble/random_forest_classifier.rb
CHANGED
@@ -94,10 +94,11 @@ module Rumale
       @params[:max_features] = Math.sqrt(n_features).to_i unless @params[:max_features].is_a?(Integer)
       @params[:max_features] = [[1, @params[:max_features]].max, n_features].min
       @classes = Numo::Int32.asarray(y.to_a.uniq.sort)
+      sub_rng = @rng.dup
+      rngs = Array.new(@params[:n_estimators]) { Random.new(sub_rng.rand(Rumale::Values.int_max)) }
       # Construct forest.
       @estimators =
         if enable_parallel?
-          rngs = Array.new(@params[:n_estimators]) { Random.new(@rng.rand(Rumale::Values.int_max)) }
           # :nocov:
           parallel_map(@params[:n_estimators]) do |n|
             bootstrap_ids = Array.new(n_samples) { rngs[n].rand(0...n_samples) }
@@ -105,9 +106,9 @@ module Rumale
           end
           # :nocov:
         else
-          Array.new(@params[:n_estimators]) do
-            bootstrap_ids = Array.new(n_samples) { @rng.rand(0...n_samples) }
-            plant_tree(@rng.rand(Rumale::Values.int_max)).fit(x[bootstrap_ids, true], y[bootstrap_ids])
+          Array.new(@params[:n_estimators]) do |n|
+            bootstrap_ids = Array.new(n_samples) { rngs[n].rand(0...n_samples) }
+            plant_tree(rngs[n].rand(Rumale::Values.int_max)).fit(x[bootstrap_ids, true], y[bootstrap_ids])
           end
         end
       @feature_importances =
data/lib/rumale/ensemble/random_forest_regressor.rb
CHANGED
@@ -88,10 +88,11 @@ module Rumale
       @params[:max_features] = Math.sqrt(n_features).to_i unless @params[:max_features].is_a?(Integer)
       @params[:max_features] = [[1, @params[:max_features]].max, n_features].min
       single_target = y.shape[1].nil?
+      sub_rng = @rng.dup
+      rngs = Array.new(@params[:n_estimators]) { Random.new(sub_rng.rand(Rumale::Values.int_max)) }
       # Construct forest.
       @estimators =
         if enable_parallel?
-          rngs = Array.new(@params[:n_estimators]) { Random.new(@rng.rand(Rumale::Values.int_max)) }
           # :nocov:
           parallel_map(@params[:n_estimators]) do |n|
             bootstrap_ids = Array.new(n_samples) { rngs[n].rand(0...n_samples) }
@@ -100,9 +101,9 @@ module Rumale
           end
           # :nocov:
         else
-          Array.new(@params[:n_estimators]) do
-            bootstrap_ids = Array.new(n_samples) { @rng.rand(0...n_samples) }
-            tree = plant_tree(@rng.rand(Rumale::Values.int_max))
+          Array.new(@params[:n_estimators]) do |n|
+            bootstrap_ids = Array.new(n_samples) { rngs[n].rand(0...n_samples) }
+            tree = plant_tree(rngs[n].rand(Rumale::Values.int_max))
             tree.fit(x[bootstrap_ids, true], single_target ? y[bootstrap_ids] : y[bootstrap_ids, true])
           end
         end
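The forests now pre-derive one child generator per tree from a single parent (in Rumale, a dup of the estimator's @rng), so the serial and parallel branches draw identical bootstrap samples. A stdlib-only sketch, where 2**31 - 1 stands in for Rumale::Values.int_max:

```ruby
# One seeded child generator per tree, all derived from one parent,
# so the per-tree bootstrap draws are independent of execution order.
sub_rng = Random.new(1)
rngs = Array.new(3) { Random.new(sub_rng.rand(2**31 - 1)) }
bootstraps = rngs.map { |r| Array.new(5) { r.rand(0...5) } }
p bootstraps
```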
data/lib/rumale/kernel_approximation/rbf.rb
CHANGED
@@ -10,7 +10,7 @@ module Rumale
     # Class for RBF kernel feature mapping.
     #
     # @example
-    #   transformer = Rumale::KernelApproximation::RBF.new(gamma: 1.0,
+    #   transformer = Rumale::KernelApproximation::RBF.new(gamma: 1.0, n_components: 128, random_seed: 1)
     #   new_training_samples = transformer.fit_transform(training_samples)
     #   new_testing_samples = transformer.transform(testing_samples)
     #
@@ -63,8 +63,9 @@ module Rumale
       check_sample_array(x)
 
       n_features = x.shape[1]
+      sub_rng = @rng.dup
       @params[:n_components] = 2 * n_features if @params[:n_components] <= 0
-      @random_mat = Rumale::Utils.rand_normal([n_features, @params[:n_components]], @rng) * (2.0 * @params[:gamma])**0.5
+      @random_mat = Rumale::Utils.rand_normal([n_features, @params[:n_components]], sub_rng) * (2.0 * @params[:gamma])**0.5
       n_half_components = @params[:n_components] / 2
       @random_vec = Numo::DFloat.zeros(@params[:n_components] - n_half_components).concatenate(
         Numo::DFloat.ones(n_half_components) * (0.5 * Math::PI)
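The fit above builds a random Fourier feature map: @random_mat holds draws from N(0, 2 * gamma), obtained by scaling standard normals by sqrt(2 * gamma). A sketch of just the sampling step (toy sizes):

```ruby
require 'rumale'

# Standard normals from rand_normal, scaled to variance 2 * gamma.
gamma = 0.5
sub_rng = Random.new(1)
random_mat = Rumale::Utils.rand_normal([4, 8], sub_rng) * (2.0 * gamma)**0.5
p random_mat.shape # => [4, 8]
```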
data/lib/rumale/kernel_machine/kernel_svc.rb
CHANGED
@@ -202,10 +202,11 @@ module Rumale
       n_training_samples = x.shape[0]
       rand_ids = []
       weight_vec = Numo::DFloat.zeros(n_training_samples)
+      sub_rng = @rng.dup
       # Start optimization.
       @params[:max_iter].times do |t|
         # random sampling
-        rand_ids = [*0...n_training_samples].shuffle(random: @rng) if rand_ids.empty?
+        rand_ids = [*0...n_training_samples].shuffle(random: sub_rng) if rand_ids.empty?
         target_id = rand_ids.shift
         # update the weight vector
         func = (weight_vec * bin_y).dot(x[target_id, true].transpose).to_f
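The optimizer visits samples in random order and reshuffles only when the id queue empties, so each epoch touches every sample exactly once. A stdlib-only sketch:

```ruby
# Epoch-wise random visiting order over four samples.
sub_rng = Random.new(1)
rand_ids = []
8.times do
  rand_ids = [*0...4].shuffle(random: sub_rng) if rand_ids.empty?
  print rand_ids.shift, ' '
end
# e.g. "2 1 0 3 0 3 1 2": two full passes over the four samples
```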
data/lib/rumale/linear_model/base_linear_model.rb
CHANGED
@@ -49,7 +49,7 @@ module Rumale
       samples = @params[:fit_bias] ? expand_feature(x) : x
       # Initialize some variables.
       n_samples, n_features = samples.shape
-      rand_ids = [*0...n_samples].shuffle(random: @rng)
+      rand_ids = [*0...n_samples].shuffle(random: @rng.dup)
       weight = Numo::DFloat.zeros(n_features)
       optimizer = @params[:optimizer].dup
       # Optimization.
data/lib/rumale/manifold/tsne.rb
CHANGED
@@ -155,7 +155,8 @@ module Rumale
         pca.fit_transform(x)
       else
         n_samples = x.shape[0]
-        Rumale::Utils.rand_normal([n_samples, @params[:n_components]], @rng, 0, 0.0001)
+        sub_rng = @rng.dup
+        Rumale::Utils.rand_normal([n_samples, @params[:n_components]], sub_rng, 0, 0.0001)
       end
     end
 
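The non-PCA branch above initializes the t-SNE embedding with tiny Gaussian noise via rand_normal(shape, rng, mean, sd). A sketch with toy sizes:

```ruby
require 'rumale'

# Five 2-D embedding points drawn from N(0, 0.0001**2).
sub_rng = Random.new(1)
y_init = Rumale::Utils.rand_normal([5, 2], sub_rng, 0, 0.0001)
p y_init.shape # => [5, 2]
```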
data/lib/rumale/model_selection/k_fold.rb
CHANGED
@@ -60,9 +60,10 @@ module Rumale
         raise ArgumentError,
               'The value of n_splits must be not less than 2 and not more than the number of samples.'
       end
+      sub_rng = @rng.dup
       # Splits dataset ids to each fold.
       dataset_ids = [*0...n_samples]
-      dataset_ids.shuffle!(random: @rng) if @shuffle
+      dataset_ids.shuffle!(random: sub_rng) if @shuffle
       fold_sets = Array.new(@n_splits) do |n|
         n_fold_samples = n_samples / @n_splits
         n_fold_samples += 1 if n < n_samples % @n_splits
data/lib/rumale/model_selection/shuffle_split.rb
CHANGED
@@ -74,14 +74,15 @@ module Rumale
         raise RangeError,
               'The total number of samples in test split and train split must be not more than the number of samples.'
       end
+      sub_rng = @rng.dup
       # Returns array consisting of the training and testing ids for each fold.
       dataset_ids = [*0...n_samples]
       Array.new(@n_splits) do
-        test_ids = dataset_ids.sample(n_test_samples, random: @rng)
+        test_ids = dataset_ids.sample(n_test_samples, random: sub_rng)
         train_ids = if @train_size.nil?
                       dataset_ids - test_ids
                     else
-                      (dataset_ids - test_ids).sample(n_train_samples, random: @rng)
+                      (dataset_ids - test_ids).sample(n_train_samples, random: sub_rng)
                     end
         [train_ids, test_ids]
       end
data/lib/rumale/model_selection/stratified_k_fold.rb
CHANGED
@@ -65,7 +65,8 @@ module Rumale
               'The value of n_splits must be not less than 2 and not more than the number of samples in each class.'
       end
       # Splits dataset ids of each class to each fold.
-      fold_sets_each_class = y.to_a.uniq.map { |label| fold_sets(y, label) }
+      sub_rng = @rng.dup
+      fold_sets_each_class = y.to_a.uniq.map { |label| fold_sets(y, label, sub_rng) }
       # Returns array consisting of the training and testing ids for each fold.
       Array.new(@n_splits) { |fold_id| train_test_sets(fold_sets_each_class, fold_id) }
     end
@@ -76,9 +77,9 @@ module Rumale
       y.to_a.uniq.map { |label| y.eq(label).where.size }.all? { |n_samples| @n_splits.between?(2, n_samples) }
     end
 
-    def fold_sets(y, label)
+    def fold_sets(y, label, sub_rng)
       sample_ids = y.eq(label).where.to_a
-      sample_ids.shuffle!(random: @rng) if @shuffle
+      sample_ids.shuffle!(random: sub_rng) if @shuffle
       n_samples = sample_ids.size
       Array.new(@n_splits) do |n|
         n_fold_samples = n_samples / @n_splits
data/lib/rumale/model_selection/stratified_shuffle_split.rb
CHANGED
@@ -62,6 +62,7 @@ module Rumale
       check_sample_label_size(x, y)
       # Initialize and check some variables.
       train_sz = @train_size.nil? ? 1.0 - @test_size : @train_size
+      sub_rng = @rng.dup
       # Check the number of samples in each class.
       unless valid_n_splits?(y)
         raise ArgumentError,
@@ -88,11 +89,11 @@ module Rumale
         n_samples = sample_ids.size
         n_test_samples = (@test_size * n_samples).to_i
         n_train_samples = (train_sz * n_samples).to_i
-        test_ids += sample_ids.sample(n_test_samples, random: @rng)
+        test_ids += sample_ids.sample(n_test_samples, random: sub_rng)
         train_ids += if @train_size.nil?
                        sample_ids - test_ids
                      else
-                       (sample_ids - test_ids).sample(n_train_samples, random: @rng)
+                       (sample_ids - test_ids).sample(n_train_samples, random: sub_rng)
                      end
       end
       [train_ids, test_ids]
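A consequence of the duplicated generators in the splitters above: calling split twice on the same splitter now returns identical folds. A sketch:

```ruby
require 'rumale'

# Ten samples, two balanced classes, shuffled stratified 2-fold CV.
x = Numo::DFloat.new(10, 2).rand
y = Numo::Int32.asarray([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
kf = Rumale::ModelSelection::StratifiedKFold.new(n_splits: 2, shuffle: true, random_seed: 1)
puts(kf.split(x, y) == kf.split(x, y)) # => true as of 0.12.0
```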
data/lib/rumale/nearest_neighbors/k_neighbors_classifier.rb
CHANGED
@@ -11,7 +11,7 @@ module Rumale
     #
     # @example
     #   estimator =
-    #     Rumale::NearestNeighbors::KNeighborsClassifier.new(n_neighbors
+    #     Rumale::NearestNeighbors::KNeighborsClassifier.new(n_neighbors: 5)
     #   estimator.fit(training_samples, training_labels)
     #   results = estimator.predict(testing_samples)
     #
data/lib/rumale/nearest_neighbors/k_neighbors_regressor.rb
CHANGED
@@ -10,7 +10,7 @@ module Rumale
     #
     # @example
     #   estimator =
-    #     Rumale::NearestNeighbors::KNeighborsRegressor.new(n_neighbors
+    #     Rumale::NearestNeighbors::KNeighborsRegressor.new(n_neighbors: 5)
     #   estimator.fit(training_samples, training_target_values)
     #   results = estimator.predict(testing_samples)
     #
data/lib/rumale/pipeline/pipeline.rb
CHANGED
@@ -9,7 +9,7 @@ module Rumale
     # Pipeline is a class that implements the function to perform the transformers and estimators sequentially.
     #
     # @example
-    #   rbf = Rumale::KernelApproximation::RBF.new(gamma: 1.0,
+    #   rbf = Rumale::KernelApproximation::RBF.new(gamma: 1.0, n_components: 128, random_seed: 1)
     #   svc = Rumale::LinearModel::SVC.new(reg_param: 1.0, fit_bias: true, max_iter: 5000, random_seed: 1)
     #   pipeline = Rumale::Pipeline::Pipeline.new(steps: { trs: rbf, est: svc })
     #   pipeline.fit(training_samples, training_labels)
data/lib/rumale/polynomial_model/base_factorization_machine.rb
CHANGED
@@ -51,7 +51,7 @@ module Rumale
       def partial_fit(x, y)
         # Initialize some variables.
         n_samples, n_features = x.shape
-        rand_ids = [*0...n_samples].shuffle(random: @rng)
+        rand_ids = [*0...n_samples].shuffle(random: @rng.dup)
         weight_vec = Numo::DFloat.zeros(n_features + 1)
         factor_mat = Numo::DFloat.zeros(@params[:n_factors], n_features)
         weight_optimizer = @params[:optimizer].dup
data/lib/rumale/tree/extra_tree_classifier.rb
CHANGED
@@ -104,7 +104,7 @@ module Rumale
       private
 
       def best_split(features, y, whole_impurity)
-        threshold = @rng.rand(features.min..features.max)
+        threshold = @sub_rng.rand(features.min..features.max)
         l_ids = features.le(threshold).where
         r_ids = features.gt(threshold).where
         l_impurity = l_ids.empty? ? 0.0 : impurity(y[l_ids, true])
data/lib/rumale/tree/extra_tree_regressor.rb
CHANGED
@@ -91,7 +91,7 @@ module Rumale
       private
 
       def best_split(features, y, whole_impurity)
-        threshold = @rng.rand(features.min..features.max)
+        threshold = @sub_rng.rand(features.min..features.max)
         l_ids = features.le(threshold).where
         r_ids = features.gt(threshold).where
         l_impurity = l_ids.empty? ? 0.0 : impurity(y[l_ids, true])
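An extra-tree split draws a single threshold uniformly from the feature's observed range and partitions around it. A runnable sketch with toy values:

```ruby
require 'numo/narray'

# One uniform threshold over [features.min, features.max], then a
# boolean partition into left (<= threshold) and right (> threshold).
features = Numo::DFloat[0.2, 0.8, 0.5, 0.9]
sub_rng = Random.new(1)
threshold = sub_rng.rand(features.min..features.max)
l_ids = features.le(threshold).where
r_ids = features.gt(threshold).where
p [threshold, l_ids.to_a, r_ids.to_a]
```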
data/lib/rumale/tree/gradient_tree_regressor.rb
CHANGED
@@ -93,6 +93,7 @@ module Rumale
       @n_leaves = 0
       @leaf_weights = []
       @feature_importances = Numo::DFloat.zeros(n_features)
+      @sub_rng = @rng.dup
       # Build tree.
       build_tree(x, y, g, h)
       @leaf_weights = Numo::DFloat[*@leaf_weights]
@@ -221,7 +222,7 @@ module Rumale
       end
 
       def rand_ids(n)
-        [*0...n].sample(@params[:max_features], random: @rng)
+        [*0...n].sample(@params[:max_features], random: @sub_rng)
       end
     end
   end
data/lib/rumale/utils.rb
CHANGED
@@ -22,8 +22,12 @@ module Rumale
     # @!visibility private
     def rand_uniform(shape, rng = nil)
       rng ||= Random.new
-      rnd_vals = Array.new(shape.inject(:*)) { rng.rand }
-      Numo::DFloat.asarray(rnd_vals).reshape(shape[0], shape[1])
+      if shape.is_a?(Array)
+        rnd_vals = Array.new(shape.inject(:*)) { rng.rand }
+        Numo::DFloat.asarray(rnd_vals).reshape(shape[0], shape[1])
+      else
+        Numo::DFloat.asarray(Array.new(shape) { rng.rand })
+      end
     end
 
     # @!visibility private
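A sketch of the patched helper (callable despite the @!visibility private tag, since Rumale::Utils uses module_function): copies of one generator return identical matrices, and an Integer shape now yields a vector:

```ruby
require 'rumale'

# Two calls on duplicated copies of the same RNG produce equal matrices.
rng = Random.new(42)
a = Rumale::Utils.rand_uniform([2, 3], rng.dup)
b = Rumale::Utils.rand_uniform([2, 3], rng.dup)
puts((a - b).abs.max) # => 0.0
p Rumale::Utils.rand_uniform(4, rng).shape # => [4], via the new else branch
```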
data/lib/rumale/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: rumale
 version: !ruby/object:Gem::Version
-  version: 0.11.0
+  version: 0.12.0
 platform: ruby
 authors:
 - yoshoku
 autorequire:
 bindir: exe
 cert_chain: []
-date: 2019-
+date: 2019-06-01 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: numo-narray
|