RubyGems - lightgbm - Versions diffs - 0.1.2 → 0.1.3 - Mend

lightgbm 0.1.2 → 0.1.3


Files changed (10)

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 723130d41ea9196bcbd7bcffeb865c40c65985f26eca018d49bd176d33c43142
-  data.tar.gz: d92b41899ff72da2ef4e5782bf4d2840caee1554107d9fd5d02bd6728829585a
+  metadata.gz: 3d841acf71e8af7111178da8c2062b47900ec953a94154a0cdf9f28bf7d61714
+  data.tar.gz: 6ed019f4094803a06be77008e48870fb8db3acac4b83f3675eaeae4e20c27fdb
 SHA512:
-  metadata.gz: 6960dbf1e2a884705e8a2752952392483c0ab1e74e970382b45df747ccbedbc0d978d3910e2b935190c98a9c45315841b53e27128aa5944e3a7834808e05582a
-  data.tar.gz: c6e793933dc794fa62099580ad35f29d7e5e3ae24a07df4335dca3d68571bc1d5b360a7cf0859d75d245a77d470a0dafba0cb5d86edda1974f3bc532b0f5c11a
+  metadata.gz: 477e25066789028e7b8a8a78107c1ed823bd06d96d97afdda41b502e2e3e4a9e0065888c414effe4ace4097baa4d4b18988c4ee6b4a9d06347992afa201a52b5
+  data.tar.gz: eabb924994ffcafce6cb9038a60e3327528d2308d39c62bc336a06191e471ff412e141f9117446abc068aabbe9d1d16be59cc8bdca889270219895e85ec9e57b
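Each entry in `checksums.yaml` is a hex digest of one of the archives packed inside the `.gem` file. A `.gem` is a plain tar archive containing `metadata.gz`, `data.tar.gz`, and `checksums.yaml.gz`. The sketch below checks an extracted archive against these values using Ruby's standard `digest` library. The helper names are illustrative, and the sketch assumes you have already unpacked the `.gem` (for example with `tar -xf lightgbm-0.1.3.gem`).

```ruby
require "digest"

# Compare a file's SHA256 hex digest with an expected value,
# e.g. the data.tar.gz entry from checksums.yaml.
def sha256_matches?(path, expected_hex)
  Digest::SHA256.file(path).hexdigest == expected_hex.downcase
end

# The same check against the SHA512 entries.
def sha512_matches?(path, expected_hex)
  Digest::SHA512.file(path).hexdigest == expected_hex.downcase
end
```

For example, `sha256_matches?("data.tar.gz", "6ed019f4094803a06be77008e48870fb8db3acac4b83f3675eaeae4e20c27fdb")` should return `true` for an intact 0.1.3 archive.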

data/CHANGELOG.md CHANGED

@@ -1,3 +1,8 @@
+## 0.1.3
+- Added Scikit-Learn API
+- Added support for Daru and Numo::NArray
 ## 0.1.2
 - Added `cv` method

data/README.md CHANGED

@@ -1,6 +1,6 @@
 # LightGBM
-[LightGBM](https://github.com/microsoft/LightGBM) for Ruby
+[LightGBM](https://github.com/microsoft/LightGBM) - the high performance machine learning library - for Ruby
 :fire: Uses the C API for blazing performance
@@ -18,6 +18,16 @@ gem 'lightgbm'
 ## Getting Started
+This library follows the [Data Structure, Training, and Scikit-Learn APIs](https://lightgbm.readthedocs.io/en/latest/Python-API.html) of the Python library. A few differences are:
+- The `get_` prefix is removed from methods
+- The default verbosity is `-1`
+- With the `cv` method, `stratified` is set to `false`
+Some methods and options are also missing at the moment. PRs welcome!
+## Training API
 Train a model
 ```ruby
@@ -44,38 +54,98 @@ Load the model from a file
 booster = LightGBM::Booster.new(model_file: "model.txt")
 ```
-Get feature importance
+Get the importance of features
 ```ruby
 booster.feature_importance
 ```
-## Early Stopping
+Early stopping
 ```ruby
 LightGBM.train(params, train_set, valid_set: [train_set, test_set], early_stopping_rounds: 5)
 ```
-## CV
+CV
 ```ruby
 LightGBM.cv(params, train_set, nfold: 5, verbose_eval: true)
 ```
-## Reference
+## Scikit-Learn API
-This library follows the [Data Structure and Training APIs](https://lightgbm.readthedocs.io/en/latest/Python-API.html) for the Python library. A few differences are:
+Prep your data
-- The default verbosity is `-1`
-- With the `cv` method, `stratified` is set to `false`
+```ruby
+x = [[1, 2], [3, 4], [5, 6], [7, 8]]
+y = [1, 2, 3, 4]
+```
-Some methods and options are also missing at the moment. PRs welcome!
+Train a model
+```ruby
+model = LightGBM::Regressor.new
+model.fit(x, y)
+```
+> For classification, use `LightGBM::Classifier`
+Predict
+```ruby
+model.predict(x)
+```
+> For classification, use `predict_proba` for probabilities
+Save the model to a file
+```ruby
+model.save_model("model.txt")
+```
+Load the model from a file
+```ruby
+model.load_model("model.txt")
+```
+Get the importance of features
+```ruby
+model.feature_importances
+```
+## Data
+Data can be an array of arrays
+```ruby
+[[1, 2, 3], [4, 5, 6]]
+```
+Or a Daru data frame
+```ruby
+Daru::DataFrame.from_csv("houses.csv")
+```
+Or a Numo NArray
+```ruby
+Numo::DFloat.new(3, 2).seq
+```
 ## Helpful Resources
 - [Parameters](https://lightgbm.readthedocs.io/en/latest/Parameters.html)
 - [Parameter Tuning](https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html)
+## Related Projects
+- [Xgb](https://github.com/ankane/xgb) - XGBoost for Ruby
+- [Eps](https://github.com/ankane/eps) - Machine Learning for Ruby
 ## Credits
 Thanks to the [xgboost](https://github.com/PairOnAir/xgboost-ruby) gem for serving as an initial reference, and Selva Prabhakaran for the [test datasets](https://github.com/selva86/datasets).

data/lib/lightgbm.rb CHANGED

@@ -8,11 +8,15 @@ require "lightgbm/dataset"
 require "lightgbm/ffi"
 require "lightgbm/version"
+# scikit-learn API
+require "lightgbm/classifier"
+require "lightgbm/regressor"
 module LightGBM
   class Error < StandardError; end
   class << self
-    def train(params, train_set,num_boost_round: 100, valid_sets: [], valid_names: [], early_stopping_rounds: nil, verbose_eval: true)
+    def train(params, train_set, num_boost_round: 100, valid_sets: [], valid_names: [], early_stopping_rounds: nil, verbose_eval: true)
       booster = Booster.new(params: params, train_set: train_set)
       valid_contain_train = false
@@ -150,6 +154,7 @@ module LightGBM
         if early_stopping_rounds
           stop_early = false
           means.each do |k, score|
+            # TODO fix higher better
             if best_score[k].nil? || score < best_score[k]
               best_score[k] = score
               best_iter[k] = iteration
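The `# TODO fix higher better` note points out that the comparison `score < best_score[k]` assumes every metric improves downward. That assumption is wrong for metrics such as `auc`, where higher is better. The sketch below is a tracker that handles both directions. The class name `EarlyStopper` and the default `higher_better` list are assumptions, not the gem's API, and the sketch assumes scores are keyed by plain metric names. The stopping rule follows the usual early-stopping convention: stop once a metric has gone `rounds` iterations without improving.

```ruby
# Hypothetical early-stopping tracker (not part of the gem). Metrics listed in
# higher_better improve upward; every other metric improves downward.
class EarlyStopper
  attr_reader :best_iter, :best_score

  def initialize(rounds, higher_better: ["auc"])
    @rounds = rounds
    @higher_better = higher_better
    @best_score = {}
    @best_iter = {}
  end

  # Record one iteration's scores. Returns true once any metric has gone
  # `rounds` iterations without improving on its best score.
  def update(iteration, scores)
    stop = false
    scores.each do |metric, score|
      best = @best_score[metric]
      improved =
        if best.nil?
          true
        elsif @higher_better.include?(metric)
          score > best
        else
          score < best
        end
      if improved
        @best_score[metric] = score
        @best_iter[metric] = iteration
      elsif iteration - @best_iter[metric] >= @rounds
        stop = true
      end
    end
    stop
  end
end
```

Separately, the `train` signature above declares `valid_sets:`, but the README's early-stopping example passes `valid_set:`. Because `train` accepts no other keywords, Ruby would reject that README call with an `ArgumentError` (unknown keyword).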

data/lib/lightgbm/booster.rb CHANGED

@@ -77,7 +77,7 @@ module LightGBM
       num_feature = self.num_feature
       out_result = ::FFI::MemoryPointer.new(:double, num_feature)
       check_result FFI.LGBM_BoosterFeatureImportance(handle_pointer, iteration, importance_type, out_result)
-      out_result.read_array_of_double(num_feature)
+      out_result.read_array_of_double(num_feature).map(&:to_i)
     end
     def model_from_string(model_str)
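In 0.1.3, the importances returned by `LGBM_BoosterFeatureImportance` are truncated to integers. The C API fills a buffer of doubles for both importance types it supports:

- Split counts (how often a feature is used in a split) are whole numbers, so `to_i` leaves them unchanged.
- Total gain is fractional, so `to_i` would lose precision.

The sketch below converts according to type. `normalize_importance` and the `:split`/`:gain` symbols are hypothetical, because this hunk does not show how `importance_type` is chosen.

```ruby
# Hypothetical helper (not the gem's API). Split counts are whole numbers
# stored as doubles, so to_i is lossless for them. Gain values are
# fractional, so they are returned as floats rather than truncated.
def normalize_importance(values, importance_type)
  case importance_type
  when :split then values.map(&:to_i)
  when :gain then values
  else raise ArgumentError, "unknown importance type: #{importance_type.inspect}"
  end
end

normalize_importance([3.0, 0.0, 12.0], :split) # => [3, 0, 12]
normalize_importance([1.75, 0.5], :gain)       # => [1.75, 0.5]
```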

data/lib/lightgbm/classifier.rb ADDED

@@ -0,0 +1,64 @@
+module LightGBM
+  class Classifier
+    def initialize(num_leaves: 31, learning_rate: 0.1, n_estimators: 100, objective: nil)
+      @params = {
+        num_leaves: num_leaves,
+        learning_rate: learning_rate
+      }
+      @params[:objective] = objective if objective
+      @n_estimators = n_estimators
+    end
+    def fit(x, y)
+      n_classes = y.uniq.size
+      params = @params.dup
+      if n_classes > 2
+        params[:objective] ||= "multiclass"
+        params[:num_class] = n_classes
+      else
+        params[:objective] ||= "binary"
+      end
+      train_set = Dataset.new(x, label: y)
+      @booster = LightGBM.train(params, train_set, num_boost_round: @n_estimators)
+      nil
+    end
+    def predict(data)
+      y_pred = @booster.predict(data)
+      if y_pred.first.is_a?(Array)
+        # multiple classes
+        y_pred.map do |v|
+          v.map.with_index.max_by { |v2, i| v2 }.last
+        end
+      else
+        y_pred.map { |v| v > 0.5 ? 1 : 0 }
+      end
+    end
+    def predict_proba(data)
+      y_pred = @booster.predict(data)
+      if y_pred.first.is_a?(Array)
+        # multiple classes
+        y_pred
+      else
+        y_pred.map { |v| [1 - v, v] }
+      end
+    end
+    def save_model(fname)
+      @booster.save_model(fname)
+    end
+    def load_model(fname)
+      @booster = Booster.new(params: @params, model_file: fname)
+    end
+    def feature_importances
+      @booster.feature_importance
+    end
+  end
+end
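`Classifier#fit` passes `y` to LightGBM unchanged and sets `num_class` to the number of distinct labels. However, LightGBM's `multiclass` objective expects integer labels in `0...num_class`, and `binary` expects labels in {0, 1}. `predict` also returns those encoded indices rather than the original labels, so inputs such as `[1, 2, 3, 4]` or strings would not round-trip. The sketch below encodes labels before training and decodes predictions afterwards. `LabelEncoder` and `decode_predictions` are hypothetical helpers, not part of the gem. The decoding follows the same argmax and 0.5-threshold rules as `predict` above.

```ruby
# Hypothetical label encoder (not part of the gem): maps arbitrary class
# labels to 0...n_classes and maps predicted indices back.
class LabelEncoder
  attr_reader :classes

  def fit(y)
    @classes = y.uniq.sort
    @index = @classes.each_with_index.to_h # label => encoded index
    self
  end

  def transform(y)
    y.map { |label| @index.fetch(label) } # KeyError for unseen labels
  end

  def inverse_transform(indices)
    indices.map { |i| @classes.fetch(i) }
  end
end

# Turn raw booster output into encoded class indices: argmax per row for
# multiclass output, 0.5 threshold for binary output.
def decode_predictions(y_pred)
  if y_pred.first.is_a?(Array)
    y_pred.map { |probs| probs.each_with_index.max_by { |prob, _| prob }.last }
  else
    y_pred.map { |prob| prob > 0.5 ? 1 : 0 }
  end
end
```

To use it, fit the encoder on `y` and pass `enc.transform(y)` to `fit`. Then call `enc.inverse_transform(model.predict(x))` to recover the original labels.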

data/lib/lightgbm/dataset.rb CHANGED

@@ -20,9 +20,25 @@ module LightGBM
         used_row_indices.put_array_of_int32(0, used_indices)
         check_result FFI.LGBM_DatasetGetSubset(reference, used_row_indices, used_indices.count, parameters, @handle)
       else
-        c_data = ::FFI::MemoryPointer.new(:float, data.count * data.first.count)
-        c_data.put_array_of_float(0, data.flatten)
-        check_result FFI.LGBM_DatasetCreateFromMat(c_data, 0, data.count, data.first.count, 1, parameters, reference, @handle)
+        if matrix?(data)
+          nrow = data.row_count
+          ncol = data.column_count
+          flat_data = data.to_a.flatten
+        elsif daru?(data)
+          nrow, ncol = data.shape
+          flat_data = data.each_vector.map(&:to_a).flatten
+        elsif narray?(data)
+          nrow, ncol = data.shape
+          flat_data = data.flatten.to_a
+        else
+          nrow = data.count
+          ncol = data.first.count
+          flat_data = data.flatten
+        end
+        c_data = ::FFI::MemoryPointer.new(:float, nrow * ncol)
+        c_data.put_array_of_float(0, flat_data)
+        check_result FFI.LGBM_DatasetCreateFromMat(c_data, 0, nrow, ncol, 1, parameters, reference, @handle)
       end
       # causes "Stack consistency error"
       # ObjectSpace.define_finalizer(self, self.class.finalize(handle_pointer))
@@ -89,11 +105,24 @@ module LightGBM
     end
     def set_field(field_name, data)
+      data = data.to_a unless data.is_a?(Array)
       c_data = ::FFI::MemoryPointer.new(:float, data.count)
       c_data.put_array_of_float(0, data)
       check_result FFI.LGBM_DatasetSetField(handle_pointer, field_name, c_data, data.count, 0)
     end
+    def matrix?(data)
+      defined?(Matrix) && data.is_a?(Matrix)
+    end
+    def daru?(data)
+      defined?(Daru::DataFrame) && data.is_a?(Daru::DataFrame)
+    end
+    def narray?(data)
+      defined?(Numo::NArray) && data.is_a?(Numo::NArray)
+    end
     include Utils
   end
 end
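The fifth argument passed to `LGBM_DatasetCreateFromMat` (`1`) marks the float buffer as row-major. The array-of-arrays and `Matrix` branches flatten row by row, which matches. The Daru branch, however, appears to concatenate whole column vectors, because `each_vector` iterates over a data frame's columns. For frames with more than one column, that produces column-major order. The sketch below illustrates the difference in pure Ruby, using plain arrays in place of a data frame. The helper names are hypothetical. For a column-oriented source, the fix is to transpose before flattening.

```ruby
# LGBM_DatasetCreateFromMat is called with is_row_major = 1, so the flat
# buffer must list row 0's values, then row 1's, and so on.

# Rows in, row-major out: a plain flatten is already correct.
def flatten_rows(rows)
  rows.flatten
end

# Columns in (what iterating a data frame's vectors yields): transpose to
# rows first, otherwise the result is column-major.
def flatten_columns(columns)
  columns.transpose.flatten
end

rows = [[1, 2], [3, 4], [5, 6]]    # 3 rows x 2 features
columns = [[1, 3, 5], [2, 4, 6]]   # same table, stored column by column

p flatten_rows(rows)       # => [1, 2, 3, 4, 5, 6]
p flatten_columns(columns) # => [1, 2, 3, 4, 5, 6]
p columns.flatten          # => [1, 3, 5, 2, 4, 6]  (column-major)
```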

data/lib/lightgbm/regressor.rb ADDED

@@ -0,0 +1,34 @@
+module LightGBM
+  class Regressor
+    def initialize(num_leaves: 31, learning_rate: 0.1, n_estimators: 100, objective: nil)
+      @params = {
+        num_leaves: num_leaves,
+        learning_rate: learning_rate
+      }
+      @params[:objective] = objective if objective
+      @n_estimators = n_estimators
+    end
+    def fit(x, y)
+      train_set = Dataset.new(x, label: y)
+      @booster = LightGBM.train(@params, train_set, num_boost_round: @n_estimators)
+      nil
+    end
+    def predict(data)
+      @booster.predict(data)
+    end
+    def save_model(fname)
+      @booster.save_model(fname)
+    end
+    def load_model(fname)
+      @booster = Booster.new(params: @params, model_file: fname)
+    end
+    def feature_importances
+      @booster.feature_importance
+    end
+  end
+end
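`Regressor` repeats `Classifier`'s constructor and its `save_model`, `load_model`, and `feature_importances` delegations line for line. A mixin is one way to share them. The sketch below is hypothetical: `ModelCommon`, `FakeBooster`, and `SketchRegressor` are illustrative names, and 0.1.3 does not do this. `FakeBooster` stands in for `LightGBM::Booster` so the sketch runs without the native library. The sketch's `fit` also returns `self`, following scikit-learn's convention, whereas 0.1.3 returns `nil`.

```ruby
# Hypothetical mixin holding the parts Classifier and Regressor duplicate.
module ModelCommon
  attr_reader :params, :n_estimators

  def initialize(num_leaves: 31, learning_rate: 0.1, n_estimators: 100, objective: nil)
    @params = { num_leaves: num_leaves, learning_rate: learning_rate }
    @params[:objective] = objective if objective
    @n_estimators = n_estimators
  end

  def save_model(fname)
    @booster.save_model(fname)
  end

  def feature_importances
    @booster.feature_importance
  end
end

# Stand-in for LightGBM::Booster so the sketch runs on its own.
class FakeBooster
  def feature_importance
    [3, 1]
  end
end

class SketchRegressor
  include ModelCommon

  def fit(_x, _y)
    @booster = FakeBooster.new # the real class would call LightGBM.train here
    self                       # returning self allows model.fit(x, y).predict(x)
  end
end
```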

data/lib/lightgbm/version.rb CHANGED

@@ -1,3 +1,3 @@
 module LightGBM
-  VERSION = "0.1.2"
+  VERSION = "0.1.3"
 end

metadata CHANGED

@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: lightgbm
 version: !ruby/object:Gem::Version
-  version: 0.1.2
+  version: 0.1.3
 platform: ruby
 authors:
 - Andrew Kane
 autorequire:
 bindir: bin
 cert_chain: []
-date: 2019-08-15 00:00:00.000000000 Z
+date: 2019-08-16 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: ffi
@@ -66,6 +66,34 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '5'
+- !ruby/object:Gem::Dependency
+  name: daru
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: numo-narray
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 description:
 email: andrew@chartkick.com
 executables: []
@@ -76,8 +104,10 @@ files:
 - README.md
 - lib/lightgbm.rb
 - lib/lightgbm/booster.rb
+- lib/lightgbm/classifier.rb
 - lib/lightgbm/dataset.rb
 - lib/lightgbm/ffi.rb
+- lib/lightgbm/regressor.rb
 - lib/lightgbm/utils.rb
 - lib/lightgbm/version.rb
 homepage: https://github.com/ankane/lightgbm
@@ -102,5 +132,5 @@ requirements: []
 rubygems_version: 3.0.4
 signing_key:
 specification_version: 4
-summary: LightGBM for Ruby
+summary: LightGBM - the high performance machine learning library - for Ruby
 test_files: []
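The metadata lists `daru` and `numo-narray` only as development dependencies. As a result, `gem install lightgbm` does not install them, and the library must work whether or not they are loaded. This is why `Dataset` guards each type check with `defined?`, which returns `nil` for an unloaded constant instead of raising `NameError`. The sketch below applies the same pattern; `input_kind` is a hypothetical helper.

```ruby
# Classify an input without requiring optional gems. defined? returns nil
# (rather than raising NameError) when a constant has not been loaded, so
# each check is safe whether or not Daru or Numo is installed.
def input_kind(data)
  if defined?(Daru::DataFrame) && data.is_a?(Daru::DataFrame)
    :daru
  elsif defined?(Numo::NArray) && data.is_a?(Numo::NArray)
    :narray
  elsif data.is_a?(Array) && data.first.is_a?(Array)
    :array_of_arrays
  else
    :unknown
  end
end
```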