lightgbm 0.1.2 → 0.1.3

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 723130d41ea9196bcbd7bcffeb865c40c65985f26eca018d49bd176d33c43142
4
- data.tar.gz: d92b41899ff72da2ef4e5782bf4d2840caee1554107d9fd5d02bd6728829585a
3
+ metadata.gz: 3d841acf71e8af7111178da8c2062b47900ec953a94154a0cdf9f28bf7d61714
4
+ data.tar.gz: 6ed019f4094803a06be77008e48870fb8db3acac4b83f3675eaeae4e20c27fdb
5
5
  SHA512:
6
- metadata.gz: 6960dbf1e2a884705e8a2752952392483c0ab1e74e970382b45df747ccbedbc0d978d3910e2b935190c98a9c45315841b53e27128aa5944e3a7834808e05582a
7
- data.tar.gz: c6e793933dc794fa62099580ad35f29d7e5e3ae24a07df4335dca3d68571bc1d5b360a7cf0859d75d245a77d470a0dafba0cb5d86edda1974f3bc532b0f5c11a
6
+ metadata.gz: 477e25066789028e7b8a8a78107c1ed823bd06d96d97afdda41b502e2e3e4a9e0065888c414effe4ace4097baa4d4b18988c4ee6b4a9d06347992afa201a52b5
7
+ data.tar.gz: eabb924994ffcafce6cb9038a60e3327528d2308d39c62bc336a06191e471ff412e141f9117446abc068aabbe9d1d16be59cc8bdca889270219895e85ec9e57b
@@ -1,3 +1,8 @@
1
+ ## 0.1.3
2
+
3
+ - Added Scikit-Learn API
4
+ - Added support for Daru and Numo::NArray
5
+
1
6
  ## 0.1.2
2
7
 
3
8
  - Added `cv` method
data/README.md CHANGED
@@ -1,6 +1,6 @@
1
1
  # LightGBM
2
2
 
3
- [LightGBM](https://github.com/microsoft/LightGBM) for Ruby
3
+ [LightGBM](https://github.com/microsoft/LightGBM) - the high performance machine learning library - for Ruby
4
4
 
5
5
  :fire: Uses the C API for blazing performance
6
6
 
@@ -18,6 +18,16 @@ gem 'lightgbm'
18
18
 
19
19
  ## Getting Started
20
20
 
21
+ This library follows the [Data Structure, Training, and Scikit-Learn APIs](https://lightgbm.readthedocs.io/en/latest/Python-API.html) of the Python library. A few differences are:
22
+
23
+ - The `get_` prefix is removed from methods
24
+ - The default verbosity is `-1`
25
+ - With the `cv` method, `stratified` is set to `false`
26
+
27
+ Some methods and options are also missing at the moment. PRs welcome!
28
+
29
+ ## Training API
30
+
21
31
  Train a model
22
32
 
23
33
  ```ruby
@@ -44,38 +54,98 @@ Load the model from a file
44
54
  booster = LightGBM::Booster.new(model_file: "model.txt")
45
55
  ```
46
56
 
47
- Get feature importance
57
+ Get the importance of features
48
58
 
49
59
  ```ruby
50
60
  booster.feature_importance
51
61
  ```
52
62
 
53
- ## Early Stopping
63
+ Early stopping
54
64
 
55
65
  ```ruby
56
66
  LightGBM.train(params, train_set, valid_set: [train_set, test_set], early_stopping_rounds: 5)
57
67
  ```
58
68
 
59
- ## CV
69
+ CV
60
70
 
61
71
  ```ruby
62
72
  LightGBM.cv(params, train_set, nfold: 5, verbose_eval: true)
63
73
  ```
64
74
 
65
- ## Reference
75
+ ## Scikit-Learn API
66
76
 
67
- This library follows the [Data Structure and Training APIs](https://lightgbm.readthedocs.io/en/latest/Python-API.html) for the Python library. A few differences are:
77
+ Prep your data
68
78
 
69
- - The default verbosity is `-1`
70
- - With the `cv` method, `stratified` is set to `false`
79
+ ```ruby
80
+ x = [[1, 2], [3, 4], [5, 6], [7, 8]]
81
+ y = [1, 2, 3, 4]
82
+ ```
71
83
 
72
- Some methods and options are also missing at the moment. PRs welcome!
84
+ Train a model
85
+
86
+ ```ruby
87
+ model = LightGBM::Regressor.new
88
+ model.fit(x, y)
89
+ ```
90
+
91
+ > For classification, use `LightGBM::Classifier`
92
+
93
+ Predict
94
+
95
+ ```ruby
96
+ model.predict(x)
97
+ ```
98
+
99
+ > For classification, use `predict_proba` for probabilities
100
+
101
+ Save the model to a file
102
+
103
+ ```ruby
104
+ model.save_model("model.txt")
105
+ ```
106
+
107
+ Load the model from a file
108
+
109
+ ```ruby
110
+ model.load_model("model.txt")
111
+ ```
112
+
113
+ Get the importance of features
114
+
115
+ ```ruby
116
+ model.feature_importances
117
+ ```
118
+
119
+ ## Data
120
+
121
+ Data can be an array of arrays
122
+
123
+ ```ruby
124
+ [[1, 2, 3], [4, 5, 6]]
125
+ ```
126
+
127
+ Or a Daru data frame
128
+
129
+ ```ruby
130
+ Daru::DataFrame.from_csv("houses.csv")
131
+ ```
132
+
133
+ Or a Numo NArray
134
+
135
+ ```ruby
136
+ Numo::DFloat.new(3, 2).seq
137
+ ```
73
138
 
74
139
  ## Helpful Resources
75
140
 
76
141
  - [Parameters](https://lightgbm.readthedocs.io/en/latest/Parameters.html)
77
142
  - [Parameter Tuning](https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html)
78
143
 
144
+ ## Related Projects
145
+
146
+ - [Xgb](https://github.com/ankane/xgb) - XGBoost for Ruby
147
+ - [Eps](https://github.com/ankane/eps) - Machine Learning for Ruby
148
+
79
149
  ## Credits
80
150
 
81
151
  Thanks to the [xgboost](https://github.com/PairOnAir/xgboost-ruby) gem for serving as an initial reference, and Selva Prabhakaran for the [test datasets](https://github.com/selva86/datasets).
@@ -8,11 +8,15 @@ require "lightgbm/dataset"
8
8
  require "lightgbm/ffi"
9
9
  require "lightgbm/version"
10
10
 
11
+ # scikit-learn API
12
+ require "lightgbm/classifier"
13
+ require "lightgbm/regressor"
14
+
11
15
  module LightGBM
12
16
  class Error < StandardError; end
13
17
 
14
18
  class << self
15
- def train(params, train_set,num_boost_round: 100, valid_sets: [], valid_names: [], early_stopping_rounds: nil, verbose_eval: true)
19
+ def train(params, train_set, num_boost_round: 100, valid_sets: [], valid_names: [], early_stopping_rounds: nil, verbose_eval: true)
16
20
  booster = Booster.new(params: params, train_set: train_set)
17
21
 
18
22
  valid_contain_train = false
@@ -150,6 +154,7 @@ module LightGBM
150
154
  if early_stopping_rounds
151
155
  stop_early = false
152
156
  means.each do |k, score|
157
+ # TODO fix higher better
153
158
  if best_score[k].nil? || score < best_score[k]
154
159
  best_score[k] = score
155
160
  best_iter[k] = iteration
@@ -77,7 +77,7 @@ module LightGBM
77
77
  num_feature = self.num_feature
78
78
  out_result = ::FFI::MemoryPointer.new(:double, num_feature)
79
79
  check_result FFI.LGBM_BoosterFeatureImportance(handle_pointer, iteration, importance_type, out_result)
80
- out_result.read_array_of_double(num_feature)
80
+ out_result.read_array_of_double(num_feature).map(&:to_i)
81
81
  end
82
82
 
83
83
  def model_from_string(model_str)
@@ -0,0 +1,64 @@
1
+ module LightGBM
2
+ class Classifier
3
+ def initialize(num_leaves: 31, learning_rate: 0.1, n_estimators: 100, objective: nil)
4
+ @params = {
5
+ num_leaves: num_leaves,
6
+ learning_rate: learning_rate
7
+ }
8
+ @params[:objective] = objective if objective
9
+ @n_estimators = n_estimators
10
+ end
11
+
12
+ def fit(x, y)
13
+ n_classes = y.uniq.size
14
+
15
+ params = @params.dup
16
+ if n_classes > 2
17
+ params[:objective] ||= "multiclass"
18
+ params[:num_class] = n_classes
19
+ else
20
+ params[:objective] ||= "binary"
21
+ end
22
+
23
+ train_set = Dataset.new(x, label: y)
24
+ @booster = LightGBM.train(params, train_set, num_boost_round: @n_estimators)
25
+ nil
26
+ end
27
+
28
+ def predict(data)
29
+ y_pred = @booster.predict(data)
30
+
31
+ if y_pred.first.is_a?(Array)
32
+ # multiple classes
33
+ y_pred.map do |v|
34
+ v.map.with_index.max_by { |v2, i| v2 }.last
35
+ end
36
+ else
37
+ y_pred.map { |v| v > 0.5 ? 1 : 0 }
38
+ end
39
+ end
40
+
41
+ def predict_proba(data)
42
+ y_pred = @booster.predict(data)
43
+
44
+ if y_pred.first.is_a?(Array)
45
+ # multiple classes
46
+ y_pred
47
+ else
48
+ y_pred.map { |v| [1 - v, v] }
49
+ end
50
+ end
51
+
52
+ def save_model(fname)
53
+ @booster.save_model(fname)
54
+ end
55
+
56
+ def load_model(fname)
57
+ @booster = Booster.new(params: @params, model_file: fname)
58
+ end
59
+
60
+ def feature_importances
61
+ @booster.feature_importance
62
+ end
63
+ end
64
+ end
@@ -20,9 +20,25 @@ module LightGBM
20
20
  used_row_indices.put_array_of_int32(0, used_indices)
21
21
  check_result FFI.LGBM_DatasetGetSubset(reference, used_row_indices, used_indices.count, parameters, @handle)
22
22
  else
23
- c_data = ::FFI::MemoryPointer.new(:float, data.count * data.first.count)
24
- c_data.put_array_of_float(0, data.flatten)
25
- check_result FFI.LGBM_DatasetCreateFromMat(c_data, 0, data.count, data.first.count, 1, parameters, reference, @handle)
23
+ if matrix?(data)
24
+ nrow = data.row_count
25
+ ncol = data.column_count
26
+ flat_data = data.to_a.flatten
27
+ elsif daru?(data)
28
+ nrow, ncol = data.shape
29
+ flat_data = data.each_vector.map(&:to_a).flatten
30
+ elsif narray?(data)
31
+ nrow, ncol = data.shape
32
+ flat_data = data.flatten.to_a
33
+ else
34
+ nrow = data.count
35
+ ncol = data.first.count
36
+ flat_data = data.flatten
37
+ end
38
+
39
+ c_data = ::FFI::MemoryPointer.new(:float, nrow * ncol)
40
+ c_data.put_array_of_float(0, flat_data)
41
+ check_result FFI.LGBM_DatasetCreateFromMat(c_data, 0, nrow, ncol, 1, parameters, reference, @handle)
26
42
  end
27
43
  # causes "Stack consistency error"
28
44
  # ObjectSpace.define_finalizer(self, self.class.finalize(handle_pointer))
@@ -89,11 +105,24 @@ module LightGBM
89
105
  end
90
106
 
91
107
  def set_field(field_name, data)
108
+ data = data.to_a unless data.is_a?(Array)
92
109
  c_data = ::FFI::MemoryPointer.new(:float, data.count)
93
110
  c_data.put_array_of_float(0, data)
94
111
  check_result FFI.LGBM_DatasetSetField(handle_pointer, field_name, c_data, data.count, 0)
95
112
  end
96
113
 
114
+ def matrix?(data)
115
+ defined?(Matrix) && data.is_a?(Matrix)
116
+ end
117
+
118
+ def daru?(data)
119
+ defined?(Daru::DataFrame) && data.is_a?(Daru::DataFrame)
120
+ end
121
+
122
+ def narray?(data)
123
+ defined?(Numo::NArray) && data.is_a?(Numo::NArray)
124
+ end
125
+
97
126
  include Utils
98
127
  end
99
128
  end
@@ -0,0 +1,34 @@
1
+ module LightGBM
2
+ class Regressor
3
+ def initialize(num_leaves: 31, learning_rate: 0.1, n_estimators: 100, objective: nil)
4
+ @params = {
5
+ num_leaves: num_leaves,
6
+ learning_rate: learning_rate
7
+ }
8
+ @params[:objective] = objective if objective
9
+ @n_estimators = n_estimators
10
+ end
11
+
12
+ def fit(x, y)
13
+ train_set = Dataset.new(x, label: y)
14
+ @booster = LightGBM.train(@params, train_set, num_boost_round: @n_estimators)
15
+ nil
16
+ end
17
+
18
+ def predict(data)
19
+ @booster.predict(data)
20
+ end
21
+
22
+ def save_model(fname)
23
+ @booster.save_model(fname)
24
+ end
25
+
26
+ def load_model(fname)
27
+ @booster = Booster.new(params: @params, model_file: fname)
28
+ end
29
+
30
+ def feature_importances
31
+ @booster.feature_importance
32
+ end
33
+ end
34
+ end
@@ -1,3 +1,3 @@
1
1
  module LightGBM
2
- VERSION = "0.1.2"
2
+ VERSION = "0.1.3"
3
3
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: lightgbm
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.2
4
+ version: 0.1.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrew Kane
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2019-08-15 00:00:00.000000000 Z
11
+ date: 2019-08-16 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: ffi
@@ -66,6 +66,34 @@ dependencies:
66
66
  - - ">="
67
67
  - !ruby/object:Gem::Version
68
68
  version: '5'
69
+ - !ruby/object:Gem::Dependency
70
+ name: daru
71
+ requirement: !ruby/object:Gem::Requirement
72
+ requirements:
73
+ - - ">="
74
+ - !ruby/object:Gem::Version
75
+ version: '0'
76
+ type: :development
77
+ prerelease: false
78
+ version_requirements: !ruby/object:Gem::Requirement
79
+ requirements:
80
+ - - ">="
81
+ - !ruby/object:Gem::Version
82
+ version: '0'
83
+ - !ruby/object:Gem::Dependency
84
+ name: numo-narray
85
+ requirement: !ruby/object:Gem::Requirement
86
+ requirements:
87
+ - - ">="
88
+ - !ruby/object:Gem::Version
89
+ version: '0'
90
+ type: :development
91
+ prerelease: false
92
+ version_requirements: !ruby/object:Gem::Requirement
93
+ requirements:
94
+ - - ">="
95
+ - !ruby/object:Gem::Version
96
+ version: '0'
69
97
  description:
70
98
  email: andrew@chartkick.com
71
99
  executables: []
@@ -76,8 +104,10 @@ files:
76
104
  - README.md
77
105
  - lib/lightgbm.rb
78
106
  - lib/lightgbm/booster.rb
107
+ - lib/lightgbm/classifier.rb
79
108
  - lib/lightgbm/dataset.rb
80
109
  - lib/lightgbm/ffi.rb
110
+ - lib/lightgbm/regressor.rb
81
111
  - lib/lightgbm/utils.rb
82
112
  - lib/lightgbm/version.rb
83
113
  homepage: https://github.com/ankane/lightgbm
@@ -102,5 +132,5 @@ requirements: []
102
132
  rubygems_version: 3.0.4
103
133
  signing_key:
104
134
  specification_version: 4
105
- summary: LightGBM for Ruby
135
+ summary: LightGBM - the high performance machine learning library - for Ruby
106
136
  test_files: []