lightgbm 0.1.4 → 0.1.9

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: 81f1f695112234bb576afaab35f4bf276d1f9c4a4adf0c74831cd1bb73f6baa0
4
- data.tar.gz: 59ef1f3c581f83e108ce2a6f2c847bb7488fc2ea7f39ba217aaeffbf46e99351
3
+ metadata.gz: a4aac9eac1ab0dadbf31d1e1fc2714e75a8c37075538aef5c53b42df1c34f658
4
+ data.tar.gz: ca3a1043c55184992b3fac611963062d01747449f6170719ef1f299d4f0474c9
5
5
  SHA512:
6
- metadata.gz: 21297d26e88957dd60d0aa61da19aa53aa632958adaf069b0efe2c8dae35e2e21e74c374da3509e337ca3268613b14dc541aee5012df086af7e8f784adb5063d
7
- data.tar.gz: 7dbdc0fccaf256a1a835aea3eaa51fe326a0cd4b8cde168a6b8c27ff00c6412a5b5d82583fb1c14749f001f002a4fa3c3e156b9c4b2174b1e91f9449a4fa9ba1
6
+ metadata.gz: c2f14ccc3b40690060d2ee533cfe46e137f40e91520aab3eb188a8a03a697a6956853b717e743cd357d0a059836fc32e9e0aa0fbe5a7dd4263b1ff3e94e79601
7
+ data.tar.gz: 64abdb43f4c45222dcbb39ddde1d21350b2b4958439dc84d8f13684f4e9bcd3aa2c0e49aa8fd9ae1dfc09b8bc0ce69592e95aab0542684f3c952e9e67ac4689f
@@ -1,4 +1,31 @@
1
- ## 0.1.4
1
+ ## 0.1.9 (2020-06-10)
2
+
3
+ - Added support for Rover
4
+ - Improved performance of Numo datasets
5
+
6
+ ## 0.1.8 (2020-05-09)
7
+
8
+ - Improved error message when OpenMP not found on Mac
9
+ - Fixed `Cannot add validation data` error
10
+
11
+ ## 0.1.7 (2019-12-05)
12
+
13
+ - Updated LightGBM to 2.3.1
14
+ - Switched to doubles for datasets and predictions
15
+
16
+ ## 0.1.6 (2019-09-29)
17
+
18
+ - Updated LightGBM to 2.3.0
19
+ - Fixed error with JRuby
20
+
21
+ ## 0.1.5 (2019-09-03)
22
+
23
+ - Packaged LightGBM with gem
24
+ - Added support for missing values
25
+ - Added `feature_names` to datasets
26
+ - Fixed Daru training and prediction
27
+
28
+ ## 0.1.4 (2019-08-19)
2
29
 
3
30
  - Friendlier message when LightGBM not found
4
31
  - Added `Ranker`
@@ -6,22 +33,22 @@
6
33
  - Free memory when objects are destroyed
7
34
  - Removed unreleased `dump_text` method
8
35
 
9
- ## 0.1.3
36
+ ## 0.1.3 (2019-08-16)
10
37
 
11
38
  - Added Scikit-Learn API
12
39
  - Added support for Daru and Numo::NArray
13
40
 
14
- ## 0.1.2
41
+ ## 0.1.2 (2019-08-15)
15
42
 
16
43
  - Added `cv` method
17
44
  - Added early stopping
18
45
  - Fixed multiclass classification
19
46
 
20
- ## 0.1.1
47
+ ## 0.1.1 (2019-08-14)
21
48
 
22
49
  - Added training API
23
50
  - Added many methods
24
51
 
25
- ## 0.1.0
52
+ ## 0.1.0 (2019-08-13)
26
53
 
27
54
  - First release
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2019 Andrew Kane
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md CHANGED
@@ -1,45 +1,44 @@
1
1
  # LightGBM
2
2
 
3
- [LightGBM](https://github.com/microsoft/LightGBM) - the high performance machine learning library - for Ruby
4
-
5
- :fire: Uses the C API for blazing performance
3
+ [LightGBM](https://github.com/microsoft/LightGBM) - high performance gradient boosting - for Ruby
6
4
 
7
5
  [![Build Status](https://travis-ci.org/ankane/lightgbm.svg?branch=master)](https://travis-ci.org/ankane/lightgbm)
8
6
 
9
7
  ## Installation
10
8
 
11
- First, [install LightGBM](https://lightgbm.readthedocs.io/en/latest/Installation-Guide.html). On Mac, copy `lib_lightgbm.so` to `/usr/local/lib`.
12
-
13
9
  Add this line to your application’s Gemfile:
14
10
 
15
11
  ```ruby
16
12
  gem 'lightgbm'
17
13
  ```
18
14
 
19
- ## Getting Started
15
+ On Mac, also install OpenMP:
20
16
 
21
- This library follows the [Python API](https://lightgbm.readthedocs.io/en/latest/Python-API.html). A few differences are:
17
+ ```sh
18
+ brew install libomp
19
+ ```
22
20
 
23
- - The `get_` prefix is removed from methods
24
- - The default verbosity is `-1`
25
- - With the `cv` method, `stratified` is set to `false`
21
+ ## Training API
26
22
 
27
- Some methods and options are also missing at the moment. PRs welcome!
23
+ Prep your data
28
24
 
29
- ## Training API
25
+ ```ruby
26
+ x = [[1, 2], [3, 4], [5, 6], [7, 8]]
27
+ y = [1, 2, 3, 4]
28
+ ```
30
29
 
31
30
  Train a model
32
31
 
33
32
  ```ruby
34
33
  params = {objective: "regression"}
35
- train_set = LightGBM::Dataset.new(x_train, label: y_train)
34
+ train_set = LightGBM::Dataset.new(x, label: y)
36
35
  booster = LightGBM.train(params, train_set)
37
36
  ```
38
37
 
39
38
  Predict
40
39
 
41
40
  ```ruby
42
- booster.predict(x_test)
41
+ booster.predict(x)
43
42
  ```
44
43
 
45
44
  Save the model to a file
@@ -130,16 +129,22 @@ Data can be an array of arrays
130
129
  [[1, 2, 3], [4, 5, 6]]
131
130
  ```
132
131
 
133
- Or a Daru data frame
132
+ Or a Numo NArray
134
133
 
135
134
  ```ruby
136
- Daru::DataFrame.from_csv("houses.csv")
135
+ Numo::NArray.cast([[1, 2, 3], [4, 5, 6]])
137
136
  ```
138
137
 
139
- Or a Numo NArray
138
+ Or a Rover data frame
139
+
140
+ ```ruby
141
+ Rover.read_csv("houses.csv")
142
+ ```
143
+
144
+ Or a Daru data frame
140
145
 
141
146
  ```ruby
142
- Numo::DFloat.new(3, 2).seq
147
+ Daru::DataFrame.from_csv("houses.csv")
143
148
  ```
144
149
 
145
150
  ## Helpful Resources
@@ -149,12 +154,18 @@ Numo::DFloat.new(3, 2).seq
149
154
 
150
155
  ## Related Projects
151
156
 
152
- - [Xgb](https://github.com/ankane/xgb) - XGBoost for Ruby
153
- - [Eps](https://github.com/ankane/eps) - Machine Learning for Ruby
157
+ - [XGBoost](https://github.com/ankane/xgboost) - XGBoost for Ruby
158
+ - [Eps](https://github.com/ankane/eps) - Machine learning for Ruby
154
159
 
155
160
  ## Credits
156
161
 
157
- Thanks to the [xgboost](https://github.com/PairOnAir/xgboost-ruby) gem for serving as an initial reference, and Selva Prabhakaran for the [test datasets](https://github.com/selva86/datasets).
162
+ This library follows the [Python API](https://lightgbm.readthedocs.io/en/latest/Python-API.html). A few differences are:
163
+
164
+ - The `get_` and `set_` prefixes are removed from methods
165
+ - The default verbosity is `-1`
166
+ - With the `cv` method, `stratified` is set to `false`
167
+
168
+ Thanks to the [xgboost](https://github.com/PairOnAir/xgboost-ruby) gem for showing how to use FFI.
158
169
 
159
170
  ## History
160
171
 
@@ -168,3 +179,13 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
168
179
  - Fix bugs and [submit pull requests](https://github.com/ankane/lightgbm/pulls)
169
180
  - Write, clarify, or fix documentation
170
181
  - Suggest or add new features
182
+
183
+ To get started with development:
184
+
185
+ ```sh
186
+ git clone https://github.com/ankane/lightgbm.git
187
+ cd lightgbm
188
+ bundle install
189
+ bundle exec rake vendor:all
190
+ bundle exec rake test
191
+ ```
@@ -20,7 +20,8 @@ module LightGBM
20
20
  attr_accessor :ffi_lib
21
21
  end
22
22
  lib_name = "lib_lightgbm.#{::FFI::Platform::LIBSUFFIX}"
23
- self.ffi_lib = [lib_name, "lib_lightgbm.so"]
23
+ vendor_lib = File.expand_path("../vendor/#{lib_name}", __dir__)
24
+ self.ffi_lib = [lib_name, "lib_lightgbm.so", vendor_lib]
24
25
 
25
26
  # friendlier error message
26
27
  autoload :FFI, "lightgbm/ffi"
@@ -35,10 +36,14 @@ module LightGBM
35
36
  booster.train_data_name = name || "training"
36
37
  valid_contain_train = true
37
38
  else
39
+ # ensure the validation set references the training set
40
+ data.reference = train_set
38
41
  booster.add_valid(data, name || "valid_#{i}")
39
42
  end
40
43
  end
41
44
 
45
+ raise ArgumentError, "For early stopping, at least one validation set is required" if early_stopping_rounds && !valid_sets.any? { |v| v != train_set }
46
+
42
47
  booster.best_iteration = 0
43
48
 
44
49
  if early_stopping_rounds
@@ -130,6 +135,7 @@ module LightGBM
130
135
  if early_stopping_rounds
131
136
  best_score = {}
132
137
  best_iter = {}
138
+ best_iteration = nil
133
139
  end
134
140
 
135
141
  num_boost_round.times do |iteration|
@@ -169,6 +175,7 @@ module LightGBM
169
175
  best_score[k] = score
170
176
  best_iter[k] = iteration
171
177
  elsif iteration - best_iter[k] >= early_stopping_rounds
178
+ best_iteration = best_iter[k]
172
179
  stop_early = true
173
180
  break
174
181
  end
@@ -177,6 +184,15 @@ module LightGBM
177
184
  end
178
185
  end
179
186
 
187
+ if early_stopping_rounds
188
+ # use best iteration from first metric if not stopped early
189
+ best_iteration ||= best_iter[best_iter.keys.first]
190
+ eval_hist.each_key do |k|
191
+ # TODO uncomment for 0.2.0
192
+ # eval_hist[k] = eval_hist[k].first(best_iteration + 1)
193
+ end
194
+ end
195
+
180
196
  eval_hist
181
197
  end
182
198
 
@@ -30,7 +30,7 @@ module LightGBM
30
30
 
31
31
  def current_iteration
32
32
  out = ::FFI::MemoryPointer.new(:int)
33
- check_result FFI::LGBM_BoosterGetCurrentIteration(handle_pointer, out)
33
+ check_result FFI.LGBM_BoosterGetCurrentIteration(handle_pointer, out)
34
34
  out.read_int
35
35
  end
36
36
 
@@ -38,11 +38,11 @@ module LightGBM
38
38
  num_iteration ||= best_iteration
39
39
  buffer_len = 1 << 20
40
40
  out_len = ::FFI::MemoryPointer.new(:int64)
41
- out_str = ::FFI::MemoryPointer.new(:string, buffer_len)
41
+ out_str = ::FFI::MemoryPointer.new(:char, buffer_len)
42
42
  check_result FFI.LGBM_BoosterDumpModel(handle_pointer, start_iteration, num_iteration, buffer_len, out_len, out_str)
43
- actual_len = out_len.read_int64
43
+ actual_len = read_int64(out_len)
44
44
  if actual_len > buffer_len
45
- out_str = ::FFI::MemoryPointer.new(:string, actual_len)
45
+ out_str = ::FFI::MemoryPointer.new(:char, actual_len)
46
46
  check_result FFI.LGBM_BoosterDumpModel(handle_pointer, start_iteration, num_iteration, actual_len, out_len, out_str)
47
47
  end
48
48
  out_str.read_string
@@ -85,11 +85,11 @@ module LightGBM
85
85
  num_iteration ||= best_iteration
86
86
  buffer_len = 1 << 20
87
87
  out_len = ::FFI::MemoryPointer.new(:int64)
88
- out_str = ::FFI::MemoryPointer.new(:string, buffer_len)
88
+ out_str = ::FFI::MemoryPointer.new(:char, buffer_len)
89
89
  check_result FFI.LGBM_BoosterSaveModelToString(handle_pointer, start_iteration, num_iteration, buffer_len, out_len, out_str)
90
- actual_len = out_len.read_int64
90
+ actual_len = read_int64(out_len)
91
91
  if actual_len > buffer_len
92
- out_str = ::FFI::MemoryPointer.new(:string, actual_len)
92
+ out_str = ::FFI::MemoryPointer.new(:char, actual_len)
93
93
  check_result FFI.LGBM_BoosterSaveModelToString(handle_pointer, start_iteration, num_iteration, actual_len, out_len, out_str)
94
94
  end
95
95
  out_str.read_string
@@ -104,19 +104,24 @@ module LightGBM
104
104
 
105
105
  def num_model_per_iteration
106
106
  out = ::FFI::MemoryPointer.new(:int)
107
- check_result FFI::LGBM_BoosterNumModelPerIteration(handle_pointer, out)
107
+ check_result FFI.LGBM_BoosterNumModelPerIteration(handle_pointer, out)
108
108
  out.read_int
109
109
  end
110
110
 
111
111
  def num_trees
112
112
  out = ::FFI::MemoryPointer.new(:int)
113
- check_result FFI::LGBM_BoosterNumberOfTotalModel(handle_pointer, out)
113
+ check_result FFI.LGBM_BoosterNumberOfTotalModel(handle_pointer, out)
114
114
  out.read_int
115
115
  end
116
116
 
117
117
  # TODO support different prediction types
118
118
  def predict(input, num_iteration: nil, **params)
119
- raise TypeError unless input.is_a?(Array)
119
+ input =
120
+ if daru?(input)
121
+ input.map_rows(&:to_a)
122
+ else
123
+ input.to_a
124
+ end
120
125
 
121
126
  singular = !input.first.is_a?(Array)
122
127
  input = [input] if singular
@@ -124,13 +129,15 @@ module LightGBM
124
129
  num_iteration ||= best_iteration
125
130
  num_class ||= num_class()
126
131
 
127
- data = ::FFI::MemoryPointer.new(:float, input.count * input.first.count)
128
- data.put_array_of_float(0, input.flatten)
132
+ flat_input = input.flatten
133
+ handle_missing(flat_input)
134
+ data = ::FFI::MemoryPointer.new(:double, input.count * input.first.count)
135
+ data.write_array_of_double(flat_input)
129
136
 
130
137
  out_len = ::FFI::MemoryPointer.new(:int64)
131
138
  out_result = ::FFI::MemoryPointer.new(:double, num_class * input.count)
132
- check_result FFI.LGBM_BoosterPredictForMat(handle_pointer, data, 0, input.count, input.first.count, 1, 0, num_iteration, params_str(params), out_len, out_result)
133
- out = out_result.read_array_of_double(out_len.read_int64)
139
+ check_result FFI.LGBM_BoosterPredictForMat(handle_pointer, data, 1, input.count, input.first.count, 1, 0, num_iteration, params_str(params), out_len, out_result)
140
+ out = out_result.read_array_of_double(read_int64(out_len))
134
141
  out = out.each_slice(num_class).to_a if num_class > 1
135
142
 
136
143
  singular ? out.first : out
@@ -161,7 +168,7 @@ module LightGBM
161
168
 
162
169
  def eval_counts
163
170
  out = ::FFI::MemoryPointer.new(:int)
164
- check_result FFI::LGBM_BoosterGetEvalCounts(handle_pointer, out)
171
+ check_result FFI.LGBM_BoosterGetEvalCounts(handle_pointer, out)
165
172
  out.read_int
166
173
  end
167
174
 
@@ -169,8 +176,8 @@ module LightGBM
169
176
  eval_counts ||= eval_counts()
170
177
  out_len = ::FFI::MemoryPointer.new(:int)
171
178
  out_strs = ::FFI::MemoryPointer.new(:pointer, eval_counts)
172
- str_ptrs = eval_counts.times.map { ::FFI::MemoryPointer.new(:string, 255) }
173
- out_strs.put_array_of_pointer(0, str_ptrs)
179
+ str_ptrs = eval_counts.times.map { ::FFI::MemoryPointer.new(:char, 255) }
180
+ out_strs.write_array_of_pointer(str_ptrs)
174
181
  check_result FFI.LGBM_BoosterGetEvalNames(handle_pointer, out_len, out_strs)
175
182
  str_ptrs.map(&:read_string)
176
183
  end
@@ -191,10 +198,15 @@ module LightGBM
191
198
 
192
199
  def num_class
193
200
  out = ::FFI::MemoryPointer.new(:int)
194
- check_result FFI::LGBM_BoosterGetNumClasses(handle_pointer, out)
201
+ check_result FFI.LGBM_BoosterGetNumClasses(handle_pointer, out)
195
202
  out.read_int
196
203
  end
197
204
 
205
+ # read_int64 not available on JRuby
206
+ def read_int64(ptr)
207
+ ptr.read_array_of_int64(1).first
208
+ end
209
+
198
210
  include Utils
199
211
  end
200
212
  end
@@ -15,8 +15,8 @@ module LightGBM
15
15
  params[:objective] ||= "binary"
16
16
  end
17
17
 
18
- train_set = Dataset.new(x, label: y, categorical_feature: categorical_feature)
19
- valid_sets = Array(eval_set).map { |v| Dataset.new(v[0], label: v[1], reference: train_set) }
18
+ train_set = Dataset.new(x, label: y, categorical_feature: categorical_feature, params: params)
19
+ valid_sets = Array(eval_set).map { |v| Dataset.new(v[0], label: v[1], reference: train_set, params: params) }
20
20
 
21
21
  @booster = LightGBM.train(params, train_set,
22
22
  num_boost_round: @n_estimators,
@@ -2,49 +2,18 @@ module LightGBM
2
2
  class Dataset
3
3
  attr_reader :data, :params
4
4
 
5
- def initialize(data, label: nil, weight: nil, group: nil, params: nil, reference: nil, used_indices: nil, categorical_feature: "auto")
5
+ def initialize(data, label: nil, weight: nil, group: nil, params: nil, reference: nil, used_indices: nil, categorical_feature: "auto", feature_names: nil)
6
6
  @data = data
7
+ @label = label
8
+ @weight = weight
9
+ @group = group
10
+ @params = params
11
+ @reference = reference
12
+ @used_indices = used_indices
13
+ @categorical_feature = categorical_feature
14
+ @feature_names = feature_names
7
15
 
8
- # TODO stringify params
9
- params ||= {}
10
- params["categorical_feature"] ||= categorical_feature.join(",") if categorical_feature != "auto"
11
- set_verbosity(params)
12
-
13
- @handle = ::FFI::MemoryPointer.new(:pointer)
14
- parameters = params_str(params)
15
- reference = reference.handle_pointer if reference
16
- if used_indices
17
- used_row_indices = ::FFI::MemoryPointer.new(:int32, used_indices.count)
18
- used_row_indices.put_array_of_int32(0, used_indices)
19
- check_result FFI.LGBM_DatasetGetSubset(reference, used_row_indices, used_indices.count, parameters, @handle)
20
- elsif data.is_a?(String)
21
- check_result FFI.LGBM_DatasetCreateFromFile(data, parameters, reference, @handle)
22
- else
23
- if matrix?(data)
24
- nrow = data.row_count
25
- ncol = data.column_count
26
- flat_data = data.to_a.flatten
27
- elsif daru?(data)
28
- nrow, ncol = data.shape
29
- flat_data = data.each_vector.map(&:to_a).flatten
30
- elsif narray?(data)
31
- nrow, ncol = data.shape
32
- flat_data = data.flatten.to_a
33
- else
34
- nrow = data.count
35
- ncol = data.first.count
36
- flat_data = data.flatten
37
- end
38
-
39
- c_data = ::FFI::MemoryPointer.new(:float, nrow * ncol)
40
- c_data.put_array_of_float(0, flat_data)
41
- check_result FFI.LGBM_DatasetCreateFromMat(c_data, 0, nrow, ncol, 1, parameters, reference, @handle)
42
- end
43
- ObjectSpace.define_finalizer(self, self.class.finalize(handle_pointer)) unless used_indices
44
-
45
- self.label = label if label
46
- self.weight = weight if weight
47
- self.group = group if group
16
+ construct
48
17
  end
49
18
 
50
19
  def label
@@ -55,18 +24,47 @@ module LightGBM
55
24
  field("weight")
56
25
  end
57
26
 
27
+ def feature_names
28
+ # must preallocate space
29
+ num_feature_names = ::FFI::MemoryPointer.new(:int)
30
+ out_strs = ::FFI::MemoryPointer.new(:pointer, 1000)
31
+ str_ptrs = 1000.times.map { ::FFI::MemoryPointer.new(:char, 255) }
32
+ out_strs.write_array_of_pointer(str_ptrs)
33
+ check_result FFI.LGBM_DatasetGetFeatureNames(handle_pointer, out_strs, num_feature_names)
34
+ str_ptrs[0, num_feature_names.read_int].map(&:read_string)
35
+ end
36
+
58
37
  def label=(label)
38
+ @label = label
59
39
  set_field("label", label)
60
40
  end
61
41
 
62
42
  def weight=(weight)
43
+ @weight = weight
63
44
  set_field("weight", weight)
64
45
  end
65
46
 
66
47
  def group=(group)
48
+ @group = group
67
49
  set_field("group", group, type: :int32)
68
50
  end
69
51
 
52
+ def feature_names=(feature_names)
53
+ @feature_names = feature_names
54
+ c_feature_names = ::FFI::MemoryPointer.new(:pointer, feature_names.size)
55
+ c_feature_names.write_array_of_pointer(feature_names.map { |v| ::FFI::MemoryPointer.from_string(v) })
56
+ check_result FFI.LGBM_DatasetSetFeatureNames(handle_pointer, c_feature_names, feature_names.size)
57
+ end
58
+
59
+ # TODO only update reference if not in chain
60
+ def reference=(reference)
61
+ if reference != @reference
62
+ @reference = reference
63
+ free_handle
64
+ construct
65
+ end
66
+ end
67
+
70
68
  def num_data
71
69
  out = ::FFI::MemoryPointer.new(:int)
72
70
  check_result FFI.LGBM_DatasetGetNumData(handle_pointer, out)
@@ -83,11 +81,6 @@ module LightGBM
83
81
  check_result FFI.LGBM_DatasetSaveBinary(handle_pointer, filename)
84
82
  end
85
83
 
86
- # not released yet
87
- # def dump_text(filename)
88
- # check_result FFI.LGBM_DatasetDumpText(handle_pointer, filename)
89
- # end
90
-
91
84
  def subset(used_indices, params: nil)
92
85
  # categorical_feature passed via params
93
86
  params ||= self.params
@@ -109,6 +102,70 @@ module LightGBM
109
102
 
110
103
  private
111
104
 
105
+ def construct
106
+ data = @data
107
+ used_indices = @used_indices
108
+
109
+ # TODO stringify params
110
+ params = @params || {}
111
+ if @categorical_feature != "auto" && @categorical_feature.any?
112
+ params["categorical_feature"] ||= @categorical_feature.join(",")
113
+ end
114
+ set_verbosity(params)
115
+
116
+ @handle = ::FFI::MemoryPointer.new(:pointer)
117
+ parameters = params_str(params)
118
+ reference = @reference.handle_pointer if @reference
119
+ if used_indices
120
+ used_row_indices = ::FFI::MemoryPointer.new(:int32, used_indices.count)
121
+ used_row_indices.write_array_of_int32(used_indices)
122
+ check_result FFI.LGBM_DatasetGetSubset(reference, used_row_indices, used_indices.count, parameters, @handle)
123
+ elsif data.is_a?(String)
124
+ check_result FFI.LGBM_DatasetCreateFromFile(data, parameters, reference, @handle)
125
+ else
126
+ if matrix?(data)
127
+ nrow = data.row_count
128
+ ncol = data.column_count
129
+ flat_data = data.to_a.flatten
130
+ elsif daru?(data)
131
+ nrow, ncol = data.shape
132
+ flat_data = data.map_rows(&:to_a).flatten
133
+ elsif numo?(data) || rover?(data)
134
+ data = data.to_numo if rover?(data)
135
+ nrow, ncol = data.shape
136
+ else
137
+ nrow = data.count
138
+ ncol = data.first.count
139
+ flat_data = data.flatten
140
+ end
141
+
142
+ c_data = ::FFI::MemoryPointer.new(:double, nrow * ncol)
143
+ if numo?(data)
144
+ c_data.write_bytes(data.cast_to(Numo::DFloat).to_string)
145
+ else
146
+ handle_missing(flat_data)
147
+ c_data.write_array_of_double(flat_data)
148
+ end
149
+
150
+ check_result FFI.LGBM_DatasetCreateFromMat(c_data, 1, nrow, ncol, 1, parameters, reference, @handle)
151
+ end
152
+ ObjectSpace.define_finalizer(self, self.class.finalize(handle_pointer)) unless used_indices
153
+
154
+ self.label = @label if @label
155
+ self.weight = @weight if @weight
156
+ self.group = @group if @group
157
+ self.feature_names = @feature_names if @feature_names
158
+ end
159
+
160
+ def free_handle
161
+ FFI.LGBM_DatasetFree(handle_pointer)
162
+ ObjectSpace.undefine_finalizer(self)
163
+ end
164
+
165
+ def dump_text(filename)
166
+ check_result FFI.LGBM_DatasetDumpText(handle_pointer, filename)
167
+ end
168
+
112
169
  def field(field_name)
113
170
  num_data = self.num_data
114
171
  out_len = ::FFI::MemoryPointer.new(:int)
@@ -122,27 +179,15 @@ module LightGBM
122
179
  data = data.to_a unless data.is_a?(Array)
123
180
  if type == :int32
124
181
  c_data = ::FFI::MemoryPointer.new(:int32, data.count)
125
- c_data.put_array_of_int32(0, data)
182
+ c_data.write_array_of_int32(data)
126
183
  check_result FFI.LGBM_DatasetSetField(handle_pointer, field_name, c_data, data.count, 2)
127
184
  else
128
185
  c_data = ::FFI::MemoryPointer.new(:float, data.count)
129
- c_data.put_array_of_float(0, data)
186
+ c_data.write_array_of_float(data)
130
187
  check_result FFI.LGBM_DatasetSetField(handle_pointer, field_name, c_data, data.count, 0)
131
188
  end
132
189
  end
133
190
 
134
- def matrix?(data)
135
- defined?(Matrix) && data.is_a?(Matrix)
136
- end
137
-
138
- def daru?(data)
139
- defined?(Daru::DataFrame) && data.is_a?(Daru::DataFrame)
140
- end
141
-
142
- def narray?(data)
143
- defined?(Numo::NArray) && data.is_a?(Numo::NArray)
144
- end
145
-
146
191
  include Utils
147
192
  end
148
193
  end
@@ -5,8 +5,11 @@ module LightGBM
5
5
  begin
6
6
  ffi_lib LightGBM.ffi_lib
7
7
  rescue LoadError => e
8
- raise e if ENV["LIGHTGBM_DEBUG"]
9
- raise LoadError, "Could not find LightGBM"
8
+ if e.message.include?("Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib") && e.message.include?("Reason: image not found")
9
+ raise LoadError, "OpenMP not found. Run `brew install libomp`"
10
+ else
11
+ raise e
12
+ end
10
13
  end
11
14
 
12
15
  # https://github.com/microsoft/LightGBM/blob/master/include/LightGBM/c_api.h
@@ -19,9 +22,11 @@ module LightGBM
19
22
  attach_function :LGBM_DatasetCreateFromFile, %i[string string pointer pointer], :int
20
23
  attach_function :LGBM_DatasetCreateFromMat, %i[pointer int int32 int32 int string pointer pointer], :int
21
24
  attach_function :LGBM_DatasetGetSubset, %i[pointer pointer int32 string pointer], :int
25
+ attach_function :LGBM_DatasetSetFeatureNames, %i[pointer pointer int], :int
26
+ attach_function :LGBM_DatasetGetFeatureNames, %i[pointer pointer pointer], :int
22
27
  attach_function :LGBM_DatasetFree, %i[pointer], :int
23
28
  attach_function :LGBM_DatasetSaveBinary, %i[pointer string], :int
24
- # attach_function :LGBM_DatasetDumpText, %i[pointer string], :int
29
+ attach_function :LGBM_DatasetDumpText, %i[pointer string], :int
25
30
  attach_function :LGBM_DatasetSetField, %i[pointer string pointer int int], :int
26
31
  attach_function :LGBM_DatasetGetField, %i[pointer string pointer pointer pointer], :int
27
32
  attach_function :LGBM_DatasetGetNumData, %i[pointer pointer], :int
@@ -5,8 +5,8 @@ module LightGBM
5
5
  end
6
6
 
7
7
  def fit(x, y, categorical_feature: "auto", eval_set: nil, eval_names: [], early_stopping_rounds: nil, verbose: true)
8
- train_set = Dataset.new(x, label: y, categorical_feature: categorical_feature)
9
- valid_sets = Array(eval_set).map { |v| Dataset.new(v[0], label: v[1], reference: train_set) }
8
+ train_set = Dataset.new(x, label: y, categorical_feature: categorical_feature, params: @params)
9
+ valid_sets = Array(eval_set).map { |v| Dataset.new(v[0], label: v[1], reference: train_set, params: @params) }
10
10
 
11
11
  @booster = LightGBM.train(@params, train_set,
12
12
  num_boost_round: @n_estimators,
@@ -23,5 +23,26 @@ module LightGBM
23
23
  params["verbosity"] = -1
24
24
  end
25
25
  end
26
+
27
+ # for categorical, NaN and negative value are the same
28
+ def handle_missing(data)
29
+ data.map! { |v| v.nil? ? Float::NAN : v }
30
+ end
31
+
32
+ def matrix?(data)
33
+ defined?(Matrix) && data.is_a?(Matrix)
34
+ end
35
+
36
+ def daru?(data)
37
+ defined?(Daru::DataFrame) && data.is_a?(Daru::DataFrame)
38
+ end
39
+
40
+ def numo?(data)
41
+ defined?(Numo::NArray) && data.is_a?(Numo::NArray)
42
+ end
43
+
44
+ def rover?(data)
45
+ defined?(Rover::DataFrame) && data.is_a?(Rover::DataFrame)
46
+ end
26
47
  end
27
48
  end
@@ -1,3 +1,3 @@
1
1
  module LightGBM
2
- VERSION = "0.1.4"
2
+ VERSION = "0.1.9"
3
3
  end
@@ -0,0 +1,21 @@
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) Microsoft Corporation
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
Binary file
Binary file
Binary file
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: lightgbm
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.4
4
+ version: 0.1.9
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrew Kane
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2019-08-19 00:00:00.000000000 Z
11
+ date: 2020-06-11 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: ffi
@@ -80,20 +80,6 @@ dependencies:
80
80
  - - ">="
81
81
  - !ruby/object:Gem::Version
82
82
  version: '0'
83
- - !ruby/object:Gem::Dependency
84
- name: numo-narray
85
- requirement: !ruby/object:Gem::Requirement
86
- requirements:
87
- - - ">="
88
- - !ruby/object:Gem::Version
89
- version: '0'
90
- type: :development
91
- prerelease: false
92
- version_requirements: !ruby/object:Gem::Requirement
93
- requirements:
94
- - - ">="
95
- - !ruby/object:Gem::Version
96
- version: '0'
97
83
  description:
98
84
  email: andrew@chartkick.com
99
85
  executables: []
@@ -101,6 +87,7 @@ extensions: []
101
87
  extra_rdoc_files: []
102
88
  files:
103
89
  - CHANGELOG.md
90
+ - LICENSE.txt
104
91
  - README.md
105
92
  - lib/lightgbm.rb
106
93
  - lib/lightgbm/booster.rb
@@ -112,6 +99,10 @@ files:
112
99
  - lib/lightgbm/regressor.rb
113
100
  - lib/lightgbm/utils.rb
114
101
  - lib/lightgbm/version.rb
102
+ - vendor/LICENSE
103
+ - vendor/lib_lightgbm.dll
104
+ - vendor/lib_lightgbm.dylib
105
+ - vendor/lib_lightgbm.so
115
106
  homepage: https://github.com/ankane/lightgbm
116
107
  licenses:
117
108
  - MIT
@@ -131,8 +122,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
131
122
  - !ruby/object:Gem::Version
132
123
  version: '0'
133
124
  requirements: []
134
- rubygems_version: 3.0.3
125
+ rubygems_version: 3.1.2
135
126
  signing_key:
136
127
  specification_version: 4
137
- summary: LightGBM - the high performance machine learning library - for Ruby
128
+ summary: High performance gradient boosting for Ruby
138
129
  test_files: []