lightgbm 0.1.6 → 0.2.1
- checksums.yaml +4 -4
- data/CHANGELOG.md +31 -7
- data/LICENSE.txt +18 -18
- data/README.md +30 -20
- data/lib/lightgbm.rb +17 -7
- data/lib/lightgbm/booster.rb +27 -21
- data/lib/lightgbm/dataset.rb +95 -54
- data/lib/lightgbm/ffi.rb +10 -10
- data/lib/lightgbm/utils.rb +5 -1
- data/lib/lightgbm/version.rb +1 -1
- data/vendor/lib_lightgbm.dll +0 -0
- data/vendor/lib_lightgbm.dylib +0 -0
- data/vendor/lib_lightgbm.so +0 -0
- metadata +8 -8
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 6113148a40170a5599cdb8f1798bff80da62f15575baab818e04e951046a8e57
|
4
|
+
data.tar.gz: 53973708703c24ed2fd62a2d78f653e0f3d0236e30e63824988f645011a69de1
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 75e73bc112f072843d2596f8be10fcb964788feb355117edac2f24cbaf3ab12740f08f1d9708c7204f1764f9283956bdede416b0f8cfb0da42c7d90293f97b42
|
7
|
+
data.tar.gz: 0c165211fe5c0ae036dc14b3ade6df6fe0eea1a485833e7a351dce129671b7c868e71840a89962eb90008ff264629fddf4b014de0a98b9be636da9d6f6b31337
|
data/CHANGELOG.md
CHANGED
@@ -1,16 +1,40 @@
|
|
1
|
-
## 0.1
|
1
|
+
## 0.2.1 (2020-11-15)
|
2
|
+
|
3
|
+
- Updated LightGBM to 3.1.0
|
4
|
+
|
5
|
+
## 0.2.0 (2020-08-31)
|
6
|
+
|
7
|
+
- Updated LightGBM to 3.0.0
|
8
|
+
- Made `best_iteration` and `eval_hist` consistent with Python
|
9
|
+
|
10
|
+
## 0.1.9 (2020-06-10)
|
11
|
+
|
12
|
+
- Added support for Rover
|
13
|
+
- Improved performance of Numo datasets
|
14
|
+
|
15
|
+
## 0.1.8 (2020-05-09)
|
16
|
+
|
17
|
+
- Improved error message when OpenMP not found on Mac
|
18
|
+
- Fixed `Cannot add validation data` error
|
19
|
+
|
20
|
+
## 0.1.7 (2019-12-05)
|
21
|
+
|
22
|
+
- Updated LightGBM to 2.3.1
|
23
|
+
- Switched to doubles for datasets and predictions
|
24
|
+
|
25
|
+
## 0.1.6 (2019-09-29)
|
2
26
|
|
3
27
|
- Updated LightGBM to 2.3.0
|
4
28
|
- Fixed error with JRuby
|
5
29
|
|
6
|
-
## 0.1.5
|
30
|
+
## 0.1.5 (2019-09-03)
|
7
31
|
|
8
32
|
- Packaged LightGBM with gem
|
9
33
|
- Added support for missing values
|
10
34
|
- Added `feature_names` to datasets
|
11
35
|
- Fixed Daru training and prediction
|
12
36
|
|
13
|
-
## 0.1.4
|
37
|
+
## 0.1.4 (2019-08-19)
|
14
38
|
|
15
39
|
- Friendlier message when LightGBM not found
|
16
40
|
- Added `Ranker`
|
@@ -18,22 +42,22 @@
|
|
18
42
|
- Free memory when objects are destroyed
|
19
43
|
- Removed unreleased `dump_text` method
|
20
44
|
|
21
|
-
## 0.1.3
|
45
|
+
## 0.1.3 (2019-08-16)
|
22
46
|
|
23
47
|
- Added Scikit-Learn API
|
24
48
|
- Added support for Daru and Numo::NArray
|
25
49
|
|
26
|
-
## 0.1.2
|
50
|
+
## 0.1.2 (2019-08-15)
|
27
51
|
|
28
52
|
- Added `cv` method
|
29
53
|
- Added early stopping
|
30
54
|
- Fixed multiclass classification
|
31
55
|
|
32
|
-
## 0.1.1
|
56
|
+
## 0.1.1 (2019-08-14)
|
33
57
|
|
34
58
|
- Added training API
|
35
59
|
- Added many methods
|
36
60
|
|
37
|
-
## 0.1.0
|
61
|
+
## 0.1.0 (2019-08-13)
|
38
62
|
|
39
63
|
- First release
|
data/LICENSE.txt
CHANGED
@@ -1,22 +1,22 @@
|
|
1
|
-
|
1
|
+
The MIT License (MIT)
|
2
2
|
|
3
|
-
|
3
|
+
Copyright (c) Microsoft Corporation
|
4
|
+
Copyright (c) 2019-2020 Andrew Kane
|
4
5
|
|
5
|
-
Permission is hereby granted, free of charge, to any person obtaining
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
the following conditions:
|
6
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
7
|
+
of this software and associated documentation files (the "Software"), to deal
|
8
|
+
in the Software without restriction, including without limitation the rights
|
9
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
10
|
+
copies of the Software, and to permit persons to whom the Software is
|
11
|
+
furnished to do so, subject to the following conditions:
|
12
12
|
|
13
|
-
The above copyright notice and this permission notice shall be
|
14
|
-
|
13
|
+
The above copyright notice and this permission notice shall be included in all
|
14
|
+
copies or substantial portions of the Software.
|
15
15
|
|
16
|
-
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
OF
|
22
|
-
|
16
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
17
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
18
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
19
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
20
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
21
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
22
|
+
SOFTWARE.
|
data/README.md
CHANGED
@@ -1,8 +1,6 @@
|
|
1
1
|
# LightGBM
|
2
2
|
|
3
|
-
[LightGBM](https://github.com/microsoft/LightGBM) -
|
4
|
-
|
5
|
-
:fire: Uses the C API for blazing performance
|
3
|
+
[LightGBM](https://github.com/microsoft/LightGBM) - high performance gradient boosting - for Ruby
|
6
4
|
|
7
5
|
[![Build Status](https://travis-ci.org/ankane/lightgbm.svg?branch=master)](https://travis-ci.org/ankane/lightgbm)
|
8
6
|
|
@@ -20,16 +18,6 @@ On Mac, also install OpenMP:
|
|
20
18
|
brew install libomp
|
21
19
|
```
|
22
20
|
|
23
|
-
## Getting Started
|
24
|
-
|
25
|
-
This library follows the [Python API](https://lightgbm.readthedocs.io/en/latest/Python-API.html). A few differences are:
|
26
|
-
|
27
|
-
- The `get_` and `set_` prefixes are removed from methods
|
28
|
-
- The default verbosity is `-1`
|
29
|
-
- With the `cv` method, `stratified` is set to `false`
|
30
|
-
|
31
|
-
Some methods and options are also missing at the moment. PRs welcome!
|
32
|
-
|
33
21
|
## Training API
|
34
22
|
|
35
23
|
Prep your data
|
@@ -141,16 +129,22 @@ Data can be an array of arrays
|
|
141
129
|
[[1, 2, 3], [4, 5, 6]]
|
142
130
|
```
|
143
131
|
|
144
|
-
Or a
|
132
|
+
Or a Numo array
|
145
133
|
|
146
134
|
```ruby
|
147
|
-
|
135
|
+
Numo::NArray.cast([[1, 2, 3], [4, 5, 6]])
|
148
136
|
```
|
149
137
|
|
150
|
-
Or a
|
138
|
+
Or a Rover data frame
|
151
139
|
|
152
140
|
```ruby
|
153
|
-
|
141
|
+
Rover.read_csv("houses.csv")
|
142
|
+
```
|
143
|
+
|
144
|
+
Or a Daru data frame
|
145
|
+
|
146
|
+
```ruby
|
147
|
+
Daru::DataFrame.from_csv("houses.csv")
|
154
148
|
```
|
155
149
|
|
156
150
|
## Helpful Resources
|
@@ -160,12 +154,18 @@ Numo::DFloat.new(3, 2).seq
|
|
160
154
|
|
161
155
|
## Related Projects
|
162
156
|
|
163
|
-
- [
|
164
|
-
- [Eps](https://github.com/ankane/eps) - Machine
|
157
|
+
- [XGBoost](https://github.com/ankane/xgboost) - XGBoost for Ruby
|
158
|
+
- [Eps](https://github.com/ankane/eps) - Machine learning for Ruby
|
165
159
|
|
166
160
|
## Credits
|
167
161
|
|
168
|
-
|
162
|
+
This library follows the [Python API](https://lightgbm.readthedocs.io/en/latest/Python-API.html). A few differences are:
|
163
|
+
|
164
|
+
- The `get_` and `set_` prefixes are removed from methods
|
165
|
+
- The default verbosity is `-1`
|
166
|
+
- With the `cv` method, `stratified` is set to `false`
|
167
|
+
|
168
|
+
Thanks to the [xgboost](https://github.com/PairOnAir/xgboost-ruby) gem for showing how to use FFI.
|
169
169
|
|
170
170
|
## History
|
171
171
|
|
@@ -179,3 +179,13 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
|
|
179
179
|
- Fix bugs and [submit pull requests](https://github.com/ankane/lightgbm/pulls)
|
180
180
|
- Write, clarify, or fix documentation
|
181
181
|
- Suggest or add new features
|
182
|
+
|
183
|
+
To get started with development:
|
184
|
+
|
185
|
+
```sh
|
186
|
+
git clone https://github.com/ankane/lightgbm.git
|
187
|
+
cd lightgbm
|
188
|
+
bundle install
|
189
|
+
bundle exec rake vendor:all
|
190
|
+
bundle exec rake test
|
191
|
+
```
|
data/lib/lightgbm.rb
CHANGED
@@ -36,6 +36,8 @@ module LightGBM
|
|
36
36
|
booster.train_data_name = name || "training"
|
37
37
|
valid_contain_train = true
|
38
38
|
else
|
39
|
+
# ensure the validation set references the training set
|
40
|
+
data.reference = train_set
|
39
41
|
booster.add_valid(data, name || "valid_#{i}")
|
40
42
|
end
|
41
43
|
end
|
@@ -59,16 +61,14 @@ module LightGBM
|
|
59
61
|
# print results
|
60
62
|
messages = []
|
61
63
|
|
64
|
+
eval_valid = booster.eval_valid
|
62
65
|
if valid_contain_train
|
63
|
-
|
64
|
-
booster.eval_train.reverse.each do |res|
|
65
|
-
messages << "%s's %s: %g" % [res[0], res[1], res[2]]
|
66
|
-
end
|
66
|
+
eval_valid = eval_valid + booster.eval_train
|
67
67
|
end
|
68
|
-
|
69
|
-
eval_valid = booster.eval_valid
|
70
68
|
# not sure why reversed in output
|
71
|
-
eval_valid.reverse
|
69
|
+
eval_valid.reverse!
|
70
|
+
|
71
|
+
eval_valid.each do |res|
|
72
72
|
messages << "%s's %s: %g" % [res[0], res[1], res[2]]
|
73
73
|
end
|
74
74
|
|
@@ -133,6 +133,7 @@ module LightGBM
|
|
133
133
|
if early_stopping_rounds
|
134
134
|
best_score = {}
|
135
135
|
best_iter = {}
|
136
|
+
best_iteration = nil
|
136
137
|
end
|
137
138
|
|
138
139
|
num_boost_round.times do |iteration|
|
@@ -172,6 +173,7 @@ module LightGBM
|
|
172
173
|
best_score[k] = score
|
173
174
|
best_iter[k] = iteration
|
174
175
|
elsif iteration - best_iter[k] >= early_stopping_rounds
|
176
|
+
best_iteration = best_iter[k]
|
175
177
|
stop_early = true
|
176
178
|
break
|
177
179
|
end
|
@@ -180,6 +182,14 @@ module LightGBM
|
|
180
182
|
end
|
181
183
|
end
|
182
184
|
|
185
|
+
if early_stopping_rounds
|
186
|
+
# use best iteration from first metric if not stopped early
|
187
|
+
best_iteration ||= best_iter[best_iter.keys.first]
|
188
|
+
eval_hist.each_key do |k|
|
189
|
+
eval_hist[k] = eval_hist[k].first(best_iteration + 1)
|
190
|
+
end
|
191
|
+
end
|
192
|
+
|
183
193
|
eval_hist
|
184
194
|
end
|
185
195
|
|
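The truncation step added to `cv` in the last hunk above can be tried in isolation. The following is a minimal sketch, not the gem's code: `truncate_eval_hist` is a hypothetical helper name, and the metric keys and scores are made-up sample data.

```ruby
# Hypothetical helper (not part of the gem) mirroring the step added to `cv`:
# keep only the evaluation history up to and including the best iteration.
def truncate_eval_hist(eval_hist, best_iter, best_iteration = nil)
  # fall back to the first metric's best iteration when training did not stop early
  best_iteration ||= best_iter[best_iter.keys.first]
  eval_hist.each_key do |k|
    # iterations are zero-based, so keep best_iteration + 1 entries
    eval_hist[k] = eval_hist[k].first(best_iteration + 1)
  end
  eval_hist
end

# made-up sample data: the best score for "l2-mean" is at iteration 2
hist = { "l2-mean" => [0.5, 0.4, 0.3, 0.35, 0.36], "l2-stdv" => [0.1, 0.1, 0.1, 0.1, 0.1] }
truncated = truncate_eval_hist(hist, { "l2-mean" => 2 })
```

Every series in the history is cut to the same length, which appears to be what the changelog entry about making `eval_hist` consistent with Python refers to.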
data/lib/lightgbm/booster.rb
CHANGED
@@ -30,7 +30,7 @@ module LightGBM
|
|
30
30
|
|
31
31
|
def current_iteration
|
32
32
|
out = ::FFI::MemoryPointer.new(:int)
|
33
|
-
check_result FFI
|
33
|
+
check_result FFI.LGBM_BoosterGetCurrentIteration(handle_pointer, out)
|
34
34
|
out.read_int
|
35
35
|
end
|
36
36
|
|
@@ -38,12 +38,13 @@ module LightGBM
|
|
38
38
|
num_iteration ||= best_iteration
|
39
39
|
buffer_len = 1 << 20
|
40
40
|
out_len = ::FFI::MemoryPointer.new(:int64)
|
41
|
-
out_str = ::FFI::MemoryPointer.new(:
|
42
|
-
|
41
|
+
out_str = ::FFI::MemoryPointer.new(:char, buffer_len)
|
42
|
+
feature_importance_type = 0 # TODO add option
|
43
|
+
check_result FFI.LGBM_BoosterDumpModel(handle_pointer, start_iteration, num_iteration, feature_importance_type, buffer_len, out_len, out_str)
|
43
44
|
actual_len = read_int64(out_len)
|
44
45
|
if actual_len > buffer_len
|
45
|
-
out_str = ::FFI::MemoryPointer.new(:
|
46
|
-
check_result FFI.LGBM_BoosterDumpModel(handle_pointer, start_iteration, num_iteration, actual_len, out_len, out_str)
|
46
|
+
out_str = ::FFI::MemoryPointer.new(:char, actual_len)
|
47
|
+
check_result FFI.LGBM_BoosterDumpModel(handle_pointer, start_iteration, num_iteration, feature_importance_type, actual_len, out_len, out_str)
|
47
48
|
end
|
48
49
|
out_str.read_string
|
49
50
|
end
|
@@ -85,12 +86,13 @@ module LightGBM
|
|
85
86
|
num_iteration ||= best_iteration
|
86
87
|
buffer_len = 1 << 20
|
87
88
|
out_len = ::FFI::MemoryPointer.new(:int64)
|
88
|
-
out_str = ::FFI::MemoryPointer.new(:
|
89
|
-
|
89
|
+
out_str = ::FFI::MemoryPointer.new(:char, buffer_len)
|
90
|
+
feature_importance_type = 0 # TODO add option
|
91
|
+
check_result FFI.LGBM_BoosterSaveModelToString(handle_pointer, start_iteration, num_iteration, feature_importance_type, buffer_len, out_len, out_str)
|
90
92
|
actual_len = read_int64(out_len)
|
91
93
|
if actual_len > buffer_len
|
92
|
-
out_str = ::FFI::MemoryPointer.new(:
|
93
|
-
check_result FFI.LGBM_BoosterSaveModelToString(handle_pointer, start_iteration, num_iteration, actual_len, out_len, out_str)
|
94
|
+
out_str = ::FFI::MemoryPointer.new(:char, actual_len)
|
95
|
+
check_result FFI.LGBM_BoosterSaveModelToString(handle_pointer, start_iteration, num_iteration, feature_importance_type, actual_len, out_len, out_str)
|
94
96
|
end
|
95
97
|
out_str.read_string
|
96
98
|
end
|
@@ -104,18 +106,18 @@ module LightGBM
|
|
104
106
|
|
105
107
|
def num_model_per_iteration
|
106
108
|
out = ::FFI::MemoryPointer.new(:int)
|
107
|
-
check_result FFI
|
109
|
+
check_result FFI.LGBM_BoosterNumModelPerIteration(handle_pointer, out)
|
108
110
|
out.read_int
|
109
111
|
end
|
110
112
|
|
111
113
|
def num_trees
|
112
114
|
out = ::FFI::MemoryPointer.new(:int)
|
113
|
-
check_result FFI
|
115
|
+
check_result FFI.LGBM_BoosterNumberOfTotalModel(handle_pointer, out)
|
114
116
|
out.read_int
|
115
117
|
end
|
116
118
|
|
117
119
|
# TODO support different prediction types
|
118
|
-
def predict(input, num_iteration: nil, **params)
|
120
|
+
def predict(input, start_iteration: nil, num_iteration: nil, **params)
|
119
121
|
input =
|
120
122
|
if daru?(input)
|
121
123
|
input.map_rows(&:to_a)
|
@@ -126,17 +128,18 @@ module LightGBM
|
|
126
128
|
singular = !input.first.is_a?(Array)
|
127
129
|
input = [input] if singular
|
128
130
|
|
131
|
+
start_iteration ||= 0
|
129
132
|
num_iteration ||= best_iteration
|
130
133
|
num_class ||= num_class()
|
131
134
|
|
132
135
|
flat_input = input.flatten
|
133
136
|
handle_missing(flat_input)
|
134
|
-
data = ::FFI::MemoryPointer.new(:
|
135
|
-
data.
|
137
|
+
data = ::FFI::MemoryPointer.new(:double, input.count * input.first.count)
|
138
|
+
data.write_array_of_double(flat_input)
|
136
139
|
|
137
140
|
out_len = ::FFI::MemoryPointer.new(:int64)
|
138
141
|
out_result = ::FFI::MemoryPointer.new(:double, num_class * input.count)
|
139
|
-
check_result FFI.LGBM_BoosterPredictForMat(handle_pointer, data,
|
142
|
+
check_result FFI.LGBM_BoosterPredictForMat(handle_pointer, data, 1, input.count, input.first.count, 1, 0, start_iteration, num_iteration, params_str(params), out_len, out_result)
|
140
143
|
out = out_result.read_array_of_double(read_int64(out_len))
|
141
144
|
out = out.each_slice(num_class).to_a if num_class > 1
|
142
145
|
|
@@ -145,7 +148,8 @@ module LightGBM
|
|
145
148
|
|
146
149
|
def save_model(filename, num_iteration: nil, start_iteration: 0)
|
147
150
|
num_iteration ||= best_iteration
|
148
|
-
|
151
|
+
feature_importance_type = 0 # TODO add
|
152
|
+
check_result FFI.LGBM_BoosterSaveModel(handle_pointer, start_iteration, num_iteration, feature_importance_type, filename)
|
149
153
|
self # consistent with Python API
|
150
154
|
end
|
151
155
|
|
@@ -168,17 +172,19 @@ module LightGBM
|
|
168
172
|
|
169
173
|
def eval_counts
|
170
174
|
out = ::FFI::MemoryPointer.new(:int)
|
171
|
-
check_result FFI
|
175
|
+
check_result FFI.LGBM_BoosterGetEvalCounts(handle_pointer, out)
|
172
176
|
out.read_int
|
173
177
|
end
|
174
178
|
|
175
179
|
def eval_names
|
176
180
|
eval_counts ||= eval_counts()
|
177
181
|
out_len = ::FFI::MemoryPointer.new(:int)
|
182
|
+
out_buffer_len = ::FFI::MemoryPointer.new(:size_t)
|
178
183
|
out_strs = ::FFI::MemoryPointer.new(:pointer, eval_counts)
|
179
|
-
|
180
|
-
|
181
|
-
|
184
|
+
buffer_len = 255
|
185
|
+
str_ptrs = eval_counts.times.map { ::FFI::MemoryPointer.new(:char, buffer_len) }
|
186
|
+
out_strs.write_array_of_pointer(str_ptrs)
|
187
|
+
check_result FFI.LGBM_BoosterGetEvalNames(handle_pointer, eval_counts, out_len, buffer_len, out_buffer_len, out_strs)
|
182
188
|
str_ptrs.map(&:read_string)
|
183
189
|
end
|
184
190
|
|
@@ -198,7 +204,7 @@ module LightGBM
|
|
198
204
|
|
199
205
|
def num_class
|
200
206
|
out = ::FFI::MemoryPointer.new(:int)
|
201
|
-
check_result FFI
|
207
|
+
check_result FFI.LGBM_BoosterGetNumClasses(handle_pointer, out)
|
202
208
|
out.read_int
|
203
209
|
end
|
204
210
|
|
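Both `dump_model` and `model_to_s` in this file use the same call-twice pattern: allocate a default buffer, read back the length the library reports, and call again with a larger buffer if needed. A rough sketch of that control flow without FFI follows. `fake_native_dump` is a hypothetical stand-in for the C function, and its behavior on a too-small buffer (writing nothing) is an assumption made for the example.

```ruby
# Hypothetical stand-in for a C function such as LGBM_BoosterSaveModelToString.
# Assumption: it always reports the length it needs and only writes when the
# buffer is large enough.
def fake_native_dump(model, buffer_len)
  needed = model.bytesize + 1 # +1 for the trailing NUL a C string would need
  written = needed <= buffer_len ? model : nil
  [needed, written]
end

def dump_with_retry(model, buffer_len: 1 << 20)
  actual_len, out = fake_native_dump(model, buffer_len)
  if actual_len > buffer_len
    # the first buffer was too small, so call again with the reported length
    _, out = fake_native_dump(model, actual_len)
  end
  out
end

dump_with_retry("tree=0 leaf=1", buffer_len: 4) # second call succeeds
```

The default of `1 << 20` bytes means most models need only one call; the retry covers larger ones.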
data/lib/lightgbm/dataset.rb
CHANGED
@@ -4,51 +4,16 @@ module LightGBM
|
|
4
4
|
|
5
5
|
def initialize(data, label: nil, weight: nil, group: nil, params: nil, reference: nil, used_indices: nil, categorical_feature: "auto", feature_names: nil)
|
6
6
|
@data = data
|
7
|
+
@label = label
|
8
|
+
@weight = weight
|
9
|
+
@group = group
|
10
|
+
@params = params
|
11
|
+
@reference = reference
|
12
|
+
@used_indices = used_indices
|
13
|
+
@categorical_feature = categorical_feature
|
14
|
+
@feature_names = feature_names
|
7
15
|
|
8
|
-
|
9
|
-
params ||= {}
|
10
|
-
if categorical_feature != "auto" && categorical_feature.any?
|
11
|
-
params["categorical_feature"] ||= categorical_feature.join(",")
|
12
|
-
end
|
13
|
-
set_verbosity(params)
|
14
|
-
|
15
|
-
@handle = ::FFI::MemoryPointer.new(:pointer)
|
16
|
-
parameters = params_str(params)
|
17
|
-
reference = reference.handle_pointer if reference
|
18
|
-
if used_indices
|
19
|
-
used_row_indices = ::FFI::MemoryPointer.new(:int32, used_indices.count)
|
20
|
-
used_row_indices.put_array_of_int32(0, used_indices)
|
21
|
-
check_result FFI.LGBM_DatasetGetSubset(reference, used_row_indices, used_indices.count, parameters, @handle)
|
22
|
-
elsif data.is_a?(String)
|
23
|
-
check_result FFI.LGBM_DatasetCreateFromFile(data, parameters, reference, @handle)
|
24
|
-
else
|
25
|
-
if matrix?(data)
|
26
|
-
nrow = data.row_count
|
27
|
-
ncol = data.column_count
|
28
|
-
flat_data = data.to_a.flatten
|
29
|
-
elsif daru?(data)
|
30
|
-
nrow, ncol = data.shape
|
31
|
-
flat_data = data.map_rows(&:to_a).flatten
|
32
|
-
elsif narray?(data)
|
33
|
-
nrow, ncol = data.shape
|
34
|
-
flat_data = data.flatten.to_a
|
35
|
-
else
|
36
|
-
nrow = data.count
|
37
|
-
ncol = data.first.count
|
38
|
-
flat_data = data.flatten
|
39
|
-
end
|
40
|
-
|
41
|
-
handle_missing(flat_data)
|
42
|
-
c_data = ::FFI::MemoryPointer.new(:float, nrow * ncol)
|
43
|
-
c_data.put_array_of_float(0, flat_data)
|
44
|
-
check_result FFI.LGBM_DatasetCreateFromMat(c_data, 0, nrow, ncol, 1, parameters, reference, @handle)
|
45
|
-
end
|
46
|
-
ObjectSpace.define_finalizer(self, self.class.finalize(handle_pointer)) unless used_indices
|
47
|
-
|
48
|
-
self.label = label if label
|
49
|
-
self.weight = weight if weight
|
50
|
-
self.group = group if group
|
51
|
-
self.feature_names = feature_names if feature_names
|
16
|
+
construct
|
52
17
|
end
|
53
18
|
|
54
19
|
def label
|
@@ -59,34 +24,50 @@ module LightGBM
|
|
59
24
|
field("weight")
|
60
25
|
end
|
61
26
|
|
62
|
-
def label=(label)
|
63
|
-
set_field("label", label)
|
64
|
-
end
|
65
|
-
|
66
27
|
def feature_names
|
67
28
|
# must preallocate space
|
68
29
|
num_feature_names = ::FFI::MemoryPointer.new(:int)
|
69
|
-
|
70
|
-
|
71
|
-
out_strs.
|
72
|
-
|
30
|
+
out_buffer_len = ::FFI::MemoryPointer.new(:size_t)
|
31
|
+
len = 1000
|
32
|
+
out_strs = ::FFI::MemoryPointer.new(:pointer, len)
|
33
|
+
buffer_len = 255
|
34
|
+
str_ptrs = len.times.map { ::FFI::MemoryPointer.new(:char, buffer_len) }
|
35
|
+
out_strs.write_array_of_pointer(str_ptrs)
|
36
|
+
check_result FFI.LGBM_DatasetGetFeatureNames(handle_pointer, len, num_feature_names, buffer_len, out_buffer_len, out_strs)
|
73
37
|
str_ptrs[0, num_feature_names.read_int].map(&:read_string)
|
74
38
|
end
|
75
39
|
|
40
|
+
def label=(label)
|
41
|
+
@label = label
|
42
|
+
set_field("label", label)
|
43
|
+
end
|
44
|
+
|
76
45
|
def weight=(weight)
|
46
|
+
@weight = weight
|
77
47
|
set_field("weight", weight)
|
78
48
|
end
|
79
49
|
|
80
50
|
def group=(group)
|
51
|
+
@group = group
|
81
52
|
set_field("group", group, type: :int32)
|
82
53
|
end
|
83
54
|
|
84
55
|
def feature_names=(feature_names)
|
56
|
+
@feature_names = feature_names
|
85
57
|
c_feature_names = ::FFI::MemoryPointer.new(:pointer, feature_names.size)
|
86
58
|
c_feature_names.write_array_of_pointer(feature_names.map { |v| ::FFI::MemoryPointer.from_string(v) })
|
87
59
|
check_result FFI.LGBM_DatasetSetFeatureNames(handle_pointer, c_feature_names, feature_names.size)
|
88
60
|
end
|
89
61
|
|
62
|
+
# TODO only update reference if not in chain
|
63
|
+
def reference=(reference)
|
64
|
+
if reference != @reference
|
65
|
+
@reference = reference
|
66
|
+
free_handle
|
67
|
+
construct
|
68
|
+
end
|
69
|
+
end
|
70
|
+
|
90
71
|
def num_data
|
91
72
|
out = ::FFI::MemoryPointer.new(:int)
|
92
73
|
check_result FFI.LGBM_DatasetGetNumData(handle_pointer, out)
|
@@ -124,6 +105,66 @@ module LightGBM
|
|
124
105
|
|
125
106
|
private
|
126
107
|
|
108
|
+
def construct
|
109
|
+
data = @data
|
110
|
+
used_indices = @used_indices
|
111
|
+
|
112
|
+
# TODO stringify params
|
113
|
+
params = @params || {}
|
114
|
+
if @categorical_feature != "auto" && @categorical_feature.any?
|
115
|
+
params["categorical_feature"] ||= @categorical_feature.join(",")
|
116
|
+
end
|
117
|
+
set_verbosity(params)
|
118
|
+
|
119
|
+
@handle = ::FFI::MemoryPointer.new(:pointer)
|
120
|
+
parameters = params_str(params)
|
121
|
+
reference = @reference.handle_pointer if @reference
|
122
|
+
if used_indices
|
123
|
+
used_row_indices = ::FFI::MemoryPointer.new(:int32, used_indices.count)
|
124
|
+
used_row_indices.write_array_of_int32(used_indices)
|
125
|
+
check_result FFI.LGBM_DatasetGetSubset(reference, used_row_indices, used_indices.count, parameters, @handle)
|
126
|
+
elsif data.is_a?(String)
|
127
|
+
check_result FFI.LGBM_DatasetCreateFromFile(data, parameters, reference, @handle)
|
128
|
+
else
|
129
|
+
if matrix?(data)
|
130
|
+
nrow = data.row_count
|
131
|
+
ncol = data.column_count
|
132
|
+
flat_data = data.to_a.flatten
|
133
|
+
elsif daru?(data)
|
134
|
+
nrow, ncol = data.shape
|
135
|
+
flat_data = data.map_rows(&:to_a).flatten
|
136
|
+
elsif numo?(data) || rover?(data)
|
137
|
+
data = data.to_numo if rover?(data)
|
138
|
+
nrow, ncol = data.shape
|
139
|
+
else
|
140
|
+
nrow = data.count
|
141
|
+
ncol = data.first.count
|
142
|
+
flat_data = data.flatten
|
143
|
+
end
|
144
|
+
|
145
|
+
c_data = ::FFI::MemoryPointer.new(:double, nrow * ncol)
|
146
|
+
if numo?(data)
|
147
|
+
c_data.write_bytes(data.cast_to(Numo::DFloat).to_string)
|
148
|
+
else
|
149
|
+
handle_missing(flat_data)
|
150
|
+
c_data.write_array_of_double(flat_data)
|
151
|
+
end
|
152
|
+
|
153
|
+
check_result FFI.LGBM_DatasetCreateFromMat(c_data, 1, nrow, ncol, 1, parameters, reference, @handle)
|
154
|
+
end
|
155
|
+
ObjectSpace.define_finalizer(self, self.class.finalize(handle_pointer)) unless used_indices
|
156
|
+
|
157
|
+
self.label = @label if @label
|
158
|
+
self.weight = @weight if @weight
|
159
|
+
self.group = @group if @group
|
160
|
+
self.feature_names = @feature_names if @feature_names
|
161
|
+
end
|
162
|
+
|
163
|
+
def free_handle
|
164
|
+
FFI.LGBM_DatasetFree(handle_pointer)
|
165
|
+
ObjectSpace.undefine_finalizer(self)
|
166
|
+
end
|
167
|
+
|
127
168
|
def dump_text(filename)
|
128
169
|
check_result FFI.LGBM_DatasetDumpText(handle_pointer, filename)
|
129
170
|
end
|
@@ -141,11 +182,11 @@ module LightGBM
|
|
141
182
|
data = data.to_a unless data.is_a?(Array)
|
142
183
|
if type == :int32
|
143
184
|
c_data = ::FFI::MemoryPointer.new(:int32, data.count)
|
144
|
-
c_data.
|
185
|
+
c_data.write_array_of_int32(data)
|
145
186
|
check_result FFI.LGBM_DatasetSetField(handle_pointer, field_name, c_data, data.count, 2)
|
146
187
|
else
|
147
188
|
c_data = ::FFI::MemoryPointer.new(:float, data.count)
|
148
|
-
c_data.
|
189
|
+
c_data.write_array_of_float(data)
|
149
190
|
check_result FFI.LGBM_DatasetSetField(handle_pointer, field_name, c_data, data.count, 0)
|
150
191
|
end
|
151
192
|
end
|
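The refactor in this file stores the constructor arguments so the dataset can be rebuilt when `reference=` is called. A small FFI-free sketch of that shape is below. `SketchDataset` and `build_count` are hypothetical names, and the native handle is replaced by a plain Ruby object.

```ruby
# Hypothetical, FFI-free stand-in for the Dataset refactor: constructor arguments
# are kept in instance variables so the handle can be rebuilt later.
class SketchDataset
  attr_reader :reference, :build_count

  def initialize(data, reference: nil)
    @data = data
    @reference = reference
    @build_count = 0
    construct
  end

  # rebuild only when the reference actually changes
  def reference=(reference)
    if reference != @reference
      @reference = reference
      free_handle
      construct
    end
  end

  private

  def construct
    # the gem calls into the LightGBM C API here; this sketch only counts rebuilds
    @build_count += 1
    @handle = Object.new
  end

  def free_handle
    @handle = nil
  end
end

train_set = SketchDataset.new([[1, 2], [3, 4]])
valid_set = SketchDataset.new([[5, 6]])
valid_set.reference = train_set # triggers one rebuild
valid_set.reference = train_set # no change, no rebuild
```

This is the behavior `train` relies on when it sets `data.reference = train_set` for each validation set.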
data/lib/lightgbm/ffi.rb
CHANGED
@@ -5,11 +5,11 @@ module LightGBM
|
|
5
5
|
begin
|
6
6
|
ffi_lib LightGBM.ffi_lib
|
7
7
|
rescue LoadError => e
|
8
|
-
|
9
|
-
|
10
|
-
|
8
|
+
if e.message.include?("Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib") && e.message.include?("Reason: image not found")
|
9
|
+
raise LoadError, "OpenMP not found. Run `brew install libomp`"
|
10
|
+
else
|
11
|
+
raise e
|
11
12
|
end
|
12
|
-
raise LoadError, "Could not find LightGBM"
|
13
13
|
end
|
14
14
|
|
15
15
|
# https://github.com/microsoft/LightGBM/blob/master/include/LightGBM/c_api.h
|
@@ -23,7 +23,7 @@ module LightGBM
|
|
23
23
|
attach_function :LGBM_DatasetCreateFromMat, %i[pointer int int32 int32 int string pointer pointer], :int
|
24
24
|
attach_function :LGBM_DatasetGetSubset, %i[pointer pointer int32 string pointer], :int
|
25
25
|
attach_function :LGBM_DatasetSetFeatureNames, %i[pointer pointer int], :int
|
26
|
-
attach_function :LGBM_DatasetGetFeatureNames, %i[pointer pointer pointer], :int
|
26
|
+
attach_function :LGBM_DatasetGetFeatureNames, %i[pointer int pointer size_t pointer pointer], :int
|
27
27
|
attach_function :LGBM_DatasetFree, %i[pointer], :int
|
28
28
|
attach_function :LGBM_DatasetSaveBinary, %i[pointer string], :int
|
29
29
|
attach_function :LGBM_DatasetDumpText, %i[pointer string], :int
|
@@ -44,13 +44,13 @@ module LightGBM
|
|
44
44
|
attach_function :LGBM_BoosterNumModelPerIteration, %i[pointer pointer], :int
|
45
45
|
attach_function :LGBM_BoosterNumberOfTotalModel, %i[pointer pointer], :int
|
46
46
|
attach_function :LGBM_BoosterGetEvalCounts, %i[pointer pointer], :int
|
47
|
-
attach_function :LGBM_BoosterGetEvalNames, %i[pointer pointer pointer], :int
|
47
|
+
attach_function :LGBM_BoosterGetEvalNames, %i[pointer int pointer size_t pointer pointer], :int
|
48
48
|
attach_function :LGBM_BoosterGetNumFeature, %i[pointer pointer], :int
|
49
49
|
attach_function :LGBM_BoosterGetEval, %i[pointer int pointer pointer], :int
|
50
|
-
attach_function :LGBM_BoosterPredictForMat, %i[pointer pointer int int32 int32 int int int string pointer pointer], :int
|
51
|
-
attach_function :LGBM_BoosterSaveModel, %i[pointer int int string], :int
|
52
|
-
attach_function :LGBM_BoosterSaveModelToString, %i[pointer int int int64 pointer pointer], :int
|
53
|
-
attach_function :LGBM_BoosterDumpModel, %i[pointer int int int64 pointer pointer], :int
|
50
|
+
attach_function :LGBM_BoosterPredictForMat, %i[pointer pointer int int32 int32 int int int int string pointer pointer], :int
|
51
|
+
attach_function :LGBM_BoosterSaveModel, %i[pointer int int int string], :int
|
52
|
+
attach_function :LGBM_BoosterSaveModelToString, %i[pointer int int int int64 pointer pointer], :int
|
53
|
+
attach_function :LGBM_BoosterDumpModel, %i[pointer int int int int64 pointer pointer], :int
|
54
54
|
attach_function :LGBM_BoosterFeatureImportance, %i[pointer int int pointer], :int
|
55
55
|
end
|
56
56
|
end
|
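The new `rescue` branch rewrites one specific `LoadError` into an actionable message and re-raises everything else unchanged. A sketch of that decision as a standalone function follows. `friendly_load_error` is a hypothetical name, and the sample messages in the usage lines are illustrative rather than copied from a real failure.

```ruby
# Hypothetical helper (not part of the gem) isolating the message check
# from the rescue block in the diff.
def friendly_load_error(e)
  if e.message.include?("Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib") &&
     e.message.include?("Reason: image not found")
    LoadError.new("OpenMP not found. Run `brew install libomp`")
  else
    # any other load failure is passed through untouched
    e
  end
end

# illustrative message, assembled from the two substrings the diff checks for
missing_omp = LoadError.new("Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib\n  Reason: image not found")
friendly_load_error(missing_omp).message
```

Note that the 0.1.6 fallback `raise LoadError, "Could not find LightGBM"` is removed in this hunk, so other failures now surface with their original message.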
data/lib/lightgbm/utils.rb
CHANGED
@@ -37,8 +37,12 @@ module LightGBM
|
|
37
37
|
defined?(Daru::DataFrame) && data.is_a?(Daru::DataFrame)
|
38
38
|
end
|
39
39
|
|
40
|
-
def
|
40
|
+
def numo?(data)
|
41
41
|
defined?(Numo::NArray) && data.is_a?(Numo::NArray)
|
42
42
|
end
|
43
|
+
|
44
|
+
def rover?(data)
|
45
|
+
defined?(Rover::DataFrame) && data.is_a?(Rover::DataFrame)
|
46
|
+
end
|
43
47
|
end
|
44
48
|
end
|
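The `rover?` helper follows the same guard as `daru?` and `numo?`: `defined?` returns `nil` rather than raising `NameError` when the optional library has not been required. The snippet below reproduces the method from the diff at top level so it can be run without the gem.

```ruby
# Same body as the `rover?` helper in the diff, defined at top level for illustration.
def rover?(data)
  # `defined?` short-circuits to nil when Rover has not been loaded,
  # so the constant lookup on the right is never evaluated
  defined?(Rover::DataFrame) && data.is_a?(Rover::DataFrame)
end

rover?([[1, 2, 3], [4, 5, 6]]) # falsy for a plain array, with or without Rover loaded
```

This lets the gem support Rover data frames without listing `rover` as a hard dependency.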
data/lib/lightgbm/version.rb
CHANGED
data/vendor/lib_lightgbm.dll
CHANGED
Binary file
|
data/vendor/lib_lightgbm.dylib
CHANGED
Binary file
|
data/vendor/lib_lightgbm.so
CHANGED
Binary file
|
metadata
CHANGED
@@ -1,14 +1,14 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: lightgbm
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.1
|
4
|
+
version: 0.2.1
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Andrew Kane
|
8
|
-
autorequire:
|
8
|
+
autorequire:
|
9
9
|
bindir: bin
|
10
10
|
cert_chain: []
|
11
|
-
date:
|
11
|
+
date: 2020-11-16 00:00:00.000000000 Z
|
12
12
|
dependencies:
|
13
13
|
- !ruby/object:Gem::Dependency
|
14
14
|
name: ffi
|
@@ -80,7 +80,7 @@ dependencies:
|
|
80
80
|
- - ">="
|
81
81
|
- !ruby/object:Gem::Version
|
82
82
|
version: '0'
|
83
|
-
description:
|
83
|
+
description:
|
84
84
|
email: andrew@chartkick.com
|
85
85
|
executables: []
|
86
86
|
extensions: []
|
@@ -107,7 +107,7 @@ homepage: https://github.com/ankane/lightgbm
|
|
107
107
|
licenses:
|
108
108
|
- MIT
|
109
109
|
metadata: {}
|
110
|
-
post_install_message:
|
110
|
+
post_install_message:
|
111
111
|
rdoc_options: []
|
112
112
|
require_paths:
|
113
113
|
- lib
|
@@ -122,8 +122,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
|
|
122
122
|
- !ruby/object:Gem::Version
|
123
123
|
version: '0'
|
124
124
|
requirements: []
|
125
|
-
rubygems_version: 3.
|
126
|
-
signing_key:
|
125
|
+
rubygems_version: 3.1.4
|
126
|
+
signing_key:
|
127
127
|
specification_version: 4
|
128
|
-
summary:
|
128
|
+
summary: High performance gradient boosting for Ruby
|
129
129
|
test_files: []
|