lightgbm 0.1.7 → 0.1.8

Sign up to get free protection for your applications and to get access to all the features.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA256:
3
- metadata.gz: cf13db584839dc0cb0a9b6d2480a6846704aee74f6a984300db06901cfbb1e66
4
- data.tar.gz: ec716f3a2354c28e916b4ae89bd567e222f8bb5e5415e9579fb3c67def9c8edc
3
+ metadata.gz: d3b86686bc4575d069e469fb9c8911e93c7dcfb7622a415dc3e7ebee301947c3
4
+ data.tar.gz: a186cca11d4838fd13573b14ea154fff174804b31ae45236130c30a24f55ff7b
5
5
  SHA512:
6
- metadata.gz: db7d6da179096a0648b2257c575383e9a59caa6bae04f39275aafe9126e69b51c6048c1452d2fb8dffa538362614efe83f7c05a8df4c749808040835a977623c
7
- data.tar.gz: 5ebecf54455d3aef8ff3235e196bc064353bb7379a8b6fa61671feb223813ac6d028d7b86a0c9ada0568b06d048c6fcfdea48f082d208edaecbfdb54ead75d33
6
+ metadata.gz: 32ef8f452075bcf4441b8e0fcf68e63332c42da61445301ecd0a85b51b2af1b7a730da0810e1b182ff5fdfe184884242a5489469d16e0e9a717d19bb145eb095
7
+ data.tar.gz: 41b76c5ac174b75ce4e4f21477e4d32a9607b550e70448dde2797a765813d08a6c75de2f7f6d04ea79b4e9fa8afd768d097be516781e27e4eee4f982af364dc0
@@ -1,3 +1,8 @@
1
+ ## 0.1.8 (2020-05-09)
2
+
3
+ - Improved error message when OpenMP not found on Mac
4
+ - Fixed `Cannot add validation data` error
5
+
1
6
  ## 0.1.7 (2019-12-05)
2
7
 
3
8
  - Updated LightGBM to 2.3.1
data/README.md CHANGED
@@ -1,8 +1,6 @@
1
1
  # LightGBM
2
2
 
3
- [LightGBM](https://github.com/microsoft/LightGBM) - the high performance machine learning library - for Ruby
4
-
5
- :fire: Uses the C API for blazing performance
3
+ [LightGBM](https://github.com/microsoft/LightGBM) - high performance gradient boosting - for Ruby
6
4
 
7
5
  [![Build Status](https://travis-ci.org/ankane/lightgbm.svg?branch=master)](https://travis-ci.org/ankane/lightgbm)
8
6
 
@@ -20,16 +18,6 @@ On Mac, also install OpenMP:
20
18
  brew install libomp
21
19
  ```
22
20
 
23
- ## Getting Started
24
-
25
- This library follows the [Python API](https://lightgbm.readthedocs.io/en/latest/Python-API.html). A few differences are:
26
-
27
- - The `get_` and `set_` prefixes are removed from methods
28
- - The default verbosity is `-1`
29
- - With the `cv` method, `stratified` is set to `false`
30
-
31
- Some methods and options are also missing at the moment. PRs welcome!
32
-
33
21
  ## Training API
34
22
 
35
23
  Prep your data
@@ -160,12 +148,18 @@ Numo::DFloat.new(3, 2).seq
160
148
 
161
149
  ## Related Projects
162
150
 
163
- - [Xgb](https://github.com/ankane/xgb) - XGBoost for Ruby
164
- - [Eps](https://github.com/ankane/eps) - Machine Learning for Ruby
151
+ - [XGBoost](https://github.com/ankane/xgboost) - XGBoost for Ruby
152
+ - [Eps](https://github.com/ankane/eps) - Machine learning for Ruby
165
153
 
166
154
  ## Credits
167
155
 
168
- Thanks to the [xgboost](https://github.com/PairOnAir/xgboost-ruby) gem for serving as an initial reference.
156
+ This library follows the [Python API](https://lightgbm.readthedocs.io/en/latest/Python-API.html). A few differences are:
157
+
158
+ - The `get_` and `set_` prefixes are removed from methods
159
+ - The default verbosity is `-1`
160
+ - With the `cv` method, `stratified` is set to `false`
161
+
162
+ Thanks to the [xgboost](https://github.com/PairOnAir/xgboost-ruby) gem for showing how to use FFI.
169
163
 
170
164
  ## History
171
165
 
@@ -180,7 +174,7 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
180
174
  - Write, clarify, or fix documentation
181
175
  - Suggest or add new features
182
176
 
183
- To get started with development and testing:
177
+ To get started with development:
184
178
 
185
179
  ```sh
186
180
  git clone https://github.com/ankane/lightgbm.git
@@ -36,6 +36,8 @@ module LightGBM
36
36
  booster.train_data_name = name || "training"
37
37
  valid_contain_train = true
38
38
  else
39
+ # ensure the validation set references the training set
40
+ data.reference = train_set
39
41
  booster.add_valid(data, name || "valid_#{i}")
40
42
  end
41
43
  end
@@ -133,6 +135,7 @@ module LightGBM
133
135
  if early_stopping_rounds
134
136
  best_score = {}
135
137
  best_iter = {}
138
+ best_iteration = nil
136
139
  end
137
140
 
138
141
  num_boost_round.times do |iteration|
@@ -172,6 +175,7 @@ module LightGBM
172
175
  best_score[k] = score
173
176
  best_iter[k] = iteration
174
177
  elsif iteration - best_iter[k] >= early_stopping_rounds
178
+ best_iteration = best_iter[k]
175
179
  stop_early = true
176
180
  break
177
181
  end
@@ -180,6 +184,15 @@ module LightGBM
180
184
  end
181
185
  end
182
186
 
187
+ if early_stopping_rounds
188
+ # use best iteration from first metric if not stopped early
189
+ best_iteration ||= best_iter[best_iter.keys.first]
190
+ eval_hist.each_key do |k|
191
+ # TODO uncomment for 0.2.0
192
+ # eval_hist[k] = eval_hist[k].first(best_iteration + 1)
193
+ end
194
+ end
195
+
183
196
  eval_hist
184
197
  end
185
198
 
@@ -4,51 +4,16 @@ module LightGBM
4
4
 
5
5
  def initialize(data, label: nil, weight: nil, group: nil, params: nil, reference: nil, used_indices: nil, categorical_feature: "auto", feature_names: nil)
6
6
  @data = data
7
+ @label = label
8
+ @weight = weight
9
+ @group = group
10
+ @params = params
11
+ @reference = reference
12
+ @used_indices = used_indices
13
+ @categorical_feature = categorical_feature
14
+ @feature_names = feature_names
7
15
 
8
- # TODO stringify params
9
- params ||= {}
10
- if categorical_feature != "auto" && categorical_feature.any?
11
- params["categorical_feature"] ||= categorical_feature.join(",")
12
- end
13
- set_verbosity(params)
14
-
15
- @handle = ::FFI::MemoryPointer.new(:pointer)
16
- parameters = params_str(params)
17
- reference = reference.handle_pointer if reference
18
- if used_indices
19
- used_row_indices = ::FFI::MemoryPointer.new(:int32, used_indices.count)
20
- used_row_indices.write_array_of_int32(used_indices)
21
- check_result FFI.LGBM_DatasetGetSubset(reference, used_row_indices, used_indices.count, parameters, @handle)
22
- elsif data.is_a?(String)
23
- check_result FFI.LGBM_DatasetCreateFromFile(data, parameters, reference, @handle)
24
- else
25
- if matrix?(data)
26
- nrow = data.row_count
27
- ncol = data.column_count
28
- flat_data = data.to_a.flatten
29
- elsif daru?(data)
30
- nrow, ncol = data.shape
31
- flat_data = data.map_rows(&:to_a).flatten
32
- elsif narray?(data)
33
- nrow, ncol = data.shape
34
- flat_data = data.flatten.to_a
35
- else
36
- nrow = data.count
37
- ncol = data.first.count
38
- flat_data = data.flatten
39
- end
40
-
41
- handle_missing(flat_data)
42
- c_data = ::FFI::MemoryPointer.new(:double, nrow * ncol)
43
- c_data.write_array_of_double(flat_data)
44
- check_result FFI.LGBM_DatasetCreateFromMat(c_data, 1, nrow, ncol, 1, parameters, reference, @handle)
45
- end
46
- ObjectSpace.define_finalizer(self, self.class.finalize(handle_pointer)) unless used_indices
47
-
48
- self.label = label if label
49
- self.weight = weight if weight
50
- self.group = group if group
51
- self.feature_names = feature_names if feature_names
16
+ construct
52
17
  end
53
18
 
54
19
  def label
@@ -59,10 +24,6 @@ module LightGBM
59
24
  field("weight")
60
25
  end
61
26
 
62
- def label=(label)
63
- set_field("label", label)
64
- end
65
-
66
27
  def feature_names
67
28
  # must preallocate space
68
29
  num_feature_names = ::FFI::MemoryPointer.new(:int)
@@ -73,20 +34,37 @@ module LightGBM
73
34
  str_ptrs[0, num_feature_names.read_int].map(&:read_string)
74
35
  end
75
36
 
37
+ def label=(label)
38
+ @label = label
39
+ set_field("label", label)
40
+ end
41
+
76
42
  def weight=(weight)
43
+ @weight = weight
77
44
  set_field("weight", weight)
78
45
  end
79
46
 
80
47
  def group=(group)
48
+ @group = group
81
49
  set_field("group", group, type: :int32)
82
50
  end
83
51
 
84
52
  def feature_names=(feature_names)
53
+ @feature_names = feature_names
85
54
  c_feature_names = ::FFI::MemoryPointer.new(:pointer, feature_names.size)
86
55
  c_feature_names.write_array_of_pointer(feature_names.map { |v| ::FFI::MemoryPointer.from_string(v) })
87
56
  check_result FFI.LGBM_DatasetSetFeatureNames(handle_pointer, c_feature_names, feature_names.size)
88
57
  end
89
58
 
59
+ # TODO only update reference if not in chain
60
+ def reference=(reference)
61
+ if reference != @reference
62
+ @reference = reference
63
+ free_handle
64
+ construct
65
+ end
66
+ end
67
+
90
68
  def num_data
91
69
  out = ::FFI::MemoryPointer.new(:int)
92
70
  check_result FFI.LGBM_DatasetGetNumData(handle_pointer, out)
@@ -124,6 +102,61 @@ module LightGBM
124
102
 
125
103
  private
126
104
 
105
+ def construct
106
+ data = @data
107
+ used_indices = @used_indices
108
+
109
+ # TODO stringify params
110
+ params = @params || {}
111
+ if @categorical_feature != "auto" && @categorical_feature.any?
112
+ params["categorical_feature"] ||= @categorical_feature.join(",")
113
+ end
114
+ set_verbosity(params)
115
+
116
+ @handle = ::FFI::MemoryPointer.new(:pointer)
117
+ parameters = params_str(params)
118
+ reference = @reference.handle_pointer if @reference
119
+ if used_indices
120
+ used_row_indices = ::FFI::MemoryPointer.new(:int32, used_indices.count)
121
+ used_row_indices.write_array_of_int32(used_indices)
122
+ check_result FFI.LGBM_DatasetGetSubset(reference, used_row_indices, used_indices.count, parameters, @handle)
123
+ elsif data.is_a?(String)
124
+ check_result FFI.LGBM_DatasetCreateFromFile(data, parameters, reference, @handle)
125
+ else
126
+ if matrix?(data)
127
+ nrow = data.row_count
128
+ ncol = data.column_count
129
+ flat_data = data.to_a.flatten
130
+ elsif daru?(data)
131
+ nrow, ncol = data.shape
132
+ flat_data = data.map_rows(&:to_a).flatten
133
+ elsif narray?(data)
134
+ nrow, ncol = data.shape
135
+ flat_data = data.flatten.to_a
136
+ else
137
+ nrow = data.count
138
+ ncol = data.first.count
139
+ flat_data = data.flatten
140
+ end
141
+
142
+ handle_missing(flat_data)
143
+ c_data = ::FFI::MemoryPointer.new(:double, nrow * ncol)
144
+ c_data.write_array_of_double(flat_data)
145
+ check_result FFI.LGBM_DatasetCreateFromMat(c_data, 1, nrow, ncol, 1, parameters, reference, @handle)
146
+ end
147
+ ObjectSpace.define_finalizer(self, self.class.finalize(handle_pointer)) unless used_indices
148
+
149
+ self.label = @label if @label
150
+ self.weight = @weight if @weight
151
+ self.group = @group if @group
152
+ self.feature_names = @feature_names if @feature_names
153
+ end
154
+
155
+ def free_handle
156
+ FFI.LGBM_DatasetFree(handle_pointer)
157
+ ObjectSpace.undefine_finalizer(self)
158
+ end
159
+
127
160
  def dump_text(filename)
128
161
  check_result FFI.LGBM_DatasetDumpText(handle_pointer, filename)
129
162
  end
@@ -5,11 +5,11 @@ module LightGBM
5
5
  begin
6
6
  ffi_lib LightGBM.ffi_lib
7
7
  rescue LoadError => e
8
- raise e if ENV["LIGHTGBM_DEBUG"]
9
- if e.message.include?("libomp")
10
- raise LoadError, "Could not find OpenMP"
8
+ if e.message.include?("Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib") && e.message.include?("Reason: image not found")
9
+ raise LoadError, "OpenMP not found. Run `brew install libomp`"
10
+ else
11
+ raise e
11
12
  end
12
- raise LoadError, "Could not find LightGBM"
13
13
  end
14
14
 
15
15
  # https://github.com/microsoft/LightGBM/blob/master/include/LightGBM/c_api.h
@@ -1,3 +1,3 @@
1
1
  module LightGBM
2
- VERSION = "0.1.7"
2
+ VERSION = "0.1.8"
3
3
  end
metadata CHANGED
@@ -1,14 +1,14 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: lightgbm
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.7
4
+ version: 0.1.8
5
5
  platform: ruby
6
6
  authors:
7
7
  - Andrew Kane
8
8
  autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
- date: 2019-12-06 00:00:00.000000000 Z
11
+ date: 2020-05-09 00:00:00.000000000 Z
12
12
  dependencies:
13
13
  - !ruby/object:Gem::Dependency
14
14
  name: ffi
@@ -122,8 +122,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
122
122
  - !ruby/object:Gem::Version
123
123
  version: '0'
124
124
  requirements: []
125
- rubygems_version: 3.0.3
125
+ rubygems_version: 3.1.2
126
126
  signing_key:
127
127
  specification_version: 4
128
- summary: LightGBM - the high performance machine learning library - for Ruby
128
+ summary: High performance gradient boosting for Ruby
129
129
  test_files: []