xgb 0.2.0 → 0.4.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: ea879f0494a877766dd500203b7f4ec091607a0a0cae609bc3be50fb134b87eb
- data.tar.gz: 1ff52f49405628aa837f4f6ff1558b4b1a624a311b4c359c0a0703499613d35d
+ metadata.gz: bb82540880bbecdc88d82eb71b0d3fd3e7cf276f05205ddc0f1900684c5602a2
+ data.tar.gz: 1463e06dce0ae99fdee5ccc1887d1a24d537fcdac7d89fb685701566083d5600
  SHA512:
- metadata.gz: cd2dcbbc2300bbb5900ae1cdf8abe47fd2c0f0c8dc1b825f53c5988450405298e5b8693be243ed351f7f14715e9f5963eb7c00cd39c1b25983b449398804e6f5
- data.tar.gz: 969c4472a3926f8cd6569269817086ed047e4e64132fd92ab1bb78ab5e8f294c3a4342cfc4e121b8a2866b8f534d1ffaa35b806d9968f198f87bb4de06a64348
+ metadata.gz: 4869e2465af9d824e56dfd180eff0a7b8aacc0dd9788cfb9f5d429e83c51b5bceb93b8ad93fcdb0b7b2e0ec81231204d32f738988af97304b6d2eee33fa2f709
+ data.tar.gz: 4206063987a450cbb82bb9d2ea0cc7cfa8bf8bb9df483d4f088943cc41ba9d396afef81136f9090e65249ed2a3d78806c3fca70de5e198a15f6199ed8b3a01e8
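checksums.yaml records the SHA256 and SHA512 digests of the two archives packed inside the gem, metadata.gz and data.tar.gz, so every entry changes between releases. A minimal sketch of checking such digests with Ruby's standard `digest` library, assuming the archive bytes are already in memory; the `verify_digests` helper and the expected-value hash passed to it are hypothetical:

```ruby
require "digest"

# Hypothetical helper: compares the SHA256/SHA512 hex digests of a byte string
# against expected values like those recorded in checksums.yaml.
# Algorithms missing from `expected` are skipped.
def verify_digests(bytes, expected)
  actual = {
    "SHA256" => Digest::SHA256.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
  actual.all? { |algo, digest| expected[algo].nil? || expected[algo] == digest }
end

# In practice `sample` would come from File.binread("data.tar.gz").
sample = "example archive bytes"
expected = { "SHA256" => Digest::SHA256.hexdigest(sample) }
puts verify_digests(sample, expected)
```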
@@ -1,3 +1,29 @@
+ ## 0.4.1 (2020-08-26)
+
+ - Updated XGBoost to 1.2.0
+
+ ## 0.4.0 (2020-05-17)
+
+ - Updated XGBoost to 1.1.0
+ - Changed default `learning_rate` and `max_depth` for Scikit-Learn API to match Python
+ - Added support for Rover
+ - Improved performance of Numo datasets
+ - Improved error message when OpenMP not found on Mac
+
+ ## 0.3.1 (2020-04-16)
+
+ - Added `feature_names` and `feature_types` to `DMatrix`
+ - Added feature names to `dump`
+
+ ## 0.3.0 (2020-02-19)
+
+ - Updated XGBoost to 1.0.0
+
+ ## 0.2.1 (2020-02-11)
+
+ - Fixed `Could not find XGBoost` error on some Linux platforms
+ - Fixed `SignalException` on Windows
+
  ## 0.2.0 (2020-01-26)

  - Prefer `XGBoost` over `Xgb`
data/NOTICE.txt CHANGED
@@ -1,3 +1,4 @@
+ Copyright XGBoost contributors
  Copyright 2019-2020 Andrew Kane

  Licensed under the Apache License, Version 2.0 (the "License");
data/README.md CHANGED
@@ -2,7 +2,7 @@

  [XGBoost](https://github.com/dmlc/xgboost) - high performance gradient boosting - for Ruby

- [![Build Status](https://travis-ci.org/ankane/xgboost.svg?branch=master)](https://travis-ci.org/ankane/xgboost)
+ [![Build Status](https://travis-ci.org/ankane/xgboost.svg?branch=master)](https://travis-ci.org/ankane/xgboost) [![Build status](https://ci.appveyor.com/api/projects/status/s8umwyuahvj68m6p/branch/master?svg=true)](https://ci.appveyor.com/project/ankane/xgboost/branch/master)

  ## Installation

@@ -12,9 +12,11 @@ Add this line to your application’s Gemfile:
  gem 'xgb'
  ```

- ## Getting Started
+ On Mac, also install OpenMP:

- This library follows the [Python API](https://xgboost.readthedocs.io/en/latest/python/python_api.html), with the `get_` and `set_` prefixes removed from methods to make it more Ruby-like.
+ ```sh
+ brew install libomp
+ ```

  ## Learning API

@@ -70,7 +72,7 @@ CV
  XGBoost.cv(params, dtrain, nfold: 3, verbose_eval: true)
  ```

- Set metadata about a model [master]
+ Set metadata about a model

  ```ruby
  booster["key"] = "value"
@@ -135,16 +137,22 @@ Data can be an array of arrays
  [[1, 2, 3], [4, 5, 6]]
  ```

- Or a Daru data frame
+ Or a Numo array

  ```ruby
- Daru::DataFrame.from_csv("houses.csv")
+ Numo::NArray.cast([[1, 2, 3], [4, 5, 6]])
  ```

- Or a Numo NArray
+ Or a Rover data frame

  ```ruby
- Numo::DFloat.new(3, 2).seq
+ Rover.read_csv("houses.csv")
+ ```
+
+ Or a Daru data frame
+
+ ```ruby
+ Daru::DataFrame.from_csv("houses.csv")
  ```

  ## Helpful Resources
@@ -155,11 +163,13 @@ Numo::DFloat.new(3, 2).seq
  ## Related Projects

  - [LightGBM](https://github.com/ankane/lightgbm) - LightGBM for Ruby
- - [Eps](https://github.com/ankane/eps) - Machine Learning for Ruby
+ - [Eps](https://github.com/ankane/eps) - Machine learning for Ruby

  ## Credits

- Thanks to the [xgboost](https://github.com/PairOnAir/xgboost-ruby) gem for serving as an initial reference.
+ This library follows the [Python API](https://xgboost.readthedocs.io/en/latest/python/python_api.html), with the `get_` and `set_` prefixes removed from methods to make it more Ruby-like.
+
+ Thanks to the [xgboost](https://github.com/PairOnAir/xgboost-ruby) gem for showing how to use FFI.

  ## History

@@ -174,11 +184,12 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
  - Write, clarify, or fix documentation
  - Suggest or add new features

- To get started with development and testing:
+ To get started with development:

  ```sh
  git clone https://github.com/ankane/xgboost.git
  cd xgboost
  bundle install
+ bundle exec rake vendor:all
  bundle exec rake test
  ```
@@ -31,7 +31,8 @@ module XGBoost
  booster = Booster.new(params: params)
  num_feature = dtrain.num_col
  booster.set_param("num_feature", num_feature)
- booster.feature_names = num_feature.times.map { |i| "f#{i}" }
+ booster.feature_names = dtrain.feature_names
+ booster.feature_types = dtrain.feature_types
  evals ||= []

  if early_stopping_rounds
@@ -156,6 +157,14 @@ module XGBoost
  eval_hist
  end

+ def lib_version
+ major = ::FFI::MemoryPointer.new(:int)
+ minor = ::FFI::MemoryPointer.new(:int)
+ patch = ::FFI::MemoryPointer.new(:int)
+ FFI.XGBoostVersion(major, minor, patch)
+ "#{major.read_int}.#{minor.read_int}.#{patch.read_int}"
+ end
+
  private

  def mean(arr)
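The new `lib_version` builds a `"major.minor.patch"` string from the three integers that `XGBoostVersion` writes through its pointer arguments. Since this release also changes the arity of `XGBoosterPredict` and moves to `XGBoosterDumpModelExWithFeatures`, a caller may want to confirm the loaded native library is recent enough. A sketch using RubyGems' `Gem::Version` for the comparison; the `ensure_min_version` helper and the 1.2.0 minimum (taken from the 0.4.1 changelog entry) are assumptions, and the version string is passed in directly so the example runs without the native library:

```ruby
require "rubygems"

# Hypothetical guard: raises if a "major.minor.patch" string, such as the one
# returned by XGBoost.lib_version, is older than the required minimum.
def ensure_min_version(actual, minimum = "1.2.0")
  if Gem::Version.new(actual) < Gem::Version.new(minimum)
    raise "XGBoost #{minimum}+ required, found #{actual}"
  end
  actual
end

# With the gem loaded, this would be ensure_min_version(XGBoost.lib_version).
puts ensure_min_version("1.2.0")
```

`Gem::Version` compares numeric segments, so `"1.10.0"` correctly sorts after `"1.2.0"`, which a plain string comparison would get wrong.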
@@ -1,6 +1,6 @@
  module XGBoost
  class Booster
- attr_accessor :best_iteration, :feature_names
+ attr_accessor :best_iteration, :feature_names, :feature_types

  def initialize(params: nil, model_file: nil)
  @handle = ::FFI::MemoryPointer.new(:pointer)
@@ -25,11 +25,8 @@ module XGBoost
  end

  def eval_set(evals, iteration)
- dmats = ::FFI::MemoryPointer.new(:pointer, evals.size)
- dmats.write_array_of_pointer(evals.map { |v| v[0].handle_pointer })
-
- evnames = ::FFI::MemoryPointer.new(:pointer, evals.size)
- evnames.write_array_of_pointer(evals.map { |v| ::FFI::MemoryPointer.from_string(v[1]) })
+ dmats = array_of_pointers(evals.map { |v| v[0].handle_pointer })
+ evnames = array_of_pointers(evals.map { |v| string_pointer(v[1]) })

  out_result = ::FFI::MemoryPointer.new(:pointer)

@@ -52,7 +49,7 @@ module XGBoost
  ntree_limit ||= 0
  out_len = ::FFI::MemoryPointer.new(:uint64)
  out_result = ::FFI::MemoryPointer.new(:pointer)
- check_result FFI.XGBoosterPredict(handle_pointer, data.handle_pointer, 0, ntree_limit, out_len, out_result)
+ check_result FFI.XGBoosterPredict(handle_pointer, data.handle_pointer, 0, ntree_limit, 0, out_len, out_result)
  out = out_result.read_pointer.read_array_of_float(read_uint64(out_len))
  num_class = out.size / data.num_row
  out = out.each_slice(num_class).to_a if num_class > 1
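`predict` receives a single flat array of floats from `XGBoosterPredict`. Dividing its length by the row count gives the number of outputs per row (for example, one probability per class with a multiclass objective), and `each_slice` regroups the flat array row by row. The same step in plain Ruby, with a hypothetical `reshape_predictions` name and made-up numbers:

```ruby
# Mirrors the reshaping step in Booster#predict: a flat array holding
# num_row * num_class values becomes one sub-array per row when num_class > 1;
# with one output per row the flat array is returned unchanged.
def reshape_predictions(flat, num_row)
  num_class = flat.size / num_row
  num_class > 1 ? flat.each_slice(num_class).to_a : flat
end

p reshape_predictions([0.1, 0.7, 0.2, 0.5, 0.3, 0.2], 2)
# => [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2]]
p reshape_predictions([0.9, 0.4], 2)
# => [0.9, 0.4]
```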
@@ -67,7 +64,13 @@ module XGBoost
  def dump(fmap: "", with_stats: false, dump_format: "text")
  out_len = ::FFI::MemoryPointer.new(:uint64)
  out_result = ::FFI::MemoryPointer.new(:pointer)
- check_result FFI.XGBoosterDumpModelEx(handle_pointer, fmap, with_stats ? 1 : 0, dump_format, out_len, out_result)
+
+ names = feature_names || []
+ fnames = array_of_pointers(names.map { |fname| string_pointer(fname) })
+ ftypes = array_of_pointers(feature_types || Array.new(names.size, string_pointer("float")))
+
+ check_result FFI.XGBoosterDumpModelExWithFeatures(handle_pointer, names.size, fnames, ftypes, with_stats ? 1 : 0, dump_format, out_len, out_result)
+
  out_result.read_pointer.get_array_of_string(0, read_uint64(out_len))
  end

@@ -155,7 +158,7 @@ module XGBoost
  end

  def [](key_name)
- key = ::FFI::MemoryPointer.from_string(key_name)
+ key = string_pointer(key_name)
  success = ::FFI::MemoryPointer.new(:int)
  out_result = ::FFI::MemoryPointer.new(:pointer)

@@ -165,8 +168,8 @@ module XGBoost
  end

  def []=(key_name, raw_value)
- key = ::FFI::MemoryPointer.from_string(key_name)
- value = raw_value.nil? ? nil : ::FFI::MemoryPointer.from_string(raw_value)
+ key = string_pointer(key_name)
+ value = raw_value.nil? ? nil : string_pointer(raw_value)

  check_result FFI.XGBoosterSetAttr(handle_pointer, key, value)
  end
@@ -188,6 +191,14 @@ module XGBoost
  @handle.read_pointer
  end

+ def array_of_pointers(values)
+ ::FFI::MemoryPointer.new(:pointer, values.size).write_array_of_pointer(values)
+ end
+
+ def string_pointer(value)
+ ::FFI::MemoryPointer.from_string(value.to_s)
+ end
+
  include Utils
  end
  end
@@ -1,6 +1,6 @@
  module XGBoost
  class Classifier < Model
- def initialize(max_depth: 3, learning_rate: 0.1, n_estimators: 100, objective: "binary:logistic", importance_type: "gain", **options)
+ def initialize(n_estimators: 100, objective: "binary:logistic", importance_type: "gain", **options)
  super
  end

@@ -1,6 +1,6 @@
  module XGBoost
  class DMatrix
- attr_reader :data
+ attr_reader :data, :feature_names, :feature_types

  def initialize(data, label: nil, weight: nil, missing: Float::NAN)
  @data = data
@@ -15,21 +15,42 @@ module XGBoost
  elsif daru?(data)
  nrow, ncol = data.shape
  flat_data = data.map_rows(&:to_a).flatten
- elsif narray?(data)
+ @feature_names = data.each_vector.map(&:name)
+ @feature_types =
+ data.each_vector.map(&:db_type).map do |v|
+ case v
+ when "INTEGER"
+ "int"
+ when "DOUBLE"
+ "float"
+ else
+ raise Error, "Unknown feature type: #{v}"
+ end
+ end
+ elsif numo?(data)
  nrow, ncol = data.shape
- flat_data = data.flatten.to_a
+ elsif rover?(data)
+ nrow, ncol = data.shape
+ @feature_names = data.keys
+ data = data.to_numo
  else
  nrow = data.count
  ncol = data.first.count
  flat_data = data.flatten
  end

- handle_missing(flat_data, missing)
  c_data = ::FFI::MemoryPointer.new(:float, nrow * ncol)
- c_data.write_array_of_float(flat_data)
+ if numo?(data)
+ c_data.write_bytes(data.cast_to(Numo::SFloat).to_string)
+ else
+ handle_missing(flat_data, missing)
+ c_data.write_array_of_float(flat_data)
+ end
  check_result FFI.XGDMatrixCreateFromMat(c_data, nrow, ncol, missing, @handle)

  ObjectSpace.define_finalizer(self, self.class.finalize(handle_pointer))
+
+ @feature_names ||= ncol.times.map { |i| "f#{i}" }
  end

  self.label = label if label
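With this change, a `DMatrix` built from a Daru data frame takes its feature names from the column names and its feature types from each vector's `db_type` (`INTEGER` becomes `int`, `DOUBLE` becomes `float`, anything else raises), Rover input supplies names only, and every other input falls back to `f0`, `f1`, and so on with no types. The same rules in isolation, using `[name, db_type]` pairs instead of Daru vectors; the `feature_metadata` helper and `TYPE_MAP` constant are hypothetical, and the sketch raises `ArgumentError` where the gem raises its own `Error` class:

```ruby
# Hypothetical mapping mirroring the case statement in DMatrix#initialize.
TYPE_MAP = { "INTEGER" => "int", "DOUBLE" => "float" }.freeze

# `columns` is an array of [name, db_type] pairs standing in for Daru vectors;
# pass nil to get the default "f0", "f1", ... names used for other inputs.
def feature_metadata(columns, ncol)
  return [ncol.times.map { |i| "f#{i}" }, nil] if columns.nil?

  names = columns.map(&:first)
  types = columns.map do |(_, db_type)|
    TYPE_MAP.fetch(db_type) { raise ArgumentError, "Unknown feature type: #{db_type}" }
  end
  [names, types]
end

p feature_metadata([["rooms", "INTEGER"], ["area", "DOUBLE"]], 2)
# => [["rooms", "area"], ["int", "float"]]
p feature_metadata(nil, 3)
# => [["f0", "f1", "f2"], nil]
```

These are the values the learning API now copies onto the booster (`booster.feature_names = dtrain.feature_names`), which is what lets `dump` print real column names.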
@@ -60,7 +81,7 @@ module XGBoost
  def group=(group)
  c_data = ::FFI::MemoryPointer.new(:int, group.size)
  c_data.write_array_of_int(group)
- check_result FFI.XGDMatrixSetGroup(handle_pointer, c_data, group.size)
+ check_result FFI.XGDMatrixSetUIntInfo(handle_pointer, "group", c_data, group.size)
  end

  def num_row
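`group=` now passes the group array through `XGDMatrixSetUIntInfo` with the field name `"group"` in place of the `XGDMatrixSetGroup` call. For ranking, each entry is the number of consecutive rows that belong to one query, so the sizes are expected to add up to the row count. A small pre-check along those lines; `check_groups` is a hypothetical helper, not part of the gem:

```ruby
# Hypothetical pre-check for DMatrix#group=: each value is the size of one
# query group of consecutive rows, so the sizes must be positive integers
# that sum to the number of rows in the matrix.
def check_groups(group, num_row)
  unless group.all? { |g| g.is_a?(Integer) && g.positive? }
    raise ArgumentError, "group sizes must be positive integers"
  end
  total = group.sum
  raise ArgumentError, "group sizes sum to #{total}, expected #{num_row}" if total != num_row
  group
end

# Three queries with 2, 3 and 1 documents cover six rows.
p check_groups([2, 3, 1], 6) # => [2, 3, 1]
```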
@@ -120,10 +141,14 @@ module XGBoost
  defined?(Daru::DataFrame) && data.is_a?(Daru::DataFrame)
  end

- def narray?(data)
+ def numo?(data)
  defined?(Numo::NArray) && data.is_a?(Numo::NArray)
  end

+ def rover?(data)
+ defined?(Rover::DataFrame) && data.is_a?(Rover::DataFrame)
+ end
+
  def handle_missing(data, missing)
  data.map! { |v| v.nil? ? missing : v }
  end
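For array-of-arrays and Daru input, `DMatrix` flattens the rows, replaces `nil` with the `missing` value (`Float::NAN` by default) via `handle_missing`, and writes the result as 32-bit floats; Numo and Rover input skip this and copy the raw bytes of a `Numo::SFloat` cast instead. A stdlib-only sketch of the non-Numo path, using `Array#pack("f*")` (native-endian single precision) as a stand-in for the FFI memory write; `to_float_buffer` is hypothetical:

```ruby
# Hypothetical stand-in for the non-Numo branch of DMatrix#initialize:
# flatten the rows, map nil to the missing value, and pack as 32-bit floats.
def to_float_buffer(rows, missing: Float::NAN)
  nrow = rows.count
  ncol = rows.first.count
  flat = rows.flatten.map { |v| v.nil? ? missing : v }
  [flat.pack("f*"), nrow, ncol]
end

buffer, nrow, ncol = to_float_buffer([[1, 2, nil], [4, 5, 6]])
values = buffer.unpack("f*")
p [nrow, ncol, buffer.bytesize] # => [2, 3, 24]
p values[2].nan?                # => true
```

Skipping the per-element Ruby loop for Numo data is what the 0.4.0 changelog refers to as improved performance of Numo datasets.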
@@ -5,19 +5,23 @@ module XGBoost
  begin
  ffi_lib XGBoost.ffi_lib
  rescue LoadError => e
- raise e if ENV["XGB_DEBUG"]
- raise LoadError, "Could not find XGBoost"
+ if e.message.include?("Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib") && e.message.include?("Reason: image not found")
+ raise LoadError, "OpenMP not found. Run `brew install libomp`"
+ else
+ raise e
+ end
  end

  # https://github.com/dmlc/xgboost/blob/master/include/xgboost/c_api.h
  # keep same order

- # error
+ # general
+ attach_function :XGBoostVersion, %i[pointer pointer pointer], :void
  attach_function :XGBGetLastError, %i[], :string

  # dmatrix
  attach_function :XGDMatrixCreateFromMat, %i[pointer uint64 uint64 float pointer], :int
- attach_function :XGDMatrixSetGroup, %i[pointer pointer uint64], :int
+ attach_function :XGDMatrixSetUIntInfo, %i[pointer string pointer uint64], :int
  attach_function :XGDMatrixNumRow, %i[pointer pointer], :int
  attach_function :XGDMatrixNumCol, %i[pointer pointer], :int
  attach_function :XGDMatrixSliceDMatrix, %i[pointer pointer uint64 pointer], :int
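On load failure, the FFI module now inspects the `LoadError` message: if it reports that Homebrew's `libomp.dylib` could not be loaded, it raises a short error telling the user to run `brew install libomp`; otherwise the original error propagates, replacing the old `XGB_DEBUG` switch and generic `Could not find XGBoost` message. The matching logic in isolation, as a hypothetical `translate_load_error` helper that can be exercised without a real failed `ffi_lib` call (the sample error text is made up apart from the two substrings the gem checks):

```ruby
# Hypothetical helper isolating the message check from the FFI module:
# returns a friendlier LoadError for the missing-OpenMP case on macOS,
# otherwise returns the original error unchanged.
OPENMP_PATH = "Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib"

def translate_load_error(error)
  if error.message.include?(OPENMP_PATH) && error.message.include?("Reason: image not found")
    LoadError.new("OpenMP not found. Run `brew install libomp`")
  else
    error
  end
end

begin
  raise LoadError, "dlopen(libxgboost.dylib): #{OPENMP_PATH}\n  Reason: image not found"
rescue LoadError => e
  puts translate_load_error(e).message
end
```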
@@ -32,10 +36,10 @@ module XGBoost
  attach_function :XGBoosterEvalOneIter, %i[pointer int pointer pointer uint64 pointer], :int
  attach_function :XGBoosterFree, %i[pointer], :int
  attach_function :XGBoosterSetParam, %i[pointer string string], :int
- attach_function :XGBoosterPredict, %i[pointer pointer int int pointer pointer], :int
+ attach_function :XGBoosterPredict, %i[pointer pointer int int int pointer pointer], :int
  attach_function :XGBoosterLoadModel, %i[pointer string], :int
  attach_function :XGBoosterSaveModel, %i[pointer string], :int
- attach_function :XGBoosterDumpModelEx, %i[pointer string int string pointer pointer], :int
+ attach_function :XGBoosterDumpModelExWithFeatures, %i[pointer int pointer pointer int string pointer pointer], :int
  attach_function :XGBoosterGetAttr, %i[pointer pointer pointer pointer], :int
  attach_function :XGBoosterSetAttr, %i[pointer pointer pointer], :int
  attach_function :XGBoosterGetAttrNames, %i[pointer pointer pointer], :int
@@ -2,12 +2,8 @@ module XGBoost
  class Model
  attr_reader :booster

- def initialize(max_depth: 3, learning_rate: 0.1, n_estimators: 100, objective: nil, importance_type: "gain", **options)
- @params = {
- max_depth: max_depth,
- objective: objective,
- learning_rate: learning_rate
- }.merge(options)
+ def initialize(n_estimators: 100, importance_type: "gain", **options)
+ @params = options
  @n_estimators = n_estimators
  @importance_type = importance_type
  end
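After this change `Model#initialize` no longer injects `max_depth: 3` and `learning_rate: 0.1`: every keyword other than `n_estimators` and `importance_type` is collected into `@params`, so unspecified parameters fall back to XGBoost's own defaults, matching the Python Scikit-Learn API as the 0.4.0 changelog states. Because subclasses such as `Classifier` call a bare `super`, their `objective` default is forwarded as an ordinary keyword and lands in `@params`. A self-contained mimic of that pattern; `DemoModel` and `DemoClassifier` are stand-ins, not the gem's classes:

```ruby
# Stand-ins mirroring the keyword handling of XGBoost::Model and Classifier.
class DemoModel
  attr_reader :params, :n_estimators

  def initialize(n_estimators: 100, importance_type: "gain", **options)
    @params = options # everything else is passed through as booster params
    @n_estimators = n_estimators
    @importance_type = importance_type
  end
end

class DemoClassifier < DemoModel
  def initialize(n_estimators: 100, objective: "binary:logistic", importance_type: "gain", **options)
    super # bare super forwards every keyword, including objective
  end
end

model = DemoClassifier.new(max_depth: 4)
p model.params[:objective]           # => "binary:logistic"
p model.params[:max_depth]           # => 4
p model.params.key?(:learning_rate)  # => false
```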
@@ -1,6 +1,6 @@
  module XGBoost
  class Ranker < Model
- def initialize(max_depth: 3, learning_rate: 0.1, n_estimators: 100, objective: "rank:pairwise", importance_type: "gain", **options)
+ def initialize(n_estimators: 100, objective: "rank:pairwise", importance_type: "gain", **options)
  super
  end

@@ -1,6 +1,6 @@
  module XGBoost
  class Regressor < Model
- def initialize(max_depth: 3, learning_rate: 0.1, n_estimators: 100, objective: "reg:squarederror", importance_type: "gain", **options)
+ def initialize(n_estimators: 100, objective: "reg:squarederror", importance_type: "gain", **options)
  super
  end

@@ -1,3 +1,3 @@
  module XGBoost
- VERSION = "0.2.0"
+ VERSION = "0.4.1"
  end
Binary file
Binary file
Binary file
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: xgb
  version: !ruby/object:Gem::Version
- version: 0.2.0
+ version: 0.4.1
  platform: ruby
  authors:
  - Andrew Kane
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2020-01-27 00:00:00.000000000 Z
+ date: 2020-08-27 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: ffi
@@ -94,6 +94,20 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
+ - !ruby/object:Gem::Dependency
+ name: rover-df
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
  description:
  email: andrew@chartkick.com
  executables: []