xgb 0.2.0 → 0.4.1

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: ea879f0494a877766dd500203b7f4ec091607a0a0cae609bc3be50fb134b87eb
- data.tar.gz: 1ff52f49405628aa837f4f6ff1558b4b1a624a311b4c359c0a0703499613d35d
+ metadata.gz: bb82540880bbecdc88d82eb71b0d3fd3e7cf276f05205ddc0f1900684c5602a2
+ data.tar.gz: 1463e06dce0ae99fdee5ccc1887d1a24d537fcdac7d89fb685701566083d5600
  SHA512:
- metadata.gz: cd2dcbbc2300bbb5900ae1cdf8abe47fd2c0f0c8dc1b825f53c5988450405298e5b8693be243ed351f7f14715e9f5963eb7c00cd39c1b25983b449398804e6f5
- data.tar.gz: 969c4472a3926f8cd6569269817086ed047e4e64132fd92ab1bb78ab5e8f294c3a4342cfc4e121b8a2866b8f534d1ffaa35b806d9968f198f87bb4de06a64348
+ metadata.gz: 4869e2465af9d824e56dfd180eff0a7b8aacc0dd9788cfb9f5d429e83c51b5bceb93b8ad93fcdb0b7b2e0ec81231204d32f738988af97304b6d2eee33fa2f709
+ data.tar.gz: 4206063987a450cbb82bb9d2ea0cc7cfa8bf8bb9df483d4f088943cc41ba9d396afef81136f9090e65249ed2a3d78806c3fca70de5e198a15f6199ed8b3a01e8
@@ -1,3 +1,29 @@
+ ## 0.4.1 (2020-08-26)
+
+ - Updated XGBoost to 1.2.0
+
+ ## 0.4.0 (2020-05-17)
+
+ - Updated XGBoost to 1.1.0
+ - Changed default `learning_rate` and `max_depth` for Scikit-Learn API to match Python
+ - Added support for Rover
+ - Improved performance of Numo datasets
+ - Improved error message when OpenMP not found on Mac
+
+ ## 0.3.1 (2020-04-16)
+
+ - Added `feature_names` and `feature_types` to `DMatrix`
+ - Added feature names to `dump`
+
+ ## 0.3.0 (2020-02-19)
+
+ - Updated XGBoost to 1.0.0
+
+ ## 0.2.1 (2020-02-11)
+
+ - Fixed `Could not find XGBoost` error on some Linux platforms
+ - Fixed `SignalException` on Windows
+
  ## 0.2.0 (2020-01-26)
 
  - Prefer `XGBoost` over `Xgb`
data/NOTICE.txt CHANGED
@@ -1,3 +1,4 @@
+ Copyright XGBoost contributors
  Copyright 2019-2020 Andrew Kane
 
  Licensed under the Apache License, Version 2.0 (the "License");
data/README.md CHANGED
@@ -2,7 +2,7 @@
 
  [XGBoost](https://github.com/dmlc/xgboost) - high performance gradient boosting - for Ruby
 
- [![Build Status](https://travis-ci.org/ankane/xgboost.svg?branch=master)](https://travis-ci.org/ankane/xgboost)
+ [![Build Status](https://travis-ci.org/ankane/xgboost.svg?branch=master)](https://travis-ci.org/ankane/xgboost) [![Build status](https://ci.appveyor.com/api/projects/status/s8umwyuahvj68m6p/branch/master?svg=true)](https://ci.appveyor.com/project/ankane/xgboost/branch/master)
 
  ## Installation
 
@@ -12,9 +12,11 @@ Add this line to your application’s Gemfile:
  gem 'xgb'
  ```
 
- ## Getting Started
+ On Mac, also install OpenMP:
 
- This library follows the [Python API](https://xgboost.readthedocs.io/en/latest/python/python_api.html), with the `get_` and `set_` prefixes removed from methods to make it more Ruby-like.
+ ```sh
+ brew install libomp
+ ```
 
  ## Learning API
 
@@ -70,7 +72,7 @@ CV
  XGBoost.cv(params, dtrain, nfold: 3, verbose_eval: true)
  ```
 
- Set metadata about a model [master]
+ Set metadata about a model
 
  ```ruby
  booster["key"] = "value"
@@ -135,16 +137,22 @@ Data can be an array of arrays
  [[1, 2, 3], [4, 5, 6]]
  ```
 
- Or a Daru data frame
+ Or a Numo array
 
  ```ruby
- Daru::DataFrame.from_csv("houses.csv")
+ Numo::NArray.cast([[1, 2, 3], [4, 5, 6]])
  ```
 
- Or a Numo NArray
+ Or a Rover data frame
 
  ```ruby
- Numo::DFloat.new(3, 2).seq
+ Rover.read_csv("houses.csv")
+ ```
+
+ Or a Daru data frame
+
+ ```ruby
+ Daru::DataFrame.from_csv("houses.csv")
  ```
 
  ## Helpful Resources
@@ -155,11 +163,13 @@ Numo::DFloat.new(3, 2).seq
  ## Related Projects
 
  - [LightGBM](https://github.com/ankane/lightgbm) - LightGBM for Ruby
- - [Eps](https://github.com/ankane/eps) - Machine Learning for Ruby
+ - [Eps](https://github.com/ankane/eps) - Machine learning for Ruby
 
  ## Credits
 
- Thanks to the [xgboost](https://github.com/PairOnAir/xgboost-ruby) gem for serving as an initial reference.
+ This library follows the [Python API](https://xgboost.readthedocs.io/en/latest/python/python_api.html), with the `get_` and `set_` prefixes removed from methods to make it more Ruby-like.
+
+ Thanks to the [xgboost](https://github.com/PairOnAir/xgboost-ruby) gem for showing how to use FFI.
 
  ## History
 
@@ -174,11 +184,12 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
  - Write, clarify, or fix documentation
  - Suggest or add new features
 
- To get started with development and testing:
+ To get started with development:
 
  ```sh
  git clone https://github.com/ankane/xgboost.git
  cd xgboost
  bundle install
+ bundle exec rake vendor:all
  bundle exec rake test
  ```
@@ -31,7 +31,8 @@ module XGBoost
  booster = Booster.new(params: params)
  num_feature = dtrain.num_col
  booster.set_param("num_feature", num_feature)
- booster.feature_names = num_feature.times.map { |i| "f#{i}" }
+ booster.feature_names = dtrain.feature_names
+ booster.feature_types = dtrain.feature_types
  evals ||= []
 
  if early_stopping_rounds
@@ -156,6 +157,14 @@ module XGBoost
  eval_hist
  end
 
+ def lib_version
+ major = ::FFI::MemoryPointer.new(:int)
+ minor = ::FFI::MemoryPointer.new(:int)
+ patch = ::FFI::MemoryPointer.new(:int)
+ FFI.XGBoostVersion(major, minor, patch)
+ "#{major.read_int}.#{minor.read_int}.#{patch.read_int}"
+ end
+
  private
 
  def mean(arr)
@@ -1,6 +1,6 @@
  module XGBoost
  class Booster
- attr_accessor :best_iteration, :feature_names
+ attr_accessor :best_iteration, :feature_names, :feature_types
 
  def initialize(params: nil, model_file: nil)
  @handle = ::FFI::MemoryPointer.new(:pointer)
@@ -25,11 +25,8 @@ module XGBoost
  end
 
  def eval_set(evals, iteration)
- dmats = ::FFI::MemoryPointer.new(:pointer, evals.size)
- dmats.write_array_of_pointer(evals.map { |v| v[0].handle_pointer })
-
- evnames = ::FFI::MemoryPointer.new(:pointer, evals.size)
- evnames.write_array_of_pointer(evals.map { |v| ::FFI::MemoryPointer.from_string(v[1]) })
+ dmats = array_of_pointers(evals.map { |v| v[0].handle_pointer })
+ evnames = array_of_pointers(evals.map { |v| string_pointer(v[1]) })
 
  out_result = ::FFI::MemoryPointer.new(:pointer)
 
@@ -52,7 +49,7 @@ module XGBoost
  ntree_limit ||= 0
  out_len = ::FFI::MemoryPointer.new(:uint64)
  out_result = ::FFI::MemoryPointer.new(:pointer)
- check_result FFI.XGBoosterPredict(handle_pointer, data.handle_pointer, 0, ntree_limit, out_len, out_result)
+ check_result FFI.XGBoosterPredict(handle_pointer, data.handle_pointer, 0, ntree_limit, 0, out_len, out_result)
  out = out_result.read_pointer.read_array_of_float(read_uint64(out_len))
  num_class = out.size / data.num_row
  out = out.each_slice(num_class).to_a if num_class > 1
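
The unchanged context lines of this hunk turn the flat float buffer returned by the C API into one row per sample whenever more than one value is predicted per row. A minimal pure-Ruby sketch of that step follows; `reshape_predictions` is a hypothetical name, and the literal arrays stand in for the FFI output and `data.num_row`, so nothing here calls the gem itself.

```ruby
# Hypothetical helper mirroring the reshaping done after XGBoosterPredict.
# `flat` stands in for the float array read from the output pointer,
# `num_row` for data.num_row.
def reshape_predictions(flat, num_row)
  num_class = flat.size / num_row
  # Nest only when each row carries more than one value.
  num_class > 1 ? flat.each_slice(num_class).to_a : flat
end

binary = reshape_predictions([0.9, 0.2, 0.7], 3)
multi  = reshape_predictions([0.1, 0.7, 0.2, 0.6, 0.3, 0.1], 2)

p binary
p multi
```

With one value per row the array is returned flat; with several values per row the caller gets an array of arrays.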
@@ -67,7 +64,13 @@ module XGBoost
  def dump(fmap: "", with_stats: false, dump_format: "text")
  out_len = ::FFI::MemoryPointer.new(:uint64)
  out_result = ::FFI::MemoryPointer.new(:pointer)
- check_result FFI.XGBoosterDumpModelEx(handle_pointer, fmap, with_stats ? 1 : 0, dump_format, out_len, out_result)
+
+ names = feature_names || []
+ fnames = array_of_pointers(names.map { |fname| string_pointer(fname) })
+ ftypes = array_of_pointers(feature_types || Array.new(names.size, string_pointer("float")))
+
+ check_result FFI.XGBoosterDumpModelExWithFeatures(handle_pointer, names.size, fnames, ftypes, with_stats ? 1 : 0, dump_format, out_len, out_result)
+
  out_result.read_pointer.get_array_of_string(0, read_uint64(out_len))
  end
 
@@ -155,7 +158,7 @@ module XGBoost
  end
 
  def [](key_name)
- key = ::FFI::MemoryPointer.from_string(key_name)
+ key = string_pointer(key_name)
  success = ::FFI::MemoryPointer.new(:int)
  out_result = ::FFI::MemoryPointer.new(:pointer)
 
@@ -165,8 +168,8 @@ module XGBoost
  end
 
  def []=(key_name, raw_value)
- key = ::FFI::MemoryPointer.from_string(key_name)
- value = raw_value.nil? ? nil : ::FFI::MemoryPointer.from_string(raw_value)
+ key = string_pointer(key_name)
+ value = raw_value.nil? ? nil : string_pointer(raw_value)
 
  check_result FFI.XGBoosterSetAttr(handle_pointer, key, value)
  end
@@ -188,6 +191,14 @@ module XGBoost
  @handle.read_pointer
  end
 
+ def array_of_pointers(values)
+ ::FFI::MemoryPointer.new(:pointer, values.size).write_array_of_pointer(values)
+ end
+
+ def string_pointer(value)
+ ::FFI::MemoryPointer.from_string(value.to_s)
+ end
+
  include Utils
  end
  end
@@ -1,6 +1,6 @@
  module XGBoost
  class Classifier < Model
- def initialize(max_depth: 3, learning_rate: 0.1, n_estimators: 100, objective: "binary:logistic", importance_type: "gain", **options)
+ def initialize(n_estimators: 100, objective: "binary:logistic", importance_type: "gain", **options)
  super
  end
 
@@ -1,6 +1,6 @@
  module XGBoost
  class DMatrix
- attr_reader :data
+ attr_reader :data, :feature_names, :feature_types
 
  def initialize(data, label: nil, weight: nil, missing: Float::NAN)
  @data = data
@@ -15,21 +15,42 @@ module XGBoost
  elsif daru?(data)
  nrow, ncol = data.shape
  flat_data = data.map_rows(&:to_a).flatten
- elsif narray?(data)
+ @feature_names = data.each_vector.map(&:name)
+ @feature_types =
+ data.each_vector.map(&:db_type).map do |v|
+ case v
+ when "INTEGER"
+ "int"
+ when "DOUBLE"
+ "float"
+ else
+ raise Error, "Unknown feature type: #{v}"
+ end
+ end
+ elsif numo?(data)
  nrow, ncol = data.shape
- flat_data = data.flatten.to_a
+ elsif rover?(data)
+ nrow, ncol = data.shape
+ @feature_names = data.keys
+ data = data.to_numo
  else
  nrow = data.count
  ncol = data.first.count
  flat_data = data.flatten
  end
 
- handle_missing(flat_data, missing)
  c_data = ::FFI::MemoryPointer.new(:float, nrow * ncol)
- c_data.write_array_of_float(flat_data)
+ if numo?(data)
+ c_data.write_bytes(data.cast_to(Numo::SFloat).to_string)
+ else
+ handle_missing(flat_data, missing)
+ c_data.write_array_of_float(flat_data)
+ end
  check_result FFI.XGDMatrixCreateFromMat(c_data, nrow, ncol, missing, @handle)
 
  ObjectSpace.define_finalizer(self, self.class.finalize(handle_pointer))
+
+ @feature_names ||= ncol.times.map { |i| "f#{i}" }
  end
 
  self.label = label if label
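
The hunk above derives feature metadata in two ways: Daru column types are translated to XGBoost type strings, and inputs without column names fall back to generated names `f0`, `f1`, and so on. A stand-alone sketch of both rules, assuming nothing beyond plain Ruby; the method names are hypothetical, and `ArgumentError` is substituted for the gem's own `Error` class so the snippet runs without the gem.

```ruby
# Hypothetical stand-alone version of the Daru db_type mapping.
# The gem raises its own Error class; ArgumentError is a substitute here.
def feature_type_for(db_type)
  case db_type
  when "INTEGER" then "int"
  when "DOUBLE" then "float"
  else
    raise ArgumentError, "Unknown feature type: #{db_type}"
  end
end

# Fallback names used when the input carries no column names.
def default_feature_names(ncol)
  ncol.times.map { |i| "f#{i}" }
end

types = %w[INTEGER DOUBLE].map { |t| feature_type_for(t) }
names = default_feature_names(3)

p types
p names
```

Any other column type raises, which matches the `else` branch in the diff.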
@@ -60,7 +81,7 @@ module XGBoost
  def group=(group)
  c_data = ::FFI::MemoryPointer.new(:int, group.size)
  c_data.write_array_of_int(group)
- check_result FFI.XGDMatrixSetGroup(handle_pointer, c_data, group.size)
+ check_result FFI.XGDMatrixSetUIntInfo(handle_pointer, "group", c_data, group.size)
  end
 
  def num_row
@@ -120,10 +141,14 @@ module XGBoost
  defined?(Daru::DataFrame) && data.is_a?(Daru::DataFrame)
  end
 
- def narray?(data)
+ def numo?(data)
  defined?(Numo::NArray) && data.is_a?(Numo::NArray)
  end
 
+ def rover?(data)
+ defined?(Rover::DataFrame) && data.is_a?(Rover::DataFrame)
+ end
+
  def handle_missing(data, missing)
  data.map! { |v| v.nil? ? missing : v }
  end
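
For inputs that are not Numo arrays, the diff shows the data being flattened and then passed through `handle_missing` before it is written to the float buffer. A small sketch of that path on an array of arrays; the sample rows are made up for illustration.

```ruby
# Same body as handle_missing in the diff: replace nil with the
# missing-value marker, in place.
def handle_missing(data, missing)
  data.map! { |v| v.nil? ? missing : v }
end

rows = [[1.0, nil, 3.0], [nil, 5.0, 6.0]] # illustrative data only
flat = rows.flatten                       # row-major order, nils preserved
handle_missing(flat, Float::NAN)

p flat.size
p flat.map { |v| v.nan? }
```

Because `map!` mutates the array, the caller's `flat` already holds the substituted values when it is handed to `write_array_of_float`.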
@@ -5,19 +5,23 @@ module XGBoost
  begin
  ffi_lib XGBoost.ffi_lib
  rescue LoadError => e
- raise e if ENV["XGB_DEBUG"]
- raise LoadError, "Could not find XGBoost"
+ if e.message.include?("Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib") && e.message.include?("Reason: image not found")
+ raise LoadError, "OpenMP not found. Run `brew install libomp`"
+ else
+ raise e
+ end
  end
 
  # https://github.com/dmlc/xgboost/blob/master/include/xgboost/c_api.h
  # keep same order
 
- # error
+ # general
+ attach_function :XGBoostVersion, %i[pointer pointer pointer], :void
  attach_function :XGBGetLastError, %i[], :string
 
  # dmatrix
  attach_function :XGDMatrixCreateFromMat, %i[pointer uint64 uint64 float pointer], :int
- attach_function :XGDMatrixSetGroup, %i[pointer pointer uint64], :int
+ attach_function :XGDMatrixSetUIntInfo, %i[pointer string pointer uint64], :int
  attach_function :XGDMatrixNumRow, %i[pointer pointer], :int
  attach_function :XGDMatrixNumCol, %i[pointer pointer], :int
  attach_function :XGDMatrixSliceDMatrix, %i[pointer pointer uint64 pointer], :int
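
The rescue block at the top of this hunk no longer replaces every `LoadError` with `Could not find XGBoost`; it rewrites only the missing-libomp case and re-raises everything else unchanged. The decision can be sketched in isolation as below. `translate_load_error` is a hypothetical helper, and the two sample messages are invented to exercise each branch rather than copied from a real loader.

```ruby
# Hypothetical helper isolating the message check from the rescue block.
def translate_load_error(error)
  msg = error.message
  if msg.include?("Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib") &&
     msg.include?("Reason: image not found")
    LoadError.new("OpenMP not found. Run `brew install libomp`")
  else
    error # anything else is passed through untouched
  end
end

# Invented messages, one per branch.
omp_error = LoadError.new(
  "Library not loaded: /usr/local/opt/libomp/lib/libomp.dylib\n  Reason: image not found"
)
other_error = LoadError.new("cannot open shared object file")

puts translate_load_error(omp_error).message
puts translate_load_error(other_error).message
```

Passing other errors through keeps the original loader message visible, which is what the removed `XGB_DEBUG` switch used to be needed for.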
@@ -32,10 +36,10 @@ module XGBoost
  attach_function :XGBoosterEvalOneIter, %i[pointer int pointer pointer uint64 pointer], :int
  attach_function :XGBoosterFree, %i[pointer], :int
  attach_function :XGBoosterSetParam, %i[pointer string string], :int
- attach_function :XGBoosterPredict, %i[pointer pointer int int pointer pointer], :int
+ attach_function :XGBoosterPredict, %i[pointer pointer int int int pointer pointer], :int
  attach_function :XGBoosterLoadModel, %i[pointer string], :int
  attach_function :XGBoosterSaveModel, %i[pointer string], :int
- attach_function :XGBoosterDumpModelEx, %i[pointer string int string pointer pointer], :int
+ attach_function :XGBoosterDumpModelExWithFeatures, %i[pointer int pointer pointer int string pointer pointer], :int
  attach_function :XGBoosterGetAttr, %i[pointer pointer pointer pointer], :int
  attach_function :XGBoosterSetAttr, %i[pointer pointer pointer], :int
  attach_function :XGBoosterGetAttrNames, %i[pointer pointer pointer], :int
@@ -2,12 +2,8 @@ module XGBoost
  class Model
  attr_reader :booster
 
- def initialize(max_depth: 3, learning_rate: 0.1, n_estimators: 100, objective: nil, importance_type: "gain", **options)
- @params = {
- max_depth: max_depth,
- objective: objective,
- learning_rate: learning_rate
- }.merge(options)
+ def initialize(n_estimators: 100, importance_type: "gain", **options)
+ @params = options
  @n_estimators = n_estimators
  @importance_type = importance_type
  end
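
This constructor change is what the 0.4.0 changelog entry about `learning_rate` and `max_depth` refers to: the wrapper stops injecting its own values for those two parameters and forwards only what the caller supplied. A sketch of the resulting params hash before and after, using two hypothetical free-standing methods in place of `Model#initialize` (the `n_estimators` and `importance_type` arguments are left out because they never enter the hash).

```ruby
# Hypothetical stand-in for the 0.2.0 constructor: wrapper-level defaults
# are always written into the params hash.
def params_before(max_depth: 3, learning_rate: 0.1, objective: nil, **options)
  { max_depth: max_depth, objective: objective, learning_rate: learning_rate }.merge(options)
end

# Hypothetical stand-in for the 0.4.1 constructor: only caller-supplied
# options are forwarded. Subclasses pass `objective` along via `super`,
# so it arrives here as an ordinary option.
def params_after(**options)
  options
end

old_params = params_before(objective: "reg:squarederror")
new_params = params_after(objective: "reg:squarederror")
tuned      = params_after(objective: "reg:squarederror", max_depth: 4)

p old_params
p new_params
p tuned
```

Code that relied on the old wrapper defaults can keep them by passing `max_depth: 3, learning_rate: 0.1` explicitly.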
@@ -1,6 +1,6 @@
  module XGBoost
  class Ranker < Model
- def initialize(max_depth: 3, learning_rate: 0.1, n_estimators: 100, objective: "rank:pairwise", importance_type: "gain", **options)
+ def initialize(n_estimators: 100, objective: "rank:pairwise", importance_type: "gain", **options)
  super
  end
 
@@ -1,6 +1,6 @@
  module XGBoost
  class Regressor < Model
- def initialize(max_depth: 3, learning_rate: 0.1, n_estimators: 100, objective: "reg:squarederror", importance_type: "gain", **options)
+ def initialize(n_estimators: 100, objective: "reg:squarederror", importance_type: "gain", **options)
  super
  end
 
@@ -1,3 +1,3 @@
  module XGBoost
- VERSION = "0.2.0"
+ VERSION = "0.4.1"
  end
Binary file
Binary file
Binary file
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: xgb
  version: !ruby/object:Gem::Version
- version: 0.2.0
+ version: 0.4.1
  platform: ruby
  authors:
  - Andrew Kane
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2020-01-27 00:00:00.000000000 Z
+ date: 2020-08-27 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: ffi
@@ -94,6 +94,20 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
+ - !ruby/object:Gem::Dependency
+ name: rover-df
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
  description:
  email: andrew@chartkick.com
  executables: []