eps 0.3.1 → 0.3.6

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
- metadata.gz: a59850fe508d404a023145710505e721f1bfc24935a30a090aee09d179887d3a
- data.tar.gz: 8218bc5bb63ee5ebbd23a8e9a129bcd76789b1f6bb628d57b015f1d5740183ac
+ metadata.gz: d56573908e892d8d1959d66c7b6f2940f8930a2d0f2dfd5d4da75e2ff7cfdb63
+ data.tar.gz: 9eaf1a06c8c51ba15d9b4468796fc869f2933945494d027b54789304080c5d5b
 SHA512:
- metadata.gz: db1011e9228763dc0a98e1e57d1c9e18a297d362cea18b33bf8eeffecce853ea49d4273ae4e782a6de2be37711e9e6373810e5517558248489e696b477c0848b
- data.tar.gz: 6b9f52453be9d2ad7a29a4703508763988447de64a7599c53f9b9d3b0135e105130aba3c2679fed17ea60ba7242b6bd0d3cac9c5c2b796fe93f9009f0bbbcb30
+ metadata.gz: 971dbd2a95a280ed50925df68a29018ba7b3bccb7094b1374923a8ce7d100720202245843e003b26447832e9c1f8285bafcc7692020f5971a56c0a8e89a12afb
+ data.tar.gz: de06585dc75608b0f8c62188cce351987a0cd53f3b12889d4d63de28ed81ae1b143e31f47ac8c53083eeb250e18c5f8b721fff94a378e14203fd8fa90ba3e440
@@ -1,7 +1,30 @@
+ ## 0.3.6 (2020-06-19)
+
+ - Fixed error with text features for LightGBM
+
+ ## 0.3.5 (2020-06-10)
+
+ - Added `learning_rate` option for LightGBM
+ - Added support for Numo and Rover
+
+ ## 0.3.4 (2020-04-05)
+
+ - Added `predict_probability` for classification
+
+ ## 0.3.3 (2020-02-24)
+
+ - Fixed errors and incorrect predictions with boolean columns
+ - Fixed deprecation warnings in Ruby 2.7
+
+ ## 0.3.2 (2019-12-08)
+
+ - Added support for GSLR
+
 ## 0.3.1 (2019-12-06)

 - Added `weight` option for LightGBM and linear regression
 - Added `intercept` option for linear regression
+ - Added LightGBM evaluator safety check
 - Fixed `Unknown label` error for LightGBM
 - Fixed error message for unstable solutions with linear regression

data/README.md CHANGED
@@ -4,7 +4,6 @@ Machine learning for Ruby

 - Build predictive models quickly and easily
 - Serve models built in Ruby, Python, R, and more
- - No prior knowledge of machine learning required :tada:

 Check out [this post](https://ankane.org/rails-meet-data-science) for more info on machine learning with Rails

@@ -314,7 +313,7 @@ y = [1, 2, 3]
 Eps::Model.new(x, y)
 ```

- Or pass arrays of arrays
+ Data can be an array of arrays

 ```ruby
 x = [[1, 2], [2, 0], [3, 1]]
@@ -322,18 +321,29 @@ y = [1, 2, 3]
 Eps::Model.new(x, y)
 ```

- ### Daru
+ Or Numo arrays

- Eps works well with Daru data frames.
+ ```ruby
+ x = Numo::NArray.cast([[1, 2], [2, 0], [3, 1]])
+ y = Numo::NArray.cast([1, 2, 3])
+ Eps::Model.new(x, y)
+ ```
+
+ Or a Rover data frame

 ```ruby
- df = Daru::DataFrame.from_csv("houses.csv")
+ df = Rover.read_csv("houses.csv")
 Eps::Model.new(df, target: "price")
 ```

- ### CSVs
+ Or a Daru data frame

- When importing data from CSV files, be sure to convert numeric fields. The `table` method does this automatically.
+ ```ruby
+ df = Daru::DataFrame.from_csv("houses.csv")
+ Eps::Model.new(df, target: "price")
+ ```
+
+ When reading CSV files directly, be sure to convert numeric fields. The `table` method does this automatically.

 ```ruby
 CSV.table("data.csv").map { |row| row.to_h }
@@ -353,9 +363,23 @@ Eps supports:
 - Linear Regression
 - Naive Bayes

+ ### LightGBM
+
+ Pass the learning rate with:
+
+ ```ruby
+ Eps::Model.new(data, learning_rate: 0.01)
+ ```
+
 ### Linear Regression

- To speed up training on large datasets with linear regression, [install GSL](https://www.gnu.org/software/gsl/). With Homebrew, you can use:
+ By default, an intercept is included. Disable this with:
+
+ ```ruby
+ Eps::Model.new(data, intercept: false)
+ ```
+
+ To speed up training on large datasets with linear regression, [install GSL](https://github.com/ankane/gslr#gsl-installation). With Homebrew, you can use:

 ```sh
 brew install gsl
@@ -364,17 +388,21 @@ brew install gsl
 Then, add this line to your application’s Gemfile:

 ```ruby
- gem 'gsl', group: :development
+ gem 'gslr', group: :development
 ```

 It only needs to be available in environments used to build the model.

- By default, an intercept is included. Disable this with:
+ ## Probability
+
+ To get the probability of each category for predictions with classification, use:

 ```ruby
- Eps::Model.new(data, intercept: false)
+ model.predict_probability(data)
 ```

+ Naive Bayes is known to produce poor probability estimates, so stick with LightGBM if you need this.
+
 ## Validation Options

 Pass your own validation set with:
@@ -410,7 +438,7 @@ The database is another place you can store models. It’s good if you retrain m
 Create an ActiveRecord model to store the predictive model.

 ```sh
- rails g model Model key:string:uniq data:text
+ rails generate model Model key:string:uniq data:text
 ```

 Store the model with:
@@ -520,11 +548,11 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
 - Write, clarify, or fix documentation
 - Suggest or add new features

- To get started with development and testing:
+ To get started with development:

 ```sh
 git clone https://github.com/ankane/eps.git
 cd eps
 bundle install
- rake test
+ bundle exec rake test
 ```
@@ -2,33 +2,18 @@ module Eps
 class BaseEstimator
 def initialize(data = nil, y = nil, **options)
 @options = options.dup
+ @trained = false
+ @text_encoders = {}
 # TODO better pattern - don't pass most options to train
- options.delete(:intercept)
 train(data, y, **options) if data
 end

 def predict(data)
- singular = data.is_a?(Hash)
- data = [data] if singular
-
- data = Eps::DataFrame.new(data)
-
- @evaluator.features.each do |k, type|
- values = data.columns[k]
- raise ArgumentError, "Missing column: #{k}" if !values
- column_type = Utils.column_type(values.compact, k) if values
-
- if !column_type.nil?
- if (type == "numeric" && column_type != "numeric") || (type != "numeric" && column_type != "categorical")
- raise ArgumentError, "Bad type for column #{k}: Expected #{type} but got #{column_type}"
- end
- end
- # TODO check for unknown values for categorical features
- end
-
- predictions = @evaluator.predict(data)
+ _predict(data, false)
+ end

- singular ? predictions.first : predictions
+ def predict_probability(data)
+ _predict(data, true)
 end

 def evaluate(data, y = nil, target: nil, weight: nil)
@@ -48,6 +33,8 @@ module Eps
 end

 def summary(extended: false)
+ raise "Summary not available for loaded models" unless @trained
+
 str = String.new("")

 if @validation_set
@@ -72,7 +59,31 @@ module Eps

 private

- def train(data, y = nil, target: nil, weight: nil, split: nil, validation_set: nil, verbose: nil, text_features: nil, early_stopping: nil)
+ def _predict(data, probabilities)
+ singular = data.is_a?(Hash)
+ data = [data] if singular
+
+ data = Eps::DataFrame.new(data)
+
+ @evaluator.features.each do |k, type|
+ values = data.columns[k]
+ raise ArgumentError, "Missing column: #{k}" if !values
+ column_type = Utils.column_type(values.compact, k) if values
+
+ if !column_type.nil?
+ if (type == "numeric" && column_type != "numeric") || (type != "numeric" && column_type != "categorical")
+ raise ArgumentError, "Bad type for column #{k}: Expected #{type} but got #{column_type}"
+ end
+ end
+ # TODO check for unknown values for categorical features
+ end
+
+ predictions = @evaluator.predict(data, probabilities: probabilities)
+
+ singular ? predictions.first : predictions
+ end
+
+ def train(data, y = nil, target: nil, weight: nil, split: nil, validation_set: nil, text_features: nil, **options)
 data, @target = prep_data(data, y, target, weight)
 @target_type = Utils.column_type(data.label, @target)

@@ -164,11 +175,13 @@ module Eps
 raise "No data in validation set" if validation_set && validation_set.empty?

 @validation_set = validation_set
- @evaluator = _train(verbose: verbose, early_stopping: early_stopping)
+ @evaluator = _train(**options)

 # reset pmml
 @pmml = nil

+ @trained = true
+
 nil
 end

@@ -197,29 +210,38 @@ module Eps
 [data, target]
 end

- def prep_text_features(train_set)
- @text_encoders = {}
+ def prep_text_features(train_set, fit: true)
 @text_features.each do |k, v|
- # reset vocabulary
- v.delete(:vocabulary)
+ if fit
+ # reset vocabulary
+ v.delete(:vocabulary)
+
+ # TODO determine max features automatically
+ # start based on number of rows
+ encoder = Eps::TextEncoder.new(**v)
+ counts = encoder.fit(train_set.columns.delete(k))
+ else
+ encoder = @text_encoders[k]
+ counts = encoder.transform(train_set.columns.delete(k))
+ end

- # TODO determine max features automatically
- # start based on number of rows
- encoder = Eps::TextEncoder.new(v)
- counts = encoder.fit(train_set.columns.delete(k))
 encoder.vocabulary.each do |word|
 train_set.columns[[k, word]] = [0] * counts.size
 end
+
 counts.each_with_index do |ci, i|
 ci.each do |word, count|
 word_key = [k, word]
 train_set.columns[word_key][i] = 1 if train_set.columns.key?(word_key)
 end
 end
- @text_encoders[k] = encoder

- # update vocabulary
- v[:vocabulary] = encoder.vocabulary
+ if fit
+ @text_encoders[k] = encoder
+
+ # update vocabulary
+ v[:vocabulary] = encoder.vocabulary
+ end
 end

 raise "No features left" if train_set.columns.empty?
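The change above splits `prep_text_features` into a fit pass (training set) and a transform-only pass (validation set), so validation reuses the vocabulary learned during training instead of fitting its own. A minimal standalone sketch of that fit/transform split — the `MiniTextEncoder` class and its whitespace tokenization are illustrative, not Eps's actual `TextEncoder`:

```ruby
# Fit learns a vocabulary from the training texts; transform only
# counts words already in that vocabulary, dropping unseen ones.
class MiniTextEncoder
  attr_reader :vocabulary

  def fit(texts)
    @vocabulary = texts.flat_map(&:split).uniq
    transform(texts)
  end

  def transform(texts)
    texts.map do |text|
      counts = Hash.new(0)
      text.split.each { |w| counts[w] += 1 if @vocabulary.include?(w) }
      counts
    end
  end
end

encoder = MiniTextEncoder.new
encoder.fit(["red red blue", "blue green"])
encoder.transform(["red yellow"]) # => [{"red" => 1}] ("yellow" is unseen)
```

Calling `transform` on the validation set keeps train and validation feature columns aligned, which is the bug the 0.3.6 release fixed.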
@@ -233,7 +255,7 @@ module Eps

 def check_missing(c, name)
 raise ArgumentError, "Missing column: #{name}" if !c
- raise ArgumentError, "Missing values in column #{name}" if c.any?(&:nil?)
+ raise ArgumentError, "Missing values in column #{name}" if c.to_a.any?(&:nil?)
 end

 def check_missing_value(df)
@@ -10,7 +10,7 @@ module Eps
 data.columns.each do |k, v|
 @columns[k] = v
 end
- elsif daru?(data)
+ elsif rover?(data) || daru?(data)
 data.to_h.each do |k, v|
 @columns[k.to_s] = v.to_a
 end
@@ -19,6 +19,8 @@ module Eps
 @columns[k.to_s] = v.to_a
 end
 else
+ data = data.to_a if numo?(data)
+
 if data.any?
 row = data[0]

@@ -140,8 +142,16 @@

 private

+ def numo?(x)
+ defined?(Numo::NArray) && x.is_a?(Numo::NArray)
+ end
+
+ def rover?(x)
+ defined?(Rover::DataFrame) && x.is_a?(Rover::DataFrame)
+ end
+
 def daru?(x)
- defined?(Daru) && x.is_a?(Daru::DataFrame)
+ defined?(Daru::DataFrame) && x.is_a?(Daru::DataFrame)
 end
 end
 end
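The new `numo?` and `rover?` checks (and the tightened `daru?`) rely on `defined?`, which returns nil rather than raising `NameError` when a constant path is not loaded — so the gem can accept these types without requiring the libraries. A sketch of the pattern:

```ruby
# Duck-type checks for optional dependencies. `defined?` yields nil
# (falsy) when the constant is absent, so no NameError is raised.
def numo?(x)
  defined?(Numo::NArray) && x.is_a?(Numo::NArray)
end

def rover?(x)
  defined?(Rover::DataFrame) && x.is_a?(Rover::DataFrame)
end

# Checking Daru::DataFrame (not just Daru) matters: some other
# library could define a Daru module without the DataFrame class.
def daru?(x)
  defined?(Daru::DataFrame) && x.is_a?(Daru::DataFrame)
end

numo?([1, 2, 3]) # falsy whether or not numo-narray is loaded
```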
@@ -11,19 +11,15 @@ module Eps
 @text_features = text_features
 end

- def predict(data)
+ def predict(data, probabilities: false)
+ raise "Probabilities not supported" if probabilities && @objective == "regression"
+
 rows = data.map(&:to_h)

 # sparse matrix
 @text_features.each do |k, v|
- encoder = TextEncoder.new(v)
-
- values = data.columns.delete(k)
- counts = encoder.transform(values)
-
- encoder.vocabulary.each do |word|
- data.columns[[k, word]] = [0] * values.size
- end
+ encoder = TextEncoder.new(**v)
+ counts = encoder.transform(data.columns[k])

 counts.each_with_index do |xc, i|
 row = rows[i]
@@ -38,17 +34,28 @@
 when "regression"
 sum_trees(rows, @trees)
 when "binary"
- sum_trees(rows, @trees).map { |s| @labels[sigmoid(s) > 0.5 ? 1 : 0] }
+ prob = sum_trees(rows, @trees).map { |s| sigmoid(s) }
+ if probabilities
+ prob.map { |v| @labels.zip([1 - v, v]).to_h }
+ else
+ prob.map { |v| @labels[v > 0.5 ? 1 : 0] }
+ end
 else
 tree_scores = []
 num_trees = @trees.size / @labels.size
 @trees.each_slice(num_trees).each do |trees|
 tree_scores << sum_trees(rows, trees)
 end
- data.size.times.map do |i|
+ rows.size.times.map do |i|
 v = tree_scores.map { |s| s[i] }
- idx = v.map.with_index.max_by { |v2, _| v2 }.last
- @labels[idx]
+ if probabilities
+ exp = v.map { |vi| Math.exp(vi) }
+ sum = exp.sum
+ @labels.zip(exp.map { |e| e / sum }).to_h
+ else
+ idx = v.map.with_index.max_by { |v2, _| v2 }.last
+ @labels[idx]
+ end
 end
 end
 end
@@ -81,7 +88,7 @@ module Eps
 else
 case node.operator
 when "equal"
- v == node.value
+ v.to_s == node.value
 when "in"
 node.value.include?(v)
 when "greaterThan"
@@ -109,7 +116,7 @@
 end

 def sigmoid(x)
- 1.0 / (1 + Math::E**(-x))
+ 1.0 / (1 + Math.exp(-x))
 end
 end
 end
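The evaluator now derives labels and probabilities from the same raw scores: a sigmoid for binary (`Math.exp(-x)` replaces the equivalent but less conventional `Math::E**(-x)`), and a softmax over the per-class tree sums for multiclass. Both conversions, as standalone functions:

```ruby
# Binary: raw score -> probability of the positive class.
def sigmoid(x)
  1.0 / (1 + Math.exp(-x))
end

# Multiclass: one raw score per class -> normalized probabilities.
def softmax(scores)
  exp = scores.map { |s| Math.exp(s) }
  sum = exp.sum
  exp.map { |e| e / sum }
end

sigmoid(0.0)            # => 0.5
softmax([0.0, 0.0])     # => [0.5, 0.5]
```

The label-only paths (`v > 0.5`, argmax of the scores) give the same answer as thresholding or argmaxing these probabilities, since sigmoid and softmax are monotonic.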
@@ -9,7 +9,9 @@ module Eps
 @text_features = text_features || {}
 end

- def predict(x)
+ def predict(x, probabilities: false)
+ raise "Probabilities not supported" if probabilities
+
 intercept = @coefficients["_intercept"] || 0.0
 scores = [intercept] * x.size

@@ -19,10 +21,11 @@
 case type
 when "categorical"
 x.columns[k].each_with_index do |xv, i|
- scores[i] += @coefficients[[k, xv]].to_f
+ # TODO clean up
+ scores[i] += (@coefficients[[k, xv]] || @coefficients[[k, xv.to_s]]).to_f
 end
 when "text"
- encoder = TextEncoder.new(@text_features[k])
+ encoder = TextEncoder.new(**@text_features[k])
 counts = encoder.transform(x.columns[k])
 coef = {}
 @coefficients.each do |k2, v|
@@ -10,14 +10,15 @@ module Eps
 @legacy = legacy
 end

- def predict(x)
+ def predict(x, probabilities: false)
 probs = calculate_class_probabilities(x)
 probs.map do |xp|
- # convert probabilities
- # not needed when just returning label
- # sum = xp.values.map { |v| Math.exp(v) }.sum.to_f
- # p xp.map { |k, v| [k, Math.exp(v) / sum] }.to_h
- xp.sort_by { |k, v| [-v, k] }[0][0]
+ if probabilities
+ sum = xp.values.map { |v| Math.exp(v) }.sum.to_f
+ xp.map { |k, v| [k, Math.exp(v) / sum] }.to_h
+ else
+ xp.sort_by { |k, v| [-v, k] }[0][0]
+ end
 end
 end

@@ -38,7 +39,8 @@ module Eps
 case type
 when "categorical"
 x.columns[k].each_with_index do |xi, i|
- vc = probabilities[:conditional][k][xi]
+ # TODO clean this up
+ vc = probabilities[:conditional][k][xi] || probabilities[:conditional][k][xi.to_s]

 # unknown value if not vc
 if vc
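In the `predict` change above, Naive Bayes computes per-class scores in log space; `predict_probability` exponentiates and normalizes them (a softmax of the log scores), while plain `predict` just takes the best-scoring label, breaking ties by label name. A standalone sketch, assuming a hash of per-class log scores:

```ruby
# Turn per-class log scores into normalized probabilities.
def normalize_log_scores(xp)
  sum = xp.values.map { |v| Math.exp(v) }.sum.to_f
  xp.map { |k, v| [k, Math.exp(v) / sum] }.to_h
end

# Or just pick the best label, breaking score ties by label name.
def best_label(xp)
  xp.sort_by { |k, v| [-v, k] }[0][0]
end

scores = {"yes" => -1.0, "no" => -2.0}
best_label(scores) # => "yes"
```

One caveat worth noting: for very negative log scores this direct exponentiation can underflow to zero; shifting by the maximum (log-sum-exp) would be the standard fix.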
@@ -17,7 +17,7 @@ module Eps
 str
 end

- def _train(verbose: nil, early_stopping: nil)
+ def _train(verbose: nil, early_stopping: nil, learning_rate: 0.1)
 train_set = @train_set
 validation_set = @validation_set.dup
 summary_label = train_set.label
@@ -57,10 +57,13 @@

 # text feature encoding
 prep_text_features(train_set)
- prep_text_features(validation_set) if validation_set
+ prep_text_features(validation_set, fit: false) if validation_set

 # create params
- params = {objective: objective}
+ params = {
+ objective: objective,
+ learning_rate: learning_rate
+ }
 params[:num_classes] = labels.size if objective == "multiclass"
 if train_set.size < 30
 params[:min_data_in_bin] = 1
@@ -121,25 +124,30 @@ module Eps
 def check_evaluator(objective, labels, booster, booster_set, evaluator, evaluator_set)
 expected = @booster.predict(booster_set.map_rows(&:to_a))
 if objective == "multiclass"
- expected.map! do |v|
- labels[v.map.with_index.max_by { |v2, _| v2 }.last]
- end
+ actual = evaluator.predict(evaluator_set, probabilities: true)
+ # just compare first for now
+ expected.map! { |v| v.first }
+ actual.map! { |v| v.values.first }
 elsif objective == "binary"
- expected.map! { |v| labels[v >= 0.5 ? 1 : 0] }
+ actual = evaluator.predict(evaluator_set, probabilities: true).map { |v| v.values.last }
+ else
+ actual = evaluator.predict(evaluator_set)
 end
- actual = evaluator.predict(evaluator_set)

- regression = objective == "regression"
+ regression = objective == "regression" || objective == "binary"
 bad_observations = []
 expected.zip(actual).each_with_index do |(exp, act), i|
- success = regression ? (act - exp).abs < 0.001 : act == exp
+ success = (act - exp).abs < 0.001
 unless success
 bad_observations << {expected: exp, actual: act, data_point: evaluator_set[i].map(&:itself).first}
 end
 end

 if bad_observations.any?
- raise "Bug detected in evaluator. Please report an issue. Bad data points: #{bad_observations.inspect}"
+ bad_observations.each do |obs|
+ p obs
+ end
+ raise "Bug detected in evaluator. Please report an issue."
 end
 end

@@ -37,6 +37,7 @@ module Eps
 str
 end

+ # TODO use keyword arguments for gsl and intercept in 0.4.0
 def _train(**options)
 raise "Target must be numeric" if @target_type != "numeric"
 check_missing_value(@train_set)
@@ -50,17 +51,35 @@ module Eps

 x = data.map_rows(&:to_a)

- intercept = @options.key?(:intercept) ? @options[:intercept] : true
- if intercept
+ gsl =
+ if options.key?(:gsl)
+ options[:gsl]
+ elsif defined?(GSL)
+ true
+ elsif defined?(GSLR)
+ :gslr
+ else
+ false
+ end
+
+ intercept = options.key?(:intercept) ? options[:intercept] : true
+ if intercept && gsl != :gslr
 data.size.times do |i|
 x[i].unshift(1)
 end
 end

- gsl = options.key?(:gsl) ? options[:gsl] : defined?(GSL)
-
 v3 =
- if gsl
+ if gsl == :gslr
+ model = GSLR::OLS.new(intercept: intercept)
+ model.fit(x, data.label, weight: data.weight)
+
+ @covariance = model.covariance
+
+ coefficients = model.coefficients.dup
+ coefficients.unshift(model.intercept) if intercept
+ coefficients
+ elsif gsl
 x = GSL::Matrix.alloc(*x)
 y = GSL::Vector.alloc(data.label)
 w = GSL::Vector.alloc(data.weight) if data.weight
@@ -196,7 +215,11 @@ module Eps

 def diagonal
 @diagonal ||= begin
- if covariance.respond_to?(:each)
+ if covariance.is_a?(Array)
+ covariance.size.times.map do |i|
+ covariance[i][i]
+ end
+ elsif covariance.respond_to?(:each)
 d = covariance.each(:diagonal).to_a
 @removed.each do |i|
 d.insert(i, 0)
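The new `Array` branch in `diagonal` handles the covariance coming back as a plain nested array (presumably what GSLR's `model.covariance` returns, versus a GSL matrix object). Extracting the diagonal of a square matrix stored as row arrays is just:

```ruby
# Diagonal of a square matrix stored as an array of row arrays.
def diagonal(covariance)
  covariance.size.times.map { |i| covariance[i][i] }
end

diagonal([[4.0, 1.0], [1.0, 9.0]]) # => [4.0, 9.0]
```

These diagonal entries are the coefficient variances, which the summary uses for standard errors.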
@@ -17,7 +17,7 @@ module Eps
 str
 end

- def _train(smoothing: 1, **options)
+ def _train(smoothing: 1)
 raise "Target must be strings" if @target_type != "categorical"
 check_missing_value(@train_set)
 check_missing_value(@validation_set) if @validation_set
@@ -210,10 +210,10 @@
 probabilities[:conditional].each do |k, v|
 xml.BayesInput(fieldName: k) do
 if features[k] == "categorical"
- v.sort_by { |k2, _| k2 }.each do |k2, v2|
+ v.sort_by { |k2, _| k2.to_s }.each do |k2, v2|
 xml.PairCounts(value: k2) do
 xml.TargetValueCounts do
- v2.sort_by { |k2, _| k2 }.each do |k3, v3|
+ v2.sort_by { |k2, _| k2.to_s }.each do |k3, v3|
 xml.TargetValueCount(value: k3, count: v3)
 end
 end
@@ -221,7 +221,7 @@
 end
 end
 else
 xml.TargetValueStats do
- v.sort_by { |k2, _| k2 }.each do |k2, v2|
+ v.sort_by { |k2, _| k2.to_s }.each do |k2, v2|
 xml.TargetValueStat(value: k2) do
 xml.GaussianDistribution(mean: v2[:mean], variance: v2[:stdev]**2)
@@ -233,7 +233,7 @@
 end
 xml.BayesOutput(fieldName: "target") do
 xml.TargetValueCounts do
- probabilities[:prior].sort_by { |k, _| k }.each do |k, v|
+ probabilities[:prior].sort_by { |k, _| k.to_s }.each do |k, v|
 xml.TargetValueCount(value: k, count: v)
 end
 end
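The repeated `sort_by { |k, _| k.to_s }` changes guard against mixed key types: Ruby's `<=>` returns nil across unrelated types, so a plain sort raises `ArgumentError` once boolean or numeric categories (the 0.3.3 boolean-column fix) flow into the PMML export alongside strings. Sorting on the string form sidesteps that:

```ruby
# Mixed-type keys cannot be sorted directly: <=> fails across types,
# but every key has a string form, so sort on that instead.
keys = [true, 2, "a"]

sorted = keys.sort_by { |k| k.to_s }
# => [2, "a", true]  ("2" < "a" < "true" in string order)
```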
@@ -1,3 +1,3 @@
 module Eps
- VERSION = "0.3.1"
+ VERSION = "0.3.6"
 end
metadata CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: eps
 version: !ruby/object:Gem::Version
- version: 0.3.1
+ version: 0.3.6
 platform: ruby
 authors:
 - Andrew Kane
 autorequire:
 bindir: bin
 cert_chain: []
- date: 2019-12-06 00:00:00.000000000 Z
+ date: 2020-06-19 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
 name: lightgbm
@@ -80,6 +80,20 @@ dependencies:
 - - ">="
 - !ruby/object:Gem::Version
 version: '0'
+ - !ruby/object:Gem::Dependency
+ name: numo-narray
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
 - !ruby/object:Gem::Dependency
 name: rake
 requirement: !ruby/object:Gem::Requirement
@@ -94,6 +108,20 @@ dependencies:
 - - ">="
 - !ruby/object:Gem::Version
 version: '0'
+ - !ruby/object:Gem::Dependency
+ name: rover-df
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
 description:
 email: andrew@chartkick.com
 executables: []
@@ -143,7 +171,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 - !ruby/object:Gem::Version
 version: '0'
 requirements: []
- rubygems_version: 3.0.3
+ rubygems_version: 3.1.2
 signing_key:
 specification_version: 4
 summary: Machine learning for Ruby. Supports regression (linear regression) and classification