eps 0.3.1 → 0.3.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: a59850fe508d404a023145710505e721f1bfc24935a30a090aee09d179887d3a
- data.tar.gz: 8218bc5bb63ee5ebbd23a8e9a129bcd76789b1f6bb628d57b015f1d5740183ac
+ metadata.gz: d56573908e892d8d1959d66c7b6f2940f8930a2d0f2dfd5d4da75e2ff7cfdb63
+ data.tar.gz: 9eaf1a06c8c51ba15d9b4468796fc869f2933945494d027b54789304080c5d5b
  SHA512:
- metadata.gz: db1011e9228763dc0a98e1e57d1c9e18a297d362cea18b33bf8eeffecce853ea49d4273ae4e782a6de2be37711e9e6373810e5517558248489e696b477c0848b
- data.tar.gz: 6b9f52453be9d2ad7a29a4703508763988447de64a7599c53f9b9d3b0135e105130aba3c2679fed17ea60ba7242b6bd0d3cac9c5c2b796fe93f9009f0bbbcb30
+ metadata.gz: 971dbd2a95a280ed50925df68a29018ba7b3bccb7094b1374923a8ce7d100720202245843e003b26447832e9c1f8285bafcc7692020f5971a56c0a8e89a12afb
+ data.tar.gz: de06585dc75608b0f8c62188cce351987a0cd53f3b12889d4d63de28ed81ae1b143e31f47ac8c53083eeb250e18c5f8b721fff94a378e14203fd8fa90ba3e440
@@ -1,7 +1,30 @@
+ ## 0.3.6 (2020-06-19)
+
+ - Fixed error with text features for LightGBM
+
+ ## 0.3.5 (2020-06-10)
+
+ - Added `learning_rate` option for LightGBM
+ - Added support for Numo and Rover
+
+ ## 0.3.4 (2020-04-05)
+
+ - Added `predict_probability` for classification
+
+ ## 0.3.3 (2020-02-24)
+
+ - Fixed errors and incorrect predictions with boolean columns
+ - Fixed deprecation warnings in Ruby 2.7
+
+ ## 0.3.2 (2019-12-08)
+
+ - Added support for GSLR
+
  ## 0.3.1 (2019-12-06)
 
  - Added `weight` option for LightGBM and linear regression
  - Added `intercept` option for linear regression
+ - Added LightGBM evaluator safety check
  - Fixed `Unknown label` error for LightGBM
  - Fixed error message for unstable solutions with linear regression
 
data/README.md CHANGED
@@ -4,7 +4,6 @@ Machine learning for Ruby
  - Build predictive models quickly and easily
  - Serve models built in Ruby, Python, R, and more
- - No prior knowledge of machine learning required :tada:
 
  Check out [this post](https://ankane.org/rails-meet-data-science) for more info on machine learning with Rails
 
@@ -314,7 +313,7 @@ y = [1, 2, 3]
  Eps::Model.new(x, y)
  ```
 
- Or pass arrays of arrays
+ Data can be an array of arrays
 
  ```ruby
  x = [[1, 2], [2, 0], [3, 1]]
@@ -322,18 +321,29 @@ y = [1, 2, 3]
  Eps::Model.new(x, y)
  ```
 
- ### Daru
+ Or Numo arrays
 
- Eps works well with Daru data frames.
+ ```ruby
+ x = Numo::NArray.cast([[1, 2], [2, 0], [3, 1]])
+ y = Numo::NArray.cast([1, 2, 3])
+ Eps::Model.new(x, y)
+ ```
+
+ Or a Rover data frame
 
  ```ruby
- df = Daru::DataFrame.from_csv("houses.csv")
+ df = Rover.read_csv("houses.csv")
  Eps::Model.new(df, target: "price")
  ```
 
- ### CSVs
+ Or a Daru data frame
 
- When importing data from CSV files, be sure to convert numeric fields. The `table` method does this automatically.
+ ```ruby
+ df = Daru::DataFrame.from_csv("houses.csv")
+ Eps::Model.new(df, target: "price")
+ ```
+
+ When reading CSV files directly, be sure to convert numeric fields. The `table` method does this automatically.
 
  ```ruby
  CSV.table("data.csv").map { |row| row.to_h }
@@ -353,9 +363,23 @@ Eps supports:
  - Linear Regression
  - Naive Bayes
 
+ ### LightGBM
+
+ Pass the learning rate with:
+
+ ```ruby
+ Eps::Model.new(data, learning_rate: 0.01)
+ ```
+
  ### Linear Regression
 
- To speed up training on large datasets with linear regression, [install GSL](https://www.gnu.org/software/gsl/). With Homebrew, you can use:
+ By default, an intercept is included. Disable this with:
+
+ ```ruby
+ Eps::Model.new(data, intercept: false)
+ ```
+
+ To speed up training on large datasets with linear regression, [install GSL](https://github.com/ankane/gslr#gsl-installation). With Homebrew, you can use:
 
  ```sh
  brew install gsl
@@ -364,17 +388,21 @@ brew install gsl
  Then, add this line to your application’s Gemfile:
 
  ```ruby
- gem 'gsl', group: :development
+ gem 'gslr', group: :development
  ```
 
  It only needs to be available in environments used to build the model.
 
- By default, an intercept is included. Disable this with:
+ ## Probability
+
+ To get the probability of each category for predictions with classification, use:
 
  ```ruby
- Eps::Model.new(data, intercept: false)
+ model.predict_probability(data)
  ```
 
+ Naive Bayes is known to produce poor probability estimates, so stick with LightGBM if you need this.
+
  ## Validation Options
 
  Pass your own validation set with:
@@ -410,7 +438,7 @@ The database is another place you can store models. It’s good if you retrain m
  Create an ActiveRecord model to store the predictive model.
 
  ```sh
- rails g model Model key:string:uniq data:text
+ rails generate model Model key:string:uniq data:text
  ```
 
  Store the model with:
@@ -520,11 +548,11 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
  - Write, clarify, or fix documentation
  - Suggest or add new features
 
- To get started with development and testing:
+ To get started with development:
 
  ```sh
  git clone https://github.com/ankane/eps.git
  cd eps
  bundle install
- rake test
+ bundle exec rake test
  ```
@@ -2,33 +2,18 @@ module Eps
  class BaseEstimator
  def initialize(data = nil, y = nil, **options)
  @options = options.dup
+ @trained = false
+ @text_encoders = {}
  # TODO better pattern - don't pass most options to train
- options.delete(:intercept)
  train(data, y, **options) if data
  end
 
  def predict(data)
- singular = data.is_a?(Hash)
- data = [data] if singular
-
- data = Eps::DataFrame.new(data)
-
- @evaluator.features.each do |k, type|
- values = data.columns[k]
- raise ArgumentError, "Missing column: #{k}" if !values
- column_type = Utils.column_type(values.compact, k) if values
-
- if !column_type.nil?
- if (type == "numeric" && column_type != "numeric") || (type != "numeric" && column_type != "categorical")
- raise ArgumentError, "Bad type for column #{k}: Expected #{type} but got #{column_type}"
- end
- end
- # TODO check for unknown values for categorical features
- end
-
- predictions = @evaluator.predict(data)
+ _predict(data, false)
+ end
 
- singular ? predictions.first : predictions
+ def predict_probability(data)
+ _predict(data, true)
  end
 
  def evaluate(data, y = nil, target: nil, weight: nil)
@@ -48,6 +33,8 @@ module Eps
  end
 
  def summary(extended: false)
+ raise "Summary not available for loaded models" unless @trained
+
  str = String.new("")
 
  if @validation_set
@@ -72,7 +59,31 @@ module Eps
 
  private
 
- def train(data, y = nil, target: nil, weight: nil, split: nil, validation_set: nil, verbose: nil, text_features: nil, early_stopping: nil)
+ def _predict(data, probabilities)
+ singular = data.is_a?(Hash)
+ data = [data] if singular
+
+ data = Eps::DataFrame.new(data)
+
+ @evaluator.features.each do |k, type|
+ values = data.columns[k]
+ raise ArgumentError, "Missing column: #{k}" if !values
+ column_type = Utils.column_type(values.compact, k) if values
+
+ if !column_type.nil?
+ if (type == "numeric" && column_type != "numeric") || (type != "numeric" && column_type != "categorical")
+ raise ArgumentError, "Bad type for column #{k}: Expected #{type} but got #{column_type}"
+ end
+ end
+ # TODO check for unknown values for categorical features
+ end
+
+ predictions = @evaluator.predict(data, probabilities: probabilities)
+
+ singular ? predictions.first : predictions
+ end
+
+ def train(data, y = nil, target: nil, weight: nil, split: nil, validation_set: nil, text_features: nil, **options)
  data, @target = prep_data(data, y, target, weight)
  @target_type = Utils.column_type(data.label, @target)
 
@@ -164,11 +175,13 @@ module Eps
  raise "No data in validation set" if validation_set && validation_set.empty?
 
  @validation_set = validation_set
- @evaluator = _train(verbose: verbose, early_stopping: early_stopping)
+ @evaluator = _train(**options)
 
  # reset pmml
  @pmml = nil
 
+ @trained = true
+
  nil
  end
 
@@ -197,29 +210,38 @@ module Eps
  [data, target]
  end
 
- def prep_text_features(train_set)
- @text_encoders = {}
+ def prep_text_features(train_set, fit: true)
  @text_features.each do |k, v|
- # reset vocabulary
- v.delete(:vocabulary)
+ if fit
+ # reset vocabulary
+ v.delete(:vocabulary)
+
+ # TODO determine max features automatically
+ # start based on number of rows
+ encoder = Eps::TextEncoder.new(**v)
+ counts = encoder.fit(train_set.columns.delete(k))
+ else
+ encoder = @text_encoders[k]
+ counts = encoder.transform(train_set.columns.delete(k))
+ end
 
- # TODO determine max features automatically
- # start based on number of rows
- encoder = Eps::TextEncoder.new(v)
- counts = encoder.fit(train_set.columns.delete(k))
  encoder.vocabulary.each do |word|
  train_set.columns[[k, word]] = [0] * counts.size
  end
+
  counts.each_with_index do |ci, i|
  ci.each do |word, count|
  word_key = [k, word]
  train_set.columns[word_key][i] = 1 if train_set.columns.key?(word_key)
  end
  end
- @text_encoders[k] = encoder
 
- # update vocabulary
- v[:vocabulary] = encoder.vocabulary
+ if fit
+ @text_encoders[k] = encoder
+
+ # update vocabulary
+ v[:vocabulary] = encoder.vocabulary
+ end
  end
 
  raise "No features left" if train_set.columns.empty?
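The `fit:` flag above is the classic fit/transform split: training data fits the vocabulary, while the validation set only reuses it, so unseen validation words can no longer grow the feature space. As a hedged illustration of that pattern (this `ToyTextEncoder` is invented for the sketch, not Eps's actual `TextEncoder`):

```ruby
# Toy illustration (not Eps's TextEncoder) of the fit/transform split:
# fit learns the vocabulary from training text; transform reuses it as-is.
class ToyTextEncoder
  attr_reader :vocabulary

  def fit(texts)
    counts = texts.map { |t| t.downcase.split.tally }
    @vocabulary = counts.flat_map(&:keys).uniq
    counts
  end

  def transform(texts)
    # words outside the fitted vocabulary are dropped, never added
    texts.map { |t| t.downcase.split.tally.select { |w, _| @vocabulary.include?(w) } }
  end
end
```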
@@ -233,7 +255,7 @@ module Eps
 
  def check_missing(c, name)
  raise ArgumentError, "Missing column: #{name}" if !c
- raise ArgumentError, "Missing values in column #{name}" if c.any?(&:nil?)
+ raise ArgumentError, "Missing values in column #{name}" if c.to_a.any?(&:nil?)
  end
 
  def check_missing_value(df)
@@ -10,7 +10,7 @@ module Eps
  data.columns.each do |k, v|
  @columns[k] = v
  end
- elsif daru?(data)
+ elsif rover?(data) || daru?(data)
  data.to_h.each do |k, v|
  @columns[k.to_s] = v.to_a
  end
@@ -19,6 +19,8 @@ module Eps
  @columns[k.to_s] = v.to_a
  end
  else
+ data = data.to_a if numo?(data)
+
  if data.any?
  row = data[0]
 
@@ -140,8 +142,16 @@ module Eps
 
  private
 
+ def numo?(x)
+ defined?(Numo::NArray) && x.is_a?(Numo::NArray)
+ end
+
+ def rover?(x)
+ defined?(Rover::DataFrame) && x.is_a?(Rover::DataFrame)
+ end
+
  def daru?(x)
- defined?(Daru) && x.is_a?(Daru::DataFrame)
+ defined?(Daru::DataFrame) && x.is_a?(Daru::DataFrame)
  end
  end
  end
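These helpers all use the same `defined?` guard: the constant is only dereferenced when the optional gem is loaded, so the predicate is safe to call even when Numo, Rover, or Daru is absent. A minimal standalone sketch of the pattern:

```ruby
# defined?(...) returns nil when the constant does not exist, so && short-
# circuits and x.is_a? is never reached when the optional gem is missing.
def numo?(x)
  defined?(Numo::NArray) && x.is_a?(Numo::NArray)
end
```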
@@ -11,19 +11,15 @@ module Eps
  @text_features = text_features
  end
 
- def predict(data)
+ def predict(data, probabilities: false)
+ raise "Probabilities not supported" if probabilities && @objective == "regression"
+
  rows = data.map(&:to_h)
 
  # sparse matrix
  @text_features.each do |k, v|
- encoder = TextEncoder.new(v)
-
- values = data.columns.delete(k)
- counts = encoder.transform(values)
-
- encoder.vocabulary.each do |word|
- data.columns[[k, word]] = [0] * values.size
- end
+ encoder = TextEncoder.new(**v)
+ counts = encoder.transform(data.columns[k])
 
  counts.each_with_index do |xc, i|
  row = rows[i]
@@ -38,17 +34,28 @@ module Eps
  when "regression"
  sum_trees(rows, @trees)
  when "binary"
- sum_trees(rows, @trees).map { |s| @labels[sigmoid(s) > 0.5 ? 1 : 0] }
+ prob = sum_trees(rows, @trees).map { |s| sigmoid(s) }
+ if probabilities
+ prob.map { |v| @labels.zip([1 - v, v]).to_h }
+ else
+ prob.map { |v| @labels[v > 0.5 ? 1 : 0] }
+ end
  else
  tree_scores = []
  num_trees = @trees.size / @labels.size
  @trees.each_slice(num_trees).each do |trees|
  tree_scores << sum_trees(rows, trees)
  end
- data.size.times.map do |i|
+ rows.size.times.map do |i|
  v = tree_scores.map { |s| s[i] }
- idx = v.map.with_index.max_by { |v2, _| v2 }.last
- @labels[idx]
+ if probabilities
+ exp = v.map { |vi| Math.exp(vi) }
+ sum = exp.sum
+ @labels.zip(exp.map { |e| e / sum }).to_h
+ else
+ idx = v.map.with_index.max_by { |v2, _| v2 }.last
+ @labels[idx]
+ end
  end
  end
  end
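The hunk above converts raw tree sums into probability hashes: sigmoid for binary (with `1 - v` for the negative label) and softmax for multiclass. A standalone sketch of those two conversions, with made-up label names:

```ruby
# Score-to-probability conversions mirroring the evaluator change above:
# sigmoid for binary objectives, softmax for multiclass (labels invented).
def sigmoid(x)
  1.0 / (1 + Math.exp(-x))
end

def binary_probabilities(labels, raw_score)
  v = sigmoid(raw_score)
  labels.zip([1 - v, v]).to_h
end

def softmax_probabilities(labels, raw_scores)
  exp = raw_scores.map { |s| Math.exp(s) }
  sum = exp.sum
  labels.zip(exp.map { |e| e / sum }).to_h
end
```

Both return one `label => probability` hash per row, matching the shape `predict_probability` exposes.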
@@ -81,7 +88,7 @@ module Eps
  else
  case node.operator
  when "equal"
- v == node.value
+ v.to_s == node.value
  when "in"
  node.value.include?(v)
  when "greaterThan"
@@ -109,7 +116,7 @@ module Eps
  end
 
  def sigmoid(x)
- 1.0 / (1 + Math::E**(-x))
+ 1.0 / (1 + Math.exp(-x))
  end
  end
  end
@@ -9,7 +9,9 @@ module Eps
  @text_features = text_features || {}
  end
 
- def predict(x)
+ def predict(x, probabilities: false)
+ raise "Probabilities not supported" if probabilities
+
  intercept = @coefficients["_intercept"] || 0.0
  scores = [intercept] * x.size
 
@@ -19,10 +21,11 @@ module Eps
  case type
  when "categorical"
  x.columns[k].each_with_index do |xv, i|
- scores[i] += @coefficients[[k, xv]].to_f
+ # TODO clean up
+ scores[i] += (@coefficients[[k, xv]] || @coefficients[[k, xv.to_s]]).to_f
  end
  when "text"
- encoder = TextEncoder.new(@text_features[k])
+ encoder = TextEncoder.new(**@text_features[k])
  counts = encoder.transform(x.columns[k])
  coef = {}
  @coefficients.each do |k2, v|
@@ -10,14 +10,15 @@ module Eps
  @legacy = legacy
  end
 
- def predict(x)
+ def predict(x, probabilities: false)
  probs = calculate_class_probabilities(x)
  probs.map do |xp|
- # convert probabilities
- # not needed when just returning label
- # sum = xp.values.map { |v| Math.exp(v) }.sum.to_f
- # p xp.map { |k, v| [k, Math.exp(v) / sum] }.to_h
- xp.sort_by { |k, v| [-v, k] }[0][0]
+ if probabilities
+ sum = xp.values.map { |v| Math.exp(v) }.sum.to_f
+ xp.map { |k, v| [k, Math.exp(v) / sum] }.to_h
+ else
+ xp.sort_by { |k, v| [-v, k] }[0][0]
+ end
  end
  end
 
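The probability branch above exponentiates the per-class log-scores and normalizes them to sum to 1 (the previously commented-out conversion, now live). In isolation, with invented scores:

```ruby
# Per-class log-scores -> normalized probabilities, as in the Naive Bayes
# predict change above (the class names and score values are made up).
log_scores = {"cat" => -1.2, "dog" => -0.3}
sum = log_scores.values.map { |v| Math.exp(v) }.sum.to_f
probs = log_scores.map { |k, v| [k, Math.exp(v) / sum] }.to_h
```

Since exponentiation is monotonic, the argmax label is unchanged; only the reported values differ.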
@@ -38,7 +39,8 @@ module Eps
  case type
  when "categorical"
  x.columns[k].each_with_index do |xi, i|
- vc = probabilities[:conditional][k][xi]
+ # TODO clean this up
+ vc = probabilities[:conditional][k][xi] || probabilities[:conditional][k][xi.to_s]
 
  # unknown value if not vc
  if vc
@@ -17,7 +17,7 @@ module Eps
  str
  end
 
- def _train(verbose: nil, early_stopping: nil)
+ def _train(verbose: nil, early_stopping: nil, learning_rate: 0.1)
  train_set = @train_set
  validation_set = @validation_set.dup
  summary_label = train_set.label
@@ -57,10 +57,13 @@ module Eps
 
  # text feature encoding
  prep_text_features(train_set)
- prep_text_features(validation_set) if validation_set
+ prep_text_features(validation_set, fit: false) if validation_set
 
  # create params
- params = {objective: objective}
+ params = {
+ objective: objective,
+ learning_rate: learning_rate
+ }
  params[:num_classes] = labels.size if objective == "multiclass"
  if train_set.size < 30
  params[:min_data_in_bin] = 1
@@ -121,25 +124,30 @@ module Eps
  def check_evaluator(objective, labels, booster, booster_set, evaluator, evaluator_set)
  expected = @booster.predict(booster_set.map_rows(&:to_a))
  if objective == "multiclass"
- expected.map! do |v|
- labels[v.map.with_index.max_by { |v2, _| v2 }.last]
- end
+ actual = evaluator.predict(evaluator_set, probabilities: true)
+ # just compare first for now
+ expected.map! { |v| v.first }
+ actual.map! { |v| v.values.first }
  elsif objective == "binary"
- expected.map! { |v| labels[v >= 0.5 ? 1 : 0] }
+ actual = evaluator.predict(evaluator_set, probabilities: true).map { |v| v.values.last }
+ else
+ actual = evaluator.predict(evaluator_set)
  end
- actual = evaluator.predict(evaluator_set)
 
- regression = objective == "regression"
+ regression = objective == "regression" || objective == "binary"
  bad_observations = []
  expected.zip(actual).each_with_index do |(exp, act), i|
- success = regression ? (act - exp).abs < 0.001 : act == exp
+ success = (act - exp).abs < 0.001
  unless success
  bad_observations << {expected: exp, actual: act, data_point: evaluator_set[i].map(&:itself).first}
  end
  end
 
  if bad_observations.any?
- raise "Bug detected in evaluator. Please report an issue. Bad data points: #{bad_observations.inspect}"
+ bad_observations.each do |obs|
+ p obs
+ end
+ raise "Bug detected in evaluator. Please report an issue."
  end
  end
 
@@ -37,6 +37,7 @@ module Eps
  str
  end
 
+ # TODO use keyword arguments for gsl and intercept in 0.4.0
  def _train(**options)
  raise "Target must be numeric" if @target_type != "numeric"
  check_missing_value(@train_set)
@@ -50,17 +51,35 @@ module Eps
 
  x = data.map_rows(&:to_a)
 
- intercept = @options.key?(:intercept) ? @options[:intercept] : true
- if intercept
+ gsl =
+ if options.key?(:gsl)
+ options[:gsl]
+ elsif defined?(GSL)
+ true
+ elsif defined?(GSLR)
+ :gslr
+ else
+ false
+ end
+
+ intercept = options.key?(:intercept) ? options[:intercept] : true
+ if intercept && gsl != :gslr
  data.size.times do |i|
  x[i].unshift(1)
  end
  end
 
- gsl = options.key?(:gsl) ? options[:gsl] : defined?(GSL)
-
  v3 =
- if gsl
+ if gsl == :gslr
+ model = GSLR::OLS.new(intercept: intercept)
+ model.fit(x, data.label, weight: data.weight)
+
+ @covariance = model.covariance
+
+ coefficients = model.coefficients.dup
+ coefficients.unshift(model.intercept) if intercept
+ coefficients
+ elsif gsl
  x = GSL::Matrix.alloc(*x)
  y = GSL::Vector.alloc(data.label)
  w = GSL::Vector.alloc(data.weight) if data.weight
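The new `gsl =` cascade above picks a solver backend: an explicit `:gsl` option wins, otherwise whichever of GSL or GSLR is loaded, otherwise pure Ruby. Extracted as a standalone function (the name `select_backend` is ours, for illustration):

```ruby
# The backend-selection cascade from the hunk above, in isolation:
# explicit option > loaded GSL > loaded GSLR > pure-Ruby fallback (false).
def select_backend(options)
  if options.key?(:gsl)
    options[:gsl]
  elsif defined?(GSL)
    true
  elsif defined?(GSLR)
    :gslr
  else
    false
  end
end
```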
@@ -196,7 +215,11 @@ module Eps
 
  def diagonal
  @diagonal ||= begin
- if covariance.respond_to?(:each)
+ if covariance.is_a?(Array)
+ covariance.size.times.map do |i|
+ covariance[i][i]
+ end
+ elsif covariance.respond_to?(:each)
  d = covariance.each(:diagonal).to_a
  @removed.each do |i|
  d.insert(i, 0)
@@ -17,7 +17,7 @@ module Eps
  str
  end
 
- def _train(smoothing: 1, **options)
+ def _train(smoothing: 1)
  raise "Target must be strings" if @target_type != "categorical"
  check_missing_value(@train_set)
  check_missing_value(@validation_set) if @validation_set
@@ -210,10 +210,10 @@ module Eps
  probabilities[:conditional].each do |k, v|
  xml.BayesInput(fieldName: k) do
  if features[k] == "categorical"
- v.sort_by { |k2, _| k2 }.each do |k2, v2|
+ v.sort_by { |k2, _| k2.to_s }.each do |k2, v2|
  xml.PairCounts(value: k2) do
  xml.TargetValueCounts do
- v2.sort_by { |k2, _| k2 }.each do |k3, v3|
+ v2.sort_by { |k2, _| k2.to_s }.each do |k3, v3|
  xml.TargetValueCount(value: k3, count: v3)
  end
  end
@@ -221,7 +221,7 @@ module Eps
  end
  else
  xml.TargetValueStats do
- v.sort_by { |k2, _| k2 }.each do |k2, v2|
+ v.sort_by { |k2, _| k2.to_s }.each do |k2, v2|
  xml.TargetValueStat(value: k2) do
  xml.GaussianDistribution(mean: v2[:mean], variance: v2[:stdev]**2)
  end
@@ -233,7 +233,7 @@ module Eps
  end
  xml.BayesOutput(fieldName: "target") do
  xml.TargetValueCounts do
- probabilities[:prior].sort_by { |k, _| k }.each do |k, v|
+ probabilities[:prior].sort_by { |k, _| k.to_s }.each do |k, v|
  xml.TargetValueCount(value: k, count: v)
  end
  end
@@ -1,3 +1,3 @@
  module Eps
- VERSION = "0.3.1"
+ VERSION = "0.3.6"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: eps
  version: !ruby/object:Gem::Version
- version: 0.3.1
+ version: 0.3.6
  platform: ruby
  authors:
  - Andrew Kane
  autorequire:
  bindir: bin
  cert_chain: []
- date: 2019-12-06 00:00:00.000000000 Z
+ date: 2020-06-19 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: lightgbm
@@ -80,6 +80,20 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
+ - !ruby/object:Gem::Dependency
+ name: numo-narray
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
  - !ruby/object:Gem::Dependency
  name: rake
  requirement: !ruby/object:Gem::Requirement
@@ -94,6 +108,20 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
+ - !ruby/object:Gem::Dependency
+ name: rover-df
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
  description:
  email: andrew@chartkick.com
  executables: []
@@ -143,7 +171,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.0.3
+ rubygems_version: 3.1.2
  signing_key:
  specification_version: 4
  summary: Machine learning for Ruby. Supports regression (linear regression) and classification