eps 0.3.2 → 0.3.7

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 50f5a4273111ebc5ba1265d07d4e925d770006fce330b01cfbe4fe97221548d9
- data.tar.gz: a785ca5533618243b933248ef2f418c5d0d048d8ff1051c9a2e7fae5f8c87ba3
+ metadata.gz: bf9b15abb922ed62bace8127413e9353d37364f7fe63218088278420655a2561
+ data.tar.gz: 9ae7077f18295a24daf682777106807eec96dfa75e6e4a9f6b595cb52981aec5
  SHA512:
- metadata.gz: 3adffd0fbb0d16163a06720adfd97a83c6bf4bf2554e30b3a6f3599828511834826477a8296f93720c3efd52d19ca412e8b7695b013a4e7361073e3c5bcf5ee5
- data.tar.gz: 69847d6d49742f61b3b4fac6e9298e5c954d4cc22e5302b716f258e6190f71df4eee1820e844c4159e1fdf99385a36c2cb697065e78d5f6557c6d1dbca70f9de
+ metadata.gz: d37cec29c949a729f9581532902b595f4fca1817054243e7e6261b5167917144ba988bbea5fe2a069ef4b988f91fa2b5fd0ea5628059c328b4575d374eb952d7
+ data.tar.gz: 667afb1f383c0d2a8c45c281b7a2b88cc76c3b691704853feb03a8be5a95bfa3ba155ba3e82278c5993b638185c80a82fbbe852f5704ab6bed896af667dd3b76
@@ -1,3 +1,25 @@
+ ## 0.3.7 (2020-11-23)
+
+ - Fixed error with LightGBM summary
+
+ ## 0.3.6 (2020-06-19)
+
+ - Fixed error with text features for LightGBM
+
+ ## 0.3.5 (2020-06-10)
+
+ - Added `learning_rate` option for LightGBM
+ - Added support for Numo and Rover
+
+ ## 0.3.4 (2020-04-05)
+
+ - Added `predict_probability` for classification
+
+ ## 0.3.3 (2020-02-24)
+
+ - Fixed errors and incorrect predictions with boolean columns
+ - Fixed deprecation warnings in Ruby 2.7
+
  ## 0.3.2 (2019-12-08)

  - Added support for GSLR
data/README.md CHANGED
@@ -4,11 +4,10 @@ Machine learning for Ruby

  - Build predictive models quickly and easily
  - Serve models built in Ruby, Python, R, and more
- - No prior knowledge of machine learning required :tada:

  Check out [this post](https://ankane.org/rails-meet-data-science) for more info on machine learning with Rails

- [![Build Status](https://travis-ci.org/ankane/eps.svg?branch=master)](https://travis-ci.org/ankane/eps)
+ [![Build Status](https://github.com/ankane/eps/workflows/build/badge.svg?branch=master)](https://github.com/ankane/eps/actions)

  ## Installation

@@ -135,7 +134,7 @@ For text features, use strings with multiple words.
  {description: "a beautiful house on top of a hill"}
  ```

- This creates features based on word count (term frequency).
+ This creates features based on [word count](https://en.wikipedia.org/wiki/Bag-of-words_model).

  You can specify text features explicitly with:

@@ -148,12 +147,12 @@ You can set advanced options with:
  ```ruby
  text_features: {
  description: {
- min_occurences: 5,
- max_features: 1000,
- min_length: 1,
- case_sensitive: true,
- tokenizer: /\s+/,
- stop_words: ["and", "the"]
+ min_occurences: 5, # min times a word must appear to be included in the model
+ max_features: 1000, # max number of words to include in the model
+ min_length: 1, # min length of words to be included
+ case_sensitive: true, # how to treat words with different case
+ tokenizer: /\s+/, # how to tokenize the text, defaults to whitespace
+ stop_words: ["and", "the"] # words to exclude from the model
  }
  }
  ```
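The `tokenizer` option above is a regex used to split text into words, and `stop_words` are dropped afterwards. A standalone sketch of that idea in plain Ruby (not the gem's internal code, which also applies the other options):

```ruby
# Split on the default whitespace tokenizer, then drop stop words.
# The variable names here are illustrative, not from the gem.
tokens = "a beautiful house on top of a hill".split(/\s+/)
stop_words = ["a", "of", "on"]
kept = tokens - stop_words
# kept is ["beautiful", "house", "top", "hill"]
```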
@@ -219,7 +218,7 @@ Build the model with:
  PriceModel.build
  ```

- This saves the model to `price_model.pmml`. Be sure to check this into source control.
+ This saves the model to `price_model.pmml`. Check this into source control or use a tool like [Trove](https://github.com/ankane/trove) to store it.

  Predict with:

@@ -314,7 +313,7 @@ y = [1, 2, 3]
  Eps::Model.new(x, y)
  ```

- Or pass arrays of arrays
+ Data can be an array of arrays

  ```ruby
  x = [[1, 2], [2, 0], [3, 1]]
@@ -322,18 +321,29 @@ y = [1, 2, 3]
  Eps::Model.new(x, y)
  ```

- ### Daru
+ Or Numo arrays

- Eps works well with Daru data frames.
+ ```ruby
+ x = Numo::NArray.cast([[1, 2], [2, 0], [3, 1]])
+ y = Numo::NArray.cast([1, 2, 3])
+ Eps::Model.new(x, y)
+ ```
+
+ Or a Rover data frame

  ```ruby
- df = Daru::DataFrame.from_csv("houses.csv")
+ df = Rover.read_csv("houses.csv")
  Eps::Model.new(df, target: "price")
  ```

- ### CSVs
+ Or a Daru data frame
+
+ ```ruby
+ df = Daru::DataFrame.from_csv("houses.csv")
+ Eps::Model.new(df, target: "price")
+ ```

- When importing data from CSV files, be sure to convert numeric fields. The `table` method does this automatically.
+ When reading CSV files directly, be sure to convert numeric fields. The `table` method does this automatically.

  ```ruby
  CSV.table("data.csv").map { |row| row.to_h }
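`CSV.table` is shorthand for parsing with numeric converters and symbol headers, which is why it handles the numeric-field conversion automatically. A rough standalone equivalent (the sample data here is illustrative):

```ruby
require "csv"

# Roughly what CSV.table does: headers become symbols and
# numeric-looking fields are converted to Integer/Float
rows = CSV.parse("price,bedrooms\n250000,3\n",
                 headers: true,
                 converters: :numeric,
                 header_converters: :symbol).map { |row| row.to_h }
# rows.first[:price] is the Integer 250000, not the String "250000"
```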
@@ -353,11 +363,23 @@ Eps supports:
  - Linear Regression
  - Naive Bayes

+ ### LightGBM
+
+ Pass the learning rate with:
+
+ ```ruby
+ Eps::Model.new(data, learning_rate: 0.01)
+ ```
+
  ### Linear Regression

- #### Performance
+ By default, an intercept is included. Disable this with:

- To speed up training on large datasets with linear regression, [install GSL](https://www.gnu.org/software/gsl/). With Homebrew, you can use:
+ ```ruby
+ Eps::Model.new(data, intercept: false)
+ ```
+
+ To speed up training on large datasets with linear regression, [install GSL](https://github.com/ankane/gslr#gsl-installation). With Homebrew, you can use:

  ```sh
  brew install gsl
@@ -371,14 +393,16 @@ gem 'gslr', group: :development

  It only needs to be available in environments used to build the model.

- #### Options
+ ## Probability

- By default, an intercept is included. Disable this with:
+ To get the probability of each category for predictions with classification, use:

  ```ruby
- Eps::Model.new(data, intercept: false)
+ model.predict_probability(data)
  ```

+ Naive Bayes is known to produce poor probability estimates, so stick with LightGBM if you need this.
+
  ## Validation Options

  Pass your own validation set with:
@@ -414,7 +438,7 @@ The database is another place you can store models. It’s good if you retrain m
  Create an ActiveRecord model to store the predictive model.

  ```sh
- rails g model Model key:string:uniq data:text
+ rails generate model Model key:string:uniq data:text
  ```

  Store the model with:
@@ -524,11 +548,11 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
  - Write, clarify, or fix documentation
  - Suggest or add new features

- To get started with development and testing:
+ To get started with development:

  ```sh
  git clone https://github.com/ankane/eps.git
  cd eps
  bundle install
- rake test
+ bundle exec rake test
  ```
@@ -2,33 +2,18 @@ module Eps
  class BaseEstimator
  def initialize(data = nil, y = nil, **options)
  @options = options.dup
+ @trained = false
+ @text_encoders = {}
  # TODO better pattern - don't pass most options to train
- options.delete(:intercept)
  train(data, y, **options) if data
  end

  def predict(data)
- singular = data.is_a?(Hash)
- data = [data] if singular
-
- data = Eps::DataFrame.new(data)
-
- @evaluator.features.each do |k, type|
- values = data.columns[k]
- raise ArgumentError, "Missing column: #{k}" if !values
- column_type = Utils.column_type(values.compact, k) if values
-
- if !column_type.nil?
- if (type == "numeric" && column_type != "numeric") || (type != "numeric" && column_type != "categorical")
- raise ArgumentError, "Bad type for column #{k}: Expected #{type} but got #{column_type}"
- end
- end
- # TODO check for unknown values for categorical features
- end
-
- predictions = @evaluator.predict(data)
+ _predict(data, false)
+ end

- singular ? predictions.first : predictions
+ def predict_probability(data)
+ _predict(data, true)
  end

  def evaluate(data, y = nil, target: nil, weight: nil)
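The `predict`/`predict_probability` refactor keeps the original singular-input convention: a single Hash in yields a single prediction out, an Array yields an Array. A standalone sketch of that pattern with a stubbed evaluator (the stub is hypothetical, not the gem's evaluator):

```ruby
# Stand-in for @evaluator.predict: one value per row
def predict_stub(data)
  singular = data.is_a?(Hash)
  data = [data] if singular
  predictions = data.map { |row| row[:x] * 2 } # stubbed evaluator
  singular ? predictions.first : predictions
end

predict_stub({x: 3})           # => 6, a single prediction
predict_stub([{x: 1}, {x: 2}]) # => [2, 4], an array of predictions
```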
@@ -48,6 +33,8 @@ module Eps
  end

  def summary(extended: false)
+ raise "Summary not available for loaded models" unless @trained
+
  str = String.new("")

  if @validation_set
@@ -72,7 +59,31 @@ module Eps

  private

- def train(data, y = nil, target: nil, weight: nil, split: nil, validation_set: nil, verbose: nil, text_features: nil, early_stopping: nil)
+ def _predict(data, probabilities)
+ singular = data.is_a?(Hash)
+ data = [data] if singular
+
+ data = Eps::DataFrame.new(data)
+
+ @evaluator.features.each do |k, type|
+ values = data.columns[k]
+ raise ArgumentError, "Missing column: #{k}" if !values
+ column_type = Utils.column_type(values.compact, k) if values
+
+ if !column_type.nil?
+ if (type == "numeric" && column_type != "numeric") || (type != "numeric" && column_type != "categorical")
+ raise ArgumentError, "Bad type for column #{k}: Expected #{type} but got #{column_type}"
+ end
+ end
+ # TODO check for unknown values for categorical features
+ end
+
+ predictions = @evaluator.predict(data, probabilities: probabilities)
+
+ singular ? predictions.first : predictions
+ end
+
+ def train(data, y = nil, target: nil, weight: nil, split: nil, validation_set: nil, text_features: nil, **options)
  data, @target = prep_data(data, y, target, weight)
  @target_type = Utils.column_type(data.label, @target)

@@ -164,11 +175,13 @@ module Eps
  raise "No data in validation set" if validation_set && validation_set.empty?

  @validation_set = validation_set
- @evaluator = _train(verbose: verbose, early_stopping: early_stopping)
+ @evaluator = _train(**options)

  # reset pmml
  @pmml = nil

+ @trained = true
+
  nil
  end

@@ -197,29 +210,38 @@ module Eps
  [data, target]
  end

- def prep_text_features(train_set)
- @text_encoders = {}
+ def prep_text_features(train_set, fit: true)
  @text_features.each do |k, v|
- # reset vocabulary
- v.delete(:vocabulary)
+ if fit
+ # reset vocabulary
+ v.delete(:vocabulary)
+
+ # TODO determine max features automatically
+ # start based on number of rows
+ encoder = Eps::TextEncoder.new(**v)
+ counts = encoder.fit(train_set.columns.delete(k))
+ else
+ encoder = @text_encoders[k]
+ counts = encoder.transform(train_set.columns.delete(k))
+ end

- # TODO determine max features automatically
- # start based on number of rows
- encoder = Eps::TextEncoder.new(v)
- counts = encoder.fit(train_set.columns.delete(k))
  encoder.vocabulary.each do |word|
  train_set.columns[[k, word]] = [0] * counts.size
  end
+
  counts.each_with_index do |ci, i|
  ci.each do |word, count|
  word_key = [k, word]
  train_set.columns[word_key][i] = 1 if train_set.columns.key?(word_key)
  end
  end
- @text_encoders[k] = encoder

- # update vocabulary
- v[:vocabulary] = encoder.vocabulary
+ if fit
+ @text_encoders[k] = encoder
+
+ # update vocabulary
+ v[:vocabulary] = encoder.vocabulary
+ end
  end

  raise "No features left" if train_set.columns.empty?
@@ -233,7 +255,7 @@ module Eps

  def check_missing(c, name)
  raise ArgumentError, "Missing column: #{name}" if !c
- raise ArgumentError, "Missing values in column #{name}" if c.any?(&:nil?)
+ raise ArgumentError, "Missing values in column #{name}" if c.to_a.any?(&:nil?)
  end

  def check_missing_value(df)
@@ -10,7 +10,7 @@ module Eps
  data.columns.each do |k, v|
  @columns[k] = v
  end
- elsif daru?(data)
+ elsif rover?(data) || daru?(data)
  data.to_h.each do |k, v|
  @columns[k.to_s] = v.to_a
  end
@@ -19,6 +19,8 @@ module Eps
  @columns[k.to_s] = v.to_a
  end
  else
+ data = data.to_a if numo?(data)
+
  if data.any?
  row = data[0]

@@ -140,8 +142,16 @@ module Eps

  private

+ def numo?(x)
+ defined?(Numo::NArray) && x.is_a?(Numo::NArray)
+ end
+
+ def rover?(x)
+ defined?(Rover::DataFrame) && x.is_a?(Rover::DataFrame)
+ end
+
  def daru?(x)
- defined?(Daru) && x.is_a?(Daru::DataFrame)
+ defined?(Daru::DataFrame) && x.is_a?(Daru::DataFrame)
  end
  end
  end
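The `numo?`/`rover?`/`daru?` guards lean on `defined?` so the type check is safe even when the optional gem isn't loaded. A standalone sketch with a hypothetical `Fake::Frame` class (not part of the gem):

```ruby
# Without the defined? guard, referencing Fake::Frame would raise
# NameError when the constant doesn't exist; defined? returns nil instead
def fake_frame?(x)
  defined?(Fake::Frame) && x.is_a?(Fake::Frame)
end

fake_frame?([1, 2]) # falsy, no NameError, even though Fake is undefined

module Fake
  class Frame; end
end

fake_frame?(Fake::Frame.new) # => true
```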
@@ -11,19 +11,15 @@ module Eps
  @text_features = text_features
  end

- def predict(data)
+ def predict(data, probabilities: false)
+ raise "Probabilities not supported" if probabilities && @objective == "regression"
+
  rows = data.map(&:to_h)

  # sparse matrix
  @text_features.each do |k, v|
- encoder = TextEncoder.new(v)
-
- values = data.columns.delete(k)
- counts = encoder.transform(values)
-
- encoder.vocabulary.each do |word|
- data.columns[[k, word]] = [0] * values.size
- end
+ encoder = TextEncoder.new(**v)
+ counts = encoder.transform(data.columns[k])

  counts.each_with_index do |xc, i|
  row = rows[i]
38
34
  when "regression"
39
35
  sum_trees(rows, @trees)
40
36
  when "binary"
41
- sum_trees(rows, @trees).map { |s| @labels[sigmoid(s) > 0.5 ? 1 : 0] }
37
+ prob = sum_trees(rows, @trees).map { |s| sigmoid(s) }
38
+ if probabilities
39
+ prob.map { |v| @labels.zip([1 - v, v]).to_h }
40
+ else
41
+ prob.map { |v| @labels[v > 0.5 ? 1 : 0] }
42
+ end
42
43
  else
43
44
  tree_scores = []
44
45
  num_trees = @trees.size / @labels.size
45
46
  @trees.each_slice(num_trees).each do |trees|
46
47
  tree_scores << sum_trees(rows, trees)
47
48
  end
48
- data.size.times.map do |i|
49
+ rows.size.times.map do |i|
49
50
  v = tree_scores.map { |s| s[i] }
50
- idx = v.map.with_index.max_by { |v2, _| v2 }.last
51
- @labels[idx]
51
+ if probabilities
52
+ exp = v.map { |vi| Math.exp(vi) }
53
+ sum = exp.sum
54
+ @labels.zip(exp.map { |e| e / sum }).to_h
55
+ else
56
+ idx = v.map.with_index.max_by { |v2, _| v2 }.last
57
+ @labels[idx]
58
+ end
52
59
  end
53
60
  end
54
61
  end
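The multiclass `probabilities` branch above is a plain softmax over the per-class summed tree scores. A standalone sketch (labels and scores here are made up for illustration):

```ruby
# Softmax: exponentiate each class score, then normalize so they sum to 1
labels = ["apartment", "condo", "house"]
scores = [0.5, 1.5, 2.5] # one sum_trees result per class
exp = scores.map { |s| Math.exp(s) }
sum = exp.sum
probs = labels.zip(exp.map { |e| e / sum }).to_h
# probs values sum to 1; "house" gets the highest probability
```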
@@ -81,7 +88,7 @@ module Eps
  else
  case node.operator
  when "equal"
- v == node.value
+ v.to_s == node.value
  when "in"
  node.value.include?(v)
  when "greaterThan"
@@ -109,7 +116,7 @@ module Eps
  end

  def sigmoid(x)
- 1.0 / (1 + Math::E**(-x))
+ 1.0 / (1 + Math.exp(-x))
  end
  end
  end
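The sigmoid rewrite swaps `Math::E**(-x)` for the equivalent `Math.exp(-x)`; both compute the logistic function. A standalone check:

```ruby
def sigmoid(x)
  1.0 / (1 + Math.exp(-x))
end

# Agrees with the old Math::E**(-x) form to floating-point precision
[-5.0, 0.0, 5.0].each do |x|
  old = 1.0 / (1 + Math::E**(-x))
  raise "mismatch" unless (sigmoid(x) - old).abs < 1e-12
end
sigmoid(0.0) # => 0.5
```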
@@ -9,7 +9,9 @@ module Eps
  @text_features = text_features || {}
  end

- def predict(x)
+ def predict(x, probabilities: false)
+ raise "Probabilities not supported" if probabilities
+
  intercept = @coefficients["_intercept"] || 0.0
  scores = [intercept] * x.size

@@ -19,10 +21,11 @@ module Eps
  case type
  when "categorical"
  x.columns[k].each_with_index do |xv, i|
- scores[i] += @coefficients[[k, xv]].to_f
+ # TODO clean up
+ scores[i] += (@coefficients[[k, xv]] || @coefficients[[k, xv.to_s]]).to_f
  end
  when "text"
- encoder = TextEncoder.new(@text_features[k])
+ encoder = TextEncoder.new(**@text_features[k])
  counts = encoder.transform(x.columns[k])
  coef = {}
  @coefficients.each do |k2, v|
@@ -10,14 +10,15 @@ module Eps
  @legacy = legacy
  end

- def predict(x)
+ def predict(x, probabilities: false)
  probs = calculate_class_probabilities(x)
  probs.map do |xp|
- # convert probabilities
- # not needed when just returning label
- # sum = xp.values.map { |v| Math.exp(v) }.sum.to_f
- # p xp.map { |k, v| [k, Math.exp(v) / sum] }.to_h
- xp.sort_by { |k, v| [-v, k] }[0][0]
+ if probabilities
+ sum = xp.values.map { |v| Math.exp(v) }.sum.to_f
+ xp.map { |k, v| [k, Math.exp(v) / sum] }.to_h
+ else
+ xp.sort_by { |k, v| [-v, k] }[0][0]
+ end
  end
  end

@@ -38,7 +39,8 @@ module Eps
  case type
  when "categorical"
  x.columns[k].each_with_index do |xi, i|
- vc = probabilities[:conditional][k][xi]
+ # TODO clean this up
+ vc = probabilities[:conditional][k][xi] || probabilities[:conditional][k][xi.to_s]

  # unknown value if not vc
  if vc
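The naive Bayes `probabilities` branch added above exponentiates per-class log-probabilities and normalizes them. A standalone sketch (class names and values are illustrative):

```ruby
# xp maps each class to a log-probability, as in calculate_class_probabilities
xp = {"yes" => Math.log(0.3), "no" => Math.log(0.1)}

# Exponentiate and normalize so the probabilities sum to 1
sum = xp.values.map { |v| Math.exp(v) }.sum.to_f
probs = xp.map { |k, v| [k, Math.exp(v) / sum] }.to_h
# probs["yes"] ≈ 0.75, probs["no"] ≈ 0.25
```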
@@ -10,14 +10,14 @@ module Eps
  str << "Model needs more data for better predictions\n"
  else
  str << "Most important features\n"
- @importance_keys.zip(importance).sort_by { |k, v| [-v, k] }.first(10).each do |k, v|
+ @importance_keys.zip(importance).sort_by { |k, v| [-v, display_field(k)] }.first(10).each do |k, v|
  str << "#{display_field(k)}: #{(100 * v / total).round}\n"
  end
  end
  str
  end

- def _train(verbose: nil, early_stopping: nil)
+ def _train(verbose: nil, early_stopping: nil, learning_rate: 0.1)
  train_set = @train_set
  validation_set = @validation_set.dup
  summary_label = train_set.label
@@ -57,10 +57,13 @@ module Eps

  # text feature encoding
  prep_text_features(train_set)
- prep_text_features(validation_set) if validation_set
+ prep_text_features(validation_set, fit: false) if validation_set

  # create params
- params = {objective: objective}
+ params = {
+ objective: objective,
+ learning_rate: learning_rate
+ }
  params[:num_classes] = labels.size if objective == "multiclass"
  if train_set.size < 30
  params[:min_data_in_bin] = 1
@@ -121,25 +124,30 @@ module Eps
  def check_evaluator(objective, labels, booster, booster_set, evaluator, evaluator_set)
  expected = @booster.predict(booster_set.map_rows(&:to_a))
  if objective == "multiclass"
- expected.map! do |v|
- labels[v.map.with_index.max_by { |v2, _| v2 }.last]
- end
+ actual = evaluator.predict(evaluator_set, probabilities: true)
+ # just compare first for now
+ expected.map! { |v| v.first }
+ actual.map! { |v| v.values.first }
  elsif objective == "binary"
- expected.map! { |v| labels[v >= 0.5 ? 1 : 0] }
+ actual = evaluator.predict(evaluator_set, probabilities: true).map { |v| v.values.last }
+ else
+ actual = evaluator.predict(evaluator_set)
  end
- actual = evaluator.predict(evaluator_set)

- regression = objective == "regression"
+ regression = objective == "regression" || objective == "binary"
  bad_observations = []
  expected.zip(actual).each_with_index do |(exp, act), i|
- success = regression ? (act - exp).abs < 0.001 : act == exp
+ success = (act - exp).abs < 0.001
  unless success
  bad_observations << {expected: exp, actual: act, data_point: evaluator_set[i].map(&:itself).first}
  end
  end

  if bad_observations.any?
- raise "Bug detected in evaluator. Please report an issue. Bad data points: #{bad_observations.inspect}"
+ bad_observations.each do |obs|
+ p obs
+ end
+ raise "Bug detected in evaluator. Please report an issue."
  end
  end

@@ -37,6 +37,7 @@ module Eps
  str
  end

+ # TODO use keyword arguments for gsl and intercept in 0.4.0
  def _train(**options)
  raise "Target must be numeric" if @target_type != "numeric"
  check_missing_value(@train_set)
@@ -61,7 +62,7 @@ module Eps
  false
  end

- intercept = @options.key?(:intercept) ? @options[:intercept] : true
+ intercept = options.key?(:intercept) ? options[:intercept] : true
  if intercept && gsl != :gslr
  data.size.times do |i|
  x[i].unshift(1)
@@ -17,7 +17,7 @@ module Eps
  str
  end

- def _train(smoothing: 1, **options)
+ def _train(smoothing: 1)
  raise "Target must be strings" if @target_type != "categorical"
  check_missing_value(@train_set)
  check_missing_value(@validation_set) if @validation_set
@@ -210,10 +210,10 @@ module Eps
  probabilities[:conditional].each do |k, v|
  xml.BayesInput(fieldName: k) do
  if features[k] == "categorical"
- v.sort_by { |k2, _| k2 }.each do |k2, v2|
+ v.sort_by { |k2, _| k2.to_s }.each do |k2, v2|
  xml.PairCounts(value: k2) do
  xml.TargetValueCounts do
- v2.sort_by { |k2, _| k2 }.each do |k3, v3|
+ v2.sort_by { |k2, _| k2.to_s }.each do |k3, v3|
  xml.TargetValueCount(value: k3, count: v3)
  end
  end
@@ -221,7 +221,7 @@ module Eps
  end
  else
  xml.TargetValueStats do
- v.sort_by { |k2, _| k2 }.each do |k2, v2|
+ v.sort_by { |k2, _| k2.to_s }.each do |k2, v2|
  xml.TargetValueStat(value: k2) do
  xml.GaussianDistribution(mean: v2[:mean], variance: v2[:stdev]**2)
  end
@@ -233,7 +233,7 @@ module Eps
  end
  xml.BayesOutput(fieldName: "target") do
  xml.TargetValueCounts do
- probabilities[:prior].sort_by { |k, _| k }.each do |k, v|
+ probabilities[:prior].sort_by { |k, _| k.to_s }.each do |k, v|
  xml.TargetValueCount(value: k, count: v)
  end
  end
@@ -1,3 +1,3 @@
  module Eps
- VERSION = "0.3.2"
+ VERSION = "0.3.7"
  end
metadata CHANGED
@@ -1,14 +1,14 @@
  --- !ruby/object:Gem::Specification
  name: eps
  version: !ruby/object:Gem::Version
- version: 0.3.2
+ version: 0.3.7
  platform: ruby
  authors:
  - Andrew Kane
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2019-12-09 00:00:00.000000000 Z
+ date: 2020-11-24 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: lightgbm
@@ -80,6 +80,20 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
+ - !ruby/object:Gem::Dependency
+ name: numo-narray
+ requirement: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
+ type: :development
+ prerelease: false
+ version_requirements: !ruby/object:Gem::Requirement
+ requirements:
+ - - ">="
+ - !ruby/object:Gem::Version
+ version: '0'
  - !ruby/object:Gem::Dependency
  name: rake
  requirement: !ruby/object:Gem::Requirement
94
108
  - - ">="
95
109
  - !ruby/object:Gem::Version
96
110
  version: '0'
97
- description:
111
+ - !ruby/object:Gem::Dependency
112
+ name: rover-df
113
+ requirement: !ruby/object:Gem::Requirement
114
+ requirements:
115
+ - - ">="
116
+ - !ruby/object:Gem::Version
117
+ version: '0'
118
+ type: :development
119
+ prerelease: false
120
+ version_requirements: !ruby/object:Gem::Requirement
121
+ requirements:
122
+ - - ">="
123
+ - !ruby/object:Gem::Version
124
+ version: '0'
125
+ description:
98
126
  email: andrew@chartkick.com
99
127
  executables: []
100
128
  extensions: []
@@ -128,7 +156,7 @@ homepage: https://github.com/ankane/eps
  licenses:
  - MIT
  metadata: {}
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -143,8 +171,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.0.3
- signing_key:
+ rubygems_version: 3.1.4
+ signing_key:
  specification_version: 4
  summary: Machine learning for Ruby. Supports regression (linear regression) and classification
  (naive Bayes)