eps 0.5.0 → 0.7.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: d93161edfe5b26ce55bbdafedfa4ead7fad756cc0f3e921f2b970a49c97bb5fc
- data.tar.gz: 5d0e4f8326a6e446efbe0a4a6f9e8e6435b7314ac4dba737d87bbe4b73c4e04a
+ metadata.gz: 9559f4d440f2a7cb6d541073600829a0b1bdd0df61c6abe166e0ff731a34fe18
+ data.tar.gz: 6a9f37c76d6bdec50f877d9932f805bda751b87c03c0a69f187830dc7c1876d4
  SHA512:
- metadata.gz: e387214353fdf13608d48b306db3ce1b635eb3977f052d1d47b3e2b8cbe0c14628e01ca1d4291eaa9d3fb833864ff02628817155275d2105a069d2f4a866b8b3
- data.tar.gz: b27237a71a7198719b3000f385ea946547258f789f1a650cc348ed38d96e49c4d56b01149917807c97200aa737d6864739094c91d86ab8bafdd29e96e25e0d3b
+ metadata.gz: c12d5b688bbe9884a0350efd3900de68f1244264dc0cf56f7a8a64145b3ff4a2f08608146e596d0052c1a7da4cc52c4606ae9a5e52b180c1fe1e60d034170e9f
+ data.tar.gz: 5fe61bd82055f22210f27f7d1cceb7adbb71d50dd6750336ddc731ae617d749e2c9c478ada19f6a2ad8def2c51d8e2e13a0269ba7687ee97331490a49f59d7c4
data/CHANGELOG.md CHANGED
@@ -1,3 +1,12 @@
+ ## 0.7.0 (2026-04-14)
+
+ - Dropped support for Daru
+ - Dropped support for Ruby < 3.3
+
+ ## 0.6.0 (2025-02-01)
+
+ - Dropped support for Ruby < 3.1
+
  ## 0.5.0 (2023-07-02)
 
  - Dropped support for Ruby < 3
data/LICENSE.txt CHANGED
@@ -1,6 +1,6 @@
  The MIT License (MIT)
 
- Copyright (c) 2018-2023 Andrew Kane
+ Copyright (c) 2018-2026 Andrew Kane
 
  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -7,7 +7,7 @@ Machine learning for Ruby
 
  Check out [this post](https://ankane.org/rails-meet-data-science) for more info on machine learning with Rails
 
- [![Build Status](https://github.com/ankane/eps/workflows/build/badge.svg?branch=master)](https://github.com/ankane/eps/actions)
+ [![Build Status](https://github.com/ankane/eps/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/eps/actions)
 
  ## Installation
 
@@ -147,7 +147,7 @@ You can set advanced options with:
  ```ruby
  text_features: {
  description: {
- min_occurences: 5, # min times a word must appear to be included in the model
+ min_occurrences: 5, # min times a word must appear to be included in the model
  max_features: 1000, # max number of words to include in the model
  min_length: 1, # min length of words to be included
  case_sensitive: true, # how to treat words with different case
@@ -336,13 +336,6 @@ df = Rover.read_csv("houses.csv")
  Eps::Model.new(df, target: "price")
  ```
 
- Or a Daru data frame
-
- ```ruby
- df = Daru::DataFrame.from_csv("houses.csv")
- Eps::Model.new(df, target: "price")
- ```
-
  When reading CSV files directly, be sure to convert numeric fields. The `table` method does this automatically.
 
  ```ruby
@@ -414,7 +407,7 @@ Eps::Model.new(data, validation_set: validation_set)
  Split on a specific value
 
  ```ruby
- Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2019-01-01")})
+ Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2025-01-01")})
  ```
 
  Specify the validation set size (the default is `0.25`, which is 25%)
@@ -435,7 +428,7 @@ The database is another place you can store models. It’s good if you retrain m
 
  > We recommend adding monitoring and guardrails as well if you retrain automatically
 
- Create an ActiveRecord model to store the predictive model.
+ Create an Active Record model to store the predictive model.
 
  ```sh
  rails generate model Model key:string:uniq data:text
@@ -479,61 +472,7 @@ Weights are supported for metrics as well
  Eps.metrics(actual, predicted, weight: weight)
  ```
 
- Reweighing is one method to [mitigate bias](http://aif360.mybluemix.net/) in training data
-
- ## Upgrading
-
- ## 0.3.0
-
- Eps 0.3.0 brings a number of improvements, including support for LightGBM and cross-validation. There are a number of breaking changes to be aware of:
-
- - LightGBM is now the default for new models. On Mac, run:
-
- ```sh
- brew install libomp
- ```
-
- Pass the `algorithm` option to use linear regression or naive Bayes.
-
- ```ruby
- Eps::Model.new(data, algorithm: :linear_regression) # or :naive_bayes
- ```
-
- - Cross-validation happens automatically by default. You no longer need to create training and test sets manually. If you were splitting on a time, use:
-
- ```ruby
- Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2019-01-01")})
- ```
-
- Or randomly, use:
-
- ```ruby
- Eps::Model.new(data, split: {validation_size: 0.3})
- ```
-
- To continue splitting manually, use:
-
- ```ruby
- Eps::Model.new(data, validation_set: test_set)
- ```
-
- - It’s no longer possible to load models in JSON or PFA formats. Retrain models and save them as PMML.
-
- ## 0.2.0
-
- Eps 0.2.0 brings a number of improvements, including support for classification.
-
- We recommend:
-
- 1. Changing `Eps::Regressor` to `Eps::Model`
- 2. Converting models from JSON to PMML
-
- ```ruby
- model = Eps::Model.load_json("model.json")
- File.write("model.pmml", model.to_pmml)
- ```
-
- 3. Renaming `app/stats_models` to `app/ml_models`
+ Reweighing is one method to [mitigate bias](https://fairlearn.org/) in training data
 
  ## History
 
@@ -27,8 +27,8 @@ module Eps
 
  def self.load_pmml(pmml)
  model = new
- model.instance_variable_set("@evaluator", PMML.load(pmml))
- model.instance_variable_set("@pmml", pmml.respond_to?(:to_xml) ? pmml.to_xml : pmml) # cache data
+ model.instance_variable_set(:@evaluator, PMML.load(pmml))
+ model.instance_variable_set(:@pmml, pmml.respond_to?(:to_xml) ? pmml.to_xml : pmml) # cache data
  model
  end
 
@@ -226,7 +226,7 @@ module Eps
  end
 
  encoder.vocabulary.each do |word|
- train_set.columns[[k, word]] = [0] * counts.size
+ train_set.columns[[k, word]] = Array.new(counts.size, 0)
  end
 
  counts.each_with_index do |ci, i|
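The `Array.new(counts.size, 0)` form above is equivalent to `[0] * counts.size` because the fill value is an immutable Integer. One caveat worth noting (not from the diff, just a general Ruby pitfall): with a mutable fill value, every slot aliases the same object, so the block form is the safe spelling. A minimal sketch:

```ruby
# Immutable fill value: both spellings produce equal arrays
a = Array.new(3, 0)
b = [0] * 3
p a == b # => true

# Mutable fill value: all slots alias one object
shared = Array.new(2, [])
shared[0] << 1
p shared # => [[1], [1]]

# Block form yields a fresh object per slot
fresh = Array.new(2) { [] }
fresh[0] << 1
p fresh # => [[1], []]
```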
@@ -10,7 +10,7 @@ module Eps
  data.columns.each do |k, v|
  @columns[k] = v
  end
- elsif rover?(data) || daru?(data)
+ elsif rover?(data)
  data.to_h.each do |k, v|
  @columns[k.to_s] = v.to_a
  end
@@ -152,9 +152,5 @@ module Eps
  def rover?(x)
  defined?(Rover::DataFrame) && x.is_a?(Rover::DataFrame)
  end
-
- def daru?(x)
- defined?(Daru::DataFrame) && x.is_a?(Daru::DataFrame)
- end
  end
  end
@@ -13,7 +13,7 @@ module Eps
  raise "Probabilities not supported" if probabilities
 
  intercept = @coefficients["_intercept"] || 0.0
- scores = [intercept] * x.size
+ scores = Array.new(x.size, intercept)
 
  @features.each do |k, type|
  raise "Missing data in #{k}" if !x.columns[k] || x.columns[k].any?(&:nil?)
@@ -14,7 +14,7 @@ module Eps
  probs = calculate_class_probabilities(x)
  probs.map do |xp|
  if probabilities
- sum = xp.values.map { |v| Math.exp(v) }.sum.to_f
+ sum = xp.values.sum { |v| Math.exp(v) }.to_f
  xp.map { |k, v| [k, Math.exp(v) / sum] }.to_h
  else
  xp.sort_by { |k, v| [-v, k] }[0][0]
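Several hunks in this release fold `map { … }.sum` into `sum { … }`, which skips the intermediate mapped array. A hedged aside: in CRuby, `sum` over Floats uses compensated summation, so in principle it can differ from a naive fold in the last bits; for the probability normalization here either spelling gives the same result to within float tolerance. Sketch:

```ruby
values = [0.0, 1.0, 2.0]

# map + sum allocates a throwaway array of the mapped values
a = values.map { |v| Math.exp(v) }.sum

# sum with a block accumulates directly, no intermediate array
b = values.sum { |v| Math.exp(v) }

p (a - b).abs < 1e-12 # => true
```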
@@ -33,7 +33,7 @@ module Eps
  total = probabilities[:prior].values.sum.to_f
  probabilities[:prior].each do |c, cv|
  prior = Math.log(cv / total)
- px = [prior] * x.size
+ px = Array.new(x.size, prior)
 
  @features.each do |k, type|
  case type
@@ -44,7 +44,7 @@ module Eps
 
  # unknown value if not vc
  if vc
- denom = probabilities[:conditional][k].map { |k, v| v[c] }.sum.to_f
+ denom = probabilities[:conditional][k].sum { |k, v| v[c] }.to_f
  p2 = vc[c].to_f / denom
 
  # TODO use proper smoothing instead
data/lib/eps/lightgbm.rb CHANGED
@@ -20,7 +20,6 @@ module Eps
  def _train(verbose: nil, early_stopping: nil, learning_rate: 0.1)
  train_set = @train_set
  validation_set = @validation_set.dup
- summary_label = train_set.label
 
  # create check set
  evaluator_set = validation_set || train_set
@@ -134,7 +133,6 @@ module Eps
  actual = evaluator.predict(evaluator_set)
  end
 
- regression = objective == "regression" || objective == "binary"
  bad_observations = []
  expected.zip(actual).each_with_index do |(exp, act), i|
  success = (act - exp).abs < 0.001
@@ -221,13 +221,13 @@
 
  # total sum of squares
  def sst
- @sst ||= @train_set.label.map { |y| (y - y_bar)**2 }.sum
+ @sst ||= @train_set.label.sum { |y| (y - y_bar)**2 }
  end
 
  # sum of squared errors of prediction
  # not to be confused with "explained sum of squares"
  def sse
- @sse ||= @train_set.label.zip(y_hat).map { |y, yh| (y - yh)**2 }.sum
+ @sse ||= @train_set.label.zip(y_hat).sum { |y, yh| (y - yh)**2 }
  end
 
  def mst
@@ -73,7 +73,7 @@ module Eps
  # smooth
  if smoothing
  labels.each do |label|
- sum = prob.map { |k2, v2| v2[label] }.sum.to_f
+ sum = prob.sum { |k2, v2| v2[label] }.to_f
  prob.each do |k2, v|
  v[label] = (v[label] + smoothing) * sum / (sum + (prob.size * smoothing))
  end
@@ -151,7 +151,7 @@ module Eps
  end
 
  def linear_regression
- predictors = model.instance_variable_get("@coefficients").dup
+ predictors = model.instance_variable_get(:@coefficients).dup
  intercept = predictors.delete("_intercept") || 0.0
 
  data_fields = {}
@@ -377,43 +377,43 @@ module Eps
  # TODO create instance methods on model for all of these features
 
  def features
- model.instance_variable_get("@features")
+ model.instance_variable_get(:@features)
  end
 
  def text_features
- model.instance_variable_get("@text_features")
+ model.instance_variable_get(:@text_features)
  end
 
  def text_encoders
- model.instance_variable_get("@text_encoders")
+ model.instance_variable_get(:@text_encoders)
  end
 
  def feature_importance
- model.instance_variable_get("@feature_importance")
+ model.instance_variable_get(:@feature_importance)
  end
 
  def labels
- model.instance_variable_get("@labels")
+ model.instance_variable_get(:@labels)
  end
 
  def trees
- model.instance_variable_get("@trees")
+ model.instance_variable_get(:@trees)
  end
 
  def target
- model.instance_variable_get("@target")
+ model.instance_variable_get(:@target)
  end
 
  def label_encoders
- model.instance_variable_get("@label_encoders")
+ model.instance_variable_get(:@label_encoders)
  end
 
  def objective
- model.instance_variable_get("@objective")
+ model.instance_variable_get(:@objective)
  end
 
  def probabilities
- model.instance_variable_get("@probabilities")
+ model.instance_variable_get(:@probabilities)
  end
 
  # end TODO
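The hunks above swap string arguments to `instance_variable_get`/`instance_variable_set` for symbols. Both forms are valid Ruby and refer to the same variable; the symbol form is the idiomatic choice and avoids allocating a string per call. A minimal sketch with a hypothetical `Holder` class:

```ruby
class Holder; end

h = Holder.new

# A string name and a symbol name address the same instance variable
h.instance_variable_set("@features", 1)
h.instance_variable_set(:@features, 2)

p h.instance_variable_get(:@features)  # => 2
p h.instance_variable_get("@features") # => 2
```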
@@ -139,8 +139,6 @@ module Eps
  features[name] = n.css("TargetValueStat").any? ? "numeric" : "categorical"
  end
 
- target = node.css("BayesOutput").attribute("fieldName").value
-
  probabilities = {
  prior: prior,
  conditional: conditional
@@ -47,7 +47,7 @@ module Eps
  return start + sign * (z - a) / 2.0
  end
 
- # tail series expanation for large t-values
+ # tail series expansion for large t-values
  a = Math.sqrt(b)
  y = a * n
  j = 0
@@ -36,7 +36,7 @@ module Eps
  end
 
  def transform(arr)
- counts, fit = count_and_fit(arr)
+ _, fit = count_and_fit(arr)
  fit
  end
 
data/lib/eps/utils.rb CHANGED
@@ -3,14 +3,14 @@ module Eps
  def self.column_type(c, k)
  if !c
  raise ArgumentError, "Missing column: #{k}"
- elsif c.all? { |v| v.nil? }
+ elsif c.all?(&:nil?)
  # goes here for empty as well
  nil
- elsif c.any? { |v| v.nil? }
+ elsif c.any?(&:nil?)
  raise ArgumentError, "Missing values in column #{k}"
- elsif c.all? { |v| v.is_a?(Numeric) }
+ elsif c.all?(Numeric)
  "numeric"
- elsif c.all? { |v| v.is_a?(String) }
+ elsif c.all?(String)
  "categorical"
  elsif c.all? { |v| v == true || v == false }
  "categorical" # boolean
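Since Ruby 2.5, `all?`, `any?`, and `none?` accept a pattern argument matched with `===`, so `c.all?(Numeric)` above is a terser equivalent of `c.all? { |v| v.is_a?(Numeric) }`, and `all?(&:nil?)` is the Symbol-to-proc shorthand for the nil check. Sketch:

```ruby
nums  = [1, 2.5, 3]
mixed = [1, "two", 3]

p nums.all?(Numeric)  # => true
p mixed.all?(Numeric) # => false
p mixed.any?(String)  # => true

# nil check via Symbol-to-proc, as in the hunk above
p [nil, nil].all?(&:nil?) # => true
```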
data/lib/eps/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Eps
- VERSION = "0.5.0"
+ VERSION = "0.7.0"
  end
metadata CHANGED
@@ -1,14 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: eps
  version: !ruby/object:Gem::Version
- version: 0.5.0
+ version: 0.7.0
  platform: ruby
  authors:
  - Andrew Kane
- autorequire:
  bindir: bin
  cert_chain: []
- date: 2023-07-02 00:00:00.000000000 Z
+ date: 1980-01-02 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: lightgbm
@@ -16,14 +15,14 @@ dependencies:
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 0.1.7
+ version: '0.4'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: 0.1.7
+ version: '0.4'
  - !ruby/object:Gem::Dependency
  name: matrix
  requirement: !ruby/object:Gem::Requirement
@@ -52,7 +51,6 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
- description:
  email: andrew@ankane.org
  executables: []
  extensions: []
@@ -86,7 +84,6 @@ homepage: https://github.com/ankane/eps
  licenses:
  - MIT
  metadata: {}
- post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -94,15 +91,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: '3'
+ version: '3.3'
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.4.10
- signing_key:
+ rubygems_version: 4.0.6
  specification_version: 4
  summary: Machine learning for Ruby. Supports regression (linear regression) and classification
  (naive Bayes)