eps 0.4.1 → 0.6.0

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
- metadata.gz: 4df8a83bee7fce8feebec2cf26d33d7ee4ca74fbcda9f41fb070f614cfb2e0eb
- data.tar.gz: 23f7dd9aa63eb4306268f19b862de3a07f9d72d9ec507160a7e6d291ea2245c6
+ metadata.gz: 701cde7907172e33d54d05ee0f2236dbfa84486744e7a510190618848c878418
+ data.tar.gz: a6be730db321a5143e34727497c6ed675f8d734b6195cd4fdc5f8e6bc4e98ff1
  SHA512:
- metadata.gz: c24ea7abf903829b3fe00dd0f7c601062464ecc193ccd8a725a98a437e7ed6f6bff8952c1c50aeeadcc5981e84325a44efedd53580e23f2475f1c8a7b927ed78
- data.tar.gz: 601cf18d044fd9ac348d3f632b7edda7fbd34ef11f497d4be998d62ae76f33f6681953fb5c263924deebed184b5b6f560bcd24de272c509317fb9c3b68f2f3b9
+ metadata.gz: 6c76ec99a116a91cb9550b9a007b3150689b76649b14aa0c85582176fb44a08957b37c40e54ed1adb6447d8c5ec06757a611fad51a582581031f5724317ee888
+ data.tar.gz: 2176bf4351c26df4562a2879ecb2f0637d2d23bb624dc683790befa2029c2a3d89388b7d5b3183d4d48789801068d7442807c28c2a53439089e2ba0943e0bc9b
data/CHANGELOG.md CHANGED
@@ -1,3 +1,11 @@
+ ## 0.6.0 (2025-02-01)
+
+ - Dropped support for Ruby < 3.1
+
+ ## 0.5.0 (2023-07-02)
+
+ - Dropped support for Ruby < 3
+
  ## 0.4.1 (2022-09-28)
 
  - Fixed `cannot load such file -- matrix` error with Ruby 3.1
data/LICENSE.txt CHANGED
@@ -1,6 +1,6 @@
  The MIT License (MIT)
 
- Copyright (c) 2018-2021 Andrew Kane
+ Copyright (c) 2018-2024 Andrew Kane
 
  Permission is hereby granted, free of charge, to any person obtaining a copy
  of this software and associated documentation files (the "Software"), to deal
data/README.md CHANGED
@@ -7,7 +7,7 @@ Machine learning for Ruby
 
  Check out [this post](https://ankane.org/rails-meet-data-science) for more info on machine learning with Rails
 
- [![Build Status](https://github.com/ankane/eps/workflows/build/badge.svg?branch=master)](https://github.com/ankane/eps/actions)
+ [![Build Status](https://github.com/ankane/eps/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/eps/actions)
 
  ## Installation
 
@@ -414,7 +414,7 @@ Eps::Model.new(data, validation_set: validation_set)
  Split on a specific value
 
  ```ruby
- Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2019-01-01")})
+ Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2025-01-01")})
  ```
 
  Specify the validation set size (the default is `0.25`, which is 25%)
@@ -435,7 +435,7 @@ The database is another place you can store models. It’s good if you retrain m
 
  > We recommend adding monitoring and guardrails as well if you retrain automatically
 
- Create an ActiveRecord model to store the predictive model.
+ Create an Active Record model to store the predictive model.
 
  ```sh
  rails generate model Model key:string:uniq data:text
@@ -479,61 +479,7 @@ Weights are supported for metrics as well
  Eps.metrics(actual, predicted, weight: weight)
  ```
 
- Reweighing is one method to [mitigate bias](http://aif360.mybluemix.net/) in training data
-
- ## Upgrading
-
- ## 0.3.0
-
- Eps 0.3.0 brings a number of improvements, including support for LightGBM and cross-validation. There are a number of breaking changes to be aware of:
-
- - LightGBM is now the default for new models. On Mac, run:
-
- ```sh
- brew install libomp
- ```
-
- Pass the `algorithm` option to use linear regression or naive Bayes.
-
- ```ruby
- Eps::Model.new(data, algorithm: :linear_regression) # or :naive_bayes
- ```
-
- - Cross-validation happens automatically by default. You no longer need to create training and test sets manually. If you were splitting on a time, use:
-
- ```ruby
- Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2019-01-01")})
- ```
-
- Or randomly, use:
-
- ```ruby
- Eps::Model.new(data, split: {validation_size: 0.3})
- ```
-
- To continue splitting manually, use:
-
- ```ruby
- Eps::Model.new(data, validation_set: test_set)
- ```
-
- - It’s no longer possible to load models in JSON or PFA formats. Retrain models and save them as PMML.
-
- ## 0.2.0
-
- Eps 0.2.0 brings a number of improvements, including support for classification.
-
- We recommend:
-
- 1. Changing `Eps::Regressor` to `Eps::Model`
- 2. Converting models from JSON to PMML
-
- ```ruby
- model = Eps::Model.load_json("model.json")
- File.write("model.pmml", model.to_pmml)
- ```
-
- 3. Renaming `app/stats_models` to `app/ml_models`
+ Reweighing is one method to [mitigate bias](https://fairlearn.org/) in training data
 
  ## History
 
@@ -226,7 +226,7 @@ module Eps
  end
 
  encoder.vocabulary.each do |word|
- train_set.columns[[k, word]] = [0] * counts.size
+ train_set.columns[[k, word]] = Array.new(counts.size, 0)
  end
 
  counts.each_with_index do |ci, i|
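
The hunk above swaps `[0] * counts.size` for `Array.new(counts.size, 0)`. As a minimal sketch (the `size` variable is illustrative and stands in for `counts.size`; nothing here is taken from the gem beyond the two expressions), both forms build an array of the same length filled with the same value:

```ruby
# Illustrative only: `size` stands in for `counts.size` in the diff.
size = 3

repeated = [0] * size          # old form: repeat a one-element array
filled   = Array.new(size, 0)  # new form: explicit size and default value

# Both forms place the same object in every slot, which is harmless for
# immutable values such as Integers and Floats.
puts repeated.inspect
puts filled.inspect
```

The same substitution appears in the later hunks that initialise `scores` and `px`, where the fill value is a Float.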
@@ -54,7 +54,7 @@ module Eps
  def map
  if @columns.any?
  size.times.map do |i|
- yield Hash[@columns.map { |k, v| [k, v[i]] }]
+ yield @columns.to_h { |k, v| [k, v[i]] }
  end
  end
  end
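
Most of the Ruby changes in this release follow the pattern in the hunk above: `Hash[pairs.map { ... }]` becomes `to_h { ... }`. The sketch below compares the two on a made-up column store (`columns` and `i` are hypothetical stand-ins for `@columns` and the row index) and suggests that they produce the same row hash:

```ruby
# Hypothetical column store, shaped like the @columns hash in the diff.
columns = {"price" => [10, 20], "rooms" => [2, 3]}
i = 1 # row index

# Old form: build an array of pairs, then convert it with Hash[].
old_row = Hash[columns.map { |k, v| [k, v[i]] }]

# New form: Hash#to_h with a block builds the hash directly.
new_row = columns.to_h { |k, v| [k, v[i]] }

puts old_row == new_row
```

The block form of `to_h` needs Ruby 2.6 or later, which fits comfortably within the Ruby 3.1 minimum listed in the changelog.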
@@ -4,7 +4,7 @@ module Eps
  attr_reader :features
 
  def initialize(coefficients:, features:, text_features:)
- @coefficients = Hash[coefficients.map { |k, v| [k.is_a?(Array) ? [k[0].to_s, k[1]] : k.to_s, v] }]
+ @coefficients = coefficients.to_h { |k, v| [k.is_a?(Array) ? [k[0].to_s, k[1]] : k.to_s, v] }
  @features = features
  @text_features = text_features || {}
  end
@@ -13,7 +13,7 @@ module Eps
  raise "Probabilities not supported" if probabilities
 
  intercept = @coefficients["_intercept"] || 0.0
- scores = [intercept] * x.size
+ scores = Array.new(x.size, intercept)
 
  @features.each do |k, type|
  raise "Missing data in #{k}" if !x.columns[k] || x.columns[k].any?(&:nil?)
@@ -50,7 +50,7 @@ module Eps
  end
 
  def coefficients
- Hash[@coefficients.map { |k, v| [Array(k).join.to_sym, v] }]
+ @coefficients.to_h { |k, v| [Array(k).join.to_sym, v] }
  end
  end
  end
@@ -33,7 +33,7 @@ module Eps
  total = probabilities[:prior].values.sum.to_f
  probabilities[:prior].each do |c, cv|
  prior = Math.log(cv / total)
- px = [prior] * x.size
+ px = Array.new(x.size, prior)
 
  @features.each do |k, type|
  case type
@@ -36,7 +36,7 @@ module Eps
  end
 
  def inverse_transform(y)
- inverse = Hash[@labels.map(&:reverse)]
+ inverse = @labels.map(&:reverse).to_h
  y.map do |yi|
  inverse[yi.to_i]
  end
@@ -146,7 +146,7 @@ module Eps
 
  @coefficient_names = data.columns.keys
  @coefficient_names.unshift("_intercept") if intercept
- @coefficients = Hash[@coefficient_names.zip(v3)]
+ @coefficients = @coefficient_names.zip(v3).to_h
  Evaluators::LinearRegression.new(coefficients: @coefficients, features: @features, text_features: @text_features)
  end
 
@@ -172,21 +172,20 @@ module Eps
  # add epsilon for perfect fits
  # consistent with GSL
  def t_value
- @t_value ||= Hash[@coefficients.map { |k, v| [k, v / (std_err[k] + Float::EPSILON)] }]
+ @t_value ||= @coefficients.to_h { |k, v| [k, v / (std_err[k] + Float::EPSILON)] }
  end
 
  def p_value
  @p_value ||= begin
- Hash[@coefficients.map do |k, _|
- tp = Eps::Statistics.tdist_p(t_value[k].abs, degrees_of_freedom)
- [k, 2 * (1 - tp)]
- end]
+ @coefficients.to_h do |k, _|
+ [k, 2 * Eps::Statistics.students_t_cdf(-t_value[k].abs, degrees_of_freedom)]
+ end
  end
  end
 
  def std_err
  @std_err ||= begin
- Hash[@coefficient_names.zip(diagonal.map { |v| Math.sqrt(v) })]
+ @coefficient_names.zip(diagonal.map { |v| Math.sqrt(v) }).to_h
  end
  end
 
@@ -1,79 +1,73 @@
- ### Extracted from https://github.com/estebanz01/ruby-statistics
- ### The Ruby author is Esteban Zapata Rojas
- ###
- ### Originally extracted from https://codeplea.com/incomplete-beta-function-c
- ### These functions shared under zlib license and the author is Lewis Van Winkle
-
  module Eps
  module Statistics
- def self.tdist_p(value, degrees_of_freedom)
- upper = (value + Math.sqrt(value * value + degrees_of_freedom))
- lower = (2.0 * Math.sqrt(value * value + degrees_of_freedom))
-
- x = upper/lower
-
- alpha = degrees_of_freedom/2.0
- beta = degrees_of_freedom/2.0
-
- incomplete_beta_function(x, alpha, beta)
+ def self.normal_cdf(x, mean, std_dev)
+ 0.5 * (1.0 + Math.erf((x - mean) / (std_dev * Math.sqrt(2))))
  end
 
- def self.incomplete_beta_function(x, alp, bet)
- return if x < 0.0
- return 1.0 if x > 1.0
-
- tiny = 1.0E-50
-
- if x > ((alp + 1.0)/(alp + bet + 2.0))
- return 1.0 - incomplete_beta_function(1.0 - x, bet, alp)
+ # Hill, G. W. (1970).
+ # Algorithm 395: Student's t-distribution.
+ # Communications of the ACM, 13(10), 617-619.
+ def self.students_t_cdf(x, n)
+ start, sign = x < 0 ? [0, 1] : [1, -1]
+
+ z = 1.0
+ t = x * x
+ y = t / n.to_f
+ b = 1.0 + y
+
+ if n > n.floor || (n >= 20.0 && t < n) || n > 200.0
+ # asymptotic series for large or noninteger n
+ if y > 10e-6
+ y = Math.log(b)
+ end
+ a = n - 0.5
+ b = 48.0 * a * a
+ y *= a
+ y = (((((-0.4 * y - 3.3) * y - 24.0) * y - 85.5) / (0.8 * y * y + 100.0 + b) + y + 3.0) / b + 1.0) * Math.sqrt(y)
+ return start + sign * normal_cdf(-y, 0.0, 1.0)
  end
 
- # To avoid overflow problems, the implementation applies the logarithm properties
- # to calculate in a faster and safer way the values.
- lbet_ab = (Math.lgamma(alp)[0] + Math.lgamma(bet)[0] - Math.lgamma(alp + bet)[0]).freeze
- front = (Math.exp(Math.log(x) * alp + Math.log(1.0 - x) * bet - lbet_ab) / alp.to_f).freeze
-
- # This is the non-log version of the left part of the formula (before the continuous fraction)
- # down_left = alp * self.beta_function(alp, bet)
- # upper_left = (x ** alp) * ((1.0 - x) ** bet)
- # front = upper_left/down_left
-
- f, c, d = 1.0, 1.0, 0.0
-
- returned_value = nil
-
- # Let's do more iterations than the proposed implementation (200 iters)
- (0..500).each do |number|
- m = number/2
-
- numerator = if number == 0
- 1.0
- elsif number % 2 == 0
- (m * (bet - m) * x)/((alp + 2.0 * m - 1.0)* (alp + 2.0 * m))
- else
- top = -((alp + m) * (alp + bet + m) * x)
- down = ((alp + 2.0 * m) * (alp + 2.0 * m + 1.0))
-
- top/down
- end
-
- d = 1.0 + numerator * d
- d = tiny if d.abs < tiny
- d = 1.0 / d
-
- c = 1.0 + numerator / c
- c = tiny if c.abs < tiny
-
- cd = (c*d).freeze
- f = f * cd
+ if n < 20 && t < 4.0
+ # nested summation of cosine series
+ y = Math.sqrt(y)
+ a = y
+ if n == 1
+ a = 0.0
+ end
 
- if (1.0 - cd).abs < 1.0E-10
- returned_value = front * (f - 1.0)
- break
+ # loop
+ if n > 1
+ n -= 2
+ while n > 1
+ a = (n - 1) / (b * n) * a + y
+ n -= 2
+ end
  end
+ a = n == 0 ? a / Math.sqrt(b) : (Math.atan(y) + a / b) * (2.0 / Math::PI)
+ return start + sign * (z - a) / 2.0
  end
 
- returned_value
+ # tail series expanation for large t-values
+ a = Math.sqrt(b)
+ y = a * n
+ j = 0
+ while a != z
+ j += 2
+ z = a
+ y = y * (j - 1) / (b * j)
+ a += y / (n + j)
+ end
+ z = 0.0
+ y = 0.0
+ a = -a
+
+ # loop (without n + 2 and n - 2)
+ while n > 1
+ a = (n - 1) / (b * n) * a + y
+ n -= 2
+ end
+ a = n == 0 ? a / Math.sqrt(b) : (Math.atan(y) + a / b) * (2.0 / Math::PI)
+ start + sign * (z - a) / 2.0
  end
  end
  end
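
The `p_value` change in the linear regression hunk replaces `2 * (1 - tp)` with `2 * students_t_cdf(-|t|, df)`. Assuming the removed `tdist_p` returned the cumulative probability at `|t|`, the two forms agree for any distribution that is symmetric around zero. The sketch below illustrates that symmetry with the `normal_cdf` formula from the hunk above as a stand-in; it is not the gem's Student's t implementation, and the test statistic is made up:

```ruby
# Same formula as normal_cdf in the diff, written as a top-level method.
def normal_cdf(x, mean, std_dev)
  0.5 * (1.0 + Math.erf((x - mean) / (std_dev * Math.sqrt(2))))
end

t = 1.96 # hypothetical test statistic

# Old shape: one minus the upper CDF, doubled.
upper_tail_form = 2 * (1 - normal_cdf(t.abs, 0.0, 1.0))

# New shape: lower-tail CDF at -|t|, doubled.
lower_tail_form = 2 * normal_cdf(-t.abs, 0.0, 1.0)

puts upper_tail_form
puts lower_tail_form
```

One plausible benefit of the lower-tail form, not stated in the changelog, is that it avoids subtracting a value close to 1 from 1, which loses precision when `|t|` is large.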
@@ -27,7 +27,7 @@ module Eps
 
  max_features = options[:max_features]
  if max_features
- counts = Hash[counts.sort_by { |_, v| -v }[0...max_features]]
+ counts = counts.sort_by { |_, v| -v }[0...max_features].to_h
  end
 
  @vocabulary = counts.keys
data/lib/eps/version.rb CHANGED
@@ -1,3 +1,3 @@
  module Eps
- VERSION = "0.4.1"
+ VERSION = "0.6.0"
  end
data/lib/eps.rb CHANGED
@@ -1,34 +1,36 @@
  # dependencies
- require "json"
  require "lightgbm"
  require "matrix"
  require "nokogiri"
 
+ # stdlib
+ require "json"
+
  # modules
- require "eps/base"
- require "eps/base_estimator"
- require "eps/data_frame"
- require "eps/label_encoder"
- require "eps/lightgbm"
- require "eps/linear_regression"
- require "eps/metrics"
- require "eps/model"
- require "eps/naive_bayes"
- require "eps/statistics"
- require "eps/text_encoder"
- require "eps/utils"
- require "eps/version"
+ require_relative "eps/base"
+ require_relative "eps/base_estimator"
+ require_relative "eps/data_frame"
+ require_relative "eps/label_encoder"
+ require_relative "eps/lightgbm"
+ require_relative "eps/linear_regression"
+ require_relative "eps/metrics"
+ require_relative "eps/model"
+ require_relative "eps/naive_bayes"
+ require_relative "eps/statistics"
+ require_relative "eps/text_encoder"
+ require_relative "eps/utils"
+ require_relative "eps/version"
 
  # pmml
- require "eps/pmml"
- require "eps/pmml/generator"
- require "eps/pmml/loader"
+ require_relative "eps/pmml"
+ require_relative "eps/pmml/generator"
+ require_relative "eps/pmml/loader"
 
  # evaluators
- require "eps/evaluators/linear_regression"
- require "eps/evaluators/lightgbm"
- require "eps/evaluators/naive_bayes"
- require "eps/evaluators/node"
+ require_relative "eps/evaluators/linear_regression"
+ require_relative "eps/evaluators/lightgbm"
+ require_relative "eps/evaluators/naive_bayes"
+ require_relative "eps/evaluators/node"
 
  module Eps
  class Error < StandardError; end
metadata CHANGED
@@ -1,14 +1,13 @@
  --- !ruby/object:Gem::Specification
  name: eps
  version: !ruby/object:Gem::Version
- version: 0.4.1
+ version: 0.6.0
  platform: ruby
  authors:
  - Andrew Kane
- autorequire:
  bindir: bin
  cert_chain: []
- date: 2022-09-28 00:00:00.000000000 Z
+ date: 2025-02-01 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: lightgbm
@@ -52,7 +51,6 @@ dependencies:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
- description:
  email: andrew@ankane.org
  executables: []
  extensions: []
@@ -86,7 +84,6 @@ homepage: https://github.com/ankane/eps
  licenses:
  - MIT
  metadata: {}
- post_install_message:
  rdoc_options: []
  require_paths:
  - lib
@@ -94,15 +91,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
- version: '2.7'
+ version: '3.1'
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubygems_version: 3.3.7
- signing_key:
+ rubygems_version: 3.6.2
  specification_version: 4
  summary: Machine learning for Ruby. Supports regression (linear regression) and classification
  (naive Bayes)
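
The metadata hunk raises `required_ruby_version` from `>= 2.7` to `>= 3.1`, matching the changelog entry. As a rough sketch of how such a constraint is evaluated, the snippet below uses the RubyGems `Gem::Requirement` and `Gem::Version` classes; the list of Ruby versions is made up for illustration:

```ruby
require "rubygems" # usually loaded already; explicit here for clarity

# The constraint string from the new gemspec metadata.
requirement = Gem::Requirement.new(">= 3.1")

# Hypothetical interpreter versions to check against the constraint.
["2.7.8", "3.0.6", "3.1.0", "3.4.1"].each do |v|
  ok = requirement.satisfied_by?(Gem::Version.new(v))
  puts "Ruby #{v}: #{ok ? 'allowed' : 'rejected'}"
end
```

To check the running interpreter instead, pass `Gem::Version.new(RUBY_VERSION)` to `satisfied_by?`.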