eps 0.4.1 → 0.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +8 -0
- data/LICENSE.txt +1 -1
- data/README.md +4 -58
- data/lib/eps/base_estimator.rb +1 -1
- data/lib/eps/data_frame.rb +1 -1
- data/lib/eps/evaluators/linear_regression.rb +3 -3
- data/lib/eps/evaluators/naive_bayes.rb +1 -1
- data/lib/eps/label_encoder.rb +1 -1
- data/lib/eps/linear_regression.rb +6 -7
- data/lib/eps/statistics.rb +60 -66
- data/lib/eps/text_encoder.rb +1 -1
- data/lib/eps/version.rb +1 -1
- data/lib/eps.rb +23 -21
- metadata +4 -8
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 701cde7907172e33d54d05ee0f2236dbfa84486744e7a510190618848c878418
+  data.tar.gz: a6be730db321a5143e34727497c6ed675f8d734b6195cd4fdc5f8e6bc4e98ff1
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 6c76ec99a116a91cb9550b9a007b3150689b76649b14aa0c85582176fb44a08957b37c40e54ed1adb6447d8c5ec06757a611fad51a582581031f5724317ee888
+  data.tar.gz: 2176bf4351c26df4562a2879ecb2f0637d2d23bb624dc683790befa2029c2a3d89388b7d5b3183d4d48789801068d7442807c28c2a53439089e2ba0943e0bc9b
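These SHA256/SHA512 values can be spot-checked locally. A minimal sketch, assuming the archive members (`metadata.gz`, `data.tar.gz`) have already been extracted from the downloaded `eps-0.6.0.gem` (a plain tar archive) into the current directory:

```ruby
require "digest"

# Digest of the extracted data.tar.gz should match the new SHA256 value above
puts Digest::SHA256.file("data.tar.gz").hexdigest
# expected: a6be730db321a5143e34727497c6ed675f8d734b6195cd4fdc5f8e6bc4e98ff1

puts Digest::SHA512.file("metadata.gz").hexdigest
```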
data/CHANGELOG.md
CHANGED
data/LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -7,7 +7,7 @@ Machine learning for Ruby

 Check out [this post](https://ankane.org/rails-meet-data-science) for more info on machine learning with Rails

-[](https://github.com/ankane/eps/actions)

 ## Installation

@@ -414,7 +414,7 @@ Eps::Model.new(data, validation_set: validation_set)
 Split on a specific value

 ```ruby
-Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("
+Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2025-01-01")})
 ```

 Specify the validation set size (the default is `0.25`, which is 25%)
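For context, the updated snippet plugs into the usual training flow. A minimal sketch, where `listings`, `:price`, and `:listed_at` are hypothetical data and column names:

```ruby
require "date"
require "eps"

# Train on rows listed before 2025-01-01, validate on the rest
model = Eps::Model.new(
  listings,
  target: :price,
  split: {column: :listed_at, value: Date.parse("2025-01-01")}
)
puts model.summary
```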
@@ -435,7 +435,7 @@ The database is another place you can store models. It’s good if you retrain m

 > We recommend adding monitoring and guardrails as well if you retrain automatically

-Create an
+Create an Active Record model to store the predictive model.

 ```sh
 rails generate model Model key:string:uniq data:text
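A minimal sketch of how a generated `Model` record is typically used to store and reload the predictive model (the key name `"price"` and the feature names are examples; `to_pmml` and `Eps::Model.load_pmml` come from Eps):

```ruby
# Save the trained model's PMML
store = Model.where(key: "price").first_or_initialize
store.update(data: model.to_pmml)

# Later, load it back for predictions
data = Model.find_by!(key: "price").data
model = Eps::Model.load_pmml(data)
model.predict(bedrooms: 2, bathrooms: 1)
```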
@@ -479,61 +479,7 @@ Weights are supported for metrics as well
 Eps.metrics(actual, predicted, weight: weight)
 ```

-Reweighing is one method to [mitigate bias](
-
-## Upgrading
-
-## 0.3.0
-
-Eps 0.3.0 brings a number of improvements, including support for LightGBM and cross-validation. There are a number of breaking changes to be aware of:
-
-- LightGBM is now the default for new models. On Mac, run:
-
-```sh
-brew install libomp
-```
-
-Pass the `algorithm` option to use linear regression or naive Bayes.
-
-```ruby
-Eps::Model.new(data, algorithm: :linear_regression) # or :naive_bayes
-```
-
-- Cross-validation happens automatically by default. You no longer need to create training and test sets manually. If you were splitting on a time, use:
-
-```ruby
-Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2019-01-01")})
-```
-
-Or randomly, use:
-
-```ruby
-Eps::Model.new(data, split: {validation_size: 0.3})
-```
-
-To continue splitting manually, use:
-
-```ruby
-Eps::Model.new(data, validation_set: test_set)
-```
-
-- It’s no longer possible to load models in JSON or PFA formats. Retrain models and save them as PMML.
-
-## 0.2.0
-
-Eps 0.2.0 brings a number of improvements, including support for classification.
-
-We recommend:
-
-1. Changing `Eps::Regressor` to `Eps::Model`
-2. Converting models from JSON to PMML
-
-```ruby
-model = Eps::Model.load_json("model.json")
-File.write("model.pmml", model.to_pmml)
-```
-
-3. Renaming `app/stats_models` to `app/ml_models`
+Reweighing is one method to [mitigate bias](https://fairlearn.org/) in training data

 ## History

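A minimal sketch of the weighted metrics call referenced in the hunk above, with made-up values:

```ruby
actual    = [3.0, 5.0, 8.0]
predicted = [2.5, 5.0, 7.0]
weight    = [1, 1, 2]

Eps.metrics(actual, predicted)                 # e.g. {rmse: ..., mae: ..., me: ...}
Eps.metrics(actual, predicted, weight: weight) # same metrics, weighted per row
```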
data/lib/eps/base_estimator.rb
CHANGED
data/lib/eps/data_frame.rb
CHANGED
data/lib/eps/evaluators/linear_regression.rb
CHANGED
@@ -4,7 +4,7 @@ module Eps
       attr_reader :features

       def initialize(coefficients:, features:, text_features:)
-        @coefficients =
+        @coefficients = coefficients.to_h { |k, v| [k.is_a?(Array) ? [k[0].to_s, k[1]] : k.to_s, v] }
         @features = features
         @text_features = text_features || {}
       end
@@ -13,7 +13,7 @@ module Eps
         raise "Probabilities not supported" if probabilities

         intercept = @coefficients["_intercept"] || 0.0
-        scores =
+        scores = Array.new(x.size, intercept)

         @features.each do |k, type|
           raise "Missing data in #{k}" if !x.columns[k] || x.columns[k].any?(&:nil?)
@@ -50,7 +50,7 @@ module Eps
       end

       def coefficients
-
+        @coefficients.to_h { |k, v| [Array(k).join.to_sym, v] }
       end
     end
   end
data/lib/eps/label_encoder.rb
CHANGED
data/lib/eps/linear_regression.rb
CHANGED
@@ -146,7 +146,7 @@ module Eps

       @coefficient_names = data.columns.keys
       @coefficient_names.unshift("_intercept") if intercept
-      @coefficients =
+      @coefficients = @coefficient_names.zip(v3).to_h
       Evaluators::LinearRegression.new(coefficients: @coefficients, features: @features, text_features: @text_features)
     end

@@ -172,21 +172,20 @@ module Eps
     # add epsilon for perfect fits
     # consistent with GSL
     def t_value
-      @t_value ||=
+      @t_value ||= @coefficients.to_h { |k, v| [k, v / (std_err[k] + Float::EPSILON)] }
     end

     def p_value
       @p_value ||= begin
-
-
-
-        end]
+        @coefficients.to_h do |k, _|
+          [k, 2 * Eps::Statistics.students_t_cdf(-t_value[k].abs, degrees_of_freedom)]
+        end
       end
     end

     def std_err
       @std_err ||= begin
-
+        @coefficient_names.zip(diagonal.map { |v| Math.sqrt(v) }).to_h
       end
     end

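The rewritten `p_value` computes a two-sided p-value from each coefficient's t statistic. A minimal sketch of the same calculation outside the class, with made-up numbers:

```ruby
require "eps"

t_statistic = 2.1
degrees_of_freedom = 28

# Two-sided p-value: twice the lower-tail probability of -|t|
p_value = 2 * Eps::Statistics.students_t_cdf(-t_statistic.abs, degrees_of_freedom)
```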
data/lib/eps/statistics.rb
CHANGED
@@ -1,79 +1,73 @@
-### Extracted from https://github.com/estebanz01/ruby-statistics
-### The Ruby author is Esteban Zapata Rojas
-###
-### Originally extracted from https://codeplea.com/incomplete-beta-function-c
-### These functions shared under zlib license and the author is Lewis Van Winkle
-
 module Eps
   module Statistics
-    def self.
-
-      lower = (2.0 * Math.sqrt(value * value + degrees_of_freedom))
-
-      x = upper/lower
-
-      alpha = degrees_of_freedom/2.0
-      beta = degrees_of_freedom/2.0
-
-      incomplete_beta_function(x, alpha, beta)
+    def self.normal_cdf(x, mean, std_dev)
+      0.5 * (1.0 + Math.erf((x - mean) / (std_dev * Math.sqrt(2))))
     end

-
-
-
-
-
-
-
-
+    # Hill, G. W. (1970).
+    # Algorithm 395: Student's t-distribution.
+    # Communications of the ACM, 13(10), 617-619.
+    def self.students_t_cdf(x, n)
+      start, sign = x < 0 ? [0, 1] : [1, -1]
+
+      z = 1.0
+      t = x * x
+      y = t / n.to_f
+      b = 1.0 + y
+
+      if n > n.floor || (n >= 20.0 && t < n) || n > 200.0
+        # asymptotic series for large or noninteger n
+        if y > 10e-6
+          y = Math.log(b)
+        end
+        a = n - 0.5
+        b = 48.0 * a * a
+        y *= a
+        y = (((((-0.4 * y - 3.3) * y - 24.0) * y - 85.5) / (0.8 * y * y + 100.0 + b) + y + 3.0) / b + 1.0) * Math.sqrt(y)
+        return start + sign * normal_cdf(-y, 0.0, 1.0)
       end

-
-
-
-
-
-
-
-      # upper_left = (x ** alp) * ((1.0 - x) ** bet)
-      # front = upper_left/down_left
-
-      f, c, d = 1.0, 1.0, 0.0
-
-      returned_value = nil
-
-      # Let's do more iterations than the proposed implementation (200 iters)
-      (0..500).each do |number|
-        m = number/2
-
-        numerator = if number == 0
-          1.0
-        elsif number % 2 == 0
-          (m * (bet - m) * x)/((alp + 2.0 * m - 1.0)* (alp + 2.0 * m))
-        else
-          top = -((alp + m) * (alp + bet + m) * x)
-          down = ((alp + 2.0 * m) * (alp + 2.0 * m + 1.0))
-
-          top/down
-        end
-
-        d = 1.0 + numerator * d
-        d = tiny if d.abs < tiny
-        d = 1.0 / d
-
-        c = 1.0 + numerator / c
-        c = tiny if c.abs < tiny
-
-        cd = (c*d).freeze
-        f = f * cd
+      if n < 20 && t < 4.0
+        # nested summation of cosine series
+        y = Math.sqrt(y)
+        a = y
+        if n == 1
+          a = 0.0
+        end

-
-
-
+        # loop
+        if n > 1
+          n -= 2
+          while n > 1
+            a = (n - 1) / (b * n) * a + y
+            n -= 2
+          end
+        end
         end
+        a = n == 0 ? a / Math.sqrt(b) : (Math.atan(y) + a / b) * (2.0 / Math::PI)
+        return start + sign * (z - a) / 2.0
       end

-
+      # tail series expanation for large t-values
+      a = Math.sqrt(b)
+      y = a * n
+      j = 0
+      while a != z
+        j += 2
+        z = a
+        y = y * (j - 1) / (b * j)
+        a += y / (n + j)
+      end
+      z = 0.0
+      y = 0.0
+      a = -a
+
+      # loop (without n + 2 and n - 2)
+      while n > 1
+        a = (n - 1) / (b * n) * a + y
+        n -= 2
+      end
+      a = n == 0 ? a / Math.sqrt(b) : (Math.atan(y) + a / b) * (2.0 / Math::PI)
+      start + sign * (z - a) / 2.0
     end
   end
 end
data/lib/eps/text_encoder.rb
CHANGED
data/lib/eps/version.rb
CHANGED
data/lib/eps.rb
CHANGED
@@ -1,34 +1,36 @@
 # dependencies
-require "json"
 require "lightgbm"
 require "matrix"
 require "nokogiri"

+# stdlib
+require "json"
+
 # modules
-
-
-
-
-
-
-
-
-
-
-
-
-
+require_relative "eps/base"
+require_relative "eps/base_estimator"
+require_relative "eps/data_frame"
+require_relative "eps/label_encoder"
+require_relative "eps/lightgbm"
+require_relative "eps/linear_regression"
+require_relative "eps/metrics"
+require_relative "eps/model"
+require_relative "eps/naive_bayes"
+require_relative "eps/statistics"
+require_relative "eps/text_encoder"
+require_relative "eps/utils"
+require_relative "eps/version"

 # pmml
-
-
-
+require_relative "eps/pmml"
+require_relative "eps/pmml/generator"
+require_relative "eps/pmml/loader"

 # evaluators
-
-
-
-
+require_relative "eps/evaluators/linear_regression"
+require_relative "eps/evaluators/lightgbm"
+require_relative "eps/evaluators/naive_bayes"
+require_relative "eps/evaluators/node"

 module Eps
   class Error < StandardError; end
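The load file now pulls in the gem's own files with `require_relative` (resolved relative to `lib/eps.rb`) and groups `json` under stdlib; applications are unaffected and still load everything with a single require. A trivial sketch:

```ruby
require "eps"

Eps::VERSION # => "0.6.0"
```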
metadata
CHANGED
@@ -1,14 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: eps
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.6.0
 platform: ruby
 authors:
 - Andrew Kane
-autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2025-02-01 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: lightgbm
@@ -52,7 +51,6 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
-description:
 email: andrew@ankane.org
 executables: []
 extensions: []
@@ -86,7 +84,6 @@ homepage: https://github.com/ankane/eps
 licenses:
 - MIT
 metadata: {}
-post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -94,15 +91,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '
+      version: '3.1'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.
-signing_key:
+rubygems_version: 3.6.2
 specification_version: 4
 summary: Machine learning for Ruby. Supports regression (linear regression) and classification
   (naive Bayes)