eps 0.4.1 → 0.6.0
- checksums.yaml +4 -4
- data/CHANGELOG.md +8 -0
- data/LICENSE.txt +1 -1
- data/README.md +4 -58
- data/lib/eps/base_estimator.rb +1 -1
- data/lib/eps/data_frame.rb +1 -1
- data/lib/eps/evaluators/linear_regression.rb +3 -3
- data/lib/eps/evaluators/naive_bayes.rb +1 -1
- data/lib/eps/label_encoder.rb +1 -1
- data/lib/eps/linear_regression.rb +6 -7
- data/lib/eps/statistics.rb +60 -66
- data/lib/eps/text_encoder.rb +1 -1
- data/lib/eps/version.rb +1 -1
- data/lib/eps.rb +23 -21
- metadata +4 -8
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA256:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: 701cde7907172e33d54d05ee0f2236dbfa84486744e7a510190618848c878418
|
4
|
+
data.tar.gz: a6be730db321a5143e34727497c6ed675f8d734b6195cd4fdc5f8e6bc4e98ff1
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 6c76ec99a116a91cb9550b9a007b3150689b76649b14aa0c85582176fb44a08957b37c40e54ed1adb6447d8c5ec06757a611fad51a582581031f5724317ee888
|
7
|
+
data.tar.gz: 2176bf4351c26df4562a2879ecb2f0637d2d23bb624dc683790befa2029c2a3d89388b7d5b3183d4d48789801068d7442807c28c2a53439089e2ba0943e0bc9b
|
data/CHANGELOG.md
CHANGED
data/LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -7,7 +7,7 @@ Machine learning for Ruby
|
|
7
7
|
|
8
8
|
Check out [this post](https://ankane.org/rails-meet-data-science) for more info on machine learning with Rails
|
9
9
|
|
10
|
-
[](https://github.com/ankane/eps/actions)
|
11
11
|
|
12
12
|
## Installation
|
13
13
|
|
@@ -414,7 +414,7 @@ Eps::Model.new(data, validation_set: validation_set)
|
|
414
414
|
Split on a specific value
|
415
415
|
|
416
416
|
```ruby
|
417
|
-
Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("
|
417
|
+
Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2025-01-01")})
|
418
418
|
```
|
419
419
|
|
420
420
|
Specify the validation set size (the default is `0.25`, which is 25%)
|
@@ -435,7 +435,7 @@ The database is another place you can store models. It’s good if you retrain m
|
|
435
435
|
|
436
436
|
> We recommend adding monitoring and guardrails as well if you retrain automatically
|
437
437
|
|
438
|
-
Create an
|
438
|
+
Create an Active Record model to store the predictive model.
|
439
439
|
|
440
440
|
```sh
|
441
441
|
rails generate model Model key:string:uniq data:text
|
@@ -479,61 +479,7 @@ Weights are supported for metrics as well
|
|
479
479
|
Eps.metrics(actual, predicted, weight: weight)
|
480
480
|
```
|
481
481
|
|
482
|
-
Reweighing is one method to [mitigate bias](
|
483
|
-
|
484
|
-
## Upgrading
|
485
|
-
|
486
|
-
## 0.3.0
|
487
|
-
|
488
|
-
Eps 0.3.0 brings a number of improvements, including support for LightGBM and cross-validation. There are a number of breaking changes to be aware of:
|
489
|
-
|
490
|
-
- LightGBM is now the default for new models. On Mac, run:
|
491
|
-
|
492
|
-
```sh
|
493
|
-
brew install libomp
|
494
|
-
```
|
495
|
-
|
496
|
-
Pass the `algorithm` option to use linear regression or naive Bayes.
|
497
|
-
|
498
|
-
```ruby
|
499
|
-
Eps::Model.new(data, algorithm: :linear_regression) # or :naive_bayes
|
500
|
-
```
|
501
|
-
|
502
|
-
- Cross-validation happens automatically by default. You no longer need to create training and test sets manually. If you were splitting on a time, use:
|
503
|
-
|
504
|
-
```ruby
|
505
|
-
Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2019-01-01")})
|
506
|
-
```
|
507
|
-
|
508
|
-
Or randomly, use:
|
509
|
-
|
510
|
-
```ruby
|
511
|
-
Eps::Model.new(data, split: {validation_size: 0.3})
|
512
|
-
```
|
513
|
-
|
514
|
-
To continue splitting manually, use:
|
515
|
-
|
516
|
-
```ruby
|
517
|
-
Eps::Model.new(data, validation_set: test_set)
|
518
|
-
```
|
519
|
-
|
520
|
-
- It’s no longer possible to load models in JSON or PFA formats. Retrain models and save them as PMML.
|
521
|
-
|
522
|
-
## 0.2.0
|
523
|
-
|
524
|
-
Eps 0.2.0 brings a number of improvements, including support for classification.
|
525
|
-
|
526
|
-
We recommend:
|
527
|
-
|
528
|
-
1. Changing `Eps::Regressor` to `Eps::Model`
|
529
|
-
2. Converting models from JSON to PMML
|
530
|
-
|
531
|
-
```ruby
|
532
|
-
model = Eps::Model.load_json("model.json")
|
533
|
-
File.write("model.pmml", model.to_pmml)
|
534
|
-
```
|
535
|
-
|
536
|
-
3. Renaming `app/stats_models` to `app/ml_models`
|
482
|
+
Reweighing is one method to [mitigate bias](https://fairlearn.org/) in training data
|
537
483
|
|
538
484
|
## History
|
539
485
|
|
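The `split` option shown in the README hunk above takes a column and a cutoff value. The plain-Ruby sketch below illustrates the idea only and is not the gem's internals. It partitions a few hypothetical rows around the same cutoff. The sample rows, and the assumption that rows dated before the cutoff form the training set, are illustrative.

```ruby
require "date"

# Hypothetical rows; only the :listed_at column name comes from the README example.
rows = [
  {listed_at: Date.new(2024, 11, 5), price: 100},
  {listed_at: Date.new(2024, 12, 20), price: 120},
  {listed_at: Date.new(2025, 1, 1), price: 130},
  {listed_at: Date.new(2025, 2, 14), price: 150}
]

cutoff = Date.parse("2025-01-01")

# Assumption: rows strictly before the cutoff are used for training,
# rows on or after it for validation.
train, validation = rows.partition { |row| row[:listed_at] < cutoff }

puts "training rows:   #{train.size}"
puts "validation rows: #{validation.size}"
```

A time-based split like this keeps the validation set strictly later than the training data, which is usually what you want when the model will be used to predict future records.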
data/lib/eps/base_estimator.rb
CHANGED
data/lib/eps/data_frame.rb
CHANGED
data/lib/eps/evaluators/linear_regression.rb
CHANGED
@@ -4,7 +4,7 @@ module Eps
|
|
4
4
|
attr_reader :features
|
5
5
|
|
6
6
|
def initialize(coefficients:, features:, text_features:)
|
7
|
-
@coefficients =
|
7
|
+
@coefficients = coefficients.to_h { |k, v| [k.is_a?(Array) ? [k[0].to_s, k[1]] : k.to_s, v] }
|
8
8
|
@features = features
|
9
9
|
@text_features = text_features || {}
|
10
10
|
end
|
@@ -13,7 +13,7 @@ module Eps
|
|
13
13
|
raise "Probabilities not supported" if probabilities
|
14
14
|
|
15
15
|
intercept = @coefficients["_intercept"] || 0.0
|
16
|
-
scores =
|
16
|
+
scores = Array.new(x.size, intercept)
|
17
17
|
|
18
18
|
@features.each do |k, type|
|
19
19
|
raise "Missing data in #{k}" if !x.columns[k] || x.columns[k].any?(&:nil?)
|
@@ -50,7 +50,7 @@ module Eps
|
|
50
50
|
end
|
51
51
|
|
52
52
|
def coefficients
|
53
|
-
|
53
|
+
@coefficients.to_h { |k, v| [Array(k).join.to_sym, v] }
|
54
54
|
end
|
55
55
|
end
|
56
56
|
end
|
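The two added `to_h` calls in this hunk normalize coefficient keys on the way in and flatten them on the way out. The sketch below runs the same two expressions on a hypothetical coefficient hash to show the effect. The sample keys and values are made up, and the reading that array keys pair a categorical feature with one of its values is an assumption.

```ruby
# Hypothetical input: symbol keys plus one [feature, category] pair.
raw = {_intercept: 1.5, bedrooms: 2.0, [:state, "CA"] => 0.3}

# Same expression as the added initializer line: feature names become strings,
# and for array keys only the first element (the feature name) is converted.
normalized = raw.to_h { |k, v| [k.is_a?(Array) ? [k[0].to_s, k[1]] : k.to_s, v] }

# Same expression as the added `coefficients` reader: array keys are joined
# and every key becomes a symbol.
public_view = normalized.to_h { |k, v| [Array(k).join.to_sym, v] }

p normalized
p public_view
```

Block-form `Hash#to_h` requires Ruby 2.6 or later, which is compatible with the `required_ruby_version` of `3.1` listed in the metadata hunk.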
data/lib/eps/label_encoder.rb
CHANGED
data/lib/eps/linear_regression.rb
CHANGED
@@ -146,7 +146,7 @@ module Eps
|
|
146
146
|
|
147
147
|
@coefficient_names = data.columns.keys
|
148
148
|
@coefficient_names.unshift("_intercept") if intercept
|
149
|
-
@coefficients =
|
149
|
+
@coefficients = @coefficient_names.zip(v3).to_h
|
150
150
|
Evaluators::LinearRegression.new(coefficients: @coefficients, features: @features, text_features: @text_features)
|
151
151
|
end
|
152
152
|
|
@@ -172,21 +172,20 @@ module Eps
|
|
172
172
|
# add epsilon for perfect fits
|
173
173
|
# consistent with GSL
|
174
174
|
def t_value
|
175
|
-
@t_value ||=
|
175
|
+
@t_value ||= @coefficients.to_h { |k, v| [k, v / (std_err[k] + Float::EPSILON)] }
|
176
176
|
end
|
177
177
|
|
178
178
|
def p_value
|
179
179
|
@p_value ||= begin
|
180
|
-
|
181
|
-
|
182
|
-
|
183
|
-
end]
|
180
|
+
@coefficients.to_h do |k, _|
|
181
|
+
[k, 2 * Eps::Statistics.students_t_cdf(-t_value[k].abs, degrees_of_freedom)]
|
182
|
+
end
|
184
183
|
end
|
185
184
|
end
|
186
185
|
|
187
186
|
def std_err
|
188
187
|
@std_err ||= begin
|
189
|
-
|
188
|
+
@coefficient_names.zip(diagonal.map { |v| Math.sqrt(v) }).to_h
|
190
189
|
end
|
191
190
|
end
|
192
191
|
|
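The added `t_value` and `p_value` lines divide each coefficient by its standard error (plus `Float::EPSILON` to survive perfect fits) and then take a two-sided tail probability. The sketch below reproduces that arithmetic on hypothetical numbers. To stay self-contained it substitutes the closed-form Student's t CDF for exactly two degrees of freedom (`t_cdf_2dof`, a hypothetical helper) in place of `Eps::Statistics.students_t_cdf`.

```ruby
# Hypothetical coefficients and standard errors; "x" has a perfect fit (std_err 0).
coefficients = {"_intercept" => 1.0, "x" => 3.0}
std_err = {"_intercept" => 0.5, "x" => 0.0}

# Same expression as the diff: epsilon keeps the division finite when std_err is 0.
t_value = coefficients.to_h { |k, v| [k, v / (std_err[k] + Float::EPSILON)] }

# Stand-in CDF (assumption): closed form of the t-distribution for 2 degrees of freedom.
t_cdf_2dof = ->(x) { 0.5 + x / (2.0 * Math.sqrt(2.0 + x * x)) }

# Two-sided p-value: twice the lower-tail probability at -|t|.
p_value = coefficients.to_h { |k, _| [k, 2 * t_cdf_2dof.call(-t_value[k].abs)] }

p t_value
p p_value
```

With these inputs the intercept has a t-value of about 2 and a p-value near 0.18, while the perfectly fitted coefficient gets a very large but finite t-value and a p-value that is effectively zero.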
data/lib/eps/statistics.rb
CHANGED
@@ -1,79 +1,73 @@
|
|
1
|
-
### Extracted from https://github.com/estebanz01/ruby-statistics
|
2
|
-
### The Ruby author is Esteban Zapata Rojas
|
3
|
-
###
|
4
|
-
### Originally extracted from https://codeplea.com/incomplete-beta-function-c
|
5
|
-
### These functions shared under zlib license and the author is Lewis Van Winkle
|
6
|
-
|
7
1
|
module Eps
|
8
2
|
module Statistics
|
9
|
-
def self.
|
10
|
-
|
11
|
-
lower = (2.0 * Math.sqrt(value * value + degrees_of_freedom))
|
12
|
-
|
13
|
-
x = upper/lower
|
14
|
-
|
15
|
-
alpha = degrees_of_freedom/2.0
|
16
|
-
beta = degrees_of_freedom/2.0
|
17
|
-
|
18
|
-
incomplete_beta_function(x, alpha, beta)
|
3
|
+
def self.normal_cdf(x, mean, std_dev)
|
4
|
+
0.5 * (1.0 + Math.erf((x - mean) / (std_dev * Math.sqrt(2))))
|
19
5
|
end
|
20
6
|
|
21
|
-
|
22
|
-
|
23
|
-
|
24
|
-
|
25
|
-
|
26
|
-
|
27
|
-
|
28
|
-
|
7
|
+
# Hill, G. W. (1970).
|
8
|
+
# Algorithm 395: Student's t-distribution.
|
9
|
+
# Communications of the ACM, 13(10), 617-619.
|
10
|
+
def self.students_t_cdf(x, n)
|
11
|
+
start, sign = x < 0 ? [0, 1] : [1, -1]
|
12
|
+
|
13
|
+
z = 1.0
|
14
|
+
t = x * x
|
15
|
+
y = t / n.to_f
|
16
|
+
b = 1.0 + y
|
17
|
+
|
18
|
+
if n > n.floor || (n >= 20.0 && t < n) || n > 200.0
|
19
|
+
# asymptotic series for large or noninteger n
|
20
|
+
if y > 10e-6
|
21
|
+
y = Math.log(b)
|
22
|
+
end
|
23
|
+
a = n - 0.5
|
24
|
+
b = 48.0 * a * a
|
25
|
+
y *= a
|
26
|
+
y = (((((-0.4 * y - 3.3) * y - 24.0) * y - 85.5) / (0.8 * y * y + 100.0 + b) + y + 3.0) / b + 1.0) * Math.sqrt(y)
|
27
|
+
return start + sign * normal_cdf(-y, 0.0, 1.0)
|
29
28
|
end
|
30
29
|
|
31
|
-
|
32
|
-
|
33
|
-
|
34
|
-
|
35
|
-
|
36
|
-
|
37
|
-
|
38
|
-
# upper_left = (x ** alp) * ((1.0 - x) ** bet)
|
39
|
-
# front = upper_left/down_left
|
40
|
-
|
41
|
-
f, c, d = 1.0, 1.0, 0.0
|
42
|
-
|
43
|
-
returned_value = nil
|
44
|
-
|
45
|
-
# Let's do more iterations than the proposed implementation (200 iters)
|
46
|
-
(0..500).each do |number|
|
47
|
-
m = number/2
|
48
|
-
|
49
|
-
numerator = if number == 0
|
50
|
-
1.0
|
51
|
-
elsif number % 2 == 0
|
52
|
-
(m * (bet - m) * x)/((alp + 2.0 * m - 1.0)* (alp + 2.0 * m))
|
53
|
-
else
|
54
|
-
top = -((alp + m) * (alp + bet + m) * x)
|
55
|
-
down = ((alp + 2.0 * m) * (alp + 2.0 * m + 1.0))
|
56
|
-
|
57
|
-
top/down
|
58
|
-
end
|
59
|
-
|
60
|
-
d = 1.0 + numerator * d
|
61
|
-
d = tiny if d.abs < tiny
|
62
|
-
d = 1.0 / d
|
63
|
-
|
64
|
-
c = 1.0 + numerator / c
|
65
|
-
c = tiny if c.abs < tiny
|
66
|
-
|
67
|
-
cd = (c*d).freeze
|
68
|
-
f = f * cd
|
30
|
+
if n < 20 && t < 4.0
|
31
|
+
# nested summation of cosine series
|
32
|
+
y = Math.sqrt(y)
|
33
|
+
a = y
|
34
|
+
if n == 1
|
35
|
+
a = 0.0
|
36
|
+
end
|
69
37
|
|
70
|
-
|
71
|
-
|
72
|
-
|
38
|
+
# loop
|
39
|
+
if n > 1
|
40
|
+
n -= 2
|
41
|
+
while n > 1
|
42
|
+
a = (n - 1) / (b * n) * a + y
|
43
|
+
n -= 2
|
44
|
+
end
|
73
45
|
end
|
46
|
+
a = n == 0 ? a / Math.sqrt(b) : (Math.atan(y) + a / b) * (2.0 / Math::PI)
|
47
|
+
return start + sign * (z - a) / 2.0
|
74
48
|
end
|
75
49
|
|
76
|
-
|
50
|
+
# tail series expansion for large t-values
|
51
|
+
a = Math.sqrt(b)
|
52
|
+
y = a * n
|
53
|
+
j = 0
|
54
|
+
while a != z
|
55
|
+
j += 2
|
56
|
+
z = a
|
57
|
+
y = y * (j - 1) / (b * j)
|
58
|
+
a += y / (n + j)
|
59
|
+
end
|
60
|
+
z = 0.0
|
61
|
+
y = 0.0
|
62
|
+
a = -a
|
63
|
+
|
64
|
+
# loop (without n + 2 and n - 2)
|
65
|
+
while n > 1
|
66
|
+
a = (n - 1) / (b * n) * a + y
|
67
|
+
n -= 2
|
68
|
+
end
|
69
|
+
a = n == 0 ? a / Math.sqrt(b) : (Math.atan(y) + a / b) * (2.0 / Math::PI)
|
70
|
+
start + sign * (z - a) / 2.0
|
77
71
|
end
|
78
72
|
end
|
79
73
|
end
|
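The new `normal_cdf` helper is a one-line formula built on `Math.erf`, so it is easy to check against textbook values. The sketch below restates it as a standalone method and adds a hypothetical `cauchy_cdf` helper, the closed form of the Student's t CDF for one degree of freedom, as a reference point. Reading the diff, the cosine-series branch appears to reduce to this form when `n == 1`, but treat that as an observation rather than a guarantee.

```ruby
# Restated from the diff as a standalone method for experimentation.
def normal_cdf(x, mean, std_dev)
  0.5 * (1.0 + Math.erf((x - mean) / (std_dev * Math.sqrt(2))))
end

# Hypothetical reference helper: the t-distribution with 1 degree of freedom
# is the standard Cauchy distribution, whose CDF has this closed form.
def cauchy_cdf(x)
  0.5 + Math.atan(x) / Math::PI
end

puts normal_cdf(0.0, 0.0, 1.0)   # midpoint of the standard normal
puts normal_cdf(1.96, 0.0, 1.0)  # close to the familiar 97.5% quantile
puts cauchy_cdf(-1.0)            # lower-tail probability at t = -1 with 1 dof
```

Reference values like these are a cheap way to spot-check a hand-written distribution function after a rewrite such as the one in this release.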
data/lib/eps/text_encoder.rb
CHANGED
data/lib/eps/version.rb
CHANGED
data/lib/eps.rb
CHANGED
@@ -1,34 +1,36 @@
|
|
1
1
|
# dependencies
|
2
|
-
require "json"
|
3
2
|
require "lightgbm"
|
4
3
|
require "matrix"
|
5
4
|
require "nokogiri"
|
6
5
|
|
6
|
+
# stdlib
|
7
|
+
require "json"
|
8
|
+
|
7
9
|
# modules
|
8
|
-
|
9
|
-
|
10
|
-
|
11
|
-
|
12
|
-
|
13
|
-
|
14
|
-
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
10
|
+
require_relative "eps/base"
|
11
|
+
require_relative "eps/base_estimator"
|
12
|
+
require_relative "eps/data_frame"
|
13
|
+
require_relative "eps/label_encoder"
|
14
|
+
require_relative "eps/lightgbm"
|
15
|
+
require_relative "eps/linear_regression"
|
16
|
+
require_relative "eps/metrics"
|
17
|
+
require_relative "eps/model"
|
18
|
+
require_relative "eps/naive_bayes"
|
19
|
+
require_relative "eps/statistics"
|
20
|
+
require_relative "eps/text_encoder"
|
21
|
+
require_relative "eps/utils"
|
22
|
+
require_relative "eps/version"
|
21
23
|
|
22
24
|
# pmml
|
23
|
-
|
24
|
-
|
25
|
-
|
25
|
+
require_relative "eps/pmml"
|
26
|
+
require_relative "eps/pmml/generator"
|
27
|
+
require_relative "eps/pmml/loader"
|
26
28
|
|
27
29
|
# evaluators
|
28
|
-
|
29
|
-
|
30
|
-
|
31
|
-
|
30
|
+
require_relative "eps/evaluators/linear_regression"
|
31
|
+
require_relative "eps/evaluators/lightgbm"
|
32
|
+
require_relative "eps/evaluators/naive_bayes"
|
33
|
+
require_relative "eps/evaluators/node"
|
32
34
|
|
33
35
|
module Eps
|
34
36
|
class Error < StandardError; end
|
metadata
CHANGED
@@ -1,14 +1,13 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: eps
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.6.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Andrew Kane
|
8
|
-
autorequire:
|
9
8
|
bindir: bin
|
10
9
|
cert_chain: []
|
11
|
-
date:
|
10
|
+
date: 2025-02-01 00:00:00.000000000 Z
|
12
11
|
dependencies:
|
13
12
|
- !ruby/object:Gem::Dependency
|
14
13
|
name: lightgbm
|
@@ -52,7 +51,6 @@ dependencies:
|
|
52
51
|
- - ">="
|
53
52
|
- !ruby/object:Gem::Version
|
54
53
|
version: '0'
|
55
|
-
description:
|
56
54
|
email: andrew@ankane.org
|
57
55
|
executables: []
|
58
56
|
extensions: []
|
@@ -86,7 +84,6 @@ homepage: https://github.com/ankane/eps
|
|
86
84
|
licenses:
|
87
85
|
- MIT
|
88
86
|
metadata: {}
|
89
|
-
post_install_message:
|
90
87
|
rdoc_options: []
|
91
88
|
require_paths:
|
92
89
|
- lib
|
@@ -94,15 +91,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
|
|
94
91
|
requirements:
|
95
92
|
- - ">="
|
96
93
|
- !ruby/object:Gem::Version
|
97
|
-
version: '
|
94
|
+
version: '3.1'
|
98
95
|
required_rubygems_version: !ruby/object:Gem::Requirement
|
99
96
|
requirements:
|
100
97
|
- - ">="
|
101
98
|
- !ruby/object:Gem::Version
|
102
99
|
version: '0'
|
103
100
|
requirements: []
|
104
|
-
rubygems_version: 3.
|
105
|
-
signing_key:
|
101
|
+
rubygems_version: 3.6.2
|
106
102
|
specification_version: 4
|
107
103
|
summary: Machine learning for Ruby. Supports regression (linear regression) and classification
|
108
104
|
(naive Bayes)
|