RubyGems - eps - Versions diffs - 0.4.1 → 0.6.0 - Mend

eps 0.4.1 → 0.6.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (15)

checksums.yaml +4 -4
data/CHANGELOG.md +8 -0
data/LICENSE.txt +1 -1
data/README.md +4 -58
data/lib/eps/base_estimator.rb +1 -1
data/lib/eps/data_frame.rb +1 -1
data/lib/eps/evaluators/linear_regression.rb +3 -3
data/lib/eps/evaluators/naive_bayes.rb +1 -1
data/lib/eps/label_encoder.rb +1 -1
data/lib/eps/linear_regression.rb +6 -7
data/lib/eps/statistics.rb +60 -66
data/lib/eps/text_encoder.rb +1 -1
data/lib/eps/version.rb +1 -1
data/lib/eps.rb +23 -21
metadata +4 -8

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz: 4df8a83bee7fce8feebec2cf26d33d7ee4ca74fbcda9f41fb070f614cfb2e0eb
-  data.tar.gz: 23f7dd9aa63eb4306268f19b862de3a07f9d72d9ec507160a7e6d291ea2245c6
+  metadata.gz: 701cde7907172e33d54d05ee0f2236dbfa84486744e7a510190618848c878418
+  data.tar.gz: a6be730db321a5143e34727497c6ed675f8d734b6195cd4fdc5f8e6bc4e98ff1
 SHA512:
-  metadata.gz: c24ea7abf903829b3fe00dd0f7c601062464ecc193ccd8a725a98a437e7ed6f6bff8952c1c50aeeadcc5981e84325a44efedd53580e23f2475f1c8a7b927ed78
-  data.tar.gz: 601cf18d044fd9ac348d3f632b7edda7fbd34ef11f497d4be998d62ae76f33f6681953fb5c263924deebed184b5b6f560bcd24de272c509317fb9c3b68f2f3b9
+  metadata.gz: 6c76ec99a116a91cb9550b9a007b3150689b76649b14aa0c85582176fb44a08957b37c40e54ed1adb6447d8c5ec06757a611fad51a582581031f5724317ee888
+  data.tar.gz: 2176bf4351c26df4562a2879ecb2f0637d2d23bb624dc683790befa2029c2a3d89388b7d5b3183d4d48789801068d7442807c28c2a53439089e2ba0943e0bc9b
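
The digests above can be checked locally. A minimal sketch, assuming the `.gem` archive has already been unpacked so that `metadata.gz` and `data.tar.gz` are ordinary files on disk; the helper name `digest_matches?` is hypothetical and is not part of RubyGems or eps:

```ruby
require "digest"

# Hypothetical helper: compare the SHA256 of raw bytes with an expected hex digest.
def digest_matches?(bytes, expected_hex)
  Digest::SHA256.hexdigest(bytes) == expected_hex.downcase
end

# Usage (the file path is an assumption about where the archive was unpacked):
# digest_matches?(
#   File.binread("data.tar.gz"),
#   "a6be730db321a5143e34727497c6ed675f8d734b6195cd4fdc5f8e6bc4e98ff1"
# )
```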

data/CHANGELOG.md CHANGED

@@ -1,3 +1,11 @@
+## 0.6.0 (2025-02-01)
+- Dropped support for Ruby < 3.1
+## 0.5.0 (2023-07-02)
+- Dropped support for Ruby < 3
 ## 0.4.1 (2022-09-28)
 - Fixed `cannot load such file -- matrix` error with Ruby 3.1

data/LICENSE.txt CHANGED

@@ -1,6 +1,6 @@
 The MIT License (MIT)
-Copyright (c) 2018-2021 Andrew Kane
+Copyright (c) 2018-2024 Andrew Kane
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

data/README.md CHANGED

@@ -7,7 +7,7 @@ Machine learning for Ruby
 Check out [this post](https://ankane.org/rails-meet-data-science) for more info on machine learning with Rails
-[![Build Status](https://github.com/ankane/eps/workflows/build/badge.svg?branch=master)](https://github.com/ankane/eps/actions)
+[![Build Status](https://github.com/ankane/eps/actions/workflows/build.yml/badge.svg)](https://github.com/ankane/eps/actions)
 ## Installation
@@ -414,7 +414,7 @@ Eps::Model.new(data, validation_set: validation_set)
 Split on a specific value
 ```ruby
-Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2019-01-01")})
+Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2025-01-01")})
 ```
 Specify the validation set size (the default is `0.25`, which is 25%)
@@ -435,7 +435,7 @@ The database is another place you can store models. It’s good if you retrain m
 > We recommend adding monitoring and guardrails as well if you retrain automatically
-Create an ActiveRecord model to store the predictive model.
+Create an Active Record model to store the predictive model.
 ```sh
 rails generate model Model key:string:uniq data:text
@@ -479,61 +479,7 @@ Weights are supported for metrics as well
 Eps.metrics(actual, predicted, weight: weight)
 ```
-Reweighing is one method to [mitigate bias](http://aif360.mybluemix.net/) in training data
-## Upgrading
-## 0.3.0
-Eps 0.3.0 brings a number of improvements, including support for LightGBM and cross-validation. There are a number of breaking changes to be aware of:
-- LightGBM is now the default for new models. On Mac, run:
-  ```sh
-  brew install libomp
-  ```
-  Pass the `algorithm` option to use linear regression or naive Bayes.
-  ```ruby
-  Eps::Model.new(data, algorithm: :linear_regression) # or :naive_bayes
-  ```
-- Cross-validation happens automatically by default. You no longer need to create training and test sets manually. If you were splitting on a time, use:
-  ```ruby
-  Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2019-01-01")})
-  ```
-  Or randomly, use:
-  ```ruby
-  Eps::Model.new(data, split: {validation_size: 0.3})
-  ```
-  To continue splitting manually, use:
-  ```ruby
-  Eps::Model.new(data, validation_set: test_set)
-  ```
-- It’s no longer possible to load models in JSON or PFA formats. Retrain models and save them as PMML.
-## 0.2.0
-Eps 0.2.0 brings a number of improvements, including support for classification.
-We recommend:
-1. Changing `Eps::Regressor` to `Eps::Model`
-2. Converting models from JSON to PMML
-  ```ruby
-  model = Eps::Model.load_json("model.json")
-  File.write("model.pmml", model.to_pmml)
-  ```
-3. Renaming `app/stats_models` to `app/ml_models`
+Reweighing is one method to [mitigate bias](https://fairlearn.org/) in training data
 ## History

data/lib/eps/base_estimator.rb CHANGED

@@ -226,7 +226,7 @@ module Eps
         end
         encoder.vocabulary.each do |word|
-          train_set.columns[[k, word]] = [0] * counts.size
+          train_set.columns[[k, word]] = Array.new(counts.size, 0)
         end
         counts.each_with_index do |ci, i|
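
The `[0] * n` to `Array.new(n, 0)` change recurs in several files of this diff. A small sketch suggesting the two forms build the same array for an immutable fill value; the method names are hypothetical and exist only for this comparison:

```ruby
# Old style: repeat a one-element array n times.
def zeros_by_repeat(n)
  [0] * n
end

# New style: ask Array.new for n slots filled with a default value.
def zeros_by_new(n)
  Array.new(n, 0)
end

p zeros_by_repeat(3) == zeros_by_new(3) # prints true
```

Both forms fill every slot with the same object, which is harmless for numbers such as `0` or a Float intercept.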

data/lib/eps/data_frame.rb CHANGED

@@ -54,7 +54,7 @@ module Eps
     def map
       if @columns.any?
         size.times.map do |i|
-          yield Hash[@columns.map { |k, v| [k, v[i]] }]
+          yield @columns.to_h { |k, v| [k, v[i]] }
         end
       end
     end
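
The `Hash[... map ...]` to `to_h { ... }` rewrite seen here is repeated across the diff. A minimal sketch of the equivalence, using a made-up `columns` hash shaped like a column store (the data and method names are assumptions for illustration only):

```ruby
# Old style: map to [key, value] pairs, then wrap the pairs with Hash[].
def row_with_hash_brackets(columns, i)
  Hash[columns.map { |k, v| [k, v[i]] }]
end

# New style: Hash#to_h with a block builds the hash in one step.
def row_with_to_h(columns, i)
  columns.to_h { |k, v| [k, v[i]] }
end

columns = {"price" => [100, 200], "beds" => [1, 3]}
p row_with_to_h(columns, 1) # prints {"price"=>200, "beds"=>3} (spacing varies by Ruby version)
```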

data/lib/eps/evaluators/linear_regression.rb CHANGED

@@ -4,7 +4,7 @@ module Eps
       attr_reader :features
       def initialize(coefficients:, features:, text_features:)
-        @coefficients = Hash[coefficients.map { |k, v| [k.is_a?(Array) ? [k[0].to_s, k[1]] : k.to_s, v] }]
+        @coefficients = coefficients.to_h { |k, v| [k.is_a?(Array) ? [k[0].to_s, k[1]] : k.to_s, v] }
         @features = features
         @text_features = text_features || {}
       end
@@ -13,7 +13,7 @@ module Eps
         raise "Probabilities not supported" if probabilities
         intercept = @coefficients["_intercept"] || 0.0
-        scores = [intercept] * x.size
+        scores = Array.new(x.size, intercept)
         @features.each do |k, type|
           raise "Missing data in #{k}" if !x.columns[k] || x.columns[k].any?(&:nil?)
@@ -50,7 +50,7 @@ module Eps
       end
       def coefficients
-        Hash[@coefficients.map { |k, v| [Array(k).join.to_sym, v] }]
+        @coefficients.to_h { |k, v| [Array(k).join.to_sym, v] }
       end
     end
   end

data/lib/eps/evaluators/naive_bayes.rb CHANGED

@@ -33,7 +33,7 @@ module Eps
         total = probabilities[:prior].values.sum.to_f
         probabilities[:prior].each do |c, cv|
           prior = Math.log(cv / total)
-          px = [prior] * x.size
+          px = Array.new(x.size, prior)
           @features.each do |k, type|
             case type

data/lib/eps/label_encoder.rb CHANGED

@@ -36,7 +36,7 @@ module Eps
     end
     def inverse_transform(y)
-      inverse = Hash[@labels.map(&:reverse)]
+      inverse = @labels.map(&:reverse).to_h
       y.map do |yi|
         inverse[yi.to_i]
       end

data/lib/eps/linear_regression.rb CHANGED

@@ -146,7 +146,7 @@ module Eps
       @coefficient_names = data.columns.keys
       @coefficient_names.unshift("_intercept") if intercept
-      @coefficients = Hash[@coefficient_names.zip(v3)]
+      @coefficients = @coefficient_names.zip(v3).to_h
       Evaluators::LinearRegression.new(coefficients: @coefficients, features: @features, text_features: @text_features)
     end
@@ -172,21 +172,20 @@ module Eps
     # add epsilon for perfect fits
     # consistent with GSL
     def t_value
-      @t_value ||= Hash[@coefficients.map { |k, v| [k, v / (std_err[k] + Float::EPSILON)] }]
+      @t_value ||= @coefficients.to_h { |k, v| [k, v / (std_err[k] + Float::EPSILON)] }
     end
     def p_value
       @p_value ||= begin
-        Hash[@coefficients.map do |k, _|
-          tp = Eps::Statistics.tdist_p(t_value[k].abs, degrees_of_freedom)
-          [k, 2 * (1 - tp)]
-        end]
+        @coefficients.to_h do |k, _|
+          [k, 2 * Eps::Statistics.students_t_cdf(-t_value[k].abs, degrees_of_freedom)]
+        end
       end
     end
     def std_err
       @std_err ||= begin
-        Hash[@coefficient_names.zip(diagonal.map { |v| Math.sqrt(v) })]
+        @coefficient_names.zip(diagonal.map { |v| Math.sqrt(v) }).to_h
       end
     end
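
The `p_value` hunk above swaps `2 * (1 - F(|t|))` for `2 * F(-|t|)`. For a distribution symmetric about zero, the two are the same quantity. A hedged sketch of that identity, using the standard normal CDF as a stand-in for Student's t (an assumption made to keep the example self-contained; the method names are hypothetical):

```ruby
# Standard normal CDF, used here only as a symmetric stand-in distribution.
def std_normal_cdf(x)
  0.5 * (1.0 + Math.erf(x / Math.sqrt(2)))
end

# Shape of the old formula: twice the upper tail, via the complement.
def two_tailed_via_complement(t)
  2 * (1 - std_normal_cdf(t.abs))
end

# Shape of the new formula: twice the lower tail at -|t|.
def two_tailed_via_lower_tail(t)
  2 * std_normal_cdf(-t.abs)
end

diff = (two_tailed_via_complement(1.96) - two_tailed_via_lower_tail(1.96)).abs
p diff < 1e-12 # prints true
```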

data/lib/eps/statistics.rb CHANGED

@@ -1,79 +1,73 @@
-### Extracted from https://github.com/estebanz01/ruby-statistics
-### The Ruby author is Esteban Zapata Rojas
-###
-### Originally extracted from https://codeplea.com/incomplete-beta-function-c
-### These functions shared under zlib license and the author is Lewis Van Winkle
 module Eps
   module Statistics
-    def self.tdist_p(value, degrees_of_freedom)
-      upper = (value + Math.sqrt(value * value + degrees_of_freedom))
-      lower = (2.0 * Math.sqrt(value * value + degrees_of_freedom))
-      x = upper/lower
-      alpha = degrees_of_freedom/2.0
-      beta = degrees_of_freedom/2.0
-      incomplete_beta_function(x, alpha, beta)
+    def self.normal_cdf(x, mean, std_dev)
+      0.5 * (1.0 + Math.erf((x - mean) / (std_dev * Math.sqrt(2))))
     end
-    def self.incomplete_beta_function(x, alp, bet)
-      return if x < 0.0
-      return 1.0 if x > 1.0
-      tiny = 1.0E-50
-      if x > ((alp + 1.0)/(alp + bet + 2.0))
-        return 1.0 - incomplete_beta_function(1.0 - x, bet, alp)
+    # Hill, G. W. (1970).
+    # Algorithm 395: Student's t-distribution.
+    # Communications of the ACM, 13(10), 617-619.
+    def self.students_t_cdf(x, n)
+      start, sign = x < 0 ? [0, 1] : [1, -1]
+      z = 1.0
+      t = x * x
+      y = t / n.to_f
+      b = 1.0 + y
+      if n > n.floor || (n >= 20.0 && t < n) || n > 200.0
+        # asymptotic series for large or noninteger n
+        if y > 10e-6
+          y = Math.log(b)
+        end
+        a = n - 0.5
+        b = 48.0 * a * a
+        y *= a
+        y = (((((-0.4 * y - 3.3) * y - 24.0) * y - 85.5) / (0.8 * y * y + 100.0 + b) + y + 3.0) / b + 1.0) * Math.sqrt(y)
+        return start + sign * normal_cdf(-y, 0.0, 1.0)
       end
-      # To avoid overflow problems, the implementation applies the logarithm properties
-      # to calculate in a faster and safer way the values.
-      lbet_ab = (Math.lgamma(alp)[0] + Math.lgamma(bet)[0] - Math.lgamma(alp + bet)[0]).freeze
-      front = (Math.exp(Math.log(x) * alp + Math.log(1.0 - x) * bet - lbet_ab) / alp.to_f).freeze
-      # This is the non-log version of the left part of the formula (before the continuous fraction)
-      # down_left = alp * self.beta_function(alp, bet)
-      # upper_left = (x ** alp) * ((1.0 - x) ** bet)
-      # front = upper_left/down_left
-      f, c, d = 1.0, 1.0, 0.0
-      returned_value = nil
-      # Let's do more iterations than the proposed implementation (200 iters)
-      (0..500).each do |number|
-        m = number/2
-        numerator = if number == 0
-                      1.0
-                    elsif number % 2 == 0
-                      (m * (bet - m) * x)/((alp + 2.0 * m - 1.0)* (alp + 2.0 * m))
-                    else
-                      top = -((alp + m) * (alp + bet + m) * x)
-                      down = ((alp + 2.0 * m) * (alp + 2.0 * m + 1.0))
-                      top/down
-                    end
-        d = 1.0 + numerator * d
-        d = tiny if d.abs < tiny
-        d = 1.0 / d
-        c = 1.0 + numerator / c
-        c = tiny if c.abs < tiny
-        cd = (c*d).freeze
-        f = f * cd
+      if n < 20 && t < 4.0
+        # nested summation of cosine series
+        y = Math.sqrt(y)
+        a = y
+        if n == 1
+          a = 0.0
+        end
-        if (1.0 - cd).abs < 1.0E-10
-          returned_value = front * (f - 1.0)
-          break
+        # loop
+        if n > 1
+          n -= 2
+          while n > 1
+            a = (n - 1) / (b * n) * a + y
+            n -= 2
+          end
         end
+        a = n == 0 ? a / Math.sqrt(b) : (Math.atan(y) + a / b) * (2.0 / Math::PI)
+        return start + sign * (z - a) / 2.0
       end
-      returned_value
+      # tail series expanation for large t-values
+      a = Math.sqrt(b)
+      y = a * n
+      j = 0
+      while a != z
+        j += 2
+        z = a
+        y = y * (j - 1) / (b * j)
+        a += y / (n + j)
+      end
+      z = 0.0
+      y = 0.0
+      a = -a
+      # loop (without n + 2 and n - 2)
+      while n > 1
+        a = (n - 1) / (b * n) * a + y
+        n -= 2
+      end
+      a = n == 0 ? a / Math.sqrt(b) : (Math.atan(y) + a / b) * (2.0 / Math::PI)
+      start + sign * (z - a) / 2.0
     end
   end
 end
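
One way to spot-check a Student's t CDF such as the new `students_t_cdf` is to compare it with the closed forms that exist for 1 and 2 degrees of freedom. A minimal sketch of those reference formulas; the method names are hypothetical, and the commented-out comparison assumes eps 0.6.0 is installed:

```ruby
# Closed-form Student's t CDF for 1 degree of freedom (the Cauchy distribution).
def t_cdf_1df(x)
  0.5 + Math.atan(x) / Math::PI
end

# Closed-form Student's t CDF for 2 degrees of freedom.
def t_cdf_2df(x)
  0.5 + x / (2.0 * Math.sqrt(2.0 + x * x))
end

p t_cdf_1df(1.0).round(6) # prints 0.75

# With the gem loaded, the values should agree to within floating-point error:
# require "eps"
# (Eps::Statistics.students_t_cdf(1.0, 1) - t_cdf_1df(1.0)).abs
```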

data/lib/eps/text_encoder.rb CHANGED

@@ -27,7 +27,7 @@ module Eps
       max_features = options[:max_features]
       if max_features
-        counts = Hash[counts.sort_by { |_, v| -v }[0...max_features]]
+        counts = counts.sort_by { |_, v| -v }[0...max_features].to_h
       end
       @vocabulary = counts.keys

data/lib/eps/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Eps
-  VERSION = "0.4.1"
+  VERSION = "0.6.0"
 end

data/lib/eps.rb CHANGED

@@ -1,34 +1,36 @@
 # dependencies
-require "json"
 require "lightgbm"
 require "matrix"
 require "nokogiri"
+# stdlib
+require "json"
 # modules
-require "eps/base"
-require "eps/base_estimator"
-require "eps/data_frame"
-require "eps/label_encoder"
-require "eps/lightgbm"
-require "eps/linear_regression"
-require "eps/metrics"
-require "eps/model"
-require "eps/naive_bayes"
-require "eps/statistics"
-require "eps/text_encoder"
-require "eps/utils"
-require "eps/version"
+require_relative "eps/base"
+require_relative "eps/base_estimator"
+require_relative "eps/data_frame"
+require_relative "eps/label_encoder"
+require_relative "eps/lightgbm"
+require_relative "eps/linear_regression"
+require_relative "eps/metrics"
+require_relative "eps/model"
+require_relative "eps/naive_bayes"
+require_relative "eps/statistics"
+require_relative "eps/text_encoder"
+require_relative "eps/utils"
+require_relative "eps/version"
 # pmml
-require "eps/pmml"
-require "eps/pmml/generator"
-require "eps/pmml/loader"
+require_relative "eps/pmml"
+require_relative "eps/pmml/generator"
+require_relative "eps/pmml/loader"
 # evaluators
-require "eps/evaluators/linear_regression"
-require "eps/evaluators/lightgbm"
-require "eps/evaluators/naive_bayes"
-require "eps/evaluators/node"
+require_relative "eps/evaluators/linear_regression"
+require_relative "eps/evaluators/lightgbm"
+require_relative "eps/evaluators/naive_bayes"
+require_relative "eps/evaluators/node"
 module Eps
   class Error < StandardError; end

metadata CHANGED

@@ -1,14 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: eps
 version: !ruby/object:Gem::Version
-  version: 0.4.1
+  version: 0.6.0
 platform: ruby
 authors:
 - Andrew Kane
-autorequire:
 bindir: bin
 cert_chain: []
-date: 2022-09-28 00:00:00.000000000 Z
+date: 2025-02-01 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: lightgbm
@@ -52,7 +51,6 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
-description:
 email: andrew@ankane.org
 executables: []
 extensions: []
@@ -86,7 +84,6 @@ homepage: https://github.com/ankane/eps
 licenses:
 - MIT
 metadata: {}
-post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -94,15 +91,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '2.7'
+      version: '3.1'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version: 3.3.7
-signing_key:
+rubygems_version: 3.6.2
 specification_version: 4
 summary: Machine learning for Ruby. Supports regression (linear regression) and classification
   (naive Bayes)