eps 0.5.0 → 0.7.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- checksums.yaml +4 -4
- data/CHANGELOG.md +9 -0
- data/LICENSE.txt +1 -1
- data/README.md +5 -66
- data/lib/eps/base_estimator.rb +3 -3
- data/lib/eps/data_frame.rb +1 -5
- data/lib/eps/evaluators/linear_regression.rb +1 -1
- data/lib/eps/evaluators/naive_bayes.rb +3 -3
- data/lib/eps/lightgbm.rb +0 -2
- data/lib/eps/linear_regression.rb +2 -2
- data/lib/eps/naive_bayes.rb +1 -1
- data/lib/eps/pmml/generator.rb +11 -11
- data/lib/eps/pmml/loader.rb +0 -2
- data/lib/eps/statistics.rb +1 -1
- data/lib/eps/text_encoder.rb +1 -1
- data/lib/eps/utils.rb +4 -4
- data/lib/eps/version.rb +1 -1
- metadata +6 -10
checksums.yaml
CHANGED

@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 9559f4d440f2a7cb6d541073600829a0b1bdd0df61c6abe166e0ff731a34fe18
+  data.tar.gz: 6a9f37c76d6bdec50f877d9932f805bda751b87c03c0a69f187830dc7c1876d4
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: c12d5b688bbe9884a0350efd3900de68f1244264dc0cf56f7a8a64145b3ff4a2f08608146e596d0052c1a7da4cc52c4606ae9a5e52b180c1fe1e60d034170e9f
+  data.tar.gz: 5fe61bd82055f22210f27f7d1cceb7adbb71d50dd6750336ddc731ae617d749e2c9c478ada19f6a2ad8def2c51d8e2e13a0269ba7687ee97331490a49f59d7c4
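The values above are SHA-256 and SHA-512 hex digests of the gem's `metadata.gz` and `data.tar.gz` members. A minimal stdlib sketch of how such digests are produced (the input string below is a made-up stand-in, not the actual archive bytes):

```ruby
require "digest"

# Hypothetical stand-in for an archive member's bytes.
data = "example archive bytes"

sha256 = Digest::SHA256.hexdigest(data)  # 64 hex characters
sha512 = Digest::SHA512.hexdigest(data)  # 128 hex characters
```

Comparing a freshly computed digest against the recorded one is how a registry client detects a tampered or corrupted download.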
data/CHANGELOG.md
CHANGED
data/LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -7,7 +7,7 @@ Machine learning for Ruby
 
 Check out [this post](https://ankane.org/rails-meet-data-science) for more info on machine learning with Rails
 
-[](https://github.com/ankane/eps/actions)
 
 ## Installation
 
@@ -147,7 +147,7 @@ You can set advanced options with:
 ```ruby
 text_features: {
   description: {
-
+    min_occurrences: 5, # min times a word must appear to be included in the model
     max_features: 1000, # max number of words to include in the model
     min_length: 1, # min length of words to be included
     case_sensitive: true, # how to treat words with different case
@@ -336,13 +336,6 @@ df = Rover.read_csv("houses.csv")
 Eps::Model.new(df, target: "price")
 ```
 
-Or a Daru data frame
-
-```ruby
-df = Daru::DataFrame.from_csv("houses.csv")
-Eps::Model.new(df, target: "price")
-```
-
 When reading CSV files directly, be sure to convert numeric fields. The `table` method does this automatically.
 
 ```ruby
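The note about converting numeric fields can be illustrated with Ruby's stdlib CSV library, whose `converters: :numeric` option does the conversion during parsing (the data below is hypothetical):

```ruby
require "csv"

# Without converters, every CSV field parses as a String; :numeric
# turns "350000" into an Integer so it isn't treated as categorical.
csv = "price,bedrooms\n350000,3\n425000,4\n"

rows = CSV.parse(csv, headers: true, converters: :numeric).map(&:to_h)
# rows.first => {"price" => 350000, "bedrooms" => 3}
```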
@@ -414,7 +407,7 @@ Eps::Model.new(data, validation_set: validation_set)
 Split on a specific value
 
 ```ruby
-Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("
+Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2025-01-01")})
 ```
 
 Specify the validation set size (the default is `0.25`, which is 25%)
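The split example relies on stdlib `Date.parse`; a quick sketch of the comparison that a date-valued split cutoff enables (both dates here are illustrative):

```ruby
require "date"

cutoff    = Date.parse("2025-01-01")  # same form as the README example
listed_at = Date.parse("2024-06-15")  # hypothetical row value

# Rows dated before the cutoff would fall in the training set.
in_training_set = listed_at < cutoff
```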
@@ -435,7 +428,7 @@ The database is another place you can store models. It’s good if you retrain m
 
 > We recommend adding monitoring and guardrails as well if you retrain automatically
 
-Create an
+Create an Active Record model to store the predictive model.
 
 ```sh
 rails generate model Model key:string:uniq data:text
@@ -479,61 +472,7 @@ Weights are supported for metrics as well
 Eps.metrics(actual, predicted, weight: weight)
 ```
 
-Reweighing is one method to [mitigate bias](
-
-## Upgrading
-
-## 0.3.0
-
-Eps 0.3.0 brings a number of improvements, including support for LightGBM and cross-validation. There are a number of breaking changes to be aware of:
-
-- LightGBM is now the default for new models. On Mac, run:
-
-```sh
-brew install libomp
-```
-
-Pass the `algorithm` option to use linear regression or naive Bayes.
-
-```ruby
-Eps::Model.new(data, algorithm: :linear_regression) # or :naive_bayes
-```
-
-- Cross-validation happens automatically by default. You no longer need to create training and test sets manually. If you were splitting on a time, use:
-
-```ruby
-Eps::Model.new(data, split: {column: :listed_at, value: Date.parse("2019-01-01")})
-```
-
-Or randomly, use:
-
-```ruby
-Eps::Model.new(data, split: {validation_size: 0.3})
-```
-
-To continue splitting manually, use:
-
-```ruby
-Eps::Model.new(data, validation_set: test_set)
-```
-
-- It’s no longer possible to load models in JSON or PFA formats. Retrain models and save them as PMML.
-
-## 0.2.0
-
-Eps 0.2.0 brings a number of improvements, including support for classification.
-
-We recommend:
-
-1. Changing `Eps::Regressor` to `Eps::Model`
-2. Converting models from JSON to PMML
-
-```ruby
-model = Eps::Model.load_json("model.json")
-File.write("model.pmml", model.to_pmml)
-```
-
-3. Renaming `app/stats_models` to `app/ml_models`
+Reweighing is one method to [mitigate bias](https://fairlearn.org/) in training data
 
 ## History
 
data/lib/eps/base_estimator.rb
CHANGED
@@ -27,8 +27,8 @@ module Eps
 
     def self.load_pmml(pmml)
       model = new
-      model.instance_variable_set(
-      model.instance_variable_set(
+      model.instance_variable_set(:@evaluator, PMML.load(pmml))
+      model.instance_variable_set(:@pmml, pmml.respond_to?(:to_xml) ? pmml.to_xml : pmml) # cache data
       model
     end
 
@@ -226,7 +226,7 @@ module Eps
       end
 
       encoder.vocabulary.each do |word|
-        train_set.columns[[k, word]] =
+        train_set.columns[[k, word]] = Array.new(counts.size, 0)
       end
 
       counts.each_with_index do |ci, i|
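The restored line builds one zero-filled column per vocabulary word with `Array.new`; a small sketch of that idiom (the counts below are made up):

```ruby
counts = [2, 0, 1]                  # hypothetical per-document word counts
column = Array.new(counts.size, 0)  # => [0, 0, 0]

# Overwrite only the positions with a nonzero count, as the diff's loop does.
counts.each_with_index do |ci, i|
  column[i] = ci if ci > 0
end
# column => [2, 0, 1]
```

The two-argument form shares one fill object across all slots, which is safe for an immutable Integer like 0; for mutable defaults, the block form `Array.new(n) { [] }` is the right choice.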
data/lib/eps/data_frame.rb
CHANGED
@@ -10,7 +10,7 @@ module Eps
       data.columns.each do |k, v|
         @columns[k] = v
       end
-    elsif rover?(data)
+    elsif rover?(data)
       data.to_h.each do |k, v|
         @columns[k.to_s] = v.to_a
       end
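The `rover?(data)` branch only references `Rover::DataFrame` behind a `defined?` guard, which returns nil for a missing constant instead of raising `NameError`. A sketch of the pattern with a hypothetical gem name:

```ruby
# Mirrors the rover? predicate: safe to call even when the optional
# FancyFrame gem (a made-up name) is not installed or loaded.
def fancy_frame?(x)
  defined?(FancyFrame::DataFrame) && x.is_a?(FancyFrame::DataFrame)
end

fancy_frame?({})  # falsy (nil) when FancyFrame is not defined
```

This is why removing the Daru branch below is a clean deletion: no other code ever touched the constant directly.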
@@ -152,9 +152,5 @@ module Eps
     def rover?(x)
       defined?(Rover::DataFrame) && x.is_a?(Rover::DataFrame)
     end
-
-    def daru?(x)
-      defined?(Daru::DataFrame) && x.is_a?(Daru::DataFrame)
-    end
   end
 end
data/lib/eps/evaluators/linear_regression.rb
CHANGED

@@ -13,7 +13,7 @@ module Eps
       raise "Probabilities not supported" if probabilities
 
       intercept = @coefficients["_intercept"] || 0.0
-      scores =
+      scores = Array.new(x.size, intercept)
 
       @features.each do |k, type|
         raise "Missing data in #{k}" if !x.columns[k] || x.columns[k].any?(&:nil?)
data/lib/eps/evaluators/naive_bayes.rb
CHANGED

@@ -14,7 +14,7 @@ module Eps
       probs = calculate_class_probabilities(x)
       probs.map do |xp|
         if probabilities
-          sum = xp.values.
+          sum = xp.values.sum { |v| Math.exp(v) }.to_f
           xp.map { |k, v| [k, Math.exp(v) / sum] }.to_h
         else
           xp.sort_by { |k, v| [-v, k] }[0][0]

@@ -33,7 +33,7 @@ module Eps
       total = probabilities[:prior].values.sum.to_f
       probabilities[:prior].each do |c, cv|
         prior = Math.log(cv / total)
-        px =
+        px = Array.new(x.size, prior)
 
         @features.each do |k, type|
           case type

@@ -44,7 +44,7 @@ module Eps
 
           # unknown value if not vc
           if vc
-            denom = probabilities[:conditional][k].
+            denom = probabilities[:conditional][k].sum { |k, v| v[c] }.to_f
             p2 = vc[c].to_f / denom
 
             # TODO use proper smoothing instead
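Each of these hunks completes a truncated line with `Enumerable#sum` plus a block, which maps and adds in one pass. A toy run of the softmax-style normalization in `predict` (the log-scores are made up):

```ruby
xp = { "ham" => 0.0, "spam" => Math.log(3) }  # hypothetical log-scores

# Same shape as the evaluator: exponentiate, total, then normalize.
sum = xp.values.sum { |v| Math.exp(v) }.to_f   # ~4.0
probs = xp.transform_values { |v| Math.exp(v) / sum }
# probs["spam"] ~ 0.75, probs["ham"] ~ 0.25
```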
data/lib/eps/lightgbm.rb
CHANGED
@@ -20,7 +20,6 @@ module Eps
     def _train(verbose: nil, early_stopping: nil, learning_rate: 0.1)
       train_set = @train_set
       validation_set = @validation_set.dup
-      summary_label = train_set.label
 
       # create check set
       evaluator_set = validation_set || train_set

@@ -134,7 +133,6 @@ module Eps
         actual = evaluator.predict(evaluator_set)
       end
 
-      regression = objective == "regression" || objective == "binary"
       bad_observations = []
       expected.zip(actual).each_with_index do |(exp, act), i|
         success = (act - exp).abs < 0.001
data/lib/eps/linear_regression.rb
CHANGED

@@ -221,13 +221,13 @@ module Eps
 
     # total sum of squares
     def sst
-      @sst ||= @train_set.label.
+      @sst ||= @train_set.label.sum { |y| (y - y_bar)**2 }
     end
 
     # sum of squared errors of prediction
     # not to be confused with "explained sum of squares"
     def sse
-      @sse ||= @train_set.label.zip(y_hat).
+      @sse ||= @train_set.label.zip(y_hat).sum { |y, yh| (y - yh)**2 }
     end
 
     def mst
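The restored `sst`/`sse` bodies are plain sums of squares; a toy computation with made-up labels and predictions shows the shape of both, plus the R² they feed:

```ruby
label = [1.0, 2.0, 3.0]          # actual values (illustrative)
y_hat = [1.1, 1.9, 3.2]          # hypothetical predictions
y_bar = label.sum / label.size   # mean => 2.0

sst = label.sum { |y| (y - y_bar)**2 }             # total sum of squares => 2.0
sse = label.zip(y_hat).sum { |y, yh| (y - yh)**2 } # ~0.06 up to float rounding
r2  = 1 - sse / sst                                # coefficient of determination
```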
data/lib/eps/naive_bayes.rb
CHANGED
@@ -73,7 +73,7 @@ module Eps
       # smooth
       if smoothing
         labels.each do |label|
-          sum = prob.
+          sum = prob.sum { |k2, v2| v2[label] }.to_f
           prob.each do |k2, v|
             v[label] = (v[label] + smoothing) * sum / (sum + (prob.size * smoothing))
           end
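The smoothing line rescales each conditional count while keeping the per-label total fixed; a toy run with made-up counts and a smoothing constant of 1:

```ruby
smoothing = 1.0
prob  = { "red" => { "spam" => 3.0 }, "blue" => { "spam" => 1.0 } }  # hypothetical counts
label = "spam"

sum = prob.sum { |_k2, v2| v2[label] }.to_f  # => 4.0
prob.each do |_k2, v|
  v[label] = (v[label] + smoothing) * sum / (sum + (prob.size * smoothing))
end
# red: (3+1)*4/6 ~ 2.667, blue: (1+1)*4/6 ~ 1.333; the total stays ~4.0
```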
data/lib/eps/pmml/generator.rb
CHANGED
@@ -151,7 +151,7 @@ module Eps
       end
 
       def linear_regression
-        predictors = model.instance_variable_get(
+        predictors = model.instance_variable_get(:@coefficients).dup
         intercept = predictors.delete("_intercept") || 0.0
 
         data_fields = {}

@@ -377,43 +377,43 @@ module Eps
       # TODO create instance methods on model for all of these features
 
       def features
-        model.instance_variable_get(
+        model.instance_variable_get(:@features)
       end
 
       def text_features
-        model.instance_variable_get(
+        model.instance_variable_get(:@text_features)
       end
 
       def text_encoders
-        model.instance_variable_get(
+        model.instance_variable_get(:@text_encoders)
       end
 
       def feature_importance
-        model.instance_variable_get(
+        model.instance_variable_get(:@feature_importance)
       end
 
       def labels
-        model.instance_variable_get(
+        model.instance_variable_get(:@labels)
       end
 
       def trees
-        model.instance_variable_get(
+        model.instance_variable_get(:@trees)
       end
 
       def target
-        model.instance_variable_get(
+        model.instance_variable_get(:@target)
       end
 
       def label_encoders
-        model.instance_variable_get(
+        model.instance_variable_get(:@label_encoders)
       end
 
       def objective
-        model.instance_variable_get(
+        model.instance_variable_get(:@objective)
       end
 
       def probabilities
-        model.instance_variable_get(
+        model.instance_variable_get(:@probabilities)
       end
 
       # end TODO
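Every accessor in this hunk is completed with a symbol argument to `instance_variable_get`, which reads an object's instance variable by name. A sketch with a made-up class:

```ruby
class ToyModel  # hypothetical stand-in for an Eps model
  def initialize
    @features = { "bedrooms" => "numeric" }
  end
end

model = ToyModel.new
features = model.instance_variable_get(:@features)
# => {"bedrooms" => "numeric"}

model.instance_variable_set(:@target, "price")  # the setter counterpart
```

Reaching into ivars this way bypasses encapsulation, which is why the hunk's TODO proposes proper instance methods instead.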
data/lib/eps/pmml/loader.rb
CHANGED
data/lib/eps/statistics.rb
CHANGED
data/lib/eps/text_encoder.rb
CHANGED
data/lib/eps/utils.rb
CHANGED
@@ -3,14 +3,14 @@ module Eps
     def self.column_type(c, k)
       if !c
         raise ArgumentError, "Missing column: #{k}"
-      elsif c.all?
+      elsif c.all?(&:nil?)
         # goes here for empty as well
         nil
-      elsif c.any?
+      elsif c.any?(&:nil?)
         raise ArgumentError, "Missing values in column #{k}"
-      elsif c.all?
+      elsif c.all?(Numeric)
         "numeric"
-      elsif c.all?
+      elsif c.all?(String)
         "categorical"
       elsif c.all? { |v| v == true || v == false }
         "categorical" # boolean
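The completed predicates use two compact `Enumerable` forms: symbol-to-proc (`&:nil?`) and the pattern argument to `all?`/`any?` (Ruby 2.5+), where the pattern is matched against each element with `===`:

```ruby
numeric     = [1, 2.5, 3].all?(Numeric)  # Numeric === each element
categorical = ["a", "b"].all?(String)
has_missing = [1, nil].any?(&:nil?)
all_missing = [nil, nil].all?(&:nil?)    # also true for an empty column
```

The empty-column case is why the nil check comes first in `column_type`: `[].all?(&:nil?)` is vacuously true, matching the "goes here for empty as well" comment.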
data/lib/eps/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,13 @@
 --- !ruby/object:Gem::Specification
 name: eps
 version: !ruby/object:Gem::Version
-  version: 0.
+  version: 0.7.0
 platform: ruby
 authors:
 - Andrew Kane
-autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 1980-01-02 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: lightgbm

@@ -16,14 +15,14 @@ dependencies:
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: 0.
+      version: '0.4'
   type: :runtime
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: 0.
+        version: '0.4'
 - !ruby/object:Gem::Dependency
   name: matrix
   requirement: !ruby/object:Gem::Requirement

@@ -52,7 +51,6 @@ dependencies:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
-description:
 email: andrew@ankane.org
 executables: []
 extensions: []

@@ -86,7 +84,6 @@ homepage: https://github.com/ankane/eps
 licenses:
 - MIT
 metadata: {}
-post_install_message:
 rdoc_options: []
 require_paths:
 - lib

@@ -94,15 +91,14 @@ required_ruby_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
-      version: '3'
+      version: '3.3'
 required_rubygems_version: !ruby/object:Gem::Requirement
   requirements:
   - - ">="
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubygems_version:
-signing_key:
+rubygems_version: 4.0.6
 specification_version: 4
 summary: Machine learning for Ruby. Supports regression (linear regression) and classification
   (naive Bayes)