eps 0.3.3 → 0.3.8
- checksums.yaml +4 -4
- data/CHANGELOG.md +21 -0
- data/LICENSE.txt +1 -1
- data/README.md +47 -23
- data/lib/eps/base_estimator.rb +53 -36
- data/lib/eps/data_frame.rb +12 -2
- data/lib/eps/evaluators/lightgbm.rb +20 -13
- data/lib/eps/evaluators/linear_regression.rb +3 -1
- data/lib/eps/evaluators/naive_bayes.rb +7 -6
- data/lib/eps/lightgbm.rb +21 -13
- data/lib/eps/linear_regression.rb +2 -1
- data/lib/eps/naive_bayes.rb +1 -1
- data/lib/eps/version.rb +1 -1
- metadata +8 -64
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
 ---
 SHA256:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 75a23047c60f205e21919d168f45f98ce564619dddee2991b1f1aaf92fa1e264
+  data.tar.gz: 46596cdd433f6e2b333743e49d06b3a0057c974e707ee2ec1d0bf4cabbcfecc8
 SHA512:
-  metadata.gz:
-  data.tar.gz:
+  metadata.gz: 07bc84a1114c78d4d6de89ee4f82a1d4cb2a06717b930d9ddd31692811eacc094da1c8bba8fb455d22557871460a271c9a1aa876ec1caf4ac9390a24f02b1a9c
+  data.tar.gz: e3cc53b76ab217bc91d2734b0d4047471c5ff98a6671cf1f1cefc7dc28c342d5b7ec0bb68652e50a7b6b79f20d2ad56a83a7215d38530378f76da36b3553571d
data/CHANGELOG.md
CHANGED
@@ -1,3 +1,24 @@
+## 0.3.8 (2021-02-08)
+
+- Fixed error with categorical and text features
+
+## 0.3.7 (2020-11-23)
+
+- Fixed error with LightGBM summary
+
+## 0.3.6 (2020-06-19)
+
+- Fixed error with text features for LightGBM
+
+## 0.3.5 (2020-06-10)
+
+- Added `learning_rate` option for LightGBM
+- Added support for Numo and Rover
+
+## 0.3.4 (2020-04-05)
+
+- Added `predict_probability` for classification
+
 ## 0.3.3 (2020-02-24)
 
 - Fixed errors and incorrect predictions with boolean columns
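The 0.3.4 and 0.3.5 entries are the two user-facing additions in this range. A minimal sketch of how they might be used together, assuming a classification dataset named `data` with a `:category` target and a `description` text feature (all hypothetical):

```ruby
# pass a custom LightGBM learning rate (added in 0.3.5)
model = Eps::Model.new(data, target: :category, learning_rate: 0.05)

# get per-class probabilities instead of a single label (added in 0.3.4)
model.predict_probability(description: "second floor apartment")
# => {"house" => 0.12, "apartment" => 0.88}  (illustrative output)
```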
data/LICENSE.txt
CHANGED
data/README.md
CHANGED
@@ -4,11 +4,10 @@ Machine learning for Ruby
 
 - Build predictive models quickly and easily
 - Serve models built in Ruby, Python, R, and more
-- No prior knowledge of machine learning required :tada:
 
 Check out [this post](https://ankane.org/rails-meet-data-science) for more info on machine learning with Rails
 
-[![Build Status](https://
+[![Build Status](https://github.com/ankane/eps/workflows/build/badge.svg?branch=master)](https://github.com/ankane/eps/actions)
 
 ## Installation
 
@@ -135,7 +134,7 @@ For text features, use strings with multiple words.
 {description: "a beautiful house on top of a hill"}
 ```
 
-This creates features based on word count
+This creates features based on [word count](https://en.wikipedia.org/wiki/Bag-of-words_model).
 
 You can specify text features explicitly with:
 
@@ -148,12 +147,12 @@ You can set advanced options with:
 ```ruby
 text_features: {
   description: {
-    min_occurences: 5,
-    max_features: 1000,
-    min_length: 1,
-    case_sensitive: true,
-    tokenizer: /\s+/,
-    stop_words: ["and", "the"]
+    min_occurences: 5,         # min times a word must appear to be included in the model
+    max_features: 1000,        # max number of words to include in the model
+    min_length: 1,             # min length of words to be included
+    case_sensitive: true,      # how to treat words with different case
+    tokenizer: /\s+/,          # how to tokenize the text, defaults to whitespace
+    stop_words: ["and", "the"] # words to exclude from the model
   }
 }
 ```
@@ -219,7 +218,7 @@ Build the model with:
 PriceModel.build
 ```
 
-This saves the model to `price_model.pmml`.
+This saves the model to `price_model.pmml`. Check this into source control or use a tool like [Trove](https://github.com/ankane/trove) to store it.
 
 Predict with:
 
@@ -314,7 +313,7 @@ y = [1, 2, 3]
 Eps::Model.new(x, y)
 ```
 
-
+Data can be an array of arrays
 
 ```ruby
 x = [[1, 2], [2, 0], [3, 1]]
@@ -322,18 +321,29 @@ y = [1, 2, 3]
 Eps::Model.new(x, y)
 ```
 
-
+Or Numo arrays
 
-
+```ruby
+x = Numo::NArray.cast([[1, 2], [2, 0], [3, 1]])
+y = Numo::NArray.cast([1, 2, 3])
+Eps::Model.new(x, y)
+```
+
+Or a Rover data frame
 
 ```ruby
-df =
+df = Rover.read_csv("houses.csv")
 Eps::Model.new(df, target: "price")
 ```
 
-
+Or a Daru data frame
+
+```ruby
+df = Daru::DataFrame.from_csv("houses.csv")
+Eps::Model.new(df, target: "price")
+```
 
-When
+When reading CSV files directly, be sure to convert numeric fields. The `table` method does this automatically.
 
 ```ruby
 CSV.table("data.csv").map { |row| row.to_h }
@@ -353,9 +363,21 @@ Eps supports:
 - Linear Regression
 - Naive Bayes
 
+### LightGBM
+
+Pass the learning rate with:
+
+```ruby
+Eps::Model.new(data, learning_rate: 0.01)
+```
+
 ### Linear Regression
 
-
+By default, an intercept is included. Disable this with:
+
+```ruby
+Eps::Model.new(data, intercept: false)
+```
 
 To speed up training on large datasets with linear regression, [install GSL](https://github.com/ankane/gslr#gsl-installation). With Homebrew, you can use:
 
@@ -371,14 +393,16 @@ gem 'gslr', group: :development
 
 It only needs to be available in environments used to build the model.
 
-
+## Probability
 
-
+To get the probability of each category for predictions with classification, use:
 
 ```ruby
-
+model.predict_probability(data)
 ```
 
+Naive Bayes is known to produce poor probability estimates, so stick with LightGBM if you need this.
+
 ## Validation Options
 
 Pass your own validation set with:
@@ -414,7 +438,7 @@ The database is another place you can store models. It’s good if you retrain m
 Create an ActiveRecord model to store the predictive model.
 
 ```sh
-rails
+rails generate model Model key:string:uniq data:text
 ```
 
 Store the model with:
@@ -524,11 +548,11 @@ Everyone is encouraged to help improve this project. Here are a few ways you can
 - Write, clarify, or fix documentation
 - Suggest or add new features
 
-To get started with development
+To get started with development:
 
 ```sh
 git clone https://github.com/ankane/eps.git
 cd eps
 bundle install
-rake test
+bundle exec rake test
 ```
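The last hunk above ends at "Store the model with:", so the storage step itself is not shown in this diff. A minimal sketch of what it might look like, assuming the `Model` class from the generator above and eps's PMML round-trip methods (`to_pmml` / `load_pmml`):

```ruby
# store the trained model's PMML in the text column (illustrative key)
Model.create!(key: "price", data: model.to_pmml)

# load it back later
pmml = Model.find_by!(key: "price").data
model = Eps::Model.load_pmml(pmml)
```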
data/lib/eps/base_estimator.rb
CHANGED
@@ -2,34 +2,18 @@ module Eps
   class BaseEstimator
     def initialize(data = nil, y = nil, **options)
      @options = options.dup
-      # TODO better pattern - don't pass most options to train
-      options.delete(:intercept)
      @trained = false
+      @text_encoders = {}
+      # TODO better pattern - don't pass most options to train
      train(data, y, **options) if data
    end

    def predict(data)
-      singular = data.is_a?(Hash)
-      data = [data] if singular
-
-      data = Eps::DataFrame.new(data)
-
-      @evaluator.features.each do |k, type|
-        values = data.columns[k]
-        raise ArgumentError, "Missing column: #{k}" if !values
-        column_type = Utils.column_type(values.compact, k) if values
-
-        if !column_type.nil?
-          if (type == "numeric" && column_type != "numeric") || (type != "numeric" && column_type != "categorical")
-            raise ArgumentError, "Bad type for column #{k}: Expected #{type} but got #{column_type}"
-          end
-        end
-        # TODO check for unknown values for categorical features
-      end
-
-      predictions = @evaluator.predict(data)
+      _predict(data, false)
+    end

-      singular ? predictions.first : predictions
+    def predict_probability(data)
+      _predict(data, true)
    end

    def evaluate(data, y = nil, target: nil, weight: nil)
@@ -75,7 +59,31 @@ module Eps

    private

-    def
+    def _predict(data, probabilities)
+      singular = data.is_a?(Hash)
+      data = [data] if singular
+
+      data = Eps::DataFrame.new(data)
+
+      @evaluator.features.each do |k, type|
+        values = data.columns[k]
+        raise ArgumentError, "Missing column: #{k}" if !values
+        column_type = Utils.column_type(values.compact, k) if values
+
+        if !column_type.nil?
+          if (type == "numeric" && column_type != "numeric") || (type != "numeric" && column_type != "categorical")
+            raise ArgumentError, "Bad type for column #{k}: Expected #{type} but got #{column_type}"
+          end
+        end
+        # TODO check for unknown values for categorical features
+      end
+
+      predictions = @evaluator.predict(data, probabilities: probabilities)
+
+      singular ? predictions.first : predictions
+    end
+
+    def train(data, y = nil, target: nil, weight: nil, split: nil, validation_set: nil, text_features: nil, **options)
      data, @target = prep_data(data, y, target, weight)
      @target_type = Utils.column_type(data.label, @target)

@@ -167,7 +175,7 @@ module Eps
      raise "No data in validation set" if validation_set && validation_set.empty?

      @validation_set = validation_set
-      @evaluator = _train(
+      @evaluator = _train(**options)

      # reset pmml
      @pmml = nil
@@ -202,29 +210,38 @@ module Eps
      [data, target]
    end

-    def prep_text_features(train_set)
-      @text_encoders = {}
+    def prep_text_features(train_set, fit: true)
      @text_features.each do |k, v|
-
-
+        if fit
+          # reset vocabulary
+          v.delete(:vocabulary)
+
+          # TODO determine max features automatically
+          # start based on number of rows
+          encoder = Eps::TextEncoder.new(**v)
+          counts = encoder.fit(train_set.columns.delete(k))
+        else
+          encoder = @text_encoders[k]
+          counts = encoder.transform(train_set.columns.delete(k))
+        end

-        # TODO determine max features automatically
-        # start based on number of rows
-        encoder = Eps::TextEncoder.new(**v)
-        counts = encoder.fit(train_set.columns.delete(k))
        encoder.vocabulary.each do |word|
          train_set.columns[[k, word]] = [0] * counts.size
        end
+
        counts.each_with_index do |ci, i|
          ci.each do |word, count|
            word_key = [k, word]
            train_set.columns[word_key][i] = 1 if train_set.columns.key?(word_key)
          end
        end
-        @text_encoders[k] = encoder

-
-
+        if fit
+          @text_encoders[k] = encoder
+
+          # update vocabulary
+          v[:vocabulary] = encoder.vocabulary
+        end
      end

      raise "No features left" if train_set.columns.empty?
@@ -238,7 +255,7 @@ module Eps

    def check_missing(c, name)
      raise ArgumentError, "Missing column: #{name}" if !c
-      raise ArgumentError, "Missing values in column #{name}" if c.any?(&:nil?)
+      raise ArgumentError, "Missing values in column #{name}" if c.to_a.any?(&:nil?)
    end

    def check_missing_value(df)
data/lib/eps/data_frame.rb
CHANGED
@@ -10,7 +10,7 @@ module Eps
      data.columns.each do |k, v|
        @columns[k] = v
      end
-    elsif daru?(data)
+    elsif rover?(data) || daru?(data)
      data.to_h.each do |k, v|
        @columns[k.to_s] = v.to_a
      end
@@ -19,6 +19,8 @@ module Eps
        @columns[k.to_s] = v.to_a
      end
    else
+      data = data.to_a if numo?(data)
+
      if data.any?
        row = data[0]

@@ -140,8 +142,16 @@ module Eps

    private

+    def numo?(x)
+      defined?(Numo::NArray) && x.is_a?(Numo::NArray)
+    end
+
+    def rover?(x)
+      defined?(Rover::DataFrame) && x.is_a?(Rover::DataFrame)
+    end
+
    def daru?(x)
-      defined?(Daru) && x.is_a?(Daru::DataFrame)
+      defined?(Daru::DataFrame) && x.is_a?(Daru::DataFrame)
    end
  end
end
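With these guards in place, Numo arrays are converted to plain row arrays and Rover frames take the same `to_h` path as Daru, which is what enables the README examples above. A quick sketch of which branch each input takes (assumes the numo-narray and rover gems are installed; `houses.csv` is illustrative):

```ruby
require "eps"
require "numo/narray"
require "rover"

# goes through the numo? branch and becomes an array of rows
x = Numo::NArray.cast([[1, 2], [2, 0], [3, 1]])
y = Numo::NArray.cast([1, 2, 3])
Eps::Model.new(x, y)

# goes through the rover?/daru? branch via to_h
df = Rover.read_csv("houses.csv")
Eps::Model.new(df, target: "price")
```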
data/lib/eps/evaluators/lightgbm.rb
CHANGED
@@ -11,19 +11,15 @@ module Eps
        @text_features = text_features
      end

-      def predict(data)
+      def predict(data, probabilities: false)
+        raise "Probabilities not supported" if probabilities && @objective == "regression"
+
        rows = data.map(&:to_h)

        # sparse matrix
        @text_features.each do |k, v|
          encoder = TextEncoder.new(**v)
-
-          values = data.columns.delete(k)
-          counts = encoder.transform(values)
-
-          encoder.vocabulary.each do |word|
-            data.columns[[k, word]] = [0] * values.size
-          end
+          counts = encoder.transform(data.columns[k])

          counts.each_with_index do |xc, i|
            row = rows[i]
@@ -38,17 +34,28 @@ module Eps
        when "regression"
          sum_trees(rows, @trees)
        when "binary"
-          sum_trees(rows, @trees).map { |s|
+          prob = sum_trees(rows, @trees).map { |s| sigmoid(s) }
+          if probabilities
+            prob.map { |v| @labels.zip([1 - v, v]).to_h }
+          else
+            prob.map { |v| @labels[v > 0.5 ? 1 : 0] }
+          end
        else
          tree_scores = []
          num_trees = @trees.size / @labels.size
          @trees.each_slice(num_trees).each do |trees|
            tree_scores << sum_trees(rows, trees)
          end
-
+          rows.size.times.map do |i|
            v = tree_scores.map { |s| s[i] }
-
-
+            if probabilities
+              exp = v.map { |vi| Math.exp(vi) }
+              sum = exp.sum
+              @labels.zip(exp.map { |e| e / sum }).to_h
+            else
+              idx = v.map.with_index.max_by { |v2, _| v2 }.last
+              @labels[idx]
+            end
          end
        end
      end
@@ -109,7 +116,7 @@ module Eps
      end

      def sigmoid(x)
-        1.0 / (1 + Math
+        1.0 / (1 + Math.exp(-x))
      end
    end
  end
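For the binary case above, the raw LightGBM score is mapped through the sigmoid and then either zipped against the two labels or thresholded at 0.5. A worked example of that mapping as standalone Ruby (illustrative labels and score):

```ruby
labels = ["no", "yes"]
score  = 0.8                          # raw sum of tree outputs
prob   = 1.0 / (1 + Math.exp(-score)) # => ~0.69

# predict_probability path: {"no" => 1 - p, "yes" => p}
labels.zip([1 - prob, prob]).to_h     # => {"no" => 0.31, "yes" => 0.69}

# predict path: threshold at 0.5
labels[prob > 0.5 ? 1 : 0]            # => "yes"
```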
data/lib/eps/evaluators/linear_regression.rb
CHANGED
@@ -9,7 +9,9 @@ module Eps
        @text_features = text_features || {}
      end

-      def predict(x)
+      def predict(x, probabilities: false)
+        raise "Probabilities not supported" if probabilities
+
        intercept = @coefficients["_intercept"] || 0.0
        scores = [intercept] * x.size

data/lib/eps/evaluators/naive_bayes.rb
CHANGED
@@ -10,14 +10,15 @@ module Eps
        @legacy = legacy
      end

-      def predict(x)
+      def predict(x, probabilities: false)
        probs = calculate_class_probabilities(x)
        probs.map do |xp|
-
-
-
-
-
+          if probabilities
+            sum = xp.values.map { |v| Math.exp(v) }.sum.to_f
+            xp.map { |k, v| [k, Math.exp(v) / sum] }.to_h
+          else
+            xp.sort_by { |k, v| [-v, k] }[0][0]
+          end
        end
      end

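The `probabilities` branch normalizes the per-class log scores from `calculate_class_probabilities` with a softmax, while the label path just takes the highest score (ties broken by label name). A small standalone illustration with made-up scores:

```ruby
# per-class log scores, as produced by the evaluator (illustrative)
xp = {"ham" => -1.2, "spam" => -3.4}

# softmax normalization used by predict_probability
sum = xp.values.map { |v| Math.exp(v) }.sum.to_f
xp.map { |k, v| [k, Math.exp(v) / sum] }.to_h
# => {"ham" => ~0.90, "spam" => ~0.10}

# label path used by predict: highest score wins
xp.sort_by { |k, v| [-v, k] }[0][0]
# => "ham"
```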
data/lib/eps/lightgbm.rb
CHANGED
@@ -10,14 +10,14 @@ module Eps
          str << "Model needs more data for better predictions\n"
        else
          str << "Most important features\n"
-          @importance_keys.zip(importance).sort_by { |k, v| [-v, k] }.first(10).each do |k, v|
+          @importance_keys.zip(importance).sort_by { |k, v| [-v, display_field(k)] }.first(10).each do |k, v|
            str << "#{display_field(k)}: #{(100 * v / total).round}\n"
          end
        end
        str
      end

-      def _train(verbose: nil, early_stopping: nil)
+      def _train(verbose: nil, early_stopping: nil, learning_rate: 0.1)
        train_set = @train_set
        validation_set = @validation_set.dup
        summary_label = train_set.label
@@ -57,10 +57,13 @@ module Eps

        # text feature encoding
        prep_text_features(train_set)
-        prep_text_features(validation_set) if validation_set
+        prep_text_features(validation_set, fit: false) if validation_set

        # create params
-        params = {
+        params = {
+          objective: objective,
+          learning_rate: learning_rate
+        }
        params[:num_classes] = labels.size if objective == "multiclass"
        if train_set.size < 30
          params[:min_data_in_bin] = 1
@@ -68,7 +71,7 @@ module Eps
        end

        # create datasets
-        categorical_idx =
+        categorical_idx = train_set.columns.keys.map.with_index.select { |k, _| @features[k] == "categorical" }.map(&:last)
        train_ds = ::LightGBM::Dataset.new(train_set.map_rows(&:to_a), label: train_set.label, weight: train_set.weight, categorical_feature: categorical_idx, params: params)
        validation_ds = ::LightGBM::Dataset.new(validation_set.map_rows(&:to_a), label: validation_set.label, weight: validation_set.weight, categorical_feature: categorical_idx, params: params, reference: train_ds) if validation_set

@@ -121,25 +124,30 @@ module Eps
      def check_evaluator(objective, labels, booster, booster_set, evaluator, evaluator_set)
        expected = @booster.predict(booster_set.map_rows(&:to_a))
        if objective == "multiclass"
-
-
-
+          actual = evaluator.predict(evaluator_set, probabilities: true)
+          # just compare first for now
+          expected.map! { |v| v.first }
+          actual.map! { |v| v.values.first }
        elsif objective == "binary"
-
+          actual = evaluator.predict(evaluator_set, probabilities: true).map { |v| v.values.last }
+        else
+          actual = evaluator.predict(evaluator_set)
        end
-        actual = evaluator.predict(evaluator_set)

-        regression = objective == "regression"
+        regression = objective == "regression" || objective == "binary"
        bad_observations = []
        expected.zip(actual).each_with_index do |(exp, act), i|
-          success =
+          success = (act - exp).abs < 0.001
          unless success
            bad_observations << {expected: exp, actual: act, data_point: evaluator_set[i].map(&:itself).first}
          end
        end

        if bad_observations.any?
-
+          bad_observations.each do |obs|
+            p obs
+          end
+          raise "Bug detected in evaluator. Please report an issue."
        end
      end

data/lib/eps/linear_regression.rb
CHANGED
@@ -37,6 +37,7 @@ module Eps
        str
      end

+      # TODO use keyword arguments for gsl and intercept in 0.4.0
      def _train(**options)
        raise "Target must be numeric" if @target_type != "numeric"
        check_missing_value(@train_set)
@@ -61,7 +62,7 @@ module Eps
          false
        end

-        intercept =
+        intercept = options.key?(:intercept) ? options[:intercept] : true
        if intercept && gsl != :gslr
          data.size.times do |i|
            x[i].unshift(1)
data/lib/eps/naive_bayes.rb
CHANGED
data/lib/eps/version.rb
CHANGED
metadata
CHANGED
@@ -1,14 +1,14 @@
 --- !ruby/object:Gem::Specification
 name: eps
 version: !ruby/object:Gem::Version
-  version: 0.3.3
+  version: 0.3.8
 platform: ruby
 authors:
 - Andrew Kane
-autorequire:
+autorequire:
 bindir: bin
 cert_chain: []
-date:
+date: 2021-02-08 00:00:00.000000000 Z
 dependencies:
 - !ruby/object:Gem::Dependency
   name: lightgbm
@@ -38,64 +38,8 @@ dependencies:
     - - ">="
       - !ruby/object:Gem::Version
         version: '0'
-
-
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-- !ruby/object:Gem::Dependency
-  name: daru
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-- !ruby/object:Gem::Dependency
-  name: minitest
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-- !ruby/object:Gem::Dependency
-  name: rake
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: '0'
-description:
-email: andrew@chartkick.com
+description:
+email: andrew@ankane.org
 executables: []
 extensions: []
 extra_rdoc_files: []
@@ -128,7 +72,7 @@ homepage: https://github.com/ankane/eps
 licenses:
 - MIT
 metadata: {}
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -143,8 +87,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
   - !ruby/object:Gem::Version
     version: '0'
 requirements: []
-rubygems_version: 3.
-signing_key:
+rubygems_version: 3.2.3
+signing_key:
 specification_version: 4
 summary: Machine learning for Ruby. Supports regression (linear regression) and classification
   (naive Bayes)