RubyGems - ruby-statistics - Versions diffs - 2.0.4 → 2.1.3 - Mend

ruby-statistics 2.0.4 → 2.1.3

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (28)

checksums.yaml +5 -5
data/.github/dependabot.yml +15 -0
data/.github/workflows/ruby.yml +35 -0
data/.travis.yml +6 -5
data/CONTRIBUTING.md +1 -0
data/README.md +6 -5
data/lib/math.rb +6 -5
data/lib/statistics/distribution/bernoulli.rb +35 -0
data/lib/statistics/distribution/beta.rb +2 -2
data/lib/statistics/distribution/empirical.rb +26 -0
data/lib/statistics/distribution/f.rb +8 -8
data/lib/statistics/distribution/geometric.rb +76 -0
data/lib/statistics/distribution/logseries.rb +51 -0
data/lib/statistics/distribution/negative_binomial.rb +51 -0
data/lib/statistics/distribution/normal.rb +59 -3
data/lib/statistics/distribution/poisson.rb +2 -2
data/lib/statistics/distribution/t_student.rb +3 -3
data/lib/statistics/distribution/uniform.rb +2 -2
data/lib/statistics/distribution/weibull.rb +3 -3
data/lib/statistics/spearman_rank_coefficient.rb +71 -0
data/lib/statistics/statistical_test/chi_squared_test.rb +2 -2
data/lib/statistics/statistical_test/f_test.rb +4 -4
data/lib/statistics/statistical_test/kolmogorov_smirnov_test.rb +70 -0
data/lib/statistics/statistical_test/t_test.rb +6 -6
data/lib/statistics/statistical_test/wilcoxon_rank_sum_test.rb +2 -2
data/lib/statistics/version.rb +1 -1
data/ruby-statistics.gemspec +4 -4
metadata +35 -44

checksums.yaml CHANGED

@@ -1,7 +1,7 @@
 ---
-SHA1:
-  metadata.gz: 298bc7d8dff1aeabc7db9c11fe9d7987f16bde40
-  data.tar.gz: 1d796e62c18052f87fc2616b4c1a5f777080c1ab
+SHA256:
+  metadata.gz: 6612502f03d8077d0158d997a42dfbc4d1002f2ab01ce2b7bdb5fbd510187e3e
+  data.tar.gz: 14fb04073b5b788dfa9e93aa586daef050dd105c2d2f8bdd17db30ad1fbcf144
 SHA512:
-  metadata.gz: 98e8c58f34668e839be9689c74debd75bd7a6869372536d7e9927a63f77fca59ab05e06b413705f0d286094292cb566c01e6fe71145cdd7d2152fc930829910e
-  data.tar.gz: 37b78191adb8d659f21134346a8a415c5bd7bd8a7dd99b2c1f8d7793a2ea741c43e60d8235a7d5fcc2bc0284b24e8e58e8404e0b4b4401ee3bc60f7e1afc8b8b
+  metadata.gz: '09590f836a59563819a1a847830e5dc2ee3554415cadc81c35b2a0f43ab1af87204f028659e8aa2f30a14b58c69c3e4f65db5e722d0a00ced5d92faa1e7dce82'
+  data.tar.gz: 2e66a26c23bf1f05cb9de40e992b302c4f0fef13aa70b4e509de479cb15b9700d4032f5d548aa45110f161ef9dac417f9b1872479a02dca0e729a051be2a4fc8
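
Note on the checksum change: the SHA1 entries are replaced by SHA256 entries, while SHA512 is kept. The following is a minimal sketch of how hex digests of this shape can be produced with Ruby's standard `digest` library. The helper name `package_digests` and the input string are illustrative assumptions, not part of the gem or of RubyGems.

```ruby
require "digest"

# Hypothetical helper: returns hex digests in the two algorithms that appear
# in the 2.1.3 checksums.yaml. `bytes` stands in for the raw contents of an
# archive such as metadata.gz or data.tar.gz.
def package_digests(bytes)
  {
    "SHA256" => Digest::SHA256.hexdigest(bytes),
    "SHA512" => Digest::SHA512.hexdigest(bytes)
  }
end

digests = package_digests("example archive bytes")
digests.each do |algorithm, hex|
  puts "#{algorithm}: #{hex.length} hex characters"
end
```

A SHA256 hex digest is 64 characters long and a SHA512 hex digest is 128, which matches the lengths of the added lines above.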

data/.github/dependabot.yml ADDED

@@ -0,0 +1,15 @@
+# To get started with Dependabot version updates, you'll need to specify which
+# package ecosystems to update and where the package manifests are located.
+# Please see the documentation for all configuration options:
+# https://help.github.com/github/administering-a-repository/configuration-options-for-dependency-updates
+version: 2
+updates:
+  - package-ecosystem: "bundler" # See documentation for possible values
+    directory: "/" # Location of package manifests
+    schedule:
+      interval: "weekly"
+  - package-ecosystem: "github-actions" # See documentation for possible values
+    directory: "/" # Location of package manifests
+    schedule:
+      interval: "weekly"

data/.github/workflows/ruby.yml ADDED

@@ -0,0 +1,35 @@
+name: Ruby
+on: [push]
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v2.3.4
+    - name: Set up Ruby 2.6
+      uses: actions/setup-ruby@v1.1.2
+      with:
+        ruby-version: 2.6.x
+    - name: Build and test with Rake
+      run: |
+        gem install bundler
+        bundle install --jobs 2 --retry 1
+        bundle exec rake
+  build_2_7:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v2.3.4
+    - name: Set up Ruby 2.7
+      uses: actions/setup-ruby@v1.1.2
+      with:
+        ruby-version: 2.7.x
+    - name: Build and test with Rake
+      run: |
+        gem install bundler
+        bundle install --jobs 2 --retry 1
+        bundle exec rake

data/.travis.yml CHANGED

@@ -1,8 +1,9 @@
 sudo: false
 language: ruby
 rvm:
-  - 2.2
-  - 2.3.1
-  - 2.4.0
-  - 2.5.0
-before_install: gem install bundler
+  - 2.5.1
+  - 2.6.0
+  - 2.6.3
+  - 2.6.5
+  - 2.7
+before_install: gem update --system && gem install bundler

data/CONTRIBUTING.md ADDED

@@ -0,0 +1 @@
+Bug reports and pull requests are welcome on GitHub at https://github.com/estebanz01/ruby-statistics. This project is intended to be a safe, welcoming space for collaboration, and contributors are expected to adhere to the [Contributor Covenant code of conduct](https://www.contributor-covenant.org/).

data/README.md CHANGED

@@ -5,10 +5,11 @@
 A basic ruby gem that implements some statistical methods, functions and concepts to be used in any ruby environment without depending on any mathematical software like `R`, `Matlab`, `Octave` or similar.
 Unit test runs under the following ruby versions:
-* Ruby 2.2.
-* Ruby 2.3.1.
-* Ruby 2.4.0.
-* Ruby 2.5.0.
+* Ruby 2.5.1.
+* Ruby 2.6.0.
+* Ruby 2.6.3.
+* Ruby 2.6.5.
+* Ruby 2.7.
 We got the inspiration from the folks at [JStat](https://github.com/jstat/jstat) and some interesting lectures about [Keystroke dynamics](http://www.biometric-solutions.com/keystroke-dynamics.html).
@@ -52,7 +53,7 @@ normal = Statistics::Distribution::StandardNormal.new # Using all namespaces.
 ```
 ## Documentation
-You can find a bit more detailed documentation of all available distributions, tests and functions in the [Documentation Index](https://github.com/estebanz01/ruby-statistics/wiki/Documentation-Index)
+You can find a bit more detailed documentation of all available distributions, tests and functions in the [Documentation Index](https://github.com/estebanz01/ruby-statistics/wiki)
 ## Development

data/lib/math.rb CHANGED

@@ -9,11 +9,11 @@ module Math
   end
   def self.combination(n, r)
-    self.factorial(n)/(self.factorial(r) * self.factorial(n - r)).to_f # n!/(r! * [n - r]!)
+    self.factorial(n)/(self.factorial(r) * self.factorial(n - r)).to_r # n!/(r! * [n - r]!)
   end
   def self.permutation(n, k)
-    self.factorial(n)/self.factorial(n - k).to_f
+    self.factorial(n)/self.factorial(n - k).to_r
   end
   # Function adapted from the python implementation that exists in https://en.wikipedia.org/wiki/Simpson%27s_rule#Sample_implementation
@@ -24,7 +24,8 @@ module Math
       return
     end
-    h = (b - a)/n.to_f
+    h = (b - a)/n.to_r
     resA = yield(a)
     resB = yield(b)
@@ -45,7 +46,7 @@ module Math
   def self.lower_incomplete_gamma_function(s, x)
     # The greater the iterations, the better. That's why we are iterating 10_000 * x times
-    self.simpson_rule(0, x, (10_000 * x.round).round) do |t|
+    self.simpson_rule(0, x.to_r, (10_000 * x.round).round) do |t|
       (t ** (s - 1)) * Math.exp(-t)
     end
   end
@@ -72,7 +73,7 @@ module Math
     # To avoid overflow problems, the implementation applies the logarithm properties
     # to calculate in a faster and safer way the values.
     lbet_ab = (Math.lgamma(alp)[0] + Math.lgamma(bet)[0] - Math.lgamma(alp + bet)[0]).freeze
-    front = (Math.exp(Math.log(x) * alp + Math.log(1.0 - x) * bet - lbet_ab) / alp.to_f).freeze
+    front = (Math.exp(Math.log(x) * alp + Math.log(1.0 - x) * bet - lbet_ab) / alp.to_r).freeze
     # This is the non-log version of the left part of the formula (before the continuous fraction)
     # down_left = alp * self.beta_function(alp, bet)
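
Note on the recurring `to_f` → `to_r` change: dividing by a `Rational` keeps integer arithmetic exact, whereas dividing by a `Float` rounds. The sketch below uses only core Ruby; the helper names `ratio_as_float` and `ratio_as_rational` are illustrative and do not exist in the gem.

```ruby
# Integer / Float gives an approximate Float (the 2.0.4 style in this diff).
def ratio_as_float(numerator, denominator)
  numerator / denominator.to_f
end

# Integer / Rational gives an exact Rational (the 2.1.3 style in this diff).
def ratio_as_rational(numerator, denominator)
  numerator / denominator.to_r
end

p ratio_as_float(10, 4)     # 2.5
p ratio_as_rational(10, 4)  # (5/2)

# Floats accumulate rounding error, Rationals do not.
p 0.1 + 0.2 == 0.3                                      # false
p Rational(1, 10) + Rational(2, 10) == Rational(3, 10)  # true

# Caveat: a Float converts to its exact binary value, which is not 1/10.
p 0.1.to_r == Rational(1, 10)                           # false

# Caveat: mixing a Rational with a Float falls back to Float.
p((Rational(1, 2) * 1.0).class)                         # Float
```

The two caveats matter for this diff: exactness is only preserved while every operand is an `Integer` or `Rational`, and calls such as `Math.sqrt` or `Math.exp` return a `Float` regardless of the argument type.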

data/lib/statistics/distribution/bernoulli.rb ADDED

@@ -0,0 +1,35 @@
+module Statistics
+  module Distribution
+    class Bernoulli
+      def self.density_function(n, p)
+        return if n != 0 && n != 1 # The support of the distribution is n = {0, 1}.
+        case n
+        when 0 then 1.0 - p
+        when 1 then p
+        end
+      end
+      def self.cumulative_function(n, p)
+        return if n != 0 && n != 1 # The support of the distribution is n = {0, 1}.
+        case n
+        when 0 then 1.0 - p
+        when 1 then 1.0
+        end
+      end
+      def self.variance(p)
+        p * (1.0 - p)
+      end
+      def self.skewness(p)
+        (1.0 - 2.0*p).to_r / Math.sqrt(p * (1.0 - p))
+      end
+      def self.kurtosis(p)
+        (6.0 * (p ** 2) - (6 * p) + 1) / (p * (1.0 - p))
+      end
+    end
+  end
+end

data/lib/statistics/distribution/beta.rb CHANGED

@@ -4,8 +4,8 @@ module Statistics
       attr_accessor :alpha, :beta
       def initialize(alp, bet)
-        self.alpha = alp.to_f
-        self.beta = bet.to_f
+        self.alpha = alp.to_r
+        self.beta = bet.to_r
       end
       def cumulative_function(value)

data/lib/statistics/distribution/empirical.rb ADDED

@@ -0,0 +1,26 @@
+module Statistics
+  module Distribution
+    class Empirical
+      attr_accessor :samples
+      def initialize(samples:)
+        self.samples = samples
+      end
+      # Formula grabbed from here: https://statlect.com/asymptotic-theory/empirical-distribution
+      def cumulative_function(x:)
+        cumulative_sum = samples.reduce(0) do |summation, sample|
+          summation += if sample <= x
+                         1
+                       else
+                         0
+                       end
+          summation
+        end
+        cumulative_sum / samples.size.to_r
+      end
+    end
+  end
+end
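
Note on the empirical distribution: the added `cumulative_function` counts the samples that are less than or equal to `x` and divides by the sample size. A standalone sketch of the same calculation follows. The method name `ecdf` is an illustrative assumption, and `Array#count` is used in place of the `reduce` loop in the diff.

```ruby
# Fraction of samples that are less than or equal to x, as an exact Rational.
def ecdf(samples, x)
  samples.count { |sample| sample <= x } / samples.size.to_r
end

samples = [3, 1, 4, 1, 5]
p ecdf(samples, 0)  # (0/1): no sample is <= 0
p ecdf(samples, 1)  # (2/5): two of five samples are <= 1
p ecdf(samples, 5)  # (1/1): every sample is <= 5
```

Like the version in the diff, this sketch raises `ZeroDivisionError` for an empty sample list, because an `Integer` is divided by a zero `Rational`.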

data/lib/statistics/distribution/f.rb CHANGED

@@ -10,7 +10,7 @@ module Statistics
       # Formula extracted from http://www.itl.nist.gov/div898/handbook/eda/section3/eda3665.htm#CDF
       def cumulative_function(value)
-        k = d2/(d2 + d1 * value.to_f)
+        k = d2/(d2 + d1 * value.to_r)
         1 - Math.incomplete_beta_function(k, d2/2.0, d1/2.0)
       end
@@ -18,28 +18,28 @@ module Statistics
       def density_function(value)
         return if d1 < 0 || d2 < 0 # F-pdf is well defined for the [0, +infinity) interval.
-        val = value.to_f
+        val = value.to_r
         upper = ((d1 * val) ** d1) * (d2**d2)
         lower = (d1 * val + d2) ** (d1 + d2)
-        up = Math.sqrt(upper/lower.to_f)
+        up = Math.sqrt(upper/lower.to_r)
         down = val * Math.beta_function(d1/2.0, d2/2.0)
-        up/down.to_f
+        up/down.to_r
       end
       def mean
         return if d2 <= 2
-        d2/(d2 - 2).to_f
+        d2/(d2 - 2).to_r
       end
       def mode
         return if d1 <= 2
-        left = (d1 - 2)/d1.to_f
-        right = d2/(d2 + 2).to_f
+        left = (d1 - 2)/d1.to_r
+        right = d2/(d2 + 2).to_r
-        left * right
+        (left * right).to_f
       end
     end
   end

data/lib/statistics/distribution/geometric.rb ADDED

@@ -0,0 +1,76 @@
+module Statistics
+  module Distribution
+    class Geometric
+      attr_accessor :probability_of_success, :always_success_allowed
+      def initialize(p, always_success: false)
+        self.probability_of_success = p.to_r
+        self.always_success_allowed = always_success
+      end
+      def density_function(k)
+        k = k.to_i
+        if always_success_allowed
+          return if k < 0
+          ((1.0 - probability_of_success) ** k) * probability_of_success
+        else
+          return if k <= 0
+          ((1.0 - probability_of_success) ** (k - 1.0)) * probability_of_success
+        end
+      end
+      def cumulative_function(k)
+        k = k.to_i
+        if always_success_allowed
+          return if k < 0
+          1.0 - ((1.0 - probability_of_success) ** (k + 1.0))
+        else
+          return if k <= 0
+          1.0 - ((1.0 - probability_of_success) ** k)
+        end
+      end
+      def mean
+        if always_success_allowed
+          (1.0 - probability_of_success) / probability_of_success
+        else
+          1.0 / probability_of_success
+        end
+      end
+      def median
+        if always_success_allowed
+          (-1.0 / Math.log2(1.0 - probability_of_success)).ceil - 1.0
+        else
+          (-1.0 / Math.log2(1.0 - probability_of_success)).ceil
+        end
+      end
+      def mode
+        if always_success_allowed
+          0.0
+        else
+          1.0
+        end
+      end
+      def variance
+        (1.0 - probability_of_success) / (probability_of_success ** 2)
+      end
+      def skewness
+        (2.0 - probability_of_success) / Math.sqrt(1.0 - probability_of_success)
+      end
+      def kurtosis
+        6.0 + ((probability_of_success ** 2) / (1.0 - probability_of_success))
+      end
+    end
+  end
+end
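
Note on the two geometric parameterizations: with the default `always_success: false`, the class above uses the support k = 1, 2, ... (number of trials up to and including the first success). With `always_success: true` it uses k = 0, 1, ... (number of failures before the first success). The sketch below re-implements both density functions as standalone methods with illustrative names and checks the means numerically against `1 / p` and `(1 - p) / p`.

```ruby
# P(X = k) when X counts trials up to and including the first success.
def geometric_pmf_trials(k, p)
  return nil if k <= 0 # support is k = 1, 2, ...

  (1.0 - p)**(k - 1) * p
end

# P(Y = k) when Y counts failures before the first success.
def geometric_pmf_failures(k, p)
  return nil if k < 0 # support is k = 0, 1, ...

  (1.0 - p)**k * p
end

p_success = 0.25

# Truncated sums: the tail beyond k = 500 is negligible for p = 0.25.
mean_trials = (1..500).sum { |k| k * geometric_pmf_trials(k, p_success) }
mean_failures = (0..500).sum { |k| k * geometric_pmf_failures(k, p_success) }

puts mean_trials.round(6)   # close to 1 / p
puts mean_failures.round(6) # close to (1 - p) / p
```

The two variants are shifted copies of each other, so `geometric_pmf_trials(k + 1, p)` equals `geometric_pmf_failures(k, p)`.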

data/lib/statistics/distribution/logseries.rb ADDED

@@ -0,0 +1,51 @@
+module Statistics
+  module Distribution
+    class LogSeries
+      def self.density_function(k, p)
+        return if k <= 0
+        k = k.to_i
+        left = (-1.0 / Math.log(1.0 - p))
+        right = (p ** k).to_r
+        left * right / k
+      end
+      def self.cumulative_function(k, p)
+        return if k <= 0
+        # Sadly, the incomplete beta function is converging
+        # too fast to zero and breaking the calculation on logs.
+        # So, we default to the basic definition of the CDF which is
+        # the integral (-Inf, K) of the PDF, with P(X <= x) which can
+        # be solved as a summation of all PDFs from 1 to K. Note that the summation approach
+        # only applies to discrete distributions.
+        #
+        # right = Math.incomplete_beta_function(p, (k + 1).floor, 0) / Math.log(1.0 - p)
+        # 1.0 + right
+        result = 0.0
+        1.upto(k) do |number|
+          result += self.density_function(number, p)
+        end
+        result
+      end
+      def self.mode
+        1.0
+      end
+      def self.mean(p)
+        (-1.0 / Math.log(1.0 - p)) * (p / (1.0 - p))
+      end
+      def self.variance(p)
+        up = p + Math.log(1.0 - p)
+        down = ((1.0 - p) ** 2) * (Math.log(1.0 - p) ** 2)
+        (-1.0 * p) * (up / down.to_r)
+      end
+    end
+  end
+end

data/lib/statistics/distribution/negative_binomial.rb ADDED

@@ -0,0 +1,51 @@
+module Statistics
+  module Distribution
+    class NegativeBinomial
+      attr_accessor :number_of_failures, :probability_per_trial
+      def initialize(r, p)
+        self.number_of_failures = r.to_i
+        self.probability_per_trial = p
+      end
+      def probability_mass_function(k)
+        return if number_of_failures < 0 || k < 0 || k > number_of_failures
+        left = Math.combination(k + number_of_failures - 1, k)
+        right = ((1 - probability_per_trial) ** number_of_failures) * (probability_per_trial ** k)
+        left * right
+      end
+      def cumulative_function(k)
+        return if k < 0 || k > number_of_failures
+        k = k.to_i
+        1.0 - Math.incomplete_beta_function(probability_per_trial, k + 1, number_of_failures)
+      end
+      def mean
+        (probability_per_trial * number_of_failures)/(1 - probability_per_trial).to_r
+      end
+      def variance
+        (probability_per_trial * number_of_failures)/((1 - probability_per_trial) ** 2).to_r
+      end
+      def skewness
+        (1 + probability_per_trial).to_r / Math.sqrt(probability_per_trial * number_of_failures)
+      end
+      def mode
+        if number_of_failures > 1
+          up = probability_per_trial * (number_of_failures - 1)
+          down = (1 - probability_per_trial).to_r
+          (up/down).floor
+        elsif number_of_failures <= 1
+          0.0
+        end
+      end
+    end
+  end
+end

data/lib/statistics/distribution/normal.rb CHANGED

@@ -5,9 +5,9 @@ module Statistics
       alias_method :mode, :mean
       def initialize(avg, std)
-        self.mean = avg.to_f
-        self.standard_deviation = std.to_f
-        self.variance = std.to_f**2
+        self.mean = avg.to_r
+        self.standard_deviation = std.to_r
+        self.variance = std.to_r**2
       end
       def cumulative_function(value)
@@ -79,5 +79,61 @@ module Statistics
         euler/Math.sqrt(2 * Math::PI)
       end
     end
+    # Inverse Standard Normal distribution:
+    # References:
+    # https://en.wikipedia.org/wiki/Inverse_distribution
+    # http://www.source-code.biz/snippets/vbasic/9.htm
+    class InverseStandardNormal < StandardNormal
+      A1 = -39.6968302866538
+      A2 = 220.946098424521
+      A3 = -275.928510446969
+      A4 = 138.357751867269
+      A5 = -30.6647980661472
+      A6 = 2.50662827745924
+      B1 = -54.4760987982241
+      B2 = 161.585836858041
+      B3 = -155.698979859887
+      B4 = 66.8013118877197
+      B5 = -13.2806815528857
+      C1 = -7.78489400243029E-03
+      C2 = -0.322396458041136
+      C3 = -2.40075827716184
+      C4 = -2.54973253934373
+      C5 = 4.37466414146497
+      C6 = 2.93816398269878
+      D1 = 7.78469570904146E-03
+      D2 = 0.32246712907004
+      D3 = 2.445134137143
+      D4 = 3.75440866190742
+      P_LOW = 0.02425
+      P_HIGH = 1 - P_LOW
+      def density_function(_)
+        raise NotImplementedError
+      end
+      def random(elements: 1, seed: Random.new_seed)
+        raise NotImplementedError
+      end
+      def cumulative_function(value)
+        return if value < 0.0 || value > 1.0
+        return -1.0 * Float::INFINITY if value.zero?
+        return Float::INFINITY if value == 1.0
+        if value < P_LOW
+          q = Math.sqrt((Math.log(value) * -2.0))
+          (((((C1 * q + C2) * q + C3) * q + C4) * q + C5) * q + C6) / ((((D1 * q + D2) * q + D3) * q + D4) * q + 1.0)
+        elsif value <= P_HIGH
+          q = value - 0.5
+          r = q ** 2
+          (((((A1 * r + A2) * r + A3) * r + A4) * r + A5) * r + A6) * q / (((((B1 * r + B2) * r + B3) * r + B4) * r + B5) * r + 1.0)
+        else
+          q = Math.sqrt((Math.log(1 - value) * -2.0))
+          - (((((C1 * q + C2) * q + C3) * q + C4) * q + C5) * q + C6) / ((((D1 * q + D2) * q + D3) * q + D4) * q + 1)
+        end
+      end
+    end
   end
 end
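
Note on `InverseStandardNormal#cumulative_function`: despite its name, the method maps a probability to a z-value (a quantile function), using a rational approximation with three branches. The sketch below reproduces only the central branch (`P_LOW <= value <= P_HIGH`) as a standalone method, with the `A1..A6` and `B1..B5` coefficients copied from the diff. The names `CENTRAL_NUMERATOR`, `CENTRAL_DENOMINATOR`, `CENTRAL_RANGE`, and `central_quantile` are illustrative assumptions.

```ruby
# Coefficients copied from the A1..A6 and B1..B5 constants in the diff.
CENTRAL_NUMERATOR = [-39.6968302866538, 220.946098424521, -275.928510446969,
                     138.357751867269, -30.6647980661472, 2.50662827745924].freeze
CENTRAL_DENOMINATOR = [-54.4760987982241, 161.585836858041, -155.698979859887,
                       66.8013118877197, -13.2806815528857].freeze

# The central branch applies between P_LOW and P_HIGH = 1 - P_LOW.
CENTRAL_RANGE = (0.02425..(1 - 0.02425)).freeze

def central_quantile(probability)
  unless CENTRAL_RANGE.cover?(probability)
    raise ArgumentError, "only the central branch is sketched here"
  end

  q = probability - 0.5
  r = q**2
  # Horner evaluation of both polynomials in r, as in the nested products of the diff.
  numerator = CENTRAL_NUMERATOR.reduce(0.0) { |acc, c| acc * r + c } * q
  denominator = CENTRAL_DENOMINATOR.reduce(0.0) { |acc, c| acc * r + c } * r + 1.0
  numerator / denominator
end

puts central_quantile(0.5)            # 0.0, the median of the standard normal
puts central_quantile(0.975).round(4) # close to the familiar 1.96
```

The tail branches in the diff follow the same pattern with the `C` and `D` coefficients and `q = sqrt(-2 * ln(value))`.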

data/lib/statistics/distribution/poisson.rb CHANGED

@@ -18,7 +18,7 @@ module Statistics
         upper = (expected_number_of_occurrences ** k) * Math.exp(-expected_number_of_occurrences)
         lower = Math.factorial(k)
-        upper/lower.to_f
+        upper/lower.to_r
       end
       def cumulative_function(k)
@@ -31,7 +31,7 @@ module Statistics
         # We need the right tail, i.e.: The upper incomplete gamma function. This can be
         # achieved by doing a substraction between 1 and the lower incomplete gamma function.
-        1 - (upper/lower.to_f)
+        1 - (upper/lower.to_r)
       end
     end
   end

data/lib/statistics/distribution/t_student.rb CHANGED

@@ -29,7 +29,7 @@ module Statistics
         upper = Math.gamma((degrees_of_freedom + 1)/2.0)
         lower = Math.sqrt(degrees_of_freedom * Math::PI) * Math.gamma(degrees_of_freedom/2.0)
         left = upper/lower
-        right = (1 + ((value ** 2)/degrees_of_freedom.to_f)) ** -((degrees_of_freedom + 1)/2.0)
+        right = (1 + ((value ** 2)/degrees_of_freedom.to_r)) ** -((degrees_of_freedom + 1)/2.0)
         left * right
       end
@@ -64,8 +64,8 @@ module Statistics
           results << Math.simpson_rule(threshold, y, 10_000) do |t|
             up = Math.gamma((v+1)/2.0)
             down = Math.sqrt(Math::PI * v) * Math.gamma(v/2.0)
-            right = (1 + ((y ** 2)/v.to_f)) ** ((v+1)/2.0)
-            left = up/down.to_f
+            right = (1 + ((y ** 2)/v.to_r)) ** ((v+1)/2.0)
+            left = up/down.to_r
             left * right
           end

data/lib/statistics/distribution/uniform.rb CHANGED

@@ -4,8 +4,8 @@ module Statistics
       attr_accessor :left, :right
       def initialize(a, b)
-        self.left = a.to_f
-        self.right = b.to_f
+        self.left = a.to_r
+        self.right = b.to_r
       end
       def density_function(value)

data/lib/statistics/distribution/weibull.rb CHANGED

@@ -4,8 +4,8 @@ module Statistics
       attr_accessor :shape, :scale # k and lambda
       def initialize(k, lamb)
-        self.shape = k.to_f
-        self.scale = lamb.to_f
+        self.shape = k.to_r
+        self.scale = lamb.to_r
       end
       def cumulative_function(random_value)
@@ -45,7 +45,7 @@ module Statistics
       # Using the inverse CDF function, also called quantile, we can calculate
       # a random sample that follows a weibull distribution.
       #
-      # Formula extracted from http://www.stat.yale.edu/Courses/1997-98/101/chigf.htm
+      # Formula extracted from https://www.taygeta.com/random/weibull.html
       def random(elements: 1, seed: Random.new_seed)
         results = []

data/lib/statistics/spearman_rank_coefficient.rb ADDED

@@ -0,0 +1,71 @@
+module Statistics
+  class SpearmanRankCoefficient
+    def self.rank(data:, return_ranks_only: true)
+      descending_order_data = data.sort { |a, b| b <=> a }
+      rankings = {}
+      data.each do |value|
+        # If we have ties, the find_index method will only retrieve the index of the
+        # first element in the list (i.e, the most close to the left of the array),
+        # so when a tie is detected, we increase the temporal ranking by the number of
+        # counted elements at that particular time and then we increase the counter.
+        temporal_ranking = descending_order_data.find_index(value) + 1 # 0-index
+        if rankings.fetch(value, false)
+          rankings[value][:rank] += (temporal_ranking + rankings[value][:counter])
+          rankings[value][:counter] += 1
+          rankings[value][:tie_rank] = rankings[value][:rank] / rankings[value][:counter].to_r
+        else
+          rankings[value] = { counter: 1, rank: temporal_ranking, tie_rank: temporal_ranking }
+        end
+      end
+      if return_ranks_only
+        data.map do |value|
+          rankings[value][:tie_rank]
+        end
+      else
+        rankings
+      end
+    end
+    # Formulas extracted from: https://statistics.laerd.com/statistical-guides/spearmans-rank-order-correlation-statistical-guide.php
+    def self.coefficient(set_one, set_two)
+      raise 'Both group sets must have the same number of cases.' if set_one.size != set_two.size
+      return if set_one.size == 0 && set_two.size == 0
+      set_one_mean, set_two_mean = set_one.mean, set_two.mean
+      have_tie_ranks = (set_one + set_two).any? { |rank| rank.is_a?(Float) || rank.is_a?(Rational) }
+      if have_tie_ranks
+        numerator = 0
+        squared_differences_set_one = 0
+        squared_differences_set_two = 0
+        set_one.size.times do |idx|
+          local_diff_one = (set_one[idx] - set_one_mean)
+          local_diff_two = (set_two[idx] - set_two_mean)
+          squared_differences_set_one += local_diff_one ** 2
+          squared_differences_set_two += local_diff_two ** 2
+          numerator += local_diff_one * local_diff_two
+        end
+        denominator = Math.sqrt(squared_differences_set_one * squared_differences_set_two)
+        numerator / denominator.to_r # This is rho or spearman's coefficient.
+      else
+        sum_squared_differences = set_one.each_with_index.reduce(0) do |memo, (rank_one, index)|
+          memo += ((rank_one - set_two[index]) ** 2)
+          memo
+        end
+        numerator = 6 * sum_squared_differences
+        denominator = ((set_one.size ** 3) - set_one.size)
+        1.0 - (numerator / denominator.to_r) # This is rho or spearman's coefficient.
+      end
+    end
+  end
+end
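
Note on the no-ties branch of `coefficient`: when every rank is an integer, the method uses the shortcut rho = 1 - 6 * sum(d^2) / (n^3 - n), where d is the difference between paired ranks. A standalone sketch of that branch follows. The method name `spearman_rho_without_ties` and the guard for fewer than two pairs are illustrative assumptions.

```ruby
# Spearman's rho for two rank lists without ties: 1 - 6 * sum(d^2) / (n^3 - n).
def spearman_rho_without_ties(ranks_one, ranks_two)
  if ranks_one.size != ranks_two.size
    raise ArgumentError, "rank lists must have the same size"
  end

  n = ranks_one.size
  return nil if n < 2 # n^3 - n would be zero

  sum_squared_differences = ranks_one.zip(ranks_two).sum { |a, b| (a - b)**2 }
  1.0 - (6 * sum_squared_differences) / (n**3 - n).to_r
end

ranks = [1, 2, 3, 4, 5]
puts spearman_rho_without_ties(ranks, ranks)          # identical rankings
puts spearman_rho_without_ties(ranks, ranks.reverse)  # fully reversed rankings
```

Identical rankings give 1.0 and fully reversed rankings give -1.0. For five reversed ranks the squared differences sum to 40, so the result is 1 - 240 / 120.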

data/lib/statistics/statistical_test/chi_squared_test.rb CHANGED

@@ -8,12 +8,12 @@ module Statistics
         statistic = if expected.is_a? Numeric
                       observed.reduce(0) do |memo, observed_value|
                         up = (observed_value - expected) ** 2
-                        memo += (up/expected.to_f)
+                        memo += (up/expected.to_r)
                       end
                     else
                       expected.each_with_index.reduce(0) do |memo, (expected_value, index)|
                         up = (observed[index] - expected_value) ** 2
-                        memo += (up/expected_value.to_f)
+                        memo += (up/expected_value.to_r)
                       end
                     end

data/lib/statistics/statistical_test/f_test.rb CHANGED

@@ -19,7 +19,7 @@ module Statistics
         if args.size == 2
           variances = [args[0].variance, args[1].variance]
-          f_score = variances.max/variances.min.to_f
+          f_score = variances.max/variances.min.to_r
           df1 = 1 # k-1 (k = 2)
           df2 = args.flatten.size - 2 # N-k (k = 2)
         elsif args.size > 2
@@ -37,18 +37,18 @@ module Statistics
           variance_between_groups = iterator.reduce(0) do |summation, (size, index)|
             inner_calculation = size * ((sample_means[index] - overall_mean) ** 2)
-            summation += (inner_calculation / (total_groups - 1).to_f)
+            summation += (inner_calculation / (total_groups - 1).to_r)
           end
           # Variance within groups
           variance_within_groups = (0...total_groups).reduce(0) do |outer_summation, group_index|
             outer_summation += args[group_index].reduce(0) do |inner_sumation, observation|
               inner_calculation = ((observation - sample_means[group_index]) ** 2)
-              inner_sumation += (inner_calculation / (total_elements - total_groups).to_f)
+              inner_sumation += (inner_calculation / (total_elements - total_groups).to_r)
             end
           end
-          f_score = variance_between_groups/variance_within_groups.to_f
+          f_score = variance_between_groups/variance_within_groups.to_r
           df1 = total_groups - 1
           df2 = total_elements - total_groups
         end

data/lib/statistics/statistical_test/kolmogorov_smirnov_test.rb ADDED

@@ -0,0 +1,70 @@
+module Statistics
+  module StatisticalTest
+    class KolmogorovSmirnovTest
+      # Common alpha, and critical D are calculated following formulas from: https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test#Two-sample_Kolmogorov%E2%80%93Smirnov_test
+      def self.two_samples(group_one:, group_two:, alpha: 0.05)
+        samples = group_one + group_two # We can use unbalaced group samples
+        ecdf_one = Distribution::Empirical.new(samples: group_one)
+        ecdf_two = Distribution::Empirical.new(samples: group_two)
+        d_max = samples.sort.map do |sample|
+          d1 = ecdf_one.cumulative_function(x: sample)
+          d2 = ecdf_two.cumulative_function(x: sample)
+          (d1 - d2).abs
+        end.max
+        # TODO: Validate calculation of Common alpha.
+        common_alpha = Math.sqrt((-0.5 * Math.log(alpha)))
+        radicand = (group_one.size + group_two.size) / (group_one.size * group_two.size).to_r
+        critical_d = common_alpha * Math.sqrt(radicand)
+        # critical_d = self.critical_d(alpha: alpha, n: samples.size)
+        # We are unable to calculate the p_value, because we don't have the Kolmogorov distribution
+        # defined. We reject the null hypotesis if Dmax is > than Dcritical.
+        { d_max: d_max,
+          d_critical: critical_d,
+          total_samples: samples.size,
+          alpha: alpha,
+          null: d_max <= critical_d,
+          alternative: d_max > critical_d,
+          confidence_level: 1.0 - alpha }
+      end
+      # This is an implementation of the formula presented by Paul Molin and Hervé Abdi in a paper,
+      # called "New Table and numerical approximations for Kolmogorov-Smirnov / Lilliefors / Van Soest
+      # normality test".
+      # In this paper, the authors defines a couple of 6th-degree polynomial functions that allow us
+      # to find an aproximation of the real critical value. This is based in the conclusions made by
+      # Dagnelie (1968), where indicates that critical values given by Lilliefors can be approximated
+      # numerically.
+      #
+      # In general, the formula found is:
+      #  C(N, alpha) ^ -2  = A(alpha) * N + B(alpha).
+      #
+      # Where A(alpha), B(alpha) are two 6th degree polynomial functions computed using the principle
+      # of Monte Carlo simulations.
+      #
+      # paper can be found here: https://utdallas.edu/~herve/MolinAbdi1998-LillieforsTechReport.pdf
+      # def self.critical_d(alpha:, n:)
+      #   confidence = 1.0 - alpha
+      #   a_alpha = 6.32207539843126 -17.1398870006148 * confidence +
+      #     38.42812675101057 * (confidence ** 2) - 45.93241384693391 * (confidence ** 3) +
+      #     7.88697700041829 * (confidence ** 4) + 29.79317711037858 * (confidence ** 5) -
+      #     18.48090137098585 * (confidence ** 6)
+      #   b_alpha = 12.940399038404 - 53.458334259532 * confidence +
+      #     186.923866119699 * (confidence ** 2) - 410.582178349305 * (confidence ** 3) +
+      #     517.377862566267 * (confidence ** 4) - 343.581476222384 * (confidence ** 5) +
+      #     92.123451358715 * (confidence ** 6)
+      #   Math.sqrt(1.0 / (a_alpha * n + b_alpha))
+      # end
+    end
+    KSTest = KolmogorovSmirnovTest # Alias
+  end
+end
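
Note on the critical value in `two_samples`: the diff computes `common_alpha` as sqrt(-0.5 * ln(alpha)) and marks it with a TODO for validation. The Wikipedia section cited in the diff is understood here to give the two-sided form sqrt(-0.5 * ln(alpha / 2)); treat that reading as an assumption to verify against the source. The sketch below computes both forms so the difference is visible. All method names are illustrative and not part of the gem.

```ruby
# c(alpha) as computed in the diff: sqrt(-0.5 * ln(alpha)).
def common_alpha_as_in_diff(alpha)
  Math.sqrt(-0.5 * Math.log(alpha))
end

# Assumed two-sided form: sqrt(-0.5 * ln(alpha / 2)).
def common_alpha_two_sided(alpha)
  Math.sqrt(-0.5 * Math.log(alpha / 2.0))
end

# Critical D for two samples of the given sizes, as in the diff.
def critical_d(common_alpha, size_one, size_two)
  common_alpha * Math.sqrt((size_one + size_two) / (size_one * size_two).to_r)
end

alpha = 0.05
puts common_alpha_as_in_diff(alpha).round(4)
puts common_alpha_two_sided(alpha).round(4)
puts critical_d(common_alpha_as_in_diff(alpha), 10, 10).round(4)
puts critical_d(common_alpha_two_sided(alpha), 10, 10).round(4)
```

For alpha = 0.05 the first form gives roughly 1.224 and the second roughly 1.358, so the form in the diff yields a smaller critical D and therefore rejects the null hypothesis more readily.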

data/lib/statistics/statistical_test/t_test.rb CHANGED

@@ -21,9 +21,9 @@ module Statistics
                     raise ZeroStdError, ZeroStdError::STD_ERROR_MSG if data_std == 0
                     comparison_mean = args[0]
-                    degrees_of_freedom = args[1].size
+                    degrees_of_freedom = args[1].size - 1
-                    (data_mean - comparison_mean)/(data_std / Math.sqrt(args[1].size).to_f).to_f
+                    (data_mean - comparison_mean)/(data_std / Math.sqrt(args[1].size).to_r).to_r
                   else
                     sample_left_mean = args[0].mean
                     sample_left_variance = args[0].variance
@@ -31,12 +31,12 @@ module Statistics
                     sample_right_mean = args[1].mean
                     degrees_of_freedom = args.flatten.size - 2
-                    left_root = sample_left_variance/args[0].size.to_f
-                    right_root = sample_right_variance/args[1].size.to_f
+                    left_root = sample_left_variance/args[0].size.to_r
+                    right_root = sample_right_variance/args[1].size.to_r
                     standard_error = Math.sqrt(left_root + right_root)
-                    (sample_left_mean - sample_right_mean).abs/standard_error.to_f
+                    (sample_left_mean - sample_right_mean).abs/standard_error.to_r
                   end
         t_distribution = Distribution::TStudent.new(degrees_of_freedom)
@@ -72,7 +72,7 @@ module Statistics
         down = difference_std/Math.sqrt(differences.size)
-        t_score = (differences.mean - 0)/down.to_f
+        t_score = (differences.mean - 0)/down.to_r
         probability = Distribution::TStudent.new(degrees_of_freedom).cumulative_function(t_score)
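
Note on the degrees-of-freedom fix: for the one-sample case the diff changes `degrees_of_freedom` from `n` to `n - 1`, which is the standard value for a one-sample t-test. A standalone sketch of the statistic follows. The method name `one_sample_t` is illustrative, and the use of the sample (n - 1) variance is an assumption, since the gem's own `standard_deviation` helper is not shown in this diff.

```ruby
# One-sample t statistic with n - 1 degrees of freedom.
# Assumes at least two observations with non-zero spread.
def one_sample_t(data, comparison_mean)
  n = data.size
  mean = data.sum / n.to_r
  sample_variance = data.sum { |x| (x - mean)**2 } / (n - 1)
  standard_error = Math.sqrt(sample_variance) / Math.sqrt(n)

  {
    t_score: (mean - comparison_mean) / standard_error,
    degrees_of_freedom: n - 1
  }
end

result = one_sample_t([2, 4, 4, 4, 5, 5, 7, 9], 4)
puts result[:t_score].round(4)
puts result[:degrees_of_freedom]
```

For these eight observations the mean is 5 and the sample variance is 32/7, so the t-score is sqrt(1.75), about 1.3229, with 7 degrees of freedom.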

data/lib/statistics/statistical_test/wilcoxon_rank_sum_test.rb CHANGED

@@ -73,7 +73,7 @@ module Statistics
                     memo += ((t[:counter] ** 3) - t[:counter])/12.0
                   end
-        left = (total_group_one * total_group_two)/(n * (n - 1)).to_f
+        left = (total_group_one * total_group_two)/(n * (n - 1)).to_r
         right = (((n ** 3) - n)/12.0) - rank_sum
         Math.sqrt(left * right)
@@ -82,7 +82,7 @@ module Statistics
       private def ranked_sum_for(total, group)
         # sum rankings per group
         group.reduce(0) do |memo, element|
-          rank_of_element = total[element][:rank] / total[element][:counter].to_f
+          rank_of_element = total[element][:rank] / total[element][:counter].to_r
           memo += rank_of_element
         end
       end

data/lib/statistics/version.rb CHANGED

@@ -1,3 +1,3 @@
 module Statistics
-  VERSION = "2.0.4"
+  VERSION = "2.1.3"
 end

data/ruby-statistics.gemspec CHANGED

@@ -27,9 +27,9 @@ Gem::Specification.new do |spec|
   spec.executables   = spec.files.grep(%r{^exe/}) { |f| File.basename(f) }
   spec.require_paths = ["lib"]
-  spec.add_development_dependency "bundler",'~> 1.15', '>= 1.15.4'
-  spec.add_development_dependency "rake", '~> 12.0', '>= 12.0.0'
-  spec.add_development_dependency "rspec", '~> 3.6', '>= 3.6.0'
+  spec.add_development_dependency "rake", '>= 12.0.0', '~> 13.0'
+  spec.add_development_dependency "rspec", '>= 3.6.0'
   spec.add_development_dependency "grb", '~> 0.4.1', '>= 0.4.1'
-  spec.add_development_dependency 'byebug', '~> 9.1.0', '>= 9.1.0'
+  spec.add_development_dependency 'byebug', '>= 9.1.0'
+  spec.add_development_dependency 'pry'
 end
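
Note on the development dependency constraints changed above: the sketch below uses RubyGems' own `Gem::Requirement` and `Gem::Version` classes to show which versions the old and new constraint sets admit. The helper `allows?` and the sample version numbers are illustrative assumptions, not taken from the gem.

```ruby
require "rubygems"

# Hypothetical helper for illustration: true when `version` satisfies
# every constraint string in `constraints`.
def allows?(constraints, version)
  Gem::Requirement.new(*constraints).satisfied_by?(Gem::Version.new(version))
end

old_rake = ["~> 12.0", ">= 12.0.0"]   # constraint pair from the 2.0.4 gemspec
new_rake = [">= 12.0.0", "~> 13.0"]   # constraint pair from the 2.1.3 gemspec

allows?(old_rake, "12.3.3")   # true
allows?(old_rake, "13.0.3")   # false: "~> 12.0" means >= 12.0 and < 13
allows?(new_rake, "12.3.3")   # false: "~> 13.0" means >= 13.0 and < 14
allows?(new_rake, "13.0.3")   # true

# Dropping the pessimistic "~>" constraint removes the upper bound.
allows?(["~> 9.1.0", ">= 9.1.0"], "11.1.3")   # false: "~> 9.1.0" means < 9.2
allows?([">= 9.1.0"], "11.1.3")               # true
```

Under these semantics the new rake pair is effectively equivalent to `~> 13.0` alone, since every version at or above 13.0 already satisfies `>= 12.0.0`.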

metadata CHANGED

@@ -1,62 +1,39 @@
 --- !ruby/object:Gem::Specification
 name: ruby-statistics
 version: !ruby/object:Gem::Version
-  version: 2.0.4
+  version: 2.1.3
 platform: ruby
 authors:
 - esteban zapata
-autorequire:
+autorequire:
 bindir: exe
 cert_chain: []
-date: 2018-05-18 00:00:00.000000000 Z
+date: 2021-02-04 00:00:00.000000000 Z
 dependencies:
-- !ruby/object:Gem::Dependency
-  name: bundler
-  requirement: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '1.15'
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: 1.15.4
-  type: :development
-  prerelease: false
-  version_requirements: !ruby/object:Gem::Requirement
-    requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '1.15'
-    - - ">="
-      - !ruby/object:Gem::Version
-        version: 1.15.4
 - !ruby/object:Gem::Dependency
   name: rake
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '12.0'
     - - ">="
       - !ruby/object:Gem::Version
         version: 12.0.0
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '13.0'
   type: :development
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '12.0'
     - - ">="
       - !ruby/object:Gem::Version
         version: 12.0.0
+    - - "~>"
+      - !ruby/object:Gem::Version
+        version: '13.0'
 - !ruby/object:Gem::Dependency
   name: rspec
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '3.6'
     - - ">="
       - !ruby/object:Gem::Version
         version: 3.6.0
@@ -64,9 +41,6 @@ dependencies:
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: '3.6'
     - - ">="
       - !ruby/object:Gem::Version
         version: 3.6.0
@@ -94,9 +68,6 @@ dependencies:
   name: byebug
   requirement: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
-      - !ruby/object:Gem::Version
-        version: 9.1.0
     - - ">="
       - !ruby/object:Gem::Version
         version: 9.1.0
@@ -104,12 +75,23 @@ dependencies:
   prerelease: false
   version_requirements: !ruby/object:Gem::Requirement
     requirements:
-    - - "~>"
+    - - ">="
       - !ruby/object:Gem::Version
         version: 9.1.0
+- !ruby/object:Gem::Dependency
+  name: pry
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
     - - ">="
       - !ruby/object:Gem::Version
-        version: 9.1.0
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - ">="
+      - !ruby/object:Gem::Version
+        version: '0'
 description: |-
   This gem is intended to accomplish the same purpose as jStat js library:
                             to provide ruby with statistical capabilities without the need
@@ -122,10 +104,13 @@ executables: []
 extensions: []
 extra_rdoc_files: []
 files:
+- ".github/dependabot.yml"
+- ".github/workflows/ruby.yml"
 - ".gitignore"
 - ".rspec"
 - ".travis.yml"
 - CODE_OF_CONDUCT.md
+- CONTRIBUTING.md
 - Gemfile
 - LICENSE
 - LICENSE.txt
@@ -137,18 +122,25 @@ files:
 - lib/math.rb
 - lib/statistics.rb
 - lib/statistics/distribution.rb
+- lib/statistics/distribution/bernoulli.rb
 - lib/statistics/distribution/beta.rb
 - lib/statistics/distribution/binomial.rb
 - lib/statistics/distribution/chi_squared.rb
+- lib/statistics/distribution/empirical.rb
 - lib/statistics/distribution/f.rb
+- lib/statistics/distribution/geometric.rb
+- lib/statistics/distribution/logseries.rb
+- lib/statistics/distribution/negative_binomial.rb
 - lib/statistics/distribution/normal.rb
 - lib/statistics/distribution/poisson.rb
 - lib/statistics/distribution/t_student.rb
 - lib/statistics/distribution/uniform.rb
 - lib/statistics/distribution/weibull.rb
+- lib/statistics/spearman_rank_coefficient.rb
 - lib/statistics/statistical_test.rb
 - lib/statistics/statistical_test/chi_squared_test.rb
 - lib/statistics/statistical_test/f_test.rb
+- lib/statistics/statistical_test/kolmogorov_smirnov_test.rb
 - lib/statistics/statistical_test/t_test.rb
 - lib/statistics/statistical_test/wilcoxon_rank_sum_test.rb
 - lib/statistics/version.rb
@@ -157,7 +149,7 @@ homepage: https://github.com/estebanz01/ruby-statistics
 licenses:
 - MIT
 metadata: {}
-post_install_message:
+post_install_message:
 rdoc_options: []
 require_paths:
 - lib
@@ -172,9 +164,8 @@ required_rubygems_version: !ruby/object:Gem::Requirement
     - !ruby/object:Gem::Version
       version: '0'
 requirements: []
-rubyforge_project:
-rubygems_version: 2.5.2.1
-signing_key:
+rubygems_version: 3.1.4
+signing_key:
 specification_version: 4
 summary: A ruby gem for som specific statistics. Inspired by the jStat js library.
 test_files: []