RubyGems - rubystats - Versions diffs - 0.1.2 → 0.2.0 - Mend

rubystats 0.1.2 → 0.2.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (33)

data/History.txt +7 -0
data/Manifest.txt +22 -0
data/README.txt +109 -0
data/Rakefile +19 -0
data/examples/beta.rb +10 -12
data/examples/binomial.rb +12 -10
data/examples/failrate_vs_goal.rb +28 -0
data/examples/fisher.rb +2 -6
data/examples/norm.rb +10 -4
data/lib/rubystats.rb +9 -0
data/lib/rubystats/beta_distribution.rb +88 -0
data/lib/rubystats/binomial_distribution.rb +195 -0
data/lib/rubystats/fishers_exact_test.rb +171 -0
data/lib/rubystats/modules.rb +742 -0
data/lib/rubystats/normal_distribution.rb +114 -0
data/lib/rubystats/probability_distribution.rb +131 -0
data/{tests → test}/tc_beta.rb +4 -4
data/{tests → test}/tc_binomial.rb +4 -4
data/{tests → test}/tc_fisher.rb +2 -2
data/test/tc_norm.rb +14 -0
data/test/tc_require_all.rb +18 -0
data/{tests → test}/ts_stats.rb +0 -0
metadata +72 -51
data/README +0 -9
data/lib/beta_distribution.rb +0 -87
data/lib/binomial_distribution.rb +0 -194
data/lib/fishers_exact_test.rb +0 -171
data/lib/modules/extra_math.rb +0 -7
data/lib/modules/numerical_constants.rb +0 -17
data/lib/modules/special_math.rb +0 -721
data/lib/normal_distribution.rb +0 -114
data/lib/probability_distribution.rb +0 -132
data/tests/tc_norm.rb +0 -13

data/History.txt ADDED

@@ -0,0 +1,7 @@
+=== 0.2.0 / 2008-04-14
+* Major reorganization of code.
+  * Added lib/rubystats subdirectory and namespaced all classes under the Rubystats module.
+  * Added another example or two and fixed bug #16827.
+  * Should not break the old API.
+  * Now using Hoe to manage the gem.

data/Manifest.txt ADDED

@@ -0,0 +1,22 @@
+History.txt
+Manifest.txt
+README.txt
+Rakefile
+examples/beta.rb
+examples/binomial.rb
+examples/failrate_vs_goal.rb
+examples/fisher.rb
+examples/norm.rb
+lib/rubystats.rb
+lib/rubystats/beta_distribution.rb
+lib/rubystats/binomial_distribution.rb
+lib/rubystats/fishers_exact_test.rb
+lib/rubystats/modules.rb
+lib/rubystats/normal_distribution.rb
+lib/rubystats/probability_distribution.rb
+test/tc_beta.rb
+test/tc_binomial.rb
+test/tc_fisher.rb
+test/tc_norm.rb
+test/tc_require_all.rb
+test/ts_stats.rb

data/README.txt ADDED

@@ -0,0 +1,109 @@
+= rubystats
+* http://rubyforge.org/projects/rubystats/
+== DESCRIPTION:
+ This is a set of Ruby statistics libraries ported from the PHPMath libraries.
+ The PHPMath libraries were created by Paul Meagher (many of them were ported
+ from various sources).
+ See http://www.phpmath.com/ for PHPMath libraries.
+ See examples and tests for usage.
+== NOTE for version 0.2.0:
+ The API has changed somewhat in version 0.2. All tests pass with the old API, but
+ if you now want to load just one of the distributions, you must require it like this:
+ require 'rubystats/normal_distribution'
+ Then prefix the class name with the Rubystats module name:
+ norm = Rubystats::NormalDistribution.new(10, 2)
+ Alternatively, you can simply require 'rubystats', which loads all the classes and
+ includes the Rubystats module.
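The compatibility story can be sketched in a few lines of plain Ruby. The `NormalDistribution` below is a hypothetical placeholder that only stores its parameters, not the real rubystats class. What mirrors the gem is the top-level `include Rubystats` that lib/rubystats.rb runs: it mixes the module into `Object`, so unprefixed names such as `NormalDistribution` still resolve.

```ruby
# Hypothetical stand-in for lib/rubystats/normal_distribution.rb: the real
# class also has cdf, pdf, rng, etc.; this one only stores its parameters.
module Rubystats
  class NormalDistribution
    attr_reader :mean, :sd

    def initialize(mean, sd)
      @mean = mean
      @sd = sd
    end
  end
end

# 0.2.0 style: the class is reached through the Rubystats namespace.
namespaced = Rubystats::NormalDistribution.new(10, 2)

# lib/rubystats.rb finishes with a top-level `include Rubystats`, which mixes
# the module into Object, so the old unprefixed constant still resolves.
include Rubystats
legacy = NormalDistribution.new(10, 2)

puts namespaced.class # prints Rubystats::NormalDistribution
puts legacy.class     # prints Rubystats::NormalDistribution
```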
+== Author:
+ Bryan Donovan
+ 2006-2008
+== WARNING
+This is beta-quality software. It works well according to my tests, but the API may change and other features may be added.
+== FEATURES:
+Classes for distributions:
+* Normal
+* Binomial
+* Beta
+Also includes Fisher's Exact Test
+== SYNOPSIS:
+=== Example: normal distribution with mean of 10 and standard deviation of 2
+norm = Rubystats::NormalDistribution.new(10, 2)
+cdf = norm.cdf(11)
+pdf = norm.pdf(11)
+puts "CDF(11): #{cdf}"
+puts "PDF(11): #{pdf}"
+Output:
+ CDF(11): 0.691462461274013
+ PDF(11): 0.0733813315868699
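For reference, the textbook formulas can be sketched with Ruby's built-in `Math.erf`: with z = (x - μ)/σ, the CDF is (1 + erf(z/√2))/2 and the density is exp(-z²/2)/(σ√(2π)). The helper names `normal_cdf` and `normal_pdf` are illustrative, not part of rubystats. The CDF reproduces the 0.6915 shown above. The textbook density at x = 11 comes out near 0.1760, which does not match the PDF(11) line in the output above. normal_distribution.rb is not shown in this excerpt, so the source of that difference can't be checked here.

```ruby
# Textbook normal CDF and PDF for mean mu and standard deviation sigma.
def normal_cdf(x, mu, sigma)
  z = (x - mu) / sigma.to_f
  0.5 * (1.0 + Math.erf(z / Math.sqrt(2.0)))
end

def normal_pdf(x, mu, sigma)
  z = (x - mu) / sigma.to_f
  Math.exp(-0.5 * z * z) / (sigma * Math.sqrt(2.0 * Math::PI))
end

puts "CDF(11): #{normal_cdf(11, 10, 2)}" # about 0.6915
puts "PDF(11): #{normal_pdf(11, 10, 2)}" # about 0.1760
```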
+=== Example: get some random numbers from a normal distribution
+puts "Random numbers from normal distribution:"
+10.times do
+  puts norm.rng
+end
+(sample) Output:
+  18.8877297946427
+  -15.4463065628574
+  4.55538065315298
+  17.0281528150355
+  3.16543873165151
+  2.48599492216993
+  14.3947330544886
+  -3.47989062859462
+  5.05832591294848
+  31.2952983108343
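The README does not say how `rng` produces its values. One common technique is the Box-Muller transform, which turns two independent uniform draws into a normal deviate. The `normal_rng` helper below is a hypothetical sketch of that technique, not necessarily the library's method. For N(10, 2), about 95% of draws should land between 6 and 14.

```ruby
# Box-Muller transform: two independent U(0, 1) draws give one N(mu, sigma)
# draw (the second, sine-based deviate is simply discarded here).
def normal_rng(mu, sigma, rng = Random.new)
  u1 = 1.0 - rng.rand # shifts [0, 1) to (0, 1] so Math.log(u1) stays finite
  u2 = rng.rand
  mu + sigma * Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
end

rng = Random.new(42) # seeded for reproducible output
puts "Random numbers from normal distribution:"
10.times { puts normal_rng(10, 2, rng) }
```

Passing an explicit `Random` instance keeps the sampler testable and avoids touching the global generator.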
+== REQUIREMENTS:
+* Ruby > 1.8.2 (may work with earlier versions)
+== INSTALL:
+* sudo gem install rubystats
+== LICENSE:
+(The MIT License)
+Copyright (c) 2008
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+'Software'), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED 'AS IS', WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

data/Rakefile ADDED

@@ -0,0 +1,19 @@
+require 'rubygems'
+require 'hoe'
+$:.unshift(File.dirname(__FILE__) + "/lib")
+require 'rubystats'
+Hoe.new('rubystats', Rubystats::VERSION) do |p|
+  p.name = "rubystats"
+  p.author = "Bryan Donovan - http://www.bryandonovan.com"
+  p.email = "b.dondo+rubyforge@gmail.com"
+  p.description = "Ruby Stats is a port of the statistics libraries from PHPMath. Probability distributions include binomial, beta, and normal distributions with PDF, CDF and inverse CDF as well as Fisher's Exact Test."
+  p.summary = "Port of PHPMath to Ruby"
+  p.url = "http://rubyforge.org/projects/rubystats/"
+  p.changes = p.paragraphs_of('History.txt', 0..1).join("\n\n")
+  p.remote_rdoc_dir = '' # Release to root
+end
+rule '' do |t|
+  system "cd test && ruby ts_stats.rb"
+end

data/examples/beta.rb CHANGED

@@ -1,22 +1,22 @@
 $:.unshift File.join(File.dirname(__FILE__), "..", "lib")
-require 'beta_distribution'
+require 'rubystats/beta_distribution'
 def get_lower_limit(trials,alpha,p)
-	if p==0
-		lcl=0
+	if p == 0
+		lcl = 0
 	else
-		q=trials-p+1
-		bet= BetaDistribution.new(p,q)
-		lcl=bet.icdf(alpha)
+		q = trials - p + 1
+		bet = Rubystats::BetaDistribution.new(p,q)
+		lcl = bet.icdf(alpha)
 	end
 	return lcl
 end
 def get_upper_limit(trials,alpha,p)
-	q=trials-p
-	p=p+1
-	bet= BetaDistribution.new(p,q)
-	ucl=bet.icdf(1-alpha)
+	q = trials - p
+	p = p + 1
+	bet = Rubystats::BetaDistribution.new(p,q)
+	ucl = bet.icdf(1-alpha)
 	return ucl
 end
@@ -32,5 +32,3 @@ puts "lcl= "
 p lcl
 puts "ucl= "
 p ucl
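The limits in beta.rb appear to follow the exact (Clopper-Pearson) construction for a binomial proportion. With x events (the `p` argument in beta.rb) in n trials, the lower limit is the α quantile of Beta(x, n - x + 1) and the upper limit is the 1 - α quantile of Beta(x + 1, n - x). The self-contained sketch below evaluates the regularized incomplete beta function with the continued fraction from Numerical Recipes (`betacf`) and inverts it by bisection. All helper names are hypothetical, and bisection is chosen for simplicity rather than speed. It returns 1.0 for the upper limit when x = n, where Beta(x + 1, 0) is undefined.

```ruby
# Continued fraction for the incomplete beta function (modified Lentz method,
# following the betacf routine in Numerical Recipes).
def betacf(a, b, x, max_iter: 300, eps: 3.0e-14)
  fpmin = 1.0e-300
  qab = a + b
  qap = a + 1.0
  qam = a - 1.0
  c = 1.0
  d = 1.0 - qab * x / qap
  d = fpmin if d.abs < fpmin
  d = 1.0 / d
  h = d
  (1..max_iter).each do |m|
    m2 = 2 * m
    # even step of the fraction
    aa = m * (b - m) * x / ((qam + m2) * (a + m2))
    d = 1.0 + aa * d
    d = fpmin if d.abs < fpmin
    c = 1.0 + aa / c
    c = fpmin if c.abs < fpmin
    d = 1.0 / d
    h *= d * c
    # odd step of the fraction
    aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
    d = 1.0 + aa * d
    d = fpmin if d.abs < fpmin
    c = 1.0 + aa / c
    c = fpmin if c.abs < fpmin
    d = 1.0 / d
    delta = d * c
    h *= delta
    break if (delta - 1.0).abs < eps
  end
  h
end

# Regularized incomplete beta I_x(a, b), i.e. the CDF of Beta(a, b) at x.
def reg_inc_beta(x, a, b)
  return 0.0 if x <= 0.0
  return 1.0 if x >= 1.0
  log_front = Math.lgamma(a + b)[0] - Math.lgamma(a)[0] - Math.lgamma(b)[0] +
              a * Math.log(x) + b * Math.log(1.0 - x)
  front = Math.exp(log_front)
  # Use I_x(a, b) = 1 - I_(1-x)(b, a) on the side where the fraction converges faster.
  if x < (a + 1.0) / (a + b + 2.0)
    front * betacf(a, b, x) / a
  else
    1.0 - front * betacf(b, a, 1.0 - x) / b
  end
end

# Beta(a, b) quantile by bisection on [0, 1]; the CDF is monotone there.
def beta_icdf(prob, a, b)
  lo = 0.0
  hi = 1.0
  100.times do
    mid = 0.5 * (lo + hi)
    if reg_inc_beta(mid, a, b) < prob
      lo = mid
    else
      hi = mid
    end
  end
  0.5 * (lo + hi)
end

def lower_limit(trials, alpha, events)
  return 0.0 if events.zero?
  beta_icdf(alpha, events, trials - events + 1)
end

def upper_limit(trials, alpha, events)
  return 1.0 if events == trials
  beta_icdf(1.0 - alpha, events + 1, trials - events)
end

puts "lcl= #{lower_limit(10, 0.05, 0)}"
puts "ucl= #{upper_limit(10, 0.05, 0)}"
```

As a check, zero events in 10 trials with α = 0.05 gives an upper limit of 1 - 0.05^(1/10), about 0.2589.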

data/examples/binomial.rb CHANGED

@@ -1,20 +1,22 @@
 $:.unshift File.join(File.dirname(__FILE__), "..", "lib")
-require 'binomial_distribution'
+require 'rubystats/binomial_distribution'
 t = 100
 f = 7
 p = 0.05
-bin = BinomialDistribution.new(t,p)
+bin = Rubystats::BinomialDistribution.new(t,p)
 f = f - 1
-	mean = bin.mean
-	puts mean
-for i in 1..5
-	pdf = bin.pdf(i)
-	cdf = bin.cdf(i)
-	inv = bin.icdf(cdf)
-	puts inv
-	puts "#{i}: #{pdf} : #{cdf}"
+mean = bin.mean
+puts mean
+for i in 1..12
+  pdf = bin.pdf(i)
+  cdf = bin.cdf(i)
+  inv = bin.icdf(cdf)
+  pval = 1 - cdf
+  #	puts inv
+  puts "#{i}: #{pdf} : #{cdf}: pval: #{pval}"
 end
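The quantities printed by binomial.rb can be sketched directly. `binom_pmf` evaluates C(n, k) p^k (1 - p)^(n - k) in log space via `Math.lgamma` to avoid overflowing the binomial coefficient. `binom_cdf` sums P(X = 0) through P(X = x). `binom_icdf` returns the smallest k whose cumulative probability reaches `prob`, the definition R's qbinom uses. These names are hypothetical, and the sketch reuses the example's n = 100, p = 0.05.

```ruby
# P(X = k) for X ~ Binomial(n, p), computed in log space.
def binom_pmf(k, n, p)
  return 0.0 if k < 0 || k > n
  if p == 0.0
    k.zero? ? 1.0 : 0.0
  elsif p == 1.0
    k == n ? 1.0 : 0.0
  else
    log_coef = Math.lgamma(n + 1)[0] - Math.lgamma(k + 1)[0] - Math.lgamma(n - k + 1)[0]
    Math.exp(log_coef + k * Math.log(p) + (n - k) * Math.log(1.0 - p))
  end
end

# P(X <= x); plain left-to-right summation, same order as binom_icdf below.
def binom_cdf(x, n, p)
  (0..x).reduce(0.0) { |acc, k| acc + binom_pmf(k, n, p) }
end

# Smallest k with P(X <= k) >= prob.
def binom_icdf(prob, n, p)
  acc = 0.0
  (0..n).each do |k|
    acc += binom_pmf(k, n, p)
    return k if acc >= prob
  end
  n # guards against prob slightly above the rounded total
end

trials = 100
rate = 0.05
(1..12).each do |i|
  cdf = binom_cdf(i, trials, rate)
  puts "#{i}: #{binom_pmf(i, trials, rate)} : #{cdf}: pval: #{1 - cdf}"
end
```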

data/examples/failrate_vs_goal.rb ADDED

@@ -0,0 +1,28 @@
+$:.unshift File.join(File.dirname(__FILE__), "..", "lib")
+require 'rubystats/binomial_distribution'
+# Manufacturing example.
+# We have 10 different-sized batches of units that
+# get tested in our process.  We want to see if,
+# at > 95% confidence, the fail rate for any of
+# those batches is worse than our goal fail rate
+# of 10%
+tested = [100, 68, 67, 96, 46, 2, 13, 33, 88, 71]
+failed = [12,  9,  12, 7,  7,  0, 6,  4,  5,  5]
+bad_fail_rate = 0.10
+alpha = 0.05
+for i in 0..9
+  t = tested[i]
+  f = failed[i]
+  bin = Rubystats::BinomialDistribution.new(t,bad_fail_rate)
+  pdf = bin.pdf(f-1)
+  cdf = bin.cdf(f-1)
+  pval = 1 - cdf
+  pval = sprintf("%.3f",pval).to_f
+  status = pval <= alpha ? "RED ALERT" : "OK"
+  puts "Tested: #{t}\tFailed: #{f}\tpval: #{pval}\tStatus:#{status}"
+end
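The p-value in failrate_vs_goal.rb is the upper tail P(X >= f) of a Binomial(t, 0.10) distribution, computed there as 1 - cdf(f - 1). A hypothetical `upper_tail_pvalue` helper can sum that tail directly instead. This avoids cancellation when the tail is tiny and handles f = 0 explicitly, since P(X >= 0) = 1. Unlike the example, this sketch compares the unrounded p-value against α.

```ruby
# P(X = k) for X ~ Binomial(n, p), computed in log space.
def binom_pmf(k, n, p)
  return 0.0 if k < 0 || k > n
  if p == 0.0
    k.zero? ? 1.0 : 0.0
  elsif p == 1.0
    k == n ? 1.0 : 0.0
  else
    log_coef = Math.lgamma(n + 1)[0] - Math.lgamma(k + 1)[0] - Math.lgamma(n - k + 1)[0]
    Math.exp(log_coef + k * Math.log(p) + (n - k) * Math.log(1.0 - p))
  end
end

# One-sided exact p-value: P(X >= failed) under the goal fail rate.
def upper_tail_pvalue(failed, tested, rate)
  return 1.0 if failed <= 0
  (failed..tested).reduce(0.0) { |acc, k| acc + binom_pmf(k, tested, rate) }
end

tested = [100, 68, 67, 96, 46, 2, 13, 33, 88, 71]
failed = [12, 9, 12, 7, 7, 0, 6, 4, 5, 5]
alpha = 0.05
tested.zip(failed).each do |t, f|
  pval = upper_tail_pvalue(f, t, 0.10)
  status = pval <= alpha ? "RED ALERT" : "OK"
  puts format("Tested: %d\tFailed: %d\tpval: %.3f\tStatus: %s", t, f, pval, status)
end
```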

data/examples/fisher.rb CHANGED

@@ -1,5 +1,5 @@
 $:.unshift File.join(File.dirname(__FILE__), "..", "lib")
-require 'fishers_exact_test'
+require 'rubystats/fishers_exact_test'
 require 'pp'
 tested1 = 20
@@ -10,7 +10,7 @@ f2 = 10
 t1 = tested1 - f1
 t2 = tested2 - f2
-fet = FishersExactTest.new
+fet = Rubystats::FishersExactTest.new
 fisher = fet.calculate(t1,t2,f1,f2)
@@ -19,7 +19,3 @@ pp fisher
 perc = 100 * (1.0 - fisher[:twotail])
 pp perc
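Fisher's exact test conditions on the table margins, so the top-left cell follows a hypergeometric distribution. The two-tailed p-value sums the probabilities of every table with those margins that is no more likely than the observed one. The sketch below computes that; `fisher_two_tail` is a hypothetical name, and the 1e-7 relative tolerance for ties is an assumption. It is not necessarily how rubystats computes `fisher[:twotail]`. Calling it as `fisher_two_tail(t1, t2, f1, f2)` treats the rows as pass/fail and the columns as the two groups.

```ruby
# ln C(n, k) via log-gamma, so large margins do not overflow.
def log_choose(n, k)
  Math.lgamma(n + 1)[0] - Math.lgamma(k + 1)[0] - Math.lgamma(n - k + 1)[0]
end

# Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].
def fisher_two_tail(a, b, c, d)
  row1 = a + b
  col1 = a + c
  n = a + b + c + d
  # Hypergeometric probability that the top-left cell equals x, margins fixed.
  prob = ->(x) { Math.exp(log_choose(row1, x) + log_choose(n - row1, col1 - x) - log_choose(n, col1)) }
  lo = [0, col1 - (n - row1)].max
  hi = [row1, col1].min
  p_obs = prob.call(a)
  # Sum every table at most as likely as the observed one (small tolerance for ties).
  (lo..hi).map { |x| prob.call(x) }.select { |px| px <= p_obs * (1 + 1e-7) }.sum
end

puts fisher_two_tail(3, 1, 1, 3) # classic 4/4 "tea tasting" table
```

For that 4/4 table the two-tailed p-value is 34/70, about 0.4857.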

data/examples/norm.rb CHANGED

@@ -1,8 +1,14 @@
 $:.unshift File.join(File.dirname(__FILE__), "..", "lib")
-require 'normal_distribution'
+require 'rubystats/normal_distribution'
-norm = NormalDistribution.new(10, 2)
+#normal distribution with mean of 10 and standard deviation of 2
+norm = Rubystats::NormalDistribution.new(10, 2)
 cdf = norm.cdf(11)
 pdf = norm.pdf(11)
-puts cdf
-puts pdf
+puts "CDF(11): #{cdf}"
+puts "PDF(11): #{pdf}"
+puts "Random numbers from normal distribution:"
+10.times do
+  puts norm.rng
+end

data/lib/rubystats.rb ADDED

@@ -0,0 +1,9 @@
+module Rubystats
+  VERSION = '0.2.0'
+end
+require 'rubystats/normal_distribution'
+require 'rubystats/binomial_distribution'
+require 'rubystats/beta_distribution'
+require 'rubystats/fishers_exact_test'
+include Rubystats

data/lib/rubystats/beta_distribution.rb ADDED

@@ -0,0 +1,88 @@
+require 'rubystats/probability_distribution'
+module Rubystats
+  class BetaDistribution < Rubystats::ProbabilityDistribution
+    include Rubystats::SpecialMath
+    attr_reader :p, :q
+    #dgr_p = degrees of freedom p
+    #dgr_q = degrees of freedom q
+    def initialize(dgr_p, dgr_q)
+      if dgr_p <= 0 || dgr_q <= 0
+        return nil
+      end
+      @p = dgr_p.to_f
+      @q = dgr_q.to_f
+    end
+    def mean
+      @p.to_f / (@p.to_f + @q.to_f)
+    end
+    def standard_deviation
+      Math.sqrt(@p * @q / ((@p + @q)**2 * (@p + @q + 1)))
+    end
+    def pdf(x)
+      if x.class == Array
+        pdf_vals = []
+        for i in (0 ... x.size)
+          check_range(x[i])
+          if x[i] == 0.0 || x[i] == 1.0
+            pdf_vals[i] = 0.0
+          else
+            pdf_vals[i] = Math.exp( - log_beta(@p, @q) + (@p - 1.0) * Math.log(x[i]) + (@q - 1.0) * Math.log(1.0 - x[i]))
+          end
+        end
+        return pdf_vals
+      else
+        check_range(x)
+        if  (x == 0.0) || (x == 1.0)
+          return 0.0
+        else
+          return Math.exp( - log_beta(@p, @q) + (@p - 1.0) * Math.log(x) + (@q - 1.0) * Math.log(1.0 - x)
+          )
+        end
+      end
+    end
+    def cdf(x)
+      if x.class == Array
+        cdf_vals = Array.new
+        for i in 0 ... x.size
+          check_range(x[i])
+          cdf_vals[i] = incomplete_beta(x[i], @p, @q)
+        end
+        return cdf_vals
+      else
+        check_range(x)
+        cdf_val = incomplete_beta(x, @p, @q)
+        return cdf_val
+      end
+    end
+    def icdf(prob)
+      if prob.class == Array
+        inv_vals = Array.new
+        for i in 0 ... prob.size
+          check_range(prob[i])
+          if prob[i] == 0.0
+            inv_vals[i] = 0.0
+          elsif prob[i] == 1.0
+            inv_vals[i] = 1.0
+          else
+            inv_vals[i] = find_root(prob[i], 0.5, 0.0, 1.0)
+          end
+        end
+        return inv_vals
+      else
+        check_range(prob)
+        return 0.0 if prob == 0.0
+        return 1.0 if prob == 1.0
+        return find_root(prob, 0.5, 0.0, 1.0)
+      end
+    end
+  end
+end
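BetaDistribution's density is exp(-ln B(p, q) + (p - 1) ln x + (q - 1) ln(1 - x)), with ln B(p, q) = ln Γ(p) + ln Γ(q) - ln Γ(p + q). The sketch below writes that out, plus the closed-form mean and standard deviation used above. Ruby's `Math.lgamma` stands in for the library's `log_beta` from SpecialMath, and the helper names are hypothetical. The density is only evaluated on the open interval (0, 1), because its value at an endpoint depends on p and q; for example, Beta(1, 1) is uniform with density 1 at x = 0.

```ruby
# ln B(p, q) = ln Γ(p) + ln Γ(q) - ln Γ(p + q); Math.lgamma returns [value, sign].
def log_beta(p, q)
  Math.lgamma(p)[0] + Math.lgamma(q)[0] - Math.lgamma(p + q)[0]
end

# Beta(p, q) density on the open interval (0, 1).
def beta_pdf(x, p, q)
  raise ArgumentError, "x must lie strictly between 0 and 1" unless x > 0.0 && x < 1.0
  Math.exp(-log_beta(p, q) + (p - 1.0) * Math.log(x) + (q - 1.0) * Math.log(1.0 - x))
end

def beta_mean(p, q)
  p.to_f / (p + q)
end

def beta_sd(p, q)
  Math.sqrt(p * q / ((p + q)**2 * (p + q + 1.0)))
end

puts beta_pdf(0.5, 2, 2) # Beta(2, 2) density at its mode
puts beta_mean(2, 2)
puts beta_sd(2, 2)
```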

data/lib/rubystats/binomial_distribution.rb ADDED

@@ -0,0 +1,195 @@
+require 'rubystats/probability_distribution'
+# This class provides an object for encapsulating binomial distributions
+# Ported to Ruby from PHPMath class by Bryan Donovan
+# Author:: Mark Hale
+# Author:: Paul Meagher
+# Author:: Bryan Donovan (http://www.bryandonovan.com)
+module Rubystats
+  class BinomialDistribution < Rubystats::ProbabilityDistribution
+    include Rubystats::NumericalConstants
+    include Rubystats::SpecialMath
+    include Rubystats::ExtraMath
+    # Constructs a binomial distribution
+    def initialize (trials, prob)
+      if trials <= 0
+        raise "Error: trials must be greater than 0"
+      end
+      @n = trials
+      if prob < 0.0 || prob > 1.0
+        raise "Error: prob must be between 0 and 1"
+      end
+      @p = prob
+    end
+    #returns the number of trials
+    def get_trials_parameter
+      return @n
+    end
+    #returns the probability
+    def get_probability_parameter
+      return @p
+    end
+    #returns the mean
+    def get_mean
+      return @n * @p
+    end
+    #returns the variance
+    def variance
+      return @n * @p * (1.0 - @p)
+    end
+    # Probability density function of a binomial distribution (equivalent
+    # to R dbinom function).
+    # _x should be an integer
+    # returns the probability that a stochastic variable x has the value _x,
+    # i.e. P(x = _x)
+    def pdf(_x)
+      if _x.class == Array
+        pdf_vals = []
+        for i in (0 ... _x.length)
+          check_range(_x[i], 0.0, @n)
+          pdf_vals[i] = binomial(@n, _x[i]) * @p**_x[i] * (1-@p)**(@n-_x[i])
+        end
+        return pdf_vals
+      else
+        check_range(_x, 0.0, @n)
+        return binomial(@n, _x) * @p**_x * (1-@p)**(@n-_x)
+      end
+    end
+    # Cumulative binomial distribution function (equivalent to R pbinom function).
+    # _x should be integer-valued and can be single integer or array of integers
+    # returns single value or array containing probability that a stochastic
+    # variable x is less than or equal to _x, i.e. P(x <= _x).
+    def cdf(_x)
+      if _x.class == Array
+        cdf_vals = []
+        for i in (0 ..._x.length)
+          cdf_vals[i] = get_cdf(_x[i])
+        end
+        return cdf_vals
+      else
+        return get_cdf(_x)
+      end
+    end
+    # Inverse of the cumulative binomial distribution function
+    # (equivalent to R qbinom function).
+    # returns the value _x for which P(x <= _x) is approximately prob.
+    def get_icdf(prob)
+      if prob.class == Array
+        inv_vals = []
+        for i in (0 ...prob.length)
+          check_range(prob[i])
+          inv_vals[i] = (find_root(prob[i], @n/2, 0.0, @n)).floor
+        end
+        return inv_vals
+      else
+        check_range(prob)
+        return (find_root(prob, @n/2, 0.0, @n)).floor
+      end
+    end
+    # Wrapper for binomial RNG function (equivalent to R rbinom function).
+    # returns random deviate given trials and p
+    def rng(num_vals = 1)
+      if num_vals < 1
+        raise "Error num_vals must be greater than or equal to 1"
+      end
+      if num_vals == 1
+        return get_rng
+      else
+        rand_vals = []
+        for i in (0 ...num_vals)
+          rand_vals[i] = get_rng
+        end
+        return rand_vals
+      end
+    end
+    # Private methods below
+    private
+    # Private shared function for getting cumulant for particular x
+    # param _x should be integer-valued
+    # returns the probability that a stochastic variable x is less than or
+    # equal to _x, i.e. P(x <= _x)
+    def get_cdf(_x)
+      check_range(_x, 0.0, @n)
+      sum = 0.0
+      for i in (0 .. _x)
+        sum = sum + pdf(i)
+      end
+      return sum
+    end
+    # Private binomial RNG function
+    # Original version of this function from Press et al.
+    #
+    # see http://www.library.cornell.edu/nr/bookcpdf/c7-3.pdf
+    #
+    # Changed parts having to do with generating a uniformly distributed
+    # number in the 0 to 1 range.  Also using instance variables, instead
+    # of supplying function with p and n values.  Finally calling port
+    # of JSci's log gamma routine instead of Press et al.
+    #
+    # There are enough non-trivial changes to this function that the
+    # port conforms to the Press et al. copyright.
+    def get_rng
+      nold = -1
+      pold = -1
+      p = (if @p <= 0.5 then @p else 1.0 - @p end)
+      am = @n * p
+      if @n < 25
+        bnl = 0.0
+        for i in (1..@n)
+          if  Kernel.rand < p
+            bnl = bnl.next
+          end
+        end
+      elsif am < 1.0
+        g = Math.exp(-am)
+        t = 1.0
+        for j in (0 ... @n)
+          t = t * Kernel.rand
+          break if t < g
+        end
+        bnl = (if j <= @n then j else @n end)
+      else
+        if @n != nold
+          en = @n
+          oldg = log_gamma(en + 1.0)
+          nold = @n
+        end
+        if p != pold
+          pc = 1.0 - p
+          plog = Math.log(p)
+          pclog = Math.log(pc)
+          pold = p
+        end
+        sq = Math.sqrt(2.0 * am * pc)
+        begin
+          begin
+            angle = Pi * Kernel.rand
+            y = Math.tan(angle)
+            em = sq * y + am
+          end while em < 0.0 || em >= (en + 1.0)
+          em = em.floor
+          t = 1.2 * sq * (1.0 + y * y) *
+          Math.exp(oldg - log_gamma(em + 1.0) -
+          log_gamma(en - em + 1.0) + em * plog + (en - em) * pclog)
+        end while Kernel.rand > t
+        bnl = em
+      end
+      if p != @p
+        bnl = @n - bnl
+      end
+      return bnl
+    end
+  end
+end
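The `@n < 25` branch of `get_rng` draws a binomial deviate by counting successes in independent Bernoulli trials. A minimal sketch of that direct method, applied for every n, is below. It costs O(n) uniform draws per sample, which is why Press et al. switch to other methods for larger n. `binomial_rng`, `binomial_rng_many` and the optional `rng` argument are hypothetical names, and passing a seeded `Random` makes the draws reproducible.

```ruby
# Direct binomial sampler: count successes in n independent Bernoulli(p) trials.
def binomial_rng(n, p, rng = Random.new)
  raise ArgumentError, "n must be a positive Integer" unless n.is_a?(Integer) && n > 0
  raise ArgumentError, "p must be between 0 and 1" unless p.between?(0.0, 1.0)
  # rng.rand is in [0, 1), so each trial succeeds with probability p.
  n.times.count { rng.rand < p }
end

# Several draws at once, analogous to rng(num_vals) above.
def binomial_rng_many(num_vals, n, p, rng = Random.new)
  Array.new(num_vals) { binomial_rng(n, p, rng) }
end

rng = Random.new(2008)
puts binomial_rng_many(10, 20, 0.3, rng).inspect
```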