distribution 0.3.0 → 0.4.0

data.tar.gz.sig CHANGED
Binary file
@@ -1,3 +1,13 @@
1
+ === 0.4.0 / 2011-02-01
2
+
3
+ * Poisson and exponential distributions implemented. Implementation of inverse cdf for poisson is not perfect, yet.
4
+ * +distribution+ executable can create template files for new distributions.
5
+ * MathExtension should work fine with Ruby 1.8. Fixed shadowed variable on MathExtension.naive_factorial
6
+ * Added factorial lookup table for n<20.
7
+ * Added exact cdf for Binomial
8
+ * Binomial coefficient in function of permutations. Deleted incomplete beta until we found a faster way to calculate it
9
+
10
+
1
11
  === 0.3.0 / 2011-01-28
2
12
 
3
13
  * Included support for binomial distribution. p_value is not accurate
@@ -4,9 +4,17 @@ Manifest.txt
4
4
  README.txt
5
5
  Rakefile
6
6
  benchmark/binomial_coefficient.rb
7
+ benchmark/binomial_coefficient/binomial_coefficient.ds
8
+ benchmark/binomial_coefficient/binomial_coefficient.xls
9
+ benchmark/binomial_coefficient/experiment.rb
10
+ benchmark/factorial_hash.rb
7
11
  benchmark/factorial_method.rb
8
12
  benchmark/odd.rb
9
13
  bin/distribution
14
+ data/template/distribution.erb
15
+ data/template/distribution/gsl.erb
16
+ data/template/distribution/ruby.erb
17
+ data/template/spec.erb
10
18
  lib/distribution.rb
11
19
  lib/distribution/binomial.rb
12
20
  lib/distribution/binomial/gsl.rb
@@ -22,6 +30,9 @@ lib/distribution/chisquare/gsl.rb
22
30
  lib/distribution/chisquare/java.rb
23
31
  lib/distribution/chisquare/ruby.rb
24
32
  lib/distribution/chisquare/statistics2.rb
33
+ lib/distribution/exponential.rb
34
+ lib/distribution/exponential/gsl.rb
35
+ lib/distribution/exponential/ruby.rb
25
36
  lib/distribution/f.rb
26
37
  lib/distribution/f/gsl.rb
27
38
  lib/distribution/f/java.rb
@@ -38,6 +49,9 @@ lib/distribution/normal/java.rb
38
49
  lib/distribution/normal/ruby.rb
39
50
  lib/distribution/normal/statistics2.rb
40
51
  lib/distribution/normalmultivariate.rb
52
+ lib/distribution/poisson.rb
53
+ lib/distribution/poisson/gsl.rb
54
+ lib/distribution/poisson/ruby.rb
41
55
  lib/distribution/t.rb
42
56
  lib/distribution/t/gsl.rb
43
57
  lib/distribution/t/java.rb
@@ -47,10 +61,12 @@ spec/binomial_spec.rb
47
61
  spec/bivariatenormal_spec.rb
48
62
  spec/chisquare_spec.rb
49
63
  spec/distribution_spec.rb
64
+ spec/exponential_spec.rb
50
65
  spec/f_spec.rb
51
66
  spec/hypergeometric_spec.rb
52
67
  spec/math_extension_spec.rb
53
68
  spec/normal_spec.rb
69
+ spec/poisson_spec.rb
54
70
  spec/shorthand_spec.rb
55
71
  spec/spec.opts
56
72
  spec/spec_helper.rb
data/README.txt CHANGED
@@ -4,16 +4,17 @@
4
4
 
5
5
  == DESCRIPTION:
6
6
 
7
- Statistical Distributions multi library wrapper.
7
+ Statistical Distributions library. Includes Normal univariate and bivariate, T, F, Chi Square, Binomial, Hypergeometric, Exponential and Poisson.
8
+
8
9
  Uses Ruby by default and C (statistics2/GSL) or Java extensions where available.
9
10
 
10
- Includes code from statistics2
11
+ Includes code from statistics2 on Normal, T, F and Chi Square ruby code [http://blade.nagaokaut.ac.jp/~sinara/ruby/math/statistics2]
11
12
 
12
13
  == FEATURES/PROBLEMS:
13
14
 
14
15
  * Very fast ruby 1.8.7/1.9.+ implementation, with improved method to calculate factorials and others common functions
15
16
  * All methods tested on several ranges. See spec/
16
- * On Jruby, BivariateNormal returns incorrect pdf
17
+ * On Jruby and Rubinius, BivariateNormal returns incorrect pdf
17
18
 
18
19
  == API structure
19
20
 
@@ -31,7 +32,7 @@ On discrete distributions, exact cdf, pdf and p_value are
31
32
 
32
33
  <Distribution shortname>_(ecdf|epdf|ep)
33
34
 
34
- Shortnames are:
35
+ Shortnames for distributions:
35
36
 
36
37
  * Normal: norm
37
38
  * Bivariate Normal: bnor
@@ -40,6 +41,8 @@ Shortnames are:
40
41
  * Chi Square: chisq
41
42
  * Binomial: bino
42
43
  * Hypergeometric: hypg
44
+ * Exponential: expo
45
+ * Poisson: pois
43
46
 
44
47
  For example
45
48
 
@@ -83,6 +86,12 @@ After checking out the source, run:
83
86
  This task will install any missing dependencies, run the tests/specs,
84
87
  and generate the RDoc.
85
88
 
89
+ If you want to provide a new distribution, /lib/distribution run
90
+
91
+ $ distribution --new your_distribution
92
+
93
+ This should create the main distribution file, the directory with ruby and gsl engines and the rspec on /spec directory.
94
+
86
95
  == LICENSE:
87
96
 
88
97
  GPL V2
@@ -1,18 +1,21 @@
1
- $:.unshift(File.dirname(__FILE__)+"/../lib")
1
+ $:.unshift(File.expand_path(File.dirname(__FILE__)+"/../lib"))
2
2
  require 'distribution'
3
3
  require 'bench_press'
4
4
 
5
5
  extend BenchPress
6
6
 
7
+
8
+
9
+ samples=10.times.map {|i| 2**(i+1)}
10
+
7
11
  name 'binomial coefficient: multiplicative, factorial and optimized factorial methods'
8
12
  author 'Claudio Bustos'
9
13
  date '2011-01-27'
10
- summary "Exact calculation of Binomial Coefficient could be obtained using multiplicative, pure factorial or optimized factorial algorithm.
14
+ summary "Exact calculation of Binomial Coefficient could be obtained using multiplicative, pure factorial or optimized factorial algorithm (failing + factorial).
11
15
  Which one is faster?
12
16
 
13
17
  Lower k is the best for all
14
- k=n/2 is the worst case for optimized algorithm
15
- k near n is the worst for multiplicative
18
+ k=n/2 is the worst case.
16
19
 
17
20
  The factorial method uses the fastest Swing Prime Algorithm."
18
21
 
@@ -23,31 +26,29 @@ x=100
23
26
  n=100
24
27
  k=50
25
28
 
26
- samples=10.times.map {|i| 2**(i+1)}
27
-
28
29
 
29
30
 
30
31
  measure "Multiplicative" do
31
32
  samples.each do |n|
32
- [5,n/2,n-1].each do |k|
33
+ [5,n/2].each do |k|
33
34
  k=[k,n-k].min
34
35
  (1..k).inject(1) {|ac, i| (ac*(n-k+i).quo(i))}
35
36
  end
36
37
  end
37
38
  end
38
39
 
39
- measure "Factorial" do
40
+ measure "Pure Factorial" do
40
41
  samples.each do |n|
41
- [5,n/2,n-1].each do |k|
42
+ [5,n/2].each do |k|
42
43
  k=[k,n-k].min
43
44
  Math.factorial(n).quo(Math.factorial(k) * Math.factorial(n - k))
44
45
  end
45
46
  end
46
47
  end
47
48
 
48
- measure "Optimized Factorial" do
49
+ measure "Failing factorial + factorial" do
49
50
  samples.each do |n|
50
- [5,n/2,n-1].each do |k|
51
+ [5,n/2].each do |k|
51
52
  k=[k,n-k].min
52
53
  (((n-k+1)..n).inject(1) {|ac,v| ac * v}).quo(Math.factorial(k))
53
54
  end
@@ -0,0 +1,54 @@
1
+ # This test create a database to adjust the best algorithm
2
+ # to use on correlation matrix
3
+ $:.unshift(File.expand_path(File.dirname(__FILE__)+"/../../lib"))
4
+ require 'distribution'
5
+ require 'statsample'
6
+ require 'benchmark'
7
+
8
+ if !File.exists?("binomial_coefficient.ds") or File.mtime(__FILE__) > File.mtime("binomial_coefficient.ds")
9
+ reps=100 #number of repetitions
10
+ ns={
11
+ 5=> [1,3],
12
+ 10=> [1,3,5],
13
+ 50=> [1,3,5,10,25],
14
+ 100=> [1,3,5,10,25,50],
15
+ 500=> [1,3,5,10,25,50,100,250],
16
+ 1000=> [1,3,5,10,25,50,100,250,500],
17
+ 5000=> [1,3,5,10,25,50,100,250,500,1000,2500],
18
+ 10000=>[1,3,5,10,25,50,100,250,500,1000,2500,5000]
19
+ }
20
+
21
+ rs=Statsample::Dataset.new(%w{n k mixed_factorial multiplicative})
22
+
23
+ ns.each do |n,ks|
24
+ ks.each do |k|
25
+
26
+ time_factorial= Benchmark.realtime do
27
+ reps.times {
28
+ (((n-k+1)..n).inject(1) {|ac,v| ac * v}).quo(Math.factorial(k))
29
+ }
30
+ end
31
+
32
+ time_multiplicative= Benchmark.realtime do
33
+ reps.times {
34
+ (1..k).inject(1) {|ac, i| (ac*(n-k+i).quo(i))}
35
+ }
36
+ end
37
+
38
+ puts "n:#{n}, k:#{k} -> factorial:%0.3f | multiplicative: %0.3f " % [time_factorial, time_multiplicative]
39
+
40
+ rs.add_case({'n'=>n,'k'=>k,'mixed_factorial'=>time_factorial, 'multiplicative'=>time_multiplicative})
41
+ end
42
+ end
43
+
44
+ else
45
+ rs=Statsample.load("binomial_coefficient.ds")
46
+ end
47
+
48
+
49
+ rs.fields.each {|f| rs[f].type=:scale}
50
+
51
+
52
+ rs.update_valid_data
53
+ rs.save("binomial_coefficient.ds")
54
+ Statsample::Excel.write(rs,"binomial_coefficient.xls")
@@ -0,0 +1,24 @@
1
+ $:.unshift(File.expand_path(File.dirname(__FILE__)+"/../lib"))
2
+ require 'bench_press'
3
+ require 'distribution'
4
+ extend BenchPress
5
+
6
+ name 'calculate factorial vs looking on a Hash'
7
+ author 'Claudio Bustos'
8
+ date '2011-01-31'
9
+ summary "
10
+ Is better create a lookup table for factorial or just calculate it?
11
+ Distribution::MathExtension::SwingFactorial has a lookup table
12
+ for factorials n<20
13
+ "
14
+
15
+ reps 1000 #number of repetitions
16
+
17
+ measure "Lookup" do
18
+ Math.factorial(19)
19
+ end
20
+
21
+ measure "calculate" do
22
+ Distribution::MathExtension::SwingFactorial.naive_factorial(19)
23
+ end
24
+
@@ -1,3 +1,51 @@
1
1
  #!/usr/bin/env ruby
2
2
 
3
- abort "you need to write me"
3
+ require 'optparse'
4
+ require 'fileutils'
5
+ require 'erb'
6
+ gem_base=File.expand_path(File.dirname(__FILE__)+"/..")
7
+ require gem_base+"/lib/distribution"
8
+
9
+ new=false
10
+ parameters=""
11
+ OptionParser.new do |opts|
12
+ opts.banner="Usage: distribution [--new] [--params parameters] distribution"
13
+ opts.on("-n", "--new", "Create a new template for distribution") do
14
+ new=true
15
+ end
16
+ opts.on("-PMANDATORY", "--params MANDATORY", String, "Parameters for distribution") do |n_param|
17
+ parameters=", #{n_param}"
18
+ end
19
+
20
+ opts.on("-h", "--help", "Show this message") do
21
+ puts opts
22
+ exit
23
+ end
24
+
25
+ begin
26
+ ARGV << "-h" if ARGV.empty?
27
+ opts.parse!(ARGV)
28
+ rescue OptionParser::ParseError => e
29
+ STDERR.puts e.message, "\n", opts
30
+ exit(-1)
31
+ end
32
+ end
33
+
34
+ ARGV.each do |distribution|
35
+ if new
36
+ basename=distribution.downcase
37
+ raise "You should be inside distribution lib directory" unless File.exists? "../distribution.rb"
38
+ raise "Distribution already created" if File.exists? basename+".rb"
39
+ main=ERB.new(File.read(gem_base+"/data/template/distribution.erb"),)
40
+ ruby=ERB.new(File.read(gem_base+"/data/template/distribution/ruby.erb"))
41
+ gsl=ERB.new(File.read(gem_base+"/data/template/distribution/gsl.erb"))
42
+ spec=ERB.new(File.read(gem_base+"/data/template/spec.erb"))
43
+
44
+ FileUtils.mkdir(basename) unless File.exists? basename
45
+ File.open(basename+".rb","w") {|fp| fp.write(main.result(binding))}
46
+ File.open(basename+"/ruby.rb","w") {|fp| fp.write(ruby.result(binding))}
47
+ File.open(basename+"/gsl.rb","w") {|fp| fp.write(gsl.result(binding))}
48
+ File.open("../../spec/#{basename}_spec.rb","w") {|fp| fp.write(spec.result(binding))}
49
+
50
+ end
51
+ end
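The generator script above renders each ERB template with `result(binding)`, so the local variables `distribution` and `parameters` are visible inside the template. The following is a minimal sketch of that mechanism only. The method name `render_distribution_template`, the inline template text, and the `weibull` example are assumptions for illustration and are not part of the gem.

```ruby
require 'erb'

# Hypothetical helper: renders a cut-down version of a distribution template.
# The locals +distribution+ and +parameters+ are exposed to ERB via +binding+,
# the same way the script passes its own locals to the real templates.
def render_distribution_template(distribution, parameters)
  template = ERB.new(<<~TEMPLATE)
    module Distribution
      module <%= distribution.capitalize %>
        SHORTHAND='<%= distribution.downcase[0,4] %>'
        # :singleton-method: pdf(x <%= parameters %>)
      end
    end
  TEMPLATE
  template.result(binding)
end

# The script builds the parameter string with a leading comma, e.g. ", k, l".
puts render_distribution_template("weibull", ", k, l")
```

Note that only `<%= ... %>` tags emit their value into the output; a plain `<% ... %>` tag evaluates the expression and prints nothing.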
@@ -0,0 +1,23 @@
1
+ require 'distribution/<%= distribution.downcase %>/ruby'
2
+ require 'distribution/<%= distribution.downcase %>/gsl'
3
+ #require 'distribution/<%= distribution.downcase %>/java'
4
+
5
+
6
+ module Distribution
7
+ # TODO: Document this Distribution
8
+ module <%= distribution.capitalize %>
9
+ SHORTHAND='<%= distribution.downcase[0,4] %>'
10
+ extend Distributable
11
+ create_distribution_methods
12
+
13
+ ##
14
+ # :singleton-method: pdf(x <%= parameters %>)
15
+
16
+ ##
17
+ # :singleton-method: cdf(x <%= parameters %>)
18
+
19
+ ##
20
+ # :singleton-method: p_value(pr <%= parameters %>)
21
+
22
+ end
23
+ end
@@ -0,0 +1,14 @@
1
+ module Distribution
2
+ module <%= distribution.capitalize %>
3
+ module GSL_
4
+ class << self
5
+ def pdf(x <% parameters %>)
6
+ end
7
+ def cdf(x <% parameters %>)
8
+ end
9
+ def p_value(pr <% parameters %>)
10
+ end
11
+ end
12
+ end
13
+ end
14
+ end
@@ -0,0 +1,14 @@
1
+ module Distribution
2
+ module <%= distribution.capitalize %>
3
+ module Ruby_
4
+ class << self
5
+ def pdf(x <% parameters %>)
6
+ end
7
+ def cdf(x <% parameters %>)
8
+ end
9
+ def p_value(pr <% parameters %>)
10
+ end
11
+ end
12
+ end
13
+ end
14
+ end
@@ -0,0 +1,54 @@
1
+ require File.expand_path(File.dirname(__FILE__)+"/spec_helper.rb")
2
+
3
+ describe Distribution::<%= distribution.capitalize %> do
4
+
5
+ shared_examples_for "<%= distribution %> engine" do
6
+ it "should return correct pdf" do
7
+ if @engine.respond_to? :pdf
8
+ else
9
+ pending("No #{@engine}.pdf")
10
+ end
11
+ end
12
+
13
+ it "should return correct cdf" do
14
+ if @engine.respond_to? :cdf
15
+ else
16
+ pending("No #{@engine}.cdf")
17
+ end
18
+ end
19
+
20
+
21
+ it "should return correct p_value" do
22
+ if @engine.respond_to? :p_value
23
+ else
24
+ pending("No #{@engine}.cdf")
25
+ end
26
+ end
27
+ end
28
+
29
+
30
+ describe "singleton" do
31
+ before do
32
+ @engine=Distribution::<%= distribution.capitalize %>
33
+ end
34
+ it_should_behave_like "<%= distribution %> engine"
35
+ end
36
+
37
+ describe Distribution::<%= distribution.capitalize %>::Ruby_ do
38
+ before do
39
+ @engine=Distribution::<%= distribution.capitalize %>::Ruby_
40
+ end
41
+ it_should_behave_like "<%= distribution %> engine"
42
+
43
+ end
44
+
45
+ describe Distribution::<%= distribution.capitalize %>::GSL_ do
46
+ before do
47
+ @engine=Distribution::<%= distribution.capitalize %>::GSL_
48
+ end
49
+ it_should_behave_like "<%= distribution %> engine"
50
+
51
+ end
52
+
53
+
54
+ end
@@ -49,7 +49,7 @@ require 'distribution/math_extension'
49
49
  # Distribution::Normal.p_value(0.95)
50
50
  # => 1.64485364660836
51
51
  module Distribution
52
- VERSION="0.3.0"
52
+ VERSION="0.4.0"
53
53
 
54
54
  module Shorthand
55
55
  EQUIVALENCES={:p_value=>:p, :cdf=>:cdf, :pdf=>:pdf, :rng=>:r, :exact_pdf=>:epdf, :exact_cdf=>:ecdf, :exact_p_value=>:ep}
@@ -144,8 +144,10 @@ module Distribution
144
144
  autoload(:F, 'distribution/f')
145
145
  autoload(:BivariateNormal, 'distribution/bivariatenormal')
146
146
  autoload(:Binomial, 'distribution/binomial')
147
-
148
147
  autoload(:Hypergeometric, 'distribution/hypergeometric')
148
+ autoload(:Exponential, 'distribution/exponential')
149
+ autoload(:Poisson, 'distribution/poisson')
150
+
149
151
  end
150
152
 
151
153
 
@@ -9,17 +9,18 @@ module Distribution
9
9
  #(0..x.floor).inject(0) {|ac,i| ac+pdf(i,n,pr)}
10
10
  Math.regularized_beta_function(1-pr,n - k,k+1)
11
11
  end
12
+ def exact_cdf(k,n,pr)
13
+ (0..k).inject(0) {|ac,i| ac+pdf(i,n,pr)}
14
+ end
12
15
  def p_value(prob,n,pr)
13
16
  ac=0
14
17
  (0..n).each do |i|
15
18
  ac+=pdf(i,n,pr)
16
- return i if ac>=prob
19
+ return i if prob<=ac
17
20
  end
18
21
  end
19
22
 
20
23
  alias :exact_pdf :pdf
21
-
22
-
23
24
  end
24
25
  end
25
26
  end
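The `exact_cdf` added in this hunk sums the pdf over 0..k, and `p_value` walks the same cumulative sum until it reaches the requested probability. A standalone sketch of both ideas follows. The helper names are hypothetical, and it uses `Rational` probabilities so the results are exact; it does not call the gem's own `pdf`.

```ruby
# Hypothetical standalone helpers, not the gem's API.

# Exact binomial coefficient via the multiplicative formula.
# Each intermediate product is itself a binomial coefficient,
# so the integer division is always exact.
def binomial_coefficient(n, k)
  k = [k, n - k].min
  (1..k).inject(1) { |acc, i| acc * (n - k + i) / i }
end

# P(X = k) for X ~ Binomial(n, pr).
def binomial_pdf(k, n, pr)
  binomial_coefficient(n, k) * pr**k * (1 - pr)**(n - k)
end

# P(X <= k): sum of the pdf over 0..k.
def binomial_exact_cdf(k, n, pr)
  (0..k).inject(0) { |acc, i| acc + binomial_pdf(i, n, pr) }
end

# Smallest i whose cumulative probability reaches +prob+.
def binomial_quantile(prob, n, pr)
  acc = 0
  (0..n).each do |i|
    acc += binomial_pdf(i, n, pr)
    return i if prob <= acc
  end
  n
end

half = Rational(1, 2)
puts binomial_exact_cdf(2, 4, half)   # (1 + 4 + 6) / 16
puts binomial_quantile(half, 4, half)
```

With floating-point probabilities the accumulated sum carries rounding error, so a quantile computed this way can be off by one near a step of the cdf.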
@@ -3,8 +3,7 @@ module Distribution
3
3
  module Ruby_
4
4
  class << self
5
5
 
6
- include Distribution::MathExtension
7
-
6
+ include Math
8
7
  def pdf(x,n)
9
8
  if n == 1
10
9
  1.0/Math.sqrt(2 * Math::PI * x) * Math::E**(-x/2.0)
@@ -0,0 +1,34 @@
1
+ require 'distribution/exponential/ruby'
2
+ require 'distribution/exponential/gsl'
3
+ #require 'distribution/exponential/java'
4
+
5
+
6
+ module Distribution
7
+ # From Wikipedia:
8
+ # In probability theory and statistics, the exponential distribution
9
+ # (a.k.a. negative exponential distribution) is a family of continuous
10
+ # probability distributions. It describes the time between events in a
11
+ # Poisson process, i.e. a process in which events occur continuously
12
+ # and independently at a constant average rate.
13
+ #
14
+ # Parameter +l+ is the rate parameter, the number of occurrences/unit time.
15
+ module Exponential
16
+ SHORTHAND='expo'
17
+ extend Distributable
18
+ create_distribution_methods
19
+
20
+ ##
21
+ # :singleton-method: pdf(x,l)
22
+ # PDF of exponential distribution, with parameters +x+ and +l+.
23
+ # +l+ is rate parameter
24
+
25
+ ##
26
+ # :singleton-method: cdf(x,l)
27
+ # CDF of exponential distribution, with parameters +x+ and +l+.
28
+ # +l+ is rate parameter
29
+ ##
30
+ # :singleton-method: p_value(pr,l)
31
+ # Inverse CDF of exponential distribution, with parameters +pr+ and +l+.
32
+ # +l+ is rate parameter
33
+ end
34
+ end
@@ -0,0 +1,19 @@
1
+ module Distribution
2
+ module Exponential
3
+ module GSL_
4
+ class << self
5
+ def pdf(x,l)
6
+ return 0 if x<0
7
+ GSL::Ran.exponential_pdf(x,1/l.to_f)
8
+ end
9
+ def cdf(x,l)
10
+ return 0 if x<0
11
+ GSL::Cdf.exponential_P(x,1/l.to_f)
12
+ end
13
+ def p_value(pr,l)
14
+ GSL::Cdf.exponential_Pinv(pr,1/l.to_f)
15
+ end
16
+ end
17
+ end
18
+ end
19
+ end
@@ -0,0 +1,19 @@
1
+ module Distribution
2
+ module Exponential
3
+ module Ruby_
4
+ class << self
5
+ def pdf(x,l)
6
+ return 0 if x<0
7
+ l*Math.exp(-l*x)
8
+ end
9
+ def cdf(x,l)
10
+ return 0 if x<0
11
+ 1-Math.exp(-l*x)
12
+ end
13
+ def p_value(pr,l)
14
+ (-Math.log(1-pr)).quo(l)
15
+ end
16
+ end
17
+ end
18
+ end
19
+ end
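The Ruby engine above uses the closed forms for the exponential distribution with rate `l`: pdf `l*exp(-l*x)`, cdf `1-exp(-l*x)`, and inverse cdf `-ln(1-pr)/l`. The sketch below restates them as plain functions (the names are hypothetical) and checks that the inverse cdf undoes the cdf, which is the same round trip the spec performs.

```ruby
# Hypothetical standalone versions of the three formulas.

def exponential_pdf(x, l)
  return 0.0 if x < 0
  l * Math.exp(-l * x)
end

def exponential_cdf(x, l)
  return 0.0 if x < 0
  1 - Math.exp(-l * x)
end

# Inverse cdf: solves pr = 1 - exp(-l*x) for x.
def exponential_quantile(pr, l)
  -Math.log(1 - pr) / l
end

[0.5, 1.0, 1.5].each do |l|
  1.upto(5) do |x|
    pr = exponential_cdf(x, l)
    puts format("l=%.1f x=%d cdf=%.6f quantile(cdf)=%.6f", l, x, pr, exponential_quantile(pr, l))
  end
end
```

The GSL engine passes `1/l.to_f` because GSL parameterizes the exponential distribution by its mean rather than by its rate.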
@@ -14,8 +14,52 @@ end
14
14
  require 'bigdecimal'
15
15
  require 'bigdecimal/math'
16
16
 
17
- # Useful additions to Math
18
17
  module Distribution
18
+ # Extension for Ruby18
19
+ # Includes gamma and lgamma
20
+ module MathExtension18
21
+ LOG_2PI = Math.log(2 * Math::PI)# log(2PI)
22
+ N = 8
23
+ B0 = 1.0
24
+ B1 = -1.0 / 2.0
25
+ B2 = 1.0 / 6.0
26
+ B4 = -1.0 / 30.0
27
+ B6 = 1.0 / 42.0
28
+ B8 = -1.0 / 30.0
29
+ B10 = 5.0 / 66.0
30
+ B12 = -691.0 / 2730.0
31
+ B14 = 7.0 / 6.0
32
+ B16 = -3617.0 / 510.0
33
+ # From statistics2
34
+ def loggamma(x)
35
+ v = 1.0
36
+ while (x < N)
37
+ v *= x
38
+ x += 1.0
39
+ end
40
+ w = 1.0 / (x * x)
41
+ ret = B16 / (16 * 15)
42
+ ret = ret * w + B14 / (14 * 13)
43
+ ret = ret * w + B12 / (12 * 11)
44
+ ret = ret * w + B10 / (10 * 9)
45
+ ret = ret * w + B8 / ( 8 * 7)
46
+ ret = ret * w + B6 / ( 6 * 5)
47
+ ret = ret * w + B4 / ( 4 * 3)
48
+ ret = ret * w + B2 / ( 2 * 1)
49
+ ret = ret / x + 0.5 * LOG_2PI - Math.log(v) - x + (x - 0.5) * Math.log(x)
50
+ ret
51
+ end
52
+
53
+ # Gamma function.
54
+ # From statistics2
55
+ def gamma(x)
56
+ if (x < 0.0)
57
+ return Math::PI / (Math.sin(Math.PI * x) * Math.exp(loggamma(1 - x))) #/
58
+ end
59
+ Math.exp(loggamma(x))
60
+ end
61
+ end
62
+ # Useful additions to Math
19
63
  module MathExtension
20
64
  # Factorization based on Prime Swing algorithm, by Luschny (the king of factorial numbers analysis :P )
21
65
  # == Reference
@@ -24,6 +68,7 @@ module Distribution
24
68
  class SwingFactorial
25
69
  attr_reader :result
26
70
  SmallOddSwing=[ 1, 1, 1, 3, 3, 15, 5, 35, 35, 315, 63, 693, 231, 3003, 429, 6435, 6435, 109395, 12155, 230945, 46189, 969969, 88179, 2028117, 676039, 16900975, 1300075, 35102025, 5014575,145422675, 9694845, 300540195, 300540195]
71
+ SmallFactorial=[1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800, 39916800, 479001600, 6227020800, 87178291200, 1307674368000, 20922789888000, 355687428096000, 6402373705728000, 121645100408832000, 2432902008176640000]
27
72
  def bitcount(n)
28
73
  bc = n - ((n >> 1) & 0x55555555);
29
74
  bc = (bc & 0x33333333) + ((bc >> 2) & 0x33333333);
@@ -35,7 +80,8 @@ module Distribution
35
80
  end
36
81
  def initialize(n)
37
82
  if (n<20)
38
- naive_factorial(n)
83
+ @result=SmallFactorial[n]
84
+ #naive_factorial(n)
39
85
  else
40
86
  @prime_list=[]
41
87
  exp2 = n - bitcount(n);
@@ -89,7 +135,7 @@ module Distribution
89
135
  @result=(self.class).naive_factorial(n)
90
136
  end
91
137
  def self.naive_factorial(n)
92
- (2..n).inject(1) { |f,n| f * n }
138
+ (2..n).inject(1) { |f,nn| f * nn }
93
139
  end
94
140
  end
95
141
  # Module to calculate approximated factorial
@@ -140,8 +186,8 @@ module Distribution
140
186
  end
141
187
  end
142
188
  # Exact factorial.
143
- # Use naive algorithm (iterative) on n<20
144
- # and Prime Swing algorithm for higher values
189
+ # Use lookup on a Hash table on n<20
190
+ # and Prime Swing algorithm for higher values.
145
191
  def factorial(n)
146
192
  SwingFactorial.new(n).result
147
193
  end
@@ -150,14 +196,6 @@ module Distribution
150
196
  def fast_factorial(n)
151
197
  ApproxFactorial.stieltjes_factorial(n)
152
198
  end
153
-
154
-
155
- # Quick, accurate approximation of factorial for very small n. Special case, generally you want to use stirling instead.
156
- # ==Reference
157
- # * http://mathworld.wolfram.com/StirlingsApproximation.html
158
- def gosper(n)
159
- Math.sqrt( (2*n + 1/3.0) * Math::PI ) * (n/Math::E)**n
160
- end
161
199
 
162
200
  # Beta function.
163
201
  # Source:
@@ -166,14 +204,14 @@ module Distribution
166
204
  (gamma(x)*gamma(y)).quo(gamma(x+y))
167
205
  end
168
206
  # I_x(a,b): Regularized incomplete beta function
169
- #
207
+ # TODO: Find a faster version.
170
208
  # Source:
171
- #
209
+ # * http://dlmf.nist.gov/8.17
172
210
  def regularized_beta_function(x,a,b)
173
211
  return 1 if x==1
174
212
  #incomplete_beta(x,a,b).quo(beta(a,b))
175
- m=a
176
- n=b+a-1
213
+ m=a.to_i
214
+ n=(b+a-1).to_i
177
215
  (m..n).inject(0) {|sum,j|
178
216
  sum+(binomial_coefficient(n,j)* x**j * (1-x)**(n-j))
179
217
  }
@@ -183,65 +221,38 @@ module Distribution
183
221
  # Should be replaced by
184
222
  # http://lib.stat.cmu.edu/apstat/63
185
223
  def incomplete_beta(x,a,b)
186
- raise "Not work"
187
- return beta(a,b) if x==1
188
-
189
- ((x**a * (1-x)**b).quo(a)) * hyper_f(a+b,1,a+1,x)
190
- end
191
- def permutations(x,n)
192
- factorial(x).quo(factorial(x-n))
224
+ raise "Doesn't work"
193
225
  end
194
226
 
227
+ # Rising factorial
195
228
  def rising_factorial(x,n)
196
229
  factorial(x+n-1).quo(factorial(x-1))
197
230
  end
198
231
 
199
-
200
- LOG_2PI = Math.log(2 * Math::PI)# log(2PI)
201
- N = 8
202
- B0 = 1.0
203
- B1 = -1.0 / 2.0
204
- B2 = 1.0 / 6.0
205
- B4 = -1.0 / 30.0
206
- B6 = 1.0 / 42.0
207
- B8 = -1.0 / 30.0
208
- B10 = 5.0 / 66.0
209
- B12 = -691.0 / 2730.0
210
- B14 = 7.0 / 6.0
211
- B16 = -3617.0 / 510.0
212
- # From statistics2
232
+ # Ln of gamma
213
233
  def loggamma(x)
214
- v = 1.0
215
- while (x < N)
216
- v *= x
217
- x += 1.0
218
- end
219
- w = 1.0 / (x * x)
220
- ret = B16 / (16 * 15)
221
- ret = ret * w + B14 / (14 * 13)
222
- ret = ret * w + B12 / (12 * 11)
223
- ret = ret * w + B10 / (10 * 9)
224
- ret = ret * w + B8 / ( 8 * 7)
225
- ret = ret * w + B6 / ( 6 * 5)
226
- ret = ret * w + B4 / ( 4 * 3)
227
- ret = ret * w + B2 / ( 2 * 1)
228
- ret = ret / x + 0.5 * LOG_2PI - Math.log(v) - x + (x - 0.5) * Math.log(x)
229
- ret
234
+ lg=Math.lgamma(x)
235
+ lg[0]*lg[1]
230
236
  end
231
- # Gamma function.
232
- # From statistics2
233
- def gamma(x)
234
- if (x < 0.0)
235
- return Math::PI / (Math.sin(Math.PI * x) * Math.exp(loggamma(1 - x))) #/
236
- end
237
- Math.exp(loggamma(x))
237
+
238
+
239
+ # Sequences without repetition. n^k'
240
+ # Also called 'failing factorial'
241
+ def permutations(n,k)
242
+ return 1 if k==0
243
+ return n if k==1
244
+ return factorial(n) if k==n
245
+ (((n-k+1)..n).inject(1) {|ac,v| ac * v})
246
+ #factorial(x).quo(factorial(x-n))
238
247
  end
248
+
239
249
  # Binomial coeffients, or:
240
250
  # ( n )
241
251
  # ( k )
242
- # Gives the number of different k size subsets of a set size n
252
+ #
253
+ # Gives the number of *different* k size subsets of a set size n
243
254
  #
244
- # Replaces (n,k) for (n, n-k) if k>n-k
255
+ # Uses:
245
256
  #
246
257
  # (n) n^k' (n)..(n-k+1)
247
258
  # ( ) = ---- = ------------
@@ -250,41 +261,63 @@ module Distribution
250
261
  def binomial_coefficient(n,k)
251
262
  return 1 if (k==0 or k==n)
252
263
  k=[k, n-k].min
253
- (((n-k+1)..n).inject(1) {|ac,v| ac * v}).quo(factorial(k))
254
- # Other way to calcule binomial is this:
264
+ permutations(n,k).quo(factorial(k))
265
+ # The factorial way is
266
+ # factorial(n).quo(factorial(k)*(factorial(n-k)))
267
+ # The multiplicative way is
255
268
  # (1..k).inject(1) {|ac, i| (ac*(n-k+i).quo(i))}
256
269
  end
270
+ # Binomial coefficient using multiplicative algorithm
271
+ # On benchmarks, is faster that raising factorial method
272
+ # when k is little. Use only when you're sure of that.
273
+ def binomial_coefficient_multiplicative(n,k)
274
+ return 1 if (k==0 or k==n)
275
+ k=[k, n-k].min
276
+ (1..k).inject(1) {|ac, i| (ac*(n-k+i).quo(i))}
277
+ end
278
+
257
279
  # Approximate binomial coefficient, using gamma function.
258
280
  # The fastest method, until we fall on BigDecimal!
259
281
  def binomial_coefficient_gamma(n,k)
260
282
  return 1 if (k==0 or k==n)
261
- k=[k, n-k].min
262
-
263
- val=gamma(n+1) / (gamma(k+1)*gamma(n-k+1))
283
+ k=[k, n-k].min
284
+ # First, we try direct gamma calculation for max precission
285
+
286
+ val=gamma(n + 1).quo(gamma(k+1)*gamma(n-k+1))
287
+ # Ups. Outside float point range. We try with logs
264
288
  if (val.nan?)
265
- lg=lgamma(n+1) - (lgamma(k+1)+lgamma(n-k+1))
289
+ #puts "nan"
290
+ lg=loggamma( n + 1 ) - (loggamma(k+1)+ loggamma(n-k+1))
266
291
  val=Math.exp(lg)
267
292
  # Crash again! We require BigDecimals
268
293
  if val.infinite?
294
+ #puts "infinite"
269
295
  val=BigMath.exp(BigDecimal(lg.to_s),16)
270
296
  end
271
297
  end
272
-
273
298
  val
274
299
  end
300
+ alias :combinations :binomial_coefficient
275
301
  end
276
302
  end
277
303
 
278
304
  module Math
279
305
  include Distribution::MathExtension
280
- alias :lgamma :loggamma
281
-
282
- module_function :factorial, :beta, :gamma, :gosper, :loggamma, :lgamma, :binomial_coefficient, :binomial_coefficient_gamma, :regularized_beta_function, :incomplete_beta, :permutations, :rising_factorial , :fast_factorial
306
+ module_function :factorial, :beta, :loggamma, :binomial_coefficient, :binomial_coefficient_gamma, :regularized_beta_function, :incomplete_beta, :permutations, :rising_factorial , :fast_factorial, :combinations
283
307
  end
284
308
 
285
309
  # Necessary on Ruby 1.9
286
310
  module CMath # :nodoc:
287
311
  include Distribution::MathExtension
288
- module_function :factorial, :beta, :gosper, :loggamma, :binomial_coefficient, :binomial_coefficient_gamma, :regularized_beta_function, :incomplete_beta, :permutations, :rising_factorial, :fast_factorial
312
+ module_function :factorial, :beta, :loggamma, :binomial_coefficient, :binomial_coefficient_gamma, :regularized_beta_function, :incomplete_beta, :permutations, :rising_factorial, :fast_factorial, :combinations
313
+ end
314
+
315
+ if RUBY_VERSION<"1.9"
316
+ module Math
317
+ remove_method :loggamma
318
+ include Distribution::MathExtension18
319
+ module_function :gamma, :loggamma
320
+ end
289
321
  end
290
322
 
323
+
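This release computes the binomial coefficient as `permutations(n,k)` divided by `factorial(k)` and keeps the multiplicative algorithm as a separate method. A minimal sketch of the two approaches, using hypothetical standalone names and integer arithmetic instead of the gem's `quo` and Prime Swing factorial, shows that they agree:

```ruby
# Hypothetical standalone helpers, not the gem's implementation.

# n * (n-1) * ... * (n-k+1): ordered selections of k items out of n.
def falling_factorial(n, k)
  ((n - k + 1)..n).inject(1) { |acc, v| acc * v }
end

def naive_factorial(n)
  (2..n).inject(1) { |acc, v| acc * v }
end

# Falling factorial divided by k!, using the smaller of k and n-k.
def binomial_by_falling_factorial(n, k)
  k = [k, n - k].min
  falling_factorial(n, k) / naive_factorial(k)
end

# Multiplicative algorithm: one multiply and one exact divide per step.
def binomial_multiplicative(n, k)
  k = [k, n - k].min
  (1..k).inject(1) { |acc, i| acc * (n - k + i) / i }
end

puts binomial_by_falling_factorial(52, 5)
puts binomial_multiplicative(52, 5)
```

Both versions reduce `k` to `min(k, n-k)` first, so the number of loop iterations depends on the smaller of the two. That matches the benchmark's remark that k=n/2 is the worst case.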
@@ -0,0 +1,34 @@
1
+ require 'distribution/poisson/ruby'
2
+ require 'distribution/poisson/gsl'
3
+ #require 'distribution/poisson/java'
4
+
5
+
6
+ module Distribution
7
+ # From Wikipedia
8
+ # In probability theory and statistics, the Poisson distribution is
9
+ # a discrete probability distribution that expresses the probability of
10
+ # a number of events occurring in a fixed period of time if these
11
+ # events occur with a known average rate and independently of the time
12
+ # since the last event.
13
+ module Poisson
14
+ SHORTHAND='pois'
15
+ extend Distributable
16
+ create_distribution_methods
17
+
18
+ ##
19
+ # :singleton-method: pdf(k , l)
20
+ # PDF for Poisson distribution,
21
+ # [+k+] is the number of occurrences of an event
22
+ # [+l+] is a positive real number, equal to the expected number of occurrences that occur during the given interval.
23
+
24
+ ##
25
+ # :singleton-method: cdf(k , l)
26
+ # CDF for Poisson distribution
27
+ # [+k+] is the number of occurrences of an event
28
+ # [+l+] is a positive real number, equal to the expected number of occurrences that occur during the given interval.
29
+
30
+ # TODO: Not implemented yet
31
+ # :singleton-method: p_value(pr , l)
32
+
33
+ end
34
+ end
@@ -0,0 +1,17 @@
1
+ module Distribution
2
+ module Poisson
3
+ module GSL_
4
+ class << self
5
+ def pdf(k,l)
6
+ return 0 if k<0
7
+ GSL::Ran.poisson_pdf(k,l.to_f)
8
+ end
9
+ def cdf(k,l)
10
+ return 0 if k<0
11
+ GSL::Cdf.poisson_P(k, l.to_f)
12
+ end
13
+
14
+ end
15
+ end
16
+ end
17
+ end
@@ -0,0 +1,21 @@
1
+ module Distribution
2
+ module Poisson
3
+ module Ruby_
4
+ class << self
5
+ def pdf(k,l )
6
+ (l**k*Math.exp(-l)).quo(Math.factorial(k))
7
+ end
8
+ def cdf(k,l)
9
+ Math.exp(-l)*(0..k).inject(0) {|ac,i| ac+ (l**i).quo(Math.factorial(i))}
10
+ end
11
+ def p_value(prob,l)
12
+ ac=0
13
+ (0..100).each do |i|
14
+ ac+=pdf(i,l)
15
+ return i if prob<=ac
16
+ end
17
+ end
18
+ end
19
+ end
20
+ end
21
+ end
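The Ruby Poisson engine above evaluates the pdf directly, builds the cdf as a partial sum, and finds the inverse cdf by accumulating the pdf until it reaches the requested probability, scanning k up to 100. The sketch below mirrors that approach with hypothetical standalone names. The `max_k` parameter and the `nil` return when the bound is exhausted are assumptions of this sketch, not behavior of the gem.

```ruby
# Hypothetical standalone helpers, not the gem's API.

def naive_factorial(n)
  (2..n).inject(1) { |acc, v| acc * v }
end

# P(X = k) for X ~ Poisson(l).
def poisson_pdf(k, l)
  return 0.0 if k < 0
  l**k * Math.exp(-l) / naive_factorial(k)
end

# P(X <= k): exp(-l) times the partial sum of l^i / i!.
def poisson_cdf(k, l)
  return 0.0 if k < 0
  Math.exp(-l) * (0..k).inject(0.0) { |acc, i| acc + l**i / naive_factorial(i).to_f }
end

# Smallest k whose cumulative probability reaches +prob+,
# or nil if it is not reached within 0..max_k.
def poisson_quantile(prob, l, max_k = 100)
  acc = 0.0
  (0..max_k).each do |i|
    acc += poisson_pdf(i, l)
    return i if prob <= acc
  end
  nil
end

puts poisson_pdf(2, 1.5)
puts poisson_cdf(2, 1.5)
puts poisson_quantile(0.5, 1.0)
```

Because the cumulative sum and the cdf are computed along different floating-point paths, feeding `poisson_cdf(k, l)` back into the quantile can land on a neighboring k. This is one plausible reason why the changelog calls the inverse cdf imperfect.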
@@ -51,6 +51,14 @@ end
51
51
  pending("No exact_p_value")
52
52
  @engine.should respond_to(:exact_p_value)
53
53
  }
54
+ it "exact_cdf should return same values as cdf for n=50" do
55
+ pr=rand()*0.8+0.1
56
+ n=rand(10)+10
57
+ [1,(n/2).to_i,n-1].each do |k|
58
+
59
+ @engine.exact_cdf(k,n,pr).should be_within(1e-10).of(@engine.cdf(k,n,pr))
60
+ end
61
+ end
54
62
 
55
63
  it "exact_pdf should not return a Float if not float is used as parameter" do
56
64
  @engine.exact_pdf(1,1,1).should_not be_a(Float)
@@ -0,0 +1,80 @@
1
+ require File.expand_path(File.dirname(__FILE__)+"/spec_helper.rb")
2
+
3
+ describe Distribution::Exponential do
4
+
5
+ shared_examples_for "exponential engine" do
6
+ it "should return correct pdf" do
7
+ if @engine.respond_to? :pdf
8
+ [0.5,1,1.5].each {|l|
9
+ 1.upto(5) {|x|
10
+ @engine.pdf(x,l).should be_within(1e-10).of(l*Math.exp(-l*x))
11
+ }
12
+ }
13
+ else
14
+ pending("No #{@engine}.pdf")
15
+ end
16
+ end
17
+
18
+ it "should return correct cdf" do
19
+ if @engine.respond_to? :cdf
20
+ [0.5,1,1.5].each {|l|
21
+ 1.upto(5) {|x|
22
+ @engine.cdf(x,l).should be_within(1e-10).of(1-Math.exp(-l*x))
23
+ }
24
+ }
25
+ else
26
+ pending("No #{@engine}.cdf")
27
+ end
28
+ end
29
+
30
+
31
+ it "should return correct p_value" do
32
+ if @engine.respond_to? :p_value
33
+ [0.5,1,1.5].each {|l|
34
+ 1.upto(5) {|x|
35
+ pr=@engine.cdf(x,l)
36
+ @engine.p_value(pr,l).should be_within(1e-10).of(x)
37
+ }
38
+ }
39
+ else
40
+ pending("No #{@engine}.p_value")
41
+ end
42
+ end
43
+ end
44
+
45
+
46
+ describe "singleton" do
47
+ before do
48
+ @engine=Distribution::Exponential
49
+ end
50
+ it_should_behave_like "exponential engine"
51
+ end
52
+
53
+ describe Distribution::Exponential::Ruby_ do
54
+ before do
55
+ @engine=Distribution::Exponential::Ruby_
56
+ end
57
+ it_should_behave_like "exponential engine"
58
+
59
+ end
60
+
61
+ if Distribution.has_gsl?
62
+ describe Distribution::Exponential::GSL_ do
63
+ before do
64
+ @engine=Distribution::Exponential::GSL_
65
+ end
66
+ it_should_behave_like "exponential engine"
67
+ end
68
+ end
69
+
70
+ # if Distribution.has_java?
71
+ # describe Distribution::Exponential::Java_ do
72
+ # before do
73
+ # @engine=Distribution::Exponential::Java_
74
+ # end
75
+ # it_should_behave_like "exponential engine"
76
+ #
77
+ # end
78
+ # end
79
+
80
+ end
@@ -8,13 +8,7 @@ describe Distribution::MathExtension do
8
8
  end
9
9
  end
10
10
 
11
- it "binomial coefficient(gamma) with n<=48 should be correct " do
12
-
13
- [1,5,10,25,48].each {|n|
14
- k=(n/2).to_i
15
- Math.binomial_coefficient_gamma(n,k).round.should eq(Math.binomial_coefficient(n,k))
16
- }
17
- end
11
+
18
12
  it "rising_factorial should return correct values" do
19
13
 
20
14
  x=rand(10)+1
@@ -25,6 +19,17 @@ describe Distribution::MathExtension do
25
19
  Math.rising_factorial(x,4).should eq x**4+6*x**3+11*x**2+6*x
26
20
 
27
21
  end
22
+
23
+ it "permutations should return correct values" do
24
+ n=rand(50)+50
25
+ 10.times { |k|
26
+ Math.permutations(n,k).should eq(Math.factorial(n) / Math.factorial(n-k))
27
+ }
28
+
29
+
30
+ Math.permutations(n,n).should eq(Math.factorial(n) / Math.factorial(n-n))
31
+ end
32
+
28
33
  it "incomplete beta function should return similar results to R" do
29
34
  pending("Not working yet")
30
35
  Math.incomplete_beta(0.5,5,6).should be_within(1e-6).of(Math.beta(5,6)*0.6230469)
@@ -46,12 +51,22 @@ describe Distribution::MathExtension do
46
51
 
47
52
 
48
53
  end
49
- it "binomial coefficient(gamma) with 48<n<1000 should have 12 correct digits" do
54
+
55
+ it "binomial coefficient(gamma) with n<=48 should be correct " do
56
+
57
+ [1,5,10,25,48].each {|n|
58
+ k=(n/2).to_i
59
+ Math.binomial_coefficient_gamma(n,k).round.should eq(Math.binomial_coefficient(n,k))
60
+ }
61
+ end
62
+
63
+ it "binomial coefficient(gamma) with 48<n<1000 should have 11 correct digits" do
50
64
 
51
- [50,100,1000].each {|n|
52
- k=n/2.to_i
53
- obs=Math.binomial_coefficient_gamma(n,k).to_i.to_s[0,12]
54
- exp=Math.binomial_coefficient(n,k).to_i.to_s[0,12]
65
+ [50,100,200,1000].each {|n|
66
+ k=(n/2).to_i
67
+ obs=Math.binomial_coefficient_gamma(n, k).to_i.to_s[0,11]
68
+ exp=Math.binomial_coefficient(n, k).to_i.to_s[0,11]
69
+
55
70
  obs.should eq(exp)
56
71
  }
57
72
  end
@@ -0,0 +1,72 @@
1
+ require File.expand_path(File.dirname(__FILE__)+"/spec_helper.rb")
2
+ include ExampleWithGSL
3
+ describe Distribution::Poisson do
4
+
5
+ shared_examples_for "poisson engine" do
6
+ it "should return correct pdf" do
7
+ if @engine.respond_to? :pdf
8
+ [0.5,1,1.5].each {|l|
9
+ 1.upto(5) {|k|
10
+ @engine.pdf(k,l).should be_within(1e-10).of((l**k*Math.exp(-l)).quo(Math.factorial(k)))
11
+ }
12
+ }
13
+ else
14
+ pending("No #{@engine}.pdf")
15
+ end
16
+ end
17
+
18
+ it_only_with_gsl "should return correct cdf" do
19
+ if @engine.respond_to? :cdf
20
+ [0.5,1,1.5,4,10].each {|l|
21
+ 1.upto(5) {|k|
22
+ @engine.cdf(k,l).should be_within(1e-10).of(GSL::Cdf.poisson_P(k,l))
23
+ }
24
+ }
25
+
26
+ else
27
+ pending("No #{@engine}.cdf")
28
+ end
29
+ end
30
+
31
+
32
+ it "should return correct p_value" do
33
+ pending("No exact p_value")
34
+ if @engine.respond_to? :p_value
35
+ [0.1,1,5,10].each {|l|
36
+ 1.upto(20) {|k|
37
+ pr=@engine.cdf(k,l)
38
+ @engine.p_value(pr,l).should eq(k)
39
+ }
40
+ }
41
+ else
42
+ pending("No #{@engine}.p_value")
43
+ end
44
+ end
45
+ end
46
+
47
+
48
+ describe "singleton" do
49
+ before do
50
+ @engine=Distribution::Poisson
51
+ end
52
+ it_should_behave_like "poisson engine"
53
+ end
54
+
55
+ describe Distribution::Poisson::Ruby_ do
56
+ before do
57
+ @engine=Distribution::Poisson::Ruby_
58
+ end
59
+ it_should_behave_like "poisson engine"
60
+
61
+ end
62
+
63
+ describe Distribution::Poisson::GSL_ do
64
+ before do
65
+ @engine=Distribution::Poisson::GSL_
66
+ end
67
+ it_should_behave_like "poisson engine"
68
+
69
+ end
70
+
71
+
72
+ end
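The specs added in this release check `Math.permutations(n,k)` against `n!/(n-k)!` and the Poisson pdf against `l**k * exp(-l) / k!`. As a rough illustration of those two identities, here is a minimal standalone sketch in plain Ruby. It does not load the gem, and the helper names (`factorial`, `permutations`, `poisson_pmf`) are hypothetical stand-ins rather than the gem's own implementation.

```ruby
# Standalone sketch; these helpers are hypothetical and are not the gem's code.

# n! as an exact Integer (Ruby integers grow as needed).
def factorial(n)
  (1..n).reduce(1, :*)
end

# Number of ordered k-selections from n items: n * (n-1) * ... * (n-k+1).
# An empty range (k == 0) yields the initial value 1.
def permutations(n, k)
  ((n - k + 1)..n).reduce(1, :*)
end

# Poisson probability mass at k for rate l: l^k * e^(-l) / k!
def poisson_pmf(k, l)
  (l**k * Math.exp(-l)) / factorial(k)
end
```

Computing permutations as a running product avoids building two large factorials only to divide them, which is presumably the motivation for the changelog entry that expresses the binomial coefficient in terms of permutations.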
metadata CHANGED
@@ -4,9 +4,9 @@ version: !ruby/object:Gem::Version
4
4
  prerelease: false
5
5
  segments:
6
6
  - 0
7
- - 3
7
+ - 4
8
8
  - 0
9
- version: 0.3.0
9
+ version: 0.4.0
10
10
  platform: ruby
11
11
  authors:
12
12
  - Claudio Bustos
@@ -35,7 +35,7 @@ cert_chain:
35
35
  rpP0jjs0
36
36
  -----END CERTIFICATE-----
37
37
 
38
- date: 2011-01-28 00:00:00 -03:00
38
+ date: 2011-02-01 00:00:00 -03:00
39
39
  default_executable:
40
40
  dependencies:
41
41
  - !ruby/object:Gem::Dependency
@@ -96,10 +96,11 @@ dependencies:
96
96
  type: :development
97
97
  version_requirements: *id004
98
98
  description: |-
99
- Statistical Distributions multi library wrapper.
99
+ Statistical Distributions library. Includes Normal univariate and bivariate, T, F, Chi Square, Binomial, Hypergeometric, Exponential and Poisson.
100
+
100
101
  Uses Ruby by default and C (statistics2/GSL) or Java extensions where available.
101
102
 
102
- Includes code from statistics2
103
+ Includes code from statistics2 on Normal, T, F and Chi Square ruby code [http://blade.nagaokaut.ac.jp/~sinara/ruby/math/statistics2]
103
104
  email:
104
105
  - clbustos_at_gmail.com
105
106
  executables:
@@ -117,9 +118,17 @@ files:
117
118
  - README.txt
118
119
  - Rakefile
119
120
  - benchmark/binomial_coefficient.rb
121
+ - benchmark/binomial_coefficient/binomial_coefficient.ds
122
+ - benchmark/binomial_coefficient/binomial_coefficient.xls
123
+ - benchmark/binomial_coefficient/experiment.rb
124
+ - benchmark/factorial_hash.rb
120
125
  - benchmark/factorial_method.rb
121
126
  - benchmark/odd.rb
122
127
  - bin/distribution
128
+ - data/template/distribution.erb
129
+ - data/template/distribution/gsl.erb
130
+ - data/template/distribution/ruby.erb
131
+ - data/template/spec.erb
123
132
  - lib/distribution.rb
124
133
  - lib/distribution/binomial.rb
125
134
  - lib/distribution/binomial/gsl.rb
@@ -135,6 +144,9 @@ files:
135
144
  - lib/distribution/chisquare/java.rb
136
145
  - lib/distribution/chisquare/ruby.rb
137
146
  - lib/distribution/chisquare/statistics2.rb
147
+ - lib/distribution/exponential.rb
148
+ - lib/distribution/exponential/gsl.rb
149
+ - lib/distribution/exponential/ruby.rb
138
150
  - lib/distribution/f.rb
139
151
  - lib/distribution/f/gsl.rb
140
152
  - lib/distribution/f/java.rb
@@ -151,6 +163,9 @@ files:
151
163
  - lib/distribution/normal/ruby.rb
152
164
  - lib/distribution/normal/statistics2.rb
153
165
  - lib/distribution/normalmultivariate.rb
166
+ - lib/distribution/poisson.rb
167
+ - lib/distribution/poisson/gsl.rb
168
+ - lib/distribution/poisson/ruby.rb
154
169
  - lib/distribution/t.rb
155
170
  - lib/distribution/t/gsl.rb
156
171
  - lib/distribution/t/java.rb
@@ -160,10 +175,12 @@ files:
160
175
  - spec/bivariatenormal_spec.rb
161
176
  - spec/chisquare_spec.rb
162
177
  - spec/distribution_spec.rb
178
+ - spec/exponential_spec.rb
163
179
  - spec/f_spec.rb
164
180
  - spec/hypergeometric_spec.rb
165
181
  - spec/math_extension_spec.rb
166
182
  - spec/normal_spec.rb
183
+ - spec/poisson_spec.rb
167
184
  - spec/shorthand_spec.rb
168
185
  - spec/spec.opts
169
186
  - spec/spec_helper.rb
@@ -200,6 +217,6 @@ rubyforge_project: distribution
200
217
  rubygems_version: 1.3.7
201
218
  signing_key:
202
219
  specification_version: 3
203
- summary: Statistical Distributions multi library wrapper
220
+ summary: Statistical Distributions library
204
221
  test_files: []
205
222
 
metadata.gz.sig CHANGED
Binary file