distribution 0.3.0 → 0.4.0

data.tar.gz.sig CHANGED
Binary file
@@ -1,3 +1,13 @@
1
+ === 0.4.0 / 2011-02-01
2
+
3
+ * Poisson and exponential distributions implemented. The inverse cdf for Poisson is not yet exact.
4
+ * +distribution+ executable can create template files for new distributions.
5
+ * MathExtension should work fine with Ruby 1.8. Fixed shadowed variable on MathExtension.naive_factorial
6
+ * Added factorial lookup table for n<20.
7
+ * Added exact cdf for Binomial
8
+ * Binomial coefficient is now computed from permutations. Removed incomplete beta until we find a faster way to calculate it.
9
+
10
+
1
11
  === 0.3.0 / 2011-01-28
2
12
 
3
13
  * Included support for binomial distribution. p_value is not accurate
@@ -4,9 +4,17 @@ Manifest.txt
4
4
  README.txt
5
5
  Rakefile
6
6
  benchmark/binomial_coefficient.rb
7
+ benchmark/binomial_coefficient/binomial_coefficient.ds
8
+ benchmark/binomial_coefficient/binomial_coefficient.xls
9
+ benchmark/binomial_coefficient/experiment.rb
10
+ benchmark/factorial_hash.rb
7
11
  benchmark/factorial_method.rb
8
12
  benchmark/odd.rb
9
13
  bin/distribution
14
+ data/template/distribution.erb
15
+ data/template/distribution/gsl.erb
16
+ data/template/distribution/ruby.erb
17
+ data/template/spec.erb
10
18
  lib/distribution.rb
11
19
  lib/distribution/binomial.rb
12
20
  lib/distribution/binomial/gsl.rb
@@ -22,6 +30,9 @@ lib/distribution/chisquare/gsl.rb
22
30
  lib/distribution/chisquare/java.rb
23
31
  lib/distribution/chisquare/ruby.rb
24
32
  lib/distribution/chisquare/statistics2.rb
33
+ lib/distribution/exponential.rb
34
+ lib/distribution/exponential/gsl.rb
35
+ lib/distribution/exponential/ruby.rb
25
36
  lib/distribution/f.rb
26
37
  lib/distribution/f/gsl.rb
27
38
  lib/distribution/f/java.rb
@@ -38,6 +49,9 @@ lib/distribution/normal/java.rb
38
49
  lib/distribution/normal/ruby.rb
39
50
  lib/distribution/normal/statistics2.rb
40
51
  lib/distribution/normalmultivariate.rb
52
+ lib/distribution/poisson.rb
53
+ lib/distribution/poisson/gsl.rb
54
+ lib/distribution/poisson/ruby.rb
41
55
  lib/distribution/t.rb
42
56
  lib/distribution/t/gsl.rb
43
57
  lib/distribution/t/java.rb
@@ -47,10 +61,12 @@ spec/binomial_spec.rb
47
61
  spec/bivariatenormal_spec.rb
48
62
  spec/chisquare_spec.rb
49
63
  spec/distribution_spec.rb
64
+ spec/exponential_spec.rb
50
65
  spec/f_spec.rb
51
66
  spec/hypergeometric_spec.rb
52
67
  spec/math_extension_spec.rb
53
68
  spec/normal_spec.rb
69
+ spec/poisson_spec.rb
54
70
  spec/shorthand_spec.rb
55
71
  spec/spec.opts
56
72
  spec/spec_helper.rb
data/README.txt CHANGED
@@ -4,16 +4,17 @@
4
4
 
5
5
  == DESCRIPTION:
6
6
 
7
- Statistical Distributions multi library wrapper.
7
+ Statistical Distributions library. Includes Normal univariate and bivariate, T, F, Chi Square, Binomial, Hypergeometric, Exponential and Poisson.
8
+
8
9
  Uses Ruby by default and C (statistics2/GSL) or Java extensions where available.
9
10
 
10
- Includes code from statistics2
11
+ Includes code from statistics2 on Normal, T, F and Chi Square ruby code [http://blade.nagaokaut.ac.jp/~sinara/ruby/math/statistics2]
11
12
 
12
13
  == FEATURES/PROBLEMS:
13
14
 
14
15
  * Very fast ruby 1.8.7/1.9.+ implementation, with improved method to calculate factorials and others common functions
15
16
  * All methods tested on several ranges. See spec/
16
- * On Jruby, BivariateNormal returns incorrect pdf
17
+ * On Jruby and Rubinius, BivariateNormal returns incorrect pdf
17
18
 
18
19
  == API structure
19
20
 
@@ -31,7 +32,7 @@ On discrete distributions, exact cdf, pdf and p_value are
31
32
 
32
33
  <Distribution shortname>_(ecdf|epdf|ep)
33
34
 
34
- Shortnames are:
35
+ Shortnames for distributions:
35
36
 
36
37
  * Normal: norm
37
38
  * Bivariate Normal: bnor
@@ -40,6 +41,8 @@ Shortnames are:
40
41
  * Chi Square: chisq
41
42
  * Binomial: bino
42
43
  * Hypergeometric: hypg
44
+ * Exponential: expo
45
+ * Poisson: pois
43
46
 
44
47
  For example
45
48
 
@@ -83,6 +86,12 @@ After checking out the source, run:
83
86
  This task will install any missing dependencies, run the tests/specs,
84
87
  and generate the RDoc.
85
88
 
89
+ If you want to provide a new distribution, run the following from inside the lib/distribution directory:
90
+
91
+ $ distribution --new your_distribution
92
+
93
+ This should create the main distribution file, a directory with the Ruby and GSL engines, and the RSpec file in the spec/ directory.
94
+
86
95
  == LICENSE:
87
96
 
88
97
  GPL V2
@@ -1,18 +1,21 @@
1
- $:.unshift(File.dirname(__FILE__)+"/../lib")
1
+ $:.unshift(File.expand_path(File.dirname(__FILE__)+"/../lib"))
2
2
  require 'distribution'
3
3
  require 'bench_press'
4
4
 
5
5
  extend BenchPress
6
6
 
7
+
8
+
9
+ samples=10.times.map {|i| 2**(i+1)}
10
+
7
11
  name 'binomial coefficient: multiplicative, factorial and optimized factorial methods'
8
12
  author 'Claudio Bustos'
9
13
  date '2011-01-27'
10
- summary "Exact calculation of Binomial Coefficient could be obtained using multiplicative, pure factorial or optimized factorial algorithm.
14
+ summary "Exact calculation of the binomial coefficient can be obtained using the multiplicative, pure factorial or optimized factorial algorithm (falling factorial + factorial).
11
15
  Which one is faster?
12
16
 
13
17
  Lower k is the best for all
14
- k=n/2 is the worst case for optimized algorithm
15
- k near n is the worst for multiplicative
18
+ k=n/2 is the worst case.
16
19
 
17
20
  The factorial method uses the fastest Swing Prime Algorithm."
18
21
 
@@ -23,31 +26,29 @@ x=100
23
26
  n=100
24
27
  k=50
25
28
 
26
- samples=10.times.map {|i| 2**(i+1)}
27
-
28
29
 
29
30
 
30
31
  measure "Multiplicative" do
31
32
  samples.each do |n|
32
- [5,n/2,n-1].each do |k|
33
+ [5,n/2].each do |k|
33
34
  k=[k,n-k].min
34
35
  (1..k).inject(1) {|ac, i| (ac*(n-k+i).quo(i))}
35
36
  end
36
37
  end
37
38
  end
38
39
 
39
- measure "Factorial" do
40
+ measure "Pure Factorial" do
40
41
  samples.each do |n|
41
- [5,n/2,n-1].each do |k|
42
+ [5,n/2].each do |k|
42
43
  k=[k,n-k].min
43
44
  Math.factorial(n).quo(Math.factorial(k) * Math.factorial(n - k))
44
45
  end
45
46
  end
46
47
  end
47
48
 
48
- measure "Optimized Factorial" do
49
+ measure "Falling factorial + factorial" do
49
50
  samples.each do |n|
50
- [5,n/2,n-1].each do |k|
51
+ [5,n/2].each do |k|
51
52
  k=[k,n-k].min
52
53
  (((n-k+1)..n).inject(1) {|ac,v| ac * v}).quo(Math.factorial(k))
53
54
  end
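The three strategies timed by this benchmark can be sketched side by side. This is only an illustration: the helper names below are hypothetical and are not the gem's API, and the plain iterative `factorial` stands in for the Prime Swing implementation the gem uses.

```ruby
# Hypothetical helpers for illustration; not the gem's own methods.

# Plain iterative factorial (the gem uses Prime Swing for large n).
def factorial(n)
  (2..n).inject(1) { |acc, i| acc * i }
end

# Pure factorial: n! / (k! * (n-k)!)
def binomial_pure_factorial(n, k)
  factorial(n) / (factorial(k) * factorial(n - k))
end

# Falling factorial n*(n-1)*...*(n-k+1) divided by k!
def binomial_falling_factorial(n, k)
  k = [k, n - k].min
  ((n - k + 1)..n).inject(1) { |acc, v| acc * v } / factorial(k)
end

# Multiplicative: product of (n-k+i)/i for i in 1..k, kept exact with Rational
def binomial_multiplicative(n, k)
  k = [k, n - k].min
  (1..k).inject(Rational(1)) { |acc, i| acc * Rational(n - k + i, i) }.to_i
end
```

All three return the same exact integer; they differ only in how much intermediate work they do, which is what the benchmark measures.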
@@ -0,0 +1,54 @@
1
+ # This test creates a database to find the best algorithm
2
+ # to use on correlation matrix
3
+ $:.unshift(File.expand_path(File.dirname(__FILE__)+"/../../lib"))
4
+ require 'distribution'
5
+ require 'statsample'
6
+ require 'benchmark'
7
+
8
+ if !File.exists?("binomial_coefficient.ds") or File.mtime(__FILE__) > File.mtime("binomial_coefficient.ds")
9
+ reps=100 #number of repetitions
10
+ ns={
11
+ 5=> [1,3],
12
+ 10=> [1,3,5],
13
+ 50=> [1,3,5,10,25],
14
+ 100=> [1,3,5,10,25,50],
15
+ 500=> [1,3,5,10,25,50,100,250],
16
+ 1000=> [1,3,5,10,25,50,100,250,500],
17
+ 5000=> [1,3,5,10,25,50,100,250,500,1000,2500],
18
+ 10000=>[1,3,5,10,25,50,100,250,500,1000,2500,5000]
19
+ }
20
+
21
+ rs=Statsample::Dataset.new(%w{n k mixed_factorial multiplicative})
22
+
23
+ ns.each do |n,ks|
24
+ ks.each do |k|
25
+
26
+ time_factorial= Benchmark.realtime do
27
+ reps.times {
28
+ (((n-k+1)..n).inject(1) {|ac,v| ac * v}).quo(Math.factorial(k))
29
+ }
30
+ end
31
+
32
+ time_multiplicative= Benchmark.realtime do
33
+ reps.times {
34
+ (1..k).inject(1) {|ac, i| (ac*(n-k+i).quo(i))}
35
+ }
36
+ end
37
+
38
+ puts "n:#{n}, k:#{k} -> factorial:%0.3f | multiplicative: %0.3f " % [time_factorial, time_multiplicative]
39
+
40
+ rs.add_case({'n'=>n,'k'=>k,'mixed_factorial'=>time_factorial, 'multiplicative'=>time_multiplicative})
41
+ end
42
+ end
43
+
44
+ else
45
+ rs=Statsample.load("binomial_coefficient.ds")
46
+ end
47
+
48
+
49
+ rs.fields.each {|f| rs[f].type=:scale}
50
+
51
+
52
+ rs.update_valid_data
53
+ rs.save("binomial_coefficient.ds")
54
+ Statsample::Excel.write(rs,"binomial_coefficient.xls")
@@ -0,0 +1,24 @@
1
+ $:.unshift(File.expand_path(File.dirname(__FILE__)+"/../lib"))
2
+ require 'bench_press'
3
+ require 'distribution'
4
+ extend BenchPress
5
+
6
+ name 'calculate factorial vs looking on a Hash'
7
+ author 'Claudio Bustos'
8
+ date '2011-01-31'
9
+ summary "
10
+ Is it better to create a lookup table for factorials or just calculate them?
11
+ Distribution::MathExtension::SwingFactorial has a lookup table
12
+ for factorials n<20
13
+ "
14
+
15
+ reps 1000 #number of repetitions
16
+
17
+ measure "Lookup" do
18
+ Math.factorial(19)
19
+ end
20
+
21
+ measure "calculate" do
22
+ Distribution::MathExtension::SwingFactorial.naive_factorial(19)
23
+ end
24
+
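The idea behind this benchmark, looking up small factorials instead of recomputing them, can be sketched as follows. The constant and method names are hypothetical, and the fallback for larger n is a plain loop rather than the Prime Swing algorithm the gem uses.

```ruby
# Hypothetical sketch: precompute 0!..19! once, so SMALL_FACTORIAL[n] == n!
SMALL_FACTORIAL = (1..19).inject([1]) { |table, i| table << table.last * i }.freeze

# Iterative factorial used when n is outside the table.
def naive_factorial(n)
  (2..n).inject(1) { |acc, i| acc * i }
end

# Table lookup for n < 20, computation otherwise.
def factorial(n)
  n < SMALL_FACTORIAL.size ? SMALL_FACTORIAL[n] : naive_factorial(n)
end
```

The lookup is a single array index, while the loop performs n-1 multiplications; whether that difference matters in practice is what the benchmark is meant to show.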
@@ -1,3 +1,51 @@
1
1
  #!/usr/bin/env ruby
2
2
 
3
- abort "you need to write me"
3
+ require 'optparse'
4
+ require 'fileutils'
5
+ require 'erb'
6
+ gem_base=File.expand_path(File.dirname(__FILE__)+"/..")
7
+ require gem_base+"/lib/distribution"
8
+
9
+ new=false
10
+ parameters=""
11
+ OptionParser.new do |opts|
12
+ opts.banner="Usage: distribution [--new] [--params parameters] distribution"
13
+ opts.on("-n", "--new", "Create a new template for distribution") do
14
+ new=true
15
+ end
16
+ opts.on("-PMANDATORY", "--params MANDATORY", String, "Parameters for distribution") do |n_param|
17
+ parameters=", #{n_param}"
18
+ end
19
+
20
+ opts.on("-h", "--help", "Show this message") do
21
+ puts opts
22
+ exit
23
+ end
24
+
25
+ begin
26
+ ARGV << "-h" if ARGV.empty?
27
+ opts.parse!(ARGV)
28
+ rescue OptionParser::ParseError => e
29
+ STDERR.puts e.message, "\n", opts
30
+ exit(-1)
31
+ end
32
+ end
33
+
34
+ ARGV.each do |distribution|
35
+ if new
36
+ basename=distribution.downcase
37
+ raise "You should be inside distribution lib directory" unless File.exists? "../distribution.rb"
38
+ raise "Distribution already created" if File.exists? basename+".rb"
39
+ main=ERB.new(File.read(gem_base+"/data/template/distribution.erb"))
40
+ ruby=ERB.new(File.read(gem_base+"/data/template/distribution/ruby.erb"))
41
+ gsl=ERB.new(File.read(gem_base+"/data/template/distribution/gsl.erb"))
42
+ spec=ERB.new(File.read(gem_base+"/data/template/spec.erb"))
43
+
44
+ FileUtils.mkdir(basename) unless File.exists? basename
45
+ File.open(basename+".rb","w") {|fp| fp.write(main.result(binding))}
46
+ File.open(basename+"/ruby.rb","w") {|fp| fp.write(ruby.result(binding))}
47
+ File.open(basename+"/gsl.rb","w") {|fp| fp.write(gsl.result(binding))}
48
+ File.open("../../spec/#{basename}_spec.rb","w") {|fp| fp.write(spec.result(binding))}
49
+
50
+ end
51
+ end
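The script above fills its templates by calling `result(binding)`, so local variables such as `distribution` and `parameters` are visible inside the ERB source. A minimal sketch of that mechanism, using made-up values and inline template strings instead of the gem's template files:

```ruby
require 'erb'

# Hypothetical values; the real script takes them from the command line.
distribution = "weibull"
parameters   = ", k, l"

# <%= ... %> evaluates the expression and inserts its value into the output.
printing = ERB.new("def pdf(x<%= parameters %>)").result(binding)

# <% ... %> evaluates the expression but inserts nothing.
silent = ERB.new("def pdf(x<% parameters %>)").result(binding)

module_line = ERB.new("module <%= distribution.capitalize %>").result(binding)
```

Here `printing` contains the parameter list, `silent` does not, and `module_line` shows a method call evaluated inside the tag.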
@@ -0,0 +1,23 @@
1
+ require 'distribution/<%= distribution.downcase %>/ruby'
2
+ require 'distribution/<%= distribution.downcase %>/gsl'
3
+ #require 'distribution/<%= distribution.downcase %>/java'
4
+
5
+
6
+ module Distribution
7
+ # TODO: Document this Distribution
8
+ module <%= distribution.capitalize %>
9
+ SHORTHAND='<%= distribution.downcase[0,4] %>'
10
+ extend Distributable
11
+ create_distribution_methods
12
+
13
+ ##
14
+ # :singleton-method: pdf(x <%= parameters %>)
15
+
16
+ ##
17
+ # :singleton-method: cdf(x <%= parameters %>)
18
+
19
+ ##
20
+ # :singleton-method: p_value(pr <%= parameters %>)
21
+
22
+ end
23
+ end
@@ -0,0 +1,14 @@
1
+ module Distribution
2
+ module <%= distribution.capitalize %>
3
+ module GSL_
4
+ class << self
5
+ def pdf(x <%= parameters %>)
6
+ end
7
+ def cdf(x <%= parameters %>)
8
+ end
9
+ def p_value(pr <%= parameters %>)
10
+ end
11
+ end
12
+ end
13
+ end
14
+ end
@@ -0,0 +1,14 @@
1
+ module Distribution
2
+ module <%= distribution.capitalize %>
3
+ module Ruby_
4
+ class << self
5
+ def pdf(x <%= parameters %>)
6
+ end
7
+ def cdf(x <%= parameters %>)
8
+ end
9
+ def p_value(pr <%= parameters %>)
10
+ end
11
+ end
12
+ end
13
+ end
14
+ end
@@ -0,0 +1,54 @@
1
+ require File.expand_path(File.dirname(__FILE__)+"/spec_helper.rb")
2
+
3
+ describe Distribution::<%= distribution.capitalize %> do
4
+
5
+ shared_examples_for "<%= distribution %> engine" do
6
+ it "should return correct pdf" do
7
+ if @engine.respond_to? :pdf
8
+ else
9
+ pending("No #{@engine}.pdf")
10
+ end
11
+ end
12
+
13
+ it "should return correct cdf" do
14
+ if @engine.respond_to? :cdf
15
+ else
16
+ pending("No #{@engine}.cdf")
17
+ end
18
+ end
19
+
20
+
21
+ it "should return correct p_value" do
22
+ if @engine.respond_to? :p_value
23
+ else
24
+ pending("No #{@engine}.p_value")
25
+ end
26
+ end
27
+ end
28
+
29
+
30
+ describe "singleton" do
31
+ before do
32
+ @engine=Distribution::<%= distribution.capitalize %>
33
+ end
34
+ it_should_behave_like "<%= distribution %> engine"
35
+ end
36
+
37
+ describe Distribution::<%= distribution.capitalize %>::Ruby_ do
38
+ before do
39
+ @engine=Distribution::<%= distribution.capitalize %>::Ruby_
40
+ end
41
+ it_should_behave_like "<%= distribution %> engine"
42
+
43
+ end
44
+
45
+ describe Distribution::<%= distribution.capitalize %>::GSL_ do
46
+ before do
47
+ @engine=Distribution::<%= distribution.capitalize %>::GSL_
48
+ end
49
+ it_should_behave_like "<%= distribution %> engine"
50
+
51
+ end
52
+
53
+
54
+ end
@@ -49,7 +49,7 @@ require 'distribution/math_extension'
49
49
  # Distribution::Normal.p_value(0.95)
50
50
  # => 1.64485364660836
51
51
  module Distribution
52
- VERSION="0.3.0"
52
+ VERSION="0.4.0"
53
53
 
54
54
  module Shorthand
55
55
  EQUIVALENCES={:p_value=>:p, :cdf=>:cdf, :pdf=>:pdf, :rng=>:r, :exact_pdf=>:epdf, :exact_cdf=>:ecdf, :exact_p_value=>:ep}
@@ -144,8 +144,10 @@ module Distribution
144
144
  autoload(:F, 'distribution/f')
145
145
  autoload(:BivariateNormal, 'distribution/bivariatenormal')
146
146
  autoload(:Binomial, 'distribution/binomial')
147
-
148
147
  autoload(:Hypergeometric, 'distribution/hypergeometric')
148
+ autoload(:Exponential, 'distribution/exponential')
149
+ autoload(:Poisson, 'distribution/poisson')
150
+
149
151
  end
150
152
 
151
153
 
@@ -9,17 +9,18 @@ module Distribution
9
9
  #(0..x.floor).inject(0) {|ac,i| ac+pdf(i,n,pr)}
10
10
  Math.regularized_beta_function(1-pr,n - k,k+1)
11
11
  end
12
+ def exact_cdf(k,n,pr)
13
+ (0..k).inject(0) {|ac,i| ac+pdf(i,n,pr)}
14
+ end
12
15
  def p_value(prob,n,pr)
13
16
  ac=0
14
17
  (0..n).each do |i|
15
18
  ac+=pdf(i,n,pr)
16
- return i if ac>=prob
19
+ return i if prob<=ac
17
20
  end
18
21
  end
19
22
 
20
23
  alias :exact_pdf :pdf
21
-
22
-
23
24
  end
24
25
  end
25
26
  end
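The exact cdf and the inverse cdf in this hunk both work by accumulating pdf values. A self-contained sketch of that approach, with hypothetical method names and a simple falling-factorial binomial coefficient instead of the gem's `Math` extensions:

```ruby
# Hypothetical helpers for illustration; not the gem's own methods.
def binomial_coefficient(n, k)
  k = [k, n - k].min
  ((n - k + 1)..n).inject(1) { |acc, v| acc * v } / (2..k).inject(1) { |acc, v| acc * v }
end

# P(X = k) for n trials with success probability pr.
def binomial_pdf(k, n, pr)
  binomial_coefficient(n, k) * pr**k * (1 - pr)**(n - k)
end

# P(X <= k), summed term by term; exact when pr is a Rational.
def binomial_exact_cdf(k, n, pr)
  (0..k).inject(0) { |acc, i| acc + binomial_pdf(i, n, pr) }
end

# Smallest k whose cumulative probability reaches prob.
def binomial_quantile(prob, n, pr)
  acc = 0
  (0..n).each do |i|
    acc += binomial_pdf(i, n, pr)
    return i if prob <= acc
  end
  n # reached only if rounding keeps the total below prob
end
```

Passing `Rational(1, 2)` for `pr` keeps every intermediate value exact, for example `binomial_exact_cdf(2, 4, Rational(1, 2))` is `11/16`.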
@@ -3,8 +3,7 @@ module Distribution
3
3
  module Ruby_
4
4
  class << self
5
5
 
6
- include Distribution::MathExtension
7
-
6
+ include Math
8
7
  def pdf(x,n)
9
8
  if n == 1
10
9
  1.0/Math.sqrt(2 * Math::PI * x) * Math::E**(-x/2.0)
@@ -0,0 +1,34 @@
1
+ require 'distribution/exponential/ruby'
2
+ require 'distribution/exponential/gsl'
3
+ #require 'distribution/exponential/java'
4
+
5
+
6
+ module Distribution
7
+ # From Wikipedia:
8
+ # In probability theory and statistics, the exponential distribution
9
+ # (a.k.a. negative exponential distribution) is a family of continuous
10
+ # probability distributions. It describes the time between events in a
11
+ # Poisson process, i.e. a process in which events occur continuously
12
+ # and independently at a constant average rate.
13
+ #
14
+ # Parameter +l+ is the rate parameter, the number of occurrences/unit time.
15
+ module Exponential
16
+ SHORTHAND='expo'
17
+ extend Distributable
18
+ create_distribution_methods
19
+
20
+ ##
21
+ # :singleton-method: pdf(x,l)
22
+ # PDF of exponential distribution, with parameters +x+ and +l+.
23
+ # +l+ is rate parameter
24
+
25
+ ##
26
+ # :singleton-method: cdf(x,l)
27
+ # CDF of exponential distribution, with parameters +x+ and +l+.
28
+ # +l+ is rate parameter
29
+ ##
30
+ # :singleton-method: p_value(pr,l)
31
+ # Inverse CDF of exponential distribution, with parameters +pr+ and +l+.
32
+ # +l+ is rate parameter
33
+ end
34
+ end
@@ -0,0 +1,19 @@
1
+ module Distribution
2
+ module Exponential
3
+ module GSL_
4
+ class << self
5
+ def pdf(x,l)
6
+ return 0 if x<0
7
+ GSL::Ran.exponential_pdf(x,1/l.to_f)
8
+ end
9
+ def cdf(x,l)
10
+ return 0 if x<0
11
+ GSL::Cdf.exponential_P(x,1/l.to_f)
12
+ end
13
+ def p_value(pr,l)
14
+ GSL::Cdf.exponential_Pinv(pr,1/l.to_f)
15
+ end
16
+ end
17
+ end
18
+ end
19
+ end
@@ -0,0 +1,19 @@
1
+ module Distribution
2
+ module Exponential
3
+ module Ruby_
4
+ class << self
5
+ def pdf(x,l)
6
+ return 0 if x<0
7
+ l*Math.exp(-l*x)
8
+ end
9
+ def cdf(x,l)
10
+ return 0 if x<0
11
+ 1-Math.exp(-l*x)
12
+ end
13
+ def p_value(pr,l)
14
+ (-Math.log(1-pr)).quo(l)
15
+ end
16
+ end
17
+ end
18
+ end
19
+ end
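The three formulas in this engine are the closed forms of the exponential distribution with rate `l`: pdf `l*exp(-l*x)`, cdf `1-exp(-l*x)`, and inverse cdf `-ln(1-pr)/l`. A standalone sketch with hypothetical method names shows that the inverse undoes the cdf:

```ruby
# Hypothetical standalone versions of the formulas; not the gem's API.
def exponential_pdf(x, l)
  return 0 if x < 0
  l * Math.exp(-l * x)
end

def exponential_cdf(x, l)
  return 0 if x < 0
  1 - Math.exp(-l * x)
end

# Inverse cdf: solve pr = 1 - exp(-l*x) for x.
def exponential_quantile(pr, l)
  -Math.log(1 - pr) / l
end

x = 2.0
l = 1.5
round_trip = exponential_quantile(exponential_cdf(x, l), l) # close to 2.0
```

Note that the GSL engine passes `1/l.to_f` because GSL parameterizes the distribution by its mean rather than by its rate.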
@@ -14,8 +14,52 @@ end
14
14
  require 'bigdecimal'
15
15
  require 'bigdecimal/math'
16
16
 
17
- # Useful additions to Math
18
17
  module Distribution
18
+ # Extension for Ruby18
19
+ # Includes gamma and lgamma
20
+ module MathExtension18
21
+ LOG_2PI = Math.log(2 * Math::PI)# log(2PI)
22
+ N = 8
23
+ B0 = 1.0
24
+ B1 = -1.0 / 2.0
25
+ B2 = 1.0 / 6.0
26
+ B4 = -1.0 / 30.0
27
+ B6 = 1.0 / 42.0
28
+ B8 = -1.0 / 30.0
29
+ B10 = 5.0 / 66.0
30
+ B12 = -691.0 / 2730.0
31
+ B14 = 7.0 / 6.0
32
+ B16 = -3617.0 / 510.0
33
+ # From statistics2
34
+ def loggamma(x)
35
+ v = 1.0
36
+ while (x < N)
37
+ v *= x
38
+ x += 1.0
39
+ end
40
+ w = 1.0 / (x * x)
41
+ ret = B16 / (16 * 15)
42
+ ret = ret * w + B14 / (14 * 13)
43
+ ret = ret * w + B12 / (12 * 11)
44
+ ret = ret * w + B10 / (10 * 9)
45
+ ret = ret * w + B8 / ( 8 * 7)
46
+ ret = ret * w + B6 / ( 6 * 5)
47
+ ret = ret * w + B4 / ( 4 * 3)
48
+ ret = ret * w + B2 / ( 2 * 1)
49
+ ret = ret / x + 0.5 * LOG_2PI - Math.log(v) - x + (x - 0.5) * Math.log(x)
50
+ ret
51
+ end
52
+
53
+ # Gamma function.
54
+ # From statistics2
55
+ def gamma(x)
56
+ if (x < 0.0)
57
+ return Math::PI / (Math.sin(Math::PI * x) * Math.exp(loggamma(1 - x))) #/
58
+ end
59
+ Math.exp(loggamma(x))
60
+ end
61
+ end
62
+ # Useful additions to Math
19
63
  module MathExtension
20
64
  # Factorization based on Prime Swing algorithm, by Luschny (the king of factorial numbers analysis :P )
21
65
  # == Reference
@@ -24,6 +68,7 @@ module Distribution
24
68
  class SwingFactorial
25
69
  attr_reader :result
26
70
  SmallOddSwing=[ 1, 1, 1, 3, 3, 15, 5, 35, 35, 315, 63, 693, 231, 3003, 429, 6435, 6435, 109395, 12155, 230945, 46189, 969969, 88179, 2028117, 676039, 16900975, 1300075, 35102025, 5014575,145422675, 9694845, 300540195, 300540195]
71
+ SmallFactorial=[1, 1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800, 39916800, 479001600, 6227020800, 87178291200, 1307674368000, 20922789888000, 355687428096000, 6402373705728000, 121645100408832000, 2432902008176640000]
27
72
  def bitcount(n)
28
73
  bc = n - ((n >> 1) & 0x55555555);
29
74
  bc = (bc & 0x33333333) + ((bc >> 2) & 0x33333333);
@@ -35,7 +80,8 @@ module Distribution
35
80
  end
36
81
  def initialize(n)
37
82
  if (n<20)
38
- naive_factorial(n)
83
+ @result=SmallFactorial[n]
84
+ #naive_factorial(n)
39
85
  else
40
86
  @prime_list=[]
41
87
  exp2 = n - bitcount(n);
@@ -89,7 +135,7 @@ module Distribution
89
135
  @result=(self.class).naive_factorial(n)
90
136
  end
91
137
  def self.naive_factorial(n)
92
- (2..n).inject(1) { |f,n| f * n }
138
+ (2..n).inject(1) { |f,nn| f * nn }
93
139
  end
94
140
  end
95
141
  # Module to calculate approximated factorial
@@ -140,8 +186,8 @@ module Distribution
140
186
  end
141
187
  end
142
188
  # Exact factorial.
143
- # Use naive algorithm (iterative) on n<20
144
- # and Prime Swing algorithm for higher values
189
+ # Use a lookup table for n<20
190
+ # and Prime Swing algorithm for higher values.
145
191
  def factorial(n)
146
192
  SwingFactorial.new(n).result
147
193
  end
@@ -150,14 +196,6 @@ module Distribution
150
196
  def fast_factorial(n)
151
197
  ApproxFactorial.stieltjes_factorial(n)
152
198
  end
153
-
154
-
155
- # Quick, accurate approximation of factorial for very small n. Special case, generally you want to use stirling instead.
156
- # ==Reference
157
- # * http://mathworld.wolfram.com/StirlingsApproximation.html
158
- def gosper(n)
159
- Math.sqrt( (2*n + 1/3.0) * Math::PI ) * (n/Math::E)**n
160
- end
161
199
 
162
200
  # Beta function.
163
201
  # Source:
@@ -166,14 +204,14 @@ module Distribution
166
204
  (gamma(x)*gamma(y)).quo(gamma(x+y))
167
205
  end
168
206
  # I_x(a,b): Regularized incomplete beta function
169
- #
207
+ # TODO: Find a faster version.
170
208
  # Source:
171
- #
209
+ # * http://dlmf.nist.gov/8.17
172
210
  def regularized_beta_function(x,a,b)
173
211
  return 1 if x==1
174
212
  #incomplete_beta(x,a,b).quo(beta(a,b))
175
- m=a
176
- n=b+a-1
213
+ m=a.to_i
214
+ n=(b+a-1).to_i
177
215
  (m..n).inject(0) {|sum,j|
178
216
  sum+(binomial_coefficient(n,j)* x**j * (1-x)**(n-j))
179
217
  }
@@ -183,65 +221,38 @@ module Distribution
183
221
  # Should be replaced by
184
222
  # http://lib.stat.cmu.edu/apstat/63
185
223
  def incomplete_beta(x,a,b)
186
- raise "Not work"
187
- return beta(a,b) if x==1
188
-
189
- ((x**a * (1-x)**b).quo(a)) * hyper_f(a+b,1,a+1,x)
190
- end
191
- def permutations(x,n)
192
- factorial(x).quo(factorial(x-n))
224
+ raise "Doesn't work"
193
225
  end
194
226
 
227
+ # Rising factorial
195
228
  def rising_factorial(x,n)
196
229
  factorial(x+n-1).quo(factorial(x-1))
197
230
  end
198
231
 
199
-
200
- LOG_2PI = Math.log(2 * Math::PI)# log(2PI)
201
- N = 8
202
- B0 = 1.0
203
- B1 = -1.0 / 2.0
204
- B2 = 1.0 / 6.0
205
- B4 = -1.0 / 30.0
206
- B6 = 1.0 / 42.0
207
- B8 = -1.0 / 30.0
208
- B10 = 5.0 / 66.0
209
- B12 = -691.0 / 2730.0
210
- B14 = 7.0 / 6.0
211
- B16 = -3617.0 / 510.0
212
- # From statistics2
232
+ # Ln of gamma
213
233
  def loggamma(x)
214
- v = 1.0
215
- while (x < N)
216
- v *= x
217
- x += 1.0
218
- end
219
- w = 1.0 / (x * x)
220
- ret = B16 / (16 * 15)
221
- ret = ret * w + B14 / (14 * 13)
222
- ret = ret * w + B12 / (12 * 11)
223
- ret = ret * w + B10 / (10 * 9)
224
- ret = ret * w + B8 / ( 8 * 7)
225
- ret = ret * w + B6 / ( 6 * 5)
226
- ret = ret * w + B4 / ( 4 * 3)
227
- ret = ret * w + B2 / ( 2 * 1)
228
- ret = ret / x + 0.5 * LOG_2PI - Math.log(v) - x + (x - 0.5) * Math.log(x)
229
- ret
234
+ lg=Math.lgamma(x)
235
+ lg[0]*lg[1]
230
236
  end
231
- # Gamma function.
232
- # From statistics2
233
- def gamma(x)
234
- if (x < 0.0)
235
- return Math::PI / (Math.sin(Math.PI * x) * Math.exp(loggamma(1 - x))) #/
236
- end
237
- Math.exp(loggamma(x))
237
+
238
+
239
+ # Sequences without repetition. n^k'
240
+ # Also called 'failing factorial'
241
+ def permutations(n,k)
242
+ return 1 if k==0
243
+ return n if k==1
244
+ return factorial(n) if k==n
245
+ (((n-k+1)..n).inject(1) {|ac,v| ac * v})
246
+ #factorial(x).quo(factorial(x-n))
238
247
  end
248
+
239
249
  # Binomial coeffients, or:
240
250
  # ( n )
241
251
  # ( k )
242
- # Gives the number of different k size subsets of a set size n
252
+ #
253
+ # Gives the number of *different* k size subsets of a set size n
243
254
  #
244
- # Replaces (n,k) for (n, n-k) if k>n-k
255
+ # Uses:
245
256
  #
246
257
  # (n) n^k' (n)..(n-k+1)
247
258
  # ( ) = ---- = ------------
@@ -250,41 +261,63 @@ module Distribution
250
261
  def binomial_coefficient(n,k)
251
262
  return 1 if (k==0 or k==n)
252
263
  k=[k, n-k].min
253
- (((n-k+1)..n).inject(1) {|ac,v| ac * v}).quo(factorial(k))
254
- # Other way to calcule binomial is this:
264
+ permutations(n,k).quo(factorial(k))
265
+ # The factorial way is
266
+ # factorial(n).quo(factorial(k)*(factorial(n-k)))
267
+ # The multiplicative way is
255
268
  # (1..k).inject(1) {|ac, i| (ac*(n-k+i).quo(i))}
256
269
  end
270
+ # Binomial coefficient using multiplicative algorithm
271
+ # In benchmarks, it is faster than the falling factorial method
272
+ # when k is small. Use it only when you are sure that is the case.
273
+ def binomial_coefficient_multiplicative(n,k)
274
+ return 1 if (k==0 or k==n)
275
+ k=[k, n-k].min
276
+ (1..k).inject(1) {|ac, i| (ac*(n-k+i).quo(i))}
277
+ end
278
+
257
279
  # Approximate binomial coefficient, using gamma function.
258
280
  # The fastest method, until we fall on BigDecimal!
259
281
  def binomial_coefficient_gamma(n,k)
260
282
  return 1 if (k==0 or k==n)
261
- k=[k, n-k].min
262
-
263
- val=gamma(n+1) / (gamma(k+1)*gamma(n-k+1))
283
+ k=[k, n-k].min
284
+ # First, we try direct gamma calculation for maximum precision
285
+
286
+ val=gamma(n + 1).quo(gamma(k+1)*gamma(n-k+1))
287
+ # Oops. Outside floating-point range. We try with logs
264
288
  if (val.nan?)
265
- lg=lgamma(n+1) - (lgamma(k+1)+lgamma(n-k+1))
289
+ #puts "nan"
290
+ lg=loggamma( n + 1 ) - (loggamma(k+1)+ loggamma(n-k+1))
266
291
  val=Math.exp(lg)
267
292
  # Crash again! We require BigDecimals
268
293
  if val.infinite?
294
+ #puts "infinite"
269
295
  val=BigMath.exp(BigDecimal(lg.to_s),16)
270
296
  end
271
297
  end
272
-
273
298
  val
274
299
  end
300
+ alias :combinations :binomial_coefficient
275
301
  end
276
302
  end
277
303
 
278
304
  module Math
279
305
  include Distribution::MathExtension
280
- alias :lgamma :loggamma
281
-
282
- module_function :factorial, :beta, :gamma, :gosper, :loggamma, :lgamma, :binomial_coefficient, :binomial_coefficient_gamma, :regularized_beta_function, :incomplete_beta, :permutations, :rising_factorial , :fast_factorial
306
+ module_function :factorial, :beta, :loggamma, :binomial_coefficient, :binomial_coefficient_gamma, :regularized_beta_function, :incomplete_beta, :permutations, :rising_factorial , :fast_factorial, :combinations
283
307
  end
284
308
 
285
309
  # Necessary on Ruby 1.9
286
310
  module CMath # :nodoc:
287
311
  include Distribution::MathExtension
288
- module_function :factorial, :beta, :gosper, :loggamma, :binomial_coefficient, :binomial_coefficient_gamma, :regularized_beta_function, :incomplete_beta, :permutations, :rising_factorial, :fast_factorial
312
+ module_function :factorial, :beta, :loggamma, :binomial_coefficient, :binomial_coefficient_gamma, :regularized_beta_function, :incomplete_beta, :permutations, :rising_factorial, :fast_factorial, :combinations
313
+ end
314
+
315
+ if RUBY_VERSION<"1.9"
316
+ module Math
317
+ remove_method :loggamma
318
+ include Distribution::MathExtension18
319
+ module_function :gamma, :loggamma
320
+ end
289
321
  end
290
322
 
323
+
@@ -0,0 +1,34 @@
1
+ require 'distribution/poisson/ruby'
2
+ require 'distribution/poisson/gsl'
3
+ #require 'distribution/poisson/java'
4
+
5
+
6
+ module Distribution
7
+ # From Wikipedia
8
+ # In probability theory and statistics, the Poisson distribution is
9
+ # a discrete probability distribution that expresses the probability of
10
+ # a number of events occurring in a fixed period of time if these
11
+ # events occur with a known average rate and independently of the time
12
+ # since the last event.
13
+ module Poisson
14
+ SHORTHAND='pois'
15
+ extend Distributable
16
+ create_distribution_methods
17
+
18
+ ##
19
+ # :singleton-method: pdf(k , l)
20
+ # PDF for Poisson distribution,
21
+ # [+k+] is the number of occurrences of an event
22
+ # [+l+] is a positive real number, equal to the expected number of occurrences that occur during the given interval.
23
+
24
+ ##
25
+ # :singleton-method: cdf(k , l)
26
+ # CDF for Poisson distribution
27
+ # [+k+] is the number of occurrences of an event
28
+ # [+l+] is a positive real number, equal to the expected number of occurrences that occur during the given interval.
29
+
30
+ # TODO: Not implemented yet
31
+ # :singleton-method: p_value(pr , l)
32
+
33
+ end
34
+ end
@@ -0,0 +1,17 @@
1
+ module Distribution
2
+ module Poisson
3
+ module GSL_
4
+ class << self
5
+ def pdf(k,l)
6
+ return 0 if k<0
7
+ GSL::Ran.poisson_pdf(k,l.to_f)
8
+ end
9
+ def cdf(k,l)
10
+ return 0 if k<0
11
+ GSL::Cdf.poisson_P(k, l.to_f)
12
+ end
13
+
14
+ end
15
+ end
16
+ end
17
+ end
@@ -0,0 +1,21 @@
1
+ module Distribution
2
+ module Poisson
3
+ module Ruby_
4
+ class << self
5
+ def pdf(k,l )
6
+ (l**k*Math.exp(-l)).quo(Math.factorial(k))
7
+ end
8
+ def cdf(k,l)
9
+ Math.exp(-l)*(0..k).inject(0) {|ac,i| ac+ (l**i).quo(Math.factorial(i))}
10
+ end
11
+ def p_value(prob,l)
12
+ ac=0
13
+ (0..100).each do |i|
14
+ ac+=pdf(i,l)
15
+ return i if prob<=ac
16
+ end
17
+ end
18
+ end
19
+ end
20
+ end
21
+ end
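The `p_value` above searches only k = 0..100 and accumulates floating-point pdf values, which is one reason the changelog calls the Poisson inverse cdf imperfect. As a hedged sketch (hypothetical method names, assuming a moderate `l` so that `Math.exp(-l)` does not underflow), the same search can run without a fixed upper bound by building each term from the previous one:

```ruby
# Hypothetical standalone versions; not the gem's API.
def poisson_pdf(k, l)
  return 0 if k < 0
  (l**k * Math.exp(-l)) / (1..k).inject(1) { |acc, i| acc * i }
end

def poisson_cdf(k, l)
  return 0 if k < 0
  (0..k).inject(0) { |acc, i| acc + poisson_pdf(i, l) }
end

# Smallest k with P(X <= k) >= prob, for 0 <= prob < 1.
def poisson_quantile(prob, l)
  raise ArgumentError, "prob must be in [0, 1)" unless prob >= 0 && prob < 1
  term = Math.exp(-l) # P(X = 0)
  acc = term
  k = 0
  until prob <= acc
    k += 1
    term *= l.to_f / k # P(X = k) = P(X = k-1) * l / k
    acc += term
    break if term == 0.0 # mass exhausted by rounding; stop instead of looping forever
  end
  k
end
```

Because the cumulative sum is still floating point, a probability that sits exactly on a cdf step can land one step off, so this sketch does not make the inverse exact either.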
@@ -51,6 +51,14 @@ end
51
51
  pending("No exact_p_value")
52
52
  @engine.should respond_to(:exact_p_value)
53
53
  }
54
+ it "exact_cdf should return same values as cdf for n between 10 and 19" do
55
+ pr=rand()*0.8+0.1
56
+ n=rand(10)+10
57
+ [1,(n/2).to_i,n-1].each do |k|
58
+
59
+ @engine.exact_cdf(k,n,pr).should be_within(1e-10).of(@engine.cdf(k,n,pr))
60
+ end
61
+ end
54
62
 
55
63
  it "exact_pdf should not return a Float if not float is used as parameter" do
56
64
  @engine.exact_pdf(1,1,1).should_not be_a(Float)
@@ -0,0 +1,80 @@
1
+ require File.expand_path(File.dirname(__FILE__)+"/spec_helper.rb")
2
+
3
+ describe Distribution::Exponential do
4
+
5
+ shared_examples_for "exponential engine" do
6
+ it "should return correct pdf" do
7
+ if @engine.respond_to? :pdf
8
+ [0.5,1,1.5].each {|l|
9
+ 1.upto(5) {|x|
10
+ @engine.pdf(x,l).should be_within(1e-10).of(l*Math.exp(-l*x))
11
+ }
12
+ }
13
+ else
14
+ pending("No #{@engine}.pdf")
15
+ end
16
+ end
17
+
18
+ it "should return correct cdf" do
19
+ if @engine.respond_to? :cdf
20
+ [0.5,1,1.5].each {|l|
21
+ 1.upto(5) {|x|
22
+ @engine.cdf(x,l).should be_within(1e-10).of(1-Math.exp(-l*x))
23
+ }
24
+ }
25
+ else
26
+ pending("No #{@engine}.cdf")
27
+ end
28
+ end
29
+
30
+
31
+ it "should return correct p_value" do
32
+ if @engine.respond_to? :p_value
33
+ [0.5,1,1.5].each {|l|
34
+ 1.upto(5) {|x|
35
+ pr=@engine.cdf(x,l)
36
+ @engine.p_value(pr,l).should be_within(1e-10).of(x)
37
+ }
38
+ }
39
+ else
40
+ pending("No #{@engine}.p_value")
41
+ end
42
+ end
43
+ end
44
+
45
+
46
+ describe "singleton" do
47
+ before do
48
+ @engine=Distribution::Exponential
49
+ end
50
+ it_should_behave_like "exponential engine"
51
+ end
52
+
53
+ describe Distribution::Exponential::Ruby_ do
54
+ before do
55
+ @engine=Distribution::Exponential::Ruby_
56
+ end
57
+ it_should_behave_like "exponential engine"
58
+
59
+ end
60
+
61
+ if Distribution.has_gsl?
62
+ describe Distribution::Exponential::GSL_ do
63
+ before do
64
+ @engine=Distribution::Exponential::GSL_
65
+ end
66
+ it_should_behave_like "exponential engine"
67
+ end
68
+ end
69
+
70
+ # if Distribution.has_java?
71
+ # describe Distribution::Exponential::Java_ do
72
+ # before do
73
+ # @engine=Distribution::Exponential::Java_
74
+ # end
75
+ # it_should_behave_like "exponential engine"
76
+ #
77
+ # end
78
+ # end
79
+
80
+ end
@@ -8,13 +8,7 @@ describe Distribution::MathExtension do
8
8
  end
9
9
  end
10
10
 
11
- it "binomial coefficient(gamma) with n<=48 should be correct " do
12
-
13
- [1,5,10,25,48].each {|n|
14
- k=(n/2).to_i
15
- Math.binomial_coefficient_gamma(n,k).round.should eq(Math.binomial_coefficient(n,k))
16
- }
17
- end
11
+
18
12
  it "rising_factorial should return correct values" do
19
13
 
20
14
  x=rand(10)+1
@@ -25,6 +19,17 @@ describe Distribution::MathExtension do
25
19
  Math.rising_factorial(x,4).should eq x**4+6*x**3+11*x**2+6*x
26
20
 
27
21
  end
22
+
23
+ it "permutations should return correct values" do
24
+ n=rand(50)+50
25
+ 10.times { |k|
26
+ Math.permutations(n,k).should eq(Math.factorial(n) / Math.factorial(n-k))
27
+ }
28
+
29
+
30
+ Math.permutations(n,n).should eq(Math.factorial(n) / Math.factorial(n-n))
31
+ end
32
+
28
33
  it "incomplete beta function should return similar results to R" do
29
34
  pending("Not working yet")
30
35
  Math.incomplete_beta(0.5,5,6).should be_within(1e-6).of(Math.beta(5,6)*0.6230469)
@@ -46,12 +51,22 @@ describe Distribution::MathExtension do
  
  
    end
-   it "binomial coefficient(gamma) with 48<n<1000 should have 12 correct digits" do
+
+   it "binomial coefficient(gamma) with n<=48 should be correct " do
+
+     [1,5,10,25,48].each {|n|
+       k=(n/2).to_i
+       Math.binomial_coefficient_gamma(n,k).round.should eq(Math.binomial_coefficient(n,k))
+     }
+   end
+
+   it "binomial coefficient(gamma) with 48<n<1000 should have 11 correct digits" do
  
-     [50,100,1000].each {|n|
-       k=n/2.to_i
-       obs=Math.binomial_coefficient_gamma(n,k).to_i.to_s[0,12]
-       exp=Math.binomial_coefficient(n,k).to_i.to_s[0,12]
+     [50,100,200,1000].each {|n|
+       k=(n/2).to_i
+       obs=Math.binomial_coefficient_gamma(n, k).to_i.to_s[0,11]
+       exp=Math.binomial_coefficient(n, k).to_i.to_s[0,11]
+
        obs.should eq(exp)
      }
    end
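Note on the permutations spec above: it checks the identity P(n,k) = n! / (n-k)!, and the 0.4.0 changelog states that the binomial coefficient is now expressed through permutations. The following is only a minimal sketch of that relationship, not the gem's code. The method names factorial_sketch, permutations_sketch and binomial_coefficient_sketch are hypothetical and chosen for illustration.

```ruby
# Illustrative sketch only; these helper names are hypothetical and
# are not part of the distribution gem.

# n! as an exact integer product.
def factorial_sketch(n)
  (1..n).inject(1) { |acc, i| acc * i }
end

# Ordered selections of k items out of n: n * (n-1) * ... * (n-k+1).
# The product runs over k factors, so no full factorial is needed.
def permutations_sketch(n, k)
  ((n - k + 1)..n).inject(1) { |acc, i| acc * i }
end

# Binomial coefficient written through permutations: C(n,k) = P(n,k) / k!
def binomial_coefficient_sketch(n, k)
  permutations_sketch(n, k) / factorial_sketch(k)
end
```

Computing the partial product directly avoids building two large factorials and dividing them, which is presumably why a permutation-based form is attractive for large n.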
@@ -0,0 +1,72 @@
+ require File.expand_path(File.dirname(__FILE__)+"/spec_helper.rb")
+ include ExampleWithGSL
+ describe Distribution::Poisson do
+
+   shared_examples_for "poisson engine" do
+     it "should return correct pdf" do
+       if @engine.respond_to? :pdf
+         [0.5,1,1.5].each {|l|
+           1.upto(5) {|k|
+             @engine.pdf(k,l).should be_within(1e-10).of((l**k*Math.exp(-l)).quo(Math.factorial(k)))
+           }
+         }
+       else
+         pending("No #{@engine}.pdf")
+       end
+     end
+
+     it_only_with_gsl "should return correct cdf" do
+       if @engine.respond_to? :cdf
+         [0.5,1,1.5,4,10].each {|l|
+           1.upto(5) {|k|
+             @engine.cdf(k,l).should be_within(1e-10).of(GSL::Cdf.poisson_P(k,l))
+           }
+         }
+
+       else
+         pending("No #{@engine}.cdf")
+       end
+     end
+
+
+     it "should return correct p_value" do
+       pending("No exact p_value")
+       if @engine.respond_to? :p_value
+         [0.1,1,5,10].each {|l|
+           1.upto(20) {|k|
+             pr=@engine.cdf(k,l)
+             @engine.p_value(pr,l).should eq(k)
+           }
+         }
+       else
+         pending("No #{@engine}.p_value")
+       end
+     end
+   end
+
+
+   describe "singleton" do
+     before do
+       @engine=Distribution::Poisson
+     end
+     it_should_behave_like "poisson engine"
+   end
+
+   describe Distribution::Poisson::Ruby_ do
+     before do
+       @engine=Distribution::Poisson::Ruby_
+     end
+     it_should_behave_like "poisson engine"
+
+   end
+
+   describe Distribution::Poisson::GSL_ do
+     before do
+       @engine=Distribution::Poisson::GSL_
+     end
+     it_should_behave_like "poisson engine"
+
+   end
+
+
+ end
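Note on the Poisson spec above: the pdf expectation is the textbook formula l**k * exp(-l) / k!, and the cdf is compared against GSL. The following is a rough pure-Ruby sketch of both quantities, assuming the cdf is taken as the plain sum of pdf terms from 0 to k. The names poisson_pdf_sketch and poisson_cdf_sketch are hypothetical and are not the gem's engine methods.

```ruby
# Illustrative sketch only; method names are hypothetical and do not
# mirror the gem's Ruby_ engine.

# P(X = k) for a Poisson variable with rate l: l**k * exp(-l) / k!
def poisson_pdf_sketch(k, l)
  k_factorial = (1..k).inject(1) { |acc, i| acc * i }
  (l**k) * Math.exp(-l) / k_factorial
end

# P(X <= k), accumulated term by term from the pdf.
def poisson_cdf_sketch(k, l)
  (0..k).inject(0.0) { |sum, i| sum + poisson_pdf_sketch(i, l) }
end
```

The direct summation is fine for the small k and l values exercised by the spec; for large rates a numerically safer formulation would likely be needed.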
metadata CHANGED
@@ -4,9 +4,9 @@ version: !ruby/object:Gem::Version
    prerelease: false
    segments:
    - 0
-   - 3
+   - 4
    - 0
-   version: 0.3.0
+   version: 0.4.0
  platform: ruby
  authors:
  - Claudio Bustos
@@ -35,7 +35,7 @@ cert_chain:
    rpP0jjs0
    -----END CERTIFICATE-----
  
- date: 2011-01-28 00:00:00 -03:00
+ date: 2011-02-01 00:00:00 -03:00
  default_executable:
  dependencies:
  - !ruby/object:Gem::Dependency
@@ -96,10 +96,11 @@ dependencies:
    type: :development
    version_requirements: *id004
  description: |-
-   Statistical Distributions multi library wrapper.
+   Statistical Distributions library. Includes Normal univariate and bivariate, T, F, Chi Square, Binomial, Hypergeometric, Exponential and Poisson.
+
    Uses Ruby by default and C (statistics2/GSL) or Java extensions where available.
  
-   Includes code from statistics2
+   Includes code from statistics2 on Normal, T, F and Chi Square ruby code [http://blade.nagaokaut.ac.jp/~sinara/ruby/math/statistics2]
  email:
  - clbustos_at_gmail.com
  executables:
@@ -117,9 +118,17 @@ files:
  - README.txt
  - Rakefile
  - benchmark/binomial_coefficient.rb
+ - benchmark/binomial_coefficient/binomial_coefficient.ds
+ - benchmark/binomial_coefficient/binomial_coefficient.xls
+ - benchmark/binomial_coefficient/experiment.rb
+ - benchmark/factorial_hash.rb
  - benchmark/factorial_method.rb
  - benchmark/odd.rb
  - bin/distribution
+ - data/template/distribution.erb
+ - data/template/distribution/gsl.erb
+ - data/template/distribution/ruby.erb
+ - data/template/spec.erb
  - lib/distribution.rb
  - lib/distribution/binomial.rb
  - lib/distribution/binomial/gsl.rb
@@ -135,6 +144,9 @@ files:
  - lib/distribution/chisquare/java.rb
  - lib/distribution/chisquare/ruby.rb
  - lib/distribution/chisquare/statistics2.rb
+ - lib/distribution/exponential.rb
+ - lib/distribution/exponential/gsl.rb
+ - lib/distribution/exponential/ruby.rb
  - lib/distribution/f.rb
  - lib/distribution/f/gsl.rb
  - lib/distribution/f/java.rb
@@ -151,6 +163,9 @@ files:
  - lib/distribution/normal/ruby.rb
  - lib/distribution/normal/statistics2.rb
  - lib/distribution/normalmultivariate.rb
+ - lib/distribution/poisson.rb
+ - lib/distribution/poisson/gsl.rb
+ - lib/distribution/poisson/ruby.rb
  - lib/distribution/t.rb
  - lib/distribution/t/gsl.rb
  - lib/distribution/t/java.rb
@@ -160,10 +175,12 @@ files:
  - spec/bivariatenormal_spec.rb
  - spec/chisquare_spec.rb
  - spec/distribution_spec.rb
+ - spec/exponential_spec.rb
  - spec/f_spec.rb
  - spec/hypergeometric_spec.rb
  - spec/math_extension_spec.rb
  - spec/normal_spec.rb
+ - spec/poisson_spec.rb
  - spec/shorthand_spec.rb
  - spec/spec.opts
  - spec/spec_helper.rb
@@ -200,6 +217,6 @@ rubyforge_project: distribution
  rubygems_version: 1.3.7
  signing_key:
  specification_version: 3
- summary: Statistical Distributions multi library wrapper
+ summary: Statistical Distributions library
  test_files: []
  
metadata.gz.sig CHANGED
Binary file