statsample 1.0.1 → 1.1.0

Files changed (39)
  1. data/.gemtest +0 -0
  2. data/History.txt +14 -0
  3. data/Manifest.txt +4 -0
  4. data/README.txt +49 -13
  5. data/data/locale/es/LC_MESSAGES/statsample.mo +0 -0
  6. data/lib/statsample.rb +1 -23
  7. data/lib/statsample/analysis.rb +49 -28
  8. data/lib/statsample/analysis/suite.rb +18 -5
  9. data/lib/statsample/analysis/suitereportbuilder.rb +9 -3
  10. data/lib/statsample/anova.rb +2 -0
  11. data/lib/statsample/anova/contrast.rb +79 -0
  12. data/lib/statsample/anova/oneway.rb +39 -5
  13. data/lib/statsample/converter/csv.rb +2 -5
  14. data/lib/statsample/converters.rb +1 -0
  15. data/lib/statsample/dataset.rb +31 -1
  16. data/lib/statsample/graph/histogram.rb +1 -1
  17. data/lib/statsample/regression/multiple/baseengine.rb +5 -0
  18. data/lib/statsample/reliability/multiscaleanalysis.rb +3 -1
  19. data/lib/statsample/reliability/scaleanalysis.rb +3 -4
  20. data/lib/statsample/shorthand.rb +41 -1
  21. data/lib/statsample/test.rb +10 -0
  22. data/lib/statsample/test/kolmogorovsmirnov.rb +61 -0
  23. data/lib/statsample/test/t.rb +92 -9
  24. data/lib/statsample/vector.rb +143 -10
  25. data/po/es/statsample.mo +0 -0
  26. data/po/es/statsample.po +109 -110
  27. data/po/statsample.pot +108 -60
  28. data/test/helpers_tests.rb +1 -0
  29. data/test/test_analysis.rb +70 -11
  30. data/test/test_anova_contrast.rb +36 -0
  31. data/test/test_anovawithvectors.rb +8 -0
  32. data/test/test_dataset.rb +12 -0
  33. data/test/test_factor_pa.rb +1 -3
  34. data/test/test_test_kolmogorovsmirnov.rb +34 -0
  35. data/test/test_test_t.rb +16 -0
  36. data/test/test_vector.rb +40 -2
  37. metadata +44 -118
  38. data.tar.gz.sig +0 -0
  39. metadata.gz.sig +0 -0
data/.gemtest ADDED
File without changes
data/History.txt CHANGED
@@ -1,3 +1,17 @@
+ === 1.1.0 / 2011-06-02
+
+ * New Statsample::Anova::Contrast
+ * Jackknife and bootstrap for Vector. Thanks to John Firebaugh for the idea
+ * Improved Statsample::Analysis API
+ * Updated CSV.read. Third argument is a Hash with options for the CSV class
+ * Added restriction on Statsample::Excel.read
+ * Updated Spanish po
+ * Better summary for Vector
+ * Improved summary of t-related tests (confidence interval and estimate output)
+ * Replaced c for vector on Statsample::Analysis examples
+ * Added Vector#median_absolute_deviation
+ * First implementation of the Kolmogorov-Smirnov test. Returns the correct D value, but without the Kolmogorov distribution it isn't very useful.
+
  === 1.0.1 / 2011-01-28
 
  * Updated spanish po.
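The 1.1.0 entries add jackknife resampling for Vector and Vector#median_absolute_deviation. As a rough, dependency-free illustration of what those two statistics compute, here is a plain-Ruby sketch. It is not statsample's code: the library's Vector methods, their options, and any scaling constant applied to the MAD may differ, and the names `median`, `median_absolute_deviation` and `jackknife_se` are hypothetical.

```ruby
# Plain-Ruby sketch (not statsample's code) of two statistics named in the
# 1.1.0 changelog. Method names are hypothetical.

def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Unscaled median absolute deviation: median of |x - median(x)|.
def median_absolute_deviation(values)
  m = median(values)
  median(values.map { |x| (x - m).abs })
end

# Jackknife standard error: recompute the statistic (given as a block) with
# each observation left out once, then scale the spread of the replicates.
def jackknife_se(values)
  n = values.size
  replicates = (0...n).map do |i|
    rest = values.dup
    rest.delete_at(i)
    yield rest
  end
  center = replicates.sum.fdiv(n)
  Math.sqrt((n - 1).fdiv(n) * replicates.sum { |r| (r - center)**2 })
end

data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 10.0]
puts median_absolute_deviation(data)              # prints 2.0
puts jackknife_se(data) { |xs| xs.sum / xs.size } # jackknife SE of the mean
```

For the mean, the jackknife standard error reduces exactly to the familiar s/sqrt(n), which makes it a convenient sanity check for any implementation.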
data/Manifest.txt CHANGED
@@ -44,6 +44,7 @@ lib/statsample/analysis.rb
  lib/statsample/analysis/suite.rb
  lib/statsample/analysis/suitereportbuilder.rb
  lib/statsample/anova.rb
+ lib/statsample/anova/contrast.rb
  lib/statsample/anova/oneway.rb
  lib/statsample/anova/twoway.rb
  lib/statsample/bivariate.rb
@@ -97,6 +98,7 @@ lib/statsample/test.rb
  lib/statsample/test/bartlettsphericity.rb
  lib/statsample/test/chisquare.rb
  lib/statsample/test/f.rb
+ lib/statsample/test/kolmogorovsmirnov.rb
  lib/statsample/test/levene.rb
  lib/statsample/test/t.rb
  lib/statsample/test/umannwhitney.rb
@@ -119,6 +121,7 @@ test/fixtures/tetmat_matrix.txt
  test/fixtures/tetmat_test.txt
  test/helpers_tests.rb
  test/test_analysis.rb
+ test/test_anova_contrast.rb
  test/test_anovaoneway.rb
  test/test_anovatwoway.rb
  test/test_anovatwowaywithdataset.rb
@@ -151,6 +154,7 @@ test/test_statistics.rb
  test/test_stest.rb
  test/test_stratified.rb
  test/test_test_f.rb
+ test/test_test_kolmogorovsmirnov.rb
  test/test_test_t.rb
  test/test_umannwhitney.rb
  test/test_vector.rb
data/README.txt CHANGED
@@ -5,14 +5,14 @@ http://ruby-statsample.rubyforge.org/
 
  == DESCRIPTION:
 
- A suite for basic and advanced statistics on Ruby. Tested on Ruby 1.8.7, 1.9.1, 1.9.2 (April, 2010) and JRuby 1.4 (Ruby 1.8.7 compatible).
+ A suite for basic and advanced statistics on Ruby. Tested on Ruby 1.8.7, 1.9.1, 1.9.2 (April, 2010), ruby-head (June, 2011) and JRuby 1.4 (Ruby 1.8.7 compatible).
 
  Include:
  * Descriptive statistics: frequencies, median, mean, standard error, skew, kurtosis (and many others).
  * Imports and exports datasets from and to Excel, CSV and plain text files.
  * Correlations: Pearson's r, Spearman's rank correlation (rho), point biserial, tau a, tau b and gamma. Tetrachoric and Polychoric correlation provides by +statsample-bivariate-extension+ gem.
  * Intra-class correlation
- * Anova: generic and vector-based One-way ANOVA and Two-way ANOVA
+ * Anova: generic and vector-based One-way ANOVA and Two-way ANOVA, with contrasts for One-way ANOVA.
  * Tests: F, T, Levene, U-Mannwhitney.
  * Regression: Simple, Multiple (OLS), Probit and Logit
  * Factorial Analysis: Extraction (PCA and Principal Axis), Rotation (Varimax, Equimax, Quartimax) and Parallel Analysis and Velicer's MAP test, for estimation of number of factors.
@@ -23,13 +23,27 @@ Include:
  * Creates reports on text, html and rtf, using ReportBuilder gem
  * Graphics: Histogram, Boxplot and Scatterplot
 
+ == PRINCIPLES
+
+ * Software Design:
+   * One module/class for each type of analysis
+   * Options can be set as a hash on initialize() or through setter methods
+   * Clean API for interactive sessions
+   * summary() returns all necessary information for interactive sessions
+   * All statistical data available through methods on objects
+   * All (important) methods should be tested. Better with random data.
+ * Statistical Design:
+   * Results are tested against text results, SPSS and R outputs.
+   * Go beyond Null Hypothesis Testing, using confidence intervals and effect sizes when possible
+   * (When possible) All references for methods are documented, providing sensible information on documentation
+
  == FEATURES:
 
  * Classes for manipulation and storage of data:
    * Statsample::Vector: An extension of an array, with statistical methods like sum, mean and standard deviation
    * Statsample::Dataset: a group of Statsample::Vector, analog to a excel spreadsheet or a dataframe on R. The base of almost all operations on statsample.
    * Statsample::Multiset: multiple datasets with same fields and type of vectors
- * Anova module provides generic Statsample::Anova::OneWay and vector based Statsample::Anova::OneWayWithVectors
+ * Anova module provides generic Statsample::Anova::OneWay and vector based Statsample::Anova::OneWayWithVectors. You can also create contrasts using Statsample::Anova::Contrast
  * Module Statsample::Bivariate provides covariance and pearson, spearman, point biserial, tau a, tau b, gamma, tetrachoric (see Bivariate::Tetrachoric) and polychoric (see Bivariate::Polychoric) correlations. Include methods to create correlation and covariance matrices
  * Multiple types of regression.
    * Simple Regression : Statsample::Regression::Simple
@@ -61,15 +75,16 @@ Include:
  * Module Statsample::Reliability provides functions to analyze scales with psychometric methods.
    * Class Statsample::Reliability::ScaleAnalysis provides statistics like mean, standard deviation for a scale, Cronbach's alpha and standarized Cronbach's alpha, and for each item: mean, correlation with total scale, mean if deleted, Cronbach's alpha is deleted.
    * Class Statsample::Reliability::MultiScaleAnalysis provides a DSL to easily analyze reliability of multiple scales and retrieve correlation matrix and factor analysis of them.
-   * Class Statsample::Reliability::ICC provides intra-class correlation, using Shrout & Fleiss(1979) and McGraw & Wong (1996) formulation.
+   * Class Statsample::Reliability::ICC provides intra-class correlation, using Shrout & Fleiss (1979) and McGraw & Wong (1996) formulations.
  * Module Statsample::SRS (Simple Random Sampling) provides a lot of functions to estimate standard error for several type of samples
  * Module Statsample::Test provides several methods and classes to perform inferencial statistics
    * Statsample::Test::BartlettSphericity
    * Statsample::Test::ChiSquare
+   * Statsample::Test::F
+   * Statsample::Test::KolmogorovSmirnov (only D value)
    * Statsample::Test::Levene
    * Statsample::Test::UMannWhitney
    * Statsample::Test::T
-   * Statsample::Test::F
  * Module Graph provides several classes to create beautiful graphs using rubyvis
    * Statsample::Graph::Boxplot
    * Statsample::Graph::Histogram
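The README lists Statsample::Test::KolmogorovSmirnov as computing only the D value. A minimal plain-Ruby sketch of what D measures, assuming plain Arrays as input; the method names are hypothetical and this is not the library's implementation. The two-sample form takes the largest gap between two empirical CDFs, and the one-sample form compares an empirical CDF with a theoretical one (here the standard normal, via `Math.erf`).

```ruby
# Plain-Ruby sketch of the Kolmogorov-Smirnov D statistic (no p-value).
# Method names are hypothetical; this is not statsample's implementation.

# Empirical CDF of a sorted sample at x: fraction of values <= x.
def ecdf(sorted, x)
  sorted.count { |v| v <= x }.to_f / sorted.size
end

# Both ECDFs are step functions that only jump at observed values,
# so checking every pooled observation finds the supremum.
def ks_two_sample_d(a, b)
  a = a.sort
  b = b.sort
  (a + b).uniq.map { |x| (ecdf(a, x) - ecdf(b, x)).abs }.max
end

# Standard normal CDF via Math.erf.
def normal_cdf(x)
  0.5 * (1.0 + Math.erf(x / Math.sqrt(2.0)))
end

# One-sample D: compare F(x_i) with the ECDF just before and at each x_i.
def ks_one_sample_d(sample, cdf)
  xs = sample.sort
  n = xs.size.to_f
  xs.each_with_index.map { |x, i|
    f = cdf.call(x)
    [(i + 1) / n - f, f - i / n].max
  }.max
end

puts ks_two_sample_d([1, 2, 3, 4], [3, 4, 5, 6]) # prints 0.5
puts ks_one_sample_d([-1.2, -0.3, 0.1, 0.8, 1.5], method(:normal_cdf))
```

D alone is a distance, not a test result: turning it into a p-value requires the Kolmogorov distribution, which the 1.1.0 changelog notes is not yet available.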
@@ -81,16 +96,37 @@ Include:
 
  See multiples examples of use on [http://github.com/clbustos/statsample/tree/master/examples/]
 
+ === Boxplot
+
+   require 'statsample'
+   ss_analysis(Statsample::Graph::Boxplot) do
+     n=30
+     a=rnorm(n-1,50,10)
+     b=rnorm(n, 30,5)
+     c=rnorm(n,5,1)
+     a.push(2)
+     boxplot(:vectors=>[a,b,c], :width=>300, :height=>300, :groups=>%w{first first second}, :minimum=>0)
+   end
+   Statsample::Analysis.run # Opens the SVG file with the application configured on *nix systems
+
  === Correlation matrix
 
    require 'statsample'
-   a=1000.times.collect {rand}.to_scale
-   b=1000.times.collect {rand}.to_scale
-   c=1000.times.collect {rand}.to_scale
-   d=1000.times.collect {rand}.to_scale
-   ds={'a'=>a,'b'=>b,'c'=>c,'d'=>d}.to_dataset
-   cm=Statsample::Bivariate.correlation_matrix(ds)
-   puts cm.summary
+   # Note the R-like generation of random Gaussian variables
+   # and of the correlation matrix
+
+   ss_analysis("Statsample::Bivariate.correlation_matrix") do
+     samples=1000
+     ds=data_frame(
+       'a'=>rnorm(samples),
+       'b'=>rnorm(samples),
+       'c'=>rnorm(samples),
+       'd'=>rnorm(samples))
+     cm=cor(ds)
+     summary(cm)
+   end
+
+   Statsample::Analysis.run_batch # Echo output to console
 
 
  == REQUIREMENTS:
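The new README example relies on the R-like shorthands `rnorm` and `cor` inside `ss_analysis`. To make the arithmetic concrete without the gem, the sketch below draws normal variates with the Box-Muller transform (a swapped-in technique; statsample's own `rnorm` may generate values differently) and builds a Pearson correlation matrix from a Hash of columns. `rnorm`, `pearson` and `correlation_matrix` here are hypothetical stand-ins, not the library's methods.

```ruby
# Dependency-free sketch of the README's correlation-matrix example.
# Box-Muller is a swapped-in technique; all names are hypothetical.

def rnorm(n, mean = 0.0, sd = 1.0, rng = Random.new)
  Array.new(n) do
    u1 = 1.0 - rng.rand # in (0, 1], avoids Math.log(0)
    u2 = rng.rand
    mean + sd * Math.sqrt(-2.0 * Math.log(u1)) * Math.cos(2.0 * Math::PI * u2)
  end
end

# Pearson's r from centered cross-products.
def pearson(x, y)
  mx = x.sum / x.size
  my = y.sum / y.size
  sxy = x.zip(y).sum { |a, b| (a - mx) * (b - my) }
  sxx = x.sum { |a| (a - mx)**2 }
  syy = y.sum { |b| (b - my)**2 }
  sxy / Math.sqrt(sxx * syy)
end

# Square matrix of r for every pair of named columns.
def correlation_matrix(columns)
  names = columns.keys
  names.map { |a| names.map { |b| pearson(columns[a], columns[b]) } }
end

rng = Random.new(42)
ds = %w[a b c d].map { |name| [name, rnorm(1000, 0.0, 1.0, rng)] }.to_h
correlation_matrix(ds).each do |row|
  puts row.map { |r| format("%6.3f", r) }.join(" ")
end
```

With independent columns the off-diagonal entries should stay close to 0, and the diagonal is 1 up to rounding.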
@@ -107,7 +143,7 @@ Optional:
  * Source code on github: http://github.com/clbustos/statsample
  * API: http://ruby-statsample.rubyforge.org/statsample/
  * Bug report and feature request: http://github.com/clbustos/statsample/issues
-
+ * Mailing list: http://groups.google.com/group/statsample
 
  == INSTALL:
 
data/lib/statsample.rb CHANGED
@@ -134,7 +134,7 @@ module Statsample
 
    create_has_library :gsl
 
-   VERSION = '1.0.1'
+   VERSION = '1.1.0'
    SPLIT_TOKEN = ","
    autoload(:Analysis, 'statsample/analysis')
    autoload(:Database, 'statsample/converters')
@@ -174,29 +174,7 @@ module Statsample
          false
        end
      end
-     # Import an Excel file. Cache result by default
-     def load_excel(filename, opts=Hash.new, cache=true)
-       file_ds=filename+".ds"
-       if cache and (File.exists? file_ds and File.mtime(file_ds)>File.mtime(filename))
-         ds=Statsample.load(file_ds)
-       else
-         ds=Statsample::Excel.read(filename)
-         ds.save(file_ds) if cache
-       end
-       ds
-     end
 
-     # Import an Excel file. Cache result by default
-     def load_csv(filename, opts=Hash.new, cache=true)
-       file_ds=filename+".ds"
-       if cache and (File.exists? file_ds and File.mtime(file_ds)>File.mtime(filename))
-         ds=Statsample.load(file_ds)
-       else
-         ds=Statsample::CSV.read(filename,opts)
-         ds.save(file_ds) if cache
-       end
-       ds
-     end
 
 
      # Create a matrix using vectors as columns.
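The removed load_excel/load_csv helpers cached a parsed dataset next to the source file (`filename + ".ds"`) and reused it only when the cache file was newer than the source. A library-free sketch of that mtime check, assuming Marshal in place of Statsample.load and Dataset#save; `load_with_cache` and the `.cache` suffix are hypothetical. It uses File.exist?, since File.exists? (as in the removed code) was deprecated and then removed in Ruby 3.2.

```ruby
require 'tmpdir'

# Hypothetical helper: return the cached parse of +filename+ when the cache
# file is strictly newer than the source; otherwise parse with the block
# and (if caching is on) refresh the cache.
def load_with_cache(filename, cache: true)
  cache_file = filename + ".cache"
  if cache && File.exist?(cache_file) && File.mtime(cache_file) > File.mtime(filename)
    File.open(cache_file, "rb") { |f| Marshal.load(f) }
  else
    data = yield(filename)
    File.open(cache_file, "wb") { |f| Marshal.dump(data, f) } if cache
    data
  end
end

Dir.mktmpdir do |dir|
  src = File.join(dir, "scores.csv")
  File.write(src, "1,2,3\n")
  parse = ->(path) { File.read(path).split(",").map(&:to_i) }
  p load_with_cache(src, &parse) # parses the file and writes scores.csv.cache
  p load_with_cache(src, &parse) # served from the cache if its mtime is newer
end
```

Because the comparison is strict, a cache written within the same mtime tick as the source is simply re-parsed, which errs on the side of fresh data.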
data/lib/statsample/analysis.rb CHANGED
@@ -26,54 +26,75 @@ module Statsample
    # # or using the returned variables
    # an1.run
    # # You can also generate a report using ReportBuilder.
-   # # puts and pp are overloaded, so its output will be
-   # # redirected to report.
-   # # Summary method call 'report_building' on the object,
-   # # instead of calling summary
+   # # The .summary() method calls 'report_building' on the object
+   # # instead of producing a text summary
    # an1.generate("report.html")
    module Analysis
      @@stored_analysis={}
      @@last_analysis=nil
+     def self.clear_analysis
+       @@stored_analysis.clear
+     end
      def self.stored_analysis
        @@stored_analysis
      end
      def self.last
        @@stored_analysis[@@last_analysis]
      end
-     def self.store(name,opts=Hash.new,&block)
+     def self.store(name, opts=Hash.new,&block)
        raise "You should provide a block" if !block
        @@last_analysis=name
-       @@stored_analysis[name]=Suite.new(name,opts,&block)
+       opts={:name=>name}.merge(opts)
+       @@stored_analysis[name]=Suite.new(opts,&block)
      end
-     # Run analysis +name+
-     # Withoud arguments, run the latest analysis
+     # Run analyses +*args+
+     # Without arguments, run all stored analyses
      # Only 'echo' will be returned to screen
-     def self.run(name=nil)
-       name||=@@last_analysis
-       raise "Analysis #{name} doesn't exists" unless stored_analysis[name]
-       stored_analysis[name].run
+     def self.run(*args)
+       args=stored_analysis.keys if args.size==0
+       raise "Analysis #{args} doesn't exist" if (args - stored_analysis.keys).size>0
+       args.each do |name|
+         stored_analysis[name].run
+       end
      end
-     # Run analysis and return to screen all
-     # echo and summary callings
-     def self.run_batch(name=nil)
-       name||=@@last_analysis
-       raise "Analysis #{name} doesn't exists" unless stored_analysis[name]
-       puts stored_analysis[name].to_text
-     end
-     def self.save(filename, name=nil)
-       name||=@@last_analysis
-       raise "Analysis #{name} doesn't exists" unless stored_analysis[name]
-       puts stored_analysis[name].generate(filename)
+
+     # Add analyses +*args+ to a ReportBuilder object.
+     # Without arguments, add all stored analyses.
+     # Each analysis is wrapped inside a ReportBuilder::Section object.
+     # This method is used by save() and to_text().
+
+     def self.add_to_reportbuilder(rb, *args)
+       args=stored_analysis.keys if args.size==0
+       raise "Analysis #{args - stored_analysis.keys} doesn't exist" if (args - stored_analysis.keys).size>0
+       args.each do |name|
+         section=ReportBuilder::Section.new(:name=>stored_analysis[name].name)
+         rb_an=stored_analysis[name].add_to_reportbuilder(section)
+         rb.add(section)
+         rb_an.run
+       end
      end
 
+     # Save the analyses to a file.
+     # Without arguments, save all stored analyses.
+     def self.save(filename, *args)
+       rb=ReportBuilder.new(:name=>filename)
+       add_to_reportbuilder(rb, *args)
+       rb.save(filename)
+     end
 
      # Run analysis and return as string
      # output of echo callings
-     def self.to_text(name=nil)
-       name||=@@last_analysis
-       raise "Analysis #{name} doesn't exists" unless stored_analysis[name]
-       stored_analysis[name].to_text
-
+     # Without arguments, include all stored analyses.
+
+     def self.to_text(*args)
+       rb=ReportBuilder.new(:name=>"Analysis #{Time.now}")
+       add_to_reportbuilder(rb, *args)
+       rb.to_text
      end
+     # Run analyses and print to screen the output of all
+     # echo and summary calls
+     def self.run_batch(*args)
+       puts to_text(*args)
+     end
    end
  end
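The reworked Analysis module turns store/run into a named registry: run with no arguments executes every stored analysis, and unknown names raise before anything runs. A minimal sketch of that control flow, using a hypothetical AnalysisRegistry module that holds plain blocks instead of Suite objects (and module instance variables instead of class variables):

```ruby
# Hypothetical registry mirroring the store/run(*args) flow of
# Statsample::Analysis; plain blocks stand in for Suite objects.
module AnalysisRegistry
  @stored = {}

  def self.store(name, &block)
    raise ArgumentError, "You should provide a block" unless block
    @stored[name] = block
  end

  def self.clear
    @stored.clear
  end

  # No arguments: run every stored analysis in insertion order.
  # With names: validate them all first, then run only those.
  def self.run(*names)
    names = @stored.keys if names.empty?
    missing = names - @stored.keys
    raise ArgumentError, "Analysis #{missing.inspect} doesn't exist" unless missing.empty?
    names.map { |name| @stored[name].call }
  end
end

AnalysisRegistry.store("mean") { [1, 2, 3].sum / 3.0 }
AnalysisRegistry.store("count") { 3 }
p AnalysisRegistry.run          # runs both, in insertion order
p AnalysisRegistry.run("count") # runs only the named analysis
```

Interpolating `names - @stored.keys` into the error message points the user at the unknown names rather than at the whole argument list.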
data/lib/statsample/analysis/suite.rb CHANGED
@@ -5,29 +5,42 @@ module Statsample
        attr_accessor :output
        attr_accessor :name
        attr_reader :block
-       def initialize(name,opts=Hash.new(),&block)
-         @name=name
+       def initialize(opts=Hash.new(), &block)
+         if !opts.is_a? Hash
+           opts={:name=>opts}
+         end
+
          @block=block
+         @name=opts[:name] || "Analysis #{Time.now}"
          @attached=[]
          @output=opts[:output] || ::STDOUT
-
        end
        # Run the analysis, putting output on
        def run
          @block.arity<1 ? instance_eval(&@block) : @block.call(self)
        end
+       # Provides a description of the procedure. Only appears as a commentary on
+       # SuiteReportBuilder outputs
+       def desc(d)
+         @output.puts("Description:")
+         @output.puts(" #{d}")
+       end
        def echo(*args)
          @output.puts(*args)
        end
        def summary(obj)
          obj.summary
        end
+       def add_to_reportbuilder(rb)
+         SuiteReportBuilder.new({:name=>name, :rb=>rb}, &block)
+       end
+
        def generate(filename)
-         ar=SuiteReportBuilder.new(name,&block)
+         ar=SuiteReportBuilder.new({:name=>name}, &block)
          ar.generate(filename)
        end
        def to_text
-         ar=SuiteReportBuilder.new(name, &block)
+         ar=SuiteReportBuilder.new({:name=>name}, &block)
          ar.to_text
        end
 
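Suite#run decides how to call the analysis block from its arity: a block with no parameters is instance_eval'd, so bare calls like `echo` resolve on the suite, while a block that takes a parameter receives the suite explicitly. A stripped-down sketch of that dispatch; MiniSuite is a hypothetical class, not part of statsample.

```ruby
# Hypothetical MiniSuite reproducing the dispatch in Suite#run.
class MiniSuite
  attr_reader :lines

  def initialize(&block)
    @block = block
    @lines = []
  end

  def echo(*args)
    @lines.concat(args.map(&:to_s))
  end

  # arity < 1 (no params, or only a splat): evaluate with the suite as self.
  # Otherwise pass the suite in as an explicit argument.
  def run
    @block.arity < 1 ? instance_eval(&@block) : @block.call(self)
    self
  end
end

implicit = MiniSuite.new { echo "implicit receiver" }.run
explicit = MiniSuite.new { |suite| suite.echo "explicit receiver" }.run
p implicit.lines # prints ["implicit receiver"]
p explicit.lines # prints ["explicit receiver"]
```

A block written as `{ |*a| ... }` has arity -1, so it also takes the instance_eval path.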
data/lib/statsample/analysis/suitereportbuilder.rb CHANGED
@@ -2,9 +2,12 @@ module Statsample
    module Analysis
      class SuiteReportBuilder < Suite
        attr_accessor :rb
-       def initialize(name,&block)
-         super(name,&block)
-         @rb=ReportBuilder.new(:name=>name)
+       def initialize(opts=Hash.new,&block)
+         if !opts.is_a? Hash
+           opts={:name=>opts}
+         end
+         super(opts,&block)
+         @rb=opts[:rb] || ReportBuilder.new(:name=>name)
        end
        def generate(filename)
          run if @block
@@ -17,6 +20,9 @@ module Statsample
        def summary(o)
          @rb.add(o)
        end
+       def desc(d)
+         @rb.add(d)
+       end
        def echo(*args)
          args.each do |a|
            @rb.add(a)
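SuiteReportBuilder subclasses Suite and overrides desc, echo and summary, so the same analysis block feeds a report instead of an IO. A minimal sketch of that override pattern, with a plain Array standing in for the ReportBuilder object (an assumption made for illustration); PrintingSuite and CollectingSuite are hypothetical classes.

```ruby
require 'stringio'

# Hypothetical parent: writes everything to an IO.
class PrintingSuite
  def initialize(output = $stdout, &block)
    @output = output
    @block = block
  end

  def desc(text)
    @output.puts("Description:")
    @output.puts("  #{text}")
  end

  def echo(*args)
    @output.puts(*args)
  end

  def run
    instance_eval(&@block)
    self
  end
end

# Hypothetical subclass: same block, but output goes into a report
# (a plain Array here, standing in for ReportBuilder).
class CollectingSuite < PrintingSuite
  attr_reader :report

  def initialize(report = [], &block)
    super(nil, &block)
    @report = report
  end

  def desc(text)
    @report << text
  end

  def echo(*args)
    args.each { |a| @report << a }
  end
end

analysis = proc do
  desc "Mean of three numbers"
  echo [1, 2, 3].sum / 3.0
end

io = StringIO.new
PrintingSuite.new(io, &analysis).run
collected = CollectingSuite.new(&analysis).run.report
puts io.string
p collected # prints ["Mean of three numbers", 2.0]
```

Keeping the analysis itself in a block, and only swapping the receiver's output methods, is what lets one definition serve both console runs and generated reports.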
data/lib/statsample/anova.rb CHANGED
@@ -18,5 +18,7 @@ module Statsample
      end
    end
  end
+
  require 'statsample/anova/oneway'
+ require 'statsample/anova/contrast'
  require 'statsample/anova/twoway'
data/lib/statsample/anova/contrast.rb ADDED
@@ -0,0 +1,79 @@
+ module Statsample
+   module Anova
+     class Contrast
+       attr_reader :psi
+
+       attr_reader :msw
+       include Summarizable
+       def initialize(opts=Hash.new)
+         raise "You should set at least the :vectors option" if opts[:vectors].nil?
+         @vectors=opts[:vectors]
+         @c=opts[:c]
+         @c1,@c2=opts[:c1], opts[:c2]
+         @t_options=opts[:t_options] || {:estimate_name=>_("Psi estimate")}
+         @name=opts[:name] || _("Contrast")
+         psi
+         @anova=Statsample::Anova::OneWayWithVectors.new(@vectors)
+         @msw=@anova.msw
+       end
+       # Hypothesis contrast, selecting the indexes for each side of the contrast.
+       # For example, if you want to contrast x_0 against x_1 and x_2
+       # you should use
+       # c.c_by_index([0],[1,2])
+       def c_by_index(c1,c2)
+         contrast=[0]*@vectors.size
+         c1.each {|i| contrast[i]=1.quo(c1.size)}
+         c2.each {|i| contrast[i]=-1.quo(c2.size)}
+         @c=contrast
+         c(contrast)
+       end
+       def psi
+         if @psi.nil?
+           c(@c) if @c
+           c_by_index(@c1,@c2) if (@c1 and @c2)
+         end
+         @psi
+       end
+       def confidence_interval(cl=nil)
+         t_object.confidence_interval(cl)
+       end
+       # Hypothesis contrast, using custom values.
+       # Each element of +args+ is a contrast coefficient.
+       # Use the same number of coefficients as vectors,
+       # and the coefficients should sum to 0.
+       def c(args=nil)
+
+         return @c if args.nil?
+         @c=args
+         raise "contrast number!=vector number" if args.size!=@vectors.size
+         #raise "Sum should be 0" if args.inject(0) {|ac,v| ac+v}!=0
+         @psi=args.size.times.inject(0) {|ac,i| ac+(args[i]*@vectors[i].mean)}
+       end
+       def standard_error
+         sum=@vectors.size.times.inject(0) {|ac,i|
+           ac+((@c[i].rationalize**2).quo(@vectors[i].size))
+         }
+         Math.sqrt(@msw*sum)
+       end
+       alias :se :standard_error
+       def df
+         @vectors.inject(0) {|ac,v| ac+v.size}-@vectors.size
+       end
+       def t_object
+         Statsample::Test::T.new(psi, se, df, @t_options)
+       end
+       def t
+         t_object.t
+       end
+       def probability
+         t_object.probability
+       end
+       def report_building(builder)
+         builder.section(:name=>@name) do |s|
+           s.text _("Contrast:%s") % c.join(",")
+           s.parse_element(t_object)
+         end
+       end
+     end
+   end
+ end
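Contrast computes psi as the coefficient-weighted sum of group means (psi = sum of c_i * mean_i), its standard error from the within-group mean square (se = sqrt(MSW * sum of c_i**2 / n_i)), and df = N - k, then hands those to Statsample::Test::T. A plain-Ruby sketch of that arithmetic, assuming Arrays for the groups and computing MSW directly rather than through OneWayWithVectors; the helper names are hypothetical.

```ruby
# Sketch of the arithmetic behind Statsample::Anova::Contrast, on plain Arrays:
#   psi = sum(c_i * mean_i)
#   se  = sqrt(MSW * sum(c_i**2 / n_i))
#   df  = N - k, t = psi / se
# Helper names are hypothetical; statsample gets MSW from OneWayWithVectors.

def group_mean(xs)
  xs.sum.to_f / xs.size
end

# Within-group mean square: pooled squared deviations divided by N - k.
def within_mean_square(groups)
  ss = groups.sum { |g| m = group_mean(g); g.sum { |x| (x - m)**2 } }
  ss / (groups.sum(&:size) - groups.size)
end

# Same idea as Contrast#c_by_index: +1/|c1| on one side, -1/|c2| on the other.
def coefficients_by_index(k, c1, c2)
  c = Array.new(k, Rational(0))
  c1.each { |i| c[i] = Rational(1, c1.size) }
  c2.each { |i| c[i] = Rational(-1, c2.size) }
  c
end

def contrast(groups, c)
  raise ArgumentError, "need one coefficient per group" unless c.size == groups.size
  psi = groups.each_index.sum { |i| c[i] * group_mean(groups[i]) }
  se  = Math.sqrt(within_mean_square(groups) *
                  groups.each_index.sum { |i| c[i]**2 / groups[i].size.to_f })
  df  = groups.sum(&:size) - groups.size
  { psi: psi, se: se, df: df, t: psi / se }
end

groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
c = coefficients_by_index(groups.size, [0], [1, 2]) # group 0 vs groups 1 and 2
p contrast(groups, c)
```

For these three groups MSW is 1, and contrasting the first group against the other two gives psi = -4.5, se = sqrt(0.5) and df = 6.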