statsample 1.0.1 → 1.1.0

Files changed (39)
  1. data/.gemtest +0 -0
  2. data/History.txt +14 -0
  3. data/Manifest.txt +4 -0
  4. data/README.txt +49 -13
  5. data/data/locale/es/LC_MESSAGES/statsample.mo +0 -0
  6. data/lib/statsample.rb +1 -23
  7. data/lib/statsample/analysis.rb +49 -28
  8. data/lib/statsample/analysis/suite.rb +18 -5
  9. data/lib/statsample/analysis/suitereportbuilder.rb +9 -3
  10. data/lib/statsample/anova.rb +2 -0
  11. data/lib/statsample/anova/contrast.rb +79 -0
  12. data/lib/statsample/anova/oneway.rb +39 -5
  13. data/lib/statsample/converter/csv.rb +2 -5
  14. data/lib/statsample/converters.rb +1 -0
  15. data/lib/statsample/dataset.rb +31 -1
  16. data/lib/statsample/graph/histogram.rb +1 -1
  17. data/lib/statsample/regression/multiple/baseengine.rb +5 -0
  18. data/lib/statsample/reliability/multiscaleanalysis.rb +3 -1
  19. data/lib/statsample/reliability/scaleanalysis.rb +3 -4
  20. data/lib/statsample/shorthand.rb +41 -1
  21. data/lib/statsample/test.rb +10 -0
  22. data/lib/statsample/test/kolmogorovsmirnov.rb +61 -0
  23. data/lib/statsample/test/t.rb +92 -9
  24. data/lib/statsample/vector.rb +143 -10
  25. data/po/es/statsample.mo +0 -0
  26. data/po/es/statsample.po +109 -110
  27. data/po/statsample.pot +108 -60
  28. data/test/helpers_tests.rb +1 -0
  29. data/test/test_analysis.rb +70 -11
  30. data/test/test_anova_contrast.rb +36 -0
  31. data/test/test_anovawithvectors.rb +8 -0
  32. data/test/test_dataset.rb +12 -0
  33. data/test/test_factor_pa.rb +1 -3
  34. data/test/test_test_kolmogorovsmirnov.rb +34 -0
  35. data/test/test_test_t.rb +16 -0
  36. data/test/test_vector.rb +40 -2
  37. metadata +44 -118
  38. data.tar.gz.sig +0 -0
  39. metadata.gz.sig +0 -0
data/.gemtest ADDED
File without changes
data/History.txt CHANGED
@@ -1,3 +1,17 @@
1
+ === 1.1.0 / 2011-06-02
2
+
3
+ * New Statsample::Anova::Contrast
4
+ * Jacknife and bootstrap for Vector. Thanks to John Firebaugh for the idea
5
+ * Improved Statsample::Analysis API
6
+ * Updated CSV.read. Third argument is a Hash with options to CSV class
7
+ * Added restriction on Statsample::Excel.read
8
+ * Updated spanish po
9
+ * Better summary for Vector
10
+ * Improving summary of t related test (confidence interval and estimate output)
11
+ * Replaced c for vector on Statsample::Analysis examples
12
+ * Added Vector#median_absolute_deviation
13
+ * First implementation of Kolmogorov Smirnov test. Returns correct D value, but without Kolmogorov distribution isn't very useful.
14
+
1
15
  === 1.0.1 / 2011-01-28
2
16
 
3
17
  * Updated spanish po.
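The changelog entry for Vector#median_absolute_deviation names the statistic without defining it. As a reference point, here is a minimal sketch of the plain, unscaled definition in standalone Ruby. The helper names are hypothetical, and whether the library applies a scaling constant is not shown in this diff.

```ruby
# Median of a numeric array (mean of the two middle values for even sizes).
def median(values)
  sorted = values.sort
  mid = sorted.size / 2
  sorted.size.odd? ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Unscaled median absolute deviation: median(|x_i - median(x)|).
# Hypothetical helper, not the statsample implementation.
def median_absolute_deviation(values)
  center = median(values)
  median(values.map { |x| (x - center).abs })
end

data = [1, 1, 2, 2, 4, 6, 9]
puts median_absolute_deviation(data)
```

The statistic is robust to outliers: replacing the 9 with 900 leaves the result unchanged, which is the usual reason to report it next to the standard deviation.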
data/Manifest.txt CHANGED
@@ -44,6 +44,7 @@ lib/statsample/analysis.rb
44
44
  lib/statsample/analysis/suite.rb
45
45
  lib/statsample/analysis/suitereportbuilder.rb
46
46
  lib/statsample/anova.rb
47
+ lib/statsample/anova/contrast.rb
47
48
  lib/statsample/anova/oneway.rb
48
49
  lib/statsample/anova/twoway.rb
49
50
  lib/statsample/bivariate.rb
@@ -97,6 +98,7 @@ lib/statsample/test.rb
97
98
  lib/statsample/test/bartlettsphericity.rb
98
99
  lib/statsample/test/chisquare.rb
99
100
  lib/statsample/test/f.rb
101
+ lib/statsample/test/kolmogorovsmirnov.rb
100
102
  lib/statsample/test/levene.rb
101
103
  lib/statsample/test/t.rb
102
104
  lib/statsample/test/umannwhitney.rb
@@ -119,6 +121,7 @@ test/fixtures/tetmat_matrix.txt
119
121
  test/fixtures/tetmat_test.txt
120
122
  test/helpers_tests.rb
121
123
  test/test_analysis.rb
124
+ test/test_anova_contrast.rb
122
125
  test/test_anovaoneway.rb
123
126
  test/test_anovatwoway.rb
124
127
  test/test_anovatwowaywithdataset.rb
@@ -151,6 +154,7 @@ test/test_statistics.rb
151
154
  test/test_stest.rb
152
155
  test/test_stratified.rb
153
156
  test/test_test_f.rb
157
+ test/test_test_kolmogorovsmirnov.rb
154
158
  test/test_test_t.rb
155
159
  test/test_umannwhitney.rb
156
160
  test/test_vector.rb
data/README.txt CHANGED
@@ -5,14 +5,14 @@ http://ruby-statsample.rubyforge.org/
5
5
 
6
6
  == DESCRIPTION:
7
7
 
8
- A suite for basic and advanced statistics on Ruby. Tested on Ruby 1.8.7, 1.9.1, 1.9.2 (April, 2010) and JRuby 1.4 (Ruby 1.8.7 compatible).
8
+ A suite for basic and advanced statistics on Ruby. Tested on Ruby 1.8.7, 1.9.1, 1.9.2 (April, 2010), ruby-head(June, 2011) and JRuby 1.4 (Ruby 1.8.7 compatible).
9
9
 
10
10
  Include:
11
11
  * Descriptive statistics: frequencies, median, mean, standard error, skew, kurtosis (and many others).
12
12
  * Imports and exports datasets from and to Excel, CSV and plain text files.
13
13
  * Correlations: Pearson's r, Spearman's rank correlation (rho), point biserial, tau a, tau b and gamma. Tetrachoric and Polychoric correlation provides by +statsample-bivariate-extension+ gem.
14
14
  * Intra-class correlation
15
- * Anova: generic and vector-based One-way ANOVA and Two-way ANOVA
15
+ * Anova: generic and vector-based One-way ANOVA and Two-way ANOVA, with contrasts for One-way ANOVA.
16
16
  * Tests: F, T, Levene, U-Mannwhitney.
17
17
  * Regression: Simple, Multiple (OLS), Probit and Logit
18
18
  * Factorial Analysis: Extraction (PCA and Principal Axis), Rotation (Varimax, Equimax, Quartimax) and Parallel Analysis and Velicer's MAP test, for estimation of number of factors.
@@ -23,13 +23,27 @@ Include:
23
23
  * Creates reports on text, html and rtf, using ReportBuilder gem
24
24
  * Graphics: Histogram, Boxplot and Scatterplot
25
25
 
26
+ == PRINCIPLES
27
+
28
+ * Software Design:
29
+ * One module/class for each type of analysis
30
+ * Options can be set as hash on initialize() or as setters methods
31
+ * Clean API for interactive sessions
32
+ * summary() returns all necessary informacion for interactive sessions
33
+ * All statistical data available though methods on objects
34
+ * All (important) methods should be tested. Better with random data.
35
+ * Statistical Design
36
+ * Results are tested against text results, SPSS and R outputs.
37
+ * Go beyond Null Hiphotesis Testing, using confidence intervals and effect sizes when possible
38
+ * (When possible) All references for methods are documented, providing sensible information on documentation
39
+
26
40
  == FEATURES:
27
41
 
28
42
  * Classes for manipulation and storage of data:
29
43
  * Statsample::Vector: An extension of an array, with statistical methods like sum, mean and standard deviation
30
44
  * Statsample::Dataset: a group of Statsample::Vector, analog to a excel spreadsheet or a dataframe on R. The base of almost all operations on statsample.
31
45
  * Statsample::Multiset: multiple datasets with same fields and type of vectors
32
- * Anova module provides generic Statsample::Anova::OneWay and vector based Statsample::Anova::OneWayWithVectors
46
+ * Anova module provides generic Statsample::Anova::OneWay and vector based Statsample::Anova::OneWayWithVectors. Also you can create contrast using Statsample::Anova::Contrast
33
47
  * Module Statsample::Bivariate provides covariance and pearson, spearman, point biserial, tau a, tau b, gamma, tetrachoric (see Bivariate::Tetrachoric) and polychoric (see Bivariate::Polychoric) correlations. Include methods to create correlation and covariance matrices
34
48
  * Multiple types of regression.
35
49
  * Simple Regression : Statsample::Regression::Simple
@@ -61,15 +75,16 @@ Include:
61
75
  * Module Statsample::Reliability provides functions to analyze scales with psychometric methods.
62
76
  * Class Statsample::Reliability::ScaleAnalysis provides statistics like mean, standard deviation for a scale, Cronbach's alpha and standarized Cronbach's alpha, and for each item: mean, correlation with total scale, mean if deleted, Cronbach's alpha is deleted.
63
77
  * Class Statsample::Reliability::MultiScaleAnalysis provides a DSL to easily analyze reliability of multiple scales and retrieve correlation matrix and factor analysis of them.
64
- * Class Statsample::Reliability::ICC provides intra-class correlation, using Shrout & Fleiss(1979) and McGraw & Wong (1996) formulation.
78
+ * Class Statsample::Reliability::ICC provides intra-class correlation, using Shrout & Fleiss(1979) and McGraw & Wong (1996) formulations.
65
79
  * Module Statsample::SRS (Simple Random Sampling) provides a lot of functions to estimate standard error for several type of samples
66
80
  * Module Statsample::Test provides several methods and classes to perform inferencial statistics
67
81
  * Statsample::Test::BartlettSphericity
68
82
  * Statsample::Test::ChiSquare
83
+ * Statsample::Test::F
84
+ * Statsample::Test::KolmogorovSmirnov (only D value)
69
85
  * Statsample::Test::Levene
70
86
  * Statsample::Test::UMannWhitney
71
87
  * Statsample::Test::T
72
- * Statsample::Test::F
73
88
  * Module Graph provides several classes to create beautiful graphs using rubyvis
74
89
  * Statsample::Graph::Boxplot
75
90
  * Statsample::Graph::Histogram
@@ -81,16 +96,37 @@ Include:
81
96
 
82
97
  See multiples examples of use on [http://github.com/clbustos/statsample/tree/master/examples/]
83
98
 
99
+ === Boxplot
100
+
101
+ require 'statsample'
102
+ ss_analysis(Statsample::Graph::Boxplot) do
103
+ n=30
104
+ a=rnorm(n-1,50,10)
105
+ b=rnorm(n, 30,5)
106
+ c=rnorm(n,5,1)
107
+ a.push(2)
108
+ boxplot(:vectors=>[a,b,c], :width=>300, :height=>300, :groups=>%w{first first second}, :minimum=>0)
109
+ end
110
+ Statsample::Analysis.run # Open svg file on *nix application defined
111
+
84
112
  === Correlation matrix
85
113
 
86
114
  require 'statsample'
87
- a=1000.times.collect {rand}.to_scale
88
- b=1000.times.collect {rand}.to_scale
89
- c=1000.times.collect {rand}.to_scale
90
- d=1000.times.collect {rand}.to_scale
91
- ds={'a'=>a,'b'=>b,'c'=>c,'d'=>d}.to_dataset
92
- cm=Statsample::Bivariate.correlation_matrix(ds)
93
- puts cm.summary
115
+ # Note R like generation of random gaussian variable
116
+ # and correlation matrix
117
+
118
+ ss_analysis("Statsample::Bivariate.correlation_matrix") do
119
+ samples=1000
120
+ ds=data_frame(
121
+ 'a'=>rnorm(samples),
122
+ 'b'=>rnorm(samples),
123
+ 'c'=>rnorm(samples),
124
+ 'd'=>rnorm(samples))
125
+ cm=cor(ds)
126
+ summary(cm)
127
+ end
128
+
129
+ Statsample::Analysis.run_batch # Echo output to console
94
130
 
95
131
 
96
132
  == REQUIREMENTS:
@@ -107,7 +143,7 @@ Optional:
107
143
  * Source code on github: http://github.com/clbustos/statsample
108
144
  * API: http://ruby-statsample.rubyforge.org/statsample/
109
145
  * Bug report and feature request: http://github.com/clbustos/statsample/issues
110
-
146
+ * E-mailing list: http://groups.google.com/group/statsample
111
147
 
112
148
  == INSTALL:
113
149
 
data/lib/statsample.rb CHANGED
@@ -134,7 +134,7 @@ module Statsample
134
134
 
135
135
  create_has_library :gsl
136
136
 
137
- VERSION = '1.0.1'
137
+ VERSION = '1.1.0'
138
138
  SPLIT_TOKEN = ","
139
139
  autoload(:Analysis, 'statsample/analysis')
140
140
  autoload(:Database, 'statsample/converters')
@@ -174,29 +174,7 @@ module Statsample
174
174
  false
175
175
  end
176
176
  end
177
- # Import an Excel file. Cache result by default
178
- def load_excel(filename, opts=Hash.new, cache=true)
179
- file_ds=filename+".ds"
180
- if cache and (File.exists? file_ds and File.mtime(file_ds)>File.mtime(filename))
181
- ds=Statsample.load(file_ds)
182
- else
183
- ds=Statsample::Excel.read(filename)
184
- ds.save(file_ds) if cache
185
- end
186
- ds
187
- end
188
177
 
189
- # Import an Excel file. Cache result by default
190
- def load_csv(filename, opts=Hash.new, cache=true)
191
- file_ds=filename+".ds"
192
- if cache and (File.exists? file_ds and File.mtime(file_ds)>File.mtime(filename))
193
- ds=Statsample.load(file_ds)
194
- else
195
- ds=Statsample::CSV.read(filename,opts)
196
- ds.save(file_ds) if cache
197
- end
198
- ds
199
- end
200
178
 
201
179
 
202
180
  # Create a matrix using vectors as columns.
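The hunk above removes load_excel and load_csv from lib/statsample.rb; this excerpt does not show whether they reappear elsewhere in the release. Both followed the same pattern: reuse a ".ds" cache file when it is newer than the source, otherwise parse and refresh the cache. Code that relied on them could reproduce that freshness check along these lines. This is a sketch that assumes plain Marshal serialization and a hypothetical load_with_cache helper, neither taken from statsample.

```ruby
require 'tmpdir'

# Return the cached object when the cache file is newer than the source;
# otherwise build it with the block and, if caching is on, refresh the cache.
# Hypothetical helper: the name and the Marshal format are assumptions.
def load_with_cache(filename, cache = true)
  cache_file = filename + ".ds"
  if cache && File.exist?(cache_file) && File.mtime(cache_file) > File.mtime(filename)
    File.open(cache_file, "rb") { |f| Marshal.load(f) }
  else
    data = yield(filename)
    File.open(cache_file, "wb") { |f| Marshal.dump(data, f) } if cache
    data
  end
end

Dir.mktmpdir do |dir|
  source = File.join(dir, "data.csv")
  File.write(source, "a,b\n1,2\n")
  # Backdate the source so the cache written below is strictly newer.
  File.utime(Time.now - 60, Time.now - 60, source)
  parses = 0
  2.times { load_with_cache(source) { |f| parses += 1; File.readlines(f) } }
  puts "parsed #{parses} time(s)"
end
```

Note that the removed code called File.exists?, which later Ruby versions dropped in favour of File.exist?.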
data/lib/statsample/analysis.rb CHANGED
@@ -26,54 +26,75 @@ module Statsample
26
26
  # # or using the returned variables
27
27
  # an1.run
28
28
  # # You can also generate a report using ReportBuilder.
29
- # # puts and pp are overloaded, so its output will be
30
- # # redirected to report.
31
- # # Summary method call 'report_building' on the object,
32
- # # instead of calling summary
29
+ # # .summary() method call 'report_building' on the object,
30
+ # # instead of calling text summary
33
31
  # an1.generate("report.html")
34
32
  module Analysis
35
33
  @@stored_analysis={}
36
34
  @@last_analysis=nil
35
+ def self.clear_analysis
36
+ @@stored_analysis.clear
37
+ end
37
38
  def self.stored_analysis
38
39
  @@stored_analysis
39
40
  end
40
41
  def self.last
41
42
  @@stored_analysis[@@last_analysis]
42
43
  end
43
- def self.store(name,opts=Hash.new,&block)
44
+ def self.store(name, opts=Hash.new,&block)
44
45
  raise "You should provide a block" if !block
45
46
  @@last_analysis=name
46
- @@stored_analysis[name]=Suite.new(name,opts,&block)
47
+ opts={:name=>name}.merge(opts)
48
+ @@stored_analysis[name]=Suite.new(opts,&block)
47
49
  end
48
- # Run analysis +name+
49
- # Withoud arguments, run the latest analysis
50
+ # Run analysis +*args+
51
+ # Without arguments, run all stored analysis
50
52
  # Only 'echo' will be returned to screen
51
- def self.run(name=nil)
52
- name||=@@last_analysis
53
- raise "Analysis #{name} doesn't exists" unless stored_analysis[name]
54
- stored_analysis[name].run
53
+ def self.run(*args)
54
+ args=stored_analysis.keys if args.size==0
55
+ raise "Analysis #{args} doesn't exists" if (args - stored_analysis.keys).size>0
56
+ args.each do |name|
57
+ stored_analysis[name].run
58
+ end
55
59
  end
56
- # Run analysis and return to screen all
57
- # echo and summary callings
58
- def self.run_batch(name=nil)
59
- name||=@@last_analysis
60
- raise "Analysis #{name} doesn't exists" unless stored_analysis[name]
61
- puts stored_analysis[name].to_text
62
- end
63
- def self.save(filename, name=nil)
64
- name||=@@last_analysis
65
- raise "Analysis #{name} doesn't exists" unless stored_analysis[name]
66
- puts stored_analysis[name].generate(filename)
60
+
61
+ # Add analysis +*args+ to an reportbuilder object.
62
+ # Without arguments, add all stored analysis
63
+ # Each analysis is wrapped inside a ReportBuilder::Section object
64
+ # This is the method is used by save() and to_text()
65
+
66
+ def self.add_to_reportbuilder(rb, *args)
67
+ args=stored_analysis.keys if args.size==0
68
+ raise "Analysis #{name} doesn't exists" if (args - stored_analysis.keys).size>0
69
+ args.each do |name|
70
+ section=ReportBuilder::Section.new(:name=>stored_analysis[name].name)
71
+ rb_an=stored_analysis[name].add_to_reportbuilder(section)
72
+ rb.add(section)
73
+ rb_an.run
74
+ end
67
75
  end
68
76
 
77
+ # Save the analysis on a file
78
+ # Without arguments, add all stored analysis
79
+ def self.save(filename, *args)
80
+ rb=ReportBuilder.new(:name=>filename)
81
+ add_to_reportbuilder(rb, *args)
82
+ rb.save(filename)
83
+ end
69
84
 
70
85
  # Run analysis and return as string
71
86
  # output of echo callings
72
- def self.to_text(name=nil)
73
- name||=@@last_analysis
74
- raise "Analysis #{name} doesn't exists" unless stored_analysis[name]
75
- stored_analysis[name].to_text
76
-
87
+ # Without arguments, add all stored analysis
88
+
89
+ def self.to_text(*args)
90
+ rb=ReportBuilder.new(:name=>"Analysis #{Time.now}")
91
+ add_to_reportbuilder(rb, *args)
92
+ rb.to_text
77
93
  end
94
+ # Run analysis and return to screen all
95
+ # echo and summary callings
96
+ def self.run_batch(*args)
97
+ puts to_text(*args)
98
+ end
78
99
  end
79
100
  end
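The behavioural change in this file is the argument handling. In 1.0.1, run, to_text and save took one optional name and fell back to the last stored analysis. In 1.1.0 they take *args and fall back to every stored analysis. The sketch below isolates that resolution step in a hypothetical AnalysisRegistry module. It stores plain blocks instead of Suite objects and is not the library code.

```ruby
# Hypothetical stand-in for the Statsample::Analysis registry; only the
# argument handling mirrors the diff above.
module AnalysisRegistry
  @stored = {}

  class << self
    def store(name, &block)
      raise ArgumentError, "You should provide a block" unless block
      @stored[name] = block
    end

    # No names given means every stored analysis, in insertion order.
    def resolve(*names)
      names = @stored.keys if names.empty?
      missing = names - @stored.keys
      unless missing.empty?
        raise ArgumentError, "Analysis #{missing.join(', ')} doesn't exist"
      end
      names
    end

    def run(*names)
      resolve(*names).map { |name| @stored[name].call }
    end
  end
end

AnalysisRegistry.store("first")  { :a }
AnalysisRegistry.store("second") { :b }
p AnalysisRegistry.run            # runs both stored analyses
p AnalysisRegistry.run("second")  # runs only the named one
```

Reporting only the missing names is a choice made for this sketch. In the diff, run interpolates the whole args array, and add_to_reportbuilder interpolates name, which inside a module-level method resolves to Module#name rather than to an analysis name.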
data/lib/statsample/analysis/suite.rb CHANGED
@@ -5,29 +5,42 @@ module Statsample
5
5
  attr_accessor :output
6
6
  attr_accessor :name
7
7
  attr_reader :block
8
- def initialize(name,opts=Hash.new(),&block)
9
- @name=name
8
+ def initialize(opts=Hash.new(), &block)
9
+ if !opts.is_a? Hash
10
+ opts={:name=>opts}
11
+ end
12
+
10
13
  @block=block
14
+ @name=opts[:name] || "Analysis #{Time.now}"
11
15
  @attached=[]
12
16
  @output=opts[:output] || ::STDOUT
13
-
14
17
  end
15
18
  # Run the analysis, putting output on
16
19
  def run
17
20
  @block.arity<1 ? instance_eval(&@block) : @block.call(self)
18
21
  end
22
+ # Provides a description of the procedure. Only appears as a commentary on
23
+ # SuiteReportBuilder outputs
24
+ def desc(d)
25
+ @output.puts("Description:")
26
+ @output.puts(" #{d}")
27
+ end
19
28
  def echo(*args)
20
29
  @output.puts(*args)
21
30
  end
22
31
  def summary(obj)
23
32
  obj.summary
24
33
  end
34
+ def add_to_reportbuilder(rb)
35
+ SuiteReportBuilder.new({:name=>name, :rb=>rb}, &block)
36
+ end
37
+
25
38
  def generate(filename)
26
- ar=SuiteReportBuilder.new(name,&block)
39
+ ar=SuiteReportBuilder.new({:name=>name}, &block)
27
40
  ar.generate(filename)
28
41
  end
29
42
  def to_text
30
- ar=SuiteReportBuilder.new(name, &block)
43
+ ar=SuiteReportBuilder.new({:name=>name}, &block)
31
44
  ar.to_text
32
45
  end
33
46
 
data/lib/statsample/analysis/suitereportbuilder.rb CHANGED
@@ -2,9 +2,12 @@ module Statsample
2
2
  module Analysis
3
3
  class SuiteReportBuilder < Suite
4
4
  attr_accessor :rb
5
- def initialize(name,&block)
6
- super(name,&block)
7
- @rb=ReportBuilder.new(:name=>name)
5
+ def initialize(opts=Hash.new,&block)
6
+ if !opts.is_a? Hash
7
+ opts={:name=>opts}
8
+ end
9
+ super(opts,&block)
10
+ @rb=opts[:rb] || ReportBuilder.new(:name=>name)
8
11
  end
9
12
  def generate(filename)
10
13
  run if @block
@@ -17,6 +20,9 @@ module Statsample
17
20
  def summary(o)
18
21
  @rb.add(o)
19
22
  end
23
+ def desc(d)
24
+ @rb.add(d)
25
+ end
20
26
  def echo(*args)
21
27
  args.each do |a|
22
28
  @rb.add(a)
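Suite and SuiteReportBuilder now share one constructor convention. Options arrive as a Hash, a non-Hash first argument is treated as the legacy positional name, and :name falls back to a timestamped default. The sketch below reproduces that normalization and the new desc helper in a hypothetical MiniSuite class that writes to any IO-like object. It is an illustration, not the library class.

```ruby
require 'stringio'

# Hypothetical miniature of the Suite constructor shown in the diff.
class MiniSuite
  attr_reader :name, :output

  def initialize(opts = {}, &block)
    opts = { :name => opts } unless opts.is_a?(Hash) # legacy positional name
    @block  = block
    @name   = opts[:name] || "Analysis #{Time.now}"
    @output = opts[:output] || $stdout
  end

  # Plain-text variant: a heading followed by the indented description.
  def desc(text)
    @output.puts("Description:")
    @output.puts("  #{text}")
  end

  # Blocks without parameters are evaluated in the suite's context;
  # blocks with a parameter receive the suite explicitly.
  def run
    @block.arity < 1 ? instance_eval(&@block) : @block.call(self)
  end
end

buffer = StringIO.new
MiniSuite.new(:name => "demo", :output => buffer) { desc "two lines" }.run
print buffer.string
puts MiniSuite.new("legacy").name
```

In the diff, the report-builder subclass overrides desc to call @rb.add(d), so the same block can feed either plain output or a ReportBuilder document.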
data/lib/statsample/anova.rb CHANGED
@@ -18,5 +18,7 @@ module Statsample
18
18
  end
19
19
  end
20
20
  end
21
+
21
22
  require 'statsample/anova/oneway'
23
+ require 'statsample/anova/contrast'
22
24
  require 'statsample/anova/twoway'
data/lib/statsample/anova/contrast.rb ADDED
@@ -0,0 +1,79 @@
1
+ module Statsample
2
+ module Anova
3
+ class Contrast
4
+ attr_reader :psi
5
+
6
+ attr_reader :msw
7
+ include Summarizable
8
+ def initialize(opts=Hash.new)
9
+ raise "Should set at least vectors options" if opts[:vectors].nil?
10
+ @vectors=opts[:vectors]
11
+ @c=opts[:c]
12
+ @c1,@c2=opts[:c1], opts[:c2]
13
+ @t_options=opts[:t_options] || {:estimate_name=>_("Psi estimate")}
14
+ @name=opts[:name] || _("Contrast")
15
+ psi
16
+ @anova=Statsample::Anova::OneWayWithVectors.new(@vectors)
17
+ @msw=@anova.msw
18
+ end
19
+ # Hypothesis contrast, selecting index for each constrast
20
+ # For example, if you want to contrast x_0 against x_1 and x_2
21
+ # you should use
22
+ # c.contrast([0],[1,2])
23
+ def c_by_index(c1,c2)
24
+ contrast=[0]*@vectors.size
25
+ c1.each {|i| contrast[i]=1.quo(c1.size)}
26
+ c2.each {|i| contrast[i]=-1.quo(c2.size)}
27
+ @c=contrast
28
+ c(contrast)
29
+ end
30
+ def psi
31
+ if @psi.nil?
32
+ c(@c) if @c
33
+ c_by_index(@c1,@c2) if (@c1 and @c2)
34
+ end
35
+ @psi
36
+ end
37
+ def confidence_interval(cl=nil)
38
+ t_object.confidence_interval(cl)
39
+ end
40
+ # Hypothesis contrast, using custom values
41
+ # Every parameter is a contrast value. You should use
42
+ # the same number of contrast as vectors on class and the sum
43
+ # of constrast should be 0.
44
+ def c(args=nil)
45
+
46
+ return @c if args.nil?
47
+ @c=args
48
+ raise "contrast number!=vector number" if args.size!=@vectors.size
49
+ #raise "Sum should be 0" if args.inject(0) {|ac,v| ac+v}!=0
50
+ @psi=args.size.times.inject(0) {|ac,i| ac+(args[i]*@vectors[i].mean)}
51
+ end
52
+ def standard_error
53
+ sum=@vectors.size.times.inject(0) {|ac,i|
54
+ ac+((@c[i].rationalize**2).quo(@vectors[i].size))
55
+ }
56
+ Math.sqrt(@msw*sum)
57
+ end
58
+ alias :se :standard_error
59
+ def df
60
+ @vectors.inject(0) {|ac,v| ac+v.size}-@vectors.size
61
+ end
62
+ def t_object
63
+ Statsample::Test::T.new(psi, se, df, @t_options)
64
+ end
65
+ def t
66
+ t_object.t
67
+ end
68
+ def probability
69
+ t_object.probability
70
+ end
71
+ def report_building(builder)
72
+ builder.section(:name=>@name) do |s|
73
+ s.text _("Contrast:%s") % c.join(",")
74
+ s.parse_element(t_object)
75
+ end
76
+ end
77
+ end
78
+ end
79
+ end
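The arithmetic behind the new Contrast class can be checked by hand. psi is the coefficient-weighted sum of the group means, the standard error is sqrt(MSW * sum(c_i^2 / n_i)), and the degrees of freedom are N - k. The standalone sketch below repeats those steps with hypothetical helper names. It assumes MSW is the usual pooled within-group mean square and that the t statistic is psi / se. Both assumptions match the textbook contrast test, but neither is confirmed by this excerpt.

```ruby
def mean(xs)
  xs.inject(0.0) { |acc, x| acc + x } / xs.size
end

# Pooled within-group mean square: SS_within / (N - k).
def mean_square_within(groups)
  ss = groups.inject(0.0) do |acc, g|
    m = mean(g)
    acc + g.inject(0.0) { |a, x| a + (x - m)**2 }
  end
  ss / (groups.inject(0) { |acc, g| acc + g.size } - groups.size)
end

# Returns psi, its standard error, the degrees of freedom and t = psi / se.
def contrast(groups, coefficients)
  if coefficients.size != groups.size
    raise ArgumentError, "contrast number != vector number"
  end
  psi = groups.each_index.inject(0.0) do |acc, i|
    acc + coefficients[i] * mean(groups[i])
  end
  weight = groups.each_index.inject(0.0) do |acc, i|
    acc + coefficients[i]**2 / groups[i].size.to_f
  end
  se = Math.sqrt(mean_square_within(groups) * weight)
  df = groups.inject(0) { |acc, g| acc + g.size } - groups.size
  { :psi => psi, :se => se, :df => df, :t => psi / se }
end

# Group 0 against the average of groups 1 and 2; these are the same
# coefficients that c_by_index([0], [1, 2]) builds in the diff above.
groups = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]
p contrast(groups, [1, -0.5, -0.5])
```

Here the group means are 2, 3 and 7, so psi = 2 - 1.5 - 3.5 = -3. Each group contributes a within-group sum of squares of 2, which gives MSW = 6 / 6 = 1 and se = sqrt(0.5). The doc comment in the diff illustrates index-based contrasts with c.contrast([0],[1,2]), but the method defined there is c_by_index.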