RubyGems - statsample - Versions diffs - 1.0.1 → 1.1.0 - Mend

statsample 1.0.1 → 1.1.0

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.

Files changed (39)

data/.gemtest +0 -0
data/History.txt +14 -0
data/Manifest.txt +4 -0
data/README.txt +49 -13
data/data/locale/es/LC_MESSAGES/statsample.mo +0 -0
data/lib/statsample.rb +1 -23
data/lib/statsample/analysis.rb +49 -28
data/lib/statsample/analysis/suite.rb +18 -5
data/lib/statsample/analysis/suitereportbuilder.rb +9 -3
data/lib/statsample/anova.rb +2 -0
data/lib/statsample/anova/contrast.rb +79 -0
data/lib/statsample/anova/oneway.rb +39 -5
data/lib/statsample/converter/csv.rb +2 -5
data/lib/statsample/converters.rb +1 -0
data/lib/statsample/dataset.rb +31 -1
data/lib/statsample/graph/histogram.rb +1 -1
data/lib/statsample/regression/multiple/baseengine.rb +5 -0
data/lib/statsample/reliability/multiscaleanalysis.rb +3 -1
data/lib/statsample/reliability/scaleanalysis.rb +3 -4
data/lib/statsample/shorthand.rb +41 -1
data/lib/statsample/test.rb +10 -0
data/lib/statsample/test/kolmogorovsmirnov.rb +61 -0
data/lib/statsample/test/t.rb +92 -9
data/lib/statsample/vector.rb +143 -10
data/po/es/statsample.mo +0 -0
data/po/es/statsample.po +109 -110
data/po/statsample.pot +108 -60
data/test/helpers_tests.rb +1 -0
data/test/test_analysis.rb +70 -11
data/test/test_anova_contrast.rb +36 -0
data/test/test_anovawithvectors.rb +8 -0
data/test/test_dataset.rb +12 -0
data/test/test_factor_pa.rb +1 -3
data/test/test_test_kolmogorovsmirnov.rb +34 -0
data/test/test_test_t.rb +16 -0
data/test/test_vector.rb +40 -2
metadata +44 -118
data.tar.gz.sig +0 -0
metadata.gz.sig +0 -0

data/.gemtest ADDED

File without changes

data/History.txt CHANGED

@@ -1,3 +1,17 @@
+=== 1.1.0 / 2011-06-02
+* New Statsample::Anova::Contrast
+* Jacknife and bootstrap for Vector. Thanks to John Firebaugh for the idea
+* Improved Statsample::Analysis API
+* Updated CSV.read. Third argument is a Hash with options to CSV class
+* Added restriction on Statsample::Excel.read
+* Updated spanish po
+* Better summary for Vector
+* Improving summary of t related test (confidence interval and estimate output)
+* Replaced c for vector on Statsample::Analysis examples
+* Added Vector#median_absolute_deviation
+* First implementation of Kolmogorov Smirnov test. Returns correct D value, but without Kolmogorov distribution isn't very useful.
 === 1.0.1 / 2011-01-28
 * Updated spanish po.
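The 1.1.0 changelog adds Vector#median_absolute_deviation and jackknife/bootstrap support, but vector.rb is not part of this excerpt. The following plain-Ruby sketch illustrates both ideas on an Array. It assumes the unscaled MAD, median(|x - median(x)|), and the delete-one jackknife standard error. The helpers `median`, `mad` and `jackknife_se` are hypothetical names, not statsample's API.

```ruby
# Hypothetical helpers illustrating the ideas; not statsample's implementation.

# Median of a numeric array (average of the two middle values when even-sized).
def median(xs)
  sorted = xs.sort
  n = sorted.size
  mid = n / 2
  n.odd? ? sorted[mid].to_f : (sorted[mid - 1] + sorted[mid]) / 2.0
end

# Unscaled median absolute deviation: median(|x_i - median(x)|).
# (R's mad() additionally multiplies by 1.4826 by default.)
def mad(xs)
  m = median(xs)
  median(xs.map { |x| (x - m).abs })
end

# Delete-one jackknife estimate of the standard error of a statistic.
# The block receives a sample and returns the statistic for it.
def jackknife_se(xs)
  n = xs.size
  # Recompute the statistic on each leave-one-out subsample.
  thetas = (0...n).map { |i| yield(xs[0...i] + xs[(i + 1)..-1]) }
  mean_theta = thetas.sum / n.to_f
  Math.sqrt((n - 1).to_f / n * thetas.sum { |t| (t - mean_theta)**2 })
end

data = [2, 4, 4, 4, 5, 5, 7, 9]
puts mad(data) # prints 0.5
puts jackknife_se(data) { |s| s.sum / s.size.to_f }
```

For the mean, the jackknife standard error reduces exactly to s / sqrt(n), which makes it a convenient sanity check. For statistics such as the median, the jackknife is known to be unreliable, and a bootstrap is usually preferred.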

data/Manifest.txt CHANGED

@@ -44,6 +44,7 @@ lib/statsample/analysis.rb
 lib/statsample/analysis/suite.rb
 lib/statsample/analysis/suitereportbuilder.rb
 lib/statsample/anova.rb
+lib/statsample/anova/contrast.rb
 lib/statsample/anova/oneway.rb
 lib/statsample/anova/twoway.rb
 lib/statsample/bivariate.rb
@@ -97,6 +98,7 @@ lib/statsample/test.rb
 lib/statsample/test/bartlettsphericity.rb
 lib/statsample/test/chisquare.rb
 lib/statsample/test/f.rb
+lib/statsample/test/kolmogorovsmirnov.rb
 lib/statsample/test/levene.rb
 lib/statsample/test/t.rb
 lib/statsample/test/umannwhitney.rb
@@ -119,6 +121,7 @@ test/fixtures/tetmat_matrix.txt
 test/fixtures/tetmat_test.txt
 test/helpers_tests.rb
 test/test_analysis.rb
+test/test_anova_contrast.rb
 test/test_anovaoneway.rb
 test/test_anovatwoway.rb
 test/test_anovatwowaywithdataset.rb
@@ -151,6 +154,7 @@ test/test_statistics.rb
 test/test_stest.rb
 test/test_stratified.rb
 test/test_test_f.rb
+test/test_test_kolmogorovsmirnov.rb
 test/test_test_t.rb
 test/test_umannwhitney.rb
 test/test_vector.rb

data/README.txt CHANGED

@@ -5,14 +5,14 @@ http://ruby-statsample.rubyforge.org/
 == DESCRIPTION:
-A suite for basic and advanced statistics on Ruby. Tested on Ruby 1.8.7, 1.9.1, 1.9.2 (April, 2010) and JRuby 1.4 (Ruby 1.8.7 compatible).
+A suite for basic and advanced statistics on Ruby. Tested on Ruby 1.8.7, 1.9.1, 1.9.2 (April, 2010), ruby-head(June, 2011) and JRuby 1.4 (Ruby 1.8.7 compatible).
 Include:
 * Descriptive statistics: frequencies, median, mean, standard error, skew, kurtosis (and many others).
 * Imports and exports datasets from and to Excel, CSV and plain text files.
 * Correlations: Pearson's r, Spearman's rank correlation (rho), point biserial, tau a, tau b and  gamma.  Tetrachoric and Polychoric correlation provides by +statsample-bivariate-extension+ gem.
 * Intra-class correlation
-* Anova: generic and vector-based One-way ANOVA and Two-way ANOVA
+* Anova: generic and vector-based One-way ANOVA and Two-way ANOVA, with contrasts for One-way ANOVA.
 * Tests: F, T, Levene, U-Mannwhitney.
 * Regression: Simple, Multiple (OLS), Probit  and Logit
 * Factorial Analysis: Extraction (PCA and Principal Axis), Rotation (Varimax, Equimax, Quartimax) and Parallel Analysis and Velicer's MAP test, for estimation of number of factors.
@@ -23,13 +23,27 @@ Include:
 * Creates reports on text, html and rtf, using ReportBuilder gem
 * Graphics: Histogram, Boxplot and Scatterplot
+== PRINCIPLES
+* Software Design:
+  * One module/class for each type of analysis
+  * Options can be set as hash on initialize() or as setters methods
+  * Clean API for interactive sessions
+  * summary() returns all necessary informacion for interactive sessions
+  * All statistical data available though methods on objects
+  * All (important) methods should be tested. Better with random data.
+* Statistical Design
+  * Results are tested against text results, SPSS and R outputs.
+  * Go beyond Null Hiphotesis Testing, using confidence intervals and effect sizes when possible
+  * (When possible) All references for methods are documented, providing sensible information on documentation
 == FEATURES:
 * Classes for manipulation and storage of data:
   * Statsample::Vector: An extension of an array, with statistical methods like sum, mean and standard deviation
   * Statsample::Dataset: a group of Statsample::Vector, analog to a excel spreadsheet or a dataframe on R. The base of almost all operations on statsample.
   * Statsample::Multiset: multiple datasets with same fields and type of vectors
-* Anova module provides generic Statsample::Anova::OneWay and vector based Statsample::Anova::OneWayWithVectors
+* Anova module provides generic Statsample::Anova::OneWay and vector based Statsample::Anova::OneWayWithVectors. Also you can create contrast using Statsample::Anova::Contrast
 * Module Statsample::Bivariate provides covariance and pearson, spearman, point biserial, tau a, tau b, gamma, tetrachoric (see Bivariate::Tetrachoric) and polychoric (see Bivariate::Polychoric) correlations. Include methods to create correlation and covariance matrices
 * Multiple types of regression.
   * Simple Regression :  Statsample::Regression::Simple
@@ -61,15 +75,16 @@ Include:
 * Module Statsample::Reliability provides functions to analyze scales with psychometric methods.
   * Class Statsample::Reliability::ScaleAnalysis provides statistics like mean, standard deviation for a scale, Cronbach's alpha and standarized Cronbach's alpha, and for each item: mean, correlation with total scale, mean if deleted, Cronbach's alpha is deleted.
   * Class Statsample::Reliability::MultiScaleAnalysis provides a DSL to easily analyze reliability of multiple scales and retrieve correlation matrix and factor analysis of them.
-  * Class Statsample::Reliability::ICC provides intra-class correlation, using Shrout & Fleiss(1979) and McGraw & Wong (1996) formulation.
+  * Class Statsample::Reliability::ICC provides intra-class correlation, using Shrout & Fleiss(1979) and McGraw & Wong (1996) formulations.
 * Module Statsample::SRS (Simple Random Sampling) provides a lot of functions to estimate standard error for several type of samples
 * Module Statsample::Test provides several methods and classes to perform inferencial statistics
   * Statsample::Test::BartlettSphericity
   * Statsample::Test::ChiSquare
+  * Statsample::Test::F
+  * Statsample::Test::KolmogorovSmirnov (only D value)
   * Statsample::Test::Levene
   * Statsample::Test::UMannWhitney
   * Statsample::Test::T
-  * Statsample::Test::F
 * Module Graph provides several classes to create beautiful graphs using rubyvis
   * Statsample::Graph::Boxplot
   * Statsample::Graph::Histogram
@@ -81,16 +96,37 @@ Include:
 See multiples examples of use on [http://github.com/clbustos/statsample/tree/master/examples/]
+=== Boxplot
+    require 'statsample'
+    ss_analysis(Statsample::Graph::Boxplot) do
+      n=30
+      a=rnorm(n-1,50,10)
+      b=rnorm(n, 30,5)
+      c=rnorm(n,5,1)
+      a.push(2)
+      boxplot(:vectors=>[a,b,c], :width=>300, :height=>300, :groups=>%w{first first second}, :minimum=>0)
+    end
+    Statsample::Analysis.run # Open svg file on *nix application defined
 === Correlation matrix
     require 'statsample'
-    a=1000.times.collect {rand}.to_scale
-    b=1000.times.collect {rand}.to_scale
-    c=1000.times.collect {rand}.to_scale
-    d=1000.times.collect {rand}.to_scale
-    ds={'a'=>a,'b'=>b,'c'=>c,'d'=>d}.to_dataset
-    cm=Statsample::Bivariate.correlation_matrix(ds)
-    puts cm.summary
+    # Note R like generation of random gaussian variable
+    # and correlation matrix
+    ss_analysis("Statsample::Bivariate.correlation_matrix") do
+      samples=1000
+      ds=data_frame(
+        'a'=>rnorm(samples),
+        'b'=>rnorm(samples),
+        'c'=>rnorm(samples),
+        'd'=>rnorm(samples))
+      cm=cor(ds)
+      summary(cm)
+    end
+    Statsample::Analysis.run_batch # Echo output to console
 == REQUIREMENTS:
@@ -107,7 +143,7 @@ Optional:
 * Source code on github: http://github.com/clbustos/statsample
 * API: http://ruby-statsample.rubyforge.org/statsample/
 * Bug report and feature request: http://github.com/clbustos/statsample/issues
+* E-mailing list: http://groups.google.com/group/statsample
 == INSTALL:
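The README now lists Statsample::Test::KolmogorovSmirnov as returning "only D value", and the changelog notes that the test is of limited use without the Kolmogorov distribution. The following sketch shows what that D statistic is in the two-sample case: the largest absolute gap between two empirical CDFs. The file kolmogorovsmirnov.rb is not shown here, so `ecdf` and `ks_d` are hypothetical helpers, not the library's code.

```ruby
# Hypothetical helpers; not Statsample::Test::KolmogorovSmirnov itself.

# Empirical CDF of a sample evaluated at x: fraction of values <= x.
def ecdf(sample, x)
  sample.count { |v| v <= x } / sample.size.to_f
end

# Two-sample Kolmogorov-Smirnov D = max over x of |F_a(x) - F_b(x)|.
# Both ECDFs are step functions that only jump at observed values,
# so evaluating at every point of the pooled sample finds the maximum.
def ks_d(a, b)
  (a + b).map { |x| (ecdf(a, x) - ecdf(b, x)).abs }.max
end

puts ks_d([1, 2, 3, 4], [3, 4, 5, 6]) # prints 0.5
```

This version is O(n^2) for clarity. A merge over the two sorted samples gives the same D in O(n log n). Turning D into a p-value still requires the Kolmogorov distribution, which is the piece the changelog says is missing.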

data/data/locale/es/LC_MESSAGES/statsample.mo CHANGED

Binary file

data/lib/statsample.rb CHANGED

@@ -134,7 +134,7 @@ module Statsample
   create_has_library :gsl
-  VERSION = '1.0.1'
+  VERSION = '1.1.0'
   SPLIT_TOKEN = ","
   autoload(:Analysis, 'statsample/analysis')
   autoload(:Database, 'statsample/converters')
@@ -174,29 +174,7 @@ module Statsample
         false
       end
     end
-    # Import an Excel file. Cache result by default
-    def load_excel(filename, opts=Hash.new, cache=true)
-      file_ds=filename+".ds"
-      if cache and (File.exists? file_ds and File.mtime(file_ds)>File.mtime(filename))
-        ds=Statsample.load(file_ds)
-      else
-        ds=Statsample::Excel.read(filename)
-        ds.save(file_ds) if cache
-      end
-      ds
-    end
-    # Import an Excel file. Cache result by default
-    def load_csv(filename, opts=Hash.new, cache=true)
-      file_ds=filename+".ds"
-      if cache and (File.exists? file_ds and File.mtime(file_ds)>File.mtime(filename))
-        ds=Statsample.load(file_ds)
-      else
-        ds=Statsample::CSV.read(filename,opts)
-        ds.save(file_ds) if cache
-      end
-      ds
-    end
     # Create a matrix using vectors as columns.
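The removed load_excel and load_csv helpers cached a parsed dataset in `filename + ".ds"`. They reused that cache only while it was newer than the source file. The following sketch reproduces that mtime-based caching pattern in plain Ruby. Its assumptions are as follows: `load_cached` is a hypothetical name, Marshal stands in for Statsample.load and Dataset#save, and the caller's block stands in for the Excel or CSV parser. It also uses File.exist?, because the File.exists? call in the removed code no longer exists as of Ruby 3.2.

```ruby
require 'tmpdir'

# Hypothetical helper reproducing the removed caching pattern.
# Reuse filename + ".ds" while it is newer than the source; otherwise
# parse via the caller's block and (optionally) refresh the cache.
def load_cached(filename, cache = true)
  cache_file = filename + ".ds"
  if cache && File.exist?(cache_file) && File.mtime(cache_file) > File.mtime(filename)
    return File.open(cache_file, "rb") { |f| Marshal.load(f) }
  end
  data = yield(filename) # expensive parse supplied by the caller
  File.open(cache_file, "wb") { |f| Marshal.dump(data, f) } if cache
  data
end

Dir.mktmpdir do |dir|
  src = File.join(dir, "data.csv")
  File.write(src, "a,b\n1,2\n")
  # Backdate the source so the freshly written cache is strictly newer,
  # even on filesystems with one-second mtime resolution.
  File.utime(Time.now - 60, Time.now - 60, src)
  parses = 0
  2.times { load_cached(src) { |f| parses += 1; File.readlines(f, chomp: true) } }
  puts parses # prints 1: the second call is served from data.csv.ds
end
```

The strict `>` comparison means that a cache written within the same mtime tick as the source is treated as stale. This errs toward re-parsing rather than serving outdated data.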

data/lib/statsample/analysis.rb CHANGED

@@ -26,54 +26,75 @@ module Statsample
   #  # or using the returned variables
   #  an1.run
   #  # You can also generate a report using ReportBuilder.
-  #  # puts and pp are overloaded, so its output will be
-  #  # redirected to report.
-  #  # Summary method call 'report_building' on the object,
-  #  # instead of calling summary
+  #  # .summary() method call 'report_building' on the object,
+  #  # instead of calling text summary
   #  an1.generate("report.html")
   module Analysis
     @@stored_analysis={}
     @@last_analysis=nil
+    def self.clear_analysis
+      @@stored_analysis.clear
+    end
     def self.stored_analysis
       @@stored_analysis
     end
     def self.last
       @@stored_analysis[@@last_analysis]
     end
-    def self.store(name,opts=Hash.new,&block)
+    def self.store(name, opts=Hash.new,&block)
       raise "You should provide a block" if !block
       @@last_analysis=name
-      @@stored_analysis[name]=Suite.new(name,opts,&block)
+      opts={:name=>name}.merge(opts)
+      @@stored_analysis[name]=Suite.new(opts,&block)
     end
-    # Run analysis +name+
-    # Withoud arguments, run the latest analysis
+    # Run analysis +*args+
+    # Without arguments, run all stored analysis
     # Only 'echo' will be returned to screen
-    def self.run(name=nil)
-      name||=@@last_analysis
-      raise "Analysis #{name} doesn't exists" unless stored_analysis[name]
-      stored_analysis[name].run
+    def self.run(*args)
+      args=stored_analysis.keys if args.size==0
+      raise "Analysis #{args} doesn't exists" if (args - stored_analysis.keys).size>0
+      args.each do |name|
+        stored_analysis[name].run
+      end
     end
-    # Run analysis and return to screen all
-    # echo and summary callings
-    def self.run_batch(name=nil)
-      name||=@@last_analysis
-      raise "Analysis #{name} doesn't exists" unless stored_analysis[name]
-      puts stored_analysis[name].to_text
-    end
-    def self.save(filename, name=nil)
-      name||=@@last_analysis
-      raise "Analysis #{name} doesn't exists" unless stored_analysis[name]
-      puts stored_analysis[name].generate(filename)
+    # Add analysis +*args+ to an reportbuilder object.
+    # Without arguments, add all stored analysis
+    # Each analysis is wrapped inside a ReportBuilder::Section object
+    # This is the method is used by save() and to_text()
+    def self.add_to_reportbuilder(rb, *args)
+      args=stored_analysis.keys if args.size==0
+      raise "Analysis #{name} doesn't exists" if (args - stored_analysis.keys).size>0
+      args.each do |name|
+        section=ReportBuilder::Section.new(:name=>stored_analysis[name].name)
+        rb_an=stored_analysis[name].add_to_reportbuilder(section)
+        rb.add(section)
+        rb_an.run
+      end
     end
+    # Save the analysis on a file
+    # Without arguments, add all stored analysis
+    def self.save(filename, *args)
+      rb=ReportBuilder.new(:name=>filename)
+      add_to_reportbuilder(rb, *args)
+      rb.save(filename)
+    end
     # Run analysis and return as string
     # output of echo callings
-    def self.to_text(name=nil)
-      name||=@@last_analysis
-      raise "Analysis #{name} doesn't exists" unless stored_analysis[name]
-      stored_analysis[name].to_text
+    # Without arguments, add all stored analysis
+    def self.to_text(*args)
+      rb=ReportBuilder.new(:name=>"Analysis #{Time.now}")
+      add_to_reportbuilder(rb, *args)
+      rb.to_text
     end
+    # Run analysis and return to screen all
+    # echo and summary callings
+    def self.run_batch(*args)
+      puts to_text(*args)
+    end
   end
 end
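In 1.1.0, Analysis.run changes from "run one named (or the last) analysis" to "run every analysis named in `*args`, or all stored analyses when called with no arguments", and it rejects unknown names. The following simplified model shows that calling convention. MiniAnalysis is a hypothetical stand-in, not the real module, and it returns each block's result for easy inspection, whereas the real `run` simply iterates.

```ruby
# Hypothetical, simplified model of the 1.1.0 Analysis registry.
module MiniAnalysis
  @stored = {} # name => block, kept in insertion order

  def self.store(name, &block)
    raise ArgumentError, "You should provide a block" unless block
    @stored[name] = block
  end

  def self.clear
    @stored.clear
  end

  # No names: run every stored analysis. Otherwise run only the named
  # ones, failing before running anything if any name is unknown.
  def self.run(*names)
    names = @stored.keys if names.empty?
    missing = names - @stored.keys
    raise ArgumentError, "Analysis #{missing.join(', ')} doesn't exist" unless missing.empty?
    names.map { |name| @stored[name].call }
  end
end

MiniAnalysis.store(:mean) { [1, 2, 3].sum / 3.0 }
MiniAnalysis.store(:max)  { [1, 2, 3].max }
p MiniAnalysis.run       # prints [2.0, 3]
p MiniAnalysis.run(:max) # prints [3]
```

Interpolating `names - @stored.keys` into the error, rather than the whole argument list, points the message at the offending names only.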

data/lib/statsample/analysis/suite.rb CHANGED

@@ -5,29 +5,42 @@ module Statsample
       attr_accessor :output
       attr_accessor :name
       attr_reader :block
-      def initialize(name,opts=Hash.new(),&block)
-        @name=name
+      def initialize(opts=Hash.new(), &block)
+        if !opts.is_a? Hash
+          opts={:name=>opts}
+        end
         @block=block
+        @name=opts[:name] || "Analysis #{Time.now}"
         @attached=[]
         @output=opts[:output] || ::STDOUT
       end
       # Run the analysis, putting output on
       def run
          @block.arity<1 ? instance_eval(&@block) : @block.call(self)
       end
+      # Provides a description of the procedure. Only appears as a commentary on
+      # SuiteReportBuilder outputs
+      def desc(d)
+        @output.puts("Description:")
+        @output.puts("  #{d}")
+      end
       def echo(*args)
         @output.puts(*args)
       end
       def summary(obj)
         obj.summary
       end
+      def add_to_reportbuilder(rb)
+        SuiteReportBuilder.new({:name=>name, :rb=>rb}, &block)
+      end
       def generate(filename)
-        ar=SuiteReportBuilder.new(name,&block)
+        ar=SuiteReportBuilder.new({:name=>name}, &block)
         ar.generate(filename)
       end
       def to_text
-        ar=SuiteReportBuilder.new(name, &block)
+        ar=SuiteReportBuilder.new({:name=>name}, &block)
         ar.to_text
       end
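Suite#run accepts two block styles. A block with no parameters is `instance_eval`'d, so bare DSL calls such as `echo` resolve against the suite. A block that declares a parameter is called with the suite as its argument. The following minimal sketch isolates that dispatch. MiniSuite is a hypothetical class, not Statsample::Analysis::Suite.

```ruby
# Hypothetical class isolating the arity-based dispatch of Suite#run.
class MiniSuite
  attr_reader :lines

  def initialize(&block)
    @block = block
    @lines = []
  end

  def echo(*args)
    @lines.concat(args.map(&:to_s))
  end

  # Zero-parameter blocks become a DSL (self is the suite);
  # blocks with a parameter receive the suite explicitly.
  def run
    @block.arity < 1 ? instance_eval(&@block) : @block.call(self)
    self
  end
end

dsl      = MiniSuite.new { echo "mean", 2.5 }
explicit = MiniSuite.new { |s| s.echo "mean", 2.5 }
p dsl.run.lines      # prints ["mean", "2.5"]
p explicit.run.lines # prints ["mean", "2.5"]
```

The DSL form is shorter for interactive sessions. However, `instance_eval` rebinds `self`, so the caller's instance variables are not visible inside the block. The explicit-parameter form keeps them reachable.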

data/lib/statsample/analysis/suitereportbuilder.rb CHANGED

@@ -2,9 +2,12 @@ module Statsample
   module Analysis
     class SuiteReportBuilder < Suite
       attr_accessor :rb
-      def initialize(name,&block)
-        super(name,&block)
-        @rb=ReportBuilder.new(:name=>name)
+      def initialize(opts=Hash.new,&block)
+        if !opts.is_a? Hash
+          opts={:name=>opts}
+        end
+        super(opts,&block)
+        @rb=opts[:rb] || ReportBuilder.new(:name=>name)
       end
       def generate(filename)
         run if @block
@@ -17,6 +20,9 @@ module Statsample
       def summary(o)
         @rb.add(o)
       end
+      def desc(d)
+        @rb.add(d)
+      end
       def echo(*args)
         args.each do |a|
           @rb.add(a)

data/lib/statsample/anova.rb CHANGED

@@ -18,5 +18,7 @@ module Statsample
     end
   end
 end
 require 'statsample/anova/oneway'
+require 'statsample/anova/contrast'
 require 'statsample/anova/twoway'

data/lib/statsample/anova/contrast.rb ADDED

@@ -0,0 +1,79 @@
+module Statsample
+  module Anova
+    class Contrast
+      attr_reader :psi
+      attr_reader :msw
+      include Summarizable
+      def initialize(opts=Hash.new)
+        raise "Should set at least vectors options" if opts[:vectors].nil?
+        @vectors=opts[:vectors]
+        @c=opts[:c]
+        @c1,@c2=opts[:c1], opts[:c2]
+        @t_options=opts[:t_options] || {:estimate_name=>_("Psi estimate")}
+        @name=opts[:name] || _("Contrast")
+        psi
+        @anova=Statsample::Anova::OneWayWithVectors.new(@vectors)
+        @msw=@anova.msw
+      end
+      # Hypothesis contrast, selecting index for each constrast
+      # For example, if you want to contrast x_0 against x_1 and x_2
+      # you should use
+      # c.contrast([0],[1,2])
+      def c_by_index(c1,c2)
+        contrast=[0]*@vectors.size
+        c1.each {|i| contrast[i]=1.quo(c1.size)}
+        c2.each {|i| contrast[i]=-1.quo(c2.size)}
+        @c=contrast
+        c(contrast)
+      end
+      def psi
+        if @psi.nil?
+          c(@c) if @c
+          c_by_index(@c1,@c2) if (@c1 and @c2)
+        end
+        @psi
+      end
+      def confidence_interval(cl=nil)
+        t_object.confidence_interval(cl)
+      end
+      # Hypothesis contrast, using custom values
+      # Every parameter is a contrast value. You should use
+      # the same number of contrast as vectors on class and the sum
+      # of constrast should be 0.
+      def c(args=nil)
+        return @c if args.nil?
+        @c=args
+        raise "contrast number!=vector number" if args.size!=@vectors.size
+        #raise "Sum should be 0" if args.inject(0) {|ac,v| ac+v}!=0
+        @psi=args.size.times.inject(0) {|ac,i| ac+(args[i]*@vectors[i].mean)}
+      end
+      def standard_error
+        sum=@vectors.size.times.inject(0) {|ac,i|
+          ac+((@c[i].rationalize**2).quo(@vectors[i].size))
+        }
+        Math.sqrt(@msw*sum)
+      end
+      alias :se :standard_error
+      def df
+        @vectors.inject(0) {|ac,v| ac+v.size}-@vectors.size
+      end
+      def t_object
+        Statsample::Test::T.new(psi, se, df, @t_options)
+      end
+      def t
+        t_object.t
+      end
+      def probability
+        t_object.probability
+      end
+      def report_building(builder)
+         builder.section(:name=>@name) do |s|
+           s.text _("Contrast:%s") % c.join(",")
+           s.parse_element(t_object)
+         end
+      end
+    end
+  end
+end
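The new Contrast class computes its estimate and test from a few formulas, which can be read off the code above:
- psi = sum_i c_i * mean_i
- se = sqrt(MSW * sum_i c_i^2 / n_i)
- df = N - k, where N is the total sample size and k is the number of groups
- t = psi / se

The weights built by `c_by_index` give +1/|c1| to each index in the first set and -1/|c2| to each index in the second, for example [0] versus [1, 2]. The following plain-Ruby sketch recomputes these quantities. It assumes that MSW is the usual pooled within-group mean square, SSW / (N - k). The helpers `mean`, `msw`, `weights_by_index` and `contrast` are hypothetical names, and no p-value is computed (the real class delegates that to Statsample::Test::T).

```ruby
# Hypothetical helpers; each group is a numeric Array.

def mean(xs)
  xs.sum / xs.size.to_f
end

# Pooled within-group mean square: SSW / (N - k).
def msw(groups)
  ssw = groups.sum { |g| m = mean(g); g.sum { |x| (x - m)**2 } }
  ssw / (groups.sum(&:size) - groups.size)
end

# Same weighting scheme as c_by_index: +1/|c1| and -1/|c2|, 0 elsewhere.
def weights_by_index(k, c1, c2)
  w = [0.0] * k
  c1.each { |i| w[i] = 1.0 / c1.size }
  c2.each { |i| w[i] = -1.0 / c2.size }
  w
end

def contrast(groups, c)
  raise ArgumentError, "contrast number != vector number" if c.size != groups.size
  psi = groups.each_with_index.sum { |g, i| c[i] * mean(g) }
  se  = Math.sqrt(msw(groups) * groups.each_with_index.sum { |g, i| c[i]**2 / g.size.to_f })
  df  = groups.sum(&:size) - groups.size
  { psi: psi, se: se, df: df, t: psi / se }
end

groups = [[10, 12, 14], [6, 8, 10], [4, 6, 8]]
p contrast(groups, weights_by_index(3, [0], [1, 2]))
```

Here the group means are 12, 8 and 6, and MSW = 24 / 6 = 4. These give psi = 12 - (8 + 6) / 2 = 5, se = sqrt(4 * 0.5) = sqrt(2), and df = 6. Like the shipped code, in which the sum-to-zero check is commented out, the sketch does not enforce that the weights sum to zero.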