RubyGems - rails-data-explorer - Versions diffs - 0.2.3 → 1.0.0 - Mend

rails-data-explorer 0.2.3 → 1.0.0

Files changed (71)

data/lib/rails-data-explorer/statistics/pearsons_chi_squared_independence_test.rb DELETED

@@ -1,75 +0,0 @@
-=begin
-From http://en.wikipedia.org/wiki/Pearson's_chi-squared_test
-Pearson's chi-squared test is used to assess whether paired observations on two
-variables, expressed in a contingency table, are independent of each other.
-An "observation" consists of the values of two outcomes and the null hypothesis
-is that the occurrence of these outcomes is statistically independent. Each
-observation is allocated to one cell of a two-dimensional array of cells (called
-a contingency table) according to the values of the two outcomes.
-Assumptions
------------
-The chi-squared test, when used with the standard approximation that a chi-
-squared distribution is applicable, has the following assumptions:
-* Simple random sample – The sample data is a random sampling from a fixed
-  distribution or population where every collection of members of the population
-  of the given sample size has an equal probability of selection. Variants of
-  the test have been developed for complex samples, such as where the data is
-  weighted. Other forms can be used such as purposive sampling.
-* Sample size (whole table) – A sample with a sufficiently large size is assumed.
-  If a chi squared test is conducted on a sample with a smaller size, then the
-  chi squared test will yield an inaccurate inference. The researcher, by using
-  chi squared test on small samples, might end up committing a Type II error.
-* Expected cell count – Adequate expected cell counts. Some require 5 or more,
-  and others require 10 or more. A common rule is 5 or more in all cells of a
-  2-by-2 table, and 5 or more in 80% of cells in larger tables, but no cells
-  with zero expected count. When this assumption is not met, Yates's Correction
-  is applied.
-* Independence – The observations are always assumed to be independent of each
-  other. This means chi-squared cannot be used to test correlated data
-  (like matched pairs or panel data). In those cases you might want to turn to
-  McNemar's test.
-Problems
---------
-The approximation to the chi-squared distribution breaks down if expected
-frequencies are too low. It will normally be acceptable so long as no more than
-20% of the events have expected frequencies below 5. Where there is only 1
-degree of freedom, the approximation is not reliable if expected frequencies are
-below 10. In this case, a better approximation can be obtained by reducing the
-absolute value of each difference between observed and expected frequencies by
-0.5 before squaring; this is called Yates's correction for continuity.
-In cases where the expected value, E, is found to be small (indicating a small
-underlying population probability, and/or a small number of observations), the
-normal approximation of the multinomial distribution can fail, and in such cases
-it is found to be more appropriate to use the G-test, a likelihood ratio-based
-test statistic. Where the total sample size is small, it is necessary to use an
-appropriate exact test, typically either the binomial test or (for contingency
-tables) Fisher's exact test. This test uses the conditional distribution of the
-test statistic given the marginal totals; however, it does not assume that the
-data were generated from an experiment in which the marginal totals are fixed
-and is valid whether or not that is the case.
-=end
-class RailsDataExplorer
-  module Statistics
-    class PearsonsChiSquaredIndependenceTest
-      #
-      def initialize(data_matrix, min_probability = 0.05)
-      end
-      def compute
-      end
-    end
-  end
-end
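
The deleted class was only a stub: `initialize` and `compute` had empty bodies, so the test described in its header comment was never implemented in this file. For reference, the statistic that comment describes could be sketched as below. This is a hypothetical illustration, not code from the gem. The method name `chi_squared_independence`, the `yates:` keyword, and the returned hash keys are all assumptions. It takes the contingency table as an array of rows of observed counts, and it computes no p-value.

```ruby
# Hypothetical sketch, not part of rails-data-explorer.
# Computes Pearson's chi-squared statistic for a contingency table given as
# an array of rows of observed counts. Requires Ruby 2.4+ for Array#sum.
def chi_squared_independence(observed, yates: false)
  row_totals = observed.map(&:sum)
  col_totals = observed.transpose.map(&:sum)
  total = row_totals.sum.to_f

  statistic = 0.0
  observed.each_with_index do |row, i|
    row.each_with_index do |count, j|
      # Expected count under independence: row total * column total / N.
      # A zero expected count is not handled here; the header comment notes
      # that the test assumes no cells with zero expected count.
      expected = row_totals[i] * col_totals[j] / total
      diff = (count - expected).abs
      # Yates's continuity correction: shrink |O - E| by 0.5 before squaring.
      # Clamping at zero is an added safeguard (an assumption of this sketch).
      diff = [diff - 0.5, 0.0].max if yates
      statistic += diff**2 / expected
    end
  end

  # Degrees of freedom for an r-by-c table: (r - 1) * (c - 1).
  dof = (observed.size - 1) * (observed.first.size - 1)
  { statistic: statistic, degrees_of_freedom: dof }
end

result = chi_squared_independence([[10, 20], [20, 10]])
corrected = chi_squared_independence([[10, 20], [20, 10]], yates: true)
```

For a 2-by-2 table there is one degree of freedom, which is the case where the comment above recommends Yates's correction when expected frequencies are low.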

data/spec/rails-data-explorer/data_type/categorical_spec.rb DELETED

@@ -1,34 +0,0 @@
-require_relative '../../helper_no_rails'
-class RailsDataExplorer
-  class DataType
-    describe Categorical do
-      let(:dt) { Categorical.new }
-      let(:values) { ['a', 'a', 'b', 'c'] }
-      describe "#descriptive_statistics" do
-        let(:desc_stats) {
-          dt.descriptive_statistics(values)
-        }
-        it "computes count for each uniq val" do
-          desc_stats.detect{ |e| 'a_count' == e[:label] }[:value].must_equal 2
-        end
-        it "computes percent for each uniq val" do
-          desc_stats.detect{ |e| 'a_percent' == e[:label] }[:value].must_equal 50.0
-        end
-        it "computes total count" do
-          desc_stats.detect{ |e| 'Total_count' == e[:label] }[:value].must_equal 4
-        end
-      end
-      describe "#available_chart_types" do
-      end
-    end
-  end
-end
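
The deleted spec documents the output shape it expected from `Categorical#descriptive_statistics`: an array of hashes with `:label` and `:value` keys, including per-value `<value>_count` and `<value>_percent` entries and a `Total_count` entry. A minimal stand-alone sketch that would satisfy those three expectations might look like the following. The free function `categorical_descriptive_statistics` is hypothetical and makes no claim about how the gem implements the method.

```ruby
# Hypothetical sketch, not the gem's implementation.
# Returns an array of { label:, value: } hashes matching the labels the
# deleted spec checks for.
def categorical_descriptive_statistics(values)
  total = values.size
  # Count occurrences of each distinct value, preserving first-seen order.
  counts = values.each_with_object(Hash.new(0)) { |v, h| h[v] += 1 }

  stats = counts.flat_map do |val, count|
    [
      { label: "#{val}_count", value: count },
      { label: "#{val}_percent", value: count * 100.0 / total },
    ]
  end
  stats << { label: 'Total_count', value: total }
end

desc_stats = categorical_descriptive_statistics(['a', 'a', 'b', 'c'])
```

With the spec's input `['a', 'a', 'b', 'c']`, the `a_count` entry is 2, `a_percent` is 50.0, and `Total_count` is 4, which matches the three assertions in the deleted file.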