skab 0.1.0

@@ -0,0 +1,104 @@
1
+ = Skab
2
+
3
+ This is a tool to help run statistical analyses of A/B testing experiments
4
+ we run here at Songkick.
5
+
6
+ We use this util mainly to generate CSV files that we can plot using Google
7
+ Docs in order to determine if an A/B test is a success or a failure.
8
+
9
+ == Getting started
10
+
11
+ * Install skab by running `gem install skab`
12
+ * Run the utility with the `skab` command
13
+
14
+ == Command line arguments
15
+
16
+ skab [output] [model] [model_args]
17
+
18
+ The command line accepts a variable number of arguments:
19
+
20
+ * `output` is the name of the output module to use to print data
21
+ * `model` is the name of the statistical model that represents the process being analysed
22
+ * All other arguments are model dependent and are passed to the model
23
+
24
+ == Outputs
25
+
26
+ Skab is able to output different statistics, all based on the model used to
27
+ generate the distribution.
28
+
29
+ We currently support two main outputs:
30
+
31
+ * Distribution: the discrete probability distribution for each group,
32
+ based on the model used to represent the process
33
+ * Differential: the discrete probability distribution for Xb - Xa
34
+
35
+ == Models
36
+
37
+ Skab currently supports two models to generate a distribution of the mean
38
+ depending on the actual observed values:
39
+
40
+ * Poisson model, working with rate of events on a specific period of time
41
+ * Binomial model, working with success rates
42
+
43
+ === The Poisson model
44
+
45
+ The Poisson model accepts two integer parameters: A and B. Each parameter
46
+ corresponds to the measured number of events occurring in group A or B,
47
+ respectively.
48
+
49
+ The distribution output lists, for each candidate mean, its probability for the
50
+ A or B group, according to the Poisson law of small numbers.
51
+
52
+ Here is an example, with 1450 events observed for group A and 1430 for group B:
53
+
54
+ skab distribution poisson 1450 1430
55
+
56
+ It is worth noting that the Poisson distribution is expensive to compute for
57
+ large numbers (> 100), so this model uses an approximation using a normal
58
+ distribution (whose mean and variance both equal the observed count, called delta in the code).
59
+
60
+ === The binomial model
61
+
62
+ The binomial model is used to generate a distribution of success rates
63
+ depending on a number of trials and successes for each group A and B.
64
+
65
+ The distribution outputs a list of probable success rates and their respective
66
+ probability for groups A and B.
67
+
68
+ For example, the command below generates the binomial distribution with:
69
+
70
+ * 200 successes out of 450 trials for group A
71
+ * 220 successes out of 470 trials for group B
72
+
73
+ skab distribution binomial 450 200 470 220
74
+
75
+ == Known issues
76
+
77
+ This software relies on Hash ordering to display values in the correct order.
78
+ On Ruby versions older than 1.9, hash ordering wasn't guaranteed, and this
79
+ will cause some output to be inconsistent (mainly differential CSV and
80
+ summary outputs).
81
+
82
+ == LICENSE
83
+
84
+ The MIT License
85
+
86
+ Copyright © 2012 Songkick
87
+
88
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
89
+ this software and associated documentation files (the “Software”), to deal in
90
+ the Software without restriction, including without limitation the rights to
91
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
92
+ of the Software, and to permit persons to whom the Software is furnished to do
93
+ so, subject to the following conditions:
94
+
95
+ The above copyright notice and this permission notice shall be included in all
96
+ copies or substantial portions of the Software.
97
+
98
+ THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
99
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
100
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
101
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
102
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
103
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
104
+ SOFTWARE.
@@ -0,0 +1,81 @@
1
+ #!/usr/bin/env ruby
2
+
3
+ version = RUBY_VERSION.split('.').map { |s| s.to_i }
4
+ if version[0] <= 1 && version[1] < 9
5
+ STDERR.puts <<-WARN18
6
+ WARNING: This software relies on a feature available only in Ruby 1.9 and later. Using it
7
+ under Ruby 1.8 may yield unexpected results. Read the README for more
8
+ information.
9
+ WARN18
10
+ end
11
+
12
+ usage = <<-USAGE
13
+ Usage: skab [output] [model] [extra_params]
14
+ \tTry `skab help model` for more info about available models
15
+ \tor `skab help output` for more info about outputs
16
+ USAGE
17
+
18
+ require File.expand_path('../../lib/skab', __FILE__)
19
+
20
+ if ARGV.empty?
21
+ puts usage
22
+ exit
23
+ end
24
+
25
+ if ARGV[0] == 'help'
26
+ if ARGV[1] == 'model'
27
+ if ARGV[2]
28
+ model = Skab::Models.from_name(ARGV[2])
29
+ if model
30
+ puts model.help
31
+ else
32
+ puts <<-UNKNOWN_MODEL
33
+ This model doesn't exist. List of known models:
34
+ #{Skab::Models.model_names.join(', ')}
35
+ UNKNOWN_MODEL
36
+ end
37
+ else
38
+ puts Skab::Models.help
39
+ end
40
+ elsif ARGV[1] == 'output'
41
+ if ARGV[2]
42
+ output = Skab::Output.from_name(ARGV[2])
43
+ if output
44
+ puts output.help
45
+ else
46
+ puts <<-UNKNOWN_OUTPUT
47
+ This output doesn't exist. List of known outputs:
48
+ #{Skab::Output.output_names.join(', ')}
49
+ UNKNOWN_OUTPUT
50
+ end
51
+ else
52
+ puts Skab::Output.help
53
+ end
54
+ else
55
+ puts usage
56
+ end
57
+ exit
58
+ end
59
+
60
+ model_args = ARGV.slice(2, ARGV.length)
61
+ model_class = Skab::Models.from_name(ARGV[1])
62
+ unless model_class
63
+ puts <<-UNKNOWN_MODEL
64
+ This model doesn't exist. List of known models:
65
+ #{Skab::Models.model_names.join(', ')}
66
+ UNKNOWN_MODEL
67
+ exit
68
+ end
69
+
70
+ output_class = Skab::Output.from_name(ARGV[0])
71
+ unless output_class
72
+ puts <<-UNKNOWN_OUTPUT
73
+ This output doesn't exist. List of known outputs:
74
+ #{Skab::Output.output_names.join(', ')}
75
+ UNKNOWN_OUTPUT
76
+ exit
77
+ end
78
+
79
+ model = model_class.new(model_args)
80
+ output = output_class.new(STDOUT)
81
+ output.output(model)
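The version guard at the top of this script compares the major and minor components by hand. A minimal alternative sketch, assuming a hypothetical helper named `older_than_1_9?` (not part of the gem), relies on `Array#<=>` to compare the components in order:

```ruby
# Hypothetical helper, not part of skab: true when the version string
# denotes a Ruby older than 1.9.
# Array#<=> compares element by element, so [1, 8, 7] sorts before [1, 9].
def older_than_1_9?(version_string)
  parts = version_string.split('.').map { |s| s.to_i }
  (parts <=> [1, 9]) < 0
end

# Same intent as the guard in the script: warn on stderr for old interpreters.
warn 'WARNING: Ruby 1.9 or later is recommended' if older_than_1_9?(RUBY_VERSION)
```

For real Ruby version strings this should behave like the original condition; it only removes the need to index each component separately.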
@@ -0,0 +1,6 @@
1
+ module Skab
2
+ ROOT = File.expand_path('..', __FILE__)
3
+ require ROOT + '/skab/models'
4
+ require ROOT + '/skab/output'
5
+ end
6
+
@@ -0,0 +1,26 @@
1
+ module Skab
2
+ require ROOT + '/skab/models/poisson'
3
+ require ROOT + '/skab/models/binomial'
4
+
5
+ module Models
6
+ def self.from_name(name)
7
+ case name
8
+ when 'poisson'
9
+ Poisson
10
+ when 'binomial'
11
+ Binomial
12
+ end
13
+ end
14
+
15
+ def self.model_names
16
+ ['poisson', 'binomial']
17
+ end
18
+
19
+ def self.help
20
+ <<-HELP
21
+ The following models are available: #{model_names.join(', ')}
22
+ \tTry `skab help model [model]` to find out more about a model
23
+ HELP
24
+ end
25
+ end
26
+ end
@@ -0,0 +1,83 @@
1
+ module Skab
2
+ module Models
3
+ class Binomial
4
+
5
+ def initialize(args)
6
+ @a_trials = args.shift.to_i
7
+ @a_success = args.shift.to_i
8
+ @b_trials = args.shift.to_i
9
+ @b_success = args.shift.to_i
10
+ end
11
+
12
+ def distribution
13
+ return @distribution if @distribution
14
+ @distribution = []
15
+ sums = [0, 0, 0]
16
+ i = 0.0
17
+ while i <= 1000
18
+ @distribution[i] = []
19
+ @distribution[i][0] = i / 1000
20
+ @distribution[i][1] = binomial(@a_trials, @a_success, i / 1000)
21
+ @distribution[i][2] = binomial(@b_trials, @b_success, i / 1000)
22
+ sums[1] += binomial(@a_trials, @a_success, i / 1000)
23
+ sums[2] += binomial(@b_trials, @b_success, i / 1000)
24
+ i += 1
25
+ end
26
+ i = 0.0
27
+ while i <= 1000
28
+ @distribution[i][1] /= sums[1]
29
+ @distribution[i][2] /= sums[2]
30
+ i += 1
31
+ end
32
+ @distribution
33
+ end
34
+
35
+ def differential
36
+ return @differential if @differential
37
+ @differential = Hash.new(0)
38
+ i = 0.0
39
+ while i <= 1000
40
+ j = 0.0
41
+ while j <= 1000
42
+ @differential[(j - i) / 1000] += distribution[j][2] * distribution[i][1]
43
+ j += 1
44
+ end
45
+ i += 1
46
+ end
47
+ @differential
48
+ end
49
+
50
+ def self.help
51
+ <<-HELP
52
+ skab [output] binomial [trials_a] [successes_a] [trials_b] [successes_b]
53
+ \tWhere: all parameters are integers
54
+ \tPlots the binomial distribution for A and B, given their respective
55
+ \tnumber of successes and trials
56
+ HELP
57
+ end
58
+
59
+ private
60
+
61
+ attr_reader :a_trials, :a_success, :b_trials, :b_success
62
+
63
+ def binomial(trials, success, rate)
64
+ binomial_coef(trials, success) *
65
+ (rate ** success) *
66
+ ((1 - rate) ** (trials - success))
67
+ end
68
+
69
+ def binomial_coef(n, k)
70
+ fact(n) / (fact(k) * fact(n - k))
71
+ end
72
+
73
+ def fact(n)
74
+ f = 1
75
+ (1..n).each do |i|
76
+ f *= i
77
+ end
78
+ f
79
+ end
80
+
81
+ end
82
+ end
83
+ end
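The binomial model above multiplies a factorial-based coefficient by powers of the rate. For large trial counts that coefficient can exceed the range of a Float once it is mixed with floating-point terms. Because the distribution is normalised over the grid of rates, the coefficient is a constant factor and cancels out. The sketch below is an assumption-laden alternative, not the gem's code: `rate_distribution` and `log_term` are hypothetical names, and the weights are computed in log space before normalising.

```ruby
# Hypothetical helper: count * log(probability), treating 0 * log(0) as 0.
def log_term(count, probability)
  count.zero? ? 0.0 : count * Math.log(probability)
end

# Normalised likelihood of each success rate on a grid of steps + 1 points
# between 0 and 1. The binomial coefficient is omitted because it is the
# same for every rate and disappears when the weights are normalised.
def rate_distribution(trials, successes, steps = 1000)
  failures = trials - successes
  log_weights = (0..steps).map do |i|
    rate = i.to_f / steps
    log_term(successes, rate) + log_term(failures, 1.0 - rate)
  end
  peak = log_weights.max # subtract the peak so exp never overflows
  weights = log_weights.map { |lw| Math.exp(lw - peak) }
  total = weights.inject(0.0) { |sum, w| sum + w }
  weights.map { |w| w / total }
end

dist = rate_distribution(450, 200)
puts dist.each_with_index.max_by { |probability, _| probability }.last / 1000.0
```

The returned array is indexed like the `i` loop in the model: entry `i` is the probability assigned to the rate `i / steps`.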
@@ -0,0 +1,67 @@
1
+ module Skab
2
+ module Models
3
+ class Poisson
4
+
5
+ def initialize(args)
6
+ @a = args.shift.to_i
7
+ @b = args.shift.to_i
8
+ end
9
+
10
+ def distribution
11
+ return @distribution if @distribution
12
+ @distribution = []
13
+ (0..limit).each do |n|
14
+ @distribution[n] = []
15
+ @distribution[n][0] = n
16
+ @distribution[n][1] = normal_approximation(n, a)
17
+ @distribution[n][2] = normal_approximation(n, b)
18
+ end
19
+ @distribution
20
+ end
21
+
22
+ def differential
23
+ return @differential if @differential
24
+ @differential = Hash.new(0)
25
+ (0..limit).each do |a|
26
+ (0..limit).each do |b|
27
+ @differential[b - a] += distribution[b][2] * distribution[a][1]
28
+ end
29
+ end
30
+ @differential
31
+ end
32
+
33
+ def self.help
34
+ <<-USAGE
35
+ skab [output] poisson [a] [b]
36
+ \tWhere: [a] and [b] are integers
37
+ \tPlots the poisson distribution for [a] and [b]
38
+ USAGE
39
+ end
40
+
41
+ private
42
+
43
+ attr_reader :a, :b
44
+
45
+ def limit
46
+ [a, b].max * 2
47
+ end
48
+
49
+ def normal_approximation(k, delta)
50
+ (1 / (Math.sqrt(delta) * Math.sqrt(2 * Math::PI))) * Math.exp(-0.5 * (((k - delta) / Math.sqrt(delta))**2))
51
+ end
52
+
53
+ def poisson(k, delta)
54
+ ((delta ** k) * Math.exp(-delta)) / factorial(k)
55
+ end
56
+
57
+ def factorial(n)
58
+ f = 1
59
+ (1..n).each do |i|
60
+ f *= i
61
+ end
62
+ f
63
+ end
64
+
65
+ end
66
+ end
67
+ end
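The Poisson model above evaluates a normal density with mean and variance both equal to the observed count. One way to check how close that is to the exact Poisson probability, without the overflow that `delta ** k` and a full factorial would cause, is to work in log space. This is only a sketch: `poisson_pmf` and `normal_density` are hypothetical names, and `Math.lgamma` stands in for the factorial.

```ruby
# Exact Poisson probability of observing k events when the mean is `mean`,
# computed in log space so that neither mean**k nor k! overflows.
def poisson_pmf(k, mean)
  log_factorial = Math.lgamma(k + 1).first # lgamma returns [value, sign]
  Math.exp(k * Math.log(mean) - mean - log_factorial)
end

# Normal density with mean `mean` and variance `mean`, the same shape the
# model uses as its approximation.
def normal_density(k, mean)
  sd = Math.sqrt(mean)
  Math.exp(-0.5 * ((k - mean) / sd)**2) / (sd * Math.sqrt(2 * Math::PI))
end

# Print both values side by side for a few counts around the mean.
[1400, 1450, 1500].each do |k|
  printf("k=%d exact=%.6f normal=%.6f\n", k, poisson_pmf(k, 1450), normal_density(k, 1450))
end
```

For counts in the thousands, as in the README example, the two columns should agree closely near the mean.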
@@ -0,0 +1,29 @@
1
+ module Skab
2
+ require ROOT + '/skab/output/distribution'
3
+ require ROOT + '/skab/output/differential'
4
+ require ROOT + '/skab/output/summary'
5
+
6
+ module Output
7
+ def self.from_name(name)
8
+ case name
9
+ when 'differential'
10
+ Differential
11
+ when 'distribution'
12
+ Distribution
13
+ when 'summary'
14
+ Summary
15
+ end
16
+ end
17
+
18
+ def self.output_names
19
+ ['distribution', 'differential', 'summary']
20
+ end
21
+
22
+ def self.help
23
+ <<-HELP
24
+ The following outputs are available: #{output_names.join(', ')}
25
+ \tTry `skab help output [output]` to find out more about a given output
26
+ HELP
27
+ end
28
+ end
29
+ end
@@ -0,0 +1,44 @@
1
+ module Skab
2
+ module Output
3
+ class Differential
4
+ def initialize(out)
5
+ @out = out
6
+ end
7
+
8
+ def output(model)
9
+ data = model.differential
10
+
11
+ range = 0
12
+ data.each do |k, v|
13
+ if v != 0 && abs(k) > range
14
+ range = abs(k)
15
+ end
16
+ end
17
+
18
+ range += range / 10
19
+
20
+ Hash[data.sort].each do |k, v|
21
+ if abs(k) <= range
22
+ @out.puts "#{k},#{v}"
23
+ end
24
+ end
25
+ end
26
+
27
+ def self.help
28
+ <<-HELP
29
+ Usage: skab differential [model] [parameters]
30
+ \tOutputs the discrete probability distribution for (B - A) as returned by the
31
+ \tspecified model. The output is a two-column CSV file, where the first
32
+ \tcolumn is the signed value of (B - A) and the second column the
33
+ \tcorresponding discrete probability
34
+ HELP
35
+ end
36
+
37
+ private
38
+
39
+ def abs(n)
40
+ n >= 0 ? n : -n
41
+ end
42
+ end
43
+ end
44
+ end
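The differential that this output prints is built by the models: the probabilities of every pair of outcomes for A and B are multiplied and accumulated under the key B - A. A minimal sketch of that accumulation, assuming the two groups are independent and using a hypothetical function name `difference_distribution`:

```ruby
# Distribution of (B - A) for two independent discrete distributions.
# dist_a[i] and dist_b[i] are the probabilities of outcome i for each group.
def difference_distribution(dist_a, dist_b)
  result = Hash.new(0.0)
  dist_a.each_with_index do |prob_a, a|
    dist_b.each_with_index do |prob_b, b|
      result[b - a] += prob_a * prob_b # every (a, b) pair contributes to b - a
    end
  end
  result
end

diff = difference_distribution([0.5, 0.5], [0.25, 0.75])
diff.sort.each { |value, probability| puts "#{value},#{probability}" }
```

Sorting the pairs before printing, as the output class does, keeps the CSV rows in ascending order of the difference regardless of insertion order.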
@@ -0,0 +1,25 @@
1
+ module Skab
2
+ module Output
3
+ class Distribution
4
+ def initialize(out)
5
+ @out = out
6
+ end
7
+
8
+ def output(model)
9
+ model.distribution.each do |d|
10
+ @out.puts "#{d.join(',')}"
11
+ end
12
+ end
13
+
14
+ def self.help
15
+ <<-HELP
16
+ Usage: skab distribution [model] [parameters]
17
+ \tOutputs the discrete probability distribution for both A and B, as
18
+ \treturned by the specified model. The output is a three-column CSV
19
+ \tfile, where the first column is the probable mean and the second and
20
+ \tthird columns the corresponding discrete probabilities for A and B.
21
+ HELP
22
+ end
23
+ end
24
+ end
25
+ end
@@ -0,0 +1,34 @@
1
+ module Skab
2
+ module Output
3
+ class Summary
4
+ def initialize(out)
5
+ @out = out
6
+ end
7
+
8
+ def output(model)
9
+ sum = 0.0
10
+ min = nil
11
+ max = nil
12
+ Hash[model.differential.sort].each do |k, v|
13
+ sum += v
14
+ if min.nil? || sum <= 0.05
15
+ min = k
16
+ end
17
+ if max.nil? && sum >= 0.95
18
+ max = k
19
+ end
20
+ end
21
+
22
+ @out.puts "The difference is located between #{min} and #{max} (90% confidence)"
23
+ end
24
+
25
+ def self.help
26
+ <<-HELP
27
+ Usage: skab summary [model] [parameters]
28
+ \tOutputs a summary of the whole statistical analysis conducted on A and
29
+ \tB, using the specified model.
30
+ HELP
31
+ end
32
+ end
33
+ end
34
+ end
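The summary walks the sorted differential, accumulating probability until the 5% and 95% marks are crossed. The sketch below restates that walk as a standalone function. `credible_interval` and its `tail` parameter are hypothetical names, and `nil` serves as the "not yet set" marker so that a bound falling exactly on 0 is kept.

```ruby
# Returns [lower, upper] such that roughly (1 - 2 * tail) of the probability
# mass of the differential lies between the two bounds.
def credible_interval(differential, tail = 0.05)
  cumulative = 0.0
  lower = nil
  upper = nil
  differential.sort.each do |value, probability|
    cumulative += probability
    # Keep moving the lower bound while we are still inside the lower tail.
    lower = value if lower.nil? || cumulative <= tail
    # Fix the upper bound the first time the upper tail is reached.
    upper = value if upper.nil? && cumulative >= 1.0 - tail
  end
  [lower, upper]
end

low, high = credible_interval({ -2 => 0.04, -1 => 0.06, 0 => 0.3, 1 => 0.5, 2 => 0.06, 3 => 0.04 })
puts "The difference is located between #{low} and #{high} (90% confidence)"
```

The input is any hash mapping a difference to its probability, which matches what the models' `differential` methods return.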
metadata ADDED
@@ -0,0 +1,64 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: skab
3
+ version: !ruby/object:Gem::Version
4
+ prerelease:
5
+ version: 0.1.0
6
+ platform: ruby
7
+ authors:
8
+ - Vivien Barousse
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+
13
+ date: 2012-10-10 00:00:00 Z
14
+ dependencies: []
15
+
16
+ description:
17
+ email: vivien@songkick.com
18
+ executables:
19
+ - skab
20
+ extensions: []
21
+
22
+ extra_rdoc_files:
23
+ - README.rdoc
24
+ files:
25
+ - README.rdoc
26
+ - bin/skab
27
+ - lib/skab.rb
28
+ - lib/skab/output.rb
29
+ - lib/skab/models/binomial.rb
30
+ - lib/skab/models/poisson.rb
31
+ - lib/skab/output/summary.rb
32
+ - lib/skab/output/distribution.rb
33
+ - lib/skab/output/differential.rb
34
+ - lib/skab/models.rb
35
+ homepage: http://github.com/songkick/skab
36
+ licenses: []
37
+
38
+ post_install_message:
39
+ rdoc_options:
40
+ - --main
41
+ - README.rdoc
42
+ require_paths:
43
+ - lib
44
+ required_ruby_version: !ruby/object:Gem::Requirement
45
+ none: false
46
+ requirements:
47
+ - - ">="
48
+ - !ruby/object:Gem::Version
49
+ version: "0"
50
+ required_rubygems_version: !ruby/object:Gem::Requirement
51
+ none: false
52
+ requirements:
53
+ - - ">="
54
+ - !ruby/object:Gem::Version
55
+ version: "0"
56
+ requirements: []
57
+
58
+ rubyforge_project:
59
+ rubygems_version: 1.8.21
60
+ signing_key:
61
+ specification_version: 3
62
+ summary: A/B testing statistical analysis utility
63
+ test_files: []
64
+