skab 0.1.0

@@ -0,0 +1,104 @@
+ = Skab
+
+ This is a tool to help run statistical analyses of A/B testing experiments
+ we run here at Songkick.
+
+ We use this utility mainly to generate CSV files that we can plot using Google
+ Docs in order to determine if an A/B test is a success or a failure.
+
+ == Getting started
+
+ * Install skab by running `gem install skab`
+ * Run the utility with the `skab` command
+
+ == Command line arguments
+
+   skab [output] [model] [model_args]
+
+ The command line accepts a variable number of arguments:
+
+ * `output` is the name of the output module used to print the data
+ * `model` is the name of the model used to represent the process being analysed
+ * All other arguments are model dependent and are passed to the model
+
+ == Outputs
+
+ Skab is able to output different statistics, all based on the model used to
+ generate the distribution.
+
+ We currently support two main outputs:
+
+ * Distribution: the discrete probability distribution for each group,
+   based on the model used to represent the process
+ * Differential: the discrete probability distribution for Xb - Xa
+
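As a rough illustration of the differential output, the distribution of Xb - Xa can be built by summing the product of probabilities over every pair of values, assuming the two groups are independent. This is a minimal sketch and not the gem's code: `difference_distribution` is a hypothetical helper, and each input is assumed to be an array where index `n` holds the probability of the value `n`.

```ruby
# Hypothetical helper: probability distribution of (B - A) for two
# independent discrete distributions given as arrays indexed by value.
def difference_distribution(dist_a, dist_b)
  diff = Hash.new(0.0)
  dist_a.each_with_index do |p_a, a|
    dist_b.each_with_index do |p_b, b|
      # Every (a, b) pair contributes its joint probability to the key b - a
      diff[b - a] += p_a * p_b
    end
  end
  diff
end

# Two fair coins: the difference is -1, 0 or 1
coin = [0.5, 0.5]
result = difference_distribution(coin, coin)
result.sort.each { |value, probability| puts "#{value},#{probability}" }
```

The cost grows with the product of the two array lengths, which is worth keeping in mind for wide distributions.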
+ == Models
+
+ Skab currently supports two models to generate a distribution of the mean
+ depending on the actual observed values:
+
+ * Poisson model, working with the rate of events over a specific period of time
+ * Binomial model, working with success rates
+
+ === The Poisson model
+
+ The Poisson model accepts two integer parameters: A and B. Each parameter
+ corresponds to the measured number of events occurring in group A or B,
+ respectively.
+
+ The distribution outputs a list of probabilities, one for each candidate mean,
+ for groups A and B, according to the Poisson law of small numbers.
+
+ Here is an example, with 1450 events observed for group A and 1430 for group B:
+
+   skab distribution poisson 1450 1430
+
+ It is worth noting that the Poisson distribution is expensive to compute for
+ large numbers (> 100), so this model approximates it with a normal
+ distribution whose mean and variance both equal the observed count (called
+ delta in the code).
+
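To get a feel for how close the normal approximation is at the counts used in the example above, the sketch below compares it with an exact Poisson probability computed in log space. It is an illustration only: `poisson_pmf` and `normal_pdf` are hypothetical names, and the log-space evaluation via `Math.lgamma` is one possible way to avoid huge factorials, not what the gem does.

```ruby
# Exact Poisson probability of k events for a given mean, computed in
# log space so that neither mean**k nor k! has to be evaluated directly.
def poisson_pmf(k, mean)
  Math.exp(k * Math.log(mean) - mean - Math.lgamma(k + 1).first)
end

# Normal density with the same mean and a variance equal to the mean.
def normal_pdf(k, mean)
  sigma = Math.sqrt(mean)
  Math.exp(-0.5 * ((k - mean) / sigma)**2) / (sigma * Math.sqrt(2 * Math::PI))
end

[1400, 1450, 1500].each do |k|
  puts "#{k},#{poisson_pmf(k, 1450)},#{normal_pdf(k, 1450)}"
end
```

Near the mean the two columns should agree closely; the gap widens in the tails and for small means, where the Poisson distribution is visibly skewed.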
+ === The binomial model
+
+ The binomial model is used to generate a distribution of success rates
+ depending on the number of trials and successes for each group A and B.
+
+ The distribution outputs a list of probable success rates and their respective
+ probabilities for groups A and B.
+
+ For example, the command below generates the binomial distribution for:
+
+ * 200 successes out of 450 trials for group A
+ * 220 successes out of 470 trials for group B
+
+   skab distribution binomial 450 200 470 220
+
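The idea behind this output can be sketched as follows: evaluate the binomial likelihood on a grid of candidate success rates, then normalise so that the probabilities sum to 1. This is a simplified sketch under assumptions, not the gem's code: `rate_distribution` and its `steps` parameter are hypothetical, and the binomial coefficient is left out because it is the same for every rate and cancels during normalisation.

```ruby
# Hypothetical helper: normalised likelihood of each candidate success
# rate on a grid of (steps + 1) points between 0 and 1.
def rate_distribution(trials, successes, steps = 1000)
  weights = (0..steps).map do |i|
    rate = i.to_f / steps
    # Likelihood of the observed data at this rate, without the constant
    # binomial coefficient. Very large trial counts could underflow here.
    (rate**successes) * ((1 - rate)**(trials - successes))
  end
  total = weights.inject(0.0) { |acc, w| acc + w }
  weights.each_with_index.map { |w, i| [i.to_f / steps, w / total] }
end

# Group A from the example above: 200 successes out of 450 trials
dist_a = rate_distribution(450, 200)
best_rate, _probability = dist_a.max_by { |_rate, probability| probability }
puts "Most probable rate on the grid: #{best_rate}"
```

The most probable rate lands on the grid point closest to the observed ratio of successes to trials.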
+ == Known issues
+
+ This software relies on Hash ordering to display values in the correct order.
+ On Ruby versions older than 1.9, hash ordering wasn't guaranteed, and this
+ will cause some output to be inconsistent (mainly differential CSV and
+ summary outputs).
+
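One possible way around this limitation, shown here only as a sketch, is to iterate over the sorted list of key/value pairs directly instead of rebuilding a Hash from it. `sorted_csv_lines` is a hypothetical helper and is not part of skab; `Hash#sort` returns an array of pairs, whose order does not depend on the Ruby version.

```ruby
# Hypothetical helper: CSV lines ordered by key, without relying on
# Hash iteration order.
def sorted_csv_lines(data)
  # Hash#sort yields an Array of [key, value] pairs sorted by key
  data.sort.map { |key, value| "#{key},#{value}" }
end

differential = { 2 => 0.1, -1 => 0.5, 0 => 0.4 }
lines = sorted_csv_lines(differential)
puts lines
```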
+ == LICENSE
+
+ The MIT License
+
+ Copyright © 2012 Songkick
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy of
+ this software and associated documentation files (the “Software”), to deal in
+ the Software without restriction, including without limitation the rights to
+ use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+ of the Software, and to permit persons to whom the Software is furnished to do
+ so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
@@ -0,0 +1,81 @@
+ #!/usr/bin/env ruby
+
+ version = RUBY_VERSION.split('.').map { |s| s.to_i }
+ if version[0] <= 1 && version[1] < 9
+   STDERR.puts <<-WARN18
+ WARNING: This software relies on a feature available only in Ruby 1.9 and later.
+ Using it under Ruby 1.8 may yield unexpected results. Read the README for more
+ information.
+   WARN18
+ end
+
+ usage = <<-USAGE
+ Usage: skab [output] [model] [extra_params]
+ \tTry `skab help model` for more info about available models
+ \tor `skab help output` for more info about outputs
+ USAGE
+
+ require File.expand_path('../../lib/skab', __FILE__)
+
+ if ARGV.empty?
+   puts usage
+   exit
+ end
+
+ # `skab help ...` prints general, per-model or per-output help and exits
+ if ARGV[0] == 'help'
+   if ARGV[1] == 'model'
+     if ARGV[2]
+       model = Skab::Models.from_name(ARGV[2])
+       if model
+         puts model.help
+       else
+         puts <<-UNKNOWN_MODEL
+ This model doesn't exist. List of known models:
+ #{Skab::Models.model_names.join(', ')}
+         UNKNOWN_MODEL
+       end
+     else
+       puts Skab::Models.help
+     end
+   elsif ARGV[1] == 'output'
+     if ARGV[2]
+       output = Skab::Output.from_name(ARGV[2])
+       if output
+         puts output.help
+       else
+         puts <<-UNKNOWN_OUTPUT
+ This output doesn't exist. List of known outputs:
+ #{Skab::Output.output_names.join(', ')}
+         UNKNOWN_OUTPUT
+       end
+     else
+       puts Skab::Output.help
+     end
+   else
+     puts usage
+   end
+   exit
+ end
+
+ # Everything after the output and model names is passed to the model
+ model_args = ARGV.slice(2, ARGV.length)
+ model_class = Skab::Models.from_name(ARGV[1])
+ unless model_class
+   puts <<-UNKNOWN_MODEL
+ This model doesn't exist. List of known models:
+ #{Skab::Models.model_names.join(', ')}
+   UNKNOWN_MODEL
+   exit
+ end
+
+ output_class = Skab::Output.from_name(ARGV[0])
+ unless output_class
+   puts <<-UNKNOWN_OUTPUT
+ This output doesn't exist. List of known outputs:
+ #{Skab::Output.output_names.join(', ')}
+   UNKNOWN_OUTPUT
+   exit
+ end
+
+ model = model_class.new(model_args)
+ output = output_class.new(STDOUT)
+ output.output(model)
@@ -0,0 +1,6 @@
+ module Skab
+   ROOT = File.expand_path('..', __FILE__)
+   require ROOT + '/skab/models'
+   require ROOT + '/skab/output'
+ end
+
@@ -0,0 +1,26 @@
+ module Skab
+   require ROOT + '/skab/models/poisson'
+   require ROOT + '/skab/models/binomial'
+
+   module Models
+     # Returns the model class for the given name, or nil if it is unknown
+     def self.from_name(name)
+       case name
+       when 'poisson'
+         Poisson
+       when 'binomial'
+         Binomial
+       end
+     end
+
+     def self.model_names
+       ['poisson', 'binomial']
+     end
+
+     def self.help
+       <<-HELP
+ The following models are available: #{model_names.join(', ')}
+ \tTry `skab help model [model]` to find out more about a model
+       HELP
+     end
+   end
+ end
@@ -0,0 +1,83 @@
+ module Skab
+   module Models
+     class Binomial
+
+       def initialize(args)
+         @a_trials = args.shift.to_i
+         @a_success = args.shift.to_i
+         @b_trials = args.shift.to_i
+         @b_success = args.shift.to_i
+       end
+
+       # Rows of [rate, probability for A, probability for B] for rates from
+       # 0.000 to 1.000, with each probability column normalised to sum to 1
+       def distribution
+         return @distribution if @distribution
+         @distribution = []
+         sums = [0, 0, 0]
+         (0..1000).each do |i|
+           rate = i / 1000.0
+           @distribution[i] = [
+             rate,
+             binomial(@a_trials, @a_success, rate),
+             binomial(@b_trials, @b_success, rate)
+           ]
+           sums[1] += @distribution[i][1]
+           sums[2] += @distribution[i][2]
+         end
+         (0..1000).each do |i|
+           @distribution[i][1] /= sums[1]
+           @distribution[i][2] /= sums[2]
+         end
+         @distribution
+       end
+
+       # Maps each rate difference (B - A) to its probability
+       def differential
+         return @differential if @differential
+         @differential = Hash.new(0)
+         (0..1000).each do |i|
+           (0..1000).each do |j|
+             @differential[(j - i) / 1000.0] += distribution[j][2] * distribution[i][1]
+           end
+         end
+         @differential
+       end
+
+       def self.help
+         <<-HELP
+ skab [output] binomial [trials_a] [successes_a] [trials_b] [successes_b]
+ \tWhere: all parameters are integers
+ \tPlots the binomial distribution for A and B, given their respective
+ \tnumber of successes and trials
+         HELP
+       end
+
+       private
+
+       # Probability of observing `success` successes in `trials` trials
+       # when the true success rate is `rate`
+       def binomial(trials, success, rate)
+         binomial_coef(trials, success) *
+           (rate ** success) *
+           ((1 - rate) ** (trials - success))
+       end
+
+       def binomial_coef(n, k)
+         fact(n) / (fact(k) * fact(n - k))
+       end
+
+       def fact(n)
+         f = 1
+         (1..n).each do |i|
+           f *= i
+         end
+         f
+       end
+
+     end
+   end
+ end
@@ -0,0 +1,67 @@
+ module Skab
+   module Models
+     class Poisson
+
+       def initialize(args)
+         @a = args.shift.to_i
+         @b = args.shift.to_i
+       end
+
+       # Rows of [n, probability for A, probability for B] for n in 0..limit
+       def distribution
+         return @distribution if @distribution
+         @distribution = []
+         (0..limit).each do |n|
+           @distribution[n] = []
+           @distribution[n][0] = n
+           @distribution[n][1] = normal_approximation(n, a)
+           @distribution[n][2] = normal_approximation(n, b)
+         end
+         @distribution
+       end
+
+       # Maps each difference (B - A) to its probability
+       def differential
+         return @differential if @differential
+         @differential = Hash.new(0)
+         (0..limit).each do |i|
+           (0..limit).each do |j|
+             @differential[j - i] += distribution[j][2] * distribution[i][1]
+           end
+         end
+         @differential
+       end
+
+       def self.help
+         <<-USAGE
+ skab [output] poisson [a] [b]
+ \tWhere: [a] and [b] are integers
+ \tPlots the poisson distribution for [a] and [b]
+         USAGE
+       end
+
+       private
+
+       attr_reader :a, :b
+
+       # Largest value considered: twice the larger observed count
+       def limit
+         [a, b].max * 2
+       end
+
+       # Normal density with mean delta and variance delta, evaluated at k
+       def normal_approximation(k, delta)
+         sigma = Math.sqrt(delta)
+         (1 / (sigma * Math.sqrt(2 * Math::PI))) * Math.exp(-0.5 * (((k - delta) / sigma)**2))
+       end
+
+       # Exact Poisson probability; not called by distribution or differential
+       def poisson(k, delta)
+         ((delta ** k) * Math.exp(-delta)) / factorial(k)
+       end
+
+       def factorial(n)
+         f = 1
+         (1..n).each do |i|
+           f *= i
+         end
+         f
+       end
+
+     end
+   end
+ end
@@ -0,0 +1,29 @@
+ module Skab
+   require ROOT + '/skab/output/distribution'
+   require ROOT + '/skab/output/differential'
+   require ROOT + '/skab/output/summary'
+
+   module Output
+     # Returns the output class for the given name, or nil if it is unknown
+     def self.from_name(name)
+       case name
+       when 'differential'
+         Differential
+       when 'distribution'
+         Distribution
+       when 'summary'
+         Summary
+       end
+     end
+
+     def self.output_names
+       ['distribution', 'differential', 'summary']
+     end
+
+     def self.help
+       <<-HELP
+ The following outputs are available: #{output_names.join(', ')}
+ \tTry `skab help output [output]` to find out more about a given output
+       HELP
+     end
+   end
+ end
@@ -0,0 +1,44 @@
+ module Skab
+   module Output
+     class Differential
+       def initialize(out)
+         @out = out
+       end
+
+       def output(model)
+         data = model.differential
+
+         # Largest absolute difference that has a non-zero probability
+         range = 0
+         data.each do |k, v|
+           if v != 0 && abs(k) > range
+             range = abs(k)
+           end
+         end
+
+         # Keep a 10% margin on each side of the plotted range
+         range += range / 10
+
+         Hash[data.sort].each do |k, v|
+           if abs(k) <= range
+             @out.puts "#{k},#{v}"
+           end
+         end
+       end
+
+       def self.help
+         <<-HELP
+ Usage: skab differential [model] [parameters]
+ \tOutputs the discrete probability distribution for (B - A) as returned by the
+ \tspecified model. The output is a two-column CSV file, where the first
+ \tcolumn is the value of (B - A) and the second column the
+ \tcorresponding discrete probability
+         HELP
+       end
+
+       private
+
+       def abs(n)
+         n >= 0 ? n : -n
+       end
+     end
+   end
+ end
@@ -0,0 +1,25 @@
+ module Skab
+   module Output
+     class Distribution
+       def initialize(out)
+         @out = out
+       end
+
+       # Prints one CSV row per entry of the model's distribution
+       def output(model)
+         model.distribution.each do |d|
+           @out.puts d.join(',')
+         end
+       end
+
+       def self.help
+         <<-HELP
+ Usage: skab distribution [model] [parameters]
+ \tOutputs the discrete probability distribution for both A and B, as
+ \treturned by the specified model. The output is a three-column CSV
+ \tfile, where the first column is the probable mean and the second and
+ \tthird columns are the corresponding discrete probabilities for A and B.
+         HELP
+       end
+     end
+   end
+ end
@@ -0,0 +1,34 @@
+ module Skab
+   module Output
+     class Summary
+       def initialize(out)
+         @out = out
+       end
+
+       def output(model)
+         sum = 0.0
+         # nil marks "not set yet", so a bound that is exactly 0 is kept
+         min = nil
+         max = nil
+         Hash[model.differential.sort].each do |k, v|
+           sum += v
+           # Lower bound: last value whose cumulative probability is <= 5%
+           if min.nil? || sum <= 0.05
+             min = k
+           end
+           # Upper bound: first value whose cumulative probability is >= 95%
+           if max.nil? && sum >= 0.95
+             max = k
+           end
+         end
+
+         @out.puts "The difference is located between #{min} and #{max} (90% confidence)"
+       end
+
+       def self.help
+         <<-HELP
+ Usage: skab summary [model] [parameters]
+ \tOutputs a summary of the whole statistical analysis conducted on A and
+ \tB, using the specified model.
+         HELP
+       end
+     end
+   end
+ end
metadata ADDED
@@ -0,0 +1,64 @@
+ --- !ruby/object:Gem::Specification
+ name: skab
+ version: !ruby/object:Gem::Version
+   prerelease:
+   version: 0.1.0
+ platform: ruby
+ authors:
+ - Vivien Barousse
+ autorequire:
+ bindir: bin
+ cert_chain: []
+
+ date: 2012-10-10 00:00:00 Z
+ dependencies: []
+
+ description:
+ email: vivien@songkick.com
+ executables:
+ - skab
+ extensions: []
+
+ extra_rdoc_files:
+ - README.rdoc
+ files:
+ - README.rdoc
+ - bin/skab
+ - lib/skab.rb
+ - lib/skab/output.rb
+ - lib/skab/models/binomial.rb
+ - lib/skab/models/poisson.rb
+ - lib/skab/output/summary.rb
+ - lib/skab/output/distribution.rb
+ - lib/skab/output/differential.rb
+ - lib/skab/models.rb
+ homepage: http://github.com/songkick/skab
+ licenses: []
+
+ post_install_message:
+ rdoc_options:
+ - --main
+ - README.rdoc
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ">="
+     - !ruby/object:Gem::Version
+       version: "0"
+ requirements: []
+
+ rubyforge_project:
+ rubygems_version: 1.8.21
+ signing_key:
+ specification_version: 3
+ summary: A/B testing statistical analysis utility
+ test_files: []
+