skab 0.1.0
- data/README.rdoc +104 -0
- data/bin/skab +81 -0
- data/lib/skab.rb +6 -0
- data/lib/skab/models.rb +26 -0
- data/lib/skab/models/binomial.rb +83 -0
- data/lib/skab/models/poisson.rb +67 -0
- data/lib/skab/output.rb +29 -0
- data/lib/skab/output/differential.rb +44 -0
- data/lib/skab/output/distribution.rb +25 -0
- data/lib/skab/output/summary.rb +34 -0
- metadata +64 -0
data/README.rdoc
ADDED
@@ -0,0 +1,104 @@
+= Skab
+
+This is a tool to help run statistical analyses of A/B testing experiments
+we run here at Songkick.
+
+We use this util mainly to generate CSV files that we can plot using Google
+Docs in order to determine if an A/B test is a success or a failure.
+
+== Getting started
+
+* Install skab by running `gem install skab`
+* You can run the util by using the `skab` command line
+
+== Command line arguments
+
+  skab [output] [model] [model_args]
+
+The command line accepts a variable number of arguments:
+
+* `output` is the name of the output module to use to print data
+* `model` is the name of the model used to model the process to analyse
+* All other arguments are model dependent and are passed to the model
+
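The split between `output`, `model` and the model-dependent arguments can be sketched as below. This is only an illustration: `split_args` is a hypothetical helper, not a function that skab provides.

```ruby
# Hypothetical helper (not part of skab): split the raw argument list into
# the output name, the model name and the remaining model-specific arguments.
def split_args(argv)
  output_name, model_name, *model_args = argv
  { :output => output_name, :model => model_name, :model_args => model_args }
end

parsed = split_args(%w[distribution binomial 450 200 470 220])
puts parsed[:output]
puts parsed[:model]
puts parsed[:model_args].join(' ')
```

Everything after the second argument is handed over untouched, which is why each model is free to define its own parameters.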
+== Outputs
+
+Skab is able to output different statistics, all based on the model used to
+generate the distribution.
+
+We currently support two main outputs:
+
+* Distribution: the discrete probability distribution for each group,
+  based on the model used to represent the process
+* Differential: the discrete probability distribution for Xb - Xa
+
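To make the differential output concrete, here is a minimal sketch of how a distribution for Xb - Xa can be derived from two per-group distributions. It assumes the two groups are independent; the method name and the toy probabilities are made up for illustration.

```ruby
# Illustrative sketch: given two discrete distributions (value => probability),
# build the distribution of (Xb - Xa), assuming A and B are independent.
def differential(dist_a, dist_b)
  diff = Hash.new(0.0)
  dist_a.each do |xa, pa|
    dist_b.each do |xb, pb|
      # Every pair (xa, xb) contributes its joint probability to xb - xa.
      diff[xb - xa] += pa * pb
    end
  end
  diff
end

dist_a = { 0 => 0.5, 1 => 0.5 }
dist_b = { 0 => 0.25, 1 => 0.75 }
differential(dist_a, dist_b).sort.each do |delta, probability|
  puts "#{delta},#{probability}"
end
```

The cost grows with the product of the two distribution sizes, so wide distributions take noticeably longer to process.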
+== Models
+
+Skab currently supports two models to generate a distribution of the mean
+depending on the actual observed values:
+
+* Poisson model, working with the rate of events over a specific period of time
+* Binomial model, working with success rates
+
+=== The poisson model
+
+The poisson model accepts two integer parameters: A and B. Each parameter
+corresponds to the measured number of events occurring in group A or B,
+respectively.
+
+The distribution outputs a list of probabilities for each possible mean of
+groups A and B, according to the Poisson law of small numbers.
+
+Here is an example, with 1450 events observed for group A and 1430 for group B:
+
+  skab distribution poisson 1450 1430
+
+It is worth noting that the Poisson distribution is expensive to compute for
+large numbers (> 100), so this model approximates it with a normal
+distribution whose mean and variance are both delta, the observed count.
+
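The approximation can be checked on a small case. The sketch below compares a normal density (mean and variance both equal to the count) with the exact Poisson probability. Computing the exact value in log space via `Math.lgamma` is a choice made for this example, not how skab does it.

```ruby
# Normal density with mean and variance both equal to `count`, used as a
# stand-in for the Poisson probability of observing k events.
def normal_approximation(k, count)
  sigma = Math.sqrt(count)
  Math.exp(-0.5 * ((k - count) / sigma)**2) / (sigma * Math.sqrt(2 * Math::PI))
end

# Exact Poisson probability, computed in log space to avoid huge factorials.
# Math.lgamma returns [log(|gamma(x)|), sign]; only the first element is used.
def poisson(k, count)
  Math.exp(k * Math.log(count) - count - Math.lgamma(k + 1).first)
end

[90, 100, 110].each do |k|
  puts "#{k},#{poisson(k, 100)},#{normal_approximation(k, 100)}"
end
```

The two columns agree closely near the mean and drift apart in the tails, where the Poisson distribution is skewed and the normal one is not.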
+=== The binomial model
+
+The binomial model is used to generate a distribution of success rates
+depending on a number of trials and successes for each group A and B.
+
+The distribution outputs a list of probable success rates and their respective
+probability for groups A and B.
+
+For example, this command generates the binomial distribution with:
+
+* 200 successes out of 450 trials for group A
+* 220 successes out of 470 trials for group B
+
+  skab distribution binomial 450 200 470 220
+
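As a rough sketch of the idea, the code below evaluates the binomial likelihood on a grid of candidate success rates and normalises it so the probabilities sum to 1. The helper names are hypothetical, and the log-gamma formulation is an assumption made here to keep large trial counts numerically safe; it is not taken from skab.

```ruby
# Log of the binomial coefficient C(n, k), via the log-gamma function.
def log_binomial_coef(n, k)
  Math.lgamma(n + 1).first - Math.lgamma(k + 1).first - Math.lgamma(n - k + 1).first
end

# Likelihood of seeing `successes` out of `trials` for a given success rate.
def binomial_likelihood(trials, successes, rate)
  # Handle the grid end points explicitly to avoid log(0).
  return (successes == 0 ? 1.0 : 0.0) if rate == 0.0
  return (successes == trials ? 1.0 : 0.0) if rate == 1.0
  Math.exp(log_binomial_coef(trials, successes) +
           successes * Math.log(rate) +
           (trials - successes) * Math.log(1 - rate))
end

# Normalised probabilities over the grid 0.000, 0.001, ..., 1.000.
def rate_distribution(trials, successes, steps = 1000)
  rates = (0..steps).map { |i| i.to_f / steps }
  weights = rates.map { |r| binomial_likelihood(trials, successes, r) }
  total = weights.inject(0.0) { |sum, w| sum + w }
  rates.zip(weights.map { |w| w / total })
end

dist = rate_distribution(450, 200)
best_rate, _prob = dist.max_by { |_rate, prob| prob }
puts "most probable rate for A: #{best_rate}"
```

The most probable rate lands on the grid point closest to successes / trials, and the spread around it narrows as the number of trials grows.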
+== Known issues
+
+This software relies on Hash ordering to display values in the correct order.
+On Ruby versions older than 1.9, hash ordering is not guaranteed, and this
+will cause some output to be inconsistent (mainly differential CSV and
+summary outputs).
+
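One way to avoid depending on Hash ordering is to iterate over sorted key/value pairs rather than over the Hash itself. A minimal sketch, with made-up values:

```ruby
# Illustrative only: iterate over the pairs sorted by key, so the output
# does not depend on the insertion order of the Hash.
differential = { 2 => 0.1, -1 => 0.3, 0 => 0.6 }

differential.sort.each do |delta, probability|
  puts "#{delta},#{probability}"
end
```

`Hash#sort` returns an array of `[key, value]` pairs, so the order of iteration is fixed by the keys alone.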
+== LICENSE
+
+The MIT License
+
+Copyright © 2012 Songkick
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the “Software”), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
data/bin/skab
ADDED
@@ -0,0 +1,81 @@
+#!/usr/bin/env ruby
+
+version = RUBY_VERSION.split('.').map { |s| s.to_i }
+if version[0] <= 1 && version[1] < 9
+  STDERR.puts <<-WARN18
+WARNING: This software relies on features available in Ruby 1.9 or later. Using
+it under Ruby 1.8 may yield unexpected results. Read the README for more
+information.
+  WARN18
+end
+
+usage = <<-USAGE
+Usage: skab [output] [model] [extra_params]
+\tTry `skab help model` for more info about available models
+\tor `skab help output` for more info about outputs
+USAGE
+
+require File.expand_path('../../lib/skab', __FILE__)
+
+if ARGV.empty?
+  puts usage
+  exit
+end
+
+if ARGV[0] == 'help'
+  if ARGV[1] == 'model'
+    if ARGV[2]
+      model = Skab::Models.from_name(ARGV[2])
+      if model
+        puts model.help
+      else
+        puts <<-UNKNOWN_MODEL
+This model doesn't exist. List of known models:
+#{Skab::Models.model_names.join(', ')}
+        UNKNOWN_MODEL
+      end
+    else
+      puts Skab::Models.help
+    end
+  elsif ARGV[1] == 'output'
+    if ARGV[2]
+      output = Skab::Output.from_name(ARGV[2])
+      if output
+        puts output.help
+      else
+        puts <<-UNKNOWN_OUTPUT
+This output doesn't exist. List of known outputs:
+#{Skab::Output.output_names.join(', ')}
+        UNKNOWN_OUTPUT
+      end
+    else
+      puts Skab::Output.help
+    end
+  else
+    puts usage
+  end
+  exit
+end
+
+model_args = ARGV.drop(2) # everything after [output] [model]
+model_class = Skab::Models.from_name(ARGV[1])
+unless model_class
+  puts <<-UNKNOWN_MODEL
+This model doesn't exist. List of known models:
+#{Skab::Models.model_names.join(', ')}
+  UNKNOWN_MODEL
+  exit
+end
+
+output_class = Skab::Output.from_name(ARGV[0])
+unless output_class
+  puts <<-UNKNOWN_OUTPUT
+This output doesn't exist. List of known outputs:
+#{Skab::Output.output_names.join(', ')}
+  UNKNOWN_OUTPUT
+  exit
+end
+
+model = model_class.new(model_args)
+output = output_class.new(STDOUT)
+output.output(model)
data/lib/skab.rb
ADDED
data/lib/skab/models.rb
ADDED
@@ -0,0 +1,26 @@
+module Skab
+  require ROOT + '/skab/models/poisson'
+  require ROOT + '/skab/models/binomial'
+
+  module Models
+    def self.from_name(name)
+      case name
+      when 'poisson'
+        Poisson
+      when 'binomial'
+        Binomial
+      end
+    end
+
+    def self.model_names
+      ['poisson', 'binomial']
+    end
+
+    def self.help
+      <<-HELP
+The following models are available: #{model_names.join(', ')}
+\tTry `skab help model [model]` to find out more about a model
+      HELP
+    end
+  end
+end
data/lib/skab/models/binomial.rb
ADDED
@@ -0,0 +1,83 @@
+module Skab
+  module Models
+    class Binomial
+
+      def initialize(args)
+        @a_trials = args.shift.to_i
+        @a_success = args.shift.to_i
+        @b_trials = args.shift.to_i
+        @b_success = args.shift.to_i
+      end
+
+      def distribution
+        return @distribution if @distribution
+        @distribution = []
+        sums = [0, 0, 0]
+        i = 0.0 # Float on purpose, so that i / 1000 is a float division
+        while i <= 1000
+          @distribution[i] = []
+          @distribution[i][0] = i / 1000
+          @distribution[i][1] = binomial(@a_trials, @a_success, i / 1000)
+          @distribution[i][2] = binomial(@b_trials, @b_success, i / 1000)
+          sums[1] += @distribution[i][1] # reuse the values computed above
+          sums[2] += @distribution[i][2]
+          i += 1
+        end
+        i = 0.0
+        while i <= 1000
+          @distribution[i][1] /= sums[1] # normalise so each column sums to 1
+          @distribution[i][2] /= sums[2]
+          i += 1
+        end
+        @distribution
+      end
+
+      def differential
+        return @differential if @differential
+        @differential = Hash.new(0)
+        i = 0.0
+        while i <= 1000
+          j = 0.0
+          while j <= 1000
+            @differential[(j - i) / 1000] += distribution[j][2] * distribution[i][1]
+            j += 1
+          end
+          i += 1
+        end
+        @differential
+      end
+
+      def self.help
+        <<-HELP
+skab [output] binomial [trials_a] [successes_a] [trials_b] [successes_b]
+\tWhere: all parameters are integers
+\tPlots the binomial distribution for A and B, given their respective
+\tnumber of successes and trials
+        HELP
+      end
+
+      private
+
+      attr_reader :a, :b
+
+      def binomial(trials, success, rate)
+        binomial_coef(trials, success) *
+          (rate ** success) *
+          ((1 - rate) ** (trials - success))
+      end
+
+      def binomial_coef(n, k)
+        fact(n) / (fact(k) * fact(n - k))
+      end
+
+      def fact(n)
+        f = 1
+        (1..n).each do |i|
+          f *= i
+        end
+        f
+      end
+
+    end
+  end
+end
data/lib/skab/models/poisson.rb
ADDED
@@ -0,0 +1,67 @@
+module Skab
+  module Models
+    class Poisson
+
+      def initialize(args)
+        @a = args.shift.to_i
+        @b = args.shift.to_i
+      end
+
+      def distribution
+        return @distribution if @distribution
+        @distribution = []
+        (0..limit).each do |n|
+          @distribution[n] = []
+          @distribution[n][0] = n
+          @distribution[n][1] = normal_approximation(n, a)
+          @distribution[n][2] = normal_approximation(n, b)
+        end
+        @distribution
+      end
+
+      def differential
+        return @differential if @differential
+        @differential = Hash.new(0)
+        (0..limit).each do |a|
+          (0..limit).each do |b|
+            @differential[b - a] += distribution[b][2] * distribution[a][1]
+          end
+        end
+        @differential
+      end
+
+      def self.help
+        <<-USAGE
+skab [output] poisson [a] [b]
+\tWhere: [a] and [b] are integers
+\tPlots the poisson distribution for [a] and [b]
+        USAGE
+      end
+
+      private
+
+      attr_reader :a, :b
+
+      def limit
+        [a, b].max * 2 # twice the larger observed count
+      end
+
+      def normal_approximation(k, delta)
+        (1 / (Math.sqrt(delta) * Math.sqrt(2 * Math::PI))) * Math.exp(-0.5 * (((k - delta) / Math.sqrt(delta))**2))
+      end
+
+      def poisson(k, delta)
+        ((delta ** k) * Math.exp(-delta)) / factorial(k)
+      end
+
+      def factorial(n)
+        f = 1
+        (1..n).each do |i|
+          f *= i
+        end
+        f
+      end
+
+    end
+  end
+end
data/lib/skab/output.rb
ADDED
@@ -0,0 +1,29 @@
+module Skab
+  require ROOT + '/skab/output/distribution'
+  require ROOT + '/skab/output/differential'
+  require ROOT + '/skab/output/summary'
+
+  module Output
+    def self.from_name(name)
+      case name
+      when 'differential'
+        Differential
+      when 'distribution'
+        Distribution
+      when 'summary'
+        Summary
+      end
+    end
+
+    def self.output_names
+      ['distribution', 'differential', 'summary']
+    end
+
+    def self.help
+      <<-HELP
+The following outputs are available: #{output_names.join(', ')}
+\tTry `skab help output [output]` to find out more about a given output
+      HELP
+    end
+  end
+end
data/lib/skab/output/differential.rb
ADDED
@@ -0,0 +1,44 @@
+module Skab
+  module Output
+    class Differential
+      def initialize(out)
+        @out = out
+      end
+
+      def output(model)
+        data = model.differential
+
+        range = 0
+        data.each do |k, v|
+          if v != 0 && abs(k) > range
+            range = abs(k)
+          end
+        end
+
+        range += range / 10 # keep a small margin around the non-zero values
+
+        Hash[data.sort].each do |k, v|
+          if abs(k) <= range
+            @out.puts "#{k},#{v}"
+          end
+        end
+      end
+
+      def self.help
+        <<-HELP
+Usage: skab differential [model] [parameters]
+\tOutputs the discrete probability distribution for (B - A) as returned by the
+\tspecified model. The output is a two-column CSV file, where the first
+\tcolumn is the signed value of (B - A) and the second column the
+\tcorresponding discrete probability
+        HELP
+      end
+
+      private
+
+      def abs(n)
+        n >= 0 ? n : -n
+      end
+    end
+  end
+end
data/lib/skab/output/distribution.rb
ADDED
@@ -0,0 +1,25 @@
+module Skab
+  module Output
+    class Distribution
+      def initialize(out)
+        @out = out
+      end
+
+      def output(model)
+        model.distribution.each do |d|
+          @out.puts d.join(',')
+        end
+      end
+
+      def self.help
+        <<-HELP
+Usage: skab distribution [model] [parameters]
+\tOutputs the discrete probability distribution for both A and B, as
+\treturned by the specified model. The output is a three-column CSV
+\tfile, where the first column is the probable mean and the second and
+\tthird columns the corresponding discrete probabilities for A and B.
+        HELP
+      end
+    end
+  end
+end
data/lib/skab/output/summary.rb
ADDED
@@ -0,0 +1,34 @@
+module Skab
+  module Output
+    class Summary
+      def initialize(out)
+        @out = out
+      end
+
+      def output(model)
+        sum = 0.0
+        min = nil # nil, not 0, marks "not set yet": 0 is a valid difference
+        max = nil
+        Hash[model.differential.sort].each do |k, v|
+          sum += v
+          if min.nil? || sum <= 0.05
+            min = k
+          end
+          if max.nil? && sum >= 0.95
+            max = k
+          end
+        end
+
+        @out.puts "The difference is located between #{min} and #{max} (90% confidence)"
+      end
+
+      def self.help
+        <<-HELP
+Usage: skab summary [model] [parameters]
+\tOutputs a summary of the whole statistical analysis conducted on A and
+\tB, using the specified model.
+        HELP
+      end
+    end
+  end
+end
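The cumulative scan behind the summary output can be sketched on its own as follows. `interval_90` is a hypothetical name and the toy distribution is made up. The sketch uses `nil` rather than `0` as the "not set yet" marker, so a bound that is exactly 0 is kept.

```ruby
# Sketch: 90% interval from a discrete distribution (value => probability).
# The lower bound is the last value whose cumulative probability is still
# at most 5%; the upper bound is the first value reaching at least 95%.
def interval_90(distribution)
  low = nil
  high = nil
  cumulative = 0.0
  distribution.sort.each do |value, probability|
    cumulative += probability
    low = value if low.nil? || cumulative <= 0.05
    high = value if high.nil? && cumulative >= 0.95
  end
  [low, high]
end

dist = { -2 => 0.04, -1 => 0.16, 0 => 0.3, 1 => 0.3, 2 => 0.16, 3 => 0.04 }
low, high = interval_90(dist)
puts "The difference is located between #{low} and #{high} (90% confidence)"
```

If the whole interval sits above 0, the data favours group B; if it straddles 0, the experiment has not separated the two groups at this confidence level.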
metadata
ADDED
@@ -0,0 +1,64 @@
+--- !ruby/object:Gem::Specification
+name: skab
+version: !ruby/object:Gem::Version
+  prerelease:
+  version: 0.1.0
+platform: ruby
+authors:
+- Vivien Barousse
+autorequire:
+bindir: bin
+cert_chain: []
+
+date: 2012-10-10 00:00:00 Z
+dependencies: []
+
+description:
+email: vivien@songkick.com
+executables:
+- skab
+extensions: []
+
+extra_rdoc_files:
+- README.rdoc
+files:
+- README.rdoc
+- bin/skab
+- lib/skab.rb
+- lib/skab/output.rb
+- lib/skab/models/binomial.rb
+- lib/skab/models/poisson.rb
+- lib/skab/output/summary.rb
+- lib/skab/output/distribution.rb
+- lib/skab/output/differential.rb
+- lib/skab/models.rb
+homepage: http://github.com/songkick/skab
+licenses: []
+
+post_install_message:
+rdoc_options:
+- --main
+- README.rdoc
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+requirements: []
+
+rubyforge_project:
+rubygems_version: 1.8.21
+signing_key:
+specification_version: 3
+summary: A/B testing statistical analysis utility
+test_files: []
+