skab 0.1.0
- data/README.rdoc +104 -0
- data/bin/skab +81 -0
- data/lib/skab.rb +6 -0
- data/lib/skab/models.rb +26 -0
- data/lib/skab/models/binomial.rb +83 -0
- data/lib/skab/models/poisson.rb +67 -0
- data/lib/skab/output.rb +29 -0
- data/lib/skab/output/differential.rb +44 -0
- data/lib/skab/output/distribution.rb +25 -0
- data/lib/skab/output/summary.rb +34 -0
- metadata +64 -0
data/README.rdoc
ADDED
@@ -0,0 +1,104 @@
+= Skab
+
+This is a tool to help run statistical analyses of A/B testing experiments
+we run here at Songkick.
+
+We use this util mainly to generate CSV files that we can plot using Google
+Docs in order to determine if an A/B test is a success or a failure.
+
+== Getting started
+
+* Install skab by running `gem install skab`
+* You can run the util by using the `skab` command line
+
+== Command line arguments
+
+  skab [output] [model] [model_args]
+
+The command line accepts a variable number of arguments:
+
+* `output` is the name of the output module to use to print data
+* `model` is the name of the model used to model the process to analyse
+* All other arguments are model dependent and are passed to the model
+
+== Outputs
+
+Skab is able to output different statistics, all based on the model used to
+generate the distribution.
+
+We currently support two main outputs:
+
+* Distribution: the discrete probability distribution for each group,
+  based on the model used to represent the process
+* Differential: the discrete probability distribution for Xb - Xa
+
+== Models
+
+Skab currently supports two models to generate a distribution of the mean
+depending on the actual observed values:
+
+* Poisson model, working with the rate of events over a specific period of time
+* Binomial model, working with success rates
+
+=== The poisson model
+
+The poisson model accepts two integer parameters: A and B. Each parameter
+corresponds to the measured number of events occurring in group A or B,
+respectively.
+
+The distribution output lists the probability of each possible mean for
+groups A and B, according to the Poisson law of small numbers.
+
+Here is an example, with 1450 events observed for group A and 1430 for group B:
+
+  skab distribution poisson 1450 1430
+
+It is worth noting that the Poisson distribution is expensive to compute for
+large numbers (> 100), so this model approximates it with a normal
+distribution whose mean and variance both equal the observed count.
+
+=== The binomial model
+
+The binomial model is used to generate a distribution of success rates
+depending on a number of trials and successes for each group A and B.
+
+The distribution outputs a list of probable success rates and their respective
+probability for groups A and B.
+
+For example, this command generates the binomial distribution with:
+
+* 200 successes out of 450 trials for group A
+* 220 successes out of 470 trials for group B
+
+  skab distribution binomial 450 200 470 220
+
+== Known issues
+
+This software relies on Hash ordering to display values in the correct order.
+On Ruby versions older than 1.9, hash ordering wasn't guaranteed, and this
+will cause some output to be inconsistent (mainly differential CSV and
+summary outputs).
+
+== LICENSE
+
+The MIT License
+
+Copyright © 2012 Songkick
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the “Software”), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
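The README states that the Poisson model falls back to a normal approximation for large counts. The following is a minimal sketch, not part of the gem, that compares the exact Poisson probability with that approximation for the README's example rate of 1450. The method names `poisson_pmf` and `normal_approximation_pdf` are hypothetical, and the exact value is computed in log space with `Math.lgamma` to avoid overflow.

```ruby
# Exact Poisson probability P(X = k) for a given rate, computed in log space.
# Math.lgamma(k + 1) returns [log(k!), sign]; only the first element is needed.
def poisson_pmf(k, rate)
  log_p = k * Math.log(rate) - rate - Math.lgamma(k + 1).first
  Math.exp(log_p)
end

# Normal density with mean and variance both equal to the rate.
def normal_approximation_pdf(k, rate)
  z = (k - rate) / Math.sqrt(rate)
  Math.exp(-0.5 * z * z) / Math.sqrt(2 * Math::PI * rate)
end

[1400, 1450, 1500].each do |k|
  exact = poisson_pmf(k, 1450)
  approx = normal_approximation_pdf(k, 1450)
  puts format('k=%d exact=%.6f approx=%.6f', k, exact, approx)
end
```

For rates of this size the two values should agree closely near the mean, with a slightly larger gap further out in the tails.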
data/bin/skab
ADDED
@@ -0,0 +1,81 @@
+#!/usr/bin/env ruby
+
+version = RUBY_VERSION.split('.').map { |s| s.to_i }
+if version[0] <= 1 && version[1] < 9
+  STDERR.puts <<-WARN18
+WARNING: This software relies on a feature available in Ruby 1.9 and later.
+Using it under Ruby 1.8 may yield unexpected results. Read the README for more
+information.
+  WARN18
+end
+
+usage = <<-USAGE
+Usage: skab [output] [model] [extra_params]
+\tTry `skab help model` for more info about available models
+\tor `skab help output` for more info about outputs
+USAGE
+
+require File.expand_path('../../lib/skab', __FILE__)
+
+if ARGV.empty?
+  puts usage
+  exit
+end
+
+if ARGV[0] == 'help'
+  if ARGV[1] == 'model'
+    if ARGV[2]
+      model = Skab::Models.from_name(ARGV[2])
+      if model
+        puts model.help
+      else
+        puts <<-UNKNOWN_MODEL
+This model doesn't exist. List of known models:
+#{Skab::Models.model_names.join(', ')}
+        UNKNOWN_MODEL
+      end
+    else
+      puts Skab::Models.help
+    end
+  elsif ARGV[1] == 'output'
+    if ARGV[2]
+      output = Skab::Output.from_name(ARGV[2])
+      if output
+        puts output.help
+      else
+        puts <<-UNKNOWN_OUTPUT
+This output doesn't exist. List of known outputs:
+#{Skab::Output.output_names.join(', ')}
+        UNKNOWN_OUTPUT
+      end
+    else
+      puts Skab::Output.help
+    end
+  else
+    puts usage
+  end
+  exit
+end
+
+model_args = ARGV.slice(2, ARGV.length)
+model_class = Skab::Models.from_name(ARGV[1])
+unless model_class
+  puts <<-UNKNOWN_MODEL
+This model doesn't exist. List of known models:
+#{Skab::Models.model_names.join(', ')}
+  UNKNOWN_MODEL
+  exit
+end
+
+output_class = Skab::Output.from_name(ARGV[0])
+unless output_class
+  puts <<-UNKNOWN_OUTPUT
+This output doesn't exist. List of known outputs:
+#{Skab::Output.output_names.join(', ')}
+  UNKNOWN_OUTPUT
+  exit
+end
+
+model = model_class.new(model_args)
+output = output_class.new(STDOUT)
+output.output(model)
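The script above detects Ruby 1.8 by splitting `RUBY_VERSION` by hand. As an alternative sketch, assuming RubyGems is available at runtime, the same check can be expressed with `Gem::Version`, which compares version segments numerically. The helper name `older_than?` is hypothetical and is not part of skab.

```ruby
require 'rubygems'

# Hypothetical helper: true when version string `current` sorts before `minimum`.
def older_than?(current, minimum)
  Gem::Version.new(current) < Gem::Version.new(minimum)
end

if older_than?(RUBY_VERSION, '1.9')
  warn 'WARNING: hash ordering is not guaranteed on this Ruby version.'
end
```

Comparing `Gem::Version` objects avoids string-comparison pitfalls such as "1.10" sorting before "1.9".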
data/lib/skab.rb
ADDED
data/lib/skab/models.rb
ADDED
@@ -0,0 +1,26 @@
+module Skab
+  require ROOT + '/skab/models/poisson'
+  require ROOT + '/skab/models/binomial'
+
+  module Models
+    def self.from_name(name)
+      case name
+      when 'poisson'
+        Poisson
+      when 'binomial'
+        Binomial
+      end
+    end
+
+    def self.model_names
+      ['poisson', 'binomial']
+    end
+
+    def self.help
+      <<-HELP
+The following models are available: #{model_names.join(', ')}
+\tTry `skab help model [model]` to find out more about a model
+      HELP
+    end
+  end
+end
data/lib/skab/models/binomial.rb
ADDED
@@ -0,0 +1,83 @@
+module Skab
+  module Models
+    class Binomial
+
+      def initialize(args)
+        @a_trials = args.shift.to_i
+        @a_success = args.shift.to_i
+        @b_trials = args.shift.to_i
+        @b_success = args.shift.to_i
+      end
+
+      def distribution
+        return @distribution if @distribution
+        @distribution = []
+        sums = [0, 0, 0]
+        i = 0.0
+        while i <= 1000
+          @distribution[i] = []
+          @distribution[i][0] = i / 1000
+          @distribution[i][1] = binomial(@a_trials, @a_success, i / 1000)
+          @distribution[i][2] = binomial(@b_trials, @b_success, i / 1000)
+          sums[1] += @distribution[i][1]
+          sums[2] += @distribution[i][2]
+          i += 1
+        end
+        i = 0.0
+        while i <= 1000
+          @distribution[i][1] /= sums[1]
+          @distribution[i][2] /= sums[2]
+          i += 1
+        end
+        @distribution
+      end
+
+      def differential
+        return @differential if @differential
+        @differential = Hash.new(0)
+        i = 0.0
+        while i <= 1000
+          j = 0.0
+          while j <= 1000
+            @differential[(j - i) / 1000] += distribution[j][2] * distribution[i][1]
+            j += 1
+          end
+          i += 1
+        end
+        @differential
+      end
+
+      def self.help
+        <<-HELP
+skab [output] binomial [trials_a] [successes_a] [trials_b] [successes_b]
+\tWhere: all parameters are integers
+\tPlots the binomial distribution for A and B, given their respective
+\tnumber of successes and trials
+        HELP
+      end
+
+      private
+
+      attr_reader :a, :b
+
+      def binomial(trials, success, rate)
+        binomial_coef(trials, success) *
+          (rate ** success) *
+          ((1 - rate) ** (trials - success))
+      end
+
+      def binomial_coef(n, k)
+        fact(n) / (fact(k) * fact(n - k))
+      end
+
+      def fact(n)
+        f = 1
+        (1..n).each do |i|
+          f *= i
+        end
+        f
+      end
+
+    end
+  end
+end
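The binomial model multiplies a factorial-based coefficient by small powers of the rate and then normalises over the 0..1000 grid. Because the coefficient is the same at every grid point, it cancels in the normalisation, and for large trial counts it may exceed the Float range. The sketch below is an assumption-labelled alternative, not the gem's code: it works in log space and skips the coefficient entirely. The names `weighted_log` and `normalized_likelihood` are hypothetical.

```ruby
# count * log(p), treating 0 * log(0) as 0 so the grid end points stay defined.
def weighted_log(count, p)
  count.zero? ? 0.0 : count * Math.log(p)
end

# Normalised likelihood of each success rate on a grid of steps + 1 points.
# The binomial coefficient is omitted because it cancels when normalising.
def normalized_likelihood(trials, successes, steps = 1000)
  logs = (0..steps).map do |i|
    rate = i.to_f / steps
    weighted_log(successes, rate) + weighted_log(trials - successes, 1.0 - rate)
  end
  peak = logs.max                                  # subtract the peak for stability
  weights = logs.map { |l| Math.exp(l - peak) }
  total = weights.inject(:+)
  weights.map { |w| w / total }
end

group_a = normalized_likelihood(450, 200)
puts "most likely rate for A: #{group_a.index(group_a.max) / 1000.0}"
```

Subtracting the peak log-likelihood before exponentiating keeps the largest weight at exactly 1, so the same routine also handles trial counts in the hundreds of thousands.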
data/lib/skab/models/poisson.rb
ADDED
@@ -0,0 +1,67 @@
+module Skab
+  module Models
+    class Poisson
+
+      def initialize(args)
+        @a = args.shift.to_i
+        @b = args.shift.to_i
+      end
+
+      def distribution
+        return @distribution if @distribution
+        @distribution = []
+        (0..limit).each do |n|
+          @distribution[n] = []
+          @distribution[n][0] = n
+          @distribution[n][1] = normal_approximation(n, a)
+          @distribution[n][2] = normal_approximation(n, b)
+        end
+        @distribution
+      end
+
+      def differential
+        return @differential if @differential
+        @differential = Hash.new(0)
+        (0..limit).each do |a|
+          (0..limit).each do |b|
+            @differential[b - a] += distribution[b][2] * distribution[a][1]
+          end
+        end
+        @differential
+      end
+
+      def self.help
+        <<-USAGE
+skab [output] poisson [a] [b]
+\tWhere: [a] and [b] are integers
+\tPlots the poisson distribution for [a] and [b]
+        USAGE
+      end
+
+      private
+
+      attr_reader :a, :b
+
+      def limit
+        [a, b].max * 2
+      end
+
+      def normal_approximation(k, delta)
+        (1 / (Math.sqrt(delta) * Math.sqrt(2 * Math::PI))) * Math.exp(-0.5 * (((k - delta) / Math.sqrt(delta))**2))
+      end
+
+      def poisson(k, delta)
+        ((delta ** k) * Math.exp(-delta)) / factorial(k)
+      end
+
+      def factorial(n)
+        f = 1
+        (1..n).each do |i|
+          f *= i
+        end
+        f
+      end
+
+    end
+  end
+end
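The `differential` method above sums products over every pair of grid points, which is quadratic in `limit`. Since both groups are approximated by normal densities, the difference of two independent normals N(a, a) and N(b, b) is itself normal with mean b - a and variance a + b. The sketch below, which is an illustration rather than the gem's implementation, checks that closed form against a brute-force sum. All method names are hypothetical.

```ruby
# Normal density at x for the given mean and variance.
def normal_pdf(x, mean, variance)
  Math.exp(-0.5 * (x - mean)**2 / variance.to_f) / Math.sqrt(2 * Math::PI * variance)
end

# Closed form: density of (B - A) at d when A ~ N(a, a) and B ~ N(b, b).
def difference_density(d, a, b)
  normal_pdf(d, b - a, a + b)
end

# Brute-force equivalent of the nested loops, restricted to one difference d.
def brute_force_difference(d, a, b)
  limit = [a, b].max * 2
  (0..limit).inject(0.0) do |sum, i|
    j = i + d
    next sum unless j.between?(0, limit)
    sum + normal_pdf(i, a, a) * normal_pdf(j, b, b)
  end
end

puts difference_density(10, 50, 60)
puts brute_force_difference(10, 50, 60)
```

The two values should agree to many decimal places as long as the counts are large enough that the grid truncation at 0 and at `limit` is negligible.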
data/lib/skab/output.rb
ADDED
@@ -0,0 +1,29 @@
+module Skab
+  require ROOT + '/skab/output/distribution'
+  require ROOT + '/skab/output/differential'
+  require ROOT + '/skab/output/summary'
+
+  module Output
+    def self.from_name(name)
+      case name
+      when 'differential'
+        Differential
+      when 'distribution'
+        Distribution
+      when 'summary'
+        Summary
+      end
+    end
+
+    def self.output_names
+      ['distribution', 'differential', 'summary']
+    end
+
+    def self.help
+      <<-HELP
+The following outputs are available: #{output_names.join(', ')}
+\tTry `skab help output [output]` to find out more about a given output
+      HELP
+    end
+  end
+end
data/lib/skab/output/differential.rb
ADDED
@@ -0,0 +1,44 @@
+module Skab
+  module Output
+    class Differential
+      def initialize(out)
+        @out = out
+      end
+
+      def output(model)
+        data = model.differential
+
+        range = 0
+        data.each do |k, v|
+          if v != 0 && abs(k) > range
+            range = abs(k)
+          end
+        end
+
+        range += range / 10
+
+        Hash[data.sort].each do |k, v|
+          if abs(k) <= range
+            @out.puts "#{k},#{v}"
+          end
+        end
+      end
+
+      def self.help
+        <<-HELP
+Usage: skab differential [model] [parameters]
+\tOutputs the discrete probability distribution for (B - A) as returned by the
+\tspecified model. The output is a two-column CSV file, where the first
+\tcolumn is the value of (B - A) and the second column is the
+\tcorresponding discrete probability
+        HELP
+      end
+
+      private
+
+      def abs(n)
+        n >= 0 ? n : -n
+      end
+    end
+  end
+end
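The output class writes to whatever object it is given, and it rebuilds a Hash from the sorted pairs before iterating, which only preserves order on Ruby 1.9 and later. As a sketch under those observations, the routine below iterates over the sorted array of pairs directly, so row order does not depend on hash ordering, and it writes into a `StringIO` to show that the output can be captured without touching STDOUT. The method name `write_differential_csv` is hypothetical.

```ruby
require 'stringio'

# Hypothetical stand-in for an output class: writes "value,probability" rows
# in ascending value order to any object that responds to #puts.
def write_differential_csv(out, data)
  data.sort.each { |value, probability| out.puts "#{value},#{probability}" }
end

buffer = StringIO.new
write_differential_csv(buffer, { 1 => 0.25, -1 => 0.25, 0 => 0.5 })
puts buffer.string
```

Passing the destination in, as the gem's classes already do, is what makes this kind of in-memory check possible.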
data/lib/skab/output/distribution.rb
ADDED
@@ -0,0 +1,25 @@
+module Skab
+  module Output
+    class Distribution
+      def initialize(out)
+        @out = out
+      end
+
+      def output(model)
+        model.distribution.each do |d|
+          @out.puts d.join(',')
+        end
+      end
+
+      def self.help
+        <<-HELP
+Usage: skab distribution [model] [parameters]
+\tOutputs the discrete probability distribution for both A and B, as
+\treturned by the specified model. The output is a three-column CSV
+\tfile, where the first column is the probable mean and the second and
+\tthird columns are the corresponding discrete probabilities for A and B.
+        HELP
+      end
+    end
+  end
+end
data/lib/skab/output/summary.rb
ADDED
@@ -0,0 +1,34 @@
+module Skab
+  module Output
+    class Summary
+      def initialize(out)
+        @out = out
+      end
+
+      def output(model)
+        sum = 0.0
+        min = 0
+        max = 0
+        Hash[model.differential.sort].each do |k, v|
+          sum += v
+          if min == 0 || sum <= 0.05
+            min = k
+          end
+          if max == 0 && sum >= 0.95
+            max = k
+          end
+        end
+
+        @out.puts "The difference is located between #{min} and #{max} (90% confidence)"
+      end
+
+      def self.help
+        <<-HELP
+Usage: skab summary [model] [parameters]
+\tOutputs a summary of the whole statistical analysis conducted on A and
+\tB, using the specified model.
+        HELP
+      end
+    end
+  end
+end
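The summary walks the sorted differential, accumulates probability, and records the values where the running total crosses 5% and 95%. In that code the value 0 doubles as "not set yet", so a bound that is legitimately 0 can be overwritten on a later iteration. The sketch below shows the same cumulative scan with `nil` as the sentinel instead. It is an illustrative variant, not the gem's code, and the name `central_interval` is hypothetical.

```ruby
# Returns [lower, upper] values bounding the central `mass` of a discrete
# distribution given as { value => probability }.
def central_interval(distribution, mass = 0.90)
  tail = (1.0 - mass) / 2
  lower = upper = nil
  cumulative = 0.0
  distribution.sort.each do |value, probability|
    cumulative += probability
    # Keep advancing the lower bound while still inside the lower tail.
    lower = value if lower.nil? || cumulative <= tail
    # Fix the upper bound the first time the upper tail is reached.
    upper = value if upper.nil? && cumulative >= 1.0 - tail
  end
  [lower, upper]
end

p central_interval({ -2 => 0.02, -1 => 0.08, 0 => 0.8, 1 => 0.08, 2 => 0.02 })
```

With `nil` sentinels, a distribution whose lower bound is exactly 0 keeps that bound instead of moving on to the next value.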
metadata
ADDED
@@ -0,0 +1,64 @@
+--- !ruby/object:Gem::Specification
+name: skab
+version: !ruby/object:Gem::Version
+  prerelease:
+  version: 0.1.0
+platform: ruby
+authors:
+- Vivien Barousse
+autorequire:
+bindir: bin
+cert_chain: []
+
+date: 2012-10-10 00:00:00 Z
+dependencies: []
+
+description:
+email: vivien@songkick.com
+executables:
+- skab
+extensions: []
+
+extra_rdoc_files:
+- README.rdoc
+files:
+- README.rdoc
+- bin/skab
+- lib/skab.rb
+- lib/skab/output.rb
+- lib/skab/models/binomial.rb
+- lib/skab/models/poisson.rb
+- lib/skab/output/summary.rb
+- lib/skab/output/distribution.rb
+- lib/skab/output/differential.rb
+- lib/skab/models.rb
+homepage: http://github.com/songkick/skab
+licenses: []
+
+post_install_message:
+rdoc_options:
+- --main
+- README.rdoc
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      version: "0"
+requirements: []
+
+rubyforge_project:
+rubygems_version: 1.8.21
+signing_key:
+specification_version: 3
+summary: A/B testing statistical analysis utility
+test_files: []
+
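The metadata above is the serialised form of a gem specification. The gemspec source itself is not part of this diff, so the following is only a sketch of a specification that would carry the same fields, using values copied from the metadata. The wrapper method `skab_spec` is hypothetical.

```ruby
require 'rubygems'

# Hypothetical reconstruction: builds a specification with the fields that
# appear in the metadata. The gem's real gemspec file is not shown in the diff.
def skab_spec
  Gem::Specification.new do |s|
    s.name             = 'skab'
    s.version          = '0.1.0'
    s.summary          = 'A/B testing statistical analysis utility'
    s.authors          = ['Vivien Barousse']
    s.email            = 'vivien@songkick.com'
    s.homepage         = 'http://github.com/songkick/skab'
    s.executables      = ['skab']
    s.extra_rdoc_files = ['README.rdoc']
    s.rdoc_options     = ['--main', 'README.rdoc']
    s.files            = %w[
      README.rdoc bin/skab lib/skab.rb lib/skab/models.rb lib/skab/output.rb
      lib/skab/models/binomial.rb lib/skab/models/poisson.rb
      lib/skab/output/summary.rb lib/skab/output/distribution.rb
      lib/skab/output/differential.rb
    ]
  end
end

puts skab_spec.full_name
```

Note that the metadata records an empty `licenses` list even though the README carries the MIT License text, so this sketch leaves that field unset as well.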