aggregate 0.2.0 → 0.2.1
- data/.gitignore +1 -0
- data/{README → README.textile} +63 -22
- data/Rakefile +1 -1
- data/VERSION +1 -1
- data/aggregate.gemspec +7 -6
- metadata +6 -5
data/.gitignore
ADDED
@@ -0,0 +1 @@
+pkg/
data/{README → README.textile}
RENAMED
@@ -1,13 +1,20 @@
+h1. Aggregate
+
+By Joseph Ruscio
+
 Aggregate is an intuitive ruby implementation of a statistics aggregator
 including both default and configurable histogram support. It does this
 without recording/storing any of the actual sample values, making it
 suitable for tracking statistics across millions/billions of sample
 without any impact on performance or memory footprint. Originally
-inspired by the Aggregate support in SystemTap
+inspired by the Aggregate support in "SystemTap.":http://sourceware.org/systemtap
 
-
-statistics:
+h2. Getting Started
 
+Aggregates are easy to instantiate, populate with sample data, and then
+inspect for common aggregate statistics:
+
+<pre><code>
 #After instantiation use the << operator to add a sample to the aggregate:
 stats = Aggregate.new
 
@@ -30,14 +37,21 @@ stats.min
 
 # The standard deviation
 stats.std_dev
+</code></pre>
+
+h2. Histograms
 
 Perhaps more importantly than the basic aggregate statistics detailed above
-Aggregate also maintains a histogram of samples.
-
+Aggregate also maintains a histogram of samples. For anything other than
+normally distributed data are insufficient at best and often downright misleading
+37Signals recently posted a terse but effective "explanation":http://37signals.com/svn/posts/1836-the-problem-with-averages of the importance of histograms.
+Aggregates maintains its histogram internally as a set of "buckets".
+Each bucket represents a range of possible sample values. The set of all buckets
+represents the range of "normal" sample values.
 
-
-
-
+h3. Binary Histograms
+
+Without any configuration Aggregate instance maintains a binary histogram, where
 each bucket represents a range twice as large as the preceding bucket i.e.
 [1,1], [2,3], [4,5,6,7], [8,9,10,11,12,13,14,15]. The default binary histogram
 provides for 128 buckets, theoretically covering the range [1, (2^127) - 1]
@@ -50,31 +64,44 @@ fall into some bucket. After using binary histograms to determine
 the coarse-grained characteristics of your sample space you can
 configure a linear histogram to examine it in closer detail.
 
+h3. Linear Histograms
+
 Linear histograms are specified with the three values low, high, and width.
-Low and high
+Low and high specify a range [low, high) of values included in the
 histogram (all others are outliers). Width specifies the number of
 values represented by each bucket and therefore the number of
 buckets i.e. granularity of the histogram. The histogram range
 (high - low) must be a multiple of width:
 
+<pre><code>
 #Want to track aggregate stats on response times in ms
 response_stats = Aggregate.new(0, 2000, 50)
+</code></pre>
 
 The example above creates a linear histogram that tracks the
 response times from 0 ms to 2000 ms in buckets of width 50 ms. Hopefully
-most of your samples fall in the first couple buckets!
-
+most of your samples fall in the first couple buckets!
+
+h3. Histogram Outliers
 
+An Aggregate records any samples that fall outside the histogram range as
+outliers:
+
+<pre><code>
 # Number of samples that fall below the normal range
 stats.outliers_low
 
 # Number of samples that fall above the normal range
 stats.outliers_high
+</code></pre>
+
+h3. Histogram Iterators
 
 Once a histogram is populated Aggregate provides iterator support for
 examining the contents of buckets. The iterators provide both the
 number of samples in the bucket, as well as its range:
 
+<pre><code>
 #Examine every bucket
 @stats.each do |bucket, count|
 end
@@ -82,21 +109,29 @@ end
 #Examine only buckets containing samples
 @stats.each_nonzero do |bucket, count|
 end
+</code></pre>
 
-
-any given number of columns >= 80 (defaults to 80) and sample distribution
-properly sets a marker weight based on the samples per bucket and aligns all
-output. Empty buckets are skipped to conserve screen space.
+h3. Histogram Bar Chart
 
+Finally Aggregate contains sophisticated pretty-printing support to generate
+ASCII bar charts. For any given number of columns >= 80 (defaults to 80) and
+sample distribution the <code>to_s</code> method properly sets a marker weight based on the
+samples per bucket and aligns all output. Empty buckets are skipped to conserve
+screen space.
+
+<pre><code>
 # Generate and display an 80 column histogram
 puts stats.to_s
 
 # Generate and display a 120 column histogram
 puts stats.to_s(120)
+</code></pre>
 
-
-set of 65536 values generated by rand to produce
+This code example populates both a binary and linear histogram with the same
+set of 65536 values generated by <code>rand</code> to produce the
+two histograms that follow it:
 
+<pre><code>
 require 'rubygems'
 require 'aggregate'
 
@@ -112,9 +147,11 @@ end
 
 puts binary_aggregate.to_s
 puts linear_aggregate.to_s
+</code></pre>
+
+h4. Binary Histogram
 
-
-** Binary Histogram**
+<pre><code>
 value |------------------------------------------------------------------| count
     1 |                                                                  |     3
     2 |                                                                  |     1
@@ -134,8 +171,11 @@ value |------------------------------------------------------------------| count
 32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 32961
 ~
 Total |------------------------------------------------------------------| 65535
+</code></pre>
 
-
+h4. Linear (0, 65536, 4096) Histogram
+
+<pre><code>
 value |------------------------------------------------------------------| count
     0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4094
  4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|  4202
@@ -154,11 +194,12 @@ value |------------------------------------------------------------------| count
 57344 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4135
 61440 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |  4144
 Total |------------------------------------------------------------------| 65532
-
+</code></pre>
 We can see from these histograms that Ruby's rand function does a relatively good
 job of distributing returned values in the requested range.
 
-
+h2. NOTES
+
 Ruby doesn't have a log2 function built into Math, so we approximate with
 log(x)/log(2). Theoretically log( 2^n - 1 )/ log(2) == n-1. Unfortunately due
 to precision limitations, once n reaches a certain size (somewhere > 32)
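Note on the README hunks above: they describe two bucketing rules (binary buckets that double in size, and linear buckets of fixed width over [low, high)) and a precision problem with log(x)/log(2). The following is a minimal sketch of how a sample could be mapped to a bucket under each rule. It is not the gem's implementation: the helper names `binary_bucket_index` and `linear_bucket_index` and the `:outlier_low` / `:outlier_high` return values are hypothetical, chosen only for illustration. It assumes integer samples for the binary case and a Ruby recent enough to provide `Integer#bit_length` (2.1 or later), which gives an exact floor(log2(x)) without going through floating point.

```ruby
# Illustrative sketch only: these helpers are hypothetical and are NOT part
# of the aggregate gem's API.

# Binary histogram: bucket k covers [2**k, 2**(k + 1) - 1], i.e. [1,1], [2,3],
# [4,7], ... so the bucket index is floor(log2(value)). Integer#bit_length
# computes this exactly, avoiding the rounding of Math.log(x) / Math.log(2).
def binary_bucket_index(value)
  unless value.is_a?(Integer) && value > 0
    raise ArgumentError, "value must be a positive integer"
  end
  value.bit_length - 1
end

# Linear histogram over [low, high) with fixed-width buckets. Samples outside
# the range are reported as outliers instead of being assigned a bucket.
def linear_bucket_index(value, low, high, width)
  unless ((high - low) % width).zero?
    raise ArgumentError, "(high - low) must be a multiple of width"
  end
  return :outlier_low if value < low
  return :outlier_high if value >= high
  ((value - low) / width).floor
end

puts binary_bucket_index(7)                   # falls in [4,7]
puts binary_bucket_index(2**64 - 1)           # exact even for very large values
puts linear_bucket_index(125, 0, 2000, 50)    # response time of 125 ms
puts linear_bucket_index(2000, 0, 2000, 50)   # high bound is exclusive
```

Newer Rubies also ship `Math.log2`, but it still returns a Float, so an integer-only approach like the one sketched here is one way to keep the boundary case 2^n - 1 in bucket n-1 for large n.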
data/Rakefile
CHANGED
@@ -5,7 +5,7 @@ begin
 Jeweler::Tasks.new do |gemspec|
 gemspec.name = "aggregate"
 gemspec.summary = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support"
-gemspec.description = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support"
+gemspec.description = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate"
 gemspec.email = "jruscio@gmail.com"
 gemspec.homepage = "http://github.com/josephruscio/aggregate"
 gemspec.authors = ["Joseph Ruscio"]
data/VERSION
CHANGED
@@ -1 +1 @@
-0.2.
+0.2.1
data/aggregate.gemspec
CHANGED
@@ -5,20 +5,21 @@
 
 Gem::Specification.new do |s|
 s.name = %q{aggregate}
-s.version = "0.2.
+s.version = "0.2.1"
 
 s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
 s.authors = ["Joseph Ruscio"]
-s.date = %q{2009-09-
-s.description = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support}
+s.date = %q{2009-09-13}
+s.description = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate}
 s.email = %q{jruscio@gmail.com}
 s.extra_rdoc_files = [
 "LICENSE",
-"README"
+"README.textile"
 ]
 s.files = [
-"
-"
+".gitignore",
+"LICENSE",
+"README.textile",
 "Rakefile",
 "VERSION",
 "aggregate.gemspec",
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: aggregate
 version: !ruby/object:Gem::Version
-version: 0.2.
+version: 0.2.1
 platform: ruby
 authors:
 - Joseph Ruscio
@@ -9,11 +9,11 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2009-09-
+date: 2009-09-13 00:00:00 -07:00
 default_executable:
 dependencies: []
 
-description: Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support
+description: "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate"
 email: jruscio@gmail.com
 executables: []
 
@@ -21,10 +21,11 @@ extensions: []
 
 extra_rdoc_files:
 - LICENSE
-- README
+- README.textile
 files:
+- .gitignore
 - LICENSE
-- README
+- README.textile
 - Rakefile
 - VERSION
 - aggregate.gemspec