josephruscio-aggregate 0.2.0 → 0.2.1
- data/.gitignore +1 -0
- data/{README → README.textile} +63 -22
- data/Rakefile +1 -1
- data/VERSION +1 -1
- data/aggregate.gemspec +7 -6
- metadata +8 -6
data/.gitignore
ADDED
@@ -0,0 +1 @@
+pkg/
data/{README → README.textile}
RENAMED
@@ -1,13 +1,20 @@
+h1. Aggregate
+
+By Joseph Ruscio
+
 Aggregate is an intuitive ruby implementation of a statistics aggregator
 including both default and configurable histogram support. It does this
 without recording/storing any of the actual sample values, making it
 suitable for tracking statistics across millions/billions of sample
 without any impact on performance or memory footprint. Originally
-inspired by the Aggregate support in SystemTap
+inspired by the Aggregate support in "SystemTap.":http://sourceware.org/systemtap
 
-
-statistics:
+h2. Getting Started
 
+Aggregates are easy to instantiate, populate with sample data, and then
+inspect for common aggregate statistics:
+
+<pre><code>
 #After instantiation use the << operator to add a sample to the aggregate:
 stats = Aggregate.new
 
@@ -30,14 +37,21 @@ stats.min
 
 # The standard deviation
 stats.std_dev
+</code></pre>
+
+h2. Histograms
 
 Perhaps more importantly than the basic aggregate statistics detailed above
-Aggregate also maintains a histogram of samples.
-
+Aggregate also maintains a histogram of samples. For anything other than
+normally distributed data are insufficient at best and often downright misleading
+37Signals recently posted a terse but effective "explanation":http://37signals.com/svn/posts/1836-the-problem-with-averages of the importance of histograms.
+Aggregates maintains its histogram internally as a set of "buckets".
+Each bucket represents a range of possible sample values. The set of all buckets
+represents the range of "normal" sample values.
 
-
-
-
+h3. Binary Histograms
+
+Without any configuration Aggregate instance maintains a binary histogram, where
 each bucket represents a range twice as large as the preceding bucket i.e.
 [1,1], [2,3], [4,5,6,7], [8,9,10,11,12,13,14,15]. The default binary histogram
 provides for 128 buckets, theoretically covering the range [1, (2^127) - 1]
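Note on the hunk above: the README text describes buckets whose width doubles each time ([1,1], [2,3], [4..7], [8..15]). The following is a minimal sketch of that mapping, not the gem's actual implementation. The helper names `binary_bucket_index` and `binary_bucket_range` are hypothetical, and the sketch assumes positive integer samples and uses `Integer#bit_length` rather than the log-based approach mentioned in the README's NOTES.

```ruby
# Hypothetical helpers illustrating the doubling bucket layout described in
# the README text; this is not code from the aggregate gem.

# Index of the bucket holding a positive integer sample:
# 1 -> 0, 2..3 -> 1, 4..7 -> 2, 8..15 -> 3, ...
def binary_bucket_index(sample)
  unless sample.is_a?(Integer) && sample > 0
    raise ArgumentError, "sample must be a positive integer"
  end
  sample.bit_length - 1
end

# Inclusive range of values covered by the bucket at the given index.
def binary_bucket_range(index)
  low = 1 << index
  low..(2 * low - 1)
end

[1, 3, 4, 15, 16].each do |sample|
  index = binary_bucket_index(sample)
  puts "#{sample} falls in bucket #{index} covering #{binary_bucket_range(index)}"
end
```

Each bucket index is simply the position of the sample's highest set bit, which is why every bucket spans twice as many values as the one before it.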
@@ -50,31 +64,44 @@ fall into some bucket. After using binary histograms to determine
 the coarse-grained characteristics of your sample space you can
 configure a linear histogram to examine it in closer detail.
 
+h3. Linear Histograms
+
 Linear histograms are specified with the three values low, high, and width.
-Low and high
+Low and high specify a range [low, high) of values included in the
 histogram (all others are outliers). Width specifies the number of
 values represented by each bucket and therefore the number of
 buckets i.e. granularity of the histogram. The histogram range
 (high - low) must be a multiple of width:
 
+<pre><code>
 #Want to track aggregate stats on response times in ms
 response_stats = Aggregate.new(0, 2000, 50)
+</code></pre>
 
 The example above creates a linear histogram that tracks the
 response times from 0 ms to 2000 ms in buckets of width 50 ms. Hopefully
-most of your samples fall in the first couple buckets!
-
+most of your samples fall in the first couple buckets!
+
+h3. Histogram Outliers
 
+An Aggregate records any samples that fall outside the histogram range as
+outliers:
+
+<pre><code>
 # Number of samples that fall below the normal range
 stats.outliers_low
 
 # Number of samples that fall above the normal range
 stats.outliers_high
+</code></pre>
+
+h3. Histogram Iterators
 
 Once a histogram is populated Aggregate provides iterator support for
 examining the contents of buckets. The iterators provide both the
 number of samples in the bucket, as well as its range:
 
+<pre><code>
 #Examine every bucket
 @stats.each do |bucket, count|
 end
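Note on the hunk above: the Linear Histograms and Histogram Outliers text describes a half-open range [low, high) split into buckets of equal width, with everything else counted as an outlier. A minimal sketch of that rule follows. The helper `linear_bucket_index` and its symbol return values are hypothetical and are not part of the gem's API; only the constraint that (high - low) must be a multiple of width comes from the README.

```ruby
# Hypothetical helper illustrating the [low, high) / width rule from the
# README text; this is not code from the aggregate gem.
def linear_bucket_index(sample, low, high, width)
  unless ((high - low) % width).zero?
    raise ArgumentError, "(high - low) must be a multiple of width"
  end
  return :outlier_low if sample < low
  return :outlier_high if sample >= high

  # Buckets are numbered from 0 starting at low.
  ((sample - low) / width).floor
end

# Same parameters as the README example: 0 ms to 2000 ms in 50 ms buckets.
[-5, 0, 120, 1999, 2000].each do |ms|
  puts "#{ms} ms -> #{linear_bucket_index(ms, 0, 2000, 50).inspect}"
end
```

With these parameters there are 40 buckets, so a 120 ms response lands in bucket 2 and a 2000 ms response is counted as a high outlier because the upper bound is exclusive.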
@@ -82,21 +109,29 @@ end
 #Examine only buckets containing samples
 @stats.each_nonzero do |bucket, count|
 end
+</code></pre>
 
-
-any given number of columns >= 80 (defaults to 80) and sample distribution
-properly sets a marker weight based on the samples per bucket and aligns all
-output. Empty buckets are skipped to conserve screen space.
+h3. Histogram Bar Chart
 
+Finally Aggregate contains sophisticated pretty-printing support to generate
+ASCII bar charts. For any given number of columns >= 80 (defaults to 80) and
+sample distribution the <code>to_s</code> method properly sets a marker weight based on the
+samples per bucket and aligns all output. Empty buckets are skipped to conserve
+screen space.
+
+<pre><code>
 # Generate and display an 80 column histogram
 puts stats.to_s
 
 # Generate and display a 120 column histogram
 puts stats.to_s(120)
+</code></pre>
 
-
-set of 65536 values generated by rand to produce
+This code example populates both a binary and linear histogram with the same
+set of 65536 values generated by <code>rand</code> to produce the
+two histograms that follow it:
 
+<pre><code>
 require 'rubygems'
 require 'aggregate'
 
@@ -112,9 +147,11 @@ end
 
 puts binary_aggregate.to_s
 puts linear_aggregate.to_s
+</code></pre>
+
+h4. Binary Histogram
 
-
-** Binary Histogram**
+<pre><code>
 value |------------------------------------------------------------------| count
 1 | | 3
 2 | | 1
@@ -134,8 +171,11 @@ value |------------------------------------------------------------------| count
 32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 32961
 ~
 Total |------------------------------------------------------------------| 65535
+</code></pre>
 
-
+h4. Linear (0, 65536, 4096) Histogram
+
+<pre><code>
 value |------------------------------------------------------------------| count
 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4094
 4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 4202
@@ -154,11 +194,12 @@ value |------------------------------------------------------------------| count
 57344 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4135
 61440 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4144
 Total |------------------------------------------------------------------| 65532
-
+</code></pre>
 We can see from these histograms that Ruby's rand function does a relatively good
 job of distributing returned values in the requested range.
 
-
+h2. NOTES
+
 Ruby doesn't have a log2 function built into Math, so we approximate with
 log(x)/log(2). Theoretically log( 2^n - 1 )/ log(2) == n-1. Unfortunately due
 to precision limitations, once n reaches a certain size (somewhere > 32)
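Note on the NOTES context lines above: the precision problem with log(x)/log(2) can be observed directly. The sketch below is illustrative only and is not taken from the gem. The method names `approx_log2_floor` and `exact_log2_floor` are hypothetical, and the exact variant assumes a Ruby version that provides `Integer#bit_length`. Once 2^n - 1 needs more bits than a Float mantissa holds, it rounds up to 2^n during conversion, so the floating-point result is expected to drift away from n-1 for larger n.

```ruby
# Illustrative only: compares the log(x)/log(2) approximation mentioned in
# the NOTES with an exact integer computation. Not code from the aggregate gem.

# Floating-point approximation of floor(log2(value)).
def approx_log2_floor(value)
  (Math.log(value) / Math.log(2)).floor
end

# Exact floor(log2(value)) for positive integers.
def exact_log2_floor(value)
  value.bit_length - 1
end

[8, 16, 32, 64, 100].each do |n|
  value = 2**n - 1
  approx = approx_log2_floor(value)
  exact = exact_log2_floor(value)
  marker = approx == exact ? "agree" : "differ"
  puts "n=#{n}: approx=#{approx} exact=#{exact} (#{marker})"
end
```

The integer-only version never converts to Float, so it returns n-1 for 2^n - 1 regardless of how large n gets.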
data/Rakefile
CHANGED
@@ -5,7 +5,7 @@ begin
 Jeweler::Tasks.new do |gemspec|
 gemspec.name = "aggregate"
 gemspec.summary = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support"
-gemspec.description = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support"
+gemspec.description = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate"
 gemspec.email = "jruscio@gmail.com"
 gemspec.homepage = "http://github.com/josephruscio/aggregate"
 gemspec.authors = ["Joseph Ruscio"]
data/VERSION
CHANGED
@@ -1 +1 @@
-0.2.
+0.2.1
data/aggregate.gemspec
CHANGED
@@ -5,20 +5,21 @@
 
 Gem::Specification.new do |s|
 s.name = %q{aggregate}
-s.version = "0.2.
+s.version = "0.2.1"
 
 s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
 s.authors = ["Joseph Ruscio"]
-s.date = %q{2009-09-
-s.description = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support}
+s.date = %q{2009-09-13}
+s.description = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate}
 s.email = %q{jruscio@gmail.com}
 s.extra_rdoc_files = [
 "LICENSE",
-"README"
+"README.textile"
 ]
 s.files = [
-"
-"
+".gitignore",
+"LICENSE",
+"README.textile",
 "Rakefile",
 "VERSION",
 "aggregate.gemspec",
metadata
CHANGED
@@ -1,7 +1,7 @@
 --- !ruby/object:Gem::Specification
 name: josephruscio-aggregate
 version: !ruby/object:Gem::Version
-version: 0.2.
+version: 0.2.1
 platform: ruby
 authors:
 - Joseph Ruscio
@@ -9,11 +9,11 @@ autorequire:
 bindir: bin
 cert_chain: []
 
-date: 2009-09-
+date: 2009-09-13 00:00:00 -07:00
 default_executable:
 dependencies: []
 
-description: Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support
+description: "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate"
 email: jruscio@gmail.com
 executables: []
 
@@ -21,10 +21,11 @@ extensions: []
 
 extra_rdoc_files:
 - LICENSE
-- README
+- README.textile
 files:
+- .gitignore
 - LICENSE
-- README
+- README.textile
 - Rakefile
 - VERSION
 - aggregate.gemspec
@@ -32,6 +33,7 @@ files:
 - test/ts_aggregate.rb
 has_rdoc: false
 homepage: http://github.com/josephruscio/aggregate
+licenses:
 post_install_message:
 rdoc_options:
 - --charset=UTF-8
@@ -52,7 +54,7 @@ required_rubygems_version: !ruby/object:Gem::Requirement
 requirements: []
 
 rubyforge_project:
-rubygems_version: 1.
+rubygems_version: 1.3.5
 signing_key:
 specification_version: 3
 summary: Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support