aggregate_afurmanov 0.2.2
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- data/.gitignore +1 -0
- data/LICENSE +22 -0
- data/README.textile +215 -0
- data/Rakefile +15 -0
- data/VERSION +1 -0
- data/aggregate_afurmanov.gemspec +48 -0
- data/lib/aggregate.rb +298 -0
- data/test/ts_aggregate.rb +162 -0
- metadata +75 -0
data/.gitignore
ADDED
@@ -0,0 +1 @@
pkg/
data/LICENSE
ADDED
@@ -0,0 +1,22 @@
Copyright (c) 2009 Joseph Ruscio

Permission is hereby granted, free of charge, to any person
obtaining a copy of this software and associated documentation
files (the "Software"), to deal in the Software without
restriction, including without limitation the rights to use,
copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the
Software is furnished to do so, subject to the following
conditions:

The above copyright notice and this permission notice shall be
included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
OTHER DEALINGS IN THE SOFTWARE.
data/README.textile
ADDED
@@ -0,0 +1,215 @@
h1. Aggregate

By Joseph Ruscio

Aggregate is an intuitive Ruby implementation of a statistics aggregator
including both default and configurable histogram support. It does this
without recording/storing any of the actual sample values, making it
suitable for tracking statistics across millions or billions of samples
in a small memory footprint that does not grow with the number of samples.
Originally inspired by the Aggregate support in "SystemTap.":http://sourceware.org/systemtap

h2. Getting Started

Aggregates are easy to instantiate, populate with sample data, and then
inspect for common aggregate statistics:

<pre><code>
# After instantiation use the << operator to add a sample to the aggregate:
stats = Aggregate.new

1000.times do
  # Take some action that generates a sample measurement
  sample = rand(100) + 1 # stand-in for a real measurement
  stats << sample
end

# The number of samples
stats.count

# The average
stats.mean

# Max sample value
stats.max

# Min sample value
stats.min

# The standard deviation
stats.std_dev
</code></pre>
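
None of these statistics require the samples themselves. Each can be derived from a handful of running totals. The standalone Ruby sketch below illustrates the idea. <code>RunningStats</code> is a hypothetical class, not part of this gem. It keeps the same totals that <code>Aggregate#<<</code> updates (count, sum, sum of squares, min, max) and uses the same sample standard deviation formula as <code>Aggregate#std_dev</code>.

```ruby
# Hypothetical sketch (not the gem's class): constant-memory statistics
# built from the same running totals Aggregate#<< maintains.
class RunningStats
  attr_reader :count, :sum, :min, :max

  def initialize
    @count = 0
    @sum = 0.0
    @sum2 = 0.0 # running sum of squared samples
    @min = nil
    @max = nil
  end

  def <<(x)
    @min = x if @min.nil? || x < @min
    @max = x if @max.nil? || x > @max
    @count += 1
    @sum += x
    @sum2 += x * x
    self # allow chaining: stats << 1 << 2
  end

  def mean
    @sum / @count
  end

  # Sample standard deviation, same formula as Aggregate#std_dev:
  # sqrt((sum(x^2) - sum(x)^2 / n) / (n - 1))
  def std_dev
    Math.sqrt((@sum2 - (@sum * @sum) / @count) / (@count - 1))
  end
end

stats = RunningStats.new
[2, 4, 4, 4, 5, 5, 7, 9].each { |x| stats << x }
puts stats.count   # => 8
puts stats.mean    # => 5.0
puts stats.std_dev # about 2.138
```

The sum-of-squares formula is compact, but it can lose precision when the mean is large compared with the spread of the samples. Welford's online algorithm is a common, numerically steadier alternative.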

h2. Histograms

Perhaps more important than the basic aggregate statistics detailed above,
Aggregate also maintains a histogram of samples. For anything other than
normally distributed data, the basic statistics are insufficient at best and
often downright misleading. 37Signals recently posted a terse but effective
"explanation":http://37signals.com/svn/posts/1836-the-problem-with-averages of the importance of histograms.

Aggregate maintains its histogram internally as a set of "buckets".
Each bucket represents a range of possible sample values. The set of all buckets
represents the range of "normal" sample values.

h3. Binary Histograms

Without any configuration, an Aggregate instance maintains a binary histogram, where
each bucket represents a range twice as large as the preceding bucket, i.e.
[1,1], [2,3], [4,5,6,7], [8,9,10,11,12,13,14,15]. In this version the binary
histogram has 8 buckets by default (DEFAULT_LOG_BUCKETS), so with the default
low bound of 1, samples of 128 or more are counted as high outliers. Pass the
:log_buckets option to change the number of buckets. For example,
<code>Aggregate.new(:log_buckets => 128)</code> theoretically covers the range
[1, (2^127) - 1] (see NOTES below for a discussion on the effects in practice
of insufficient precision).
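
With the default low bound of 1, bucket i holds the samples in [2^i, 2^(i+1) - 1], so a sample's bucket index is floor(log2(x)). The sketch below computes that index exactly for positive Integers using <code>Integer#bit_length</code>. That method requires Ruby 2.1 or later, which is newer than the Rubies this gem targeted. The gem itself uses floating-point logarithms instead (see NOTES). <code>binary_index</code> and <code>binary_range</code> are hypothetical helper names used only for illustration.

```ruby
# Hypothetical helpers (not part of the gem): exact binary bucket lookup for
# positive Integer samples, assuming the lowest bucket starts at 1.
# Integer#bit_length requires Ruby 2.1 or later.
def binary_index(x)
  unless x.is_a?(Integer) && x > 0
    raise ArgumentError, "expected a positive Integer"
  end
  x.bit_length - 1 # floor(log2(x)) computed without floating point
end

# Inclusive range of sample values that fall into bucket i
def binary_range(i)
  (2**i)..(2**(i + 1) - 1)
end

[1, 2, 3, 4, 7, 8, 15].each do |x|
  i = binary_index(x)
  puts "#{x} -> bucket #{i} #{binary_range(i)}"
end
```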

Binary histograms are useful when we have little idea about what the
sample distribution may look like since, given enough buckets, almost any
positive value will fall into some bucket. After using binary histograms to
determine the coarse-grained characteristics of your sample space you can
configure a linear histogram to examine it in closer detail.

h3. Linear Histograms

Linear histograms are specified with the three values low, high, and width,
passed as the :low, :high, and :width options. Low and high specify a range
[low, high) of values included in the histogram (all others are outliers).
Width specifies the number of values represented by each bucket and therefore
the number of buckets, i.e. the granularity of the histogram. The histogram
range (high - low) must be a positive multiple of width; otherwise the
constructor raises ArgumentError:

<pre><code>
# Track aggregate stats on response times in ms
response_stats = Aggregate.new(:low => 0, :high => 2000, :width => 50)
</code></pre>

The example above creates a linear histogram that tracks response times from
0 ms up to (but not including) 2000 ms in 40 buckets of width 50 ms. Hopefully
most of your samples fall in the first couple of buckets!
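
Bucket i of a linear histogram covers [low + i * width, low + (i + 1) * width). This version's <code>to_index</code> finds a linear bucket by checking the buckets one at a time. The same index can be computed directly with one subtraction and one division. Below is a hypothetical sketch using the response-time parameters above. <code>linear_index</code> is an illustrative name, not part of the gem.

```ruby
# Hypothetical sketch: compute a linear bucket index directly instead of
# scanning the buckets. Returns nil for samples outside [low, high).
def linear_index(x, low, high, width)
  return nil if x < low || x >= high
  ((x - low) / width).floor # floor also handles Float samples
end

low, high, width = 0, 2000, 50
puts "#{(high - low) / width} buckets"

[0, 49, 50, 1999, 2000].each do |ms|
  i = linear_index(ms, low, high, width)
  if i
    puts "#{ms} ms -> bucket #{i} (starts at #{low + i * width} ms)"
  else
    puts "#{ms} ms -> outlier"
  end
end
```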

h3. Histogram Outliers

An Aggregate records any samples that fall outside the histogram range as
outliers:

<pre><code>
# Number of samples that fall below the normal range
stats.outliers_low

# Number of samples at or above the upper bound of the normal range
stats.outliers_high
</code></pre>
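
The range is half-open. Samples below low count as low outliers, and samples at or above high count as high outliers. The hypothetical sketch below applies the same two comparisons to a batch of samples. It uses this version's default binary range: 8 buckets starting at 1, which gives [1, 128). <code>Tally</code> and <code>tally</code> are illustrative names.

```ruby
# Hypothetical sketch: classify samples against a half-open range
# [low, high) with the same comparisons Aggregate uses for outliers.
Tally = Struct.new(:in_range, :outliers_low, :outliers_high)

def tally(samples, low, high)
  t = Tally.new(0, 0, 0)
  samples.each do |x|
    if x < low
      t.outliers_low += 1   # below the first bucket
    elsif x >= high
      t.outliers_high += 1  # high itself is excluded from the range
    else
      t.in_range += 1
    end
  end
  t
end

# Default binary histogram in this version: 8 buckets from 1, i.e. [1, 128)
t = tally([0, 1, 5, 127, 128, 1000], 1, 128)
puts "low=#{t.outliers_low} high=#{t.outliers_high} in range=#{t.in_range}"
# prints low=1 high=2 in range=3
```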

h3. Histogram Iterators

Once a histogram is populated, Aggregate provides iterator support for
examining the contents of buckets. For each bucket, the iterators yield
its lower bound and the number of samples it contains:

<pre><code>
# Examine every bucket
stats.each do |bucket, count|
end

# Examine only buckets containing samples
stats.each_nonzero do |bucket, count|
end
</code></pre>
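
Both iterators walk the internal bucket array in index order and yield each bucket's lower bound together with its count. <code>each_nonzero</code> additionally skips empty buckets. The hypothetical <code>TinyHistogram</code> class below mimics that behaviour for a linear histogram. Unlike the gem's iterators, it also returns an Enumerator when no block is given. This lets you collect results with <code>to_a</code> or chain them with <code>select</code> and <code>map</code>.

```ruby
# Hypothetical sketch (not the gem's code) of bucket iteration for a
# linear histogram: each bucket is reported by its lower bound.
class TinyHistogram
  def initialize(low, width, counts)
    @low = low
    @width = width
    @counts = counts # one sample count per bucket, in index order
  end

  # Yields (lower bound, count) for every bucket, empty or not.
  def each
    return enum_for(:each) unless block_given?
    @counts.each_with_index { |count, i| yield(@low + i * @width, count) }
  end

  # Yields (lower bound, count) only for buckets that contain samples.
  def each_nonzero
    return enum_for(:each_nonzero) unless block_given?
    each { |bucket, count| yield(bucket, count) unless count.zero? }
  end
end

h = TinyHistogram.new(0, 50, [3, 0, 1, 0])
h.each_nonzero { |bucket, count| puts "#{bucket}: #{count}" }
```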

h3. Histogram Bar Chart

Finally, Aggregate can pretty-print the histogram as an ASCII bar chart. For
any given number of columns >= 80 (defaults to 80; smaller values raise
ArgumentError) and sample distribution, the <code>to_s</code> method sets a
marker weight based on the samples per bucket and aligns all output. Empty
buckets are skipped to conserve screen space. A '~' row marks a gap between
displayed buckets, and another marks any empty buckets after the last one.

<pre><code>
# Generate and display an 80 column histogram
puts stats.to_s

# Generate and display a 120 column histogram
puts stats.to_s(120)
</code></pre>
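
The marker weight is the number of samples that one '@' stands for. <code>to_s</code> first subtracts the value column, the count column and the two separators from the requested width. It then divides the largest bucket count by the remaining bar width to get the weight, which is never allowed to drop below 1. Each row gets the truncated value of count / weight markers. A standalone sketch of that scaling follows. <code>bar_width</code> and <code>bar_for</code> are hypothetical helper names.

```ruby
# Hypothetical helpers mirroring the bar scaling in Aggregate#to_s.

# Space left for bars once the value column, " |", "| " and the count
# column are subtracted from the total line width.
def bar_width(columns, value_width, count_width)
  columns - (value_width + " |".length + "| ".length + count_width)
end

# One '@' stands for `weight` samples; the weight never drops below 1,
# so small counts are drawn one marker per sample.
def bar_for(count, max_count, max_bar_width)
  weight = [max_count.to_f / max_bar_width, 1.0].max
  "@" * (count / weight).to_i
end

max_bar_width = bar_width(80, 5, 5) # 80 - (5 + 2 + 2 + 5) = 66
counts = [1, 3, 40, 132]
counts.each do |count|
  bar = bar_for(count, counts.max, max_bar_width)
  puts "#{bar.ljust(max_bar_width)}| #{count}"
end
```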

This code example populates both a binary and a linear histogram with the same
set of 65536 values generated by <code>rand</code> to produce the
two histograms that follow it:

<pre><code>
require 'rubygems'
require 'aggregate'

# Create the Aggregate instances (128 binary buckets, so large samples are
# not counted as outliers)
binary_aggregate = Aggregate.new(:log_buckets => 128)
linear_aggregate = Aggregate.new(:low => 0, :high => 65536, :width => 4096)

65536.times do
  x = rand(65536)
  binary_aggregate << x
  linear_aggregate << x
end

puts binary_aggregate.to_s
puts linear_aggregate.to_s
</code></pre>

h4. Binary Histogram

<pre><code>
value |------------------------------------------------------------------| count
    1 |                                                                  |     3
    2 |                                                                  |     1
    4 |                                                                  |     5
    8 |                                                                  |     9
   16 |                                                                  |    15
   32 |                                                                  |    29
   64 |                                                                  |    62
  128 |                                                                  |   115
  256 |                                                                  |   267
  512 |@                                                                 |   523
 1024 |@                                                                 |   970
 2048 |@@@                                                               |  1987
 4096 |@@@@@@@@                                                          |  4075
 8192 |@@@@@@@@@@@@@@@@                                                  |  8108
16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                                  | 16405
32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 32961
      ~
Total |------------------------------------------------------------------| 65535
</code></pre>

h4. Linear (0, 65536, 4096) Histogram

<pre><code>
value |------------------------------------------------------------------| count
    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4094
 4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|  4202
 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4118
12288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4059
16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    |  3999
20480 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4083
24576 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4134
28672 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |  4143
32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |  4152
36864 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4033
40960 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4064
45056 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4012
49152 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4070
53248 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4090
57344 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4135
61440 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |  4144
Total |------------------------------------------------------------------| 65532
</code></pre>

We can see from these histograms that Ruby's rand function does a relatively good
job of distributing returned values in the requested range. Each linear bucket
holds close to 1/16 of the samples, and each binary bucket holds roughly twice as
many samples as the one before it, as expected for uniformly distributed values.

h2. Examples

Here's an example of a "handy timing benchmark":http://gist.github.com/187669
implemented with Aggregate.

h2. NOTES

Ruby 1.8 doesn't have a log2 function built into Math (Math.log2 was added in
Ruby 1.9), so we approximate with log(x)/log(2). Theoretically, truncating
log(2^n - 1)/log(2) to an integer yields n-1. Unfortunately, due to precision
limitations, once n reaches a certain size (somewhere > 32) this starts to
return n. The larger the value of n, the more numbers (2^n - 2, 2^n - 3, etc.)
fall prey to this error. Could probably look into using something like
BigDecimal, but for the current purposes of the binary histogram, i.e. a simple
coarse-grained view, the current implementation is sufficient.
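
The precision problem can be checked directly. The sketch below compares the log(x)/log(2) approximation with an exact integer computation of floor(log2(x)) based on <code>Integer#bit_length</code>, which requires Ruby 2.1 or later. It searches n from 1 to 200 for the first value at which the two disagree for x = 2^n - 1, and prints nil if none is found. The exact n it reports depends on the platform's floating-point math library, so the sketch does not assume any particular value.

```ruby
# Compare the log(x)/log(2) approximation used by Aggregate#log2 with an
# exact floor(log2(x)) for x = 2**n - 1. Integer#bit_length needs Ruby >= 2.1.
def float_floor_log2(x)
  (Math.log(x) / Math.log(2)).to_i
end

def exact_floor_log2(x)
  x.bit_length - 1
end

# First n (up to 200) where truncating the float approximation gives the
# wrong bucket for 2**n - 1, or nil if they always agree on this platform
first_bad = (1..200).find { |n| float_floor_log2(2**n - 1) != exact_floor_log2(2**n - 1) }

puts "float approximation first disagrees at n = #{first_bad.inspect}"
```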
data/Rakefile
ADDED
@@ -0,0 +1,15 @@
require 'rake'

begin
  require 'jeweler'
  Jeweler::Tasks.new do |gemspec|
    gemspec.name = "aggregate_afurmanov"
    gemspec.summary = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support"
    gemspec.description = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate"
    gemspec.email = "aleksandr.furmanov@gmail.com"
    gemspec.homepage = "http://github.com/afurmanov/aggregate"
    gemspec.authors = ["Joseph Ruscio", "Aleksandr Furmanov"]
  end
rescue LoadError
  puts "Jeweler not available. Install it with: gem install jeweler"
end
data/VERSION
ADDED
@@ -0,0 +1 @@
0.2.2
data/aggregate_afurmanov.gemspec
ADDED
@@ -0,0 +1,48 @@
# Generated by jeweler
# DO NOT EDIT THIS FILE DIRECTLY
# Instead, edit Jeweler::Tasks in Rakefile, and run the gemspec command
# -*- encoding: utf-8 -*-

Gem::Specification.new do |s|
  s.name = %q{aggregate_afurmanov}
  s.version = "0.2.2"

  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
  s.authors = ["Joseph Ruscio, Aleksandr Furmanov"]
  s.date = %q{2010-12-08}
  s.description = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate}
  s.email = %q{aleksandr.furmanov@gmail.com}
  s.extra_rdoc_files = [
    "LICENSE",
    "README.textile"
  ]
  s.files = [
    ".gitignore",
    "LICENSE",
    "README.textile",
    "Rakefile",
    "VERSION",
    "aggregate_afurmanov.gemspec",
    "lib/aggregate.rb",
    "test/ts_aggregate.rb"
  ]
  s.homepage = %q{http://github.com/afurmanov/aggregate}
  s.rdoc_options = ["--charset=UTF-8"]
  s.require_paths = ["lib"]
  s.rubygems_version = %q{1.3.7}
  s.summary = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support}
  s.test_files = [
    "test/ts_aggregate.rb"
  ]

  if s.respond_to? :specification_version then
    current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
    s.specification_version = 3

    if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
    else
    end
  else
  end
end
data/lib/aggregate.rb
ADDED
@@ -0,0 +1,298 @@
# Implements aggregate statistics and maintains a
# configurable histogram for a set of given samples. Convenient for tracking
# high-throughput data.
class Aggregate
  # The current average of all samples
  attr_reader :mean

  # The current number of samples
  attr_reader :count

  # The maximum sample value
  attr_reader :max

  # The minimum sample value
  attr_reader :min

  # The sum of all samples
  attr_reader :sum

  # The number of samples falling below the lowest valued histogram bucket
  attr_reader :outliers_low

  # The number of samples falling at or above the upper bound of the histogram
  attr_reader :outliers_high

  DEFAULT_LOG_BUCKETS = 8

  # The number of buckets in the binary logarithmic histogram
  # (options[:log_buckets], or DEFAULT_LOG_BUCKETS if not given)
  def log_buckets
    @log_buckets
  end

  # Create a new Aggregate that maintains a binary logarithmic histogram
  # by default. Passing the :low, :high, and :width options configures
  # the aggregate to maintain a linear histogram with (high - low)/width buckets.
  # The :log_buckets option sets the number of binary buckets.
  def initialize(options = {})
    low = options[:low]
    high = options[:high]
    width = options[:width]
    @log_buckets = options[:log_buckets] || DEFAULT_LOG_BUCKETS
    @count = 0
    @sum = 0.0
    @sum2 = 0.0
    @outliers_low = 0
    @outliers_high = 0

    # If the user asks we maintain a linear histogram where
    # values in the range [low, high) are bucketed in multiples
    # of width
    if (nil != low && nil != high && nil != width)

      # Validate linear specification
      if high <= low
        raise ArgumentError, "High bucket must be > Low bucket"
      end

      if high - low < width
        raise ArgumentError, "Histogram width must be <= histogram range"
      end

      if 0 != (high - low).modulo(width)
        raise ArgumentError, "Histogram range (high - low) must be a multiple of width"
      end

      @low = low
      @high = high
      @width = width
    else
      low ||= 1
      # to_index divides by @low, so give it a valid value before using it
      # to round the requested low bound down to a power of two
      @low = 1
      @low = to_bucket(to_index(low))
      # @high is the lower bound of the last bucket; samples >= @high are
      # counted as high outliers, so the last bucket stays empty
      @high = to_bucket(to_index(@low) + log_buckets - 1)
    end

    # Initialize all buckets to 0
    @buckets = Array.new(bucket_count, 0)
  end

  # Include a sample in the aggregate
  def <<(data)

    # Update min/max
    if 0 == @count
      @min = data
      @max = data
    else
      @max = [data, @max].max
      @min = [data, @min].min
    end

    # Update the running info
    @count += 1
    @sum += data
    @sum2 += (data * data)

    # Update the bucket (outlier? also increments the outlier counters)
    @buckets[to_index(data)] += 1 unless outlier?(data)
  end

  # The current average of all samples
  def mean
    @sum / @count
  end

  # Calculate the sample standard deviation (divides by count - 1)
  def std_dev
    Math.sqrt((@sum2.to_f - ((@sum.to_f * @sum.to_f)/@count.to_f)) / (@count.to_f - 1))
  end

  # Combine two aggregates (not implemented yet)
  #def +(b)
  #  a = self
  #  c = Aggregate.new
  #
  #  c.count = a.count + b.count
  #end

  # Generate a pretty-printed ASCII representation of the histogram
  def to_s(columns=nil)

    # Default to an 80 column terminal, don't support < 80 for now
    if nil == columns
      columns = 80
    else
      raise ArgumentError if columns < 80
    end

    # Find the largest bucket and create an array of the rows we intend to print
    disp_buckets = Array.new
    max_count = 0
    total = 0
    @buckets.each_with_index do |count, idx|
      next if 0 == count
      max_count = [max_count, count].max
      disp_buckets << [idx, to_bucket(idx), count]
      total += count
    end

    #XXX: Better to print just header --> footer
    return "Empty histogram" if 0 == disp_buckets.length

    # Figure out how wide the value and count columns need to be based on their
    # largest respective numbers. Binary bucket values are Floats, so measure
    # them as %d prints them (without the trailing ".0")
    value_str = "value"
    count_str = "count"
    total_str = "Total"
    value_width = [disp_buckets.last[1].to_i.to_s.length, value_str.length].max
    value_width = [value_width, total_str.length].max
    count_width = [total.to_s.length, count_str.length].max
    max_bar_width = columns - (value_width + " |".length + "| ".length + count_width)

    # Determine the value of a '@' (the number of samples it stands for)
    weight = [max_count.to_f/max_bar_width.to_f, 1.0].max

    # Format the header
    histogram = sprintf("%#{value_width}s |", value_str)
    histogram << "-" * max_bar_width
    histogram << sprintf("| %#{count_width}s\n", count_str)

    # We denote empty buckets with a '~'
    skip_row = lambda { |width| sprintf("%#{width}s ~\n", " ") }

    # Loop through each bucket to be displayed and output the correct number
    # of '@' markers for its count
    prev_index = disp_buckets[0][0] - 1

    disp_buckets.each do |x|
      # Denote skipped empty buckets with a ~
      histogram << skip_row.call(value_width) unless prev_index == x[0] - 1
      prev_index = x[0]

      # Add the value
      row = sprintf("%#{value_width}d |", x[1])

      # Add the bar, padded to the full bar width
      bar_size = (x[2]/weight).to_i
      row << "@" * bar_size
      row << " " * (max_bar_width - bar_size)

      # Add the count
      row << sprintf("| %#{count_width}d\n", x[2])

      # Append the finished row onto the histogram
      histogram << row
    end

    # End the table
    histogram << skip_row.call(value_width) if disp_buckets.last[0] != bucket_count-1
    histogram << sprintf("%#{value_width}s", total_str)
    histogram << " |"
    histogram << "-" * max_bar_width
    histogram << "| "
    histogram << sprintf("%#{count_width}d\n", total)
  end

  # Iterate through each bucket in the histogram regardless of
  # its contents, yielding the bucket's lower bound and its count
  def each
    @buckets.each_with_index do |count, index|
      yield(to_bucket(index), count)
    end
  end

  # Iterate through only the buckets in the histogram that contain
  # samples, yielding the bucket's lower bound and its count
  def each_nonzero
    @buckets.each_with_index do |count, index|
      yield(to_bucket(index), count) if count != 0
    end
  end

  private

  def linear?
    nil != @width
  end

  # Returns a truthy value (and increments the matching outlier counter)
  # if data falls outside [@low, @high); returns false otherwise
  def outlier?(data)

    if data < @low
      @outliers_low += 1
    elsif data >= @high
      @outliers_high += 1
    else
      return false
    end
  end

  def bucket_count
    if linear?
      return (@high-@low)/@width
    else
      return log_buckets
    end
  end

  # The lower bound of the bucket at the given index
  def to_bucket(index)
    if linear?
      return @low + (index * @width)
    else
      return 2**(log2(@low) + index)
    end
  end

  def right_bucket?(index, data)

    # check invariant
    raise unless linear?

    bucket = to_bucket(index)

    # It's the right bucket if data falls in [bucket, bucket + width)
    bucket <= data && data < bucket + @width
  end

=begin
  def find_bucket(lower, upper, target)
    #Classic binary search
    return upper if right_bucket?(upper, target)

    # Cut the search range in half
    middle = (upper/2).to_i

    # Determine which half contains our value and recurse
    if (to_bucket(middle) >= target)
      return find_bucket(lower, middle, target)
    else
      return find_bucket(middle, upper, target)
    end
  end
=end

  # A data point is added to bucket[n] when it is greater than or equal to
  # the value represented by bucket[n], but less than the value
  # represented by bucket[n+1]
  public
  def to_index(data)

    # Binary case: floor(log2(data / @low)), never below bucket 0
    return log2([1,data/@low].max).to_i if !linear?

    # Search for the right bucket in the linear case
    @buckets.each_with_index do |count, idx|
      return idx if right_bucket?(idx, data)
    end
    #find_bucket(0, bucket_count-1, data)

    #Should not get here
    raise "No linear bucket found for #{data}"
  end

  # Base-2 logarithm of x, approximated as log(x)/log(2) (see NOTES in the
  # README). Truncating it gives i such that 2**i <= x < 2**(i+1), up to
  # floating-point error for large x
  @@LOG2_DIVISOR = Math.log(2)
  def log2( x )
    Math.log(x) / @@LOG2_DIVISOR
  end

end
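
The +(b) method for combining two aggregates is left commented out in lib/aggregate.rb. Every statistic is kept as a running total, so two aggregates with the same histogram configuration could be merged by adding their totals and summing their bucket counts element-wise. Below is a hypothetical standalone sketch of that idea. Summary and merge are illustrative names, not part of the gem. The sketch only checks that the bucket arrays have the same length; a real version should also compare low/high/width.

```ruby
# Hypothetical sketch (not the gem's API): merging two aggregate summaries.
# Only valid when both sides use the same histogram layout; this sketch can
# only compare the number of buckets.
Summary = Struct.new(:count, :sum, :sum2, :min, :max, :buckets,
                     :outliers_low, :outliers_high)

def merge(a, b)
  unless a.buckets.length == b.buckets.length
    raise ArgumentError, "histogram layouts differ"
  end
  Summary.new(
    a.count + b.count,
    a.sum + b.sum,
    a.sum2 + b.sum2,                               # sums of squares add, so std_dev still works
    [a.min, b.min].compact.min,                    # compact skips nil min/max of empty summaries
    [a.max, b.max].compact.max,
    a.buckets.zip(b.buckets).map { |x, y| x + y }, # element-wise bucket counts
    a.outliers_low + b.outliers_low,
    a.outliers_high + b.outliers_high
  )
end

# Samples 1 and 3 (binary buckets [1], [2,3], [4..7]) merged with a single 3
a = Summary.new(2, 4.0, 10.0, 1, 3, [1, 1, 0], 0, 0)
b = Summary.new(1, 3.0, 9.0, 3, 3, [0, 1, 0], 0, 0)
c = merge(a, b)
puts "count=#{c.count} mean=#{c.sum / c.count} buckets=#{c.buckets.inspect}"
```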
data/test/ts_aggregate.rb
ADDED
@@ -0,0 +1,162 @@
require 'test/unit'
require File.expand_path('../../lib/aggregate', __FILE__)

class SimpleStatsTest < Test::Unit::TestCase

  def setup
    @stats = Aggregate.new(:log_buckets => 128)

    @@DATA.each do |x|
      @stats << x
    end
  end

  def test_stats_count
    assert_equal @@DATA.length, @stats.count
  end

  def test_stats_min_max
    sorted_data = @@DATA.sort

    assert_equal sorted_data[0], @stats.min
    assert_equal sorted_data.last, @stats.max
  end

  def test_stats_mean
    sum = 0
    @@DATA.each do |x|
      sum += x
    end

    assert_equal sum.to_f/@@DATA.length.to_f, @stats.mean
  end

  def test_bucket_counts

    #Test each iterator
    total_bucket_sum = 0
    i = 0
    @stats.each do |bucket, count|
      assert_equal 2**i, bucket

      total_bucket_sum += count
      i += 1
    end

    assert_equal @@DATA.length, total_bucket_sum

    #Test each_nonzero iterator: buckets must come in increasing order
    prev_bucket = 0
    total_bucket_sum = 0
    @stats.each_nonzero do |bucket, count|
      assert bucket > prev_bucket
      assert_not_equal 0, count
      prev_bucket = bucket

      total_bucket_sum += count
    end

    assert_equal @@DATA.length, total_bucket_sum
  end

=begin
  def test_addition
    stats1 = Aggregate.new
    stats2 = Aggregate.new

    stats1 << 1
    stats2 << 3

    stats_sum = stats1 + stats2

    assert_equal stats_sum.count, stats1.count + stats2.count
  end
=end

  #XXX: Update test_bucket_contents() if you muck with @@DATA
  @@DATA = [ 1, 5, 4, 6, 1028, 1972, 16384, 16385, 16383]
  def test_bucket_contents
    #XXX: This is the only test so far that cares about the actual contents
    # of @@DATA, so if you update that array ... update this method too
    expected_buckets = [1, 4, 1024, 8192, 16384]
    expected_counts = [1, 3, 2, 1, 2]

    i = 0
    @stats.each_nonzero do |bucket, count|
      assert_equal expected_buckets[i], bucket
      assert_equal expected_counts[i], count
      # Increment for the next test
      i += 1
    end
  end

  def test_histogram
    puts @stats.to_s
  end

  def test_outlier
    assert_equal 0, @stats.outliers_low
    assert_equal 0, @stats.outliers_high

    @stats << -1
    @stats << -2
    @stats << 0

    @stats << 2**128

    # This should be the last value in the last bucket, but Ruby's native
    # floats are not precise enough. Somewhere past 2^32 the log(x)/log(2)
    # breaks down. So it shows up as 128 (outlier) instead of 127
    #@stats << (2**128) - 1

    assert_equal 3, @stats.outliers_low
    assert_equal 1, @stats.outliers_high
  end

  def test_std_dev
    @stats.std_dev
  end
end

class LinearHistogramTest < Test::Unit::TestCase
  def setup
    @stats = Aggregate.new(:low => 0, :high => 32768, :width => 1024)

    @@DATA.each do |x|
      @stats << x
    end
  end

  def test_validation

    # Range cannot be 0
    assert_raise(ArgumentError) { Aggregate.new(:low => 32, :high => 32, :width => 4) }

    # Range cannot be negative
    assert_raise(ArgumentError) { Aggregate.new(:low => 32, :high => 16, :width => 4) }

    # Range cannot be < single bucket
    assert_raise(ArgumentError) { Aggregate.new(:low => 16, :high => 32, :width => 17) }

    # Range % width must equal 0 (for now)
    assert_raise(ArgumentError) { Aggregate.new(:low => 1, :high => 16384, :width => 1024) }
  end

  #XXX: Update test_bucket_contents() if you muck with @@DATA
  # 32768 is an outlier
  @@DATA = [ 0, 1, 5, 4, 6, 1028, 1972, 16384, 16385, 16383, 32768]
  def test_bucket_contents
    #XXX: This is the only test so far that cares about the actual contents
    # of @@DATA, so if you update that array ... update this method too
    expected_buckets = [0, 1024, 15360, 16384]
    expected_counts = [5, 2, 1, 2]

    i = 0
    @stats.each_nonzero do |bucket, count|
      assert_equal expected_buckets[i], bucket
      assert_equal expected_counts[i], count
      # Increment for the next test
      i += 1
    end
  end

end
metadata
ADDED
@@ -0,0 +1,75 @@
--- !ruby/object:Gem::Specification
name: aggregate_afurmanov
version: !ruby/object:Gem::Version
  hash: 19
  prerelease: false
  segments:
  - 0
  - 2
  - 2
  version: 0.2.2
platform: ruby
authors:
- Joseph Ruscio, Aleksandr Furmanov
autorequire:
bindir: bin
cert_chain: []

date: 2010-12-08 00:00:00 -08:00
default_executable:
dependencies: []

description: "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate"
email: aleksandr.furmanov@gmail.com
executables: []

extensions: []

extra_rdoc_files:
- LICENSE
- README.textile
files:
- .gitignore
- LICENSE
- README.textile
- Rakefile
- VERSION
- aggregate_afurmanov.gemspec
- lib/aggregate.rb
- test/ts_aggregate.rb
has_rdoc: true
homepage: http://github.com/afurmanov/aggregate
licenses: []

post_install_message:
rdoc_options:
- --charset=UTF-8
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      hash: 3
      segments:
      - 0
      version: "0"
required_rubygems_version: !ruby/object:Gem::Requirement
  none: false
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      hash: 3
      segments:
      - 0
      version: "0"
requirements: []

rubyforge_project:
rubygems_version: 1.3.7
signing_key:
specification_version: 3
summary: Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support
test_files:
- test/ts_aggregate.rb