aggregate_afurmanov 0.2.2

data/.gitignore ADDED
@@ -0,0 +1 @@
1
+ pkg/
data/LICENSE ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2009 Joseph Ruscio
2
+
3
+ Permission is hereby granted, free of charge, to any person
4
+ obtaining a copy of this software and associated documentation
5
+ files (the "Software"), to deal in the Software without
6
+ restriction, including without limitation the rights to use,
7
+ copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ copies of the Software, and to permit persons to whom the
9
+ Software is furnished to do so, subject to the following
10
+ conditions:
11
+
12
+ The above copyright notice and this permission notice shall be
13
+ included in all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
16
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
17
+ OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
18
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
19
+ HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
20
+ WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
21
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
22
+ OTHER DEALINGS IN THE SOFTWARE.
data/README.textile ADDED
@@ -0,0 +1,215 @@
1
+ h1. Aggregate
2
+
3
+ By Joseph Ruscio
4
+
5
+ Aggregate is an intuitive Ruby implementation of a statistics aggregator
6
+ including both default and configurable histogram support. It does this
7
+ without recording/storing any of the actual sample values, making it
8
+ suitable for tracking statistics across millions/billions of samples
9
+ with a memory footprint that stays fixed no matter how many samples are added. Originally
10
+ inspired by the Aggregate support in "SystemTap.":http://sourceware.org/systemtap
11
+
12
+ h2. Getting Started
13
+
14
+ Aggregates are easy to instantiate, populate with sample data, and then
15
+ inspect for common aggregate statistics:
16
+
17
+ <pre><code>
18
+ #After instantiation use the << operator to add a sample to the aggregate:
19
+ stats = Aggregate.new
20
+
21
+ loop do
22
+ # Take some action that generates a sample measurement
23
+ stats << sample
24
+ end
25
+
26
+ # The number of samples
27
+ stats.count
28
+
29
+ # The average
30
+ stats.mean
31
+
32
+ # Max sample value
33
+ stats.max
34
+
35
+ # Min sample value
36
+ stats.min
37
+
38
+ # The standard deviation
39
+ stats.std_dev
40
+ </code></pre>
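These statistics can be kept without storing any samples. The following is a rough standalone sketch, not the gem's code, that keeps only a count, min/max, a running sum and a running sum of squares. That is the same bookkeeping lib/aggregate.rb appears to use for `mean` and `std_dev`. The class name `RunningStats` is hypothetical.

```ruby
# Standalone sketch (hypothetical RunningStats class, not part of the gem):
# only O(1) running totals are kept, never the samples themselves.
class RunningStats
  attr_reader :count, :min, :max

  def initialize
    @count = 0
    @sum = 0.0   # running sum of samples
    @sum2 = 0.0  # running sum of squared samples
    @min = nil
    @max = nil
  end

  # Add one sample, updating min/max and the running totals.
  def <<(x)
    @min = @min.nil? ? x : [@min, x].min
    @max = @max.nil? ? x : [@max, x].max
    @count += 1
    @sum += x
    @sum2 += x * x
    self
  end

  def mean
    @sum / @count
  end

  # Sample standard deviation: sqrt((sum2 - sum**2 / n) / (n - 1)); needs n >= 2.
  def std_dev
    Math.sqrt((@sum2 - (@sum * @sum) / @count) / (@count - 1))
  end
end

stats = RunningStats.new
[2, 4, 4, 4, 5, 5, 7, 9].each { |x| stats << x }
puts stats.count  # 8
puts stats.mean   # 5.0
puts stats.std_dev
```

The n - 1 denominator gives the sample standard deviation, matching the formula in `Aggregate#std_dev`. At least two samples are therefore needed for a meaningful result.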
41
+
42
+ h2. Histograms
43
+
44
+ Perhaps more important than the basic aggregate statistics detailed above,
45
+ Aggregate also maintains a histogram of samples. For anything other than
46
+ normally distributed data, the basic statistics are insufficient at best and often downright misleading.
47
+ 37Signals recently posted a terse but effective "explanation":http://37signals.com/svn/posts/1836-the-problem-with-averages of the importance of histograms.
48
+ Aggregate maintains its histogram internally as a set of "buckets".
49
+ Each bucket represents a range of possible sample values. The set of all buckets
50
+ represents the range of "normal" sample values.
51
+
52
+ h3. Binary Histograms
53
+
54
+ Without any configuration, an Aggregate instance maintains a binary histogram, where
55
+ each bucket represents a range twice as large as the preceding bucket, i.e.
56
+ [1,1], [2,3], [4,5,6,7], [8,9,10,11,12,13,14,15]. In this version the default binary histogram
57
+ provides 8 buckets (DEFAULT_LOG_BUCKETS); pass :log_buckets to change that. With :log_buckets => 128, as in the test suite, it theoretically covers the range [1, (2^127) - 1].
58
+ (See NOTES below for a discussion on the effects in practice of insufficient
59
+ precision.)
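The bucket layout above can be checked with a small standalone sketch of binary bucketing for positive integer samples, with low = 1 as in the default configuration. A sample x lands in bucket floor(log2(x)), which covers [2^i, 2^(i+1) - 1]. The sketch uses `Integer#bit_length` (Ruby 2.1+) as an exact integer substitute for the log(x)/log(2) approximation used by lib/aggregate.rb. The helper names are hypothetical.

```ruby
# Index of the binary bucket holding positive integer x: floor(log2(x)).
# bit_length is exact, unlike Math.log(x) / Math.log(2) for very large x.
def binary_bucket_index(x)
  raise ArgumentError, "binary buckets start at 1" if x < 1
  x.bit_length - 1
end

# Inclusive range of integer values covered by bucket i.
def binary_bucket_range(i)
  (2**i)..(2**(i + 1) - 1)
end

[1, 2, 3, 4, 7, 8, 15].each do |x|
  i = binary_bucket_index(x)
  puts "#{x} -> bucket #{i} #{binary_bucket_range(i)}"
end
```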
60
+
61
+ Binary histograms are useful when we have little idea about what the
62
+ sample distribution may look like as almost any positive value will
63
+ fall into some bucket. After using binary histograms to determine
64
+ the coarse-grained characteristics of your sample space you can
65
+ configure a linear histogram to examine it in closer detail.
66
+
67
+ h3. Linear Histograms
68
+
69
+ Linear histograms are specified with the three options :low, :high, and :width.
69
+ Low and high specify a range [low, high) of values included in the
70
+ histogram (all others are outliers). Width specifies the number of
71
+ values represented by each bucket and therefore the number of
72
+ buckets, i.e. the granularity of the histogram. The histogram range
74
+ (high - low) must be a multiple of width:
75
+
76
+ <pre><code>
77
+ # Track aggregate stats on response times in ms
78
+ response_stats = Aggregate.new(:low => 0, :high => 2000, :width => 50)
79
+ </code></pre>
80
+
81
+ The example above creates a linear histogram that tracks the
82
+ response times from 0 ms to 2000 ms in buckets of width 50 ms. Hopefully
83
+ most of your samples fall in the first couple of buckets!
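The linear layout can be sketched on its own. In this sketch the constructor checks mirror the validation rules described above. The bucket index is computed directly as (x - low) / width, rather than with the scan over all buckets that lib/aggregate.rb performs in `to_index`. `LinearLayout` and its methods are hypothetical names, not the gem's API.

```ruby
# Hypothetical sketch of a linear histogram layout over [low, high).
class LinearLayout
  attr_reader :low, :high, :width

  def initialize(low, high, width)
    raise ArgumentError, "high must be > low" if high <= low
    raise ArgumentError, "width must be <= high - low" if high - low < width
    raise ArgumentError, "(high - low) must be a multiple of width" unless ((high - low) % width).zero?
    @low, @high, @width = low, high, width
  end

  def bucket_count
    (high - low) / width
  end

  # Index of the bucket containing x, or nil if x is outside [low, high).
  def index(x)
    return nil if x < low || x >= high
    ((x - low) / width).floor
  end

  # Lower bound of bucket i (the value reported for that bucket).
  def bucket(i)
    low + i * width
  end
end

layout = LinearLayout.new(0, 2000, 50)
puts layout.bucket_count               # 40
puts layout.bucket(layout.index(137))  # 100
```

Computing the index arithmetically keeps insertion cost independent of the number of buckets. The gem's scan grows linearly with (high - low)/width.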
84
+
85
+ h3. Histogram Outliers
86
+
87
+ An Aggregate records any samples that fall outside the histogram range as
88
+ outliers:
89
+
90
+ <pre><code>
91
+ # Number of samples that fall below the normal range
92
+ stats.outliers_low
93
+
94
+ # Number of samples that fall above the normal range
95
+ stats.outliers_high
96
+ </code></pre>
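The outlier rule is a simple comparison against the half-open range [low, high). Samples below low count as low outliers, and samples at or above high count as high outliers. Neither kind is placed in a bucket, although in lib/aggregate.rb they still update count, sum, min and max. The sketch below assumes that rule. `OutlierCounter` is a hypothetical name.

```ruby
# Hypothetical sketch: classify samples against the half-open range [low, high).
class OutlierCounter
  attr_reader :outliers_low, :outliers_high, :in_range

  def initialize(low, high)
    @low, @high = low, high
    @outliers_low = 0
    @outliers_high = 0
    @in_range = 0
  end

  def <<(x)
    if x < @low
      @outliers_low += 1
    elsif x >= @high   # high itself is excluded from the range
      @outliers_high += 1
    else
      @in_range += 1
    end
    self
  end
end

c = OutlierCounter.new(0, 32768)
[-2, 0, 5, 32767, 32768, 40000].each { |x| c << x }
puts c.outliers_low   # 1
puts c.outliers_high  # 2
puts c.in_range       # 3
```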
97
+
98
+ h3. Histogram Iterators
99
+
100
+ Once a histogram is populated, Aggregate provides iterator support for
101
+ examining the contents of buckets. The iterators yield each bucket's lower
102
+ bound (the start of its range) together with the number of samples in it:
103
+
104
+ <pre><code>
105
+ # Examine every bucket
106
+ stats.each do |bucket, count|
107
+ end
108
+
109
+ # Examine only buckets containing samples
110
+ stats.each_nonzero do |bucket, count|
111
+ end
112
+ </code></pre>
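Both iterators walk the bucket array in order. Each yields a bucket's lower bound together with its count, and `each_nonzero` simply skips zero counts. Here is a standalone sketch of that behavior over a plain array of counts. It assumes a linear layout with low = 0 and width = 1024, and the variable names are illustrative.

```ruby
low = 0
width = 1024
counts = [5, 2, 0, 0, 1]   # per-bucket sample counts

# Equivalent of each: every bucket, empty or not, as [lower_bound, count].
all = []
counts.each_with_index { |count, i| all << [low + i * width, count] }

# Equivalent of each_nonzero: only buckets that received samples.
nonzero = all.reject { |_bucket, count| count.zero? }

nonzero.each { |bucket, count| puts "#{bucket}: #{count}" }
```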
113
+
114
+ h3. Histogram Bar Chart
115
+
116
+ Finally, Aggregate contains sophisticated pretty-printing support to generate
117
+ ASCII bar charts. For any given number of columns >= 80 (defaults to 80; smaller values raise an ArgumentError) and
118
+ sample distribution, the <code>to_s</code> method sets a marker weight (samples per @) based on the largest
119
+ bucket count and aligns all output. Empty buckets are skipped, with a single ~ row marking each gap, to conserve
120
+ screen space.
121
+
122
+ <pre><code>
123
+ # Generate and display an 80 column histogram
124
+ puts stats.to_s
125
+
126
+ # Generate and display a 120 column histogram
127
+ puts stats.to_s(120)
128
+ </code></pre>
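The column arithmetic behind an 80-column chart can be reproduced in isolation. The bar area is what remains of the terminal width after the value column, the " |" and "| " separators, and the count column. Each '@' then stands for max(max_count / bar_width, 1.0) samples. This sketch plugs in the labels and counts from the binary histogram output shown in this README (largest bucket 32961, total 65535). It follows the arithmetic in `to_s` but is not the method itself.

```ruby
# Reproduce the column arithmetic of a to_s-style ASCII bar chart.
columns = 80
value_width = ["32768".length, "value".length, "Total".length].max  # widest label
count_width = ["65535".length, "count".length].max                  # widest count
max_bar_width = columns - (value_width + " |".length + "| ".length + count_width)

max_count = 32961
# Samples per '@'; never below one, so small counts are not magnified.
weight = [max_count.to_f / max_bar_width, 1.0].max

def bar(count, weight)
  "@" * (count / weight).to_i
end

puts max_bar_width              # 66
puts bar(16405, weight).length  # 32
```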
129
+
130
+ This code example populates both a binary and a linear histogram with the same
131
+ set of 65536 values generated by <code>rand</code> to produce the
132
+ two histograms that follow it:
133
+
134
+ <pre><code>
135
+ require 'rubygems'
136
+ require 'aggregate'
137
+
138
+ # Create a binary (128-bucket) and a linear Aggregate instance
139
+ binary_aggregate = Aggregate.new(:log_buckets => 128)
140
+ linear_aggregate = Aggregate.new(:low => 0, :high => 65536, :width => 4096)
141
+
142
+ 65536.times do
143
+ x = rand(65536)
144
+ binary_aggregate << x
145
+ linear_aggregate << x
146
+ end
147
+
148
+ puts binary_aggregate.to_s
149
+ puts linear_aggregate.to_s
150
+ </code></pre>
151
+
152
+ h4. Binary Histogram
153
+
154
+ <pre><code>
155
+ value |------------------------------------------------------------------| count
156
+ 1 | | 3
157
+ 2 | | 1
158
+ 4 | | 5
159
+ 8 | | 9
160
+ 16 | | 15
161
+ 32 | | 29
162
+ 64 | | 62
163
+ 128 | | 115
164
+ 256 | | 267
165
+ 512 |@ | 523
166
+ 1024 |@ | 970
167
+ 2048 |@@@ | 1987
168
+ 4096 |@@@@@@@@ | 4075
169
+ 8192 |@@@@@@@@@@@@@@@@ | 8108
170
+ 16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 16405
171
+ 32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 32961
172
+ ~
173
+ Total |------------------------------------------------------------------| 65535
174
+ </code></pre>
175
+
176
+ h4. Linear (0, 65536, 4096) Histogram
177
+
178
+ <pre><code>
179
+ value |------------------------------------------------------------------| count
180
+ 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4094
181
+ 4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 4202
182
+ 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4118
183
+ 12288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4059
184
+ 16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 3999
185
+ 20480 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4083
186
+ 24576 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4134
187
+ 28672 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4143
188
+ 32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4152
189
+ 36864 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4033
190
+ 40960 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4064
191
+ 45056 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4012
192
+ 49152 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4070
193
+ 53248 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4090
194
+ 57344 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4135
195
+ 61440 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4144
196
+ Total |------------------------------------------------------------------| 65532
197
+ </code></pre>
198
+ We can see from these histograms that Ruby's rand function does a relatively good
199
+ job of distributing returned values in the requested range.
200
+
201
+ h2. Examples
202
+
203
+ Here's an example of a "handy timing benchmark":http://gist.github.com/187669
204
+ implemented with aggregate.
205
+
206
+ h2. NOTES
207
+
208
+ Ruby 1.8 doesn't have a log2 function built into Math, so we approximate it with
209
+ log(x)/log(2). Theoretically, floor(log(2^n - 1)/log(2)) == n-1. Unfortunately, due
210
+ to precision limitations, once n reaches a certain size (somewhere > 32)
211
+ this starts to return n. The larger the value of n, the more numbers, i.e.
212
+ (2^n - 2), (2^n - 3), etc., fall prey to this error. We could probably look into
213
+ using something like BigDecimal, but for the current purposes of the binary
214
+ histogram, i.e. a simple coarse-grained view, the current implementation is
215
+ sufficient.
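The precision issue can be checked directly. The sketch compares the log(x)/log(2) approximation with an exact integer computation. It substitutes `Integer#bit_length`, available in Ruby 2.1 and later (long after this gem was written), for the BigDecimal idea mentioned above. The helper names are hypothetical.

```ruby
# Float approximation, as used by lib/aggregate.rb.
def approx_floor_log2(x)
  (Math.log(x) / Math.log(2)).to_i
end

# Exact alternative for positive Integers (Integer#bit_length, Ruby 2.1+).
def exact_floor_log2(x)
  x.bit_length - 1
end

# Values of n for which the approximation misplaces 2**n - 1.
mismatches = (1..127).select { |n| approx_floor_log2(2**n - 1) != exact_floor_log2(2**n - 1) }
puts mismatches.inspect
```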
data/Rakefile ADDED
@@ -0,0 +1,15 @@
1
+ require 'rake'
2
+
3
+ begin
4
+ require 'jeweler'
5
+ Jeweler::Tasks.new do |gemspec|
6
+ gemspec.name = "aggregate_afurmanov"
7
+ gemspec.summary = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support"
8
+ gemspec.description = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate"
9
+ gemspec.email = "aleksandr.furmanov@gmail.com"
10
+ gemspec.homepage = "http://github.com/afurmanov/aggregate"
11
+ gemspec.authors = ["Joseph Ruscio", "Aleksandr Furmanov"]
12
+ end
13
+ rescue LoadError
14
+ puts "Jeweler not available. Install it with: gem install jeweler"
15
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.2.2
data/aggregate_afurmanov.gemspec ADDED
@@ -0,0 +1,48 @@
1
+ # Generated by jeweler
2
+ # DO NOT EDIT THIS FILE DIRECTLY
3
+ # Instead, edit Jeweler::Tasks in Rakefile, and run the gemspec command
4
+ # -*- encoding: utf-8 -*-
5
+
6
+ Gem::Specification.new do |s|
7
+ s.name = %q{aggregate_afurmanov}
8
+ s.version = "0.2.2"
9
+
10
+ s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
11
+ s.authors = ["Joseph Ruscio, Aleksandr Furmanov"]
12
+ s.date = %q{2010-12-08}
13
+ s.description = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate}
14
+ s.email = %q{aleksandr.furmanov@gmail.com}
15
+ s.extra_rdoc_files = [
16
+ "LICENSE",
17
+ "README.textile"
18
+ ]
19
+ s.files = [
20
+ ".gitignore",
21
+ "LICENSE",
22
+ "README.textile",
23
+ "Rakefile",
24
+ "VERSION",
25
+ "aggregate_afurmanov.gemspec",
26
+ "lib/aggregate.rb",
27
+ "test/ts_aggregate.rb"
28
+ ]
29
+ s.homepage = %q{http://github.com/afurmanov/aggregate}
30
+ s.rdoc_options = ["--charset=UTF-8"]
31
+ s.require_paths = ["lib"]
32
+ s.rubygems_version = %q{1.3.7}
33
+ s.summary = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support}
34
+ s.test_files = [
35
+ "test/ts_aggregate.rb"
36
+ ]
37
+
38
+ if s.respond_to? :specification_version then
39
+ current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
40
+ s.specification_version = 3
41
+
42
+ if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
43
+ else
44
+ end
45
+ else
46
+ end
47
+ end
48
+
data/lib/aggregate.rb ADDED
@@ -0,0 +1,298 @@
1
+ # Implements aggregate statistics and maintains
2
+ # configurable histogram for a set of given samples. Convenient for tracking
3
+ # high throughput data.
4
+ class Aggregate
5
+ #The current average of all samples
6
+ attr_reader :mean
7
+
8
+ #The current number of samples
9
+ attr_reader :count
10
+
11
+ #The maximum sample value
12
+ attr_reader :max
13
+
14
+ #The minimum sample value
15
+ attr_reader :min
16
+
17
+ #The sum of all samples
18
+ attr_reader :sum
19
+
20
+ #The number of samples falling below the lowest valued histogram bucket
21
+ attr_reader :outliers_low
22
+
23
+ #The number of samples falling above the highest valued histogram bucket
24
+ attr_reader :outliers_high
25
+
26
+ DEFAULT_LOG_BUCKETS = 8
27
+
28
+ # The number of buckets in the binary logarithmic histogram (set with the :log_buckets option, default DEFAULT_LOG_BUCKETS)
29
+ def log_buckets
30
+ @log_buckets
31
+ end
32
+
33
+ # Create a new Aggregate that maintains a binary logarithmic histogram
34
+ # with :log_buckets buckets by default. Passing :low, :high, and :width options configures
35
+ # the aggregate to maintain a linear histogram with (high - low)/width buckets
36
+ def initialize(options = {})
37
+ low = options[:low]
38
+ high = options[:high]
39
+ width = options[:width]
40
+ @log_buckets = options[:log_buckets] || DEFAULT_LOG_BUCKETS
41
+ @count = 0
42
+ @sum = 0.0
43
+ @sum2 = 0.0
44
+ @outliers_low = 0
45
+ @outliers_high = 0
46
+
47
+ # If the user asks we maintain a linear histogram where
48
+ # values in the range [low, high) are bucketed in multiples
49
+ # of width
50
+ if (nil != low && nil != high && nil != width)
51
+
52
+ #Validate linear specification
53
+ if high <= low
54
+ raise ArgumentError, "High bucket must be > Low bucket"
55
+ end
56
+
57
+ if high - low < width
58
+ raise ArgumentError, "Histogram width must be <= histogram range"
59
+ end
60
+
61
+ if 0 != (high - low).modulo(width)
62
+ raise ArgumentError, "Histogram range (high - low) must be a multiple of width"
63
+ end
64
+
65
+ @low = low
66
+ @high = high
67
+ @width = width
68
+ else
69
+ low ||= 1
70
+ @low = 1 # to_index/to_bucket read @low, so seed it before rounding low down to a power of two
71
+ @low = to_bucket(to_index(low))
72
+ @high = to_bucket(to_index(@low) + log_buckets - 1)
73
+ end
74
+
75
+ #Initialize all buckets to 0
76
+ @buckets = Array.new(bucket_count, 0)
77
+ end
78
+
79
+ # Include a sample in the aggregate
80
+ def << data
81
+
82
+ # Update min/max
83
+ if 0 == @count
84
+ @min = data
85
+ @max = data
86
+ else
87
+ @max = [data, @max].max
88
+ @min = [data, @min].min
89
+ end
90
+
91
+ # Update the running info
92
+ @count += 1
93
+ @sum += data
94
+ @sum2 += (data * data)
95
+
96
+ # Update the bucket
97
+ @buckets[to_index(data)] += 1 unless outlier?(data)
98
+ end
99
+
100
+ def mean
101
+ @sum / @count
102
+ end
103
+
104
+ # Calculate the sample standard deviation (divides by count - 1)
105
+ def std_dev
106
+ Math.sqrt((@sum2.to_f - ((@sum.to_f * @sum.to_f)/@count.to_f)) / (@count.to_f - 1))
107
+ end
108
+
109
+ # Combine two aggregates
110
+ #def +(b)
111
+ # a = self
112
+ # c = Aggregate.new
113
+
114
+ # c.count = a.count + b.count
115
+ #end
116
+
117
+ #Generate a pretty-printed ASCII representation of the histogram
118
+ def to_s(columns=nil)
119
+
120
+ #default to an 80 column terminal, don't support < 80 for now
121
+ if nil == columns
122
+ columns = 80
123
+ else
124
+ raise ArgumentError if columns < 80
125
+ end
126
+
127
+ #Find the largest bucket and create an array of the rows we intend to print
128
+ disp_buckets = Array.new
129
+ max_count = 0
130
+ total = 0
131
+ @buckets.each_with_index do |count, idx|
132
+ next if 0 == count
133
+ max_count = [max_count, count].max
134
+ disp_buckets << [idx, to_bucket(idx), count]
135
+ total += count
136
+ end
137
+
138
+ #XXX: Better to print just header --> footer
139
+ return "Empty histogram" if 0 == disp_buckets.length
140
+
141
+ #Figure out how wide the value and count columns need to be based on their
142
+ #largest respective numbers
143
+ value_str = "value"
144
+ count_str = "count"
145
+ total_str = "Total"
146
+ value_width = [disp_buckets.last[1].to_s.length, value_str.length].max
147
+ value_width = [value_width, total_str.length].max
148
+ count_width = [total.to_s.length, count_str.length].max
149
+ max_bar_width = columns - (value_width + " |".length + "| ".length + count_width)
150
+
151
+ #Determine the value of a '@'
152
+ weight = [max_count.to_f/max_bar_width.to_f, 1.0].max
153
+
154
+ #format the header
155
+ histogram = sprintf("%#{value_width}s |", value_str)
156
+ max_bar_width.times { histogram << "-"}
157
+ histogram << sprintf("| %#{count_width}s\n", count_str)
158
+
159
+ # We denote empty buckets with a '~'
160
+ def skip_row(value_width)
161
+ sprintf("%#{value_width}s ~\n", " ")
162
+ end
163
+
164
+ #Loop through each bucket to be displayed and output the correct number
165
+ prev_index = disp_buckets[0][0] - 1
166
+
167
+ disp_buckets.each do |x|
168
+ #Denote skipped empty buckets with a ~
169
+ histogram << skip_row(value_width) unless prev_index == x[0] - 1
170
+ prev_index = x[0]
171
+
172
+ #Add the value
173
+ row = sprintf("%#{value_width}d |", x[1])
174
+
175
+ #Add the bar
176
+ bar_size = (x[2]/weight).to_i
177
+ bar_size.times { row += "@"}
178
+ (max_bar_width - bar_size).times { row += " " }
179
+
180
+ #Add the count
181
+ row << sprintf("| %#{count_width}d\n", x[2])
182
+
183
+ #Append the finished row onto the histogram
184
+ histogram << row
185
+ end
186
+
187
+ #End the table
188
+ histogram << skip_row(value_width) if disp_buckets.last[0] != bucket_count-1
189
+ histogram << sprintf("%#{value_width}s", "Total")
190
+ histogram << " |"
191
+ max_bar_width.times {histogram << "-"}
192
+ histogram << "| "
193
+ histogram << sprintf("%#{count_width}d\n", total)
194
+ end
195
+
196
+ #Iterate through each bucket in the histogram regardless of
197
+ #its contents
198
+ def each
199
+ @buckets.each_with_index do |count, index|
200
+ yield(to_bucket(index), count)
201
+ end
202
+ end
203
+
204
+ #Iterate through only the buckets in the histogram that contain
205
+ #samples
206
+ def each_nonzero
207
+ @buckets.each_with_index do |count, index|
208
+ yield(to_bucket(index), count) if count != 0
209
+ end
210
+ end
211
+
212
+ private
213
+
214
+ def linear?
215
+ nil != @width
216
+ end
217
+
218
+ def outlier? (data)
219
+
220
+ if data < @low
221
+ @outliers_low += 1
222
+ elsif data >= @high
223
+ @outliers_high += 1
224
+ else
225
+ return false
226
+ end
227
+ end
228
+
229
+ def bucket_count
230
+ if linear?
231
+ return (@high-@low)/@width
232
+ else
233
+ return log_buckets
234
+ end
235
+ end
236
+
237
+ def to_bucket(index)
238
+ if linear?
239
+ return @low + (index * @width)
240
+ else
241
+ return 2**(log2(@low) + index)
242
+ end
243
+ end
244
+
245
+ def right_bucket? index, data
246
+
247
+ # check invariant
248
+ raise unless linear?
249
+
250
+ bucket = to_bucket(index)
251
+
252
+ #It's the right bucket if data falls between bucket and next bucket
253
+ bucket <= data && data < bucket + @width
254
+ end
255
+
256
+ =begin
257
+ def find_bucket(lower, upper, target)
258
+ #Classic binary search
259
+ return upper if right_bucket?(upper, target)
260
+
261
+ # Cut the search range in half
262
+ middle = (upper/2).to_i
263
+
264
+ # Determine which half contains our value and recurse
265
+ if (to_bucket(middle) >= target)
266
+ return find_bucket(lower, middle, target)
267
+ else
268
+ return find_bucket(middle, upper, target)
269
+ end
270
+ end
271
+ =end
272
+
273
+ # A data point is added to bucket[n] when the data point is
274
+ # greater than or equal to the value represented by bucket[n], but less
275
+ # than the value represented by bucket[n+1]
276
+ public
277
+ def to_index (data)
278
+
279
+ # basic case is simple
280
+ return log2([1,data/@low].max).to_i if !linear?
281
+
282
+ # Search for the right bucket in the linear case
283
+ @buckets.each_with_index do |count, idx|
284
+ return idx if right_bucket?(idx, data)
285
+ end
286
+ #find_bucket(0, bucket_count-1, data)
287
+
288
+ #Should not get here
289
+ raise "#{data}"
290
+ end
291
+
292
+ # log2(x) approximates the base-2 logarithm as a Float; log2(x).to_i is i with 2**i <= x < 2**(i+1), up to float rounding
293
+ @@LOG2_DIVISOR = Math.log(2)
294
+ def log2( x )
295
+ Math.log(x) / @@LOG2_DIVISOR
296
+ end
297
+
298
+ end
data/test/ts_aggregate.rb ADDED
@@ -0,0 +1,162 @@
1
+ require 'test/unit'
2
+ require File.expand_path('../../lib/aggregate', __FILE__)
3
+
4
+ class SimpleStatsTest < Test::Unit::TestCase
5
+
6
+ def setup
7
+ @stats = Aggregate.new(:log_buckets => 128)
8
+
9
+ @@DATA.each do |x|
10
+ @stats << x
11
+ end
12
+ end
13
+
14
+ def test_stats_count
15
+ assert_equal @@DATA.length, @stats.count
16
+ end
17
+
18
+ def test_stats_min_max
19
+ sorted_data = @@DATA.sort
20
+
21
+ assert_equal sorted_data[0], @stats.min
22
+ assert_equal sorted_data.last, @stats.max
23
+ end
24
+
25
+ def test_stats_mean
26
+ sum = 0
27
+ @@DATA.each do |x|
28
+ sum += x
29
+ end
30
+
31
+ assert_equal sum.to_f/@@DATA.length.to_f, @stats.mean
32
+ end
33
+
34
+ def test_bucket_counts
35
+
36
+ #Test each iterator
37
+ total_bucket_sum = 0
38
+ i = 0
39
+ @stats.each do |bucket, count|
40
+ assert_equal 2**i, bucket
41
+
42
+ total_bucket_sum += count
43
+ i += 1
44
+ end
45
+
46
+ assert_equal @@DATA.length, total_bucket_sum
47
+
48
+ #Test each_nonzero iterator
49
+ prev_bucket = 0
50
+ total_bucket_sum = 0
51
+ @stats.each_nonzero do |bucket, count|
52
+ assert bucket > prev_bucket
53
+ assert_not_equal count, 0
54
+ prev_bucket = bucket
55
+ total_bucket_sum += count
56
+ end
57
+
58
+ assert_equal total_bucket_sum, @@DATA.length
59
+ end
60
+
61
+ =begin
62
+ def test_addition
63
+ stats1 = Aggregate.new
64
+ stats2 = Aggregate.new
65
+
66
+ stats1 << 1
67
+ stats2 << 3
68
+
69
+ stats_sum = stats1 + stats2
70
+
71
+ assert_equal stats_sum.count, stats1.count + stats2.count
72
+ end
73
+ =end
74
+
75
+ #XXX: Update test_bucket_contents() if you muck with @@DATA
76
+ @@DATA = [ 1, 5, 4, 6, 1028, 1972, 16384, 16385, 16383]
77
+ def test_bucket_contents
78
+ #XXX: This is the only test so far that cares about the actual contents
79
+ # of @@DATA, so if you update that array ... update this method too
80
+ expected_buckets = [1, 4, 1024, 8192, 16384]
81
+ expected_counts = [1, 3, 2, 1, 2]
82
+
83
+ i = 0
84
+ @stats.each_nonzero do |bucket, count|
85
+ assert_equal expected_buckets[i], bucket
86
+ assert_equal expected_counts[i], count
87
+ # Increment for the next test
88
+ i += 1
89
+ end
90
+ end
91
+
92
+ def test_histogram
93
+ puts @stats.to_s
94
+ end
95
+
96
+ def test_outlier
97
+ assert_equal 0, @stats.outliers_low
98
+ assert_equal 0, @stats.outliers_high
99
+
100
+ @stats << -1
101
+ @stats << -2
102
+ @stats << 0
103
+
104
+ @stats << 2**128
105
+
106
+ # This should be the last value in the last bucket, but Ruby's native
107
+ # floats are not precise enough. Somewhere past 2^32 the log(x)/log(2)
108
+ # breaks down. So it shows up as 128 (outlier) instead of 127
109
+ #@stats << (2**128) - 1
110
+
111
+ assert_equal 3, @stats.outliers_low
112
+ assert_equal 1, @stats.outliers_high
113
+ end
114
+
115
+ def test_std_dev
116
+ @stats.std_dev
117
+ end
118
+ end
119
+
120
+ class LinearHistogramTest < Test::Unit::TestCase
121
+ def setup
122
+ @stats = Aggregate.new(:low => 0, :high => 32768, :width => 1024)
123
+
124
+ @@DATA.each do |x|
125
+ @stats << x
126
+ end
127
+ end
128
+
129
+ def test_validation
130
+
131
+ # Range cannot be 0
132
+ assert_raise(ArgumentError) {bad_stats = Aggregate.new(:low => 32,:high => 32, :width => 4)}
133
+
134
+ # Range cannot be negative
135
+ assert_raise(ArgumentError) {bad_stats = Aggregate.new(:low => 32, :high => 16, :width => 4)}
136
+
137
+ # Range cannot be < single bucket
138
+ assert_raise(ArgumentError) {bad_stats = Aggregate.new(:low => 16, :high => 32, :width => 17)}
139
+
140
+ # Range % width must equal 0 (for now)
141
+ assert_raise(ArgumentError) {bad_stats = Aggregate.new(:low => 1, :high => 16384, :width => 1024)}
142
+ end
143
+
144
+ #XXX: Update test_bucket_contents() if you muck with @@DATA
145
+ # 32768 is an outlier
146
+ @@DATA = [ 0, 1, 5, 4, 6, 1028, 1972, 16384, 16385, 16383, 32768]
147
+ def test_bucket_contents
148
+ #XXX: This is the only test so far that cares about the actual contents
149
+ # of @@DATA, so if you update that array ... update this method too
150
+ expected_buckets = [0, 1024, 15360, 16384]
151
+ expected_counts = [5, 2, 1, 2]
152
+
153
+ i = 0
154
+ @stats.each_nonzero do |bucket, count|
155
+ assert_equal expected_buckets[i], bucket
156
+ assert_equal expected_counts[i], count
157
+ # Increment for the next test
158
+ i += 1
159
+ end
160
+ end
161
+
162
+ end
metadata ADDED
@@ -0,0 +1,75 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: aggregate_afurmanov
3
+ version: !ruby/object:Gem::Version
4
+ hash: 19
5
+ prerelease: false
6
+ segments:
7
+ - 0
8
+ - 2
9
+ - 2
10
+ version: 0.2.2
11
+ platform: ruby
12
+ authors:
13
+ - Joseph Ruscio, Aleksandr Furmanov
14
+ autorequire:
15
+ bindir: bin
16
+ cert_chain: []
17
+
18
+ date: 2010-12-08 00:00:00 -08:00
19
+ default_executable:
20
+ dependencies: []
21
+
22
+ description: "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate"
23
+ email: aleksandr.furmanov@gmail.com
24
+ executables: []
25
+
26
+ extensions: []
27
+
28
+ extra_rdoc_files:
29
+ - LICENSE
30
+ - README.textile
31
+ files:
32
+ - .gitignore
33
+ - LICENSE
34
+ - README.textile
35
+ - Rakefile
36
+ - VERSION
37
+ - aggregate_afurmanov.gemspec
38
+ - lib/aggregate.rb
39
+ - test/ts_aggregate.rb
40
+ has_rdoc: true
41
+ homepage: http://github.com/afurmanov/aggregate
42
+ licenses: []
43
+
44
+ post_install_message:
45
+ rdoc_options:
46
+ - --charset=UTF-8
47
+ require_paths:
48
+ - lib
49
+ required_ruby_version: !ruby/object:Gem::Requirement
50
+ none: false
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ hash: 3
55
+ segments:
56
+ - 0
57
+ version: "0"
58
+ required_rubygems_version: !ruby/object:Gem::Requirement
59
+ none: false
60
+ requirements:
61
+ - - ">="
62
+ - !ruby/object:Gem::Version
63
+ hash: 3
64
+ segments:
65
+ - 0
66
+ version: "0"
67
+ requirements: []
68
+
69
+ rubyforge_project:
70
+ rubygems_version: 1.3.7
71
+ signing_key:
72
+ specification_version: 3
73
+ summary: Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support
74
+ test_files:
75
+ - test/ts_aggregate.rb