aggregate_afurmanov 0.2.2

data/.gitignore ADDED
@@ -0,0 +1 @@
1
+ pkg/
data/LICENSE ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2009 Joseph Ruscio
2
+
3
+ Permission is hereby granted, free of charge, to any person
4
+ obtaining a copy of this software and associated documentation
5
+ files (the "Software"), to deal in the Software without
6
+ restriction, including without limitation the rights to use,
7
+ copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ copies of the Software, and to permit persons to whom the
9
+ Software is furnished to do so, subject to the following
10
+ conditions:
11
+
12
+ The above copyright notice and this permission notice shall be
13
+ included in all copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
16
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
17
+ OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
18
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
19
+ HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
20
+ WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
21
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
22
+ OTHER DEALINGS IN THE SOFTWARE.
data/README.textile ADDED
@@ -0,0 +1,215 @@
1
+ h1. Aggregate
2
+
3
+ By Joseph Ruscio
4
+
5
+ Aggregate is an intuitive ruby implementation of a statistics aggregator
6
+ including both default and configurable histogram support. It does this
7
+ without recording/storing any of the actual sample values, making it
8
+ suitable for tracking statistics across millions/billions of samples
9
+ without any impact on performance or memory footprint. Originally
10
+ inspired by the Aggregate support in "SystemTap.":http://sourceware.org/systemtap
11
+
12
+ h2. Getting Started
13
+
14
+ Aggregates are easy to instantiate, populate with sample data, and then
15
+ inspect for common aggregate statistics:
16
+
17
+ <pre><code>
18
+ #After instantiation use the << operator to add a sample to the aggregate:
19
+ stats = Aggregate.new
20
+
21
+ loop do
22
+ # Take some action that generates a sample measurement
23
+ stats << sample
24
+ end
25
+
26
+ # The number of samples
27
+ stats.count
28
+
29
+ # The average
30
+ stats.mean
31
+
32
+ # Max sample value
33
+ stats.max
34
+
35
+ # Min sample value
36
+ stats.min
37
+
38
+ # The standard deviation
39
+ stats.std_dev
40
+ </code></pre>
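The statistics above can be kept without storing samples because they only depend on a few running totals. The following is a minimal sketch of that idea, not the gem's code: the class name @RunningStats@ is hypothetical, and it assumes the sample standard deviation (n - 1 in the denominator).

```ruby
# Hypothetical helper, not part of the gem: tracks count, sum, and sum of
# squares so that mean and standard deviation need no stored samples.
class RunningStats
  attr_reader :count, :min, :max

  def initialize
    @count = 0
    @sum = 0.0
    @sum2 = 0.0
    @min = nil
    @max = nil
  end

  # Fold one sample into the running totals and return self for chaining.
  def <<(sample)
    @min = sample if @min.nil? || sample < @min
    @max = sample if @max.nil? || sample > @max
    @count += 1
    @sum += sample
    @sum2 += sample * sample
    self
  end

  def mean
    @sum / @count
  end

  # Sample standard deviation computed from the running sums only.
  def std_dev
    Math.sqrt((@sum2 - (@sum * @sum) / @count) / (@count - 1))
  end
end

stats = RunningStats.new
[2, 4, 4, 4, 5, 5, 7, 9].each { |x| stats << x }
puts stats.count
puts stats.mean
puts stats.std_dev
```

Memory use stays constant regardless of how many samples are added, which is the property the introduction describes.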
41
+
42
+ h2. Histograms
43
+
44
+ Perhaps more important than the basic aggregate statistics detailed above,
45
+ Aggregate also maintains a histogram of samples. For anything other than
46
+ normally distributed data, averages are insufficient at best and often downright misleading.
47
+ 37Signals recently posted a terse but effective "explanation":http://37signals.com/svn/posts/1836-the-problem-with-averages of the importance of histograms.
48
+ Aggregate maintains its histogram internally as a set of "buckets".
49
+ Each bucket represents a range of possible sample values. The set of all buckets
50
+ represents the range of "normal" sample values.
51
+
52
+ h3. Binary Histograms
53
+
54
+ Without any configuration, an Aggregate instance maintains a binary histogram, where
55
+ each bucket represents a range twice as large as the preceding bucket i.e.
56
+ [1,1], [2,3], [4,5,6,7], [8,9,10,11,12,13,14,15]. The default binary histogram
57
+ provides for 128 buckets, theoretically covering the range [1, (2^127) - 1]
58
+ (See NOTES below for a discussion on the effects in practice of insufficient
59
+ precision.)
60
+
61
+ Binary histograms are useful when we have little idea about what the
62
+ sample distribution may look like as almost any positive value will
63
+ fall into some bucket. After using binary histograms to determine
64
+ the coarse-grained characteristics of your sample space you can
65
+ configure a linear histogram to examine it in closer detail.
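The power-of-two bucketing described above can be sketched for positive integers as follows. This is an illustration only: the helper names are hypothetical, and it uses @Integer#bit_length@ (Ruby 2.1 or later) rather than the logarithm-based lookup the library itself uses.

```ruby
# Hypothetical helper: index of the power-of-two bucket holding a positive
# integer. Bucket i covers the values [2**i, 2**(i + 1) - 1].
def binary_bucket_index(value)
  unless value.is_a?(Integer) && value > 0
    raise ArgumentError, "value must be a positive integer"
  end
  value.bit_length - 1
end

# Hypothetical helper: inclusive range of values covered by bucket `index`.
def binary_bucket_range(index)
  low = 2**index
  [low, 2 * low - 1]
end

[1, 2, 3, 4, 7, 8, 15, 16].each do |v|
  i = binary_bucket_index(v)
  low, high = binary_bucket_range(i)
  puts "#{v} -> bucket #{i} [#{low}, #{high}]"
end
```

Each bucket is twice as wide as the one before it, so a handful of buckets spans a very large range at coarse resolution.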
66
+
67
+ h3. Linear Histograms
68
+
69
+ Linear histograms are specified with the three values low, high, and width.
70
+ Low and high specify a range [low, high) of values included in the
71
+ histogram (all others are outliers). Width specifies the number of
72
+ values represented by each bucket and therefore the number of
73
+ buckets i.e. granularity of the histogram. The histogram range
74
+ (high - low) must be a multiple of width:
75
+
76
+ <pre><code>
77
+ #Want to track aggregate stats on response times in ms
78
+ response_stats = Aggregate.new(:low => 0, :high => 2000, :width => 50)
79
+ </code></pre>
80
+
81
+ The example above creates a linear histogram that tracks the
82
+ response times from 0 ms to 2000 ms in buckets of width 50 ms. Hopefully
83
+ most of your samples fall in the first couple buckets!
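How a sample maps to a linear bucket can be sketched with plain arithmetic. The function below is a hypothetical illustration, not the library's implementation; it returns the lower bound of the bucket a value falls in, or a symbol when the value lies outside [low, high).

```ruby
# Hypothetical helper: lower bound of the linear bucket containing `value`,
# or :outlier_low / :outlier_high when value is outside [low, high).
def linear_bucket(value, low, high, width)
  raise ArgumentError, "high must be > low" unless high > low
  unless ((high - low) % width).zero?
    raise ArgumentError, "range (high - low) must be a multiple of width"
  end
  return :outlier_low if value < low
  return :outlier_high if value >= high

  # Number of whole bucket widths between low and the value.
  steps = ((value - low) / width).floor
  low + steps * width
end

[0, 49, 50, 75.5, 1999, 2000, -1].each do |ms|
  puts "#{ms} -> #{linear_bucket(ms, 0, 2000, 50)}"
end
```

Because the bucket is computed directly from the value, no search over the buckets is needed.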
84
+
85
+ h3. Histogram Outliers
86
+
87
+ An Aggregate records any samples that fall outside the histogram range as
88
+ outliers:
89
+
90
+ <pre><code>
91
+ # Number of samples that fall below the normal range
92
+ stats.outliers_low
93
+
94
+ # Number of samples that fall above the normal range
95
+ stats.outliers_high
96
+ </code></pre>
97
+
98
+ h3. Histogram Iterators
99
+
100
+ Once a histogram is populated Aggregate provides iterator support for
101
+ examining the contents of buckets. The iterators provide both the
102
+ number of samples in the bucket, as well as its range:
103
+
104
+ <pre><code>
105
+ #Examine every bucket
106
+ stats.each do |bucket, count|
107
+ end
108
+
109
+ #Examine only buckets containing samples
110
+ stats.each_nonzero do |bucket, count|
111
+ end
112
+ </code></pre>
113
+
114
+ h3. Histogram Bar Chart
115
+
116
+ Finally Aggregate contains sophisticated pretty-printing support to generate
117
+ ASCII bar charts. For any given number of columns >= 80 (defaults to 80) and
118
+ sample distribution the <code>to_s</code> method properly sets a marker weight based on the
119
+ samples per bucket and aligns all output. Empty buckets are skipped to conserve
120
+ screen space.
121
+
122
+ <pre><code>
123
+ # Generate and display an 80 column histogram
124
+ puts stats.to_s
125
+
126
+ # Generate and display a 120 column histogram
127
+ puts stats.to_s(120)
128
+ </code></pre>
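The marker weight mentioned above can be sketched as follows: one @ stands for at least one sample, and for as many samples as are needed to make the fullest bucket fit the available bar width. The helper name @bar_for@ and the sample counts are made up for illustration.

```ruby
# Hypothetical helper: bar of '@' markers for one bucket.
# weight is the number of samples one '@' represents (never less than 1).
def bar_for(count, max_count, max_bar_width)
  weight = [max_count.to_f / max_bar_width, 1.0].max
  "@" * (count / weight).to_i
end

counts = { 0 => 660, 50 => 335, 100 => 5 } # made-up bucket counts
max_count = counts.values.max
counts.each do |value, count|
  puts format("%5d |%-66s| %d", value, bar_for(count, max_count, 66), count)
end
```

A bucket with fewer samples than the weight gets an empty bar, which is why sparsely populated buckets show a count but no markers.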
129
+
130
+ This code example populates both a binary and linear histogram with the same
131
+ set of 65536 values generated by <code>rand</code> to produce the
132
+ two histograms that follow it:
133
+
134
+ <pre><code>
135
+ require 'rubygems'
136
+ require 'aggregate'
137
+
138
+ # Create an Aggregate instance
139
+ binary_aggregate = Aggregate.new(:log_buckets => 128)
140
+ linear_aggregate = Aggregate.new(:low => 0, :high => 65536, :width => 4096)
141
+
142
+ 65536.times do
143
+ x = rand(65536)
144
+ binary_aggregate << x
145
+ linear_aggregate << x
146
+ end
147
+
148
+ puts binary_aggregate.to_s
149
+ puts linear_aggregate.to_s
150
+ </code></pre>
151
+
152
+ h4. Binary Histogram
153
+
154
+ <pre><code>
155
+ value |------------------------------------------------------------------| count
156
+ 1 | | 3
157
+ 2 | | 1
158
+ 4 | | 5
159
+ 8 | | 9
160
+ 16 | | 15
161
+ 32 | | 29
162
+ 64 | | 62
163
+ 128 | | 115
164
+ 256 | | 267
165
+ 512 |@ | 523
166
+ 1024 |@ | 970
167
+ 2048 |@@@ | 1987
168
+ 4096 |@@@@@@@@ | 4075
169
+ 8192 |@@@@@@@@@@@@@@@@ | 8108
170
+ 16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 16405
171
+ 32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 32961
172
+ ~
173
+ Total |------------------------------------------------------------------| 65535
174
+ </code></pre>
175
+
176
+ h4. Linear (0, 65536, 4096) Histogram
177
+
178
+ <pre><code>
179
+ value |------------------------------------------------------------------| count
180
+ 0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4094
181
+ 4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 4202
182
+ 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4118
183
+ 12288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4059
184
+ 16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 3999
185
+ 20480 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4083
186
+ 24576 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4134
187
+ 28672 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4143
188
+ 32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4152
189
+ 36864 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4033
190
+ 40960 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4064
191
+ 45056 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4012
192
+ 49152 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4070
193
+ 53248 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4090
194
+ 57344 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4135
195
+ 61440 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4144
196
+ Total |------------------------------------------------------------------| 65532
197
+ </code></pre>
198
+ We can see from these histograms that Ruby's rand function does a relatively good
199
+ job of distributing returned values in the requested range.
200
+
201
+ h2. Examples
202
+
203
+ Here's an example of a "handy timing benchmark":http://gist.github.com/187669
204
+ implemented with aggregate.
205
+
206
+ h2. NOTES
207
+
208
+ Ruby 1.8 doesn't have a log2 function built into Math, so we approximate with
209
+ log(x)/log(2). Theoretically floor(log(2^n - 1)/log(2)) == n-1. Unfortunately, due
210
+ to precision limitations, once n reaches a certain size (somewhere > 32)
211
+ this starts to return n. The larger the value of n, the more numbers i.e.
212
+ (2^n - 2), (2^n - 3), etc. fall prey to this error. We could probably look into
213
+ using something like BigDecimal, but for the current purposes of the binary
214
+ histogram i.e. a simple coarse-grained view the current implementation is
215
+ sufficient.
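The precision issue can be observed by comparing the floating-point approximation with an integer-only computation. The sketch below is an illustration under assumptions: both helper names are hypothetical, @Integer#bit_length@ requires Ruby 2.1 or later, and the exact n at which the two results start to differ depends on the platform's floating-point behaviour, so no output is shown.

```ruby
# Floating-point approximation of floor(log2(x)), as described in the note.
def approx_log2_floor(x)
  (Math.log(x) / Math.log(2)).to_i
end

# Integer-only alternative: exact for any positive Integer, however large.
def exact_log2_floor(x)
  x.bit_length - 1
end

[8, 32, 53, 64, 100].each do |n|
  x = 2**n - 1
  puts "n=#{n}: approx=#{approx_log2_floor(x)} exact=#{exact_log2_floor(x)}"
end
```

The integer-only version avoids BigDecimal entirely, but it only applies to Integer samples, so it is not a drop-in replacement for the logarithm.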
data/Rakefile ADDED
@@ -0,0 +1,15 @@
1
+ require 'rake'
2
+
3
+ begin
4
+ require 'jeweler'
5
+ Jeweler::Tasks.new do |gemspec|
6
+ gemspec.name = "aggregate_afurmanov"
7
+ gemspec.summary = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support"
8
+ gemspec.description = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate"
9
+ gemspec.email = "aleksandr.furmanov@gmail.com"
10
+ gemspec.homepage = "http://github.com/afurmanov/aggregate"
11
+ gemspec.authors = ["Joseph Ruscio, Aleksandr Furmanov"]
12
+ end
13
+ rescue LoadError
14
+ puts "Jeweler not available. Install it with: sudo gem install technicalpickles-jeweler -s http://gems.github.com"
15
+ end
data/VERSION ADDED
@@ -0,0 +1 @@
1
+ 0.2.2
data/aggregate_afurmanov.gemspec ADDED
@@ -0,0 +1,48 @@
1
+ # Generated by jeweler
2
+ # DO NOT EDIT THIS FILE DIRECTLY
3
+ # Instead, edit Jeweler::Tasks in Rakefile, and run the gemspec command
4
+ # -*- encoding: utf-8 -*-
5
+
6
+ Gem::Specification.new do |s|
7
+ s.name = %q{aggregate_afurmanov}
8
+ s.version = "0.2.2"
9
+
10
+ s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
11
+ s.authors = ["Joseph Ruscio, Aleksandr Furmanov"]
12
+ s.date = %q{2010-12-08}
13
+ s.description = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate}
14
+ s.email = %q{aleksandr.furmanov@gmail.com}
15
+ s.extra_rdoc_files = [
16
+ "LICENSE",
17
+ "README.textile"
18
+ ]
19
+ s.files = [
20
+ ".gitignore",
21
+ "LICENSE",
22
+ "README.textile",
23
+ "Rakefile",
24
+ "VERSION",
25
+ "aggregate_afurmanov.gemspec",
26
+ "lib/aggregate.rb",
27
+ "test/ts_aggregate.rb"
28
+ ]
29
+ s.homepage = %q{http://github.com/afurmanov/aggregate}
30
+ s.rdoc_options = ["--charset=UTF-8"]
31
+ s.require_paths = ["lib"]
32
+ s.rubygems_version = %q{1.3.7}
33
+ s.summary = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support}
34
+ s.test_files = [
35
+ "test/ts_aggregate.rb"
36
+ ]
37
+
38
+ if s.respond_to? :specification_version then
39
+ current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
40
+ s.specification_version = 3
41
+
42
+ if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
43
+ else
44
+ end
45
+ else
46
+ end
47
+ end
48
+
data/lib/aggregate.rb ADDED
@@ -0,0 +1,298 @@
1
+ # Implements aggregate statistics and maintains
2
+ # configurable histogram for a set of given samples. Convenient for tracking
3
+ # high throughput data.
4
+ class Aggregate
5
+ #The current average of all samples
6
+ attr_reader :mean
7
+
8
+ #The current number of samples
9
+ attr_reader :count
10
+
11
+ #The maximum sample value
12
+ attr_reader :max
13
+
14
+ #The minimum samples value
15
+ attr_reader :min
16
+
17
+ #The sum of all samples
18
+ attr_reader :sum
19
+
20
+ #The number of samples falling below the lowest valued histogram bucket
21
+ attr_reader :outliers_low
22
+
23
+ #The number of samples falling above the highest valued histogram bucket
24
+ attr_reader :outliers_high
25
+
26
+ DEFAULT_LOG_BUCKETS = 8
27
+
28
+ # The number of buckets in the binary logarithmic histogram (low => 2**0, high => 2**(log_buckets - 1))
29
+ def log_buckets
30
+ @log_buckets
31
+ end
32
+
33
+ # Create a new Aggregate that maintains a binary logarithmic histogram
34
+ # by default. Specifying values for low, high, and width configures
35
+ # the aggregate to maintain a linear histogram with (high - low)/width buckets
36
+ def initialize(options = {})
37
+ low = options[:low]
38
+ high = options[:high]
39
+ width = options[:width]
40
+ @log_buckets = options[:log_buckets] || DEFAULT_LOG_BUCKETS
41
+ @count = 0
42
+ @sum = 0.0
43
+ @sum2 = 0.0
44
+ @outliers_low = 0
45
+ @outliers_high = 0
46
+
47
+ # If the user asks we maintain a linear histogram where
48
+ # values in the range [low, high) are bucketed in multiples
49
+ # of width
50
+ if (nil != low && nil != high && nil != width)
51
+
52
+ #Validate linear specification
53
+ if high <= low
54
+ raise ArgumentError, "High bucket must be > Low bucket"
55
+ end
56
+
57
+ if high - low < width
58
+ raise ArgumentError, "Histogram width must be <= histogram range"
59
+ end
60
+
61
+ if 0 != (high - low).modulo(width)
62
+ raise ArgumentError, "Histogram range (high - low) must be a multiple of width"
63
+ end
64
+
65
+ @low = low
66
+ @high = high
67
+ @width = width
68
+ else
69
+ low ||= 1
70
+ @low = 1
71
+ @low = to_bucket(to_index(low))
72
+ @high = to_bucket(to_index(@low) + log_buckets - 1)
73
+ end
74
+
75
+ #Initialize all buckets to 0
76
+ @buckets = Array.new(bucket_count, 0)
77
+ end
78
+
79
+ # Include a sample in the aggregate
80
+ def << data
81
+
82
+ # Update min/max
83
+ if 0 == @count
84
+ @min = data
85
+ @max = data
86
+ else
87
+ @max = [data, @max].max
88
+ @min = [data, @min].min
89
+ end
90
+
91
+ # Update the running info
92
+ @count += 1
93
+ @sum += data
94
+ @sum2 += (data * data)
95
+
96
+ # Update the bucket
97
+ @buckets[to_index(data)] += 1 unless outlier?(data)
98
+ end
99
+
100
+ def mean
101
+ @sum / @count
102
+ end
103
+
104
+ #Calculate the standard deviation
105
+ def std_dev
106
+ Math.sqrt((@sum2.to_f - ((@sum.to_f * @sum.to_f)/@count.to_f)) / (@count.to_f - 1))
107
+ end
108
+
109
+ # Combine two aggregates
110
+ #def +(b)
111
+ # a = self
112
+ # c = Aggregate.new
113
+
114
+ # c.count = a.count + b.count
115
+ #end
116
+
117
+ #Generate a pretty-printed ASCII representation of the histogram
118
+ def to_s(columns=nil)
119
+
120
+ #default to an 80 column terminal, don't support < 80 for now
121
+ if nil == columns
122
+ columns = 80
123
+ else
124
+ raise ArgumentError if columns < 80
125
+ end
126
+
127
+ #Find the largest bucket and create an array of the rows we intend to print
128
+ disp_buckets = Array.new
129
+ max_count = 0
130
+ total = 0
131
+ @buckets.each_with_index do |count, idx|
132
+ next if 0 == count
133
+ max_count = [max_count, count].max
134
+ disp_buckets << [idx, to_bucket(idx), count]
135
+ total += count
136
+ end
137
+
138
+ #XXX: Better to print just header --> footer
139
+ return "Empty histogram" if 0 == disp_buckets.length
140
+
141
+ #Figure out how wide the value and count columns need to be based on their
142
+ #largest respective numbers
143
+ value_str = "value"
144
+ count_str = "count"
145
+ total_str = "Total"
146
+ value_width = [disp_buckets.last[1].to_s.length, value_str.length].max
147
+ value_width = [value_width, total_str.length].max
148
+ count_width = [total.to_s.length, count_str.length].max
149
+ max_bar_width = columns - (value_width + " |".length + "| ".length + count_width)
150
+
151
+ #Determine the value of a '@'
152
+ weight = [max_count.to_f/max_bar_width.to_f, 1.0].max
153
+
154
+ #format the header
155
+ histogram = sprintf("%#{value_width}s |", value_str)
156
+ max_bar_width.times { histogram << "-"}
157
+ histogram << sprintf("| %#{count_width}s\n", count_str)
158
+
159
+ # We denote empty buckets with a '~'
160
+ def skip_row(value_width)
161
+ sprintf("%#{value_width}s ~\n", " ")
162
+ end
163
+
164
+ #Loop through each bucket to be displayed and output the correct number
165
+ prev_index = disp_buckets[0][0] - 1
166
+
167
+ disp_buckets.each do |x|
168
+ #Denote skipped empty buckets with a ~
169
+ histogram << skip_row(value_width) unless prev_index == x[0] - 1
170
+ prev_index = x[0]
171
+
172
+ #Add the value
173
+ row = sprintf("%#{value_width}d |", x[1])
174
+
175
+ #Add the bar
176
+ bar_size = (x[2]/weight).to_i
177
+ bar_size.times { row += "@"}
178
+ (max_bar_width - bar_size).times { row += " " }
179
+
180
+ #Add the count
181
+ row << sprintf("| %#{count_width}d\n", x[2])
182
+
183
+ #Append the finished row onto the histogram
184
+ histogram << row
185
+ end
186
+
187
+ #End the table
188
+ histogram << skip_row(value_width) if disp_buckets.last[0] != bucket_count-1
189
+ histogram << sprintf("%#{value_width}s", "Total")
190
+ histogram << " |"
191
+ max_bar_width.times {histogram << "-"}
192
+ histogram << "| "
193
+ histogram << sprintf("%#{count_width}d\n", total)
194
+ end
195
+
196
+ #Iterate through each bucket in the histogram regardless of
197
+ #its contents
198
+ def each
199
+ @buckets.each_with_index do |count, index|
200
+ yield(to_bucket(index), count)
201
+ end
202
+ end
203
+
204
+ #Iterate through only the buckets in the histogram that contain
205
+ #samples
206
+ def each_nonzero
207
+ @buckets.each_with_index do |count, index|
208
+ yield(to_bucket(index), count) if count != 0
209
+ end
210
+ end
211
+
212
+ private
213
+
214
+ def linear?
215
+ nil != @width
216
+ end
217
+
218
+ def outlier? (data)
219
+
220
+ if data < @low
221
+ @outliers_low += 1
222
+ elsif data >= @high
223
+ @outliers_high += 1
224
+ else
225
+ return false
226
+ end
227
+ end
228
+
229
+ def bucket_count
230
+ if linear?
231
+ return (@high-@low)/@width
232
+ else
233
+ return log_buckets
234
+ end
235
+ end
236
+
237
+ def to_bucket(index)
238
+ if linear?
239
+ return @low + (index * @width)
240
+ else
241
+ return 2**(log2(@low) + index)
242
+ end
243
+ end
244
+
245
+ def right_bucket? index, data
246
+
247
+ # check invariant
248
+ raise unless linear?
249
+
250
+ bucket = to_bucket(index)
251
+
252
+ #It's the right bucket if data falls between bucket and next bucket
253
+ bucket <= data && data < bucket + @width
254
+ end
255
+
256
+ =begin
257
+ def find_bucket(lower, upper, target)
258
+ #Classic binary search
259
+ return upper if right_bucket?(upper, target)
260
+
261
+ # Cut the search range in half
262
+ middle = (upper/2).to_i
263
+
264
+ # Determine which half contains our value and recurse
265
+ if (to_bucket(middle) >= target)
266
+ return find_bucket(lower, middle, target)
267
+ else
268
+ return find_bucket(middle, upper, target)
269
+ end
270
+ end
271
+ =end
272
+
273
+ # A data point is added to the bucket[n] where the data point
274
+ # is greater than or equal to the value represented by bucket[n], but less
275
+ # than the value represented by bucket[n+1]
276
+ public
277
+ def to_index (data)
278
+
279
+ # basic case is simple
280
+ return log2([1,data/@low].max).to_i if !linear?
281
+
282
+ # Search for the right bucket in the linear case
283
+ @buckets.each_with_index do |count, idx|
284
+ return idx if right_bucket?(idx, data)
285
+ end
286
+ #find_bucket(0, bucket_count-1, data)
287
+
288
+ #Should not get here
289
+ raise "#{data}"
290
+ end
291
+
292
+ # log2(x) returns log base 2 of x as a Float; its integer part i satisfies 2**i <= x < 2**(i+1)
293
+ @@LOG2_DIVISOR = Math.log(2)
294
+ def log2( x )
295
+ Math.log(x) / @@LOG2_DIVISOR
296
+ end
297
+
298
+ end
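The commented-out + method in the class above hints at combining two aggregates. Since the class only keeps running totals, a merge can in principle be done by adding those totals. The following is a rough sketch under assumptions, not the author's design: @Totals@ is a hypothetical stand-in for the fields an Aggregate tracks, and histogram buckets are left out (they could be added element-wise only when both histograms share the same configuration).

```ruby
# Hypothetical stand-in for the running totals kept per aggregate.
Totals = Struct.new(:count, :sum, :sum2, :min, :max) do
  # Build totals from a list of samples (for demonstration only).
  def self.of(samples)
    new(samples.length,
        samples.inject(0.0) { |acc, x| acc + x },
        samples.inject(0.0) { |acc, x| acc + x * x },
        samples.min,
        samples.max)
  end

  # Merge two sets of totals; no individual samples are required.
  def +(other)
    Totals.new(count + other.count,
               sum + other.sum,
               sum2 + other.sum2,
               [min, other.min].min,
               [max, other.max].max)
  end

  def mean
    sum / count
  end
end

a = Totals.of([1, 5, 4])
b = Totals.of([6, 1028])
merged = a + b
puts merged.count
puts merged.mean
```

Merging this way gives the same totals as feeding all samples into a single accumulator.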
data/test/ts_aggregate.rb ADDED
@@ -0,0 +1,162 @@
1
+ require 'test/unit'
2
+ require 'lib/aggregate'
3
+
4
+ class SimpleStatsTest < Test::Unit::TestCase
5
+
6
+ def setup
7
+ @stats = Aggregate.new(:log_buckets => 128)
8
+
9
+ @@DATA.each do |x|
10
+ @stats << x
11
+ end
12
+ end
13
+
14
+ def test_stats_count
15
+ assert_equal @@DATA.length, @stats.count
16
+ end
17
+
18
+ def test_stats_min_max
19
+ sorted_data = @@DATA.sort
20
+
21
+ assert_equal sorted_data[0], @stats.min
22
+ assert_equal sorted_data.last, @stats.max
23
+ end
24
+
25
+ def test_stats_mean
26
+ sum = 0
27
+ @@DATA.each do |x|
28
+ sum += x
29
+ end
30
+
31
+ assert_equal sum.to_f/@@DATA.length.to_f, @stats.mean
32
+ end
33
+
34
+ def test_bucket_counts
35
+
36
+ #Test each iterator
37
+ total_bucket_sum = 0
38
+ i = 0
39
+ @stats.each do |bucket, count|
40
+ assert_equal 2**i, bucket
41
+
42
+ total_bucket_sum += count
43
+ i += 1
44
+ end
45
+
46
+ assert_equal @@DATA.length, total_bucket_sum
47
+
48
+ #Test each_nonzero iterator
49
+ prev_bucket = 0
50
+ total_bucket_sum = 0
51
+ @stats.each_nonzero do |bucket, count|
52
+ assert bucket > prev_bucket
53
+ assert_not_equal count, 0
54
+
55
+ total_bucket_sum += count
56
+ end
57
+
58
+ assert_equal total_bucket_sum, @@DATA.length
59
+ end
60
+
61
+ =begin
62
+ def test_addition
63
+ stats1 = Aggregate.new
64
+ stats2 = Aggregate.new
65
+
66
+ stats1 << 1
67
+ stats2 << 3
68
+
69
+ stats_sum = stats1 + stats2
70
+
71
+ assert_equal stats_sum.count, stats1.count + stats2.count
72
+ end
73
+ =end
74
+
75
+ #XXX: Update test_bucket_contents() if you muck with @@DATA
76
+ @@DATA = [ 1, 5, 4, 6, 1028, 1972, 16384, 16385, 16383]
77
+ def test_bucket_contents
78
+ #XXX: This is the only test so far that cares about the actual contents
79
+ # of @@DATA, so if you update that array ... update this method too
80
+ expected_buckets = [1, 4, 1024, 8192, 16384]
81
+ expected_counts = [1, 3, 2, 1, 2]
82
+
83
+ i = 0
84
+ @stats.each_nonzero do |bucket, count|
85
+ assert_equal expected_buckets[i], bucket
86
+ assert_equal expected_counts[i], count
87
+ # Increment for the next test
88
+ i += 1
89
+ end
90
+ end
91
+
92
+ def test_histogram
93
+ puts @stats.to_s
94
+ end
95
+
96
+ def test_outlier
97
+ assert_equal 0, @stats.outliers_low
98
+ assert_equal 0, @stats.outliers_high
99
+
100
+ @stats << -1
101
+ @stats << -2
102
+ @stats << 0
103
+
104
+ @stats << 2**128
105
+
106
+ # This should be the last value in the last bucket, but Ruby's native
107
+ # floats are not precise enough. Somewhere past 2^32 the log(x)/log(2)
108
+ # breaks down. So it shows up as 128 (outlier) instead of 127
109
+ #@stats << (2**128) - 1
110
+
111
+ assert_equal 3, @stats.outliers_low
112
+ assert_equal 1, @stats.outliers_high
113
+ end
114
+
115
+ def test_std_dev
116
+ @stats.std_dev
117
+ end
118
+ end
119
+
120
+ class LinearHistogramTest < Test::Unit::TestCase
121
+ def setup
122
+ @stats = Aggregate.new(:low => 0, :high => 32768, :width => 1024)
123
+
124
+ @@DATA.each do |x|
125
+ @stats << x
126
+ end
127
+ end
128
+
129
+ def test_validation
130
+
131
+ # Range cannot be 0
132
+ assert_raise(ArgumentError) {bad_stats = Aggregate.new(:low => 32,:high => 32, :width => 4)}
133
+
134
+ # Range cannot be negative
135
+ assert_raise(ArgumentError) {bad_stats = Aggregate.new(:low => 32, :high => 16, :width => 4)}
136
+
137
+ # Range cannot be < single bucket
138
+ assert_raise(ArgumentError) {bad_stats = Aggregate.new(:low => 16, :high => 32, :width => 17)}
139
+
140
+ # Range % width must equal 0 (for now)
141
+ assert_raise(ArgumentError) {bad_stats = Aggregate.new(:low => 1, :high => 16384, :width => 1024)}
142
+ end
143
+
144
+ #XXX: Update test_bucket_contents() if you muck with @@DATA
145
+ # 32768 is an outlier
146
+ @@DATA = [ 0, 1, 5, 4, 6, 1028, 1972, 16384, 16385, 16383, 32768]
147
+ def test_bucket_contents
148
+ #XXX: This is the only test so far that cares about the actual contents
149
+ # of @@DATA, so if you update that array ... update this method too
150
+ expected_buckets = [0, 1024, 15360, 16384]
151
+ expected_counts = [5, 2, 1, 2]
152
+
153
+ i = 0
154
+ @stats.each_nonzero do |bucket, count|
155
+ assert_equal expected_buckets[i], bucket
156
+ assert_equal expected_counts[i], count
157
+ # Increment for the next test
158
+ i += 1
159
+ end
160
+ end
161
+
162
+ end
metadata ADDED
@@ -0,0 +1,75 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: aggregate_afurmanov
3
+ version: !ruby/object:Gem::Version
4
+ hash: 19
5
+ prerelease: false
6
+ segments:
7
+ - 0
8
+ - 2
9
+ - 2
10
+ version: 0.2.2
11
+ platform: ruby
12
+ authors:
13
+ - Joseph Ruscio, Aleksandr Furmanov
14
+ autorequire:
15
+ bindir: bin
16
+ cert_chain: []
17
+
18
+ date: 2010-12-08 00:00:00 -08:00
19
+ default_executable:
20
+ dependencies: []
21
+
22
+ description: "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate"
23
+ email: aleksandr.furmanov@gmail.com
24
+ executables: []
25
+
26
+ extensions: []
27
+
28
+ extra_rdoc_files:
29
+ - LICENSE
30
+ - README.textile
31
+ files:
32
+ - .gitignore
33
+ - LICENSE
34
+ - README.textile
35
+ - Rakefile
36
+ - VERSION
37
+ - aggregate_afurmanov.gemspec
38
+ - lib/aggregate.rb
39
+ - test/ts_aggregate.rb
40
+ has_rdoc: true
41
+ homepage: http://github.com/afurmanov/aggregate
42
+ licenses: []
43
+
44
+ post_install_message:
45
+ rdoc_options:
46
+ - --charset=UTF-8
47
+ require_paths:
48
+ - lib
49
+ required_ruby_version: !ruby/object:Gem::Requirement
50
+ none: false
51
+ requirements:
52
+ - - ">="
53
+ - !ruby/object:Gem::Version
54
+ hash: 3
55
+ segments:
56
+ - 0
57
+ version: "0"
58
+ required_rubygems_version: !ruby/object:Gem::Requirement
59
+ none: false
60
+ requirements:
61
+ - - ">="
62
+ - !ruby/object:Gem::Version
63
+ hash: 3
64
+ segments:
65
+ - 0
66
+ version: "0"
67
+ requirements: []
68
+
69
+ rubyforge_project:
70
+ rubygems_version: 1.3.7
71
+ signing_key:
72
+ specification_version: 3
73
+ summary: Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support
74
+ test_files:
75
+ - test/ts_aggregate.rb