RubyGems - aggregate_afurmanov - Versions diffs - 0.2.2 - Mend

aggregate_afurmanov 0.2.2

Files changed (9)

data/.gitignore ADDED

@@ -0,0 +1 @@
+pkg/

data/LICENSE ADDED

@@ -0,0 +1,22 @@
+Copyright (c) 2009 Joseph Ruscio
+Permission is hereby granted, free of charge, to any person
+obtaining a copy of this software and associated documentation
+files (the "Software"), to deal in the Software without
+restriction, including without limitation the rights to use,
+copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the
+Software is furnished to do so, subject to the following
+conditions:
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES
+OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
+HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY,
+WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+OTHER DEALINGS IN THE SOFTWARE.

data/README.textile ADDED

@@ -0,0 +1,215 @@
+h1. Aggregate
+By Joseph Ruscio
+Aggregate is an intuitive Ruby implementation of a statistics aggregator
+that includes both default and configurable histogram support. It does this
+without recording or storing any of the actual sample values, so its memory
+footprint stays constant, making it suitable for tracking statistics across
+millions or billions of samples. Originally
+inspired by the Aggregate support in "SystemTap.":http://sourceware.org/systemtap
+h2. Getting Started
+Aggregates are easy to instantiate, populate with sample data, and then
+inspect for common aggregate statistics:
+<pre><code>
+#After instantiation use the << operator to add a sample to the aggregate:
+stats = Aggregate.new
+loop do
+  # Take some action that generates a sample measurement
+  stats << sample
+end
+# The number of samples
+stats.count
+# The average
+stats.mean
+# Max sample value
+stats.max
+# Min sample value
+stats.min
+# The standard deviation
+stats.std_dev
+</code></pre>
+h2. Histograms
+Perhaps more important than the basic aggregate statistics detailed above,
+Aggregate also maintains a histogram of samples. For anything other than
+normally distributed data, those basic statistics are insufficient at best and often downright misleading.
+37Signals posted a terse but effective "explanation":http://37signals.com/svn/posts/1836-the-problem-with-averages of the importance of histograms.
+Aggregate maintains its histogram internally as a set of "buckets".
+Each bucket represents a range of possible sample values. The set of all buckets
+represents the range of "normal" sample values.
+h3. Binary Histograms
+Without any configuration, an Aggregate instance maintains a binary histogram, where
+each bucket represents a range twice as large as the preceding bucket, i.e.
+[1,1], [2,3], [4,5,6,7], [8,9,10,11,12,13,14,15]. In this version the binary histogram
+defaults to 8 buckets (<code>DEFAULT_LOG_BUCKETS</code> in lib/aggregate.rb); configured with
+<code>:log_buckets => 128</code> it theoretically covers the range [1, (2^127) - 1].
+(See NOTES below for a discussion of the practical effects of insufficient
+precision.)
+Binary histograms are useful when we have little idea about what the
+sample distribution may look like as almost any positive value will
+fall into some bucket. After using binary histograms to determine
+the coarse-grained characteristics of your sample space you can
+configure a linear histogram to examine it in closer detail.
+h3. Linear Histograms
+Linear histograms are specified with the three values low, high, and width.
+Low and high specify a range [low, high) of values included in the
+histogram (all others are outliers). Width specifies the number of
+values represented by each bucket and therefore the number of
+buckets i.e. granularity of the histogram. The histogram range
+(high - low) must be a multiple of width:
+<pre><code>
+#Want to track aggregate stats on response times in ms
+response_stats = Aggregate.new(:low => 0, :high => 2000, :width => 50)
+</code></pre>
+The example above creates a linear histogram that tracks the
+response times from 0 ms to 2000 ms in buckets of width 50 ms. Hopefully
+most of your samples fall in the first couple buckets!
+h3. Histogram Outliers
+An Aggregate records any samples that fall outside the histogram range as
+outliers:
+<pre><code>
+# Number of samples that fall below the normal range
+stats.outliers_low
+# Number of samples that fall above the normal range
+stats.outliers_high
+</code></pre>
+h3. Histogram Iterators
+Once a histogram is populated Aggregate provides iterator support for
+examining the contents of buckets. The iterators provide both the
+number of samples in the bucket, as well as its range:
+<pre><code>
+#Examine every bucket
+@stats.each do |bucket, count|
+end
+#Examine only buckets containing samples
+@stats.each_nonzero do |bucket, count|
+end
+</code></pre>
+h3. Histogram Bar Chart
+Finally Aggregate contains sophisticated pretty-printing support to generate
+ASCII bar charts. For any given number of columns >= 80 (defaults to 80) and
+sample distribution the <code>to_s</code> method properly sets a marker weight based on the
+samples per bucket and aligns all output. Empty buckets are skipped to conserve
+screen space.
+<pre><code>
+# Generate and display an 80 column histogram
+puts stats.to_s
+# Generate and display a 120 column histogram
+puts stats.to_s(120)
+</code></pre>
+This code example populates both a binary and linear histogram with the same
+set of 65536 values generated by <code>rand</code> to produce the
+two histograms that follow it:
+<pre><code>
+require 'rubygems'
+require 'aggregate'
+# Create an Aggregate instance
+binary_aggregate = Aggregate.new(:log_buckets => 128)
+linear_aggregate = Aggregate.new(:low => 0, :high => 65536, :width => 4096)
+65536.times do
+  x = rand(65536)
+  binary_aggregate << x
+  linear_aggregate << x
+end
+puts binary_aggregate.to_s
+puts linear_aggregate.to_s
+</code></pre>
+h4. Binary Histogram
+<pre><code>
+value |------------------------------------------------------------------| count
+    1 |                                                                  |     3
+    2 |                                                                  |     1
+    4 |                                                                  |     5
+    8 |                                                                  |     9
+   16 |                                                                  |    15
+   32 |                                                                  |    29
+   64 |                                                                  |    62
+  128 |                                                                  |   115
+  256 |                                                                  |   267
+  512 |@                                                                 |   523
+ 1024 |@                                                                 |   970
+ 2048 |@@@                                                               |  1987
+ 4096 |@@@@@@@@                                                          |  4075
+ 8192 |@@@@@@@@@@@@@@@@                                                  |  8108
+16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@                                  | 16405
+32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 32961
+      ~
+Total |------------------------------------------------------------------| 65535
+</code></pre>
+h4. Linear (0, 65536, 4096) Histogram
+<pre><code>
+value |------------------------------------------------------------------| count
+    0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4094
+ 4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|  4202
+ 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4118
+12288 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4059
+16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@    |  3999
+20480 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4083
+24576 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4134
+28672 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |  4143
+32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |  4152
+36864 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4033
+40960 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4064
+45056 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4012
+49152 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@   |  4070
+53248 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4090
+57344 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@  |  4135
+61440 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ |  4144
+Total |------------------------------------------------------------------| 65532
+</code></pre>
+We can see from these histograms that Ruby's rand function does a relatively good
+job of distributing returned values in the requested range.
+h2. Examples
+Here's an example of a "handy timing benchmark":http://gist.github.com/187669
+implemented with aggregate.
+h2. NOTES
+Ruby 1.8 doesn't have a log2 function built into Math (Math.log2 arrived in Ruby 1.9),
+so we approximate with log(x)/log(2). Theoretically floor(log(2^n - 1)/log(2)) == n-1.
+Unfortunately, due to precision limitations, once n reaches a certain size (somewhere > 32)
+this starts to return n. The larger the value of n, the more numbers, i.e.
+(2^n - 2), (2^n - 3), etc., fall prey to this error. We could probably look into
+using something like BigDecimal, but for the current purposes of the binary
+histogram, i.e. a simple coarse-grained view, the current implementation is
+sufficient.
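The precision problem described in the README's NOTES can be sidestepped for integer samples. The sketch below is not part of the gem: `float_index` mirrors the log(x)/log(2) approach, while `exact_index` is a hypothetical helper built on Ruby's `Integer#bit_length` (available since Ruby 2.1), which inspects the integer's bits directly and therefore does not round.

```ruby
# Hypothetical helpers for illustration; neither is part of the aggregate gem.

# Mirrors the approach described in NOTES: floor(log(x) / log(2)).
# Subject to Float rounding once x gets large.
def float_index(x)
  (Math.log(x) / Math.log(2)).to_i
end

# Exact for any positive Integer: a value with k significant bits
# satisfies 2**(k-1) <= x < 2**k, so its binary bucket index is k - 1.
def exact_index(x)
  raise ArgumentError, "x must be a positive Integer" unless x.is_a?(Integer) && x > 0
  x.bit_length - 1
end

# Compare the two approaches on small and large samples.
[1, 5, 1028, 2**64 - 1, 2**64].each do |x|
  puts "#{x}: float=#{float_index(x)} exact=#{exact_index(x)}"
end
```

The trade-off is that `exact_index` only accepts integers, whereas the logarithm-based version also handles Float samples.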

data/Rakefile ADDED

@@ -0,0 +1,15 @@
+require 'rake'
+begin
+  require 'jeweler'
+  Jeweler::Tasks.new do |gemspec|
+    gemspec.name = "aggregate_afurmanov"
+    gemspec.summary = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support"
+    gemspec.description = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate"
+    gemspec.email = "aleksandr.furmanov@gmail.com"
+    gemspec.homepage = "http://github.com/afurmanov/aggregate"
+    gemspec.authors = ["Joseph Ruscio, Aleksandr Furmanov"]
+  end
+rescue LoadError
+  puts "Jeweler not available. Install it with: sudo gem install technicalpickles-jeweler -s http://gems.github.com"
+end

data/VERSION ADDED

@@ -0,0 +1 @@
+0.2.2

data/aggregate_afurmanov.gemspec ADDED

@@ -0,0 +1,48 @@
+# Generated by jeweler
+# DO NOT EDIT THIS FILE DIRECTLY
+# Instead, edit Jeweler::Tasks in Rakefile, and run the gemspec command
+# -*- encoding: utf-8 -*-
+Gem::Specification.new do |s|
+  s.name = %q{aggregate_afurmanov}
+  s.version = "0.2.2"
+  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
+  s.authors = ["Joseph Ruscio, Aleksandr Furmanov"]
+  s.date = %q{2010-12-08}
+  s.description = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate}
+  s.email = %q{aleksandr.furmanov@gmail.com}
+  s.extra_rdoc_files = [
+    "LICENSE",
+     "README.textile"
+  ]
+  s.files = [
+    ".gitignore",
+     "LICENSE",
+     "README.textile",
+     "Rakefile",
+     "VERSION",
+     "aggregate_afurmanov.gemspec",
+     "lib/aggregate.rb",
+     "test/ts_aggregate.rb"
+  ]
+  s.homepage = %q{http://github.com/afurmanov/aggregate}
+  s.rdoc_options = ["--charset=UTF-8"]
+  s.require_paths = ["lib"]
+  s.rubygems_version = %q{1.3.7}
+  s.summary = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support}
+  s.test_files = [
+    "test/ts_aggregate.rb"
+  ]
+  if s.respond_to? :specification_version then
+    current_version = Gem::Specification::CURRENT_SPECIFICATION_VERSION
+    s.specification_version = 3
+    if Gem::Version.new(Gem::VERSION) >= Gem::Version.new('1.2.0') then
+    else
+    end
+  else
+  end
+end

data/lib/aggregate.rb ADDED

@@ -0,0 +1,298 @@
+# Implements aggregate statistics and maintains
+# configurable histogram for a set of given samples. Convenient for tracking
+# high throughput data.
+class Aggregate
+  #The current average of all samples
+  attr_reader :mean
+  #The current number of samples
+  attr_reader :count
+  #The maximum sample value
+  attr_reader :max
+  #The minimum samples value
+  attr_reader :min
+  #The sum of all samples
+  attr_reader :sum
+  #The number of samples falling below the lowest valued histogram bucket
+  attr_reader :outliers_low
+  #The number of samples falling above the highest valued histogram bucket
+  attr_reader :outliers_high
+  DEFAULT_LOG_BUCKETS = 8
+  # The number of buckets in the binary logarithmic histogram (low => 2**0, high => 2**(log_buckets - 1))
+  def log_buckets
+    @log_buckets
+  end
+  # Create a new Aggregate that maintains a binary logarithmic histogram
+  # by default. Specifying values for low, high, and width configures
+  # the aggregate to maintain a linear histogram with (high - low)/width buckets
+  def initialize(options={})
+    low = options[:low]
+    high = options[:high]
+    width = options[:width]
+    @log_buckets = options[:log_buckets] || DEFAULT_LOG_BUCKETS
+    @count = 0
+    @sum = 0.0
+    @sum2 = 0.0
+    @outliers_low = 0
+    @outliers_high = 0
+    # If the user asks we maintain a linear histogram where
+    # values in the range [low, high) are bucketed in multiples
+    # of width
+    if (nil != low && nil != high && nil != width)
+      #Validate linear specification
+      if high <= low
+        raise ArgumentError, "High bucket must be > Low bucket"
+      end
+      if high - low < width
+        raise ArgumentError, "Histogram width must be <= histogram range"
+      end
+      if 0 != (high - low).modulo(width)
+        raise ArgumentError, "Histogram range (high - low) must be a multiple of width"
+      end
+      @low = low
+      @high = high
+      @width = width
+    else
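+      low ||= 1
+      # to_index divides by @low, so seed it with 1 before snapping the
+      # requested low down to a power-of-two bucket boundary
+      @low = 1
+      @low = to_bucket(to_index(low))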
+      @high = to_bucket(to_index(@low) + log_buckets - 1)
+    end
+    #Initialize all buckets to 0
+    @buckets = Array.new(bucket_count, 0)
+  end
+  # Include a sample in the aggregate
+  def << data
+    # Update min/max
+    if 0 == @count
+      @min = data
+      @max = data
+    else
+      @max = [data, @max].max
+      @min = [data, @min].min
+    end
+    # Update the running info
+    @count += 1
+    @sum += data
+    @sum2 += (data * data)
+    # Update the bucket
+    @buckets[to_index(data)] += 1 unless outlier?(data)
+  end
+  def mean
+    @sum / @count
+  end
+  #Calculate the standard deviation
+  def std_dev
+    Math.sqrt((@sum2.to_f - ((@sum.to_f * @sum.to_f)/@count.to_f)) / (@count.to_f - 1))
+  end
+  # Combine two aggregates
+  #def +(b)
+  #  a = self
+  #  c = Aggregate.new
+  #  c.count = a.count + b.count
+  #end
+  #Generate a pretty-printed ASCII representation of the histogram
+  def to_s(columns=nil)
+    #default to an 80 column terminal, don't support < 80 for now
+    if nil == columns
+      columns = 80
+    else
+      raise ArgumentError if columns < 80
+    end
+    #Find the largest bucket and create an array of the rows we intend to print
+    disp_buckets = Array.new
+    max_count = 0
+    total = 0
+    @buckets.each_with_index do |count, idx|
+      next if 0 == count
+      max_count = [max_count, count].max
+      disp_buckets << [idx, to_bucket(idx), count]
+      total += count
+    end
+    #XXX: Better to print just header --> footer
+    return "Empty histogram" if 0 == disp_buckets.length
+    #Figure out how wide the value and count columns need to be based on their
+    #largest respective numbers
+    value_str = "value"
+    count_str = "count"
+    total_str = "Total"
+    value_width = [disp_buckets.last[1].to_s.length, value_str.length].max
+    value_width = [value_width, total_str.length].max
+    count_width = [total.to_s.length, count_str.length].max
+    max_bar_width  = columns - (value_width + " |".length + "| ".length + count_width)
+    #Determine the value of a '@'
+    weight = [max_count.to_f/max_bar_width.to_f, 1.0].max
+    #format the header
+    histogram = sprintf("%#{value_width}s |", value_str)
+    max_bar_width.times { histogram << "-"}
+    histogram << sprintf("| %#{count_width}s\n", count_str)
+    # We denote empty buckets with a '~'
+    def skip_row(value_width)
+      sprintf("%#{value_width}s ~\n", " ")
+    end
+    #Loop through each bucket to be displayed and output the correct number
+    prev_index = disp_buckets[0][0] - 1
+    disp_buckets.each do |x|
+      #Denote skipped empty buckets with a ~
+      histogram << skip_row(value_width) unless prev_index == x[0] - 1
+      prev_index = x[0]
+      #Add the value
+      row = sprintf("%#{value_width}d |", x[1])
+      #Add the bar
+      bar_size = (x[2]/weight).to_i
+      bar_size.times { row += "@"}
+      (max_bar_width - bar_size).times { row += " " }
+      #Add the count
+      row << sprintf("| %#{count_width}d\n", x[2])
+      #Append the finished row onto the histogram
+      histogram << row
+    end
+    #End the table
+    histogram << skip_row(value_width) if disp_buckets.last[0] != bucket_count-1
+    histogram << sprintf("%#{value_width}s", "Total")
+    histogram << " |"
+    max_bar_width.times {histogram << "-"}
+    histogram << "| "
+    histogram << sprintf("%#{count_width}d\n", total)
+  end
+  #Iterate through each bucket in the histogram regardless of
+  #its contents
+  def each
+    @buckets.each_with_index do |count, index|
+      yield(to_bucket(index), count)
+    end
+  end
+  #Iterate through only the buckets in the histogram that contain
+  #samples
+  def each_nonzero
+    @buckets.each_with_index do |count, index|
+      yield(to_bucket(index), count) if count != 0
+    end
+  end
+  private
+  def linear?
+    nil != @width
+  end
+  def outlier? (data)
+    if data < @low
+      @outliers_low += 1
+    elsif data >= @high
+      @outliers_high += 1
+    else
+      return false
+    end
+  end
+  def bucket_count
+    if linear?
+      return (@high-@low)/@width
+    else
+      return log_buckets
+    end
+  end
+  def to_bucket(index)
+    if linear?
+      return @low + (index * @width)
+    else
+      return 2**(log2(@low) + index)
+    end
+  end
+  def right_bucket? index, data
+    # check invariant
+    raise unless linear?
+    bucket = to_bucket(index)
+    #It's the right bucket if data falls between bucket and next bucket
+    bucket <= data && data < bucket + @width
+  end
+=begin
+  def find_bucket(lower, upper, target)
+    #Classic binary search
+    return upper if right_bucket?(upper, target)
+    # Cut the search range in half
+    middle = (upper/2).to_i
+    # Determine which half contains our value and recurse
+    if (to_bucket(middle) >= target)
+      return find_bucket(lower, middle, target)
+    else
+      return find_bucket(middle, upper, target)
+    end
+  end
+=end
+  # A data point is added to bucket[n] when it is greater than or equal to
+  # the value represented by bucket[n] and less than the value
+  # represented by bucket[n+1]
+  public
+  def to_index(data)
+    # basic case is simple
+    return log2([1,data/@low].max).to_i if !linear?
+    # Search for the right bucket in the linear case
+    @buckets.each_with_index do |count, idx|
+      return idx if right_bucket?(idx, data)
+    end
+    #find_bucket(0, bucket_count-1, data)
+    #Should not get here
+    raise "#{data}"
+  end
+  # log2(x) returns the base-2 logarithm of x as a Float, so that
+  # 2**i <= x < 2**(i+1) where i = log2(x).to_i (subject to Float precision)
+  @@LOG2_DIVISOR = Math.log(2)
+  def log2(x)
+    Math.log(x) / @@LOG2_DIVISOR
+  end
+end
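`std_dev` in lib/aggregate.rb derives the variance from running sums of the samples and of their squares, which can lose precision when the mean is large relative to the spread. One well-known alternative is Welford's online algorithm. The sketch below illustrates that technique and is not the gem's implementation; the class name `RunningStats` is made up for this example.

```ruby
# Illustrative only: Welford's online algorithm for the mean and the
# sample standard deviation. RunningStats is a hypothetical name and is
# not part of the aggregate gem.
class RunningStats
  attr_reader :count, :mean

  def initialize
    @count = 0
    @mean = 0.0
    @m2 = 0.0 # running sum of squared deviations from the current mean
  end

  # Include a sample without storing it.
  def <<(x)
    @count += 1
    delta = x - @mean
    @mean += delta / @count
    @m2 += delta * (x - @mean)
    self
  end

  # Sample standard deviation (n - 1 denominator, the same denominator
  # Aggregate#std_dev uses). Returns nil with fewer than two samples.
  def std_dev
    return nil if @count < 2
    Math.sqrt(@m2 / (@count - 1))
  end
end

stats = RunningStats.new
[2, 4, 4, 4, 5, 5, 7, 9].each { |x| stats << x }
puts stats.mean
puts stats.std_dev
```

Like Aggregate, this keeps only a fixed number of accumulators per instance, so memory use does not grow with the number of samples.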

data/test/ts_aggregate.rb ADDED

@@ -0,0 +1,162 @@
+require 'test/unit'
+require 'lib/aggregate'
+class SimpleStatsTest < Test::Unit::TestCase
+  def setup
+    @stats = Aggregate.new(:log_buckets => 128)
+    @@DATA.each do |x|
+      @stats << x
+    end
+  end
+  def test_stats_count
+    assert_equal @@DATA.length, @stats.count
+  end
+  def test_stats_min_max
+    sorted_data = @@DATA.sort
+    assert_equal sorted_data[0], @stats.min
+    assert_equal sorted_data.last, @stats.max
+  end
+  def test_stats_mean
+    sum = 0
+    @@DATA.each do |x|
+      sum += x
+    end
+    assert_equal sum.to_f/@@DATA.length.to_f, @stats.mean
+  end
+  def test_bucket_counts
+    #Test each iterator
+    total_bucket_sum = 0
+    i = 0
+    @stats.each do |bucket, count|
+      assert_equal 2**i, bucket
+      total_bucket_sum += count
+      i += 1
+    end
+    assert_equal @@DATA.length, total_bucket_sum
+    #Test each_nonzero iterator
+    prev_bucket = 0
+    total_bucket_sum = 0
+    @stats.each_nonzero do |bucket, count|
+      assert bucket > prev_bucket
+      assert_not_equal count, 0
+      total_bucket_sum += count
+      # Remember this bucket so the ordering assertion checks the next one
+      prev_bucket = bucket
+    end
+    assert_equal total_bucket_sum, @@DATA.length
+  end
+=begin
+  def test_addition
+    stats1 = Aggregate.new
+    stats2 = Aggregate.new
+    stats1 << 1
+    stats2 << 3
+    stats_sum = stats1 + stats2
+    assert_equal stats_sum.count, stats1.count + stats2.count
+  end
+=end
+  #XXX: Update test_bucket_contents() if you muck with @@DATA
+  @@DATA = [ 1, 5, 4, 6, 1028, 1972, 16384, 16385, 16383]
+  def test_bucket_contents
+    #XXX: This is the only test so far that cares about the actual contents
+    # of @@DATA, so if you update that array ... update this method too
+    expected_buckets  = [1, 4, 1024, 8192, 16384]
+    expected_counts =   [1, 3,    2,    1,     2]
+    i = 0
+    @stats.each_nonzero do |bucket, count|
+      assert_equal expected_buckets[i], bucket
+      assert_equal expected_counts[i],  count
+      # Increment for the next test
+      i += 1
+    end
+  end
+  def test_histogram
+    puts @stats.to_s
+  end
+  def test_outlier
+    assert_equal 0, @stats.outliers_low
+    assert_equal 0, @stats.outliers_high
+    @stats << -1
+    @stats << -2
+    @stats << 0
+    @stats << 2**128
+    # This should be the last value in the last bucket, but Ruby's native
+    # floats are not precise enough. Somewhere past 2^32 the log(x)/log(2)
+    # breaks down. So it shows up as 128 (outlier) instead of 127
+    #@stats << (2**128) - 1
+    assert_equal 3, @stats.outliers_low
+    assert_equal 1, @stats.outliers_high
+  end
+  def test_std_dev
+    @stats.std_dev
+  end
+end
+class LinearHistogramTest < Test::Unit::TestCase
+  def setup
+    @stats = Aggregate.new(:low => 0, :high => 32768, :width => 1024)
+    @@DATA.each do |x|
+      @stats << x
+    end
+  end
+  def test_validation
+    # Range cannot be 0
+    assert_raise(ArgumentError) {bad_stats = Aggregate.new(:low => 32,:high => 32, :width => 4)}
+    # Range cannot be negative
+    assert_raise(ArgumentError) {bad_stats = Aggregate.new(:low => 32, :high => 16, :width => 4)}
+    # Range cannot be < single bucket
+    assert_raise(ArgumentError) {bad_stats = Aggregate.new(:low => 16, :high => 32, :width => 17)}
+    # Range % width must equal 0 (for now)
+    assert_raise(ArgumentError) {bad_stats = Aggregate.new(:low => 1, :high => 16384, :width => 1024)}
+  end
+  #XXX: Update test_bucket_contents() if you muck with @@DATA
+  # 32768 is an outlier
+  @@DATA = [ 0, 1, 5, 4, 6, 1028, 1972, 16384, 16385, 16383, 32768]
+  def test_bucket_contents
+    #XXX: This is the only test so far that cares about the actual contents
+    # of @@DATA, so if you update that array ... update this method too
+    expected_buckets  = [0, 1024,  15360, 16384]
+    expected_counts =   [5, 2,     1,     2]
+    i = 0
+    @stats.each_nonzero do |bucket, count|
+      assert_equal expected_buckets[i], bucket
+      assert_equal expected_counts[i],  count
+      # Increment for the next test
+      i += 1
+    end
+  end
+end
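The expected buckets in `LinearHistogramTest#test_bucket_contents` can be reproduced with plain integer arithmetic. The sketch below is independent of the gem: `linear_bucket` is a hypothetical helper that maps a sample to the lower bound of its bucket in constant time (the library itself scans the buckets in `to_index`), and returns `:low` or `:high` for outliers.

```ruby
# Hypothetical helper, not part of the aggregate gem. Assumes integer
# samples and an integer width, with the histogram covering [low, high).
def linear_bucket(data, low, high, width)
  return :low if data < low
  return :high if data >= high
  # Integer division drops the offset within the bucket
  low + ((data - low) / width) * width
end

# Same samples and configuration as LinearHistogramTest
data = [0, 1, 5, 4, 6, 1028, 1972, 16384, 16385, 16383, 32768]
counts = Hash.new(0)
data.each { |x| counts[linear_bucket(x, 0, 32768, 1024)] += 1 }

counts.each { |bucket, count| puts "#{bucket}: #{count}" }
```

Because the range is half-open, the sample 32768 lands in `:high`, which matches the test's comment that it is an outlier.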

metadata ADDED

@@ -0,0 +1,75 @@
+--- !ruby/object:Gem::Specification
+name: aggregate_afurmanov
+version: !ruby/object:Gem::Version
+  hash: 19
+  prerelease: false
+  segments:
+  - 0
+  - 2
+  - 2
+  version: 0.2.2
+platform: ruby
+authors:
+- Joseph Ruscio, Aleksandr Furmanov
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2010-12-08 00:00:00 -08:00
+default_executable:
+dependencies: []
+description: "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate"
+email: aleksandr.furmanov@gmail.com
+executables: []
+extensions: []
+extra_rdoc_files:
+- LICENSE
+- README.textile
+files:
+- .gitignore
+- LICENSE
+- README.textile
+- Rakefile
+- VERSION
+- aggregate_afurmanov.gemspec
+- lib/aggregate.rb
+- test/ts_aggregate.rb
+has_rdoc: true
+homepage: http://github.com/afurmanov/aggregate
+licenses: []
+post_install_message:
+rdoc_options:
+- --charset=UTF-8
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      hash: 3
+      segments:
+      - 0
+      version: "0"
+required_rubygems_version: !ruby/object:Gem::Requirement
+  none: false
+  requirements:
+  - - ">="
+    - !ruby/object:Gem::Version
+      hash: 3
+      segments:
+      - 0
+      version: "0"
+requirements: []
+rubyforge_project:
+rubygems_version: 1.3.7
+signing_key:
+specification_version: 3
+summary: Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support
+test_files:
+- test/ts_aggregate.rb