aggregate 0.2.0 → 0.2.1

@@ -0,0 +1 @@
1
+ pkg/
@@ -1,13 +1,20 @@
1
+ h1. Aggregate
2
+
3
+ By Joseph Ruscio
4
+
1
5
  Aggregate is an intuitive ruby implementation of a statistics aggregator
2
6
  including both default and configurable histogram support. It does this
3
7
  without recording/storing any of the actual sample values, making it
4
8
  suitable for tracking statistics across millions/billions of sample
5
9
  without any impact on performance or memory footprint. Originally
6
- inspired by the Aggregate support in SystemTap (http://sourceware.org/systemtap/)
10
+ inspired by the Aggregate support in "SystemTap.":http://sourceware.org/systemtap
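The claim that statistics can be tracked without storing any samples can be illustrated with a short sketch. The class below is hypothetical: the name @RunningStats@ and its internals are assumptions made for illustration, not the gem's actual implementation. It keeps only a count, a sum, a sum of squares, and the extremes, which is enough to derive the mean and a population standard deviation.

```ruby
# Hypothetical sketch: constant-memory running statistics.
# Not the aggregate gem's implementation; all names are illustrative.
class RunningStats
  attr_reader :count, :min, :max

  def initialize
    @count = 0
    @sum = 0.0
    @sum_sq = 0.0
    @min = nil
    @max = nil
  end

  # Fold one sample into the running totals; the sample itself is not kept.
  def <<(sample)
    @count += 1
    @sum += sample
    @sum_sq += sample * sample
    @min = sample if @min.nil? || sample < @min
    @max = sample if @max.nil? || sample > @max
    self
  end

  def mean
    return 0.0 if @count.zero?
    @sum / @count
  end

  # Population standard deviation derived from the running totals.
  def std_dev
    return 0.0 if @count.zero?
    variance = (@sum_sq / @count) - mean**2
    Math.sqrt([variance, 0.0].max) # clamp tiny negative rounding errors
  end
end

stats = RunningStats.new
[2, 4, 4, 4, 5, 5, 7, 9].each { |sample| stats << sample }
puts stats.mean
puts stats.std_dev
```

Memory use stays constant no matter how many samples are added. The sum-of-squares formula can lose precision when the variance is small relative to the mean, so a production implementation might prefer an incremental update instead.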
7
11
 
8
- Aggregates are easy to instantiate, populate with sample data, and examine
9
- statistics:
12
+ h2. Getting Started
10
13
 
14
+ Aggregates are easy to instantiate, populate with sample data, and then
15
+ inspect for common aggregate statistics:
16
+
17
+ <pre><code>
11
18
  #After instantiation use the << operator to add a sample to the aggregate:
12
19
  stats = Aggregate.new
13
20
 
@@ -30,14 +37,21 @@ stats.min
30
37
 
31
38
  # The standard deviation
32
39
  stats.std_dev
40
+ </code></pre>
41
+
42
+ h2. Histograms
33
43
 
34
44
  Perhaps more importantly than the basic aggregate statistics detailed above
35
- Aggregate also maintains a histogram of samples. Good explanation of why
36
- its important: http://37signals.com/svn/posts/1836-the-problem-with-averages
45
+ Aggregate also maintains a histogram of samples. For anything other than
46
+ normally distributed data, averages are insufficient at best and often downright misleading.
47
+ 37Signals recently posted a terse but effective "explanation":http://37signals.com/svn/posts/1836-the-problem-with-averages of the importance of histograms.
48
+ Aggregate maintains its histogram internally as a set of "buckets".
49
+ Each bucket represents a range of possible sample values. The set of all buckets
50
+ represents the range of "normal" sample values.
37
51
 
38
- The histogram is maintained as a set of "buckets". Each bucket represents a
39
- range of possible sample values. The set of all buckets represents the range
40
- of "normal" sample values. By default this is a binary histogram, where
52
+ h3. Binary Histograms
53
+
54
+ Without any configuration, an Aggregate instance maintains a binary histogram, where
41
55
  each bucket represents a range twice as large as the preceding bucket i.e.
42
56
  [1,1], [2,3], [4,5,6,7], [8,9,10,11,12,13,14,15]. The default binary histogram
43
57
  provides for 128 buckets, theoretically covering the range [1, (2^127) - 1]
@@ -50,31 +64,44 @@ fall into some bucket. After using binary histograms to determine
50
64
  the coarse-grained characteristics of your sample space you can
51
65
  configure a linear histogram to examine it in closer detail.
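To make the doubling bucket ranges concrete, here is a minimal sketch of how a positive integer could be mapped to its binary bucket. The helper name @binary_bucket@ is hypothetical, and the use of @Integer#bit_length@ (available in Ruby 2.1 and later) is an assumption chosen to keep the arithmetic in integers; the README's notes describe the gem itself as approximating log2 with floating-point logarithms.

```ruby
# Hypothetical helper: map a positive integer to its binary-histogram bucket.
# Bucket i covers [2**i, 2**(i + 1) - 1], matching [1,1], [2,3], [4..7], ...
def binary_bucket(value)
  unless value.is_a?(Integer) && value >= 1
    raise ArgumentError, "value must be a positive integer"
  end

  index = value.bit_length - 1 # integer floor(log2(value)), no floats involved
  low = 1 << index
  high = (1 << (index + 1)) - 1
  [index, low..high]
end

[1, 3, 4, 15, 16].each do |v|
  index, range = binary_bucket(v)
  puts "#{v} -> bucket #{index} (#{range})"
end
```

Because the index is computed with integer operations only, it does not drift for very large values the way a floating-point logarithm can.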
52
66
 
67
+ h3. Linear Histograms
68
+
53
69
  Linear histograms are specified with the three values low, high, and width.
54
- Low and high specifiy a range [low, high) of values included in the
70
+ Low and high specify a range [low, high) of values included in the
55
71
  histogram (all others are outliers). Width specifies the number of
56
72
  values represented by each bucket and therefore the number of
57
73
  buckets i.e. granularity of the histogram. The histogram range
58
74
  (high - low) must be a multiple of width:
59
75
 
76
+ <pre><code>
60
77
  #Want to track aggregate stats on response times in ms
61
78
  response_stats = Aggregate.new(0, 2000, 50)
79
+ </code></pre>
62
80
 
63
81
  The example above creates a linear histogram that tracks the
64
82
  response times from 0 ms to 2000 ms in buckets of width 50 ms. Hopefully
65
- most of your samples fall in the first couple buckets! Any values added to the
66
- aggregate that fall outside of the histogram range are recorded as outliers:
83
+ most of your samples fall in the first couple buckets!
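The bucket arithmetic implied by low, high, and width can be sketched as follows. @LinearSpec@ and its methods are hypothetical names introduced only for illustration and are not part of the gem's API; the sketch assumes the half-open range [low, high) described above.

```ruby
# Hypothetical sketch of linear bucket arithmetic; not the gem's code.
LinearSpec = Struct.new(:low, :high, :width) do
  # Number of buckets; the range must divide evenly by the width.
  def bucket_count
    unless ((high - low) % width).zero?
      raise ArgumentError, "(high - low) must be a multiple of width"
    end
    (high - low) / width
  end

  # Returns :outlier_low, :outlier_high, or the zero-based bucket index.
  def classify(sample)
    return :outlier_low if sample < low
    return :outlier_high if sample >= high # high is exclusive
    ((sample - low) / width).floor
  end
end

spec = LinearSpec.new(0, 2000, 50)
puts spec.bucket_count   # 40 buckets of width 50
puts spec.classify(125)  # index 2, the bucket covering [100, 150)
puts spec.classify(2000) # outside the range, since high is exclusive
```

With the values from the example, 2000 / 50 gives 40 buckets, and a sample of exactly 2000 ms counts as a high outlier rather than landing in the last bucket.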
84
+
85
+ h3. Histogram Outliers
67
86
 
87
+ An Aggregate records any samples that fall outside the histogram range as
88
+ outliers:
89
+
90
+ <pre><code>
68
91
  # Number of samples that fall below the normal range
69
92
  stats.outliers_low
70
93
 
71
94
  # Number of samples that fall above the normal range
72
95
  stats.outliers_high
96
+ </code></pre>
97
+
98
+ h3. Histogram Iterators
73
99
 
74
100
  Once a histogram is populated Aggregate provides iterator support for
75
101
  examining the contents of buckets. The iterators provide both the
76
102
  number of samples in the bucket, as well as its range:
77
103
 
104
+ <pre><code>
78
105
  #Examine every bucket
79
106
  @stats.each do |bucket, count|
80
107
  end
@@ -82,21 +109,29 @@ end
82
109
  #Examine only buckets containing samples
83
110
  @stats.each_nonzero do |bucket, count|
84
111
  end
112
+ </code></pre>
85
113
 
86
- Finally Aggregate contains sophisticated pretty-printing support that for
87
- any given number of columns >= 80 (defaults to 80) and sample distribution
88
- properly sets a marker weight based on the samples per bucket and aligns all
89
- output. Empty buckets are skipped to conserve screen space.
114
+ h3. Histogram Bar Chart
90
115
 
116
+ Finally Aggregate contains sophisticated pretty-printing support to generate
117
+ ASCII bar charts. For any given number of columns >= 80 (defaults to 80) and
118
+ sample distribution the <code>to_s</code> method properly sets a marker weight based on the
119
+ samples per bucket and aligns all output. Empty buckets are skipped to conserve
120
+ screen space.
121
+
122
+ <pre><code>
91
123
  # Generate and display an 80 column histogram
92
124
  puts stats.to_s
93
125
 
94
126
  # Generate and display a 120 column histogram
95
127
  puts stats.to_s(120)
128
+ </code></pre>
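The idea of a marker weight can be illustrated with a rough sketch that scales each bar relative to the fullest bucket and skips empty buckets. The function @ascii_bars@, its parameters, and the default bar width are assumptions for illustration; the gem's @to_s@ may compute weights and alignment differently.

```ruby
# Hypothetical sketch of proportional ASCII bars; not the gem's to_s.
# counts maps a bucket label to the number of samples in that bucket.
def ascii_bars(counts, bar_width = 60)
  nonzero = counts.reject { |_, count| count.zero? } # skip empty buckets
  return [] if nonzero.empty?

  max = nonzero.values.max
  nonzero.map do |label, count|
    # Scale so that the fullest bucket fills the whole bar.
    marks = "@" * (count * bar_width / max)
    format("%6s |%-#{bar_width}s| %d", label, marks, count)
  end
end

puts ascii_bars({ 1 => 3, 2 => 0, 4 => 12, 8 => 6 }, 20)
```

Integer division means buckets with very few samples relative to the maximum render as an empty bar while still showing their count.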
96
129
 
97
- The following code populates both a binary and linear histogram with the same
98
- set of 65536 values generated by rand to produce two histograms:
130
+ This code example populates both a binary and linear histogram with the same
131
+ set of 65536 values generated by <code>rand</code> to produce the
132
+ two histograms that follow it:
99
133
 
134
+ <pre><code>
100
135
  require 'rubygems'
101
136
  require 'aggregate'
102
137
 
@@ -112,9 +147,11 @@ end
112
147
 
113
148
  puts binary_aggregate.to_s
114
149
  puts linear_aggregate.to_s
150
+ </code></pre>
151
+
152
+ h4. Binary Histogram
115
153
 
116
- ** OUTPUT **
117
- ** Binary Histogram**
154
+ <pre><code>
118
155
  value |------------------------------------------------------------------| count
119
156
  1 | | 3
120
157
  2 | | 1
@@ -134,8 +171,11 @@ value |------------------------------------------------------------------| count
134
171
  32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 32961
135
172
  ~
136
173
  Total |------------------------------------------------------------------| 65535
174
+ </code></pre>
137
175
 
138
- ** Linear (0, 65536, 4096) Histogram **
176
+ h4. Linear (0, 65536, 4096) Histogram
177
+
178
+ <pre><code>
139
179
  value |------------------------------------------------------------------| count
140
180
  0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4094
141
181
  4096 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| 4202
@@ -154,11 +194,12 @@ value |------------------------------------------------------------------| count
154
194
  57344 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4135
155
195
  61440 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | 4144
156
196
  Total |------------------------------------------------------------------| 65532
157
-
197
+ </code></pre>
158
198
  We can see from these histograms that Ruby's rand function does a relatively good
159
199
  job of distributing returned values in the requested range.
160
200
 
161
- ** NOTES **
201
+ h2. NOTES
202
+
162
203
  Ruby doesn't have a log2 function built into Math, so we approximate with
163
204
  log(x)/log(2). Theoretically log( 2^n - 1 )/ log(2) == n-1. Unfortunately due
164
205
  to precision limitations, once n reaches a certain size (somewhere > 32)
data/Rakefile CHANGED
@@ -5,7 +5,7 @@ begin
5
5
  Jeweler::Tasks.new do |gemspec|
6
6
  gemspec.name = "aggregate"
7
7
  gemspec.summary = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support"
8
- gemspec.description = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support"
8
+ gemspec.description = "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate"
9
9
  gemspec.email = "jruscio@gmail.com"
10
10
  gemspec.homepage = "http://github.com/josephruscio/aggregate"
11
11
  gemspec.authors = ["Joseph Ruscio"]
data/VERSION CHANGED
@@ -1 +1 @@
1
- 0.2.0
1
+ 0.2.1
@@ -5,20 +5,21 @@
5
5
 
6
6
  Gem::Specification.new do |s|
7
7
  s.name = %q{aggregate}
8
- s.version = "0.2.0"
8
+ s.version = "0.2.1"
9
9
 
10
10
  s.required_rubygems_version = Gem::Requirement.new(">= 0") if s.respond_to? :required_rubygems_version=
11
11
  s.authors = ["Joseph Ruscio"]
12
- s.date = %q{2009-09-12}
13
- s.description = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support}
12
+ s.date = %q{2009-09-13}
13
+ s.description = %q{Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate}
14
14
  s.email = %q{jruscio@gmail.com}
15
15
  s.extra_rdoc_files = [
16
16
  "LICENSE",
17
- "README"
17
+ "README.textile"
18
18
  ]
19
19
  s.files = [
20
- "LICENSE",
21
- "README",
20
+ ".gitignore",
21
+ "LICENSE",
22
+ "README.textile",
22
23
  "Rakefile",
23
24
  "VERSION",
24
25
  "aggregate.gemspec",
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: aggregate
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.2.0
4
+ version: 0.2.1
5
5
  platform: ruby
6
6
  authors:
7
7
  - Joseph Ruscio
@@ -9,11 +9,11 @@ autorequire:
9
9
  bindir: bin
10
10
  cert_chain: []
11
11
 
12
- date: 2009-09-12 00:00:00 -07:00
12
+ date: 2009-09-13 00:00:00 -07:00
13
13
  default_executable:
14
14
  dependencies: []
15
15
 
16
- description: Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support
16
+ description: "Aggregate is a Ruby class for accumulating aggregate statistics and includes histogram support. For a detailed README see: http://github.com/josephruscio/aggregate"
17
17
  email: jruscio@gmail.com
18
18
  executables: []
19
19
 
@@ -21,10 +21,11 @@ extensions: []
21
21
 
22
22
  extra_rdoc_files:
23
23
  - LICENSE
24
- - README
24
+ - README.textile
25
25
  files:
26
+ - .gitignore
26
27
  - LICENSE
27
- - README
28
+ - README.textile
28
29
  - Rakefile
29
30
  - VERSION
30
31
  - aggregate.gemspec