quantile_estimator 0.0.2

Files changed (46)
  1. checksums.yaml +7 -0
  2. data/.ruby-gemset +1 -0
  3. data/.ruby-version +1 -0
  4. data/Gemfile +4 -0
  5. data/Gemfile.lock +17 -0
  6. data/LICENSE.txt +22 -0
  7. data/README.md +85 -0
  8. data/Rakefile +9 -0
  9. data/benchmark.rb +21 -0
  10. data/doc/Cursor.html +422 -0
  11. data/doc/Estimator.html +779 -0
  12. data/doc/Invariant.html +115 -0
  13. data/doc/Invariant/Biased.html +268 -0
  14. data/doc/Invariant/Invariant.html +193 -0
  15. data/doc/Invariant/SingleTarget.html +278 -0
  16. data/doc/Invariant/Targeted.html +278 -0
  17. data/doc/Item.html +620 -0
  18. data/doc/Quantile.html +270 -0
  19. data/doc/QuantileEstimator.html +117 -0
  20. data/doc/_index.html +211 -0
  21. data/doc/class_list.html +54 -0
  22. data/doc/compression.png +0 -0
  23. data/doc/css/common.css +1 -0
  24. data/doc/css/full_list.css +57 -0
  25. data/doc/css/style.css +338 -0
  26. data/doc/file.README.html +186 -0
  27. data/doc/file_list.html +56 -0
  28. data/doc/frames.html +26 -0
  29. data/doc/index.html +186 -0
  30. data/doc/js/app.js +219 -0
  31. data/doc/js/full_list.js +178 -0
  32. data/doc/js/jquery.js +4 -0
  33. data/doc/method_list.html +221 -0
  34. data/doc/time.png +0 -0
  35. data/doc/top-level-namespace.html +114 -0
  36. data/lib/estimator.rb +120 -0
  37. data/lib/quantile_estimator/cursor.rb +24 -0
  38. data/lib/quantile_estimator/invariant.rb +47 -0
  39. data/lib/quantile_estimator/item.rb +21 -0
  40. data/lib/quantile_estimator/quantile.rb +3 -0
  41. data/lib/quantile_estimator/test.rb +37 -0
  42. data/lib/quantile_estimator/version.rb +3 -0
  43. data/pkg/quantile_estimator-0.0.1.gem +0 -0
  44. data/quantile_estimator.gemspec +29 -0
  45. data/test/test_quantile_estimator.rb +85 -0
  46. metadata +120 -0
data/doc/file.README.html ADDED
@@ -0,0 +1,186 @@
1
+ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3
+ <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
4
+ <head>
5
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
6
+ <title>
7
+ File: README
8
+
9
+ &mdash; Documentation by YARD 0.8.7.3
10
+
11
+ </title>
12
+
13
+ <link rel="stylesheet" href="css/style.css" type="text/css" charset="utf-8" />
14
+
15
+ <link rel="stylesheet" href="css/common.css" type="text/css" charset="utf-8" />
16
+
17
+ <script type="text/javascript" charset="utf-8">
18
+ hasFrames = window.top.frames.main ? true : false;
19
+ relpath = '';
20
+ framesUrl = "frames.html#!" + escape(window.location.href);
21
+ </script>
22
+
23
+
24
+ <script type="text/javascript" charset="utf-8" src="js/jquery.js"></script>
25
+
26
+ <script type="text/javascript" charset="utf-8" src="js/app.js"></script>
27
+
28
+
29
+ </head>
30
+ <body>
31
+ <div id="header">
32
+ <div id="menu">
33
+
34
+ <a href="_index.html">Index</a> &raquo;
35
+ <span class="title">File: README</span>
36
+
37
+
38
+ <div class="noframes"><span class="title">(</span><a href="." target="_top">no frames</a><span class="title">)</span></div>
39
+ </div>
40
+
41
+ <div id="search">
42
+
43
+ <a class="full_list_link" id="class_list_link"
44
+ href="class_list.html">
45
+ Class List
46
+ </a>
47
+
48
+ <a class="full_list_link" id="method_list_link"
49
+ href="method_list.html">
50
+ Method List
51
+ </a>
52
+
53
+ <a class="full_list_link" id="file_list_link"
54
+ href="file_list.html">
55
+ File List
56
+ </a>
57
+
58
+ </div>
59
+ <div class="clear"></div>
60
+ </div>
61
+
62
+ <iframe id="search_frame"></iframe>
63
+
64
+ <div id="content"><div id='filecontents'>
65
+ <h1 id="label-QuantileEstimator.rb">QuantileEstimator.rb</h1>
66
+
67
+ <p>This implements the quantile estimator described in the paper:</p>
68
+
69
+ <p>Cormode et al.:
70
+ &quot;Effective Computation of Biased Quantiles over Data
71
+ Streams&quot;</p>
72
+
73
+ <p>Of the different strategies for implementing the algorithm, I chose the
74
+ simplest one,
75
+ using <a
76
+ href="https://github.com/odo/quantile_estimator">github.com/odo/quantile_estimator</a>
77
+ as a reference for the implementation.</p>
78
+
79
+ <h2 id="label-Installation">Installation</h2>
80
+
81
+ <p>Add this line to your application&#39;s Gemfile:</p>
82
+
83
+ <pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_gem'>gem</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>quantile_estimator</span><span class='tstring_end'>&#39;</span></span></code></pre>
84
+
85
+ <p>And then execute:</p>
86
+
87
+ <pre class="code ruby"><code class="ruby">$ bundle</code></pre>
88
+
89
+ <p>Or install it yourself as:</p>
90
+
91
+ <pre class="code ruby"><code class="ruby">$ gem install quantile_estimator</code></pre>
92
+
93
+ <h2 id="label-Usage">Usage</h2>
94
+
95
+ <p>First initialize the estimator. Right now you can use either a
96
+ <code>Biased</code> or a
97
+ <code>Targeted</code> invariant. The targeted
98
+ invariant lets you select the quantiles you are
99
+ particularly interested in and
100
+ will yield higher compression rates.</p>
101
+
102
+ <pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_biased'>biased</span> <span class='op'>=</span> <span class='const'>Invariant</span><span class='op'>::</span><span class='const'>Biased</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='float'>0.004</span><span class='rparen'>)</span>
103
+ <span class='id identifier rubyid_targeted'>targeted</span> <span class='op'>=</span> <span class='const'>Invariant</span><span class='op'>::</span><span class='const'>Targeted</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='lbracket'>[</span>
104
+ <span class='lbracket'>[</span><span class='float'>0.05</span><span class='comma'>,</span> <span class='float'>0.02</span><span class='rbracket'>]</span><span class='comma'>,</span>
105
+ <span class='lbracket'>[</span><span class='float'>0.5</span><span class='comma'>,</span> <span class='float'>0.02</span><span class='rbracket'>]</span><span class='comma'>,</span>
106
+ <span class='lbracket'>[</span><span class='float'>0.95</span><span class='comma'>,</span> <span class='float'>0.02</span><span class='rbracket'>]</span>
107
+ <span class='rbracket'>]</span><span class='rparen'>)</span>
108
+
109
+ <span class='id identifier rubyid_estimator'>estimator</span> <span class='op'>=</span> <span class='const'>Estimator</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='id identifier rubyid_targeted'>targeted</span><span class='rparen'>)</span></code></pre>
110
+
111
+ <p>Insertion of data is as simple as:</p>
112
+
113
+ <pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_estimator'>estimator</span><span class='period'>.</span><span class='id identifier rubyid_insert'>insert</span><span class='lparen'>(</span><span class='id identifier rubyid_value'>value</span><span class='rparen'>)</span></code></pre>
114
+
115
+ <p>The insertion of data <em>doesn&#39;t</em> automatically compress it. To
116
+ compress the data just
117
+ call:</p>
118
+
119
+ <pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_estimator'>estimator</span><span class='period'>.</span><span class='id identifier rubyid_compress!'>compress!</span></code></pre>
120
+
121
+ <p>Using these primitives you can build wrappers that compress on every nth insert,
122
+ as shown
123
+ in
124
+ <a
125
+ href="https://github.com/diegoeche/quantile_estimator.rb/blob/master/benchmark.rb">this
126
+ file</a>.</p>
127
+
128
+ <h2 id="label-Pretty+Graphs">Pretty Graphs</h2>
129
+
130
+ <p>Using a targeted invariant to keep track of the 0.05, 0.5 and 0.95
131
+ quantiles, a
132
+ uniform source of random values, and compression every 100
133
+ iterations, we get the
134
+ following behavior regarding the size of the
135
+ internal data structure:</p>
136
+
137
+ <p><img
138
+ src="https://raw.github.com/diegoeche/quantile_estimator.rb/master/doc/compression.png"
139
+ /></p>
140
+
141
+ <p>Running time behavior is not too bad. The following graph shows the cost
142
+ of
143
+ insertions in the estimator. The homogeneous layer of outliers probably
144
+ corresponds
145
+ to the compression cycles, while the bottom line is the cost of
146
+ compression-less
147
+ insertions.</p>
148
+
149
+ <p><img
150
+ src="https://raw.github.com/diegoeche/quantile_estimator.rb/master/doc/time.png"
151
+ /></p>
152
+
153
+ <p>Different distributions and different invariant setups will have different
154
+ behaviors.</p>
155
+
156
+ <p>Check your real data before using this!</p>
157
+
158
+ <h2 id="label-Known+issues">Known issues</h2>
159
+
160
+ <p>The implementation is known not to be thread-safe, and little effort has
161
+ been made to
162
+ optimize it.</p>
163
+
164
+ <h2 id="label-Contributing">Contributing</h2>
165
+ <ol><li>
166
+ <p>Fork it</p>
167
+ </li><li>
168
+ <p>Create your feature branch (<code>git checkout -b my-new-feature</code>)</p>
169
+ </li><li>
170
+ <p>Commit your changes (<code>git commit -am &#39;Add some
171
+ feature&#39;</code>)</p>
172
+ </li><li>
173
+ <p>Push to the branch (<code>git push origin my-new-feature</code>)</p>
174
+ </li><li>
175
+ <p>Create new Pull Request</p>
176
+ </li></ol>
177
+ </div></div>
178
+
179
+ <div id="footer">
180
+ Generated on Fri Nov 15 15:39:43 2013 by
181
+ <a href="http://yardoc.org" title="Yay! A Ruby Documentation Tool" target="_parent">yard</a>
182
+ 0.8.7.3 (ruby-2.0.0).
183
+ </div>
184
+
185
+ </body>
186
+ </html>
data/doc/file_list.html ADDED
@@ -0,0 +1,56 @@
1
+ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3
+ <html>
4
+ <head>
5
+ <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
6
+
7
+ <link rel="stylesheet" href="css/full_list.css" type="text/css" media="screen" charset="utf-8" />
8
+
9
+ <link rel="stylesheet" href="css/common.css" type="text/css" media="screen" charset="utf-8" />
10
+
11
+
12
+
13
+ <script type="text/javascript" charset="utf-8" src="js/jquery.js"></script>
14
+
15
+ <script type="text/javascript" charset="utf-8" src="js/full_list.js"></script>
16
+
17
+
18
+ <title>File List</title>
19
+ <base id="base_target" target="_parent" />
20
+ </head>
21
+ <body>
22
+ <script type="text/javascript" charset="utf-8">
23
+ if (window.top.frames.main) {
24
+ document.getElementById('base_target').target = 'main';
25
+ document.body.className = 'frames';
26
+ }
27
+ </script>
28
+ <div id="content">
29
+ <h1 id="full_list_header">File List</h1>
30
+ <div id="nav">
31
+
32
+ <span><a target="_self" href="class_list.html">
33
+ Classes
34
+ </a></span>
35
+
36
+ <span><a target="_self" href="method_list.html">
37
+ Methods
38
+ </a></span>
39
+
40
+ <span><a target="_self" href="file_list.html">
41
+ Files
42
+ </a></span>
43
+
44
+ </div>
45
+ <div id="search">Search: <input type="text" /></div>
46
+
47
+ <ul id="full_list" class="file">
48
+
49
+
50
+ <li class="r1"><span class="object_link"><a href="index.html" title="README">README</a></span></li>
51
+
52
+
53
+ </ul>
54
+ </div>
55
+ </body>
56
+ </html>
data/doc/frames.html ADDED
@@ -0,0 +1,26 @@
1
+ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
2
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
3
+
4
+ <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
5
+ <head>
6
+ <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
7
+ <title>Documentation by YARD 0.8.7.3</title>
8
+ </head>
9
+ <script type="text/javascript" charset="utf-8">
10
+ window.onload = function() {
11
+ var match = unescape(window.location.hash).match(/^#!(.+)/);
12
+ var name = match ? match[1] : 'index.html';
13
+ name = name.replace(/^(\w+):\/\//, '').replace(/^\/\//, '');
14
+ document.writeln('<frameset cols="20%,*">' +
15
+ '<frame name="list" src="class_list.html" />' +
16
+ '<frame name="main" src="' + escape(name) + '" />' +
17
+ '</frameset>');
18
+ }
19
+ </script>
20
+ <noscript>
21
+ <frameset cols="20%,*">
22
+ <frame name="list" src="class_list.html" />
23
+ <frame name="main" src="index.html" />
24
+ </frameset>
25
+ </noscript>
26
+ </html>
data/doc/index.html ADDED
@@ -0,0 +1,186 @@
1
+ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3
+ <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
4
+ <head>
5
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
6
+ <title>
7
+ File: README
8
+
9
+ &mdash; Documentation by YARD 0.8.7.3
10
+
11
+ </title>
12
+
13
+ <link rel="stylesheet" href="css/style.css" type="text/css" charset="utf-8" />
14
+
15
+ <link rel="stylesheet" href="css/common.css" type="text/css" charset="utf-8" />
16
+
17
+ <script type="text/javascript" charset="utf-8">
18
+ hasFrames = window.top.frames.main ? true : false;
19
+ relpath = '';
20
+ framesUrl = "frames.html#!" + escape(window.location.href);
21
+ </script>
22
+
23
+
24
+ <script type="text/javascript" charset="utf-8" src="js/jquery.js"></script>
25
+
26
+ <script type="text/javascript" charset="utf-8" src="js/app.js"></script>
27
+
28
+
29
+ </head>
30
+ <body>
31
+ <div id="header">
32
+ <div id="menu">
33
+
34
+ <a href="_index.html">Index</a> &raquo;
35
+ <span class="title">File: README</span>
36
+
37
+
38
+ <div class="noframes"><span class="title">(</span><a href="." target="_top">no frames</a><span class="title">)</span></div>
39
+ </div>
40
+
41
+ <div id="search">
42
+
43
+ <a class="full_list_link" id="class_list_link"
44
+ href="class_list.html">
45
+ Class List
46
+ </a>
47
+
48
+ <a class="full_list_link" id="method_list_link"
49
+ href="method_list.html">
50
+ Method List
51
+ </a>
52
+
53
+ <a class="full_list_link" id="file_list_link"
54
+ href="file_list.html">
55
+ File List
56
+ </a>
57
+
58
+ </div>
59
+ <div class="clear"></div>
60
+ </div>
61
+
62
+ <iframe id="search_frame"></iframe>
63
+
64
+ <div id="content"><div id='filecontents'>
65
+ <h1 id="label-QuantileEstimator.rb">QuantileEstimator.rb</h1>
66
+
67
+ <p>This implements the quantile estimator described in the paper:</p>
68
+
69
+ <p>Cormode et al.:
70
+ &quot;Effective Computation of Biased Quantiles over Data
71
+ Streams&quot;</p>
72
+
73
+ <p>Of the different strategies for implementing the algorithm, I chose the
74
+ simplest one,
75
+ using <a
76
+ href="https://github.com/odo/quantile_estimator">github.com/odo/quantile_estimator</a>
77
+ as a reference for the implementation.</p>
78
+
79
+ <h2 id="label-Installation">Installation</h2>
80
+
81
+ <p>Add this line to your application&#39;s Gemfile:</p>
82
+
83
+ <pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_gem'>gem</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>quantile_estimator</span><span class='tstring_end'>&#39;</span></span></code></pre>
84
+
85
+ <p>And then execute:</p>
86
+
87
+ <pre class="code ruby"><code class="ruby">$ bundle</code></pre>
88
+
89
+ <p>Or install it yourself as:</p>
90
+
91
+ <pre class="code ruby"><code class="ruby">$ gem install quantile_estimator</code></pre>
92
+
93
+ <h2 id="label-Usage">Usage</h2>
94
+
95
+ <p>First initialize the estimator. Right now you can use either a
96
+ <code>Biased</code> or a
97
+ <code>Targeted</code> invariant. The targeted
98
+ invariant lets you select the quantiles you are
99
+ particularly interested in and
100
+ will yield higher compression rates.</p>
101
+
102
+ <pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_biased'>biased</span> <span class='op'>=</span> <span class='const'>Invariant</span><span class='op'>::</span><span class='const'>Biased</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='float'>0.004</span><span class='rparen'>)</span>
103
+ <span class='id identifier rubyid_targeted'>targeted</span> <span class='op'>=</span> <span class='const'>Invariant</span><span class='op'>::</span><span class='const'>Targeted</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='lbracket'>[</span>
104
+ <span class='lbracket'>[</span><span class='float'>0.05</span><span class='comma'>,</span> <span class='float'>0.02</span><span class='rbracket'>]</span><span class='comma'>,</span>
105
+ <span class='lbracket'>[</span><span class='float'>0.5</span><span class='comma'>,</span> <span class='float'>0.02</span><span class='rbracket'>]</span><span class='comma'>,</span>
106
+ <span class='lbracket'>[</span><span class='float'>0.95</span><span class='comma'>,</span> <span class='float'>0.02</span><span class='rbracket'>]</span>
107
+ <span class='rbracket'>]</span><span class='rparen'>)</span>
108
+
109
+ <span class='id identifier rubyid_estimator'>estimator</span> <span class='op'>=</span> <span class='const'>Estimator</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='id identifier rubyid_targeted'>targeted</span><span class='rparen'>)</span></code></pre>
110
+
111
+ <p>Insertion of data is as simple as:</p>
112
+
113
+ <pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_estimator'>estimator</span><span class='period'>.</span><span class='id identifier rubyid_insert'>insert</span><span class='lparen'>(</span><span class='id identifier rubyid_value'>value</span><span class='rparen'>)</span></code></pre>
114
+
115
+ <p>The insertion of data <em>doesn&#39;t</em> automatically compress it. To
116
+ compress the data just
117
+ call:</p>
118
+
119
+ <pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_estimator'>estimator</span><span class='period'>.</span><span class='id identifier rubyid_compress!'>compress!</span></code></pre>
120
+
121
+ <p>Using these primitives you can build wrappers that compress on every nth insert,
122
+ as shown
123
+ in
124
+ <a
125
+ href="https://github.com/diegoeche/quantile_estimator.rb/blob/master/benchmark.rb">this
126
+ file</a>.</p>
127
+
128
+ <h2 id="label-Pretty+Graphs">Pretty Graphs</h2>
129
+
130
+ <p>Using a targeted invariant to keep track of the 0.05, 0.5 and 0.95
131
+ quantiles, a
132
+ uniform source of random values, and compression every 100
133
+ iterations, we get the
134
+ following behavior regarding the size of the
135
+ internal data structure:</p>
136
+
137
+ <p><img
138
+ src="https://raw.github.com/diegoeche/quantile_estimator.rb/master/doc/compression.png"
139
+ /></p>
140
+
141
+ <p>Running time behavior is not too bad. The following graph shows the cost
142
+ of
143
+ insertions in the estimator. The homogeneous layer of outliers probably
144
+ corresponds
145
+ to the compression cycles, while the bottom line is the cost of
146
+ compression-less
147
+ insertions.</p>
148
+
149
+ <p><img
150
+ src="https://raw.github.com/diegoeche/quantile_estimator.rb/master/doc/time.png"
151
+ /></p>
152
+
153
+ <p>Different distributions and different invariant setups will have different
154
+ behaviors.</p>
155
+
156
+ <p>Check your real data before using this!</p>
157
+
158
+ <h2 id="label-Known+issues">Known issues</h2>
159
+
160
+ <p>The implementation is known not to be thread-safe, and little effort has
161
+ been made to
162
+ optimize it.</p>
163
+
164
+ <h2 id="label-Contributing">Contributing</h2>
165
+ <ol><li>
166
+ <p>Fork it</p>
167
+ </li><li>
168
+ <p>Create your feature branch (<code>git checkout -b my-new-feature</code>)</p>
169
+ </li><li>
170
+ <p>Commit your changes (<code>git commit -am &#39;Add some
171
+ feature&#39;</code>)</p>
172
+ </li><li>
173
+ <p>Push to the branch (<code>git push origin my-new-feature</code>)</p>
174
+ </li><li>
175
+ <p>Create new Pull Request</p>
176
+ </li></ol>
177
+ </div></div>
178
+
179
+ <div id="footer">
180
+ Generated on Fri Nov 15 15:39:41 2013 by
181
+ <a href="http://yardoc.org" title="Yay! A Ruby Documentation Tool" target="_parent">yard</a>
182
+ 0.8.7.3 (ruby-2.0.0).
183
+ </div>
184
+
185
+ </body>
186
+ </html>
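
The README in this diff says that `insert` does not compress automatically and that a wrapper can call `compress!` on every nth insert. A minimal sketch of such a wrapper follows. `PeriodicCompressor` and `RecordingEstimator` are hypothetical names that are not part of the gem; the sketch assumes only the `insert(value)` and `compress!` methods the README shows, and it uses a stand-in estimator so it runs without the gem installed.

```ruby
# Hypothetical wrapper (not part of the gem): forwards every insert to the
# wrapped estimator and calls compress! after every `interval` inserts.
class PeriodicCompressor
  attr_reader :inserts

  def initialize(estimator, interval = 100)
    raise ArgumentError, "interval must be positive" unless interval > 0

    @estimator = estimator
    @interval = interval
    @inserts = 0
  end

  def insert(value)
    @estimator.insert(value)
    @inserts += 1
    # Compress only when the insert count reaches a multiple of the interval.
    @estimator.compress! if (@inserts % @interval).zero?
    self
  end
end

# Stand-in for the real Estimator (an assumption made for this sketch):
# it only records the calls it receives.
class RecordingEstimator
  attr_reader :values, :compressions

  def initialize
    @values = []
    @compressions = 0
  end

  def insert(value)
    @values << value
  end

  def compress!
    @compressions += 1
  end
end

fake = RecordingEstimator.new
wrapped = PeriodicCompressor.new(fake, 100)
250.times { |i| wrapped.insert(i) }
puts fake.compressions # prints 2 (compressed after inserts 100 and 200)
```

With the real gem, the object passed to the wrapper would presumably be the `Estimator.new(targeted)` instance from the README in place of the stand-in. The interval trades memory for insert cost: a smaller interval keeps the internal structure smaller but pays for compression more often.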