quantile_estimator 0.0.2

Files changed (46)
  1. checksums.yaml +7 -0
  2. data/.ruby-gemset +1 -0
  3. data/.ruby-version +1 -0
  4. data/Gemfile +4 -0
  5. data/Gemfile.lock +17 -0
  6. data/LICENSE.txt +22 -0
  7. data/README.md +85 -0
  8. data/Rakefile +9 -0
  9. data/benchmark.rb +21 -0
  10. data/doc/Cursor.html +422 -0
  11. data/doc/Estimator.html +779 -0
  12. data/doc/Invariant.html +115 -0
  13. data/doc/Invariant/Biased.html +268 -0
  14. data/doc/Invariant/Invariant.html +193 -0
  15. data/doc/Invariant/SingleTarget.html +278 -0
  16. data/doc/Invariant/Targeted.html +278 -0
  17. data/doc/Item.html +620 -0
  18. data/doc/Quantile.html +270 -0
  19. data/doc/QuantileEstimator.html +117 -0
  20. data/doc/_index.html +211 -0
  21. data/doc/class_list.html +54 -0
  22. data/doc/compression.png +0 -0
  23. data/doc/css/common.css +1 -0
  24. data/doc/css/full_list.css +57 -0
  25. data/doc/css/style.css +338 -0
  26. data/doc/file.README.html +186 -0
  27. data/doc/file_list.html +56 -0
  28. data/doc/frames.html +26 -0
  29. data/doc/index.html +186 -0
  30. data/doc/js/app.js +219 -0
  31. data/doc/js/full_list.js +178 -0
  32. data/doc/js/jquery.js +4 -0
  33. data/doc/method_list.html +221 -0
  34. data/doc/time.png +0 -0
  35. data/doc/top-level-namespace.html +114 -0
  36. data/lib/estimator.rb +120 -0
  37. data/lib/quantile_estimator/cursor.rb +24 -0
  38. data/lib/quantile_estimator/invariant.rb +47 -0
  39. data/lib/quantile_estimator/item.rb +21 -0
  40. data/lib/quantile_estimator/quantile.rb +3 -0
  41. data/lib/quantile_estimator/test.rb +37 -0
  42. data/lib/quantile_estimator/version.rb +3 -0
  43. data/pkg/quantile_estimator-0.0.1.gem +0 -0
  44. data/quantile_estimator.gemspec +29 -0
  45. data/test/test_quantile_estimator.rb +85 -0
  46. metadata +120 -0
@@ -0,0 +1,186 @@
1
+ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3
+ <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
4
+ <head>
5
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
6
+ <title>
7
+ File: README
8
+
9
+ &mdash; Documentation by YARD 0.8.7.3
10
+
11
+ </title>
12
+
13
+ <link rel="stylesheet" href="css/style.css" type="text/css" charset="utf-8" />
14
+
15
+ <link rel="stylesheet" href="css/common.css" type="text/css" charset="utf-8" />
16
+
17
+ <script type="text/javascript" charset="utf-8">
18
+ hasFrames = window.top.frames.main ? true : false;
19
+ relpath = '';
20
+ framesUrl = "frames.html#!" + escape(window.location.href);
21
+ </script>
22
+
23
+
24
+ <script type="text/javascript" charset="utf-8" src="js/jquery.js"></script>
25
+
26
+ <script type="text/javascript" charset="utf-8" src="js/app.js"></script>
27
+
28
+
29
+ </head>
30
+ <body>
31
+ <div id="header">
32
+ <div id="menu">
33
+
34
+ <a href="_index.html">Index</a> &raquo;
35
+ <span class="title">File: README</span>
36
+
37
+
38
+ <div class="noframes"><span class="title">(</span><a href="." target="_top">no frames</a><span class="title">)</span></div>
39
+ </div>
40
+
41
+ <div id="search">
42
+
43
+ <a class="full_list_link" id="class_list_link"
44
+ href="class_list.html">
45
+ Class List
46
+ </a>
47
+
48
+ <a class="full_list_link" id="method_list_link"
49
+ href="method_list.html">
50
+ Method List
51
+ </a>
52
+
53
+ <a class="full_list_link" id="file_list_link"
54
+ href="file_list.html">
55
+ File List
56
+ </a>
57
+
58
+ </div>
59
+ <div class="clear"></div>
60
+ </div>
61
+
62
+ <iframe id="search_frame"></iframe>
63
+
64
+ <div id="content"><div id='filecontents'>
65
+ <h1 id="label-QuantileEstimator.rb">QuantileEstimator.rb</h1>
66
+
67
+ <p>This implements the quantile estimator described in the paper:</p>
68
+
69
+ <p>Cormode et al.:
70
+ &quot;Effective Computation of Biased Quantiles over Data
71
+ Streams&quot;</p>
72
+
73
+ <p>Of the possible strategies for implementing the algorithm, I chose the
74
+ simplest one,
75
+ using <a
76
+ href="https://github.com/odo/quantile_estimator">github.com/odo/quantile_estimator</a>
77
+ as the reference implementation.</p>
78
+
79
+ <h2 id="label-Installation">Installation</h2>
80
+
81
+ <p>Add this line to your application&#39;s Gemfile:</p>
82
+
83
+ <pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_gem'>gem</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>quantile_estimator</span><span class='tstring_end'>&#39;</span></span></code></pre>
84
+
85
+ <p>And then execute:</p>
86
+
87
+ <pre class="code ruby"><code class="ruby">$ bundle</code></pre>
88
+
89
+ <p>Or install it yourself as:</p>
90
+
91
+ <pre class="code ruby"><code class="ruby">$ gem install quantile_estimator</code></pre>
92
+
93
+ <h2 id="label-Usage">Usage</h2>
94
+
95
+ <p>First, initialize the estimator. Currently you can use either a
96
+ <code>Biased</code> or a
97
+ <code>Targeted</code> invariant. The targeted
98
+ invariant lets you select the quantiles you are
99
+ particularly interested in and
100
+ will yield higher compression rates.</p>
101
+
102
+ <pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_biased'>biased</span> <span class='op'>=</span> <span class='const'>Invariant</span><span class='op'>::</span><span class='const'>Biased</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='float'>0.004</span><span class='rparen'>)</span>
103
+ <span class='id identifier rubyid_targeted'>targeted</span> <span class='op'>=</span> <span class='const'>Invariant</span><span class='op'>::</span><span class='const'>Targeted</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='lbracket'>[</span>
104
+ <span class='lbracket'>[</span><span class='float'>0.05</span><span class='comma'>,</span> <span class='float'>0.02</span><span class='rbracket'>]</span><span class='comma'>,</span>
105
+ <span class='lbracket'>[</span><span class='float'>0.5</span><span class='comma'>,</span> <span class='float'>0.02</span><span class='rbracket'>]</span><span class='comma'>,</span>
106
+ <span class='lbracket'>[</span><span class='float'>0.95</span><span class='comma'>,</span> <span class='float'>0.02</span><span class='rbracket'>]</span>
107
+ <span class='rbracket'>]</span><span class='rparen'>)</span>
108
+
109
+ <span class='id identifier rubyid_estimator'>estimator</span> <span class='op'>=</span> <span class='const'>Estimator</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='id identifier rubyid_targeted'>targeted</span><span class='rparen'>)</span></code></pre>
110
+
111
+ <p>Insertion of data is as simple as:</p>
112
+
113
+ <pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_estimator'>estimator</span><span class='period'>.</span><span class='id identifier rubyid_insert'>insert</span><span class='lparen'>(</span><span class='id identifier rubyid_value'>value</span><span class='rparen'>)</span></code></pre>
114
+
115
+ <p>Inserting data <em>doesn&#39;t</em> automatically compress it. To
116
+ compress the data,
117
+ call:</p>
118
+
119
+ <pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_estimator'>estimator</span><span class='period'>.</span><span class='id identifier rubyid_compress!'>compress!</span></code></pre>
120
+
121
+ <p>Using these primitives, you can build wrappers that compress on every nth insert,
122
+ as shown
123
+ in
124
+ <a
125
+ href="https://github.com/diegoeche/quantile_estimator.rb/blob/master/benchmark.rb">this
126
+ file</a>.</p>
127
+
128
+ <h2 id="label-Pretty+Graphs">Pretty Graphs</h2>
129
+
130
+ <p>Using a targeted invariant to track the 0.05, 0.5 and 0.95
131
+ quantiles, a
132
+ uniform source of random values, and compression every 100
133
+ iterations, we get the
134
+ following behavior for the size of the
135
+ internal data structure:</p>
136
+
137
+ <p><img
138
+ src="https://raw.github.com/diegoeche/quantile_estimator.rb/master/doc/compression.png"
139
+ /></p>
140
+
141
+ <p>Running time behavior is not too bad. The following graph shows the cost
142
+ of
143
+ insertions into the estimator. The homogeneous layer of outliers probably
144
+ corresponds
145
+ to the compression cycles, while the bottom line is the cost of
146
+ insertions without
147
+ compression.</p>
148
+
149
+ <p><img
150
+ src="https://raw.github.com/diegoeche/quantile_estimator.rb/master/doc/time.png"
151
+ /></p>
152
+
153
+ <p>Different distributions and invariant setups will behave
154
+ differently.</p>
155
+
156
+ <p>Check the behavior against your real data before using this!</p>
157
+
158
+ <h2 id="label-Known+issues">Known issues</h2>
159
+
160
+ <p>The implementation is known not to be thread-safe, and little effort has
161
+ gone into
162
+ optimizing it.</p>
163
+
164
+ <h2 id="label-Contributing">Contributing</h2>
165
+ <ol><li>
166
+ <p>Fork it</p>
167
+ </li><li>
168
+ <p>Create your feature branch (<code>git checkout -b my-new-feature</code>)</p>
169
+ </li><li>
170
+ <p>Commit your changes (<code>git commit -am &#39;Add some
171
+ feature&#39;</code>)</p>
172
+ </li><li>
173
+ <p>Push to the branch (<code>git push origin my-new-feature</code>)</p>
174
+ </li><li>
175
+ <p>Create new Pull Request</p>
176
+ </li></ol>
177
+ </div></div>
178
+
179
+ <div id="footer">
180
+ Generated on Fri Nov 15 15:39:43 2013 by
181
+ <a href="http://yardoc.org" title="Yay! A Ruby Documentation Tool" target="_parent">yard</a>
182
+ 0.8.7.3 (ruby-2.0.0).
183
+ </div>
184
+
185
+ </body>
186
+ </html>
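The README above names the `Biased` and `Targeted` invariants and keeps `insert` separate from `compress!`, but does not show what the invariants compute. Below is a minimal Ruby sketch, assuming the invariant functions f(r, n) from Cormode et al. (biased: 2εr; targeted: the minimum over targets (φ, ε) of 2εr/φ when r ≥ φn, and 2ε(n − r)/(1 − φ) otherwise), and assuming the README's `[0.05, 0.02]`-style pairs mean `[quantile, error]`. The names `BiasedInvariant`, `TargetedInvariant`, `Sample`, `CkmsSketch` and its `query` method are hypothetical. This is not the gem's `Invariant`/`Estimator` API, and the gem may compute ranks and deltas differently.

```ruby
# Hypothetical sketch of the biased/targeted quantile stream from Cormode et al.
# Not the quantile_estimator gem's API; names are illustrative only.

# Biased invariant: allowed rank error grows linearly with rank r.
class BiasedInvariant
  def initialize(eps)
    @eps = eps
  end

  def call(r, _n)
    2.0 * @eps * r
  end
end

# Targeted invariant; targets are ASSUMED to be [quantile, error] pairs.
class TargetedInvariant
  def initialize(targets)
    @targets = targets
  end

  # f(r, n) is the minimum of the per-target error functions.
  def call(r, n)
    @targets.map do |phi, eps|
      if r >= phi * n
        2.0 * eps * r / phi
      else
        2.0 * eps * (n - r) / (1.0 - phi)
      end
    end.min
  end
end

# value: observed value; g: rank gap to the previous sample; delta: rank uncertainty.
Sample = Struct.new(:value, :g, :delta)

class CkmsSketch
  attr_reader :samples, :count

  def initialize(invariant)
    @invariant = invariant
    @samples = []
    @count = 0
  end

  # Keeps samples sorted by value; never merges anything.
  def insert(value)
    i = @samples.index { |s| s.value > value } || @samples.size
    delta =
      if i.zero? || i == @samples.size
        0 # new minimum or maximum: its rank is known exactly
      else
        rank = @samples[0...i].sum(&:g)
        [@invariant.call(rank, @count).floor - 1, 0].max
      end
    @samples.insert(i, Sample.new(value, 1, delta))
    @count += 1
  end

  # Merges sample i into i+1 while g_i + g_{i+1} + delta_{i+1} <= f(r_i, n).
  # The first sample (the minimum) is kept; the last one is never removed.
  def compress!
    ranks = []
    acc = 0
    @samples.each { |s| ranks << acc; acc += s.g }
    (@samples.size - 2).downto(1) do |i|
      cur = @samples[i]
      nxt = @samples[i + 1]
      next if cur.g + nxt.g + nxt.delta > @invariant.call(ranks[i], @count)

      nxt.g += cur.g
      @samples.delete_at(i)
    end
    self
  end

  # Approximate value at quantile phi (0.0..1.0).
  def query(phi)
    return nil if @samples.empty?

    target = (phi * @count).ceil
    target += (@invariant.call(target, @count) / 2.0).ceil
    rank = 0
    prev = @samples.first
    @samples.drop(1).each do |s|
      rank += prev.g
      return prev.value if rank + s.g + s.delta > target
      prev = s
    end
    prev.value
  end
end
```

In this sketch, as in the README, `insert` only adds a sample and `compress!` is what merges neighbors, so the sample list keeps growing until `compress!` is called.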
@@ -0,0 +1,56 @@
1
+ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3
+ <html>
4
+ <head>
5
+ <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
6
+
7
+ <link rel="stylesheet" href="css/full_list.css" type="text/css" media="screen" charset="utf-8" />
8
+
9
+ <link rel="stylesheet" href="css/common.css" type="text/css" media="screen" charset="utf-8" />
10
+
11
+
12
+
13
+ <script type="text/javascript" charset="utf-8" src="js/jquery.js"></script>
14
+
15
+ <script type="text/javascript" charset="utf-8" src="js/full_list.js"></script>
16
+
17
+
18
+ <title>File List</title>
19
+ <base id="base_target" target="_parent" />
20
+ </head>
21
+ <body>
22
+ <script type="text/javascript" charset="utf-8">
23
+ if (window.top.frames.main) {
24
+ document.getElementById('base_target').target = 'main';
25
+ document.body.className = 'frames';
26
+ }
27
+ </script>
28
+ <div id="content">
29
+ <h1 id="full_list_header">File List</h1>
30
+ <div id="nav">
31
+
32
+ <span><a target="_self" href="class_list.html">
33
+ Classes
34
+ </a></span>
35
+
36
+ <span><a target="_self" href="method_list.html">
37
+ Methods
38
+ </a></span>
39
+
40
+ <span><a target="_self" href="file_list.html">
41
+ Files
42
+ </a></span>
43
+
44
+ </div>
45
+ <div id="search">Search: <input type="text" /></div>
46
+
47
+ <ul id="full_list" class="file">
48
+
49
+
50
+ <li class="r1"><span class="object_link"><a href="index.html" title="README">README</a></span></li>
51
+
52
+
53
+ </ul>
54
+ </div>
55
+ </body>
56
+ </html>
data/doc/frames.html ADDED
@@ -0,0 +1,26 @@
1
+ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Frameset//EN"
2
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-frameset.dtd">
3
+
4
+ <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
5
+ <head>
6
+ <meta http-equiv="Content-Type" content="text/html; charset=utf-8"/>
7
+ <title>Documentation by YARD 0.8.7.3</title>
8
+ </head>
9
+ <script type="text/javascript" charset="utf-8">
10
+ window.onload = function() {
11
+ var match = unescape(window.location.hash).match(/^#!(.+)/);
12
+ var name = match ? match[1] : 'index.html';
13
+ name = name.replace(/^(\w+):\/\//, '').replace(/^\/\//, '');
14
+ document.writeln('<frameset cols="20%,*">' +
15
+ '<frame name="list" src="class_list.html" />' +
16
+ '<frame name="main" src="' + escape(name) + '" />' +
17
+ '</frameset>');
18
+ }
19
+ </script>
20
+ <noscript>
21
+ <frameset cols="20%,*">
22
+ <frame name="list" src="class_list.html" />
23
+ <frame name="main" src="index.html" />
24
+ </frameset>
25
+ </noscript>
26
+ </html>
data/doc/index.html ADDED
@@ -0,0 +1,186 @@
1
+ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3
+ <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
4
+ <head>
5
+ <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
6
+ <title>
7
+ File: README
8
+
9
+ &mdash; Documentation by YARD 0.8.7.3
10
+
11
+ </title>
12
+
13
+ <link rel="stylesheet" href="css/style.css" type="text/css" charset="utf-8" />
14
+
15
+ <link rel="stylesheet" href="css/common.css" type="text/css" charset="utf-8" />
16
+
17
+ <script type="text/javascript" charset="utf-8">
18
+ hasFrames = window.top.frames.main ? true : false;
19
+ relpath = '';
20
+ framesUrl = "frames.html#!" + escape(window.location.href);
21
+ </script>
22
+
23
+
24
+ <script type="text/javascript" charset="utf-8" src="js/jquery.js"></script>
25
+
26
+ <script type="text/javascript" charset="utf-8" src="js/app.js"></script>
27
+
28
+
29
+ </head>
30
+ <body>
31
+ <div id="header">
32
+ <div id="menu">
33
+
34
+ <a href="_index.html">Index</a> &raquo;
35
+ <span class="title">File: README</span>
36
+
37
+
38
+ <div class="noframes"><span class="title">(</span><a href="." target="_top">no frames</a><span class="title">)</span></div>
39
+ </div>
40
+
41
+ <div id="search">
42
+
43
+ <a class="full_list_link" id="class_list_link"
44
+ href="class_list.html">
45
+ Class List
46
+ </a>
47
+
48
+ <a class="full_list_link" id="method_list_link"
49
+ href="method_list.html">
50
+ Method List
51
+ </a>
52
+
53
+ <a class="full_list_link" id="file_list_link"
54
+ href="file_list.html">
55
+ File List
56
+ </a>
57
+
58
+ </div>
59
+ <div class="clear"></div>
60
+ </div>
61
+
62
+ <iframe id="search_frame"></iframe>
63
+
64
+ <div id="content"><div id='filecontents'>
65
+ <h1 id="label-QuantileEstimator.rb">QuantileEstimator.rb</h1>
66
+
67
+ <p>This implements the quantile estimator described in the paper:</p>
68
+
69
+ <p>Cormode et al.:
70
+ &quot;Effective Computation of Biased Quantiles over Data
71
+ Streams&quot;</p>
72
+
73
+ <p>Of the possible strategies for implementing the algorithm, I chose the
74
+ simplest one,
75
+ using <a
76
+ href="https://github.com/odo/quantile_estimator">github.com/odo/quantile_estimator</a>
77
+ as the reference implementation.</p>
78
+
79
+ <h2 id="label-Installation">Installation</h2>
80
+
81
+ <p>Add this line to your application&#39;s Gemfile:</p>
82
+
83
+ <pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_gem'>gem</span> <span class='tstring'><span class='tstring_beg'>&#39;</span><span class='tstring_content'>quantile_estimator</span><span class='tstring_end'>&#39;</span></span></code></pre>
84
+
85
+ <p>And then execute:</p>
86
+
87
+ <pre class="code ruby"><code class="ruby">$ bundle</code></pre>
88
+
89
+ <p>Or install it yourself as:</p>
90
+
91
+ <pre class="code ruby"><code class="ruby">$ gem install quantile_estimator</code></pre>
92
+
93
+ <h2 id="label-Usage">Usage</h2>
94
+
95
+ <p>First, initialize the estimator. Currently you can use either a
96
+ <code>Biased</code> or a
97
+ <code>Targeted</code> invariant. The targeted
98
+ invariant lets you select the quantiles you are
99
+ particularly interested in and
100
+ will yield higher compression rates.</p>
101
+
102
+ <pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_biased'>biased</span> <span class='op'>=</span> <span class='const'>Invariant</span><span class='op'>::</span><span class='const'>Biased</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='float'>0.004</span><span class='rparen'>)</span>
103
+ <span class='id identifier rubyid_targeted'>targeted</span> <span class='op'>=</span> <span class='const'>Invariant</span><span class='op'>::</span><span class='const'>Targeted</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='lbracket'>[</span>
104
+ <span class='lbracket'>[</span><span class='float'>0.05</span><span class='comma'>,</span> <span class='float'>0.02</span><span class='rbracket'>]</span><span class='comma'>,</span>
105
+ <span class='lbracket'>[</span><span class='float'>0.5</span><span class='comma'>,</span> <span class='float'>0.02</span><span class='rbracket'>]</span><span class='comma'>,</span>
106
+ <span class='lbracket'>[</span><span class='float'>0.95</span><span class='comma'>,</span> <span class='float'>0.02</span><span class='rbracket'>]</span>
107
+ <span class='rbracket'>]</span><span class='rparen'>)</span>
108
+
109
+ <span class='id identifier rubyid_estimator'>estimator</span> <span class='op'>=</span> <span class='const'>Estimator</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='id identifier rubyid_targeted'>targeted</span><span class='rparen'>)</span></code></pre>
110
+
111
+ <p>Insertion of data is as simple as:</p>
112
+
113
+ <pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_estimator'>estimator</span><span class='period'>.</span><span class='id identifier rubyid_insert'>insert</span><span class='lparen'>(</span><span class='id identifier rubyid_value'>value</span><span class='rparen'>)</span></code></pre>
114
+
115
+ <p>Inserting data <em>doesn&#39;t</em> automatically compress it. To
116
+ compress the data,
117
+ call:</p>
118
+
119
+ <pre class="code ruby"><code class="ruby"><span class='id identifier rubyid_estimator'>estimator</span><span class='period'>.</span><span class='id identifier rubyid_compress!'>compress!</span></code></pre>
120
+
121
+ <p>Using these primitives, you can build wrappers that compress on every nth insert,
122
+ as shown
123
+ in
124
+ <a
125
+ href="https://github.com/diegoeche/quantile_estimator.rb/blob/master/benchmark.rb">this
126
+ file</a>.</p>
127
+
128
+ <h2 id="label-Pretty+Graphs">Pretty Graphs</h2>
129
+
130
+ <p>Using a targeted invariant to track the 0.05, 0.5 and 0.95
131
+ quantiles, a
132
+ uniform source of random values, and compression every 100
133
+ iterations, we get the
134
+ following behavior for the size of the
135
+ internal data structure:</p>
136
+
137
+ <p><img
138
+ src="https://raw.github.com/diegoeche/quantile_estimator.rb/master/doc/compression.png"
139
+ /></p>
140
+
141
+ <p>Running time behavior is not too bad. The following graph shows the cost
142
+ of
143
+ insertions into the estimator. The homogeneous layer of outliers probably
144
+ corresponds
145
+ to the compression cycles, while the bottom line is the cost of
146
+ insertions without
147
+ compression.</p>
148
+
149
+ <p><img
150
+ src="https://raw.github.com/diegoeche/quantile_estimator.rb/master/doc/time.png"
151
+ /></p>
152
+
153
+ <p>Different distributions and invariant setups will behave
154
+ differently.</p>
155
+
156
+ <p>Check the behavior against your real data before using this!</p>
157
+
158
+ <h2 id="label-Known+issues">Known issues</h2>
159
+
160
+ <p>The implementation is known not to be thread-safe, and little effort has
161
+ gone into
162
+ optimizing it.</p>
163
+
164
+ <h2 id="label-Contributing">Contributing</h2>
165
+ <ol><li>
166
+ <p>Fork it</p>
167
+ </li><li>
168
+ <p>Create your feature branch (<code>git checkout -b my-new-feature</code>)</p>
169
+ </li><li>
170
+ <p>Commit your changes (<code>git commit -am &#39;Add some
171
+ feature&#39;</code>)</p>
172
+ </li><li>
173
+ <p>Push to the branch (<code>git push origin my-new-feature</code>)</p>
174
+ </li><li>
175
+ <p>Create new Pull Request</p>
176
+ </li></ol>
177
+ </div></div>
178
+
179
+ <div id="footer">
180
+ Generated on Fri Nov 15 15:39:41 2013 by
181
+ <a href="http://yardoc.org" title="Yay! A Ruby Documentation Tool" target="_parent">yard</a>
182
+ 0.8.7.3 (ruby-2.0.0).
183
+ </div>
184
+
185
+ </body>
186
+ </html>
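The README text above points to benchmark.rb for compressing on every nth insert but does not show the wrapper itself. Here is a hedged sketch of that pattern, assuming only the `insert(value)` and `compress!` methods shown in the README's usage section. `CompressEveryN`, its `every:` keyword, and `RecordingEstimator` (a stand-in so the snippet runs without the gem) are hypothetical names and are not taken from benchmark.rb.

```ruby
# Hypothetical wrapper that calls compress! after every `every` inserts.
# Relies only on #insert and #compress!, the two primitives in the README.
class CompressEveryN
  attr_reader :estimator

  def initialize(estimator, every: 100)
    raise ArgumentError, "every must be positive" unless every.positive?

    @estimator = estimator
    @every = every
    @since_compress = 0
  end

  def insert(value)
    @estimator.insert(value)
    @since_compress += 1
    return self if @since_compress < @every

    @estimator.compress!
    @since_compress = 0
    self
  end
end

# Stand-in estimator that only records calls; in real use you would pass
# the gem's Estimator.new(invariant) instead.
class RecordingEstimator
  attr_reader :values, :compressions

  def initialize
    @values = []
    @compressions = 0
  end

  def insert(value)
    @values << value
  end

  def compress!
    @compressions += 1
  end
end
```

Keeping compression out of `insert` lets the caller choose the trade-off: a larger `every` means fewer compression passes but a larger internal structure between them.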