quantile_estimator 0.0.2

Files changed (46)
  1. checksums.yaml +7 -0
  2. data/.ruby-gemset +1 -0
  3. data/.ruby-version +1 -0
  4. data/Gemfile +4 -0
  5. data/Gemfile.lock +17 -0
  6. data/LICENSE.txt +22 -0
  7. data/README.md +85 -0
  8. data/Rakefile +9 -0
  9. data/benchmark.rb +21 -0
  10. data/doc/Cursor.html +422 -0
  11. data/doc/Estimator.html +779 -0
  12. data/doc/Invariant.html +115 -0
  13. data/doc/Invariant/Biased.html +268 -0
  14. data/doc/Invariant/Invariant.html +193 -0
  15. data/doc/Invariant/SingleTarget.html +278 -0
  16. data/doc/Invariant/Targeted.html +278 -0
  17. data/doc/Item.html +620 -0
  18. data/doc/Quantile.html +270 -0
  19. data/doc/QuantileEstimator.html +117 -0
  20. data/doc/_index.html +211 -0
  21. data/doc/class_list.html +54 -0
  22. data/doc/compression.png +0 -0
  23. data/doc/css/common.css +1 -0
  24. data/doc/css/full_list.css +57 -0
  25. data/doc/css/style.css +338 -0
  26. data/doc/file.README.html +186 -0
  27. data/doc/file_list.html +56 -0
  28. data/doc/frames.html +26 -0
  29. data/doc/index.html +186 -0
  30. data/doc/js/app.js +219 -0
  31. data/doc/js/full_list.js +178 -0
  32. data/doc/js/jquery.js +4 -0
  33. data/doc/method_list.html +221 -0
  34. data/doc/time.png +0 -0
  35. data/doc/top-level-namespace.html +114 -0
  36. data/lib/estimator.rb +120 -0
  37. data/lib/quantile_estimator/cursor.rb +24 -0
  38. data/lib/quantile_estimator/invariant.rb +47 -0
  39. data/lib/quantile_estimator/item.rb +21 -0
  40. data/lib/quantile_estimator/quantile.rb +3 -0
  41. data/lib/quantile_estimator/test.rb +37 -0
  42. data/lib/quantile_estimator/version.rb +3 -0
  43. data/pkg/quantile_estimator-0.0.1.gem +0 -0
  44. data/quantile_estimator.gemspec +29 -0
  45. data/test/test_quantile_estimator.rb +85 -0
  46. metadata +120 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: d96954e453f9b1c4dab4c71a9a26e00b8f26d138
4
+ data.tar.gz: 6da4c7c657a29ea5993d535586b3e4ab54bf8b4b
5
+ SHA512:
6
+ metadata.gz: e9b1452ae828b888742a75eb601bd049969d01501cd17410a55bd14463c8fc23456ffc4a21a37aa78f1acac41e95150697c18dd81d07459d8219baa7f1ea444d
7
+ data.tar.gz: 7885e33a2b2792f39ef81dc87a1fb4f90ca8934f401efa408c701f4596b52e37574b444f90c08f64f65afd9b8a2ba3302c1f474214977a4701ebb42ecab95603
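The digests recorded above can be compared with the files inside an unpacked `.gem` archive. The following is a minimal sketch, assuming `metadata.gz` and `data.tar.gz` have already been extracted into a directory and that the checksum data is available as a nested Hash (algorithm name, then file name, then hex digest). The helper name `checksum_mismatches` and the `ALGORITHMS` table are hypothetical and are not part of RubyGems.

```ruby
require 'digest/sha1'
require 'digest/sha2'

# Maps the algorithm names used in checksums.yaml to Ruby digest classes.
ALGORITHMS = { 'SHA1' => Digest::SHA1, 'SHA512' => Digest::SHA512 }.freeze

# Hypothetical helper: returns [algorithm, file name] pairs whose digest on
# disk differs from the expected hex string. An empty array means every
# listed file matched.
def checksum_mismatches(checksums, dir = '.')
  checksums.flat_map do |algorithm, files|
    digest_class = ALGORITHMS.fetch(algorithm)
    mismatched = files.reject do |name, expected|
      digest_class.file(File.join(dir, name)).hexdigest == expected
    end
    mismatched.keys.map { |name| [algorithm, name] }
  end
end
```

An unknown algorithm name raises `KeyError` through `fetch`, which seems safer here than silently skipping an entry.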
data/.ruby-gemset ADDED
@@ -0,0 +1 @@
1
+ quantile_estimator
data/.ruby-version ADDED
@@ -0,0 +1 @@
1
+ ruby-2.0.0-p0
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in quantile_estimator.gemspec
4
+ gemspec
data/Gemfile.lock ADDED
@@ -0,0 +1,17 @@
1
+ PATH
2
+ remote: .
3
+ specs:
4
+ quantile_estimator (0.0.1)
5
+
6
+ GEM
7
+ remote: https://rubygems.org/
8
+ specs:
9
+ rake (10.0.4)
10
+
11
+ PLATFORMS
12
+ ruby
13
+
14
+ DEPENDENCIES
15
+ bundler (~> 1.3)
16
+ quantile_estimator!
17
+ rake
data/LICENSE.txt ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2013 Diego Echeverri
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,85 @@
1
+ # QuantileEstimator.rb
2
+
3
+ This implements the quantile estimator described in the paper:
4
+
5
+ Cormode et al.:
6
+ "Effective Computation of Biased Quantiles over Data Streams"
7
+
8
+ Of the different strategies for implementing the algorithm, I chose the simplest one,
9
+ using https://github.com/odo/quantile_estimator as the reference for the implementation.
10
+
11
+
12
+ ## Installation
13
+
14
+ Add this line to your application's Gemfile:
15
+
16
+ gem 'quantile_estimator'
17
+
18
+ And then execute:
19
+
20
+ $ bundle
21
+
22
+ Or install it yourself as:
23
+
24
+ $ gem install quantile_estimator
25
+
26
+ ## Usage
27
+
28
+ First initialize the estimator. Right now you can use either a `Biased` or a
29
+ `Targeted` invariant. The targeted invariant lets you select the quantiles you are
30
+ particularly interested in and yields higher compression rates.
31
+
32
+ biased = Invariant::Biased.new(0.004)
33
+ targeted = Invariant::Targeted.new([
34
+ [0.05, 0.02],
35
+ [0.5, 0.02],
36
+ [0.95, 0.02]
37
+ ])
38
+
39
+ estimator = Estimator.new(targeted)
40
+
41
+ Insertion of data is as simple as:
42
+
43
+ estimator.insert(value)
44
+
45
+ The insertion of data _doesn't_ automatically compress it. To compress the data just
46
+ call:
47
+
48
+ estimator.compress!
49
+
50
+ Using these primitives you can build wrappers that compress on every nth insert, as shown
51
+ in
52
+ [this file](https://github.com/diegoeche/quantile_estimator.rb/blob/master/benchmark.rb)
53
+
54
+ ## Pretty Graphs
55
+
56
+ Using a targeted invariant to keep track of the 0.05, 0.5 and 0.95 quantiles, a
57
+ uniform source of random values, and compressing every 100 iterations, we get the
58
+ following behavior regarding the size of the internal data structure:
59
+
60
+ ![compression rate (elements size, lower is better)](https://raw.github.com/diegoeche/quantile_estimator.rb/master/doc/compression.png
61
+ "Elements in the estimator. Lower is better")
62
+
63
+ Running time behavior is not too bad. The following graph shows the cost of
64
+ insertions in the estimator. The homogeneous layer of outliers probably corresponds
65
+ to the compression cycles, while the bottom line is the cost of compression-less
66
+ insertions.
67
+
68
+ ![runtime behavior (ms, lower is better)](https://raw.github.com/diegoeche/quantile_estimator.rb/master/doc/time.png "Time in ms. Lower is better")
69
+
70
+ Different distributions and different invariant setups will behave differently.
71
+
72
+ Check your real data before using this!
73
+
74
+ ## Known issues
75
+
76
+ The implementation is known not to be thread-safe, and little effort has been made to
77
+ optimize it.
78
+
79
+ ## Contributing
80
+
81
+ 1. Fork it
82
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
83
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
84
+ 4. Push to the branch (`git push origin my-new-feature`)
85
+ 5. Create new Pull Request
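The README's Usage section mentions wrapping `insert` and `compress!` so that compression runs on every nth insert. One way this could look is sketched below. The class name `CompressEvery` is hypothetical and not part of the gem; it only assumes that the wrapped object responds to `insert` and `compress!`, the two primitives the README describes.

```ruby
# Hypothetical wrapper (not part of the gem): forwards inserts to any object
# that responds to #insert and #compress!, and compresses on every nth insert.
class CompressEvery
  def initialize(estimator, n)
    raise ArgumentError, 'n must be positive' unless n > 0

    @estimator = estimator
    @n = n
    @count = 0
  end

  # Inserts the value, then compresses if this was the nth insert since the
  # last compression cycle. Returns self so calls can be chained.
  def insert(value)
    @estimator.insert(value)
    @count += 1
    @estimator.compress! if (@count % @n).zero?
    self
  end
end
```

With the gem loaded, `CompressEvery.new(Estimator.new(targeted), 100)` would mirror the `i % 100 == 99` check in `benchmark.rb`.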
data/Rakefile ADDED
@@ -0,0 +1,9 @@
1
+ require 'bundler/gem_tasks'
2
+ require 'rake/testtask'
3
+
4
+ Rake::TestTask.new do |t|
5
+ t.libs << 'test'
6
+ end
7
+
8
+ desc "Run tests"
9
+ task :default => :test
data/benchmark.rb ADDED
@@ -0,0 +1,21 @@
1
+ require 'estimator'
2
+
3
+ biased = Invariant::Biased.new(0.004)
4
+ targeted = Invariant::Targeted.new([
5
+ [0.05, 0.02],
6
+ [0.5, 0.02],
7
+ [0.95, 0.02]
8
+ ])
9
+
10
+ # estimator = Estimator.new(invariant)
11
+
12
+ estimator = Estimator.new(targeted)
13
+
14
+ 10000.times do |i|
15
+ nowish = Time.now.to_f
16
+ estimator.insert(rand)
17
+ if i % 100 == 99
18
+ estimator.compress!
19
+ end
20
+ puts [i, 1000 * (Time.now.to_f - nowish), estimator.samples.size].join("\t")
21
+ end
data/doc/Cursor.html ADDED
@@ -0,0 +1,422 @@
1
+ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3
+ <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
4
+ <head>
5
+ <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
6
+ <title>
7
+ Class: Cursor
8
+
9
+ &mdash; Documentation by YARD 0.8.7.3
10
+
11
+ </title>
12
+
13
+ <link rel="stylesheet" href="css/style.css" type="text/css" charset="utf-8" />
14
+
15
+ <link rel="stylesheet" href="css/common.css" type="text/css" charset="utf-8" />
16
+
17
+ <script type="text/javascript" charset="utf-8">
18
+ hasFrames = window.top.frames.main ? true : false;
19
+ relpath = '';
20
+ framesUrl = "frames.html#!" + escape(window.location.href);
21
+ </script>
22
+
23
+
24
+ <script type="text/javascript" charset="utf-8" src="js/jquery.js"></script>
25
+
26
+ <script type="text/javascript" charset="utf-8" src="js/app.js"></script>
27
+
28
+
29
+ </head>
30
+ <body>
31
+ <div id="header">
32
+ <div id="menu">
33
+
34
+ <a href="_index.html">Index (C)</a> &raquo;
35
+
36
+
37
+ <span class="title">Cursor</span>
38
+
39
+
40
+ <div class="noframes"><span class="title">(</span><a href="." target="_top">no frames</a><span class="title">)</span></div>
41
+ </div>
42
+
43
+ <div id="search">
44
+
45
+ <a class="full_list_link" id="class_list_link"
46
+ href="class_list.html">
47
+ Class List
48
+ </a>
49
+
50
+ <a class="full_list_link" id="method_list_link"
51
+ href="method_list.html">
52
+ Method List
53
+ </a>
54
+
55
+ <a class="full_list_link" id="file_list_link"
56
+ href="file_list.html">
57
+ File List
58
+ </a>
59
+
60
+ </div>
61
+ <div class="clear"></div>
62
+ </div>
63
+
64
+ <iframe id="search_frame"></iframe>
65
+
66
+ <div id="content"><h1>Class: Cursor
67
+
68
+
69
+
70
+ </h1>
71
+
72
+ <dl class="box">
73
+
74
+ <dt class="r1">Inherits:</dt>
75
+ <dd class="r1">
76
+ <span class="inheritName">Object</span>
77
+
78
+ <ul class="fullTree">
79
+ <li>Object</li>
80
+
81
+ <li class="next">Cursor</li>
82
+
83
+ </ul>
84
+ <a href="#" class="inheritanceTree">show all</a>
85
+
86
+ </dd>
87
+
88
+
89
+
90
+
91
+
92
+
93
+
94
+
95
+
96
+ <dt class="r2 last">Defined in:</dt>
97
+ <dd class="r2 last">lib/quantile_estimator/cursor.rb</dd>
98
+
99
+ </dl>
100
+ <div class="clear"></div>
101
+
102
+
103
+
104
+
105
+
106
+
107
+
108
+
109
+
110
+ <h2>
111
+ Instance Method Summary
112
+ <small>(<a href="#" class="summary_toggle">collapse</a>)</small>
113
+ </h2>
114
+
115
+ <ul class="summary">
116
+
117
+ <li class="public ">
118
+ <span class="summary_signature">
119
+
120
+ <a href="#initialize-instance_method" title="#initialize (instance method)">- (Cursor) <strong>initialize</strong>(array, start = 0) </a>
121
+
122
+
123
+
124
+ </span>
125
+
126
+
127
+ <span class="note title constructor">constructor</span>
128
+
129
+
130
+
131
+
132
+
133
+
134
+
135
+
136
+ <span class="summary_desc"><div class='inline'>
137
+ <p>A new instance of Cursor.</p>
138
+ </div></span>
139
+
140
+ </li>
141
+
142
+
143
+ <li class="public ">
144
+ <span class="summary_signature">
145
+
146
+ <a href="#next-instance_method" title="#next (instance method)">- (Object) <strong>next</strong> </a>
147
+
148
+
149
+
150
+ </span>
151
+
152
+
153
+
154
+
155
+
156
+
157
+
158
+
159
+
160
+ <span class="summary_desc"><div class='inline'></div></span>
161
+
162
+ </li>
163
+
164
+
165
+ <li class="public ">
166
+ <span class="summary_signature">
167
+
168
+ <a href="#previous-instance_method" title="#previous (instance method)">- (Object) <strong>previous</strong> </a>
169
+
170
+
171
+
172
+ </span>
173
+
174
+
175
+
176
+
177
+
178
+
179
+
180
+
181
+
182
+ <span class="summary_desc"><div class='inline'></div></span>
183
+
184
+ </li>
185
+
186
+
187
+ <li class="public ">
188
+ <span class="summary_signature">
189
+
190
+ <a href="#remove%21-instance_method" title="#remove! (instance method)">- (Object) <strong>remove!</strong> </a>
191
+
192
+
193
+
194
+ </span>
195
+
196
+
197
+
198
+
199
+
200
+
201
+
202
+
203
+
204
+ <span class="summary_desc"><div class='inline'></div></span>
205
+
206
+ </li>
207
+
208
+
209
+ <li class="public ">
210
+ <span class="summary_signature">
211
+
212
+ <a href="#%7E-instance_method" title="#~ (instance method)">- (Object) <strong>~</strong> </a>
213
+
214
+
215
+
216
+ </span>
217
+
218
+
219
+
220
+
221
+
222
+
223
+
224
+
225
+
226
+ <span class="summary_desc"><div class='inline'></div></span>
227
+
228
+ </li>
229
+
230
+
231
+ </ul>
232
+
233
+
234
+ <div id="constructor_details" class="method_details_list">
235
+ <h2>Constructor Details</h2>
236
+
237
+ <div class="method_details first">
238
+ <h3 class="signature first" id="initialize-instance_method">
239
+
240
+ - (<tt><span class='object_link'><a href="" title="Cursor (class)">Cursor</a></span></tt>) <strong>initialize</strong>(array, start = 0)
241
+
242
+
243
+
244
+
245
+
246
+ </h3><div class="docstring">
247
+ <div class="discussion">
248
+
249
+ <p>Returns a new instance of Cursor</p>
250
+
251
+
252
+ </div>
253
+ </div>
254
+ <div class="tags">
255
+
256
+
257
+ </div><table class="source_code">
258
+ <tr>
259
+ <td>
260
+ <pre class="lines">
261
+
262
+
263
+ 2
264
+ 3
265
+ 4
266
+ 5</pre>
267
+ </td>
268
+ <td>
269
+ <pre class="code"><span class="info file"># File 'lib/quantile_estimator/cursor.rb', line 2</span>
270
+
271
+ <span class='kw'>def</span> <span class='id identifier rubyid_initialize'>initialize</span><span class='lparen'>(</span><span class='id identifier rubyid_array'>array</span><span class='comma'>,</span> <span class='id identifier rubyid_start'>start</span><span class='op'>=</span><span class='int'>0</span><span class='rparen'>)</span>
272
+ <span class='ivar'>@array</span> <span class='op'>=</span> <span class='id identifier rubyid_array'>array</span>
273
+ <span class='ivar'>@start</span> <span class='op'>=</span> <span class='id identifier rubyid_start'>start</span>
274
+ <span class='kw'>end</span></pre>
275
+ </td>
276
+ </tr>
277
+ </table>
278
+ </div>
279
+
280
+ </div>
281
+
282
+
283
+ <div id="instance_method_details" class="method_details_list">
284
+ <h2>Instance Method Details</h2>
285
+
286
+
287
+ <div class="method_details first">
288
+ <h3 class="signature first" id="next-instance_method">
289
+
290
+ - (<tt>Object</tt>) <strong>next</strong>
291
+
292
+
293
+
294
+
295
+
296
+ </h3><table class="source_code">
297
+ <tr>
298
+ <td>
299
+ <pre class="lines">
300
+
301
+
302
+ 17
303
+ 18
304
+ 19</pre>
305
+ </td>
306
+ <td>
307
+ <pre class="code"><span class="info file"># File 'lib/quantile_estimator/cursor.rb', line 17</span>
308
+
309
+ <span class='kw'>def</span> <span class='kw'>next</span>
310
+ <span class='const'>Cursor</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='ivar'>@array</span><span class='comma'>,</span> <span class='ivar'>@start</span> <span class='op'>+</span> <span class='int'>1</span><span class='rparen'>)</span>
311
+ <span class='kw'>end</span></pre>
312
+ </td>
313
+ </tr>
314
+ </table>
315
+ </div>
316
+
317
+ <div class="method_details ">
318
+ <h3 class="signature " id="previous-instance_method">
319
+
320
+ - (<tt>Object</tt>) <strong>previous</strong>
321
+
322
+
323
+
324
+
325
+
326
+ </h3><table class="source_code">
327
+ <tr>
328
+ <td>
329
+ <pre class="lines">
330
+
331
+
332
+ 21
333
+ 22
334
+ 23</pre>
335
+ </td>
336
+ <td>
337
+ <pre class="code"><span class="info file"># File 'lib/quantile_estimator/cursor.rb', line 21</span>
338
+
339
+ <span class='kw'>def</span> <span class='id identifier rubyid_previous'>previous</span>
340
+ <span class='const'>Cursor</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='ivar'>@array</span><span class='comma'>,</span> <span class='ivar'>@start</span> <span class='op'>-</span> <span class='int'>1</span><span class='rparen'>)</span>
341
+ <span class='kw'>end</span></pre>
342
+ </td>
343
+ </tr>
344
+ </table>
345
+ </div>
346
+
347
+ <div class="method_details ">
348
+ <h3 class="signature " id="remove!-instance_method">
349
+
350
+ - (<tt>Object</tt>) <strong>remove!</strong>
351
+
352
+
353
+
354
+
355
+
356
+ </h3><table class="source_code">
357
+ <tr>
358
+ <td>
359
+ <pre class="lines">
360
+
361
+
362
+ 13
363
+ 14
364
+ 15</pre>
365
+ </td>
366
+ <td>
367
+ <pre class="code"><span class="info file"># File 'lib/quantile_estimator/cursor.rb', line 13</span>
368
+
369
+ <span class='kw'>def</span> <span class='id identifier rubyid_remove!'>remove!</span>
370
+ <span class='ivar'>@array</span><span class='period'>.</span><span class='id identifier rubyid_delete_at'>delete_at</span><span class='lparen'>(</span><span class='ivar'>@start</span><span class='rparen'>)</span>
371
+ <span class='kw'>end</span></pre>
372
+ </td>
373
+ </tr>
374
+ </table>
375
+ </div>
376
+
377
+ <div class="method_details ">
378
+ <h3 class="signature " id="~-instance_method">
379
+
380
+ - (<tt>Object</tt>) <strong>~</strong>
381
+
382
+
383
+
384
+
385
+
386
+ </h3><table class="source_code">
387
+ <tr>
388
+ <td>
389
+ <pre class="lines">
390
+
391
+
392
+ 7
393
+ 8
394
+ 9
395
+ 10
396
+ 11</pre>
397
+ </td>
398
+ <td>
399
+ <pre class="code"><span class="info file"># File 'lib/quantile_estimator/cursor.rb', line 7</span>
400
+
401
+ <span class='kw'>def</span> <span class='op'>~</span>
402
+ <span class='kw'>if</span> <span class='ivar'>@start</span> <span class='op'>&gt;=</span> <span class='int'>0</span>
403
+ <span class='ivar'>@array</span><span class='lbracket'>[</span><span class='ivar'>@start</span><span class='rbracket'>]</span>
404
+ <span class='kw'>end</span>
405
+ <span class='kw'>end</span></pre>
406
+ </td>
407
+ </tr>
408
+ </table>
409
+ </div>
410
+
411
+ </div>
412
+
413
+ </div>
414
+
415
+ <div id="footer">
416
+ Generated on Fri Nov 15 15:39:43 2013 by
417
+ <a href="http://yardoc.org" title="Yay! A Ruby Documentation Tool" target="_parent">yard</a>
418
+ 0.8.7.3 (ruby-2.0.0).
419
+ </div>
420
+
421
+ </body>
422
+ </html>
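Stripped of the YARD markup, the method listings in `doc/Cursor.html` assemble into the plain Ruby class below. The method bodies are taken from the highlighted snippets above. The comments and the usage lines at the end are added for illustration and are not part of the gem.

```ruby
class Cursor
  def initialize(array, start = 0)
    @array = array
    @start = start
  end

  # Dereference: the element under the cursor. The guard returns nil for a
  # negative position instead of letting Ruby index from the end of the array.
  def ~
    if @start >= 0
      @array[@start]
    end
  end

  # Deletes the element under the cursor from the shared array.
  def remove!
    @array.delete_at(@start)
  end

  # Both movements return a new Cursor over the same array.
  def next
    Cursor.new(@array, @start + 1)
  end

  def previous
    Cursor.new(@array, @start - 1)
  end
end

# Illustrative usage (not from the gem):
samples = [10, 20, 30]
first = Cursor.new(samples)
second = first.next
second_value = ~second            # the element at index 1
before_first = ~(first.previous)  # nil, because the position is -1
second.remove!                    # samples is now [10, 30]
```

Because every cursor shares the underlying array, `remove!` on one cursor changes what the other cursors over that array see.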