quantile_estimator 0.0.2

Files changed (46)
  1. checksums.yaml +7 -0
  2. data/.ruby-gemset +1 -0
  3. data/.ruby-version +1 -0
  4. data/Gemfile +4 -0
  5. data/Gemfile.lock +17 -0
  6. data/LICENSE.txt +22 -0
  7. data/README.md +85 -0
  8. data/Rakefile +9 -0
  9. data/benchmark.rb +21 -0
  10. data/doc/Cursor.html +422 -0
  11. data/doc/Estimator.html +779 -0
  12. data/doc/Invariant.html +115 -0
  13. data/doc/Invariant/Biased.html +268 -0
  14. data/doc/Invariant/Invariant.html +193 -0
  15. data/doc/Invariant/SingleTarget.html +278 -0
  16. data/doc/Invariant/Targeted.html +278 -0
  17. data/doc/Item.html +620 -0
  18. data/doc/Quantile.html +270 -0
  19. data/doc/QuantileEstimator.html +117 -0
  20. data/doc/_index.html +211 -0
  21. data/doc/class_list.html +54 -0
  22. data/doc/compression.png +0 -0
  23. data/doc/css/common.css +1 -0
  24. data/doc/css/full_list.css +57 -0
  25. data/doc/css/style.css +338 -0
  26. data/doc/file.README.html +186 -0
  27. data/doc/file_list.html +56 -0
  28. data/doc/frames.html +26 -0
  29. data/doc/index.html +186 -0
  30. data/doc/js/app.js +219 -0
  31. data/doc/js/full_list.js +178 -0
  32. data/doc/js/jquery.js +4 -0
  33. data/doc/method_list.html +221 -0
  34. data/doc/time.png +0 -0
  35. data/doc/top-level-namespace.html +114 -0
  36. data/lib/estimator.rb +120 -0
  37. data/lib/quantile_estimator/cursor.rb +24 -0
  38. data/lib/quantile_estimator/invariant.rb +47 -0
  39. data/lib/quantile_estimator/item.rb +21 -0
  40. data/lib/quantile_estimator/quantile.rb +3 -0
  41. data/lib/quantile_estimator/test.rb +37 -0
  42. data/lib/quantile_estimator/version.rb +3 -0
  43. data/pkg/quantile_estimator-0.0.1.gem +0 -0
  44. data/quantile_estimator.gemspec +29 -0
  45. data/test/test_quantile_estimator.rb +85 -0
  46. metadata +120 -0
checksums.yaml ADDED
@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: d96954e453f9b1c4dab4c71a9a26e00b8f26d138
4
+ data.tar.gz: 6da4c7c657a29ea5993d535586b3e4ab54bf8b4b
5
+ SHA512:
6
+ metadata.gz: e9b1452ae828b888742a75eb601bd049969d01501cd17410a55bd14463c8fc23456ffc4a21a37aa78f1acac41e95150697c18dd81d07459d8219baa7f1ea444d
7
+ data.tar.gz: 7885e33a2b2792f39ef81dc87a1fb4f90ca8934f401efa408c701f4596b52e37574b444f90c08f64f65afd9b8a2ba3302c1f474214977a4701ebb42ecab95603
data/.ruby-gemset ADDED
@@ -0,0 +1 @@
1
+ quantile_estimator
data/.ruby-version ADDED
@@ -0,0 +1 @@
1
+ ruby-2.0.0-p0
data/Gemfile ADDED
@@ -0,0 +1,4 @@
1
+ source 'https://rubygems.org'
2
+
3
+ # Specify your gem's dependencies in quantile_estimator.gemspec
4
+ gemspec
data/Gemfile.lock ADDED
@@ -0,0 +1,17 @@
1
+ PATH
2
+ remote: .
3
+ specs:
4
+ quantile_estimator (0.0.1)
5
+
6
+ GEM
7
+ remote: https://rubygems.org/
8
+ specs:
9
+ rake (10.0.4)
10
+
11
+ PLATFORMS
12
+ ruby
13
+
14
+ DEPENDENCIES
15
+ bundler (~> 1.3)
16
+ quantile_estimator!
17
+ rake
data/LICENSE.txt ADDED
@@ -0,0 +1,22 @@
1
+ Copyright (c) 2013 Diego Echeverri
2
+
3
+ MIT License
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining
6
+ a copy of this software and associated documentation files (the
7
+ "Software"), to deal in the Software without restriction, including
8
+ without limitation the rights to use, copy, modify, merge, publish,
9
+ distribute, sublicense, and/or sell copies of the Software, and to
10
+ permit persons to whom the Software is furnished to do so, subject to
11
+ the following conditions:
12
+
13
+ The above copyright notice and this permission notice shall be
14
+ included in all copies or substantial portions of the Software.
15
+
16
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
17
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
18
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
19
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
20
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
21
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
22
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md ADDED
@@ -0,0 +1,85 @@
1
+ # QuantileEstimator.rb
2
+
3
+ This implements the quantile estimator described in the paper:
4
+
5
+ Cormode et al.:
6
+ "Effective Computation of Biased Quantiles over Data Streams"
7
+
8
+ Of the different strategies for implementing the algorithm, I used the simplest one,
9
+ taking https://github.com/odo/quantile_estimator as the reference implementation.
10
+
11
+
12
+ ## Installation
13
+
14
+ Add this line to your application's Gemfile:
15
+
16
+ gem 'quantile_estimator'
17
+
18
+ And then execute:
19
+
20
+ $ bundle
21
+
22
+ Or install it yourself as:
23
+
24
+ $ gem install quantile_estimator
25
+
26
+ ## Usage
27
+
28
+ First initialize the estimator. Right now you can use either a `Biased` or a
29
+ `Targeted` invariant. The targeted invariant lets you select the quantiles you are
30
+ particularly interested in and will yield higher compression rates.
31
+
32
+ biased = Invariant::Biased.new(0.004)
33
+ targeted = Invariant::Targeted.new([
34
+ [0.05, 0.02],
35
+ [0.5, 0.02],
36
+ [0.95, 0.02]
37
+ ])
38
+
39
+ estimator = Estimator.new(targeted)
40
+
41
+ Insertion of data is as simple as:
42
+
43
+ estimator.insert(value)
44
+
45
+ The insertion of data _doesn't_ automatically compress it. To compress the data just
46
+ call:
47
+
48
+ estimator.compress!
49
+
50
+ Using these primitives you can build wrappers that compress on every nth insert, as shown
51
+ in
52
+ [this file](https://github.com/diegoeche/quantile_estimator.rb/blob/master/benchmark.rb).
53
+
54
+ ## Pretty Graphs
55
+
56
+ Using a targeted invariant to keep track of the 0.05, 0.5 and 0.95 quantiles, a
57
+ uniform source of random values, and compression every 100 iterations, we get the
58
+ following behavior regarding the size of the internal data structure:
59
+
60
+ ![compression rate (elements size, lower is better)](https://raw.github.com/diegoeche/quantile_estimator.rb/master/doc/compression.png
61
+ "Elements in the estimator. Lower is better")
62
+
63
+ Running time behavior is not too bad. The following graph shows the cost of
64
+ insertions in the estimator. The homogeneous layer of outliers probably corresponds
65
+ to the compression cycles, while the bottom line is the cost of compression-less
66
+ insertions.
67
+
68
+ ![runtime behavior (ms, lower is better)](https://raw.github.com/diegoeche/quantile_estimator.rb/master/doc/time.png "Time in ms. Lower is better")
69
+
70
+ Different distributions and different invariant setups will behave differently.
71
+
72
+ Check your real data before using this!
73
+
74
+ ## Known issues
75
+
76
+ The implementation is known not to be thread-safe, and little effort has been made to
77
+ optimize it.
78
+
79
+ ## Contributing
80
+
81
+ 1. Fork it
82
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
83
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
84
+ 4. Push to the branch (`git push origin my-new-feature`)
85
+ 5. Create new Pull Request
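The README's suggestion of a wrapper that compresses on every nth insert could look roughly like the sketch below. `PeriodicCompressor` and `FakeEstimator` are hypothetical names that are not part of the gem; the wrapper only assumes the wrapped object responds to `insert(value)` and `compress!`, as the README describes for `Estimator`. The stand-in estimator exists solely so the sketch runs without the gem installed.

```ruby
# Hypothetical wrapper (not part of the gem): calls compress! on the
# wrapped estimator after every `every` inserts.
class PeriodicCompressor
  def initialize(estimator, every = 100)
    raise ArgumentError, "every must be positive" unless every.positive?
    @estimator = estimator
    @every = every
    @count = 0
  end

  # Forwards the value, then compresses when the insert count hits a multiple of `every`.
  def insert(value)
    @estimator.insert(value)
    @count += 1
    @estimator.compress! if (@count % @every).zero?
    self
  end
end

# Stand-in for Estimator (assumption): records calls instead of estimating quantiles.
class FakeEstimator
  attr_reader :inserted, :compressions

  def initialize
    @inserted = []
    @compressions = 0
  end

  def insert(value)
    @inserted << value
  end

  def compress!
    @compressions += 1
  end
end

fake = FakeEstimator.new
wrapper = PeriodicCompressor.new(fake, 100)
250.times { |i| wrapper.insert(i) }
puts fake.compressions
```

With the real gem, the same wrapper would presumably be constructed as `PeriodicCompressor.new(Estimator.new(targeted), 100)`, assuming `Estimator` behaves as the README states.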
data/Rakefile ADDED
@@ -0,0 +1,9 @@
1
+ require 'bundler/gem_tasks'
2
+ require 'rake/testtask'
3
+
4
+ Rake::TestTask.new do |t|
5
+ t.libs << 'test'
6
+ end
7
+
8
+ desc "Run tests"
9
+ task :default => :test
data/benchmark.rb ADDED
@@ -0,0 +1,21 @@
1
+ require 'estimator'
2
+
3
+ biased = Invariant::Biased.new(0.004)
4
+ targeted = Invariant::Targeted.new([
5
+ [0.05, 0.02],
6
+ [0.5, 0.02],
7
+ [0.95, 0.02]
8
+ ])
9
+
10
+ # estimator = Estimator.new(invariant)
11
+
12
+ estimator = Estimator.new(targeted)
13
+
14
+ 10000.times do |i|
15
+ nowish = Time.now.to_f
16
+ estimator.insert(rand)
17
+ if i % 100 == 99
18
+ estimator.compress!
19
+ end
20
+ puts [i, 1000 * (Time.now.to_f - nowish), estimator.samples.size].join("\t")
21
+ end
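The benchmark above times each insert by subtracting two `Time.now.to_f` readings, which follow the wall clock and can be skewed by system clock adjustments. A minimal alternative sketch, assuming only Ruby's standard `Process.clock_gettime`, uses the monotonic clock instead. The helper name `elapsed_ms` and the array-append workload are illustrative assumptions, not part of the gem.

```ruby
# Hypothetical helper: returns the elapsed time of the block in milliseconds,
# measured with the monotonic clock rather than Time.now.
def elapsed_ms
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000.0
end

# Stand-in workload (assumption): appending to an array instead of estimator.insert.
samples = []
ms = elapsed_ms { 1_000.times { |i| samples << i } }
puts format("%.3f ms for %d inserts", ms, samples.size)
```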
data/doc/Cursor.html ADDED
@@ -0,0 +1,422 @@
1
+ <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
2
+ "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
3
+ <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
4
+ <head>
5
+ <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
6
+ <title>
7
+ Class: Cursor
8
+
9
+ &mdash; Documentation by YARD 0.8.7.3
10
+
11
+ </title>
12
+
13
+ <link rel="stylesheet" href="css/style.css" type="text/css" charset="utf-8" />
14
+
15
+ <link rel="stylesheet" href="css/common.css" type="text/css" charset="utf-8" />
16
+
17
+ <script type="text/javascript" charset="utf-8">
18
+ hasFrames = window.top.frames.main ? true : false;
19
+ relpath = '';
20
+ framesUrl = "frames.html#!" + escape(window.location.href);
21
+ </script>
22
+
23
+
24
+ <script type="text/javascript" charset="utf-8" src="js/jquery.js"></script>
25
+
26
+ <script type="text/javascript" charset="utf-8" src="js/app.js"></script>
27
+
28
+
29
+ </head>
30
+ <body>
31
+ <div id="header">
32
+ <div id="menu">
33
+
34
+ <a href="_index.html">Index (C)</a> &raquo;
35
+
36
+
37
+ <span class="title">Cursor</span>
38
+
39
+
40
+ <div class="noframes"><span class="title">(</span><a href="." target="_top">no frames</a><span class="title">)</span></div>
41
+ </div>
42
+
43
+ <div id="search">
44
+
45
+ <a class="full_list_link" id="class_list_link"
46
+ href="class_list.html">
47
+ Class List
48
+ </a>
49
+
50
+ <a class="full_list_link" id="method_list_link"
51
+ href="method_list.html">
52
+ Method List
53
+ </a>
54
+
55
+ <a class="full_list_link" id="file_list_link"
56
+ href="file_list.html">
57
+ File List
58
+ </a>
59
+
60
+ </div>
61
+ <div class="clear"></div>
62
+ </div>
63
+
64
+ <iframe id="search_frame"></iframe>
65
+
66
+ <div id="content"><h1>Class: Cursor
67
+
68
+
69
+
70
+ </h1>
71
+
72
+ <dl class="box">
73
+
74
+ <dt class="r1">Inherits:</dt>
75
+ <dd class="r1">
76
+ <span class="inheritName">Object</span>
77
+
78
+ <ul class="fullTree">
79
+ <li>Object</li>
80
+
81
+ <li class="next">Cursor</li>
82
+
83
+ </ul>
84
+ <a href="#" class="inheritanceTree">show all</a>
85
+
86
+ </dd>
87
+
88
+
89
+
90
+
91
+
92
+
93
+
94
+
95
+
96
+ <dt class="r2 last">Defined in:</dt>
97
+ <dd class="r2 last">lib/quantile_estimator/cursor.rb</dd>
98
+
99
+ </dl>
100
+ <div class="clear"></div>
101
+
102
+
103
+
104
+
105
+
106
+
107
+
108
+
109
+
110
+ <h2>
111
+ Instance Method Summary
112
+ <small>(<a href="#" class="summary_toggle">collapse</a>)</small>
113
+ </h2>
114
+
115
+ <ul class="summary">
116
+
117
+ <li class="public ">
118
+ <span class="summary_signature">
119
+
120
+ <a href="#initialize-instance_method" title="#initialize (instance method)">- (Cursor) <strong>initialize</strong>(array, start = 0) </a>
121
+
122
+
123
+
124
+ </span>
125
+
126
+
127
+ <span class="note title constructor">constructor</span>
128
+
129
+
130
+
131
+
132
+
133
+
134
+
135
+
136
+ <span class="summary_desc"><div class='inline'>
137
+ <p>A new instance of Cursor.</p>
138
+ </div></span>
139
+
140
+ </li>
141
+
142
+
143
+ <li class="public ">
144
+ <span class="summary_signature">
145
+
146
+ <a href="#next-instance_method" title="#next (instance method)">- (Object) <strong>next</strong> </a>
147
+
148
+
149
+
150
+ </span>
151
+
152
+
153
+
154
+
155
+
156
+
157
+
158
+
159
+
160
+ <span class="summary_desc"><div class='inline'></div></span>
161
+
162
+ </li>
163
+
164
+
165
+ <li class="public ">
166
+ <span class="summary_signature">
167
+
168
+ <a href="#previous-instance_method" title="#previous (instance method)">- (Object) <strong>previous</strong> </a>
169
+
170
+
171
+
172
+ </span>
173
+
174
+
175
+
176
+
177
+
178
+
179
+
180
+
181
+
182
+ <span class="summary_desc"><div class='inline'></div></span>
183
+
184
+ </li>
185
+
186
+
187
+ <li class="public ">
188
+ <span class="summary_signature">
189
+
190
+ <a href="#remove%21-instance_method" title="#remove! (instance method)">- (Object) <strong>remove!</strong> </a>
191
+
192
+
193
+
194
+ </span>
195
+
196
+
197
+
198
+
199
+
200
+
201
+
202
+
203
+
204
+ <span class="summary_desc"><div class='inline'></div></span>
205
+
206
+ </li>
207
+
208
+
209
+ <li class="public ">
210
+ <span class="summary_signature">
211
+
212
+ <a href="#%7E-instance_method" title="#~ (instance method)">- (Object) <strong>~</strong> </a>
213
+
214
+
215
+
216
+ </span>
217
+
218
+
219
+
220
+
221
+
222
+
223
+
224
+
225
+
226
+ <span class="summary_desc"><div class='inline'></div></span>
227
+
228
+ </li>
229
+
230
+
231
+ </ul>
232
+
233
+
234
+ <div id="constructor_details" class="method_details_list">
235
+ <h2>Constructor Details</h2>
236
+
237
+ <div class="method_details first">
238
+ <h3 class="signature first" id="initialize-instance_method">
239
+
240
+ - (<tt><span class='object_link'><a href="" title="Cursor (class)">Cursor</a></span></tt>) <strong>initialize</strong>(array, start = 0)
241
+
242
+
243
+
244
+
245
+
246
+ </h3><div class="docstring">
247
+ <div class="discussion">
248
+
249
+ <p>Returns a new instance of Cursor</p>
250
+
251
+
252
+ </div>
253
+ </div>
254
+ <div class="tags">
255
+
256
+
257
+ </div><table class="source_code">
258
+ <tr>
259
+ <td>
260
+ <pre class="lines">
261
+
262
+
263
+ 2
264
+ 3
265
+ 4
266
+ 5</pre>
267
+ </td>
268
+ <td>
269
+ <pre class="code"><span class="info file"># File 'lib/quantile_estimator/cursor.rb', line 2</span>
270
+
271
+ <span class='kw'>def</span> <span class='id identifier rubyid_initialize'>initialize</span><span class='lparen'>(</span><span class='id identifier rubyid_array'>array</span><span class='comma'>,</span> <span class='id identifier rubyid_start'>start</span><span class='op'>=</span><span class='int'>0</span><span class='rparen'>)</span>
272
+ <span class='ivar'>@array</span> <span class='op'>=</span> <span class='id identifier rubyid_array'>array</span>
273
+ <span class='ivar'>@start</span> <span class='op'>=</span> <span class='id identifier rubyid_start'>start</span>
274
+ <span class='kw'>end</span></pre>
275
+ </td>
276
+ </tr>
277
+ </table>
278
+ </div>
279
+
280
+ </div>
281
+
282
+
283
+ <div id="instance_method_details" class="method_details_list">
284
+ <h2>Instance Method Details</h2>
285
+
286
+
287
+ <div class="method_details first">
288
+ <h3 class="signature first" id="next-instance_method">
289
+
290
+ - (<tt>Object</tt>) <strong>next</strong>
291
+
292
+
293
+
294
+
295
+
296
+ </h3><table class="source_code">
297
+ <tr>
298
+ <td>
299
+ <pre class="lines">
300
+
301
+
302
+ 17
303
+ 18
304
+ 19</pre>
305
+ </td>
306
+ <td>
307
+ <pre class="code"><span class="info file"># File 'lib/quantile_estimator/cursor.rb', line 17</span>
308
+
309
+ <span class='kw'>def</span> <span class='kw'>next</span>
310
+ <span class='const'>Cursor</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='ivar'>@array</span><span class='comma'>,</span> <span class='ivar'>@start</span> <span class='op'>+</span> <span class='int'>1</span><span class='rparen'>)</span>
311
+ <span class='kw'>end</span></pre>
312
+ </td>
313
+ </tr>
314
+ </table>
315
+ </div>
316
+
317
+ <div class="method_details ">
318
+ <h3 class="signature " id="previous-instance_method">
319
+
320
+ - (<tt>Object</tt>) <strong>previous</strong>
321
+
322
+
323
+
324
+
325
+
326
+ </h3><table class="source_code">
327
+ <tr>
328
+ <td>
329
+ <pre class="lines">
330
+
331
+
332
+ 21
333
+ 22
334
+ 23</pre>
335
+ </td>
336
+ <td>
337
+ <pre class="code"><span class="info file"># File 'lib/quantile_estimator/cursor.rb', line 21</span>
338
+
339
+ <span class='kw'>def</span> <span class='id identifier rubyid_previous'>previous</span>
340
+ <span class='const'>Cursor</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='ivar'>@array</span><span class='comma'>,</span> <span class='ivar'>@start</span> <span class='op'>-</span> <span class='int'>1</span><span class='rparen'>)</span>
341
+ <span class='kw'>end</span></pre>
342
+ </td>
343
+ </tr>
344
+ </table>
345
+ </div>
346
+
347
+ <div class="method_details ">
348
+ <h3 class="signature " id="remove!-instance_method">
349
+
350
+ - (<tt>Object</tt>) <strong>remove!</strong>
351
+
352
+
353
+
354
+
355
+
356
+ </h3><table class="source_code">
357
+ <tr>
358
+ <td>
359
+ <pre class="lines">
360
+
361
+
362
+ 13
363
+ 14
364
+ 15</pre>
365
+ </td>
366
+ <td>
367
+ <pre class="code"><span class="info file"># File 'lib/quantile_estimator/cursor.rb', line 13</span>
368
+
369
+ <span class='kw'>def</span> <span class='id identifier rubyid_remove!'>remove!</span>
370
+ <span class='ivar'>@array</span><span class='period'>.</span><span class='id identifier rubyid_delete_at'>delete_at</span><span class='lparen'>(</span><span class='ivar'>@start</span><span class='rparen'>)</span>
371
+ <span class='kw'>end</span></pre>
372
+ </td>
373
+ </tr>
374
+ </table>
375
+ </div>
376
+
377
+ <div class="method_details ">
378
+ <h3 class="signature " id="~-instance_method">
379
+
380
+ - (<tt>Object</tt>) <strong>~</strong>
381
+
382
+
383
+
384
+
385
+
386
+ </h3><table class="source_code">
387
+ <tr>
388
+ <td>
389
+ <pre class="lines">
390
+
391
+
392
+ 7
393
+ 8
394
+ 9
395
+ 10
396
+ 11</pre>
397
+ </td>
398
+ <td>
399
+ <pre class="code"><span class="info file"># File 'lib/quantile_estimator/cursor.rb', line 7</span>
400
+
401
+ <span class='kw'>def</span> <span class='op'>~</span>
402
+ <span class='kw'>if</span> <span class='ivar'>@start</span> <span class='op'>&gt;=</span> <span class='int'>0</span>
403
+ <span class='ivar'>@array</span><span class='lbracket'>[</span><span class='ivar'>@start</span><span class='rbracket'>]</span>
404
+ <span class='kw'>end</span>
405
+ <span class='kw'>end</span></pre>
406
+ </td>
407
+ </tr>
408
+ </table>
409
+ </div>
410
+
411
+ </div>
412
+
413
+ </div>
414
+
415
+ <div id="footer">
416
+ Generated on Fri Nov 15 15:39:43 2013 by
417
+ <a href="http://yardoc.org" title="Yay! A Ruby Documentation Tool" target="_parent">yard</a>
418
+ 0.8.7.3 (ruby-2.0.0).
419
+ </div>
420
+
421
+ </body>
422
+ </html>
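To make the `Cursor` behavior documented above easier to follow, the sketch below copies the class as it appears in the rendered source listings and exercises it on a small array. The sample array and variable names are illustrative assumptions. Two details are worth noticing: `~` returns `nil` for a negative position instead of wrapping around to the end of the array, and `remove!` mutates the array shared by every cursor created from it.

```ruby
# Cursor as shown in the documentation listings above.
class Cursor
  def initialize(array, start = 0)
    @array = array
    @start = start
  end

  # Dereference: the element at the cursor, or nil when the position is negative.
  def ~
    if @start >= 0
      @array[@start]
    end
  end

  # Deletes the element at the cursor from the underlying array.
  def remove!
    @array.delete_at(@start)
  end

  def next
    Cursor.new(@array, @start + 1)
  end

  def previous
    Cursor.new(@array, @start - 1)
  end
end

items = [10, 20, 30]
cursor = Cursor.new(items)

first = ~cursor                    # element at position 0
second = ~(cursor.next)            # element at position 1
before_start = ~(cursor.previous)  # position -1, so nil rather than items[-1]

cursor.next.remove!                # deletes the element at position 1 in place
p first, second, before_start, items
```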