quantile 0.0.1

data/LICENSE ADDED
@@ -0,0 +1,191 @@
+ Apache License
+ Version 2.0, January 2004
+ http://www.apache.org/licenses/
+
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+ 1. Definitions.
+
+ "License" shall mean the terms and conditions for use, reproduction, and
+ distribution as defined by Sections 1 through 9 of this document.
+
+ "Licensor" shall mean the copyright owner or entity authorized by the copyright
+ owner that is granting the License.
+
+ "Legal Entity" shall mean the union of the acting entity and all other entities
+ that control, are controlled by, or are under common control with that entity.
+ For the purposes of this definition, "control" means (i) the power, direct or
+ indirect, to cause the direction or management of such entity, whether by
+ contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the
+ outstanding shares, or (iii) beneficial ownership of such entity.
+
+ "You" (or "Your") shall mean an individual or Legal Entity exercising
+ permissions granted by this License.
+
+ "Source" form shall mean the preferred form for making modifications, including
+ but not limited to software source code, documentation source, and configuration
+ files.
+
+ "Object" form shall mean any form resulting from mechanical transformation or
+ translation of a Source form, including but not limited to compiled object code,
+ generated documentation, and conversions to other media types.
+
+ "Work" shall mean the work of authorship, whether in Source or Object form, made
+ available under the License, as indicated by a copyright notice that is included
+ in or attached to the work (an example is provided in the Appendix below).
+
+ "Derivative Works" shall mean any work, whether in Source or Object form, that
+ is based on (or derived from) the Work and for which the editorial revisions,
+ annotations, elaborations, or other modifications represent, as a whole, an
+ original work of authorship. For the purposes of this License, Derivative Works
+ shall not include works that remain separable from, or merely link (or bind by
+ name) to the interfaces of, the Work and Derivative Works thereof.
+
+ "Contribution" shall mean any work of authorship, including the original version
+ of the Work and any modifications or additions to that Work or Derivative Works
+ thereof, that is intentionally submitted to Licensor for inclusion in the Work
+ by the copyright owner or by an individual or Legal Entity authorized to submit
+ on behalf of the copyright owner. For the purposes of this definition,
+ "submitted" means any form of electronic, verbal, or written communication sent
+ to the Licensor or its representatives, including but not limited to
+ communication on electronic mailing lists, source code control systems, and
+ issue tracking systems that are managed by, or on behalf of, the Licensor for
+ the purpose of discussing and improving the Work, but excluding communication
+ that is conspicuously marked or otherwise designated in writing by the copyright
+ owner as "Not a Contribution."
+
+ "Contributor" shall mean Licensor and any individual or Legal Entity on behalf
+ of whom a Contribution has been received by Licensor and subsequently
+ incorporated within the Work.
+
+ 2. Grant of Copyright License.
+
+ Subject to the terms and conditions of this License, each Contributor hereby
+ grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free,
+ irrevocable copyright license to reproduce, prepare Derivative Works of,
+ publicly display, publicly perform, sublicense, and distribute the Work and such
+ Derivative Works in Source or Object form.
+
+ 3. Grant of Patent License.
+
+ Subject to the terms and conditions of this License, each Contributor hereby
+ grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free,
+ irrevocable (except as stated in this section) patent license to make, have
+ made, use, offer to sell, sell, import, and otherwise transfer the Work, where
+ such license applies only to those patent claims licensable by such Contributor
+ that are necessarily infringed by their Contribution(s) alone or by combination
+ of their Contribution(s) with the Work to which such Contribution(s) was
+ submitted. If You institute patent litigation against any entity (including a
+ cross-claim or counterclaim in a lawsuit) alleging that the Work or a
+ Contribution incorporated within the Work constitutes direct or contributory
+ patent infringement, then any patent licenses granted to You under this License
+ for that Work shall terminate as of the date such litigation is filed.
+
+ 4. Redistribution.
+
+ You may reproduce and distribute copies of the Work or Derivative Works thereof
+ in any medium, with or without modifications, and in Source or Object form,
+ provided that You meet the following conditions:
+
+ You must give any other recipients of the Work or Derivative Works a copy of
+ this License; and
+ You must cause any modified files to carry prominent notices stating that You
+ changed the files; and
+ You must retain, in the Source form of any Derivative Works that You distribute,
+ all copyright, patent, trademark, and attribution notices from the Source form
+ of the Work, excluding those notices that do not pertain to any part of the
+ Derivative Works; and
+ If the Work includes a "NOTICE" text file as part of its distribution, then any
+ Derivative Works that You distribute must include a readable copy of the
+ attribution notices contained within such NOTICE file, excluding those notices
+ that do not pertain to any part of the Derivative Works, in at least one of the
+ following places: within a NOTICE text file distributed as part of the
+ Derivative Works; within the Source form or documentation, if provided along
+ with the Derivative Works; or, within a display generated by the Derivative
+ Works, if and wherever such third-party notices normally appear. The contents of
+ the NOTICE file are for informational purposes only and do not modify the
+ License. You may add Your own attribution notices within Derivative Works that
+ You distribute, alongside or as an addendum to the NOTICE text from the Work,
+ provided that such additional attribution notices cannot be construed as
+ modifying the License.
+ You may add Your own copyright statement to Your modifications and may provide
+ additional or different license terms and conditions for use, reproduction, or
+ distribution of Your modifications, or for any such Derivative Works as a whole,
+ provided Your use, reproduction, and distribution of the Work otherwise complies
+ with the conditions stated in this License.
+
+ 5. Submission of Contributions.
+
+ Unless You explicitly state otherwise, any Contribution intentionally submitted
+ for inclusion in the Work by You to the Licensor shall be under the terms and
+ conditions of this License, without any additional terms or conditions.
+ Notwithstanding the above, nothing herein shall supersede or modify the terms of
+ any separate license agreement you may have executed with Licensor regarding
+ such Contributions.
+
+ 6. Trademarks.
+
+ This License does not grant permission to use the trade names, trademarks,
+ service marks, or product names of the Licensor, except as required for
+ reasonable and customary use in describing the origin of the Work and
+ reproducing the content of the NOTICE file.
+
+ 7. Disclaimer of Warranty.
+
+ Unless required by applicable law or agreed to in writing, Licensor provides the
+ Work (and each Contributor provides its Contributions) on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied,
+ including, without limitation, any warranties or conditions of TITLE,
+ NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are
+ solely responsible for determining the appropriateness of using or
+ redistributing the Work and assume any risks associated with Your exercise of
+ permissions under this License.
+
+ 8. Limitation of Liability.
+
+ In no event and under no legal theory, whether in tort (including negligence),
+ contract, or otherwise, unless required by applicable law (such as deliberate
+ and grossly negligent acts) or agreed to in writing, shall any Contributor be
+ liable to You for damages, including any direct, indirect, special, incidental,
+ or consequential damages of any character arising as a result of this License or
+ out of the use or inability to use the Work (including but not limited to
+ damages for loss of goodwill, work stoppage, computer failure or malfunction, or
+ any and all other commercial damages or losses), even if such Contributor has
+ been advised of the possibility of such damages.
+
+ 9. Accepting Warranty or Additional Liability.
+
+ While redistributing the Work or Derivative Works thereof, You may choose to
+ offer, and charge a fee for, acceptance of support, warranty, indemnity, or
+ other liability obligations and/or rights consistent with this License. However,
+ in accepting such obligations, You may act only on Your own behalf and on Your
+ sole responsibility, not on behalf of any other Contributor, and only if You
+ agree to indemnify, defend, and hold each Contributor harmless for any liability
+ incurred by, or claims asserted against, such Contributor by reason of your
+ accepting any such warranty or additional liability.
+
+ END OF TERMS AND CONDITIONS
+
+ APPENDIX: How to apply the Apache License to your work
+
+ To apply the Apache License to your work, attach the following boilerplate
+ notice, with the fields enclosed by brackets "[]" replaced with your own
+ identifying information. (Don't include the brackets!) The text should be
+ enclosed in the appropriate comment syntax for the file format. We also
+ recommend that a file or class name and description of purpose be included on
+ the same "printed page" as the copyright notice for easier identification within
+ third-party archives.
+
+ Copyright [yyyy] [name of copyright owner]
+
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License.
@@ -0,0 +1,4 @@
+ ruby_quantile_estimation
+ ========================
+
+ Ruby Implementation of Graham Cormode and S. Muthukrishnan's Effective Computation of Biased Quantiles over Data Streams in ICDE’05
@@ -0,0 +1,17 @@
+ # Copyright 2013 Matt T. Proud
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ require_relative 'quantile/quantile'
+ require_relative 'quantile/estimator'
+ require_relative 'quantile/version'
+
@@ -0,0 +1,184 @@
+ # Copyright 2013 Matt T. Proud
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ module Quantile
+   #
+   # Estimate quantile values efficiently where both the rank and the inaccuracy
+   # allowance are known a priori. This is accomplished via Graham Cormode and
+   # S. Muthukrishnan's Effective Computation of Biased Quantiles over Data
+   # Streams in ICDE’05.
+   #
+   # @note {Estimator} is not concurrency safe.
+   #
+   # @see http://www.cs.rutgers.edu/~muthu/bquant.pdf Effective Computation of
+   #      Biased Quantiles over Data Streams
+   #
+   class Estimator
+     #
+     # Create a streaming quantile estimator.
+     #
+     # @param invariants [Quantile] The quantile estimation targets that are provided a priori.
+     # @return [Estimator] An initialized {Estimator} for the given targets.
+     #
+     def initialize(*invariants)
+       if invariants.empty?
+         invariants = [Quantile.new(0.5, 0.05), Quantile.new(0.90, 0.01), Quantile.new(0.99, 0.001)]
+       end
+
+       @invariants = invariants
+       @buffer = []
+       @head = nil
+
+       @observations = 0
+       @items = 0
+     end
+
+     #
+     # Observe a sample value with this {Estimator}.
+     #
+     # @param value [Numeric] The value to catalog for later analysis.
+     #
+     def observe(value)
+       @buffer << value
+       if @buffer.size == BUFFER_SIZE
+         flush
+       end
+     end
+
+     #
+     # Get a quantile value for a given rank.
+     #
+     # @param rank [Float] The target quantile to retrieve. It *must* be one of
+     #                     the invariants provided in the constructor.
+     #
+     # @return [Numeric] The quantile value for the rank.
+     #
+     def query(rank)
+       flush
+
+       current = @head
+       if current.nil?
+         return 0
+       end
+
+       mid_rank = (rank * @observations).floor
+       max_rank = mid_rank + (invariant(mid_rank, @observations) / 2).floor
+
+       rank = 0.0
+       while !current.successor.nil?
+         rank += current.rank
+         if rank + current.successor.rank + current.successor.delta > max_rank
+           return current.value
+         end
+
+         current = current.successor
+       end
+
+       return current.value
+     end
+
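+     private
+
+     # Merge buffered observations into the sample list, then compress it.
+     def flush
+       # Nothing to merge; this also keeps an empty estimator from
+       # recording a nil head sample when queried.
+       return if @buffer.empty?
+
+       @buffer.sort!
+       replace_batch
+       @buffer.clear
+       compress
+     end
+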
+     # Merge the sorted buffer into the sorted sample list.
+     def replace_batch
+       if @head.nil?
+         @head = record(@buffer.shift, 1, 0, nil)
+       end
+
+       rank = 0.0
+       current = @head
+
+       @buffer.each do |s|
+         # A value below the current minimum becomes the new head and is
+         # not inserted a second time further down the list.
+         if s < @head.value
+           @head = record(s, 1, 0, @head)
+           current = @head
+           next
+         end
+
+         while !current.successor.nil? && current.successor.value < s
+           rank += current.rank
+           current = current.successor
+         end
+
+         # Record each value exactly once: at the tail with no slack, or
+         # before the first larger sample with the allowance for its rank.
+         if current.successor.nil?
+           current.successor = record(s, 1, 0, nil)
+         else
+           current.successor = record(s, 1, invariant(rank, @observations) - 1, current.successor)
+         end
+       end
+     end
+
+     def record(value, rank, delta, successor)
+       @observations += 1
+       @items += 1
+
+       return Sample.new(value, rank, delta, successor)
+     end
+
+     def invariant(rank, n)
+       min = n + 1
+
+       @invariants.each do |i|
+         delta = i.delta(rank, n)
+         if delta < min
+           min = delta
+         end
+       end
+
+       return min.floor
+     end
+
+     def compress
+       rank = 0.0
+       current = @head
+
+       while !(current.nil? || current.successor.nil?)
+         if current.rank + current.successor.rank + current.successor.delta <= invariant(rank, @observations)
+           removed = current.successor
+
+           current.value = removed.value
+           current.rank += removed.rank
+           current.delta = removed.delta
+           current.successor = removed.successor
+         end
+
+         rank += current.rank
+         current = current.successor
+       end
+     end
+   end
+
+   private
+
+   BUFFER_SIZE = 512
+
+   class Sample
+     attr_accessor :value
+     attr_accessor :rank
+     attr_accessor :delta
+     attr_accessor :successor
+
+     def initialize(value, rank, delta, successor)
+       @value = value
+       @rank = rank
+       @delta = delta
+       @successor = successor
+     end
+   end
+ end
+
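The `replace_batch` method above merges a sorted buffer into a sorted, singly linked list of samples. A minimal standalone sketch of that merge pattern follows, leaving out the rank and delta bookkeeping. `Node`, `merge_sorted`, and `list_values` are hypothetical names used only for illustration and are not part of the gem.

```ruby
# Illustrative sketch only: Node, merge_sorted, and list_values are
# assumed names, not part of the quantile gem.
Node = Struct.new(:value, :successor)

# Insert every value from batch into the ascending list starting at head.
# Returns the (possibly new) head of the list.
def merge_sorted(head, batch)
  sorted = batch.sort
  return head if sorted.empty?

  # An empty list is seeded with the smallest value of the batch.
  head ||= Node.new(sorted.shift, nil)
  current = head

  sorted.each do |v|
    if v < head.value
      # Smaller than everything in the list: v becomes the new head.
      head = Node.new(v, head)
      current = head
      next
    end

    # Advance while the next node is still smaller than v, then splice
    # v in right after the node we stopped on.
    current = current.successor while current.successor && current.successor.value < v
    current.successor = Node.new(v, current.successor)
  end

  head
end

# Collect the list into an array for inspection.
def list_values(head)
  values = []
  node = head
  while node
    values << node.value
    node = node.successor
  end
  values
end

head = merge_sorted(nil, [5, 1, 9])
head = merge_sorted(head, [0, 7, 3])
p list_values(head) # => [0, 1, 3, 5, 7, 9]
```

Because the batch is sorted first, the cursor only ever moves forward, so one pass over the list is enough to place the whole batch.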
@@ -0,0 +1,54 @@
+ # Copyright 2013 Matt T. Proud
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ module Quantile
+   #
+   # A known quantile rank invariant for {Estimator}.
+   #
+   # @note {Quantile} is concurrency-safe.
+   #
+   class Quantile
+     attr_reader :quantile
+     attr_reader :inaccuracy
+
+     #
+     # Create a known quantile estimator invariant.
+     #
+     # @param quantile [Float] The target quantile value expressed along the
+     #                         interval [0, 1]. For instance, 0.5 would
+     #                         generate a precomputed median value and 0.99
+     #                         would provide the 99th percentile.
+     # @param inaccuracy [Float] The target error allowance expressed along the
+     #                           interval [0, 1]. For instance, 0.05 sets an
+     #                           error allowance of 5 percent and 0.001 of 0.1
+     #                           percent.
+     #
+     # @return [Quantile] an initialized {Quantile} for the given targets.
+     def initialize(quantile, inaccuracy)
+       @quantile = quantile
+       @inaccuracy = inaccuracy
+
+       @coefficient_i = (2.0 * inaccuracy) / (1.0 - quantile)
+       @coefficient_ii = 2.0 * inaccuracy / quantile
+     end
+
+     #
+     # Compute the error allowance for this target at a given rank.
+     #
+     # @param rank [Numeric] The rank under consideration.
+     # @param n [Integer] The number of observations seen so far.
+     #
+     # @return [Float] The allowed rank error.
+     #
+     def delta(rank, n)
+       if rank <= (@quantile * n).floor
+         return @coefficient_i * (n - rank)
+       end
+
+       return @coefficient_ii * rank
+     end
+   end
+ end
+
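The `delta` method above picks one of two linear allowances depending on which side of the target rank it is evaluated at, and `Estimator#invariant` takes the tightest allowance across all targets. A minimal standalone sketch of that arithmetic follows, assuming the same two coefficients. `allowed_error`, `tightest_allowance`, and `TARGETS` are hypothetical names for illustration and are not part of the gem.

```ruby
# Illustrative sketch only: allowed_error, tightest_allowance, and TARGETS
# are assumed names, not part of the quantile gem.

# Error allowance for one (quantile, inaccuracy) target at a given rank,
# with n observations seen so far. Mirrors the two branches of delta.
def allowed_error(quantile, inaccuracy, rank, n)
  if rank <= (quantile * n).floor
    (2.0 * inaccuracy) / (1.0 - quantile) * (n - rank)
  else
    (2.0 * inaccuracy) / quantile * rank
  end
end

# The default targets Estimator#initialize falls back to.
TARGETS = [[0.5, 0.05], [0.90, 0.01], [0.99, 0.001]].freeze

# Tightest allowance across all targets, capped at n + 1 and floored,
# following the shape of Estimator#invariant.
def tightest_allowance(rank, n, targets = TARGETS)
  allowances = targets.map { |q, e| allowed_error(q, e, rank, n) }
  [n + 1, *allowances].min.floor
end

n = 1000
[250, 750, 995].each do |rank|
  puts "rank=#{rank} median-target allowance=#{allowed_error(0.5, 0.05, rank, n)} " \
       "tightest=#{tightest_allowance(rank, n)}"
end
```

The allowance shrinks toward each target rank from both sides, which is why samples near a requested quantile are kept at finer granularity than samples far from it.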
@@ -0,0 +1,16 @@
+ # Copyright 2013 Matt T. Proud
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ # http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ module Quantile
+   VERSION = '0.0.1'
+ end
metadata ADDED
@@ -0,0 +1,84 @@
+ --- !ruby/object:Gem::Specification
+ name: quantile
+ version: !ruby/object:Gem::Version
+   version: 0.0.1
+   prerelease:
+ platform: ruby
+ authors:
+ - Matt T. Proud
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2013-07-22 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: rake
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :runtime
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: rspec
+   requirement: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: 2.0.0
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     none: false
+     requirements:
+     - - ! '>='
+       - !ruby/object:Gem::Version
+         version: 2.0.0
+ description: Graham Cormode and S. Muthukrishnan's Effective Computation of Biased
+   Quantiles over Data Streams in ICDE’05
+ email: matt.proud@gmail.com
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - lib/quantile.rb
+ - lib/quantile/quantile.rb
+ - lib/quantile/estimator.rb
+ - lib/quantile/version.rb
+ - README.md
+ - LICENSE
+ homepage: http://github.com/matttproud/ruby_quantile_estimation
+ licenses: []
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   none: false
+   requirements:
+   - - ! '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 1.8.24
+ signing_key:
+ specification_version: 3
+ summary: Streaming Quantile Estimation
+ test_files: []
+ has_rdoc: