frequent-algorithm 0.0.3 → 0.0.4

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
1
1
  ---
2
2
  SHA1:
3
- metadata.gz: c581bac868de994e83f9fee63f34d8987ab2eda9
4
- data.tar.gz: a9c068fa92383857a5d0526c9aa55ce4746651d5
3
+ metadata.gz: 0ab3af8c9299159321aa18f30c5022ff7e8b3042
4
+ data.tar.gz: 01250f9d283874ae66fb0660415ac74476724b43
5
5
  SHA512:
6
- metadata.gz: 42571a866be7b6ee00748a10afb5ee6d48d8025198a68877a18401bedc0e372da09b1af3a7ef69ac94c78ed2fb798e52899b30eaddea2c423c0982413a2f9f3e
7
- data.tar.gz: 0fd80b41cbc31fec785ea52596ab0e9094e54172d8a95e419f80339cdbcd7e9b275dd4a7a0826380ff43a3f81e8f6519d87fe02773ef4319c4a264211cee6f95
6
+ metadata.gz: c38e275789110b1abac68a91f60f1a952aac05ddba011d22e213c38e33132b6261d92fc05b994b3310422c4081e6b6e1018c1854cf4f9562c3aea9e40d680a12
7
+ data.tar.gz: 8a84db9ba16c809a7700973cd62738f8cbd4b19461bff56f6c6f9192000e2e0cc4affc136576682cdcf29fb18a37b0653197404773426ab40629e0a1c493ced3
data/.yardopts CHANGED
@@ -1,7 +1,7 @@
1
- --no-private
2
- --readme README.md
3
- --markup markdown
4
- --markup-provider rdiscount
5
- -
6
- LICENSE
7
- CHANGELOG
1
+ --no-private
2
+ --readme README.md
3
+ --markup markdown
4
+ --markup-provider rdiscount
5
+ -
6
+ LICENSE
7
+ CHANGELOG
data/CHANGELOG CHANGED
@@ -1,9 +1,23 @@
1
- ## CHANGELOG
2
-
3
- - __2015/03/19 0.0.2 release.
4
- - First-stage implementation.
5
- - API documentation added.
6
- - Fleshing out unit tests.
7
-
8
- - __2015/03/11__: 0.0.1 release.
9
- - Initial release.
1
+ ## CHANGELOG
2
+
3
+ - __2015/05/06__ 0.0.4 release.
4
+ - Issue 17: Enhance Algorithm: new method to return top-k statistics
5
+ - Issue 42: Enhance algorithm: Accept one element as parameter in the process method
6
+ - Issue 45: FIX markdown in CHANGELOG
7
+ - Issue 53: Get this project to run/test against Rubinius
8
+ - Initial work to make the code thread-safe
9
+
10
+ - __2015/03/23__ 0.0.3 release.
11
+ - Further refinements to process.
12
+ - Enhanced strategy for calculating kth largest element in list.
13
+ - Resolved Issue 24: Refactor - Consistent internal data structure.
14
+ - Wontfix Issue 28: Add new test cases using String as input.
15
+ - On-going refinements for unit tests.
16
+
17
+ - __2015/03/19__ 0.0.2 release.
18
+ - First-stage implementation.
19
+ - API documentation added.
20
+ - Fleshing out unit tests.
21
+
22
+ - __2015/03/11__: 0.0.1 release.
23
+ - Initial release.
data/LICENSE CHANGED
@@ -1,22 +1,22 @@
1
- The MIT License (MIT)
2
-
3
- Copyright (c) 2015 Willie Tong, Brooke M. Fujita
4
-
5
- Permission is hereby granted, free of charge, to any person obtaining a copy
6
- of this software and associated documentation files (the "Software"), to deal
7
- in the Software without restriction, including without limitation the rights
8
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
- copies of the Software, and to permit persons to whom the Software is
10
- furnished to do so, subject to the following conditions:
11
-
12
- The above copyright notice and this permission notice shall be included in all
13
- copies or substantial portions of the Software.
14
-
15
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
- SOFTWARE.
22
-
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2015 Willie Tong, Brooke M. Fujita
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
22
+
data/README.md CHANGED
@@ -1,149 +1,159 @@
1
- # frequent-algorithm [![Gem Version](https://badge.fury.io/rb/frequent-algorithm.svg)](http://badge.fury.io/rb/frequent-algorithm) [![Build Status](https://travis-ci.org/buruzaemon/frequent-algorithm.svg)](https://travis-ci.org/buruzaemon/frequent-algorithm)
2
-
3
- Web site usage, social network behavior and Internet traffic are examples
4
- of systems that appear to follow the [Power law](http://en.wikipedia.org/wiki/Power_law),
5
- where most of the events are due to the actions of a very small few.
6
- Knowing at any given point in time which items are trending is valuable
7
- in understanding the system.
8
-
9
- `frequent-algorithm` is a Ruby implementation of the FREQUENT algorithm
10
- for identifying frequent items in a data stream in sliding windows.
11
- Please refer to [Identifying Frequent Items in Sliding Windows over On-Line
12
- Packet Streams](http://erikdemaine.org/papers/SlidingWindow_IMC2003/), by
13
- Golab, DeHaan, Demaine, López-Ortiz and Munro (2003).
14
-
15
- ## Introduction
16
-
17
- ### Challenges
18
-
19
- Challenges for Real-time processing of data streams for _frequent item queries_
20
- include:
21
-
22
- * data may be of unknown and possibly unbound length
23
- * data may be arriving a very fast rate
24
- * it might not be possible to go back and re-read the data
25
- * too large a window of observation may include stale data
26
-
27
- Therefore, a solution should have the following characteristics:
28
-
29
- * uses limited memory
30
- * can process events in the stream in Ο(1) constant time
31
- * requires only a single-pass over the data
32
-
33
-
34
- ### The algorithm
35
-
36
- > LOOP<br/>
37
- > 1. For each element e in the next b elements:<br/>
38
- > &nbsp;&nbsp;&nbsp;&nbsp;If a local counter exists for the type of element e:<br/>
39
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Increment the local counter.<br/>
40
- > &nbsp;&nbsp;&nbsp;&nbsp;Otherwise:<br/>
41
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Create a new local counter for this element type<br/>
42
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;and set it equal to 1.<br/>
43
- > 2. Add a summary S containing identities and counts of the k most frequent items to the back of queue Q.<br/>
44
- > 3. Delete all local counters<br/>
45
- > 4. For each type named in S:<br/>
46
- > &nbsp;&nbsp;&nbsp;&nbsp;If a global counter exists for this type:<br/>
47
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Add to it the count recorded in S.<br/>
48
- > &nbsp;&nbsp;&nbsp;&nbsp;Otherwise:<br/>
49
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Create a new global counter for this element type<br/>
50
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;and set it equal to the count recorded in S.<br/>
51
- > 5. Add the count of the kth largest type in S to δ.<br/>
52
- > 6. If sizeOf(Q) > N/b:<br/>
53
- > &nbsp;&nbsp;&nbsp;&nbsp;(a) Remove the summary S' from the front of Q and subtract the count of the kth largest type in S' from δ.<br/>
54
- > &nbsp;&nbsp;&nbsp;&nbsp;(b) For all element types named in S':<br/>
55
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Subtract from their global counters the counts<br/>
56
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;recorded in S'<br/>
57
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;If a counter is decremented to zero:<br/>
58
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Delete it.<br/>
59
- > &nbsp;&nbsp;&nbsp;&nbsp;(c) Output the identity and value of each global counter > δ.
60
- >
61
- > &mdash; <cite>Golab, DeHaan, Demaine, López-Ortiz and Munro. Identifying Frequent Items in Sliding Windows over On-Line Packet Streams, 2003</cite>
62
-
63
-
64
- ## Usage
65
-
66
- require 'frequent-algorithm'
67
-
68
- # data is pi to 1000 digits
69
- pi = File.read('test/frequent/test_data_pi').strip
70
- data = pi.scan(/./).each_slice(b)
71
-
72
- N = 100 # size of main window
73
- b = 20 # size of basic window
74
- k = 3 # we are interested in top-3 numerals in pi
75
-
76
- alg = Frequent::Algorithm.new(N, b, k)
77
-
78
- # read in and process the 1st basic window
79
- alg.process(data.next)
80
-
81
- # and the top-3 numerals are?
82
- top3 = alg.statistics.report
83
- puts top3
84
-
85
- # lather, rinse and repeat
86
- alg.process(data.next)
87
-
88
-
89
- ## Development
90
-
91
- The development of this gem requires the following:
92
-
93
- * [Ruby 1.9.3 or greater](https://www.ruby-lang.org/en/)
94
- * [rubygems](https://rubygems.org/pages/download)
95
- * [`bundler`](https://github.com/bundler/bundler)
96
- * [`rake`](https://github.com/ruby/rake)
97
- * [`minitest`](https://rubygems.org/gems/minitest) (unit testing)
98
- * [`yard`](https://rubygems.org/gems/yard) (documentation)
99
- * [`rdiscount`](https://rubygems.org/gems/rdiscount) (Markdown)
100
-
101
- Building, testing and release of this rubygem uses the following
102
- `rake` commands:
103
-
104
-
105
- rake clean # Remove any temporary products
106
- rake clobber # Remove any generated file
107
- rake test # Execute unit tests
108
- rake build # Build frequent-algorithm-n.n.n.gem into the pkg directory
109
- rake install # Build and install frequent-algorithm-n.n.n.gem into system gems
110
- rake release # Create tag vn.n.n and build and push
111
- # frequent-algorithm-n.n.n.gem to Rubygems
112
-
113
-
114
- ### Documentation
115
-
116
- `frequent-algorithm` uses [`yard`](https://rubygems.org/gems/yard) and
117
- [`rdiscount`](https://rubygems.org/gems/rdiscount) for Markdown documentation.
118
- Check out [Getting Started with
119
- Yard](http://www.rubydoc.info/gems/yard/file/docs/GettingStarted.md).
120
-
121
- ### Unit Testing
122
-
123
- `frequent-algorithm` uses
124
- [`MiniTest::Unit`](https://github.com/seattlerb/minitest) for
125
- unit testing.
126
-
127
- ### Releasing
128
-
129
- Please refer to Publishing To Rubygems.org in the
130
- [Rubygems Guide](http://guides.rubygems.org/make-your-own-gem/).
131
-
132
- ### Contributing
133
-
134
- 1. Fork it
135
- 2. Begin work on `dev-branch` (`git fetch && git checkout dev-branch`)
136
- 3. Create your feature branch (`git branch my-new-feature && git checkout
137
- my-new-feature`)
138
- 4. Commit your changes (`git commit -am 'Add some feature'`)
139
- 5. Push to the branch (`git push origin my-new-feature:dev-branch`)
140
- 6. Create new Pull Request
141
-
142
- You may wish to read the [Git book online](http://git-scm.com/book/en/v2).
143
-
144
-
145
- ## License
146
-
147
- frequent-algorithm is provided under the terms of the MIT license.
148
-
149
- Copyright &copy; 2015, Willie Tong &amp; Brooke M. Fujita. All rights reserved.
1
+ # frequent-algorithm
2
+
3
+ Web site usage, social network behavior and Internet traffic are examples
4
+ of systems that appear to follow the [Power law](http://en.wikipedia.org/wiki/Power_law),
5
+ where most of the events are due to the actions of a very small few.
6
+ Knowing at any given point in time which items are trending is valuable
7
+ in understanding the system.
8
+
9
+ `frequent-algorithm` is a Ruby implementation of the FREQUENT algorithm
10
+ for identifying frequent items in a data stream in sliding windows.
11
+ Please refer to [Identifying Frequent Items in Sliding Windows over On-Line
12
+ Packet Streams](http://erikdemaine.org/papers/SlidingWindow_IMC2003/), by
13
+ Golab, DeHaan, Demaine, L&#243;pez-Ortiz and Munro (2003).
14
+
15
+ [![License](https://img.shields.io/badge/license-MIT-blue.svg)]() [![Build Status](https://travis-ci.org/buruzaemon/frequent-algorithm.svg?branch=master)](https://travis-ci.org/buruzaemon/frequent-algorithm) [![Gem Version](https://badge.fury.io/rb/frequent-algorithm.svg)](https://rubygems.org/gems/frequent-algorithm)
16
+
17
+ ## Introduction
18
+
19
+ ### Challenges
20
+
21
+ Challenges for Real-time processing of data streams for _frequent item queries_
22
+ include:
23
+
24
+ * data may be of unknown and possibly unbound length
25
+ * data may be arriving a very fast rate
26
+ * it might not be possible to go back and re-read the data
27
+ * too large a window of observation may include stale data
28
+
29
+ Therefore, a solution should have the following characteristics:
30
+
31
+ * uses limited memory
32
+ * can process events in the stream in &#927;(1) constant time
33
+ * requires only a single-pass over the data
34
+
35
+
36
+ ### The algorithm
37
+
38
+ > LOOP<br/>
39
+ > 1. For each element e in the next b elements:<br/>
40
+ > &nbsp;&nbsp;&nbsp;&nbsp;If a local counter exists for the type of element e:<br/>
41
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Increment the local counter.<br/>
42
+ > &nbsp;&nbsp;&nbsp;&nbsp;Otherwise:<br/>
43
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Create a new local counter for this element type<br/>
44
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;and set it equal to 1.<br/>
45
+ > 2. Add a summary S containing identities and counts of the k most frequent items to the back of queue Q.<br/>
46
+ > 3. Delete all local counters<br/>
47
+ > 4. For each type named in S:<br/>
48
+ > &nbsp;&nbsp;&nbsp;&nbsp;If a global counter exists for this type:<br/>
49
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Add to it the count recorded in S.<br/>
50
+ > &nbsp;&nbsp;&nbsp;&nbsp;Otherwise:<br/>
51
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Create a new global counter for this element type<br/>
52
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;and set it equal to the count recorded in S.<br/>
53
+ > 5. Add the count of the kth largest type in S to δ.<br/>
54
+ > 6. If sizeOf(Q) > N/b:<br/>
55
+ > &nbsp;&nbsp;&nbsp;&nbsp;(a) Remove the summary S' from the front of Q and subtract the count of the kth largest type in S' from δ.<br/>
56
+ > &nbsp;&nbsp;&nbsp;&nbsp;(b) For all element types named in S':<br/>
57
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Subtract from their global counters the counts<br/>
58
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;recorded in S'<br/>
59
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;If a counter is decremented to zero:<br/>
60
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Delete it.<br/>
61
+ > &nbsp;&nbsp;&nbsp;&nbsp;(c) Output the identity and value of each global counter > δ.
62
+ >
63
+ > &mdash; <cite>Golab, DeHaan, Demaine, López-Ortiz and Munro. Identifying Frequent Items in Sliding Windows over On-Line Packet Streams, 2003</cite>
64
+
65
+
66
+ ## Usage
67
+
68
+ require 'frequent-algorithm'
69
+
70
+ # data is pi to 1000 digits
71
+ pi = File.read('test/frequent/test_data_pi').strip
72
+ data = pi.scan(/./).each_slice(b)
73
+
74
+ N = 100 # size of main window
75
+ b = 20 # size of basic window
76
+ k = 3 # we are interested in top-3 numerals in pi
77
+
78
+ alg = Frequent::Algorithm.new(N, b, k)
79
+
80
+ # read in and process the 1st basic window
81
+ alg.process(data.next)
82
+
83
+ # and the top-3 numerals are?
84
+ top3 = alg.statistics.report
85
+ puts top3
86
+
87
+ # lather, rinse and repeat
88
+ alg.process(data.next)
89
+
90
+
91
+ ## Development
92
+
93
+ The development of this gem requires the following:
94
+
95
+ * [Ruby 1.9.3 or greater](https://www.ruby-lang.org/en/)
96
+ * [rubygems](https://rubygems.org/pages/download)
97
+ * [`bundler`](https://github.com/bundler/bundler)
98
+ * [`rake`](https://github.com/ruby/rake)
99
+ * [`minitest`](https://rubygems.org/gems/minitest) (unit testing)
100
+ * [`yard`](https://rubygems.org/gems/yard) (documentation)
101
+ * [`rdiscount`](https://rubygems.org/gems/rdiscount) (Markdown)
102
+
103
+ Building, testing and release of this rubygem uses the following
104
+ `rake` commands:
105
+
106
+
107
+ rake clean # Remove any temporary products
108
+ rake clobber # Remove any generated file
109
+ rake test # Execute unit tests
110
+ rake build # Build frequent-algorithm-n.n.n.gem into the pkg directory
111
+ rake install # Build and install frequent-algorithm-n.n.n.gem into system gems
112
+ rake release # Create tag vn.n.n and build and push
113
+ # frequent-algorithm-n.n.n.gem to Rubygems
114
+
115
+
116
+ ### Documentation
117
+
118
+ `frequent-algorithm` uses [`yard`](https://rubygems.org/gems/yard) and
119
+ [`rdiscount`](https://rubygems.org/gems/rdiscount) for Markdown documentation.
120
+ Check out [Getting Started with
121
+ Yard](http://www.rubydoc.info/gems/yard/file/docs/GettingStarted.md).
122
+
123
+
124
+ ### Unit Testing
125
+
126
+ `frequent-algorithm` uses
127
+ [`MiniTest::Unit`](https://github.com/seattlerb/minitest) for
128
+ unit testing.
129
+
130
+
131
+ ### Releasing
132
+
133
+ Please refer to Publishing To Rubygems.org in the
134
+ [Rubygems Guide](http://guides.rubygems.org/make-your-own-gem/).
135
+
136
+
137
+ ### Contributing
138
+
139
+ 1. Fork it
140
+ 2. Begin work on `dev-branch` (`git fetch && git checkout dev-branch`)
141
+ 3. Create your feature branch (`git branch my-new-feature && git checkout
142
+ my-new-feature`)
143
+ 4. Commit your changes (`git commit -am 'Add some feature'`)
144
+ 5. Push to the branch (`git push origin my-new-feature:dev-branch`)
145
+ 6. Create new Pull Request
146
+
147
+ You may wish to read the [Git book online](http://git-scm.com/book/en/v2).
148
+
149
+
150
+ ## Changelog
151
+
152
+ Please see the {file:CHANGELOG} for this gem's release history.
153
+
154
+
155
+ ## License
156
+
157
+ frequent-algorithm is provided under the terms of the MIT license.
158
+
159
+ Copyright &copy; 2015, Willie Tong &amp; Brooke M. Fujita. All rights reserved. Please see the {file:LICENSE} file for further details.
@@ -1,28 +1,28 @@
1
- # coding: utf-8
2
- require 'frequent/algorithm'
3
-
4
- =begin
5
-
6
- The MIT License (MIT)
7
-
8
- Copyright (c) 2015 Willie Tong, Brooke M. Fujita
9
-
10
- Permission is hereby granted, free of charge, to any person obtaining a
11
- copy of this software and associated documentation files (the "Software"),
12
- to deal in the Software without restriction, including without limitation
13
- the rights to use, copy, modify, merge, publish, distribute, sublicense,
14
- and/or sell copies of the Software, and to permit persons to whom the
15
- Software is furnished to do so, subject to the following conditions:
16
-
17
- The above copyright notice and this permission notice shall be included
18
- in all copies or substantial portions of the Software.
19
-
20
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
23
- THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
24
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
25
- FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
26
- IN THE SOFTWARE.
27
-
28
- =end
1
+ # coding: utf-8
2
+ require 'frequent/algorithm'
3
+
4
+ =begin
5
+
6
+ The MIT License (MIT)
7
+
8
+ Copyright (c) 2015 Willie Tong, Brooke M. Fujita
9
+
10
+ Permission is hereby granted, free of charge, to any person obtaining a
11
+ copy of this software and associated documentation files (the "Software"),
12
+ to deal in the Software without restriction, including without limitation
13
+ the rights to use, copy, modify, merge, publish, distribute, sublicense,
14
+ and/or sell copies of the Software, and to permit persons to whom the
15
+ Software is furnished to do so, subject to the following conditions:
16
+
17
+ The above copyright notice and this permission notice shall be included
18
+ in all copies or substantial portions of the Software.
19
+
20
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
23
+ THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
24
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
25
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
26
+ IN THE SOFTWARE.
27
+
28
+ =end
@@ -1,182 +1,201 @@
1
- # coding: utf-8
2
- require 'frequent/version'
3
-
4
- module Frequent
5
-
6
- ERR_BADLIST = "List cannot be nil or empty".freeze
7
- ERR_BADK = "k must be between 1 and %s".freeze
8
-
9
- # `Frequent::Algorithm` is the Ruby implementation of the
10
- # Demaine et al. FREQUENT algorithm for calculating
11
- # top-k items in a stream.
12
- #
13
- # The aims of this algorithm are:
14
- # * use limited memory
15
- # * require constant processing time per item
16
- # * require a single-pass only
17
- #
18
- class Algorithm
19
- # @return [Integer] the number of items in the main window
20
- attr_reader :n
21
- # @return [Integer] the number of items in a basic window
22
- attr_reader :b
23
- # @return [Integer] the number of top item categories to track
24
- attr_reader :k
25
- # @return [Array<Hash<Object,Integer>>] global queue for basic window summaries
26
- attr_reader :queue
27
- # @return [Hash<Object,Integer>] global mapping of items and counts
28
- attr_reader :statistics
29
- # @return [Integer] minimum threshold for membership in top-k items
30
- attr_reader :delta
31
-
32
- # Initializes this top-k frequency-calculating instance.
33
- #
34
- # @param [Integer] n number of items in the main window
35
- # @param [Integer] b number of items in a basic window
36
- # @param [Integer] k number of top item categories to track
37
- # @raise [ArgumentError] if n is not greater than 0
38
- # @raise [ArgumentError] if b is not greater than 0
39
- # @raise [ArgumentError] if k is not greater than 0
40
- # @raise [ArgumentError] if n/b is not greater than 1
41
- def initialize(n, b, k=1)
42
- if n <= 0
43
- raise ArgumentError.new('n must be greater than 0')
44
- end
45
- if b <= 0
46
- raise ArgumentError.new('b must be greater than 0')
47
- end
48
- if k <= 0
49
- raise ArgumentError.new('k must be greater than 0')
50
- end
51
- if n/b < 1
52
- raise ArgumentError.new('n/b must be greater than 1')
53
- end
54
- @n = n
55
- @b = b
56
- @k = k
57
-
58
- @queue = []
59
- @statistics = {}
60
- @delta = 0
61
- end
62
-
63
- # Processes a single basic window of b items, by first adding
64
- # a summary of this basic window in the internal global queue;
65
- # and then updating the global statistics accordingly.
66
- #
67
- # @param [Array] an array of objects representing a basic window
68
- def process(elements)
69
- # Do we need this?
70
- return if elements.length != @b
71
-
72
- # Step 1
73
- summary = {}
74
- elements.each do |e|
75
- if summary.key? e
76
- summary[e] += 1
77
- else
78
- summary[e] = 1
79
- end
80
- end
81
-
82
- # Step 2
83
- @queue << summary
84
-
85
- # Step 3
86
- # Done, implicitly
87
-
88
- # Step 4
89
- summary.each do |k,v|
90
- if @statistics.key? k
91
- @statistics[k] += v
92
- else
93
- @statistics[k] = v
94
- end
95
- end
96
-
97
- # Step 5
98
- @delta += kth_largest(summary.values, @k)
99
-
100
- # Step 6 - sizeOf(Q) > N/b
101
- if @queue.length > @n/@b
102
- # a
103
- summary_p = @queue.shift
104
- @delta -= kth_largest(summary_p.values, @k)
105
-
106
- # b
107
- summary_p.each { |k,v| @statistics[k] -= v }
108
- @statistics.delete_if { |k,v| v <= 0 }
109
-
110
- #c
111
- @statistics.select { |k,v| v > @delta }
112
- else
113
- {}
114
- end
115
- end
116
-
117
- # Returns the version for this gem.
118
- #
119
- # @return [String] the version for this gem.
120
- def version
121
- Frequent::VERSION
122
- end
123
-
124
- private
125
- # Given a list of numbers and a number k which should be
126
- # between 1 and the length of the given list, return the
127
- # element x in the list that is larger than exactly k-1
128
- # other elements in the list.
129
- #
130
- # @param [Array] list of integers
131
- # @return [Integer] the kth largest element in list
132
- def kth_largest(list, k)
133
- raise ArgumentError.new(ERR_BADLIST) if list.nil? or list.empty?
134
- raise ArgumentError.new(ERR_BADK) if k < 1
135
-
136
- ulist = list.uniq
137
- k = ulist.size if k > ulist.size
138
-
139
- def quickselect(aset, k)
140
- p = rand(aset.size)
141
-
142
- lower = aset.select { |e| e < aset[p] }
143
- upper = aset.select { |e| e > aset[p] }
144
-
145
- if k <= lower.size
146
- quickselect(lower, k)
147
- elsif k > aset.size - upper.size
148
- quickselect(upper, k - (aset.size - upper.size))
149
- else
150
- aset[p]
151
- end
152
- end
153
- quickselect(ulist, ulist.size+1-k)
154
- end
155
- end
156
- end
157
-
158
- =begin
159
-
160
- The MIT License (MIT)
161
-
162
- Copyright (c) 2015 Willie Tong, Brooke M. Fujita
163
-
164
- Permission is hereby granted, free of charge, to any person obtaining a
165
- copy of this software and associated documentation files (the "Software"),
166
- to deal in the Software without restriction, including without limitation
167
- the rights to use, copy, modify, merge, publish, distribute, sublicense,
168
- and/or sell copies of the Software, and to permit persons to whom the
169
- Software is furnished to do so, subject to the following conditions:
170
-
171
- The above copyright notice and this permission notice shall be included
172
- in all copies or substantial portions of the Software.
173
-
174
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
175
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
176
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
177
- THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
178
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
179
- FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
180
- IN THE SOFTWARE.
181
-
182
- =end
1
+ # coding: utf-8
2
+ require 'frequent/version'
3
+ require 'thread'
4
+
5
+ module Frequent
6
+
7
+ ERR_BADLIST = "List cannot be nil or empty".freeze
8
+ ERR_BADK = "k must be between 1 and %s".freeze
9
+
10
+ # `Frequent::Algorithm` is the Ruby implementation of the
11
+ # Demaine et al. FREQUENT algorithm for calculating
12
+ # top-k items in a stream.
13
+ #
14
+ # The aims of this algorithm are:
15
+ # * use limited memory
16
+ # * require constant processing time per item
17
+ # * require a single-pass only
18
+ #
19
+ class Algorithm
20
+
21
+ # @return [Integer] the number of items in the main window
22
+ attr_reader :n
23
+ # @return [Integer] the number of items in a basic window
24
+ attr_reader :b
25
+ # @return [Integer] the number of top item categories to track
26
+ attr_reader :k
27
+ # @return [Array<Hash<Object,Integer>>] global queue for basic window summaries
28
+ attr_reader :queue
29
+ # @return [Hash<Object,Integer>] global mapping of items and counts
30
+ attr_reader :statistics
31
+ # @return [Integer] minimum threshold for membership in top-k items
32
+ attr_reader :delta
33
+ # @return [Hash<Object,Integer>] latest top k elements and their counts
34
+ attr_reader :topk
35
+ # @return [Array[Object]] the window of elements of size b
36
+ attr_reader :window
37
+
38
+ # Initializes this top-k frequency-calculating instance.
39
+ #
40
+ # @param [Integer] n number of items in the main window
41
+ # @param [Integer] b number of items in a basic window
42
+ # @param [Integer] k number of top item categories to track
43
+ # @raise [ArgumentError] if n is not greater than 0
44
+ # @raise [ArgumentError] if b is not greater than 0
45
+ # @raise [ArgumentError] if k is not greater than 0
46
+ # @raise [ArgumentError] if n/b is not greater than 1
47
+ def initialize(n, b, k=1)
48
+ @lock = Mutex.new
49
+
50
+ if n <= 0
51
+ raise ArgumentError.new('n must be greater than 0')
52
+ end
53
+ if b <= 0
54
+ raise ArgumentError.new('b must be greater than 0')
55
+ end
56
+ if k <= 0
57
+ raise ArgumentError.new('k must be greater than 0')
58
+ end
59
+ if n/b < 1
60
+ raise ArgumentError.new('n/b must be greater than 1')
61
+ end
62
+ @n = n
63
+ @b = b
64
+ @k = k
65
+
66
+ @queue = []
67
+ @statistics = {}
68
+ @delta = 0
69
+ @topk = {}
70
+ @window = []
71
+ end
72
+
73
+ # Processes a single basic window of b items, by first adding
74
+ # a summary of this basic window in the internal global queue;
75
+ # and then updating the global statistics accordingly.
76
+ #
77
+ # @param [Object] an object from a data stream
78
+ def process(element)
79
+ @lock.synchronize do
80
+ @window << element
81
+ if @window.length == @b
82
+
83
+ # Step 1
84
+ summary = {}
85
+ @window.each do |e|
86
+ if summary.key? e
87
+ summary[e] += 1
88
+ else
89
+ summary[e] = 1
90
+ end
91
+ end
92
+ @window.clear #current window cleared
93
+
94
+ # Step 2
95
+ @queue << summary
96
+
97
+ # Step 3
98
+ # Done, implicitly
99
+
100
+ # Step 4
101
+ summary.each do |k,v|
102
+ if @statistics.key? k
103
+ @statistics[k] += v
104
+ else
105
+ @statistics[k] = v
106
+ end
107
+ end
108
+
109
+ # Step 5
110
+ @delta += kth_largest(summary.values, @k)
111
+
112
+ # Step 6 - sizeOf(Q) > N/b
113
+ if @queue.length > @n/@b
114
+ # a
115
+ summary_p = @queue.shift
116
+ @delta -= kth_largest(summary_p.values, @k)
117
+
118
+ # b
119
+ summary_p.each { |k,v| @statistics[k] -= v }
120
+ @statistics.delete_if { |k,v| v <= 0 }
121
+
122
+ #c
123
+ @topk = @statistics.select { |k,v| v > @delta }
124
+ end
125
+ end
126
+ end
127
+ end
128
+
129
+ # Return the latest Tok K elements
130
+ #
131
+ # @return [Hash<Object,Integer>] a hash which contains the current top K elements and their counts
132
+ def report
133
+ @topk
134
+ end
135
+
136
+ # Returns the version for this gem.
137
+ #
138
+ # @return [String] the version for this gem.
139
+ def version
140
+ Frequent::VERSION
141
+ end
142
+
143
+ private
144
+ # Given a list of numbers and a number k which should be
145
+ # between 1 and the length of the given list, return the
146
+ # element x in the list that is larger than exactly k-1
147
+ # other elements in the list.
148
+ #
149
+ # @param [Array] list of integers
150
+ # @return [Integer] the kth largest element in list
151
+ def kth_largest(list, k)
152
+ raise ArgumentError.new(ERR_BADLIST) if list.nil? or list.empty?
153
+ raise ArgumentError.new(ERR_BADK) if k < 1
154
+
155
+ ulist = list.uniq
156
+ k = ulist.size if k > ulist.size
157
+
158
+ def quickselect(aset, k)
159
+ p = rand(aset.size)
160
+
161
+ lower = aset.select { |e| e < aset[p] }
162
+ upper = aset.select { |e| e > aset[p] }
163
+
164
+ if k <= lower.size
165
+ quickselect(lower, k)
166
+ elsif k > aset.size - upper.size
167
+ quickselect(upper, k - (aset.size - upper.size))
168
+ else
169
+ aset[p]
170
+ end
171
+ end
172
+ quickselect(ulist, ulist.size+1-k)
173
+ end
174
+ end
175
+ end
176
+
177
+ =begin
178
+
179
+ The MIT License (MIT)
180
+
181
+ Copyright (c) 2015 Willie Tong, Brooke M. Fujita
182
+
183
+ Permission is hereby granted, free of charge, to any person obtaining a
184
+ copy of this software and associated documentation files (the "Software"),
185
+ to deal in the Software without restriction, including without limitation
186
+ the rights to use, copy, modify, merge, publish, distribute, sublicense,
187
+ and/or sell copies of the Software, and to permit persons to whom the
188
+ Software is furnished to do so, subject to the following conditions:
189
+
190
+ The above copyright notice and this permission notice shall be included
191
+ in all copies or substantial portions of the Software.
192
+
193
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
194
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
195
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
196
+ THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
197
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
198
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
199
+ IN THE SOFTWARE.
200
+
201
+ =end
@@ -1,38 +1,38 @@
1
- # coding: utf-8
2
-
3
- # `Frequent` is the namespace for objects implementing
4
- # the Demaine et al. FREQUENT algorithm for finding
5
- # the most frequently-appearing items (top-k) in a
6
- # data stream in sliding windows.
7
- #
8
- # `Frequent::Algorithm` is the implementation class.
9
- module Frequent
10
- # Version string for this Rubygem.
11
- VERSION = '0.0.3'
12
- end
13
-
14
- =begin
15
-
16
- The MIT License (MIT)
17
-
18
- Copyright (c) 2015 Willie Tong, Brooke M. Fujita
19
-
20
- Permission is hereby granted, free of charge, to any person obtaining a
21
- copy of this software and associated documentation files (the "Software"),
22
- to deal in the Software without restriction, including without limitation
23
- the rights to use, copy, modify, merge, publish, distribute, sublicense,
24
- and/or sell copies of the Software, and to permit persons to whom the
25
- Software is furnished to do so, subject to the following conditions:
26
-
27
- The above copyright notice and this permission notice shall be included
28
- in all copies or substantial portions of the Software.
29
-
30
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
31
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
32
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
33
- THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
34
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
35
- FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
36
- IN THE SOFTWARE.
37
-
38
- =end
1
+ # coding: utf-8
2
+
3
+ # `Frequent` is the namespace for objects implementing
4
+ # the Demaine et al. FREQUENT algorithm for finding
5
+ # the most frequently-appearing items (top-k) in a
6
+ # data stream in sliding windows.
7
+ #
8
+ # `Frequent::Algorithm` is the implementation class.
9
+ module Frequent
10
+ # Version string for this Rubygem.
11
+ VERSION = '0.0.4'
12
+ end
13
+
14
+ =begin
15
+
16
+ The MIT License (MIT)
17
+
18
+ Copyright (c) 2015 Willie Tong, Brooke M. Fujita
19
+
20
+ Permission is hereby granted, free of charge, to any person obtaining a
21
+ copy of this software and associated documentation files (the "Software"),
22
+ to deal in the Software without restriction, including without limitation
23
+ the rights to use, copy, modify, merge, publish, distribute, sublicense,
24
+ and/or sell copies of the Software, and to permit persons to whom the
25
+ Software is furnished to do so, subject to the following conditions:
26
+
27
+ The above copyright notice and this permission notice shall be included
28
+ in all copies or substantial portions of the Software.
29
+
30
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
31
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
32
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
33
+ THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
34
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
35
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
36
+ IN THE SOFTWARE.
37
+
38
+ =end
metadata CHANGED
@@ -1,44 +1,44 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: frequent-algorithm
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.3
4
+ version: 0.0.4
5
5
  platform: ruby
6
6
  authors:
7
7
  - Willie Tong
8
8
  - Brooke M. Fujita
9
- autorequire:
9
+ autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2015-03-23 00:00:00.000000000 Z
12
+ date: 2015-05-06 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: rake
16
- version_requirements: !ruby/object:Gem::Requirement
16
+ requirement: !ruby/object:Gem::Requirement
17
17
  requirements:
18
18
  - - ">="
19
19
  - !ruby/object:Gem::Version
20
20
  version: '0'
21
- requirement: !ruby/object:Gem::Requirement
21
+ type: :development
22
+ prerelease: false
23
+ version_requirements: !ruby/object:Gem::Requirement
22
24
  requirements:
23
25
  - - ">="
24
26
  - !ruby/object:Gem::Version
25
27
  version: '0'
26
- prerelease: false
27
- type: :development
28
28
  - !ruby/object:Gem::Dependency
29
29
  name: minitest
30
- version_requirements: !ruby/object:Gem::Requirement
30
+ requirement: !ruby/object:Gem::Requirement
31
31
  requirements:
32
32
  - - ">="
33
33
  - !ruby/object:Gem::Version
34
34
  version: '0'
35
- requirement: !ruby/object:Gem::Requirement
35
+ type: :development
36
+ prerelease: false
37
+ version_requirements: !ruby/object:Gem::Requirement
36
38
  requirements:
37
39
  - - ">="
38
40
  - !ruby/object:Gem::Version
39
41
  version: '0'
40
- prerelease: false
41
- type: :development
42
42
  description: |
43
43
  frequent-algorithm is a Ruby implementation of the Demaine et al FREQUENT algorithm for identifying frequent items in a data stream in sliding windows (c.f Identifying Frequent Items in Sliding Windows over On-Line Packet Streams, 2003).
44
44
  email:
@@ -59,7 +59,7 @@ homepage: https://github.com/buruzaemon/frequent-algorithm
59
59
  licenses:
60
60
  - MIT
61
61
  metadata: {}
62
- post_install_message:
62
+ post_install_message:
63
63
  rdoc_options: []
64
64
  require_paths:
65
65
  - lib
@@ -74,9 +74,11 @@ required_rubygems_version: !ruby/object:Gem::Requirement
74
74
  - !ruby/object:Gem::Version
75
75
  version: '0'
76
76
  requirements: []
77
- rubyforge_project:
77
+ rubyforge_project:
78
78
  rubygems_version: 2.4.5
79
- signing_key:
79
+ signing_key:
80
80
  specification_version: 4
81
- summary: Identifies frequent items in a data stream in sliding windows using the Demaine et al FREQUENT algorithm.
81
+ summary: Identifies frequent items in a data stream in sliding windows using the Demaine
82
+ et al FREQUENT algorithm.
82
83
  test_files: []
84
+ has_rdoc: