frequent-algorithm 0.0.2 → 0.0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: f799561d8b1543e23483918e587dffcbd0522e78
- data.tar.gz: 39a5a9ceee0dee33bc889111b2745907ffb49018
+ metadata.gz: c581bac868de994e83f9fee63f34d8987ab2eda9
+ data.tar.gz: a9c068fa92383857a5d0526c9aa55ce4746651d5
  SHA512:
- metadata.gz: c68087e23dc0ff299f81797a1152c6e1fc6f1f10b7ec46ca39fff35cb8d91836be27eb34fe434c484fc1b28fde939c2f436d42b36f958ddca0afbe9f5107194b
- data.tar.gz: 6f0da59941492900cc4da2f1266ed0e8398a852cbb55e064fd0ed290f8209a396da2e051c5293a45a21c7564828a099ea03b607dfba53608d824f834a1083bc1
+ metadata.gz: 42571a866be7b6ee00748a10afb5ee6d48d8025198a68877a18401bedc0e372da09b1af3a7ef69ac94c78ed2fb798e52899b30eaddea2c423c0982413a2f9f3e
+ data.tar.gz: 0fd80b41cbc31fec785ea52596ab0e9094e54172d8a95e419f80339cdbcd7e9b275dd4a7a0826380ff43a3f81e8f6519d87fe02773ef4319c4a264211cee6f95
data/.yardopts CHANGED
@@ -1,7 +1,7 @@
- --no-private
- --readme README.md
- --markup markdown
- --markup-provider rdiscount
- -
- LICENSE
- CHANGELOG
+ --no-private
+ --readme README.md
+ --markup markdown
+ --markup-provider rdiscount
+ -
+ LICENSE
+ CHANGELOG
data/CHANGELOG CHANGED
@@ -1,9 +1,9 @@
- ## CHANGELOG
-
- - __2015/03/19 0.0.2 release.
- - First-stage implementation.
- - API documentation added.
- - Fleshing out unit tests.
-
- - __2015/03/11__: 0.0.1 release.
- - Initial release.
+ ## CHANGELOG
+
+ - __2015/03/19 0.0.2 release.
+ - First-stage implementation.
+ - API documentation added.
+ - Fleshing out unit tests.
+
+ - __2015/03/11__: 0.0.1 release.
+ - Initial release.
data/LICENSE CHANGED
@@ -1,22 +1,22 @@
1
- The MIT License (MIT)
2
-
3
- Copyright (c) 2015 Willie Tong, Brooke M. Fujita
4
-
5
- Permission is hereby granted, free of charge, to any person obtaining a copy
6
- of this software and associated documentation files (the "Software"), to deal
7
- in the Software without restriction, including without limitation the rights
8
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
- copies of the Software, and to permit persons to whom the Software is
10
- furnished to do so, subject to the following conditions:
11
-
12
- The above copyright notice and this permission notice shall be included in all
13
- copies or substantial portions of the Software.
14
-
15
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
- SOFTWARE.
22
-
1
+ The MIT License (MIT)
2
+
3
+ Copyright (c) 2015 Willie Tong, Brooke M. Fujita
4
+
5
+ Permission is hereby granted, free of charge, to any person obtaining a copy
6
+ of this software and associated documentation files (the "Software"), to deal
7
+ in the Software without restriction, including without limitation the rights
8
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9
+ copies of the Software, and to permit persons to whom the Software is
10
+ furnished to do so, subject to the following conditions:
11
+
12
+ The above copyright notice and this permission notice shall be included in all
13
+ copies or substantial portions of the Software.
14
+
15
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21
+ SOFTWARE.
22
+
data/README.md CHANGED
@@ -1,149 +1,149 @@
1
- # frequent-algorithm [![Gem Version](https://badge.fury.io/rb/frequent-algorithm.svg)](http://badge.fury.io/rb/frequent-algorithm) [![Build Status](https://travis-ci.org/buruzaemon/frequent-algorithm.svg)](https://travis-ci.org/buruzaemon/frequent-algorithm)
2
-
3
- Web site usage, social network behavior and Internet traffic are examples
4
- of systems that appear to follow the [power law](http://en.wikipedia.org/wiki/Power_law),
5
- where most of the events are due to the actions of a very small few.
6
- Knowing at any given point in time which items are trending is valuable
7
- in understanding the system.
8
-
9
- `frequent-algorithm` is a Ruby implementation of the FREQUENT algorithm
10
- for identifying frequent items in a data stream in sliding windows.
11
- Please refer to [Identifying Frequent Items in Sliding Windows over On-Line
12
- Packet Streams](http://erikdemaine.org/papers/SlidingWindow_IMC2003/), by
13
- Golab, DeHaan, Demaine, López-Ortiz and Munro (2003).
14
-
15
- ## Introduction
16
-
17
- ### Challenges
18
-
19
- Challenges for Real-time processing of data streams for _frequent item queries_
20
- include:
21
-
22
- * data may be of unknown and possibly unbound length
23
- * data may be arriving a very fast rate
24
- * it might not be possible to go back and re-read the data
25
- * too large a window of observation may include stale data
26
-
27
- Therefore, a solution should have the following characteristics:
28
-
29
- * uses limited memory
30
- * can process events in the stream in Ο(1) constant time
31
- * requires only a single-pass over the data
32
-
33
-
34
- ### The algorithm
35
-
36
- > LOOP<br/>
37
- > 1. For each element e in the next b elements:<br/>
38
- > &nbsp;&nbsp;&nbsp;&nbsp;If a local counter exists for the type of element e:<br/>
39
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Increment the local counter.<br/>
40
- > &nbsp;&nbsp;&nbsp;&nbsp;Otherwise:<br/>
41
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Create a new local counter for this element type<br/>
42
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;and set it equal to 1.<br/>
43
- > 2. Add a summary S containing identities and counts of the k most frequent items to the back of queue Q.<br/>
44
- > 3. Delete all local counters<br/>
45
- > 4. For each type named in S:<br/>
46
- > &nbsp;&nbsp;&nbsp;&nbsp;If a global counter exists for this type:<br/>
47
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Add to it the count recorded in S.<br/>
48
- > &nbsp;&nbsp;&nbsp;&nbsp;Otherwise:<br/>
49
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Create a new global counter for this element type<br/>
50
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;and set it equal to the count recorded in S.<br/>
51
- > 5. Add the count of the kth largest type in S to δ.<br/>
52
- > 6. If sizeOf(Q) > N/b:<br/>
53
- > &nbsp;&nbsp;&nbsp;&nbsp;(a) Remove the summary S' from the front of Q and subtract the count of the kth largest type in S' from δ.<br/>
54
- > &nbsp;&nbsp;&nbsp;&nbsp;(b) For all element types named in S':<br/>
55
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Subtract from their global counters the counts<br/>
56
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;recorded in S'<br/>
57
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;If a counter is decremented to zero:<br/>
58
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Delete it.<br/>
59
- > &nbsp;&nbsp;&nbsp;&nbsp;(c) Output the identity and value of each global counter > δ.
60
- >
61
- > &mdash; <cite>Golab, DeHaan, Demaine, López-Ortiz and Munro. Identifying Frequent Items in Sliding Windows over On-Line Packet Streams, 2003</cite>
62
-
63
-
64
- ## Usage
65
-
66
- require 'frequent-algorithm'
67
-
68
- # data is pi to 1000 digits
69
- pi = File.read('test/frequent/test_data_pi').strip
70
- data = pi.scan(/./).each_slice(b)
71
-
72
- N = 100 # size of main window
73
- b = 20 # size of basic window
74
- k = 3 # we are interested in top-3 numerals in pi
75
-
76
- alg = Frequent::Algorithm.new(N, b, k)
77
-
78
- # read in and process the 1st basic window
79
- alg.process(data.next)
80
-
81
- # and the top-3 numerals are?
82
- top3 = alg.statistics.report
83
- puts top3
84
-
85
- # lather, rinse and repeat
86
- alg.process(data.next)
87
-
88
-
89
- ## Development
90
-
91
- The development of this gem requires the following:
92
-
93
- * [Ruby 1.9.3 or greater](https://www.ruby-lang.org/en/)
94
- * [rubygems](https://rubygems.org/pages/download)
95
- * [`bundler`](https://github.com/bundler/bundler)
96
- * [`rake`](https://github.com/ruby/rake)
97
- * [`minitest`](https://rubygems.org/gems/minitest) (unit testing)
98
- * [`yard`](https://rubygems.org/gems/yard) (documentation)
99
- * [`rdiscount`](https://rubygems.org/gems/rdiscount) (Markdown)
100
-
101
- Building, testing and release of this rubygem uses the following
102
- `rake` commands:
103
-
104
-
105
- rake build # Build frequent-algorithm-n.n.n.gem into the pkg directory
106
- rake clean # Remove any temporary products
107
- rake clobber # Remove any generated file
108
- rake install # Build and install frequent-algorithm-n.n.n.gem into system gems
109
- rake release # Create tag vn.n.n and build and push
110
- # frequent-algorithm-n.n.n.gem to Rubygems
111
- rake test # Execute unit tests
112
-
113
-
114
- ### Documentation
115
-
116
- `frequent-algorithm` uses [`yard`](https://rubygems.org/gems/yard) and
117
- [`rdiscount`](https://rubygems.org/gems/rdiscount) for Markdown documentation.
118
- Check out [Getting Started with
119
- Yard](http://www.rubydoc.info/gems/yard/file/docs/GettingStarted.md).
120
-
121
- ### Unit Testing
122
-
123
- `frequent-algorithm` uses
124
- [`MiniTest::Unit`](https://github.com/seattlerb/minitest) for
125
- unit testing.
126
-
127
- ### Releasing
128
-
129
- Please refer to Publishing To Rubygems.org in the
130
- [Rubygems Guide](http://guides.rubygems.org/make-your-own-gem/).
131
-
132
- ### Contributing
133
-
134
- 1. Fork it
135
- 2. Begin work on `dev-branch` (`git fetch && git checkout dev-branch`)
136
- 3. Create your feature branch (`git branch my-new-feature && git checkout
137
- my-new-feature`)
138
- 4. Commit your changes (`git commit -am 'Add some feature'`)
139
- 5. Push to the branch (`git push origin my-new-feature:dev-branch`)
140
- 6. Create new Pull Request
141
-
142
- You may wish to read the [Git book online](http://git-scm.com/book/en/v2).
143
-
144
-
145
- ## License
146
-
147
- frequent-algorithm is provided under the terms of the MIT license.
148
-
149
- Copyright &copy; 2015, Willie Tong &amp; Brooke M. Fujita. All rights reserved.
1
+ # frequent-algorithm [![Gem Version](https://badge.fury.io/rb/frequent-algorithm.svg)](http://badge.fury.io/rb/frequent-algorithm) [![Build Status](https://travis-ci.org/buruzaemon/frequent-algorithm.svg)](https://travis-ci.org/buruzaemon/frequent-algorithm)
2
+
3
+ Web site usage, social network behavior and Internet traffic are examples
4
+ of systems that appear to follow the [Power law](http://en.wikipedia.org/wiki/Power_law),
5
+ where most of the events are due to the actions of a very small few.
6
+ Knowing at any given point in time which items are trending is valuable
7
+ in understanding the system.
8
+
9
+ `frequent-algorithm` is a Ruby implementation of the FREQUENT algorithm
10
+ for identifying frequent items in a data stream in sliding windows.
11
+ Please refer to [Identifying Frequent Items in Sliding Windows over On-Line
12
+ Packet Streams](http://erikdemaine.org/papers/SlidingWindow_IMC2003/), by
13
+ Golab, DeHaan, Demaine, L&#243;pez-Ortiz and Munro (2003).
14
+
15
+ ## Introduction
16
+
17
+ ### Challenges
18
+
19
+ Challenges for Real-time processing of data streams for _frequent item queries_
20
+ include:
21
+
22
+ * data may be of unknown and possibly unbound length
23
+ * data may be arriving a very fast rate
24
+ * it might not be possible to go back and re-read the data
25
+ * too large a window of observation may include stale data
26
+
27
+ Therefore, a solution should have the following characteristics:
28
+
29
+ * uses limited memory
30
+ * can process events in the stream in &#927;(1) constant time
31
+ * requires only a single-pass over the data
32
+
33
+
34
+ ### The algorithm
35
+
36
+ > LOOP<br/>
37
+ > 1. For each element e in the next b elements:<br/>
38
+ > &nbsp;&nbsp;&nbsp;&nbsp;If a local counter exists for the type of element e:<br/>
39
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Increment the local counter.<br/>
40
+ > &nbsp;&nbsp;&nbsp;&nbsp;Otherwise:<br/>
41
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Create a new local counter for this element type<br/>
42
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;and set it equal to 1.<br/>
43
+ > 2. Add a summary S containing identities and counts of the k most frequent items to the back of queue Q.<br/>
44
+ > 3. Delete all local counters<br/>
45
+ > 4. For each type named in S:<br/>
46
+ > &nbsp;&nbsp;&nbsp;&nbsp;If a global counter exists for this type:<br/>
47
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Add to it the count recorded in S.<br/>
48
+ > &nbsp;&nbsp;&nbsp;&nbsp;Otherwise:<br/>
49
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Create a new global counter for this element type<br/>
50
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;and set it equal to the count recorded in S.<br/>
51
+ > 5. Add the count of the kth largest type in S to δ.<br/>
52
+ > 6. If sizeOf(Q) > N/b:<br/>
53
+ > &nbsp;&nbsp;&nbsp;&nbsp;(a) Remove the summary S' from the front of Q and subtract the count of the kth largest type in S' from δ.<br/>
54
+ > &nbsp;&nbsp;&nbsp;&nbsp;(b) For all element types named in S':<br/>
55
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Subtract from their global counters the counts<br/>
56
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;recorded in S'<br/>
57
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;If a counter is decremented to zero:<br/>
58
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Delete it.<br/>
59
+ > &nbsp;&nbsp;&nbsp;&nbsp;(c) Output the identity and value of each global counter > δ.
60
+ >
61
+ > &mdash; <cite>Golab, DeHaan, Demaine, López-Ortiz and Munro. Identifying Frequent Items in Sliding Windows over On-Line Packet Streams, 2003</cite>
62
+
63
+
64
+ ## Usage
65
+
66
+ require 'frequent-algorithm'
67
+
68
+ # data is pi to 1000 digits
69
+ pi = File.read('test/frequent/test_data_pi').strip
70
+ data = pi.scan(/./).each_slice(b)
71
+
72
+ N = 100 # size of main window
73
+ b = 20 # size of basic window
74
+ k = 3 # we are interested in top-3 numerals in pi
75
+
76
+ alg = Frequent::Algorithm.new(N, b, k)
77
+
78
+ # read in and process the 1st basic window
79
+ alg.process(data.next)
80
+
81
+ # and the top-3 numerals are?
82
+ top3 = alg.statistics.report
83
+ puts top3
84
+
85
+ # lather, rinse and repeat
86
+ alg.process(data.next)
87
+
88
+
89
+ ## Development
90
+
91
+ The development of this gem requires the following:
92
+
93
+ * [Ruby 1.9.3 or greater](https://www.ruby-lang.org/en/)
94
+ * [rubygems](https://rubygems.org/pages/download)
95
+ * [`bundler`](https://github.com/bundler/bundler)
96
+ * [`rake`](https://github.com/ruby/rake)
97
+ * [`minitest`](https://rubygems.org/gems/minitest) (unit testing)
98
+ * [`yard`](https://rubygems.org/gems/yard) (documentation)
99
+ * [`rdiscount`](https://rubygems.org/gems/rdiscount) (Markdown)
100
+
101
+ Building, testing and release of this rubygem uses the following
102
+ `rake` commands:
103
+
104
+
105
+ rake clean # Remove any temporary products
106
+ rake clobber # Remove any generated file
107
+ rake test # Execute unit tests
108
+ rake build # Build frequent-algorithm-n.n.n.gem into the pkg directory
109
+ rake install # Build and install frequent-algorithm-n.n.n.gem into system gems
110
+ rake release # Create tag vn.n.n and build and push
111
+ # frequent-algorithm-n.n.n.gem to Rubygems
112
+
113
+
114
+ ### Documentation
115
+
116
+ `frequent-algorithm` uses [`yard`](https://rubygems.org/gems/yard) and
117
+ [`rdiscount`](https://rubygems.org/gems/rdiscount) for Markdown documentation.
118
+ Check out [Getting Started with
119
+ Yard](http://www.rubydoc.info/gems/yard/file/docs/GettingStarted.md).
120
+
121
+ ### Unit Testing
122
+
123
+ `frequent-algorithm` uses
124
+ [`MiniTest::Unit`](https://github.com/seattlerb/minitest) for
125
+ unit testing.
126
+
127
+ ### Releasing
128
+
129
+ Please refer to Publishing To Rubygems.org in the
130
+ [Rubygems Guide](http://guides.rubygems.org/make-your-own-gem/).
131
+
132
+ ### Contributing
133
+
134
+ 1. Fork it
135
+ 2. Begin work on `dev-branch` (`git fetch && git checkout dev-branch`)
136
+ 3. Create your feature branch (`git branch my-new-feature && git checkout
137
+ my-new-feature`)
138
+ 4. Commit your changes (`git commit -am 'Add some feature'`)
139
+ 5. Push to the branch (`git push origin my-new-feature:dev-branch`)
140
+ 6. Create new Pull Request
141
+
142
+ You may wish to read the [Git book online](http://git-scm.com/book/en/v2).
143
+
144
+
145
+ ## License
146
+
147
+ frequent-algorithm is provided under the terms of the MIT license.
148
+
149
+ Copyright &copy; 2015, Willie Tong &amp; Brooke M. Fujita. All rights reserved.
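The LOOP pseudocode quoted in the README can be sketched in plain Ruby without the gem. The following is only an illustration under stated assumptions: the class name `FrequentSketch` and its accessors are hypothetical, not the gem's `Frequent::Algorithm` API, and the sketch keeps only the k most frequent items per basic window, as step 2 of the pseudocode describes. Ties at the kth position are broken arbitrarily.

```ruby
# Hypothetical sketch of the README's LOOP pseudocode; FrequentSketch is an
# illustrative name, not the gem's Frequent::Algorithm class.
class FrequentSketch
  attr_reader :counters, :delta

  def initialize(n, b, k)
    @n = n                   # size of the main window
    @b = b                   # size of a basic window
    @k = k                   # number of top items kept per summary
    @queue = []              # Q: one summary per basic window
    @counters = Hash.new(0)  # global counters
    @delta = 0               # sum of kth-largest counts of queued summaries
  end

  # Processes one basic window. Returns the items whose global count
  # exceeds delta, or an empty hash while the main window is still filling.
  def process(window)
    return {} if window.empty?

    # Step 1: local counters for this basic window.
    local = window.each_with_object(Hash.new(0)) { |e, counts| counts[e] += 1 }

    # Step 2: summary S holds the k most frequent items, largest first.
    summary = local.sort_by { |_item, count| -count }.first(@k)
    @queue << summary

    # Step 3 (delete local counters) is implicit: `local` goes out of scope.

    # Step 4: fold S into the global counters.
    summary.each { |item, count| @counters[item] += count }

    # Step 5: add the count of the kth largest type in S to delta.
    @delta += summary.last[1]

    # Step 6: only once Q holds more than N/b summaries.
    return {} unless @queue.size > @n / @b

    oldest = @queue.shift                                   # (a)
    @delta -= oldest.last[1]
    oldest.each { |item, count| @counters[item] -= count }  # (b)
    @counters.delete_if { |_item, count| count <= 0 }
    @counters.select { |_item, count| count > @delta }      # (c)
  end
end

sketch = FrequentSketch.new(8, 4, 2)
sketch.process(%w[a a a b])
sketch.process(%w[a a a c])
p sketch.process(%w[a a a b])  # only "a" (count 6) exceeds delta (2)
```

With k = 1, delta accumulates the largest count of every queued summary, so no global counter can exceed it; the example therefore uses k = 2.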
@@ -1,28 +1,28 @@
1
- # coding: utf-8
2
- require 'frequent/algorithm'
3
-
4
- =begin
5
-
6
- The MIT License (MIT)
7
-
8
- Copyright (c) 2015 Willie Tong, Brooke M. Fujita
9
-
10
- Permission is hereby granted, free of charge, to any person obtaining a
11
- copy of this software and associated documentation files (the "Software"),
12
- to deal in the Software without restriction, including without limitation
13
- the rights to use, copy, modify, merge, publish, distribute, sublicense,
14
- and/or sell copies of the Software, and to permit persons to whom the
15
- Software is furnished to do so, subject to the following conditions:
16
-
17
- The above copyright notice and this permission notice shall be included
18
- in all copies or substantial portions of the Software.
19
-
20
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
23
- THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
24
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
25
- FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
26
- IN THE SOFTWARE.
27
-
28
- =end
1
+ # coding: utf-8
2
+ require 'frequent/algorithm'
3
+
4
+ =begin
5
+
6
+ The MIT License (MIT)
7
+
8
+ Copyright (c) 2015 Willie Tong, Brooke M. Fujita
9
+
10
+ Permission is hereby granted, free of charge, to any person obtaining a
11
+ copy of this software and associated documentation files (the "Software"),
12
+ to deal in the Software without restriction, including without limitation
13
+ the rights to use, copy, modify, merge, publish, distribute, sublicense,
14
+ and/or sell copies of the Software, and to permit persons to whom the
15
+ Software is furnished to do so, subject to the following conditions:
16
+
17
+ The above copyright notice and this permission notice shall be included
18
+ in all copies or substantial portions of the Software.
19
+
20
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
21
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
22
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
23
+ THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
24
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
25
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
26
+ IN THE SOFTWARE.
27
+
28
+ =end
@@ -1,166 +1,182 @@
1
- # coding: utf-8
2
- require 'frequent/version'
3
-
4
- module Frequent
5
-
6
- # `Frequent::Algorithm` is the Ruby implementation of the
7
- # Demaine et al. FREQUENT algorithm for calculating
8
- # top-k items in a stream.
9
- #
10
- # The aims of this algorithm are:
11
- # * use limited memory
12
- # * require constant processing time per item
13
- # * require a single-pass only
14
- #
15
- class Algorithm
16
- # @return [Integer] the number of items in the main window
17
- attr_reader :n
18
- # @return [Integer] the number of items in a basic window
19
- attr_reader :b
20
- # @return [Integer] the number of top item categories to track
21
- attr_reader :k
22
- # @return [Array<Hash<Object,Integer>>] global queue for basic window summaries
23
- attr_reader :queue
24
- # @return [Hash<Object,Integer>] global mapping of items and counts
25
- attr_reader :statistics
26
- # @return [Integer] minimum threshold for membership in top-k items
27
- attr_reader :delta
28
-
29
- # Initializes this top-k frequency-calculating instance.
30
- #
31
- # @param [Integer] n number of items in the main window
32
- # @param [Integer] b number of items in a basic window
33
- # @param [Integer] k number of top item categories to track
34
- # @raise [ArgumentError] if n is not greater than 0
35
- # @raise [ArgumentError] if b is not greater than 0
36
- # @raise [ArgumentError] if k is not greater than 0
37
- # @raise [ArgumentError] if n/b is not greater than 1
38
- def initialize(n, b, k=1)
39
- if n <= 0
40
- raise ArgumentError.new('n must be greater than 0')
41
- end
42
- if b <= 0
43
- raise ArgumentError.new('b must be greater than 0')
44
- end
45
- if k <= 0
46
- raise ArgumentError.new('k must be greater than 0')
47
- end
48
- if n/b < 1
49
- raise ArgumentError.new('n/b must be greater than 1')
50
- end
51
- @n = n
52
- @b = b
53
- @k = k
54
-
55
- @queue = []
56
- @statistics = {}
57
- @delta = 0
58
- end
59
-
60
- # Processes a single basic window of b items, by first adding
61
- # a summary of this basic window in the internal global queue;
62
- # and then updating the global statistics accordingly.
63
- #
64
- # @param [Array] an array of objects representing a basic window
65
- def process(elements)
66
- # Do we need this?
67
- return if elements.length != @b
68
-
69
- # Step 1
70
- summary = {}
71
- elements.each do |e|
72
- if summary.key? e
73
- summary[e] += 1
74
- else
75
- summary[e] = 1
76
- end
77
- end
78
-
79
- # index of the k-th item
80
- kth_index = find_kth_largest(summary)
81
-
82
- # Step 2 & 3
83
- # summary is [[item,count],[item,count],[item,count]....]
84
- # sorted by descending order of the item count
85
- summary = summary.sort { |a,b| b[1]<=>a[1] }[0..kth_index]
86
- @queue << summary
87
-
88
- # Step 4
89
- summary.each do |t|
90
- if @statistics.key? t[0]
91
- @statistics[t[0]] += t[1]
92
- else
93
- @statistics[t[0]] = t[1]
94
- end
95
- end
96
-
97
- # Step 5
98
- @delta += summary[kth_index][1]
99
-
100
- # Step 6
101
- if should_pop_oldest_summary
102
- # a
103
- summary_p = @queue.shift
104
- @delta -= summary_p[find_kth_largest(summary_p)][1]
105
-
106
- # b
107
- summary_p.each { |t| @statistics[t[0]] -= t[1] }
108
- @statistics.delete_if { |k,v| v <= 0 }
109
-
110
- #c
111
- @statistics.select { |k,v| v > @delta }
112
- else
113
- {}
114
- end
115
- end
116
-
117
- # Returns the version for this gem.
118
- #
119
- # @return [String] the version for this gem.
120
- def version
121
- Frequent::VERSION
122
- end
123
-
124
- private
125
- # Return true when it is ready to pop oldest summary from queue
126
- #
127
- # @return [Boolean] whether it is ready to pop oldest summary from queue
128
- def should_pop_oldest_summary
129
- @queue.length > @n/@b
130
- end
131
-
132
- # Return the k-th index of a summary object
133
- #
134
- # @param [Object] a summary object
135
- # @return [Integer] the k-th index
136
- def find_kth_largest(summary)
137
- [summary.length, @k].min - 1
138
- end
139
- end
140
- end
141
-
142
- =begin
143
-
144
- The MIT License (MIT)
145
-
146
- Copyright (c) 2015 Willie Tong, Brooke M. Fujita
147
-
148
- Permission is hereby granted, free of charge, to any person obtaining a
149
- copy of this software and associated documentation files (the "Software"),
150
- to deal in the Software without restriction, including without limitation
151
- the rights to use, copy, modify, merge, publish, distribute, sublicense,
152
- and/or sell copies of the Software, and to permit persons to whom the
153
- Software is furnished to do so, subject to the following conditions:
154
-
155
- The above copyright notice and this permission notice shall be included
156
- in all copies or substantial portions of the Software.
157
-
158
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
159
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
160
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
161
- THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
162
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
163
- FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
164
- IN THE SOFTWARE.
165
-
166
- =end
1
+ # coding: utf-8
2
+ require 'frequent/version'
3
+
4
+ module Frequent
5
+
6
+ ERR_BADLIST = "List cannot be nil or empty".freeze
7
+ ERR_BADK = "k must be between 1 and %s".freeze
8
+
9
+ # `Frequent::Algorithm` is the Ruby implementation of the
10
+ # Demaine et al. FREQUENT algorithm for calculating
11
+ # top-k items in a stream.
12
+ #
13
+ # The aims of this algorithm are:
14
+ # * use limited memory
15
+ # * require constant processing time per item
16
+ # * require a single-pass only
17
+ #
18
+ class Algorithm
19
+ # @return [Integer] the number of items in the main window
20
+ attr_reader :n
21
+ # @return [Integer] the number of items in a basic window
22
+ attr_reader :b
23
+ # @return [Integer] the number of top item categories to track
24
+ attr_reader :k
25
+ # @return [Array<Hash<Object,Integer>>] global queue for basic window summaries
26
+ attr_reader :queue
27
+ # @return [Hash<Object,Integer>] global mapping of items and counts
28
+ attr_reader :statistics
29
+ # @return [Integer] minimum threshold for membership in top-k items
30
+ attr_reader :delta
31
+
32
+ # Initializes this top-k frequency-calculating instance.
33
+ #
34
+ # @param [Integer] n number of items in the main window
35
+ # @param [Integer] b number of items in a basic window
36
+ # @param [Integer] k number of top item categories to track
37
+ # @raise [ArgumentError] if n is not greater than 0
38
+ # @raise [ArgumentError] if b is not greater than 0
39
+ # @raise [ArgumentError] if k is not greater than 0
40
+ # @raise [ArgumentError] if n/b is not greater than 1
41
+ def initialize(n, b, k=1)
42
+ if n <= 0
43
+ raise ArgumentError.new('n must be greater than 0')
44
+ end
45
+ if b <= 0
46
+ raise ArgumentError.new('b must be greater than 0')
47
+ end
48
+ if k <= 0
49
+ raise ArgumentError.new('k must be greater than 0')
50
+ end
51
+ if n/b < 1
52
+ raise ArgumentError.new('n/b must be greater than 1')
53
+ end
54
+ @n = n
55
+ @b = b
56
+ @k = k
57
+
58
+ @queue = []
59
+ @statistics = {}
60
+ @delta = 0
61
+ end
62
+
63
+ # Processes a single basic window of b items, by first adding
64
+ # a summary of this basic window in the internal global queue;
65
+ # and then updating the global statistics accordingly.
66
+ #
67
+ # @param [Array] an array of objects representing a basic window
68
+ def process(elements)
69
+ # Do we need this?
70
+ return if elements.length != @b
71
+
72
+ # Step 1
73
+ summary = {}
74
+ elements.each do |e|
75
+ if summary.key? e
76
+ summary[e] += 1
77
+ else
78
+ summary[e] = 1
79
+ end
80
+ end
81
+
82
+ # Step 2
83
+ @queue << summary
84
+
85
+ # Step 3
86
+ # Done, implicitly
87
+
88
+ # Step 4
89
+ summary.each do |k,v|
90
+ if @statistics.key? k
91
+ @statistics[k] += v
92
+ else
93
+ @statistics[k] = v
94
+ end
95
+ end
96
+
97
+ # Step 5
98
+ @delta += kth_largest(summary.values, @k)
99
+
100
+ # Step 6 - sizeOf(Q) > N/b
101
+ if @queue.length > @n/@b
102
+ # a
103
+ summary_p = @queue.shift
104
+ @delta -= kth_largest(summary_p.values, @k)
105
+
106
+ # b
107
+ summary_p.each { |k,v| @statistics[k] -= v }
108
+ @statistics.delete_if { |k,v| v <= 0 }
109
+
110
+ #c
111
+ @statistics.select { |k,v| v > @delta }
112
+ else
113
+ {}
114
+ end
115
+ end
116
+
117
+ # Returns the version for this gem.
118
+ #
119
+ # @return [String] the version for this gem.
120
+ def version
121
+ Frequent::VERSION
122
+ end
123
+
124
+ private
125
+ # Given a list of numbers and a number k which should be
126
+ # between 1 and the length of the given list, return the
127
+ # element x in the list that is larger than exactly k-1
128
+ # other elements in the list.
129
+ #
130
+ # @param [Array] list of integers
131
+ # @return [Integer] the kth largest element in list
132
+ def kth_largest(list, k)
133
+ raise ArgumentError.new(ERR_BADLIST) if list.nil? or list.empty?
134
+ raise ArgumentError.new(ERR_BADK) if k < 1
135
+
136
+ ulist = list.uniq
137
+ k = ulist.size if k > ulist.size
138
+
139
+ def quickselect(aset, k)
140
+ p = rand(aset.size)
141
+
142
+ lower = aset.select { |e| e < aset[p] }
143
+ upper = aset.select { |e| e > aset[p] }
144
+
145
+ if k <= lower.size
146
+ quickselect(lower, k)
147
+ elsif k > aset.size - upper.size
148
+ quickselect(upper, k - (aset.size - upper.size))
149
+ else
150
+ aset[p]
151
+ end
152
+ end
153
+ quickselect(ulist, ulist.size+1-k)
154
+ end
155
+ end
156
+ end
157
+
158
+ =begin
159
+
160
+ The MIT License (MIT)
161
+
162
+ Copyright (c) 2015 Willie Tong, Brooke M. Fujita
163
+
164
+ Permission is hereby granted, free of charge, to any person obtaining a
165
+ copy of this software and associated documentation files (the "Software"),
166
+ to deal in the Software without restriction, including without limitation
167
+ the rights to use, copy, modify, merge, publish, distribute, sublicense,
168
+ and/or sell copies of the Software, and to permit persons to whom the
169
+ Software is furnished to do so, subject to the following conditions:
170
+
171
+ The above copyright notice and this permission notice shall be included
172
+ in all copies or substantial portions of the Software.
173
+
174
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
175
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
176
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
177
+ THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
178
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
179
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
180
+ IN THE SOFTWARE.
181
+
182
+ =end
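The private `kth_largest` helper added in this hunk appears to de-duplicate the list, clamp k to the number of distinct values, and then select the kth largest distinct value with a randomized quickselect. Assuming that reading is right, a sort-based reference such as the one below should return the same value and may be handy for cross-checking in a unit test. The name `kth_largest_distinct` and its error messages are hypothetical and are not part of the gem.

```ruby
# Hypothetical reference helper for cross-checking; not part of the gem.
# Returns the kth largest *distinct* value in list, clamping k to the
# number of distinct values (so an oversized k yields the smallest value).
def kth_largest_distinct(list, k)
  raise ArgumentError, 'list cannot be nil or empty' if list.nil? || list.empty?
  raise ArgumentError, 'k must be at least 1' if k < 1

  distinct = list.uniq.sort.reverse  # distinct values, largest first
  distinct[[k, distinct.size].min - 1]
end

kth_largest_distinct([5, 3, 5, 1], 2)  # => 3
kth_largest_distinct([5, 3, 5, 1], 9)  # => 1, because k is clamped to 3
```

Sorting costs O(m log m) for m distinct values, against the expected linear time of quickselect, so this form is meant for tests rather than for the per-window hot path.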
@@ -1,38 +1,38 @@
1
- # coding: utf-8
2
-
3
- # `Frequent` is the namespace for objects implementing
4
- # the Demaine et al. FREQUENT algorithm for finding
5
- # the most frequently-appearing items (top-k) in a
6
- # data stream in sliding windows.
7
- #
8
- # `Frequent::Algorithm` is the implementation class.
9
- module Frequent
10
- # Version string for this Rubygem.
11
- VERSION = '0.0.2'
12
- end
13
-
14
- =begin
15
-
16
- The MIT License (MIT)
17
-
18
- Copyright (c) 2015 Willie Tong, Brooke M. Fujita
19
-
20
- Permission is hereby granted, free of charge, to any person obtaining a
21
- copy of this software and associated documentation files (the "Software"),
22
- to deal in the Software without restriction, including without limitation
23
- the rights to use, copy, modify, merge, publish, distribute, sublicense,
24
- and/or sell copies of the Software, and to permit persons to whom the
25
- Software is furnished to do so, subject to the following conditions:
26
-
27
- The above copyright notice and this permission notice shall be included
28
- in all copies or substantial portions of the Software.
29
-
30
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
31
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
32
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
33
- THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
34
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
35
- FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
36
- IN THE SOFTWARE.
37
-
38
- =end
1
+ # coding: utf-8
2
+
3
+ # `Frequent` is the namespace for objects implementing
4
+ # the Demaine et al. FREQUENT algorithm for finding
5
+ # the most frequently-appearing items (top-k) in a
6
+ # data stream in sliding windows.
7
+ #
8
+ # `Frequent::Algorithm` is the implementation class.
9
+ module Frequent
10
+ # Version string for this Rubygem.
11
+ VERSION = '0.0.3'
12
+ end
13
+
14
+ =begin
15
+
16
+ The MIT License (MIT)
17
+
18
+ Copyright (c) 2015 Willie Tong, Brooke M. Fujita
19
+
20
+ Permission is hereby granted, free of charge, to any person obtaining a
21
+ copy of this software and associated documentation files (the "Software"),
22
+ to deal in the Software without restriction, including without limitation
23
+ the rights to use, copy, modify, merge, publish, distribute, sublicense,
24
+ and/or sell copies of the Software, and to permit persons to whom the
25
+ Software is furnished to do so, subject to the following conditions:
26
+
27
+ The above copyright notice and this permission notice shall be included
28
+ in all copies or substantial portions of the Software.
29
+
30
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
31
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
32
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
33
+ THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
34
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
35
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
36
+ IN THE SOFTWARE.
37
+
38
+ =end
metadata CHANGED
@@ -1,44 +1,44 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: frequent-algorithm
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.0.2
4
+ version: 0.0.3
5
5
  platform: ruby
6
6
  authors:
7
7
  - Willie Tong
8
8
  - Brooke M. Fujita
9
- autorequire:
9
+ autorequire:
10
10
  bindir: bin
11
11
  cert_chain: []
12
- date: 2015-03-19 00:00:00.000000000 Z
12
+ date: 2015-03-23 00:00:00.000000000 Z
13
13
  dependencies:
14
14
  - !ruby/object:Gem::Dependency
15
15
  name: rake
16
- requirement: !ruby/object:Gem::Requirement
16
+ version_requirements: !ruby/object:Gem::Requirement
17
17
  requirements:
18
- - - '>='
18
+ - - ">="
19
19
  - !ruby/object:Gem::Version
20
20
  version: '0'
21
- type: :development
22
- prerelease: false
23
- version_requirements: !ruby/object:Gem::Requirement
21
+ requirement: !ruby/object:Gem::Requirement
24
22
  requirements:
25
- - - '>='
23
+ - - ">="
26
24
  - !ruby/object:Gem::Version
27
25
  version: '0'
26
+ prerelease: false
27
+ type: :development
28
28
  - !ruby/object:Gem::Dependency
29
29
  name: minitest
30
- requirement: !ruby/object:Gem::Requirement
30
+ version_requirements: !ruby/object:Gem::Requirement
31
31
  requirements:
32
- - - '>='
32
+ - - ">="
33
33
  - !ruby/object:Gem::Version
34
34
  version: '0'
35
- type: :development
36
- prerelease: false
37
- version_requirements: !ruby/object:Gem::Requirement
35
+ requirement: !ruby/object:Gem::Requirement
38
36
  requirements:
39
- - - '>='
37
+ - - ">="
40
38
  - !ruby/object:Gem::Version
41
39
  version: '0'
40
+ prerelease: false
41
+ type: :development
42
42
  description: |
43
43
  frequent-algorithm is a Ruby implementation of the Demaine et al FREQUENT algorithm for identifying frequent items in a data stream in sliding windows (c.f Identifying Frequent Items in Sliding Windows over On-Line Packet Streams, 2003).
44
44
  email:
@@ -48,36 +48,35 @@ executables: []
48
48
  extensions: []
49
49
  extra_rdoc_files: []
50
50
  files:
51
+ - ".yardopts"
52
+ - CHANGELOG
53
+ - LICENSE
54
+ - README.md
51
55
  - lib/frequent-algorithm.rb
52
56
  - lib/frequent/algorithm.rb
53
57
  - lib/frequent/version.rb
54
- - README.md
55
- - LICENSE
56
- - CHANGELOG
57
- - .yardopts
58
58
  homepage: https://github.com/buruzaemon/frequent-algorithm
59
59
  licenses:
60
60
  - MIT
61
61
  metadata: {}
62
- post_install_message:
62
+ post_install_message:
63
63
  rdoc_options: []
64
64
  require_paths:
65
65
  - lib
66
66
  required_ruby_version: !ruby/object:Gem::Requirement
67
67
  requirements:
68
- - - '>='
68
+ - - ">="
69
69
  - !ruby/object:Gem::Version
70
70
  version: '2.0'
71
71
  required_rubygems_version: !ruby/object:Gem::Requirement
72
72
  requirements:
73
- - - '>='
73
+ - - ">="
74
74
  - !ruby/object:Gem::Version
75
75
  version: '0'
76
76
  requirements: []
77
- rubyforge_project:
78
- rubygems_version: 2.0.14
79
- signing_key:
77
+ rubyforge_project:
78
+ rubygems_version: 2.4.5
79
+ signing_key:
80
80
  specification_version: 4
81
- summary: Identifies frequent items in a data stream in sliding windows using the Demaine
82
- et al FREQUENT algorithm.
81
+ summary: Identifies frequent items in a data stream in sliding windows using the Demaine et al FREQUENT algorithm.
83
82
  test_files: []