frequent-algorithm 0.0.2 → 0.0.3

checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA1:
- metadata.gz: f799561d8b1543e23483918e587dffcbd0522e78
- data.tar.gz: 39a5a9ceee0dee33bc889111b2745907ffb49018
+ metadata.gz: c581bac868de994e83f9fee63f34d8987ab2eda9
+ data.tar.gz: a9c068fa92383857a5d0526c9aa55ce4746651d5
  SHA512:
- metadata.gz: c68087e23dc0ff299f81797a1152c6e1fc6f1f10b7ec46ca39fff35cb8d91836be27eb34fe434c484fc1b28fde939c2f436d42b36f958ddca0afbe9f5107194b
- data.tar.gz: 6f0da59941492900cc4da2f1266ed0e8398a852cbb55e064fd0ed290f8209a396da2e051c5293a45a21c7564828a099ea03b607dfba53608d824f834a1083bc1
+ metadata.gz: 42571a866be7b6ee00748a10afb5ee6d48d8025198a68877a18401bedc0e372da09b1af3a7ef69ac94c78ed2fb798e52899b30eaddea2c423c0982413a2f9f3e
+ data.tar.gz: 0fd80b41cbc31fec785ea52596ab0e9094e54172d8a95e419f80339cdbcd7e9b275dd4a7a0826380ff43a3f81e8f6519d87fe02773ef4319c4a264211cee6f95
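checksums.yaml records SHA1 and SHA512 digests of the `metadata.gz` and `data.tar.gz` members packed inside the .gem archive, so both pairs change whenever the packaged files change. A minimal sketch for recomputing them, assuming a local copy of the .gem and RubyGems' bundled `Gem::Package::TarReader`; the helper name `archive_digests` and the file name in the usage comment are illustrative, and the sketch has not been run against the published 0.0.3 gem:

```ruby
require 'rubygems/package'
require 'digest/sha1'
require 'digest/sha2'

# Hypothetical helper: given an IO over a .gem file (a plain tar archive),
# recompute the SHA1 and SHA512 digests of the two members that
# checksums.yaml covers, keyed the same way as that file.
def archive_digests(io)
  digests = { 'SHA1' => {}, 'SHA512' => {} }
  Gem::Package::TarReader.new(io).each do |entry|
    next unless %w[metadata.gz data.tar.gz].include?(entry.full_name)

    bytes = entry.read || '' # read returns nil for an empty member
    digests['SHA1'][entry.full_name] = Digest::SHA1.hexdigest(bytes)
    digests['SHA512'][entry.full_name] = Digest::SHA512.hexdigest(bytes)
  end
  digests
end

# Usage, assuming the released gem was fetched locally first
# (for example with `gem fetch frequent-algorithm -v 0.0.3`):
#   File.open('frequent-algorithm-0.0.3.gem', 'rb') { |f| p archive_digests(f) }
```

If the sketch's assumptions hold, the printed values should match the `+` lines of the hunk above.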
data/.yardopts CHANGED
@@ -1,7 +1,7 @@
- --no-private
- --readme README.md
- --markup markdown
- --markup-provider rdiscount
- -
- LICENSE
- CHANGELOG
+ --no-private
+ --readme README.md
+ --markup markdown
+ --markup-provider rdiscount
+ -
+ LICENSE
+ CHANGELOG
data/CHANGELOG CHANGED
@@ -1,9 +1,9 @@
- ## CHANGELOG
-
- - __2015/03/19 0.0.2 release.
- - First-stage implementation.
- - API documentation added.
- - Fleshing out unit tests.
-
- - __2015/03/11__: 0.0.1 release.
- - Initial release.
+ ## CHANGELOG
+
+ - __2015/03/19 0.0.2 release.
+ - First-stage implementation.
+ - API documentation added.
+ - Fleshing out unit tests.
+
+ - __2015/03/11__: 0.0.1 release.
+ - Initial release.
data/LICENSE CHANGED
@@ -1,22 +1,22 @@
- The MIT License (MIT)
-
- Copyright (c) 2015 Willie Tong, Brooke M. Fujita
-
- Permission is hereby granted, free of charge, to any person obtaining a copy
- of this software and associated documentation files (the "Software"), to deal
- in the Software without restriction, including without limitation the rights
- to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
- copies of the Software, and to permit persons to whom the Software is
- furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included in all
- copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
- AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
- OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
- SOFTWARE.
-
+ The MIT License (MIT)
+
+ Copyright (c) 2015 Willie Tong, Brooke M. Fujita
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
+
data/README.md CHANGED
@@ -1,149 +1,149 @@
- # frequent-algorithm [![Gem Version](https://badge.fury.io/rb/frequent-algorithm.svg)](http://badge.fury.io/rb/frequent-algorithm) [![Build Status](https://travis-ci.org/buruzaemon/frequent-algorithm.svg)](https://travis-ci.org/buruzaemon/frequent-algorithm)
-
- Web site usage, social network behavior and Internet traffic are examples
- of systems that appear to follow the [power law](http://en.wikipedia.org/wiki/Power_law),
- where most of the events are due to the actions of a very small few.
- Knowing at any given point in time which items are trending is valuable
- in understanding the system.
-
- `frequent-algorithm` is a Ruby implementation of the FREQUENT algorithm
- for identifying frequent items in a data stream in sliding windows.
- Please refer to [Identifying Frequent Items in Sliding Windows over On-Line
- Packet Streams](http://erikdemaine.org/papers/SlidingWindow_IMC2003/), by
- Golab, DeHaan, Demaine, López-Ortiz and Munro (2003).
-
- ## Introduction
-
- ### Challenges
-
- Challenges for Real-time processing of data streams for _frequent item queries_
- include:
-
- * data may be of unknown and possibly unbound length
- * data may be arriving a very fast rate
- * it might not be possible to go back and re-read the data
- * too large a window of observation may include stale data
-
- Therefore, a solution should have the following characteristics:
-
- * uses limited memory
- * can process events in the stream in Ο(1) constant time
- * requires only a single-pass over the data
-
-
- ### The algorithm
-
- > LOOP<br/>
- > 1. For each element e in the next b elements:<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;If a local counter exists for the type of element e:<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Increment the local counter.<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;Otherwise:<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Create a new local counter for this element type<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;and set it equal to 1.<br/>
- > 2. Add a summary S containing identities and counts of the k most frequent items to the back of queue Q.<br/>
- > 3. Delete all local counters<br/>
- > 4. For each type named in S:<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;If a global counter exists for this type:<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Add to it the count recorded in S.<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;Otherwise:<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Create a new global counter for this element type<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;and set it equal to the count recorded in S.<br/>
- > 5. Add the count of the kth largest type in S to δ.<br/>
- > 6. If sizeOf(Q) > N/b:<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;(a) Remove the summary S' from the front of Q and subtract the count of the kth largest type in S' from δ.<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;(b) For all element types named in S':<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Subtract from their global counters the counts<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;recorded in S'<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;If a counter is decremented to zero:<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Delete it.<br/>
- > &nbsp;&nbsp;&nbsp;&nbsp;(c) Output the identity and value of each global counter > δ.
- >
- > &mdash; <cite>Golab, DeHaan, Demaine, López-Ortiz and Munro. Identifying Frequent Items in Sliding Windows over On-Line Packet Streams, 2003</cite>
-
-
- ## Usage
-
- require 'frequent-algorithm'
-
- # data is pi to 1000 digits
- pi = File.read('test/frequent/test_data_pi').strip
- data = pi.scan(/./).each_slice(b)
-
- N = 100 # size of main window
- b = 20 # size of basic window
- k = 3 # we are interested in top-3 numerals in pi
-
- alg = Frequent::Algorithm.new(N, b, k)
-
- # read in and process the 1st basic window
- alg.process(data.next)
-
- # and the top-3 numerals are?
- top3 = alg.statistics.report
- puts top3
-
- # lather, rinse and repeat
- alg.process(data.next)
-
-
- ## Development
-
- The development of this gem requires the following:
-
- * [Ruby 1.9.3 or greater](https://www.ruby-lang.org/en/)
- * [rubygems](https://rubygems.org/pages/download)
- * [`bundler`](https://github.com/bundler/bundler)
- * [`rake`](https://github.com/ruby/rake)
- * [`minitest`](https://rubygems.org/gems/minitest) (unit testing)
- * [`yard`](https://rubygems.org/gems/yard) (documentation)
- * [`rdiscount`](https://rubygems.org/gems/rdiscount) (Markdown)
-
- Building, testing and release of this rubygem uses the following
- `rake` commands:
-
-
- rake build # Build frequent-algorithm-n.n.n.gem into the pkg directory
- rake clean # Remove any temporary products
- rake clobber # Remove any generated file
- rake install # Build and install frequent-algorithm-n.n.n.gem into system gems
- rake release # Create tag vn.n.n and build and push
- # frequent-algorithm-n.n.n.gem to Rubygems
- rake test # Execute unit tests
-
-
- ### Documentation
-
- `frequent-algorithm` uses [`yard`](https://rubygems.org/gems/yard) and
- [`rdiscount`](https://rubygems.org/gems/rdiscount) for Markdown documentation.
- Check out [Getting Started with
- Yard](http://www.rubydoc.info/gems/yard/file/docs/GettingStarted.md).
-
- ### Unit Testing
-
- `frequent-algorithm` uses
- [`MiniTest::Unit`](https://github.com/seattlerb/minitest) for
- unit testing.
-
- ### Releasing
-
- Please refer to Publishing To Rubygems.org in the
- [Rubygems Guide](http://guides.rubygems.org/make-your-own-gem/).
-
- ### Contributing
-
- 1. Fork it
- 2. Begin work on `dev-branch` (`git fetch && git checkout dev-branch`)
- 3. Create your feature branch (`git branch my-new-feature && git checkout
- my-new-feature`)
- 4. Commit your changes (`git commit -am 'Add some feature'`)
- 5. Push to the branch (`git push origin my-new-feature:dev-branch`)
- 6. Create new Pull Request
-
- You may wish to read the [Git book online](http://git-scm.com/book/en/v2).
-
-
- ## License
-
- frequent-algorithm is provided under the terms of the MIT license.
-
- Copyright &copy; 2015, Willie Tong &amp; Brooke M. Fujita. All rights reserved.
+ # frequent-algorithm [![Gem Version](https://badge.fury.io/rb/frequent-algorithm.svg)](http://badge.fury.io/rb/frequent-algorithm) [![Build Status](https://travis-ci.org/buruzaemon/frequent-algorithm.svg)](https://travis-ci.org/buruzaemon/frequent-algorithm)
+
+ Web site usage, social network behavior and Internet traffic are examples
+ of systems that appear to follow the [Power law](http://en.wikipedia.org/wiki/Power_law),
+ where most of the events are due to the actions of a very small few.
+ Knowing at any given point in time which items are trending is valuable
+ in understanding the system.
+
+ `frequent-algorithm` is a Ruby implementation of the FREQUENT algorithm
+ for identifying frequent items in a data stream in sliding windows.
+ Please refer to [Identifying Frequent Items in Sliding Windows over On-Line
+ Packet Streams](http://erikdemaine.org/papers/SlidingWindow_IMC2003/), by
+ Golab, DeHaan, Demaine, L&#243;pez-Ortiz and Munro (2003).
+
+ ## Introduction
+
+ ### Challenges
+
+ Challenges for Real-time processing of data streams for _frequent item queries_
+ include:
+
+ * data may be of unknown and possibly unbound length
+ * data may be arriving a very fast rate
+ * it might not be possible to go back and re-read the data
+ * too large a window of observation may include stale data
+
+ Therefore, a solution should have the following characteristics:
+
+ * uses limited memory
+ * can process events in the stream in &#927;(1) constant time
+ * requires only a single-pass over the data
+
+
+ ### The algorithm
+
+ > LOOP<br/>
+ > 1. For each element e in the next b elements:<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;If a local counter exists for the type of element e:<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Increment the local counter.<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;Otherwise:<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Create a new local counter for this element type<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;and set it equal to 1.<br/>
+ > 2. Add a summary S containing identities and counts of the k most frequent items to the back of queue Q.<br/>
+ > 3. Delete all local counters<br/>
+ > 4. For each type named in S:<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;If a global counter exists for this type:<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Add to it the count recorded in S.<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;Otherwise:<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Create a new global counter for this element type<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;and set it equal to the count recorded in S.<br/>
+ > 5. Add the count of the kth largest type in S to δ.<br/>
+ > 6. If sizeOf(Q) > N/b:<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;(a) Remove the summary S' from the front of Q and subtract the count of the kth largest type in S' from δ.<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;(b) For all element types named in S':<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Subtract from their global counters the counts<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;recorded in S'<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;If a counter is decremented to zero:<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Delete it.<br/>
+ > &nbsp;&nbsp;&nbsp;&nbsp;(c) Output the identity and value of each global counter > δ.
+ >
+ > &mdash; <cite>Golab, DeHaan, Demaine, López-Ortiz and Munro. Identifying Frequent Items in Sliding Windows over On-Line Packet Streams, 2003</cite>
+
+
+ ## Usage
+
+ require 'frequent-algorithm'
+
+ # data is pi to 1000 digits
+ pi = File.read('test/frequent/test_data_pi').strip
+ data = pi.scan(/./).each_slice(b)
+
+ N = 100 # size of main window
+ b = 20 # size of basic window
+ k = 3 # we are interested in top-3 numerals in pi
+
+ alg = Frequent::Algorithm.new(N, b, k)
+
+ # read in and process the 1st basic window
+ alg.process(data.next)
+
+ # and the top-3 numerals are?
+ top3 = alg.statistics.report
+ puts top3
+
+ # lather, rinse and repeat
+ alg.process(data.next)
+
+
+ ## Development
+
+ The development of this gem requires the following:
+
+ * [Ruby 1.9.3 or greater](https://www.ruby-lang.org/en/)
+ * [rubygems](https://rubygems.org/pages/download)
+ * [`bundler`](https://github.com/bundler/bundler)
+ * [`rake`](https://github.com/ruby/rake)
+ * [`minitest`](https://rubygems.org/gems/minitest) (unit testing)
+ * [`yard`](https://rubygems.org/gems/yard) (documentation)
+ * [`rdiscount`](https://rubygems.org/gems/rdiscount) (Markdown)
+
+ Building, testing and release of this rubygem uses the following
+ `rake` commands:
+
+
+ rake clean # Remove any temporary products
+ rake clobber # Remove any generated file
+ rake test # Execute unit tests
+ rake build # Build frequent-algorithm-n.n.n.gem into the pkg directory
+ rake install # Build and install frequent-algorithm-n.n.n.gem into system gems
+ rake release # Create tag vn.n.n and build and push
+ # frequent-algorithm-n.n.n.gem to Rubygems
+
+
+ ### Documentation
+
+ `frequent-algorithm` uses [`yard`](https://rubygems.org/gems/yard) and
+ [`rdiscount`](https://rubygems.org/gems/rdiscount) for Markdown documentation.
+ Check out [Getting Started with
+ Yard](http://www.rubydoc.info/gems/yard/file/docs/GettingStarted.md).
+
+ ### Unit Testing
+
+ `frequent-algorithm` uses
+ [`MiniTest::Unit`](https://github.com/seattlerb/minitest) for
+ unit testing.
+
+ ### Releasing
+
+ Please refer to Publishing To Rubygems.org in the
+ [Rubygems Guide](http://guides.rubygems.org/make-your-own-gem/).
+
+ ### Contributing
+
+ 1. Fork it
+ 2. Begin work on `dev-branch` (`git fetch && git checkout dev-branch`)
+ 3. Create your feature branch (`git branch my-new-feature && git checkout
+ my-new-feature`)
+ 4. Commit your changes (`git commit -am 'Add some feature'`)
+ 5. Push to the branch (`git push origin my-new-feature:dev-branch`)
+ 6. Create new Pull Request
+
+ You may wish to read the [Git book online](http://git-scm.com/book/en/v2).
+
+
+ ## License
+
+ frequent-algorithm is provided under the terms of the MIT license.
+
+ Copyright &copy; 2015, Willie Tong &amp; Brooke M. Fujita. All rights reserved.
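The LOOP quoted in the README can be sketched directly in Ruby. The class below (`FrequentSketch`) is hypothetical and self-contained, not the gem's `Frequent::Algorithm`. It follows the quoted steps under the assumption that ties between equal counts may be broken arbitrarily, and, per Step 2, it queues only the top-k counts of each basic window, whereas the 0.0.3 `process` method in this diff queues every local count ("Done, implicitly"). Its usage lines also avoid two problems in the README's Usage snippet as shipped in both versions: `b` is passed to `each_slice` before it is assigned, and `statistics` is a plain `Hash`, which has no `report` method.

```ruby
# Hypothetical, self-contained sketch of the FREQUENT loop; not the gem's
# Frequent::Algorithm. n = main window size, b = basic window size,
# k = number of top items kept in each basic-window summary.
class FrequentSketch
  attr_reader :statistics, :delta

  def initialize(n, b, k)
    raise ArgumentError, 'n, b and k must be greater than 0' unless [n, b, k].all? { |x| x > 0 }
    raise ArgumentError, 'n must be at least b' if n < b

    @n, @b, @k = n, b, k
    @queue = []               # Q: one summary per basic window
    @statistics = Hash.new(0) # global counters
    @delta = 0                # running threshold
  end

  # Processes one basic window of exactly b elements. Returns the global
  # counters above delta, or {} until more than n/b summaries are queued.
  def process(elements)
    raise ArgumentError, "expected #{@b} elements" unless elements.length == @b

    # Step 1: local counters for this basic window.
    local = elements.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }

    # Steps 2-3: queue the top-k [item, count] pairs as summary S;
    # the local counters go out of scope when this method returns.
    summary = local.sort_by { |_, count| -count }.first(@k)
    @queue << summary

    # Step 4: add S into the global counters.
    summary.each { |item, count| @statistics[item] += count }

    # Step 5: add the count of the kth largest type in S (or of the
    # smallest one, if fewer than k types were seen) to delta.
    @delta += summary.last[1]

    # Step 6: once sizeOf(Q) > n/b, expire the oldest summary S'.
    return {} unless @queue.length > @n / @b

    oldest = @queue.shift
    @delta -= oldest.last[1]                                 # (a)
    oldest.each { |item, count| @statistics[item] -= count } # (b)
    @statistics.delete_if { |_, count| count <= 0 }
    @statistics.select { |_, count| count > @delta }         # (c)
  end
end

# Usage: the first 50 digits of pi (including the leading 3), read in
# basic windows of 10 digits; the window sizes here are illustrative.
digits = '31415926535897932384626433832795028841971693993751'
alg = FrequentSketch.new(20, 10, 3)
digits.chars.each_slice(10) do |window|
  frequent = alg.process(window)
  puts frequent.inspect unless frequent.empty?
end
```

Because ties are broken arbitrarily, which equally frequent digits make it into a summary can vary; the threshold delta is what keeps the reported counters conservative.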
@@ -1,28 +1,28 @@
- # coding: utf-8
- require 'frequent/algorithm'
-
- =begin
-
- The MIT License (MIT)
-
- Copyright (c) 2015 Willie Tong, Brooke M. Fujita
-
- Permission is hereby granted, free of charge, to any person obtaining a
- copy of this software and associated documentation files (the "Software"),
- to deal in the Software without restriction, including without limitation
- the rights to use, copy, modify, merge, publish, distribute, sublicense,
- and/or sell copies of the Software, and to permit persons to whom the
- Software is furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included
- in all copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
- IN THE SOFTWARE.
-
- =end
+ # coding: utf-8
+ require 'frequent/algorithm'
+
+ =begin
+
+ The MIT License (MIT)
+
+ Copyright (c) 2015 Willie Tong, Brooke M. Fujita
+
+ Permission is hereby granted, free of charge, to any person obtaining a
+ copy of this software and associated documentation files (the "Software"),
+ to deal in the Software without restriction, including without limitation
+ the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ and/or sell copies of the Software, and to permit persons to whom the
+ Software is furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included
+ in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ IN THE SOFTWARE.
+
+ =end
@@ -1,166 +1,182 @@
- # coding: utf-8
- require 'frequent/version'
-
- module Frequent
-
-   # `Frequent::Algorithm` is the Ruby implementation of the
-   # Demaine et al. FREQUENT algorithm for calculating
-   # top-k items in a stream.
-   #
-   # The aims of this algorithm are:
-   # * use limited memory
-   # * require constant processing time per item
-   # * require a single-pass only
-   #
-   class Algorithm
-     # @return [Integer] the number of items in the main window
-     attr_reader :n
-     # @return [Integer] the number of items in a basic window
-     attr_reader :b
-     # @return [Integer] the number of top item categories to track
-     attr_reader :k
-     # @return [Array<Hash<Object,Integer>>] global queue for basic window summaries
-     attr_reader :queue
-     # @return [Hash<Object,Integer>] global mapping of items and counts
-     attr_reader :statistics
-     # @return [Integer] minimum threshold for membership in top-k items
-     attr_reader :delta
-
-     # Initializes this top-k frequency-calculating instance.
-     #
-     # @param [Integer] n number of items in the main window
-     # @param [Integer] b number of items in a basic window
-     # @param [Integer] k number of top item categories to track
-     # @raise [ArgumentError] if n is not greater than 0
-     # @raise [ArgumentError] if b is not greater than 0
-     # @raise [ArgumentError] if k is not greater than 0
-     # @raise [ArgumentError] if n/b is not greater than 1
-     def initialize(n, b, k=1)
-       if n <= 0
-         raise ArgumentError.new('n must be greater than 0')
-       end
-       if b <= 0
-         raise ArgumentError.new('b must be greater than 0')
-       end
-       if k <= 0
-         raise ArgumentError.new('k must be greater than 0')
-       end
-       if n/b < 1
-         raise ArgumentError.new('n/b must be greater than 1')
-       end
-       @n = n
-       @b = b
-       @k = k
-
-       @queue = []
-       @statistics = {}
-       @delta = 0
-     end
-
-     # Processes a single basic window of b items, by first adding
-     # a summary of this basic window in the internal global queue;
-     # and then updating the global statistics accordingly.
-     #
-     # @param [Array] an array of objects representing a basic window
-     def process(elements)
-       # Do we need this?
-       return if elements.length != @b
-
-       # Step 1
-       summary = {}
-       elements.each do |e|
-         if summary.key? e
-           summary[e] += 1
-         else
-           summary[e] = 1
-         end
-       end
-
-       # index of the k-th item
-       kth_index = find_kth_largest(summary)
-
-       # Step 2 & 3
-       # summary is [[item,count],[item,count],[item,count]....]
-       # sorted by descending order of the item count
-       summary = summary.sort { |a,b| b[1]<=>a[1] }[0..kth_index]
-       @queue << summary
-
-       # Step 4
-       summary.each do |t|
-         if @statistics.key? t[0]
-           @statistics[t[0]] += t[1]
-         else
-           @statistics[t[0]] = t[1]
-         end
-       end
-
-       # Step 5
-       @delta += summary[kth_index][1]
-
-       # Step 6
-       if should_pop_oldest_summary
-         # a
-         summary_p = @queue.shift
-         @delta -= summary_p[find_kth_largest(summary_p)][1]
-
-         # b
-         summary_p.each { |t| @statistics[t[0]] -= t[1] }
-         @statistics.delete_if { |k,v| v <= 0 }
-
-         #c
-         @statistics.select { |k,v| v > @delta }
-       else
-         {}
-       end
-     end
-
-     # Returns the version for this gem.
-     #
-     # @return [String] the version for this gem.
-     def version
-       Frequent::VERSION
-     end
-
-     private
-     # Return true when it is ready to pop oldest summary from queue
-     #
-     # @return [Boolean] whether it is ready to pop oldest summary from queue
-     def should_pop_oldest_summary
-       @queue.length > @n/@b
-     end
-
-     # Return the k-th index of a summary object
-     #
-     # @param [Object] a summary object
-     # @return [Integer] the k-th index
-     def find_kth_largest(summary)
-       [summary.length, @k].min - 1
-     end
-   end
- end
-
- =begin
-
- The MIT License (MIT)
-
- Copyright (c) 2015 Willie Tong, Brooke M. Fujita
-
- Permission is hereby granted, free of charge, to any person obtaining a
- copy of this software and associated documentation files (the "Software"),
- to deal in the Software without restriction, including without limitation
- the rights to use, copy, modify, merge, publish, distribute, sublicense,
- and/or sell copies of the Software, and to permit persons to whom the
- Software is furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included
- in all copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
- IN THE SOFTWARE.
-
- =end
+ # coding: utf-8
+ require 'frequent/version'
+
+ module Frequent
+
+   ERR_BADLIST = "List cannot be nil or empty".freeze
+   ERR_BADK = "k must be between 1 and %s".freeze
+
+   # `Frequent::Algorithm` is the Ruby implementation of the
+   # Demaine et al. FREQUENT algorithm for calculating
+   # top-k items in a stream.
+   #
+   # The aims of this algorithm are:
+   # * use limited memory
+   # * require constant processing time per item
+   # * require a single-pass only
+   #
+   class Algorithm
+     # @return [Integer] the number of items in the main window
+     attr_reader :n
+     # @return [Integer] the number of items in a basic window
+     attr_reader :b
+     # @return [Integer] the number of top item categories to track
+     attr_reader :k
+     # @return [Array<Hash<Object,Integer>>] global queue for basic window summaries
+     attr_reader :queue
+     # @return [Hash<Object,Integer>] global mapping of items and counts
+     attr_reader :statistics
+     # @return [Integer] minimum threshold for membership in top-k items
+     attr_reader :delta
+
+     # Initializes this top-k frequency-calculating instance.
+     #
+     # @param [Integer] n number of items in the main window
+     # @param [Integer] b number of items in a basic window
+     # @param [Integer] k number of top item categories to track
+     # @raise [ArgumentError] if n is not greater than 0
+     # @raise [ArgumentError] if b is not greater than 0
+     # @raise [ArgumentError] if k is not greater than 0
+     # @raise [ArgumentError] if n/b is not greater than 1
+     def initialize(n, b, k=1)
+       if n <= 0
+         raise ArgumentError.new('n must be greater than 0')
+       end
+       if b <= 0
+         raise ArgumentError.new('b must be greater than 0')
+       end
+       if k <= 0
+         raise ArgumentError.new('k must be greater than 0')
+       end
+       if n/b < 1
+         raise ArgumentError.new('n/b must be greater than 1')
+       end
+       @n = n
+       @b = b
+       @k = k
+
+       @queue = []
+       @statistics = {}
+       @delta = 0
+     end
+
+     # Processes a single basic window of b items, by first adding
+     # a summary of this basic window in the internal global queue;
+     # and then updating the global statistics accordingly.
+     #
+     # @param [Array] an array of objects representing a basic window
+     def process(elements)
+       # Do we need this?
+       return if elements.length != @b
+
+       # Step 1
+       summary = {}
+       elements.each do |e|
+         if summary.key? e
+           summary[e] += 1
+         else
+           summary[e] = 1
+         end
+       end
+
+       # Step 2
+       @queue << summary
+
+       # Step 3
+       # Done, implicitly
+
+       # Step 4
+       summary.each do |k,v|
+         if @statistics.key? k
+           @statistics[k] += v
+         else
+           @statistics[k] = v
+         end
+       end
+
+       # Step 5
+       @delta += kth_largest(summary.values, @k)
+
+       # Step 6 - sizeOf(Q) > N/b
+       if @queue.length > @n/@b
+         # a
+         summary_p = @queue.shift
+         @delta -= kth_largest(summary_p.values, @k)
+
+         # b
+         summary_p.each { |k,v| @statistics[k] -= v }
+         @statistics.delete_if { |k,v| v <= 0 }
+
+         #c
+         @statistics.select { |k,v| v > @delta }
+       else
+         {}
+       end
+     end
+
+     # Returns the version for this gem.
+     #
+     # @return [String] the version for this gem.
+     def version
+       Frequent::VERSION
+     end
+
+     private
+     # Given a list of numbers and a number k which should be
+     # between 1 and the length of the given list, return the
+     # element x in the list that is larger than exactly k-1
+     # other elements in the list.
+     #
+     # @param [Array] list of integers
+     # @return [Integer] the kth largest element in list
+     def kth_largest(list, k)
+       raise ArgumentError.new(ERR_BADLIST) if list.nil? or list.empty?
+       raise ArgumentError.new(ERR_BADK) if k < 1
+
+       ulist = list.uniq
+       k = ulist.size if k > ulist.size
+
+       def quickselect(aset, k)
+         p = rand(aset.size)
+
+         lower = aset.select { |e| e < aset[p] }
+         upper = aset.select { |e| e > aset[p] }
+
+         if k <= lower.size
+           quickselect(lower, k)
+         elsif k > aset.size - upper.size
+           quickselect(upper, k - (aset.size - upper.size))
+         else
+           aset[p]
+         end
+       end
+       quickselect(ulist, ulist.size+1-k)
+     end
+   end
+ end
+
+ =begin
+
+ The MIT License (MIT)
+
+ Copyright (c) 2015 Willie Tong, Brooke M. Fujita
+
+ Permission is hereby granted, free of charge, to any person obtaining a
+ copy of this software and associated documentation files (the "Software"),
+ to deal in the Software without restriction, including without limitation
+ the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ and/or sell copies of the Software, and to permit persons to whom the
+ Software is furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included
+ in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ IN THE SOFTWARE.
+
+ =end
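The 0.0.3 `kth_largest` helper deduplicates its input with `list.uniq`, so it returns the kth largest *distinct* count. Step 5 of the quoted algorithm reads more naturally as the count of the type ranked kth, and the two disagree when counts tie: for counts `[3, 3, 1]` and k = 2 the helper yields 1, while the second-ranked type has count 3. The helper also defines `quickselect` with a nested `def`, which (re)defines an instance method on `Frequent::Algorithm` on every call, and `ERR_BADK` is raised without its `%s` placeholder being filled in. A hedged alternative that keeps duplicates, avoids the nested `def`, and likewise clamps an oversized k (a basic window may hold fewer than k distinct items); both method names are hypothetical:

```ruby
# Hypothetical replacement for the 0.0.3 kth_largest helper: returns the
# kth largest value of `list`, counting duplicates, via iterative quickselect.
def quickselect_kth_largest(list, k)
  raise ArgumentError, 'List cannot be nil or empty' if list.nil? || list.empty?
  raise ArgumentError, "k must be between 1 and #{list.size}" if k < 1

  k = list.size if k > list.size # clamp, as 0.0.3 does, for short windows
  items = list
  loop do
    pivot = items.sample
    upper = items.select { |e| e > pivot }
    equal = items.count(pivot)
    if k <= upper.size
      items = upper                          # rank k lies above the pivot
    elsif k <= upper.size + equal
      return pivot                           # the pivot occupies rank k
    else
      k -= upper.size + equal                # skip the pivot and all above it
      items = items.select { |e| e < pivot }
    end
  end
end

# What the 0.0.3 helper computes: the kth largest *distinct* value
# (Array#max(n) returns the n largest elements in descending order).
def distinct_kth_largest(list, k)
  list.uniq.max(k).last
end

counts = [3, 3, 1]
p quickselect_kth_largest(counts, 2) # => 3
p distinct_kth_largest(counts, 2)    # => 1
```

Each iteration keeps k within the size of the surviving partition, so the loop always terminates with a value; the expected time is linear in the list size, as with the recursive version.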
@@ -1,38 +1,38 @@
- # coding: utf-8
-
- # `Frequent` is the namespace for objects implementing
- # the Demaine et al. FREQUENT algorithm for finding
- # the most frequently-appearing items (top-k) in a
- # data stream in sliding windows.
- #
- # `Frequent::Algorithm` is the implementation class.
- module Frequent
- # Version string for this Rubygem.
- VERSION = '0.0.2'
- end
-
- =begin
-
- The MIT License (MIT)
-
- Copyright (c) 2015 Willie Tong, Brooke M. Fujita
-
- Permission is hereby granted, free of charge, to any person obtaining a
- copy of this software and associated documentation files (the "Software"),
- to deal in the Software without restriction, including without limitation
- the rights to use, copy, modify, merge, publish, distribute, sublicense,
- and/or sell copies of the Software, and to permit persons to whom the
- Software is furnished to do so, subject to the following conditions:
-
- The above copyright notice and this permission notice shall be included
- in all copies or substantial portions of the Software.
-
- THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
- IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
- FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
- THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
- LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
- FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
- IN THE SOFTWARE.
-
- =end
+ # coding: utf-8
+
+ # `Frequent` is the namespace for objects implementing
+ # the Demaine et al. FREQUENT algorithm for finding
+ # the most frequently-appearing items (top-k) in a
+ # data stream in sliding windows.
+ #
+ # `Frequent::Algorithm` is the implementation class.
+ module Frequent
+ # Version string for this Rubygem.
+ VERSION = '0.0.3'
+ end
+
+ =begin
+
+ The MIT License (MIT)
+
+ Copyright (c) 2015 Willie Tong, Brooke M. Fujita
+
+ Permission is hereby granted, free of charge, to any person obtaining a
+ copy of this software and associated documentation files (the "Software"),
+ to deal in the Software without restriction, including without limitation
+ the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ and/or sell copies of the Software, and to permit persons to whom the
+ Software is furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included
+ in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ IN THE SOFTWARE.
+
+ =end
metadata CHANGED
@@ -1,44 +1,44 @@
  --- !ruby/object:Gem::Specification
  name: frequent-algorithm
  version: !ruby/object:Gem::Version
- version: 0.0.2
+ version: 0.0.3
  platform: ruby
  authors:
  - Willie Tong
  - Brooke M. Fujita
- autorequire:
+ autorequire:
  bindir: bin
  cert_chain: []
- date: 2015-03-19 00:00:00.000000000 Z
+ date: 2015-03-23 00:00:00.000000000 Z
  dependencies:
  - !ruby/object:Gem::Dependency
  name: rake
- requirement: !ruby/object:Gem::Requirement
+ version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - '>='
+ - - ">="
  - !ruby/object:Gem::Version
  version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
+ requirement: !ruby/object:Gem::Requirement
  requirements:
- - - '>='
+ - - ">="
  - !ruby/object:Gem::Version
  version: '0'
+ prerelease: false
+ type: :development
  - !ruby/object:Gem::Dependency
  name: minitest
- requirement: !ruby/object:Gem::Requirement
+ version_requirements: !ruby/object:Gem::Requirement
  requirements:
- - - '>='
+ - - ">="
  - !ruby/object:Gem::Version
  version: '0'
- type: :development
- prerelease: false
- version_requirements: !ruby/object:Gem::Requirement
+ requirement: !ruby/object:Gem::Requirement
  requirements:
- - - '>='
+ - - ">="
  - !ruby/object:Gem::Version
  version: '0'
+ prerelease: false
+ type: :development
  description: |
  frequent-algorithm is a Ruby implementation of the Demaine et al FREQUENT algorithm for identifying frequent items in a data stream in sliding windows (c.f Identifying Frequent Items in Sliding Windows over On-Line Packet Streams, 2003).
  email:
@@ -48,36 +48,35 @@ executables: []
  extensions: []
  extra_rdoc_files: []
  files:
+ - ".yardopts"
+ - CHANGELOG
+ - LICENSE
+ - README.md
  - lib/frequent-algorithm.rb
  - lib/frequent/algorithm.rb
  - lib/frequent/version.rb
- - README.md
- - LICENSE
- - CHANGELOG
- - .yardopts
  homepage: https://github.com/buruzaemon/frequent-algorithm
  licenses:
  - MIT
  metadata: {}
- post_install_message:
+ post_install_message:
  rdoc_options: []
  require_paths:
  - lib
  required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
- - - '>='
+ - - ">="
  - !ruby/object:Gem::Version
  version: '2.0'
  required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
- - - '>='
+ - - ">="
  - !ruby/object:Gem::Version
  version: '0'
  requirements: []
- rubyforge_project:
- rubygems_version: 2.0.14
- signing_key:
+ rubyforge_project:
+ rubygems_version: 2.4.5
+ signing_key:
  specification_version: 4
- summary: Identifies frequent items in a data stream in sliding windows using the Demaine
- et al FREQUENT algorithm.
+ summary: Identifies frequent items in a data stream in sliding windows using the Demaine et al FREQUENT algorithm.
  test_files: []