frequent-algorithm 0.0.2 → 0.0.3
- checksums.yaml +4 -4
- data/.yardopts +7 -7
- data/CHANGELOG +9 -9
- data/LICENSE +22 -22
- data/README.md +149 -149
- data/lib/frequent-algorithm.rb +28 -28
- data/lib/frequent/algorithm.rb +182 -166
- data/lib/frequent/version.rb +38 -38
- metadata +26 -27
checksums.yaml
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
---
|
2
2
|
SHA1:
|
3
|
-
metadata.gz:
|
4
|
-
data.tar.gz:
|
3
|
+
metadata.gz: c581bac868de994e83f9fee63f34d8987ab2eda9
|
4
|
+
data.tar.gz: a9c068fa92383857a5d0526c9aa55ce4746651d5
|
5
5
|
SHA512:
|
6
|
-
metadata.gz:
|
7
|
-
data.tar.gz:
|
6
|
+
metadata.gz: 42571a866be7b6ee00748a10afb5ee6d48d8025198a68877a18401bedc0e372da09b1af3a7ef69ac94c78ed2fb798e52899b30eaddea2c423c0982413a2f9f3e
|
7
|
+
data.tar.gz: 0fd80b41cbc31fec785ea52596ab0e9094e54172d8a95e419f80339cdbcd7e9b275dd4a7a0826380ff43a3f81e8f6519d87fe02773ef4319c4a264211cee6f95
|
data/.yardopts
CHANGED
@@ -1,7 +1,7 @@
|
|
1
|
-
--no-private
|
2
|
-
--readme README.md
|
3
|
-
--markup markdown
|
4
|
-
--markup-provider rdiscount
|
5
|
-
-
|
6
|
-
LICENSE
|
7
|
-
CHANGELOG
|
1
|
+
--no-private
|
2
|
+
--readme README.md
|
3
|
+
--markup markdown
|
4
|
+
--markup-provider rdiscount
|
5
|
+
-
|
6
|
+
LICENSE
|
7
|
+
CHANGELOG
|
data/CHANGELOG
CHANGED
@@ -1,9 +1,9 @@
|
|
1
|
-
## CHANGELOG
|
2
|
-
|
3
|
-
- __2015/03/19 0.0.2 release.
|
4
|
-
- First-stage implementation.
|
5
|
-
- API documentation added.
|
6
|
-
- Fleshing out unit tests.
|
7
|
-
|
8
|
-
- __2015/03/11__: 0.0.1 release.
|
9
|
-
- Initial release.
|
1
|
+
## CHANGELOG
|
2
|
+
|
3
|
+
- __2015/03/19 0.0.2 release.
|
4
|
+
- First-stage implementation.
|
5
|
+
- API documentation added.
|
6
|
+
- Fleshing out unit tests.
|
7
|
+
|
8
|
+
- __2015/03/11__: 0.0.1 release.
|
9
|
+
- Initial release.
|
data/LICENSE
CHANGED
@@ -1,22 +1,22 @@
|
|
1
|
-
The MIT License (MIT)
|
2
|
-
|
3
|
-
Copyright (c) 2015 Willie Tong, Brooke M. Fujita
|
4
|
-
|
5
|
-
Permission is hereby granted, free of charge, to any person obtaining a copy
|
6
|
-
of this software and associated documentation files (the "Software"), to deal
|
7
|
-
in the Software without restriction, including without limitation the rights
|
8
|
-
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
9
|
-
copies of the Software, and to permit persons to whom the Software is
|
10
|
-
furnished to do so, subject to the following conditions:
|
11
|
-
|
12
|
-
The above copyright notice and this permission notice shall be included in all
|
13
|
-
copies or substantial portions of the Software.
|
14
|
-
|
15
|
-
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
16
|
-
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
17
|
-
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
18
|
-
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
19
|
-
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
20
|
-
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
21
|
-
SOFTWARE.
|
22
|
-
|
1
|
+
The MIT License (MIT)
|
2
|
+
|
3
|
+
Copyright (c) 2015 Willie Tong, Brooke M. Fujita
|
4
|
+
|
5
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
6
|
+
of this software and associated documentation files (the "Software"), to deal
|
7
|
+
in the Software without restriction, including without limitation the rights
|
8
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
9
|
+
copies of the Software, and to permit persons to whom the Software is
|
10
|
+
furnished to do so, subject to the following conditions:
|
11
|
+
|
12
|
+
The above copyright notice and this permission notice shall be included in all
|
13
|
+
copies or substantial portions of the Software.
|
14
|
+
|
15
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
16
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
17
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
18
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
19
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
20
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
|
21
|
+
SOFTWARE.
|
22
|
+
|
data/README.md
CHANGED
@@ -1,149 +1,149 @@
|
|
1
|
-
# frequent-algorithm [![Gem Version](https://badge.fury.io/rb/frequent-algorithm.svg)](http://badge.fury.io/rb/frequent-algorithm) [![Build Status](https://travis-ci.org/buruzaemon/frequent-algorithm.svg)](https://travis-ci.org/buruzaemon/frequent-algorithm)
|
2
|
-
|
3
|
-
Web site usage, social network behavior and Internet traffic are examples
|
4
|
-
of systems that appear to follow the [
|
5
|
-
where most of the events are due to the actions of a very small few.
|
6
|
-
Knowing at any given point in time which items are trending is valuable
|
7
|
-
in understanding the system.
|
8
|
-
|
9
|
-
`frequent-algorithm` is a Ruby implementation of the FREQUENT algorithm
|
10
|
-
for identifying frequent items in a data stream in sliding windows.
|
11
|
-
Please refer to [Identifying Frequent Items in Sliding Windows over On-Line
|
12
|
-
Packet Streams](http://erikdemaine.org/papers/SlidingWindow_IMC2003/), by
|
13
|
-
Golab, DeHaan, Demaine, López-Ortiz and Munro (2003).
|
14
|
-
|
15
|
-
## Introduction
|
16
|
-
|
17
|
-
### Challenges
|
18
|
-
|
19
|
-
Challenges for Real-time processing of data streams for _frequent item queries_
|
20
|
-
include:
|
21
|
-
|
22
|
-
* data may be of unknown and possibly unbound length
|
23
|
-
* data may be arriving a very fast rate
|
24
|
-
* it might not be possible to go back and re-read the data
|
25
|
-
* too large a window of observation may include stale data
|
26
|
-
|
27
|
-
Therefore, a solution should have the following characteristics:
|
28
|
-
|
29
|
-
* uses limited memory
|
30
|
-
* can process events in the stream in Ο(1) constant time
|
31
|
-
* requires only a single-pass over the data
|
32
|
-
|
33
|
-
|
34
|
-
### The algorithm
|
35
|
-
|
36
|
-
> LOOP<br/>
|
37
|
-
> 1. For each element e in the next b elements:<br/>
|
38
|
-
> If a local counter exists for the type of element e:<br/>
|
39
|
-
> Increment the local counter.<br/>
|
40
|
-
> Otherwise:<br/>
|
41
|
-
> Create a new local counter for this element type<br/>
|
42
|
-
> and set it equal to 1.<br/>
|
43
|
-
> 2. Add a summary S containing identities and counts of the k most frequent items to the back of queue Q.<br/>
|
44
|
-
> 3. Delete all local counters<br/>
|
45
|
-
> 4. For each type named in S:<br/>
|
46
|
-
> If a global counter exists for this type:<br/>
|
47
|
-
> Add to it the count recorded in S.<br/>
|
48
|
-
> Otherwise:<br/>
|
49
|
-
> Create a new global counter for this element type<br/>
|
50
|
-
> and set it equal to the count recorded in S.<br/>
|
51
|
-
> 5. Add the count of the kth largest type in S to δ.<br/>
|
52
|
-
> 6. If sizeOf(Q) > N/b:<br/>
|
53
|
-
> (a) Remove the summary S' from the front of Q and subtract the count of the kth largest type in S' from δ.<br/>
|
54
|
-
> (b) For all element types named in S':<br/>
|
55
|
-
> Subtract from their global counters the counts<br/>
|
56
|
-
> recorded in S'<br/>
|
57
|
-
> If a counter is decremented to zero:<br/>
|
58
|
-
> Delete it.<br/>
|
59
|
-
> (c) Output the identity and value of each global counter > δ.
|
60
|
-
>
|
61
|
-
> — <cite>Golab, DeHaan, Demaine, López-Ortiz and Munro. Identifying Frequent Items in Sliding Windows over On-Line Packet Streams, 2003</cite>
|
62
|
-
|
63
|
-
|
64
|
-
## Usage
|
65
|
-
|
66
|
-
require 'frequent-algorithm'
|
67
|
-
|
68
|
-
# data is pi to 1000 digits
|
69
|
-
pi = File.read('test/frequent/test_data_pi').strip
|
70
|
-
data = pi.scan(/./).each_slice(b)
|
71
|
-
|
72
|
-
N = 100 # size of main window
|
73
|
-
b = 20 # size of basic window
|
74
|
-
k = 3 # we are interested in top-3 numerals in pi
|
75
|
-
|
76
|
-
alg = Frequent::Algorithm.new(N, b, k)
|
77
|
-
|
78
|
-
# read in and process the 1st basic window
|
79
|
-
alg.process(data.next)
|
80
|
-
|
81
|
-
# and the top-3 numerals are?
|
82
|
-
top3 = alg.statistics.report
|
83
|
-
puts top3
|
84
|
-
|
85
|
-
# lather, rinse and repeat
|
86
|
-
alg.process(data.next)
|
87
|
-
|
88
|
-
|
89
|
-
## Development
|
90
|
-
|
91
|
-
The development of this gem requires the following:
|
92
|
-
|
93
|
-
* [Ruby 1.9.3 or greater](https://www.ruby-lang.org/en/)
|
94
|
-
* [rubygems](https://rubygems.org/pages/download)
|
95
|
-
* [`bundler`](https://github.com/bundler/bundler)
|
96
|
-
* [`rake`](https://github.com/ruby/rake)
|
97
|
-
* [`minitest`](https://rubygems.org/gems/minitest) (unit testing)
|
98
|
-
* [`yard`](https://rubygems.org/gems/yard) (documentation)
|
99
|
-
* [`rdiscount`](https://rubygems.org/gems/rdiscount) (Markdown)
|
100
|
-
|
101
|
-
Building, testing and release of this rubygem uses the following
|
102
|
-
`rake` commands:
|
103
|
-
|
104
|
-
|
105
|
-
rake
|
106
|
-
rake
|
107
|
-
rake
|
108
|
-
rake
|
109
|
-
rake
|
110
|
-
|
111
|
-
|
112
|
-
|
113
|
-
|
114
|
-
### Documentation
|
115
|
-
|
116
|
-
`frequent-algorithm` uses [`yard`](https://rubygems.org/gems/yard) and
|
117
|
-
[`rdiscount`](https://rubygems.org/gems/rdiscount) for Markdown documentation.
|
118
|
-
Check out [Getting Started with
|
119
|
-
Yard](http://www.rubydoc.info/gems/yard/file/docs/GettingStarted.md).
|
120
|
-
|
121
|
-
### Unit Testing
|
122
|
-
|
123
|
-
`frequent-algorithm` uses
|
124
|
-
[`MiniTest::Unit`](https://github.com/seattlerb/minitest) for
|
125
|
-
unit testing.
|
126
|
-
|
127
|
-
### Releasing
|
128
|
-
|
129
|
-
Please refer to Publishing To Rubygems.org in the
|
130
|
-
[Rubygems Guide](http://guides.rubygems.org/make-your-own-gem/).
|
131
|
-
|
132
|
-
### Contributing
|
133
|
-
|
134
|
-
1. Fork it
|
135
|
-
2. Begin work on `dev-branch` (`git fetch && git checkout dev-branch`)
|
136
|
-
3. Create your feature branch (`git branch my-new-feature && git checkout
|
137
|
-
my-new-feature`)
|
138
|
-
4. Commit your changes (`git commit -am 'Add some feature'`)
|
139
|
-
5. Push to the branch (`git push origin my-new-feature:dev-branch`)
|
140
|
-
6. Create new Pull Request
|
141
|
-
|
142
|
-
You may wish to read the [Git book online](http://git-scm.com/book/en/v2).
|
143
|
-
|
144
|
-
|
145
|
-
## License
|
146
|
-
|
147
|
-
frequent-algorithm is provided under the terms of the MIT license.
|
148
|
-
|
149
|
-
Copyright © 2015, Willie Tong & Brooke M. Fujita. All rights reserved.
|
1
|
+
# frequent-algorithm [![Gem Version](https://badge.fury.io/rb/frequent-algorithm.svg)](http://badge.fury.io/rb/frequent-algorithm) [![Build Status](https://travis-ci.org/buruzaemon/frequent-algorithm.svg)](https://travis-ci.org/buruzaemon/frequent-algorithm)
|
2
|
+
|
3
|
+
Web site usage, social network behavior and Internet traffic are examples
|
4
|
+
of systems that appear to follow the [Power law](http://en.wikipedia.org/wiki/Power_law),
|
5
|
+
where most of the events are due to the actions of a very small few.
|
6
|
+
Knowing at any given point in time which items are trending is valuable
|
7
|
+
in understanding the system.
|
8
|
+
|
9
|
+
`frequent-algorithm` is a Ruby implementation of the FREQUENT algorithm
|
10
|
+
for identifying frequent items in a data stream in sliding windows.
|
11
|
+
Please refer to [Identifying Frequent Items in Sliding Windows over On-Line
|
12
|
+
Packet Streams](http://erikdemaine.org/papers/SlidingWindow_IMC2003/), by
|
13
|
+
Golab, DeHaan, Demaine, López-Ortiz and Munro (2003).
|
14
|
+
|
15
|
+
## Introduction
|
16
|
+
|
17
|
+
### Challenges
|
18
|
+
|
19
|
+
Challenges for Real-time processing of data streams for _frequent item queries_
|
20
|
+
include:
|
21
|
+
|
22
|
+
* data may be of unknown and possibly unbound length
|
23
|
+
* data may be arriving a very fast rate
|
24
|
+
* it might not be possible to go back and re-read the data
|
25
|
+
* too large a window of observation may include stale data
|
26
|
+
|
27
|
+
Therefore, a solution should have the following characteristics:
|
28
|
+
|
29
|
+
* uses limited memory
|
30
|
+
* can process events in the stream in Ο(1) constant time
|
31
|
+
* requires only a single-pass over the data
|
32
|
+
|
33
|
+
|
34
|
+
### The algorithm
|
35
|
+
|
36
|
+
> LOOP<br/>
|
37
|
+
> 1. For each element e in the next b elements:<br/>
|
38
|
+
> If a local counter exists for the type of element e:<br/>
|
39
|
+
> Increment the local counter.<br/>
|
40
|
+
> Otherwise:<br/>
|
41
|
+
> Create a new local counter for this element type<br/>
|
42
|
+
> and set it equal to 1.<br/>
|
43
|
+
> 2. Add a summary S containing identities and counts of the k most frequent items to the back of queue Q.<br/>
|
44
|
+
> 3. Delete all local counters<br/>
|
45
|
+
> 4. For each type named in S:<br/>
|
46
|
+
> If a global counter exists for this type:<br/>
|
47
|
+
> Add to it the count recorded in S.<br/>
|
48
|
+
> Otherwise:<br/>
|
49
|
+
> Create a new global counter for this element type<br/>
|
50
|
+
> and set it equal to the count recorded in S.<br/>
|
51
|
+
> 5. Add the count of the kth largest type in S to δ.<br/>
|
52
|
+
> 6. If sizeOf(Q) > N/b:<br/>
|
53
|
+
> (a) Remove the summary S' from the front of Q and subtract the count of the kth largest type in S' from δ.<br/>
|
54
|
+
> (b) For all element types named in S':<br/>
|
55
|
+
> Subtract from their global counters the counts<br/>
|
56
|
+
> recorded in S'<br/>
|
57
|
+
> If a counter is decremented to zero:<br/>
|
58
|
+
> Delete it.<br/>
|
59
|
+
> (c) Output the identity and value of each global counter > δ.
|
60
|
+
>
|
61
|
+
> — <cite>Golab, DeHaan, Demaine, López-Ortiz and Munro. Identifying Frequent Items in Sliding Windows over On-Line Packet Streams, 2003</cite>
|
62
|
+
|
63
|
+
|
64
|
+
## Usage
|
65
|
+
|
66
|
+
require 'frequent-algorithm'
|
67
|
+
|
68
|
+
# data is pi to 1000 digits
|
69
|
+
pi = File.read('test/frequent/test_data_pi').strip
|
70
|
+
data = pi.scan(/./).each_slice(b)
|
71
|
+
|
72
|
+
N = 100 # size of main window
|
73
|
+
b = 20 # size of basic window
|
74
|
+
k = 3 # we are interested in top-3 numerals in pi
|
75
|
+
|
76
|
+
alg = Frequent::Algorithm.new(N, b, k)
|
77
|
+
|
78
|
+
# read in and process the 1st basic window
|
79
|
+
alg.process(data.next)
|
80
|
+
|
81
|
+
# and the top-3 numerals are?
|
82
|
+
top3 = alg.statistics.report
|
83
|
+
puts top3
|
84
|
+
|
85
|
+
# lather, rinse and repeat
|
86
|
+
alg.process(data.next)
|
87
|
+
|
88
|
+
|
89
|
+
## Development
|
90
|
+
|
91
|
+
The development of this gem requires the following:
|
92
|
+
|
93
|
+
* [Ruby 1.9.3 or greater](https://www.ruby-lang.org/en/)
|
94
|
+
* [rubygems](https://rubygems.org/pages/download)
|
95
|
+
* [`bundler`](https://github.com/bundler/bundler)
|
96
|
+
* [`rake`](https://github.com/ruby/rake)
|
97
|
+
* [`minitest`](https://rubygems.org/gems/minitest) (unit testing)
|
98
|
+
* [`yard`](https://rubygems.org/gems/yard) (documentation)
|
99
|
+
* [`rdiscount`](https://rubygems.org/gems/rdiscount) (Markdown)
|
100
|
+
|
101
|
+
Building, testing and release of this rubygem uses the following
|
102
|
+
`rake` commands:
|
103
|
+
|
104
|
+
|
105
|
+
rake clean # Remove any temporary products
|
106
|
+
rake clobber # Remove any generated file
|
107
|
+
rake test # Execute unit tests
|
108
|
+
rake build # Build frequent-algorithm-n.n.n.gem into the pkg directory
|
109
|
+
rake install # Build and install frequent-algorithm-n.n.n.gem into system gems
|
110
|
+
rake release # Create tag vn.n.n and build and push
|
111
|
+
# frequent-algorithm-n.n.n.gem to Rubygems
|
112
|
+
|
113
|
+
|
114
|
+
### Documentation
|
115
|
+
|
116
|
+
`frequent-algorithm` uses [`yard`](https://rubygems.org/gems/yard) and
|
117
|
+
[`rdiscount`](https://rubygems.org/gems/rdiscount) for Markdown documentation.
|
118
|
+
Check out [Getting Started with
|
119
|
+
Yard](http://www.rubydoc.info/gems/yard/file/docs/GettingStarted.md).
|
120
|
+
|
121
|
+
### Unit Testing
|
122
|
+
|
123
|
+
`frequent-algorithm` uses
|
124
|
+
[`MiniTest::Unit`](https://github.com/seattlerb/minitest) for
|
125
|
+
unit testing.
|
126
|
+
|
127
|
+
### Releasing
|
128
|
+
|
129
|
+
Please refer to Publishing To Rubygems.org in the
|
130
|
+
[Rubygems Guide](http://guides.rubygems.org/make-your-own-gem/).
|
131
|
+
|
132
|
+
### Contributing
|
133
|
+
|
134
|
+
1. Fork it
|
135
|
+
2. Begin work on `dev-branch` (`git fetch && git checkout dev-branch`)
|
136
|
+
3. Create your feature branch (`git branch my-new-feature && git checkout
|
137
|
+
my-new-feature`)
|
138
|
+
4. Commit your changes (`git commit -am 'Add some feature'`)
|
139
|
+
5. Push to the branch (`git push origin my-new-feature:dev-branch`)
|
140
|
+
6. Create new Pull Request
|
141
|
+
|
142
|
+
You may wish to read the [Git book online](http://git-scm.com/book/en/v2).
|
143
|
+
|
144
|
+
|
145
|
+
## License
|
146
|
+
|
147
|
+
frequent-algorithm is provided under the terms of the MIT license.
|
148
|
+
|
149
|
+
Copyright © 2015, Willie Tong & Brooke M. Fujita. All rights reserved.
|
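Note on the README's Usage snippet shown in this diff: read top to bottom, it calls `each_slice(b)` before `b` is assigned, which would raise a `NameError` if run as written. The sketch below shows the same windowing idea with the assignment moved first. It does not use the gem or its test data file; the `digits` string (the first 20 digits of pi, without the decimal point) and the window size of 5 are assumptions chosen only to keep the example short.

```ruby
# Stand-in for the README's test data: first 20 digits of pi, no decimal point.
# (Assumption for illustration; the README reads 1000 digits from a file.)
digits = '31415926535897932384'

b = 5 # basic-window size, assigned BEFORE it is used by each_slice

# each_slice without a block returns an Enumerator, so basic windows can be
# pulled one at a time with #next, as the README does with data.next.
windows = digits.chars.each_slice(b)

first_window = windows.next

# Per-window counts, equivalent to "Step 1" of the algorithm (local counters).
counts = first_window.each_with_object(Hash.new(0)) { |d, h| h[d] += 1 }

puts first_window.inspect
puts counts.inspect
```

In the README's own code the equivalent fix would presumably be to move the `N`, `b` and `k` assignments above the `data = ...` line.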
data/lib/frequent-algorithm.rb
CHANGED
@@ -1,28 +1,28 @@
|
|
1
|
-
# coding: utf-8
|
2
|
-
require 'frequent/algorithm'
|
3
|
-
|
4
|
-
=begin
|
5
|
-
|
6
|
-
The MIT License (MIT)
|
7
|
-
|
8
|
-
Copyright (c) 2015 Willie Tong, Brooke M. Fujita
|
9
|
-
|
10
|
-
Permission is hereby granted, free of charge, to any person obtaining a
|
11
|
-
copy of this software and associated documentation files (the "Software"),
|
12
|
-
to deal in the Software without restriction, including without limitation
|
13
|
-
the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
14
|
-
and/or sell copies of the Software, and to permit persons to whom the
|
15
|
-
Software is furnished to do so, subject to the following conditions:
|
16
|
-
|
17
|
-
The above copyright notice and this permission notice shall be included
|
18
|
-
in all copies or substantial portions of the Software.
|
19
|
-
|
20
|
-
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
21
|
-
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
22
|
-
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
23
|
-
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
24
|
-
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
25
|
-
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
26
|
-
IN THE SOFTWARE.
|
27
|
-
|
28
|
-
=end
|
1
|
+
# coding: utf-8
|
2
|
+
require 'frequent/algorithm'
|
3
|
+
|
4
|
+
=begin
|
5
|
+
|
6
|
+
The MIT License (MIT)
|
7
|
+
|
8
|
+
Copyright (c) 2015 Willie Tong, Brooke M. Fujita
|
9
|
+
|
10
|
+
Permission is hereby granted, free of charge, to any person obtaining a
|
11
|
+
copy of this software and associated documentation files (the "Software"),
|
12
|
+
to deal in the Software without restriction, including without limitation
|
13
|
+
the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
14
|
+
and/or sell copies of the Software, and to permit persons to whom the
|
15
|
+
Software is furnished to do so, subject to the following conditions:
|
16
|
+
|
17
|
+
The above copyright notice and this permission notice shall be included
|
18
|
+
in all copies or substantial portions of the Software.
|
19
|
+
|
20
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
21
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
22
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
23
|
+
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
24
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
25
|
+
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
26
|
+
IN THE SOFTWARE.
|
27
|
+
|
28
|
+
=end
|
data/lib/frequent/algorithm.rb
CHANGED
@@ -1,166 +1,182 @@
|
|
1
|
-
# coding: utf-8
|
2
|
-
require 'frequent/version'
|
3
|
-
|
4
|
-
module Frequent
|
5
|
-
|
6
|
-
|
7
|
-
|
8
|
-
|
9
|
-
#
|
10
|
-
#
|
11
|
-
#
|
12
|
-
#
|
13
|
-
#
|
14
|
-
#
|
15
|
-
|
16
|
-
|
17
|
-
|
18
|
-
|
19
|
-
|
20
|
-
|
21
|
-
|
22
|
-
|
23
|
-
|
24
|
-
|
25
|
-
|
26
|
-
|
27
|
-
|
28
|
-
|
29
|
-
#
|
30
|
-
|
31
|
-
|
32
|
-
#
|
33
|
-
#
|
34
|
-
# @
|
35
|
-
# @
|
36
|
-
# @
|
37
|
-
# @raise [ArgumentError] if n
|
38
|
-
|
39
|
-
|
40
|
-
|
41
|
-
|
42
|
-
if
|
43
|
-
raise ArgumentError.new('
|
44
|
-
end
|
45
|
-
if
|
46
|
-
raise ArgumentError.new('
|
47
|
-
end
|
48
|
-
if
|
49
|
-
raise ArgumentError.new('
|
50
|
-
end
|
51
|
-
|
52
|
-
|
53
|
-
|
54
|
-
|
55
|
-
@
|
56
|
-
@
|
57
|
-
|
58
|
-
|
59
|
-
|
60
|
-
|
61
|
-
|
62
|
-
|
63
|
-
#
|
64
|
-
#
|
65
|
-
|
66
|
-
|
67
|
-
|
68
|
-
|
69
|
-
#
|
70
|
-
|
71
|
-
|
72
|
-
|
73
|
-
|
74
|
-
|
75
|
-
|
76
|
-
|
77
|
-
|
78
|
-
|
79
|
-
|
80
|
-
|
81
|
-
|
82
|
-
# Step 2
|
83
|
-
|
84
|
-
|
85
|
-
|
86
|
-
|
87
|
-
|
88
|
-
# Step 4
|
89
|
-
summary.each do |
|
90
|
-
if @statistics.key?
|
91
|
-
@statistics[
|
92
|
-
else
|
93
|
-
@statistics[
|
94
|
-
end
|
95
|
-
end
|
96
|
-
|
97
|
-
# Step 5
|
98
|
-
@delta += summary
|
99
|
-
|
100
|
-
# Step 6
|
101
|
-
if
|
102
|
-
# a
|
103
|
-
summary_p = @queue.shift
|
104
|
-
@delta -=
|
105
|
-
|
106
|
-
# b
|
107
|
-
summary_p.each { |
|
108
|
-
@statistics.delete_if { |k,v| v <= 0 }
|
109
|
-
|
110
|
-
#c
|
111
|
-
@statistics.select { |k,v| v > @delta }
|
112
|
-
else
|
113
|
-
{}
|
114
|
-
end
|
115
|
-
end
|
116
|
-
|
117
|
-
# Returns the version for this gem.
|
118
|
-
#
|
119
|
-
# @return [String] the version for this gem.
|
120
|
-
def version
|
121
|
-
Frequent::VERSION
|
122
|
-
end
|
123
|
-
|
124
|
-
private
|
125
|
-
|
126
|
-
|
127
|
-
|
128
|
-
|
129
|
-
|
130
|
-
|
131
|
-
|
132
|
-
|
133
|
-
|
134
|
-
|
135
|
-
|
136
|
-
|
137
|
-
|
138
|
-
|
139
|
-
|
140
|
-
|
141
|
-
|
142
|
-
=
|
143
|
-
|
144
|
-
|
145
|
-
|
146
|
-
|
147
|
-
|
148
|
-
|
149
|
-
|
150
|
-
|
151
|
-
|
152
|
-
|
153
|
-
|
154
|
-
|
155
|
-
|
156
|
-
|
157
|
-
|
158
|
-
|
159
|
-
|
160
|
-
|
161
|
-
|
162
|
-
|
163
|
-
|
164
|
-
|
165
|
-
|
166
|
-
|
1
|
+
# coding: utf-8
|
2
|
+
require 'frequent/version'
|
3
|
+
|
4
|
+
module Frequent
|
5
|
+
|
6
|
+
ERR_BADLIST = "List cannot be nil or empty".freeze
|
7
|
+
ERR_BADK = "k must be between 1 and %s".freeze
|
8
|
+
|
9
|
+
# `Frequent::Algorithm` is the Ruby implementation of the
|
10
|
+
# Demaine et al. FREQUENT algorithm for calculating
|
11
|
+
# top-k items in a stream.
|
12
|
+
#
|
13
|
+
# The aims of this algorithm are:
|
14
|
+
# * use limited memory
|
15
|
+
# * require constant processing time per item
|
16
|
+
# * require a single-pass only
|
17
|
+
#
|
18
|
+
class Algorithm
|
19
|
+
# @return [Integer] the number of items in the main window
|
20
|
+
attr_reader :n
|
21
|
+
# @return [Integer] the number of items in a basic window
|
22
|
+
attr_reader :b
|
23
|
+
# @return [Integer] the number of top item categories to track
|
24
|
+
attr_reader :k
|
25
|
+
# @return [Array<Hash<Object,Integer>>] global queue for basic window summaries
|
26
|
+
attr_reader :queue
|
27
|
+
# @return [Hash<Object,Integer>] global mapping of items and counts
|
28
|
+
attr_reader :statistics
|
29
|
+
# @return [Integer] minimum threshold for membership in top-k items
|
30
|
+
attr_reader :delta
|
31
|
+
|
32
|
+
# Initializes this top-k frequency-calculating instance.
|
33
|
+
#
|
34
|
+
# @param [Integer] n number of items in the main window
|
35
|
+
# @param [Integer] b number of items in a basic window
|
36
|
+
# @param [Integer] k number of top item categories to track
|
37
|
+
# @raise [ArgumentError] if n is not greater than 0
|
38
|
+
# @raise [ArgumentError] if b is not greater than 0
|
39
|
+
# @raise [ArgumentError] if k is not greater than 0
|
40
|
+
# @raise [ArgumentError] if n/b is not greater than 1
|
41
|
+
def initialize(n, b, k=1)
|
42
|
+
if n <= 0
|
43
|
+
raise ArgumentError.new('n must be greater than 0')
|
44
|
+
end
|
45
|
+
if b <= 0
|
46
|
+
raise ArgumentError.new('b must be greater than 0')
|
47
|
+
end
|
48
|
+
if k <= 0
|
49
|
+
raise ArgumentError.new('k must be greater than 0')
|
50
|
+
end
|
51
|
+
if n/b < 1
|
52
|
+
raise ArgumentError.new('n/b must be greater than 1')
|
53
|
+
end
|
54
|
+
@n = n
|
55
|
+
@b = b
|
56
|
+
@k = k
|
57
|
+
|
58
|
+
@queue = []
|
59
|
+
@statistics = {}
|
60
|
+
@delta = 0
|
61
|
+
end
|
62
|
+
|
63
|
+
# Processes a single basic window of b items, by first adding
|
64
|
+
# a summary of this basic window in the internal global queue;
|
65
|
+
# and then updating the global statistics accordingly.
|
66
|
+
#
|
67
|
+
# @param [Array] an array of objects representing a basic window
|
68
|
+
def process(elements)
|
69
|
+
# Do we need this?
|
70
|
+
return if elements.length != @b
|
71
|
+
|
72
|
+
# Step 1
|
73
|
+
summary = {}
|
74
|
+
elements.each do |e|
|
75
|
+
if summary.key? e
|
76
|
+
summary[e] += 1
|
77
|
+
else
|
78
|
+
summary[e] = 1
|
79
|
+
end
|
80
|
+
end
|
81
|
+
|
82
|
+
# Step 2
|
83
|
+
@queue << summary
|
84
|
+
|
85
|
+
# Step 3
|
86
|
+
# Done, implicitly
|
87
|
+
|
88
|
+
# Step 4
|
89
|
+
summary.each do |k,v|
|
90
|
+
if @statistics.key? k
|
91
|
+
@statistics[k] += v
|
92
|
+
else
|
93
|
+
@statistics[k] = v
|
94
|
+
end
|
95
|
+
end
|
96
|
+
|
97
|
+
# Step 5
|
98
|
+
@delta += kth_largest(summary.values, @k)
|
99
|
+
|
100
|
+
# Step 6 - sizeOf(Q) > N/b
|
101
|
+
if @queue.length > @n/@b
|
102
|
+
# a
|
103
|
+
summary_p = @queue.shift
|
104
|
+
@delta -= kth_largest(summary_p.values, @k)
|
105
|
+
|
106
|
+
# b
|
107
|
+
summary_p.each { |k,v| @statistics[k] -= v }
|
108
|
+
@statistics.delete_if { |k,v| v <= 0 }
|
109
|
+
|
110
|
+
#c
|
111
|
+
@statistics.select { |k,v| v > @delta }
|
112
|
+
else
|
113
|
+
{}
|
114
|
+
end
|
115
|
+
end
|
116
|
+
|
117
|
+
# Returns the version for this gem.
|
118
|
+
#
|
119
|
+
# @return [String] the version for this gem.
|
120
|
+
def version
|
121
|
+
Frequent::VERSION
|
122
|
+
end
|
123
|
+
|
124
|
+
private
|
125
|
+
# Given a list of numbers and a number k which should be
|
126
|
+
# between 1 and the length of the given list, return the
|
127
|
+
# element x in the list that is larger than exactly k-1
|
128
|
+
# other elements in the list.
|
129
|
+
#
|
130
|
+
# @param [Array] list of integers
|
131
|
+
# @return [Integer] the kth largest element in list
|
132
|
+
def kth_largest(list, k)
|
133
|
+
raise ArgumentError.new(ERR_BADLIST) if list.nil? or list.empty?
|
134
|
+
raise ArgumentError.new(ERR_BADK) if k < 1
|
135
|
+
|
136
|
+
ulist = list.uniq
|
137
|
+
k = ulist.size if k > ulist.size
|
138
|
+
|
139
|
+
def quickselect(aset, k)
|
140
|
+
p = rand(aset.size)
|
141
|
+
|
142
|
+
lower = aset.select { |e| e < aset[p] }
|
143
|
+
upper = aset.select { |e| e > aset[p] }
|
144
|
+
|
145
|
+
if k <= lower.size
|
146
|
+
quickselect(lower, k)
|
147
|
+
elsif k > aset.size - upper.size
|
148
|
+
quickselect(upper, k - (aset.size - upper.size))
|
149
|
+
else
|
150
|
+
aset[p]
|
151
|
+
end
|
152
|
+
end
|
153
|
+
quickselect(ulist, ulist.size+1-k)
|
154
|
+
end
|
155
|
+
end
|
156
|
+
end
|
157
|
+
|
158
|
+
=begin
|
159
|
+
|
160
|
+
The MIT License (MIT)
|
161
|
+
|
162
|
+
Copyright (c) 2015 Willie Tong, Brooke M. Fujita
|
163
|
+
|
164
|
+
Permission is hereby granted, free of charge, to any person obtaining a
|
165
|
+
copy of this software and associated documentation files (the "Software"),
|
166
|
+
to deal in the Software without restriction, including without limitation
|
167
|
+
the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
168
|
+
and/or sell copies of the Software, and to permit persons to whom the
|
169
|
+
Software is furnished to do so, subject to the following conditions:
|
170
|
+
|
171
|
+
The above copyright notice and this permission notice shall be included
|
172
|
+
in all copies or substantial portions of the Software.
|
173
|
+
|
174
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
175
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
176
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
177
|
+
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
178
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
179
|
+
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
180
|
+
IN THE SOFTWARE.
|
181
|
+
|
182
|
+
=end
|
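To make the new `process` and `kth_largest` logic on the `+` side of this diff easier to follow, here is a minimal stand-alone sketch of the same steps. It is not the gem's code: the class name `SlidingTopK` is hypothetical, input validation is omitted, and `kth_largest` uses a plain sort over the distinct counts in place of the nested quickselect. Like the diffed code, it queues the full per-window counts rather than only the top-k entries described in the paper.

```ruby
# Hypothetical stand-alone sketch of the steps in the diff above.
# SlidingTopK is an illustrative name, not part of frequent-algorithm.
class SlidingTopK
  attr_reader :queue, :statistics, :delta

  def initialize(n, b, k)
    @n = n            # main window size
    @b = b            # basic window size
    @k = k            # number of top categories to track
    @queue = []       # summaries of the basic windows currently in the main window
    @statistics = Hash.new(0) # global counters
    @delta = 0        # running threshold
  end

  # Processes one basic window and returns the counters that exceed delta,
  # or {} while the queue has not yet grown past n / b summaries.
  def process(elements)
    # Steps 1-3: local counters for this window, queued as a summary
    summary = elements.each_with_object(Hash.new(0)) { |e, h| h[e] += 1 }
    @queue << summary

    # Step 4: fold the summary into the global counters
    summary.each { |item, count| @statistics[item] += count }

    # Step 5: raise delta by the kth largest distinct count of this window
    @delta += kth_largest(summary.values, @k)

    # Step 6: only once the queue holds more than n / b summaries
    return {} unless @queue.length > @n / @b

    expired = @queue.shift                                      # (a)
    @delta -= kth_largest(expired.values, @k)
    expired.each { |item, count| @statistics[item] -= count }   # (b)
    @statistics.delete_if { |_, count| count <= 0 }
    @statistics.select { |_, count| count > @delta }            # (c)
  end

  private

  # kth largest among the DISTINCT values; k is clamped to the number of
  # distinct values, matching the clamping in the diffed kth_largest.
  def kth_largest(values, k)
    distinct = values.uniq.sort.reverse
    distinct[[k, distinct.size].min - 1]
  end
end

alg = SlidingTopK.new(6, 3, 2)
alg.process(%w[a a b])
alg.process(%w[a a c])
result = alg.process(%w[a b b])
puts result.inspect
```

With `n = 6`, `b = 3`, `k = 2`, the first two calls return `{}` because the queue has not yet exceeded `n / b = 2` summaries. On the third call the oldest summary is expired, `delta` settles at 2, and only `"a"` (count 3) exceeds it. The sort-based `kth_largest` costs O(m log m) in the number of distinct counts, which trades the expected linear time of quickselect for brevity.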
data/lib/frequent/version.rb
CHANGED
@@ -1,38 +1,38 @@
|
|
1
|
-
# coding: utf-8
|
2
|
-
|
3
|
-
# `Frequent` is the namespace for objects implementing
|
4
|
-
# the Demaine et al. FREQUENT algorithm for finding
|
5
|
-
# the most frequently-appearing items (top-k) in a
|
6
|
-
# data stream in sliding windows.
|
7
|
-
#
|
8
|
-
# `Frequent::Algorithm` is the implementation class.
|
9
|
-
module Frequent
|
10
|
-
# Version string for this Rubygem.
|
11
|
-
VERSION = '0.0.
|
12
|
-
end
|
13
|
-
|
14
|
-
=begin
|
15
|
-
|
16
|
-
The MIT License (MIT)
|
17
|
-
|
18
|
-
Copyright (c) 2015 Willie Tong, Brooke M. Fujita
|
19
|
-
|
20
|
-
Permission is hereby granted, free of charge, to any person obtaining a
|
21
|
-
copy of this software and associated documentation files (the "Software"),
|
22
|
-
to deal in the Software without restriction, including without limitation
|
23
|
-
the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
24
|
-
and/or sell copies of the Software, and to permit persons to whom the
|
25
|
-
Software is furnished to do so, subject to the following conditions:
|
26
|
-
|
27
|
-
The above copyright notice and this permission notice shall be included
|
28
|
-
in all copies or substantial portions of the Software.
|
29
|
-
|
30
|
-
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
31
|
-
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
32
|
-
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
33
|
-
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
34
|
-
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
35
|
-
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
36
|
-
IN THE SOFTWARE.
|
37
|
-
|
38
|
-
=end
|
1
|
+
# coding: utf-8
|
2
|
+
|
3
|
+
# `Frequent` is the namespace for objects implementing
|
4
|
+
# the Demaine et al. FREQUENT algorithm for finding
|
5
|
+
# the most frequently-appearing items (top-k) in a
|
6
|
+
# data stream in sliding windows.
|
7
|
+
#
|
8
|
+
# `Frequent::Algorithm` is the implementation class.
|
9
|
+
module Frequent
|
10
|
+
# Version string for this Rubygem.
|
11
|
+
VERSION = '0.0.3'
|
12
|
+
end
|
13
|
+
|
14
|
+
=begin
|
15
|
+
|
16
|
+
The MIT License (MIT)
|
17
|
+
|
18
|
+
Copyright (c) 2015 Willie Tong, Brooke M. Fujita
|
19
|
+
|
20
|
+
Permission is hereby granted, free of charge, to any person obtaining a
|
21
|
+
copy of this software and associated documentation files (the "Software"),
|
22
|
+
to deal in the Software without restriction, including without limitation
|
23
|
+
the rights to use, copy, modify, merge, publish, distribute, sublicense,
|
24
|
+
and/or sell copies of the Software, and to permit persons to whom the
|
25
|
+
Software is furnished to do so, subject to the following conditions:
|
26
|
+
|
27
|
+
The above copyright notice and this permission notice shall be included
|
28
|
+
in all copies or substantial portions of the Software.
|
29
|
+
|
30
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
31
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
32
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
|
33
|
+
THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
34
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
|
35
|
+
FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
|
36
|
+
IN THE SOFTWARE.
|
37
|
+
|
38
|
+
=end
|
metadata
CHANGED
@@ -1,44 +1,44 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: frequent-algorithm
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.0.
|
4
|
+
version: 0.0.3
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Willie Tong
|
8
8
|
- Brooke M. Fujita
|
9
|
-
autorequire:
|
9
|
+
autorequire:
|
10
10
|
bindir: bin
|
11
11
|
cert_chain: []
|
12
|
-
date: 2015-03-
|
12
|
+
date: 2015-03-23 00:00:00.000000000 Z
|
13
13
|
dependencies:
|
14
14
|
- !ruby/object:Gem::Dependency
|
15
15
|
name: rake
|
16
|
-
|
16
|
+
version_requirements: !ruby/object:Gem::Requirement
|
17
17
|
requirements:
|
18
|
-
- -
|
18
|
+
- - ">="
|
19
19
|
- !ruby/object:Gem::Version
|
20
20
|
version: '0'
|
21
|
-
|
22
|
-
prerelease: false
|
23
|
-
version_requirements: !ruby/object:Gem::Requirement
|
21
|
+
requirement: !ruby/object:Gem::Requirement
|
24
22
|
requirements:
|
25
|
-
- -
|
23
|
+
- - ">="
|
26
24
|
- !ruby/object:Gem::Version
|
27
25
|
version: '0'
|
26
|
+
prerelease: false
|
27
|
+
type: :development
|
28
28
|
- !ruby/object:Gem::Dependency
|
29
29
|
name: minitest
|
30
|
-
|
30
|
+
version_requirements: !ruby/object:Gem::Requirement
|
31
31
|
requirements:
|
32
|
-
- -
|
32
|
+
- - ">="
|
33
33
|
- !ruby/object:Gem::Version
|
34
34
|
version: '0'
|
35
|
-
|
36
|
-
prerelease: false
|
37
|
-
version_requirements: !ruby/object:Gem::Requirement
|
35
|
+
requirement: !ruby/object:Gem::Requirement
|
38
36
|
requirements:
|
39
|
-
- -
|
37
|
+
- - ">="
|
40
38
|
- !ruby/object:Gem::Version
|
41
39
|
version: '0'
|
40
|
+
prerelease: false
|
41
|
+
type: :development
|
42
42
|
description: |
|
43
43
|
frequent-algorithm is a Ruby implementation of the Demaine et al FREQUENT algorithm for identifying frequent items in a data stream in sliding windows (c.f Identifying Frequent Items in Sliding Windows over On-Line Packet Streams, 2003).
|
44
44
|
email:
|
@@ -48,36 +48,35 @@ executables: []
|
|
48
48
|
extensions: []
|
49
49
|
extra_rdoc_files: []
|
50
50
|
files:
|
51
|
+
- ".yardopts"
|
52
|
+
- CHANGELOG
|
53
|
+
- LICENSE
|
54
|
+
- README.md
|
51
55
|
- lib/frequent-algorithm.rb
|
52
56
|
- lib/frequent/algorithm.rb
|
53
57
|
- lib/frequent/version.rb
|
54
|
-
- README.md
|
55
|
-
- LICENSE
|
56
|
-
- CHANGELOG
|
57
|
-
- .yardopts
|
58
58
|
homepage: https://github.com/buruzaemon/frequent-algorithm
|
59
59
|
licenses:
|
60
60
|
- MIT
|
61
61
|
metadata: {}
|
62
|
-
post_install_message:
|
62
|
+
post_install_message:
|
63
63
|
rdoc_options: []
|
64
64
|
require_paths:
|
65
65
|
- lib
|
66
66
|
required_ruby_version: !ruby/object:Gem::Requirement
|
67
67
|
requirements:
|
68
|
-
- -
|
68
|
+
- - ">="
|
69
69
|
- !ruby/object:Gem::Version
|
70
70
|
version: '2.0'
|
71
71
|
required_rubygems_version: !ruby/object:Gem::Requirement
|
72
72
|
requirements:
|
73
|
-
- -
|
73
|
+
- - ">="
|
74
74
|
- !ruby/object:Gem::Version
|
75
75
|
version: '0'
|
76
76
|
requirements: []
|
77
|
-
rubyforge_project:
|
78
|
-
rubygems_version: 2.
|
79
|
-
signing_key:
|
77
|
+
rubyforge_project:
|
78
|
+
rubygems_version: 2.4.5
|
79
|
+
signing_key:
|
80
80
|
specification_version: 4
|
81
|
-
summary: Identifies frequent items in a data stream in sliding windows using the Demaine
|
82
|
-
et al FREQUENT algorithm.
|
81
|
+
summary: Identifies frequent items in a data stream in sliding windows using the Demaine et al FREQUENT algorithm.
|
83
82
|
test_files: []
|