aggregator 1.0.0
- checksums.yaml +7 -0
- data/.gitignore +17 -0
- data/.travis.yml +7 -0
- data/Gemfile +2 -0
- data/LICENSE.txt +22 -0
- data/README.md +101 -0
- data/Rakefile +8 -0
- data/aggregator.gemspec +24 -0
- data/lib/aggregator.rb +144 -0
- data/lib/aggregator/version.rb +3 -0
- data/test/minitest_helper.rb +5 -0
- data/test/test_aggregator.rb +79 -0
- metadata +87 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
+---
+SHA1:
+  metadata.gz: cb22458b343af2f63b3c053075447a73c12b3da1
+  data.tar.gz: e3ea269f9c95a9d02b713449afe14e110d7c0145
+SHA512:
+  metadata.gz: 07e8a50c88287282d418cc0913d6f43ff7791b49c2437633037bbb4e1bb12e78ef1d7fb83f78dfe50b489082cd4aae416b876350c4226fb6773480626ec3108e
+  data.tar.gz: 64922c7189f36b8fbbfc6bdb9da4e24591c76b023b553cf9b6ddb0d8a74e8053b367600a439423d4d8828261055a4a9a0b1b15b01c0e96090dad71a42ed63de3
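checksums.yaml records SHA1 and SHA512 digests of the two archives packed inside the .gem file, metadata.gz and data.tar.gz. Here is a minimal sketch of checking the SHA512 entries. It assumes both archives have already been read into memory. `checksums_match?` is a hypothetical helper, not a RubyGems API, and the SHA1 section is ignored.

```ruby
require "digest"
require "yaml"

# Hypothetical helper: returns true when every SHA512 digest listed in a
# checksums.yaml document matches the bytes supplied in `files`, a Hash
# mapping entry names ("metadata.gz", "data.tar.gz") to raw contents.
def checksums_match?(checksums_yaml, files)
  expected = YAML.safe_load(checksums_yaml).fetch("SHA512")
  expected.all? do |name, digest|
    files.key?(name) && Digest::SHA512.hexdigest(files[name]) == digest
  end
end
```

In practice, `files` would come from reading the entries out of the .gem tar archive (for example with `Gem::Package::TarReader`), and the YAML would come from gunzipping its checksums.yaml.gz entry.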
data/.gitignore
ADDED
data/.travis.yml
ADDED
data/Gemfile
ADDED
data/LICENSE.txt
ADDED
@@ -0,0 +1,22 @@
+Copyright (c) 2013 Adtile, Inc.
+
+MIT License
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,101 @@
+# Aggregator
+
+[![Build Status](https://travis-ci.org/adtile/aggregator.png?branch=master)](https://travis-ci.org/adtile/aggregator)
+[![Code Climate](https://codeclimate.com/github/adtile/aggregator.png)](https://codeclimate.com/github/adtile/aggregator)
+
+Aggregator is a Ruby gem that runs aggregation work on a separate thread. Instead of performing an expensive operation for every item, you can perform one batch operation less often.
+
+## Installation
+
+    $ gem install aggregator
+
+Or add it to your Gemfile.
+
+## Usage
+
+Let's create a sample aggregator for a Rails application to keep track of pageviews:
+
+``` ruby
+class PageviewAggregator < Aggregator
+  def process(collection, item)
+    collection ||= {}
+    collection[item] = collection.fetch(item, 0) + 1
+    collection
+  end
+
+  def finish(collection)
+    # Update the database based on your aggregated data:
+    # { "/" => 471, "/about" => 127, ... }
+  end
+end
+```
+
+Then, in a Rails controller action, you could push the current page path:
+
+``` ruby
+PageviewAggregator.push(request.path)
+```
+
+That's it! Let's go through what happens in more detail.
+
+Each pushed item is passed to `#process` on the background thread, together with the current collection. At the start of each batch, `collection` is `nil`. You are responsible for initializing and managing it, so it can be any object you want (Hash, Array, etc.). You must always return the collection object from this method. Because it is called once for every pushed item, you'll want to keep it fast.
+
+Whenever a batch is ready, `#finish` is called with the final collection. You can do whatever you want with it there. Most likely you'll save it to a database.
+
+A batch is considered ready whenever one of two things happens:
+
+- A configured number of items has been processed.
+- A configured amount of time has passed since the batch started.
+
+See the configuration options below for how to set these values.
+
+### Configuration options
+
+Configuration options are defined for each `Aggregator` subclass and are class methods that must be explicitly called on `self`:
+
+```ruby
+class MyAggregator < Aggregator
+  self.option_name = <value>
+end
+```
+
+The available options are:
+
+- `.max_batch_size=`: maximum number of items to process before a batch is considered ready and `#finish` is called. Defaults to 1000.
+
+- `.max_wait_time=`: maximum number of seconds a batch may run before it is considered ready. Defaults to 1.
+
+- `.logger=`: logger to use. In a Rails application you probably want to set it to `Rails.logger`. Defaults to `Logger.new(STDOUT)`.
+
+### Testing
+
+When you're writing tests for your application, you might need to wait until the aggregations run before you can assert something. In that case, call `.drain` on your Aggregator subclass. It blocks until all items have been processed and finished:
+
+``` ruby
+it "saves all aggregations to the database" do
+  5.times { get "/page" }
+  PageviewAggregator.drain
+  pageviews = Pageview.find("/page").total
+  expect(pageviews).to eq(5)
+end
+```
+
+## Guarantees and gotchas
+
+All background threads are managed for you. A crashed thread is restarted the next time an item is pushed, or when `.drain` is called with items still queued. However, if a thread crashes because of an exception raised in `#process`, that item may be lost forever. Similarly, if an exception escapes `#finish`, the entire collection will be lost. It is up to you to rescue and retry based on your needs.
+
+One and only one background thread is started for each Aggregator subclass.
+
+When the process exits gracefully (e.g. web server shutdown), running aggregators will finish processing all items.
+
+## License
+
+MIT License.
+
+## Contributing
+
+1. Fork it ( http://github.com/<my-github-username>/aggregator/fork )
+2. Create your feature branch (`git checkout -b my-new-feature`)
+3. Commit your changes (`git commit -am 'Add some feature'`)
+4. Push to the branch (`git push origin my-new-feature`)
+5. Create new Pull Request
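The README's "Guarantees and gotchas" section leaves rescuing and retrying to you. Here is a minimal sketch of one way to do that. It relies on two hypothetical helpers that are not part of the gem: `with_retries`, and a `FlakyStore` that stands in for a real database.

```ruby
# Hypothetical helper: run the block, retrying up to `attempts` times in
# total when it raises a StandardError, and re-raise the last error if
# every attempt fails.
def with_retries(attempts: 3, wait: 0.0)
  tries = 0
  begin
    tries += 1
    yield
  rescue StandardError
    raise if tries >= attempts
    sleep(wait)
    retry
  end
end

# Hypothetical stand-in for a database that fails a fixed number of
# times before accepting writes.
class FlakyStore
  attr_reader :saved

  def initialize(failures)
    @failures = failures
    @saved = []
  end

  def save(collection)
    if @failures > 0
      @failures -= 1
      raise IOError, "temporary failure"
    end
    @saved << collection
  end
end
```

Inside an Aggregator subclass, `#finish` could wrap its write the same way, e.g. `with_retries(wait: 0.5) { Pageview.save_counts(collection) }`, where `Pageview.save_counts` is hypothetical. If every attempt fails, the error still escapes `#finish` and that batch is lost. Rescue it and persist the collection elsewhere if that is unacceptable.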
data/Rakefile
ADDED
data/aggregator.gemspec
ADDED
@@ -0,0 +1,24 @@
+# coding: utf-8
+lib = File.expand_path('../lib', __FILE__)
+$LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+require 'aggregator/version'
+
+Gem::Specification.new do |spec|
+  spec.name = "aggregator"
+  spec.version = Aggregator::VERSION
+  spec.authors = ["Joao Carlos"]
+  spec.email = ["joao@adtile.me"]
+  spec.summary = %q{Aggregate items on a separate thread.}
+  spec.description = %q{Define aggregators that run on a separate thread so that you can do more, faster.}
+  spec.homepage = ""
+  spec.license = "MIT"
+
+  spec.files = `git ls-files`.split($/)
+  spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
+  spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
+  spec.require_paths = ["lib"]
+
+  spec.add_development_dependency "rake"
+  spec.add_development_dependency "minitest", ">= 5.0.0"
+  spec.add_development_dependency "rubysl", "~> 2.0" if RUBY_ENGINE == "rbx"
+end
data/lib/aggregator.rb
ADDED
@@ -0,0 +1,144 @@
+require "thread"
+require "singleton"
+require "logger"
+
+class Aggregator
+  include Singleton
+
+  attr_accessor :max_batch_size, :max_wait_time, :logger
+
+  def self.push(data)
+    self.instance.push(data)
+  end
+
+  def self.max_batch_size=(value)
+    self.instance.max_batch_size = value
+  end
+
+  def self.max_wait_time=(value)
+    self.instance.max_wait_time = value
+  end
+
+  def self.logger=(logger)
+    self.instance.logger = logger
+  end
+
+  def self.drain
+    self.instance.drain
+  end
+
+  def initialize
+    @queue = Queue.new
+    @mutex = Mutex.new
+    @thread = nil
+
+    at_exit { stop }
+  end
+
+  def push(data)
+    @queue.push(data)
+    start unless running?
+  end
+
+  def drain
+    if running?
+      if ! @queue.empty?
+        log :info, "joining thread #{@thread.inspect} (queue length = #{@queue.length})"
+        @drain = true
+        @thread.join if running?
+      end
+
+      log :info, "stopping thread #{@thread.inspect} (queue length = #{@queue.length})"
+      @thread = nil
+    elsif ! @queue.empty?
+      start and drain
+    end
+
+    true
+  end
+
+  private
+
+  def max_batch_size
+    @max_batch_size || 1000
+  end
+
+  def max_wait_time
+    @max_wait_time || 1
+  end
+
+  def process(collection, item)
+    raise NoMethodError,
+      "#{self.class.name}#process(collection, item) must be implemented"
+  end
+
+  def finish(collection)
+    raise NoMethodError,
+      "#{self.class.name}#finish(collection) must be implemented"
+  end
+
+  def running?
+    @thread && @thread.alive?
+  end
+
+  def logger
+    @logger ||= Logger.new(STDOUT)
+  end
+
+  def log(level, message)
+    logger.send(level, "[#{self.class.name}] #{message}")
+  end
+
+  def process_queue
+    raise StopIteration if @queue.empty? && @drain
+
+    processed_items = 0
+    start_time = Time.now
+
+    while processed_items < max_batch_size && (Time.now - start_time) < max_wait_time
+      raise StopIteration if @queue.empty? && @drain
+      if @queue.empty?
+        sleep 0.1
+      else
+        collection = process(collection, @queue.pop(true))
+        processed_items += 1
+      end
+    end
+  ensure
+    finish(collection) if collection
+  end
+
+  def start
+    @mutex.synchronize do
+      return false if running?
+
+      @drain = false
+
+      @thread = Thread.new do
+        begin
+          log :info, "starting thread #{Thread.current}"
+
+          loop do
+            process_queue
+          end
+        rescue Exception => e
+          log :warn, "thread crashed with exception: #{e.inspect}"
+        end
+      end
+
+      @thread.priority = 2
+
+      @thread
+    end
+  end
+
+  def stop
+    if running?
+      drain
+    else
+      log :info, "thread not running - nothing to stop"
+      return false
+    end
+  end
+
+end
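`process_queue` and `drain` together form a small producer/consumer protocol. The worker folds items into a collection until the batch is full or `max_wait_time` elapses, and a `@drain` flag tells it to stop once the queue is empty. The dependency-free sketch below reproduces that loop so it can be run and inspected on its own. `BatchWorker` and its methods are hypothetical names, and the sketch omits the gem's Singleton, logging, and crash restart. It is an illustration, not the gem's code.

```ruby
require "thread"

# Hypothetical, stripped-down worker mirroring Aggregator#process_queue.
class BatchWorker
  attr_reader :batches

  def initialize(max_batch_size:, max_wait_time:)
    @max_batch_size = max_batch_size
    @max_wait_time = max_wait_time
    @queue = Queue.new
    @drain = false
    @batches = []
    # Kernel#loop exits quietly when the block raises StopIteration.
    @thread = Thread.new { loop { run_batch } }
  end

  def push(item)
    @queue.push(item)
  end

  # Ask the worker to stop once the queue is empty, then wait for it.
  def drain
    @drain = true
    @thread.join
    self
  end

  private

  # One batch: fold items until the batch is full or the time window ends.
  def run_batch
    collection = nil
    processed = 0
    started = Time.now

    while processed < @max_batch_size && (Time.now - started) < @max_wait_time
      raise StopIteration if @queue.empty? && @drain

      if @queue.empty?
        sleep 0.01
      else
        collection = process(collection, @queue.pop(true))
        processed += 1
      end
    end
  ensure
    # Runs on normal exit and on StopIteration, so partial batches are kept.
    finish(collection) if collection
  end

  def process(collection, item)
    (collection || []) << item
  end

  def finish(collection)
    @batches << collection
  end
end
```

This sketch always signals the worker and joins it, so no thread keeps running after `drain` returns. The gem's `drain` behaves differently: it sets `@drain` and joins only when the queue is non-empty. Otherwise it just clears `@thread`, which leaves the worker thread alive. Pushing 100 items with `max_batch_size: 25` and a generous `max_wait_time` yields four batches. That matches what `test_it_processes_in_batches` expects of the gem.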
data/test/test_aggregator.rb
ADDED
@@ -0,0 +1,79 @@
+require "minitest_helper"
+require "stringio"
+
+class TestAggregator < Minitest::Test
+
+  class EventAggregator < Aggregator
+    attr_reader :process_counter, :finish_counter
+
+    self.max_wait_time = 2
+    self.max_batch_size = 25
+    self.logger = Logger.new(StringIO.new)
+
+    def reset_counters
+      @process_counter = 0
+      @finish_counter = 0
+    end
+
+    def process(collection, item)
+      fail if item.nil?
+      collection ||= []
+      collection << item
+      collection
+    end
+
+    def finish(collection)
+      @process_counter ||= 0
+      @process_counter += collection.count
+
+      @finish_counter ||= 0
+      @finish_counter += 1
+    end
+  end
+
+  def setup
+    EventAggregator.instance.reset_counters
+    EventAggregator.instance.instance_variable_get(:@queue).clear
+  end
+
+  def test_it_processes_pushed_items
+    100.times { EventAggregator.push({}) }
+    EventAggregator.drain
+    assert_equal 100, EventAggregator.instance.process_counter
+  end
+
+  def test_it_processes_in_batches
+    100.times { EventAggregator.push({}) }
+    EventAggregator.drain
+    assert_equal 4, EventAggregator.instance.finish_counter
+  end
+
+  def test_it_can_drain_multiple_times
+    100.times do
+      EventAggregator.instance.reset_counters
+      90.times { EventAggregator.push({}) }
+      EventAggregator.drain
+      assert_equal 90, EventAggregator.instance.process_counter
+    end
+  end
+
+  def test_it_can_recover_from_a_thread_crash
+    100.times do |i|
+      item = (i == 35) ? nil : {}
+      EventAggregator.push(item)
+    end
+
+    EventAggregator.drain
+    EventAggregator.push({})
+    EventAggregator.drain
+
+    assert_equal 100, EventAggregator.instance.process_counter
+  end
+
+  def test_it_drains_the_queue_even_if_the_thread_was_not_running
+    EventAggregator.instance.instance_variable_get(:@queue).push({})
+    EventAggregator.drain
+    assert_equal 1, EventAggregator.instance.process_counter
+  end
+
+end
metadata
ADDED
@@ -0,0 +1,87 @@
+--- !ruby/object:Gem::Specification
+name: aggregator
+version: !ruby/object:Gem::Version
+  version: 1.0.0
+platform: ruby
+authors:
+- Joao Carlos
+autorequire:
+bindir: bin
+cert_chain: []
+date: 2013-12-20 00:00:00.000000000 Z
+dependencies:
+- !ruby/object:Gem::Dependency
+  name: rake
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: '0'
+- !ruby/object:Gem::Dependency
+  name: minitest
+  requirement: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: 5.0.0
+  type: :development
+  prerelease: false
+  version_requirements: !ruby/object:Gem::Requirement
+    requirements:
+    - - '>='
+      - !ruby/object:Gem::Version
+        version: 5.0.0
+description: Define aggregators that run on a separate thread so that you can do more,
+  faster.
+email:
+- joao@adtile.me
+executables: []
+extensions: []
+extra_rdoc_files: []
+files:
+- .gitignore
+- .travis.yml
+- Gemfile
+- LICENSE.txt
+- README.md
+- Rakefile
+- aggregator.gemspec
+- lib/aggregator.rb
+- lib/aggregator/version.rb
+- test/minitest_helper.rb
+- test/test_aggregator.rb
+homepage: ''
+licenses:
+- MIT
+metadata: {}
+post_install_message:
+rdoc_options: []
+require_paths:
+- lib
+required_ruby_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+required_rubygems_version: !ruby/object:Gem::Requirement
+  requirements:
+  - - '>='
+    - !ruby/object:Gem::Version
+      version: '0'
+requirements: []
+rubyforge_project:
+rubygems_version: 2.0.3
+signing_key:
+specification_version: 4
+summary: Aggregate items on a separate thread.
+test_files:
+- test/minitest_helper.rb
+- test/test_aggregator.rb
+has_rdoc: