aggregator 1.0.0

@@ -0,0 +1,7 @@
+ ---
+ SHA1:
+   metadata.gz: cb22458b343af2f63b3c053075447a73c12b3da1
+   data.tar.gz: e3ea269f9c95a9d02b713449afe14e110d7c0145
+ SHA512:
+   metadata.gz: 07e8a50c88287282d418cc0913d6f43ff7791b49c2437633037bbb4e1bb12e78ef1d7fb83f78dfe50b489082cd4aae416b876350c4226fb6773480626ec3108e
+   data.tar.gz: 64922c7189f36b8fbbfc6bdb9da4e24591c76b023b553cf9b6ddb0d8a74e8053b367600a439423d4d8828261055a4a9a0b1b15b01c0e96090dad71a42ed63de3
@@ -0,0 +1,17 @@
+ *.gem
+ *.rbc
+ .bundle
+ .config
+ .yardoc
+ Gemfile.lock
+ InstalledFiles
+ _yardoc
+ coverage
+ doc/
+ lib/bundler/man
+ pkg
+ rdoc
+ spec/reports
+ test/tmp
+ test/version_tmp
+ tmp
@@ -0,0 +1,7 @@
+ language: ruby
+ rvm:
+   - 1.9.3
+   - 2.0.0
+   - 2.1.0-preview2
+   - rbx
+   - jruby
data/Gemfile ADDED
@@ -0,0 +1,2 @@
+ source "https://rubygems.org"
+ gemspec
@@ -0,0 +1,22 @@
+ Copyright (c) 2013 Adtile, Inc.
+
+ MIT License
+
+ Permission is hereby granted, free of charge, to any person obtaining
+ a copy of this software and associated documentation files (the
+ "Software"), to deal in the Software without restriction, including
+ without limitation the rights to use, copy, modify, merge, publish,
+ distribute, sublicense, and/or sell copies of the Software, and to
+ permit persons to whom the Software is furnished to do so, subject to
+ the following conditions:
+
+ The above copyright notice and this permission notice shall be
+ included in all copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+ EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
+ NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
+ LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
+ OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+ WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,101 @@
+ # Aggregator
+
+ [![Build Status](https://travis-ci.org/adtile/aggregator.png?branch=master)](https://travis-ci.org/adtile/aggregator)
+ [![Code Climate](https://codeclimate.com/github/adtile/aggregator.png)](https://codeclimate.com/github/adtile/aggregator)
+
+ Aggregator is a Ruby gem that lets you run aggregation work on a separate thread, so that instead of performing an expensive operation for every item you can perform one batch operation less frequently.
+
+ ## Installation
+
+     $ gem install aggregator
+
+ Or add it to your Gemfile.
+
+ ## Usage
+
+ Let's create a sample aggregator for a Rails application to keep track of pageviews:
+
+ ``` ruby
+ class PageviewAggregator < Aggregator
+   def process(collection, item)
+     collection ||= {}
+     collection[item] = collection.fetch(item, 0) + 1
+     collection
+   end
+
+   def finish(collection)
+     # Update the database based on your aggregated data:
+     # { "/" => 471, "/about" => 127, ... }
+   end
+ end
+ ```
+
+ Then, in a Rails controller action you could push the current page path:
+
+ ``` ruby
+ PageviewAggregator.push(request.path)
+ ```
+
+ That's it! Let's go through what happens in more detail.
+
+ For every pushed item, the `#process` method is called with the current collection and the item. At the start of each batch, `collection` is `nil`, and it is your responsibility to initialize and manage it, so it can be any object you want (Hash, Array, etc.). You must always return the collection object from this method. Since it is called for every pushed item, you'll want to keep it fast.
+
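To make that contract concrete, here is a minimal stand-alone sketch of how a batch folds items through `#process`. It does not load the gem: `PageviewCounter` and the `paths` array are hypothetical stand-ins, and the loop only imitates what the background thread does with each pushed item.

```ruby
# Hypothetical stand-in for an Aggregator subclass; only #process is modelled.
class PageviewCounter
  def process(collection, item)
    collection ||= {}                                 # nil at the start of a batch
    collection[item] = collection.fetch(item, 0) + 1
    collection                                        # always return the collection
  end
end

counter = PageviewCounter.new
paths = ["/", "/about", "/"]                          # assumed sample input

# Each call receives the previous return value, starting from nil.
collection = nil
paths.each { |path| collection = counter.process(collection, path) }

puts collection.inspect
```

Forgetting to return `collection` would hand the next call whatever the last expression evaluated to, which is why the return value matters.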
+ Whenever a batch is ready, `#finish` is called with the final collection. Here you can do whatever you want with it; most likely you'll save it to a database.
+
+ A batch is considered ready whenever one of two things happens:
+
+ - A configured number of items has been processed.
+ - A configured amount of time has passed since the batch started.
+
+ See the configuration options below for how to set these values.
+
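The two conditions above can be sketched as a single predicate. This is an illustration only: `batch_ready?` is a hypothetical helper, not part of the gem's API, and its keyword defaults simply reuse the documented defaults of 1000 items and 1 second.

```ruby
# Hypothetical predicate mirroring the two readiness conditions.
# A batch is ready once either limit has been reached.
def batch_ready?(processed_items, elapsed_seconds, max_batch_size: 1000, max_wait_time: 1)
  processed_items >= max_batch_size || elapsed_seconds >= max_wait_time
end

puts batch_ready?(1000, 0.2)  # size limit reached
puts batch_ready?(10, 1.5)    # time limit reached
puts batch_ready?(10, 0.2)    # neither limit reached
```

Because either condition is enough, a quiet aggregator still flushes small batches once the wait time elapses.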
+ ### Configuration options
+
+ Configuration options are defined for each `Aggregator` subclass and are class methods that must be explicitly called on `self`:
+
+ ```ruby
+ class MyAggregator < Aggregator
+   self.option_name = <value>
+ end
+ ```
+
+ The available options are:
+
+ - `.max_batch_size=`: maximum number of items to process before a batch is considered ready and `#finish` is called. Defaults to 1000.
+
+ - `.max_wait_time=`: maximum number of seconds given to the batch to process before it's considered ready. Defaults to 1.
+
+ - `.logger=`: logger to use. In a Rails application you probably want to set it to `Rails.logger`. Defaults to `Logger.new(STDOUT)`.
+
+ ### Testing
+
+ When you're writing tests for your application, you might need to wait until the aggregations run before you can assert something. In that case, you can just call `.drain` on your Aggregator subclass, which will block until all items have been processed and finished:
+
+ ``` ruby
+ it "saves all aggregations to the database" do
+   5.times { get "/page" }
+   PageviewAggregator.drain
+   pageviews = Pageview.find("/page").total
+   expect(pageviews).to eq(5)
+ end
+ ```
+
+ ## Guarantees and gotchas
+
+ All background threads are handled for you and can recover from crashes. However, if a thread crashes due to an exception raised in the `#process` method, that item may be lost forever. Similarly, if there is an uncaught exception in `#finish`, the entire collection will be lost. It is up to you to rescue and retry based on your needs.
+
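One way to approach the rescue-and-retry advice is a small wrapper around the risky part of `#finish`. The sketch below is an assumption, not something the gem provides: `with_retries` and `save_counts` are hypothetical names, and the flaky block only simulates a write that fails twice.

```ruby
# Hypothetical helper: run the block, retrying on StandardError
# up to `attempts` times in total before re-raising the last error.
def with_retries(attempts: 3)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue StandardError
    retry if tries < attempts
    raise
  end
end

# Inside a subclass, #finish could then wrap its write, for example:
#
#   def finish(collection)
#     with_retries { save_counts(collection) }  # save_counts is hypothetical
#   end

# Simulated flaky write that fails twice before succeeding.
result = with_retries do |try|
  raise IOError, "temporary failure" if try < 3
  "saved on attempt #{try}"
end

puts result
```

If every attempt fails, the error still propagates, so the collection is lost as described above; persisting it somewhere durable before re-raising is a separate design choice.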
+ One and only one background thread is started for each Aggregator subclass.
+
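This follows from `Aggregator` including Ruby's `Singleton` module (see `lib/aggregator.rb` in this diff): each subclass gets its own single instance, and that instance holds the thread. A minimal illustration with hypothetical class names, independent of the gem:

```ruby
require "singleton"

# Hypothetical classes; they only mirror the `include Singleton` in the base class.
class BaseWorker
  include Singleton
end

class PageviewWorker < BaseWorker; end
class ClickWorker < BaseWorker; end

# The same subclass always yields the same object...
puts PageviewWorker.instance.equal?(PageviewWorker.instance)
# ...while different subclasses get separate instances.
puts PageviewWorker.instance.equal?(ClickWorker.instance)
```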
+ When the process exits gracefully (e.g. web server shutdown), running aggregators will finish processing all items.
+
+ ## License
+
+ MIT License.
+
+ ## Contributing
+
+ 1. Fork it ( http://github.com/<my-github-username>/aggregator/fork )
+ 2. Create your feature branch (`git checkout -b my-new-feature`)
+ 3. Commit your changes (`git commit -am 'Add some feature'`)
+ 4. Push to the branch (`git push origin my-new-feature`)
+ 5. Create new Pull Request
@@ -0,0 +1,8 @@
+ require "bundler/gem_tasks"
+ require "rake/testtask"
+
+ Rake::TestTask.new(:test) do |t|
+   t.libs << "test"
+ end
+
+ task :default => :test
@@ -0,0 +1,24 @@
+ # coding: utf-8
+ lib = File.expand_path('../lib', __FILE__)
+ $LOAD_PATH.unshift(lib) unless $LOAD_PATH.include?(lib)
+ require 'aggregator/version'
+
+ Gem::Specification.new do |spec|
+   spec.name = "aggregator"
+   spec.version = Aggregator::VERSION
+   spec.authors = ["Joao Carlos"]
+   spec.email = ["joao@adtile.me"]
+   spec.summary = %q{Aggregate items on a separate thread.}
+   spec.description = %q{Define aggregators that run on a separate thread so that you can do more, faster.}
+   spec.homepage = ""
+   spec.license = "MIT"
+
+   spec.files = `git ls-files`.split($/)
+   spec.executables = spec.files.grep(%r{^bin/}) { |f| File.basename(f) }
+   spec.test_files = spec.files.grep(%r{^(test|spec|features)/})
+   spec.require_paths = ["lib"]
+
+   spec.add_development_dependency "rake"
+   spec.add_development_dependency "minitest", ">= 5.0.0"
+   spec.add_development_dependency "rubysl", "~> 2.0" if RUBY_ENGINE == "rbx"
+ end
@@ -0,0 +1,144 @@
+ require "thread"
+ require "singleton"
+ require "logger"
+
+ class Aggregator
+   include Singleton
+
+   attr_accessor :max_batch_size, :max_wait_time, :logger
+
+   def self.push(data)
+     self.instance.push(data)
+   end
+
+   def self.max_batch_size=(value)
+     self.instance.max_batch_size = value
+   end
+
+   def self.max_wait_time=(value)
+     self.instance.max_wait_time = value
+   end
+
+   def self.logger=(logger)
+     self.instance.logger = logger
+   end
+
+   def self.drain
+     self.instance.drain
+   end
+
+   def initialize
+     @queue = Queue.new
+     @mutex = Mutex.new
+     @thread = nil
+
+     at_exit { stop }
+   end
+
+   def push(data)
+     @queue.push(data)
+     start unless running?
+   end
+
+   def drain
+     if running?
+       if ! @queue.empty?
+         log :info, "joining thread #{@thread.inspect} (queue length = #{@queue.length})"
+         @drain = true
+         @thread.join if running?
+       end
+
+       log :info, "stopping thread #{@thread.inspect} (queue length = #{@queue.length})"
+       @thread = nil
+     elsif ! @queue.empty?
+       start and drain
+     end
+
+     true
+   end
+
+   private
+
+   def max_batch_size
+     @max_batch_size || 1000
+   end
+
+   def max_wait_time
+     @max_wait_time || 1
+   end
+
+   def process(collection, item)
+     raise NoMethodError,
+       "#{self.class.name}#process(collection, item) must be implemented"
+   end
+
+   def finish(collection)
+     raise NoMethodError,
+       "#{self.class.name}#finish(collection) must be implemented"
+   end
+
+   def running?
+     @thread && @thread.alive?
+   end
+
+   def logger
+     @logger ||= Logger.new(STDOUT)
+   end
+
+   def log(level, message)
+     logger.send(level, "[#{self.class.name}] #{message}")
+   end
+
+   def process_queue
+     raise StopIteration if @queue.empty? && @drain
+
+     processed_items = 0
+     start_time = Time.now
+
+     while processed_items < max_batch_size && (Time.now - start_time) < max_wait_time
+       raise StopIteration if @queue.empty? && @drain
+       if @queue.empty?
+         sleep 0.1
+       else
+         collection = process(collection, @queue.pop(true))
+         processed_items += 1
+       end
+     end
+   ensure
+     finish(collection) if collection
+   end
+
+   def start
+     @mutex.synchronize do
+       return false if running?
+
+       @drain = false
+
+       @thread = Thread.new do
+         begin
+           log :info, "starting thread #{Thread.current}"
+
+           loop do
+             process_queue
+           end
+         rescue Exception => e
+           log :warn, "thread crashed with exception: #{e.inspect}"
+         end
+       end
+
+       @thread.priority = 2
+
+       @thread
+     end
+   end
+
+   def stop
+     if running?
+       drain
+     else
+       log :info, "thread not running - nothing to stop"
+       return false
+     end
+   end
+
+ end
@@ -0,0 +1,3 @@
+ class Aggregator
+   VERSION = "1.0.0"
+ end
@@ -0,0 +1,5 @@
+ $LOAD_PATH.unshift File.expand_path("../../lib", __FILE__)
+ require "aggregator"
+
+ require "minitest"
+ require "minitest/autorun"
@@ -0,0 +1,79 @@
+ require "minitest_helper"
+ require "stringio"
+
+ class TestAggregator < Minitest::Test
+
+   class EventAggregator < Aggregator
+     attr_reader :process_counter, :finish_counter
+
+     self.max_wait_time = 2
+     self.max_batch_size = 25
+     self.logger = Logger.new(StringIO.new)
+
+     def reset_counters
+       @process_counter = 0
+       @finish_counter = 0
+     end
+
+     def process(collection, item)
+       fail if item.nil?
+       collection ||= []
+       collection << item
+       collection
+     end
+
+     def finish(collection)
+       @process_counter ||= 0
+       @process_counter += collection.count
+
+       @finish_counter ||= 0
+       @finish_counter += 1
+     end
+   end
+
+   def setup
+     EventAggregator.instance.reset_counters
+     EventAggregator.instance.instance_variable_get(:@queue).clear
+   end
+
+   def test_it_processes_pushed_items
+     100.times { EventAggregator.push({}) }
+     EventAggregator.drain
+     assert_equal 100, EventAggregator.instance.process_counter
+   end
+
+   def test_it_processes_in_batches
+     100.times { EventAggregator.push({}) }
+     EventAggregator.drain
+     assert_equal 4, EventAggregator.instance.finish_counter
+   end
+
+   def test_it_can_drain_multiple_times
+     100.times do
+       EventAggregator.instance.reset_counters
+       90.times { EventAggregator.push({}) }
+       EventAggregator.drain
+       assert_equal 90, EventAggregator.instance.process_counter
+     end
+   end
+
+   def test_it_can_recover_from_a_thread_crash
+     100.times do |i|
+       item = (i == 35) ? nil : {}
+       EventAggregator.push(item)
+     end
+
+     EventAggregator.drain
+     EventAggregator.push({})
+     EventAggregator.drain
+
+     assert_equal 100, EventAggregator.instance.process_counter
+   end
+
+   def test_it_drains_the_queue_even_if_the_thread_was_not_running
+     EventAggregator.instance.instance_variable_get(:@queue).push({})
+     EventAggregator.drain
+     assert_equal 1, EventAggregator.instance.process_counter
+   end
+
+ end
metadata ADDED
@@ -0,0 +1,87 @@
+ --- !ruby/object:Gem::Specification
+ name: aggregator
+ version: !ruby/object:Gem::Version
+   version: 1.0.0
+ platform: ruby
+ authors:
+ - Joao Carlos
+ autorequire:
+ bindir: bin
+ cert_chain: []
+ date: 2013-12-20 00:00:00.000000000 Z
+ dependencies:
+ - !ruby/object:Gem::Dependency
+   name: rake
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: '0'
+ - !ruby/object:Gem::Dependency
+   name: minitest
+   requirement: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: 5.0.0
+   type: :development
+   prerelease: false
+   version_requirements: !ruby/object:Gem::Requirement
+     requirements:
+     - - '>='
+       - !ruby/object:Gem::Version
+         version: 5.0.0
+ description: Define aggregators that run on a separate thread so that you can do more,
+   faster.
+ email:
+ - joao@adtile.me
+ executables: []
+ extensions: []
+ extra_rdoc_files: []
+ files:
+ - .gitignore
+ - .travis.yml
+ - Gemfile
+ - LICENSE.txt
+ - README.md
+ - Rakefile
+ - aggregator.gemspec
+ - lib/aggregator.rb
+ - lib/aggregator/version.rb
+ - test/minitest_helper.rb
+ - test/test_aggregator.rb
+ homepage: ''
+ licenses:
+ - MIT
+ metadata: {}
+ post_install_message:
+ rdoc_options: []
+ require_paths:
+ - lib
+ required_ruby_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ required_rubygems_version: !ruby/object:Gem::Requirement
+   requirements:
+   - - '>='
+     - !ruby/object:Gem::Version
+       version: '0'
+ requirements: []
+ rubyforge_project:
+ rubygems_version: 2.0.3
+ signing_key:
+ specification_version: 4
+ summary: Aggregate items on a separate thread.
+ test_files:
+ - test/minitest_helper.rb
+ - test/test_aggregator.rb
+ has_rdoc: