work_batcher 1.0.0
- checksums.yaml +7 -0
- data/LICENSE.md +19 -0
- data/README.md +138 -0
- data/Rakefile +4 -0
- data/lib/work_batcher.rb +168 -0
- data/work_batcher.gemspec +18 -0
- metadata +63 -0
checksums.yaml
ADDED
@@ -0,0 +1,7 @@
---
SHA1:
  metadata.gz: 7d90286860b0a7594c9cdc2207024afa543e060c
  data.tar.gz: ef489fbfa4c8c4eb55ca5d0d6e48b9f369823e6c
SHA512:
  metadata.gz: def16c19e043e1cc52b66832512e93a289d69ba8bd8364f59f5276151d7450afc427b58915e4f873e5443d8da855ff889581fe2e9847d2317be9727dacade246
  data.tar.gz: 103d09310207b8dd001f34bcdb444021ad8952130835a9546ae67a7f72f75711e5f118c86855748cc27a1ed0c5f4fc648d6a11bbe75136efd7ae55bdaa2c2c97
data/LICENSE.md
ADDED
@@ -0,0 +1,19 @@
Copyright (c) 2016 Phusion B.V.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
data/README.md
ADDED
@@ -0,0 +1,138 @@
# Small library for batching work

Many types of work can be performed more efficiently in batches than individually. For example, performing a single coarse-grained HTTP API call with multiple inputs is often faster than performing multiple smaller HTTP API calls, because it reduces the number of network roundtrips. Enter WorkBatcher: a generic library for performing any kind of work in batches.

WorkBatcher works as follows. First you tell WorkBatcher whether you want to batch by time or by size (or both). Then you pass work objects to WorkBatcher, which adds them to an internal store (the batch). When either the batch size limit or the time limit is reached, WorkBatcher calls a user-specified callback (passing it the batch) to perform the actual processing of the batch.

Below is a small example that batches by time. Given a WorkBatcher object, call the `#add` method to add work objects. When the given time limit is reached (2 seconds in this example), WorkBatcher calls the `process_batch` lambda with all work objects that were added during that 2-second interval.

~~~ruby
require 'work_batcher'

process_batch = lambda do |batch|
  # Warning! Code is executed in a different thread!
  puts "Processing batch: #{batch.inspect}"
end

wb = WorkBatcher.new(processor: process_batch, time_limit: 2)
wb.add('work object 1')
wb.add('work object 2')
sleep 3
# => Processing batch: ["work object 1", "work object 2"]

wb.add('work object 1')
wb.add('work object 4')
sleep 3
# => Processing batch: ["work object 1", "work object 4"]

# Don't forget to clean up.
wb.shutdown
~~~

**Table of contents**

 * [Installation](#installation)
 * [Features](#features)
 * [Not a background job system](#not-a-background-job-system)
 * [API](#api)
   - [Constructor options](#constructor-options)
   - [Adding work objects](#adding-work-objects)
   - [Deduplicating work objects](#deduplicating-work-objects)
   - [Concurrency notes](#concurrency-notes)

--------

## Installation

    gem install work_batcher

## Features

 * Lightweight library, not a background job system with a daemon.
 * You can configure it to process the batch after a time limit, after a certain number of items have been queued, or both.
 * Able to avoid double work by [deduplicating work objects](#deduplicating-work-objects).
 * Introspectable: you can query its processing status at any time (see the sketch after this list).
 * Thread-safe.
 * Uses threads under the hood.
 * The only dependency is concurrent-ruby. Unlike e.g. [task_batcher](http://www.rubydoc.info/gems/task_batcher), this library does not depend on EventMachine.

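The introspection feature refers to the `#status` method defined in `lib/work_batcher.rb` (shown further down this page). Here is a minimal sketch of querying it; the hash keys in the comment are taken from that implementation:

~~~ruby
require 'work_batcher'

wb = WorkBatcher.new(
  processor: lambda { |batch| puts "Processed #{batch.size} work objects" },
  time_limit: 2)

wb.add('work object 1')

# #status returns a snapshot of the internal state:
#   :queue_count               -- number of work objects currently in the batch
#   :processed_count           -- total number of work objects processed so far
#   :scheduled_processing_time -- when the pending batch is due (only present
#                                 while a batch is scheduled)
p wb.status

wb.shutdown
~~~
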
## Not a background job system

This library is *not* a background job system such as Sidekiq, BackgrounDRb or Resque. Those libraries are daemons that you run in the background for processing work. This library is for *batching* work. But you could combine this library with a background job system in order to add batching capabilities.

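For instance, here is one way to combine the two. This is only a sketch: it assumes Sidekiq and a hypothetical `BatchJob` worker class; WorkBatcher itself knows nothing about background job systems.

~~~ruby
require 'sidekiq'
require 'work_batcher'

# Hypothetical Sidekiq worker that handles one whole batch per job.
class BatchJob
  include Sidekiq::Worker

  def perform(batch)
    # Process all work objects in one go, e.g. a single bulk API call.
  end
end

# WorkBatcher does the batching; the processor enqueues one background job
# per batch. Work objects must be JSON-serializable for Sidekiq.
work_batcher = WorkBatcher.new(
  processor: lambda { |batch| BatchJob.perform_async(batch) },
  time_limit: 5)

work_batcher.add('work object 1')
~~~
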
## API

### Constructor options

The constructor supports the following options.

Required:

 * `:processor` (callable) -- called when WorkBatcher determines that it is time to process a batch. The callable receives exactly one argument: an array of batched objects. It is called in a background thread; see [Concurrency notes](#concurrency-notes).

Optional:

 * `:size_limit` (Integer) -- if the internal queue reaches this size, then the current batch is processed. Defaults to nil, meaning that WorkBatcher does not check the size.
 * `:time_limit` (Float) -- if this much time has passed since the first work object was placed in an empty queue, then the current batch is processed. The unit is seconds and may be a floating point number. Defaults to 5.
 * `:deduplicate` (Boolean) -- whether to [deduplicate work objects](#deduplicating-work-objects). Defaults to false.
 * `:deduplicator` (callable) -- see [Deduplicating work objects](#deduplicating-work-objects) for more information.
 * `:executor` -- a [concurrent-ruby](https://github.com/ruby-concurrency/concurrent-ruby) executor or thread pool object to perform background work in. Defaults to `Concurrent.global_io_executor`.

Note that `:size_limit` and `:time_limit` can be combined. The batch is processed as soon as either condition is met, as in the sketch below.

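A minimal sketch of combining both limits (the numbers are arbitrary): with `size_limit: 3`, the third `#add` triggers processing immediately instead of waiting for the 10-second time limit.

~~~ruby
require 'work_batcher'

wb = WorkBatcher.new(
  processor: lambda { |batch| puts "Got a batch of #{batch.size}" },
  size_limit: 3,
  time_limit: 10)

wb.add('a')
wb.add('b')
wb.add('c')   # queue reaches the size limit, so the batch is processed right away
sleep 0.1     # give the background thread a moment
# => Got a batch of 3

wb.add('d')   # only one object; this batch goes out after the 10-second time limit...
wb.shutdown   # ...or immediately on shutdown
~~~
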
### Adding work objects

Add work objects with either the `#add` method (for adding a single object) or the `#add_multiple` method (for adding multiple objects). If you have multiple work objects, it is more efficient to call `#add_multiple` once than to call `#add` many times.

    work_batcher.add(work_object)
    work_batcher.add_multiple([work_object1, work_object2])

A work object can be anything. WorkBatcher does not do anything with the objects themselves; they are simply passed to the processor callable (although they may be [deduplicated](#deduplicating-work-objects) first).

### Deduplicating work objects

In some use cases it is desirable to deduplicate added work objects. Suppose that you are writing a social network in which each user can upload an avatar. You want to pass recently uploaded avatars to an image compressor service, and in order to reduce network roundtrips you want to do this in batches. What happens if you have _just_ added an avatar to a batch, but before the batch is sent out to the compressor service, the user uploads a new avatar? You want to replace that user's avatar in the batch with the latest one.

Enter deduplication. Enable it by setting the `:deduplicate` option to true. When you add a work object, WorkBatcher looks in the batch for any objects that look like duplicates and removes them. By default, two work objects are considered duplicates if they are equal according to `#eql?` and `#hash`, because deduplication is internally implemented using a Hash.

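A minimal sketch of this default behavior (no custom `:deduplicator`): equal objects collapse into a single entry in the batch.

~~~ruby
require 'work_batcher'

wb = WorkBatcher.new(
  processor: lambda { |batch| puts "Processing batch: #{batch.inspect}" },
  deduplicate: true,
  time_limit: 1)

wb.add('user-1.png')
wb.add('user-2.png')
wb.add('user-1.png')  # equal to an existing entry, so it replaces that entry
sleep 2
# => Processing batch: ["user-1.png", "user-2.png"]

wb.shutdown
~~~
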
The criteria for what counts as a "duplicate" are configurable through the `:deduplicator` option. This option is to be set to a callable which accepts a work object and returns a key object. Two work objects are considered duplicates if their key objects are equal (according to `#eql?` and `#hash`).

Here is an example of how to deduplicate avatars:

~~~ruby
deduplicator = lambda do |avatar|
  # We consider two avatars to be duplicates if they are from the
  # same user, so we return the user ID here.
  avatar.user_id
end

processor = lambda do |avatars|
  send_avatars_to_compressor_service(avatars)
end

wb = WorkBatcher.new(
  deduplicate: true,
  deduplicator: deduplicator,
  processor: processor)

while !$quitting
  avatar = receive_next_avatar
  wb.add(avatar)
end
~~~

Deduplication is disabled by default.

### Concurrency notes

WorkBatcher uses threads internally. The processor callback is called in a background thread, so care should be taken to ensure that your processor callback is thread-safe.

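For example, any state that the processor shares with other threads should be guarded. Here is a sketch with a hypothetical shared counter protected by a Mutex:

~~~ruby
require 'thread'
require 'work_batcher'

# Hypothetical shared state: updated by the processor in a background thread,
# read by the main thread, so access is guarded by a Mutex.
lock = Mutex.new
total_processed = 0

processor = lambda do |batch|
  lock.synchronize do
    total_processed += batch.size
  end
end

wb = WorkBatcher.new(processor: processor, time_limit: 1)
wb.add('work object 1')
wb.shutdown

lock.synchronize { puts "Processed #{total_processed} work objects in total" }
~~~
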
When using Rails, if your processor callback does anything with ActiveRecord then you must ensure that the processor callback releases the ActiveRecord thread-local connection, otherwise you will exhaust the ActiveRecord connection pool. Here is an example:

~~~ruby
processor = lambda do |batch|
  begin
    ...
  ensure
    ActiveRecord::Base.connection_pool.release_connection
  end
end
~~~
data/Rakefile
ADDED
data/lib/work_batcher.rb
ADDED
@@ -0,0 +1,168 @@
# Copyright (c) 2016 Phusion B.V.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in
# all copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
# THE SOFTWARE.


require 'thread'
require 'concurrent'
require 'concurrent/scheduled_task'

class WorkBatcher
  def initialize(options = {})
    @size_limit = get_option(options, :size_limit)
    @time_limit = get_option(options, :time_limit, 5)
    @deduplicate = get_option(options, :deduplicate)
    @deduplicator = get_option(options, :deduplicator, method(:default_deduplicator))
    @executor = get_option(options, :executor, Concurrent.global_io_executor)
    @processor = get_option!(options, :processor)

    @mutex = Mutex.new
    if @deduplicate
      @queue = {}
    else
      @queue = []
    end
    @processed = 0
  end

  def shutdown
    task = @mutex.synchronize do
      @scheduled_processing_task
    end
    if task
      task.reschedule(0)
      task.wait!
    end
  end

  def add(work_object)
    add_multiple([work_object])
  end

  def add_multiple(work_objects)
    return if work_objects.empty?

    @mutex.synchronize do
      if @deduplicate
        work_objects.each do |work_object|
          key = @deduplicator.call(work_object)
          @queue[key] = work_object
        end
      else
        @queue.concat(work_objects)
      end
      schedule_processing
    end
  end

  def status
    result = {}
    @mutex.synchronize do
      if @scheduled_processing_task
        result[:scheduled_processing_time] = @scheduled_processing_time
      end
      result[:queue_count] = @queue.size
      result[:processed_count] = @processed
    end
    result
  end

  def inspect_queue
    @mutex.synchronize do
      if @deduplicate
        @queue.values.dup
      else
        @queue.dup
      end
    end
  end

  private
  def schedule_processing
    if @scheduled_processing_task
      if @size_limit && @queue.size >= @size_limit
        @scheduled_processing_time = Time.now
        @scheduled_processing_task.reschedule(0)
      end
    else
      if @size_limit && @queue.size >= @size_limit
        @scheduled_processing_task = create_scheduled_processing_task(0)
      else
        @scheduled_processing_task = create_scheduled_processing_task(@time_limit)
      end
    end
  end

  def create_scheduled_processing_task(delay)
    @scheduled_processing_time = Time.now + delay
    args = [delay, executor: @executor]
    Concurrent::ScheduledTask.execute(*args) do
      handle_uncaught_exception do
        @mutex.synchronize do
          begin
            process_queue
          ensure
            @scheduled_processing_task = nil
            @scheduled_processing_time = nil
          end
        end
      end
    end
  end

  def process_queue
    if @deduplicate
      @processor.call(@queue.values)
    else
      @processor.call(@queue.dup)
    end
    @processed += @queue.size
    @queue.clear
  end

  def default_deduplicator(work_object)
    work_object
  end

  def handle_uncaught_exception
    begin
      yield
    rescue Exception => e
      STDERR.puts(
        "Uncaught exception in WorkBatcher: #{e} (#{e.class})\n" \
        "#{e.backtrace.join("\n")}")
    end
  end

  def get_option(options, key, default_value = nil)
    if options.key?(key)
      options[key]
    else
      default_value
    end
  end

  def get_option!(options, key)
    if options.key?(key)
      options[key]
    else
      raise ArgumentError, "Option required: #{key}"
    end
  end
end
data/work_batcher.gemspec
ADDED
@@ -0,0 +1,18 @@
Gem::Specification.new do |s|
  s.name = 'work_batcher'
  s.version = '1.0.0'
  s.summary = 'Library for batching work'
  s.description = 'Library for batching work.'
  s.email = 'info@phusion.nl'
  s.homepage = 'https://github.com/phusion/work_batcher'
  s.authors = ['Hongli Lai']
  s.license = 'MIT'
  s.files = [
    'work_batcher.gemspec',
    'README.md',
    'LICENSE.md',
    'Rakefile',
    'lib/work_batcher.rb'
  ]
  s.add_dependency 'concurrent-ruby'
end
metadata
ADDED
@@ -0,0 +1,63 @@
--- !ruby/object:Gem::Specification
name: work_batcher
version: !ruby/object:Gem::Version
  version: 1.0.0
platform: ruby
authors:
- Hongli Lai
autorequire:
bindir: bin
cert_chain: []
date: 2016-09-23 00:00:00.000000000 Z
dependencies:
- !ruby/object:Gem::Dependency
  name: concurrent-ruby
  requirement: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
  type: :runtime
  prerelease: false
  version_requirements: !ruby/object:Gem::Requirement
    requirements:
    - - ">="
      - !ruby/object:Gem::Version
        version: '0'
description: Library for batching work.
email: info@phusion.nl
executables: []
extensions: []
extra_rdoc_files: []
files:
- LICENSE.md
- README.md
- Rakefile
- lib/work_batcher.rb
- work_batcher.gemspec
homepage: https://github.com/phusion/work_batcher
licenses:
- MIT
metadata: {}
post_install_message:
rdoc_options: []
require_paths:
- lib
required_ruby_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
required_rubygems_version: !ruby/object:Gem::Requirement
  requirements:
  - - ">="
    - !ruby/object:Gem::Version
      version: '0'
requirements: []
rubyforge_project:
rubygems_version: 2.4.5.1
signing_key:
specification_version: 4
summary: Library for batching work
test_files: []
has_rdoc: