work_batcher 1.0.0

@@ -0,0 +1,7 @@
1
+ ---
2
+ SHA1:
3
+ metadata.gz: 7d90286860b0a7594c9cdc2207024afa543e060c
4
+ data.tar.gz: ef489fbfa4c8c4eb55ca5d0d6e48b9f369823e6c
5
+ SHA512:
6
+ metadata.gz: def16c19e043e1cc52b66832512e93a289d69ba8bd8364f59f5276151d7450afc427b58915e4f873e5443d8da855ff889581fe2e9847d2317be9727dacade246
7
+ data.tar.gz: 103d09310207b8dd001f34bcdb444021ad8952130835a9546ae67a7f72f75711e5f118c86855748cc27a1ed0c5f4fc648d6a11bbe75136efd7ae55bdaa2c2c97
@@ -0,0 +1,19 @@
1
+ Copyright (c) 2016 Phusion B.V.
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ of this software and associated documentation files (the "Software"), to deal
5
+ in the Software without restriction, including without limitation the rights
6
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ copies of the Software, and to permit persons to whom the Software is
8
+ furnished to do so, subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in
11
+ all copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ THE SOFTWARE.
@@ -0,0 +1,138 @@
1
+ # Small library for batching work
2
+
3
+ Many types of work can be performed more efficiently when performed in batches rather than individually. For example, performing a single coarse-grained HTTP API call with multiple inputs is often faster than performing multiple smaller HTTP API calls because it reduces the number of network roundtrips. Enter WorkBatcher: a generic library for performing any kind of work in batches.
4
+
5
+ WorkBatcher works as follows. First you tell WorkBatcher whether you want to batch by time or by size (or both). Then you pass work objects to WorkBatcher, which WorkBatcher adds to an internal store (the batch). When either the batch size limit or the time limit has been reached, WorkBatcher calls a user-specified callback (passing it the batch) to perform the actual processing of the batch.
6
+
7
+ Below is a small example that batches by time. Given a WorkBatcher object, call the `#add` method to add work objects. When the given time limit is reached (2 seconds in this example), WorkBatcher calls the `process_batch` lambda with all work objects that were batched during that 2-second interval.
8
+
9
+ ~~~ruby
10
+ process_batch = lambda do |batch|
11
+ # Warning! Code is executed in a different thread!
12
+ puts "Processing batch: #{batch.inspect}"
13
+ end
14
+
15
+ wb = WorkBatcher.new(processor: process_batch, time_limit: 2)
16
+ wb.add('work object 1')
17
+ wb.add('work object 2')
18
+ sleep 3
19
+ # => Processing batch: ["work object 1", "work object 2"]
20
+
21
+ wb.add('work object 1')
22
+ wb.add('work object 4')
23
+ sleep 3
24
+ # => Processing batch: ["work object 1", "work object 4"]
25
+
26
+ # Don't forget to clean up.
27
+ wb.shutdown
28
+ ~~~
29
+
30
+ **Table of contents**
31
+
32
+ * [Installation](#installation)
33
+ * [Features](#features)
34
+ * [Not a background job system](#not-a-background-job-system)
35
+ * [API](#api)
36
+ - [Constructor options](#constructor-options)
37
+ - [Adding work objects](#adding-work-objects)
38
+ - [Deduplicating work objects](#deduplicating-work-objects)
39
+ - [Concurrency notes](#concurrency-notes)
40
+
41
+ --------
42
+
43
+ ## Installation
44
+
45
+     gem install work_batcher
46
+
47
+ ## Features
48
+
49
+ * Lightweight library, not a background job system with a daemon.
50
+ * You can configure it to process the batch either after a time limit, or after a certain number of items have been queued, or both.
51
+ * Able to avoid double work by [deduplicating work objects](#deduplicating-work-objects).
52
+ * Introspectable: you can query its processing status at any time.
53
+ * Thread-safe.
54
+ * Uses threads under the hood.
55
+ * The only dependency is concurrent-ruby. Unlike e.g. [task_batcher](http://www.rubydoc.info/gems/task_batcher), this library does not depend on EventMachine.
56
+
57
+ ## Not a background job system
58
+
59
+ This library is *not* a background job system such as Sidekiq, BackgrounDRb or Resque. Those libraries are daemons that you run in the background for processing work. This library is for *batching* work. But you could combine this library with a background job system in order to add batching capabilities.
60
+
61
+ ## API
62
+
63
+ ### Constructor options
64
+
65
+ The constructor supports the following options.
66
+
67
+ Required:
68
+
69
+ * `:processor` (callable) -- will be called when WorkBatcher determines that it is time to process a batch. The callable will receive exactly one argument: an array of batched objects. The callable is called in a background thread; see [Concurrency notes](#concurrency-notes).
70
+
71
+ Optional:
72
+
73
+ * `:size_limit` (Integer) -- if the internal queue reaches this given size, then the current batch will be processed. Defaults to nil, meaning that WorkBatcher does not check the size.
74
+ * `:time_limit` (Float) -- if this much time has passed since the first time a work object has been placed in an empty queue, then the current batch will be processed. The unit is seconds, and may be a floating point number. Defaults to 5.
75
+ * `:deduplicate` (Boolean) -- whether to [deduplicate work objects](#deduplicating-work-objects) or not. Defaults to false.
76
+ * `:deduplicator` (callable) -- see [Deduplicating work objects](#deduplicating-work-objects) for more information.
77
+ * `:executor` -- a [concurrent-ruby](https://github.com/ruby-concurrency/concurrent-ruby) executor or thread pool object to perform background work in. Defaults to `Concurrent.global_io_executor`.
78
+
79
+ Note that `:size_limit` and `:time_limit` can be combined. If either condition is reached then the batch will be processed.
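The interplay of the two limits can be modelled with a small predicate. The sketch below is a simplified, illustrative model of the rule described above, not WorkBatcher's actual implementation (which schedules a background task). The method name `batch_due?` and its keyword arguments are hypothetical.

```ruby
# Simplified model of the "size limit OR time limit" rule.
# `batch_due?` is a hypothetical helper for illustration only; it is not part
# of the WorkBatcher API.
def batch_due?(queue_size:, seconds_since_first_add:, size_limit: nil, time_limit: 5)
  # An empty queue never needs processing.
  return false if queue_size.zero?
  # The size check only applies when a size limit was configured.
  return true if size_limit && queue_size >= size_limit
  # Otherwise the batch is due once the time limit has elapsed.
  seconds_since_first_add >= time_limit
end

puts batch_due?(queue_size: 3, seconds_since_first_add: 1, size_limit: 3)
# => true  (size limit reached before the time limit)
puts batch_due?(queue_size: 2, seconds_since_first_add: 1, size_limit: 3)
# => false (neither limit reached)
puts batch_due?(queue_size: 2, seconds_since_first_add: 5, size_limit: 3)
# => true  (default 5-second time limit reached)
```

With `size_limit` left at nil, only the time check applies, which matches the documented default.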
80
+
81
+ ### Adding work objects
82
+
83
+ Add work objects with either the `#add` method (for adding a single object) or the `#add_multiple` method (for adding multiple objects). If you have multiple work objects, then it is more efficient to call `#add_multiple` once instead of calling `#add` many times.
84
+
85
+     work_batcher.add(work_object)
86
+     work_batcher.add_multiple([work_object1, work_object2])
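One plausible reason for the efficiency difference is locking: in the library source included in this diff, `#add` delegates to `#add_multiple`, which enters the internal mutex once per call. The sketch below illustrates that effect with a hypothetical `CountingQueue` class; it is a stand-in for illustration and not WorkBatcher itself.

```ruby
# Illustrative stand-in for a mutex-protected queue. `CountingQueue` is a
# hypothetical class, not part of WorkBatcher; it only counts how often the
# lock is taken while adding items.
class CountingQueue
  attr_reader :lock_acquisitions

  def initialize
    @mutex = Mutex.new
    @items = []
    @lock_acquisitions = 0
  end

  # Adding one object is treated as a batch of size one.
  def add(item)
    add_multiple([item])
  end

  # The lock is taken once per call, however many objects are passed.
  def add_multiple(items)
    return if items.empty?
    @mutex.synchronize do
      @lock_acquisitions += 1
      @items.concat(items)
    end
  end

  def size
    @mutex.synchronize { @items.size }
  end
end

one_by_one = CountingQueue.new
100.times { |i| one_by_one.add(i) }

all_at_once = CountingQueue.new
all_at_once.add_multiple((0...100).to_a)

puts one_by_one.lock_acquisitions
# => 100
puts all_at_once.lock_acquisitions
# => 1
```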
87
+
88
+ A work object can be anything. WorkBatcher does not do anything with the objects themselves; they are simply passed to the processor callable (although they may be [deduplicated](#deduplicating-work-objects) first).
89
+
90
+ ### Deduplicating work objects
91
+
92
+ In some use cases it is desirable to deduplicate added work objects. Suppose that you are writing a social network in which each user can upload an avatar. You want to pass recently uploaded avatars to an image compressor service, and in order to reduce network roundtrips you want to do this in batches. What happens if you have _just_ added an avatar to a batch, but before the batch is sent out to the compressor service, the user uploads a new avatar? You want to replace that user's avatar in the batch with their latest one.
93
+
94
+ Enter deduplication. It starts by setting the `:deduplicate` option to true. When you add a work object, WorkBatcher will look in the batch for any objects that look like duplicates and remove them. By default, two work objects are considered equal if `#eql?` is true and (at the same time) their `#hash` values are equal. This is because deduplication is internally implemented using a Hash.
95
+
96
+ The criterion for what is considered a duplicate is configurable through the `:deduplicator` option. This option is to be set to a callable, which accepts a work object and which outputs a key object. Two work objects are considered duplicates if their key objects are the same (according to `#eql?` and `#hash`).
97
+
98
+ Here is an example of how to deduplicate avatars:
99
+
100
+ ~~~ruby
101
+ deduplicator = lambda do |avatar|
102
+ # We consider two avatars to be duplicates if they are from the
103
+ # same user, so we return the user ID here.
104
+ avatar.user_id
105
+ end
106
+
107
+ processor = lambda do |avatars|
108
+ send_avatars_to_compressor_service(avatars)
109
+ end
110
+
111
+ wb = WorkBatcher.new(
112
+ deduplicate: true,
113
+ deduplicator: deduplicator,
114
+ processor: processor)
115
+
116
+ while !$quitting
117
+ avatar = receive_next_avatar
118
+ wb.add(avatar)
119
+ end
120
+ ~~~
121
+
122
+ Deduplication is disabled by default.
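To make the Hash-based mechanism concrete, here is a minimal sketch of key-based deduplication in plain Ruby. It mirrors the idea described above (the key returned by the deduplicator is used as a Hash key, so a later object replaces an earlier one with the same key), but `Avatar` and `deduplicate_batch` are hypothetical names introduced for illustration and are not part of the WorkBatcher API.

```ruby
# Hypothetical work object, for illustration only.
Avatar = Struct.new(:user_id, :filename)

# Keeps only the most recently added object per key, using a Hash.
# `deduplicate_batch` is an illustrative helper, not a WorkBatcher method.
def deduplicate_batch(work_objects, deduplicator)
  queue = {}
  work_objects.each do |work_object|
    # A later object with the same key overwrites the earlier one.
    queue[deduplicator.call(work_object)] = work_object
  end
  queue.values
end

avatars = [
  Avatar.new(1, 'alice_old.png'),
  Avatar.new(2, 'bob.png'),
  Avatar.new(1, 'alice_new.png')
]

by_user = deduplicate_batch(avatars, ->(avatar) { avatar.user_id })
puts by_user.map(&:filename).inspect
# => ["alice_new.png", "bob.png"]
```

Passing an identity lambda such as `->(obj) { obj }` corresponds to the default behavior, where the objects themselves are compared through `#eql?` and `#hash`.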
123
+
124
+ ### Concurrency notes
125
+
126
+ WorkBatcher uses threads internally. The processor callback is called in a background thread, so care should be taken to ensure that your processor callback is thread-safe.
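As a rough illustration of what "thread-safe" can mean here, the sketch below guards shared state with a Mutex so that a processor callback may be invoked from several threads at once. `BatchLog` is a hypothetical helper class, and plain `Thread` objects stand in for the background threads that would call the processor; neither is part of WorkBatcher.

```ruby
# Hypothetical thread-safe collector that a processor callback could write to.
class BatchLog
  def initialize
    @mutex = Mutex.new
    @batches = []
  end

  # Stores a copy of the batch while holding the lock.
  def record(batch)
    @mutex.synchronize { @batches << batch.dup }
  end

  # Counts all items recorded so far while holding the lock.
  def total_items
    @mutex.synchronize { @batches.sum(&:size) }
  end
end

log = BatchLog.new
processor = lambda { |batch| log.record(batch) }

# Plain threads stand in for the background threads that invoke the processor.
threads = 4.times.map do |i|
  Thread.new { processor.call(["work object #{i}", "work object #{i + 4}"]) }
end
threads.each(&:join)

puts log.total_items
# => 8
```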
127
+
128
+ When using Rails, if your processor callback does anything with ActiveRecord then you must ensure that the processor callback releases the ActiveRecord thread-local connection, otherwise you will exhaust the ActiveRecord connection pool. Here is an example:
129
+
130
+ ~~~ruby
131
+ processor = lambda do |batch|
132
+ begin
133
+ ...
134
+ ensure
135
+ ActiveRecord::Base.connection_pool.release_connection
136
+ end
137
+ end
138
+ ~~~
@@ -0,0 +1,4 @@
1
+ desc 'Run unit tests'
2
+ task :test do
3
+ sh 'bundle exec rspec -f d -c spec/*_spec.rb'
4
+ end
@@ -0,0 +1,168 @@
1
+ # Copyright (c) 2016 Phusion B.V.
2
+ #
3
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ # of this software and associated documentation files (the "Software"), to deal
5
+ # in the Software without restriction, including without limitation the rights
6
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ # copies of the Software, and to permit persons to whom the Software is
8
+ # furnished to do so, subject to the following conditions:
9
+ #
10
+ # The above copyright notice and this permission notice shall be included in
11
+ # all copies or substantial portions of the Software.
12
+ #
13
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ # THE SOFTWARE.
20
+
21
+
22
+ require 'thread'
23
+ require 'concurrent'
24
+ require 'concurrent/scheduled_task'
25
+
26
+ class WorkBatcher
27
+ def initialize(options = {})
28
+ @size_limit = get_option(options, :size_limit)
29
+ @time_limit = get_option(options, :time_limit, 5)
30
+ @deduplicate = get_option(options, :deduplicate)
31
+ @deduplicator = get_option(options, :deduplicator, method(:default_deduplicator))
32
+ @executor = get_option(options, :executor, Concurrent.global_io_executor)
33
+ @processor = get_option!(options, :processor)
34
+
35
+ @mutex = Mutex.new
36
+ if @deduplicate
37
+ @queue = {}
38
+ else
39
+ @queue = []
40
+ end
41
+ @processed = 0
42
+ end
43
+
44
+ def shutdown
45
+ task = @mutex.synchronize do
46
+ @scheduled_processing_task
47
+ end
48
+ if task
49
+ task.reschedule(0)
50
+ task.wait!
51
+ end
52
+ end
53
+
54
+ def add(work_object)
55
+ add_multiple([work_object])
56
+ end
57
+
58
+ def add_multiple(work_objects)
59
+ return if work_objects.empty?
60
+
61
+ @mutex.synchronize do
62
+ if @deduplicate
63
+ work_objects.each do |work_object|
64
+ key = @deduplicator.call(work_object)
65
+ @queue[key] = work_object
66
+ end
67
+ else
68
+ @queue.concat(work_objects)
69
+ end
70
+ schedule_processing
71
+ end
72
+ end
73
+
74
+ def status
75
+ result = {}
76
+ @mutex.synchronize do
77
+ if @scheduled_processing_task
78
+ result[:scheduled_processing_time] = @scheduled_processing_time
79
+ end
80
+ result[:queue_count] = @queue.size
81
+ result[:processed_count] = @processed
82
+ end
83
+ result
84
+ end
85
+
86
+ def inspect_queue
87
+ @mutex.synchronize do
88
+ if @deduplicate
89
+ @queue.values.dup
90
+ else
91
+ @queue.dup
92
+ end
93
+ end
94
+ end
95
+
96
+ private
97
+ def schedule_processing
98
+ if @scheduled_processing_task
99
+ if @size_limit && @queue.size >= @size_limit
100
+ @scheduled_processing_time = Time.now
101
+ @scheduled_processing_task.reschedule(0)
102
+ end
103
+ else
104
+ if @size_limit && @queue.size >= @size_limit
105
+ @scheduled_processing_task = create_scheduled_processing_task(0)
106
+ else
107
+ @scheduled_processing_task = create_scheduled_processing_task(@time_limit)
108
+ end
109
+ end
110
+ end
111
+
112
+ def create_scheduled_processing_task(delay)
113
+ @scheduled_processing_time = Time.now + delay
114
+ args = [delay, executor: @executor]
115
+ Concurrent::ScheduledTask.execute(*args) do
116
+ handle_uncaught_exception do
117
+ @mutex.synchronize do
118
+ begin
119
+ process_queue
120
+ ensure
121
+ @scheduled_processing_task = nil
122
+ @scheduled_processing_time = nil
123
+ end
124
+ end
125
+ end
126
+ end
127
+ end
128
+
129
+ def process_queue
130
+ if @deduplicate
131
+ @processor.call(@queue.values)
132
+ else
133
+ @processor.call(@queue.dup)
134
+ end
135
+ @processed += @queue.size
136
+ @queue.clear
137
+ end
138
+
139
+ def default_deduplicator(work_object)
140
+ work_object
141
+ end
142
+
143
+ def handle_uncaught_exception
144
+ begin
145
+ yield
146
+ rescue Exception => e
147
+ STDERR.puts(
148
+ "Uncaught exception in WorkBatcher: #{e} (#{e.class})\n" \
149
+ "#{e.backtrace.join("\n")}")
150
+ end
151
+ end
152
+
153
+ def get_option(options, key, default_value = nil)
154
+ if options.key?(key)
155
+ options[key]
156
+ else
157
+ default_value
158
+ end
159
+ end
160
+
161
+ def get_option!(options, key)
162
+ if options.key?(key)
163
+ options[key]
164
+ else
165
+ raise ArgumentError, "Option required: #{key}"
166
+ end
167
+ end
168
+ end
@@ -0,0 +1,18 @@
1
+ Gem::Specification.new do |s|
2
+ s.name = 'work_batcher'
3
+ s.version = '1.0.0'
4
+ s.summary = 'Library for batching work'
5
+ s.description = 'Library for batching work.'
6
+ s.email = 'info@phusion.nl'
7
+ s.homepage = 'https://github.com/phusion/work_batcher'
8
+ s.authors = ['Hongli Lai']
9
+ s.license = 'MIT'
10
+ s.files = [
11
+ 'work_batcher.gemspec',
12
+ 'README.md',
13
+ 'LICENSE.md',
14
+ 'Rakefile',
15
+ 'lib/work_batcher.rb'
16
+ ]
17
+ s.add_dependency 'concurrent-ruby'
18
+ end
metadata ADDED
@@ -0,0 +1,63 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: work_batcher
3
+ version: !ruby/object:Gem::Version
4
+ version: 1.0.0
5
+ platform: ruby
6
+ authors:
7
+ - Hongli Lai
8
+ autorequire:
9
+ bindir: bin
10
+ cert_chain: []
11
+ date: 2016-09-23 00:00:00.000000000 Z
12
+ dependencies:
13
+ - !ruby/object:Gem::Dependency
14
+ name: concurrent-ruby
15
+ requirement: !ruby/object:Gem::Requirement
16
+ requirements:
17
+ - - ">="
18
+ - !ruby/object:Gem::Version
19
+ version: '0'
20
+ type: :runtime
21
+ prerelease: false
22
+ version_requirements: !ruby/object:Gem::Requirement
23
+ requirements:
24
+ - - ">="
25
+ - !ruby/object:Gem::Version
26
+ version: '0'
27
+ description: Library for batching work.
28
+ email: info@phusion.nl
29
+ executables: []
30
+ extensions: []
31
+ extra_rdoc_files: []
32
+ files:
33
+ - LICENSE.md
34
+ - README.md
35
+ - Rakefile
36
+ - lib/work_batcher.rb
37
+ - work_batcher.gemspec
38
+ homepage: https://github.com/phusion/work_batcher
39
+ licenses:
40
+ - MIT
41
+ metadata: {}
42
+ post_install_message:
43
+ rdoc_options: []
44
+ require_paths:
45
+ - lib
46
+ required_ruby_version: !ruby/object:Gem::Requirement
47
+ requirements:
48
+ - - ">="
49
+ - !ruby/object:Gem::Version
50
+ version: '0'
51
+ required_rubygems_version: !ruby/object:Gem::Requirement
52
+ requirements:
53
+ - - ">="
54
+ - !ruby/object:Gem::Version
55
+ version: '0'
56
+ requirements: []
57
+ rubyforge_project:
58
+ rubygems_version: 2.4.5.1
59
+ signing_key:
60
+ specification_version: 4
61
+ summary: Library for batching work
62
+ test_files: []
63
+ has_rdoc: