divvy 1.0

data/COPYING ADDED
@@ -0,0 +1,18 @@
1
+ Copyright (c) 2013 by Ryan Tomayko <http://tomayko.com/>
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ of this software and associated documentation files (the "Software"), to
5
+ deal in the Software without restriction, including without limitation the
6
+ rights to use, copy, modify, merge, publish, distribute, sublicense, and/or
7
+ sell copies of the Software, and to permit persons to whom the Software is
8
+ furnished to do so, subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in
11
+ all copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
16
+ THE AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
17
+ IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
18
+ CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
@@ -0,0 +1,120 @@
1
+ divvy - parallel script runner
2
+ ==============================
3
+
4
+ This is a (forking) parallel task runner for Ruby designed to be run in the
5
+ foreground and to require no external infrastructure components (like redis or a
6
+ queue server).
7
+
8
+ Divvy provides a light system for defining parallelizable pieces of work and a
9
+ process based run environment for executing them. It's good for running coarse
10
+ grained tasks that are network or IO heavy. It's not good at crunching lots of
11
+ inputs quickly or parallelizing fine grained / CPU intense pieces of work.
12
+
13
+ GitHub uses divvy with [ModelIterator](https://github.com/technoweenie/model_iterator)
14
+ to perform one-off and regular maintenance tasks on different types of records
15
+ and their associated storage components.
16
+
17
+ ## example
18
+
19
+ This is a simple and contrived example of a divvy job script. You must define a
20
+ class that includes the `Divvy::Parallelizable` module and implement the
21
+ `#dispatch` and `#perform` methods. There are also hooks available for tapping
22
+ into the worker process lifecycle.
23
+
24
+ ``` ruby
25
+ # This is a dance party. We're going to hand out tickets. We need to generate
26
+ # codes for each available ticket. Thing is, the ticket codes have to be
27
+ # generated by this external ticket code generator service (this part is
28
+ # just pretend) and there's a lot of latency involved. We can generate multiple
29
+ # ticket codes at the same time by making multiple connections.
30
+ require 'divvy'
31
+ require 'digest/sha1' # <-- your humble ticket code generator service
32
+
33
+ class DanceParty
34
+ # The Parallelizable module provides default method implementations and marks
35
+ # the object as following the interface defined below.
36
+ include Divvy::Parallelizable
37
+
38
+ # This is the main loop responsible for generating work items for worker
39
+ # processes. It runs in the master process only. Each item yielded from this
40
+ # method is marshalled over a pipe and distributed to the next available
41
+ # worker process where it arrives at the #perform method (see below).
42
+ #
43
+ # In this example we're just going to generate a series of numbers to pass
44
+ # to the workers. The workers just write the number out with their pid and the
45
+ # SHA1 hex digest of the number given.
46
+ def dispatch
47
+ tickets_available = ARGV[0] ? ARGV[0].to_i : 10
48
+ puts "Generating #{tickets_available} ticket codes for the show..."
49
+ (0...tickets_available).each do |ticket_number|
50
+ yield ticket_number
51
+ end
52
+ end
53
+
54
+ # The individual work item processing method. Each item produced by the
55
+ # dispatch method is sent to this method in the worker processes. The
56
+ # arguments to this method must match the arity of the work item yielded
57
+ # from the #dispatch method.
58
+ #
59
+ # In this example we're given an Integer ticket number and asked to produce a
60
+ # code. Pretend this is a network intense operation where you're mostly
61
+ # sleeping waiting for a reply.
62
+ def perform(ticket_number)
63
+ ticket_sha1 = Digest::SHA1.hexdigest(ticket_number.to_s)
64
+ printf "%5d %6d %s\n", $$, ticket_number, ticket_sha1
65
+ sleep 0.150 # fake some latency
66
+ end
67
+
68
+ # Hook called after a worker process is forked off from the master process.
69
+ # This runs in the worker process only. Typically used to re-establish
70
+ # connections to external services or open files (logs and such).
71
+ def after_fork(worker)
72
+ # warn "In after_fork for worker #{worker.number}"
73
+ end
74
+
75
+ # Hook called before a worker process is forked off from the master process.
76
+ # This runs in the master process only. This can be used to monitor the rate
77
+ # at which workers are being created or to set a starting process state for
78
+ # the newly forked process.
79
+ def before_fork(worker)
80
+ # warn "In before_fork for worker #{worker.number}"
81
+ end
82
+ end
83
+ ```
84
+
85
+ ### divvy command
86
+
87
+ You can run the example script above with the `divvy` command, which includes
88
+ options for controlling concurrency and verbose logging. Here we use five
89
+ worker processes to generate 10 dance party ticket codes:
90
+
91
+ ```
92
+ $ divvy -n 5 danceparty.rb
93
+ 51589 0 b6589fc6ab0dc82cf12099d1c2d40ab994e8410c
94
+ 51590 1 356a192b7913b04c54574d18c28d46e6395428ab
95
+ 51589 4 1b6453892473a467d07372d45eb05abc2031647a
96
+ 51590 5 ac3478d69a3c81fa62e60f5c3696165a4e5e6ac4
97
+ 51591 2 da4b9237bacccdf19c0760cab7aec4a8359010b0
98
+ 51589 6 c1dfd96eea8cc2b62785275bca38ac261256e278
99
+ 51592 3 77de68daecd823babbb58edb1c8e14d7106e83bb
100
+ 51590 8 fe5dbbcea5ce7e2988b8c69bcfdfde8904aabc1f
101
+ 51591 9 0ade7c2cf97f75d009975f4d720d1fa6c19f4897
102
+ 51593 7 902ba3cda1883801594b6e1b452790cc53948fda
103
+ ```
104
+
105
+ The columns of output are the worker pid, the ticket number input, and the
106
+ generated ticket code result. You can see items are distributed between
107
+ available workers evenly-ish and may not be processed in order.
108
+
109
+ ### manual runner
110
+
111
+ You can also turn the current ruby process into a divvy master by creating a
112
+ `Divvy::Master` object, passing an instance of `Parallelizable` and the amount
113
+ of desired concurrency:
114
+
115
+ ``` ruby
116
+ require 'danceparty'
117
+ task = DanceParty.new
118
+ master = Divvy::Master.new(task, 10)
119
+ master.run
120
+ ```
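The README describes each yielded item as being marshalled "over a pipe", while `Divvy::Master#run` and `Divvy::Worker#dequeue` in this diff use a Unix domain socket with one connection per item. A minimal, stdlib-only sketch of that handoff follows. The `hand_off` helper and the temporary socket path are hypothetical names used for illustration and are not part of divvy; the sketch assumes a platform where `fork` is available.

```ruby
require 'socket'
require 'tmpdir'

# Hypothetical helper (not part of divvy): send one task item from a parent
# process to a forked child over a Unix domain socket, then return what the
# child decoded so the round trip can be inspected.
def hand_off(task_item)
  Dir.mktmpdir("divvy-sketch") do |dir|
    path = File.join(dir, "sketch.sock")
    server = UNIXServer.new(path)
    reader, writer = IO.pipe

    pid = fork do
      reader.close
      server.close
      # Worker side: connect, read until the parent closes, unmarshal.
      client = UNIXSocket.new(path)
      arguments = Marshal.load(client.read)
      client.close
      # Report the decoded arguments back so the parent can check them.
      writer.write(Marshal.dump(arguments))
      writer.close
      exit! 0 # skip ensure blocks so the child leaves the temp dir alone
    end

    writer.close
    # Master side: accept one pending connection and write one marshalled item.
    sock = server.accept
    sock.write(Marshal.dump(task_item))
    sock.close

    result = Marshal.load(reader.read)
    reader.close
    Process.waitpid(pid)
    server.close
    result
  end
end

p hand_off([42, "ticket"])
```

Because the item travels as an Array, the worker can splat it into `#perform`, which is why the arity of the yielded item and of `#perform` must agree.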
@@ -0,0 +1,6 @@
1
+ task :default => :test
2
+
3
+ desc "Run tests"
4
+ task :test do
5
+ sh "script/cibuild"
6
+ end
@@ -0,0 +1,31 @@
1
+ #!/usr/bin/env ruby
2
+ #/ Usage: divvy [-n <workers>] <script>.rb
3
+ #/ Run a divvy script with the given number of workers.
4
+ require 'optparse'
5
+ $stderr.sync = true
6
+
7
+ # Number of worker processes to spawn.
8
+ worker_count = 1
9
+
10
+ # Whether to write verbose output to stderr
11
+ verbose = false
12
+
13
+ # The divvy script that defines the task to run
14
+ script_file = nil
15
+
16
+ # parse arguments
17
+ file = __FILE__
18
+ ARGV.options do |opts|
19
+ opts.on("-n", "--workers=val", Integer) { |val| worker_count = val }
20
+ opts.on("-v", "--verbose") { |val| verbose = val }
21
+ opts.on_tail("-h", "--help") { exec "grep ^#/<'#{file}'|cut -c4-" }
22
+ opts.parse!
23
+ end
24
+ script_file = ARGV.shift
25
+
26
+ require 'divvy'
27
+ task = Divvy.load(script_file)
28
+ warn "divvy: booting #{worker_count} workers for #{task.class}" if verbose
29
+
30
+ master = Divvy::Master.new(task, worker_count, verbose)
31
+ master.run
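The option handling in this script can be exercised without touching the real `ARGV`. The sketch below uses the same `-n` and `-v` definitions with a standalone `OptionParser`; the `parse_cli` method and the options hash are hypothetical and only illustrate how the flags and the trailing script name are separated.

```ruby
require 'optparse'

# Hypothetical helper (not part of divvy): parse a divvy-style argument list
# and return the settings as a Hash instead of assigning local variables.
def parse_cli(argv)
  options = { workers: 1, verbose: false }
  parser = OptionParser.new do |opts|
    opts.on("-n", "--workers=val", Integer) { |val| options[:workers] = val }
    opts.on("-v", "--verbose") { |val| options[:verbose] = val }
  end
  # parse (without the bang) leaves argv untouched and returns the
  # non-option arguments that remain.
  remaining = parser.parse(argv)
  options.merge(script: remaining.first)
end

p parse_cli(%w[-n 5 danceparty.rb])
```

Returning a Hash keeps the sketch easy to inspect; the script above assigns plain local variables instead.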
@@ -0,0 +1,20 @@
1
+ # -*- encoding: utf-8 -*-
2
+
3
+ Gem::Specification.new do |s|
4
+ s.name = "divvy"
5
+ s.version = "1.0"
6
+ s.platform = Gem::Platform::RUBY
7
+ s.authors = %w[@rtomayko]
8
+ s.email = ["rtomayko@gmail.com"]
9
+ s.homepage = "https://github.com/rtomayko/divvy"
10
+ s.summary = "little ruby parallel script runner"
11
+ s.description = "..."
12
+
13
+ s.add_development_dependency "rake"
14
+ s.add_development_dependency "minitest"
15
+
16
+ s.files = `git ls-files`.split("\n")
17
+ s.test_files = `git ls-files -- test`.split("\n").select { |f| f =~ /_test\.rb$/ }
18
+ s.executables = `git ls-files -- bin`.split("\n").map { |f| File.basename(f) }
19
+ s.require_paths = %w[lib]
20
+ end
@@ -0,0 +1,58 @@
1
+ # This is a dance party. We're going to hand out tickets. We need to generate
2
+ # codes for each available ticket. Thing is, the ticket codes have to be
3
+ # generated by this external ticket code generator service (this part is
4
+ # just pretend) and there's a lot of latency involved. We can generate multiple
5
+ # ticket codes at the same time by making multiple connections.
6
+ require 'divvy'
7
+ require 'digest/sha1' # <-- your humble ticket code generator service
8
+
9
+ class DanceParty
10
+ # The Parallelizable module provides default method implementations and marks
11
+ # the object as following the interface defined below.
12
+ include Divvy::Parallelizable
13
+
14
+ # This is the main loop responsible for generating work items for worker
15
+ # processes. It runs in the master process only. Each item yielded from this
16
+ # method is marshalled over a pipe and distributed to the next available
17
+ # worker process where it arrives at the #perform method (see below).
18
+ #
19
+ # In this example we're just going to generate a series of numbers to pass
20
+ # to the workers. The workers just write the number out with their pid and the
21
+ # SHA1 hex digest of the number given.
22
+ def dispatch
23
+ tickets_available = ARGV[0] ? ARGV[0].to_i : 10
24
+ puts "Generating #{tickets_available} ticket codes for the show..."
25
+ (0...tickets_available).each do |ticket_number|
26
+ yield ticket_number
27
+ end
28
+ end
29
+
30
+ # The individual work item processing method. Each item produced by the
31
+ # dispatch method is sent to this method in the worker processes. The
32
+ # arguments to this method must match the arity of the work item yielded
33
+ # from the #dispatch method.
34
+ #
35
+ # In this example we're given an Integer ticket number and asked to produce a
36
+ # code. Pretend this is a network intense operation where you're mostly
37
+ # sleeping waiting for a reply.
38
+ def perform(ticket_number)
39
+ ticket_sha1 = Digest::SHA1.hexdigest(ticket_number.to_s)
40
+ printf "%5d %6d %s\n", $$, ticket_number, ticket_sha1
41
+ sleep 0.150 # fake some latency
42
+ end
43
+
44
+ # Hook called after a worker process is forked off from the master process.
45
+ # This runs in the worker process only. Typically used to re-establish
46
+ # connections to external services or open files (logs and such).
47
+ def after_fork(worker)
48
+ # warn "In after_fork for worker #{worker.number}"
49
+ end
50
+
51
+ # Hook called before a worker process is forked off from the master process.
52
+ # This runs in the master process only. This can be used to monitor the rate
53
+ # at which workers are being created or to set a starting process state for
54
+ # the newly forked process.
55
+ def before_fork(worker)
56
+ # warn "In before_fork for worker #{worker.number}"
57
+ end
58
+ end
@@ -0,0 +1,22 @@
1
+ require 'divvy/parallelizable'
2
+ require 'divvy/master'
3
+ require 'divvy/worker'
4
+
5
+ module Divvy
6
+ # Load a script that defines a Divvy::Parallelizable object. A class that
7
+ # includes the Parallelizable module must be defined in order for this to work.
8
+ #
9
+ # file - Script file to load.
10
+ #
11
+ # Returns an object that implements the Parallelizable interface.
12
+ # Raises a RuntimeError when no parallelizable object was defined.
13
+ def self.load(file)
14
+ Kernel::load(file)
15
+
16
+ if subclass = Parallelizable.parallelizable.last
17
+ @receiver = subclass.new
18
+ else
19
+ fail "#{file} does not define a Divvy::Parallelizable object"
20
+ end
21
+ end
22
+ end
@@ -0,0 +1,323 @@
1
+ require 'socket'
2
+
3
+ module Divvy
4
+ # The master process used to generate and distribute task items to the
5
+ # worker processes.
6
+ class Master
7
+ # The number of worker processes to boot.
8
+ attr_reader :worker_count
9
+
10
+ # The array of Worker objects this master is managing.
11
+ attr_reader :workers
12
+
13
+ # The string filename of the unix domain socket used to distribute work.
14
+ attr_reader :socket
15
+
16
+ # Enable verbose logging to stderr.
17
+ attr_accessor :verbose
18
+
19
+ # Number of tasks that have been distributed to worker processes.
20
+ attr_reader :tasks_distributed
21
+
22
+ # Number of worker processes that have exited with a failure status since
23
+ # the master started processing work.
24
+ attr_reader :failures
25
+
26
+ # Number of worker processes that have been spawned since the master
27
+ # started processing work.
28
+ attr_reader :spawn_count
29
+
30
+ # Raised from a signal handler when a forceful shutdown is requested.
31
+ class Shutdown < Exception
32
+ end
33
+
34
+ # Raised from the run loop when worker processes never fully booted and
35
+ # started making connections to the master.
36
+ class BootFailure < StandardError
37
+ end
38
+
39
+ # Create the master process object.
40
+ #
41
+ # task - Object that implements the Parallelizable interface.
42
+ # worker_count - Number of worker processes.
43
+ # verbose - Enable verbose error logging.
44
+ # socket - The unix domain socket filename.
45
+ #
46
+ # The workers array is initialized with Worker objects for the number of
47
+ # worker processes requested. The processes are not actually started at this
48
+ # time though.
49
+ def initialize(task, worker_count, verbose = false, socket = nil)
50
+ @task = task
51
+ @worker_count = worker_count
52
+ @verbose = verbose
53
+ @socket = socket || "/tmp/divvy-#{$$}-#{object_id}.sock"
54
+
55
+ # stats
56
+ @tasks_distributed = 0
57
+ @failures = 0
58
+ @spawn_count = 0
59
+
60
+ # shutdown state
61
+ @shutdown = false
62
+ @graceful = true
63
+ @reap = false
64
+
65
+ # worker objects
66
+ @workers = []
67
+ (1..@worker_count).each do |worker_num|
68
+ worker = Divvy::Worker.new(@task, worker_num, @socket, @verbose)
69
+ workers << worker
70
+ end
71
+ end
72
+
73
+ # Public: Start the main run loop. This installs signal handlers into the
74
+ # current process, binds to the unix domain socket, boots workers, and begins
75
+ # dispatching work.
76
+ #
77
+ # The run method does not return until all task items generated have been
78
+ # processed unless a shutdown signal is received or the #shutdown method is
79
+ # called within the loop.
80
+ #
81
+ # Returns nothing.
82
+ # Raises BootFailure when the workers fail to start.
83
+ # Raises Shutdown when a forceful shutdown is triggered (SIGTERM).
84
+ def run
85
+ fail "Already running!!!" if @server
86
+ fail "Attempt to run master in a worker process" if worker_process?
87
+ install_signal_traps
88
+ start_server
89
+
90
+ @task.dispatch do |*task_item|
91
+ # boot workers that haven't started yet or have been reaped
92
+ boot_workers
93
+
94
+ # check for shutdown or worker reap flag until a connection is pending
95
+ # in the domain socket queue. bail out if workers exited before even
96
+ # requesting a task item.
97
+ while IO.select([@server], nil, nil, 0.010).nil?
98
+ break if @shutdown
99
+ if @reap
100
+ reap_workers
101
+ if !workers_running? && @tasks_distributed == 0
102
+ raise BootFailure, "Worker processes failed to boot."
103
+ else
104
+ boot_workers
105
+ end
106
+ end
107
+ end
108
+ break if @shutdown
109
+
110
+ # at this point there should be at least one connection pending.
111
+ begin
112
+ data = Marshal.dump(task_item)
113
+ sock = @server.accept
114
+ sock.write(data)
115
+ ensure
116
+ sock.close if sock
117
+ end
118
+ @tasks_distributed += 1
119
+
120
+ break if @shutdown
121
+ reap_workers if @reap
122
+ end
123
+
124
+ nil
125
+ rescue Shutdown
126
+ @graceful = false
127
+ @shutdown = true
128
+ ensure
129
+ shutdown! if master_process?
130
+ end
131
+
132
+ # Public: Check if the current process is the master process.
133
+ #
134
+ # Returns true in the master process, false in the worker process.
135
+ def master_process?
136
+ !@workers.nil?
137
+ end
138
+
139
+ # Public: Check if the current process is a worker process.
140
+ # This relies on the @workers array being set to a nil value.
141
+ #
142
+ # Returns true in the worker process, false in master processes.
143
+ def worker_process?
144
+ !master_process?
145
+ end
146
+
147
+ # Public: Are any worker processes currently running or have yet to be
148
+ # reaped by the master process?
149
+ def workers_running?
150
+ @workers.any? { |worker| worker.running? }
151
+ end
152
+
153
+ # Public: Initiate shutdown of the run loop. The loop is still running when
154
+ # this method returns; it exits once the current dispatch iteration
155
+ # finishes.
156
+ def shutdown
157
+ @shutdown ||= Time.now
158
+ end
159
+
160
+ # Internal: Really shutdown the unix socket and reap all worker processes.
161
+ # This doesn't signal the workers. Instead, the socket shutdown is relied
162
+ # upon to trigger the workers to exit normally.
163
+ #
164
+ # TODO Send SIGKILL when workers stay running for configurable period.
165
+ def shutdown!
166
+ fail "Master#shutdown! called in worker process" if worker_process?
167
+ stop_server
168
+ while workers_running?
169
+ kill_workers("KILL") if !@graceful
170
+ reaped = reap_workers
171
+ sleep 0.010 if reaped.empty?
172
+ end
173
+ reset_signal_traps
174
+ raise Shutdown if !@graceful
175
+ end
176
+
177
+ # Internal: create and bind to the unix domain socket. Note that the
178
+ # requested backlog matches the number of workers. Otherwise workers will
179
+ # get ECONNREFUSED when attempting to connect to the master and exit.
180
+ def start_server
181
+ fail "Master#start_server called in worker process" if worker_process?
182
+ File.unlink(@socket) if File.exist?(@socket)
183
+ @server = UNIXServer.new(@socket)
184
+ @server.listen(worker_count)
185
+ end
186
+
187
+ # Internal: Close and remove the unix domain socket.
188
+ def stop_server
189
+ fail "Master#stop_server called in worker process" if worker_process?
190
+ File.unlink(@socket) if File.exist?(@socket)
191
+ @server.close if @server
192
+ @server = nil
193
+ end
194
+
195
+ # Internal: Boot any workers that are not currently running. This is a no-op
196
+ # if all workers are thought to be running. No attempt is made to verify
197
+ # worker processes are running here. Only workers that have not yet been
198
+ # booted and those previously marked as reaped are started.
199
+ def boot_workers
200
+ workers.each do |worker|
201
+ next if worker.running?
202
+ boot_worker(worker)
203
+ end
204
+ end
205
+
206
+ # Internal: Boot an individual worker process. Don't call this if the
207
+ # worker is thought to be running.
208
+ #
209
+ # worker - The Worker object to boot.
210
+ #
211
+ # Returns the Worker object provided.
212
+ def boot_worker(worker)
213
+ fail "worker #{worker.number} already running" if worker.running?
214
+ fail "attempt to boot worker without server" if !@server
215
+
216
+ @task.before_fork(worker)
217
+
218
+ worker.spawn do
219
+ reset_signal_traps
220
+ @workers = nil
221
+
222
+ @server.close
223
+ @server = nil
224
+
225
+ $stdin.close
226
+ end
227
+ @spawn_count += 1
228
+
229
+ worker
230
+ end
231
+
232
+ # Internal: Send a signal to all running workers.
233
+ #
234
+ # signal - The string signal name.
235
+ #
236
+ # Returns nothing.
237
+ def kill_workers(signal = 'TERM')
238
+ workers.each do |worker|
239
+ next if !worker.running?
240
+ worker.kill(signal)
241
+ end
242
+ end
243
+
244
+ # Internal: Attempt to reap all worker processes via Process::waitpid. This
245
+ # method does not block waiting for processes to exit. Running processes are
246
+ # ignored.
247
+ #
248
+ # Returns an array of Worker objects whose processes were reaped. The
249
+ # Worker#status attribute can be used to access the Process::Status result.
250
+ def reap_workers
251
+ @reap = false
252
+ workers.select do |worker|
253
+ if status = worker.reap
254
+ @failures += 1 if !status.success?
255
+ worker
256
+ end
257
+ end
258
+ end
259
+
260
+ # Internal: Install traps for shutdown signals. Most signals deal with
261
+ # shutting down the master loop and socket.
262
+ #
263
+ # INFO - Dump stack for all processes to stderr.
264
+ # TERM - Initiate immediate forceful shutdown of all worker processes
265
+ # along with the master process, aborting any existing jobs in
266
+ # progress.
267
+ # INT, QUIT - Initiate graceful shutdown, allowing existing worker processes
268
+ # to finish their current task and exit on their own. If this
269
+ # signal is received again after 10s, instead initiate an
270
+ # immediate forceful shutdown as with TERM. This is mostly so you
271
+ # can interrupt sanely with Ctrl+C with the master foregrounded.
272
+ # CHLD - Set the worker reap flag. An attempt is made to reap workers
273
+ # immediately after the current dispatch iteration.
274
+ #
275
+ # Returns nothing.
276
+ def install_signal_traps
277
+ @traps =
278
+ %w[INT QUIT].map do |signal|
279
+ Signal.trap signal do
280
+ if @shutdown
281
+ raise Shutdown, "SIG#{signal}" if (Time.now - @shutdown) > 10 # seconds
282
+ next
283
+ else
284
+ shutdown
285
+ log "#{signal} received. initiating graceful shutdown..."
286
+ end
287
+ end
288
+ end
289
+ @traps << Signal.trap("CHLD") { @reap = true }
290
+ @traps << Signal.trap("TERM") { raise Shutdown, "SIGTERM" }
291
+
292
+ Signal.trap "INFO" do
293
+ message = "==> info: process #$$ dumping stack\n"
294
+ message << caller.join("\n").gsub(/^/, " ").gsub("#{Dir.pwd}/", "")
295
+ $stderr.puts(message)
296
+ end
297
+
298
+ @traps
299
+ end
300
+
301
+ # Internal: Uninstall signal traps set up by the install_signal_traps
302
+ # method. This is called immediately after forking worker processes to reset
303
+ # traps to their default implementations and also when the master process
304
+ # shuts down.
305
+ def reset_signal_traps
306
+ return if @traps.nil? || @traps.empty?
307
+ %w[INT QUIT CHLD TERM].each do |signal|
308
+ handler = @traps.shift || "DEFAULT"
309
+ if handler.is_a?(String)
310
+ Signal.trap(signal, handler)
311
+ else
312
+ Signal.trap(signal, &handler)
313
+ end
314
+ end
315
+ end
316
+
317
+ # Internal: Write a verbose log message to stderr.
318
+ def log(message)
319
+ return if !verbose
320
+ $stderr.printf("master: %s\n", message)
321
+ end
322
+ end
323
+ end
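The install/reset pair above depends on `Signal.trap` returning the handler it replaced, which may be a String command such as "DEFAULT", a Proc, or nil. The following stdlib-only sketch shows that save-and-restore pattern for a single signal. The `trap_and_restore` method and the choice of `USR1` are assumptions made for illustration; divvy itself traps INT, QUIT, CHLD, TERM, and INFO.

```ruby
# Hypothetical helper (not part of divvy): install a handler, remember what it
# replaced, deliver the signal once to this process, then restore the
# previous handler.
def trap_and_restore(signal = "USR1")
  received = []
  previous = Signal.trap(signal) { received << signal }

  Process.kill(signal, Process.pid)
  # Handlers run on the main thread at the next safe point, so wait briefly
  # (with a deadline so the sketch cannot hang).
  deadline = Time.now + 2
  sleep 0.01 while received.empty? && Time.now < deadline

  # Strings are passed back as commands; Procs are passed back as blocks.
  # A nil previous handler falls back to "DEFAULT".
  replaced =
    if previous.is_a?(Proc)
      Signal.trap(signal, &previous)
    else
      Signal.trap(signal, previous || "DEFAULT")
    end

  { received: received, previous: previous, replaced: replaced }
end

result = trap_and_restore
puts "received: #{result[:received].inspect}"
```

The `replaced` value is the block installed at the top of the method, which confirms that the restore step swapped it back out.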
@@ -0,0 +1,61 @@
1
+ module Divvy
2
+ # Module defining the main task interface. Parallelizable classes must respond
3
+ # to #dispatch and #perform and may override hook methods to tap into the
4
+ # worker process lifecycle.
5
+ module Parallelizable
6
+ # The main loop responsible for generating task items to process in workers.
7
+ # Runs in the master process only. Each item this method yields is distributed
8
+ # to one of a pool of worker processes where #perform (see below) is invoked
9
+ # to process the task item.
10
+ #
11
+ # The arguments yielded to the block are passed with the same arity to
12
+ # the #perform method. Only marshallable types may be included.
13
+ #
14
+ # The dispatch method takes no arguments. It's expected that the receiving
15
+ # object is setup with all necessary state to generate task items or can
16
+ # retrieve the information it needs from elsewhere.
17
+ #
18
+ # When the dispatch method returns the master process exits.
19
+ # If an exception is raised, the program exits non-zero.
20
+ def dispatch
21
+ raise NotImplementedError, "#{self.class} must implement #dispatch method"
22
+ end
23
+
24
+ # Process an individual task item. Each item produced by #dispatch is sent here
25
+ # in one of a pool of the worker processes. The arguments to this method must
26
+ # match the arity of the task item yielded from #dispatch.
27
+ def perform(*args)
28
+ raise NotImplementedError, "#{self.class} must implement #perform method"
29
+ end
30
+
31
+ # Hook called after a worker process is forked off from the master process.
32
+ # This runs in the worker process only. Typically used to re-establish
33
+ # connections to external services or open files (logs and such).
34
+ #
35
+ # worker - A Divvy::Worker object describing the process that was just
36
+ # created. Always the current process ($$).
37
+ #
38
+ # Return value is ignored.
39
+ def after_fork(worker)
40
+ end
41
+
42
+ # Hook called before a worker process is forked off from the master process.
43
+ # This runs in the master process only.
44
+ #
45
+ # worker - Divvy::Worker object describing the process that's about to fork.
46
+ # Worker#pid will be nil but Worker#number (1..worker_count) is
47
+ # always available.
48
+ #
49
+ # Return value is ignored.
50
+ def before_fork(worker)
51
+ end
52
+
53
+ # Track classes and modules that include this module.
54
+ @parallelizable = []
55
+ def self.included(mod)
56
+ @parallelizable << mod if self == Divvy::Parallelizable
57
+ super
58
+ end
59
+ def self.parallelizable; @parallelizable; end
60
+ end
61
+ end
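`Divvy.load` finds the task class through the `included` hook at the bottom of this module: every class that includes `Parallelizable` is appended to a module-level list, and the last entry is instantiated. A minimal sketch of that tracking pattern follows. `Trackable`, `FirstTask`, and `SecondTask` are hypothetical stand-ins and are not divvy names.

```ruby
# Hypothetical stand-in for Divvy::Parallelizable, for illustration only.
module Trackable
  # Module-level instance variable holding every class that includes us.
  @includers = []

  # Ruby calls this hook each time the module is included somewhere.
  def self.included(mod)
    @includers << mod
    super
  end

  def self.includers
    @includers
  end
end

class FirstTask
  include Trackable
end

class SecondTask
  include Trackable
end

# Mirrors the `Parallelizable.parallelizable.last` lookup in Divvy.load:
# the most recently defined includer wins.
latest = Trackable.includers.last
puts latest.name
```

One consequence of picking the last entry is that a script defining several task classes runs only the one defined last.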
@@ -0,0 +1,153 @@
1
+ module Divvy
2
+ # Models an individual divvy worker process. These objects are used in both
3
+ # the master and the forked off workers to perform common tasks and for basic
4
+ # tracking.
5
+ class Worker
6
+ # The worker number. These are sequential starting from 1 and ending in the
7
+ # configured worker concurrency count.
8
+ attr_reader :number
9
+
10
+ # The Unix domain socket file used to communicate with the master process.
11
+ attr_reader :socket
12
+
13
+ # Whether verbose log info should be written to stderr.
14
+ attr_accessor :verbose
15
+
16
+ # Process::Status object result of reaping the worker.
17
+ attr_reader :status
18
+
19
+ # The worker process's pid. This is $$ when inside the worker process.
20
+ attr_accessor :pid
21
+
22
+ # Create a Worker object. The Master object typically handles this.
23
+ def initialize(task, number, socket, verbose = false)
24
+ @task = task
25
+ @number = number
26
+ @socket = socket
27
+ @verbose = verbose
28
+ @pid = nil
29
+ @status = nil
30
+ end
31
+
32
+ # Public: Check whether the worker process is thought to be running. This
33
+ # does not attempt to verify the real state of the process with the system.
34
+ def running?
35
+ @pid && @status.nil?
36
+ end
37
+
38
+ # Public: Send a signal to a running worker process.
39
+ #
40
+ # signal - String signal name.
41
+ #
42
+ # Returns true when the process was signaled, false if the process is no
43
+ # longer running.
44
+ # Raises when the worker process is not thought to be running.
45
+ def kill(signal)
46
+ fail "worker not running" if @pid.nil?
47
+ log "sending signal #{signal}"
48
+ Process.kill(signal, @pid)
49
+ true
50
+ rescue Errno::ESRCH
51
+ false
52
+ end
53
+
54
+ # Public: Check whether the current process is this worker process.
55
+ #
56
+ # Returns true when we're in this worker, false in the master.
57
+ def worker_process?
58
+ @pid == $$
59
+ end
60
+
61
+ # Public: Fork off a new process for this worker and yield to the block
62
+ # immediately in the new child process.
63
+ #
64
+ # Returns the pid of the new process in the master process. Never returns in
65
+ # the child process.
66
+ # Raises when the worker process is already thought to be running or has not
67
+ # yet been reaped.
68
+ def spawn
69
+ fail "worker already running" if running?
70
+ @status = nil
71
+
72
+ if (@pid = fork).nil?
73
+ @pid = $$
74
+ yield
75
+ install_signal_traps
76
+ main
77
+ exit 0
78
+ end
79
+
80
+ @pid
81
+ end
82
+
83
+ # Public: Attempt to reap this worker's process using waitpid. This is a
84
+ # no-op if the process is not thought to be running or is marked as already
85
+ # being reaped. This should only be called in the master process.
86
+ #
87
+ # Returns the Process::Status object if the process was reaped, nil if the
88
+ # process was not reaped because it's still running or is already reaped.
89
+ def reap
90
+ if @status.nil? && @pid && Process::waitpid(@pid, Process::WNOHANG)
91
+ @status = $?
92
+ log "exited, reaped pid #{@pid} (status: #{@status.exitstatus})"
93
+ @status
94
+ end
95
+ end
96
+
97
+ # Internal: The main worker loop. This is called after a new worker process
98
+ # has been setup with signal traps and whatnot and connects to the master in
99
+ # a loop to retrieve task items. The worker process exits if this method
100
+ # returns or raises an exception.
101
+ def main
102
+ fail "Worker#main called in master process" if !worker_process?
103
+
104
+ log "booted with pid #{@pid}"
105
+ @task.after_fork(self)
106
+
107
+ while arguments = dequeue
108
+ @task.perform(*arguments)
109
+ break if @shutdown
110
+ end
111
+
112
+ # worker should exit on return
113
+ rescue Exception => boom
114
+ warn "error: worker [#{number}]: #{boom.class} #{boom.to_s}"
115
+ exit 1
116
+ end
117
+
118
+ # Internal: Retrieve an individual task item from the master process. Opens
119
+ # a new socket, reads and unmarshals a single task item.
120
+ #
121
+ # Returns an Array containing the arguments yielded by the dispatcher.
122
+ def dequeue
123
+ client = UNIXSocket.new(@socket)
124
+ r, w, e = IO.select([client], nil, [client], nil)
125
+ return if !e.empty?
126
+
127
+ if data = client.read(16384)
128
+ Marshal.load(data)
129
+ end
130
+ rescue Errno::ENOENT => boom
131
+ # socket file went away, bail out
132
+ ensure
133
+ client.close if client
134
+ end
135
+
136
+ def install_signal_traps
137
+ fail "attempt to install worker signal handling in master" if !worker_process?
138
+
139
+ %w[INT TERM QUIT].each do |signal|
140
+ trap signal do
141
+ next if @shutdown
142
+ @shutdown = true
143
+ log "#{signal} received. initiating graceful shutdown..."
144
+ end
145
+ end
146
+ end
147
+
148
+ def log(message)
149
+ return if !verbose
150
+ $stderr.printf("worker [%d]: %s\n", number, message)
151
+ end
152
+ end
153
+ end
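`Worker#reap` relies on `Process::waitpid` with `Process::WNOHANG`, which returns nil while the child is still running and the pid once it has exited, at which point `$?` holds the `Process::Status`. The sketch below shows both outcomes using only the standard library. The `reap_nonblocking` and `reap_demo` names, the pipe used to hold the child open, and the exit code 3 are assumptions made for illustration; `fork` must be available.

```ruby
# Hypothetical helper (not part of divvy): return the Process::Status if the
# child has exited, or nil if it is still running.
def reap_nonblocking(pid)
  Process.waitpid(pid, Process::WNOHANG) ? $? : nil
end

def reap_demo
  gate_r, gate_w = IO.pipe
  pid = fork do
    gate_w.close
    gate_r.read # blocks until the parent closes its write end
    exit! 3     # exit with a failure status
  end
  gate_r.close

  early = reap_nonblocking(pid) # child is still blocked on the pipe
  gate_w.close                  # release the child so it can exit

  # Poll without blocking, as the master does between dispatch iterations.
  status = nil
  sleep 0.010 until (status = reap_nonblocking(pid))
  [early, status]
end

early, status = reap_demo
puts "early=#{early.inspect} exitstatus=#{status.exitstatus} success=#{status.success?}"
```

The master counts a reaped worker as a failure when `status.success?` is false, which is the case for the non-zero exit used here.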
@@ -0,0 +1,4 @@
1
+ #!/bin/sh
2
+ set -e
3
+ cd "$(dirname "$0")/.."
4
+ testrb test/*_test.rb
@@ -0,0 +1,12 @@
1
+ require File.expand_path("../setup", __FILE__)
2
+ require "divvy"
3
+
4
+ # Tests that make sure the example.rb file runs and generates tickets.
5
+ class ExampleTest < MiniTest::Unit::TestCase
6
+ def test_running_the_example_program
7
+ example_script = File.expand_path("../../example.rb", __FILE__)
8
+ output = `divvy -n 2 '#{example_script}'`
9
+ assert $?.success?, "example program should exit successfully"
10
+ assert_equal 11, output.split("\n").size
11
+ end
12
+ end
@@ -0,0 +1,146 @@
1
+ require File.expand_path("../setup", __FILE__)
2
+ require "divvy"
3
+
4
+ # Tests for Divvy::Master: worker bookkeeping, server setup, signal traps, and stats.
5
+ class MasterTest < MiniTest::Unit::TestCase
6
+ class SimpleTask
7
+ include Divvy::Parallelizable
8
+
9
+ def dispatch(&block)
10
+ 10.times(&block)
11
+ end
12
+
13
+ def perform(num)
14
+ end
15
+ end
16
+
17
+ def setup
18
+ @task = SimpleTask.new
19
+ @master = Divvy::Master.new(@task, 2)
20
+ end
21
+
22
+ def teardown
23
+ @master.shutdown!
24
+ end
25
+
26
+ def test_worker_object_instantiation
27
+ assert_equal 2, @master.workers.size
28
+
29
+ assert_equal 1, @master.workers[0].number
30
+ assert_equal 2, @master.workers[1].number
31
+
32
+ @master.workers.each { |worker| assert_nil worker.pid }
33
+ @master.workers.each { |worker| assert_nil worker.status }
34
+ end
35
+
36
+ def test_workers_running_check
37
+ assert !@master.workers_running?
38
+ end
39
+
40
+ def test_master_process_check
41
+ assert @master.master_process?
42
+ assert !@master.worker_process?
43
+ end
44
+
45
+ def test_start_server
46
+ @master.start_server
47
+ assert File.exist?(@master.socket)
48
+ end
49
+
50
+ def test_boot_workers
51
+ @master.start_server
52
+ @master.boot_workers
53
+ assert @master.workers_running?
54
+ assert @master.workers.all? { |w| w.running? }
55
+ end
56
+
57
+ def test_reaping_and_killing_workers
58
+ @master.start_server
59
+ @master.boot_workers
60
+ reaped = @master.reap_workers
61
+ assert_equal 0, reaped.size
62
+
63
+ @master.kill_workers("KILL")
64
+ sleep 0.100
65
+ reaped = @master.reap_workers
66
+ assert_equal 2, reaped.size
67
+ end
68
+
69
+ def test_installing_and_uninstalling_signal_traps
70
+ traps = @master.install_signal_traps
71
+ assert traps.is_a?(Array)
72
+ assert traps.size >= 4
73
+
74
+ @master.reset_signal_traps
75
+ assert_equal 0, traps.size
76
+ end
77
+
78
+ class SuccessfulTask
79
+ include Divvy::Parallelizable
80
+
81
+ def dispatch
82
+ yield 'just one thing'
83
+ end
84
+
85
+ def perform(arg)
86
+ if arg != 'just one thing'
87
+ fail "expected arg to be 'just one thing'"
88
+ end
89
+ end
90
+ end
91
+
92
+ def test_successful_run
93
+ task = SuccessfulTask.new
94
+ master = Divvy::Master.new(task, 1)
95
+ master.run
96
+ end
97
+
98
+ class StatsTask
99
+ include Divvy::Parallelizable
100
+
101
+ def dispatch(&block)
102
+ 10.times(&block)
103
+ end
104
+
105
+ def perform(num)
106
+ if num % 2 == 0
107
+ fail "simulated failure"
108
+ else
109
+ true
110
+ end
111
+ end
112
+
113
+ def after_fork(worker)
114
+ $stderr.reopen("/dev/null")
115
+ end
116
+ end
117
+
118
+ def test_stats
119
+ task = StatsTask.new
120
+ master = Divvy::Master.new(task, 5)
121
+ master.run
122
+ assert_equal 5, master.failures
123
+ assert_equal 10, master.tasks_distributed
124
+ end
125
+
126
+ class FlappingWorkerTask
127
+ include Divvy::Parallelizable
128
+
129
+ def dispatch
130
+ yield 'never makes it'
131
+ end
132
+
133
+ def after_fork(worker)
134
+ exit! 1
135
+ end
136
+ end
137
+
138
+ def test_flapping_worker_detection
139
+ task = FlappingWorkerTask.new
140
+ master = Divvy::Master.new(task, 1)
141
+ assert_raises(Divvy::Master::BootFailure) do
142
+ master.run
143
+ end
144
+ assert_equal 1, master.failures
145
+ end
146
+ end
@@ -0,0 +1,19 @@
1
+ # Basic test environment.
2
+ #
3
+ # This should set up the load path for testing only. Don't require any support libs
4
+ # or gitrpc stuff in here.
5
+
6
+ # bring in minitest
7
+ require 'minitest/autorun'
8
+
9
+ # add bin dir to path for testing command
10
+ ENV['PATH'] = [
11
+ File.expand_path("../../bin", __FILE__),
12
+ ENV['PATH']
13
+ ].join(":")
14
+
15
+ # put lib dir directly to load path
16
+ $LOAD_PATH.unshift File.expand_path("../../lib", __FILE__)
17
+
18
+ # child processes inherit our load path
19
+ ENV['RUBYLIB'] = $LOAD_PATH.join(":")
metadata ADDED
@@ -0,0 +1,94 @@
1
+ --- !ruby/object:Gem::Specification
2
+ name: divvy
3
+ version: !ruby/object:Gem::Version
4
+ version: '1.0'
5
+ prerelease:
6
+ platform: ruby
7
+ authors:
8
+ - ! '@rtomayko'
9
+ autorequire:
10
+ bindir: bin
11
+ cert_chain: []
12
+ date: 2013-04-26 00:00:00.000000000 Z
13
+ dependencies:
14
+ - !ruby/object:Gem::Dependency
15
+ name: rake
16
+ requirement: !ruby/object:Gem::Requirement
17
+ none: false
18
+ requirements:
19
+ - - ! '>='
20
+ - !ruby/object:Gem::Version
21
+ version: '0'
22
+ type: :development
23
+ prerelease: false
24
+ version_requirements: !ruby/object:Gem::Requirement
25
+ none: false
26
+ requirements:
27
+ - - ! '>='
28
+ - !ruby/object:Gem::Version
29
+ version: '0'
30
+ - !ruby/object:Gem::Dependency
31
+ name: minitest
32
+ requirement: !ruby/object:Gem::Requirement
33
+ none: false
34
+ requirements:
35
+ - - ! '>='
36
+ - !ruby/object:Gem::Version
37
+ version: '0'
38
+ type: :development
39
+ prerelease: false
40
+ version_requirements: !ruby/object:Gem::Requirement
41
+ none: false
42
+ requirements:
43
+ - - ! '>='
44
+ - !ruby/object:Gem::Version
45
+ version: '0'
46
+ description: ! '...'
47
+ email:
48
+ - rtomayko@gmail.com
49
+ executables:
50
+ - divvy
51
+ extensions: []
52
+ extra_rdoc_files: []
53
+ files:
54
+ - COPYING
55
+ - README.md
56
+ - Rakefile
57
+ - bin/divvy
58
+ - divvy.gemspec
59
+ - example.rb
60
+ - lib/divvy.rb
61
+ - lib/divvy/master.rb
62
+ - lib/divvy/parallelizable.rb
63
+ - lib/divvy/worker.rb
64
+ - script/cibuild
65
+ - test/example_test.rb
66
+ - test/master_test.rb
67
+ - test/setup.rb
68
+ homepage: https://github.com/rtomayko/divvy
69
+ licenses: []
70
+ post_install_message:
71
+ rdoc_options: []
72
+ require_paths:
73
+ - lib
74
+ required_ruby_version: !ruby/object:Gem::Requirement
75
+ none: false
76
+ requirements:
77
+ - - ! '>='
78
+ - !ruby/object:Gem::Version
79
+ version: '0'
80
+ required_rubygems_version: !ruby/object:Gem::Requirement
81
+ none: false
82
+ requirements:
83
+ - - ! '>='
84
+ - !ruby/object:Gem::Version
85
+ version: '0'
86
+ requirements: []
87
+ rubyforge_project:
88
+ rubygems_version: 1.8.23
89
+ signing_key:
90
+ specification_version: 3
91
+ summary: little ruby parallel script runner
92
+ test_files:
93
+ - test/example_test.rb
94
+ - test/master_test.rb
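
The worker's `install_signal_traps` in `lib/divvy/worker.rb` uses a flag-based graceful shutdown: the trap handler only flips `@shutdown`, and the work loop is expected to check that flag between items. The sketch below shows that general pattern in isolation. It is not divvy's implementation; the `ShutdownFlag` class, its `install`/`reset` methods, and the `drain` helper are hypothetical names introduced here for illustration.

```ruby
# Hypothetical illustration of a flag-based shutdown; not part of divvy.
class ShutdownFlag
  def initialize(signals = %w[INT TERM QUIT])
    @signals = signals
    @shutdown = false
    @previous = {}
  end

  # True once any of the trapped signals has been received.
  def shutdown?
    @shutdown
  end

  # Install handlers that only set the flag, remembering the old handlers.
  def install
    @signals.each do |signal|
      @previous[signal] = Signal.trap(signal) { @shutdown = true }
    end
    self
  end

  # Put back whatever handlers were in place before install.
  def reset
    @previous.each do |signal, handler|
      Signal.trap(signal, handler || "DEFAULT")
    end
    @previous.clear
    self
  end
end

# Process items in order, stopping early once the flag is set.
# Returns the results of the items that were processed.
def drain(items, flag)
  done = []
  items.each do |item|
    break if flag.shutdown?
    done << yield(item)
  end
  done
end
```

A caller would typically build one flag, call `install` before entering the work loop, and call `reset` afterwards so the earlier handlers are restored. Keeping the handler body down to a single assignment avoids doing real work inside the trap context.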