ultravisor 0.0.0.3.g8cf10dc

@@ -0,0 +1,451 @@
1
+ > # WARNING WARNING WARNING
2
+ >
3
+ > This README is, at least in part, speculative fiction. I practice
4
+ > README-driven development, and as such, not everything described in here
5
+ > actually exists yet, and what does exist may not work right.
6
+
7
+ Ultravisor is like a supervisor, but... *ULTRA*. The idea is that you specify
8
+ objects to instantiate and run in threads, and then the Ultravisor makes that
9
+ happen behind the scenes, including logging failures, restarting if necessary,
10
+ and so on. If you're familiar with Erlang supervision trees, then Ultravisor
11
+ will feel familiar to you, because I stole pretty much every good idea that
12
+ is in Ultravisor from Erlang. You will get a lot of very excellent insight
13
+ from reading [the Erlang/OTP Supervision Principles](http://erlang.org/doc/design_principles/sup_princ.html).
14
+
15
+
16
+ # Installation
17
+
18
+ It's a gem:
19
+
20
+ gem install ultravisor
21
+
22
+ There's also the wonders of [the Gemfile](http://bundler.io):
23
+
24
+ gem 'ultravisor'
25
+
26
+ If you're the sturdy type that likes to run from git:
27
+
28
+ rake install
29
+
30
+ Or, if you've eschewed the convenience of Rubygems entirely, then you
31
+ presumably know what to do already.
32
+
33
+
34
+ # Usage
35
+
36
+ This section gives you a basic overview of the high points of how Ultravisor
37
+ can be used. It is not intended to be an exhaustive reference of all possible
38
+ options; the {Ultravisor} class API documentation provides every possible option
39
+ and its meaning.
40
+
41
+
42
+ ## The Basics
43
+
44
+ Start by loading the code:
45
+
46
+ require "ultravisor"
47
+
48
+ Creating a new Ultravisor is a matter of instantiating a new object:
49
+
50
+ u = Ultravisor.new
51
+
52
+ In order for it to be useful, though, you'll need to add one or more children
53
+ to the Ultravisor instance, which can either be done as part of the call to
54
+ `.new`, or afterwards, as you see fit:
55
+
56
+ # Defining a child in the constructor
57
+ u = Ultravisor.new(children: [{id: :child, klass: Child, method: :run}])
58
+
59
+ # OR define it afterwards
60
+ u = Ultravisor.new
61
+ u.add_child(id: :my_child, klass: Child, method: :run)
62
+
63
+ Once you have an Ultravisor with children configured, you can set it running:
64
+
65
+ u.run
66
+
67
+ This will block until the Ultravisor terminates, one way or another.
68
+
69
+ We'll learn about other available initialization arguments, and all the other
70
+ features of Ultravisor, in the following sections.
71
+
72
+
73
+ ## Defining Children
74
+
75
+ As children are the primary reason Ultravisor exists, it is worth getting a handle
76
+ on them first.
77
+
78
+ Defining children, as we saw in the introduction, can be done by calling
79
+ {Ultravisor#add_child} for each child you want to add, or else you can provide
80
+ a list of children to start as part of the {Ultravisor.new} call, using the
81
+ `children` named argument. You can also combine the two approaches, if some
82
+ children are defined statically, while others only get added conditionally.
83
+
84
+ Let's take another look at that {Ultravisor#add_child} method from earlier:
85
+
86
+ u.add_child(id: :my_child, klass: Child, method: :run)
87
+
88
+ First up, every child has an ID. This is fairly straightforward -- it's a
89
+ unique ID (within a given Ultravisor) that refers to the child. Attempting to
90
+ add two children with the same ID will raise an exception.
91
+
92
+ The `klass` and `method` arguments require a little more explanation. One
93
+ of the foundational principles of "fail fast" is "clean restart" -- that is, if you
94
+ do need to restart something, it's important to start with as clean a state as possible.
95
+ Thus, if a child needs to be restarted, we don't want to reuse an existing object, which
96
+ may be in a messy and unusable state. Instead, we want a clean, fresh object to work on.
97
+ That's why you specify a `klass` when you define a child -- it is a new instance of that
98
+ class that will be used every time the child is started (or restarted).
99
+
100
+ The `method` argument might now be obvious. Once the new instance of the
101
+ specified `klass` exists, the Ultravisor will call the specified `method` to start
102
+ work happening. It is expected that this method will ***not return***, in most cases.
103
+ So you probably want some sort of infinite loop.
104
+
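A minimal sketch of the "fresh instance per restart" idea, written without Ultravisor so it runs on its own. The `Ticker` class, its `fail_after:` argument, and the hand-rolled restart loop are all hypothetical stand-ins, not the gem's implementation; they only show why a new object starts from clean state each time.

```ruby
# Hypothetical worker class; not part of Ultravisor.
class Ticker
  attr_reader :ticks

  def initialize(fail_after:)
    @ticks = 0                # every new instance starts from clean state
    @fail_after = fail_after
  end

  # The entry method: loops until something goes wrong.
  def run
    loop do
      @ticks += 1
      raise "simulated crash" if @ticks >= @fail_after
    end
  end
end

# Stand-in for a supervisor: build a brand new object for every (re)start
# rather than reusing the one that just crashed.
instances = []
3.times do
  worker = Ticker.new(fail_after: 5)
  instances << worker
  thread = Thread.new do
    Thread.current.report_on_exception = false
    worker.run
  end
  begin
    thread.join               # re-raises the exception that killed the thread
  rescue RuntimeError
    # the worker crashed; go around again with a fresh instance
  end
end
```

Each pass through the loop gets its own `Ticker`, so the counter from a crashed run never leaks into the next one.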
105
+ You might think that this is extremely inflexible, only being able to specify a class
106
+ and a method to call. What if you want to pass in some parameters? Don't worry, we've
107
+ got you covered:
108
+
109
+ u.add_child(
110
+ id: :my_child,
111
+ klass: Child,
112
+ args: ['foo', 42, x: 1, y: 2],
113
+ method: :run,
114
+ )
115
+
116
+ The call to `Child.new` can take arbitrary arguments, just by defining an array
117
+ for the `args` named parameter. Did you know you can define a hash inside an
118
+ array like `['foo', 'bar', x: 1, y: 2] => ['foo', 'bar', {:x => 1, :y => 2}]`?
119
+ I didn't, either, until I started working on Ultravisor, but you can, and it
120
+ works *exactly* like named parameters in method calls.
121
+
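The array-with-trailing-hash behaviour can be checked in plain Ruby. The sketch below uses a hypothetical `Child` class and a hypothetical `instantiate` helper; how Ultravisor itself hands `args` to `klass.new` is not shown in this README, so treat the helper as one possible approach. On Ruby 3.0 and later, splatting such an array passes the trailing Hash as a positional argument rather than as keywords, which is why the helper separates it out explicitly.

```ruby
# Hypothetical child class taking positional and keyword arguments.
class Child
  attr_reader :name, :count, :x, :y

  def initialize(name, count, x:, y:)
    @name = name
    @count = count
    @x = x
    @y = y
  end
end

args = ['foo', 42, x: 1, y: 2]

# The trailing key/value pairs collapse into a single Hash element.
same_thing = (args == ['foo', 42, { x: 1, y: 2 }])

# Hypothetical helper: treat a trailing Hash as keyword arguments.
def instantiate(klass, args)
  positional = args.dup
  keywords = positional.last.is_a?(Hash) ? positional.pop : {}
  klass.new(*positional, **keywords)
end

child = instantiate(Child, args)
```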
122
+ You can also add children after the Ultravisor has been set running:
123
+
124
+ u = Ultravisor.new
125
+
126
+ u.add_child(id: :c1, klass: SomeWorker, method: :run)
127
+
128
+ u.run # => starts running an instance of SomeWorker, doesn't return
129
+
130
+ # In another thread...
131
+ u.add_child(id: :c2, klass: OtherWorker, method: :go!)
132
+
133
+ # An instance of OtherWorker will be created and set running
134
+
135
+ If you add a child to an already-running Ultravisor, that child will immediately be
136
+ started running, almost like magic.
137
+
138
+
139
+ ### Ordering of Children
140
+
141
+ The order in which children are defined is important. When children are (re)started,
142
+ they are always started in the order they were defined. When children are stopped,
143
+ either because the Ultravisor is shutting down, or because of a [supervision
144
+ strategy](#supervision-strategies), they are always stopped in the *reverse* order
145
+ of their definition.
146
+
147
+ All child specifications passed to {Ultravisor.new} always come first, in the
148
+ order they were in the array. Any children defined via calls to
149
+ {Ultravisor#add_child} will go next, in the order the `add_child` calls were
150
+ made.
151
+
152
+
153
+ ## Restarting Children
154
+
155
+ One of the fundamental purposes of a supervisor like Ultravisor is that it restarts
156
+ children if they crash, on the principle of "fail fast". There's no point failing fast
157
+ if things don't get automatically fixed. This is the default behaviour of all
158
+ Ultravisor children.
159
+
160
+ Controlling how children are restarted is the purpose of the "restart policy",
161
+ which is controlled by the `restart` and `restart_policy` named arguments in
162
+ the child specification. For example, if you want to create a child that will
163
+ only ever be run once, regardless of what happens to it, then use `restart:
164
+ :never`:
165
+
166
+ u.add_child(
167
+ id: :my_one_shot_child,
168
+ klass: Child,
169
+ method: :run_maybe,
170
+ restart: :never
171
+ )
172
+
173
+ If you want a child which gets restarted if its `method` raises an exception,
174
+ but *not* if it runs to completion without error, then use `restart: :on_failure`:
175
+
176
+ u.add_child(
177
+ id: :my_run_once_child,
178
+ klass: Child,
179
+ method: :run_once,
180
+ restart: :on_failure
181
+ )
182
+
183
+ ### The Limits of Failure
184
+
185
+ While restarting is great in general, you don't particularly want to fill your
186
+ logs with an endlessly restarting child -- say, because it doesn't have
187
+ permission to access a database. To solve that problem, an Ultravisor will
188
+ only attempt to restart a child a certain number of times before giving up and
189
+ exiting itself. The parameters of how this works are controlled by the
190
+ `restart_policy`, which is itself a hash:
191
+
192
+ u.add_child(
193
+ id: :my_restartable_child,
194
+ klass: Child,
195
+ method: :run,
196
+ restart_policy: {
197
+ period: 5,
198
+ retries: 2,
199
+ delay: 1,
200
+ }
201
+ )
202
+
203
+ The meaning of each of the `restart_policy` keys is best explained as part
204
+ of how Ultravisor restarts children.
205
+
206
+ When a child needs to be restarted, Ultravisor first waits a little while
207
+ before attempting the restart. The amount of time to wait is specified
208
+ by the `delay` value in the `restart_policy`. Then a new instance of the
209
+ `klass` is instantiated, and the `method` is called on that instance.
210
+
211
+ The `period` and `retries` values of the `restart_policy` come into play
212
+ when the child exits repeatedly. If a single child needs to be restarted
213
+ more than `retries` times in `period` seconds, then instead of trying to
214
+ restart again, Ultravisor gives up. It doesn't try to start the child
215
+ again, it terminates all the *other* children of the Ultravisor, and
216
+ then it exits. Note that the `delay` between restarts is *not* part
217
+ of the `period`; only time spent actually running the child is
218
+ accounted for.
219
+
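One way to picture the `period` / `retries` accounting is a small tracker that advances its clock only while the child is running, so the `delay` never counts. The `RestartTracker` class below is a hypothetical illustration of that reading of the policy, not the code Ultravisor uses.

```ruby
# Hypothetical helper, not part of Ultravisor's API.
class RestartTracker
  def initialize(period:, retries:)
    @period = period
    @retries = retries
    @running_clock = 0.0   # total seconds the child has spent running
    @restarts = []         # running-clock readings at which a restart was needed
  end

  # runtime: how long the child ran before it exited, in seconds.
  # Returns true if another restart is allowed, false if the policy is blown.
  def restart_allowed?(runtime)
    @running_clock += runtime
    @restarts << @running_clock
    # Forget restarts that fell out of the sliding window.
    @restarts.reject! { |at| at <= @running_clock - @period }
    @restarts.size <= @retries
  end
end

# A child that dies after one second, every time.
tracker = RestartTracker.new(period: 5, retries: 2)
quick_failures = [1, 1, 1].map { |runtime| tracker.restart_allowed?(runtime) }

# A child that runs for ten seconds between failures.
steady = RestartTracker.new(period: 5, retries: 2)
slow_failures = [10, 10, 10].map { |runtime| steady.restart_allowed?(runtime) }
```

With `period: 5, retries: 2`, the third quick failure exceeds the policy, while the slow-failing child never has more than one restart inside the window.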
220
+
221
+ ## Managed Child Termination
222
+
223
+ If children need to be terminated, by default, child threads are simply
224
+ forcibly terminated by calling {Thread#kill} on them. However, for workers
225
+ which hold resources, this can cause problems.
226
+
227
+ Thus, it is possible to control both how a child is terminated, and how long
228
+ to wait for that termination to occur, by using the `shutdown` named argument
229
+ when you add a child (either via {Ultravisor#add_child}, or as part of the
230
+ `children` named argument to {Ultravisor.new}), like this:
231
+
232
+ u.add_child(
233
+ id: :fancy_worker,
234
+ shutdown: {
235
+ method: :gentle_landing,
236
+ timeout: 30
237
+ }
238
+ )
239
+
240
+ When a child with a custom shutdown policy needs to be terminated, the
241
+ method named in the `method` key is called on the instance of `klass` that
242
+ represents that child. Once the shutdown has been signalled to the
243
+ worker, up to `timeout` seconds is allowed to elapse. If the child thread has
244
+ not terminated by this time, the thread is forcibly terminated by calling
245
+ {Thread#kill}. This timeout prevents shutdown or group restart from hanging
246
+ indefinitely.
247
+
248
+ Note that the `method` specified in the `shutdown` specification should
249
+ signal the worker to terminate, and then return immediately. It should
250
+ *not* wait for termination itself.
251
+
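The signal-then-wait sequence can be sketched with plain threads. `FancyWorker`, `StubbornWorker`, and the `stop_worker` helper are hypothetical names used only for illustration; the sketch assumes the supervisor side waits with `Thread#join` and a limit, then falls back to `Thread#kill`.

```ruby
# Hypothetical worker whose shutdown hook only flips a flag.
class FancyWorker
  def initialize
    @stopping = false
  end

  def run
    sleep 0.01 until @stopping
    :landed
  end

  # Signals the run loop and returns immediately; it does not wait.
  def gentle_landing
    @stopping = true
  end
end

# Hypothetical worker that ignores the shutdown request.
class StubbornWorker
  def run
    sleep
  end

  def gentle_landing; end
end

# Hypothetical supervisor-side helper: signal, wait up to `timeout`
# seconds, then forcibly terminate the thread if it is still running.
def stop_worker(worker, thread, shutdown_method:, timeout:)
  worker.public_send(shutdown_method)
  return :graceful if thread.join(timeout)

  thread.kill
  thread.join
  :killed
end

worker = FancyWorker.new
thread = Thread.new { worker.run }
outcome = stop_worker(worker, thread, shutdown_method: :gentle_landing, timeout: 2)

stubborn = StubbornWorker.new
stubborn_thread = Thread.new { stubborn.run }
stubborn_outcome = stop_worker(stubborn, stubborn_thread, shutdown_method: :gentle_landing, timeout: 0.1)
```

Because `gentle_landing` returns straight away, the timeout measures only how long the worker takes to wind down, not how long the hook itself runs.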
252
+
253
+ ## Supervision Strategies
254
+
255
+ When a child needs to be restarted, by default only the child that exited
256
+ will be restarted. However, it is possible to cause other
257
+ children to be restarted as well, if that is necessary. To do that, you
258
+ use the `strategy` named parameter when creating the Ultravisor:
259
+
260
+ u = Ultravisor.new(strategy: :all_for_one)
261
+
262
+ The possible values for the strategy are:
263
+
264
+ * `:one_for_one` -- the default restart strategy, this simply causes the
265
+ child which exited to be started again, in line with its restart policy.
266
+
267
+ * `:all_for_one` -- if any child needs to be restarted, all children of the
268
+ Ultravisor get terminated in reverse of their start order, and then all
269
+ children are started again, except those which are `restart: :never`, or
270
+ `restart: :on_failure` which had already exited without error.
271
+
272
+ * `:rest_for_one` -- if any child needs to be restarted, all children of
273
+ the Ultravisor which are *after* the restarted child get terminated
274
+ in reverse of their start order, and then that child and those after it are started again,
275
+ except those which are `restart: :never`, or `restart: :on_failure` which
276
+ had already exited without error.
277
+
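The three strategies can be summarised as a function from "which child exited" to "what gets stopped and started". The `restart_plan` helper below is hypothetical; it follows the ordering used by `child_exited` in `lib/ultravisor.rb` further down this diff, and it deliberately ignores the `restart: :never` and `restart: :on_failure` filtering.

```ruby
# Hypothetical helper: given the strategy, the child IDs in definition
# order, and the ID of the child that exited, work out what to stop
# (in order) and what to start (in order).
def restart_plan(strategy, ids, failed)
  index = ids.index(failed)
  raise ArgumentError, "unknown child #{failed.inspect}" if index.nil?

  case strategy
  when :one_for_one
    { stop: [], start: [failed] }
  when :all_for_one
    # The failed child is already gone, so only the others need stopping.
    { stop: (ids - [failed]).reverse, start: ids.dup }
  when :rest_for_one
    { stop: ids.drop(index + 1).reverse, start: ids.drop(index) }
  else
    raise ArgumentError, "Invalid strategy #{strategy.inspect}"
  end
end

ids = [:db, :cache, :web]
one = restart_plan(:one_for_one, ids, :cache)
all = restart_plan(:all_for_one, ids, :cache)
rest = restart_plan(:rest_for_one, ids, :cache)
```

For a failure of `:cache`, `:rest_for_one` stops only `:web` and then starts `:cache` and `:web`, leaving `:db` untouched.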
278
+
279
+ ## Interacting With Child Objects
280
+
281
+ Since the Ultravisor is creating the object instances that run in the worker
282
+ threads, you don't automatically have access to the object instance itself.
283
+ This is somewhat by design -- concurrency bugs are hell. However, there *are*
284
+ ways around this, if you need to.
285
+
286
+
287
+ ### The power of cast / call
288
+
289
+ A common approach for interacting with an object in an otherwise concurrent
290
+ environment is the `cast` / `call` pattern. From the outside, the interface
291
+ is quite straightforward:
292
+
293
+ ```
294
+ u = Ultravisor.new(children: [
295
+ { id: :castcall, klass: CastCall, method: :run, enable_castcall: true }
296
+ ])
297
+
298
+ # This will return `nil` immediately
299
+ u[:castcall].cast.some_method
300
+
301
+ # This will, at some point in the future, return whatever `CastCall#some_method` returns
302
+ u[:castcall].call.some_method
303
+ ```
304
+
305
+ To enable `cast` / `call` support for a child, you must set the `enable_castcall`
306
+ keyword argument on the child. This is because failing to process `cast`s and
307
+ `call`s can cause all sorts of unpleasant backlogs, so children who intend to
308
+ receive (and process) `cast`s and `call`s must explicitly opt-in.
309
+
310
+ The interface to the object from outside is straightforward. You get a
311
+ reference to the instance of {Ultravisor::Child} for the child you want to talk
312
+ to (which is returned by {Ultravisor#add_child}, or {Ultravisor#[]}), and then
313
+ call `child.cast.<method>` or `child.call.<method>`, passing in arguments as
314
+ per normal. Any public method can be the target of the `cast` or `call`, and you
315
+ can pass in any arguments you like, *including blocks* (although bear in mind that
316
+ any blocks passed will be run in the child instance's thread, and many
317
+ concurrency dragons await the unwary).
318
+
319
+ The difference between the `cast` and `call` methods is in whether or not a
320
+ return value is expected, and hence when the method call chained through
321
+ `cast` or `call` returns.
322
+
323
+ When you call `cast`, the real method call gets queued for later execution,
324
+ and since no return value is expected, the `child.cast.<method>` returns
325
+ `nil` immediately and your code gets on with its day. This is useful
326
+ when you want to tell the worker something, or instruct it to do something,
327
+ but there's no value coming back.
328
+
329
+ In comparison, when you call `call`, the real method call still gets queued,
330
+ but the calling code blocks, waiting for the return value from the queued
331
+ method call. This may seem pointless -- why have concurrency that blocks? --
332
+ but the value comes from the synchronisation. The method call only happens
333
+ when the worker loop calls `process_castcall`, which it can do at a time that
334
+ suits it, and when it knows that nothing else is going on that could cause
335
+ problems.
336
+
337
+ One thing to be aware of when interacting with a worker instance is that it may
338
+ crash, and be restarted by the Ultravisor, before it gets around to processing
339
+ a queued message. If you used `child.cast`, then the method call is just...
340
+ lost, forever. On the other hand, if you used `child.call`, then an
341
+ {Ultravisor::ChildRestartedError} exception will be raised, which you can deal
342
+ with as you see fit.
343
+
344
+ The really interesting part is what happens *inside* the child instance. The
345
+ actual execution of code in response to the method calls passed through `cast`
346
+ and `call` will only happen when the running instance of the child's class
347
+ calls `process_castcall`. When that happens, all pending casts and calls will
348
+ be executed. Since this happens within the same thread as the rest of the
349
+ child instance's code, it's a lot safer than trying to synchronise everything
350
+ with locks.
351
+
352
+ You can, of course, just call `process_castcall` repeatedly, however that's a
353
+ somewhat herp-a-derp way of doing it. The `castcall_fd` method in the running
354
+ instance will return an IO object which will become readable whenever there is
355
+ a pending `cast` or `call` to process. Thus, if you're using `IO.select` or
356
+ similar to wait for work to do, you can add `castcall_fd` to the readable set
357
+ and only call `process_castcall` when the relevant IO object comes back. Don't
358
+ actually try *reading* from it yourself; `process_castcall` takes care of all that.
359
+
360
+ If you happen to have a child class whose *only* purpose is to process `cast`s
361
+ and `call`s, you should configure the Ultravisor to use `process_castcall_loop`
362
+ as its entry method. This is a wrapper method which blocks on `castcall_fd`
363
+ becoming readable, and loops infinitely.
364
+
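To make the moving parts concrete, here is a self-contained sketch of the pattern: a queue of pending messages, a pipe that becomes readable when something is waiting, and a worker loop that only processes messages when it chooses to. The `Mailbox` and `Counter` classes are hypothetical, and the interface is simplified to `cast(:name, ...)` / `call(:name, ...)` rather than the chained `child.cast.<method>` form described above. It is not Ultravisor's implementation.

```ruby
# Hypothetical mailbox implementing a cast / call style queue.
class Mailbox
  def initialize(target)
    @target = target
    @pending = []
    @lock = Mutex.new
    @reader, @writer = IO.pipe
  end

  # IO object that is readable whenever a message is waiting.
  def fd
    @reader
  end

  # Queue a method call and return nil immediately.
  def cast(name, *args)
    enqueue(name, args, nil)
    nil
  end

  # Queue a method call and block until the worker has run it.
  def call(name, *args)
    reply = Queue.new
    enqueue(name, args, reply)
    status, value = reply.pop
    raise value if status == :error
    value
  end

  # Run every pending message in the calling (worker) thread.
  def process
    while (message = next_message)
      name, args, reply = message
      begin
        result = @target.public_send(name, *args)
        reply << [:ok, result] if reply
      rescue StandardError => e
        reply << [:error, e] if reply
      end
    end
  end

  private

  def enqueue(name, args, reply)
    @lock.synchronize do
      @pending << [name, args, reply]
      @writer.write(".")       # one byte per pending message
    end
  end

  def next_message
    @lock.synchronize do
      return nil if @pending.empty?
      @reader.read_nonblock(1) # consume the matching wake-up byte
      @pending.shift
    end
  end
end

# Hypothetical worker that waits on the mailbox with IO.select.
class Counter
  attr_reader :mailbox

  def initialize
    @count = 0
    @done = false
    @mailbox = Mailbox.new(self)
  end

  def increment(by = 1)
    @count += by
  end

  def value
    @count
  end

  def stop
    @done = true
  end

  def run
    until @done
      ready, = IO.select([@mailbox.fd], nil, nil, 1)
      @mailbox.process if ready
    end
  end
end

counter = Counter.new
thread = Thread.new { counter.run }
counter.mailbox.cast(:increment, 5)
counter.mailbox.cast(:increment)
total = counter.mailbox.call(:value)
counter.mailbox.cast(:stop)
thread.join(5)
```

All of `increment`, `value`, and `stop` run inside the worker thread, so `@count` is never touched from two threads at once, even though the casts and the call come from outside.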
365
+ It is important to remember that not all concurrency bugs can be prevented by
366
+ using `cast` / `call`. For example, read-modify-write operations will still
367
+ cause all the same problems they always do, so if you find yourself calling
368
+ `child.call`, modifying the value returned, and then calling `child.cast`
369
+ with that modified value, you're in for a bad time.
370
+
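The read-modify-write hazard does not need threads to demonstrate; writing the interleaving out by hand shows the lost update. In this hypothetical `Account` example, reading `balance` stands in for `child.call` and assigning it stands in for `child.cast`. The safer shape is to send the modification itself as one method call, so it runs in a single step inside the worker.

```ruby
# Hypothetical worker state used to illustrate a lost update.
class Account
  attr_accessor :balance

  def initialize
    @balance = 0
  end

  # The whole read-modify-write happens inside one method.
  def deposit(amount)
    @balance += amount
  end
end

# Racy shape: two callers both read before either one writes back.
racy = Account.new
seen_by_a = racy.balance          # caller A reads 0
seen_by_b = racy.balance          # caller B reads 0
racy.balance = seen_by_a + 10     # A writes 10
racy.balance = seen_by_b + 10     # B also writes 10, overwriting A's update

# Safer shape: each caller asks the worker to do the modification.
safe = Account.new
safe.deposit(10)
safe.deposit(10)
```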
371
+
372
+ ### Direct (Unsafe) Instance Access
373
+
374
+ If you have a worker class which you're *really* sure is safe against concurrent
375
+ access, you can eschew the convenience and safety of `cast` / `call`, and instead
376
+ allow direct access to the worker instance object.
377
+
378
+ To do this, specify `access: :unsafe` in the child specification, and then
379
+ call `child.unsafe_instance` to get the instance object currently in play.
380
+
381
+ Yes, the multiple mentions of `unsafe` are there deliberately, and no, I won't
382
+ be removing them. They're there to remind you, always, that what you're doing
383
+ is unsafe.
384
+
385
+ If the child is restarting at the time `child.unsafe_instance` is called,
386
+ the call will block until the child worker is started again, after which
387
+ you'll get the newly created worker instance object. The worker could crash
388
+ again at any time, of course, leaving you with a now out-of-date object
389
+ that is no longer being actively run. It's up to you to figure out how to
390
+ deal with that. If the Ultravisor associated with the child
391
+ has terminated, your call to `child.unsafe_instance` will raise an
392
+ {Ultravisor::ChildRestartedError}.
393
+
394
+ Why yes, Gracie, there *are* a lot of things that can go wrong when using
395
+ direct instance object access. Still wondering why those `unsafe`s are in
396
+ the name?
397
+
398
+
399
+ ## Supervision Trees
400
+
401
+ Whilst a collection of workers is a neat thing to have, more powerful systems
402
+ can be constructed if supervisors can, themselves, be supervised. Primarily
403
+ this is useful when recovering from persistent errors, because you can use
404
+ a higher-level supervisor to restart an entire tree of workers when one of
405
+ them is having problems.
406
+
407
+ Creating a supervision tree is straightforward. Because Ultravisor works by
408
+ instantiating plain old ruby objects, and Ultravisor is, itself, a plain old
409
+ ruby class, you use it more-or-less like you would any other object:
410
+
411
+ u = Ultravisor.new
412
+ u.add_child(id: :sub_sup, klass: Ultravisor, method: :run, args: [children: [...]])
413
+
414
+ That's all there is to it. Whenever the parent Ultravisor wants to work on the
415
+ child Ultravisor, it treats it like any other child, asking it to terminate,
416
+ start, etc, and the child Ultravisor's work consists of terminating, starting,
417
+ etc all of its children.
418
+
419
+ The only difference in default behaviour between a regular worker child and an
420
+ Ultravisor child is that an Ultravisor's `shutdown` policy is automatically set
421
+ to `method: :stop!, timeout: :infinity`. This is because it is *very* bad news
422
+ to forcibly terminate an Ultravisor before its children have stopped -- all
423
+ those children just get cast into the VM, never to be heard from again.
424
+
425
+
426
+ # Contributing
427
+
428
+ Bug reports should be sent to the [GitHub issue
429
+ tracker](https://github.com/mpalmer/ultravisor/issues), or
430
+ [e-mailed](mailto:theshed+ultravisor@hezmatt.org). Patches can be sent as a
431
+ GitHub pull request, or [e-mailed](mailto:theshed+ultravisor@hezmatt.org).
432
+
433
+
434
+ # Licence
435
+
436
+ Unless otherwise stated, everything in this repo is covered by the following
437
+ copyright notice:
438
+
439
+ Copyright (C) 2019 Matt Palmer <matt@hezmatt.org>
440
+
441
+ This program is free software: you can redistribute it and/or modify it
442
+ under the terms of the GNU General Public License version 3, as
443
+ published by the Free Software Foundation.
444
+
445
+ This program is distributed in the hope that it will be useful,
446
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
447
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
448
+ GNU General Public License for more details.
449
+
450
+ You should have received a copy of the GNU General Public License
451
+ along with this program. If not, see <http://www.gnu.org/licenses/>.
@@ -0,0 +1,216 @@
1
+ # frozen_string_literal: true
2
+
3
+ require "logger"
4
+
5
+ require_relative "./ultravisor/child"
6
+ require_relative "./ultravisor/error"
7
+ require_relative "./ultravisor/logging_helpers"
8
+
9
+ # A super-dooOOOoooper supervisor.
10
+ #
11
+ class Ultravisor
12
+ include LoggingHelpers
13
+
14
+ def initialize(children: [], strategy: :one_for_one, logger: Logger.new("/dev/null"))
15
+ @queue, @logger = Queue.new, logger
16
+
17
+ @strategy = strategy
18
+ validate_strategy
19
+
20
+ @op_m, @op_cv = Mutex.new, ConditionVariable.new
21
+ @running_thread = nil
22
+
23
+ initialize_children(children)
24
+ end
25
+
26
+ def run
27
+ logger.debug(logloc) { "called" }
28
+
29
+ @op_m.synchronize do
30
+ if @running_thread
31
+ raise AlreadyRunningError,
32
+ "This ultravisor is already running"
33
+ end
34
+
35
+ @queue.clear
36
+ @running_thread = Thread.current
37
+ Thread.current.name = "Ultravisor"
38
+ end
39
+
40
+ logger.debug(logloc) { "Going to start children #{@children.map(&:first).inspect}" }
41
+ @children.each { |c| c.last.spawn(@queue) }
42
+
43
+ process_events
44
+
45
+ @op_m.synchronize do
46
+ logger.debug(logloc) { "Shutdown time for #{@children.reverse.map(&:first).inspect}" }
47
+ @children.reverse.each { |c| c.last.shutdown }
48
+
49
+ @running_thread = nil
50
+ @op_cv.broadcast
51
+ end
52
+
53
+ self
54
+ end
55
+
56
+ def shutdown(wait: true, force: false)
57
+ @op_m.synchronize do
58
+ return self unless @running_thread
59
+ if force
60
+ @children.reverse.each { |c| c.last.shutdown(force: true) }
61
+ @running_thread.kill
62
+ @running_thread = nil
63
+ @op_cv.broadcast
64
+ else
65
+ @queue << :shutdown
66
+ if wait
67
+ @op_cv.wait(@op_m) while @running_thread
68
+ end
69
+ end
70
+ end
71
+ self
72
+ end
73
+
74
+ def [](id)
75
+ @children.assoc(id)&.last
76
+ end
77
+
78
+ def add_child(**args)
79
+ logger.debug(logloc) { "Adding child #{args[:id].inspect}" }
80
+ args[:logger] ||= logger
81
+
82
+ @op_m.synchronize do
83
+ c = Ultravisor::Child.new(**args)
84
+
85
+ if @children.assoc(c.id)
86
+ raise DuplicateChildError,
87
+ "Child with ID #{c.id.inspect} already exists"
88
+ end
89
+
90
+ @children << [c.id, c]
91
+
92
+ if @running_thread
93
+ logger.debug(logloc) { "Auto-starting new child #{args[:id].inspect}" }
94
+ c.spawn(@queue)
95
+ end
96
+ end
97
+ end
98
+
99
+ def remove_child(id)
100
+ logger.debug(logloc) { "Removing child #{id.inspect}" }
101
+
102
+ @op_m.synchronize do
103
+ c = @children.assoc(id)
104
+
105
+ return nil if c.nil?
106
+
107
+ @children.delete(c)
108
+ if @running_thread
109
+ logger.debug(logloc) { "Shutting down removed child #{id.inspect}" }
110
+ c.last.shutdown
111
+ end
112
+ end
113
+ end
114
+
115
+ private
116
+
117
+ def validate_strategy
118
+ unless %i{one_for_one all_for_one rest_for_one}.include?(@strategy)
119
+ raise ArgumentError,
120
+ "Invalid strategy #{@strategy.inspect}"
121
+ end
122
+ end
123
+
124
+ def initialize_children(children)
125
+ unless children.is_a?(Array)
126
+ raise ArgumentError,
127
+ "children must be an Array"
128
+ end
129
+
130
+ @children = []
131
+
132
+ children.each do |cfg|
133
+ cfg[:logger] ||= logger
134
+ c = Ultravisor::Child.new(**cfg)
135
+ if @children.assoc(c.id)
136
+ raise DuplicateChildError,
137
+ "Duplicate child ID: #{c.id.inspect}"
138
+ end
139
+
140
+ @children << [c.id, c]
141
+ end
142
+ end
143
+
144
+ def process_events
145
+ loop do
146
+ qe = @queue.pop
147
+
148
+ case qe
149
+ when Ultravisor::Child
150
+ logger.debug(logloc) { "Received Ultravisor::Child queue entry for #{qe.id}" }
151
+ @op_m.synchronize { child_exited(qe) }
152
+ when :shutdown
153
+ logger.debug(logloc) { "Received :shutdown queue entry" }
154
+ break
155
+ else
156
+ logger.error(logloc) { "Unknown queue entry: #{qe.inspect}" }
157
+ end
158
+ end
159
+ end
160
+
161
+ def child_exited(child)
162
+ if child.termination_exception
163
+ log_exception(child.termination_exception, "Ultravisor::Child(#{child.id.inspect})") { "Thread terminated by unhandled exception" }
164
+ end
165
+
166
+ if @running_thread.nil?
167
+ logger.debug(logloc) { "Child termination after shutdown" }
168
+ # Child termination processed after we've shut down... nope
169
+ return
170
+ end
171
+
172
+ begin
173
+ return unless child.restart?
174
+ rescue Ultravisor::BlownRestartPolicyError
175
+ # Uh oh...
176
+ logger.error(logloc) { "Child #{child.id} has exceeded its restart policy. Shutting down the Ultravisor." }
177
+ @queue << :shutdown
178
+ return
179
+ end
180
+
181
+ case @strategy
182
+ when :all_for_one
183
+ @children.reverse.each do |id, c|
184
+ # Don't need to shut down the child that has caused all this mess
185
+ next if child.id == id
186
+
187
+ c.shutdown
188
+ end
189
+ when :rest_for_one
190
+ @children.reverse.each do |id, c|
191
+ # Don't go past the child that caused the problems
192
+ break if child.id == id
193
+
194
+ c.shutdown
195
+ end
196
+ end
197
+
198
+ sleep child.restart_delay
199
+
200
+ case @strategy
201
+ when :all_for_one
202
+ @children.each do |_, c|
203
+ c.spawn(@queue)
204
+ end
205
+ when :rest_for_one
206
+ s = false
207
+ @children.each do |id, c|
208
+ s = true if child.id == id
209
+
210
+ c.spawn(@queue) if s
211
+ end
212
+ when :one_for_one
213
+ child.spawn(@queue)
214
+ end
215
+ end
216
+ end