daemon_controller 0.2.2

data/LICENSE.txt ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2008 Phusion
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ of this software and associated documentation files (the "Software"), to deal
5
+ in the Software without restriction, including without limitation the rights
6
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ copies of the Software, and to permit persons to whom the Software is
8
+ furnished to do so, subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in
11
+ all copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ THE SOFTWARE.
20
+
data/README.markdown ADDED
@@ -0,0 +1,460 @@
1
+ Introduction
2
+ ============
3
+
4
+ *daemon_controller* is a library for starting and stopping specific daemons
5
+ programmatically in a robust, race-condition-free manner.
6
+
7
+ It's not a daemon monitoring system like God or Monit. It's also not a library
8
+ for writing daemons.
9
+
10
+ It provides the following functionality:
11
+
12
+ * Starting daemons. If the daemon fails to start then an exception will be
13
+ raised. *daemon_controller* can even detect failures that occur after the
14
+ daemon has already daemonized.
15
+
16
+ Starting daemons is done in a race-condition-free manner. If another
17
+ process using *daemon_controller* is trying to start the same daemon,
18
+ then *daemon_controller* will guarantee serialization.
19
+
20
+ *daemon_controller* also raises an exception if it detects that the daemon
21
+ is already started.
22
+ * Connecting to a daemon, starting it if it's not already started. This too
23
+ is done in a race-condition-free manner. If the daemon fails to start then
24
+ an exception will be raised.
25
+ * Stopping daemons.
26
+ * Checking whether a daemon is running.
27
+
28
+
29
+ ## What is it for?
30
+
31
+ There is a lot of software (both Rails related and unrelated) which relies on
32
+ servers or daemons. To name a few, in no particular order:
33
+
34
+ * [Ultrasphinx](http://blog.evanweaver.com/files/doc/fauna/ultrasphinx/), a
35
+ Rails library for full-text searching. It makes use of the [Sphinx search
36
+ software](http://www.sphinxsearch.com/) for indexing and searching. Indexing
37
+ is done by running a command, while searching is done by querying the Sphinx
38
+ search server.
39
+ * [acts_as_ferret](http://projects.jkraemer.net/acts_as_ferret/wiki), another
40
+ Rails library for full-text searching. It uses the Ferret search software.
41
+ On production environments, it relies on the Ferret DRB server for both
42
+ searching and indexing.
43
+ * [BackgrounDRb](http://backgroundrb.rubyforge.org/), a Ruby job server and
44
+ scheduler. Scheduling is done by contacting the BackgrounDRb daemon.
45
+ * [mongrel_cluster](http://mongrel.rubyforge.org/wiki/MongrelCluster), which
46
+ starts and stops multiple Mongrel daemons.
47
+
48
+ Relying on daemons is quite common, but not without problems. Let's go over
49
+ some of them.
50
+
51
+ ### Starting daemons is a hassle
52
+
53
+ If you've used similar software, then you might agree that managing these
54
+ daemons is a hassle. If you're using BackgrounDRb, then the daemon must be
55
+ running. Starting the daemon is not hard, but it is annoying. It's also
56
+ possible that the system administrator forgets to start the daemon. While
57
+ configuring the system to automatically start a daemon at startup is not hard,
58
+ it is an extra thing to do, and thus a hassle. We thought, why can't such
59
+ daemons be automatically started? Indeed, this won't be possible if the daemon
60
+ is to be run on a remote machine. But in by far the majority of use cases, the
61
+ daemon runs on the same host as the Rails application. If a Rails application -
62
+ or indeed, *any* application - is configured to contact a daemon on the
63
+ local host, then why not start the daemon automatically on demand?
64
+
65
+ ### Daemon starting code may not be robust or efficient
66
+
67
+ We've also observed that people write daemon controlling code over and over
68
+ again. Consider for example UltraSphinx, which provides a
69
+ `rake sphinx:daemon:start` Rake task to start the daemon. The time that a
70
+ daemon needs to initialize is variable, and depends on things such as the
71
+ current system load. The Sphinx daemon usually needs less than a second before
72
+ we can connect to it. However, the way different software handles starting of a
73
+ daemon varies. We've observed that waiting a fixed amount of time is by far the
74
+ most common way. For example, UltraSphinx's daemon starting code looks like
75
+ this:
76
+
77
+ system "searchd --config '#{Ultrasphinx::CONF_PATH}'"
78
+ sleep(4) # give daemon a chance to write the pid file
79
+ if ultrasphinx_daemon_running?
80
+ say "started successfully"
81
+ else
82
+ say "failed to start"
83
+ end
84
+
85
+ This is in no way a slam against UltraSphinx. However, if the daemon starts in
86
+ 200 milliseconds, then the user who issued the start command will be waiting for
87
+ 3.8 seconds for no good reason. This is not good for usability or for the
88
+ user's patience.
89
+
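As a rough illustration of the alternative to a fixed `sleep(4)`, the sketch below polls the daemon's port at short intervals and returns as soon as it accepts a connection, giving up only after a deadline. The helper name `wait_until_listening` and its `timeout:` and `interval:` parameters are hypothetical and are not part of UltraSphinx or daemon_controller.

```ruby
require 'socket'

# Hypothetical helper (not part of any library mentioned here):
# polls host:port until a TCP connection succeeds or the deadline passes.
# Returns true if the port accepted a connection in time, false otherwise.
def wait_until_listening(host, port, timeout: 4.0, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    begin
      # A successful connect means the daemon is accepting connections.
      TCPSocket.new(host, port).close
      return true
    rescue Errno::ECONNREFUSED, Errno::EHOSTUNREACH
      # Not ready yet: give up once the deadline has passed, else retry.
      return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      sleep(interval)
    end
  end
end
```

With this approach, a daemon that is ready after 200 milliseconds makes the caller wait roughly that long, while the 4 second limit still applies as an upper bound.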
90
+ ### Startup error handling
91
+
92
+ Different software handles daemon startup errors in different ways. Some might
93
+ not even handle errors at all. For example, consider 'mongrel_cluster'. If
94
+ there's a typo in one of your application source files, then 'mongrel_cluster'
95
+ will not report the error. Instead, you have to check its log files to see what
96
+ happened. This is not good for usability: many people will be wondering why
97
+ they can't connect to their Mongrel ports after issuing a
98
+ `mongrel_rails cluster::start` - until they realize that they should read the
99
+ log file. But the thing is, not everybody realizes this. And typing in an extra
100
+ command to read the log file to check whether Mongrel started correctly, is
101
+ just a big hassle. Why can't the daemon startup code report such errors
102
+ immediately?
103
+
104
+ ### Stale or corrupt PID files
105
+
106
+ Suppose that you're running a Mongrel cluster, and your server suddenly powers
107
+ off because of a power outage. When the server is online again, it fails to
108
+ start your Mongrel cluster because the PID file that it had written still
109
+ exists, and wasn't cleaned up properly (it's supposed to be cleaned up when
110
+ Mongrel exits). mongrel_cluster provides the `--clean` option to check whether
111
+ the PID file is *stale*, and will automatically clean it up if it is. But not
112
+ all daemon controlling software supports this. Why can't all software check for
113
+ stale PID files automatically?
114
+
115
+
116
+ ## Implementation issues
117
+
118
+ From the problem descriptions, it becomes apparent that our wishlist is as
119
+ follows. Why are these items often not implemented? Let's go over them.
120
+
121
+ - **A daemon should be automatically started on demand, instead of requiring the user to manually start it.**
122
+
123
+ The most obvious problems are related to concurrency. Suppose that your web
124
+ application has a search box, and you want to start the search daemon if it
125
+ isn't already started, then connect to it. Two problems will arise:
126
+
127
+ * Suppose that Rails process A is still starting the daemon. At the same
128
+ time, another visitor tries to search something, and Rails process B
129
+ notices that the daemon is not running. If B tries to start the daemon
130
+ while it's already being started by A, then things can go wrong.
131
+ *A robust daemon starter must ensure that only one process at a time may start the daemon.*
132
+ * It's not a good idea to wait a fixed amount of time for the daemon to
133
+ start, because you don't know in advance how long it will take for it to
134
+ start. For example, if you wait 2 seconds, then try to connect to the
135
+ daemon, and the daemon isn't done initializing yet, then it will seem as
136
+ if the daemon failed to start.
137
+
138
+ These are the most probable reasons why people don't try to write
139
+ auto-starting code, and instead require the user to start the daemon
140
+ manually.
141
+
142
+ These problems, as well as several less obvious problems, are closely
143
+ related to the next few points.
144
+
145
+ - **The daemon starter must wait until the daemon is done initializing, no longer and no shorter**
146
+
147
+ Because only after the daemon is fully initialized, is it safe to connect
148
+ to it. And because the user should not have to wait longer than he really
149
+ has to. During startup, the daemon will have to be continuously checked
150
+ whether it's done initializing or whether an error occurred. Writing this
151
+ code can be quite a hassle, which is why most people don't do it.
152
+
153
+ - **The daemon starter must report any startup errors**
154
+
155
+ If the daemon starting command - e.g. `searchd -c config_file.conf`,
156
+ `apachectl start` or `mongrel_rails cluster::start` - reports startup
157
+ errors, then all is fine as long as the user is starting the command from a
158
+ terminal. A problem occurs when the error occurs after the daemon has
159
+ already gone into the background. Such errors are only reported to the log
160
+ file.
161
+ *The daemon starter should also check the log file for any startup errors.*
162
+
163
+ Furthermore, it should be able to raise startup errors as exceptions. This
164
+ allows the application to decide what to do with the error. For less
165
+ experienced system administrators, the error might be displayed in the
166
+ browser, allowing the administrators to become aware of the problem without
167
+ forcing them to manually check the log files. Or the error might be emailed
168
+ to a system administrator's email address.
169
+
170
+ - **The daemon starter must be able to correct stale or corrupted PID files**
171
+
172
+ If the PID file is stale, or for some reason has been corrupted, then the
173
+ daemon starter must be able to cope with that.
174
+ *It should check whether the PID file contains a valid PID, and whether the PID exists.*
175
+
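The last wishlist item can be sketched in a few lines. The helper below is a hypothetical illustration, not daemon_controller's own code: it assumes the PID file holds a single decimal PID and uses signal 0, which checks whether a process exists without actually signalling it.

```ruby
# Hypothetical helper, not part of daemon_controller's API.
# Classifies a PID file as :missing, :stale, or :running.
def pid_file_status(path)
  content = File.read(path).strip
  # Anything other than a plain decimal number counts as corrupted.
  return :stale unless content.match?(/\A\d+\z/)
  pid = content.to_i
  return :stale if pid <= 0
  # Signal 0 performs the existence and permission checks only.
  Process.kill(0, pid)
  :running
rescue Errno::ENOENT
  :missing   # there is no PID file at all
rescue Errno::ESRCH, RangeError
  :stale     # no such process, or the number is not a usable PID
rescue Errno::EPERM
  :running   # the process exists but belongs to another user
end
```

Note that a PID can be reused by an unrelated process after a reboot, so a check like this can only say that *some* process has that PID. A connectivity check against the daemon itself is still needed to confirm that it is the right one.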
176
+
177
+ Introducing daemon_controller
178
+ =============================
179
+
180
+ *daemon_controller* is a library for managing daemons in a robust manner. It is
181
+ not a tool for managing daemons. Rather, it is a library which lets you write
182
+ applications that manage daemons in a robust manner. For example,
183
+ 'mongrel_cluster' or UltraSphinx may be adapted to utilize this library, for
184
+ more robust daemon management.
185
+
186
+ *daemon_controller* implements all items in the aforementioned wishlist. It
187
+ provides the following functionalities:
188
+
189
+ ### Starting a daemon
190
+
191
+ This ensures that no two processes can start the same daemon at the same time.
192
+ It also reports any startup errors, even errors that occur after the
193
+ daemon has already gone into the background but before it has fully initialized
194
+ yet. It also allows you to set a timeout, and will try to abort the daemon if
195
+ it takes too long to initialize.
196
+
197
+ The start function won't return until the daemon has been fully initialized,
198
+ and is responding to connections. So if the start function has returned, then
199
+ the daemon is guaranteed to be usable.
200
+
201
+ ### Stopping a daemon
202
+
203
+ It will stop the daemon, but only if it's already running. Any errors
204
+ are reported. If the daemon isn't already running, then it will silently
205
+ succeed. Just like starting a daemon, you can set a timeout for stopping the
206
+ daemon.
207
+
208
+ Like the start function, the stop function won't return until the daemon is no
209
+ longer running. This makes it safe to immediately start the same daemon again
210
+ after having stopped it, without worrying that the previous daemon instance
211
+ hasn't exited yet and might conflict with the newly started daemon instance.
212
+
213
+ ### Connecting to a daemon, starting it if it isn't running
214
+
215
+ Every daemon has to be connected to in a different way. As a developer, you
216
+ tell 'daemon_controller' how to connect to the daemon. It will then attempt to
217
+ do that, and if that fails, it will check whether the daemon is running. If it
218
+ isn't running, then it will automatically start the daemon, and attempt to
219
+ connect to the daemon again. Failures are reported.
220
+
221
+ ### Checking whether a daemon is running
222
+
223
+ This information is retrieved from the PID file. It also checks whether the PID
224
+ file is stale.
225
+
226
+ ### All failures are reported via exceptions
227
+
228
+ So that you can exactly determine how you want to handle errors.
229
+
230
+ ### Lots and lots of error checking
231
+
232
+ So that there are very few ways in which the system can screw up.
233
+
234
+ daemon_controller's goal is to make daemon management less of a hassle, and as
235
+ automatic and straightforward as possible.
236
+
237
+
238
+ What about Monit/God?
239
+ =====================
240
+
241
+ daemon_controller is not a replacement for [Monit](http://www.tildeslash.com/monit/)
242
+ or [God](http://god.rubyforge.org/). Rather, it is a solution to the following
243
+ problem:
244
+
245
+ > **Hongli:** hey Ninh, do a 'git pull', I just implemented awesome searching
246
+ > features in our application!
247
+ > **Ninh:** cool. *pulls from repository*
248
+ > **Ninh:** hey Hongli, it doesn't work.
249
+ > **Hongli:** what do you mean, it doesn't work?
250
+ > **Ninh:** it says "connection refused", or something
251
+ > **Hongli:** oh I forgot to mention it, you have to run the Sphinx search
252
+ > daemon before it works. type "rake sphinx:daemon:start" to do
253
+ > that
254
+ > **Ninh:** great. but now I get a different error. something about
255
+ > BackgrounDRb.
256
+ > **Hongli:** oops, I forgot to mention this too. you need to start the
257
+ > BackgrounDRb server with "rake backgroundrb:start_server"
258
+ > **Ninh:** okay, so every time I want to use this app, I have to type
259
+ > "rake sphinx:daemon:start", "rake backgroundrb:start_server" and
260
+ > "./script/server"?
261
+ > **Hongli:** yep
262
+
263
+ Imagine the above conversation becoming just:
264
+
265
+ > **Hongli:** hey Ninh, do a 'git pull', I just implemented awesome searching
266
+ > features in our application!
267
+ > **Ninh:** cool. *pulls from repository*
268
+ > **Ninh:** awesome, it works!
269
+
270
+ This is not something that can be achieved with Monit/God. Monit/God are for
271
+ monitoring daemons, auto-restarting them when they use too many resources.
272
+ daemon_controller's goal is to allow developers to implement daemon
273
+ starting/stopping and daemon auto-starting code that's robust. daemon_controller
274
+ is intended to be used to make daemon-dependent applications Just Work(tm)
275
+ without having to start the daemons manually.
276
+
277
+
278
+ Tutorial #1: controlling Apache
279
+ ===============================
280
+
281
+ Suppose that you're a [Phusion Passenger](http://www.modrails.com/) developer,
282
+ and you need to write tests for the Apache module. In particular, you want to
283
+ test whether the different Phusion Passenger configuration directives are
284
+ working as expected. Obviously, to test the Apache module, the Apache web
285
+ server must be running. For every test, you will want the unit test suite to:
286
+
287
+ 1. Write an Apache configuration file, with the relevant configuration
288
+ directive set to a specific value.
289
+ 2. Start Apache.
290
+ 3. Send an HTTP request to Apache and check whether the HTTP response matches
291
+ your expectations.
292
+ 4. Stop Apache.
293
+
294
+ That can be done with the following code:
295
+
296
+ require 'daemon_controller'
297
+
298
+ File.open("apache.conf", "w") do |f|
299
+ f.write("PidFile apache.pid\n")
300
+ f.write("ErrorLog apache.log\n")
301
+ f.write("Listen 1234\n")
302
+ f.write(... other relevant configuration options ...)
303
+ end
304
+
305
+ controller = DaemonController.new(
306
+ :identifier => 'Apache web server',
307
+ :start_command => 'apachectl -f apache.conf -k start',
308
+ :ping_command => lambda { TCPSocket.new('localhost', 1234) },
309
+ :pid_file => 'apache.pid',
310
+ :log_file => 'apache.log',
311
+ :timeout => 25
312
+ )
313
+ controller.start
314
+
315
+ .... apache is now started ....
316
+ .... some test code here ....
317
+
318
+ controller.stop
319
+
320
+ The `File.open` line is obvious: it writes the relevant Apache configuration
321
+ file.
322
+
323
+ The next line is for creating a new DaemonController object. We pass a
324
+ human-readable identifier for this daemon ("Apache web server") to the
325
+ constructor. This is used for generating friendlier error messages.
326
+ We also tell it how Apache is supposed to be started (`:start_command`), how to
327
+ check whether it can be connected to (`:ping_command`), and where its PID file
328
+ and log file are. If Apache failed with an error during startup, then it will be
329
+ reported. If Apache failed with an error after it has gone into the background,
330
+ then that will be reported too: the given log file is monitored for new error
331
+ messages.
332
+ Finally, a timeout of 25 seconds is given. If Apache doesn't start within 25
333
+ seconds, then an exception will be raised.
334
+
335
+ The ping command is just a `Proc` which returns true or false. If the Proc
336
+ raises `Errno::ECONNREFUSED`, then that's also interpreted by DaemonController
337
+ as meaning that the daemon isn't responding yet.
338
+
339
+ After `controller.start` has returned, we can continue with the test case. At
340
+ this point, we know that Apache is done with initializing.
341
+ When we're done with Apache, we stop it with `controller.stop`. This does not
342
+ return until Apache has fully stopped.
343
+
344
+ The cautious reader might notice that the socket returned by the ping command
345
+ is never closed. That's true, because DaemonController will close it
346
+ automatically for us, if it notices that the ping command proc's return value
347
+ responds to `#close`.
348
+
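To make the ping semantics above concrete, here is a minimal sketch of how a caller could evaluate a ping command. It is an illustration of the behaviour described in this tutorial, not daemon_controller's actual implementation, and the method name `daemon_responding?` is hypothetical.

```ruby
require 'socket'

# Hypothetical illustration only; not daemon_controller's real code.
# Runs the ping proc and reduces its outcome to true or false.
def daemon_responding?(ping_command)
  result = ping_command.call
  # Close sockets (or anything else closeable) so they do not leak.
  result.close if result.respond_to?(:close)
  result ? true : false
rescue Errno::ECONNREFUSED
  # Nothing is listening yet, so the daemon is not responding.
  false
end
```

A ping command such as `lambda { TCPSocket.new('localhost', 1234) }` then counts as successful when the connection opens, and as "not responding yet" when the connection is refused.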
349
+ From this example, it becomes apparent that for daemon_controller to work, you
350
+ must know how to start the daemon, how to contact the daemon, and you must know
351
+ where it will put its PID file and log file.
352
+
353
+
354
+ Tutorial #2: Sphinx indexing and search server management
355
+ =========================================================
356
+
357
+ We at Phusion are currently developing a web application with full-text search
358
+ capabilities, and we're using Sphinx for this purpose. We want to make the
359
+ lives of our developers and our system administrators as easy as possible, so
360
+ that there's little room for human screw-up, and so we've developed this
361
+ library. Our Sphinx search daemon is completely managed through this library
362
+ and is automatically started on demand.
363
+
364
+ Our Sphinx config file is generated from an ERB template. This ERB template
365
+ writes different values in the config file, depending on whether we're in
366
+ development, test or production mode. We will want to regenerate this config
367
+ file every time, just before we start the search daemon.
368
+ But there's more. The search daemon will fail if there is no search index. If a
369
+ new developer has just checked out the application's source code, then there is
370
+ no search index yet. We don't want him to go through the pain of having to
371
+ generate the index manually. (That said, it isn't that much of a pain, but it's
372
+ just yet-another-thing to do, which can and should be automated.) So before
373
+ starting the daemon, we will also want to check whether the index exists. If
374
+ not, then we'll generate it, and then start the daemon. Of course, no two Rails
375
+ processes may generate the config file or the index at the same time.
376
+
377
+ When querying the search server, we will want to automatically start it if it
378
+ isn't running.
379
+
380
+ This can be achieved with the following code:
381
+
382
+ require 'daemon_controller'
383
+
384
+ class SearchServer
385
+ SEARCH_SERVER_PORT = 1234
386
+
387
+ def initialize
388
+ @controller = DaemonController.new(
389
+ :identifier => 'Sphinx search server',
390
+ :start_command => "searchd -c config/sphinx.conf",
391
+ :before_start => method(:before_start),
392
+ :ping_command => lambda { TCPSocket.new('localhost', SEARCH_SERVER_PORT) },
393
+ :pid_file => 'tmp/pids/sphinx.pid',
394
+ :log_file => 'log/sphinx.log')
395
+ end
396
+
397
+ def query(search_terms)
398
+ socket = @controller.connect do
399
+ TCPSocket.new('localhost', SEARCH_SERVER_PORT)
400
+ end
401
+ send_query(socket, search_terms)
402
+ return retrieve_results(socket)
403
+ end
404
+
405
+ private
406
+ def before_start
407
+ generate_configuration_file
408
+ if !index_exists?
409
+ generate_index
410
+ end
411
+ end
412
+
413
+ ...
414
+ end
415
+
416
+ Notice the `:before_start` option. We pass a block of code which is to be run,
417
+ just before the daemon is started. This block, along with starting the daemon,
418
+ is completely serialized. That is, if you're inside the block, then it's
419
+ guaranteed that no other process is running this block at the same time.
420
+
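One common way to get this kind of cross-process serialization is an advisory file lock. The sketch below shows the general technique with Ruby's `File#flock`; the helper name `with_exclusive_lock` and the lock file path are assumptions made for illustration, and this README does not state how daemon_controller implements its own locking.

```ruby
# Hypothetical helper showing the general technique; not daemon_controller's
# documented API. Runs the block while holding an exclusive advisory lock.
def with_exclusive_lock(lock_path)
  File.open(lock_path, File::RDWR | File::CREAT, 0644) do |lock_file|
    # Blocks until no other process holds the lock on this file.
    lock_file.flock(File::LOCK_EX)
    begin
      yield
    ensure
      # Release explicitly; closing the file would also release the lock.
      lock_file.flock(File::LOCK_UN)
    end
  end
end
```

Wrapping both the "generate config and index" step and the start command in a single `with_exclusive_lock('tmp/sphinx.lock') { ... }` call would mean a second process waits until the first one has finished, then finds the daemon already running.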
421
+ The `#query` method is the method for querying the search server with search
422
+ terms. It returns a list of results. It uses `DaemonController#connect`: one
423
+ passes a block to that method, which contains code for connecting to the
424
+ daemon. If the block returns nil, or if it raises `Errno::ECONNREFUSED`, then
425
+ `DaemonController#connect` will automatically take care of auto-starting the
426
+ Sphinx daemon for us.
427
+
428
+
429
+ A little bit of history
430
+ =======================
431
+
432
+ The issue of managing daemons has been a thorn in our side for quite some time
433
+ now. Until now, we've solved this problem by equipping any daemons that we
434
+ write with the ability to gracefully handle being concurrently started, the
435
+ ability to initialize as much as possible *before* forking into the background,
436
+ etc. However, equipping all this robustness into our code over and over is a
437
+ lot of work. We've considered documenting a standard behavior for daemons so
438
+ that they can properly support auto-starting and such.
439
+
440
+ However, we've recently realized that that's probably a futile effort.
441
+ Convincing everybody to write a lot of code for a bit more robustness is
442
+ probably not realistic. So we took the pragmatic approach and developed a
443
+ library which adds more robustness on top of daemons' existing behavior. And
444
+ thus, daemon_controller was born. It is a little bit less efficient compared to
445
+ when the daemon is designed from the beginning with such abilities in mind, but
446
+ it's compatible with virtually all daemons, and is easy to use.
447
+
448
+
449
+ Concurrency notes
450
+ =================
451
+ DaemonController can only guarantee concurrency safety between multiple threads
452
+ in the same process, if all of those threads use the same DaemonController
453
+ object. Synchronization between multiple processes works fine.
454
+
455
+
456
+ API documentation
457
+ =================
458
+
459
+ Detailed API documentation is available in the form of inline comments in
460
+ `lib/daemon_controller.rb`.