FooBarWidget-daemon_controller 0.1.0 → 0.2.0
- data/LICENSE.txt +20 -0
- data/README.markdown +387 -0
- data/daemon_controller.gemspec +5 -4
- data/lib/daemon_controller/lock_file.rb +48 -0
- data/lib/daemon_controller.rb +89 -50
- data/spec/daemon_controller_spec.rb +44 -0
- metadata +5 -4
- data/README.rdoc +0 -41
data/LICENSE.txt
ADDED
@@ -0,0 +1,20 @@
|
|
1
|
+
Copyright (c) 2008 Phusion
|
2
|
+
|
3
|
+
Permission is hereby granted, free of charge, to any person obtaining a copy
|
4
|
+
of this software and associated documentation files (the "Software"), to deal
|
5
|
+
in the Software without restriction, including without limitation the rights
|
6
|
+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
7
|
+
copies of the Software, and to permit persons to whom the Software is
|
8
|
+
furnished to do so, subject to the following conditions:
|
9
|
+
|
10
|
+
The above copyright notice and this permission notice shall be included in
|
11
|
+
all copies or substantial portions of the Software.
|
12
|
+
|
13
|
+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
14
|
+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
15
|
+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
16
|
+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
17
|
+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
18
|
+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
19
|
+
THE SOFTWARE.
|
20
|
+
|
data/README.markdown
ADDED
@@ -0,0 +1,387 @@
|
|
1
|
+
Introduction
|
2
|
+
============
|
3
|
+
|
4
|
+
There is a lot of software (both Rails related and unrelated) which relies on
|
5
|
+
servers or daemons. To name a few, in no particular order:
|
6
|
+
|
7
|
+
* [Ultrasphinx](http://blog.evanweaver.com/files/doc/fauna/ultrasphinx/), a
|
8
|
+
Rails library for full-text searching. It makes use of the [Sphinx search
|
9
|
+
software](http://www.sphinxsearch.com/) for indexing and searching. Indexing
|
10
|
+
is done by running a command, while searching is done by querying the Sphinx
|
11
|
+
search server.
|
12
|
+
* [acts_as_ferret](http://projects.jkraemer.net/acts_as_ferret/wiki), another
|
13
|
+
Rails library for full-text searching. It uses the Ferret search software.
|
14
|
+
On production environments, it relies on the Ferret DRB server for both
|
15
|
+
searching and indexing.
|
16
|
+
* [BackgrounDRb](http://backgroundrb.rubyforge.org/), a Ruby job server and
|
17
|
+
scheduler. Scheduling is done by contacting the BackgrounDRb daemon.
|
18
|
+
* [mongrel_cluster](http://mongrel.rubyforge.org/wiki/MongrelCluster), which
|
19
|
+
starts and stops multiple Mongrel daemons.
|
20
|
+
|
21
|
+
Relying on daemons is quite common, but not without problems. Let's go over
|
22
|
+
some of them.
|
23
|
+
|
24
|
+
### Starting daemons is a hassle
|
25
|
+
|
26
|
+
If you've used similar software, then you might agree that managing these
|
27
|
+
daemons is a hassle. If you're using BackgrounDRb, then the daemon must be
|
28
|
+
running. Starting the daemon is not hard, but it is annoying. It's also
|
29
|
+
possible that the system administrator forgets to start the daemon. While
|
30
|
+
configuring the system to automatically start a daemon at startup is not hard,
|
31
|
+
it is an extra thing to do, and thus a hassle. We thought, why can't such
|
32
|
+
daemons be automatically started? Indeed, this won't be possible if the daemon
|
33
|
+
is to be run on a remote machine. But in by far the majority of use cases, the
|
34
|
+
daemon runs on the same host as the Rails application. If a Rails application -
|
35
|
+
or indeed, <em>any</em> application - is configured to contact a daemon on the
|
36
|
+
local host, then why not start the daemon automatically on demand?
|
37
|
+
|
38
|
+
### Daemon starting code may not be robust or efficient
|
39
|
+
|
40
|
+
We've also observed that people write daemon controlling code over and over
|
41
|
+
again. Consider for example UltraSphinx, which provides a
|
42
|
+
`rake sphinx:daemon:start` Rake task to start the daemon. The time that a
|
43
|
+
daemon needs to initialize is variable, and depends on things such as the
|
44
|
+
current system load. The Sphinx daemon usually needs less than a second before
|
45
|
+
we can connect to it. However, the way different software handles starting of a
|
46
|
+
daemon varies. We've observed that waiting a fixed amount of time is by far the
|
47
|
+
most common way. For example, UltraSphinx's daemon starting code looks like
|
48
|
+
this:
|
49
|
+
|
50
|
+
system "searchd --config '#{Ultrasphinx::CONF_PATH}'"
|
51
|
+
sleep(4) # give daemon a chance to write the pid file
|
52
|
+
if ultrasphinx_daemon_running?
|
53
|
+
say "started successfully"
|
54
|
+
else
|
55
|
+
say "failed to start"
|
56
|
+
end
|
57
|
+
|
58
|
+
This is in no way a slam against UltraSphinx. However, if the daemon starts in
|
59
|
+
200 milliseconds, then the user who issued the start command will be waiting for
|
60
|
+
3.8 seconds for no good reason. This is not good for usability or for the
|
61
|
+
user's patience.
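
As a rough illustration of the alternative to a fixed `sleep`, the sketch below polls a readiness check until it succeeds or a deadline passes. The helper name `wait_until` and its parameters are hypothetical and are not part of UltraSphinx or daemon_controller; the "daemon" here is simulated with a timer.

```ruby
# Hypothetical helper: calls the block repeatedly until it returns a truthy
# value or `timeout` seconds have passed. Returns true on success and false
# on timeout.
def wait_until(timeout, interval = 0.1)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    return true if yield
    return false if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep(interval)
  end
end

# Simulate a daemon that becomes ready roughly 0.2 seconds after startup.
started_at = Process.clock_gettime(Process::CLOCK_MONOTONIC)
ready = wait_until(4, 0.05) do
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at >= 0.2
end
waited = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started_at
puts "ready: #{ready}, waited about #{waited.round(2)} seconds"
```

With this approach the caller waits only about as long as the daemon actually needs, while the timeout still bounds the worst case.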
|
62
|
+
|
63
|
+
### Startup error handling
|
64
|
+
|
65
|
+
Different software handles daemon startup errors in different ways. Some might
|
66
|
+
not even handle errors at all. For example, consider 'mongrel_cluster'. If
|
67
|
+
there's a typo in one of your application source files, then 'mongrel_cluster'
|
68
|
+
will not report the error. Instead, you have to check its log files to see what
|
69
|
+
happened. This is not good for usability: many people will be wondering why
|
70
|
+
they can't connect to their Mongrel ports after issuing a
|
71
|
+
`mongrel_rails cluster::start` - until they realize that they should read the
|
72
|
+
log file. But the thing is, not everybody realizes this. And typing in an extra
|
73
|
+
command to read the log file to check whether Mongrel started correctly, is
|
74
|
+
just a big hassle. Why can't the daemon startup code report such errors
|
75
|
+
immediately?
|
76
|
+
|
77
|
+
### Stale or corrupt PID files
|
78
|
+
|
79
|
+
Suppose that you're running a Mongrel cluster, and your server suddenly powers
|
80
|
+
off because of a power outage. When the server is online again, it fails to
|
81
|
+
start your Mongrel cluster because the PID file that it had written still
|
82
|
+
exists, and wasn't cleaned up properly (it's supposed to be cleaned up when
|
83
|
+
Mongrel exits). mongrel_cluster provides the `--clean` option to check whether
|
84
|
+
the PID file is *stale*, and will automatically clean it up if it is. But not
|
85
|
+
all daemon controlling software supports this. Why can't all software check for
|
86
|
+
stale PID files automatically?
|
87
|
+
|
88
|
+
|
89
|
+
Implementation problems
|
90
|
+
=======================
|
91
|
+
|
92
|
+
From the problem descriptions, it would become apparent that our wishlist is as
|
93
|
+
follows. Why are these wishes often not implemented? Let's go over them.
|
94
|
+
|
95
|
+
- **A daemon should be automatically started on demand, instead of requiring the user to manually start it.**
|
96
|
+
|
97
|
+
The most obvious problems are related to concurrency. Suppose that your web
|
98
|
+
application has a search box, and you want to start the search daemon if it
|
99
|
+
isn't already started, then connect to it. Two problems will arise:
|
100
|
+
|
101
|
+
* Suppose that Rails process A is still starting the daemon. At the same
|
102
|
+
time, another visitor tries to search something, and Rails process B
|
103
|
+
notices that the daemon is not running. If B tries to start the daemon
|
104
|
+
while it's already being started by A, then things can go wrong.
|
105
|
+
*A robust daemon starter must ensure that only one process at a time may start the daemon.*
|
106
|
+
* It's not a good idea to wait a fixed amount of time for the daemon to
|
107
|
+
start, because you don't know in advance how long it will take for it to
|
108
|
+
start. For example, if you wait 2 seconds, then try to connect to the
|
109
|
+
daemon, and the daemon isn't done initializing yet, then it will seem as
|
110
|
+
if the daemon failed to start.
|
111
|
+
|
112
|
+
These are the most probable reasons why people don't try to write
|
113
|
+
auto-starting code, and instead require the user to start the daemon
|
114
|
+
manually.
|
115
|
+
|
116
|
+
These problems, as well as several less obvious problems, are closely
|
117
|
+
related to the next few points.
|
118
|
+
|
119
|
+
- **The daemon starter must wait until the daemon is done initializing, no longer and no shorter**
|
120
|
+
|
121
|
+
Because only after the daemon is fully initialized, is it safe to connect
|
122
|
+
to it. And because the user should not have to wait longer than he really
|
123
|
+
has to. During startup, the daemon will have to be continuously checked
|
124
|
+
whether it's done initializing or whether an error occurred. Writing this
|
125
|
+
code can be quite a hassle, which is why most people don't do it.
|
126
|
+
|
127
|
+
- **The daemon starter must report any startup errors**
|
128
|
+
|
129
|
+
If the daemon starting command - e.g. `sphinx -c config_file.conf`,
|
130
|
+
`apachectl start` or `mongrel_rails cluster::start` - reports startup
|
131
|
+
errors, then all is fine as long as the user is starting the command from a
|
132
|
+
terminal. A problem occurs when the error occurs after the daemon has
|
133
|
+
already gone into the background. Such errors are only reported to the log
|
134
|
+
file.
|
135
|
+
*The daemon starter should also check the log file for any startup errors.*
|
136
|
+
|
137
|
+
Furthermore, it should be able to raise startup errors as exceptions. This
|
138
|
+
allows the application to decide what to do with the error. For less
|
139
|
+
experienced system administrators, the error might be displayed in the
|
140
|
+
browser, allowing the administrators to become aware of the problem without
|
141
|
+
forcing them to manually check the log files. Or the error might be emailed
|
142
|
+
to a system administrator's email address.
|
143
|
+
|
144
|
+
- **The daemon starter must be able to correct stale or corrupted PID files**
|
145
|
+
|
146
|
+
If the PID file is stale, or for some reason has been corrupted, then the
|
147
|
+
daemon starter must be able to cope with that.
|
148
|
+
*It should check whether the PID file contains a valid PID, and whether the PID exists.*
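
One common way to perform such a check is sketched below. It is an illustration only, not daemon_controller's actual code; the helper names `read_pid`, `process_exists?` and `daemon_running?` are hypothetical. It relies on the fact that sending signal 0 with `Process.kill` checks for the existence of a process without delivering a signal.

```ruby
require 'tempfile'

# Hypothetical helper: returns the PID stored in the file as an Integer, or
# nil if the file is missing or does not contain a positive integer.
def read_pid(path)
  pid = Integer(File.read(path).strip, 10)
  pid > 0 ? pid : nil
rescue ArgumentError, Errno::ENOENT
  nil
end

# Signal 0 performs the existence and permission checks without actually
# sending a signal to the process.
def process_exists?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # the process exists, but we may not signal it
end

# A PID file counts as valid only if it holds a PID of a live process.
def daemon_running?(pid_file)
  pid = read_pid(pid_file)
  !pid.nil? && process_exists?(pid)
end

Tempfile.create('sketch.pid') do |f|
  f.write(Process.pid.to_s)
  f.flush
  puts "own PID counts as running: #{daemon_running?(f.path)}"
end
```

Note that a PID can be reused by an unrelated process after a reboot, so a check like this can only say that *some* process with that PID exists.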
|
149
|
+
|
150
|
+
|
151
|
+
Introducing daemon_controller
|
152
|
+
=============================
|
153
|
+
|
154
|
+
*daemon_controller* is a library for managing daemons in a robust manner. It is
|
155
|
+
not a tool for managing daemons. Rather, it is a library which lets you write
|
156
|
+
applications that manage daemons in a robust manner. For example,
|
157
|
+
'mongrel_cluster' or UltraSphinx may be adapted to utilize this library, for
|
158
|
+
more robust daemon management.
|
159
|
+
|
160
|
+
*daemon_controller* implements all items in the aforementioned wishlist. It
|
161
|
+
provides the following functionalities:
|
162
|
+
|
163
|
+
### Starting a daemon
|
164
|
+
|
165
|
+
This ensures that no two processes can start the same daemon at the same time.
|
166
|
+
It will also report any startup errors, even errors that occur after the
|
167
|
+
daemon has already gone into the background but before it has fully initialized
|
168
|
+
yet. It also allows you to set a timeout, and will try to abort the daemon if
|
169
|
+
it takes too long to initialize.
|
170
|
+
|
171
|
+
The start function won't return until the daemon has been fully initialized,
|
172
|
+
and is responding to connections. So if the start function has returned, then
|
173
|
+
the daemon is guaranteed to be usable.
|
174
|
+
|
175
|
+
### Stopping a daemon
|
176
|
+
|
177
|
+
It will stop the daemon, but only if it's already running. Any errors
|
178
|
+
are reported. If the daemon isn't already running, then it will silently
|
179
|
+
succeed. Just like starting a daemon, you can set a timeout for stopping the
|
180
|
+
daemon.
|
181
|
+
|
182
|
+
Like the start function, the stop function won't return until the daemon is no
|
183
|
+
longer running. This makes it safe to immediately start the same daemon again
|
184
|
+
after having stopped it, without worrying that the previous daemon instance
|
185
|
+
hasn't exited yet and might conflict with the newly started daemon instance.
|
186
|
+
|
187
|
+
### Connecting to a daemon, starting it if it isn't running
|
188
|
+
|
189
|
+
Every daemon has to be connected to in a different way. As a developer, you
|
190
|
+
tell 'daemon_controller' how to connect to the daemon. It will then attempt to
|
191
|
+
do that, and if that fails, it will check whether the daemon is running. If it
|
192
|
+
isn't running, then it will automatically start the daemon, and attempt to
|
193
|
+
connect to the daemon again. Failures are reported.
|
194
|
+
|
195
|
+
### Checking whether a daemon is running
|
196
|
+
|
197
|
+
This information is retrieved from the PID file. It also checks whether the PID
|
198
|
+
file is stale.
|
199
|
+
|
200
|
+
### All failures are reported via exceptions
|
201
|
+
|
202
|
+
So that you can exactly determine how you want to handle errors.
|
203
|
+
|
204
|
+
### Lots and lots of error checking
|
205
|
+
|
206
|
+
So that there are very few ways in which the system can screw up.
|
207
|
+
|
208
|
+
daemon_controller's goal is to make daemon management less of a hassle, and as
|
209
|
+
automatic and straightforward as possible.
|
210
|
+
|
211
|
+
|
212
|
+
Tutorial #1: controlling Apache
|
213
|
+
===============================
|
214
|
+
|
215
|
+
Suppose that you're a [Phusion Passenger](http://www.modrails.com/) developer,
|
216
|
+
and you need to write tests for the Apache module. In particular, you want to
|
217
|
+
test whether the different Phusion Passenger configuration directives are
|
218
|
+
working as expected. Obviously, to test the Apache module, the Apache web
|
219
|
+
server must be running. For every test, you will want the unit test suite to:
|
220
|
+
|
221
|
+
1. Write an Apache configuration file, with the relevant configuration
|
222
|
+
directive set to a specific value.
|
223
|
+
2. Start Apache.
|
224
|
+
3. Send an HTTP request to Apache and check whether the HTTP response matches
|
225
|
+
your expectations.
|
226
|
+
4. Stop Apache.
|
227
|
+
|
228
|
+
That can be done with the following code:
|
229
|
+
|
230
|
+
require 'daemon_controller'
|
231
|
+
|
232
|
+
File.open("apache.conf", "w") do |f|
|
233
|
+
f.write("PidFile apache.pid\n")
|
234
|
+
f.write("ErrorLog apache.log\n")
|
235
|
+
f.write("Listen 1234\n")
|
236
|
+
f.write(... other relevant configuration options ...)
|
237
|
+
end
|
238
|
+
|
239
|
+
controller = DaemonController.new(
|
240
|
+
:identifier => 'Apache web server',
|
241
|
+
:start_command => 'apachectl -f apache.conf -k start',
|
242
|
+
:ping_command => lambda { TCPSocket.new('localhost', 1234) },
|
243
|
+
:pid_file => 'apache.pid',
|
244
|
+
:log_file => 'apache.log',
|
245
|
+
:start_timeout => 25
|
246
|
+
)
|
247
|
+
controller.start
|
248
|
+
|
249
|
+
.... apache is now started ....
|
250
|
+
.... some test code here ....
|
251
|
+
|
252
|
+
controller.stop
|
253
|
+
|
254
|
+
The `File.open` line is obvious: it writes the relevant Apache configuration
|
255
|
+
file.
|
256
|
+
|
257
|
+
The next line is for creating a new DaemonController object. We pass a
|
258
|
+
human-readable identifier for this daemon ("Apache web server") to the
|
259
|
+
constructor. This is used for generating friendlier error messages.
|
260
|
+
We also tell it how Apache is supposed to be started (`:start_command`), how to
|
261
|
+
check whether it can be connected to (`:ping_command`), and where its PID file
|
262
|
+
and log file are. If Apache failed with an error during startup, then it will be
|
263
|
+
reported. If Apache failed with an error after it has gone into the background,
|
264
|
+
then that will be reported too: the given log file is monitored for new error
|
265
|
+
messages.
|
266
|
+
Finally, a timeout of 25 seconds is given. If Apache doesn't start within 25
|
267
|
+
seconds, then an exception will be raised.
|
268
|
+
|
269
|
+
The ping command is just a `Proc` which returns true or false. If the Proc
|
270
|
+
raises `Errno::ECONNREFUSED`, then that's also interpreted by DaemonController
|
271
|
+
as meaning that the daemon isn't responding yet.
|
272
|
+
|
273
|
+
After `controller.start` has returned, we can continue with the test case. At
|
274
|
+
this point, we know that Apache has finished initializing.
|
275
|
+
When we're done with Apache, we stop it with `controller.stop`. This does not
|
276
|
+
return until Apache has fully stopped.
|
277
|
+
|
278
|
+
The cautious reader might notice that the socket returned by the ping command
|
279
|
+
is never closed. That's true, because DaemonController will close it
|
280
|
+
automatically for us, if it notices that the ping command proc's return value
|
281
|
+
responds to `#close`.
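
To make the ping semantics described above concrete, here is a minimal sketch of how a ping proc's result could be interpreted. It is an assumption about the general shape of such logic, not the library's actual implementation; the method name `daemon_responding?` is hypothetical, and a `StringIO` stands in for a real socket so that the example needs no network.

```ruby
require 'stringio'

# Hypothetical helper: returns true if the ping command indicates that the
# daemon can be connected to, false otherwise.
def daemon_responding?(ping_command)
  result = ping_command.call
  if result.respond_to?(:close)
    begin
      # Close socket-like return values so that the caller doesn't have to.
      result.close
    rescue StandardError
      # Errors raised by #close are ignored in this sketch.
    end
  end
  result ? true : false
rescue Errno::ECONNREFUSED
  # A refused connection means the daemon isn't responding yet.
  false
end

socket_like = StringIO.new
puts daemon_responding?(lambda { socket_like })
puts daemon_responding?(lambda { raise Errno::ECONNREFUSED })
puts "closed automatically: #{socket_like.closed?}"
```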
|
282
|
+
|
283
|
+
From this example, it becomes apparent that for daemon_controller to work, you
|
284
|
+
must know how to start the daemon, how to contact the daemon, and you must know
|
285
|
+
where it will put its PID file and log file.
|
286
|
+
|
287
|
+
|
288
|
+
Tutorial #2: Sphinx indexing and search server management
|
289
|
+
=========================================================
|
290
|
+
|
291
|
+
We at Phusion are currently developing a web application with full-text search
|
292
|
+
capabilities, and we're using Sphinx for this purpose. We want to make the
|
293
|
+
lives of our developers and our system administrators as easy as possible, so
|
294
|
+
that there's little room for human screw-up, and so we've developed this
|
295
|
+
library. Our Sphinx search daemon is completely managed through this library
|
296
|
+
and is automatically started on demand.
|
297
|
+
|
298
|
+
Our Sphinx config file is generated from an ERB template. This ERB template
|
299
|
+
writes different values in the config file, depending on whether we're in
|
300
|
+
development, test or production mode. We will want to regenerate this config
|
301
|
+
file every time, just before we start the search daemon.
|
302
|
+
But there's more. The search daemon will fail if there is no search index. If a
|
303
|
+
new developer has just checked out the application's source code, then there is
|
304
|
+
no search index yet. We don't want him to go through the pain of having to
|
305
|
+
generate the index manually. (That said, it isn't that much of a pain, but it's
|
306
|
+
just yet-another-thing to do, which can and should be automated.) So before
|
307
|
+
starting the daemon, we will also want to check whether the index exists. If
|
308
|
+
not, then we'll generate it, and then start the daemon. Of course, no two Rails
|
309
|
+
processes may generate the config file or the index at the same time.
|
310
|
+
|
311
|
+
When querying the search server, we will want to automatically start it if it
|
312
|
+
isn't running.
|
313
|
+
|
314
|
+
This can be achieved with the following code:
|
315
|
+
|
316
|
+
require 'daemon_controller'
|
317
|
+
|
318
|
+
class SearchServer
|
319
|
+
SEARCH_SERVER_PORT = 1234
|
320
|
+
|
321
|
+
def initialize
|
322
|
+
@controller = DaemonController.new(
|
323
|
+
:identifier => 'Sphinx search server',
|
324
|
+
:start_command => "searchd -c config/sphinx.conf",
|
325
|
+
:before_start => method(:before_start),
|
326
|
+
:ping_command => lambda { TCPSocket.new('localhost', SEARCH_SERVER_PORT) },
|
327
|
+
:pid_file => 'tmp/pids/sphinx.pid',
|
328
|
+
:log_file => 'log/sphinx.log')
|
329
|
+
end
|
330
|
+
|
331
|
+
def query(search_terms)
|
332
|
+
socket = @controller.connect do
|
333
|
+
TCPSocket.new('localhost', SEARCH_SERVER_PORT)
|
334
|
+
end
|
335
|
+
send_query(socket, search_terms)
|
336
|
+
return retrieve_results(socket)
|
337
|
+
end
|
338
|
+
|
339
|
+
private
|
340
|
+
def before_start
|
341
|
+
generate_configuration_file
|
342
|
+
if !index_exists?
|
343
|
+
generate_index
|
344
|
+
end
|
345
|
+
end
|
346
|
+
|
347
|
+
...
|
348
|
+
end
|
349
|
+
|
350
|
+
Notice the `:before_start` option. We pass a block of code which is to be run,
|
351
|
+
just before the daemon is started. This block, along with starting the daemon,
|
352
|
+
is completely serialized. That is, if you're inside the block, then it's
|
353
|
+
guaranteed that no other process is running this block at the same time as well.
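
The kind of serialization described above can be obtained with an advisory file lock. The sketch below is a simplified illustration and makes some assumptions: the helper `with_exclusive_lock` and the lock file path are hypothetical, and the demonstration relies on `flock` semantics as found on Linux and the BSDs, where two independently opened handles on the same file conflict with each other.

```ruby
require 'tmpdir'

# Hypothetical helper: runs the block while holding an exclusive lock on the
# given lock file. Closing the file at the end of the block releases the lock.
def with_exclusive_lock(path)
  File.open(path, 'w') do |f|
    f.flock(File::LOCK_EX)
    yield
  end
end

lock_path = File.join(Dir.tmpdir, "daemon_controller_sketch_#{Process.pid}.lock")

with_exclusive_lock(lock_path) do
  # While the lock is held, a second handle cannot take it. LOCK_NB makes
  # flock return false instead of blocking.
  File.open(lock_path, 'w') do |other|
    got_lock = other.flock(File::LOCK_EX | File::LOCK_NB) != false
    puts "second handle got the lock while it was held: #{got_lock}"
  end
end

File.delete(lock_path)
```

A second process that calls `with_exclusive_lock` on the same path would simply block until the first one leaves its block, which is what keeps the config and index generation from running concurrently.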
|
354
|
+
|
355
|
+
The `#query` method is the method for querying the search server with search
|
356
|
+
terms. It returns a list of results. It uses `DaemonController#connect`: one
|
357
|
+
passes a block to that method, which contains code for connecting to the
|
358
|
+
daemon. If the block returns nil, or if it raises `Errno::ECONNREFUSED`, then
|
359
|
+
`DaemonController#connect` will automatically take care of auto-starting the
|
360
|
+
Sphinx daemon for us.
|
361
|
+
|
362
|
+
|
363
|
+
A little bit of history
|
364
|
+
=======================
|
365
|
+
|
366
|
+
The issue of managing daemons has been a thorn in our side for quite some time
|
367
|
+
now. Until now, we've solved this problem by equipping any daemons that we
|
368
|
+
write with the ability to gracefully handle being concurrently started, the
|
369
|
+
ability to initialize as much as possible *before* forking into the background,
|
370
|
+
etc. However, equipping all this robustness into our code over and over is a
|
371
|
+
lot of work. We've considered documenting a standard behavior for daemons so
|
372
|
+
that they can properly support auto-starting and such.
|
373
|
+
|
374
|
+
However, we've recently realized that that's probably a futile effort.
|
375
|
+
Convincing everybody to write a lot of code for a bit more robustness is
|
376
|
+
probably not realistic. So we took the pragmatic approach and developed a
|
377
|
+
library which adds more robustness on top of daemons' existing behavior. And
|
378
|
+
thus, daemon_controller was born. It is a little bit less efficient compared to
|
379
|
+
when the daemon is designed from the beginning with such abilities in mind, but
|
380
|
+
it's compatible with virtually all daemons, and is easy to use.
|
381
|
+
|
382
|
+
|
383
|
+
API documentation
|
384
|
+
=================
|
385
|
+
|
386
|
+
Detailed API documentation is available in the form of inline comments in
|
387
|
+
`lib/daemon_controller.rb`.
|
data/daemon_controller.gemspec
CHANGED
@@ -1,17 +1,18 @@
|
|
1
1
|
Gem::Specification.new do |s|
|
2
2
|
s.name = "daemon_controller"
|
3
|
-
s.version = "0.
|
3
|
+
s.version = "0.2.0"
|
4
4
|
s.date = "2008-08-21"
|
5
5
|
s.summary = "A library for implementing daemon management capabilities"
|
6
6
|
s.email = "hongli@phusion.nl"
|
7
7
|
s.homepage = "http://github.com/FooBarWidget/daemon_controller/tree/master"
|
8
|
-
s.description = "A library for
|
9
|
-
s.has_rdoc =
|
8
|
+
s.description = "A library for robust daemon management."
|
9
|
+
s.has_rdoc = true
|
10
10
|
s.authors = ["Hongli Lai"]
|
11
11
|
|
12
12
|
s.files = [
|
13
|
-
"README.
|
13
|
+
"README.markdown", "LICENSE.txt", "daemon_controller.gemspec",
|
14
14
|
"lib/daemon_controller.rb",
|
15
|
+
"lib/daemon_controller/lock_file.rb",
|
15
16
|
"spec/daemon_controller_spec.rb",
|
16
17
|
"spec/echo_server.rb"
|
17
18
|
]
|
data/lib/daemon_controller/lock_file.rb
ADDED
@@ -0,0 +1,48 @@
|
|
1
|
+
# daemon_controller, library for robust daemon management
|
2
|
+
# Copyright (c) 2008 Phusion
|
3
|
+
#
|
4
|
+
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
5
|
+
# of this software and associated documentation files (the "Software"), to deal
|
6
|
+
# in the Software without restriction, including without limitation the rights
|
7
|
+
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
8
|
+
# copies of the Software, and to permit persons to whom the Software is
|
9
|
+
# furnished to do so, subject to the following conditions:
|
10
|
+
#
|
11
|
+
# The above copyright notice and this permission notice shall be included in
|
12
|
+
# all copies or substantial portions of the Software.
|
13
|
+
#
|
14
|
+
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
15
|
+
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
16
|
+
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
17
|
+
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
18
|
+
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
19
|
+
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
20
|
+
# THE SOFTWARE.
|
21
|
+
|
22
|
+
class DaemonController
|
23
|
+
class LockFile
|
24
|
+
def initialize(filename)
|
25
|
+
@filename = filename
|
26
|
+
end
|
27
|
+
|
28
|
+
def exclusive_lock
|
29
|
+
File.open(@filename, 'w') do |f|
|
30
|
+
if Fcntl.const_defined? :F_SETFD
|
31
|
+
f.fcntl(Fcntl::F_SETFD, Fcntl::FD_CLOEXEC)
|
32
|
+
end
|
33
|
+
f.flock(File::LOCK_EX)
|
34
|
+
yield
|
35
|
+
end
|
36
|
+
end
|
37
|
+
|
38
|
+
def shared_lock
|
39
|
+
File.open(@filename, 'w') do |f|
|
40
|
+
if Fcntl.const_defined? :F_SETFD
|
41
|
+
f.fcntl(Fcntl::F_SETFD, Fcntl::FD_CLOEXEC)
|
42
|
+
end
|
43
|
+
f.flock(File::LOCK_SH)
|
44
|
+
yield
|
45
|
+
end
|
46
|
+
end
|
47
|
+
end # class LockFile
|
48
|
+
end # class DaemonController
|
data/lib/daemon_controller.rb
CHANGED
@@ -1,20 +1,33 @@
|
|
1
|
-
#
|
2
|
-
#
|
3
|
-
#
|
4
|
-
#
|
5
|
-
#
|
6
|
-
#
|
7
|
-
#
|
8
|
-
#
|
9
|
-
#
|
10
|
-
#
|
11
|
-
#
|
12
|
-
#
|
1
|
+
# daemon_controller, library for robust daemon management
|
2
|
+
# Copyright (c) 2008 Phusion
|
3
|
+
#
|
4
|
+
# Permission is hereby granted, free of charge, to any person obtaining a copy
|
5
|
+
# of this software and associated documentation files (the "Software"), to deal
|
6
|
+
# in the Software without restriction, including without limitation the rights
|
7
|
+
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
|
8
|
+
# copies of the Software, and to permit persons to whom the Software is
|
9
|
+
# furnished to do so, subject to the following conditions:
|
10
|
+
#
|
11
|
+
# The above copyright notice and this permission notice shall be included in
|
12
|
+
# all copies or substantial portions of the Software.
|
13
|
+
#
|
14
|
+
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
|
15
|
+
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
|
16
|
+
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
|
17
|
+
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
|
18
|
+
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
|
19
|
+
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
|
20
|
+
# THE SOFTWARE.
|
13
21
|
|
14
22
|
require 'tempfile'
|
15
23
|
require 'fcntl'
|
24
|
+
require File.expand_path(File.dirname(__FILE__) << '/daemon_controller/lock_file')
|
16
25
|
|
26
|
+
# Main daemon controller object. See the README for an introduction and tutorial.
|
17
27
|
class DaemonController
|
28
|
+
ALLOWED_CONNECT_EXCEPTIONS = [Errno::ECONNREFUSED, Errno::ENETUNREACH,
|
29
|
+
Errno::ETIMEDOUT, Errno::ECONNRESET]
|
30
|
+
|
18
31
|
class Error < StandardError
|
19
32
|
end
|
20
33
|
class TimeoutError < Error
|
@@ -57,6 +70,15 @@ class DaemonController
|
|
57
70
|
#
|
58
71
|
# The value may also be a Proc, which returns an expression that evaluates to
|
59
72
|
# true (indicating that the daemon can be connected to) or false (failure).
|
73
|
+
# If the Proc raises Errno::ECONNREFUSED, Errno::ENETUNREACH, Errno::ETIMEDOUT
|
74
|
+
# or Errno::ECONNRESET, then that also means that the daemon cannot be connected
|
75
|
+
# to.
|
76
|
+
# <b>NOTE:</b> if the ping command returns an object which responds to
|
77
|
+
# <tt>#close</tt>, then that method will be called on the return value.
|
78
|
+
# This makes it possible to specify a ping command such as
|
79
|
+
# <tt>lambda { TCPSocket.new('localhost', 1234) }</tt>, without having to worry
|
80
|
+
# about closing it afterwards.
|
81
|
+
# Any exceptions raised by #close are ignored.
|
60
82
|
#
|
61
83
|
# [:pid_file]
|
62
84
|
# The PID file that the daemon will write to. Used to check whether the daemon
|
@@ -73,6 +95,10 @@ class DaemonController
|
|
73
95
|
# by killing the PID written in the PID file.
|
74
96
|
#
|
75
97
|
# The default value is +nil+.
|
98
|
+
#
|
99
|
+
# [:before_start]
|
100
|
+
# This may be a Proc. It will be called just before running the start command.
|
101
|
+
# The before_start proc is not subject to the start timeout.
|
76
102
|
#
|
77
103
|
# [:start_timeout]
|
78
104
|
# The maximum amount of time, in seconds, that #start may take to start
|
@@ -118,6 +144,7 @@ class DaemonController
|
|
118
144
|
@ping_interval = options[:ping_interval] || 0.1
|
119
145
|
@pid_file = options[:pid_file]
|
120
146
|
@log_file = options[:log_file]
|
147
|
+
@before_start = options[:before_start]
|
121
148
|
@start_timeout = options[:start_timeout] || 15
|
122
149
|
@stop_timeout = options[:stop_timeout] || 15
|
123
150
|
@log_file_activity_timeout = options[:log_file_activity_timeout] || 7
|
@@ -132,7 +159,7 @@ class DaemonController
|
|
132
159
|
# - StartTimeout - the daemon did not start in time. This could also
|
133
160
|
# mean that the daemon failed after it has gone into the background.
|
134
161
|
def start
|
135
|
-
exclusive_lock do
|
162
|
+
@lock_file.exclusive_lock do
|
136
163
|
start_without_locking
|
137
164
|
end
|
138
165
|
end
|
@@ -142,8 +169,8 @@ class DaemonController
|
|
142
169
|
# started.
|
143
170
|
#
|
144
171
|
# The block must return nil or raise Errno::ECONNREFUSED, Errno::ENETUNREACH,
|
145
|
-
#
|
146
|
-
# It must return non-nil if the daemon can be connected to.
|
172
|
+
# Errno::ETIMEDOUT, Errno::ECONNRESET to indicate that the daemon cannot be
|
173
|
+
# connected to. It must return non-nil if the daemon can be connected to.
|
147
174
|
# Upon successful connection, the return value of the block will
|
148
175
|
# be returned by #connect.
|
149
176
|
#
|
@@ -158,21 +185,21 @@ class DaemonController
|
|
158
185
|
# to the daemon even after starting it.
|
159
186
|
def connect
|
160
187
|
connection = nil
|
161
|
-
shared_lock do
|
188
|
+
@lock_file.shared_lock do
|
162
189
|
begin
|
163
190
|
connection = yield
|
164
|
-
rescue
|
191
|
+
rescue *ALLOWED_CONNECT_EXCEPTIONS
|
165
192
|
connection = nil
|
166
193
|
end
|
167
194
|
end
|
168
195
|
if connection.nil?
|
169
|
-
exclusive_lock do
|
196
|
+
@lock_file.exclusive_lock do
|
170
197
|
if !daemon_is_running?
|
171
198
|
start_without_locking
|
172
199
|
end
|
173
200
|
begin
|
174
201
|
connection = yield
|
175
|
-
rescue
|
202
|
+
rescue *ALLOWED_CONNECT_EXCEPTIONS
|
176
203
|
connection = nil
|
177
204
|
end
|
178
205
|
if connection.nil?
|
@@ -197,7 +224,7 @@ class DaemonController
|
|
197
224
|
# - StopError - the stop command failed.
|
198
225
|
# - StopTimeout - the daemon didn't stop in time.
|
199
226
|
def stop
|
200
|
-
exclusive_lock do
|
227
|
+
@lock_file.exclusive_lock do
|
201
228
|
begin
|
202
229
|
Timeout.timeout(@stop_timeout) do
|
203
230
|
kill_daemon
|
@@ -211,14 +238,16 @@ class DaemonController
|
|
211
238
|
end
|
212
239
|
end
|
213
240
|
|
214
|
-
# Returns the daemon's PID, as reported by its PID file.
|
241
|
+
# Returns the daemon's PID, as reported by its PID file. Returns the PID
|
242
|
+
# as an integer, or nil if there is no valid PID in the PID file.
|
243
|
+
#
|
215
244
|
# This method doesn't check whether the daemon's actually running.
|
216
245
|
# Use #running? if you want to check whether it's actually running.
|
217
246
|
#
|
218
247
|
# Raises SystemCallError or IOError if something went wrong during
|
219
248
|
# reading of the PID file.
|
220
249
|
def pid
|
221
|
-
shared_lock do
|
250
|
+
@lock_file.shared_lock do
|
222
251
|
return read_pid_file
|
223
252
|
end
|
224
253
|
end
|
@@ -230,32 +259,12 @@ class DaemonController
|
|
230
259
|
# Raises SystemCallError or IOError if something went wrong during
|
231
260
|
# reading of the PID file.
|
232
261
|
def running?
|
233
|
-
shared_lock do
|
262
|
+
@lock_file.shared_lock do
|
234
263
|
return daemon_is_running?
|
235
264
|
end
|
236
265
|
end
|
237
266
|
|
238
267
|
private
|
239
|
-
def exclusive_lock
|
240
|
-
File.open(@lock_file, 'w') do |f|
|
241
|
-
if Fcntl.const_defined? :F_SETFD
|
242
|
-
f.fcntl(Fcntl::F_SETFD, Fcntl::FD_CLOEXEC)
|
243
|
-
end
|
244
|
-
f.flock(File::LOCK_EX)
|
245
|
-
yield
|
246
|
-
end
|
247
|
-
end
|
248
|
-
|
249
|
-
def shared_lock
|
250
|
-
File.open(@lock_file, 'w') do |f|
|
251
|
-
if Fcntl.const_defined? :F_SETFD
|
252
|
-
f.fcntl(Fcntl::F_SETFD, Fcntl::FD_CLOEXEC)
|
253
|
-
end
|
254
|
-
f.flock(File::LOCK_SH)
|
255
|
-
yield
|
256
|
-
end
|
257
|
-
end
|
258
|
-
|
259
268
|
def start_without_locking
|
260
269
|
if daemon_is_running?
|
261
270
|
raise AlreadyStarted, "Daemon '#{@identifier}' is already started"
|
@@ -264,6 +273,7 @@ private
|
|
264
273
|
delete_pid_file
|
265
274
|
begin
|
266
275
|
started = false
|
276
|
+
before_start
|
267
277
|
Timeout.timeout(@start_timeout) do
|
268
278
|
done = false
|
269
279
|
spawn_daemon
|
@@ -305,14 +315,22 @@ private
|
|
305
315
|
result = :timeout
|
306
316
|
end
|
307
317
|
if !result
|
308
|
-
raise
|
318
|
+
raise(StartError, differences_in_log_file ||
|
319
|
+
"Daemon '#{@identifier}' failed to start.")
|
309
320
|
elsif result == :timeout
|
310
|
-
raise
|
321
|
+
raise(StartTimeout, differences_in_log_file ||
|
322
|
+
"Daemon '#{@identifier}' failed to start in time.")
|
311
323
|
else
|
312
324
|
return true
|
313
325
|
end
|
314
326
|
end
|
315
327
|
|
328
|
+
def before_start
|
329
|
+
if @before_start
|
330
|
+
@before_start.call
|
331
|
+
end
|
332
|
+
end
|
333
|
+
|
316
334
|
def spawn_daemon
|
317
335
|
run_command(@start_command)
|
318
336
|
end
|
@@ -330,7 +348,10 @@ private
|
|
330
348
|
end
|
331
349
|
|
332
350
|
def kill_daemon_with_signal
|
333
|
-
|
351
|
+
pid = read_pid_file
|
352
|
+
if pid
|
353
|
+
Process.kill('SIGTERM', pid)
|
354
|
+
end
|
334
355
|
rescue Errno::ESRCH, Errno::ENOENT
|
335
356
|
end
|
336
357
|
|
@@ -354,7 +375,12 @@ private
|
|
354
375
|
end
|
355
376
|
|
356
377
|
def read_pid_file
|
357
|
-
|
378
|
+
pid = File.read(@pid_file).strip
|
379
|
+
if pid =~ /\A\d+\Z/
|
380
|
+
return pid.to_i
|
381
|
+
else
|
382
|
+
return nil
|
383
|
+
end
|
358
384
|
end
|
359
385
|
|
360
386
|
def delete_pid_file
|
@@ -437,7 +463,12 @@ private
|
|
437
463
|
if @original_log_file_stat
|
438
464
|
File.open(@log_file, 'r') do |f|
|
439
465
|
f.seek(@original_log_file_stat.size, IO::SEEK_SET)
|
440
|
-
|
466
|
+
diff = f.read.strip
|
467
|
+
if diff.empty?
|
468
|
+
return nil
|
469
|
+
else
|
470
|
+
return diff
|
471
|
+
end
|
441
472
|
end
|
442
473
|
else
|
443
474
|
return nil
|
@@ -447,7 +478,7 @@ private
|
|
447
478
|
end
|
448
479
|
|
449
480
|
def determine_lock_file(identifier, pid_file)
|
450
|
-
return File.expand_path(pid_file + ".lock")
|
481
|
+
return LockFile.new(File.expand_path(pid_file + ".lock"))
|
451
482
|
end
|
452
483
|
|
453
484
|
def self.fork_supported?
|
@@ -498,7 +529,15 @@ private
|
|
498
529
|
|
499
530
|
def run_ping_command
|
500
531
|
if @ping_command.respond_to?(:call)
|
501
|
-
|
532
|
+
begin
|
533
|
+
value = @ping_command.call
|
534
|
+
if value.respond_to?(:close)
|
535
|
+
value.close rescue nil
|
536
|
+
end
|
537
|
+
return value
|
538
|
+
rescue *ALLOWED_CONNECT_EXCEPTIONS
|
539
|
+
return false
|
540
|
+
end
|
502
541
|
else
|
503
542
|
return system(@ping_command)
|
504
543
|
end
|
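The `run_ping_command` hunk above can be illustrated in isolation. This is a minimal sketch, not the library's code: `ALLOWED_EXCEPTIONS` and `ping` are hypothetical names, and the exception list is assumed from the `#connect` documentation comment in this diff (the real `ALLOWED_CONNECT_EXCEPTIONS` constant is defined outside this section and may differ).

```ruby
# Hypothetical stand-in for ALLOWED_CONNECT_EXCEPTIONS. The list is assumed
# from the #connect comment, not copied from the constant itself.
ALLOWED_EXCEPTIONS = [
  Errno::ECONNREFUSED, Errno::ENETUNREACH,
  Errno::ETIMEDOUT, Errno::ECONNRESET
]

# Runs a callable ping command (hypothetical helper).
# - Returns false if the callable raises one of the allowed exceptions.
# - Closes the returned object if it responds to #close, ignoring any
#   StandardError raised while closing.
# - Lets every other exception propagate to the caller.
def ping(ping_command)
  value = ping_command.call
  if value.respond_to?(:close)
    begin
      value.close
    rescue StandardError
      # Errors while closing are deliberately ignored.
    end
  end
  value
rescue *ALLOWED_EXCEPTIONS
  false
end

refused = ping(lambda { raise Errno::ECONNREFUSED, "dummy" })
alive   = ping(lambda { :ok })
```

Rescuing only a fixed list, rather than using a bare `rescue`, means that a programming error inside the ping block (for example a `NameError`) surfaces instead of being mistaken for "daemon not reachable".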
@@ -156,6 +156,7 @@ describe DaemonController, "#start" do
|
|
156
156
|
max_start_timeout = 6
|
157
157
|
else
|
158
158
|
start_timeout = 0.15
|
159
|
+
min_start_timeout = 0.15
|
159
160
|
max_start_timeout = 0.30
|
160
161
|
end
|
161
162
|
new_controller(:start_command => 'sleep 2', :start_timeout => start_timeout)
|
@@ -298,3 +299,46 @@ describe DaemonController, "#connect" do
|
|
298
299
|
end
|
299
300
|
end
|
300
301
|
|
302
|
+
describe DaemonController do
|
303
|
+
include TestHelpers
|
304
|
+
|
305
|
+
specify "if the ping command is a block that raises Errno::ECONNREFUSED, then that's " <<
|
306
|
+
"an indication that the daemon cannot be connected to" do
|
307
|
+
new_controller(:ping_command => lambda do
|
308
|
+
raise Errno::ECONNREFUSED, "dummy"
|
309
|
+
end)
|
310
|
+
@controller.send(:run_ping_command).should be_false
|
311
|
+
end
|
312
|
+
|
313
|
+
specify "if the ping command is a block that returns an object that responds to #close, " <<
|
314
|
+
"then the close method will be called on that object" do
|
315
|
+
server = TCPServer.new('localhost', 8278)
|
316
|
+
begin
|
317
|
+
socket = nil
|
318
|
+
new_controller(:ping_command => lambda do
|
319
|
+
socket = TCPSocket.new('localhost', 8278)
|
320
|
+
end)
|
321
|
+
@controller.send(:run_ping_command)
|
322
|
+
socket.should be_closed
|
323
|
+
ensure
|
324
|
+
server.close
|
325
|
+
end
|
326
|
+
end
|
327
|
+
|
328
|
+
specify "if the ping command is a block that returns an object that responds to #close, " <<
|
329
|
+
"and #close raises an exception, then that exception is ignored" do
|
330
|
+
server = TCPServer.new('localhost', 8278)
|
331
|
+
begin
|
332
|
+
o = Object.new
|
333
|
+
o.should_receive(:close).and_return do
|
334
|
+
raise StandardError, "foo"
|
335
|
+
end
|
336
|
+
new_controller(:ping_command => lambda do
|
337
|
+
o
|
338
|
+
end)
|
339
|
+
lambda { @controller.send(:run_ping_command) }.should_not raise_error(StandardError)
|
340
|
+
ensure
|
341
|
+
server.close
|
342
|
+
end
|
343
|
+
end
|
344
|
+
end
|
metadata
CHANGED
@@ -1,7 +1,7 @@
|
|
1
1
|
--- !ruby/object:Gem::Specification
|
2
2
|
name: FooBarWidget-daemon_controller
|
3
3
|
version: !ruby/object:Gem::Version
|
4
|
-
version: 0.
|
4
|
+
version: 0.2.0
|
5
5
|
platform: ruby
|
6
6
|
authors:
|
7
7
|
- Hongli Lai
|
@@ -13,7 +13,7 @@ date: 2008-08-21 00:00:00 -07:00
|
|
13
13
|
default_executable:
|
14
14
|
dependencies: []
|
15
15
|
|
16
|
-
description: A library for
|
16
|
+
description: A library for robust daemon management.
|
17
17
|
email: hongli@phusion.nl
|
18
18
|
executables: []
|
19
19
|
|
@@ -22,13 +22,14 @@ extensions: []
|
|
22
22
|
extra_rdoc_files: []
|
23
23
|
|
24
24
|
files:
|
25
|
-
- README.
|
25
|
+
- README.markdown
|
26
26
|
- LICENSE.txt
|
27
27
|
- daemon_controller.gemspec
|
28
28
|
- lib/daemon_controller.rb
|
29
|
+
- lib/daemon_controller/lock_file.rb
|
29
30
|
- spec/daemon_controller_spec.rb
|
30
31
|
- spec/echo_server.rb
|
31
|
-
has_rdoc:
|
32
|
+
has_rdoc: true
|
32
33
|
homepage: http://github.com/FooBarWidget/daemon_controller/tree/master
|
33
34
|
post_install_message:
|
34
35
|
rdoc_options: []
|
data/README.rdoc
DELETED
@@ -1,41 +0,0 @@
|
|
1
|
-
= Introduction
|
2
|
-
|
3
|
-
daemon_controller is a library for implementing daemon management capabilities.
|
4
|
-
|
5
|
-
Suppose that you have a Ruby on Rails application which uses the Sphinx search
|
6
|
-
server [1] for full-text searching capabilities. In order to search the index,
|
7
|
-
the search daemon (searchd) must be running. Furthermore, you're using the Riddle
|
8
|
-
library [2] for interfacing with the search daemon.
|
9
|
-
|
10
|
-
You can write this in your application:
|
11
|
-
|
12
|
-
require 'daemon_controller'
|
13
|
-
require 'riddle'
|
14
|
-
|
15
|
-
controller = DaemonController.new(
|
16
|
-
:identifier => 'Sphinx search daemon',
|
17
|
-
:start_command => 'searchd -c config/sphinx.conf',
|
18
|
-
:ping_command => proc { Riddle::Client.new('localhost', 1234) },
|
19
|
-
:pid_file => 'tmp/pids/sphinx.pid',
|
20
|
-
:log_file => 'log/sphinx.log'
|
21
|
-
)
|
22
|
-
client = controller.connect do
|
23
|
-
Riddle::Client.new('localhost', 1234)
|
24
|
-
end
|
25
|
-
client.query("some search query...")
|
26
|
-
|
27
|
-
controller.connect will start the Sphinx search daemon if it isn't already
|
28
|
-
started. Then, it will connect to the Sphinx search daemon by running the
|
29
|
-
given block.
|
30
|
-
|
31
|
-
Basically you just tell the library how to start the daemon, how to check
|
32
|
-
whether it's responding to connections, and which PID file and log file it
|
33
|
-
uses. daemon_controller will automatically take care of things like:
|
34
|
-
|
35
|
-
* concurrency control, e.g. to ensure that no two processes will try to start
|
36
|
-
the Sphinx search daemon at the same time.
|
37
|
-
* error handling: if 'searchd' failed to start, then its error message will
|
38
|
-
be propagated into the exception that will be thrown. This makes it much
|
39
|
-
easier to handle daemon startup errors in your application. This can also
|
40
|
-
allow the system administrator to see the error message directly in your
|
41
|
-
application, instead of having to consult the daemon's log file.
|
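The concurrency control mentioned in the removed README, and the `shared_lock`/`exclusive_lock` helpers that this diff moves onto `@lock_file`, rely on advisory file locking with `flock`. The following is a minimal sketch under stated assumptions: `SketchLockFile` and the lock file path are hypothetical, and the real `LockFile` class in `lib/daemon_controller/lock_file.rb` is not shown in this section and may differ.

```ruby
require 'tmpdir'

# Hypothetical illustration of flock-based locking; not the gem's LockFile.
class SketchLockFile
  def initialize(filename)
    @filename = filename
  end

  # Holds an exclusive lock for the duration of the block, so that only one
  # process at a time can, for example, start or stop the daemon.
  def exclusive_lock(&block)
    with_lock(File::LOCK_EX, &block)
  end

  # Holds a shared lock for the duration of the block. Several readers may
  # hold it at once, but not while another process holds the exclusive lock.
  def shared_lock(&block)
    with_lock(File::LOCK_SH, &block)
  end

  private

  def with_lock(mode)
    # Mode 'a' creates the file if needed without truncating it. Closing
    # the file at the end of the block releases the lock.
    File.open(@filename, 'a') do |f|
      f.flock(mode)
      yield
    end
  end
end

# Assumed path, for illustration only.
lock = SketchLockFile.new(File.join(Dir.tmpdir, "sketch-daemon-#{Process.pid}.lock"))
result = lock.exclusive_lock { :started }
```

The block's return value is passed through, which is why methods such as `pid` and `running?` in the diff can wrap their bodies in `@lock_file.shared_lock do ... end`.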