FooBarWidget-daemon_controller 0.1.0 → 0.2.0

data/LICENSE.txt ADDED
@@ -0,0 +1,20 @@
1
+ Copyright (c) 2008 Phusion
2
+
3
+ Permission is hereby granted, free of charge, to any person obtaining a copy
4
+ of this software and associated documentation files (the "Software"), to deal
5
+ in the Software without restriction, including without limitation the rights
6
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7
+ copies of the Software, and to permit persons to whom the Software is
8
+ furnished to do so, subject to the following conditions:
9
+
10
+ The above copyright notice and this permission notice shall be included in
11
+ all copies or substantial portions of the Software.
12
+
13
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
19
+ THE SOFTWARE.
20
+
data/README.markdown ADDED
@@ -0,0 +1,387 @@
1
+ Introduction
2
+ ============
3
+
4
+ There is a lot of software (both Rails related and unrelated) which relies on
5
+ servers or daemons. To name a few, in no particular order:
6
+
7
+ * [Ultrasphinx](http://blog.evanweaver.com/files/doc/fauna/ultrasphinx/), a
8
+ Rails library for full-text searching. It makes use of the [Sphinx search
9
+ software](http://www.sphinxsearch.com/) for indexing and searching. Indexing
10
+ is done by running a command, while searching is done by querying the Sphinx
11
+ search server.
12
+ * [acts_as_ferret](http://projects.jkraemer.net/acts_as_ferret/wiki), another
13
+ Rails library for full-text searching. It uses the Ferret search software.
14
+ On production environments, it relies on the Ferret DRB server for both
15
+ searching and indexing.
16
+ * [BackgrounDRb](http://backgroundrb.rubyforge.org/), a Ruby job server and
17
+ scheduler. Scheduling is done by contacting the BackgrounDRb daemon.
18
+ * [mongrel_cluster](http://mongrel.rubyforge.org/wiki/MongrelCluster), which
19
+ starts and stops multiple Mongrel daemons.
20
+
21
+ Relying on daemons is quite common, but not without problems. Let's go over
22
+ some of them.
23
+
24
+ ### Starting daemons is a hassle
25
+
26
+ If you've used similar software, then you might agree that managing these
27
+ daemons is a hassle. If you're using BackgrounDRb, then the daemon must be
28
+ running. Starting the daemon is not hard, but it is annoying. It's also
29
+ possible that the system administrator forgets to start the daemon. While
30
+ configuring the system to automatically start a daemon at startup is not hard,
31
+ it is an extra thing to do, and thus a hassle. We thought, why can't such
32
+ daemons be automatically started? Indeed, this won't be possible if the daemon
33
+ is to be run on a remote machine. But in by far the majority of use cases, the
34
+ daemon runs on the same host as the Rails application. If a Rails application -
35
+ or indeed, <em>any</em> application - is configured to contact a daemon on the
36
+ local host, then why not start the daemon automatically on demand?
37
+
38
+ ### Daemon starting code may not be robust or efficient
39
+
40
+ We've also observed that people write daemon controlling code over and over
41
+ again. Consider for example UltraSphinx, which provides a
42
+ `rake sphinx:daemon:start` Rake task to start the daemon. The time that a
43
+ daemon needs to initialize is variable, and depends on things such as the
44
+ current system load. The Sphinx daemon usually needs less than a second before
45
+ we can connect to it. However, the way different software handles starting of a
46
+ daemon varies. We've observed that waiting a fixed amount of time is by far the
47
+ most common way. For example, UltraSphinx's daemon starting code looks like
48
+ this:
49
+
50
+     system "searchd --config '#{Ultrasphinx::CONF_PATH}'"
51
+     sleep(4) # give daemon a chance to write the pid file
52
+     if ultrasphinx_daemon_running?
53
+       say "started successfully"
54
+     else
55
+       say "failed to start"
56
+     end
57
+
58
+ This is in no way a slam against UltraSphinx. However, if the daemon starts in
59
+ 200 milliseconds, then the user who issued the start command will be waiting for
60
+ 3.8 seconds for no good reason. This is not good for usability or for the
61
+ user's patience.
62
+
63
+ ### Startup error handling
64
+
65
+ Different software handles daemon startup errors in different ways. Some might
66
+ not even handle errors at all. For example, consider 'mongrel_cluster'. If
67
+ there's a typo in one of your application source files, then 'mongrel_cluster'
68
+ will not report the error. Instead, you have to check its log files to see what
69
+ happened. This is not good for usability: many people will be wondering why
70
+ they can't connect to their Mongrel ports after issuing a
71
+ `mongrel_rails cluster::start` - until they realize that they should read the
72
+ log file. But the thing is, not everybody realizes this. And typing in an extra
73
+ command to read the log file to check whether Mongrel started correctly, is
74
+ just a big hassle. Why can't the daemon startup code report such errors
75
+ immediately?
76
+
77
+ ### Stale or corrupt PID files
78
+
79
+ Suppose that you're running a Mongrel cluster, and your server suddenly powers
80
+ off because of a power outage. When the server is online again, it fails to
81
+ start your Mongrel cluster because the PID file that it had written still
82
+ exists, and wasn't cleaned up properly (it's supposed to be cleaned up when
83
+ Mongrel exits). mongrel_cluster provides the `--clean` option to check whether
84
+ the PID file is *stale*, and will automatically clean it up if it is. But not
85
+ all daemon controlling software supports this. Why can't all software check for
86
+ stale PID files automatically?
87
+
88
+
89
+ Implementation problems
90
+ =======================
91
+
92
+ From the problem descriptions, it becomes apparent that our wishlist is as
93
+ follows. Why are these items often not implemented? Let's go over them.
94
+
95
+ - **A daemon should be automatically started on demand, instead of requiring the user to manually start it.**
96
+
97
+ The most obvious problems are related to concurrency. Suppose that your web
98
+ application has a search box, and you want to start the search daemon if it
99
+ isn't already started, then connect to it. Two problems will arise:
100
+
101
+ * Suppose that Rails process A is still starting the daemon. At the same
102
+ time, another visitor tries to search something, and Rails process B
103
+ notices that the daemon is not running. If B tries to start the daemon
104
+ while it's already being started by A, then things can go wrong.
105
+ *A robust daemon starter must ensure that only one process at the same time may start the daemon.*
106
+ * It's not a good idea to wait a fixed amount of time for the daemon to
107
+ start, because you don't know in advance how long it will take for it to
108
+ start. For example, if you wait 2 seconds, then try to connect to the
109
+ daemon, and the daemon isn't done initializing yet, then it will seem as
110
+ if the daemon failed to start.
111
+
112
+ These are the most probable reasons why people don't try to write
113
+ auto-starting code, and instead require the user to start the daemon
114
+ manually.
115
+
116
+ These problems, as well as several less obvious problems, are closely
117
+ related to the next few points.
118
+
119
+ - **The daemon starter must wait until the daemon is done initializing, no longer and no shorter**
120
+
121
+ Because only after the daemon is fully initialized, is it safe to connect
122
+ to it. And because the user should not have to wait longer than he really
123
+ has to. During startup, the daemon will have to be continuously checked
124
+ whether it's done initializing or whether an error occurred. Writing this
125
+ code can be quite a hassle, which is why most people don't do it.
126
+
127
+ - **The daemon starter must report any startup errors**
128
+
129
+ If the daemon starting command - e.g. `searchd -c config_file.conf`,
130
+ `apachectl start` or `mongrel_rails cluster::start` - reports startup
131
+ errors, then all is fine as long as the user is starting the command from a
132
+ terminal. A problem occurs when the error occurs after the daemon has
133
+ already gone into the background. Such errors are only reported to the log
134
+ file.
135
+ *The daemon starter should also check the log file for any startup errors.*
136
+
137
+ Furthermore, it should be able to raise startup errors as exceptions. This
138
+ allows the application to decide what to do with the error. For less
139
+ experienced system administrators, the error might be displayed in the
140
+ browser, allowing the administrators to become aware of the problem without
141
+ forcing them to manually check the log files. Or the error might be emailed
142
+ to a system administrator's email address.
143
+
144
+ - **The daemon starter must be able to correct stale or corrupted PID files**
145
+
146
+ If the PID file is stale, or for some reason has been corrupted, then the
147
+ daemon starter must be able to cope with that.
148
+ *It should check whether the PID file contains a valid PID, and whether the PID exists.*
149
+
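The "wait until the daemon is done initializing, no longer and no shorter" item can be sketched as a polling loop with a deadline. This is only an illustration under stated assumptions: `wait_until_responding` is a hypothetical helper and not part of daemon_controller, and the throwaway TCP server stands in for a real daemon that needs about 0.3 seconds to initialize.

```ruby
require 'socket'

# Hypothetical helper, not part of daemon_controller: calls the given block
# every `interval` seconds until it returns a truthy value or until
# `timeout` seconds have passed. Returns true on success, false on timeout.
def wait_until_responding(timeout, interval = 0.1)
  deadline = Time.now + timeout
  loop do
    begin
      return true if yield
    rescue Errno::ECONNREFUSED
      # The daemon is not accepting connections yet; keep polling.
    end
    return false if Time.now >= deadline
    sleep(interval)
  end
end

# Demo: let the OS pick a free port, then start listening on it after a
# simulated initialization delay.
probe = TCPServer.new('127.0.0.1', 0)
port = probe.addr[1]
probe.close

server = nil
starter = Thread.new do
  sleep(0.3)
  server = TCPServer.new('127.0.0.1', port)
end

responding = wait_until_responding(5) do
  TCPSocket.new('127.0.0.1', port).close
  true
end
starter.join
server.close
puts "daemon responding: #{responding}"
```

Compared with a fixed `sleep`, the caller waits roughly as long as the daemon actually needs, plus at most one polling interval.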
150
+
151
+ Introducing daemon_controller
152
+ =============================
153
+
154
+ *daemon_controller* is a library for managing daemons in a robust manner. It is
155
+ not a tool for managing daemons. Rather, it is a library which lets you write
156
+ applications that manage daemons in a robust manner. For example,
157
+ 'mongrel_cluster' or UltraSphinx may be adapted to utilize this library, for
158
+ more robust daemon management.
159
+
160
+ *daemon_controller* implements all items in the aforementioned wishlist. It
161
+ provides the following functionalities:
162
+
163
+ ### Starting a daemon
164
+
165
+ This ensures that no two processes can start the same daemon at the same time.
166
+ It also reports any startup errors, even errors that occur after the
167
+ daemon has already gone into the background but before it has fully initialized
168
+ yet. It also allows you to set a timeout, and will try to abort the daemon if
169
+ it takes too long to initialize.
170
+
171
+ The start function won't return until the daemon has been fully initialized,
172
+ and is responding to connections. So if the start function has returned, then
173
+ the daemon is guaranteed to be usable.
174
+
175
+ ### Stopping a daemon
176
+
177
+ It will stop the daemon, but only if it's already running. Any errors
178
+ are reported. If the daemon isn't already running, then it will silently
179
+ succeed. Just like starting a daemon, you can set a timeout for stopping the
180
+ daemon.
181
+
182
+ Like the start function, the stop function won't return until the daemon is no
183
+ longer running. This makes it safe to immediately start the same daemon again
184
+ after having stopped it, without worrying that the previous daemon instance
185
+ hasn't exited yet and might conflict with the newly started daemon instance.
186
+
187
+ ### Connecting to a daemon, starting it if it isn't running
188
+
189
+ Every daemon has to be connected to in a different way. As a developer, you
190
+ tell 'daemon_controller' how to connect to the daemon. It will then attempt to
191
+ do that, and if that fails, it will check whether the daemon is running. If it
192
+ isn't running, then it will automatically start the daemon, and attempt to
193
+ connect to the daemon again. Failures are reported.
194
+
195
+ ### Checking whether a daemon is running
196
+
197
+ This information is retrieved from the PID file. It also checks whether the PID
198
+ file is stale.
199
+
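A stale-PID check of this kind might look roughly like the sketch below. The helper names (`read_pid`, `process_alive?`, `daemon_running?`) are made up for illustration and are not daemon_controller's API. The sketch assumes a Unix-like system where sending signal 0 tests for a process's existence without delivering a signal.

```ruby
require 'tempfile'

# Hypothetical helper: returns the PID stored in `path` as an Integer, or nil
# if the file is missing or contains anything other than digits.
def read_pid(path)
  content = File.read(path).strip
  content =~ /\A\d+\z/ ? content.to_i : nil
rescue Errno::ENOENT
  nil
end

# Hypothetical helper: true if a process with this PID exists.
def process_alive?(pid)
  return false unless pid > 0
  Process.kill(0, pid) # signal 0: existence check only, nothing is delivered
  true
rescue Errno::ESRCH
  false                # no such process, so the PID file is stale
rescue Errno::EPERM
  true                 # the process exists but belongs to another user
end

# A daemon counts as running only if the PID file holds a valid PID and
# that PID refers to a live process.
def daemon_running?(pid_file)
  pid = read_pid(pid_file)
  !pid.nil? && process_alive?(pid)
end

# Demo with a PID file that points at the current process.
Tempfile.create('demo.pid') do |f|
  f.write("#{Process.pid}\n")
  f.flush
  puts "running: #{daemon_running?(f.path)}"
end
```

A PID can be reused by an unrelated process after a reboot, so a check like this reduces false positives but cannot rule them out.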
200
+ ### All failures are reported via exceptions
201
+
202
+ So that you can exactly determine how you want to handle errors.
203
+
204
+ ### Lots and lots of error checking
205
+
206
+ So that there are very few ways in which the system can screw up.
207
+
208
+ daemon_controller's goal is to make daemon management less of a hassle, and as
209
+ automatic and straightforward as possible.
210
+
211
+
212
+ Tutorial #1: controlling Apache
213
+ ===============================
214
+
215
+ Suppose that you're a [Phusion Passenger](http://www.modrails.com/) developer,
216
+ and you need to write tests for the Apache module. In particular, you want to
217
+ test whether the different Phusion Passenger configuration directives are
218
+ working as expected. Obviously, to test the Apache module, the Apache web
219
+ server must be running. For every test, you will want the unit test suite to:
220
+
221
+ 1. Write an Apache configuration file, with the relevant configuration
222
+ directive set to a specific value.
223
+ 2. Start Apache.
224
+ 3. Send an HTTP request to Apache and check whether the HTTP response matches
225
+ your expectations.
226
+ 4. Stop Apache.
227
+
228
+ That can be done with the following code:
229
+
230
+     require 'daemon_controller'
231
+
232
+     File.open("apache.conf", "w") do |f|
233
+       f.write("PidFile apache.pid\n")
234
+       f.write("ErrorLog apache.log\n")
235
+       f.write("Listen 1234\n")
236
+       # ... write other relevant configuration options ...
237
+     end
238
+
239
+     controller = DaemonController.new(
240
+       :identifier => 'Apache web server',
241
+       :start_command => 'apachectl -f apache.conf -k start',
242
+       :ping_command => lambda { TCPSocket.new('localhost', 1234) },
243
+       :pid_file => 'apache.pid',
244
+       :log_file => 'apache.log',
245
+       :start_timeout => 25
246
+     )
247
+     controller.start
248
+
249
+     # .... apache is now started ....
250
+     # .... some test code here ....
251
+
252
+     controller.stop
253
+
254
+ The `File.open` line is obvious: it writes the relevant Apache configuration
255
+ file.
256
+
257
+ The next line is for creating a new DaemonController object. We pass a
258
+ human-readable identifier for this daemon ("Apache web server") to the
259
+ constructor. This is used for generating friendlier error messages.
260
+ We also tell it how Apache is supposed to be started (`:start_command`), how to
261
+ check whether it can be connected to (`:ping_command`), and where its PID file
262
+ and log file are. If Apache failed with an error during startup, then it will be
263
+ reported. If Apache failed with an error after it has gone into the background,
264
+ then that will be reported too: the given log file is monitored for new error
265
+ messages.
266
+ Finally, a timeout of 25 seconds is given. If Apache doesn't start within 25
267
+ seconds, then an exception will be raised.
268
+
269
+ The ping command is just a `Proc` which returns true or false. If the Proc
270
+ raises `Errno::ECONNREFUSED`, then that's also interpreted by DaemonController
271
+ as meaning that the daemon isn't responding yet.
272
+
273
+ After `controller.start` has returned, we can continue with the test case. At
274
+ this point, we know that Apache is done initializing.
275
+ When we're done with Apache, we stop it with `controller.stop`. This does not
276
+ return until Apache has fully stopped.
277
+
278
+ The cautious reader might notice that the socket returned by the ping command
279
+ is never closed. That's true, because DaemonController will close it
280
+ automatically for us, if it notices that the ping command proc's return value
281
+ responds to `#close`.
282
+
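The way a ping proc's result is interpreted can be illustrated with a small stand-alone sketch. `daemon_responding?` is a hypothetical function that only mirrors the behavior described above (a truthy result means success, connection errors mean "not responding yet", closable results are closed). It is not the library's own code.

```ruby
require 'socket'

# Hypothetical sketch of interpreting a ping proc's result.
def daemon_responding?(ping)
  connect_errors = [Errno::ECONNREFUSED, Errno::ENETUNREACH,
                    Errno::ETIMEDOUT, Errno::ECONNRESET]
  begin
    value = ping.call
    # Close sockets and other closable results so the caller doesn't have
    # to; errors raised while closing are ignored.
    if value.respond_to?(:close)
      value.close rescue nil
    end
    value ? true : false
  rescue *connect_errors
    false
  end
end

# Demo: ping a throwaway local server, then ping again after shutting it down.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
ping = lambda { TCPSocket.new('127.0.0.1', port) }
puts "while listening: #{daemon_responding?(ping)}"
server.close
puts "after shutdown:  #{daemon_responding?(ping)}"
```

On a typical system the first ping should succeed and the second should be refused, which the sketch reports as "not responding" rather than raising an error.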
283
+ From this example, it becomes apparent that for daemon_controller to work, you
284
+ must know how to start the daemon, how to contact the daemon, and you must know
285
+ where it will put its PID file and log file.
286
+
287
+
288
+ Tutorial #2: Sphinx indexing and search server management
289
+ =========================================================
290
+
291
+ We at Phusion are currently developing a web application with full-text search
292
+ capabilities, and we're using Sphinx for this purpose. We want to make the
293
+ lives of our developers and our system administrators as easy as possible, so
294
+ that there's little room for human screw-up, and so we've developed this
295
+ library. Our Sphinx search daemon is completely managed through this library
296
+ and is automatically started on demand.
297
+
298
+ Our Sphinx config file is generated from an ERB template. This ERB template
299
+ writes different values in the config file, depending on whether we're in
300
+ development, test or production mode. We will want to regenerate this config
301
+ file every time, just before we start the search daemon.
302
+ But there's more. The search daemon will fail if there is no search index. If a
303
+ new developer has just checked out the application's source code, then there is
304
+ no search index yet. We don't want him to go through the pain of having to
305
+ generate the index manually. (That said, it isn't that much of a pain, but it's
306
+ just yet-another-thing to do, which can and should be automated.) So before
307
+ starting the daemon, we will also want to check whether the index exists. If
308
+ not, then we'll generate it, and then start the daemon. Of course, no two Rails
309
+ processes may generate the config file or the index at the same time.
310
+
311
+ When querying the search server, we will want to automatically start it if it
312
+ isn't running.
313
+
314
+ This can be achieved with the following code:
315
+
316
+     require 'daemon_controller'
317
+
318
+     class SearchServer
319
+       SEARCH_SERVER_PORT = 1234
320
+
321
+       def initialize
322
+         @controller = DaemonController.new(
323
+           :identifier => 'Sphinx search server',
324
+           :start_command => "searchd -c config/sphinx.conf",
325
+           :before_start => method(:before_start),
326
+           :ping_command => lambda { TCPSocket.new('localhost', SEARCH_SERVER_PORT) },
327
+           :pid_file => 'tmp/pids/sphinx.pid',
328
+           :log_file => 'log/sphinx.log')
329
+       end
330
+
331
+       def query(search_terms)
332
+         socket = @controller.connect do
333
+           TCPSocket.new('localhost', SEARCH_SERVER_PORT)
334
+         end
335
+         send_query(socket, search_terms)
336
+         return retrieve_results(socket)
337
+       end
338
+
339
+       private
340
+       def before_start
341
+         generate_configuration_file
342
+         if !index_exists?
343
+           generate_index
344
+         end
345
+       end
346
+
347
+       # ...
348
+     end
349
+
350
+ Notice the `:before_start` option. We pass a block of code which is to be run,
351
+ just before the daemon is started. This block, along with starting the daemon,
352
+ is completely serialized. That is, if you're inside the block, then it's
353
+ guaranteed that no other process is running this block at the same time as well.
354
+
355
+ The `#query` method is the method for querying the search server with search
356
+ terms. It returns a list of results. It uses `DaemonController#connect`: one
357
+ passes a block to that method, which contains code for connecting to the
358
+ daemon. If the block returns nil, or if it raises `Errno::ECONNREFUSED`, then
359
+ `DaemonController#connect` will automatically take care of auto-starting the
360
+ Sphinx daemon for us.
361
+
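The connect-then-start flow can be sketched in a few lines. This is a simplified illustration and not daemon_controller's implementation: `connect_or_start` and `attempt_connection` are hypothetical names, locking is left out for brevity, and `start_daemon` is assumed to return only once the daemon is ready.

```ruby
# Hypothetical helper: runs the connect block and treats a refused
# connection the same as a nil result.
def attempt_connection
  yield
rescue Errno::ECONNREFUSED
  nil
end

# Hypothetical sketch: try to connect; if that fails, start the daemon and
# try exactly once more.
def connect_or_start(start_daemon, &connector)
  connection = attempt_connection(&connector)
  if connection.nil?
    start_daemon.call
    connection = attempt_connection(&connector)
    if connection.nil?
      raise "cannot connect to the daemon, even after starting it"
    end
  end
  connection
end

# Demo with a fake daemon: the "connection" only succeeds once the start
# callable has run.
daemon_up = false
connection = connect_or_start(lambda { daemon_up = true }) do
  daemon_up ? :fake_connection : nil
end
puts connection.inspect
```

In the real library both attempts run under the lock file, so that two processes cannot start the daemon at the same time.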
362
+
363
+ A little bit of history
364
+ =======================
365
+
366
+ The issue of managing daemons has been a thorn in our eyes for quite some time
367
+ now. Until now, we've solved this problem by equipping any daemons that we
368
+ write with the ability to gracefully handle being concurrently started, the
369
+ ability to initialize as much as possible *before* forking into the background,
370
+ etc. However, equipping all this robustness into our code over and over is a
371
+ lot of work. We've considered documenting a standard behavior for daemons so
372
+ that they can properly support auto-starting and such.
373
+
374
+ However, we've recently realized that that's probably a futile effort.
375
+ Convincing everybody to write a lot of code for a bit more robustness is
376
+ probably not realistic. So we took the pragmatic approach and developed a
377
+ library which adds more robustness on top of daemons' existing behavior. And
378
+ thus, daemon_controller was born. It is a little bit less efficient compared to
379
+ when the daemon is designed from the beginning with such abilities in mind, but
380
+ it's compatible with virtually all daemons, and is easy to use.
381
+
382
+
383
+ API documentation
384
+ =================
385
+
386
+ Detailed API documentation is available in the form of inline comments in
387
+ `lib/daemon_controller.rb`.
data/daemon_controller.gemspec CHANGED
@@ -1,17 +1,18 @@
1
1
  Gem::Specification.new do |s|
2
2
  s.name = "daemon_controller"
3
- s.version = "0.1.0"
3
+ s.version = "0.2.0"
4
4
  s.date = "2008-08-21"
5
5
  s.summary = "A library for implementing daemon management capabilities"
6
6
  s.email = "hongli@phusion.nl"
7
7
  s.homepage = "http://github.com/FooBarWidget/daemon_controller/tree/master"
8
- s.description = "A library for implementing daemon management capabilities."
9
- s.has_rdoc = false
8
+ s.description = "A library for robust daemon management."
9
+ s.has_rdoc = true
10
10
  s.authors = ["Hongli Lai"]
11
11
 
12
12
  s.files = [
13
- "README.rdoc", "LICENSE.txt", "daemon_controller.gemspec",
13
+ "README.markdown", "LICENSE.txt", "daemon_controller.gemspec",
14
14
  "lib/daemon_controller.rb",
15
+ "lib/daemon_controller/lock_file.rb",
15
16
  "spec/daemon_controller_spec.rb",
16
17
  "spec/echo_server.rb"
17
18
  ]
data/lib/daemon_controller/lock_file.rb ADDED
@@ -0,0 +1,48 @@
1
+ # daemon_controller, library for robust daemon management
2
+ # Copyright (c) 2008 Phusion
3
+ #
4
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
5
+ # of this software and associated documentation files (the "Software"), to deal
6
+ # in the Software without restriction, including without limitation the rights
7
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ # copies of the Software, and to permit persons to whom the Software is
9
+ # furnished to do so, subject to the following conditions:
10
+ #
11
+ # The above copyright notice and this permission notice shall be included in
12
+ # all copies or substantial portions of the Software.
13
+ #
14
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
20
+ # THE SOFTWARE.
21
+
22
+ class DaemonController
23
+ class LockFile
24
+ def initialize(filename)
25
+ @filename = filename
26
+ end
27
+
28
+ def exclusive_lock
29
+ File.open(@filename, 'w') do |f|
30
+ if Fcntl.const_defined? :F_SETFD
31
+ f.fcntl(Fcntl::F_SETFD, Fcntl::FD_CLOEXEC)
32
+ end
33
+ f.flock(File::LOCK_EX)
34
+ yield
35
+ end
36
+ end
37
+
38
+ def shared_lock
39
+ File.open(@filename, 'w') do |f|
40
+ if Fcntl.const_defined? :F_SETFD
41
+ f.fcntl(Fcntl::F_SETFD, Fcntl::FD_CLOEXEC)
42
+ end
43
+ f.flock(File::LOCK_SH)
44
+ yield
45
+ end
46
+ end
47
+ end # class LockFile
48
+ end # class DaemonController
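The effect of `flock` with an exclusive lock can be observed with a short experiment. The sketch below is not part of the gem: `try_exclusive_lock` is a hypothetical helper that uses `File::LOCK_NB` so that a conflicting lock is reported instead of blocking. It assumes a platform with native `flock` semantics, where two separate opens of the same file conflict even within one process.

```ruby
require 'tempfile'

# Hypothetical helper: tries to take an exclusive lock on `path` without
# blocking. Runs the block and returns true if the lock was obtained,
# returns false if somebody else holds it.
def try_exclusive_lock(path)
  File.open(path, 'w') do |f|
    # With LOCK_NB, flock returns false instead of waiting for the lock.
    if f.flock(File::LOCK_EX | File::LOCK_NB)
      yield
      true
    else
      false
    end
  end # closing the file releases the lock
end

Tempfile.create('demo.lock') do |tmp|
  try_exclusive_lock(tmp.path) do
    # While the outer lock is held, a second attempt is refused.
    nested = try_exclusive_lock(tmp.path) { }
    puts "second lock obtained while first is held: #{nested}"
  end
  # Once the outer block has finished, the lock is free again.
  later = try_exclusive_lock(tmp.path) { }
  puts "lock obtained after release: #{later}"
end
```

`LockFile#exclusive_lock` omits `LOCK_NB`, so a second caller simply waits until the first one is done, which is what serializes daemon startup.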
data/lib/daemon_controller.rb CHANGED
@@ -1,20 +1,33 @@
1
- # Basic functionality for a single, local, external daemon:
2
- # - starting daemon
3
- # * must be concurrency-safe!
4
- # * must be able to report startup errors!
5
- # * returns when daemon is fully operational
6
- # - stopping daemon
7
- # * must be concurrency-safe!
8
- # * returns when daemon has exited
9
- # - querying the status of a daemon
10
- # * querying the status of a daemon (i.e. whether it's running)
11
- # - connect to a daemon, and start it if it isn't already running
12
- # * must be a single atomic action
1
+ # daemon_controller, library for robust daemon management
2
+ # Copyright (c) 2008 Phusion
3
+ #
4
+ # Permission is hereby granted, free of charge, to any person obtaining a copy
5
+ # of this software and associated documentation files (the "Software"), to deal
6
+ # in the Software without restriction, including without limitation the rights
7
+ # to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
8
+ # copies of the Software, and to permit persons to whom the Software is
9
+ # furnished to do so, subject to the following conditions:
10
+ #
11
+ # The above copyright notice and this permission notice shall be included in
12
+ # all copies or substantial portions of the Software.
13
+ #
14
+ # THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
15
+ # IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
16
+ # FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
17
+ # AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
18
+ # LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
19
+ # OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
20
+ # THE SOFTWARE.
13
21
 
14
22
  require 'tempfile'
15
23
  require 'fcntl'
24
+ require File.expand_path(File.dirname(__FILE__) << '/daemon_controller/lock_file')
16
25
 
26
+ # Main daemon controller object. See the README for an introduction and tutorial.
17
27
  class DaemonController
28
+ ALLOWED_CONNECT_EXCEPTIONS = [Errno::ECONNREFUSED, Errno::ENETUNREACH,
29
+ Errno::ETIMEDOUT, Errno::ECONNRESET]
30
+
18
31
  class Error < StandardError
19
32
  end
20
33
  class TimeoutError < Error
@@ -57,6 +70,15 @@ class DaemonController
57
70
  #
58
71
  # The value may also be a Proc, which returns an expression that evaluates to
59
72
  # true (indicating that the daemon can be connected to) or false (failure).
73
+ # If the Proc raises Errno::ECONNREFUSED, Errno::ENETUNREACH, Errno::ETIMEDOUT
74
+ # or Errno::ECONNRESET, then that also means that the daemon cannot be connected
75
+ # to.
76
+ # <b>NOTE:</b> if the ping command returns an object which responds to
77
+ # <tt>#close</tt>, then that method will be called on the return value.
78
+ # This makes it possible to specify a ping command such as
79
+ # <tt>lambda { TCPSocket.new('localhost', 1234) }</tt>, without having to worry
80
+ # about closing it afterwards.
81
+ # Any exceptions raised by #close are ignored.
60
82
  #
61
83
  # [:pid_file]
62
84
  # The PID file that the daemon will write to. Used to check whether the daemon
@@ -73,6 +95,10 @@ class DaemonController
73
95
  # by killing the PID written in the PID file.
74
96
  #
75
97
  # The default value is +nil+.
98
+ #
99
+ # [:before_start]
100
+ # This may be a Proc. It will be called just before running the start command.
101
+ # The before_start proc is not subject to the start timeout.
76
102
  #
77
103
  # [:start_timeout]
78
104
  # The maximum amount of time, in seconds, that #start may take to start
@@ -118,6 +144,7 @@ class DaemonController
118
144
  @ping_interval = options[:ping_interval] || 0.1
119
145
  @pid_file = options[:pid_file]
120
146
  @log_file = options[:log_file]
147
+ @before_start = options[:before_start]
121
148
  @start_timeout = options[:start_timeout] || 15
122
149
  @stop_timeout = options[:stop_timeout] || 15
123
150
  @log_file_activity_timeout = options[:log_file_activity_timeout] || 7
@@ -132,7 +159,7 @@ class DaemonController
132
159
  # - StartTimeout - the daemon did not start in time. This could also
133
160
  # mean that the daemon failed after it has gone into the background.
134
161
  def start
135
- exclusive_lock do
162
+ @lock_file.exclusive_lock do
136
163
  start_without_locking
137
164
  end
138
165
  end
@@ -142,8 +169,8 @@ class DaemonController
142
169
  # started.
143
170
  #
144
171
  # The block must return nil or raise Errno::ECONNREFUSED, Errno::ENETUNREACH,
145
- # or Errno::ETIMEDOUT to indicate that the daemon cannot be connected to.
146
- # It must return non-nil if the daemon can be connected to.
172
+ # Errno::ETIMEDOUT or Errno::ECONNRESET to indicate that the daemon cannot be
173
+ # connected to. It must return non-nil if the daemon can be connected to.
147
174
  # Upon successful connection, the return value of the block will
148
175
  # be returned by #connect.
149
176
  #
@@ -158,21 +185,21 @@ class DaemonController
158
185
  # to the daemon even after starting it.
159
186
  def connect
160
187
  connection = nil
161
- shared_lock do
188
+ @lock_file.shared_lock do
162
189
  begin
163
190
  connection = yield
164
- rescue Errno::ECONNREFUSED, Errno::ENETUNREACH, Errno::ETIMEDOUT
191
+ rescue *ALLOWED_CONNECT_EXCEPTIONS
165
192
  connection = nil
166
193
  end
167
194
  end
168
195
  if connection.nil?
169
- exclusive_lock do
196
+ @lock_file.exclusive_lock do
170
197
  if !daemon_is_running?
171
198
  start_without_locking
172
199
  end
173
200
  begin
174
201
  connection = yield
175
- rescue Errno::ECONNREFUSED, Errno::ENETUNREACH, Errno::ETIMEDOUT
202
+ rescue *ALLOWED_CONNECT_EXCEPTIONS
176
203
  connection = nil
177
204
  end
178
205
  if connection.nil?
@@ -197,7 +224,7 @@ class DaemonController
197
224
  # - StopError - the stop command failed.
198
225
  # - StopTimeout - the daemon didn't stop in time.
199
226
  def stop
200
- exclusive_lock do
227
+ @lock_file.exclusive_lock do
201
228
  begin
202
229
  Timeout.timeout(@stop_timeout) do
203
230
  kill_daemon
@@ -211,14 +238,16 @@ class DaemonController
211
238
  end
212
239
  end
213
240
 
214
- # Returns the daemon's PID, as reported by its PID file.
241
+ # Returns the daemon's PID, as reported by its PID file. Returns the PID
242
+ # as an integer, or nil if there is no valid PID in the PID file.
243
+ #
215
244
  # This method doesn't check whether the daemon's actually running.
216
245
  # Use #running? if you want to check whether it's actually running.
217
246
  #
218
247
  # Raises SystemCallError or IOError if something went wrong during
219
248
  # reading of the PID file.
220
249
  def pid
221
- shared_lock do
250
+ @lock_file.shared_lock do
222
251
  return read_pid_file
223
252
  end
224
253
  end
@@ -230,32 +259,12 @@ class DaemonController
230
259
  # Raises SystemCallError or IOError if something went wrong during
231
260
  # reading of the PID file.
232
261
  def running?
233
- shared_lock do
262
+ @lock_file.shared_lock do
234
263
  return daemon_is_running?
235
264
  end
236
265
  end
237
266
 
238
267
  private
239
- def exclusive_lock
240
- File.open(@lock_file, 'w') do |f|
241
- if Fcntl.const_defined? :F_SETFD
242
- f.fcntl(Fcntl::F_SETFD, Fcntl::FD_CLOEXEC)
243
- end
244
- f.flock(File::LOCK_EX)
245
- yield
246
- end
247
- end
248
-
249
- def shared_lock
250
- File.open(@lock_file, 'w') do |f|
251
- if Fcntl.const_defined? :F_SETFD
252
- f.fcntl(Fcntl::F_SETFD, Fcntl::FD_CLOEXEC)
253
- end
254
- f.flock(File::LOCK_SH)
255
- yield
256
- end
257
- end
258
-
259
268
  def start_without_locking
260
269
  if daemon_is_running?
261
270
  raise AlreadyStarted, "Daemon '#{@identifier}' is already started"
@@ -264,6 +273,7 @@ private
264
273
  delete_pid_file
265
274
  begin
266
275
  started = false
276
+ before_start
267
277
  Timeout.timeout(@start_timeout) do
268
278
  done = false
269
279
  spawn_daemon
@@ -305,14 +315,22 @@ private
305
315
  result = :timeout
306
316
  end
307
317
  if !result
308
- raise StartError, differences_in_log_file
318
+ raise(StartError, differences_in_log_file ||
319
+ "Daemon '#{@identifier}' failed to start.")
309
320
  elsif result == :timeout
310
- raise StartTimeout, differences_in_log_file
321
+ raise(StartTimeout, differences_in_log_file ||
322
+ "Daemon '#{@identifier}' failed to start in time.")
311
323
  else
312
324
  return true
313
325
  end
314
326
  end
315
327
 
328
+ def before_start
329
+ if @before_start
330
+ @before_start.call
331
+ end
332
+ end
333
+
316
334
  def spawn_daemon
317
335
  run_command(@start_command)
318
336
  end
@@ -330,7 +348,10 @@ private
330
348
  end
331
349
 
332
350
  def kill_daemon_with_signal
333
- Process.kill('SIGTERM', read_pid_file)
351
+ pid = read_pid_file
352
+ if pid
353
+ Process.kill('SIGTERM', pid)
354
+ end
334
355
  rescue Errno::ESRCH, Errno::ENOENT
335
356
  end
336
357
 
@@ -354,7 +375,12 @@ private
354
375
  end
355
376
 
356
377
  def read_pid_file
357
- return File.read(@pid_file).strip.to_i
378
+ pid = File.read(@pid_file).strip
379
+ if pid =~ /\A\d+\Z/
380
+ return pid.to_i
381
+ else
382
+ return nil
383
+ end
358
384
  end
359
385
 
360
386
  def delete_pid_file
@@ -437,7 +463,12 @@ private
437
463
  if @original_log_file_stat
438
464
  File.open(@log_file, 'r') do |f|
439
465
  f.seek(@original_log_file_stat.size, IO::SEEK_SET)
440
- return f.read.strip
466
+ diff = f.read.strip
467
+ if diff.empty?
468
+ return nil
469
+ else
470
+ return diff
471
+ end
441
472
  end
442
473
  else
443
474
  return nil
@@ -447,7 +478,7 @@ private
447
478
  end
448
479
 
449
480
  def determine_lock_file(identifier, pid_file)
450
- return File.expand_path(pid_file + ".lock")
481
+ return LockFile.new(File.expand_path(pid_file + ".lock"))
451
482
  end
452
483
 
453
484
  def self.fork_supported?
@@ -498,7 +529,15 @@ private
498
529
 
499
530
  def run_ping_command
500
531
  if @ping_command.respond_to?(:call)
501
- return @ping_command.call
532
+ begin
533
+ value = @ping_command.call
534
+ if value.respond_to?(:close)
535
+ value.close rescue nil
536
+ end
537
+ return value
538
+ rescue *ALLOWED_CONNECT_EXCEPTIONS
539
+ return false
540
+ end
502
541
  else
503
542
  return system(@ping_command)
504
543
  end
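The `read_pid_file` and `kill_daemon_with_signal` hunks above work together. In Ruby, `String#to_i` returns 0 for an empty or non-numeric string, and `Process.kill` with a PID of 0 targets the caller's own process group, so the old code could signal the wrong processes when the PID file was empty or corrupt. The following is a minimal sketch of the same validate-then-signal pattern. The helpers `parse_pid` and `terminate` are hypothetical names used for illustration only and are not part of daemon_controller.

```ruby
# Hypothetical helper (not part of the library): mirrors the validation
# added to read_pid_file. Returns an Integer, or nil when the contents
# are not made up purely of digits.
def parse_pid(contents)
  text = contents.strip
  if text =~ /\A\d+\Z/
    text.to_i
  else
    nil
  end
end

# Hypothetical helper: only signals when a valid PID was parsed.
# Returns true if a signal was sent, false otherwise.
def terminate(pid)
  return false if pid.nil?
  Process.kill('SIGTERM', pid)
  true
rescue Errno::ESRCH
  # The process no longer exists; treat this as "nothing to do".
  false
end

parse_pid("1234\n")  # an Integer
parse_pid("")        # nil, so terminate(nil) sends no signal
```

Returning `nil` instead of 0 forces callers to handle the "no valid PID" case explicitly, which is what the `if pid` guard in `kill_daemon_with_signal` does.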
@@ -156,6 +156,7 @@ describe DaemonController, "#start" do
156
156
  max_start_timeout = 6
157
157
  else
158
158
  start_timeout = 0.15
159
+ min_start_timeout = 0.15
159
160
  max_start_timeout = 0.30
160
161
  end
161
162
  new_controller(:start_command => 'sleep 2', :start_timeout => start_timeout)
@@ -298,3 +299,46 @@ describe DaemonController, "#connect" do
298
299
  end
299
300
  end
300
301
 
302
+ describe DaemonController do
303
+ include TestHelpers
304
+
305
+ specify "if the ping command is a block that raises Errno::ECONNREFUSED, then that's " <<
306
+ "an indication that the daemon cannot be connected to" do
307
+ new_controller(:ping_command => lambda do
308
+ raise Errno::ECONNREFUSED, "dummy"
309
+ end)
310
+ @controller.send(:run_ping_command).should be_false
311
+ end
312
+
313
+ specify "if the ping command is a block that returns an object that responds to #close, " <<
314
+ "then the close method will be called on that object" do
315
+ server = TCPServer.new('localhost', 8278)
316
+ begin
317
+ socket = nil
318
+ new_controller(:ping_command => lambda do
319
+ socket = TCPSocket.new('localhost', 8278)
320
+ end)
321
+ @controller.send(:run_ping_command)
322
+ socket.should be_closed
323
+ ensure
324
+ server.close
325
+ end
326
+ end
327
+
328
+ specify "if the ping command is a block that returns an object that responds to #close, " <<
329
+ "and #close raises an exception, then that exception is ignored" do
330
+ server = TCPServer.new('localhost', 8278)
331
+ begin
332
+ o = Object.new
333
+ o.should_receive(:close).and_return do
334
+ raise StandardError, "foo"
335
+ end
336
+ new_controller(:ping_command => lambda do
337
+ o
338
+ end)
339
+ lambda { @controller.send(:run_ping_command) }.should_not raise_error(StandardError)
340
+ ensure
341
+ server.close
342
+ end
343
+ end
344
+ end
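The three specs above describe how a callable ping command is treated: a connection error means "not reachable", a returned object that responds to `#close` is closed, and an exception raised by `#close` is swallowed. Below is a minimal standalone sketch of that behavior. The method name `run_ping` and the constant `CONNECT_ERRORS` are assumptions made for illustration; the contents of the library's `ALLOWED_CONNECT_EXCEPTIONS` are not shown in this diff, so the list here is only a plausible stand-in.

```ruby
# Assumption: a stand-in for ALLOWED_CONNECT_EXCEPTIONS, whose real
# contents are not visible in this diff.
CONNECT_ERRORS = [Errno::ECONNREFUSED, Errno::ETIMEDOUT].freeze

# Hypothetical standalone version of the callable branch of
# run_ping_command.
def run_ping(callable)
  value = callable.call
  if value.respond_to?(:close)
    begin
      value.close
    rescue StandardError
      # A failing #close must not turn a successful ping into an error.
      nil
    end
  end
  value
rescue *CONNECT_ERRORS
  # The daemon could not be connected to.
  false
end

# Usage: a ping block that fails to connect reports false.
run_ping(lambda { raise Errno::ECONNREFUSED })
```

Closing the returned object lets a ping block simply return a freshly opened socket without leaking a file descriptor on every poll.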
metadata CHANGED
@@ -1,7 +1,7 @@
1
1
  --- !ruby/object:Gem::Specification
2
2
  name: FooBarWidget-daemon_controller
3
3
  version: !ruby/object:Gem::Version
4
- version: 0.1.0
4
+ version: 0.2.0
5
5
  platform: ruby
6
6
  authors:
7
7
  - Hongli Lai
@@ -13,7 +13,7 @@ date: 2008-08-21 00:00:00 -07:00
13
13
  default_executable:
14
14
  dependencies: []
15
15
 
16
- description: A library for implementing daemon management capabilities.
16
+ description: A library for robust daemon management.
17
17
  email: hongli@phusion.nl
18
18
  executables: []
19
19
 
@@ -22,13 +22,14 @@ extensions: []
22
22
  extra_rdoc_files: []
23
23
 
24
24
  files:
25
- - README.rdoc
25
+ - README.markdown
26
26
  - LICENSE.txt
27
27
  - daemon_controller.gemspec
28
28
  - lib/daemon_controller.rb
29
+ - lib/daemon_controller/lock_file.rb
29
30
  - spec/daemon_controller_spec.rb
30
31
  - spec/echo_server.rb
31
- has_rdoc: false
32
+ has_rdoc: true
32
33
  homepage: http://github.com/FooBarWidget/daemon_controller/tree/master
33
34
  post_install_message:
34
35
  rdoc_options: []
data/README.rdoc DELETED
@@ -1,41 +0,0 @@
1
- = Introduction
2
-
3
- daemon_controller is a library for implementing daemon management capabilities.
4
-
5
- Suppose that you have a Ruby on Rails application which uses the Sphinx search
6
- server [1] for full-text searching capabilities. In order to search the index,
7
- the search daemon (searchd) must be running. Furthermore, you're using the Riddle
8
- library [2] for interfacing with the search daemon.
9
-
10
- You can write this in your application:
11
-
12
- require 'daemon_controller'
13
- require 'riddle'
14
-
15
- controller = DaemonController.new(
16
- :identifier => 'Sphinx search daemon',
17
- :start_command => 'searchd -c config/sphinx.conf',
18
- :ping_command => proc { Riddle::Client.new('localhost', 1234) },
19
- :pid_file => 'tmp/pids/sphinx.pid',
20
- :log_file => 'log/sphinx.log'
21
- )
22
- client = controller.connect do
23
- Riddle::Client.new('localhost', 1234)
24
- end
25
- client.query("some search query...")
26
-
27
- controller.connect will start the Sphinx search daemon if it isn't already
28
- started. Then, it will connect to the Sphinx search daemon by running the
29
- given block.
30
-
31
- Basically you just tell the library how to start the daemon, how to check
32
- whether it's responding to connections, and which PID file and log file it
33
- uses. daemon_controller will automatically take care of things like:
34
-
35
- * concurrency control, e.g. to ensure that no two processes will try to start
36
- the Sphinx search daemon at the same time.
37
- * error handling: if 'searchd' failed to start, then its error message will
38
- be propagated into the exception that will be thrown. This makes it much
39
- easier to handle daemon startup errors in your application. This can also
40
- allow the system administrator to see the error message directly in your
41
- application, instead of having to consult the daemon's log file.