god 0.11.0 → 0.12.0

Files changed (118)
  1. data/Announce.txt +6 -6
  2. data/Gemfile +2 -0
  3. data/History.txt +19 -2
  4. data/{README.txt → LICENSE} +0 -37
  5. data/README.md +31 -0
  6. data/Rakefile +80 -38
  7. data/bin/god +21 -21
  8. data/doc/god.asciidoc +1487 -0
  9. data/doc/intro.asciidoc +20 -0
  10. data/ext/god/extconf.rb +3 -3
  11. data/ext/god/kqueue_handler.c +18 -18
  12. data/ext/god/netlink_handler.c +31 -31
  13. data/god.gemspec +24 -16
  14. data/lib/god.rb +261 -204
  15. data/lib/god/behavior.rb +14 -14
  16. data/lib/god/behaviors/clean_pid_file.rb +5 -5
  17. data/lib/god/behaviors/clean_unix_socket.rb +10 -10
  18. data/lib/god/behaviors/notify_when_flapping.rb +12 -12
  19. data/lib/god/cli/command.rb +59 -46
  20. data/lib/god/cli/run.rb +33 -37
  21. data/lib/god/cli/version.rb +6 -6
  22. data/lib/god/compat19.rb +1 -4
  23. data/lib/god/condition.rb +21 -21
  24. data/lib/god/conditions/always.rb +19 -6
  25. data/lib/god/conditions/complex.rb +18 -18
  26. data/lib/god/conditions/cpu_usage.rb +14 -14
  27. data/lib/god/conditions/degrading_lambda.rb +8 -8
  28. data/lib/god/conditions/disk_usage.rb +5 -5
  29. data/lib/god/conditions/flapping.rb +23 -23
  30. data/lib/god/conditions/http_response_code.rb +35 -19
  31. data/lib/god/conditions/lambda.rb +2 -2
  32. data/lib/god/conditions/memory_usage.rb +13 -13
  33. data/lib/god/conditions/process_exits.rb +14 -20
  34. data/lib/god/conditions/process_running.rb +16 -25
  35. data/lib/god/conditions/socket_responding.rb +132 -0
  36. data/lib/god/conditions/tries.rb +10 -10
  37. data/lib/god/configurable.rb +10 -10
  38. data/lib/god/contact.rb +20 -20
  39. data/lib/god/contacts/email.rb +7 -4
  40. data/lib/god/contacts/jabber.rb +1 -1
  41. data/lib/god/driver.rb +96 -64
  42. data/lib/god/errors.rb +9 -9
  43. data/lib/god/event_handler.rb +19 -19
  44. data/lib/god/event_handlers/dummy_handler.rb +4 -4
  45. data/lib/god/event_handlers/kqueue_handler.rb +3 -3
  46. data/lib/god/event_handlers/netlink_handler.rb +2 -2
  47. data/lib/god/logger.rb +13 -13
  48. data/lib/god/metric.rb +50 -22
  49. data/lib/god/process.rb +53 -52
  50. data/lib/god/registry.rb +7 -7
  51. data/lib/god/simple_logger.rb +14 -14
  52. data/lib/god/socket.rb +11 -11
  53. data/lib/god/sugar.rb +30 -15
  54. data/lib/god/sys_logger.rb +2 -2
  55. data/lib/god/system/portable_poller.rb +8 -8
  56. data/lib/god/system/process.rb +8 -8
  57. data/lib/god/system/slash_proc_poller.rb +13 -13
  58. data/lib/god/task.rb +237 -188
  59. data/lib/god/timeline.rb +5 -5
  60. data/lib/god/trigger.rb +11 -11
  61. data/lib/god/watch.rb +205 -53
  62. data/test/configs/child_events/child_events.god +5 -5
  63. data/test/configs/child_events/simple_server.rb +1 -1
  64. data/test/configs/child_polls/child_polls.god +4 -4
  65. data/test/configs/child_polls/simple_server.rb +4 -4
  66. data/test/configs/complex/complex.god +7 -7
  67. data/test/configs/complex/simple_server.rb +1 -1
  68. data/test/configs/contact/contact.god +1 -1
  69. data/test/configs/contact/simple_server.rb +1 -1
  70. data/test/configs/daemon_events/daemon_events.god +5 -5
  71. data/test/configs/daemon_events/simple_server.rb +1 -1
  72. data/test/configs/daemon_events/simple_server_stop.rb +1 -1
  73. data/test/configs/daemon_polls/daemon_polls.god +3 -3
  74. data/test/configs/daemon_polls/simple_server.rb +1 -1
  75. data/test/configs/degrading_lambda/degrading_lambda.god +3 -3
  76. data/test/configs/keepalive/keepalive.god +9 -0
  77. data/test/configs/keepalive/keepalive.rb +12 -0
  78. data/test/configs/lifecycle/lifecycle.god +2 -2
  79. data/test/configs/matias/matias.god +6 -6
  80. data/test/configs/real.rb +7 -7
  81. data/test/configs/running_load/running_load.god +2 -2
  82. data/test/configs/stop_options/simple_server.rb +1 -1
  83. data/test/configs/stress/simple_server.rb +1 -1
  84. data/test/configs/stress/stress.god +2 -2
  85. data/test/configs/task/task.god +5 -5
  86. data/test/configs/test.rb +7 -7
  87. data/test/helper.rb +8 -8
  88. data/test/test_behavior.rb +3 -3
  89. data/test/test_campfire.rb +1 -2
  90. data/test/test_condition.rb +10 -10
  91. data/test/test_conditions_disk_usage.rb +12 -12
  92. data/test/test_conditions_http_response_code.rb +24 -24
  93. data/test/test_conditions_process_running.rb +7 -7
  94. data/test/test_conditions_socket_responding.rb +122 -0
  95. data/test/test_conditions_tries.rb +12 -12
  96. data/test/test_contact.rb +19 -19
  97. data/test/test_driver.rb +17 -3
  98. data/test/test_event_handler.rb +12 -12
  99. data/test/test_god.rb +195 -117
  100. data/test/test_handlers_kqueue_handler.rb +4 -4
  101. data/test/test_jabber.rb +1 -1
  102. data/test/test_logger.rb +17 -17
  103. data/test/test_metric.rb +16 -16
  104. data/test/test_process.rb +47 -41
  105. data/test/test_prowl.rb +1 -1
  106. data/test/test_registry.rb +2 -2
  107. data/test/test_socket.rb +3 -3
  108. data/test/test_sugar.rb +7 -7
  109. data/test/test_system_portable_poller.rb +1 -1
  110. data/test/test_system_process.rb +5 -5
  111. data/test/test_task.rb +57 -57
  112. data/test/test_timeline.rb +8 -8
  113. data/test/test_trigger.rb +16 -16
  114. data/test/test_watch.rb +69 -62
  115. metadata +182 -69
  116. data/lib/god/dependency_graph.rb +0 -41
  117. data/lib/god/diagnostics.rb +0 -37
  118. data/test/test_dependency_graph.rb +0 -62
@@ -0,0 +1,1487 @@
Installation
------------

The best way to get god is via rubygems:

```terminal
$ [sudo] gem install god
```

Requirements
------------

God currently works only on *Linux (kernel 2.6.15+)*, *BSD*, and *Darwin*
systems. No support for Windows is planned. Event-based conditions on Linux
systems require the `cn` (connector) kernel module to be loaded or compiled
into the kernel, and god must be run as root.

The following systems have been tested. Help us test it on others!

* Darwin 10.4.10
* RedHat Fedora 6-15
* Ubuntu Dapper (no events)
* Ubuntu Feisty
* CentOS 4.5 (no events), 5, 6


Quick Start
-----------

The easiest way to understand how god will make your life better is by trying
out a simple example. To get you up and running quickly, I'll show you how to
keep a trivial server running.

Create a new directory and write a simple server in it. Let's call it
`simple.rb`:

```ruby
loop do
  puts 'Hello'
  sleep 1
end
```

Now we'll write a god config file that tells god about our process. Place it
in the same directory and call it `simple.god`:

```ruby
God.watch do |w|
  w.name = "simple"
  w.start = "ruby /full/path/to/simple.rb"
  w.keepalive
end
```

This is the simplest possible god configuration. We start by declaring a
`God.watch` block. A watch in god represents a process that we want to watch
and control. Each watch must have, at minimum, a unique name and a command that
tells god how to start the process. The `keepalive` declaration tells god to
keep this process alive. If the process is not running when god starts, it will
be started. If the process dies, it will be restarted.

In this example the `simple` process runs in the foreground, so god will take
care of daemonizing it and keeping track of the PID for us. When possible, it's
best to let god daemonize processes for us; that way we don't have to worry
about specifying and keeping track of PID files. Later on we'll see how to
manage processes that can't run in the foreground or that require PID files to
be specified.

To run god, we give it the configuration file we wrote with `-c`. To see what's
going on, we can ask it to run in the foreground with `-D`:

```terminal
$ god -c path/to/simple.god -D
```

There are two ways that god can monitor your process. The first and better way
is with process events. Not every system supports them, but those that do will
use them automatically. With events, god will know immediately when a process
exits. For those systems without process event support, god will use a polling
mechanism. The output you see throughout this section shows both ways.
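
To make the polling mechanism concrete, here is a minimal plain-Ruby sketch of a liveness poll. It is not taken from god. `process_running?` and `poll` are hypothetical names. The check relies on the POSIX convention that `Process.kill(0, pid)` tests whether a process exists without delivering a signal.

```ruby
# Hypothetical helper (not god's API): reports whether a process with the
# given PID currently exists. Signal 0 performs only the existence and
# permission check; nothing is delivered to the process.
def process_running?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process
rescue Errno::EPERM
  true  # the process exists but belongs to another user
end

# Poll-style monitoring: check the PID `checks` times, pausing `interval`
# seconds between checks, and collect the results.
def poll(pid, interval: 5, checks: 3)
  Array.new(checks) do |i|
    sleep interval if i > 0
    process_running?(pid)
  end
end
```

A poll only sees the process state at each check, so an exit can go unnoticed for up to one interval. Event-based monitoring removes that gap.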

After starting god, you should see some output like the following:

```terminal
# Events

I [2011-12-10 15:24:34] INFO: Loading simple.god
I [2011-12-10 15:24:34] INFO: Syslog enabled.
I [2011-12-10 15:24:34] INFO: Using pid file directory: /Users/tom/.god/pids
I [2011-12-10 15:24:34] INFO: Started on drbunix:///tmp/god.17165.sock
I [2011-12-10 15:24:34] INFO: simple move 'unmonitored' to 'init'
I [2011-12-10 15:24:34] INFO: simple moved 'unmonitored' to 'init'
I [2011-12-10 15:24:34] INFO: simple [trigger] process is not running (ProcessRunning)
I [2011-12-10 15:24:34] INFO: simple move 'init' to 'start'
I [2011-12-10 15:24:34] INFO: simple start: ruby /Users/tom/dev/mojombo/god/simple.rb
I [2011-12-10 15:24:34] INFO: simple moved 'init' to 'start'
I [2011-12-10 15:24:34] INFO: simple [trigger] process is running (ProcessRunning)
I [2011-12-10 15:24:34] INFO: simple move 'start' to 'up'
I [2011-12-10 15:24:34] INFO: simple registered 'proc_exit' event for pid 23298
I [2011-12-10 15:24:34] INFO: simple moved 'start' to 'up'

# Polls

I [2011-12-07 09:40:18] INFO: Loading simple.god
I [2011-12-07 09:40:18] INFO: Syslog enabled.
I [2011-12-07 09:40:18] INFO: Using pid file directory: /Users/tom/.god/pids
I [2011-12-07 09:40:18] INFO: Started on drbunix:///tmp/god.17165.sock
I [2011-12-07 09:40:18] INFO: simple move 'unmonitored' to 'up'
I [2011-12-07 09:40:18] INFO: simple moved 'unmonitored' to 'up'
I [2011-12-07 09:40:18] INFO: simple [trigger] process is not running (ProcessRunning)
I [2011-12-07 09:40:18] INFO: simple move 'up' to 'start'
I [2011-12-07 09:40:18] INFO: simple start: ruby /Users/tom/dev/mojombo/god/simple.rb
I [2011-12-07 09:40:19] INFO: simple moved 'up' to 'up'
I [2011-12-07 09:40:19] INFO: simple [ok] process is running (ProcessRunning)
I [2011-12-07 09:40:24] INFO: simple [ok] process is running (ProcessRunning)
I [2011-12-07 09:40:29] INFO: simple [ok] process is running (ProcessRunning)
```

Here you can see god starting up, noticing that the `simple` process isn't
running, starting it, and then checking every five seconds to make sure it's
up. If you'd like to see god work its magic, go ahead and kill the `simple`
process. You should then see something like this:

```terminal
# Events

I [2011-12-10 15:33:38] INFO: simple [trigger] process 23416 exited (ProcessExits)
I [2011-12-10 15:33:38] INFO: simple move 'up' to 'start'
I [2011-12-10 15:33:38] INFO: simple deregistered 'proc_exit' event for pid 23416
I [2011-12-10 15:33:38] INFO: simple start: ruby /Users/tom/dev/mojombo/god/simple.rb
I [2011-12-10 15:33:38] INFO: simple moved 'up' to 'start'
I [2011-12-10 15:33:38] INFO: simple [trigger] process is running (ProcessRunning)
I [2011-12-10 15:33:38] INFO: simple move 'start' to 'up'
I [2011-12-10 15:33:38] INFO: simple registered 'proc_exit' event for pid 23601
I [2011-12-10 15:33:38] INFO: simple moved 'start' to 'up'

# Polls

I [2011-12-07 09:54:59] INFO: simple [ok] process is running (ProcessRunning)
I [2011-12-07 09:55:04] INFO: simple [ok] process is running (ProcessRunning)
I [2011-12-07 09:55:09] INFO: simple [trigger] process is not running (ProcessRunning)
I [2011-12-07 09:55:09] INFO: simple move 'up' to 'start'
I [2011-12-07 09:55:09] INFO: simple start: ruby /Users/tom/dev/mojombo/god/simple.rb
I [2011-12-07 09:55:09] INFO: simple moved 'up' to 'up'
I [2011-12-07 09:55:09] INFO: simple [ok] process is running (ProcessRunning)
I [2011-12-07 09:55:14] INFO: simple [ok] process is running (ProcessRunning)
```

While keeping a process up is useful, it would be even better if we could make
sure our process was behaving well and restart it when resource utilization
exceeds our specifications. With a few additions, we can easily have our
process restarted when memory or CPU usage goes above certain limits. Edit
your `simple.god` config file to look like this:

```ruby
God.watch do |w|
  w.name = "simple"
  w.start = "ruby /full/path/to/simple.rb"
  w.keepalive(:memory_max => 150.megabytes,
              :cpu_max => 50.percent)
end
```

Here I've specified a `:memory_max` option to the `keepalive` command. Now if
the process memory usage goes above 150 megabytes, god will restart it.
Similarly, by setting `:cpu_max`, god will restart my process if its CPU
usage goes over 50%. By default these properties will be checked every five
seconds and will be acted upon if there is an overage for three out of any
five checks. This prevents the process from getting restarted for temporary
resource spikes.
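
Limits such as `150.megabytes` and `50.percent` read naturally because god adds unit helpers to Ruby's numbers. The sketch below shows one way such helpers can be written. Treating kilobytes as the base memory unit is an assumption based on the `kb` figures in the log output, and god's own definitions may differ.

```ruby
# Sketch of numeric unit helpers like the ones god config files use.
# Assumptions: memory limits are expressed in kilobytes (the log output
# reports memory in kb) and time in seconds. God's own definitions may differ.
class Numeric
  def seconds
    self
  end

  def minutes
    self * 60
  end

  def hours
    self * 3600
  end

  def kilobytes
    self
  end

  def megabytes
    self * 1024
  end

  def gigabytes
    self * 1024 * 1024
  end

  def percent
    self
  end
end

memory_max = 150.megabytes # 153600 (kilobytes)
cpu_max    = 50.percent    # 50
```

With these definitions, `150.megabytes` evaluates to 153600. The 37556kb reading in the memory log later in this section is therefore well within bounds.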

To test this out, modify your `simple.rb` server script to introduce a memory
leak:

```ruby
data = ''
loop do
  puts 'Hello'
  100000.times { data << 'x' }
end
```

Ctrl-C out of the foregrounded god instance. Notice that your current `simple`
server will continue to run. Start god again with the same command as before.
Now, instead of starting the `simple` process, god will notice that one is
already running and simply switch to the `up` state.

```terminal
# Events

I [2011-12-10 15:36:00] INFO: Loading simple.god
I [2011-12-10 15:36:00] INFO: Syslog enabled.
I [2011-12-10 15:36:00] INFO: Using pid file directory: /Users/tom/.god/pids
I [2011-12-10 15:36:00] INFO: Started on drbunix:///tmp/god.17165.sock
I [2011-12-10 15:36:00] INFO: simple move 'unmonitored' to 'init'
I [2011-12-10 15:36:00] INFO: simple moved 'unmonitored' to 'init'
I [2011-12-10 15:36:00] INFO: simple [trigger] process is running (ProcessRunning)
I [2011-12-10 15:36:00] INFO: simple move 'init' to 'up'
I [2011-12-10 15:36:00] INFO: simple registered 'proc_exit' event for pid 23601
I [2011-12-10 15:36:00] INFO: simple moved 'init' to 'up'

# Polls

I [2011-12-07 14:50:46] INFO: Loading simple.god
I [2011-12-07 14:50:46] INFO: Syslog enabled.
I [2011-12-07 14:50:46] INFO: Using pid file directory: /Users/tom/.god/pids
I [2011-12-07 14:50:47] INFO: Started on drbunix:///tmp/god.17165.sock
I [2011-12-07 14:50:47] INFO: simple move 'unmonitored' to 'up'
I [2011-12-07 14:50:47] INFO: simple moved 'unmonitored' to 'up'
I [2011-12-07 14:50:47] INFO: simple [ok] process is running (ProcessRunning)
```

In order to get our new `simple` server running, we can issue a command to god
to have our process restarted:

```terminal
$ god restart simple
```

From the logs you can see god killing and restarting the process:

```terminal
# Events

I [2011-12-10 15:38:13] INFO: simple move 'up' to 'restart'
I [2011-12-10 15:38:13] INFO: simple deregistered 'proc_exit' event for pid 23601
I [2011-12-10 15:38:13] INFO: simple stop: default lambda killer
I [2011-12-10 15:38:13] INFO: simple sent SIGTERM
I [2011-12-10 15:38:14] INFO: simple process stopped
I [2011-12-10 15:38:14] INFO: simple start: ruby /Users/tom/dev/mojombo/god/simple.rb
I [2011-12-10 15:38:14] INFO: simple moved 'up' to 'restart'
I [2011-12-10 15:38:14] INFO: simple [trigger] process is running (ProcessRunning)
I [2011-12-10 15:38:14] INFO: simple move 'restart' to 'up'
I [2011-12-10 15:38:14] INFO: simple registered 'proc_exit' event for pid 23707
I [2011-12-10 15:38:14] INFO: simple moved 'restart' to 'up'

# Polls

I [2011-12-07 14:51:13] INFO: simple [ok] process is running (ProcessRunning)
I [2011-12-07 14:51:13] INFO: simple move 'up' to 'restart'
I [2011-12-07 14:51:13] INFO: simple stop: default lambda killer
I [2011-12-07 14:51:13] INFO: simple sent SIGTERM
I [2011-12-07 14:51:14] INFO: simple process stopped
I [2011-12-07 14:51:14] INFO: simple start: ruby /Users/tom/dev/mojombo/god/simple.rb
I [2011-12-07 14:51:14] INFO: simple moved 'up' to 'up'
I [2011-12-07 14:51:14] INFO: simple [ok] process is running (ProcessRunning)
```
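
When no `stop` command is configured, the logs show god stopping the process itself (`stop: default lambda killer`, `sent SIGTERM`, `process stopped`). The sketch below illustrates a comparable TERM-then-wait pattern in plain Ruby. The method name, the timeout, and the escalation to SIGKILL are assumptions for illustration, not a description of god's exact behavior. The sketch also only works for a child of the Ruby process that runs it.

```ruby
# Hypothetical stop routine in the spirit of the "default lambda killer" log
# lines: send SIGTERM, wait for the process to exit, and escalate to SIGKILL
# if it is still alive after `timeout` seconds.
# Assumption: `pid` is a child of this Ruby process, so it can be reaped
# with Process.waitpid.
def stop_process(pid, timeout: 10, poll_interval: 0.1)
  Process.kill('TERM', pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    # With WNOHANG, waitpid returns nil while the child is still running.
    return :stopped if Process.waitpid(pid, Process::WNOHANG)
    break if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
    sleep poll_interval
  end
  Process.kill('KILL', pid)
  Process.waitpid(pid)
  :killed
rescue Errno::ESRCH, Errno::ECHILD
  :already_gone
end
```

Polling with `Process::WNOHANG` keeps the wait bounded, so a process that ignores SIGTERM cannot stall the stop indefinitely.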

God will now start reporting on memory and CPU utilization of your process:

```terminal
# Events and Polls

I [2011-12-07 14:54:37] INFO: simple [ok] process is running (ProcessRunning)
I [2011-12-07 14:54:37] INFO: simple [ok] memory within bounds [2032kb] (MemoryUsage)
I [2011-12-07 14:54:37] INFO: simple [ok] cpu within bounds [0.0%%] (CpuUsage)
I [2011-12-07 14:54:42] INFO: simple [ok] process is running (ProcessRunning)
I [2011-12-07 14:54:42] INFO: simple [ok] memory within bounds [2032kb, 13492kb] (MemoryUsage)
I [2011-12-07 14:54:42] INFO: simple [ok] cpu within bounds [0.0%%, *99.7%%] (CpuUsage)
I [2011-12-07 14:54:47] INFO: simple [ok] process is running (ProcessRunning)
I [2011-12-07 14:54:47] INFO: simple [ok] memory within bounds [2032kb, 13492kb, 25568kb] (MemoryUsage)
I [2011-12-07 14:54:47] INFO: simple [ok] cpu within bounds [0.0%%, *99.7%%, *100.0%%] (CpuUsage)
I [2011-12-07 14:54:52] INFO: simple [ok] process is running (ProcessRunning)
I [2011-12-07 14:54:52] INFO: simple [ok] memory within bounds [2032kb, 13492kb, 25568kb, 37556kb] (MemoryUsage)
I [2011-12-07 14:54:52] INFO: simple [trigger] cpu out of bounds [0.0%%, *99.7%%, *100.0%%, *98.4%%] (CpuUsage)
I [2011-12-07 14:54:52] INFO: simple move 'up' to 'restart'
```

On the last line of the above log you can see that CPU usage has gone above
50% for three cycles and god will issue a restart operation. God will continue
to monitor the `simple` process for as long as god is running and the process
is set to be monitored.

Now, before you kill the god process, let's kill the `simple` server by asking
god to stop it for us. In a new terminal, issue the command:

```terminal
$ god stop simple
```

You should see the following output:

```terminal
Sending 'stop' command

The following watches were affected:
simple
```

And in the foregrounded god terminal window, you'll see the log of what
happened:

```terminal
# Events

I [2011-12-10 15:41:04] INFO: simple stop: default lambda killer
I [2011-12-10 15:41:04] INFO: simple sent SIGTERM
I [2011-12-10 15:41:05] INFO: simple process stopped
I [2011-12-10 15:41:05] INFO: simple move 'up' to 'unmonitored'
I [2011-12-10 15:41:05] INFO: simple deregistered 'proc_exit' event for pid 23707
I [2011-12-10 15:41:05] INFO: simple moved 'up' to 'unmonitored'

# Polls

I [2011-12-07 09:59:59] INFO: simple [ok] process is running (ProcessRunning)
I [2011-12-07 10:00:04] INFO: simple [ok] process is running (ProcessRunning)
I [2011-12-07 10:00:07] INFO: simple stop: default lambda killer
I [2011-12-07 10:00:07] INFO: simple sent SIGTERM
I [2011-12-07 10:00:08] INFO: simple process stopped
I [2011-12-07 10:00:08] INFO: simple move 'up' to 'unmonitored'
I [2011-12-07 10:00:08] INFO: simple moved 'up' to 'unmonitored'
```

Now feel free to Ctrl-C out of god. Congratulations! You've just taken god for
a test ride and seen how easy it is to keep your processes running.

This is just the beginning of what god can do, and in reality, the `keepalive`
command is a convenience method written using more advanced transitional and
condition constructs that may be used directly. You can configure many
different kinds of conditions to have your process restarted when memory or
CPU are too high, when disk usage is above a threshold, when a process returns
an HTTP error code on a specific URL, and many more. In addition you can write
your own custom conditions and use them in your configuration files. Many
different lifecycle controls are available alongside a sophisticated and
extensible notifications system. Keep reading to find out what makes god
different from other monitoring systems and how it can help you solve many of
your process monitoring and control problems.


Config Files are Ruby Code!
---------------------------

Now that you've seen how to get started quickly, let's see how to use the more
powerful aspects of god. Once again, the best way to learn will be through an
example. The following configuration file is what I once used at gravatar.com
to keep the mongrels running:

```ruby
RAILS_ROOT = "/Users/tom/dev/gravatar2"

%w{8200 8201 8202}.each do |port|
  God.watch do |w|
    w.name = "gravatar2-mongrel-#{port}"

    w.start = "mongrel_rails start -c #{RAILS_ROOT} -p #{port} \
      -P #{RAILS_ROOT}/log/mongrel.#{port}.pid -d"
    w.stop = "mongrel_rails stop -P #{RAILS_ROOT}/log/mongrel.#{port}.pid"
    w.restart = "mongrel_rails restart -P #{RAILS_ROOT}/log/mongrel.#{port}.pid"

    w.pid_file = File.join(RAILS_ROOT, "log/mongrel.#{port}.pid")

    w.behavior(:clean_pid_file)

    w.start_if do |start|
      start.condition(:process_running) do |c|
        c.interval = 5.seconds
        c.running = false
      end
    end

    w.restart_if do |restart|
      restart.condition(:memory_usage) do |c|
        c.above = 150.megabytes
        c.times = [3, 5] # 3 out of 5 intervals
      end

      restart.condition(:cpu_usage) do |c|
        c.above = 50.percent
        c.times = 5
      end
    end

    # lifecycle
    w.lifecycle do |on|
      on.condition(:flapping) do |c|
        c.to_state = [:start, :restart]
        c.times = 5
        c.within = 5.minutes
        c.transition = :unmonitored
        c.retry_in = 10.minutes
        c.retry_times = 5
        c.retry_within = 2.hours
      end
    end
  end
end
```

That's a lot to take in at once, so I'll break it down by section and explain
what's going on in each.

```ruby
RAILS_ROOT = "/Users/tom/dev/gravatar2"
```

Here I've set a constant that is used throughout the file. Keeping the
`RAILS_ROOT` value in a constant makes it easy to adapt this script to other
applications. Because the config file is Ruby code, I can set whatever
variables or constants I want that make the configuration more concise and
easier to work with.

```ruby
%w{8200 8201 8202}.each do |port|
  ...
end
```

Because the config file is written in actual Ruby code, we can construct loops
and do other intelligent things that are impossible in your everyday,
run-of-the-mill config file. I need to watch three mongrels, so I simply loop
over their port numbers, eliminating duplication and making my life a whole lot
easier.
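
The plain-Ruby sketch below shows what the loop buys you without installing god. It expands the same per-port template into three sets of command strings. `mongrel_commands` is a hypothetical helper that only builds strings and does not start anything.

```ruby
# Plain Ruby, no god required: one template expanded per port, mirroring
# how the config's loop produces one watch per mongrel.
# mongrel_commands is a hypothetical helper used only for illustration.
RAILS_ROOT = "/Users/tom/dev/gravatar2"

def mongrel_commands(port)
  pid_file = File.join(RAILS_ROOT, "log/mongrel.#{port}.pid")
  {
    name:     "gravatar2-mongrel-#{port}",
    start:    "mongrel_rails start -c #{RAILS_ROOT} -p #{port} -P #{pid_file} -d",
    stop:     "mongrel_rails stop -P #{pid_file}",
    restart:  "mongrel_rails restart -P #{pid_file}",
    pid_file: pid_file
  }
end

watches = %w{8200 8201 8202}.map { |port| mongrel_commands(port) }
watches.each { |w| puts "#{w[:name]}: #{w[:start]}" }
```

Adding a fourth mongrel means adding one port to the list. No block of settings has to be copied.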

```ruby
God.watch do |w|
  w.name = "gravatar2-mongrel-#{port}"

  w.start = "mongrel_rails start -c #{RAILS_ROOT} -p #{port} \
    -P #{RAILS_ROOT}/log/mongrel.#{port}.pid -d"
  w.stop = "mongrel_rails stop -P #{RAILS_ROOT}/log/mongrel.#{port}.pid"
  w.restart = "mongrel_rails restart -P #{RAILS_ROOT}/log/mongrel.#{port}.pid"

  w.pid_file = File.join(RAILS_ROOT, "log/mongrel.#{port}.pid")

  ...
end
```

A `watch` represents a single process that has concrete start, stop, and/or
restart operations. You can define as many watches as you like. In the example
above, I've got some Rails instances running in Mongrels that I need to keep
alive. Every watch must have a unique `name` so that it can be identified
later on. The `start` and `stop` attributes specify the commands to start
and stop the process. If no `restart` attribute is set, god restarts the
process by running the stop command followed by the start command. The
optional `grace` attribute sets the amount of time following a
start/stop/restart command to wait before resuming normal monitoring
operations. If the process you're watching runs as a daemon (as mine does),
you'll need to set the `pid_file` attribute.

```ruby
w.behavior(:clean_pid_file)
```

Behaviors allow you to execute additional commands around start/stop/restart
commands. In our case, if the process dies it will leave a PID file behind.
The next time a start command is issued, it will fail, complaining about the
leftover PID file. We'd like the PID file cleaned up before a start command is
issued. The built-in behavior `clean_pid_file` will do just that.
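
The `clean_pid_file` behavior takes care of this for you. As a rough illustration of the idea, the sketch below deletes a PID file only when the PID it records no longer belongs to a running process. `clean_stale_pid_file` is a hypothetical name. The liveness check is a precaution added for the sketch and is not a documented detail of god's behavior.

```ruby
# Sketch of stale-PID-file cleanup before a start. The method name is
# hypothetical, and the liveness check is an extra precaution added here.
def clean_stale_pid_file(path)
  return :missing unless File.exist?(path)

  pid = File.read(path).to_i
  if pid > 0 # guard: kill(0, 0) would target our own process group
    begin
      Process.kill(0, pid)
      return :alive # a process with that PID exists; keep the file
    rescue Errno::EPERM
      return :alive # it exists but belongs to another user
    rescue Errno::ESRCH
      # no such process, so the file is stale
    end
  end

  File.delete(path)
  :removed
end
```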

```ruby
w.start_if do |start|
  start.condition(:process_running) do |c|
    c.interval = 5.seconds
    c.running = false
  end
end
```

Watches contain conditions grouped by the action to execute should they return
`true`. I start with a `start_if` block that contains a single condition.
Conditions are specified by calling `condition` with an identifier, in this
case `:process_running`. Each condition can specify a poll interval that will
override the default watch interval. In this case, I want to check that the
process is still running every 5 seconds instead of the 30-second interval
that other conditions will inherit. The ability to set condition-specific poll
intervals makes it possible to run critical tests (such as `:process_running`)
more often than less critical tests (such as `:memory_usage` and `:cpu_usage`).
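
The effect of mixing a 5-second condition with conditions that inherit the 30-second default can be sketched in a few lines of plain Ruby. This is a simplified model that ignores grace periods and the time each check takes. `check_times` and the constants are hypothetical names, not part of god.

```ruby
# Sketch of per-condition poll intervals: a condition that sets its own
# interval uses it, and one that does not inherits the watch default.
# The 30-second default matches the text above; the names are illustrative.
DEFAULT_INTERVAL = 30

CONDITIONS = {
  process_running: 5,   # like c.interval = 5.seconds
  memory_usage:    nil, # inherits DEFAULT_INTERVAL
  cpu_usage:       nil  # inherits DEFAULT_INTERVAL
}.freeze

# Returns, for each condition, the check times (in seconds) that fall in
# the window [0, duration).
def check_times(conditions, duration, default_interval = DEFAULT_INTERVAL)
  conditions.map do |name, interval|
    step = interval || default_interval
    [name, (0...duration).step(step).to_a]
  end.to_h
end

check_times(CONDITIONS, 60)
# process_running is checked 12 times in the first minute, memory_usage twice.
```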

```ruby
w.restart_if do |restart|
  restart.condition(:memory_usage) do |c|
    c.above = 150.megabytes
    c.times = [3, 5] # 3 out of 5 intervals
  end

  ...
end
```

Similar to `start_if`, there is a `restart_if` command that groups conditions
that should trigger a restart. The `memory_usage` condition will fail if the
specified process is using too much memory. The maximum allowable amount of
memory is specified with the `above` attribute (you can use the `kilobytes`,
`megabytes`, or `gigabytes` helpers). The number of times the test needs to
fail in order to trigger a restart is set with `times`. This can be either an
integer or an array. An integer means it must fail that many times in a row,
while an array `[x, y]` means it must fail `x` times out of the last `y`
tests.
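
A small sliding-window sketch in plain Ruby pins down the two forms of `times`. `TimesWindow` is a hypothetical class written for this illustration, not god's implementation. It treats an Integer `n` as a window of `n` checks that must all fail. It treats `[x, y]` as at least `x` failures among the last `y` checks.

```ruby
# Sketch of the `times` rule: an Integer n means "n failures in a row";
# an Array [x, y] means "x failures among the last y checks".
# TimesWindow is a hypothetical class, not part of god.
class TimesWindow
  def initialize(times)
    @needed, @window = times.is_a?(Array) ? times : [times, times]
    @history = []
  end

  # Record one check result (true = the condition failed) and report
  # whether the threshold has been reached.
  def record(failed)
    @history << failed
    @history.shift while @history.size > @window
    @history.count(true) >= @needed
  end
end
```

Consider the CPU readings from the Quick Start log: one check under 50% followed by three over it. Fed those readings, a `[3, 5]` window triggers on the fourth check. That matches the `[trigger] cpu out of bounds` line there.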

```ruby
w.restart_if do |restart|
  ...

  restart.condition(:cpu_usage) do |c|
    c.above = 50.percent
    c.times = 5
  end
end
```

To keep an eye on CPU usage, I've employed the `cpu_usage` condition. When CPU
usage for a Mongrel process is over 50% for 5 consecutive intervals, it will
be restarted.

```ruby
w.lifecycle do |on|
  on.condition(:flapping) do |c|
    c.to_state = [:start, :restart]
    c.times = 5
    c.within = 5.minutes
    c.transition = :unmonitored
    c.retry_in = 10.minutes
    c.retry_times = 5
    c.retry_within = 2.hours
  end
end
```

Conditions inside a `lifecycle` section are active as long as the process is
being monitored (they live across state changes).

The `:flapping` condition guards against the edge case wherein god rapidly
starts or restarts your application. Things like server configuration changes
or the unavailability of external services could make it impossible for my
process to start. In that case, god will try to start my process over and over
to no avail. The `:flapping` condition provides two levels of giving up on
flapping processes. If I were to translate the options of the code above, it
would be something like: if this watch is started or restarted five times
within five minutes, then unmonitor it; then, after ten minutes, monitor it
again to see if it was just a temporary problem. If the process is seen to be
flapping five times within two hours, then give up completely.
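
The two levels of giving up can be modeled in plain Ruby as below. This is a simplified sketch, not god's `:flapping` implementation. `FlappingGuard` is a hypothetical class and timestamps are plain seconds. The caller is assumed to resume monitoring at the returned retry time.

```ruby
# Simplified model of the two-level :flapping rule. FlappingGuard is a
# hypothetical class; timestamps are plain numbers of seconds.
class FlappingGuard
  def initialize(times:, within:, retry_in:, retry_times:, retry_within:)
    @times = times               # transitions that count as flapping...
    @within = within             # ...when they happen within this window
    @retry_in = retry_in         # how long to stay unmonitored
    @retry_times = retry_times   # flapping episodes that mean "give up"...
    @retry_within = retry_within # ...within this longer window
    @transitions = []
    @episodes = []
  end

  # Record a move to :start or :restart at time `now`.
  # Returns :ok, [:unmonitor, retry_at], or :give_up.
  def record_transition(now)
    @transitions << now
    @transitions.reject! { |t| t <= now - @within }
    return :ok if @transitions.size < @times

    @transitions.clear
    @episodes << now
    @episodes.reject! { |t| t <= now - @retry_within }
    return :give_up if @episodes.size >= @retry_times

    [:unmonitor, now + @retry_in]
  end
end
```

The `lifecycle` block above corresponds to `FlappingGuard.new(times: 5, within: 300, retry_in: 600, retry_times: 5, retry_within: 7200)`.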

That's it!

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Starting and Controlling God
----------------------------

To start the god monitoring process as a daemon, simply run the `god`
executable, passing in the path to the config file (you need to use sudo if
you're using events on Linux or want to use the setuid/setgid functionality):

```terminal
$ sudo god -c /path/to/config.god
```

While you're writing your config file, it can be helpful to run god in the
foreground so you can see the log messages. You can do that with:

```terminal
$ sudo god -c /path/to/config.god -D
```

You can start/restart/stop/monitor/unmonitor your watches with the same
utility, like so:

```terminal
$ sudo god stop gravatar2-mongrel-8200
```

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Watching Non-Daemon Processes
-----------------------------

Need to watch a script that doesn't have built-in daemonization? No problem!
God will daemonize and keep track of your process for you. If you don't
specify a `pid_file` attribute for a watch, it will be auto-daemonized and a
PID file will be stored for it in `/var/run/god`.

```ruby
# Assumes RAILS_ROOT is defined, as in the gravatar example above.
God.pid_file_directory = '/home/tom/pids'

# This process daemonizes itself (mongrel's -d flag) and writes its own
# PID file, so god is given the pid_file and does not daemonize it.
God.watch do |w|
  w.name = 'mongrel'
  w.pid_file = File.join(RAILS_ROOT, "log/mongrel.pid")

  w.start = "mongrel_rails start -P #{RAILS_ROOT}/log/mongrel.pid -d"

  # ...
end

# No pid_file is set, so god daemonizes this process itself and stores
# its PID file in God.pid_file_directory.
God.watch do |w|
  w.name = 'worker'
  # w.pid_file is intentionally not set

  w.start = "rake resque:worker"

  # ...
end
```

If you'd rather have the PID file stored in a different location, you can
set it at the top of your config:

```ruby
God.pid_file_directory = '/home/tom/pids'
```

The directory you specify must be writable by god.
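
Here is a plain-Ruby sketch of roughly what auto-daemonizing a watch involves. It starts a command in its own process group with its standard streams sent to `/dev/null`, then records its PID in a file. `spawn_and_record` is a hypothetical helper. Full daemonization, for example detaching from the controlling terminal, takes more steps than shown.

```ruby
require 'fileutils'

# Rough sketch of "god daemonizes it and keeps track of the PID":
# start the command in its own process group with its standard streams
# sent to /dev/null, then write its PID to <pid_dir>/<name>.pid.
# spawn_and_record is a hypothetical helper, not part of god.
def spawn_and_record(name, command, pid_dir)
  FileUtils.mkdir_p(pid_dir)
  pid = Process.spawn(*command, pgroup: true,
                      in: File::NULL, out: File::NULL, err: File::NULL)
  Process.detach(pid) # reap the child when it exits so no zombie lingers
  pid_file = File.join(pid_dir, "#{name}.pid")
  File.write(pid_file, pid.to_s)
  [pid, pid_file]
end
```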

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Grouping Watches
----------------

Watches can be assigned to groups. These groups can then be controlled
together from the command line.

```ruby
God.watch do |w|
  ...

  w.group = 'mongrels'

  ...
end
```

The above configuration now allows you to control the watch (and any others
that are in the group) with a single command:

```terminal
$ sudo god stop mongrels
```
634
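
A single command can address either one watch or a whole group. The plain-Ruby sketch below shows one way to resolve such a target: it tries watch names first and group names second. The `Watch` struct and `resolve_targets` are hypothetical, and god's own lookup may differ in its details.

```ruby
# Hypothetical, minimal stand-in for a configured watch.
Watch = Struct.new(:name, :group)

# Resolve a command-line target: an exact watch name wins; otherwise the
# target is treated as a group name and every member of that group is returned.
def resolve_targets(watches, target)
  by_name = watches.select { |w| w.name == target }
  return by_name unless by_name.empty?

  watches.select { |w| w.group == target }
end

watches = [
  Watch.new('mongrel-8200', 'mongrels'),
  Watch.new('mongrel-8201', 'mongrels'),
  Watch.new('worker', nil)
]

p resolve_targets(watches, 'mongrels').map(&:name)
p resolve_targets(watches, 'worker').map(&:name)
```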

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Redirecting STDOUT and STDERR of your Process
---------------------------------------------

By default, the STDOUT stream for your process is redirected to `/dev/null`.
To get access to this output, you can redirect the stream either to a file or
to a command.

To redirect STDOUT to a file, set the `log` attribute to a file path. The file
will be written in append mode and created if it does not exist.

```ruby
God.watch do |w|
  # ...

  w.log = '/var/log/myprocess.log'

  # ...
end
```

To redirect STDOUT to a command that will be run for you, set the `log_cmd`
attribute to a command.

```ruby
God.watch do |w|
  # ...

  w.log_cmd = '/usr/bin/logger'

  # ...
end
```

By default, STDERR is redirected to STDOUT. You can redirect it to a file or a
command just like STDOUT by setting the `err_log` or `err_log_cmd` attributes
respectively.
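
These redirection rules map closely onto options that Ruby's own `Process.spawn` accepts. The sketch below illustrates them and is not god's implementation. It appends a child's STDOUT to a file (creating the file if needed) and folds STDERR into STDOUT. As an alternative, it pipes the combined output into a separate log command. The helper names are hypothetical.

```ruby
require 'shellwords'
require 'tmpdir'

# Append STDOUT to log_path (created if missing) and send STDERR to the same
# place, mirroring the `log` attribute combined with the STDERR default.
def run_with_log(cmd, log_path)
  pid = Process.spawn(cmd, out: [log_path, 'a'], err: [:child, :out])
  Process.wait(pid)
end

# Pipe STDOUT and STDERR into a separate command, similar in spirit to `log_cmd`.
def run_with_log_cmd(cmd, log_cmd)
  reader, writer = IO.pipe
  logger = Process.spawn(log_cmd, in: reader)
  worker = Process.spawn(cmd, out: writer, err: [:child, :out])
  # Close the parent's copies so the logger sees EOF once the worker exits.
  reader.close
  writer.close
  Process.wait(worker)
  Process.wait(logger)
end

Dir.mktmpdir do |dir|
  log = File.join(dir, 'myprocess.log')
  2.times { run_with_log('echo out; echo err 1>&2', log) }
  print File.read(log) # out, err, out, err: the second run was appended

  piped = File.join(dir, 'piped.log')
  run_with_log_cmd('echo via-pipe', "cat > #{piped.shellescape}")
  print File.read(piped)
end
```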

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Changing UID/GID for processes
------------------------------

It is possible to have god run your start/stop/restart commands as a specific
user/group. This can be done by setting the `uid` and/or `gid` attributes of a
watch.

```ruby
God.watch do |w|
  # ...

  w.uid = 'tom'
  w.gid = 'devs'

  # ...
end
```

This only works for commands specified as a string. Lambda commands are
unaffected.
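
Switching identity before running a command is a standard Unix pattern, and a hedged plain-Ruby sketch of it follows. It resolves user and group names with the `etc` library. Then, in a forked child, it changes the GID before the UID, because once the process gives up root it can no longer change its group. This pattern also suggests why lambda commands are unaffected: they are presumably called inside the god process itself rather than in a separate child. `resolve_ids` and `spawn_as` are hypothetical names, and switching to a different user only succeeds when the parent runs as root.

```ruby
require 'etc'

# Translate user/group names into numeric IDs; the group defaults to the
# user's primary group when none is given.
def resolve_ids(user, group = nil)
  pw = Etc.getpwnam(user)
  gid = group ? Etc.getgrnam(group).gid : pw.gid
  [pw.uid, gid]
end

# Run a command as another user/group in a forked child.
# Switching to a different user only succeeds when the parent runs as root.
def spawn_as(cmd, user, group = nil)
  uid, gid = resolve_ids(user, group)
  fork do
    # Change the group first: after the UID change drops root, the process
    # would no longer be allowed to change its GID.
    Process::GID.change_privilege(gid)
    Process::UID.change_privilege(uid)
    exec(cmd)
  end
end

current_user = Etc.getpwuid(Process.uid).name
p resolve_ids(current_user)
```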

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Setting the Working Directory
-----------------------------

By default, God sets the working directory to `/` before running your process.
You can change this by setting the `dir` attribute on the watch.

```ruby
God.watch do |w|
  # ...

  w.dir = '/var/www/myapp'

  # ...
end
```
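
The standard-library sketch below shows what `dir` changes. It runs a command with a given working directory using the `chdir:` spawn option, which `IO.popen` accepts as well as `Process.spawn`. `pwd_in` is a hypothetical helper that only demonstrates the effect, and god itself does not necessarily work this way.

```ruby
require 'tmpdir'

# Run `pwd` as a child process whose working directory is `dir` and return
# the path it reports.
def pwd_in(dir)
  IO.popen(['pwd'], chdir: dir, &:read).chomp
end

puts pwd_in('/')         # the default described above
puts pwd_in(Dir.tmpdir)  # an explicitly chosen directory
```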

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Setting environment variables
-----------------------------

You can set any number of environment variables you wish via the `env`
attribute of a watch.

```ruby
God.watch do |w|
  # ...

  w.env = { 'RAILS_ROOT' => "/var/www/myapp",
            'RAILS_ENV' => "production" }

  # ...
end
```
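
Conceptually, this amounts to passing an environment hash to the child process. The stdlib sketch below illustrates the same idea with the hypothetical helper `env_value_in_child`. A variable passed this way is visible to the child, while the parent's own `ENV` is left untouched.

```ruby
# Start a child with extra environment variables and report what the child
# sees for `name` (empty string if it is unset).
def env_value_in_child(env, name)
  IO.popen(env, ['printenv', name], &:read).chomp
end

env = { 'RAILS_ROOT' => '/var/www/myapp', 'RAILS_ENV' => 'production' }
puts env_value_in_child(env, 'RAILS_ENV')
puts env_value_in_child(env, 'RAILS_ROOT')
```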

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Using chroot to Change the File System Root
-------------------------------------------

If you want your process to run chrooted, simply use the `chroot` attribute on
the watch. The specified directory must exist and contain its own `/dev/null`.

```ruby
God.watch do |w|
  # ...

  w.chroot = '/var/myroot'

  # ...
end
```
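
Both requirements can be checked up front. The plain-Ruby sketch below uses hypothetical helper names. It verifies that the new root exists and contains a character device at `dev/null`; the insistence on a real device rather than any file named `null` is an assumption made here. It then shows the usual pattern of calling `Dir.chroot` followed by `Dir.chdir('/')` in a forked child. `Dir.chroot` only works when running as root, so the usage lines exercise only the check.

```ruby
# Does `root` look usable as a chroot, per the two requirements above?
def chroot_ready?(root)
  File.directory?(root) && File.chardev?(File.join(root, 'dev', 'null'))
end

# Run `cmd` inside `root`. Requires root privileges; not exercised below.
def spawn_chrooted(cmd, root)
  raise ArgumentError, "#{root} is missing dev/null" unless chroot_ready?(root)

  fork do
    Dir.chroot(root)
    Dir.chdir('/') # don't keep a working directory outside the new root
    exec(cmd)
  end
end

p chroot_ready?('/')             # the real root has a /dev/null device
p chroot_ready?('/nonexistent')
```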

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Lambda commands
---------------

In addition to specifying start/stop/restart commands as strings (to be
executed via the shell), you can specify a lambda that will be called.

```ruby
God.watch do |w|
  # ...

  w.start = lambda { ENV['APACHE'] ? `apachectl -k graceful` : `lighttpd restart` }

  # ...
end
```
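
One way to support both forms is to dispatch on the command's type: strings are run as external commands, and anything callable is simply invoked. The sketch below uses a hypothetical `run_command` helper to illustrate that split and is not god's code.

```ruby
# Run a watch command that may be either a command string or a callable.
def run_command(command)
  case command
  when String
    # Ruby hands a single string to /bin/sh when it contains shell syntax;
    # returns true when the command exits with status 0.
    system(command)
  when Proc, Method
    command.call # runs inside the current Ruby process
  else
    raise ArgumentError, "unsupported command: #{command.inspect}"
  end
end

p run_command('true')
p run_command(lambda { :restarted })
```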

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Customizing the Default Stop Lambda
-----------------------------------

If you do not provide a stop command, God will attempt to stop your process by
first sending a SIGTERM. It will then wait up to ten seconds for the process to
exit. If it still has not exited after that time, it will be sent a SIGKILL.
You can customize the stop signal and/or the time to wait for the process to
exit by setting the `stop_signal` and `stop_timeout` attributes on the watch.

```ruby
God.watch do |w|
  # ...

  w.stop_signal = 'QUIT'
  w.stop_timeout = 20.seconds

  # ...
end
```
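
The escalation described above (a polite signal, a bounded wait, then SIGKILL) can be sketched with Ruby's `Process.kill`. This illustrative stand-in uses hypothetical names and is not god's default stop lambda. It checks liveness with signal 0, which assumes the target is not an unreaped child of the current process; the usage below calls `Process.detach` for that reason.

```ruby
# True while a process with this PID exists (signal 0 only checks).
def alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false
rescue Errno::EPERM
  true # exists, but belongs to another user
end

# Send `signal`, wait up to `timeout` seconds, then fall back to SIGKILL.
# Returns :exited or :killed.
def stop_process(pid, signal: 'TERM', timeout: 10)
  Process.kill(signal, pid)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    return :exited unless alive?(pid)
    sleep 0.1
  end
  begin
    Process.kill('KILL', pid)
  rescue Errno::ESRCH
    return :exited # it exited right at the deadline
  end
  :killed
end

polite = Process.spawn('sleep 30')
Process.detach(polite) # reap it so it does not linger as a zombie
p stop_process(polite, timeout: 2)

stubborn = Process.spawn('sh', '-c', "trap '' TERM; exec sleep 30")
Process.detach(stubborn)
sleep 0.5 # give the shell time to install its trap
p stop_process(stubborn, signal: 'TERM', timeout: 0.5)
```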

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Loading Other Config Files
--------------------------

Feel free to split your god configs into separate files for easier
organization. You can load other configs using Ruby's normal `load` method, or
use the convenience method `God.load`, which accepts glob-style paths:

```ruby
# load in all god configs
God.load "/usr/local/conf/*.god"
```

God won't start its monitoring operations until all configurations have been
loaded.
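
Conceptually, a glob-aware load is `Dir.glob` followed by Ruby's `load` for each match. The sketch below demonstrates it with a few throwaway config files. `load_glob` is hypothetical, and sorting the matches is an assumption made here for a predictable order; god does not document its load order.

```ruby
require 'tmpdir'

# Load every file matching a glob pattern, in sorted order.
def load_glob(pattern)
  Dir.glob(pattern).sort.each { |path| load path }
end

$loaded = []
Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'b.god'), "$loaded << 'b'\n")
  File.write(File.join(dir, 'a.god'), "$loaded << 'a'\n")
  File.write(File.join(dir, 'notes.txt'), "raise 'should not be loaded'\n")
  load_glob(File.join(dir, '*.god'))
end
p $loaded # => ["a", "b"]
```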

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Dynamically Loading Config Files Into an Already Running God
------------------------------------------------------------

God allows you to load or reload configurations into an already running
instance. There are a few things to consider when doing this:

* Existing Watches with the same `name` as the incoming Watches will be
  overridden by the new config.
* All paths must be either absolute or relative to the path from which god was
  started.

To load a config into a running god, issue the following command:

```terminal
$ sudo god load path/to/config.god
```

Config files that are loaded dynamically can contain anything that a normal
config file contains; however, global settings such as `God.pid_file_directory`
will be ignored (and produce a warning in the logs).
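
Both rules can be pictured with plain Ruby. Watches keyed by name behave like `Hash#merge`, where incoming entries replace existing ones with the same key. Relative paths resolve against god's starting directory via `File.expand_path`. The helper names and the `/home/tom` directory below are hypothetical.

```ruby
# Incoming watches replace running ones that share a name; others are kept.
def merge_watches(running, incoming)
  running.merge(incoming)
end

# Resolve a path from a dynamically loaded config against the directory god
# was started from; absolute paths are returned unchanged.
def resolve_config_path(path, god_start_dir)
  File.expand_path(path, god_start_dir)
end

running  = { 'local-3000' => 'old config', 'worker' => 'worker config' }
incoming = { 'local-3000' => 'new config' }
p merge_watches(running, incoming)
p resolve_config_path('log/mongrel.pid', '/home/tom')
p resolve_config_path('/var/run/app.pid', '/home/tom')
```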

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Getting Logs for a Single Watch
-------------------------------

Sifting through the god logs for statements specific to a single Watch can be
frustrating when you have many of them. You can get the real-time logs for a
single Watch via the command line:

```terminal
$ sudo god log local-3000
```

This will display log output for the 'local-3000' Watch and update every
second with new log messages.

You can also supply a shorthand to the log command that will match one of your
watches. If it happens to match several, the shortest match will be used:

```terminal
$ sudo god log l3
```
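
The shorthand `l3` matching `local-3000` suggests an in-order subsequence match. The sketch below shows one plausible way to implement that together with the shortest-match rule. It is an illustration only, since god's exact matching may differ, and the helper name `match_watch` is hypothetical.

```ruby
# Treat each character of the shorthand as appearing in order, with anything
# allowed in between, and prefer the shortest matching watch name.
def match_watch(shorthand, names)
  pattern = Regexp.new(shorthand.chars.map { |c| Regexp.escape(c) }.join('.*'))
  names.select { |name| name =~ pattern }.min_by(&:length)
end

names = ['local-3000', 'local-30001', 'remote-3000']
p match_watch('l3', names) # => "local-3000"
p match_watch('r3', names) # => "remote-3000"
p match_watch('zz', names) # => nil
```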

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Notifications
-------------

God has an extensible notification framework built in that makes it easy to
have notifications sent when conditions are triggered. Each notification type
has a set of configuration parameters that must be set. These parameters may
be set globally via Contact Defaults or individually via Contact Instances.

*Contact Defaults* - Some parameters are unlikely to change on a per-contact
basis. You should set those parameters via the defaults mechanism.

```ruby
God::Contacts::Email.defaults do |d|
  d.from_email = 'god@example.com'
  d.from_name = 'God'
  d.delivery_method = :sendmail
end
```

*Contact Instances* - Each contact must have a unique `name` set. You may
optionally assign each contact to a `group`.

```ruby
God.contact(:email) do |c|
  c.name = 'tom'
  c.group = 'developers'
  c.to_email = 'tom@example.com'
end

God.contact(:email) do |c|
  c.name = 'vanpelt'
  c.group = 'developers'
  c.to_email = 'vanpelt@example.com'
end

God.contact(:email) do |c|
  c.name = 'kevin'
  c.group = 'developers'
  c.to_email = 'kevin@example.com'
end
```

*Condition Attachment* - To have a specific contact notified when a condition
is triggered, simply set the condition's `notify` attribute to the name of the
individual contact.

```ruby
w.transition(:up, :start) do |on|
  on.condition(:process_exits) do |c|
    c.notify = 'tom'
  end
end
```

There are two ways to specify that a notification should be sent. The first,
easier way is shown above. Every condition can take an optional `notify`
attribute that specifies which contacts should be notified when the condition
is triggered. The value can be a contact name or contact group *or* an array
of contact names and/or contact groups.

```ruby
w.transition(:up, :start) do |on|
  on.condition(:process_exits) do |c|
    c.notify = {:contacts => ['tom', 'developers'], :priority => 1, :category => 'product'}
  end
end
```

The second way, shown above, uses a hash, which allows you to specify the
`priority` and `category` in addition to the contacts. The extra attributes can
be arbitrary integers or strings and will be passed as-is to the notification
subsystem.

The above notification will arrive as an email similar to the following.

```
From: God <god@example.com>
To: tom <tom@example.com>
Subject: [god] mongrel-8600 [trigger] process exited (ProcessExits)

Message: mongrel-8600 [trigger] process exited (ProcessExits)
Host: candymountain.example.com
Priority: 1
Category: product
```
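
Because `notify` accepts a single name, an array, or a hash, a notifier has to normalize the value and expand group names into individual contacts. The sketch below shows one way to do that in plain Ruby. `normalize_notify`, `expand_contacts`, and the name-to-group table are hypothetical illustrations, not god's internals.

```ruby
# Normalize every accepted `notify` form into one hash shape.
def normalize_notify(spec)
  if spec.is_a?(Hash)
    { contacts: Array(spec[:contacts]), priority: spec[:priority], category: spec[:category] }
  else
    { contacts: Array(spec), priority: nil, category: nil }
  end
end

# Expand group names into member contact names; `contacts` maps each contact
# name to its group. Duplicates are removed.
def expand_contacts(names, contacts)
  names.flat_map do |name|
    contacts.key?(name) ? [name] : contacts.select { |_, group| group == name }.keys
  end.uniq
end

contacts = { 'tom' => 'developers', 'vanpelt' => 'developers', 'kevin' => 'developers' }

spec = normalize_notify(contacts: ['tom', 'developers'], priority: 1, category: 'product')
p expand_contacts(spec[:contacts], contacts) # => ["tom", "vanpelt", "kevin"]
p normalize_notify('tom')
```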

Available Notification Types
----------------------------

Campfire
~~~~~~~~

Send a notice to a Campfire room (http://campfirenow.com).

```ruby
God::Contacts::Campfire.defaults do |d|
  # ...
end

God.contact(:campfire) do |c|
  # ...
end
```

```
subdomain - The String subdomain of the Campfire account. If your URL is
            "foo.campfirenow.com" then your subdomain is "foo".
token     - The String token used for authentication.
room      - The String room name to which the message should be sent.
ssl       - A Boolean determining whether or not to use SSL
            (default: false).
```

Email
~~~~~

Send a notice to an email address.

```ruby
God::Contacts::Email.defaults do |d|
  # ...
end

God.contact(:email) do |c|
  # ...
end
```

```
to_email        - The String email address to which the email will be sent.
to_name         - The String name corresponding to the recipient.
from_email      - The String email address from which the email will be sent.
from_name       - The String name corresponding to the sender.
delivery_method - The Symbol delivery method. [ :smtp | :sendmail ]
                  (default: :smtp).

=== SMTP Options (when delivery_method = :smtp) ===
server_host     - The String hostname of the SMTP server (default: localhost).
server_port     - The Integer port of the SMTP server (default: 25).
server_auth     - A Boolean determining whether or not to use authentication
                  (default: false).

=== SMTP Auth Options (when server_auth = true) ===
server_domain   - The String domain.
server_user     - The String username.
server_password - The String password.

=== Sendmail Options (when delivery_method = :sendmail) ===
sendmail_path   - The String path to the sendmail executable
                  (default: "/usr/sbin/sendmail").
sendmail_args   - The String args to send to sendmail (default: "-i -t").
```
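
The sendmail options are easier to picture with a hedged plain-Ruby sketch. It builds a message in the format of the sample email shown earlier and pipes it to sendmail. With `-t`, sendmail reads the recipients from the message headers, and `-i` stops a line containing a single dot from ending the input. The helper names are hypothetical and the header layout is inferred from the sample. `deliver_via_sendmail` needs a working sendmail binary, so the usage below only prints the message.

```ruby
# Build a plain-text notification email resembling the sample shown earlier.
def build_message(from_name:, from_email:, to_name:, to_email:, message:,
                  host:, priority: nil, category: nil)
  body = ["Message: #{message}", "Host: #{host}"]
  body << "Priority: #{priority}" if priority
  body << "Category: #{category}" if category

  <<~EMAIL
    From: #{from_name} <#{from_email}>
    To: #{to_name} <#{to_email}>
    Subject: [god] #{message}

    #{body.join("\n")}
  EMAIL
end

# Hand the message to sendmail; -t reads recipients from the headers and
# -i keeps a lone "." line from terminating the message early.
def deliver_via_sendmail(text, sendmail_path: '/usr/sbin/sendmail', sendmail_args: '-i -t')
  IO.popen([sendmail_path, *sendmail_args.split], 'w') { |io| io.write(text) }
  $?.success?
end

puts build_message(
  from_name: 'God', from_email: 'god@example.com',
  to_name: 'tom', to_email: 'tom@example.com',
  message: 'mongrel-8600 [trigger] process exited (ProcessExits)',
  host: 'candymountain.example.com', priority: 1, category: 'product'
)
```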

Jabber
~~~~~~

Send a notice to a Jabber address (http://jabber.org/).

Google Mail addresses should work. If you need a non-Gmail address, you can
sign up for one at http://register.jabber.org/.

```ruby
God::Contacts::Jabber.defaults do |d|
  # ...
end

God.contact(:jabber) do |c|
  # ...
end
```

```
host     - The String hostname of the Jabber server.
port     - The Integer port of the Jabber server.
from_jid - The String Jabber ID of the sender.
password - The String password of the sender.
to_jid   - The String Jabber ID of the recipient.
subject  - The String subject of the message (default: "God Notification").
```

Prowl
~~~~~

Send a notice to Prowl (http://prowl.weks.net/).

```ruby
God::Contacts::Prowl.defaults do |d|
  # ...
end

God.contact(:prowl) do |c|
  # ...
end
```

```
apikey - The String API key.
```

Scout
~~~~~

Send a notice to Scout (http://scoutapp.com/).

```ruby
God::Contacts::Scout.defaults do |d|
  # ...
end

God.contact(:scout) do |c|
  # ...
end
```

```
client_key - The String client key.
plugin_id  - The String plugin id.
```

Twitter
~~~~~~~

Send a notice to a Twitter account (http://twitter.com/).

In order to use the Twitter notification, you will need to authorize God via
OAuth and then get the OAuth token and secret for your account. The easiest
way to do this is with a Ruby gem called `twurl`. Install it like so:

```terminal
$ [sudo] gem install twurl
```

Then, run the following:

```terminal
$ twurl auth --consumer-key gOhjax6s0L3mLeaTtBWPw \
    --consumer-secret yz4gpAVXJHKxvsGK85tEyzQJ7o2FEy27H1KEWL75jfA
```

This will return a URL. Copy it to your clipboard. Make sure you are logged
into Twitter with the account that will be used for the notifications, and then
paste the URL into a new browser window. At the end of the authentication
process, you will be given a PIN. Copy this PIN and paste it back to the
command line prompt. Once this is complete, you need to find your access token
and secret:

```terminal
$ cat ~/.twurlrc
```

This will output the contents of the config file from which you can grab your
access token and secret:

```
---
profiles:
  mojombo:
    gOhjax6s0L3mLeaTtBWPw:
      token: 17376380-KXA91nCrgaQ4HxUXMmZtM38gB56qS3hx1NYbjT6mQ
      consumer_key: gOhjax6s0L3mLeaTtBWPw
      username: mojombo
      consumer_secret: yz4gpAVXJHKxvsGK85tEyzQJ7o2FEy27H1KEWL75jfA
      secret: EBWFQBCtuMwCDeU4OXlc3LwGyY8OdWAV0Jg5KVB0
configuration:
  default_profile:
  - mojombo
  - gOhjax6s0L3mLeaTtBWPw
```

The access token and secret (the `token` and `secret` values above) are what
you need to use as parameters to the Twitter notification.

```ruby
God::Contacts::Twitter.defaults do |d|
  # ...
end

God.contact(:twitter) do |c|
  # ...
end
```

```
consumer_token  - The String OAuth consumer token (defaults to God's
                  existing consumer token).
consumer_secret - The String OAuth consumer secret (defaults to God's
                  existing consumer secret).
access_token    - The String OAuth access token.
access_secret   - The String OAuth access secret.
```
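
If you prefer not to copy the values by hand, you can read them from `~/.twurlrc` with Ruby's YAML library. The sketch below assumes the file layout shown above, in which `default_profile` is a username/consumer-key pair pointing into `profiles`. `twurl_credentials` is a hypothetical helper, and the sample values are made up.

```ruby
require 'yaml'

# Extract the access token/secret for the default profile from the contents
# of a ~/.twurlrc file laid out like the example above.
def twurl_credentials(yaml_text)
  config = YAML.safe_load(yaml_text)
  username, consumer_key = config.dig('configuration', 'default_profile')
  profile = config.dig('profiles', username, consumer_key)
  { access_token: profile.fetch('token'), access_secret: profile.fetch('secret') }
end

sample = <<~YAML
  ---
  profiles:
    alice:
      CONSUMERKEY:
        token: 123-abc
        consumer_key: CONSUMERKEY
        username: alice
        consumer_secret: CONSUMERSECRET
        secret: s3cret
  configuration:
    default_profile:
    - alice
    - CONSUMERKEY
YAML

p twurl_credentials(sample)
# For the real file: twurl_credentials(File.read(File.expand_path('~/.twurlrc')))
```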

Webhook
~~~~~~~

Send a notice to a webhook (http://www.webhooks.org/).

```ruby
God::Contacts::Webhook.defaults do |d|
  # ...
end

God.contact(:webhook) do |c|
  # ...
end
```

```
url    - The String webhook URL.
format - The Symbol format [ :form | :json ] (default: :form).
```
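
The `format` option determines how the notification is encoded in the HTTP request body. The sketch below shows the difference between the two encodings using Ruby's `uri` and `json` libraries. The payload field names are hypothetical, since the exact fields god sends are not documented here.

```ruby
require 'json'
require 'uri'

# Encode a notification payload as a form body or as JSON.
def encode_payload(payload, format = :form)
  case format
  when :form then URI.encode_www_form(payload)
  when :json then JSON.generate(payload)
  else raise ArgumentError, "unknown format: #{format.inspect}"
  end
end

payload = { message: 'mongrel-8600 process exited', host: 'candymountain.example.com', priority: 1 }
puts encode_payload(payload)        # application/x-www-form-urlencoded body
puts encode_payload(payload, :json) # application/json body
```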

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Advanced Configuration with Transitions and Events
--------------------------------------------------

So far you've been introduced to a simple poll-based config file and seen how
to run it. Poll-based monitoring works great for simple things, but falls
short for highly critical tasks. God has native support for kqueue/netlink
events on BSD/Darwin/Linux systems. For instance, instead of using the
`process_running` condition to poll for the status of your process, you can
use the `process_exits` condition that will be notified *immediately* upon the
exit of your process. This means less load on your system and shorter downtime
after a crash.

While the configuration syntax you saw in the previous example is very simple,
it lacks the power that we need to deal with event-based monitoring. In fact,
the `start_if` and `restart_if` methods are really just calling out to a
lower-level API. If we use the low-level API directly, we can harness the full
power of god's event-based lifecycle system. Let's look at another example
config file.

```ruby
RAILS_ROOT = "/Users/tom/dev/gravatar2"

God.watch do |w|
  w.name = "local-3000"

  w.start = "mongrel_rails start -c #{RAILS_ROOT} -P #{RAILS_ROOT}/log/mongrel.pid -p 3000 -d"
  w.stop = "mongrel_rails stop -P #{RAILS_ROOT}/log/mongrel.pid"
  w.restart = "mongrel_rails restart -P #{RAILS_ROOT}/log/mongrel.pid"

  w.pid_file = File.join(RAILS_ROOT, "log/mongrel.pid")

  # clean pid files before start if necessary
  w.behavior(:clean_pid_file)

  # determine the state on startup
  w.transition(:init, { true => :up, false => :start }) do |on|
    on.condition(:process_running) do |c|
      c.running = true
    end
  end

  # determine when process has finished starting
  w.transition([:start, :restart], :up) do |on|
    on.condition(:process_running) do |c|
      c.running = true
    end

    # failsafe
    on.condition(:tries) do |c|
      c.times = 5
      c.transition = :start
    end
  end

  # start if process is not running
  w.transition(:up, :start) do |on|
    on.condition(:process_exits)
  end

  # restart if memory or cpu is too high
  w.transition(:up, :restart) do |on|
    on.condition(:memory_usage) do |c|
      c.interval = 20
      c.above = 50.megabytes
      c.times = [3, 5]
    end

    on.condition(:cpu_usage) do |c|
      c.interval = 10
      c.above = 10.percent
      c.times = [3, 5]
    end
  end

  # lifecycle
  w.lifecycle do |on|
    on.condition(:flapping) do |c|
      c.to_state = [:start, :restart]
      c.times = 5
      c.within = 5.minute
      c.transition = :unmonitored
      c.retry_in = 10.minutes
      c.retry_times = 5
      c.retry_within = 2.hours
    end
  end
end
```

A bit longer, I know, but very straightforward once you understand how the
`transition` calls work. The `name`, `start`, `stop`, `restart`, and
`pid_file` attributes should be familiar. We also specify the `clean_pid_file`
behavior.

Before jumping into the code, it's important to understand the different
states that a Watch can have, and how that state changes over time. At any
given time, a Watch will be in one of the `init`, `up`, `start`, or `restart`
states. As different conditions are satisfied, the Watch will progress from
state to state, enabling and disabling conditions along the way.

When god first starts, each Watch is placed in the `init` state.

You'll use the `transition` method to tell god how to transition between
states. It takes two arguments. The first argument may be either a symbol or
an array of symbols representing the state or states during which the
specified conditions should be enabled. The second argument may be either a
symbol or a hash. If it is a symbol, then that is the state that will be
transitioned to if any of the conditions return `true`. If it is a hash, then
that hash must have both `true` and `false` keys, each of which points to a
symbol that represents the state to transition to given the corresponding
return value from the single condition that must be specified.
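
To make the two forms of the second argument concrete, here is a toy state machine in plain Ruby. It is not god's engine (god runs conditions on their own intervals and reacts to events), but it follows the rules just described. A symbol target is taken when a condition returns `true`, and a hash target is chosen by the condition's `true`/`false` result. `TinyWatch`, and the `facts` hash that stands in for real conditions, are hypothetical.

```ruby
# A toy model of god-style transitions, evaluated one step at a time.
class TinyWatch
  attr_reader :state

  def initialize
    @state = :init
    @transitions = []
  end

  # from: a state or array of states; to: a symbol or { true => ..., false => ... }
  def transition(from, to, &condition)
    Array(from).each { |state| @transitions << [state, to, condition] }
  end

  # Check the conditions enabled in the current state against `facts`.
  def step(facts)
    @transitions.each do |from, to, condition|
      next unless from == @state

      result = condition.call(facts)
      if to.is_a?(Hash)
        return @state = to.fetch(result)
      elsif result
        return @state = to
      end
    end
    @state
  end
end

w = TinyWatch.new
w.transition(:init, { true => :up, false => :start }) { |f| f[:running] }
w.transition([:start, :restart], :up) { |f| f[:running] }
w.transition(:up, :start) { |f| f[:exited] }
w.transition(:up, :restart) { |f| f[:memory_mb] > 50 }

p w.step(running: false)               # => :start
p w.step(running: true)                # => :up
p w.step(exited: false, memory_mb: 80) # => :restart
p w.step(running: true)                # => :up
```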

```ruby
# determine the state on startup
w.transition(:init, { true => :up, false => :start }) do |on|
  on.condition(:process_running) do |c|
    c.running = true
  end
end
```

The first transition block tells god what to do when the Watch is in the
`init` state (first argument). This is where I tell god how to determine if my
task is already running. Since I'm monitoring a process, I can use the
`process_running` condition to determine whether the process is running. If
the process is running, it will return true; otherwise it will return false.
Since I sent a hash as the second argument to `transition`, the return from
`process_running` will determine which of the two states will be transitioned
to. If the process is running, the return is true and god will put the Watch
into the `up` state. If the process is not running, the return is false and
god will put the Watch into the `start` state.

```ruby
# determine when process has finished starting
w.transition([:start, :restart], :up) do |on|
  on.condition(:process_running) do |c|
    c.running = true
  end

  # ...
end
```

If god has determined that my process isn't running, the Watch will be put
into the `start` state. Upon entering this state, the `start` command that I
specified on the Watch will be called. In addition, the above transition
specifies a condition that should be enabled when in either the `start` or
`restart` states. The condition is another `process_running`; however, this
time I'm only interested in moving to another state once it returns `true`. A
`true` return from this condition means that the process is running and it's
OK to transition to the `up` state (second argument to `transition`).

```ruby
# determine when process has finished starting
w.transition([:start, :restart], :up) do |on|
  # ...

  # failsafe
  on.condition(:tries) do |c|
    c.times = 5
    c.transition = :start
  end
end
```

The other half of this transition uses the `tries` condition to ensure that
god doesn't get stuck in this state. It's possible that the process could go
down while the transition is being made, in which case god would end up
polling forever to see if the process is up. Here I've specified that if this
condition is called five times, god should override the normal transition
destination and move to the `start` state instead. If you specify a
`transition` attribute on any condition, god will move to that state instead
of the transition's normal destination.

```ruby
# start if process is not running
w.transition(:up, :start) do |on|
  on.condition(:process_exits)
end
```

This is where the event-based system comes into play. Once in the `up` state,
I want to be notified when my process exits. The `process_exits` condition
registers a callback that will trigger a transition change when it is fired
off. Event conditions (like this one) cannot be used in transitions that have
a hash for the second argument (as they do not return true or false).
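
You can see the difference between polling and being notified even in plain Ruby. Take a process that is a child of the current Ruby process. A thread blocked in `Process.wait2` wakes up the moment that child exits, much like an event callback, with no periodic checking involved. god relies on kqueue/netlink so that it can do this for processes that are not its children, and this sketch does not attempt that. `on_exit` is a hypothetical helper.

```ruby
# Invoke `callback` with the exit status as soon as child `pid` exits.
# Works only for children of the current process.
def on_exit(pid, &callback)
  Thread.new do
    _, status = Process.wait2(pid)
    callback.call(status.exitstatus)
  end
end

events = Queue.new
pid = Process.spawn('sh', '-c', 'sleep 0.2; exit 3')
watcher = on_exit(pid) { |code| events << [:process_exits, code] }

p events.pop # blocks until the exit "event" arrives => [:process_exits, 3]
watcher.join
```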

```ruby
# restart if memory or cpu is too high
w.transition(:up, :restart) do |on|
  on.condition(:memory_usage) do |c|
    c.interval = 20
    c.above = 50.megabytes
    c.times = [3, 5]
  end

  on.condition(:cpu_usage) do |c|
    c.interval = 10
    c.above = 10.percent
    c.times = [3, 5]
  end
end
```

Notice that I can have multiple transitions with the same start state. In this
case, I want to have the `memory_usage` and `cpu_usage` poll conditions going
at the same time that I listen for the process exit event. In the case of
runaway CPU or memory usage, however, I want to transition to the `restart`
state. When a Watch enters the `restart` state, it will either call the
`restart` command that you specified, or, if none has been set, call the `stop`
and then `start` commands.
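
The restart rule in the last sentence is a simple fallback. Here is a hypothetical sketch of it, where `run` stands in for whatever actually executes a command.

```ruby
# Prefer an explicit restart command; otherwise stop, then start.
# `commands` maps :start/:stop/:restart to command strings (or nil).
def perform_restart(commands, run)
  if commands[:restart]
    run.call(commands[:restart])
  else
    run.call(commands[:stop])
    run.call(commands[:start])
  end
end

log = []
runner = ->(cmd) { log << cmd }

perform_restart({ start: 'app start', stop: 'app stop', restart: 'app restart' }, runner)
perform_restart({ start: 'app start', stop: 'app stop' }, runner)
p log # => ["app restart", "app stop", "app start"]
```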

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Extend God with your own Conditions
-----------------------------------

God was designed from the start to allow you to easily write your own custom
conditions, making it simple to add tests that are application-specific.

/////////////////////////////////////////////////////////////////////////////
/////////////////////////////////////////////////////////////////////////////

Contribute
----------

If you'd like to hack on god itself or contribute fixes or new functionality,
read this section.

The codebase can be found at https://github.com/mojombo/god. To get started,
fork god on GitHub into your own account and then clone it to your local
machine. This way you can easily submit changes via Pull Requests later on.

```terminal
$ git clone git@github.com:yourusername/god
```

We recommend using link:https://github.com/sstephenson/rbenv[rbenv] and
link:https://github.com/sstephenson/ruby-build[ruby-build] to manage multiple
versions of Ruby and their separate gemsets. Any changes to god must work on
both Ruby 1.8.7-p352 and 1.9.3-p0.

God uses link:http://gembundler.com/[bundler] to deal with development
dependencies. Once you have the code locally, you can pull in all the
dependencies like so:

```terminal
$ cd god
$ bundle install
```

In order for process events to function during development, you'll need to
compile the C extensions:

```terminal
$ cd ext/god
$ ruby extconf.rb
$ make
$ cd ../..
```

Now you're ready to run the tests and make sure everything is configured
properly. On Linux you'll need to run the tests as root in order for the
events system to load. On MacOS there is no need to run the tests as root.

```terminal
$ [sudo] bundle exec rake
```

To run your development god to make sure config files and such still work
properly, just run:

```terminal
$ [sudo] bundle exec god -c myconfig.god -D
```

There are a bunch of example config files for various scenarios in
`test/configs` that you can try out. For big new features, it's great to add a
new test config showing off the usage of the feature.

If you intend to contribute your changes back to god core, make sure you create
a new branch and do your work there. Then, when your changes are ready to be
shared with the world, push them to your fork and issue a Pull Request against
mojombo/god. Make sure to describe your changes in detail and add relevant
tests.

Any feature additions or changes should be accompanied by corresponding updates
to the documentation. It can be found in the `doc` directory. The
documentation is done in link:http://github.com/github/gollum[Gollum] format
and then converted into the public site at http://godrb.com. To see the
generated site locally, you'll first need to commit your changes to git and then
issue the following:

```terminal
$ bundle exec rake site
```

This will open the site in your browser so you can check for correctness.