resurrected_god 0.14.0 → 1.0.0

Files changed (138)
  1. checksums.yaml +4 -4
  2. data/History.md +559 -0
  3. data/README.md +9 -4
  4. data/bin/god +26 -28
  5. data/ext/god/extconf.rb +9 -9
  6. data/lib/god/behavior.rb +3 -5
  7. data/lib/god/behaviors/clean_pid_file.rb +4 -6
  8. data/lib/god/behaviors/clean_unix_socket.rb +4 -6
  9. data/lib/god/behaviors/notify_when_flapping.rb +13 -15
  10. data/lib/god/cli/command.rb +109 -89
  11. data/lib/god/cli/run.rb +51 -75
  12. data/lib/god/cli/version.rb +2 -4
  13. data/lib/god/condition.rb +12 -14
  14. data/lib/god/conditions/always.rb +3 -2
  15. data/lib/god/conditions/complex.rb +23 -24
  16. data/lib/god/conditions/cpu_usage.rb +12 -16
  17. data/lib/god/conditions/degrading_lambda.rb +15 -19
  18. data/lib/god/conditions/disk_usage.rb +7 -8
  19. data/lib/god/conditions/file_mtime.rb +3 -7
  20. data/lib/god/conditions/file_touched.rb +4 -5
  21. data/lib/god/conditions/flapping.rb +57 -62
  22. data/lib/god/conditions/http_response_code.rb +27 -32
  23. data/lib/god/conditions/lambda.rb +3 -5
  24. data/lib/god/conditions/memory_usage.rb +12 -16
  25. data/lib/god/conditions/process_exits.rb +12 -12
  26. data/lib/god/conditions/process_running.rb +8 -10
  27. data/lib/god/conditions/socket_responding.rb +23 -28
  28. data/lib/god/conditions/tries.rb +12 -14
  29. data/lib/god/configurable.rb +7 -10
  30. data/lib/god/contact.rb +41 -52
  31. data/lib/god/contacts/airbrake.rb +10 -12
  32. data/lib/god/contacts/campfire.rb +18 -20
  33. data/lib/god/contacts/email.rb +27 -28
  34. data/lib/god/contacts/prowl.rb +16 -17
  35. data/lib/god/contacts/scout.rb +2 -5
  36. data/lib/god/contacts/sensu.rb +15 -11
  37. data/lib/god/contacts/slack.rb +23 -33
  38. data/lib/god/contacts/statsd.rb +8 -8
  39. data/lib/god/contacts/twitter.rb +2 -2
  40. data/lib/god/contacts/webhook.rb +21 -24
  41. data/lib/god/driver.rb +12 -18
  42. data/lib/god/errors.rb +0 -2
  43. data/lib/god/event_handler.rb +40 -49
  44. data/lib/god/event_handlers/dummy_handler.rb +1 -1
  45. data/lib/god/event_handlers/kqueue_handler.rb +2 -2
  46. data/lib/god/event_handlers/netlink_handler.rb +1 -1
  47. data/lib/god/logger.rb +5 -13
  48. data/lib/god/metric.rb +13 -15
  49. data/lib/god/process.rb +93 -98
  50. data/lib/god/simple_logger.rb +13 -15
  51. data/lib/god/socket.rb +21 -23
  52. data/lib/god/sugar.rb +8 -8
  53. data/lib/god/sys_logger.rb +6 -8
  54. data/lib/god/system/portable_poller.rb +1 -9
  55. data/lib/god/system/process.rb +4 -6
  56. data/lib/god/system/slash_proc_poller.rb +17 -19
  57. data/lib/god/task.rb +110 -136
  58. data/lib/god/timeline.rb +2 -4
  59. data/lib/god/trigger.rb +7 -11
  60. data/lib/god/version.rb +1 -1
  61. data/lib/god/watch.rb +51 -57
  62. data/lib/god.rb +122 -148
  63. metadata +17 -370
  64. data/Announce.txt +0 -135
  65. data/Gemfile +0 -5
  66. data/Rakefile +0 -129
  67. data/doc/god.asciidoc +0 -1592
  68. data/doc/intro.asciidoc +0 -20
  69. data/ext/god/.gitignore +0 -5
  70. data/lib/god/compat19.rb +0 -33
  71. data/lib/god/contacts/hipchat.rb +0 -117
  72. data/lib/god/contacts/jabber.rb +0 -75
  73. data/test/configs/child_events/child_events.god +0 -44
  74. data/test/configs/child_events/simple_server.rb +0 -3
  75. data/test/configs/child_polls/child_polls.god +0 -37
  76. data/test/configs/child_polls/simple_server.rb +0 -12
  77. data/test/configs/complex/complex.god +0 -59
  78. data/test/configs/complex/simple_server.rb +0 -3
  79. data/test/configs/contact/contact.god +0 -118
  80. data/test/configs/contact/simple_server.rb +0 -3
  81. data/test/configs/daemon_events/daemon_events.god +0 -37
  82. data/test/configs/daemon_events/simple_server.rb +0 -8
  83. data/test/configs/daemon_events/simple_server_stop.rb +0 -11
  84. data/test/configs/daemon_polls/daemon_polls.god +0 -17
  85. data/test/configs/daemon_polls/simple_server.rb +0 -6
  86. data/test/configs/degrading_lambda/degrading_lambda.god +0 -31
  87. data/test/configs/degrading_lambda/tcp_server.rb +0 -15
  88. data/test/configs/keepalive/keepalive.god +0 -9
  89. data/test/configs/keepalive/keepalive.rb +0 -12
  90. data/test/configs/lifecycle/lifecycle.god +0 -25
  91. data/test/configs/matias/matias.god +0 -50
  92. data/test/configs/real.rb +0 -59
  93. data/test/configs/running_load/running_load.god +0 -16
  94. data/test/configs/stop_options/simple_server.rb +0 -12
  95. data/test/configs/stop_options/stop_options.god +0 -39
  96. data/test/configs/stress/simple_server.rb +0 -3
  97. data/test/configs/stress/stress.god +0 -15
  98. data/test/configs/task/logs/.placeholder +0 -0
  99. data/test/configs/task/task.god +0 -26
  100. data/test/configs/test.rb +0 -61
  101. data/test/configs/usr1_trapper.rb +0 -10
  102. data/test/helper.rb +0 -172
  103. data/test/suite.rb +0 -6
  104. data/test/test_airbrake.rb +0 -14
  105. data/test/test_behavior.rb +0 -18
  106. data/test/test_campfire.rb +0 -22
  107. data/test/test_condition.rb +0 -52
  108. data/test/test_conditions_disk_usage.rb +0 -50
  109. data/test/test_conditions_http_response_code.rb +0 -109
  110. data/test/test_conditions_process_running.rb +0 -40
  111. data/test/test_conditions_socket_responding.rb +0 -176
  112. data/test/test_conditions_tries.rb +0 -67
  113. data/test/test_contact.rb +0 -109
  114. data/test/test_driver.rb +0 -26
  115. data/test/test_email.rb +0 -34
  116. data/test/test_event_handler.rb +0 -82
  117. data/test/test_god.rb +0 -710
  118. data/test/test_god_system.rb +0 -201
  119. data/test/test_handlers_kqueue_handler.rb +0 -16
  120. data/test/test_hipchat.rb +0 -23
  121. data/test/test_jabber.rb +0 -29
  122. data/test/test_logger.rb +0 -55
  123. data/test/test_metric.rb +0 -74
  124. data/test/test_process.rb +0 -263
  125. data/test/test_prowl.rb +0 -15
  126. data/test/test_registry.rb +0 -15
  127. data/test/test_sensu.rb +0 -11
  128. data/test/test_slack.rb +0 -57
  129. data/test/test_socket.rb +0 -34
  130. data/test/test_statsd.rb +0 -22
  131. data/test/test_sugar.rb +0 -42
  132. data/test/test_system_portable_poller.rb +0 -17
  133. data/test/test_system_process.rb +0 -30
  134. data/test/test_task.rb +0 -246
  135. data/test/test_timeline.rb +0 -37
  136. data/test/test_trigger.rb +0 -63
  137. data/test/test_watch.rb +0 -286
  138. data/test/test_webhook.rb +0 -22
data/doc/god.asciidoc DELETED
@@ -1,1592 +0,0 @@
1
- Installation
2
- ------------
3
-
4
- The best way to get god is via rubygems:
5
-
6
- ```terminal
7
- $ [sudo] gem install god
8
- ```
9
-
10
- Requirements
11
- ------------
12
-
13
- God currently only works on *Linux (kernel 2.6.15+), BSD,* and *Darwin*
14
- systems. No support for Windows is planned. Event based conditions on Linux
15
- systems require the `cn` (connector) kernel module loaded or compiled into
16
- the kernel and god must be run as root.
17
-
18
- The following systems have been tested. Help us test it on others!
19
-
20
- * Darwin 10.4.10
21
- * RedHat Fedora 6-15
22
- * Ubuntu Dapper (no events)
23
- * Ubuntu Feisty
24
- * CentOS 4.5 (no events), 5, 6
25
-
26
-
27
- Quick Start
28
- -----------
29
-
30
- Note: this quick start guide requires god 0.12.0 or above. You can check your
31
- version by running:
32
-
33
- ```terminal
34
- $ god --version
35
- ```
36
-
37
- The easiest way to understand how god will make your life better is by trying
38
- out a simple example. To get you up and running quickly, I'll show you how to
39
- keep a trivial server running.
40
-
41
- Open up a new directory and write a simple server. Let's call it
42
- `simple.rb`:
43
-
44
- ```ruby
45
- loop do
46
- puts 'Hello'
47
- sleep 1
48
- end
49
- ```
50
-
51
- Now we'll write a god config file that tells god about our process. Place it
52
- in the same directory and call it `simple.god`:
53
-
54
- ```ruby
55
- God.watch do |w|
56
- w.name = "simple"
57
- w.start = "ruby /full/path/to/simple.rb"
58
- w.keepalive
59
- end
60
- ```
61
-
62
- This is the simplest possible god configuration. We start by declaring a
63
- `God.watch` block. A watch in god represents a process that we want to watch
64
- and control. Each watch must have, at minimum, a unique name and a command that
65
- tells god how to start the process. The `keepalive` declaration tells god to
66
- keep this process alive. If the process is not running when god starts, it will
67
- be started. If the process dies, it will be restarted.
68
-
69
- In this example the `simple` process runs foreground, so god will take care of
70
- daemonizing it and keeping track of the PID for us. When possible, it's best to
71
- let god daemonize processes for us, that way we don't have to worry about
72
- specifying and keeping track of PID files. Later on we'll see how to manage
73
- processes that can't run foreground or that require PID files to be specified.
74
-
75
- To run god, we give it the configuration file we wrote with `-c`. To see what's
76
- going on, we can ask it to run foreground with `-D`:
77
-
78
- ```terminal
79
- $ god -c path/to/simple.god -D
80
- ```
81
-
82
- There are two ways that god can monitor your process. The first and better way
83
- is with process events. Not every system supports it, but those that do will
84
- automatically use it. With events, god will know immediately when a process
85
- exits. For those systems without process event support, god will use a polling
86
- mechanism. The output you see throughout this section will show both ways.
87
-
88
- After starting god, you should see some output like the following:
89
-
90
- ```terminal
91
- # Events
92
-
93
- I [2011-12-10 15:24:34] INFO: Loading simple.god
94
- I [2011-12-10 15:24:34] INFO: Syslog enabled.
95
- I [2011-12-10 15:24:34] INFO: Using pid file directory: /Users/tom/.god/pids
96
- I [2011-12-10 15:24:34] INFO: Started on drbunix:///tmp/god.17165.sock
97
- I [2011-12-10 15:24:34] INFO: simple move 'unmonitored' to 'init'
98
- I [2011-12-10 15:24:34] INFO: simple moved 'unmonitored' to 'init'
99
- I [2011-12-10 15:24:34] INFO: simple [trigger] process is not running (ProcessRunning)
100
- I [2011-12-10 15:24:34] INFO: simple move 'init' to 'start'
101
- I [2011-12-10 15:24:34] INFO: simple start: ruby /Users/tom/dev/mojombo/god/simple.rb
102
- I [2011-12-10 15:24:34] INFO: simple moved 'init' to 'start'
103
- I [2011-12-10 15:24:34] INFO: simple [trigger] process is running (ProcessRunning)
104
- I [2011-12-10 15:24:34] INFO: simple move 'start' to 'up'
105
- I [2011-12-10 15:24:34] INFO: simple registered 'proc_exit' event for pid 23298
106
- I [2011-12-10 15:24:34] INFO: simple moved 'start' to 'up'
107
-
108
- # Polls
109
-
110
- I [2011-12-07 09:40:18] INFO: Loading simple.god
111
- I [2011-12-07 09:40:18] INFO: Syslog enabled.
112
- I [2011-12-07 09:40:18] INFO: Using pid file directory: /Users/tom/.god/pids
113
- I [2011-12-07 09:40:18] INFO: Started on drbunix:///tmp/god.17165.sock
114
- I [2011-12-07 09:40:18] INFO: simple move 'unmonitored' to 'up'
115
- I [2011-12-07 09:40:18] INFO: simple moved 'unmonitored' to 'up'
116
- I [2011-12-07 09:40:18] INFO: simple [trigger] process is not running (ProcessRunning)
117
- I [2011-12-07 09:40:18] INFO: simple move 'up' to 'start'
118
- I [2011-12-07 09:40:18] INFO: simple start: ruby /Users/tom/dev/mojombo/god/simple.rb
119
- I [2011-12-07 09:40:19] INFO: simple moved 'up' to 'up'
120
- I [2011-12-07 09:40:19] INFO: simple [ok] process is running (ProcessRunning)
121
- I [2011-12-07 09:40:24] INFO: simple [ok] process is running (ProcessRunning)
122
- I [2011-12-07 09:40:29] INFO: simple [ok] process is running (ProcessRunning)
123
- ```
124
-
125
- Here you can see god starting up, noticing that the `simple` process isn't
126
- running, starting it, and then checking every five seconds to make sure it's
127
- up. If you'd like to see god work its magic, go ahead and kill the `simple`
128
- process. You should then see something like this:
129
-
130
- ```terminal
131
- # Events
132
-
133
- I [2011-12-10 15:33:38] INFO: simple [trigger] process 23416 exited (ProcessExits)
134
- I [2011-12-10 15:33:38] INFO: simple move 'up' to 'start'
135
- I [2011-12-10 15:33:38] INFO: simple deregistered 'proc_exit' event for pid 23416
136
- I [2011-12-10 15:33:38] INFO: simple start: ruby /Users/tom/dev/mojombo/god/simple.rb
137
- I [2011-12-10 15:33:38] INFO: simple moved 'up' to 'start'
138
- I [2011-12-10 15:33:38] INFO: simple [trigger] process is running (ProcessRunning)
139
- I [2011-12-10 15:33:38] INFO: simple move 'start' to 'up'
140
- I [2011-12-10 15:33:38] INFO: simple registered 'proc_exit' event for pid 23601
141
- I [2011-12-10 15:33:38] INFO: simple moved 'start' to 'up'
142
-
143
- # Polls
144
-
145
- I [2011-12-07 09:54:59] INFO: simple [ok] process is running (ProcessRunning)
146
- I [2011-12-07 09:55:04] INFO: simple [ok] process is running (ProcessRunning)
147
- I [2011-12-07 09:55:09] INFO: simple [trigger] process is not running (ProcessRunning)
148
- I [2011-12-07 09:55:09] INFO: simple move 'up' to 'start'
149
- I [2011-12-07 09:55:09] INFO: simple start: ruby /Users/tom/dev/mojombo/god/simple.rb
150
- I [2011-12-07 09:55:09] INFO: simple moved 'up' to 'up'
151
- I [2011-12-07 09:55:09] INFO: simple [ok] process is running (ProcessRunning)
152
- I [2011-12-07 09:55:14] INFO: simple [ok] process is running (ProcessRunning)
153
- ```
154
-
155
- While keeping a process up is useful, it would be even better if we could make
156
- sure our process was behaving well and restart it when resource utilization
157
- exceeds our specifications. With a few additions, we can easily have our
158
- process restarted when memory usage or CPU goes above certain limits. Edit
159
- your `simple.god` config file to look like this:
160
-
161
- ```ruby
162
- God.watch do |w|
163
- w.name = "simple"
164
- w.start = "ruby /full/path/to/simple.rb"
165
- w.keepalive(:memory_max => 150.megabytes,
166
- :cpu_max => 50.percent)
167
- end
168
- ```
169
-
170
- Here I've specified a `:memory_max` option to the `keepalive` command. Now if
171
- the process memory usage goes above 150 megabytes, god will restart it.
172
- Similarly, by setting the `:cpu_max`, god will restart my process if its CPU
173
- usage goes over 50%. By default these properties will be checked every 30
174
- seconds and will be acted upon if there is an overage for three out of any
175
- five checks. This prevents the process from getting restarted for temporary
176
- resource spikes.
177
-
178
- To test this out, modify your `simple.rb` server script to introduce a memory
179
- leak:
180
-
181
- ```ruby
182
- data = ''
183
- loop do
184
- puts 'Hello'
185
- 100000.times { data << 'x' }
186
- end
187
- ```
188
-
189
- Ctrl-C out of the foregrounded god instance. Notice that your current `simple`
190
- server will continue to run. Start god again with the same command as before.
191
- Now instead of starting the `simple` process, it will notice that one is
192
- already running and simply switch to the `up` state.
193
-
194
- ```terminal
195
- # Events
196
-
197
- I [2011-12-10 15:36:00] INFO: Loading simple.god
198
- I [2011-12-10 15:36:00] INFO: Syslog enabled.
199
- I [2011-12-10 15:36:00] INFO: Using pid file directory: /Users/tom/.god/pids
200
- I [2011-12-10 15:36:00] INFO: Started on drbunix:///tmp/god.17165.sock
201
- I [2011-12-10 15:36:00] INFO: simple move 'unmonitored' to 'init'
202
- I [2011-12-10 15:36:00] INFO: simple moved 'unmonitored' to 'init'
203
- I [2011-12-10 15:36:00] INFO: simple [trigger] process is running (ProcessRunning)
204
- I [2011-12-10 15:36:00] INFO: simple move 'init' to 'up'
205
- I [2011-12-10 15:36:00] INFO: simple registered 'proc_exit' event for pid 23601
206
- I [2011-12-10 15:36:00] INFO: simple moved 'init' to 'up'
207
-
208
- # Polls
209
-
210
- I [2011-12-07 14:50:46] INFO: Loading simple.god
211
- I [2011-12-07 14:50:46] INFO: Syslog enabled.
212
- I [2011-12-07 14:50:46] INFO: Using pid file directory: /Users/tom/.god/pids
213
- I [2011-12-07 14:50:47] INFO: Started on drbunix:///tmp/god.17165.sock
214
- I [2011-12-07 14:50:47] INFO: simple move 'unmonitored' to 'up'
215
- I [2011-12-07 14:50:47] INFO: simple moved 'unmonitored' to 'up'
216
- I [2011-12-07 14:50:47] INFO: simple [ok] process is running (ProcessRunning)
217
- ```
218
-
219
- In order to get our new `simple` server running, we can issue a command to god
220
- to have our process restarted:
221
-
222
- ```terminal
223
- $ god restart simple
224
- ```
225
-
226
- From the logs you can see god killing and restarting the process:
227
-
228
- ```terminal
229
- # Events
230
-
231
- I [2011-12-10 15:38:13] INFO: simple move 'up' to 'restart'
232
- I [2011-12-10 15:38:13] INFO: simple deregistered 'proc_exit' event for pid 23601
233
- I [2011-12-10 15:38:13] INFO: simple stop: default lambda killer
234
- I [2011-12-10 15:38:13] INFO: simple sent SIGTERM
235
- I [2011-12-10 15:38:14] INFO: simple process stopped
236
- I [2011-12-10 15:38:14] INFO: simple start: ruby /Users/tom/dev/mojombo/god/simple.rb
237
- I [2011-12-10 15:38:14] INFO: simple moved 'up' to 'restart'
238
- I [2011-12-10 15:38:14] INFO: simple [trigger] process is running (ProcessRunning)
239
- I [2011-12-10 15:38:14] INFO: simple move 'restart' to 'up'
240
- I [2011-12-10 15:38:14] INFO: simple registered 'proc_exit' event for pid 23707
241
- I [2011-12-10 15:38:14] INFO: simple moved 'restart' to 'up'
242
-
243
- # Polls
244
-
245
- I [2011-12-07 14:51:13] INFO: simple [ok] process is running (ProcessRunning)
246
- I [2011-12-07 14:51:13] INFO: simple move 'up' to 'restart'
247
- I [2011-12-07 14:51:13] INFO: simple stop: default lambda killer
248
- I [2011-12-07 14:51:13] INFO: simple sent SIGTERM
249
- I [2011-12-07 14:51:14] INFO: simple process stopped
250
- I [2011-12-07 14:51:14] INFO: simple start: ruby /Users/tom/dev/mojombo/god/simple.rb
251
- I [2011-12-07 14:51:14] INFO: simple moved 'up' to 'up'
252
- I [2011-12-07 14:51:14] INFO: simple [ok] process is running (ProcessRunning)
253
- ```
254
-
255
- God will now start reporting on memory and CPU utilization of your process:
256
-
257
- ```terminal
258
- # Events and Polls
259
-
260
- I [2011-12-07 14:54:37] INFO: simple [ok] process is running (ProcessRunning)
261
- I [2011-12-07 14:54:37] INFO: simple [ok] memory within bounds [2032kb] (MemoryUsage)
262
- I [2011-12-07 14:54:37] INFO: simple [ok] cpu within bounds [0.0%%] (CpuUsage)
263
- I [2011-12-07 14:54:42] INFO: simple [ok] process is running (ProcessRunning)
264
- I [2011-12-07 14:54:42] INFO: simple [ok] memory within bounds [2032kb, 13492kb] (MemoryUsage)
265
- I [2011-12-07 14:54:42] INFO: simple [ok] cpu within bounds [0.0%%, *99.7%%] (CpuUsage)
266
- I [2011-12-07 14:54:47] INFO: simple [ok] process is running (ProcessRunning)
267
- I [2011-12-07 14:54:47] INFO: simple [ok] memory within bounds [2032kb, 13492kb, 25568kb] (MemoryUsage)
268
- I [2011-12-07 14:54:47] INFO: simple [ok] cpu within bounds [0.0%%, *99.7%%, *100.0%%] (CpuUsage)
269
- I [2011-12-07 14:54:52] INFO: simple [ok] process is running (ProcessRunning)
270
- I [2011-12-07 14:54:52] INFO: simple [ok] memory within bounds [2032kb, 13492kb, 25568kb, 37556kb] (MemoryUsage)
271
- I [2011-12-07 14:54:52] INFO: simple [trigger] cpu out of bounds [0.0%%, *99.7%%, *100.0%%, *98.4%%] (CpuUsage)
272
- I [2011-12-07 14:54:52] INFO: simple move 'up' to 'restart'
273
- ```
274
-
275
- On the last line of the above log you can see that CPU usage has gone above
276
- 50% for three cycles and god will issue a restart operation. God will continue
277
- to monitor the `simple` process for as long as god is running and the process
278
- is set to be monitored.
279
-
280
- Now, before you kill the god process, let's kill the `simple` server by asking
281
- god to stop it for us. In a new terminal, issue the command:
282
-
283
- ```terminal
284
- $ god stop simple
285
- ```
286
-
287
- You should see the following output:
288
-
289
- ```terminal
290
- Sending 'stop' command
291
-
292
- The following watches were affected:
293
- simple
294
- ```
295
-
296
- And in the foregrounded god terminal window, you'll see the log of what
297
- happened:
298
-
299
- ```terminal
300
- # Events
301
-
302
- I [2011-12-10 15:41:04] INFO: simple stop: default lambda killer
303
- I [2011-12-10 15:41:04] INFO: simple sent SIGTERM
304
- I [2011-12-10 15:41:05] INFO: simple process stopped
305
- I [2011-12-10 15:41:05] INFO: simple move 'up' to 'unmonitored'
306
- I [2011-12-10 15:41:05] INFO: simple deregistered 'proc_exit' event for pid 23707
307
- I [2011-12-10 15:41:05] INFO: simple moved 'up' to 'unmonitored'
308
-
309
- # Polls
310
-
311
- I [2011-12-07 09:59:59] INFO: simple [ok] process is running (ProcessRunning)
312
- I [2011-12-07 10:00:04] INFO: simple [ok] process is running (ProcessRunning)
313
- I [2011-12-07 10:00:07] INFO: simple stop: default lambda killer
314
- I [2011-12-07 10:00:07] INFO: simple sent SIGTERM
315
- I [2011-12-07 10:00:08] INFO: simple process stopped
316
- I [2011-12-07 10:00:08] INFO: simple move 'up' to 'unmonitored'
317
- I [2011-12-07 10:00:08] INFO: simple moved 'up' to 'unmonitored'
318
- ```
319
-
320
- Now feel free to Ctrl-C out of god. Congratulations! You've just taken god for
321
- a test ride and seen how easy it is to keep your processes running.
322
-
323
- This is just the beginning of what god can do, and in reality, the `keepalive`
324
- command is a convenience method written using more advanced transitional and
325
- condition constructs that may be used directly. You can configure many
326
- different kinds of conditions to have your process restarted when memory or
327
- CPU are too high, when disk usage is above a threshold, when a process returns
328
- an HTTP error code on a specific URL, and many more. In addition you can write
329
- your own custom conditions and use them in your configuration files. Many
330
- different lifecycle controls are available alongside a sophisticated and
331
- extensible notifications system. Keep reading to find out what makes god
332
- different from other monitoring systems and how it can help you solve many of
333
- your process monitoring and control problems.
334
-
335
-
336
- Config Files are Ruby Code!
337
- ---------------------------
338
-
339
- Now that you've seen how to get started quickly, let's see how to use the more
340
- powerful aspects of god. Once again, the best way to learn will be through an
341
- example. The following configuration file is what I once used at gravatar.com
342
- to keep the mongrels running:
343
-
344
- ```ruby
345
- RAILS_ROOT = "/Users/tom/dev/gravatar2"
346
-
347
- %w{8200 8201 8202}.each do |port|
348
- God.watch do |w|
349
- w.name = "gravatar2-mongrel-#{port}"
350
-
351
- w.start = "mongrel_rails start -c #{RAILS_ROOT} -p #{port} \
352
- -P #{RAILS_ROOT}/log/mongrel.#{port}.pid -d"
353
- w.stop = "mongrel_rails stop -P #{RAILS_ROOT}/log/mongrel.#{port}.pid"
354
- w.restart = "mongrel_rails restart -P #{RAILS_ROOT}/log/mongrel.#{port}.pid"
355
-
356
- w.pid_file = File.join(RAILS_ROOT, "log/mongrel.#{port}.pid")
357
-
358
- w.behavior(:clean_pid_file)
359
-
360
- w.start_if do |start|
361
- start.condition(:process_running) do |c|
362
- c.interval = 5.seconds
363
- c.running = false
364
- end
365
- end
366
-
367
- w.restart_if do |restart|
368
- restart.condition(:memory_usage) do |c|
369
- c.above = 150.megabytes
370
- c.times = [3, 5] # 3 out of 5 intervals
371
- end
372
-
373
- restart.condition(:cpu_usage) do |c|
374
- c.above = 50.percent
375
- c.times = 5
376
- end
377
- end
378
-
379
- # lifecycle
380
- w.lifecycle do |on|
381
- on.condition(:flapping) do |c|
382
- c.to_state = [:start, :restart]
383
- c.times = 5
384
- c.within = 5.minute
385
- c.transition = :unmonitored
386
- c.retry_in = 10.minutes
387
- c.retry_times = 5
388
- c.retry_within = 2.hours
389
- end
390
- end
391
- end
392
- end
393
- ```
394
-
395
- That's a lot to take in at once, so I'll break it down by section and explain
396
- what's going on in each.
397
-
398
- ```ruby
399
- RAILS_ROOT = "/var/www/gravatar2/current"
400
- ```
401
-
402
- Here I've set a constant that is used throughout the file. Keeping the
403
- `RAILS_ROOT` value in a constant makes it easy to adapt this script to other
404
- applications. Because the config file is Ruby code, I can set whatever
405
- variables or constants I want that make the configuration more concise and
406
- easier to work with.
407
-
408
- ```ruby
409
- %w{8200 8201 8202}.each do |port|
410
- ...
411
- end
412
- ```
413
-
414
- Because the config file is written in actual Ruby code, we can construct loops
415
- and do other intelligent things that are impossible in an everyday,
415
- run-of-the-mill config file. I need to watch three mongrels, so I simply loop over
417
- their port numbers, eliminating duplication and making my life a whole lot
418
- easier.
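As a rough illustration of the loop described above, the following plain-Ruby sketch builds the per-port names and PID file paths without loading god at all. The `rails_root` value is taken from the walkthrough; the `watches` array of hashes is a hypothetical stand-in for the real `God.watch` blocks.

```ruby
# Sketch only: plain hashes stand in for God.watch blocks (an assumption),
# so this runs without the god gem installed.
rails_root = "/var/www/gravatar2/current"

watches = %w[8200 8201 8202].map do |port|
  {
    name: "gravatar2-mongrel-#{port}",
    pid_file: File.join(rails_root, "log/mongrel.#{port}.pid")
  }
end

watches.each { |w| puts "#{w[:name]} -> #{w[:pid_file]}" }
```

Each port yields one entry, which mirrors how the config defines one watch per Mongrel.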
419
-
420
- ```ruby
421
- God.watch do |w|
422
- w.name = "gravatar2-mongrel-#{port}"
423
-
424
- w.start = "mongrel_rails start -c #{RAILS_ROOT} -p #{port} \
425
- -P #{RAILS_ROOT}/log/mongrel.#{port}.pid -d"
426
- w.stop = "mongrel_rails stop -P #{RAILS_ROOT}/log/mongrel.#{port}.pid"
427
- w.restart = "mongrel_rails restart -P #{RAILS_ROOT}/log/mongrel.#{port}.pid"
428
-
429
- w.pid_file = File.join(RAILS_ROOT, "log/mongrel.#{port}.pid")
430
-
431
- ...
432
- end
433
- ```
434
-
435
- A `watch` represents a single process that has concrete start, stop, and/or
436
- restart operations. You can define as many watches as you like. In the example
437
- above, I've got some Rails instances running in Mongrels that I need to keep
438
- alive. Every watch must have a unique `name` so that it can be identified
439
- later on. The `start` and `stop` attributes specify the commands to start
440
- and stop the process. If no `restart` attribute is set, restart will be
441
- represented by a call to stop followed by a call to start. The
442
- optional `grace` attribute sets the amount of time following a
443
- start/stop/restart command to wait before resuming normal monitoring
444
- operations. If the process you're watching runs as a daemon (as
445
- mine does), you'll need to set the `pid_file` attribute.
446
-
447
- ```ruby
448
- w.behavior(:clean_pid_file)
449
- ```
450
-
451
- Behaviors allow you to execute additional commands around start/stop/restart
452
- commands. In our case, if the process dies it will leave a PID file behind.
453
- The next time a start command is issued, it will fail, complaining about the
454
- leftover PID file. We'd like the PID file cleaned up before a start command is
455
- issued. The built-in behavior `clean_pid_file` will do just that.
456
-
457
- ```ruby
458
- w.start_if do |start|
459
- start.condition(:process_running) do |c|
460
- c.interval = 5.seconds
461
- c.running = false
462
- end
463
- end
464
- ```
465
-
466
- Watches contain conditions grouped by the action to execute should they return
467
- `true`. I start with a `start_if` block that contains a single condition.
468
- Conditions are specified by calling `condition` with an identifier, in this
469
- case `:process_running`. Each condition can specify a poll interval that will
470
- override the default watch interval. In this case, I want to check that the
471
- process is still running every 5 seconds instead of the 30 second interval
472
- that other conditions will inherit. The ability to set condition specific poll
473
- intervals makes it possible to run critical tests (such as :process_running)
474
- more often than less critical tests (such as :memory_usage and :cpu_usage).
475
-
476
- ```ruby
477
- w.restart_if do |restart|
478
- restart.condition(:memory_usage) do |c|
479
- c.above = 150.megabytes
480
- c.times = [3, 5] # 3 out of 5 intervals
481
- end
482
-
483
- ...
484
- end
485
- ```
486
-
487
- Similar to `start_if` there is a `restart_if` command that groups conditions
488
- that should trigger a restart. The `memory_usage` condition will fail if the
489
- specified process is using too much memory. The maximum allowable amount of
490
- memory is specified with the `above` attribute (you can use the `kilobytes`,
491
- `megabytes`, or `gigabytes` helpers). The number of times the test needs to
492
- fail in order to trigger a restart is set with `times`. This can be either an
493
- integer or an array. An integer means it must fail that many times in a row
494
- while an array `[x, y]` means it must fail `x` times out of the last `y`
495
- tests.
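The two forms of `times` described above can be sketched in a few lines of Ruby. This is a minimal illustration of the stated semantics, not god's actual implementation; the `TimesWindow` class and its `record` method are hypothetical names.

```ruby
# Hypothetical helper (not part of god): decides whether a condition
# should trigger, given a running history of test results.
class TimesWindow
  # times is an Integer n (n failures in a row) or an
  # Array [x, y] (x failures out of the last y tests).
  def initialize(times)
    @needed, @window = times.is_a?(Array) ? times : [times, times]
    @history = []
  end

  # Record one test result (true means the test failed) and return
  # whether enough failures have accumulated to trigger.
  def record(failed)
    @history << failed
    @history.shift if @history.size > @window
    @history.count(true) >= @needed
  end
end

# Integer form: three failures in a row are required.
consecutive = TimesWindow.new(3)
results_a = [true, true, false, true, true, true].map { |f| consecutive.record(f) }

# Array form: three failures out of the last five tests.
windowed = TimesWindow.new([3, 5])
results_b = [true, false, true, false, true].map { |f| windowed.record(f) }
```

Treating an integer `n` as `[n, n]` works because `n` failures out of the last `n` tests is the same as `n` failures in a row.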
496
-
497
- ```ruby
498
- w.restart_if do |restart|
499
- ...
500
-
501
- restart.condition(:cpu_usage) do |c|
502
- c.above = 50.percent
503
- c.times = 5
504
- end
505
- end
506
- ```
507
-
508
- To keep an eye on CPU usage, I've employed the `cpu_usage` condition. When CPU
509
- usage for a Mongrel process is over 50% for 5 consecutive intervals, it will
510
- be restarted.
511
-
512
- ```ruby
513
- w.lifecycle do |on|
514
- on.condition(:flapping) do |c|
515
- c.to_state = [:start, :restart]
516
- c.times = 5
517
- c.within = 5.minute
518
- c.transition = :unmonitored
519
- c.retry_in = 10.minutes
520
- c.retry_times = 5
521
- c.retry_within = 2.hours
522
- end
523
- end
524
- ```
525
-
526
- Conditions inside a `lifecycle` section are active as long as the process is being monitored (they live across state changes).
527
-
528
- The `:flapping` condition guards against the edge case wherein god rapidly
529
- starts or restarts your application. Things like server configuration changes
530
- or the unavailability of external services could make it impossible for my
531
- process to start. In that case, god will try to start my process over and over
532
- to no avail. The `:flapping` condition provides two levels of giving up on
533
- flapping processes. If I were to translate the options of the code above, it
534
- would be something like: If this watch is started or restarted five times
535
- within 5 minutes, then unmonitor it...then after ten minutes, monitor it
536
- again to see if it was just a temporary problem; if the process is seen to be
537
- flapping five times within two hours, then give up completely.
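The first level of the rule above (five starts or restarts within five minutes) amounts to counting events in a sliding time window. The sketch below shows that idea only; it is not god's implementation, and `FlapDetector` and `transition` are hypothetical names. The retry settings are left out.

```ruby
# Hypothetical sketch (not part of god): counts state transitions
# inside a sliding time window.
class FlapDetector
  def initialize(times:, within:)
    @times = times    # number of transitions that counts as flapping
    @within = within  # window length in seconds
    @stamps = []
  end

  # Record a start/restart at time `now` (in seconds) and return
  # whether the watch should be considered flapping.
  def transition(now)
    @stamps << now
    @stamps.reject! { |t| now - t > @within }
    @stamps.size >= @times
  end
end

# Five transitions within 300 seconds trip the detector on the fifth.
detector = FlapDetector.new(times: 5, within: 300)
flapping = [0, 60, 120, 180, 240].map { |t| detector.transition(t) }
```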
538
-
539
- That's it!
540
-
541
- /////////////////////////////////////////////////////////////////////////////
542
- /////////////////////////////////////////////////////////////////////////////
543
-
544
- Starting and Controlling God
545
- ----------------------------
546
-
547
- To start the god monitoring process as a daemon simply run the `god`
548
- executable passing in the path to the config file (you need to sudo if you're
549
- using events on Linux or want to use the setuid/setgid functionality):
550
-
551
- ```terminal
552
- $ sudo god -c /path/to/config.god
553
- ```
554
-
555
- While you're writing your config file, it can be helpful to run god in the
556
- foreground so you can see the log messages. You can do that with:
557
-
558
- ```terminal
559
- $ sudo god -c /path/to/config.god -D
560
- ```
561
-
562
- You can start/restart/stop/monitor/unmonitor your Watches with the same
563
- utility like so:
564
-
565
- ```terminal
566
- $ sudo god stop gravatar2-mongrel-8200
567
- ```
568
-
569
- /////////////////////////////////////////////////////////////////////////////
570
- /////////////////////////////////////////////////////////////////////////////
571
-
572
- Watching Non-Daemon Processes
573
- -----------------------------
574
-
575
- Need to watch a script that doesn't have built-in daemonization? No problem!
576
- God will daemonize and keep track of your process for you. If you don't
577
- specify a `pid_file` attribute for a watch, it will be auto-daemonized and a
578
- PID file will be stored for it in `/var/run/god`.
579
-
580
-
581
- ```ruby
582
- God.pid_file_directory = '/home/tom/pids'
583
-
584
- # Watcher that auto-daemonizes and creates the pid file
585
- God.watch do |w|
586
- w.name = 'mongrel'
587
- w.pid_file = File.join(RAILS_ROOT, "log/mongrel.pid")
588
-
589
- w.start = "mongrel_rails start -P #{RAILS_ROOT}/log/mongrel.pid -d"
590
-
591
- # ...
592
- end
593
-
594
- # Process that does not daemonize itself; god auto-daemonizes it because no pid_file is set
595
- God.watch do |w|
596
- w.name = 'worker'
597
- # w.pid_file is not set
598
-
599
- w.start = "rake resque:worker"
600
-
601
- # ...
602
- end
603
- ```
604
-
605
-
606
- If you'd rather have the PID file stored in a different location, you can
607
- set it at the top of your config:
608
-
609
- ```ruby
610
- God.pid_file_directory = '/home/tom/pids'
611
- ```
612
-
613
- The directory you specify must be writable by god.
614
-
615
-
616
- /////////////////////////////////////////////////////////////////////////////
617
- /////////////////////////////////////////////////////////////////////////////
618
-
619
- Grouping Watches
620
- ----------------
621
-
622
- Watches can be assigned to groups. These groups can then be controlled
623
- together from the command line.
624
-
625
- ```ruby
626
- God.watch do |w|
627
- ...
628
-
629
- w.group = 'mongrels'
630
-
631
- ...
632
- end
633
- ```
634
-
635
- The above configuration now allows you to control the watch (and any others
636
- that are in the group) with a single command:
637
-
638
- ```terminal
639
- $ sudo god stop mongrels
640
- ```
641
-
642
- /////////////////////////////////////////////////////////////////////////////
643
- /////////////////////////////////////////////////////////////////////////////
644
-
645
- Invoke Commands for all watches
646
- -------------------------------
647
-
648
- If you need to invoke a command (e.g. Stop / Start / Restart) on all watches
649
- you can simply omit the second parameter. For example, to start all watches:
650
-
651
- ```terminal
652
- $ sudo god start
653
- ```
654
-
655
- /////////////////////////////////////////////////////////////////////////////
656
- /////////////////////////////////////////////////////////////////////////////
657
-
658
- Redirecting STDOUT and STDERR of your Process
659
- ---------------------------------------------
660
-
661
- By default, the STDOUT stream for your process is redirected to `/dev/null`.
662
- To get access to this output, you can redirect the stream either to a file or
663
- to a command.
664
-
665
- To redirect STDOUT to a file, set the `log` attribute to a file path. The file
666
- will be written in append mode and created if it does not exist.
667
-
668
- ```ruby
669
- God.watch do |w|
670
- ...
671
-
672
- w.log = '/var/log/myprocess.log'
673
-
674
- ...
675
- end
676
- ```
677
-
678
- To redirect STDOUT to a command that will be run for you, set the `log_cmd`
679
- attribute to a command.
680
-
681
- ```ruby
682
- God.watch do |w|
683
- ...
684
-
685
- w.log_cmd = '/usr/bin/logger'
686
-
687
- ...
688
- end
689
- ```
690
-
691
- By default, STDERR is redirected to STDOUT. You can redirect it to a file or a
692
- command just like STDOUT by setting the `err_log` or `err_log_cmd` attributes
693
- respectively.
694
-
695
- /////////////////////////////////////////////////////////////////////////////
696
- /////////////////////////////////////////////////////////////////////////////
697
-
698
- Changing UID/GID for processes
699
- ------------------------------
700
-
701
- It is possible to have god run your start/stop/restart commands as a specific
702
- user/group. This can be done by setting the `uid` and/or `gid` attributes of a
703
- watch.
704
-
705
- ```ruby
706
- God.watch do |w|
707
- ...
708
-
709
- w.uid = 'tom'
710
- w.gid = 'devs'
711
-
712
- ...
713
- end
714
- ```
715
-
716
- This only works for commands specified as a string. Lambda commands are
717
- unaffected.
718
-
719
- /////////////////////////////////////////////////////////////////////////////
720
- /////////////////////////////////////////////////////////////////////////////
721
-
722
- Setting the Working Directory
723
- -----------------------------
724
-
725
- By default, God sets the working directory to `/` before running your process.
726
- You can change this by setting the `dir` attribute on the watch.
727
-
728
- ```ruby
729
- God.watch do |w|
730
- ...
731
-
732
- w.dir = '/var/www/myapp'
733
-
734
- ...
735
- end
736
- ```
737
-
738
- /////////////////////////////////////////////////////////////////////////////
739
- /////////////////////////////////////////////////////////////////////////////
740
-
741
- Setting environment variables
742
- -----------------------------
743
-
744
- You can set any number of environment variables you wish via the `env`
745
- attribute of a watch.
746
-
747
- ```ruby
748
- God.watch do |w|
749
- ...
750
-
751
- w.env = { 'RAILS_ROOT' => "/var/www/myapp",
752
- 'RAILS_ENV' => "production" }
753
-
754
- ...
755
- end
756
- ```
757
-
758
- /////////////////////////////////////////////////////////////////////////////
759
- /////////////////////////////////////////////////////////////////////////////
760
-
761
- Using chroot to Change the File System Root
762
- -------------------------------------------
763
-
764
- If you want your process to run chrooted, simply use the `chroot` attribute on
765
- the watch. The specified directory must exist and have a `/dev/null`.
766
-
767
- ```ruby
768
- God.watch do |w|
769
- ...
770
-
771
- w.chroot = '/var/myroot'
772
-
773
- ...
774
- end
775
- ```
776
-
777
- /////////////////////////////////////////////////////////////////////////////
778
- /////////////////////////////////////////////////////////////////////////////
779
-
780
- Lambda commands
781
- ---------------
782
-
783
- In addition to specifying start/stop/restart commands as strings (to be
784
- executed via the shell), you can specify a lambda that will be called.
785
-
786
- ```ruby
787
- God.watch do |w|
788
- ...
789
-
790
- w.start = lambda { ENV['APACHE'] ? `apachectl -k graceful` : `lighttpd restart` }
791
-
792
- ...
793
- end
794
- ```
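To make the string versus lambda distinction concrete, here is a small sketch in plain Ruby. The `run_command` helper is hypothetical (it is not god's API); it only illustrates the idea that a string is handed to the shell while anything callable is invoked directly.

```ruby
# Hypothetical dispatcher: strings go to the shell, callables are invoked.
def run_command(command)
  if command.respond_to?(:call)
    command.call      # lambda or proc: run the Ruby code in-process
  else
    system(command)   # string: execute as an external command
  end
end

shell_result  = run_command('true')            # external command
lambda_result = run_command(lambda { :done })  # Ruby callable
```

Because a lambda runs inside the calling Ruby process rather than as a separate command, it is plausible that per-command settings that apply to spawned commands would not reach it.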
795
-
796
- /////////////////////////////////////////////////////////////////////////////
797
- /////////////////////////////////////////////////////////////////////////////
798
-
799
- Customizing the Default Stop Lambda
800
- -----------------------------------
801
-
802
- If you do not provide a stop command, God will attempt to stop your process by
803
- first sending a SIGTERM. It will then wait for ten seconds for the process to
804
- exit. If after this time it still has not exited, it will be sent a SIGKILL.
805
- You can customize the stop signal and/or the time to wait for the process to
806
- exit by setting the `stop_signal` and `stop_timeout` attributes on the watch.
807
-
808
- ```ruby
809
- God.watch do |w|
810
- ...
811
-
812
- w.stop_signal = 'QUIT'
813
- w.stop_timeout = 20.seconds
814
-
815
- ...
816
- end
817
- ```
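The default stop behavior described above (signal, wait, then SIGKILL) can be sketched in plain Ruby as follows. This is not god's code: `stop_child` is a hypothetical helper, and it assumes the target is a child of the calling process so that `Process.wait` can reap it.

```ruby
require 'timeout'

# Hypothetical helper: send `signal`, wait up to `timeout` seconds for the
# child to exit, then fall back to KILL. Only works for child processes of
# the caller, because it reaps them with Process.wait.
def stop_child(pid, signal: 'TERM', timeout: 10)
  Process.kill(signal, pid)
  Timeout.timeout(timeout) { Process.wait(pid) }
  :exited
rescue Timeout::Error
  Process.kill('KILL', pid)
  Process.wait(pid)
  :killed
end

pid = spawn('sleep', '60')
outcome = stop_child(pid, signal: 'TERM', timeout: 5)
```

The `signal` and `timeout` keywords play the role that `stop_signal` and `stop_timeout` play on a watch.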
818
-
819
-
820
- /////////////////////////////////////////////////////////////////////////////
821
- /////////////////////////////////////////////////////////////////////////////
822
-
823
- Loading Other Config Files
824
- --------------------------
825
-
826
- You should feel free to separate your god configs into separate files for
827
- easier organization. You can load in other configs using Ruby's normal `load`
828
- method, or use the convenience method `God.load` which allows for glob-style
829
- paths:
830
-
831
- ```ruby
832
- # load in all god configs
833
- God.load "/usr/local/conf/*.god"
834
- ```
835
-
836
- God won't start its monitoring operations until all configurations have been
837
- loaded.
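For readers curious what glob-style loading amounts to, here is a minimal sketch using only the Ruby standard library. The `load_configs` helper is a hypothetical stand-in and makes no claim about how `God.load` is implemented.

```ruby
# Hypothetical stand-in for a glob-aware loader; not god's own code.
def load_configs(glob)
  # Sort for a predictable order, then evaluate each file with Kernel#load.
  Dir.glob(glob).sort.each { |path| load path }
end

# Example call, using the path from the text above:
# load_configs('/usr/local/conf/*.god')
```

Sorting the matches is a choice made for this sketch so that files load in a repeatable order.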
838
-
839
- /////////////////////////////////////////////////////////////////////////////
840
- /////////////////////////////////////////////////////////////////////////////
841
-
842
- Dynamically Loading Config Files Into an Already Running God
843
- ------------------------------------------------------------
844
-
845
- God allows you to load or reload configurations into an already running
846
- instance. There are a few things to consider when doing this:
847
-
848
- * Existing Watches with the same `name` as the incoming Watches will be
849
- overridden by the new config.
850
- * All paths must be either absolute or relative to the path from which god was
851
- started.
852
-
853
- To load a config into a running god, issue the following command:
854
-
855
- ```terminal
856
- $ sudo god load path/to/config.god
857
- ```
858
-
859
- Config files that are loaded dynamically can contain anything that a normal
860
- config file contains; however, global options such as `God.pid_file_directory`
861
- will be ignored (and produce a warning in the logs).
862
-
863
- /////////////////////////////////////////////////////////////////////////////
864
- /////////////////////////////////////////////////////////////////////////////
865
-
866
- Getting Logs for a Single Watch
867
- -------------------------------
868
-
869
- Sifting through the god logs for statements specific to a single Watch can be
870
- frustrating when you have many of them. You can get the realtime logs for a
871
- single Watch via the command line:
872
-
873
- ```terminal
874
- $ sudo god log local-3000
875
- ```
876
-
877
- This will display log output for the 'local-3000' Watch and update every
878
- second with new log messages.
879
-
880
- You can also supply a shorthand to the log command that will match one of your
881
- watches. If it happens to match several, the shortest match will be used:
882
-
883
- ```terminal
884
- $ sudo god log l3
885
- ```
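One plausible way to read the shorthand rule is sketched below in plain Ruby. It assumes that the characters of the shorthand must appear in order within the watch name, which the text above does not spell out; `shortest_match` and the sample watch names are hypothetical.

```ruby
# Hypothetical matcher: the characters of the shorthand must appear in
# order somewhere in the watch name; the shortest matching name wins.
def shortest_match(shorthand, names)
  pattern = Regexp.new(shorthand.chars.map { |ch| Regexp.escape(ch) }.join('.*'))
  names.grep(pattern).min_by(&:size)
end

picked = shortest_match('l3', ['local-3000', 'local-3000-staging', 'worker'])
# picked => "local-3000"
```

Both `local-3000` and `local-3000-staging` match `l3` under this assumption, and the shorter name is chosen.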
886
-
887
- /////////////////////////////////////////////////////////////////////////////
888
- /////////////////////////////////////////////////////////////////////////////
889
-
890
- Notifications
891
- -------------
892
-
893
- God has an extensible notification framework built in that makes it easy to
894
- have notifications sent when conditions are triggered. Each notification type
895
- has a set of configuration parameters that must be set. These parameters may
896
- be set globally via Contact Defaults or individually via Contact Instances.
897
-
898
- *Contact Defaults* - Some parameters are unlikely to change on a per-contact
899
- basis. You should set those parameters via the defaults mechanism.
900
-
901
- ```ruby
902
- God::Contacts::Email.defaults do |d|
903
- d.from_email = 'god@example.com'
904
- d.from_name = 'God'
905
- d.delivery_method = :sendmail
906
- end
907
- ```
908
-
909
- *Contact Instances* - Each contact must have a unique `name` set. You may
910
- optionally assign each contact to a `group`.
911
-
912
- ```ruby
913
- God.contact(:email) do |c|
914
- c.name = 'tom'
915
- c.group = 'developers'
916
- c.to_email = 'tom@example.com'
917
- end
918
-
919
- God.contact(:email) do |c|
920
- c.name = 'vanpelt'
921
- c.group = 'developers'
922
- c.to_email = 'vanpelt@example.com'
923
- end
924
-
925
- God.contact(:email) do |c|
926
- c.name = 'kevin'
927
- c.group = 'developers'
928
- c.to_email = 'kevin@example.com'
929
- end
930
- ```
931
-
932
- *Condition Attachment* - To have a specific contact notified when a condition
933
- is triggered, simply set the condition's `notify` attribute to the name of the
934
- individual contact.
935
-
936
- ```ruby
937
- w.transition(:up, :start) do |on|
938
- on.condition(:process_exits) do |c|
939
- c.notify = 'tom'
940
- end
941
- end
942
- ```
943
-
944
- There are two ways to specify that a notification should be sent. The first,
945
- easier way is shown above. Every condition can take an optional `notify`
946
- attribute that specifies which contacts should be notified when the condition
947
- is triggered. The value can be a contact name or contact group *or* an array
948
- of contact names and/or contact groups.
949
-
950
- ```ruby
951
- w.transition(:up, :start) do |on|
952
- on.condition(:process_exits) do |c|
953
- c.notify = {:contacts => ['tom', 'developers'], :priority => 1, :category => 'product'}
954
- end
955
- end
956
- ```
957
-
958
- The second way allows you to specify the `priority` and `category` in addition
959
- to the contacts. The extra attributes can be arbitrary integers or strings and
960
- will be passed as-is to the notification subsystem.
961
-
962
- The above notification will arrive as an email similar to the following.
963
-
964
- ```
965
- From: God <god@example.com>
966
- To: tom <tom@example.com>
967
- Subject: [god] mongrel-8600 [trigger] process exited (ProcessExits)
968
-
969
- Message: mongrel-8600 [trigger] process exited (ProcessExits)
970
- Host: candymountain.example.com
971
- Priority: 1
972
- Category: product
973
- ```
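The `notify` forms described earlier in this section (a single name, an array of names, or a hash with `:contacts`, `:priority`, and `:category`) can be thought of as reducing to one common shape. The sketch below shows that idea in plain Ruby; `normalize_notify` is a hypothetical helper, not part of god.

```ruby
# Hypothetical normalizer for the `notify` forms described above.
def normalize_notify(value)
  if value.is_a?(Hash)
    { contacts: Array(value[:contacts]),
      priority: value[:priority],
      category: value[:category] }
  else
    # A single name or an array of names; no extra attributes.
    { contacts: Array(value), priority: nil, category: nil }
  end
end

simple = normalize_notify('tom')
full   = normalize_notify(contacts: ['tom', 'developers'], priority: 1, category: 'product')
```

Here `simple[:contacts]` is `["tom"]`, while `full` carries both contacts along with the priority and category unchanged.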
974
-
975
- Available Notification Types
976
- ----------------------------
977
-
978
- Campfire
979
- ~~~~~~~~
980
-
981
- Send a notice to a Campfire room (http://campfirenow.com).
982
-
983
- ```ruby
984
- God::Contacts::Campfire.defaults do |d|
985
- ...
986
- end
987
-
988
- God.contact(:campfire) do |c|
989
- ...
990
- end
991
- ```
992
-
993
- ```
994
- subdomain - The String subdomain of the Campfire account. If your URL is
995
- "foo.campfirenow.com" then your subdomain is "foo".
996
- token - The String token used for authentication.
997
- room - The String room name to which the message should be sent.
998
- ssl - A Boolean determining whether or not to use SSL
999
- (default: false).
1000
- ```
1001
-
1002
- Hipchat
1003
- ~~~~~~~
1004
-
1005
- Send a notice to a Hipchat room (http://hipchat.com).
1006
-
1007
- ```ruby
1008
- God::Contacts::Hipchat.defaults do |d|
1009
- ...
1010
- end
1011
-
1012
- God.contact(:hipchat) do |c|
1013
- ...
1014
- end
1015
- ```
1016
-
1017
- ```
1018
- token - The String token used for authentication.
1019
- room - The String room name to which the message should be sent.
1020
- ssl - A Boolean determining whether or not to use SSL
1021
- (default: false).
1022
- from - The String representing who the message should be sent as.
1023
- ```
1024
-
1025
- NOTE: in Hipchat you must have a token with 'admin' privileges. 'Notification' privileges will not be enough.
1026
-
1027
- Email
1028
- ~~~~~
1029
-
1030
- Send a notice to an email address.
1031
-
1032
- ```ruby
1033
- God::Contacts::Email.defaults do |d|
1034
- ...
1035
- end
1036
-
1037
- God.contact(:email) do |c|
1038
- ...
1039
- end
1040
- ```
1041
-
1042
- ```
1043
- to_email - The String email address to which the email will be sent.
1044
- to_name - The String name corresponding to the recipient.
1045
- from_email - The String email address from which the email will be sent.
1046
- from_name - The String name corresponding to the sender.
1047
- delivery_method - The Symbol delivery method. [ :smtp | :sendmail ]
1048
- (default: :smtp).
1049
-
1050
- === SMTP Options (when delivery_method = :smtp) ===
1051
- server_host - The String hostname of the SMTP server (default: localhost).
1052
- server_port - The Integer port of the SMTP server (default: 25).
1053
- server_auth - A Boolean or Symbol, false if no authentication else a symbol
1054
- for the type of authentication [false | :plain | :login | :cram_md5]
1055
- (default: false).
1056
-
1057
- === SMTP Auth Options (when server_auth is not false) ===
1058
- server_domain - The String domain.
1059
- server_user - The String username.
1060
- server_password - The String password.
1061
-
1062
- === Sendmail Options (when delivery_method = :sendmail) ===
1063
- sendmail_path - The String path to the sendmail executable
1064
- (default: "/usr/sbin/sendmail").
1065
- sendmail_args - The String args to send to sendmail (default: "-i -t").
1066
- ```
1067
-
1068
- Jabber
1069
- ~~~~~~
1070
-
1071
- Send a notice to a Jabber address (http://jabber.org/).
1072
-
1073
- Google Mail addresses should work. If you need a non-Gmail address, you can
1074
- sign up for one at http://register.jabber.org/.
1075
-
1076
- ```ruby
1077
- God::Contacts::Jabber.defaults do |d|
1078
- ...
1079
- end
1080
-
1081
- God.contact(:jabber) do |c|
1082
- ...
1083
- end
1084
- ```
1085
-
1086
- ```
1087
- host - The String hostname of the Jabber server.
1088
- port - The Integer port of the Jabber server.
1089
- from_jid - The String Jabber ID of the sender.
1090
- password - The String password of the sender.
1091
- to_jid - The String Jabber ID of the recipient.
1092
- subject - The String subject of the message (default: "God Notification").
1093
- ```
1094
-
1095
- Prowl
1096
- ~~~~~
1097
-
1098
- Send a notice to Prowl (http://prowl.weks.net/).
1099
-
1100
- ```ruby
1101
- God::Contacts::Prowl.defaults do |d|
1102
- ...
1103
- end
1104
-
1105
- God.contact(:prowl) do |c|
1106
- ...
1107
- end
1108
- ```
1109
-
1110
- ```
1111
- apikey - The String API key.
1112
- ```
1113
-
1114
- Scout
1115
- ~~~~~
1116
-
1117
- Send a notice to Scout (http://scoutapp.com/).
1118
-
1119
- ```ruby
1120
- God::Contacts::Scout.defaults do |d|
1121
- ...
1122
- end
1123
-
1124
- God.contact(:scout) do |c|
1125
- ...
1126
- end
1127
- ```
1128
-
1129
- ```
1130
- client_key - The String client key.
1131
- plugin_id - The String plugin id.
1132
-
1133
- ```
1134
-
1135
- Twitter
1136
- ~~~~~~~
1137
-
1138
- Send a notice to a Twitter account (http://twitter.com/).
1139
-
1140
- In order to use the Twitter notification, you will need to authorize God via
1141
- OAuth and then get the OAuth token and secret for your account. The easiest
1142
- way to do this is with a Ruby gem called `twurl`. Install it like so:
1143
-
1144
- ```terminal
1145
- [sudo] gem install twurl
1146
- ```
1147
-
1148
- Then, run the following:
1149
-
1150
- ```terminal
1151
- twurl auth --consumer-key gOhjax6s0L3mLeaTtBWPw \
1152
- --consumer-secret yz4gpAVXJHKxvsGK85tEyzQJ7o2FEy27H1KEWL75jfA
1153
- ```
1154
-
1155
- This will return a URL. Copy it to your clipboard. Make sure you are logged
1156
- into Twitter with the account that will be used for the notifications, and then
1157
- paste the URL into a new browser window. At the end of the authentication
1158
- process, you will be given a PIN. Copy this PIN and paste it back to the
1159
- command line prompt. Once this is complete, you need to find your access token
1160
- and secret:
1161
-
1162
- ```terminal
1163
- cat ~/.twurlrc
1164
- ```
1165
-
1166
- This will output the contents of the config file from which you can grab your
1167
- access token and secret:
1168
-
1169
- ```
1170
- ---
1171
- profiles:
1172
- mojombo:
1173
- gOhjax6s0L3mLeaTtBWPw:
1174
- [red]token: 17376380-KXA91nCrgaQ4HxUXMmZtM38gB56qS3hx1NYbjT6mQ
1175
- consumer_key: gOhjax6s0L3mLeaTtBWPw
1176
- username: mojombo
1177
- consumer_secret: yz4gpAVXJHKxvsGK85tEyzQJ7o2FEy27H1KEWL75jfA
1178
- [red]secret: EBWFQBCtuMwCDeU4OXlc3LwGyY8OdWAV0Jg5KVB0
1179
- configuration:
1180
- default_profile:
1181
- - mojombo
1182
- - gOhjax6s0L3mLeaTtBWPw
1183
-
1184
- ```
1185
-
1186
- The access token and secret (highlighted in red above) are what you need to
1187
- use as parameters to the Twitter notification.
1188
-
1189
- ```ruby
1190
- God::Contacts::Twitter.defaults do |d|
1191
- ...
1192
- end
1193
-
1194
- God.contact(:twitter) do |c|
1195
- ...
1196
- end
1197
- ```
1198
-
1199
- ```
1200
- consumer_token - The String OAuth consumer token (defaults to God's
1201
- existing consumer token).
1202
- consumer_secret - The String OAuth consumer secret (defaults to God's
1203
- existing consumer secret).
1204
- access_token - The String OAuth access token.
1205
- access_secret - The String OAuth access secret.
1206
- ```
1207
-
1208
- Webhook
1209
- ~~~~~~~
1210
-
1211
- Send a notice to a webhook (http://www.webhooks.org/).
1212
-
1213
- ```ruby
1214
- God::Contacts::Webhook.defaults do |d|
1215
- ...
1216
- end
1217
-
1218
- God.contact(:webhook) do |c|
1219
- ...
1220
- end
1221
- ```
1222
-
1223
- ```
1224
- url - The String webhook URL.
1225
- format - The Symbol format [ :form | :json ] (default: :form).
1226
-
1227
- ```
1228
-
1229
- Airbrake
1230
- ~~~~~~~~
1231
-
1232
- Send a notice to airbrake (http://airbrake.io/).
1233
-
1234
- ```ruby
1235
- God::Contacts::Airbrake.defaults do |d|
1236
- ...
1237
- end
1238
-
1239
- God.contact(:airbrake) do |c|
1240
- ...
1241
- end
1242
- ```
1243
-
1244
- ```
1245
- apikey - The String API key.
1246
- ```
1247
-
1248
- Slack
1249
- ~~~~~
1250
-
1251
- Send a message to a channel in Slack (https://slack.com/).
1252
-
1253
- First, set up an Incoming Webhook in your Slack account.
1254
-
1255
- Then, in your God configuration, set the defaults:
1256
-
1257
- ```ruby
1258
- God::Contacts::Slack.defaults do |d|
1259
- d.account = "foo"
1260
- d.token = "abc123abc123abc123"
1261
- d.notify_channel = true
1262
- d.format = '%{host} alert: %{message}'
1263
- end
1264
- ```
1265
-
1266
- `account` is the name of your Slack account; if you view slack at
1267
- "foo.slack.com", then your account is "foo". `token` is from your
1268
- newly-created webhook, and will be a string of unintelligible
1269
- characters.
1270
-
1271
- The `notify_channel` and `format` settings are optional. The first
1272
- controls whether the message includes `@channel` (sending notifications
1273
- to everyone in the channel); the second controls how the message is
1274
- formatted. Acceptable values within the format are `priority`, `host`,
1275
- `message`, `category`, and `time`.
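The `%{name}` placeholders in the `format` setting look like Ruby's named format references. Assuming they are expanded that way (an assumption, not something the text above confirms), the substitution can be tried in plain Ruby; the sample values below are made up for illustration.

```ruby
# Illustrative values only; the keys mirror the placeholders listed above.
template = '%{host} alert: %{message}'
values = {
  priority: 1,
  host: 'candymountain.example.com',
  message: 'mongrel-8600 process exited',
  category: 'product',
  time: Time.now
}

# Kernel#format fills each %{key} from the hash and ignores unused keys.
line = format(template, values)
puts line
```

This prints `candymountain.example.com alert: mongrel-8600 process exited`.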
1276
-
1277
- Once you've set the defaults, create contacts for the channels that you
1278
- want to notify. You can create as many as you like, and they'll look
1279
- something like this:
1280
-
1281
- ```ruby
1282
- God.contact(:slack) do |c|
1283
- c.name = '#ops'
1284
- c.channel = '#ops'
1285
- end
1286
- ```
1287
-
1288
- /////////////////////////////////////////////////////////////////////////////
1289
- /////////////////////////////////////////////////////////////////////////////
1290
-
1291
- Advanced Configuration with Transitions and Events
1292
- --------------------------------------------------
1293
-
1294
- So far you've been introduced to a simple poll-based config file and seen how
1295
- to run it. Poll-based monitoring works great for simple things, but falls
1296
- short for highly critical tasks. God has native support for kqueue/netlink
1297
- events on BSD/Darwin/Linux systems. For instance, instead of using the
1298
- `process_running` condition to poll for the status of your process, you can
1299
- use the `process_exits` condition that will be notified *immediately* upon the
1300
- exit of your process. This means less load on your system and shorter downtime
1301
- after a crash.
1302
-
1303
- While the configuration syntax you saw in the previous example is very simple,
1304
- it lacks the power that we need to deal with event based monitoring. In fact,
1305
- the `start_if` and `restart_if` methods are really just calling out to a
1306
- lower-level API. If we use the low-level API directly, we can harness the full
1307
- power of god's event based lifecycle system. Let's look at another example
1308
- config file.
1309
-
1310
- ```ruby
1311
- RAILS_ROOT = "/Users/tom/dev/gravatar2"
1312
-
1313
- God.watch do |w|
1314
- w.name = "local-3000"
1315
-
1316
- w.start = "mongrel_rails start -c #{RAILS_ROOT} -P #{RAILS_ROOT}/log/mongrel.pid -p 3000 -d"
1317
- w.stop = "mongrel_rails stop -P #{RAILS_ROOT}/log/mongrel.pid"
1318
- w.restart = "mongrel_rails restart -P #{RAILS_ROOT}/log/mongrel.pid"
1319
-
1320
- w.pid_file = File.join(RAILS_ROOT, "log/mongrel.pid")
1321
-
1322
- # clean pid files before start if necessary
1323
- w.behavior(:clean_pid_file)
1324
-
1325
- # determine the state on startup
1326
- w.transition(:init, { true => :up, false => :start }) do |on|
1327
- on.condition(:process_running) do |c|
1328
- c.running = true
1329
- end
1330
- end
1331
-
1332
- # determine when process has finished starting
1333
- w.transition([:start, :restart], :up) do |on|
1334
- on.condition(:process_running) do |c|
1335
- c.running = true
1336
- end
1337
-
1338
- # failsafe
1339
- on.condition(:tries) do |c|
1340
- c.times = 5
1341
- c.transition = :start
1342
- end
1343
- end
1344
-
1345
- # start if process is not running
1346
- w.transition(:up, :start) do |on|
1347
- on.condition(:process_exits)
1348
- end
1349
-
1350
- # restart if memory or cpu is too high
1351
- w.transition(:up, :restart) do |on|
1352
- on.condition(:memory_usage) do |c|
1353
- c.interval = 20
1354
- c.above = 50.megabytes
1355
- c.times = [3, 5]
1356
- end
1357
-
1358
- on.condition(:cpu_usage) do |c|
1359
- c.interval = 10
1360
- c.above = 10.percent
1361
- c.times = [3, 5]
1362
- end
1363
- end
1364
-
1365
- # lifecycle
1366
- w.lifecycle do |on|
1367
- on.condition(:flapping) do |c|
1368
- c.to_state = [:start, :restart]
1369
- c.times = 5
1370
- c.within = 5.minute
1371
- c.transition = :unmonitored
1372
- c.retry_in = 10.minutes
1373
- c.retry_times = 5
1374
- c.retry_within = 2.hours
1375
- end
1376
- end
1377
- end
1378
-
1379
- ```
1380
-
1381
- A bit longer, I know, but very straightforward once you understand how the
1382
- `transition` calls work. The `name`, `interval`, `start`, `stop`, and
1383
- `pid_file` attributes should be familiar. We also specify the `clean_pid_file`
1384
- behavior.
1385
-
1386
- Before jumping into the code, it's important to understand the different
1387
- states that a Watch can have, and how that state changes over time. At any
1388
- given time, a Watch will be in one of the `init`, `up`, `start`, or `restart`
1389
- states. As different conditions are satisfied, the Watch will progress from
1390
- state to state, enabling and disabling conditions along the way.
1391
-
1392
- When god first starts, each Watch is placed in the `init` state.
1393
-
1394
- You'll use the `transition` method to tell god how to transition between
1395
- states. It takes two arguments. The first argument may be either a symbol or
1396
- an array of symbols representing the state or states during which the
1397
- specified conditions should be enabled. The second argument may be either a
1398
- symbol or a hash. If it is a symbol, then that is the state that will be
1399
- transitioned to if any of the conditions return `true`. If it is a hash, then
1400
- that hash must have both `true` and `false` keys, each of which point to a
1401
- symbol that represents the state to transition to given the corresponding
1402
- return from the single condition that must be specified.
1403
-
1404
- ```ruby
1405
- # determine the state on startup
1406
- w.transition(:init, { true => :up, false => :start }) do |on|
1407
- on.condition(:process_running) do |c|
1408
- c.running = true
1409
- end
1410
- end
1411
- ```
1412
-
1413
- The first transition block tells god what to do when the Watch is in the
1414
- `init` state (first argument). This is where I tell god how to determine if my
1415
- task is already running. Since I'm monitoring a process, I can use the
1416
- `process_running` condition to determine whether the process is running. If
1417
- the process is running, it will return true, otherwise it will return false.
1418
- Since I sent a hash as the second argument to `transition`, the return from
1419
- `process_running` will determine which of the two states will be transitioned
1420
- to. If the process is running, the return is true and god will put the Watch
1421
- into the `up` state. If the process is not running, the return is false and
1422
- god will put the Watch into the `start` state.
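The difference between a symbol and a hash as the second argument to `transition` can be summarized with a tiny sketch in plain Ruby. The `next_state` function is hypothetical and only models the decision described above; it is not god's internal API.

```ruby
# Hypothetical resolver: given the second argument to `transition` and a
# condition's boolean result, return the state to move to (or nil to stay).
def next_state(destination, result)
  case destination
  when Hash   then destination.fetch(result)   # needs both true and false keys
  when Symbol then result ? destination : nil  # only a true result moves on
  end
end

from_hash_false = next_state({ true => :up, false => :start }, false) # => :start
from_hash_true  = next_state({ true => :up, false => :start }, true)  # => :up
from_symbol     = next_state(:up, false)                              # => nil
```

With a hash, every result leads somewhere; with a symbol, a `false` result simply leaves the Watch in its current state.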
1423
-
1424
- ```ruby
1425
- # determine when process has finished starting
1426
- w.transition([:start, :restart], :up) do |on|
1427
- on.condition(:process_running) do |c|
1428
- c.running = true
1429
- end
1430
-
1431
- ...
1432
- end
1433
- ```
1434
-
1435
- If god has determined that my process isn't running, the Watch will be put
1436
- into the `start` state. Upon entering this state, the `start` command that I
1437
- specified on the Watch will be called. In addition, the above transition
1438
- specifies a condition that should be enabled when in either the `start` or
1439
- `restart` states. The condition is another `process_running`, however this
1440
- time I'm only interested in moving to another state once it returns `true`. A
1441
- `true` return from this condition means that the process is running and it's
1442
- ok to transition to the `up` state (second argument to `transition`).
1443
-
1444
- ```ruby
1445
- # determine when process has finished starting
1446
- w.transition([:start, :restart], :up) do |on|
1447
- ...
1448
-
1449
- # failsafe
1450
- on.condition(:tries) do |c|
1451
- c.times = 5
1452
- c.transition = :start
1453
- end
1454
- end
1455
- ```
1456
-
1457
- The other half of this transition uses the `tries` condition to ensure that
1458
- god doesn't get stuck in this state. It's possible that the process could go
1459
- down while the transition is being made, in which case god would end up
1460
- polling forever to see if the process is up. Here I've specified that if this
1461
- condition is called five times, god should override the normal transition
1462
- destination and move to the `start` state instead. If you specify a
1463
- `transition` attribute on any condition, god will transition to that state
1464
- instead of the normal destination.
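The failsafe idea, a condition that fires after a fixed number of checks and carries its own destination, can be modeled in a few lines of plain Ruby. `TriesSketch` and `destination_for` are hypothetical names used only for this illustration.

```ruby
# Hypothetical model of a counting condition with an optional override.
class TriesSketch
  attr_reader :transition

  def initialize(times:, transition: nil)
    @times = times
    @transition = transition
    @calls = 0
  end

  # Returns true once the condition has been checked `times` times.
  def triggered?
    @calls += 1
    @calls >= @times
  end
end

# A condition's own `transition` wins over the block's default destination.
def destination_for(condition, default)
  condition.transition || default
end

failsafe = TriesSketch.new(times: 5, transition: :start)
outcomes = Array.new(5) { failsafe.triggered? }
# outcomes => [false, false, false, false, true]
target = destination_for(failsafe, :up) # => :start
```

On the fifth check the failsafe fires, and because it names `:start`, that destination replaces the block's usual `:up`.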
1465
-
1466
- ```ruby
1467
- # start if process is not running
1468
- w.transition(:up, :start) do |on|
1469
- on.condition(:process_exits)
1470
- end
1471
- ```
1472
-
1473
- This is where the event based system comes into play. Once in the `up` state,
1474
- I want to be notified when my process exits. The `process_exits` condition
1475
- registers a callback that will trigger a transition change when it is fired
1476
- off. Event conditions (like this one) cannot be used in transitions that have
1477
- a hash for the second argument (as they do not return true or false).
1478
-
1479
- ```ruby
1480
- # restart if memory or cpu is too high
1481
- w.transition(:up, :restart) do |on|
1482
- on.condition(:memory_usage) do |c|
1483
- c.interval = 20
1484
- c.above = 50.megabytes
1485
- c.times = [3, 5]
1486
- end
1487
-
1488
- on.condition(:cpu_usage) do |c|
1489
- c.interval = 10
1490
- c.above = 10.percent
1491
- c.times = [3, 5]
1492
- end
1493
- end
1494
- ```
1495
-
1496
- Notice that I can have multiple transitions with the same start state. In this
1497
- case, I want to have the `memory_usage` and `cpu_usage` poll conditions going
1498
- at the same time that I listen for the process exit event. In the case of
1499
- runaway CPU or memory usage, however, I want to transition to the `restart`
1500
- state. When a Watch enters the `restart` state it will either call the
1501
- `restart` command that you specified, or if none has been set, call the `stop`
1502
- and then `start` commands.
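The restart fallback described above can be expressed as a small decision in plain Ruby. The `restart_sequence` helper and the hash of commands are hypothetical; they only model which commands would run, in order.

```ruby
# Hypothetical helper: list the commands a restart would run, in order.
def restart_sequence(commands)
  if commands[:restart]
    [commands[:restart]]
  else
    [commands[:stop], commands[:start]]
  end
end

with_restart = restart_sequence(restart: 'mongrel_rails restart',
                                stop: 'mongrel_rails stop',
                                start: 'mongrel_rails start')
without_restart = restart_sequence(stop: 'mongrel_rails stop',
                                   start: 'mongrel_rails start')
```

`with_restart` contains only the restart command, while `without_restart` contains the stop command followed by the start command.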
1503
-
1504
-
1505
- /////////////////////////////////////////////////////////////////////////////
1506
- /////////////////////////////////////////////////////////////////////////////
1507
-
1508
- Extend God with your own Conditions
1509
- -----------------------------------
1510
-
1511
- God was designed from the start to allow you to easily write your own custom
1512
- conditions, making it simple to add tests that are application specific.
1513
-
1514
-
1515
- /////////////////////////////////////////////////////////////////////////////
1516
- /////////////////////////////////////////////////////////////////////////////
1517
-
1518
- Contribute
1519
- ----------
1520
-
1521
- If you'd like to hack on god itself or contribute fixes or new functionality,
1522
- read this section.
1523
-
1524
- The codebase can be found at https://github.com/mojombo/god. To get started,
1525
- fork god on GitHub into your own account and then pull that down to your local
1526
- machine. This way you can easily submit changes via Pull Requests later on.
1527
-
1528
- ```terminal
1529
- $ git clone git@github.com:yourusername/god
1530
- ```
1531
-
1532
- We recommend using link:https://github.com/sstephenson/rbenv[rbenv] and
1533
- link:https://github.com/sstephenson/ruby-build[ruby-build] to manage multiple
1534
- versions of Ruby and their separate gemsets. Any changes to god must work on
1535
- both Ruby 1.8.7-p352 and 1.9.3-p0.
1536
-
1537
- God uses link:http://gembundler.com/[bundler] to deal with development
1538
- dependencies. Once you have the code locally, you can pull in all the
1539
- dependencies like so:
1540
-
1541
- ```terminal
1542
- $ cd god
1543
- $ bundle install
1544
- ```
1545
-
1546
- In order for process events to function during development you'll need to
1547
- compile the C extensions:
1548
-
1549
- ```terminal
1550
- $ cd ext/god
1551
- $ ruby extconf.rb
1552
- $ make
1553
- $ cd ../..
1554
- ```
1555
-
1556
- Now you're ready to run the tests and make sure everything is configured
1557
- properly. On Linux you'll need to run the tests as root in order for the
1558
- events system to load. On macOS there is no need to run the tests as root.
1559
-
1560
- ```terminal
1561
- $ [sudo] bundle exec rake
1562
- ```
1563
-
1564
- To run your development god to make sure config files and such still work
1565
- properly, just run:
1566
-
1567
- ```terminal
1568
- $ [sudo] bundle exec god -c myconfig.god -D
1569
- ```
1570
-
1571
- There are a bunch of example config files for various scenarios in
1572
- `test/configs` that you can try out. For big new features, it's great to add a
1573
- new test config showing off the usage of the feature.
1574
-
1575
- If you intend to contribute your changes back to god core, make sure you create
1576
- a new branch and do your work there. Then, when your changes are ready to be
1577
- shared with the world, push them to your fork and issue a Pull Request against
1578
- mojombo/god. Make sure to describe your changes in detail and add relevant
1579
- tests.
1580
-
1581
- Any feature additions or changes should be accompanied by corresponding updates
1582
- to the documentation. It can be found in the `docs` directory. The
1583
- documentation is done in link:http://github.com/github/gollum[Gollum] format
1584
- and then converted into the public site at http://godrb.com. To see the
1585
- generated site locally you'll first need to commit your changes to git and then
1586
- issue the following:
1587
-
1588
- ```terminal
1589
- $ bundle exec rake site
1590
- ```
1591
-
1592
- This will open the site in your browser so you can check for correctness.