flapjack 0.6.61 → 0.7.0
- data/Gemfile +2 -1
- data/README.md +8 -4
- data/features/events.feature +269 -146
- data/features/notification_rules.feature +93 -0
- data/features/steps/events_steps.rb +162 -21
- data/features/steps/notifications_steps.rb +1 -1
- data/features/steps/time_travel_steps.rb +30 -19
- data/features/support/env.rb +71 -1
- data/flapjack.gemspec +3 -0
- data/lib/flapjack/data/contact.rb +256 -57
- data/lib/flapjack/data/entity.rb +2 -1
- data/lib/flapjack/data/entity_check.rb +22 -7
- data/lib/flapjack/data/global.rb +1 -0
- data/lib/flapjack/data/message.rb +2 -0
- data/lib/flapjack/data/notification_rule.rb +172 -0
- data/lib/flapjack/data/tag.rb +7 -2
- data/lib/flapjack/data/tag_set.rb +16 -0
- data/lib/flapjack/executive.rb +147 -13
- data/lib/flapjack/filters/delays.rb +21 -9
- data/lib/flapjack/gateways/api.rb +407 -27
- data/lib/flapjack/gateways/pagerduty.rb +1 -1
- data/lib/flapjack/gateways/web.rb +50 -22
- data/lib/flapjack/gateways/web/views/self_stats.haml +2 -0
- data/lib/flapjack/utility.rb +10 -0
- data/lib/flapjack/version.rb +1 -1
- data/spec/lib/flapjack/data/contact_spec.rb +103 -6
- data/spec/lib/flapjack/data/global_spec.rb +2 -0
- data/spec/lib/flapjack/data/message_spec.rb +6 -0
- data/spec/lib/flapjack/data/notification_rule_spec.rb +22 -0
- data/spec/lib/flapjack/data/notification_spec.rb +6 -0
- data/spec/lib/flapjack/gateways/api_spec.rb +727 -4
- data/spec/lib/flapjack/gateways/jabber_spec.rb +1 -0
- data/spec/lib/flapjack/gateways/web_spec.rb +11 -1
- data/spec/spec_helper.rb +10 -0
- data/tmp/notification_rules.rb +73 -0
- data/tmp/test_json_post.rb +16 -0
- data/tmp/test_notification_rules_api.rb +170 -0
- metadata +59 -2
data/Gemfile
CHANGED
data/README.md
CHANGED
@@ -9,14 +9,18 @@ Flapjack is a highly scalable and distributed monitoring notification system.
|
|
9
9
|
|
10
10
|
Flapjack provides a scalable method for dealing with events representing changes in system state (OK -> WARNING -> CRITICAL transitions) and alerting appropriate people as necessary.
|
11
11
|
|
12
|
-
At its core,
|
12
|
+
At its core, Flapjack processes events received from external check execution engines, such as Nagios. Nagios provides a 'perfdata' event output channel, which writes to a named pipe. `flapjack-nagios-receiver` then reads from this named pipe, converts each line to JSON and adds them to the events queue.
|
13
|
+
|
14
|
+
Flapjack's `executive` component picks up the events and processes them -- deciding when and who to notifify about problems, recoveries, acknowledgements, etc.
|
15
|
+
|
16
|
+
Additional check engines can be supported by adding additional receiver processes similar to the nagios receiver.
|
13
17
|
|
14
18
|
|
15
19
|
## Using Flapjack
|
16
20
|
|
17
21
|
### Quickstart
|
18
22
|
|
19
|
-
TODO numbered list for simplest possible Flapjack run
|
23
|
+
TODO numbered list for simplest possible Flapjack run.
|
20
24
|
|
21
25
|
For more information, including full specification of the configuration file and the data import formats, please refer to the [Flapjack Wiki](https://github.com/flpjck/flapjack/wiki/USING).
|
22
26
|
|
@@ -35,8 +39,8 @@ git submodule update
|
|
35
39
|
|
36
40
|
If you make changes to the documentation locally, here's how to publish them:
|
37
41
|
|
38
|
-
*
|
42
|
+
* Checkout master within the doc subdir, otherwise you'll be commiting to no branch, a.k.a. *no man's land*.
|
39
43
|
* git add, commit and push from inside the doc subdir
|
40
|
-
*
|
44
|
+
* Add, commit and push the doc dir from the root (this updates the pointer in the main git repo to the correct ref in the doc repo, we think...)
|
41
45
|
|
42
46
|
|
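The README text added above describes a receiver that reads lines from a named pipe, converts each one to JSON and adds it to the events queue. A minimal sketch of that flow follows, under stated assumptions: the tab-separated field order in `FIELDS`, the helper names `perfdata_line_to_event` and `receive`, and the plain Array standing in for the events queue are all hypothetical and are not taken from `flapjack-nagios-receiver` itself.

```ruby
require 'json'

# Hypothetical field order for one tab-separated perfdata line; the real
# layout is whatever template the Nagios configuration writes to the pipe.
FIELDS = %w[timestamp entity check state summary].freeze

# Convert one line into a JSON event string, or nil if the line is malformed.
def perfdata_line_to_event(line)
  values = line.chomp.split("\t", -1)
  return nil unless values.size == FIELDS.size

  event = FIELDS.zip(values).to_h
  event['timestamp'] = Integer(event['timestamp'], 10) # raises on non-numeric input
  event['state'] = event['state'].downcase
  JSON.generate(event)
rescue ArgumentError
  nil
end

# Read lines from any IO (a named pipe in a real deployment) and append the
# resulting events to a queue object that responds to <<.
def receive(io, queue)
  io.each_line do |line|
    event = perfdata_line_to_event(line)
    queue << event if event
  end
  queue
end
```

Passing the IO and the queue in as arguments keeps the conversion step testable without a real pipe; for example, `receive(File.open('/path/to/pipe'), [])` would collect events from a pipe at an assumed path.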
data/features/events.feature
CHANGED
@@ -3,223 +3,346 @@ Feature: events
|
|
3
3
|
So people can be notified when things break and recover
|
4
4
|
flapjack-executive must process events correctly
|
5
5
|
|
6
|
-
# TODO make entity and check implicit, so the test reads more cleanly
|
7
6
|
Background:
|
8
|
-
Given an entity '
|
7
|
+
Given an entity 'foo-app-01.example.com' exists
|
8
|
+
And the check is check 'HTTP Port 80' on entity 'foo-app-01.example.com'
|
9
9
|
|
10
10
|
Scenario: Check ok to ok
|
11
|
-
Given check
|
12
|
-
When an ok event is received
|
13
|
-
Then a notification should not be generated
|
11
|
+
Given the check is in an ok state
|
12
|
+
When an ok event is received
|
13
|
+
Then a notification should not be generated
|
14
14
|
|
15
|
-
Scenario: Check ok to
|
16
|
-
Given check
|
17
|
-
When a
|
18
|
-
Then a notification should not be generated
|
15
|
+
Scenario: Check ok to warning
|
16
|
+
Given the check is in an ok state
|
17
|
+
When a warning event is received
|
18
|
+
Then a notification should not be generated
|
19
|
+
|
20
|
+
Scenario: Check ok to critical
|
21
|
+
Given the check is in an ok state
|
22
|
+
When a critical event is received
|
23
|
+
Then a notification should not be generated
|
19
24
|
|
20
25
|
@time
|
21
|
-
Scenario: Check
|
22
|
-
Given check
|
23
|
-
When a
|
26
|
+
Scenario: Check critical to critical after 10 seconds
|
27
|
+
Given the check is in an ok state
|
28
|
+
When a critical event is received
|
24
29
|
And 10 seconds passes
|
25
|
-
And a
|
26
|
-
Then a notification should not be generated
|
30
|
+
And a critical event is received
|
31
|
+
Then a notification should not be generated
|
32
|
+
|
33
|
+
@time
|
34
|
+
Scenario: Check ok to warning for 1 minute
|
35
|
+
Given the check is in an ok state
|
36
|
+
When a warning event is received
|
37
|
+
And 1 minute passes
|
38
|
+
And a warning event is received
|
39
|
+
Then a notification should be generated
|
27
40
|
|
28
41
|
@time
|
29
|
-
Scenario: Check ok to
|
30
|
-
Given check
|
31
|
-
When a
|
42
|
+
Scenario: Check ok to critical for 1 minute
|
43
|
+
Given the check is in an ok state
|
44
|
+
When a critical event is received
|
32
45
|
And 1 minute passes
|
33
|
-
And a
|
34
|
-
Then a notification should be generated
|
46
|
+
And a critical event is received
|
47
|
+
Then a notification should be generated
|
48
|
+
|
49
|
+
@time
|
50
|
+
Scenario: Check ok to warning, 1 min, then critical
|
51
|
+
Given the check is in an ok state
|
52
|
+
When a warning event is received
|
53
|
+
And 1 minute passes
|
54
|
+
And a warning event is received
|
55
|
+
Then a notification should be generated
|
56
|
+
When a critical event is received
|
57
|
+
Then a notification should not be generated
|
58
|
+
When 1 minute passes
|
59
|
+
And a critical event is received
|
60
|
+
Then a notification should be generated
|
35
61
|
|
36
62
|
@time
|
37
|
-
Scenario: Check
|
38
|
-
Given check
|
39
|
-
When a
|
63
|
+
Scenario: Check critical and alerted to critical for 1 minute
|
64
|
+
Given the check is in an ok state
|
65
|
+
When a critical event is received
|
40
66
|
And 1 minute passes
|
41
|
-
And a
|
42
|
-
Then a notification should be generated
|
67
|
+
And a critical event is received
|
68
|
+
Then a notification should be generated
|
43
69
|
When 1 minute passes
|
44
|
-
And a
|
45
|
-
Then a notification should not be generated
|
70
|
+
And a critical event is received
|
71
|
+
Then a notification should not be generated
|
46
72
|
|
47
73
|
@time
|
48
|
-
Scenario: Check
|
49
|
-
Given check
|
50
|
-
When a
|
74
|
+
Scenario: Check critical and alerted to critical for 6 minutes
|
75
|
+
Given the check is in an ok state
|
76
|
+
When a critical event is received
|
51
77
|
And 1 minute passes
|
52
|
-
And a
|
53
|
-
Then a notification should be generated
|
78
|
+
And a critical event is received
|
79
|
+
Then a notification should be generated
|
54
80
|
When 6 minutes passes
|
55
|
-
And a
|
56
|
-
Then a notification should be generated
|
81
|
+
And a critical event is received
|
82
|
+
Then a notification should be generated
|
57
83
|
|
58
84
|
@time
|
59
|
-
Scenario: Check ok to
|
60
|
-
Given check
|
61
|
-
And check
|
62
|
-
When a
|
85
|
+
Scenario: Check ok to critical for 1 minute when in scheduled maintenance
|
86
|
+
Given the check is in an ok state
|
87
|
+
And the check is in scheduled maintenance
|
88
|
+
When a critical event is received
|
63
89
|
And 1 minute passes
|
64
|
-
And a
|
65
|
-
Then a notification should not be generated
|
90
|
+
And a critical event is received
|
91
|
+
Then a notification should not be generated
|
66
92
|
|
67
93
|
@time
|
68
|
-
Scenario: Check ok to
|
69
|
-
Given check
|
70
|
-
And check
|
71
|
-
When a
|
94
|
+
Scenario: Check ok to critical for 1 minute when in unscheduled maintenance
|
95
|
+
Given the check is in an ok state
|
96
|
+
And the check is in unscheduled maintenance
|
97
|
+
When a critical event is received
|
72
98
|
And 1 minute passes
|
73
|
-
And a
|
74
|
-
Then a notification should not be generated
|
99
|
+
And a critical event is received
|
100
|
+
Then a notification should not be generated
|
75
101
|
|
76
102
|
@time
|
77
|
-
Scenario: Check ok to
|
78
|
-
Given check
|
79
|
-
When a
|
103
|
+
Scenario: Check ok to critical for 1 minute, acknowledged, and critical for 6 minutes
|
104
|
+
Given the check is in an ok state
|
105
|
+
When a critical event is received
|
80
106
|
And 1 minute passes
|
81
|
-
And a
|
82
|
-
Then a notification should be generated
|
83
|
-
When an acknowledgement is received
|
107
|
+
And a critical event is received
|
108
|
+
Then a notification should be generated
|
109
|
+
When an acknowledgement event is received
|
84
110
|
And 6 minute passes
|
85
|
-
And a
|
86
|
-
Then a notification should not be generated
|
87
|
-
|
88
|
-
|
89
|
-
|
90
|
-
|
91
|
-
And a failure event is received for check 'abc' on entity 'def'
|
92
|
-
Then a notification should be generated for check 'abc' on entity 'def'
|
111
|
+
And a critical event is received
|
112
|
+
Then a notification should not be generated
|
113
|
+
|
114
|
+
@time
|
115
|
+
Scenario: Check critical to ok
|
116
|
+
Given the check is in a critical state
|
93
117
|
When 5 minutes passes
|
94
|
-
And
|
95
|
-
Then a notification should be generated
|
118
|
+
And a critical event is received
|
119
|
+
Then a notification should be generated
|
120
|
+
When 5 minutes passes
|
121
|
+
And an ok event is received
|
122
|
+
Then a notification should be generated
|
96
123
|
|
97
124
|
@time
|
98
|
-
Scenario: Check
|
99
|
-
Given check
|
100
|
-
When an acknowledgement event is received
|
101
|
-
Then a notification should be generated
|
125
|
+
Scenario: Check critical to ok when acknowledged
|
126
|
+
Given the check is in a critical state
|
127
|
+
When an acknowledgement event is received
|
128
|
+
Then a notification should be generated
|
102
129
|
When 1 minute passes
|
103
|
-
And an ok event is received
|
104
|
-
Then a notification should be generated
|
130
|
+
And an ok event is received
|
131
|
+
Then a notification should be generated
|
105
132
|
|
106
133
|
@time
|
107
|
-
Scenario: Check
|
108
|
-
Given check
|
109
|
-
When an acknowledgement event is received
|
110
|
-
Then a notification should be generated
|
134
|
+
Scenario: Check critical to ok when acknowledged, and fails after 6 minutes
|
135
|
+
Given the check is in a critical state
|
136
|
+
When an acknowledgement event is received
|
137
|
+
Then a notification should be generated
|
111
138
|
When 1 minute passes
|
112
|
-
And an ok event is received
|
113
|
-
Then a notification should be generated
|
139
|
+
And an ok event is received
|
140
|
+
Then a notification should be generated
|
114
141
|
When 6 minutes passes
|
115
|
-
And a
|
116
|
-
Then a notification should not be generated
|
142
|
+
And a critical event is received
|
143
|
+
Then a notification should not be generated
|
117
144
|
When 6 minutes passes
|
118
|
-
And a
|
119
|
-
Then a notification should be generated
|
145
|
+
And a critical event is received
|
146
|
+
Then a notification should be generated
|
120
147
|
|
121
148
|
@time
|
122
149
|
Scenario: Osciliating state, period of two minutes
|
123
|
-
Given check
|
124
|
-
When a
|
125
|
-
Then a notification should not be generated
|
150
|
+
Given the check is in an ok state
|
151
|
+
When a critical event is received
|
152
|
+
Then a notification should not be generated
|
126
153
|
When 50 seconds passes
|
127
|
-
And a
|
128
|
-
Then a notification should be generated
|
154
|
+
And a critical event is received
|
155
|
+
Then a notification should be generated
|
129
156
|
When 10 seconds passes
|
130
|
-
And an ok event is received
|
131
|
-
Then a notification should be generated
|
157
|
+
And an ok event is received
|
158
|
+
Then a notification should be generated
|
132
159
|
When 50 seconds passes
|
133
|
-
And an ok event is received
|
134
|
-
Then a notification should not be generated
|
160
|
+
And an ok event is received
|
161
|
+
Then a notification should not be generated
|
135
162
|
When 10 seconds passes
|
136
|
-
And a
|
137
|
-
Then a notification should not be generated
|
163
|
+
And a critical event is received
|
164
|
+
Then a notification should not be generated
|
138
165
|
When 50 seconds passes
|
139
|
-
And a
|
140
|
-
|
141
|
-
Then a notification should be generated for check 'abc' on entity 'def'
|
166
|
+
And a critical event is received
|
167
|
+
Then a notification should be generated
|
142
168
|
When 10 seconds passes
|
143
|
-
And an ok event is received
|
144
|
-
Then a notification should be generated
|
169
|
+
And an ok event is received
|
170
|
+
Then a notification should be generated
|
145
171
|
|
146
172
|
Scenario: Acknowledgement when ok
|
147
|
-
Given check
|
148
|
-
When an acknowledgement event is received
|
149
|
-
Then a notification should not be generated
|
150
|
-
|
151
|
-
Scenario: Acknowledgement when
|
152
|
-
Given check
|
153
|
-
When an acknowledgement event is received
|
154
|
-
Then a notification should be generated
|
155
|
-
|
156
|
-
Scenario: Brief
|
157
|
-
Given check
|
158
|
-
When a
|
173
|
+
Given the check is in an ok state
|
174
|
+
When an acknowledgement event is received
|
175
|
+
Then a notification should not be generated
|
176
|
+
|
177
|
+
Scenario: Acknowledgement when critical
|
178
|
+
Given the check is in a critical state
|
179
|
+
When an acknowledgement event is received
|
180
|
+
Then a notification should be generated
|
181
|
+
|
182
|
+
Scenario: Brief critical then OK
|
183
|
+
Given the check is in an ok state
|
184
|
+
When a critical event is received
|
159
185
|
And 10 seconds passes
|
160
|
-
And an ok event is received
|
161
|
-
Then a notification should not be generated
|
186
|
+
And an ok event is received
|
187
|
+
Then a notification should not be generated
|
162
188
|
|
189
|
+
@time
|
163
190
|
Scenario: Flapper (down for one minute, up for one minute, repeat)
|
164
|
-
Given check
|
165
|
-
When a
|
166
|
-
Then a notification should not be generated
|
191
|
+
Given the check is in an ok state
|
192
|
+
When a critical event is received
|
193
|
+
Then a notification should not be generated
|
167
194
|
When 10 seconds passes
|
168
|
-
And a
|
169
|
-
Then a notification should not be generated
|
195
|
+
And a critical event is received
|
196
|
+
Then a notification should not be generated
|
170
197
|
When 10 seconds passes
|
171
|
-
And a
|
172
|
-
Then a notification should not be generated
|
198
|
+
And a critical event is received
|
199
|
+
Then a notification should not be generated
|
173
200
|
When 10 seconds passes
|
174
201
|
# 30 seconds
|
175
|
-
And a
|
176
|
-
Then a notification should be generated
|
202
|
+
And a critical event is received
|
203
|
+
Then a notification should be generated
|
177
204
|
When 10 seconds passes
|
178
|
-
And a
|
179
|
-
Then a notification should not be generated
|
205
|
+
And a critical event is received
|
206
|
+
Then a notification should not be generated
|
180
207
|
When 10 seconds passes
|
181
|
-
And a
|
182
|
-
Then a notification should not be generated
|
208
|
+
And a critical event is received
|
209
|
+
Then a notification should not be generated
|
183
210
|
When 10 seconds passes
|
184
211
|
# 60 seconds
|
185
|
-
And an ok event is received
|
186
|
-
Then a notification should be generated
|
212
|
+
And an ok event is received
|
213
|
+
Then a notification should be generated
|
187
214
|
When 10 seconds passes
|
188
|
-
And an ok event is received
|
189
|
-
Then a notification should not be generated
|
215
|
+
And an ok event is received
|
216
|
+
Then a notification should not be generated
|
190
217
|
When 10 seconds passes
|
191
|
-
And an ok event is received
|
192
|
-
Then a notification should not be generated
|
218
|
+
And an ok event is received
|
219
|
+
Then a notification should not be generated
|
193
220
|
When 10 seconds passes
|
194
|
-
And an ok event is received
|
195
|
-
Then a notification should not be generated
|
221
|
+
And an ok event is received
|
222
|
+
Then a notification should not be generated
|
196
223
|
When 10 seconds passes
|
197
|
-
And an ok event is received
|
198
|
-
Then a notification should not be generated
|
224
|
+
And an ok event is received
|
225
|
+
Then a notification should not be generated
|
199
226
|
When 10 seconds passes
|
200
|
-
And an ok event is received
|
201
|
-
Then a notification should not be generated
|
227
|
+
And an ok event is received
|
228
|
+
Then a notification should not be generated
|
202
229
|
When 10 seconds passes
|
203
230
|
# 120 seconds
|
204
|
-
And a
|
205
|
-
Then a notification should not be generated
|
231
|
+
And a critical event is received
|
232
|
+
Then a notification should not be generated
|
206
233
|
When 10 seconds passes
|
207
|
-
And a
|
208
|
-
Then a notification should not be generated
|
234
|
+
And a critical event is received
|
235
|
+
Then a notification should not be generated
|
209
236
|
When 10 seconds passes
|
210
|
-
And a
|
211
|
-
Then a notification should not be generated
|
237
|
+
And a critical event is received
|
238
|
+
Then a notification should not be generated
|
212
239
|
When 10 seconds passes
|
213
240
|
# 150 seconds
|
214
|
-
And a
|
215
|
-
Then a notification should be generated
|
241
|
+
And a critical event is received
|
242
|
+
Then a notification should be generated
|
216
243
|
When 10 seconds passes
|
217
|
-
And a
|
218
|
-
Then a notification should not be generated
|
244
|
+
And a critical event is received
|
245
|
+
Then a notification should not be generated
|
219
246
|
When 10 seconds passes
|
220
|
-
And a
|
221
|
-
Then a notification should not be generated
|
247
|
+
And a critical event is received
|
248
|
+
Then a notification should not be generated
|
222
249
|
When 10 seconds passes
|
223
250
|
# 180 seconds
|
224
|
-
And an ok event is received
|
225
|
-
Then a notification should be generated
|
251
|
+
And an ok event is received
|
252
|
+
Then a notification should be generated
|
253
|
+
|
254
|
+
# commenting out this test for now, will revive it
|
255
|
+
# when working on gh-119
|
256
|
+
# @time
|
257
|
+
# Scenario: a lot of quick ok -> warning -> ok -> warning
|
258
|
+
# Given the check is in an ok state
|
259
|
+
# When 10 seconds passes
|
260
|
+
# And a warning event is received
|
261
|
+
# Then a notification should not be generated
|
262
|
+
# When 10 seconds passes
|
263
|
+
# And an ok event is received
|
264
|
+
# Then a notification should not be generated
|
265
|
+
# When 10 seconds passes
|
266
|
+
# And a warning event is received
|
267
|
+
# Then a notification should not be generated
|
268
|
+
# When 10 seconds passes
|
269
|
+
# And a warning event is received
|
270
|
+
# Then a notification should not be generated
|
271
|
+
# When 10 seconds passes
|
272
|
+
# And a warning event is received
|
273
|
+
# Then a notification should not be generated
|
274
|
+
# When 10 seconds passes
|
275
|
+
# And an ok event is received
|
276
|
+
# Then a notification should not be generated
|
277
|
+
# When 10 seconds passes
|
278
|
+
# And a warning event is received
|
279
|
+
# Then a notification should not be generated
|
280
|
+
# When 10 seconds passes
|
281
|
+
# And an ok event is received
|
282
|
+
# Then a notification should not be generated
|
283
|
+
# When 10 seconds passes
|
284
|
+
# And a warning event is received
|
285
|
+
# Then a notification should not be generated
|
286
|
+
# When 10 seconds passes
|
287
|
+
# And a warning event is received
|
288
|
+
# Then a notification should not be generated
|
289
|
+
# When 10 seconds passes
|
290
|
+
# And a warning event is received
|
291
|
+
# Then a notification should not be generated
|
292
|
+
# When 10 seconds passes
|
293
|
+
# And a warning event is received
|
294
|
+
# Then a notification should be generated
|
295
|
+
# When 10 seconds passes
|
296
|
+
# And a warning event is received
|
297
|
+
# Then a notification should not be generated
|
298
|
+
# When 10 seconds passes
|
299
|
+
# And a warning event is received
|
300
|
+
# Then a notification should not be generated
|
301
|
+
# When 10 seconds passes
|
302
|
+
# And an ok event is received
|
303
|
+
# Then a notification should be generated
|
304
|
+
# When 10 seconds passes
|
305
|
+
# And a warning event is received
|
306
|
+
# Then a notification should not be generated
|
307
|
+
# When 10 seconds passes
|
308
|
+
# And a warning event is received
|
309
|
+
# Then a notification should not be generated
|
310
|
+
# When 10 seconds passes
|
311
|
+
# And a warning event is received
|
312
|
+
# Then a notification should not be generated
|
313
|
+
# When 10 seconds passes
|
314
|
+
# And an ok event is received
|
315
|
+
# Then a notification should not be generated
|
316
|
+
# When 10 seconds passes
|
317
|
+
# And a warning event is received
|
318
|
+
# Then a notification should not be generated
|
319
|
+
# When 10 seconds passes
|
320
|
+
# And a warning event is received
|
321
|
+
# Then a notification should not be generated
|
322
|
+
# When 10 seconds passes
|
323
|
+
# And a warning event is received
|
324
|
+
# Then a notification should not be generated
|
325
|
+
# When 10 seconds passes
|
326
|
+
# And an ok event is received
|
327
|
+
# Then a notification should not be generated
|
328
|
+
# When 10 seconds passes
|
329
|
+
# And an ok event is received
|
330
|
+
# Then a notification should not be generated
|
331
|
+
# When 10 seconds passes
|
332
|
+
# And an ok event is received
|
333
|
+
# Then a notification should not be generated
|
334
|
+
# When 10 seconds passes
|
335
|
+
# And an ok event is received
|
336
|
+
# Then a notification should not be generated
|
337
|
+
# When 10 seconds passes
|
338
|
+
# And an ok event is received
|
339
|
+
# Then a notification should not be generated
|
340
|
+
# When 10 seconds passes
|
341
|
+
# And a warning event is received
|
342
|
+
# Then a notification should not be generated
|
343
|
+
# When 10 seconds passes
|
344
|
+
# And a warning event is received
|
345
|
+
# Then a notification should not be generated
|
346
|
+
# When 10 seconds passes
|
347
|
+
# And an ok event is received
|
348
|
+
# Then a notification should not be generated
|
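The scenarios in this feature file imply a small set of timing rules: a failure must persist before it is notified, an identical alert is not repeated straight away, and a recovery is only notified if a problem was. The sketch below models just those rules so the expected outcomes can be checked by hand. It is an illustration, not the code in `lib/flapjack/filters/delays.rb`: the class name `DelaySketch` and the two thresholds (30 and 300 seconds) are assumptions chosen to be consistent with the scenarios, and acknowledgements and maintenance periods are deliberately left out.

```ruby
# Simplified model of the notification delays exercised by the scenarios.
class DelaySketch
  FAILURE_DELAY = 30   # assumed: seconds a failing state must persist before alerting
  RESEND_DELAY  = 300  # assumed: seconds before an identical alert is repeated

  def initialize
    @state = 'ok'
    @changed_at = 0
    @alerted_state = nil # state named in the last problem alert since recovery
    @alerted_at = nil
  end

  # Feed one event (state string, time in seconds); returns true when a
  # notification would be generated under the assumed rules.
  def notify?(state, now)
    if state != @state
      @state = state
      @changed_at = now
    end

    if state == 'ok'
      # A recovery is only announced if a problem was announced before it.
      recovered = !@alerted_state.nil?
      @alerted_state = nil
      @alerted_at = nil
      return recovered
    end

    return false if now - @changed_at < FAILURE_DELAY
    return false if @alerted_state == state && now - @alerted_at < RESEND_DELAY

    @alerted_state = state
    @alerted_at = now
    true
  end
end

sketch = DelaySketch.new
sketch.notify?('critical', 0)   # => false, failure has only just started
sketch.notify?('critical', 60)  # => true, failing for one minute
sketch.notify?('critical', 120) # => false, same alert sent one minute ago
```

Tracking the last alerted state separately from the current state is what lets the "ok to warning, 1 min, then critical" scenario alert a second time once the critical state has itself lasted long enough.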