cfn-guardian 0.4.0 → 0.6.2

Files changed (47)
  1. checksums.yaml +4 -4
  2. data/.github/workflows/build-gem.yml +25 -0
  3. data/.github/workflows/release-gem.yml +25 -0
  4. data/.github/workflows/release-image.yml +33 -0
  5. data/.rspec +1 -0
  6. data/Gemfile.lock +13 -13
  7. data/README.md +3 -819
  8. data/cfn-guardian.gemspec +1 -3
  9. data/docs/alarm_templates.md +130 -0
  10. data/docs/cli.md +182 -0
  11. data/docs/composite_alarms.md +24 -0
  12. data/docs/custom_checks/azure_file_check.md +28 -0
  13. data/docs/custom_checks/domain_expiry.md +10 -0
  14. data/docs/custom_checks/http.md +59 -0
  15. data/docs/custom_checks/log_group_metric_filters.md +27 -0
  16. data/docs/custom_checks/nrpe.md +29 -0
  17. data/docs/custom_checks/port.md +40 -0
  18. data/docs/custom_checks/sftp.md +73 -0
  19. data/docs/custom_checks/sql.md +44 -0
  20. data/docs/custom_checks/tls.md +25 -0
  21. data/docs/custom_metrics.md +71 -0
  22. data/docs/event_subscriptions.md +67 -0
  23. data/docs/maintenance_mode.md +85 -0
  24. data/docs/notifiers.md +33 -0
  25. data/docs/overview.md +22 -0
  26. data/docs/resources.md +93 -0
  27. data/docs/variables.md +58 -0
  28. data/lib/cfnguardian.rb +72 -58
  29. data/lib/cfnguardian/cloudwatch.rb +43 -32
  30. data/lib/cfnguardian/compile.rb +82 -5
  31. data/lib/cfnguardian/deploy.rb +2 -16
  32. data/lib/cfnguardian/display_formatter.rb +1 -2
  33. data/lib/cfnguardian/error.rb +4 -0
  34. data/lib/cfnguardian/models/alarm.rb +40 -28
  35. data/lib/cfnguardian/models/check.rb +30 -12
  36. data/lib/cfnguardian/models/event.rb +43 -15
  37. data/lib/cfnguardian/models/event_subscription.rb +96 -0
  38. data/lib/cfnguardian/resources/azure_file.rb +20 -0
  39. data/lib/cfnguardian/resources/base.rb +111 -26
  40. data/lib/cfnguardian/resources/ec2_instance.rb +11 -0
  41. data/lib/cfnguardian/resources/http.rb +1 -0
  42. data/lib/cfnguardian/resources/rds_cluster.rb +14 -0
  43. data/lib/cfnguardian/resources/rds_instance.rb +71 -0
  44. data/lib/cfnguardian/stacks/main.rb +7 -6
  45. data/lib/cfnguardian/stacks/resources.rb +34 -5
  46. data/lib/cfnguardian/version.rb +1 -1
  47. metadata +35 -10
checksums.yaml CHANGED
@@ -1,7 +1,7 @@
  ---
  SHA256:
-   metadata.gz: 97b6d983e52d77b70d2cea9e302ef3e02c377acc60bff4223fd2560a670293c5
-   data.tar.gz: 76cbf80c45d2af2213513b093516ed302dc527ec278d837c492a7c5736f58f91
+   metadata.gz: dbe63daad265c8b1868992f73fbb38fd68a65833745264f2a276ea1cbe9a4cda
+   data.tar.gz: d3bb8f9a33a80d6e56b8040b712ffcfca3b0d5c958e22cc7d6983c204cbd65d4
  SHA512:
-   metadata.gz: 7f248dc477c03b555afcee3bc74ac9d5f92be9c5937fccb6775aa9384ccaebde6dab76384106eebc8805180db1e773190636d50c3816b0e8ab6d8b872f708deb
-   data.tar.gz: a69c1358fc076d1c79a8f0ed1eaf85d8d6f0b0c334372fcff92785ad7ac4fbbb5f6828c72d7c6a05637cd6831e76ae03afb957a90521b0a502f132c1ecdf4568
+   metadata.gz: 3135ca24580cbdbf1d361f3c4ada2e0fcf5b401d3a9a1aaf01d0ac366e51375d6de4251d41254cdfe5153f190ec339c7f79e5d79917cc31a502a9d216a8bee33
+   data.tar.gz: f8f9b8a6846747ff1e5cb2ae38c07302d2ef10ee0c5b8d617461c958f68316e36d93f2a8fb7cec4308023861ef70b48d26e212248f96b53e0988589e6d57fb41
data/.github/workflows/build-gem.yml ADDED
@@ -0,0 +1,25 @@
+ name: test and build gem
+ on:
+   push:
+     branches: [ master ]
+   pull_request:
+     branches: [ master ]
+
+ jobs:
+   build:
+     name: test + build
+     runs-on: ubuntu-latest
+
+     steps:
+       - uses: actions/checkout@v2
+       - name: set up ruby 2.7
+         uses: actions/setup-ruby@v1
+         with:
+           ruby-version: 2.7.x
+       - name: rspec
+         run: |
+           gem install rspec
+           rspec
+       - name: build gem
+         run: |
+           gem build cfn-guardian.gemspec
data/.github/workflows/release-gem.yml ADDED
@@ -0,0 +1,25 @@
+ name: release gem
+
+ on:
+   release:
+     types: [published]
+
+ jobs:
+   build:
+     name: Build and publish gem
+     runs-on: ubuntu-latest
+
+     steps:
+       - name: Check out the repo
+         uses: actions/checkout@v2
+
+       - name: Set up ruby 2.7
+         uses: actions/setup-ruby@v1
+         with:
+           ruby-version: 2.7.x
+
+       - name: Publish gem
+         uses: dawidd6/action-publish-gem@v1
+         with:
+           api_key: ${{secrets.RUBYGEMS_API_KEY}}
+           github_token: ${{secrets.GITHUB_TOKEN}}
data/.github/workflows/release-image.yml ADDED
@@ -0,0 +1,33 @@
+ name: release docker image
+
+ on:
+   release:
+     types: [published]
+
+ jobs:
+   build:
+     name: Build + Publish Container Image
+     runs-on: ubuntu-latest
+
+     steps:
+       - name: Check out the repo
+         uses: actions/checkout@v2
+
+       - name: Set up Docker Buildx
+         uses: docker/setup-buildx-action@v1
+
+       - name: Login to GitHub Container Repository
+         uses: docker/login-action@v1
+         with:
+           registry: ghcr.io
+           username: ${{ github.repository_owner }}
+           password: ${{ secrets.GHCR_PUSH_TOKEN }}
+
+       - name: Build and push Container Image to GitHub Container Repository
+         uses: docker/build-push-action@v2
+         with:
+           context: .
+           file: ./Dockerfile
+           push: true
+           tags: ghcr.io/base2services/guardian:${{ github.event.release.tag_name }}
+           build-args: GUARDIAN_VERSION=${{ github.event.release.tag_name }}
data/.rspec ADDED
@@ -0,0 +1 @@
+ --require spec_helper
data/Gemfile.lock CHANGED
@@ -1,7 +1,7 @@
  PATH
    remote: .
    specs:
-     cfn-guardian (0.3.4)
+     cfn-guardian (0.6.0)
        aws-sdk-cloudformation (~> 1.31, < 2)
        aws-sdk-cloudwatch (~> 1.28, < 2)
        aws-sdk-codecommit (~> 1.28, < 2)
@@ -16,9 +16,9 @@ GEM
    remote: https://rubygems.org/
    specs:
      aws-eventstream (1.1.0)
-     aws-partitions (1.337.0)
-     aws-sdk-cloudformation (1.40.0)
-       aws-sdk-core (~> 3, >= 3.99.0)
+     aws-partitions (1.390.0)
+     aws-sdk-cloudformation (1.44.0)
+       aws-sdk-core (~> 3, >= 3.109.0)
        aws-sigv4 (~> 1.1)
      aws-sdk-cloudwatch (1.40.0)
        aws-sdk-core (~> 3, >= 3.99.0)
@@ -29,25 +29,25 @@ GEM
      aws-sdk-codepipeline (1.33.0)
        aws-sdk-core (~> 3, >= 3.99.0)
        aws-sigv4 (~> 1.1)
-     aws-sdk-core (3.103.0)
+     aws-sdk-core (3.109.2)
        aws-eventstream (~> 1, >= 1.0.2)
        aws-partitions (~> 1, >= 1.239.0)
        aws-sigv4 (~> 1.1)
        jmespath (~> 1.0)
-     aws-sdk-kms (1.35.0)
-       aws-sdk-core (~> 3, >= 3.99.0)
+     aws-sdk-kms (1.39.0)
+       aws-sdk-core (~> 3, >= 3.109.0)
        aws-sigv4 (~> 1.1)
-     aws-sdk-s3 (1.72.0)
-       aws-sdk-core (~> 3, >= 3.102.1)
+     aws-sdk-s3 (1.84.0)
+       aws-sdk-core (~> 3, >= 3.109.0)
        aws-sdk-kms (~> 1)
        aws-sigv4 (~> 1.1)
-     aws-sigv4 (1.2.1)
+     aws-sigv4 (1.2.2)
        aws-eventstream (~> 1, >= 1.0.2)
-     cfndsl (1.1.1)
+     cfndsl (1.2.0)
        hana (~> 1.3)
      hana (1.3.6)
      jmespath (1.4.0)
-     rake (10.5.0)
+     rake (13.0.1)
      sync (0.5.0)
      term-ansicolor (1.7.1)
        tins (~> 1.0)
@@ -64,7 +64,7 @@ PLATFORMS
  DEPENDENCIES
    bundler (~> 2.0)
    cfn-guardian!
-   rake (~> 10.0)
+   rake (~> 13.0)

  BUNDLED WITH
     2.0.2
data/README.md CHANGED
@@ -1,11 +1,14 @@
  # CfnGuardian

+ [Documentation](docs/overview.md)
+
  CfnGuardian is a AWS monitoring tool with a few capabilities:

  - creates cloudwatch alarms through cloudformation based upon resources defined in a YAML config
  - alerting through SNS using 4 levels of severity [ Critical, Warning, Task, Informational ]
  - has a standard set of default alarms across many AWS resources
  - creates cloudwatch log metric filters with default alarms
+ - creates specfic aws events with sns targets
  - creates custom metrics for external checks through lambda functions such as
    - http endpoint availability
    - http status code matching
@@ -38,822 +41,3 @@ CfnGuardian is a AWS monitoring tool with a few capabilities:
  - Redshift Cluster
  - SQS Queues
  - LogGroup Metric Filters
-
- ## Installation
-
- ```ruby
- gem install cfn-guardian
- ```
-
- ## Commands
-
- **compile**
-
- Generates CloudFormation templates from the alarm configuration and output to the out/ directory.
-
- ```bash
- Usage:
- cfn-guardian compile c, --config=CONFIG
-
- Options:
- c, --config=CONFIG # yaml config file
- [--validate], [--no-validate] # validate cfn templates
- # Default: true
- [--bucket=BUCKET] # provide custom bucket name, will create a default bucket if not provided
- r, [--region=REGION] # set the AWS region
- [--debug], [--no-debug] # enable debug logging
- ```
-
- **deploy**
-
- Generates CloudFormation templates from the alarm configuration and output to the out/ directory. Then copies the files to the s3 bucket and deploys the Cloudformation.
-
- ```bash
- Usage:
- cfn-guardian deploy c, --config=CONFIG
-
- Options:
- c, --config=CONFIG # yaml config file
- [--bucket=BUCKET] # provide custom bucket name, will create a default bucket if not provided
- r, [--region=REGION] # set the AWS region
- s, [--stack-name=STACK_NAME] # set the Cloudformation stack name. Defaults to `guardian`
- [--sns-critical=SNS_CRITICAL] # sns topic arn for the critical alamrs
- [--sns-warning=SNS_WARNING] # sns topic arn for the warning alamrs
- [--sns-task=SNS_TASK] # sns topic arn for the task alamrs
- [--sns-informational=SNS_INFORMATIONAL] # sns topic arn for the informational alamrs
- [--debug], [--no-debug] # enable debug logging
- ```
-
- **show-alarms**
-
- Displays the configured settings for each alarm. Can be filtered by resource group and alarm name. Defaults to show all configured alarms.
-
- ```bash
- Usage:
- cfn-guardian show-alarms c, --config=CONFIG
-
- Options:
- c, --config=CONFIG # yaml config file
- g, [--group=GROUP] # resource group
- a, [--alarm=ALARM] # alarm name
- [--id=ID] # resource id
- [--compare], [--no-compare] # compare config to deployed alarms
- [--defaults], [--no-defaults] # show default alarm and properites
- [--debug], [--no-debug] # enable debug logging
- ```
-
- **show-history**
-
- Displays the alarm state or config history for the last 7 days. Alarms can be described in 2 different ways:
-
- 1. Using the config to describe the alarms and filter via the group, alarm and resource id.
- 2. Supplying a list of alarm names with the `--alarm-names` option.
-
- *NOTE: Options 2 may find alarms not in the guardian stack.*
-
- ```bash
- Usage:
- cfn-guardian show-history
-
- Options:
- c, [--config=CONFIG] # yaml config file
- g, [--group=GROUP] # resource group
- a, [--alarm=ALARM] # alarm name
- [--alarm-names=one two three] # CloudWatch alarm name if not providing config
- [--id=ID] # resource id
- t, [--type=TYPE] # filter by alarm state
- # Default: state
- # Possible values: state, config
- [--debug], [--no-debug] # enable debug logging
- ```
-
- **show-state**
-
- Displays the current CloudWatch alarm state. Alarms can be described in 3 different ways:
-
- 1. Using the config to describe the alarms and filter via the group, alarm and resource id.
- 2. Supplying a list of alarm names with the `--alarm-names` option.
- 3. Supplying the alarm name prefix using the `--alarm-prefix` option. For example `--alarm-prefix ECS` will find all the ECSCluster related alarms.
-
- *NOTE: Options 2 and 3 may find alarms not in the guardian stack.*
-
- ```bash
- Usage:
- cfn-guardian show-state
-
- Options:
- c, [--config=CONFIG] # yaml config file
- g, [--group=GROUP] # resource group
- a, [--alarm=ALARM] # alarm name
- [--id=ID] # resource id
- s, [--state=STATE] # filter by alarm state
- # Possible values: OK, ALARM, INSUFFICIENT_DATA
- [--alarm-names=one two three] # CloudWatch alarm name if not providing config
- [--alarm-prefix=ALARM_PREFIX] # CloudWatch alarm name prefix if not providing config
- [--debug], [--no-debug] # enable debug logging
- ```
-
- **show-drift**
-
- Displays any Cloudformation drift detection in the CloudWatch alarms from the deployed stacks.
-
- ```bash
- Usage:
- cfn-guardian show-drift
-
- Options:
- s, [--stack-name=STACK_NAME] # set the Cloudformation stack name
- # Default: guardian
- [--debug], [--no-debug] # enable debug logging
- ```
-
- ## Alarm Notifications
-
- There are 4 default notification levels used by Guardian Critical, Warning, Task, Informational. If you wish to recieve notifications for each of these you need to supply an sns topic arn in the alarms.yaml
-
- ```yaml
- Topics:
-   Critical: arn:aws:sns:ap-southeast-2:123456789012:Critical
-   Warning: arn:aws:sns:ap-southeast-2:123456789012:Warning
-   Task: arn:aws:sns:ap-southeast-2:123456789012:Task
-   Informational: arn:aws:sns:ap-southeast-2:123456789012:Informational
- ```
-
- Each alarm has a default notification level but can be overriden in the config using the `AlarmAction` property at either the alarm group or alarm level. See the [Overriding Defaults](#overriding-defaults) section on how to do that.
-
- You can add your own notification topics to the topics section and combine them with the existing topics. `AlarmAction` property will accept both a string and array of notication topics.
-
- ```yaml
- Topics:
-   Critical: arn:aws:sns:ap-southeast-2:123456789012:Critical
-   Warning: arn:aws:sns:ap-southeast-2:123456789012:Warning
-   Task: arn:aws:sns:ap-southeast-2:123456789012:Task
-   Informational: arn:aws:sns:ap-southeast-2:123456789012:Informational
-   CustomTopic: arn:aws:sns:ap-southeast-2:123456789012:Custom
-
- Template:
-   Ec2Instance:
-     GroupOverrides:
-       AlarmActions:
-       - Critical
-       - Custom
- ```
-
- ### SNS Topics
-
- Create the topics before launching the guardian stack
-
- ```bash
- aws sns create-topic --name Guardian-Critical
- aws sns create-topic --name Guardian-Warning
- aws sns create-topic --name Guardian-Task
- aws sns create-topic --name Guardian-Informational
- ```
-
- SNS topics can be defined in the YAML config or during the `deploy` command using the sns switches. The full ARN must be used.
-
- ```yaml
- Topics:
-   Critical: arn:aws:sns:ap-southeast-2:111111111111:Guardian-Critical
-   Warning: arn:aws:sns:ap-southeast-2:111111111111:Guardian-Warning
-   Task: arn:aws:sns:ap-southeast-2:111111111111:Guardian-Task
-   Informational: arn:aws:sns:ap-southeast-2:111111111111:Guardian-Informational
- ```
-
- ## Configuration
-
- Config is stored in a standard YAML file which will default to `alarms.yaml`. This can be overridden by supplying the `--config` switch.
-
- ### AWS Resources
-
- The resources key is where the resources are defined.
-
- ```yaml
- Resources:
-   # resource group
-   Ec2Instance:
-   # Array of resources defining the resource id with the Id: key
-   - Id: i-1a2b3c4d5e
- ```
-
- There are some resources that require more that the resource id to generate the alarm, for these cases addition key:values are required.
-
- ```yaml
- Resources:
-   ApplicationTargetGroup:
-   - Id: target-group-id
-     # Target group requires the loadbalancer id for the alarm
-     Loadbalancer: app/application-loadbalancer-id
- ```
-
- | Resource Group | Require Keys |
- | --------------------------- | ---------------- |
- | ApiGateway | Id |
- | AmazonMQBroker | Id |
- | AutoScalingGroup | Id |
- | DynamoDBTable | Id |
- | ElastiCacheReplicationGroup | Id |
- | ElasticFileSystem | Id |
- | Ec2Instance | Id |
- | EcsCluster | Id |
- | EcsService | Id, Cluster |
- | NetworkTargetGroup | Id, LoadBalancer |
- | ApplicationTargetGroup | Id, LoadBalancer |
- | ElasticLoadBalancer | Id |
- | RDSInstance | Id |
- | RDSClusterInstance | Id |
- | RedshiftCluster | Id |
- | Lambda | Id |
- | CloudFrontDistribution | Id |
- | SQSQueue | Id |
-
- ### Alarm Defaults
-
- To list the default alarms use the `show-alarms` command with the `--defaults` switch.
- The list can be filtered using the `--group ApplicationTargetGroup` and `--alarm TargetResponseTime` optional switches
-
- ```sh
- cfn-guardian show-alarms --defaults --group ApplicationTargetGroup --alarm TargetResponseTime
-
- +-------------------------+----------------------------------+
- | ApplicationTargetGroup::TargetResponseTime |
- | guardian-ApplicationTargetGroup-Default-TargetResponseTime |
- +-------------------------+----------------------------------+
- | Property | Config |
- +-------------------------+----------------------------------+
- | ResourceId | Default |
- | ResourceHash | 7a1920d61156abc05a60135aefe8bc67 |
- | Enabled | true |
- | MetricName | TargetResponseTime |
- | Dimensions | |
- | Threshold | 5 |
- | Period | 60 |
- | EvaluationPeriods | 5 |
- | ComparisonOperator | GreaterThanThreshold |
- | Statistic | Maximum |
- | ActionsEnabled | true |
- | AlarmAction | Critical |
- | TreatMissingData | notBreaching |
- +-------------------------+----------------------------------+
- ```
-
- ### Friendly Resource Names
-
- You can set a friendly name which will replace the resource id in the alarm name.
- The resource id will still be available in the alarm description.
-
- ```yaml
- Resources:
-   ApplicationTargetGroup:
-   - Id: target-group-id
-     Loadbalancer: app/application-loadbalancer-id
-     Name: webapp
- ```
-
- ### Log Group Metric Filters
-
- Metric filters creates the metric filter and a corresponding alarm.
- Cloudwatch NameSpace: `MetricFilters`
-
- AWS [documentation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html) of pattern syntax
-
- ```yaml
- Resources:
-   LogGroup:
-   # Log group name
-   - Id: /aws/lambda/myfuntion
-     # List of metric filters
-     MetricFilters:
-     # Name of the cloud watch metric
-     - MetricName: MyFunctionErrors
-       # search pattern, see aws docs for syntax
-       Pattern: error
-       # metric to push to cloudwatch. Optional as it defaults to 1
-       MetricValue: 1
-
- Templates:
-   LogGroup:
-     # use the MetricName name to override the alarm defaults
-     MyFunctionErrors:
-       Threshold: 10
- ```
-
- ### Custom Metric Resources
-
- These are also defined under the resources key but more detail is required and differs per group.
-
- #### Http
-
- Cloudwatch NameSpace: `HttpCheck`
-
- ```yaml
- Resources:
-   Http:
-   # Array of resources defining the http endpoint with the Id: key
-   - Id: https://api.example.com
-     # enables the status code check
-     StatusCode: 200
-     # enables the SSL check
-     Ssl: true
-     # boolean tp request a compressed response
-     Compressed: true
-   - Id: https://www.example.com
-     StatusCode: 301
-   - Id: https://example.com
-     StatusCode: 200
-     Ssl: true
-     # enables the body regex check
-     BodyRegex: 'helloworld'
-   - Id: http://www.example.com/images/cat.jpg
-     StatusCode: 200
-     # md5 hash of the image
-     BodyRegex: ae49b4246a89efcb5c639f00a013e812
-   - Id: https://api.example.com/user
-     StatusCode: 201
-     # default method is get but can be overridden to support post/put/head etc
-     Method: post
-     # specify headers using "key=value key=value"
-     Headers: content-type=application/json
-     # pass in custom payload for the request
-     Payload: '{"name": "john"}'
- ```
-
- #### InternalHttp
-
- Cloudwatch NameSpace: `InternalHttpCheck`
-
- ```yaml
- Resources:
-   InternalHttp:
-   # Array of host groups with the uniq identifier of Environment.
-   # This will create a nrpe lambda per group attach to the defined vpc and subnets
-   - Environment: Prod
-     # VPC id for the vpc the EC2 hosts are running in
-     VpcId: vpc-1234
-     # Array of subnets to attach to the lambda function. Supply multiple if you want to be multi AZ.
-     # Multiple subnets from the same AZ cannot be used!
-     Subnets:
-     - subnet-abcd
-     Hosts:
-     # Array of resources defining the http endpoint with the Id: key
-     # All the same options as Http including ssl check on the internal endpoint
-     - Id: http://api.example.com
- ```
-
- #### Port
-
- Cloudwatch NameSpace: `PortCheck`
-
- ```yaml
- Resources:
-   Port:
-   # Array of resources defining the endpoint with the Id: key and Port: Int
-   - Id: api.example.com
-     Port: 443
-     # can override the default timeout of 120 seconds
-     Timeout: 60
- ```
-
- #### InternalPort
-
- Cloudwatch NameSpace: `InternalPortCheck`
-
- ```yaml
- Resources:
-   InternalPort:
-   # Array of host groups with the uniq identifier of Environment.
-   # This will create a nrpe lambda per group attach to the defined vpc and subnets
-   - Environment: Prod
-     # VPC id for the vpc the EC2 hosts are running in
-     VpcId: vpc-1234
-     # Array of subnets to attach to the lambda function. Supply multiple if you want to be multi AZ.
-     # Multiple subnets from the same AZ cannot be used!
-     Subnets:
-     - subnet-abcd
-     Hosts:
-     # Array of resources defining the endpoint with the Id: key and Port: Int
-     # All the same options as Port
-     - Id: api.example.com
-       Port: 8080
- ```
-
- #### DomainExpiry
-
- Cloudwatch NameSpace: `DNS`
-
- ```yaml
- Resources:
-   DomainExpiry:
-   # Array of resources defining the domain with the Id: key
-   - Id: example.com
- ```
-
- #### Nrpe
-
- Cloudwatch NameSpace: `NRPE`
-
- *Note: This requires the nrpe agent running and configured on your EC2 Host*
-
- ```yaml
- Resources:
-   Nrpe:
-   # Array of host groups with the uniq identifier of Environment.
-   # This will create a nrpe lambda per group attach to the defined vpc and subnets
-   - Environment: Prod
-     # VPC id for the vpc the EC2 hosts are running in
-     VpcId: vpc-1234
-     # Array of subnets to attach to the lambda function. Supply multiple if you want to be multi AZ.
-     # Multiple subnets from the same AZ cannot be used!
-     Subnets:
-     - subnet-abcd
-     Hosts:
-     # Array of hosts with the Id: key defining the host private ip address
-     - Id: 10.150.10.6
-       # Array of nrpe commands to run against the host.
-       # A custom metric and alarm is created for each command
-       Commands:
-       - check_disk
-     - Id: 10.150.10.6
-       Commands:
-       - check_disk
- ```
-
- #### Sql
-
- Cloudwatch NameSpace: `SQL`
-
- ```yaml
- Resources:
-   Sql:
-   # Array of host groups with the uniq identifier of Environment.
-   # This will create a sql lambda per group attach to the defined vpc and subnets
-   - Environment: Prod
-     # VPC id for the vpc the EC2 hosts are running in
-     VpcId: vpc-1234
-     # Array of subnets to attach to the lambda function. Supply multiple if you want to be multi AZ.
-     # Multiple subnets from the same AZ cannot be used!
-     Subnets:
-     - subnet-1234
-     Hosts:
-     # Array of hosts with the Id: key defining the host private ip address
-     - Id: my-rds-instance.example.com
-       # Secret manager secret where the sql:// connection string key:value is defined
-       # { "connectionString": "sql://username:password@mydb:3306/information_schema"}
-       SecretId: MyTestDatabaseSecret
-       # Database engine. supports mysql | postgres | mssql
-       Engine: mysql
-       Queries:
-       # Array of SQL queries
-       # MetricName used to create the custom metric and alarm
-       - MetricName: LongRunningTransactions
-         # SQL Query to execute
-         Query: >-
-           SELECT pl.host,trx_id,trx_started,trx_query
-           FROM information_schema.INNODB_TRX it INNER
-           JOIN information_schema.PROCESSLIST pl
-           ON pl.Id=it.trx_mysql_thread_id
-           WHERE it.trx_started < (NOW() - INTERVAL 4 HOUR);
- ```
-
- Create secretmanager secret:
-
- ```bash
- aws secretsmanager create-secret --name MyTestDatabaseSecret \
-     --description "My test database secret for use with guardian sql check" \
-     --secret-string '{"connectionString":"sql://username:password@mydb:3306/information_schema"}'
- ```
-
- #### SFTP
-
- CloudWatch Namespace: `SftpCheck`
-
- ```yaml
- Resources:
-   SFTP:
-   # sftp endpoint, can accept both ip address or dns endpoint
-   - Id: example.com
-     # sftp user to test connection with
-     User: user
-     # optionally set port, defaults to port 22
-     Port: 22
-     # for added security you can use allowed hosts when creating a
-     # connection to the sftp by supplying the public key of the sftp server.
-     # this removes the security risk for man in the middle attacks.
-     ServerKey: public-server-key
-     # ssm parameter path for the password for the SFTP user.
-     Password: /ssm/path/password
-     # ssm parameter path for the private key for the SFTP user
-     PrivateKey: /ssm/path/privatekey
-     # ssm parameter path for the password for the private key
-     PrivateKeyPass: /ssm/path/privatekey/password
-     # optionally set a file to check its existence and test the time it takes to get the file
-     File: file.txt
-     # optionally check for a regex match pattern in the body of the file
-     FileBodyMatch: ok
- ```
-
- #### InternalSFTP
-
- CloudWatch Namespace: `InternalSftpCheck`
-
- ```yaml
- Resources:
-   InternalSFTP:
-   # Array of host groups with the uniq identifier of Environment.
-   # This will create a sql lambda per group attach to the defined vpc and subnets
-   - Environment: Prod
-     # VPC id for the vpc the EC2 hosts are running in
-     VpcId: vpc-1234
-     # Array of subnets to attach to the lambda function. Supply multiple if you want to be multi AZ.
-     # Multiple subnets from the same AZ cannot be used!
-     Subnets:
-     - subnet-1234
-     Hosts:
-     # Array of sftp hosts with the Id: key defining the host private ip address
-     - Id: example.com
-       User: user
-       Port: 22
-       ServerKey: public-server-key
-       Password: /ssm/path/password
-       PrivateKey: /ssm/path/privatekey
-       PrivateKeyPass: /ssm/path/privatekey/password
-       File: file.txt
-       FileBodyMatch: ok
- ```
-
- #### TLS
-
- CloudWatch Namespace: `TLSVersionCheck`
-
- ```yaml
- Resources:
-   TLS:
-   # endpoint
-   - Id: example.com
-     # port to check, defaults to 443
-     Port: 443
-     # list of tls versions to validate against
-     # there is a metric for each version with a 0 being no supported and 1 for supported
-     # alarm thresholds will have to be adjusted to suit your checking requirements
-     # defaults to all versions shown below
-     Versions:
-     - SSLv2
-     - SSLv3
-     - TLSv1
-     - TLSv1.1
-     - TLSv1.2
-     # checks and reports the max tls version supported as an int
-     # ['SSLv2 => 1', 'SSLv3 => 2', 'TLSv1 => 3','TLSv1.1 => 4', 'TLSv1.2 => 5']
-     MaxSupported: '1'
- ```
-
- ## Alarm Templates
-
- Each resource group has a set of default alarm templates which defines all the cloudwatch alarm options such as Threshold, Statistic, EvaluationPeriods etc. These can be manipulated in a few ways to change the values or create new alarms.
-
- Custom alarm templates are defined within the same YAML config file un the `Templates` key.
-
- ### Overriding Defaults
-
- Alarm properties such as `Threshold`, `AlarmAction`, etc can be overriden at the alarm level or at the alarm group level.
-
- **Alarm Group Overrides**
-
- Alarm group level overrides apply to all alarms within the alarm group.
-
- ```yaml
- Templates:
-   # define the resource group
-   Ec2Instance:
-     # GroupOverrides key denotes the group level overrides
-     GroupOverrides:
-       # supply the key value of the alarm property you want to override
-       AlarmAction: Informational
- ```
-
- **Alarm Overrides**
-
- Alarm overrides apply only to the alarm the property is applied to. This will override any alarm group level overrides.
-
- ```yaml
- Templates:
-   # define the resource group
-   Ec2Instance:
-     # define the Alarm name you want to override
-     CPUUtilizationHigh:
-       # supply the key value of the alarm property you want to override
-       Threshold: 80
- ```
-
- ### Creating A New Alarm From A Default
-
- You can create a default alarm from a default alarm using the `Inherit:` key. This will inherit all properites from the default alarm which can then be overridden.
-
- ```yaml
- Templates:
-   # define the resource group
-   Ec2Instance:
-     # define the Alarm name you want to override
-     CPUUtilizationWarning:
-       # Inherit the CPUUtilizationHigh alarm
-       Inherit: CPUUtilizationHigh
-       # supply the key value of the alarm property you want to override
-       Threshold: 75
-       EvaluationPeriods: 60
-       AlarmAction: Warning
- ```
-
- ### Creating A New Alarm With No Defaults
-
- You can create a new alarm with out inheriting an existing one. This will the inherit the default properties for the resource group.
-
- ```yaml
- Templates:
-   # define the resource group
-   Ec2Instance:
-     # define the Alarm name you want to override
-     CPUUtilizationWarning:
-       # metric name must be provided
-       MetricName: CPUUtilization
-       # supply the key value of the alarm property you want to override
-       Statistic: Minimum
-       Threshold: 75
-       EvaluationPeriods: 60
-       AlarmAction: Warning
- ```
-
- ### Disabling An Alarm
-
- You can disable an alarm by setting the alarm to `false`
-
- ```yaml
- Templates:
-   # define the resource group
-   Ec2Instance:
-     # define the Alarm and set the value to false
-     CPUUtilizationHigh: false
- ```
-
- ### Creating A New Resource Group
-
- You can create a new resource group based upon an existing resource group. For example if you had 2 target groups and wanted to disable an alarm for one but not the other you can create a new resource group which will inherit all the ApplicationTargetGroup alarms and the disabled the select alarm.
-
- ```yaml
- Resources:
-   # the default resource group
-   ApplicationTargetGroup:
-   - Id: ApiTG
-     LoadBalancer: MyPublicLB
-   - Id: WebTG
-     LoadBalancer: MyPublicLB
-   - Id: ServiceTG
-     LoadBalancer: MyPublicLB
-
-   # my new custom resource group
-   RedirectTargetGroup:
-   - Id: RedirectTG
-     LoadBalancer: MyPublicLB
-
- Templates:
-   # create the new resource group
-   RedirectTargetGroup:
-     # inherit the ApplicationTargetGroup resource group
-     Inherit: ApplicationTargetGroup
-     # disable the selected alarm
-     TargetResponseTime: false
- ```
-
726
- ## M Out Of N Metric Data Points
-
- This is useful for alerting on groups of spikes within a certain time frame without getting alerts for individual spikes.
- It works by setting `EvaluationPeriods` as the N value and `DatapointsToAlarm` as the M value.
- The following example triggers the alarm if 6 out of 10 data points cross the threshold of 90% CPU utilisation in a 10 minute period.
-
- ```yaml
- Templates:
-   Ec2Instance:
-     CPUUtilizationHigh:
-       Threshold: 90
-       Period: 60
-       EvaluationPeriods: 10
-       DatapointsToAlarm: 6
- ```
-
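To make the M out of N rule concrete, here is a minimal Ruby sketch of the evaluation described above. It is an illustration only, not code from cfn-guardian or CloudWatch: the helper name `m_out_of_n_breach?` and the sample data are hypothetical, it assumes a greater-than-threshold comparison, and it ignores how CloudWatch treats missing data points.

```ruby
# Hypothetical helper for illustration; not part of cfn-guardian.
# Returns true when at least `datapoints_to_alarm` (M) of the most recent
# `evaluation_periods` (N) data points are above the threshold.
# Assumes a "greater than threshold" comparison and no missing data.
def m_out_of_n_breach?(datapoints, threshold:, evaluation_periods:, datapoints_to_alarm:)
  window = datapoints.last(evaluation_periods)
  window.count { |value| value > threshold } >= datapoints_to_alarm
end

# Ten made-up one-minute CPU samples (Period: 60, EvaluationPeriods: 10).
spiky     = [95, 40, 92, 35, 30, 91, 20, 25, 30, 35] # 3 samples above 90
sustained = [95, 96, 40, 92, 93, 35, 91, 94, 30, 97] # 7 samples above 90

settings = { threshold: 90, evaluation_periods: 10, datapoints_to_alarm: 6 }

puts m_out_of_n_breach?(spiky, **settings)     # false: isolated spikes
puts m_out_of_n_breach?(sustained, **settings) # true: 7 of 10 breach
```

With these settings the three isolated spikes stay below the 6 out of 10 requirement, while the sustained load crosses it.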
- ## Composite Alarms
-
- Composite alarms take into account a combination of alarm states and only alarm when all conditions in the rule are met. See the AWS [documentation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/APIReference/API_PutCompositeAlarm.html) for rule syntax.
-
- Using the `Composites:` top level key, create the alarm using the following syntax.
-
- **NOTE:** Each composite alarm costs $0.50/month
-
- ```yaml
- Composites:
-
-   # the key is used as the alarm name
-   AlarmName:
-     # Set the notification SNS topic, defaults to no notifications
-     Action: Informational
-     # Set a meaningful alarm description
-     Description: test
-     # Set the alarm rule by providing the alarm names. See above for rule syntax.
-     # Use the show-state command to get a list of the alarm names.
-     Rule: >-
-       ALARM(guardian-alarm-1)
-       AND
-       ALARM(guardian-alarm-2)
- ```
-
- ## Maintenance Mode
-
- CloudWatch alarms can be enabled and disabled to allow maintenance periods without getting alert notifications.
- Alarms can be provided to the function in the following ways.
-
- **Alarm Names**
-
- Alarm names can be provided as a space-delimited list using the `--alarms` switch.
-
- ```bash
- cfn-guardian disable-alarms --alarms alarm-1 alarm-2
- cfn-guardian enable-alarms --alarms alarm-1 alarm-2
- ```
-
- **Alarm Name Prefix**
-
- The alarm name prefix finds the alarms in the account and region that start with the provided string.
- This is useful for disabling all guardian alarms, all alarms for a resource group, or all alarms for a specific resource.
- Alarm names are created using the following convention.
-
- `guardian` - `ResourceGroupName` - `ResourceId` or `FriendlyName` - `AlarmName`
-
- The following example would disable/enable all alarms for all ECS Services
-
- ```bash
- cfn-guardian disable-alarms --alarm-prefix guardian-ECSService
- cfn-guardian enable-alarms --alarm-prefix guardian-ECSService
- ```
-
- The following example would disable/enable all alarms for the ECS Service app
-
- ```bash
- cfn-guardian disable-alarms --alarm-prefix guardian-ECSService-app
- cfn-guardian enable-alarms --alarm-prefix guardian-ECSService-app
- ```
-
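The naming convention and prefix matching above can be sketched in a few lines of Ruby. This is an illustration under stated assumptions, not cfn-guardian's implementation: the helpers `guardian_alarm_name` and `alarms_with_prefix` are hypothetical, the hyphen separator is inferred from the `guardian-ECSService-app` examples, and the `worker` resource and `SomeAlarm` alarm are made-up values.

```ruby
# Hypothetical helpers for illustration only; not part of the cfn-guardian API.

# Builds an alarm name from the convention, assuming hyphen separators.
def guardian_alarm_name(resource_group, resource, alarm)
  ['guardian', resource_group, resource, alarm].join('-')
end

# Returns the alarm names that start with the given prefix.
def alarms_with_prefix(alarm_names, prefix)
  alarm_names.select { |name| name.start_with?(prefix) }
end

# 'worker' and 'SomeAlarm' are made-up values for the example.
alarm_names = [
  guardian_alarm_name('ECSService', 'app', 'UnhealthyTaskCritical'),
  guardian_alarm_name('ECSService', 'worker', 'UnhealthyTaskCritical'),
  guardian_alarm_name('ECSCluster', 'prod', 'SomeAlarm')
]

# All ECS Service alarms (two names).
puts alarms_with_prefix(alarm_names, 'guardian-ECSService')

# Only the alarms for the ECS Service "app" (one name).
puts alarms_with_prefix(alarm_names, 'guardian-ECSService-app')
```

Because the match is on the start of the name, a prefix such as `guardian-ECSService-app` would also select a resource whose id merely begins with `app`. Ending the prefix with a hyphen narrows the match to that exact resource.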
- **Maintenance Groups**
-
- Maintenance groups are defined in the `alarms.yaml` config and create a logical mapping between alarms.
-
- ```yaml
- Resources:
-
-   ApplicationTargetGroup:
-   - Id: app-tg
-     LoadBalancer: public-lb
-
-   AutoScalingGroup:
-   - Id: ecs-asg
-
-   ECSCluster:
-   - Id: prod
-
-   ECSService:
-   - Id: app
-     Cluster: prod
-
-   Http:
-   - Id: https://myapp.com
-     StatusCode: 200
-
- # Define the top level key
- MaintenaceGroups:
-
-   # Define the group name
-   AppUpdate:
-     # Define the resource group
-     ECSService:
-       # define the alarms in the resource group
-       UnhealthyTaskCritical:
-       # define the resource id's
-       - Id: app
-       # or the friendly name
-       - Name: app
-     Http:
-       EndpointAvailable:
-       - Id: https://myapp.com
-       EndpointStatusCodeMatch:
-       - Id: https://myapp.com
- ```
-
- ```bash
- cfn-guardian disable-alarms --group AppUpdate
- cfn-guardian enable-alarms --group AppUpdate
- ```
-
- ## Contributing
-
- Bug reports and pull requests are welcome on GitHub at https://github.com/base2services/cfn-guardian.
-
- ## License
-
- The gem is available as open source under the terms of the [MIT License](https://opensource.org/licenses/MIT).