cfn-guardian 0.4.0 → 0.6.6

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (54)
  1. checksums.yaml +4 -4
  2. data/.github/workflows/build-gem.yml +25 -0
  3. data/.github/workflows/release-gem.yml +25 -0
  4. data/.github/workflows/release-image.yml +33 -0
  5. data/.rspec +1 -0
  6. data/Gemfile.lock +13 -13
  7. data/README.md +4 -820
  8. data/cfn-guardian.gemspec +1 -3
  9. data/docs/alarm_templates.md +130 -0
  10. data/docs/cli.md +182 -0
  11. data/docs/composite_alarms.md +24 -0
  12. data/docs/custom_checks/azure_file_check.md +28 -0
  13. data/docs/custom_checks/domain_expiry.md +10 -0
  14. data/docs/custom_checks/http.md +59 -0
  15. data/docs/custom_checks/log_group_metric_filters.md +27 -0
  16. data/docs/custom_checks/nrpe.md +29 -0
  17. data/docs/custom_checks/port.md +40 -0
  18. data/docs/custom_checks/sftp.md +73 -0
  19. data/docs/custom_checks/sql.md +44 -0
  20. data/docs/custom_checks/tls.md +25 -0
  21. data/docs/custom_metrics.md +71 -0
  22. data/docs/event_subscriptions.md +67 -0
  23. data/docs/maintenance_mode.md +85 -0
  24. data/docs/notifiers.md +33 -0
  25. data/docs/overview.md +22 -0
  26. data/docs/resources.md +93 -0
  27. data/docs/variables.md +58 -0
  28. data/lib/cfnguardian.rb +84 -66
  29. data/lib/cfnguardian/cloudwatch.rb +43 -32
  30. data/lib/cfnguardian/codecommit.rb +11 -2
  31. data/lib/cfnguardian/compile.rb +86 -5
  32. data/lib/cfnguardian/config/defaults.yaml +9 -0
  33. data/lib/cfnguardian/deploy.rb +2 -16
  34. data/lib/cfnguardian/display_formatter.rb +1 -2
  35. data/lib/cfnguardian/error.rb +4 -0
  36. data/lib/cfnguardian/models/alarm.rb +99 -29
  37. data/lib/cfnguardian/models/check.rb +30 -12
  38. data/lib/cfnguardian/models/event.rb +43 -15
  39. data/lib/cfnguardian/models/event_subscription.rb +111 -0
  40. data/lib/cfnguardian/resources/amazonmq_rabbitmq.rb +136 -0
  41. data/lib/cfnguardian/resources/azure_file.rb +20 -0
  42. data/lib/cfnguardian/resources/base.rb +111 -26
  43. data/lib/cfnguardian/resources/batch.rb +14 -0
  44. data/lib/cfnguardian/resources/ec2_instance.rb +11 -0
  45. data/lib/cfnguardian/resources/glue.rb +23 -0
  46. data/lib/cfnguardian/resources/http.rb +1 -0
  47. data/lib/cfnguardian/resources/rds_cluster.rb +14 -0
  48. data/lib/cfnguardian/resources/rds_instance.rb +80 -0
  49. data/lib/cfnguardian/resources/redshift_cluster.rb +2 -2
  50. data/lib/cfnguardian/resources/step_functions.rb +41 -0
  51. data/lib/cfnguardian/stacks/main.rb +7 -6
  52. data/lib/cfnguardian/stacks/resources.rb +34 -5
  53. data/lib/cfnguardian/version.rb +1 -1
  54. metadata +39 -10
@@ -0,0 +1,27 @@
+ # Log Group Metric Filters
+
+ A metric filter check creates a CloudWatch Logs metric filter and a corresponding alarm.
+ CloudWatch Namespace: `MetricFilters`
+
+ See the AWS [documentation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html) for the filter pattern syntax.
+
+ ```yaml
+ Resources:
+   LogGroup:
+     # Log group name
+     - Id: /aws/lambda/myfunction
+       # List of metric filters
+       MetricFilters:
+         # Name of the CloudWatch metric
+         - MetricName: MyFunctionErrors
+           # Filter pattern, see the AWS docs for syntax
+           Pattern: error
+           # Value published to CloudWatch for each matching log event. Optional, defaults to 1
+           MetricValue: 1
+
+ Templates:
+   LogGroup:
+     # Use the MetricName as the alarm name to override the alarm defaults
+     MyFunctionErrors:
+       Threshold: 10
+ ```
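The following minimal Ruby sketch shows how the metric filter configuration above fans out. It reads a config shaped like the example and lists one metric filter and alarm entry per `MetricFilters` item. It applies the documented `MetricValue` default of 1 and takes the `Threshold` override from `Templates`. The helper `metric_filter_alarms` and its returned keys are hypothetical and are not cfn-guardian's internal implementation.

```ruby
require 'yaml'

# Hypothetical helper (not cfn-guardian's API): expand LogGroup resources into one
# entry per metric filter, mirroring "one metric filter plus one alarm" per item.
def metric_filter_alarms(config)
  templates = config.dig('Templates', 'LogGroup') || {}
  (config.dig('Resources', 'LogGroup') || []).flat_map do |log_group|
    (log_group['MetricFilters'] || []).map do |filter|
      name = filter['MetricName']
      {
        log_group: log_group['Id'],
        namespace: 'MetricFilters',                  # namespace named in the docs
        metric: name,
        pattern: filter['Pattern'],
        value: filter.fetch('MetricValue', 1),       # documented default of 1
        threshold: templates.dig(name, 'Threshold')  # nil: keep the alarm default
      }
    end
  end
end

config = YAML.safe_load(<<~YAML)
  Resources:
    LogGroup:
      - Id: /aws/lambda/myfunction
        MetricFilters:
          - MetricName: MyFunctionErrors
            Pattern: error
  Templates:
    LogGroup:
      MyFunctionErrors:
        Threshold: 10
YAML

metric_filter_alarms(config).each { |alarm| p alarm }
```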
@@ -0,0 +1,29 @@
+ # NRPE
+
+ CloudWatch Namespace: `NRPE`
+
+ *Note: This requires the NRPE agent to be running and configured on your EC2 hosts.*
+
+ ```yaml
+ Resources:
+   Nrpe:
+     # Array of host groups, each uniquely identified by Environment.
+     # One NRPE Lambda function is created per group, attached to the defined VPC and subnets.
+     - Environment: Prod
+       # ID of the VPC the EC2 hosts are running in
+       VpcId: vpc-1234
+       # Subnets to attach to the Lambda function. Supply multiple subnets for multi-AZ.
+       # Multiple subnets from the same AZ cannot be used!
+       Subnets:
+         - subnet-abcd
+       Hosts:
+         # Array of hosts; the Id: key is the host's private IP address
+         - Id: 10.150.10.6
+           # Array of NRPE commands to run against the host.
+           # A custom metric and alarm are created for each command.
+           Commands:
+             - check_disk
+         - Id: 10.150.10.6
+           Commands:
+             - check_disk
+ ```
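The comments in the NRPE example describe a fan-out. Each `Environment` group gets one Lambda function, and each host and command pair gets one custom metric and alarm. This Ruby sketch counts both for a config. The helper `nrpe_plan`, the second host `10.150.10.7` and the `check_load` command are hypothetical additions for illustration only.

```ruby
require 'yaml'

# Hypothetical helper (not part of cfn-guardian): summarise what an Nrpe config
# implies under the documented rules: one Lambda function per Environment group,
# and one custom metric plus alarm per (host, command) pair.
def nrpe_plan(config)
  (config.dig('Resources', 'Nrpe') || []).map do |group|
    checks = (group['Hosts'] || []).flat_map do |host|
      (host['Commands'] || []).map { |command| "#{host['Id']}:#{command}" }
    end
    { environment: group['Environment'], checks: checks }
  end
end

# The second host and check_load are hypothetical.
config = YAML.safe_load(<<~YAML)
  Resources:
    Nrpe:
      - Environment: Prod
        VpcId: vpc-1234
        Subnets: [subnet-abcd]
        Hosts:
          - Id: 10.150.10.6
            Commands: [check_disk, check_load]
          - Id: 10.150.10.7
            Commands: [check_disk]
YAML

nrpe_plan(config).each do |group|
  # prints "Prod: 1 Lambda function, 3 metrics and alarms"
  puts "#{group[:environment]}: 1 Lambda function, #{group[:checks].length} metrics and alarms"
end
```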
@@ -0,0 +1,40 @@
+ # Port
+
+ The port check verifies that a TCP connection can be established to the specified port within the timeout.
+
+ ## Public Port Check
+
+ CloudWatch Namespace: `PortCheck`
+
+ ```yaml
+ Resources:
+   Port:
+     # Array of endpoints, each defined by the Id: key (hostname) and Port: (integer)
+     - Id: api.example.com
+       Port: 443
+       # Optional: override the default timeout of 120 seconds
+       Timeout: 60
+ ```
+
+ ## Private Port Check
+
+ CloudWatch Namespace: `InternalPortCheck`
+
+ ```yaml
+ Resources:
+   InternalPort:
+     # Array of host groups, each uniquely identified by Environment.
+     # One port check Lambda function is created per group, attached to the defined VPC and subnets.
+     - Environment: Prod
+       # ID of the VPC the hosts are running in
+       VpcId: vpc-1234
+       # Subnets to attach to the Lambda function. Supply multiple subnets for multi-AZ.
+       # Multiple subnets from the same AZ cannot be used!
+       Subnets:
+         - subnet-abcd
+       Hosts:
+         # Array of endpoints, each defined by the Id: key and Port: (integer)
+         # Supports the same options as Port
+         - Id: api.example.com
+           Port: 8080
+ ```
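To make "a TCP connection is established within the timeout" concrete, here is a minimal Ruby sketch of such a check built on the standard library's `Socket.tcp` with `connect_timeout:`. Returning 1 or 0 is an assumption about how availability might be reported as a metric value. The helper `port_available` is hypothetical and is not the Lambda function Guardian deploys.

```ruby
require 'socket'

# Minimal sketch of a TCP port check: 1 if a connection to host:port is
# established within `timeout` seconds, otherwise 0. Reporting 1/0 is an
# assumption about how availability might be published as a metric value.
def port_available(host, port, timeout: 120) # 120s mirrors the documented default
  Socket.tcp(host, port, connect_timeout: timeout) { |_socket| } # block form closes the socket
  1
rescue SystemCallError, SocketError, IOError # refused, timed out, DNS failure, etc.
  0
end

server = TCPServer.new('127.0.0.1', 0) # listen on any free local port
open_port = server.addr[1]
puts port_available('127.0.0.1', open_port, timeout: 5) # prints 1
server.close
puts port_available('127.0.0.1', open_port, timeout: 5) # prints 0: nothing is listening now
```

A `Timeout:` override in the config would correspond to the `timeout:` argument here.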
@@ -0,0 +1,73 @@
+ # SFTP
+
+ The SFTP check produces five metrics that alarms can be created from:
+
+ 1. `Available`: whether a connection could be made in a timely manner. A failure indicates a network problem, a DNS lookup failure or a server timeout.
+ 2. `ConnectionTime`: time taken to connect to the SFTP server, reported in milliseconds.
+ 3. `FileExists`: whether the specified file exists at the specified location.
+ 4. `FileGetTime`: time taken to download the specified file.
+ 5. `FileBodyMatch`: whether the body of the specified file matches the provided regex.
+
+ See [aws-lambda-sftp-check](https://github.com/base2Services/aws-lambda-sftp-check) for details of the check.
+
+ ## Public SFTP Check
+
+ The public SFTP check runs against a publicly reachable endpoint.
+
+ CloudWatch Namespace: `SftpCheck`
+
+ ```yaml
+ Resources:
+   SFTP:
+     # SFTP endpoint; accepts either an IP address or a DNS name
+     - Id: example.com
+       # SFTP user to test the connection with
+       User: user
+       # Optional: set the port, defaults to 22
+       Port: 22
+       # For added security, supply the SFTP server's public key so that the check
+       # only connects to a known host.
+       # This mitigates the risk of man-in-the-middle attacks.
+       ServerKey: public-server-key
+       # SSM parameter path of the SFTP user's password
+       Password: /ssm/path/password
+       # SSM parameter path of the SFTP user's private key
+       PrivateKey: /ssm/path/privatekey
+       # SSM parameter path of the private key's password
+       PrivateKeyPass: /ssm/path/privatekey/password
+       # Optional: a file whose existence is checked and whose download time is measured
+       File: file.txt
+       # Optional: a regex pattern to match against the body of the file
+       FileBodyMatch: ok
+ ```
+
+ ## Private SFTP Check
+
+ Use the private SFTP check against a private SFTP endpoint, or against a public SFTP endpoint that requires IP whitelisting. To whitelist, run the check in a private subnet that reaches the endpoint through a NAT gateway, and whitelist the NAT gateway's IP address in the SFTP server's security group.
+
+ CloudWatch Namespace: `InternalSftpCheck`
+
+ ```yaml
+ Resources:
+   InternalSFTP:
+     # Array of host groups, each uniquely identified by Environment.
+     # One SFTP check Lambda function is created per group, attached to the defined VPC and subnets.
+     - Environment: Prod
+       # ID of the VPC the SFTP hosts are running in
+       VpcId: vpc-1234
+       # Subnets to attach to the Lambda function. Supply multiple subnets for multi-AZ.
+       # Multiple subnets from the same AZ cannot be used!
+       Subnets:
+         - subnet-1234
+       Hosts:
+         # Array of SFTP hosts; the Id: key is the host's private IP address or DNS name
+         - Id: example.com
+           User: user
+           Port: 22
+           ServerKey: public-server-key
+           Password: /ssm/path/password
+           PrivateKey: /ssm/path/privatekey
+           PrivateKeyPass: /ssm/path/privatekey/password
+           File: file.txt
+           FileBodyMatch: ok
+ ```
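The five SFTP metrics depend on each other. The file metrics only make sense once a connection succeeds, and they only appear when `File` is set. The following Ruby sketch illustrates that relationship. It is a hypothetical model, not the code in aws-lambda-sftp-check. The `connect` and `fetch` callables stand in for a real SFTP client and are assumptions, with `fetch` returning the file body or nil when the file is missing.

```ruby
# Hypothetical sketch of how the five SFTP metrics relate. `connect` and `fetch`
# stand in for a real SFTP client; `fetch` returns the file body or nil.
def elapsed_ms
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond) - start
end

def sftp_metrics(connect:, fetch:, file: nil, body_match: nil)
  metrics = { 'Available' => 0 }
  metrics['ConnectionTime'] = elapsed_ms { connect.call }
  metrics['Available'] = 1
  return metrics if file.nil? # file metrics are only produced when File is set

  body = nil
  metrics['FileGetTime'] = elapsed_ms { body = fetch.call(file) }
  metrics['FileExists'] = body.nil? ? 0 : 1
  if body_match # FileBodyMatch is treated as a regular expression
    metrics['FileBodyMatch'] = body&.match?(Regexp.new(body_match)) ? 1 : 0
  end
  metrics
rescue StandardError # a failed connect leaves Available at 0; a failed fetch omits file metrics
  metrics
end

files = { 'file.txt' => "status: ok\n" } # hypothetical remote file
p sftp_metrics(connect: -> {}, fetch: ->(name) { files[name] }, file: 'file.txt', body_match: 'ok')
p sftp_metrics(connect: -> { raise Errno::ECONNREFUSED }, fetch: ->(_name) { nil })
```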
@@ -0,0 +1,44 @@
+ # SQL
+
+ CloudWatch Namespace: `SQL`
+
+ ```yaml
+ Resources:
+   Sql:
+     # Array of host groups, each uniquely identified by Environment.
+     # One SQL check Lambda function is created per group, attached to the defined VPC and subnets.
+     - Environment: Prod
+       # ID of the VPC the database hosts are running in
+       VpcId: vpc-1234
+       # Subnets to attach to the Lambda function. Supply multiple subnets for multi-AZ.
+       # Multiple subnets from the same AZ cannot be used!
+       Subnets:
+         - subnet-1234
+       Hosts:
+         # Array of hosts; the Id: key is the database host's private IP address or DNS name
+         - Id: my-rds-instance.example.com
+           # Secrets Manager secret whose connectionString key holds the sql:// connection string
+           # { "connectionString": "sql://username:password@mydb:3306/information_schema" }
+           SecretId: MyTestDatabaseSecret
+           # Database engine: mysql, postgres or mssql
+           Engine: mysql
+           Queries:
+             # Array of SQL queries
+             # MetricName is used to create the custom metric and alarm
+             - MetricName: LongRunningTransactions
+               # SQL query to execute
+               Query: >-
+                 SELECT pl.host, trx_id, trx_started, trx_query
+                 FROM information_schema.INNODB_TRX it
+                 INNER JOIN information_schema.PROCESSLIST pl
+                 ON pl.Id = it.trx_mysql_thread_id
+                 WHERE it.trx_started < (NOW() - INTERVAL 4 HOUR);
+ ```
+
+ Create the Secrets Manager secret:
+
+ ```bash
+ aws secretsmanager create-secret --name MyTestDatabaseSecret \
+     --description "My test database secret for use with guardian sql check" \
+     --secret-string '{"connectionString":"sql://username:password@mydb:3306/information_schema"}'
+ ```
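The secret holds a single `connectionString` value in the `sql://user:password@host:port/database` layout shown above. This Ruby sketch splits that string into its parts with the standard `URI` library. It also builds the `--secret-string` JSON with `JSON.generate`, which avoids hand-escaping quotes. The helper `parse_connection_string` and its returned keys are hypothetical. How the Guardian Lambda function actually parses the string is not covered here.

```ruby
require 'json'
require 'uri'

# Sketch: split the connectionString stored in the secret into its parts,
# following the sql://user:password@host:port/database layout shown above.
# The helper name and returned keys are hypothetical.
def parse_connection_string(secret_json)
  uri = URI.parse(JSON.parse(secret_json).fetch('connectionString'))
  {
    user: uri.user,
    password: uri.password,
    host: uri.host,
    port: uri.port,
    database: uri.path.delete_prefix('/')
  }
end

# Building the secret string with JSON.generate avoids hand-escaping quotes.
secret = JSON.generate({ 'connectionString' => 'sql://username:password@mydb:3306/information_schema' })
puts secret # prints the same JSON passed to --secret-string above
p parse_connection_string(secret)
```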
@@ -0,0 +1,25 @@
+ # TLS
+
+ CloudWatch Namespace: `TLSVersionCheck`
+
+ ```yaml
+ Resources:
+   TLS:
+     # Endpoint to check
+     - Id: example.com
+       # Port to check, defaults to 443
+       Port: 443
+       # List of TLS versions to validate against.
+       # A metric is published for each version: 0 if the version is not supported, 1 if it is supported.
+       # Alarm thresholds will have to be adjusted to suit your checking requirements.
+       # Defaults to all of the versions shown below.
+       Versions:
+         - SSLv2
+         - SSLv3
+         - TLSv1
+         - TLSv1.1
+         - TLSv1.2
+       # Checks and reports the maximum TLS version supported as an integer:
+       # SSLv2 => 1, SSLv3 => 2, TLSv1 => 3, TLSv1.1 => 4, TLSv1.2 => 5
+       MaxSupported: '1'
+ ```
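The TLS check publishes a 0/1 metric per version and, with `MaxSupported`, an integer taken from the mapping in the config comment. This Ruby sketch computes both from a set of per-version results. The input hash of `version => true/false` is hypothetical and stands in for the real probes. Returning 0 when no version is supported is also an assumption.

```ruby
# Ordinal values used by MaxSupported, as listed in the config comment above.
TLS_VERSION_RANK = {
  'SSLv2' => 1, 'SSLv3' => 2, 'TLSv1' => 3, 'TLSv1.1' => 4, 'TLSv1.2' => 5
}.freeze

# Sketch: turn per-version probe results (hypothetical input of
# version => true/false) into the per-version 1/0 metric values and the
# MaxSupported integer. Returning 0 when nothing is supported is an assumption.
def tls_metrics(results)
  per_version = results.transform_values { |supported| supported ? 1 : 0 }
  ranks = results.select { |_version, supported| supported }.keys.map { |v| TLS_VERSION_RANK.fetch(v) }
  [per_version, ranks.max || 0]
end

per_version, max_supported = tls_metrics({
  'SSLv2' => false, 'SSLv3' => false, 'TLSv1' => false, 'TLSv1.1' => true, 'TLSv1.2' => true
})
p per_version
puts max_supported # prints 5
```

With this encoding, an alarm on `MaxSupported` falling below 5 would flag an endpoint that no longer negotiates TLSv1.2. An alarm on the `SSLv3` metric reaching 1 would flag a legacy protocol being enabled. Either threshold would have to be set in the alarm template.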
@@ -0,0 +1,71 @@
+ # Alarms for Custom Metrics
+
+ Custom metrics can be used within Guardian either by creating a new alarm template for an existing resource group or by creating a new resource group.
+
+ ## Existing Resource Group
+
+ You can add custom metrics to an existing resource group by adding an alarm to the resource group's template and overriding the desired alarm properties. Resource [variables](variables.md) can be used to reference values within the template dimensions.
+
+ ```yaml
+ Templates:
+   Ec2Instance:
+     LowDiskSpaceRootVolume:
+       # Set the metric namespace
+       Namespace: CWAgent
+       # Set the metric name
+       MetricName: DiskSpaceUsedPercent
+       # Set the custom dimensions
+       Dimensions:
+         path: '/'
+         # Reference the resource Id from the resource group
+         host: ${Resource::Id}
+         device: 'xvda1'
+         fstype: 'ext4'
+       # Override the default properties set by the base template
+       Statistic: Maximum
+       Threshold: 85
+       Period: 60
+       EvaluationPeriods: 1
+       TreatMissingData: breaching
+
+ # Create the resource
+ Resources:
+   Ec2Instance:
+     - Id: i-12345678
+ ```
+
+ ## New Resource Group
+
+ When creating alarms for a new resource group, you can inherit the base alarm template to generate the structure of the alarm. Its properties can then be overridden in the template. Resource [variables](variables.md) can be used to reference values within the template dimensions.
+
+ For example, the following creates an alarm for a disk usage metric generated by the CloudWatch agent on an EC2 instance.
+
+ ```yaml
+ Templates:
+   CustomGroup:
+     # Inherit the base alarm template
+     Inherit: Base
+     LowDiskSpaceRootVolume:
+       # Set the metric namespace
+       Namespace: CWAgent
+       # Set the metric name
+       MetricName: DiskSpaceUsedPercent
+       # Set the custom dimensions
+       Dimensions:
+         path: '/'
+         # Reference the resource Id from the resource group
+         host: ${Resource::Id}
+         device: 'xvda1'
+         fstype: 'ext4'
+       # Override the default properties set by the base template
+       Statistic: Maximum
+       Threshold: 85
+       Period: 60
+       EvaluationPeriods: 1
+       TreatMissingData: breaching
+
+ # Create the resource
+ Resources:
+   CustomGroup:
+     - Id: i-12345678
+ ```
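In both examples the `host: ${Resource::Id}` dimension is filled in from each resource entry. This Ruby sketch shows one way such substitution could work, resolving `${Resource::<Key>}` against a resource hash. The regex and the helper `resolve_dimensions` are hypothetical and are not Guardian's implementation. See variables.md for the variables Guardian actually supports.

```ruby
# Sketch of resolving ${Resource::<Key>} variables in alarm dimensions against
# one resource entry. The regex and resolve_dimensions are hypothetical.
RESOURCE_VARIABLE = /\$\{Resource::(\w+)\}/

def resolve_dimensions(dimensions, resource)
  dimensions.transform_values do |value|
    # Replace each ${Resource::Key} with resource['Key']; fetch raises on unknown keys
    value.to_s.gsub(RESOURCE_VARIABLE) { resource.fetch(Regexp.last_match(1)) }
  end
end

dimensions = { 'path' => '/', 'host' => '${Resource::Id}', 'device' => 'xvda1', 'fstype' => 'ext4' }
p resolve_dimensions(dimensions, { 'Id' => 'i-12345678' })
```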
@@ -0,0 +1,67 @@
+ # Event Subscriptions
+
+ Event subscriptions create CloudWatch Events that are triggered by events from AWS resources, such as an EC2 instance termination.
+
+
+ ## Default Events
+
+ As with the default alarms in Guardian, there are default events for some resource types. These events are deployed for each resource of those types unless the event is disabled.
+
+
+ ## Overriding Defaults
+
+ Default properties of the events can be overridden through the config YAML using the `EventSubscription` top-level key.
+ For example, the following changes the topic the event is sent to.
+
+ ```yaml
+ Topics:
+   CustomEvents: arn:aws:sns....
+
+ EventSubscription:
+   Ec2Instance:
+     InstanceTerminated:
+       Topic: CustomEvents
+ ```
+
+ ## Disabling Default Events
+
+ Default events can be disabled through the config YAML, the same way default alarms are disabled.
+
+ ```yaml
+ EventSubscription:
+   Ec2Instance:
+     # Set the event to false to disable it
+     InstanceTerminated: false
+ ```
+
+ ## Creating Custom Events
+
+ Custom events can be created where no default exists for an event. They can inherit from a default event or from the base event model.
+
+ ### Inheriting From Default Event
+
+ This is useful when a default event already has the same format as the new event you want to create.
+ The following example inherits the `MasterPasswordReset` RDS event and creates a new event that captures an RDS instance being added to a security group.
+
+ ```yaml
+ EventSubscription:
+   RDSInstance:
+     # Name of the new event
+     DBNewSecurityGroup:
+       # Inherit the default event
+       Inherit: MasterPasswordReset
+       # Override the required properties
+       Message: The DB instance has been added to a security group.
+ ```
+
+ ### Create Event From Scratch
+
+ If no default event matches the format you require, you can create an event from the base event subscription model.
+
+ ```yaml
+ EventSubscription:
+   ECSCluster:
+     ContainerInstanceStateChange:
+       Source: aws.ecs
+       DetailType: ECS Container Instance State Change
+ ```
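The override, disable and from-scratch rules above can be modelled as a hash merge: user settings are merged over the defaults, and `false` removes an event. This Ruby sketch also shows how `Source` and `DetailType` would map onto the `source` and `detail-type` fields of a CloudWatch Events (EventBridge) pattern, which match against arrays of values. The default `InstanceTerminated` definition is a simplified, hypothetical stand-in for Guardian's own. `Inherit` is not modelled.

```ruby
require 'json'

# Sketch of the documented override rules: settings under EventSubscription are
# merged over a resource group's defaults, and `false` disables an event.
# New (from-scratch) events are simply added. Inherit is not modelled.
def resolve_events(defaults, overrides)
  merged = defaults.merge(overrides) do |_name, default, override|
    override == false ? false : default.merge(override)
  end
  merged.reject { |_name, event| event == false }
end

# Source and DetailType map onto the "source" and "detail-type" pattern fields.
def event_pattern(event)
  { 'source' => [event['Source']], 'detail-type' => [event['DetailType']] }
end

# Hypothetical, simplified default event for illustration only.
defaults = {
  'InstanceTerminated' => {
    'Source' => 'aws.ec2', 'DetailType' => 'EC2 Instance State-change Notification', 'Topic' => 'Default'
  }
}
overrides = { 'InstanceTerminated' => { 'Topic' => 'CustomEvents' } }

resolve_events(defaults, overrides).each do |name, event|
  puts "#{name} -> #{event['Topic']}: #{JSON.generate(event_pattern(event))}"
end
p resolve_events(defaults, { 'InstanceTerminated' => false }) # prints {}
```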
@@ -0,0 +1,85 @@
+ # Maintenance Mode
+
+ CloudWatch alarms can be disabled and re-enabled to allow maintenance periods without alert notifications.
+ Alarms can be selected in the following ways.
+
+ ## Alarm Names
+
+ Alarm names can be provided as a space-delimited list using the `--alarms` switch.
+
+ ```bash
+ cfn-guardian disable-alarms --alarms alarm-1 alarm-2
+ cfn-guardian enable-alarms --alarms alarm-1 alarm-2
+ ```
+
+ ## Alarm Name Prefix
+
+ The alarm name prefix finds all alarms in the account and region whose names start with the provided string.
+ This is useful to disable all Guardian alarms, all alarms for a resource group, or all alarms for a specific resource.
+ Alarm names are created using the following convention.
+
+ `guardian` - `ResourceGroupName` - `ResourceId` or `FriendlyName` - `AlarmName`
+
+ The following example disables/enables all alarms for all ECS Services:
+
+ ```bash
+ cfn-guardian disable-alarms --alarm-prefix guardian-ECSService
+ cfn-guardian enable-alarms --alarm-prefix guardian-ECSService
+ ```
+
+ The following example disables/enables all alarms for the ECS Service `app`:
+
+ ```bash
+ cfn-guardian disable-alarms --alarm-prefix guardian-ECSService-app
+ cfn-guardian enable-alarms --alarm-prefix guardian-ECSService-app
+ ```
+
+ ## Maintenance Groups
+
+ Maintenance groups are defined in the YAML configuration file and create a logical grouping of alarms.
+
+ ```yaml
+ Resources:
+
+   ApplicationTargetGroup:
+     - Id: app-tg
+       LoadBalancer: public-lb
+
+   AutoScalingGroup:
+     - Id: ecs-asg
+
+   ECSCluster:
+     - Id: prod
+
+   ECSService:
+     - Id: app
+       Cluster: prod
+
+   Http:
+     - Id: https://myapp.com
+       StatusCode: 200
+
+ # Define the top level key
+ MaintenaceGroups:
+
+   # Define the group name
+   AppUpdate:
+     # Define the resource group
+     ECSService:
+       # Define the alarms in the resource group
+       UnhealthyTaskCritical:
+         # Define the resource IDs
+         - Id: app
+         # or the friendly names
+         - Name: app
+     Http:
+       EndpointAvailable:
+         - Id: https://myapp.com
+       EndpointStatusCodeMatch:
+         - Id: https://myapp.com
+ ```
+
+ ```bash
+ cfn-guardian disable-alarms --group AppUpdate
+ cfn-guardian enable-alarms --group AppUpdate
+ ```
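This Ruby sketch applies the alarm naming convention and prefix selection described for `--alarm-prefix`, and shows what a prefix actually selects. The alarm list, the `app-worker` service and `ExampleAlarm` are hypothetical. The CLI itself looks alarms up in the account and region.

```ruby
# Sketch of the documented naming convention,
#   guardian-<ResourceGroupName>-<ResourceId or FriendlyName>-<AlarmName>,
# and of selecting alarms by name prefix as --alarm-prefix is described to do.
def alarm_name(group, resource, alarm)
  ['guardian', group, resource, alarm].join('-')
end

def alarms_with_prefix(names, prefix)
  names.select { |name| name.start_with?(prefix) }
end

names = [
  alarm_name('ECSService', 'app', 'UnhealthyTaskCritical'),
  alarm_name('ECSService', 'app-worker', 'UnhealthyTaskCritical'), # hypothetical second service
  alarm_name('ECSCluster', 'prod', 'ExampleAlarm')                 # hypothetical alarm name
]

# Prints both ECSService alarm names: the prefix also catches app-worker.
puts alarms_with_prefix(names, 'guardian-ECSService-app')
```

Because matching is by prefix, `guardian-ECSService-app` also selects any service whose ID starts with `app`. Where that matters, a maintenance group or an explicit `--alarms` list is more precise.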