cfn-guardian 0.3.3 → 0.6.4

Files changed (56)
  1. checksums.yaml +4 -4
  2. data/.github/workflows/build-gem.yml +25 -0
  3. data/.github/workflows/release-gem.yml +25 -0
  4. data/.github/workflows/release-image.yml +33 -0
  5. data/.rspec +1 -0
  6. data/Gemfile.lock +24 -24
  7. data/README.md +4 -772
  8. data/cfn-guardian.gemspec +1 -3
  9. data/docs/alarm_templates.md +130 -0
  10. data/docs/cli.md +182 -0
  11. data/docs/composite_alarms.md +24 -0
  12. data/docs/custom_checks/azure_file_check.md +28 -0
  13. data/docs/custom_checks/domain_expiry.md +10 -0
  14. data/docs/custom_checks/http.md +59 -0
  15. data/docs/custom_checks/log_group_metric_filters.md +27 -0
  16. data/docs/custom_checks/nrpe.md +29 -0
  17. data/docs/custom_checks/port.md +40 -0
  18. data/docs/custom_checks/sftp.md +73 -0
  19. data/docs/custom_checks/sql.md +44 -0
  20. data/docs/custom_checks/tls.md +25 -0
  21. data/docs/custom_metrics.md +71 -0
  22. data/docs/event_subscriptions.md +67 -0
  23. data/docs/maintenance_mode.md +85 -0
  24. data/docs/notifiers.md +33 -0
  25. data/docs/overview.md +22 -0
  26. data/docs/resources.md +93 -0
  27. data/docs/variables.md +58 -0
  28. data/lib/cfnguardian.rb +76 -62
  29. data/lib/cfnguardian/cloudwatch.rb +43 -32
  30. data/lib/cfnguardian/compile.rb +87 -4
  31. data/lib/cfnguardian/config/defaults.yaml +9 -0
  32. data/lib/cfnguardian/deploy.rb +2 -16
  33. data/lib/cfnguardian/display_formatter.rb +1 -2
  34. data/lib/cfnguardian/error.rb +4 -0
  35. data/lib/cfnguardian/models/alarm.rb +101 -29
  36. data/lib/cfnguardian/models/check.rb +30 -12
  37. data/lib/cfnguardian/models/event.rb +43 -15
  38. data/lib/cfnguardian/models/event_subscription.rb +96 -0
  39. data/lib/cfnguardian/resources/amazonmq_rabbitmq.rb +136 -0
  40. data/lib/cfnguardian/resources/azure_file.rb +20 -0
  41. data/lib/cfnguardian/resources/base.rb +126 -26
  42. data/lib/cfnguardian/resources/ec2_instance.rb +11 -0
  43. data/lib/cfnguardian/resources/http.rb +1 -0
  44. data/lib/cfnguardian/resources/internal_http.rb +8 -8
  45. data/lib/cfnguardian/resources/internal_port.rb +4 -4
  46. data/lib/cfnguardian/resources/internal_sftp.rb +8 -8
  47. data/lib/cfnguardian/resources/log_group.rb +2 -2
  48. data/lib/cfnguardian/resources/rds_cluster.rb +14 -0
  49. data/lib/cfnguardian/resources/rds_instance.rb +80 -0
  50. data/lib/cfnguardian/resources/redshift_cluster.rb +2 -2
  51. data/lib/cfnguardian/resources/sftp.rb +1 -1
  52. data/lib/cfnguardian/resources/sql.rb +2 -2
  53. data/lib/cfnguardian/stacks/main.rb +9 -8
  54. data/lib/cfnguardian/stacks/resources.rb +35 -6
  55. data/lib/cfnguardian/version.rb +1 -1
  56. metadata +33 -7
@@ -0,0 +1,27 @@
1
+ # Log Group Metric Filters
2
+
3
+ Each entry under `MetricFilters` creates a metric filter and a corresponding alarm.
4
+ CloudWatch Namespace: `MetricFilters`
5
+
6
+ See the AWS [documentation](https://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html) for the filter pattern syntax.
7
+
8
+ ```yaml
9
+ Resources:
10
+ LogGroup:
11
+ # Log group name
12
+ - Id: /aws/lambda/myfunction
13
+ # List of metric filters
14
+ MetricFilters:
15
+ # Name of the CloudWatch metric
16
+ - MetricName: MyFunctionErrors
17
+ # search pattern, see aws docs for syntax
18
+ Pattern: error
19
+ # value to publish to CloudWatch on each match. Optional, defaults to 1
20
+ MetricValue: 1
21
+
22
+ Templates:
23
+ LogGroup:
24
+ # use the MetricName to override the alarm defaults
25
+ MyFunctionErrors:
26
+ Threshold: 10
27
+ ```
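A minimal Ruby sketch of the `MetricValue` default described in this file. `normalize_metric_filter` and the `MyFunctionTimeouts` entry are hypothetical names used only for illustration; they are not taken from cfn-guardian, and the gem's own handling may differ.

```ruby
# Hypothetical helper, not part of cfn-guardian: fills in the documented
# default MetricValue of 1 when a metric filter entry omits it.
def normalize_metric_filter(filter)
  {
    'MetricName'  => filter.fetch('MetricName'),
    'Pattern'     => filter.fetch('Pattern'),
    'MetricValue' => filter.fetch('MetricValue', 1)
  }
end

filters = [
  # No MetricValue given, so the default of 1 applies.
  { 'MetricName' => 'MyFunctionErrors', 'Pattern' => 'error' },
  # Illustrative second entry with an explicit value.
  { 'MetricName' => 'MyFunctionTimeouts', 'Pattern' => 'timeout', 'MetricValue' => 5 }
]

normalized = filters.map { |f| normalize_metric_filter(f) }
normalized.each { |f| puts "#{f['MetricName']}: #{f['MetricValue']}" }
```

Using `fetch` for the required keys makes a missing `MetricName` or `Pattern` fail loudly instead of producing a half-defined filter.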
@@ -0,0 +1,29 @@
1
+ # Nrpe
2
+
3
+ CloudWatch Namespace: `NRPE`
4
+
5
+ *Note: This requires the NRPE agent to be running and configured on your EC2 host*
6
+
7
+ ```yaml
8
+ Resources:
9
+ Nrpe:
10
+ # Array of host groups, each with the unique identifier Environment.
11
+ # This creates one NRPE lambda per group, attached to the defined VPC and subnets
12
+ - Environment: Prod
13
+ # VPC id for the vpc the EC2 hosts are running in
14
+ VpcId: vpc-1234
15
+ # Array of subnets to attach to the lambda function. Supply multiple if you want to be multi AZ.
16
+ # Multiple subnets from the same AZ cannot be used!
17
+ Subnets:
18
+ - subnet-abcd
19
+ Hosts:
20
+ # Array of hosts with the Id: key defining the host private ip address
21
+ - Id: 10.150.10.6
22
+ # Array of nrpe commands to run against the host.
23
+ # A custom metric and alarm is created for each command
24
+ Commands:
25
+ - check_disk
26
+ - Id: 10.150.10.6
27
+ Commands:
28
+ - check_disk
29
+ ```
@@ -0,0 +1,40 @@
1
+ # Port
2
+
3
+ The port check verifies that a TCP connection can be established on the specified port within the timeout.
4
+
5
+ ## Public Port Check
6
+
7
+ CloudWatch Namespace: `PortCheck`
8
+
9
+ ```yaml
10
+ Resources:
11
+ Port:
12
+ # Array of resources defining the endpoint with the Id: key and Port: Int
13
+ - Id: api.example.com
14
+ Port: 443
15
+ # can override the default timeout of 120 seconds
16
+ Timeout: 60
17
+ ```
18
+
19
+ ## Private Port Check
20
+
21
+ CloudWatch Namespace: `InternalPortCheck`
22
+
23
+ ```yaml
24
+ Resources:
25
+ InternalPort:
26
+ # Array of host groups, each with the unique identifier Environment.
27
+ # This creates one port check lambda per group, attached to the defined VPC and subnets
28
+ - Environment: Prod
29
+ # VPC id for the vpc the EC2 hosts are running in
30
+ VpcId: vpc-1234
31
+ # Array of subnets to attach to the lambda function. Supply multiple if you want to be multi AZ.
32
+ # Multiple subnets from the same AZ cannot be used!
33
+ Subnets:
34
+ - subnet-abcd
35
+ Hosts:
36
+ # Array of resources defining the endpoint with the Id: key and Port: Int
37
+ # All the same options as Port
38
+ - Id: api.example.com
39
+ Port: 8080
40
+ ```
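The check described in this file (a TCP connection established within a timeout) can be sketched in Ruby as follows. This is an illustration under stated assumptions, not the Lambda that Guardian deploys: `port_open?` is a hypothetical helper, and the demo uses a throwaway local listener rather than a real endpoint.

```ruby
require 'socket'

# Hypothetical helper: true when a TCP connection to host:port is
# established within timeout_seconds, false on refusal, timeout or
# name resolution failure. The 120 second default mirrors the doc.
def port_open?(host, port, timeout_seconds = 120)
  Socket.tcp(host, port, connect_timeout: timeout_seconds) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Demo against a local listener bound to an ephemeral port.
server = TCPServer.new('127.0.0.1', 0)
port = server.addr[1]
open_result = port_open?('127.0.0.1', port, 2)

# After the listener is closed the same port should refuse connections.
server.close
closed_result = port_open?('127.0.0.1', port, 2)

puts "while listening: #{open_result}, after close: #{closed_result}"
```

Returning a boolean rather than raising keeps the result easy to map onto a 0/1 metric, which is how an availability alarm would typically consume it.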
@@ -0,0 +1,73 @@
1
+ # SFTP
2
+
3
+ The sftp check produces 5 different metrics that alarms can be created from.
4
+
5
+ 1. `Available` whether a connection could be made in a timely manner. A failure indicates problems with the network, DNS lookup or a server timeout.
6
+ 2. `ConnectionTime` time taken to connect to the sftp server, reported in milliseconds.
7
+ 3. `FileExists` checks the existence of the specified file in the location specified.
8
+ 4. `FileGetTime` time taken to download the file specified.
9
+ 5. `FileBodyMatch` body of the file specified matches regex provided.
10
+
11
+ [aws-lambda-sftp-check](https://github.com/base2Services/aws-lambda-sftp-check)
12
+
13
+ ## Public SFTP Check
14
+
15
+ The public sftp check executes the check against a public endpoint.
16
+
17
+ CloudWatch Namespace: `SftpCheck`
18
+
19
+ ```yaml
20
+ Resources:
21
+ SFTP:
22
+ # sftp endpoint, accepts either an ip address or a dns name
23
+ - Id: example.com
24
+ # sftp user to test connection with
25
+ User: user
26
+ # optionally set port, defaults to port 22
27
+ Port: 22
28
+ # for added security you can use allowed hosts when creating a
29
+ # connection to the sftp by supplying the public key of the sftp server.
30
+ # this mitigates the risk of man-in-the-middle attacks.
31
+ ServerKey: public-server-key
32
+ # ssm parameter path for the password for the SFTP user.
33
+ Password: /ssm/path/password
34
+ # ssm parameter path for the private key for the SFTP user
35
+ PrivateKey: /ssm/path/privatekey
36
+ # ssm parameter path for the password for the private key
37
+ PrivateKeyPass: /ssm/path/privatekey/password
38
+ # optionally set a file to check its existence and test the time it takes to get the file
39
+ File: file.txt
40
+ # optionally check for a regex match pattern in the body of the file
41
+ FileBodyMatch: ok
42
+ ```
43
+
44
+ ## Private SFTP Check
45
+
46
+ The private sftp check should be used when running the check against a private sftp endpoint or a public sftp endpoint that requires whitelisting. Whitelisting can be achieved by placing the sftp check in a private subnet, reaching the endpoint through a NAT gateway, and whitelisting the NAT gateway's IP on the sftp security group.
47
+
48
+ CloudWatch Namespace: `InternalSftpCheck`
49
+
50
+ ```yaml
51
+ Resources:
52
+ InternalSFTP:
53
+ # Array of host groups, each with the unique identifier Environment.
54
+ # This creates one sftp lambda per group, attached to the defined VPC and subnets
55
+ - Environment: Prod
56
+ # VPC id for the vpc the EC2 hosts are running in
57
+ VpcId: vpc-1234
58
+ # Array of subnets to attach to the lambda function. Supply multiple if you want to be multi AZ.
59
+ # Multiple subnets from the same AZ cannot be used!
60
+ Subnets:
61
+ - subnet-1234
62
+ Hosts:
63
+ # Array of sftp hosts with the Id: key defining the host private ip address
64
+ - Id: example.com
65
+ User: user
66
+ Port: 22
67
+ ServerKey: public-server-key
68
+ Password: /ssm/path/password
69
+ PrivateKey: /ssm/path/privatekey
70
+ PrivateKeyPass: /ssm/path/privatekey/password
71
+ File: file.txt
72
+ FileBodyMatch: ok
73
+ ```
@@ -0,0 +1,44 @@
1
+ # Sql
2
+
3
+ CloudWatch Namespace: `SQL`
4
+
5
+ ```yaml
6
+ Resources:
7
+ Sql:
8
+ # Array of host groups, each with the unique identifier Environment.
9
+ # This creates one sql lambda per group, attached to the defined VPC and subnets
10
+ - Environment: Prod
11
+ # VPC id for the vpc the EC2 hosts are running in
12
+ VpcId: vpc-1234
13
+ # Array of subnets to attach to the lambda function. Supply multiple if you want to be multi AZ.
14
+ # Multiple subnets from the same AZ cannot be used!
15
+ Subnets:
16
+ - subnet-1234
17
+ Hosts:
18
+ # Array of hosts with the Id: key defining the host private ip address
19
+ - Id: my-rds-instance.example.com
20
+ # Secrets Manager secret where the sql:// connection string key:value is defined
21
+ # { "connectionString": "sql://username:password@mydb:3306/information_schema"}
22
+ SecretId: MyTestDatabaseSecret
23
+ # Database engine. supports mysql | postgres | mssql
24
+ Engine: mysql
25
+ Queries:
26
+ # Array of SQL queries
27
+ # MetricName used to create the custom metric and alarm
28
+ - MetricName: LongRunningTransactions
29
+ # SQL Query to execute
30
+ Query: >-
31
+ SELECT pl.host,trx_id,trx_started,trx_query
32
+ FROM information_schema.INNODB_TRX it INNER
33
+ JOIN information_schema.PROCESSLIST pl
34
+ ON pl.Id=it.trx_mysql_thread_id
35
+ WHERE it.trx_started < (NOW() - INTERVAL 4 HOUR);
36
+ ```
37
+
38
+ Create the Secrets Manager secret:
39
+
40
+ ```bash
41
+ aws secretsmanager create-secret --name MyTestDatabaseSecret \
42
+ --description "My test database secret for use with guardian sql check" \
43
+ --secret-string '{"connectionString":"sql://username:password@mydb:3306/information_schema"}'
44
+ ```
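To show what the secret's `connectionString` value carries, here is a small Ruby sketch that splits it into its parts. How the Guardian SQL Lambda actually parses the secret is not shown in this diff, so `parse_connection_secret` is a hypothetical helper and the field names in the returned hash are assumptions.

```ruby
require 'json'
require 'uri'

# Same JSON document as the --secret-string value above.
secret_string = '{"connectionString":"sql://username:password@mydb:3306/information_schema"}'

# Hypothetical helper: extracts the connection details from the secret.
def parse_connection_secret(secret_string)
  uri = URI.parse(JSON.parse(secret_string).fetch('connectionString'))
  {
    user: uri.user,
    password: uri.password,
    host: uri.host,
    port: uri.port,
    # The path carries the database name with a leading slash.
    database: uri.path.sub(%r{\A/}, '')
  }
end

conn = parse_connection_secret(secret_string)

# Print everything except the password.
puts conn.reject { |key, _| key == :password }.inspect
```

A quick parse like this before running `create-secret` can catch a malformed connection string, for example a missing port or an unescaped `@` in the password.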
@@ -0,0 +1,25 @@
1
+ # TLS
2
+
3
+ CloudWatch Namespace: `TLSVersionCheck`
4
+
5
+ ```yaml
6
+ Resources:
7
+ TLS:
8
+ # endpoint
9
+ - Id: example.com
10
+ # port to check, defaults to 443
11
+ Port: 443
12
+ # list of tls versions to validate against
13
+ # there is a metric for each version, with 0 meaning not supported and 1 meaning supported
14
+ # alarm thresholds will have to be adjusted to suit your checking requirements
15
+ # defaults to all versions shown below
16
+ Versions:
17
+ - SSLv2
18
+ - SSLv3
19
+ - TLSv1
20
+ - TLSv1.1
21
+ - TLSv1.2
22
+ # checks and reports the max tls version supported as an int
23
+ # ['SSLv2 => 1', 'SSLv3 => 2', 'TLSv1 => 3','TLSv1.1 => 4', 'TLSv1.2 => 5']
24
+ MaxSupported: '1'
25
+ ```
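The version-to-integer mapping in the `MaxSupported` comment can be illustrated with a short Ruby sketch. `max_supported` is a hypothetical helper, and returning 0 when no version is supported is an assumption made here; the diff does not say what the real check reports in that case.

```ruby
# Mapping copied from the MaxSupported comment in this file.
TLS_VERSION_RANK = {
  'SSLv2'   => 1,
  'SSLv3'   => 2,
  'TLSv1'   => 3,
  'TLSv1.1' => 4,
  'TLSv1.2' => 5
}.freeze

# Hypothetical helper: per_version maps a version name to its 0/1 metric.
# Returns the rank of the highest supported version, or 0 if none
# (the 0 fallback is an assumption, not documented behaviour).
def max_supported(per_version)
  ranks = per_version.select { |_, flag| flag == 1 }
                     .keys
                     .map { |version| TLS_VERSION_RANK.fetch(version) }
  ranks.max || 0
end

modern = { 'SSLv2' => 0, 'SSLv3' => 0, 'TLSv1' => 0, 'TLSv1.1' => 1, 'TLSv1.2' => 1 }
legacy = { 'SSLv2' => 1, 'SSLv3' => 1, 'TLSv1' => 0, 'TLSv1.1' => 0, 'TLSv1.2' => 0 }

puts max_supported(modern)
puts max_supported(legacy)
```

Under this mapping, an alarm that should fire when an endpoint tops out below TLSv1.2 would compare the reported value against 5.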
@@ -0,0 +1,71 @@
1
+ # Alarms for Custom Metrics
2
+
3
+ Custom metrics can be used within Guardian either by creating a new alarm template for an existing resource group or by creating a new resource group.
4
+
5
+ ## Existing Resource Group
6
+
7
+ You can add custom metrics to existing resource groups by adding the alarm to the resource group template. Override the desired properties of the alarm. Resource [variables](variables.md) can be used to reference values within the template dimensions.
8
+
9
+ ```yaml
10
+ Templates:
11
+ Ec2Instance:
12
+ LowDiskSpaceRootVolume:
13
+ # Set the metric namespace
14
+ Namespace: CWAgent
15
+ # Set the metric name
16
+ MetricName: DiskSpaceUsedPercent
17
+ # Set the custom dimensions
18
+ Dimensions:
19
+ path: '/'
20
+ # Reference the resource Id from the resource group
21
+ host: ${Resource::Id}
22
+ device: 'xvda1'
23
+ fstype: 'ext4'
24
+ # Override the default properties set by the base template
25
+ Statistic: Maximum
26
+ Threshold: 85
27
+ Period: 60
28
+ EvaluationPeriods: 1
29
+ TreatMissingData: breaching
30
+
31
+ # create our resource
32
+ Resources:
33
+ Ec2Instance:
34
+ - Id: i-12345678
35
+ ```
36
+
37
+ ## New Resource Group
38
+
39
+ When creating alarms for a new resource group, you can inherit the Base alarm template to generate the structure of the alarm. The properties can then be overridden in the template. Resource [variables](variables.md) can be used to reference values within the template dimensions.
40
+
41
+ For example, here we are creating an alarm for a disk usage metric generated by the CloudWatch agent on an EC2 instance.
42
+
43
+ ```yaml
44
+ Templates:
45
+ CustomGroup:
46
+ # Inherit the base alarm template
47
+ Inherit: Base
48
+ LowDiskSpaceRootVolume:
49
+ # Set the metric namespace
50
+ Namespace: CWAgent
51
+ # Set the metric name
52
+ MetricName: DiskSpaceUsedPercent
53
+ # Set the custom dimensions
54
+ Dimensions:
55
+ path: '/'
56
+ # Reference the resource Id from the resource group
57
+ host: ${Resource::Id}
58
+ device: 'xvda1'
59
+ fstype: 'ext4'
60
+ # Override the default properties set by the base template
61
+ Statistic: Maximum
62
+ Threshold: 85
63
+ Period: 60
64
+ EvaluationPeriods: 1
65
+ TreatMissingData: breaching
66
+
67
+ # create our resource
68
+ Resources:
69
+ CustomGroup:
70
+ - Id: i-12345678
71
+ ```
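The `${Resource::Id}` reference used in the dimensions above can be pictured as a simple substitution against the resource entry. The Ruby sketch below is an assumption about the general idea only: `resolve_dimensions` is a hypothetical helper, it handles just the one variable shown in this file, and Guardian's real resolver (see variables.md) may support more and behave differently.

```ruby
# Hypothetical resolver: replaces ${Resource::Id} in each dimension value
# with the Id of the resource the alarm is being generated for.
def resolve_dimensions(dimensions, resource)
  dimensions.transform_values do |value|
    # Single quotes keep Ruby from interpolating the placeholder itself.
    value.to_s.gsub('${Resource::Id}') { resource.fetch('Id') }
  end
end

# Dimensions and resource taken from the example template above.
dimensions = {
  'path'   => '/',
  'host'   => '${Resource::Id}',
  'device' => 'xvda1',
  'fstype' => 'ext4'
}
resource = { 'Id' => 'i-12345678' }

resolved = resolve_dimensions(dimensions, resource)
resolved.each { |name, value| puts "#{name}=#{value}" }
```

Because the template is resolved per resource, adding a second instance under `Resources` would yield a second alarm with its own `host` dimension.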
@@ -0,0 +1,67 @@
1
+ # Event Subscriptions
2
+
3
+ Event subscriptions create CloudWatch events that are triggered by AWS resources, such as an EC2 instance termination.
4
+
5
+
6
+ ## Default Events
7
+
8
+ As with the default alarms in Guardian, there are default events for some resource types. These events are deployed by default for each of the resources unless the event is disabled.
9
+
10
+
11
+ ## Overriding Defaults
12
+
13
+ Default properties of the events can be overridden through the config YAML using the `EventSubscription` top level key.
14
+ For example, here we are changing the topic the event is sent to.
15
+
16
+ ```yaml
17
+ Topics:
18
+ CustomEvents: arn:aws:sns....
19
+
20
+ EventSubscription:
21
+ Ec2Instance:
22
+ InstanceTerminated:
23
+ Topic: CustomEvents
24
+ ```
25
+
26
+ ## Disabling Default Events
27
+
28
+ Default events can be disabled through the config YAML, the same way default alarms can be disabled.
29
+
30
+ ```yaml
31
+ EventSubscription:
32
+ Ec2Instance:
33
+ # set the instance terminated event to false to disable the event
34
+ InstanceTerminated: false
35
+ ```
36
+
37
+ ## Creating Custom Events
38
+
39
+ Custom events can be created if there is no default for that event. They can be inherited from a default event or from the base event model.
40
+
41
+ ### Inheriting From Default Event
42
+
43
+ This is useful if you want to create a new event and a default event already has the same format as the new event you want to create.
44
+ The following example inherits the `MasterPasswordReset` RDS event and creates a new event that captures a security group being added to an RDS instance.
45
+
46
+ ```yaml
47
+ EventSubscription:
48
+ RDSInstance:
49
+ # Create a new event name
50
+ DBNewSecurityGroup:
51
+ # inherit the event
52
+ Inherit: MasterPasswordReset
53
+ # alter the required properties
54
+ Message: The DB instance has been added to a security group.
55
+ ```
56
+
57
+ ### Create Event From Scratch
58
+
59
+ If there are no default events that match the format you require, you can create an event from the base event subscription model.
60
+
61
+ ```yaml
62
+ EventSubscription:
63
+ ECSCluster:
64
+ ContainerInstanceStateChange:
65
+ Source: aws.ecs
66
+ DetailType: ECS Container Instance State Change
67
+ ```
@@ -0,0 +1,85 @@
1
+ # Maintenance Mode
2
+
3
+ CloudWatch alarms can be enabled and disabled to allow maintenance periods without getting alert notifications.
4
+ Alarms can be selected in the following ways.
5
+
6
+ ## Alarm Names
7
+
8
+ Alarm names can be provided as a space-delimited list using the `--alarms` switch.
9
+
10
+ ```bash
11
+ cfn-guardian disable-alarms --alarms alarm-1 alarm-2
12
+ cfn-guardian enable-alarms --alarms alarm-1 alarm-2
13
+ ```
14
+
15
+ ## Alarm Name Prefix
16
+
17
+ Alarm name prefix will find the alarms in the account and region that start with the provided string.
18
+ This is useful for disabling all guardian alarms, all alarms for a resource group, or all alarms for a specific resource.
19
+ Alarm names are created using the following convention.
20
+
21
+ `guardian` - `ResourceGroupName` - `ResourceId` or `FriendlyName` - `AlarmName`
22
+
23
+ The following example would disable/enable all alarms for all ECS Services
24
+
25
+ ```bash
26
+ cfn-guardian disable-alarms --alarm-prefix guardian-ECSService
27
+ cfn-guardian enable-alarms --alarm-prefix guardian-ECSService
28
+ ```
29
+
30
+ The following example would disable/enable all alarms for the ECS Service app
31
+
32
+ ```bash
33
+ cfn-guardian disable-alarms --alarm-prefix guardian-ECSService-app
34
+ cfn-guardian enable-alarms --alarm-prefix guardian-ECSService-app
35
+ ```
36
+
37
+ ## Maintenance Groups
38
+
39
+ Maintenance groups are defined in the YAML configuration file and create a logical mapping between alarms.
40
+
41
+ ```yaml
42
+ Resources:
43
+
44
+ ApplicationTargetGroup:
45
+ - Id: app-tg
46
+ LoadBalancer: public-lb
47
+
48
+ AutoScalingGroup:
49
+ - Id: ecs-asg
50
+
51
+ ECSCluster:
52
+ - Id: prod
53
+
54
+ ECSService:
55
+ - Id: app
56
+ Cluster: prod
57
+
58
+ Http:
59
+ - Id: https://myapp.com
60
+ StatusCode: 200
61
+
62
+ # Define the top level key
63
+ MaintenaceGroups:
64
+
65
+ # Define the group name
66
+ AppUpdate:
67
+ # Define the resource group
68
+ ECSService:
69
+ # define the alarms in the resource group
70
+ UnhealthyTaskCritical:
71
+ # define the resource id's
72
+ - Id: app
73
+ # or the friendly name
74
+ - Name: app
75
+ Http:
76
+ EndpointAvailable:
77
+ - Id: https://myapp.com
78
+ EndpointStatusCodeMatch:
79
+ - Id: https://myapp.com
80
+ ```
81
+
82
+ ```bash
83
+ cfn-guardian disable-alarms --group AppUpdate
84
+ cfn-guardian enable-alarms --group AppUpdate
85
+ ```
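The alarm naming convention and prefix matching described in the Alarm Name Prefix section can be sketched in Ruby. This is an illustration only: `alarm_name` and `with_prefix` are hypothetical helpers, the `-` separator is inferred from the example prefixes such as `guardian-ECSService-app`, and the `worker` service and the `ExampleAlarm` name are made up for the demo.

```ruby
# Hypothetical helper: builds a name following the documented convention
# guardian - ResourceGroupName - ResourceId or FriendlyName - AlarmName.
def alarm_name(group, id_or_name, alarm)
  ['guardian', group, id_or_name, alarm].join('-')
end

# Hypothetical helper: selects the alarm names that start with the prefix.
def with_prefix(names, prefix)
  names.select { |name| name.start_with?(prefix) }
end

names = [
  alarm_name('ECSService', 'app', 'UnhealthyTaskCritical'),
  alarm_name('ECSService', 'worker', 'UnhealthyTaskCritical'), # made-up service
  alarm_name('ECSCluster', 'prod', 'ExampleAlarm')             # made-up alarm
]

all_services = with_prefix(names, 'guardian-ECSService')
app_only     = with_prefix(names, 'guardian-ECSService-app')

puts all_services
puts app_only
```

One consequence of plain prefix matching is that `guardian-ECSService-app` would also select a service named `app2` or `app-worker`; appending the trailing `-` to the prefix narrows the match to the exact resource.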