@aws-mdaa/datawarehouse 1.4.0 → 1.6.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/README.md +108 -154
- package/SCHEMA.md +7284 -541
- package/lib/config-schema.json +2750 -217
- package/lib/datawarehouse-config.d.ts +188 -151
- package/lib/datawarehouse-config.js +7 -2
- package/package.json +17 -12
- package/sample_configs/sample-config-comprehensive.yaml +212 -0
- package/sample_configs/sample-config-minimal.yaml +45 -0
- package/sample_configs/sample-config-public-access-block-external.yaml +31 -0
- package/mdaa.config.json +0 -3
package/README.md
CHANGED
@@ -1,53 +1,94 @@
 # Data Warehouse
 
-
+> **Note:** This documentation is also available in a rendered format [here](https://aws.github.io/modern-data-architecture-accelerator/packages/apps/analytics/datawarehouse-app/index.html).
 
-
+Deploys a secure Redshift-based data warehouse with KMS encryption, VPC networking, SAML federation, scheduled pause/resume actions, event notifications, and automated credential rotation for database users. Common scenarios include centralizing structured data for BI reporting, running complex analytical queries across large datasets, or providing federated SQL access to business analysts.
 
-
+---
 
-
+## Deployed Resources
+
+This module deploys and integrates the following resources:
+
+**Warehouse KMS Key** - Customer-managed KMS key for warehouse resources.
+
+**Warehouse Bucket** - S3 bucket for warehouse utility and maintenance operations.
 
-**Warehouse
+**Warehouse Logging Bucket** (Optional) - S3 bucket for Redshift user activity audit logging. Uses SSE-S3 encryption (Redshift logging requirement).
 
-**
+**Execution Roles** - Externally managed IAM execution roles associated to the Redshift cluster for cross-service operations.
 
-**Warehouse
+**Warehouse Security Group** - Controls network connectivity to the Redshift cluster.
 
-
+**Warehouse Subnet Group** - Controls which subnets the cluster is deployed on.
 
-**
+**Warehouse Parameter Group** - Cluster configuration parameters controlling cluster behavior.
 
-**Warehouse
+**Warehouse Cluster** - Redshift cluster conforming to the specified configuration.
 
-
-* No ingress (to cluster) permitted by default
+**Warehouse Cluster Scheduled Actions** (Optional) - Scheduled actions to automatically pause and resume the Redshift cluster.
 
-**Warehouse
+**Warehouse Federation Roles** (Optional) - IAM roles for SAML-based federated access to the Redshift cluster via IAM Identity Providers.
 
-**
+**SNS Event Subscriptions** (Optional) - EventBridge subscriptions for cluster and scheduled action event notifications.
+
+**Redshift DB Service Users** (Optional) - Database users with credentials stored in Secrets Manager for programmatic access.
+
+**Warehouse Users** (Optional) - Redshift user credentials with configurable automated secret rotation.
+
+
 
-
+---
 
-
+## Related Modules
 
-
-
-
+- [Roles](../../governance/roles-app/README.md) — Create IAM roles for Redshift execution roles or SAML federation access
+- [Data Lake](../../datalake/datalake-app/README.md) — Redshift can query data lake S3 buckets via Redshift Spectrum with execution roles
+- [QuickSight Account](../quicksight-account-app/README.md) — Connect QuickSight to Redshift as a data source via VPC connection
+- [QuickSight Project](../quicksight-project-app/README.md) — Create QuickSight data sources pointing to the Redshift cluster
+- [OpenSearch](../opensearch-app/README.md) — Deploy OpenSearch as a complementary analytics engine for full-text search and log analytics
 
-
+---
 
-
+## Security/Compliance Details
 
-
-* Grants ability to dynamically generate cluster user and credentials, and join groups provided in the SAML claim by the identity provider
-* Groups must pre-exist in cluster, otherwise federation will fail
+This module is designed in alignment with MDAA security/compliance principles and CDK nag rulesets. Additional review is recommended prior to production deployment, ensuring organization-specific compliance requirements are met.
 
-
+- **Encryption at Rest**:
+  - All cluster data encrypted with customer-managed KMS key
+  - Warehouse and utility S3 buckets encrypted with KMS
+  - Audit logging bucket uses SSE-S3 (Redshift requirement)
+- **Encryption in Transit**:
+  - SSL enforced on all client connections via parameter group
+- **Least Privilege**:
+  - Database user credentials stored in Secrets Manager with configurable automatic rotation
+  - Execution roles scoped to specific Redshift operations
+- **Separation of Duties**:
+  - SAML federation roles support SSO access with dynamic credential generation and group membership via SAML claims
+  - Federation groups must pre-exist in the cluster
+  - Event notifications via SNS for cluster management and security events with configurable severity filtering
+- **Network Isolation**:
+  - Cluster deployed in VPC with configurable subnet group
+  - Security group denies all ingress by default; access must be explicitly granted via CIDR or security group rules
 
-
+---
 
-
+## AWS Service Endpoints
+
+The following VPC endpoints may be required if public AWS service endpoint connectivity is unavailable (e.g., private subnets without NAT gateway, firewalled environments, or PrivateLink-only architectures):
+
+| AWS Service | Endpoint Service Name | Type |
+| ----------------- | --------------------------------------- | --------- |
+| Redshift | `com.amazonaws.{region}.redshift` | Interface |
+| Redshift Data API | `com.amazonaws.{region}.redshift-data` | Interface |
+| KMS | `com.amazonaws.{region}.kms` | Interface |
+| S3 | `com.amazonaws.{region}.s3` | Gateway |
+| Secrets Manager | `com.amazonaws.{region}.secretsmanager` | Interface |
+| SNS | `com.amazonaws.{region}.sns` | Interface |
+| CloudWatch Logs | `com.amazonaws.{region}.logs` | Interface |
+| STS | `com.amazonaws.{region}.sts` | Interface |
+
+---
 
 ## Configuration
 
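To illustrate the AWS Service Endpoints table added in this hunk, the following CloudFormation fragment sketches one Interface endpoint (Redshift) and the S3 Gateway endpoint. It is not part of the package. The parameter names (`VpcId`, `SubnetIds`, `EndpointSecurityGroupId`, `RouteTableIds`) and the logical IDs are hypothetical, and `{region}` from the table is expressed here with the `AWS::Region` pseudo parameter.

```yaml
# Sketch only: parameters and logical IDs are illustrative, not taken from the package.
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  VpcId:
    Type: AWS::EC2::VPC::Id
  SubnetIds:
    Type: List<AWS::EC2::Subnet::Id>
  EndpointSecurityGroupId:
    Type: AWS::EC2::SecurityGroup::Id
  RouteTableIds:
    Type: CommaDelimitedList
Resources:
  # Interface endpoints place an ENI in each listed subnet.
  RedshiftEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcEndpointType: Interface
      ServiceName: !Sub com.amazonaws.${AWS::Region}.redshift
      VpcId: !Ref VpcId
      SubnetIds: !Ref SubnetIds
      SecurityGroupIds:
        - !Ref EndpointSecurityGroupId
      PrivateDnsEnabled: true
  # Gateway endpoints attach to route tables instead of subnets.
  S3Endpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcEndpointType: Gateway
      ServiceName: !Sub com.amazonaws.${AWS::Region}.s3
      VpcId: !Ref VpcId
      RouteTableIds: !Ref RouteTableIds
```

The other Interface services in the table would presumably follow the `RedshiftEndpoint` pattern with a different `ServiceName` suffix.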
@@ -56,136 +97,49 @@ The Data Warehouse CDK application is used to configure and deploy resources req
 Add the following snippet to your mdaa.yaml under the `modules:` section of a domain/env in order to use this module:
 
 ```yaml
-
-
-
-
+datawarehouse: # Module Name can be customized
+  module_path: '@aws-mdaa/datawarehouse' # Must match module NPM package name
+  module_configs:
+    - ./datawarehouse.yaml # Filename/path can be customized
 ```
 
-### Module Config
+### Module Config Samples and Variants
 
-
+Copy the contents of the relevant sample config below into the `./datawarehouse.yaml` file referenced in the MDAA config snippet above.
+
+#### Minimal Configuration
+
+Required properties only — a basic Redshift cluster with VPC networking, security group ingress, and audit logging. Start here for a quick data warehouse deployment before adding federation, scheduled actions, or database users.
+
+[sample-config-minimal.yaml](sample_configs/sample-config-minimal.yaml)
 
 ```yaml
-#
-
-adminUsername: admin
-
-# The admin password will be automatically rotatated after this many days
-adminPasswordRotationDays: 30
-
-# The number of days that automated snapshots are retained (1-35 days)
-# Set 0 to disable the snapshot.
-# Default - 1
-automatedSnapshotRetentionDays: 3
-
-# An optional list of arns for keys which may be used to write data to the cluster bucket.
-# This may be useful to allow a Glue job to write data to the cluster bucket in order to load into the cluster.
-additionalBucketKmsKeyArns:
-  - arn:{{partition}}:kms:{{region}}:{{account}}:key/abcd-123123-abcd-12312421
-
-#Used to configure SAML federations
-federations:
-  - federationName: "test" # Should be descriptive and unique
-    # This is the arn of the IAM Identity Provider
-    providerArn: arn:{{partition}}:iam::{{account}}:saml-provider/sample-saml-identity-provider
-
-# This is a set of Role/Principal Arns which will be granted access to the Warehouse S3 bucket
-dataAdminRoles:
-  - arn: arn:{{partition}}:iam::{{account}}:role/Admin
-
-# A list of roles which will be provided read/write access to the warehouse bucket
-warehouseBucketUserRoles:
-  - name: User
-  - name: team2-ex-role
-
-# Set of execution roles required to be associated to the cluster
-# If execution role requires read/write access to warehouse bucket, explicitly add that role to 'warehouseBucketUserRoles' property
-executionRoles:
-  - arn: arn:{{partition}}:iam::{{account}}:role/team1-ex-role
-  - name: team2-ex-role
-
-# The VPC and subnets on which the cluster will be deployed. If automatic cluster relocation is required,
-# at least one subnet per AZ should be specified.
-vpcId: vpc-12321421412
-subnetIds:
-  - subnet-12312312421
-  - subnet-12312321412
-
-#A preferred maintenance window day/time range. Should be specified as a range ddd:hh24:mi-ddd:hh24:mi (24H Clock UTC).
-#Example: 'Sun:23:45-Mon:00:15'
-preferredMaintenanceWindow: Sun:23:45-Mon:00:15
-
-# Port the cluster will listen on (defaults to 5440)
-clusterPort: 5440
-
-# Ingress rules to be added to the cluster security group.
-# All other traffic will be blocked
-# Can reference other security groups (prefix sg:) or ipv4 CIDR sources (prefix ipv4:)
-securityGroupIngress:
-  ipv4:
-    - 172.31.0.0/16
-  sg:
-    - ssm:/path/to/ssm
-# The node type and initial number of nodes
-nodeType: RA3_4XLARGE
-numberOfNodes: 2
-
-# Controls whether or not the cluster logs user audit activity to S3
-# Note that enabling this will result in a new S3 bucket being created
-# specifically for user audit logs. Due to Redshift limitations, this
-# S3 bucket will use S3/AES-256 encryption instead of KMS CMK.
-enableAuditLoggingToS3: true
-
-databaseUsers:
-  - userName: "serviceuserGlue"
-    dbName: "default_db"
-    secretRotationDays: 90
-    secretAccessRoles:
-      - name: "test-arn"
-  - userName: "serviceuserQuicksight"
-    dbName: "default_db"
-    secretRotationDays: 90
-
-# The list of scheduled actions to pause and/or resume cluster
-scheduledActions:
-  # Pause cluster every Friday at 6pm ET starting April 13, 2022 until Dec 31, 2099
-  - name: pause-cluster
-    enable: True
-    # Target Action must be either of: "pauseCluster" or "resumeCluster". resizeCluster is not supported yet.
-    targetAction: pauseCluster
-    # Specify the action schedule in cron format cron(Minutes Hours Day-of-month Month Day-of-week Year).
-    schedule: cron(0 22 ? * FRI *)
-    # Start Date and Time in UTC format when the schedule becomes active. This must be a future date-time.
-    startTime: "2023-12-31T00:00:00Z"
-    # End Date and Time in UTC format after which the schedule is no longer active. This must be a future date-time later than start date.
-    endTime: "2099-12-31T00:00:00Z"
-
-  - name: resume-cluster
-    # Resume cluster every Monday at 7am ET starting April 13, 2022 until Dec 31, 2099
-    enable: True
-    # Target Action must be either of: "pauseCluster" or "resumeCluster". resizeCluster is not supported yet.
-    targetAction: resumeCluster
-    # Specify the action schedule in cron format cron(Minutes Hours Day-of-month Month Day-of-week Year).
-    schedule: cron(0 12 ? * MON *)
-    # Start Date and Time in UTC format when the schedule becomes active. This must be a future date-time.
-    startTime: "2023-12-31T00:00:00Z"
-    # End Date and Time in UTC format after which the schedule is no longer active. This must be a future date-time later than start date.
-    endTime: "2099-12-31T00:00:00Z"
-
-# Cluster and Scheduled Action event notification configs
-eventNotifications:
-  # List of emails to which email notifications will be sent
-  # If not specified, an SNS topic is still created and
-  # other types of subscriptions can be directly added.
-  email:
-    - example@example.com
-  # Event severity level
-  # "ERROR" | "INFO"
-  severity: INFO
-  # Event categories to be included
-  # "configuration" | "management" | "monitoring" | "security" | "pending"
-  eventCategories:
-    - management
-    - security
+# Contents available via above link
+--8<-- "target/docs/packages/apps/analytics/datawarehouse-app/sample_configs/sample-config-minimal.yaml"
 ```
+
+#### Comprehensive Configuration
+
+Deploys a multi-node Redshift cluster with SAML federation, scheduled pause/resume, audit logging, database users with secret rotation, event notifications, workload management, parameter group tuning, and VPC networking. Use this as a reference when you need full control over cluster sizing, access patterns, cost management, and operational automation.
+
+[sample-config-comprehensive.yaml](sample_configs/sample-config-comprehensive.yaml)
+
+```yaml
+# Contents available via above link
+--8<-- "target/docs/packages/apps/analytics/datawarehouse-app/sample_configs/sample-config-comprehensive.yaml"
+```
+
+#### External Public Access Block Management
+
+Deploys a Redshift cluster with S3 bucket public access block management delegated externally — for example, via account-level S3 settings or SCPs. When `publicAccessBlockManagedExternally` is set to `true`, CDK omits the `PutBucketPublicAccessBlock` API call on provisioned S3 buckets, avoiding conflicts in environments where SCPs restrict that action. Use this variant when your organization enforces public access restrictions at the account or organizational level rather than per-bucket.
+
+[sample-config-public-access-block-external.yaml](sample_configs/sample-config-public-access-block-external.yaml)
+
+```yaml
+# Contents available via above link
+--8<-- "target/docs/packages/apps/analytics/datawarehouse-app/sample_configs/sample-config-public-access-block-external.yaml"
+```
+
+---
+
+[Config Schema Docs](SCHEMA.md)
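The new README now points to sample files instead of showing the config inline, so the following is a rough sketch of what a `datawarehouse.yaml` using the external public access block variant might look like. Every key except `publicAccessBlockManagedExternally` is copied from the 1.4.0 inline sample removed in this diff; whether 1.6.0 keeps those keys unchanged, which of them are required, and where the new flag sits in the file are all assumptions. The shipped `sample_configs/` files and `SCHEMA.md` are authoritative.

```yaml
# Hypothetical sketch, not the shipped sample and not necessarily complete.
# Keys below (except the last) come from the 1.4.0 README sample and may differ in 1.6.0.
adminUsername: admin
adminPasswordRotationDays: 30

vpcId: vpc-12321421412
subnetIds:
  - subnet-12312312421
  - subnet-12312321412

# All traffic not listed here is blocked by the cluster security group.
securityGroupIngress:
  ipv4:
    - 172.31.0.0/16

nodeType: RA3_4XLARGE
numberOfNodes: 2
enableAuditLoggingToS3: true

# Assumed to be a top-level key; its position is not shown in this diff.
publicAccessBlockManagedExternally: true
```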