runbooks 0.9.2__py3-none-any.whl → 0.9.5__py3-none-any.whl
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- runbooks/__init__.py +15 -6
- runbooks/cfat/__init__.py +3 -1
- runbooks/cloudops/__init__.py +3 -1
- runbooks/common/aws_utils.py +367 -0
- runbooks/common/enhanced_logging_example.py +239 -0
- runbooks/common/enhanced_logging_integration_example.py +257 -0
- runbooks/common/logging_integration_helper.py +344 -0
- runbooks/common/profile_utils.py +8 -6
- runbooks/common/rich_utils.py +347 -3
- runbooks/enterprise/logging.py +400 -38
- runbooks/finops/README.md +262 -406
- runbooks/finops/__init__.py +44 -1
- runbooks/finops/accuracy_cross_validator.py +12 -3
- runbooks/finops/business_cases.py +552 -0
- runbooks/finops/commvault_ec2_analysis.py +415 -0
- runbooks/finops/cost_processor.py +718 -42
- runbooks/finops/dashboard_router.py +44 -22
- runbooks/finops/dashboard_runner.py +302 -39
- runbooks/finops/embedded_mcp_validator.py +358 -48
- runbooks/finops/finops_scenarios.py +1122 -0
- runbooks/finops/helpers.py +182 -0
- runbooks/finops/multi_dashboard.py +30 -15
- runbooks/finops/scenarios.py +789 -0
- runbooks/finops/single_dashboard.py +386 -58
- runbooks/finops/types.py +29 -4
- runbooks/inventory/__init__.py +2 -1
- runbooks/main.py +522 -29
- runbooks/operate/__init__.py +3 -1
- runbooks/remediation/__init__.py +3 -1
- runbooks/remediation/commons.py +55 -16
- runbooks/remediation/commvault_ec2_analysis.py +259 -0
- runbooks/remediation/rds_snapshot_list.py +267 -102
- runbooks/remediation/workspaces_list.py +182 -31
- runbooks/security/__init__.py +3 -1
- runbooks/sre/__init__.py +2 -1
- runbooks/utils/__init__.py +81 -6
- runbooks/utils/version_validator.py +241 -0
- runbooks/vpc/__init__.py +2 -1
- {runbooks-0.9.2.dist-info → runbooks-0.9.5.dist-info}/METADATA +98 -60
- {runbooks-0.9.2.dist-info → runbooks-0.9.5.dist-info}/RECORD +44 -39
- {runbooks-0.9.2.dist-info → runbooks-0.9.5.dist-info}/entry_points.txt +1 -0
- runbooks/inventory/cloudtrail.md +0 -727
- runbooks/inventory/discovery.md +0 -81
- runbooks/remediation/CLAUDE.md +0 -100
- runbooks/remediation/DOME9.md +0 -218
- runbooks/security/ENTERPRISE_SECURITY_FRAMEWORK.md +0 -506
- {runbooks-0.9.2.dist-info → runbooks-0.9.5.dist-info}/WHEEL +0 -0
- {runbooks-0.9.2.dist-info → runbooks-0.9.5.dist-info}/licenses/LICENSE +0 -0
- {runbooks-0.9.2.dist-info → runbooks-0.9.5.dist-info}/top_level.txt +0 -0
runbooks/inventory/cloudtrail.md
DELETED
@@ -1,727 +0,0 @@
# CloudTrail: Accountability and Governance

??? info "python check_cloudtrail_status.py"

    | Parent Acct  | Account Number | Region    | Trail Name           | Trail Type | S3 Bucket                     |
    | ------------ | -------------- | --------- | -------------------- | ---------- | ----------------------------- |
    | 909135376185 | 909135376185   | us-east-1 | ams-tf-cloudtraillog | OrgTrail   | ams-cloudtrail-vector-all-org |


---

## How to identify WHO changed an AWS Resource, WHEN, and HOW it happened

??? note "🔎 Objective: Investigate a change to an AWS Security Group (sg-xxx) in an AWS account"

    > We need to know:

    - **Who** made the change
    - **When** it happened
    - **How** (what method/tool was used)
    - Optionally: **What** exactly was changed (rules added/removed/modified)

## ✅ Step-by-Step Forensic Workflow

### 🛠️ **1. Enable or Query AWS CloudTrail (Primary Source)**

CloudTrail is your **single source of truth** for who did what, when, and how in AWS.

#### 🔍 How to Query CloudTrail for SG Changes:
**Console:**
1. Go to **CloudTrail > Event History**
2. Set **Lookup Attribute** to:
   - **Event name**: `AuthorizeSecurityGroupIngress`, `AuthorizeSecurityGroupEgress`, `RevokeSecurityGroupIngress`, `RevokeSecurityGroupEgress`, `UpdateSecurityGroupRuleDescriptionsIngress`, or `UpdateSecurityGroupRuleDescriptionsEgress`
   - Or filter by **Resource Name**: `sg-xxx`
3. Set **Time Range** (last 7–30 days typically; Event History retains 90 days)

**Important Fields to Look At:**
| Field | Description |
|-------|-------------|
| **Event Time** | When the change occurred |
| **Event Name** | Type of operation (authorize/revoke/modify) |
| **User Identity** | IAM principal who initiated the change |
| **Access Key / Session Context** | Whether it was via console, CLI, or automation |
| **Source IP** | IP address where the change came from |
| **Event Source** | Always `ec2.amazonaws.com` for SG changes |
| **Request Parameters** | IP ranges, ports, protocols involved in the change |

> ✅ **Pro Tip**: If you're using **AWS Organizations**, query from the **Audit Account's CloudTrail**, or the central logging bucket if it's delivered via S3.

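The console lookup above can also be scripted. The following is a minimal sketch, assuming `boto3` is installed and credentials are configured; the helper names (`build_lookup_kwargs`, `summarize_event`, `find_sg_changes`) are illustrative and not part of any existing tool. It uses CloudTrail's `LookupEvents` API, which accepts a single lookup attribute per call, and reduces each event to the who/when/how fields listed in the table.

```python
import json
from datetime import datetime, timedelta, timezone


def build_lookup_kwargs(sg_id, days=7, now=None):
    """Build keyword arguments for CloudTrail's LookupEvents call."""
    end = now or datetime.now(timezone.utc)
    return {
        # LookupEvents accepts exactly one lookup attribute per request.
        "LookupAttributes": [
            {"AttributeKey": "ResourceName", "AttributeValue": sg_id}
        ],
        "StartTime": end - timedelta(days=days),
        "EndTime": end,
    }


def summarize_event(event):
    """Reduce one LookupEvents result to the who/when/how fields."""
    # The full event record is delivered as a JSON string.
    detail = json.loads(event["CloudTrailEvent"])
    identity = detail.get("userIdentity") or {}
    return {
        "when": detail.get("eventTime"),
        "what": detail.get("eventName"),
        "who": identity.get("arn"),
        "source_ip": detail.get("sourceIPAddress"),
        "how": detail.get("userAgent"),
    }


def find_sg_changes(sg_id, days=7):
    """Yield summaries from Event History; needs boto3 and AWS credentials."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("cloudtrail")
    paginator = client.get_paginator("lookup_events")
    for page in paginator.paginate(**build_lookup_kwargs(sg_id, days)):
        for event in page["Events"]:
            yield summarize_event(event)
```

Calling `list(find_sg_changes("sg-xxx", days=30))` would return one dictionary per recorded event; filtering on the `what` field narrows the result to the authorize/revoke calls.
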

---

### 📜 **2. Use AWS Config for Historical Diff View**

If AWS Config is enabled for your account (highly recommended), it provides **resource-level history and diffs**.

#### 🔍 How to Use:
1. Go to **AWS Config > Resources**
2. Filter by **Resource Type**: *EC2 Security Group*
3. Search for **sg-xxx**
4. View **Configuration Timeline**:
   - You’ll see **before/after diffs**
   - You can pinpoint what rule (CIDR/port/protocol) was added or removed

> 🔐 **Bonus**: AWS Config also tells you **compliance status**, i.e., whether the change violated your internal security baselines.


---

### 📊 **3. Use Athena + CloudTrail Logs for Advanced Search (Optional)**

If your CloudTrail is delivered to S3 (recommended for long-term logging), you can:
- Use **Amazon Athena** with the **AWS CloudTrail partitioned schema**
- Run a SQL query like this:

```sql
-- requestParameters is stored as a JSON string, so fields are read with json_extract*.
SELECT
  eventTime,
  eventName,
  userIdentity.arn,
  sourceIPAddress,
  json_extract_scalar(requestParameters, '$.groupId') AS groupId,
  json_extract(requestParameters, '$.ipPermissions') AS ipPermissions
FROM cloudtrail_logs
WHERE eventSource = 'ec2.amazonaws.com'
  AND json_extract_scalar(requestParameters, '$.groupId') = 'sg-xxx'
  AND eventName IN (
    'AuthorizeSecurityGroupIngress',
    'RevokeSecurityGroupIngress',
    'AuthorizeSecurityGroupEgress',
    'RevokeSecurityGroupEgress'
  )
ORDER BY eventTime DESC
```


---

### 🧑‍💻 **4. Correlate IAM User/Role Access**

Once you identify **who** made the change, validate:

- Was it an IAM user (`IAMUser`), an assumed-role session (`AssumedRole`, which covers both automation and federated or SSO users), or an AWS service (`AWSService`)?
- Was MFA enforced for human access?
- Did the user belong to a **delegated group (like `Admin`, `NetworkOps`)**?

Use **AWS IAM > Access Analyzer** or the **CloudTrail "userIdentity" block** for this.

> 🔐 If it's an assumed role like `DevOpsAssumeRole`, **"sessionContext > sessionIssuer > userName"** in CloudTrail names the role that was assumed. To trace back the original identity, use the session name at the end of the `userIdentity.arn` and the matching `AssumeRole` event.

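The checks above can be applied mechanically to the `userIdentity` block of an event. This is a small sketch rather than a complete classifier: the function name `describe_actor` and the returned keys are illustrative, and it only reads fields that CloudTrail documents for `userIdentity` (`type`, `arn`, `sessionContext.attributes.mfaAuthenticated`, `sessionContext.sessionIssuer.userName`).

```python
def describe_actor(user_identity):
    """Summarize a CloudTrail userIdentity block for triage."""
    session = user_identity.get("sessionContext") or {}
    attributes = session.get("attributes") or {}
    issuer = session.get("sessionIssuer") or {}
    identity_type = user_identity.get("type", "Unknown")
    arn = user_identity.get("arn")

    # For assumed-role sessions the ARN ends with the role session name,
    # which often (but not always) identifies the caller.
    session_name = None
    if identity_type == "AssumedRole" and arn:
        session_name = arn.rsplit("/", 1)[-1]

    return {
        "type": identity_type,
        "arn": arn,
        # sessionIssuer.userName is the role that was assumed, not a person.
        "role": issuer.get("userName"),
        "session_name": session_name,
        # CloudTrail records this flag as the string "true" or "false".
        "mfa": attributes.get("mfaAuthenticated") == "true",
    }
```

A result with `type == "AssumedRole"` and `mfa == False` is not necessarily a problem (automation rarely uses MFA), so the output is meant as input to a human review, not as a verdict.
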

---

### 🔄 **5. Cross-Check via Change Management / ITSM**

If you're practicing good governance, you should have:
- A **Jira ticket**, **ServiceNow request**, or **GitOps pull request** associated with the change
- Cross-reference **timestamp** and **user** from CloudTrail with ticket system logs
- If deployed via Terraform/CDK: check the commit history or CI/CD job logs

---

## 🔥 Security Best Practices Going Forward

| Practice | Why |
|---------|-----|
| ✅ Enable **CloudTrail org-wide** and deliver logs to central S3 | Full audit trail |
| ✅ Use **AWS Config** across accounts | Historical visibility of resource changes |
| ✅ Integrate **CloudTrail Insights** | Detect unusual activity like bulk security group changes |
| ✅ Tag SGs with `Owner`, `Environment`, `ChangeTicket` | Aids investigation |
| ✅ Enforce **IAM Conditions** for `ec2:AuthorizeSecurityGroupIngress` etc. | Only allow changes through approved paths |
| ✅ Use **GuardDuty** or **Security Hub** to flag risky changes (e.g., `0.0.0.0/0` open port) | Detection & alerting |


---

> The rest of this guide describes **an enterprise-grade, fully automated solution** for **monitoring and alerting on AWS Security Group (`sg-xxx`) changes** with forensic-level visibility, real-time alerts, and infrastructure governance.

It covers:

1. ✅ Deep dive: how to **use Athena with the CloudTrail partitioned schema**
2. ✅ Craft **Athena queries** to detect SG changes
3. ✅ Configure **AWS Config rules** to enforce and detect policy violations
4. ✅ Build a **real-time alert workflow** using **EventBridge + SNS + Microsoft Teams**
5. ✅ Wrap up with best practices for **automation, security, and audit-readiness**


---

## 🔍 PART 1: Use Amazon Athena with AWS CloudTrail Logs (Partitioned Schema)

Amazon Athena allows you to **query CloudTrail logs stored in S3 using SQL**, which is ideal for forensic investigations or continuous audits.

### ✅ Step-by-Step Setup

#### **Step 1 – Ensure CloudTrail is Delivered to S3**
If not already configured:
- Go to **CloudTrail > Trails**
- Ensure **S3 logging is enabled** and directed to a known bucket (e.g., `cloudtrail-logs-org-central`)
- Ideally use an **organization trail** for centralized auditing

#### **Step 2 – Create Athena Table for CloudTrail Logs**

Use this sample schema to create an Athena table (you only need to do this once per account or org-wide audit bucket):

> Log location pattern: `s3://<your-cloudtrail-bucket-name>/AWSLogs/<account-id>/CloudTrail/`. For a central organization bucket, use the management account ID, e.g. `s3://cloudtrail-logs-org-central/AWSLogs/<Your_Management_Account_ID>/CloudTrail/`.

Location used in this environment: `s3://ams-cloudtrail-vector-all-org/AWSLogs/909135376185/CloudTrail/`

> create-cloudtrail_logs-table.sql

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS cloudtrail_logs (
  eventVersion STRING,
  eventTime STRING,
  eventSource STRING,
  eventName STRING,
  awsRegion STRING,
  sourceIPAddress STRING,
  userAgent STRING,
  userIdentity STRUCT<
    type: STRING,
    principalId: STRING,
    arn: STRING,
    accountId: STRING,
    accessKeyId: STRING,
    userName: STRING,
    sessionContext: STRUCT<
      attributes: STRUCT<
        mfaAuthenticated: STRING,
        creationDate: STRING>,
      sessionIssuer: STRUCT<
        type: STRING,
        principalId: STRING,
        arn: STRING,
        accountId: STRING,
        userName: STRING>>>,
  requestParameters STRING,
  responseElements STRING,
  additionalEventData STRING,
  errorCode STRING,
  errorMessage STRING,
  requestID STRING,
  eventID STRING,
  readOnly STRING,
  eventType STRING,
  apiVersion STRING,
  managementEvent BOOLEAN,
  recipientAccountId STRING,
  sharedEventID STRING,
  vpcEndpointId STRING
)
PARTITIONED BY (`region` STRING, `year` STRING, `month` STRING, `day` STRING)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS INPUTFORMAT 'com.amazon.emr.cloudtrail.CloudTrailInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION 's3://your-cloudtrail-logs/AWSLogs/<account-id>/CloudTrail/'
TBLPROPERTIES (
  "classification"="cloudtrail",
  "projection.enabled"="true",
  "projection.region.type"="enum",
  "projection.region.values"="us-east-1,us-west-2,ap-southeast-2",
  "projection.year.type"="integer",
  "projection.year.range"="2024,2030",
  "projection.month.type"="integer",
  "projection.month.range"="1,12",
  "projection.month.digits"="2",
  "projection.day.type"="integer",
  "projection.day.range"="1,31",
  "projection.day.digits"="2",
  "storage.location.template"="s3://your-cloudtrail-logs/AWSLogs/<account-id>/CloudTrail/${region}/${year}/${month}/${day}/"
);
```

> Replace `your-cloudtrail-logs` and `<account-id>` accordingly, and list every region you log in `projection.region.values`. CloudTrail delivers gzipped JSON rather than Parquet, so the table uses the JSON SerDe with the CloudTrail input format. `eventTime` is kept as an ISO 8601 string (convert it with `from_iso8601_timestamp(eventTime)` when needed), and the `digits` properties zero-pad `month` and `day` to match CloudTrail's `YYYY/MM/DD` prefixes.

#### **Step 3 – Partitions (very important)**

Because the table enables partition projection (`projection.enabled`), Athena derives the partitions from the table properties at query time, so no partition loading step is needed. The statement below applies only to tables without projection whose S3 prefixes use Hive-style `key=value` names, which CloudTrail's `region/year/month/day` layout does not:

```sql
MSCK REPAIR TABLE cloudtrail_logs;
```

For a CloudTrail table without projection, add partitions explicitly with `ALTER TABLE ... ADD PARTITION` instead.

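To see why zero-padding matters for partition projection, the sketch below resolves a `storage.location.template` by hand for a given date. It only imitates the substitution Athena performs and is not Athena's implementation; the function name `projected_location` is illustrative, and the bucket and account placeholders are the ones from the table definition.

```python
from datetime import date

# Same placeholder template as in the table definition above.
TEMPLATE = (
    "s3://your-cloudtrail-logs/AWSLogs/<account-id>/CloudTrail/"
    "${region}/${year}/${month}/${day}/"
)


def projected_location(template, region, day, digits=2):
    """Substitute partition values into a storage.location.template string."""
    values = {
        "region": region,
        "year": str(day.year),
        # `digits` mirrors the projection.<column>.digits table property.
        "month": str(day.month).zfill(digits),
        "day": str(day.day).zfill(digits),
    }
    location = template
    for name, value in values.items():
        location = location.replace("${" + name + "}", value)
    return location


if __name__ == "__main__":
    print(projected_location(TEMPLATE, "us-east-1", date(2025, 4, 1)))
    print(projected_location(TEMPLATE, "us-east-1", date(2025, 4, 1), digits=1))
```

With two digits the prefix ends in `2025/04/01/`, which is where CloudTrail writes its objects; with one digit it ends in `2025/4/1/`, a prefix that does not exist, so the query would silently return no rows.
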

---

## 📊 PART 2: Craft Athena Query to Detect SG Changes

Here's an Athena SQL query that detects all changes to a Security Group (`sg-xxx`):

```sql
SELECT
  eventTime,
  eventName,
  userIdentity.arn AS actor,
  userIdentity.sessionContext.sessionIssuer.userName AS assumedRole,
  sourceIPAddress,
  requestParameters,
  json_extract_scalar(requestParameters, '$.groupId') AS securityGroupId
FROM cloudtrail_logs
WHERE eventName IN (
    'AuthorizeSecurityGroupIngress',
    'RevokeSecurityGroupIngress',
    'AuthorizeSecurityGroupEgress',
    'RevokeSecurityGroupEgress',
    'UpdateSecurityGroupRuleDescriptionsEgress',
    'UpdateSecurityGroupRuleDescriptionsIngress'
  )
  AND json_extract_scalar(requestParameters, '$.groupId') = 'sg-xxx'
  AND year = '2025'
  AND month = '04'
ORDER BY eventTime DESC;
```

> ✅ **Customize** the `year` and `month` filters (and add `region` and `day` where possible) based on your timeframe. Athena scans only the partitions the filters select, which is what keeps the query fast and cheap.

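Hand-editing the partition filters is error-prone, so the query text can be generated from a date. This is a minimal sketch under the assumptions of this guide (a `cloudtrail_logs` table partitioned by zero-padded `year`, `month`, and `day` strings); the function name `build_sg_change_query` is illustrative, and the ID validation exists only because the value is interpolated into SQL.

```python
import re
from datetime import date

SG_EVENTS = (
    "AuthorizeSecurityGroupIngress",
    "RevokeSecurityGroupIngress",
    "AuthorizeSecurityGroupEgress",
    "RevokeSecurityGroupEgress",
    "UpdateSecurityGroupRuleDescriptionsIngress",
    "UpdateSecurityGroupRuleDescriptionsEgress",
)


def build_sg_change_query(sg_id, day, table="cloudtrail_logs"):
    """Return Athena SQL for changes to one security group on one day."""
    # The ID is interpolated into SQL, so accept only the expected shape.
    if not re.fullmatch(r"sg-[0-9a-zA-Z]+", sg_id):
        raise ValueError(f"unexpected security group id: {sg_id!r}")
    events = ", ".join(f"'{name}'" for name in SG_EVENTS)
    return (
        "SELECT eventTime, eventName, userIdentity.arn AS actor, sourceIPAddress\n"
        f"FROM {table}\n"
        f"WHERE eventName IN ({events})\n"
        f"  AND json_extract_scalar(requestParameters, '$.groupId') = '{sg_id}'\n"
        # Zero-padded values match the string partition columns.
        f"  AND year = '{day:%Y}' AND month = '{day:%m}' AND day = '{day:%d}'\n"
        "ORDER BY eventTime DESC"
    )


if __name__ == "__main__":
    print(build_sg_change_query("sg-xxx", date(2025, 4, 8)))
```

The resulting string could be passed as `QueryString` to the Athena `start_query_execution` API (for example through `boto3.client("athena")`), together with a database and an S3 output location of your choosing.
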

---

## 🛡️ PART 3: Use AWS Config to Detect Non-Compliant SG Changes

### ✅ AWS Config Setup

Ensure AWS Config is:
- **Enabled in the account or centrally via org**
- Recording `AWS::EC2::SecurityGroup` as a tracked resource type

### 🔧 Create AWS Managed or Custom Rule

Use an AWS managed rule such as `INCOMING_SSH_DISABLED` or `RESTRICTED_INCOMING_TRAFFIC`, or create a **custom rule** (Lambda-backed) to detect violations like:

- **Ports open to 0.0.0.0/0**
- **New rules outside of allowed IP range**
- **Unauthorized source CIDRs**

### ✅ Example: Custom AWS Config Rule for CIDR Scope

You can use the [AWS Config Rule example](https://docs.aws.amazon.com/config/latest/developerguide/evaluate-config_develop-rules_nodejs.html) or build a custom rule in Python that flags any ingress rule with `0.0.0.0/0`.

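The core of such a custom rule is a single evaluation function. The sketch below covers only that decision logic, not the Lambda handler or the call that reports results back to AWS Config. It assumes the configuration item for a security group exposes `ipPermissions` entries with `ipv4Ranges` (`cidrIp`), `ipv6Ranges` (`cidrIpv6`), and a plain `ipRanges` list; verify those key names against a real configuration item before relying on them. The function name `evaluate_security_group` is illustrative.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def evaluate_security_group(configuration):
    """Return a Config compliance type for one security group configuration."""
    for permission in configuration.get("ipPermissions", []):
        # Key names below are assumptions about the Config item layout.
        cidrs = [r.get("cidrIp") for r in permission.get("ipv4Ranges", [])]
        cidrs += [r.get("cidrIpv6") for r in permission.get("ipv6Ranges", [])]
        cidrs += list(permission.get("ipRanges", []))
        if OPEN_CIDRS.intersection(cidrs):
            return "NON_COMPLIANT"
    return "COMPLIANT"
```

A stricter rule would also compare `fromPort` and `toPort` against an allow-list, so that a public HTTPS listener is treated differently from public SSH.
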

---

## 🚨 PART 4: Automate Real-Time Alerting with EventBridge + SNS + MS Teams

### 🧱 Overview of Architecture

1. **EventBridge Rule** watches for Security Group API calls such as `AuthorizeSecurityGroupIngress`
2. **SNS Topic** receives those events
3. **Lambda Function** transforms each event into MS Teams format and posts it via webhook

---

### 🔧 Step-by-Step: Setup

#### ✅ 1. Create EventBridge Rule

```json
{
  "source": ["aws.ec2"],
  "detail-type": ["AWS API Call via CloudTrail"],
  "detail": {
    "eventName": [
      "AuthorizeSecurityGroupIngress",
      "RevokeSecurityGroupIngress",
      "AuthorizeSecurityGroupEgress",
      "RevokeSecurityGroupEgress"
    ],
    "requestParameters": {
      "groupId": ["sg-xxx"]
    }
  }
}
```

Event patterns mirror the nesting of the event itself, so `groupId` is matched inside a `requestParameters` object rather than through a dotted key such as `requestParameters.groupId`.

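The rule can also be created from code, which keeps the pattern under version control. The following is a sketch, assuming `boto3` and suitable permissions; `build_sg_change_pattern`, `create_rule`, and the target ID `sg-change-sns` are illustrative names. It uses the EventBridge `put_rule` and `put_targets` calls, and the SNS topic additionally needs a resource policy that lets EventBridge publish to it.

```python
import json

SG_CHANGE_EVENTS = [
    "AuthorizeSecurityGroupIngress",
    "RevokeSecurityGroupIngress",
    "AuthorizeSecurityGroupEgress",
    "RevokeSecurityGroupEgress",
]


def build_sg_change_pattern(sg_ids):
    """Build an EventBridge event pattern for changes to the given groups."""
    return {
        "source": ["aws.ec2"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventName": list(SG_CHANGE_EVENTS),
            # Nested object, mirroring the structure of the CloudTrail event.
            "requestParameters": {"groupId": list(sg_ids)},
        },
    }


def create_rule(rule_name, sg_ids, topic_arn):
    """Create the rule and point it at an SNS topic; needs boto3 and credentials."""
    import boto3  # imported lazily so the pattern builder works without it

    events = boto3.client("events")
    events.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_sg_change_pattern(sg_ids)),
        State="ENABLED",
    )
    events.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "sg-change-sns", "Arn": topic_arn}],
    )
```

Passing several IDs, for example `build_sg_change_pattern(["sg-xxx", "sg-yyy"])`, matches a change to any of them, because values inside a pattern array are combined with OR.
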
#### ✅ 2. Create SNS Topic (e.g., `SGChangeAlerts`)

- In **SNS**, create a topic
- Add Lambda (below) as a subscriber

#### ✅ 3. Create MS Teams Incoming Webhook

- In MS Teams:
  - Go to a channel
  - Choose “Connectors” → “Incoming Webhook”
  - Name it, copy the **Webhook URL**


---

#### ✅ 4. Lambda Function to Post to MS Teams

Use a simple Python Lambda. The webhook URL is read from an environment variable (the name `TEAMS_WEBHOOK_URL` is a suggestion) so that it is not hardcoded in the source:

```python
import json
import os

import urllib3

http = urllib3.PoolManager()
# Set TEAMS_WEBHOOK_URL in the Lambda configuration instead of hardcoding the URL.
teams_webhook_url = os.environ.get("TEAMS_WEBHOOK_URL", "<YOUR_MS_TEAMS_WEBHOOK_URL>")


def lambda_handler(event, context):
    """Post one Teams message per SNS record that carries an EventBridge event."""
    for record in event['Records']:
        # SNS delivers the EventBridge event as a JSON string.
        message = json.loads(record['Sns']['Message'])
        detail = message.get('detail', {})

        # requestParameters and userIdentity can be null, so fall back to {}.
        sg_id = (detail.get('requestParameters') or {}).get('groupId', 'Unknown SG')
        event_name = detail.get('eventName', 'UnknownEvent')
        actor = (detail.get('userIdentity') or {}).get('arn', 'Unknown')
        region = detail.get('awsRegion', 'Unknown')
        source_ip = detail.get('sourceIPAddress', 'Unknown')

        msg = {
            "title": "⚠️ AWS Security Group Change Detected",
            "text": f"**Event**: {event_name}\n**SG**: {sg_id}\n**User**: {actor}\n**Region**: {region}\n**Source IP**: {source_ip}"
        }

        response = http.request(
            'POST',
            teams_webhook_url,
            body=json.dumps(msg).encode('utf-8'),
            headers={'Content-Type': 'application/json'},
        )
        # Fail loudly so a broken webhook shows up in the Lambda error metrics.
        if response.status >= 300:
            raise RuntimeError(f"Teams webhook returned HTTP {response.status}")
```

- Attach the basic Lambda execution role (CloudWatch Logs) and allow the SNS topic to invoke the function
- Test by triggering a manual SG change


---

## 🔄 PART 5: Best Practices for Automation & Audit Maturity

| Category | Best Practice |
|----------|----------------|
| **Logging** | Store CloudTrail in centralized, versioned, encrypted S3 bucket |
| **Query** | Partition Athena tables by `year/month/day` for efficiency |
| **Alerting** | Always include user identity, IP, region in alerts |
| **Tagging** | Tag SGs with `Owner`, `Environment`, `ChangeTicket` |
| **Governance** | Use SCPs and IAM boundaries to restrict unauthorized SG changes |
| **Forensics** | Store Athena queries in a shared repo; automate with scheduled queries for weekly audits |
| **Cost Optimization** | Use Amazon Athena scheduled queries + QuickSight dashboards instead of external tools |

---

## 🎯 Final Thoughts: Governance-Driven Cloud Security

This solution provides:
- **Immediate visibility** (EventBridge + Teams alerting)
- **Historical traceability** (CloudTrail + Athena)
- **Policy enforcement** (AWS Config + IAM/SCP)
- **Audit readiness** (Tagging + documentation + centralized logs)

By integrating these layers, you go beyond reaction and establish a **proactive, auditable, and secure cloud operating model**.


---

The sections below extend the **Athena Query Suite** to support **real-time or near-real-time** analysis of **Critical Alerts** across:

1. 🔐 **Security**
2. 🌐 **Network**
3. 🏗️ **Infrastructure**
4. 💻 **EC2 Runtime**

---

## 🧠 Goal:

Design and implement **production-ready Athena SQL queries**, with **partition projection**, **structured access**, and **cost-efficient filtering**, for all **critical alert types**. These queries should:
- Align with **AWS best practices**
- Be easy to integrate with **scheduled queries, dashboards, or alert pipelines**
- Prioritize **precision, performance, and auditability**

---

## 🧩 Improvements Identified from Previous Athena Queries:

| Area | Original | To-Be |
|------|----------|-------|
| Filtering | Broad or unpartitioned | Partitioned by `year`, `month`, `day`, `region` |
| Identity Insight | Simple `userIdentity.arn` | Full `sessionContext.sessionIssuer.userName`, MFA check |
| Parameter Handling | Substring matching on the raw JSON | `json_extract_scalar(requestParameters, '$.groupId')`, because the column is a JSON `STRING` (`requestParameters['groupId']` would need a `MAP` column) |
| Output Fields | Too generic | Specific fields like `eventName`, `caller`, `ip`, `action`, `resourceId` |
| Extensibility | Hardcoded SG ID | Accepts any SG, port, or IP, which makes it reusable |

---

## ✅ Full Set of **Critical Athena Queries**


---

### 🔐 **1. Security Alerts**

---

#### 🔸 A. Unauthorized API Calls (Brute Force, Exploitation Attempts)

> UnauthorizedOperation.sql

```sql
SELECT
  eventTime,
  userIdentity.arn AS user_arn,
  sourceIPAddress,
  awsRegion,
  eventName,
  errorCode
FROM cloudtrail_logs
WHERE (errorCode LIKE '%UnauthorizedOperation%' OR errorCode LIKE 'AccessDenied%')
  AND year = '2025'
  AND month = '04'
  AND day BETWEEN '01' AND '08'
ORDER BY eventTime DESC;
```

> 📌 **Note**: `UnauthorizedOperation` is an `errorCode`, not an `eventName`. The `LIKE` patterns also catch variants such as `Client.UnauthorizedOperation` and `AccessDeniedException`.


---

#### 🔸 B. Root Account Usage

> RootAccountUsage.sql

```sql
SELECT
  eventTime,
  eventName,
  sourceIPAddress,
  userIdentity.arn AS user_arn,
  userAgent
FROM cloudtrail_logs
WHERE userIdentity.type = 'Root'
  AND year = '2025'
  AND month = '04'
  AND day BETWEEN '01' AND '08'
ORDER BY eventTime DESC;
```

> ✅ Use `userIdentity.type = 'Root'`: the most accurate method to detect root usage across API calls.


---

#### 🔸 C. Security Group Rule Changes

> SecurityGroupRuleChanges.sql

```sql
-- requestParameters is a JSON string column, so fields are read with json_extract*.
SELECT
  eventTime,
  userIdentity.arn AS user,
  eventName,
  json_extract_scalar(requestParameters, '$.groupId') AS securityGroupId,
  json_extract(requestParameters, '$.ipPermissions') AS modifiedPermissions,
  sourceIPAddress
FROM cloudtrail_logs
WHERE eventName IN (
    'AuthorizeSecurityGroupIngress',
    'RevokeSecurityGroupIngress',
    'AuthorizeSecurityGroupEgress',
    'RevokeSecurityGroupEgress',
    'UpdateSecurityGroupRuleDescriptionsIngress',
    'UpdateSecurityGroupRuleDescriptionsEgress'
  )
  AND year = '2025'
  AND month = '04'
  AND day BETWEEN '01' AND '08'
ORDER BY eventTime DESC;
```


---

#### 🔸 D. IAM Policy/Role Changes

```sql
-- requestParameters is a JSON string column, so fields are read with json_extract_scalar.
SELECT
  eventTime,
  eventName,
  userIdentity.arn AS user,
  json_extract_scalar(requestParameters, '$.roleName') AS role,
  json_extract_scalar(requestParameters, '$.policyDocument') AS newPolicy,
  sourceIPAddress
FROM cloudtrail_logs
WHERE eventName IN (
    'PutRolePolicy', 'AttachRolePolicy', 'CreatePolicy',
    'CreateRole', 'UpdateAssumeRolePolicy'
  )
  AND year = '2025'
  AND month = '04'
  AND day BETWEEN '01' AND '08'
ORDER BY eventTime DESC;
```


---

#### 🔸 E. Port 22/3389 Open to 0.0.0.0/0

```sql
SELECT
  eventTime,
  userIdentity.arn AS user,
  json_extract_scalar(requestParameters, '$.groupId') AS sg_id,
  json_extract(requestParameters, '$.ipPermissions') AS permissions,
  sourceIPAddress
FROM cloudtrail_logs
WHERE eventName IN ('AuthorizeSecurityGroupIngress')
  -- Coarse substring match on the raw JSON: '%22%' also matches ports such as
  -- 2222 or 8022, so treat the rows as candidates and confirm the port range.
  AND requestParameters LIKE '%0.0.0.0/0%'
  AND (
    requestParameters LIKE '%22%' OR
    requestParameters LIKE '%3389%'
  )
  AND year = '2025'
  AND month = '04'
  AND day BETWEEN '01' AND '08'
ORDER BY eventTime DESC;
```

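Substring matching on port numbers can over-match, so rows returned by a query like the one above are best confirmed by parsing the rule itself. The sketch below does that for one `requestParameters` value. It assumes the shape CloudTrail typically records for `AuthorizeSecurityGroupIngress` (`ipPermissions.items[]` with `fromPort`, `toPort`, `ipProtocol`, and `ipRanges.items[].cidrIp`); check a sample event from your own trail, because newer request formats can differ. The function name `exposes_admin_port` is illustrative.

```python
import json


def exposes_admin_port(request_parameters, ports=(22, 3389)):
    """Return True if a rule opens one of `ports` to 0.0.0.0/0."""
    params = request_parameters
    if isinstance(params, str):
        params = json.loads(params)

    # Assumed layout: {"ipPermissions": {"items": [ ... ]}}
    permissions = (params.get("ipPermissions") or {}).get("items", [])
    for permission in permissions:
        ranges = (permission.get("ipRanges") or {}).get("items", [])
        if not any(r.get("cidrIp") == "0.0.0.0/0" for r in ranges):
            continue
        if permission.get("ipProtocol") == "-1":
            return True  # "-1" means all protocols and all ports
        low, high = permission.get("fromPort"), permission.get("toPort")
        if low is None or high is None:
            continue
        # A rule covers a port when the port falls inside its range.
        if any(low <= port <= high for port in ports):
            return True
    return False
```

Unlike the `LIKE '%22%'` filter, this treats a rule for port 2222 as unrelated to SSH, and it does flag a wide range such as 0 to 65535 that never mentions 22 literally.
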

---

### 🌐 **2. Network Alerts**

---

#### 🔸 A. NAT Gateway or Internet Gateway Failures (CloudTrail-Based)

CloudTrail records API calls, not gateway health, so it does not capture health-check failures. Use **CloudWatch + SNS** for real-time alerts on those. What CloudTrail can do is **detect removal** of a NAT gateway or internet gateway:

```sql
SELECT
  eventTime,
  eventName,
  userIdentity.arn AS user,
  -- The parameter key differs per API; these key names are assumptions,
  -- so confirm them against a sample event from your trail.
  COALESCE(
    json_extract_scalar(requestParameters, '$.natGatewayId'),
    json_extract_scalar(requestParameters, '$.internetGatewayId')
  ) AS gateway_id,
  sourceIPAddress
FROM cloudtrail_logs
WHERE eventName IN ('DeleteNatGateway', 'DetachInternetGateway')
  AND year = '2025'
  AND month = '04'
  AND day BETWEEN '01' AND '08'
ORDER BY eventTime DESC;
```


---

#### 🔸 B. VPC Flow Logs: Suspicious Connections (Query on Parsed Flow Logs)

This requires **VPC Flow Logs** parsed via **Athena + Glue**. Sample query pattern:

```sql
SELECT *
FROM vpc_flow_logs_parquet
-- Placeholder addresses: replace them with the destinations you consider suspicious.
WHERE dstaddr IN ('1.2.3.4', '8.8.8.8')
  AND dstport IN (22, 3389, 3306)
  AND action = 'ACCEPT'
  AND day BETWEEN '2025-04-01' AND '2025-04-08'
ORDER BY start DESC;
```


---

### 🏗️ **3. Infrastructure Monitoring Alerts**

> These are best handled via **CloudWatch alarms**, but some are detectable via Athena from CloudTrail as *resource state changes*.

---

#### 🔸 A. Auto Scaling Group Launch Failures

```sql
-- Finds failed CreateAutoScalingGroup API calls. Instance launch failures inside
-- an existing group are not API errors and will not appear in this result.
SELECT
  eventTime,
  userIdentity.arn,
  eventName,
  errorMessage,
  sourceIPAddress
FROM cloudtrail_logs
WHERE eventSource = 'autoscaling.amazonaws.com'
  AND eventName = 'CreateAutoScalingGroup'
  AND errorCode IS NOT NULL
  AND year = '2025'
  AND month = '04'
ORDER BY eventTime DESC;
```


---

#### 🔸 B. Backup Failure (AWS Backup)

```sql
SELECT
  eventTime,
  eventName,
  userIdentity.arn,
  json_extract_scalar(requestParameters, '$.backupVaultName') AS backup_vault,
  errorMessage
FROM cloudtrail_logs
WHERE eventSource = 'backup.amazonaws.com'
  AND eventName = 'StartBackupJob'
  AND errorCode IS NOT NULL
  AND year = '2025'
  AND month = '04'
ORDER BY eventTime DESC;
```


---

### 💻 **4. EC2 Instance Alerts**

---

#### 🔸 A. Unexpected Stop or Termination

```sql
SELECT
  eventTime,
  userIdentity.arn,
  eventName,
  -- Instance IDs are nested under instancesSet.items; this returns the whole list.
  json_extract(requestParameters, '$.instancesSet.items') AS instances,
  sourceIPAddress
FROM cloudtrail_logs
WHERE eventSource = 'ec2.amazonaws.com'
  AND eventName IN ('StopInstances', 'TerminateInstances')
  AND year = '2025'
  AND month = '04'
ORDER BY eventTime DESC;
```


---

#### 🔸 B. EC2 Status Check Failure (Indirect Detection)

CloudTrail does not record status check results. For the failures themselves, hook into a **CloudWatch Alarm** on the `StatusCheckFailed` metric. From CloudTrail you can only see who polled instance status through `DescribeInstanceStatus` calls:

```sql
SELECT
  eventTime,
  userIdentity.arn,
  eventName,
  -- Instance IDs are nested under instancesSet.items; this returns the whole list.
  json_extract(requestParameters, '$.instancesSet.items') AS instances,
  sourceIPAddress
FROM cloudtrail_logs
WHERE eventSource = 'ec2.amazonaws.com'
  AND eventName = 'DescribeInstanceStatus'
  AND year = '2025'
  AND month = '04'
ORDER BY eventTime DESC;
```


---

## 🧱 Implementation Tips

- Automate Athena queries on a **schedule (daily/hourly)**, for example with an EventBridge schedule that starts the query
- Export results to **S3 + QuickSight** or **alert on non-empty results**
- Pair with **EventBridge rules** for real-time alerts
- Use **Lambda** to format and send alert messages to:
  - **MS Teams / Slack / Email**
  - Custom dashboards

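Alerting on non-empty results comes down to checking whether a finished query returned any data rows. The sketch below parses the response shape of the Athena `GetQueryResults` API, where the first row of a `SELECT` result holds the column names. The function name `extract_findings` is illustrative, and starting and polling the query are left out.

```python
def extract_findings(query_results):
    """Turn a GetQueryResults response into a list of row dictionaries."""
    rows = query_results["ResultSet"]["Rows"]
    if not rows:
        return []
    # For SELECT queries the first row of the first page is the header.
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    findings = []
    for row in rows[1:]:
        # NULL cells come back as empty objects, so .get() yields None.
        values = [cell.get("VarCharValue") for cell in row["Data"]]
        findings.append(dict(zip(header, values)))
    return findings
```

A scheduled job would call this on the response and send an alert only when the returned list is not empty, attaching the rows as alert details.
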

---

## ✅ Summary Table of Enhanced Queries

| Alert Type | Query Name | Detection Method |
|------------|------------|------------------|
| Security | Unauthorized API Calls | CloudTrail via Athena |
| Security | Root Account Usage | `userIdentity.type = 'Root'` |
| Security | SG Rule Changes | `eventName` in SG Ops |
| Security | IAM Policy Changes | `PutRolePolicy`, etc. |
| Security | Public SSH/RDP | `ipPermissions` contains `0.0.0.0/0` |
| Network | NAT/IGW Delete | `DeleteNatGateway`, `DetachInternetGateway` |
| Network | VPC Flow Anomaly | Join with VPC logs |
| Infra | ASG Fail | CloudTrail errorCode on `CreateAutoScalingGroup` |
| Infra | AWS Backup Fail | `StartBackupJob` with error |
| EC2 | Unexpected Stop | `StopInstances`, `TerminateInstances` |
| EC2 | Status Check Fail | `DescribeInstanceStatus` queries |


---

> TODO: Turn this into a full-scale, automated DevSecOps solution:

- **Terraform/CDK** to create scheduled Athena queries + alerts
- A ready-made **QuickSight security dashboard**
- A **GitHub repo** for these SQL files + alert infrastructure

---