@patricio0312rev/skillset 0.1.0
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/CHANGELOG.md +29 -0
- package/LICENSE +21 -0
- package/README.md +176 -0
- package/bin/cli.js +37 -0
- package/package.json +55 -0
- package/src/commands/init.js +301 -0
- package/src/index.js +168 -0
- package/src/lib/config.js +200 -0
- package/src/lib/generator.js +166 -0
- package/src/utils/display.js +95 -0
- package/src/utils/readme.js +196 -0
- package/src/utils/tool-specific.js +233 -0
- package/templates/ai-engineering/agent-orchestration-planner/SKILL.md +266 -0
- package/templates/ai-engineering/cost-latency-optimizer/SKILL.md +270 -0
- package/templates/ai-engineering/doc-to-vector-dataset-generator/SKILL.md +239 -0
- package/templates/ai-engineering/evaluation-harness/SKILL.md +219 -0
- package/templates/ai-engineering/guardrails-safety-filter-builder/SKILL.md +226 -0
- package/templates/ai-engineering/llm-debugger/SKILL.md +283 -0
- package/templates/ai-engineering/prompt-regression-tester/SKILL.md +216 -0
- package/templates/ai-engineering/prompt-template-builder/SKILL.md +393 -0
- package/templates/ai-engineering/rag-pipeline-builder/SKILL.md +244 -0
- package/templates/ai-engineering/tool-function-schema-designer/SKILL.md +219 -0
- package/templates/architecture/adr-writer/SKILL.md +250 -0
- package/templates/architecture/api-versioning-deprecation-planner/SKILL.md +331 -0
- package/templates/architecture/domain-model-boundaries-mapper/SKILL.md +300 -0
- package/templates/architecture/migration-planner/SKILL.md +376 -0
- package/templates/architecture/performance-budget-setter/SKILL.md +318 -0
- package/templates/architecture/reliability-strategy-builder/SKILL.md +286 -0
- package/templates/architecture/rfc-generator/SKILL.md +362 -0
- package/templates/architecture/scalability-playbook/SKILL.md +279 -0
- package/templates/architecture/system-design-generator/SKILL.md +339 -0
- package/templates/architecture/tech-debt-prioritizer/SKILL.md +329 -0
- package/templates/backend/api-contract-normalizer/SKILL.md +487 -0
- package/templates/backend/api-endpoint-generator/SKILL.md +415 -0
- package/templates/backend/auth-module-builder/SKILL.md +99 -0
- package/templates/backend/background-jobs-designer/SKILL.md +166 -0
- package/templates/backend/caching-strategist/SKILL.md +190 -0
- package/templates/backend/error-handling-standardizer/SKILL.md +174 -0
- package/templates/backend/rate-limiting-abuse-protection/SKILL.md +147 -0
- package/templates/backend/rbac-permissions-builder/SKILL.md +158 -0
- package/templates/backend/service-layer-extractor/SKILL.md +269 -0
- package/templates/backend/webhook-receiver-hardener/SKILL.md +211 -0
- package/templates/ci-cd/artifact-sbom-publisher/SKILL.md +236 -0
- package/templates/ci-cd/caching-strategy-optimizer/SKILL.md +195 -0
- package/templates/ci-cd/deployment-checklist-generator/SKILL.md +381 -0
- package/templates/ci-cd/github-actions-pipeline-creator/SKILL.md +348 -0
- package/templates/ci-cd/monorepo-ci-optimizer/SKILL.md +298 -0
- package/templates/ci-cd/preview-environments-builder/SKILL.md +187 -0
- package/templates/ci-cd/quality-gates-enforcer/SKILL.md +342 -0
- package/templates/ci-cd/release-automation-builder/SKILL.md +281 -0
- package/templates/ci-cd/rollback-workflow-builder/SKILL.md +372 -0
- package/templates/ci-cd/secrets-env-manager/SKILL.md +242 -0
- package/templates/db-management/backup-restore-runbook-generator/SKILL.md +505 -0
- package/templates/db-management/data-integrity-auditor/SKILL.md +505 -0
- package/templates/db-management/data-retention-archiving-planner/SKILL.md +430 -0
- package/templates/db-management/data-seeding-fixtures-builder/SKILL.md +375 -0
- package/templates/db-management/db-performance-watchlist/SKILL.md +425 -0
- package/templates/db-management/etl-sync-job-builder/SKILL.md +457 -0
- package/templates/db-management/multi-tenant-safety-checker/SKILL.md +398 -0
- package/templates/db-management/prisma-migration-assistant/SKILL.md +379 -0
- package/templates/db-management/schema-consistency-checker/SKILL.md +440 -0
- package/templates/db-management/sql-query-optimizer/SKILL.md +324 -0
- package/templates/foundation/changelog-writer/SKILL.md +431 -0
- package/templates/foundation/code-formatter-installer/SKILL.md +320 -0
- package/templates/foundation/codebase-summarizer/SKILL.md +360 -0
- package/templates/foundation/dependency-doctor/SKILL.md +163 -0
- package/templates/foundation/dev-environment-bootstrapper/SKILL.md +259 -0
- package/templates/foundation/dev-onboarding-builder/SKILL.md +556 -0
- package/templates/foundation/docs-starter-kit/SKILL.md +574 -0
- package/templates/foundation/explaining-code/SKILL.md +13 -0
- package/templates/foundation/git-hygiene-enforcer/SKILL.md +455 -0
- package/templates/foundation/project-scaffolder/SKILL.md +65 -0
- package/templates/foundation/project-scaffolder/references/templates.md +126 -0
- package/templates/foundation/repo-structure-linter/SKILL.md +0 -0
- package/templates/foundation/repo-structure-linter/references/conventions.md +98 -0
- package/templates/frontend/animation-micro-interaction-pack/SKILL.md +41 -0
- package/templates/frontend/component-scaffold-generator/SKILL.md +562 -0
- package/templates/frontend/design-to-component-translator/SKILL.md +547 -0
- package/templates/frontend/form-wizard-builder/SKILL.md +553 -0
- package/templates/frontend/frontend-refactor-planner/SKILL.md +37 -0
- package/templates/frontend/i18n-frontend-implementer/SKILL.md +44 -0
- package/templates/frontend/modal-drawer-system/SKILL.md +377 -0
- package/templates/frontend/page-layout-builder/SKILL.md +630 -0
- package/templates/frontend/state-ux-flow-builder/SKILL.md +23 -0
- package/templates/frontend/table-builder/SKILL.md +350 -0
- package/templates/performance/alerting-dashboard-builder/SKILL.md +162 -0
- package/templates/performance/backend-latency-profiler-helper/SKILL.md +108 -0
- package/templates/performance/caching-cdn-strategy-planner/SKILL.md +150 -0
- package/templates/performance/capacity-planning-helper/SKILL.md +242 -0
- package/templates/performance/core-web-vitals-tuner/SKILL.md +126 -0
- package/templates/performance/incident-runbook-generator/SKILL.md +162 -0
- package/templates/performance/load-test-scenario-builder/SKILL.md +256 -0
- package/templates/performance/observability-setup/SKILL.md +232 -0
- package/templates/performance/postmortem-writer/SKILL.md +203 -0
- package/templates/performance/structured-logging-standardizer/SKILL.md +122 -0
- package/templates/security/auth-security-reviewer/SKILL.md +428 -0
- package/templates/security/dependency-vulnerability-triage/SKILL.md +495 -0
- package/templates/security/input-validation-sanitization-auditor/SKILL.md +76 -0
- package/templates/security/pii-redaction-logging-policy-builder/SKILL.md +65 -0
- package/templates/security/rbac-policy-tester/SKILL.md +80 -0
- package/templates/security/secrets-scanner/SKILL.md +462 -0
- package/templates/security/secure-headers-csp-builder/SKILL.md +404 -0
- package/templates/security/security-incident-playbook-generator/SKILL.md +76 -0
- package/templates/security/security-pr-checklist-skill/SKILL.md +62 -0
- package/templates/security/threat-model-generator/SKILL.md +394 -0
- package/templates/testing/contract-testing-builder/SKILL.md +492 -0
- package/templates/testing/coverage-strategist/SKILL.md +436 -0
- package/templates/testing/e2e-test-builder/SKILL.md +382 -0
- package/templates/testing/flaky-test-detective/SKILL.md +416 -0
- package/templates/testing/integration-test-builder/SKILL.md +525 -0
- package/templates/testing/mocking-assistant/SKILL.md +383 -0
- package/templates/testing/snapshot-test-refactorer/SKILL.md +375 -0
- package/templates/testing/test-data-factory-builder/SKILL.md +449 -0
- package/templates/testing/test-reporting-triage-skill/SKILL.md +469 -0
- package/templates/testing/unit-test-generator/SKILL.md +548 -0
@@ -0,0 +1,505 @@

---
name: backup-restore-runbook-generator
description: Creates comprehensive disaster recovery procedures with automated backup scripts, restore procedures, validation checks, and role assignments. Use for "database backup", "disaster recovery", "data restore", or "DR planning".
---

# Backup/Restore Runbook Generator

Create reliable disaster recovery procedures for your databases.

## Backup Strategy

````markdown
# Database Backup Strategy

## Backup Types

### 1. Full Backup (Daily)

- **When**: 2:00 AM UTC
- **Retention**: 30 days
- **Storage**: S3 `s3://backups/full/`
- **Size**: ~50 GB
- **Duration**: ~45 minutes

### 2. Incremental Backup (Hourly)

- **When**: Every hour
- **Retention**: 7 days
- **Storage**: S3 `s3://backups/incremental/`
- **Size**: ~500 MB
- **Duration**: ~5 minutes

### 3. Transaction Log Backup (Every 15 min)

- **When**: Every 15 minutes
- **Retention**: 3 days
- **Storage**: S3 `s3://backups/wal/`
- **Purpose**: Enables point-in-time recovery (PITR)
````

## Backup Automation

### PostgreSQL

```bash
#!/bin/bash
# scripts/backup-postgres.sh

# Exit on errors, on unset variables, and on failures inside pipelines
set -euo pipefail

# Configuration
DB_NAME="production"
DB_USER="postgres"
DB_HOST="postgres.example.com"
BACKUP_DIR="/var/backups/postgres"
S3_BUCKET="s3://my-backups/postgres"
DATE=$(date +%Y%m%d_%H%M%S)
# --format=custom writes a compressed pg_dump archive (restore it with pg_restore);
# the .sql.gz suffix is kept so the restore and cleanup steps can match it
FILENAME="${DB_NAME}_${DATE}.sql.gz"

# Create backup directory
mkdir -p "$BACKUP_DIR"

echo "🔄 Starting backup: $FILENAME"

# Full backup with pg_dump (password comes from ~/.pgpass or PGPASSWORD)
pg_dump \
  --host="$DB_HOST" \
  --username="$DB_USER" \
  --dbname="$DB_NAME" \
  --format=custom \
  --compress=9 \
  --file="$BACKUP_DIR/$FILENAME" \
  --verbose

# Verify backup: the file must be non-empty and pg_restore must be able to read its table of contents
if [ -s "$BACKUP_DIR/$FILENAME" ] && pg_restore --list "$BACKUP_DIR/$FILENAME" > /dev/null; then
  SIZE=$(du -h "$BACKUP_DIR/$FILENAME" | cut -f1)
  echo "✅ Backup created: $SIZE"
else
  echo "❌ Backup failed"
  exit 1
fi

# Upload to S3
echo "📤 Uploading to S3..."
aws s3 cp "$BACKUP_DIR/$FILENAME" "$S3_BUCKET/" \
  --storage-class STANDARD_IA

# Verify upload
if aws s3 ls "$S3_BUCKET/$FILENAME"; then
  echo "✅ Uploaded to S3"
else
  echo "❌ S3 upload failed"
  exit 1
fi

# Cleanup old local backups (keep last 7 days)
find "$BACKUP_DIR" -type f -name "*.sql.gz" -mtime +7 -delete
echo "🗑️ Cleaned up old local backups"

# Send notification (skipped when SLACK_WEBHOOK is not set)
if [ -n "${SLACK_WEBHOOK:-}" ]; then
  curl -X POST "$SLACK_WEBHOOK" \
    -H 'Content-Type: application/json' \
    -d "{\"text\": \"✅ Database backup complete: $FILENAME ($SIZE)\"}"
fi

echo "✅ Backup complete!"
```
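
The script above only prunes local copies older than 7 days; nothing in it enforces the per-tier S3 retention from the strategy (30 days full, 7 days incremental, 3 days transaction logs). S3 lifecycle expiration rules on each prefix are the usual way to do that without code. The sketch below expresses the same policy as a pure function so it can be tested, under stated assumptions: the `BackupObject` shape and tier names are hypothetical, and it adds one guard that plain age-based expiration does not provide, never deleting the newest full backup.

```typescript
// Sketch of per-tier retention (assumed object shape; tiers mirror the strategy above)
type BackupTier = 'full' | 'incremental' | 'wal';

interface BackupObject {
  key: string;
  tier: BackupTier;
  lastModified: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Retention windows from the backup strategy
const RETENTION_DAYS: Record<BackupTier, number> = { full: 30, incremental: 7, wal: 3 };

// Returns the keys of backups that have outlived their tier's retention window
function expiredBackups(objects: BackupObject[], now: Date): string[] {
  // Guard: keep the newest full backup even when it is past retention
  // (for example if the backup job has been failing for weeks)
  let newestFull: BackupObject | undefined;
  for (const o of objects) {
    if (o.tier === 'full' && (!newestFull || o.lastModified.getTime() > newestFull.lastModified.getTime())) {
      newestFull = o;
    }
  }

  return objects
    .filter((o) => o !== newestFull)
    .filter((o) => now.getTime() - o.lastModified.getTime() > RETENTION_DAYS[o.tier] * DAY_MS)
    .map((o) => o.key);
}
```

The returned keys could then be deleted with `aws s3 rm` or the SDK's `DeleteObjectsCommand`; keeping the decision separate from the deletion makes a dry run trivial.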

### MySQL

```bash
#!/bin/bash
# scripts/backup-mysql.sh

# pipefail: abort if mysqldump fails, not only if gzip fails
set -euo pipefail

DB_NAME="production"
DB_USER="root"
BACKUP_DIR="/var/backups/mysql"
# Read the password from an option file (chmod 600) containing
#   [client]
#   password=...
# instead of --password=..., which would expose it in the process list.
# MYSQL_DEFAULTS_FILE is an assumed variable pointing at that file.
DEFAULTS_FILE="${MYSQL_DEFAULTS_FILE:?Set MYSQL_DEFAULTS_FILE}"
DATE=$(date +%Y%m%d_%H%M%S)
FILENAME="${DB_NAME}_${DATE}.sql.gz"

mkdir -p "$BACKUP_DIR"

echo "🔄 Starting MySQL backup..."

# Backup with mysqldump (--defaults-extra-file must be the first option);
# --single-transaction takes a consistent InnoDB snapshot without locking tables
mysqldump \
  --defaults-extra-file="$DEFAULTS_FILE" \
  --user="$DB_USER" \
  --single-transaction \
  --quick \
  --lock-tables=false \
  --databases "$DB_NAME" \
  | gzip > "$BACKUP_DIR/$FILENAME"

# Upload to S3
aws s3 cp "$BACKUP_DIR/$FILENAME" s3://my-backups/mysql/

echo "✅ Backup complete!"
```

## Restore Procedures

### Full Restore

```bash
#!/bin/bash
# scripts/restore-postgres.sh

set -euo pipefail

BACKUP_FILE=${1:-}
RESTORE_DB="production_restored"

if [ -z "$BACKUP_FILE" ]; then
  echo "Usage: DB_HOST=<host> ./restore-postgres.sh <backup-file>"
  exit 1
fi

# Server to restore into; must be provided by the caller
DB_HOST="${DB_HOST:?Set DB_HOST to the server to restore into}"

echo "🔄 Starting restore from: $BACKUP_FILE"

# 1. Download from S3
echo "📥 Downloading backup..."
aws s3 cp "s3://my-backups/postgres/$BACKUP_FILE" /tmp/

# 2. Create a new database (restore next to production, never on top of it)
echo "🗄️ Creating database..."
psql -h "$DB_HOST" -U postgres -c "CREATE DATABASE $RESTORE_DB;"

# 3. Restore backup (custom-format archive written by backup-postgres.sh)
echo "🔄 Restoring data..."
pg_restore \
  --host="$DB_HOST" \
  --username=postgres \
  --dbname="$RESTORE_DB" \
  --verbose \
  "/tmp/$BACKUP_FILE"

# 4. Verify restore (-t -A prints the bare value, without headers or padding)
echo "✅ Verifying restore..."
TABLE_COUNT=$(psql -h "$DB_HOST" -U postgres -d "$RESTORE_DB" -tA -c "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema='public';")
echo "   Tables restored: $TABLE_COUNT"

ROW_COUNT=$(psql -h "$DB_HOST" -U postgres -d "$RESTORE_DB" -tA -c "SELECT COUNT(*) FROM users;")
echo "   User rows: $ROW_COUNT"

echo "✅ Restore complete!"
echo "   Database: $RESTORE_DB"
echo "   To use it, update the application config to point to $RESTORE_DB"
```
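
The restore script needs a backup file name, and the runbook's recovery-point step asks for the last good backup. Because the backup scripts name files `<db>_<YYYYMMDD>_<HHMMSS>.sql.gz`, that choice can be made from the names alone. A minimal sketch, assuming the timestamps are UTC (the scripts call `date` without `-u`, so this holds only if the backup host runs in UTC); `parseBackupTime` and `latestBackupBefore` are hypothetical helpers.

```typescript
// Hypothetical helpers: pick the newest backup taken at or before a cutoff.
// Assumes the "<db>_<YYYYMMDD>_<HHMMSS>.sql.gz" naming and UTC timestamps.
const BACKUP_NAME = /^(.+)_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})\.sql\.gz$/;

function parseBackupTime(fileName: string): Date | null {
  const m = BACKUP_NAME.exec(fileName);
  if (!m) return null; // not a backup file
  const [, , y, mo, d, h, mi, s] = m;
  return new Date(Date.UTC(Number(y), Number(mo) - 1, Number(d), Number(h), Number(mi), Number(s)));
}

// Newest backup whose timestamp is not after `cutoff` (e.g. when corruption started)
function latestBackupBefore(fileNames: string[], cutoff: Date): string | null {
  let best: { name: string; time: number } | null = null;
  for (const name of fileNames) {
    const t = parseBackupTime(name);
    if (t === null || t.getTime() > cutoff.getTime()) continue;
    if (best === null || t.getTime() > best.time) best = { name, time: t.getTime() };
  }
  return best?.name ?? null;
}
```

Choosing a cutoff slightly before the first sign of corruption, rather than "now", avoids restoring a backup that already contains the bad data.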
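
### Point-in-Time Recovery (PITR)

```bash
#!/bin/bash
# scripts/pitr-restore.sh
# Assumes PostgreSQL 12+ (recovery settings live in postgresql.conf or
# postgresql.auto.conf and a recovery.signal file starts recovery; recovery.conf
# is no longer read) and a physical base backup stored in S3 as a tar.gz of the
# data directory (assumed layout).

set -euo pipefail

TARGET_TIME=${1:-}   # Format: "2024-01-15 14:30:00" (quote it; it contains a space)
BASE_BACKUP=${2:-}   # S3 URI of a base backup taken earlier with pg_basebackup
PGDATA=/var/lib/postgresql/data

if [ -z "$TARGET_TIME" ] || [ -z "$BASE_BACKUP" ]; then
  echo "Usage: ./pitr-restore.sh \"<target-time>\" <s3-uri-of-base-backup>"
  exit 1
fi

echo "🔄 Point-in-Time Restore to: $TARGET_TIME"

# 1. Stop PostgreSQL and keep the damaged data directory as evidence
systemctl stop postgresql
mv "$PGDATA" "${PGDATA}.pre-pitr.$(date +%Y%m%d_%H%M%S)"
mkdir -p "$PGDATA"

# 2. Restore base backup (pg_basebackup copies a running server; it cannot restore one)
echo "📦 Restoring base backup..."
aws s3 cp "$BASE_BACKUP" - | tar -xzf - -C "$PGDATA"

# 3. Configure recovery (restore_command runs as the postgres user, which needs S3 access)
cat >> "$PGDATA/postgresql.auto.conf" << EOF
restore_command = 'aws s3 cp s3://my-backups/wal/%f %p'
recovery_target_time = '$TARGET_TIME'
recovery_target_action = 'promote'
EOF
touch "$PGDATA/recovery.signal"
chown -R postgres:postgres "$PGDATA"
chmod 700 "$PGDATA"

# 4. Start PostgreSQL
echo "🚀 Starting PostgreSQL in recovery mode..."
systemctl start postgresql

# 5. Wait for recovery: pg_isready can succeed while WAL is still being replayed,
#    so wait until the server reports that it has left recovery
until [ "$(sudo -u postgres psql -tAc 'SELECT pg_is_in_recovery();' 2>/dev/null || true)" = "f" ]; do
  echo "   Waiting for recovery..."
  sleep 5
done

echo "✅ PITR complete!"
```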
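
PITR works only while two things are still available: a physical base backup (from `pg_basebackup`; the daily `pg_dump` archives cannot serve as a PITR base) taken at or before the target time, and every archived WAL segment from that base backup up to the target. With the 3-day transaction log retention in the strategy, targets older than about three days are out of reach no matter how many full backups exist. A sketch of that feasibility check; the function name and its inputs are assumptions.

```typescript
// Hypothetical PITR feasibility check
interface PitrPlan {
  baseBackup: Date;
  target: Date;
}

// Returns the base backup to restore from, or null if `target` cannot be reached
function planPitr(
  target: Date,
  baseBackups: Date[], // start times of physical base backups (pg_basebackup)
  lastArchivedWal: Date, // time covered by the newest archived WAL segment
  now: Date,
  walRetentionMs: number, // 3 days in the strategy above
): PitrPlan | null {
  // WAL newer than the last archived segment has not reached S3 yet
  if (target.getTime() > lastArchivedWal.getTime()) return null;

  // WAL older than the retention window has been deleted, so a base backup
  // taken before that point can no longer be rolled forward
  const oldestWal = now.getTime() - walRetentionMs;

  // Prefer the newest usable base backup: less WAL to replay
  const candidates = baseBackups
    .filter((b) => b.getTime() <= target.getTime() && b.getTime() >= oldestWal)
    .sort((a, b) => b.getTime() - a.getTime());

  const base = candidates[0];
  return base ? { baseBackup: base, target } : null;
}
```

Running this before step 1 of the script tells you whether to attempt PITR at all or fall back to the nearest full restore.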

## Validation Checks

```bash
#!/bin/bash
# scripts/validate-restore.sh

set -euo pipefail

DB=${1:?Usage: ./validate-restore.sh <database>}

# -t -A: print bare values (no header, no padding) so they compare as integers
q() { psql -d "$DB" -tA -c "$1"; }

echo "🔍 Validating restore..."

# 1. Check table count
TABLES=$(q "SELECT COUNT(*) FROM information_schema.tables WHERE table_schema='public';")
echo "Tables: $TABLES"

if [ "$TABLES" -lt 10 ]; then
  echo "❌ Too few tables restored"
  exit 1
fi

# 2. Check row counts
for table in users products orders; do
  ROWS=$(q "SELECT COUNT(*) FROM $table;")
  echo "  $table: $ROWS rows"

  if [ "$ROWS" -lt 1 ]; then
    echo "❌ Table $table is empty"
    exit 1
  fi
done

# 3. Check constraints
CONSTRAINTS=$(q "SELECT COUNT(*) FROM information_schema.table_constraints WHERE constraint_type='FOREIGN KEY' AND table_schema='public';")
echo "Foreign keys: $CONSTRAINTS"

# 4. Check indexes
INDEXES=$(q "SELECT COUNT(*) FROM pg_indexes WHERE schemaname='public';")
echo "Indexes: $INDEXES"

# 5. Test query performance (date +%s%N requires GNU date).
# An equality lookup can use an index on users.email; a leading-wildcard LIKE
# ('%...') cannot, so it would be slow even with every index restored.
START=$(date +%s%N)
psql -d "$DB" -c "SELECT COUNT(*) FROM users WHERE email = 'someone@example.com';" > /dev/null
END=$(date +%s%N)
DURATION=$(( (END - START) / 1000000 ))
echo "Query performance: ${DURATION}ms"

if [ "$DURATION" -gt 1000 ]; then
  echo "⚠️ Slow query - missing indexes?"
fi

echo "✅ Validation complete!"
```
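
The row-count check above only proves that each table is non-empty. A stronger check compares restored counts with reference counts, for example captured from the source database right after the backup ran. A sketch, assuming the counts arrive as table-to-count maps (collected with the same `SELECT COUNT(*)` queries) and a 1% relative tolerance for writes between the backup and the reference capture; both are assumptions to adjust.

```typescript
// Hypothetical restored-vs-reference row count comparison
type RowCounts = Record<string, number>;

interface CountMismatch {
  table: string;
  expected: number;
  actual: number | undefined; // undefined: table missing from the restore
}

// Flags tables whose restored count differs from the reference by more than `tolerance` (relative)
function compareRowCounts(expected: RowCounts, actual: RowCounts, tolerance = 0.01): CountMismatch[] {
  const mismatches: CountMismatch[] = [];
  for (const [table, exp] of Object.entries(expected)) {
    const act: number | undefined = actual[table];
    const allowed = Math.ceil(exp * tolerance);
    if (act === undefined || Math.abs(act - exp) > allowed) {
      mismatches.push({ table, expected: exp, actual: act });
    }
  }
  return mismatches;
}
```

If the reference counts are captured inside the same snapshot as the backup, set `tolerance` to 0 and any difference becomes a failure.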

## Disaster Recovery Runbook

````markdown
# Disaster Recovery Runbook

## Incident Response

### 1. Assess Situation (5 minutes)

- [ ] Identify incident severity (P0/P1/P2)
- [ ] Determine data loss window
- [ ] Notify stakeholders

**Contacts:**

- DBA On-Call: [phone]
- Engineering Lead: [phone]
- CTO: [phone]

### 2. Stop the Bleeding (10 minutes)

- [ ] Enable maintenance mode
- [ ] Stop writes to corrupted database
- [ ] Preserve evidence (logs, backups)

```bash
# Stop all writes by scaling the API to zero replicas
# (this is a full outage; serve a maintenance page from your ingress or CDN if you have one)
kubectl scale deployment/api --replicas=0
```

### 3. Identify Recovery Point (15 minutes)

- [ ] Determine last good backup
- [ ] Check backup integrity
- [ ] Calculate data loss

```bash
# List available backups (keys embed the timestamp, so the newest are listed last)
aws s3 ls s3://my-backups/postgres/ | tail -20

# Check backup size
aws s3 ls s3://my-backups/postgres/production_20240115_020000.sql.gz --human-readable
```

### 4. Prepare Recovery Environment (30 minutes)

- [ ] Spin up new database instance
- [ ] Configure networking
- [ ] Test connectivity

```bash
# Create RDS instance (set --allocated-storage for non-Aurora engines; 100 GiB is an example size)
aws rds create-db-instance \
  --db-instance-identifier production-recovery \
  --db-instance-class db.r6g.xlarge \
  --engine postgres \
  --allocated-storage 100 \
  --master-username postgres \
  --master-user-password [secure-password]
```

### 5. Execute Restore (1-2 hours)

- [ ] Download backup from S3
- [ ] Run restore script
- [ ] Apply transaction logs (if PITR)
- [ ] Verify data integrity

```bash
# Run restore
./scripts/restore-postgres.sh production_20240115_020000.sql.gz

# Validate
./scripts/validate-restore.sh production_restored
```

### 6. Validate and Test (30 minutes)

- [ ] Run validation scripts
- [ ] Test critical queries
- [ ] Verify row counts
- [ ] Check data consistency

### 7. Cutover (15 minutes)

- [ ] Update application config
- [ ] Point DNS to new database
- [ ] Disable maintenance mode
- [ ] Monitor for errors

```bash
# Update connection string
kubectl set env deployment/api DATABASE_URL=postgresql://...

# Scale up
kubectl scale deployment/api --replicas=3
```

### 8. Post-Recovery (1 hour)

- [ ] Monitor system health
- [ ] Verify user reports
- [ ] Document incident
- [ ] Schedule postmortem

## Recovery Time Objective (RTO)

| Scenario        | Target     | Actual     |
| --------------- | ---------- | ---------- |
| Full restore    | 2 hours    | [measured] |
| PITR restore    | 3 hours    | [measured] |
| Region failover | 15 minutes | [measured] |

## Recovery Point Objective (RPO)

| Backup Type      | Data Loss Window |
| ---------------- | ---------------- |
| Full backup      | 24 hours         |
| Incremental      | 1 hour           |
| Transaction logs | 15 minutes       |

````

## Automated Backup Monitoring

```typescript
// scripts/monitor-backups.ts (uses the global fetch available in Node 18+)
import { S3Client, ListObjectsV2Command, type _Object } from '@aws-sdk/client-s3';

const s3 = new S3Client({ region: 'us-east-1' });

const MAX_AGE_HOURS = 25; // daily full backup plus one hour of slack
const MIN_SIZE_GB = 10; // full backups are expected to be ~50 GB

// Post a message to a Slack incoming webhook (no-op when SLACK_WEBHOOK is unset)
async function sendSlackAlert(text: string): Promise<void> {
  const url = process.env.SLACK_WEBHOOK;
  if (!url) return;
  await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ text }),
  });
}

// ListObjectsV2 returns keys in ascending key order, at most 1,000 per page,
// so page through everything instead of trusting the first few results
async function listBackups(bucket: string, prefix: string): Promise<_Object[]> {
  const objects: _Object[] = [];
  let token: string | undefined;
  do {
    const page = await s3.send(
      new ListObjectsV2Command({ Bucket: bucket, Prefix: prefix, ContinuationToken: token }),
    );
    objects.push(...(page.Contents ?? []));
    token = page.NextContinuationToken;
  } while (token);
  return objects;
}

async function checkBackupHealth(): Promise<void> {
  const backups = await listBackups('my-backups', 'postgres/');

  // Newest backup first
  backups.sort((a, b) => (b.LastModified?.getTime() ?? 0) - (a.LastModified?.getTime() ?? 0));
  const latestBackup = backups[0];
  const lastModified = latestBackup?.LastModified;

  if (!latestBackup || !lastModified) {
    console.error('❌ No backups found!');
    await sendSlackAlert('No database backups found!');
    process.exitCode = 1;
    return;
  }

  // Check last backup age
  const ageHours = (Date.now() - lastModified.getTime()) / (1000 * 60 * 60);
  if (ageHours > MAX_AGE_HOURS) {
    console.error(`❌ No backup in the last ${MAX_AGE_HOURS} hours!`);
    await sendSlackAlert('No recent database backup!');
    process.exitCode = 1;
    return;
  }

  // Check backup size
  const size = (latestBackup.Size ?? 0) / (1024 * 1024 * 1024); // GB
  if (size < MIN_SIZE_GB) {
    console.error('⚠️ Backup size suspiciously small');
  }

  console.log('✅ Backup health check passed');
  console.log(`   Latest: ${latestBackup.Key}`);
  console.log(`   Age: ${ageHours.toFixed(1)} hours`);
  console.log(`   Size: ${size.toFixed(2)} GB`);
}

checkBackupHealth().catch((err) => {
  console.error('❌ Backup health check failed:', err);
  process.exitCode = 1;
});
````
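
The monitor above compares only the newest object under `postgres/` against a 25-hour limit, which covers the daily full backup but not the hourly incremental or 15-minute transaction log tiers. Checking each tier against its row in the RPO table catches a stalled WAL archiver long before the full backup goes stale. A sketch with hypothetical tier names; collecting the newest timestamp per tier (for example by listing each prefix the way the monitor does) is left out, and the 10-minute grace period is an assumption to tune.

```typescript
// Hypothetical per-tier freshness check against the RPO table
type Tier = 'full' | 'incremental' | 'wal';

const MINUTE_MS = 60 * 1000;

// Maximum acceptable data loss per tier, from the RPO table in the runbook
const RPO_MS: Record<Tier, number> = {
  full: 24 * 60 * MINUTE_MS,
  incremental: 60 * MINUTE_MS,
  wal: 15 * MINUTE_MS,
};

// Returns the tiers whose newest backup is older than RPO + grace, or missing entirely
function staleTiers(newest: Partial<Record<Tier, Date>>, now: Date, graceMs = 10 * MINUTE_MS): Tier[] {
  return (Object.keys(RPO_MS) as Tier[]).filter((tier) => {
    const last = newest[tier];
    return last === undefined || now.getTime() - last.getTime() > RPO_MS[tier] + graceMs;
  });
}
```

Alerting on any non-empty result keeps the actual data loss exposure tied to the RPO you documented rather than to the full-backup schedule.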

## Role Assignments

```markdown
## DR Team Roles

### Database Administrator (Primary)

- Execute restore procedures
- Verify data integrity
- Monitor recovery progress

### Engineering Lead

- Coordinate response
- Communicate with stakeholders
- Make cutover decisions

### DevOps Engineer

- Provision infrastructure
- Update application configs
- Monitor system health

### Product Manager

- Assess business impact
- Prioritize recovery
- Customer communication

## Escalation Path

1. DBA on-call →
2. Engineering Lead →
3. CTO →
4. CEO (P0 incidents only)
```

## Best Practices

1. **Test restores regularly**: Quarterly DR drills
2. **Automate backups**: Never rely on manual processes
3. **Multiple locations**: Cross-region backup storage
4. **Monitor backup health**: Alert on failures
5. **Document procedures**: Keep runbook updated
6. **Encrypt backups**: Protect sensitive data
7. **Version control**: Track backup script changes
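
Quarterly drills are also where the "[measured]" column of the RTO table gets filled in. The runbook's own phase budgets already say something: steps 1 to 7 add up to 165 minutes with a 1-hour restore and 225 minutes with a 2-hour restore, both above the 2-hour full-restore target if that target is meant end to end. A sketch that totals drill timings against those budgets and an RTO; the phase list mirrors the runbook, while the data shape and the fallback to the budget for unmeasured phases are assumptions.

```typescript
// Hypothetical DR drill review against the runbook's phase budgets
interface Phase {
  name: string;
  budgetMin: number;
}

// Steps 1-7 of the runbook; Execute Restore uses its 2-hour upper bound
const RUNBOOK_PHASES: Phase[] = [
  { name: 'Assess Situation', budgetMin: 5 },
  { name: 'Stop the Bleeding', budgetMin: 10 },
  { name: 'Identify Recovery Point', budgetMin: 15 },
  { name: 'Prepare Recovery Environment', budgetMin: 30 },
  { name: 'Execute Restore', budgetMin: 120 },
  { name: 'Validate and Test', budgetMin: 30 },
  { name: 'Cutover', budgetMin: 15 },
];

interface DrillReport {
  totalMin: number;
  meetsRto: boolean;
  overBudget: string[];
}

function reviewDrill(measuredMin: Partial<Record<string, number>>, rtoMin: number): DrillReport {
  let totalMin = 0;
  const overBudget: string[] = [];
  for (const phase of RUNBOOK_PHASES) {
    // A phase missing from the drill log falls back to its budget
    const took = measuredMin[phase.name] ?? phase.budgetMin;
    totalMin += took;
    if (took > phase.budgetMin) overBudget.push(phase.name);
  }
  return { totalMin, meetsRto: totalMin <= rtoMin, overBudget };
}
```

Recording the per-phase timings, not only the total, shows which step to optimize when a drill misses the target.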

## Output Checklist

- [ ] Backup automation scripts
- [ ] Restore procedures documented
- [ ] Validation checks defined
- [ ] PITR procedure (if applicable)
- [ ] DR runbook created
- [ ] Role assignments documented
- [ ] RTO/RPO defined
- [ ] Backup monitoring configured