ai-flow-dev 2.7.0 → 2.8.1
- package/LICENSE +21 -21
- package/README.md +573 -570
- package/package.json +74 -74
- package/prompts/backend/flow-build-phase-0.md +535 -535
- package/prompts/backend/flow-build-phase-1.md +626 -626
- package/prompts/backend/flow-build-phase-10.md +340 -340
- package/prompts/backend/flow-build-phase-2.md +573 -573
- package/prompts/backend/flow-build-phase-3.md +834 -834
- package/prompts/backend/flow-build-phase-4.md +554 -554
- package/prompts/backend/flow-build-phase-5.md +703 -703
- package/prompts/backend/flow-build-phase-6.md +524 -524
- package/prompts/backend/flow-build-phase-7.md +1001 -1001
- package/prompts/backend/flow-build-phase-8.md +1407 -1407
- package/prompts/backend/flow-build-phase-9.md +477 -477
- package/prompts/backend/flow-build.md +137 -137
- package/prompts/backend/flow-check-review.md +656 -20
- package/prompts/backend/flow-check-test.md +526 -14
- package/prompts/backend/flow-check.md +717 -67
- package/prompts/backend/flow-commit.md +88 -119
- package/prompts/backend/flow-docs-sync.md +354 -354
- package/prompts/backend/flow-finish.md +919 -0
- package/prompts/backend/flow-release.md +949 -0
- package/prompts/backend/flow-work-feature.md +61 -61
- package/prompts/backend/flow-work-fix.md +46 -46
- package/prompts/backend/flow-work-refactor.md +48 -48
- package/prompts/backend/flow-work-resume.md +34 -34
- package/prompts/backend/flow-work.md +1098 -1286
- package/prompts/desktop/flow-build-phase-0.md +359 -359
- package/prompts/desktop/flow-build-phase-1.md +295 -295
- package/prompts/desktop/flow-build-phase-10.md +357 -357
- package/prompts/desktop/flow-build-phase-2.md +282 -282
- package/prompts/desktop/flow-build-phase-3.md +291 -291
- package/prompts/desktop/flow-build-phase-4.md +308 -308
- package/prompts/desktop/flow-build-phase-5.md +269 -269
- package/prompts/desktop/flow-build-phase-6.md +350 -350
- package/prompts/desktop/flow-build-phase-7.md +297 -297
- package/prompts/desktop/flow-build-phase-8.md +541 -541
- package/prompts/desktop/flow-build-phase-9.md +439 -439
- package/prompts/desktop/flow-build.md +156 -156
- package/prompts/desktop/flow-check-review.md +656 -20
- package/prompts/desktop/flow-check-test.md +526 -14
- package/prompts/desktop/flow-check.md +717 -67
- package/prompts/desktop/flow-commit.md +88 -119
- package/prompts/desktop/flow-docs-sync.md +354 -354
- package/prompts/desktop/flow-finish.md +919 -0
- package/prompts/desktop/flow-release.md +662 -0
- package/prompts/desktop/flow-work-feature.md +61 -61
- package/prompts/desktop/flow-work-fix.md +46 -46
- package/prompts/desktop/flow-work-refactor.md +48 -48
- package/prompts/desktop/flow-work-resume.md +34 -34
- package/prompts/desktop/flow-work.md +1202 -1390
- package/prompts/frontend/flow-build-phase-0.md +425 -425
- package/prompts/frontend/flow-build-phase-1.md +626 -626
- package/prompts/frontend/flow-build-phase-10.md +33 -33
- package/prompts/frontend/flow-build-phase-2.md +573 -573
- package/prompts/frontend/flow-build-phase-3.md +782 -782
- package/prompts/frontend/flow-build-phase-4.md +554 -554
- package/prompts/frontend/flow-build-phase-5.md +703 -703
- package/prompts/frontend/flow-build-phase-6.md +524 -524
- package/prompts/frontend/flow-build-phase-7.md +1001 -1001
- package/prompts/frontend/flow-build-phase-8.md +872 -872
- package/prompts/frontend/flow-build-phase-9.md +94 -94
- package/prompts/frontend/flow-build.md +137 -137
- package/prompts/frontend/flow-check-review.md +656 -20
- package/prompts/frontend/flow-check-test.md +526 -14
- package/prompts/frontend/flow-check.md +717 -67
- package/prompts/frontend/flow-commit.md +88 -119
- package/prompts/frontend/flow-docs-sync.md +550 -550
- package/prompts/frontend/flow-finish.md +919 -0
- package/prompts/frontend/flow-release.md +519 -0
- package/prompts/frontend/flow-work-api.md +1547 -0
- package/prompts/frontend/flow-work-feature.md +61 -61
- package/prompts/frontend/flow-work-fix.md +38 -38
- package/prompts/frontend/flow-work-refactor.md +48 -48
- package/prompts/frontend/flow-work-resume.md +34 -34
- package/prompts/frontend/flow-work.md +1595 -1320
- package/prompts/mobile/flow-build-phase-0.md +425 -425
- package/prompts/mobile/flow-build-phase-1.md +626 -626
- package/prompts/mobile/flow-build-phase-10.md +32 -32
- package/prompts/mobile/flow-build-phase-2.md +573 -573
- package/prompts/mobile/flow-build-phase-3.md +782 -782
- package/prompts/mobile/flow-build-phase-4.md +554 -554
- package/prompts/mobile/flow-build-phase-5.md +703 -703
- package/prompts/mobile/flow-build-phase-6.md +524 -524
- package/prompts/mobile/flow-build-phase-7.md +1001 -1001
- package/prompts/mobile/flow-build-phase-8.md +888 -888
- package/prompts/mobile/flow-build-phase-9.md +90 -90
- package/prompts/mobile/flow-build.md +135 -135
- package/prompts/mobile/flow-check-review.md +656 -20
- package/prompts/mobile/flow-check-test.md +526 -14
- package/prompts/mobile/flow-check.md +717 -67
- package/prompts/mobile/flow-commit.md +88 -119
- package/prompts/mobile/flow-docs-sync.md +620 -620
- package/prompts/mobile/flow-finish.md +919 -0
- package/prompts/mobile/flow-release.md +751 -0
- package/prompts/mobile/flow-work-api.md +1500 -0
- package/prompts/mobile/flow-work-feature.md +61 -61
- package/prompts/mobile/flow-work-fix.md +46 -46
- package/prompts/mobile/flow-work-refactor.md +48 -48
- package/prompts/mobile/flow-work-resume.md +34 -34
- package/prompts/mobile/flow-work.md +1605 -1329
- package/prompts/shared/mermaid-guidelines.md +102 -102
- package/prompts/shared/scope-levels.md +114 -114
- package/prompts/shared/smart-skip-preflight.md +214 -214
- package/prompts/shared/story-points.md +55 -55
- package/prompts/shared/task-format.md +74 -74
- package/prompts/shared/task-summary-template.md +277 -277
- package/templates/AGENT.template.md +443 -443
- package/templates/backend/.clauderules.template +112 -112
- package/templates/backend/.cursorrules.template +102 -102
- package/templates/backend/README.template.md +2 -2
- package/templates/backend/ai-instructions.template.md +2 -2
- package/templates/backend/copilot-instructions.template.md +2 -2
- package/templates/backend/docs/api.template.md +320 -320
- package/templates/backend/docs/business-flows.template.md +97 -97
- package/templates/backend/docs/code-standards.template.md +2 -2
- package/templates/backend/docs/contributing.template.md +3 -3
- package/templates/backend/docs/data-model.template.md +520 -520
- package/templates/backend/docs/testing.template.md +2 -2
- package/templates/backend/project-brief.template.md +2 -2
- package/templates/backend/specs/configuration.template.md +2 -2
- package/templates/backend/specs/security.template.md +2 -2
- package/templates/desktop/.clauderules.template +112 -112
- package/templates/desktop/.cursorrules.template +102 -102
- package/templates/desktop/README.template.md +170 -170
- package/templates/desktop/ai-instructions.template.md +366 -366
- package/templates/desktop/copilot-instructions.template.md +140 -140
- package/templates/desktop/docs/docs/api.template.md +320 -320
- package/templates/desktop/docs/docs/architecture.template.md +724 -724
- package/templates/desktop/docs/docs/business-flows.template.md +102 -102
- package/templates/desktop/docs/docs/code-standards.template.md +792 -792
- package/templates/desktop/docs/docs/contributing.template.md +149 -149
- package/templates/desktop/docs/docs/data-model.template.md +520 -520
- package/templates/desktop/docs/docs/operations.template.md +720 -720
- package/templates/desktop/docs/docs/testing.template.md +722 -722
- package/templates/desktop/project-brief.template.md +150 -150
- package/templates/desktop/specs/specs/configuration.template.md +121 -121
- package/templates/desktop/specs/specs/security.template.md +392 -392
- package/templates/frontend/README.template.md +2 -2
- package/templates/frontend/ai-instructions.template.md +2 -2
- package/templates/frontend/docs/api-integration.template.md +362 -362
- package/templates/frontend/docs/components.template.md +2 -2
- package/templates/frontend/docs/error-handling.template.md +360 -360
- package/templates/frontend/docs/operations.template.md +107 -107
- package/templates/frontend/docs/performance.template.md +124 -124
- package/templates/frontend/docs/pwa.template.md +119 -119
- package/templates/frontend/docs/state-management.template.md +2 -2
- package/templates/frontend/docs/styling.template.md +2 -2
- package/templates/frontend/docs/testing.template.md +2 -2
- package/templates/frontend/project-brief.template.md +2 -2
- package/templates/frontend/specs/accessibility.template.md +95 -95
- package/templates/frontend/specs/configuration.template.md +2 -2
- package/templates/frontend/specs/security.template.md +175 -175
- package/templates/fullstack/README.template.md +252 -252
- package/templates/fullstack/ai-instructions.template.md +444 -444
- package/templates/fullstack/project-brief.template.md +157 -157
- package/templates/fullstack/specs/configuration.template.md +340 -340
- package/templates/mobile/README.template.md +167 -167
- package/templates/mobile/ai-instructions.template.md +196 -196
- package/templates/mobile/docs/app-store.template.md +135 -135
- package/templates/mobile/docs/architecture.template.md +63 -63
- package/templates/mobile/docs/native-features.template.md +94 -94
- package/templates/mobile/docs/navigation.template.md +59 -59
- package/templates/mobile/docs/offline-strategy.template.md +65 -65
- package/templates/mobile/docs/permissions.template.md +56 -56
- package/templates/mobile/docs/state-management.template.md +85 -85
- package/templates/mobile/docs/testing.template.md +109 -109
- package/templates/mobile/project-brief.template.md +69 -69
- package/templates/mobile/specs/build-configuration.template.md +91 -91
- package/templates/mobile/specs/deployment.template.md +92 -92
- package/templates/work.template.md +61 -47
@@ -1,1001 +1,1001 @@
-## PHASE 7: Operations & Deployment (10-15 min)
-
-> **Order for this phase:** 7.1 → 7.2 → 7.3 → 7.4 → 7.4.1 → 7.5 → 7.6 → 7.7 → 7.7.1 → 7.7.2 → 7.8 → 7.9 → 7.9.1 → 7.9.2 → 7.9.3 → 7.9.4 → 7.10
-
-> **📌 Scope-based behavior:**
->
-> - **MVP:** Ask 7.1-7.4 only (deployment basics), skip 7.5-7.10 (monitoring, scaling, backups), mark as "TBD"
-> - **Production-Ready:** Ask 7.1-7.8, simplify 7.9-7.10 (advanced monitoring and resilience)
-> - **Enterprise:** Ask all questions 7.1-7.10 with emphasis on reliability and disaster recovery
-
-### Objective
-
-Define deployment, monitoring, and operational practices.
-
----
-
-## 🔍 Pre-Flight Check (Smart Skip Logic)
-
-> 📎 **Reference:** See [prompts/shared/smart-skip-preflight.md](../../.ai-flow/prompts/shared/smart-skip-preflight.md) for the complete smart skip logic.
-
-**Execute Pre-Flight Check for Phase 7:**
-
-- **Target File**: `docs/deployment.md`
-- **Phase Name**: "OPERATIONS & DEPLOYMENT"
-- **Key Items**: CI/CD pipeline, deployment platform, monitoring, logging
-- **Typical Gaps**: Incident runbooks, disaster recovery, scaling strategy
-
-**Proceed with appropriate scenario based on audit data from `.ai-flow/cache/audit-data.json`**
-
----
-
-## Phase 7 Questions (Full Mode)
-
-**7.1 Deployment Environment**
-
-```
-
-Where will you deploy?
-
-A) ⭐ Cloud Platform
-
-- AWS (ECS, Fargate, Lambda, EC2)
-- Google Cloud (Cloud Run, GKE, Compute Engine)
-- Azure (App Service, AKS, VMs)
-
-B) 🔥 Platform-as-a-Service (PaaS)
-
-- Heroku
-- Railway
-- Render
-- Fly.io
-- Vercel (for APIs)
-
-C) 🏢 On-Premises
-
-- Company servers
-- Private cloud
-
-D) 🐳 Container Orchestration
-
-- Kubernetes (GKE, EKS, AKS)
-- Docker Swarm
-- Nomad
-
-Your choice: \_\_
-Why?
-
-```
-
-**7.2 Containerization**
-
-````
-
-Will you use Docker?
-
-A) ⭐ Yes - Dockerize application
-
-- Multi-stage build
-- Optimized image size
-- Docker Compose for local dev
-
-B) No - Deploy directly
-
-If yes:
-Base image: **
-Estimated image size: ** MB
-
-Example stack (local development):
-
-```yaml
-services:
-  app:
-    build: .
-    ports: [3000:3000]
-  db:
-    image: postgres:15
-  redis:
-    image: redis:7
-```
-
-````
-
-**7.3 Environment Strategy**
-
-```
-
-How many environments will you have?
-
-A) ⭐ Three environments
-
-- Development (local)
-- Staging (pre-production, QA)
-- Production (live)
-
-B) 🏆 Four+ environments
-
-- Development
-- Testing (automated tests)
-- Staging
-- Production
-
-C) 🚀 Two environments
-
-- Development
-- Production
-
-Your choice: \_\_
-
-Environment configuration:
-A) ✅ Environment variables (.env files)
-B) ✅ Config service (AWS Secrets Manager, Vault)
-C) ✅ Feature flags (LaunchDarkly, Unleash)
-
-```
-
-**7.4 CI/CD Pipeline**
-
-```
-
-CI/CD platform:
-
-A) ⭐ GitHub Actions - If using GitHub
-B) 🔥 GitLab CI - If using GitLab
-C) Jenkins - Self-hosted
-D) CircleCI
-E) Travis CI
-F) AWS CodePipeline
-G) Azure DevOps
-
-Your choice: \_\_
-
-Pipeline stages:
-
-1. ✅ Checkout code
-2. ✅ Install dependencies
-3. ✅ Lint
-4. ✅ Test (with coverage)
-5. ✅ Build
-6. ✅ Security scan (optional)
-7. ✅ Deploy to staging
-8. ⏸️ Manual approval (optional)
-9. ✅ Deploy to production
-
-Auto-deploy strategy:
-A) ⭐ Auto-deploy to staging, manual approval for production
-B) 🚀 Auto-deploy to production (main branch)
-C) Manual deploy for all environments
-
-```
-
-**7.4.1 Deployment Strategy** (Production-Ready and Enterprise only)
-
-```
-What deployment strategy will you use for production?
-
-A) ⭐ Rolling Deployment - Gradual replacement
-- Replace instances one at a time
-- Zero downtime
-- Easy rollback
-
-B) 🔥 Blue-Green Deployment - Instant switch
-- Two identical environments
-- Switch traffic instantly
-- Higher infrastructure cost
-
-C) ⚡ Canary Deployment - Progressive rollout
-- Deploy to small percentage first
-- Monitor for issues
-- Gradually increase traffic
-
-D) 🏆 Feature Flags - Code-level control
-- Deploy code, toggle features
-- Instant enable/disable
-- Best with: LaunchDarkly, Unleash
-
-Your choice: __
-
-Rollback plan:
-- How quickly must rollback complete? __ minutes
-- Who can trigger rollback? [DevOps/Tech Lead/Any developer]
-- Rollback trigger criteria? [Error rate > X%, latency > Y ms, manual]
-
-If Blue-Green:
-- Traffic switching: [Load balancer, DNS, etc.]
-- Database migrations: [Strategy for zero-downtime]
-
-If Canary:
-- Initial traffic: __%
-- Gradual increase: __% per __ minutes
-- Success criteria: __
-```
-
-**7.5 Monitoring & Logging**
-
-````
-
-Monitoring tools:
-
-Application Performance Monitoring (APM):
-A) ⭐ Datadog - Full-featured, expensive
-B) 🔥 New Relic - Popular
-C) Sentry - Error tracking focus
-D) ⚡ OpenTelemetry + Grafana - Open source
-E) AWS CloudWatch
-F) None yet
-
-Your choice: \_\_
-
-Logging:
-A) ⭐ Centralized logging
-
-- Winston/Pino (Node.js) → CloudWatch/Datadog
-- Python logging → ELK Stack
-
-B) Basic console logs
-
-C) Structured JSON logging ⭐
-
-```json
-{
-  "level": "info",
-  "timestamp": "2024-01-15T10:30:00Z",
-  "userId": "123",
-  "action": "user.login",
-  "ip": "192.168.1.1",
-  "message": "User logged in successfully"
-}
-```
-
-Your logging strategy: \_\_
-
-Metrics to track:
-
-- ✅ Request rate (requests/sec)
-- ✅ Error rate (% of failed requests)
-- ✅ Response time (p50, p95, p99)
-- ✅ Database query time
-- ✅ Cache hit rate
-- ✅ CPU/Memory usage
-- Custom business metrics: \_\_
-
-````
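
A minimal sketch of the structured JSON logging option from 7.5, assuming Python's standard `logging` module. The `JsonFormatter` class, the `build_logger` helper, and the list of extra fields are hypothetical names chosen for illustration; they mirror the sample record in the prompt rather than any implementation shipped with the package.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    # Hypothetical extra fields, copied from the record only when present.
    EXTRA_FIELDS = ("userId", "action", "ip")

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "level": record.levelname.lower(),
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        payload["message"] = record.getMessage()
        return json.dumps(payload)


def build_logger(name: str = "app") -> logging.Logger:
    """Return a logger that writes one JSON object per line to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

Callers would attach the extra fields through the standard `extra` argument, for example `build_logger().info("User logged in successfully", extra={"userId": "123", "action": "user.login"})`.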
-
-**7.6 Alerts**
-
-```
-
-When should you be alerted?
-
-A) ✅ Error rate > **% (e.g., 1%)
-B) ✅ Response time > **ms (e.g., 1000ms)
-C) ✅ 5xx errors (server errors)
-D) ✅ Service down (health check failure)
-E) ✅ Database connection failures
-F) ✅ Disk space > 80%
-G) ✅ Memory usage > 85%
-
-Alert channels:
-A) ⭐ Email
-B) 🔥 Slack/Discord
-C) ⚡ PagerDuty/Opsgenie (on-call)
-D) SMS (critical only)
-
-Your preferences: \_\_
-
-On-call rotation:
-A) Yes - Using [PagerDuty/Opsgenie]
-B) No - Monitor during business hours
-
-```
-
-**7.7 Backup & Disaster Recovery**
-
-```
-
-Backup strategy:
-
-Database backups:
-A) ⭐ Automated daily backups
-
-- Retention: 30 days
-- Point-in-time recovery
-
-B) 🏆 Continuous backups
-
-- Every hour
-- 90 days retention
-
-C) Manual backups weekly
-
-Your strategy: **
-Retention period: ** days
-
-Disaster recovery:
-
-- Recovery Time Objective (RTO): \_\_ (how fast to restore)
-- Recovery Point Objective (RPO): \_\_ (acceptable data loss)
-
-Example:
-
-- RTO: 1 hour (service restored within 1 hour)
-- RPO: 15 minutes (lose max 15 min of data)
-
-```
-
-**7.7.1 Database Migrations in Production**
-
-```
-How will you handle database migrations in production?
-
-Zero-downtime migrations:
-A) ⭐ Yes - Plan for zero-downtime migrations (Production-Ready/Enterprise)
-B) No - Accept maintenance windows (MVP)
-
-If zero-downtime:
-- Strategy: [Expand/Contract, Blue-Green migrations, etc.]
-- Rollback plan: __
-- Testing: [Tested on staging, Dry-run process]
-
-Migration windows (if not zero-downtime):
-- Preferred time: __
-- Duration: __ minutes
-- Notification: __
-```
-
-**7.7.2 Database Connection Pooling**
-
-```
-Database connection pooling configuration:
-
-Pool tool: [ORM built-in, pgBouncer, HikariCP, etc.]
-
-Settings:
-- Min connections: __
-- Max connections: __
-- Connection timeout: __ ms
-- Idle timeout: __ ms
-- Max lifetime: __ ms
-
-Monitoring:
-- Track active/idle connections: [Yes/No]
-- Alert on pool exhaustion: [Yes/No]
-```
-
-**7.8 Scaling Strategy**
-
-```
-
-How will you handle growth?
-
-A) ⭐ Horizontal scaling - Add more instances
-
-- Load balancer distributes traffic
-- Stateless application design
-
-B) Vertical scaling - Bigger instances
-
-- Increase CPU/RAM
-- Simpler but limited
-
-C) ⚡ Auto-scaling - Automatic based on load
-
-- Scale up during high traffic
-- Scale down to save costs
-- Metrics: CPU > 70%, requests > threshold
-
-Your strategy: \_\_
-
-Expected load:
-
-- Initial: \_\_ requests/minute
-- Year 1: \_\_ requests/minute
-- Peak traffic: \_\_x normal load
-
-Database scaling:
-A) Read replicas - Scale reads
-B) Sharding - Split data across DBs
-C) Vertical scaling - Bigger DB instance
-D) Not needed yet
-
-```
-
-**7.9 Health Checks**
-
-````
-
-Health check endpoints:
-
-A) ✅ /health - Basic liveness
-
-- Returns 200 OK if app is running
-
-B) ✅ /health/ready - Readiness check
-
-- Returns 200 OK if app can handle traffic
-- Checks: DB connected, Redis connected, etc.
-
-C) ✅ /health/live - Liveness check
-
-- Returns 200 OK if app is alive
-- Load balancer uses this
-
-Example response:
-
-```json
-{
-  "status": "healthy",
-  "timestamp": "2024-01-15T10:30:00Z",
-  "checks": {
-    "database": "ok",
-    "redis": "ok",
-    "disk_space": "ok"
-  },
-  "version": "1.2.3"
-}
-```
-
-Your health check endpoints: \_\_
-
-````
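
One possible shape for the readiness and liveness handlers described in 7.9, kept framework-agnostic. The `readiness` and `liveness` function names and the probe callables are assumptions made for this sketch; each function returns an HTTP status code plus a body shaped like the example response in the prompt.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Tuple


def readiness(checks: Dict[str, Callable[[], bool]], version: str) -> Tuple[int, dict]:
    """Run every dependency probe and return (HTTP status, response body)."""
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = "ok" if probe() else "fail"
        except Exception:
            # A probe that raises counts as a failed dependency.
            results[name] = "fail"
    healthy = all(state == "ok" for state in results.values())
    body = {
        "status": "healthy" if healthy else "unhealthy",
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "checks": results,
        "version": version,
    }
    return (200 if healthy else 503), body


def liveness() -> Tuple[int, dict]:
    """Report only that the process is up; no dependencies are probed."""
    return 200, {"status": "alive"}
```

Returning 503 from the readiness check while liveness stays at 200 lets an orchestrator stop routing traffic to an instance without restarting it.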
-
-**7.9.1 Graceful Shutdown**
-
-```
-Will you implement graceful shutdown?
-
-A) ⭐ Yes - Handle shutdown gracefully (Production-Ready/Enterprise)
-B) No - Standard shutdown
-
-If yes:
-Shutdown sequence:
-1. Stop accepting new requests (timeout: __s)
-2. Finish processing current requests (timeout: __s)
-3. Close database connections (timeout: __s)
-4. Close other connections (Redis, message queues, etc.)
-5. Exit process
-
-Total shutdown timeout: __s
-
-Implementation:
-- Signal handling: [SIGTERM, SIGINT]
-- Health check grace period: __s
-- Connection drain timeout: __s
-```
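
The shutdown sequence in 7.9.1 could be sketched as below. `GracefulShutdown` and its methods are hypothetical names, not part of the package. The sketch only checks the total time budget between steps, so a step that blocks is not interrupted; per-step timeouts would need threads or async cancellation on top of this.

```python
import signal
import threading
import time
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


class GracefulShutdown:
    """Run ordered shutdown steps within a total time budget."""

    def __init__(self, total_timeout_s: float) -> None:
        self.total_timeout_s = total_timeout_s
        self.stop_requested = threading.Event()

    def install(self) -> None:
        """Register SIGTERM/SIGINT handlers (must run in the main thread)."""
        for sig in (signal.SIGTERM, signal.SIGINT):
            signal.signal(sig, lambda signum, frame: self.stop_requested.set())

    def run(self, steps: List[Step]) -> List[str]:
        """Execute steps in order and return the names of those that ran."""
        deadline = time.monotonic() + self.total_timeout_s
        completed = []
        for name, step in steps:
            if time.monotonic() >= deadline:
                break  # budget spent: skip remaining steps and let the process exit
            step()
            completed.append(name)
        return completed
```

A service would call `install()` at startup, wait on `stop_requested`, then pass steps such as "stop accepting requests", "drain in-flight requests", and "close database connections" to `run()` in that order.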
-
-**7.9.2 Circuit Breakers & Resilience**
-
-```
-Will you implement circuit breakers?
-
-A) ⭐ Yes - Protect against cascading failures (Production-Ready/Enterprise)
-B) No - Direct service calls
-
-If yes:
-Circuit breaker tool: [Resilience4j, Hystrix, Polly, etc.]
-
-Configuration:
-- Failure threshold: __% (open circuit after X% failures)
-- Success threshold: __% (close circuit after X% successes)
-- Timeout: __ms
-- Half-open retries: __
-- Reset timeout: __s
-
-Fallback strategy:
-A) ⭐ Return cached data
-B) Return default/empty response
-C) Call alternative service
-D) Return error gracefully
-
-Services to protect:
-{{#EACH SERVICE_TO_PROTECT}}
-- **{{SERVICE_NAME}}**: {{FAILURE_THRESHOLD}}% threshold, fallback: {{FALLBACK_STRATEGY}}
-{{/EACH}}
-```
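
A simplified sketch of the closed, open, and half-open states behind the 7.9.2 settings. It is not how Resilience4j, Hystrix, or Polly are implemented: as a stated simplification it counts consecutive failures instead of a failure percentage, and the `CircuitBreaker` class and its parameters are hypothetical names.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitBreaker:
    """Open after N consecutive failures; allow a trial call after a reset timeout."""

    def __init__(
        self,
        failure_threshold: int,
        reset_timeout_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T], fallback: Callable[[], T]) -> T:
        if self.state == "open":
            return fallback()  # fail fast without touching the dependency
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed half-open trial, or reaching the threshold, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            return fallback()
        self._failures = 0
        self._opened_at = None
        return result
```

The `fallback` callable corresponds to the fallback strategy chosen in the prompt, for example returning cached data.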
-
-**7.9.3 Retry & Timeout Policies**
-
-```
-Define retry and timeout policies for external dependencies:
-
-| Service/Dependency | Timeout   | Retries | Backoff Strategy     | Notes                |
-|--------------------|-----------|---------|----------------------|----------------------|
-| Database queries   | 5000ms    | 2       | None (fail fast)     | Connection pooled    |
-| Redis cache        | 1000ms    | 1       | None                 | Cache miss = OK      |
-| Payment API        | 30000ms   | 3       | Exponential (1s,2s,4s)| Must complete        |
-| Email service      | 5000ms    | 3       | Fixed (2s)           | Queue if fails       |
-| External REST APIs | 10000ms   | 2       | Exponential          | Circuit breaker      |
-| File storage (S3)  | 15000ms   | 3       | Exponential          | Large files          |
-
-Your policies:
-
-| Service/Dependency | Timeout   | Retries | Backoff Strategy     | Notes                |
-|--------------------|-----------|---------|----------------------|----------------------|
-|                    |           |         |                      |                      |
-|                    |           |         |                      |                      |
-
-Global defaults:
-- Default HTTP timeout: __ ms (recommended: 10000)
-- Default retries: __ (recommended: 2)
-- Default backoff: [None/Fixed/Exponential]
-
-Non-retryable errors:
-- 400 Bad Request (client error, won't succeed on retry)
-- 401/403 Unauthorized/Forbidden
-- 404 Not Found
-- [Your additions]
-```
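
The exponential backoff row (1s, 2s, 4s) and the non-retryable error list from 7.9.3 can be sketched as follows. The function names and the `NonRetryableError` class are hypothetical; a real client would map HTTP 400/401/403/404 responses onto such an exception itself.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class NonRetryableError(Exception):
    """Stand-in for failures that will not succeed on retry (e.g. 400, 401/403, 404)."""


def backoff_delays(retries: int, base_delay_s: float) -> List[float]:
    """Delay before each retry: base, 2x base, 4x base, and so on."""
    return [base_delay_s * (2 ** attempt) for attempt in range(retries)]


def call_with_retry(
    fn: Callable[[], T],
    retries: int,
    base_delay_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn once, then retry up to `retries` times with exponential backoff."""
    delays = backoff_delays(retries, base_delay_s)
    for attempt in range(retries + 1):
        try:
            return fn()
        except NonRetryableError:
            raise  # client-side errors are surfaced immediately
        except Exception:
            if attempt == retries:
                raise  # retries exhausted: propagate the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

With `retries=3` and `base_delay_s=1.0` the waits match the Payment API row. Timeouts are left to the wrapped call, since how they are enforced depends on the client library in use.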
-
-**7.9.4 Request/Response Logging & Masking**
-
-```
-What request/response data will you log?
-
-Log levels by environment:
-| Environment | Level    | Body Logging | Performance Logging |
-|-------------|----------|--------------|---------------------|
-| Development | debug    | Full         | Yes                 |
-| Staging     | info     | Truncated    | Yes                 |
-| Production  | info     | Minimal      | Yes                 |
-
-Request logging:
-- ✅ HTTP method and URL
-- ✅ Request ID (correlation)
-- ✅ User ID (if authenticated)
-- ✅ IP address (optional, may hash for privacy)
-- ✅ Request duration (ms)
-- ❓ Request body (careful with size and PII)
-- ❓ Query parameters
-
-Response logging:
-- ✅ Status code
-- ✅ Response duration (ms)
-- ❓ Response body (careful with size)
-
-Sensitive data masking (CRITICAL):
-
-| Field Pattern          | Masking Strategy           |
-|------------------------|----------------------------|
-| password, secret       | Completely redact          |
-| token, api_key         | Show last 4 chars only     |
-| email                  | j***@example.com           |
-| phone                  | ***-***-1234               |
-| credit_card            | ****-****-****-1234        |
-| ssn, national_id       | Completely redact          |
-| [Your patterns]        | __                         |
-
-Log format:
-A) ⭐ Structured JSON (recommended for aggregation)
-B) Plain text with patterns
-C) Framework default
-
-Log aggregation:
-A) ⭐ Centralized (ELK, Datadog, CloudWatch)
-B) File-based with rotation
-C) Console only (development)
-```
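
A minimal sketch of the masking table in 7.9.4, applied to a payload before it is logged. The function names, the `[REDACTED]` marker, and the exact-key matching are assumptions for illustration; a production version would likely match field name patterns rather than exact keys.

```python
from typing import Any, Dict

REDACT_KEYS = {"password", "secret", "ssn", "national_id"}
LAST4_KEYS = {"token", "api_key"}


def _last4_digits(value: str) -> str:
    """Return the last four digits of a value, ignoring separators."""
    digits = "".join(ch for ch in value if ch.isdigit())
    return digits[-4:]


def mask_value(key: str, value: Any) -> Any:
    """Mask one value according to its (lower-cased) field name."""
    if key in REDACT_KEYS:
        return "[REDACTED]"
    if not isinstance(value, str):
        return value
    if key in LAST4_KEYS:
        # Very short secrets are redacted fully rather than mostly revealed.
        return "***" + value[-4:] if len(value) > 4 else "[REDACTED]"
    if key == "email":
        local, _, domain = value.partition("@")
        return f"{local[:1]}***@{domain}"
    if key == "phone":
        return f"***-***-{_last4_digits(value)}"
    if key == "credit_card":
        return f"****-****-****-{_last4_digits(value)}"
    return value


def mask_fields(data: Dict[str, Any]) -> Dict[str, Any]:
    """Return a masked copy of a payload, recursing into nested dicts."""
    masked: Dict[str, Any] = {}
    for key, value in data.items():
        if isinstance(value, dict) and key.lower() not in REDACT_KEYS:
            masked[key] = mask_fields(value)
        else:
            masked[key] = mask_value(key.lower(), value)
    return masked
```

Masking a copy rather than mutating the payload keeps the original request data intact for the handler that still needs it.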
-
-**7.10 Documentation & Runbooks**
-
-```
-
-Operational documentation:
-
-A) ✅ Deployment guide - How to deploy
-B) ✅ Runbooks - How to handle incidents
-
-- Database connection failure → steps to diagnose/fix
-- High CPU usage → steps to investigate
-- Service down → recovery procedure
-
-C) ✅ Architecture diagrams (Mermaid format)
-
-- System architecture diagram (mermaid)
-- Data flow diagram (mermaid)
-- Infrastructure diagram (mermaid)
-
-D) ✅ API documentation
-
-- Swagger/OpenAPI
-- Auto-generated from code
-
-Will you create these?
-A) Yes - All of them ⭐
-B) Yes - Critical ones only (deployment, runbooks)
-C) Later - Start without docs
-
-API documentation strategy:
-A) ⭐ Code-First (Recommended)
-
-- Generate docs from code (Swagger/OpenAPI decorators)
-- Always in sync with code
-- Tools: @nestjs/swagger, FastAPI docs
-
-B) 📝 Design-First
-
-- Write openapi.yaml manually first
-- Generate code from spec
-- Better for large teams/contracts
-
-C) 📄 Manual
-
-- Write Markdown/Notion docs
-- Hard to keep in sync (Not recommended)
-
-```
-
----
-
-#### 🎨 MERMAID OPERATIONS DIAGRAM FORMATS - CRITICAL
-
-## **Use these exact formats** for operational and infrastructure diagrams mentioned in question 7.10:
-
-##### 1️⃣ System Architecture Diagram (Deployment View)
-
-Use `graph TD` to show deployed system components with scaling and redundancy:
-
-````markdown
-```mermaid
-graph TD
-    subgraph "Production Environment"
-        subgraph "Load Balancer Layer"
-            LB1[Load Balancer 1]
-            LB2[Load Balancer 2]
-        end
-
-        subgraph "Application Layer"
-            App1[API Server 1<br/>4 vCPU, 8GB RAM]
-            App2[API Server 2<br/>4 vCPU, 8GB RAM]
-            App3[API Server 3<br/>4 vCPU, 8GB RAM]
-        end
-
-        subgraph "Data Layer"
-            Primary[(Primary DB<br/>PostgreSQL 15)]
-            Replica1[(Read Replica 1)]
-            Replica2[(Read Replica 2)]
-            Cache[Redis Cluster<br/>3 Nodes]
-        end
-
-        subgraph "Message Queue"
-            Queue[RabbitMQ Cluster<br/>3 Nodes]
-        end
-    end
-
-    Internet[Internet] -->|HTTPS| LB1
-    Internet -->|HTTPS| LB2
-    LB1 --> App1
-    LB1 --> App2
-    LB2 --> App2
-    LB2 --> App3
-
-    App1 -->|Write| Primary
-    App2 -->|Write| Primary
-    App3 -->|Write| Primary
-
-    App1 -->|Read| Replica1
-    App2 -->|Read| Replica2
-    App3 -->|Read| Replica1
-
-    App1 -->|Cache| Cache
-    App2 -->|Cache| Cache
-    App3 -->|Cache| Cache
-
-    App1 -->|Async Jobs| Queue
-    App2 -->|Async Jobs| Queue
-    App3 -->|Async Jobs| Queue
-
-    Primary -.->|Replication| Replica1
-    Primary -.->|Replication| Replica2
-
-    style Internet fill:#e1f5ff
-    style Primary fill:#e1ffe1
-    style Cache fill:#f0e1ff
-    style Queue fill:#ffe1f5
-```
-````
|
|
696
|
-
|
|
697
|
-
## **Use for:** Showing deployed infrastructure, scaling configuration, redundancy, high availability
|
|
698
|
-
|
|
699
|
-
##### 2️⃣ Data Flow Diagram (Request Flow)
|
|
700
|
-
|
|
701
|
-
Use `flowchart LR` to show how data moves through the system step-by-step:
|
|
702
|
-
|
|
703
|
-
````markdown
|
|
704
|
-
```mermaid
|
|
705
|
-
flowchart LR
|
|
706
|
-
User[User Request] -->|1. HTTPS POST| LB[Load Balancer]
|
|
707
|
-
LB -->|2. Route| API[API Server]
|
|
708
|
-
API -->|3. Validate JWT| Auth[Auth Service]
|
|
709
|
-
Auth -->|4. Token Valid| API
|
|
710
|
-
|
|
711
|
-
API -->|5. Check Cache| Cache[(Redis Cache)]
|
|
712
|
-
Cache -->|6. Cache Miss| API
|
|
713
|
-
|
|
714
|
-
API -->|7. Query| DB[(PostgreSQL)]
|
|
715
|
-
DB -->|8. Data| API
|
|
716
|
-
|
|
717
|
-
API -->|9. Store in Cache| Cache
|
|
718
|
-
API -->|10. Enqueue Job| Queue[Message Queue]
|
|
719
|
-
|
|
720
|
-
Queue -->|11. Process| Worker[Background Worker]
|
|
721
|
-
Worker -->|12. Send Email| Email[Email Service]
|
|
722
|
-
|
|
723
|
-
API -->|13. JSON Response| User
|
|
724
|
-
|
|
725
|
-
style User fill:#e1f5ff
|
|
726
|
-
style Cache fill:#f0e1ff
|
|
727
|
-
style DB fill:#e1ffe1
|
|
728
|
-
style Email fill:#fff4e1
|
|
729
|
-
```
|
|
730
|
-
````
|
|
731
|
-
|
|
732
|
-
## **Use for:** Documenting request/response cycles, async processing flows, numbered execution steps
|
|
733
|
-
|
|
734
|
-
##### 3️⃣ Infrastructure Diagram (Cloud Resources)
|
|
735
|
-
|
|
736
|
-
Use `graph TB` with subgraphs to show cloud infrastructure and network topology:
|
|
737
|
-
|
|
738
|
-
````markdown
|
|
739
|
-
```mermaid
|
|
740
|
-
graph TB
|
|
741
|
-
subgraph "AWS Cloud - Production (us-east-1)"
|
|
742
|
-
subgraph "VPC (10.0.0.0/16)"
|
|
743
|
-
subgraph "Public Subnet (10.0.1.0/24)"
|
|
744
|
-
ALB[Application Load Balancer]
|
|
745
|
-
NAT[NAT Gateway]
|
|
746
|
-
end
|
|
747
|
-
|
|
748
|
-
subgraph "Private Subnet 1 (10.0.10.0/24)"
|
|
749
|
-
ECS1[ECS Cluster<br/>Auto Scaling Group]
|
|
750
|
-
App1[Container: API<br/>Fargate Task]
|
|
751
|
-
App2[Container: API<br/>Fargate Task]
|
|
752
|
-
end
|
|
753
|
-
|
|
754
|
-
subgraph "Private Subnet 2 (10.0.20.0/24)"
|
|
755
|
-
RDS[(RDS PostgreSQL<br/>Multi-AZ)]
|
|
756
|
-
ElastiCache[ElastiCache Redis<br/>Cluster Mode]
|
|
757
|
-
end
|
|
758
|
-
|
|
759
|
-
subgraph "Private Subnet 3 (10.0.30.0/24)"
|
|
760
|
-
SQS[Amazon SQS<br/>Message Queue]
|
|
761
|
-
Lambda[Lambda Functions<br/>Background Workers]
|
|
762
|
-
end
|
|
763
|
-
end
|
|
764
|
-
|
|
765
|
-
subgraph "Supporting Services"
|
|
766
|
-
S3[S3 Bucket<br/>File Storage]
|
|
767
|
-
CloudWatch[CloudWatch<br/>Monitoring & Logs]
|
|
768
|
-
SecretsManager[Secrets Manager<br/>API Keys & Credentials]
|
|
769
|
-
end
|
|
770
|
-
end
|
|
771
|
-
|
|
772
|
-
Internet[Internet Users] -->|HTTPS| ALB
|
|
773
|
-
ALB --> App1
|
|
774
|
-
ALB --> App2
|
|
775
|
-
|
|
776
|
-
App1 --> RDS
|
|
777
|
-
App2 --> RDS
|
|
778
|
-
App1 --> ElastiCache
|
|
779
|
-
App2 --> ElastiCache
|
|
780
|
-
|
|
781
|
-
App1 -->|Upload/Download| S3
|
|
782
|
-
App2 -->|Upload/Download| S3
|
|
783
|
-
|
|
784
|
-
App1 -->|Send Message| SQS
|
|
785
|
-
SQS -->|Trigger| Lambda
|
|
786
|
-
Lambda --> RDS
|
|
787
|
-
|
|
788
|
-
App1 -->|Logs & Metrics| CloudWatch
|
|
789
|
-
App2 -->|Logs & Metrics| CloudWatch
|
|
790
|
-
Lambda -->|Logs| CloudWatch
|
|
791
|
-
|
|
792
|
-
App1 -->|Fetch Secrets| SecretsManager
|
|
793
|
-
App2 -->|Fetch Secrets| SecretsManager
|
|
794
|
-
|
|
795
|
-
style Internet fill:#e1f5ff
|
|
796
|
-
style RDS fill:#e1ffe1
|
|
797
|
-
style ElastiCache fill:#f0e1ff
|
|
798
|
-
style S3 fill:#fff4e1
|
|
799
|
-
style CloudWatch fill:#ffe1e1
|
|
800
|
-
```
|
|
801
|
-
````
|
|
802
|
-
|
|
803
|
-
## **Use for:** Documenting cloud architecture, network topology, AWS/GCP/Azure resources, VPC design
|
|
804
|
-
|
|
805
|
-
##### 4️⃣ Monitoring & Observability Diagram (Optional)
|
|
806
|
-
|
|
807
|
-
Use `graph TD` to show monitoring, logging, and alerting stack:
|
|
808
|
-
|
|
809
|
-
````markdown
|
|
810
|
-
```mermaid
|
|
811
|
-
graph TD
|
|
812
|
-
subgraph "Application Layer"
|
|
813
|
-
App[API Servers]
|
|
814
|
-
Worker[Background Workers]
|
|
815
|
-
end
|
|
816
|
-
|
|
817
|
-
subgraph "Monitoring Stack"
|
|
818
|
-
Prometheus[Prometheus<br/>Metrics Collection]
|
|
819
|
-
Grafana[Grafana<br/>Dashboards]
|
|
820
|
-
AlertManager[Alert Manager<br/>Notifications]
|
|
821
|
-
end
|
|
822
|
-
|
|
823
|
-
subgraph "Logging Stack"
|
|
824
|
-
FluentBit[Fluent Bit<br/>Log Collector]
|
|
825
|
-
Elasticsearch[Elasticsearch<br/>Log Storage]
|
|
826
|
-
Kibana[Kibana<br/>Log Viewer]
|
|
827
|
-
end
|
|
828
|
-
|
|
829
|
-
subgraph "Tracing"
|
|
830
|
-
Jaeger[Jaeger<br/>Distributed Tracing]
|
|
831
|
-
end
|
|
832
|
-
|
|
833
|
-
subgraph "Alerts"
|
|
834
|
-
PagerDuty[PagerDuty]
|
|
835
|
-
Slack[Slack Notifications]
|
|
836
|
-
end
|
|
837
|
-
|
|
838
|
-
App -->|Metrics| Prometheus
|
|
839
|
-
Worker -->|Metrics| Prometheus
|
|
840
|
-
Prometheus --> Grafana
|
|
841
|
-
Prometheus --> AlertManager
|
|
842
|
-
|
|
843
|
-
App -->|Logs| FluentBit
|
|
844
|
-
Worker -->|Logs| FluentBit
|
|
845
|
-
FluentBit --> Elasticsearch
|
|
846
|
-
Elasticsearch --> Kibana
|
|
847
|
-
|
|
848
|
-
App -->|Traces| Jaeger
|
|
849
|
-
Worker -->|Traces| Jaeger
|
|
850
|
-
|
|
851
|
-
AlertManager --> PagerDuty
|
|
852
|
-
AlertManager --> Slack
|
|
853
|
-
|
|
854
|
-
style Grafana fill:#e1f5ff
|
|
855
|
-
style Kibana fill:#f0e1ff
|
|
856
|
-
style PagerDuty fill:#ffe1e1
|
|
857
|
-
```
|
|
858
|
-
````
|
|
859
|
-
|
|
860
|
-
## **Use for:** Documenting observability strategy, monitoring infrastructure, alerting workflows
|
|
861
|
-
|
|
862
|
-
**Best Practices for Operations Diagrams:**
|
|
863
|
-
|
|
864
|
-
1. **Include Resource Specs:** Add CPU/RAM/disk info to nodes (e.g., `[API Server<br/>4 vCPU, 8GB RAM]`)
|
|
865
|
-
2. **Show Redundancy:** Display load balancers, replicas, multi-AZ deployments, failover paths
|
|
866
|
-
3. **Label Network Boundaries:** Use subgraphs for VPCs, subnets, availability zones, regions
|
|
867
|
-
4. **Document Protocols:** Label connections with HTTPS, gRPC, TCP, WebSocket, etc.
|
|
868
|
-
5. **Add IP Ranges:** Include CIDR blocks for network subnets (e.g., `10.0.1.0/24`)
|
|
869
|
-
6. **Show Auto-Scaling:** Indicate which components scale horizontally/vertically
|
|
870
|
-
7. **Include External Services:** SaaS tools, third-party APIs, CDNs, email providers
|
|
871
|
-
8. **Color Code by Layer:** Infrastructure (blue), data (green), monitoring (purple), alerts (red)
|
|
872
|
-
|
|
873
|
-
**Common Formatting Rules:**
|
|
874
|
-
|
|
875
|
-
- Code fence: ` ```mermaid ` (lowercase, no spaces, three backticks)
|
|
876
|
-
- Use `subgraph "Name"` to group related components by layer/zone
|
|
877
|
-
- Use `[(Cylinder)]` for databases, data stores, and persistent storage
|
|
878
|
-
- Use `[Square Brackets]` for services, servers, and compute resources
|
|
879
|
-
- Use dotted arrows `-.->` for replication, backup, and async flows
|
|
880
|
-
- Apply consistent styling: `style NodeName fill:#colorcode`
|
|
881
|
-
|
|
882
|
-
**Deployment Context Examples:**
|
|
883
|
-
|
|
884
|
-
- For Docker: Show containers, volumes, networks, registries
|
|
885
|
-
- For Kubernetes: Show pods, services, ingress, namespaces, persistent volumes
|
|
886
|
-
- For Serverless: Show Lambda functions, API Gateway, S3 triggers, event sources
|
|
887
|
-
- For VMs: Show instances, security groups, load balancers, auto-scaling groups
|
|
888
|
-
|
|
889
|
-
## **Validation:** Test diagrams at https://mermaid.live/ before saving to ensure syntax is correct
|
|
890
|
-
|
|
891
|
-
### Phase 7 Output
|
|
892
|
-
|
|
893
|
-
```
|
|
894
|
-
📋 PHASE 7 SUMMARY:
|
|
895
|
-
|
|
896
|
-
Deployment Environment: [cloud/PaaS/on-premises/container-orchestration + platform choice + rationale] (7.1)
|
|
897
|
-
Containerization: [yes/no + Docker setup (base image, size, compose stack)] (7.2)
|
|
898
|
-
Environments: [number of environments (dev/staging/prod) + config approach (env vars/secrets/feature flags)] (7.3)
|
|
899
|
-
CI/CD Pipeline: [platform (GitHub Actions/GitLab CI/etc.) + pipeline stages + auto-deploy strategy] (7.4)
|
|
900
|
-
Deployment Strategy: [standard/blue-green/canary/rolling + zero-downtime approach + rollback plan] (7.4.1)
|
|
901
|
-
Monitoring & Logging: [APM tool + logging strategy (centralized/structured JSON) + metrics to track] (7.5)
|
|
902
|
-
Alerts: [alert conditions (error rate/response time/5xx/etc.) + channels (email/Slack/PagerDuty) + on-call rotation] (7.6)
|
|
903
|
-
Backup & Disaster Recovery: [backup strategy + retention period + RTO/RPO targets] (7.7)
|
|
904
|
-
Database Migrations in Production: [zero-downtime strategy + rollback plan + migration windows] (7.7.1)
|
|
905
|
-
Database Connection Pooling: [pool tool + settings (min/max/timeouts) + monitoring] (7.7.2)
|
|
906
|
-
Scaling Strategy: [horizontal/vertical/auto-scaling + expected load + database scaling approach] (7.8)
|
|
907
|
-
Health Checks: [endpoints (/health, /health/ready, /health/live) + checks performed] (7.9)
|
|
908
|
-
Graceful Shutdown: [yes/no + shutdown sequence + timeouts] (7.9.1)
|
|
909
|
-
Circuit Breakers & Resilience: [yes/no + tool + configuration + fallback strategies] (7.9.2)
|
|
910
|
-
Documentation & Runbooks: [what will be created (deployment guide/runbooks/architecture diagrams in mermaid format/API docs) + API doc strategy (code-first/design-first)] (7.10)
|
|
911
|
-
|
|
912
|
-
Is this correct? (Yes/No)
|
|
913
|
-
```
|
|
914
|
-
|
|
915
|
-
---
|
|
916
|
-
|
|
917
|
-
### 📄 Generate Phase 7 Documents
|
|
918
|
-
|
|
919
|
-
**Before starting generation:**
|
|
920
|
-
|
|
921
|
-
```
|
|
922
|
-
📖 Loading context from previous phases...
|
|
923
|
-
✅ Re-reading docs/testing.md
|
|
924
|
-
✅ Re-reading ai-instructions.md
|
|
925
|
-
```
|
|
926
|
-
|
|
927
|
-
**Generate documents automatically:**
|
|
928
|
-
|
|
929
|
-
**1. `docs/operations.md`**
|
|
930
|
-
|
|
931
|
-
- Use template: `.ai-flow/templates/docs/operations.template.md`
|
|
932
|
-
- Fill with deployment, monitoring, alerting, backup, scaling
|
|
933
|
-
- Write to: `docs/operations.md`
|
|
934
|
-
|
|
935
|
-
**2. `specs/configuration.md`**
|
|
936
|
-
|
|
937
|
-
- Use template: `.ai-flow/templates/specs/configuration.template.md`
|
|
938
|
-
- Fill with environment variables, secrets management, feature flags
|
|
939
|
-
- Write to: `specs/configuration.md`
|
|
940
|
-
|
|
941
|
-
**3. `.env.example`**
|
|
942
|
-
|
|
943
|
-
- List all environment variables needed
|
|
944
|
-
- Include comments explaining each variable
|
|
945
|
-
- Write to: `.env.example`
|
|
946
|
-
|
|
947
|
-
```
|
|
948
|
-
✅ Generated: docs/operations.md
|
|
949
|
-
✅ Generated: specs/configuration.md
|
|
950
|
-
✅ Generated: .env.example
|
|
951
|
-
|
|
952
|
-
Documents have been created with all Phase 7 information.
|
|
953
|
-
|
|
954
|
-
📝 Would you like to make any corrections before continuing?
|
|
955
|
-
|
|
956
|
-
→ If yes: Edit the files and type "ready" when done. I'll re-read them.
|
|
957
|
-
→ If no: Type "continue" to proceed to final checkpoint.
|
|
958
|
-
```
|
|
959
|
-
|
|
960
|
-
**If user edits files:**
|
|
961
|
-
Re-read files to refresh context before continuing.
|
|
962
|
-
|
|
963
|
-
---
|
|
964
|
-
|
|
965
|
-
### Phase 7 Completion
|
|
966
|
-
|
|
967
|
-
```
|
|
968
|
-
✅ Phase 7 Complete!
|
|
969
|
-
|
|
970
|
-
Generated documents:
|
|
971
|
-
✅ docs/operations.md
|
|
972
|
-
✅ specs/configuration.md
|
|
973
|
-
✅ .env.example
|
|
974
|
-
|
|
975
|
-
📝 Would you like to review these documents before proceeding to Phase 8?
|
|
976
|
-
|
|
977
|
-
→ If yes: Edit the files and type "ready" when done.
|
|
978
|
-
→ If no: Type "continue" to proceed to Phase 8.
|
|
979
|
-
```
|
|
980
|
-
|
|
981
|
-
---
|
|
982
|
-
|
|
983
|
-
## 📝 Generated Documents
|
|
984
|
-
|
|
985
|
-
After Phase 7, generate/update:
|
|
986
|
-
|
|
987
|
-
- `docs/operations.md` - Operations and deployment guide
|
|
988
|
-
- `specs/configuration.md` - Configuration specification
|
|
989
|
-
- `.env.example` - Environment variables template
|
|
990
|
-
|
|
991
|
-
---
|
|
992
|
-
|
|
993
|
-
**Next Phase:** Phase 8 - Project Setup & Final Documentation
|
|
994
|
-
|
|
995
|
-
Read: `.ai-flow/prompts/backend/flow-build-phase-8.md`
|
|
996
|
-
|
|
997
|
-
---
|
|
998
|
-
|
|
999
|
-
**Last Updated:** 2025-12-20
|
|
1000
|
-
|
|
1001
|
-
**Version:** 2.1.8
|
|
1
|
+
## PHASE 7: Operations & Deployment (10-15 min)
|
|
2
|
+
|
|
3
|
+
> **Order for this phase:** 7.1 → 7.2 → 7.3 → 7.4 → 7.4.1 → 7.5 → 7.6 → 7.7 → 7.7.1 → 7.7.2 → 7.8 → 7.9 → 7.9.1 → 7.9.2 → 7.9.3 → 7.9.4 → 7.10
|
|
4
|
+
|
|
5
|
+
> **📌 Scope-based behavior:**
|
|
6
|
+
>
|
|
7
|
+
> - **MVP:** Ask 7.1-7.4 only (deployment basics), skip 7.5-7.10 (monitoring, scaling, backups), mark as "TBD"
|
|
8
|
+
> - **Production-Ready:** Ask 7.1-7.8, simplify 7.9-7.10 (advanced monitoring and resilience)
|
|
9
|
+
> - **Enterprise:** Ask all questions 7.1-7.10 with emphasis on reliability and disaster recovery
|
|
10
|
+
|
|
11
|
+
### Objective
|
|
12
|
+
|
|
13
|
+
Define deployment, monitoring, and operational practices.
|
|
14
|
+
|
|
15
|
+
---
|
|
16
|
+
|
|
17
|
+
## 🔍 Pre-Flight Check (Smart Skip Logic)
|
|
18
|
+
|
|
19
|
+
> 📎 **Reference:** See [prompts/shared/smart-skip-preflight.md](../../.ai-flow/prompts/shared/smart-skip-preflight.md) for the complete smart skip logic.
|
|
20
|
+
|
|
21
|
+
**Execute Pre-Flight Check for Phase 7:**
|
|
22
|
+
|
|
23
|
+
- **Target File**: `docs/deployment.md`
|
|
24
|
+
- **Phase Name**: "OPERATIONS & DEPLOYMENT"
|
|
25
|
+
- **Key Items**: CI/CD pipeline, deployment platform, monitoring, logging
|
|
26
|
+
- **Typical Gaps**: Incident runbooks, disaster recovery, scaling strategy
|
|
27
|
+
|
|
28
|
+
**Proceed with appropriate scenario based on audit data from `.ai-flow/cache/audit-data.json`**
|
|
29
|
+
|
|
30
|
+
---
|
|
31
|
+
|
|
32
|
+
## Phase 7 Questions (Full Mode)
|
|
33
|
+
|
|
34
|
+
**7.1 Deployment Environment**
|
|
35
|
+
|
|
36
|
+
```
|
|
37
|
+
|
|
38
|
+
Where will you deploy?
|
|
39
|
+
|
|
40
|
+
A) ⭐ Cloud Platform
|
|
41
|
+
|
|
42
|
+
- AWS (ECS, Fargate, Lambda, EC2)
|
|
43
|
+
- Google Cloud (Cloud Run, GKE, Compute Engine)
|
|
44
|
+
- Azure (App Service, AKS, VMs)
|
|
45
|
+
|
|
46
|
+
B) 🔥 Platform-as-a-Service (PaaS)
|
|
47
|
+
|
|
48
|
+
- Heroku
|
|
49
|
+
- Railway
|
|
50
|
+
- Render
|
|
51
|
+
- Fly.io
|
|
52
|
+
- Vercel (for APIs)
|
|
53
|
+
|
|
54
|
+
C) 🏢 On-Premises
|
|
55
|
+
|
|
56
|
+
- Company servers
|
|
57
|
+
- Private cloud
|
|
58
|
+
|
|
59
|
+
D) 🐳 Container Orchestration
|
|
60
|
+
|
|
61
|
+
- Kubernetes (GKE, EKS, AKS)
|
|
62
|
+
- Docker Swarm
|
|
63
|
+
- Nomad
|
|
64
|
+
|
|
65
|
+
Your choice: \_\_
|
|
66
|
+
Why?
|
|
67
|
+
|
|
68
|
+
```
|
|
69
|
+
|
|
70
|
+
**7.2 Containerization**
|
|
71
|
+
|
|
72
|
+
````
|
|
73
|
+
|
|
74
|
+
Will you use Docker?
|
|
75
|
+
|
|
76
|
+
A) ⭐ Yes - Dockerize application
|
|
77
|
+
|
|
78
|
+
- Multi-stage build
|
|
79
|
+
- Optimized image size
|
|
80
|
+
- Docker Compose for local dev
|
|
81
|
+
|
|
82
|
+
B) No - Deploy directly
|
|
83
|
+
|
|
84
|
+
If yes:
|
|
85
|
+
Base image: \_\_
|
|
86
|
+
Estimated image size: \_\_ MB
|
|
87
|
+
|
|
88
|
+
Example stack (local development):
|
|
89
|
+
|
|
90
|
+
```yaml
|
|
91
|
+
services:
|
|
92
|
+
  app:
|
|
93
|
+
    build: .
|
|
94
|
+
    ports: [3000:3000]
|
|
95
|
+
  db:
|
|
96
|
+
    image: postgres:15
|
|
97
|
+
  redis:
|
|
98
|
+
    image: redis:7
|
|
99
|
+
```
|
|
100
|
+
|
|
101
|
+
````
|
|
102
|
+
|
|
103
|
+
**7.3 Environment Strategy**
|
|
104
|
+
|
|
105
|
+
```
|
|
106
|
+
|
|
107
|
+
How many environments will you have?
|
|
108
|
+
|
|
109
|
+
A) ⭐ Three environments
|
|
110
|
+
|
|
111
|
+
- Development (local)
|
|
112
|
+
- Staging (pre-production, QA)
|
|
113
|
+
- Production (live)
|
|
114
|
+
|
|
115
|
+
B) 🏆 Four+ environments
|
|
116
|
+
|
|
117
|
+
- Development
|
|
118
|
+
- Testing (automated tests)
|
|
119
|
+
- Staging
|
|
120
|
+
- Production
|
|
121
|
+
|
|
122
|
+
C) 🚀 Two environments
|
|
123
|
+
|
|
124
|
+
- Development
|
|
125
|
+
- Production
|
|
126
|
+
|
|
127
|
+
Your choice: \_\_
|
|
128
|
+
|
|
129
|
+
Environment configuration:
|
|
130
|
+
A) ✅ Environment variables (.env files)
|
|
131
|
+
B) ✅ Config service (AWS Secrets Manager, Vault)
|
|
132
|
+
C) ✅ Feature flags (LaunchDarkly, Unleash)
|
|
133
|
+
|
|
134
|
+
```
|
|
135
|
+
|
|
136
|
+
**7.4 CI/CD Pipeline**
|
|
137
|
+
|
|
138
|
+
```
|
|
139
|
+
|
|
140
|
+
CI/CD platform:
|
|
141
|
+
|
|
142
|
+
A) ⭐ GitHub Actions - If using GitHub
|
|
143
|
+
B) 🔥 GitLab CI - If using GitLab
|
|
144
|
+
C) Jenkins - Self-hosted
|
|
145
|
+
D) CircleCI
|
|
146
|
+
E) Travis CI
|
|
147
|
+
F) AWS CodePipeline
|
|
148
|
+
G) Azure DevOps
|
|
149
|
+
|
|
150
|
+
Your choice: \_\_
|
|
151
|
+
|
|
152
|
+
Pipeline stages:
|
|
153
|
+
|
|
154
|
+
1. ✅ Checkout code
|
|
155
|
+
2. ✅ Install dependencies
|
|
156
|
+
3. ✅ Lint
|
|
157
|
+
4. ✅ Test (with coverage)
|
|
158
|
+
5. ✅ Build
|
|
159
|
+
6. ✅ Security scan (optional)
|
|
160
|
+
7. ✅ Deploy to staging
|
|
161
|
+
8. ⏸️ Manual approval (optional)
|
|
162
|
+
9. ✅ Deploy to production
|
|
163
|
+
|
|
164
|
+
Auto-deploy strategy:
|
|
165
|
+
A) ⭐ Auto-deploy to staging, manual approval for production
|
|
166
|
+
B) 🚀 Auto-deploy to production (main branch)
|
|
167
|
+
C) Manual deploy for all environments
|
|
168
|
+
|
|
169
|
+
```
|
|
170
|
+
|
|
171
|
+
**7.4.1 Deployment Strategy** (Production-Ready and Enterprise only)
|
|
172
|
+
|
|
173
|
+
```
|
|
174
|
+
What deployment strategy will you use for production?
|
|
175
|
+
|
|
176
|
+
A) ⭐ Rolling Deployment - Gradual replacement
|
|
177
|
+
- Replace instances one at a time
|
|
178
|
+
- Zero downtime
|
|
179
|
+
- Easy rollback
|
|
180
|
+
|
|
181
|
+
B) 🔥 Blue-Green Deployment - Instant switch
|
|
182
|
+
- Two identical environments
|
|
183
|
+
- Switch traffic instantly
|
|
184
|
+
- Higher infrastructure cost
|
|
185
|
+
|
|
186
|
+
C) ⚡ Canary Deployment - Progressive rollout
|
|
187
|
+
- Deploy to small percentage first
|
|
188
|
+
- Monitor for issues
|
|
189
|
+
- Gradually increase traffic
|
|
190
|
+
|
|
191
|
+
D) 🏆 Feature Flags - Code-level control
|
|
192
|
+
- Deploy code, toggle features
|
|
193
|
+
- Instant enable/disable
|
|
194
|
+
- Best with: LaunchDarkly, Unleash
|
|
195
|
+
|
|
196
|
+
Your choice: __
|
|
197
|
+
|
|
198
|
+
Rollback plan:
|
|
199
|
+
- How quickly must rollback complete? __ minutes
|
|
200
|
+
- Who can trigger rollback? [DevOps/Tech Lead/Any developer]
|
|
201
|
+
- Rollback trigger criteria? [Error rate > X%, latency > Y ms, manual]
|
|
202
|
+
|
|
203
|
+
If Blue-Green:
|
|
204
|
+
- Traffic switching: [Load balancer, DNS, etc.]
|
|
205
|
+
- Database migrations: [Strategy for zero-downtime]
|
|
206
|
+
|
|
207
|
+
If Canary:
|
|
208
|
+
- Initial traffic: __%
|
|
209
|
+
- Gradual increase: __% per __ minutes
|
|
210
|
+
- Success criteria: __
|
|
211
|
+
```
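
The canary answers above (initial traffic, step size, interval) fully determine a rollout schedule. As a rough illustration, the sketch below assumes a linear ramp; the function name `canary_schedule` and the sample values are hypothetical and not part of ai-flow.

```python
def canary_schedule(initial_pct, step_pct, interval_min):
    """Return (minute, traffic_pct) pairs for a linear canary ramp up to 100%."""
    if not 0 < initial_pct <= 100 or step_pct <= 0 or interval_min <= 0:
        raise ValueError("percentages and interval must be positive")
    schedule = []
    minute, pct = 0, initial_pct
    while pct < 100:
        schedule.append((minute, pct))
        minute += interval_min
        pct += step_pct
    schedule.append((minute, 100))  # the final step always lands on full traffic
    return schedule


# Example (assumed values): start at 5%, add 25% every 10 minutes.
plan = canary_schedule(initial_pct=5, step_pct=25, interval_min=10)
for minute, pct in plan:
    print(f"t+{minute:>3} min: {pct}% of traffic on the new version")
```

In practice each step would be gated on the success criteria from the questionnaire (for example, error rate below a threshold) rather than on time alone.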
|
|
212
|
+
|
|
213
|
+
**7.5 Monitoring & Logging**
|
|
214
|
+
|
|
215
|
+
````
|
|
216
|
+
|
|
217
|
+
Monitoring tools:
|
|
218
|
+
|
|
219
|
+
Application Performance Monitoring (APM):
|
|
220
|
+
A) ⭐ Datadog - Full-featured, expensive
|
|
221
|
+
B) 🔥 New Relic - Popular
|
|
222
|
+
C) Sentry - Error tracking focus
|
|
223
|
+
D) ⚡ OpenTelemetry + Grafana - Open source
|
|
224
|
+
E) AWS CloudWatch
|
|
225
|
+
F) None yet
|
|
226
|
+
|
|
227
|
+
Your choice: \_\_
|
|
228
|
+
|
|
229
|
+
Logging:
|
|
230
|
+
A) ⭐ Centralized logging
|
|
231
|
+
|
|
232
|
+
- Winston/Pino (Node.js) → CloudWatch/Datadog
|
|
233
|
+
- Python logging → ELK Stack
|
|
234
|
+
|
|
235
|
+
B) Basic console logs
|
|
236
|
+
|
|
237
|
+
C) Structured JSON logging ⭐
|
|
238
|
+
|
|
239
|
+
```json
|
|
240
|
+
{
|
|
241
|
+
"level": "info",
|
|
242
|
+
"timestamp": "2024-01-15T10:30:00Z",
|
|
243
|
+
"userId": "123",
|
|
244
|
+
"action": "user.login",
|
|
245
|
+
"ip": "192.168.1.1",
|
|
246
|
+
"message": "User logged in successfully"
|
|
247
|
+
}
|
|
248
|
+
```
|
|
249
|
+
|
|
250
|
+
Your logging strategy: \_\_
|
|
251
|
+
|
|
252
|
+
Metrics to track:
|
|
253
|
+
|
|
254
|
+
- ✅ Request rate (requests/sec)
|
|
255
|
+
- ✅ Error rate (% of failed requests)
|
|
256
|
+
- ✅ Response time (p50, p95, p99)
|
|
257
|
+
- ✅ Database query time
|
|
258
|
+
- ✅ Cache hit rate
|
|
259
|
+
- ✅ CPU/Memory usage
|
|
260
|
+
- Custom business metrics: \_\_
|
|
261
|
+
|
|
262
|
+
````
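
The structured JSON log entry shown in option C can be produced with Python's standard `logging` module. This is a minimal sketch, not a prescribed setup: the `JsonFormatter` class and the choice of extra fields (`userId`, `action`, `ip`) are assumptions that mirror the example entry.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    # Extra fields copied from the record when present (assumed names).
    EXTRA_FIELDS = ("userId", "action", "ip")

    def format(self, record):
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "level": record.levelname.lower(),
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "message": record.getMessage(),
        }
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        return json.dumps(entry)


logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Values passed via `extra` become attributes on the log record.
logger.info(
    "User logged in successfully",
    extra={"userId": "123", "action": "user.login", "ip": "192.168.1.1"},
)
```

Keeping one JSON object per line lets a centralized collector parse entries without custom patterns.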
|
|
263
|
+
|
|
264
|
+
**7.6 Alerts**
|
|
265
|
+
|
|
266
|
+
```
|
|
267
|
+
|
|
268
|
+
When should you be alerted?
|
|
269
|
+
|
|
270
|
+
A) ✅ Error rate > \_\_% (e.g., 1%)
|
|
271
|
+
B) ✅ Response time > \_\_ms (e.g., 1000ms)
|
|
272
|
+
C) ✅ 5xx errors (server errors)
|
|
273
|
+
D) ✅ Service down (health check failure)
|
|
274
|
+
E) ✅ Database connection failures
|
|
275
|
+
F) ✅ Disk space > 80%
|
|
276
|
+
G) ✅ Memory usage > 85%
|
|
277
|
+
|
|
278
|
+
Alert channels:
|
|
279
|
+
A) ⭐ Email
|
|
280
|
+
B) 🔥 Slack/Discord
|
|
281
|
+
C) ⚡ PagerDuty/Opsgenie (on-call)
|
|
282
|
+
D) SMS (critical only)
|
|
283
|
+
|
|
284
|
+
Your preferences: \_\_
|
|
285
|
+
|
|
286
|
+
On-call rotation:
|
|
287
|
+
A) Yes - Using [PagerDuty/Opsgenie]
|
|
288
|
+
B) No - Monitor during business hours
|
|
289
|
+
|
|
290
|
+
```
|
|
291
|
+
|
|
292
|
+
**7.7 Backup & Disaster Recovery**
|
|
293
|
+
|
|
294
|
+
```
|
|
295
|
+
|
|
296
|
+
Backup strategy:
|
|
297
|
+
|
|
298
|
+
Database backups:
|
|
299
|
+
A) ⭐ Automated daily backups
|
|
300
|
+
|
|
301
|
+
- Retention: 30 days
|
|
302
|
+
- Point-in-time recovery
|
|
303
|
+
|
|
304
|
+
B) 🏆 Continuous backups
|
|
305
|
+
|
|
306
|
+
- Every hour
|
|
307
|
+
- 90 days retention
|
|
308
|
+
|
|
309
|
+
C) Manual backups weekly
|
|
310
|
+
|
|
311
|
+
Your strategy: \_\_
|
|
312
|
+
Retention period: \_\_ days
|
|
313
|
+
|
|
314
|
+
Disaster recovery:
|
|
315
|
+
|
|
316
|
+
- Recovery Time Objective (RTO): \_\_ (how fast to restore)
|
|
317
|
+
- Recovery Point Objective (RPO): \_\_ (acceptable data loss)
|
|
318
|
+
|
|
319
|
+
Example:
|
|
320
|
+
|
|
321
|
+
- RTO: 1 hour (service restored within 1 hour)
|
|
322
|
+
- RPO: 15 minutes (lose max 15 min of data)
|
|
323
|
+
|
|
324
|
+
```
|
|
325
|
+
|
|
326
|
+
**7.7.1 Database Migrations in Production**
|
|
327
|
+
|
|
328
|
+
```
|
|
329
|
+
How will you handle database migrations in production?
|
|
330
|
+
|
|
331
|
+
Zero-downtime migrations:
|
|
332
|
+
A) ⭐ Yes - Plan for zero-downtime migrations (Production-Ready/Enterprise)
|
|
333
|
+
B) No - Accept maintenance windows (MVP)
|
|
334
|
+
|
|
335
|
+
If zero-downtime:
|
|
336
|
+
- Strategy: [Expand/Contract, Blue-Green migrations, etc.]
|
|
337
|
+
- Rollback plan: __
|
|
338
|
+
- Testing: [Tested on staging, Dry-run process]
|
|
339
|
+
|
|
340
|
+
Migration windows (if not zero-downtime):
|
|
341
|
+
- Preferred time: __
|
|
342
|
+
- Duration: __ minutes
|
|
343
|
+
- Notification: __
|
|
344
|
+
```
|
|
345
|
+
|
|
346
|
+
**7.7.2 Database Connection Pooling**
|
|
347
|
+
|
|
348
|
+
```
|
|
349
|
+
Database connection pooling configuration:
|
|
350
|
+
|
|
351
|
+
Pool tool: [ORM built-in, pgBouncer, HikariCP, etc.]
|
|
352
|
+
|
|
353
|
+
Settings:
|
|
354
|
+
- Min connections: __
|
|
355
|
+
- Max connections: __
|
|
356
|
+
- Connection timeout: __ ms
|
|
357
|
+
- Idle timeout: __ ms
|
|
358
|
+
- Max lifetime: __ ms
|
|
359
|
+
|
|
360
|
+
Monitoring:
|
|
361
|
+
- Track active/idle connections: [Yes/No]
|
|
362
|
+
- Alert on pool exhaustion: [Yes/No]
|
|
363
|
+
```
|
|
364
|
+
|
|
365
|
+
**7.8 Scaling Strategy**
|
|
366
|
+
|
|
367
|
+
```
|
|
368
|
+
|
|
369
|
+
How will you handle growth?
|
|
370
|
+
|
|
371
|
+
A) ⭐ Horizontal scaling - Add more instances
|
|
372
|
+
|
|
373
|
+
- Load balancer distributes traffic
|
|
374
|
+
- Stateless application design
|
|
375
|
+
|
|
376
|
+
B) Vertical scaling - Bigger instances
|
|
377
|
+
|
|
378
|
+
- Increase CPU/RAM
|
|
379
|
+
- Simpler but limited
|
|
380
|
+
|
|
381
|
+
C) ⚡ Auto-scaling - Automatic based on load
|
|
382
|
+
|
|
383
|
+
- Scale up during high traffic
|
|
384
|
+
- Scale down to save costs
|
|
385
|
+
- Metrics: CPU > 70%, requests > threshold
|
|
386
|
+
|
|
387
|
+
Your strategy: \_\_
|
|
388
|
+
|
|
389
|
+
Expected load:
|
|
390
|
+
|
|
391
|
+
- Initial: \_\_ requests/minute
|
|
392
|
+
- Year 1: \_\_ requests/minute
|
|
393
|
+
- Peak traffic: \_\_x normal load
|
|
394
|
+
|
|
395
|
+
Database scaling:
|
|
396
|
+
A) Read replicas - Scale reads
|
|
397
|
+
B) Sharding - Split data across DBs
|
|
398
|
+
C) Vertical scaling - Bigger DB instance
|
|
399
|
+
D) Not needed yet
|
|
400
|
+
|
|
401
|
+
```
|
|
402
|
+
|
|
403
|
+
**7.9 Health Checks**
|
|
404
|
+
|
|
405
|
+
````
|
|
406
|
+
|
|
407
|
+
Health check endpoints:
|
|
408
|
+
|
|
409
|
+
A) ✅ /health - Basic liveness
|
|
410
|
+
|
|
411
|
+
- Returns 200 OK if app is running
|
|
412
|
+
|
|
413
|
+
B) ✅ /health/ready - Readiness check
|
|
414
|
+
|
|
415
|
+
- Returns 200 OK if app can handle traffic
|
|
416
|
+
- Checks: DB connected, Redis connected, etc.
|
|
417
|
+
|
|
418
|
+
C) ✅ /health/live - Liveness check
|
|
419
|
+
|
|
420
|
+
- Returns 200 OK if app is alive
|
|
421
|
+
- Load balancer uses this
|
|
422
|
+
|
|
423
|
+
Example response:
|
|
424
|
+
|
|
425
|
+
```json
|
|
426
|
+
{
|
|
427
|
+
"status": "healthy",
|
|
428
|
+
"timestamp": "2024-01-15T10:30:00Z",
|
|
429
|
+
"checks": {
|
|
430
|
+
"database": "ok",
|
|
431
|
+
"redis": "ok",
|
|
432
|
+
"disk_space": "ok"
|
|
433
|
+
},
|
|
434
|
+
"version": "1.2.3"
|
|
435
|
+
}
|
|
436
|
+
```
|
|
437
|
+
|
|
438
|
+
Your health check endpoints: \_\_
|
|
439
|
+
|
|
440
|
+
````
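
A readiness endpoint such as `/health/ready` typically aggregates several dependency checks into one response like the example above. The sketch below is framework-agnostic and makes some assumptions: `build_health_response` is a hypothetical helper, the checks are stand-in lambdas, and returning 503 for an unhealthy instance is a common convention rather than something this questionnaire mandates.

```python
from datetime import datetime, timezone


def build_health_response(checks, version):
    """Run each named check and summarise the results.

    `checks` maps a name to a zero-argument callable that returns True when
    the dependency is reachable; an exception counts as a failure.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "fail"
        except Exception:
            results[name] = "fail"
    healthy = all(state == "ok" for state in results.values())
    body = {
        "status": "healthy" if healthy else "unhealthy",
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "checks": results,
        "version": version,
    }
    # Assumed convention: 200 when ready, 503 when any check fails.
    return (200 if healthy else 503), body


# Stand-in checks; real ones would ping the database, Redis, and so on.
status_code, body = build_health_response(
    {"database": lambda: True, "redis": lambda: True, "disk_space": lambda: True},
    version="1.2.3",
)
```

A web framework handler would return `body` as JSON with `status_code` as the HTTP status.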
|
|
441
|
+
|
|
442
|
+
**7.9.1 Graceful Shutdown**
|
|
443
|
+
|
|
444
|
+
```
|
|
445
|
+
Will you implement graceful shutdown?
|
|
446
|
+
|
|
447
|
+
A) ⭐ Yes - Handle shutdown gracefully (Production-Ready/Enterprise)
|
|
448
|
+
B) No - Standard shutdown
|
|
449
|
+
|
|
450
|
+
If yes:
|
|
451
|
+
Shutdown sequence:
|
|
452
|
+
1. Stop accepting new requests (timeout: __s)
|
|
453
|
+
2. Finish processing current requests (timeout: __s)
|
|
454
|
+
3. Close database connections (timeout: __s)
|
|
455
|
+
4. Close other connections (Redis, message queues, etc.)
|
|
456
|
+
5. Exit process
|
|
457
|
+
|
|
458
|
+
Total shutdown timeout: __s
|
|
459
|
+
|
|
460
|
+
Implementation:
|
|
461
|
+
- Signal handling: [SIGTERM, SIGINT]
|
|
462
|
+
- Health check grace period: __s
|
|
463
|
+
- Connection drain timeout: __s
|
|
464
|
+
```
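
The shutdown sequence above can be sketched as a signal handler that only sets a flag, plus an ordered list of cleanup steps. This is an illustrative outline under stated assumptions: the step bodies are placeholders, per-step timeouts are left out for brevity, and the final "exit process" step is omitted so the snippet can run to completion.

```python
import signal
import threading

shutdown_requested = threading.Event()


def request_shutdown(signum=None, frame=None):
    """Signal handler: only set a flag; cleanup runs outside the handler."""
    shutdown_requested.set()


def install_signal_handlers():
    """Call once at startup, from the main thread."""
    signal.signal(signal.SIGTERM, request_shutdown)
    signal.signal(signal.SIGINT, request_shutdown)


def run_shutdown(steps, log=print):
    """Run (name, callable) steps in order; return the names that failed."""
    failed = []
    for name, step in steps:
        try:
            step()
            log(f"shutdown: {name} done")
        except Exception as exc:  # keep going so later resources still close
            failed.append(name)
            log(f"shutdown: {name} failed: {exc}")
    return failed


# Placeholder steps; a real service would stop its listener, drain in-flight
# requests, and close its database, Redis, and queue clients here.
SHUTDOWN_STEPS = [
    ("stop accepting new requests", lambda: None),
    ("finish current requests", lambda: None),
    ("close database connections", lambda: None),
    ("close other connections", lambda: None),
]

# Simulate receiving SIGTERM without sending a real signal.
request_shutdown()
if shutdown_requested.is_set():
    failed_steps = run_shutdown(SHUTDOWN_STEPS)
```

Continuing after a failed step is a deliberate choice here: one stuck resource should not prevent the remaining connections from being closed before the total shutdown timeout expires.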
|
|
465
|
+
|
|
466
|
+
**7.9.2 Circuit Breakers & Resilience**
|
|
467
|
+
|
|
468
|
+
```
|
|
469
|
+
Will you implement circuit breakers?
|
|
470
|
+
|
|
471
|
+
A) ⭐ Yes - Protect against cascading failures (Production-Ready/Enterprise)
|
|
472
|
+
B) No - Direct service calls
|
|
473
|
+
|
|
474
|
+
If yes:
|
|
475
|
+
Circuit breaker tool: [Resilience4j, Hystrix, Polly, etc.]
|
|
476
|
+
|
|
477
|
+
Configuration:
|
|
478
|
+
- Failure threshold: __% (open circuit after X% failures)
|
|
479
|
+
- Success threshold: __% (close circuit after X% successes)
|
|
480
|
+
- Timeout: __ms
|
|
481
|
+
- Half-open retries: __
|
|
482
|
+
- Reset timeout: __s
|
|
483
|
+
|
|
484
|
+
Fallback strategy:
|
|
485
|
+
A) ⭐ Return cached data
|
|
486
|
+
B) Return default/empty response
|
|
487
|
+
C) Call alternative service
|
|
488
|
+
D) Return error gracefully
|
|
489
|
+
|
|
490
|
+
Services to protect:
|
|
491
|
+
{{#EACH SERVICE_TO_PROTECT}}
|
|
492
|
+
- **{{SERVICE_NAME}}**: {{FAILURE_THRESHOLD}}% threshold, fallback: {{FALLBACK_STRATEGY}}
|
|
493
|
+
{{/EACH}}
|
|
494
|
+
```
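
To make the closed, open, and half-open states concrete, here is a deliberately small circuit breaker. It is a simplification of what the listed libraries offer: it counts consecutive failures instead of tracking a failure percentage, and the class and parameter names are hypothetical.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Count-based breaker: closed -> open -> half-open -> closed."""

    def __init__(self, failure_threshold=3, reset_timeout_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half-open"  # reset timeout elapsed: allow a trial call
        return "open"

    def call(self, func, fallback=None):
        if self.state == "open":
            if fallback is not None:
                return fallback()
            raise CircuitOpenError("circuit is open")
        try:
            result = func()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            if fallback is not None:
                return fallback()
            raise
        self.failures = 0
        self.opened_at = None  # a success closes the circuit
        return result
```

A caller would wrap each protected dependency in its own breaker, for example `breaker.call(fetch_profile, fallback=lambda: cached_profile)`, which corresponds to fallback option A (return cached data); `fetch_profile` and `cached_profile` are placeholder names.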
|
|
495
|
+
|
|
496
|
+
**7.9.3 Retry & Timeout Policies**
|
|
497
|
+
|
|
498
|
+
```
|
|
499
|
+
Define retry and timeout policies for external dependencies:
|
|
500
|
+
|
|
501
|
+
| Service/Dependency | Timeout | Retries | Backoff Strategy | Notes |
|
|
502
|
+
|--------------------|-----------|---------|----------------------|----------------------|
|
|
503
|
+
| Database queries | 5000ms | 2 | None (fail fast) | Connection pooled |
|
|
504
|
+
| Redis cache | 1000ms | 1 | None | Cache miss = OK |
|
|
505
|
+
| Payment API | 30000ms | 3 | Exponential (1s,2s,4s)| Must complete |
|
|
506
|
+
| Email service | 5000ms | 3 | Fixed (2s) | Queue if fails |
|
|
507
|
+
| External REST APIs | 10000ms | 2 | Exponential | Circuit breaker |
|
|
508
|
+
| File storage (S3) | 15000ms | 3 | Exponential | Large files |
|
|
509
|
+
|
|
510
|
+
Your policies:
|
|
511
|
+
|
|
512
|
+
| Service/Dependency | Timeout | Retries | Backoff Strategy | Notes |
|
|
513
|
+
|--------------------|-----------|---------|----------------------|----------------------|
|
|
514
|
+
| | | | | |
|
|
515
|
+
| | | | | |
|
|
516
|
+
|
|
517
|
+
Global defaults:
|
|
518
|
+
- Default HTTP timeout: __ ms (recommended: 10000)
|
|
519
|
+
- Default retries: __ (recommended: 2)
|
|
520
|
+
- Default backoff: [None/Fixed/Exponential]
|
|
521
|
+
|
|
522
|
+
Non-retryable errors:
|
|
523
|
+
- 400 Bad Request (client error, won't succeed on retry)
|
|
524
|
+
- 401/403 Unauthorized/Forbidden
|
|
525
|
+
- 404 Not Found
|
|
526
|
+
- [Your additions]
|
|
527
|
+
```
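
The retry columns in the table above (retries, backoff strategy, non-retryable errors) translate into a small wrapper. The following is a sketch under assumptions: `HttpError` is a hypothetical exception carrying a status code, and timeouts are expected to be enforced by the HTTP client itself, so they are not modeled here.

```python
import time

NON_RETRYABLE_STATUS = {400, 401, 403, 404}


class HttpError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


def backoff_delays(retries, strategy="exponential", base_s=1.0):
    """Delay before each retry; exponential with base 1s gives 1s, 2s, 4s."""
    if strategy == "exponential":
        return [base_s * 2**i for i in range(retries)]
    if strategy == "fixed":
        return [base_s] * retries
    return [0.0] * retries  # "none": retry immediately


def call_with_retry(func, retries=2, strategy="exponential", base_s=1.0, sleep=time.sleep):
    """Call func(); retry on HttpError unless the status is non-retryable."""
    delays = backoff_delays(retries, strategy, base_s)
    for attempt in range(retries + 1):
        try:
            return func()
        except HttpError as err:
            # Give up on client errors and once the retry budget is spent.
            if err.status in NON_RETRYABLE_STATUS or attempt == retries:
                raise
            sleep(delays[attempt])
```

Passing `sleep` as a parameter keeps the wrapper testable without real waiting; production code would often add random jitter to the delays as well.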

**7.9.4 Request/Response Logging & Masking**

```
What request/response data will you log?

Log levels by environment:
| Environment | Level | Body Logging | Performance Logging |
|-------------|-------|--------------|---------------------|
| Development | debug | Full         | Yes                 |
| Staging     | info  | Truncated    | Yes                 |
| Production  | info  | Minimal      | Yes                 |

Request logging:
- ✅ HTTP method and URL
- ✅ Request ID (correlation)
- ✅ User ID (if authenticated)
- ✅ IP address (optional, may hash for privacy)
- ✅ Request duration (ms)
- ❓ Request body (careful with size and PII)
- ❓ Query parameters

Response logging:
- ✅ Status code
- ✅ Response duration (ms)
- ❓ Response body (careful with size)

Sensitive data masking (CRITICAL):

| Field Pattern    | Masking Strategy       |
|------------------|------------------------|
| password, secret | Completely redact      |
| token, api_key   | Show last 4 chars only |
| email            | j***@example.com       |
| phone            | ***-***-1234           |
| credit_card      | ****-****-****-1234    |
| ssn, national_id | Completely redact      |
| [Your patterns]  | __                     |

Log format:
A) ⭐ Structured JSON (recommended for aggregation)
B) Plain text with patterns
C) Framework default

Log aggregation:
A) ⭐ Centralized (ELK, Datadog, CloudWatch)
B) File-based with rotation
C) Console only (development)
```
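
As a rough illustration of the masking table and the structured JSON option, the sketch below masks fields by name and emits one JSON log line per request. The names `mask_value`, `mask_payload`, and `format_log_entry`, the `[REDACTED]` marker, and the substring-based field matching are all assumptions made for this example; only string values are masked and lists are not traversed.

```python
import json
import re

REDACTED = "[REDACTED]"  # hypothetical marker for fully redacted values


def mask_value(field: str, value: str) -> str:
    """Mask one string value according to the strategy table above."""
    name = field.lower()
    if any(key in name for key in ("password", "secret", "ssn", "national_id")):
        return REDACTED
    if any(key in name for key in ("token", "api_key")):
        # Show the last 4 characters only; very short values are redacted entirely.
        return ("****" + value[-4:]) if len(value) > 4 else REDACTED
    if "email" in name and "@" in value:
        local, domain = value.split("@", 1)
        return f"{local[:1]}***@{domain}"
    if "phone" in name:
        digits = re.sub(r"\D", "", value)
        return f"***-***-{digits[-4:]}"
    if "credit_card" in name:
        digits = re.sub(r"\D", "", value)
        return f"****-****-****-{digits[-4:]}"
    return value


def mask_payload(payload: dict) -> dict:
    """Return a copy of a (possibly nested) dict with sensitive strings masked."""
    masked = {}
    for key, value in payload.items():
        if isinstance(value, dict):
            masked[key] = mask_payload(value)
        elif isinstance(value, str):
            masked[key] = mask_value(key, value)
        else:
            masked[key] = value  # non-string values are left untouched
    return masked


def format_log_entry(method, url, status, duration_ms, request_id, body=None) -> str:
    """Build one structured JSON log line; the body is masked before logging."""
    entry = {
        "request_id": request_id,
        "method": method,
        "url": url,
        "status": status,
        "duration_ms": duration_ms,
    }
    if body is not None:
        entry["body"] = mask_payload(body)
    return json.dumps(entry)
```

Matching on field names keeps the rule table in one place, at the cost of missing sensitive values stored under unexpected keys; the `[Your patterns]` row is where a project would extend it.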

**7.10 Documentation & Runbooks**

```

Operational documentation:

A) ✅ Deployment guide - How to deploy
B) ✅ Runbooks - How to handle incidents

   - Database connection failure → steps to diagnose/fix
   - High CPU usage → steps to investigate
   - Service down → recovery procedure

C) ✅ Architecture diagrams (Mermaid format)

   - System architecture diagram (mermaid)
   - Data flow diagram (mermaid)
   - Infrastructure diagram (mermaid)

D) ✅ API documentation

   - Swagger/OpenAPI
   - Auto-generated from code

Will you create these?
A) Yes - All of them ⭐
B) Yes - Critical ones only (deployment, runbooks)
C) Later - Start without docs

API documentation strategy:
A) ⭐ Code-First (Recommended)

   - Generate docs from code (Swagger/OpenAPI decorators)
   - Always in sync with code
   - Tools: @nestjs/swagger, FastAPI docs

B) 📝 Design-First

   - Write openapi.yaml manually first
   - Generate code from spec
   - Better for large teams/contracts

C) 📄 Manual

   - Write Markdown/Notion docs
   - Hard to keep in sync (Not recommended)

```
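
To make the code-first idea concrete, here is a toy sketch in which the documentation is derived from metadata attached to the handlers themselves. It uses only the standard library; the `route` decorator and `build_openapi` helper are hypothetical and are not how `@nestjs/swagger` or FastAPI work internally.

```python
from typing import Callable

_ROUTES: list[dict] = []  # metadata collected as handlers are defined


def route(method: str, path: str, summary: str) -> Callable:
    """Register a handler together with the metadata used for its docs."""
    def decorator(handler: Callable) -> Callable:
        _ROUTES.append({
            "method": method.lower(),
            "path": path,
            "summary": summary,
            "operation_id": handler.__name__,
        })
        return handler
    return decorator


def build_openapi(title: str, version: str) -> dict:
    """Build a minimal OpenAPI 3.0 document from the registered handlers."""
    paths: dict = {}
    for r in _ROUTES:
        paths.setdefault(r["path"], {})[r["method"]] = {
            "summary": r["summary"],
            "operationId": r["operation_id"],
            "responses": {"200": {"description": "Successful response"}},
        }
    return {
        "openapi": "3.0.3",
        "info": {"title": title, "version": version},
        "paths": paths,
    }


@route("GET", "/health", "Liveness probe")
def get_health() -> dict:
    return {"status": "ok"}
```

Because the spec is rebuilt from the same decorators that define the handlers, adding or renaming an endpoint changes the documentation in the same commit, which is the "always in sync" property listed under option A.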

---

#### 🎨 MERMAID OPERATIONS DIAGRAM FORMATS - CRITICAL

**Use these exact formats** for operational and infrastructure diagrams mentioned in question 7.10:

##### 1️⃣ System Architecture Diagram (Deployment View)

Use `graph TD` to show deployed system components with scaling and redundancy:

````markdown
```mermaid
graph TD
    subgraph "Production Environment"
        subgraph "Load Balancer Layer"
            LB1[Load Balancer 1]
            LB2[Load Balancer 2]
        end

        subgraph "Application Layer"
            App1[API Server 1<br/>4 vCPU, 8GB RAM]
            App2[API Server 2<br/>4 vCPU, 8GB RAM]
            App3[API Server 3<br/>4 vCPU, 8GB RAM]
        end

        subgraph "Data Layer"
            Primary[(Primary DB<br/>PostgreSQL 15)]
            Replica1[(Read Replica 1)]
            Replica2[(Read Replica 2)]
            Cache[Redis Cluster<br/>3 Nodes]
        end

        subgraph "Message Queue"
            Queue[RabbitMQ Cluster<br/>3 Nodes]
        end
    end

    Internet[Internet] -->|HTTPS| LB1
    Internet -->|HTTPS| LB2
    LB1 --> App1
    LB1 --> App2
    LB2 --> App2
    LB2 --> App3

    App1 -->|Write| Primary
    App2 -->|Write| Primary
    App3 -->|Write| Primary

    App1 -->|Read| Replica1
    App2 -->|Read| Replica2
    App3 -->|Read| Replica1

    App1 -->|Cache| Cache
    App2 -->|Cache| Cache
    App3 -->|Cache| Cache

    App1 -->|Async Jobs| Queue
    App2 -->|Async Jobs| Queue
    App3 -->|Async Jobs| Queue

    Primary -.->|Replication| Replica1
    Primary -.->|Replication| Replica2

    style Internet fill:#e1f5ff
    style Primary fill:#e1ffe1
    style Cache fill:#f0e1ff
    style Queue fill:#ffe1f5
```
````

**Use for:** Showing deployed infrastructure, scaling configuration, redundancy, high availability

##### 2️⃣ Data Flow Diagram (Request Flow)

Use `flowchart LR` to show how data moves through the system step-by-step:

````markdown
```mermaid
flowchart LR
    User[User Request] -->|1. HTTPS POST| LB[Load Balancer]
    LB -->|2. Route| API[API Server]
    API -->|3. Validate JWT| Auth[Auth Service]
    Auth -->|4. Token Valid| API

    API -->|5. Check Cache| Cache[(Redis Cache)]
    Cache -->|6. Cache Miss| API

    API -->|7. Query| DB[(PostgreSQL)]
    DB -->|8. Data| API

    API -->|9. Store in Cache| Cache
    API -->|10. Enqueue Job| Queue[Message Queue]

    Queue -->|11. Process| Worker[Background Worker]
    Worker -->|12. Send Email| Email[Email Service]

    API -->|13. JSON Response| User

    style User fill:#e1f5ff
    style Cache fill:#f0e1ff
    style DB fill:#e1ffe1
    style Email fill:#fff4e1
```
````

**Use for:** Documenting request/response cycles, async processing flows, numbered execution steps
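
Steps 5 to 9 of the data flow diagram describe a cache-aside read. A minimal sketch of that read path follows, assuming a plain `dict` stands in for Redis and a caller-supplied function stands in for the PostgreSQL query; `get_with_cache` is a hypothetical name, and expiry (TTL) is deliberately left out.

```python
from typing import Callable, Optional


def get_with_cache(
    key: str,
    cache: dict,
    load_from_db: Callable[[str], Optional[dict]],
) -> Optional[dict]:
    """Cache-aside read: check the cache, fall back to the database on a miss."""
    cached = cache.get(key)          # 5. Check cache
    if cached is not None:
        return cached                # cache hit: the database is not queried
    record = load_from_db(key)       # 7-8. Query the database on a miss
    if record is not None:
        cache[key] = record          # 9. Store in cache for later requests
    return record
```

The first call for a key reaches the database and later calls are served from the cache until the entry is evicted or invalidated.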

##### 3️⃣ Infrastructure Diagram (Cloud Resources)

Use `graph TB` with subgraphs to show cloud infrastructure and network topology:

````markdown
```mermaid
graph TB
    subgraph "AWS Cloud - Production (us-east-1)"
        subgraph "VPC (10.0.0.0/16)"
            subgraph "Public Subnet (10.0.1.0/24)"
                ALB[Application Load Balancer]
                NAT[NAT Gateway]
            end

            subgraph "Private Subnet 1 (10.0.10.0/24)"
                ECS1[ECS Cluster<br/>Auto Scaling Group]
                App1[Container: API<br/>Fargate Task]
                App2[Container: API<br/>Fargate Task]
            end

            subgraph "Private Subnet 2 (10.0.20.0/24)"
                RDS[(RDS PostgreSQL<br/>Multi-AZ)]
                ElastiCache[ElastiCache Redis<br/>Cluster Mode]
            end

            subgraph "Private Subnet 3 (10.0.30.0/24)"
                SQS[Amazon SQS<br/>Message Queue]
                Lambda[Lambda Functions<br/>Background Workers]
            end
        end

        subgraph "Supporting Services"
            S3[S3 Bucket<br/>File Storage]
            CloudWatch[CloudWatch<br/>Monitoring & Logs]
            SecretsManager[Secrets Manager<br/>API Keys & Credentials]
        end
    end

    Internet[Internet Users] -->|HTTPS| ALB
    ALB --> App1
    ALB --> App2

    App1 --> RDS
    App2 --> RDS
    App1 --> ElastiCache
    App2 --> ElastiCache

    App1 -->|Upload/Download| S3
    App2 -->|Upload/Download| S3

    App1 -->|Send Message| SQS
    SQS -->|Trigger| Lambda
    Lambda --> RDS

    App1 -->|Logs & Metrics| CloudWatch
    App2 -->|Logs & Metrics| CloudWatch
    Lambda -->|Logs| CloudWatch

    App1 -->|Fetch Secrets| SecretsManager
    App2 -->|Fetch Secrets| SecretsManager

    style Internet fill:#e1f5ff
    style RDS fill:#e1ffe1
    style ElastiCache fill:#f0e1ff
    style S3 fill:#fff4e1
    style CloudWatch fill:#ffe1e1
```
````

**Use for:** Documenting cloud architecture, network topology, AWS/GCP/Azure resources, VPC design

##### 4️⃣ Monitoring & Observability Diagram (Optional)

Use `graph TD` to show monitoring, logging, and alerting stack:

````markdown
```mermaid
graph TD
    subgraph "Application Layer"
        App[API Servers]
        Worker[Background Workers]
    end

    subgraph "Monitoring Stack"
        Prometheus[Prometheus<br/>Metrics Collection]
        Grafana[Grafana<br/>Dashboards]
        AlertManager[Alert Manager<br/>Notifications]
    end

    subgraph "Logging Stack"
        FluentBit[Fluent Bit<br/>Log Collector]
        Elasticsearch[Elasticsearch<br/>Log Storage]
        Kibana[Kibana<br/>Log Viewer]
    end

    subgraph "Tracing"
        Jaeger[Jaeger<br/>Distributed Tracing]
    end

    subgraph "Alerts"
        PagerDuty[PagerDuty]
        Slack[Slack Notifications]
    end

    App -->|Metrics| Prometheus
    Worker -->|Metrics| Prometheus
    Prometheus --> Grafana
    Prometheus --> AlertManager

    App -->|Logs| FluentBit
    Worker -->|Logs| FluentBit
    FluentBit --> Elasticsearch
    Elasticsearch --> Kibana

    App -->|Traces| Jaeger
    Worker -->|Traces| Jaeger

    AlertManager --> PagerDuty
    AlertManager --> Slack

    style Grafana fill:#e1f5ff
    style Kibana fill:#f0e1ff
    style PagerDuty fill:#ffe1e1
```
````

**Use for:** Documenting observability strategy, monitoring infrastructure, alerting workflows

**Best Practices for Operations Diagrams:**

1. **Include Resource Specs:** Add CPU/RAM/disk info to nodes (e.g., `[API Server<br/>4 vCPU, 8GB RAM]`)
2. **Show Redundancy:** Display load balancers, replicas, multi-AZ deployments, failover paths
3. **Label Network Boundaries:** Use subgraphs for VPCs, subnets, availability zones, regions
4. **Document Protocols:** Label connections with HTTPS, gRPC, TCP, WebSocket, etc.
5. **Add IP Ranges:** Include CIDR blocks for network subnets (e.g., `10.0.1.0/24`)
6. **Show Auto-Scaling:** Indicate which components scale horizontally/vertically
7. **Include External Services:** SaaS tools, third-party APIs, CDNs, email providers
8. **Color Code by Layer:** Infrastructure (blue), data (green), monitoring (purple), alerts (red)

**Common Formatting Rules:**

- Code fence: ` ```mermaid ` (lowercase, no spaces, three backticks)
- Use `subgraph "Name"` to group related components by layer/zone
- Use `[(Cylinder)]` for databases, data stores, and persistent storage
- Use `[Square Brackets]` for services, servers, and compute resources
- Use dotted arrows `-.->` for replication, backup, and async flows
- Apply consistent styling: `style NodeName fill:#colorcode`
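
The formatting rules above are mechanical enough to encode in a small generator. The following is a hedged sketch, not part of the tool: `node`, `edge`, and `render` are hypothetical helpers that emit a fenced Mermaid block using the node shapes, dotted arrows, and `style` lines listed above.

```python
FENCE = "`" * 3  # three backticks, built this way so this example nests cleanly


def node(node_id: str, label: str, datastore: bool = False) -> str:
    """Cylinder for data stores, square brackets for services and compute."""
    return f"{node_id}[({label})]" if datastore else f"{node_id}[{label}]"


def edge(src: str, dst: str, label: str = "", dotted: bool = False) -> str:
    """Solid arrow by default; dotted for replication, backup, and async flows."""
    arrow = "-.->" if dotted else "-->"
    return f"{src} {arrow}|{label}| {dst}" if label else f"{src} {arrow} {dst}"


def render(direction: str, groups: dict, edges: list, styles: dict) -> str:
    """Assemble a fenced Mermaid graph from subgraphs, edges, and fill colors."""
    lines = [f"{FENCE}mermaid", f"graph {direction}"]
    for name, nodes in groups.items():
        lines.append(f'    subgraph "{name}"')
        lines.extend(f"        {n}" for n in nodes)
        lines.append("    end")
    lines.extend(f"    {e}" for e in edges)
    lines.extend(f"    style {nid} fill:{color}" for nid, color in styles.items())
    lines.append(FENCE)
    return "\n".join(lines)
```

For example, `render("TD", {"Data Layer": [node("Primary", "Primary DB", datastore=True)]}, [], {"Primary": "#e1ffe1"})` yields a block whose first line is the lowercase `mermaid` fence and whose database node uses the cylinder shape.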

**Deployment Context Examples:**

- For Docker: Show containers, volumes, networks, registries
- For Kubernetes: Show pods, services, ingress, namespaces, persistent volumes
- For Serverless: Show Lambda functions, API Gateway, S3 triggers, event sources
- For VMs: Show instances, security groups, load balancers, auto-scaling groups

**Validation:** Test diagrams at https://mermaid.live/ before saving to ensure syntax is correct

### Phase 7 Output

```
📋 PHASE 7 SUMMARY:

Deployment Environment: [cloud/PaaS/on-premises/container-orchestration + platform choice + rationale] (7.1)
Containerization: [yes/no + Docker setup (base image, size, compose stack)] (7.2)
Environments: [number of environments (dev/staging/prod) + config approach (env vars/secrets/feature flags)] (7.3)
CI/CD Pipeline: [platform (GitHub Actions/GitLab CI/etc.) + pipeline stages + auto-deploy strategy] (7.4)
Deployment Strategy: [standard/blue-green/canary/rolling + zero-downtime approach + rollback plan] (7.4.1)
Monitoring & Logging: [APM tool + logging strategy (centralized/structured JSON) + metrics to track] (7.5)
Alerts: [alert conditions (error rate/response time/5xx/etc.) + channels (email/Slack/PagerDuty) + on-call rotation] (7.6)
Backup & Disaster Recovery: [backup strategy + retention period + RTO/RPO targets] (7.7)
Database Migrations in Production: [zero-downtime strategy + rollback plan + migration windows] (7.7.1)
Database Connection Pooling: [pool tool + settings (min/max/timeouts) + monitoring] (7.7.2)
Scaling Strategy: [horizontal/vertical/auto-scaling + expected load + database scaling approach] (7.8)
Health Checks: [endpoints (/health, /health/ready, /health/live) + checks performed] (7.9)
Graceful Shutdown: [yes/no + shutdown sequence + timeouts] (7.9.1)
Circuit Breakers & Resilience: [yes/no + tool + configuration + fallback strategies] (7.9.2)
Documentation & Runbooks: [what will be created (deployment guide/runbooks/architecture diagrams in mermaid format/API docs) + API doc strategy (code-first/design-first)] (7.10)

Is this correct? (Yes/No)
```

---

### 📄 Generate Phase 7 Documents

**Before starting generation:**

```
📖 Loading context from previous phases...
✅ Re-reading docs/testing.md
✅ Re-reading ai-instructions.md
```

**Generate documents automatically:**

**1. `docs/operations.md`**

- Use template: `.ai-flow/templates/docs/operations.template.md`
- Fill with deployment, monitoring, alerting, backup, scaling
- Write to: `docs/operations.md`

**2. `specs/configuration.md`**

- Use template: `.ai-flow/templates/specs/configuration.template.md`
- Fill with environment variables, secrets management, feature flags
- Write to: `specs/configuration.md`

**3. `.env.example`**

- List all environment variables needed
- Include comments explaining each variable
- Write to: `.env.example`
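
A commented `.env.example` can be rendered from a simple list of variable descriptions. The sketch below is illustrative only: `render_env_example` is a hypothetical helper, and the variable names in the usage note are placeholders rather than variables this project defines.

```python
def render_env_example(variables: list[tuple[str, str, str]]) -> str:
    """Render (name, placeholder, description) triples as .env.example text."""
    blocks = []
    for name, placeholder, description in variables:
        # One comment line explaining the variable, then NAME=placeholder.
        blocks.append(f"# {description}\n{name}={placeholder}")
    return "\n\n".join(blocks) + "\n"
```

Calling `render_env_example([("DATABASE_URL", "postgres://user:pass@localhost:5432/app", "Connection string for the primary database"), ("LOG_LEVEL", "info", "One of debug, info, warn, error")])` produces two commented entries separated by a blank line. Placeholders should never be real secrets, since `.env.example` is committed to the repository.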

```
✅ Generated: docs/operations.md
✅ Generated: specs/configuration.md
✅ Generated: .env.example

Documents have been created with all Phase 7 information.

📝 Would you like to make any corrections before continuing?

→ If yes: Edit the files and type "ready" when done. I'll re-read them.
→ If no: Type "continue" to proceed to final checkpoint.
```

**If user edits files:**
Re-read files to refresh context before continuing.

---

### Phase 7 Completion

```
✅ Phase 7 Complete!

Generated documents:
✅ docs/operations.md
✅ specs/configuration.md
✅ .env.example

📝 Would you like to review these documents before proceeding to Phase 8?

→ If yes: Edit the files and type "ready" when done.
→ If no: Type "continue" to proceed to Phase 8.
```

---

## 📝 Generated Documents

After Phase 7, generate/update:

- `docs/operations.md` - Operations and deployment guide
- `specs/configuration.md` - Configuration specification
- `.env.example` - Environment variables template

---

**Next Phase:** Phase 8 - Project Setup & Final Documentation

Read: `.ai-flow/prompts/backend/flow-build-phase-8.md`

---

**Last Updated:** 2025-12-20

**Version:** 2.1.8