ai-flow-dev 2.7.0 → 2.8.0

Files changed (171)
  1. package/LICENSE +21 -21
  2. package/README.md +573 -570
  3. package/package.json +74 -74
  4. package/prompts/backend/flow-build-phase-0.md +535 -535
  5. package/prompts/backend/flow-build-phase-1.md +626 -626
  6. package/prompts/backend/flow-build-phase-10.md +340 -340
  7. package/prompts/backend/flow-build-phase-2.md +573 -573
  8. package/prompts/backend/flow-build-phase-3.md +834 -834
  9. package/prompts/backend/flow-build-phase-4.md +554 -554
  10. package/prompts/backend/flow-build-phase-5.md +703 -703
  11. package/prompts/backend/flow-build-phase-6.md +524 -524
  12. package/prompts/backend/flow-build-phase-7.md +1001 -1001
  13. package/prompts/backend/flow-build-phase-8.md +1407 -1407
  14. package/prompts/backend/flow-build-phase-9.md +477 -477
  15. package/prompts/backend/flow-build.md +137 -137
  16. package/prompts/backend/flow-check-review.md +656 -20
  17. package/prompts/backend/flow-check-test.md +526 -14
  18. package/prompts/backend/flow-check.md +725 -67
  19. package/prompts/backend/flow-commit.md +88 -119
  20. package/prompts/backend/flow-docs-sync.md +354 -354
  21. package/prompts/backend/flow-finish.md +919 -0
  22. package/prompts/backend/flow-release.md +949 -0
  23. package/prompts/backend/flow-work-feature.md +61 -61
  24. package/prompts/backend/flow-work-fix.md +46 -46
  25. package/prompts/backend/flow-work-refactor.md +48 -48
  26. package/prompts/backend/flow-work-resume.md +34 -34
  27. package/prompts/backend/flow-work.md +1098 -1286
  28. package/prompts/desktop/flow-build-phase-0.md +359 -359
  29. package/prompts/desktop/flow-build-phase-1.md +295 -295
  30. package/prompts/desktop/flow-build-phase-10.md +357 -357
  31. package/prompts/desktop/flow-build-phase-2.md +282 -282
  32. package/prompts/desktop/flow-build-phase-3.md +291 -291
  33. package/prompts/desktop/flow-build-phase-4.md +308 -308
  34. package/prompts/desktop/flow-build-phase-5.md +269 -269
  35. package/prompts/desktop/flow-build-phase-6.md +350 -350
  36. package/prompts/desktop/flow-build-phase-7.md +297 -297
  37. package/prompts/desktop/flow-build-phase-8.md +541 -541
  38. package/prompts/desktop/flow-build-phase-9.md +439 -439
  39. package/prompts/desktop/flow-build.md +156 -156
  40. package/prompts/desktop/flow-check-review.md +656 -20
  41. package/prompts/desktop/flow-check-test.md +526 -14
  42. package/prompts/desktop/flow-check.md +725 -67
  43. package/prompts/desktop/flow-commit.md +88 -119
  44. package/prompts/desktop/flow-docs-sync.md +354 -354
  45. package/prompts/desktop/flow-finish.md +910 -0
  46. package/prompts/desktop/flow-release.md +662 -0
  47. package/prompts/desktop/flow-work-feature.md +61 -61
  48. package/prompts/desktop/flow-work-fix.md +46 -46
  49. package/prompts/desktop/flow-work-refactor.md +48 -48
  50. package/prompts/desktop/flow-work-resume.md +34 -34
  51. package/prompts/desktop/flow-work.md +1202 -1390
  52. package/prompts/frontend/flow-build-phase-0.md +425 -425
  53. package/prompts/frontend/flow-build-phase-1.md +626 -626
  54. package/prompts/frontend/flow-build-phase-10.md +33 -33
  55. package/prompts/frontend/flow-build-phase-2.md +573 -573
  56. package/prompts/frontend/flow-build-phase-3.md +782 -782
  57. package/prompts/frontend/flow-build-phase-4.md +554 -554
  58. package/prompts/frontend/flow-build-phase-5.md +703 -703
  59. package/prompts/frontend/flow-build-phase-6.md +524 -524
  60. package/prompts/frontend/flow-build-phase-7.md +1001 -1001
  61. package/prompts/frontend/flow-build-phase-8.md +872 -872
  62. package/prompts/frontend/flow-build-phase-9.md +94 -94
  63. package/prompts/frontend/flow-build.md +137 -137
  64. package/prompts/frontend/flow-check-review.md +656 -20
  65. package/prompts/frontend/flow-check-test.md +526 -14
  66. package/prompts/frontend/flow-check.md +725 -67
  67. package/prompts/frontend/flow-commit.md +88 -119
  68. package/prompts/frontend/flow-docs-sync.md +550 -550
  69. package/prompts/frontend/flow-finish.md +910 -0
  70. package/prompts/frontend/flow-release.md +519 -0
  71. package/prompts/frontend/flow-work-api.md +1540 -0
  72. package/prompts/frontend/flow-work-feature.md +61 -61
  73. package/prompts/frontend/flow-work-fix.md +38 -38
  74. package/prompts/frontend/flow-work-refactor.md +48 -48
  75. package/prompts/frontend/flow-work-resume.md +34 -34
  76. package/prompts/frontend/flow-work.md +1583 -1320
  77. package/prompts/mobile/flow-build-phase-0.md +425 -425
  78. package/prompts/mobile/flow-build-phase-1.md +626 -626
  79. package/prompts/mobile/flow-build-phase-10.md +32 -32
  80. package/prompts/mobile/flow-build-phase-2.md +573 -573
  81. package/prompts/mobile/flow-build-phase-3.md +782 -782
  82. package/prompts/mobile/flow-build-phase-4.md +554 -554
  83. package/prompts/mobile/flow-build-phase-5.md +703 -703
  84. package/prompts/mobile/flow-build-phase-6.md +524 -524
  85. package/prompts/mobile/flow-build-phase-7.md +1001 -1001
  86. package/prompts/mobile/flow-build-phase-8.md +888 -888
  87. package/prompts/mobile/flow-build-phase-9.md +90 -90
  88. package/prompts/mobile/flow-build.md +135 -135
  89. package/prompts/mobile/flow-check-review.md +656 -20
  90. package/prompts/mobile/flow-check-test.md +526 -14
  91. package/prompts/mobile/flow-check.md +725 -67
  92. package/prompts/mobile/flow-commit.md +88 -119
  93. package/prompts/mobile/flow-docs-sync.md +620 -620
  94. package/prompts/mobile/flow-finish.md +910 -0
  95. package/prompts/mobile/flow-release.md +751 -0
  96. package/prompts/mobile/flow-work-api.md +1493 -0
  97. package/prompts/mobile/flow-work-feature.md +61 -61
  98. package/prompts/mobile/flow-work-fix.md +46 -46
  99. package/prompts/mobile/flow-work-refactor.md +48 -48
  100. package/prompts/mobile/flow-work-resume.md +34 -34
  101. package/prompts/mobile/flow-work.md +1593 -1329
  102. package/prompts/shared/mermaid-guidelines.md +102 -102
  103. package/prompts/shared/scope-levels.md +114 -114
  104. package/prompts/shared/smart-skip-preflight.md +214 -214
  105. package/prompts/shared/story-points.md +55 -55
  106. package/prompts/shared/task-format.md +74 -74
  107. package/prompts/shared/task-summary-template.md +277 -277
  108. package/templates/AGENT.template.md +443 -443
  109. package/templates/backend/.clauderules.template +112 -112
  110. package/templates/backend/.cursorrules.template +102 -102
  111. package/templates/backend/README.template.md +2 -2
  112. package/templates/backend/ai-instructions.template.md +2 -2
  113. package/templates/backend/copilot-instructions.template.md +2 -2
  114. package/templates/backend/docs/api.template.md +320 -320
  115. package/templates/backend/docs/business-flows.template.md +97 -97
  116. package/templates/backend/docs/code-standards.template.md +2 -2
  117. package/templates/backend/docs/contributing.template.md +3 -3
  118. package/templates/backend/docs/data-model.template.md +520 -520
  119. package/templates/backend/docs/testing.template.md +2 -2
  120. package/templates/backend/project-brief.template.md +2 -2
  121. package/templates/backend/specs/configuration.template.md +2 -2
  122. package/templates/backend/specs/security.template.md +2 -2
  123. package/templates/desktop/.clauderules.template +112 -112
  124. package/templates/desktop/.cursorrules.template +102 -102
  125. package/templates/desktop/README.template.md +170 -170
  126. package/templates/desktop/ai-instructions.template.md +366 -366
  127. package/templates/desktop/copilot-instructions.template.md +140 -140
  128. package/templates/desktop/docs/docs/api.template.md +320 -320
  129. package/templates/desktop/docs/docs/architecture.template.md +724 -724
  130. package/templates/desktop/docs/docs/business-flows.template.md +102 -102
  131. package/templates/desktop/docs/docs/code-standards.template.md +792 -792
  132. package/templates/desktop/docs/docs/contributing.template.md +149 -149
  133. package/templates/desktop/docs/docs/data-model.template.md +520 -520
  134. package/templates/desktop/docs/docs/operations.template.md +720 -720
  135. package/templates/desktop/docs/docs/testing.template.md +722 -722
  136. package/templates/desktop/project-brief.template.md +150 -150
  137. package/templates/desktop/specs/specs/configuration.template.md +121 -121
  138. package/templates/desktop/specs/specs/security.template.md +392 -392
  139. package/templates/frontend/README.template.md +2 -2
  140. package/templates/frontend/ai-instructions.template.md +2 -2
  141. package/templates/frontend/docs/api-integration.template.md +362 -362
  142. package/templates/frontend/docs/components.template.md +2 -2
  143. package/templates/frontend/docs/error-handling.template.md +360 -360
  144. package/templates/frontend/docs/operations.template.md +107 -107
  145. package/templates/frontend/docs/performance.template.md +124 -124
  146. package/templates/frontend/docs/pwa.template.md +119 -119
  147. package/templates/frontend/docs/state-management.template.md +2 -2
  148. package/templates/frontend/docs/styling.template.md +2 -2
  149. package/templates/frontend/docs/testing.template.md +2 -2
  150. package/templates/frontend/project-brief.template.md +2 -2
  151. package/templates/frontend/specs/accessibility.template.md +95 -95
  152. package/templates/frontend/specs/configuration.template.md +2 -2
  153. package/templates/frontend/specs/security.template.md +175 -175
  154. package/templates/fullstack/README.template.md +252 -252
  155. package/templates/fullstack/ai-instructions.template.md +444 -444
  156. package/templates/fullstack/project-brief.template.md +157 -157
  157. package/templates/fullstack/specs/configuration.template.md +340 -340
  158. package/templates/mobile/README.template.md +167 -167
  159. package/templates/mobile/ai-instructions.template.md +196 -196
  160. package/templates/mobile/docs/app-store.template.md +135 -135
  161. package/templates/mobile/docs/architecture.template.md +63 -63
  162. package/templates/mobile/docs/native-features.template.md +94 -94
  163. package/templates/mobile/docs/navigation.template.md +59 -59
  164. package/templates/mobile/docs/offline-strategy.template.md +65 -65
  165. package/templates/mobile/docs/permissions.template.md +56 -56
  166. package/templates/mobile/docs/state-management.template.md +85 -85
  167. package/templates/mobile/docs/testing.template.md +109 -109
  168. package/templates/mobile/project-brief.template.md +69 -69
  169. package/templates/mobile/specs/build-configuration.template.md +91 -91
  170. package/templates/mobile/specs/deployment.template.md +92 -92
  171. package/templates/work.template.md +47 -47
@@ -1,1001 +1,1001 @@
1
- ## PHASE 7: Operations & Deployment (10-15 min)
2
-
3
- > **Order for this phase:** 7.1 → 7.2 → 7.3 → 7.4 → 7.4.1 → 7.5 → 7.6 → 7.7 → 7.7.1 → 7.7.2 → 7.8 → 7.9 → 7.9.1 → 7.9.2 → 7.9.3 → 7.9.4 → 7.10
4
-
5
- > **📌 Scope-based behavior:**
6
- >
7
- > - **MVP:** Ask 7.1-7.4 only (deployment basics), skip 7.5-7.10 (monitoring, scaling, backups), mark as "TBD"
8
- > - **Production-Ready:** Ask 7.1-7.8, simplify 7.9-7.10 (advanced monitoring and resilience)
9
- > - **Enterprise:** Ask all questions 7.1-7.10 with emphasis on reliability and disaster recovery
10
-
11
- ### Objective
12
-
13
- Define deployment, monitoring, and operational practices.
14
-
15
- ---
16
-
17
- ## 🔍 Pre-Flight Check (Smart Skip Logic)
18
-
19
- > 📎 **Reference:** See [prompts/shared/smart-skip-preflight.md](../../.ai-flow/prompts/shared/smart-skip-preflight.md) for the complete smart skip logic.
20
-
21
- **Execute Pre-Flight Check for Phase 7:**
22
-
23
- - **Target File**: `docs/deployment.md`
24
- - **Phase Name**: "OPERATIONS & DEPLOYMENT"
25
- - **Key Items**: CI/CD pipeline, deployment platform, monitoring, logging
26
- - **Typical Gaps**: Incident runbooks, disaster recovery, scaling strategy
27
-
28
- **Proceed with appropriate scenario based on audit data from `.ai-flow/cache/audit-data.json`**
29
-
30
- ---
31
-
32
- ## Phase 7 Questions (Full Mode)
33
-
34
- **7.1 Deployment Environment**
35
-
36
- ```
37
-
38
- Where will you deploy?
39
-
40
- A) ⭐ Cloud Platform
41
-
42
- - AWS (ECS, Fargate, Lambda, EC2)
43
- - Google Cloud (Cloud Run, GKE, Compute Engine)
44
- - Azure (App Service, AKS, VMs)
45
-
46
- B) 🔥 Platform-as-a-Service (PaaS)
47
-
48
- - Heroku
49
- - Railway
50
- - Render
51
- - Fly.io
52
- - Vercel (for APIs)
53
-
54
- C) 🏢 On-Premises
55
-
56
- - Company servers
57
- - Private cloud
58
-
59
- D) 🐳 Container Orchestration
60
-
61
- - Kubernetes (GKE, EKS, AKS)
62
- - Docker Swarm
63
- - Nomad
64
-
65
- Your choice: \_\_
66
- Why?
67
-
68
- ```
69
-
70
- **7.2 Containerization**
71
-
72
- ````
73
-
74
- Will you use Docker?
75
-
76
- A) ⭐ Yes - Dockerize application
77
-
78
- - Multi-stage build
79
- - Optimized image size
80
- - Docker Compose for local dev
81
-
82
- B) No - Deploy directly
83
-
84
- If yes:
85
- Base image: \_\_
86
- Estimated image size: \_\_ MB
87
-
88
- Example stack (local development):
89
-
90
- ```yaml
91
- services:
92
- app:
93
- build: .
94
- ports: [3000:3000]
95
- db:
96
- image: postgres:15
97
- redis:
98
- image: redis:7
99
- ```
100
-
101
- ````
102
-
103
- **7.3 Environment Strategy**
104
-
105
- ```
106
-
107
- How many environments will you have?
108
-
109
- A) ⭐ Three environments
110
-
111
- - Development (local)
112
- - Staging (pre-production, QA)
113
- - Production (live)
114
-
115
- B) 🏆 Four+ environments
116
-
117
- - Development
118
- - Testing (automated tests)
119
- - Staging
120
- - Production
121
-
122
- C) 🚀 Two environments
123
-
124
- - Development
125
- - Production
126
-
127
- Your choice: \_\_
128
-
129
- Environment configuration:
130
- A) ✅ Environment variables (.env files)
131
- B) ✅ Config service (AWS Secrets Manager, Vault)
132
- C) ✅ Feature flags (LaunchDarkly, Unleash)
133
-
134
- ```
135
-
136
- **7.4 CI/CD Pipeline**
137
-
138
- ```
139
-
140
- CI/CD platform:
141
-
142
- A) ⭐ GitHub Actions - If using GitHub
143
- B) 🔥 GitLab CI - If using GitLab
144
- C) Jenkins - Self-hosted
145
- D) CircleCI
146
- E) Travis CI
147
- F) AWS CodePipeline
148
- G) Azure DevOps
149
-
150
- Your choice: \_\_
151
-
152
- Pipeline stages:
153
-
154
- 1. ✅ Checkout code
155
- 2. ✅ Install dependencies
156
- 3. ✅ Lint
157
- 4. ✅ Test (with coverage)
158
- 5. ✅ Build
159
- 6. ✅ Security scan (optional)
160
- 7. ✅ Deploy to staging
161
- 8. ⏸️ Manual approval (optional)
162
- 9. ✅ Deploy to production
163
-
164
- Auto-deploy strategy:
165
- A) ⭐ Auto-deploy to staging, manual approval for production
166
- B) 🚀 Auto-deploy to production (main branch)
167
- C) Manual deploy for all environments
168
-
169
- ```
170
-
171
- **7.4.1 Deployment Strategy** (Production-Ready and Enterprise only)
172
-
173
- ```
174
- What deployment strategy will you use for production?
175
-
176
- A) ⭐ Rolling Deployment - Gradual replacement
177
- - Replace instances one at a time
178
- - Zero downtime
179
- - Easy rollback
180
-
181
- B) 🔥 Blue-Green Deployment - Instant switch
182
- - Two identical environments
183
- - Switch traffic instantly
184
- - Higher infrastructure cost
185
-
186
- C) ⚡ Canary Deployment - Progressive rollout
187
- - Deploy to small percentage first
188
- - Monitor for issues
189
- - Gradually increase traffic
190
-
191
- D) 🏆 Feature Flags - Code-level control
192
- - Deploy code, toggle features
193
- - Instant enable/disable
194
- - Best with: LaunchDarkly, Unleash
195
-
196
- Your choice: __
197
-
198
- Rollback plan:
199
- - How quickly must rollback complete? __ minutes
200
- - Who can trigger rollback? [DevOps/Tech Lead/Any developer]
201
- - Rollback trigger criteria? [Error rate > X%, latency > Y ms, manual]
202
-
203
- If Blue-Green:
204
- - Traffic switching: [Load balancer, DNS, etc.]
205
- - Database migrations: [Strategy for zero-downtime]
206
-
207
- If Canary:
208
- - Initial traffic: __%
209
- - Gradual increase: __% per __ minutes
210
- - Success criteria: __
211
- ```
212
-
213
- **7.5 Monitoring & Logging**
214
-
215
- ````
216
-
217
- Monitoring tools:
218
-
219
- Application Performance Monitoring (APM):
220
- A) ⭐ Datadog - Full-featured, expensive
221
- B) 🔥 New Relic - Popular
222
- C) Sentry - Error tracking focus
223
- D) ⚡ OpenTelemetry + Grafana - Open source
224
- E) AWS CloudWatch
225
- F) None yet
226
-
227
- Your choice: \_\_
228
-
229
- Logging:
230
- A) ⭐ Centralized logging
231
-
232
- - Winston/Pino (Node.js) → CloudWatch/Datadog
233
- - Python logging → ELK Stack
234
-
235
- B) Basic console logs
236
-
237
- C) Structured JSON logging ⭐
238
-
239
- ```json
240
- {
241
- "level": "info",
242
- "timestamp": "2024-01-15T10:30:00Z",
243
- "userId": "123",
244
- "action": "user.login",
245
- "ip": "192.168.1.1",
246
- "message": "User logged in successfully"
247
- }
248
- ```
249
-
250
- Your logging strategy: \_\_
251
-
252
- Metrics to track:
253
-
254
- - ✅ Request rate (requests/sec)
255
- - ✅ Error rate (% of failed requests)
256
- - ✅ Response time (p50, p95, p99)
257
- - ✅ Database query time
258
- - ✅ Cache hit rate
259
- - ✅ CPU/Memory usage
260
- - Custom business metrics: \_\_
261
-
262
- ````
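Option C (structured JSON logging) can be illustrated with a minimal sketch built on Python's standard `logging` module. The `JsonFormatter` and `build_logger` names and the `userId`/`action`/`ip` extra fields are assumptions that mirror the example record above; they are not part of ai-flow-dev.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (hypothetical helper)."""

    # Extra fields copied from the record when present (assumed names).
    EXTRA_FIELDS = ("userId", "action", "ip")

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        payload = {
            "level": record.levelname.lower(),
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
            "message": record.getMessage(),
        }
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        return json.dumps(payload)


def build_logger(name: str = "app") -> logging.Logger:
    """Return a logger that writes JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate output via the root logger
    return logger
```

A call such as `build_logger().info("User logged in successfully", extra={"userId": "123", "action": "user.login"})` would emit a single JSON line that a log aggregator can parse without custom patterns.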
263
-
264
- **7.6 Alerts**
265
-
266
- ```
267
-
268
- When should you be alerted?
269
-
270
- A) ✅ Error rate > \_\_% (e.g., 1%)
271
- B) ✅ Response time > \_\_ms (e.g., 1000ms)
272
- C) ✅ 5xx errors (server errors)
273
- D) ✅ Service down (health check failure)
274
- E) ✅ Database connection failures
275
- F) ✅ Disk space > 80%
276
- G) ✅ Memory usage > 85%
277
-
278
- Alert channels:
279
- A) ⭐ Email
280
- B) 🔥 Slack/Discord
281
- C) ⚡ PagerDuty/Opsgenie (on-call)
282
- D) SMS (critical only)
283
-
284
- Your preferences: \_\_
285
-
286
- On-call rotation:
287
- A) Yes - Using [PagerDuty/Opsgenie]
288
- B) No - Monitor during business hours
289
-
290
- ```
291
-
292
- **7.7 Backup & Disaster Recovery**
293
-
294
- ```
295
-
296
- Backup strategy:
297
-
298
- Database backups:
299
- A) ⭐ Automated daily backups
300
-
301
- - Retention: 30 days
302
- - Point-in-time recovery
303
-
304
- B) 🏆 Continuous backups
305
-
306
- - Every hour
307
- - 90 days retention
308
-
309
- C) Manual backups weekly
310
-
311
- Your strategy: \_\_
312
- Retention period: \_\_ days
313
-
314
- Disaster recovery:
315
-
316
- - Recovery Time Objective (RTO): \_\_ (how fast to restore)
317
- - Recovery Point Objective (RPO): \_\_ (acceptable data loss)
318
-
319
- Example:
320
-
321
- - RTO: 1 hour (service restored within 1 hour)
322
- - RPO: 15 minutes (lose max 15 min of data)
323
-
324
- ```
325
-
326
- **7.7.1 Database Migrations in Production**
327
-
328
- ```
329
- How will you handle database migrations in production?
330
-
331
- Zero-downtime migrations:
332
- A) ⭐ Yes - Plan for zero-downtime migrations (Production-Ready/Enterprise)
333
- B) No - Accept maintenance windows (MVP)
334
-
335
- If zero-downtime:
336
- - Strategy: [Expand/Contract, Blue-Green migrations, etc.]
337
- - Rollback plan: __
338
- - Testing: [Tested on staging, Dry-run process]
339
-
340
- Migration windows (if not zero-downtime):
341
- - Preferred time: __
342
- - Duration: __ minutes
343
- - Notification: __
344
- ```
345
-
346
- **7.7.2 Database Connection Pooling**
347
-
348
- ```
349
- Database connection pooling configuration:
350
-
351
- Pool tool: [ORM built-in, pgBouncer, HikariCP, etc.]
352
-
353
- Settings:
354
- - Min connections: __
355
- - Max connections: __
356
- - Connection timeout: __ ms
357
- - Idle timeout: __ ms
358
- - Max lifetime: __ ms
359
-
360
- Monitoring:
361
- - Track active/idle connections: [Yes/No]
362
- - Alert on pool exhaustion: [Yes/No]
363
- ```
364
-
365
- **7.8 Scaling Strategy**
366
-
367
- ```
368
-
369
- How will you handle growth?
370
-
371
- A) ⭐ Horizontal scaling - Add more instances
372
-
373
- - Load balancer distributes traffic
374
- - Stateless application design
375
-
376
- B) Vertical scaling - Bigger instances
377
-
378
- - Increase CPU/RAM
379
- - Simpler but limited
380
-
381
- C) ⚡ Auto-scaling - Automatic based on load
382
-
383
- - Scale up during high traffic
384
- - Scale down to save costs
385
- - Metrics: CPU > 70%, requests > threshold
386
-
387
- Your strategy: \_\_
388
-
389
- Expected load:
390
-
391
- - Initial: \_\_ requests/minute
392
- - Year 1: \_\_ requests/minute
393
- - Peak traffic: \_\_x normal load
394
-
395
- Database scaling:
396
- A) Read replicas - Scale reads
397
- B) Sharding - Split data across DBs
398
- C) Vertical scaling - Bigger DB instance
399
- D) Not needed yet
400
-
401
- ```
402
-
403
- **7.9 Health Checks**
404
-
405
- ````
406
-
407
- Health check endpoints:
408
-
409
- A) ✅ /health - Basic liveness
410
-
411
- - Returns 200 OK if app is running
412
-
413
- B) ✅ /health/ready - Readiness check
414
-
415
- - Returns 200 OK if app can handle traffic
416
- - Checks: DB connected, Redis connected, etc.
417
-
418
- C) ✅ /health/live - Liveness check
419
-
420
- - Returns 200 OK if app is alive
421
- - Load balancer uses this
422
-
423
- Example response:
424
-
425
- ```json
426
- {
427
- "status": "healthy",
428
- "timestamp": "2024-01-15T10:30:00Z",
429
- "checks": {
430
- "database": "ok",
431
- "redis": "ok",
432
- "disk_space": "ok"
433
- },
434
- "version": "1.2.3"
435
- }
436
- ```
437
-
438
- Your health check endpoints: \_\_
439
-
440
- ````
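The readiness check described above (`/health/ready`) might be sketched as follows. This is a framework-agnostic illustration: `readiness_report` and the check callables are hypothetical names, and returning 503 for a failed dependency is an assumed convention rather than something the prompt specifies.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Tuple


def readiness_report(
    checks: Dict[str, Callable[[], bool]], version: str
) -> Tuple[int, dict]:
    """Run each dependency check and return (HTTP status, JSON-ready body)."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = "ok" if check() else "failed"
        except Exception:
            # A check that raises counts as a failed dependency.
            results[name] = "failed"
    healthy = all(state == "ok" for state in results.values())
    body = {
        "status": "healthy" if healthy else "unhealthy",
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "checks": results,
        "version": version,
    }
    # Assumed convention: 503 signals that traffic should not be routed here.
    return (200 if healthy else 503), body
```

A web framework handler would call `readiness_report({"database": ping_db, "redis": ping_redis}, "1.2.3")` and serialize the body, where `ping_db` and `ping_redis` stand in for whatever connectivity probes the project provides.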
441
-
442
- **7.9.1 Graceful Shutdown**
443
-
444
- ```
445
- Will you implement graceful shutdown?
446
-
447
- A) ⭐ Yes - Handle shutdown gracefully (Production-Ready/Enterprise)
448
- B) No - Standard shutdown
449
-
450
- If yes:
451
- Shutdown sequence:
452
- 1. Stop accepting new requests (timeout: __s)
453
- 2. Finish processing current requests (timeout: __s)
454
- 3. Close database connections (timeout: __s)
455
- 4. Close other connections (Redis, message queues, etc.)
456
- 5. Exit process
457
-
458
- Total shutdown timeout: __s
459
-
460
- Implementation:
461
- - Signal handling: [SIGTERM, SIGINT]
462
- - Health check grace period: __s
463
- - Connection drain timeout: __s
464
- ```
465
-
466
- **7.9.2 Circuit Breakers & Resilience**
467
-
468
- ```
469
- Will you implement circuit breakers?
470
-
471
- A) ⭐ Yes - Protect against cascading failures (Production-Ready/Enterprise)
472
- B) No - Direct service calls
473
-
474
- If yes:
475
- Circuit breaker tool: [Resilience4j, Hystrix, Polly, etc.]
476
-
477
- Configuration:
478
- - Failure threshold: __% (open circuit after X% failures)
479
- - Success threshold: __% (close circuit after X% successes)
480
- - Timeout: __ms
481
- - Half-open retries: __
482
- - Reset timeout: __s
483
-
484
- Fallback strategy:
485
- A) ⭐ Return cached data
486
- B) Return default/empty response
487
- C) Call alternative service
488
- D) Return error gracefully
489
-
490
- Services to protect:
491
- {{#EACH SERVICE_TO_PROTECT}}
492
- - **{{SERVICE_NAME}}**: {{FAILURE_THRESHOLD}}% threshold, fallback: {{FALLBACK_STRATEGY}}
493
- {{/EACH}}
494
- ```
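To make the circuit-breaker settings above concrete, here is a deliberately simplified sketch. It counts consecutive failures instead of the percentage thresholds the questionnaire asks about, and the `CircuitBreaker` class, its parameters, and the injectable `clock` are all illustrative assumptions; a production system would more likely use one of the libraries named above.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half-open"  # reset timeout elapsed: allow a trial call
        return "open"

    def call(self, func: Callable[[], Any], fallback: Optional[Callable[[], Any]] = None) -> Any:
        if self.state == "open":
            # Fail fast without touching the protected service.
            if fallback is not None:
                return fallback()
            raise CircuitOpenError("circuit is open")
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            if fallback is not None:
                return fallback()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Passing a `fallback` that returns cached data corresponds to fallback strategy A; omitting it surfaces the error to the caller, which is closer to strategy D.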
495
-
496
- **7.9.3 Retry & Timeout Policies**
497
-
498
- ```
499
- Define retry and timeout policies for external dependencies:
500
-
501
- | Service/Dependency | Timeout | Retries | Backoff Strategy | Notes |
502
- |--------------------|-----------|---------|----------------------|----------------------|
503
- | Database queries | 5000ms | 2 | None (fail fast) | Connection pooled |
504
- | Redis cache | 1000ms | 1 | None | Cache miss = OK |
505
- | Payment API | 30000ms | 3 | Exponential (1s,2s,4s)| Must complete |
506
- | Email service | 5000ms | 3 | Fixed (2s) | Queue if fails |
507
- | External REST APIs | 10000ms | 2 | Exponential | Circuit breaker |
508
- | File storage (S3) | 15000ms | 3 | Exponential | Large files |
509
-
510
- Your policies:
511
-
512
- | Service/Dependency | Timeout | Retries | Backoff Strategy | Notes |
513
- |--------------------|-----------|---------|----------------------|----------------------|
514
- | | | | | |
515
- | | | | | |
516
-
517
- Global defaults:
518
- - Default HTTP timeout: __ ms (recommended: 10000)
519
- - Default retries: __ (recommended: 2)
520
- - Default backoff: [None/Fixed/Exponential]
521
-
522
- Non-retryable errors:
523
- - 400 Bad Request (client error, won't succeed on retry)
524
- - 401/403 Unauthorized/Forbidden
525
- - 404 Not Found
526
- - [Your additions]
527
- ```
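The retry rules above (a bounded retry count, fixed or exponential backoff, and no retries for 400/401/403/404) could be sketched like this. `HttpError`, `backoff_delays`, and `call_with_retries` are hypothetical names introduced for illustration, and the injectable `sleep` parameter exists only to keep the example testable.

```python
import time
from typing import Any, Callable, List

# Status codes from the "Non-retryable errors" list above.
NON_RETRYABLE_STATUSES = frozenset({400, 401, 403, 404})


class HttpError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def backoff_delays(retries: int, base_s: float = 1.0, strategy: str = "exponential") -> List[float]:
    """Return the wait before each retry, e.g. 1s, 2s, 4s for exponential."""
    if strategy == "exponential":
        return [base_s * (2 ** attempt) for attempt in range(retries)]
    if strategy == "fixed":
        return [base_s] * retries
    return [0.0] * retries  # "none": retry immediately


def call_with_retries(
    func: Callable[[], Any],
    retries: int = 2,
    base_s: float = 1.0,
    strategy: str = "exponential",
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call func, retrying retryable HttpError failures up to `retries` times."""
    delays = backoff_delays(retries, base_s, strategy)
    for attempt in range(retries + 1):
        try:
            return func()
        except HttpError as err:
            # Client errors will not succeed on retry; give up on the last attempt too.
            if err.status in NON_RETRYABLE_STATUSES or attempt == retries:
                raise
            sleep(delays[attempt])
```

With `retries=3` and the default base of one second, the waits match the "Exponential (1s,2s,4s)" row shown for the payment API.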
528
-
529
- **7.9.4 Request/Response Logging & Masking**
530
-
531
- ```
532
- What request/response data will you log?
533
-
534
- Log levels by environment:
535
- | Environment | Level | Body Logging | Performance Logging |
536
- |-------------|----------|--------------|---------------------|
537
- | Development | debug | Full | Yes |
538
- | Staging | info | Truncated | Yes |
539
- | Production | info | Minimal | Yes |
540
-
541
- Request logging:
542
- - ✅ HTTP method and URL
543
- - ✅ Request ID (correlation)
544
- - ✅ User ID (if authenticated)
545
- - ✅ IP address (optional, may hash for privacy)
546
- - ✅ Request duration (ms)
547
- - ❓ Request body (careful with size and PII)
548
- - ❓ Query parameters
549
-
550
- Response logging:
551
- - ✅ Status code
552
- - ✅ Response duration (ms)
553
- - ❓ Response body (careful with size)
554
-
555
- Sensitive data masking (CRITICAL):
556
-
557
- | Field Pattern | Masking Strategy |
558
- |------------------------|----------------------------|
559
- | password, secret | Completely redact |
560
- | token, api_key | Show last 4 chars only |
561
- | email | j***@example.com |
562
- | phone | ***-***-1234 |
563
- | credit_card | ****-****-****-1234 |
564
- | ssn, national_id | Completely redact |
565
- | [Your patterns] | __ |
566
-
567
- Log format:
568
- A) ⭐ Structured JSON (recommended for aggregation)
569
- B) Plain text with patterns
570
- C) Framework default
571
-
572
- Log aggregation:
573
- A) ⭐ Centralized (ELK, Datadog, CloudWatch)
574
- B) File-based with rotation
575
- C) Console only (development)
576
- ```
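The masking table above translates fairly directly into code. The following is a minimal sketch under stated assumptions: it matches field names exactly (not by pattern), the `mask_value` and `mask_payload` helpers are hypothetical, and the `[REDACTED]` marker is an arbitrary choice.

```python
import re
from typing import Any, Dict

REDACT_KEYS = {"password", "secret", "ssn", "national_id"}
LAST4_KEYS = {"token", "api_key"}


def mask_value(key: str, value: Any) -> Any:
    """Apply the masking strategy for one field, chosen by its (lowercased) name."""
    key = key.lower()
    if key in REDACT_KEYS:
        return "[REDACTED]"
    if not isinstance(value, str):
        return value
    if key in LAST4_KEYS:
        if len(value) <= 4:
            return "****"  # too short to reveal any characters safely
        return "*" * (len(value) - 4) + value[-4:]
    if key == "email" and "@" in value:
        local, domain = value.split("@", 1)
        return f"{local[:1]}***@{domain}"
    if key == "phone":
        digits = re.sub(r"\D", "", value)
        return f"***-***-{digits[-4:]}"
    if key == "credit_card":
        digits = re.sub(r"\D", "", value)
        return f"****-****-****-{digits[-4:]}"
    return value


def mask_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of payload with sensitive fields masked, recursing into dicts."""
    masked: Dict[str, Any] = {}
    for key, value in payload.items():
        if isinstance(value, dict) and key.lower() not in REDACT_KEYS:
            masked[key] = mask_payload(value)
        else:
            masked[key] = mask_value(key, value)
    return masked
```

Running `mask_payload` on a request body before it reaches the logger keeps raw secrets out of every log sink; lists of objects are not traversed in this sketch and would need their own handling.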
577
-
578
- **7.10 Documentation & Runbooks**
579
-
580
- ```
581
-
582
- Operational documentation:
583
-
584
- A) ✅ Deployment guide - How to deploy
585
- B) ✅ Runbooks - How to handle incidents
586
-
587
- - Database connection failure → steps to diagnose/fix
588
- - High CPU usage → steps to investigate
589
- - Service down → recovery procedure
590
-
591
- C) ✅ Architecture diagrams (Mermaid format)
592
-
593
- - System architecture diagram (mermaid)
594
- - Data flow diagram (mermaid)
595
- - Infrastructure diagram (mermaid)
596
-
597
- D) ✅ API documentation
598
-
599
- - Swagger/OpenAPI
600
- - Auto-generated from code
601
-
602
- Will you create these?
603
- A) Yes - All of them ⭐
604
- B) Yes - Critical ones only (deployment, runbooks)
605
- C) Later - Start without docs
606
-
607
- API documentation strategy:
608
- A) ⭐ Code-First (Recommended)
609
-
610
- - Generate docs from code (Swagger/OpenAPI decorators)
611
- - Always in sync with code
612
- - Tools: @nestjs/swagger, FastAPI docs
613
-
614
- B) 📝 Design-First
615
-
616
- - Write openapi.yaml manually first
617
- - Generate code from spec
618
- - Better for large teams/contracts
619
-
620
- C) 📄 Manual
621
-
622
- - Write Markdown/Notion docs
623
- - Hard to keep in sync (Not recommended)
624
-
625
- ```
626
-
627
- ---
628
-
629
- #### 🎨 MERMAID OPERATIONS DIAGRAM FORMATS - CRITICAL
630
-
631
- **Use these exact formats** for operational and infrastructure diagrams mentioned in question 7.10:
632
-
633
- ##### 1️⃣ System Architecture Diagram (Deployment View)
634
-
635
- Use `graph TD` to show deployed system components with scaling and redundancy:
636
-
637
- ````markdown
638
- ```mermaid
639
- graph TD
640
- subgraph "Production Environment"
641
- subgraph "Load Balancer Layer"
642
- LB1[Load Balancer 1]
643
- LB2[Load Balancer 2]
644
- end
645
-
646
- subgraph "Application Layer"
647
- App1[API Server 1<br/>4 vCPU, 8GB RAM]
648
- App2[API Server 2<br/>4 vCPU, 8GB RAM]
649
- App3[API Server 3<br/>4 vCPU, 8GB RAM]
650
- end
651
-
652
- subgraph "Data Layer"
653
- Primary[(Primary DB<br/>PostgreSQL 15)]
654
- Replica1[(Read Replica 1)]
655
- Replica2[(Read Replica 2)]
656
- Cache[Redis Cluster<br/>3 Nodes]
657
- end
658
-
659
- subgraph "Message Queue"
660
- Queue[RabbitMQ Cluster<br/>3 Nodes]
661
- end
662
- end
663
-
664
- Internet[Internet] -->|HTTPS| LB1
665
- Internet -->|HTTPS| LB2
666
- LB1 --> App1
667
- LB1 --> App2
668
- LB2 --> App2
669
- LB2 --> App3
670
-
671
- App1 -->|Write| Primary
672
- App2 -->|Write| Primary
673
- App3 -->|Write| Primary
674
-
675
- App1 -->|Read| Replica1
676
- App2 -->|Read| Replica2
677
- App3 -->|Read| Replica1
678
-
679
- App1 -->|Cache| Cache
680
- App2 -->|Cache| Cache
681
- App3 -->|Cache| Cache
682
-
683
- App1 -->|Async Jobs| Queue
684
- App2 -->|Async Jobs| Queue
685
- App3 -->|Async Jobs| Queue
686
-
687
- Primary -.->|Replication| Replica1
688
- Primary -.->|Replication| Replica2
689
-
690
- style Internet fill:#e1f5ff
691
- style Primary fill:#e1ffe1
692
- style Cache fill:#f0e1ff
693
- style Queue fill:#ffe1f5
694
- ```
695
- ````
696
-
697
- **Use for:** Showing deployed infrastructure, scaling configuration, redundancy, high availability
698
-
699
- ##### 2️⃣ Data Flow Diagram (Request Flow)
700
-
701
- Use `flowchart LR` to show how data moves through the system step-by-step:
702
-
703
- ````markdown
704
- ```mermaid
705
- flowchart LR
706
- User[User Request] -->|1. HTTPS POST| LB[Load Balancer]
707
- LB -->|2. Route| API[API Server]
708
- API -->|3. Validate JWT| Auth[Auth Service]
709
- Auth -->|4. Token Valid| API
710
-
711
- API -->|5. Check Cache| Cache[(Redis Cache)]
712
- Cache -->|6. Cache Miss| API
713
-
714
- API -->|7. Query| DB[(PostgreSQL)]
715
- DB -->|8. Data| API
716
-
717
- API -->|9. Store in Cache| Cache
718
- API -->|10. Enqueue Job| Queue[Message Queue]
719
-
720
- Queue -->|11. Process| Worker[Background Worker]
721
- Worker -->|12. Send Email| Email[Email Service]
722
-
723
- API -->|13. JSON Response| User
724
-
725
- style User fill:#e1f5ff
726
- style Cache fill:#f0e1ff
727
- style DB fill:#e1ffe1
728
- style Email fill:#fff4e1
729
- ```
730
- ````
731
-
732
- **Use for:** Documenting request/response cycles, async processing flows, numbered execution steps
733
-
734
- ##### 3️⃣ Infrastructure Diagram (Cloud Resources)
735
-
736
- Use `graph TB` with subgraphs to show cloud infrastructure and network topology:
737
-
738
- ````markdown
739
- ```mermaid
740
- graph TB
741
- subgraph "AWS Cloud - Production (us-east-1)"
742
- subgraph "VPC (10.0.0.0/16)"
743
- subgraph "Public Subnet (10.0.1.0/24)"
744
- ALB[Application Load Balancer]
745
- NAT[NAT Gateway]
746
- end
747
-
748
- subgraph "Private Subnet 1 (10.0.10.0/24)"
749
- ECS1[ECS Cluster<br/>Auto Scaling Group]
750
- App1[Container: API<br/>Fargate Task]
751
- App2[Container: API<br/>Fargate Task]
752
- end
753
-
754
- subgraph "Private Subnet 2 (10.0.20.0/24)"
755
- RDS[(RDS PostgreSQL<br/>Multi-AZ)]
756
- ElastiCache[ElastiCache Redis<br/>Cluster Mode]
757
- end
758
-
759
- subgraph "Private Subnet 3 (10.0.30.0/24)"
760
- SQS[Amazon SQS<br/>Message Queue]
761
- Lambda[Lambda Functions<br/>Background Workers]
762
- end
763
- end
764
-
765
- subgraph "Supporting Services"
766
- S3[S3 Bucket<br/>File Storage]
767
- CloudWatch[CloudWatch<br/>Monitoring & Logs]
768
- SecretsManager[Secrets Manager<br/>API Keys & Credentials]
769
- end
770
- end
771
-
772
- Internet[Internet Users] -->|HTTPS| ALB
773
- ALB --> App1
774
- ALB --> App2
775
-
776
- App1 --> RDS
777
- App2 --> RDS
778
- App1 --> ElastiCache
779
- App2 --> ElastiCache
780
-
781
- App1 -->|Upload/Download| S3
782
- App2 -->|Upload/Download| S3
783
-
784
- App1 -->|Send Message| SQS
785
- SQS -->|Trigger| Lambda
786
- Lambda --> RDS
787
-
788
- App1 -->|Logs & Metrics| CloudWatch
789
- App2 -->|Logs & Metrics| CloudWatch
790
- Lambda -->|Logs| CloudWatch
791
-
792
- App1 -->|Fetch Secrets| SecretsManager
793
- App2 -->|Fetch Secrets| SecretsManager
794
-
795
- style Internet fill:#e1f5ff
796
- style RDS fill:#e1ffe1
797
- style ElastiCache fill:#f0e1ff
798
- style S3 fill:#fff4e1
799
- style CloudWatch fill:#ffe1e1
800
- ```
801
- ````
802
-
803
- **Use for:** Documenting cloud architecture, network topology, AWS/GCP/Azure resources, VPC design
804
-
805
- ##### 4️⃣ Monitoring & Observability Diagram (Optional)
806
-
807
- Use `graph TD` to show monitoring, logging, and alerting stack:
808
-
809
- ````markdown
810
- ```mermaid
811
- graph TD
812
- subgraph "Application Layer"
813
- App[API Servers]
814
- Worker[Background Workers]
815
- end
816
-
817
- subgraph "Monitoring Stack"
818
- Prometheus[Prometheus<br/>Metrics Collection]
819
- Grafana[Grafana<br/>Dashboards]
820
- AlertManager[Alert Manager<br/>Notifications]
821
- end
822
-
823
- subgraph "Logging Stack"
824
- FluentBit[Fluent Bit<br/>Log Collector]
825
- Elasticsearch[Elasticsearch<br/>Log Storage]
826
- Kibana[Kibana<br/>Log Viewer]
827
- end
828
-
829
- subgraph "Tracing"
830
- Jaeger[Jaeger<br/>Distributed Tracing]
831
- end
832
-
833
- subgraph "Alerts"
834
- PagerDuty[PagerDuty]
835
- Slack[Slack Notifications]
836
- end
837
-
838
- App -->|Metrics| Prometheus
839
- Worker -->|Metrics| Prometheus
840
- Prometheus --> Grafana
841
- Prometheus --> AlertManager
842
-
843
- App -->|Logs| FluentBit
844
- Worker -->|Logs| FluentBit
845
- FluentBit --> Elasticsearch
846
- Elasticsearch --> Kibana
847
-
848
- App -->|Traces| Jaeger
849
- Worker -->|Traces| Jaeger
850
-
851
- AlertManager --> PagerDuty
852
- AlertManager --> Slack
853
-
854
- style Grafana fill:#e1f5ff
855
- style Kibana fill:#f0e1ff
856
- style PagerDuty fill:#ffe1e1
857
- ```
858
- ````
859
-
860
- **Use for:** Documenting observability strategy, monitoring infrastructure, alerting workflows
861
-
862
- **Best Practices for Operations Diagrams:**
863
-
864
- 1. **Include Resource Specs:** Add CPU/RAM/disk info to nodes (e.g., `[API Server<br/>4 vCPU, 8GB RAM]`)
865
- 2. **Show Redundancy:** Display load balancers, replicas, multi-AZ deployments, failover paths
866
- 3. **Label Network Boundaries:** Use subgraphs for VPCs, subnets, availability zones, regions
867
- 4. **Document Protocols:** Label connections with HTTPS, gRPC, TCP, WebSocket, etc.
868
- 5. **Add IP Ranges:** Include CIDR blocks for network subnets (e.g., `10.0.1.0/24`)
869
- 6. **Show Auto-Scaling:** Indicate which components scale horizontally/vertically
870
- 7. **Include External Services:** SaaS tools, third-party APIs, CDNs, email providers
871
- 8. **Color Code by Layer:** Infrastructure (blue), data (green), monitoring (purple), alerts (red)
872
-
873
- **Common Formatting Rules:**
874
-
875
- - Code fence: ` ```mermaid ` (lowercase, no spaces, three backticks)
876
- - Use `subgraph "Name"` to group related components by layer/zone
877
- - Use `[(Cylinder)]` for databases, data stores, and persistent storage
878
- - Use `[Square Brackets]` for services, servers, and compute resources
879
- - Use dotted arrows `-.->` for replication, backup, and async flows
880
- - Apply consistent styling: `style NodeName fill:#colorcode`
881
-
882
- **Deployment Context Examples:**
883
-
884
- - For Docker: Show containers, volumes, networks, registries
885
- - For Kubernetes: Show pods, services, ingress, namespaces, persistent volumes
886
- - For Serverless: Show Lambda functions, API Gateway, S3 triggers, event sources
887
- - For VMs: Show instances, security groups, load balancers, auto-scaling groups
888
-
889
- **Validation:** Test diagrams at https://mermaid.live/ before saving to ensure syntax is correct
890
-
891
- ### Phase 7 Output
892
-
893
- ```
894
- 📋 PHASE 7 SUMMARY:
895
-
896
- Deployment Environment: [cloud/PaaS/on-premises/container-orchestration + platform choice + rationale] (7.1)
897
- Containerization: [yes/no + Docker setup (base image, size, compose stack)] (7.2)
898
- Environments: [number of environments (dev/staging/prod) + config approach (env vars/secrets/feature flags)] (7.3)
899
- CI/CD Pipeline: [platform (GitHub Actions/GitLab CI/etc.) + pipeline stages + auto-deploy strategy] (7.4)
900
- Deployment Strategy: [standard/blue-green/canary/rolling + zero-downtime approach + rollback plan] (7.4.1)
901
- Monitoring & Logging: [APM tool + logging strategy (centralized/structured JSON) + metrics to track] (7.5)
902
- Alerts: [alert conditions (error rate/response time/5xx/etc.) + channels (email/Slack/PagerDuty) + on-call rotation] (7.6)
903
- Backup & Disaster Recovery: [backup strategy + retention period + RTO/RPO targets] (7.7)
904
- Database Migrations in Production: [zero-downtime strategy + rollback plan + migration windows] (7.7.1)
905
- Database Connection Pooling: [pool tool + settings (min/max/timeouts) + monitoring] (7.7.2)
906
- Scaling Strategy: [horizontal/vertical/auto-scaling + expected load + database scaling approach] (7.8)
907
- Health Checks: [endpoints (/health, /health/ready, /health/live) + checks performed] (7.9)
908
- Graceful Shutdown: [yes/no + shutdown sequence + timeouts] (7.9.1)
909
- Circuit Breakers & Resilience: [yes/no + tool + configuration + fallback strategies] (7.9.2)
910
- Documentation & Runbooks: [what will be created (deployment guide/runbooks/architecture diagrams in mermaid format/API docs) + API doc strategy (code-first/design-first)] (7.10)
911
-
912
- Is this correct? (Yes/No)
913
- ```
914
-
915
- ---
916
-
917
- ### 📄 Generate Phase 7 Documents
918
-
919
- **Before starting generation:**
920
-
921
- ```
922
- 📖 Loading context from previous phases...
923
- ✅ Re-reading docs/testing.md
924
- ✅ Re-reading ai-instructions.md
925
- ```
926
-
927
- **Generate documents automatically:**
928
-
929
- **1. `docs/operations.md`**
930
-
931
- - Use template: `.ai-flow/templates/docs/operations.template.md`
932
- - Fill with deployment, monitoring, alerting, backup, scaling
933
- - Write to: `docs/operations.md`
934
-
935
- **2. `specs/configuration.md`**
936
-
937
- - Use template: `.ai-flow/templates/specs/configuration.template.md`
938
- - Fill with environment variables, secrets management, feature flags
939
- - Write to: `specs/configuration.md`
940
-
941
- **3. `.env.example`**
942
-
943
- - List all environment variables needed
944
- - Include comments explaining each variable
945
- - Write to: `.env.example`
946
-
947
- ```
948
- ✅ Generated: docs/operations.md
949
- ✅ Generated: specs/configuration.md
950
- ✅ Generated: .env.example
951
-
952
- Documents have been created with all Phase 7 information.
953
-
954
- 📝 Would you like to make any corrections before continuing?
955
-
956
- → If yes: Edit the files and type "ready" when done. I'll re-read them.
957
- → If no: Type "continue" to proceed to final checkpoint.
958
- ```
959
-
960
- **If user edits files:**
961
- Re-read files to refresh context before continuing.
962
-
963
- ---
964
-
965
- ### Phase 7 Completion
966
-
967
- ```
968
- ✅ Phase 7 Complete!
969
-
970
- Generated documents:
971
- ✅ docs/operations.md
972
- ✅ specs/configuration.md
973
- ✅ .env.example
974
-
975
- 📝 Would you like to review these documents before proceeding to Phase 8?
976
-
977
- → If yes: Edit the files and type "ready" when done.
978
- → If no: Type "continue" to proceed to Phase 8.
979
- ```
980
-
981
- ---
982
-
983
- ## 📝 Generated Documents
984
-
985
- After Phase 7, generate/update:
986
-
987
- - `docs/operations.md` - Operations and deployment guide
988
- - `specs/configuration.md` - Configuration specification
989
- - `.env.example` - Environment variables template
990
-
991
- ---
992
-
993
- **Next Phase:** Phase 8 - Project Setup & Final Documentation
994
-
995
- Read: `.ai-flow/prompts/backend/flow-build-phase-8.md`
996
-
997
- ---
998
-
999
- **Last Updated:** 2025-12-20
1000
-
1001
- **Version:** 2.1.8
1
+ ## PHASE 7: Operations & Deployment (10-15 min)
2
+
3
+ > **Order for this phase:** 7.1 → 7.2 → 7.3 → 7.4 → 7.4.1 → 7.5 → 7.6 → 7.7 → 7.7.1 → 7.7.2 → 7.8 → 7.9 → 7.9.1 → 7.9.2 → 7.9.3 → 7.9.4 → 7.10
4
+
5
+ > **📌 Scope-based behavior:**
6
+ >
7
+ > - **MVP:** Ask 7.1-7.4 only (deployment basics), skip 7.5-7.10 (monitoring, scaling, backups), mark as "TBD"
8
+ > - **Production-Ready:** Ask 7.1-7.8, simplify 7.9-7.10 (advanced monitoring and resilience)
9
+ > - **Enterprise:** Ask all questions 7.1-7.10 with emphasis on reliability and disaster recovery
10
+
11
+ ### Objective
12
+
13
+ Define deployment, monitoring, and operational practices.
14
+
15
+ ---
16
+
17
+ ## 🔍 Pre-Flight Check (Smart Skip Logic)
18
+
19
+ > 📎 **Reference:** See [prompts/shared/smart-skip-preflight.md](../../.ai-flow/prompts/shared/smart-skip-preflight.md) for the complete smart skip logic.
20
+
21
+ **Execute Pre-Flight Check for Phase 7:**
22
+
23
+ - **Target File**: `docs/deployment.md`
24
+ - **Phase Name**: "OPERATIONS & DEPLOYMENT"
25
+ - **Key Items**: CI/CD pipeline, deployment platform, monitoring, logging
26
+ - **Typical Gaps**: Incident runbooks, disaster recovery, scaling strategy
27
+
28
+ **Proceed with appropriate scenario based on audit data from `.ai-flow/cache/audit-data.json`**
29
+
30
+ ---
31
+
32
+ ## Phase 7 Questions (Full Mode)
33
+
34
+ **7.1 Deployment Environment**
35
+
36
+ ```
37
+
38
+ Where will you deploy?
39
+
40
+ A) ⭐ Cloud Platform
41
+
42
+ - AWS (ECS, Fargate, Lambda, EC2)
43
+ - Google Cloud (Cloud Run, GKE, Compute Engine)
44
+ - Azure (App Service, AKS, VMs)
45
+
46
+ B) 🔥 Platform-as-a-Service (PaaS)
47
+
48
+ - Heroku
49
+ - Railway
50
+ - Render
51
+ - Fly.io
52
+ - Vercel (for APIs)
53
+
54
+ C) 🏢 On-Premises
55
+
56
+ - Company servers
57
+ - Private cloud
58
+
59
+ D) 🐳 Container Orchestration
60
+
61
+ - Kubernetes (GKE, EKS, AKS)
62
+ - Docker Swarm
63
+ - Nomad
64
+
65
+ Your choice: \_\_
66
+ Why?
67
+
68
+ ```
69
+
70
+ **7.2 Containerization**
71
+
72
+ ````
73
+
74
+ Will you use Docker?
75
+
76
+ A) ⭐ Yes - Dockerize application
77
+
78
+ - Multi-stage build
79
+ - Optimized image size
80
+ - Docker Compose for local dev
81
+
82
+ B) No - Deploy directly
83
+
84
+ If yes:
85
+ Base image: \_\_
86
+ Estimated image size: \_\_ MB
87
+
88
+ Example stack (local development):
89
+
90
+ ```yaml
91
+ services:
92
+ app:
93
+ build: .
94
+ ports: [3000:3000]
95
+ db:
96
+ image: postgres:15
97
+ redis:
98
+ image: redis:7
99
+ ```
100
+
101
+ ````
102
+
103
+ **7.3 Environment Strategy**
104
+
105
+ ```
106
+
107
+ How many environments will you have?
108
+
109
+ A) ⭐ Three environments
110
+
111
+ - Development (local)
112
+ - Staging (pre-production, QA)
113
+ - Production (live)
114
+
115
+ B) 🏆 Four+ environments
116
+
117
+ - Development
118
+ - Testing (automated tests)
119
+ - Staging
120
+ - Production
121
+
122
+ C) 🚀 Two environments
123
+
124
+ - Development
125
+ - Production
126
+
127
+ Your choice: \_\_
128
+
129
+ Environment configuration:
130
+ A) ✅ Environment variables (.env files)
131
+ B) ✅ Config service (AWS Secrets Manager, Vault)
132
+ C) ✅ Feature flags (LaunchDarkly, Unleash)
133
+
134
+ ```
135
+
136
+ **7.4 CI/CD Pipeline**
137
+
138
+ ```
139
+
140
+ CI/CD platform:
141
+
142
+ A) ⭐ GitHub Actions - If using GitHub
143
+ B) 🔥 GitLab CI - If using GitLab
144
+ C) Jenkins - Self-hosted
145
+ D) CircleCI
146
+ E) Travis CI
147
+ F) AWS CodePipeline
148
+ G) Azure DevOps
149
+
150
+ Your choice: \_\_
151
+
152
+ Pipeline stages:
153
+
154
+ 1. ✅ Checkout code
155
+ 2. ✅ Install dependencies
156
+ 3. ✅ Lint
157
+ 4. ✅ Test (with coverage)
158
+ 5. ✅ Build
159
+ 6. ✅ Security scan (optional)
160
+ 7. ✅ Deploy to staging
161
+ 8. ⏸️ Manual approval (optional)
162
+ 9. ✅ Deploy to production
163
+
164
+ Auto-deploy strategy:
165
+ A) ⭐ Auto-deploy to staging, manual approval for production
166
+ B) 🚀 Auto-deploy to production (main branch)
167
+ C) Manual deploy for all environments
168
+
169
+ ```
170
+
171
+ **7.4.1 Deployment Strategy** (Production-Ready and Enterprise only)
172
+
173
+ ```
174
+ What deployment strategy will you use for production?
175
+
176
+ A) ⭐ Rolling Deployment - Gradual replacement
177
+ - Replace instances one at a time
178
+ - Zero downtime
179
+ - Easy rollback
180
+
181
+ B) 🔥 Blue-Green Deployment - Instant switch
182
+ - Two identical environments
183
+ - Switch traffic instantly
184
+ - Higher infrastructure cost
185
+
186
+ C) ⚡ Canary Deployment - Progressive rollout
187
+ - Deploy to small percentage first
188
+ - Monitor for issues
189
+ - Gradually increase traffic
190
+
191
+ D) 🏆 Feature Flags - Code-level control
192
+ - Deploy code, toggle features
193
+ - Instant enable/disable
194
+ - Best with: LaunchDarkly, Unleash
195
+
196
+ Your choice: __
197
+
198
+ Rollback plan:
199
+ - How quickly must rollback complete? __ minutes
200
+ - Who can trigger rollback? [DevOps/Tech Lead/Any developer]
201
+ - Rollback trigger criteria? [Error rate > X%, latency > Y ms, manual]
202
+
203
+ If Blue-Green:
204
+ - Traffic switching: [Load balancer, DNS, etc.]
205
+ - Database migrations: [Strategy for zero-downtime]
206
+
207
+ If Canary:
208
+ - Initial traffic: __%
209
+ - Gradual increase: __% per __ minutes
210
+ - Success criteria: __
211
+ ```
212
+
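The canary answers in 7.4.1 (initial traffic, gradual increase, rollback trigger criteria) can be sanity-checked with a small calculation. The following is a minimal sketch, not part of the prompt itself: `canary_schedule` and `should_rollback` are hypothetical helper names, and the thresholds are placeholders for the values the user supplies.

```python
from typing import List, Tuple


def canary_schedule(initial_pct: float, step_pct: float, step_minutes: int) -> List[Tuple[int, float]]:
    """Return (minutes_elapsed, traffic_pct) pairs until the canary serves 100% of traffic."""
    if initial_pct <= 0 or step_pct <= 0 or step_minutes <= 0:
        raise ValueError("initial_pct, step_pct and step_minutes must be positive")
    schedule = []
    minutes, pct = 0, min(initial_pct, 100.0)
    while True:
        schedule.append((minutes, pct))
        if pct >= 100.0:
            break
        # Increase traffic by one step, capped at full rollout.
        pct = min(pct + step_pct, 100.0)
        minutes += step_minutes
    return schedule


def should_rollback(error_rate_pct: float, latency_ms: float,
                    max_error_rate_pct: float, max_latency_ms: float) -> bool:
    """Assumed trigger: roll back when either the error rate or the latency limit is exceeded."""
    return error_rate_pct > max_error_rate_pct or latency_ms > max_latency_ms


if __name__ == "__main__":
    # Example: start at 5%, add 45% every 10 minutes.
    for minutes, pct in canary_schedule(5, 45, 10):
        print(f"t+{minutes:>3} min: {pct:.0f}% of traffic on the new version")
```

The schedule makes the total rollout time explicit, which helps when comparing it against the rollback time limit asked for in the same question.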
213
+ **7.5 Monitoring & Logging**
214
+
215
+ ````
216
+
217
+ Monitoring tools:
218
+
219
+ Application Performance Monitoring (APM):
220
+ A) ⭐ Datadog - Full-featured, expensive
221
+ B) 🔥 New Relic - Popular
222
+ C) Sentry - Error tracking focus
223
+ D) ⚡ OpenTelemetry + Grafana - Open source
224
+ E) AWS CloudWatch
225
+ F) None yet
226
+
227
+ Your choice: \_\_
228
+
229
+ Logging:
230
+ A) ⭐ Centralized logging
231
+
232
+ - Winston/Pino (Node.js) → CloudWatch/Datadog
233
+ - Python logging → ELK Stack
234
+
235
+ B) Basic console logs
236
+
237
+ C) Structured JSON logging ⭐
238
+
239
+ ```json
240
+ {
241
+ "level": "info",
242
+ "timestamp": "2024-01-15T10:30:00Z",
243
+ "userId": "123",
244
+ "action": "user.login",
245
+ "ip": "192.168.1.1",
246
+ "message": "User logged in successfully"
247
+ }
248
+ ```
249
+
250
+ Your logging strategy: \_\_
251
+
252
+ Metrics to track:
253
+
254
+ - ✅ Request rate (requests/sec)
255
+ - ✅ Error rate (% of failed requests)
256
+ - ✅ Response time (p50, p95, p99)
257
+ - ✅ Database query time
258
+ - ✅ Cache hit rate
259
+ - ✅ CPU/Memory usage
260
+ - Custom business metrics: \_\_
261
+
262
+ ````
263
+
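For a Python backend, the structured JSON log entry shown in 7.5 could be produced with the standard `logging` module. This is a minimal sketch under assumptions: `JsonFormatter` is a hypothetical class name, and the extra fields (`userId`, `action`, `ip`) simply mirror the example entry above.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""

    # Field names copied from the example log entry; adjust to the project's needs.
    EXTRA_FIELDS = ("userId", "action", "ip")

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            "level": record.levelname.lower(),
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
        # Values passed via `extra={...}` become attributes on the record.
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                entry[field] = getattr(record, field)
        entry["message"] = record.getMessage()
        return json.dumps(entry)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("app")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info(
        "User logged in successfully",
        extra={"userId": "123", "action": "user.login", "ip": "192.168.1.1"},
    )
```

Writing one JSON object per line keeps the output easy to ship to whichever centralized logging target is chosen in this question.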
264
+ **7.6 Alerts**
265
+
266
+ ```
267
+
268
+ When should you be alerted?
269
+
270
+ A) ✅ Error rate > \_\_% (e.g., 1%)
271
+ B) ✅ Response time > \_\_ms (e.g., 1000ms)
272
+ C) ✅ 5xx errors (server errors)
273
+ D) ✅ Service down (health check failure)
274
+ E) ✅ Database connection failures
275
+ F) ✅ Disk space > 80%
276
+ G) ✅ Memory usage > 85%
277
+
278
+ Alert channels:
279
+ A) ⭐ Email
280
+ B) 🔥 Slack/Discord
281
+ C) ⚡ PagerDuty/Opsgenie (on-call)
282
+ D) SMS (critical only)
283
+
284
+ Your preferences: \_\_
285
+
286
+ On-call rotation:
287
+ A) Yes - Using [PagerDuty/Opsgenie]
288
+ B) No - Monitor during business hours
289
+
290
+ ```
291
+
292
+ **7.7 Backup & Disaster Recovery**
293
+
294
+ ```
295
+
296
+ Backup strategy:
297
+
298
+ Database backups:
299
+ A) ⭐ Automated daily backups
300
+
301
+ - Retention: 30 days
302
+ - Point-in-time recovery
303
+
304
+ B) 🏆 Continuous backups
305
+
306
+ - Every hour
307
+ - 90 days retention
308
+
309
+ C) Manual backups weekly
310
+
311
+ Your strategy: \_\_
312
+ Retention period: \_\_ days
313
+
314
+ Disaster recovery:
315
+
316
+ - Recovery Time Objective (RTO): \_\_ (how fast to restore)
317
+ - Recovery Point Objective (RPO): \_\_ (acceptable data loss)
318
+
319
+ Example:
320
+
321
+ - RTO: 1 hour (service restored within 1 hour)
322
+ - RPO: 15 minutes (lose max 15 min of data)
323
+
324
+ ```
325
+
326
+ **7.7.1 Database Migrations in Production**
327
+
328
+ ```
329
+ How will you handle database migrations in production?
330
+
331
+ Zero-downtime migrations:
332
+ A) ⭐ Yes - Plan for zero-downtime migrations (Production-Ready/Enterprise)
333
+ B) No - Accept maintenance windows (MVP)
334
+
335
+ If zero-downtime:
336
+ - Strategy: [Expand/Contract, Blue-Green migrations, etc.]
337
+ - Rollback plan: __
338
+ - Testing: [Tested on staging, Dry-run process]
339
+
340
+ Migration windows (if not zero-downtime):
341
+ - Preferred time: __
342
+ - Duration: __ minutes
343
+ - Notification: __
344
+ ```
345
+
346
+ **7.7.2 Database Connection Pooling**
347
+
348
+ ```
349
+ Database connection pooling configuration:
350
+
351
+ Pool tool: [ORM built-in, pgBouncer, HikariCP, etc.]
352
+
353
+ Settings:
354
+ - Min connections: __
355
+ - Max connections: __
356
+ - Connection timeout: __ ms
357
+ - Idle timeout: __ ms
358
+ - Max lifetime: __ ms
359
+
360
+ Monitoring:
361
+ - Track active/idle connections: [Yes/No]
362
+ - Alert on pool exhaustion: [Yes/No]
363
+ ```
364
+
365
+ **7.8 Scaling Strategy**
366
+
367
+ ```
368
+
369
+ How will you handle growth?
370
+
371
+ A) ⭐ Horizontal scaling - Add more instances
372
+
373
+ - Load balancer distributes traffic
374
+ - Stateless application design
375
+
376
+ B) Vertical scaling - Bigger instances
377
+
378
+ - Increase CPU/RAM
379
+ - Simpler but limited
380
+
381
+ C) ⚡ Auto-scaling - Automatic based on load
382
+
383
+ - Scale up during high traffic
384
+ - Scale down to save costs
385
+ - Metrics: CPU > 70%, requests > threshold
386
+
387
+ Your strategy: \_\_
388
+
389
+ Expected load:
390
+
391
+ - Initial: \_\_ requests/minute
392
+ - Year 1: \_\_ requests/minute
393
+ - Peak traffic: \_\_x normal load
394
+
395
+ Database scaling:
396
+ A) Read replicas - Scale reads
397
+ B) Sharding - Split data across DBs
398
+ C) Vertical scaling - Bigger DB instance
399
+ D) Not needed yet
400
+
401
+ ```
402
+
403
+ **7.9 Health Checks**
404
+
405
+ ````
406
+
407
+ Health check endpoints:
408
+
409
+ A) ✅ /health - Basic liveness
410
+
411
+ - Returns 200 OK if app is running
412
+
413
+ B) ✅ /health/ready - Readiness check
414
+
415
+ - Returns 200 OK if app can handle traffic
416
+ - Checks: DB connected, Redis connected, etc.
417
+
418
+ C) ✅ /health/live - Liveness check
419
+
420
+ - Returns 200 OK if app is alive
421
+ - Orchestrator uses this to restart a stuck instance (load balancers should route on /health/ready)
422
+
423
+ Example response:
424
+
425
+ ```json
426
+ {
427
+ "status": "healthy",
428
+ "timestamp": "2024-01-15T10:30:00Z",
429
+ "checks": {
430
+ "database": "ok",
431
+ "redis": "ok",
432
+ "disk_space": "ok"
433
+ },
434
+ "version": "1.2.3"
435
+ }
436
+ ```
437
+
438
+ Your health check endpoints: \_\_
439
+
440
+ ````
441
+
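The readiness response in 7.9 can be assembled from a set of dependency probes. The sketch below is framework-agnostic and rests on assumptions: `readiness_report` is a hypothetical helper, each probe is a zero-argument callable returning `True` when the dependency is reachable, and returning HTTP 503 on failure is a common convention rather than something the questionnaire prescribes.

```python
from datetime import datetime, timezone
from typing import Callable, Dict, Tuple


def readiness_report(checks: Dict[str, Callable[[], bool]], version: str) -> Tuple[int, dict]:
    """Run every probe and return (http_status, response_body)."""
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = "ok" if probe() else "failed"
        except Exception:
            # A probe that raises counts as a failed dependency.
            results[name] = "failed"
    healthy = all(state == "ok" for state in results.values())
    body = {
        "status": "healthy" if healthy else "unhealthy",
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "checks": results,
        "version": version,
    }
    # Assumption: 503 tells the load balancer to stop routing traffic here.
    return (200 if healthy else 503), body


if __name__ == "__main__":
    status, body = readiness_report(
        {"database": lambda: True, "redis": lambda: True, "disk_space": lambda: True},
        version="1.2.3",
    )
    print(status, body)
```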
442
+ **7.9.1 Graceful Shutdown**
443
+
444
+ ```
445
+ Will you implement graceful shutdown?
446
+
447
+ A) ⭐ Yes - Handle shutdown gracefully (Production-Ready/Enterprise)
448
+ B) No - Standard shutdown
449
+
450
+ If yes:
451
+ Shutdown sequence:
452
+ 1. Stop accepting new requests (timeout: __s)
453
+ 2. Finish processing current requests (timeout: __s)
454
+ 3. Close database connections (timeout: __s)
455
+ 4. Close other connections (Redis, message queues, etc.)
456
+ 5. Exit process
457
+
458
+ Total shutdown timeout: __s
459
+
460
+ Implementation:
461
+ - Signal handling: [SIGTERM, SIGINT]
462
+ - Health check grace period: __s
463
+ - Connection drain timeout: __s
464
+ ```
465
+
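The shutdown sequence in 7.9.1 might be wired up as follows in a Python service. This is a simplified sketch: `GracefulShutdown` is a hypothetical class, the step names are illustrative, and the per-step timeouts asked for above are not enforced here.

```python
import signal
import threading
from typing import Callable, List, Tuple


class GracefulShutdown:
    """Collect ordered shutdown steps and run them once a stop signal arrives."""

    def __init__(self) -> None:
        self.stop_requested = threading.Event()
        self._steps: List[Tuple[str, Callable[[], None]]] = []

    def add_step(self, name: str, action: Callable[[], None]) -> None:
        self._steps.append((name, action))

    def install(self) -> None:
        # Must be called from the main thread; SIGTERM and SIGINT match the question above.
        for sig in (signal.SIGTERM, signal.SIGINT):
            signal.signal(sig, self._on_signal)

    def _on_signal(self, signum, frame) -> None:
        # Only set a flag here; the real work happens outside the signal handler.
        self.stop_requested.set()

    def run_steps(self) -> List[str]:
        """Run steps in registration order and return the names that completed."""
        completed = []
        for name, action in self._steps:
            try:
                action()
                completed.append(name)
            except Exception:
                # A failing step should not block the remaining cleanup.
                continue
        return completed


if __name__ == "__main__":
    shutdown = GracefulShutdown()
    shutdown.add_step("stop accepting requests", lambda: print("listener closed"))
    shutdown.add_step("drain in-flight requests", lambda: print("requests drained"))
    shutdown.add_step("close database connections", lambda: print("db pool closed"))
    shutdown.install()
    print("Running; press Ctrl+C to stop")
    shutdown.stop_requested.wait()
    print("Completed:", shutdown.run_steps())
```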
466
+ **7.9.2 Circuit Breakers & Resilience**
467
+
468
+ ```
469
+ Will you implement circuit breakers?
470
+
471
+ A) ⭐ Yes - Protect against cascading failures (Production-Ready/Enterprise)
472
+ B) No - Direct service calls
473
+
474
+ If yes:
475
+ Circuit breaker tool: [Resilience4j, Hystrix, Polly, etc.]
476
+
477
+ Configuration:
478
+ - Failure threshold: __% (open circuit after X% failures)
479
+ - Success threshold: __% (close circuit after X% successes)
480
+ - Timeout: __ms
481
+ - Half-open retries: __
482
+ - Reset timeout: __s
483
+
484
+ Fallback strategy:
485
+ A) ⭐ Return cached data
486
+ B) Return default/empty response
487
+ C) Call alternative service
488
+ D) Return error gracefully
489
+
490
+ Services to protect:
491
+ {{#EACH SERVICE_TO_PROTECT}}
492
+ - **{{SERVICE_NAME}}**: {{FAILURE_THRESHOLD}}% threshold, fallback: {{FALLBACK_STRATEGY}}
493
+ {{/EACH}}
494
+ ```
495
+
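To make the circuit breaker settings in 7.9.2 concrete, here is a deliberately small sketch. It is not one of the libraries named above: `CircuitBreaker` and `CircuitOpenError` are hypothetical names, and the breaker opens after a count of consecutive failures rather than the percentage threshold in the questionnaire, which is a simplification.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when the circuit is open and no fallback was supplied."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout_s:
            return "half-open"  # allow one trial call
        return "open"

    def call(self, func: Callable[[], Any], fallback: Optional[Callable[[], Any]] = None) -> Any:
        state = self.state
        if state == "open":
            if fallback is not None:
                return fallback()
            raise CircuitOpenError("circuit is open")
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures in a row, (re)opens the circuit.
            if state == "half-open" or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            if fallback is not None:
                return fallback()
            raise
        # Success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

The `fallback` argument corresponds to the fallback strategies listed above, for example returning cached data.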
496
+ **7.9.3 Retry & Timeout Policies**
497
+
498
+ ```
499
+ Define retry and timeout policies for external dependencies:
500
+
501
+ | Service/Dependency | Timeout | Retries | Backoff Strategy | Notes |
502
+ |--------------------|-----------|---------|----------------------|----------------------|
503
+ | Database queries | 5000ms | 2 | None (fail fast) | Connection pooled |
504
+ | Redis cache | 1000ms | 1 | None | Cache miss = OK |
505
+ | Payment API | 30000ms | 3 | Exponential (1s,2s,4s) | Must complete |
506
+ | Email service | 5000ms | 3 | Fixed (2s) | Queue if fails |
507
+ | External REST APIs | 10000ms | 2 | Exponential | Circuit breaker |
508
+ | File storage (S3) | 15000ms | 3 | Exponential | Large files |
509
+
510
+ Your policies:
511
+
512
+ | Service/Dependency | Timeout | Retries | Backoff Strategy | Notes |
513
+ |--------------------|-----------|---------|----------------------|----------------------|
514
+ | | | | | |
515
+ | | | | | |
516
+
517
+ Global defaults:
518
+ - Default HTTP timeout: __ ms (recommended: 10000)
519
+ - Default retries: __ (recommended: 2)
520
+ - Default backoff: [None/Fixed/Exponential]
521
+
522
+ Non-retryable errors:
523
+ - 400 Bad Request (client error, won't succeed on retry)
524
+ - 401/403 Unauthorized/Forbidden
525
+ - 404 Not Found
526
+ - [Your additions]
527
+ ```
528
+
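The retry policy table in 7.9.3 translates fairly directly into code. The sketch below assumes a hypothetical `HttpError` exception carrying a status code; `backoff_delays` and `call_with_retries` are illustrative helpers, not an existing library API. The non-retryable set mirrors the list above.

```python
import time
from typing import Any, Callable, List

# Client errors that will not succeed on retry (from the list above).
NON_RETRYABLE_STATUS = {400, 401, 403, 404}


class HttpError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status


def backoff_delays(retries: int, strategy: str = "exponential", base_s: float = 1.0) -> List[float]:
    """Return the wait (in seconds) before each retry."""
    if strategy == "none":
        return [0.0] * retries
    if strategy == "fixed":
        return [base_s] * retries
    if strategy == "exponential":
        return [base_s * 2 ** i for i in range(retries)]
    raise ValueError(f"unknown backoff strategy: {strategy}")


def call_with_retries(func: Callable[[], Any], retries: int = 2, strategy: str = "exponential",
                      base_s: float = 1.0, sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call func, retrying on HttpError unless the status is non-retryable."""
    delays = backoff_delays(retries, strategy, base_s)
    for attempt in range(retries + 1):
        try:
            return func()
        except HttpError as err:
            if err.status in NON_RETRYABLE_STATUS or attempt == retries:
                raise
            sleep(delays[attempt])
```

With `retries=3` and the exponential strategy, the delays are 1s, 2s and 4s, matching the Payment API row in the example table. Timeouts are left to the underlying client in this sketch.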
529
+ **7.9.4 Request/Response Logging & Masking**
530
+
531
+ ```
532
+ What request/response data will you log?
533
+
534
+ Log levels by environment:
535
+ | Environment | Level | Body Logging | Performance Logging |
536
+ |-------------|----------|--------------|---------------------|
537
+ | Development | debug | Full | Yes |
538
+ | Staging | info | Truncated | Yes |
539
+ | Production | info | Minimal | Yes |
540
+
541
+ Request logging:
542
+ - ✅ HTTP method and URL
543
+ - ✅ Request ID (correlation)
544
+ - ✅ User ID (if authenticated)
545
+ - ✅ IP address (optional, may hash for privacy)
546
+ - ✅ Request duration (ms)
547
+ - ❓ Request body (careful with size and PII)
548
+ - ❓ Query parameters
549
+
550
+ Response logging:
551
+ - ✅ Status code
552
+ - ✅ Response duration (ms)
553
+ - ❓ Response body (careful with size)
554
+
555
+ Sensitive data masking (CRITICAL):
556
+
557
+ | Field Pattern | Masking Strategy |
558
+ |------------------------|----------------------------|
559
+ | password, secret | Completely redact |
560
+ | token, api_key | Show last 4 chars only |
561
+ | email | j***@example.com |
562
+ | phone | ***-***-1234 |
563
+ | credit_card | ****-****-****-1234 |
564
+ | ssn, national_id | Completely redact |
565
+ | [Your patterns] | __ |
566
+
567
+ Log format:
568
+ A) ⭐ Structured JSON (recommended for aggregation)
569
+ B) Plain text with patterns
570
+ C) Framework default
571
+
572
+ Log aggregation:
573
+ A) ⭐ Centralized (ELK, Datadog, CloudWatch)
574
+ B) File-based with rotation
575
+ C) Console only (development)
576
+ ```
577
+
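The masking table in 7.9.4 can be applied to a log payload before it is written. This is a minimal sketch, assuming fields are matched by exact lowercase name (the table speaks of patterns, so a real implementation may need looser matching); `mask_value` and `mask_record` are hypothetical helpers and `[REDACTED]` is an arbitrary placeholder.

```python
import re
from typing import Any, Dict

REDACTED = "[REDACTED]"
FULLY_REDACTED_FIELDS = {"password", "secret", "ssn", "national_id"}
LAST_FOUR_FIELDS = {"token", "api_key"}


def mask_value(field: str, value: str) -> str:
    """Mask a single string value according to its field name."""
    key = field.lower()
    if key in LAST_FOUR_FIELDS:
        # Show the last 4 characters only; very short values are hidden entirely.
        return "****" + value[-4:] if len(value) > 4 else "****"
    if key == "email":
        local, _, domain = value.partition("@")
        return f"{local[:1]}***@{domain}"
    if key == "phone":
        return "***-***-" + re.sub(r"\D", "", value)[-4:]
    if key == "credit_card":
        return "****-****-****-" + re.sub(r"\D", "", value)[-4:]
    return value


def mask_record(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a masked copy of a (possibly nested) log payload."""
    masked: Dict[str, Any] = {}
    for field, value in record.items():
        if field.lower() in FULLY_REDACTED_FIELDS:
            masked[field] = REDACTED  # redact regardless of value type
        elif isinstance(value, dict):
            masked[field] = mask_record(value)
        elif isinstance(value, str):
            masked[field] = mask_value(field, value)
        else:
            masked[field] = value
    return masked


if __name__ == "__main__":
    print(mask_record({"email": "jane@example.com", "password": "hunter2", "phone": "555-867-1234"}))
```

Masking a copy rather than mutating the original keeps the request object usable by the rest of the handler.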
578
+ **7.10 Documentation & Runbooks**
579
+
580
+ ```
581
+
582
+ Operational documentation:
583
+
584
+ A) ✅ Deployment guide - How to deploy
585
+ B) ✅ Runbooks - How to handle incidents
586
+
587
+ - Database connection failure → steps to diagnose/fix
588
+ - High CPU usage → steps to investigate
589
+ - Service down → recovery procedure
590
+
591
+ C) ✅ Architecture diagrams (Mermaid format)
592
+
593
+ - System architecture diagram (mermaid)
594
+ - Data flow diagram (mermaid)
595
+ - Infrastructure diagram (mermaid)
596
+
597
+ D) ✅ API documentation
598
+
599
+ - Swagger/OpenAPI
600
+ - Auto-generated from code
601
+
602
+ Will you create these?
603
+ A) Yes - All of them ⭐
604
+ B) Yes - Critical ones only (deployment, runbooks)
605
+ C) Later - Start without docs
606
+
607
+ API documentation strategy:
608
+ A) ⭐ Code-First (Recommended)
609
+
610
+ - Generate docs from code (Swagger/OpenAPI decorators)
611
+ - Always in sync with code
612
+ - Tools: @nestjs/swagger, FastAPI docs
613
+
614
+ B) 📝 Design-First
615
+
616
+ - Write openapi.yaml manually first
617
+ - Generate code from spec
618
+ - Better for large teams/contracts
619
+
620
+ C) 📄 Manual
621
+
622
+ - Write Markdown/Notion docs
623
+ - Hard to keep in sync (Not recommended)
624
+
625
+ ```
626
+
627
+ ---
628
+
629
+ #### 🎨 MERMAID OPERATIONS DIAGRAM FORMATS - CRITICAL
630
+
631
+ **Use these exact formats** for the operational and infrastructure diagrams mentioned in question 7.10:
632
+
633
+ ##### 1️⃣ System Architecture Diagram (Deployment View)
634
+
635
+ Use `graph TD` to show deployed system components with scaling and redundancy:
636
+
637
+ ````markdown
638
+ ```mermaid
639
+ graph TD
640
+ subgraph "Production Environment"
641
+ subgraph "Load Balancer Layer"
642
+ LB1[Load Balancer 1]
643
+ LB2[Load Balancer 2]
644
+ end
645
+
646
+ subgraph "Application Layer"
647
+ App1[API Server 1<br/>4 vCPU, 8GB RAM]
648
+ App2[API Server 2<br/>4 vCPU, 8GB RAM]
649
+ App3[API Server 3<br/>4 vCPU, 8GB RAM]
650
+ end
651
+
652
+ subgraph "Data Layer"
653
+ Primary[(Primary DB<br/>PostgreSQL 15)]
654
+ Replica1[(Read Replica 1)]
655
+ Replica2[(Read Replica 2)]
656
+ Cache[Redis Cluster<br/>3 Nodes]
657
+ end
658
+
659
+ subgraph "Message Queue"
660
+ Queue[RabbitMQ Cluster<br/>3 Nodes]
661
+ end
662
+ end
663
+
664
+ Internet[Internet] -->|HTTPS| LB1
665
+ Internet -->|HTTPS| LB2
666
+ LB1 --> App1
667
+ LB1 --> App2
668
+ LB2 --> App2
669
+ LB2 --> App3
670
+
671
+ App1 -->|Write| Primary
672
+ App2 -->|Write| Primary
673
+ App3 -->|Write| Primary
674
+
675
+ App1 -->|Read| Replica1
676
+ App2 -->|Read| Replica2
677
+ App3 -->|Read| Replica1
678
+
679
+ App1 -->|Cache| Cache
680
+ App2 -->|Cache| Cache
681
+ App3 -->|Cache| Cache
682
+
683
+ App1 -->|Async Jobs| Queue
684
+ App2 -->|Async Jobs| Queue
685
+ App3 -->|Async Jobs| Queue
686
+
687
+ Primary -.->|Replication| Replica1
688
+ Primary -.->|Replication| Replica2
689
+
690
+ style Internet fill:#e1f5ff
691
+ style Primary fill:#e1ffe1
692
+ style Cache fill:#f0e1ff
693
+ style Queue fill:#ffe1f5
694
+ ```
695
+ ````
696
+
697
+ **Use for:** Showing deployed infrastructure, scaling configuration, redundancy, high availability
698
+
699
+ ##### 2️⃣ Data Flow Diagram (Request Flow)
700
+
701
+ Use `flowchart LR` to show how data moves through the system step-by-step:
702
+
703
+ ````markdown
704
+ ```mermaid
705
+ flowchart LR
706
+ User[User Request] -->|1. HTTPS POST| LB[Load Balancer]
707
+ LB -->|2. Route| API[API Server]
708
+ API -->|3. Validate JWT| Auth[Auth Service]
709
+ Auth -->|4. Token Valid| API
710
+
711
+ API -->|5. Check Cache| Cache[(Redis Cache)]
712
+ Cache -->|6. Cache Miss| API
713
+
714
+ API -->|7. Query| DB[(PostgreSQL)]
715
+ DB -->|8. Data| API
716
+
717
+ API -->|9. Store in Cache| Cache
718
+ API -->|10. Enqueue Job| Queue[Message Queue]
719
+
720
+ Queue -->|11. Process| Worker[Background Worker]
721
+ Worker -->|12. Send Email| Email[Email Service]
722
+
723
+ API -->|13. JSON Response| User
724
+
725
+ style User fill:#e1f5ff
726
+ style Cache fill:#f0e1ff
727
+ style DB fill:#e1ffe1
728
+ style Email fill:#fff4e1
729
+ ```
730
+ ````
731
+
732
+ **Use for:** Documenting request/response cycles, async processing flows, numbered execution steps
733
+
734
+ ##### 3️⃣ Infrastructure Diagram (Cloud Resources)
735
+
736
+ Use `graph TB` with subgraphs to show cloud infrastructure and network topology:
737
+
738
+ ````markdown
739
+ ```mermaid
740
+ graph TB
741
+ subgraph "AWS Cloud - Production (us-east-1)"
742
+ subgraph "VPC (10.0.0.0/16)"
743
+ subgraph "Public Subnet (10.0.1.0/24)"
744
+ ALB[Application Load Balancer]
745
+ NAT[NAT Gateway]
746
+ end
747
+
748
+ subgraph "Private Subnet 1 (10.0.10.0/24)"
749
+ ECS1[ECS Cluster<br/>Auto Scaling Group]
750
+ App1[Container: API<br/>Fargate Task]
751
+ App2[Container: API<br/>Fargate Task]
752
+ end
753
+
754
+ subgraph "Private Subnet 2 (10.0.20.0/24)"
755
+ RDS[(RDS PostgreSQL<br/>Multi-AZ)]
756
+ ElastiCache[ElastiCache Redis<br/>Cluster Mode]
757
+ end
758
+
759
+ subgraph "Private Subnet 3 (10.0.30.0/24)"
760
+ SQS[Amazon SQS<br/>Message Queue]
761
+ Lambda[Lambda Functions<br/>Background Workers]
762
+ end
763
+ end
764
+
765
+ subgraph "Supporting Services"
766
+ S3[S3 Bucket<br/>File Storage]
767
+ CloudWatch[CloudWatch<br/>Monitoring & Logs]
768
+ SecretsManager[Secrets Manager<br/>API Keys & Credentials]
769
+ end
770
+ end
771
+
772
+ Internet[Internet Users] -->|HTTPS| ALB
773
+ ALB --> App1
774
+ ALB --> App2
775
+
776
+ App1 --> RDS
777
+ App2 --> RDS
778
+ App1 --> ElastiCache
779
+ App2 --> ElastiCache
780
+
781
+ App1 -->|Upload/Download| S3
782
+ App2 -->|Upload/Download| S3
783
+
784
+ App1 -->|Send Message| SQS
785
+ SQS -->|Trigger| Lambda
786
+ Lambda --> RDS
787
+
788
+ App1 -->|Logs & Metrics| CloudWatch
789
+ App2 -->|Logs & Metrics| CloudWatch
790
+ Lambda -->|Logs| CloudWatch
791
+
792
+ App1 -->|Fetch Secrets| SecretsManager
793
+ App2 -->|Fetch Secrets| SecretsManager
794
+
795
+ style Internet fill:#e1f5ff
796
+ style RDS fill:#e1ffe1
797
+ style ElastiCache fill:#f0e1ff
798
+ style S3 fill:#fff4e1
799
+ style CloudWatch fill:#ffe1e1
800
+ ```
801
+ ````
802
+
803
+ **Use for:** Documenting cloud architecture, network topology, AWS/GCP/Azure resources, VPC design
804
+
805
+ ##### 4️⃣ Monitoring & Observability Diagram (Optional)
806
+
807
+ Use `graph TD` to show monitoring, logging, and alerting stack:
808
+
809
+ ````markdown
810
+ ```mermaid
811
+ graph TD
812
+ subgraph "Application Layer"
813
+ App[API Servers]
814
+ Worker[Background Workers]
815
+ end
816
+
817
+ subgraph "Monitoring Stack"
818
+ Prometheus[Prometheus<br/>Metrics Collection]
819
+ Grafana[Grafana<br/>Dashboards]
820
+ AlertManager[Alert Manager<br/>Notifications]
821
+ end
822
+
823
+ subgraph "Logging Stack"
824
+ FluentBit[Fluent Bit<br/>Log Collector]
825
+ Elasticsearch[Elasticsearch<br/>Log Storage]
826
+ Kibana[Kibana<br/>Log Viewer]
827
+ end
828
+
829
+ subgraph "Tracing"
830
+ Jaeger[Jaeger<br/>Distributed Tracing]
831
+ end
832
+
833
+ subgraph "Alerts"
834
+ PagerDuty[PagerDuty]
835
+ Slack[Slack Notifications]
836
+ end
837
+
838
+ App -->|Metrics| Prometheus
839
+ Worker -->|Metrics| Prometheus
840
+ Prometheus --> Grafana
841
+ Prometheus --> AlertManager
842
+
843
+ App -->|Logs| FluentBit
844
+ Worker -->|Logs| FluentBit
845
+ FluentBit --> Elasticsearch
846
+ Elasticsearch --> Kibana
847
+
848
+ App -->|Traces| Jaeger
849
+ Worker -->|Traces| Jaeger
850
+
851
+ AlertManager --> PagerDuty
852
+ AlertManager --> Slack
853
+
854
+ style Grafana fill:#e1f5ff
855
+ style Kibana fill:#f0e1ff
856
+ style PagerDuty fill:#ffe1e1
857
+ ```
858
+ ````
859
+
860
+ **Use for:** Documenting observability strategy, monitoring infrastructure, alerting workflows
861
+
862
+ **Best Practices for Operations Diagrams:**
863
+
864
+ 1. **Include Resource Specs:** Add CPU/RAM/disk info to nodes (e.g., `[API Server<br/>4 vCPU, 8GB RAM]`)
865
+ 2. **Show Redundancy:** Display load balancers, replicas, multi-AZ deployments, failover paths
866
+ 3. **Label Network Boundaries:** Use subgraphs for VPCs, subnets, availability zones, regions
867
+ 4. **Document Protocols:** Label connections with HTTPS, gRPC, TCP, WebSocket, etc.
868
+ 5. **Add IP Ranges:** Include CIDR blocks for network subnets (e.g., `10.0.1.0/24`)
869
+ 6. **Show Auto-Scaling:** Indicate which components scale horizontally/vertically
870
+ 7. **Include External Services:** SaaS tools, third-party APIs, CDNs, email providers
871
+ 8. **Color Code by Layer:** Infrastructure (blue), data (green), monitoring (purple), alerts (red)
872
+
873
+ **Common Formatting Rules:**
874
+
875
+ - Code fence: ` ```mermaid ` (lowercase, no spaces, three backticks)
876
+ - Use `subgraph "Name"` to group related components by layer/zone
877
+ - Use `[(Cylinder)]` for databases, data stores, and persistent storage
878
+ - Use `[Square Brackets]` for services, servers, and compute resources
879
+ - Use dotted arrows `-.->` for replication, backup, and async flows
880
+ - Apply consistent styling: `style NodeName fill:#colorcode`
881
+
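A minimal sketch of how the formatting rules above might combine in one diagram. All node names, resource specs, and CIDR ranges are hypothetical and chosen only for illustration; the colors follow the blue (infrastructure) and green (data) convention from the best-practices list.

```mermaid
graph TB
    %% Hypothetical nodes and CIDR ranges, for illustration only
    subgraph "VPC 10.0.0.0/16"
        subgraph "Public Subnet 10.0.1.0/24"
            LB[Load Balancer]
        end
        subgraph "Private Subnet 10.0.2.0/24"
            API[API Server<br/>4 vCPU, 8GB RAM]
            Primary[(Primary DB)]
            Replica[(Read Replica)]
        end
    end

    %% Solid arrows for request paths, labeled with the protocol
    LB -->|HTTPS| API
    API -->|TCP| Primary

    %% Dotted arrow for replication
    Primary -.->|Replication| Replica

    style LB fill:#e1f5ff
    style Primary fill:#e1ffe1
```

Nesting the subnets inside the VPC subgraph keeps the network boundaries visible without extra labels on each node.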
882
+ **Deployment Context Examples:**
883
+
884
+ - For Docker: Show containers, volumes, networks, registries
885
+ - For Kubernetes: Show pods, services, ingress, namespaces, persistent volumes
886
+ - For Serverless: Show Lambda functions, API Gateway, S3 triggers, event sources
887
+ - For VMs: Show instances, security groups, load balancers, auto-scaling groups
888
+
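As one possible reading of the Kubernetes item above, the sketch below shows an ingress, a service, two pods, and a persistent volume inside a single namespace. The namespace name, pod count, and labels are assumptions for illustration, not a prescribed layout.

```mermaid
graph TB
    %% Hypothetical names; adjust to the actual cluster layout
    Ingress[Ingress<br/>HTTPS]

    subgraph "Namespace: production"
        Service[Service<br/>ClusterIP]
        Pod1[API Pod 1]
        Pod2[API Pod 2]
        PV[(Persistent Volume)]
    end

    Ingress -->|HTTP| Service
    Service --> Pod1
    Service --> Pod2
    Pod1 --> PV
    Pod2 --> PV

    style Ingress fill:#e1f5ff
    style PV fill:#e1ffe1
```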
889
+ **Validation:** Test diagrams at https://mermaid.live/ before saving to ensure syntax is correct
890
+
891
+ ### Phase 7 Output
892
+
893
+ ```
894
+ 📋 PHASE 7 SUMMARY:
895
+
896
+ Deployment Environment: [cloud/PaaS/on-premises/container-orchestration + platform choice + rationale] (7.1)
897
+ Containerization: [yes/no + Docker setup (base image, size, compose stack)] (7.2)
898
+ Environments: [number of environments (dev/staging/prod) + config approach (env vars/secrets/feature flags)] (7.3)
899
+ CI/CD Pipeline: [platform (GitHub Actions/GitLab CI/etc.) + pipeline stages + auto-deploy strategy] (7.4)
900
+ Deployment Strategy: [standard/blue-green/canary/rolling + zero-downtime approach + rollback plan] (7.4.1)
901
+ Monitoring & Logging: [APM tool + logging strategy (centralized/structured JSON) + metrics to track] (7.5)
902
+ Alerts: [alert conditions (error rate/response time/5xx/etc.) + channels (email/Slack/PagerDuty) + on-call rotation] (7.6)
903
+ Backup & Disaster Recovery: [backup strategy + retention period + RTO/RPO targets] (7.7)
904
+ Database Migrations in Production: [zero-downtime strategy + rollback plan + migration windows] (7.7.1)
905
+ Database Connection Pooling: [pool tool + settings (min/max/timeouts) + monitoring] (7.7.2)
906
+ Scaling Strategy: [horizontal/vertical/auto-scaling + expected load + database scaling approach] (7.8)
907
+ Health Checks: [endpoints (/health, /health/ready, /health/live) + checks performed] (7.9)
908
+ Graceful Shutdown: [yes/no + shutdown sequence + timeouts] (7.9.1)
909
+ Circuit Breakers & Resilience: [yes/no + tool + configuration + fallback strategies] (7.9.2)
910
+ Documentation & Runbooks: [what will be created (deployment guide/runbooks/architecture diagrams in mermaid format/API docs) + API doc strategy (code-first/design-first)] (7.10)
911
+
912
+ Is this correct? (Yes/No)
913
+ ```
914
+
915
+ ---
916
+
917
+ ### 📄 Generate Phase 7 Documents
918
+
919
+ **Before starting generation:**
920
+
921
+ ```
922
+ 📖 Loading context from previous phases...
923
+ ✅ Re-reading docs/testing.md
924
+ ✅ Re-reading ai-instructions.md
925
+ ```
926
+
927
+ **Generate documents automatically:**
928
+
929
+ **1. `docs/operations.md`**
930
+
931
+ - Use template: `.ai-flow/templates/docs/operations.template.md`
932
+ - Fill with deployment, monitoring, alerting, backup, scaling
933
+ - Write to: `docs/operations.md`
934
+
935
+ **2. `specs/configuration.md`**
936
+
937
+ - Use template: `.ai-flow/templates/specs/configuration.template.md`
938
+ - Fill with environment variables, secrets management, feature flags
939
+ - Write to: `specs/configuration.md`
940
+
941
+ **3. `.env.example`**
942
+
943
+ - List all environment variables needed
944
+ - Include comments explaining each variable
945
+ - Write to: `.env.example`
946
+
947
+ ```
948
+ ✅ Generated: docs/operations.md
949
+ ✅ Generated: specs/configuration.md
950
+ ✅ Generated: .env.example
951
+
952
+ Documents have been created with all Phase 7 information.
953
+
954
+ 📝 Would you like to make any corrections before continuing?
955
+
956
+ → If yes: Edit the files and type "ready" when done. I'll re-read them.
957
+ → If no: Type "continue" to proceed to final checkpoint.
958
+ ```
959
+
960
+ **If user edits files:**
961
+ Re-read files to refresh context before continuing.
962
+
963
+ ---
964
+
965
+ ### Phase 7 Completion
966
+
967
+ ```
968
+ ✅ Phase 7 Complete!
969
+
970
+ Generated documents:
971
+ ✅ docs/operations.md
972
+ ✅ specs/configuration.md
973
+ ✅ .env.example
974
+
975
+ 📝 Would you like to review these documents before proceeding to Phase 8?
976
+
977
+ → If yes: Edit the files and type "ready" when done.
978
+ → If no: Type "continue" to proceed to Phase 8.
979
+ ```
980
+
981
+ ---
982
+
983
+ ## 📝 Generated Documents
984
+
985
+ After Phase 7, generate/update:
986
+
987
+ - `docs/operations.md` - Operations and deployment guide
988
+ - `specs/configuration.md` - Configuration specification
989
+ - `.env.example` - Environment variables template
990
+
991
+ ---
992
+
993
+ **Next Phase:** Phase 8 - Project Setup & Final Documentation
994
+
995
+ Read: `.ai-flow/prompts/backend/flow-build-phase-8.md`
996
+
997
+ ---
998
+
999
+ **Last Updated:** 2025-12-20
1000
+
1001
+ **Version:** 2.1.8