gaia-framework 1.65.1 → 1.83.2

Files changed (57)
  1. package/.claude/commands/gaia-create-stakeholder.md +20 -0
  2. package/.claude/commands/gaia-test-gap-analysis.md +17 -0
  3. package/CLAUDE.md +102 -1
  4. package/README.md +2 -2
  5. package/_gaia/_config/global.yaml +5 -1
  6. package/_gaia/_config/lifecycle-sequence.yaml +20 -0
  7. package/_gaia/_config/skill-manifest.csv +2 -0
  8. package/_gaia/_config/workflow-manifest.csv +3 -1
  9. package/_gaia/core/engine/workflow.xml +11 -1
  10. package/_gaia/core/protocols/review-gate-check.xml +29 -1
  11. package/_gaia/core/workflows/party-mode/steps/step-01-agent-loading.md +60 -9
  12. package/_gaia/creative/workflows/problem-solving/checklist.md +64 -14
  13. package/_gaia/creative/workflows/problem-solving/instructions.xml +367 -22
  14. package/_gaia/creative/workflows/problem-solving/workflow.yaml +31 -1
  15. package/_gaia/dev/agents/_base-dev.md +7 -1
  16. package/_gaia/dev/skills/_skill-index.yaml +9 -0
  17. package/_gaia/dev/skills/figma-integration.md +296 -0
  18. package/_gaia/lifecycle/knowledge/brownfield/config-contradiction-scan.md +137 -0
  19. package/_gaia/lifecycle/knowledge/brownfield/dead-code-scan.md +179 -0
  20. package/_gaia/lifecycle/knowledge/brownfield/test-execution-scan.md +209 -0
  21. package/_gaia/lifecycle/skills/document-rulesets.md +91 -6
  22. package/_gaia/lifecycle/templates/brownfield-scan-doc-code-prompt.md +219 -0
  23. package/_gaia/lifecycle/templates/brownfield-scan-hardcoded-prompt.md +169 -0
  24. package/_gaia/lifecycle/templates/brownfield-scan-integration-seam-prompt.md +127 -0
  25. package/_gaia/lifecycle/templates/brownfield-scan-runtime-behavior-prompt.md +141 -0
  26. package/_gaia/lifecycle/templates/brownfield-scan-security-prompt.md +440 -0
  27. package/_gaia/lifecycle/templates/gap-entry-schema.md +282 -0
  28. package/_gaia/lifecycle/templates/infra-prd-template.md +356 -0
  29. package/_gaia/lifecycle/templates/platform-prd-template.md +431 -0
  30. package/_gaia/lifecycle/templates/prd-template.md +70 -0
  31. package/_gaia/lifecycle/templates/story-template.md +22 -1
  32. package/_gaia/lifecycle/workflows/2-planning/create-ux-design/instructions.xml +52 -3
  33. package/_gaia/lifecycle/workflows/4-implementation/add-feature/checklist.md +1 -1
  34. package/_gaia/lifecycle/workflows/4-implementation/add-feature/instructions.xml +2 -3
  35. package/_gaia/lifecycle/workflows/4-implementation/add-stories/checklist.md +5 -0
  36. package/_gaia/lifecycle/workflows/4-implementation/add-stories/instructions.xml +73 -1
  37. package/_gaia/lifecycle/workflows/4-implementation/create-stakeholder/checklist.md +25 -0
  38. package/_gaia/lifecycle/workflows/4-implementation/create-stakeholder/instructions.xml +79 -0
  39. package/_gaia/lifecycle/workflows/4-implementation/create-stakeholder/workflow.yaml +22 -0
  40. package/_gaia/lifecycle/workflows/4-implementation/create-story/instructions.xml +11 -1
  41. package/_gaia/lifecycle/workflows/4-implementation/retrospective/instructions.xml +21 -1
  42. package/_gaia/lifecycle/workflows/4-implementation/retrospective/workflow.yaml +1 -1
  43. package/_gaia/lifecycle/workflows/4-implementation/validate-story/instructions.xml +11 -0
  44. package/_gaia/lifecycle/workflows/anytime/brownfield-onboarding/checklist.md +12 -0
  45. package/_gaia/lifecycle/workflows/anytime/brownfield-onboarding/instructions.xml +248 -4
  46. package/_gaia/lifecycle/workflows/anytime/brownfield-onboarding/workflow.yaml +1 -0
  47. package/_gaia/testing/workflows/test-gap-analysis/checklist.md +8 -0
  48. package/_gaia/testing/workflows/test-gap-analysis/instructions.xml +53 -0
  49. package/_gaia/testing/workflows/test-gap-analysis/workflow.yaml +38 -0
  50. package/bin/gaia-framework.js +44 -8
  51. package/bin/helpers/derive-bump-label.js +41 -0
  52. package/bin/helpers/validate-bump-labels.js +38 -0
  53. package/gaia-install.sh +96 -21
  54. package/package.json +1 -1
  55. package/_gaia/_memory/tier2-results/.gitkeep +0 -0
  56. package/_gaia/_memory/tier2-results/checkpoint-resume-2026-03-24.yaml +0 -6
  57. package/_gaia/_memory/tier2-results/engine-scenarios-2026-03-22.yaml +0 -14
package/_gaia/lifecycle/templates/gap-entry-schema.md
@@ -0,0 +1,282 @@
+ # Gap Entry Schema
+
+ > **Version:** 1.2.0
+ > **Story:** E11-S1, E12-S5, E11-S18
+ > **Traces to:** FR-111, FR-123, US-38, ADR-021, ADR-022
+ >
+ > Standardized output schema for brownfield scan subagents (E11).
+ > All scan agents MUST format gap entries using this schema.
+ > Infra-specific categories added for infrastructure/platform project support (E12-S5).
+ > Location: `_gaia/lifecycle/templates/gap-entry-schema.md`
+
+ ## Schema Definition
+
+ Each gap entry is a YAML object with the following fields:
+
+ ```yaml
+ id: "GAP-{scan_type}-{seq}"
+ category: "<enum>"
+ severity: "<enum>"
+ title: "<string>"
+ description: "<string>"
+ evidence:
+   file: "<relative-path>"
+   line: <number-or-range>
+   protocol: "<string>" # Optional
+ recommendation: "<string>"
+ verified_by: "<agent-id>"
+ confidence: "<enum>"
+ ```
+
+ ## Field Reference
+
+ | Field | Type | Required | Description |
+ |-------|------|----------|-------------|
+ | `id` | string | yes | Unique identifier. Format: `GAP-{scan_type}-{seq}` where `scan_type` maps to the category and `seq` is a zero-padded 3-digit sequence (e.g., `GAP-dead-code-001`) |
+ | `category` | enum | yes | Gap classification — must be one of the 12 allowed values (see Category Enum) |
+ | `severity` | enum | yes | Impact level — must be one of the 5 allowed values (see Severity Enum) |
+ | `title` | string | yes | Short summary of the gap (max 80 characters) |
+ | `description` | string | yes | Detailed explanation of the gap, what it means, and why it matters |
+ | `evidence` | object | yes | Source code evidence with required `file` and `line` sub-fields, plus optional `protocol` sub-field (see Evidence Object) |
+ | `recommendation` | string | yes | Actionable fix or remediation guidance |
+ | `verified_by` | string | yes | ID of the scan agent that produced this finding (e.g., `dead-code-analyzer`, `config-scanner`) |
+ | `confidence` | enum | yes | Agent's confidence in the finding accuracy (see Confidence Enum) |
+
+ ## Enums
+
+ ### Severity Enum
+
+ | Value | Description |
+ |-------|-------------|
+ | `critical` | Blocks deployment or causes data loss |
+ | `high` | Significant risk requiring prompt attention |
+ | `medium` | Moderate risk, should be addressed in current sprint |
+ | `low` | Minor issue, can be deferred |
+ | `info` | Informational finding, no immediate action needed |
+
+ ### Category Enum
+
+ 12 categories total — 7 application categories (E11-S1) plus 5 infrastructure categories (E12-S5):
+
+ #### Application Categories (7)
+
+ | Value | Scan Agent | Description |
+ |-------|------------|-------------|
+ | `config-contradiction` | E11-S2 | Configuration files contradict each other or runtime behavior |
+ | `dead-code` | E11-S3 | Unreachable code, unused exports, orphaned files |
+ | `hard-coded-logic` | E11-S4 | Magic numbers, embedded URLs, environment-specific constants |
+ | `security-endpoint` | E11-S5 | Unprotected routes, missing auth, exposed secrets |
+ | `runtime-behavior` | E11-S6 | Behavior that only manifests at runtime (race conditions, memory leaks) |
+ | `doc-code-drift` | E11-S7 | Documentation does not match actual code behavior |
+ | `integration-seam` | E11-S8 | Fragile integration points, tight coupling, missing contracts |
+
+ #### Infrastructure Categories (5) — ADR-022 §10.16.5
+
+ | Value | Infra PRD Section | Description |
+ |-------|-------------------|-------------|
+ | `resource-drift` | Resource Specifications | Declared infrastructure state differs from actual deployed state (e.g., Terraform state mismatch, orphaned cloud resources) |
+ | `config-sprawl` | Environment Strategy & DX | Configuration values duplicated across multiple files without a single source of truth (e.g., same port in Dockerfile, Helm values, and Terraform variables) |
+ | `secret-exposure` | Security Posture | Secrets, credentials, or sensitive values present in source files, environment configs, or IaC definitions without proper secrets management |
+ | `missing-policy` | Verification Strategy | Infrastructure lacks policy-as-code enforcement (e.g., no OPA/Rego, no Checkov rules, no tfsec scans for security/compliance) |
+ | `environment-skew` | Environment Strategy & DX | Environment definitions (dev/staging/prod) have inconsistent resource specifications, missing parity, or undocumented differences |
+
+ ### Confidence Enum
+
+ | Value | Description |
+ |-------|-------------|
+ | `high` | Strong evidence, verified through multiple signals |
+ | `medium` | Reasonable evidence, single signal source |
+ | `low` | Weak evidence, needs human verification |
+
+ ## Evidence Object
+
+ The `evidence` field is a composite object grouping source location data:
+
+ ```yaml
+ evidence:
+   file: "src/services/auth.ts" # Relative path from project root (non-empty string)
+   line: 42 # Single line number
+   protocol: "rest" # Optional. Protocol type
+ ```
+
+ Or with a line range:
+
+ ```yaml
+ evidence:
+   file: "config/database.yml"
+   line: "15-28" # Line range (start-end)
+ ```
+
+ Or without the optional protocol field (backward compatible):
+
+ ```yaml
+ evidence:
+   file: "src/utils/helper.ts"
+   line: 10
+ ```
+
+ | Sub-field | Type | Required | Constraints |
+ |-----------|------|----------|-------------|
+ | `file` | string | yes | Relative path from project root. Must be non-empty. |
+ | `line` | number or string | yes | Single line number (integer) or range as `"start-end"` string |
+ | `protocol` | string | no | Optional. One of `rest`, `graphql`, `grpc`, `websocket`, or any custom string. Omit if not applicable. When present, must be a non-empty string. |
+
+ ## ID Format
+
+ Pattern: `GAP-{scan_type}-{seq}`
+
+ - `scan_type` is the category value (e.g., `dead-code`, `config-contradiction`)
+ - `seq` is a zero-padded 3-digit sequence number starting at 001
+ - Regex: `^GAP-(config-contradiction|dead-code|hard-coded-logic|security-endpoint|runtime-behavior|doc-code-drift|integration-seam|resource-drift|config-sprawl|secret-exposure|missing-policy|environment-skew)-\d{3}$`
+
+ The `scan_type` component in the ID maps directly to the `category` value. See the Category Enum tables (Application + Infrastructure) for the full list of valid scan types.
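The ID rule above can be sketched in Python as follows, assuming IDs are assigned per category from a running counter. `GAP_ID_RE`, `make_gap_id`, and `parse_gap_id` are hypothetical names, not part of the package; `fullmatch` is used so that, unlike a `$` anchor, a trailing newline is not accepted.

```python
import re

# The 12 category values from the Application and Infrastructure tables.
CATEGORIES = (
    "config-contradiction", "dead-code", "hard-coded-logic", "security-endpoint",
    "runtime-behavior", "doc-code-drift", "integration-seam",
    "resource-drift", "config-sprawl", "secret-exposure", "missing-policy",
    "environment-skew",
)

# Same shape as the documented regex; group 1 is scan_type, group 2 is seq.
GAP_ID_RE = re.compile(
    r"GAP-(" + "|".join(map(re.escape, CATEGORIES)) + r")-(\d{3})", re.ASCII
)


def make_gap_id(category: str, seq: int) -> str:
    """Build an ID such as GAP-dead-code-001 (seq is 1-based, zero-padded to 3 digits)."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if not 1 <= seq <= 999:
        raise ValueError("seq must fit in 3 digits (1-999)")
    return f"GAP-{category}-{seq:03d}"


def parse_gap_id(gap_id: str) -> tuple[str, int]:
    """Split a valid ID into (category, seq); raise ValueError otherwise."""
    match = GAP_ID_RE.fullmatch(gap_id)
    if match is None:
        raise ValueError(f"invalid gap id: {gap_id!r}")
    return match.group(1), int(match.group(2))
```

Because the pattern fixes `seq` at three digits, the format can express at most 999 IDs for a given category.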
+
+ ## Validation Rules
+
+ All fields listed in the Field Reference are **required** — a gap entry with any missing field is invalid.
+
+ ### Enum Validation
+
+ - `severity` must be exactly one of: `critical`, `high`, `medium`, `low`, `info`
+ - `category` must be exactly one of: `config-contradiction`, `dead-code`, `hard-coded-logic`, `security-endpoint`, `runtime-behavior`, `doc-code-drift`, `integration-seam`, `resource-drift`, `config-sprawl`, `secret-exposure`, `missing-policy`, `environment-skew`
+ - `confidence` must be exactly one of: `high`, `medium`, `low`
+ - Any value not in the enum set must be rejected
+
+ ### Format Validation
+
+ - `id` must match the regex `^GAP-(config-contradiction|dead-code|hard-coded-logic|security-endpoint|runtime-behavior|doc-code-drift|integration-seam|resource-drift|config-sprawl|secret-exposure|missing-policy|environment-skew)-\d{3}$`
+ - `evidence.file` must be a non-empty string containing a relative path (no leading `/`)
+ - `evidence.line` must be a positive integer or a range string matching `^\d+-\d+$`
+ - `title` should not exceed 80 characters
+ - `verified_by` must be a non-empty string identifying the scan agent
+ - `evidence.protocol` when present, must be a non-empty string
+
+ ### Required vs Optional
+
+ All 9 top-level fields (`id`, `category`, `severity`, `title`, `description`, `evidence`, `recommendation`, `verified_by`, `confidence`) are **required**. There are no optional top-level fields in the base schema.
+
+ The `evidence` object contains one optional sub-field: `protocol`. This is the first optional sub-field in the schema. Existing gap entries that omit `protocol` remain fully valid.
+
+ ### Optional Field Validation
+
+ The `protocol` sub-field of the `evidence` object is not enum-validated. It accepts any non-empty string when present. Recommended canonical values are `rest`, `graphql`, `grpc`, and `websocket`, but custom strings (e.g., `mqtt`, `soap`, `amqp`) are also accepted without schema changes. When `protocol` is omitted entirely, the gap entry remains valid (backward compatible). When present, an empty string is invalid.
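The validation rules above can be collected into a single check. This is a minimal Python sketch, assuming entries have already been parsed from YAML into plain dicts (parsing is not shown). `validate_gap_entry` and the constant names are hypothetical, and the 80-character title limit, which the schema phrases as "should", is reported here as a violation.

```python
import re
from typing import Any

# Enum values from the Severity, Category, and Confidence tables.
SEVERITIES = {"critical", "high", "medium", "low", "info"}
CONFIDENCES = {"high", "medium", "low"}
CATEGORIES = {
    "config-contradiction", "dead-code", "hard-coded-logic", "security-endpoint",
    "runtime-behavior", "doc-code-drift", "integration-seam",  # application (7)
    "resource-drift", "config-sprawl", "secret-exposure", "missing-policy",
    "environment-skew",  # infrastructure (5)
}
REQUIRED_FIELDS = (
    "id", "category", "severity", "title", "description",
    "evidence", "recommendation", "verified_by", "confidence",
)
# Equivalent to the documented ID regex; the category is captured for a cross-check.
ID_RE = re.compile(r"GAP-(" + "|".join(sorted(CATEGORIES)) + r")-\d{3}", re.ASCII)
LINE_RANGE_RE = re.compile(r"\d+-\d+", re.ASCII)


def _non_empty_str(value: Any) -> bool:
    return isinstance(value, str) and value != ""


def validate_gap_entry(entry: dict[str, Any]) -> list[str]:
    """Return rule violations; an empty list means the entry is valid."""
    missing = [name for name in REQUIRED_FIELDS if name not in entry]
    if missing:
        return ["missing required field(s): " + ", ".join(missing)]

    errors: list[str] = []
    # Enum validation: any value outside the allowed set is rejected.
    for name, allowed in (("severity", SEVERITIES), ("category", CATEGORIES),
                          ("confidence", CONFIDENCES)):
        value = entry[name]
        if not (isinstance(value, str) and value in allowed):
            errors.append(f"{name}: {value!r} is not an allowed value")

    # ID format, plus the rule that the ID's scan_type equals the category.
    match = ID_RE.fullmatch(entry["id"]) if isinstance(entry["id"], str) else None
    if match is None:
        errors.append(f"id: {entry['id']!r} does not match GAP-<category>-NNN")
    elif match.group(1) != entry["category"]:
        errors.append("id: scan_type does not match category")

    if isinstance(entry["title"], str) and len(entry["title"]) > 80:
        errors.append("title exceeds 80 characters")
    if not _non_empty_str(entry["verified_by"]):
        errors.append("verified_by must be a non-empty string")

    evidence = entry["evidence"]
    if not isinstance(evidence, dict):
        return errors + ["evidence must be an object with file and line"]
    file = evidence.get("file")
    if not _non_empty_str(file) or file.startswith("/"):
        errors.append("evidence.file must be a non-empty relative path")
    line = evidence.get("line")
    positive_int = isinstance(line, int) and not isinstance(line, bool) and line > 0
    line_range = isinstance(line, str) and LINE_RANGE_RE.fullmatch(line) is not None
    if not (positive_int or line_range):
        errors.append("evidence.line must be a positive integer or a 'start-end' string")
    # protocol is optional and not enum-checked, but must be non-empty when present.
    if "protocol" in evidence and not _non_empty_str(evidence["protocol"]):
        errors.append("evidence.protocol, when present, must be a non-empty string")
    return errors
```

Returning a list rather than raising on the first problem lets a scan report every violation in an entry at once.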
+
+ ## Budget Control
+
+ Each gap entry should average approximately **100 tokens** in structured YAML format (per NFR-024).
+
+ Guidelines:
+ - Use structured YAML, not prose paragraphs
+ - Keep `title` under 80 characters
+ - Keep `description` to 1-2 sentences
+ - Keep `recommendation` to 1-2 sentences
+ - Avoid embedding full code snippets in descriptions — reference via `evidence` instead
+
+ With 12 categories across application and infrastructure scans, total token usage varies by project type. After consolidation and deduplication (E11-S10), the single `consolidated-gaps.md` must stay within the 40K framework context budget.
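At the stated average, the budget implies an upper bound of 40,000 / 100 = 400 entries in `consolidated-gaps.md`, less whatever the file's headings and summaries consume. A minimal Python sketch of that arithmetic, assuming the 40K figure is measured in tokens; `max_entries` and its `overhead_tokens` parameter are hypothetical.

```python
def max_entries(budget_tokens: int = 40_000, avg_tokens_per_entry: int = 100,
                overhead_tokens: int = 0) -> int:
    """Largest entry count whose estimated size fits in the budget after overhead."""
    if avg_tokens_per_entry <= 0:
        raise ValueError("avg_tokens_per_entry must be positive")
    # Tokens left for entries once non-entry content is reserved (never negative).
    available = max(budget_tokens - overhead_tokens, 0)
    return available // avg_tokens_per_entry
```

With 2,000 tokens of overhead, for example, the bound drops to 380 entries.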
+
+ ## Examples
+
+ ### Application Category Example
+
+ ```yaml
+ id: "GAP-config-contradiction-001"
+ category: "config-contradiction"
+ severity: "high"
+ title: "Database timeout mismatch between config files"
+ description: "production.yaml sets db.timeout to 30s while docker-compose.yml sets POSTGRES_TIMEOUT to 10s."
+ evidence:
+   file: "config/production.yaml"
+   line: 18
+ recommendation: "Align timeout values. Set both to 30s or extract to a shared environment variable."
+ verified_by: "config-scanner"
+ confidence: "high"
+ ```
+
+ ### Application Category Example with Protocol
+
+ ```yaml
+ id: "GAP-security-endpoint-001"
+ category: "security-endpoint"
+ severity: "high"
+ title: "Unprotected admin route exposes user management API"
+ description: "The /api/admin/users endpoint has no authentication middleware applied."
+ evidence:
+   file: "src/routes/admin.ts"
+   line: 15
+   protocol: "rest"
+ recommendation: "Add authentication middleware to all /api/admin/* routes."
+ verified_by: "security-scanner"
+ confidence: "high"
+ ```
+
+ ### Infrastructure Category Examples
+
+ ```yaml
+ id: "GAP-resource-drift-001"
+ category: "resource-drift"
+ severity: "high"
+ title: "Terraform state shows orphaned S3 bucket"
+ description: "S3 bucket 'app-logs-legacy' exists in AWS but is not declared in any Terraform configuration."
+ evidence:
+   file: "infra/terraform/storage.tf"
+   line: "1-45"
+ recommendation: "Import the bucket into Terraform state or delete it if no longer needed."
+ verified_by: "infra-drift-scanner"
+ confidence: "high"
+ ```
+
+ ```yaml
+ id: "GAP-config-sprawl-001"
+ category: "config-sprawl"
+ severity: "medium"
+ title: "Database port duplicated across 4 config files"
+ description: "Port 5432 is hardcoded in Dockerfile, docker-compose.yml, Helm values.yaml, and Terraform variables.tf."
+ evidence:
+   file: "docker-compose.yml"
+   line: 14
+ recommendation: "Extract database port to a single environment variable, reference it from all 4 files."
+ verified_by: "config-sprawl-scanner"
+ confidence: "high"
+ ```
+
+ ```yaml
+ id: "GAP-secret-exposure-001"
+ category: "secret-exposure"
+ severity: "critical"
+ title: "AWS access key embedded in Terraform variables"
+ description: "AWS_ACCESS_KEY_ID is set as a default value in variables.tf instead of using a secrets manager."
+ evidence:
+   file: "infra/terraform/variables.tf"
+   line: 23
+ recommendation: "Remove the default value, use AWS SSM Parameter Store or HashiCorp Vault."
+ verified_by: "secret-scanner"
+ confidence: "high"
+ ```
+
+ ```yaml
+ id: "GAP-missing-policy-001"
+ category: "missing-policy"
+ severity: "medium"
+ title: "No policy-as-code enforcement for Kubernetes manifests"
+ description: "Kubernetes deployments lack OPA/Gatekeeper or Kyverno policies for security constraints."
+ evidence:
+   file: "k8s/deployments/api-server.yaml"
+   line: "1-30"
+ recommendation: "Add OPA Gatekeeper constraints or Kyverno policies to enforce pod security standards."
+ verified_by: "policy-scanner"
+ confidence: "medium"
+ ```
+
+ ```yaml
+ id: "GAP-environment-skew-001"
+ category: "environment-skew"
+ severity: "high"
+ title: "Staging uses 2 replicas while production uses 5"
+ description: "Replica counts differ between staging and production with no documented justification."
+ evidence:
+   file: "k8s/overlays/staging/deployment-patch.yaml"
+   line: 8
+ recommendation: "Document the replica difference rationale or align staging proportionally."
+ verified_by: "env-skew-scanner"
+ confidence: "high"
+ ```
package/_gaia/lifecycle/templates/infra-prd-template.md
@@ -0,0 +1,356 @@
+ ---
+ template: 'infra-prd'
+ version: 1.0.0
+ used_by: ['create-prd']
+ domain: '{domain}'
+ ---
+
+ # Infrastructure PRD: {product_name}
+
+ > **Project:** {project_name}
+ > **Domain:** {domain}
+ > **Date:** {date}
+ > **Author:** {agent_name}
+ > **Status:** Draft | In Review | Approved
+
+ ## 1. Overview & Scope
+
+ {Platform purpose, target environments, and team ownership.}
+
+ ### Platform Purpose
+
+ {What this infrastructure provides and why it exists.}
+
+ ### Target Environments
+
+ | Environment | Purpose | Region(s) | Owner |
+ |-------------|---------|-----------|-------|
+ | {env_name} | {purpose} | {regions} | {team} |
+
+ ### Team Ownership
+
+ | Component | Owning Team | Escalation |
+ |-----------|-------------|------------|
+ | {component} | {team} | {contact} |
+
+ ## 2. Goals and Non-Goals
+
+ ### Goals
+ - {Goal 1}
+ - {Goal 2}
+
+ ### Non-Goals
+ - {Explicitly out of scope item 1}
+
+ ## 3. Platform Capabilities
+
+ {What the infrastructure enables. Each capability follows the format below.}
+
+ | ID | Capability | SLO |
+ |----|-----------|-----|
+ | PC-01 | Enable {team/service} to {capability} with {SLO} | {target} |
+ | PC-02 | Enable {team/service} to {capability} with {SLO} | {target} |
+
+ ## 4. Resource Specifications
+
+ {Compute, storage, networking, IAM provisioning. Per-environment breakdown.}
+
+ ### Compute
+
+ | Resource | Environment | Spec | Scaling |
+ |----------|-------------|------|---------|
+ | {resource} | {env} | {cpu/memory} | {auto/manual, min-max} |
+
+ ### Storage
+
+ | Store | Type | Size | IOPS | Backup |
+ |-------|------|------|------|--------|
+ | {store} | {block/object/file} | {size} | {iops} | {policy} |
+
+ ### Networking
+
+ | Component | CIDR/Range | Protocol | Purpose |
+ |-----------|-----------|----------|---------|
+ | {component} | {cidr} | {protocol} | {purpose} |
+
+ ### IAM Provisioning
+
+ | Role/Policy | Scope | Permissions | Lifecycle |
+ |-------------|-------|-------------|-----------|
+ | {role} | {scope} | {permissions} | {create/rotate/revoke} |
+
+ ### State Management
+
+ {State backend strategy — e.g., Terraform remote state, locking, encryption.}
+
+ | Backend | Lock Provider | Encryption | Workspace Strategy |
+ |---------|--------------|------------|-------------------|
+ | {backend} | {lock} | {encryption} | {workspace} |
+
+ ### Data Persistence Requirements
+
+ | Data Store | Durability | Replication | Retention |
+ |------------|-----------|-------------|-----------|
+ | {store} | {durability} | {replication} | {retention} |
+
+ ## 5. Operational SLOs
+
+ {Availability targets, MTTR, RTO/RPO, error budgets, resource utilization targets.}
+
+ ### Availability & Recovery
+
+ | Metric | Target | Measurement |
+ |--------|--------|-------------|
+ | Availability | {99.x%} | {how measured} |
+ | MTTR | {minutes} | {how measured} |
+ | RTO | {minutes} | {recovery time objective} |
+ | RPO | {minutes} | {recovery point objective} |
+ | Error Budget | {x% per month} | {how calculated} |
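The Error Budget row follows from the availability target: the budget is the share of the measurement window during which the service may be unavailable, that is (100 - target)% of the window. A minimal Python sketch, assuming a 30-day month as the default window; `error_budget` is a hypothetical helper, not part of the template.

```python
from datetime import timedelta


def error_budget(availability_pct: float,
                 window: timedelta = timedelta(days=30)) -> timedelta:
    """Allowed downtime within `window` for an availability target such as 99.9 (%)."""
    if not 0 < availability_pct <= 100:
        raise ValueError("availability_pct must be in (0, 100]")
    # Budget = (1 - availability) * window; timedelta * float rounds to microseconds.
    return window * (1 - availability_pct / 100)
```

For a 99.9% target over 30 days this gives 43.2 minutes (2,592 seconds); a 99.99% target leaves about 4.3 minutes.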
+
+ ### Resource Utilization Targets
+
+ | Resource | Target Utilization | Alert Threshold |
+ |----------|-------------------|-----------------|
+ | CPU | {target%} | {alert%} |
+ | Memory | {target%} | {alert%} |
+ | Storage IOPS | {target} | {threshold} |
+ | Network Bandwidth | {target Gbps} | {threshold} |
+ | Network Latency | {target ms} | {threshold} |
+
+ ## 6. Security Posture
+
+ {Security requirements tailored for infrastructure projects.}
+
+ ### IAM/RBAC
+
+ {Identity and access management, role-based access control policies.}
+
+ | Principal | Role | Scope | MFA Required | Review Cadence |
+ |-----------|------|-------|-------------|----------------|
+ | {principal} | {role} | {scope} | {yes/no} | {quarterly/annually} |
+
+ ### Network Segmentation
+
+ {Network isolation, security groups, firewall rules, zero-trust boundaries.}
+
+ | Zone | CIDR | Ingress Rules | Egress Rules | Purpose |
+ |------|------|---------------|-------------|---------|
+ | {zone} | {cidr} | {rules} | {rules} | {purpose} |
+
+ ### Secrets Management
+
+ {Secrets storage, rotation, injection, and audit strategy.}
+
+ | Secret Type | Store | Rotation | Injection Method |
+ |-------------|-------|----------|-----------------|
+ | {type} | {vault/kms/ssm} | {cadence} | {env var/sidecar/init container} |
+
+ ### Image Provenance
+
+ {Container image signing, scanning, and supply chain verification.}
+
+ | Registry | Signing | Scanning | Admission Policy |
+ |----------|---------|----------|-----------------|
+ | {registry} | {cosign/notary} | {trivy/grype} | {policy} |
+
+ ### Compliance Mapping
+
+ {Regulatory and compliance framework alignment.}
+
+ | Framework | Controls | Evidence | Audit Frequency |
+ |-----------|----------|----------|----------------|
+ | {SOC2/HIPAA/PCI/ISO} | {control IDs} | {how demonstrated} | {cadence} |
+
+ ## 7. Environment Strategy & Developer Experience
+
+ {Environment parity, promotion pipeline, drift detection, self-service provisioning.}
+
+ ### Environment Parity
+
+ | Dimension | Dev | Staging | Production |
+ |-----------|-----|---------|-----------|
+ | {dimension} | {dev config} | {staging config} | {prod config} |
+
+ ### Promotion Pipeline
+
+ {How changes flow from dev to production.}
+
+ ```
+ {dev} → {staging} → {production}
+ ```
+
+ ### Drift Detection
+
+ {How configuration drift is detected and remediated.}
+
+ | Tool | Schedule | Remediation | Notification |
+ |------|----------|-------------|-------------|
+ | {tool} | {cron} | {auto/manual} | {channel} |
+
+ ### Self-Service Provisioning
+
+ {Developer self-service capabilities and guardrails.}
+
+ | Capability | Interface | Guardrails | Approval |
+ |------------|-----------|-----------|----------|
+ | {capability} | {CLI/portal/API} | {policy} | {auto/manual} |
+
+ ### Onboarding
+
+ {New team member and new service onboarding procedures.}
+
+ ### Observability
+
+ {Monitoring, logging, tracing, and alerting strategy.}
+
+ | Signal | Tool | Retention | Alerting |
+ |--------|------|-----------|---------|
+ | Metrics | {prometheus/cloudwatch} | {retention} | {pagerduty/slack} |
+ | Logs | {elk/cloudwatch} | {retention} | {rules} |
+ | Traces | {jaeger/xray} | {retention} | {rules} |
+
+ ## 8. Dependencies & Provider Constraints
+
+ {Cloud provider limits, Terraform provider versions, upstream service contracts.}
+
+ ### Cloud Provider Limits
+
+ | Provider | Service | Limit | Current Usage | Headroom |
+ |----------|---------|-------|--------------|----------|
+ | {provider} | {service} | {limit} | {current} | {remaining} |
+
+ ### Terraform Provider Versions
+
+ | Provider | Version | Constraint | Notes |
+ |----------|---------|-----------|-------|
+ | {provider} | {version} | {~> x.y} | {notes} |
+
+ ### Upstream Service Contracts
+
+ | Service | SLA | API Version | Deprecation |
+ |---------|-----|------------|-------------|
+ | {service} | {sla} | {version} | {date or N/A} |
+
+ ## 9. Cost Model
+
+ {Per-environment resource cost estimates, scaling cost projections, and cost-per-unit efficiency metrics.}
+
+ ### Per-Environment Resource Cost Estimates
+
+ | Resource | Dev (monthly) | Staging (monthly) | Production (monthly) |
+ |----------|--------------|-------------------|---------------------|
+ | Compute | ${cost} | ${cost} | ${cost} |
+ | Storage | ${cost} | ${cost} | ${cost} |
+ | Networking | ${cost} | ${cost} | ${cost} |
+ | Monitoring | ${cost} | ${cost} | ${cost} |
+ | **Total** | **${total}** | **${total}** | **${total}** |
+
+ ### Scaling Cost Projections
+
+ | Scenario | Trigger | Additional Cost | Timeline |
+ |----------|---------|----------------|----------|
+ | {scenario} | {trigger condition} | ${projection} | {timeframe} |
+
+ ### Cost-Per-Unit Efficiency Metrics
+
+ | Metric | Current | Target | Optimization |
+ |--------|---------|--------|-------------|
+ | Cost per request | ${cost} | ${target} | {strategy} |
+ | Cost per GB stored | ${cost} | ${target} | {strategy} |
+ | Cost per environment | ${cost} | ${target} | {strategy} |
+
+ ## 10. Verification Strategy
+
+ {Policy-as-code (OPA/Rego, Checkov, tfsec), plan validation, smoke tests, drift detection, chaos testing.}
+
+ ### Policy-as-Code
+
+ | Tool | Scope | Rules | Enforcement |
+ |------|-------|-------|-------------|
+ | OPA/Rego | {scope} | {rule count} | {warn/deny} |
+ | Checkov | {scope} | {rule count} | {warn/deny} |
+ | tfsec | {scope} | {rule count} | {warn/deny} |
+
+ ### Plan Validation
+
+ {Terraform plan review, cost estimation, blast radius analysis.}
+
+ | Check | Tool | Gate | Threshold |
+ |-------|------|------|-----------|
+ | {check} | {tool} | {CI/manual} | {threshold} |
+
+ ### Smoke Tests
+
+ {Post-deployment verification tests.}
+
+ | Test | Target | Expected | Timeout |
+ |------|--------|----------|---------|
+ | {test} | {endpoint/resource} | {result} | {timeout} |
+
+ ### Drift Detection
+
+ {Scheduled plan diffs, state file monitoring, compliance scanning.}
+
+ ### Chaos Testing
+
+ {Failure injection, resilience validation.}
+
+ | Experiment | Target | Hypothesis | Blast Radius |
+ |-----------|--------|-----------|-------------|
+ | {experiment} | {target} | {hypothesis} | {scope} |
+
+ ## 11. Operational Runbooks
+
+ {Scaling, failover, incident response, rollback procedures.}
+
+ ### Scaling Procedures
+
+ | Trigger | Action | Rollback | Owner |
+ |---------|--------|----------|-------|
+ | {trigger} | {action} | {rollback} | {team} |
+
+ ### Failover Procedures
+
+ | Scenario | Detection | Response | RTO |
+ |----------|-----------|----------|-----|
+ | {scenario} | {detection} | {response steps} | {rto} |
+
+ ### Incident Response
+
+ | Severity | Notification | Escalation | Runbook |
+ |----------|-------------|------------|---------|
+ | P1 | {channel} | {escalation path} | {link} |
+ | P2 | {channel} | {escalation path} | {link} |
+
+ ### Rollback Procedures
+
+ | Change Type | Rollback Method | Verification | Duration |
+ |-------------|----------------|-------------|----------|
+ | {type} | {method} | {verification} | {estimate} |
+
+ ## 12. Requirements Summary
+
+ ### Infrastructure Requirements
+
+ | ID | Description | Priority | Status |
+ |----|------------|----------|--------|
+ | IR-001 | {description} | {Must-Have/Should-Have/Nice-to-Have} | {Draft/Approved} |
+ | IR-002 | {description} | {Must-Have/Should-Have/Nice-to-Have} | {Draft/Approved} |
+
+ ### Operational Requirements
+
+ | ID | Description | Priority | Status |
+ |----|------------|----------|--------|
+ | OR-001 | {description} | {Must-Have/Should-Have/Nice-to-Have} | {Draft/Approved} |
+ | OR-002 | {description} | {Must-Have/Should-Have/Nice-to-Have} | {Draft/Approved} |
+
+ ### Security Requirements
+
+ | ID | Description | Priority | Status |
+ |----|------------|----------|--------|
+ | SR-001 | {description} | {Must-Have/Should-Have/Nice-to-Have} | {Draft/Approved} |
+ | SR-002 | {description} | {Must-Have/Should-Have/Nice-to-Have} | {Draft/Approved} |
+
+ ## 13. Open Questions
+
+ - [ ] {Unresolved question}