gaia-framework 1.65.1 → 1.83.2
- package/.claude/commands/gaia-create-stakeholder.md +20 -0
- package/.claude/commands/gaia-test-gap-analysis.md +17 -0
- package/CLAUDE.md +102 -1
- package/README.md +2 -2
- package/_gaia/_config/global.yaml +5 -1
- package/_gaia/_config/lifecycle-sequence.yaml +20 -0
- package/_gaia/_config/skill-manifest.csv +2 -0
- package/_gaia/_config/workflow-manifest.csv +3 -1
- package/_gaia/core/engine/workflow.xml +11 -1
- package/_gaia/core/protocols/review-gate-check.xml +29 -1
- package/_gaia/core/workflows/party-mode/steps/step-01-agent-loading.md +60 -9
- package/_gaia/creative/workflows/problem-solving/checklist.md +64 -14
- package/_gaia/creative/workflows/problem-solving/instructions.xml +367 -22
- package/_gaia/creative/workflows/problem-solving/workflow.yaml +31 -1
- package/_gaia/dev/agents/_base-dev.md +7 -1
- package/_gaia/dev/skills/_skill-index.yaml +9 -0
- package/_gaia/dev/skills/figma-integration.md +296 -0
- package/_gaia/lifecycle/knowledge/brownfield/config-contradiction-scan.md +137 -0
- package/_gaia/lifecycle/knowledge/brownfield/dead-code-scan.md +179 -0
- package/_gaia/lifecycle/knowledge/brownfield/test-execution-scan.md +209 -0
- package/_gaia/lifecycle/skills/document-rulesets.md +91 -6
- package/_gaia/lifecycle/templates/brownfield-scan-doc-code-prompt.md +219 -0
- package/_gaia/lifecycle/templates/brownfield-scan-hardcoded-prompt.md +169 -0
- package/_gaia/lifecycle/templates/brownfield-scan-integration-seam-prompt.md +127 -0
- package/_gaia/lifecycle/templates/brownfield-scan-runtime-behavior-prompt.md +141 -0
- package/_gaia/lifecycle/templates/brownfield-scan-security-prompt.md +440 -0
- package/_gaia/lifecycle/templates/gap-entry-schema.md +282 -0
- package/_gaia/lifecycle/templates/infra-prd-template.md +356 -0
- package/_gaia/lifecycle/templates/platform-prd-template.md +431 -0
- package/_gaia/lifecycle/templates/prd-template.md +70 -0
- package/_gaia/lifecycle/templates/story-template.md +22 -1
- package/_gaia/lifecycle/workflows/2-planning/create-ux-design/instructions.xml +52 -3
- package/_gaia/lifecycle/workflows/4-implementation/add-feature/checklist.md +1 -1
- package/_gaia/lifecycle/workflows/4-implementation/add-feature/instructions.xml +2 -3
- package/_gaia/lifecycle/workflows/4-implementation/add-stories/checklist.md +5 -0
- package/_gaia/lifecycle/workflows/4-implementation/add-stories/instructions.xml +73 -1
- package/_gaia/lifecycle/workflows/4-implementation/create-stakeholder/checklist.md +25 -0
- package/_gaia/lifecycle/workflows/4-implementation/create-stakeholder/instructions.xml +79 -0
- package/_gaia/lifecycle/workflows/4-implementation/create-stakeholder/workflow.yaml +22 -0
- package/_gaia/lifecycle/workflows/4-implementation/create-story/instructions.xml +11 -1
- package/_gaia/lifecycle/workflows/4-implementation/retrospective/instructions.xml +21 -1
- package/_gaia/lifecycle/workflows/4-implementation/retrospective/workflow.yaml +1 -1
- package/_gaia/lifecycle/workflows/4-implementation/validate-story/instructions.xml +11 -0
- package/_gaia/lifecycle/workflows/anytime/brownfield-onboarding/checklist.md +12 -0
- package/_gaia/lifecycle/workflows/anytime/brownfield-onboarding/instructions.xml +248 -4
- package/_gaia/lifecycle/workflows/anytime/brownfield-onboarding/workflow.yaml +1 -0
- package/_gaia/testing/workflows/test-gap-analysis/checklist.md +8 -0
- package/_gaia/testing/workflows/test-gap-analysis/instructions.xml +53 -0
- package/_gaia/testing/workflows/test-gap-analysis/workflow.yaml +38 -0
- package/bin/gaia-framework.js +44 -8
- package/bin/helpers/derive-bump-label.js +41 -0
- package/bin/helpers/validate-bump-labels.js +38 -0
- package/gaia-install.sh +96 -21
- package/package.json +1 -1
- package/_gaia/_memory/tier2-results/.gitkeep +0 -0
- package/_gaia/_memory/tier2-results/checkpoint-resume-2026-03-24.yaml +0 -6
- package/_gaia/_memory/tier2-results/engine-scenarios-2026-03-22.yaml +0 -14
|
@@ -0,0 +1,282 @@
|
|
|
1
|
+
# Gap Entry Schema
|
|
2
|
+
|
|
3
|
+
> **Version:** 1.2.0
|
|
4
|
+
> **Story:** E11-S1, E12-S5, E11-S18
|
|
5
|
+
> **Traces to:** FR-111, FR-123, US-38, ADR-021, ADR-022
|
|
6
|
+
>
|
|
7
|
+
> Standardized output schema for brownfield scan subagents (E11).
|
|
8
|
+
> All scan agents MUST format gap entries using this schema.
|
|
9
|
+
> Infra-specific categories added for infrastructure/platform project support (E12-S5).
|
|
10
|
+
> Location: `_gaia/lifecycle/templates/gap-entry-schema.md`
|
|
11
|
+
|
|
12
|
+
## Schema Definition
|
|
13
|
+
|
|
14
|
+
Each gap entry is a YAML object with the following fields:
|
|
15
|
+
|
|
16
|
+
```yaml
|
|
17
|
+
id: "GAP-{scan_type}-{seq}"
|
|
18
|
+
category: "<enum>"
|
|
19
|
+
severity: "<enum>"
|
|
20
|
+
title: "<string>"
|
|
21
|
+
description: "<string>"
|
|
22
|
+
evidence:
|
|
23
|
+
  file: "<relative-path>"
|
|
24
|
+
  line: <number-or-range>
|
|
25
|
+
  protocol: "<string>" # Optional
|
|
26
|
+
recommendation: "<string>"
|
|
27
|
+
verified_by: "<agent-id>"
|
|
28
|
+
confidence: "<enum>"
|
|
29
|
+
```
|
|
30
|
+
|
|
31
|
+
## Field Reference
|
|
32
|
+
|
|
33
|
+
| Field | Type | Required | Description |
|
|
34
|
+
|-------|------|----------|-------------|
|
|
35
|
+
| `id` | string | yes | Unique identifier. Format: `GAP-{scan_type}-{seq}` where `scan_type` maps to the category and `seq` is a zero-padded 3-digit sequence (e.g., `GAP-dead-code-001`) |
|
|
36
|
+
| `category` | enum | yes | Gap classification — must be one of the 12 allowed values (see Category Enum) |
|
|
37
|
+
| `severity` | enum | yes | Impact level — must be one of the 5 allowed values (see Severity Enum) |
|
|
38
|
+
| `title` | string | yes | Short summary of the gap (max 80 characters) |
|
|
39
|
+
| `description` | string | yes | Detailed explanation of the gap, what it means, and why it matters |
|
|
40
|
+
| `evidence` | object | yes | Source code evidence with required `file` and `line` sub-fields, plus optional `protocol` sub-field (see Evidence Object) |
|
|
41
|
+
| `recommendation` | string | yes | Actionable fix or remediation guidance |
|
|
42
|
+
| `verified_by` | string | yes | ID of the scan agent that produced this finding (e.g., `dead-code-analyzer`, `config-scanner`) |
|
|
43
|
+
| `confidence` | enum | yes | Agent's confidence in the finding accuracy (see Confidence Enum) |
|
|
44
|
+
|
|
45
|
+
## Enums
|
|
46
|
+
|
|
47
|
+
### Severity Enum
|
|
48
|
+
|
|
49
|
+
| Value | Description |
|
|
50
|
+
|-------|-------------|
|
|
51
|
+
| `critical` | Blocks deployment or causes data loss |
|
|
52
|
+
| `high` | Significant risk requiring prompt attention |
|
|
53
|
+
| `medium` | Moderate risk, should be addressed in current sprint |
|
|
54
|
+
| `low` | Minor issue, can be deferred |
|
|
55
|
+
| `info` | Informational finding, no immediate action needed |
|
|
56
|
+
|
|
57
|
+
### Category Enum
|
|
58
|
+
|
|
59
|
+
12 categories total — 7 application categories (E11-S1) plus 5 infrastructure categories (E12-S5):
|
|
60
|
+
|
|
61
|
+
#### Application Categories (7)
|
|
62
|
+
|
|
63
|
+
| Value | Scan Agent | Description |
|
|
64
|
+
|-------|------------|-------------|
|
|
65
|
+
| `config-contradiction` | E11-S2 | Configuration files contradict each other or runtime behavior |
|
|
66
|
+
| `dead-code` | E11-S3 | Unreachable code, unused exports, orphaned files |
|
|
67
|
+
| `hard-coded-logic` | E11-S4 | Magic numbers, embedded URLs, environment-specific constants |
|
|
68
|
+
| `security-endpoint` | E11-S5 | Unprotected routes, missing auth, exposed secrets |
|
|
69
|
+
| `runtime-behavior` | E11-S6 | Behavior that only manifests at runtime (race conditions, memory leaks) |
|
|
70
|
+
| `doc-code-drift` | E11-S7 | Documentation does not match actual code behavior |
|
|
71
|
+
| `integration-seam` | E11-S8 | Fragile integration points, tight coupling, missing contracts |
|
|
72
|
+
|
|
73
|
+
#### Infrastructure Categories (5) — ADR-022 §10.16.5
|
|
74
|
+
|
|
75
|
+
| Value | Infra PRD Section | Description |
|
|
76
|
+
|-------|-------------------|-------------|
|
|
77
|
+
| `resource-drift` | Resource Specifications | Declared infrastructure state differs from actual deployed state (e.g., Terraform state mismatch, orphaned cloud resources) |
|
|
78
|
+
| `config-sprawl` | Environment Strategy & DX | Configuration values duplicated across multiple files without a single source of truth (e.g., same port in Dockerfile, Helm values, and Terraform variables) |
|
|
79
|
+
| `secret-exposure` | Security Posture | Secrets, credentials, or sensitive values present in source files, environment configs, or IaC definitions without proper secrets management |
|
|
80
|
+
| `missing-policy` | Verification Strategy | Infrastructure lacks policy-as-code enforcement (e.g., no OPA/Rego, no Checkov rules, no tfsec scans for security/compliance) |
|
|
81
|
+
| `environment-skew` | Environment Strategy & DX | Environment definitions (dev/staging/prod) have inconsistent resource specifications, missing parity, or undocumented differences |
|
|
82
|
+
|
|
83
|
+
### Confidence Enum
|
|
84
|
+
|
|
85
|
+
| Value | Description |
|
|
86
|
+
|-------|-------------|
|
|
87
|
+
| `high` | Strong evidence, verified through multiple signals |
|
|
88
|
+
| `medium` | Reasonable evidence, single signal source |
|
|
89
|
+
| `low` | Weak evidence, needs human verification |
|
|
90
|
+
|
|
91
|
+
## Evidence Object
|
|
92
|
+
|
|
93
|
+
The `evidence` field is a composite object grouping source location data:
|
|
94
|
+
|
|
95
|
+
```yaml
|
|
96
|
+
evidence:
|
|
97
|
+
  file: "src/services/auth.ts" # Relative path from project root (non-empty string)
|
|
98
|
+
  line: 42 # Single line number
|
|
99
|
+
  protocol: "rest" # Optional. Protocol type
|
|
100
|
+
```
|
|
101
|
+
|
|
102
|
+
Or with a line range:
|
|
103
|
+
|
|
104
|
+
```yaml
|
|
105
|
+
evidence:
|
|
106
|
+
  file: "config/database.yml"
|
|
107
|
+
  line: "15-28" # Line range (start-end)
|
|
108
|
+
```
|
|
109
|
+
|
|
110
|
+
Or without the optional protocol field (backward compatible):
|
|
111
|
+
|
|
112
|
+
```yaml
|
|
113
|
+
evidence:
|
|
114
|
+
  file: "src/utils/helper.ts"
|
|
115
|
+
  line: 10
|
|
116
|
+
```
|
|
117
|
+
|
|
118
|
+
| Sub-field | Type | Required | Constraints |
|
|
119
|
+
|-----------|------|----------|-------------|
|
|
120
|
+
| `file` | string | yes | Relative path from project root. Must be non-empty. |
|
|
121
|
+
| `line` | number or string | yes | Single line number (integer) or range as `"start-end"` string |
|
|
122
|
+
| `protocol` | string | no | Optional. One of `rest`, `graphql`, `grpc`, `websocket`, or any custom string. Omit if not applicable. When present, must be a non-empty string. |
|
|
123
|
+
|
|
124
|
+
## ID Format
|
|
125
|
+
|
|
126
|
+
Pattern: `GAP-{scan_type}-{seq}`
|
|
127
|
+
|
|
128
|
+
- `scan_type` is the category value (e.g., `dead-code`, `config-contradiction`)
|
|
129
|
+
- `seq` is a zero-padded 3-digit sequence number starting at 001
|
|
130
|
+
- Regex: `^GAP-(config-contradiction|dead-code|hard-coded-logic|security-endpoint|runtime-behavior|doc-code-drift|integration-seam|resource-drift|config-sprawl|secret-exposure|missing-policy|environment-skew)-\d{3}$`
|
|
131
|
+
|
|
132
|
+
The `scan_type` component in the ID maps directly to the `category` value. See the Category Enum tables (Application + Infrastructure) for the full list of valid scan types.
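
A minimal sketch of the ID check in Python, assuming gap entries have already been parsed into dictionaries. The names `CATEGORIES`, `GAP_ID_RE`, `parse_gap_id`, and `id_matches_category` are hypothetical and not part of the framework. The pattern is assembled from the category list and is intended to accept the same IDs as the regex above.

```python
import re

# The 12 category values defined by this schema.
CATEGORIES = (
    "config-contradiction", "dead-code", "hard-coded-logic",
    "security-endpoint", "runtime-behavior", "doc-code-drift",
    "integration-seam", "resource-drift", "config-sprawl",
    "secret-exposure", "missing-policy", "environment-skew",
)

# Built from CATEGORIES so the ID pattern and the enum cannot drift apart.
# Named groups are an addition for convenience; [0-9] keeps digits ASCII-only.
GAP_ID_RE = re.compile(
    r"GAP-(?P<scan_type>" + "|".join(CATEGORIES) + r")-(?P<seq>[0-9]{3})"
)


def parse_gap_id(gap_id):
    """Return (scan_type, seq) for a well-formed ID, or None otherwise."""
    match = GAP_ID_RE.fullmatch(gap_id)
    if match is None:
        return None
    return match.group("scan_type"), int(match.group("seq"))


def id_matches_category(entry):
    """True when the scan_type embedded in `id` equals the `category` field."""
    parsed = parse_gap_id(str(entry.get("id", "")))
    return parsed is not None and parsed[0] == entry.get("category")
```

Note that the regex on its own also accepts a sequence of `000`. Whether to reject it is left open in this sketch, since the text only says the sequence starts at 001.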
|
|
133
|
+
|
|
134
|
+
## Validation Rules
|
|
135
|
+
|
|
136
|
+
All fields listed in the Field Reference are **required** — a gap entry with any missing field is invalid.
|
|
137
|
+
|
|
138
|
+
### Enum Validation
|
|
139
|
+
|
|
140
|
+
- `severity` must be exactly one of: `critical`, `high`, `medium`, `low`, `info`
|
|
141
|
+
- `category` must be exactly one of: `config-contradiction`, `dead-code`, `hard-coded-logic`, `security-endpoint`, `runtime-behavior`, `doc-code-drift`, `integration-seam`, `resource-drift`, `config-sprawl`, `secret-exposure`, `missing-policy`, `environment-skew`
|
|
142
|
+
- `confidence` must be exactly one of: `high`, `medium`, `low`
|
|
143
|
+
- Any value not in the enum set must be rejected
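
The enum rules above could be checked along these lines. This is a sketch only: it assumes each gap entry has been loaded into a Python dictionary, and `ALLOWED` and `enum_errors` are hypothetical names, not framework APIs.

```python
# Allowed values copied from the Severity, Category, and Confidence enums.
ALLOWED = {
    "severity": {"critical", "high", "medium", "low", "info"},
    "category": {
        "config-contradiction", "dead-code", "hard-coded-logic",
        "security-endpoint", "runtime-behavior", "doc-code-drift",
        "integration-seam", "resource-drift", "config-sprawl",
        "secret-exposure", "missing-policy", "environment-skew",
    },
    "confidence": {"high", "medium", "low"},
}


def enum_errors(entry):
    """Return one message per enum field whose value is missing or not allowed."""
    errors = []
    for field, allowed in ALLOWED.items():
        value = entry.get(field)
        # Exact, case-sensitive match: "High" or " high" is rejected.
        if not isinstance(value, str) or value not in allowed:
            errors.append(f"{field}: {value!r} is not an allowed value")
    return errors
```

Returning a list of messages rather than raising on the first failure lets a caller report every invalid field of an entry in one pass.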
|
|
144
|
+
|
|
145
|
+
### Format Validation
|
|
146
|
+
|
|
147
|
+
- `id` must match the regex `^GAP-(config-contradiction|dead-code|hard-coded-logic|security-endpoint|runtime-behavior|doc-code-drift|integration-seam|resource-drift|config-sprawl|secret-exposure|missing-policy|environment-skew)-\d{3}$`
|
|
148
|
+
- `evidence.file` must be a non-empty string containing a relative path (no leading `/`)
|
|
149
|
+
- `evidence.line` must be a positive integer or a range string matching `^\d+-\d+$`
|
|
150
|
+
- `title` should not exceed 80 characters
|
|
151
|
+
- `verified_by` must be a non-empty string identifying the scan agent
|
|
152
|
+
- `evidence.protocol`, when present, must be a non-empty string
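
For the `evidence` rules in the list above, a hedged sketch in Python follows. `evidence_errors` and `LINE_RANGE_RE` are hypothetical names, and the input is assumed to be the `evidence` mapping after YAML parsing.

```python
import re

# Range form "start-end", e.g. "15-28". ASCII digits only.
LINE_RANGE_RE = re.compile(r"[0-9]+-[0-9]+")


def evidence_errors(evidence):
    """Return a list of rule violations for one evidence object."""
    errors = []

    path = evidence.get("file")
    if not isinstance(path, str) or not path or path.startswith("/"):
        errors.append("file: must be a non-empty relative path")

    line = evidence.get("line")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    is_number = isinstance(line, int) and not isinstance(line, bool) and line > 0
    is_range = isinstance(line, str) and LINE_RANGE_RE.fullmatch(line) is not None
    if not (is_number or is_range):
        errors.append("line: must be a positive integer or a 'start-end' string")

    # protocol is optional; only validate it when the key is present.
    if "protocol" in evidence:
        protocol = evidence["protocol"]
        if not isinstance(protocol, str) or not protocol:
            errors.append("protocol: must be a non-empty string when present")

    return errors
```

The sketch checks only what the rules state. It does not verify that a range's start is less than or equal to its end, because the schema does not say so.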
|
|
153
|
+
|
|
154
|
+
### Required vs Optional
|
|
155
|
+
|
|
156
|
+
All 9 top-level fields (`id`, `category`, `severity`, `title`, `description`, `evidence`, `recommendation`, `verified_by`, `confidence`) are **required**. There are no optional top-level fields in the base schema.
|
|
157
|
+
|
|
158
|
+
The `evidence` object contains one optional sub-field: `protocol`. This is the first optional sub-field in the schema. Existing gap entries that omit `protocol` remain fully valid.
|
|
159
|
+
|
|
160
|
+
### Optional Field Validation
|
|
161
|
+
|
|
162
|
+
The `protocol` sub-field of the `evidence` object is not enum-validated. It accepts any non-empty string when present. Recommended canonical values are `rest`, `graphql`, `grpc`, and `websocket`, but custom strings (e.g., `mqtt`, `soap`, `amqp`) are also accepted without schema changes. When `protocol` is omitted entirely, the gap entry remains valid (backward compatible). When present, an empty string is invalid.
|
|
163
|
+
|
|
164
|
+
## Budget Control
|
|
165
|
+
|
|
166
|
+
Each gap entry should average approximately **100 tokens** in structured YAML format (per NFR-024).
|
|
167
|
+
|
|
168
|
+
Guidelines:
|
|
169
|
+
- Use structured YAML, not prose paragraphs
|
|
170
|
+
- Keep `title` under 80 characters
|
|
171
|
+
- Keep `description` to 1-2 sentences
|
|
172
|
+
- Keep `recommendation` to 1-2 sentences
|
|
173
|
+
- Avoid embedding full code snippets in descriptions — reference via `evidence` instead
|
|
174
|
+
|
|
175
|
+
With 12 categories across application and infrastructure scans, total token usage varies by project type. After consolidation and deduplication (E11-S10), the single `consolidated-gaps.md` must stay within the 40K framework context budget.
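
As a rough illustration of what these numbers imply, the arithmetic can be sketched in Python. Two assumptions are made here that the text does not state: "40K" is read as 40,000 tokens, and any non-entry overhead in `consolidated-gaps.md` (headings, summaries) is passed in by the caller. `max_entries` is a hypothetical helper.

```python
TOKENS_PER_ENTRY = 100   # approximate average per gap entry, from this section
CONTEXT_BUDGET = 40_000  # "40K" read as 40,000 tokens (assumption)


def max_entries(budget=CONTEXT_BUDGET, per_entry=TOKENS_PER_ENTRY, overhead=0):
    """Upper bound on entries that fit once `overhead` tokens are set aside."""
    return max(0, (budget - overhead) // per_entry)
```

With no overhead this gives an upper bound of 400 average-sized entries. Real entries vary in length, so the figure is an estimate rather than a limit the framework enforces.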
|
|
176
|
+
|
|
177
|
+
## Examples
|
|
178
|
+
|
|
179
|
+
### Application Category Example
|
|
180
|
+
|
|
181
|
+
```yaml
|
|
182
|
+
id: "GAP-config-contradiction-001"
|
|
183
|
+
category: "config-contradiction"
|
|
184
|
+
severity: "high"
|
|
185
|
+
title: "Database timeout mismatch between config files"
|
|
186
|
+
description: "production.yaml sets db.timeout to 30s while docker-compose.yml sets POSTGRES_TIMEOUT to 10s."
|
|
187
|
+
evidence:
|
|
188
|
+
  file: "config/production.yaml"
|
|
189
|
+
  line: 18
|
|
190
|
+
recommendation: "Align timeout values. Set both to 30s or extract to a shared environment variable."
|
|
191
|
+
verified_by: "config-scanner"
|
|
192
|
+
confidence: "high"
|
|
193
|
+
```
|
|
194
|
+
|
|
195
|
+
### Application Category Example with Protocol
|
|
196
|
+
|
|
197
|
+
```yaml
|
|
198
|
+
id: "GAP-security-endpoint-001"
|
|
199
|
+
category: "security-endpoint"
|
|
200
|
+
severity: "high"
|
|
201
|
+
title: "Unprotected admin route exposes user management API"
|
|
202
|
+
description: "The /api/admin/users endpoint has no authentication middleware applied."
|
|
203
|
+
evidence:
|
|
204
|
+
  file: "src/routes/admin.ts"
|
|
205
|
+
  line: 15
|
|
206
|
+
  protocol: "rest"
|
|
207
|
+
recommendation: "Add authentication middleware to all /api/admin/* routes."
|
|
208
|
+
verified_by: "security-scanner"
|
|
209
|
+
confidence: "high"
|
|
210
|
+
```
|
|
211
|
+
|
|
212
|
+
### Infrastructure Category Examples
|
|
213
|
+
|
|
214
|
+
```yaml
|
|
215
|
+
id: "GAP-resource-drift-001"
|
|
216
|
+
category: "resource-drift"
|
|
217
|
+
severity: "high"
|
|
218
|
+
title: "Orphaned S3 bucket missing from Terraform configuration"
|
|
219
|
+
description: "S3 bucket 'app-logs-legacy' exists in AWS but is not declared in any Terraform configuration."
|
|
220
|
+
evidence:
|
|
221
|
+
  file: "infra/terraform/storage.tf"
|
|
222
|
+
  line: "1-45"
|
|
223
|
+
recommendation: "Import the bucket into Terraform state or delete it if no longer needed."
|
|
224
|
+
verified_by: "infra-drift-scanner"
|
|
225
|
+
confidence: "high"
|
|
226
|
+
```
|
|
227
|
+
|
|
228
|
+
```yaml
|
|
229
|
+
id: "GAP-config-sprawl-001"
|
|
230
|
+
category: "config-sprawl"
|
|
231
|
+
severity: "medium"
|
|
232
|
+
title: "Database port duplicated across 4 config files"
|
|
233
|
+
description: "Port 5432 is hardcoded in Dockerfile, docker-compose.yml, Helm values.yaml, and Terraform variables.tf."
|
|
234
|
+
evidence:
|
|
235
|
+
  file: "docker-compose.yml"
|
|
236
|
+
  line: 14
|
|
237
|
+
recommendation: "Extract the database port to a single environment variable and reference it from all 4 files."
|
|
238
|
+
verified_by: "config-sprawl-scanner"
|
|
239
|
+
confidence: "high"
|
|
240
|
+
```
|
|
241
|
+
|
|
242
|
+
```yaml
|
|
243
|
+
id: "GAP-secret-exposure-001"
|
|
244
|
+
category: "secret-exposure"
|
|
245
|
+
severity: "critical"
|
|
246
|
+
title: "AWS access key embedded in Terraform variables"
|
|
247
|
+
description: "AWS_ACCESS_KEY_ID is set as a default value in variables.tf instead of using a secrets manager."
|
|
248
|
+
evidence:
|
|
249
|
+
  file: "infra/terraform/variables.tf"
|
|
250
|
+
  line: 23
|
|
251
|
+
recommendation: "Remove the default value and use AWS SSM Parameter Store or HashiCorp Vault."
|
|
252
|
+
verified_by: "secret-scanner"
|
|
253
|
+
confidence: "high"
|
|
254
|
+
```
|
|
255
|
+
|
|
256
|
+
```yaml
|
|
257
|
+
id: "GAP-missing-policy-001"
|
|
258
|
+
category: "missing-policy"
|
|
259
|
+
severity: "medium"
|
|
260
|
+
title: "No policy-as-code enforcement for Kubernetes manifests"
|
|
261
|
+
description: "Kubernetes deployments lack OPA/Gatekeeper or Kyverno policies for security constraints."
|
|
262
|
+
evidence:
|
|
263
|
+
  file: "k8s/deployments/api-server.yaml"
|
|
264
|
+
  line: "1-30"
|
|
265
|
+
recommendation: "Add OPA Gatekeeper constraints or Kyverno policies to enforce pod security standards."
|
|
266
|
+
verified_by: "policy-scanner"
|
|
267
|
+
confidence: "medium"
|
|
268
|
+
```
|
|
269
|
+
|
|
270
|
+
```yaml
|
|
271
|
+
id: "GAP-environment-skew-001"
|
|
272
|
+
category: "environment-skew"
|
|
273
|
+
severity: "high"
|
|
274
|
+
title: "Staging uses 2 replicas while production uses 5"
|
|
275
|
+
description: "Replica counts differ between staging and production with no documented justification."
|
|
276
|
+
evidence:
|
|
277
|
+
  file: "k8s/overlays/staging/deployment-patch.yaml"
|
|
278
|
+
  line: 8
|
|
279
|
+
recommendation: "Document the replica difference rationale or align staging proportionally."
|
|
280
|
+
verified_by: "env-skew-scanner"
|
|
281
|
+
confidence: "high"
|
|
282
|
+
```
|
|
@@ -0,0 +1,356 @@
|
|
|
1
|
+
---
|
|
2
|
+
template: 'infra-prd'
|
|
3
|
+
version: 1.0.0
|
|
4
|
+
used_by: ['create-prd']
|
|
5
|
+
domain: '{domain}'
|
|
6
|
+
---
|
|
7
|
+
|
|
8
|
+
# Infrastructure PRD: {product_name}
|
|
9
|
+
|
|
10
|
+
> **Project:** {project_name}
|
|
11
|
+
> **Domain:** {domain}
|
|
12
|
+
> **Date:** {date}
|
|
13
|
+
> **Author:** {agent_name}
|
|
14
|
+
> **Status:** Draft | In Review | Approved
|
|
15
|
+
|
|
16
|
+
## 1. Overview & Scope
|
|
17
|
+
|
|
18
|
+
{Platform purpose, target environments, and team ownership.}
|
|
19
|
+
|
|
20
|
+
### Platform Purpose
|
|
21
|
+
|
|
22
|
+
{What this infrastructure provides and why it exists.}
|
|
23
|
+
|
|
24
|
+
### Target Environments
|
|
25
|
+
|
|
26
|
+
| Environment | Purpose | Region(s) | Owner |
|
|
27
|
+
|-------------|---------|-----------|-------|
|
|
28
|
+
| {env_name} | {purpose} | {regions} | {team} |
|
|
29
|
+
|
|
30
|
+
### Team Ownership
|
|
31
|
+
|
|
32
|
+
| Component | Owning Team | Escalation |
|
|
33
|
+
|-----------|-------------|------------|
|
|
34
|
+
| {component} | {team} | {contact} |
|
|
35
|
+
|
|
36
|
+
## 2. Goals and Non-Goals
|
|
37
|
+
|
|
38
|
+
### Goals
|
|
39
|
+
- {Goal 1}
|
|
40
|
+
- {Goal 2}
|
|
41
|
+
|
|
42
|
+
### Non-Goals
|
|
43
|
+
- {Explicitly out of scope item 1}
|
|
44
|
+
|
|
45
|
+
## 3. Platform Capabilities
|
|
46
|
+
|
|
47
|
+
{What the infrastructure enables. Each capability follows the format below.}
|
|
48
|
+
|
|
49
|
+
| ID | Capability | SLO |
|
|
50
|
+
|----|-----------|-----|
|
|
51
|
+
| PC-01 | Enable {team/service} to {capability} with {SLO} | {target} |
|
|
52
|
+
| PC-02 | Enable {team/service} to {capability} with {SLO} | {target} |
|
|
53
|
+
|
|
54
|
+
## 4. Resource Specifications
|
|
55
|
+
|
|
56
|
+
{Compute, storage, networking, IAM provisioning. Per-environment breakdown.}
|
|
57
|
+
|
|
58
|
+
### Compute
|
|
59
|
+
|
|
60
|
+
| Resource | Environment | Spec | Scaling |
|
|
61
|
+
|----------|-------------|------|---------|
|
|
62
|
+
| {resource} | {env} | {cpu/memory} | {auto/manual, min-max} |
|
|
63
|
+
|
|
64
|
+
### Storage
|
|
65
|
+
|
|
66
|
+
| Store | Type | Size | IOPS | Backup |
|
|
67
|
+
|-------|------|------|------|--------|
|
|
68
|
+
| {store} | {block/object/file} | {size} | {iops} | {policy} |
|
|
69
|
+
|
|
70
|
+
### Networking
|
|
71
|
+
|
|
72
|
+
| Component | CIDR/Range | Protocol | Purpose |
|
|
73
|
+
|-----------|-----------|----------|---------|
|
|
74
|
+
| {component} | {cidr} | {protocol} | {purpose} |
|
|
75
|
+
|
|
76
|
+
### IAM Provisioning
|
|
77
|
+
|
|
78
|
+
| Role/Policy | Scope | Permissions | Lifecycle |
|
|
79
|
+
|-------------|-------|-------------|-----------|
|
|
80
|
+
| {role} | {scope} | {permissions} | {create/rotate/revoke} |
|
|
81
|
+
|
|
82
|
+
### State Management
|
|
83
|
+
|
|
84
|
+
{State backend strategy — e.g., Terraform remote state, locking, encryption.}
|
|
85
|
+
|
|
86
|
+
| Backend | Lock Provider | Encryption | Workspace Strategy |
|
|
87
|
+
|---------|--------------|------------|-------------------|
|
|
88
|
+
| {backend} | {lock} | {encryption} | {workspace} |
|
|
89
|
+
|
|
90
|
+
### Data Persistence Requirements
|
|
91
|
+
|
|
92
|
+
| Data Store | Durability | Replication | Retention |
|
|
93
|
+
|------------|-----------|-------------|-----------|
|
|
94
|
+
| {store} | {durability} | {replication} | {retention} |
|
|
95
|
+
|
|
96
|
+
## 5. Operational SLOs
|
|
97
|
+
|
|
98
|
+
{Availability targets, MTTR, RTO/RPO, error budgets, resource utilization targets.}
|
|
99
|
+
|
|
100
|
+
### Availability & Recovery
|
|
101
|
+
|
|
102
|
+
| Metric | Target | Measurement |
|
|
103
|
+
|--------|--------|-------------|
|
|
104
|
+
| Availability | {99.x%} | {how measured} |
|
|
105
|
+
| MTTR | {minutes} | {how measured} |
|
|
106
|
+
| RTO | {minutes} | {recovery time objective} |
|
|
107
|
+
| RPO | {minutes} | {recovery point objective} |
|
|
108
|
+
| Error Budget | {x% per month} | {how calculated} |
|
|
109
|
+
|
|
110
|
+
### Resource Utilization Targets
|
|
111
|
+
|
|
112
|
+
| Resource | Target Utilization | Alert Threshold |
|
|
113
|
+
|----------|-------------------|-----------------|
|
|
114
|
+
| CPU | {target%} | {alert%} |
|
|
115
|
+
| Memory | {target%} | {alert%} |
|
|
116
|
+
| Storage IOPS | {target} | {threshold} |
|
|
117
|
+
| Network Bandwidth | {target Gbps} | {threshold} |
|
|
118
|
+
| Network Latency | {target ms} | {threshold} |
|
|
119
|
+
|
|
120
|
+
## 6. Security Posture
|
|
121
|
+
|
|
122
|
+
{Security requirements tailored for infrastructure projects.}
|
|
123
|
+
|
|
124
|
+
### IAM/RBAC
|
|
125
|
+
|
|
126
|
+
{Identity and access management, role-based access control policies.}
|
|
127
|
+
|
|
128
|
+
| Principal | Role | Scope | MFA Required | Review Cadence |
|
|
129
|
+
|-----------|------|-------|-------------|----------------|
|
|
130
|
+
| {principal} | {role} | {scope} | {yes/no} | {quarterly/annually} |
|
|
131
|
+
|
|
132
|
+
### Network Segmentation
|
|
133
|
+
|
|
134
|
+
{Network isolation, security groups, firewall rules, zero-trust boundaries.}
|
|
135
|
+
|
|
136
|
+
| Zone | CIDR | Ingress Rules | Egress Rules | Purpose |
|
|
137
|
+
|------|------|---------------|-------------|---------|
|
|
138
|
+
| {zone} | {cidr} | {rules} | {rules} | {purpose} |
|
|
139
|
+
|
|
140
|
+
### Secrets Management
|
|
141
|
+
|
|
142
|
+
{Secrets storage, rotation, injection, and audit strategy.}
|
|
143
|
+
|
|
144
|
+
| Secret Type | Store | Rotation | Injection Method |
|
|
145
|
+
|-------------|-------|----------|-----------------|
|
|
146
|
+
| {type} | {vault/kms/ssm} | {cadence} | {env var/sidecar/init container} |
|
|
147
|
+
|
|
148
|
+
### Image Provenance
|
|
149
|
+
|
|
150
|
+
{Container image signing, scanning, and supply chain verification.}
|
|
151
|
+
|
|
152
|
+
| Registry | Signing | Scanning | Admission Policy |
|
|
153
|
+
|----------|---------|----------|-----------------|
|
|
154
|
+
| {registry} | {cosign/notary} | {trivy/grype} | {policy} |
|
|
155
|
+
|
|
156
|
+
### Compliance Mapping
|
|
157
|
+
|
|
158
|
+
{Regulatory and compliance framework alignment.}
|
|
159
|
+
|
|
160
|
+
| Framework | Controls | Evidence | Audit Frequency |
|
|
161
|
+
|-----------|----------|----------|----------------|
|
|
162
|
+
| {SOC2/HIPAA/PCI/ISO} | {control IDs} | {how demonstrated} | {cadence} |
|
|
163
|
+
|
|
164
|
+
## 7. Environment Strategy & Developer Experience
|
|
165
|
+
|
|
166
|
+
{Environment parity, promotion pipeline, drift detection, self-service provisioning.}
|
|
167
|
+
|
|
168
|
+
### Environment Parity
|
|
169
|
+
|
|
170
|
+
| Dimension | Dev | Staging | Production |
|
|
171
|
+
|-----------|-----|---------|-----------|
|
|
172
|
+
| {dimension} | {dev config} | {staging config} | {prod config} |
|
|
173
|
+
|
|
174
|
+
### Promotion Pipeline
|
|
175
|
+
|
|
176
|
+
{How changes flow from dev to production.}
|
|
177
|
+
|
|
178
|
+
```
|
|
179
|
+
{dev} → {staging} → {production}
|
|
180
|
+
```
|
|
181
|
+
|
|
182
|
+
### Drift Detection
|
|
183
|
+
|
|
184
|
+
{How configuration drift is detected and remediated.}
|
|
185
|
+
|
|
186
|
+
| Tool | Schedule | Remediation | Notification |
|
|
187
|
+
|------|----------|-------------|-------------|
|
|
188
|
+
| {tool} | {cron} | {auto/manual} | {channel} |
|
|
189
|
+
|
|
190
|
+
### Self-Service Provisioning
|
|
191
|
+
|
|
192
|
+
{Developer self-service capabilities and guardrails.}
|
|
193
|
+
|
|
194
|
+
| Capability | Interface | Guardrails | Approval |
|
|
195
|
+
|------------|-----------|-----------|----------|
|
|
196
|
+
| {capability} | {CLI/portal/API} | {policy} | {auto/manual} |
|
|
197
|
+
|
|
198
|
+
### Onboarding
|
|
199
|
+
|
|
200
|
+
{New team member and new service onboarding procedures.}
|
|
201
|
+
|
|
202
|
+
### Observability
|
|
203
|
+
|
|
204
|
+
{Monitoring, logging, tracing, and alerting strategy.}
|
|
205
|
+
|
|
206
|
+
| Signal | Tool | Retention | Alerting |
|
|
207
|
+
|--------|------|-----------|---------|
|
|
208
|
+
| Metrics | {prometheus/cloudwatch} | {retention} | {pagerduty/slack} |
|
|
209
|
+
| Logs | {elk/cloudwatch} | {retention} | {rules} |
|
|
210
|
+
| Traces | {jaeger/xray} | {retention} | {rules} |
|
|
211
|
+
|
|
212
|
+
## 8. Dependencies & Provider Constraints
|
|
213
|
+
|
|
214
|
+
{Cloud provider limits, Terraform provider versions, upstream service contracts.}
|
|
215
|
+
|
|
216
|
+
### Cloud Provider Limits
|
|
217
|
+
|
|
218
|
+
| Provider | Service | Limit | Current Usage | Headroom |
|
|
219
|
+
|----------|---------|-------|--------------|----------|
|
|
220
|
+
| {provider} | {service} | {limit} | {current} | {remaining} |
|
|
221
|
+
|
|
222
|
+
### Terraform Provider Versions
|
|
223
|
+
|
|
224
|
+
| Provider | Version | Constraint | Notes |
|
|
225
|
+
|----------|---------|-----------|-------|
|
|
226
|
+
| {provider} | {version} | {~> x.y} | {notes} |
|
|
227
|
+
|
|
228
|
+
### Upstream Service Contracts
|
|
229
|
+
|
|
230
|
+
| Service | SLA | API Version | Deprecation |
|
|
231
|
+
|---------|-----|------------|-------------|
|
|
232
|
+
| {service} | {sla} | {version} | {date or N/A} |
|
|
233
|
+
|
|
234
|
+
## 9. Cost Model
|
|
235
|
+
|
|
236
|
+
{Per-environment resource cost estimates, scaling cost projections, and cost-per-unit efficiency metrics.}
|
|
237
|
+
|
|
238
|
+
### Per-Environment Resource Cost Estimates
|
|
239
|
+
|
|
240
|
+
| Resource | Dev (monthly) | Staging (monthly) | Production (monthly) |
|
|
241
|
+
|----------|--------------|-------------------|---------------------|
|
|
242
|
+
| Compute | ${cost} | ${cost} | ${cost} |
|
|
243
|
+
| Storage | ${cost} | ${cost} | ${cost} |
|
|
244
|
+
| Networking | ${cost} | ${cost} | ${cost} |
|
|
245
|
+
| Monitoring | ${cost} | ${cost} | ${cost} |
|
|
246
|
+
| **Total** | **${total}** | **${total}** | **${total}** |
|
|
247
|
+
|
|
248
|
+
### Scaling Cost Projections
|
|
249
|
+
|
|
250
|
+
| Scenario | Trigger | Additional Cost | Timeline |
|
|
251
|
+
|----------|---------|----------------|----------|
|
|
252
|
+
| {scenario} | {trigger condition} | ${projection} | {timeframe} |
|
|
253
|
+
|
|
254
|
+
### Cost-Per-Unit Efficiency Metrics
|
|
255
|
+
|
|
256
|
+
| Metric | Current | Target | Optimization |
|
|
257
|
+
|--------|---------|--------|-------------|
|
|
258
|
+
| Cost per request | ${cost} | ${target} | {strategy} |
|
|
259
|
+
| Cost per GB stored | ${cost} | ${target} | {strategy} |
|
|
260
|
+
| Cost per environment | ${cost} | ${target} | {strategy} |
|
|
261
|
+
|
|
262
|
+
## 10. Verification Strategy
|
|
263
|
+
|
|
264
|
+
{Policy-as-code (OPA/Rego, Checkov, tfsec), plan validation, smoke tests, drift detection, chaos testing.}
|
|
265
|
+
|
|
266
|
+
### Policy-as-Code
|
|
267
|
+
|
|
268
|
+
| Tool | Scope | Rules | Enforcement |
|
|
269
|
+
|------|-------|-------|-------------|
|
|
270
|
+
| OPA/Rego | {scope} | {rule count} | {warn/deny} |
|
|
271
|
+
| Checkov | {scope} | {rule count} | {warn/deny} |
|
|
272
|
+
| tfsec | {scope} | {rule count} | {warn/deny} |
|
|
273
|
+
|
|
274
|
+
### Plan Validation
|
|
275
|
+
|
|
276
|
+
{Terraform plan review, cost estimation, blast radius analysis.}
|
|
277
|
+
|
|
278
|
+
| Check | Tool | Gate | Threshold |
|
|
279
|
+
|-------|------|------|-----------|
|
|
280
|
+
| {check} | {tool} | {CI/manual} | {threshold} |
|
|
281
|
+
|
|
282
|
+
### Smoke Tests
|
|
283
|
+
|
|
284
|
+
{Post-deployment verification tests.}
|
|
285
|
+
|
|
286
|
+
| Test | Target | Expected | Timeout |
|
|
287
|
+
|------|--------|----------|---------|
|
|
288
|
+
| {test} | {endpoint/resource} | {result} | {timeout} |
|
|
289
|
+
|
|
290
|
+
### Drift Detection
|
|
291
|
+
|
|
292
|
+
{Scheduled plan diffs, state file monitoring, compliance scanning.}
|
|
293
|
+
|
|
294
|
+
### Chaos Testing
|
|
295
|
+
|
|
296
|
+
{Failure injection, resilience validation.}
|
|
297
|
+
|
|
298
|
+
| Experiment | Target | Hypothesis | Blast Radius |
|
|
299
|
+
|-----------|--------|-----------|-------------|
|
|
300
|
+
| {experiment} | {target} | {hypothesis} | {scope} |
|
|
301
|
+
|
|
302
|
+
## 11. Operational Runbooks
|
|
303
|
+
|
|
304
|
+
{Scaling, failover, incident response, rollback procedures.}
|
|
305
|
+
|
|
306
|
+
### Scaling Procedures
|
|
307
|
+
|
|
308
|
+
| Trigger | Action | Rollback | Owner |
|
|
309
|
+
|---------|--------|----------|-------|
|
|
310
|
+
| {trigger} | {action} | {rollback} | {team} |
|
|
311
|
+
|
|
312
|
+
### Failover Procedures
|
|
313
|
+
|
|
314
|
+
| Scenario | Detection | Response | RTO |
|
|
315
|
+
|----------|-----------|----------|-----|
|
|
316
|
+
| {scenario} | {detection} | {response steps} | {rto} |
|
|
317
|
+
|
|
318
|
+
### Incident Response
|
|
319
|
+
|
|
320
|
+
| Severity | Notification | Escalation | Runbook |
|
|
321
|
+
|----------|-------------|------------|---------|
|
|
322
|
+
| P1 | {channel} | {escalation path} | {link} |
|
|
323
|
+
| P2 | {channel} | {escalation path} | {link} |
|
|
324
|
+
|
|
325
|
+
### Rollback Procedures
|
|
326
|
+
|
|
327
|
+
| Change Type | Rollback Method | Verification | Duration |
|
|
328
|
+
|-------------|----------------|-------------|----------|
|
|
329
|
+
| {type} | {method} | {verification} | {estimate} |
|
|
330
|
+
|
|
331
|
+
## 12. Requirements Summary
|
|
332
|
+
|
|
333
|
+
### Infrastructure Requirements
|
|
334
|
+
|
|
335
|
+
| ID | Description | Priority | Status |
|
|
336
|
+
|----|------------|----------|--------|
|
|
337
|
+
| IR-001 | {description} | {Must-Have/Should-Have/Nice-to-Have} | {Draft/Approved} |
|
|
338
|
+
| IR-002 | {description} | {Must-Have/Should-Have/Nice-to-Have} | {Draft/Approved} |
|
|
339
|
+
|
|
340
|
+
### Operational Requirements
|
|
341
|
+
|
|
342
|
+
| ID | Description | Priority | Status |
|
|
343
|
+
|----|------------|----------|--------|
|
|
344
|
+
| OR-001 | {description} | {Must-Have/Should-Have/Nice-to-Have} | {Draft/Approved} |
|
|
345
|
+
| OR-002 | {description} | {Must-Have/Should-Have/Nice-to-Have} | {Draft/Approved} |
|
|
346
|
+
|
|
347
|
+
### Security Requirements
|
|
348
|
+
|
|
349
|
+
| ID | Description | Priority | Status |
|
|
350
|
+
|----|------------|----------|--------|
|
|
351
|
+
| SR-001 | {description} | {Must-Have/Should-Have/Nice-to-Have} | {Draft/Approved} |
|
|
352
|
+
| SR-002 | {description} | {Must-Have/Should-Have/Nice-to-Have} | {Draft/Approved} |
|
|
353
|
+
|
|
354
|
+
## 13. Open Questions
|
|
355
|
+
|
|
356
|
+
- [ ] {Unresolved question}
|