@thierrynakoa/fire-flow 10.0.0 → 12.2.1
This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
- package/.claude-plugin/plugin.json +8 -8
- package/ARCHITECTURE-DIAGRAM.md +7 -4
- package/COMMAND-REFERENCE.md +33 -13
- package/DOMINION-FLOW-OVERVIEW.md +581 -421
- package/QUICK-START.md +3 -3
- package/README.md +101 -44
- package/TROUBLESHOOTING.md +264 -264
- package/agents/fire-executor.md +200 -116
- package/agents/fire-fact-checker.md +276 -276
- package/agents/fire-phoenix-analyst.md +394 -0
- package/agents/fire-planner.md +145 -53
- package/agents/fire-project-researcher.md +155 -155
- package/agents/fire-research-synthesizer.md +166 -166
- package/agents/fire-researcher.md +144 -59
- package/agents/fire-roadmapper.md +215 -203
- package/agents/fire-verifier.md +247 -65
- package/agents/fire-vision-architect.md +381 -0
- package/commands/fire-0-orient.md +476 -476
- package/commands/fire-1a-new.md +216 -0
- package/commands/fire-1b-research.md +210 -0
- package/commands/fire-1c-setup.md +254 -0
- package/commands/{fire-1a-discuss.md → fire-1d-discuss.md} +35 -7
- package/commands/fire-3-execute.md +55 -2
- package/commands/fire-4-verify.md +61 -0
- package/commands/fire-5-handoff.md +2 -2
- package/commands/fire-6-resume.md +37 -2
- package/commands/fire-add-new-skill.md +2 -2
- package/commands/fire-autonomous.md +20 -3
- package/commands/fire-brainstorm.md +1 -1
- package/commands/fire-complete-milestone.md +2 -2
- package/commands/fire-cost.md +183 -0
- package/commands/fire-dashboard.md +2 -2
- package/commands/fire-debug.md +663 -663
- package/commands/fire-loop-resume.md +2 -2
- package/commands/fire-loop-stop.md +1 -1
- package/commands/fire-loop.md +1168 -1168
- package/commands/fire-map-codebase.md +3 -3
- package/commands/fire-new-milestone.md +356 -356
- package/commands/fire-phoenix.md +603 -0
- package/commands/fire-reflect.md +235 -235
- package/commands/fire-research.md +246 -246
- package/commands/fire-search.md +1 -1
- package/commands/fire-skills-diff.md +3 -3
- package/commands/fire-skills-history.md +3 -3
- package/commands/fire-skills-rollback.md +7 -7
- package/commands/fire-skills-sync.md +5 -5
- package/commands/fire-test.md +9 -9
- package/commands/fire-todos.md +1 -1
- package/commands/fire-update.md +5 -5
- package/hooks/hooks.json +16 -16
- package/hooks/run-hook.sh +8 -8
- package/hooks/run-session-end.sh +7 -7
- package/hooks/session-end.sh +90 -90
- package/hooks/session-start.sh +1 -1
- package/package.json +4 -2
- package/plugin.json +7 -7
- package/references/metrics-and-trends.md +1 -1
- package/skills-library/SKILLS-INDEX.md +588 -588
- package/skills-library/_general/methodology/AUTONOMOUS_ORCHESTRATION.md +182 -0
- package/skills-library/_general/methodology/BACKWARD_PLANNING_INTERVIEW.md +307 -0
- package/skills-library/_general/methodology/CIRCUIT_BREAKER_INTELLIGENCE.md +163 -0
- package/skills-library/_general/methodology/CONTEXT_ROTATION.md +151 -0
- package/skills-library/_general/methodology/DEAD_ENDS_SHELF.md +188 -0
- package/skills-library/_general/methodology/DESIGN_PHILOSOPHY_ENFORCEMENT.md +152 -0
- package/skills-library/_general/methodology/INTERNAL_CONSISTENCY_AUDIT.md +212 -0
- package/skills-library/_general/methodology/LIVE_BREADCRUMB_PROTOCOL.md +242 -0
- package/skills-library/_general/methodology/PHOENIX_REBUILD_METHODOLOGY.md +251 -0
- package/skills-library/_general/methodology/QUALITY_GATES_AND_VERIFICATION.md +157 -0
- package/skills-library/_general/methodology/RELIABILITY_PREDICTION.md +104 -0
- package/skills-library/_general/methodology/REQUIREMENTS_DECOMPOSITION.md +155 -0
- package/skills-library/_general/methodology/SELF_TESTING_FEEDBACK_LOOP.md +143 -0
- package/skills-library/_general/methodology/STACK_COMPATIBILITY_MATRIX.md +178 -0
- package/skills-library/_general/methodology/TIERED_CONTEXT_ARCHITECTURE.md +118 -0
- package/skills-library/_general/methodology/ZERO_FRICTION_CLI_SETUP.md +312 -0
- package/skills-library/_general/methodology/autonomous-multi-phase-build.md +133 -0
- package/skills-library/_general/methodology/claude-md-archival.md +280 -0
- package/skills-library/_general/methodology/debug-swarm-researcher-escape-hatch.md +240 -240
- package/skills-library/_general/methodology/git-worktrees-parallel.md +232 -0
- package/skills-library/_general/methodology/llm-judge-memory-crud.md +241 -0
- package/skills-library/_general/methodology/multi-project-autonomous-build.md +360 -0
- package/skills-library/_general/methodology/shell-autonomous-loop-fixplan.md +238 -238
- package/skills-library/_general/patterns-standards/GOF_DESIGN_PATTERNS_FOR_AI_AGENTS.md +358 -0
- package/skills-library/methodology/BREATH_BASED_PARALLEL_EXECUTION.md +1 -1
- package/skills-library/methodology/RESEARCH_BACKED_WORKFLOW_UPGRADE.md +1 -1
- package/skills-library/methodology/SABBATH_REST_PATTERN.md +1 -1
- package/templates/ASSUMPTIONS.md +1 -1
- package/templates/BLOCKERS.md +1 -1
- package/templates/DECISION_LOG.md +1 -1
- package/templates/phase-prompt.md +1 -1
- package/templates/phoenix-comparison.md +80 -0
- package/version.json +2 -2
- package/workflows/handoff-session.md +1 -1
- package/workflows/new-project.md +2 -2
- package/commands/fire-1-new.md +0 -281
package/skills-library/_general/methodology/PHOENIX_REBUILD_METHODOLOGY.md
@@ -0,0 +1,251 @@

---
name: PHOENIX_REBUILD_METHODOLOGY
category: methodology
version: 1.0.0
contributed: 2026-03-06
contributor: dominion-flow
last_updated: 2026-03-06
tags: [phoenix, rebuild, refactor, intent-extraction, vibe-code, technical-debt, reverse-engineering]
difficulty: hard
sources:
  - "Fred Brooks — No Silver Bullet (1986) — Essential vs Accidental Complexity"
  - "Martin Fowler — Refactoring: Improving the Design of Existing Code"
  - "Michael Feathers — Working Effectively with Legacy Code"
  - "Refactoring.Guru — Design Pattern Catalog"
---

# Phoenix Rebuild Methodology

> **Core insight:** Don't refactor the mess — reverse-engineer the INTENT, then build clean from scratch. A phoenix burns the old and rises new. The ashes carry the knowledge; the new form carries none of the accidental complexity.

---

## 1. How to Extract Intent from Messy Code

### Reading Order (Critical — Do NOT Read Code First)

```
1. README / docs → What the developer SAID the app does
2. Route / endpoint files → The API surface reveals feature boundaries
3. Database schema/models → The data model reveals domain concepts
4. Tests (if any) → Tests encode intended behavior
5. Git commit messages → The narrative of how the code evolved
6. The code itself → LAST — read implementation after you understand intent
```

**Why this order:** Reading code first biases toward "what it does" instead of "what it was meant to do." The surrounding artifacts reveal intent more clearly than the tangled implementation.

### Intent Extraction Patterns

| Pattern | What to Look For | What It Reveals |
|---------|-----------------|-----------------|
| **Naming Intent** | Function/variable names vs their behavior | Gap between name and behavior = accidental complexity |
| **Comment Intent** | Comments saying "should", "TODO", "HACK", "FIXME" | Unfulfilled intent — the developer knew what they wanted |
| **Test Intent** | What tests assert (if tests exist) | The behaviors the developer cared about verifying |
| **Error Handling Intent** | What errors are caught vs thrown | What the developer thought could go wrong |
| **Commit Message Intent** | "fix:", "feat:", "hack:" prefixes | The sequence of intentions over time |
| **Dead Code Intent** | Commented-out code, unreachable branches | Abandoned attempts — replaced or forgotten? |
| **Copy-Paste Intent** | Duplicated blocks with minor variations | "I needed this to work like THAT but slightly different" |
| **Magic Number Intent** | Hardcoded values with no explanation | A business rule or config never extracted |
| **Import Intent** | Imported but unused libraries | Features planned but never implemented |
| **Overengineering Intent** | Complex abstractions wrapping simple logic | The developer anticipated needs that never materialized |

### The "Squint Test"

For any module, ask: **"If I squint past the implementation mess, what is this module's job in ONE sentence?"**

If you cannot answer in one sentence, the module violates Single Responsibility and should be split during the rebuild. The squint test produces the "intent statement" for each feature.

---

## 2. Accidental vs Essential Complexity (Fred Brooks)

### Essential Complexity — Keep and Rewrite Clean

Complexity inherent to the PROBLEM itself. It cannot be removed without changing what the application does.

**Examples:**
- Tax calculation rules (complex because taxes are complex)
- Multi-currency arithmetic (complex because currencies are complex)
- Role-based permissions with inheritance (complex because access control is nuanced)
- Content repurposing logic (complex because each platform has different format requirements)

**During rebuild:** Preserve ALL essential complexity. Rewrite it cleaner, add tests, add comments — but do NOT simplify away the business rules.

### Accidental Complexity — Remove Entirely

Complexity introduced by the IMPLEMENTATION, not the problem. It CAN and SHOULD be eliminated.

**Detection heuristics:**
```
IF removing the pattern changes WHAT the app does → ESSENTIAL (keep)
IF removing the pattern only changes HOW it does it → ACCIDENTAL (remove)
IF the pattern exists because "that's how the tutorial did it" → ACCIDENTAL
IF the pattern exists because "the business rule requires it" → ESSENTIAL
IF the pattern appears in "common anti-patterns" lists → likely ACCIDENTAL
```

**Common accidental complexity indicators:**
- Global state mutations instead of state management
- Callback nesting instead of async/await
- Raw SQL strings instead of parameterized queries / ORM
- No separation between routes, business logic, and data access
- Duplicated code instead of shared functions
- Inconsistent error handling (some try/catch, some not)
- No type safety (everything is `any`)
- Hardcoded configuration values
- No environment separation (dev/staging/prod)

---

## 3. Edge Case Preservation Protocol

### The 5-Step Protocol

Every edge case in the original code must be:

```
1. IDENTIFIED — Found in the code (conditional branches, special cases)
2. DOCUMENTED — Recorded in INTENT.md with WHY it exists
3. CLASSIFIED — Is this edge case still needed in the rebuild?
4. CARRIED — If needed, it MUST appear in the rebuild BLUEPRINT
5. VERIFIED — The rebuilt project must handle this edge case (test it)
```

### Where Edge Cases Hide (Top 10 Locations)

```
1. if/else branches with non-obvious conditions
2. try/catch blocks with specific error type handling
3. Database query filters with multiple conditions
4. Validation rules with specific ranges or patterns
5. Timeout/retry logic
6. Date/time/timezone handling
7. Currency/precision arithmetic (rounding rules)
8. Null/undefined guards (especially nested: user?.profile?.settings?.theme)
9. Migration/compatibility code (backwards-compat shims)
10. Feature flags / A/B test branches
```

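
These hiding spots can be swept mechanically before manual review. A minimal sketch, assuming Python is acceptable for tooling; the marker list and file extensions are illustrative assumptions to tune per stack, not part of the protocol:

```python
import re
from pathlib import Path

# Illustrative markers mirroring the hiding spots above (assumption, tune per stack)
EDGE_CASE_MARKERS = [
    r"\bTODO\b|\bFIXME\b|\bHACK\b",      # unfulfilled intent
    r"\bcatch\b|\bexcept\b",             # specific error-type handling
    r"retry|backoff|setTimeout",         # timeout/retry logic
    r"timezone|utcnow|getTime",          # date/time handling
    r"\?\.",                             # nested null/undefined guards
]

def scan_for_edge_cases(root, exts=(".py", ".js", ".ts")):
    """Count edge-case marker hits per file; review the highest counts first."""
    hits = {}
    for path in Path(root).rglob("*"):
        if path.suffix not in exts or not path.is_file():
            continue
        text = path.read_text(errors="ignore")
        count = sum(len(re.findall(m, text)) for m in EDGE_CASE_MARKERS)
        if count:
            hits[str(path)] = count
    return dict(sorted(hits.items(), key=lambda kv: -kv[1]))
```

The output is only a triage order: a file with many hits gets IDENTIFIED/DOCUMENTED first, not automatically kept.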
### Kill or Keep Decision Framework

```
KEEP if:
- It handles a real business scenario (even rare ones)
- It prevents data corruption or data loss
- It handles an external API quirk (rate limits, format variations)
- It was added in response to a bug report (check git blame)
- Removing it would change user-visible behavior

KILL if:
- It handles a bug in code that is being rewritten anyway
- It works around a library limitation that no longer exists
- It exists because of poor architecture (which the rebuild fixes)
- It is dead code (no execution path reaches it)
- It was a temporary hack with a TODO to remove
```

---

## 4. Vibe-Coder Anti-Pattern Replacement Map

| Anti-Pattern | Detection Signal | Production Replacement |
|-------------|-----------------|----------------------|
| **God File** | Single file > 500 LOC with mixed concerns | Split by responsibility: routes, services, models, utils |
| **Copy-Paste Variation** | Near-identical blocks (> 80% similar) | Extract shared function with parameters for variations |
| **Callback Hell** | Nested callbacks > 3 levels deep | async/await with proper error handling |
| **Global State Spaghetti** | `global`, `window.`, module-level mutation | State management (Redux, Zustand, Context) or DI |
| **No Error Handling** | No try/catch, uncaught promise rejections | Error middleware + typed error classes + logging |
| **Inline Everything** | SQL in route handlers, HTML in logic | Layered architecture: controller → service → repository |
| **Magic Strings/Numbers** | Hardcoded values throughout | Named constants, enums, config files |
| **No Types** | `any` everywhere, no interfaces | TypeScript strict mode with proper type definitions |
| **No Tests** | Zero test files | Test-first rebuild (write tests from INTENT.md before code) |
| **Security Ignorance** | Hardcoded secrets, no input validation, raw SQL | .env, validation schemas, parameterized queries, auth middleware |
| **No Config Separation** | URLs, ports, keys mixed in code | Environment-specific config with validation |
| **Monolith Route File** | All routes in one file | Route module per resource with controller pattern |

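
To make the Copy-Paste Variation row concrete, a hypothetical before/after sketch; the notification example and field names are invented for illustration:

```python
# BEFORE: near-identical blocks, each edited separately when requirements change
def email_greeting(user, message):
    return f"Hi {user['name']},\n\n{message}\n\n- The MyApp Team"

def sms_greeting(user, message):
    return f"Hi {user['name']}: {message}"

# AFTER: one shared function; the variation becomes data, not duplicated code
TEMPLATES = {
    "email": "Hi {name},\n\n{message}\n\n- The MyApp Team",
    "sms": "Hi {name}: {message}",
}

def greeting(user, message, channel="email"):
    """Shared function: per-channel differences live in TEMPLATES."""
    return TEMPLATES[channel].format(name=user["name"], message=message)
```

Adding a new channel now means adding one template entry, not pasting a third block.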

---

## 5. The Intent Graph

A three-column mapping that serves as the translation layer between the messy source and the clean target:

```
CODE (what exists) → INTENT (what was meant) → CLEAN (what to build)
```

### Why the Intent Graph Matters

It prevents two failure modes:

**Failure Mode 1: Literal Translation**
Rebuilding the mess exactly as it is, just with better formatting. The intent graph forces you to go THROUGH intent, not directly from code to code.

**Failure Mode 2: Intent Loss**
Rebuilding something clean but missing features because the developer's hidden knowledge was not extracted. The graph forces you to document every code pattern before discarding it.

### Building the Intent Graph

```
FOR each source code module:
  1. Read the code → document WHAT it does (observable behavior)
  2. Read surrounding context → document WHY (intent)
  3. Look up best practice → document HOW to do it right
  4. Record all three columns in INTENT-GRAPH.md
  5. Cross-check: does the "clean" column preserve all behaviors
     from the "code" column? If not, something was lost.
```

### Example Intent Graph Entries

| Source Code (Messy) | Developer Intent | Clean Implementation |
|---------------------|-----------------|---------------------|
| 200-line auth middleware with inline SQL | Role-based access control | passport.js + RBAC middleware + DB-backed roles |
| Global error handler that catches everything | Don't crash the app | Express error middleware + typed error classes + Sentry |
| 15 API routes in one file | CRUD for users + products + orders | Separate route modules + controller layer + service layer |
| Hardcoded `PORT=3000` in 5 files | Environment-specific config | dotenv + typed config loader + validation |
| Copy-pasted validation in every route | Input validation | Zod/Joi schema per endpoint, shared validation middleware |
| `setTimeout(fn, 5000)` retry loops | Handle transient API failures | Exponential backoff utility with configurable retries |
| `if (user.role === 'admin' \|\| user.id === 1)` | Admin access + superuser bypass | RBAC with permissions table + superuser flag in DB |

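
The `setTimeout` row above gives a feel for what the clean column becomes in practice. A hedged sketch of such a backoff utility, not this package's actual implementation; the retriable-error set and delay values are illustrative:

```python
import random
import time

def with_backoff(fn, max_retries=3, base_delay=0.5, retriable=(TimeoutError,)):
    """Retry `fn` on declared transient errors with exponential backoff + jitter.

    Replaces ad-hoc fixed-delay retry loops: the delay doubles per attempt
    (0.5s, 1s, 2s, ...) and non-transient errors propagate immediately.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retriable:
            if attempt == max_retries:
                raise                    # retries exhausted: surface the error
            delay = base_delay * (2 ** attempt)
            time.sleep(delay + random.uniform(0, delay * 0.1))   # jitter
```

The configurable `retriable` tuple is the intent the magic `setTimeout` never expressed: only known-transient failures deserve a retry.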
---

## 6. Rebuild Order Strategy

When rebuilding from INTENT.md, build in this order:

```
1. FOUNDATION — Project scaffold, config, types, database schema
2. CORE — Highest-uniqueness features (CRITICAL and HIGH)
3. SUPPORT — Medium-uniqueness features
4. STANDARD — Low-uniqueness and boilerplate (often auto-generated)
5. INTEGRATION — Wire everything together, cross-module flows
6. HARDENING — Error handling, logging, security, edge cases
7. TESTING — Tests written from INTENT.md assertions
8. DOCUMENTATION — README, API docs, deployment guide
```

**Rationale:** Build what is UNIQUE first. Boilerplate is easiest to add later and lowest risk. If context runs out, you want the unique business logic done, not the boilerplate. This is the opposite of how vibe coders build (scaffold first, unique logic last — which is why their unique logic is always the messiest part).

---

## 7. Feature Uniqueness Classification

| Score | Definition | Rebuild Strategy |
|-------|-----------|-----------------|
| **BOILERPLATE** | Standard framework code, no custom logic | Regenerate from best practices (don't even read the original) |
| **LOW** | Minor customization of standard patterns | Use the standard pattern, apply customizations from INTENT.md |
| **MEDIUM** | Meaningful business logic using common patterns | Rewrite with proper architecture, preserve all business rules |
| **HIGH** | Custom algorithms, domain-specific rules | Carefully extract the logic, rewrite with tests, preserve edge cases |
| **CRITICAL** | Core business differentiator, proprietary logic | Extract verbatim, wrap in clean architecture, comprehensive tests |

---

## When Agents Should Reference This Skill

- **fire-phoenix-analyst:** Primary reference — use the reading order, intent extraction patterns, squint test, and uniqueness classification
- **fire-phoenix (command):** Reference the rebuild order strategy when creating the phase breakdown from INTENT.md
- **fire-planner:** When planning rebuild phases, use uniqueness scores to prioritize task order
- **fire-executor:** When rebuilding modules, check the anti-pattern map to avoid reintroducing accidental complexity
- **fire-verifier:** When verifying a rebuild, check that accidental complexity items are absent and edge cases are preserved
- **fire-researcher:** When researching alternatives for a stuck rebuild, check the intent graph for the original intent

package/skills-library/_general/methodology/QUALITY_GATES_AND_VERIFICATION.md
@@ -0,0 +1,157 @@

---
name: QUALITY_GATES_AND_VERIFICATION
category: methodology
description: Industry-proven verification patterns — tiered gates, risk-based testing, error budgets, shift-left, and the 7 principles of testing applied to AI-assisted development
version: 1.0.0
tags: [quality-gates, verification, testing, risk-based-testing, error-budget, shift-left]
sources:
  - "Google SRE — Error Budget Policy (sre.google)"
  - "Netflix — Kayenta Automated Canary Analysis"
  - "SonarSource, Dynatrace, LinearB — Quality Gate frameworks"
  - "Hans-Petter Halvorsen — Software Development: A Practical Approach"
  - "ISTQB — 7 Principles of Testing"
  - "CRISP-ML(Q) — Phase-level risk registers"
---

# Quality Gates and Verification

> **Core insight:** A gate that halts progress is working correctly. Distinguish between "agent failed" and "gate blocked advancement pending better input."

---

## 1. Tiered Verification (Shift-Left)

Structure verification as two tiers — never run expensive checks when the cheap ones already fail:

### Tier 1: Fast Gate (seconds, always run)
- Syntax validation / linting
- File existence checks
- Schema conformance
- Import resolution
- Type checking (if applicable)

### Tier 2: Slow Gate (minutes, run only when Tier 1 passes)
- Integration tests
- End-to-end validation
- Performance benchmarks
- Security scans
- Cross-phase contract verification

**Why:** A build that doesn't compile will never pass integration tests. Running integration tests on broken syntax wastes tokens and time.

**Agent action:** fire-verifier should run the Tier 1 checks first. If any fail, report immediately without running Tier 2. This alone can save 50%+ of verification time on failed phases.

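
One possible shape for that short-circuit, as a sketch; the tier commands are placeholders to be replaced with the project's real linters and test suites:

```python
import subprocess

# Placeholder commands: substitute the project's actual linters and test runners
TIER_1 = ["ruff check .", "python -m compileall -q ."]       # seconds
TIER_2 = ["pytest tests/integration", "pytest tests/e2e"]    # minutes

def run_gate(commands):
    """Run checks in order; return the first failing command, or None if all pass."""
    for cmd in commands:
        if subprocess.run(cmd, shell=True).returncode != 0:
            return cmd
    return None

def verify():
    """Shift-left ordering: Tier 2 never runs while Tier 1 is failing."""
    failed = run_gate(TIER_1)
    if failed:
        return f"TIER 1 FAILED: {failed}"    # report immediately, skip Tier 2
    failed = run_gate(TIER_2)
    return f"TIER 2 FAILED: {failed}" if failed else "ALL GATES PASSED"
```

Stopping at the first failure also keeps the failure report focused on one actionable command instead of a wall of cascading errors.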

---

## 2. Risk-Based Testing

Test scope is a function of two variables:

```
Test Priority = Likelihood of Failure × Impact of Failure
```

| Quadrant | Likelihood | Impact | Strategy |
|----------|-----------|--------|----------|
| **Test first, test most** | High | High | Full verification, manual review |
| **Test thoroughly** | Low | High | Targeted deep tests |
| **Test efficiently** | High | Low | Automated regression |
| **Sample or defer** | Low | Low | Spot check, trust |

**Agent action:** Before verification, classify each changed area by this matrix. Don't test everything equally — that's wasteful. Don't skip testing on "small changes" — that's dangerous.

### Change Impact Scoping
```
Config-only change → verify config loads, skip code tests
Backend-only change → verify API + DB, skip frontend/E2E
Frontend-only change → verify rendering + UX, skip backend
Full-stack change → full verification
Test-only change → verify tests pass, minimal code review
```


---

## 3. Error Budget for Retry Decisions

Borrowed from Google SRE: every task has a finite retry budget.

```
Task error budget = max_retries (default: 2)

After each retry:
  budget -= 1

IF budget == 0:
  STOP retrying
  Route to: research → re-plan → or escalate

NEVER: retry the same approach a 3rd time
```

**Why this works:** An agent that retries the same failing approach 5 times is burning tokens, not solving problems. Two retries catch transient failures. Beyond that, the approach itself is wrong.

**Integration with circuit breaker:** The error budget is the per-task trip threshold. When it is exhausted, the task-level breaker opens and routes to research.

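
The budget loop, sketched in Python; the routing-target string is a placeholder for the skill's research/re-plan/escalate flow:

```python
def run_with_error_budget(task, max_retries=2):
    """Attempt `task` until its retry budget is exhausted, then re-route.

    `task` is a callable returning (success, result). Once the budget hits
    zero the same approach is never tried again; the failure is routed
    onward instead of retried.
    """
    budget = max_retries
    success, result = task()
    while not success and budget > 0:
        budget -= 1                              # each retry spends budget
        success, result = task()
    if success:
        return ("done", result)
    return ("route", "research -> re-plan -> escalate")
```

With the default budget the task runs at most three times total (initial attempt plus two retries), which is exactly the "never a 3rd retry" rule above.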

---

## 4. The 7 Principles of Testing (Applied to AI Development)

From ISTQB, adapted for AI-agent workflows:

| Principle | Original | AI-Agent Translation |
|-----------|----------|---------------------|
| **1. Testing shows the presence of bugs** | Testing reduces the probability of undiscovered defects but isn't proof of correctness | Verification catches issues, but passing doesn't guarantee production-ready |
| **2. Exhaustive testing is impossible** | Test based on risk assessment, not completeness | Scope verification to the change impact, don't verify everything |
| **3. Early testing** | Start testing as early as possible | Verify plan coherence before execution, not after |
| **4. Defect clustering** | A small number of modules contain most bugs | Track which phases/tasks cluster failures — invest guards there |
| **5. Pesticide paradox** | The same tests stop finding new bugs | Rotate verification approaches; a static checklist misses novel failures |
| **6. Testing is context-dependent** | Different software needs different testing | Backend change ≠ frontend change ≠ config change → different checks |
| **7. Absence-of-errors fallacy** | Bug-free software can still be unusable | Code that passes all checks but doesn't meet user requirements is still wrong |

---

## 5. Phase-Level Risk Registers (CRISP-ML(Q))

Before each major phase, generate a short risk assessment:

```markdown
## Phase {N} Risk Register

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|------------|
| {most likely failure mode} | {H/M/L} | {H/M/L} | {specific action} |
| {second most likely} | {H/M/L} | {H/M/L} | {specific action} |
| {third most likely} | {H/M/L} | {H/M/L} | {specific action} |
```

**This is 5 lines, not a document.** The point is to think about failure before executing, not to create paperwork.

---

## 6. Definition of Ready / Definition of Done

Two gates that prevent wasted work:

### Definition of Ready (before starting a task)
- [ ] Acceptance criteria are clear and verifiable
- [ ] Dependencies are resolved or documented
- [ ] Scope is bounded (files, tools, operations)
- [ ] Required context is available (MEMORY.md, prior phase output)

### Definition of Done (before declaring complete)
- [ ] All Tier 1 checks pass
- [ ] Tier 2 checks pass (if applicable to the scope)
- [ ] No regressions introduced
- [ ] RECORD.md updated with what was done
- [ ] Agent confidence ≥ 70% on the output

**Rule:** If the DoR isn't met, send the task back for clarification — don't start it. If the DoD isn't met, the task is not done — don't advance.

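
The DoR checklist can be enforced mechanically before a task starts. A sketch; the task-dict field names are invented for illustration, not this package's actual schema:

```python
def unmet_dor_items(task):
    """Return the unmet Definition-of-Ready items; an empty list = safe to start."""
    checks = {
        "acceptance criteria are clear and verifiable": bool(task.get("acceptance_criteria")),
        "dependencies are resolved or documented": bool(task.get("deps_resolved")),
        "scope is bounded": bool(task.get("scope")),
        "required context is available": bool(task.get("context")),
    }
    return [item for item, ok in checks.items() if not ok]
```

If the returned list is non-empty, the task is routed back for clarification with those items as the message, rather than being started.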

---

## When Agents Should Reference This Skill

- **fire-verifier:** Apply tiered verification (Tier 1 before Tier 2) and risk-based scoping
- **fire-planner:** Include a risk register in the plan output; define DoR/DoD per task
- **fire-executor:** Check the DoR before starting; track the error budget per task
- **fire-autonomous:** Use the error budget to decide retry vs. escalate

package/skills-library/_general/methodology/RELIABILITY_PREDICTION.md
@@ -0,0 +1,104 @@

---
name: RELIABILITY_PREDICTION
category: methodology
description: Predict phase reliability before execution using implied scenario detection, sensitivity analysis, and constrained models — catch architectural mismatches before they cost tokens
version: 1.0.0
tags: [reliability, prediction, implied-scenarios, sensitivity-analysis, quality-gates]
sources:
  - "Rodrigues, Rosenblum, Uchitel — Reliability Prediction in Model-Driven Development (UCL/Imperial, 2005)"
  - "CRISP-ML(Q) — Mercedes-Benz AG + TU Berlin, 2020"
---

# Reliability Prediction for AI-Assisted Development

> **Core insight:** "Composition reveals what specification omits." When you connect agents, phases, or tools together, the system produces behaviors no individual specification predicted. Detect these early or pay later.

---

## 1. Two Risk Dimensions Per Phase

Every phase has two independent failure probabilities — assess both before executing:

| Dimension | Question | Example |
|-----------|----------|---------|
| **Transition probability** | If this phase succeeds, does it cleanly advance to the next? | "Auth phase done, but the API routes phase expects a different token format" |
| **Component reliability** | What's the probability this agent/tool produces correct output? | "LLM generating boilerplate = 95% reliable. LLM designing a novel algorithm = 60% reliable" |

**Agent action:** Before executing a task, estimate both. If component reliability < 60%, research first. If the transition probability is unclear, verify the interface contract with the next phase.

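
Why both numbers matter: chained phases multiply. A sketch with illustrative estimates (the independence assumption and the numbers themselves are simplifications for illustration):

```python
def pipeline_reliability(phases):
    """P(pipeline succeeds) ~= product over phases of
    component_reliability * transition_probability, assuming independence."""
    p = 1.0
    for component, transition in phases:
        p *= component * transition
    return p

# Illustrative estimates: a single weak phase dominates the whole chain
phases = [
    (0.95, 0.95),   # scaffolding (boilerplate, reliable)
    (0.60, 0.90),   # novel algorithm design (research-first territory)
    (0.95, 0.90),   # API wiring
]
```

With these numbers the chain lands around 0.42, below a coin flip despite two strong phases, which is why the weak phase is the one to research first.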

---

## 2. Implied Scenario Detection

After any multi-agent or multi-phase interaction, check for unspecified behaviors:

### Positive Implied Scenarios (missing specification)
- The system produced a correct behavior not in the plan
- **Action:** Add it to the phase spec. Document it in PATTERNS.md
- Example: "The auth middleware also handles rate limiting — not planned, but correct and useful"

### Negative Implied Scenarios (architecture mismatch)
- The system permits behavior the specification forbids
- **Action:** Add a constraint (validation gate, type check, pre-condition) — don't patch the agent
- Example: "The executor can write to files outside the declared scope — add scope enforcement"

> **Key finding:** Adding a single constraint improved system reliability from 64.9% to 86.2% in the source study. Constraints beat corrections.

---

## 3. Sensitivity Analysis — Where to Invest Guards
|
|
50
|
+
|
|
51
|
+
Not all phase failures are equal. Rank by **downstream impact**, not frequency:
|
|
52
|
+
|
|
53
|
+
```
|
|
54
|
+
For each phase that has ever failed:
|
|
55
|
+
1. Fix that phase's reliability to 0% (assume it fails)
|
|
56
|
+
2. Estimate: how many downstream phases break?
|
|
57
|
+
3. Estimate: what's the rework cost?
|
|
58
|
+
4. Rank phases by total downstream damage
|
|
59
|
+
|
|
60
|
+
Result: The phase with the highest damage multiplier
|
|
61
|
+
gets the most verification investment
|
|
62
|
+
```
|
|
63
|
+
|
|
64
|
+
**Common surprise:** The rarest failure with the highest downstream cost should get the most guard investment. A planning failure that happens 5% of the time but invalidates 3 downstream phases is worse than an execution failure that happens 20% of the time but is locally contained.
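The ranking loop above can be made concrete. A minimal sketch, assuming phases form a dependency graph and rework costs are rough per-phase estimates; the phase names and numbers are illustrative:

```python
def downstream(phase, deps):
    """All phases that transitively depend on `phase`. deps maps phase -> prerequisites."""
    hit, frontier = set(), [phase]
    while frontier:
        p = frontier.pop()
        for q, prereqs in deps.items():
            if p in prereqs and q not in hit:
                hit.add(q)
                frontier.append(q)
    return hit

def rank_by_damage(deps, rework_cost):
    """Assume each phase fails outright; rank by total rework cost of everything it breaks."""
    damage = {p: sum(rework_cost[q] for q in downstream(p, deps)) for p in deps}
    return sorted(damage, key=damage.get, reverse=True)

deps = {"plan": [], "auth": ["plan"], "api": ["auth"], "docs": ["plan"]}
cost = {"plan": 1, "auth": 3, "api": 5, "docs": 1}
rank_by_damage(deps, cost)[0]   # → "plan": its failure invalidates auth, api, and docs
```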

---

## 4. Probability Completeness

Every decision diamond must have exhaustive branches. No implicit "otherwise":

```
BAD:  "If verification passes → proceed to handoff"
      (What happens if it fails? Undefined.)

GOOD: "If verification passes → proceed to handoff
       If verification fails with fixable issues → re-execute with gaps
       If verification fails with architectural issues → re-plan
       If verification fails 3 times → dead-end shelf + escalate"
```

**Rule:** If the branches from a decision point don't cover 100% of outcomes, the workflow has a structural defect.

---

## 5. Early Non-Functional Analysis

> "Early evaluation of software properties is important in order to reduce costs before resources have been allocated and decisions have been made."

The highest-leverage phase is **planning** — not because planning is intrinsically valuable, but because:
- Defect caught in planning = 1 iteration to fix
- Same defect caught in execution = 5 iterations
- Caught in production = indefinitely more

**Agent action:** Before executing, verify the plan's non-functional properties: Is the architecture coherent? Are the dependencies resolvable? Is the scope verifiable? These questions cost 30 seconds to check and save hours of rework.

---

## When Agents Should Reference This Skill

- **fire-planner:** Before generating a plan, assess transition probabilities between phases
- **fire-verifier:** After verification, run implied scenario check on multi-agent outputs
- **fire-executor:** Before starting a task with < 60% component reliability, route to research
- **fire-researcher:** When analyzing why a phase failed, use sensitivity analysis to prioritize

@@ -0,0 +1,155 @@

---
name: REQUIREMENTS_DECOMPOSITION
category: methodology
description: Turn vague requirements into testable specifications using utility trees, ATAM tradeoff analysis, and weighted decision matrices — never accept Level 1 input for execution
version: 1.0.0
tags: [requirements, decomposition, utility-tree, atam, weighted-decision-matrix, tradeoffs]
sources:
  - "CMU SEI — How to Address Poorly-Defined Requirements in Software System Design (Nov 2025)"
  - "Dr. Lori Flynn, Lyndsi Hughes — Carnegie Mellon University Software Engineering Institute"
  - "IEEE — Software Quality Definition"
  - "ATAM — Architecture Tradeoff Analysis Method"
---

# Requirements Decomposition

> **Core insight:** "Never accept a vague requirement as input to any agent task — always decompose to Level 4 before beginning execution." A requirement you can't test is a requirement you can't verify.

---

## 1. The Four-Level Decomposition (Utility Tree)

Every requirement must be drilled from vague to testable:

```
Level 1: Quality Attribute (vague)
  "Good security"

Level 2: Sub-factors (decomposed concerns)
  Data Protection, Auth, Security Logging, Compliance

Level 3: Refined Sub-factors (actionable concerns)
  Encrypt data at rest, Restrict access, Log unauthorized access

Level 4: Requirements (specific, testable, implementable)
  "FIPS 140-2 validated encryption", "RBAC with role hierarchy",
  "Failed auth attempts logged with IP + timestamp"
```

**Rule:** You cannot start execution on a Level 1 or Level 2 requirement. If a user says "make it secure" or "add good error handling," decompose FIRST.

**Agent action:** fire-planner decomposes every requirement to Level 4 before generating BLUEPRINT tasks. Each Level 4 entry must have a corresponding test/verification criterion.
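A minimal sketch of how the Level 4 rule could be enforced in code; the `Requirement` shape and field names are assumptions, not fire-flow internals:

```python
from dataclasses import dataclass

@dataclass
class Requirement:
    level: int            # 1 = quality attribute (vague) ... 4 = testable requirement
    text: str
    test_criterion: str = ""

def ready_for_execution(reqs) -> bool:
    """The gate: every requirement is Level 4 AND carries a verification criterion."""
    return all(r.level == 4 and r.test_criterion for r in reqs)

ready_for_execution([Requirement(1, "Good security")])   # → False: decompose first
ready_for_execution([Requirement(4, "FIPS 140-2 validated encryption",
                                 test_criterion="ciphers restricted to FIPS-approved list")])  # → True
```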

---

## 2. Tradeoff Analysis (ATAM)

Requirements exist in tension. You cannot maximize everything:

| Tension | Example |
|---------|---------|
| Security vs. Performance | Encryption adds latency |
| Flexibility vs. Simplicity | More config options = more complexity |
| Speed-to-market vs. Quality | Shortcuts now = rework later |
| Feature richness vs. Maintainability | More features = more surface area |

**The ATAM goal:** "Elicit, concretize, and prioritize the driving quality attribute requirements."

**Agent action:** When a plan has competing quality attributes, surface the tradeoff explicitly:
```markdown
## Tradeoff: {Attribute A} vs. {Attribute B}

Decision: Prioritize {A} because {reason}
Consequence: {B} will be {specific impact}
Mitigation: {what we'll do to limit the downside}
```

**Never silently resolve a tradeoff.** The user should know what they're trading away.

---

## 3. Weighted Decision Matrix (WDM)

When choosing between approaches, score mathematically:

```
Score = Σ (weight_i × rating_i) for each criterion

Where:
  weight_i = stakeholder priority (weights sum to 1.0)
  rating_i = how well this option satisfies criterion i
```

### Weight Calculation (Rank-to-Linear)
```
weight = 2r / (N(N+1))
where r = priority rank, N = total criteria count
(rank so the top priority gets r = N; the weights then sum to 1.0)
```

### Scaling Convention
- Higher raw value = better → use R directly (coverage %, security score)
- Higher raw value = worse → use 1/R (cost, latency, error rate)
- Normalize to comparable magnitude before multiplying

**Agent action:** When fire-planner or fire-researcher evaluates 2+ approaches, use WDM scoring instead of subjective "I think approach A is better." Present the scored comparison to the user.
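Putting the score formula, the rank-to-linear weights, and the 1/R inversion together. A sketch under stated assumptions: ranks are counted so r = N marks the top priority (which makes the weights sum to 1), and the criteria, ranks, and raw values below are purely illustrative:

```python
def rank_weights(n: int) -> list[float]:
    """Rank-to-linear: rank r gets weight 2r / (n(n+1)) for r = 1..n; weights sum to 1."""
    return [2 * r / (n * (n + 1)) for r in range(1, n + 1)]

def wdm_score(raws, weights, higher_is_better):
    """Invert 'lower is better' raw values via 1/R, then take the weighted sum."""
    ratings = [x if hib else 1.0 / x for x, hib in zip(raws, higher_is_better)]
    return sum(w * x for w, x in zip(weights, ratings))

# Criteria ranked cost (r=1) < coverage (r=2) < security (r=3): weights 1/6, 1/3, 1/2.
weights = rank_weights(3)
option_a = wdm_score([2.0, 0.90, 0.80], weights, [False, True, True])  # cost, coverage, security
option_b = wdm_score([1.0, 0.70, 0.60], weights, [False, True, True])
option_a > option_b   # → True: A's coverage and security outweigh its higher cost
```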

---

## 4. Requirements Handoff Gate

Requirements are ready for execution when ALL of these are true:

- [ ] **Tradeoffs are known** — you understand what you're giving up
- [ ] **Threats to quality are mitigated** — identified and addressed
- [ ] **Requirements are precisely defined** — not vague
- [ ] **Requirements are measurable** — you can test and get a number
- [ ] **Requirements are prioritized** — ranked by importance
- [ ] **Requirements have test criteria** — each requirement maps to a verification step

**If this gate fails:** Send back for clarification. Do NOT start building on vague requirements — that's how Frankenstein projects are born.
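The gate reads naturally as an all-or-nothing check. A minimal sketch; the check names mirror the checklist above, and the return strings are assumptions:

```python
GATE_CHECKS = ("tradeoffs_known", "threats_mitigated", "precisely_defined",
               "measurable", "prioritized", "has_test_criteria")

def gate_decision(checks: dict) -> str:
    """All six checks must pass; otherwise name exactly what needs clarification."""
    missing = [c for c in GATE_CHECKS if not checks.get(c)]
    return "execute" if not missing else "clarify: " + ", ".join(missing)

gate_decision({c: True for c in GATE_CHECKS})   # → "execute"
gate_decision({"measurable": True})             # → "clarify: ..." listing each failed check
```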

---

## 5. Behavioral Discovery (Post-Build Requirements)

Some requirements are discovered after building, not before:

| Discovery Type | Action |
|----------------|--------|
| **New behavior we want** | Add as new requirement, add test |
| **Behavior that violates a requirement** | File as defect, fix |
| **Behavior we consciously accept** | Document as acknowledged risk |
| **Behavior not in spec at all** | Classify: is it a positive or a negative implied scenario? |

This maps to the implied scenario detection pattern from RELIABILITY_PREDICTION.md — composition reveals behaviors no individual specification predicted.

---

## 6. Scenario Elicitation for AI Agents

When gathering requirements from users who don't know exactly what they want:

### Seed Scenario Technique
Present high-level context descriptions to anchor the conversation:
- "This is a SaaS platform for small businesses" (seed)
- "Given that context, what matters most — speed to market, enterprise security, or cost efficiency?" (elicit priority)
- "What could go wrong that would be unacceptable?" (elicit constraints)
- "What must always work, even if other things break?" (elicit critical paths)

### Quality Attribute Building Blocks
For each stated concern, ask three questions:
1. **Concerns:** What are you worried about?
2. **Factors:** What sub-dimensions define this?
3. **Methods:** How will we achieve it?

This converts stakeholder stories into technical requirements without requiring technical language.

---

## When Agents Should Reference This Skill

- **fire-1d-discuss:** Use utility tree decomposition during requirement gathering
- **fire-planner:** Decompose to Level 4 before generating BLUEPRINT tasks
- **fire-researcher:** Use WDM when comparing alternative approaches
- **fire-verifier:** Verify against Level 4 requirements, not Level 1 descriptions
- **fire-vision-architect:** Surface tradeoffs explicitly when presenting architecture branches