@thierrynakoa/fire-flow 10.0.0 → 12.2.1

This diff represents the content of publicly available package versions that have been released to one of the supported registries. The information contained in this diff is provided for informational purposes only and reflects changes between package versions as they appear in their respective public registries.
Files changed (94)
  1. package/.claude-plugin/plugin.json +8 -8
  2. package/ARCHITECTURE-DIAGRAM.md +7 -4
  3. package/COMMAND-REFERENCE.md +33 -13
  4. package/DOMINION-FLOW-OVERVIEW.md +581 -421
  5. package/QUICK-START.md +3 -3
  6. package/README.md +101 -44
  7. package/TROUBLESHOOTING.md +264 -264
  8. package/agents/fire-executor.md +200 -116
  9. package/agents/fire-fact-checker.md +276 -276
  10. package/agents/fire-phoenix-analyst.md +394 -0
  11. package/agents/fire-planner.md +145 -53
  12. package/agents/fire-project-researcher.md +155 -155
  13. package/agents/fire-research-synthesizer.md +166 -166
  14. package/agents/fire-researcher.md +144 -59
  15. package/agents/fire-roadmapper.md +215 -203
  16. package/agents/fire-verifier.md +247 -65
  17. package/agents/fire-vision-architect.md +381 -0
  18. package/commands/fire-0-orient.md +476 -476
  19. package/commands/fire-1a-new.md +216 -0
  20. package/commands/fire-1b-research.md +210 -0
  21. package/commands/fire-1c-setup.md +254 -0
  22. package/commands/{fire-1a-discuss.md → fire-1d-discuss.md} +35 -7
  23. package/commands/fire-3-execute.md +55 -2
  24. package/commands/fire-4-verify.md +61 -0
  25. package/commands/fire-5-handoff.md +2 -2
  26. package/commands/fire-6-resume.md +37 -2
  27. package/commands/fire-add-new-skill.md +2 -2
  28. package/commands/fire-autonomous.md +20 -3
  29. package/commands/fire-brainstorm.md +1 -1
  30. package/commands/fire-complete-milestone.md +2 -2
  31. package/commands/fire-cost.md +183 -0
  32. package/commands/fire-dashboard.md +2 -2
  33. package/commands/fire-debug.md +663 -663
  34. package/commands/fire-loop-resume.md +2 -2
  35. package/commands/fire-loop-stop.md +1 -1
  36. package/commands/fire-loop.md +1168 -1168
  37. package/commands/fire-map-codebase.md +3 -3
  38. package/commands/fire-new-milestone.md +356 -356
  39. package/commands/fire-phoenix.md +603 -0
  40. package/commands/fire-reflect.md +235 -235
  41. package/commands/fire-research.md +246 -246
  42. package/commands/fire-search.md +1 -1
  43. package/commands/fire-skills-diff.md +3 -3
  44. package/commands/fire-skills-history.md +3 -3
  45. package/commands/fire-skills-rollback.md +7 -7
  46. package/commands/fire-skills-sync.md +5 -5
  47. package/commands/fire-test.md +9 -9
  48. package/commands/fire-todos.md +1 -1
  49. package/commands/fire-update.md +5 -5
  50. package/hooks/hooks.json +16 -16
  51. package/hooks/run-hook.sh +8 -8
  52. package/hooks/run-session-end.sh +7 -7
  53. package/hooks/session-end.sh +90 -90
  54. package/hooks/session-start.sh +1 -1
  55. package/package.json +4 -2
  56. package/plugin.json +7 -7
  57. package/references/metrics-and-trends.md +1 -1
  58. package/skills-library/SKILLS-INDEX.md +588 -588
  59. package/skills-library/_general/methodology/AUTONOMOUS_ORCHESTRATION.md +182 -0
  60. package/skills-library/_general/methodology/BACKWARD_PLANNING_INTERVIEW.md +307 -0
  61. package/skills-library/_general/methodology/CIRCUIT_BREAKER_INTELLIGENCE.md +163 -0
  62. package/skills-library/_general/methodology/CONTEXT_ROTATION.md +151 -0
  63. package/skills-library/_general/methodology/DEAD_ENDS_SHELF.md +188 -0
  64. package/skills-library/_general/methodology/DESIGN_PHILOSOPHY_ENFORCEMENT.md +152 -0
  65. package/skills-library/_general/methodology/INTERNAL_CONSISTENCY_AUDIT.md +212 -0
  66. package/skills-library/_general/methodology/LIVE_BREADCRUMB_PROTOCOL.md +242 -0
  67. package/skills-library/_general/methodology/PHOENIX_REBUILD_METHODOLOGY.md +251 -0
  68. package/skills-library/_general/methodology/QUALITY_GATES_AND_VERIFICATION.md +157 -0
  69. package/skills-library/_general/methodology/RELIABILITY_PREDICTION.md +104 -0
  70. package/skills-library/_general/methodology/REQUIREMENTS_DECOMPOSITION.md +155 -0
  71. package/skills-library/_general/methodology/SELF_TESTING_FEEDBACK_LOOP.md +143 -0
  72. package/skills-library/_general/methodology/STACK_COMPATIBILITY_MATRIX.md +178 -0
  73. package/skills-library/_general/methodology/TIERED_CONTEXT_ARCHITECTURE.md +118 -0
  74. package/skills-library/_general/methodology/ZERO_FRICTION_CLI_SETUP.md +312 -0
  75. package/skills-library/_general/methodology/autonomous-multi-phase-build.md +133 -0
  76. package/skills-library/_general/methodology/claude-md-archival.md +280 -0
  77. package/skills-library/_general/methodology/debug-swarm-researcher-escape-hatch.md +240 -240
  78. package/skills-library/_general/methodology/git-worktrees-parallel.md +232 -0
  79. package/skills-library/_general/methodology/llm-judge-memory-crud.md +241 -0
  80. package/skills-library/_general/methodology/multi-project-autonomous-build.md +360 -0
  81. package/skills-library/_general/methodology/shell-autonomous-loop-fixplan.md +238 -238
  82. package/skills-library/_general/patterns-standards/GOF_DESIGN_PATTERNS_FOR_AI_AGENTS.md +358 -0
  83. package/skills-library/methodology/BREATH_BASED_PARALLEL_EXECUTION.md +1 -1
  84. package/skills-library/methodology/RESEARCH_BACKED_WORKFLOW_UPGRADE.md +1 -1
  85. package/skills-library/methodology/SABBATH_REST_PATTERN.md +1 -1
  86. package/templates/ASSUMPTIONS.md +1 -1
  87. package/templates/BLOCKERS.md +1 -1
  88. package/templates/DECISION_LOG.md +1 -1
  89. package/templates/phase-prompt.md +1 -1
  90. package/templates/phoenix-comparison.md +80 -0
  91. package/version.json +2 -2
  92. package/workflows/handoff-session.md +1 -1
  93. package/workflows/new-project.md +2 -2
  94. package/commands/fire-1-new.md +0 -281
@@ -0,0 +1,251 @@ package/skills-library/_general/methodology/PHOENIX_REBUILD_METHODOLOGY.md
---
name: PHOENIX_REBUILD_METHODOLOGY
category: methodology
version: 1.0.0
contributed: 2026-03-06
contributor: dominion-flow
last_updated: 2026-03-06
tags: [phoenix, rebuild, refactor, intent-extraction, vibe-code, technical-debt, reverse-engineering]
difficulty: hard
sources:
  - "Fred Brooks — No Silver Bullet (1986) — Essential vs Accidental Complexity"
  - "Martin Fowler — Refactoring: Improving the Design of Existing Code"
  - "Michael Feathers — Working Effectively with Legacy Code"
  - "Refactoring.Guru — Design Pattern Catalog"
---

# Phoenix Rebuild Methodology

> **Core insight:** Don't refactor the mess — reverse-engineer the INTENT, then build clean from scratch. A phoenix burns the old and rises new. The ashes carry the knowledge; the new form carries none of the accidental complexity.

---

## 1. How to Extract Intent from Messy Code

### Reading Order (Critical — Do NOT Read Code First)

```
1. README / docs          → What the developer SAID the app does
2. Route / endpoint files → The API surface reveals feature boundaries
3. Database schema/models → The data model reveals domain concepts
4. Tests (if any)         → Tests encode intended behavior
5. Git commit messages    → The narrative of how the code evolved
6. The code itself        → LAST — read implementation after you understand intent
```
**Why this order:** Reading code first biases toward "what it does" instead of "what it was meant to do." Surrounding artifacts reveal intent more clearly than tangled implementation.

### Intent Extraction Patterns

| Pattern | What to Look For | What It Reveals |
|---------|------------------|-----------------|
| **Naming Intent** | Function/variable names vs their behavior | Gap between name and behavior = accidental complexity |
| **Comment Intent** | Comments saying "should", "TODO", "HACK", "FIXME" | Unfulfilled intent — developer knew what they wanted |
| **Test Intent** | What tests assert (if tests exist) | The behaviors the developer cared about verifying |
| **Error Handling Intent** | What errors are caught vs thrown | What the developer thought could go wrong |
| **Commit Message Intent** | "fix:", "feat:", "hack:" prefixes | The sequence of intentions over time |
| **Dead Code Intent** | Commented-out code, unreachable branches | Abandoned attempts — replaced or forgotten? |
| **Copy-Paste Intent** | Duplicated blocks with minor variations | "I needed this to work like THAT but slightly different" |
| **Magic Number Intent** | Hardcoded values with no explanation | A business rule or config never extracted |
| **Import Intent** | Imported but unused libraries | Features planned but never implemented |
| **Overengineering Intent** | Complex abstractions wrapping simple logic | Developer anticipated needs that never materialized |

### The "Squint Test"

For any module, ask: **"If I squint past the implementation mess, what is this module's job in ONE sentence?"**

If you cannot answer in one sentence, the module violates Single Responsibility and should be split during rebuild. The squint test produces the "intent statement" for each feature.

---
## 2. Accidental vs Essential Complexity (Fred Brooks)

### Essential Complexity — Keep and Rewrite Clean

Complexity inherent to the PROBLEM itself. Cannot be removed without changing what the application does.

**Examples:**
- Tax calculation rules (complex because taxes are complex)
- Multi-currency arithmetic (complex because currencies are complex)
- Role-based permissions with inheritance (complex because access control is nuanced)
- Content repurposing logic (complex because each platform has different format requirements)

**During rebuild:** Preserve ALL essential complexity. Rewrite it cleaner, add tests, add comments — but do NOT simplify away the business rules.

### Accidental Complexity — Remove Entirely

Complexity introduced by the IMPLEMENTATION, not the problem. CAN and SHOULD be eliminated.

**Detection heuristics:**
```
IF removing the pattern changes WHAT the app does              → ESSENTIAL (keep)
IF removing the pattern only changes HOW it does it            → ACCIDENTAL (remove)
IF the pattern exists because "that's how the tutorial did it" → ACCIDENTAL
IF the pattern exists because "the business rule requires it"  → ESSENTIAL
IF the pattern appears in "common anti-patterns" lists         → likely ACCIDENTAL
```

**Common accidental complexity indicators:**
- Global state mutations instead of state management
- Callback nesting instead of async/await
- Raw SQL strings instead of parameterized queries / ORM
- No separation between routes, business logic, and data access
- Duplicated code instead of shared functions
- Inconsistent error handling (some try/catch, some not)
- No type safety (everything is `any`)
- Hardcoded configuration values
- No environment separation (dev/staging/prod)

---
## 3. Edge Case Preservation Protocol

### The 5-Step Protocol

Every edge case in the original code must be:

```
1. IDENTIFIED  — Found in the code (conditional branches, special cases)
2. DOCUMENTED  — Recorded in INTENT.md with WHY it exists
3. CLASSIFIED  — Is this edge case still needed in the rebuild?
4. CARRIED     — If needed, it MUST appear in the rebuild BLUEPRINT
5. VERIFIED    — The rebuilt project must handle this edge case (test it)
```

### Where Edge Cases Hide (Top 10 Locations)

```
1.  if/else branches with non-obvious conditions
2.  try/catch blocks with specific error type handling
3.  Database query filters with multiple conditions
4.  Validation rules with specific ranges or patterns
5.  Timeout/retry logic
6.  Date/time/timezone handling
7.  Currency/precision arithmetic (rounding rules)
8.  Null/undefined guards (especially nested: user?.profile?.settings?.theme)
9.  Migration/compatibility code (backwards compat shims)
10. Feature flags / A/B test branches
```

### Kill or Keep Decision Framework

```
KEEP if:
- It handles a real business scenario (even rare ones)
- It prevents data corruption or data loss
- It handles an external API quirk (rate limits, format variations)
- It was added in response to a bug report (check git blame)
- Removing it would change user-visible behavior

KILL if:
- It handles a bug in code that is being rewritten anyway
- It works around a library limitation that no longer exists
- It exists because of poor architecture (which the rebuild fixes)
- It is dead code (no execution path reaches it)
- It was a temporary hack with a TODO to remove
```
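Step 5 (VERIFIED) is the easiest step to skip and the most important. A minimal TypeScript sketch of carrying one hypothetical edge case from hiding spot #8 (nested null guards) through to tests; the `User` shape, `themeFor`, and the legacy-user scenario are invented for illustration:

```typescript
// Hypothetical edge case found in the messy original:
//   const theme = user && user.profile && user.profile.settings
//     ? user.profile.settings.theme : "light";
// INTENT.md entry: "Users created before the profile feature shipped have no
// profile object; they must still get the default theme, not a crash."

interface Settings { theme?: string }
interface Profile { settings?: Settings }
interface User { profile?: Profile }

/** Rebuilt clean version: the legacy-user edge case is CARRIED, not dropped. */
function themeFor(user: User): string {
  return user.profile?.settings?.theme ?? "light";
}

// Step 5 (VERIFIED): the rebuild must handle the edge case, so test it.
console.assert(themeFor({}) === "light");              // pre-profile legacy user
console.assert(themeFor({ profile: {} }) === "light"); // partially migrated user
console.assert(themeFor({ profile: { settings: { theme: "dark" } } }) === "dark");
```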
---

## 4. Vibe-Coder Anti-Pattern Replacement Map

| Anti-Pattern | Detection Signal | Production Replacement |
|--------------|------------------|------------------------|
| **God File** | Single file > 500 LOC with mixed concerns | Split by responsibility: routes, services, models, utils |
| **Copy-Paste Variation** | Near-identical blocks (>80% similar) | Extract shared function with parameters for variations |
| **Callback Hell** | Nested callbacks > 3 levels deep | async/await with proper error handling |
| **Global State Spaghetti** | `global`, `window.`, module-level mutation | State management (Redux, Zustand, Context) or DI |
| **No Error Handling** | No try/catch, uncaught promise rejections | Error middleware + typed error classes + logging |
| **Inline Everything** | SQL in route handlers, HTML in logic | Layered architecture: controller → service → repository |
| **Magic Strings/Numbers** | Hardcoded values throughout | Named constants, enums, config files |
| **No Types** | `any` everywhere, no interfaces | TypeScript strict mode with proper type definitions |
| **No Tests** | Zero test files | Test-first rebuild (write tests from INTENT.md before code) |
| **Security Ignorance** | Hardcoded secrets, no input validation, raw SQL | .env, validation schemas, parameterized queries, auth middleware |
| **No Config Separation** | URLs, ports, keys mixed in code | Environment-specific config with validation |
| **Monolith Route File** | All routes in one file | Route module per resource with controller pattern |
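As a small illustration of one row, here is the Copy-Paste Variation replacement sketched in TypeScript; the "before" in the comment and the `emailsByRole` helper are hypothetical, not taken from any real source:

```typescript
// Before (near-identical blocks, >80% similar):
//   const admins  = users.filter(u => u.role === "admin").map(u => u.email);
//   const editors = users.filter(u => u.role === "editor").map(u => u.email);
//   const viewers = users.filter(u => u.role === "viewer").map(u => u.email);

interface User { role: string; email: string }

/** After: one shared function; the variation becomes a parameter. */
function emailsByRole(users: User[], role: string): string[] {
  return users.filter(u => u.role === role).map(u => u.email);
}

const users: User[] = [
  { role: "admin", email: "a@x.io" },
  { role: "editor", email: "e@x.io" },
];
console.assert(emailsByRole(users, "admin").length === 1);
```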
---

## 5. The Intent Graph

A three-column mapping that serves as the translation layer between messy source and clean target:

```
CODE (what exists) → INTENT (what was meant) → CLEAN (what to build)
```

### Why the Intent Graph Matters

It prevents two failure modes:

**Failure Mode 1: Literal Translation**
Rebuilding the mess exactly as it is, just with better formatting. The intent graph forces you to go THROUGH intent, not directly from code to code.

**Failure Mode 2: Intent Loss**
Rebuilding something clean but missing features because the developer's hidden knowledge was not extracted. The graph forces you to document every code pattern before discarding it.

### Building the Intent Graph

```
FOR each source code module:
  1. Read the code → document WHAT it does (observable behavior)
  2. Read surrounding context → document WHY (intent)
  3. Look up best practice → document HOW to do it right
  4. Record all three columns in INTENT-GRAPH.md
  5. Cross-check: does the "clean" column preserve all behaviors
     from the "code" column? If not, something was lost.
```

### Example Intent Graph Entries

| Source Code (Messy) | Developer Intent | Clean Implementation |
|---------------------|------------------|----------------------|
| 200-line auth middleware with inline SQL | Role-based access control | passport.js + RBAC middleware + DB-backed roles |
| Global error handler that catches everything | Don't crash the app | Express error middleware + typed error classes + Sentry |
| 15 API routes in one file | CRUD for users + products + orders | Separate route modules + controller layer + service layer |
| Hardcoded `PORT=3000` in 5 files | Environment-specific config | dotenv + typed config loader + validation |
| Copy-pasted validation in every route | Input validation | Zod/Joi schema per endpoint, shared validation middleware |
| `setTimeout(fn, 5000)` retry loops | Handle transient API failures | Exponential backoff utility with configurable retries |
| `if (user.role === 'admin' \|\| user.id === 1)` | Admin access + superuser bypass | RBAC with permissions table + superuser flag in DB |
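The cross-check in step 5 can be mechanized. A TypeScript sketch of one graph row as a typed record plus a behavior-preservation check; the `IntentGraphEntry` shape and its field names are illustrative assumptions, not a format the plugin defines:

```typescript
/** One row of INTENT-GRAPH.md as a typed record (illustrative structure). */
interface IntentGraphEntry {
  code: string;    // WHAT exists (observable behavior)
  intent: string;  // WHY it exists
  clean: string;   // HOW to build it right
  behaviors: string[];          // behaviors observed in the messy source
  preservedBehaviors: string[]; // behaviors the clean design accounts for
}

/** Step 5 cross-check: anything observed but not preserved was lost. */
function lostBehaviors(entry: IntentGraphEntry): string[] {
  const kept = new Set(entry.preservedBehaviors);
  return entry.behaviors.filter(b => !kept.has(b));
}

const entry: IntentGraphEntry = {
  code: "200-line auth middleware with inline SQL",
  intent: "Role-based access control",
  clean: "RBAC middleware + DB-backed roles",
  behaviors: ["reject missing token", "role check", "superuser bypass"],
  preservedBehaviors: ["reject missing token", "role check"],
};
console.assert(lostBehaviors(entry).join() === "superuser bypass"); // intent loss caught
```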
---

## 6. Rebuild Order Strategy

When rebuilding from INTENT.md, build in this order:

```
1. FOUNDATION    — Project scaffold, config, types, database schema
2. CORE          — Highest-uniqueness features (CRITICAL and HIGH)
3. SUPPORT       — Medium-uniqueness features
4. STANDARD      — Low-uniqueness and boilerplate (often auto-generated)
5. INTEGRATION   — Wire everything together, cross-module flows
6. HARDENING     — Error handling, logging, security, edge cases
7. TESTING       — Tests written from INTENT.md assertions
8. DOCUMENTATION — README, API docs, deployment guide
```

**Rationale:** Build what is UNIQUE first. Boilerplate is easiest to add later and lowest risk. If context runs out, you want the unique business logic done, not the boilerplate. This is the opposite of how vibe coders build (scaffold first, unique logic last — which is why their unique logic is always the messiest part).

---
## 7. Feature Uniqueness Classification

| Score | Definition | Rebuild Strategy |
|-------|------------|------------------|
| **BOILERPLATE** | Standard framework code, no custom logic | Regenerate from best practices (don't even read the original) |
| **LOW** | Minor customization of standard patterns | Use standard pattern, apply customizations from INTENT.md |
| **MEDIUM** | Meaningful business logic using common patterns | Rewrite with proper architecture, preserve all business rules |
| **HIGH** | Custom algorithms, domain-specific rules | Carefully extract logic, rewrite with tests, preserve edge cases |
| **CRITICAL** | Core business differentiator, proprietary logic | Extract verbatim, wrap in clean architecture, comprehensive tests |

---

## When Agents Should Reference This Skill

- **fire-phoenix-analyst:** Primary reference — use reading order, intent extraction patterns, squint test, uniqueness classification
- **fire-phoenix (command):** Reference rebuild order strategy when creating phase breakdown from INTENT.md
- **fire-planner:** When planning rebuild phases, use uniqueness scores to prioritize task order
- **fire-executor:** When rebuilding modules, check anti-pattern map to avoid reintroducing accidental complexity
- **fire-verifier:** When verifying rebuild, check that accidental complexity items are absent and edge cases are preserved
- **fire-researcher:** When researching alternatives for a stuck rebuild, check intent graph for original intent
@@ -0,0 +1,157 @@ package/skills-library/_general/methodology/QUALITY_GATES_AND_VERIFICATION.md
---
name: QUALITY_GATES_AND_VERIFICATION
category: methodology
description: Industry-proven verification patterns — tiered gates, risk-based testing, error budgets, shift-left, and the 7 principles of testing applied to AI-assisted development
version: 1.0.0
tags: [quality-gates, verification, testing, risk-based-testing, error-budget, shift-left]
sources:
  - "Google SRE — Error Budget Policy (sre.google)"
  - "Netflix — Kayenta Automated Canary Analysis"
  - "SonarSource, Dynatrace, LinearB — Quality Gate frameworks"
  - "Hans-Petter Halvorsen — Software Development: A Practical Approach"
  - "ISTQB — 7 Principles of Testing"
  - "CRISP-ML(Q) — Phase-level risk registers"
---

# Quality Gates and Verification

> **Core insight:** A gate that halts progress is working correctly. Distinguish between "agent failed" and "gate blocked advancement pending better input."

---
## 1. Tiered Verification (Shift-Left)

Structure verification as two tiers — never run expensive checks when cheap ones already fail:

### Tier 1: Fast Gate (seconds, always run)
- Syntax validation / linting
- File existence checks
- Schema conformance
- Import resolution
- Type checking (if applicable)

### Tier 2: Slow Gate (minutes, run only when Tier 1 passes)
- Integration tests
- End-to-end validation
- Performance benchmarks
- Security scans
- Cross-phase contract verification

**Why:** A build that doesn't compile will never pass integration tests. Running integration tests on broken syntax wastes tokens and time.

**Agent action:** fire-verifier should run Tier 1 checks first. If any fail, report immediately without running Tier 2. This alone can save 50%+ of verification time on failed phases.
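The two-tier short-circuit can be sketched in a few lines of TypeScript; `Check`, `runGates`, and the sample check names are hypothetical, for illustration only:

```typescript
interface Check { name: string; run: () => boolean }
interface GateResult { passed: boolean; failures: string[]; ranTier2: boolean }

/** Run Tier 1 first; run Tier 2 only when every Tier 1 check passes. */
function runGates(tier1: Check[], tier2: Check[]): GateResult {
  const fast = tier1.filter(c => !c.run()).map(c => c.name);
  if (fast.length > 0) {
    // Fail fast: report immediately, spend nothing on slow checks.
    return { passed: false, failures: fast, ranTier2: false };
  }
  const slow = tier2.filter(c => !c.run()).map(c => c.name);
  return { passed: slow.length === 0, failures: slow, ranTier2: true };
}

// A lint failure short-circuits: the integration suite is never invoked.
const result = runGates(
  [{ name: "lint", run: () => false }],
  [{ name: "integration", run: () => { throw new Error("should not run"); } }],
);
console.assert(!result.passed && !result.ranTier2);
```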
---

## 2. Risk-Based Testing

Test scope is a function of two variables:

```
Test Priority = Likelihood of Failure × Impact of Failure
```

| Quadrant | Likelihood | Impact | Strategy |
|----------|------------|--------|----------|
| **Test first, test most** | High | High | Full verification, manual review |
| **Test thoroughly** | Low | High | Targeted deep tests |
| **Test efficiently** | High | Low | Automated regression |
| **Sample or defer** | Low | Low | Spot check, trust |

**Agent action:** Before verification, classify each changed area by this matrix. Don't test everything equally — that's wasteful. Don't skip testing on "small changes" — that's dangerous.

### Change Impact Scoping
```
Config-only change   → verify config loads, skip code tests
Backend-only change  → verify API + DB, skip frontend/E2E
Frontend-only change → verify rendering + UX, skip backend
Full-stack change    → full verification
Test-only change     → verify tests pass, minimal code review
```
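The quadrant table reads directly as a two-input function. A TypeScript sketch; the `testStrategy` name and the returned strategy strings are illustrative:

```typescript
type Level = "high" | "low";

/** Map the likelihood × impact quadrants to the table's strategies. */
function testStrategy(likelihood: Level, impact: Level): string {
  if (impact === "high") {
    return likelihood === "high"
      ? "full verification + manual review"
      : "targeted deep tests";
  }
  return likelihood === "high" ? "automated regression" : "spot check";
}

console.assert(testStrategy("high", "high") === "full verification + manual review");
console.assert(testStrategy("low", "low") === "spot check");
```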
---

## 3. Error Budget for Retry Decisions

Borrowed from Google SRE: every task has a finite retry budget.

```
Task error budget = max_retries (default: 2)

After each retry:
  budget -= 1

IF budget == 0:
  STOP retrying
  Route to: research → re-plan → or escalate

NEVER: retry the same approach a 3rd time
```

**Why this works:** An agent that retries the same failing approach 5 times is burning tokens, not solving problems. Two retries catch transient failures. Beyond that, the approach itself is wrong.

**Integration with circuit breaker:** The error budget is the per-task trip threshold. When exhausted, the task-level breaker opens and routes to research.
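The pseudocode above can be sketched as a concrete retry wrapper; `withErrorBudget` and the `Outcome` type are hypothetical names, and the accounting (one initial attempt plus `maxRetries` retries) is one reasonable reading of the budget:

```typescript
type Outcome<T> =
  | { kind: "ok"; value: T }
  | { kind: "escalate"; attempts: number; lastError: string };

/** Retry with a finite budget; when exhausted, route onward instead of looping. */
function withErrorBudget<T>(task: () => T, maxRetries = 2): Outcome<T> {
  let lastError = "";
  // One initial attempt plus maxRetries retries, never more.
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    try {
      return { kind: "ok", value: task() };
    } catch (e) {
      lastError = String(e);
    }
  }
  // Budget exhausted: STOP retrying; caller routes to research / re-plan / escalate.
  return { kind: "escalate", attempts: maxRetries + 1, lastError };
}

let calls = 0;
const out = withErrorBudget(() => { calls++; throw new Error("same approach"); });
console.assert(out.kind === "escalate" && calls === 3); // never a 3rd identical retry
```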
---

## 4. The 7 Principles of Testing (Applied to AI Development)

From ISTQB, adapted for AI-agent workflows:

| Principle | Original | AI-Agent Translation |
|-----------|----------|----------------------|
| **1. Testing shows presence of bugs** | Testing reduces probability of undiscovered defects but isn't proof of correctness | Verification catches issues but passing doesn't guarantee production-ready |
| **2. Exhaustive testing is impossible** | Test based on risk assessment, not completeness | Scope verification to change impact, don't verify everything |
| **3. Early testing** | Start testing as early as possible | Verify plan coherence before execution, not after |
| **4. Defect clustering** | A small number of modules contain most bugs | Track which phases/tasks cluster failures — invest guards there |
| **5. Pesticide paradox** | Same tests stop finding new bugs | Rotate verification approaches; static checklist misses novel failures |
| **6. Testing is context-dependent** | Different software needs different testing | Backend change ≠ frontend change ≠ config change → different checks |
| **7. Absence of error is a fallacy** | Bug-free software can still be unusable | Code that passes all checks but doesn't meet user requirements is still wrong |

---
## 5. Phase-Level Risk Registers (CRISP-ML(Q))

Before each major phase, generate a short risk assessment:

```markdown
## Phase {N} Risk Register

| Risk | Likelihood | Impact | Mitigation |
|------|------------|--------|------------|
| {most likely failure mode} | {H/M/L} | {H/M/L} | {specific action} |
| {second most likely} | {H/M/L} | {H/M/L} | {specific action} |
| {third most likely} | {H/M/L} | {H/M/L} | {specific action} |
```

**This is 5 lines, not a document.** The point is to think about failure before executing, not to create paperwork.

---
## 6. Definition of Ready / Definition of Done

Two gates that prevent wasted work:

### Definition of Ready (before starting a task)
- [ ] Acceptance criteria are clear and verifiable
- [ ] Dependencies are resolved or documented
- [ ] Scope is bounded (files, tools, operations)
- [ ] Required context is available (MEMORY.md, prior phase output)

### Definition of Done (before declaring complete)
- [ ] All Tier 1 checks pass
- [ ] Tier 2 checks pass (if applicable to scope)
- [ ] No regressions introduced
- [ ] RECORD.md updated with what was done
- [ ] Agent confidence ≥ 70% on the output

**Rule:** If DoR isn't met, send the task back for clarification — don't start it. If DoD isn't met, the task is not done — don't advance.

---
## When Agents Should Reference This Skill

- **fire-verifier:** Apply tiered verification (Tier 1 before Tier 2), risk-based scoping
- **fire-planner:** Include risk register in plan output, define DoR/DoD per task
- **fire-executor:** Check DoR before starting, track error budget per task
- **fire-autonomous:** Use error budget to decide retry vs. escalate
@@ -0,0 +1,104 @@ package/skills-library/_general/methodology/RELIABILITY_PREDICTION.md
---
name: RELIABILITY_PREDICTION
category: methodology
description: Predict phase reliability before execution using implied scenario detection, sensitivity analysis, and constrained models — catch architectural mismatches before they cost tokens
version: 1.0.0
tags: [reliability, prediction, implied-scenarios, sensitivity-analysis, quality-gates]
sources:
  - "Rodrigues, Rosenblum, Uchitel — Reliability Prediction in Model-Driven Development (UCL/Imperial, 2005)"
  - "CRISP-ML(Q) — Mercedes-Benz AG + TU Berlin, 2020"
---

# Reliability Prediction for AI-Assisted Development

> **Core insight:** "Composition reveals what specification omits." When you connect agents, phases, or tools together, the system produces behaviors no individual specification predicted. Detect these early or pay later.

---
## 1. Two Risk Dimensions Per Phase

Every phase has two independent failure probabilities — assess both before executing:

| Dimension | Question | Example |
|-----------|----------|---------|
| **Transition probability** | If this phase succeeds, does it cleanly advance to the next? | "Auth phase done, but API routes phase expects a different token format" |
| **Component reliability** | What's the probability this agent/tool produces correct output? | "LLM generating boilerplate = 95% reliable. LLM designing novel algorithm = 60% reliable" |

**Agent action:** Before executing a task, estimate both. If component reliability < 60%, research first. If transition probability is unclear, verify the interface contract with the next phase.

---
## 2. Implied Scenario Detection

After any multi-agent or multi-phase interaction, check for unspecified behaviors:

### Positive Implied Scenarios (missing specification)
- The system produced a correct behavior not in the plan
- **Action:** Add it to the phase spec. Document it in PATTERNS.md
- Example: "The auth middleware also handles rate limiting — not planned, but correct and useful"

### Negative Implied Scenarios (architecture mismatch)
- The system permits behavior the specification forbids
- **Action:** Add a constraint (validation gate, type check, pre-condition) — don't patch the agent
- Example: "The executor can write to files outside the declared scope — add scope enforcement"

> **Key finding:** Adding a single constraint improved system reliability from 64.9% to 86.2% in the source study. Constraints beat corrections.
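The "add scope enforcement" example can be expressed as a constraint rather than a correction to the agent. A TypeScript sketch; `makeScopedWriter` and the sample paths are invented for illustration:

```typescript
/** A constraint, not a correction: reject writes outside the declared scope. */
function makeScopedWriter(
  allowedPrefixes: string[],
  write: (path: string, data: string) => void,
) {
  return (path: string, data: string): void => {
    if (!allowedPrefixes.some(prefix => path.startsWith(prefix))) {
      throw new Error(`scope violation: ${path} is outside the declared scope`);
    }
    write(path, data);
  };
}

const written: string[] = [];
const scopedWrite = makeScopedWriter(["src/auth/"], path => { written.push(path); });

scopedWrite("src/auth/login.ts", "ok"); // inside scope: allowed
let blocked = false;
try { scopedWrite("src/billing/invoice.ts", "oops"); } catch { blocked = true; }
console.assert(blocked && written.length === 1); // negative implied scenario closed
```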
---

## 3. Sensitivity Analysis — Where to Invest Guards

Not all phase failures are equal. Rank by **downstream impact**, not frequency:

```
For each phase that has ever failed:
  1. Fix that phase's reliability to 0% (assume it fails)
  2. Estimate: how many downstream phases break?
  3. Estimate: what's the rework cost?
  4. Rank phases by total downstream damage

Result: The phase with the highest damage multiplier
        gets the most verification investment
```

**Common surprise:** The rarest failure with the highest downstream cost should get the most guard investment. A planning failure that happens 5% of the time but invalidates 3 downstream phases is worse than an execution failure that happens 20% of the time but is locally contained.
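One way to sketch the ranking, weighting each phase's failure rate by its downstream damage; the `Phase` fields and the sample numbers are illustrative assumptions:

```typescript
interface Phase {
  name: string;
  failureRate: number;      // how often this phase has failed
  downstreamBroken: number; // phases invalidated when it fails
  reworkCost: number;       // iterations needed to recover
}

/** Rank by expected downstream damage, not by raw failure frequency. */
function guardInvestmentOrder(phases: Phase[]): string[] {
  const damage = (p: Phase) => p.failureRate * p.downstreamBroken * p.reworkCost;
  return [...phases].sort((a, b) => damage(b) - damage(a)).map(p => p.name);
}

// The "common surprise" from the text, in numbers:
const order = guardInvestmentOrder([
  { name: "execution", failureRate: 0.20, downstreamBroken: 1, reworkCost: 1 },
  { name: "planning",  failureRate: 0.05, downstreamBroken: 3, reworkCost: 5 },
]);
console.assert(order[0] === "planning"); // rare but high-damage wins the guards
```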
---

## 4. Probability Completeness

Every decision diamond must have exhaustive branches. No implicit "otherwise":

```
BAD:  "If verification passes → proceed to handoff"
      (What happens if it fails? Undefined.)

GOOD: "If verification passes → proceed to handoff
       If verification fails with fixable issues → re-execute with gaps
       If verification fails with architectural issues → re-plan
       If verification fails 3 times → dead-end shelf + escalate"
```

**Rule:** If the branches from a decision point don't cover 100% of outcomes, the workflow has a structural defect.
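In a typed language, the compiler can enforce this rule. A TypeScript sketch using an exhaustive `switch` over a discriminated union; the outcome names mirror the GOOD example above and are otherwise illustrative:

```typescript
type VerificationOutcome =
  | { kind: "pass" }
  | { kind: "fixable"; gaps: string[] }
  | { kind: "architectural" }
  | { kind: "exhausted" };

/** Exhaustive routing: no implicit "otherwise" branch can slip through. */
function route(outcome: VerificationOutcome): string {
  switch (outcome.kind) {
    case "pass":          return "handoff";
    case "fixable":       return "re-execute with gaps";
    case "architectural": return "re-plan";
    case "exhausted":     return "dead-end shelf + escalate";
    default: {
      // Adding a new outcome without a branch turns this line into a type error.
      const unreachable: never = outcome;
      return unreachable;
    }
  }
}

console.assert(route({ kind: "architectural" }) === "re-plan");
```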
---

## 5. Early Non-Functional Analysis

> "Early evaluation of software properties is important in order to reduce costs before resources have been allocated and decisions have been made."

The highest-leverage phase is **planning** — not because planning is intrinsically valuable, but because:
- Defect caught in planning = 1 iteration to fix
- Same defect caught in execution = 5 iterations
- Caught in production = indefinitely more

**Agent action:** Before executing, verify the plan's non-functional properties: Is the architecture coherent? Are the dependencies resolvable? Is the scope verifiable? These questions cost 30 seconds to check and save hours of rework.

---
## When Agents Should Reference This Skill

- **fire-planner:** Before generating a plan, assess transition probabilities between phases
- **fire-verifier:** After verification, run implied scenario check on multi-agent outputs
- **fire-executor:** Before starting a task with < 60% component reliability, route to research
- **fire-researcher:** When analyzing why a phase failed, use sensitivity analysis to prioritize

---
name: REQUIREMENTS_DECOMPOSITION
category: methodology
description: Turn vague requirements into testable specifications using utility trees, ATAM tradeoff analysis, and weighted decision matrices — never accept Level 1 input for execution
version: 1.0.0
tags: [requirements, decomposition, utility-tree, atam, weighted-decision-matrix, tradeoffs]
sources:
  - "CMU SEI — How to Address Poorly-Defined Requirements in Software System Design (Nov 2025)"
  - "Dr. Lori Flynn, Lyndsi Hughes — Carnegie Mellon University Software Engineering Institute"
  - "IEEE — Software Quality Definition"
  - "ATAM — Architecture Tradeoff Analysis Method"
---

# Requirements Decomposition

> **Core insight:** "Never accept a vague requirement as input to any agent task — always decompose to Level 4 before beginning execution." A requirement you can't test is a requirement you can't verify.

---

## 1. The Four-Level Decomposition (Utility Tree)

Every requirement must be drilled from vague to testable:

```
Level 1: Quality Attribute (vague)
  "Good security"

Level 2: Sub-factors (decomposed concerns)
  Data Protection, Auth, Security Logging, Compliance

Level 3: Refined Sub-factors (actionable concerns)
  Encrypt data at rest, Restrict access, Log unauthorized access

Level 4: Requirements (specific, testable, implementable)
  "FIPS 140-2 validated encryption", "RBAC with role hierarchy",
  "Failed auth attempts logged with IP + timestamp"
```

**Rule:** You cannot start execution on a Level 1 or Level 2 requirement. If a user says "make it secure" or "add good error handling," decompose FIRST.

**Agent action:** fire-planner decomposes every requirement to Level 4 before generating BLUEPRINT tasks. Each Level 4 entry must have a corresponding test/verification criterion.
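
The utility tree is simple enough to carry as plain data. A hypothetical sketch (the criteria strings are illustrative, not taken from a real BLUEPRINT): each Level 4 leaf pairs a testable requirement with the verification criterion the rule above demands.

```python
# Level 1 -> Level 2 -> Level 3 -> list of Level 4 (requirement, test criterion) pairs
utility_tree = {
    "Security": {                          # Level 1: quality attribute (vague)
        "Data Protection": {               # Level 2: sub-factor
            "Encrypt data at rest": [      # Level 3: refined sub-factor
                ("FIPS 140-2 validated encryption",
                 "verify the crypto module appears on the FIPS validation list"),
            ],
        },
        "Auth": {
            "Restrict access": [
                ("RBAC with role hierarchy",
                 "integration test: a child role inherits parent permissions"),
            ],
            "Log unauthorized access": [
                ("Failed auth attempts logged with IP + timestamp",
                 "unit test: a failed login writes a log record with ip and ts fields"),
            ],
        },
    },
}

def level4_requirements(tree):
    """Flatten the tree to its Level 4 (requirement, criterion) pairs."""
    for sub_factors in tree.values():
        for refinements in sub_factors.values():
            for leaves in refinements.values():
                yield from leaves
```

Flattening to the Level 4 leaves is what a planner would hand to task generation: every pair already names its own verification step.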

---

## 2. Tradeoff Analysis (ATAM)

Requirements exist in tension. You cannot maximize everything:

| Tension | Example |
|---------|---------|
| Security vs. Performance | Encryption adds latency |
| Flexibility vs. Simplicity | More config options = more complexity |
| Speed-to-market vs. Quality | Shortcuts now = rework later |
| Feature richness vs. Maintainability | More features = more surface area |

**The ATAM goal:** "Elicit, concretize, and prioritize the driving quality attribute requirements."

**Agent action:** When a plan has competing quality attributes, surface the tradeoff explicitly:
```markdown
## Tradeoff: {Attribute A} vs. {Attribute B}

Decision: Prioritize {A} because {reason}
Consequence: {B} will be {specific impact}
Mitigation: {what we'll do to limit the downside}
```

**Never silently resolve a tradeoff.** The user should know what they're trading away.

---

## 3. Weighted Decision Matrix (WDM)

When choosing between approaches, score mathematically:

```
Score = Σ (weight_i × rating_i), summed over all criteria i

Where:
  weight_i = stakeholder priority (weights sum to 1.0)
  rating_i = how well this option satisfies criterion i
```

### Weight Calculation (Rank-to-Linear)
```
weight = 2r / (N(N+1))
where r = priority rank (r = N for the top-priority criterion)
      N = total criteria count
```

### Scaling Convention
- Higher raw value R = better → use R directly (coverage %, security score)
- Higher raw value R = worse → use 1/R (cost, latency, error rate)
- Normalize ratings to comparable magnitude before multiplying
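
The two formulas combine into a few lines. A minimal sketch with made-up criteria, ranks, and ratings (none of these values come from the source); ratings are assumed to already be scaled to 0..1, with cost inverted via 1/R beforehand per the scaling convention.

```python
def rank_to_linear_weight(r: int, n: int) -> float:
    """weight = 2r / (N(N+1)); r = N is the top-priority criterion."""
    return 2 * r / (n * (n + 1))

def wdm_score(ranks: dict, ratings: dict) -> float:
    """Score = sum of weight_i * rating_i over all criteria."""
    n = len(ranks)
    return sum(rank_to_linear_weight(r, n) * ratings[name]
               for name, r in ranks.items())

# Three criteria, ranked: security matters most, then coverage, then cost.
ranks = {"security": 3, "coverage": 2, "cost": 1}

# Ratings already normalized to 0..1 (cost was inverted with 1/R first).
option_a = {"security": 0.9, "coverage": 0.6, "cost": 0.4}
option_b = {"security": 0.5, "coverage": 0.9, "cost": 0.8}

score_a = wdm_score(ranks, option_a)  # weights come out as 1/2, 1/3, 1/6
score_b = wdm_score(ranks, option_b)
```

Here option A wins despite worse coverage and cost, because the rank-to-linear weights concentrate half of the total weight on security; that sensitivity to ranking is the point of presenting the scored comparison rather than a verdict.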

**Agent action:** When fire-planner or fire-researcher evaluates 2+ approaches, use WDM scoring instead of subjective "I think approach A is better." Present the scored comparison to the user.

---

## 4. Requirements Handoff Gate

Requirements are ready for execution when ALL of these are true:

- [ ] **Tradeoffs are known** — you understand what you're giving up
- [ ] **Threats to quality are mitigated** — identified and addressed
- [ ] **Requirements are precisely defined** — not vague
- [ ] **Requirements are measurable** — you can test and get a number
- [ ] **Requirements are prioritized** — ranked by importance
- [ ] **Requirements have test criteria** — each requirement maps to a verification step

**If this gate fails:** Send back for clarification. Do NOT start building on vague requirements — that's how Frankenstein projects are born.
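
The gate is easy to run as an explicit check rather than a mental one. An illustrative sketch (the check names are shorthand for the six items above, not an existing fire-flow schema): any missing or false item blocks the handoff.

```python
GATE_CHECKS = (
    "tradeoffs_known",
    "quality_threats_mitigated",
    "precisely_defined",
    "measurable",
    "prioritized",
    "has_test_criteria",
)

def handoff_gate(status: dict) -> list:
    """Return the failed checks; an empty list means ready to execute.
    Anything not explicitly marked True counts as a failure."""
    return [check for check in GATE_CHECKS if not status.get(check, False)]
```

Returning the list of failures, rather than a bare boolean, gives the clarification request its content for free.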

---

## 5. Behavioral Discovery (Post-Build Requirements)

Some requirements are discovered after building, not before:

| Discovery Type | Action |
|----------------|--------|
| **New behavior we want** | Add as new requirement, add test |
| **Behavior that violates a requirement** | File as defect, fix |
| **Behavior we consciously accept** | Document as acknowledged risk |
| **Behavior not in spec at all** | Classify: is it a positive implied scenario or a negative one? |

This maps to the implied scenario detection pattern from RELIABILITY_PREDICTION.md — composition reveals behaviors no individual specification predicted.

---

## 6. Scenario Elicitation for AI Agents

When gathering requirements from users who don't know exactly what they want:

### Seed Scenario Technique
Present high-level context descriptions to anchor the conversation:
- "This is a SaaS platform for small businesses" (seed)
- "Given that context, what matters most — speed to market, enterprise security, or cost efficiency?" (elicit priority)
- "What could go wrong that would be unacceptable?" (elicit constraints)
- "What must always work, even if other things break?" (elicit critical paths)

### Quality Attribute Building Blocks
For each stated concern, ask three questions:
1. **Concerns:** What are you worried about?
2. **Factors:** What sub-dimensions define this?
3. **Methods:** How will we achieve it?

This converts stakeholder stories into technical requirements without requiring technical language.

---

## When Agents Should Reference This Skill

- **fire-1d-discuss:** Use utility tree decomposition during requirement gathering
- **fire-planner:** Decompose to Level 4 before generating BLUEPRINT tasks
- **fire-researcher:** Use WDM when comparing alternative approaches
- **fire-verifier:** Verify against Level 4 requirements, not Level 1 descriptions
- **fire-vision-architect:** Surface tradeoffs explicitly when presenting architecture branches